cs.LG: 131 papers today
Large language models (12 papers)
【1】CASCADE: LLM-Powered JavaScript Deobfuscator at Google
Link: https://arxiv.org/abs/2507.17691
Authors: g, Pranoy Kovuri, David Tao, Zhixun Tan
Abstract: Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.
【2】WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Link: https://arxiv.org/abs/2507.17634
Authors: Tian, Jiapeng Wang, Qian Zhao, Kunlong Chen, Jia Liu, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou
Abstract: Recent advances in learning rate (LR) scheduling have demonstrated the effectiveness of decay-free approaches that eliminate the traditional decay phase while maintaining competitive performance. Model merging techniques have emerged as particularly promising solutions in this domain. We present Warmup-Stable and Merge (WSM), a general framework that establishes a formal connection between learning rate decay and model merging. WSM provides a unified theoretical foundation for emulating various decay strategies (including cosine decay, linear decay, and inverse square root decay) as principled model averaging schemes, while remaining fully compatible with diverse optimization methods. Through extensive experiments, we identify merge duration (the training window for checkpoint aggregation) as the most critical factor influencing model performance, surpassing the importance of both checkpoint interval and merge quantity. Our framework consistently outperforms the widely adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks, achieving significant improvements of +3.5% on MATH, +2.9% on HumanEval, and +5.5% on MMLU-Pro. The performance advantages extend to supervised fine-tuning scenarios, highlighting WSM's potential for long-term model refinement.
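To make the idea of checkpoint merging concrete, here is a minimal Python sketch of averaging several checkpoints saved during a constant-LR phase. The function name `merge_checkpoints`, the dict-of-lists checkpoint layout, and the default uniform weights are assumptions made for illustration; the weighting schemes that the paper relates to specific decay schedules are not reproduced here.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical checkpoint layout: parameter name -> flat list of values.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Sequence[Checkpoint],
    weights: Optional[Sequence[float]] = None,
) -> Checkpoint:
    """Return the weighted average of the given checkpoints.

    With weights=None every checkpoint counts equally (an assumption,
    not necessarily the scheme used in the paper).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    merged: Checkpoint = {}
    for name, first in checkpoints[0].items():
        merged[name] = [
            # Normalize by the weight sum so the result is a true average.
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(first))
        ]
    return merged


# Three toy checkpoints taken at different steps of the stable phase.
ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
merged = merge_checkpoints(ckpts)
latest_only = merge_checkpoints(ckpts, weights=[0.0, 0.0, 1.0])
```

Passing non-uniform weights changes how strongly late checkpoints dominate the average, which is the knob a decay-emulating scheme would tune.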
【3】A Comprehensive Evaluation on Quantization Techniques for Large Language Models
Link: https://arxiv.org/abs/2507.17417
Authors: u, Cairong Zhao, Guosheng Hu
Abstract: For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is a rapidly evolving research field. Though many papers have reported breakthrough performance, they may not conduct experiments on the same ground since one quantization method usually contains multiple components. In addition, analyzing the theoretical connections among existing methods is crucial for in-depth understanding. To bridge these gaps, we conduct an extensive review of state-of-the-art methods and perform comprehensive evaluations on the same ground to ensure fair comparisons. To our knowledge, this fair and extensive investigation remains critically important yet underexplored. To better understand the theoretical connections, we decouple the published quantization methods into two steps: pre-quantization transformation and quantization error mitigation. We define the former as a preprocessing step applied before quantization to reduce the impact of outliers, making the data distribution flatter and more suitable for quantization. Quantization error mitigation involves techniques that offset the errors introduced during quantization, thereby enhancing model performance. We evaluate and analyze the impact of different components of quantization methods. Additionally, we analyze and evaluate the latest MXFP4 data format and its performance. Our experimental results demonstrate that optimized rotation and scaling yield the best performance for pre-quantization transformation, and combining low-rank compensation with GPTQ occasionally outperforms using GPTQ alone for quantization error mitigation. Furthermore, we explore the potential of the latest MXFP4 quantization and reveal that the optimal pre-quantization transformation strategy for INT4 does not generalize well to MXFP4, inspiring further investigation.
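The abstract's point about outliers can be illustrated with a small Python sketch. It uses plain symmetric round-to-nearest INT4 quantization with one scale per tensor, which is an assumption chosen for simplicity and not one of the methods evaluated in the paper; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization to the INT4 range [-8, 7]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor, set by the largest magnitude.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def mean_abs_error(values: List[float]) -> float:
    """Average absolute reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize_int4(values)
    restored = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


smooth = [0.1, -0.2, 0.3, -0.4]
with_outlier = smooth + [8.0]  # a single large value stretches the scale

error_smooth = mean_abs_error(smooth)
error_outlier = mean_abs_error(with_outlier)
codes_outlier, _ = quantize_int4(with_outlier)
```

In this toy case the outlier forces a coarse scale, so the four small values all collapse to code 0. A pre-quantization transformation aims to flatten the distribution so that this happens less often.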
【4】Confidence Calibration in Vision-Language-Action Models
Link: https://arxiv.org/abs/2507.17383
Authors: Zollo, Richard Zemel
Comments: 34 pages, 19 figures
Abstract: Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present the first systematic study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural-language instructions to low-level robot motor commands. We begin with extensive benchmarking to understand the critical relationship between task success and calibration error across multiple datasets and VLA variants, finding that task performance and calibration are not in tension. Next, we introduce prompt ensembles for VLAs, a lightweight, Bayesian-inspired algorithm that averages confidence across paraphrased instructions and consistently improves calibration. We further analyze calibration over the task time horizon, showing that confidence is often most reliable after making some progress, suggesting natural points for risk-aware intervention. Finally, we reveal differential miscalibration across action dimensions and propose action-wise Platt scaling, a method to recalibrate each action dimension independently to produce better confidence estimates. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.
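As a rough illustration of what "calibration error" measures, the Python sketch below computes the standard expected calibration error (ECE) with equal-width bins and averages confidence over paraphrased prompts. The function names, the bin count, and the toy numbers are assumptions; the paper's exact metrics and ensemble procedure may differ.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    successes: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """ECE: bin-size-weighted average of |mean confidence - success rate| per bin."""
    n = len(confidences)
    if n == 0 or n != len(successes):
        raise ValueError("need equally sized, non-empty inputs")
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i, conf in enumerate(confidences):
        # Equal-width bins over [0, 1]; confidence 1.0 goes into the last bin.
        bins[min(int(conf * num_bins), num_bins - 1)].append(i)

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if successes[i]) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def ensemble_confidence(per_prompt_confidences: Sequence[float]) -> float:
    """Average the confidence obtained under several paraphrases of one instruction."""
    return sum(per_prompt_confidences) / len(per_prompt_confidences)


# Toy data: the policy claims 90% confidence but succeeds 3 times out of 4.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
averaged = ensemble_confidence([0.6, 0.8, 0.7])
```

A perfectly calibrated policy would have an ECE of zero; here the gap between 0.9 claimed and 0.75 observed gives an ECE of 0.15.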
【5】VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback
Link: https://arxiv.org/abs/2507.17294
Authors: i, Kevin Yuchen Ma, Ce Hao, Mike Zheng Shou, Harold Soh
Comments: 19 pages, 5 figures
Abstract: Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.
【6】Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance
Link: https://arxiv.org/abs/2507.17273
Authors: ekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar
Comments: 12 pages, 2 figures
Abstract: Analyzing large, complex output datasets from Discrete Event Simulations (DES) of warehouse operations to identify bottlenecks and inefficiencies is a critical yet challenging task, often demanding significant manual effort or specialized analytical tools. Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents to analyze complex DES output data from warehouse operations. It transforms raw DES data into a semantically rich KG, capturing relationships between simulation events and entities. An LLM-based agent uses iterative reasoning, generating interdependent sub-questions. For each sub-question, it creates Cypher queries for KG interaction, extracts information, and self-reflects to correct errors. This adaptive, iterative, and self-correcting process identifies operational issues in a way that mimics human analysis. Our DES approach for warehouse bottleneck identification, tested with equipment breakdowns and process irregularities, outperforms baseline methods. For operational questions, it achieves near-perfect pass rates in pinpointing inefficiencies. For complex investigative questions, we demonstrate its superior diagnostic ability to uncover subtle, interconnected issues. This work bridges simulation modeling and AI (KG+LLM), offering a more intuitive method for actionable insights, reducing time-to-insight, and enabling automated warehouse inefficiency evaluation and diagnosis.
【7】HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery
Link: https://arxiv.org/abs/2507.17209
Authors: ang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li
Abstract: Modern scientific discovery faces growing challenges in integrating vast and heterogeneous knowledge critical to breakthroughs in biomedicine and drug development. Traditional hypothesis-driven research, though effective, is constrained by human cognitive limits, the complexity of biological systems, and the high cost of trial-and-error experimentation. Deep learning models, especially graph neural networks (GNNs), have accelerated prediction generation, but the sheer volume of outputs makes manual selection for validation unscalable. Large language models (LLMs) offer promise in filtering and hypothesis generation, yet suffer from hallucinations and lack grounding in structured knowledge, limiting their reliability. To address these issues, we propose HypoChainer, a collaborative visualization framework that integrates human expertise, LLM-driven reasoning, and knowledge graphs (KGs) to enhance hypothesis generation and validation. HypoChainer operates in three stages. First, exploration and contextualization: experts use retrieval-augmented LLMs (RAGs) and dimensionality reduction to navigate large-scale GNN predictions, assisted by interactive explanations. Second, hypothesis chain formation: experts iteratively examine KG relationships around predictions and semantically linked entities, refining hypotheses with LLM and KG suggestions. Third, validation prioritization: refined hypotheses are filtered based on KG-supported evidence to identify high-priority candidates for experimentation, with visual analytics further strengthening weak links in reasoning. We demonstrate HypoChainer's effectiveness through case studies in two domains and expert interviews, highlighting its potential to support interpretable, scalable, and knowledge-grounded scientific discovery.
【8】Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
Link: https://arxiv.org/abs/2507.17107
Authors: lashov
Comments: 16 pages, 6 figures
Abstract: Reinforcement learning (RL) is a key post-pretraining step for aligning large language models (LLMs) with complex tasks and human preferences. While it is often assumed that RL fine-tuning requires updating most of a model's parameters, we challenge this assumption with a surprising finding: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity. It arises naturally, without any sparsity constraints or parameter-efficient tuning, and appears across multiple RL algorithms (e.g., PPO, DPO, SimPO, PRIME) and model families (e.g., OpenAI, Meta, and open-source LLMs). Moreover, the subnetworks updated by RL show substantial overlap across different seeds, datasets, and algorithms, far exceeding chance, which suggests a partially transferable structure in the pretrained model. We show that fine-tuning only this sparse subnetwork recovers full model performance and yields parameters nearly identical to the fully fine-tuned model. Our analysis suggests this sparsity emerges because RL operates near the model's original distribution, requiring only targeted changes. KL penalties, gradient clipping, and on-policy dynamics have limited effect on the sparsity pattern. These findings shed new light on how RL adapts models: not by shifting all weights, but by focusing training on a small, consistently updated subnetwork. This insight enables more efficient RL methods and reframes sparsity through the lens of the lottery ticket hypothesis.
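The "5-30% of weights" figure is a statement about how many parameters differ between the model before and after RL. A minimal Python sketch of that measurement follows; the name `updated_fraction`, the dict-of-lists parameter layout, and the tolerance argument are assumptions for illustration and say nothing about how the paper itself computes the statistic.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def updated_fraction(before: Params, after: Params, tol: float = 0.0) -> float:
    """Fraction of individual weights whose value changed by more than tol."""
    changed = 0
    total = 0
    for name, old_values in before.items():
        new_values = after[name]
        if len(new_values) != len(old_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        for old, new in zip(old_values, new_values):
            total += 1
            if abs(new - old) > tol:
                changed += 1
    if total == 0:
        raise ValueError("no parameters to compare")
    return changed / total


# Toy model with 10 weights, 2 of which move during fine-tuning.
pretrained = {"layer1": [0.1, 0.2, 0.3, 0.4], "layer2": [1.0] * 6}
finetuned = {"layer1": [0.1, 0.25, 0.3, 0.4], "layer2": [1.0, 1.0, 0.9, 1.0, 1.0, 1.0]}

fraction = updated_fraction(pretrained, finetuned)
```

For the toy weights above, the fraction is 0.2, meaning 20% of the weights were updated and the remaining 80% are identical to the pretrained values.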
【9】Causal Graph Fuzzy LLMs: A First Introduction and Applications in Time Series Forecasting
Link: https://arxiv.org/abs/2507.17016
Authors: g, Patricia O. Lucas, Gabriel I. F. Paiva, Petronio C. L. Silva, Felipe Augusto Rocha da Silva, Adriano Alonso Veloso, Frederico Gadelha Guimaraes
Comments: Accepted for publication at the Brazilian Congress of Artificial Intelligence (CBIC)
Abstract: In recent years, the application of Large Language Models (LLMs) to time series forecasting (TSF) has garnered significant attention among researchers. This study presents a new LLM framework named CGF-LLM, which uses GPT-2 combined with fuzzy time series (FTS) and causal graphs to predict multivariate time series, marking the first such architecture in the literature. The key objective is to convert numerical time series into interpretable forms through the parallel application of fuzzification and causal analysis, enabling both semantic understanding and structural insight as input for the pretrained GPT-2 model. The resulting textual representation offers a more interpretable view of the complex dynamics underlying the original time series. The reported results confirm the effectiveness of our proposed LLM-based time series forecasting model, as demonstrated across four different multivariate time series datasets. This initiative opens promising future directions in the domain of TSF using LLMs based on FTS.
【10】SiLQ: Simple Large Language Model Quantization-Aware Training
Link: https://arxiv.org/abs/2507.16933
Authors: Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
Comments: 12 pages, 3 figures
Abstract: Large language models can be quantized to reduce inference time latency, model size, and energy consumption, thereby delivering a better user experience at lower cost. A challenge exists to deliver quantized models with minimal loss of accuracy in reasonable time, and in particular to do so without requiring mechanisms incompatible with specialized inference accelerators. Here, we demonstrate a simple, end-to-end quantization-aware training approach that, with an increase in total model training budget of less than 0.1%, outperforms the leading published quantization methods by large margins on several modern benchmarks, with both base and instruct model variants. The approach easily generalizes across different model architectures, can be applied to activations, cache, and weights, and requires the introduction of no additional operations to the model other than the quantization itself.
【11】Revisiting Pre-trained Language Models for Vulnerability Detection
Link: https://arxiv.org/abs/2507.16887
Authors: i, Weiliang Qi, Xuyu Wang, Fuxun Yu, Xinda Wang
Abstract: The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), their inadequate consideration of data preparation, evaluation setups, and experimental settings undermines the accuracy and comprehensiveness of evaluations. This paper introduces RevisitVD, an extensive evaluation of 17 PLMs spanning smaller code-specific PLMs and large-scale PLMs using newly constructed datasets. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness against code normalization, abstraction, and semantic-preserving transformations. Our findings reveal that, for VD tasks, PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible amount of labeling errors. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications.
【12】SynthCTI: LLM-Driven Synthetic CTI Generation to enhance MITRE Technique Mapping
Link: https://arxiv.org/abs/2507.16852
Authors: iz-Ródenas, Jaime Pujante Sáez, Daniel García-Algora, Mario Rodríguez Béjar, Jorge Blasco, José Luis Hernández-Ramos
Comments: 17 pages, 13 figures
Abstract: Cyber Threat Intelligence (CTI) mining involves extracting structured insights from unstructured threat data, enabling organizations to understand and respond to evolving adversarial behavior. A key task in CTI mining is mapping threat descriptions to MITRE ATT&CK techniques. However, this process is often performed manually, requiring expert knowledge and substantial effort. Automated approaches face two major challenges: the scarcity of high-quality labeled CTI data and class imbalance, where many techniques have very few examples. While domain-specific Large Language Models (LLMs) such as SecureBERT have shown improved performance, most recent work focuses on model architecture rather than addressing the data limitations. In this work, we present SynthCTI, a data augmentation framework designed to generate high-quality synthetic CTI sentences for underrepresented MITRE ATT&CK techniques. Our method uses a clustering-based strategy to extract semantic context from training data and guide an LLM in producing synthetic CTI sentences that are lexically diverse and semantically faithful. We evaluate SynthCTI on two publicly available CTI datasets, CTI-to-MITRE and TRAM, using LLMs with different capacities. Incorporating synthetic data leads to consistent macro-F1 improvements: for example, ALBERT improves from 0.35 to 0.52 (a relative gain of 48.6%), and SecureBERT reaches 0.6558 (up from 0.4412). Notably, smaller models augmented with SynthCTI outperform larger models trained without augmentation, demonstrating the value of data generation methods for building efficient and effective CTI classification systems.
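The reported numbers can be checked with a short Python sketch that reproduces the relative-gain arithmetic and shows why macro-F1 is sensitive to class imbalance. The helper names and the ten-sample toy dataset are assumptions for illustration; the two technique IDs are used only as example labels and are not taken from the paper's data.

```python
from typing import List


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Figures quoted in the abstract.
albert_gain = relative_gain(0.35, 0.52)
securebert_gain = relative_gain(0.4412, 0.6558)

# Toy imbalanced dataset: a classifier that always predicts the majority label.
y_true = ["T1059"] * 8 + ["T1003"] * 2
y_pred = ["T1059"] * 10
toy_macro_f1 = macro_f1(y_true, y_pred)
```

Both gains round to 48.6%. The toy classifier is 80% accurate yet scores a macro-F1 of only about 0.44, because the rare class contributes an F1 of zero.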
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (5 papers)
【1】Towards Effective Open-set Graph Class-incremental Learning
Link: https://arxiv.org/abs/2507.17687
Authors: hen, Zheng Ma, Sichao Fu, Mingbin Feng, Tony S. Wirjanto, Weihua Ou
Comments: Accepted by 33rd ACM International Conference on Multimedia (MM 2025)
Abstract: Graph class-incremental learning (GCIL) allows graph neural networks (GNNs) to adapt to evolving graph analytical tasks by incrementally learning new class knowledge while retaining knowledge of old classes. Existing GCIL methods primarily focus on a closed-set assumption, where all test samples are presumed to belong to previously known classes. Such an assumption restricts their applicability in real-world scenarios, where unknown classes naturally emerge during inference and are absent during training. In this paper, we explore a more challenging open-set graph class-incremental learning scenario with two intertwined challenges: catastrophic forgetting of old classes, which impairs the detection of unknown classes, and inadequate open-set recognition, which destabilizes the retention of learned knowledge. To address the above problems, a novel OGCIL framework is proposed, which utilizes pseudo-sample embedding generation to effectively mitigate catastrophic forgetting and enable robust detection of unknown classes. To be specific, a prototypical conditional variational autoencoder is designed to synthesize node embeddings for old classes, enabling knowledge replay without storing raw graph data. To handle unknown classes, we employ a mixing-based strategy to generate out-of-distribution (OOD) samples from pseudo in-distribution and current node embeddings. A novel prototypical hypersphere classification loss is further proposed, which anchors in-distribution embeddings to their respective class prototypes while repelling OOD embeddings. Instead of assigning all unknown samples into one cluster, our proposed objective function explicitly models them as outliers through prototype-aware rejection regions, ensuring robust open-set recognition. Extensive experiments on five benchmarks demonstrate the effectiveness of OGCIL over existing GCIL and open-set GNN methods.
【2】Generalized Low-Rank Matrix Contextual Bandits with Graph Information
Link: https://arxiv.org/abs/2507.17528
Authors: Jiannan Li, Yue Kang, Shanxing Gao, Zhenxin Xiao
Abstract: The matrix contextual bandit (CB), as an extension of the well-known multi-armed bandit, is a powerful framework that has been widely applied in sequential decision-making scenarios involving low-rank structure. In many real-world scenarios, such as online advertising and recommender systems, additional graph information often exists beyond the low-rank structure; that is, the similarity relationships among users/items can be naturally captured through the connectivity among nodes in the corresponding graphs. However, existing matrix CB methods fail to exploit such graph information, which makes it difficult for them to generate effective decision-making policies. To fill this void, we propose in this paper a novel matrix CB algorithmic framework that builds upon the classical upper confidence bound (UCB) framework. This new framework can effectively integrate both the low-rank structure and graph information in a unified manner. Specifically, it involves first solving a joint nuclear norm and matrix Laplacian regularization problem, followed by the implementation of a graph-based generalized linear version of the UCB algorithm. Rigorous theoretical analysis demonstrates that our procedure outperforms several popular alternatives in terms of cumulative regret bound, owing to the effective utilization of graph information. A series of synthetic and real-world data experiments are conducted to further illustrate the merits of our procedure.
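For readers unfamiliar with the upper confidence bound idea that the framework builds on, the Python sketch below shows the classical UCB1 rule for a plain multi-armed bandit. This is a swapped-in textbook baseline, not the paper's graph-based generalized linear algorithm, and the function names are hypothetical.

```python
import math
from typing import Sequence


def ucb1_index(mean_reward: float, pulls: int, total_pulls: int) -> float:
    """UCB1 index: empirical mean plus an exploration bonus that shrinks with pulls."""
    if pulls == 0:
        # An arm that was never tried is always explored first.
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(mean_rewards: Sequence[float], pull_counts: Sequence[int]) -> int:
    """Return the position of the arm with the largest UCB1 index."""
    total = sum(pull_counts)
    indices = [
        ucb1_index(mean, count, total)
        for mean, count in zip(mean_rewards, pull_counts)
    ]
    return max(range(len(indices)), key=indices.__getitem__)


# Arm 0 looks better on average, but arm 1 has barely been explored.
choice = select_arm([0.5, 0.4], [100, 2])
```

With these counts the exploration bonus of the rarely pulled arm outweighs its lower mean, so the second arm is chosen. Contextual variants replace the empirical mean with a model-based reward estimate and the bonus with a confidence width for that estimate.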
【3】DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning
Link: https://arxiv.org/abs/2507.17365
Authors: ao, Wenfeng Feng, Yuewei Zhang, Hao Wang
Comments: 10 pages, 2 figures
Abstract: Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, particularly in generating factually inconsistent intermediate queries and inefficient search trajectories, which can lead to reasoning deviations or redundant computations. To address these issues, we propose DynaSearcher, an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). Specifically, our system leverages knowledge graphs as external structured knowledge to guide the search process by explicitly modeling entity relationships, thereby ensuring factual consistency in intermediate queries and mitigating biases from irrelevant information. Furthermore, we employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. This framework promotes the generation of high-quality intermediate queries and comprehensive final answers, while discouraging unnecessary exploration and minimizing information omissions or redundancy. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets, matching frontier LLMs while using only small-scale models and limited computational resources. Furthermore, our approach demonstrates strong generalization and robustness across diverse retrieval environments and larger-scale models, highlighting its broad applicability.
【4】PyG 2.0: Scalable Learning on Real World Graphs
标题:PyG 2.0:真实世界图上的可扩展学习
链接:https://arxiv.org/abs/2507.16991
作者:Fey, Jinu Sunil, Akihiro Nitta, Rishi Puri, Manan Shah, Blaž Stojanovič, Ramona Bendias, Alexandria Barghi, Vid Kocijan, Zecheng Zhang, Xinwei He, Jan Eric Lenssen, Jure Leskovec
摘要:PyG(PyTorch Geometric)自首次发布以来已经取得了显著的发展,成为图神经网络的领先框架。在本文中,我们介绍了PyG 2.0(及其后续的次要版本),这是一次全面的更新,在可扩展性和实际应用能力方面有重大改进。我们详细介绍了该框架的增强架构,包括对异构图和时序图的支持、可扩展的特征/图存储以及各种优化,使研究人员和从业人员能够高效地解决大规模图学习问题。近年来,PyG一直在各种应用领域支持图学习,我们将对此进行总结,同时深入探讨关系深度学习和大型语言建模这两个重要领域。
摘要:PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework's enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. In recent years, PyG has supported graph learning in a large variety of application areas, which we summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.
【5】Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems
标题:预测准一维伊辛系统磁化强度的图神经网络方法
链接:https://arxiv.org/abs/2507.17509
作者:, O. Kryvchikov, D. Laptev
备注:18 pages, 4 figures
摘要:我们提出了一个基于图的深度学习框架,用于预测准一维伊辛自旋系统的磁性。晶格几何形状被编码为图,并由图神经网络(GNN)处理,然后是完全连接的层。该模型在Monte Carlo模拟数据上进行训练,并准确地再现了磁化曲线的关键特征,包括平台,临界转变点和几何挫折的影响。它捕获了局部图案和全局对称性,表明GNN可以直接从结构连接性推断磁性行为。所提出的方法能够有效地预测磁化强度,而不需要额外的蒙特卡罗模拟。
摘要:We present a graph-based deep learning framework for predicting the magnetic properties of quasi-one-dimensional Ising spin systems. The lattice geometry is encoded as a graph and processed by a graph neural network (GNN) followed by fully connected layers. The model is trained on Monte Carlo simulation data and accurately reproduces key features of the magnetization curve, including plateaus, critical transition points, and the effects of geometric frustration. It captures both local motifs and global symmetries, demonstrating that GNNs can infer magnetic behavior directly from structural connectivity. The proposed approach enables efficient prediction of magnetization without the need for additional Monte Carlo simulations.
Transformer(4篇)
【1】Vision Transformer attention alignment with human visual perception in aesthetic object evaluation
标题:审美对象评估中Vision Transformer注意力与人类视觉感知的一致性
链接:https://arxiv.org/abs/2507.17616
作者:rrasco, César González-Martín, José Aranda, Luis Oliveros
备注:25 pages, 15 figures
摘要:视觉注意机制在人类感知和审美评价中起着至关重要的作用。Vision Transformers(ViTs)的最新进展已经在计算机视觉任务中表现出了显着的能力,但它们与人类视觉注意力模式的一致性仍然没有得到充分的探索,特别是在美学背景下。本研究探讨了人类视觉注意力和ViT注意力机制之间的相关性时,评估手工制作的对象。我们对30名参与者(9名女性,21名男性,平均年龄24.6岁)进行了眼动追踪实验,他们观看了20个手工物品,包括编织袋和姜罐。使用Pupil Labs眼动仪,我们记录了凝视模式并生成了代表人类视觉注意力的热图。同时,我们使用预先训练好的ViT模型和DINO(Self-Distillation with NO Labels)分析了相同的对象,从12个注意力头部中提取注意力地图。我们使用Kullback-Leibler散度在不同的高斯参数(sigma=0.1至3.0)下比较了人类和ViT的注意力分布。统计分析揭示了在σ =2.4 +-0.03处的最佳相关性,其中注意力头部#12显示出与人类视觉模式的最强对准。在注意力头部之间发现了显著差异,其中头部#7和#9表现出与人类注意力的最大差异(p< 0.05,Tukey HSD检验)。结果表明,虽然ViTs表现出更多的全球性的注意模式相比,人类的焦点注意,某些注意头可以近似人类的视觉行为,特别是对于特定的对象功能,如编织物中的物品。这些发现表明了ViT注意力机制在产品设计和美学评估中的潜在应用,同时强调了人类感知和当前AI模型之间注意力策略的根本差异。
摘要:Visual attention mechanisms play a crucial role in human perception and aesthetic evaluation. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable capabilities in computer vision tasks, yet their alignment with human visual attention patterns remains underexplored, particularly in aesthetic contexts. This study investigates the correlation between human visual attention and ViT attention mechanisms when evaluating handcrafted objects. We conducted an eye-tracking experiment with 30 participants (9 female, 21 male, mean age 24.6 years) who viewed 20 artisanal objects comprising basketry bags and ginger jars. Using a Pupil Labs eye-tracker, we recorded gaze patterns and generated heat maps representing human visual attention. Simultaneously, we analyzed the same objects using a pre-trained ViT model with DINO (Self-DIstillation with NO Labels), extracting attention maps from each of the 12 attention heads. We compared human and ViT attention distributions using Kullback-Leibler divergence across varying Gaussian parameters ($\sigma = 0.1$ to $3.0$). Statistical analysis revealed optimal correlation at $\sigma = 2.4 \pm 0.03$, with attention head #12 showing the strongest alignment with human visual patterns. Significant differences were found between attention heads, with heads #7 and #9 demonstrating the greatest divergence from human attention ($p < 0.05$, Tukey HSD test). Results indicate that while ViTs exhibit more global attention patterns compared to human focal attention, certain attention heads can approximate human visual behavior, particularly for specific object features like buckles in basketry items. These findings suggest potential applications of ViT attention mechanisms in product design and aesthetic evaluation, while highlighting fundamental differences in attention strategies between human perception and current AI models.
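The comparison step in the abstract above (Kullback-Leibler divergence between a human gaze heat map and a ViT attention map) can be illustrated with a minimal sketch. The function names, the `eps` smoothing constant, and the toy 2x2 maps are assumptions made here for illustration; the Gaussian smoothing of gaze data and the authors' actual pipeline are not reproduced.

```python
import math


def normalize(heatmap):
    """Flatten a 2D map (list of rows) and rescale it to sum to 1."""
    flat = [value for row in heatmap for value in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("heat map must contain positive mass")
    return [value / total for value in flat]


def kl_divergence(p_map, q_map, eps=1e-12):
    """KL(P || Q) between two heat maps of the same shape.

    eps (an assumed smoothing constant) keeps the logarithm finite
    when Q has empty cells; cells where P is zero contribute nothing.
    """
    p = normalize(p_map)
    q = normalize(q_map)
    if len(p) != len(q):
        raise ValueError("heat maps must have the same number of cells")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy example: a focal "human" map versus a uniform "ViT head" map.
human_map = [[0.0, 1.0], [3.0, 0.0]]
uniform_map = [[1.0, 1.0], [1.0, 1.0]]
divergence = kl_divergence(human_map, uniform_map)
```

A lower value means the attention head's map is closer to the human map; identical maps give zero. KL divergence is asymmetric, so the order of the two arguments matters.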
【2】DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
标题:DNT:一种可以用动量SGD训练的深度归一化Transformer
链接:https://arxiv.org/abs/2507.17501
作者:Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin
备注:We have introduced a novel architecture, Deeply Normalized Transformer (DNT), which enables efficient training with vanilla momentum SGDW (mSGDW), achieving performance on par with AdamW-optimized Transformers
摘要:Transformers已经成为现代深度学习的事实上的支柱,但它们的训练通常需要像AdamW这样具有自适应学习速率的高级优化器,而不是动量SGDW(mSGDW)。以前的工作表明,这主要是由于重尾分布的梯度。在本文中,我们介绍了一种深度规范化的Transformer(DNT),它经过精心设计,可以克服这一限制,从而实现与vanilla mSGDW的无缝训练,同时获得与通过AdamW训练的Transformers相当的性能。具体而言,在DNT中,我们在Transformers的适当位置战略性地集成归一化技术,以有效地调制每层的雅可比矩阵,平衡权重,激活及其相互作用的影响,从而使梯度分布集中。我们提供了在我们的DNT中使用的归一化技术的理论依据,以及对两种流行的Transformer架构的广泛经验评估,以验证:a)DNT优于其同行(即ViT和GPT),以及b)DNT可以有效地使用vanilla mSGDW进行训练。
摘要:Transformers have become the de facto backbone of modern deep learning, yet their training typically demands an advanced optimizer with an adaptive learning rate, such as AdamW, rather than momentum SGDW (mSGDW). Previous works show that this is mainly due to a heavy-tailed distribution of the gradients. In this paper, we introduce a Deeply Normalized Transformer (DNT), which is meticulously engineered to overcome this limitation, enabling seamless training with vanilla mSGDW while yielding performance comparable to Transformers trained via AdamW. To be specific, in DNT, we strategically integrate normalization techniques at proper positions in the Transformers to effectively modulate the Jacobian matrices of each layer, balance the influence of weights, activations, and their interactions, and thus concentrate the distributions of gradients. We provide both theoretical justifications of the normalization technique used in our DNT and extensive empirical evaluation on two popular Transformer architectures to validate that: a) DNT outperforms its counterparts (i.e., ViT and GPT), and b) DNT can be effectively trained with vanilla mSGDW.
【3】Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography
标题:Mammo-Mamba:一种用于多视图乳腺X射线摄影的具有顺序混合专家的混合状态空间和Transformer结构
链接:https://arxiv.org/abs/2507.17662
作者:Bayatmakou, Reza Taleei, Nicole Simone, Arash Mohammadi
摘要:乳腺癌(BC)仍然是女性癌症相关死亡率的主要原因之一,尽管计算机辅助诊断(CAD)系统最近取得了进展。准确有效地解读多视图乳腺X线照片对于早期检测至关重要,这推动了人们对人工智能(AI)驱动的CAD模型的兴趣激增。虽然最先进的多视图乳房X线摄影分类模型主要基于Transformer架构,但其计算复杂度与图像块的数量成二次方,突出了对更有效的替代方案的需求。为了应对这一挑战,我们提出了Mammo-Mamba,这是一个新的框架,它将选择性状态空间模型(SSM),基于transformer的注意力和专家驱动的特征细化集成到一个统一的架构中。Mammo-Mamba通过其定制的SecMamba块引入专家顺序混合(SeqMoE)机制,扩展了MambaVision主干。SecMamba是一种经过修改的MambaVision块,通过启用内容自适应特征细化,增强了高分辨率乳腺摄影图像中的表示学习。这些模块被集成到MambaVision的更深层次中,使模型能够通过动态专家选通逐步调整特征重点,有效缓解传统Transformer模型的局限性。在CBIS-DDSM基准数据集上进行评估,Mammo-Mamba在所有关键指标上都实现了卓越的分类性能,同时保持了计算效率。
摘要:Breast cancer (BC) remains one of the leading causes of cancer-related mortality among women, despite recent advances in Computer-Aided Diagnosis (CAD) systems. Accurate and efficient interpretation of multi-view mammograms is essential for early detection, driving a surge of interest in Artificial Intelligence (AI)-powered CAD models. While state-of-the-art multi-view mammogram classification models are largely based on Transformer architectures, their computational complexity scales quadratically with the number of image patches, highlighting the need for more efficient alternatives. To address this challenge, we propose Mammo-Mamba, a novel framework that integrates Selective State-Space Models (SSMs), transformer-based attention, and expert-driven feature refinement into a unified architecture. Mammo-Mamba extends the MambaVision backbone by introducing the Sequential Mixture of Experts (SeqMoE) mechanism through its customized SecMamba block. The SecMamba is a modified MambaVision block that enhances representation learning in high-resolution mammographic images by enabling content-adaptive feature refinement. These blocks are integrated into the deeper stages of MambaVision, allowing the model to progressively adjust feature emphasis through dynamic expert gating, effectively mitigating the limitations of traditional Transformer models. Evaluated on the CBIS-DDSM benchmark dataset, Mammo-Mamba achieves superior classification performance across all key metrics while maintaining computational efficiency.
【4】Learning from Scratch: Structurally-masked Transformer for Next Generation Lib-free Simulation
标题:从零开始学习:面向下一代免库(Lib-free)仿真的结构掩码Transformer
链接:https://arxiv.org/abs/2507.17396
作者:uang, Hao Chen, Zhong Guan
摘要:本文提出了一个神经框架的功率和时序预测的多级数据路径,区别于传统的基于库的分析方法依赖于驱动程序的特性和负载简化。据我们所知,这是第一个明确为标准单元设计的基于语言的、网络列表感知的神经网络。我们的方法采用两个预先训练的波形预测和延迟估计神经模型,直接从SPICE网表推断瞬态波形和传播延迟,条件是关键的物理参数,如负载电容,输入压摆和栅极大小。这种方法准确地捕捉固有的和耦合引起的延迟效应,而不需要简化或插值。对于多阶段的时序预测,我们实现了一个递归的传播策略,从每个阶段的预测波形馈入到后续阶段,累积捕获整个逻辑链的延迟。这种方法可确保在整个复杂信号通路中实现精确的时序对准和完整的波形可见性。波形预测利用具有网表感知节点级编码的混合CNN-变压器架构,解决传统Transformers的固定输入维度约束。此外,专门的子网络分别处理主要延迟估计和串扰校正。实验结果表明,SPICE级精度,在不同的工业电路中始终达到RMSE低于0.0098。所提出的框架提供了一个可扩展的,结构适应性强的神经替代传统的电源和定时引擎,表现出高保真的物理电路行为。
摘要:This paper proposes a neural framework for power and timing prediction of multi-stage data path, distinguishing itself from traditional lib-based analytical methods dependent on driver characterization and load simplifications. To the best of our knowledge, this is the first language-based, netlist-aware neural network designed explicitly for standard cells. Our approach employs two pre-trained neural models of waveform prediction and delay estimation that directly infer transient waveforms and propagation delays from SPICE netlists, conditioned on critical physical parameters such as load capacitance, input slew, and gate size. This method accurately captures both intrinsic and coupling-induced delay effects without requiring simplification or interpolation. For multi-stage timing prediction, we implement a recursive propagation strategy where predicted waveforms from each stage feed into subsequent stages, cumulatively capturing delays across the logic chain. This approach ensures precise timing alignment and complete waveform visibility throughout complex signal pathways. The waveform prediction utilizes a hybrid CNN-Transformer architecture with netlist-aware node-level encoding, addressing traditional Transformers' fixed input dimensionality constraints. Additionally, specialized subnetworks separately handle primary delay estimation and crosstalk correction. Experimental results demonstrate SPICE-level accuracy, consistently achieving RMSE below 0.0098 across diverse industrial circuits. The proposed framework provides a scalable, structurally adaptable neural alternative to conventional power and timing engines, demonstrating high fidelity to physical circuit behaviors.
GAN|对抗|攻击|生成相关(7篇)
【1】On the Interaction of Compressibility and Adversarial Robustness
标题:论可压缩性与对抗鲁棒性的相互作用
链接:https://arxiv.org/abs/2507.17725
作者:sbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal
摘要:现代神经网络被期望同时满足许多期望的特性:对训练数据的准确拟合,对未知输入的泛化,参数和计算效率,以及对对抗性扰动的鲁棒性。虽然可压缩性和鲁棒性都得到了广泛的研究,但对其相互作用的统一理解仍然难以捉摸。在这项工作中,我们开发了一个原则性的框架来分析不同形式的可压缩性-例如神经元级稀疏性和频谱可压缩性-如何影响对抗鲁棒性。我们表明,这些形式的压缩可以诱导少量的高度敏感的方向在表示空间中,对手可以利用构建有效的扰动。我们的分析产生了一个简单而富有启发性的鲁棒性界,揭示了神经元和频谱压缩性如何通过它们对学习表示的影响来影响$L_\infty$和$L_2$鲁棒性。至关重要的是,我们发现的漏洞与压缩是如何实现的无关-无论是通过正则化,架构偏差还是隐式学习动态。通过对合成和现实任务的实证评估,我们证实了我们的理论预测,并进一步证明了这些漏洞在对抗训练和迁移学习下持续存在,并有助于普遍对抗扰动的出现。我们的研究结果表明,结构化的压缩性和鲁棒性之间的根本紧张关系,并提出了新的途径,设计模型,既有效又安全。
摘要:Modern neural networks are expected to simultaneously satisfy a host of desirable properties: accurate fitting to training data, generalization to unseen inputs, parameter and computational efficiency, and robustness to adversarial perturbations. While compressibility and robustness have each been studied extensively, a unified understanding of their interaction still remains elusive. In this work, we develop a principled framework to analyze how different forms of compressibility - such as neuron-level sparsity and spectral compressibility - affect adversarial robustness. We show that these forms of compression can induce a small number of highly sensitive directions in the representation space, which adversaries can exploit to construct effective perturbations. Our analysis yields a simple yet instructive robustness bound, revealing how neuron and spectral compressibility impact $L_\infty$ and $L_2$ robustness via their effects on the learned representations. Crucially, the vulnerabilities we identify arise irrespective of how compression is achieved - whether via regularization, architectural bias, or implicit learning dynamics. Through empirical evaluations across synthetic and realistic tasks, we confirm our theoretical predictions, and further demonstrate that these vulnerabilities persist under adversarial training and transfer learning, and contribute to the emergence of universal adversarial perturbations. Our findings show a fundamental tension between structured compressibility and robustness, and suggest new pathways for designing models that are both efficient and secure.
【2】Generalized Dual Discriminator GANs
标题:广义双鉴别器GANs
链接:https://arxiv.org/abs/2507.17684
作者: Naga Chandana, Tejas Srivastava, Gowtham R. Kurri, V. Lalitha
备注:8 pages, 2 figures, extended version of a paper accepted for presentation at ITW 2025
摘要:双鉴别器生成对抗网络(D2 GANs)的提出是为了缓解生成对抗网络中的模式崩溃问题。在D2 GANs中,生成器与两个鉴别器配合使用:一个鉴别器给来自真实数据分布的样本打高分,另一个鉴别器则偏向来自生成器的样本。在这项工作中,我们首先引入双鉴别器$\alpha$-GANs(D2 $\alpha$-GANs),它将双鉴别器的优势与可调损失函数$\alpha$-loss的灵活性相结合。我们进一步将这种方法推广到定义在正实数上的任意函数,从而得到一类更广泛的模型,我们称之为广义双鉴别器生成对抗网络。对于每一个所提出的模型,我们给出了理论分析,并表明相应的min-max优化可归结为最小化$f$-散度与反向$f$-散度的线性组合。这推广了D2-GANs的已知简化结果,即其目标归结为KL散度与反向KL散度的线性组合。最后,我们在2D合成数据上进行实验,并使用多个性能指标来体现我们的GANs的各种优势。
摘要:Dual discriminator generative adversarial networks (D2 GANs) were introduced to mitigate the problem of mode collapse in generative adversarial networks. In D2 GANs, two discriminators are employed alongside a generator: one discriminator rewards high scores for samples from the true data distribution, while the other favors samples from the generator. In this work, we first introduce dual discriminator $\alpha$-GANs (D2 $\alpha$-GANs), which combines the strengths of dual discriminators with the flexibility of a tunable loss function, $\alpha$-loss. We further generalize this approach to arbitrary functions defined on positive reals, leading to a broader class of models we refer to as generalized dual discriminator generative adversarial networks. For each of these proposed models, we provide theoretical analysis and show that the associated min-max optimization reduces to the minimization of a linear combination of an $f$-divergence and a reverse $f$-divergence. This generalizes the known simplification for D2-GANs, where the objective reduces to a linear combination of the KL-divergence and the reverse KL-divergence. Finally, we perform experiments on 2D synthetic data and use multiple performance metrics to capture various advantages of our GANs.
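For reference, the known D2GAN simplification mentioned at the end of the abstract above can be written out as follows. This is a sketch based on the standard D2GAN formulation rather than on this paper's text; the notation ($\alpha, \beta > 0$ as weights, $D_1, D_2$ as positive-valued discriminators, $p_{\mathrm{data}}$ and $p_G$ as densities) is assumed here, and the generalized $f$-divergence result of the paper is not reproduced.

```latex
% D2GAN min-max objective (assumed notation)
\min_{G}\max_{D_1, D_2} J(G, D_1, D_2)
  = \alpha\,\mathbb{E}_{x \sim P_{\mathrm{data}}}\!\left[\log D_1(x)\right]
  - \mathbb{E}_{x \sim P_G}\!\left[D_1(x)\right]
  - \mathbb{E}_{x \sim P_{\mathrm{data}}}\!\left[D_2(x)\right]
  + \beta\,\mathbb{E}_{x \sim P_G}\!\left[\log D_2(x)\right]

% Optimal discriminators for a fixed generator
D_1^{*}(x) = \frac{\alpha\, p_{\mathrm{data}}(x)}{p_G(x)}, \qquad
D_2^{*}(x) = \frac{\beta\, p_G(x)}{p_{\mathrm{data}}(x)}

% Substituting D_1^*, D_2^* leaves a weighted sum of KL and reverse KL
J(G, D_1^{*}, D_2^{*})
  = \alpha(\log\alpha - 1) + \beta(\log\beta - 1)
  + \alpha\, D_{\mathrm{KL}}\!\left(P_{\mathrm{data}} \,\|\, P_G\right)
  + \beta\, D_{\mathrm{KL}}\!\left(P_G \,\|\, P_{\mathrm{data}}\right)
```

Since the first two terms are constants, minimizing over $G$ amounts to minimizing the weighted combination of the two KL terms, which is the special case that the abstract says is generalized to an $f$-divergence and a reverse $f$-divergence.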
【3】Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
标题:利用基于迁移的先验提升硬标签攻击的射线搜索过程
链接:https://arxiv.org/abs/2507.17577
作者:Xinjie Xu, Shuyu Cheng, Qi Xuan
备注:Published at ICLR 2025 (Spotlight paper)
摘要:最实用和最具挑战性的黑盒对抗攻击类型之一是硬标签攻击,其中只有前1个预测标签可用。一种有效的方法是从良性图像中搜索最佳射线方向,使到敌对区域的$\ell_p$-范数距离最小化。这种方法的独特优势在于它将硬标签攻击转化为连续优化问题。目标函数值是射线的半径,这可以通过二分搜索以较高的查询成本获得。现有方法在梯度估计中使用“符号技巧”来减少查询的数量。在本文中,我们从理论上分析了这种梯度估计的质量,并提出了一种新的事先指导的方法,以提高射线搜索效率的理论和经验。具体来说,我们利用基于转移的先验从代理模型,我们的梯度估计适当地整合它们近似的投影到子空间的真实梯度跨越这些先验和随机方向,在查询效率的方式。我们从理论上推导出所获得的梯度估计和真实梯度之间的期望余弦相似性,并证明了通过合并先验所实现的改进。在ImageNet和CIFAR-10数据集上进行的大量实验表明,我们的方法在查询效率方面明显优于11种最先进的方法。
摘要:One of the most practical and challenging types of black-box adversarial attacks is the hard-label attack, where only the top-1 predicted label is available. One effective approach is to search for the optimal ray direction from the benign image that minimizes the $\ell_p$-norm distance to the adversarial region. The unique advantage of this approach is that it transforms the hard-label attack into a continuous optimization problem. The objective function value is the ray's radius, which can be obtained via binary search at a high query cost. Existing methods use a "sign trick" in gradient estimation to reduce the number of queries. In this paper, we theoretically analyze the quality of this gradient estimation and propose a novel prior-guided approach to improve ray search efficiency both theoretically and empirically. Specifically, we utilize the transfer-based priors from surrogate models, and our gradient estimators appropriately integrate them by approximating the projection of the true gradient onto the subspace spanned by these priors and random directions, in a query-efficient manner. We theoretically derive the expected cosine similarities between the obtained gradient estimators and the true gradient, and demonstrate the improvement achieved by incorporating priors. Extensive experiments on the ImageNet and CIFAR-10 datasets show that our approach significantly outperforms 11 state-of-the-art methods in terms of query efficiency.
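The objective evaluation described in the abstract above (the ray's radius, obtained via binary search with a hard-label oracle) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: `ray_radius`, `r_max`, `tol`, and the linear toy classifier are hypothetical, the distance is Euclidean, and the decision boundary is assumed to be crossed only once along the ray. The prior-guided gradient estimation proposed in the paper is not shown.

```python
import math


def ray_radius(is_adversarial, x, direction, r_max=10.0, tol=1e-4):
    """Binary-search the distance from x to the adversarial region along a ray.

    is_adversarial: hard-label oracle, returns True if the point is misclassified.
    Returns (radius, queries); radius is math.inf if the point at r_max
    is still classified correctly.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    queries = 0

    def query(r):
        nonlocal queries
        queries += 1  # every oracle call costs one model query
        return is_adversarial([xi + r * ui for xi, ui in zip(x, unit)])

    if not query(r_max):
        return math.inf, queries

    low, high = 0.0, r_max  # invariant: low is benign, high is adversarial
    while high - low > tol:
        mid = (low + high) / 2
        if query(mid):
            high = mid
        else:
            low = mid
    return high, queries


# Toy oracle: the label flips once x[0] + x[1] exceeds 1.
def toy_oracle(point):
    return point[0] + point[1] > 1.0


radius, used = ray_radius(toy_oracle, [0.0, 0.0], [1.0, 1.0])
```

Each evaluation of the objective costs roughly log2(r_max / tol) queries, which is why the abstract describes binary search as expensive and why reducing the number of such evaluations matters.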
【4】Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning
标题:Ctx2TrajGen:使用生成对抗模仿学习的交通上下文感知微尺度车辆轨迹
链接:https://arxiv.org/abs/2507.17418
作者:n, Seokjun Hong, Gyeongseon Baek, Yeeun Kim, Byeongjoon Noh
摘要:微观车辆轨迹的精确建模对于交通行为分析和自动驾驶系统至关重要。我们提出了Ctx2TrajGen,一个上下文感知的轨迹生成框架,使用GAIL合成逼真的城市驾驶行为。借助PPO和WGAN-GP,我们的模型解决了微观场景中固有的非线性相互依赖和训练不稳定问题。通过显式地以周围车辆和道路几何形状为条件,Ctx2TrajGen生成与真实世界环境一致的交互感知轨迹。在无人机采集的DRIFT数据集上的实验表明,该方法在真实性、行为多样性和上下文保真度方面优于现有方法,无需仿真即可为数据稀缺和域偏移问题提供稳健的解决方案。
摘要:Precise modeling of microscopic vehicle trajectories is critical for traffic behavior analysis and autonomous driving systems. We propose Ctx2TrajGen, a context-aware trajectory generation framework that synthesizes realistic urban driving behaviors using GAIL. Leveraging PPO and WGAN-GP, our model addresses nonlinear interdependencies and training instability inherent in microscopic settings. By explicitly conditioning on surrounding vehicles and road geometry, Ctx2TrajGen generates interaction-aware trajectories aligned with real-world context. Experiments on the drone-captured DRIFT dataset demonstrate superior performance over existing methods in terms of realism, behavioral diversity, and contextual fidelity, offering a robust solution to data scarcity and domain shift without simulation.
【5】Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation
标题:上下文中的风险:对基础模型在合成表格数据生成中的隐私泄露进行基准测试
链接:https://arxiv.org/abs/2507.17066
作者:un, Xiaofeng Lin, Joshua Ward, Guang Cheng
备注:Accepted by Agentic & GenAI Evaluation KDD2025, poster presentation
摘要:合成表格数据对于机器学习工作流至关重要,特别是用于扩充小型或不平衡的数据集以及实现隐私保护的数据共享。然而,最先进的生成模型(GANs、VAEs、扩散模型)依赖于包含数千个样本的大型数据集。低数据场景往往正是使用合成数据的主要动机,而在这种场景中,这些模型可能会过拟合、泄露敏感记录,并需要频繁重新训练。最近的工作使用大型预训练Transformers通过上下文学习(ICL)生成行,这只需要少量种子样本,无需参数更新,避免了重新训练。但ICL会逐字重复种子行,带来一种此前仅在文本领域研究过的新隐私风险。在表格合成中,单独一行就可能识别出一个人,这种风险的严重程度仍不清楚。我们通过首个基准来填补这一空白:在来自健康、金融和政策领域的35个真实世界表格上,将三个基础模型(GPT-4o-mini、LLaMA 3.3 70B、TabPFN v2)与四个基线进行比较。我们评估统计保真度、下游效用和成员推理泄露。结果显示,基础模型始终具有最高的隐私风险。LLaMA 3.3 70B在1% FPR下的真阳性率比最安全的基线最多高出54个百分点。GPT-4o-mini和TabPFN也非常脆弱。我们绘制了隐私-效用边界,并表明CTGAN和GPT-4o-mini提供了更好的权衡。一项析因研究发现,三个零成本的提示调整(小批量、低温度和使用汇总统计量)可以将最坏情况下的AUC降低14个点,将稀有类泄露最多降低39个点,同时保持90%以上的保真度。我们的基准为使用基础模型进行更安全的低数据合成提供了实用指南。
摘要:Synthetic tabular data is essential for machine learning workflows, especially for expanding small or imbalanced datasets and enabling privacy-preserving data sharing. However, state-of-the-art generative models (GANs, VAEs, diffusion models) rely on large datasets with thousands of examples. In low-data settings, often the primary motivation for synthetic data, these models can overfit, leak sensitive records, and require frequent retraining. Recent work uses large pre-trained transformers to generate rows via in-context learning (ICL), which needs only a few seed examples and no parameter updates, avoiding retraining. But ICL repeats seed rows verbatim, introducing a new privacy risk that has only been studied in text. The severity of this risk in tabular synthesis, where a single row may identify a person, remains unclear. We address this gap with the first benchmark of three foundation models (GPT-4o-mini, LLaMA 3.3 70B, TabPFN v2) against four baselines on 35 real-world tables from health, finance, and policy. We evaluate statistical fidelity, downstream utility, and membership inference leakage. Results show foundation models consistently have the highest privacy risk. LLaMA 3.3 70B reaches up to 54 percentage points higher true-positive rate at 1% FPR than the safest baseline. GPT-4o-mini and TabPFN are also highly vulnerable. We plot the privacy-utility frontier and show that CTGAN and GPT-4o-mini offer better tradeoffs. A factorial study finds that three zero-cost prompt tweaks (small batch size, low temperature, and using summary statistics) can reduce worst-case AUC by 14 points and rare-class leakage by up to 39 points while maintaining over 90% fidelity. Our benchmark offers a practical guide for safer low-data synthesis with foundation models.
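The leakage metric quoted in the abstract above, true-positive rate at 1% FPR, can be computed from membership-inference attack scores roughly as in the sketch below. The function name, the score convention (higher means "more likely a training member"), and the toy scores are assumptions for illustration; the benchmark's actual attack and evaluation code is not reproduced.

```python
import math


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """TPR of the attack 'flag if score > threshold' at a bounded FPR.

    The threshold is set so that at most target_fpr of the non-member
    scores are flagged (ties at the threshold are not flagged).
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of false positives we may tolerate; the small offset guards
    # against floating-point results such as 0.999999 instead of 1.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    true_positives = sum(score > threshold for score in member_scores)
    return true_positives / len(member_scores)


# Toy scores: 100 non-members spread over [0, 0.99], four training members.
nonmembers = [i / 100 for i in range(100)]
members = [0.995, 0.985, 0.5, 0.2]
rate = tpr_at_fpr(members, nonmembers, target_fpr=0.01)
```

Reporting TPR at a low fixed FPR emphasizes the records an attacker can identify with high confidence, which an average-case metric such as AUC can hide.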
【6】Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation
标题:偏见应该始终被消除吗?使用数据偏差进行OOD生成的原则框架
链接:https://arxiv.org/abs/2507.17001
作者:uangyi Chen, Yunlong Deng, Zijian Li, Zeyu Tang, Anpeng Wu, Kun Zhang
摘要:大多数现有的方法适应模型的分布(OOD)域依赖于不变表示学习,以消除偏见的影响。然而,偏见是否应该永远被消除,如果不是,什么时候应该保留偏见,如何利用偏见?为了解决这些问题,我们首先提出了一个理论分析,探讨了条件下,偏见的功能可以识别和有效利用。建立在这个理论基础上,我们引入了一个新的框架,战略性地利用偏见,以补充不变表示在推理过程中。该框架包括两个关键组成部分,以直接和间接的方式利用偏见:(1)使用不变性作为指导,从偏见中提取预测成分,(2)利用已识别的偏见来估计环境条件,然后用它来探索适当的偏见意识预测,以减轻环境差距。我们验证我们的方法,通过实验合成数据集和标准域泛化基准。结果一致表明,我们的方法优于现有的方法,强调其鲁棒性和适应性。
摘要:Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting identified bias to estimate the environmental condition and then use it to explore appropriate bias-aware predictors to alleviate environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.
【7】Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach
标题:通过增长界矩阵方法弥合NLP中针对词替换攻击的鲁棒性与泛化性
链接:https://arxiv.org/abs/2507.10330
作者:Bouri, Adnane Saoud
备注:Accepted to ACL Findings 2025
摘要:尽管自然语言处理(NLP)取得了进步,但模型仍然容易受到对抗性攻击,例如同义词替换。虽然之前的工作集中在提高前馈和卷积架构的鲁棒性,但递归网络和现代状态空间模型(SSM)(如S4)的鲁棒性仍然研究不足。这些架构由于其顺序处理和复杂的参数动态特性而带来了独特的挑战。在本文中,我们介绍了一种新的正则化技术的基础上增长界矩阵(GBM),以提高NLP模型的鲁棒性,减少输入扰动对模型输出的影响。我们专注于计算三种架构的GBM:长短期记忆(LSTM),状态空间模型(S4)和卷积神经网络(CNN)。我们的方法旨在(1)增强对单词替换攻击的恢复能力,(2)提高对干净文本的泛化能力,以及(3)首次系统地分析SSM(S4)的鲁棒性。跨多个架构和基准数据集的广泛实验表明,我们的方法比现有基线提高了对抗鲁棒性高达8.8%。这些结果突出了我们方法的有效性,在对抗性防御中优于几种最先进的方法。代码可在https://github.com/BouriMohammed/GBM上获得
摘要:Despite advancements in Natural Language Processing (NLP), models remain vulnerable to adversarial attacks, such as synonym substitutions. While prior work has focused on improving robustness for feed-forward and convolutional architectures, the robustness of recurrent networks and modern state space models (SSMs), such as S4, remains understudied. These architectures pose unique challenges due to their sequential processing and complex parameter dynamics. In this paper, we introduce a novel regularization technique based on Growth Bound Matrices (GBM) to improve NLP model robustness by reducing the impact of input perturbations on model outputs. We focus on computing the GBM for three architectures: Long Short-Term Memory (LSTM), State Space models (S4), and Convolutional Neural Networks (CNN). Our method aims to (1) enhance resilience against word substitution attacks, (2) improve generalization on clean text, and (3) provide the first systematic analysis of SSM (S4) robustness. Extensive experiments across multiple architectures and benchmark datasets demonstrate that our method improves adversarial robustness by up to 8.8% over existing baselines. These results highlight the effectiveness of our approach, outperforming several state-of-the-art methods in adversarial defense. Code is available at https://github.com/BouriMohammed/GBM
半/弱/无/有监督|不确定性|主动学习(4篇)
【1】PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training
标题:PICore:基于物理信息的无监督核心集选择,以实现数据高效的神经算子训练
链接:https://arxiv.org/abs/2507.17151
作者:atheesh, Anant Khandelwal, Mucong Ding, Radu Balan
备注:Submitted to TMLR 2025
摘要:神经算子通过学习函数空间之间的映射,为求解无法解析求解的偏微分方程(PDE)提供了一个强大的范式。然而,训练神经算子存在两个主要瓶颈:它们需要大量的训练数据来学习这些映射,并且这些数据需要带标签,而标签只能通过数值求解器的昂贵仿真获得。为了同时缓解这两个问题,我们提出了PICore,这是一个无监督的核心集选择框架,可以在无需访问真实PDE解的情况下识别信息量最大的训练样本。PICore利用物理信息损失,根据未标记输入对算子学习的潜在贡献来选择它们。在选出紧凑的输入子集后,仅对这些样本使用数值求解器进行仿真以生成标签,从而降低标注成本。然后,我们在缩减后的标记数据集上训练神经算子,从而也显著减少训练时间。在四个不同的PDE基准和多种核心集选择策略上,相对于有监督的核心集选择方法,PICore的训练效率平均提高最多78%,而准确性变化很小。我们在https://github.com/Asatheesh6561/PICore上提供代码。
摘要:Neural operators offer a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically by learning mappings between function spaces. However, there are two main bottlenecks in training neural operators: they require a significant amount of training data to learn these mappings, and this data needs to be labeled, which can only be accessed via expensive simulations with numerical solvers. To alleviate both of these issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After selecting a compact subset of inputs, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, significantly decreasing training time as well. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to 78% average increase in training efficiency relative to supervised coreset selection methods with minimal changes in accuracy. We provide code at https://github.com/Asatheesh6561/PICore.
【2】BiLO: Bilevel Local Operator Learning for PDE Inverse Problems. Part II: Efficient Uncertainty Quantification with Low-Rank Adaptation
标题:BiLO:用于PDE反问题的双层局部算子学习。第二部分:基于低秩适配的高效不确定性量化
链接:https://arxiv.org/abs/2507.17019
作者: Zhang, Christopher E. Miles, Xiaohui Xie, John S. Lowengrub
摘要:偏微分方程(PDE)的不确定性量化和反问题是广泛的科学和工程应用的核心。在这个由两部分组成的系列的第二部分中,我们将第1部分中开发的用于PDE约束优化问题的双层局部算子学习(BiLO)扩展到贝叶斯推理框架。在较低的层次上,我们训练一个网络,通过最小化相对于神经网络的权重的局部算子损失来近似局部解算子。在上一级,我们从后验分布中采样PDE参数。我们通过基于梯度的马尔可夫链蒙特卡罗(MCMC)方法和低秩自适应(LoRA)实现了有效的采样。与现有的基于贝叶斯神经网络的方法相比,我们的方法绕过了在神经网络权重的高维空间中采样的挑战,并且不需要指定神经网络解决方案的先验分布。相反,不确定性通过PDE约束从数据中自然传播。通过强偏微分方程约束,该方法提高了参数推断和不确定性量化的准确性。我们分析了动态误差的梯度在MCMC采样器和静态误差的后验分布由于不精确的最小化的低级别的问题,并证明了解决低级别的问题和所产生的不确定性量化的准确性之间的直接联系的公差。通过在各种偏微分方程模型的数值实验,我们证明了我们的方法提供了准确的推理和量化的不确定性,同时保持高的计算效率。
摘要:Uncertainty quantification and inverse problems governed by partial differential equations (PDEs) are central to a wide range of scientific and engineering applications. In this second part of a two part series, we extend Bilevel Local Operator Learning (BiLO) for PDE-constrained optimization problems developed in Part 1 to the Bayesian inference framework. At the lower level, we train a network to approximate the local solution operator by minimizing the local operator loss with respect to the weights of the neural network. At the upper level, we sample the PDE parameters from the posterior distribution. We achieve efficient sampling through gradient-based Markov Chain Monte Carlo (MCMC) methods and low-rank adaptation (LoRA). Compared with existing methods based on Bayesian neural networks, our approach bypasses the challenge of sampling in the high-dimensional space of neural network weights and does not require specifying a prior distribution on the neural network solution. Instead, uncertainty propagates naturally from the data through the PDE constraints. By enforcing strong PDE constraints, the proposed method improves the accuracy of both parameter inference and uncertainty quantification. We analyze the dynamic error of the gradient in the MCMC sampler and the static error in the posterior distribution due to inexact minimization of the lower level problem and demonstrate a direct link between the tolerance for solving the lower level problem and the accuracy of the resulting uncertainty quantification. Through numerical experiments across a variety of PDE models, we demonstrate that our method delivers accurate inference and quantification of uncertainties while maintaining high computational efficiency.
【3】Clustering-based hard negative sampling for supervised contrastive speaker verification
标题:基于聚类的硬负采样用于有监督对比说话人验证
链接:https://arxiv.org/abs/2507.17540
作者:ztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk
备注:Accepted to INTERSPEECH 2025
摘要:在说话人验证中,对比学习作为传统的基于分类的方法的替代方案越来越受欢迎。对比方法可以从有效使用硬负样本对中受益,硬负样本对是指来自不同类别、但因彼此相似而对验证模型特别具有挑战性的样本。在本文中,我们提出了CHNS,一种基于聚类的硬负采样方法,专用于有监督的对比说话人表示学习。我们的方法对相似说话人的嵌入进行聚类,并调整批次组成,以便在计算对比损失时获得硬负样本与易负样本的最佳比例。实验评估表明,在VoxCeleb数据集上使用两种轻量级模型架构时,CHNS在相对EER和minDCF上比基线有监督对比方法(无论是否采用基于损失的硬负采样)以及最先进的基于分类的说话人验证方法高出多达18%。
摘要:In speaker verification, contrastive learning is gaining popularity as an alternative to the traditionally used classification-based approaches. Contrastive methods can benefit from an effective use of hard negative pairs, which are different-class samples particularly challenging for a verification model due to their similarity. In this paper, we propose CHNS, a clustering-based hard negative sampling method dedicated to supervised contrastive speaker representation learning. Our approach clusters embeddings of similar speakers, and adjusts batch composition to obtain an optimal ratio of hard and easy negatives during contrastive loss calculation. Experimental evaluation shows that CHNS outperforms a baseline supervised contrastive approach with and without loss-based hard negative sampling, as well as a state-of-the-art classification-based approach to speaker verification, by as much as 18% relative EER and minDCF on the VoxCeleb dataset using two lightweight model architectures.
【4】Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning
Link: https://arxiv.org/abs/2507.16845
Authors: ua, In-Ho Rab, Ravi Sankarc
Abstract: Lung diseases, including lung cancer and COPD, are significant health concerns globally. Traditional diagnostic methods can be costly, time-consuming, and invasive. This study investigates the use of semi-supervised learning methods for lung sound signal detection using a model combination of MFCC+CNN. By introducing semi-supervised learning modules such as MixMatch, Co-Refinement, and Co-Refurbishing, we aim to enhance the detection performance while reducing dependence on manual annotations. With the add-on semi-supervised modules, the accuracy rate of the MFCC+CNN model is 92.9%, an increase of 3.8% over the baseline model. The research contributes to the field of lung disease sound detection by addressing challenges such as individual differences and insufficient labeled data.
Transfer | Zero/Few/One-Shot | Adaptation (4 papers)
【1】SADA: Stability-guided Adaptive Diffusion Acceleration
Link: https://arxiv.org/abs/2507.17135
Authors: g, Yixiao Wang, Hancheng Ye, Zishan Shao, Jingwei Sun, Jingyang Zhang, Zekai Chen, Jianyi Zhang, Yiran Chen, Hai Li
Comments: Accepted and published by ICML 2025. Code is available at: this https URL
Abstract: Diffusion models have achieved remarkable success in generative tasks but suffer from high computational costs due to their iterative sampling process and quadratic attention costs. Existing training-free acceleration strategies that reduce per-step computation cost, while effectively reducing sampling time, demonstrate low faithfulness compared to the original baseline. We hypothesize that this fidelity gap arises because (a) different prompts correspond to varying denoising trajectories, and (b) such methods do not consider the underlying ODE formulation and its numerical solution. In this paper, we propose Stability-guided Adaptive Diffusion Acceleration (SADA), a novel paradigm that unifies step-wise and token-wise sparsity decisions via a single stability criterion to accelerate sampling of ODE-based generative models (Diffusion and Flow-matching). For (a), SADA adaptively allocates sparsity based on the sampling trajectory. For (b), SADA introduces principled approximation schemes that leverage the precise gradient information from the numerical ODE solver. Comprehensive evaluations on SD-2, SDXL, and Flux using both EDM and DPM++ solvers reveal consistent $\ge 1.8\times$ speedups with minimal fidelity degradation (LPIPS $\leq 0.10$ and FID $\leq 4.5$) compared to unmodified baselines, significantly outperforming prior methods. Moreover, SADA adapts seamlessly to other pipelines and modalities: It accelerates ControlNet without any modifications and speeds up MusicLDM by $1.8\times$ with $\sim 0.01$ spectrogram LPIPS.
【2】Robust Five-Class and binary Diabetic Retinopathy Classification Using Transfer Learning and Data Augmentation
Link: https://arxiv.org/abs/2507.17121
Authors: med, Mohammad Alfrad Nobel Bhuiyan
Comments: 9 pages, 1 Figure
Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited training data. We evaluate a range of pretrained convolutional neural network architectures, including variants of ResNet and EfficientNet, on the APTOS 2019 dataset. For binary classification, our proposed model achieves a state-of-the-art accuracy of 98.9%, with a precision of 98.6%, recall of 99.3%, F1-score of 98.9%, and an AUC of 99.4%. In the more challenging five-class severity classification task, our model obtains a competitive accuracy of 84.6% and an AUC of 94.1%, outperforming several existing approaches. Our findings also demonstrate that EfficientNet-B0 and ResNet34 offer optimal trade-offs between accuracy and computational efficiency across both tasks. These results underscore the effectiveness of combining class-balanced augmentation with transfer learning for high-performance DR diagnosis. The proposed framework provides a scalable and accurate solution for DR screening, with potential for deployment in real-world clinical environments.
【3】Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals
Link: https://arxiv.org/abs/2507.16983
Authors: Jones, Grange M. Simpson, Patrick M. Pilarski, Ashley N. Dalrymple
Comments: 5 pages, 3 figures, accepted at the 6th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM2025), June 11-14, 2025
Abstract: Rehabilitation technology is a natural setting to study the shared learning and decision-making of human and machine agents. In this work, we explore the use of Hierarchical Reinforcement Learning (HRL) to develop adaptive control strategies for lower-limb exoskeletons, aiming to enhance mobility and autonomy for individuals with motor impairments. Inspired by prominent models of biological sensorimotor processing, our investigated HRL approach breaks down the complex task of exoskeleton control adaptation into a higher-level framework for terrain strategy adaptation and a lower-level framework for providing predictive information; this latter element is implemented via the continual learning of general value functions (GVFs). GVFs generated temporal abstractions of future signal values from multiple wearable lower-limb sensors, including electromyography, pressure insoles, and goniometers. We investigated two methods for incorporating actual and predicted sensor signals into a policy network with the intent to improve the decision-making capacity of the control system of a lower-limb exoskeleton during ambulation across varied terrains. As a key result, we found that the addition of predictions made from GVFs increased overall network accuracy. Terrain-specific performance increases were seen while walking on even ground, uneven ground, up and down ramps, and turns, terrains that are often misclassified without predictive information. This suggests that predictive information can aid decision-making during uncertainty, e.g., on terrains that have a high chance of being misclassified. This work, therefore, contributes new insights into the nuances of HRL and the future development of exoskeletons to facilitate safe transitioning and traversing across different walking environments.
【4】Scalable DC Optimization via Adaptive Frank-Wolfe Algorithms
Link: https://arxiv.org/abs/2507.17545
Authors: Pokutta
Abstract: We consider the problem of minimizing a difference of (smooth) convex functions over a compact convex feasible region $P$, i.e., $\min_{x \in P} f(x) - g(x)$, with smooth $f$ and Lipschitz continuous $g$. This computational study builds upon and complements the framework of Maskan et al. [2025] by integrating advanced Frank-Wolfe variants to reduce computational overhead. We empirically show that constrained DC problems can be efficiently solved using a combination of the Blended Pairwise Conditional Gradients (BPCG) algorithm [Tsuji et al., 2022] with warm-starting and the adaptive error bound from Maskan et al. [2025]. The result is a highly efficient and scalable projection-free algorithm for constrained DC optimization.
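To make the problem setup concrete, here is a minimal sketch of a projection-free DC scheme: the outer loop linearizes $g$ at the current iterate, and the inner loop solves the resulting convex subproblem with plain Frank-Wolfe. This is not the paper's method: it uses vanilla Frank-Wolfe rather than BPCG, a fixed inner iteration count rather than the adaptive error bound, and no real warm-starting (the first step size of 1 discards the starting point). The feasible region (the probability simplex), the choices $f(x) = \tfrac12\|x-a\|^2$ and $g(x) = \|x-c\|_2$, and all function names are assumptions chosen for illustration.

```python
import math

def dc_objective(x, a, c):
    """f(x) - g(x) with f(x) = 0.5*||x - a||^2 and g(x) = ||x - c||_2 (illustrative choices)."""
    f = 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    g = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
    return f - g

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex: the best vertex."""
    i = min(range(len(grad)), key=lambda j: grad[j])
    v = [0.0] * len(grad)
    v[i] = 1.0
    return v

def frank_wolfe(grad_fn, x, steps):
    """Vanilla Frank-Wolfe with the standard 2/(t+2) step size (no projections needed)."""
    for t in range(steps):
        v = lmo_simplex(grad_fn(x))
        gamma = 2.0 / (t + 2.0)
        x = [(1.0 - gamma) * xi + gamma * vi for xi, vi in zip(x, v)]
    return x

def dc_frank_wolfe(a, c, outer=20, inner=200):
    """Outer DC loop: replace g by its linearization, then minimize f(y) - <s, y> over the simplex."""
    n = len(a)
    x = [1.0 / n] * n
    for _ in range(outer):
        dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        # Subgradient of g at x (zero vector is a valid subgradient when x == c).
        s = [(xi - ci) / dist if dist > 0 else 0.0 for xi, ci in zip(x, c)]
        # Gradient of the convex surrogate f(y) - <s, y>.
        grad_fn = lambda y, s=s: [yi - ai - si for yi, ai, si in zip(y, a, s)]
        x = frank_wolfe(grad_fn, x, inner)
    return x
```

Calling `dc_frank_wolfe([0.7, 0.2, 0.1], [0.0, 0.0, 1.0])` returns a point on the simplex whose objective value is lower than that of the uniform starting point; only linear minimization over the simplex is ever required, which is the sense in which such schemes are projection-free.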
Reinforcement Learning (8 papers)
【1】Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Link: https://arxiv.org/abs/2507.17746
Authors: njal, Anthony Wang, Elaine Lau, Vaskar Nath, Bing Liu, Sean Hendryx
Abstract: Extending Reinforcement Learning with Verifiable Rewards (RLVR) to real-world tasks often requires balancing objective and subjective evaluation criteria. However, many such tasks lack a single, unambiguous ground truth, making it difficult to define reliable reward signals for post-training language models. While traditional preference-based methods offer a workaround, they rely on opaque reward functions that are difficult to interpret and prone to spurious correlations. We introduce $\textbf{Rubrics as Rewards}$ (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a $28\%$ relative improvement on HealthBench-1k compared to simple Likert-based approaches, while matching or surpassing the performance of reward signals derived from expert-written references. By treating rubrics as structured reward signals, we show that RaR enables smaller-scale judge models to better align with human preferences and sustain robust performance across model scales.
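One simple way to turn a checklist-style rubric into a scalar reward is a weighted fraction of satisfied items. The sketch below illustrates that idea only; the abstract does not specify how RaR aggregates rubric items, so the weighting scheme and the function name `rubric_reward` are assumptions, and the per-item pass/fail judgments (which a judge model would supply) are passed in directly.

```python
def rubric_reward(items):
    """Weighted fraction of satisfied rubric items.

    `items` is a list of (weight, satisfied) pairs; this aggregation rule is
    an illustrative assumption, not the paper's definition.
    """
    total = sum(weight for weight, _ in items)
    if total == 0:
        return 0.0
    earned = sum(weight for weight, satisfied in items if satisfied)
    return earned / total
```

For example, `rubric_reward([(2, True), (1, False), (1, True)])` gives `0.75`, and each item's contribution to that value can be read off directly, which is the interpretability argument the abstract makes against opaque reward functions.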
【2】How Should We Meta-Learn Reinforcement Learning Algorithms?
Link: https://arxiv.org/abs/2507.17668
Authors: David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
Comments: Accepted paper at Reinforcement Learning Conference (RLC) 2025
Abstract: The process of meta-learning algorithms from data, instead of relying on manual design, is growing in popularity as a paradigm for improving the performance of machine learning systems. Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until now there has been a severe lack of comparison between different meta-learning algorithms, such as using evolution to optimise over black-box functions or LLMs to propose code. In this paper, we carry out this empirical comparison of the different approaches when applied to a range of meta-learned algorithms which target different parts of the RL pipeline. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time for each meta-learning algorithm. Based on these findings, we propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
【3】Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
Link: https://arxiv.org/abs/2507.17512
Authors: uoshi Pan, Honglin Lin, Mengyuan Sun, Conghui He, Lijun Wu
Comments: 27 pages, 24 figures
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real-world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. We conduct a comprehensive study comprising four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, our study thoroughly evaluates the models' in-domain improvements and cross-domain generalization capabilities when trained on single-domain datasets. (2) Additionally, we examine the intricate interactions including mutual enhancements and conflicts that emerge during combined cross-domain training. (3) To further understand the influence of SFT on RL, we also analyze and compare performance differences between base and instruct models under identical RL configurations. (4) Furthermore, we delve into critical RL training details, systematically exploring the impacts of curriculum learning strategies, variations in reward design, and language-specific factors. Through extensive experiments, our results offer significant insights into the dynamics governing domain interactions, revealing key factors influencing both specialized and generalizable reasoning performance. These findings provide valuable guidance for optimizing RL methodologies to foster comprehensive, multi-domain reasoning capabilities in LLMs.
【4】Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning
Link: https://arxiv.org/abs/2507.17275
Authors: , Cheng-Yu Kuo, Yuki Kadokawa, Takamitsu Matsubara
Comments: Under review
Abstract: In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool's lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner's Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to guide policy learning toward lifespan-guided behavior. To handle the fact that RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01x in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool use strategies.
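Miner's Rule, mentioned in the abstract above, is the standard linear damage-accumulation rule: damage is $D = \sum_i n_i / N_i$, where $n_i$ is the number of cycles applied at stress level $i$ and $N_i$ is the number of cycles to failure at that level, with failure predicted at $D = 1$. The sketch below shows only this rule. Expressing remaining useful life as the fraction $1 - D$ is an assumption made for illustration, as are the function names; how the paper derives stress from FEA and feeds RUL into the reward is not reproduced here.

```python
def miner_damage(cycle_counts, cycles_to_failure):
    """Cumulative fatigue damage D = sum(n_i / N_i) under Miner's Rule."""
    return sum(n / N for n, N in zip(cycle_counts, cycles_to_failure))

def remaining_life_fraction(cycle_counts, cycles_to_failure):
    """Illustrative RUL proxy: the share of life left, 1 - D, floored at zero."""
    return max(0.0, 1.0 - miner_damage(cycle_counts, cycles_to_failure))
```

With 100 cycles at a level that fails after 1000 cycles and 50 cycles at a level that fails after 500, the accumulated damage is 0.2 and the remaining fraction is 0.8.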
【5】Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach
Link: https://arxiv.org/abs/2507.17070
Authors: ohan, Dominik Rößle, Daniel Cremers, Torsten Schön
Comments: 6 pages, 4 figures, 2 tables
Abstract: Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated its applicability across various domains, including robotics, healthcare, energy optimization, and autonomous driving. However, a critical question remains: How robust are DRL models when exposed to adversarial attacks? While existing defense mechanisms such as adversarial training and distillation enhance the resilience of DRL models, there remains a significant research gap regarding the integration of multiple defenses in autonomous driving scenarios specifically. This paper addresses this gap by proposing a novel ensemble-based defense architecture to mitigate adversarial attacks in autonomous driving. Our evaluation demonstrates that the proposed architecture significantly enhances the robustness of DRL models. Compared to the baseline under FGSM attacks, our ensemble method improves the mean reward from 5.87 to 18.38 (over 213% increase) and reduces the mean collision rate from 0.50 to 0.09 (an 82% decrease) in the highway scenario and merge scenario, outperforming all standalone defense strategies.
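As a quick arithmetic check, the percentages quoted in the abstract above follow from the standard relative-change formula $(\text{new} - \text{old})/\text{old}$ applied to the reported means:

$$\frac{18.38 - 5.87}{5.87} = \frac{12.51}{5.87} \approx 2.131 \;(\approx 213\%), \qquad \frac{0.50 - 0.09}{0.50} = \frac{0.41}{0.50} = 0.82 \;(82\%).$$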
【6】Shared Control of Holonomic Wheelchairs through Reinforcement Learning
Link: https://arxiv.org/abs/2507.17055
Authors: hler, Diego Paez-Granados, Jorge Peña-Queralta
Abstract: Smart electric wheelchairs can improve user experience by supporting the driver with shared control. State-of-the-art work showed the potential of shared control in improving safety in navigation for non-holonomic robots. However, for holonomic systems, current approaches often lead to unintuitive behavior for the user and fail to utilize the full potential of omnidirectional driving. Therefore, we propose a reinforcement learning-based method, which takes a 2D user input and outputs a 3D motion while ensuring user comfort and reducing cognitive load on the driver. Our approach is trained in Isaac Gym and tested in simulation in Gazebo. We compare different RL agent architectures and reward functions based on metrics considering cognitive load and user comfort. We show that our method ensures collision-free navigation while smartly orienting the wheelchair and showing better or competitive smoothness compared to a previous non-learning-based method. We further perform a sim-to-real transfer and demonstrate, to the best of our knowledge, the first real-world implementation of RL-based shared control for an omnidirectional mobility platform.
【7】Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization
Link: https://arxiv.org/abs/2507.16867
Authors: o, Wei Zhang, Cheng Xiang, Hongyang Du, Dusit Niyato, Shuhua Gao
Comments: 10 pages, 5 figures
Abstract: This paper introduces DiffCarl, a diffusion-modeled carbon- and risk-aware reinforcement learning algorithm for intelligent operation of multi-microgrid systems. With the growing integration of renewables and increasing system complexity, microgrid communities face significant challenges in real-time energy scheduling and optimization under uncertainty. DiffCarl integrates a diffusion model into a deep reinforcement learning (DRL) framework to enable adaptive energy scheduling under uncertainty and explicitly account for carbon emissions and operational risk. By learning action distributions through a denoising generation process, DiffCarl enhances DRL policy expressiveness and enables carbon- and risk-aware scheduling in dynamic and uncertain microgrid environments. Extensive experimental studies demonstrate that it outperforms classic algorithms and state-of-the-art DRL solutions, with 2.3-30.1% lower operational cost. It also achieves 28.7% lower carbon emissions than those of its carbon-unaware variant and reduces performance variability. These results highlight DiffCarl as a practical and forward-looking solution. Its flexible design allows efficient adaptation to different system configurations and objectives to support real-world deployment in evolving energy systems.
【8】Reinforcement Learning in hyperbolic space for multi-step reasoning
Link: https://arxiv.org/abs/2507.16864
Authors: ung-Yang Lee, Momiao Xiong
Comments: 53 pages, 5 figures
Abstract: Multi-step reasoning is a fundamental challenge in artificial intelligence, with applications ranging from mathematical problem-solving to decision-making in dynamic environments. Reinforcement Learning (RL) has shown promise in enabling agents to perform multi-step reasoning by optimizing long-term rewards. However, conventional RL methods struggle with complex reasoning tasks due to issues such as credit assignment, high-dimensional state representations, and stability concerns. Recent advancements in Transformer architectures and hyperbolic geometry have provided novel solutions to these challenges. This paper introduces a new framework that integrates hyperbolic Transformers into RL for multi-step reasoning. The proposed approach leverages hyperbolic embeddings to model hierarchical structures effectively. We present theoretical insights, algorithmic details, and experimental results that include FrontierMath and nonlinear optimal control problems. Compared to RL with a vanilla Transformer, hyperbolic RL improves accuracy by 32%-44% on the FrontierMath benchmark and by 43%-45% on the nonlinear optimal control benchmark, while reducing computational time by 16%-32% on the FrontierMath benchmark and by 16%-17% on the nonlinear optimal control benchmark. Our work demonstrates the potential of hyperbolic Transformers in reinforcement learning, particularly for multi-step reasoning tasks that involve hierarchical structures.
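For readers unfamiliar with hyperbolic embeddings, the sketch below computes the geodesic distance in the Poincaré ball model with curvature $-1$, $d(u, v) = \operatorname{arcosh}\!\left(1 + \frac{2\|u - v\|^2}{(1 - \|u\|^2)(1 - \|v\|^2)}\right)$. The abstract does not say which model of hyperbolic space the paper uses, so the Poincaré ball is an assumption here, as is the function name. Distances grow rapidly near the boundary of the ball, which is why such embeddings are often used for tree-like, hierarchical data.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball (curvature -1)."""
    sq_norm = lambda w: sum(x * x for x in w)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)
```

From the origin to a point at Euclidean radius $r$ this reduces to $\ln\frac{1+r}{1-r}$, so a point at radius 0.5 lies at hyperbolic distance $\ln 3$ from the origin.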
Medical (6 papers)
【1】Model Compression Engine for Wearable Devices Skin Cancer Diagnosis
Link: https://arxiv.org/abs/2507.17125
Authors: Delgado-López, Andrea P. Seda-Hernandez, Juan D. Guadalupe-Rosado, Luis E. Fernandez Ramirez, Miguel Giboyeaux-Camilo, Wilfredo E. Lugo-Beauchamp
Abstract: Skin cancer is one of the most prevalent and preventable types of cancer, yet its early detection remains a challenge, particularly in resource-limited settings where access to specialized healthcare is scarce. This study proposes an AI-driven diagnostic tool optimized for embedded systems to address this gap. Using transfer learning with the MobileNetV2 architecture, the model was adapted for binary classification of skin lesions into "Skin Cancer" and "Other." The TensorRT framework was employed to compress and optimize the model for deployment on the NVIDIA Jetson Orin Nano, balancing performance with energy efficiency. Comprehensive evaluations were conducted across multiple benchmarks, including model size, inference speed, throughput, and power consumption. The optimized models maintained their performance, achieving an F1-Score of 87.18% with a precision of 93.18% and recall of 81.91%. Post-compression results showed reductions in model size of up to 0.41, along with improvements in inference speed and throughput, and a decrease in energy consumption of up to 0.93 in INT8 precision. These findings validate the feasibility of deploying high-performing, energy-efficient diagnostic tools on resource-constrained edge devices. Beyond skin cancer detection, the methodologies applied in this research have broader applications in other medical diagnostics and domains requiring accessible, efficient AI solutions. This study underscores the potential of optimized AI systems to revolutionize healthcare diagnostics, thereby bridging the divide between advanced technology and underserved regions.
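The reported F1-Score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean, $F_1 = 2PR/(P + R)$. A minimal sketch (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Plugging in the abstract's precision of 0.9318 and recall of 0.8191 gives approximately 0.8718, consistent with the stated 87.18%.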
【2】AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation
Link: https://arxiv.org/abs/2507.16940
Authors: i, Amar Kumar, Tal Arbel
Comments: 9 pages, 3 figures, International Conference on Medical Image Computing and Computer-Assisted Intervention
Abstract: Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
【3】Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images
Link: https://arxiv.org/abs/2507.17726
Authors: gi, Suryakant Mishra, Prasad P Iyer, Tzu-Ming Lu, Ezra Bussmann, Sergei Tretiak, Andrew Crandall Jones, Jian-Xin Zhu
Abstract: Increasingly large datasets of microscopic images with atomic resolution facilitate the development of machine learning methods to identify and analyze subtle physical phenomena embedded within the images. In this work, microscopic images of honeycomb lattice spin-ice samples serve as datasets from which we automate the calculation of net magnetic moments and directional orientations of spin-ice configurations. In the first stage of our workflow, machine learning models are trained to accurately predict magnetic moments and directions within spin-ice structures. Variational Autoencoders (VAEs), an emergent unsupervised deep learning technique, are employed to generate high-quality synthetic magnetic force microscopy (MFM) images and extract latent feature representations, thereby reducing experimental and segmentation errors. The second stage of the proposed methodology enables precise identification and prediction of frustrated vertices and nanomagnetic segments, effectively correlating structural and functional aspects of microscopic images. This facilitates the design of optimized spin-ice configurations with controlled frustration patterns, enabling potential on-demand synthesis.
【4】Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review
标题:基于机器学习的多模式预后模型集成病理图像和高通量组学数据,用于癌症总体生存预测:系统性综述
链接:https://arxiv.org/abs/2507.16876
作者: Jennings (1, 2), Andrew Broad (1), Lucy Godson (1), Emily Clarke (1, 2), David Westhead (2), Darren Treanor (1, 2, 3) ((1) National Pathology Imaging Cooperative, Leeds Teaching Hospitals NHS Trust, Leeds, UK (2) University of Leeds, Leeds, UK (3) Linköping University, Linköping, Sweden)
备注:Main article (50 pages, inc 3 tables, 4 figures). Supplementary material included with additional methodological information and data
摘要:整合组织病理学和分子数据的多模态机器学习显示了癌症诊断的希望。我们系统地回顾了结合全切片图像(WSIs)和高通量组学来预测总生存率的研究。EMBASE、PubMed和Cochrane CENTRAL的检索(2024年12月8日),加上引文筛选,确定了合格的研究。使用CHARMS进行数据提取;使用PROBAST+AI评估偏倚;按照SWiM和PRISMA 2020进行合成。方案:PROSPERO(CRD 42024594745)。 19种癌症类型的48项研究(均自2017年以来)符合标准;所有研究均使用癌症基因组图谱。方法包括正则化Cox回归(n=4),经典ML(n=13)和深度学习(n=31)。报告的c指数范围为0.550-0.857;多模态模型通常优于单峰模型。然而,所有研究均显示不明确/高偏倚,外部验证有限,并且很少关注临床实用性。 多模式WSI-omics生存预测是一个快速发展的领域,具有很好的结果,但需要改进方法的严谨性,更广泛的数据集和临床评估。 由NPIC,英国利兹教学医院NHS信托基金资助(项目104687),由UKRI工业战略挑战基金支持。
摘要:Multimodal machine learning integrating histopathology and molecular data shows promise for cancer prognostication. We systematically reviewed studies combining whole slide images (WSIs) and high-throughput omics to predict overall survival. Searches of EMBASE, PubMed, and Cochrane CENTRAL (12/08/2024), plus citation screening, identified eligible studies. Data extraction used CHARMS; bias was assessed with PROBAST+AI; synthesis followed SWiM and PRISMA 2020. Protocol: PROSPERO (CRD42024594745). Forty-eight studies (all since 2017) across 19 cancer types met criteria; all used The Cancer Genome Atlas. Approaches included regularised Cox regression (n=4), classical ML (n=13), and deep learning (n=31). Reported c-indices ranged 0.550-0.857; multimodal models typically outperformed unimodal ones. However, all studies showed unclear/high bias, limited external validation, and little focus on clinical utility. Multimodal WSI-omics survival prediction is a fast-growing field with promising results but needs improved methodological rigor, broader datasets, and clinical evaluation. Funded by NPIC, Leeds Teaching Hospitals NHS Trust, UK (Project 104687), supported by UKRI Industrial Strategy Challenge Fund.
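The c-index values quoted in this abstract (0.550-0.857) measure how often a model ranks pairs of patients in the correct survival order. The sketch below is a simplified Harrell-style concordance index for illustration only; the function name and the handling of ties and censoring are assumptions, not the procedure used by the reviewed studies.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the subject was censored
    risks  -- model risk scores (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject is censored
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range in context.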
【5】From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease
标题:从黑匣子到生物标志物:用于解释帕金森病语音模型的稀疏自动编码器
链接:https://arxiv.org/abs/2507.16836
作者:ntinga, Jen-Kai Chen, Roozbeh Sattari, Mirco Ravanelli, Denise Klein
备注:14 pages, 5 figures, submitted to NeurIPS 2025
摘要:语音有望成为帕金森病(PD)等神经系统疾病的成本效益和非侵入性生物标志物。虽然在原始音频上训练的深度学习系统可以找到手工制作的功能无法获得的微妙信号,但它们的黑盒性质阻碍了临床应用。为了解决这个问题,我们应用稀疏自动编码器(SAE)来揭示基于语音的PD检测系统的可解释的内部表示。我们引入了一种新的基于掩码的激活,用于使SAE适应小型生物医学数据集,创建稀疏的解纠缠字典表示。这些字典条目被发现有很强的关联与PD语音中的特征发音缺陷,如减少频谱通量和增加频谱平坦度的低能量区域突出的模型注意。我们进一步表明,光谱通量与MRI扫描的壳核体积测量相关,证明SAE揭示疾病监测和诊断的临床相关生物标志物的潜力。
摘要:Speech holds promise as a cost-effective and non-invasive biomarker for neurological conditions such as Parkinson's disease (PD). While deep learning systems trained on raw audio can find subtle signals not available from hand-crafted features, their black-box nature hinders clinical adoption. To address this, we apply sparse autoencoders (SAEs) to uncover interpretable internal representations from a speech-based PD detection system. We introduce a novel mask-based activation for adapting SAEs to small biomedical datasets, creating sparse disentangled dictionary representations. These dictionary entries are found to have strong associations with characteristic articulatory deficits in PD speech, such as reduced spectral flux and increased spectral flatness in the low-energy regions highlighted by the model attention. We further show that the spectral flux is related to volumetric measurements of the putamen from MRI scans, demonstrating the potential of SAEs to reveal clinically relevant biomarkers for disease monitoring and diagnosis.
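The two acoustic descriptors named in this abstract have standard textbook definitions, sketched below on plain Python lists. The function names are hypothetical, and the paper's own feature extraction (framing, windowing, normalization) is not specified here.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Close to 1 for noise-like (flat) spectra, close to 0 for tonal spectra.
    """
    eps = 1e-12  # avoids log(0) on empty bins
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)

def spectral_flux(prev_mag, cur_mag):
    """Euclidean distance between the magnitude spectra of consecutive frames."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_mag, cur_mag)))
```

Reduced flux means the spectrum changes less from frame to frame, which is consistent with the articulatory deficits the abstract describes.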
【6】Does Language Matter for Early Detection of Parkinson's Disease from Speech?
标题:语言对于通过言语早期检测帕金森病很重要吗?
链接:https://arxiv.org/abs/2507.16832
作者:ntinga, Briac Cordelle, Dominique Louër, Mirco Ravanelli, Denise Klein
备注:Accepted to IEEE Workshop on Machine Learning for Signal Processing (MLSP) 2025
摘要:使用语音样本作为生物标志物是检测和监测帕金森病(PD)进展的一种有前途的途径,但在文献中关于如何最好地收集和分析这些数据存在相当大的分歧。从言语中检测PD的早期研究使用持续元音发声(SVP)任务,而最近的一些研究探索了对认知要求更高的任务的记录。为了评估语言在PD检测中的作用,我们测试了具有不同数据类型和预训练目标的预训练模型,发现(1)纯文本模型与语音特征模型的性能相匹配,(2)多语言Whisper优于自监督模型,而单语Whisper表现更差,(3)AudioSet预训练提高了SVP的性能,但没有提高自发语音的性能。这些发现共同强调了语言在帕金森病早期检测中的关键作用。
摘要:Using speech samples as a biomarker is a promising avenue for detecting and monitoring the progression of Parkinson's disease (PD), but there is considerable disagreement in the literature about how best to collect and analyze such data. Early research in detecting PD from speech used a sustained vowel phonation (SVP) task, while some recent research has explored recordings of more cognitively demanding tasks. To assess the role of language in PD detection, we tested pretrained models with varying data types and pretraining objectives and found that (1) text-only models match the performance of vocal-feature models, (2) multilingual Whisper outperforms self-supervised models whereas monolingual Whisper does worse, and (3) AudioSet pretraining improves performance on SVP but not spontaneous speech. These findings together highlight the critical role of language for the early detection of Parkinson's disease.
Distillation | Knowledge Extraction (2 papers)
【1】Dataset Distillation as Data Compression: A Rate-Utility Perspective
标题:数据集蒸馏作为数据压缩:速率-效用的角度
链接:https://arxiv.org/abs/2507.17221
作者:ao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma
备注:Accepted by ICCV 2025
摘要:在“规模就是一切”范式的驱动下,现代机器学习越来越需要更大的数据集和模型,从而产生了令人望而却步的计算和存储要求。数据集蒸馏通过将原始数据集压缩为一小组合成样本来缓解这一问题,同时保留其全部实用性。然而,现有的方法要么在固定的存储预算下最大化性能,要么追求合适的合成数据表示以去除冗余,而不联合优化这两个目标。在这项工作中,我们提出了一个联合率效用优化方法的数据集蒸馏。我们将合成样本参数化为可优化的潜在代码,由极其轻量级的网络解码。我们估计量化的潜在的香农熵的速率措施和堵塞任何现有的蒸馏损失的效用措施,通过拉格朗日乘子来权衡。为了实现公平的跨方法比较,我们引入了每类位数(bpc),这是一个精确的存储度量,它考虑了样本、标签和解码器参数的成本。在CIFAR-10,CIFAR-100和ImageNet-128上,我们的方法在相当的精度下比标准蒸馏实现了高达170倍的压缩。在不同的bpc预算,蒸馏损失和骨干架构,我们的方法始终建立更好的费率效用权衡。
摘要:Driven by the "scale-is-everything" paradigm, modern machine learning increasingly demands ever-larger datasets and models, yielding prohibitive computational and storage requirements. Dataset distillation mitigates this by compressing an original dataset into a small set of synthetic samples, while preserving its full utility. Yet, existing methods either maximize performance under fixed storage budgets or pursue suitable synthetic data representations for redundancy removal, without jointly optimizing both objectives. In this work, we propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug in any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier. To enable fair, cross-method comparisons, we introduce bits per class (bpc), a precise storage metric that accounts for sample, label, and decoder parameter costs. On CIFAR-10, CIFAR-100, and ImageNet-128, our method achieves up to 170× greater compression than standard distillation at comparable accuracy. Across diverse bpc budgets, distillation losses, and backbone architectures, our approach consistently establishes better rate-utility trade-offs.
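The rate-utility trade-off in this abstract can be illustrated with a toy objective: a utility loss plus a Lagrange multiplier times a rate term. The sketch below uses an empirical histogram entropy of uniformly quantized latents as the rate. That estimator, the quantization step, and all function names are assumptions made for illustration; the paper's actual entropy model may differ.

```python
import math
from collections import Counter

def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def rate_utility_objective(latents, distill_loss, lam, step=1.0):
    """Utility loss plus lam times the estimated bit cost of the latents.

    latents      -- flat list of latent values (hypothetical layout)
    distill_loss -- value of any existing distillation loss (utility term)
    lam          -- Lagrange multiplier trading rate against utility
    step         -- uniform quantization step (assumed)
    """
    quantized = [round(z / step) for z in latents]
    rate_bits = empirical_entropy_bits(quantized) * len(quantized)
    return distill_loss + lam * rate_bits
```

Raising `lam` favors smaller storage at the expense of the utility term, which is the trade-off the bpc metric is meant to expose.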
【2】Sensor Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation
标题:基于知识蒸馏的电子鼻气体识别中的传感器漂移补偿
链接:https://arxiv.org/abs/2507.17071
作者:n, Xianghao Zhan
备注:9 pages
摘要:由于环境变化和传感器老化,传感器漂移对电子鼻系统在实际部署过程中的气体分类性能提出了挑战。之前使用UCI气体传感器阵列漂移数据集的研究报告了有希望的漂移补偿结果,但缺乏稳健的统计实验验证,并且可能过度补偿传感器漂移,丢失类相关方差。为了解决这些限制并通过统计严格性改进传感器漂移补偿,我们首先基于相同的电子鼻数据集设计了两个域自适应任务:使用第一批次预测其余批次,模拟受控的实验室环境;以及使用所有先前批次预测下一批次,模拟用于在线训练的连续训练数据更新。然后,我们系统地测试了三种方法:我们提出的新的知识蒸馏(KD)方法,基准方法域正则化成分分析(DRCA),以及混合方法KD-DRCA,在UCI数据集上的30个随机测试集分区。我们发现KD的表现始终优于DRCA和KD-DRCA,准确度最多提高了18%,F1分数最多提高了15%,证明了KD在漂移补偿方面的优越性。这是KD首次应用于电子鼻漂移缓解,显著优于之前最先进的DRCA方法,并增强了真实环境中传感器漂移补偿的可靠性。
摘要:Due to environmental changes and sensor aging, sensor drift challenges the performance of electronic nose systems in gas classification during real-world deployment. Previous studies using the UCI Gas Sensor Array Drift Dataset reported promising drift compensation results but lacked robust statistical experimental validation and may overcompensate for sensor drift, losing class-related variance. To address these limitations and improve sensor drift compensation with statistical rigor, we first designed two domain adaptation tasks based on the same electronic nose dataset: using the first batch to predict the remaining batches, simulating a controlled laboratory setting; and predicting the next batch using all prior batches, simulating continuous training data updates for online training. We then systematically tested three methods: our proposed novel Knowledge Distillation (KD) method, the benchmark method Domain Regularized Component Analysis (DRCA), and a hybrid method KD-DRCA, across 30 random test set partitions on the UCI dataset. We showed that KD consistently outperformed both DRCA and KD-DRCA, achieving up to an 18% improvement in accuracy and 15% in F1-score, demonstrating KD's superior effectiveness in drift compensation. This is the first application of KD for electronic nose drift mitigation, significantly outperforming the previous state-of-the-art DRCA method and enhancing the reliability of sensor drift compensation in real-world environments.
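For readers unfamiliar with knowledge distillation, the sketch below shows the generic soft-label loss (temperature-scaled softmax plus KL divergence). It is the textbook formulation, not necessarily the variant this paper applies to drift compensation; the function names and the default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened class probabilities, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a drift setting, the teacher would typically be trained on earlier batches and the student on later ones, but that mapping is an illustrative guess rather than a detail taken from the abstract.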
Recommendation (1 paper)
【1】Citation Recommendation using Deep Canonical Correlation Analysis
标题:使用深度典型相关分析的引文推荐
链接:https://arxiv.org/abs/2507.17603
作者:amara, Effirul Ramlan
备注:21 pages, 6 figures, 7 tables
摘要:引用推荐的最新进展通过利用多视图表示学习来整合学术文档中存在的各种模态来提高准确性。然而,有效地结合多个数据视图需要融合技术,可以捕获互补的信息,同时保留每个模态的独特特征。我们提出了一种新的引文推荐算法,该算法通过应用深度CCA(DCCA)来改进线性典型相关分析(CCA)方法,DCCA是一种神经网络扩展,能够捕获科学文章的分布式文本和基于图形的表示之间的复杂非线性关系。在大规模DBLP(Digital Bibliography & Library Project)引文网络数据集上的实验表明,我们的方法优于最先进的基于CCA的方法,在Mean Average Precision@10,Precision@10和Recall@10中分别实现了超过11%,5%和7%的相对改进。这些收益反映了更相关的引用建议和增强的排名质量,这表明DCCA的非线性变换比CCA的线性投影产生更有表现力的潜在表示。
摘要:Recent advances in citation recommendation have improved accuracy by leveraging multi-view representation learning to integrate the various modalities present in scholarly documents. However, effectively combining multiple data views requires fusion techniques that can capture complementary information while preserving the unique characteristics of each modality. We propose a novel citation recommendation algorithm that improves upon linear Canonical Correlation Analysis (CCA) methods by applying Deep CCA (DCCA), a neural network extension capable of capturing complex, non-linear relationships between distributed textual and graph-based representations of scientific articles. Experiments on the large-scale DBLP (Digital Bibliography & Library Project) citation network dataset demonstrate that our approach outperforms state-of-the-art CCA-based methods, achieving relative improvements of over 11% in Mean Average Precision@10, 5% in Precision@10, and 7% in Recall@10. These gains reflect more relevant citation recommendations and enhanced ranking quality, suggesting that DCCA's non-linear transformations yield more expressive latent representations than CCA's linear projections.
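The three ranking metrics reported in this abstract can be computed per query as sketched below; Mean Average Precision@10 is then the mean of the per-query AP@10 values. The normalization of AP@k varies between papers, so the `min(len(relevant), k)` denominator used here is an assumption, as are the function names.

```python
def precision_at_k(ranked, relevant, k=10):
    """Share of the top-k recommendations that are truly cited."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k=10):
    """Share of the truly cited papers that appear in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def average_precision_at_k(ranked, relevant, k=10):
    """Mean of precision values taken at each rank where a hit occurs."""
    hits, score = 0, 0.0
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0
```

A relative improvement of 11% in MAP@10 means the new score equals 1.11 times the baseline score, not an 11-point absolute gain.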
Clustering (1 paper)
【1】Deformable Cluster Manipulation via Whole-Arm Policy Learning
标题:通过整臂策略学习进行可变形集群操纵
链接:https://arxiv.org/abs/2507.17085
作者:Jacob, Wenzheng Zhang, Houston Warren, Paulo Borges, Tirthankar Bandyopadhyay, Fabio Ramos
摘要:操纵可变形物体的集群提出了具有广泛适用性的实质性挑战,但需要接触丰富的整臂交互。一个潜在的解决方案必须解决现实模型合成的能力有限,感知的高度不确定性,以及缺乏有效的空间抽象等问题。我们提出了一个新的框架,用于学习无模型的政策整合两种模式:3D点云和本体感觉触摸指标,强调操纵与全身接触意识,超越传统的终端效应器模式。我们的强化学习框架利用分布式状态表示,并辅以内核均值嵌入,以提高训练效率和实时推理。此外,我们提出了一种新的上下文无关的遮挡启发式,以清除变形从目标区域的曝光任务。我们在电力线清理场景中部署了该框架,并观察到该代理利用多个手臂链接来消除遮挡,从而生成创造性策略。最后,我们执行zero-shot模拟到真实的政策转移,允许手臂清除真实的分支与未知的遮挡模式,看不见的拓扑结构,和不确定的动态。
摘要:Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics.
Autonomous Driving | Vehicles | Lane Detection, etc. (3 papers)
【1】PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
标题:PRIX:学习从原始像素进行端到端自动驾驶规划
链接:https://arxiv.org/abs/2507.17596
作者: Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt
备注:under review
摘要:虽然端到端自动驾驶模型显示出良好的效果,但其实际部署往往受到大型模型尺寸,依赖昂贵的LiDAR传感器和计算密集型BEV特征表示的阻碍。这限制了它们的可扩展性,特别是对于仅配备摄像头的大众市场车辆。为了应对这些挑战,我们提出了PRIX(原始像素计划)。我们新颖高效的端到端驾驶架构仅使用相机数据运行,无需明确的BEV表示,也无需LiDAR。PRIX利用视觉特征提取器与生成规划头相结合,直接从原始像素输入预测安全轨迹。我们的架构的一个核心组成部分是上下文感知的重新校准Transformer(CaRT),一个新的模块,旨在有效地提高多层次的视觉功能,更强大的规划。我们通过全面的实验证明,PRIX在NavSim和nuScenes基准测试中达到了最先进的性能,与更大的多模式扩散规划器的功能相匹配,同时在推理速度和模型大小方面更加有效,使其成为现实世界部署的实用解决方案。我们的工作是开源的,代码将在https://maxiuw.github.io/prix。
摘要:While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be at https://maxiuw.github.io/prix.
【2】SRMambaV2: Biomimetic Attention for Sparse Point Cloud Upsampling in Autonomous Driving
标题:SRMambaV2:自动驾驶中稀疏点云上采样的仿生关注
链接:https://arxiv.org/abs/2507.17479
作者:en, Xiaolin Qin, Jing Hu, Wenyi Ge
摘要:由于数据固有的稀疏性和复杂的3D结构,在自动驾驶场景中对LiDAR点云进行上采样仍然是一个重大挑战。最近的研究试图通过将复杂的3D空间场景转换为2D图像超分辨率任务来解决这个问题。然而,由于距离图像的稀疏和模糊的特征表示,准确地重建详细和复杂的空间拓扑结构仍然是一个主要的困难。为了解决这个问题,我们提出了一种新的稀疏点云上采样方法SRMambaV2,它提高了长距离稀疏区域的上采样精度,同时保持了整体几何重建质量。具体来说,受人类驾驶员视觉感知的启发,我们设计了一个仿生的2D选择性扫描自我注意(2DSSA)机制来模拟遥远稀疏区域的特征分布。同时,我们引入了双分支网络结构,以提高稀疏特征的表示。此外,我们还引入了渐进自适应损失(PAL)函数,以进一步改善上采样过程中细粒度细节的重建。实验结果表明,SRMambaV2在定性和定量评估方面都取得了卓越的性能,凸显了其在汽车稀疏点云上采样任务中的有效性和实用价值。
摘要:Upsampling LiDAR point clouds in autonomous driving scenarios remains a significant challenge due to the inherent sparsity and complex 3D structures of the data. Recent studies have attempted to address this problem by converting the complex 3D spatial scenes into 2D image super-resolution tasks. However, due to the sparse and blurry feature representation of range images, accurately reconstructing detailed and complex spatial topologies remains a major difficulty. To tackle this, we propose a novel sparse point cloud upsampling method named SRMambaV2, which enhances the upsampling accuracy in long-range sparse regions while preserving the overall geometric reconstruction quality. Specifically, inspired by human driver visual perception, we design a biomimetic 2D selective scanning self-attention (2DSSA) mechanism to model the feature distribution in distant sparse areas. Meanwhile, we introduce a dual-branch network architecture to enhance the representation of sparse features. In addition, we introduce a progressive adaptive loss (PAL) function to further refine the reconstruction of fine-grained details during the upsampling process. Experimental results demonstrate that SRMambaV2 achieves superior performance in both qualitative and quantitative evaluations, highlighting its effectiveness and practical value in automotive sparse point cloud upsampling tasks.
【3】Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs
标题:探索自动驾驶实验室kNN噪音特征检测和恢复的前沿
链接:https://arxiv.org/abs/2507.16833
作者:, Kangming Li, Yao Fehlis, Daniel Persaud, Robert Black, Jason Hattrick-Simpers
备注:15 pages, 6 figures
摘要:自动驾驶实验室(SDLs)通过将机器学习与自动化实验平台相结合,有望加速材料发现。然而,输入参数捕获中的错误可能会破坏用于建模系统性能的功能,从而危及当前和未来的活动。这项研究开发了一个自动化的工作流程,系统地检测噪声特征,确定可以纠正的样本特征配对,并最终恢复正确的特征值。然后进行系统的研究,以检查数据集大小、噪声强度和特征值分布如何影响有噪特征的可检测性和可恢复性。通常,高强度噪声和大的训练数据集有利于检测和校正噪声特征。低强度噪声降低了检测和恢复,但可以通过更大的干净训练数据集进行补偿。检测和校正结果在具有连续和分散特征分布的特征之间变化,与具有离散或窄分布的特征相比,显示出更大的可恢复性。这项系统的研究不仅展示了一个模型不可知的框架,合理的数据恢复存在的噪音,有限的数据,和不同的特征分布,但也提供了一个有形的基准kNN插补材料数据集。最终,它旨在提高自动材料发现的数据质量和实验精度。
摘要:Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise reduces detection and recovery but can be compensated for by larger clean training datasets. Detection and correction results vary between features: those with continuous and dispersed distributions show greater recoverability than those with discrete or narrow distributions. This systematic study not only demonstrates a model-agnostic framework for rational data recovery in the presence of noise, limited data, and differing feature distributions but also provides a tangible benchmark of kNN imputation in materials datasets. Ultimately, it aims to enhance data quality and experimental precision in automated materials discovery.
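The recovery step described in this abstract relies on kNN imputation. A minimal sketch of that idea follows: the suspect feature is replaced by the mean of the same feature over the k nearest clean samples, with distances computed on the remaining features. The function name, the Euclidean metric, and the unweighted mean are assumptions; the paper's detection stage is not reproduced.

```python
import math

def knn_impute(sample, noisy_idx, clean_rows, k=3):
    """Estimate sample[noisy_idx] from the k nearest rows of a clean dataset.

    sample     -- feature vector whose entry at noisy_idx is untrusted
    noisy_idx  -- index of the feature to recover
    clean_rows -- list of trusted feature vectors of the same length
    """
    def distance(row):
        # Ignore the untrusted feature when measuring similarity.
        return math.sqrt(sum((row[i] - sample[i]) ** 2
                             for i in range(len(sample)) if i != noisy_idx))

    nearest = sorted(clean_rows, key=distance)[:k]
    return sum(row[noisy_idx] for row in nearest) / len(nearest)
```

This also suggests why narrow or discrete distributions are harder to recover: neighbors found on the other features carry little information about a feature that barely varies or jumps between a few levels.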
Federated Learning | Privacy Protection | Encryption (3 papers)
【1】Enhancing Quantum Federated Learning with Fisher Information-Based Optimization
标题:利用基于Fisher信息的优化增强量子联邦学习
链接:https://arxiv.org/abs/2507.17580
作者:Singh Bhatia, Sabre Kais
摘要:联合学习(FL)在不同的行业越来越受欢迎,它为客户提供了一种在不共享敏感数据的情况下协同训练全球模型的方法。它涉及全局模型和参与客户端之间的多轮通信,这带来了一些挑战,例如高通信成本,异构客户端数据,延长处理时间以及增加隐私威胁的脆弱性。近年来,联邦学习和参数化量子电路的融合引发了人们的极大研究兴趣,对医疗保健和金融等领域产生了积极的影响。通过实现量子模型的分散训练,它允许客户或机构协同增强模型性能和结果,同时保护数据隐私。认识到Fisher信息可以量化量子态在参数变化下携带的信息量,从而提供对其几何和统计特性的洞察。我们打算利用这一财产来应对上述挑战。在这项工作中,我们提出了一个量子联合学习(QFL)算法,利用本地客户端模型计算的Fisher信息,数据分布在异构分区。这种方法可以识别出显著影响量子模型性能的关键参数,确保它们在聚合过程中得到保留。我们的研究评估了QFL的有效性和可行性,通过比较其性能与其他变体,并探讨在QFL设置中纳入Fisher信息的好处。ADNI和MNIST数据集上的实验结果表明,我们的方法在实现更好的性能和鲁棒性对量子联邦平均方法的有效性。
摘要:Federated Learning (FL) has become increasingly popular across different sectors, offering a way for clients to work together to train a global model without sharing sensitive data. It involves multiple rounds of communication between the global model and participating clients, which introduces several challenges like high communication costs, heterogeneous client data, prolonged processing times, and increased vulnerability to privacy threats. In recent years, the convergence of federated learning and parameterized quantum circuits has sparked significant research interest, with promising implications for fields such as healthcare and finance. By enabling decentralized training of quantum models, it allows clients or institutions to collaboratively enhance model performance and outcomes while preserving data privacy. Recognizing that Fisher information can quantify the amount of information that a quantum state carries under parameter changes, thereby providing insight into its geometric and statistical properties, we intend to leverage this property to address the aforementioned challenges. In this work, we propose a Quantum Federated Learning (QFL) algorithm that makes use of the Fisher information computed on local client models, with data distributed across heterogeneous partitions. This approach identifies the critical parameters that significantly influence the quantum model's performance, ensuring they are preserved during the aggregation process. Our research assessed the effectiveness and feasibility of QFL by comparing its performance against other variants and by exploring the benefits of incorporating Fisher information in QFL settings. Experimental results on ADNI and MNIST datasets demonstrate that our approach achieves better performance and robustness than the quantum federated averaging method.
【2】Decentralized Federated Learning of Probabilistic Generative Classifiers
标题:概率生成分类器的分散联邦学习
链接:https://arxiv.org/abs/2507.17285
作者:ez, Carlos Echegoyen, Guzmán Santafé
摘要:联邦学习是现实世界应用中日益相关的一种范式,旨在跨异构用户网络构建全局模型,而无需共享私有数据。我们专注于去中心化架构上的模型学习,在这种架构中,用户可以直接协作来更新全局模型,而不依赖于中央服务器。在这种情况下,本文提出了一种新的方法来协同学习具有参数形式的概率生成分类器。该框架由一组本地节点上的通信网络组成,每个节点都有自己的本地数据和本地更新规则。该提案涉及与邻近节点共享本地统计数据,其中每个节点汇总邻近节点的信息并迭代学习自己的本地分类器,从而逐渐收敛到全局模型。大量的实验表明,在广泛的网络拓扑结构、网络规模、本地数据集大小和极端的非独立同分布(non-i.i.d.)数据分布下,该算法始终收敛到具有全局竞争力的模型。
摘要:Federated learning is a paradigm of increasing relevance in real-world applications, aimed at building a global model across a network of heterogeneous users without requiring the sharing of private data. We focus on model learning over decentralized architectures, where users collaborate directly to update the global model without relying on a central server. In this context, the current paper proposes a novel approach to collaboratively learn probabilistic generative classifiers with a parametric form. The framework is composed of a communication network over a set of local nodes, each one having its own local data and a local updating rule. The proposal involves sharing local statistics with neighboring nodes, where each node aggregates the neighbors' information and iteratively learns its own local classifier, which progressively converges to a global model. Extensive experiments demonstrate that the algorithm consistently converges to a globally competitive model across a wide range of network topologies, network sizes, local dataset sizes, and extreme non-i.i.d. data distributions.
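One way to picture "sharing local statistics with neighboring nodes" is consensus averaging of sufficient statistics, sketched below for a single Gaussian class-conditional feature. This is a standard gossip scheme chosen for illustration, not the update rule from the paper; the ring topology, the statistics tuple `(count, sum, sum of squares)`, and all names are assumptions.

```python
def gossip_round(stats, neighbors):
    """Each node replaces its statistics by the average over itself and its neighbors."""
    updated = []
    for node, own in enumerate(stats):
        group = [own] + [stats[j] for j in neighbors[node]]
        updated.append(tuple(sum(vals) / len(group) for vals in zip(*group)))
    return updated

def gaussian_from_stats(count, sum_x, sum_x2):
    """Mean and variance implied by (count, sum, sum of squares)."""
    mean = sum_x / count
    return mean, sum_x2 / count - mean ** 2

# Four nodes on a ring; local data are [1, 2], [3], [4, 5, 6], and [7].
stats = [(2, 3, 5), (1, 3, 9), (3, 15, 77), (1, 7, 49)]
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
for _ in range(50):
    stats = gossip_round(stats, ring)
local_models = [gaussian_from_stats(*s) for s in stats]
```

Because only ratios of the averaged statistics matter, every node ends up with the Gaussian parameters of the pooled data even though no raw samples leave a node. On irregular graphs, uniform averaging would need degree-aware weights to reach the same consensus.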
【3】Eco-Friendly AI: Unleashing Data Power for Green Federated Learning
标题:环保人工智能:释放数据力量实现绿色联邦学习
链接:https://arxiv.org/abs/2507.17241
作者:bella, Monica Vitali
摘要:人工智能(AI)和机器学习(ML)的广泛采用带来了重大的环境影响,特别是在能源消耗和碳排放方面。这一紧迫问题凸显了对创新解决方案的需求,以减轻人工智能的生态足迹。影响ML模型训练能耗的关键因素之一是训练数据集的大小。机器学习模型通常基于分布在多个位置的传感器和设备连续生成的大量数据进行训练。为了降低数据传输成本并增强隐私性,联邦学习(FL)支持模型训练,而无需移动或共享原始数据。虽然FL提供了这些优势,但由于数据源的异构性(与数量和质量相关),计算节点能力和环境影响,它也带来了挑战。 本文通过提出一种以数据为中心的绿色联邦学习方法,为绿色人工智能的发展做出了贡献。具体来说,我们专注于通过最小化训练数据量来减少FL对环境的影响。我们的方法包括分析联邦数据集的特征,根据质量指标选择最佳数据子集,以及选择对环境影响最小的联邦节点。我们开发了一种全面的方法,可以检查以数据为中心的因素(如数据质量和数量)对FL培训性能和碳排放的影响。在这些见解的基础上,我们引入了一个交互式推荐系统,通过数据简化优化FL配置,最大限度地减少训练期间对环境的影响。将这种方法应用于时间序列分类,在减少FL任务对环境的影响方面取得了可喜的成果。
摘要:The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) comes with a significant environmental impact, particularly in terms of energy consumption and carbon emissions. This pressing issue highlights the need for innovative solutions to mitigate AI's ecological footprint. One of the key factors influencing the energy consumption of ML model training is the size of the training dataset. ML models are often trained on vast amounts of data continuously generated by sensors and devices distributed across multiple locations. To reduce data transmission costs and enhance privacy, Federated Learning (FL) enables model training without the need to move or share raw data. While FL offers these advantages, it also introduces challenges due to the heterogeneity of data sources (related to volume and quality), computational node capabilities, and environmental impact. This paper contributes to the advancement of Green AI by proposing a data-centric approach to Green Federated Learning. Specifically, we focus on reducing FL's environmental impact by minimizing the volume of training data. Our methodology involves the analysis of the characteristics of federated datasets, the selection of an optimal subset of data based on quality metrics, and the choice of the federated nodes with the lowest environmental impact. We develop a comprehensive methodology that examines the influence of data-centric factors, such as data quality and volume, on FL training performance and carbon emissions. Building on these insights, we introduce an interactive recommendation system that optimizes FL configurations through data reduction, minimizing environmental impact during training. Applying this methodology to time series classification has demonstrated promising results in reducing the environmental impact of FL tasks.
Inference | Analysis | Understanding | Explanation (7 papers)
【1】ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning
标题:ViRN:用于长尾连续表征学习的变分推理和分布三边测量
链接:https://arxiv.org/abs/2507.17368
作者:Chong Tang, Jagmohan Chauhan
备注:6 pages, 2 figures
摘要:具有长尾数据分布的持续学习(CL)仍然是现实世界人工智能系统的一个关键挑战,尽管存在严重的类不平衡,但模型必须顺序适应新类,同时保留旧类的知识。现有的方法难以平衡稳定性和可塑性,往往在极端的样本稀缺性下崩溃。为了解决这个问题,我们提出了ViRN,一个新的CL框架,集成了变分推理(VI)与分布式三边测量,用于强大的长尾学习。首先,我们通过变分自动编码器对类条件分布进行建模,以减轻对头部类的偏见。其次,我们通过基于Wasserstein距离的邻域检索和几何融合来重建尾类分布,从而实现尾类表示的样本有效对齐。在六个长尾分类基准上进行评估,包括语音(例如,罕见的声学事件,口音)和图像任务,ViRN实现了10.24%的平均精度增益比最先进的方法。
摘要:Continual learning (CL) with long-tailed data distributions remains a critical challenge for real-world AI systems, where models must sequentially adapt to new classes while retaining knowledge of old ones, despite severe class imbalance. Existing methods struggle to balance stability and plasticity, often collapsing under extreme sample scarcity. To address this, we propose ViRN, a novel CL framework that integrates variational inference (VI) with distributional trilateration for robust long-tailed learning. First, we model class-conditional distributions via a Variational Autoencoder to mitigate bias toward head classes. Second, we reconstruct tail-class distributions via Wasserstein distance-based neighborhood retrieval and geometric fusion, enabling sample-efficient alignment of tail-class representations. Evaluated on six long-tailed classification benchmarks, including speech (e.g., rare acoustic events, accents) and image tasks, ViRN achieves a 10.24% average accuracy gain over state-of-the-art methods.
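The Wasserstein-based neighborhood retrieval mentioned in this abstract can be illustrated with the closed-form 2-Wasserstein distance between diagonal Gaussians. Modeling each class as a diagonal Gaussian, and the helper names below, are assumptions made for the sketch; the paper's latent distributions and fusion step may be defined differently.

```python
import math

def w2_diag_gaussians(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between Gaussians with diagonal covariance.

    For diagonal covariances: W2^2 = |mu_a - mu_b|^2 + |sigma_a - sigma_b|^2,
    where sigma holds per-dimension standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)

def nearest_classes(tail, heads, k=2):
    """Names of the k head classes closest to a tail class.

    tail  -- (mu, sigma) of the tail class
    heads -- dict mapping class name to (mu, sigma)
    """
    return sorted(heads, key=lambda name: w2_diag_gaussians(*tail, *heads[name]))[:k]
```

The retrieved neighbors would then supply better-estimated statistics to refine the tail-class distribution, which is the role the abstract assigns to geometric fusion.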
【2】R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
标题:R-Stitch:用于高效推理的动态轨迹缝合
链接:https://arxiv.org/abs/2507.17307
作者:hen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang
摘要:思维链(CoT)推理通过鼓励推理过程中的逐步中间推理来增强大型语言模型解决问题的能力。虽然有效,CoT引入大量的计算开销,由于其依赖于自回归解码长令牌序列。现有的加速策略要么通过提前停止或压缩奖励设计来减少序列长度,要么通过使用较小模型的推测解码来提高解码速度。然而,投机性解码遭受有限的加速时,小型和大型模型之间的协议是低的,并未能利用小型模型在产生简洁的中间推理的潜在优势。在本文中,我们提出了R-Stitch,一个令牌级的,基于置信度的混合解码框架,通过沿着推理轨迹在小语言模型(SLM)和大语言模型(LLM)之间切换来加速CoT推理。默认情况下,R-Stitch使用SLM生成令牌,仅当SLM的置信度低于阈值时才委托给LLM。这种设计避免了全序列回滚,并有选择地在不确定的步骤上调用LLM,从而保持了效率和答案质量。R-Stitch与模型无关,无需训练,并与标准解码管道兼容。在数学推理基准测试上的实验表明,R-Stitch在推理延迟降低85%的同时,准确率下降可以忽略不计,突出了其在加速CoT推理方面的实际有效性。
摘要:Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM's confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
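The switching rule in this abstract (SLM by default, LLM when confidence drops below a threshold) can be sketched as a decoding loop. Here `slm_step` and `llm_step` are hypothetical callables that return a `(token, confidence)` pair for the current prefix; the threshold value and the stopping logic are likewise assumptions, and the real system may apply further rules for handing control back to the SLM.

```python
def stitch_decode(slm_step, llm_step, prompt, threshold=0.8,
                  max_tokens=32, eos="<eos>"):
    """Token-level hybrid decoding: trust the small model unless it is unsure."""
    tokens, llm_calls = list(prompt), 0
    for _ in range(max_tokens):
        token, confidence = slm_step(tokens)
        if confidence < threshold:
            # Low confidence: let the large model decide this token only.
            token, _ = llm_step(tokens)
            llm_calls += 1
        tokens.append(token)
        if token == eos:
            break
    return tokens, llm_calls

# Toy stand-ins for the two models (purely illustrative).
def toy_slm(tokens):
    if len(tokens) >= 4:
        return "<eos>", 0.95
    return "s", 0.9 if len(tokens) % 2 == 0 else 0.5

def toy_llm(tokens):
    return "L", 1.0

decoded, calls = stitch_decode(toy_slm, toy_llm, ["q"])
```

Because the already generated prefix is never discarded, a low-confidence step costs one LLM call rather than a rollback of the sequence.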
【3】Tabular Diffusion based Actionable Counterfactual Explanations for Network Intrusion Detection
标题:基于表格扩散的网络入侵检测可操作反事实解释
链接:https://arxiv.org/abs/2507.17161
作者:lwaduge, Jagath Samarabandu
摘要:现代网络入侵检测系统(NIDS)经常利用复杂深度学习模型的预测能力。然而,这种深度学习方法的“黑匣子”性质增加了一层不透明性,阻碍了对检测决策的正确理解,对决策的信任,并阻止对此类攻击的及时对策。可解释人工智能(XAI)方法通过提供对预测原因的见解来解决这个问题。大多数现有的XAI方法提供的解释不便于转换为可操作的对策。在这项工作中,我们提出了一种新的基于扩散的反事实解释框架,可以为网络入侵攻击提供可操作的解释。我们评估了我们提出的算法对其他几个公开的反事实解释算法3现代网络入侵数据集。据我们所知,这项工作还提出了第一个现有的反事实的解释算法的网络入侵检测系统的背景下进行比较分析。我们所提出的方法提供了最小的,不同的反事实的解释测试的反事实的解释算法在一个更有效的方式,通过减少时间来生成解释。我们还展示了反事实解释如何通过总结它们来创建一组全局规则,从而提供可操作的解释。这些规则不仅在实例级,而且在全局级的入侵攻击是可操作的。这些全局反事实规则能够有效地过滤入侵查询,这对于有效的入侵检测和防御机制至关重要。
摘要:Modern network intrusion detection systems (NIDS) frequently utilize the predictive power of complex deep learning models. However, the "black-box" nature of such deep learning methods adds a layer of opaqueness that hinders the proper understanding of detection decisions, undermines trust in those decisions, and prevents timely countermeasures against attacks. Explainable AI (XAI) methods address this problem by providing insights into the causes of the predictions. Most existing XAI methods provide explanations that are not convenient to convert into actionable countermeasures. In this work, we propose a novel diffusion-based counterfactual explanation framework that can provide actionable explanations for network intrusion attacks. We evaluated our proposed algorithm against several other publicly available counterfactual explanation algorithms on three modern network intrusion datasets. To the best of our knowledge, this work also presents the first comparative analysis of existing counterfactual explanation algorithms within the context of network intrusion detection systems. Among the tested counterfactual explanation algorithms, our proposed method provides minimal, diverse counterfactual explanations more efficiently, reducing the time needed to generate explanations. We also demonstrate how counterfactual explanations can provide actionable explanations by summarizing them to create a set of global rules. These rules are actionable not only at the instance level but also at the global level for intrusion attacks. These global counterfactual rules show the ability to effectively filter out incoming attack queries, which is crucial for efficient intrusion detection and defense mechanisms.
【4】TD-Interpreter: Enhancing the Understanding of Timing Diagrams with Visual-Language Learning
标题:TD-Interpreter:通过视觉语言学习增强对计时图的理解
链接:https://arxiv.org/abs/2507.16844
作者:incent Theo Willem Kenbeek, Zhantao Yang, Meixun Qu, Ezio Bartocci, Dejan Ničković, Radu Grosu
摘要:我们介绍了TD-Interpreter,这是一种专门的机器学习工具,可帮助工程师在设计和验证过程中理解来自第三方的复杂时序图(TD)。TD-Interpreter是一个可视化的问答环境,允许工程师输入一组TD,并询问有关这些TD的设计和验证问题。我们通过微调LLaVA(一个轻量级的7B多模态大型语言模型(MLLM))来实现具有多模态学习的TD-Interpreter。为了解决有限的训练数据可用性,我们开发了一个合成数据生成工作流程,将视觉信息与其文本解释相结合。我们的实验评估表明了TD-Interpreter的有用性,其性能在评估基准上大幅优于未调优的GPT-4o。
摘要:We introduce TD-Interpreter, a specialized ML tool that assists engineers in understanding complex timing diagrams (TDs), originating from a third party, during their design and verification process. TD-Interpreter is a visual question-answer environment which allows engineers to input a set of TDs and ask design and verification queries regarding these TDs. We implemented TD-Interpreter with multimodal learning by fine-tuning LLaVA, a lightweight 7B Multimodal Large Language Model (MLLM). To address limited training data availability, we developed a synthetic data generation workflow that aligns visual information with its textual interpretation. Our experimental evaluation demonstrates the usefulness of TD-Interpreter which outperformed untuned GPT-4o by a large margin on the evaluated benchmarks.
【5】Neural networks for bifurcation and linear stability analysis of steady states in partial differential equations
Link: https://arxiv.org/abs/2407.19707
Authors: Luthfi Shahab, Hadi Susanto
Comments: Accepted for publication in Applied Mathematics and Computation
Abstract: This research introduces an extended application of neural networks for solving nonlinear partial differential equations (PDEs). A neural network, combined with pseudo-arclength continuation, is proposed to construct bifurcation diagrams from parameterized nonlinear PDEs. Additionally, a neural network approach is presented for solving eigenvalue problems to analyze the linear stability of solutions, focusing on identifying the largest eigenvalue. The effectiveness of the proposed neural network is examined through experiments on the Bratu equation and the Burgers equation. Results from a finite difference method are also presented for comparison. Varying numbers of grid points are employed in each case to assess the behavior and accuracy of both the neural network and the finite difference method. The experimental results demonstrate that the proposed neural network produces better solutions, generates more accurate bifurcation diagrams, has reasonable computational times, and proves effective for linear stability analysis.
【6】Doubly robust outlier resistant inference on causal treatment effect
Link: https://arxiv.org/abs/2507.17439
Authors: Kang
Abstract: Outliers can severely distort causal effect estimation in observational studies, yet this issue has received limited attention in the literature. Their influence is especially pronounced in small samples, where detecting and removing outliers becomes increasingly difficult. Therefore, it is essential to estimate treatment effects robustly without excluding these influential data points. To address this, we propose a doubly robust point estimator for the average treatment effect under a contaminated model that includes outliers. Robustness in outcome regression is achieved through a robust estimating equation, while covariate balancing propensity scores (CBPS) ensure resilience in propensity score modeling. To prevent model overfitting due to the inclusion of numerous parameters, we incorporate variable selection. All these components are unified under a penalized empirical likelihood framework. For confidence interval estimation, most existing approaches rely on asymptotic properties, which may be unreliable in finite samples. We derive an optimal finite-sample confidence interval for the average treatment effect using our proposed estimating equation, ensuring that the interval bounds remain unaffected by outliers. Through simulations and a real-world application involving hypertension data with outliers, we demonstrate that our method consistently outperforms existing approaches in both accuracy and robustness.
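For orientation, the classical doubly robust estimator that this line of work starts from is the augmented inverse-probability-weighted (AIPW) estimator. The sketch below is only that textbook, non-robust estimator, not the paper's penalized empirical likelihood method. The function name `aipw_ate` and its inputs (fitted propensity scores `e` and outcome-model predictions `m1`, `m0`) are illustrative assumptions.

```python
def aipw_ate(y, t, e, m1, m0):
    """Textbook AIPW estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators (0 or 1);
    e: estimated propensity scores P(T=1 | X), each strictly in (0, 1);
    m1, m0: outcome-model predictions under treatment and under control.
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1i + ti * (yi - m1i) / ei
        control = m0i + (1 - ti) * (yi - m0i) / (1 - ei)
        total += treated - control
    return total / len(y)
```

The estimate stays consistent if either the propensity model or the outcome model is correct, which is the "doubly robust" property. Because every residual enters linearly, a single extreme outcome can still move the estimate a great deal, which is the sensitivity the abstract sets out to address.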
【7】A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion
Link: https://arxiv.org/abs/2507.16955
Authors: ari, Roaa Elalfy, Mohamed Mabrok, Somaya Al-Maadeed, Tamer Khattab, Essam A. Rashed
Abstract: Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single-view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN-VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single-task and multi-task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1 score of 0.9830. For the more challenging ternary classification, it attains an F1 score of 0.7790, while in the five-class BI-RADS task, the best F1 score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multi-task learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.
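One way to picture attention-based fusion that tolerates missing views is a masked softmax over per-view scores. The sketch below is a minimal illustration under that assumption. The abstract does not give the module's actual form, the name `fuse_views` is hypothetical, and the attention scores would come from a learned gating network rather than being passed in by hand.

```python
import math

def fuse_views(features, scores, present):
    """Weighted sum of per-view feature vectors; missing views get zero weight.

    features: list of equal-length feature vectors, one per view;
    scores: one attention logit per view (learned in practice, given here);
    present: booleans marking which views are available.
    """
    if not any(present):
        raise ValueError("at least one view must be present")
    # Softmax restricted to the available views (shifted by the max for stability).
    top = max(s for s, p in zip(scores, present) if p)
    exps = [math.exp(s - top) if p else 0.0 for s, p in zip(scores, present)]
    norm = sum(exps)
    weights = [x / norm for x in exps]
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights
```

Masking before normalization means the weights of the available views still sum to one, so the fused vector keeps a comparable scale whether one view or all four are present.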
Detection (2 papers)
【1】Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs
Link: https://arxiv.org/abs/2507.17010
Authors: manul Islam, Huynh Q. N. Vo, Aditya Rane
Comments: Submitted for peer-review in TrustXR - 2025
Abstract: In the era of synthetic media, deepfake manipulations pose a significant threat to information integrity. To address this challenge, we propose TrustDefender, a two-stage framework comprising (i) a lightweight convolutional neural network (CNN) that detects deepfake imagery in real-time extended reality (XR) streams, and (ii) an integrated succinct zero-knowledge proof (ZKP) protocol that validates detection results without disclosing raw user data. Our design addresses the computational constraints of XR platforms while adhering to the stringent privacy requirements of sensitive settings. Experimental evaluations on multiple benchmark deepfake datasets demonstrate that TrustDefender achieves 95.3% detection accuracy, coupled with efficient proof generation underpinned by rigorous cryptography, ensuring seamless integration with high-performance artificial intelligence (AI) systems. By fusing advanced computer vision models with provable security mechanisms, our work establishes a foundation for reliable AI in immersive and privacy-sensitive applications.
【2】Joint Multi-Target Detection-Tracking in Cognitive Massive MIMO Radar via POMCP
Link: https://arxiv.org/abs/2507.17506
Authors: ou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux
Abstract: This correspondence presents a power-aware cognitive radar framework for joint detection and tracking of multiple targets in a massive multiple-input multiple-output (MIMO) radar environment. Building on a previous single-target algorithm based on Partially Observable Monte Carlo Planning (POMCP), we extend it to the multi-target case by assigning each target an independent POMCP tree, enabling scalable and efficient planning. Departing from uniform power allocation, which is often suboptimal with varying signal-to-noise ratios (SNRs), our approach predicts each target's future angular position and expected received power based on its estimated range and radar cross-section (RCS). These predictions guide adaptive waveform design via a constrained optimization problem that allocates transmit energy to enhance the detectability of weaker or distant targets, while ensuring sufficient power for high-SNR targets. The reward function in the underlying partially observable Markov decision process (POMDP) is also modified to prioritize accurate spatial and power estimation. Simulations involving multiple targets with different SNRs confirm the effectiveness of our method. The proposed cognitive radar framework improves detection probability for low-SNR targets and achieves more accurate tracking than approaches using uniform or orthogonal waveforms. These results demonstrate the potential of the POMCP-based framework for adaptive, efficient multi-target radar systems.
Classification | Recognition (5 papers)
【1】Persistent Patterns in Eye Movements: A Topological Approach to Emotion Recognition
Link: https://arxiv.org/abs/2507.17450
Authors: sa, Hooman Zare, Ali Shahrabi, Hanieh Hatami, Mohammadreza Razvan
Abstract: We present a topological pipeline for automated multiclass emotion recognition from eye-tracking data. Delay embeddings of gaze trajectories are analyzed using persistent homology. From the resulting persistence diagrams, we extract shape-based features such as mean persistence, maximum persistence, and entropy. A random forest classifier trained on these features achieves up to 75.6% accuracy on four emotion classes, which are the quadrants of the Circumplex Model of Affect. The results demonstrate that persistence diagram geometry effectively encodes discriminative gaze dynamics, suggesting a promising topological approach for affective computing and human behavior analysis.
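The two ends of this pipeline can be sketched in a few lines: a delay embedding of a one-dimensional signal, and the summary features computed from a persistence diagram. The sketch assumes the diagram is supplied as finite (birth, death) pairs, which a persistent homology library would normally compute; that step is omitted here. The function names, embedding parameters, and the natural-log entropy convention are illustrative assumptions rather than the authors' settings.

```python
import math

def delay_embed(series, dim, tau):
    """Takens delay embedding: point i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    last = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(last)]

def persistence_features(diagram):
    """Mean persistence, maximum persistence, and persistence entropy.

    diagram: iterable of finite (birth, death) pairs.
    """
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    if not lifetimes:
        raise ValueError("diagram has no points with positive persistence")
    total = sum(lifetimes)
    # Persistence entropy: Shannon entropy of the normalized lifetimes.
    entropy = -sum((l / total) * math.log(l / total) for l in lifetimes)
    return {"mean": total / len(lifetimes), "max": max(lifetimes), "entropy": entropy}
```

The resulting dictionary is the kind of fixed-length feature vector that can be fed to a random forest or any other tabular classifier.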
【2】TOC-UCO: a comprehensive repository of tabular ordinal classification datasets
Link: https://arxiv.org/abs/2507.17348
Authors: llón-Gavilán, David Guijo-Rubio, Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Francisco Bérchez-Moreno, Víctor Manuel Vargas-Yun, Pedro A. Gutiérrez
Comments: 25 single column pages, 5 figures, 7 tables
Abstract: An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies over the last years. However, it is important to highlight that the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches in the literature can be benchmarked. To address this, this manuscript from the University of Córdoba (UCO), which has previous experience in the OC field, provides the literature with a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of 46 tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for 30 different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.
【3】JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction
Link: https://arxiv.org/abs/2507.17152
Authors: n, Ying He, Fei Yu, Hong Zhang
Comments: IROS 2025 Accepted
Abstract: Predicting the future motion of road participants is a critical task in autonomous driving. In this work, we address the challenge of low-quality generation of low-probability modes in multi-agent joint prediction. To tackle this issue, we propose a two-stage multi-agent interactive prediction framework named keypoint-guided joint prediction after classification-aware marginal proposal (JAM). The first stage is modeled as a marginal prediction process, which classifies queries by trajectory type to encourage the model to learn all categories of trajectories, providing comprehensive mode information for the joint prediction module. The second stage is modeled as a joint prediction process, which takes the scene context and the marginal proposals from the first stage as inputs to learn the final joint distribution. We explicitly introduce key waypoints to guide the joint prediction module in better capturing and leveraging the critical information from the initial predicted trajectories. We conduct extensive experiments on the real-world Waymo Open Motion Dataset interactive prediction benchmark. The results show that our approach achieves competitive performance. In particular, in the framework comparison experiments, the proposed JAM outperforms other prediction frameworks and achieves state-of-the-art performance in interactive trajectory prediction. The code is available at https://github.com/LinFunster/JAM to facilitate future research.
【4】Divisive Decisions: Improving Salience-Based Training for Generalization in Binary Classification Tasks
Link: https://arxiv.org/abs/2507.17000
Authors: and, Chris Sweet, Adam Czajka
Abstract: Existing saliency-guided training approaches improve model generalization by incorporating a loss term that compares the model's class activation map (CAM) for a sample's true class (i.e., correct-label class) against a human reference saliency map. However, prior work has ignored the false-class CAM(s), that is, the model's saliency obtained for the incorrect-label class. We hypothesize that in binary tasks the true and false CAMs should diverge on the important classification features identified by humans (and reflected in human saliency maps). We use this hypothesis to motivate three new saliency-guided training methods that incorporate both the true-class and false-class CAMs into the training strategy, and a novel post-hoc tool for identifying important features. We evaluate all introduced methods on several diverse binary closed-set and open-set classification tasks, including synthetic face detection, biometric presentation attack detection, and classification of anomalies in chest X-ray scans, and find that the proposed methods improve generalization capabilities of deep learning models over traditional (true-class CAM only) saliency-guided training approaches. We offer source code and model weights (GitHub repository link removed to preserve anonymity) to support reproducible research.
【5】The surprising strength of weak classifiers for validating neural posterior estimates
Link: https://arxiv.org/abs/2507.17026
Authors: sal, Tianyu Chen, James G. Scott
Abstract: Neural Posterior Estimation (NPE) has emerged as a powerful approach for amortized Bayesian inference when the true posterior $p(\theta \mid y)$ is intractable or difficult to sample. But evaluating the accuracy of neural posterior estimates remains challenging, with existing methods suffering from major limitations. One appealing and widely used method is the classifier two-sample test (C2ST), where a classifier is trained to distinguish samples from the true posterior $p(\theta \mid y)$ versus the learned NPE approximation $q(\theta \mid y)$. Yet despite the appealing simplicity of the C2ST, its theoretical and practical reliability depends on having access to a near-Bayes-optimal classifier, a requirement that is rarely met and, at best, difficult to verify. Thus a major open question is: can a weak classifier still be useful for neural posterior validation? We show that the answer is yes. Building on the work of Hu and Lei, we present several key results for a conformal variant of the C2ST, which converts any trained classifier's scores, even those of weak or overfitted models, into exact finite-sample p-values. We establish two key theoretical properties of the conformal C2ST: (i) finite-sample Type-I error control, and (ii) non-trivial power that degrades gently in tandem with the error of the trained classifier. The upshot is that even weak, biased, or overfit classifiers can still yield powerful and reliable tests. Empirically, the conformal C2ST outperforms classical discriminative tests across a wide range of benchmarks. These results reveal the underappreciated strength of weak classifiers for validating neural posterior estimates, establishing the conformal C2ST as a practical, theoretically grounded diagnostic for modern simulation-based inference.
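The core idea of turning arbitrary classifier scores into a finite-sample p-value can be illustrated with a generic rank-based conformal p-value. This is a sketch of the general conformal construction, not necessarily the exact statistic used in the paper; the function name and the convention that larger scores are more extreme are assumptions.

```python
def conformal_p_value(calibration_scores, test_score):
    """Rank-based p-value: share of scores at least as large as test_score.

    Valid in finite samples whenever the test score is exchangeable with the
    calibration scores under the null, however weak the scoring classifier is.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    # The +1 terms count the test score itself, keeping the p-value above zero.
    return (1 + at_least) / (len(calibration_scores) + 1)
```

The classifier only supplies the ordering of the scores. A poor classifier produces a less informative ordering, which costs power, but the Type-I error guarantee comes from exchangeability alone.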
Representation (3 papers)
【1】C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning
Link: https://arxiv.org/abs/2507.17454
Authors: , Yun-Bo Zhao, Yu Kang
Abstract: Multivariate time series forecasting has drawn increasing attention due to its practical importance. Existing approaches typically adopt either channel-mixing (CM) or channel-independence (CI) strategies. The CM strategy can capture inter-variable dependencies but fails to discern variable-specific temporal patterns. The CI strategy improves this aspect but fails to fully exploit cross-variable dependencies as CM does. Hybrid strategies based on feature fusion offer limited generalization and interpretability. To address these issues, we propose C3RL, a novel representation learning framework that jointly models both CM and CI strategies. Motivated by contrastive learning in computer vision, C3RL treats the inputs of the two strategies as transposed views and builds a siamese network architecture: one strategy serves as the backbone, while the other complements it. By jointly optimizing contrastive and prediction losses with adaptive weighting, C3RL balances representation and forecasting performance. Extensive experiments on seven models show that C3RL boosts the best-case performance rate to 81.4% for models based on the CI strategy and to 76.3% for models based on the CM strategy, demonstrating strong generalization and effectiveness. The code will be available once the paper is accepted.
【2】Principled Multimodal Representation Learning
Link: https://arxiv.org/abs/2507.17343
Authors: iu, Xiaobo Xia, See-Kiong Ng, Tat-Seng Chua
Comments: 32 pages, 9 figures, 10 tables
Abstract: Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities to improve multimodal understanding. Traditional methods often depend on pairwise contrastive learning, which relies on a predefined anchor modality, restricting alignment across all modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain, such as limitations imposed by fixed anchor points and instability arising from optimizing the product of singular values. To address these challenges, we propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities without anchor dependency in a more stable manner. Specifically, grounded in the theoretical insight that full alignment corresponds to a rank-1 Gram matrix, PMRL optimizes the dominant singular value of the representation matrix to align modalities along a shared leading direction. We propose a softmax-based loss function that treats singular values as logits to prioritize the largest singular value. In addition, instance-wise contrastive regularization on the leading eigenvectors maintains inter-instance separability and prevents representation collapse. Extensive experiments across diverse tasks demonstrate PMRL's superiority compared to baseline methods. The source code will be publicly available.
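A softmax loss that treats singular values as logits can be sketched as a cross-entropy whose target is the largest singular value. The sketch below assumes the singular values have already been computed from the stacked modality representations (for example by an SVD routine). The `temperature` parameter and the function name are illustrative assumptions, and the paper's exact scaling may differ.

```python
import math

def leading_singular_value_loss(singular_values, temperature=1.0):
    """Cross-entropy with singular values as logits and the largest as target."""
    logits = [s / temperature for s in singular_values]
    top = max(logits)
    # -log softmax(top) = log(sum_i exp(z_i - top)); shifting by top keeps it stable.
    return math.log(sum(math.exp(z - top) for z in logits))
```

When all of the spectrum's mass sits in one singular value, which is the rank-1 case the abstract associates with full alignment, the loss is small. When the singular values are equal, it attains its maximum of log(n) for n singular values.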
【3】Rethinking VAE: From Continuous to Discrete Representations Without Probabilistic Assumptions
Link: https://arxiv.org/abs/2507.17255
Authors: Shi
Abstract: This paper explores the generative capabilities of Autoencoders (AEs) and establishes connections between Variational Autoencoders (VAEs) and Vector Quantized-Variational Autoencoders (VQ-VAEs) through a reformulated training framework. We demonstrate that AEs exhibit generative potential via latent space interpolation and perturbation, albeit limited by undefined regions in the encoding space. To address this, we propose a new VAE-like training method that introduces clustering centers to enhance data compactness and ensure well-defined latent spaces without relying on traditional KL divergence or reparameterization techniques. Experimental results on the MNIST, CelebA, and FashionMNIST datasets show smooth interpolative transitions, though blurriness persists. Extending this approach to multiple learnable vectors, we observe a natural progression toward a VQ-VAE-like model in continuous space. However, when the encoder outputs multiple vectors, the model degenerates into a discrete Autoencoder (VQ-AE), which combines image fragments without learning semantic representations. Our findings highlight the critical role of encoding space compactness and dispersion in generative modeling and provide insights into the intrinsic connections between VAEs and VQ-VAEs, offering a new perspective on their design and limitations.
Encoders (1 paper)
【1】Confidence Optimization for Probabilistic Encoding
Link: https://arxiv.org/abs/2507.16881
Authors: ia, Yidian Huang, Wenchao Wei, Yuwen Tan
Abstract: Probabilistic encoding introduces Gaussian noise into neural networks, enabling a smooth transition from deterministic to uncertain states and enhancing generalization ability. However, the randomness of Gaussian noise distorts point-based distance measurements in classification tasks. To mitigate this issue, we propose a confidence optimization probabilistic encoding (CPE) method that improves distance reliability and enhances representation learning. Specifically, we refine probabilistic encoding with two key strategies. First, we introduce a confidence-aware mechanism to adjust distance calculations, ensuring consistency and reliability in probabilistic encoding classification tasks. Second, we replace the conventional KL divergence-based variance regularization, which relies on unreliable prior assumptions, with a simpler L2 regularization term that directly constrains variance. The proposed method is model-agnostic, and extensive experiments on natural language classification tasks demonstrate that it significantly improves performance and generalization on both the BERT and RoBERTa models.
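The difference between the two regularizers can be made concrete for a diagonal Gaussian encoding. The sketch below compares the standard closed-form KL divergence to a standard normal prior with one plausible reading of "an L2 term on the variance". The exact form of the paper's penalty, the `weight` parameter, and both function names are assumptions.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def l2_variance_penalty(log_var, weight=1.0):
    """Squared L2 norm of the variance vector, ignoring the mean entirely."""
    return weight * sum(math.exp(lv) ** 2 for lv in log_var)
```

The KL term is minimized only when the mean is zero and the variance is one, so it ties the encoding to the prior. The L2 term constrains the variance alone and leaves the mean free to carry class information.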
Optimization | Convergence (5 papers)
【1】HOTA: Hamiltonian framework for Optimal Transport Advection
Link: https://arxiv.org/abs/2507.17513
Authors: un, Daniil Shlenskii, Maxim Bobrin, Dmitry V. Dylov
Abstract: Optimal transport (OT) has become a natural framework for guiding probability flows. Yet, the majority of recent generative models assume trivial geometry (e.g., Euclidean) and rely on strong density-estimation assumptions, yielding trajectories that do not respect the true principles of optimality in the underlying manifold. We present Hamiltonian Optimal Transport Advection (HOTA), a Hamilton-Jacobi-Bellman based method that tackles the dual dynamical OT problem explicitly through Kantorovich potentials, enabling efficient and scalable trajectory optimization. Our approach avoids the need for explicit density modeling and works even when the cost functionals are non-smooth. Empirically, HOTA outperforms all baselines on standard benchmarks, as well as on custom datasets with non-differentiable costs, in terms of both feasibility and optimality.
【2】DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD
Link: https://arxiv.org/abs/2507.17346
Authors: u, Jingyan Jiang, Chunyang Li, Haotian Dong, Xingguang Wei, Delin Cai, Zhi Wang
Abstract: Distributed machine learning in network environments with high end-to-end latency and low, varying bandwidth undergoes severe throughput degradation. Due to its low communication requirements, distributed SGD (D-SGD) remains the mainstream optimizer in such challenging networks, but it still suffers from significant throughput reduction. To mitigate these limitations, existing approaches typically employ gradient compression and delayed aggregation to alleviate low bandwidth and high latency, respectively. To address both challenges simultaneously, these strategies are often combined, introducing a complex three-way trade-off among compression ratio, staleness (delayed synchronization steps), and model convergence rate. To achieve a balance under varying bandwidth conditions, an adaptive policy is required to dynamically adjust these parameters. Unfortunately, existing works rely on static heuristic strategies due to the lack of theoretical guidance, which prevents them from achieving this goal. This study fills this theoretical gap by introducing a new theoretical tool that decomposes the joint optimization problem into a traditional convergence rate analysis with multiple analyzable noise terms. We are the first to reveal that staleness exponentially amplifies the negative impact of gradient compression on training performance, filling a critical gap in understanding how compressed and delayed gradients affect training. Furthermore, by integrating the convergence rate with a network-aware time minimization condition, we propose DeCo-SGD, which dynamically adjusts the compression ratio and staleness based on the real-time network condition and training task. DeCo-SGD achieves speed-ups of up to 5.07x and 1.37x over D-SGD and a static strategy, respectively, in high-latency and low, varying bandwidth networks.
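The two knobs the abstract discusses, compression ratio and staleness, can be seen together in a toy single-worker loop. The sketch below assumes top-k sparsification as the compression scheme and a fixed delay queue as the source of staleness. Both are common choices used here for illustration only, and the adaptive policy that DeCo-SGD uses to choose these values is not reproduced. All function and parameter names are hypothetical.

```python
from collections import deque

def top_k_compress(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def delayed_sgd(grad_fn, w, lr, k, staleness, steps):
    """SGD in which each compressed gradient is applied `staleness` steps late."""
    pending = deque()
    for _ in range(steps):
        # The gradient is computed at the current weights but only queued.
        pending.append(top_k_compress(grad_fn(w), k))
        if len(pending) > staleness:
            g = pending.popleft()
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

With `staleness=0` and `k` equal to the gradient length this reduces to plain SGD. Lowering `k` discards gradient information and raising `staleness` applies it to weights that have since moved, which is the interaction the paper analyzes.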
【3】Optimal differentially private kernel learning with random projection
标题:基于随机投影的最优差分隐私核学习
链接:https://arxiv.org/abs/2507.17544
作者:e, Cheolwoo Park, Jeongyoun Ahn
备注:110 pages, 12 figures
摘要:差分隐私已经成为隐私保护学习算法发展的基石。这项工作解决了在经验风险最小化(ERM)框架内优化差分私有内核学习的问题。提出了一种新的差分私有核ERM算法,该算法基于高斯过程在再生核希尔伯特空间中的随机投影。我们的方法实现了极大极小最优超额风险的平方损失和Lipschitz光滑凸损失函数下的局部强凸性条件。我们进一步表明,现有的方法的基础上替代降维技术,如随机傅立叶特征映射或$\ell_2 $正则化,产生次优的泛化性能。我们的关键理论贡献还包括推导无量纲的泛化边界的客观扰动为基础的私人线性ERM -标志着第一个这样的结果,不依赖于嘈杂的梯度为基础的机制。此外,我们得到现有的差分私有内核ERM算法更清晰的泛化界。经验评估支持我们的理论主张,表明随机投影可以实现统计上有效和最佳的私有内核学习。这些发现为差异隐私算法的设计提供了新的见解,并突出了降维在平衡隐私和实用性方面的核心作用。
摘要:Differential privacy has become a cornerstone in the development of privacy-preserving learning algorithms. This work addresses optimizing differentially private kernel learning within the empirical risk minimization (ERM) framework. We propose a novel differentially private kernel ERM algorithm based on random projection in the reproducing kernel Hilbert space using Gaussian processes. Our method achieves minimax-optimal excess risk for both the squared loss and Lipschitz-smooth convex loss functions under a local strong convexity condition. We further show that existing approaches based on alternative dimension reduction techniques, such as random Fourier feature mappings or $\ell_2$ regularization, yield suboptimal generalization performance. Our key theoretical contribution also includes the derivation of dimension-free generalization bounds for objective perturbation-based private linear ERM -- marking the first such result that does not rely on noisy gradient-based mechanisms. Additionally, we obtain sharper generalization bounds for existing differentially private kernel ERM algorithms. Empirical evaluations support our theoretical claims, demonstrating that random projection enables statistically efficient and optimally private kernel learning. These findings provide new insights into the design of differentially private algorithms and highlight the central role of dimension reduction in balancing privacy and utility.
【4】Bayesian preference elicitation for decision support in multiobjective optimization
标题:多目标优化决策支持的Bayesian偏好启发
链接:https://arxiv.org/abs/2507.16999
作者:er, Sebastian Rojas Gonzalez, Raul Astudillo
备注:16 pages, 5 figures
摘要:我们提出了一种新的方法,以帮助决策者有效地确定首选的解决方案,从帕累托集的多目标优化问题。我们的方法使用贝叶斯模型来估计决策者的效用函数的基础上成对比较。在该模型的辅助下,一个原则性的启发策略交互地选择查询,以平衡探索和利用,指导发现高效用的解决方案。方法是灵活的:在通过标准多目标优化技术估计帕累托前沿之后,可以交互地或后验地使用它。此外,在启发阶段结束时,它会生成一个精简的高质量解决方案菜单,简化决策过程。通过对多达9个目标的测试问题的实验,我们的方法在用少量查询找到高效用解决方案方面表现出优越的性能。我们还提供了我们的方法的开源实现,以支持更广泛的社区采用它。
摘要:We present a novel approach to help decision-makers efficiently identify preferred solutions from the Pareto set of a multi-objective optimization problem. Our method uses a Bayesian model to estimate the decision-maker's utility function based on pairwise comparisons. Aided by this model, a principled elicitation strategy selects queries interactively to balance exploration and exploitation, guiding the discovery of high-utility solutions. The approach is flexible: it can be used interactively or a posteriori after estimating the Pareto front through standard multi-objective optimization techniques. Additionally, at the end of the elicitation phase, it generates a reduced menu of high-quality solutions, simplifying the decision-making process. Through experiments on test problems with up to nine objectives, our method demonstrates superior performance in finding high-utility solutions with a small number of queries. We also provide an open-source implementation of our method to support its adoption by the broader community.
【5】High-dimensional multidisciplinary design optimization for aircraft eco-design / Optimisation multi-disciplinaire en grande dimension pour l'éco-conception avion en avant-projet
标题:飞机生态设计的高维多学科设计优化/面向飞机初步设计阶段生态设计的高维多学科优化
链接:https://arxiv.org/abs/2402.04711
作者:s
备注:PhD Thesis, Université de Toulouse, Toulouse, 2024 on Gaussian Process kernels for Bayesian optimization in high dimension with mixed and hierarchical variables at ISAE-SUPAERO. Keywords: Gaussian process, Black-box optimization, Bayesian inference, Multidisciplinary design optimization, Mixed hierarchical and categorical inputs, Eco-friendly aircraft design
摘要:这篇哲学博士论文的目的是提出一种有效的方法,用于优化多学科黑箱模型时,优化问题的约束,并涉及大量的混合整数设计变量(通常为100个变量)。有针对性的优化方法,称为EGO,是基于自适应代理模型的顺序富集,在这种情况下,GP代理模型是工程问题中最广泛使用的近似耗时的高保真模型之一。EGO是一种启发式BO方法,在解决方案质量方面表现良好。然而,像任何其他全局优化方法一样,EGO遭受维数灾难,这意味着它的性能在低维问题上是令人满意的,但随着优化搜索空间的维数增加而恶化。对于实际的飞机设计问题,设计变量的典型大小甚至可以超过100,因此,试图直接使用EGO解决问题被排除。当问题涉及连续变量和分类变量时,后者尤其如此,从而增加了搜索空间的大小。在这篇博士论文中,研究了有效的参数化工具,包括偏最小二乘回归等技术,以显着减少设计变量的数量。此外,贝叶斯优化适用于处理离散变量和高维空间,以便在优化创新飞机概念(如“Dragon”混合动力飞机)时减少评估次数,从而减少其对气候的影响。
摘要:The objective of this Philosophiae Doctor (Ph.D) thesis is to propose an efficient approach for optimizing a multidisciplinary black-box model when the optimization problem is constrained and involves a large number of mixed integer design variables (typically 100 variables). The targeted optimization approach, called EGO, is based on a sequential enrichment of an adaptive surrogate model and, in this context, GP surrogate models are one of the most widely used in engineering problems to approximate time-consuming high fidelity models. EGO is a heuristic BO method that performs well in terms of solution quality. However, like any other global optimization method, EGO suffers from the curse of dimensionality, meaning that its performance is satisfactory on lower dimensional problems, but deteriorates as the dimensionality of the optimization search space increases. For realistic aircraft design problems, the typical size of the design variables can even exceed 100 and, thus, trying to solve directly the problems using EGO is ruled out. The latter is especially true when the problems involve both continuous and categorical variables increasing even more the size of the search space. In this Ph.D thesis, effective parameterization tools are investigated, including techniques like partial least squares regression, to significantly reduce the number of design variables. Additionally, Bayesian optimization is adapted to handle discrete variables and high-dimensional spaces in order to reduce the number of evaluations when optimizing innovative aircraft concepts such as the "DRAGON" hybrid airplane to reduce their climate impact.
预测|估计(10篇)
【1】Mindfulness Meditation and Respiration: Accelerometer-Based Respiration Rate and Mindfulness Progress Estimation to Enhance App Engagement and Mindfulness Skills
标题:正念冥想和呼吸:基于加速计的呼吸率和正念进展估计,以增强应用程序参与度和正念技能
链接:https://arxiv.org/abs/2507.17688
作者:Nur Hossain Khan, David Creswell, Jordan Albert, Patrick O'Connell, Shawn Fallon, Mathew Polowitz, Xuhai "Orson" Xu, Bashima Islam
备注:Accepted in Proc. ACM Interact. Mob. Wearable Ubiquitous Technology (IMWUT)
摘要:正念训练因其在减少抑郁、焦虑和孤独方面的益处而被广泛认可。随着基于智能手机的正念应用程序的兴起,数字冥想变得更容易获得,但维持长期的用户参与仍然是一个挑战。本文探讨了呼吸生物信号反馈和正念技能评估是否能提高系统的可用性和技能发展。我们开发了智能手机的基于加速度计的呼吸跟踪算法,无需额外的可穿戴设备。与现有的方法不同,我们的方法准确地捕捉了正念冥想的典型缓慢呼吸模式。此外,我们引入了第一个定量框架,以估计正念技能的浓度,感官清晰度和平静的基础上,加速计衍生的呼吸数据。我们在受控和现实世界的261个正念会话中开发和测试我们的算法。一项用户研究将接收生物信号反馈的实验组与使用标准应用程序的对照组进行了比较,结果表明呼吸反馈增强了系统的可用性。我们的呼吸跟踪模型实现了每分钟1.6次呼吸的平均绝对误差(MAE),与地面真实数据密切相关,而我们的正念技能估计在跟踪技能进展方面达到了80-84%的F1分数。通过将呼吸跟踪和正念估计集成到商业应用程序中,我们展示了智能手机传感器增强数字正念训练的潜力。
摘要:Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a smartphone accelerometer-based respiration tracking algorithm, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills (concentration, sensory clarity, and equanimity) based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.
【2】Generalized Advantage Estimation for Distributional Policy Gradients
标题:分布策略梯度的广义优势估计
链接:https://arxiv.org/abs/2507.17530
作者:aik, Jonathon M. Smereka, Yue Wang
备注:6 pages, 3 figures, published at ACC 2025 Conference
摘要:广义优势估计(GAE)已被用于通过采用优势函数的指数加权估计来减少策略梯度估计的方差来减轻强化学习(RL)的计算复杂性。尽管它的有效性,GAE是不是设计来处理值分布积分的分布RL,它可以捕捉系统中的固有随机性,因此更强大的系统噪声。为了解决这一差距,我们提出了一种新的方法,利用最优传输理论引入一个Wasserstein样的方向性度量,该度量的距离和概率分布之间的方向差异。使用指数加权估计,我们利用这个Wasserstein样的方向度量来推导分布式GAE(DGAE)。类似于传统的GAE,我们提出的DGAE提供了一个低方差的优势估计与控制偏置,使其非常适合于政策梯度算法,依赖于优势估计的政策更新。我们集成DGAE到三个不同的政策梯度方法。算法在各种OpenAI Gym环境中进行了评估,并与传统GAE的基线进行了比较,以评估性能。
摘要:Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimation of the advantage function to reduce the variance in policy gradient estimates. Despite its effectiveness, GAE is not designed to handle value distributions integral to distributional RL, which can capture the inherent stochasticity in systems and is hence more robust to system noises. To address this gap, we propose a novel approach that utilizes the optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancies between probability distributions. Using the exponentially weighted estimation, we leverage this Wasserstein-like directional metric to derive distributional GAE (DGAE). Similar to traditional GAE, our proposed DGAE provides a low-variance advantage estimate with controlled bias, making it well-suited for policy gradient algorithms that rely on advantage estimation for policy updates. We integrated DGAE into three different policy gradient methods. Algorithms were evaluated across various OpenAI Gym environments and compared with the baselines with traditional GAE to assess the performance.
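For reference, the exponentially weighted estimator that the abstract builds on can be sketched as below. This is the classical scalar GAE, not the distributional DGAE proposed in the paper, whose Wasserstein-like directional metric is not specified in the abstract. The function name and the convention that `values` carries one extra bootstrap entry are assumptions for the example.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Classical generalized advantage estimation for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), i.e. one more entry than rewards
    Returns A_t = sum_l (gamma * lam)^l * delta_{t+l},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` gives the one-step TD error (low variance, more bias), while `lam=1` gives the Monte Carlo return minus the baseline (high variance, less bias), which is the bias-variance trade-off the abstract refers to.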
【3】A Low-Cost Machine Learning Approach for Timber Diameter Estimation
标题:木材直径估计的低成本机器学习方法
链接:https://arxiv.org/abs/2507.17219
作者:asanzadeh Fard, Sanaz Hasanzadeh Fard, Mehdi Jonoobi
摘要:木材加工行业,特别是在锯木厂和木材生产线等设施中,需要准确有效地识别木材的种类和厚度。虽然传统方法严重依赖于专业的人力,但它们速度慢,不一致,而且容易出错,特别是在处理大量数据时。这项研究的重点是实用且具有成本效益的机器学习框架,这些框架使用在真实工作条件下捕获的标准RGB图像自动估计原木直径。我们采用YOLOv5对象检测算法,在公共数据集(TimberSeg 1.0)上进行了微调,以检测单个木材日志并通过边界框尺寸估计厚度。与以前需要昂贵传感器或受控环境的方法不同,该模型是在木材运输期间在典型工业棚中拍摄的图像上训练的。实验结果表明,该模型实现了0.64的平均精度(mAP@0.5),即使在适度的计算资源下,也证明了可靠的日志检测。这种轻量级、可扩展的解决方案有望实际集成到现有的工作流程中,包括现场库存管理和初步分拣,特别是在中小型企业中。
摘要:The wood processing industry, particularly in facilities such as sawmills and MDF production lines, requires accurate and efficient identification of species and thickness of the wood. Although traditional methods rely heavily on expert human labor, they are slow, inconsistent, and prone to error, especially when processing large volumes. This study focuses on practical and cost-effective machine learning frameworks that automate the estimation of timber log diameter using standard RGB images captured under real-world working conditions. We employ the YOLOv5 object detection algorithm, fine-tuned on a public dataset (TimberSeg 1.0), to detect individual timber logs and estimate thickness through bounding-box dimensions. Unlike previous methods that require expensive sensors or controlled environments, this model is trained on images taken in typical industrial sheds during timber delivery. Experimental results show that the model achieves a mean Average Precision (mAP@0.5) of 0.64, demonstrating reliable log detection even with modest computing resources. This lightweight, scalable solution holds promise for practical integration into existing workflows, including on-site inventory management and preliminary sorting, particularly in small and medium-sized operations.
【4】Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
标题:Met$^2$Net:面向复杂气象系统的解耦两阶段时空预测模型
链接:https://arxiv.org/abs/2507.17189
作者:i, Hao Yang, Min Chen, Xiaolin Qin
摘要:由于全球气候变化,极端天气事件的频率越来越高,这就要求准确的天气预报。最近,得益于深度学习技术,端到端方法取得了很大的进步,但它们在多变量集成中面临表示不一致的限制,并且难以有效地捕获变量之间的依赖关系,而这正是复杂天气系统所需要的。将不同的变量视为不同的模态并应用多模态模型的两阶段训练方法可以部分缓解这个问题,但由于两个阶段之间的训练任务不一致,结果往往是次优的。为了解决这些挑战,我们提出了一种隐式两阶段训练方法,为每个变量配置单独的编码器和解码器。具体来说,在第一阶段,翻译器被冻结,而编码器和解码器学习共享的潜在空间;在第二阶段,编码器和解码器被冻结,翻译器捕获变量间的相互作用以进行预测。此外,通过在潜在空间中引入自注意力机制进行多变量融合,性能得到进一步提升。大量实验表明,我们的方法达到了最先进的性能。具体地说,它将近地面气温和相对湿度预测的MSE分别降低了28.82%和23.39%。源代码可在https://github.com/ShremG/Met2Net上获得。
摘要:The increasing frequency of extreme weather events due to global climate change urges accurate weather prediction. Recently, great advances have been made by end-to-end methods, thanks to deep learning techniques, but they face limitations of representation inconsistency in multivariable integration and struggle to effectively capture the dependency between variables, which is required in complex weather systems. Treating different variables as distinct modalities and applying a two-stage training approach from multimodal models can partially alleviate this issue, but due to the inconformity in training tasks between the two stages, the results are often suboptimal. To address these challenges, we propose an implicit two-stage training method, configuring separate encoders and decoders for each variable. In detail, in the first stage, the Translator is frozen while the Encoders and Decoders learn a shared latent space; in the second stage, the Encoders and Decoders are frozen, and the Translator captures inter-variable interactions for prediction. Besides, by introducing a self-attention mechanism for multivariable fusion in the latent space, the performance achieves further improvements. Empirically, extensive experiments show the state-of-the-art performance of our method. Specifically, it reduces the MSE for near-surface air temperature and relative humidity predictions by 28.82% and 23.39%, respectively. The source code is available at https://github.com/ShremG/Met2Net.
【5】EVOLVE-X: Embedding Fusion and Language Prompting for User Evolution Forecasting on Social Media
标题:EVOLVE-X:面向社交媒体用户演化预测的嵌入融合与语言提示
链接:https://arxiv.org/abs/2507.16847
作者:ssain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder
备注:We are submitting this paper to ICWSM 2026 conference on September 15th, 2025
摘要:社交媒体平台是分享个人情绪、日常活动和各种生活事件的重要媒介,确保个人随时了解最新发展。从账户开始,用户逐渐扩大他们的朋友圈或关注者,通过发布,评论和分享内容积极参与。随着时间的推移,这些平台上的用户行为会发生变化,受到人口统计属性和他们形成的网络的影响。在这项研究中,我们提出了一种新的方法,利用开源模型Llama-3-Instruct,Mistral-7 B-Instruct,Gemma-7 B-IT通过即时工程,结合GPT-2,BERT,和RoBERTa使用联合嵌入技术,分析和预测用户行为在社交媒体上的演变。我们的实验证明了这些模型的潜力,以预测用户的社会发展的未来阶段,包括网络的变化,未来的连接,并在用户活动的变化。实验结果突出了我们的方法的有效性,GPT-2实现了最低的困惑(8.21)在跨模态配置,优于RoBERTa(9.11)和BERT,并强调了利用跨模态配置的优越性能的重要性。这种方法解决了社交媒体中的关键挑战,例如朋友推荐和活动预测,提供了对用户行为轨迹的洞察。通过预测未来的互动和活动,这项研究旨在提供有关潜在负面结果的早期预警,使用户能够做出明智的决定,并在长期内降低风险。
摘要:Social media platforms serve as a significant medium for sharing personal emotions, daily activities, and various life events, ensuring individuals stay informed about the latest developments. From the initiation of an account, users progressively expand their circle of friends or followers, engaging actively by posting, commenting, and sharing content. Over time, user behavior on these platforms evolves, influenced by demographic attributes and the networks they form. In this study, we present a novel approach that leverages open-source models Llama-3-Instruct, Mistral-7B-Instruct, Gemma-7B-IT through prompt engineering, combined with GPT-2, BERT, and RoBERTa using a joint embedding technique, to analyze and predict the evolution of user behavior on social media over their lifetime. Our experiments demonstrate the potential of these models to forecast future stages of a user's social evolution, including network changes, future connections, and shifts in user activities. Experimental results highlight the effectiveness of our approach, with GPT-2 achieving the lowest perplexity (8.21) in a Cross-modal configuration, outperforming RoBERTa (9.11) and BERT, and underscoring the importance of leveraging Cross-modal configurations for superior performance. This approach addresses critical challenges in social media, such as friend recommendations and activity predictions, offering insights into the trajectory of user behavior. By anticipating future interactions and activities, this research aims to provide early warnings about potential negative outcomes, enabling users to make informed decisions and mitigate risks in the long term.
【6】Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors
标题:大规模量子处理器的高效预测代理演示
链接:https://arxiv.org/abs/2507.17470
作者:iao, Yuxuan Du, Xinbiao Wang, Tian-Ci Tian, Yong Luo, Bo Du, Dacheng Tao, He-Liang Huang
备注:53 pages, 15 figures, comments are welcome
摘要:量子处理器的持续发展正在推动科学发现的突破。尽管取得了这一进展,但制造大规模量子处理器的巨大成本意味着它们在可预见的未来仍然很罕见,限制了它们的广泛应用。为了解决这个瓶颈,我们引入了预测代理的概念,这是一种经典的学习模型,旨在模拟给定量子处理器的平均值行为,具有可证明的计算效率。特别是,我们提出了两个预测代理,可以大大减少在不同的实际情况下对量子处理器访问的需求。为了证明它们在推进数字量子模拟方面的潜力,我们使用这些代理来模拟具有多达20个可编程超导量子位的量子处理器,从而能够有效地预训练横向场伊辛模型家族的变分量子本征解算器,并识别非平衡Floquet拓扑保护相。实验结果表明,预测代理不仅减少了测量开销的数量级,但也可以超越传统的,量子资源密集型的方法的性能。总的来说,这些发现建立了预测替代品,作为扩大先进量子处理器影响的实用途径。
摘要:The ongoing development of quantum processors is driving breakthroughs in scientific discovery. Despite this progress, the formidable cost of fabricating large-scale quantum processors means they will remain rare for the foreseeable future, limiting their widespread application. To address this bottleneck, we introduce the concept of predictive surrogates, which are classical learning models designed to emulate the mean-value behavior of a given quantum processor with provably computational efficiency. In particular, we propose two predictive surrogates that can substantially reduce the need for quantum processor access in diverse practical scenarios. To demonstrate their potential in advancing digital quantum simulation, we use these surrogates to emulate a quantum processor with up to 20 programmable superconducting qubits, enabling efficient pre-training of variational quantum eigensolvers for families of transverse-field Ising models and identification of non-equilibrium Floquet symmetry-protected topological phases. Experimental results reveal that the predictive surrogates not only reduce measurement overhead by orders of magnitude, but can also surpass the performance of conventional, quantum-resource-intensive approaches. Collectively, these findings establish predictive surrogates as a practical pathway to broadening the impact of advanced quantum processors.
【7】Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
标题:高概率Kullback-Leibler分歧下的近极小极大离散分布估计
链接:https://arxiv.org/abs/2507.17316
作者:der Hoeven, Julia Olkhovskaia, Tim van Erven
摘要:We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $\delta$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/\delta) \}/n $, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - \delta$, $\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta)) /n$. Furthermore, we also show that with sufficiently many observations relative to $\log(1/\delta)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-\delta$, $$ 1/6 \chi^2(\bar{p}\|p) \leq 1/4 \chi^2(p\|\bar{p}) \leq \text{KL}(p\|\bar{p}) \leq C(K + \log(1/\delta))/n\,, $$ where $\chi^2$ denotes the $\chi^2$-divergence.
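A small sketch of why the choice of estimator matters for $\text{KL}(p \| \widehat{p})$: the maximum likelihood estimate assigns zero mass to unseen symbols, so the divergence is infinite whenever $p$ puts mass there, which is presumably why the abstract's MLE guarantee requires sufficiently many observations. The add-constant (Laplace) smoother below is a textbook alternative used only for illustration; it is not the $p^{\text{OTB}}$ estimator from the paper, and all function names are assumptions.

```python
import math
from collections import Counter


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf   # p has mass where q has none
        total += pi * math.log(pi / qi)
    return total


def mle(samples, K):
    """Empirical frequencies over symbols 0 .. K-1."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(K)]


def add_constant(samples, K, alpha=1.0):
    """Add-alpha smoothing: (count_i + alpha) / (n + alpha * K)."""
    counts = Counter(samples)
    n = len(samples)
    return [(counts.get(i, 0) + alpha) / (n + alpha * K) for i in range(K)]
```

For example, with samples `[0, 0, 1, 1]` over `K = 3` symbols and a true distribution that gives symbol 2 positive mass, `kl(p, mle(...))` is infinite while `kl(p, add_constant(...))` stays finite.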
【8】CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates
标题:CoLT:用于评估神经后验估计准确性的条件定位测试
链接:https://arxiv.org/abs/2507.17030
作者:en, Vansh Bansal, James G. Scott
摘要:我们考虑验证神经后验估计\(q(\theta \mid x)\)是否是真实的、未知的真实后验\(p(\theta \mid x)\)的精确近似的问题。现有的方法来评估的质量NPE估计主要来自基于分类器的测试或分歧的措施,但这些遭受几个实际的缺点。作为替代方案,我们引入了条件定位测试(CoLT),这是一种原则性的方法,旨在检测整个条件输入范围内的\(p(\theta \mid x)\)和\(q(\theta \mid x)\)之间的差异。CoLT不是依赖于在每个\(x \)处的穷举比较或密度估计,而是学习一个局部化函数,该函数自适应地选择点$\theta_l(x)$,其中神经后验$q$偏离该$x$的真实后验$p$最强烈。这种方法在典型的基于模拟的推理设置中特别有利,其中对于每个条件输入,只观察到来自真实后验的单个绘制\(\theta \sim p(\theta \mid x)\),但是神经后验\(q(\theta \mid x)\)可以被采样任意次数。我们的理论结果建立了评估所有\(x \)分布平等的必要和充分条件,提供了严格的保证和实际的可扩展性。从经验上讲,我们证明了CoLT不仅在比较$p$和$q$方面比现有方法表现更好,而且还精确定位了显著分歧的区域,为模型改进提供了可操作的见解。这些属性将CoLT定位为验证神经后验估计的最先进解决方案。
摘要:We consider the problem of validating whether a neural posterior estimate \( q(\theta \mid x) \) is an accurate approximation to the true, unknown posterior \( p(\theta \mid x) \). Existing methods for evaluating the quality of an NPE estimate are largely derived from classifier-based tests or divergence measures, but these suffer from several practical drawbacks. As an alternative, we introduce the Conditional Localization Test (CoLT), a principled method designed to detect discrepancies between \( p(\theta \mid x) \) and \( q(\theta \mid x) \) across the full range of conditioning inputs. Rather than relying on exhaustive comparisons or density estimation at every \( x \), CoLT learns a localization function that adaptively selects points $\theta_l(x)$ where the neural posterior $q$ deviates most strongly from the true posterior $p$ for that $x$. This approach is particularly advantageous in typical simulation-based inference settings, where only a single draw \( \theta \sim p(\theta \mid x) \) from the true posterior is observed for each conditioning input, but where the neural posterior \( q(\theta \mid x) \) can be sampled an arbitrary number of times. Our theoretical results establish necessary and sufficient conditions for assessing distributional equality across all \( x \), offering both rigorous guarantees and practical scalability. Empirically, we demonstrate that CoLT not only performs better than existing methods at comparing $p$ and $q$, but also pinpoints regions of significant divergence, providing actionable insights for model refinement. These properties position CoLT as a state-of-the-art solution for validating neural posterior estimates.
【9】Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality
标题:通过条件强数据处理不等式进行分布式协方差矩阵估计的基本限制
链接:https://arxiv.org/abs/2507.16953
作者:Reza Rahmani, Mohammad Hossein Yassaee, Mohammad Reza Aref
摘要:Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained. Specifically, we study a scenario in which multiple agents each observe different components of i.i.d. samples drawn from a sub-Gaussian random vector. A central server seeks to estimate the complete covariance matrix using a limited number of bits communicated by each agent. We obtain a nearly tight minimax lower bound for covariance matrix estimation under operator norm and Frobenius norm. Our main technical tool is a novel generalization of the strong data processing inequality (SDPI), termed the Conditional Strong Data Processing Inequality (C-SDPI) coefficient, introduced in this work. The C-SDPI coefficient shares key properties such as tensorization with the conventional SDPI. Crucially, it quantifies the average contraction in a state-dependent channel and can be significantly lower than the worst-case SDPI coefficient over the state input. Utilizing the doubling trick of Geng-Nair and an operator Jensen inequality, we compute this coefficient for Gaussian mixture channels. We then employ it to establish minimax lower bounds on estimation error, capturing the trade-offs among sample size, communication cost, and data dimensionality. Building on this, we present a nearly optimal estimation protocol whose sample and communication requirements match the lower bounds up to logarithmic factors. Unlike much of the existing literature, our framework does not assume infinite samples or Gaussian distributions, making it broadly applicable. Finally, we extend our analysis to interactive protocols, showing interaction can significantly reduce communication requirements compared to non-interactive schemes.
【10】Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages
标题:技术报告:时长预测对印度语言特定说话人TTS的影响
链接:https://arxiv.org/abs/2507.16875
作者:ey, Pranav Gaikwad, Amruta Parulekar, Ganesh Ramakrishnan
摘要:由于数据有限和语言结构多样,低资源语言(如许多印度语言)的高质量语音生成仍然是一个重大挑战。时长预测是许多语音生成流程中的关键组成部分,在韵律和语音节奏建模中起着关键作用。而最近的一些生成方法选择省略显式持续时间建模,通常以更长的训练时间为代价。我们保留和探索这个模块,以更好地了解其在印度语言丰富和数据稀缺的景观的影响。我们训练一个非自回归连续归一化流(CNF)的语音模型使用公开的印度语言数据和评估多个持续时间预测策略zero-shot,扬声器特定的一代。我们对语音填充任务的比较分析揭示了细微的权衡:基于填充的预测器提高了某些语言的可理解性,而说话者提示的预测器更好地保留了其他语言的说话者特征。这些发现为针对特定语言和任务的持续时间策略的设计和选择提供了信息,强调了持续时间预测等可解释组件在适应低资源多语言环境的高级生成架构方面的持续价值。
摘要:High-quality speech generation for low-resource languages, such as many Indian languages, remains a significant challenge due to limited data and diverse linguistic structures. Duration prediction is a critical component in many speech generation pipelines, playing a key role in modeling prosody and speech rhythm. While some recent generative approaches choose to omit explicit duration modeling, often at the cost of longer training times, we retain and explore this module to better understand its impact in the linguistically rich and data-scarce landscape of India. We train a non-autoregressive Continuous Normalizing Flow (CNF) based speech model using publicly available Indian language data and evaluate multiple duration prediction strategies for zero-shot, speaker-specific generation. Our comparative analysis on speech-infilling tasks reveals nuanced trade-offs: infilling-based predictors improve intelligibility in some languages, while speaker-prompted predictors better preserve speaker characteristics in others. These findings inform the design and selection of duration strategies tailored to specific languages and tasks, underscoring the continued value of interpretable components like duration prediction in adapting advanced generative architectures to low-resource, multilingual settings.
其他神经网络|深度学习|模型|建模(20篇)
【1】Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
标题:高学习率同时实现对虚假相关性和可压缩性的鲁棒性
链接:https://arxiv.org/abs/2507.17748
作者:sbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal
备注:Accepted at ICCV 2025, 23 pages
摘要:鲁棒性和资源效率是现代机器学习模型的两个非常理想的属性。然而,共同实现这些目标仍然是一项挑战。在本文中,我们将高学习率定位为同时实现对虚假相关性和网络压缩性的鲁棒性的促进者。我们证明,大的学习率也产生理想的表示属性,如不变特征利用率,类分离,激活稀疏。重要的是,我们的研究结果表明,大学习率与其他超参数和正则化方法相比,在一致地满足这些属性方面是有利的。除了在不同的虚假相关数据集、模型和优化器中证明大学习率的积极影响外,我们还提出了强有力的证据,证明之前在标准分类任务中大学习率的成功可能是由于它对解决训练数据集中隐藏/罕见的虚假相关性的影响。
摘要:Robustness and resource efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to their effect on addressing hidden/rare spurious correlations in the training dataset.
【2】Joint Asymmetric Loss for Learning with Noisy Labels
标题:面向含噪标签学习的联合非对称损失
链接:https://arxiv.org/abs/2507.17692
作者:Wang, Xianming Liu, Xiong Zhou, Gangfeng Hu, Deming Zhai, Junjun Jiang, Xiangyang Ji
备注:Accepted by ICCV 2025
摘要:使用噪声标签进行学习是训练准确的深度神经网络的关键任务。为了减轻标签噪声,先前的研究已经提出了各种鲁棒损失函数,特别是对称损失。然而,由于过于严格的约束,对称损耗通常遭受欠拟合问题。为了解决这个问题,主动被动损耗(APL)联合优化主动和被动损耗以相互增强整体拟合能力。在APL中,对称损失已经成功地扩展,产生了先进的鲁棒损失函数。尽管有这些进步,新兴的理论分析表明,非对称损失,一类新的鲁棒损失函数,具有优越的性能相比,对称损失。然而,现有的非对称损耗与高级优化框架(如APL)不兼容,限制了它们的潜力和适用性。出于这种理论上的差距和不对称损失的前景,我们扩展的不对称损失更复杂的被动损失的情况下,提出了不对称均方误差(AMSE),一种新的不对称损失。我们严格地建立了AMSE满足非对称条件的充分必要条件。通过用我们提出的AMSE代替APL中传统的对称无源损耗,我们引入了一个新的鲁棒损耗框架,称为联合非对称损耗(JAL)。大量的实验表明,我们的方法在减轻标签噪声的有效性。代码可在:https://github.com/cswjl/joint-asymmetric-loss
摘要:Learning with noisy labels is a crucial task for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions, particularly symmetric losses. Nevertheless, symmetric losses usually suffer from underfitting due to their overly strict constraint. To address this problem, the Active Passive Loss (APL) jointly optimizes an active and a passive loss to mutually enhance the overall fitting ability. Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions. Despite these advancements, emerging theoretical analyses indicate that asymmetric losses, a new class of robust loss functions, possess superior properties compared to symmetric losses. However, existing asymmetric losses are not compatible with advanced optimization frameworks such as APL, limiting their potential and applicability. Motivated by this theoretical gap and the prospect of asymmetric losses, we extend the asymmetric loss to the more complex passive loss scenario and propose the Asymmetric Mean Square Error (AMSE), a novel asymmetric loss. We rigorously establish the necessary and sufficient condition under which AMSE satisfies the asymmetric condition. By substituting the traditional symmetric passive loss in APL with our proposed AMSE, we introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL). Extensive experiments demonstrate the effectiveness of our method in mitigating label noise. Code available at: https://github.com/cswjl/joint-asymmetric-loss
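To make the active/passive structure concrete, the sketch below combines two well-known symmetric losses, normalized cross entropy (active) and mean absolute error (passive), in the APL form `alpha * active + beta * passive`. It does not implement AMSE or JAL, whose definitions are not given in the abstract; the function names and default weights are assumptions for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nce(probs, y):
    """Normalized cross entropy: log p_y / sum_k log p_k (active loss)."""
    return math.log(probs[y]) / sum(math.log(p) for p in probs)


def mae(probs, y):
    """Mean absolute error against the one-hot label (passive loss)."""
    return sum(abs(p - (1.0 if k == y else 0.0)) for k, p in enumerate(probs))


def apl(logits, y, alpha=1.0, beta=1.0):
    """Active Passive Loss sketch: alpha * NCE + beta * MAE."""
    probs = softmax(logits)
    return alpha * nce(probs, y) + beta * mae(probs, y)
```

Both terms are symmetric in the sense used in the robust-loss literature: summed over all possible labels, each gives a constant that does not depend on the prediction (1 for NCE and 2(K-1) for MAE with K classes).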
【3】XStacking: Explanation-Guided Stacked Ensemble Learning
标题:XStacking:解释引导的堆叠集成学习
链接:https://arxiv.org/abs/2507.17650
作者:rouani, Ayah Barhrhouj, Olivier Teste
摘要:集成机器学习(EML)技术,特别是堆叠,已被证明可以通过组合多个基础模型来提高预测性能。然而,它们经常因缺乏可解释性而受到批评。在本文中,我们介绍了XStacking,一个有效且内在可解释的框架,它通过将动态特征转换与模型无关的Shapley加性解释相结合来解决这一局限。这使得堆叠模型能够保持其预测准确性,同时变得内在可解释。我们在29个数据集上证明了该框架的有效性,在学习空间的预测有效性和所得模型的可解释性方面都取得了改进。XStacking为负责任的ML提供了实用且可扩展的解决方案。
摘要:Ensemble Machine Learning (EML) techniques, especially stacking, have been shown to improve predictive performance by combining multiple base models. However, they are often criticized for their lack of interpretability. In this paper, we introduce XStacking, an effective and inherently explainable framework that addresses this limitation by integrating dynamic feature transformation with model-agnostic Shapley additive explanations. This enables stacked models to retain their predictive accuracy while becoming inherently explainable. We demonstrate the effectiveness of the framework on 29 datasets, achieving improvements in both the predictive effectiveness of the learning space and the interpretability of the resulting models. XStacking offers a practical and scalable solution for responsible ML.
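To make the "Shapley additive explanations" ingredient concrete, here is a minimal sketch of exact Shapley values for a small set function. It illustrates the attribution step only, not the XStacking pipeline; `shapley_values` and `toy_value` are hypothetical names, and the toy model is an assumption chosen so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(subset, x=(1.0, 1.0)):
    # Toy model 2*x0 + 3*x1 + 4*x0*x1, with absent features set to a zero baseline.
    x0 = x[0] if 0 in subset else 0.0
    x1 = x[1] if 1 in subset else 0.0
    return 2.0 * x0 + 3.0 * x1 + 4.0 * x0 * x1


print(shapley_values(toy_value, 2))  # [4.0, 5.0]: the interaction term is split evenly
```

The attributions sum to the difference between the full prediction and the baseline (9 here), which is the additivity property that makes these explanations easy to pass on as features.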
【4】Integrating Physics-Based and Data-Driven Approaches for Probabilistic Building Energy Modeling
标题:集成基于物理和数据驱动的方法进行概率建筑能源建模
链接:https://arxiv.org/abs/2507.17526
作者:on Krannichfeldt, Kristina Orehounig, Olga Fink
摘要:建筑能源建模是优化建筑能源系统性能的关键工具。从历史上看,已经探索了各种方法-从传统的基于物理的模型到纯粹的数据驱动技术。最近,结合了两种范式的优势的混合方法受到了关注。这些策略包括学习基于物理的模型的代理,模拟和观察数据之间的残差建模,使用真实世界的测量值微调代理,使用基于物理的输出作为数据驱动模型的额外输入,以及将基于物理的输出集成到数据驱动模型的损失函数中。尽管取得了这些进展,但仍然存在两个重大的研究空白。首先,大多数混合方法侧重于确定性建模,往往忽略了由天气波动和居住者行为等因素引起的固有不确定性。其次,在概率建模框架内很少有系统的比较。本研究通过评估五种具有代表性的混合方法来解决这些差距,用于概率建筑能源建模,重点关注真实案例研究中建筑热力学的分位数预测。我们的研究结果突出了两个主要发现。首先,混合方法的性能在不同的建筑房间类型中有所不同,但平均而言,前馈神经网络的残差学习性能最好。值得注意的是,残差方法是唯一的模型,产生物理直观的预测时,适用于分布外的测试数据。其次,分位数共形预测是在室内温度建模的情况下校准分位数预测的有效过程。
摘要:Building energy modeling is a key tool for optimizing the performance of building energy systems. Historically, a wide spectrum of methods has been explored -- ranging from conventional physics-based models to purely data-driven techniques. Recently, hybrid approaches that combine the strengths of both paradigms have gained attention. These include strategies such as learning surrogates for physics-based models, modeling residuals between simulated and observed data, fine-tuning surrogates with real-world measurements, using physics-based outputs as additional inputs for data-driven models, and integrating the physics-based output into the loss function of the data-driven model. Despite this progress, two significant research gaps remain. First, most hybrid methods focus on deterministic modeling, often neglecting the inherent uncertainties caused by factors like weather fluctuations and occupant behavior. Second, there has been little systematic comparison within a probabilistic modeling framework. This study addresses these gaps by evaluating five representative hybrid approaches for probabilistic building energy modeling, focusing on quantile predictions of building thermodynamics in a real-world case study. Our results highlight two main findings. First, the performance of hybrid approaches varies across different building room types, but residual learning with a Feedforward Neural Network performs best on average. Notably, the residual approach is the only model that produces physically intuitive predictions when applied to out-of-distribution test data. Second, Quantile Conformal Prediction is an effective procedure for calibrating quantile predictions in the case of indoor temperature modeling.
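The two probabilistic ingredients named in this abstract, quantile prediction and conformal calibration, can be sketched in a few lines. The example assumes the standard pinball loss and the usual conformalized-quantile-regression margin (the ceil((n+1)(1-alpha))-th smallest conformity score); the temperature values are invented, and the paper's own procedure may differ.

```python
import math


def pinball_loss(y_true, y_pred, tau):
    # Quantile (pinball) loss for quantile level tau in (0, 1).
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


def conformal_margin(cal_lower, cal_upper, cal_targets, alpha):
    """Margin to add to both ends of a predicted interval for ~(1 - alpha) coverage."""
    # Conformity score: how far the target falls outside the predicted interval
    # (negative when it lies inside).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(cal_lower, cal_upper, cal_targets))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


# Hypothetical calibration set: predicted indoor-temperature interval [20, 22] degrees C.
targets = [21.0, 20.5, 21.5, 22.5, 19.5, 21.0, 23.0, 20.0, 22.0]
margin = conformal_margin([20.0] * 9, [22.0] * 9, targets, alpha=0.25)
print(margin)  # 0.5, so the calibrated interval becomes [19.5, 22.5]
```

A negative margin is possible and simply shrinks intervals that were too conservative on the calibration set.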
【5】BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles
标题:BGM-HAN:一个分层注意力网络,用于对半结构化配置文件进行准确和公平的决策评估
链接:https://arxiv.org/abs/2507.17472
作者:u, Roy Ka-Wei Lee, Kwan Hui Lim
备注:Accepted at ASONAM 2025
摘要:人类在高风险领域的决策通常依赖于专业知识和经验,但容易受到难以察觉的认知偏见的影响,这些偏见威胁到公平和长期结果。这项工作提出了一种新的方法,通过将分层学习与各种增强功能集成来增强复杂的决策工作流程。针对大学招生作为一个代表性的高风险领域,我们提出了BGM-HAN,一个增强的字节对编码,门控多头层次注意力网络,旨在有效地建模半结构化的申请人数据。BGM-HAN捕获了对细微评估至关重要的多层次表示,提高了可解释性和预测性能。对真实招生数据的实验结果表明,我们提出的模型显着优于传统机器学习到大型语言模型的最新基线,为增强结构,上下文和公平性重要的领域的决策提供了一个有前途的框架。源代码可从以下网址获得:https://github.com/junhua/bgm-han。
摘要:Human decision-making in high-stakes domains often relies on expertise and heuristics, but is vulnerable to hard-to-detect cognitive biases that threaten fairness and long-term outcomes. This work presents a novel approach to enhancing complex decision-making workflows through the integration of hierarchical learning alongside various enhancements. Focusing on university admissions as a representative high-stakes domain, we propose BGM-HAN, an enhanced Byte-Pair Encoded, Gated Multi-head Hierarchical Attention Network, designed to effectively model semi-structured applicant data. BGM-HAN captures multi-level representations that are crucial for nuanced assessment, improving both interpretability and predictive performance. Experimental results on real admissions data demonstrate that our proposed model significantly outperforms state-of-the-art baselines ranging from traditional machine learning to large language models, offering a promising framework for augmenting decision-making in domains where structure, context, and fairness matter. Source code is available at: https://github.com/junhua/bgm-han.
【6】Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees
标题:通过分支定界树的顺序引导探索实现高效神经网络验证
链接:https://arxiv.org/abs/2507.17453
作者:hang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui
备注:This is an extended version of the ECOOP 2025 paper, with a comparison with DATE 2025 (Figure 7 of RQ1 in Section 5.2), as well as an in-depth discussion of OOPSLA 2025 in the related work (Section 6)
摘要:神经网络对对抗性扰动的脆弱性使得形式化验证技术成为必要,这些技术可以严格证明神经网络的质量。作为最先进的分支定界(BaB)是一种“分而治之”的策略,它将现成的验证器应用于它们表现更好的子问题。虽然BaB可以识别需要拆分的子问题,但它以天真的“先到先得”方式探索这些子问题的空间,从而遭受低效率问题以得出验证结论。为了弥合这一差距,我们引入了一个顺序,在不同的子问题产生的BaB,关于他们不同的可能性包含反例。基于这个顺序,我们提出了一种新的验证框架Oliva,它通过优先考虑那些更有可能找到反例的子问题来探索子问题空间,以有效地得出验证的结论。即使在任何子问题中找不到反例,它也只是改变了访问不同子问题的顺序,因此不会导致性能下降。具体来说,Oliva有两种变体,包括$Oliva^{GR}$,一种总是优先考虑更有可能找到反例的子问题的贪婪策略,以及$Oliva^{SA}$,一种受模拟退火启发的平衡策略,逐渐从探索转向利用,以找到全局最优的子问题。我们通过实验评估了Oliva在690个验证问题上的性能,这些问题跨越了5个模型,数据集为MNIST和CIFAR 10。与最先进的方法相比,我们证明了Oliva在MNIST中的加速高达25倍,在CIFAR 10中高达80倍。
摘要:The vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state of the art, branch and bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that need to be split, it explores the space of these sub-problems in a naive "first-come-first-serve" manner, and is therefore inefficient in reaching a verification conclusion. To bridge this gap, we introduce an order over the different sub-problems produced by BaB, according to their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to contain counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problems and so will not lead to a performance degradation. Specifically, Oliva has two variants, including $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to contain counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning 5 models with datasets MNIST and CIFAR10. Compared to the state-of-the-art approaches, we demonstrate that Oliva achieves speedups of up to 25X on MNIST and up to 80X on CIFAR10.
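The difference between first-come-first-serve and prioritized exploration can be shown on a toy branch-and-bound verifier. This is a sketch under stated assumptions, not Oliva itself. The property is f(x) >= 0 on an interval, the "verifier" is a Lipschitz lower bound, and the priority is that lower bound (a lower bound is taken as a proxy for a higher chance of containing a counterexample). The name `verify_nonnegative` and both test functions are hypothetical.

```python
import heapq
from collections import deque


def verify_nonnegative(f, lipschitz, lo, hi, greedy, tol=1e-9):
    """Branch-and-bound check of f(x) >= 0 on [lo, hi].

    Returns (counterexample or None, number of sub-problems visited).
    """
    def lower_bound(a, b):
        # Sound lower bound on f over [a, b] for a Lipschitz-continuous f.
        return f(0.5 * (a + b)) - lipschitz * 0.5 * (b - a)

    frontier = [(lower_bound(lo, hi), lo, hi)]
    if not greedy:
        frontier = deque(frontier)  # FIFO order, i.e. first-come-first-serve
    visited = 0
    while frontier:
        bound, a, b = heapq.heappop(frontier) if greedy else frontier.popleft()
        visited += 1
        mid = 0.5 * (a + b)
        if f(mid) < 0:
            return mid, visited  # concrete counterexample found
        if bound >= 0 or b - a < tol:
            continue  # sub-problem verified (or too small to refine in this toy)
        for child_lo, child_hi in ((a, mid), (mid, b)):
            item = (lower_bound(child_lo, child_hi), child_lo, child_hi)
            if greedy:
                heapq.heappush(frontier, item)  # most suspicious sub-problem first
            else:
                frontier.append(item)
    return None, visited


def violated(x):
    return abs(x - 0.7) - 0.05  # negative on (0.65, 0.75)


def safe(x):
    return abs(x - 0.7) + 0.05  # non-negative everywhere
```

On `violated`, the greedy order reaches a counterexample after fewer visits than the FIFO order. On `safe`, both orders visit the same sub-problems, which matches the abstract's remark that reordering alone does not cost anything when no counterexample exists.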
【7】Continual Generalized Category Discovery: Learning and Forgetting from a Bayesian Perspective
标题:连续广义范畴发现:贝叶斯视角下的学习与遗忘
链接:https://arxiv.org/abs/2507.17382
作者:Jagmohan Chauhan
备注:20 pages, 6 figures. Forty-second International Conference on Machine Learning. 2025
摘要:连续广义类别发现(C-GCD)面临着一个关键的挑战:增量学习新的类从未标记的数据流,同时保持旧类的知识。现有的方法与灾难性的遗忘作斗争,特别是当未标记的数据混合了已知和新的类别时。我们通过从贝叶斯角度分析C-GCD的遗忘动态来解决这个问题,揭示了新旧类之间的协方差不一致会导致性能下降。基于这一认识,我们提出了变分贝叶斯C-GCD(VB-CGCD),一个新的框架,集成了变分推理与协方差感知的最近类均值分类。VB-CGCD自适应地对齐类分布,同时通过随机变分更新来抑制伪标签噪声。实验表明,VB-CGCD超过现有技术+15.21%,在标准基准测试的最终会话中的整体准确度。我们还引入了一个新的具有挑战性的基准,只有10%的标记数据和扩展的在线阶段,VB-CGCD达到了67.86%的最终准确率,显著高于最先进的水平(38.55%),证明了其在不同场景中的强大适用性。代码可从以下网址获得:https://github.com/daihao42/VB-CGCD
摘要:Continual Generalized Category Discovery (C-GCD) faces a critical challenge: incrementally learning new classes from unlabeled data streams while preserving knowledge of old classes. Existing methods struggle with catastrophic forgetting, especially when unlabeled data mixes known and novel categories. We address this by analyzing C-GCD's forgetting dynamics through a Bayesian lens, revealing that covariance misalignment between old and new classes drives performance degradation. Building on this insight, we propose Variational Bayes C-GCD (VB-CGCD), a novel framework that integrates variational inference with covariance-aware nearest-class-mean classification. VB-CGCD adaptively aligns class distributions while suppressing pseudo-label noise via stochastic variational updates. Experiments show VB-CGCD surpasses prior art by 15.21% in overall accuracy in the final session on standard benchmarks. We also introduce a new challenging benchmark with only 10% labeled data and extended online phases, on which VB-CGCD achieves a 67.86% final accuracy, significantly higher than state-of-the-art (38.55%), demonstrating its robust applicability across diverse scenarios. Code is available at: https://github.com/daihao42/VB-CGCD
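Why covariance matters for nearest-class-mean classification can be seen in a two-class toy case. The sketch below assumes diagonal covariances and a plain Mahalanobis-style distance. It is an illustration of the idea only, not the VB-CGCD classifier, and the class statistics are invented.

```python
def nearest_class_mean(x, class_stats, covariance_aware=True):
    """Pick the class whose mean is closest to x.

    class_stats maps label -> (mean, per-dimension variances).
    With covariance_aware=False the variances are ignored (Euclidean distance).
    """
    def distance(mean, variances):
        return sum(
            (xi - mi) ** 2 / (vi if covariance_aware else 1.0)
            for xi, mi, vi in zip(x, mean, variances)
        )

    return min(class_stats, key=lambda label: distance(*class_stats[label]))


# Hypothetical statistics: a broad old class and a tight new class.
class_stats = {
    "old": ((0.0, 0.0), (4.0, 4.0)),
    "new": ((3.0, 0.0), (0.25, 0.25)),
}
point = (2.0, 0.0)
print(nearest_class_mean(point, class_stats, covariance_aware=False))  # new
print(nearest_class_mean(point, class_stats, covariance_aware=True))   # old
```

The same point flips class once each class's spread is taken into account. This is the kind of old-versus-new mismatch the abstract attributes to covariance misalignment.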
【8】A Learning-based Domain Decomposition Method
标题:一种基于学习的区域分解方法
链接:https://arxiv.org/abs/2507.17328
作者:ikola Kovachki, Burigede Liu
摘要:机械、航空航天和结构工程领域的最新发展推动了对更大、更复杂尺度下结构建模和分析的有效方法的需求不断增长。虽然有限元法等已建立的数值方法仍然可靠,但在处理大型和几何复杂问题时,它们通常会与计算成本和可扩展性作斗争。近年来,基于神经网络的方法已经显示出希望,因为它们能够有效地近似非线性映射。然而,大多数现有的神经方法仍然在很大程度上限于简单的域,这使得它很难适用于现实世界中涉及复杂几何形状的偏微分方程。在本文中,我们提出了一种基于学习的区域分解方法(L-DDM),解决了这一差距。我们的方法使用一个单独的,预先训练的神经操作员-最初在简单的领域上训练-作为域分解方案中的代理模型,使我们能够有效地处理大型和复杂的领域。我们提供了一个一般的理论结果的存在性神经算子近似的背景下,区域分解解决方案的抽象偏微分方程。然后,我们证明了我们的方法,准确地近似的解决方案,椭圆偏微分方程的不连续的微观结构在复杂的几何形状,使用物理预训练的神经操作员(PPNO)。我们的研究结果表明,这种方法不仅在这些具有挑战性的问题上优于当前最先进的方法,而且还提供了分辨率不变性和对训练过程中看不到的微观结构模式的强大泛化能力。
摘要:Recent developments in mechanical, aerospace, and structural engineering have driven a growing need for efficient ways to model and analyse structures at much larger and more complex scales than before. While established numerical methods like the Finite Element Method remain reliable, they often struggle with computational cost and scalability when dealing with large and geometrically intricate problems. In recent years, neural network-based methods have shown promise because of their ability to efficiently approximate nonlinear mappings. However, most existing neural approaches are still largely limited to simple domains, which makes them difficult to apply to real-world PDEs involving complex geometries. In this paper, we propose a learning-based domain decomposition method (L-DDM) that addresses this gap. Our approach uses a single, pre-trained neural operator, originally trained on simple domains, as a surrogate model within a domain decomposition scheme, allowing us to tackle large and complicated domains efficiently. We provide a general theoretical result on the existence of neural operator approximations in the context of domain decomposition solution of abstract PDEs. We then demonstrate our method by accurately approximating solutions to elliptic PDEs with discontinuous microstructures in complex geometries, using a physics-pretrained neural operator (PPNO). Our results show that this approach not only outperforms current state-of-the-art methods on these challenging problems, but also offers resolution-invariance and strong generalization to microstructural patterns unseen during training.
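The domain decomposition loop that a learned surrogate would plug into can be sketched on a one-dimensional Laplace problem. This is a minimal alternating Schwarz iteration under simplifying assumptions. The `local_solver` here is the exact solution of u'' = 0 on a sub-interval and stands in for the pre-trained neural operator; the grid size and overlap are arbitrary choices.

```python
def local_solver(left_value, right_value, n_points):
    # Stand-in for a learned surrogate: exact solution of u'' = 0 on a sub-interval,
    # i.e. a straight line between the two boundary values.
    step = (right_value - left_value) / (n_points - 1)
    return [left_value + k * step for k in range(n_points)]


def schwarz_1d(n=11, overlap_lo=4, overlap_hi=6, iterations=50):
    """Alternating Schwarz for u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1."""
    u = [0.0] * n
    u[-1] = 1.0
    for _ in range(iterations):
        # Sub-domain 1 covers grid indices 0..overlap_hi; its right boundary value
        # is read from the current global iterate.
        u[: overlap_hi + 1] = local_solver(u[0], u[overlap_hi], overlap_hi + 1)
        # Sub-domain 2 covers indices overlap_lo..n-1; its left boundary value
        # comes from the sub-domain 1 solve just performed.
        u[overlap_lo:] = local_solver(u[overlap_lo], u[-1], n - overlap_lo)
    return u


solution = schwarz_1d()
```

With these settings the interface value contracts toward the true solution u(x) = x by a factor of 4/9 per sweep, so the overlap width directly controls how many surrogate calls are needed.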
【9】Confounded Causal Imitation Learning with Instrumental Variables
标题:具有工具变量的混淆因果模仿学习
链接:https://arxiv.org/abs/2507.17309
作者: Shenglan Nie, Feng Xie, Libo Huang, Peng Wu, Zhi Geng
备注:12 pages, 6 figures
摘要:从演示中进行模仿学习通常会受到不可测量变量的混淆影响(即,不可测量的混杂因素)对状态和动作的影响。如果忽略这些因素,将导致对政策的偏颇估计。为了打破这种混淆的差距,在本文中,我们采取的工具变量(IV)的强大力量,并提出了一个混淆的因果模仿学习(C2L)模型。该模型可容纳跨多个时间步影响动作的混杂因素,而不是仅限于即时的时间依赖性。我们开发了一个两阶段的模仿学习框架,有效的IV识别和政策优化。特别地,在第一阶段,我们基于定义的伪变量构造了一个测试准则,利用该准则,我们实现了对C2L模型的有效IV的识别。这一标准为IV有效性提供了充分必要的可识别性条件。在第二阶段,根据确定的IV,我们提出了两种候选政策学习方法:一种是基于模拟器的,而另一种是离线的。大量的实验验证了识别有效IV以及学习策略的有效性。
摘要:Imitation learning from demonstrations usually suffers from the confounding effects of unmeasured variables (i.e., unmeasured confounders) on the states and actions. Ignoring them leads to a biased estimate of the policy. To close this confounding gap, in this paper, we harness the power of instrumental variables (IV) and propose a Confounded Causal Imitation Learning (C2L) model. This model accommodates confounders that influence actions across multiple timesteps, rather than being restricted to immediate temporal dependencies. We develop a two-stage imitation learning framework for valid IV identification and policy optimization. In particular, in the first stage, we construct a testing criterion based on the defined pseudo-variable, with which we identify a valid IV for the C2L models. Such a criterion entails the sufficient and necessary identifiability conditions for IV validity. In the second stage, with the identified IV, we propose two candidate policy learning approaches: one is based on a simulator, while the other is offline. Extensive experiments verify the effectiveness of identifying the valid IV as well as learning the policy.
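The bias that an unmeasured confounder introduces, and how an instrument removes it, can be checked with a tiny linear example. This is a generic single-instrument estimator, not the C2L procedure. The data are constructed by hand so that the instrument `z` is exactly uncorrelated with the confounder `u`, and the true effect of `x` on `y` is 2.

```python
def covariance(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    # Ordinary regression slope; biased when x and y share a hidden cause.
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    # Instrumental-variable (Wald) estimator: uses only the variation in x driven by z.
    return covariance(z, y) / covariance(z, x)


z = [1.0, -1.0, 1.0, -1.0]                        # instrument
u = [1.0, 1.0, -1.0, -1.0]                        # unmeasured confounder
x = [zi + ui for zi, ui in zip(z, u)]             # "state": driven by z and u
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # "action": true effect of x is 2

print(ols_slope(x, y))    # 3.5, biased upward by the confounder
print(iv_slope(z, x, y))  # 2.0, the true effect
```

The estimate is only valid if the instrument really is independent of the confounder. This is why the abstract devotes a whole stage to testing IV validity before any policy learning.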
【10】Data Virtualization for Machine Learning
标题:机器学习的数据虚拟化
链接:https://arxiv.org/abs/2507.17293
作者:an, Joyraj Chakraborty, Philip Beaucamp, Niraj Bhujel, Min Chen
摘要:如今,机器学习(ML)团队针对不同的应用程序拥有多个并发ML工作流。每个工作流通常涉及许多实验、迭代和协作活动,从初始数据整理到模型部署通常需要数月甚至数年的时间。从组织层面来看,有大量的中间数据需要存储、处理和维护。数据虚拟化成为基础设施中服务ML工作流的关键技术。在本文中,我们提出了一个数据虚拟化服务的设计和实现,重点是它的服务架构和服务操作。该基础设施目前支持六个ML应用程序,每个应用程序都有一个以上的ML工作流。数据虚拟化服务允许应用程序和工作流程的数量在未来几年内增长。
摘要:Nowadays, machine learning (ML) teams have multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities and commonly takes months and sometimes years from initial data wrangling to model deployment. Organizationally, there is a large amount of intermediate data to be stored, processed, and maintained. Data virtualization becomes a critical technology in an infrastructure to serve ML workflows. In this paper, we present the design and implementation of a data virtualization service, focusing on its service architecture and service operations. The infrastructure currently supports six ML applications, each with more than one ML workflow. The data virtualization service allows the number of applications and workflows to grow in the coming years.
【11】P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
标题:P3SL:异构边缘设备上的个性化隐私保护拆分学习
链接:https://arxiv.org/abs/2507.17228
作者:JinYi Yoon, Xiaochang Li, Huajie Shao, Bo Ji
备注:Accepted as invited paper in The 34th International Conference on Computer Communications and Networks (ICCCN 2025)
摘要:拆分学习(SL)是一种新兴的隐私保护机器学习技术,通过将模型划分为客户端和服务器端子模型,使资源受限的边缘设备能够参与模型训练。虽然SL减少了边缘设备上的计算开销,但它在异构环境中遇到了重大挑战,其中设备在计算资源,通信能力,环境条件和隐私要求方面各不相同。尽管最近的研究已经探索了异构SL框架,这些框架可以优化具有不同资源约束的设备的分割点,但它们通常忽略了不同环境条件下的个性化隐私要求和本地模型定制。为了解决这些局限性,我们提出了P3SL,这是一个个性化的隐私保护分割学习框架,专为异构的、资源受限的边缘设备系统而设计。这项工作的主要贡献是双重的。首先,我们设计了一个个性化的顺序分割学习管道,允许每个客户端实现定制的隐私保护,并根据其计算资源,环境条件和隐私需求维护个性化的本地模型。其次,我们采用了一种双层优化技术,该技术使客户端能够在不共享私人敏感信息的情况下确定自己的最佳个性化分割点(即,计算资源、环境条件、隐私要求)。这种方法平衡了能耗和隐私泄露风险,同时保持了较高的模型精度。我们在一个由7台设备组成的测试平台上实现和评估P3SL,其中包括4台Jetson Nano P3450设备,2台Raspberry Pis和1台笔记本电脑,在不同的环境条件下使用不同的模型架构和数据集。
摘要:Split Learning (SL) is an emerging privacy-preserving machine learning technique that enables resource constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. While SL reduces computational overhead on edge devices, it encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. Although recent studies have explored heterogeneous SL frameworks that optimize split points for devices with varying resource constraints, they often neglect personalized privacy requirements and local model customization under varying environmental conditions. To address these limitations, we propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems. The key contributions of this work are twofold. First, we design a personalized sequential split learning pipeline that allows each client to achieve customized privacy protection and maintain personalized local models tailored to their computational resources, environmental conditions, and privacy needs. Second, we adopt a bi-level optimization technique that empowers clients to determine their own optimal personalized split points without sharing private sensitive information (i.e., computational resources, environmental conditions, privacy requirements) with the server. This approach balances energy consumption and privacy leakage risks while maintaining high model accuracy. We implement and evaluate P3SL on a testbed consisting of 7 devices including 4 Jetson Nano P3450 devices, 2 Raspberry Pis, and 1 laptop, using diverse model architectures and datasets under varying environmental conditions.
【12】Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
标题:通过人在环指导,使自我改进的代理人能够在测试时间学习
链接:https://arxiv.org/abs/2507.17131
作者: Ruoyu Li, Alex Chen, Yue Liu, Yulin Chen, Yuan Sui, Cheng Chen, Yi Zhu, Luca Luo, Frank Yang, Bryan Hooi
摘要:大型语言模型(LLM)代理经常在规则和所需领域知识频繁变化的环境中挣扎,例如法规遵从性和用户风险筛选。现有的离线微调和标准提示等方法在实际操作中无法有效适应新知识,存在不足。为了解决这个问题,我们提出了自适应反射交互式代理(ARIA),LLM代理框架,专门设计用于在测试时不断学习更新的领域知识。ARIA通过结构化的自我对话来评估自己的不确定性,主动识别知识差距,并要求人类专家进行有针对性的解释或纠正。然后,它系统地更新一个内部的,有时间戳的知识库,提供人工指导,通过比较和澄清查询来检测和解决冲突或过时的知识。我们在TikTok Pay上对真实的客户尽职调查名称筛选任务以及公开的动态知识任务进行了评估。结果表明,使用标准的离线微调和现有的自我改进代理相比,基线的适应性和准确性显着改善。ARIA部署在TikTok Pay中,每月活跃用户超过1.5亿,证实了其在快速发展的环境中的实用性和有效性。
摘要:Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. Current approaches, like offline fine-tuning and standard prompting, are insufficient because they cannot effectively adapt to new knowledge during actual operation. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. ARIA assesses its own uncertainty through structured self-dialogue, proactively identifying knowledge gaps and requesting targeted explanations or corrections from human experts. It then systematically updates an internal, timestamped knowledge repository with provided human guidance, detecting and resolving conflicting or outdated knowledge through comparisons and clarification queries. We evaluate ARIA on the realistic customer due diligence name screening task on TikTok Pay, alongside publicly available dynamic knowledge tasks. Results demonstrate significant improvements in adaptability and accuracy compared to baselines using standard offline fine-tuning and existing self-improving agents. ARIA is deployed within TikTok Pay serving over 150 million monthly active users, confirming its practicality and effectiveness for operational use in rapidly evolving environments.
【13】Probabilistic Graphical Models: A Concise Tutorial
标题:概率图模型:简明教程
链接:https://arxiv.org/abs/2507.17116
作者:e Maasch, Willie Neiswanger, Stefano Ermon, Volodymyr Kuleshov
备注:Under review
摘要:概率图建模是机器学习的一个分支,它使用概率分布来描述世界,进行预测,并支持不确定性下的决策。这个建模框架的基础是一个优雅的理论体系,它连接了两个数学传统:概率论和图论。这个框架提供了紧凑而富有表现力的联合概率分布表示,产生强大的概率推理生成模型。 本教程简要介绍了这个建模框架的形式、方法和应用程序。在回顾了基本概率和图论之后,我们将探讨三个主要主题:(1)用直观的图形语言表示多变量分布,(2)从数据中学习模型参数和图形结构的算法,以及(3)精确和近似推理的算法。
摘要:Probabilistic graphical modeling is a branch of machine learning that uses probability distributions to describe the world, make predictions, and support decision-making under uncertainty. Underlying this modeling framework is an elegant body of theory that bridges two mathematical traditions: probability and graph theory. This framework provides compact yet expressive representations of joint probability distributions, yielding powerful generative models for probabilistic reasoning. This tutorial provides a concise introduction to the formalisms, methods, and applications of this modeling framework. After a review of basic probability and graph theory, we explore three dominant themes: (1) the representation of multivariate distributions in the intuitive visual language of graphs, (2) algorithms for learning model parameters and graphical structures from data, and (3) algorithms for inference, both exact and approximate.
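Two of the tutorial's themes, representing a joint distribution with a graph and exact inference, fit in one small example. The sketch assumes a three-node Bayesian network (Rain -> WetGrass <- Sprinkler) with made-up probabilities. The joint factorizes along the graph, and the posterior is obtained by brute-force enumeration.

```python
from itertools import product

# Hypothetical conditional probability tables.
P_RAIN = {1: 0.2, 0: 0.8}
P_SPRINKLER = {1: 0.1, 0: 0.9}
# P(wet = 1 | rain, sprinkler), keyed by (rain, sprinkler).
P_WET_GIVEN = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}


def joint(rain, sprinkler, wet):
    # Factorization implied by the graph: P(R) * P(S) * P(W | R, S).
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet():
    # Exact inference by enumeration: sum out the sprinkler, then normalize.
    numerator = sum(joint(1, s, 1) for s in (0, 1))
    evidence = sum(joint(r, s, 1) for r, s in product((0, 1), repeat=2))
    return numerator / evidence


print(round(posterior_rain_given_wet(), 4))  # 0.6947
```

Enumeration scales exponentially with the number of variables. This is the motivation for the exact and approximate inference algorithms the tutorial covers.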
【14】ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search
标题:ZORMS-LfD:从零阶随机矩阵搜索的演示中学习
链接:https://arxiv.org/abs/2507.17096
作者:y, Timothy L. Molloy, Wanxin Jin, Iman Shames
摘要:我们提出了零阶随机矩阵搜索学习演示(ZORMS-LfD)。ZORMS-LfD使连续和离散时间的约束最优控制问题的成本、约束和动态能够从专家演示中学习,而不需要学习损失景观的平滑度。相比之下,现有的最先进的一阶方法需要相对于状态、控制和/或参数的成本、约束、动态和学习损失的梯度的存在和计算。大多数现有的方法也适合离散时间,在连续时间的约束问题只得到粗略的注意。我们证明了ZORMS-LfD在各种基准问题的学习损失和计算时间方面匹配或超越了最先进方法的性能。在无约束连续时间基准问题上,ZORMS-LfD实现了与最先进的一阶方法相似的损失性能,计算时间减少了80%以上。在没有专门的最先进的方法的约束连续时间基准问题,ZORMS-LfD表现出优于常用的无梯度Nelder-Mead优化方法。
摘要:We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves similar loss performance to state-of-the-art first-order methods with an over 80% reduction in compute time. On constrained continuous-time benchmark problems where there is no specialized state-of-the-art method, ZORMS-LfD is shown to outperform the commonly used gradient-free Nelder-Mead optimization method.
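The general idea of a zeroth-order search, which updates parameters using only loss evaluations, can be sketched as follows. This is a generic two-point random-direction scheme on a parameter vector, offered as an assumption-laden illustration. It is not the matrix-valued search of ZORMS-LfD, and the step size, smoothing radius, and quadratic loss are arbitrary choices.

```python
import math
import random


def zeroth_order_search(loss, x0, steps=300, lr=0.25, mu=1e-4, seed=0):
    """Minimize loss using only function evaluations (no gradients)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Draw a random direction on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Finite-difference slope of the loss along that direction.
        probe = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (loss(probe) - loss(x)) / mu
        # Move against the estimated directional derivative.
        x = [xi - lr * slope * ui for xi, ui in zip(x, u)]
    return x


def quadratic_loss(params, target=(1.0, -2.0, 0.5)):
    # Hypothetical learning loss: squared distance to the "expert" parameters.
    return sum((p - t) ** 2 for p, t in zip(params, target))


estimate = zeroth_order_search(quadratic_loss, [0.0, 0.0, 0.0])
```

Each step costs two loss evaluations and no derivative computation, which is the trade the abstract describes when comparing against first-order methods.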
【15】SplitMeanFlow: Interval Splitting Consistency in Few-Step Generative Modeling
标题:SplitMeanFlow:少步生成建模中的区间分裂一致性
链接:https://arxiv.org/abs/2507.16884
作者:ei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, Yuxuan Wang
备注:Tech Report
摘要:生成式模型(如Flow Matching)已经实现了最先进的性能,但通常受到计算成本高昂的迭代采样过程的阻碍。为了解决这个问题,最近的工作集中在通过学习平均速度场来进行几步或一步生成,该平均速度场直接将噪声映射到数据。MeanFlow是该领域的领先方法,它通过执行连接平均速度和瞬时速度的微分恒等式来学习该领域。在这项工作中,我们认为,这种微分公式是一个更基本的原则的限制特殊情况。我们回到平均速度的第一原理,并利用定积分的可加性。这使我们得出一个新的,纯粹的代数身份,我们称之为区间分裂一致性。这个恒等式建立了跨越不同时间间隔的平均速度场的自参考关系,而不需要借助于任何微分算子。基于这一原则,我们引入了SplitMeanFlow,这是一个新的训练框架,它直接将这种代数一致性作为学习目标。我们正式证明,微分身份的核心MeanFlow恢复我们的代数一致性的限制,区间分裂成为无穷小。这建立了SplitMeanFlow作为学习平均速度场的直接和更一般的基础。从实践的角度来看,我们的代数方法效率更高,因为它消除了对JVP计算的需要,从而实现更简单,更稳定的训练和更广泛的硬件兼容性。一步和两步SplitMeanFlow模型已成功部署在大规模语音合成产品(如豆宝)中,实现了20倍的加速比。
摘要:Generative models like Flow Matching have achieved state-of-the-art performance but are often hindered by a computationally expensive iterative sampling process. To address this, recent work has focused on few-step or one-step generation by learning the average velocity field, which directly maps noise to data. MeanFlow, a leading method in this area, learns this field by enforcing a differential identity that connects the average and instantaneous velocities. In this work, we argue that this differential formulation is a limiting special case of a more fundamental principle. We return to the first principles of average velocity and leverage the additivity property of definite integrals. This leads us to derive a novel, purely algebraic identity we term Interval Splitting Consistency. This identity establishes a self-referential relationship for the average velocity field across different time intervals without resorting to any differential operators. Based on this principle, we introduce SplitMeanFlow, a new training framework that enforces this algebraic consistency directly as a learning objective. We formally prove that the differential identity at the core of MeanFlow is recovered by taking the limit of our algebraic consistency as the interval split becomes infinitesimal. This establishes SplitMeanFlow as a direct and more general foundation for learning average velocity fields. From a practical standpoint, our algebraic approach is significantly more efficient, as it eliminates the need for JVP computations, resulting in simpler implementation, more stable training, and broader hardware compatibility. One-step and two-step SplitMeanFlow models have been successfully deployed in large-scale speech synthesis products (such as Doubao), achieving speedups of 20x.
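The algebraic identity described here follows from the additivity of definite integrals: if u(r, t) is the average of the instantaneous velocity over [r, t], then (t - r) u(r, t) = (s - r) u(r, s) + (t - s) u(s, t) for any r < s < t. The sketch below checks this numerically on a toy, state-independent velocity v(tau) = 3 tau^2. The function names are hypothetical, and the paper's training objective (which also depends on the state) is not reproduced.

```python
def average_velocity(r, t):
    # Exact average of v(tau) = 3 * tau**2 over [r, t], using the antiderivative tau**3.
    return (t ** 3 - r ** 3) / (t - r)


def midpoint_velocity(r, t):
    # A plausible-looking but wrong candidate: instantaneous velocity at the midpoint.
    return 3.0 * (0.5 * (r + t)) ** 2


def split_residual(u, r, s, t):
    """Residual of the interval-splitting identity for a candidate field u(r, t)."""
    return (t - r) * u(r, t) - (s - r) * u(r, s) - (t - s) * u(s, t)


print(split_residual(average_velocity, 0.0, 0.5, 1.0))   # 0.0
print(split_residual(midpoint_velocity, 0.0, 0.5, 1.0))  # -0.1875
```

Driving such a residual to zero needs only evaluations of the candidate field at three interval pairs. No derivative of the network appears, which is consistent with the abstract's point about avoiding JVP computations.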
【16】Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
标题:寻找Dori:文本到图像扩散模型中的记忆不像假设的那样局部化
链接:https://arxiv.org/abs/2507.16880
作者:walczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
摘要:文本到图像扩散模型(DM)在图像生成方面取得了显著的成功。然而,对数据隐私和知识产权的担忧仍然存在,因为它们可能会无意中记住和复制训练数据。最近的缓解工作集中在识别和修剪负责触发复制的权重,基于记忆可以本地化的假设。我们的研究评估了这些基于修剪的方法的鲁棒性。我们证明,即使修剪后,输入提示的文本嵌入的微小调整足以重新触发数据复制,突出这些防御的脆弱性。此外,我们挑战记忆局部性的基本假设,通过显示复制可以从文本嵌入空间内的不同位置触发,并遵循模型中的不同路径。我们的研究结果表明,现有的缓解策略是不够的,并强调需要真正删除记忆内容的方法,而不是试图抑制其检索。作为这个方向的第一步,我们引入了一种新的对抗性微调方法,该方法迭代搜索复制触发器并更新模型以提高鲁棒性。通过我们的研究,我们对文本到图像DM中记忆的本质提供了新的见解,并为构建更值得信赖和合规的生成AI奠定了基础。
摘要:Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering replication, based on the assumption that memorization can be localized. Our research assesses the robustness of these pruning-based approaches. We demonstrate that even after pruning, minor adjustments to text embeddings of input prompts are sufficient to re-trigger data replication, highlighting the fragility of these defenses. Furthermore, we challenge the fundamental assumption of memorization locality, by showing that replication can be triggered from diverse locations within the text embedding space, and follows different paths in the model. Our findings indicate that existing mitigation strategies are insufficient and underscore the need for methods that truly remove memorized content, rather than attempting to suppress its retrieval. As a first step in this direction, we introduce a novel adversarial fine-tuning method that iteratively searches for replication triggers and updates the model to increase robustness. Through our research, we provide fresh insights into the nature of memorization in text-to-image DMs and a foundation for building more trustworthy and compliant generative AI.
【17】Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks
标题:非紧对称空间中的导航:Cartan神经网络的数学视角
链接:https://arxiv.org/abs/2507.16871
作者:useppe Fré, Federico Milanesio, Guido Sanguinetti, Matteo Santoro
备注:59 pages, 2 figures
摘要:最近的工作已经确定非紧对称空间U/H作为一类有前途的齐次流形,以发展一个几何一致的神经网络理论。这些概念的初始实现已经在Cartan神经网络的绰号下的孪生论文中提出,显示了这些几何概念在机器学习环境中的可行性和性能。本文扩展了支撑Cartan神经网络的数学结构,详细介绍了层的几何属性以及层之间的映射如何与这些结构相互作用,以使Cartan神经网络协变和几何可解释。这两篇论文一起构成了利用群论结构的神经网络的完全几何可解释理论的第一步
摘要:Recent work has identified non-compact symmetric spaces U/H as a promising class of homogeneous manifolds to develop a geometrically consistent theory of neural networks. An initial implementation of these concepts has been presented in a twin paper under the moniker of Cartan Neural Networks, showing both the feasibility and the performance of these geometric concepts in a machine learning context. The current paper expands on the mathematical structures underpinning Cartan Neural Networks, detailing the geometric properties of the layers and how the maps between layers interact with such structures to make Cartan Neural Networks covariant and geometrically interpretable. Together, these twin papers constitute a first step towards a fully geometrically interpretable theory of neural networks exploiting group-theoretic structures.
【18】Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment
标题:机器学习调整下危险比的去偏最大似然估计
链接:https://arxiv.org/abs/2507.17686
作者:ayakawa, Satoshi Asai
摘要:既往研究表明,使用Cox模型估计的治疗组之间的风险比无法解释,因为模型的不确定基线风险无法识别由于治疗分配和多个矛盾场景中的未观察到的因素导致的风险集组成的时间变化。为了缓解这个问题,特别是在基于观察数据的研究中,不受控制的动态处理和许多协变量的实时测量,我们建议放弃基线风险,并使用机器学习来明确建模风险集的变化,有或没有潜在变量。对于这个框架,我们澄清的背景下,风险比可以被因果解释,然后开发一种方法的基础上奈曼正交计算去偏最大似然估计的风险比。计算构造的估计量是更有效的计算那些基于加权回归边际结构Cox模型。数值模拟证实,所提出的方法能够以最小的偏差识别地面真实情况。这些结果奠定了基础,为发展一个有用的,替代方法的因果推理与不受控制的,观察数据在现代流行病学。
摘要:Previous studies have shown that hazard ratios between treatment groups estimated with the Cox model are uninterpretable because the indefinite baseline hazard of the model fails to identify temporal change in the risk set composition due to treatment assignment and unobserved factors among multiple, contradictory scenarios. To alleviate this problem, especially in studies based on observational data with uncontrolled dynamic treatment and real-time measurement of many covariates, we propose abandoning the baseline hazard and using machine learning to explicitly model the change in the risk set with or without latent variables. For this framework, we clarify the context in which hazard ratios can be causally interpreted, and then develop a method based on Neyman orthogonality to compute debiased maximum-likelihood estimators of hazard ratios. Computing the constructed estimators is more efficient than computing those based on weighted regression with marginal structural Cox models. Numerical simulations confirm that the proposed method identifies the ground truth with minimal bias. These results lay the foundation for developing a useful, alternative method for causal inference with uncontrolled, observational data in modern epidemiology.
【19】To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
标题:信任还是不信任:关于无线网络基于ML的资源分配的校准
链接:https://arxiv.org/abs/2507.17494
作者:aina, Nidhi Simmons, David E. Simmons, Michel Daoud Yacoub, Trung Q. Duong
摘要:在下一代通信和网络中,机器学习(ML)模型不仅要提供准确的预测,还要提供经过良好校准的置信度分数,以反映正确决策的真实可能性。本文研究了单用户多资源分配框架下基于ML的中断预测器的校准性能。我们首先建立了该系统的中断概率(OP)下完美校准的关键理论属性。重要的是,我们表明,随着资源数量的增长,一个完美校准的预测器的OP接近预期的输出条件下,它低于分类阈值。相反,当只有一个资源可用时,系统的OP等于模型的总体预期输出。然后,我们推导出一个完美的校准预测的OP条件。这些发现指导分类阈值的选择,以实现所需的OP,帮助系统设计人员满足特定的可靠性要求。我们还表明,后处理校准不能提高系统的最低可实现的OP,因为它不引入新的信息,未来的信道状态。此外,我们表明,校准良好的模型是一个更广泛的一类预测,必然提高OP的一部分。特别是,我们建立了一个单调性条件,准确性置信度函数必须满足这样的改善发生。为了证明这些理论特性,我们进行了严格的模拟为基础的分析,使用后处理校准技术:普拉特缩放和保序回归。作为该框架的一部分,预测器使用专门为此系统设计的停电损失函数进行训练。此外,这种分析是在瑞利衰落信道上进行的,其时间相关性由Clarke的2D模型捕获,该模型考虑了接收机的移动性。
摘要:In next-generation communications and networks, machine learning (ML) models are expected to deliver not only accurate predictions but also well-calibrated confidence scores that reflect the true likelihood of correct decisions. This paper studies the calibration performance of an ML-based outage predictor within a single-user, multi-resource allocation framework. We first establish key theoretical properties of this system's outage probability (OP) under perfect calibration. Importantly, we show that as the number of resources grows, the OP of a perfectly calibrated predictor approaches the expected output conditioned on it being below the classification threshold. In contrast, when only one resource is available, the system's OP equals the model's overall expected output. We then derive the OP conditions for a perfectly calibrated predictor. These findings guide the choice of the classification threshold to achieve a desired OP, helping system designers meet specific reliability requirements. We also demonstrate that post-processing calibration cannot improve the system's minimum achievable OP, as it does not introduce new information about future channel states. Additionally, we show that well-calibrated models are part of a broader class of predictors that necessarily improve OP. In particular, we establish a monotonicity condition that the accuracy-confidence function must satisfy for such improvement to occur. To demonstrate these theoretical properties, we conduct a rigorous simulation-based analysis using post-processing calibration techniques: Platt scaling and isotonic regression. As part of this framework, the predictor is trained using an outage loss function specifically designed for this system. Furthermore, this analysis is performed on Rayleigh fading channels with temporal correlation captured by Clarke's 2D model, which accounts for receiver mobility.
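Of the two post-processing calibrators named in this abstract, isotonic regression is easy to show in full. The sketch below is a plain pool-adjacent-violators implementation, applied to binary outcomes that are assumed to be already sorted by the predictor's raw confidence. The outcome sequence is invented, and the paper's outage-specific setup is not modeled.

```python
def isotonic_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Outcomes (1 = event occurred) ordered by increasing raw confidence score.
outcomes = [0, 1, 0, 1, 1]
print(isotonic_regression(outcomes))  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

The fitted values serve as calibrated probabilities for the corresponding score ranks. Because the map is monotone and uses no new information about future channel states, it re-labels confidences without changing which inputs the predictor ranks as riskier.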
【20】OkadaTorch: A Differentiable Programming of Okada Model to Calculate Displacements and Strains from Fault Parameters
标题:OkadaTorch:Okada模型的可微编程实现,用于根据断层参数计算位移和应变
链接:https://arxiv.org/abs/2507.17126
作者: Someya, Taisuke Yamada, Tomohisa Okazaki
备注:13 pages, 8 figures
摘要:Okada模型是一种广泛使用的三维弹性半空间中点或矩形位错源引起的位移和应变的解析解。我们介绍OkadaTorch,Okada模型的PyTorch实现,其中整个代码是可微的;相对于输入的梯度可以使用自动微分(AD)轻松计算。我们的工作包括两个部分:将原始Okada模型直接转换为PyTorch,以及一个方便的包装器接口,用于有效计算相对于观测站坐标或故障参数的梯度和Hessians。这种可微框架非常适合故障参数反演,包括基于梯度的优化,贝叶斯推理以及与科学机器学习(SciML)模型的集成。我们的代码可以在这里找到:https://github.com/msomeya1/OkadaTorch
摘要:The Okada model is a widely used analytical solution for displacements and strains caused by a point or rectangular dislocation source in a 3D elastic half-space. We present OkadaTorch, a PyTorch implementation of the Okada model, where the entire code is differentiable; gradients with respect to input can be easily computed using automatic differentiation (AD). Our work consists of two components: a direct translation of the original Okada model into PyTorch, and a convenient wrapper interface for efficiently computing gradients and Hessians with respect to either observation station coordinates or fault parameters. This differentiable framework is well suited for fault parameter inversion, including gradient-based optimization, Bayesian inference, and integration with scientific machine learning (SciML) models. Our code is available here: https://github.com/msomeya1/OkadaTorch
其他(18篇)
【1】Flow Matching Meets Biology and Life Science: A Survey
标题:流匹配遇见生物学与生命科学:综述
链接:https://arxiv.org/abs/2507.17731
作者: Zhichen Zeng, Xiao Lin, Feihao Fang, Yanru Qu, Zhe Xu, Zhining Liu, Xuying Ning, Tianxin Wei, Ge Liu, Hanghang Tong, Jingrui He
备注:Preprint, 27 pages
摘要:在过去的十年中,生成建模的进步,如生成对抗网络,掩蔽自编码器和扩散模型,已经显著改变了生物研究和发现,使分子设计,蛋白质生成,药物发现等方面取得了突破。与此同时,生物学应用已经成为评估生成模型能力的有价值的测试平台。最近,流匹配已成为一个强大的和有效的替代扩散为基础的生成建模,在其应用程序的生物学和生命科学的问题越来越感兴趣。本文首次全面综述了流量匹配及其在生物领域的应用的最新进展。我们首先系统地回顾了流匹配的基础和变体,然后将其应用分为三个主要领域:生物序列建模,分子生成和设计,以及肽和蛋白质生成。对于每一项,我们都深入审查了最近的进展。我们还总结了常用的数据集和软件工具,并总结了未来潜在的发展方向。相应的策划资源可在https://github.com/Violet24K/Awesome-Flow-Matching-Meets-Biology上获得。
摘要:Over the past decade, advances in generative modeling, such as generative adversarial networks, masked autoencoders, and diffusion models, have significantly transformed biological research and discovery, enabling breakthroughs in molecule design, protein generation, drug discovery, and beyond. At the same time, biological applications have served as valuable testbeds for evaluating the capabilities of generative models. Recently, flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling, with growing interest in its application to problems in biology and life sciences. This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains. We begin by systematically reviewing the foundations and variants of flow matching, and then categorize its applications into three major areas: biological sequence modeling, molecule generation and design, and peptide and protein generation. For each, we provide an in-depth review of recent progress. We also summarize commonly used datasets and software tools, and conclude with a discussion of potential future directions. The corresponding curated resources are available at https://github.com/Violet24K/Awesome-Flow-Matching-Meets-Biology.
【2】HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging
标题:HydraOpt:应对适配器合并的效率-性能权衡
链接:https://arxiv.org/abs/2507.17706
作者:tli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
摘要:大型语言模型(LLM)通常利用适配器(例如基于低等级的适配器)来实现下游任务的强大性能。然而,为每个任务存储单独的适配器会显著增加内存需求,这对资源受限的环境(如移动设备)构成了挑战。虽然模型合并技术可以降低存储成本,但它们通常会导致性能大幅下降。在这项工作中,我们介绍HydraOpt,一种新的模型合并技术,利用低秩适配器的矩阵之间的固有相似性。与现有的在存储大小和性能之间进行固定权衡的方法不同,HydraOpt使我们能够在效率和性能的范围内进行导航。我们的实验表明,与存储所有适配器相比,HydraOpt显著减少了存储大小(减少48%),同时实现了具有竞争力的性能(降低0.2-1.8%)。此外,它优于现有的合并技术在性能方面相同或略差的存储效率。
摘要:Large language models (LLMs) often leverage adapters, such as low-rank-based adapters, to achieve strong performance on downstream tasks. However, storing a separate adapter for each task significantly increases memory requirements, posing a challenge for resource-constrained environments such as mobile devices. Although model merging techniques can reduce storage costs, they typically result in substantial performance degradation. In this work, we introduce HydraOpt, a new model merging technique that capitalizes on the inherent similarities between the matrices of low-rank adapters. Unlike existing methods that produce a fixed trade-off between storage size and performance, HydraOpt allows us to navigate this spectrum of efficiency and performance. Our experiments show that HydraOpt significantly reduces storage size (48% reduction) compared to storing all adapters, while achieving competitive performance (0.2-1.8% drop). Furthermore, it outperforms existing merging techniques in terms of performance at the same or slightly worse storage efficiency.
【3】Federated Majorize-Minimization: Beyond Parameter Aggregation
标题:联邦Majorize-Minimization:超越参数聚合
链接:https://arxiv.org/abs/2507.17534
作者:ieuleveut, Gersende Fort, Mahmoud Hegazy, Hoi-To Wai
摘要:本文提出了一种统一的方法来设计随机优化算法,强大的规模联邦学习设置。本文研究了一类具有线性参数化优化代理函数族的优化极小化问题。这个框架包括(近端)基于梯度的算法(正则化)光滑的目标,期望最大化算法,和许多问题被视为变分代理MM。我们表明,我们的框架激励一个统一的算法称为随机近似随机代理MM(\SSMM),其中包括以前的随机MM程序作为特殊情况。然后,我们将\SSMM\扩展到联邦设置,同时考虑常见的瓶颈,如数据异构性、部分参与和通信约束;这产生了\QSMM。\QSMM\的独创性在于局部学习,然后聚合表征\textit{替代优化函数}的信息,与学习和聚合\textit{原始参数}的经典算法相反。最后,为了展示这种方法超出我们的理论设置的灵活性,我们用它来设计一个算法,计算最佳的运输地图在联邦设置。
摘要:This paper proposes a unified approach for designing stochastic optimization algorithms that robustly scale to the federated learning setting. Our work studies a class of Majorize-Minimization (MM) problems, which possesses a linearly parameterized family of majorizing surrogate functions. This framework encompasses (proximal) gradient-based algorithms for (regularized) smooth objectives, the Expectation Maximization algorithm, and many problems seen as variational surrogate MM. We show that our framework motivates a unifying algorithm called Stochastic Approximation Stochastic Surrogate MM (\SSMM), which includes previous stochastic MM procedures as special instances. We then extend \SSMM\ to the federated setting, while taking into consideration common bottlenecks such as data heterogeneity, partial participation, and communication constraints; this yields \QSMM. The originality of \QSMM\ is to learn locally and then aggregate information characterizing the \textit{surrogate majorizing function}, contrary to classical algorithms which learn and aggregate the \textit{original parameter}. Finally, to showcase the flexibility of this methodology beyond our theoretical setting, we use it to design an algorithm for computing optimal transport maps in the federated setting.
【4】EarthLink: Interpreting Climate Signals with Self-Evolving AI Agents
标题:EarthLink:用自我进化的人工智能代理解读气候信号
链接:https://arxiv.org/abs/2507.17311
作者:, Jiong Wang, Xiaoyu Yue, Wangxu Wei, Zhe Jiang, Wanghan Xu, Ben Fei, Wenlong Zhang, Xinyu Gu, Lijing Cheng, Jing-Jia Luo, Chao Li, Yaqiang Wang, Tao Chen, Wanli Ouyang, Fenghua Ling, Lei Bai
摘要:现代地球科学正处于一个转折点。地球系统数据的庞大、分散和复杂性,加上日益复杂的分析需求,为快速科学发现带来了重大瓶颈。在这里,我们介绍EarthLink,这是第一个为地球科学家设计的交互式副驾驶员。它自动化了从规划和代码生成到多场景分析的端到端研究工作流程。与静态诊断工具不同,EarthLink可以从用户交互中学习,通过动态反馈循环不断完善其功能。我们验证了它在气候变化的一些核心科学任务上的性能,从模型观测比较到复杂现象的诊断。在一次多专家评估中,EarthLink进行了科学合理的分析,并展示了被评为与人类初级研究人员工作流程的具体方面相当的分析能力。此外,其透明、可审计的工作流程和自然语言界面使科学家能够从费力的手动执行转向战略监督和假设生成。EarthLink标志着在全球变化加速的时代,地球系统研究朝着高效、可信和协作的范式迈出了关键一步。
摘要:Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change.
【5】On Temporal Guidance and Iterative Refinement in Audio Source Separation
标题:音频源分离中的时间引导和迭代细化
链接:https://arxiv.org/abs/2507.17297
作者:rocutti, Jonathan Greif, Paul Primus, Florian Schmid, Gerhard Widmer
摘要:声音场景的空间语义分割(S5)涉及到活动声音类别的准确识别以及从复杂的声学混合物中精确分离它们的来源。传统的系统依赖于一个两阶段的流水线-音频标记,然后是标签条件源分离-但往往受到限制,缺乏细粒度的时间信息的有效分离的关键。在这项工作中,我们通过引入一种新的S5方法来解决这个限制,该方法增强了事件检测和源分离阶段之间的协同作用。我们的主要贡献有三个方面。首先,我们微调预训练的Transformer以检测活动声音类别。其次,我们利用一个单独的实例,这个微调Transformer执行声音事件检测(SED),提供详细的,随时间变化的指导分离模块。第三,我们实现了一个迭代的细化机制,逐步提高分离质量递归重用分离器的输出从以前的迭代。这些进步导致音频标记和源分离性能的显着改进,正如我们的系统在DCASE挑战赛2025的任务4中获得第二名所证明的那样。我们的实现和模型检查点可在我们的GitHub存储库中找到:https://github.com/theMoro/dcase25task4。
摘要:Spatial semantic segmentation of sound scenes (S5) involves the accurate identification of active sound classes and the precise separation of their sources from complex acoustic mixtures. Conventional systems rely on a two-stage pipeline - audio tagging followed by label-conditioned source separation - but are often constrained by the absence of fine-grained temporal information critical for effective separation. In this work, we address this limitation by introducing a novel approach for S5 that enhances the synergy between the event detection and source separation stages. Our key contributions are threefold. First, we fine-tune a pre-trained Transformer to detect active sound classes. Second, we utilize a separate instance of this fine-tuned Transformer to perform sound event detection (SED), providing the separation module with detailed, time-varying guidance. Third, we implement an iterative refinement mechanism that progressively enhances separation quality by recursively reusing the separator's output from previous iterations. These advancements lead to significant improvements in both audio tagging and source separation performance, as demonstrated by our system's second-place finish in Task 4 of the DCASE Challenge 2025. Our implementation and model checkpoints are available in our GitHub repository: https://github.com/theMoro/dcase25task4 .
【6】DistrAttention: An Efficient and Flexible Self-Attention Mechanism on Modern GPUs
标题:DistrAttention:现代图形处理器上高效灵活的自我注意机制
链接:https://arxiv.org/abs/2507.17245
作者:n, Mengbai Xiao, Yuan Yuan, Xiao Zhang, Dongxiao Yu, Guanghui Zhang, Haoliang Wang
摘要:Transformer架构彻底改变了深度学习,在自然语言处理、计算机视觉和时间序列预测等领域提供了最先进的性能。然而,其核心组件,自我注意,具有二次的时间复杂度相对于输入序列长度,这阻碍了Transformers的可扩展性。现有的优化自我注意的方法要么抛弃了完整的语境信息,要么缺乏灵活性。本文设计了一种高效灵活的全语境自注意机制DistrAttention。DistrAttention通过在嵌入维度上对数据进行分组来实现这一点,通常称为$d$。我们实现DistrAttention与轻量级的采样和融合方法,利用局部敏感的哈希分组相似的数据。进一步设计了一个分块分组框架,以限制由局部敏感散列法引入的错误。通过优化块大小的选择,DistrAttention可以很容易地与FlashAttention-2集成,从而在现代GPU上获得高性能。我们通过广泛的实验来评估DistrAttention。结果表明,我们的方法比FlashAttention-2在计算自我注意力方面快37%。在ViT推理中,DistrAttention是近似自我注意机制中最快和最准确的。在Llama 3 -1B中,DistrAttention仍然达到了最低的推理时间,只有1%的准确率损失。
摘要:The Transformer architecture has revolutionized deep learning, delivering state-of-the-art performance in areas such as natural language processing, computer vision, and time series prediction. However, its core component, self-attention, has quadratic time complexity relative to the input sequence length, which hinders the scalability of Transformers. Existing approaches to optimizing self-attention either discard full-contextual information or lack flexibility. In this work, we design DistrAttention, an efficient and flexible self-attention mechanism with the full context. DistrAttention achieves this by grouping data on the embedding dimensionality, usually referred to as $d$. We realize DistrAttention with a lightweight sampling and fusion method that exploits locality-sensitive hashing to group similar data. A block-wise grouping framework is further designed to limit the errors introduced by locality-sensitive hashing. By optimizing the selection of block sizes, DistrAttention can be easily integrated with FlashAttention-2, achieving high performance on modern GPUs. We evaluate DistrAttention with extensive experiments. The results show that our method is 37% faster than FlashAttention-2 at calculating self-attention. In ViT inference, DistrAttention is the fastest and the most accurate among approximate self-attention mechanisms. In Llama3-1B, DistrAttention still achieves the lowest inference time with only 1% accuracy loss.
【7】Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
标题:过滤和细化:基于MLLM的级联系统,用于工业规模视频内容审核
链接:https://arxiv.org/abs/2507.17204
作者:ng, Jinghao Shi, Hanzhong Liang, Xiang Shen, Vera Wen, Zhiqian Chen, Yifan Wu, Zhixin Zhang, Hongyu Xiong
备注:Camera Ready for ACL 2025
摘要:有效的内容审核对于视频平台保障用户体验和维护社区标准至关重要。虽然传统的视频分类模型可以有效地处理定义明确的审核任务,但它们难以处理复杂的场景,例如隐含的有害内容和上下文模糊。多模态大型语言模型(MLLM)以其卓越的跨模态推理和上下文理解为这些限制提供了一个有前途的解决方案。然而,两个关键挑战阻碍了其工业应用。首先,MLLM的高计算成本使得全面部署不切实际。第二,适应判别分类的生成模型仍然是一个开放的研究问题。在本文中,我们首先介绍了一种有效的方法,使用最少的判别训练数据将生成MLLM转换为多模态分类器。为了实现行业规模的部署,我们提出了一个路由器排名级联系统,集成MLLM与轻量级路由器模型。离线实验表明,我们的MLLM为基础的方法提高了F1分数66.50%,而传统的分类器只需要2%的微调数据。在线评估表明,我们的系统将自动内容审核量增加了41%,而级联部署将计算成本降低到仅为直接全面部署的1.5%。
摘要:Effective content moderation is essential for video platforms to safeguard user experience and uphold community standards. While traditional video classification models effectively handle well-defined moderation tasks, they struggle with complicated scenarios such as implicit harmful content and contextual ambiguity. Multimodal large language models (MLLMs) offer a promising solution to these limitations with their superior cross-modal reasoning and contextual understanding. However, two key challenges hinder their industrial adoption. First, the high computational cost of MLLMs makes full-scale deployment impractical. Second, adapting generative models for discriminative classification remains an open research problem. In this paper, we first introduce an efficient method to transform a generative MLLM into a multimodal classifier using minimal discriminative training data. To enable industry-scale deployment, we then propose a router-ranking cascade system that integrates MLLMs with a lightweight router model. Offline experiments demonstrate that our MLLM-based approach improves F1 score by 66.50% over traditional classifiers while requiring only 2% of the fine-tuning data. Online evaluations show that our system increases automatic content moderation volume by 41%, while the cascading deployment reduces computational cost to only 1.5% of direct full-scale deployment.
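The cascade idea in the abstract above (a cheap router decides which items reach the expensive model) can be sketched as follows. This is a simplified, threshold-based illustration: `cascade_moderate`, `toy_router`, `toy_mllm`, and the threshold value are hypothetical stand-ins, not the paper's router-ranking system or its models.

```python
def cascade_moderate(videos, router_score, mllm_classify, threshold):
    """Run a cheap router on every item and the expensive classifier only on likely positives."""
    flagged = []
    expensive_calls = 0
    for video in videos:
        # Stage 1 (filter): skip items the lightweight router considers safe.
        if router_score(video) < threshold:
            continue
        # Stage 2 (refine): only the remaining items pay for the expensive model.
        expensive_calls += 1
        if mllm_classify(video):
            flagged.append(video)
    return flagged, expensive_calls


# Hypothetical stand-ins for the two models, used only to exercise the control flow.
def toy_router(video):
    return video["risk_hint"]


def toy_mllm(video):
    return video["is_violation"]


videos = [
    {"id": 1, "risk_hint": 0.05, "is_violation": False},
    {"id": 2, "risk_hint": 0.80, "is_violation": True},
    {"id": 3, "risk_hint": 0.60, "is_violation": False},
    {"id": 4, "risk_hint": 0.10, "is_violation": False},
]
flagged, expensive_calls = cascade_moderate(videos, toy_router, toy_mllm, threshold=0.5)
```

In this toy run, only two of the four items reach the expensive stage. The router threshold trades compute savings against the risk of filtering out true violations before the second stage sees them.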
【8】GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP
标题:GhostUMAP2:测量和分析UMAP的(r,d)-稳定性
链接:https://arxiv.org/abs/2507.17174
作者: Jung, Takanori Fujiwara, Jaemin Jo
摘要:尽管均匀流形近似和投影(UMAP)的广泛使用,其随机优化过程对结果的影响仍然没有得到充分的研究。我们观察到,它经常产生不稳定的结果,其中数据点的投影主要是偶然确定的,而不是反映相邻结构。为了解决这个问题,我们引入(r,d)-稳定性UMAP:一个框架,分析数据点在投影空间中的随机定位。为了评估随机元素,特别是初始投影位置和负采样,如何影响UMAP结果,我们引入了“鬼”,或重复的数据点代表潜在的位置变化,由于随机性。我们定义一个数据点的投影为(r,d)-稳定的,如果它的鬼扰动半径r的圆内的初始投影保持限制在一个圆的半径d为他们的最终位置。为了有效地计算鬼投影,我们开发了一个自适应下降方案,减少了运行时间高达60%相比,未优化的基线,同时保持约90%的不稳定点。我们还提出了一个可视化工具,支持交互式探索的(r,d)-稳定的数据点。最后,我们证明了我们的框架的有效性,通过检查真实世界的数据集的预测的稳定性,并有效地使用我们的框架目前的使用指南。
摘要:Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces the runtime by up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.
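The (r,d)-stability definition above can be read as a simple geometric check. The sketch below assumes that the radius-d circle is centered on the original point's final projection, which the abstract does not state explicitly. `sample_in_circle` and `is_rd_stable` are hypothetical helper names, and the UMAP optimization that would move the ghosts from initial to final positions is not modeled.

```python
import math
import random


def sample_in_circle(center, r, rng):
    """Draw one ghost start position uniformly from the disc of radius r around center."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = r * math.sqrt(rng.random())  # sqrt keeps the density uniform over the disc
    return (center[0] + radius * math.cos(angle), center[1] + radius * math.sin(angle))


def is_rd_stable(final_point, ghost_finals, d):
    """True if every ghost ends within distance d of the point's final projection.

    Assumption: the radius-d circle is centered on the original point's final position.
    """
    return all(math.dist(final_point, ghost) <= d for ghost in ghost_finals)


rng = random.Random(0)
ghost_starts = [sample_in_circle((0.0, 0.0), 0.5, rng) for _ in range(100)]

# Hypothetical final ghost positions after optimization.
ghost_finals = [(0.1, 0.0), (0.0, 0.2)]
stable_loose = is_rd_stable((0.0, 0.0), ghost_finals, d=0.25)
stable_tight = is_rd_stable((0.0, 0.0), ghost_finals, d=0.15)
```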
【9】ScSAM: Debiasing Morphology and Distributional Variability in Subcellular Semantic Segmentation
标题:ScSAM:消除亚细胞语义分割中形态与分布变异性带来的偏差
链接:https://arxiv.org/abs/2507.17149
作者:Jianan Fan, Dongnan Liu, Hang Chang, Gerald J.Shami, Filip Braet, Weidong Cai
备注:Accepted by 28th European Conference on Artificial Intelligence (ECAI)
摘要:亚细胞成分之间显著的形态和分布变异性对基于学习的细胞器分割模型提出了长期挑战,显著增加了有偏见的特征学习的风险。现有的方法往往依赖于单一的映射关系,忽视了功能的多样性,从而导致偏见的训练。尽管Segment Anything Model(SAM)提供了丰富的特征表示,但其在亚细胞场景中的应用受到两个关键挑战的阻碍:(1)亚细胞形态和分布的可变性在标签空间中产生间隙,导致模型学习虚假或有偏见的特征。(2)SAM专注于全局上下文理解,通常忽略细粒度的空间细节,这使得捕捉微妙的结构变化和应对倾斜的数据分布变得非常具有挑战性。为了解决这些挑战,我们引入了ScSAM,这是一种通过将预训练的SAM与Masked Autoencoder(MAE)引导的细胞先验知识融合来增强特征鲁棒性的方法,以减轻数据不平衡带来的训练偏差。具体来说,我们设计了一个特征对齐和融合模块,将预训练的嵌入对齐到同一个特征空间,并有效地组合不同的表示。此外,我们提出了一个基于余弦相似矩阵的类提示编码器,激活类特定的功能,以识别亚细胞类别。在不同亚细胞图像数据集上的广泛实验表明,ScSAM优于最先进的方法。
摘要:The significant morphological and distributional variability among subcellular components poses a long-standing challenge for learning-based organelle segmentation models, significantly increasing the risk of biased feature learning. Existing methods often rely on single mapping relationships, overlooking feature diversity and thereby inducing biased training. Although the Segment Anything Model (SAM) provides rich feature representations, its application to subcellular scenarios is hindered by two key challenges: (1) The variability in subcellular morphology and distribution creates gaps in the label space, leading the model to learn spurious or biased features. (2) SAM focuses on global contextual understanding and often ignores fine-grained spatial details, making it challenging to capture subtle structural alterations and cope with skewed data distributions. To address these challenges, we introduce ScSAM, a method that enhances feature robustness by fusing pre-trained SAM with Masked Autoencoder (MAE)-guided cellular prior knowledge to alleviate training bias from data imbalance. Specifically, we design a feature alignment and fusion module to align pre-trained embeddings to the same feature space and efficiently combine different representations. Moreover, we present a cosine similarity matrix-based class prompt encoder to activate class-specific features to recognize subcellular categories. Extensive experiments on diverse subcellular image datasets demonstrate that ScSAM outperforms state-of-the-art methods.
【10】Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems
标题:嵌入式系统上实时猴痘诊断的计算机视觉
链接:https://arxiv.org/abs/2507.17123
作者:Delgado-López, Ricardo A. Morell-Rodriguez, Sebastián O. Espinosa-Del Rosario, Wilfredo E. Lugo-Beauchamp
摘要:快速诊断猴痘等传染病对于有效遏制和治疗至关重要,特别是在资源有限的环境中。这项研究提出了一种人工智能驱动的诊断工具,该工具是为部署在NVIDIA Jetson Orin Nano上而开发的,利用预先训练的MobileNetV 2架构进行二进制分类。该模型在开源猴痘皮肤病变数据集上进行了训练,达到了93.07%的F1分数,这反映了精确度和召回率的良好平衡。为了优化模型,使用TensorRT框架来加速FP 32的推理,并对FP 16和INT 8格式进行训练后量化。TensorRT的混合精度功能实现了这些优化,从而缩小了模型大小,提高了推理速度,并将功耗降低了大约二分之一,同时保持了原始准确性。功耗分析证实,优化后的模型在推理过程中使用的能量显著减少,增强了它们在资源受限环境中部署的适用性。该系统部署了Wi-Fi接入点(AP)热点和基于Web的界面,使用户能够直接通过移动电话等连接设备上传和分析图像。这种设置确保了简单的访问和无缝连接,使该工具适用于实际应用。这些进步将诊断工具定位为一种高效、可扩展和节能的解决方案,以解决服务不足地区的诊断挑战,为在低资源医疗环境中更广泛地采用铺平了道路。
摘要:The rapid diagnosis of infectious diseases, such as monkeypox, is crucial for effective containment and treatment, particularly in resource-constrained environments. This study presents an AI-driven diagnostic tool developed for deployment on the NVIDIA Jetson Orin Nano, leveraging the pre-trained MobileNetV2 architecture for binary classification. The model was trained on the open-source Monkeypox Skin Lesion Dataset, achieving a 93.07% F1-Score, which reflects a well-balanced performance in precision and recall. To optimize the model, the TensorRT framework was used to accelerate inference for FP32 and to perform post-training quantization for FP16 and INT8 formats. TensorRT's mixed-precision capabilities enabled these optimizations, which reduced the model size, increased inference speed, and lowered power consumption by approximately a factor of two, all while maintaining the original accuracy. Power consumption analysis confirmed that the optimized models used significantly less energy during inference, reinforcing their suitability for deployment in resource-constrained environments. The system was deployed with a Wi-Fi Access Point (AP) hotspot and a web-based interface, enabling users to upload and analyze images directly through connected devices such as mobile phones. This setup ensures simple access and seamless connectivity, making the tool practical for real-world applications. These advancements position the diagnostic tool as an efficient, scalable, and energy-conscious solution to address diagnosis challenges in underserved regions, paving the way for broader adoption in low-resource healthcare settings.
【11】Pragmatic Policy Development via Interpretable Behavior Cloning
标题:通过可解释行为克隆制定务实的政策
链接:https://arxiv.org/abs/2507.17056
作者:sson, Yaochen Rao, Heather J. Litman, Fredrik D. Johansson
摘要:离线强化学习(RL)在从观测数据中导出最优策略方面具有很大的潜力,但与可解释性和评估相关的挑战限制了其在安全关键领域的实际应用。不受约束的RL策略的黑盒性质阻碍了可解释性,而评估-通常在策略外执行-对数据收集行为策略的大偏差很敏感,特别是在使用基于重要性抽样的方法时。为了解决这些挑战,我们提出了一个简单而实用的替代方案:从每个患者状态中最常选择的行动中导出治疗策略,如行为策略的可解释模型所估计的那样。通过使用一个基于树的模型,这是专门设计来利用数据中的模式,我们获得了一个自然的分组状态相对于治疗。树结构通过设计确保可解释性,同时改变所考虑的动作数量控制与行为策略的重叠程度,从而实现可靠的非策略评估。这种务实的政策制定方法消除了常见的治疗模式,捕捉了数据中嵌入的集体临床判断。使用类风湿性关节炎和败血症护理中的真实例子,我们证明了在此框架下导出的政策可以优于当前的实践,为通过离线RL获得的政策提供可解释的替代方案。
摘要:Offline reinforcement learning (RL) holds great promise for deriving optimal policies from observational data, but challenges related to interpretability and evaluation limit its practical use in safety-critical domains. Interpretability is hindered by the black-box nature of unconstrained RL policies, while evaluation -- typically performed off-policy -- is sensitive to large deviations from the data-collecting behavior policy, especially when using methods based on importance sampling. To address these challenges, we propose a simple yet practical alternative: deriving treatment policies from the most frequently chosen actions in each patient state, as estimated by an interpretable model of the behavior policy. By using a tree-based model, which is specifically designed to exploit patterns in the data, we obtain a natural grouping of states with respect to treatment. The tree structure ensures interpretability by design, while varying the number of actions considered controls the degree of overlap with the behavior policy, enabling reliable off-policy evaluation. This pragmatic approach to policy development standardizes frequent treatment patterns, capturing the collective clinical judgment embedded in the data. Using real-world examples in rheumatoid arthritis and sepsis care, we demonstrate that policies derived under this framework can outperform current practice, offering interpretable alternatives to those obtained via offline RL.
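The core step described above, deriving a policy from the most frequently chosen actions in each group of patient states, can be sketched in a few lines. The group labels stand in for the leaves of a tree-based behavior-policy model, which is not fitted here. `most_frequent_action_policy`, the leaf names, and the drug names are all hypothetical.

```python
from collections import Counter, defaultdict


def most_frequent_action_policy(records, k=1):
    """Map each state group to its k most frequently observed actions.

    `records` is an iterable of (state_group, action) pairs; in the setting of the
    abstract, state_group would be a leaf of an interpretable tree model (assumed given).
    """
    counts = defaultdict(Counter)
    for group, action in records:
        counts[group][action] += 1
    return {
        group: [action for action, _ in counter.most_common(k)]
        for group, counter in counts.items()
    }


# Hypothetical observational data: (tree leaf, treatment chosen by the clinician).
records = [
    ("leaf_a", "drug_x"),
    ("leaf_a", "drug_x"),
    ("leaf_a", "drug_y"),
    ("leaf_b", "drug_y"),
]
policy_top1 = most_frequent_action_policy(records, k=1)
policy_top2 = most_frequent_action_policy(records, k=2)
```

Raising `k` admits more of the observed actions per group. That corresponds to the abstract's point that the number of actions considered controls the overlap with the behavior policy.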
【12】laplax -- Laplace Approximations with JAX
标题:laplax:基于JAX的拉普拉斯近似
链接:https://arxiv.org/abs/2507.17013
作者:ber, Bálint Mucsányi, Lenard Rommel, Thomas Christie, Lars Kasüschke, Marvin Pförtner, Philipp Hennig
备注:Submission to the ICML 2025 Workshop on Championing Open-source Development in Machine Learning (CODEML '25)
摘要:拉普拉斯近似提供了一种量化深度神经网络中权重空间不确定性的可扩展且有效的方法,从而能够应用贝叶斯工具,例如预测不确定性和通过奥卡姆剃刀进行模型选择。在这项工作中,我们介绍了laplax,一个新的开源Python软件包,用于执行Laplace近似。laplax采用模块化和纯功能架构设计,外部依赖性最小,为快速原型设计和实验提供了一个灵活且研究人员友好的框架。其目标是促进贝叶斯神经网络的研究,深度学习的不确定性量化以及改进的拉普拉斯近似技术的开发。
摘要:The Laplace approximation provides a scalable and efficient means of quantifying weight-space uncertainty in deep neural networks, enabling the application of Bayesian tools such as predictive uncertainty and model selection via Occam's razor. In this work, we introduce laplax, a new open-source Python package for performing Laplace approximations with jax. Designed with a modular and purely functional architecture and minimal external dependencies, laplax offers a flexible and researcher-friendly framework for rapid prototyping and experimentation. Its goal is to facilitate research on Bayesian neural networks, uncertainty quantification for deep learning, and the development of improved Laplace approximation techniques.
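For readers unfamiliar with the technique, the textbook form of the Laplace approximation is given below. It is the standard definition, not a statement about how laplax implements it; in practice the Hessian is often replaced by a cheaper curvature approximation.

```latex
\theta^\ast = \arg\max_\theta \log p(\theta \mid \mathcal{D}), \qquad
H = -\nabla_\theta^2 \log p(\theta \mid \mathcal{D})\,\Big|_{\theta=\theta^\ast}, \qquad
p(\theta \mid \mathcal{D}) \approx \mathcal{N}\!\left(\theta^\ast,\; H^{-1}\right)
```

Here $\theta$ denotes the network weights, $\mathcal{D}$ the training data, and $\theta^\ast$ the maximum a posteriori estimate around which the Gaussian is centered.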
【13】Evaluating Artificial Intelligence Algorithms for the Standardization of Transtibial Prosthetic Socket Shape Design
标题:评估人工智能算法用于跨胫假体承窝形状设计标准化
链接:https://arxiv.org/abs/2507.16818
作者:rdaan, M. van der Stelt, T.J.J. Maal, V.M.A. Stirler, R. Leijendekkers, T. Kachman, G.A. de Jong
摘要:经胫骨假肢接受腔的质量取决于修复师的技能和专业知识,因为装配是手动进行的。本研究调查了多种人工智能(AI)方法,以帮助标准化经胫骨假肢接受腔设计。来自118名患者的数据由在荷兰医疗系统工作的假肢专家收集。这些数据包括残肢的三维(3D)扫描和假肢师设计的接受腔的相应3D模型。使用可变形模型和主成分分析执行多个数据预处理步骤以进行对齐、标准化和可选压缩。之后,开发了三种不同的算法-3D神经网络,前馈神经网络和随机森林-以预测1)最终的插座形状或2)由假肢专家进行的调整,以基于残肢的3D扫描预测插座形状。每种算法的性能都是通过比较假肢设计的插座与AI生成的插座来评估的,使用两个指标结合错误位置。首先,我们测量表面到表面的距离,以评估人工智能生成的插座和假肢设计的插座之间的整体表面误差。其次,利用人工智能生成的和假肢之间的距离图来分析错误的位置。对于所有的算法,估计所需的适应优于直接预测的最终插座形状。随机森林模型应用于适应性预测产生最低的误差与中位数的表面到表面的距离为1.24毫米,1.03毫米的第一个四分位数,和1.54毫米的第三个四分位数。
摘要:The quality of a transtibial prosthetic socket depends on the prosthetist's skills and expertise, as the fitting is performed manually. This study investigates multiple artificial intelligence (AI) approaches to help standardize transtibial prosthetic socket design. Data from 118 patients were collected by prosthetists working in the Dutch healthcare system. This data consists of a three-dimensional (3D) scan of the residual limb and a corresponding 3D model of the prosthetist-designed socket. Multiple data pre-processing steps are performed for alignment, standardization and optionally compression using Morphable Models and Principal Component Analysis. Afterward, three different algorithms - a 3D neural network, Feedforward neural network, and random forest - are developed to either predict 1) the final socket shape or 2) the adaptations performed by a prosthetist to predict the socket shape based on the 3D scan of the residual limb. Each algorithm's performance was evaluated by comparing the prosthetist-designed socket with the AI-generated socket, using two metrics in combination with the error location. First, we measure the surface-to-surface distance to assess the overall surface error between the AI-generated socket and the prosthetist-designed socket. Second, distance maps between the AI-generated and prosthetist sockets are utilized to analyze the error's location. For all algorithms, estimating the required adaptations outperformed direct prediction of the final socket shape. The random forest model applied to adaptation prediction yields the lowest error with a median surface-to-surface distance of 1.24 millimeters, a first quartile of 1.03 millimeters, and a third quartile of 1.54 millimeters.
【14】Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows
标题:达西流反演中高效代理模型构建的序贯贝叶斯设计
链接:https://arxiv.org/abs/2507.17713
作者:ng, Hongqiao Wang, Jinyong Ying, Qingping Zhou
备注:21 pages, 15 figures
摘要:偏微分方程(PDE)反问题在计算科学、图像处理和工程等领域中起着至关重要的作用。达西渗流方程是流体力学中的基本方程,对认识流体在多孔介质中的渗流起着至关重要的作用。贝叶斯方法提供了一种有效的方法来解决偏微分方程反问题,而他们的数值实现需要大量的计算昂贵的前向求解器的评估。因此,采用具有较低计算成本的代理模型是必不可少的。然而,为高维复杂问题构建全局精确的代理模型需要高的模型容量和大量的数据。为了应对这一挑战,本研究提出了一种有效的局部准确的替代品,专注于逆问题中真实可能性的高概率区域,具有相对较低的模型复杂性和较少的训练数据要求。此外,我们引入了一个顺序贝叶斯设计策略,以获得建议的代理,因为高概率区域的可能性是未知的。该策略将序贯贝叶斯设计的后验进化过程视为高斯过程,通过提前一步实现算法加速。完整的算法框架被称为局部精确替代的序贯贝叶斯设计(SBD-LAS)。最后,基于达西流动方程的三个实验表明,所提出的方法在反演精度和计算速度方面的优势。
摘要:Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. In particular, the Darcy flow equation is a fundamental equation in fluid mechanics, which plays a crucial role in understanding fluid flow through porous media. Bayesian methods provide an effective approach for solving PDE inverse problems, but their numerical implementation requires numerous evaluations of computationally expensive forward solvers. Therefore, the adoption of surrogate models with lower computational costs is essential. However, constructing a globally accurate surrogate model for high-dimensional complex problems demands high model capacity and large amounts of data. To address this challenge, this study proposes an efficient locally accurate surrogate that focuses on the high-probability regions of the true likelihood in inverse problems, with relatively low model complexity and few training data requirements. Additionally, we introduce a sequential Bayesian design strategy to acquire the proposed surrogate since the high-probability region of the likelihood is unknown. The strategy treats the posterior evolution process of sequential Bayesian design as a Gaussian process, enabling algorithmic acceleration through one-step ahead prior. The complete algorithmic framework is referred to as Sequential Bayesian design for locally accurate surrogate (SBD-LAS). Finally, three experiments based on the Darcy flow equation demonstrate the advantages of the proposed method in terms of both inversion accuracy and computational speed.
【15】Time Deep Gradient Flow Method for pricing American options
标题:美式期权定价的时间深度梯度流法
链接:https://arxiv.org/abs/2507.17606
作者:u
备注:13 pages, 6 figures
摘要:在这项研究中,我们探索基于神经网络的方法,在Black-Scholes和Heston模型下为多维美式看跌期权定价,最高扩展到五个维度。我们重点介绍了两种方法:时间深度梯度流(TDGF)方法和深度伽辽金方法(DGM)。我们扩展了TDGF方法来处理美式期权中固有的自由边界偏微分方程。我们在训练过程中仔细设计了采样策略,以提高性能。TDGF和DGM都实现了高精度,同时在计算速度方面优于传统的Monte Carlo方法。特别是,TDGF在训练期间往往比DGM更快。
摘要:In this research, we explore neural network-based methods for pricing multidimensional American put options under the Black-Scholes and Heston models, extending up to five dimensions. We focus on two approaches: the Time Deep Gradient Flow (TDGF) method and the Deep Galerkin Method (DGM). We extend the TDGF method to handle the free-boundary partial differential equation inherent in American options. We carefully design the sampling strategy during training to enhance performance. Both TDGF and DGM achieve high accuracy while outperforming conventional Monte Carlo methods in terms of computational speed. In particular, TDGF tends to be faster during training than DGM.
【16】Spintronic Bayesian Hardware Driven by Stochastic Magnetic Domain Wall Dynamics
标题:随机磁畴壁动力学驱动的自旋电子贝叶斯硬件
链接:https://arxiv.org/abs/2507.17193
作者:ng, Bingqian Dai, Kin Wong, Yaochen Li, Yang Cheng, Qingyuan Shu, Haoran He, Puyang Huang, Hanshen Huang, Kang L. Wang
Abstract: As artificial intelligence (AI) advances into diverse applications, ensuring the reliability of AI models is increasingly critical. Conventional neural networks offer strong predictive capabilities but produce deterministic outputs without inherent uncertainty estimation, limiting their reliability in safety-critical domains. Probabilistic neural networks (PNNs), which introduce randomness, have emerged as a powerful approach for enabling intrinsic uncertainty quantification. However, traditional CMOS architectures are inherently designed for deterministic operation and actively suppress intrinsic randomness. This poses a fundamental challenge for implementing PNNs, as probabilistic processing introduces significant computational overhead. To address this challenge, we introduce a Magnetic Probabilistic Computing (MPC) platform, an energy-efficient, scalable hardware accelerator that leverages intrinsic magnetic stochasticity for uncertainty-aware computing. This physics-driven strategy utilizes spintronic systems based on magnetic domain walls (DWs) and their dynamics to establish a new paradigm of physical probabilistic computing for AI. The MPC platform integrates three key mechanisms: thermally induced DW stochasticity, voltage-controlled magnetic anisotropy (VCMA), and tunneling magnetoresistance (TMR), enabling fully electrical and tunable probabilistic functionality at the device level. As a representative demonstration, we implement a Bayesian Neural Network (BNN) inference structure and validate its functionality on CIFAR-10 classification tasks. Compared to standard 28 nm CMOS implementations, our approach achieves an improvement of seven orders of magnitude in the overall figure of merit, with substantial gains in area efficiency, energy consumption, and speed. These results underscore the MPC platform's potential to enable reliable and trustworthy physical AI systems.
【17】Fast and Scalable Gene Embedding Search: A Comparative Study of FAISS and ScaNN
Link: https://arxiv.org/abs/2507.16978
Authors: Saleh Refahi, Gavin Hearne, Harrison Muller, Kieran Lynch, Bahrad A. Sokhansanj, James R. Brown, Gail Rosen
Abstract: The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a foundational task in bioinformatics for detecting homology, functional similarity, and novelty among genomic and proteomic sequences. Although tools like BLAST have been widely used and remain effective in many scenarios, they suffer from limitations such as high computational cost and poor performance on divergent sequences. In this work, we explore embedding-based similarity search methods that learn latent representations capturing deeper structural and functional patterns beyond raw sequence alignment. We systematically evaluate two state-of-the-art vector search libraries, FAISS and ScaNN, on biologically meaningful gene embeddings. Unlike prior studies, our analysis focuses on bioinformatics-specific embeddings and benchmarks their utility for detecting novel sequences, including those from uncharacterized taxa or genes lacking known homologs. Our results highlight both computational advantages (in memory and runtime efficiency) and improved retrieval quality, offering a promising alternative to traditional alignment-heavy tools.
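Approximate vector search libraries are commonly judged against an exact brute-force search over the same embeddings. The sketch below shows that baseline in plain Python, together with a recall@k score. The abstract does not specify the evaluation protocol, so cosine similarity, recall@k, and all function names and toy vectors here are assumptions for illustration. No FAISS or ScaNN API is used.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def exact_top_k(query, database, k):
    """Return indices of the k database vectors most cosine-similar to query."""
    q = normalize(query)
    scores = []
    for idx, vec in enumerate(database):
        v = normalize(vec)
        scores.append((sum(a * b for a, b in zip(q, v)), idx))
    # Highest similarity first; ties are broken by the lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the approximate result found."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real gene embeddings would be much larger.
    database = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    query = [1.0, 0.05, 0.0]
    truth = exact_top_k(query, database, k=2)
    hypothetical_ann_result = [0, 2]  # stand-in for an approximate index's output
    print(truth, recall_at_k(hypothetical_ann_result, truth))
```

Brute-force search costs time linear in the database size per query, which is the cost that approximate indexes are designed to reduce in exchange for some loss of recall.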
【18】Avoiding spectral pollution for transfer operators using residuals
Link: https://arxiv.org/abs/2507.16915
Authors: wig, Matthew J. Colbrook, Oliver Junge, Péter Koltai, Julia Slipantschuk
Abstract: Koopman operator theory enables linear analysis of nonlinear dynamical systems by lifting their evolution to infinite-dimensional function spaces. However, finite-dimensional approximations of Koopman and transfer (Frobenius-Perron) operators are prone to spectral pollution, introducing spurious eigenvalues that can compromise spectral computations. While recent advances have yielded provably convergent methods for Koopman operators, analogous tools for general transfer operators remain limited. In this paper, we present algorithms for computing spectral properties of transfer operators without spectral pollution, including extensions to the Hardy-Hilbert space. Case studies, ranging from families of Blaschke maps with known spectrum to a molecular dynamics model of protein folding, demonstrate the accuracy and flexibility of our approach. Notably, we demonstrate that spectral features can arise even when the corresponding eigenfunctions lie outside the chosen space, highlighting the functional-analytic subtleties in defining the "true" Koopman spectrum. Our methods offer robust tools for spectral estimation across a broad range of applications.
Machine translation provided by Tencent Interactive Translation; for reference only.