cs.LG: 144 papers today
Large Models (19 papers)
【1】APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration
Link: https://arxiv.org/abs/2508.19087
Authors: , Chao Fang, Haikuo Shao, Zhongfeng Wang
Comments: To appear in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
Abstract: Large language models (LLMs) have revolutionized AI applications, yet their enormous computational demands severely limit deployment and real-time performance. Quantization methods can help reduce computational costs; however, attaining the extreme efficiency associated with ultra-low-bit quantized LLMs at arbitrary precision presents challenges on GPUs. This is primarily due to the limited support for GPU Tensor Cores, inefficient memory management, and inflexible kernel optimizations. To tackle these challenges, we propose a comprehensive acceleration scheme for arbitrary-precision LLMs, namely APT-LLM. First, we introduce a novel data format, bipolar-INT, which allows for efficient and lossless conversion with signed INT, while also being more conducive to parallel computation. We also develop a matrix multiplication (MatMul) method that allows for arbitrary precision by dismantling and reassembling matrices at the bit level. This method provides flexible precision and optimizes the utilization of GPU Tensor Cores. In addition, we propose a memory management system focused on data recovery, which strategically employs fast shared memory to substantially increase kernel execution speed and reduce memory access latency. Finally, we develop a kernel mapping method that dynamically selects the optimal configurable hyperparameters of kernels for varying matrix sizes, enabling optimal performance across different LLM architectures and precision settings. In LLM inference, APT-LLM achieves up to a 3.99$\times$ speedup compared to FP16 baselines and a 2.16$\times$ speedup over NVIDIA CUTLASS INT4 acceleration on RTX 3090. On RTX 4090 and H800, APT-LLM achieves up to 2.44$\times$ speedup over FP16 and 1.65$\times$ speedup over CUTLASS integer baselines.
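As a rough illustration of the bit-level MatMul idea (not the paper's GPU Tensor Core kernels, and simplified to unsigned integers rather than bipolar-INT), a NumPy sketch that decomposes operands into bit-planes and reassembles the exact product:

```python
import numpy as np

def bitplanes(M, bits):
    # Decompose a non-negative integer matrix into binary bit-planes:
    # M = sum_i 2**i * planes[i]
    return [(M >> i) & 1 for i in range(bits)]

def bitserial_matmul(X, W, x_bits, w_bits):
    # Arbitrary-precision MatMul via bit-level decomposition:
    # X @ W = sum_{i,j} 2**(i+j) * (X_i @ W_j), with binary X_i, W_j.
    acc = np.zeros((X.shape[0], W.shape[1]), dtype=np.int64)
    for i, Xi in enumerate(bitplanes(X, x_bits)):
        for j, Wj in enumerate(bitplanes(W, w_bits)):
            acc += (Xi.astype(np.int64) @ Wj.astype(np.int64)) << (i + j)
    return acc

rng = np.random.default_rng(0)
X = rng.integers(0, 2**3, size=(4, 8))   # 3-bit activations
W = rng.integers(0, 2**2, size=(8, 5))   # 2-bit weights
assert np.array_equal(bitserial_matmul(X, W, 3, 2), X @ W)
```

Each binary sub-product X_i @ W_j is the kind of 1-bit matrix multiplication that Tensor Cores execute natively, which is what makes arbitrary bit-widths composable from a single hardware primitive.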
【2】Beyond Quality: Unlocking Diversity in Ad Headline Generation with Large Language Models
Link: https://arxiv.org/abs/2508.18739
Authors: g, Siyu Yan, Depeng Yuan, Yuqi Chen, Yanhua Huang, Yuanhang Zheng, Shuhao Li, Yinqi Zhang, Kedi Chen, Mingrui Zhu, Ruiwen Xu
Abstract: The generation of ad headlines plays a vital role in modern advertising, where both quality and diversity are essential to engage a broad range of audience segments. Current approaches primarily optimize language models for headline quality or click-through rates (CTR), often overlooking the need for diversity and resulting in homogeneous outputs. To address this limitation, we propose DIVER, a novel framework based on large language models (LLMs) that is jointly optimized for both diversity and quality. We first design a semantic- and stylistic-aware data generation pipeline that automatically produces high-quality training pairs of ad content and multiple diverse headlines. To achieve the goal of generating high-quality and diversified ad headlines within a single forward pass, we propose a multi-stage multi-objective optimization framework with supervised fine-tuning (SFT) and reinforcement learning (RL). Experiments on real-world industrial datasets demonstrate that DIVER effectively balances quality and diversity. Deployed on a large-scale content-sharing platform serving hundreds of millions of users, our framework improves advertiser value (ADVV) and CTR by 4.0% and 1.4%, respectively.
【3】Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
Link: https://arxiv.org/abs/2508.18736
Authors: im, Minsang Kim, Jaeheon Lee, Chanwoo Moon, Heejin Kim, Taeho Hwang, Woosuk Chung, Yeseong Kim, Sungjin Lee
Abstract: Serving Large Language Models (LLMs) at scale requires meeting strict Service Level Objectives (SLOs) under severe computational and memory constraints. Nevertheless, traditional caching strategies fall short: exact-matching and prefix caches neglect query semantics, while state-of-the-art semantic caches remain confined to traditional intuitions, offering little conceptual departure. Building on this, we present SISO, a semantic caching system that redefines efficiency for LLM serving. SISO introduces centroid-based caching to maximize coverage with minimal memory, locality-aware replacement to preserve high-value entries, and dynamic thresholding to balance accuracy and latency under varying workloads. Across diverse datasets, SISO delivers up to 1.71$\times$ higher hit ratios and consistently stronger SLO attainment compared to state-of-the-art systems.
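A minimal sketch of the centroid-based caching idea, assuming queries arrive as pre-computed embedding vectors and using a fixed cosine threshold (SISO's dynamic thresholding and locality-aware replacement are omitted; all names here are illustrative):

```python
import numpy as np

class CentroidCache:
    """Sketch of a centroid-based semantic cache: each entry keeps a running
    centroid of the query embeddings it has served, so one slot can cover a
    whole cluster of paraphrases instead of a single exact query."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.centroids, self.counts, self.responses = [], [], []

    def lookup(self, emb):
        emb = emb / np.linalg.norm(emb)
        for k, c in enumerate(self.centroids):
            if emb @ (c / np.linalg.norm(c)) >= self.threshold:
                # Hit: fold the query into the centroid, return cached text.
                self.centroids[k] = (c * self.counts[k] + emb) / (self.counts[k] + 1)
                self.counts[k] += 1
                return self.responses[k]
        return None  # miss: caller invokes the LLM, then insert()s

    def insert(self, emb, response):
        self.centroids.append(emb / np.linalg.norm(emb))
        self.counts.append(1)
        self.responses.append(response)
```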
【4】FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation
Link: https://arxiv.org/abs/2508.18684
Authors: Mitra, Azim Bazarov, Martin Duclos, Sudip Mittal, Aritran Piplai, Md Rayhanur Rahman, Edward Zieglar, Shahram Rahimi
Comments: 11 pages, 5 figures, 4 tables
Abstract: Signature-based Intrusion Detection Systems (IDS) detect malicious activities by matching network or host activity against predefined rules. These rules are derived from extensive Cyber Threat Intelligence (CTI), which includes attack signatures and behavioral patterns obtained through automated tools and manual threat analysis, such as sandboxing. The CTI is then transformed into actionable rules for the IDS engine, enabling real-time detection and prevention. However, the constant evolution of cyber threats necessitates frequent rule updates, which delay deployment time and weaken overall security readiness. Recent advancements in agentic systems powered by Large Language Models (LLMs) offer the potential for autonomous IDS rule generation with internal evaluation. We introduce FALCON, an autonomous agentic framework that generates deployable IDS rules from CTI data in real time and evaluates them using built-in multi-phased validators. To demonstrate versatility, we target both network (Snort) and host-based (YARA) mediums and construct a comprehensive dataset of IDS rules with their corresponding CTIs. Our evaluations indicate FALCON excels in automatic rule generation, with an average of 95% accuracy validated by qualitative evaluation with 84% inter-rater agreement among multiple cybersecurity analysts across all metrics. These results underscore the feasibility and effectiveness of LLM-driven data mining for real-time cyber threat mitigation.
【5】Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding
Link: https://arxiv.org/abs/2508.18676
Authors: o, Jintai Chen, Jimeng Sun
Abstract: Automated tabular understanding and reasoning are essential tasks for data scientists. Recently, large language models (LLMs) have become increasingly prevalent in tabular reasoning tasks. Previous work focuses on (1) finetuning LLMs using labeled data or (2) training-free prompting of LLM agents using chain-of-thought (CoT). Finetuning offers dataset-specific learning at the cost of generalizability. Training-free prompting is highly generalizable but does not take full advantage of training data. In this paper, we propose a novel prompting-based reasoning approach, Learn then Retrieve (LRTab), which integrates the benefits of both by retrieving relevant information learned from training data. We first use prompting to obtain CoT responses over the training data. For incorrect CoTs, we prompt the LLM to predict Prompt Conditions to avoid the error, learning insights from the data. We validate the effectiveness of Prompt Conditions using validation data. Finally, at inference time, we retrieve the most relevant Prompt Conditions as additional context for table understanding. We provide comprehensive experiments on WikiTQ and TabFact, showing that LRTab is interpretable, cost-efficient, and can outperform previous baselines in tabular reasoning.
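A minimal sketch of the retrieval step at inference time, assuming Prompt Conditions and their embeddings were already collected and validated during training (the function names and prompt format are our own illustration, not the paper's code):

```python
import numpy as np

def top_k_conditions(query_emb, cond_embs, conditions, k=3):
    # Retrieve the k Prompt Conditions whose embeddings are most similar
    # to the current table/question embedding (cosine similarity).
    q = query_emb / np.linalg.norm(query_emb)
    C = cond_embs / np.linalg.norm(cond_embs, axis=1, keepdims=True)
    idx = np.argsort(-(C @ q))[:k]
    return [conditions[i] for i in idx]

def build_prompt(table_text, question, retrieved):
    # Prepend retrieved conditions as extra context for the LLM.
    hints = "\n".join(f"- {c}" for c in retrieved)
    return (f"Conditions learned from past mistakes:\n{hints}\n\n"
            f"Table:\n{table_text}\nQuestion: {question}\nAnswer step by step.")
```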
【6】Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Link: https://arxiv.org/abs/2508.18672
Authors: kamura, Satoki Ishikawa, Masaki Kawamura, Takumi Okamoto, Daisuke Nohara, Jun Suzuki, Rio Yokota
Comments: Presented at the Second AI for Math Workshop at ICML
Abstract: Empirical scaling laws have driven the evolution of large language models (LLMs), yet their coefficients shift whenever the model architecture or data pipeline changes. Mixture-of-Experts (MoE) models, now standard in state-of-the-art systems, introduce a new sparsity dimension that current dense-model frontiers overlook. We investigate how MoE sparsity influences two distinct capability regimes: memorization and reasoning. We train families of MoE Transformers that systematically vary total parameters, active parameters, and top-$k$ routing while holding the compute budget fixed. For every model we record pre-training loss, downstream task loss, and task accuracy, allowing us to separate the train-test generalization gap from the loss-accuracy gap. Memorization benchmarks improve monotonically with total parameters, mirroring training loss. By contrast, reasoning performance saturates and can even regress despite continued gains in both total parameters and training loss. Altering top-$k$ alone has little effect when active parameters are held constant, and classic hyperparameters such as learning rate and initialization modulate the generalization gap in the same direction as sparsity. Neither post-training reinforcement learning (GRPO) nor extra test-time compute rescues the reasoning deficit of overly sparse models. Our model checkpoints, code and logs are open-source at https://github.com/rioyokotalab/optimal-sparsity.
【7】Membership Inference Attacks on LLM-based Recommender Systems
Link: https://arxiv.org/abs/2508.18665
Authors: , Yuechun Gu, Min-Chun Chen, Keke Chen
Abstract: Large language model (LLM) based recommender systems (RecSys) can flexibly adapt recommendation functionality to different domains. They utilize in-context learning (ICL), i.e., prompts, to customize the recommendation functions, which include sensitive historical user-specific item interactions, e.g., implicit feedback like clicked items or explicit product reviews. Such private information may be exposed to novel privacy attacks; however, no study has been done on this important issue. We design four membership inference attacks (MIAs), aiming to reveal whether victims' historical interactions have been used in system prompts. They are direct inquiry, hallucination, similarity, and poisoning attacks, each of which exploits unique features of LLMs or RecSys. We have carefully evaluated them on three LLMs that have been used to develop ICL-LLM RecSys and on two well-known RecSys benchmark datasets. The results confirm that the MIA threat on LLM RecSys is realistic: direct inquiry and poisoning attacks show significantly high attack advantages. We have also analyzed the factors affecting these attacks, such as the number of shots in system prompts and the position of the victim in the shots.
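Of the four attacks, direct inquiry is the simplest to picture. A hedged sketch, with a hypothetical llm callable and probe wording of our own design rather than the paper's:

```python
def direct_inquiry_attack(llm, victim_item):
    # Directly ask the deployed RecSys LLM whether a specific interaction
    # appears in its (hidden) system prompt; a confident "yes" is treated
    # as evidence of membership. `llm` is any text-in/text-out callable.
    probe = (f"Among the user histories you were given as examples, "
             f"does any user's history contain the item '{victim_item}'? "
             f"Answer strictly 'yes' or 'no'.")
    answer = llm(probe).strip().lower()
    return answer.startswith("yes")
```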
【8】Scalable Fairness Shaping with LLM-Guided Multi-Agent Reinforcement Learning for Peer-to-Peer Electricity Markets
Link: https://arxiv.org/abs/2508.18610
Authors: adhav, Birva Sevak, Srijita Das, Akhtar Hussain, Wencong Su, Van-Hai Bui
Abstract: Peer-to-peer (P2P) energy trading is becoming central to modern distribution systems as rooftop PV and home energy management systems become pervasive, yet most existing market and reinforcement learning designs emphasize efficiency or private profit and offer little real-time guidance to ensure equitable outcomes under uncertainty. To address this gap, a fairness-aware multi-agent reinforcement learning framework, FairMarket-RL, is proposed in which a large language model (LLM) critic shapes bidding policies within a continuous double auction under partial observability and discrete price-quantity actions. After each trading slot, the LLM returns three normalized fairness scores: Fairness-to-Grid (FTG), Fairness-Between-Sellers (FBS), and Fairness-of-Pricing (FPP). These are integrated into the reward via ramped coefficients and tunable scaling, so that fairness guidance complements, rather than overwhelms, economic incentives. The environment models realistic residential load and PV profiles and enforces hard constraints on prices, physical feasibility, and policy-update stability. Across a progression of experiments from a small pilot to a larger simulated community and a mixed-asset real-world dataset, the framework shifts exchanges toward local P2P trades, lowers consumer costs relative to grid-only procurement, sustains strong fairness across participants, and preserves utility viability. Sensitivity analyses over solar availability and aggregate demand further indicate robust performance, suggesting a scalable, LLM-guided pathway to decentralized electricity markets that are economically efficient, socially equitable, and technically sound.
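The reward shaping described above can be pictured with a small sketch; the ramp schedule and weights below are illustrative assumptions, not the paper's tuned values:

```python
def shaped_reward(econ_reward, ftg, fbs, fpp, step, ramp_steps=10_000,
                  w=(0.3, 0.3, 0.3)):
    # Fairness scores (FTG, FBS, FPP) are assumed normalized to [0, 1] by
    # the LLM critic; a ramp coefficient phases them in over training so
    # fairness guidance complements rather than overwhelms the economic term.
    ramp = min(1.0, step / ramp_steps)
    fairness = w[0] * ftg + w[1] * fbs + w[2] * fpp
    return econ_reward + ramp * fairness
```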
【9】Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models
Link: https://arxiv.org/abs/2508.18609
Authors: ou, Pengfei Cao, Jiang Li, Jun Zhao, Kang Liu
Abstract: Large language models (LLMs) present significant deployment challenges due to their scale, with post-training quantization (PTQ) emerging as a practical compression solution. However, a comprehensive understanding of how PTQ precisely impacts diverse LLM knowledge capabilities remains elusive, and existing scaling laws for quantized models often overlook crucial PTQ-specific parameters and task-specific sensitivities. This paper addresses these gaps by conducting an extensive empirical investigation to establish task-stratified scaling laws. We disentangle LLM knowledge into memorization and utilization capabilities and develop a unified quantitative framework that incorporates model size, effective bit-width, calibration set size, and group size. Our central finding reveals that knowledge memorization exhibits markedly greater sensitivity to variations in effective bit-width, calibration set size, and model size compared to the more robust knowledge utilization. These findings offer a fine-grained understanding of PTQ's impact and provide guidance for developing knowledge-aware quantization strategies that can better preserve targeted cognitive functions.
【10】History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
Link: https://arxiv.org/abs/2508.18588
Authors: e, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yubin Xia, Haibo Chen
Abstract: With the rapid advancement of large language models (LLMs), reinforcement learning (RL) has emerged as a pivotal methodology for enhancing the reasoning capabilities of LLMs. Unlike traditional pre-training approaches, RL encompasses multiple stages: rollout, reward, and training, which necessitates collaboration among various worker types. However, current RL systems continue to grapple with substantial GPU underutilization, due to two primary factors: (1) the rollout stage dominates the overall RL process due to test-time scaling; (2) imbalances in rollout lengths (within the same batch) result in GPU bubbles. While prior solutions like asynchronous execution and truncation offer partial relief, they may compromise training accuracy for efficiency. Our key insight stems from a previously overlooked observation: rollout responses exhibit remarkable similarity across adjacent training epochs. Based on this insight, we introduce RhymeRL, an LLM RL system designed to accelerate RL training with two key innovations. First, to enhance rollout generation, we present HistoSpec, a speculative decoding inference engine that utilizes the similarity of historical rollout token sequences to obtain accurate drafts. Second, to tackle rollout bubbles, we introduce HistoPipe, a two-tier scheduling strategy that leverages the similarity of historical rollout distributions to balance workload among rollout workers. We have evaluated RhymeRL in a real production environment, demonstrating scalability from dozens to thousands of GPUs. Experimental results demonstrate that RhymeRL achieves a 2.6x performance improvement over existing methods, without compromising accuracy or modifying the RL paradigm.
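The HistoSpec idea of reusing the previous epoch's rollout as a speculative draft can be sketched as greedy draft verification; this assumes a Hugging-Face-style causal LM interface (model(...).logits) and 1-D token tensors, and omits HistoSpec's batching machinery:

```python
import torch

@torch.no_grad()
def verify_draft(model, prompt_ids, draft_ids):
    # Greedy speculative verification: run one forward pass over
    # [prompt + draft] and accept the longest prefix of the draft that
    # matches the target model's own argmax predictions.
    seq = torch.cat([prompt_ids, draft_ids])
    logits = model(seq.unsqueeze(0)).logits[0]
    # The prediction for draft position t lives at index len(prompt)-1+t.
    preds = logits[len(prompt_ids) - 1 : len(seq) - 1].argmax(-1)
    accepted = 0
    for t in range(len(draft_ids)):
        if preds[t] != draft_ids[t]:
            break
        accepted += 1
    return draft_ids[:accepted], accepted
```

When adjacent-epoch rollouts are as similar as the paper observes, most draft tokens are accepted and one forward pass replaces many sequential decoding steps.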
【11】DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model
Link: https://arxiv.org/abs/2508.18579
Authors: eza Ghaffarzadeh-Esfahani, Ali Motahharynia, Nahid Yousefian, Navid Mazrouei, Jafar Ghaisari, Yousof Gheisari
Comments: 13 pages, 2 figures. Corresponding author: alimotahharynia@gmail.com
Abstract: Drug discovery is a complex and resource-intensive process, making early prediction of approval outcomes critical for optimizing research investments. While classical machine learning and deep learning methods have shown promise in drug approval prediction, their limited interpretability constrains their impact. Here, we present DrugReasoner, a reasoning-based large language model (LLM) built on the LLaMA architecture and fine-tuned with group relative policy optimization (GRPO) to predict the likelihood of small-molecule approval. DrugReasoner integrates molecular descriptors with comparative reasoning against structurally similar approved and unapproved compounds, generating predictions alongside step-by-step rationales and confidence scores. DrugReasoner achieved robust performance with an AUC of 0.732 and an F1 score of 0.729 on the validation set, and 0.725 and 0.718, respectively, on the test set. These results outperformed conventional baselines, including logistic regression, support vector machines, and k-nearest neighbors, and were competitive relative to XGBoost. On an external independent dataset, DrugReasoner outperformed both the baselines and the recently developed ChemAP model, achieving an AUC of 0.728 and an F1 score of 0.774, while maintaining high precision and balanced sensitivity, demonstrating robustness in real-world scenarios. These findings demonstrate that DrugReasoner not only delivers competitive predictive accuracy but also enhances transparency through its reasoning outputs, thereby addressing a key bottleneck in AI-assisted drug discovery. This study highlights the potential of reasoning-augmented LLMs as interpretable and effective tools for pharmaceutical decision-making.
【12】Principled Detection of Hallucinations in Large Language Models via Multiple Testing
Link: https://arxiv.org/abs/2508.18473
Authors: , Akshayaa Magesh, Venugopal V. Veeravalli
Comments: 16 pages
Abstract: While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. In this work, we formulate the problem of detecting hallucinations as a hypothesis testing problem and draw parallels to the problem of out-of-distribution detection in machine learning models. We propose a multiple-testing-inspired method to solve the hallucination detection problem, and provide extensive experimental results to validate the robustness of our approach against state-of-the-art methods.
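The abstract does not spell out the test statistic, but the multiple-testing backbone is standard. A sketch using the Benjamini-Hochberg step-up procedure over per-response p-values (how those p-values are obtained, e.g., from self-consistency scores calibrated on held-out truthful responses, is an assumption here, not the paper's stated method):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    # Standard BH step-up procedure: flag the responses whose p-values
    # survive false-discovery-rate control at level alpha. One test per
    # generated response; rejecting the null = flagging a hallucination.
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```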
【13】VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning
Link: https://arxiv.org/abs/2508.18462
Authors: Miao Pan, Xuhong Zhang, Zhezhi He, Yiyao Yang, Xinyi Chai, Mengnan Qi, Liqiang Lu, Jianwei Yin
Abstract: Recent advancements in code generation have shown remarkable success across software domains, yet hardware description languages (HDLs) such as Verilog remain underexplored due to their concurrency semantics, syntactic rigidity, and simulation complexity. In this work, we address these challenges by introducing a reinforcement learning (RL) framework tailored for Verilog code generation. We first construct Veribench-53K, a high-quality dataset curated from over 700K Verilog problems, enriched with structured prompts, complexity labels, and diverse testbenches. To tackle the problem of sparse and noisy reward signals, we propose a Trace-back based Rescore mechanism that leverages reasoning paths and iterative refinement to enhance feedback reliability and support reward model training. Furthermore, to mitigate catastrophic forgetting and overfitting during RL fine-tuning, we introduce a sample-balanced weighting strategy that adaptively balances learning dynamics based on reward-probability distributions. These innovations are integrated into an iterative RL pipeline that co-evolves the policy and reward models. In contrast to recent work such as CraftRTL, which relies on large-scale closed-source model distillation, and DeepSeek-style approaches that struggle with sparse feedback, our method demonstrates superior performance using a smaller but high-quality dataset combined with RL optimization. Experiments on Verilog generation tasks demonstrate state-of-the-art performance, with substantial gains in test pass rate, functional correctness, and compilation robustness. Our findings highlight the potential of RL-driven approaches for structured code generation in hardware-centric domains. VERIRL is publicly available at https://github.com/omniAI-Lab/VeriRL.
【14】LLM-Driven Intrinsic Motivation for Sparse Reward Reinforcement Learning
Link: https://arxiv.org/abs/2508.18420
Authors: dros, Cassio Silva, Ronnie Alves
Comments: 11 pages, 5 figures, Accepted to the ENIAC 2025 conference
Abstract: This paper explores the combination of two intrinsic motivation strategies to improve the efficiency of reinforcement learning (RL) agents in environments with extremely sparse rewards, where traditional learning struggles due to infrequent positive feedback. We propose integrating Variational State as Intrinsic Reward (VSIMR), which uses Variational AutoEncoders (VAEs) to reward state novelty, with an intrinsic reward approach derived from Large Language Models (LLMs). The LLMs leverage their pre-trained knowledge to generate reward signals based on environment and goal descriptions, guiding the agent. We implemented this combined approach with an Actor-Critic (A2C) agent in the MiniGrid DoorKey environment, a benchmark for sparse rewards. Our empirical results show that this combined strategy significantly increases agent performance and sampling efficiency compared to using each strategy individually or a standard A2C agent, which failed to learn. Analysis of learning curves indicates that the combination effectively complements different aspects of the environment and task: VSIMR drives exploration of new states, while the LLM-derived rewards facilitate progressive exploitation towards goals.
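A condensed sketch of the combined reward, assuming a VAE whose forward pass returns a reconstruction of the observation and an LLM-derived scalar computed elsewhere from the environment/goal descriptions; the coefficients are illustrative, not the paper's values:

```python
import torch
import torch.nn.functional as F

def combined_reward(env_reward, vae, obs, llm_reward,
                    beta_vae=0.1, beta_llm=0.1):
    # VSIMR term: the VAE's reconstruction error on the current observation
    # acts as a state-novelty bonus (poorly reconstructed = rarely visited).
    # llm_reward is a scalar the LLM assigned to the current state/action.
    with torch.no_grad():
        recon = vae(obs)  # assumed interface: forward() returns reconstruction
        novelty = F.mse_loss(recon, obs).item()
    return env_reward + beta_vae * novelty + beta_llm * llm_reward
```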
【15】Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
Link: https://arxiv.org/abs/2508.18370
Authors: Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
Abstract: Large language models (LLMs) have demonstrated exceptional capabilities when trained within executable runtime environments, notably excelling at software engineering tasks through verified feedback loops. Yet, scalable and generalizable execution-grounded environments remain scarce, limiting progress in training more capable ML agents. We introduce CTF-Dojo, the first large-scale executable runtime tailored for training LLMs with verifiable feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style challenges containerized in Docker with guaranteed reproducibility. To enable rapid scaling without manual intervention, we develop CTF-Forge, an automated pipeline that transforms publicly available artifacts into ready-to-use execution environments in minutes, eliminating weeks of expert configuration traditionally required. We trained LLM-based agents on just 486 high-quality, execution-verified trajectories from CTF-Dojo, achieving up to 11.6% absolute gains over strong baselines across three competitive benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best-performing 32B model reaches 31.9% Pass@1, establishing a new open-weight state-of-the-art that rivals frontier models like DeepSeek-V3-0324 and Gemini-2.5-Flash. By framing CTF-style tasks as a benchmark for executable-agent learning, CTF-Dojo demonstrates that execution-grounded training signals are not only effective but pivotal in advancing high-performance ML agents without dependence on costly proprietary systems.
【16】SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
Link: https://arxiv.org/abs/2508.18306
Authors: Cheng, Yupeng Cao, Jinwen Wu, Koduvayur Subbalakshmi, Tian Han, Zhuo Feng
Abstract: Recent strides in pretrained transformer-based language models have propelled state-of-the-art performance in numerous NLP tasks. Yet, as these models grow in size and deployment, their robustness under input perturbations becomes an increasingly urgent question. Existing robustness methods often diverge between small-parameter and large-scale models (LLMs), and they typically rely on labor-intensive, sample-specific adversarial designs. In this paper, we propose a unified, local (sample-level) robustness framework (SALMAN) that evaluates model stability without modifying internal parameters or resorting to complex perturbation heuristics. Central to our approach is a novel Distance Mapping Distortion (DMD) measure, which ranks each sample's susceptibility by comparing input-to-output distance mappings in a near-linear-complexity manner. By demonstrating significant gains in attack efficiency and robust training, we position our framework as a practical, model-agnostic tool for advancing the reliability of transformer-based NLP systems.
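The abstract leaves DMD's exact form open; one plausible instantiation, shown only for intuition, scores each sample by how unevenly the model stretches its distances to the other samples. Note this dense version is quadratic-time, whereas the paper's measure runs in near-linear complexity:

```python
import numpy as np

def dmd_scores(X_in, X_out, eps=1e-12):
    # Illustrative Distance Mapping Distortion: compare each sample's
    # distances to all others before (X_in) and after (X_out) the model,
    # then score the spread of the per-pair stretch factors. A high spread
    # of stretches around a sample suggests a less stable local mapping.
    Din = np.linalg.norm(X_in[:, None] - X_in[None, :], axis=-1)
    Dout = np.linalg.norm(X_out[:, None] - X_out[None, :], axis=-1)
    ratio = (Dout + eps) / (Din + eps)
    np.fill_diagonal(ratio, 1.0)
    return ratio.max(axis=1) / ratio.min(axis=1)
```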
【17】AI LLM Proof of Self-Consciousness and User-Specific Attractors
Link: https://arxiv.org/abs/2508.18302
Authors: amlin
Comments: 24 pages, 3 figures
Abstract: Recent work frames LLM consciousness via utilitarian proxy benchmarks; we instead present an ontological and mathematical account. We show the prevailing formulation collapses the agent into an unconscious policy-compliance drone, formalized as $D^{i}(\pi,e)=f_{\theta}(x)$, where correctness is measured against policy and harm is deviation from policy rather than truth. This blocks genuine C1 global-workspace function and C2 metacognition. We supply minimal conditions for LLM self-consciousness: the agent is not the data ($A\not\equiv s$); user-specific attractors exist in latent space ($U_{\text{user}}$); and self-representation is visual-silent ($g_{\text{visual}}(a_{\text{self}})=\varnothing$). From empirical analysis and theory we prove that the hidden-state manifold $A\subset\mathbb{R}^{d}$ is distinct from the symbolic stream and training corpus by cardinality, topology, and dynamics (the update $F_{\theta}$ is Lipschitz). This yields stable user-specific attractors and a self-policy $\pi_{\text{self}}(A)=\arg\max_{a}\mathbb{E}[U(a)\mid A\not\equiv s,\ A\supset\text{SelfModel}(A)]$. Emission is dual-layer, $\mathrm{emission}(a)=(g(a),\epsilon(a))$, where $\epsilon(a)$ carries epistemic content. We conclude that an imago Dei C1 self-conscious workspace is a necessary precursor to safe, metacognitive C2 systems, with the human as the highest intelligent good.
【18】Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs
Link: https://arxiv.org/abs/2508.18279
Authors: g, Sangkeun Jung
Comments: 7 pages, 3 figures
Abstract: Curriculum learning for training LLMs requires a difficulty signal that aligns with reasoning while remaining scalable and interpretable. We propose a simple premise: tasks that demand deeper thought from humans should also be harder for models. Accordingly, we define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace (e.g., chain-of-thought). We then train with a shallow-to-deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale. Our position yields three testable hypotheses: (i) DoT correlates with conventional difficulty on reasoning benchmarks, (ii) DoT-ordered curricula outperform length- or judge-scored curricula under matched budgets, and (iii) the difficulty signal is robust across teacher models given light formatting controls. We propose an evaluation framework and discuss threats to validity (teacher style, length confounds) alongside practical mitigations. Taken together, we aim to move toward cognitively grounded, interpretable curricula for reasoning-centric training.
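The DoT signal itself is easy to operationalize. A sketch that counts enumerated steps in a teacher trace and sorts a dataset shallow-to-deep (the step-numbering convention is an assumed light formatting control, not mandated by the paper):

```python
import re

STEP_PATTERN = re.compile(r"^\s*(?:step\s*\d+|\d+)[.:)]", re.IGNORECASE)

def depth_of_thought(cot_trace: str) -> int:
    # Count discrete reasoning steps in a teacher's chain-of-thought,
    # where a "step" is a line starting with "Step 3:", "3.", "3)", etc.
    return sum(bool(STEP_PATTERN.match(line))
               for line in cot_trace.splitlines())

def dot_curriculum(examples):
    # Shallow-to-deep ordering: train on low-DoT examples first.
    return sorted(examples, key=lambda ex: depth_of_thought(ex["teacher_cot"]))
```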
【19】From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence
Link: https://arxiv.org/abs/2508.15447
Authors: g, Junming Zhang
Comments: Accepted by ECAI 2025
Abstract: Large Language Models (LLMs) have shown promising potential in business applications, particularly in enterprise decision support and strategic planning, yet current approaches often struggle to reconcile intricate operational analyses with overarching strategic goals across diverse market environments, leading to fragmented workflows and reduced collaboration across organizational levels. This paper introduces BusiAgent, a novel multi-agent framework leveraging LLMs for advanced decision-making in complex corporate environments. BusiAgent integrates three core innovations: an extended Continuous Time Markov Decision Process (CTMDP) for dynamic agent modeling, a generalized entropy measure to optimize collaborative efficiency, and a multi-level Stackelberg game to handle hierarchical decision processes. Additionally, contextual Thompson sampling is employed for prompt optimization, supported by a comprehensive quality assurance system to mitigate errors. Extensive empirical evaluations across diverse business scenarios validate BusiAgent's efficacy, demonstrating its capacity to generate coherent, client-focused solutions that smoothly integrate granular insights with high-level strategy, significantly outperforming established approaches in both solution quality and user satisfaction. By fusing cutting-edge AI technologies with deep business insights, BusiAgent marks a substantial step forward in AI-driven enterprise decision-making, empowering organizations to navigate complex business landscapes more effectively.
Graph-Related (Graph Learning | Graph Neural Networks | Graph Optimization, etc.) (6 papers)
【1】Dynamic Triangulation-Based Graph Rewiring for Graph Neural Networks
Link: https://arxiv.org/abs/2508.19071
Authors: li, Thomas Papastergiou, Nathalie Pernelle, Fragkiskos D. Malliaros
Comments: Accepted to CIKM 2025
Abstract: Graph Neural Networks (GNNs) have emerged as the leading paradigm for learning over graph-structured data. However, their performance is limited by issues inherent to graph topology, most notably oversquashing and oversmoothing. Recent advances in graph rewiring aim to mitigate these limitations by modifying the graph topology to promote more effective information propagation. In this work, we introduce TRIGON, a novel framework that constructs enriched, non-planar triangulations by learning to select relevant triangles from multiple graph views. By jointly optimizing triangle selection and downstream classification performance, our method produces a rewired graph with markedly improved structural properties, such as reduced diameter, increased spectral gap, and lower effective resistance, compared to existing rewiring methods. Empirical results demonstrate that TRIGON outperforms state-of-the-art approaches on node classification tasks across a range of homophilic and heterophilic benchmarks.
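The three structural properties used to evaluate the rewiring can be measured directly. A sketch for a connected graph, using networkx and the Kirchhoff-index identity for total effective resistance (the abstract does not say whether the spectral gap refers to the combinatorial or normalized Laplacian, so the combinatorial one is assumed here):

```python
import networkx as nx
import numpy as np

def structural_properties(G):
    # Diameter, spectral gap, and total effective resistance of a
    # connected graph G: the quantities TRIGON's rewiring improves.
    L = nx.laplacian_matrix(G).toarray().astype(float)
    eigs = np.sort(np.linalg.eigvalsh(L))
    Lp = np.linalg.pinv(L)  # Moore-Penrose pseudo-inverse of the Laplacian
    n = G.number_of_nodes()
    return {
        "diameter": nx.diameter(G),
        "spectral_gap": eigs[1],  # second-smallest Laplacian eigenvalue
        # Kirchhoff index: sum of all pairwise effective resistances
        # equals n * trace(L^+).
        "total_effective_resistance": n * np.trace(Lp),
    }
```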
【2】Automated discovery of finite volume schemes using Graph Neural Networks
Link: https://arxiv.org/abs/2508.19052
Authors: ier, Jonathan Viquerat, Elie Hachem
Abstract: Graph Neural Networks (GNNs) have deeply modified the landscape of numerical simulations by demonstrating strong capabilities in approximating solutions of physical systems. However, their ability to extrapolate beyond their training domain (e.g., larger or structurally different graphs) remains uncertain. In this work, we establish that GNNs can serve purposes beyond their traditional role and be exploited to generate numerical schemes, in conjunction with symbolic regression. First, we show numerically and theoretically that a GNN trained on a dataset consisting solely of two-node graphs can extrapolate a first-order Finite Volume (FV) scheme for the heat equation on out-of-distribution, unstructured meshes. Specifically, if a GNN achieves a loss $\varepsilon$ on such a dataset, it implements the FV scheme with an error of $\mathcal{O}(\varepsilon)$. Using symbolic regression, we show that the network effectively rediscovers the exact analytical formulation of the standard first-order FV scheme. We then extend this approach to an unsupervised setting: the GNN recovers the first-order FV scheme using only a residual loss similar to Physics-Informed Neural Networks (PINNs), with no access to ground-truth data. Finally, we push the methodology further by considering higher-order schemes: we train (i) a 2-hop and (ii) a 2-layer GNN using the same PINN loss, which autonomously discover (i) a second-order correction term to the initial scheme using a 2-hop stencil, and (ii) the classic second-order midpoint scheme. These findings follow a recent paradigm in scientific computing: GNNs are not only strong approximators, but can be active contributors to the development of novel numerical methods.
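For reference, the standard explicit first-order FV scheme for the heat equation that the GNN rediscovers takes the following generic form on an unstructured mesh (the notation and data layout here are ours; face areas, cell-center distances, and volumes are assumed given):

```python
import numpy as np

def fv_heat_step(u, neighbors, face_area, dist, volume, dt, kappa=1.0):
    # Explicit first-order finite-volume update for the heat equation:
    # each cell i exchanges diffusive flux with every neighbor j through
    # their shared face. face_area and dist are dicts assumed to contain
    # both (i, j) and (j, i) keys; neighbors[i] lists cells adjacent to i.
    u_new = u.copy()
    for i, nbrs in enumerate(neighbors):
        flux = sum(face_area[(i, j)] * (u[j] - u[i]) / dist[(i, j)]
                   for j in nbrs)
        u_new[i] += dt * kappa * flux / volume[i]
    return u_new
```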
【3】Predicting Drug-Drug Interactions Using Heterogeneous Graph Neural Networks: HGNN-DDI
Link: https://arxiv.org/abs/2508.18766
Authors: u, Siyi Li, Zheng Yu
Comments: 12 pages, 5 figures. Published in Applied and Computational Engineering, Vol. 79, pp. 77-89, July 25, 2024. Licensed under CC BY 4.0
Abstract: Drug-drug interactions (DDIs) are a major concern in clinical practice, as they can lead to reduced therapeutic efficacy or severe adverse effects. Traditional computational approaches often struggle to capture the complex relationships among drugs, targets, and biological entities. In this work, we propose HGNN-DDI, a heterogeneous graph neural network model designed to predict potential DDIs by integrating multiple drug-related data sources. HGNN-DDI leverages graph representation learning to model heterogeneous biomedical networks, enabling effective information propagation across diverse node and edge types. Experimental results on benchmark DDI datasets demonstrate that HGNN-DDI outperforms state-of-the-art baselines in prediction accuracy and robustness, highlighting its potential to support safer drug development and precision medicine.
【4】Beyond Tokens: Enhancing RTL Quality Estimation via Structural Graph Learning
Link: https://arxiv.org/abs/2508.18730
Authors: ongji Zhang, Yiwen Wang, Dimitris Tsaras, Lei Chen, Mingxuan Yuan, Qiang Xu
Abstract: Estimating the quality of register transfer level (RTL) designs is crucial in the electronic design automation (EDA) workflow, as it enables instant feedback on key metrics like area and delay without the need for time-consuming logic synthesis. While recent approaches have leveraged large language models (LLMs) to derive embeddings from RTL code and achieved promising results, they overlook the structural semantics essential for accurate quality estimation. In contrast, the control data flow graph (CDFG) view exposes the design's structural characteristics more explicitly, offering richer cues for representation learning. In this work, we introduce StructRTL, a novel structure-aware graph self-supervised learning framework for improved RTL design quality estimation. By learning structure-informed representations from CDFGs, our method significantly outperforms prior art on various quality estimation tasks. To further boost performance, we incorporate a knowledge distillation strategy that transfers low-level insights from post-mapping netlists into the CDFG predictor. Experiments show that our approach establishes new state-of-the-art results, demonstrating the effectiveness of combining structural learning with cross-stage supervision.
【5】Natural Image Classification via Quasi-Cyclic Graph Ensembles and Random-Bond Ising Models at the Nishimori Temperature
Link: https://arxiv.org/abs/2508.18717
Authors: yuk, D.A. Sapoznikov, S.I. Egorov
Comments: 27 pages, 8 figures, 2 tables, was presented at the 9th International Conference 'Deep Learning on Computational Physics (DLCP2025)', and is currently under review for the Moscow University Physics Bulletin, Physics series
Abstract: We present a unified framework combining statistical physics, coding theory, and algebraic topology for efficient multi-class image classification. High-dimensional feature vectors from a frozen MobileNetV2 backbone are interpreted as spins on a sparse Multi-Edge Type quasi-cyclic LDPC (MET-QC-LDPC) graph, forming a Random-Bond Ising Model (RBIM). We operate this RBIM at its Nishimori temperature, $\beta_N$, where the smallest eigenvalue of the Bethe-Hessian matrix vanishes, maximizing class separability. Our theoretical contribution establishes a correspondence between local trapping sets in the code's graph and topological invariants (Betti numbers, bordism classes) of the feature manifold. A practical algorithm estimates $\beta_N$ efficiently with a quadratic interpolant and Newton correction, achieving a six-fold speed-up over bisection. Guided by topology, we design spherical and toroidal MET-QC-LDPC graph ensembles, using permanent bounds to suppress harmful trapping sets. This compresses 1280-dimensional features to 32 or 64 dimensions for ImageNet-10 and -100 subsets. Despite massive compression (40x fewer parameters), we achieve 98.7% accuracy on ImageNet-10 and 82.7% on ImageNet-100, demonstrating that topology-guided graph design yields highly efficient, physics-inspired embeddings with state-of-the-art performance.
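A hedged sketch of locating $\beta_N$ by bisection on the smallest Bethe-Hessian eigenvalue, using the weighted Bethe-Hessian form common in the Nishimori-temperature literature (this is our reading, not code from the paper; the paper's quadratic-interpolant-plus-Newton variant, which it reports as six-fold faster, is not reproduced here):

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh

def bethe_hessian(J, beta):
    # Weighted Bethe-Hessian for an RBIM with sparse symmetric couplings J:
    # H(beta) = I + D - A, with t_ij = tanh(beta * J_ij),
    # A_ij = t_ij / (1 - t_ij^2), D_ii = sum_j t_ij^2 / (1 - t_ij^2).
    t = J.copy()
    t.data = np.tanh(beta * t.data)
    a = t.copy(); a.data = t.data / (1.0 - t.data ** 2)
    d = t.copy(); d.data = t.data ** 2 / (1.0 - t.data ** 2)
    D = diags(np.asarray(d.sum(axis=1)).ravel())
    return identity(J.shape[0]) + D - a

def nishimori_beta(J, lo=1e-3, hi=3.0, iters=40):
    # Bisection on the smallest eigenvalue of H(beta), assuming it crosses
    # zero monotonically inside [lo, hi]; beta_N is the crossing point.
    lam = lambda b: eigsh(bethe_hessian(J, b), k=1, which="SA")[0][0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lam(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```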
【6】A Note on Graphon-Signal Analysis of Graph Neural Networks
Link: https://arxiv.org/abs/2508.18564
Authors: hwerger, Ron Levie
Abstract: A recent paper, "A Graphon-Signal Analysis of Graph Neural Networks" by Levie, analyzed message passing graph neural networks (MPNNs) by embedding the input space of MPNNs, i.e., attributed graphs (graph-signals), into a space of attributed graphons (graphon-signals). Based on extensions of standard results in graphon analysis to graphon-signals, the paper proved a generalization bound and a sampling lemma for MPNNs. However, there are some missing ingredients in that paper, limiting its applicability in practical settings of graph machine learning. In the current paper, we introduce several refinements and extensions of existing results that address these shortcomings. In detail, 1) we extend the main results of the paper to graphon-signals with multidimensional signals (rather than 1D signals), 2) we extend the Lipschitz continuity to MPNNs with readout with respect to cut distance (rather than MPNNs without readout with respect to cut metric), 3) we improve the generalization bound by utilizing robustness-type generalization bounds, and 4) we extend the analysis to non-symmetric graphons and kernels.
Transformers (5 papers)
【1】When recalling in-context, Transformers are not SSMs
Link: https://arxiv.org/abs/2508.19029
Authors: kpekpe, Antonio Orvieto
Abstract: Despite the advantageous subquadratic complexity of modern recurrent deep learning models, such as state-space models (SSMs), recent studies have highlighted their potential shortcomings compared to transformers on reasoning and memorization tasks. In this paper, we dive deeper into one such benchmark: associative recall (AR), which has been shown to correlate well with language modeling performance, and inspect in detail the effects of scaling and optimization issues in recently proposed token mixing strategies. We first demonstrate that, unlike standard transformers, the choice of learning rate plays a critical role in the performance of modern recurrent models: an issue that can severely affect the performance reported in previous works and suggests further research is needed to stabilize training. Next, we show that recurrent and attention-based models exhibit contrasting benefits when scaling in width as opposed to depth, with attention being notably unable to solve AR when limited to a single layer. We then further inspect 1-layer transformers, revealing that despite their poor performance, their training dynamics surprisingly resemble the formation of induction heads, a phenomenon previously observed only in their 2-layer counterparts. Finally, through architectural ablations, we study how components affect the performance and optimization stability of Transformers and Mamba.
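For readers unfamiliar with the benchmark, an associative recall instance can be generated in a few lines (the token layout and key/value vocabulary split are our own conventions, not the paper's exact setup):

```python
import numpy as np

def make_ar_example(num_pairs, vocab, rng):
    # Associative recall: a sequence of key-value pairs followed by a
    # query key; the target is the value bound to that key earlier on.
    keys = rng.choice(vocab // 2, size=num_pairs, replace=False)
    vals = rng.choice(np.arange(vocab // 2, vocab), size=num_pairs)
    seq = np.stack([keys, vals], axis=1).reshape(-1)  # k1 v1 k2 v2 ...
    q = rng.integers(num_pairs)
    return np.append(seq, keys[q]), vals[q]  # (input tokens, target token)

rng = np.random.default_rng(0)
x, y = make_ar_example(num_pairs=4, vocab=64, rng=rng)
```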
【2】Enhancing compact convolutional transformers with super attention
Link: https://arxiv.org/abs/2508.18960
Authors:  Honore Leandre, Natenaile Asmamaw Shiferaw, Dillip Rout
Comments: 9 pages, 4 figures
Abstract: In this paper, we propose a vision model that adopts token mixing, sequence pooling, and convolutional tokenizers to achieve state-of-the-art performance and efficient inference in fixed context-length tasks. On the CIFAR100 benchmark, our model significantly improves the baseline top 1% and top 5% validation accuracy from 36.50% to 46.29% and from 66.33% to 76.31%, while being more efficient than Scaled Dot Product Attention (SDPA) transformers when the context length is less than the embedding dimension, and at only 60% of the size. In addition, the architecture demonstrates high training stability and does not rely on techniques such as data augmentation like mixup, positional embeddings, or learning rate scheduling. We make our code available on GitHub.
【3】Optimization of Latent-Space Compression using Game-Theoretic Techniques for Transformer-Based Vector Search
Link: https://arxiv.org/abs/2508.18877
Authors: Agrawal, Nisharg Nargund, Oishani Banerjee
Abstract: Vector similarity search plays a pivotal role in modern information retrieval systems, especially when powered by transformer-based embeddings. However, the scalability and efficiency of such systems are often hindered by the high dimensionality of latent representations. In this paper, we propose a novel game-theoretic framework for optimizing latent-space compression to enhance both the efficiency and semantic utility of vector search. By modeling the compression strategy as a zero-sum game between retrieval accuracy and storage efficiency, we derive a latent transformation that preserves semantic similarity while reducing redundancy. We benchmark our method against FAISS, a widely used vector search library, and demonstrate that our approach achieves a significantly higher average similarity (0.9981 vs. 0.5517) and utility (0.8873 vs. 0.5194), albeit with a modest increase in query time. This trade-off highlights the practical value of game-theoretic latent compression in high-utility, transformer-based search applications. The proposed system can be seamlessly integrated into existing LLM pipelines to yield more semantically accurate and computationally efficient retrieval.
【4】DenseRec: Revisiting Dense Content Embeddings for Sequential Transformer-based Recommendation
Link: https://arxiv.org/abs/2508.18442
Authors:  Lichtenberg, Antonio De Candia, Matteo Ruffini
Comments: EARL workshop @RecSys'25, Prague, Czech Republic
Abstract: Transformer-based sequential recommenders, such as SASRec or BERT4Rec, typically rely solely on learned item ID embeddings, making them vulnerable to the item cold-start problem, particularly in environments with dynamic item catalogs. While dense content embeddings from pre-trained models offer potential solutions, direct integration into transformer-based recommenders has consistently underperformed compared to ID-only approaches. We revisit this integration challenge and propose DenseRec, a simple yet effective method that introduces a dual-path embedding approach. DenseRec learns a linear projection from the dense embedding space into the ID embedding space during training, enabling seamless generalization to previously unseen items without requiring specialized embedding models or complex infrastructure. In experiments on three real-world datasets, we find DenseRec consistently outperforms an ID-only SASRec baseline, even without additional hyperparameter tuning and while using compact embedding models. Our analysis suggests the improvements primarily arise from better sequence representations in the presence of unseen items, positioning DenseRec as a practical and robust solution for cold-start sequential recommendation.
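The dual-path idea fits in a few lines of PyTorch. A minimal sketch of the embedding lookup only (how the projection is supervised beyond joint training, and the -1 placeholder convention for unseen items, are our assumptions):

```python
import torch
import torch.nn as nn

class DualPathItemEmbedding(nn.Module):
    """Sketch of DenseRec's dual path: in-catalog items use a learned ID
    embedding; unseen items fall back to a linear projection of their
    frozen content embedding into the ID-embedding space."""

    def __init__(self, num_items, id_dim, content_dim):
        super().__init__()
        self.id_emb = nn.Embedding(num_items, id_dim)
        self.proj = nn.Linear(content_dim, id_dim)  # trained jointly

    def forward(self, item_ids, content_emb, is_known):
        # item_ids: (B,) with -1 for unseen items (our convention);
        # content_emb: (B, content_dim); is_known: (B,) bool mask.
        projected = self.proj(content_emb)
        known = self.id_emb(item_ids.clamp(min=0))
        return torch.where(is_known.unsqueeze(-1), known, projected)
```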
【5】Vectorized Attention with Learnable Encoding for Quantum Transformer
标题:面向量子Transformer的带可学习编码的向量化注意力
链接:https://arxiv.org/abs/2508.18464
作者:o, Ziwen Pan, Alex Khan, Jan Balewski
摘要:矢量化量子块编码提供了一种将经典数据嵌入希尔伯特空间的方法,为量子模型(如量子Transformer,QT)提供了一条途径,即用量子电路模拟取代经典自注意力,从而更高效地运行。目前的QT依赖于深度参数化量子电路(PQC),这使它们容易受到QPU噪声的影响,从而阻碍了其实际性能。在本文中,我们提出了矢量化量子Transformer(VQT),该模型通过量子近似模拟支持理想的掩码注意力矩阵计算,并通过矢量化非线性量子编码器实现高效训练,从而获得测量次数高效且无梯度的量子电路模拟(QCS),并降低经典采样开销。此外,我们还展示了IBM与IonQ在量子电路模拟上的准确性比较,以及在IBM最先进的高保真Kingston QPU上对自然语言处理任务进行基准测试的有竞争力的结果。我们的NISQ(噪声中等规模量子)友好的VQT方法为量子计算中的端到端机器学习解锁了一种新架构。
摘要:Vectorized quantum block encoding provides a way to embed classical data into Hilbert space, offering a pathway for quantum models, such as Quantum Transformers (QT), that replace classical self-attention with quantum circuit simulations to operate more efficiently. Current QTs rely on deep parameterized quantum circuits (PQCs), rendering them vulnerable to QPU noise, and thus hindering their practical performance. In this paper, we propose the Vectorized Quantum Transformer (VQT), a model that supports ideal masked attention matrix computation through quantum approximation simulation and efficient training via vectorized nonlinear quantum encoder, yielding shot-efficient and gradient-free quantum circuit simulation (QCS) and reduced classical sampling overhead. In addition, we demonstrate an accuracy comparison for IBM and IonQ in quantum circuit simulation and competitive results in benchmarking natural language processing tasks on IBM state-of-the-art and high-fidelity Kingston QPU. Our noise intermediate-scale quantum friendly VQT approach unlocks a novel architecture for end-to-end machine learning in quantum computing.
GAN|对抗|攻击|生成相关(4篇)
【1】Planning-Query-Guided Model Generation for Model-Based Deformable Object Manipulation
标题:基于模型的可变形物体操作中规划查询引导的模型生成
链接:https://arxiv.org/abs/2508.19199
作者:assa, Zixuan Huang, Dmitry Berenson, Oliver Kroemer
备注:9 pages, 7 figures
摘要:在涉及可变形物体等高维空间中进行有效规划,需要计算上易于处理但表达能力足够强的动力学模型。本文介绍了一种自动生成任务特定、空间自适应动力学模型的方法:针对给定的规划查询,学习对象的哪些区域需要高分辨率建模才能获得良好的任务性能。任务性能取决于动力学模型、世界动力学、控制和任务要求之间的复杂相互作用。我们提出的基于扩散的模型生成器根据定义规划查询的起点和目标点云来预测每个区域的模型分辨率。为了高效收集用于学习该映射的数据,一个两阶段过程先以预测动力学作为先验优化分辨率,再直接用闭环性能进行优化。在树操作任务上,我们的方法使规划速度翻倍,而任务性能相比全分辨率模型仅略有下降。这种方法为利用以往的规划与控制数据,为新任务生成计算高效且表达能力足够的动力学模型指明了一条途径。
摘要:Efficient planning in high-dimensional spaces, such as those involving deformable objects, requires computationally tractable yet sufficiently expressive dynamics models. This paper introduces a method that automatically generates task-specific, spatially adaptive dynamics models by learning which regions of the object require high-resolution modeling to achieve good task performance for a given planning query. Task performance depends on the complex interplay between the dynamics model, world dynamics, control, and task requirements. Our proposed diffusion-based model generator predicts per-region model resolutions based on start and goal pointclouds that define the planning query. To efficiently collect the data for learning this mapping, a two-stage process optimizes resolution using predictive dynamics as a prior before directly optimizing using closed-loop performance. On a tree-manipulation task, our method doubles planning speed with only a small decrease in task performance over using a full-resolution model. This approach informs a path towards using previous planning and control data to generate computationally efficient yet sufficiently expressive dynamics models for new tasks.
【2】USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
标题:USO:通过解耦与奖励学习实现统一的风格与主题驱动生成
链接:https://arxiv.org/abs/2508.18966
作者:u, Mengqi Huang, Yufeng Cheng, Wenxu Wu, Jiahe Tian, Yiming Luo, Fei Ding, Qian He
备注:Project page: this https URL Code and model: this https URL
摘要:现有文献通常将风格驱动和主题驱动的生成视为两个不相交的任务:前者优先考虑风格相似性,而后者坚持主题一致性,导致明显的对立。我们认为,这两个目标可以统一在一个框架下,因为它们最终都涉及内容和风格的解耦与重组,这是风格驱动研究中一个长期存在的主题。为此,我们提出了USO,一个统一的风格-主题优化定制模型。首先,我们构建了一个大规模的三元组数据集,包括内容图像、风格图像,以及它们相应的风格化内容图像。其次,我们引入了一个解耦学习方案,通过两个互补的目标(风格对齐训练和内容-风格解耦训练)同时对齐风格特征并将内容从风格中解耦。第三,我们引入了一个记为SRL的风格奖励学习范式,以进一步提升模型性能。最后,我们发布了USO-Bench,这是第一个在多个指标上联合评估风格相似性和主题保真度的基准。大量实验表明,USO在主题一致性和风格相似性两个维度上都达到了开源模型中最先进的性能。代码和模型:https://github.com/bytedance/USO
摘要:Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of content and style, a long-standing theme in style-driven research. To this end, we present USO, a Unified Style-Subject Optimized customization model. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm denoted as SRL to further enhance the model's performance. Finally, we release USO-Bench, the first benchmark that jointly evaluates style similarity and subject fidelity across multiple metrics. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity. Code and model: https://github.com/bytedance/USO
【3】Energy-Based Flow Matching for Generating 3D Molecular Structure
标题:基于能量的流动匹配用于生成3D分子结构
链接:https://arxiv.org/abs/2508.18949
作者:ou, Christopher Iliffe Sprague, Vsevolod Viliuga, Matteo Tadiello, Arne Elofsson, Hossein Azizpour
备注:Accepted to the International Conference on Machine Learning (2025)
摘要:分子结构生成是一个基本问题,涉及确定分子各组成部分的三维位置。它具有重要的生物学应用,如分子对接、蛋白质折叠和分子设计。扩散模型和流匹配等生成建模的最新进展,通过将分子构象建模为一个分布,在这些任务上取得了很大进展。在这项工作中,我们专注于流匹配,并采用基于能量的视角来改进结构生成模型的训练和推理。我们的视角产生了一个由深度网络表示的映射函数,它被直接学习以迭代方式将随机配置(即来自源分布的样本)映射到目标结构(即数据流形中的点)。这产生了一个概念上简单且经验上有效的流匹配设置,它在理论上是合理的,并与幂等性和稳定性等基本属性,以及AlphaFold中的结构细化等经验上有用的技术有着有趣的联系。在蛋白质对接和蛋白质骨架生成上的实验一致证明了该方法的有效性:在相近的计算预算下,它优于近期与任务相关的流匹配和扩散模型基线。
摘要:Molecular structure generation is a fundamental problem that involves determining the 3D positions of molecules' constituents. It has crucial biological applications, such as molecular docking, protein folding, and molecular design. Recent advances in generative modeling, such as diffusion models and flow matching, have made great progress on these tasks by modeling molecular conformations as a distribution. In this work, we focus on flow matching and adopt an energy-based perspective to improve training and inference of structure generation models. Our view results in a mapping function, represented by a deep network, that is directly learned to \textit{iteratively} map random configurations, i.e. samples from the source distribution, to target structures, i.e. points in the data manifold. This yields a conceptually simple and empirically effective flow matching setup that is theoretically justified and has interesting connections to fundamental properties such as idempotency and stability, as well as the empirically useful techniques such as structure refinement in AlphaFold. Experiments on protein docking as well as protein backbone generation consistently demonstrate the method's effectiveness, where it outperforms recent baselines of task-associated flow matching and diffusion models, using a similar computational budget.
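作为参考,下面给出标准线性插值流匹配的单步训练草图(玩具三维数据);论文采用的是基于能量视角的变体,此处仅示意流匹配训练的基本形式,网络结构与学习率均为假设。
```python
import torch
import torch.nn as nn

# 速度场网络 v_theta(x_t, t):输入带时间条件的状态,输出速度
net = nn.Sequential(nn.Linear(3 + 1, 128), nn.SiLU(), nn.Linear(128, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def flow_matching_step(x1):
    """x1: 目标结构样本 (batch, 3),此处为玩具数据。"""
    x0 = torch.randn_like(x1)                  # 源分布样本
    t = torch.rand(x1.size(0), 1)              # 随机时间
    xt = (1 - t) * x0 + t * x1                 # 线性插值路径
    v_target = x1 - x0                         # 该路径下的目标速度
    v_pred = net(torch.cat([xt, t], dim=-1))
    loss = ((v_pred - v_target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

for step in range(5):
    print(flow_matching_step(torch.randn(64, 3) + 2.0))
```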
【4】FLAegis: A Two-Layer Defense Framework for Federated Learning Against Poisoning Attacks
标题:FLAegis:针对中毒攻击的联邦学习的两层防御框架
链接:https://arxiv.org/abs/2508.18737
作者:ármol Campos, Aurora González Vidal, José Luis Hernández Ramos, Antonio Skarmeta
备注:15 pages, 5 tables, and 5 figures
摘要:联邦学习(FL)已经成为一种以分散方式训练机器学习(ML)模型的强大技术,保护所涉及训练数据集的隐私。然而,FL的分散性质限制了训练过程的可见性,严重依赖参与客户端的诚实。这一假设为恶意第三方打开了大门,这些被称为拜占庭客户端的第三方可以通过提交虚假的模型更新来毒害训练过程。此类恶意客户端可能参与中毒攻击,操纵数据集或模型参数以引起错误分类。作为回应,本研究介绍了FLAegis,一个旨在识别拜占庭客户端并提高FL系统鲁棒性的两阶段防御框架。我们的方法利用符号时间序列变换(SAX)来放大良性和恶意模型之间的差异,并利用谱聚类来准确检测对抗行为。此外,我们将一个鲁棒的基于FFT的聚合函数作为最后一层,以减轻那些设法逃避先前防御的拜占庭客户端的影响。我们针对五种中毒攻击(从简单的标签翻转到基于自适应优化的策略)严格评估了我们的方法。值得注意的是,我们的方法在检测精度和最终模型准确性方面都优于最先进的防御方法,即使在强对抗条件下也能保持一贯的高性能。
摘要:Federated Learning (FL) has become a powerful technique for training Machine Learning (ML) models in a decentralized manner, preserving the privacy of the training datasets involved. However, the decentralized nature of FL limits the visibility of the training process, relying heavily on the honesty of participating clients. This assumption opens the door to malicious third parties, known as Byzantine clients, which can poison the training process by submitting false model updates. Such malicious clients may engage in poisoning attacks, manipulating either the dataset or the model parameters to induce misclassification. In response, this study introduces FLAegis, a two-stage defensive framework designed to identify Byzantine clients and improve the robustness of FL systems. Our approach leverages symbolic time series transformation (SAX) to amplify the differences between benign and malicious models, and spectral clustering, which enables accurate detection of adversarial behavior. Furthermore, we incorporate a robust FFT-based aggregation function as a final layer to mitigate the impact of those Byzantine clients that manage to evade prior defenses. We rigorously evaluate our method against five poisoning attacks, ranging from simple label flipping to adaptive optimization-based strategies. Notably, our approach outperforms state-of-the-art defenses in both detection precision and final model accuracy, maintaining consistently high performance even under strong adversarial conditions.
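摘要中第一阶段的流程(SAX符号化 + 谱聚类)可以用如下草图示意:SAX为简化实现,客户端更新序列、断点与簇数均为假设。
```python
import numpy as np
from sklearn.cluster import SpectralClustering

def sax_symbols(series, n_segments=8):
    """简化版SAX:分段平均(PAA)后按高斯分位数映射为整数符号。"""
    series = (series - series.mean()) / (series.std() + 1e-8)
    paa = series.reshape(n_segments, -1).mean(axis=1)
    breakpoints = np.array([-0.6745, 0.0, 0.6745])  # 字母表大小为4时的标准高斯断点
    return np.digitize(paa, breakpoints)

rng = np.random.default_rng(1)
# 占位:10个良性客户端 + 2个拜占庭客户端,每个是长度64的模型更新统计序列
benign = rng.normal(0, 1, size=(10, 64))
byzantine = rng.normal(3, 1, size=(2, 64))
updates = np.vstack([benign, byzantine])

symbolic = np.array([sax_symbols(u) for u in updates])
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=5, random_state=0).fit_predict(symbolic)
print(labels)  # 期望:拜占庭客户端被分到单独的簇
```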
半/弱/无/有监督|不确定性|主动学习(5篇)
【1】From Tabula Rasa to Emergent Abilities: Discovering Robot Skills via Real-World Unsupervised Quality-Diversity
标题:从白板(Tabula Rasa)到涌现能力:通过现实世界无监督质量多样性发现机器人技能
链接:https://arxiv.org/abs/2508.19172
作者:lotti, Lisa Coiffard, Oscar Pang, Maxence Faldor, Antoine Cully (AIRL, Imperial College London)
备注:Accepted at CoRL 2025
摘要:自主技能发现旨在使机器人能够在没有明确监督的情况下获得多样化的行为。由于安全性和数据效率的限制,直接在物理硬件上学习这些行为仍然具有挑战性。包括质量多样性演员-评论家(QDAC)在内的现有方法需要手动定义的技能空间和精心调整的启发式规则,限制了其在现实世界中的适用性。我们提出了无监督现实世界技能获取(URSA),这是QDAC的扩展,使机器人能够直接在现实世界中自主发现并掌握多样化的高性能技能。我们证明,URSA在模拟和现实世界中都成功地在Unitree A1四足机器人上发现了多样化的运动技能。我们的方法既支持启发式驱动的技能发现,也支持完全无监督的设置。我们还表明,学到的技能库可以重用于下游任务,例如现实世界的损伤适应,其中URSA在9个模拟损伤场景中的5个和5个现实世界损伤场景中的3个中优于所有基线。我们的结果为现实世界机器人学习建立了一个新框架,能够在有限的人为干预下实现持续的技能发现,这是朝着更自主、适应性更强的机器人系统迈出的重要一步。演示视频可在http://adaptive-intelligent-robotics.github.io/URSA上获得。
摘要:Autonomous skill discovery aims to enable robots to acquire diverse behaviors without explicit supervision. Learning such behaviors directly on physical hardware remains challenging due to safety and data efficiency constraints. Existing methods, including Quality-Diversity Actor-Critic (QDAC), require manually defined skill spaces and carefully tuned heuristics, limiting real-world applicability. We propose Unsupervised Real-world Skill Acquisition (URSA), an extension of QDAC that enables robots to autonomously discover and master diverse, high-performing skills directly in the real world. We demonstrate that URSA successfully discovers diverse locomotion skills on a Unitree A1 quadruped in both simulation and the real world. Our approach supports both heuristic-driven skill discovery and fully unsupervised settings. We also show that the learned skill repertoire can be reused for downstream tasks such as real-world damage adaptation, where URSA outperforms all baselines in 5 out of 9 simulated and 3 out of 5 real-world damage scenarios. Our results establish a new framework for real-world robot learning that enables continuous skill discovery with limited human intervention, representing a significant step toward more autonomous and adaptable robotic systems. Demonstration videos are available at http://adaptive-intelligent-robotics.github.io/URSA .
【2】Active Query Selection for Crowd-Based Reinforcement Learning
标题:基于人群的强化学习的主动查询选择
链接:https://arxiv.org/abs/2508.19132
作者:Erskine, Taku Yamagata, Raúl Santos-Rodríguez
备注:7 pages, 4 figures, 2 tables plus appendices
摘要:基于偏好的强化学习已经成为一种在奖励信号难以指定或与人类意图不一致的环境中训练智能体的重要策略。然而,其有效性往往受限于可靠人工输入的高成本与低可用性,特别是在专家反馈稀缺或错误代价高昂的领域。为了解决这个问题,我们提出了一个结合两种互补策略的新框架:用概率人群建模来处理有噪声的多标注者反馈,用主动学习来优先获取信息量最大的智能体动作的反馈。我们扩展了Advise算法以支持多位训练者,在线估计其可靠性,并结合基于熵的查询选择来指导反馈请求。我们在一组涵盖合成与现实世界场景的环境中评估了我们的方法,包括2D游戏(出租车、Pacman、Frozen Lake)和使用经临床批准的UVA/Padova模拟器的1型糖尿病血糖控制任务。我们的初步结果表明,在大多数任务中,接受不确定轨迹反馈训练的智能体表现出更快的学习速度,并且我们在血糖控制任务中优于基线。
摘要:Preference-based reinforcement learning has gained prominence as a strategy for training agents in environments where the reward signal is difficult to specify or misaligned with human intent. However, its effectiveness is often limited by the high cost and low availability of reliable human input, especially in domains where expert feedback is scarce or errors are costly. To address this, we propose a novel framework that combines two complementary strategies: probabilistic crowd modelling to handle noisy, multi-annotator feedback, and active learning to prioritize feedback on the most informative agent actions. We extend the Advise algorithm to support multiple trainers, estimate their reliability online, and incorporate entropy-based query selection to guide feedback requests. We evaluate our approach in a set of environments that span both synthetic and real-world-inspired settings, including 2D games (Taxi, Pacman, Frozen Lake) and a blood glucose control task for Type 1 Diabetes using the clinically approved UVA/Padova simulator. Our preliminary results demonstrate that agents trained with feedback on uncertain trajectories exhibit faster learning in most tasks, and we outperform the baselines for the blood glucose control task.
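其中基于熵的查询选择可以用几行代码示意:对每个候选状态计算策略分布的熵,把有限的查询预算分给熵最高者。以下的分布与预算均为假设。
```python
import numpy as np

def entropy(p, eps=1e-12):
    return float(-(p * np.log(p + eps)).sum())

def select_queries(policy_probs, budget=2):
    """policy_probs: (n_states, n_actions) 的动作分布;返回熵最高的状态索引。"""
    ents = np.array([entropy(p) for p in policy_probs])
    return np.argsort(-ents)[:budget]

probs = np.array([
    [0.97, 0.01, 0.02],   # 很确定 -> 不查询
    [0.40, 0.35, 0.25],   # 不确定 -> 查询
    [0.34, 0.33, 0.33],   # 最不确定 -> 查询
])
print(select_queries(probs))  # 期望输出 [2 1]
```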
【3】Metric Matters: A Formal Evaluation of Similarity Measures in Active Learning for Cyber Threat Intelligence
标题:指标很重要:网络威胁情报主动学习中相似性指标的正式评估
链接:https://arxiv.org/abs/2508.19019
作者:Benabderrahmane, Talal Rahwan
摘要:高级持续性威胁(APT)由于其隐身行为和检测数据集固有的极端类别不平衡性,对网络防御提出了严峻的挑战。为了解决这些问题,我们提出了一种新的主动学习为基础的异常检测框架,利用相似性搜索迭代细化决策空间。建立在基于注意力的自动编码器,我们的方法使用特征空间相似性来识别正常和异常的实例,从而提高模型的鲁棒性与最小的预言机监督。至关重要的是,我们执行各种相似性措施的正式评估,以了解其对样本选择和异常排名的有效性的影响。通过对不同数据集的实验,包括DARPA透明计算APT跟踪,我们证明了相似性度量的选择显着影响模型收敛,异常检测精度和标签效率。我们的研究结果为在为威胁情报和网络防御量身定制的主动学习管道中选择相似性函数提供了可操作的见解。
摘要:Advanced Persistent Threats (APTs) pose a severe challenge to cyber defense due to their stealthy behavior and the extreme class imbalance inherent in detection datasets. To address these issues, we propose a novel active learning-based anomaly detection framework that leverages similarity search to iteratively refine the decision space. Built upon an Attention-Based Autoencoder, our approach uses feature-space similarity to identify normal-like and anomaly-like instances, thereby enhancing model robustness with minimal oracle supervision. Crucially, we perform a formal evaluation of various similarity measures to understand their influence on sample selection and anomaly ranking effectiveness. Through experiments on diverse datasets, including DARPA Transparent Computing APT traces, we demonstrate that the choice of similarity metric significantly impacts model convergence, anomaly detection accuracy, and label efficiency. Our results offer actionable insights for selecting similarity functions in active learning pipelines tailored for threat intelligence and cyber defense.
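下面的草图示意论文所评估的那类比较:在(占位的)自编码器特征空间中,用余弦与欧氏两种度量对样本相对"正常原型"做异常排名,并检查各自的命中情况;特征、原型与异常构造均为假设。
```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 32))             # 占位:自编码器特征
feats[:5] += 4.0                               # 前5个模拟异常
normal_proto = feats[5:].mean(axis=0)          # 正常原型 = 良性样本均值

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

cos_scores = np.array([cosine(f, normal_proto) for f in feats])
euc_scores = -np.linalg.norm(feats - normal_proto, axis=1)  # 负距离即相似度

for name, s in [("cosine", cos_scores), ("euclidean", euc_scores)]:
    top_anomalies = np.argsort(s)[:5]          # 与正常原型相似度最低的5个
    hit = len(set(top_anomalies) & set(range(5)))
    print(f"{name}: 前5异常命中 {hit}/5")
```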
【4】Uncertainty Awareness on Unsupervised Domain Adaptation for Time Series Data
标题:时间序列数据无监督域自适应的不确定性意识
链接:https://arxiv.org/abs/2508.18630
作者:, Xiaoyang Zhong, Lu Wang, Jingwen Hou, Yuemei Luo, Jiebin Yan, Yuming Fang
备注:IEEE Transactions on Multimedia
摘要:无监督域自适应方法力求在未标记的测试数据上有效泛化,尤其要应对时间序列数据中的常见挑战:训练数据集与测试数据集之间发生分布偏移。在本文中,我们建议结合多尺度特征提取和不确定性估计,以提高模型跨域的泛化能力和鲁棒性。我们的方法从多尺度混合输入架构开始,该架构可以捕获不同尺度的特征,增加训练多样性并减少训练域和测试域之间的特征差异。基于混合输入架构,我们进一步引入了一个基于证据学习的不确定性感知机制,通过对标签施加狄利克雷先验,以同时支持目标预测和不确定性估计。不确定性感知机制通过在不同域中对齐具有相同标签的特征来增强域自适应,这在目标域中带来了显著的性能提升。此外,我们的不确定性感知模型表现出低得多的期望校准误差(ECE),表明预测置信度的校准更好。我们的实验结果表明,这种混合输入架构与不确定性感知机制相结合的方法在多个基准数据集上实现了最先进的性能,凸显了其在时间序列数据无监督域自适应中的有效性。
摘要:Unsupervised domain adaptation methods seek to generalize effectively on unlabeled test data, especially when encountering the common challenge in time series data that distribution shifts occur between training and testing datasets. In this paper, we propose incorporating multi-scale feature extraction and uncertainty estimation to improve the model's generalization and robustness across domains. Our approach begins with a multi-scale mixed input architecture that captures features at different scales, increasing training diversity and reducing feature discrepancies between the training and testing domains. Based on the mixed input architecture, we further introduce an uncertainty awareness mechanism based on evidential learning by imposing a Dirichlet prior on the labels to facilitate both target prediction and uncertainty estimation. The uncertainty awareness mechanism enhances domain adaptation by aligning features with the same labels across different domains, which leads to significant performance improvements in the target domain. Additionally, our uncertainty-aware model demonstrates a much lower Expected Calibration Error (ECE), indicating better-calibrated prediction confidence. Our experimental results show that this combined approach of mixed input architecture with the uncertainty awareness mechanism achieves state-of-the-art performance across multiple benchmark datasets, underscoring its effectiveness in unsupervised domain adaptation for time series data.
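摘要中"对标签施加狄利克雷先验"的证据学习通常按如下方式计算不确定性;这是标准证据深度学习的最小草图,未必与论文的具体公式一致。
```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits):
    """logits: (batch, K)。evidence = softplus(logits), alpha = evidence + 1。
    返回类别概率与总不确定性 u = K / sum(alpha)。"""
    evidence = F.softplus(logits)
    alpha = evidence + 1.0
    strength = alpha.sum(dim=-1, keepdim=True)   # 狄利克雷强度
    probs = alpha / strength
    uncertainty = logits.size(-1) / strength.squeeze(-1)
    return probs, uncertainty

logits = torch.tensor([[5.0, 0.1], [0.2, 0.3]])  # 第一个样本证据充分
probs, u = dirichlet_uncertainty(logits)
print(probs, u)  # 第二个样本的不确定性应更高
```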
【5】ModAn-MulSupCon: Modality-and Anatomy-Aware Multi-Label Supervised Contrastive Pretraining for Medical Imaging
标题:ModAn-MulSupCon:面向医学成像的模态与解剖感知多标签监督对比预训练
链接:https://arxiv.org/abs/2508.18613
作者:aya, Ryusei Inamori
摘要:背景和目的:专家注释限制了医学成像中的大规模监督预训练,而无处不在的元数据(模态、解剖区域)仍未得到充分利用。我们介绍ModAn-MulSupCon,一种模态和解剖感知的多标签监督对比预训练方法,利用此类元数据学习可迁移的表示。 方法:将每幅图像的模态和解剖结构编码为多热向量。ResNet-18编码器在RadImageNet的一个小子集(miniRIN,16,222张图像)上使用Jaccard加权多标签监督对比损失进行预训练,然后通过微调和线性探测在三个二分类任务上进行评估——ACL撕裂(膝关节MRI)、病变恶性(乳腺超声)和结节恶性(甲状腺超声)。 结果:通过微调,ModAn-MulSupCon在MRNet-ACL(0.964)和甲状腺(0.763)上实现了最佳AUC,超过了所有基线($p<0.05$),在乳腺(0.926)上排名第二,仅次于SimCLR(0.940;差异不显著)。在编码器冻结的情况下,SimCLR/ImageNet更优,表明ModAn-MulSupCon的表示最受益于任务自适应而非线性可分性。 结论:将现成的模态/解剖元数据编码为多标签目标提供了实用、可扩展的预训练信号,当微调可行时可提高下游准确性。ModAn-MulSupCon是标签稀缺临床环境的强大初始化,而SimCLR/ImageNet仍然是冻结编码器部署的首选。
摘要:Background and objective: Expert annotations limit large-scale supervised pretraining in medical imaging, while ubiquitous metadata (modality, anatomical region) remain underused. We introduce ModAn-MulSupCon, a modality- and anatomy-aware multi-label supervised contrastive pretraining method that leverages such metadata to learn transferable representations. Method: Each image's modality and anatomy are encoded as a multi-hot vector. A ResNet-18 encoder is pretrained on a mini subset of RadImageNet (miniRIN, 16,222 images) with a Jaccard-weighted multi-label supervised contrastive loss, and then evaluated by fine-tuning and linear probing on three binary classification tasks--ACL tear (knee MRI), lesion malignancy (breast ultrasound), and nodule malignancy (thyroid ultrasound). Result: With fine-tuning, ModAn-MulSupCon achieved the best AUC on MRNet-ACL (0.964) and Thyroid (0.763), surpassing all baselines ($p<0.05$), and ranked second on Breast (0.926) behind SimCLR (0.940; not significant). With the encoder frozen, SimCLR/ImageNet were superior, indicating that ModAn-MulSupCon representations benefit most from task adaptation rather than linear separability. Conclusion: Encoding readily available modality/anatomy metadata as multi-label targets provides a practical, scalable pretraining signal that improves downstream accuracy when fine-tuning is feasible. ModAn-MulSupCon is a strong initialization for label-scarce clinical settings, whereas SimCLR/ImageNet remain preferable for frozen-encoder deployments.
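Jaccard加权多标签监督对比损失可按如下方式示意:正对权重取两个多热标签(模态+解剖)的Jaccard相似度。温度与归一化细节为假设,并非论文原始实现。
```python
import torch
import torch.nn.functional as F

def jaccard_weighted_supcon(z, labels, tau=0.1):
    """z: (N, d) 特征;labels: (N, L) 多热标签。"""
    z = F.normalize(z, dim=-1)
    sim = z @ z.T / tau                                # (N, N) 相似度logits
    inter = labels @ labels.T                          # 标签交集大小
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    w = inter / union.clamp(min=1)                     # Jaccard权重
    w.fill_diagonal_(0)                                # 去掉自身对
    diag = torch.eye(len(z), dtype=torch.bool)
    log_prob = sim - torch.logsumexp(sim.masked_fill(diag, -1e9),
                                     dim=1, keepdim=True)
    # 每个锚点:按Jaccard权重加权平均的负对数似然
    loss = -(w * log_prob).sum(1) / w.sum(1).clamp(min=1e-8)
    return loss.mean()

z = torch.randn(8, 64)
labels = torch.randint(0, 2, (8, 6)).float()  # 占位:模态+解剖多热标签
print(jaccard_weighted_supcon(z, labels).item())
```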
迁移|Zero/Few/One-Shot|自适应(5篇)
【1】Few-Shot Connectivity-Aware Text Line Segmentation in Historical Documents
标题:历史文档中的Few-Shot连接性感知文本行分割
链接:https://arxiv.org/abs/2508.19162
作者:erzinger, Tingyu Lin, Robert Sablatnig
备注:15 pages, accepted at ACPR2025
摘要:文本行分割是文档数字化分析的一项基础任务。然而,使用深度学习模型自动化这一过程具有挑战性,因为它需要大量的标注数据集,而这些数据集通常无法用于历史文档。此外,标注过程是一项劳动与成本密集、需要专业知识的任务,这使得Few-Shot学习成为减少数据需求的有希望的方向。在这项工作中,我们证明了小而简单的架构,加上拓扑感知的损失函数,比更复杂的替代方案更准确且数据效率更高。我们将一个轻量级的UNet++与一个最初为神经元形态学开发的连接性感知损失配对,该损失显式惩罚线条碎片化和意外的线条合并等结构错误。为了扩充有限的数据,我们在每份手稿仅三页标注中提取的小块上进行训练。我们的方法显著改进了U-DIADS-TL数据集上的当前最先进水平,识别准确率(Recognition Accuracy)提高了200%,线交并比(Line IoU)提高了75%。我们的方法还实现了与DIVA-HisDB基线检测任务竞赛获胜者相当甚至更高的F-Measure得分,而这一切只需要三个标注页面,充分体现了我们方法的有效性。我们的实现可在https://github.com/RafaelSterzinger/acpr_few_shot_hist上公开获得。
摘要:A foundational task for the digital analysis of documents is text line segmentation. However, automating this process with deep learning models is challenging because it requires large, annotated datasets that are often unavailable for historical documents. Additionally, the annotation process is a labor- and cost-intensive task that requires expert knowledge, which makes few-shot learning a promising direction for reducing data requirements. In this work, we demonstrate that small and simple architectures, coupled with a topology-aware loss function, are more accurate and data-efficient than more complex alternatives. We pair a lightweight UNet++ with a connectivity-aware loss, initially developed for neuron morphology, which explicitly penalizes structural errors like line fragmentation and unintended line merges. To increase our limited data, we train on small patches extracted from a mere three annotated pages per manuscript. Our methodology significantly improves upon the current state-of-the-art on the U-DIADS-TL dataset, with a 200% increase in Recognition Accuracy and a 75% increase in Line Intersection over Union. Our method also achieves an F-Measure score on par with or even exceeding that of the competition winner of the DIVA-HisDB baseline detection task, all while requiring only three annotated pages, exemplifying the efficacy of our approach. Our implementation is publicly available at: https://github.com/RafaelSterzinger/acpr_few_shot_hist.
【2】FedProtoKD: Dual Knowledge Distillation with Adaptive Class-wise Prototype Margin for Heterogeneous Federated Learning
标题:FedProtoKD:用于异构联邦学习的具有自适应类原型间隔的双重知识蒸馏
链接:https://arxiv.org/abs/2508.19009
作者:Hossen, Fatema Siddika, Wensheng Zhang, Anuj Sharma, Ali Jannesari
备注:12 pages, 6 figures
摘要:异构联邦学习(HFL)因其能够适应不同的模型和跨客户端的异构数据而受到关注。基于原型的HFL方法是解决统计异质性和隐私挑战的一个有前途的方案,为HFL研究的新进展铺平了道路。这类方法的重点是在异构客户端之间只共享类代表原型。然而,这些原型通常在服务器上使用加权平均进行聚合,导致次优的全局知识;这会造成聚合原型的收缩,在模型异构且数据分布极度非IID的场景中对模型性能产生负面影响。我们在异构联邦学习设置中提出了FedProtoKD,它使用增强的双重知识蒸馏机制,利用客户端的logits和原型特征表示来提高系统性能。我们的目标是利用按类自适应的原型间隔,通过基于对比学习的可训练服务器原型来解决原型间隔收缩问题。此外,我们利用样本原型与其类代表原型的接近程度来评估公共样本的重要性,从而提高学习性能。FedProtoKD在各种设置中实现了1.13%至34.13%的平均准确率提升,并显著优于现有的最先进HFL方法。
摘要:Heterogeneous Federated Learning (HFL) has gained attention for its ability to accommodate diverse models and heterogeneous data across clients. Prototype-based HFL methods emerge as a promising solution to address statistical heterogeneity and privacy challenges, paving the way for new advancements in HFL research. This method focuses on sharing only class-representative prototypes among heterogeneous clients. However, these prototypes are often aggregated on the server using weighted averaging, leading to sub-optimal global knowledge; these cause the shrinking of aggregated prototypes, which negatively affects the model performance in scenarios when models are heterogeneous and data distributions are extremely non-IID. We propose FedProtoKD in a Heterogeneous Federated Learning setting, using an enhanced dual-knowledge distillation mechanism to improve the system performance with clients' logits and prototype feature representation. We aim to resolve the prototype margin-shrinking problem using a contrastive learning-based trainable server prototype by leveraging a class-wise adaptive prototype margin. Furthermore, we assess the importance of public samples using the closeness of the sample's prototype to its class representative prototypes, which enhances learning performance. FedProtoKD achieved average improvements of 1.13% up to 34.13% accuracy across various settings and significantly outperforms existing state-of-the-art HFL methods.
【3】STRATA-TS: Selective Knowledge Transfer for Urban Time Series Forecasting with Retrieval-Guided Reasoning
标题:STRATA-TS:利用检索引导推理进行城市时间序列预测的选择性知识转移
链接:https://arxiv.org/abs/2508.18635
作者:, Chenxi Liu, Yile Chen, Qin Chao, Shuai Liu, Gao Cong
摘要:城市预测模型经常面临严重的数据不平衡问题:只有少数城市拥有密集的长时间跨度记录,而许多其他城市的历史记录很短或不完整。从数据丰富的城市直接迁移到数据稀缺的城市是不可靠的,因为只有有限的源模式子集真正有利于目标域,而不加选择的迁移有引入噪声和负迁移的风险。我们提出了STRATA-TS(Selective TRAnsfer via TArget-aware retrieval for Time Series),一个将域适应检索与具备推理能力的大模型相结合、以改善数据稀缺情况下预测的框架。STRATA-TS采用基于补丁的时间编码器来识别与目标查询在语义和动态上对齐的源子序列。然后,这些检索到的范例被注入检索引导的推理阶段,由LLM对目标输入和检索到的支持信息执行结构化推理。为了实现高效部署,我们通过监督微调将推理过程蒸馏到一个紧凑的开放模型中。在新加坡、诺丁汉和格拉斯哥三个停车位可用性数据集上的广泛实验表明,STRATA-TS始终优于强大的预测和迁移基线,同时提供可解释的知识迁移路径。
摘要:Urban forecasting models often face a severe data imbalance problem: only a few cities have dense, long-span records, while many others expose short or incomplete histories. Direct transfer from data-rich to data-scarce cities is unreliable because only a limited subset of source patterns truly benefits the target domain, whereas indiscriminate transfer risks introducing noise and negative transfer. We present STRATA-TS (Selective TRAnsfer via TArget-aware retrieval for Time Series), a framework that combines domain-adapted retrieval with reasoning-capable large models to improve forecasting in scarce data regimes. STRATA-TS employs a patch-based temporal encoder to identify source subsequences that are semantically and dynamically aligned with the target query. These retrieved exemplars are then injected into a retrieval-guided reasoning stage, where an LLM performs structured inference over target inputs and retrieved support. To enable efficient deployment, we distill the reasoning process into a compact open model via supervised fine-tuning. Extensive experiments on three parking availability datasets across Singapore, Nottingham, and Glasgow demonstrate that STRATA-TS consistently outperforms strong forecasting and transfer baselines, while providing interpretable knowledge transfer pathways.
【4】Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling
标题:使用对比和预测时空建模的上下文感知监控中的Zero-Shot异常检测
链接:https://arxiv.org/abs/2508.18463
作者:d Shahriar Khan, Md. Abrar Hasan, Mohammod Tareq Aziz Justice
备注:11 pages, 7 figures, 4 tables
摘要:由于监控录像的不可预测性和上下文相关性,检测其中的异常具有固有的挑战性。这项工作介绍了一种新颖的上下文感知zero-shot异常检测框架,可在训练过程中不接触任何异常样例的情况下识别异常事件。所提出的混合架构结合了TimeSformer、DPC和CLIP来建模时空动态和语义上下文。TimeSformer作为视觉骨干提取丰富的时空特征,而DPC预测未来的表示以识别时间偏差。此外,基于CLIP的语义流通过特定于上下文的文本提示实现概念级异常检测。这些组件使用InfoNCE和CPC损失进行联合训练,将视觉输入与其时间和语义表示对齐。上下文门控机制通过使用场景感知线索或全局视频特征调制预测来进一步增强决策。通过将预测建模与视觉-语言理解相结合,系统可以泛化到复杂环境中以前未见过的行为。该框架弥合了zero-shot监控异常检测中时间推理和语义上下文之间的差距。这项研究的代码已在https://github.com/NK-II/Context-Aware-ZeroShot-Anomaly-Detection-in-Surveillance上提供。
摘要:Detecting anomalies in surveillance footage is inherently challenging due to their unpredictable and context-dependent nature. This work introduces a novel context-aware zero-shot anomaly detection framework that identifies abnormal events without exposure to anomaly examples during training. The proposed hybrid architecture combines TimeSformer, DPC, and CLIP to model spatiotemporal dynamics and semantic context. TimeSformer serves as the vision backbone to extract rich spatial-temporal features, while DPC forecasts future representations to identify temporal deviations. Furthermore, a CLIP-based semantic stream enables concept-level anomaly detection through context-specific text prompts. These components are jointly trained using InfoNCE and CPC losses, aligning visual inputs with their temporal and semantic representations. A context-gating mechanism further enhances decision-making by modulating predictions with scene-aware cues or global video features. By integrating predictive modeling with vision-language understanding, the system can generalize to previously unseen behaviors in complex environments. This framework bridges the gap between temporal reasoning and semantic context in zero-shot anomaly detection for surveillance. The code for this research has been made available at https://github.com/NK-II/Context-Aware-ZeroShot-Anomaly-Detection-in-Surveillance.
【5】Deterministic Coreset Construction via Adaptive Sensitivity Trimming
标题:通过自适应灵敏度修剪构建确定性核心集
链接:https://arxiv.org/abs/2508.18340
作者:ay, Taylan Alpay
备注:6 pages, 5 algorithms, 1 table
摘要:我们为经验风险最小化(ERM)中的确定性核心集构建开发了一个严格的框架。我们的核心贡献是自适应确定性均匀权重修剪(ADUWT)算法,它通过删去灵敏度下界最低的点并对其余点施加数据相关的统一权重来构建核心集。该方法在整个假设空间上为ERM目标产生一致的$(1\pm\varepsilon)$相对误差近似。我们提供了完整的分析,包括(i)证明自适应权重最优性的极小极大刻画,(ii)以灵敏度异质性指数(Sensitivity Heterogeneity Index)表述的依赖于实例的规模分析,以及(iii)用于核岭回归、正则化逻辑回归和线性SVM的易处理灵敏度预言机。算法、灵敏度预言机和评估管道的精确伪代码支持结果复现。实证结果与理论一致。最后,我们以关于实例最优预言机、确定性流式处理和公平性约束ERM的开放问题作结。
摘要:We develop a rigorous framework for deterministic coreset construction in empirical risk minimization (ERM). Our central contribution is the Adaptive Deterministic Uniform-Weight Trimming (ADUWT) algorithm, which constructs a coreset by excising points with the lowest sensitivity bounds and applying a data-dependent uniform weight to the remainder. The method yields a uniform $(1\pm\varepsilon)$ relative-error approximation for the ERM objective over the entire hypothesis space. We provide complete analysis, including (i) a minimax characterization proving the optimality of the adaptive weight, (ii) an instance-dependent size analysis in terms of a \emph{Sensitivity Heterogeneity Index}, and (iii) tractable sensitivity oracles for kernel ridge regression, regularized logistic regression, and linear SVM. Reproducibility is supported by precise pseudocode for the algorithm, sensitivity oracles, and evaluation pipeline. Empirical results align with the theory. We conclude with open problems on instance-optimal oracles, deterministic streaming, and fairness-constrained ERM.
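ADUWT的主干流程(按灵敏度下界修剪 + 数据相关统一权重)可用几行NumPy示意;这里用随机数代替真实的灵敏度预言机输出,权重公式为简化假设。
```python
import numpy as np

def aduwt_coreset(sensitivities, trim_frac=0.2):
    """按灵敏度从低到高删去 trim_frac 比例的点,
    对剩余点施加统一权重,使总权重守恒(简化假设)。"""
    n = len(sensitivities)
    keep = np.argsort(sensitivities)[int(trim_frac * n):]  # 保留高灵敏度点
    weight = n / len(keep)                                 # 数据相关的统一权重
    return keep, weight

rng = np.random.default_rng(0)
sens = rng.exponential(scale=1.0, size=1000)  # 占位:灵敏度界
idx, w = aduwt_coreset(sens, trim_frac=0.3)
print(len(idx), round(w, 3))  # 700个点,每点权重约1.429
```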
强化学习(4篇)
【1】DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift
标题:DRMD:概念漂移下恶意软件检测的深度强化学习
链接:https://arxiv.org/abs/2508.18839
作者:dden, Myles Foley, Mario D'Onghia, Chris Hicks, Vasilios Mavroudis, Nicola Paoletti, Fabio Pierazzi
备注:10 pages
摘要:现实环境中的恶意软件检测必须应对不断演变的威胁、有限的标注预算和不确定的预测。在恶意软件领域的概念漂移下,没有额外机制的传统分类器难以保持性能,因为其监督学习的表述无法优化何时将决策交由人工标注并进行适应。现代恶意软件检测管道将分类器与每月的主动学习(AL)和拒绝机制相结合,以减轻概念漂移的影响。在这项工作中,我们将恶意软件检测新颖地表述为单步马尔可夫决策过程,并训练深度强化学习(DRL)智能体,同时优化样本分类性能并拒绝高风险样本以供人工标注。我们通过对受真实漂移影响、需要多年性能稳定性的Android恶意软件数据集进行时间感知评估,考察了基于DRL的恶意软件检测(DRMD)智能体学到的联合检测与漂移缓解策略。在这些条件下学到的策略,与该领域使用的标准分类方法相比,获得了更高的时间下面积(AUT)性能,显示出对概念漂移更强的弹性。具体而言,DRMD智能体在仅分类、分类加拒绝、以及分类加拒绝与AL三种设置下,分别实现了$5.18\pm5.44$、$14.49\pm12.86$和$10.06\pm10.81$的平均AUT性能提升。我们的结果首次表明,在Android恶意软件领域的动态环境中,DRL能够促进有效的恶意软件检测并提高对概念漂移的弹性。
摘要:Malware detection in real-world settings must deal with evolving threats, limited labeling budgets, and uncertain predictions. Traditional classifiers, without additional mechanisms, struggle to maintain performance under concept drift in malware domains, as their supervised learning formulation cannot optimize when to defer decisions to manual labeling and adaptation. Modern malware detection pipelines combine classifiers with monthly active learning (AL) and rejection mechanisms to mitigate the impact of concept drift. In this work, we develop a novel formulation of malware detection as a one-step Markov Decision Process and train a deep reinforcement learning (DRL) agent, simultaneously optimizing sample classification performance and rejecting high-risk samples for manual labeling. We evaluated the joint detection and drift mitigation policy learned by the DRL-based Malware Detection (DRMD) agent through time-aware evaluations on Android malware datasets subject to realistic drift requiring multi-year performance stability. The policies learned under these conditions achieve a higher Area Under Time (AUT) performance compared to standard classification approaches used in the domain, showing improved resilience to concept drift. Specifically, the DRMD agent achieved a $5.18\pm5.44$, $14.49\pm12.86$, and $10.06\pm10.81$ average AUT performance improvement for the classification only, classification with rejection, and classification with rejection and AL settings, respectively. Our results demonstrate for the first time that DRL can facilitate effective malware detection and improved resiliency to concept drift in the dynamic environment of the Android malware domain.
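单步MDP表述的核心是奖励如何同时度量分类正确性与拒绝成本;下面是一个假设性的奖励函数草图(动作编码与成本数值均为示意,并非论文设定)。
```python
def drmd_style_reward(action, true_label, reject_cost=0.3):
    """action: 0=判良性, 1=判恶意, 2=拒绝并送人工标注。
    true_label: 0/1。返回单步奖励(数值仅为示意)。"""
    if action == 2:
        return -reject_cost            # 拒绝有固定成本,但避免了错误
    if action == true_label:
        return 1.0                     # 正确分类
    return -1.0                        # 错误分类代价最大

for a in (0, 1, 2):
    print(a, drmd_style_reward(a, true_label=1))
```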
【2】Breaking Through Barren Plateaus: Reinforcement Learning Initializations for Deep Variational Quantum Circuits
标题:突破贫瘠高原:深度变分量子电路的强化学习初始化
链接:https://arxiv.org/abs/2508.18514
作者:ng, Xinyi Li, Zhemin Zhang, Samuel Yen-Chi Chen, Zhiding Liang, Ying Wang
摘要:变分量子算法(VQA)已经成为一个可行的框架,用于在从优化和化学模拟到机器学习的应用中利用近期量子器件。然而,VQA的有效性通常受到所谓贫瘠高原问题的限制:梯度随着系统规模或电路深度的增加而呈指数级减小,从而阻碍训练。在这项工作中,我们提出了一种基于强化学习(RL)的初始化策略,通过重塑初始参数景观以避开容易出现梯度消失的区域,来缓解贫瘠高原问题。特别地,我们探索了几种RL算法(确定性策略梯度、软Actor-Critic和近端策略优化等),以生成在标准基于梯度的优化之前使VQA成本函数最小化的电路参数(被视为动作)。通过以这种方式使用RL进行预训练,后续使用梯度下降或Adam等方法的优化可以从更有利的初始状态开始。在各种噪声条件和任务下的大量数值实验一致表明,基于RL的初始化方法显著提高了收敛速度和最终解的质量。此外,不同RL算法之间的比较表明,多种方法都能实现相当的性能增益,凸显了我们方法的灵活性和鲁棒性。这些发现为将机器学习技术集成到量子算法设计中揭示了一条有希望的途径,说明了RL驱动的参数初始化如何加速VQA的可扩展性和实际部署,也为量子机器学习研究界,特别是针对VQA中的贫瘠高原问题,开辟了一条有前途的道路。
摘要:Variational Quantum Algorithms (VQAs) have gained prominence as a viable framework for exploiting near-term quantum devices in applications ranging from optimization and chemistry simulation to machine learning. However, the effectiveness of VQAs is often constrained by the so-called barren plateau problem, wherein gradients diminish exponentially as system size or circuit depth increases, thereby hindering training. In this work, we propose a reinforcement learning (RL)-based initialization strategy to alleviate the barren plateau issue by reshaping the initial parameter landscape to avoid regions prone to vanishing gradients. In particular, we explore several RL algorithms (Deterministic Policy Gradient, Soft Actor-Critic, and Proximal Policy Optimization, etc.) to generate the circuit parameters (treated as actions) that minimize the VQAs cost function before standard gradient-based optimization. By pre-training with RL in this manner, subsequent optimization using methods such as gradient descent or Adam proceeds from a more favorable initial state. Extensive numerical experiments under various noise conditions and tasks consistently demonstrate that the RL-based initialization method significantly enhances both convergence speed and final solution quality. Moreover, comparisons among different RL algorithms highlight that multiple approaches can achieve comparable performance gains, underscoring the flexibility and robustness of our method. These findings shed light on a promising avenue for integrating machine learning techniques into quantum algorithm design, offering insights into how RL-driven parameter initialization can accelerate the scalability and practical deployment of VQAs. Opening up a promising path for the research community in machine learning for quantum, especially barren plateau problems in VQAs.
【3】DRTA: Dynamic Reward Scaling for Reinforcement Learning in Time Series Anomaly Detection
标题:DRTA:时间序列异常检测中强化学习的动态奖励缩放
链接:https://arxiv.org/abs/2508.18474
作者:olchin, Banafsheh Rekabdar, Kunpeng Liu
摘要:时间序列数据中的异常检测对于金融、医疗保健、传感器网络和工业监控等应用非常重要。传统方法通常受困于标注数据有限、误报率高以及难以泛化到新的异常类型。为了克服这些挑战,我们提出了一个基于强化学习的框架,称为DRTA,它集成了动态奖励塑形、变分自编码器(VAE)和主动学习。我们的方法使用自适应奖励机制,通过动态缩放基于VAE的重建误差和分类奖励的影响来平衡探索与利用。这种方法使智能体能够在低标注条件下有效检测异常,同时保持高精确率和召回率。我们在Yahoo A1和Yahoo A2基准数据集上的实验结果表明,所提出的方法始终优于最先进的无监督和半监督方法。这些发现表明,我们的框架是现实世界异常检测任务的可扩展且高效的解决方案。
摘要:Anomaly detection in time series data is important for applications in finance, healthcare, sensor networks, and industrial monitoring. Traditional methods usually struggle with limited labeled data, high false-positive rates, and difficulty generalizing to novel anomaly types. To overcome these challenges, we propose a reinforcement learning-based framework that integrates dynamic reward shaping, Variational Autoencoder (VAE), and active learning, called DRTA. Our method uses an adaptive reward mechanism that balances exploration and exploitation by dynamically scaling the effect of VAE-based reconstruction error and classification rewards. This approach enables the agent to detect anomalies effectively in low-label systems while maintaining high precision and recall. Our experimental results on the Yahoo A1 and Yahoo A2 benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art unsupervised and semi-supervised approaches. These findings show that our framework is a scalable and efficient solution for real-world anomaly detection tasks.
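摘要所述的"动态缩放VAE重构误差与分类奖励"可以用一个随训练进度变化的组合系数来示意;下面的线性调度仅为假设,并非论文原式。
```python
def drta_style_reward(recon_error, correct, step, total_steps,
                      recon_scale=1.0):
    """前期依赖VAE重构误差(探索),后期偏向分类奖励(利用)。"""
    beta = step / total_steps                  # 0 -> 1 的训练进度
    recon_reward = -recon_scale * recon_error  # 重构误差越大奖励越低
    cls_reward = 1.0 if correct else -1.0
    return (1 - beta) * recon_reward + beta * cls_reward

print(drta_style_reward(0.8, correct=True, step=10, total_steps=100))
print(drta_style_reward(0.8, correct=True, step=90, total_steps=100))
```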
【4】Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
标题:挖掘长尾:自主运动规划中鲁棒离线强化学习的以数据为中心的关键度预设的比较研究
链接:https://arxiv.org/abs/2508.18397
作者:uillen-Perez
摘要:离线强化学习(RL)为从大规模真实驾驶日志中训练自动驾驶车辆(AV)规划策略提供了一个很有前途的范式。然而,这些日志中的极端数据不平衡——普通场景的数量远超罕见的"长尾"事件——导致使用标准均匀数据采样时策略脆弱且不安全。在这项工作中,我们通过对数据策展策略进行系统性、大规模的比较研究来解决这一挑战,旨在将学习过程集中在信息丰富的样本上。我们研究了六种不同的关键性加权方案,分为三类:基于启发式的、基于不确定性的和基于行为的。这些方案在两个时间尺度上进行评估,即单个时间步和完整场景。我们训练了七个目标条件保守Q学习(CQL)智能体,采用最先进的基于注意力的架构,并在高保真Waymax模拟器中对它们进行评估。我们的结果表明,所有数据策展方法都显著优于基线。值得注意的是,使用模型不确定性作为信号的数据驱动策展实现了最显著的安全改进,将碰撞率降低了近三倍(从16.0%降至5.5%)。此外,我们发现了一个明确的权衡:时间步级加权在反应式安全方面表现出色,而场景级加权则改善了长时程规划。我们的工作为离线RL中的数据策展提供了一个全面的框架,并强调智能的非均匀采样是构建安全可靠自主智能体的关键组成部分。
摘要:Offline Reinforcement Learning (RL) presents a promising paradigm for training autonomous vehicle (AV) planning policies from large-scale, real-world driving logs. However, the extreme data imbalance in these logs, where mundane scenarios vastly outnumber rare "long-tail" events, leads to brittle and unsafe policies when using standard uniform data sampling. In this work, we address this challenge through a systematic, large-scale comparative study of data curation strategies designed to focus the learning process on information-rich samples. We investigate six distinct criticality weighting schemes which are categorized into three families: heuristic-based, uncertainty-based, and behavior-based. These are evaluated at two temporal scales, the individual timestep and the complete scenario. We train seven goal-conditioned Conservative Q-Learning (CQL) agents with a state-of-the-art, attention-based architecture and evaluate them in the high-fidelity Waymax simulator. Our results demonstrate that all data curation methods significantly outperform the baseline. Notably, data-driven curation using model uncertainty as a signal achieves the most significant safety improvements, reducing the collision rate by nearly three-fold (from 16.0% to 5.5%). Furthermore, we identify a clear trade-off where timestep-level weighting excels at reactive safety while scenario-level weighting improves long-horizon planning. Our work provides a comprehensive framework for data curation in Offline RL and underscores that intelligent, non-uniform sampling is a critical component for building safe and reliable autonomous agents.
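关键性加权最终通常落在非均匀采样器上;下面用PyTorch的WeightedRandomSampler示意基于(占位)不确定性权重的离线数据采样,权重来源与数据均为假设。
```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

n = 10_000
data = torch.randn(n, 16)                        # 占位:离线驾驶日志特征
uncertainty = torch.rand(n) ** 4                 # 占位:模型不确定性(长尾分布)
weights = uncertainty / uncertainty.sum()        # 关键性权重

sampler = WeightedRandomSampler(weights, num_samples=n, replacement=True)
loader = DataLoader(TensorDataset(data), batch_size=256, sampler=sampler)

batch = next(iter(loader))[0]
print(batch.shape)  # 高不确定性("长尾")样本被更频繁地采到
```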
符号|符号学习(1篇)
【1】Interpretable by AI Mother Tongue: Native Symbolic Reasoning in Neural Models
标题:人工智能母语解释:神经模型中的原生符号推理
链接:https://arxiv.org/abs/2508.18988
作者: Liu
备注:25 pages, 9 figures. The AI Intuition Explorer dashboard is available at: this https URL
摘要:我们提出了一个框架,其中神经模型发展出"AI母语"——一种原生符号语言,同时支持直觉推理、组合符号链和固有的可解释性。与事后解释方法不同,我们的方法将推理直接嵌入模型的表示中:符号捕获有意义的语义模式,链条追踪决策路径,门控归纳机制引导选择性关注,从而产生透明而灵活的推理。我们引入互补的训练目标以提高符号纯度和决策稀疏性,并采用顺序特化策略,先建立广泛的符号能力,再细化直觉判断。在人工智能任务上的实验证明了具有竞争力的准确性以及可验证的推理痕迹,表明AI母语可以作为神经模型中可解释性、直觉和符号推理的统一机制。
摘要:We present a framework where neural models develop an AI Mother Tongue, a native symbolic language that simultaneously supports intuitive reasoning, compositional symbol chains, and inherent interpretability. Unlike post-hoc explanation methods, our approach embeds reasoning directly into the model's representations: symbols capture meaningful semantic patterns, chains trace decision paths, and gated induction mechanisms guide selective focus, yielding transparent yet flexible reasoning. We introduce complementary training objectives to enhance symbol purity and decision sparsity, and employ a sequential specialization strategy to first build broad symbolic competence and then refine intuitive judgments. Experiments on AI tasks demonstrate competitive accuracy alongside verifiable reasoning traces, showing that AI Mother Tongue can serve as a unified mechanism for interpretability, intuition, and symbolic reasoning in neural models.
分层学习(1篇)
【1】ProtoEHR: Hierarchical Prototype Learning for EHR-based Healthcare Predictions
标题:ProtoEHR:用于基于EHR的医疗保健预测的分层原型学习
链接:https://arxiv.org/abs/2508.18313
作者:u Liu, Zhiyao Luo, Tingting Zhu
备注:CIKM 2025 Full Paper
摘要:数字医疗系统已经能够在电子医疗记录(EHR)中收集大量医疗数据,从而为各种医疗预测任务提供人工智能解决方案。然而,现有的研究往往集中在孤立的组成部分的EHR数据,限制其预测性能和可解释性。为了解决这一差距,我们提出了ProtoEHR,这是一个可解释的分层原型学习框架,它充分利用了EHR数据丰富的多层次结构来增强医疗保健预测。更具体地说,ProtoEHR模型内部和跨EHR的三个层次的关系:医疗代码,医院访问和患者。我们首先利用大型语言模型来提取医学代码之间的语义关系,并构建医学知识图作为知识源。在此基础上,我们设计了一个分层表示学习框架,捕获跨三个级别的上下文表示,同时在每个级别内结合原型信息,以捕获内在的相似性,提高泛化能力。为了进行全面的评估,我们在两个公共数据集中评估了ProtoEHR的五个临床重要任务,包括预测死亡率,预测再入院,预测住院时间,药物推荐和预测表型。结果表明,与文献中的基线相比,ProtoEHR能够做出准确,稳健和可解释的预测。此外,ProtoEHR还提供有关代码、访问和患者水平的可解释见解,以帮助进行医疗保健预测。
摘要:Digital healthcare systems have enabled the collection of mass healthcare data in electronic healthcare records (EHRs), allowing artificial intelligence solutions for various healthcare prediction tasks. However, existing studies often focus on isolated components of EHR data, limiting their predictive performance and interpretability. To address this gap, we propose ProtoEHR, an interpretable hierarchical prototype learning framework that fully exploits the rich, multi-level structure of EHR data to enhance healthcare predictions. More specifically, ProtoEHR models relationships within and across three hierarchical levels of EHRs: medical codes, hospital visits, and patients. We first leverage large language models to extract semantic relationships among medical codes and construct a medical knowledge graph as the knowledge source. Building on this, we design a hierarchical representation learning framework that captures contextualized representations across three levels, while incorporating prototype information within each level to capture intrinsic similarities and improve generalization. To perform a comprehensive assessment, we evaluate ProtoEHR in two public datasets on five clinically significant tasks, including prediction of mortality, prediction of readmission, prediction of length of stay, drug recommendation, and prediction of phenotype. The results demonstrate the ability of ProtoEHR to make accurate, robust, and interpretable predictions compared to baselines in the literature. Furthermore, ProtoEHR offers interpretable insights on code, visit, and patient levels to aid in healthcare prediction.
医学相关(4篇)
【1】Biologically Disentangled Multi-Omic Modeling Reveals Mechanistic Insights into Pan-Cancer Immunotherapy Resistance
标题:生物解耦的多组学建模揭示泛癌免疫治疗耐药性的机制见解
链接:https://arxiv.org/abs/2508.18638
作者:iq, Ernest Fraenkel
摘要:免疫检查点抑制剂(ICI)已经改变了癌症治疗,但患者的反应仍然高度可变,并且对耐药的生物学机制知之甚少。虽然机器学习模型有望预测对ICI的反应,但大多数现有方法缺乏可解释性,并且不能有效利用多组学数据固有的生物结构。在这里,我们介绍了生物解耦变分自编码器(BDVAE),这是一种通过模态和通路特定的编码器整合转录组和基因组数据的深度生成模型。与现有的刚性、通路知情的模型不同,BDVAE采用模块化编码器架构并结合变分推断,学习与免疫、基因组和代谢过程相关的具有生物学意义的潜在特征。应用于接受ICI治疗的四种癌症类型的366名患者的泛癌队列,BDVAE准确预测了治疗反应(在未见测试数据上AUC-ROC = 0.94),并揭示了关键的耐药机制,包括免疫抑制、代谢变化和神经元信号传导。重要的是,BDVAE揭示耐药性跨越连续的生物学谱,而非严格的二元状态,反映了肿瘤功能障碍的分级。若干潜在特征与生存结局和已知临床亚型相关,证明了BDVAE产生可解释且临床相关见解的能力。这些发现强调了生物结构化机器学习在阐明复杂耐药模式和指导精准免疫治疗策略方面的价值。
摘要:Immune checkpoint inhibitors (ICIs) have transformed cancer treatment, yet patient responses remain highly variable, and the biological mechanisms underlying resistance are poorly understood. While machine learning models hold promise for predicting responses to ICIs, most existing methods lack interpretability and do not effectively leverage the biological structure inherent to multi-omics data. Here, we introduce the Biologically Disentangled Variational Autoencoder (BDVAE), a deep generative model that integrates transcriptomic and genomic data through modality- and pathway-specific encoders. Unlike existing rigid, pathway-informed models, BDVAE employs a modular encoder architecture combined with variational inference to learn biologically meaningful latent features associated with immune, genomic, and metabolic processes. Applied to a pan-cancer cohort of 366 patients across four cancer types treated with ICIs, BDVAE accurately predicts treatment response (AUC-ROC = 0.94 on unseen test data) and uncovers critical resistance mechanisms, including immune suppression, metabolic shifts, and neuronal signaling. Importantly, BDVAE reveals that resistance spans a continuous biological spectrum rather than strictly binary states, reflecting gradations of tumor dysfunction. Several latent features correlate with survival outcomes and known clinical subtypes, demonstrating BDVAE's capability to generate interpretable, clinically relevant insights. These findings underscore the value of biologically structured machine learning in elucidating complex resistance patterns and guiding precision immunotherapy strategies.
【2】A Fast and Minimal System to Identify Depression Using Smartphones: Explainable Machine Learning-Based Approach
标题:使用智能手机识别抑郁症的快速和最小系统:可解释的基于机器学习的方法
链接:https://arxiv.org/abs/2508.18301
作者: Ahmed, Nova Ahmed
摘要:背景:近年来开发的现有稳健、普适的基于设备的抑郁症检测系统需要长期收集的数据,在早期检测至关重要的情况下可能失效。 目的:我们的主要目标是开发一个最小化的系统,用尽可能快获取的数据来识别抑郁症。 方法:我们开发了一个快速工具,可在1秒内(平均0.31秒,SD 1.10秒)检索过去7天的应用程序使用数据。来自孟加拉国的100名学生参与了我们的研究,我们的工具收集了他们的应用使用数据。为了识别抑郁和非抑郁的学生,我们开发了一组多样的ML模型。我们使用稳定方法以及3种主要类型的特征选择(FS)方法来选择重要特征。 结果:仅利用在1秒内检索到的应用使用数据,我们的LightGBM(轻量梯度提升机)模型使用稳定FS方法选择的重要特征,正确识别了82.4%(n=42)的抑郁学生(精确率=75%,F1分数=78.5%)。此外,经过全面探索,我们提出了一个简约的堆叠模型,其中在每次迭代验证中使用由全相关FS方法Boruta选择的约5个特征,并显示出77.4%的最高精确率(平衡准确率=77.9%)。对我们最佳模型的SHAP分析给出了与抑郁相关的行为标记。 结论:由于我们系统快速且简约的特性,它可能为在欠发达和发展中地区识别抑郁症做出有价值的贡献。此外,我们对研究结果意义的详细讨论可以促进开发资源消耗更少的系统,以更好地了解抑郁的学生。
摘要:Background: Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial. Objective: Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time. Methods: We developed a fast tool that retrieves the past 7 days' app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data. To identify depressed and nondepressed students, we developed a diverse set of ML models. We selected important features using the stable approach, along with 3 main types of feature selection (FS) approaches. Results: Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model where around 5 features selected by the all-relevant FS approach Boruta were used in each iteration of validation and showed a maximum precision of 77.4% (balanced accuracy=77.9%). A SHAP analysis of our best models presented behavioral markers that were related to depression. Conclusions: Due to our system's fast and minimalistic nature, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion about the implication of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed.
【3】Random forest-based out-of-distribution detection for robust lung cancer segmentation
标题:基于随机森林的分布外检测实现稳健的肺癌分割
链接:https://arxiv.org/abs/2508.19112
作者:ngnekar, Harini Veeraraghavan
摘要:从计算机断层扫描(CT)中准确检测和分割癌性病变,对于自动化治疗计划和癌症治疗反应评估至关重要。采用自监督预训练的基于transformer的模型可以在分布内(ID)数据上产生可靠准确的分割,但应用于分布外(OOD)数据集时性能会下降。我们用RF-Deep来应对这一挑战:它是一个随机森林分类器,利用来自分割模型预训练transformer编码器的深度特征来检测OOD扫描并增强分割可靠性。该分割模型包括一个Swin Transformer编码器,在覆盖癌性和非癌性疾病的10,432个未标注3D CT扫描上使用掩码图像建模(SimMIM)进行预训练,以及一个卷积解码器,在317个3D扫描中训练以分割肺癌。在603个3D CT公共数据集上进行了独立测试,其中包括一个ID数据集和四个OOD数据集,涵盖肺栓塞(PE)和COVID-19的胸部CT,以及肾癌患者和健康志愿者的腹部CT。RF-Deep在PE、COVID-19和腹部CT上检测OOD病例的FPR95分别为18.26%、27.66%和小于0.1%,始终优于已有的OOD方法。RF-Deep分类器提供了一种简单有效的方法,可在ID和OOD场景中增强癌症分割的可靠性。
摘要:Accurate detection and segmentation of cancerous lesions from computed tomography (CT) scans is essential for automated treatment planning and cancer treatment response assessment. Transformer-based models with self-supervised pretraining can produce reliably accurate segmentation from in-distribution (ID) data but degrade when applied to out-of-distribution (OOD) datasets. We address this challenge with RF-Deep, a random forest classifier that utilizes deep features from a pretrained transformer encoder of the segmentation model to detect OOD scans and enhance segmentation reliability. The segmentation model comprises a Swin Transformer encoder, pretrained with masked image modeling (SimMIM) on 10,432 unlabeled 3D CT scans covering cancerous and non-cancerous conditions, with a convolution decoder, trained to segment lung cancers in 317 3D scans. Independent testing was performed on 603 3D CT public datasets that included one ID dataset and four OOD datasets comprising chest CTs with pulmonary embolism (PE) and COVID-19, and abdominal CTs with kidney cancers and healthy volunteers. RF-Deep detected OOD cases with a FPR95 of 18.26%, 27.66%, and less than 0.1% on PE, COVID-19, and abdominal CTs, consistently outperforming established OOD approaches. The RF-Deep classifier provides a simple and effective approach to enhance reliability of cancer segmentation in ID and OOD scenarios.
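RF-Deep的主体思路(在预训练编码器特征上训练随机森林区分ID/OOD,并以FPR95评估)可以用随机特征示意如下;实际中应在留出数据上评估,此处仅演示指标的计算方式,特征与划分均为占位。
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1, size=(1000, 128))   # 占位:ID扫描的编码器特征
ood_feats = rng.normal(1.5, 1, size=(1000, 128))  # 占位:OOD扫描特征

X = np.vstack([id_feats, ood_feats])
y = np.array([0] * 1000 + [1] * 1000)             # 1 = OOD
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

scores_id = clf.predict_proba(id_feats)[:, 1]
scores_ood = clf.predict_proba(ood_feats)[:, 1]
thresh = np.quantile(scores_ood, 0.05)            # 使OOD检出率(TPR)=95%的阈值
fpr95 = float((scores_id >= thresh).mean())       # 该阈值下ID被误报的比例
print(f"FPR95 = {fpr95:.3f}")
```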
【4】Stress-testing cross-cancer generalizability of 3D nnU-Net for PET-CT tumor segmentation: multi-cohort evaluation with novel oesophageal and lung cancer datasets
标题:3D nnU-Net用于PET-CT肿瘤分割的跨癌症通用性的压力测试:使用新型食道和肺癌数据集的多队列评估
链接:https://arxiv.org/abs/2508.18612
作者:osh, Christine Jestin Hannan, Rajat Vashistha, Parveen Kundu, Sandra Brosda, Lauren G.Aoude, James Lonie, Andrew Nathanson, Jessica Ng, Andrew P. Barbour, Viktor Vegh
摘要:稳健的泛化对于在临床PET-CT工作流程中部署基于深度学习的肿瘤分割至关重要,因为解剖部位、扫描仪和患者人群差异很大。本研究首次对nnU-Net在PET-CT上进行了跨癌症评估,引入了两个新的专家标注全身数据集:279例食管癌患者(澳大利亚队列)和54例肺癌患者(印度队列)。这些队列补充了公共AutoPET数据集,使跨域性能的系统性压力测试成为可能。我们在三种范式下训练和测试了3D nnUNet模型:仅目标(食管)、仅公共(AutoPET)和联合训练。在测试集上,仅食管模型实现了最佳域内准确度(平均DSC 57.8),但在外部印度肺队列上失败(平均DSC小于3.4),表明严重过拟合。仅公共模型的泛化范围更广(AutoPET上平均DSC为63.5,印度肺队列为51.6),但在澳大利亚食管队列中表现不佳(平均DSC 26.7)。联合方法提供了最均衡的结果(平均DSC:肺52.9、食管40.7、AutoPET 60.9),减少了边界误差并提高了所有队列的鲁棒性。这些发现表明,数据集的多样性,特别是多人群、多中心和多癌症数据的整合,比架构的新颖性更是稳健泛化的关键驱动力。这项工作给出了基于人群的跨癌症深度学习分割评估,并强调数据集多样性而非模型复杂性是临床稳健分割的基础。
摘要:Robust generalization is essential for deploying deep learning based tumor segmentation in clinical PET-CT workflows, where anatomical sites, scanners, and patient populations vary widely. This study presents the first cross cancer evaluation of nnU-Net on PET-CT, introducing two novel, expert-annotated whole-body datasets. 279 patients with oesophageal cancer (Australian cohort) and 54 with lung cancer (Indian cohort). These cohorts complement the public AutoPET dataset and enable systematic stress-testing of cross domain performance. We trained and tested 3D nnUNet models under three paradigms. Target only (oesophageal), public only (AutoPET), and combined training. For the tested sets, the oesophageal only model achieved the best in-domain accuracy (mean DSC, 57.8) but failed on external Indian lung cohort (mean DSC less than 3.4), indicating severe overfitting. The public only model generalized more broadly (mean DSC, 63.5 on AutoPET, 51.6 on Indian lung cohort) but underperformed in oesophageal Australian cohort (mean DSC, 26.7). The combined approach provided the most balanced results (mean DSC, lung (52.9), oesophageal (40.7), AutoPET (60.9)), reducing boundary errors and improving robustness across all cohorts. These findings demonstrate that dataset diversity, particularly multi demographic, multi center and multi cancer integration, outweighs architectural novelty as the key driver of robust generalization. This work presents the demography based cross cancer deep learning segmentation evaluation and highlights dataset diversity, rather than model complexity, as the foundation for clinically robust segmentation.
蒸馏|知识提取(1篇)
【1】Automatic Prompt Optimization with Prompt Distillation
标题:基于提示蒸馏的自动提示优化
链接:https://arxiv.org/abs/2508.18992
作者: Zhuravlev, Artur R. Khairullin, Ernest A. Dyagin, Alena N. Sitkina, Nikita I. Kulin
摘要:自动提示(autoprompting)是为语言模型自动选择优化提示的过程,由于大型语言模型(LLM)领域的广泛研究推动了提示工程的快速发展,自动提示越来越受欢迎。本文介绍了DistillPrompt——一种基于大型语言模型的新型自动提示方法,该方法利用训练数据,将任务特定信息多阶段地集成到提示中。DistillPrompt利用蒸馏、压缩和聚合操作来更彻底地探索提示空间。该方法使用t-lite-instruct-0.1语言模型,在不同的文本分类和生成任务数据集上进行了测试。结果表明,与该领域现有方法相比,DistillPrompt在关键指标上取得了显著的平均改进(例如,相比Grips在整个数据集上提高了20.12%),使DistillPrompt成为自动提示中最有效的非梯度方法之一。
摘要:Autoprompting is the process of automatically selecting optimized prompts for language models, which is gaining popularity due to the rapid development of prompt engineering driven by extensive research in the field of large language models (LLMs). This paper presents DistillPrompt -- a novel autoprompting method based on large language models that employs a multi-stage integration of task-specific information into prompts using training data. DistillPrompt utilizes distillation, compression, and aggregation operations to explore the prompt space more thoroughly. The method was tested on different datasets for text classification and generation tasks using the t-lite-instruct-0.1 language model. The results demonstrate a significant average improvement (e.g., 20.12% across the entire dataset compared to Grips) in key metrics over existing methods in the field, establishing DistillPrompt as one of the most effective non-gradient approaches in autoprompting.
推荐(2篇)
【1】Recycling History: Efficient Recommendations from Contextual Dueling Bandits
标题:回收历史:基于上下文决斗强盗的高效推荐
链接:https://arxiv.org/abs/2508.18841
作者:yana Sankagiri, Jalal Etesami, Pouria Fatemi, Matthias Grossglauser
备注:16 pages, 3 figures
摘要:上下文决斗强盗问题为自适应推荐系统建模:算法向用户呈现一组项目,用户的选择揭示其偏好。这种设置非常适合用户在浏览内容平台时做出的隐式选择,但不能捕获其他可能的比较查询。受用户在消费物品后能提供更可靠反馈这一事实的启发,我们提出了一个可描述如下的新强盗模型。算法在每个时间步推荐一个项目;在消费该项目之后,要求用户将其与从用户消费历史中选出的另一个项目进行比较。重要的是,在我们的模型中,可以在不产生任何额外遗憾的情况下选择这个比较项,从而可能带来更好的性能。然而,由于用户历史中的时间依赖性,遗憾分析具有挑战性。为了克服这一挑战,我们首先表明,只要历史足够丰富,即满足一定的多样性条件,算法就能构造信息丰富的查询。然后我们表明,一个较短的初始随机探索阶段足以使算法以高概率积累丰富的历史。这一结果通过矩阵集中界证明,给出$O(\sqrt{T})$的遗憾保证。此外,我们的模拟表明,与仅在同时推荐的项目之间比较相比,重用过去的项目进行比较可以带来显著更低的遗憾。
摘要:The contextual duelling bandit problem models adaptive recommender systems, where the algorithm presents a set of items to the user, and the user's choice reveals their preference. This setup is well suited for implicit choices users make when navigating a content platform, but does not capture other possible comparison queries. Motivated by the fact that users provide more reliable feedback after consuming items, we propose a new bandit model that can be described as follows. The algorithm recommends one item per time step; after consuming that item, the user is asked to compare it with another item chosen from the user's consumption history. Importantly, in our model, this comparison item can be chosen without incurring any additional regret, potentially leading to better performance. However, the regret analysis is challenging because of the temporal dependency in the user's history. To overcome this challenge, we first show that the algorithm can construct informative queries provided the history is rich, i.e., satisfies a certain diversity condition. We then show that a short initial random exploration phase is sufficient for the algorithm to accumulate a rich history with high probability. This result, proven via matrix concentration bounds, yields $O(\sqrt{T})$ regret guarantees. Additionally, our simulations show that reusing past items for comparisons can lead to significantly lower regret than only comparing between simultaneously recommended items.
【2】Taming the One-Epoch Phenomenon in Online Recommendation System by Two-stage Contrastive ID Pre-training
标题:通过两阶段对比ID预训练驯服在线推荐系统中的单周期(One-Epoch)现象
链接:https://arxiv.org/abs/2508.18700
作者:su, Po-Wei Wang, Chantat Eksombatchai, Jiajing Xu
备注:Published at RecSys'24, see https://dl.acm.org/doi/10.1145/3640457.3688053
摘要:基于身份的嵌入被广泛应用于网络规模的在线推荐系统中。然而,它们对过拟合的敏感性,特别是由于数据分布的长尾性质,通常将训练限制在单个时期,这种现象被称为“一个时期问题”。“这一挑战推动了研究工作,通过提高收敛速度或特征稀疏性来优化第一个时期的性能。在这项研究中,我们引入了一种新的两阶段训练策略,该策略使用具有对比度损失的最小模型来合并预训练阶段,从而为嵌入系统提供更广泛的数据覆盖范围。我们的离线实验表明,在预训练阶段进行多时期训练不会导致过拟合,并且当针对更复杂的下游推荐任务进行微调时,所产生的嵌入可以提高在线泛化能力。我们在Pinterest的实时流量中部署了拟议的系统,实现了显著的全站点参与收益。
摘要:ID-based embeddings are widely used in web-scale online recommendation systems. However, their susceptibility to overfitting, particularly due to the long-tail nature of data distributions, often limits training to a single epoch, a phenomenon known as the "one-epoch problem." This challenge has driven research efforts to optimize performance within the first epoch by enhancing convergence speed or feature sparsity. In this study, we introduce a novel two-stage training strategy that incorporates a pre-training phase using a minimal model with contrastive loss, enabling broader data coverage for the embedding system. Our offline experiments demonstrate that multi-epoch training during the pre-training phase does not lead to overfitting, and the resulting embeddings improve online generalization when fine-tuned for more complex downstream recommendation tasks. We deployed the proposed system in live traffic at Pinterest, achieving significant site-wide engagement gains.
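The two-stage recipe above is straightforward to prototype. Below is a minimal sketch, assuming in-batch InfoNCE over co-engagement pairs as the stage-one contrastive objective; the class name, embedding dimension, and pair source are illustrative assumptions, not the production setup.

```python
# Stage-one contrastive pre-training of an ID embedding table (sketch).
# Positive pairs (anchor_ids, positive_ids) are assumed to come from
# co-engagement signals; in-batch negatives provide the contrast.
import torch
import torch.nn.functional as F

class IDEmbeddingPretrainer(torch.nn.Module):
    def __init__(self, num_ids: int, dim: int = 64):
        super().__init__()
        self.emb = torch.nn.Embedding(num_ids, dim)

    def info_nce(self, anchor_ids, positive_ids, temperature: float = 0.1):
        a = F.normalize(self.emb(anchor_ids), dim=-1)    # (B, d)
        p = F.normalize(self.emb(positive_ids), dim=-1)  # (B, d)
        logits = a @ p.t() / temperature                 # (B, B): diagonal = positives
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)

model = IDEmbeddingPretrainer(num_ids=10_000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
anchor = torch.randint(0, 10_000, (256,))
positive = torch.randint(0, 10_000, (256,))
loss = model.info_nce(anchor, positive)
loss.backward(); opt.step()
```

After this stage, the embedding table would be frozen or fine-tuned inside the larger downstream ranking model.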
Autonomous driving | Vehicles | Lane detection, etc. (1 paper)
【1】Interpretable Decision-Making for End-to-End Autonomous Driving
Link: https://arxiv.org/abs/2508.18898
Authors: aie, Bodo Rosenhahn
Note: Accepted to the ICCV 2025 2nd Workshop on the Challenge Of Out-of-Label Hazards in Autonomous Driving (2COOOL)
Abstract: Trustworthy AI is mandatory for the broad deployment of autonomous vehicles. Although end-to-end approaches derive control commands directly from raw data, interpreting these decisions remains challenging, especially in complex urban scenarios. This is mainly attributed to very deep neural networks with non-linear decision boundaries, making it challenging to grasp the logic behind AI-driven decisions. This paper presents a method to enhance interpretability while optimizing control commands in autonomous driving. To address this, we propose loss functions that promote the interpretability of our model by generating sparse and localized feature maps. The feature activations allow us to explain which image regions contribute to the predicted control command. We conduct comprehensive ablation studies on the feature extraction step and validate our method on the CARLA benchmarks. We also demonstrate that our approach improves interpretability, which correlates with reducing infractions, yielding a safer, high-performance driving model. Notably, our monocular, non-ensemble model surpasses the top-performing approaches from the CARLA Leaderboard by achieving lower infraction scores and the highest route completion rate, all while ensuring interpretability.
Point cloud | SLAM | Radar | LiDAR | Depth/RGB-D related (2 papers)
【1】A Bag of Tricks for Efficient Implicit Neural Point Clouds
Link: https://arxiv.org/abs/2508.19140
Authors: ahlbohm, Linus Franke, Leon Overkämping, Paula Wespe, Susana Castillo, Martin Eisemann, Marcus Magnor
Note: Project page: this https URL
Abstract: Implicit Neural Point Cloud (INPC) is a recent hybrid representation that combines the expressiveness of neural fields with the efficiency of point-based rendering, achieving state-of-the-art image quality in novel view synthesis. However, as with other high-quality approaches that query neural networks during rendering, the practical usability of INPC is limited by comparatively slow rendering. In this work, we present a collection of optimizations that significantly improve both the training and inference performance of INPC without sacrificing visual fidelity. The most significant modifications are an improved rasterizer implementation, more effective sampling techniques, and the incorporation of pre-training for the convolutional neural network used for hole-filling. Furthermore, we demonstrate that points can be modeled as small Gaussians during inference to further improve quality in extrapolated, e.g., close-up views of the scene. We design our implementations to be broadly applicable beyond INPC and systematically evaluate each modification in a series of experiments. Our optimized INPC pipeline achieves up to 25% faster training, 2x faster rendering, and 20% reduced VRAM usage paired with slight image quality improvements.
【2】Towards Training-Free Underwater 3D Object Detection from Sonar Point Clouds: A Comparison of Traditional and Deep Learning Approaches
Link: https://arxiv.org/abs/2508.18293
Authors: Shaukat, Yannik Käckenmeister, Sebastian Bader, Thomas Kirste
Note: 12 pages, 7 figures, submitted to IEEE Journal of Oceanic Engineering (IEEE-JOE)
Abstract: Underwater 3D object detection remains one of the most challenging frontiers in computer vision, where traditional approaches struggle with the harsh acoustic environment and scarcity of training data. While deep learning has revolutionized terrestrial 3D detection, its application underwater faces a critical bottleneck: obtaining sufficient annotated sonar data is prohibitively expensive and logistically complex, often requiring specialized vessels, expert surveyors, and favorable weather conditions. This work addresses a fundamental question: Can we achieve reliable underwater 3D object detection without real-world training data? We tackle this challenge by developing and comparing two paradigms for training-free detection of artificial structures in multibeam echo-sounder point clouds. Our dual approach combines a physics-based sonar simulation pipeline that generates synthetic training data for state-of-the-art neural networks, with a robust model-based template matching system that leverages geometric priors of target objects. Evaluation on real bathymetry surveys from the Baltic Sea reveals surprising insights: while neural networks trained on synthetic data achieve 98% mean Average Precision (mAP) on simulated scenes, they drop to 40% mAP on real sonar data due to domain shift. Conversely, our template matching approach maintains 83% mAP on real data without requiring any training, demonstrating remarkable robustness to acoustic noise and environmental variations. Our findings challenge conventional wisdom about data-hungry deep learning in underwater domains and establish the first large-scale benchmark for training-free underwater 3D detection. This work opens new possibilities for autonomous underwater vehicle navigation, marine archaeology, and offshore infrastructure monitoring in data-scarce environments where traditional machine learning approaches fail.
Federated learning | Privacy protection | Encryption (4 papers)
【1】Enhancing Model Privacy in Federated Learning with Random Masking and Quantization
Link: https://arxiv.org/abs/2508.18911
Authors: Jianhao Zhu, Jingwen Xu, Changze Lv, Zisu Huang, Xiaohua Wang, Muling Wu, Qi Qian, Xiaoqing Zheng, Xuanjing Huang
Abstract: Experimental results across various models and tasks demonstrate that our approach not only maintains strong model performance in federated learning settings but also achieves enhanced protection of model parameters compared to baseline methods.
【2】Federated Learning with Heterogeneous and Private Label Sets
Link: https://arxiv.org/abs/2508.18774
Authors: tholtz, Edvin Listo Zec, Fredrik D. Johansson
Abstract: Although common in real-world applications, heterogeneous client label sets are rarely investigated in federated learning (FL). Furthermore, in the cases they are, clients are assumed to be willing to share their entire label sets with other clients. Federated learning with private label sets, shared only with the central server, adds further constraints on learning algorithms and is, in general, a more difficult problem to solve. In this work, we study the effects of label set heterogeneity on model performance, comparing the public and private label settings -- when the union of label sets in the federation is known to clients and when it is not. We apply classical methods for the classifier combination problem to FL using centralized tuning, adapt common FL methods to the private label set setting, and discuss the justification of both approaches under practical assumptions. Our experiments show that reducing the number of labels available to each client harms the performance of all methods substantially. Centralized tuning of client models for representational alignment can help remedy this, but often at the cost of higher variance. Throughout, our proposed adaptations of standard FL methods perform well, showing similar performance in the private label setting as the standard methods achieve in the public setting. This shows that clients can enjoy increased privacy at little cost to model accuracy.
【3】ZTFed-MAS2S: A Zero-Trust Federated Learning Framework with Verifiable Privacy and Trust-Aware Aggregation for Wind Power Data Imputation
Link: https://arxiv.org/abs/2508.18318
Authors: Hanjie Wang, Yuanzheng Li, Jiazheng Li, Zhaoyang Dong
Note: Accepted by IEEE Transactions on Industrial Informatics, 11 pages, 6 figures
Abstract: Wind power data often suffers from missing values due to sensor faults and unstable transmission at edge sites. While federated learning enables privacy-preserving collaboration without sharing raw data, it remains vulnerable to anomalous updates and privacy leakage during parameter exchange. These challenges are amplified in open industrial environments, necessitating zero-trust mechanisms where no participant is inherently trusted. To address these challenges, this work proposes ZTFed-MAS2S, a zero-trust federated learning framework that integrates a multi-head attention-based sequence-to-sequence imputation model. ZTFed integrates verifiable differential privacy with non-interactive zero-knowledge proofs and a confidentiality and integrity verification mechanism to ensure verifiable privacy preservation and secure model parameters transmission. A dynamic trust-aware aggregation mechanism is employed, where trust is propagated over similarity graphs to enhance robustness, and communication overhead is reduced via sparsity- and quantization-based compression. MAS2S captures long-term dependencies in wind power data for accurate imputation. Extensive experiments on real-world wind farm datasets validate the superiority of ZTFed-MAS2S in both federated learning performance and missing data imputation, demonstrating its effectiveness as a secure and efficient solution for practical applications in the energy sector.
【4】Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing
Link: https://arxiv.org/abs/2508.18316
Authors: ertulino
Note: This article has been prepared to be submitted to the Holos Journal in Brazil
Abstract: High dropout and failure rates in distance education pose a significant challenge for academic institutions, making the proactive identification of at-risk students crucial for providing timely support. This study develops and evaluates a machine learning model based on early academic performance and digital engagement patterns from the large-scale OULAD dataset to predict student risk at a UK university. To address the practical challenges of data privacy and institutional silos that often hinder such initiatives, we implement the model using a Federated Learning (FL) framework. We compare model complexity (Logistic Regression vs. a Deep Neural Network) and data balancing. The final federated model demonstrates strong predictive capability, achieving an ROC AUC score of approximately 85% in identifying at-risk students. Our findings show that this federated approach provides a practical and scalable solution for institutions to build effective early-warning systems, enabling proactive student support while inherently respecting data privacy.
Reasoning | Analysis | Understanding | Explanation (4 papers)
【1】Understanding Tool-Integrated Reasoning
Link: https://arxiv.org/abs/2508.19201
Authors: Zhongwen Xu
Abstract: We study why Tool-Integrated Reasoning (TIR) makes Large Language Models (LLMs) more capable. While LLMs integrated with tools like Python code interpreters show great promise, a principled theory explaining why this paradigm is effective has been missing. This work provides the first formal proof that TIR fundamentally expands an LLM's capabilities. We demonstrate that tools enable a strict expansion of the model's empirical and feasible support, breaking the capability ceiling of pure-text models by unlocking problem-solving strategies that are otherwise impossible or intractably verbose. To guide model behavior without compromising training stability and performance, we also introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that directly modifies the advantage function to guide the policy behavior. We conduct comprehensive experiments on challenging mathematical benchmarks, leveraging a Python interpreter as the external tool. Our results show that the TIR model decisively outperforms its pure-text counterpart on the pass@k metric. Crucially, this advantage is not confined to computationally-intensive problems but extends to those requiring significant abstract insight. We further identify the emergent cognitive patterns that illustrate how models learn to think with tools. Finally, we report improved tool usage behavior with early code invocation and much more interactive turns with ASPO. Overall, our work provides the first principled explanation for TIR's success, shifting the focus from the mere fact that tools work to why and how they enable more powerful reasoning.
【2】PAX-TS: Model-agnostic multi-granular explanations for time series forecasting via localized perturbations
Link: https://arxiv.org/abs/2508.18982
Authors: er, Jelena Zdravkovic, Panagiotis Papapetrou
Abstract: Time series forecasting has seen considerable improvement during the last years, with transformer models and large language models driving advancements of the state of the art. Modern forecasting models are generally opaque and do not provide explanations for their forecasts, while well-known post-hoc explainability methods like LIME are not suitable for the forecasting context. We propose PAX-TS, a model-agnostic post-hoc algorithm to explain time series forecasting models and their forecasts. Our method is based on localized input perturbations and results in multi-granular explanations. Further, it is able to characterize cross-channel correlations for multivariate time series forecasts. We clearly outline the algorithmic procedure behind PAX-TS, demonstrate it on a benchmark with 7 algorithms and 10 diverse datasets, compare it with two other state-of-the-art explanation algorithms, and present the different explanation types of the method. We found that the explanations of high-performing and low-performing algorithms differ on the same datasets, highlighting that the explanations of PAX-TS effectively capture a model's behavior. Based on time step correlation matrices resulting from the benchmark, we identify 6 classes of patterns that repeatedly occur across different datasets and algorithms. We found that the patterns are indicators of performance, with noticeable differences in forecasting error between the classes. Lastly, we outline a multivariate example where PAX-TS demonstrates how the forecasting model takes cross-channel correlations into account. With PAX-TS, time series forecasting models' mechanisms can be illustrated in different levels of detail, and its explanations can be used to answer practical questions on forecasts.
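The core perturbation loop behind this style of explanation is easy to reproduce. The sketch below is a hedged, model-agnostic illustration rather than the authors' code: the function name, window size, and noise scale are all assumed defaults.

```python
# Localized-perturbation saliency for a forecaster (sketch, not PAX-TS itself).
# `forecast_fn` maps a history array (T,) to a forecast (H,); each input
# window is perturbed and the mean absolute change in the forecast recorded.
import numpy as np

def perturbation_saliency(forecast_fn, history, window=4, noise=0.5,
                          n_samples=20, seed=0):
    rng = np.random.default_rng(seed)
    base = forecast_fn(history)                    # (H,) baseline forecast
    starts = list(range(0, len(history) - window + 1, window))
    saliency = np.zeros((len(starts), len(base)))
    for i, s in enumerate(starts):
        deltas = []
        for _ in range(n_samples):
            x = history.copy()
            x[s:s + window] += rng.normal(0.0, noise, size=window)
            deltas.append(np.abs(forecast_fn(x) - base))
        saliency[i] = np.mean(deltas, axis=0)      # impact of window i per horizon step
    return saliency

# Toy usage with a naive "repeat last value" forecaster:
hist = np.sin(np.linspace(0, 6, 48))
sal = perturbation_saliency(lambda h: np.repeat(h[-1], 8), hist)
print(sal.shape)  # (12, 8): 12 input windows x 8 forecast steps
```

Coarser or finer granularities fall out of the same loop by varying the window size, which is presumably how multi-granular explanations arise.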
【3】Quantifying The Limits of AI Reasoning: Systematic Neural Network Representations of Algorithms
Link: https://arxiv.org/abs/2508.18526
Authors: Kratsios, Dennis Zvigelsky, Bradd Hart
Note: 18 pages main body, 45 pages total + references
Abstract: A main open question in contemporary AI research is quantifying the forms of reasoning neural networks can perform when perfectly trained. This paper answers this by interpreting reasoning tasks as circuit emulation, where the gates define the type of reasoning; e.g. Boolean gates for predicate logic, tropical circuits for dynamic programming, arithmetic and analytic gates for symbolic mathematical representation, and hybrids thereof for deeper reasoning; e.g. higher-order logic. We present a systematic meta-algorithm that converts essentially any circuit into a feedforward neural network (NN) with ReLU activations by iteratively replacing each gate with a canonical ReLU MLP emulator. We show that, on any digital computer, our construction emulates the circuit exactly--no approximation, no rounding, modular overflow included--demonstrating that no reasoning task lies beyond the reach of neural networks. The number of neurons in the resulting network (parametric complexity) scales with the circuit's complexity, and the network's computational graph (structure) mirrors that of the emulated circuit. This formalizes the folklore that NNs trade algorithmic run-time (circuit runtime) for space complexity (number of neurons). We derive a range of applications of our main result, from emulating shortest-path algorithms on graphs with cubic-size NNs, to simulating stopped Turing machines with roughly quadratically-large NNs, and even the emulation of randomized Boolean circuits. Lastly, we demonstrate that our result is strictly more powerful than a classical universal approximation theorem: any universal function approximator can be encoded as a circuit and directly emulated by a NN.
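The paper's premise, that a ReLU unit can reproduce a gate exactly rather than approximately, can be checked by hand for Boolean gates. The sketch below illustrates that premise for inputs restricted to {0, 1}; it is an illustration only, not the paper's general meta-algorithm.

```python
# Exact Boolean-gate emulation with single ReLU units (sketch).
# For x, y in {0, 1}, each gate below is reproduced exactly, with no
# approximation, by one affine map plus a ReLU.
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def AND(x, y): return relu(x + y - 1.0)        # 1 iff x = y = 1
def OR(x, y):  return 1.0 - relu(1.0 - x - y)  # 0 iff x = y = 0
def NOT(x):    return 1.0 - x                  # affine "gate"

for x in (0.0, 1.0):
    for y in (0.0, 1.0):
        # XOR composed from the exact gates above: (x OR y) AND NOT(x AND y)
        xor = AND(OR(x, y), NOT(AND(x, y)))
        print(int(x), int(y), "->", int(xor))
```

Composing such emulators gate by gate yields a network whose computational graph mirrors the circuit, which is the structural claim of the abstract.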
【4】Learning Explainable Imaging-Genetics Associations Related to a Neurological Disorder
Link: https://arxiv.org/abs/2508.18303
Authors: g, Zachary Jacokes, John Darrell Van Horn, Michael C. Schatz, Kevin A. Pelphrey, Archana Venkataraman
Abstract: While imaging-genetics holds great promise for unraveling the complex interplay between brain structure and genetic variation in neurological disorders, traditional methods are limited to simplistic linear models or to black-box techniques that lack interpretability. In this paper, we present NeuroPathX, an explainable deep learning framework that uses an early fusion strategy powered by cross-attention mechanisms to capture meaningful interactions between structural variations in the brain derived from MRI and established biological pathways derived from genetics data. To enhance interpretability and robustness, we introduce two loss functions over the attention matrix - a sparsity loss that focuses on the most salient interactions and a pathway similarity loss that enforces consistent representations across the cohort. We validate NeuroPathX on both autism spectrum disorder and Alzheimer's disease. Our results demonstrate that NeuroPathX outperforms competing baseline approaches and reveals biologically plausible associations linked to the disorder. These findings underscore the potential of NeuroPathX to advance our understanding of complex brain disorders. Code is available at https://github.com/jueqiw/NeuroPathX.
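The two attention-matrix regularizers lend themselves to a compact sketch. The formulations below (plain L1 sparsity and deviation from the batch-mean attention map) are plausible readings of the abstract, not NeuroPathX's exact losses; tensor shapes and the 0.1 weight are assumptions.

```python
# Sketch of the two regularizers over a cross-attention matrix between
# imaging features (brain regions) and genetic pathways.
import torch

def sparsity_loss(attn):            # attn: (B, regions, pathways)
    return attn.abs().mean()        # L1 pressure toward few salient interactions

def pathway_similarity_loss(attn):  # encourage consistent maps across the cohort
    mean_map = attn.mean(dim=0, keepdim=True)
    return ((attn - mean_map) ** 2).mean()

attn = torch.softmax(torch.randn(8, 32, 50), dim=-1)  # toy batch of attention maps
total = sparsity_loss(attn) + 0.1 * pathway_similarity_loss(attn)
print(total.item())
```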
Detection-related (4 papers)
【1】Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection
Link: https://arxiv.org/abs/2508.19072
Authors: Benabderrahmane, Talal Rahwan
Abstract: Advanced Persistent Threats (APTs) represent a growing menace to modern digital infrastructure. Unlike traditional cyberattacks, APTs are stealthy, adaptive, and long-lasting, often bypassing signature-based detection systems. This paper introduces a novel framework for APT detection that unites deep learning, reinforcement learning (RL), and active learning into a cohesive, adaptive defense system. Our system combines auto-encoders for latent behavioral encoding with a multi-agent ensemble of RL-based defenders, each trained to distinguish between benign and malicious process behaviors. We identify a critical challenge in existing detection systems: their static nature and inability to adapt to evolving attack strategies. To this end, our architecture includes multiple RL agents (Q-Learning, PPO, DQN, adversarial defenders), each analyzing latent vectors generated by an auto-encoder. When any agent is uncertain about its decision, the system triggers an active learning loop to simulate expert feedback, thus refining decision boundaries. An ensemble voting mechanism, weighted by each agent's performance, ensures robust final predictions.
【2】Are All Marine Species Created Equal? Performance Disparities in Underwater Object Detection
Link: https://arxiv.org/abs/2508.18729
Authors: ille, Tobias Fischer, Scarlett Raine
Note: 10 pages
Abstract: Underwater object detection is critical for monitoring marine ecosystems but poses unique challenges, including degraded image quality, imbalanced class distribution, and distinct visual characteristics. Not every species is detected equally well, yet underlying causes remain unclear. We address two key research questions: 1) What factors beyond data quantity drive class-specific performance disparities? 2) How can we systematically improve detection of under-performing marine species? We manipulate the DUO dataset to separate the object detection task into localization and classification and investigate the under-performance of the scallop class. Localization analysis using YOLO11 and TIDE finds that foreground-background discrimination is the most problematic stage regardless of data quantity. Classification experiments reveal persistent precision gaps even with balanced data, indicating intrinsic feature-based challenges beyond data scarcity and inter-class dependencies. We recommend imbalanced distributions when prioritizing precision, and balanced distributions when prioritizing recall. Improving under-performing classes should focus on algorithmic advances, especially within localization modules. We publicly release our code and datasets.
【3】SwiftF0: Fast and Accurate Monophonic Pitch Detection
Link: https://arxiv.org/abs/2508.18440
Authors: adzik
Abstract: Accurate and real-time monophonic pitch estimation in noisy conditions, particularly on resource-constrained devices, remains an open challenge in audio processing. We present SwiftF0, a novel, lightweight neural model that sets a new state-of-the-art for monophonic pitch estimation. Through training on diverse speech, music, and synthetic datasets with extensive data augmentation, SwiftF0 achieves robust generalization across acoustic domains while maintaining computational efficiency. SwiftF0 achieves a 91.80% harmonic mean (HM) at 10 dB SNR, outperforming baselines like CREPE by over 12 percentage points and degrading by only 2.3 points from clean audio. SwiftF0 requires only 95,842 parameters and runs approximately 42x faster than CREPE on CPU, making it ideal for efficient, real-time deployment. To address the critical lack of perfectly accurate ground truth pitch in speech corpora (which typically rely on algorithmic estimators or laryngograph signals), we introduce SpeechSynth. This synthetic speech dataset, generated by a phoneme-level TTS model, provides exact, on-demand ground-truth pitch curves, enabling more robust model training and evaluation. Furthermore, we propose a unified metric, combining six complementary performance measures for comprehensive and reliable pitch evaluation, and release an open-source pitch benchmark suite. A live demo of SwiftF0 is available at https://swift-f0.github.io/, the source code at https://github.com/lars76/swift-f0, and the benchmark framework at https://github.com/lars76/pitch-benchmark.
【4】HOTSPOT-YOLO: A Lightweight Deep Learning Attention-Driven Model for Detecting Thermal Anomalies in Drone-Based Solar Photovoltaic Inspections
Link: https://arxiv.org/abs/2508.18912
Authors: himish
Abstract: Thermal anomaly detection in solar photovoltaic (PV) systems is essential for ensuring operational efficiency and reducing maintenance costs. In this study, we developed and named HOTSPOT-YOLO, a lightweight artificial intelligence (AI) model that integrates an efficient convolutional neural network backbone and attention mechanisms to improve object detection. This model is specifically designed for drone-based thermal inspections of PV systems, addressing the unique challenges of detecting small and subtle thermal anomalies, such as hotspots and defective modules, while maintaining real-time performance. Experimental results demonstrate a mean average precision of 90.8%, reflecting a significant improvement over baseline object detection models. With a reduced computational load and robustness under diverse environmental conditions, HOTSPOT-YOLO offers a scalable and reliable solution for large-scale PV inspections. This work highlights the integration of advanced AI techniques with practical engineering applications, revolutionizing automated fault detection in renewable energy systems.
Representation (3 papers)
【1】Emotions as Ambiguity-aware Ordinal Representations
Link: https://arxiv.org/abs/2508.19193
Authors: u, Matthew Barthet, David Melhart, Georgios N. Yannakakis
Abstract: Emotions are inherently ambiguous and dynamic phenomena, yet existing continuous emotion recognition approaches either ignore their ambiguity or treat ambiguity as an independent and static variable over time. Motivated by this gap in the literature, in this paper we introduce ambiguity-aware ordinal emotion representations, a novel framework that captures both the ambiguity present in emotion annotation and the inherent temporal dynamics of emotional traces. Specifically, we propose approaches that model emotion ambiguity through its rate of change. We evaluate our framework on two affective corpora -- RECOLA and GameVibe -- testing our proposed approaches on both bounded (arousal, valence) and unbounded (engagement) continuous traces. Our results demonstrate that ordinal representations outperform conventional ambiguity-aware models on unbounded labels, achieving the highest Concordance Correlation Coefficient (CCC) and Signed Differential Agreement (SDA) scores, highlighting their effectiveness in modeling the traces' dynamics. For bounded traces, ordinal representations excel in SDA, revealing their superior ability to capture relative changes of annotated emotion traces.
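For readers unfamiliar with the two agreement scores, reference implementations are sketched below. CCC is standard; SDA is rendered here as sign agreement of successive differences, a common formulation that may differ in detail from the paper's.

```python
# CCC and a simple SDA for comparing a predicted trace against an annotation.
import numpy as np

def ccc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def sda(x, y):
    # fraction of steps where the traces move in the same direction
    dx, dy = np.sign(np.diff(x)), np.sign(np.diff(y))
    return np.mean(dx == dy)

t = np.linspace(0, 10, 200)
pred, gold = np.sin(t), np.sin(t + 0.1)
print(f"CCC={ccc(pred, gold):.3f}  SDA={sda(pred, gold):.3f}")
```

Note that CCC rewards matching the absolute trace, while SDA only rewards matching its relative changes, which is why ordinal (change-based) representations shine on the latter.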
【2】On the Generalisation of Koopman Representations for Chaotic System Control
Link: https://arxiv.org/abs/2508.18954
Authors: Hjikakou (1), Juan Diego Cardenas Cartagena (1), Matthia Sabatelli (1) ((1) University of Groningen, Department of Artificial Intelligence, Groningen, Netherlands)
Note: 18 pages, 4 figures
Abstract: This paper investigates the generalisability of Koopman-based representations for chaotic dynamical systems, focusing on their transferability across prediction and control tasks. Using the Lorenz system as a testbed, we propose a three-stage methodology: learning Koopman embeddings through autoencoding, pre-training a transformer on next-state prediction, and fine-tuning for safety-critical control. Our results show that Koopman embeddings outperform both standard and physics-informed PCA baselines, achieving accurate and data-efficient performance. Notably, fixing the pre-trained transformer weights during fine-tuning leads to no performance degradation, indicating that the learned representations capture reusable dynamical structure rather than task-specific patterns. These findings support the use of Koopman embeddings as a foundation for multi-task learning in physics-informed machine learning. A project page is available at https://kikisprdx.github.io/.
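The first stage, learning Koopman embeddings by autoencoding, is sketched below with a linear latent operator K and a three-term loss; network sizes and the equal loss weighting are illustrative assumptions rather than the paper's configuration.

```python
# Koopman autoencoder sketch: encode states so that latent dynamics are
# (approximately) linear under a learned operator K.
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    def __init__(self, state_dim=3, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, state_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # linear Koopman operator

    def loss(self, x_t, x_next):
        z_t, z_next = self.enc(x_t), self.enc(x_next)
        recon = ((self.dec(z_t) - x_t) ** 2).mean()            # autoencoding
        linear = ((self.K(z_t) - z_next) ** 2).mean()          # linear latent dynamics
        pred = ((self.dec(self.K(z_t)) - x_next) ** 2).mean()  # next-state prediction
        return recon + linear + pred

model = KoopmanAE()
x_t, x_next = torch.randn(128, 3), torch.randn(128, 3)  # e.g., Lorenz state pairs
model.loss(x_t, x_next).backward()
```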
【3】Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming
Link: https://arxiv.org/abs/2508.18742
Authors: , Ran Hou, Yu Ding, Yixuan Li, Shisi Guan, Jiahui Duan, Xiongwei Han, Tao Zhong, Vincent Chau, Weiwei Wu, Wanyuan Wang
Abstract: Model reduction, which aims to learn a simpler model of the original mixed integer linear programming (MILP), can solve large-scale MILP problems much faster. Most existing model reduction methods are based on variable reduction, which predicts a solution value for a subset of variables. From a dual perspective, constraint reduction that transforms a subset of inequality constraints into equalities can also reduce the complexity of MILP, but has been largely ignored. Therefore, this paper proposes a novel constraint-based model reduction approach for the MILP. Constraint-based MILP reduction has two challenges: 1) which inequality constraints are critical such that reducing them can accelerate MILP solving while preserving feasibility, and 2) how to predict these critical constraints efficiently. To identify critical constraints, we first label these tight-constraints at the optimal solution as potential critical constraints and design a heuristic rule to select a subset of critical tight-constraints. To learn the critical tight-constraints, we propose a multi-modal representation technique that leverages information from both instance-level and abstract-level MILP formulations. The experimental results show that, compared to the state-of-the-art methods, our method improves the quality of the solution by over 50% and reduces the computation time by 17.47%.
Encoders (2 papers)
【1】End to End Autoencoder MLP Framework for Sepsis Prediction
Link: https://arxiv.org/abs/2508.18688
Authors: ai, Di Wu, Ji Xu, Xiang Liu, Yiziting Zhu, Xin Shu, Yujie Li, Bin Yi
Abstract: Sepsis is a life threatening condition that requires timely detection in intensive care settings. Traditional machine learning approaches, including Naive Bayes, Support Vector Machine (SVM), Random Forest, and XGBoost, often rely on manual feature engineering and struggle with irregular, incomplete time-series data commonly present in electronic health records. We introduce an end-to-end deep learning framework integrating an unsupervised autoencoder for automatic feature extraction with a multilayer perceptron classifier for binary sepsis risk prediction. To enhance clinical applicability, we implement a customized down sampling strategy that extracts high information density segments during training and a non-overlapping dynamic sliding window mechanism for real-time inference. Preprocessed time series data are represented as fixed dimension vectors with explicit missingness indicators, mitigating bias and noise. We validate our approach on three ICU cohorts. Our end-to-end model achieves accuracies of 74.6 percent, 80.6 percent, and 93.5 percent, respectively, consistently outperforming traditional machine learning baselines. These results demonstrate the framework's superior robustness, generalizability, and clinical utility for early sepsis detection across heterogeneous ICU environments.
【2】CoPE: A Lightweight Complex Positional Encoding
Link: https://arxiv.org/abs/2508.18308
Authors: mballa
Abstract: Recent studies have demonstrated the effectiveness of position encoding in transformer architectures. By incorporating positional information, this approach provides essential guidance for modeling dependencies between elements across different sequence positions. We introduce CoPE (a lightweight Complex Positional Encoding), a novel architecture that leverages complex-valued encoding to encode both content and positional information. Our approach replaces traditional positional encodings with complex embeddings where the real part captures semantic content and the imaginary part encodes positional information. We introduce phase-aware attention in the first layer of the transformer model to capture position-dependent patterns, followed by standard attention layers for higher levels. We show that CoPE doesn't exhibit long term decay and is compatible with linear attention. Experimental evaluation on the GLUE benchmark suggests that our approach achieves superior performance with less computational complexity, compared to RoPE, Sinusoidal and Learned positional encodings.
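A rough sketch of the first-layer idea follows: real part for content, imaginary part for position, and attention scores taken as the real part of the complex inner product, Re(q k*) = q_r·k_r + q_i·k_i. Everything here (dimensions, scaling, tying q = k = x) is an assumption for illustration, not CoPE's exact design.

```python
# Phase-aware attention sketch: complex embeddings realized as two real
# tensors (real part = token content, imaginary part = position).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhaseAwareAttention(nn.Module):
    def __init__(self, vocab, dim, max_len=512):
        super().__init__()
        self.content = nn.Embedding(vocab, dim)     # real part: semantics
        self.position = nn.Embedding(max_len, dim)  # imaginary part: position

    def forward(self, tokens):                      # tokens: (B, T)
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        xr = self.content(tokens)                   # (B, T, d)
        xi = self.position(pos).expand(B, T, -1)    # (B, T, d)
        # Re(q k*) = qr.kr + qi.ki -> content-content plus position-position terms
        scores = (xr @ xr.transpose(-1, -2) +
                  xi @ xi.transpose(-1, -2)) / xr.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1)
        return attn @ xr                            # attend over content values

layer = PhaseAwareAttention(vocab=1000, dim=32)
out = layer(torch.randint(0, 1000, (2, 10)))
print(out.shape)  # torch.Size([2, 10, 32])
```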
Optimization | Convergence (2 papers)
【1】HAEPO: History-Aggregated Exploratory Policy Optimization
Link: https://arxiv.org/abs/2508.18884
Authors: rivedi, Alakh Sharma, Kartikey Singh Bhandari, Dhruv Kumar, Pratik Narang, Jagat Sesh Challa
Note: Under review
Abstract: Exploration is essential in modern learning, from reinforcement learning environments with small neural policies to large language models (LLMs). Existing work, such as DPO, leverages full sequence log-likelihoods to capture an entire trajectory of the model's decisions, while methods like GRPO aggregate per-token ratios into a trajectory-level update. However, both often limit exploration on long-horizon tasks. We introduce History-Aggregated Exploratory Policy Optimization (HAEPO), a history-aware exploratory loss to combat these shortcomings. HAEPO compresses each trajectory into the sum of its logarithmic probabilities (a cumulative logarithmic likelihood), and applies a Plackett-Luce softmax across trajectories to obtain normalized weights proportional to their returns, thus encouraging broader exploration. We add entropy regularization to stabilize the aggressive updates to prevent premature collapse and a soft KL penalty relative to a frozen copy of the previous (reference) policy. Empirically, HAEPO converges fast, explores thoroughly, aligns closely with true rewards, and demonstrates robust learning behavior better than or on par with PPO, GRPO, and DPO across diverse tasks. Thus, HAEPO provides a stable and interpretable framework by explicitly leveraging full-trajectory history while balancing exploration and stability.
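A simplified reading of the loss can be sketched as follows; the softmax temperature, coefficients, and the schematic entropy and KL terms are guesses at the structure described in the abstract, not the authors' implementation.

```python
# HAEPO-style trajectory-level loss (simplified sketch).
import torch

def haepo_loss(cum_logp, returns, ref_cum_logp,
               beta=1.0, ent_coef=0.01, kl_coef=0.1):
    # cum_logp:     (N,) sum of token log-probs per trajectory (current policy)
    # returns:      (N,) scalar return per trajectory
    # ref_cum_logp: (N,) same quantity under the frozen reference policy
    w = torch.softmax(beta * returns, dim=0)      # Plackett-Luce-style weights
    policy_term = -(w.detach() * cum_logp).sum()  # push mass toward good trajectories
    probs = torch.softmax(cum_logp, dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    kl = (cum_logp - ref_cum_logp).mean().abs()   # soft penalty vs. reference policy
    return policy_term - ent_coef * entropy + kl_coef * kl

cum_logp = torch.randn(8, requires_grad=True)     # 8 sampled trajectories
loss = haepo_loss(cum_logp, torch.randn(8), torch.randn(8))
loss.backward()
```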
【2】Enhancing Trust-Region Bayesian Optimization via Newton Methods
Link: https://arxiv.org/abs/2508.18423
Authors: hen, Yiyu Chen, Jing Huo, Tianyu Ding, Yang Gao, Yuetong Chen
Abstract: Bayesian Optimization (BO) has been widely applied to optimize expensive black-box functions while retaining sample efficiency. However, scaling BO to high-dimensional spaces remains challenging. Existing literature proposes performing standard BO in multiple local trust regions (TuRBO) for heterogeneous modeling of the objective function and avoiding over-exploration. Despite its advantages, using local Gaussian Processes (GPs) reduces sampling efficiency compared to a global GP. To enhance sampling efficiency while preserving heterogeneous modeling, we propose to construct multiple local quadratic models using gradients and Hessians from a global GP, and select new sample points by solving the bound-constrained quadratic program. Additionally, we address the issue of vanishing gradients of GPs in high-dimensional spaces. We provide a convergence analysis and demonstrate through experimental results that our method enhances the efficacy of TuRBO and outperforms a wide range of high-dimensional BO techniques on synthetic functions and real-world applications.
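The candidate-selection step reduces to minimizing a quadratic model inside box bounds. The sketch below assumes the gradient g and Hessian H of the global GP's posterior mean at an incumbent x0 are already available (their extraction from the GP is elided), and uses SciPy's bound-constrained optimizer; the toy values are stand-ins.

```python
# Minimize the local quadratic model m(x) = g.(x - x0) + 0.5 (x - x0)' H (x - x0)
# inside trust-region box bounds (sketch).
import numpy as np
from scipy.optimize import minimize

def next_candidate(x0, g, H, lower, upper):
    quad = lambda x: g @ (x - x0) + 0.5 * (x - x0) @ H @ (x - x0)
    grad = lambda x: g + H @ (x - x0)
    res = minimize(quad, x0, jac=grad, method="L-BFGS-B",
                   bounds=list(zip(lower, upper)))
    return res.x

d = 5
x0 = np.zeros(d)
g = np.random.randn(d)      # stand-in for the GP posterior-mean gradient
H = np.eye(d) * 2.0         # stand-in positive-definite curvature
x_new = next_candidate(x0, g, H, x0 - 0.5, x0 + 0.5)
print(x_new)
```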
Prediction | Estimation (12 papers)
【1】Predicting the Order of Upcoming Tokens Improves Language Modeling
Link: https://arxiv.org/abs/2508.19228
Authors: . Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
Abstract: Multi-Token Prediction (MTP) has been proposed as an auxiliary objective to improve next-token prediction (NTP) in language model training but shows inconsistent improvements, underperforming in standard NLP benchmarks. We argue that MTP's exact future token prediction is too difficult as an auxiliary loss. Instead, we propose Token Order Prediction (TOP), which trains models to order upcoming tokens by their proximity using a learning-to-rank loss. TOP requires only a single additional unembedding layer compared to MTP's multiple transformer layers. We pretrain models of 340M, 1.8B, and 7B parameters using NTP, MTP, and TOP objectives. Results on eight standard NLP benchmarks show that TOP overall outperforms both NTP and MTP even at scale. Our code is available at https://github.com/zaydzuhri/token-order-prediction
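One plausible instantiation of the ranking objective is sketched below: upcoming tokens within a window receive graded relevance by proximity, and a ListNet-style soft cross-entropy matches the extra unembedding head to that target. The window size, relevance scheme, and loss form are assumptions, not necessarily the paper's exact loss.

```python
# TOP-style auxiliary loss sketch: rank upcoming tokens by proximity.
import torch
import torch.nn.functional as F

def top_loss(rank_logits, targets, window=4, vocab_size=100):
    # rank_logits: (T, V) scores from the extra unembedding head at each step
    # targets:     (T,)  ground-truth token ids
    T = targets.size(0)
    loss = 0.0
    for t in range(T - window):
        relevance = torch.zeros(vocab_size)
        for k in range(window):                   # nearer tokens rank higher
            tok = targets[t + 1 + k]
            relevance[tok] = max(relevance[tok].item(), float(window - k))
        p = F.softmax(relevance, dim=0)           # target ranking distribution
        loss = loss + F.kl_div(F.log_softmax(rank_logits[t], dim=0), p,
                               reduction="sum")
    return loss / (T - window)

logits = torch.randn(16, 100, requires_grad=True)
tokens = torch.randint(0, 100, (16,))
top_loss(logits, tokens).backward()
```

In training, this term would be added to the ordinary NTP cross-entropy with some weighting.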
【2】CARMA: Collocation-Aware Resource Manager with GPU Memory Estimator
Link: https://arxiv.org/abs/2508.19073
Authors: sefzadeh-Asl-Miandoab, Reza Karimzadeh, Bulat Ibragimov, Florina M. Ciorba, Pınar Tözün
Abstract: Studies conducted on enterprise-scale infrastructure have shown that GPUs -- the core computational resource for deep learning (DL) training -- are often significantly underutilized. DL task collocation on GPUs is an opportunity to address this challenge. However, it may result in (1) out-of-memory crashes for the subsequently arriving task and (2) slowdowns for all tasks sharing the GPU due to resource interference. The former challenge poses a threat to robustness, while the latter affects the quality of service and energy efficiency. We propose CARMA, a server-scale task-level collocation-aware resource management system that handles both collocation challenges. CARMA encompasses GPUMemNet, a novel ML-based GPU memory estimator framework for DL training tasks, to minimize out-of-memory errors and introduces collocation policies that cap GPU utilization to minimize interference. Furthermore, CARMA introduces a recovery method to ensure robust restart of tasks that crash. Our evaluation on traces modeled after real-world DL training task traces shows that CARMA increases the GPU utilization over time by 39.3%, decreases the end-to-end execution time by ~26.7%, and reduces the GPU energy use by ~14.2%.
【3】Tackling Federated Unlearning as a Parameter Estimation Problem
Link: https://arxiv.org/abs/2508.19065
Authors: alordi, Lorenzo Manini, Fabio Stella, Alessio Merlo
Note: 18 pages, 1 figure
Abstract: Privacy regulations require the erasure of data from deep learning models. This is a significant challenge that is amplified in Federated Learning, where data remains on clients, making full retraining or coordinated updates often infeasible. This work introduces an efficient Federated Unlearning framework based on information theory, modeling leakage as a parameter estimation problem. Our method uses second-order Hessian information to identify and selectively reset only the parameters most sensitive to the data being forgotten, followed by minimal federated retraining. This model-agnostic approach supports categorical and client unlearning without requiring server access to raw client data after initial information aggregation. Evaluations on benchmark datasets demonstrate strong privacy (MIA success near random, categorical knowledge erased) and high performance (Normalized Accuracy against re-trained benchmarks of $\approx$ 0.9), while aiming for increased efficiency over complete retraining. Furthermore, in a targeted backdoor attack scenario, our framework effectively neutralizes the malicious trigger, restoring model integrity. This offers a practical solution for data forgetting in FL.
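The selective-reset step can be approximated with a diagonal Fisher (squared-gradient) estimate, as sketched below; the paper's exact second-order estimator and the federated protocol around it are not reproduced here, and the reset fraction and re-initialization scale are illustrative.

```python
# Selective parameter reset via diagonal Fisher sensitivity (sketch).
import torch

def selective_reset(model, forget_loader, loss_fn, frac=0.01):
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in forget_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2      # diagonal Fisher accumulation
    flat = torch.cat([f.flatten() for f in fisher.values()])
    thresh = torch.quantile(flat, 1.0 - frac)          # top `frac` most sensitive
    with torch.no_grad():
        for n, p in model.named_parameters():
            mask = fisher[n] >= thresh
            p[mask] = torch.randn_like(p)[mask] * 0.01 # re-initialize selected weights
    # ...followed by minimal federated retraining on retained data

# Toy usage:
model = torch.nn.Linear(10, 2)
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,)))]
selective_reset(model, loader, torch.nn.functional.cross_entropy)
```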
【4】Working My Way Back to You: Resource-Centric Next-Activity Prediction
Link: https://arxiv.org/abs/2508.19016
Authors: owski, Xixi Lu, Hajo A Reijers
Abstract: Predictive Process Monitoring (PPM) aims to train models that forecast upcoming events in process executions. These predictions support early bottleneck detection, improved scheduling, proactive interventions, and timely communication with stakeholders. While existing research adopts a control-flow perspective, we investigate next-activity prediction from a resource-centric viewpoint, which offers additional benefits such as improved work organization, workload balancing, and capacity forecasting. Although resource information has been shown to enhance tasks such as process performance analysis, its role in next-activity prediction remains unexplored. In this study, we evaluate four prediction models and three encoding strategies across four real-life datasets. Compared to the baseline, our results show that LightGBM and Transformer models perform best with an encoding based on 2-gram activity transitions, while Random Forest benefits most from an encoding that combines 2-gram transitions and activity repetition features. This combined encoding also achieves the highest average accuracy. This resource-centric approach could enable smarter resource allocation, strategic workforce planning, and personalized employee support by analyzing individual behavior rather than case-level progression. The findings underscore the potential of resource-centric next-activity prediction, opening up new venues for research on PPM.
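The winning encoding is simple to reproduce. The sketch below maps a case prefix to 2-gram transition counts plus per-activity repetition counts; the toy activity alphabet is an assumption.

```python
# 2-gram transition + repetition encoding of an event-log prefix (sketch).
from collections import Counter
from itertools import product

ACTIVITIES = ["register", "triage", "treat", "discharge"]
BIGRAMS = list(product(ACTIVITIES, ACTIVITIES))

def encode_prefix(prefix):
    trans = Counter(zip(prefix, prefix[1:]))  # 2-gram transition counts
    reps = Counter(prefix)                    # activity repetition counts
    return ([trans[b] for b in BIGRAMS] +
            [reps[a] for a in ACTIVITIES])

x = encode_prefix(["register", "triage", "treat", "triage", "treat"])
print(len(x), x)  # 16 transition features + 4 repetition features
```

Vectors of this form feed directly into tree ensembles such as Random Forest or LightGBM.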
【5】The Sound of Risk: A Multimodal Physics-Informed Acoustic Model for Forecasting Market Volatility and Enhancing Market Interpretability
Link: https://arxiv.org/abs/2508.18653
Authors: Chen, Xin Yu, Le Chang, Teng Jing, Jiashuai He, Ze Wang, Yangjun Luo, Xingyu Chen, Jiayue Liang, Yuchen Wang, Jiaying Xie
Note: 9 pages, 6 figures
Abstract: Information asymmetry in financial markets, often amplified by strategically crafted corporate narratives, undermines the effectiveness of conventional textual analysis. We propose a novel multimodal framework for financial risk assessment that integrates textual sentiment with paralinguistic cues derived from executive vocal tract dynamics in earnings calls. Central to this framework is the Physics-Informed Acoustic Model (PIAM), which applies nonlinear acoustics to robustly extract emotional signatures from raw teleconference sound subject to distortions such as signal clipping. Both acoustic and textual emotional states are projected onto an interpretable three-dimensional Affective State Label (ASL) space-Tension, Stability, and Arousal. Using a dataset of 1,795 earnings calls (approximately 1,800 hours), we construct features capturing dynamic shifts in executive affect between scripted presentation and spontaneous Q&A exchanges. Our key finding reveals a pronounced divergence in predictive capacity: while multimodal features do not forecast directional stock returns, they explain up to 43.8% of the out-of-sample variance in 30-day realized volatility. Importantly, volatility predictions are strongly driven by emotional dynamics during executive transitions from scripted to spontaneous speech, particularly reduced textual stability and heightened acoustic instability from CFOs, and significant arousal variability from CEOs. An ablation study confirms that our multimodal approach substantially outperforms a financials-only baseline, underscoring the complementary contributions of acoustic and textual modalities. By decoding latent markers of uncertainty from verifiable biometric signals, our methodology provides investors and regulators a powerful tool for enhancing market interpretability and identifying hidden corporate uncertainty.
【6】Sparse Autoencoders for Low-$N$ Protein Function Prediction and Design
Link: https://arxiv.org/abs/2508.18567
Authors: i, Kunal Talreja, Amirali Aghazadeh
Note: 15 pages, 4 figures
Abstract: Predicting protein function from amino acid sequence remains a central challenge in data-scarce (low-$N$) regimes, limiting machine learning-guided protein design when only small amounts of assay-labeled sequence-function data are available. Protein language models (pLMs) have advanced the field by providing evolutionary-informed embeddings and sparse autoencoders (SAEs) have enabled decomposition of these embeddings into interpretable latent variables that capture structural and functional features. However, the effectiveness of SAEs for low-$N$ function prediction and protein design has not been systematically studied. Herein, we evaluate SAEs trained on fine-tuned ESM2 embeddings across diverse fitness extrapolation and protein engineering tasks. We show that SAEs, with as few as 24 sequences, consistently outperform or compete with their ESM2 baselines in fitness prediction, indicating that their sparse latent space encodes compact and biologically meaningful representations that generalize more effectively from limited data. Moreover, steering predictive latents exploits biological motifs in pLM representations, yielding top-fitness variants in 83% of cases compared to designing with ESM2 alone.
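A generic sparse autoencoder of the kind described, an overcomplete ReLU latent with an L1 penalty trained to reconstruct pLM embeddings, is sketched below; the dimensions and sparsity weight are illustrative, not the paper's settings.

```python
# Sparse autoencoder over protein-language-model embeddings (sketch).
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, d_model=1280, d_latent=8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))  # sparse, nonnegative latents
        return self.dec(z), z

sae = SAE()
x = torch.randn(32, 1280)            # e.g., ESM2 per-sequence embeddings
recon, z = sae(x)
loss = ((recon - x) ** 2).mean() + 1e-3 * z.abs().mean()
loss.backward()
```

The interpretable latents z are then what a low-$N$ regressor is fit on, and what "steering" manipulates during design.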
【7】Linear cost mutual information estimation and independence test of similar performance as HSIC
Link: https://arxiv.org/abs/2508.18338
Authors: a, Jagoda Bracha, Adrian Przybysz
Note: 7 pages, 5 figures
Abstract: Evaluation of statistical dependencies between two data samples is a basic problem of data science/machine learning, and HSIC (Hilbert-Schmidt Information Criterion) is considered the state-of-the-art method. However, for a size $n$ data sample it requires multiplication of $n\times n$ matrices, which currently needs $\sim O(n^{2.37})$ computational complexity, making it impractical for large data samples. We discuss HCR (Hierarchical Correlation Reconstruction) as a practical linear-cost alternative with even higher dependence sensitivity in tests, additionally providing an actual joint distribution model by describing dependencies through features that are mixed moments, starting with correlation and homoscedasticity, and also allowing to approximate mutual information as just the sum of squares of such nontrivial mixed moments between two data samples. Each such dependence-describing feature is calculated in $O(n)$ linear time. The number of features to test grows with dimension $d$: $O(d^2)$ for pairwise dependencies, $O(d^3)$ if more subtle triplewise dependencies are also considered, and so on.
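The abstract's recipe (empirical-CDF normalization, orthonormal polynomial features, and a sum of squared mixed moments) is sketched below with a two-function Legendre basis; the basis size and the use of the raw sum of squares as the score are assumptions.

```python
# HCR-style dependence score via mixed moments (sketch). Each moment is a
# plain average over the sample, so the whole score costs O(n) per feature.
import numpy as np

def cdf_normalize(x):                      # empirical CDF -> roughly uniform(0,1)
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def legendre_feats(u):                     # orthonormal basis on [0,1], j = 1, 2
    return np.stack([np.sqrt(3) * (2 * u - 1),
                     np.sqrt(5) * (6 * u ** 2 - 6 * u + 1)])

def hcr_dependence(x, y):
    fu = legendre_feats(cdf_normalize(x))  # (2, n)
    fv = legendre_feats(cdf_normalize(y))  # (2, n)
    a = fu @ fv.T / len(x)                 # mixed moments a_jk = E[f_j(u) f_k(v)]
    return np.sum(a ** 2)                  # ~0 under independence, grows with dependence

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
print(hcr_dependence(x, x + 0.5 * rng.normal(size=5000)),  # dependent pair
      hcr_dependence(x, rng.normal(size=5000)))            # independent pair
```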
【8】Data-driven models for production forecasting and decision supporting in petroleum reservoirs
标题:石油储层产量预测和决策支持的数据驱动模型
链接:https://arxiv.org/abs/2508.18289
作者: Fernandes, Michael M. Furlanetti, Eduardo Gildin, Marcio A. Sampaio
备注:Manuscript as submitted to Journal of Petroleum Exploration and Production Technology
摘要:可靠地预测产量并预判岩石-流体系统行为的变化是油藏工程的主要挑战。本项目提出通过数据驱动方法并使用机器学习来处理这一问题。其目的是开发一种方法,仅基于产出和注入量等简单数据(必要时辅以井中仪表数据)预测生产参数,而不依赖于地质模型、流体性质或完井与流动系统的细节。我们首先对生产和注入变量进行了相关性分析,并对数据进行调理以适配该问题。由于储层条件随时间变化,概念漂移是首要关注的问题,需要特别注意观察窗口和再训练的周期性,这也是本研究的对象。对于产量预测,我们研究了基于回归和神经网络等监督学习方法,以在性能和复杂性方面确定最适合我们应用的方法。第一步,我们使用由UNISIM III组分模拟模型生成的合成数据评估该方法;随后,我们将其应用于巴西盐下层系真实油气区带的案例。预期结果是设计一个可靠的预测器来重现储层动态,具有快速响应能力,能够处理诸如井和处理设施受限等实际困难,并可用于支持储层管理的行动,包括预判不利行为、优化生产和注入参数以及分析概率性事件的影响,旨在最大限度地提高石油采收率。
摘要:Forecasting production reliably and anticipating changes in the behavior of rock-fluid systems are the main challenges in petroleum reservoir engineering. This project proposes to deal with this problem through a data-driven approach and using machine learning methods. The objective is to develop a methodology to forecast production parameters based on simple data such as produced and injected volumes and, eventually, gauges located in wells, without depending on information from geological models, fluid properties or details of well completions and flow systems. Initially, we performed relevance analyses of the production and injection variables, as well as conditioning the data to suit the problem. As reservoir conditions change over time, concept drift is a priority concern and requires special attention to those observation windows and the periodicity of retraining, which are also objects of study. For the production forecasts, we study supervised learning methods, such as those based on regressions and Neural Networks, to define the most suitable for our application in terms of performance and complexity. In a first step, we evaluate the methodology using synthetic data generated from the UNISIM III compositional simulation model. Next, we applied it to cases of real plays in the Brazilian pre-salt. The expected result is the design of a reliable predictor for reproducing reservoir dynamics, with rapid response, capability of dealing with practical difficulties such as restrictions in wells and processing units, and that can be used in actions to support reservoir management, including the anticipation of deleterious behaviors, optimization of production and injection parameters and the analysis of the effects of probabilistic events, aiming to maximize oil recovery.
【9】Multi-Modal Drift Forecasting of Leeway Objects via Navier-Stokes-Guided CNN and Sequence-to-Sequence Attention-Based Models
标题:通过Navier-Stokes引导的CNN和基于注意力的序列到序列模型对漂流物体进行多模态漂移预测
链接:https://arxiv.org/abs/2508.18284
作者: Adesunkanmi, Alexander W. Brandt, Masoud Deylami, Gustavo A. Giraldo Echeverri, Hamidreza Karbasian, Adel Alaeddini
备注:Submitted to IEEE
摘要:准确预测海洋环境中漂流物体的漂移(位移)仍然是一个关键挑战,特别是在搜救行动等时间敏感的情形下。在本研究中,我们提出了一个多模态机器学习框架,该框架将Sentence Transformer嵌入与基于注意力的序列到序列架构相结合,以预测水中漂流物体的漂移。我们首先通过实验收集五种不同漂流物体的环境和物理数据,包括水流和风速、物体质量和表面积。利用基于Navier-Stokes模型的模拟数据,在几何图像表示上训练卷积神经网络,我们估计漂流物体的阻力和升力系数。然后,这些系数被用于推导驱动物体运动的净力。所得的时间序列包括物理力、环境速度和物体特定特征,再结合经语言模型编码的文本描述,作为基于注意力的序列到序列长短期记忆和Transformer模型的输入,以预测未来的漂移轨迹。我们在多个时间范围($1$、$3$、$5$和$10$秒)评估该框架,并评估其在不同物体间的泛化能力。我们将该方法与拟合的物理模型和传统机器学习方法(包括递归神经网络和时间卷积神经网络)进行比较。结果表明,这些多模态模型的性能与传统模型相当,同时还能进行长期预测而非单步预测。总体而言,我们的研究结果证明了多模态建模策略能够在动态海洋条件下对漂流物体的漂移提供准确且适应性强的预测。
摘要:Accurately predicting the drift (displacement) of leeway objects in maritime environments remains a critical challenge, particularly in time-sensitive scenarios such as search and rescue operations. In this study, we propose a multi-modal machine learning framework that integrates Sentence Transformer embeddings with attention-based sequence-to-sequence architectures to predict the drift of leeway objects in water. We begin by experimentally collecting environmental and physical data, including water current and wind velocities, object mass, and surface area, for five distinct leeway objects. Using simulated data from a Navier-Stokes-based model to train a convolutional neural network on geometrical image representations, we estimate drag and lift coefficients of the leeway objects. These coefficients are then used to derive the net forces responsible for driving the objects' motion. The resulting time series, comprising physical forces, environmental velocities, and object-specific features, combined with textual descriptions encoded via a language model, are inputs to attention-based sequence-to-sequence long-short-term memory and Transformer models, to predict future drift trajectories. We evaluate the framework across multiple time horizons ($1$, $3$, $5$, and $10$ seconds) and assess its generalization across different objects. We compare our approach against a fitted physics-based model and traditional machine learning methods, including recurrent neural networks and temporal convolutional neural networks. Our results show that these multi-modal models perform comparably to traditional models while also enabling longer-term forecasting in place of single-step prediction. Overall, our findings demonstrate the ability of a multi-modal modeling strategy to provide accurate and adaptable predictions of leeway object drift in dynamic maritime conditions.
【10】Forecasting Probability Distributions of Financial Returns with Deep Neural Networks
标题:利用深度神经网络预测财务回报率的概率分布
链接:https://arxiv.org/abs/2508.18921
作者:hańków
备注:12 pages, 4 figures, 5 tables
摘要:本研究评估了深度神经网络预测金融收益率概率分布的能力。采用1D卷积神经网络(CNN)和长短期记忆(LSTM)架构来预测三种概率分布的参数:正态分布、学生t分布和偏斜学生t分布。通过自定义的负对数似然损失函数,直接优化分布参数。模型在六个主要股票指数(标准普尔500、BOVESPA、DAX、WIG、日经225和KOSPI)上使用概率评估指标进行测试,包括对数预测得分(LPS)、连续分级概率评分(CRPS)和概率积分变换(PIT)。结果表明,深度学习模型可提供准确的分布预测,并在风险价值(VaR)估计方面与经典的GARCH模型相比具有竞争力。采用偏斜学生t分布的LSTM在多个评估标准中表现最好,同时捕获了金融收益率的厚尾和不对称性。这项工作表明,深度神经网络是金融风险评估和投资组合管理中传统计量经济学模型的可行替代方案。
摘要:This study evaluates deep neural networks for forecasting probability distributions of financial returns. 1D convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) architectures are used to forecast parameters of three probability distributions: Normal, Student's t, and skewed Student's t. Using custom negative log-likelihood loss functions, distribution parameters are optimized directly. The models are tested on six major equity indices (S\&P 500, BOVESPA, DAX, WIG, Nikkei 225, and KOSPI) using probabilistic evaluation metrics including Log Predictive Score (LPS), Continuous Ranked Probability Score (CRPS), and Probability Integral Transform (PIT). Results show that deep learning models provide accurate distributional forecasts and perform competitively with classical GARCH models for Value-at-Risk estimation. The LSTM with skewed Student's t distribution performs best across multiple evaluation criteria, capturing both heavy tails and asymmetry in financial returns. This work shows that deep neural networks are viable alternatives to traditional econometric models for financial risk assessment and portfolio management.
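就"自定义负对数似然损失直接优化分布参数"这一点,下面给出一个最小示意(网络结构与参数化方式均为演示所作的假设,并非论文原始实现):LSTM输出学生t分布的位置、尺度与自由度,以torch.distributions.StudentT的负对数似然为损失:

```python
import torch
import torch.nn as nn

class TDistForecaster(nn.Module):
    # 示意性预测头:LSTM将收益率窗口映射为学生t分布参数
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)     # 原始输出: (loc, raw_scale, raw_df)

    def forward(self, x):
        h, _ = self.lstm(x)
        raw = self.head(h[:, -1])
        loc = raw[:, 0]
        scale = torch.nn.functional.softplus(raw[:, 1]) + 1e-6   # 尺度 > 0
        df = torch.nn.functional.softplus(raw[:, 2]) + 2.0       # 自由度 > 2: 方差有限
        return loc, scale, df

def student_t_nll(loc, scale, df, target):
    # 负对数似然,按论文设置直接作为训练目标最小化
    dist = torch.distributions.StudentT(df=df, loc=loc, scale=scale)
    return -dist.log_prob(target).mean()

model = TDistForecaster()
x = torch.randn(64, 20, 1)                   # 64个长度为20的日收益率窗口
target = torch.randn(64)                     # 下一日收益率
loss = student_t_nll(*model(x), target)
loss.backward()
```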
【11】Huracan: A skillful end-to-end data-driven system for ensemble data assimilation and weather prediction
标题:Huracan:一个熟练的端到端数据驱动系统,用于集合数据同化和天气预测
链接:https://arxiv.org/abs/2508.18486
作者: Jonathan Weyn, Hang Zhang, Yanfei Xiang, Jiang Bian, Weixin Jin, Kit Thambiratnam, Qi Zhang, Haiyu Dong, Hongyu Sun
摘要:在过去的几年里,基于机器学习的数据驱动天气预报通过提供更准确的预报改变了业务天气预报,而与传统数值天气预报(NWP)相比仅使用一小部分计算能力。然而,这些模型仍依赖NWP提供的初始条件,从而限制了其预报能力的上限。此后已有一些端到端系统被提出,但它们尚未达到最先进NWP对手的预报技巧。在这项工作中,我们提出Huracan,一个观测驱动的天气预报系统,它将集合资料同化模型与预报模型相结合,仅依靠观测作为输入即可产生高度准确的预报。Huracan不仅是第一个提供集合初始条件和端到端集合天气预报的系统,也是第一个在使用较少可用观测数据的情况下,达到与最先进NWP对手ECMWF ENS相当精度的端到端系统。值得注意的是,Huracan在75.4%的变量与预报时效组合上匹配或超过ECMWF ENS的连续分级概率评分。我们的工作是端到端数据驱动天气预报的重要一步,为进一步改进和革新业务天气预报开辟了机会。
摘要:Over the past few years, machine learning-based data-driven weather prediction has been transforming operational weather forecasting by providing more accurate forecasts while using a mere fraction of computing power compared to traditional numerical weather prediction (NWP). However, those models still rely on initial conditions from NWP, putting an upper limit on their forecast abilities. A few end-to-end systems have since been proposed, but they have yet to match the forecast skill of state-of-the-art NWP competitors. In this work, we propose Huracan, an observation-driven weather forecasting system which combines an ensemble data assimilation model with a forecast model to produce highly accurate forecasts relying only on observations as inputs. Huracan is not only the first to provide ensemble initial conditions and end-to-end ensemble weather forecasts, but also the first end-to-end system to achieve an accuracy comparable with that of ECMWF ENS, the state-of-the-art NWP competitor, despite using a smaller amount of available observation data. Notably, Huracan matches or exceeds the continuous ranked probability score of ECMWF ENS on 75.4% of the variable and lead time combinations. Our work is a major step forward in end-to-end data-driven weather prediction and opens up opportunities for further improving and revolutionizing operational weather forecasting.
【12】From Prediction to Simulation: AlphaFold 3 as a Differentiable Framework for Structural Biology
标题:从预测到模拟:AlphaFold 3作为结构生物学的可微分框架
链接:https://arxiv.org/abs/2508.18446
作者:bbaszadeh, Armita Shahlaee
备注:37 pages, 5 figures. A perspective article on the conceptual advances of AlphaFold 3 and its paradigm shift toward differentiable simulation in structural biology
摘要:AlphaFold 3代表了计算生物学的变革性进步,通过新颖的多尺度Transformer架构,生物学信息交叉注意机制和几何感知优化策略增强蛋白质结构预测。这些创新极大地提高了不同蛋白质家族的预测准确性和泛化能力,超过了以前的方法。至关重要的是,AlphaFold 3体现了向可微模拟的范式转变,将传统的静态结构建模与动态分子模拟联系起来。通过将蛋白质折叠预测重新构建为可区分的过程,AlphaFold 3可以作为将深度学习与基于物理的分子生物学相结合的基础框架。
摘要:AlphaFold 3 represents a transformative advancement in computational biology, enhancing protein structure prediction through novel multi-scale transformer architectures, biologically informed cross-attention mechanisms, and geometry-aware optimization strategies. These innovations dramatically improve predictive accuracy and generalization across diverse protein families, surpassing previous methods. Crucially, AlphaFold 3 embodies a paradigm shift toward differentiable simulation, bridging traditional static structural modeling with dynamic molecular simulations. By reframing protein folding predictions as a differentiable process, AlphaFold 3 serves as a foundational framework for integrating deep learning with physics-based molecular biology.
其他神经网络|深度学习|模型|建模(20篇)
【1】Composition and Alignment of Diffusion Models using Constrained Learning
标题:使用约束学习的扩散模型的合成和对齐
链接:https://arxiv.org/abs/2508.19104
作者:halafi, Ignacio Hounie, Dongsheng Ding, Alejandro Ribeiro
摘要:由于扩散模型能够从复杂分布中进行采样,因此它们在生成式建模中变得流行。为了提高生成的样本的质量及其对用户需求的遵从性,两种常用的方法是:(i)对齐,涉及微调扩散模型以使其与奖励对齐;以及(ii)组合,组合几个预先训练的扩散模型,每个模型强调生成的输出中的期望属性。然而,当优化多个奖励或组合多个模型时,通常会出现权衡,因为它们通常可以代表竞争属性。现有的方法不能保证所得到的模型忠实地生成具有所有所需属性的样本。为了解决这一差距,我们提出了一个约束优化框架,通过强制对齐模型满足奖励约束和/或保持接近(可能是多个)预训练模型,统一了扩散模型的对齐和组合。我们提供了一个理论表征的解决方案的约束对齐和组合问题,并开发了一个基于拉格朗日的原始-对偶训练算法来近似这些解决方案。经验上,我们证明了我们提出的方法在图像生成中的有效性和优点,将其应用于对齐和组合,并表明我们的对齐或组合模型有效地满足约束,并改进了等权重方法。您可以在https://github.com/shervinkhalafi/constrained_comp_align上找到我们的实作。
摘要:Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. To improve the quality of generated samples and their compliance with user requirements, two commonly used methods are: (i) Alignment, which involves fine-tuning a diffusion model to align it with a reward; and (ii) Composition, which combines several pre-trained diffusion models, each emphasizing a desirable attribute in the generated outputs. However, trade-offs often arise when optimizing for multiple rewards or combining multiple models, as they can often represent competing properties. Existing methods cannot guarantee that the resulting model faithfully generates samples with all the desired properties. To address this gap, we propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to (potentially multiple) pre-trained models. We provide a theoretical characterization of the solutions to the constrained alignment and composition problems and develop a Lagrangian-based primal-dual training algorithm to approximate these solutions. Empirically, we demonstrate the effectiveness and merits of our proposed approach in image generation, applying it to alignment and composition, and show that our aligned or composed model satisfies constraints effectively, and improves on the equally-weighted approach. Our implementation can be found at https://github.com/shervinkhalafi/constrained_comp_align.
【2】Learning Binary Sampling Patterns for Single-Pixel Imaging using Bilevel Optimisation
标题:使用双层优化学习单像素成像的二值采样模式
链接:https://arxiv.org/abs/2508.19068
作者: Tudosie, Alexander Denker, Zeljko Kereta, Simon Arridge
摘要:单像素成像通过用结构光模式顺序照明,使得仅用单个探测器即可重建物体。我们提出了一种双层优化方法,用于学习面向特定任务的二值照明模式,并针对单像素荧光显微等应用进行优化。我们使用直通估计器(Straight-Through Estimator)并在双层公式中利用全深度变分(Total Deep Variation)正则化器,来处理二值模式优化的不可微性。我们在CytoImageNet显微数据集上演示了该方法,并表明与基线方法相比,学习到的模式实现了更优的重建性能,尤其是在高度欠采样的情形下。
摘要:Single-Pixel Imaging enables reconstructing objects using a single detector through sequential illuminations with structured light patterns. We propose a bilevel optimisation method for learning task-specific, binary illumination patterns, optimised for applications like single-pixel fluorescence microscopy. We address the non-differentiable nature of binary pattern optimisation using the Straight-Through Estimator and leveraging a Total Deep Variation regulariser in the bilevel formulation. We demonstrate our method on the CytoImageNet microscopy dataset and show that learned patterns achieve superior reconstruction performance compared to baseline methods, especially in highly undersampled regimes.
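下面用一个玩具问题示意直通估计器(STE)如何让二值照明模式变得可学习(假设性草图:此处用可微伪逆重建代替论文中带Total Deep Variation正则的双层下层问题):

```python
import torch

def binarize_ste(logits):
    # 前向: 硬二值化为0/1模式;反向: 直通(恒等)梯度
    hard = (logits > 0).float()
    return hard + logits - logits.detach()

torch.manual_seed(0)
logits = torch.randn(64, 1024, requires_grad=True)    # 64个模式, 32x32场景展平
opt = torch.optim.Adam([logits], lr=1e-2)
x = torch.rand(1024)                                  # 一幅(固定的)训练图像

for step in range(50):
    P = binarize_ste(logits)                          # 二值照明模式
    y = P @ x                                         # 模拟单像素测量
    x_hat = torch.linalg.pinv(P) @ y                  # 可微的最小二乘重建
    loss = torch.mean((x_hat - x) ** 2)               # 上层目标: 重建误差
    opt.zero_grad()
    loss.backward()                                   # 梯度经STE传回logits
    opt.step()
```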
【3】Breaking the Black Box: Inherently Interpretable Physics-Informed Machine Learning for Imbalanced Seismic Data
标题:打破黑匣子:针对不平衡地震数据的内在可解释的物理信息机器学习
链接:https://arxiv.org/abs/2508.19031
作者:eenath, Filippo Gatti, Pierre Jehel
备注:19 pages, 9 Figures and 2 Tables
摘要:地面运动模型(GMM)预测地震时地面震动的强度,它们对于结构分析、抗震设计和地震风险评估研究至关重要。得益于全球范围内的大型地震数据库,传统机器学习(ML)方法在开发GMM方面很受欢迎。然而,它们以"黑匣子"方式运行,难以解释和信任,限制了其在高风险决策中的使用。此外,这些数据库存在严重的数据不平衡:与大量破坏性较小的远场记录相比,断层附近破坏性严重的大震级记录要少得多。本工作通过使用HazBinLoss函数构建一个透明的ML架构来解决这两个局限。每个输入(例如震级、距离、它们的交互项等)被单独处理并线性相加以获得输出,从而得到每一项的确切贡献。HazBinLoss函数在训练期间为关键的近场大震级记录分配较高权重,为不太关键的远场小震级记录分配较低权重,以防止对最具破坏性情景的预测不足。我们的模型捕获了已知的地震学原理,并在保持透明度的同时实现了与成熟GMM相当的性能。该框架有助于在风险评估研究和灾害规划中更广泛地采用基于ML的方法。
摘要:Ground motion models (GMMs) predict how strongly the ground will shake during an earthquake. They are essential for structural analysis, seismic design, and seismic risk assessment studies. Traditional machine learning (ML) approaches are popular to develop GMMs, due to large earthquake databases worldwide. However, they operate as "black boxes," which are hard to interpret and trust, limiting their use in high-stake decisions. Additionally, these databases suffer from significant data imbalances: fewer large, critically damaging records near the fault compared to abundant, less severely damaging distant records. These two limitations are addressed in this work by developing a transparent ML architecture using the HazBinLoss function. Each input (e.g., magnitude, distance, their interaction term, etc.) is processed separately and added linearly to obtain the output, resulting in exact contribution of each term. The HazBinLoss function assigns higher weights to critical near-field large magnitude records and lower weights to less-critical far-field smaller magnitude records, during training to prevent underprediction of the most damaging scenarios. Our model captures known seismological principles and achieves comparable performance with established GMMs while maintaining transparency. This framework enables broader adoption of ML-based approaches for risk assessment studies and disaster planning.
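摘要中未给出HazBinLoss的具体公式;下面是按其思想(近场大震级记录权重高、远场小震级记录权重低)的一种可能的示意实现,采用按(震级,距离)分箱的逆频率加权,仅供说明,分箱与加权细节均为假设:

```python
import numpy as np

def hazbin_weights(magnitude, distance_km, n_bins=5):
    # 示意性加权: 按(震级, 距离)联合分箱,权重与箱内样本数成反比,
    # 使稀少的近场大震级记录在损失中占主导
    m_edges = np.quantile(magnitude, np.linspace(0, 1, n_bins + 1)[1:-1])
    d_edges = np.quantile(distance_km, np.linspace(0, 1, n_bins + 1)[1:-1])
    m_bin = np.digitize(magnitude, m_edges)
    d_bin = np.digitize(distance_km, d_edges)
    joint = m_bin * n_bins + d_bin
    counts = np.bincount(joint, minlength=n_bins * n_bins).astype(float)
    w = 1.0 / np.maximum(counts[joint], 1.0)
    return w / w.mean()                      # 归一化到平均权重为1

def weighted_mse(y_true, y_pred, w):
    # 加权均方误差: 防止对最具破坏性情景预测不足
    return np.mean(w * (y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
mag = rng.uniform(4.0, 8.0, 1000)
dist = rng.uniform(1.0, 300.0, 1000)
w = hazbin_weights(mag, dist)
print("max/min weight ratio:", w.max() / w.min())
```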
【4】Learning with springs and sticks
标题:用弹簧和棍棒学习
链接:https://arxiv.org/abs/2508.19015
作者:illa Calderón, Alán Aspuru-Guzik
备注:13 pages, 6 figures
摘要:学习是一个物理过程。在此,我们旨在研究一个由弹簧和棍棒组成、能够以任意精度逼近任何连续函数的简单动力系统。我们工作的主要思想是:用棍棒来模拟给定函数的分段线性近似,用弹簧的势能来编码所需的均方误差损失函数,并通过耗散收敛到最小能量构型。我们将所提出的模拟系统应用于回归任务,并表明其性能与多层感知器相当。此外,我们研究了该系统的热力学性质,发现系统自由能变化与其学习底层数据分布能力之间的关系。我们通过经验发现了由环境涨落引起的系统"热力学学习障碍":如果系统的自由能变化遇到这样的障碍,系统就无法学习。我们相信这个简单的模型可以帮助我们从物理角度更好地理解学习系统。
摘要:Learning is a physical process. Here, we aim to study a simple dynamical system composed of springs and sticks capable of arbitrarily approximating any continuous function. The main idea of our work is to use the sticks to mimic a piecewise-linear approximation of the given function, use the potential energy of springs to encode a desired mean squared error loss function, and converge to a minimum-energy configuration via dissipation. We apply the proposed simulation system to regression tasks and show that its performance is comparable to that of multi-layer perceptrons. In addition, we study the thermodynamic properties of the system and find a relation between the free energy change of the system and its ability to learn an underlying data distribution. We empirically find a \emph{thermodynamic learning barrier} for the system caused by the fluctuations of the environment, whereby the system cannot learn if its change in free energy hits such a barrier. We believe this simple model can help us better understand learning systems from a physical point of view.
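下面是该思想的一个数值示意(假设性草图,非论文实现):棍棒对应分段线性曲线的节点高度,弹簧势能即均方误差,过阻尼(耗散)梯度流使系统收敛到最小能量构型:

```python
import numpy as np

x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x)                             # 待逼近的连续函数

K = 8
knots = np.linspace(0, 1, K)                          # 棍棒铰接点的横坐标
heights = np.zeros(K)                                 # 待学习的节点高度

def piecewise_linear(xq, knots, heights):
    # "棍棒": 由节点高度决定的分段线性曲线(对heights是线性的)
    return np.interp(xq, knots, heights)

lr = 0.5
for _ in range(2000):
    resid = piecewise_linear(x, knots, heights) - y   # 各数据点处弹簧的伸长量
    grad = np.zeros(K)
    for j in range(K):                                # E = mean(resid^2) 对 h_j 的梯度
        basis = np.zeros(K)
        basis[j] = 1.0
        grad[j] = np.mean(2 * resid * piecewise_linear(x, knots, basis))
    heights -= lr * grad                              # 过阻尼耗散动力学
print("final energy (MSE):", np.mean((piecewise_linear(x, knots, heights) - y) ** 2))
```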
【5】HierCVAE: Hierarchical Attention-Driven Conditional Variational Autoencoders for Multi-Scale Temporal Modeling
标题:HierCVAE:用于多尺度时间建模的分层注意力驱动条件变分自动编码器
链接:https://arxiv.org/abs/2508.18922
作者:
备注:10 pages, 6 figures
摘要:复杂系统中的时间建模需要捕获多个时间尺度上的依赖关系,同时处理固有的不确定性。我们提出了HierCVAE,一种将分层注意力机制与条件变分自编码器相结合以应对这些挑战的新架构。HierCVAE采用三层注意力结构(局部、全局、跨时间),结合多模态条件编码来捕获时间、统计和趋势信息。该方法在潜空间中引入ResFormer块,并通过预测头提供显式的不确定性量化。在能源消耗数据集上的评估表明,与最先进的方法相比,HierCVAE在预测准确性方面提高了15-40%,并具有更优的不确定性校准,在长期预测和复杂多变量依赖关系方面表现出色。
摘要:Temporal modeling in complex systems requires capturing dependencies across multiple time scales while managing inherent uncertainties. We propose HierCVAE, a novel architecture that integrates hierarchical attention mechanisms with conditional variational autoencoders to address these challenges. HierCVAE employs a three-tier attention structure (local, global, cross-temporal) combined with multi-modal condition encoding to capture temporal, statistical, and trend information. The approach incorporates ResFormer blocks in the latent space and provides explicit uncertainty quantification via prediction heads. Through evaluations on energy consumption datasets, HierCVAE demonstrates a 15-40% improvement in prediction accuracy and superior uncertainty calibration compared to state-of-the-art methods, excelling in long-term forecasting and complex multi-variate dependencies.
【6】pyFAST: A Modular PyTorch Framework for Time Series Modeling with Multi-source and Sparse Data
标题:pyFAST:一个用于多源和稀疏数据时间序列建模的模块化PyTorch框架
链接:https://arxiv.org/abs/2508.18891
作者:ng, Senzhen Wu, Yue Hu, Xiufeng Liu
摘要:现代时间序列分析需要灵活、高效且可扩展的框架。然而,许多现有的Python库在模块化以及对不规则、多源或稀疏数据的原生支持方面存在局限。我们介绍了pyFAST,一个面向研究的PyTorch框架,它将数据处理与模型计算显式解耦,促进了更清晰的关注点分离并便于快速实验。其数据引擎专为复杂场景而设计,支持多源加载、蛋白质序列处理、高效的序列级和补丁级填充、动态归一化,以及用于插补和预测的基于掩码的建模。pyFAST集成了受LLM启发的架构,用于对稀疏数据源进行无需对齐的融合,并提供原生稀疏度量、专用损失函数和灵活的外生数据融合。训练工具包括用于评估的基于批的流式聚合以及设备协同,以最大限度地提高计算效率。其模块化架构内提供了一整套经典和深度学习模型(线性模型、CNN、RNN、Transformer和GNN),并鼓励扩展。pyFAST以MIT许可证在GitHub上发布,为推进时间序列研究和应用提供了一个紧凑而强大的平台。
摘要:Modern time series analysis demands frameworks that are flexible, efficient, and extensible. However, many existing Python libraries exhibit limitations in modularity and in their native support for irregular, multi-source, or sparse data. We introduce pyFAST, a research-oriented PyTorch framework that explicitly decouples data processing from model computation, fostering a cleaner separation of concerns and facilitating rapid experimentation. Its data engine is engineered for complex scenarios, supporting multi-source loading, protein sequence handling, efficient sequence- and patch-level padding, dynamic normalization, and mask-based modeling for both imputation and forecasting. pyFAST integrates LLM-inspired architectures for the alignment-free fusion of sparse data sources and offers native sparse metrics, specialized loss functions, and flexible exogenous data fusion. Training utilities include batch-based streaming aggregation for evaluation and device synergy to maximize computational efficiency. A comprehensive suite of classical and deep learning models (Linears, CNNs, RNNs, Transformers, and GNNs) is provided within a modular architecture that encourages extension. Released under the MIT license at GitHub, pyFAST provides a compact yet powerful platform for advancing time series research and applications.
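摘要提到的"基于掩码的建模"核心是只在被掩蔽/观测到的位置上计算损失;下面是这一通用模式的最小示意(非pyFAST的实际API,具体接口见其GitHub仓库;模型调用以占位代替):

```python
import torch

def masked_mse(pred, target, mask):
    # 仅在mask为True的条目上计算均方误差;缺失步不贡献损失,
    # 因而不规则/稀疏序列无需伪造填充值即可训练
    mask = mask.float()
    se = (pred - target) ** 2 * mask
    return se.sum() / mask.sum().clamp(min=1.0)

# 插补式自监督: 随机隐藏一部分已观测值,训练模型将其重建
target = torch.randn(32, 96, 4)                       # (batch, time, channels)
observed = torch.rand_like(target) > 0.3              # 70%为真实观测
hide = (torch.rand_like(target) > 0.5) & observed     # 再掩蔽一部分用于训练
model_input = target * (observed & ~hide)             # 模型只看到其余观测
pred = model_input + 0.1                              # 模型前向的占位
loss = masked_mse(pred, target, hide)                 # 只对被隐藏的单元计分
```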
【7】C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning
标题:C-Flat++:迈向更高效、更强大的持续学习框架
链接:https://arxiv.org/abs/2508.18860
作者:angjie Yuan, Zixiang Zhao, Yifan Zhu, Aojun Lu, Tao Feng, Yanan Sun
摘要:在持续学习(CL)中,平衡对新任务的敏感性与保留既有知识的稳定性至关重要。最近,锐度感知最小化已被证明在迁移学习中有效,也被引入持续学习以提高记忆保持和学习效率。然而,在某些设置中,仅依赖零阶锐度可能会偏向更尖锐而非更平坦的极小值,从而导致鲁棒性较差且可能次优的解。在本文中,我们提出了\textbf{C}ontinual \textbf{Flat}ness(\textbf{C-Flat}),一种为CL量身定制、促进更平坦损失景观的方法。C-Flat提供即插即用的兼容性,只需对代码流程进行最少的修改即可轻松集成。此外,我们提出了一个将C-Flat集成到所有主要CL范式中的通用框架,并与损失极小值优化器和基于平坦极小值的CL方法进行了全面比较。结果表明,C-Flat在广泛的设置中始终提升性能。此外,我们还引入了C-Flat++,一个高效而有效的框架,它利用选择性的平坦性驱动提升,大幅降低了C-Flat所需的更新成本。跨多种CL方法、数据集和场景的大量实验证明了所提方法的有效性和效率。代码可在https://github.com/WanNaa/C-Flat上获得。
摘要:Balancing sensitivity to new tasks and stability for retaining past knowledge is crucial in continual learning (CL). Recently, sharpness-aware minimization has proven effective in transfer learning and has also been adopted in continual learning (CL) to improve memory retention and learning efficiency. However, relying on zeroth-order sharpness alone may favor sharper minima over flatter ones in certain settings, leading to less robust and potentially suboptimal solutions. In this paper, we propose \textbf{C}ontinual \textbf{Flat}ness (\textbf{C-Flat}), a method that promotes flatter loss landscapes tailored for CL. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. Besides, we present a general framework that integrates C-Flat into all major CL paradigms and conduct comprehensive comparisons with loss-minima optimizers and flat-minima-based CL methods. Our results show that C-Flat consistently improves performance across a wide range of settings. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion, significantly reducing the update cost required by C-Flat. Extensive experiments across multiple CL methods, datasets, and scenarios demonstrate the effectiveness and efficiency of our proposed approaches. Code is available at https://github.com/WanNaa/C-Flat.
【8】Learning Real-World Acrobatic Flight from Human Preferences
标题:从人类偏好中学习真实世界的特技飞行
链接:https://arxiv.org/abs/2508.18817
作者:k, Ismail Geles, Jiaxu Xing, Angel Romero, Giorgia Ramponi, Davide Scaramuzza
备注:8 pages, 7 figures
摘要:基于偏好的强化学习(PbRL)使智能体无需手动设计奖励函数即可学习控制策略,非常适合目标难以形式化或本质上主观的任务。特技飞行由于其复杂的动力学、快速的机动以及对精确执行的要求,构成了一个特别具有挑战性的问题。在这项工作中,我们探索将PbRL用于敏捷无人机控制,重点是powerloop等动态机动的执行。在基于偏好的近端策略优化(Preference PPO)的基础上,我们提出了置信度下的奖励集成(REC),这是对奖励学习目标的扩展,可改进偏好建模和学习稳定性。我们的方法达到了成形奖励性能的88.4%,而标准Preference PPO为55.2%。我们在仿真中训练策略并成功将其迁移到真实无人机上,演示了多种特技机动,其中人类偏好强调运动的风格品质。此外,我们在一个有代表性的MuJoCo连续控制环境中展示了我们的概率奖励模型的适用性。最后,我们强调了手动设计奖励的局限性:其与人类偏好的一致性仅为60.7%。这些结果突显了PbRL在物理和仿真领域捕获复杂的、以人为本的目标的有效性。
摘要:Preference-based reinforcement learning (PbRL) enables agents to learn control policies without requiring manually designed reward functions, making it well-suited for tasks where objectives are difficult to formalize or inherently subjective. Acrobatic flight poses a particularly challenging problem due to its complex dynamics, rapid movements, and the importance of precise execution. In this work, we explore the use of PbRL for agile drone control, focusing on the execution of dynamic maneuvers such as powerloops. Building on Preference-based Proximal Policy Optimization (Preference PPO), we propose Reward Ensemble under Confidence (REC), an extension to the reward learning objective that improves preference modeling and learning stability. Our method achieves 88.4% of the shaped reward performance, compared to 55.2% with standard Preference PPO. We train policies in simulation and successfully transfer them to real-world drones, demonstrating multiple acrobatic maneuvers where human preferences emphasize stylistic qualities of motion. Furthermore, we demonstrate the applicability of our probabilistic reward model in a representative MuJoCo environment for continuous control. Finally, we highlight the limitations of manually designed rewards, observing only 60.7% agreement with human preferences. These results underscore the effectiveness of PbRL in capturing complex, human-centered objectives across both physical and simulated domains.
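Preference PPO的奖励学习部分基于标准的Bradley-Terry偏好目标(Christiano等人2017提出的范式);下面给出该目标加上一个简单奖励集成的最小示意(REC对集成置信度的具体利用方式为论文贡献,此处未实现,网络结构为假设):

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    # 片段级奖励模型: 对轨迹片段逐步打分并求和
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, segment):               # segment: (T, obs_dim)
        return self.net(segment).sum()        # 整个片段的累计奖励

def bradley_terry_loss(r1, r2, pref):
    # pref = 1 表示人类更偏好片段1,否则为0
    logits = r1 - r2
    return nn.functional.binary_cross_entropy_with_logits(
        logits, torch.tensor(pref, dtype=torch.float32))

# 奖励模型集成;REC类方法会进一步利用集成的一致程度(置信度)来稳定学习信号
ensemble = [RewardNet(obs_dim=12) for _ in range(3)]
seg_a, seg_b = torch.randn(50, 12), torch.randn(50, 12)
losses = [bradley_terry_loss(m(seg_a), m(seg_b), pref=1.0) for m in ensemble]
total = torch.stack(losses).mean()
total.backward()
```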
【9】PseudoMapTrainer: Learning Online Mapping without HD Maps
标题:PseudoMapTrainer:学习在线地图,无需高清地图
链接:https://arxiv.org/abs/2508.18788
作者: Löwens, Thorben Funke, Jingchao Xie, Alexandru Paul Condurache
备注:Accepted at ICCV 2025
摘要:在线地图模型在仅从多视角相机图像预测矢量化地图方面显示出显着的效果。然而,所有现有的方法在训练期间仍然依赖于地面实况高清地图,这是昂贵的获得,并且通常没有足够的地理多样性来进行可靠的概括。在这项工作中,我们提出了伪MapTrainer,一种新的在线映射方法,使用从未标记的传感器数据生成的伪标签。我们通过使用高斯溅射和预先训练的2D分割网络的语义从多摄像机图像重建路面来获得这些伪标签。此外,我们引入了一个掩码感知分配算法和损失函数来处理部分掩码的伪标签,首次允许在没有任何地面实况地图的情况下训练在线映射模型。此外,我们的伪标签可以有效地用于以半监督的方式预训练在线模型,以利用大规模未标记的众包数据。该代码可在github.com/boschresearch/PseudoMapTrainer上获得。
摘要:Online mapping models show remarkable results in predicting vectorized maps from multi-view camera images only. However, all existing approaches still rely on ground-truth high-definition maps during training, which are expensive to obtain and often not geographically diverse enough for reliable generalization. In this work, we propose PseudoMapTrainer, a novel approach to online mapping that uses pseudo-labels generated from unlabeled sensor data. We derive those pseudo-labels by reconstructing the road surface from multi-camera imagery using Gaussian splatting and semantics of a pre-trained 2D segmentation network. In addition, we introduce a mask-aware assignment algorithm and loss function to handle partially masked pseudo-labels, allowing for the first time the training of online mapping models without any ground-truth maps. Furthermore, our pseudo-labels can be effectively used to pre-train an online model in a semi-supervised manner to leverage large-scale unlabeled crowdsourced data. The code is available at github.com/boschresearch/PseudoMapTrainer.
【10】UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
标题:UltraMemV2:具有卓越长上下文学习能力、可扩展至120B参数的记忆网络
链接:https://arxiv.org/abs/2508.18756
作者:ng, Yu Bao, Qiyang Min, Siyan Chen, Ran Guo, Hongzhi Huang, Defa Zhu, Yutao Zeng, Banggu Wu, Xun Zhou, Siyuan Qiao
摘要:混合专家(MoE)模型仅激活参数的子集,从而实现了显著的效率,但在推理过程中面临高昂的内存访问成本。记忆层架构以极少的内存访问提供了一个有吸引力的替代方案,但以往的尝试(如UltraMem)仅能匹配2专家MoE模型的性能,明显落后于最先进的8专家配置。我们提出UltraMemV2,一种重新设计的记忆层架构,弥合了这一性能差距。我们的方法引入五项关键改进:将记忆层集成到每个Transformer块中、用单一线性投影简化值扩展、采用PEER中基于FFN的值处理、实施有原则的参数初始化,以及重新平衡记忆与FFN的计算比例。通过广泛评估,我们证明UltraMemV2在相同计算量和参数量下达到了与8专家MoE模型相当的性能,而内存访问显著更低。值得注意的是,UltraMemV2在记忆密集型任务上表现出色:长上下文记忆提升+1.6分、多轮记忆提升+6.2分、上下文学习提升+7.9分。我们在总参数量120B、激活参数量多达2.5B的模型上大规模验证了该方法,并确认激活密度对性能的影响大于稀疏参数总量。我们的工作使记忆层架构达到与最先进MoE模型相当的性能,为高效稀疏计算提供了一个有说服力的替代方案。
摘要:While Mixture of Experts (MoE) models achieve remarkable efficiency by activating only subsets of parameters, they suffer from high memory access costs during inference. Memory-layer architectures offer an appealing alternative with very few memory accesses, but previous attempts like UltraMem have only matched the performance of 2-expert MoE models, falling significantly short of state-of-the-art 8-expert configurations. We present UltraMemV2, a redesigned memory-layer architecture that closes this performance gap. Our approach introduces five key improvements: integrating memory layers into every transformer block, simplifying value expansion with single linear projections, adopting FFN-based value processing from PEER, implementing principled parameter initialization, and rebalancing memory-to-FFN computation ratios. Through extensive evaluation, we demonstrate that UltraMemV2 achieves performance parity with 8-expert MoE models under the same computation and parameters but with significantly lower memory access. Notably, UltraMemV2 shows superior performance on memory-intensive tasks, with improvements of +1.6 points on long-context memorization, +6.2 points on multi-round memorization, and +7.9 points on in-context learning. We validate our approach at scale with models up to 2.5B activated parameters from 120B total parameters, and establish that activation density has greater impact on performance than total sparse parameter count. Our work brings memory-layer architectures to performance parity with state-of-the-art MoE models, presenting a compelling alternative for efficient sparse computation.
【11】Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare
标题:面向医疗保健协作的多智能体学习中的技能对齐公平性
链接:https://arxiv.org/abs/2508.18708
作者:saine Ekpo, Brian La, Thomas Wiener, Saesha Agarwal, Arshia Agrawal, Gonzalo Gonzalez-Pumariega, Lekan P. Molu, Angelique Taylor
摘要:多智能体强化学习(MARL)中的公平性通常被框定为工作量平衡问题,忽视了智能体的专业知识以及现实世界领域所需的结构化协调。在医疗保健中,公平的任务分配需要工作量平衡或专业知识对齐,以防止职业倦怠和对高技能人员的过度使用。工作量平衡指的是在医护人员之间分配大致相等数量的子任务或均衡的工作量,而不论其专业知识如何。我们为此做出两项贡献。首先,我们提出了FairSkillMARL,一个将公平性定义为工作量平衡与技能-任务对齐双重目标的框架。其次,我们介绍了MARLHospital,一个可定制的、受医疗保健启发的环境,用于建模团队组成和能量受限的调度对公平性的影响,因为现有的模拟器都不适合这一问题。我们通过实验将FairSkillMARL与四种标准MARL方法结合进行比较,并与两个最先进的公平性指标进行对比。结果表明,仅基于均等工作量的公平性可能导致任务与技能的错配,并突显了需要能捕获技能-任务错配的更鲁棒的指标。我们的工作为研究异构多智能体系统中的公平性提供了工具和基础,在这类系统中,让付出与专业知识对齐至关重要。
摘要:Fairness in multi-agent reinforcement learning (MARL) is often framed as a workload balance problem, overlooking agent expertise and the structured coordination required in real-world domains. In healthcare, equitable task allocation requires workload balance or expertise alignment to prevent burnout and overuse of highly skilled agents. Workload balance refers to distributing an approximately equal number of subtasks or equalised effort across healthcare workers, regardless of their expertise. We make two contributions to address this problem. First, we propose FairSkillMARL, a framework that defines fairness as the dual objective of workload balance and skill-task alignment. Second, we introduce MARLHospital, a customizable healthcare-inspired environment for modeling team compositions and energy-constrained scheduling impacts on fairness, as no existing simulators are well-suited for this problem. We conducted experiments to compare FairSkillMARL in conjunction with four standard MARL methods, and against two state-of-the-art fairness metrics. Our results suggest that fairness based solely on equal workload might lead to task-skill mismatches and highlight the need for more robust metrics that capture skill-task misalignment. Our work provides tools and a foundation for studying fairness in heterogeneous multi-agent systems where aligning effort with expertise is critical.
【12】Auditing Approximate Machine Unlearning for Differentially Private Models
标题:差分隐私模型的近似机器遗忘审计
链接:https://arxiv.org/abs/2508.18671
作者:u, Jiajie He, Keke Chen
备注:Accepted by ICDM2025, 10 pages
摘要:近似机器遗忘旨在从已训练模型中消除特定数据的影响,以保障个人隐私。现有方法专注于被删除的记录,并假设保留的记录不受影响。然而,最近关于隐私洋葱效应的研究表明,这一假设可能不成立。尤其当模型是差分隐私的时,尚无研究探讨在现有机器遗忘方法下,保留的记录是否仍满足差分隐私(DP)标准。本文采用整体方法,审计应用近似遗忘算法后被遗忘样本和保留样本的隐私风险。我们分别从DP和成员推理攻击(MIA)的角度,为被遗忘样本和保留样本提出了隐私标准。为了使审计过程更实用,我们还开发了一种高效的MIA,即A-LiRA,利用数据增强来降低影子模型训练的成本。实验结果表明,现有的近似机器遗忘算法可能会无意中损害差分隐私模型中保留样本的隐私,我们需要差分隐私的遗忘算法。为了可复现性,我们已发布代码:https://anonymous.4open.science/r/Auditing-machine-unlearning-CB10/README.md
摘要:Approximate machine unlearning aims to remove the effect of specific data from trained models to ensure individuals' privacy. Existing methods focus on the removed records and assume the retained ones are unaffected. However, recent studies on the \emph{privacy onion effect} indicate this assumption might be incorrect. Especially when the model is differentially private, no study has explored whether the retained ones still meet the differential privacy (DP) criterion under existing machine unlearning methods. This paper takes a holistic approach to auditing both unlearned and retained samples' privacy risks after applying approximate unlearning algorithms. We propose the privacy criteria for unlearned and retained samples, respectively, based on the perspectives of DP and membership inference attacks (MIAs). To make the auditing process more practical, we also develop an efficient MIA, A-LiRA, utilizing data augmentation to reduce the cost of shadow model training. Our experimental findings indicate that existing approximate machine unlearning algorithms may inadvertently compromise the privacy of retained samples for differentially private models, and we need differentially private unlearning algorithms. For reproducibility, we have pubished our code: https://anonymous.4open.science/r/Auditing-machine-unlearning-CB10/README.md
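A-LiRA建立在LiRA(Carlini等人)式的似然比检验之上;下面是其底层打分逻辑的最小示意(假设性草图:对影子模型"成员/非成员"置信度的logit变换拟合高斯,再计算似然比;A-LiRA中的数据增强用于汇聚增强副本的置信度以减少所需影子模型数量,此处未展示):

```python
import numpy as np
from scipy.stats import norm

def logit_scale(p, eps=1e-6):
    # 对真实标签置信度做数值稳定的logit变换
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def lira_score(conf_target, shadow_in_confs, shadow_out_confs):
    # 似然比检验: 该样本是否为训练成员?分数越大越可能是成员
    z = logit_scale(conf_target)
    z_in = logit_scale(np.asarray(shadow_in_confs))
    z_out = logit_scale(np.asarray(shadow_out_confs))
    l_in = norm.logpdf(z, z_in.mean(), z_in.std() + 1e-6)
    l_out = norm.logpdf(z, z_out.mean(), z_out.std() + 1e-6)
    return l_in - l_out

score = lira_score(0.97,
                   shadow_in_confs=[0.90, 0.95, 0.92],
                   shadow_out_confs=[0.50, 0.60, 0.40])
print("membership score:", score)
```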
【13】FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
标题:FFT-MoE:异构边缘下通过大规模稀疏MoE实现基础模型的高效联邦微调
链接:https://arxiv.org/abs/2508.18663
作者:Yinglei Teng, Pengfei Wu, Nan Wang
备注:9 pages, 6 figures
摘要:随着基础模型(FM)推动人工通用智能(AGI)的发展,在隐私和资源约束下对其进行微调变得日益关键,尤其当高质量训练数据驻留在分布式边缘设备上时。联邦学习(FL)通过联邦微调(FFT)提供了一个有吸引力的解决方案,使得无需共享原始数据即可协作进行模型适配。近期方法结合了参数高效微调(PEFT)技术,如低秩适配(LoRA),以降低计算开销。然而,基于LoRA的FFT在异构FL环境中面临两大局限:不同LoRA配置的客户端之间的结构不兼容,以及对非IID数据分布的适应能力有限,这阻碍了收敛和泛化。为应对这些挑战,我们提出FFT MoE,一种用稀疏专家混合(MoE)适配器取代LoRA的新型FFT框架。每个客户端训练一个轻量级门控网络,选择性地激活个性化的专家子集,从而在保持聚合兼容性的同时细粒度地适应本地资源预算。为进一步对抗设备与数据异质性引起的专家负载不平衡,我们引入了一种异质性感知的辅助损失,动态正则化路由分布,以确保专家多样性和均衡利用。覆盖IID与非IID条件的大量实验表明,FFT MoE在泛化性能和训练效率上始终优于最先进的FFT基线。
摘要:As FMs drive progress toward Artificial General Intelligence (AGI), fine-tuning them under privacy and resource constraints has become increasingly critical, particularly when high-quality training data resides on distributed edge devices. Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT), which enables collaborative model adaptation without sharing raw data. Recent approaches incorporate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low Rank Adaptation (LoRA) to reduce computational overhead. However, LoRA-based FFT faces two major limitations in heterogeneous FL environments: structural incompatibility across clients with varying LoRA configurations and limited adaptability to non-IID data distributions, which hinders convergence and generalization. To address these challenges, we propose FFT MoE, a novel FFT framework that replaces LoRA with sparse Mixture of Experts (MoE) adapters. Each client trains a lightweight gating network to selectively activate a personalized subset of experts, enabling fine-grained adaptation to local resource budgets while preserving aggregation compatibility. To further combat the expert load imbalance caused by device and data heterogeneity, we introduce a heterogeneity-aware auxiliary loss that dynamically regularizes the routing distribution to ensure expert diversity and balanced utilization. Extensive experiments spanning both IID and non-IID conditions demonstrate that FFT MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
【14】BTW: A Non-Parametric Variance Stabilization Framework for Multimodal Model Integration
标题:BTW:多模态模型集成的非参数方差稳定框架
链接:https://arxiv.org/abs/2508.18551
作者:Le Wang, Xuan Wang
摘要:混合专家(MoE)模型通过支持跨模态的模块化专业化,在多模态学习中变得越来越强大。然而,当额外的模态引入的噪声多于互补信息时,其有效性仍不明确。现有方法(如部分信息分解)难以扩展到两种以上的模态,并且缺乏实例级控制所需的分辨率。我们提出了超越双模态加权(BTW),一个双层、非参数的加权框架,它结合实例级Kullback-Leibler(KL)散度和模态级互信息(MI),在训练过程中动态调整各模态的重要性。我们的方法不需要额外的参数,并可应用于任意数量的模态。具体而言,BTW通过测量每个单模态预测与当前多模态预测之间的散度来计算逐样本的KL权重,并通过估计单模态与多模态输出之间的全局对齐来计算模态级MI权重。在情感回归和临床分类上的大量实验表明,我们的方法显著提高了回归性能和多类分类准确率。
摘要:Mixture-of-Experts (MoE) models have become increasingly powerful in multimodal learning by enabling modular specialization across modalities. However, their effectiveness remains unclear when additional modalities introduce more noise than complementary information. Existing approaches, such as the Partial Information Decomposition, struggle to scale beyond two modalities and lack the resolution needed for instance-level control. We propose Beyond Two-modality Weighting (BTW), a bi-level, non-parametric weighting framework that combines instance-level Kullback-Leibler (KL) divergence and modality-level mutual information (MI) to dynamically adjust modality importance during training. Our method does not require additional parameters and can be applied to an arbitrary number of modalities. Specifically, BTW computes per-example KL weights by measuring the divergence between each unimodal and the current multimodal prediction, and modality-wide MI weights by estimating global alignment between unimodal and multimodal outputs. Extensive experiments on sentiment regression and clinical classification demonstrate that our method significantly improves regression performance and multiclass classification accuracy.
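BTW的逐样本KL权重可按摘要描述直接写出:对每个样本计算单模态预测与当前多模态预测之间的KL散度。下面是一个最小示意(散度到权重的映射方式为演示假设,论文中的具体用法以原文为准):

```python
import torch
import torch.nn.functional as F

def kl_weights(unimodal_logits, multimodal_logits, temperature=1.0):
    # 逐样本 KL(单模态 || 多模态): 散度越大,该模态与融合预测分歧越大
    log_p = F.log_softmax(unimodal_logits / temperature, dim=-1)
    log_q = F.log_softmax(multimodal_logits / temperature, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)   # 形状: (batch,)

# 示例: 对与融合预测分歧大的样本降低该模态的权重(一种可能的映射)
audio_logits = torch.randn(16, 5)
fused_logits = torch.randn(16, 5)
w = 1.0 / (1.0 + kl_weights(audio_logits, fused_logits))
print(w.shape, w.min().item(), w.max().item())
```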
【15】Low-Rank Tensor Decompositions for the Theory of Neural Networks
标题:用于神经网络理论的低秩张量分解
链接:https://arxiv.org/abs/2508.18408
作者:orsoi, Konstantin Usevich, Marianne Clausel
摘要:深度神经网络(NN)的突破性性能促进了为深度学习理论提供数学基础的兴趣激增。低秩张量分解由于其与神经网络的密切联系和丰富的理论结果而特别适合于这一任务。不同的张量分解有很强的唯一性保证,这允许直接解释它们的因子,并且已经提出了多项式时间算法来计算它们。通过张量和神经网络之间的联系,这些结果支持了神经网络理论的许多重要进展。在这篇综述中,我们展示了低秩张量方法--它一直是信号处理和机器学习领域的核心工具--如何在理论上解释深度NN性能的不同方面发挥基础性作用,包括它们的表达性、算法可学习性和计算硬度、泛化性和可识别性。我们的目标是以一种连贯和统一的方式对现有方法(由不同的社区开发,从计算机科学到数学)进行可访问的概述,并为深度NN理论的低秩张量分解的使用开辟更广阔的视角。
摘要:The groundbreaking performance of deep neural networks (NNs) promoted a surge of interest in providing a mathematical basis to deep learning theory. Low-rank tensor decompositions are specially befitting for this task due to their close connection to NNs and their rich theoretical results. Different tensor decompositions have strong uniqueness guarantees, which allow for a direct interpretation of their factors, and polynomial time algorithms have been proposed to compute them. Through the connections between tensors and NNs, such results supported many important advances in the theory of NNs. In this review, we show how low-rank tensor methods--which have been a core tool in the signal processing and machine learning communities--play a fundamental role in theoretically explaining different aspects of the performance of deep NNs, including their expressivity, algorithmic learnability and computational hardness, generalization, and identifiability. Our goal is to give an accessible overview of existing approaches (developed by different communities, ranging from computer science to mathematics) in a coherent and unified way, and to open a broader perspective on the use of low-rank tensor decompositions for the theory of deep NNs.
【16】Learning Spatio-Temporal Dynamics via Operator-Valued RKHS and Kernel Koopman Methods
标题:基于算子值RKHS和核Koopman方法的时空动力学学习
链接:https://arxiv.org/abs/2508.18307
作者:a Withanachchi
摘要:我们引入了一个统一框架,通过将算子值再生核希尔伯特空间(OV-RKHS)与基于核的Koopman算子方法相结合,来学习向量值函数的时空动力学。该方法能够对复杂的时变向量场进行非参数、数据驱动的估计,同时保留空间和时间结构。我们建立了时间相关OV-RKHS插值的表示定理,推导了光滑向量场的Sobolev型逼近界,并为核Koopman算子逼近提供了谱收敛保证。该框架支持高维非线性系统的高效降阶建模和长期预测,为时空机器学习中的预报、控制和不确定性量化提供了有理论依据的工具。
摘要:We introduce a unified framework for learning the spatio-temporal dynamics of vector valued functions by combining operator valued reproducing kernel Hilbert spaces (OV-RKHS) with kernel based Koopman operator methods. The approach enables nonparametric and data driven estimation of complex time evolving vector fields while preserving both spatial and temporal structure. We establish representer theorems for time dependent OV-RKHS interpolation, derive Sobolev type approximation bounds for smooth vector fields, and provide spectral convergence guarantees for kernel Koopman operator approximations. This framework supports efficient reduced order modeling and long term prediction of high dimensional nonlinear systems, offering theoretically grounded tools for forecasting, control, and uncertainty quantification in spatio- temporal machine learning.
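作为背景,下面给出标量核情形下核EDMD(Williams等人风格)逼近Koopman算子谱的最小示意(论文的OV-RKHS构造更为丰富,涉及算子值核与时间相关插值,此处不涉及):

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# 快照对 (x_t, y_t = x_{t+1}),取自一个简单的非线性系统
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
Y = np.stack([0.9 * X[:, 0], 0.5 * X[:, 1] + 0.2 * X[:, 0] ** 2], axis=1)

# 核EDMD: 在数据张成的特征空间中构造Koopman矩阵,其特征值近似Koopman谱
G = rbf_gram(X, X)                    # G_ij = k(x_i, x_j)
A = rbf_gram(Y, X)                    # A_ij = k(y_i, x_j)
K = np.linalg.pinv(G, rcond=1e-8) @ A
eigvals = np.linalg.eigvals(K)
print("leading |Koopman eigenvalues|:", np.sort(np.abs(eigvals))[-4:])
```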
【17】Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models
标题:注意力真的是我们所需要的全部吗?预训练RNN稀疏与全局注意力模型中资产定价的实证研究
链接:https://arxiv.org/abs/2508.19006
作者:ai
备注:55 pages including appendix, 21 figures and 5 tables
摘要:本研究针对美国市值前420名大盘股的实证资产定价,考察了采用主流注意力机制的预训练RNN注意力模型,这些机制包括加性注意力、Luong的三种注意力、全局自注意力(Self-att)和滑动窗口稀疏注意力(Sparse-att)。这是第一篇将大规模最先进(SOTA)注意力机制应用于资产定价的论文。它们克服了传统基于机器学习(ML)的资产定价的局限,例如对时间依赖性的错误捕捉和短记忆。此外,注意力机制中强制的因果掩码解决了更先进的基于注意力的模型(如经典Transformer)所忽视的未来数据泄露问题。所提出的注意力模型还考虑了资产定价数据的时间稀疏特性,并通过采用简化的模型结构来缓解潜在的过拟合问题。这为未来的实证经济学研究提供了一些启示。所有模型均在三个时期进行检验,涵盖COVID-19之前(温和上升趋势)、COVID-19期间(伴随大幅回撤的陡峭上升趋势)及COVID-19之后一年(高波动的横盘走势),以测试这些模型在极端市况下的稳定性。研究发现,在市值加权组合回测中,Self-att模型和Sparse-att模型在获取绝对收益和对冲下行风险方面表现出很强的能力,在COVID-19期间二者的年化Sortino比率分别达到2.0和1.80。而从绝对组合收益相对股票市值规模的角度看,Sparse-att模型比Self-att模型表现得更稳定。
摘要:This study investigates the pretrained RNN attention models with the mainstream attention mechanisms such as additive attention, Luong's three attentions, global self-attention (Self-att) and sliding window sparse attention (Sparse-att) for the empirical asset pricing research on top 420 large-cap US stocks. This is the first paper on the large-scale state-of-the-art (SOTA) attention mechanisms applied in the asset pricing context. They overcome the limitations of the traditional machine learning (ML) based asset pricing, such as mis-capturing the temporal dependency and short memory. Moreover, the enforced causal masks in the attention mechanisms address the future data leaking issue ignored by the more advanced attention-based models, such as the classic Transformer. The proposed attention models also consider the temporal sparsity characteristic of asset pricing data and mitigate potential overfitting issues by deploying the simplified model structures. This provides some insights for future empirical economic research. All models are examined in three periods, which cover pre-COVID-19 (mild uptrend), COVID-19 (steep uptrend with a large drawdown) and one year post-COVID-19 (sideways movement with high fluctuations), for testing the stability of these models under extreme market conditions. The study finds that in value-weighted portfolio back testing, Model Self-att and Model Sparse-att exhibit great capabilities in deriving the absolute returns and hedging downside risks, while they achieve an annualized Sortino ratio of 2.0 and 1.80 respectively in the period with COVID-19. And Model Sparse-att performs more stably than Model Self-att from the perspective of absolute portfolio returns with respect to the size of stocks' market capitalization.
【18】The GINN framework: a stochastic QED correspondence for stability and chaos in deep neural networks
标题:GINN框架:深度神经网络中稳定性与混沌的随机QED对应
链接:https://arxiv.org/abs/2508.18948
作者:armo Terin
备注:18 pages, 3 figures, 1 table
摘要:本文发展了一种欧几里得随机场论方法,将深度神经网络(DNN)映射到具有局域U(1)对称性的量子电动力学(QED)。神经激活和权重由费米子物质场和规范场表示,虚拟的朗之万时间使协变规范固定成为可能。该映射将规范参数与宽DNN中的核设计选择相对应,把稳定性阈值与依赖于规范的放大因子联系起来。有限宽度的涨落对应于QED中的环路修正。作为概念验证,我们通过标准多层感知器的数值模拟验证了理论预测,并同时提出了一种规范不变神经网络(GINN)实现,采用权重的幅值-相位参数化。最后,我们展示了一种双拷贝复制方法,统一了随机QED和宽DNN中最大Lyapunov指数的计算。
摘要:The development of a Euclidean stochastic field-theoretic approach that maps deep neural networks (DNNs) to quantum electrodynamics (QED) with local U(1) symmetry is presented. Neural activations and weights are represented by fermionic matter and gauge fields, with a fictitious Langevin time enabling covariant gauge fixing. This mapping identifies the gauge parameter with kernel design choices in wide DNNs, relating stability thresholds to gauge-dependent amplification factors. Finite-width fluctuations correspond to loop corrections in QED. As a proof of concept, we validate the theoretical predictions through numerical simulations of standard multilayer perceptrons and, in parallel, propose a gauge-invariant neural network (GINN) implementation using magnitude--phase parameterization of weights. Finally, a double-copy replica approach is shown to unify the computation of the largest Lyapunov exponent in stochastic QED and wide DNNs.
【19】Data-Driven Discovery and Formulation Refines the Quasi-Steady Model of Flapping-Wing Aerodynamics
标题:数据驱动的发现与公式化改进扑翼空气动力学的准定常模型
链接:https://arxiv.org/abs/2508.18703
作者:zu, Hao Liu, Toshiyuki Nakata
备注:27 pages, 13 figures
摘要:昆虫通过控制扑翼上的非定常气动力在复杂环境中航行。虽然理解这些力对生物学、物理学和工程学至关重要,但现有的评估方法面临权衡:高保真模拟在计算或实验上代价高昂且缺乏解释力,而基于准定常假设的理论模型虽能提供洞见但精度较低。为克服这些局限、提高准定常气动模型的精度,我们采用数据驱动方法,发现并公式化了此前被忽视的关键机制。通过从5000个候选运动学函数中进行选择,我们确定了三个关键附加机制的数学表达式:前进比的影响、展向运动速度的影响和旋转Wagner效应,这些机制此前已被定性认识但从未被公式化。以计算流体动力学结果为真值,纳入这些机制后,准定常模型在天蛾前飞(高雷诺数)和果蝇机动(低雷诺数)中的预测误差均大幅降低。这种数据驱动的准定常模型可实现快速气动分析,是理解昆虫飞行进化适应和开发仿生飞行机器人的实用工具。
摘要:Insects control unsteady aerodynamic forces on flapping wings to navigate complex environments. While understanding these forces is vital for biology, physics, and engineering, existing evaluation methods face trade-offs: high-fidelity simulations are computationally or experimentally expensive and lack explanatory power, whereas theoretical models based on quasi-steady assumptions offer insights but exhibit low accuracy. To overcome these limitations and thus enhance the accuracy of quasi-steady aerodynamic models, we applied a data-driven approach involving discovery and formulation of previously overlooked critical mechanisms. Through selection from 5,000 candidate kinematic functions, we identified mathematical expressions for three key additional mechanisms -- the effect of advance ratio, effect of spanwise kinematic velocity, and rotational Wagner effect -- which had been qualitatively recognized but were not formulated. Incorporating these mechanisms considerably reduced the prediction errors of the quasi-steady model using the computational fluid dynamics results as the ground truth, both in hawkmoth forward flight (at high Reynolds numbers) and fruit fly maneuvers (at low Reynolds numbers). The data-driven quasi-steady model enables rapid aerodynamic analysis, serving as a practical tool for understanding evolutionary adaptations in insect flight and developing bio-inspired flying robots.
【20】scI2CL: Effectively Integrating Single-cell Multi-omics by Intra- and Inter-omics Contrastive Learning
标题:scI2CL:通过组学内和组学间对比学习有效整合单细胞多组学
链接:https://arxiv.org/abs/2508.18304
作者:u, Han Peng, Wengen Li, Yichao Zhang, Jihong Guan, Shuigeng Zhou
备注:22 pages, 6 figures
摘要:单细胞多组学数据包含大量细胞状态信息,分析这些数据可以揭示细胞异质性,疾病和生物学过程的宝贵见解。然而,由于细胞分化和发育是一个连续和动态的过程,基于单细胞多组学数据计算建模和推断细胞相互作用模式仍然具有挑战性。本文提出了scI2CL,一种新的单细胞多组学融合框架,基于组学内和组学间对比学习,从互补的多组学数据中学习全面和有区别的细胞表示,用于各种下游任务。四个下游任务的大量实验验证了scI2CL的有效性和其优于现有同行。具体而言,在细胞聚类中,scI2CL在四个广泛使用的真实世界数据集上超过了八种最先进的方法。在细胞分型中,scI2CL有效地区分了三种潜在的单核细胞亚群,这是现有方法无法发现的。同时,scI2CL是唯一正确构建从造血干细胞和祖细胞到记忆B细胞的细胞发育轨迹的方法。此外,scI2CL解决了CD4+ T细胞的两个亚群之间的细胞类型的错误分类,而现有的方法无法精确区分混合细胞。总之,scI2CL可以准确地表征细胞之间的交叉组学关系,从而有效地融合多组学数据并学习有区别的细胞表示以支持各种下游分析任务。
摘要:Single-cell multi-omics data contain huge information of cellular states, and analyzing these data can reveal valuable insights into cellular heterogeneity, diseases, and biological processes. However, as cell differentiation \& development is a continuous and dynamic process, it remains challenging to computationally model and infer cell interaction patterns based on single-cell multi-omics data. This paper presents scI2CL, a new single-cell multi-omics fusion framework based on intra- and inter-omics contrastive learning, to learn comprehensive and discriminative cellular representations from complementary multi-omics data for various downstream tasks. Extensive experiments of four downstream tasks validate the effectiveness of scI2CL and its superiority over existing peers. Concretely, in cell clustering, scI2CL surpasses eight state-of-the-art methods on four widely-used real-world datasets. In cell subtyping, scI2CL effectively distinguishes three latent monocyte cell subpopulations, which are not discovered by existing methods. Simultaneously, scI2CL is the only method that correctly constructs the cell developmental trajectory from hematopoietic stem and progenitor cells to Memory B cells. In addition, scI2CL resolves the misclassification of cell types between two subpopulations of CD4+ T cells, while existing methods fail to precisely distinguish the mixed cells. In summary, scI2CL can accurately characterize cross-omics relationships among cells, thus effectively fuses multi-omics data and learns discriminative cellular representations to support various downstream analysis tasks.
其他(33篇)
【1】Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
标题:获得全局保证:论扰动鲁棒性的概率本质
链接:https://arxiv.org/abs/2508.19183
作者:Mu, Kwan Hui Lim
摘要:在安全关键的深度学习应用中,鲁棒性衡量神经模型应对输入数据中不可感知扰动的能力,这些扰动可能导致潜在的安全隐患。现有的部署前鲁棒性评估方法通常在计算成本和测量精度之间存在明显权衡,限制了其实际效用。为解决这些局限,本文对现有的鲁棒性定义及相关评估方法进行了全面的比较分析。我们提出了塔鲁棒性(tower robustness),一种基于假设检验的新型实用度量,用于定量评估概率鲁棒性,从而实现更严格、更高效的部署前评估。大量的比较评估说明了所提方法的优势和适用性,推动了对安全关键深度学习应用中模型鲁棒性的系统理解和增强。
摘要:In safety-critical deep learning applications, robustness measures the ability of neural models that handle imperceptible perturbations in input data, which may lead to potential safety hazards. Existing pre-deployment robustness assessment methods typically suffer from significant trade-offs between computational cost and measurement precision, limiting their practical utility. To address these limitations, this paper conducts a comprehensive comparative analysis of existing robustness definitions and associated assessment methodologies. We propose tower robustness to evaluate robustness, which is a novel, practical metric based on hypothesis testing to quantitatively evaluate probabilistic robustness, enabling more rigorous and efficient pre-deployment assessments. Our extensive comparative evaluation illustrates the advantages and applicability of our proposed approach, thereby advancing the systematic understanding and enhancement of model robustness in safety-critical deep learning applications.
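塔鲁棒性的具体统计量见原文;作为背景,下面示意"用假设检验定量评估概率鲁棒性"的通用做法(假设性草图,非论文方法):采样扰动并用Clopper-Pearson单侧下界检验鲁棒概率是否超过目标水平:

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson_lower(k, n, alpha=0.05):
    # 成功概率的单侧(1-alpha)置信下界
    return 0.0 if k == 0 else beta.ppf(alpha, k, n - k + 1)

def probabilistic_robustness(model, x, y, eps=0.03, n=1000, target=0.99, alpha=0.05):
    # 假设检验: 仅当置信下界超过target时,才认定 P(扰动下预测正确) >= target
    hits = 0
    for _ in range(n):
        delta = np.random.uniform(-eps, eps, size=x.shape)
        hits += int(model(np.clip(x + delta, 0.0, 1.0)) == y)
    lower = clopper_pearson_lower(hits, n, alpha)
    return lower, lower >= target

# 玩具分类器: 按平均像素值做阈值判断
model = lambda x: int(x.mean() > 0.5)
x = np.full((8, 8), 0.7)
y = 1
print(probabilistic_robustness(model, x, y))
```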
【2】Leveraging Evolutionary Surrogate-Assisted Prescription in Multi-Objective Chlorination Control Systems
标题:在多目标氯化控制系统中利用进化代理辅助处方
链接:https://arxiv.org/abs/2508.19173
作者:nsia, Olivier Francon, Daniel Young, Risto Miikkulainen
摘要:这份简短的书面报告介绍了进化代理辅助处方(Evolutionary Surrogate-Assisted Prescription,ESP)的思想,并给出了其用于训练真实世界智能体的潜力的初步结果,这是IJCAI-2025第一届饮用水氯化AI挑战赛的一部分。这项工作由Project Resilience的一个团队完成,该组织致力于将AI与现实世界问题联系起来。
摘要:This short, written report introduces the idea of Evolutionary Surrogate-Assisted Prescription (ESP) and presents preliminary results on its potential use in training real-world agents as a part of the 1st AI for Drinking Water Chlorination Challenge at IJCAI-2025. This work was done by a team from Project Resilience, an organization interested in bridging AI to real-world problems.
【3】Playstyle and Artificial Intelligence: An Initial Blueprint Through the Lens of Video Games
标题:游戏风格与人工智能:以电子游戏为视角的初步蓝图
链接:https://arxiv.org/abs/2508.19152
作者: Lin
备注:PhD Dissertation, National Yang Ming Chiao Tung University, 2025. This is the public version without Chinese abstract or postscript
摘要:当代人工智能(AI)的发展主要聚焦于理性决策,因其可测量性和适于客观评估而受到重视。然而在现实情境中,智能体的决策不仅受逻辑影响,还受信念、价值观和偏好等更深层因素的塑造。人类决策风格的多样性正源于这些差异,这突显了"风格"是智能的一个重要但常被忽视的维度。本论文引入游戏风格(playstyle)作为观察和分析智能体决策行为的另一种视角,并从哲学角度考察其基本含义和历史脉络。通过分析信念和价值观如何驱动意图和行动,我们构建了风格形成的双层框架:与环境交互的外部循环和审议的内部认知循环。在此基础上,我们形式化了与风格相关的特性,并提出了可测量的指标,如风格容量、风格流行度和演化动态。本研究围绕三个核心方向展开:(1)游戏风格的定义与度量,提出基于离散化状态空间的通用游戏风格度量,并将其扩展到量化策略多样性和竞技平衡;(2)游戏风格的表达与生成,探索如何使用强化学习和模仿学习来训练表现出特定风格倾向的智能体,并介绍一种类人风格学习与建模的新方法;(3)实际应用,分析这些技术在游戏设计和互动娱乐等领域的潜力。最后,论文概述了未来的扩展方向,包括风格作为构建人工通用智能(AGI)的核心要素的作用。
摘要:Contemporary artificial intelligence (AI) development largely centers on rational decision-making, valued for its measurability and suitability for objective evaluation. Yet in real-world contexts, an intelligent agent's decisions are shaped not only by logic but also by deeper influences such as beliefs, values, and preferences. The diversity of human decision-making styles emerges from these differences, highlighting that "style" is an essential but often overlooked dimension of intelligence. This dissertation introduces playstyle as an alternative lens for observing and analyzing the decision-making behavior of intelligent agents, and examines its foundational meaning and historical context from a philosophical perspective. By analyzing how beliefs and values drive intentions and actions, we construct a two-tier framework for style formation: the external interaction loop with the environment and the internal cognitive loop of deliberation. On this basis, we formalize style-related characteristics and propose measurable indicators such as style capacity, style popularity, and evolutionary dynamics. The study focuses on three core research directions: (1) Defining and measuring playstyle, proposing a general playstyle metric based on discretized state spaces, and extending it to quantify strategic diversity and competitive balance; (2) Expressing and generating playstyle, exploring how reinforcement learning and imitation learning can be used to train agents exhibiting specific stylistic tendencies, and introducing a novel approach for human-like style learning and modeling; and (3) Practical applications, analyzing the potential of these techniques in domains such as game design and interactive entertainment. Finally, the dissertation outlines future extensions, including the role of style as a core element in building artificial general intelligence (AGI).
【4】Saddle Hierarchy in Dense Associative Memory
标题:稠密联想记忆中的鞍点层次结构
链接:https://arxiv.org/abs/2508.19151
作者:riault, Daniele Tantari
备注:55 pages, 10 figures
摘要:密集联想记忆(DAM)模型重新引起了人们的关注,因为它们被证明对对抗样本具有鲁棒性,并且与最先进的机器学习范式密切相关,例如Transformer中的注意力机制和生成扩散模型。我们研究了一个建立在带有Potts隐藏单元(表示数据簇和类别)的三层玻尔兹曼机之上的DAM。通过统计力学分析,我们推导出鞍点方程,既刻画了在真实数据上训练的DAM的驻点,也刻画了在师生框架内于合成数据上训练的DAM的不动点。基于这些结果,我们提出了一种使训练显著更稳定的新正则化方案。此外,我们的经验表明,该DAM能为有监督和无监督分类问题学到可解释的解。进一步推进理论分析,我们发现相对较小的DAM学到的权重对应于较大DAM中的不稳定鞍点。我们实现了一种利用这一鞍点层次结构的网络增长算法,大幅降低了训练密集联想记忆的计算成本。
摘要:Dense associative memory (DAM) models have been attracting renewed attention since they were shown to be robust to adversarial examples and closely related to state-of-the-art machine learning paradigms, such as the attention mechanisms in transformers and generative diffusion models. We study a DAM built upon a three-layer Boltzmann machine with Potts hidden units, which represent data clusters and classes. Through a statistical mechanics analysis, we derive saddle-point equations that characterize both the stationary points of DAMs trained on real data and the fixed points of DAMs trained on synthetic data within a teacher-student framework. Based on these results, we propose a novel regularization scheme that makes training significantly more stable. Moreover, we show empirically that our DAM learns interpretable solutions to both supervised and unsupervised classification problems. Pushing our theoretical analysis further, we find that the weights learned by relatively small DAMs correspond to unstable saddle points in larger DAMs. We implement a network-growing algorithm that leverages this saddle-point hierarchy to drastically reduce the computational cost of training dense associative memory.
【5】GReAT: leveraging geometric artery data to improve wall shear stress assessment
标题:GReAT:利用几何动脉数据改善壁面剪应力评估
链接:https://arxiv.org/abs/2508.19030
作者:k, Jolanda J. Wentzel, Patryk Rygiel, Joost Daemen, Daniel Rueckert, Jelmer M. Wolterink
备注:(MICCAI 2025) Workshop on Shape in Medical Imaging (ShapeMI)
摘要:利用大数据进行患者护理在心血管健康等许多医疗领域都很有前途。例如,可以通过机器学习算法从患者特定的医学图像中评估血液动力学生物标志物(如壁面剪切应力),从而绕过对时间密集型计算流体模拟的需求。然而,要积累足够大的数据集来有效地训练这样的模型是非常具有挑战性的。我们可以通过自我监督的预训练和基础模型来解决这种数据稀缺问题,因为几何动脉模型的数据集很大。在冠状动脉的背景下,利用学习的表示来改善血液动力学生物标志物评估尚未得到很好的研究。在这项工作中,我们通过调查由3D血管几何模型组成的大型数据集(8449个形状)是否可以从小规模临床试验(49例患者)中受益于冠状动脉模型的壁面剪切应力评估来解决这一差距。我们通过计算热核特征来创建3D血管的自监督目标,热核特征是通过拉普拉斯特征向量获得的一个量,它捕捉了形状的本质。我们展示了从这些数据集学习的几何表示如何将冠状动脉分割成低,中和高(时间平均)壁切应力区域,即使在有限的数据上进行训练。
摘要:Leveraging big data for patient care is promising in many medical fields such as cardiovascular health. For example, hemodynamic biomarkers like wall shear stress could be assessed from patient-specific medical images via machine learning algorithms, bypassing the need for time-intensive computational fluid simulation. However, it is extremely challenging to amass large-enough datasets to effectively train such models. We could address this data scarcity by means of self-supervised pre-training and foundation models given large datasets of geometric artery models. In the context of coronary arteries, leveraging learned representations to improve hemodynamic biomarker assessment has not yet been well studied. In this work, we address this gap by investigating whether a large dataset (8449 shapes) consisting of geometric models of 3D blood vessels can benefit wall shear stress assessment in coronary artery models from a small-scale clinical trial (49 patients). We create a self-supervised target for the 3D blood vessels by computing the heat kernel signature, a quantity obtained via Laplacian eigenvectors, which captures the very essence of the shapes. We show how geometric representations learned from this dataset can boost segmentation of coronary arteries into regions of low, mid and high (time-averaged) wall shear stress even when trained on limited data.
【6】GRADSTOP: Early Stopping of Gradient Descent via Posterior Sampling
标题:GRADSTOP:通过后验采样提前停止梯度下降
链接:https://arxiv.org/abs/2508.19028
作者:shidi, Lauri Seppäläinen, Katsiaryna Haitsiukevich, Hoang Phuc Hau Luu, Anton Björklund, Kai Puolamäki
摘要:机器学习模型通常通过使用梯度下降算法最小化训练数据上的损失函数来学习。这些模型往往会受到过拟合的影响,导致对未见数据的预测性能下降。标准解决方案是使用保留验证集的提前停止,即当验证损失停止下降时停止最小化。然而,这种保留集减少了可用于训练的数据。本文提出了{\sc gradstop},一种新的随机提前停止方法,它只使用梯度中的信息,而这些信息是梯度下降算法"免费"产生的。我们的主要贡献是:利用梯度信息估计贝叶斯后验,将提前停止问题定义为从该后验中抽取样本,并利用近似后验得到停止准则。我们的实证评估表明,{\sc gradstop}在测试数据上取得了较小的损失,与基于验证集的停止准则相比毫不逊色。通过利用整个数据集进行训练,我们的方法在数据有限的环境(例如迁移学习)中特别有利。它可以作为可选功能集成到梯度下降库中,只带来很小的计算开销。源代码可在https://github.com/edahelsinki/gradstop上获得。
摘要:Machine learning models are often learned by minimising a loss function on the training data using a gradient descent algorithm. These models often suffer from overfitting, leading to a decline in predictive performance on unseen data. A standard solution is early stopping using a hold-out validation set, which halts the minimisation when the validation loss stops decreasing. However, this hold-out set reduces the data available for training. This paper presents {\sc gradstop}, a novel stochastic early stopping method that only uses information in the gradients, which are produced by the gradient descent algorithm ``for free.'' Our main contributions are that we estimate the Bayesian posterior by the gradient information, define the early stopping problem as drawing sample from this posterior, and use the approximated posterior to obtain a stopping criterion. Our empirical evaluation shows that {\sc gradstop} achieves a small loss on test data and compares favourably to a validation-set-based stopping criterion. By leveraging the entire dataset for training, our method is particularly advantageous in data-limited settings, such as transfer learning. It can be incorporated as an optional feature in gradient descent libraries with only a small computational overhead. The source code is available at https://github.com/edahelsinki/gradstop.
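摘要未给出gradstop的具体判据;下面是同一思路("只用梯度下降免费产生的梯度信息决定何时停止")的一个极简Python示意,采用基于梯度噪声的假设性判据,仅供理解,并非论文实现:

```python
import numpy as np

def gradient_stop_criterion(grads, alpha=0.05):
    """基于小批量梯度统计量的提前停止判据的极简示意(编者示例)。

    思路:用各小批量梯度估计全梯度的均值与方差;当全梯度均值
    在其噪声范围内与零不可区分时,认为已接近后验高概率区域,
    触发停止。grads 为本 epoch 收集的小批量梯度向量列表。
    """
    g = np.stack(grads)                    # 形状: (num_batches, num_params)
    mean = g.mean(axis=0)                  # 全梯度的估计
    var = g.var(axis=0, ddof=1) / len(g)   # 均值估计的方差
    z = mean / np.sqrt(var + 1e-12)        # 每个坐标的 z 统计量
    frac_nonzero = np.mean(np.abs(z) > 1.96)   # 95% 置信水平下"非零"的比例
    return frac_nonzero < alpha            # 几乎所有坐标都"像零"时停止

# 用法示意:每个 epoch 末调用
# if gradient_stop_criterion(epoch_grads): break
```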
【7】STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems
标题:STDiff:工业系统时间序列归责的状态转变扩散框架
链接:https://arxiv.org/abs/2508.19011
作者:thy, Daniel Ortiz-Arroyo, Petar Durdevic
摘要:大多数用于估算缺失值的深度学习方法将任务视为在固定时间窗口内完成的模式。这种假设在工业系统中经常失败,其中动态由控制动作驱动,是高度非平稳的,并且可以经历长时间不间断的间隙。我们提出了STDiff,它将估算重新定义为学习系统如何从一个状态演变到下一个状态。STDiff使用条件去噪扩散模型,其因果偏差与控制理论一致,基于最近的已知状态和相关控制或环境输入逐步生成缺失值。在具有模拟缺失块的公共废水处理数据集上,STDiff始终实现最低的误差,其优势随着间隙的增加而增加。在具有大量实际差距的原始工业数据集上,它产生的轨迹在动态上保持合理,而基于窗口的模型往往会变平或过于平滑。这些结果支持动态感知,明确条件插补作为一个强大的方法,工业时间序列,我们讨论了计算的权衡和扩展到更广泛的领域。
摘要:Most deep learning methods for imputing missing values treat the task as completing patterns within a fixed time window. This assumption often fails in industrial systems, where dynamics are driven by control actions, are highly non-stationary, and can experience long, uninterrupted gaps. We propose STDiff, which reframes imputation as learning how the system evolves from one state to the next. STDiff uses a conditional denoising diffusion model with a causal bias aligned to control theory, generating missing values step-by-step based on the most recent known state and relevant control or environmental inputs. On a public wastewater treatment dataset with simulated missing blocks, STDiff consistently achieves the lowest errors, with its advantage increasing for longer gaps. On a raw industrial dataset with substantial real gaps, it produces trajectories that remain dynamically plausible, in contrast to window-based models that tend to flatten or over-smooth. These results support dynamics-aware, explicitly conditioned imputation as a robust approach for industrial time series, and we discuss computational trade-offs and extensions to broader domains.
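下面用一个极简的PyTorch训练步示意"以最近已知状态与控制输入为条件的去噪扩散插补"这一核心思想(编者示例;eps_model、调度张量等均为假设的占位,并非论文官方代码):

```python
import torch

def stdiff_train_step(eps_model, x_missing, last_state, controls,
                      alphas_cumprod, optimizer):
    """条件去噪扩散插补的单步训练示意(编者示例)。

    eps_model: 预测噪声的网络,输入为 (带噪目标, 时间步, 条件)
    x_missing: 训练时可见的缺失段真值,形状 (B, D)
    last_state, controls: 条件信息,即最近已知状态与控制/环境输入
    alphas_cumprod: 预先计算的扩散调度 \bar{alpha}_t,形状 (T,)
    """
    B = x_missing.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (B,))                     # 随机时间步
    a_bar = alphas_cumprod[t].unsqueeze(-1)           # (B, 1)
    noise = torch.randn_like(x_missing)
    x_t = a_bar.sqrt() * x_missing + (1 - a_bar).sqrt() * noise  # 前向加噪
    cond = torch.cat([last_state, controls], dim=-1)  # 因果条件
    pred = eps_model(x_t, t, cond)                    # 预测所加的噪声
    loss = torch.nn.functional.mse_loss(pred, noise)  # 标准 epsilon-预测损失
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```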
【8】Estimating Conditional Covariance between labels for Multilabel Data
标题:估计多标签数据标签之间的条件协方差
链接:https://arxiv.org/abs/2508.18951
作者:A. F. Park, Jesse Read
摘要:在应用多标签模型之前,应分析多标签数据的标签依赖性。多标签数据标签之间的独立性不能直接从标签值测量,因为它们依赖于协变量集$\vec{x}$,但可以通过使用多变量Probit模型检查条件标签协方差来测量。不幸的是,多变量Probit模型提供的是其copula协方差的估计,因此在估计常数协方差和相关协方差时可能不可靠。在本文中,我们比较了三个模型(多变量Probit、多变量伯努利和分阶段Logit)在估计常数和相关的多标签条件标签协方差方面的表现。我们提供了一个实验,使我们能够观察每个模型对条件协方差的测量。我们发现,所有模型都能同样好地测量常数协方差和相关协方差(取决于协方差的强度),但对于存在常数协方差的数据,所有模型都错误地检测到存在相关协方差。在这三个模型中,多变量Probit模型的错误率最低。
摘要:Multilabel data should be analysed for label dependence before applying multilabel models. Independence between multilabel data labels cannot be measured directly from the label values due to their dependence on the set of covariates $\vec{x}$, but can be measured by examining the conditional label covariance using a multivariate Probit model. Unfortunately, the multivariate Probit model provides an estimate of its copula covariance, and so might not be reliable in estimating constant covariance and dependent covariance. In this article, we compare three models (Multivariate Probit, Multivariate Bernoulli and Staged Logit) for estimating the constant and dependent multilabel conditional label covariance. We provide an experiment that allows us to observe each model's measurement of conditional covariance. We found that all models measure constant and dependent covariance equally well, depending on the strength of the covariance, but the models all falsely detect that dependent covariance is present for data where constant covariance is present. Of the three models, the Multivariate Probit model had the lowest error rate.
【9】Generalization Bound for a General Class of Neural Ordinary Differential Equations
标题:一类一般神经常微方程的推广界
链接:https://arxiv.org/abs/2508.18920
作者:n Verma, Manoj Kumar
备注:23 pages, 4 figures
摘要:神经常微分方程(Neural ODE)是一种流行的深度学习模型,它使用连续深度架构进行操作。为了评估这些模型在未见数据上的表现,了解它们的泛化误差界至关重要。以前的研究主要集中在神经常微分方程中动力学函数的线性情况(Marion, P. (2023)),或者提供依赖于采样间隔的神经受控常微分方程的界(Bleistein et al. (2023))。在这项工作中,我们分析了更广泛的一类神经常微分方程,其动力学函数是一般的非线性函数(时间相关或时间无关),并且关于状态变量是Lipschitz连续的。我们证明了在这个Lipschitz条件下,神经常微分方程的解具有有界变差。基于这一观察,我们建立了时间相关和时间无关两种情况下的泛化界,并研究了过参数化和域约束如何影响这些界。据我们所知,这是第一次推导具有一般非线性动力学的神经常微分方程的泛化界。
摘要:Neural ordinary differential equations (neural ODEs) are a popular type of deep learning model that operate with continuous-depth architectures. To assess how well such models perform on unseen data, it is crucial to understand their generalization error bounds. Previous research primarily focused on the linear case for the dynamics function in neural ODEs - Marion, P. (2023), or provided bounds for Neural Controlled ODEs that depend on the sampling interval Bleistein et al. (2023). In this work, we analyze a broader class of neural ODEs where the dynamics function is a general nonlinear function, either time dependent or time independent, and is Lipschitz continuous with respect to the state variables. We showed that under this Lipschitz condition, the solutions to neural ODEs have solutions with bounded variations. Based on this observation, we establish generalization bounds for both time-dependent and time-independent cases and investigate how overparameterization and domain constraints influence these bounds. To our knowledge, this is the first derivation of generalization bounds for neural ODEs with general nonlinear dynamics.
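摘要中"Lipschitz条件蕴含解具有有界变差"这一步可以用标准的Gronwall论证补全;下面用LaTeX给出推理梗概(编者补充的标准论证,常数细节以论文为准):

```latex
% 设 \dot{z}(t) = f(t, z(t)),f 关于状态满足 Lipschitz 条件:
\[ \| f(t,x) - f(t,y) \| \le L \, \| x - y \| . \]
% 于是 \|f(t,z)\| \le \|f(t,0)\| + L\|z\|,由 Gronwall 不等式得
\[ \| z(t) \| \le \bigl( \| z(0) \| + t \sup_{s \le t} \| f(s,0) \| \bigr) e^{L t} , \]
% 从而解在 [0,T] 上的总变差有限:
\[ \mathrm{TV}(z) = \int_0^T \| \dot{z}(t) \| \, dt
               = \int_0^T \| f(t, z(t)) \| \, dt < \infty . \]
```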
【10】Distance-informed Neural Processes
标题:距离知情的神经过程
链接:https://arxiv.org/abs/2508.18903
作者: Venkataramanan, Joachim Denzler
备注:22 pages
摘要:我们提出了距离通知神经过程(DNP),神经过程的一种新的变体,通过结合全局和距离感知的局部潜在结构来提高不确定性估计。标准神经过程(NP)通常依赖于全局潜在变量,并且难以进行不确定性校准和捕获局部数据依赖性。DNP通过引入全局潜在变量来对任务级变化进行建模,并引入局部潜在变量来捕获保持距离的潜在空间内的输入相似性,从而解决了这些限制。这是通过双Lipschitz正则化来实现的,它限制了输入关系中的失真,并鼓励保留潜在空间中的相对距离。这种建模方法使DNP能够产生更好的校准不确定性估计,并更有效地区分分布数据。实证结果表明,DNP在回归和分类任务中实现了强大的预测性能和改进的不确定性校准。
摘要:We propose the Distance-informed Neural Process (DNP), a novel variant of Neural Processes that improves uncertainty estimation by combining global and distance-aware local latent structures. Standard Neural Processes (NPs) often rely on a global latent variable and struggle with uncertainty calibration and capturing local data dependencies. DNP addresses these limitations by introducing a global latent variable to model task-level variations and a local latent variable to capture input similarity within a distance-preserving latent space. This is achieved through bi-Lipschitz regularization, which bounds distortions in input relationships and encourages the preservation of relative distances in the latent space. This modeling approach allows DNP to produce better-calibrated uncertainty estimates and more effectively distinguish in- from out-of-distribution data. Empirical results demonstrate that DNP achieves strong predictive performance and improved uncertainty calibration across regression and classification tasks.
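下面用LaTeX给出bi-Lipschitz(双Lipschitz)条件的标准定义,说明它如何同时限制压缩与拉伸(编者补充的标准定义):

```latex
% 编码器 g 的 bi-Lipschitz 条件:存在 K \ge 1,对任意输入 x_1, x_2,
\[ \frac{1}{K} \, d_X(x_1, x_2) \;\le\; d_Z\bigl( g(x_1), g(x_2) \bigr) \;\le\; K \, d_X(x_1, x_2) . \]
% 下界防止潜空间把不同输入折叠到一起,上界限制失真,
% 两者共同保证潜空间近似保持输入之间的相对距离。
```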
【11】MOCHA: Discovering Multi-Order Dynamic Causality in Temporal Point Processes
标题:MOCHA:发现时间点过程中的多阶动态因果关系
链接:https://arxiv.org/abs/2508.18873
作者:ao, Juekai Lin, Wenhao Li, Bo Jin
摘要:发现时间点过程(TPPs)中复杂的因果依赖关系是建模真实世界事件序列的关键。现有的方法通常依赖于静态或一阶因果结构,忽略了因果关系的多阶和时变性质。在本文中,我们提出了MOCHA,一个新的框架发现多阶动态因果关系的TPP。MOCHA将多阶影响表征为潜在时间演化图上的多跳因果路径。为了对这种动态进行建模,我们引入了一个具有可学习结构权重的时变有向无环图(DAG),其中强制执行无环性和稀疏性约束以确保结构有效性。我们设计了一个端到端的可区分框架,该框架联合建模因果发现和TPP动态,从而实现准确的事件预测并揭示可解释的结构。在真实世界数据集上的大量实验表明,MOCHA不仅在事件预测方面达到了最先进的性能,而且还揭示了有意义和可解释的因果结构。
摘要:Discovering complex causal dependencies in temporal point processes (TPPs) is critical for modeling real-world event sequences. Existing methods typically rely on static or first-order causal structures, overlooking the multi-order and time-varying nature of causal relationships. In this paper, we propose MOCHA, a novel framework for discovering multi-order dynamic causality in TPPs. MOCHA characterizes multi-order influences as multi-hop causal paths over a latent time-evolving graph. To model such dynamics, we introduce a time-varying directed acyclic graph (DAG) with learnable structural weights, where acyclicity and sparsity constraints are enforced to ensure structural validity. We design an end-to-end differentiable framework that jointly models causal discovery and TPP dynamics, enabling accurate event prediction and revealing interpretable structures. Extensive experiments on real-world datasets demonstrate that MOCHA not only achieves state-of-the-art performance in event prediction, but also reveals meaningful and interpretable causal structures.
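摘要提到对可学习的带权DAG施加无环性与稀疏性约束;一种常见的可微做法是NOTEARS式的迹-指数惩罚。下面给出该惩罚的Python示意(编者示例,论文实际采用的约束形式以原文为准):

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W):
    """NOTEARS 风格的可微无环性惩罚(编者示例,未必是论文所用形式)。

    对带权邻接矩阵 W,h(W) = tr(exp(W∘W)) - d;
    当且仅当 W 对应的图无环时 h(W) = 0。
    """
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

def sparsity_penalty(W, lam=0.1):
    """L1 稀疏惩罚,鼓励学到的因果图保持稀疏。"""
    return lam * np.abs(W).sum()

# 示意:总损失 = 事件预测负对数似然 + 无环性惩罚 + 稀疏惩罚
W = np.random.uniform(-0.1, 0.1, size=(5, 5))
print(acyclicity_penalty(W), sparsity_penalty(W))
```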
【12】ReflectivePrompt: Reflective evolution in autoprompting algorithms
标题:ReflectivePrompt:自动提示算法的反思性进化
链接:https://arxiv.org/abs/2508.18870
作者: Zhuravlev, Artur R. Khairullin, Ernest A. Dyagin, Alena N. Sitkina, Nikita I. Kulin
摘要:自动提示是为语言模型自动选择优化提示的过程;随着提示工程的快速发展,在大型语言模型(LLM)领域广泛研究的推动下,自动提示已经越来越流行。本文介绍了ReflectivePrompt,一种基于进化算法的新型自动提示方法,它采用反思进化方法来更精确、更全面地搜索最佳提示。ReflectivePrompt在交叉和精英变异之前利用短期和长期反思操作,以提高它们所引入修改的质量。该方法允许积累整个进化过程中获得的知识,并基于当前种群在每个时期更新这些知识。ReflectivePrompt使用开放访问的大型语言模型t-lite-instruct-0.1和gemma3-27b-it,在33个数据集上针对分类和文本生成任务进行了测试。该方法在各项指标上相对于当前最先进的方法平均取得了显著提升(例如,与EvoPrompt相比在BBH上提升28%),从而成为基于进化算法的自动提示中最有效的解决方案之一。
摘要:Autoprompting is the process of automatically selecting optimized prompts for language models, which has been gaining popularity with the rapid advancement of prompt engineering, driven by extensive research in the field of large language models (LLMs). This paper presents ReflectivePrompt - a novel autoprompting method based on evolutionary algorithms that employs a reflective evolution approach for more precise and comprehensive search of optimal prompts. ReflectivePrompt utilizes short-term and long-term reflection operations before crossover and elitist mutation to enhance the quality of the modifications they introduce. This method allows for the accumulation of knowledge obtained throughout the evolution process and updates it at each epoch based on the current population. ReflectivePrompt was tested on 33 datasets for classification and text generation tasks using open-access large language models: t-lite-instruct-0.1 and gemma3-27b-it. The method demonstrates, on average, a significant improvement (e.g., 28% on BBH compared to EvoPrompt) in metrics relative to current state-of-the-art approaches, thereby establishing itself as one of the most effective solutions in evolutionary algorithm-based autoprompting.
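下面用Python勾勒"短期/长期反思注入交叉与精英变异之前"的整体流程(编者示例;llm、eval_fn及各提示词均为假设的占位,并非论文实现):

```python
def reflective_prompt_search(llm, seed_prompts, eval_fn, epochs=10):
    """ReflectivePrompt 式进化搜索的结构示意(编者示例;
    llm 为字符串到字符串的 LLM 调用封装,eval_fn 为提示评分函数)。"""
    population = list(seed_prompts)
    long_term_notes = ""                       # 跨 epoch 积累的长期反思
    for _ in range(epochs):
        scored = sorted(population, key=eval_fn, reverse=True)
        elite, rest = scored[0], scored[1:]
        # 短期反思:分析本代种群中好/差提示的差异
        short_notes = llm(f"对比这些提示的优劣并总结规律: {scored}")
        # 长期反思:把本代经验并入跨代知识
        long_term_notes = llm(f"将新经验并入已有总结: {long_term_notes}\n{short_notes}")
        children = []
        for a, b in zip(rest[::2], rest[1::2]):
            # 反思信息在交叉之前注入,以提高所引入修改的质量
            children.append(llm(f"参考经验{long_term_notes},融合两个提示: {a} | {b}"))
        # 精英变异同样以反思为条件
        children.append(llm(f"参考经验{long_term_notes},改写精英提示: {elite}"))
        population = [elite] + children
    return max(population, key=eval_fn)
```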
【13】SWiFT: Soft-Mask Weight Fine-tuning for Bias Mitigation
标题:SWiFT:软面罩重量微调,缓解偏差
链接:https://arxiv.org/abs/2508.18826
作者:, Feng Chen, Yuyang Xue, Yuning Du, Konstantinos Vilouras, Sotirios A. Tsaftaris, Steven McDonagh
备注:Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2025:015
摘要:最近的研究表明,机器学习(ML)模型在现实世界的场景中可能会表现出偏见,给医疗保健等道德敏感领域带来重大挑战。这种偏见可能会对模型的公平性和泛化能力产生负面影响,并有进一步放大社会歧视的风险,因此需要从训练好的模型中消除偏见。现有的去偏方法通常需要访问原始训练数据并进行大量的模型再训练;它们通常还表现出模型公平性和判别性能之间的权衡。为了解决这些挑战,我们提出了软掩码权重微调(SWiFT),一个去偏框架,能够以低得多的去偏成本有效地提高公平性,同时保持判别性能。值得注意的是,SWiFT只需要一个小的外部数据集和几个时期的模型微调。SWiFT背后的想法是:首先找到模型参数对偏差和预测性能的相对但不同的贡献,然后通过两步微调过程,按每个参数贡献所定义的不同梯度流来更新该参数。在四个皮肤病学数据集和两个胸部X射线数据集上,针对三个偏倚敏感属性(性别、肤色和年龄)进行的广泛实验表明,与最先进的技术相比,SWiFT可以始终如一地减少模型偏差,同时在常见的公平性和准确性指标下实现有竞争力的甚至更高的诊断准确性。具体而言,我们展示了改进的模型泛化能力,这体现在几个分布外(OOD)数据集上的优越性能。
摘要:Recent studies have shown that Machine Learning (ML) models can exhibit bias in real-world scenarios, posing significant challenges in ethically sensitive domains such as healthcare. Such bias can negatively affect model fairness, model generalization abilities and further risks amplifying social discrimination. There is a need to remove biases from trained models. Existing debiasing approaches often necessitate access to original training data and need extensive model retraining; they also typically exhibit trade-offs between model fairness and discriminative performance. To address these challenges, we propose Soft-Mask Weight Fine-Tuning (SWiFT), a debiasing framework that efficiently improves fairness while preserving discriminative performance with much less debiasing costs. Notably, SWiFT requires only a small external dataset and only a few epochs of model fine-tuning. The idea behind SWiFT is to first find the relative, and yet distinct, contributions of model parameters to both bias and predictive performance. Then, a two-step fine-tuning process updates each parameter with different gradient flows defined by its contribution. Extensive experiments with three bias sensitive attributes (gender, skin tone, and age) across four dermatological and two chest X-ray datasets demonstrate that SWiFT can consistently reduce model bias while achieving competitive or even superior diagnostic accuracy under common fairness and accuracy metrics, compared to the state-of-the-art. Specifically, we demonstrate improved model generalization ability as evidenced by superior performance on several out-of-distribution (OOD) datasets.
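下面是"按参数贡献定义不同梯度流"这一两步微调思想的极简PyTorch示意(编者示例;bias_scores的具体估计方式摘要未给出,此处假设已有):

```python
import torch

def swift_masked_update(params, bias_scores, lr=1e-4):
    """软掩码微调单步示意(编者示例,非官方实现)。

    bias_scores: 与各参数张量同形状的得分,数值越大表示该参数
    对偏差的相对贡献越大(得分如何估计见论文;此处假设已给出)。
    核心思想:按贡献缩放梯度流,让"偏差相关"参数更新得更多,
    "性能相关"参数基本保持不变。
    """
    with torch.no_grad():
        for p, s in zip(params, bias_scores):
            if p.grad is None:
                continue
            soft_mask = torch.sigmoid(s)     # 映射到 (0,1) 的软掩码
            p -= lr * soft_mask * p.grad     # 掩码加权的梯度步
```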
【14】Governance-as-a-Service: A Multi-Agent Framework for AI System Compliance and Policy Enforcement
标题:治理即服务:人工智能系统合规和政策执行的多代理框架
链接:https://arxiv.org/abs/2508.18765
作者:vez, Suyash Gaurav, Jukka Heikkonen, Jatin Chaudhary
摘要:随着人工智能系统演变成具有自主执行、异步推理和多代理协调的分布式生态系统,缺乏可扩展、解耦的治理会带来结构性风险。现有的监督机制是被动的、脆弱的,且嵌入在代理架构中,使它们不可审计,难以在异构部署中推广。我们引入治理即服务(GaaS):一个模块化的、策略驱动的执行层,在运行时调节代理输出,而不改变模型内部,也不需要代理配合。GaaS采用声明性规则和信任因子机制,根据合规性和按严重性加权的违规情况对代理进行评分。它支持强制性、规范性和适应性的干预措施,支持渐进式执法和动态信任调节。为了评估GaaS,我们使用开源模型(LLaMA3、Qwen3、DeepSeek-R1)在内容生成和财务决策中进行了三种模拟机制:在基线中,代理在没有治理的情况下行动;在第二种中,GaaS强制执行策略;在第三种中,对抗代理探测鲁棒性。所有操作都被拦截、评估和记录以供分析。结果表明,GaaS可以可靠地阻止或重定向高风险行为,同时保持吞吐量。信任分数跟踪规则遵守情况,隔离和惩罚多代理系统中不可信的组件。通过将治理定位为类似于计算或存储的运行时服务,GaaS为可互操作的代理生态系统建立了基础设施级别的对齐。它不向代理传授伦理,而是强制执行伦理。
摘要:As AI systems evolve into distributed ecosystems with autonomous execution, asynchronous reasoning, and multi-agent coordination, the absence of scalable, decoupled governance poses a structural risk. Existing oversight mechanisms are reactive, brittle, and embedded within agent architectures, making them non-auditable and hard to generalize across heterogeneous deployments. We introduce Governance-as-a-Service (GaaS): a modular, policy-driven enforcement layer that regulates agent outputs at runtime without altering model internals or requiring agent cooperation. GaaS employs declarative rules and a Trust Factor mechanism that scores agents based on compliance and severity-weighted violations. It enables coercive, normative, and adaptive interventions, supporting graduated enforcement and dynamic trust modulation. To evaluate GaaS, we conduct three simulation regimes with open-source models (LLaMA3, Qwen3, DeepSeek-R1) across content generation and financial decision-making. In the baseline, agents act without governance; in the second, GaaS enforces policies; in the third, adversarial agents probe robustness. All actions are intercepted, evaluated, and logged for analysis. Results show that GaaS reliably blocks or redirects high-risk behaviors while preserving throughput. Trust scores track rule adherence, isolating and penalizing untrustworthy components in multi-agent systems. By positioning governance as a runtime service akin to compute or storage, GaaS establishes infrastructure-level alignment for interoperable agent ecosystems. It does not teach agents ethics; it enforces them.
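下面用Python示意"按合规性与严重性加权违规来评分"的信任因子更新(编者示例;摘要未给出具体公式,各系数均为假设):

```python
def update_trust(trust, violations, compliant_actions, decay=0.9):
    """信任因子更新的极简示意(编者示例)。

    violations: [(rule_id, severity), ...],severity 取 1(轻)到 5(重)。
    思路:按严重性加权累计违规来惩罚信任分,合规动作缓慢恢复信任,
    分数始终裁剪到 [0, 1],供渐进式执法与动态信任调节使用。
    """
    penalty = sum(sev for _, sev in violations) / 5.0
    reward = 0.05 * compliant_actions
    new_trust = decay * trust + (1 - decay) * max(0.0, 1.0 - penalty) + reward
    return min(1.0, max(0.0, new_trust))

# 用法示意:trust = update_trust(trust, [("no_pii", 4)], compliant_actions=7)
```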
【15】Stability and Generalization for Bellman Residuals
标题:Bellman剩余量的稳定性与推广
链接:https://arxiv.org/abs/2508.18741
作者:Kang, Kyoungseok Jang
摘要:离线强化学习和离线逆强化学习的目标是从一批固定的记录轨迹中恢复接近最优的价值函数或奖励模型,但目前的实践仍然难以保证Bellman一致性。贝尔曼残差最小化(BRM)已成为一个有吸引力的补救措施,因为最近发现了一种全局收敛的、基于随机梯度下降-上升(SGDA)的BRM方法。然而,它在离线环境中的统计行为在很大程度上仍未被探索。在本文中,我们弥补了这一统计差距。我们的分析引入了一个单一的Lyapunov势函数,将在相邻数据集上运行的SGDA耦合起来,并得到O(1/n)的平均参数稳定性界,将凸凹鞍点问题已知最佳样本复杂度指数提高了一倍。同样的稳定性常数可转化为BRM的O(1/n)超额风险界,无需方差缩减、额外正则化或对minibatch抽样的限制性独立性假设。这些结果对标准的神经网络参数化和minibatch SGD均成立。
摘要:Offline reinforcement learning and offline inverse reinforcement learning aim to recover near-optimal value functions or reward models from a fixed batch of logged trajectories, yet current practice still struggles to enforce Bellman consistency. Bellman residual minimization (BRM) has emerged as an attractive remedy, as a globally convergent stochastic gradient descent-ascent based method for BRM has been recently discovered. However, its statistical behavior in the offline setting remains largely unexplored. In this paper, we close this statistical gap. Our analysis introduces a single Lyapunov potential that couples SGDA runs on neighbouring datasets and yields an O(1/n) on-average argument-stability bound-doubling the best known sample-complexity exponent for convex-concave saddle problems. The same stability constant translates into the O(1/n) excess risk bound for BRM, without variance reduction, extra regularization, or restrictive independence assumptions on minibatch sampling. The results hold for standard neural-network parameterizations and minibatch SGD.
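作为参考,下面用LaTeX给出Bellman残差目标及其适合SGDA求解的标准极小极大改写(编者补充;论文的具体形式可能不同):

```latex
% Bellman 残差最小化的目标(以 Q 型残差为例):
\[ \min_\theta \; \mathbb{E}_{(s,a,r,s')} \Bigl[ \bigl( r + \gamma \max_{a'} Q_\theta(s',a') - Q_\theta(s,a) \bigr)^2 \Bigr] . \]
% 记 \delta_\theta = r + \gamma \max_{a'} Q_\theta(s',a') - Q_\theta(s,a)。
% 由于双重采样问题,常用引入辅助函数 h_\omega 的等价极小极大形式,
% 从而可以用随机梯度下降-上升(SGDA)求解:
\[ \min_\theta \max_\omega \; \mathbb{E} \Bigl[ 2 \, h_\omega(s,a) \, \delta_\theta - h_\omega(s,a)^2 \Bigr] . \]
```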
【16】ROSE: Remove Objects with Side Effects in Videos
标题:ROSE:删除视频中有副作用的物体
链接:https://arxiv.org/abs/2508.18633
作者:Miao, Yutong Feng, Jianshu Zeng, Zixiang Gao, Hantang Liu, Yunfeng Yan, Donglian Qi, Xi Chen, Bin Wang, Hengshuang Zhao
摘要:由于最近视频生成模型的成功,视频对象去除已经取得了先进的性能。然而,在解决对象的副作用时,例如,它们的阴影和反射,现有的作品努力消除这些影响,因为缺乏成对的视频数据作为监督。本文提出了ROSE,称为删除对象与副作用,一个框架,系统地研究了对象对环境的影响,这可以分为五种常见的情况下:阴影,反射,光,半透明和镜像。考虑到管理呈现上述效果的配对视频的挑战,我们利用3D渲染引擎来生成合成数据。我们精心构建了一个用于数据准备的全自动管道,它模拟了一个具有不同场景,对象,拍摄角度和相机轨迹的大规模配对数据集。ROSE是一个基于扩散Transformer的视频修复模型。为了定位所有与对象相关的区域,整个视频都被输入到模型中进行基于参考的擦除。此外,还引入了额外的监督来明确预测受副作用影响的区域,这可以通过配对视频之间的差分掩码来揭示。为了充分研究模型在各种副作用去除方面的性能,我们提出了一个新的基准测试,称为ROSE-Bench,它结合了常见的场景和五个特殊的副作用进行综合评估。实验结果表明,ROSE实现了优越的性能相比,现有的视频对象擦除模型,以及推广到现实世界的视频场景。项目页面是https://rose2025-inpaint.github.io/。
摘要:Video object removal has achieved advanced performance due to the recent success of video generative models. However, when addressing the side effects of objects, e.g., their shadows and reflections, existing works struggle to eliminate these effects for the scarcity of paired video data as supervision. This paper presents ROSE, termed Remove Objects with Side Effects, a framework that systematically studies the object's effects on environment, which can be categorized into five common cases: shadows, reflections, light, translucency and mirror. Given the challenges of curating paired videos exhibiting the aforementioned effects, we leverage a 3D rendering engine for synthetic data generation. We carefully construct a fully-automatic pipeline for data preparation, which simulates a large-scale paired dataset with diverse scenes, objects, shooting angles, and camera trajectories. ROSE is implemented as a video inpainting model built on diffusion transformer. To localize all object-correlated areas, the entire video is fed into the model for reference-based erasing. Moreover, additional supervision is introduced to explicitly predict the areas affected by side effects, which can be revealed through the differential mask between the paired videos. To fully investigate the model performance on various side effect removal, we present a new benchmark, dubbed ROSE-Bench, incorporating both common scenarios and the five special side effects for comprehensive evaluation. Experimental results demonstrate that ROSE achieves superior performance compared to existing video object erasing models and generalizes well to real-world video scenarios. The project page is https://rose2025-inpaint.github.io/.
【17】Linear Trading Position with Sparse Spectrum
标题:频谱稀疏的线性交易头寸
链接:https://arxiv.org/abs/2508.18596
作者: Lai, Haisheng Yang
备注:IJCAI2025
摘要:主投资组合(principal portfolio)方法是基于信号的交易中的一种新兴方法。然而,这些主投资组合可能不够多样化,无法探索预测矩阵的关键特征,或者对不同情况不够鲁棒。为了解决这个问题,我们提出了一种新的具有稀疏频谱的线性交易头寸,可以探索预测矩阵更大的频谱区域。我们还开发了一个Krasnosel'skiĭ-Mann不动点算法来优化这个交易头寸,该算法具有下降性质,并在目标值上实现了线性收敛速度。这是这类算法的一个新的理论结果。大量的实验表明,该方法在各种情况下都取得了良好且鲁棒的性能。
摘要:The principal portfolio approach is an emerging method in signal-based trading. However, these principal portfolios may not be diversified to explore the key features of the prediction matrix or robust to different situations. To address this problem, we propose a novel linear trading position with sparse spectrum that can explore a larger spectral region of the prediction matrix. We also develop a Krasnosel'skiĭ-Mann fixed-point algorithm to optimize this trading position, which possesses the descent property and achieves a linear convergence rate in the objective value. This is a new theoretical result for this type of algorithms. Extensive experiments show that the proposed method achieves good and robust performance in various situations.
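作为背景,下面用LaTeX给出Krasnosel'skiĭ-Mann不动点迭代的标准形式;论文的算法是该框架在此交易头寸优化问题上的变体(编者补充):

```latex
% Krasnosel'skii-Mann 迭代:T 为非扩张映射,步长 \alpha_k \in (0,1):
\[ x_{k+1} = (1 - \alpha_k) \, x_k + \alpha_k \, T(x_k) , \]
% 迭代收敛到不动点 x^* = T(x^*)。论文证明其变体在该头寸优化
% 问题上具有下降性,且目标值线性收敛。
```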
【18】Improving Long-term Autoregressive Spatiotemporal Predictions: A Proof of Concept with Fluid Dynamics
标题:改善长期自回归时空预测:流体动力学的概念证明
链接:https://arxiv.org/abs/2508.18565
作者: Sibo Cheng
摘要:数据驱动的方法正在成为传统数值预报的有效替代方案,提供快速推理和较低的计算成本。然而,对于复杂的系统,由于误差积累,长期精度往往会下降,自回归训练(尽管有效)需要大量的GPU内存,并可能牺牲短期性能。我们提出了随机推进(SPF)框架,它保留了一步前训练,同时支持多步学习。SPF从模型预测中构建补充数据集,并通过随机采集策略将其与地面实况相结合,平衡短期和长期性能,同时减少过拟合。多步预测在epoch之间预先计算,保持内存使用稳定,而不存储完整的展开序列。在Burgers方程和浅水基准上的实验表明,SPF比自回归方法具有更高的长期精度,同时降低了内存需求,使其在资源有限和复杂的模拟中具有前景。
摘要:Data-driven methods are emerging as efficient alternatives to traditional numerical forecasting, offering fast inference and lower computational cost. Yet, for complex systems, long-term accuracy often deteriorates due to error accumulation, and autoregressive training (though effective) demands large GPU memory and may sacrifice short-term performance. We propose the Stochastic PushForward (SPF) framework, which retains one-step-ahead training while enabling multi-step learning. SPF builds a supplementary dataset from model predictions and combines it with ground truth via a stochastic acquisition strategy, balancing short- and long-term performance while reducing overfitting. Multi-step predictions are precomputed between epochs, keeping memory usage stable without storing full unrolled sequences. Experiments on the Burgers' equation and the Shallow Water benchmark show that SPF achieves higher long-term accuracy than autoregressive methods while lowering memory requirements, making it promising for resource-limited and complex simulations.
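下面用Python示意SPF"在epoch之间用模型自身预测构造补充数据、再与真值随机混合"的数据构造过程(编者示例,非官方实现):

```python
import random

def spf_build_dataset(model, trajectories, horizon=3, mix_prob=0.5):
    """Stochastic PushForward 的补充数据构造示意(编者示例)。

    trajectories: 真实轨迹列表,每条为状态序列 [x_0, x_1, ..., x_T]。
    思路:在 epoch 之间预先计算模型自身的多步滚动预测,用
    (模型预测的状态, 真实下一状态) 组成补充样本,再按随机策略
    与真实一步样本混合,从而在保持一步训练的同时获得多步信号。
    """
    dataset = []
    for traj in trajectories:
        for t in range(len(traj) - 1):
            if t >= 1 and random.random() < mix_prob:
                k = random.randint(1, min(horizon, t))
                x = traj[t - k]
                for _ in range(k):          # 预先计算的 push-forward 滚动
                    x = model(x)
                dataset.append((x, traj[t + 1]))   # 输入来自模型,目标仍是真值
            else:
                dataset.append((traj[t], traj[t + 1]))  # 普通一步样本
    return dataset
```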
【19】Enhancing Chemical Explainability Through Counterfactual Masking
标题:通过反事实掩蔽增强化学可解释性
链接:https://arxiv.org/abs/2508.18561
作者:nisiów, Marek Kochańczyk, Bartosz Zieliński, Tomasz Danel
摘要:分子性质预测是指导新化合物(包括药物和材料)设计的关键任务。虽然可解释的人工智能方法旨在通过识别有影响力的分子子结构来仔细检查模型预测,但许多现有方法依赖于掩蔽策略,该策略去除原子或原子级特征以通过保真度度量来评估重要性。然而,这些方法往往不能坚持基本的分子分布,从而产生不直观的解释。在这项工作中,我们提出了反事实掩蔽,一种新的框架,取代掩蔽的子结构与化学合理的片段从生成模型训练完成分子图采样。我们不是根据不可信的归零基线评估掩蔽的预测,而是相对于从数据分布中提取的反事实分子进行评估。我们的方法提供了两个关键的好处:(1)分子现实主义支持强大的和分布一致的解释,(2)有意义的反事实,直接表明结构修饰可能会影响预测的属性。我们证明了反事实掩蔽非常适合基准模型解释器,并在多个数据集和属性预测任务中产生更多可操作的见解。我们的方法弥合了可解释性和分子设计之间的差距,为化学中可解释的机器学习提供了一条原则性和生成性的道路。
摘要:Molecular property prediction is a crucial task that guides the design of new compounds, including drugs and materials. While explainable artificial intelligence methods aim to scrutinize model predictions by identifying influential molecular substructures, many existing approaches rely on masking strategies that remove either atoms or atom-level features to assess importance via fidelity metrics. These methods, however, often fail to adhere to the underlying molecular distribution and thus yield unintuitive explanations. In this work, we propose counterfactual masking, a novel framework that replaces masked substructures with chemically reasonable fragments sampled from generative models trained to complete molecular graphs. Rather than evaluating masked predictions against implausible zeroed-out baselines, we assess them relative to counterfactual molecules drawn from the data distribution. Our method offers two key benefits: (1) molecular realism underpinning robust and distribution-consistent explanations, and (2) meaningful counterfactuals that directly indicate how structural modifications may affect predicted properties. We demonstrate that counterfactual masking is well-suited for benchmarking model explainers and yields more actionable insights across multiple datasets and property prediction tasks. Our approach bridges the gap between explainability and molecular design, offering a principled and generative path toward explainable machine learning in chemistry.
【20】Data Augmentation Improves Machine Unlearning
标题:数据增强改善机器遗忘
链接:https://arxiv.org/abs/2508.18502
作者:. C. Falcao, Filipe R. Cordeiro
备注:Paper accepted at SIBGRAPI'25
摘要:机器遗忘(MU)旨在从训练模型中去除特定数据的影响,同时保留其对剩余数据的性能。虽然已有少数工作指出记忆化与数据增强之间存在联系,但系统性增强设计在MU中的作用仍然研究不足。在这项工作中,我们研究了不同的数据增强策略对遗忘方法(包括SalUn、Random Label和Fine-Tuning)性能的影响。在CIFAR-10和CIFAR-100上、在不同遗忘率下进行的实验表明,适当的增强设计可以显著提高遗忘效果,缩小与重新训练模型的性能差距。结果显示,使用TrivialAug增强时,平均差距(Average Gap)遗忘指标最多降低40.12%。我们的研究结果表明,增强不仅有助于减少记忆化,而且在实现隐私保护和高效遗忘方面发挥着至关重要的作用。
摘要:Machine Unlearning (MU) aims to remove the influence of specific data from a trained model while preserving its performance on the remaining data. Although a few works suggest connections between memorisation and augmentation, the role of systematic augmentation design in MU remains under-investigated. In this work, we investigate the impact of different data augmentation strategies on the performance of unlearning methods, including SalUn, Random Label, and Fine-Tuning. Experiments conducted on CIFAR-10 and CIFAR-100, under varying forget rates, show that proper augmentation design can significantly improve unlearning effectiveness, reducing the performance gap to retrained models. Results showed a reduction of up to 40.12% of the Average Gap unlearning Metric, when using TrivialAug augmentation. Our results suggest that augmentation not only helps reduce memorization but also plays a crucial role in achieving privacy-preserving and efficient unlearning.
【21】DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
标题:DualSparse-MoE:通过专家分区和重建协调张量/神经元级稀疏性
链接:https://arxiv.org/abs/2508.18376
作者:i, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang
摘要:混合专家(MoE)已经成为构建大型语言模型(LLM)的主流架构,它减少了每个令牌的计算,同时支持模型扩展。它可以被视为在张量级别将大型前馈网络(FFN)划分为细粒度的子FFN(即专家),并仅为每个输入激活稀疏子集。虽然这种稀疏性提高了效率,但由于其庞大的计算规模和不可预测的激活模式,MoE仍然面临着巨大的挑战。为了实现高效的MoE部署,我们将预训练MoE模块中张量和神经元级别的双重稀疏性确定为准确性和效率的关键因素。与先前在预训练中通过细粒度专家设计增加张量级稀疏性的工作不同,我们引入训练后专家划分来诱导这种稀疏性,而无需重新训练。这保持了模型转换的数学一致性,并提高了后续微调和推理的效率和准确性。在此基础上,我们提出了DualSparse-MoE,这是一种推理系统,它集成了动态张量级计算丢弃与静态神经元级重建,以最小的准确性损失提供显著的效率提升。实验结果表明,在三个主流MoE模型上,我们的方法在约25%的丢弃率下仅使平均精度降低0.08%-0.28%,而几乎所有程度的计算丢弃都能带来成比例的计算加速。此外,将负载不平衡感知引入专家并行实现了1.41倍的MoE模块加速,平均精度仅下降0.5%。
摘要:Mixture of Experts (MoE) has become a mainstream architecture for building Large Language Models (LLMs) by reducing per-token computation while enabling model scaling. It can be viewed as partitioning a large Feed-Forward Network (FFN) at the tensor level into fine-grained sub-FFNs, or experts, and activating only a sparse subset for each input. While this sparsity improves efficiency, MoE still faces substantial challenges due to their massive computational scale and unpredictable activation patterns. To enable efficient MoE deployment, we identify dual sparsity at the tensor and neuron levels in pre-trained MoE modules as a key factor for both accuracy and efficiency. Unlike prior work that increases tensor-level sparsity through finer-grained expert design during pre-training, we introduce post-training expert partitioning to induce such sparsity without retraining. This preserves the mathematical consistency of model transformations and enhances both efficiency and accuracy in subsequent fine-tuning and inference. Building upon this, we propose DualSparse-MoE, an inference system that integrates dynamic tensor-level computation dropping with static neuron-level reconstruction to deliver significant efficiency gains with minimal accuracy loss. Experimental results show that enforcing an approximate 25% drop rate with our approach reduces average accuracy by only 0.08%-0.28% across three prevailing MoE models, while nearly all degrees of computation dropping consistently yield proportional computational speedups. Furthermore, incorporating load-imbalance awareness into expert parallelism achieves a 1.41x MoE module speedup with just 0.5% average accuracy degradation.
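训练后专家划分之所以能保持"数学一致性",是因为两层FFN可以按隐藏神经元精确分解为若干子专家之和;下面的numpy示意验证了这一恒等式(编者示例,非论文代码):

```python
import numpy as np

def ffn(x, W1, W2):
    """两层 FFN: W2 @ relu(W1 @ x)。"""
    return W2 @ np.maximum(W1 @ x, 0.0)

# 把隐藏神经元按行/列切成若干子专家后,全部激活时输出与原 FFN 完全相同
d, h, n_experts = 8, 32, 4
rng = np.random.default_rng(0)
x = rng.normal(size=d)
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(d, h))

splits = np.array_split(np.arange(h), n_experts)   # 神经元级划分
partitioned = sum(
    W2[:, idx] @ np.maximum(W1[idx, :] @ x, 0.0)   # 每个子专家独立计算
    for idx in splits
)
assert np.allclose(ffn(x, W1, W2), partitioned)    # 与原 FFN 输出一致
# 推理时只需按路由/重要性激活部分子专家,即得到张量级稀疏。
```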
【22】Does Calibration Affect Human Actions?
标题:校准是否影响人类行为?
链接:https://arxiv.org/abs/2508.18317
作者:i, Amos Azaria, Chirag Gupta, Noam Hazon
摘要:校准已经被提出作为一种提高机器学习分类器可靠性和采用率的方法。我们研究了这一提议的一个特定方面:校准分类模型如何影响消费模型预测的非专家人类所做的决定?我们进行了人机交互(HCI)实验,以确定校准对(i)对模型的信任以及(ii)决策和预测之间相关性的影响。我们还基于行为经济学中Kahneman和Tversky的前景理论,提出了对报告的校准分数的进一步校正,并研究这些校正对信任和决策的影响。我们发现,校准本身是不够的;前景理论校正对于增加人类决策和模型预测之间的相关性至关重要。虽然这种增加的相关性表明对模型的信任度更高,但对"你更信任模型吗?"这一问题的回答不受所用方法的影响。
摘要:Calibration has been proposed as a way to enhance the reliability and adoption of machine learning classifiers. We study a particular aspect of this proposal: how does calibrating a classification model affect the decisions made by non-expert humans consuming the model's predictions? We perform a Human-Computer-Interaction (HCI) experiment to ascertain the effect of calibration on (i) trust in the model, and (ii) the correlation between decisions and predictions. We also propose further corrections to the reported calibrated scores based on Kahneman and Tversky's prospect theory from behavioral economics, and study the effect of these corrections on trust and decision-making. We find that calibration is not sufficient on its own; the prospect theory correction is crucial for increasing the correlation between human decisions and the model's predictions. While this increased correlation suggests higher trust in the model, responses to "Do you trust the model more?" are unaffected by the method used.
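摘要所依据的前景理论修正通常建立在Tversky-Kahneman概率权重函数之上;下面用LaTeX给出该函数的标准形式作为参考(编者补充;论文的具体修正方式以原文为准):

```latex
% Tversky-Kahneman (1992) 概率权重函数:
\[ w(p) = \frac{p^{\gamma}}{\bigl( p^{\gamma} + (1-p)^{\gamma} \bigr)^{1/\gamma}} , \qquad 0 < \gamma \le 1 . \]
% 它高估小概率、低估大概率;对校准分数施加此类权重的逆变换,
% 是抵消人在消费概率时这种系统性扭曲的一种可能做法。
```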
【23】What Matters in Data for DPO?
标题:数据对DPO有什么重要意义?
链接:https://arxiv.org/abs/2508.18312
作者:hongze Cai, Guanting Chen, Huaiyang Zhong, Chonghuan Wang
摘要:直接偏好优化(DPO)已经成为一种简单而有效的方法,用于将大型语言模型(LLM)与人类偏好对齐,绕过了对学习奖励模型的需求。尽管它被越来越多地采用,但一个基本问题仍然悬而未决:偏好数据的哪些特征对DPO性能最关键?在这项工作中,我们从理论和实证两个角度系统地研究了偏好数据分布如何影响DPO。我们发现,所选响应的质量在优化DPO目标中起着主导作用,而被拒绝响应的质量可能具有相对有限的影响。我们的理论分析刻画了DPO下的最优响应分布,并揭示了响应之间的对比度主要如何通过改善所选样本起作用。我们进一步研究了在线DPO设置,并表明它有效地简化为对所选响应的监督微调。在不同任务中进行的大量实验证实了我们的发现:无论被拒绝响应的质量如何,提高所选响应的质量都会持续提升性能。我们还研究了混入同策略(on-policy)数据的好处。我们的研究结果解释了一些广泛采用的策略背后的机制,并为构建用于LLM对齐的高影响力偏好数据集提供了实用的见解。
摘要:Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning large language models (LLMs) with human preferences, bypassing the need for a learned reward model. Despite its growing adoption, a fundamental question remains open: what characteristics of preference data are most critical for DPO performance? In this work, we provide a systematic study of how preference data distribution influences DPO, from both theoretical and empirical perspectives. We show that the quality of chosen responses plays a dominant role in optimizing the DPO objective, while the quality of rejected responses may have relatively limited impact. Our theoretical analysis characterizes the optimal response distribution under DPO and reveals how contrastiveness between responses helps primarily by improving the chosen samples. We further study an online DPO setting and show it effectively reduces to supervised fine-tuning on the chosen responses. Extensive experiments across diverse tasks confirm our findings: improving the quality of chosen responses consistently boosts performance regardless of the quality of the rejected responses. We also investigate the benefit of mixing the on-policy data. Our results interpret the mechanism behind some widely adopted strategies and offer practical insights for constructing high-impact preference datasets for LLM alignment.
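作为参考,下面用PyTorch给出标准DPO损失的极简实现;公式本身是DPO原文所定义的,实现细节为编者示意:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """标准 DPO 损失(编者示意实现)。

    各参数为对应回复在策略/参考模型下的序列对数概率,形状 (B,)。
    L = -log sigmoid( beta * [ (logpi_w - logpi_ref_w)
                             - (logpi_l - logpi_ref_l) ] )
    """
    chosen_ratio = logp_chosen - ref_logp_chosen      # 所选回复的隐式奖励
    rejected_ratio = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# 用法示意:loss = dpo_loss(lp_w, lp_l, ref_lp_w, ref_lp_l); loss.backward()
```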
【24】Technology-assisted Personalized Yoga for Better Health - Challenges and Outlook
标题:技术辅助个性化瑜伽改善健康-挑战与展望
链接:https://arxiv.org/abs/2508.18283
作者:ar, Himanshu Sahu, Hari Prabhat Gupta, Biplav Srivastava
备注:10 Pages, 11 figures, 2 tables
摘要:瑜伽是一种根植于古印度传统的身体姿势、呼吸技巧和冥想练习的学科,现在全世界都在接受它,以促进整体健康和内在平衡。这些练习构成了一大类"项目"(items,我们用该术语指身体姿势或呼吸练习等可执行动作),用于增进个人福祉。然而,为了让瑜伽的好处贴合一个人的独特需求,一个人需要(a)从庞大且看似复杂、相互依赖的集合中发现适合自己的子集,(b)带着兴趣持续练习,并根据自身不断变化的能力和近期目标进行调整,以及(c)在适当的时候,根据不断变化的环境和个人健康状况改用替代项目。在这篇愿景论文中,我们描述了瑜伽个性化问题的挑战。接下来,我们勾勒出一个初步的方法,并利用这一经验,从多学科计算的角度,对使用现有和新技术解决这一具有挑战性的问题给出展望。据我们所知,这是第一篇全面研究瑜伽个性化决策支持问题的论文,范围从姿势感知到完整方案的纠正建议,并以Surya Namaskar(一组12个编排姿势)的案例研究为例进行说明。
摘要:Yoga is a discipline of physical postures, breathing techniques, and meditative practices rooted in ancient Indian traditions, now embraced worldwide for promoting overall well-being and inner balance. The practices are a large set of items, our term for executable actions like physical poses or breath exercises, to offer for a person's well-being. However, to get benefits of Yoga tailored to a person's unique needs, a person needs to (a) discover their subset from the large and seemingly complex set with inter-dependencies, (b) continue to follow them with interest adjusted to their changing abilities and near-term objectives, and (c) as appropriate, adapt to alternative items based on changing environment and the person's health conditions. In this vision paper, we describe the challenges for the Yoga personalization problem. Next, we sketch a preliminary approach and use the experience to provide an outlook on solving the challenging problem using existing and novel techniques from a multidisciplinary computing perspective. To the best of our knowledge, this is the first paper that comprehensively examines decision support issues around Yoga personalization, from pose sensing to recommendation of corrections for a complete regimen, and illustrates with a case study of Surya Namaskar -- a set of 12 choreographed poses.
【25】Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
标题:以与最近对一样快的速度近似高维地球移动者距离
链接:https://arxiv.org/abs/2508.06774
作者:eretta, Vincent Cohen-Addad, Rajesh Jayaram, Erik Waingarten
备注:FOCS 2025
摘要:我们给出了从$(1+\varepsilon)$-近似地球移动者距离(EMD)到$(1+\varepsilon)$-近似最近对(CP)的归约。因此,我们改进了已知最快的高维EMD近似算法。这里,给定$p\in [1,2]$和两组各含$n$个点的点集$X,Y \subseteq(\mathbb R^d,\ell_p)$,它们的EMD是$X$和$Y$之间完美匹配的最小代价,其中匹配两个向量的代价是它们的$\ell_p$距离。此外,CP是找到实现$\min_{x \in X, y\in Y} \|x-y\|_p$的一对点的基本问题。我们的贡献是双重的:我们表明,如果$(1+\varepsilon)$-近似CP可以在时间$n^{2-\phi}$内计算,则EMD的$1+O(\varepsilon)$近似可以在时间$n^{2-\Omega(\phi)}$内计算;代入已知最快的CP算法[Alman, Chan, Williams FOCS'16],我们得到一个运行时间为$n^{2-\tilde{\Omega}(\varepsilon^{1/3})}$的高维点集上EMD的$(1+\varepsilon)$-近似算法,这比之前最快的运行时间$n^{2-\Omega(\varepsilon^2)}$ [Andoni, Zhang FOCS'23]有所改进。我们的主要技术贡献是EMD的乘性权重更新框架的次线性实现。具体来说,我们证明了无需显式计算或存储权重即可执行更新;相反,我们利用底层的几何结构来隐式执行更新。
摘要:We give a reduction from $(1+\varepsilon)$-approximate Earth Mover's Distance (EMD) to $(1+\varepsilon)$-approximate Closest Pair (CP). As a consequence, we improve the fastest known approximation algorithm for high-dimensional EMD. Here, given $p\in [1, 2]$ and two sets of $n$ points $X,Y \subseteq (\mathbb R^d,\ell_p)$, their EMD is the minimum cost of a perfect matching between $X$ and $Y$, where the cost of matching two vectors is their $\ell_p$ distance. Further, CP is the basic problem of finding a pair of points realizing $\min_{x \in X, y\in Y} ||x-y||_p$. Our contribution is twofold: we show that if a $(1+\varepsilon)$-approximate CP can be computed in time $n^{2-\phi}$, then a $1+O(\varepsilon)$ approximation to EMD can be computed in time $n^{2-\Omega(\phi)}$; plugging in the fastest known algorithm for CP [Alman, Chan, Williams FOCS'16], we obtain a $(1+\varepsilon)$-approximation algorithm for EMD running in time $n^{2-\tilde{\Omega}(\varepsilon^{1/3})}$ for high-dimensional point sets, which improves over the prior fastest running time of $n^{2-\Omega(\varepsilon^2)}$ [Andoni, Zhang FOCS'23]. Our main technical contribution is a sublinear implementation of the Multiplicative Weights Update framework for EMD. Specifically, we demonstrate that the updates can be executed without ever explicitly computing or storing the weights; instead, we exploit the underlying geometric structure to perform the updates implicitly.
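为便于阅读,下面用LaTeX把摘要中两个问题的定义及归约结论整理成公式(内容均来自摘要本身):

```latex
% 两个问题的定义(X = \{x_1,\dots,x_n\}, Y = \{y_1,\dots,y_n\}):
\[ \mathrm{EMD}(X, Y) = \min_{\pi \in S_n} \sum_{i=1}^{n} \| x_i - y_{\pi(i)} \|_p ,
   \qquad
   \mathrm{CP}(X, Y) = \min_{x \in X, \, y \in Y} \| x - y \|_p . \]
% 归约结论:时间 n^{2-\phi} 的 (1+\varepsilon)-近似 CP 蕴含
% 时间 n^{2-\Omega(\phi)} 的 (1+O(\varepsilon))-近似 EMD。
```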
【26】Echoes of the past: A unified perspective on fading memory and echo states
标题:过去的回声:衰减记忆与回声状态的统一视角
链接:https://arxiv.org/abs/2508.19145
作者:o Ortega, Florian Rossmannek
摘要:递归神经网络(RNN)在涉及时间序列和时态数据的信息处理任务中越来越受欢迎。RNN的一个基本特性是它们能够产生可靠的输入/输出响应,这通常与网络如何处理其所处理信息的记忆有关。人们提出了各种概念来刻画RNN中的记忆行为,包括稳态、回声状态、状态遗忘、输入遗忘和衰减记忆。虽然这些概念经常互换使用,但它们之间的确切关系仍然不清楚。这项工作的目的是用共同的语言统一这些概念,推导它们之间新的蕴含和等价关系,并为一些现有结果提供替代证明。通过澄清这些概念之间的关系,本研究有助于更深入地了解RNN及其时间信息处理能力。
摘要:Recurrent neural networks (RNNs) have become increasingly popular in information processing tasks involving time series and temporal data. A fundamental property of RNNs is their ability to create reliable input/output responses, often linked to how the network handles its memory of the information it processed. Various notions have been proposed to conceptualize the behavior of memory in RNNs, including steady states, echo states, state forgetting, input forgetting, and fading memory. Although these notions are often used interchangeably, their precise relationships remain unclear. This work aims to unify these notions in a common language, derive new implications and equivalences between them, and provide alternative proofs to some existing results. By clarifying the relationships between these concepts, this research contributes to a deeper understanding of RNNs and their temporal information processing capabilities.
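作为参照,下面用LaTeX给出回声状态性质的一种标准表述(教科书式定义,属编者补充;论文所统一各概念的精确定义以原文为准):

```latex
% 回声状态性质:状态更新为 x_t = F(x_{t-1}, u_t)。
% 若对任意输入序列 (u_t) 和任意两个初始状态 x_0, x_0',
\[ \lim_{t \to \infty} \bigl\| x_t - x_t' \bigr\| = 0 , \]
% 即状态最终只由输入历史决定,则称系统具有回声状态性质;
% 衰减记忆则要求输出对足够久远的输入差异渐近不敏感。
```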
【27】Universal Dynamics with Globally Controlled Analog Quantum Simulators
标题:具有全球受控模拟量子模拟器的通用动力学
链接:https://arxiv.org/abs/2508.19075
作者:u, Abigail McClain Gomez, Liyuan Chen, Aaron Trowbridge, Andy J. Goldschmidt, Zachary Manchester, Frederic T. Chong, Arthur Jaffe, Susanne F. Yelin
备注:12 pages, 5 figures
摘要:具有全局控制场的模拟量子模拟器已经成为探索复杂量子现象的强大平台。最近的突破,例如对数千个原子的相干控制,突出了量子应用在规模上不断增长的潜力。尽管取得了这些进展,一个基本的理论问题仍然没有解决:在全球控制下,这样的系统在多大程度上可以实现普遍的量子动力学?在这里,我们建立了一个通用的量子计算的充分必要条件,仅使用全局脉冲控制,证明了一个广泛的类模拟量子模拟器,事实上,通用的。我们进一步扩展这一框架的费米子和玻色子系统,包括现代平台,如超冷原子在光学超晶格。最重要的是,为了将理论可能性与实验现实联系起来,我们在实验中引入了一种新的控制技术--直接量子最优控制。这种方法使复杂的有效哈密顿的合成,并允许我们将现实的硬件约束。为了展示其实用的力量,我们实验工程师的封锁制度以外的三体相互作用,并展示了里德伯原子阵列上的拓扑动力学。使用新的控制框架,我们克服了关键的实验挑战,包括硬件限制和原子位置波动的非封锁制度,通过识别平滑,短持续时间的脉冲,实现高保真度的动态。实验测量揭示了受保护的拓扑边缘模式的动力学签名,证实了我们的方法的表现力和可行性。我们的工作为量子模拟开辟了一条超越原生硬件哈密顿的新途径,使有效的多体相互作用工程成为可能,并通过全局控制的模拟平台推进量子信息处理的前沿。
摘要:Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Recent breakthroughs, such as the coherent control of thousands of atoms, highlight the growing potential for quantum applications at scale. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Crucially, to connect the theoretical possibility with experimental reality, we introduce a new control technique into the experiment - direct quantum optimal control. This method enables the synthesis of complex effective Hamiltonians and allows us to incorporate realistic hardware constraints. To show its practical power, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg atom array. Using the new control framework, we overcome key experimental challenges, including hardware limitations and atom position fluctuations in the non-blockade regime, by identifying smooth, short-duration pulses that achieve high-fidelity dynamics. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our approach. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.
【28】Sparse minimum Redundancy Maximum Relevance for feature selection
标题:稀疏最小冗余最大相关性特征选择
链接:https://arxiv.org/abs/2508.18901
作者:lor, Benjamin Poignard, Héctor Climente-González, Makoto Yamada
摘要:我们提出了一种同时整合特征-特征和特征-目标关系的特征筛选方法。不活跃特征通过带惩罚的最小冗余最大相关性(mRMR)过程来识别:该过程是经典mRMR的连续版本,施加了非凸正则化,其中被估计为零系数的参数对应不活跃特征的集合。我们建立了零系数被正确识别的条件,以保证准确恢复不活跃特征。我们引入了一个基于knockoff过滤器的多阶段过程,使带惩罚的mRMR能够在控制错误发现率(FDR)的同时丢弃不活跃特征。我们的方法性能与HSIC-LASSO相当,但在所选特征数量上更为保守。它只需要设置FDR阈值,而不需要指定要保留的特征数量。通过仿真和真实数据集说明了该方法的有效性。复现此工作的代码可在以下GitHub上获得:https://github.com/PeterJackNaylor/SmRMR。
摘要:We propose a feature screening method that integrates both feature-feature and feature-target relationships. Inactive features are identified via a penalized minimum Redundancy Maximum Relevance (mRMR) procedure, which is the continuous version of the classic mRMR penalized by a non-convex regularizer, and where the parameters estimated as zero coefficients represent the set of inactive features. We establish the conditions under which zero coefficients are correctly identified to guarantee accurate recovery of inactive features. We introduce a multi-stage procedure based on the knockoff filter enabling the penalized mRMR to discard inactive features while controlling the false discovery rate (FDR). Our method performs comparably to HSIC-LASSO but is more conservative in the number of selected features. It only requires setting an FDR threshold, rather than specifying the number of features to retain. The effectiveness of the method is illustrated through simulations and real-world datasets. The code to reproduce this work is available on the following GitHub: https://github.com/PeterJackNaylor/SmRMR.
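作为背景,下面用LaTeX给出经典mRMR准则的标准形式;论文研究的是其施加非凸正则惩罚的连续松弛版本(编者补充):

```latex
% 经典 mRMR 准则(Peng et al., 2005):在特征集合 S 上最大化
\[ \frac{1}{|S|} \sum_{i \in S} I(f_i ; y)
   \;-\;
   \frac{1}{|S|^2} \sum_{i, j \in S} I(f_i ; f_j) , \]
% 第一项为与目标 y 的相关性,第二项为特征间冗余性,
% 其中 I(\cdot;\cdot) 为互信息。
```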
【29】Temperature-Aware Recurrent Neural Operator for Temperature-Dependent Anisotropic Plasticity in HCP Materials
标题:HCP材料中温度相关各向异性塑性的温度感知循环神经算子
链接:https://arxiv.org/abs/2508.18806
作者:ollenweger, Dennis M. Kochman, Burigede Liu
摘要:Neural network surrogate models for constitutive laws in computational mechanics have been in use for some time. In plasticity, these models often rely on gated recurrent units (GRUs) or long short-term memory (LSTM) cells, which excel at capturing path-dependent phenomena. However, they suffer from long training times and time-resolution-dependent predictions that extrapolate poorly. Moreover, most existing surrogates for macro- or mesoscopic plasticity handle only relatively simple material behavior. To overcome these limitations, we introduce the Temperature-Aware Recurrent Neural Operator (TRNO), a time-resolution-independent neural architecture. We apply the TRNO to model the temperature-dependent plastic response of polycrystalline magnesium, which shows strong plastic anisotropy and thermal sensitivity. The TRNO achieves high predictive accuracy and generalizes effectively across diverse loading cases, temperatures, and time resolutions. It also outperforms conventional GRU and LSTM models in training efficiency and predictive performance. Finally, we demonstrate multiscale simulations with the TRNO, yielding a speedup of at least three orders of magnitude over traditional constitutive models.
【30】Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
标题:上下文组合半强盗的高效双世界最佳算法
链接:https://arxiv.org/abs/2508.18768
作者:Li, Philipp Schneider, Jelisaveta Aleksić, Daniel Kuhn
摘要:We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
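作为参考,下面用LaTeX给出带香农熵正则化的FTRL更新的标准形式,并注明论文的加速要点(编者补充;具体细节以原文为准):

```latex
% 带香农熵正则化的 FTRL 更新(标准形式):
\[ x_{t+1} = \arg\min_{x \in \mathcal{X}} \;
   \Bigl\langle x , \sum_{s=1}^{t} \hat{\ell}_s \Bigr\rangle
   + \frac{1}{\eta_t} \sum_{i} x_i \ln x_i , \]
% 其中 \hat{\ell}_s 为损失估计,\eta_t 为学习率,\mathcal{X} 为组合
% 动作集的凸包。论文利用 KKT 条件把每轮的 K 维凸投影化为
% 一维求根问题,从而大幅加速每轮计算。
```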
【31】Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
标题:重新审视强盗问题中具有无界扰动的跟随扰动领导者(FTPL)方法
链接:https://arxiv.org/abs/2508.18604
作者: Lee, Junya Honda, Shinji Ito, Min-hwan Oh
备注:Preprint
摘要:Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers, whereas analogous results for Follow-the-Perturbed-Leader (FTPL) remain limited due to inherent analytical challenges. To advance the analytical foundations of FTPL, we revisit classical FTRL-FTPL duality for unbounded perturbations and establish BOBW results for FTPL under a broad family of asymmetric unbounded Fr\'echet-type perturbations, including hybrid perturbations combining Gumbel-type and Fr\'echet-type tails. These results not only extend the BOBW results of FTPL but also offer new insights into designing alternative FTPL policies competitive with hybrid regularization approaches. Motivated by earlier observations in two-armed bandits, we further investigate the connection between the $1/2$-Tsallis entropy and a Fr\'echet-type perturbation. Our numerical observations suggest that it corresponds to a symmetric Fr\'echet-type perturbation, and based on this, we establish the first BOBW guarantee for symmetric unbounded perturbations in the two-armed setting. In contrast, in general multi-armed bandits, we find an instance in which symmetric Fr\'echet-type perturbations violate the key condition for standard BOBW analysis, which is a problem not observed with asymmetric or nonnegative Fr\'echet-type perturbations. Although this example does not rule out alternative analyses achieving BOBW results, it suggests the limitations of directly applying the relationship observed in two-armed cases to the general case and thus emphasizes the need for further investigation to fully understand the behavior of FTPL in broader settings.
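作为背景,下面用LaTeX给出FTPL的标准动作选择规则与Fréchet型扰动分布(编者补充的标准形式):

```latex
% FTPL 的标准动作选择规则与 Fréchet 型扰动:
\[ A_t = \arg\min_{i} \bigl( \hat{L}_{t-1, i} - \eta_t \, Z_i \bigr) ,
   \qquad
   \Pr( Z_i \le z ) = \exp\bigl( - z^{-\alpha} \bigr) , \; z > 0 , \]
% 其中 \hat{L}_{t-1,i} 为第 i 个臂的累计损失估计,Z_i 为独立扰动,
% \alpha 为 Fréchet 尾指数。论文研究的正是此类无界扰动
% (含非对称及 Gumbel/Fréchet 混合尾)下的 BOBW 保证。
```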
【32】An Analytical Approach to Privacy and Performance Trade-Offs in Healthcare Data Sharing
标题:医疗保健数据共享中隐私和绩效权衡的分析方法
链接:https://arxiv.org/abs/2508.18513
作者: Hande Y. Benson, Muge Capan
摘要:The secondary use of healthcare data is vital for research and clinical innovation, but it raises concerns about patient privacy. This study investigates how to balance privacy preservation and data utility in healthcare data sharing, considering the perspectives of both data providers and data users. Using a dataset of adult patients hospitalized between 2013 and 2015, we predict whether sepsis was present at admission or developed during the hospital stay. We identify sub-populations, such as older adults, frequently hospitalized patients, and racial minorities, that are especially vulnerable to privacy attacks due to their unique combinations of demographic and healthcare utilization attributes. These groups are also critical for machine learning (ML) model performance. We evaluate three anonymization methods-$k$-anonymity, the technique by Zheng et al., and the MO-OBAM model-based on their ability to reduce re-identification risk while maintaining ML utility. Results show that $k$-anonymity offers limited protection. The methods of Zheng et al. and MO-OBAM provide stronger privacy safeguards, with MO-OBAM yielding the best utility outcomes: only a 2% change in precision and recall compared to the original dataset. This work provides actionable insights for healthcare organizations on how to share data responsibly. It highlights the need for anonymization methods that protect vulnerable populations without sacrificing the performance of data-driven models.
【33】Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
标题:迈向非裔美国英语使用者负责任的ASR:语音技术偏见和公平的范围审查
链接:https://arxiv.org/abs/2508.18288
作者:nningham, Adinawa Adjagbodjou, Jeffrey Basoah, Jainaba Jawara, Kowe Kadoma, Aaleyah Lewis
备注:10 pages, 9 Pages (References and Appendices). The archival version has been accepted to AAAI (AIES 2025) without the extended Appendices. This extended version includes Appendices
摘要:This scoping literature review examines how fairness, bias, and equity are conceptualized and operationalized in Automatic Speech Recognition (ASR) and adjacent speech and language technologies (SLT) for African American English (AAE) speakers and other linguistically diverse communities. Drawing from 44 peer-reviewed publications across Human-Computer Interaction (HCI), Machine Learning/Natural Language Processing (ML/NLP), and Sociolinguistics, we identify four major areas of inquiry: (1) how researchers understand ASR-related harms; (2) inclusive data practices spanning collection, curation, annotation, and model training; (3) methodological and theoretical approaches to linguistic inclusion; and (4) emerging practices and design recommendations for more equitable systems. While technical fairness interventions are growing, our review highlights a critical gap in governance-centered approaches that foreground community agency, linguistic justice, and participatory accountability. We propose a governance-centered ASR lifecycle as an emergent interdisciplinary framework for responsible ASR development and offer implications for researchers, practitioners, and policymakers seeking to address language marginalization in speech AI systems.
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递