点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计142篇
大模型相关(20篇)
【1】FlowRL: Matching Reward Distributions for LLM Reasoning
标题:FlowRL:LLM推理的匹配奖励分布
链接:https://arxiv.org/abs/2509.15207
作者:u, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin
摘要:我们建议FlowRL:通过流平衡来匹配完整的奖励分布,而不是在大型语言模型(LLM)强化学习(RL)中最大化奖励。最近的高级推理模型采用了奖励最大化方法(例如PPO和GRPO),这些方法倾向于过度优化主要的奖励信号,而忽略不太频繁但有效的推理路径,从而降低了多样性。相比之下,我们使用可学习的分区函数将标量奖励转换为归一化的目标分布,然后最小化策略和目标分布之间的反向KL发散。我们实现这个想法作为一个流量平衡的优化方法,促进多样化的探索和概括的推理轨迹。我们对数学和代码推理任务进行了实验:FlowRL在数学基准测试中比GRPO平均提高了10.0美元,比PPO平均提高了5.1美元,并且在代码推理任务中表现得更好。这些结果强调了奖励分布匹配是LLM强化学习中有效探索和多样化推理的关键一步。
摘要:We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (\eg, PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of $10.0\%$ over GRPO and $5.1\%$ over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward distribution-matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
【2】Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
标题:不断进化的无标签语言模型:多数决定选择,新颖性促进变异
链接:https://arxiv.org/abs/2509.15194
作者:u, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu
摘要:大型语言模型(LLM)越来越多地使用可验证奖励强化学习(RLVR)进行训练,但现实世界的部署要求模型可以在没有标签或外部判断的情况下自我改进。现有的无标签方法,置信度最小化,自我一致性,或多数投票目标,稳定学习,但不断缩小探索,导致熵崩溃:代变得更短,更少的多样性,和脆弱。与之前的方法不同,例如测试时强化学习(TTRL),它主要使模型适应手头的直接未标记数据集,我们的目标更广泛:在不牺牲模型固有的探索能力和泛化能力的情况下实现一般改进,即,进化我们将这个问题正式化,并提出了面向演化的无标签强化学习(EVOL-RL),这是一个简单的规则,在无标签设置下将稳定性与变化结合起来。EVOL-RL保持多数投票的答案作为稳定的锚点(选择),同时增加一个新颖的奖励,有利于那些推理与已经产生的(变化)不同的反应,在语义空间中测量。使用GRPO实现,EVOL-RL还使用非对称限幅来保留强信号和熵正则化器来维持搜索。这种多数选择+新颖变化的设计防止了崩溃,保持了更长和更丰富的思想链,并改进了pass@1和pass@n。EVOL-RL始终优于仅多数TTRL基线;例如,对无标签AIME 24的培训将Qwen 3 - 4 B-Base AIME 25 pass@1从TTRL的4.6%提高到16.4%,pass@16从18.5%提高到37.9%。EVOL-RL不仅可以防止多样性崩溃,还可以解锁跨域的更强泛化(例如,GPQA)。此外,我们还证明了EVOL-RL还可以提高RLVR设置中的性能,突出了其广泛的适用性。
摘要:Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing label-free methods, confidence minimization, self-consistency, or majority-vote objectives, stabilize learning but steadily shrink exploration, causing an entropy collapse: generations become shorter, less diverse, and brittle. Unlike prior approaches such as Test-Time Reinforcement Learning (TTRL), which primarily adapt models to the immediate unlabeled dataset at hand, our goal is broader: to enable general improvements without sacrificing the model's inherent exploration capacity and generalization ability, i.e., evolving. We formalize this issue and propose EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL), a simple rule that couples stability with variation under a label-free setting. EVOL-RL keeps the majority-voted answer as a stable anchor (selection) while adding a novelty-aware reward that favors responses whose reasoning differs from what has already been produced (variation), measured in semantic space. Implemented with GRPO, EVOL-RL also uses asymmetric clipping to preserve strong signals and an entropy regularizer to sustain search. This majority-for-selection + novelty-for-variation design prevents collapse, maintains longer and more informative chains of thought, and improves both pass@1 and pass@n. EVOL-RL consistently outperforms the majority-only TTRL baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from TTRL's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents diversity collapse but also unlocks stronger generalization across domains (e.g., GPQA). Furthermore, we demonstrate that EVOL-RL also boosts performance in the RLVR setting, highlighting its broad applicability.
【3】Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
标题:通过卷积解码和拒绝微调快速流畅的扩散语言模型
链接:https://arxiv.org/abs/2509.15188
作者:Seo, Dongha Lee, Jaehyung Kim, Jinyoung Yeo
备注:NeurIPS 2025 spotlight
摘要:自回归(AR)语言模型一次生成一个标记的文本,这限制了它们的推理速度。基于扩散的语言模型提供了一种有前途的替代方案,因为它们可以并行解码多个令牌。然而,我们确定了当前扩散LM中的一个关键瓶颈:长解码窗口问题,其中远离输入上下文生成的令牌通常变得不相关或重复。以前的解决方案,如半自回归,通过将窗口分割成块来解决这个问题,但这牺牲了速度和双向性,消除了扩散模型的主要优势。为了克服这个问题,我们提出了卷积解码(Conv),这是一种基于规范化的方法,可以缩小解码窗口,而无需硬分割,从而带来更好的流畅性和灵活性。此外,我们还引入了拒绝基于规则的微调(R2FT),这是一种事后训练方案,可以更好地将标记与远离上下文的位置对齐。我们的方法在开放式生成基准上实现了最先进的结果(例如,AlpacaEval)在扩散LM基线之间,步长比以前的作品明显降低,证明了速度和质量的提高。
摘要:Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the long decoding-window problem, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions like semi-autoregressive address this issue by splitting windows into blocks, but this sacrifices speed and bidirectionality, eliminating the main advantage of diffusion models. To overcome this, we propose Convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, with significantly lower step size than previous works, demonstrating both speed and quality improvements.
【4】TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
标题:TDRM:LLM RL和推理的具有时间差异的平滑奖励模型
链接:https://arxiv.org/abs/2509.15110
作者:, Min Cai, Jonathan Li, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang
备注:9 figures, 7 tables
摘要:奖励模型是语言模型强化学习(RL)和推理时间验证的核心。然而,现有的奖励模型往往缺乏时间一致性,导致无效的策略更新和不稳定的RL训练。我们介绍了TDRM,这是一种通过最小化训练过程中的时间差异来学习更平滑、更可靠的奖励模型的方法。这种时间差(TD)正则化产生了平滑的回报,并提高了与长期目标的一致性。将TDRM转换为演员-评论家风格的在线RL循环会产生一致的经验收益。值得注意的是,TDRM是可验证奖励方法的补充,两者都可以串联使用。实验表明,TD训练的过程奖励模型(PRM)在最佳N(高达6.6%)和树搜索(高达23.7%)设置中提高了性能。当与带有可验证奖励的强化学习(RLVR)相结合时,TD训练的PRM会带来更高的数据效率RL-仅用2.5k数据就实现了与基线方法需要50.1k数据才能实现的性能相当的性能-并在8个模型变体(5个系列)上产生更高质量的语言模型策略,例如,Qwen2.5-(0.5B,1,5B)、GLM4-9B-0414、GLM-Z1-9B-0414、Qwen2.5-Math-(1.5B,7B)和DeepSeek-R1-Distill-Qwen-(1.5B,7B)。我们在https://github.com/THUDM/TDRM上发布所有代码。
摘要:Reward models are central to both reinforcement learning (RL) with language models and inference-time verification. However, existing reward models often lack temporal consistency, leading to ineffective policy updates and unstable RL training. We introduce TDRM, a method for learning smoother and more reliable reward models by minimizing temporal differences during training. This temporal-difference (TD) regularization produces smooth rewards and improves alignment with long-term objectives. Incorporating TDRM into the actor-critic style online RL loop yields consistent empirical gains. It is worth noting that TDRM is a supplement to verifiable reward methods, and both can be used in series. Experiments show that TD-trained process reward models (PRMs) improve performance across Best-of-N (up to 6.6%) and tree-search (up to 23.7%) settings. When combined with Reinforcement Learning with Verifiable Rewards (RLVR), TD-trained PRMs lead to more data-efficient RL -- achieving comparable performance with just 2.5k data to what baseline methods require 50.1k data to attain -- and yield higher-quality language model policies on 8 model variants (5 series), e.g., Qwen2.5-(0.5B, 1,5B), GLM4-9B-0414, GLM-Z1-9B-0414, Qwen2.5-Math-(1.5B, 7B), and DeepSeek-R1-Distill-Qwen-(1.5B, 7B). We release all code at https://github.com/THUDM/TDRM.
【5】Forecasting and Visualizing Air Quality from Sky Images with Vision-Language Models
标题:利用视觉语言模型从天空图像预测和可视化空气质量
链接:https://arxiv.org/abs/2509.15076
作者:Saleh Vahdatpour, Maryam Eyvazi, Yanqing Zhang
备注:Published at ICCVW 2025
摘要:空气污染仍然是对公众健康和环境可持续性的一个严重威胁,但传统的监测系统往往受到有限的空间覆盖范围和可及性的限制。本文提出了一种人工智能驱动的代理,它可以从天空图像中预测环境空气污染水平,并使用生成建模来合成污染场景的逼真可视化。我们的方法将统计纹理分析与监督学习相结合,用于污染分类,并利用视觉语言模型(VLM)引导的图像生成来生成可解释的空气质量状况表示。生成的视觉效果模拟不同程度的污染,为面向用户的界面提供基础,提高透明度并支持知情的环境决策。这些输出可以无缝集成到智能应用程序中,旨在增强态势感知并鼓励基于实时预测的行为反应。我们使用城市天空图像数据集验证我们的方法,并证明其有效性,在污染水平估计和语义一致的视觉合成。该系统设计进一步融入了以人为本的用户体验原则,以确保空气质量预报的可访问性、清晰度和公众参与度。为了支持可扩展和节能的部署,未来的迭代将采用基于FPGA的增量学习增强的绿色CNN架构,从而实现边缘平台上的实时推理。
摘要
:Air pollution remains a critical threat to public health and environmental sustainability, yet conventional monitoring systems are often constrained by limited spatial coverage and accessibility. This paper proposes an AI-driven agent that predicts ambient air pollution levels from sky images and synthesizes realistic visualizations of pollution scenarios using generative modeling. Our approach combines statistical texture analysis with supervised learning for pollution classification, and leverages vision-language model (VLM)-guided image generation to produce interpretable representations of air quality conditions. The generated visuals simulate varying degrees of pollution, offering a foundation for user-facing interfaces that improve transparency and support informed environmental decision-making. These outputs can be seamlessly integrated into intelligent applications aimed at enhancing situational awareness and encouraging behavioral responses based on real-time forecasts. We validate our method using a dataset of urban sky images and demonstrate its effectiveness in both pollution level estimation and semantically consistent visual synthesis. The system design further incorporates human-centered user experience principles to ensure accessibility, clarity, and public engagement in air quality forecasting. To support scalable and energy-efficient deployment, future iterations will incorporate a green CNN architecture enhanced with FPGA-based incremental learning, enabling real-time inference on edge platforms.
【6】Patent Language Model Pretraining with ModernBERT
标题:使用ModernBERT进行专利语言模型预训练
链接:https://arxiv.org/abs/2509.14926
作者:in Yousefiramandi, Ciaran Cooney
备注:7 pages, 2 figures, 4 tables
摘要:基于transformer的语言模型(如BERT)已经成为NLP的基础,但它们的性能在专利等专业领域会下降,这些领域包含长的,技术性的和合法结构的文本。以前的专利NLP方法主要依赖于微调通用模型或用有限数据预训练的适应领域的变体。在这项工作中,我们使用ModernBERT架构和超过6000万条专利记录的策展语料库,为专利预训练了3个特定于领域的掩码语言模型。我们的方法结合了架构优化,包括FlashAttention,旋转嵌入和GLU前馈层。我们评估我们的模型上的四个下游专利分类任务。我们的模型ModernBERT-base-PT在四个数据集中的三个数据集上始终优于通用ModernBERT基线,并通过基线PatentBERT实现了具有竞争力的性能。使用ModernBERT-base-VX和Mosaic-BERT-large的其他实验表明,缩放模型大小和自定义标记器进一步增强了选定任务的性能。值得注意的是,所有ModernBERT变体都保持了比PatentBERT快3倍的推理速度,这突出了它们对时间敏感的应用程序的适用性。这些结果强调了针对特定领域的预训练和架构改进对于以专利为中心的NLP任务的好处。
摘要:Transformer-based language models such as BERT have become foundational in NLP, yet their performance degrades in specialized domains like patents, which contain long, technical, and legally structured text. Prior approaches to patent NLP have primarily relied on fine-tuning general-purpose models or domain-adapted variants pretrained with limited data. In this work, we pretrain 3 domain-specific masked language models for patents, using the ModernBERT architecture and a curated corpus of over 60 million patent records. Our approach incorporates architectural optimizations, including FlashAttention, rotary embeddings, and GLU feed-forward layers. We evaluate our models on four downstream patent classification tasks. Our model, ModernBERT-base-PT, consistently outperforms the general-purpose ModernBERT baseline on three out of four datasets and achieves competitive performance with a baseline PatentBERT. Additional experiments with ModernBERT-base-VX and Mosaic-BERT-large demonstrate that scaling the model size and customizing the tokenizer further enhance performance on selected tasks. Notably, all ModernBERT variants retain substantially faster inference over - 3x that of PatentBERT - underscoring their suitability for time-sensitive applications. These results underscore the benefits of domain-specific pretraining and architectural improvements for patent-focused NLP tasks.
【7】CARGO: A Framework for Confidence-Aware Routing of Large Language Models
标题:CARGO:大型语言模型的信任感知路由框架
链接:https://arxiv.org/abs/2509.14899
作者:rak, Yosr Fourati, Michael Olchawa, Emna Ksontini, Khalil Zoghlami
备注:None
摘要:随着大型语言模型(LLM)在规模、专业化和延迟配置文件方面的激增,将用户提示路由到最合适的模型的挑战对于平衡性能和成本变得越来越重要。我们介绍了货物(基于间隙的优化的类别感知路由),一个轻量级的,信心感知的框架,用于动态LLM选择。CARGO采用基于LLM判断的成对比较训练的单个基于嵌入的回归器来预测模型性能,当预测不确定时调用可选的二元分类器。这种两阶段设计实现了精确的、成本感知的路由,而无需人工注释的监督。为了捕捉特定领域的行为,CARGO还支持在五个任务组中训练的特定类别的回归器:数学,编码,推理,总结和创意写作。在四个有竞争力的LLM(GPT-4 o,Claude 3.5 Sonnet,DeepSeek V3和Perplexity Sonar)上进行评估,CARGO实现了76.4%的前1路由准确率,并在72%至89%的范围内击败了单个专家。这些结果表明,信任引导的轻量级路由可以以最小的开销实现专家级性能,为现实世界的多模型LLM部署提供了实用的解决方案。
摘要:As large language models (LLMs) proliferate in scale, specialization, and latency profiles, the challenge of routing user prompts to the most appropriate model has become increasingly critical for balancing performance and cost. We introduce CARGO (Category-Aware Routing with Gap-based Optimization), a lightweight, confidence-aware framework for dynamic LLM selection. CARGO employs a single embedding-based regressor trained on LLM-judged pairwise comparisons to predict model performance, with an optional binary classifier invoked when predictions are uncertain. This two-stage design enables precise, cost-aware routing without the need for human-annotated supervision. To capture domain-specific behavior, CARGO also supports category-specific regressors trained across five task groups: mathematics, coding, reasoning, summarization, and creative writing. Evaluated on four competitive LLMs (GPT-4o, Claude 3.5 Sonnet, DeepSeek V3, and Perplexity Sonar), CARGO achieves a top-1 routing accuracy of 76.4% and win rates ranging from 72% to 89% against individual experts. These results demonstrate that confidence-guided, lightweight routing can achieve expert-level performance with minimal overhead, offering a practical solution for real-world, multi-model LLM deployments.
【8】LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning
标题:LEED:一个高效且可扩展的LLM授权的多智能体强化学习专家演示框架
链接:https://arxiv.org/abs/2509.14680
作者:Duan, Zongyuan Zhang, Songxiao Guo, Dong Huang, Yuanye Zhao, Zheng Lin, Zihan Fang, Dianxin Luan, Heming Cui, Yong Cui
备注:5 pages, 4 figures
摘要
:多智能体强化学习(MARL)为复杂环境中的智能决策提供了巨大的希望。然而,随着代理数量的增加,它会遇到协调和可扩展性瓶颈。为了解决这些问题,我们提出了LLM授权的多智能体强化学习(LEED)专家演示框架。LEED由两个部分组成:示范生成(DG)模块和政策优化(PO)模块。具体来说,DG模块利用大型语言模型来生成与环境交互的指令,从而生成高质量的演示。PO模块采用分散式训练范式,每个代理利用生成的演示来构建专家策略损失,然后将其与自己的策略损失相结合。这使得每个代理都能够根据专家知识和个人经验有效地个性化和优化其本地策略。实验结果表明,LEED实现了卓越的采样效率,时间效率和强大的可扩展性相比,国家的最先进的基线。
摘要:Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments. However, it suffers from a coordination and scalability bottleneck as the number of agents increases. To address these issues, we propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED). LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module. Specifically, the DG module leverages large language models to generate instructions for interacting with the environment, thereby producing high-quality demonstrations. The PO module adopts a decentralized training paradigm, where each agent utilizes the generated demonstrations to construct an expert policy loss, which is then integrated with its own policy loss. This enables each agent to effectively personalize and optimize its local policy based on both expert knowledge and individual experience. Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
【9】Reveal and Release: Iterative LLM Unlearning with Self-generated Data
标题:揭示和发布:利用自我生成的数据迭代LLM学习
链接:https://arxiv.org/abs/2509.14624
作者:, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang
备注:Accepted to EMNLP 2025 Findings
摘要:大型语言模型(LLM)的非学习已经证明了消除不需要的数据(也称为遗忘数据)的影响的有效性。现有的方法通常假设对遗忘数据集的完全访问,忽略了两个关键挑战:(1)遗忘数据通常是隐私敏感的,罕见的,或受法律管制的,使得获得昂贵或不切实际(2)可用遗忘数据的分布可能与模型中表示信息的方式不一致。为了解决这些局限性,我们提出了一种“揭示和释放”的方法来遗忘自生成的数据,在这种方法中,我们提示模型使用优化的指令来揭示它所知道的内容。为了充分利用自我生成的遗忘数据,我们提出了一个迭代学习框架,在这个框架中,我们使用在遗忘数据上训练的参数高效模块对模型的权重空间进行增量调整。实验结果表明,我们的方法平衡了遗忘质量和效用保存之间的权衡。
摘要:Large language model (LLM) unlearning has demonstrated effectiveness in removing the influence of undesirable data (also known as forget data). Existing approaches typically assume full access to the forget dataset, overlooking two key challenges: (1) Forget data is often privacy-sensitive, rare, or legally regulated, making it expensive or impractical to obtain (2) The distribution of available forget data may not align with how that information is represented within the model. To address these limitations, we propose a ``Reveal-and-Release'' method to unlearn with self-generated data, where we prompt the model to reveal what it knows using optimized instructions. To fully utilize the self-generated forget data, we propose an iterative unlearning framework, where we make incremental adjustments to the model's weight space with parameter-efficient modules trained on the forget data. Experimental results demonstrate that our method balances the tradeoff between forget quality and utility preservation.
【10】VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
标题:VisMoDAl:用于评估和改进视觉语言模型腐败稳健性的视觉分析
链接:https://arxiv.org/abs/2509.14571
作者:Wang, Wencheng Zhang, Zhiqiang Wang, Zhicong Lu, Yuxin Ma
备注:11 pages, 7 figures, 1 table, accepted to IEEE VIS 2025 (IEEE Transactions on Visualization and Computer Graphics)
摘要:视觉语言(VL)模型由于其理解多模态信息的能力而在各个关键领域显示出变革潜力。然而,它们的性能经常在分布变化下下降,因此评估和提高对实际应用中遇到的真实世界数据损坏的鲁棒性至关重要。虽然VL基准数据集和数据增强(DA)的进步有助于鲁棒性评估和改进,但由于缺乏对模型行为的深入理解以及需要专业知识和迭代努力来探索数据模式,因此仍然存在挑战。鉴于可视化在解释复杂模型和探索大规模数据方面的成就,理解各种数据损坏对VL模型的影响自然与可视化分析方法相一致。为了应对这些挑战,我们引入了VisMoDAl,这是一个可视化分析框架,旨在评估VL模型对各种腐败类型的鲁棒性,并识别表现不佳的样本,以指导有效DA策略的开发。基于文献综述和专家讨论,VisMoDAl支持多层次分析,从检查特定腐败下的性能到模型行为和相应数据切片的任务驱动检查。与传统的作品不同,VisMoDAl使用户能够推理腐败对VL模型的影响,促进模型行为理解和DA策略制定。我们的系统的效用证明,通过案例研究和定量评估,专注于腐败鲁棒性的图像字幕任务。
摘要:Vision-language (VL) models have shown transformative potential across various critical domains due to their capability to comprehend multi-modal information. However, their performance frequently degrades under distribution shifts, making it crucial to assess and improve robustness against real-world data corruption encountered in practical applications. While advancements in VL benchmark datasets and data augmentation (DA) have contributed to robustness evaluation and improvement, there remain challenges due to a lack of in-depth comprehension of model behavior as well as the need for expertise and iterative efforts to explore data patterns. Given the achievement of visualization in explaining complex models and exploring large-scale data, understanding the impact of various data corruption on VL models aligns naturally with a visual analytics approach. To address these challenges, we introduce VisMoDAl, a visual analytics framework designed to evaluate VL model robustness against various corruption types and identify underperformed samples to guide the development of effective DA strategies. Grounded in the literature review and expert discussions, VisMoDAl supports multi-level analysis, ranging from examining performance under specific corruptions to task-driven inspection of model behavior and corresponding data slice. Unlike conventional works, VisMoDAl enables users to reason about the effects of corruption on VL models, facilitating both model behavior understanding and DA strategy formulation. The utility of our system is demonstrated through case studies and quantitative evaluations focused on corruption robustness in the image captioning task.
【11】Delta Knowledge Distillation for Large Language Models
标题:大型语言模型的Delta知识提取
链接:https://arxiv.org/abs/2509.14526
作者:, Yanbin Kang, Zhengming Xing, Ruijie Jiang
备注:8 pages, 3 figures
摘要:知识蒸馏(KD)是一种广泛采用的压缩大型神经网络的方法,通过将知识从大型教师模型转移到较小的学生模型。在大型语言模型的背景下,令牌级KD通常最小化学生输出分布和教师输出分布之间的KL偏差,已显示出强大的实证性能。然而,先前的工作假设学生输出分布和教师输出分布共享相同的最优表示空间,这一前提在许多情况下可能不成立。为了解决这个问题,我们提出了Delta知识蒸馏(Delta-KD),这是令牌级KD的一种新扩展,它鼓励学生通过显式保留教师监督微调(SFT)期间引入的分布偏移Delta来近似最佳表示空间。ROUGE指标的实证结果表明,三角洲KD大幅提高学生的表现,同时保留更多的教师的知识。
摘要:Knowledge distillation (KD) is a widely adopted approach for compressing large neural networks by transferring knowledge from a large teacher model to a smaller student model. In the context of large language models, token level KD, typically minimizing the KL divergence between student output distribution and teacher output distribution, has shown strong empirical performance. However, prior work assumes student output distribution and teacher output distribution share the same optimal representation space, a premise that may not hold in many cases. To solve this problem, we propose Delta Knowledge Distillation (Delta-KD), a novel extension of token level KD that encourages the student to approximate an optimal representation space by explicitly preserving the distributional shift Delta introduced during the teacher's supervised finetuning (SFT). Empirical results on ROUGE metrics demonstrate that Delta KD substantially improves student performance while preserving more of the teacher's knowledge.
【12】BEACON: Behavioral Malware Classification with Large Language Model Embeddings and Deep Learning
标题:BEACON:采用大型语言模型嵌入和深度学习的行为恶意软件分类
链接:https://arxiv.org/abs/2509.14519
作者: Shanika Perera, Haodi Jiang
摘要:恶意软件正变得越来越复杂和广泛,这使得开发更有效和及时的检测方法变得至关重要。传统的静态分析通常无法防御使用代码混淆、多态和其他规避技术的现代威胁。相比之下,行为恶意软件检测(监视运行时活动)提供了更可靠和上下文感知的解决方案。在这项工作中,我们提出了BEACON,这是一种新型的深度学习框架,它利用大型语言模型(LLM)从原始沙箱生成的行为报告中生成密集的上下文嵌入。这些嵌入捕获每个样本的语义和结构模式,并由一维卷积神经网络(1D CNN)进行处理,用于多类恶意软件分类。在Avast-CTU公共CAPE数据集上进行评估,我们的框架始终优于现有方法,突出了基于LLM的行为嵌入的有效性和BEACON的整体设计,以实现强大的恶意软件分类。
摘要:Malware is becoming increasingly complex and widespread, making it essential to develop more effective and timely detection methods. Traditional static analysis often fails to defend against modern threats that employ code obfuscation, polymorphism, and other evasion techniques. In contrast, behavioral malware detection, which monitors runtime activities, provides a more reliable and context-aware solution. In this work, we propose BEACON, a novel deep learning framework that leverages large language models (LLMs) to generate dense, contextual embeddings from raw sandbox-generated behavior reports. These embeddings capture semantic and structural patterns of each sample and are processed by a one-dimensional convolutional neural network (1D CNN) for multi-class malware classification. Evaluated on the Avast-CTU Public CAPE Dataset, our framework consistently outperforms existing methods, highlighting the effectiveness of LLM-based behavioral embeddings and the overall design of BEACON for robust malware classification.
【13】Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
标题:用于LLM不确定性量化的语义字母表大小估计
链接:https://arxiv.org/abs/2509.14478
作者:McCabe, Rimon Melamed, Thomas Hartvigsen, H. Howie Huang
摘要:许多用于量化大型语言模型(LLM)的不确定性的黑盒技术依赖于重复的LLM采样,这在计算上是昂贵的。因此,实际应用需要从少量样本中进行可靠的估计。语义熵(SE)是一种流行的基于样本的不确定性估计与离散配方吸引力的黑箱设置。语义熵的最新扩展表现出改进的LLM幻觉检测,但这样做的可解释性较低的方法,承认额外的超参数。出于这个原因,我们重新审视了典型的离散语义熵估计,发现它低估了“真正的”语义熵,从理论上预期。我们提出了一种改进的语义字母表大小估计,并说明使用它来调整离散语义熵的样本覆盖率的结果在我们的设置更准确的语义熵估计的兴趣。此外,我们提出的字母表大小估计器标记不正确的LLM响应以及或优于最近的最佳性能方法,具有保持高度可解释性的额外好处。
摘要:Many black-box techniques for quantifying the uncertainty of large language models (LLMs) rely on repeated LLM sampling, which can be computationally expensive. Therefore, practical applicability demands reliable estimation from few samples. Semantic entropy (SE) is a popular sample-based uncertainty estimator with a discrete formulation attractive for the black-box setting. Recent extensions of semantic entropy exhibit improved LLM hallucination detection, but do so with less interpretable methods that admit additional hyperparameters. For this reason, we revisit the canonical discrete semantic entropy estimator, finding that it underestimates the "true" semantic entropy, as expected from theory. We propose a modified semantic alphabet size estimator, and illustrate that using it to adjust discrete semantic entropy for sample coverage results in more accurate semantic entropy estimation in our setting of interest. Furthermore, our proposed alphabet size estimator flags incorrect LLM responses as well or better than recent top-performing approaches, with the added benefit of remaining highly interpretable.
【14】Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs
标题:Q-ROAR:量化长上下文LLM中RoPE位置插值的离群者感知重新缩放
链接:https://arxiv.org/abs/2509.14391
作者:Sitao Huang
摘要
:扩展LLM上下文窗口对于远程任务至关重要。基于RoPE的位置插值(PI)方法(如线性和频率感知缩放)可以扩展输入长度而无需重新训练,而训练后量化(PTQ)可以实现实际部署。我们表明,PI与PTQ相结合,降低精度由于耦合效应长上下文混叠,动态范围膨胀,轴网格各向异性,和离群值移位,诱导位置相关的logit噪声。我们提供了PI加PTQ的第一个系统分析,并介绍了两个诊断:插值压力(每波段相位缩放灵敏度)和尾部膨胀率(从短到长的异常值转移)。为了解决这个问题,我们提出了Q-ROAR,这是一种感知RoPE的仅权重稳定化,将RoPE维度分组到几个频带中,并在每个频带尺度上对W_Q,W_K执行小搜索,并具有可选的对称变体以保持logit尺度。诊断引导搜索使用一个小的长上下文开发集,不需要微调,内核或架构更改。根据经验,Q-ROAR在标准任务上恢复了高达0.7%的准确率,并将GovReport的困惑减少了10%以上,同时保持了短上下文性能和与现有推理堆栈的兼容性。
摘要:Extending LLM context windows is crucial for long range tasks. RoPE-based position interpolation (PI) methods like linear and frequency-aware scaling extend input lengths without retraining, while post-training quantization (PTQ) enables practical deployment. We show that combining PI with PTQ degrades accuracy due to coupled effects long context aliasing, dynamic range dilation, axis grid anisotropy, and outlier shifting that induce position-dependent logit noise. We provide the first systematic analysis of PI plus PTQ and introduce two diagnostics: Interpolation Pressure (per-band phase scaling sensitivity) and Tail Inflation Ratios (outlier shift from short to long contexts). To address this, we propose Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into a few frequency bands and performs a small search over per-band scales for W_Q,W_K, with an optional symmetric variant to preserve logit scale. The diagnostics guided search uses a tiny long-context dev set and requires no fine-tuning, kernel, or architecture changes. Empirically, Q-ROAR recovers up to 0.7% accuracy on standard tasks and reduces GovReport perplexity by more than 10%, while preserving short-context performance and compatibility with existing inference stacks.
【15】From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
标题:从能力到性能:在渗透测试中评估LLM架构的关键功能属性
链接:https://arxiv.org/abs/2509.14289
作者:uang, Daksh Dave, Ming Jin, Tyler Cody, Peter Beling
摘要:大型语言模型(LLM)越来越多地用于自动化或增强渗透测试,但其在攻击阶段的有效性和可靠性仍不清楚。我们提出了一个全面的评估多个基于LLM的代理,从单代理到模块化设计,在现实的渗透测试方案,测量经验的性能和反复出现的故障模式。我们还通过有针对性的增强隔离了五个核心功能的影响:全局上下文记忆(GCM),代理间消息传递(IAM),上下文条件调用(CCI),自适应规划(AP)和实时监控(RTM)。这些干预措施分别支持:(i)上下文一致性和保留,(ii)组件间协调和状态管理,(iii)工具使用的准确性和选择性执行,(iv)多步骤战略规划,错误检测和恢复,以及(v)实时动态响应。我们的研究结果表明,虽然一些架构本身表现出这些属性的子集,有针对性的增强大大提高模块化代理的性能,特别是在复杂的,多步骤的,实时的渗透测试任务。
摘要:Large language models (LLMs) are increasingly used to automate or augment penetration testing, but their effectiveness and reliability across attack phases remain unclear. We present a comprehensive evaluation of multiple LLM-based agents, from single-agent to modular designs, across realistic penetration testing scenarios, measuring empirical performance and recurring failure patterns. We also isolate the impact of five core functional capabilities via targeted augmentations: Global Context Memory (GCM), Inter-Agent Messaging (IAM), Context-Conditioned Invocation (CCI), Adaptive Planning (AP), and Real-Time Monitoring (RTM). These interventions support, respectively: (i) context coherence and retention, (ii) inter-component coordination and state management, (iii) tool use accuracy and selective execution, (iv) multi-step strategic planning, error detection, and recovery, and (v) real-time dynamic responsiveness. Our results show that while some architectures natively exhibit subsets of these properties, targeted augmentations substantially improve modular agent performance, especially in complex, multi-step, and real-time penetration testing tasks.
【16】A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
标题:一种基于多Agent的LLM快速注入攻击防御流水线
链接:https://arxiv.org/abs/2509.14285
作者:Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin
摘要:提示注入攻击是大型语言模型(LLM)部署中的一个主要漏洞,其中嵌入在用户输入中的恶意指令可以覆盖系统提示并引发意外行为。本文提出了一种新型的多代理防御框架,该框架在协调的管道中使用专门的LLM代理来实时检测和中和提示注入攻击。我们评估我们的方法使用两种不同的架构:一个顺序链的代理管道和一个分层的协调器为基础的系统。我们对55个独特的提示注入攻击进行了全面评估,分为8个类别,在两个LLM平台(ChatGLM和Llama2)上共有400个攻击实例,证明了显着的安全改进。在没有防御机制的情况下,ChatGLM的基线攻击成功率(ASR)达到30%,Llama2达到20%。我们的多代理管道实现了100%的缓解,在所有测试场景中将ASR降低到0%。该框架展示了跨多个攻击类别的鲁棒性,包括直接覆盖、代码执行尝试、数据泄漏和混淆技术,同时保持系统功能以进行合法查询。
摘要:Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.
【17】Beyond Data Privacy: New Privacy Risks for Large Language Models
标题:超越数据隐私:大型语言模型的新隐私风险
链接:https://arxiv.org/abs/2509.14278
作者:, Zitao Li, Ninghui Li, Bolin Ding
摘要:大型语言模型(LLM)在自然语言理解、推理和自主决策方面取得了显著的进展。然而,这些进步也带来了严重的隐私问题。虽然大量的研究都集中在减轻模型训练各个阶段中LLM的数据隐私风险,但对部署过程中出现的新威胁关注较少。将LLM集成到广泛使用的应用程序中,以及将其自主能力武器化,造成了新的隐私漏洞。这些漏洞为LLM驱动的系统的无意数据泄漏和恶意渗透提供了机会。此外,攻击者可以利用这些系统发起复杂的大规模隐私攻击,不仅威胁个人隐私,还威胁金融安全和社会信任。在本文中,我们系统地研究这些新兴的隐私风险的LLM。我们还讨论了潜在的缓解策略,并呼吁研究界将其关注点扩大到数据隐私风险之外,开发新的防御措施,以应对日益强大的LLM和LLM驱动的系统所带来的不断变化的威胁。
摘要:Large Language Models (LLMs) have achieved remarkable progress in natural language understanding, reasoning, and autonomous decision-making. However, these advancements have also come with significant privacy concerns. While significant research has focused on mitigating the data privacy risks of LLMs during various stages of model training, less attention has been paid to new threats emerging from their deployment. The integration of LLMs into widely used applications and the weaponization of their autonomous abilities have created new privacy vulnerabilities. These vulnerabilities provide opportunities for both inadvertent data leakage and malicious exfiltration from LLM-powered systems. Additionally, adversaries can exploit these systems to launch sophisticated, large-scale privacy attacks, threatening not only individual privacy but also financial security and societal trust. In this paper, we systematically examine these emerging privacy risks of LLMs. We also discuss potential mitigation strategies and call for the research community to broaden its focus beyond data privacy risks, developing new defenses to address the evolving threats posed by increasingly powerful LLMs and LLM-powered systems.
【18】FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health
标题:FedMentor:心理健康领域异类联邦LLM的领域感知差异隐私
链接:https://arxiv.org/abs/2509.14275
作者:war, Shubhashis Roy Dipta
备注:(e.g.: 18 pages, 6 figures, 6 tables)
摘要:在敏感领域(例如,心理健康)要求平衡严格的保密性与模型实用性和安全性。我们提出了FedMentor,一个联邦微调框架,集成了低秩自适应(LoRA)和域感知差分隐私(DP),以满足每个域的隐私预算,同时保持性能。每个客户端(域)应用与其数据敏感度成比例的自定义DP噪声标度,并且服务器在效用低于阈值时自适应地降低噪声。在三个心理健康数据集的实验中,我们表明FedMentor比没有隐私的标准联合学习提高了安全性,将安全输出率提高了三个点,降低了毒性,同时将效用(BERTScore F1和ROUGE-L)保持在非私有基线的0.5%以内,接近集中式上限。该框架可扩展到单GPU客户端上具有高达1.7B参数的骨干,每轮需要< 173 MB的通信。FedMentor展示了一种实用的方法,可以私下微调LLM,以便在医疗保健和其他敏感领域进行更安全的部署。
摘要:Privacy-preserving adaptation of Large Language Models (LLMs) in sensitive domains (e.g., mental health) requires balancing strict confidentiality with model utility and safety. We propose FedMentor, a federated fine-tuning framework that integrates Low-Rank Adaptation (LoRA) and domain-aware Differential Privacy (DP) to meet per-domain privacy budgets while maintaining performance. Each client (domain) applies a custom DP noise scale proportional to its data sensitivity, and the server adaptively reduces noise when utility falls below a threshold. In experiments on three mental health datasets, we show that FedMentor improves safety over standard Federated Learning without privacy, raising safe output rates by up to three points and lowering toxicity, while maintaining utility (BERTScore F1 and ROUGE-L) within 0.5% of the non-private baseline and close to the centralized upper bound. The framework scales to backbones with up to 1.7B parameters on single-GPU clients, requiring < 173 MB of communication per round. FedMentor demonstrates a practical approach to privately fine-tune LLMs for safer deployments in healthcare and other sensitive fields.
【19】Discovering New Theorems via LLMs with In-Context Proof Learning in Lean
标题:通过LLM在精益中通过上下文证明学习发现新理论
链接:https://arxiv.org/abs/2509.14274
作者:saura, Naoto Onda, Yuta Oriike, Masaya Taniguchi, Akiyoshi Sannai, Sho Sonoda
备注:11 pages, 3 figures
摘要:大型语言模型在形式化定理证明中表现出了重要的前景。然而,以前的工作主要集中在解决现有的问题。在本文中,我们专注于LLM发现新定理的能力。我们提出了猜想-证明循环流水线,用于自动生成数学公式并以Lean 4格式证明它们。我们的方法的一个特点是,我们生成和证明进一步的命题与上下文,包括以前生成的定理及其证明,这使得生成更困难的证明,通过在上下文中学习的证明策略,而不改变参数的LLM。我们证明了我们的框架重新发现了具有验证的定理,这些定理发表在过去的数学论文中,尚未正式化。此外,这些定理中至少有一个在没有上下文学习的情况下不能被LLM证明,即使是在自然语言中,这意味着上下文学习对于神经定理证明是有效的。源代码可在https://github.com/auto-res/ConjecturingProvingLoop上获得。
摘要:Large Language Models have demonstrated significant promise in formal theorem proving. However, previous works mainly focus on solving existing problems. In this paper, we focus on the ability of LLMs to find novel theorems. We propose Conjecturing-Proving Loop pipeline for automatically generating mathematical conjectures and proving them in Lean 4 format. A feature of our approach is that we generate and prove further conjectures with context including previously generated theorems and their proofs, which enables the generation of more difficult proofs by in-context learning of proof strategies without changing parameters of LLMs. We demonstrated that our framework rediscovered theorems with verification, which were published in past mathematical papers and have not yet formalized. Moreover, at least one of these theorems could not be proved by the LLM without in-context learning, even in natural language, which means that in-context learning was effective for neural theorem proving. The source code is available at https://github.com/auto-res/ConjecturingProvingLoop.
【20】SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
标题
:SpeechWeave:用于训练文本到语音模型的多样化多语言合成文本和音频数据生成管道
链接:https://arxiv.org/abs/2509.14270
作者:, Puneet Mittal, Ranjeet Gupta, Hitesh Laxmichand Patel
备注:Accepted to ACL 2025
摘要:高质量的文本到语音(TTS)模型训练需要广泛而多样的文本和语音数据。由于领域特异性、许可和可扩展性的问题,从真实来源获取此类数据具有挑战性。大型语言模型(LLM)当然可以生成文本数据,但它们在生成过程中创建了重复的文本,提示中的变化不足。TTS训练数据的另一个重要方面是文本规范化。规范化工具可能偶尔会引入异常或忽略有价值的模式,从而影响数据质量。此外,在具有标准化语音的商业TTS系统中,依赖语音艺术家进行大规模语音记录也是不切实际的。为了应对这些挑战,我们提出了SpeechWeave,这是一种合成语音数据生成管道,能够自动生成多语言,特定于领域的数据集,用于训练TTS模型。我们的实验表明,我们的管道生成的数据比各种语言和语音指标的基线多10-48%,以及说话者标准化的语音音频,同时生成大约97%正确的规范化文本。我们的方法可以为TTS训练生成可扩展的高质量数据,提高生成数据集的多样性,规范化和语音一致性。
摘要:High-quality Text-to-Speech (TTS) model training requires extensive and diverse text and speech data. It is challenging to procure such data from real sources due to issues of domain specificity, licensing, and scalability. Large language models (LLMs) can certainly generate textual data, but they create repetitive text with insufficient variation in the prompt during the generation process. Another important aspect in TTS training data is text normalization. Tools for normalization might occasionally introduce anomalies or overlook valuable patterns, and thus impact data quality. Furthermore, it is also impractical to rely on voice artists for large scale speech recording in commercial TTS systems with standardized voices. To address these challenges, we propose SpeechWeave, a synthetic speech data generation pipeline that is capable of automating the generation of multilingual, domain-specific datasets for training TTS models. Our experiments reveal that our pipeline generates data that is 10-48% more diverse than the baseline across various linguistic and phonetic metrics, along with speaker-standardized speech audio while generating approximately 97% correctly normalized text. Our approach enables scalable, high-quality data generation for TTS training, improving diversity, normalization, and voice consistency in the generated datasets.
Graph相关(图学习|图神经网络|图优化等)(7篇)
【1】Attention Beyond Neighborhoods: Reviving Transformer for Graph Clustering
标题:邻里之外的注意力:复兴图集群的变形Transformer
链接:https://arxiv.org/abs/2509.15024
作者:Xie, Bingheng Li, Erlin Pan, Rui Hou, Wenyu Chen, Zhao Kang
备注:9 pages, 5 figures
摘要:注意力机制已经成为现代神经网络的基石,推动了各个领域的突破。然而,与图神经网络(GNN)相比,它们在图结构化数据中的应用(其中捕获拓扑连接是必不可少的)仍然没有得到充分的探索和表现不佳,特别是在图聚类任务中。GNN倾向于过分强调邻域聚合,导致节点表示的同质化。相反,Transformer倾向于过度全球化,以牺牲有意义的本地模式为代价突出显示远程节点。这种二分法提出了一个关键问题:对于无监督图学习来说,注意力本质上是多余的吗?为了解决这个问题,我们进行了全面的实证分析,揭示了GNN和Transformer在图聚类中的互补弱点。受这些见解的启发,我们提出了注意图聚类网络(AGCN)一种新的架构,重新解释了图形是注意力的概念。AGCN直接将注意力机制嵌入到图结构中,从而在保持对局部拓扑线索的敏感性的同时实现有效的全局信息提取。我们的框架结合了理论分析,将AGCN的行为与GNN和Transformer进行了对比,并引入了两个创新:(1)KV缓存机制,以提高计算效率,(2)成对边缘对比损失,以提高注意力空间的区分能力。大量的实验结果表明,AGCN优于国家的最先进的方法。
摘要:Attention mechanisms have become a cornerstone in modern neural networks, driving breakthroughs across diverse domains. However, their application to graph structured data, where capturing topological connections is essential, remains underexplored and underperforming compared to Graph Neural Networks (GNNs), particularly in the graph clustering task. GNN tends to overemphasize neighborhood aggregation, leading to a homogenization of node representations. Conversely, Transformer tends to over globalize, highlighting distant nodes at the expense of meaningful local patterns. This dichotomy raises a key question: Is attention inherently redundant for unsupervised graph learning? To address this, we conduct a comprehensive empirical analysis, uncovering the complementary weaknesses of GNN and Transformer in graph clustering. Motivated by these insights, we propose the Attentive Graph Clustering Network (AGCN) a novel architecture that reinterprets the notion that graph is attention. AGCN directly embeds the attention mechanism into the graph structure, enabling effective global information extraction while maintaining sensitivity to local topological cues. Our framework incorporates theoretical analysis to contrast AGCN behavior with GNN and Transformer and introduces two innovations: (1) a KV cache mechanism to improve computational efficiency, and (2) a pairwise margin contrastive loss to boost the discriminative capacity of the attention space. Extensive experimental results demonstrate that AGCN outperforms state-of-the-art methods.
【2】Learning Graph from Smooth Signals under Partial Observation: A Robustness Analysis
标题:部分观察下平滑信号学习图:鲁棒性分析
链接:https://arxiv.org/abs/2509.14887
作者: Nguyen, Hoi-To Wai
备注:7 pages, 3 figures
摘要:从节点信号中学习网络系统底层的图对于图信号处理和机器学习中的下游任务至关重要。信号不可观测的隐藏节点的存在可能会破坏估计的图。虽然现有的工作提出了各种鲁棒性的香草图学习目标,明确占这些隐藏节点的存在,“天真”,隐藏节点不可知的方法的鲁棒性分析仍然是underexplored。这项工作表明,香草图拓扑学习方法是隐含的低通滤波图信号的部分观测鲁棒性。我们通过将限制等距属性(RIP)扩展到图学习目标中使用的Dirichlet能量函数来实现这一理论结果。我们表明,基于平滑度的图学习公式(例如,GL-SigRep方法)可以恢复对应于所观察节点的地面真值图拓扑。合成和真实数据实验证实了我们的发现。
摘要
:Learning the graph underlying a networked system from nodal signals is crucial to downstream tasks in graph signal processing and machine learning. The presence of hidden nodes whose signals are not observable might corrupt the estimated graph. While existing works proposed various robustifications of vanilla graph learning objectives by explicitly accounting for the presence of these hidden nodes, a robustness analysis of "naive", hidden-node agnostic approaches is still underexplored. This work demonstrates that vanilla graph topology learning methods are implicitly robust to partial observations of low-pass filtered graph signals. We achieve this theoretical result through extending the restricted isometry property (RIP) to the Dirichlet energy function used in graph learning objectives. We show that smoothness-based graph learning formulation (e.g., the GL-SigRep method) on partial observations can recover the ground truth graph topology corresponding to the observed nodes. Synthetic and real data experiments corroborate our findings.
【3】Exploring the Global-to-Local Attention Scheme in Graph Transformers: An Empirical Study
标题:探索图形变形者中的全球到本地注意力方案:实证研究
链接:https://arxiv.org/abs/2509.14863
作者:Wang, Gang Wu
摘要:图Transformers(GT)在图表示学习中显示出巨大的潜力。GT的架构通常将图形神经网络(GNN)与全局注意力机制并行集成或作为注意力机制的前身,从而产生局部和全局或局部到全局的注意力方案。然而,由于全局注意力机制主要捕获节点之间的长程依赖关系,因此这些集成方案可能会遭受信息丢失,其中GNN学习的局部邻域信息可能会被注意力机制稀释。因此,我们提出了G2LFormer,它具有一种新的全局到局部注意力方案,其中浅层网络层使用注意力机制来捕获全局信息,而深层网络层使用GNN模块来学习局部结构信息,从而防止节点忽略它们的近邻。引入了一种有效的跨层信息融合策略,使局部层能够保留全局层的有益信息,减轻信息丢失,并在可扩展性方面取得了可接受的折衷。为了验证全局到局部注意力方案的可行性,我们将G2LFormer与最先进的线性GT和GNN在节点级和图形级任务上进行了比较。实验结果表明,G2LFormer算法在保持线性复杂度的同时,具有良好的性能。
摘要:Graph Transformers (GTs) show considerable potential in graph representation learning. The architecture of GTs typically integrates Graph Neural Networks (GNNs) with global attention mechanisms either in parallel or as a precursor to attention mechanisms, yielding a local-and-global or local-to-global attention scheme. However, as the global attention mechanism primarily captures long-range dependencies between nodes, these integration schemes may suffer from information loss, where the local neighborhood information learned by GNN could be diluted by the attention mechanism. Therefore, we propose G2LFormer, featuring a novel global-to-local attention scheme where the shallow network layers use attention mechanisms to capture global information, while the deeper layers employ GNN modules to learn local structural information, thereby preventing nodes from ignoring their immediate neighbors. An effective cross-layer information fusion strategy is introduced to allow local layers to retain beneficial information from global layers and alleviate information loss, with acceptable trade-offs in scalability. To validate the feasibility of the global-to-local attention scheme, we compare G2LFormer with state-of-the-art linear GTs and GNNs on node-level and graph-level tasks. The results indicate that G2LFormer exhibits excellent performance while keeping linear complexity.
【4】Precision Neural Networks: Joint Graph And Relational Learning
标题:精确神经网络:联合图和关系学习
链接:https://arxiv.org/abs/2509.14821
作者:vallo, Samuel Rey, Antonio G. Marques, Elvin Isufi
摘要:协方差神经网络(VNN)在由数据的协方差矩阵确定的图上执行卷积,这使得基于协方差的学习具有表现力和稳定性。然而,协方差矩阵通常是密集的,无法编码条件独立性,并且通常以任务不可知的方式预先计算,这可能会阻碍性能。为了克服这些限制,我们研究了精确神经网络(PNN),即,精度矩阵上的VNN--逆协方差。精度矩阵自然地编码统计独立性,通常表现出稀疏性,并保留协方差谱结构。为了使精度估计任务感知,我们制定了一个优化问题,该问题联合学习网络参数和精度矩阵,并通过交替优化来解决它,依次更新网络权重和精度估计。我们理论上限制了每次迭代时估计精度矩阵和真实精度矩阵之间的距离,并证明了联合估计与合成和真实数据的两步方法相比的有效性。
摘要:CoVariance Neural Networks (VNNs) perform convolutions on the graph determined by the covariance matrix of the data, which enables expressive and stable covariance-based learning. However, covariance matrices are typically dense, fail to encode conditional independence, and are often precomputed in a task-agnostic way, which may hinder performance. To overcome these limitations, we study Precision Neural Networks (PNNs), i.e., VNNs on the precision matrix -- the inverse covariance. The precision matrix naturally encodes statistical independence, often exhibits sparsity, and preserves the covariance spectral structure. To make precision estimation task-aware, we formulate an optimization problem that jointly learns the network parameters and the precision matrix, and solve it via alternating optimization, by sequentially updating the network weights and the precision estimate. We theoretically bound the distance between the estimated and true precision matrices at each iteration, and demonstrate the effectiveness of joint estimation compared to two-step approaches on synthetic and real-world data.
【5】One-step Multi-view Clustering With Adaptive Low-rank Anchor-graph Learning
标题:具有自适应低阶锚点图学习的一步多视图聚集
链接:https://arxiv.org/abs/2509.14724
作者:ue, Ben Yang, Xuetao Zhang, Fei Wang, Zhiping Lin
备注:13 pages, 7 figures, journal article. Accepted by IEEE Transactions on Multimedia, not yet published online
摘要
:基于锚点图的多视图聚类(AGMC)方法具有捕获结构信息、降低计算复杂度的优点,在大规模聚类问题中受到了广泛关注。然而,现有的AGMC方法仍然面临以下两个问题:1)直接将不同的锚图嵌入到一致性锚图(CAG)中,忽略了锚图中包含的冗余信息和大量噪声,导致聚类效果下降; 2)独立的后处理获取聚类指标,降低了有效性和效率。为了克服上述问题,我们提出了一种新的一步多视图聚类方法与自适应低秩锚图学习(OMCAL)。为了构造高质量的CAG,OMCAL提供了一种基于核范数的自适应CAG学习模型,以抵抗信息冗余和噪声干扰。然后,为了提高聚类的有效性和效率,我们将类别指标获取和CAG学习纳入一个统一的框架。在普通和大规模数据集上进行的大量研究表明,OMCAL在聚类效果和效率方面优于现有的最先进的方法。
摘要:In light of their capability to capture structural information while reducing computing complexity, anchor graph-based multi-view clustering (AGMC) methods have attracted considerable attention in large-scale clustering problems. Nevertheless, existing AGMC methods still face the following two issues: 1) They directly embedded diverse anchor graphs into a consensus anchor graph (CAG), and hence ignore redundant information and numerous noises contained in these anchor graphs, leading to a decrease in clustering effectiveness; 2) They drop effectiveness and efficiency due to independent post-processing to acquire clustering indicators. To overcome the aforementioned issues, we deliver a novel one-step multi-view clustering method with adaptive low-rank anchor-graph learning (OMCAL). To construct a high-quality CAG, OMCAL provides a nuclear norm-based adaptive CAG learning model against information redundancy and noise interference. Then, to boost clustering effectiveness and efficiency substantially, we incorporate category indicator acquisition and CAG learning into a unified framework. Numerous studies conducted on ordinary and large-scale datasets indicate that OMCAL outperforms existing state-of-the-art methods in terms of clustering effectiveness and efficiency.
【6】Towards Pre-trained Graph Condensation via Optimal Transport
标题:通过最佳传输实现预训练图凝聚
链接:https://arxiv.org/abs/2509.14722
作者: Shuai Zheng, Wenjun Hui, Xiangkai Zhu, Dong Chen, Zhenfeng Zhu, Yao Zhao, Kunlun He
摘要:图压缩(GC)旨在将原始图提取为小规模图,减少冗余并加速GNN训练。然而,传统的GC方法严重依赖于严格的GNN和特定于任务的监督。这种依赖性严重限制了它们在各种任务和架构中的可重用性和通用性。本文从GNN优化一致性的角度重新审视了理想GC的目标,并由此导出了广义GC优化目标,将传统GC方法看作是该优化范式的特例。在此基础上,提出了通过最佳传输的预训练图凝聚(PreGC),以超越任务和架构相关的GC方法的限制。具体而言,提出了一种混合区间图扩散增强,通过增强节点状态的不确定性来抑制凝聚图在特定架构上的弱泛化能力。同时,巧妙地建立了最优图传输计划与表示传输计划之间的匹配,以保持源图和凝聚图空间之间的语义一致性,从而使图凝聚不受任务依赖的影响.为了进一步促进凝聚图适应各种下游任务,提出了一个可追溯的语义协调器从源节点到凝聚节点,通过优化的表示传输计划在预训练中桥接语义关联。大量的实验验证了PreGC的优越性和多功能性,证明了它的任务独立性和与任意GNN的无缝兼容性。
摘要:Graph condensation (GC) aims to distill the original graph into a small-scale graph, mitigating redundancy and accelerating GNN training. However, conventional GC approaches heavily rely on rigid GNNs and task-specific supervision. Such a dependency severely restricts their reusability and generalization across various tasks and architectures. In this work, we revisit the goal of ideal GC from the perspective of GNN optimization consistency, and then a generalized GC optimization objective is derived, by which those traditional GC methods can be viewed nicely as special cases of this optimization paradigm. Based on this, Pre-trained Graph Condensation (PreGC) via optimal transport is proposed to transcend the limitations of task- and architecture-dependent GC methods. Specifically, a hybrid-interval graph diffusion augmentation is presented to suppress the weak generalization ability of the condensed graph on particular architectures by enhancing the uncertainty of node states. Meanwhile, the matching between optimal graph transport plan and representation transport plan is tactfully established to maintain semantic consistencies across source graph and condensed graph spaces, thereby freeing graph condensation from task dependencies. To further facilitate the adaptation of condensed graphs to various downstream tasks, a traceable semantic harmonizer from source nodes to condensed nodes is proposed to bridge semantic associations through the optimized representation transport plan in pre-training. Extensive experiments verify the superiority and versatility of PreGC, demonstrating its task-independent nature and seamless compatibility with arbitrary GNNs.
【7】Sampling Method for Generalized Graph Signals with Pre-selected Vertices via DC Optimization
标题:基于DC优化的具有预选点的广义图信号采样方法
链接:https://arxiv.org/abs/2509.14836
作者:amashita, Kazuki Naganuma, Shunsuke Ono
备注:Submitted to the IEEE Open Journal of Signal Processing
摘要:本文提出了一种方法,顶点灵活采样的一个广泛的类图信号,旨在达到最佳可能的恢复广义采样理论的基础上。这是通过由优化问题设计采样算子来实现的,该优化问题本质上是非凸的,因为最佳可能的恢复施加了秩约束。现有的顶点灵活采样方法能够控制活动顶点的数量,但不能结合强制或禁止顶点的先验知识。为了解决这些挑战,我们制定的运营商设计作为一个问题,处理的有效顶点的数量和先验知识的特定顶点的采样,强制包含或排除的约束。我们将这个约束问题转化为一个凸差分(DC)优化问题,通过使用核范数和DC惩罚的顶点选择。为了解决这个问题,我们开发了一个收敛求解器的基础上,一般的双邻近梯度DC算法。我们的方法的有效性证明了通过各种图形信号模型,包括现实世界的数据,通过比较现有的方法,显示出优越的性能在恢复精度的实验。
摘要:This paper proposes a method for vertex-wise flexible sampling of a broad class of graph signals, designed to attain the best possible recovery based on the generalized sampling theory. This is achieved by designing a sampling operator by an optimization problem, which is inherently non-convex, as the best possible recovery imposes a rank constraint. An existing method for vertex-wise flexible sampling is able to control the number of active vertices but cannot incorporate prior knowledge of mandatory or forbidden vertices. To address these challenges, we formulate the operator design as a problem that handles a constraint of the number of active vertices and prior knowledge on specific vertices for sampling, mandatory inclusion or exclusion. We transformed this constrained problem into a difference-of-convex (DC) optimization problem by using the nuclear norm and a DC penalty for vertex selection. To solve this, we develop a convergent solver based on the general double-proximal gradient DC algorithm. The effectiveness of our method is demonstrated through experiments on various graph signal models, including real-world data, showing superior performance in the recovery accuracy by comparing to existing methods.
Transformer(5篇)
【1】Explainable AI for Infection Prevention and Control: Modeling CPE Acquisition and Patient Outcomes in an Irish Hospital with Transformers
标题:用于感染预防和控制的可解释人工智能:在爱尔兰一家医院中使用Transformer建模CPD获取和患者结果
链接:https://arxiv.org/abs/2509.14942
作者: Pham, Tai Tan Mai, Martin Crane, Rob Brennan, Marie E. Ward, Una Geary, Declan Byrne, Brian O Connell, Colm Bergin, Donncha Creagh, Nick McDonald, Marija Bezbradica
备注:Accepted to BMC Medical Informatics and Decision Making on September 18th 2025
摘要:产碳青霉烯酶肠杆菌是医院感染预防和控制的关键问题。然而,对先前强调的CPE相关风险(如再入院、死亡率和住院时间延长(LOS))的预测建模仍然没有得到充分研究,特别是在现代深度学习方法中。本研究引入了一个可扩展的人工智能建模框架,以调查CPE对爱尔兰医院电子病历数据的患者结局的影响。我们分析了来自爱尔兰急性医院的住院患者数据集,包括诊断代码、病房转换、患者人口统计学、感染相关变量和接触网络特征。几个基于transformer的架构与传统的机器学习模型一起进行了基准测试。预测临床结果,并应用XAI技术解释模型决策。我们的框架成功地证明了基于Transformer的模型的实用性,TabTransformer在多个临床预测任务中的表现始终优于基线,特别是在CPE采集(AUROC和灵敏度)方面。我们发现感染相关特征,包括历史医院暴露、入院背景和网络中心性指标,在预测患者结局和CPE获得风险方面具有高度影响力。可解释性分析显示,“居住地区”、“入院病房”和先前入院等特征是关键的风险因素。像“沃德网页排名”这样的网络变量也排名很高,反映了结构性曝光信息的潜在价值。该研究提出了一个强大且可解释的AI框架,用于分析复杂的EMR数据,以识别关键风险因素并预测CPE相关结果。我们的研究结果强调了Transformer模型的卓越性能,并强调了多样化临床和网络功能的重要性。
摘要:Carbapenemase-Producing Enterobacteriace poses a critical concern for infection prevention and control in hospitals. However, predictive modeling of previously highlighted CPE-associated risks such as readmission, mortality, and extended length of stay (LOS) remains underexplored, particularly with modern deep learning approaches. This study introduces an eXplainable AI modeling framework to investigate CPE impact on patient outcomes from Electronic Medical Records data of an Irish hospital. We analyzed an inpatient dataset from an Irish acute hospital, incorporating diagnostic codes, ward transitions, patient demographics, infection-related variables and contact network features. Several Transformer-based architectures were benchmarked alongside traditional machine learning models. Clinical outcomes were predicted, and XAI techniques were applied to interpret model decisions. Our framework successfully demonstrated the utility of Transformer-based models, with TabTransformer consistently outperforming baselines across multiple clinical prediction tasks, especially for CPE acquisition (AUROC and sensitivity). We found infection-related features, including historical hospital exposure, admission context, and network centrality measures, to be highly influential in predicting patient outcomes and CPE acquisition risk. Explainability analyses revealed that features like "Area of Residence", "Admission Ward" and prior admissions are key risk factors. Network variables like "Ward PageRank" also ranked highly, reflecting the potential value of structural exposure information. This study presents a robust and explainable AI framework for analyzing complex EMR data to identify key risk factors and predict CPE-related outcomes. Our findings underscore the superior performance of the Transformer models and highlight the importance of diverse clinical and network features.
【2】A Comparative Analysis of Transformer Models in Social Bot Detection
标题:社交机器人检测中的Transformer模型比较分析
链接:https://arxiv.org/abs/2509.14936
作者:t, Michael Lones
备注:To appear in proceedings of UKCI 2025
摘要:社交媒体已成为当今社会的重要沟通媒介。这种认识导致许多人使用人工用户(或机器人)来误导他人相信谎言或以有益于这些人的方式行事。复杂的文本生成工具,如大型语言模型,进一步加剧了这个问题。本文旨在比较基于编码器和解码器Transformers的机器人检测模型的有效性。管道开发这些分类器的性能进行评估,揭示了基于编码器的分类器表现出更高的准确性和鲁棒性。然而,基于解码器的模型通过特定于任务的对齐显示出更大的适应性,这表明除了卓越的观察外,在不同用例中的泛化潜力更大。这些发现有助于防止数字环境被操纵,同时保护在线讨论的完整性。
摘要:Social media has become a key medium of communication in today's society. This realisation has led to many parties employing artificial users (or bots) to mislead others into believing untruths or acting in a beneficial manner to such parties. Sophisticated text generation tools, such as large language models, have further exacerbated this issue. This paper aims to compare the effectiveness of bot detection models based on encoder and decoder transformers. Pipelines are developed to evaluate the performance of these classifiers, revealing that encoder-based classifiers demonstrate greater accuracy and robustness. However, decoder-based models showed greater adaptability through task-specific alignment, suggesting more potential for generalisation across different use cases in addition to superior observa. These findings contribute to the ongoing effort to prevent digital environments being manipulated while protecting the integrity of online discussion.
【3】Leveraging Reinforcement Learning, Genetic Algorithms and Transformers for background determination in particle physics
标题:利用强化学习、遗传算法和Transformer进行粒子物理学中的背景确定
链接:https://arxiv.org/abs/2509.14894
作者: Hijano Mendizabal, Davide Lancierini, Alex Marshall, Andrea Mauri, Patrick Haworth Owen, Mitesh Patel, Konstantinos Petridis, Shah Rukh Qasim, Nicola Serra, William Sutcliffe, Hanae Tilquin
备注:32 pages, 12 figures
摘要
:美强子衰变的实验研究面临着重大的挑战,由于广泛的背景所产生的众多可能的衰变通道具有相似的最终状态。对于一个特定的信号衰减,用于确定最相关的背景过程的过程需要一个详细的分析最终状态粒子,潜在的错误识别,和运动学重叠,这是由于计算的限制,仅限于模拟最相关的背景。此外,这个过程通常依赖于物理学家的直觉和专业知识,因为不存在系统的方法。 本文有两个主要目标。首先,从粒子物理学的角度来看,我们提出了一种新的方法,利用强化学习(RL)来克服上述挑战,系统地确定影响美强子衰变测量的关键背景。虽然美强子物理学在这项工作中作为案例研究,提出的策略是广泛适用于其他类型的粒子物理测量。其次,从机器学习的角度来看,我们引入了一种新的算法,该算法利用RL和遗传算法(GAs)之间的协同作用,用于具有高度稀疏奖励和大轨迹空间的环境。该策略利用遗传算法来有效地探索轨迹空间并识别成功的轨迹,这些轨迹用于指导RL代理的训练。我们的方法还结合了一个Transformer架构的RL代理处理令牌序列表示衰变。
摘要:Experimental studies of beauty hadron decays face significant challenges due to a wide range of backgrounds arising from the numerous possible decay channels with similar final states. For a particular signal decay, the process for ascertaining the most relevant background processes necessitates a detailed analysis of final state particles, potential misidentifications, and kinematic overlaps, which, due to computational limitations, is restricted to the simulation of only the most relevant backgrounds. Moreover, this process typically relies on the physicist's intuition and expertise, as no systematic method exists. This paper has two primary goals. First, from a particle physics perspective, we present a novel approach that utilises Reinforcement Learning (RL) to overcome the aforementioned challenges by systematically determining the critical backgrounds affecting beauty hadron decay measurements. While beauty hadron physics serves as the case study in this work, the proposed strategy is broadly adaptable to other types of particle physics measurements. Second, from a Machine Learning perspective, we introduce a novel algorithm which exploits the synergy between RL and Genetic Algorithms (GAs) for environments with highly sparse rewards and a large trajectory space. This strategy leverages GAs to efficiently explore the trajectory space and identify successful trajectories, which are used to guide the RL agent's training. Our method also incorporates a transformer architecture for the RL agent to handle token sequences representing decays.
【4】DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers
标题:DyWPE:时间序列Transformer的信号感知动态子波位置编码
链接:https://arxiv.org/abs/2509.14640
作者:ni, Vangelis Metsis
摘要:Transformers中现有的位置编码方法基本上是信号不可知的,仅从序列索引导出位置信息,而忽略潜在的信号特性。这种限制对于时间序列分析尤其成问题,因为信号在多个时间尺度上表现出复杂的非平稳动态。我们介绍了动态小波位置编码(DyWPE),一种新的信号感知框架,使用离散小波变换(DWT)直接从输入时间序列生成位置嵌入。在十个不同的时间序列数据集的综合实验表明,DyWPE始终优于八个现有的最先进的位置编码方法,实现了9.1%的平均相对改善相比,基线正弦绝对位置编码在生物医学信号,同时保持竞争力的计算效率。
摘要:Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly problematic for time series analysis, where signals exhibit complex, non-stationary dynamics across multiple temporal scales. We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). Comprehensive experiments in ten diverse time series datasets demonstrate that DyWPE consistently outperforms eight existing state-of-the-art positional encoding methods, achieving average relative improvements of 9.1\% compared to baseline sinusoidal absolute position encoding in biomedical signals, while maintaining competitive computational efficiency.
【5】Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
标题:等价模型下随机变换的上下文学习的渐近性研究
链接:https://arxiv.org/abs/2509.15152
作者:ir, Zafer Dogan
备注:MLSP 2025, 6 pages 2 figures
摘要:我们研究了在非线性回归的背景下,预训练的Transformers的上下文学习(ICL)能力。具体来说,我们专注于一个随机的Transformer与非线性MLP头,其中第一层是随机初始化和固定的,而第二层是训练。此外,我们考虑一个渐近制度的上下文长度,输入尺寸,隐藏尺寸,训练任务的数量,和训练样本的数量共同增长。在这种情况下,我们表明,随机Transformer的行为相当于一个有限度的Hermite多项式模型的ICL误差。这种等价性通过不同激活函数、上下文长度、隐藏层宽度(揭示了双下降现象)和正则化设置的模拟来验证。我们的研究结果提供了理论和经验的见解,何时以及如何MLP层增强ICL,以及如何非线性和过参数化影响模型的性能。
摘要:We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime where the context length, input dimension, hidden dimension, number of training tasks, and number of training samples jointly grow. In this setting, we show that the random Transformer behaves equivalent to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance.
GAN|对抗|攻击|生成相关(5篇)
【1】Generalizable Geometric Image Caption Synthesis
标题:可推广的几何图像字幕合成
链接:https://arxiv.org/abs/2509.15217
作者:Wenyuan Wang, Rui Pan, Ruida Wang, Howard Meng, Renjie Pi, Shizhe Diao, Tong Zhang
摘要:多模态大型语言模型具有多种实际应用,需要强大的推理能力。尽管最近取得了进展,但这些模型仍然难以解决复杂的几何问题。一个关键的挑战来自于缺乏高质量的图像-文本对数据集来理解几何图像。此外,大多数基于模板的数据合成管道通常无法推广到超出其预定义模板的问题。在本文中,我们通过将具有可验证奖励的强化学习(RLVR)的补充过程引入数据生成管道来弥合这一差距。通过采用RLVR来优化由50个基本几何关系合成的几何图像的标题,并使用来自数学问题解决任务的奖励信号,我们的流水线成功地捕捉到了几何问题解决的关键特征。这使得更好的任务泛化,并产生重要的改进。此外,即使在分布外的情况下,生成的数据集也增强了多模态大型语言模型的一般推理能力,在MathVista和MathVerse的非几何输入图像的统计,算术,代数和数值任务中,准确性提高了2.8\%\text {-}4.8\%$,在艺术,设计,技术,和MMMU的工程任务。
摘要
:Multimodal large language models have various practical applications that demand strong reasoning abilities. Despite recent advancements, these models still struggle to solve complex geometric problems. A key challenge stems from the lack of high-quality image-text pair datasets for understanding geometric images. Furthermore, most template-based data synthesis pipelines typically fail to generalize to questions beyond their predefined templates. In this paper, we bridge this gap by introducing a complementary process of Reinforcement Learning with Verifiable Rewards (RLVR) into the data generation pipeline. By adopting RLVR to refine captions for geometric images synthesized from 50 basic geometric relations and using reward signals derived from mathematical problem-solving tasks, our pipeline successfully captures the key features of geometry problem-solving. This enables better task generalization and yields non-trivial improvements. Furthermore, even in out-of-distribution scenarios, the generated dataset enhances the general reasoning capabilities of multimodal large language models, yielding accuracy improvements of $2.8\%\text{-}4.8\%$ in statistics, arithmetic, algebraic, and numerical tasks with non-geometric input images of MathVista and MathVerse, along with $2.4\%\text{-}3.9\%$ improvements in Art, Design, Tech, and Engineering tasks in MMMU.
【2】Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation
标题:用于高保真RIR生成的显式上下文驱动神经声学建模
链接:https://arxiv.org/abs/2509.15210
作者:Qianyi Wu, Chaitanya Amballa, Romit Roy Choudhury
摘要:真实感声音模拟在许多应用中起着至关重要的作用。声音模拟中的一个关键要素是房间脉冲响应(RIR),它表征了声音如何在给定空间内从声源传播到听众。最近的研究已经应用神经隐式方法来学习RIR使用从环境中收集的上下文信息,如场景图像。然而,这些方法不能有效地利用来自环境的显式几何信息。为了进一步利用具有直接几何特征的神经隐式模型的潜力,我们提出了网格注入神经声场(MiNAF),它在给定位置查询粗糙的房间网格,并提取距离分布作为局部上下文的显式表示。我们的方法表明,结合显式的局部几何特征可以更好地指导神经网络生成更准确的RIR预测。通过与传统和最先进的基线方法的比较,我们表明MiNAF在各种评估指标上具有竞争力。此外,我们在训练样本有限的数据集中验证了MiNAF的鲁棒性,展示了高保真声音模拟的进步。
摘要:Realistic sound simulation plays a critical role in many applications. A key element in sound simulation is the room impulse response (RIR), which characterizes how sound propagates from a source to a listener within a given space. Recent studies have applied neural implicit methods to learn RIR using context information collected from the environment, such as scene images. However, these approaches do not effectively leverage explicit geometric information from the environment. To further exploit the potential of neural implicit models with direct geometric features, we present Mesh-infused Neural Acoustic Field (MiNAF), which queries a rough room mesh at given locations and extracts distance distributions as an explicit representation of local context. Our approach demonstrates that incorporating explicit local geometric features can better guide the neural network in generating more accurate RIR predictions. Through comparisons with conventional and state-of-the-art baseline methods, we show that MiNAF performs competitively across various evaluation metrics. Furthermore, we verify the robustness of MiNAF in datasets with limited training samples, demonstrating an advance in high-fidelity sound simulation.
【3】Diffusion-Based Scenario Tree Generation for Multivariate Time Series Prediction and Multistage Stochastic Optimization
标题:基于扩散的场景树生成用于多元时间序列预测和多阶段随机优化
链接:https://arxiv.org/abs/2509.14832
作者:arifis, Ioannis Kordonis, Petros Maragos
备注:5 pages, 2 figures, 2 tables, and 1 algorithm. This version is submitted to the 51st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026), to be held in Barcelona, Spain, on May 4-8, 2026
摘要:随机预测对于不确定系统中的有效决策至关重要,例如能源市场和金融,其中估计未来情景的完整分布至关重要。我们提出了扩散情景树(DST),一个通用的框架,用于构建情景树的多变量预测任务,使用基于扩散的概率预测模型。DST递归地对未来的轨迹进行采样,并通过聚类将它们组织成一棵树,确保每个阶段的非预期性(决策仅取决于观察到的历史)。我们评估了纽约州日前电力市场的能源套利优化任务的框架。实验结果表明,我们的方法始终优于使用来自更传统模型和无模型强化学习基线的场景树的相同优化算法。此外,使用DST进行随机优化产生更有效的决策策略,通过比使用相同的基于扩散的预测器的确定性和随机MPC变体更好地处理不确定性来实现更高的性能。
摘要:Stochastic forecasting is critical for efficient decision-making in uncertain systems, such as energy markets and finance, where estimating the full distribution of future scenarios is essential. We propose Diffusion Scenario Tree (DST), a general framework for constructing scenario trees for multivariate prediction tasks using diffusion-based probabilistic forecasting models. DST recursively samples future trajectories and organizes them into a tree via clustering, ensuring non-anticipativity (decisions depending only on observed history) at each stage. We evaluate the framework on the optimization task of energy arbitrage in New York State's day-ahead electricity market. Experimental results show that our approach consistently outperforms the same optimization algorithms that use scenario trees from more conventional models and Model-Free Reinforcement Learning baselines. Furthermore, using DST for stochastic optimization yields more efficient decision policies, achieving higher performance by better handling uncertainty than deterministic and stochastic MPC variants using the same diffusion-based forecaster.
【4】Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
标题:用于在线恶意意图检测的对抗提取检索-增强守护模型
链接:https://arxiv.org/abs/2509.14622
作者:, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu
摘要
:随着大型语言模型(LLM)在交互式应用程序中的部署,在线恶意意图检测变得越来越重要。然而,现有的方法不能实时处理多样化和复杂的用户查询。为了应对这些挑战,我们引入了ADRAG(对抗蒸馏检索增强警卫),这是一个两阶段的框架,用于强大而高效的在线恶意意图检测。在训练阶段,一个高容量的教师模型在对抗干扰,检索增强的输入上进行训练,以学习各种复杂用户查询的鲁棒决策边界。在推理阶段,蒸馏调度器将教师的知识转移到一个紧凑的学生模型中,并不断更新在线收集的知识库。在部署时,紧凑的学生模型利用从在线更新的知识库中检索的前K个相似的安全范例,以实现在线和实时的恶意查询检测。对10个安全基准的评估表明,ADRAG具有149 M参数模型,实现了WildGuard-7 B性能的98.5%,在分发外检测方面超过GPT-4 3.3%和Llama-Guard-3-8B 9.5%,同时在实时应用中以每秒300次查询(QPS)的速度提供高达5.6倍的延迟。
摘要:With the deployment of Large Language Models (LLMs) in interactive applications, online malicious intent detection has become increasingly critical. However, existing approaches fall short of handling diverse and complex user queries in real time. To address these challenges, we introduce ADRAG (Adversarial Distilled Retrieval-Augmented Guard), a two-stage framework for robust and efficient online malicious intent detection. In the training stage, a high-capacity teacher model is trained on adversarially perturbed, retrieval-augmented inputs to learn robust decision boundaries over diverse and complex user queries. In the inference stage, a distillation scheduler transfers the teacher's knowledge into a compact student model, with a continually updated knowledge base collected online. At deployment, the compact student model leverages top-K similar safety exemplars retrieved from the online-updated knowledge base to enable both online and real-time malicious query detection. Evaluations across ten safety benchmarks demonstrate that ADRAG, with a 149M-parameter model, achieves 98.5% of WildGuard-7B's performance, surpasses GPT-4 by 3.3% and Llama-Guard-3-8B by 9.5% on out-of-distribution detection, while simultaneously delivering up to 5.6x lower latency at 300 queries per second (QPS) in real-time applications.
【5】Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models
标题:即时注射防御对抗微调的早期方法:2022年GPT-3和当代模型的研究
链接:https://arxiv.org/abs/2509.14271
作者:andoval, Denys Fenchenko, Junyao Chen
摘要:本文记录了2022年在大型语言模型中防御即时注入攻击的早期研究,为这一关键安全领域的发展提供了历史背景。本研究主要关注两种针对大型语言模型(LLM)的对抗性攻击:即时注入和目标劫持。我们将研究如何构建这些攻击,在各种LLM上进行测试,并比较它们的有效性。我们提出并评估了一种新的防御技术,称为对抗微调。我们的研究结果表明,没有这种防御,攻击成功的GPT-3系列模型的时间为31%。当使用我们的对抗性微调方法时,对于较小的GPT-3变体(Ada,Babbage,Curie),攻击成功率降低到接近零,尽管我们注意到随后的研究揭示了基于微调的防御的局限性。我们还发现,更灵活的模型表现出更大的脆弱性,这些攻击。因此,像GPT-3 Davinci这样的大型模型比像GPT-2这样的小型模型更容易受到攻击。虽然测试的特定模型现在被取代,但核心方法和实证研究结果为现代即时注射防御研究奠定了基础,包括指令层次系统和宪法AI方法。
摘要:This paper documents early research conducted in 2022 on defending against prompt injection attacks in large language models, providing historical context for the evolution of this critical security domain. This research focuses on two adversarial attacks against Large Language Models (LLMs): prompt injection and goal hijacking. We examine how to construct these attacks, test them on various LLMs, and compare their effectiveness. We propose and evaluate a novel defense technique called Adversarial Fine-Tuning. Our results show that, without this defense, the attacks succeeded 31\% of the time on GPT-3 series models. When using our Adversarial Fine-Tuning approach, attack success rates were reduced to near zero for smaller GPT-3 variants (Ada, Babbage, Curie), though we note that subsequent research has revealed limitations of fine-tuning-based defenses. We also find that more flexible models exhibit greater vulnerability to these attacks. Consequently, large models such as GPT-3 Davinci are more vulnerable than smaller models like GPT-2. While the specific models tested are now superseded, the core methodology and empirical findings contributed to the foundation of modern prompt injection defense research, including instruction hierarchy systems and constitutional AI approaches.
半/弱/无/有监督|不确定性|主动学习(8篇)
【1】Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model
标题:基于2D自然图像预训练模型的半监督3D医学分割
链接:https://arxiv.org/abs/2509.15167
作者:eung, Jayroop Ramesh, Pengfei Lyu, Ana Namburete, Jagath Rajapakse
备注:Machine Learning in Medical Imaging (MLMI) 2025 Oral
摘要:本文探讨了从2D自然图像上预训练的一般视觉模型中转移知识,以改进3D医学图像分割。我们专注于半监督设置,其中只有少数标记的3D医学图像可用,以及大量的未标记图像。为了解决这个问题,我们提出了一个与模型无关的框架,该框架将知识从2D预训练模型逐步提取到从头开始训练的3D分割模型。我们的方法M&N涉及使用彼此生成的伪掩码对两个模型进行迭代联合训练,以及我们提出的学习率引导采样,该采样自适应地调整每个训练批次中标记和未标记数据的比例,以与模型的预测准确性和稳定性保持一致,最大限度地减少不准确伪掩码造成的不利影响。在多个公开数据集上进行的大量实验表明,M&N实现了最先进的性能,在所有不同的设置下都优于现有的13种半监督分割方法。重要的是,消融研究表明,M&N仍然是模型不可知的,允许与不同的架构无缝集成。这确保了它的适应性,因为更先进的模型出现。该代码可在https://github.com/pakheiyeung/M-N上获得。
摘要:This paper explores the transfer of knowledge from general vision models pretrained on 2D natural images to improve 3D medical image segmentation. We focus on the semi-supervised setting, where only a few labeled 3D medical images are available, along with a large set of unlabeled images. To tackle this, we propose a model-agnostic framework that progressively distills knowledge from a 2D pretrained model to a 3D segmentation model trained from scratch. Our approach, M&N, involves iterative co-training of the two models using pseudo-masks generated by each other, along with our proposed learning rate guided sampling that adaptively adjusts the proportion of labeled and unlabeled data in each training batch to align with the models' prediction accuracy and stability, minimizing the adverse effect caused by inaccurate pseudo-masks. Extensive experiments on multiple publicly available datasets demonstrate that M&N achieves state-of-the-art performance, outperforming thirteen existing semi-supervised segmentation approaches under all different settings. Importantly, ablation studies show that M&N remains model-agnostic, allowing seamless integration with different architectures. This ensures its adaptability as more advanced models emerge. The code is available at https://github.com/pakheiyeung/M-N.
【2】Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning
标题:注意差距:数据重写以实现稳定的政策外监督微调
链接:https://arxiv.org/abs/2509.15157
作者:ao, Xuyang Zhao, Jiaming Zhou, Aobo Kong, Qicheng Li, Yong Qin
摘要:大型语言模型的监督微调(SFT)可以被视为一个非策略学习问题,其中专家演示来自固定的行为策略,而训练旨在优化目标策略。重要性抽样是纠正这种分布不匹配的标准工具,但较大的政策差距会导致高方差和训练不稳定性。现有的方法使用KL惩罚或裁剪来缓解这个问题,这被动地约束更新,而不是主动地减少间隙。我们提出了一个简单而有效的数据重写框架,通过保持正确的解决方案作为政策数据和重写不正确的指导重新解决,只有在需要时才回落到专家演示,主动缩小政策差距。这在优化之前将训练分布与目标策略对齐,从而降低重要性抽样方差并稳定非策略微调。在五个数学推理基准上的实验表明,与香草SFT和最先进的动态微调(DFT)方法相比,该方法具有一致和显著的增益。数据和代码将在https://github.com/NKU-HLT/Off-Policy-SFT上发布。
摘要:Supervised fine-tuning (SFT) of large language models can be viewed as an off-policy learning problem, where expert demonstrations come from a fixed behavior policy while training aims to optimize a target policy. Importance sampling is the standard tool for correcting this distribution mismatch, but large policy gaps lead to high variance and training instability. Existing approaches mitigate this issue using KL penalties or clipping, which passively constrain updates rather than actively reducing the gap. We propose a simple yet effective data rewriting framework that proactively shrinks the policy gap by keeping correct solutions as on-policy data and rewriting incorrect ones with guided re-solving, falling back to expert demonstrations only when needed. This aligns the training distribution with the target policy before optimization, reducing importance sampling variance and stabilizing off-policy fine-tuning. Experiments on five mathematical reasoning benchmarks demonstrate consistent and significant gains over both vanilla SFT and the state-of-the-art Dynamic Fine-Tuning (DFT) approach. The data and code will be released at https://github.com/NKU-HLT/Off-Policy-SFT.
【3】Self-Explaining Reinforcement Learning for Mobile Network Resource Allocation
标题:用于移动网络资源分配的自解释强化学习
链接:https://arxiv.org/abs/2509.14925
作者:wosadko, Franco Ruggeri, Ahmad Terra
摘要:结合深度神经网络(DNN)的强化学习(RL)方法虽然强大,但往往缺乏透明度。它们的黑箱特性阻碍了可解释性并降低了可信度,特别是在关键领域。为了解决RL任务中的这一挑战,我们提出了一种基于自解释神经网络(SENN)的解决方案,以及解释提取方法,以增强可解释性,同时保持预测准确性。我们的方法针对低维问题,以产生强大的本地和全球的解释模型的行为。我们评估所提出的方法在移动网络中的资源分配问题,证明SENN可以构成具有竞争力的性能可解释的解决方案。这项工作突出了SENN在提高低维任务AI驱动决策的透明度和信任度方面的潜力。我们的方法与现有的最先进的方法相比具有很强的性能,同时提供了强大的解释。
摘要:Reinforcement Learning (RL) methods that incorporate deep neural networks (DNN), though powerful, often lack transparency. Their black-box characteristic hinders interpretability and reduces trustworthiness, particularly in critical domains. To address this challenge in RL tasks, we propose a solution based on Self-Explaining Neural Networks (SENNs) along with explanation extraction methods to enhance interpretability while maintaining predictive accuracy. Our approach targets low-dimensionality problems to generate robust local and global explanations of the model's behaviour. We evaluate the proposed method on the resource allocation problem in mobile networks, demonstrating that SENNs can constitute interpretable solutions with competitive performance. This work highlights the potential of SENNs to improve transparency and trust in AI-driven decision-making for low-dimensional tasks. Our approach strong performance on par with the existing state-of-the-art methods, while providing robust explanations.
【4】DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training
标题:DeCoP:通过依赖控制的预训练增强自监督时间序列表示
链接:https://arxiv.org/abs/2509.14642
作者:, Zhongze Wu, Xiu Su, Feng Yang, Hongyan Xu, Xi Lin, Wenti Huang, Shan You, Chang Xu
摘要:动态时间依赖性建模是时间序列预训练中的一个关键挑战,由于分布变化和多尺度模式而演变。这种时间可变性严重损害了预训练模型对下游任务的推广。现有的框架无法捕捉短期和长期依赖关系的复杂交互,使它们容易受到虚假相关性的影响,从而降低泛化能力。为了解决这些限制,我们提出了DeCoP,一个依赖控制的预训练框架,通过模拟不断变化的补丁间依赖关系,明确地建模动态的,多尺度的依赖关系。在输入层面,DeCoP引入了实例分片归一化(IPN)来减轻分布偏移,同时保留每个分片的独特特征,为表示学习奠定了坚实的基础。在潜在水平上,分层依赖控制学习(DCL)策略显式地对多个时间尺度上的补丁间依赖关系进行建模,实例级对比模块(ICM)通过从时不变的正对中学习实例判别表示来增强全局泛化。DeCoP在10个数据集上实现了最先进的结果,计算资源更少,在ETTh1上比PatchTST提高了3%的MSE,仅使用了37%的FLOP。
摘要
:Modeling dynamic temporal dependencies is a critical challenge in time series pre-training, which evolve due to distribution shifts and multi-scale patterns. This temporal variability severely impairs the generalization of pre-trained models to downstream tasks. Existing frameworks fail to capture the complex interactions of short- and long-term dependencies, making them susceptible to spurious correlations that degrade generalization. To address these limitations, we propose DeCoP, a Dependency Controlled Pre-training framework that explicitly models dynamic, multi-scale dependencies by simulating evolving inter-patch dependencies. At the input level, DeCoP introduces Instance-wise Patch Normalization (IPN) to mitigate distributional shifts while preserving the unique characteristics of each patch, creating a robust foundation for representation learning. At the latent level, a hierarchical Dependency Controlled Learning (DCL) strategy explicitly models inter-patch dependencies across multiple temporal scales, with an Instance-level Contrastive Module (ICM) enhances global generalization by learning instance-discriminative representations from time-invariant positive pairs. DeCoP achieves state-of-the-art results on ten datasets with lower computing resources, improving MSE by 3% on ETTh1 over PatchTST using only 37% of the FLOPs.
【5】Learning to Retrieve for Environmental Knowledge Discovery: An Augmentation-Adaptive Self-Supervised Learning Framework
标题:学习学习环境知识发现:增强自适应自我监督学习框架
链接:https://arxiv.org/abs/2509.14563
作者:uo, Runlong Yu, Chonghao Qiu, Rahul Ghosh, Robert Ladwig, Paul C. Hanson, Yiqun Xie, Xiaowei Jia
摘要:环境知识的发现依赖于特定任务的标记数据,但往往受到数据收集成本高的限制。现有的机器学习方法通常难以在数据稀疏或非典型条件下进行推广。为此,我们提出了一个增强自适应自监督学习(A$^2$SL)框架,该框架检索相关的观察样本以增强目标生态系统的建模。具体来说,我们引入了一个多层次的成对学习损失训练的场景编码器,捕捉不同程度的相似性的情况下。这些学习到的相似性驱动一种检索机制,该机制用来自不同位置或时间段的相关数据补充目标场景。此外,为了更好地处理可变场景,特别是在传统模型难以处理的非典型或极端条件下,我们设计了一种增强自适应机制,通过有针对性的数据增强来选择性地增强这些场景。使用淡水生态系统作为案例研究,我们评估A$^2$SL在模拟现实世界湖泊中的水温和溶解氧动态。实验结果表明,A$^2$SL显著提高了预测精度,增强了数据稀缺和非典型场景下的鲁棒性。虽然这项研究的重点是淡水生态系统,但A$^2$SL框架在各个科学领域提供了广泛适用的解决方案。
摘要:The discovery of environmental knowledge depends on labeled task-specific data, but is often constrained by the high cost of data collection. Existing machine learning approaches usually struggle to generalize in data-sparse or atypical conditions. To this end, we propose an Augmentation-Adaptive Self-Supervised Learning (A$^2$SL) framework, which retrieves relevant observational samples to enhance modeling of the target ecosystem. Specifically, we introduce a multi-level pairwise learning loss to train a scenario encoder that captures varying degrees of similarity among scenarios. These learned similarities drive a retrieval mechanism that supplements a target scenario with relevant data from different locations or time periods. Furthermore, to better handle variable scenarios, particularly under atypical or extreme conditions where traditional models struggle, we design an augmentation-adaptive mechanism that selectively enhances these scenarios through targeted data augmentation. Using freshwater ecosystems as a case study, we evaluate A$^2$SL in modeling water temperature and dissolved oxygen dynamics in real-world lakes. Experimental results show that A$^2$SL significantly improves predictive accuracy and enhances robustness in data-scarce and atypical scenarios. Although this study focuses on freshwater ecosystems, the A$^2$SL framework offers a broadly applicable solution in various scientific domains.
【6】Disproving the Feasibility of Learned Confidence Calibration Under Binary Supervision: An Information-Theoretic Impossibility
标题:反驳二元监督下习得置信度校准的可行性:信息论不可能
链接:https://arxiv.org/abs/2509.14386
作者:Nair, Kristina P. Sinaga
备注:30 pages, 13 figures, 8 tables
摘要:我们证明了一个基本的不可能性定理:当使用二元正确/不正确监督进行训练时,神经网络无法同时学习具有有意义多样性的校准良好的置信度估计。通过严格的数学分析和全面的实证评估,跨越负奖励训练,对称损失函数和事后校准方法,我们证明这是一个信息理论的约束,而不是方法上的失败。我们的实验揭示了普遍的失败模式:负奖励产生极端的信心不足(ECE大于0.8),同时破坏信心的多样性(标准差小于0.05),对称损失未能逃脱二进制信号平均,事后方法实现校准(ECE小于0.02),只有通过压缩的置信分布。我们将其形式化为一个未指定的映射问题,其中二进制信号无法区分正确预测的不同置信水平:60%置信的正确答案与90%置信的正确答案接受相同的监督。至关重要的是,我们的真实世界验证显示,MNIST、Fashion-MNIST和CIFAR-10的所有训练方法的失败率都是100%,而事后校准的成功率为33%,这矛盾地证实了我们的定理,即通过转换而不是学习来实现校准。这种不可能性直接解释了神经网络的幻觉,并确定了为什么事后校准在数学上是必要的,而不仅仅是方便。我们提出了新的监督范式,使用合奏分歧和自适应多智能体学习,可以克服这些根本的限制,而不需要人类的信心注释。
摘要:We prove a fundamental impossibility theorem: neural networks cannot simultaneously learn well-calibrated confidence estimates with meaningful diversity when trained using binary correct/incorrect supervision. Through rigorous mathematical analysis and comprehensive empirical evaluation spanning negative reward training, symmetric loss functions, and post-hoc calibration methods, we demonstrate this is an information-theoretic constraint, not a methodological failure. Our experiments reveal universal failure patterns: negative rewards produce extreme underconfidence (ECE greater than 0.8) while destroying confidence diversity (std less than 0.05), symmetric losses fail to escape binary signal averaging, and post-hoc methods achieve calibration (ECE less than 0.02) only by compressing the confidence distribution. We formalize this as an underspecified mapping problem where binary signals cannot distinguish between different confidence levels for correct predictions: a 60 percent confident correct answer receives identical supervision to a 90 percent confident one. Crucially, our real-world validation shows 100 percent failure rate for all training methods across MNIST, Fashion-MNIST, and CIFAR-10, while post-hoc calibration's 33 percent success rate paradoxically confirms our theorem by achieving calibration through transformation rather than learning. This impossibility directly explains neural network hallucinations and establishes why post-hoc calibration is mathematically necessary, not merely convenient. We propose novel supervision paradigms using ensemble disagreement and adaptive multi-agent learning that could overcome these fundamental limitations without requiring human confidence annotations.
【7】BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings
标题:BabyHuBERT:多语言自我监督学习,用于在以儿童为中心的长篇录音中细分发言者
链接:https://arxiv.org/abs/2509.15001
作者:lot, Tarek Kunze, Maxime Poli, Alejandrina Cristia, Emmanuel Dupoux, Marvin Lavechin
备注:5 pages, 1 figure
摘要
:以儿童为中心的长形式录音对于研究早期语言发展至关重要,但由于声学和语言差异,现有的基于干净成人数据训练的语音模型表现不佳。我们介绍BabyHuBERT,这是第一个自我监督的语音表示模型,经过13,000小时的以儿童为中心的多语言长格式录音训练,涵盖40多种语言。我们评估了BabyHuBERT的说话者分割,确定目标儿童何时与女性成年人,男性成年人或其他儿童说话-这是分析自然语言经验的基本预处理步骤。BabyHuBERT在六个不同的数据集上获得了从52.1%到74.4%的F1分数,始终优于W2 V2-LL 4300(在英语长格式上训练)和标准HuBERT(在干净的成人语音上训练)。值得注意的改进包括瓦努阿图的F1绝对值比HuBERT高13.2分,所罗门群岛语料库的F1绝对值高15.9分,证明了对代表性不足的语言的有效性。通过共享代码和模型,BabyHuBERT作为儿童语音研究的基础模型,可以对各种下游任务进行微调。
摘要:Child-centered long-form recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, the first self-supervised speech representation model trained on 13,000 hours of multilingual child-centered long-form recordings spanning over 40 languages. We evaluate BabyHuBERT on speaker segmentation, identifying when target children speak versus female adults, male adults, or other children -- a fundamental preprocessing step for analyzing naturalistic language experiences. BabyHuBERT achieves F1-scores from 52.1% to 74.4% across six diverse datasets, consistently outperforming W2V2-LL4300 (trained on English long-forms) and standard HuBERT (trained on clean adult speech). Notable improvements include 13.2 absolute F1 points over HuBERT on Vanuatu and 15.9 points on Solomon Islands corpora, demonstrating effectiveness on underrepresented languages. By sharing code and models, BabyHuBERT serves as a foundation model for child speech research, enabling fine-tuning on diverse downstream tasks.
【8】Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior
标题:有噪音先验的噪音环境中基于扩散的无监督视听语音分离
链接:https://arxiv.org/abs/2509.14379
作者:mini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya
摘要:在本文中,我们解决的问题,单麦克风语音分离存在的环境噪声。我们提出了一种生成式无监督技术,可以直接对干净的语音和结构化的噪声成分进行建模,专门针对这些单独的信号而不是嘈杂的混合信号进行训练。我们的方法利用视听评分模型,结合视觉线索,作为一个强大的生成语音之前。通过明确建模的噪声分布旁边的语音分布,我们使有效的分解,通过逆问题的范例。我们通过反向扩散过程从后验分布中采样来进行语音分离,该过程直接估计并去除建模的噪声分量以恢复干净的组成信号。实验结果显示了良好的性能,突出了我们的直接噪声建模方法在具有挑战性的声学环境中的有效性。
摘要:In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training exclusively on these individual signals rather than noisy mixtures. Our approach leverages an audio-visual score model that incorporates visual cues to serve as a strong generative speech prior. By explicitly modelling the noise distribution alongside the speech distribution, we enable effective decomposition through the inverse problem paradigm. We perform speech separation by sampling from the posterior distributions via a reverse diffusion process, which directly estimates and removes the modelled noise component to recover clean constituent signals. Experimental results demonstrate promising performance, highlighting the effectiveness of our direct noise modelling approach in challenging acoustic environments.
迁移|Zero/Few/One-Shot|自适应(3篇)
【1】Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning
标题:用于联邦微调的自适应LoRA专家分配和选择
链接:https://arxiv.org/abs/2509.15087
作者: Jieming Bian, Letian Zhang, Jie Xu
备注:Accepted to NeurIPS 2025
摘要:大型语言模型(LLM)已经在各种任务中展示了令人印象深刻的功能,但是针对特定领域的应用程序对其进行微调通常需要大量的特定领域数据,这些数据可能分布在多个组织中。联邦学习(FL)提供了一个隐私保护的解决方案,但在应用于LLM时面临着计算约束的挑战。低级别自适应(LoRA)已成为一种参数高效的微调方法,尽管单个LoRA模块经常难以处理不同领域的异构数据。本文解决了联邦LoRA微调中的两个关键挑战:1。确定LoRA专家跨异构客户端的最佳数量和分配,以及2.使得客户能够基于这些专家的特定数据特征来选择性地利用这些专家。我们提出了FedLEASE(联邦自适应LoRA专家分配和选择),这是一种新型框架,可根据表示相似性自适应地对客户端进行集群,以分配和训练特定于领域的LoRA专家。它还引入了一个自适应的顶级$M$专家混合机制,允许每个客户端选择最佳数量的利用专家。我们在不同基准数据集上的广泛实验表明,FedLEASE在保持通信效率的同时,在异构客户端设置中显著优于现有的联邦微调方法。
摘要:Large Language Models (LLMs) have demonstrated impressive capabilities across various tasks, but fine-tuning them for domain-specific applications often requires substantial domain-specific data that may be distributed across multiple organizations. Federated Learning (FL) offers a privacy-preserving solution, but faces challenges with computational constraints when applied to LLMs. Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient fine-tuning approach, though a single LoRA module often struggles with heterogeneous data across diverse domains. This paper addresses two critical challenges in federated LoRA fine-tuning: 1. determining the optimal number and allocation of LoRA experts across heterogeneous clients, and 2. enabling clients to selectively utilize these experts based on their specific data characteristics. We propose FedLEASE (Federated adaptive LoRA Expert Allocation and SElection), a novel framework that adaptively clusters clients based on representation similarity to allocate and train domain-specific LoRA experts. It also introduces an adaptive top-$M$ Mixture-of-Experts mechanism that allows each client to select the optimal number of utilized experts. Our extensive experiments on diverse benchmark datasets demonstrate that FedLEASE significantly outperforms existing federated fine-tuning approaches in heterogeneous client settings while maintaining communication efficiency.
【2】Stochastic Adaptive Gradient Descent Without Descent
标题:随机自适应梯度下降没有下降
链接:https://arxiv.org/abs/2509.14969
作者:çois Aujol, Jérémie Bigot, Camille Castera
摘要:我们引入了一个新的自适应步长策略的凸优化随机梯度,利用局部几何的目标函数,只有通过一阶随机预言,没有任何超参数调整。该方法来自于随机设置的自适应梯度下降无下降方法的理论基础适应。我们证明了收敛的随机梯度下降与我们的步长在各种假设下,我们表明,它的经验竞争对调谐基线。
摘要:We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
【3】TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
标题:ALEDART:用于表格理解的动态自适应多模式路由
链接:https://arxiv.org/abs/2509.14671
作者:ng, Wei Yuan, Tong Chen, Quoc Viet Hung Nguyen, Xiangliang Zhang, Hongzhi Yin
摘要:从表格数据中建模语义和结构信息仍然是有效理解表格的核心挑战。现有的表作为文本的方法扁平化大型语言模型(LLM)的表,但失去了关键的结构线索,而表作为图像的方法保留结构,但与细粒度的语义斗争。最近的表格作为多模态策略试图结合文本和视觉视图,但它们(1)静态地处理大型多模态LLM(MLLM)中每个查询表对的两种模态,不可避免地引入冗余甚至冲突,(2)依赖于MLLM的昂贵的微调。鉴于此,我们提出了TableDART,一个训练效率高的框架,通过重用预训练的单模态模型来集成多模态视图。TableDART引入了一个轻量级的2. 59M参数MLP门控网络,它为每个表查询对动态选择最佳路径(纯文本、纯图像或融合),有效地减少了两种模式的冗余和冲突。此外,我们提出了一种新的代理调解跨模态知识整合分析输出文本和图像为基础的模型,无论是选择最佳的结果或通过推理合成一个新的答案。这种设计避免了完全MLLM微调的高昂成本。在七个基准测试上进行的广泛实验表明,TableDART在开源模型中建立了新的最先进的性能,平均超过最强基线4.02%。该代码可从以下网址获得:https://anonymous.4open.science/r/TableDART-C52B
摘要:Modeling semantic and structural information from tabular data remains a core challenge for effective table understanding. Existing Table-as-Text approaches flatten tables for large language models (LLMs), but lose crucial structural cues, while Table-as-Image methods preserve structure yet struggle with fine-grained semantics. Recent Table-as-Multimodality strategies attempt to combine textual and visual views, but they (1) statically process both modalities for every query-table pair within a large multimodal LLMs (MLLMs), inevitably introducing redundancy and even conflicts, and (2) depend on costly fine-tuning of MLLMs. In light of this, we propose TableDART, a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. TableDART introduces a lightweight 2.59M-parameter MLP gating network that dynamically selects the optimal path (either Text-only, Image-only, or Fusion) for each table-query pair, effectively reducing redundancy and conflicts from both modalities. In addition, we propose a novel agent to mediate cross-modal knowledge integration by analyzing outputs from text- and image-based models, either selecting the best result or synthesizing a new answer through reasoning. This design avoids the prohibitive costs of full MLLM fine-tuning. Extensive experiments on seven benchmarks show that TableDART establishes new state-of-the-art performance among open-source models, surpassing the strongest baseline by an average of 4.02%. The code is available at: https://anonymous.4open.science/r/TableDART-C52B
强化学习(4篇)
【1】Reinforcement Learning Agent for a 2D Shooter Game
标题:用于2D射击游戏的强化学习代理
链接:https://arxiv.org/abs/2509.15042
作者:kermann, Moritz Spang, Hamza A. A. Gardi
摘要:在复杂的游戏环境中,强化学习代理通常会受到奖励稀疏、训练不稳定和样本效率低下的影响。本文提出了一种混合训练方法,结合离线模仿学习和在线强化学习的2D射击游戏代理。我们实现了一个多头神经网络,它具有用于行为克隆和Q学习的单独输出,通过具有注意力机制的共享特征提取层统一起来。使用纯深度Q网络的初始实验表现出显著的不稳定性,尽管偶尔表现良好,但代理经常恢复到糟糕的策略。为了解决这个问题,我们开发了一种混合方法,从基于规则的代理的演示数据的行为克隆开始,然后过渡到强化学习。我们的混合方法对基于规则的对手的胜率始终保持在70%以上,大大优于表现出高方差和频繁性能下降的纯强化学习方法。多头架构使学习模式之间的有效知识转移,同时保持培训的稳定性。结果表明,将基于演示的初始化与强化学习优化相结合,为在复杂的多智能体环境中开发游戏AI智能体提供了一种强大的解决方案,而纯粹的探索证明是不够的。
摘要:Reinforcement learning agents in complex game environments often suffer from sparse rewards, training instability, and poor sample efficiency. This paper presents a hybrid training approach that combines offline imitation learning with online reinforcement learning for a 2D shooter game agent. We implement a multi-head neural network with separate outputs for behavioral cloning and Q-learning, unified by shared feature extraction layers with attention mechanisms. Initial experiments using pure deep Q-Networks exhibited significant instability, with agents frequently reverting to poor policies despite occasional good performance. To address this, we developed a hybrid methodology that begins with behavioral cloning on demonstration data from rule-based agents, then transitions to reinforcement learning. Our hybrid approach achieves consistently above 70% win rate against rule-based opponents, substantially outperforming pure reinforcement learning methods which showed high variance and frequent performance degradation. The multi-head architecture enables effective knowledge transfer between learning modes while maintaining training stability. Results demonstrate that combining demonstration-based initialization with reinforcement learning optimization provides a robust solution for developing game AI agents in complex multi-agent environments where pure exploration proves insufficient.
【2】Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization
标题:通过信息收益最大化的多保真混合强化学习
链接:https://arxiv.org/abs/2509.14848
作者:ifaou, Osvaldo Simeone
摘要:优化强化学习(RL)策略通常需要与环境的高保真模拟器进行广泛的交互,这通常是昂贵的或不切实际的。离线RL通过允许从预先收集的数据进行训练来解决这个问题,但其有效性受到数据集大小和质量的强烈限制。混合离线-在线RL利用离线数据和与环境的单个模拟器的交互。然而,在许多真实世界的场景中,可以使用具有不同保真度和计算成本的多个模拟器。在这项工作中,我们研究了多保真度混合RL的政策优化下的固定成本预算。我们介绍了多保真度混合RL通过信息增益最大化(MF-HRL-IGM),一个混合离线-在线RL算法,实现保真度选择的基础上,通过自举方法的信息增益最大化。理论分析表明MF-HRL-IGM具有无遗憾性,而实证评估表明其性能优于现有的基准。
摘要:Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.
【3】Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution
标题:通过梯度冲突解决的可扩展多目标机器人强化学习
链接:https://arxiv.org/abs/2509.14816
作者:Munn, Brendan Tidd, Peter Böhm, Marcus Gallagher, David Howard
摘要:强化学习(RL)机器人控制器通常将多个任务目标聚合到一个标量奖励中。虽然大规模的近端策略优化(PPO)已经实现了令人印象深刻的结果,例如在现实世界中的鲁棒机器人运动,但许多任务仍然需要仔细的奖励调整,并且容易陷入局部最优。调优成本和次优性随着目标的数量而增加,限制了可伸缩性。对奖励向量及其权衡进行建模可以解决这些问题;然而,由于计算成本和优化难度,多目标方法在机器人强化学习中仍然没有得到充分利用。在这项工作中,我们调查的梯度贡献之间的冲突,每个目标出现从scalarising的任务目标。特别是,我们明确地解决基于任务的奖励和规则化的政策对现实的行为的条款之间的冲突。我们提出了GCR-PPO,修改演员评论家优化,分解演员更新到客观明智的梯度使用多头评论家和解决冲突的基础上的客观优先级。我们的方法,GCR-PPO,在著名的IsaacLab操作和运动基准和两个相关任务的额外多目标修改进行评估。我们表现出优越的可扩展性相比,并行PPO(p = 0.04),没有显着的计算开销。我们也表现出更高的性能与更多的冲突任务。GCR-PPO比大规模PPO平均提高了9.5%,高冲突任务的改善更大。该代码可在https://github.com/humphreymunn/GCR-PPO上获得。
摘要:Reinforcement Learning (RL) robot controllers usually aggregate many task objectives into one scalar reward. While large-scale proximal policy optimisation (PPO) has enabled impressive results such as robust robot locomotion in the real world, many tasks still require careful reward tuning and are brittle to local optima. Tuning cost and sub-optimality grow with the number of objectives, limiting scalability. Modelling reward vectors and their trade-offs can address these issues; however, multi-objective methods remain underused in RL for robotics because of computational cost and optimisation difficulty. In this work, we investigate the conflict between gradient contributions for each objective that emerge from scalarising the task objectives. In particular, we explicitly address the conflict between task-based rewards and terms that regularise the policy towards realistic behaviour. We propose GCR-PPO, a modification to actor-critic optimisation that decomposes the actor update into objective-wise gradients using a multi-headed critic and resolves conflicts based on the objective priority. Our methodology, GCR-PPO, is evaluated on the well-known IsaacLab manipulation and locomotion benchmarks and additional multi-objective modifications on two related tasks. We show superior scalability compared to parallel PPO (p = 0.04), without significant computational overhead. We also show higher performance with more conflicting tasks. GCR-PPO improves on large-scale PPO with an average improvement of 9.5%, with high-conflict tasks observing a greater improvement. The code is available at https://github.com/humphreymunn/GCR-PPO.
【4】Online reinforcement learning via sparse Gaussian mixture model Q-functions
标题:通过稀疏高斯混合模型Q函数的在线强化学习
链接:https://arxiv.org/abs/2509.14585
作者:Konstantinos Slavakis
摘要:本文介绍了一种结构化和可解释的在线策略迭代框架,用于强化学习(RL),建立在一类新的稀疏高斯混合模型Q函数(S-GMM-QFs)。扩展早期离线训练GMM-QFs的工作,所提出的框架开发了一个在线计划,利用流数据来鼓励探索。模型的复杂性通过Hadamard过参数化的稀疏化来调节,这在保持表现力的同时减轻了过拟合。S-GMM-QFs的参数空间自然具有黎曼流形结构,允许通过平滑目标上的在线梯度下降进行原则性参数更新。数值测试表明,S-GMM-QF在标准基准上与密集深度RL(DeepRL)方法的性能相匹配,同时使用显著较少的参数,并且即使在稀疏DeepRL方法无法推广的低参数计数制度下也能保持强大的性能。
摘要:This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL), built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Extending earlier work that trained GMM-QFs offline, the proposed framework develops an online scheme that leverages streaming data to encourage exploration. Model complexity is regulated through sparsification by Hadamard overparametrization, which mitigates overfitting while preserving expressiveness. The parameter space of S-GMM-QFs is naturally endowed with a Riemannian manifold structure, allowing for principled parameter updates via online gradient descent on a smooth objective. Numerical tests show that S-GMM-QFs match the performance of dense deep RL (DeepRL) methods on standard benchmarks while using significantly fewer parameters, and maintain strong performance even in low-parameter-count regimes where sparsified DeepRL methods fail to generalize.
元学习(1篇)
【1】Balancing Sparse RNNs with Hyperparameterization Benefiting Meta-Learning
标题:平衡稀疏RNN与超参数化有利于元学习
链接:https://arxiv.org/abs/2509.15057
作者:rshey, Randy Paffenroth
摘要:本文开发了用于指定稀疏递归神经网络(RNN)的替代超参数。这些超参数允许在模型的可训练权重矩阵内改变稀疏性,同时提高整体性能。这种架构能够定义一种新的度量标准,即隐藏比例,它旨在平衡模型中未知数的分布,并提供对模型性能的重要解释力。总之,使用不同的稀疏性RNN架构结合隐藏比例度量产生了显着的性能增益,同时提高了先验基础上的性能预期。这种组合方法提供了一条通往广义元学习应用和基于数据集内在特征(包括输入和输出维度)的模型优化的道路。
摘要:This paper develops alternative hyperparameters for specifying sparse Recurrent Neural Networks (RNNs). These hyperparameters allow for varying sparsity within the trainable weight matrices of the model while improving overall performance. This architecture enables the definition of a novel metric, hidden proportion, which seeks to balance the distribution of unknowns within the model and provides significant explanatory power of model performance. Together, the use of the varied sparsity RNN architecture combined with the hidden proportion metric generates significant performance gains while improving performance expectations on an a priori basis. This combined approach provides a path forward towards generalized meta-learning applications and model optimization based on intrinsic characteristics of the data set, including input and output dimensions.
分层学习(2篇)
【1】The Energy-Efficient Hierarchical Neural Network with Fast FPGA-Based Incremental Learning
标题:具有快速基于FPGA的增量学习的节能分层神经网络
链接:https://arxiv.org/abs/2509.15097
作者:Saleh Vahdatpour, Huaiyuan Chu, Yanqing Zhang
备注:Published at IJCNN 2025
摘要:深度学习不断增长的计算和能源需求,特别是在基础模型和大型语言模型(LLM)等大规模架构中,对可持续性提出了重大挑战。传统的基于梯度的训练方法效率低下,需要大量的迭代更新和高功耗。为了解决这些限制,我们提出了一个混合框架,结合层次分解与基于FPGA的直接方程求解和增量学习。我们的方法将神经网络分为两个功能层:较低层通过FPGA上的单步方程求解进行优化,以实现高效和可并行化的特征提取,而较高层采用自适应增量学习来支持持续更新,而无需完全重新训练。在此基础上,我们引入了复合LLM框架,它显式地在两个层次结构中部署LLM模块。较低级别的LLM以最小的能量开销处理可重用的表示学习,而较高级别的LLM通过能量感知更新执行自适应决策。这种集成设计增强了可扩展性,减少了冗余计算,并符合可持续AI的原则。理论分析和架构见解表明,我们的方法显着降低了计算成本,同时保持了高模型性能,使其非常适合在能源受限的环境中进行边缘部署和实时适应。
摘要:The rising computational and energy demands of deep learning, particularly in large-scale architectures such as foundation models and large language models (LLMs), pose significant challenges to sustainability. Traditional gradient-based training methods are inefficient, requiring numerous iterative updates and high power consumption. To address these limitations, we propose a hybrid framework that combines hierarchical decomposition with FPGA-based direct equation solving and incremental learning. Our method divides the neural network into two functional tiers: lower layers are optimized via single-step equation solving on FPGAs for efficient and parallelizable feature extraction, while higher layers employ adaptive incremental learning to support continual updates without full retraining. Building upon this foundation, we introduce the Compound LLM framework, which explicitly deploys LLM modules across both hierarchy levels. The lower-level LLM handles reusable representation learning with minimal energy overhead, while the upper-level LLM performs adaptive decision-making through energy-aware updates. This integrated design enhances scalability, reduces redundant computation, and aligns with the principles of sustainable AI. Theoretical analysis and architectural insights demonstrate that our method reduces computational costs significantly while preserving high model performance, making it well-suited for edge deployment and real-time adaptation in energy-constrained environments.
【2】Hierarchical Federated Learning for Social Network with Mobility
标题:具有移动性的社交网络分层联邦学习
链接:https://arxiv.org/abs/2509.14938
作者:, Wen Chen, Jun Li, Qingqing Wu, Ming Ding, Xuefeng Han, Xiumei Deng, Liwei Wang
摘要:Federated Learning(FL)提供了一种分散的解决方案,允许协作的本地模型训练和全局聚合,从而保护数据隐私。在传统的FL框架中,数据隐私通常是在本地数据保持绝对隐私的假设下保留的,而客户端的移动性在显式建模中经常被忽略。在本文中,我们提出了一个分层的联邦学习框架的基础上的社会网络的移动性,即HFL-SNM,同时考虑客户端之间的数据共享和他们的移动模式。在有限资源的约束下,我们制定了一个资源分配和客户端调度的联合优化问题,其目标是最大限度地减少客户端在FL过程中的能源消耗。在社会网络中,我们引入了有效数据覆盖率和冗余数据覆盖率的概念。通过初步实验分析了有效数据和冗余数据对模型性能的影响。我们将优化问题分解为多个子问题,并根据初步的实验结果对它们进行分析,提出了具有移动性的社会网络动态优化(DO-SNM)算法。实验结果表明,我们的算法实现了优越的模型性能,同时显着降低能耗,与传统的基线算法相比。
摘要
:Federated Learning (FL) offers a decentralized solution that allows collaborative local model training and global aggregation, thereby protecting data privacy. In conventional FL frameworks, data privacy is typically preserved under the assumption that local data remains absolutely private, whereas the mobility of clients is frequently neglected in explicit modeling. In this paper, we propose a hierarchical federated learning framework based on the social network with mobility namely HFL-SNM that considers both data sharing among clients and their mobility patterns. Under the constraints of limited resources, we formulate a joint optimization problem of resource allocation and client scheduling, which objective is to minimize the energy consumption of clients during the FL process. In social network, we introduce the concepts of Effective Data Coverage Rate and Redundant Data Coverage Rate. We analyze the impact of effective data and redundant data on the model performance through preliminary experiments. We decouple the optimization problem into multiple sub-problems, analyze them based on preliminary experimental results, and propose Dynamic Optimization in Social Network with Mobility (DO-SNM) algorithm. Experimental results demonstrate that our algorithm achieves superior model performance while significantly reducing energy consumption, compared to traditional baseline algorithms.
医学相关(5篇)
【1】Explaining deep learning for ECG using time-localized clusters
标题:使用时间局部化集群解释心电图的深度学习
链接:https://arxiv.org/abs/2509.15198
作者:ubekki, Konstantinos Patlatzoglou, Joseph Barker, Fu Siong Ng, Antônio H. Ribeiro
摘要:深度学习具有显著先进的心电图(ECG)分析,能够实现超越传统临床能力的自动注释、疾病筛查和预后。然而,理解这些模型仍然是一个挑战,限制了解释和从这些发展中获得知识。在这项工作中,我们提出了一种新的可解释性方法卷积神经网络应用于心电图分析。我们的方法从模型的内部表示中提取时间局部化的聚类,根据学习到的特征分割ECG,同时量化这些表示的不确定性。这使我们能够可视化不同的波形区域如何对模型的预测做出贡献,并评估其决策的确定性。通过提供ECG深度学习模型的结构化和可解释的视图,我们的方法增强了对AI驱动诊断的信任,并促进了临床相关电生理模式的发现。
摘要:Deep learning has significantly advanced electrocardiogram (ECG) analysis, enabling automatic annotation, disease screening, and prognosis beyond traditional clinical capabilities. However, understanding these models remains a challenge, limiting interpretation and gaining knowledge from these developments. In this work, we propose a novel interpretability method for convolutional neural networks applied to ECG analysis. Our approach extracts time-localized clusters from the model's internal representations, segmenting the ECG according to the learned characteristics while quantifying the uncertainty of these representations. This allows us to visualize how different waveform regions contribute to the model's predictions and assess the certainty of its decisions. By providing a structured and interpretable view of deep learning models for ECG, our method enhances trust in AI-driven diagnostics and facilitates the discovery of clinically relevant electrophysiological patterns.
【2】Limitations of Public Chest Radiography Datasets for Artificial Intelligence: Label Quality, Domain Shift, Bias and Evaluation Challenges
标题:人工智能公共胸部放射摄影数据集的局限性:标签质量、领域转移、偏差和评估挑战
链接:https://arxiv.org/abs/2509.15107
作者:rty, Rishi Ramaesh, Ajitha Rajan
摘要:人工智能在胸部放射摄影中显示出了巨大的前景,深度学习模型可以接近放射科医生级别的诊断性能。MIMIC-CXR、ChestX-ray 14、PadChest和CheXpert等大型公共数据集加速了这一进程,这些数据集提供了数十万张带有病理学注释的标记图像。然而,这些数据集也存在重要的局限性。从放射学报告中自动提取标签会引入错误,特别是在处理不确定性和否定时,并且放射科医师的审查经常与分配的标签不一致。此外,领域转移和人群偏倚限制了模型的普适性,而评价实践往往忽略了有临床意义的措施。我们对这些挑战进行了系统的分析,重点是标签质量,数据集偏差和域转移。我们在多个模型架构中进行的跨数据集域偏移评估显示,外部性能大幅下降,AUPRC和F1分数相对于内部测试显著降低。为了评估数据集偏差,我们训练了一个源分类模型,该模型以近乎完美的准确度区分数据集,并进行了亚组分析,显示少数年龄和性别群体的性能下降。最后,由两名委员会认证的放射科医生进行的专家审查发现了与公共数据集标签的重大分歧。我们的研究结果强调了当前基准的重要临床弱点,并强调了对临床医生验证数据集和更公平的评估框架的需求。
摘要:Artificial intelligence has shown significant promise in chest radiography, where deep learning models can approach radiologist-level diagnostic performance. Progress has been accelerated by large public datasets such as MIMIC-CXR, ChestX-ray14, PadChest, and CheXpert, which provide hundreds of thousands of labelled images with pathology annotations. However, these datasets also present important limitations. Automated label extraction from radiology reports introduces errors, particularly in handling uncertainty and negation, and radiologist review frequently disagrees with assigned labels. In addition, domain shift and population bias restrict model generalisability, while evaluation practices often overlook clinically meaningful measures. We conduct a systematic analysis of these challenges, focusing on label quality, dataset bias, and domain shift. Our cross-dataset domain shift evaluation across multiple model architectures revealed substantial external performance degradation, with pronounced reductions in AUPRC and F1 scores relative to internal testing. To assess dataset bias, we trained a source-classification model that distinguished datasets with near-perfect accuracy, and performed subgroup analyses showing reduced performance for minority age and sex groups. Finally, expert review by two board-certified radiologists identified significant disagreement with public dataset labels. Our findings highlight important clinical weaknesses of current benchmarks and emphasise the need for clinician-validated datasets and fairer evaluation frameworks.
【3】Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery
标题:具有细粒度结合表示的结构感知对比学习用于药物发现
链接:https://arxiv.org/abs/2509.14788
作者: Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W.Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo
摘要:准确识别药物-靶标相互作用(DTI)仍然是计算药理学的核心挑战,其中基于序列的方法提供了可扩展性。这项工作引入了一个基于序列的药物-靶标相互作用框架,该框架将结构先验整合到蛋白质表征中,同时保持高通量筛选能力。通过多个基准测试,该模型在Human和BioSNAP数据集上实现了最先进的性能,并在BindingDB上保持竞争力。在虚拟筛选任务中,它超过了LIT-PCBA上的先前方法,在AUROC和BEDROC中产生了实质性的收益。消融研究证实了学习聚合,双线性注意力和对比对齐在增强预测鲁棒性方面的关键作用。嵌入可视化揭示了与已知结合口袋的改进的空间对应关系,并突出了配体-残基接触的可解释注意力模式。这些结果验证了该框架的实用性,可扩展的和结构感知的DTI预测。
摘要
:Accurate identification of drug-target interactions (DTI) remains a central challenge in computational pharmacology, where sequence-based methods offer scalability. This work introduces a sequence-based drug-target interaction framework that integrates structural priors into protein representations while maintaining high-throughput screening capability. Evaluated across multiple benchmarks, the model achieves state-of-the-art performance on Human and BioSNAP datasets and remains competitive on BindingDB. In virtual screening tasks, it surpasses prior methods on LIT-PCBA, yielding substantial gains in AUROC and BEDROC. Ablation studies confirm the critical role of learned aggregation, bilinear attention, and contrastive alignment in enhancing predictive robustness. Embedding visualizations reveal improved spatial correspondence with known binding pockets and highlight interpretable attention patterns over ligand-residue contacts. These results validate the framework's utility for scalable and structure-aware DTI prediction.
【4】HD3C: Efficient Medical Data Classification for Embedded Devices
标题:HD 3C:嵌入式设备的高效医疗数据分类
链接:https://arxiv.org/abs/2509.14617
作者:Wei, Zhenyu Zhang, Pengcheng Wang, Mingjie Zeng, Zhigang Zeng
摘要:节能医疗数据分类对于现代疾病筛查至关重要,特别是在嵌入式设备普遍存在的家庭和现场医疗保健中。虽然深度学习模型达到了最先进的准确性,但其大量的能源消耗和对GPU的依赖限制了在此类平台上的部署。我们提出了超维计算与类明智的聚类(HD 3C),一个轻量级的分类框架,专为低功耗环境。HD 3C将数据编码成高维超向量,将它们聚合成多个特定于集群的原型,并通过超空间中的相似性搜索进行分类。我们在三个医疗分类任务中评估了HD 3C;在心音分类方面,HD 3C比Bayesian ResNet高出350倍的能效,准确率差异不到1%。此外,HD 3C对噪声、有限的训练数据和硬件错误表现出出色的鲁棒性,这得到了理论分析和经验结果的支持,突出了其在现实环境中可靠部署的潜力。代码可在https://github.com/jianglanwei/HD3C上获得。
摘要:Energy-efficient medical data classification is essential for modern disease screening, particularly in home and field healthcare where embedded devices are prevalent. While deep learning models achieve state-of-the-art accuracy, their substantial energy consumption and reliance on GPUs limit deployment on such platforms. We present Hyperdimensional Computing with Class-Wise Clustering (HD3C), a lightweight classification framework designed for low-power environments. HD3C encodes data into high-dimensional hypervectors, aggregates them into multiple cluster-specific prototypes, and performs classification through similarity search in hyperspace. We evaluate HD3C across three medical classification tasks; on heart sound classification, HD3C is $350\times$ more energy-efficient than Bayesian ResNet with less than 1% accuracy difference. Moreover, HD3C demonstrates exceptional robustness to noise, limited training data, and hardware error, supported by both theoretical analysis and empirical results, highlighting its potential for reliable deployment in real-world settings. Code is available at https://github.com/jianglanwei/HD3C.
【5】Non-Intrusive Parametrized-Background Data-Weak Reconstruction of Cardiac Displacement Fields from Sparse MRI-like Observations
标题:非侵入性参数化背景数据-从稀疏MRI样观察中弱重建心脏位移场
链接:https://arxiv.org/abs/2509.14844
作者: C. Mantegazza, Federica Caforio, Christoph Augustin, Matthias A.F. Gsell, Gundolf Haase, Elias Karabelas
备注:42 pages, 12 figures, 6 tables
摘要:个性化的心脏诊断需要从稀疏的临床成像数据中准确重建心肌位移场,但目前的方法通常需要侵入式访问计算模型。在这项工作中,我们应用非侵入性参数化背景数据弱(PBDW)的方法来三维(3D)心脏位移场重建有限的磁共振成像(MRI)的观察。我们的实现只需要解决方案的快照-没有控制方程,汇编程序,或求解器访问-使跨商业和研究代码使用不同的本构模型的即时部署。此外,我们介绍了两个增强功能:一个H大小的小批量最坏情况下的正交匹配追踪(wOMP)算法,提高传感器选择(SS)的计算效率,同时保持重建精度,和内存优化技术,利用块矩阵结构的向量问题。我们证明了该方法的有效性,通过验证的三维左心室模型与模拟疤痕组织。从无噪声重建开始,我们系统地将高斯噪声和空间稀疏性模仿现实的MRI采集协议。结果表明,在无噪声条件下(O(1 e-5)阶的相对L2误差),具有10%噪声的鲁棒性能(O(1 e-2)阶的相对L2误差),以及从稀疏测量(O(1 e-2)阶的相对L2误差)的有效重建异常准确。与完整的有限元(FE)模拟相比,在线重建实现了四个数量级的计算速度,稀疏场景的重建时间不到十分之一秒,显示出集成到临床心脏建模工作流程中的巨大潜力。
摘要:Personalized cardiac diagnostics require accurate reconstruction of myocardial displacement fields from sparse clinical imaging data, yet current methods often demand intrusive access to computational models. In this work, we apply the non-intrusive Parametrized-Background Data-Weak (PBDW) approach to three-dimensional (3D) cardiac displacement field reconstruction from limited Magnetic Resonance Image (MRI)-like observations. Our implementation requires only solution snapshots -- no governing equations, assembly routines, or solver access -- enabling immediate deployment across commercial and research codes using different constitutive models. Additionally, we introduce two enhancements: an H-size minibatch worst-case Orthogonal Matching Pursuit (wOMP) algorithm that improves Sensor Selection (SS) computational efficiency while maintaining reconstruction accuracy, and memory optimization techniques exploiting block matrix structures in vectorial problems. We demonstrate the effectiveness of the method through validation on a 3D left ventricular model with simulated scar tissue. Starting with noise-free reconstruction, we systematically incorporate Gaussian noise and spatial sparsity mimicking realistic MRI acquisition protocols. Results show exceptional accuracy in noise-free conditions (relative L2 error of order O(1e-5)), robust performance with 10% noise (relative L2 error of order O(1e-2)), and effective reconstruction from sparse measurements (relative L2 error of order O(1e-2)). The online reconstruction achieves four-order-of-magnitude computational speed-up compared to full Finite Element (FE) simulations, with reconstruction times under one tenth of second for sparse scenarios, demonstrating significant potential for integration into clinical cardiac modeling workflows.
聚类(1篇)
【1】Improving Internet Traffic Matrix Prediction via Time Series Clustering
标题:通过时间序列集群改进互联网流量矩阵预测
链接:https://arxiv.org/abs/2509.15072
作者:sh, Alexander Wyglinski
备注
:Accepted to ICMLA 2025
摘要:我们提出了一个新的框架,利用时间序列聚类来改进使用深度学习(DL)模型的互联网流量矩阵(TM)预测。TM中的流量通常表现出不同的时间行为,这可能会在训练跨所有流量的单个模型时阻碍预测准确性。为了解决这个问题,我们提出了两个聚类策略,源聚类和直方图聚类,组流模型训练之前具有相似的时间模式。聚类创建了更同质的数据子集,使模型能够更有效地捕获底层模式,并且比将单个模型拟合到整个TM的全局预测方法更好地泛化。与现有的TM预测方法相比,我们的方法降低了RMSE高达92%的Abilene和75%的G\'EANT。在路由场景中,我们的聚类预测还分别将最大链路利用率(MLU)偏差降低了18%和21%,这表明了当TM用于网络优化时聚类的实际好处。
摘要:We present a novel framework that leverages time series clustering to improve internet traffic matrix (TM) prediction using deep learning (DL) models. Traffic flows within a TM often exhibit diverse temporal behaviors, which can hinder prediction accuracy when training a single model across all flows. To address this, we propose two clustering strategies, source clustering and histogram clustering, that group flows with similar temporal patterns prior to model training. Clustering creates more homogeneous data subsets, enabling models to capture underlying patterns more effectively and generalize better than global prediction approaches that fit a single model to the entire TM. Compared to existing TM prediction methods, our method reduces RMSE by up to 92\% for Abilene and 75\% for G\'EANT. In routing scenarios, our clustered predictions also reduce maximum link utilization (MLU) bias by 18\% and 21\%, respectively, demonstrating the practical benefits of clustering when TMs are used for network optimization.
自动驾驶|车辆|车道检测等(1篇)
【1】Resolve Highway Conflict in Multi-Autonomous Vehicle Controls with Local State Attention
标题:在地方政府的关注下解决多自主车辆控制中的高速公路冲突
链接:https://arxiv.org/abs/2506.11445
作者:Ta, Bang Giang Le, Thanh Ha Le, Viet Cuong Ta
摘要:在混合交通环境中,自动驾驶汽车必须适应人类控制的车辆和其他不寻常的驾驶情况。这种设置可以被构建为多智能体强化学习(MARL)环境,在自动驾驶汽车之间具有完全的合作奖励。虽然多智能体邻近策略优化等方法可以有效地训练MARL任务,但它们往往无法解决智能体之间的局部冲突,并且无法推广到随机事件。在本文中,我们提出了一个本地状态注意力模块,以协助输入状态表示。该模型通过引入自注意算子,压缩邻近智能体的基本信息,解决交通冲突问题。利用一个模拟的高速公路合并的情况下,优先车辆作为意外事件,我们的方法是能够优先考虑其他车辆的信息来管理合并过程。结果表明,与流行的基线相比,合并效率有显着提高,特别是在高密度交通环境中。
摘要:In mixed-traffic environments, autonomous vehicles must adapt to human-controlled vehicles and other unusual driving situations. This setting can be framed as a multi-agent reinforcement learning (MARL) environment with full cooperative reward among the autonomous vehicles. While methods such as Multi-agent Proximal Policy Optimization can be effective in training MARL tasks, they often fail to resolve local conflict between agents and are unable to generalize to stochastic events. In this paper, we propose a Local State Attention module to assist the input state representation. By relying on the self-attention operator, the module is expected to compress the essential information of nearby agents to resolve the conflict in traffic situations. Utilizing a simulated highway merging scenario with the priority vehicle as the unexpected event, our approach is able to prioritize other vehicles' information to manage the merging process. The results demonstrate significant improvements in merging efficiency compared to popular baselines, especially in high-density traffic settings.
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】Next-Depth Lookahead Tree
标题:下一深度前瞻树
链接:https://arxiv.org/abs/2509.15143
作者:, Kangjin Kim, Gyeong Taek Lee
备注:25 pages, 2 figures
摘要:本文提出了下一个深度前瞻树(NDLT),一个单树模型,旨在通过评估节点分裂,不仅在被优化的节点,而且通过评估下一个深度级别的质量,以提高性能。
摘要:This paper proposes the Next-Depth Lookahead Tree (NDLT), a single-tree model designed to improve performance by evaluating node splits not only at the node being optimized but also by evaluating the quality of the next depth level.
【2】Efficiently learning depth-3 circuits via quantum agnostic boosting
标题:通过量子不可知增强有效学习深度3电路
链接:https://arxiv.org/abs/2509.14461
作者:n Arunachalam, Arkopal Dutt, Alexandru Gheorghiu, Michael de Oliveira
备注:52 pages
摘要
:We initiate the study of quantum agnostic learning of phase states with respect to a function class $\mathsf{C}\subseteq \{c:\{0,1\}^n\rightarrow \{0,1\}\}$: given copies of an unknown $n$-qubit state $|\psi\rangle$ which has fidelity $\textsf{opt}$ with a phase state $|\phi_c\rangle=\frac{1}{\sqrt{2^n}}\sum_{x\in \{0,1\}^n}(-1)^{c(x)}|x\rangle$ for some $c\in \mathsf{C}$, output $|\phi\rangle$ which has fidelity $|\langle \phi | \psi \rangle|^2 \geq \textsf{opt}-\varepsilon$. To this end, we give agnostic learning protocols for the following classes: (i) Size-$t$ decision trees which runs in time $\textsf{poly}(n,t,1/\varepsilon)$. This also implies $k$-juntas can be agnostically learned in time $\textsf{poly}(n,2^k,1/\varepsilon)$. (ii) $s$-term DNF formulas in near-polynomial time $\textsf{poly}(n,(s/\varepsilon)^{\log \log s/\varepsilon})$. Our main technical contribution is a quantum agnostic boosting protocol which converts a weak agnostic learner, which outputs a parity state $|\phi\rangle$ such that $|\langle \phi|\psi\rangle|^2\geq \textsf{opt}/\textsf{poly}(n)$, into a strong learner which outputs a superposition of parity states $|\phi'\rangle$ such that $|\langle \phi'|\psi\rangle|^2\geq \textsf{opt} - \varepsilon$. Using quantum agnostic boosting, we obtain the first near-polynomial time $n^{O(\log \log n)}$ algorithm for learning $\textsf{poly}(n)$-sized depth-$3$ circuits (consisting of $\textsf{AND}$, $\textsf{OR}$, $\textsf{NOT}$ gates) in the uniform quantum $\textsf{PAC}$ model using quantum examples. Classically, the analogue of efficient learning depth-$3$ circuits (and even depth-$2$ circuits) in the uniform $\textsf{PAC}$ model has been a longstanding open question in computational learning theory. Our work nearly settles this question, when the learner is given quantum examples.
摘要:We initiate the study of quantum agnostic learning of phase states with respect to a function class $\mathsf{C}\subseteq \{c:\{0,1\}^n\rightarrow \{0,1\}\}$: given copies of an unknown $n$-qubit state $|\psi\rangle$ which has fidelity $\textsf{opt}$ with a phase state $|\phi_c\rangle=\frac{1}{\sqrt{2^n}}\sum_{x\in \{0,1\}^n}(-1)^{c(x)}|x\rangle$ for some $c\in \mathsf{C}$, output $|\phi\rangle$ which has fidelity $|\langle \phi | \psi \rangle|^2 \geq \textsf{opt}-\varepsilon$. To this end, we give agnostic learning protocols for the following classes: (i) Size-$t$ decision trees which runs in time $\textsf{poly}(n,t,1/\varepsilon)$. This also implies $k$-juntas can be agnostically learned in time $\textsf{poly}(n,2^k,1/\varepsilon)$. (ii) $s$-term DNF formulas in near-polynomial time $\textsf{poly}(n,(s/\varepsilon)^{\log \log s/\varepsilon})$. Our main technical contribution is a quantum agnostic boosting protocol which converts a weak agnostic learner, which outputs a parity state $|\phi\rangle$ such that $|\langle \phi|\psi\rangle|^2\geq \textsf{opt}/\textsf{poly}(n)$, into a strong learner which outputs a superposition of parity states $|\phi'\rangle$ such that $|\langle \phi'|\psi\rangle|^2\geq \textsf{opt} - \varepsilon$. Using quantum agnostic boosting, we obtain the first near-polynomial time $n^{O(\log \log n)}$ algorithm for learning $\textsf{poly}(n)$-sized depth-$3$ circuits (consisting of $\textsf{AND}$, $\textsf{OR}$, $\textsf{NOT}$ gates) in the uniform quantum $\textsf{PAC}$ model using quantum examples. Classically, the analogue of efficient learning depth-$3$ circuits (and even depth-$2$ circuits) in the uniform $\textsf{PAC}$ model has been a longstanding open question in computational learning theory. Our work nearly settles this question, when the learner is given quantum examples.
联邦学习|隐私保护|加密(3篇)
【1】Who to Trust? Aggregating Client Knowledge in Logit-Based Federated Learning
标题:该相信谁?在基于日志的联邦学习中聚合客户知识
链接:https://arxiv.org/abs/2509.15147
作者:valchuk, Nikita Kotelevskii, Maxim Panov, Samuel Horváth, Martin Takáč
摘要:联邦学习(FL)通常共享模型权重或梯度,这对于大型模型来说是昂贵的。基于Logit的FL通过仅共享在公共代理数据集上计算的Logit来降低此成本。然而,聚合来自异构客户端的信息仍然具有挑战性。本文研究了这一问题,介绍并比较了三种logit聚集方法:简单平均法、不确定加权平均法和学习元聚集器。在MNIST和CIFAR-10上进行了评估,这些方法减少了通信开销,提高了非IID数据下的鲁棒性,并实现了与集中式训练相竞争的准确性。
摘要:Federated learning (FL) usually shares model weights or gradients, which is costly for large models. Logit-based FL reduces this cost by sharing only logits computed on a public proxy dataset. However, aggregating information from heterogeneous clients is still challenging. This paper studies this problem, introduces and compares three logit aggregation methods: simple averaging, uncertainty-weighted averaging, and a learned meta-aggregator. Evaluated on MNIST and CIFAR-10, these methods reduce communication overhead, improve robustness under non-IID data, and achieve accuracy competitive with centralized training.
【2】Towards Privacy-Preserving and Heterogeneity-aware Split Federated Learning via Probabilistic Masking
标题:通过概率掩蔽实现隐私保护和异类感知的分裂联邦学习
链接:https://arxiv.org/abs/2509.14603
作者:Wang, Feijie Wu, Chenglin Miao, Tianchun Li, Haoyu Hu, Qiming Cao, Jing Gao, Lu Su
摘要:分裂联邦学习(SFL)已经成为传统联邦学习(FL)的有效替代方案,通过模型划分减少客户端计算。然而,中间激活和模型更新的交换引入了显著的隐私风险,特别是来自从中间表示恢复原始输入的数据重构攻击。使用噪声注入的现有防御经常降低模型性能。为了克服这些挑战,我们提出了PM-SFL,这是一个可扩展和隐私保护的SFL框架,它结合了概率掩码训练来增加结构化的随机性,而不依赖于显式噪声。这在保持模型实用性的同时减轻了数据重建风险。为了解决数据异构性,PM-SFL采用个性化的掩码学习,为每个客户端的本地数据定制子模型结构。针对系统的异构性,引入了分层知识补偿机制,使具有不同资源的客户能够在自适应模型分割下有效地参与。理论分析证实了它的隐私保护,图像和无线传感任务的实验表明,PM-SFL不断提高准确性,通信效率和隐私攻击的鲁棒性,特别是在数据和系统异构性下的强大性能。
摘要:Split Federated Learning (SFL) has emerged as an efficient alternative to traditional Federated Learning (FL) by reducing client-side computation through model partitioning. However, exchanging of intermediate activations and model updates introduces significant privacy risks, especially from data reconstruction attacks that recover original inputs from intermediate representations. Existing defenses using noise injection often degrade model performance. To overcome these challenges, we present PM-SFL, a scalable and privacy-preserving SFL framework that incorporates Probabilistic Mask training to add structured randomness without relying on explicit noise. This mitigates data reconstruction risks while maintaining model utility. To address data heterogeneity, PM-SFL employs personalized mask learning that tailors submodel structures to each client's local data. For system heterogeneity, we introduce a layer-wise knowledge compensation mechanism, enabling clients with varying resources to participate effectively under adaptive model splitting. Theoretical analysis confirms its privacy protection, and experiments on image and wireless sensing tasks demonstrate that PM-SFL consistently improves accuracy, communication efficiency, and robustness to privacy attacks, with particularly strong performance under data and system heterogeneity.
【3】FedAVOT: Exact Distribution Alignment in Federated Learning via Masked Optimal Transport
标题:FedADOT:通过掩蔽最优传输实现联邦学习中的精确分布对齐
链接:https://arxiv.org/abs/2509.14444
作者:SeyedAbolfazl)Rahimi, Dionysis Kalogerias
备注:5 pages, 1 figure, ICASSP
摘要
:联邦学习(FL)允许分布式模型训练,而无需共享原始数据,但当客户端参与部分时会受到影响。在实践中,可用用户的分布(\ldblquote {availability distribution} $q$)很少与定义优化目标的分布(\ldblquote {importance distribution} $p$)保持一致,这导致了经典FedAvg下有偏差和不稳定的更新。我们提出了\textbf{FedAVOT},它将聚合表示为一个匹配$q$和$p$的掩码最优传输问题。使用Sinkhorn缩放,\textbf{FedAVOT}计算基于传输的聚合权重,并具有可证明的收敛保证。\textbf{FedAVOT}在非光滑凸FL设置下实现标准$\mathcal{O}(1/\sqrt{T})$速率,与每轮参与用户的数量无关。我们的实验证实,即使每轮只有两个客户端参与,在异构,公平敏感和低可用性的制度相比,FedAvg大大提高了性能。
摘要:Federated Learning (FL) allows distributed model training without sharing raw data, but suffers when client participation is partial. In practice, the distribution of available users (\emph{availability distribution} $q$) rarely aligns with the distribution defining the optimization objective (\emph{importance distribution} $p$), leading to biased and unstable updates under classical FedAvg. We propose \textbf{Fereated AVerage with Optimal Transport (\textbf{FedAVOT})}, which formulates aggregation as a masked optimal transport problem aligning $q$ and $p$. Using Sinkhorn scaling, \textbf{FedAVOT} computes transport-based aggregation weights with provable convergence guarantees. \textbf{FedAVOT} achieves a standard $\mathcal{O}(1/\sqrt{T})$ rate under a nonsmooth convex FL setting, independent of the number of participating users per round. Our experiments confirm drastically improved performance compared to FedAvg across heterogeneous, fairness-sensitive, and low-availability regimes, even when only two clients participate per round.
推理|分析|理解|解释(11篇)
【1】MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration
标题:MaRVIn:用于DNN推理的跨层混合精度RISC-V框架,从ISA扩展到硬件加速
链接:https://arxiv.org/abs/2509.15187
作者:rmeniakos, Alexis Maras, Sotirios Xydis, Dimitrios Soudris
备注:Accepted for publication by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, March 2025
摘要:量子化和混合精度技术的发展为提高NN的速度和能效提供了新的可能性。最近的几项研究表明,在不同参数之间调整精度水平可以保持与全精度模型相当的精度,同时显着降低计算需求。然而,现有的嵌入式微处理器缺乏足够的架构支持有效地执行混合精度NN,无论是在ISA扩展和硬件设计方面,导致效率低下,如过多的数据打包/解包和未充分利用的算术单元。在这项工作中,我们提出了新的ISA扩展和微架构实现,专门用于优化混合精度执行,从而在RISC-V架构上实现节能的深度学习推理。我们介绍了MaRVIn,这是一个跨层软硬件协同设计框架,通过硬件改进、混合精度量化、ISA级优化和周期精确仿真的组合来提高能效和性能。在硬件层面,我们增强了ALU与可配置的混合精度算术(2,4,8位)的权重/激活,并采用多泵,以减少执行延迟,同时实现有效的2位操作的软SIMD。在软件层面,我们集成了一个修剪意识微调方法来优化模型压缩和基于贪婪的DSE方法,有效地搜索帕累托最优的混合量化模型。此外,我们还采用了电压缩放来提高系统的功率效率。我们对广泛使用的DNN和数据集(如CIFAR 10和ImageNet)的实验评估表明,我们的框架平均可以实现17.6倍的加速,准确率损失不到1%,并且优于ISA不可知的最先进的RISC-V内核,提供高达1.8 TOPs/W。
摘要:The evolution of quantization and mixed-precision techniques has unlocked new possibilities for enhancing the speed and energy efficiency of NNs. Several recent studies indicate that adapting precision levels across different parameters can maintain accuracy comparable to full-precision models while significantly reducing computational demands. However, existing embedded microprocessors lack sufficient architectural support for efficiently executing mixed-precision NNs, both in terms of ISA extensions and hardware design, resulting in inefficiencies such as excessive data packing/unpacking and underutilized arithmetic units. In this work, we propose novel ISA extensions and a micro-architecture implementation specifically designed to optimize mixed-precision execution, enabling energy-efficient deep learning inference on RISC-V architectures. We introduce MaRVIn, a cross-layer hardware-software co-design framework that enhances power efficiency and performance through a combination of hardware improvements, mixed-precision quantization, ISA-level optimizations, and cycle-accurate emulation. At the hardware level, we enhance the ALU with configurable mixed-precision arithmetic (2, 4, 8 bits) for weights/activations and employ multi-pumping to reduce execution latency while implementing soft SIMD for efficient 2-bit ops. At the software level, we integrate a pruning-aware fine-tuning method to optimize model compression and a greedy-based DSE approach to efficiently search for Pareto-optimal mixed-quantized models. Additionally, we incorporate voltage scaling to boost the power efficiency of our system. Our experimental evaluation over widely used DNNs and datasets, such as CIFAR10 and ImageNet, demonstrates that our framework can achieve, on average, 17.6x speedup for less than 1% accuracy loss and outperforms the ISA-agnostic state-of-the-art RISC-V cores, delivering up to 1.8 TOPs/W.
【2】Blockchain-Enabled Explainable AI for Trusted Healthcare Systems
标题:支持区块链的可解释人工智能,用于可信的医疗保健系统
链接:https://arxiv.org/abs/2509.14987
作者:Mohsin
备注:6 Pages, 4 Figures
摘要:本文介绍了一种用于医疗保健系统的区块链集成可解释人工智能框架(BXHF),以解决健康信息网络面临的两个基本挑战:安全的数据交换和可理解的人工智能驱动的临床决策。我们的架构结合了区块链,确保患者记录是不可变的,可审计的,防篡改的,以及可解释的AI(XAI)方法,产生透明和临床相关的模型预测。通过将安全保证和可解释性要求纳入统一的优化管道,BXHF确保了数据级信任(通过验证和加密的记录共享)和决策级信任(通过可审计和临床一致的解释)。其混合边缘云架构允许跨不同机构进行联合计算,在保护患者隐私的同时实现协作分析。我们通过跨境临床研究网络、罕见疾病检测和高风险干预决策支持等用例展示了该框架的适用性。通过确保透明度、可扩展性和法规遵从性,BXHF提高了AI在医疗保健领域的可信度、使用率和有效性,为更安全、更可靠的临床决策奠定了基础。
摘要
:This paper introduces a Blockchain-Integrated Explainable AI Framework (BXHF) for healthcare systems to tackle two essential challenges confronting health information networks: safe data exchange and comprehensible AI-driven clinical decision-making. Our architecture incorporates blockchain, ensuring patient records are immutable, auditable, and tamper-proof, alongside Explainable AI (XAI) methodologies that yield transparent and clinically relevant model predictions. By incorporating security assurances and interpretability requirements into a unified optimization pipeline, BXHF ensures both data-level trust (by verified and encrypted record sharing) and decision-level trust (with auditable and clinically aligned explanations). Its hybrid edge-cloud architecture allows for federated computation across different institutions, enabling collaborative analytics while protecting patient privacy. We demonstrate the framework's applicability through use cases such as cross-border clinical research networks, uncommon illness detection and high-risk intervention decision support. By ensuring transparency, auditability, and regulatory compliance, BXHF improves the credibility, uptake, and effectiveness of AI in healthcare, laying the groundwork for safer and more reliable clinical decision-making.
【3】FAWN: A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference
标题:FAWN:用于集成传感和通信室内场景推理的多编码器融合注意力波网络
链接:https://arxiv.org/abs/2509.14968
作者:rroso-Fernández, Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos J. Bernardos
备注:7 pages, 6 figures and tables, less than 5500 words. Under revision at IEEE Communication Magazine
摘要:即将到来的几代无线技术预示着一个万物互联和智能化的时代。随着对智能需求的增长,网络必须学会更好地理解物理世界。然而,部署专用硬件来感知环境并不总是可行的,主要是由于成本和/或复杂性。集成传感和通信(ISAC)在应对这一挑战方面向前迈出了一步。在ISAC中,被动传感作为一种具有成本效益的解决方案出现,它可以重复使用无线通信来感知环境,而不会干扰现有的通信。然而,目前的大多数解决方案仅限于一种技术(主要是Wi-Fi或5G),限制了可达到的最大精度。由于不同的技术适用于不同的频谱,我们认为有必要整合一种以上的技术来扩大覆盖范围。因此,我们利用ISAC被动传感的优势,提出FAWN,一个多编码器融合注意波网络ISAC室内场景推理。FAWN基于原始的Transformers架构,融合来自Wi-Fi和5G的信息,使网络能够在不干扰当前通信的情况下理解物理世界。为了测试我们的解决方案,我们构建了一个原型并将其集成到真实场景中。结果显示,误差小于0.6米约84%的时间。
摘要:The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to costs and/or complexity. Integrated Sensing and Communication (ISAC) has made a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment, without interfering with existing communications. Nevertheless, the majority of current solutions are limited to one technology (mostly Wi-Fi or 5G), constraining the maximum accuracy reachable. As different technologies work with different spectrums, we see a necessity in integrating more than one technology to augment the coverage area. Hence, we take the advantage of ISAC passive sensing, to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original transformers architecture, to fuse information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it in a real scenario. Results show errors below 0.6 m around 84% of times.
【4】ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
标题:ProtoMedX:迈向骨健康分类的可解释多模式原型学习
链接:https://arxiv.org/abs/2509.14830
作者:pez Pellicer, Andre Mariucci, Plamen Angelov, Marwan Bukhari, Jemma G. Kerns
备注:Accepted ICCV 2025. Adaptation, Fairness, Explainability in AI Medical Imaging (PHAROS-AFE-AIMI Workshop). 8 pages, 5 figures, 4 tables
摘要:骨健康研究在早期发现和治疗骨质减少和骨质疏松症的医疗实践中至关重要。临床医生通常根据密度测定(DEXA扫描)和患者病史做出诊断。AI在这一领域的应用正在进行研究。大多数成功的方法依赖于仅使用视觉(DEXA/X射线图像)的深度学习模型,并专注于预测准确性,而可解释性通常被忽视,并留给输入贡献的事后评估。我们提出了ProtoMedX,一个多模态模型,使用两个DEXA扫描的腰椎和病人的记录。ProtoMedX基于原型的架构可以通过设计来解释,这对于医疗应用至关重要,特别是在即将出台的欧盟人工智能法案的背景下,因为它允许明确分析模型决策,包括不正确的决策。ProtoMedX在骨骼健康分类方面展示了最先进的性能,同时还提供了临床医生可以直观理解的解释。使用4,160名真实NHS患者的数据集,所提出的ProtoMedX在仅视觉任务中实现了87.58%的准确率,在其多模态变体中实现了89.8%的准确率,两者都超过了现有的已发表方法。
摘要:Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. The applications of AI in this field are ongoing research. Most successful methods rely on deep learning models that use vision alone (DEXA/X-ray imagery) and focus on prediction accuracy, while explainability is often disregarded and left to post hoc assessments of input contributions. We propose ProtoMedX, a multi-modal model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX's prototype-based architecture is explainable by design, which is crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of model decisions, including incorrect ones. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that can be visually understood by clinicians. Using a dataset of 4,160 real NHS patients, the proposed ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both surpassing existing published methods.
【5】Transcoder-based Circuit Analysis for Interpretable Single-Cell Foundation Models
标题:可解释单细胞基础模型的基于变编码器的电路分析
链接:https://arxiv.org/abs/2509.14723
作者:sokawa, Toshiharu Kawakami, Satoshi Kodera, Masamichi Ito, Norihiko Takeda
摘要
:单细胞基础模型(scFM)已经在各种任务上表现出最先进的性能,例如细胞类型注释和扰动响应预测,通过从大规模转录组数据中学习基因调控网络。然而,一个重大的挑战仍然存在:这些模型的决策过程是不太可解释的相比,传统的方法,如差异基因表达分析。最近,代码转换器已经成为一种很有前途的方法,用于从大型语言模型(LLM)中提取可解释的决策电路。在这项工作中,我们训练一个转码的cell2sentence(C2S)模型,一个国家的最先进的scFM。通过利用经过训练的代码转换器,我们从C2S模型中提取内部决策电路。我们证明了所发现的电路与现实世界的生物机制相对应,证实了代码转换器在复杂的单细胞模型中发现生物学上合理的途径的潜力。
摘要:Single-cell foundation models (scFMs) have demonstrated state-of-the-art performance on various tasks, such as cell-type annotation and perturbation response prediction, by learning gene regulatory networks from large-scale transcriptome data. However, a significant challenge remains: the decision-making processes of these models are less interpretable compared to traditional methods like differential gene expression analysis. Recently, transcoders have emerged as a promising approach for extracting interpretable decision circuits from large language models (LLMs). In this work, we train a transcoder on the cell2sentence (C2S) model, a state-of-the-art scFM. By leveraging the trained transcoder, we extract internal decision-making circuits from the C2S model. We demonstrate that the discovered circuits correspond to real-world biological mechanisms, confirming the potential of transcoders to uncover biologically plausible pathways within complex single-cell models.
【6】Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
标题:理解推理模型的思维过程:舍恩菲尔德情景理论的视角
链接:https://arxiv.org/abs/2509.14662
作者:Nan Zhang, Chenrui Fan, Hong Jiao, Yanbin Fu, Sydney Peters, Qingshu Xu, Robert Lissitz, Tianyi Zhou
备注:EMNLP2025 main, Camera-ready
摘要:虽然大型推理模型(LRM)产生了广泛的思想链推理,但我们缺乏一个原则性的框架来理解这些思想是如何构建的。在本文中,我们介绍了一种新的方法,应用Schoenfeld的情节理论,一个经典的人类数学问题解决的认知框架,来分析LRM的推理痕迹。我们使用七个认知标签(例如,计划、实施、验证)。其结果是第一个公开可用的机器推理细粒度分析基准,包括大型注释语料库和详细的注释指南。我们的初步分析揭示了LRM推理的不同模式,如认知状态之间的转换动力学。这个框架提供了一个理论上的接地方法来解释LRM认知,并使未来的工作更可控和透明的推理系统。
摘要:While Large Reasoning Models (LRMs) generate extensive chain-of-thought reasoning, we lack a principled framework for understanding how these thoughts are structured. In this paper, we introduce a novel approach by applying Schoenfeld's Episode Theory, a classic cognitive framework for human mathematical problem-solving, to analyze the reasoning traces of LRMs. We annotated thousands of sentences and paragraphs from model-generated solutions to math problems using seven cognitive labels (e.g., Plan, Implement, Verify). The result is the first publicly available benchmark for the fine-grained analysis of machine reasoning, including a large annotated corpus and detailed annotation guidebooks. Our preliminary analysis reveals distinct patterns in LRM reasoning, such as the transition dynamics between cognitive states. This framework provides a theoretically grounded methodology for interpreting LRM cognition and enables future work on more controllable and transparent reasoning systems.
【7】H-Alpha Anomalyzer: An Explainable Anomaly Detector for Solar H-Alpha Observations
标题:H-Alpha异常探测器:太阳H-Alpha观测的可解释异常探测器
链接:https://arxiv.org/abs/2509.14472
作者:zaei, Azim Ahmadzadeh, Alexei Pevtsov, Luca Bertello, Alexander Pevtsov
摘要:过多的空间和地面观测站为天体物理学家提供了前所未有的数据量,这些数据只能使用先进的计算算法进行大规模处理。因此,确保输入机器学习(ML)模型的数据质量至关重要。来自GONG网络的H$\alpha$观测值代表了一个这样的数据流,自2010年以来,每分钟产生几个观测值,24/7。在这项研究中,我们引入了一个轻量级的(非ML)异常检测算法,称为H-Alpha Anomalyzer,旨在根据用户定义的标准来识别异常观测。与许多黑箱算法不同,我们的方法突出显示了哪些区域触发了异常标志,并量化了相应的异常可能性。为了进行比较分析,我们还创建并发布了一个包含2,000个观察结果的数据集,平均分为异常和非异常情况。我们的研究结果表明,该模型不仅优于现有的方法,但也提供了可解释性,使领域专家的定性评价。
摘要:The plethora of space-borne and ground-based observatories has provided astrophysicists with an unprecedented volume of data, which can only be processed at scale using advanced computing algorithms. Consequently, ensuring the quality of data fed into machine learning (ML) models is critical. The H$\alpha$ observations from the GONG network represent one such data stream, producing several observations per minute, 24/7, since 2010. In this study, we introduce a lightweight (non-ML) anomaly-detection algorithm, called H-Alpha Anomalyzer, designed to identify anomalous observations based on user-defined criteria. Unlike many black-box algorithms, our approach highlights exactly which regions triggered the anomaly flag and quantifies the corresponding anomaly likelihood. For our comparative analysis, we also created and released a dataset of 2,000 observations, equally divided between anomalous and non-anomalous cases. Our results demonstrate that the proposed model not only outperforms existing methods but also provides explainability, enabling qualitative evaluation by domain experts.
【8】eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations
标题:eIQ Neutron:通过集成的NPU和更轻松的创新重新定义边缘AI推理
链接:https://arxiv.org/abs/2509.14388
作者:amberg, Filippo Minnella, Roberto Bosio, Fabrizio Ottati, Yuebin Wang, Jongmin Lee, Luciano Lavagno, Adam Fuks
备注:Submitted to IEEE Transactions on Computers
摘要:神经处理单元(NPU)是在资源受限的边缘环境中实现高效AI推理的关键。虽然峰值每秒千万亿次操作(TOPS)通常用于衡量性能,但它不能很好地反映实际性能,并且通常与更高的硅成本相关。为了解决这个问题,架构师必须专注于最大化计算利用率,而不牺牲灵活性。本文介绍了eIQ中子效率NPU,集成到一个商业旗舰MPU,以及协同设计的编译器算法。该架构采用灵活的数据驱动设计,而编译器则使用约束编程方法来根据工作负载特性优化计算和数据移动。与领先的嵌入式NPU和编译器堆栈相比,我们的解决方案在标准AI基准测试中以相同的TOPS和内存资源实现了1.8倍的平均加速比(4倍峰值)。即使针对计算和内存资源加倍的NPU,Neutron也能提供高达3.3倍的性能。
摘要
:Neural Processing Units (NPUs) are key to enabling efficient AI inference in resource-constrained edge environments. While peak tera operations per second (TOPS) is often used to gauge performance, it poorly reflects real-world performance and typically rather correlates with higher silicon cost. To address this, architects must focus on maximizing compute utilization, without sacrificing flexibility. This paper presents the eIQ Neutron efficient-NPU, integrated into a commercial flagship MPU, alongside co-designed compiler algorithms. The architecture employs a flexible, data-driven design, while the compiler uses a constrained programming approach to optimize compute and data movement based on workload characteristics. Compared to the leading embedded NPU and compiler stack, our solution achieves an average speedup of 1.8x (4x peak) at equal TOPS and memory resources across standard AI-benchmarks. Even against NPUs with double the compute and memory resources, Neutron delivers up to 3.3x higher performance.
【9】Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches
标题:定义、理解和检测在线毒性:挑战和机器学习方法
链接:https://arxiv.org/abs/2509.14264
作者:shore Shahi, Tim A. Majchrzak
备注:Paper is accepted at LNCS Porceedings
摘要:网络有毒内容已成为一种普遍现象,在危机、选举和社会动荡时期愈演愈烈。大量的研究集中在使用机器学习方法检测或分析有毒内容。有毒内容在数字平台上的扩散促使人们对自动检测机制进行广泛研究,这主要是由机器学习和自然语言处理的进步推动的。总体而言,本研究综合了140篇关于数字平台上不同类型有毒内容的出版物。我们全面概述了以前研究中使用的数据集,重点是定义,数据源,挑战和机器学习方法,用于检测在线毒性,如仇恨言论,攻击性语言和有害话语。该数据集包含32种语言的内容,涵盖选举、自发事件和危机等主题。我们研究了使用现有的跨平台数据来提高分类模型性能的可能性。我们提出的建议和指导方针,新的研究对网上有毒的同意和使用的内容节制缓解。最后,我们提出了一些实用的指导方针,以减少在线平台的有毒内容。
摘要:Online toxic content has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. A significant amount of research has been focused on detecting or analyzing toxic content using machine-learning approaches. The proliferation of toxic content across digital platforms has spurred extensive research into automated detection mechanisms, primarily driven by advances in machine learning and natural language processing. Overall, the present study represents the synthesis of 140 publications on different types of toxic content on digital platforms. We present a comprehensive overview of the datasets used in previous studies focusing on definitions, data sources, challenges, and machine learning approaches employed in detecting online toxicity, such as hate speech, offensive language, and harmful discourse. The dataset encompasses content in 32 languages, covering topics such as elections, spontaneous events, and crises. We examine the possibility of using existing cross-platform data to improve the performance of classification models. We present the recommendations and guidelines for new research on online toxic consent and the use of content moderation for mitigation. Finally, we present some practical guidelines to mitigate toxic content from online platforms.
【10】Learning Rate Should Scale Inversely with High-Order Data Moments in High-Dimensional Online Independent Component Analysis
标题:在多维在线独立成分分析中,学习率应通过高位数据矩来进行不确定性缩放
链接:https://arxiv.org/abs/2509.15127
作者:n Gultekin, Samet Demir, Zafer Dogan
备注:MLSP 2025, 6 pages, 3 figures
摘要:我们调查的影响,高阶矩的学习动态的在线独立分量分析(ICA)算法下的高维数据模型组成的两个非高斯随机变量的加权和。该模型允许通过加权参数对输入力矩结构进行精确控制。基于现有的基于常微分方程(ODE)的分析在高维极限,我们证明,随着高阶矩的增加,该算法表现出较慢的收敛速度,并需要更低的学习率和更大的初始对齐,以实现信息的解决方案。我们的研究结果突出了该算法的敏感性的统计结构的输入数据,特别是其矩特性。此外,ODE框架揭示了一个关键的学习速率阈值时,时刻接近其最大值的学习。这些见解激发了矩感知初始化和自适应学习率策略的未来方向,以抵消由高非高斯性引起的学习速度下降,从而提高ICA在复杂高维环境中的鲁棒性和效率。
摘要:We investigate the impact of high-order moments on the learning dynamics of an online Independent Component Analysis (ICA) algorithm under a high-dimensional data model composed of a weighted sum of two non-Gaussian random variables. This model allows precise control of the input moment structure via a weighting parameter. Building on an existing ordinary differential equation (ODE)-based analysis in the high-dimensional limit, we demonstrate that as the high-order moments increase, the algorithm exhibits slower convergence and demands both a lower learning rate and greater initial alignment to achieve informative solutions. Our findings highlight the algorithm's sensitivity to the statistical structure of the input data, particularly its moment characteristics. Furthermore, the ODE framework reveals a critical learning rate threshold necessary for learning when moments approach their maximum. These insights motivate future directions in moment-aware initialization and adaptive learning rate strategies to counteract the degradation in learning speed caused by high non-Gaussianity, thereby enhancing the robustness and efficiency of ICA in complex, high-dimensional settings.
【11】SpeechOp: Inference-Time Task Composition for Generative Speech Processing
标题:SpeechOp:生成式语音处理的推理时任务合成
链接:https://arxiv.org/abs/2509.14298
作者:velace, Rithesh Kumar, Jiaqi Su, Ke Chen, Kilian Q Weinberger, Zeyu Jin
摘要
:虽然生成文本到语音(TTS)系统利用大量的“野外”数据取得了显着的成功,但语音到语音处理任务(如增强)面临数据限制,这导致数据饥饿的生成方法扭曲语音内容和说话人身份。为了弥合这一差距,我们提出了SpeechOp,这是一种多任务潜在扩散模型,它将预先训练的TTS模型转换为能够执行各种语音任务并在推理时以新颖方式组成的通用语音处理器。通过调整预训练的TTS模型,SpeechOp继承了对自然语音的丰富理解,加速了训练并提高了S2S任务质量,同时增强了核心TTS性能。最后,我们介绍了内隐任务合成(ITC),一种新的管道,其中ASR衍生的转录本(例如,来自Whisper)通过我们的原则性推理时间任务组合来指导SpeechOp的增强。ITC通过将网络规模的语音理解与SpeechOp的生成功能相结合,实现了最先进的内容保存。音频样本可在https://justinlovelace.github.io/projects/speechop上获得
摘要:While generative Text-to-Speech (TTS) systems leverage vast ``in-the-wild" data to achieve remarkable success, speech-to-speech processing tasks like enhancement face data limitations, which lead data-hungry generative approaches to distort speech content and speaker identity. To bridge this gap, we present SpeechOp, a multi-task latent diffusion model that transforms pre-trained TTS models into a universal speech processor capable of performing a wide range of speech tasks and composing them in novel ways at inference time. By adapting a pre-trained TTS model, SpeechOp inherits a rich understanding of natural speech, accelerating training and improving S2S task quality, while simultaneously enhancing core TTS performance. Finally, we introduce Implicit Task Composition (ITC), a novel pipeline where ASR-derived transcripts (e.g., from Whisper) guide SpeechOp's enhancement via our principled inference-time task composition. ITC achieves state-of-the-art content preservation by robustly combining web-scale speech understanding with SpeechOp's generative capabilities. Audio samples are available at https://justinlovelace.github.io/projects/speechop
检测相关(5篇)
【1】AnoF-Diff: One-Step Diffusion-Based Anomaly Detection for Forceful Tool Use
标题:AnoF-Diff:基于扩散的一步异常检测,用于强制工具使用
链接:https://arxiv.org/abs/2509.15153
作者:n, Zixuan Huang, Fan Yang, Dmitry Berenson
摘要:多变量时间序列异常检测是识别突发事件的关键,在机器学习领域已经探索了几十年。然而,直接将这些方法应用于来自强制工具使用任务的数据具有挑战性,因为现实世界中的流传感器数据往往具有固有的噪音,表现出非平稳行为,并且在不同的任务和工具中有所不同。为了解决这些挑战,我们提出了一种方法,AnoF-Diff,基于扩散模型从时间序列数据中提取力-扭矩特征,并使用力-扭矩特征来检测异常。我们比较我们的方法与其他国家的最先进的方法在F1分数和面积下的接收器操作特征曲线(AUROC)的四个有力的工具使用任务,表明我们的方法具有更好的性能,是更强大的噪声数据集。我们还提出了基于一步扩散的并行异常评分评估方法,并在几个有力的工具使用实验中演示了我们的方法如何用于在线异常检测。
摘要:Multivariate time-series anomaly detection, which is critical for identifying unexpected events, has been explored in the field of machine learning for several decades. However, directly applying these methods to data from forceful tool use tasks is challenging because streaming sensor data in the real world tends to be inherently noisy, exhibits non-stationary behavior, and varies across different tasks and tools. To address these challenges, we propose a method, AnoF-Diff, based on the diffusion model to extract force-torque features from time-series data and use force-torque features to detect anomalies. We compare our method with other state-of-the-art methods in terms of F1-score and Area Under the Receiver Operating Characteristic curve (AUROC) on four forceful tool-use tasks, demonstrating that our method has better performance and is more robust to a noisy dataset. We also propose the method of parallel anomaly score evaluation based on one-step diffusion and demonstrate how our method can be used for online anomaly detection in several forceful tool use experiments.
【2】Synthetic-to-Real Object Detection using YOLOv11 and Domain Randomization Strategies
标题:使用YOLOv 11和域随机化策略进行合成到真实对象检测
链接:https://arxiv.org/abs/2509.15045
作者:quato Niño, Hamza A. A. Gardi
摘要:本文讨论了对象检测中的合成域与真实域之间的差距,重点是训练YOLOv11模型,仅使用合成数据和域随机化策略来检测特定对象(汤罐)。该方法涉及大量的数据增强,数据集组成和模型缩放实验。虽然合成验证指标一直很高,但它们被证明是现实世界性能的不良预测指标。因此,还通过对预测的目视检查对模型进行了定性评估,并在手动标记的真实世界测试集上进行了定量评估,以指导开发。最终mAP@50分数由官方Kaggle竞赛提供。主要研究结果表明,增加合成数据集的多样性,特别是通过包括不同的视角和复杂的背景,结合精心调整的数据增强,对于弥合领域差距至关重要。性能最好的配置是在扩展和多样化的数据集上训练的YOLOv11l模型,在比赛的隐藏测试集上达到了0.910的最终mAP@50。这一结果表明了仅合成训练方法的潜力,同时也突出了在完全捕获现实世界的变化性方面仍然存在的挑战。
摘要:This paper addresses the synthetic-to-real domain gap in object detection, focusing on training a YOLOv11 model to detect a specific object (a soup can) using only synthetic data and domain randomization strategies. The methodology involves extensive experimentation with data augmentation, dataset composition, and model scaling. While synthetic validation metrics were consistently high, they proved to be poor predictors of real-world performance. Consequently, models were also evaluated qualitatively, through visual inspection of predictions, and quantitatively, on a manually labeled real-world test set, to guide development. Final mAP@50 scores were provided by the official Kaggle competition. Key findings indicate that increasing synthetic dataset diversity, specifically by including varied perspectives and complex backgrounds, combined with carefully tuned data augmentation, were crucial in bridging the domain gap. The best performing configuration, a YOLOv11l model trained on an expanded and diverse dataset, achieved a final mAP@50 of 0.910 on the competition's hidden test set. This result demonstrates the potential of a synthetic-only training approach while also highlighting the remaining challenges in fully capturing real-world variability.
【3】Credit Card Fraud Detection
标题:信用卡欺诈检测
链接:https://arxiv.org/abs/2509.15044
作者:a, Hamza A. A. Gardi
摘要:信用卡欺诈仍然是一个重大挑战,由于阶级不平衡和欺诈者模仿合法行为。本研究使用欠采样、SMOTE和混合方法在真实数据集上评估了五种机器学习模型- Logistic回归、随机森林、XGBoost、K最近邻(KNN)和多层感知器(MLP)。我们的模型在原始的不平衡测试集上进行评估,以更好地反映真实世界的性能。实验结果表明,该方法在查全率和查准率之间取得了最佳的平衡,尤其是提高了MLP和KNN的性能。
摘要
:Credit card fraud remains a significant challenge due to class imbalance and fraudsters mimicking legitimate behavior. This study evaluates five machine learning models - Logistic Regression, Random Forest, XGBoost, K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) on a real-world dataset using undersampling, SMOTE, and a hybrid approach. Our models are evaluated on the original imbalanced test set to better reflect real-world performance. Results show that the hybrid method achieves the best balance between recall and precision, especially improving MLP and KNN performance.
【4】Beyond Marginals: Learning Joint Spatio-Temporal Patterns for Multivariate Anomaly Detection
标题:超越边缘:学习联合时空模式进行多元异常检测
链接:https://arxiv.org/abs/2509.15033
作者: Roy, Almuatazbellah Boker, Lamine Mili
备注:None
摘要:在本文中,我们的目标是提高多元异常检测(AD)的建模\textit{时变非线性时空相关性}在多元时间序列数据中发现。在多变量时间序列数据中,异常可以由相关时间序列与其预期的集体行为的同时偏差指示,即使没有单独的时间序列本身表现出明显的异常模式。在许多现有的方法中,时间序列变量被假设为(有条件)独立的,这过度简化了现实世界的相互作用。我们的方法通过对潜在空间中的联合依赖关系进行建模,并将\textit{边际分布,时间动态和变量间依赖关系}的建模解耦来解决这个问题。我们使用Transformer编码器来捕获时间模式,并对空间(变量间)依赖性进行建模,我们拟合多变量似然和copula。时间和空间分量在潜在空间中使用自监督对比学习目标进行联合训练,以学习有意义的特征表示来分离正常和异常样本。
摘要:In this paper, we aim to improve multivariate anomaly detection (AD) by modeling the \textit{time-varying non-linear spatio-temporal correlations} found in multivariate time series data . In multivariate time series data, an anomaly may be indicated by the simultaneous deviation of interrelated time series from their expected collective behavior, even when no individual time series exhibits a clearly abnormal pattern on its own. In many existing approaches, time series variables are assumed to be (conditionally) independent, which oversimplifies real-world interactions. Our approach addresses this by modeling joint dependencies in the latent space and decoupling the modeling of \textit{marginal distributions, temporal dynamics, and inter-variable dependencies}. We use a transformer encoder to capture temporal patterns, and to model spatial (inter-variable) dependencies, we fit a multi-variate likelihood and a copula. The temporal and the spatial components are trained jointly in a latent space using a self-supervised contrastive learning objective to learn meaningful feature representations to separate normal and anomaly samples.
【5】Benefits of Online Tilted Empirical Risk Minimization: A Case Study of Outlier Detection and Robust Regression
标题:在线倾斜经验风险最小化的好处:异常值检测和稳健回归的案例研究
链接:https://arxiv.org/abs/2509.15141
作者:Yildirim, Samet Demir, Zafer Dogan
备注:MLSP 2025, 6 pages, 3 figures
摘要:经验风险最小化(ERM)是监督学习的基础框架,但主要优化平均情况下的性能,往往忽略了公平性和鲁棒性的考虑。倾斜经验风险最小化(TERM)通过引入指数倾斜超参数$t$来平衡平均情况下的准确性与最坏情况下的公平性和鲁棒性,从而扩展了ERM。然而,在数据一次到达一个样本的在线或流媒体设置中,经典TERM物镜退化为标准ERM,失去倾斜灵敏度。我们解决这个问题,提出了一个在线的TERM制定,从经典的目标,保留倾斜效应,而无需额外的计算或内存开销的对数。该公式实现了由$t$控制的连续权衡,在ERM($t \to 0$)、公平性强调($t > 0$)和对离群值的鲁棒性($t < 0$)之间平滑插值。我们经验验证在线TERM两个代表性的流任务:强大的线性回归与敌对离群值和少数类检测二进制分类。我们的研究结果表明,负倾斜有效地抑制离群值的影响,而正倾斜提高召回率,对精度的影响最小,每个样本的计算成本相当于ERM。因此,在线$TERM恢复完整的鲁棒性,公平性频谱的经典TERM在一个有效的单样本学习制度。
摘要:Empirical Risk Minimization (ERM) is a foundational framework for supervised learning but primarily optimizes average-case performance, often neglecting fairness and robustness considerations. Tilted Empirical Risk Minimization (TERM) extends ERM by introducing an exponential tilt hyperparameter $t$ to balance average-case accuracy with worst-case fairness and robustness. However, in online or streaming settings where data arrive one sample at a time, the classical TERM objective degenerates to standard ERM, losing tilt sensitivity. We address this limitation by proposing an online TERM formulation that removes the logarithm from the classical objective, preserving tilt effects without additional computational or memory overhead. This formulation enables a continuous trade-off controlled by $t$, smoothly interpolating between ERM ($t \to 0$), fairness emphasis ($t > 0$), and robustness to outliers ($t < 0$). We empirically validate online TERM on two representative streaming tasks: robust linear regression with adversarial outliers and minority-class detection in binary classification. Our results demonstrate that negative tilting effectively suppresses outlier influence, while positive tilting improves recall with minimal impact on precision, all at per-sample computational cost equivalent to ERM. Online TERM thus recovers the full robustness-fairness spectrum of classical TERM in an efficient single-sample learning regime.
3D|3D重建等相关(1篇)
【1】Physics-Informed GCN-LSTM Framework for Long-Term Forecasting of 2D and 3D Microstructure Evolution
标题:基于物理的GCN-LSTM框架,用于长期预测2D和3D微结构演变
链接:https://arxiv.org/abs/2509.15029
作者: Razavi, Nele Moelans
摘要
:本文提出了一个物理信息框架,该框架将图卷积网络(GCN)与长短期记忆(LSTM)架构集成在一起,以预测2D和3D中长时间范围内的微观结构演变,并在各种指标上表现出色。所提出的框架是组成感知的,在具有不同组成的数据集上联合训练,并在潜在图空间中操作,这使得模型能够捕获组成和形态动态,同时保持计算效率。用卷积自动编码器压缩和编码相场模拟数据,并在潜图空间中操作,有助于跨成分、维度和长期范围对微观结构演变进行有效建模。该框架捕捉不断变化的微观结构的空间和时间模式,同时在训练后以降低的计算成本实现长期预测。
摘要:This paper presents a physics-informed framework that integrates graph convolutional networks (GCN) with long short-term memory (LSTM) architecture to forecast microstructure evolution over long time horizons in both 2D and 3D with remarkable performance across varied metrics. The proposed framework is composition-aware, trained jointly on datasets with different compositions, and operates in latent graph space, which enables the model to capture compositions and morphological dynamics while remaining computationally efficient. Compressing and encoding phase-field simulation data with convolutional autoencoders and operating in Latent graph space facilitates efficient modeling of microstructural evolution across composition, dimensions, and long-term horizons. The framework captures the spatial and temporal patterns of evolving microstructures while enabling long-range forecasting at reduced computational cost after training.
编码器(2篇)
【1】Learning Mechanistic Subtypes of Neurodegeneration with a Physics-Informed Variational Autoencoder Mixture Model
标题:使用物理信息变分自动编码器混合模型学习神经退行性病变的机制亚型
链接:https://arxiv.org/abs/2509.15124
作者:innawala, Annabelle Hartanto, Ivor J. A. Simpson, Peter A. Wijeratne
备注:13 pages, 5 figures, accepted at SASHIMI workshop, MICCAI 2025
摘要:模拟神经退行性疾病的潜在机制需要从稀疏的高维神经成像数据中捕获异质和空间变化的动态的方法。将基于偏微分方程(PDE)的物理知识与机器学习相结合,与经典的数值方法相比,提供了增强的可解释性和实用性。然而,目前的物理学集成机器学习方法仅限于考虑单个PDE,严重限制了它们对多种机制负责不同群体的疾病的应用(即,子类型),并加剧了模型错误指定和退化的问题。在这里,我们提出了一个深度生成模型,用于学习由基于物理的PDE控制的潜在动态模型的混合物,超越了假设单一PDE结构的传统方法。我们的方法将反应扩散偏微分方程集成在变分自动编码器(VAE)混合模型框架内,支持从神经成像数据中推断可解释的潜在变量(例如扩散率和反应率)的亚型。我们评估我们的方法合成基准,并证明其潜在的揭示机制亚型阿尔茨海默氏病的进展,从正电子发射断层扫描(PET)数据。
摘要:Modelling the underlying mechanisms of neurodegenerative diseases demands methods that capture heterogeneous and spatially varying dynamics from sparse, high-dimensional neuroimaging data. Integrating partial differential equation (PDE) based physics knowledge with machine learning provides enhanced interpretability and utility over classic numerical methods. However, current physics-integrated machine learning methods are limited to considering a single PDE, severely limiting their application to diseases where multiple mechanisms are responsible for different groups (i.e., subtypes) and aggravating problems with model misspecification and degeneracy. Here, we present a deep generative model for learning mixtures of latent dynamic models governed by physics-based PDEs, going beyond traditional approaches that assume a single PDE structure. Our method integrates reaction-diffusion PDEs within a variational autoencoder (VAE) mixture model framework, supporting inference of subtypes of interpretable latent variables (e.g. diffusivity and reaction rates) from neuroimaging data. We evaluate our method on synthetic benchmarks and demonstrate its potential for uncovering mechanistic subtypes of Alzheimer's disease progression from positron emission tomography (PET) data.
【2】Novel Phase-Noise-Tolerant Variational-Autoencoder-Based Equalization Suitable for Space-Division-Multiplexed Transmission
标题:适用于时分多路传输的新型容噪自变分自动编码器均衡
链接:https://arxiv.org/abs/2509.14072
作者:auinger, Lennart Schmitz, Patrick Matalla, Andrej Rode, Sebastian Randel, Laurent Schmalen
备注:Accepted and to be presented at the European Conference on Optical Communication (ECOC) 2025
摘要:我们证明了一种新的相位噪声容忍,变分自动编码器为基础的均衡方案的空分复用(SDM)传输的有效性在150公里的随机耦合多芯光纤的实验。
摘要:We demonstrate the effectiveness of a novel phase-noise-tolerant, variational-autoencoder-based equalization scheme for space-division-multiplexed (SDM) transmission in an experiment over 150km of randomly-coupled multi-core fibers.
优化|敛散性(7篇)
【1】Optimal Learning from Label Proportions with General Loss Functions
标题:具有一般损失函数的标签比例的最佳学习
链接:https://arxiv.org/abs/2509.15145
作者:lebaum, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren
摘要:受在线广告问题的启发,我们解决了从标签比例(LLP)学习的任务。在这种部分监督的设置中,训练数据由称为袋子的示例组组成,我们只观察平均标签值。然而,主要目标仍然是为单个示例的标签设计预测器。我们引入了一种新颖的和通用的低方差去偏置方法,从聚合标签信息中学习,显着推进LLP的最新技术。我们的方法具有显着的灵活性,无缝地适应了广泛的实际相关的损失函数在二进制和多类分类设置。通过仔细结合我们的估计与标准技术,我们大大提高了样本的复杂性保证了一大类的实际相关性的损失。我们还在不同的基准数据集上实证验证了我们提出的方法的有效性,证明了与标准基线相比具有令人信服的经验优势。
摘要
:Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially-supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
【2】Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers
标题:具有黑匣子层的神经网络训练的低等级代理建模和随机零阶优化
链接:https://arxiv.org/abs/2509.15113
作者:ertkov, Artem Basharin, Mikhail Saygin, Evgeny Frolov, Stanislav Straupe, Ivan Oseledets
摘要:对节能、高性能AI系统的需求不断增长,导致人们越来越关注替代计算平台(例如,光子的、神经形态的),因为它们具有加速学习和推理的潜力。然而,将这些物理组件集成到深度学习管道中仍然具有挑战性,因为物理设备通常提供有限的表达能力,并且它们的不可微性质使得设备上的反向传播变得困难或不可行。这激发了混合架构的发展,将数字神经网络与可重构的物理层相结合,有效地表现为黑匣子。在这项工作中,我们提出了一个框架,这种混合网络的端到端的培训。该框架集成了随机零阶优化更新物理层的内部参数与动态低秩代理模型,使梯度传播通过物理层。我们的方法的一个关键组成部分是隐式投影分裂积分算法,它更新轻量级代理模型后,每一个向前通过最少的硬件查询,从而避免昂贵的全矩阵重建。我们在不同的深度学习任务中展示了我们的方法,包括:计算机视觉,音频分类和语言建模。值得注意的是,在所有模态中,所提出的方法都实现了近数字基线精度,并始终能够对包含各种不可微物理组件(空间光调制器,微谐振器和马赫-曾德尔干涉仪)的混合模型进行有效的端到端训练。这项工作将硬件感知的深度学习和无梯度优化联系在一起,从而为将不可微的物理组件集成到可扩展的端到端可训练AI系统中提供了一条实用的途径。
摘要:The growing demand for energy-efficient, high-performance AI systems has led to increased attention on alternative computing platforms (e.g., photonic, neuromorphic) due to their potential to accelerate learning and inference. However, integrating such physical components into deep learning pipelines remains challenging, as physical devices often offer limited expressiveness, and their non-differentiable nature renders on-device backpropagation difficult or infeasible. This motivates the development of hybrid architectures that combine digital neural networks with reconfigurable physical layers, which effectively behave as black boxes. In this work, we present a framework for the end-to-end training of such hybrid networks. This framework integrates stochastic zeroth-order optimization for updating the physical layer's internal parameters with a dynamic low-rank surrogate model that enables gradient propagation through the physical layer. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass with minimal hardware queries, thereby avoiding costly full matrix reconstruction. We demonstrate our method across diverse deep learning tasks, including: computer vision, audio classification, and language modeling. Notably, across all modalities, the proposed approach achieves near-digital baseline accuracy and consistently enables effective end-to-end training of hybrid models incorporating various non-differentiable physical components (spatial light modulators, microring resonators, and Mach-Zehnder interferometers). This work bridges hardware-aware deep learning and gradient-free optimization, thereby offering a practical pathway for integrating non-differentiable physical components into scalable, end-to-end trainable AI systems.
【3】The Role of Touch: Towards Optimal Tactile Sensing Distribution in Anthropomorphic Hands for Dexterous In-Hand Manipulation
标题:触摸的作用:在拟人手中实现最佳触觉感知分布,以实现灵巧的手部操纵
链接:https://arxiv.org/abs/2509.14984
作者:ão Almeida, Egidio Falotico, Cecilia Laschi, José Santos-Victor
摘要:手操作任务,特别是在人类启发的机器人系统中,必须依赖于分布式触觉传感来实现对各种任务的精确控制。然而,这种传感器网络的最佳配置是一个复杂的问题,虽然指尖是放置传感器的常见选择,但手的其他区域的触觉信息的贡献往往被忽视。这项工作调查的影响,触觉反馈的手指和手掌的各个区域在执行在手的对象重定向任务。我们分析了来自手部不同部位的感觉反馈如何影响深度强化学习控制策略的鲁棒性,并研究了对象特征与最佳传感器位置之间的关系。我们确定哪些触觉传感配置有助于提高操作的效率和准确性。我们的研究结果提供了有价值的见解拟人化的末端执行器的设计和使用,增强了操作能力。
摘要:In-hand manipulation tasks, particularly in human-inspired robotic systems, must rely on distributed tactile sensing to achieve precise control across a wide variety of tasks. However, the optimal configuration of this network of sensors is a complex problem, and while the fingertips are a common choice for placing sensors, the contribution of tactile information from other regions of the hand is often overlooked. This work investigates the impact of tactile feedback from various regions of the fingers and palm in performing in-hand object reorientation tasks. We analyze how sensory feedback from different parts of the hand influences the robustness of deep reinforcement learning control policies and investigate the relationship between object characteristics and optimal sensor placement. We identify which tactile sensing configurations contribute to improving the efficiency and accuracy of manipulation. Our results provide valuable insights for the design and use of anthropomorphic end-effectors with enhanced manipulation capabilities.
【4】Stochastic Bilevel Optimization with Heavy-Tailed Noise
标题:具有重尾噪音的随机二层优化
链接:https://arxiv.org/abs/2509.14952
作者: Liu, Luo Luo
摘要
:This paper considers the smooth bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex. We focus on the stochastic setting that the algorithm can access the unbiased stochastic gradient evaluation with heavy-tailed noise, which is prevalent in many machine learning applications such as training large language models and reinforcement learning. We propose a nested-loop normalized stochastic bilevel approximation (N$^2$SBA) for finding an $\epsilon$-stationary point with the stochastic first-order oracle (SFO) complexity of $\tilde{\mathcal{O}}\big(\kappa^{\frac{7p-3}{p-1}} \sigma^{\frac{p}{p-1}} \epsilon^{-\frac{4 p - 2}{p-1}}\big)$, where $\kappa$ is the condition number, $p\in(1,2]$ is the order of central moment for the noise, and $\sigma$ is the noise level. Furthermore, we specialize our idea to solve the nonconvex-strongly-concave minimax optimization problem, achieving an $\epsilon$-stationary point with the SFO complexity of $\tilde{\mathcal O}\big(\kappa^{\frac{2p-1}{p-1}} \sigma^{\frac{p}{p-1}} \epsilon^{-\frac{3p-2}{p-1}}\big)$. All above upper bounds match the best-known results under the special case of the bounded variance setting, i.e., $p=2$.
摘要:This paper considers the smooth bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex. We focus on the stochastic setting that the algorithm can access the unbiased stochastic gradient evaluation with heavy-tailed noise, which is prevalent in many machine learning applications such as training large language models and reinforcement learning. We propose a nested-loop normalized stochastic bilevel approximation (N$^2$SBA) for finding an $\epsilon$-stationary point with the stochastic first-order oracle (SFO) complexity of $\tilde{\mathcal{O}}\big(\kappa^{\frac{7p-3}{p-1}} \sigma^{\frac{p}{p-1}} \epsilon^{-\frac{4 p - 2}{p-1}}\big)$, where $\kappa$ is the condition number, $p\in(1,2]$ is the order of central moment for the noise, and $\sigma$ is the noise level. Furthermore, we specialize our idea to solve the nonconvex-strongly-concave minimax optimization problem, achieving an $\epsilon$-stationary point with the SFO complexity of $\tilde{\mathcal O}\big(\kappa^{\frac{2p-1}{p-1}} \sigma^{\frac{p}{p-1}} \epsilon^{-\frac{3p-2}{p-1}}\big)$. All above upper bounds match the best-known results under the special case of the bounded variance setting, i.e., $p=2$.
【5】Decentralized Optimization with Topology-Independent Communication
标题:具有拓扑无关通信的分散优化
链接:https://arxiv.org/abs/2509.14488
作者: Yao Kuang, Ahmet Alacaoglu, Michael P. Friedlander
备注:36 pages
摘要:分布式优化需要节点协调,但完全同步的规模很差。当$n$个节点通过$m$个成对正则化器协作时,标准方法每次迭代需要$\mathcal{O}(m)$通信。本文提出了随机局部协调:每个节点独立地采样一个正则化子,并且只与共享该项的节点协调。这利用了部分可分性,其中每个正则化器$G_j$依赖于节点的子集$S_j \subseteq \{1,\ldots,n\}$。对于图引导的正则化器,其中$|S_j| =2$,预期的通信下降到每次迭代正好2条消息。该方法实现了对凸目标的$\tilde{\mathcal{O}}(\varepsilon^{-2})$迭代,在强凸性下,$\mathcal{O}(\varepsilon^{-1})$到$\varepsilon$-解,$\mathcal{O}(\log(1/\varepsilon))$到邻域。用一个随机选择的正则化子$G_j$的邻近映射替换sum $\sum_j G_j $的邻近映射保持了收敛性,同时消除了全局协调。实验验证了合成和真实世界数据集的收敛速度和通信效率。
摘要:Distributed optimization requires nodes to coordinate, yet full synchronization scales poorly. When $n$ nodes collaborate through $m$ pairwise regularizers, standard methods demand $\mathcal{O}(m)$ communications per iteration. This paper proposes randomized local coordination: each node independently samples one regularizer uniformly and coordinates only with nodes sharing that term. This exploits partial separability, where each regularizer $G_j$ depends on a subset $S_j \subseteq \{1,\ldots,n\}$ of nodes. For graph-guided regularizers where $|S_j|=2$, expected communication drops to exactly 2 messages per iteration. This method achieves $\tilde{\mathcal{O}}(\varepsilon^{-2})$ iterations for convex objectives and under strong convexity, $\mathcal{O}(\varepsilon^{-1})$ to an $\varepsilon$-solution and $\mathcal{O}(\log(1/\varepsilon))$ to a neighborhood. Replacing the proximal map of the sum $\sum_j G_j$ with the proximal map of a single randomly selected regularizer $G_j$ preserves convergence while eliminating global coordination. Experiments validate both convergence rates and communication efficiency across synthetic and real-world datasets.
【6】Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
标题:迈向稳健的统计CUDA核心基准测试、验证和优化
链接:https://arxiv.org/abs/2509.14279
作者:arko Lange, Qi Sun, Aaditya Prasad, Maxence Faldor, Yujin Tang, David Ha
备注:62 pages, 10 figures
摘要:大型语言模型(LLM)的最新进展证明了它们在扩展软件工程任务的测试时计算方面的有效性。然而,这些方法通常专注于高级解决方案,而对优化低级CUDA内核实现的关注有限。此外,现有的内核生成基准遭受可利用的漏洞和测试条件的多样性不足,阻碍真正的泛化评估。为了解决这些限制,我们引入了鲁棒kbench,一个新的基准严格评估内核的性能和正确性在不同的情况下。此外,我们提出了一个全面的代理框架,自动CUDA内核的发现,验证和优化。该管道使前沿LLM能够将torch代码转换为CUDA内核,并在我们强大的评估设置中迭代地改进其运行时间。我们的顺序工作流程首先将PyTorch代码转换为等效的CUDA内核。然后,它使用专为CUDA生态系统量身定制的新的进化元生成过程来优化其运行时,由基于LLM的验证器指导,以实现正确性和有效过滤。在鲁棒kbench上进行评估,我们的方法产生的CUDA内核优于实际应用中的torch实现,包括向前和向后传递。它可以融合操作并部署各种运行时优化策略。验证器工作流程准确地分类不正确的内核,提高硬件验证效率。
摘要:Recent advances in large language models (LLMs) demonstrate their effectiveness in scaling test-time compute for software engineering tasks. However, these approaches often focus on high-level solutions, with limited attention to optimizing low-level CUDA kernel implementations. Additionally, existing kernel generation benchmarks suffer from exploitable loopholes and insufficient diversity in testing conditions, hindering true generalization assessment. To address these limitations, we introduce robust-kbench, a new benchmark for rigorous evaluation of kernel performance and correctness across varied scenarios. Furthermore, we present a comprehensive agentic framework that automates CUDA kernel discovery, verification, and optimization. This pipeline enables frontier LLMs to translate torch code to CUDA kernels and iteratively improve their runtime within our robust evaluation setting. Our sequential workflow first translates PyTorch code into equivalent CUDA kernels. It then optimizes their runtime using a novel evolutionary meta-generation procedure tailored to the CUDA ecosystem, guided by LLM-based verifiers for correctness and efficient filtering. Evaluated on robust-kbench, our approach produces CUDA kernels outperforming torch implementations for practical applications, including forward and backward passes. It can fuse operations and deploy various runtime optimization strategies. The verifier workflow accurately classifies incorrect kernels, enhancing hardware verification efficiency.
【7】Inspired by machine learning optimization: can gradient-based optimizers solve cycle skipping in full waveform inversion given sufficient iterations?
标题:受机器学习优化的启发:如果有足够的迭代,基于梯度的优化器能否解决全波倒置中的循环跳过问题?
链接:https://arxiv.org/abs/2509.14919
作者: Omar M. Saad, Shaowen Wang, Tariq Alkhalifah
备注:40 pages, 40 figures
摘要:全波形反演(FWI)通过最小化观测数据和模拟数据之间的差异来迭代地更新速度模型。由于与全局优化算法相关联的高计算成本和存储器要求,FWI通常使用局部优化方法来实现。然而,当初始速度模型不准确并且低频地震数据(例如,低于3Hz)的情况下,模拟数据和观测数据之间的失配可能超过半个周期,这种现象称为周期跳跃。在这种情况下,局部优化算法(例如,基于梯度的局部优化器)趋向于收敛到局部最小值,从而导致不准确的反演结果。在机器学习中,神经网络训练也是一个容易出现局部极小值的优化问题。它通常采用基于梯度的优化器,具有相对较大的学习率(超出通常由线搜索数值确定的局部优化的理论限制),这使得优化表现得像准全局优化器。因此,经过数千次迭代训练,我们可以获得具有强大生成能力的神经网络模型。在这项研究中,我们还采用了基于梯度的优化器,具有相对较大的学习率FWI。从合成和现场数据实验的结果表明,FWI最初可能会收敛到局部最小值,然而,有足够的额外的迭代,反演可以逐渐接近全局最小值,慢慢地从浅地下到深,最终产生一个准确的速度模型。数值算例表明,只要迭代次数足够多,即使在5 Hz以下的低频数据缺失的情况下,仍能得到合理的速度反演结果。
摘要:Full waveform inversion (FWI) iteratively updates the velocity model by minimizing the difference between observed and simulated data. Due to the high computational cost and memory requirements associated with global optimization algorithms, FWI is typically implemented using local optimization methods. However, when the initial velocity model is inaccurate and low-frequency seismic data (e.g., below 3 Hz) are absent, the mismatch between simulated and observed data may exceed half a cycle, a phenomenon known as cycle skipping. In such cases, local optimization algorithms (e.g., gradient-based local optimizers) tend to converge to local minima, leading to inaccurate inversion results. In machine learning, neural network training is also an optimization problem prone to local minima. It often employs gradient-based optimizers with a relatively large learning rate (beyond the theoretical limits of local optimization that are usually determined numerically by a line search), which allows the optimization to behave like a quasi-global optimizer. Consequently, after training for several thousand iterations, we can obtain a neural network model with strong generative capability. In this study, we also employ gradient-based optimizers with a relatively large learning rate for FWI. Results from both synthetic and field data experiments show that FWI may initially converge to a local minimum; however, with sufficient additional iterations, the inversion can gradually approach the global minimum, slowly from shallow subsurface to deep, ultimately yielding an accurate velocity model. Furthermore, numerical examples indicate that, given sufficient iterations, reasonable velocity inversion results can still be achieved even when low-frequency data below 5 Hz are missing.
预测|估计(13篇)
【1】Out-of-Sight Trajectories: Tracking, Fusion, and Prediction
标题:视线外轨迹:跟踪、融合和预测
链接:https://arxiv.org/abs/2509.15219
作者:hang, Yi Xu, Yun Fu
摘要:轨迹预测是计算机视觉和自主系统中的一项关键任务,在自动驾驶、机器人、监控和虚拟现实中发挥着关键作用。现有的方法通常依赖于完整且无噪声的观测数据,忽略了与视线外物体相关的挑战以及由有限的相机覆盖范围、障碍物和缺乏去噪轨迹的地面实况引起的传感器数据中的固有噪声。这些限制带来了安全风险,并阻碍了在现实世界中的可靠预测。在这个扩展的工作中,我们提出了在视线外轨迹(OST),一个新的任务,预测无噪声的视觉轨迹的视线外的对象使用嘈杂的传感器数据的进步。在我们之前研究的基础上,我们扩大了视线外轨迹预测(OOSTraj)的范围,将行人和车辆包括在内,将其适用性扩展到自动驾驶、机器人、监控和虚拟现实。我们增强的视觉定位去噪模块利用相机校准来建立视觉定位映射,解决视觉参考的缺乏,同时以无监督的方式有效地对噪声传感器数据进行去噪。通过对Vi-Fi和JRDB数据集的广泛评估,我们的方法在轨迹去噪和预测方面都达到了最先进的性能,大大超过了以前的基线。此外,我们还介绍了与传统去噪方法(如卡尔曼滤波)的比较,并使最近的轨迹预测模型适应我们的任务,提供了一个全面的基准。这项工作代表了第一个整合视觉定位投影的倡议,用于对视线外代理的噪声传感器轨迹进行降噪,为未来的进步铺平了道路。代码和预处理数据集可在github.com/Hai-chao-Zhang/OST上获得
摘要:Trajectory prediction is a critical task in computer vision and autonomous systems, playing a key role in autonomous driving, robotics, surveillance, and virtual reality. Existing methods often rely on complete and noise-free observational data, overlooking the challenges associated with out-of-sight objects and the inherent noise in sensor data caused by limited camera coverage, obstructions, and the absence of ground truth for denoised trajectories. These limitations pose safety risks and hinder reliable prediction in real-world scenarios. In this extended work, we present advancements in Out-of-Sight Trajectory (OST), a novel task that predicts the noise-free visual trajectories of out-of-sight objects using noisy sensor data. Building on our previous research, we broaden the scope of Out-of-Sight Trajectory Prediction (OOSTraj) to include pedestrians and vehicles, extending its applicability to autonomous driving, robotics, surveillance, and virtual reality. Our enhanced Vision-Positioning Denoising Module leverages camera calibration to establish a vision-positioning mapping, addressing the lack of visual references, while effectively denoising noisy sensor data in an unsupervised manner. Through extensive evaluations on the Vi-Fi and JRDB datasets, our approach achieves state-of-the-art performance in both trajectory denoising and prediction, significantly surpassing previous baselines. Additionally, we introduce comparisons with traditional denoising methods, such as Kalman filtering, and adapt recent trajectory prediction models to our task, providing a comprehensive benchmark. This work represents the first initiative to integrate vision-positioning projection for denoising noisy sensor trajectories of out-of-sight agents, paving the way for future advances. The code and preprocessed datasets are available at github.com/Hai-chao-Zhang/OST
【2】Efficient Conformal Prediction for Regression Models under Label Noise
标题:标签噪音下回归模型的有效共形预测
链接:https://arxiv.org/abs/2509.15120
作者:en, Jacob Goldberger, Tom Tirer
摘要
:在高风险场景中,如医学成像应用,为回归模型的预测提供可靠的置信区间至关重要。最近,共形预测(CP)已经成为一个强大的统计框架,基于标记的校准集,生成的间隔,包括真正的标签与预先指定的概率。在本文中,我们解决的问题,应用CP的回归模型时,校准集包含噪声标签。首先,我们建立一个数学接地的程序,估计无噪声CP阈值。然后,我们把它变成一个实用的算法,克服了回归问题的连续性所带来的挑战。我们评估所提出的方法在两个医学成像回归数据集与高斯标签噪声。我们的方法显着优于现有的替代方案,实现接近清洁标签设置的性能。
摘要:In high-stakes scenarios, such as medical imaging applications, it is critical to equip the predictions of a regression model with reliable confidence intervals. Recently, Conformal Prediction (CP) has emerged as a powerful statistical framework that, based on a labeled calibration set, generates intervals that include the true labels with a pre-specified probability. In this paper, we address the problem of applying CP for regression models when the calibration set contains noisy labels. We begin by establishing a mathematically grounded procedure for estimating the noise-free CP threshold. Then, we turn it into a practical algorithm that overcomes the challenges arising from the continuous nature of the regression problem. We evaluate the proposed method on two medical imaging regression datasets with Gaussian label noise. Our method significantly outperforms the existing alternative, achieving performance close to the clean-label setting.
【3】Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting
标题:超线性:用于时间序列预测的轻量级预训练线性专家混合
链接:https://arxiv.org/abs/2509.15105
作者:humsohn, Raz Marshanski, Hedi Zisling, Omri Azencot
摘要:时间序列预测(TSF)在能源、金融、医疗保健和物流等领域至关重要,需要在不同数据集上进行泛化的模型。大型预训练模型,如Chronos和Time-MoE,显示出强大的zero-shot(ZS)性能,但计算成本高。在这项工作中,我们介绍了超线性,一个轻量级的和可扩展的混合专家(MoE)模型的一般预测。它用简单的频率专用线性专家取代了深度架构,这些专家在多个频率范围内对重新采样的数据进行了培训。轻量级的频谱门控机制动态选择相关专家,实现高效、准确的预测。尽管它的简单性,超线性匹配最先进的性能,同时提供卓越的效率,各种采样率的鲁棒性和增强的可解释性。Super-Linear的实现可以在\href{https://github.com/azencot-group/SuperLinear}{https://github.com/azencot-group/SuperLinear}上找到
摘要:Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, We introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear matches state-of-the-art performance while offering superior efficiency, robustness to various sampling rates, and enhanced interpretability. The implementation of Super-Linear is available at \href{https://github.com/azencot-group/SuperLinear}{https://github.com/azencot-group/SuperLinear}
【4】From Patterns to Predictions: A Shapelet-Based Framework for Directional Forecasting in Noisy Financial Markets
标题:从模式到预测:一个基于Shapelet的在喧闹的金融市场中进行方向预测的框架
链接:https://arxiv.org/abs/2509.15040
作者:, Hyunwook Lee, Hyotaek Jeon, Seungmin Jin, Sungahn Ko
备注:10 pages, 7 figures, accepted at ACM CIKM 2025 conference
摘要:金融市场的方向性预测需要准确性和可解释性。在深度学习出现之前,基于人类定义模式的可解释方法很普遍,但它们的结构复杂性和规模模糊性阻碍了泛化。相比之下,深度学习模型可以有效地捕捉复杂的动态,但通常提供有限的透明度。为了弥合这一差距,我们提出了一个两阶段的框架,集成了无监督模式提取与可解释的预测。(i)SIMPC分割和聚类多变量时间序列,提取经常性的模式,是不变的幅度缩放和时间失真,即使在不同的窗口大小。(ii)JISC-Net是一个基于形状的分类器,它使用提取的模式的初始部分作为输入,并预测随后的短期方向运动的部分序列。在比特币和三只标准普尔500指数股票上的实验表明,我们的方法在12个度量数据集组合中的11个中排名第一或第二,始终优于基线。与传统的深度学习模型不同,传统的深度学习模型输出买入或卖出信号而没有可解释的理由,我们的方法通过揭示驱动预测结果的潜在模式结构来实现透明的决策。
摘要:Directional forecasting in financial markets requires both accuracy and interpretability. Before the advent of deep learning, interpretable approaches based on human-defined patterns were prevalent, but their structural vagueness and scale ambiguity hindered generalization. In contrast, deep learning models can effectively capture complex dynamics, yet often offer limited transparency. To bridge this gap, we propose a two-stage framework that integrates unsupervised pattern extracion with interpretable forecasting. (i) SIMPC segments and clusters multivariate time series, extracting recurrent patterns that are invariant to amplitude scaling and temporal distortion, even under varying window sizes. (ii) JISC-Net is a shapelet-based classifier that uses the initial part of extracted patterns as input and forecasts subsequent partial sequences for short-term directional movement. Experiments on Bitcoin and three S&P 500 equities demonstrate that our method ranks first or second in 11 out of 12 metric--dataset combinations, consistently outperforming baselines. Unlike conventional deep learning models that output buy-or-sell signals without interpretable justification, our approach enables transparent decision-making by revealing the underlying pattern structures that drive predictive outcomes.
【5】Data-Driven Prediction of Maternal Nutritional Status in Ethiopia Using Ensemble Machine Learning Models
标题:使用Ensemble机器学习模型以数据驱动预测埃塞俄比亚孕产妇营养状况
链接:https://arxiv.org/abs/2509.14945
作者:ssema, Tizazu Bayih, Kassahun Azezew, Ayenew Kassie
备注:9 pages, 5 figures, 2 Tables
摘要
:在埃塞俄比亚,孕妇营养不良是一个重大的公共卫生挑战,增加了孕产妇和新生儿不良后果的风险。传统的统计方法往往无法捕捉营养状况的复杂和多层面的决定因素。这项研究利用集成机器学习技术开发了一个预测模型,利用埃塞俄比亚人口和健康调查(2005-2020年)的数据,包括18,108条记录,其中30个社会人口和健康属性。数据预处理包括处理缺失值、归一化和使用SMOTE进行平衡,然后进行特征选择以识别关键预测因子。几个监督集成算法,包括XGBoost,随机森林,CatBoost,和AdaBoost被应用于分类营养状况。其中,随机森林模型表现最好,将女性分为四类(正常,中度营养不良,重度营养不良和营养过剩),准确率为97.87%,精确率为97.88%,召回率为97.87%,F1评分为97.87%,ROC AUC为99.86%。这些发现证明了集成学习在从复杂数据集中捕获隐藏模式方面的有效性,并为早期发现营养风险提供了及时的见解。研究结果为医疗保健提供者,政策制定者和研究人员提供了实际意义,支持数据驱动的战略,以改善埃塞俄比亚的孕产妇营养和健康状况。
摘要:Malnutrition among pregnant women is a major public health challenge in Ethiopia, increasing the risk of adverse maternal and neonatal outcomes. Traditional statistical approaches often fail to capture the complex and multidimensional determinants of nutritional status. This study develops a predictive model using ensemble machine learning techniques, leveraging data from the Ethiopian Demographic and Health Survey (2005-2020), comprising 18,108 records with 30 socio-demographic and health attributes. Data preprocessing included handling missing values, normalization, and balancing with SMOTE, followed by feature selection to identify key predictors. Several supervised ensemble algorithms including XGBoost, Random Forest, CatBoost, and AdaBoost were applied to classify nutritional status. Among them, the Random Forest model achieved the best performance, classifying women into four categories (normal, moderate malnutrition, severe malnutrition, and overnutrition) with 97.87% accuracy, 97.88% precision, 97.87% recall, 97.87% F1-score, and 99.86% ROC AUC. These findings demonstrate the effectiveness of ensemble learning in capturing hidden patterns from complex datasets and provide timely insights for early detection of nutritional risks. The results offer practical implications for healthcare providers, policymakers, and researchers, supporting data-driven strategies to improve maternal nutrition and health outcomes in Ethiopia.
【6】DAG: A Dual Causal Network for Time Series Forecasting with Exogenous Variables
标题:DAB:一个用于外生变量时间序列预测的双因果网络
链接:https://arxiv.org/abs/2509.14933
作者:Qiu, Yuhan Zhu, Zhengyu Li, Hanyin Cheng, Xingjian Wu, Chenjuan Guo, Bin Yang, Jilin Hu
摘要:时间序列预测在经济、交通和AIOps等各个领域都至关重要。然而,在现实世界的应用中,仅仅关注内生变量(即,目标变量),往往不足以确保准确的预测。考虑到外生变量(即,协变量)提供了额外的预测信息,从而提高了预测精度。然而,现有的外生变量时间序列预测方法(TSF-X)存在以下不足:1)没有利用未来的外生变量; 2)没有考虑内外生变量之间的因果关系。因此,它们的性能是次优的。在这项研究中,为了更好地利用外生变量,特别是未来的外生变量,我们提出了一个通用框架DAG,它利用双因果网络沿时间和通道维度的时间序列预测与外生变量。具体来说,我们首先介绍了时间因果模块,其中包括一个因果发现模块,以捕捉历史外生变量如何影响未来的外生变量。随后,我们构建了一个因果注入模块,将发现的因果关系纳入基于历史内生变量预测未来内生变量的过程中。接下来,我们提出了通道因果模块,它遵循类似的设计原则。它的特点是因果发现模块模拟历史外生变量如何影响历史内生变量,因果注入模块结合发现的关系,以增强基于未来外生变量对未来内生变量的预测。
摘要:Time series forecasting is crucial in various fields such as economics, traffic, and AIOps. However, in real-world applications, focusing solely on the endogenous variables (i.e., target variables), is often insufficient to ensure accurate predictions. Considering exogenous variables (i.e., covariates) provides additional predictive information, thereby improving forecasting accuracy. However, existing methods for time series forecasting with exogenous variables (TSF-X) have the following shortcomings: 1) they do not leverage future exogenous variables, 2) they fail to account for the causal relationships between endogenous and exogenous variables. As a result, their performance is suboptimal. In this study, to better leverage exogenous variables, especially future exogenous variable, we propose a general framework DAG, which utilizes dual causal network along both the temporal and channel dimensions for time series forecasting with exogenous variables. Specifically, we first introduce the Temporal Causal Module, which includes a causal discovery module to capture how historical exogenous variables affect future exogenous variables. Following this, we construct a causal injection module that incorporates the discovered causal relationships into the process of forecasting future endogenous variables based on historical endogenous variables. Next, we propose the Channel Causal Module, which follows a similar design principle. It features a causal discovery module models how historical exogenous variables influence historical endogenous variables, and a causal injection module incorporates the discovered relationships to enhance the prediction of future endogenous variables based on future exogenous variables.
【7】DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting
标题:DPANet:用于多元时间序列预测的双金字塔注意力网络
链接:https://arxiv.org/abs/2509.14868
作者:Li, Xingjun Zhang, Shaoxun Wang, Jia Wei
摘要:我们进行了严格的消融研究,以验证DPANet的关键组件(Table \ref{tab:ablation-study})。完整模型始终优于所有变体。为了测试我们的双域假设,我们设计了两个专门的版本:仅时间模型(融合两个相同的时间金字塔)和仅频率模型(融合两个光谱金字塔)。这两种变体表现不佳显着,确认异构的时间和频率信息的融合是至关重要的。此外,用更简单的方法(w/o Cross-Fusion)取代交叉注意机制导致了最严重的性能下降。这一结果强调了我们的交互式融合块是最重要的组成部分。
摘要:We conducted rigorous ablation studies to validate DPANet's key components (Table \ref{tab:ablation-study}). The full model consistently outperforms all variants. To test our dual-domain hypothesis, we designed two specialized versions: a Temporal-Only model (fusing two identical temporal pyramids) and a Frequency-Only model (fusing two spectral pyramids). Both variants underperformed significantly, confirming that the fusion of heterogeneous temporal and frequency information is critical. Furthermore, replacing the cross-attention mechanism with a simpler method (w/o Cross-Fusion) caused the most severe performance degradation. This result underscores that our interactive fusion block is the most essential component.
【8】STEP: Structured Training and Evaluation Platform for benchmarking trajectory prediction models
标题:Step:结构化训练和评估平台,用于基准轨迹预测模型
链接:https://arxiv.org/abs/2509.14801
作者: Schumann, Anna Mészáros, Jens Kober, Arkady Zgonnikov
摘要:虽然轨迹预测在实现自动驾驶车辆安全有效的路径规划方面发挥着关键作用,但评估此类模型的标准化实践仍然不发达。最近的努力旨在统一数据集格式和模型接口,以便更容易进行比较,但现有的框架往往在支持异构流量场景,联合预测模型或用户文档方面存在不足。在这项工作中,我们介绍了STEP -一个新的基准框架,通过为多个数据集提供统一的接口,实施一致的训练和评估条件,并支持广泛的预测模型来解决这些限制。我们在许多实验中展示了STEP的能力,这些实验揭示了1)广泛使用的测试程序的局限性,2)代理联合建模对更好地预测交互的重要性,以及3)当前最先进的模型对分布变化和对抗代理的目标攻击的脆弱性。通过STEP,我们的目标是将重点从“排行榜”方法转移到对复杂多智能体环境中的模型行为和泛化的更深层次的见解。
摘要:While trajectory prediction plays a critical role in enabling safe and effective path-planning in automated vehicles, standardized practices for evaluating such models remain underdeveloped. Recent efforts have aimed to unify dataset formats and model interfaces for easier comparisons, yet existing frameworks often fall short in supporting heterogeneous traffic scenarios, joint prediction models, or user documentation. In this work, we introduce STEP -- a new benchmarking framework that addresses these limitations by providing a unified interface for multiple datasets, enforcing consistent training and evaluation conditions, and supporting a wide range of prediction models. We demonstrate the capabilities of STEP in a number of experiments which reveal 1) the limitations of widely-used testing procedures, 2) the importance of joint modeling of agents for better predictions of interactions, and 3) the vulnerability of current state-of-the-art models against both distribution shifts and targeted attacks by adversarial agents. With STEP, we aim to shift the focus from the ``leaderboard'' approach to deeper insights about model behavior and generalization in complex multi-agent settings.
【9】FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Integration
标题:FlowCast-ODE:具有动态流匹配和ODE集成的连续逐小时天气预报
链接:https://arxiv.org/abs/2509.14775
作者:ang He, Yuanting Zhang, Hongli Liang, Qingye Meng, Xingyuan Yuan
摘要:准确的每小时天气预报对于许多应用至关重要。最近的深度学习模型在6小时间隔上表现出了强大的能力,但实现准确和稳定的每小时预测仍然是一个关键挑战。这主要是由于ERA 5数据的12小时同化周期内自回归推出和时间不连续性中的误差的快速积累。为了解决这些问题,我们提出了FlowCast-ODE,一个框架,模型大气状态演变为一个连续流。FlowCast-ODE直接从先前的状态学习条件流路径,这种方法更自然地与物理动态系统保持一致,并实现高效计算。引入了一种从粗到细的策略,使用动态流量匹配根据6小时数据对模型进行训练,然后根据每小时数据进行细化,该数据包含常微分方程(ODE)求解器,以实现时间连贯的预测。此外,提出了一种轻量级的低秩AdaLN零调制机制,在不影响精度的情况下将模型大小减少了15%。实验表明,FlowCast-ODE优于强基线,产生更低的均方根误差(RMSE)和更好的能量守恒,从而减少模糊并保留更多的精细尺度空间细节。它在预测台风等极端事件方面的表现也与最先进的模型相当。此外,该模型解释了与同化周期转换相关的时间不连续性。
摘要:Accurate hourly weather forecasting is critical for numerous applications. Recent deep learning models have demonstrated strong capability on 6-hour intervals, yet achieving accurate and stable hourly predictions remains a critical challenge. This is primarily due to the rapid accumulation of errors in autoregressive rollouts and temporal discontinuities within the ERA5 data's 12-hour assimilation cycle. To address these issues, we propose FlowCast-ODE, a framework that models atmospheric state evolution as a continuous flow. FlowCast-ODE learns the conditional flow path directly from the previous state, an approach that aligns more naturally with physical dynamic systems and enables efficient computation. A coarse-to-fine strategy is introduced to train the model on 6-hour data using dynamic flow matching and then refined on hourly data that incorporates an Ordinary Differential Equation (ODE) solver to achieve temporally coherent forecasts. In addition, a lightweight low-rank AdaLN-Zero modulation mechanism is proposed and reduces model size by 15% without compromising accuracy. Experiments demonstrate that FlowCast-ODE outperforms strong baselines, yielding lower root mean square error (RMSE) and better energy conservation, which reduces blurring and preserves more fine-scale spatial details. It also shows comparable performance to the state-of-the-art model in forecasting extreme events like typhoons. Furthermore, the model alleviates temporal discontinuities associated with assimilation cycle transitions.
【10】Predicting Case Suffixes With Activity Start and End Times: A Sweep-Line Based Approach
标题:预测具有活动开始和结束时间的案例后缀:基于扫描线的方法
链接:https://arxiv.org/abs/2509.14536
作者:Awais Ali, Marlon Dumas, Fredrik Milani
摘要:预测性流程监控技术通过预测业务流程的正在进行的案例的未来状态来支持操作决策。这些技术的一个子集预测一个正在进行的情况下(情况后缀预测)的活动的剩余序列。用于案例后缀预测的现有方法生成具有单个时间戳(例如,结束时间戳)的活动序列。这个输出对于资源容量规划来说是不够的,在资源容量规划中,我们需要推断资源将忙于执行工作的时间段。本文介绍了一种技术,用于预测的情况下,由活动的开始和结束时间戳的后缀。换句话说,所提出的技术预测每个活动的等待时间和处理时间。由于一个案例中活动的等待时间取决于其他案例中资源的繁忙程度,因此该技术采用了扫描线方法,其中流程中所有正在进行的案例的后缀都是同步预测的,而不是孤立地对每个案例进行预测。对真实数据集和合成数据集的评估比较了该方法不同实例的准确性,证明了多模型方法在案例后缀预测方面的优势。
摘要
:Predictive process monitoring techniques support the operational decision making by predicting future states of ongoing cases of a business process. A subset of these techniques predict the remaining sequence of activities of an ongoing case (case suffix prediction). Existing approaches for case suffix prediction generate sequences of activities with a single timestamp (e.g. the end timestamp). This output is insufficient for resource capacity planning, where we need to reason about the periods of time when resources will be busy performing work. This paper introduces a technique for predicting case suffixes consisting of activities with start and end timestamps. In other words, the proposed technique predicts both the waiting time and the processing time of each activity. Since the waiting time of an activity in a case depends on how busy resources are in other cases, the technique adopts a sweep-line approach, wherein the suffixes of all ongoing cases in the process are predicted in lockstep, rather than predictions being made for each case in isolation. An evaluation on real-life and synthetic datasets compares the accuracy of different instantiations of this approach, demonstrating the advantages of a multi-model approach to case suffix prediction.
【11】Towards universal property prediction in Cartesian space: TACE is all you need
标题:迈向Cartesian空间中的普适属性预测:您所需要的一切都是ACE
链接:https://arxiv.org/abs/2509.14961
作者: Wenbo Xie, Daiqian Xie, P. Hu
摘要:机器学习已经彻底改变了原子模拟和材料科学,但目前的方法通常依赖于球谐表示。在这里,我们介绍张量原子团簇展开和张量矩势,第一个统一的框架制定完全在笛卡尔空间的任意结构确定的张量性质的系统预测。TACE通过将原子环境分解为(不可约的)笛卡尔张量的完整层次来实现这一点,确保自然编码不变性和等变性约束的一致性表示。除了几何形状,TACE还采用了通用嵌入,灵活地集成了各种属性,包括基组,电荷,磁矩和场扰动。这允许在预测过程中显式控制外部不变量和等变量。长程相互作用也通过短程近似内的潜在埃瓦尔德求和模块准确描述,提供了一个严格但计算效率高的静电相互作用处理。我们证明了TACE达到的准确性,稳定性和效率与有限分子和扩展材料的领先等变框架相当或超过,包括域内和域外基准,光谱,hessians,外场响应,带电系统,磁性系统,多保真度训练和非均相催化系统。至关重要的是,TACE桥接标量和张量建模,并建立了一个笛卡尔空间范式,统一和扩展到基于球谐的方法的设计空间之外。这项工作为新一代通用原子机器学习模型奠定了基础,这些模型能够在一个连贯的框架内系统地捕捉几何、场和材料特性之间的丰富相互作用。
摘要:Machine learning has revolutionized atomistic simulations and materials science, yet current approaches often depend on spherical-harmonic representations. Here we introduce the Tensor Atomic Cluster Expansion and Tensor Moment Potential, the first unified framework formulated entirely in Cartesian space for the systematic prediction of arbitrary structure-determined tensorial properties. TACE achieves this by decomposing atomic environments into a complete hierarchy of (irreducible) Cartesian tensors, ensuring symmetry-consistent representations that naturally encode invariance and equivariance constraints. Beyond geometry, TACE incorporates universal embeddings that flexibly integrate diverse attributes including basis sets, charges, magnetic moments and field perturbations. This allows explicit control over external invariants and equivariants in the prediction process. Long-range interactions are also accurately described through the Latent Ewald Summation module within the short-range approximation, providing a rigorous yet computationally efficient treatment of electrostatic interactions. We demonstrate that TACE attains accuracy, stability, and efficiency on par with or surpassing leading equivariant frameworks across finite molecules and extended materials, including in-domain and out-of-domain benchmarks, spectra, hessians, external-field response, charged systems, magnetic systems, multi-fidelity training, and heterogeneous catalytic systems. Crucially, TACE bridges scalar and tensorial modeling and establishes a Cartesian-space paradigm that unifies and extends beyond the design space of spherical-harmonic-based methods. This work lays the foundation for a new generation of universal atomistic machine learning models capable of systematically capturing the rich interplay of geometry, fields and material properties within a single coherent framework.
【12】Radiolunadiff: Estimation of wireless network signal strength in lunar terrain
标题:Radiolunadiff:月球地形中无线网络信号强度的估计
链接:https://arxiv.org/abs/2509.14559
作者:rado, Anders Pearson, Jason Klein, Alexander Moscibroda, Joshua Smith
摘要:在本文中,我们提出了一种新的物理信息深度学习架构,用于预测月球地形上的无线电地图。我们的方法集成了一个基于物理的月球地形生成器,该生成器通过公开的NASA数据生成逼真的地形,并使用射线跟踪引擎创建高保真的无线电传播场景数据集。在此数据集的基础上,我们引入了一个三重UNet架构,由两个标准UNets和一个扩散网络组成,以模拟复杂的传播效果。实验结果表明,我们的方法在各种指标上都优于现有的深度学习方法。
摘要:In this paper, we propose a novel physics-informed deep learning architecture for predicting radio maps over lunar terrain. Our approach integrates a physics-based lunar terrain generator, which produces realistic topography informed by publicly available NASA data, with a ray-tracing engine to create a high-fidelity dataset of radio propagation scenarios. Building on this dataset, we introduce a triplet-UNet architecture, consisting of two standard UNets and a diffusion network, to model complex propagation effects. Experimental results demonstrate that our method outperforms existing deep learning approaches on our terrain dataset across various metrics.
【13】Artificial Intelligence-derived Cardiotocography Age as a Digital Biomarker for Predicting Future Adverse Pregnancy Outcomes
标题:人工智能衍生的子宫内膜成像时代作为预测未来不良妊娠结局的数字生物标志物
链接:https://arxiv.org/abs/2509.14242
作者:Gu, Zenghui Lin, Jingying Ma, Jingyu Wang, Linyan Zhang, Rui Bai, Zelin Tu, Youyou Jiang, Donglin Xie, Yuxi Zhou, Guoli Liu, Shenda Hong
摘要
:胎儿分娩描记术(CTG)是一种低成本、无创的胎儿健康评估技术,在全球范围内使用,尤其是在欠发达国家。然而,它目前主要用于识别胎儿的当前状态(例如,胎儿酸中毒或缺氧),CTG预测未来不良妊娠结局的潜力尚未得到充分探讨。我们的目标是开发一种基于人工智能的模型,从CTG时间序列预测生物学年龄(称为CTGage),然后计算CTGage和实际年龄之间的年龄差距(称为CTGage-gap),并将此差距用作未来不良妊娠结局的新数字生物标志物。CTGage模型是使用2018年至2022年期间在北京大学人民医院收集的11,385名孕妇的61,140条记录开发的。对于模型训练,使用结构设计的1D卷积神经网络,结合分布对齐的增强回归技术。CTGage间隙分为五组:< -21天(低估组)、-21至-7天、-7至7天(正常组)、7至21天和> 21天(高估组)。我们进一步将低估组和高估组一起定义为高风险组。然后,我们比较了这些组的不良结局和孕产妇疾病的发生率。CTGage模型的平均绝对误差为10.91天。早产儿发生率为5.33%,妊娠期糖尿病(GDM)发生率为31.93%,高于正常组的1.42%(P < 0.05)。低估组与正常组比较,低出生体重发生率分别为0.17%和0.15%(p < 0.05),贫血发生率分别为37.51%和34.74%(p < 0.05)。人工智能衍生的CTGage可以预测不良妊娠结局的未来风险,并有可能成为一种新型、非侵入性且易于获取的数字生物标志物。
摘要:Cardiotocography (CTG) is a low-cost, non-invasive fetal health assessment technique used globally, especially in underdeveloped countries. However, it is currently mainly used to identify the fetus's current status (e.g., fetal acidosis or hypoxia), and the potential of CTG in predicting future adverse pregnancy outcomes has not been fully explored. We aim to develop an AI-based model that predicts biological age from CTG time series (named CTGage), then calculate the age gap between CTGage and actual age (named CTGage-gap), and use this gap as a new digital biomarker for future adverse pregnancy outcomes. The CTGage model is developed using 61,140 records from 11,385 pregnant women, collected at Peking University People's Hospital between 2018 and 2022. For model training, a structurally designed 1D convolutional neural network is used, incorporating distribution-aligned augmented regression technology. The CTGage-gap is categorized into five groups: < -21 days (underestimation group), -21 to -7 days, -7 to 7 days (normal group), 7 to 21 days, and > 21 days (overestimation group). We further defined the underestimation group and overestimation group together as the high-risk group. We then compare the incidence of adverse outcomes and maternal diseases across these groups. The average absolute error of the CTGage model is 10.91 days. When comparing the overestimation group with the normal group, premature infants incidence is 5.33% vs. 1.42% (p < 0.05) and gestational diabetes mellitus (GDM) incidence is 31.93% vs. 20.86% (p < 0.05). When comparing the underestimation group with the normal group, low birth weight incidence is 0.17% vs. 0.15% (p < 0.05) and anaemia incidence is 37.51% vs. 34.74% (p < 0.05). Artificial intelligence-derived CTGage can predict the future risk of adverse pregnancy outcomes and hold potential as a novel, non-invasive, and easily accessible digital biomarker.
其他神经网络|深度学习|模型|建模(18篇)
【1】Self-Improving Embodied Foundation Models
标题:自我完善的基础模型
链接:https://arxiv.org/abs/2509.15155
作者:yar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson, Pannag Sanketi, Igor Mordatch
备注:Appearing in the Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:在网络规模数据上训练的基础模型已经彻底改变了机器人技术,但它们在低级控制方面的应用仍然主要限于行为克隆。从强化学习阶段在微调大型语言模型方面的成功中汲取灵感,我们提出了一种两阶段的机器人后训练方法。第一阶段,监督微调(SFT),使用以下两种方法对预训练的基础模型进行微调:a)行为克隆,b)步骤预测目标。在第二阶段,自我改进,步骤到去的预测,使提取一个良好的奖励函数和一个强大的成功检测器,使一个舰队的机器人,以最少的人类监督,自主实践下游任务。通过对真实世界和模拟机器人实施例的广泛实验,我们新颖的后训练配方揭示了体现基础模型的显着结果。首先,我们证明了SFT和自我改进的结合比监督学习的规模化模仿数据收集更有效,并且它导致了具有更高成功率的政策。进一步的消融强调,网络规模的预训练和自我完善的结合是这种样本效率的关键。接下来,我们证明了我们提出的组合独特地解锁了当前方法无法实现的能力:自主练习和获得新的技能,这些技能远远超出了在训练过程中使用的模仿学习数据集中观察到的行为。这些发现强调了将预先训练的基础模型与在线自我改进相结合的变革潜力,以实现机器人技术的自主技能获取。我们的项目网站可以在https://self-improving-efms.github.io上找到。
摘要:Foundation models trained on web-scale data have revolutionized robotics, but their application to low-level control remains largely limited to behavioral cloning. Drawing inspiration from the success of the reinforcement learning stage in fine-tuning large language models, we propose a two-stage post-training approach for robotics. The first stage, Supervised Fine-Tuning (SFT), fine-tunes pretrained foundation models using both: a) behavioral cloning, and b) steps-to-go prediction objectives. In the second stage, Self-Improvement, steps-to-go prediction enables the extraction of a well-shaped reward function and a robust success detector, enabling a fleet of robots to autonomously practice downstream tasks with minimal human supervision. Through extensive experiments on real-world and simulated robot embodiments, our novel post-training recipe unveils significant results on Embodied Foundation Models. First, we demonstrate that the combination of SFT and Self-Improvement is significantly more sample-efficient than scaling imitation data collection for supervised learning, and that it leads to policies with significantly higher success rates. Further ablations highlight that the combination of web-scale pretraining and Self-Improvement is the key to this sample-efficiency. Next, we demonstrate that our proposed combination uniquely unlocks a capability that current methods cannot achieve: autonomously practicing and acquiring novel skills that generalize far beyond the behaviors observed in the imitation learning datasets used during training. These findings highlight the transformative potential of combining pretrained foundation models with online Self-Improvement to enable autonomous skill acquisition in robotics. Our project website can be found at https://self-improving-efms.github.io .
【2】Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
标题:非静止多臂盗贼的约束反馈学习
链接:https://arxiv.org/abs/2509.15073
作者:i, Jian Li
摘要:非固定的多武装土匪使代理能够适应不断变化的环境,通过结合机制来检测和响应奖励分配的变化,使其非常适合动态设置。然而,现有的方法通常假设奖励反馈在每一轮都是可用的,这一假设忽略了许多反馈有限的现实世界场景。在本文中,我们向前迈出了重要的一步,通过引入一个新的模型的约束反馈的非平稳多臂土匪,奖励反馈的可用性受到限制。我们提出了第一个无先验算法-也就是说,一个不需要先验知识的非平稳性的程度-在这种情况下,实现接近最佳的动态遗憾。具体来说,我们的算法达到了动态遗憾$\tilde{\mathcal{O}}({K^{1/3} V_T^{1/3} T }/{ B^{1/3}})$,其中$T$是轮数,$K$是臂数,$B$是查询预算,$V_T$是捕获非平稳程度的变化预算。
摘要
:Non-stationary multi-armed bandits enable agents to adapt to changing environments by incorporating mechanisms to detect and respond to shifts in reward distributions, making them well-suited for dynamic settings. However, existing approaches typically assume that reward feedback is available at every round - an assumption that overlooks many real-world scenarios where feedback is limited. In this paper, we take a significant step forward by introducing a new model of constrained feedback in non-stationary multi-armed bandits, where the availability of reward feedback is restricted. We propose the first prior-free algorithm - that is, one that does not require prior knowledge of the degree of non-stationarity - that achieves near-optimal dynamic regret in this setting. Specifically, our algorithm attains a dynamic regret of $\tilde{\mathcal{O}}({K^{1/3} V_T^{1/3} T }/{ B^{1/3}})$, where $T$ is the number of rounds, $K$ is the number of arms, $B$ is the query budget, and $V_T$ is the variation budget capturing the degree of non-stationarity.
【3】Communication Efficient Split Learning of ViTs with Attention-based Double Compression
标题:具有基于注意力的双重压缩的ViT的通信高效分割学习
链接:https://arxiv.org/abs/2509.15058
作者:Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane
摘要:本文提出了一种新的通信高效的分裂学习(SL)框架,命名为基于注意力的双重压缩(ADC),它减少了在SL训练过程中传输中间Vision Transformers激活所需的通信开销。ADC采用两种并行压缩策略。第一种策略基于最后一个客户端层中计算的平均注意力得分合并相似的样本激活;这种策略是类不可知的,这意味着它也可以合并具有不同类的样本,而不会失去泛化能力,也不会降低最终结果。第二种策略遵循第一种策略,丢弃最没有意义的令牌,进一步降低了通信成本。结合这些策略不仅可以在前向传递过程中发送更少的数据,而且梯度也会被自然压缩,从而允许整个模型在没有额外调整或近似梯度的情况下进行训练。仿真结果表明,基于注意力的双重压缩通过显着减少通信开销,同时保持高准确性,优于最先进的SL框架。
摘要:This paper proposes a novel communication-efficient Split Learning (SL) framework, named Attention-based Double Compression (ADC), which reduces the communication overhead required for transmitting intermediate Vision Transformers activations during the SL training process. ADC incorporates two parallel compression strategies. The first one merges samples' activations that are similar, based on the average attention score calculated in the last client layer; this strategy is class-agnostic, meaning that it can also merge samples having different classes, without losing generalization ability nor decreasing final results. The second strategy follows the first and discards the least meaningful tokens, further reducing the communication cost. Combining these strategies not only allows for sending less during the forward pass, but also the gradients are naturally compressed, allowing the whole model to be trained without additional tuning or approximations of the gradients. Simulation results demonstrate that Attention-based Double Compression outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
【4】Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale
标题:机器人控制栈:用于机器人大规模学习的精益生态系统
链接:https://arxiv.org/abs/2509.14932
作者:lg, Pierre Krack, Seongjin Bien, Yannik Blei, Khaled Gamal, Ken Nakahara, Johannes Hechtl, Roberto Calandra, Wolfram Burgard, Florian Walter
摘要:视觉-语言-动作模型(VLA)标志着机器人学习的重大转变。它们用大规模数据收集和特定于设置的微调取代了专业的架构和专家策略的任务定制组件。在这种以模型和可扩展训练为中心的机器学习工作流中,传统的机器人软件框架成为瓶颈,而机器人仿真仅为从现实世界实验过渡到现实世界实验提供有限的支持。在这项工作中,我们通过引入机器人控制栈(RCS)来缩小这一差距,RCS是一个从头开始设计的精益生态系统,旨在通过大规模的通才策略支持机器人学习研究。在其核心,RCS具有模块化和易于扩展的分层架构,具有用于模拟和物理机器人的统一接口,便于模拟到真实的传输。尽管它的占用空间和依赖性很小,但它提供了一个完整的功能集,可以进行真实世界的实验和大规模的模拟训练。我们的贡献是双重的:首先,我们介绍了RCS的体系结构,并解释其设计原则。其次,我们评估其可用性和性能沿VLA和RL政策的发展周期。我们的实验还提供了Octo,OpenVLA和Pi Zero在多个机器人上的广泛评估,并阐明了模拟数据如何提高现实世界的策略性能。我们的代码、数据集、权重和视频可在https://robotcontrolstack.github.io/上获得
摘要:Vision-Language-Action models (VLAs) mark a major shift in robot learning. They replace specialized architectures and task-tailored components of expert policies with large-scale data collection and setup-specific fine-tuning. In this machine learning-focused workflow that is centered around models and scalable training, traditional robotics software frameworks become a bottleneck, while robot simulations offer only limited support for transitioning from and to real-world experiments. In this work, we close this gap by introducing Robot Control Stack (RCS), a lean ecosystem designed from the ground up to support research in robot learning with large-scale generalist policies. At its core, RCS features a modular and easily extensible layered architecture with a unified interface for simulated and physical robots, facilitating sim-to-real transfer. Despite its minimal footprint and dependencies, it offers a complete feature set, enabling both real-world experiments and large-scale training in simulation. Our contribution is twofold: First, we introduce the architecture of RCS and explain its design principles. Second, we evaluate its usability and performance along the development cycle of VLA and RL policies. Our experiments also provide an extensive evaluation of Octo, OpenVLA, and Pi Zero on multiple robots and shed light on how simulation data can improve real-world policy performance. Our code, datasets, weights, and videos are available at: https://robotcontrolstack.github.io/
【5】Designing Latent Safety Filters using Pre-Trained Vision Models
标题:使用预先训练的视觉模型设计潜在安全过滤器
链接:https://arxiv.org/abs/2509.14758
作者:ara, Yuxuan Yang, Ahmad Hamzeh, Maxwell Astafyev, Hussein Sibai
摘要:确保基于视觉的控制系统的安全性仍然是阻碍其在关键环境中部署的主要挑战。安全滤波器作为确保经典控制系统安全的有效工具已经获得了越来越多的关注,但是它们在基于视觉的控制设置中的应用迄今为止受到限制。预训练的视觉模型(PVR)已被证明是各种机器人领域控制的有效感知骨干。在本文中,我们感兴趣的是检查它们的有效性时,用于设计基于视觉的安全过滤器。我们使用它们作为定义故障集的分类器、基于Hamilton-Jacobi(HJ)可达性的安全过滤器和潜在世界模型的主干。我们讨论了在训练PVR作为主干的模型时从头开始训练、微调和冻结PVR之间的权衡。我们还评估了其中一个PVR是否在所有任务中都是优越的,评估了学习世界模型或Q函数是否更适合于将决策切换到安全策略,并讨论了在资源受限的设备上部署这些PVR的实际考虑因素。
摘要
:Ensuring safety of vision-based control systems remains a major challenge hindering their deployment in critical settings. Safety filters have gained increased interest as effective tools for ensuring the safety of classical control systems, but their applications in vision-based control settings have so far been limited. Pre-trained vision models (PVRs) have been shown to be effective perception backbones for control in various robotics domains. In this paper, we are interested in examining their effectiveness when used for designing vision-based safety filters. We use them as backbones for classifiers defining failure sets, for Hamilton-Jacobi (HJ) reachability-based safety filters, and for latent world models. We discuss the trade-offs between training from scratch, fine-tuning, and freezing the PVRs when training the models they are backbones for. We also evaluate whether one of the PVRs is superior across all tasks, evaluate whether learned world models or Q-functions are better for switching decisions to safe policies, and discuss practical considerations for deploying these PVRs on resource-constrained devices.
【6】ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning
标题:Tools Sample:基于RL的工具学习的双重动态抽样方法和课程学习
链接:https://arxiv.org/abs/2509.14718
作者:g, Xiaoxue Wang, Bowen Wu, Hailong Cao, Tiejun Zhao, Qun Yu, Baoxun Wang
摘要:虽然强化学习(RL)越来越多地用于基于LLM的工具学习,但其效率往往受到过多简单样本的阻碍,这些样本随着训练的进行而减少学习价值。现有的动态采样技术不适合工具学习所固有的多任务结构和细粒度奖励机制。本文介绍了动态抽样与课程学习(DSCL),一个框架,专门设计来解决这一挑战,针对工具学习的独特特征:其多个相互依赖的子任务和多值奖励功能。DSCL有两个核心组件:基于奖励的动态采样,它使用多维奖励统计(均值和方差)来优先考虑有价值的数据,以及基于任务的动态课程学习,它自适应地将训练集中在较少掌握的子任务上。通过大量的实验,我们证明了DSCL在强基线上显着提高了训练效率和模型性能,在BFCLv3基准测试中提高了3.29%。我们的方法提供了一个量身定制的解决方案,有效地利用工具学习中复杂的奖励信号和子任务动态来实现卓越的结果。
摘要:While reinforcement learning (RL) is increasingly used for LLM-based tool learning, its efficiency is often hampered by an overabundance of simple samples that provide diminishing learning value as training progresses. Existing dynamic sampling techniques are ill-suited for the multi-task structure and fine-grained reward mechanisms inherent to tool learning. This paper introduces Dynamic Sampling with Curriculum Learning (DSCL), a framework specifically designed to address this challenge by targeting the unique characteristics of tool learning: its multiple interdependent sub-tasks and multi-valued reward functions. DSCL features two core components: Reward-Based Dynamic Sampling, which uses multi-dimensional reward statistics (mean and variance) to prioritize valuable data, and Task-Based Dynamic Curriculum Learning, which adaptively focuses training on less-mastered sub-tasks. Through extensive experiments, we demonstrate that DSCL significantly improves training efficiency and model performance over strong baselines, achieving a 3.29\% improvement on the BFCLv3 benchmark. Our method provides a tailored solution that effectively leverages the complex reward signals and sub-task dynamics within tool learning to achieve superior results.
【7】Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition
标题:具有低等级分解的高位张量数据的结构保持裕度分布学习
链接:https://arxiv.org/abs/2509.14577
作者:Junpeng Li, Changchun Hua, Yana Yang
摘要:LMDM(Large Margin Distribution Machine)是分类器设计的最新进展,它不仅优化了最小间隔(如SVM),而且优化了整个间隔分布,从而提高了泛化能力。然而,现有的LMDM公式仅限于矢量化输入,并且由于需要扁平化而与高维张量数据斗争,这破坏了数据固有的多模结构并增加了计算负担。在本文中,我们提出了一种用于高阶张量数据的低秩分解(SPMD-LRT)的结构保持间隔分布学习,该学习直接对张量表示进行操作,而无需矢量化。SPMD-LRT通过将一阶和二阶张量统计(边缘均值和方差)纳入目标来保留多维空间结构,并利用低秩张量分解技术(包括秩1(CP),高阶CP和Tucker分解)来参数化权重张量。交替优化(双梯度下降)算法的开发,以有效地解决SPMD-LRT,迭代更新因子矩阵和核心张量。这种方法使SPMD-LRT保持高阶数据的结构信息,同时优化边缘分布,以提高分类。在不同数据集(包括MNIST,图像和fMRI神经成像)上的广泛实验表明,SPMD-LRT与传统SVM,基于向量的LMDM和先前基于张量的SVM扩展(支持张量机和支持塔克机)相比,具有更高的分类精度。值得注意的是,SPMD-LRT与塔克分解达到最高的精度,突出了结构保存的好处。这些结果证实了SPMD-LRT在处理高维张量数据进行分类方面的有效性和鲁棒性。
摘要:The Large Margin Distribution Machine (LMDM) is a recent advancement in classifier design that optimizes not just the minimum margin (as in SVM) but the entire margin distribution, thereby improving generalization. However, existing LMDM formulations are limited to vectorized inputs and struggle with high-dimensional tensor data due to the need for flattening, which destroys the data's inherent multi-mode structure and increases computational burden. In this paper, we propose a Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition (SPMD-LRT) that operates directly on tensor representations without vectorization. The SPMD-LRT preserves multi-dimensional spatial structure by incorporating first-order and second-order tensor statistics (margin mean and variance) into the objective, and it leverages low-rank tensor decomposition techniques including rank-1(CP), higher-rank CP, and Tucker decomposition to parameterize the weight tensor. An alternating optimization (double-gradient descent) algorithm is developed to efficiently solve the SPMD-LRT, iteratively updating factor matrices and core tensor. This approach enables SPMD-LRT to maintain the structural information of high-order data while optimizing margin distribution for improved classification. Extensive experiments on diverse datasets (including MNIST, images and fMRI neuroimaging) demonstrate that SPMD-LRT achieves superior classification accuracy compared to conventional SVM, vector-based LMDM, and prior tensor-based SVM extensions (Support Tensor Machines and Support Tucker Machines). Notably, SPMD-LRT with Tucker decomposition attains the highest accuracy, highlighting the benefit of structure preservation. These results confirm the effectiveness and robustness of SPMD-LRT in handling high-dimensional tensor data for classification.
【8】Evidential Physics-Informed Neural Networks for Scientific Discovery
标题:用于科学发现的证据物理学知情神经网络
链接:https://arxiv.org/abs/2509.14568
作者: Tan, Kuancheng Wang, Rafe McBeth
备注:15 pages, 4 figures
摘要:我们提出了证据物理信息神经网络(E-PINN)的基本理论和实现指南-一类新的不确定性感知PINN。它利用证据深度学习的边缘分布损失函数来估计输出的不确定性,并通过学习的后验分布来推断PDE的未知参数。在两个说明性案例研究中验证我们的模型-具有高斯源的1D泊松方程和2D Fisher-KPP方程,我们发现E-PINN生成的经验覆盖概率比贝叶斯PINN和深度包围方法校准得更好。为了证明现实世界的适用性,我们还提出了一个简短的案例研究,应用E-PINN来分析临床葡萄糖-胰岛素数据集,这些数据集在糖尿病病理生理学的医学研究中具有特色。
摘要:We present the fundamental theory and implementation guidelines underlying Evidential Physics-Informed Neural Network (E-PINN) -- a novel class of uncertainty-aware PINN. It leverages the marginal distribution loss function of evidential deep learning for estimating uncertainty of outputs, and infers unknown parameters of the PDE via a learned posterior distribution. Validating our model on two illustrative case studies -- the 1D Poisson equation with a Gaussian source and the 2D Fisher-KPP equation, we found that E-PINN generated empirical coverage probabilities that were calibrated significantly better than Bayesian PINN and Deep Ensemble methods. To demonstrate real-world applicability, we also present a brief case study on applying E-PINN to analyze clinical glucose-insulin datasets that have featured in medical research on diabetes pathophysiology.
【9】LiMuon: Light and Fast Muon Optimizer for Large Models
标题:LiMuon:适用于大型模型的轻便快速Muon优化器
链接:https://arxiv.org/abs/2509.14562
作者:ng, Yuning Luo, Songcan Chen
备注:28 pages
摘要:近年来,大模型在人工智能中得到了广泛的应用,因此大模型的有效训练受到了广泛的关注。最近,一个有用的μ子优化器是专门为大型模型的矩阵结构参数设计的。虽然已有研究开始对μ子优化器进行研究,但现有的μ子及其变体仍然存在样本复杂度高或大型模型内存大的问题。为了填补这一空白,我们提出了一种用于训练大型模型的轻型快速Muon(LiMuon)优化器,它建立在基于动量的方差降低技术和随机奇异值分解(SVD)的基础上。我们的LiMuon优化器比当前的Muon及其变体具有更低的内存。此外,我们证明了在光滑条件下,我们的LiMuon对于寻找非凸随机优化的平稳解具有较低的样本复杂度为O(\N ^{-3})$.目前,已有的μ子优化算法的收敛性分析主要依赖于严格的Lipschitz光滑假设,而一些人工智能任务如训练大型语言模型(LLM)并不满足这一条件。我们还证明了我们的LiMuon优化器在广义光滑条件下的样本复杂度为O(-3)。在DistilGPT 2和ViT模型上的数值实验结果验证了LiMuon优化器的有效性。
摘要:Large models recently are widely applied in artificial intelligence, so efficient training of large models has received widespread attention. More recently, a useful Muon optimizer is specifically designed for matrix-structured parameters of large models. Although some works have begun to studying Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory for large models. To fill this gap, we propose a light and fast Muon (LiMuon) optimizer for training large models, which builds on the momentum-based variance reduced technique and randomized Singular Value Decomposition (SVD). Our LiMuon optimizer has a lower memory than the current Muon and its variants. Moreover, we prove that our LiMuon has a lower sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of non-convex stochastic optimization under the smooth condition. Recently, the existing convergence analysis of Muon optimizer mainly relies on the strict Lipschitz smooth assumption, while some artificial intelligence tasks such as training large language models (LLMs) do not satisfy this condition. We also proved that our LiMuon optimizer has a sample complexity of $O(\epsilon^{-3})$ under the generalized smooth condition. Numerical experimental results on training DistilGPT2 and ViT models verify efficiency of our LiMuon optimizer.
【10】Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models
标题:哈希基线:在预训练模型时代重新思考哈希
链接:https://arxiv.org/abs/2509.14427
作者:ummad, Kawtar Zaher, Lukas Rauch, Alexis Joly
摘要:使用紧凑二进制嵌入的信息检索(也称为哈希)对于可扩展的快速搜索应用程序至关重要,但最先进的哈希方法需要昂贵的特定于XML的训练。在这项工作中,我们引入了Hashing-Baseline,这是一种强大的免训练哈希方法,利用强大的预训练编码器产生丰富的预训练嵌入。我们重新审视经典的,无需训练的哈希技术:主成分分析,随机正交投影和阈值二值化,以产生强大的哈希基线。我们的方法将这些技术与来自最先进的视觉和音频编码器的冻结嵌入相结合,以产生具有竞争力的检索性能,而无需任何额外的学习或微调。为了证明这种方法的通用性和有效性,我们评估它的标准图像检索基准,以及一个新推出的基准音频哈希。
摘要:Information retrieval with compact binary embeddings, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pretrained encoders that produce rich pretrained embeddings. We revisit classical, training-free hashing techniques: principal component analysis, random orthogonal projection, and threshold binarization, to produce a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as a newly introduced benchmark for audio hashing.
【11】A Neural Network for the Identical Kuramoto Equation: Architectural Considerations and Performance Evaluation
标题:相同Kuramoto方程的神经网络:架构考虑和性能评估
链接:https://arxiv.org/abs/2509.14384
作者: Panigrahi, Mayank Patwal
备注:6 pages, 10 figures. Presented at IEEE International Conference on Compute, Control, Network & Photonics (ICCCNP), 2025
摘要
:在本文中,我们研究了深度神经网络(DNN)近似来自同振子仓本模型的非局部守恒律解的效率,重点是评估架构选择及其对基于能量范数和计算时间的解精度的影响。通过系统的实验,我们证明了网络配置参数-特别是激活函数选择(tanh vs. sin vs. ReLU),网络深度(4-8个隐藏层),宽度(64-256个神经元)和训练方法(配置点,epoch计数)-显着影响收敛特性。我们观察到,双曲正切激活产生稳定的收敛跨配置,而正弦激活可以达到较低的误差和训练时间在孤立的情况下,但偶尔会产生非物理的文物。我们与传统数值方法的比较分析表明,最佳配置的DNN提供了具有竞争力的准确性,具有显着不同的计算权衡。此外,我们确定了标准前馈架构在处理奇异或分段常数解时的基本限制,提供了经验证据表明,由于标准激活函数的自然函数空间限制,此类网络固有地过度平滑尖锐特征。这项工作为越来越多的基于神经网络的科学计算研究做出了贡献,为实践者提供了DNN实施的经验指南,同时阐明了必须克服的基本理论约束,以将其适用性扩展到更具挑战性的不连续物理系统。
摘要:In this paper, we investigate the efficiency of Deep Neural Networks (DNNs) to approximate the solution of a nonlocal conservation law derived from the identical-oscillator Kuramoto model, focusing on the evaluation of an architectural choice and its impact on solution accuracy based on the energy norm and computation time. Through systematic experimentation, we demonstrate that network configuration parameters-specifically, activation function selection (tanh vs. sin vs. ReLU), network depth (4-8 hidden layers), width (64-256 neurons), and training methodology (collocation points, epoch count)-significantly influence convergence characteristics. We observe that tanh activation yields stable convergence across configurations, whereas sine activation can attain marginally lower errors and training times in isolated cases, but occasionally produce nonphysical artefacts. Our comparative analysis with traditional numerical methods shows that optimally configured DNNs offer competitive accuracy with notably different computational trade-offs. Furthermore, we identify fundamental limitations of standard feed-forward architectures when handling singular or piecewise-constant solutions, providing empirical evidence that such networks inherently oversmooth sharp features due to the natural function space limitations of standard activation functions. This work contributes to the growing body of research on neural network-based scientific computing by providing practitioners with empirical guidelines for DNN implementation while illuminating fundamental theoretical constraints that must be overcome to expand their applicability to more challenging physical systems with discontinuities.
【12】Monitoring Machine Learning Systems: A Multivocal Literature Review
标题:监控机器学习系统:多域文献评论
链接:https://arxiv.org/abs/2509.14294
作者:ed, Scott Barnett, Chetan Arora, John Grundy, Hourieh Khalajzadeh, Omar Haggag
摘要:动态的生产环境使得维护可靠的机器学习(ML)系统变得非常具有挑战性。在生产环境中,数据模式或操作上下文的更改等会降低模型性能的故障问题是常见的。通过监控,可以及早发现和缓解这些运行时问题,帮助维护用户的信任,并防止对组织造成不必要的后果。目的:本研究旨在提供ML监测文献的全面概述。方法:我们根据Garousi制定的指南进行了多目标文献综述(MLR),以调查136篇论文中ML监测方法的各个方面。结果如下:我们根据四个关键领域分析了选定的研究:(1)动机,目标和背景;(2)监测方面,具体技术,指标和工具;(3)贡献和好处;(4)当前的限制。我们还讨论了研究中发现的一些见解,它们的影响,以及对未来研究和实践的建议。结论:我们的MLR识别并总结了ML监控实践和差距,强调正式文献和灰色文献之间的相似性和脱节。我们的研究对学者和从业者都很有价值,因为它有助于选择适当的解决方案,突出当前方法的局限性,并为研究和工具开发提供未来的方向。
摘要:Context: Dynamic production environments make it challenging to maintain reliable machine learning (ML) systems. Runtime issues, such as changes in data patterns or operating contexts, that degrade model performance are a common occurrence in production settings. Monitoring enables early detection and mitigation of these runtime issues, helping maintain users' trust and prevent unwanted consequences for organizations. Aim: This study aims to provide a comprehensive overview of the ML monitoring literature. Method: We conducted a multivocal literature review (MLR) following the well established guidelines by Garousi to investigate various aspects of ML monitoring approaches in 136 papers. Results: We analyzed selected studies based on four key areas: (1) the motivations, goals, and context; (2) the monitored aspects, specific techniques, metrics, and tools; (3) the contributions and benefits; and (4) the current limitations. We also discuss several insights found in the studies, their implications, and recommendations for future research and practice. Conclusion: Our MLR identifies and summarizes ML monitoring practices and gaps, emphasizing similarities and disconnects between formal and gray literature. Our study is valuable for both academics and practitioners, as it helps select appropriate solutions, highlights limitations in current approaches, and provides future directions for research and tool development.
【13】Masked Diffusion Models as Energy Minimization
标题:作为能源最小化的掩蔽扩散模型
链接:https://arxiv.org/abs/2509.13866
作者:en, Shen Nie, Jiacheng Sun, Zijin Feng, Zhenguo Li, Ji-Rong Wen, Chongxuan Li
摘要:我们提出了一个系统的理论框架,解释掩蔽扩散模型(MDM)的离散最优运输的能量最小化问题的解决方案。具体来说,我们证明了三种不同的能量配方-动能,条件动能和测地线能量-在数学上是等价的MDM结构下,和MDM最小化所有三个时,掩模时间表满足封闭形式的最优性条件。这种统一不仅澄清了MDM的理论基础,而且还激发了采样的实际改进。通过Beta分布参数化插值时间表,我们将时间表设计空间减少到易于处理的2D搜索,从而在不修改模型的情况下实现有效的训练后调整。合成和现实世界的基准实验表明,我们的能源启发的时间表优于手工制作的基线,特别是在低步长采样设置。
摘要:We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy formulations--kinetic, conditional kinetic, and geodesic energy--are mathematically equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. This unification not only clarifies the theoretical foundations of MDMs, but also motivates practical improvements in sampling. By parameterizing interpolation schedules via Beta distributions, we reduce the schedule design space to a tractable 2D search, enabling efficient post-training tuning without model modification. Experiments on synthetic and real-world benchmarks demonstrate that our energy-inspired schedules outperform hand-crafted baselines, particularly in low-step sampling settings.
【14】Shedding Light on Dark Matter at the LHC with Machine Learning
标题
:利用机器学习在大型强子对撞机上揭开暗物质的面纱
链接:https://arxiv.org/abs/2509.15121
作者:rganda, Martín de los Rios, Andres D. Perez, Subhojit Roy, Rosa M. Sandá Seoane, Carlos E. M. Wagner
备注:24 pages + references, 5 figures, 8 tables
摘要:本文在Z3对称次极小超对称标准模型中研究了一个以单斜占优的最轻超对称粒子形式存在的WIMP暗物质候选者。这种框架产生了参数空间的区域,在这些区域中,DM是通过与附近的类希格西诺电弱子的共湮灭获得的,并且DM直接检测信号被抑制,即所谓的"盲点”。另一方面,对撞机签名仍然是有希望的,由于增强的辐射衰变模式的希格斯玻色子到单主导LSP和光子,而不是到轻子或强子。这激发了对辐射衰变中性伴子的搜索,然而,这些信号面临着大量的背景挑战,因为衰变产物通常是软的,这是由于LSP和希格斯诺类共湮灭伙伴之间的小质量分裂($\Delta m$)。我们应用数据驱动的机器学习(ML)分析,提高了对这些微妙信号的敏感性,为传统搜索策略提供了强大的补充,以发现新的物理场景。使用LHC积分光度为100 ~\mathrm{fb}^{-1}$,能量为14 ~\mathrm{TeV}$,该方法实现了5\sigma $的发现范围,质量高达225 ~\mathrm{GeV}$,能量为$\Delta m\!lesssim\!12~\mathrm{GeV}$,以及$2\sigma$排除高达$285~\mathrm{GeV}$,其中$\Delta m\!\ lesssim\!20 GeV。这些结果突出了对撞机搜索探测DM候选者的能力,这些候选者仍然隐藏在当前的直接检测实验中,并为LHC合作使用ML方法进行搜索提供了动力。
摘要:We investigate a WIMP dark matter (DM) candidate in the form of a singlino-dominated lightest supersymmetric particle (LSP) within the $Z_3$-symmetric Next-to-Minimal Supersymmetric Standard Model. This framework gives rise to regions of parameter space where DM is obtained via co-annihilation with nearby higgsino-like electroweakinos and DM direct detection~signals are suppressed, the so-called ``blind spots". On the other hand, collider signatures remain promising due to enhanced radiative decay modes of higgsinos into the singlino-dominated LSP and a photon, rather than into leptons or hadrons. This motivates searches for radiatively decaying neutralinos, however, these signals face substantial background challenges, as the decay products are typically soft due to the small mass-splits ($\Delta m$) between the LSP and the higgsino-like coannihilation partners. We apply a data-driven Machine Learning (ML) analysis that improves sensitivity to these subtle signals, offering a powerful complement to traditional search strategies to discover a new physics scenario. Using an LHC integrated luminosity of $100~\mathrm{fb}^{-1}$ at $14~\mathrm{TeV}$, the method achieves a $5\sigma$ discovery reach for higgsino masses up to $225~\mathrm{GeV}$ with $\Delta m\!\lesssim\!12~\mathrm{GeV}$, and a $2\sigma$ exclusion up to $285~\mathrm{GeV}$ with $\Delta m\!\lesssim\!20~\mathrm{GeV}$. These results highlight the power of collider searches to probe DM candidates that remain hidden from current direct detection experiments, and provide a motivation for a search by the LHC collaborations using ML methods.
【15】Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance
标题:通过反记忆指导减轻文本到音频生成扩散模型中的数据复制
链接:https://arxiv.org/abs/2509.14934
作者: Messina, Francesca Ronchini, Luca Comanducci, Paolo Bestagini, Fabio Antonacci
摘要:生成音频模型中的一个持续挑战是数据复制,其中模型在推理期间无意中生成其训练数据的一部分。在这项工作中,我们解决这个问题,在文本到音频的扩散模型,探索使用反记忆策略。我们采用了抗记忆指导(AMG),这是一种修改预训练扩散模型的采样过程以阻止记忆的技术。我们的研究探讨了AMG中的三种类型的指导,每种指导都旨在减少复制,同时保持生成质量。我们使用Stable Audio Open作为我们的骨干,利用其完全开源的架构和训练数据集。我们全面的实验分析表明,AMG显着减轻记忆扩散为基础的文本到音频生成,而不损害音频保真度或语义对齐。
摘要:A persistent challenge in generative audio models is data replication, where the model unintentionally generates parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by exploring the use of anti-memorization strategies. We adopt Anti-Memorization Guidance (AMG), a technique that modifies the sampling process of pre-trained diffusion models to discourage memorization. Our study explores three types of guidance within AMG, each designed to reduce replication while preserving generation quality. We use Stable Audio Open as our backbone, leveraging its fully open-source architecture and training dataset. Our comprehensive experimental analysis suggests that AMG significantly mitigates memorization in diffusion-based text-to-audio generation without compromising audio fidelity or semantic alignment.
【16】Beyond Spherical geometry: Unraveling complex features of objects orbiting around stars from its transit light curve using deep learning
标题:超越球形几何:使用深度学习从凌日光曲线解开绕恒星运行的物体的复杂特征
链接:https://arxiv.org/abs/2509.14875
作者:owmick, Shivam Kumaran
备注:16 pages, 17 figures
摘要:从凌日光变曲线来描述围绕恒星运行的物体的几何特征是揭示各种复杂现象的有力工具。这个问题本质上是不适定的,因为相似或相同的光变曲线可以由多个不同的形状产生。在这项研究中,我们调查的程度,形状的功能可以嵌入在过境光变曲线。我们生成了一个二维随机形状库,并用光变曲线模拟器Yuti模拟了它们的渡越光变曲线。每个形状被分解成一系列的椭圆分量表示的傅立叶系数的形式,增加了越来越少的扰动到一个理想的椭圆。我们训练深度神经网络,直接从模拟的光变曲线预测这些傅立叶系数。我们的研究结果表明,神经网络可以成功地重建低阶椭圆,它描述了整体形状,方向和大规模的扰动。对于高阶椭圆的规模成功地确定,但偏心率和取向的推断是有限的,展示了光变曲线中的形状信息的程度。我们探讨了非凸形状特征在重建中的影响,并展示了其对形状方向的依赖性。神经网络实现的重建水平强调了使用光变曲线作为从过境系统中提取几何信息的手段的实用性。
摘要
:Characterizing the geometry of an object orbiting around a star from its transit light curve is a powerful tool to uncover various complex phenomena. This problem is inherently ill-posed, since similar or identical light curves can be produced by multiple different shapes. In this study, we investigate the extent to which the features of a shape can be embedded in a transit light curve. We generate a library of two-dimensional random shapes and simulate their transit light curves with light curve simulator, Yuti. Each shape is decomposed into a series of elliptical components expressed in the form of Fourier coefficients that adds increasingly diminishing perturbations to an ideal ellipse. We train deep neural networks to predict these Fourier coefficients directly from simulated light curves. Our results demonstrate that the neural network can successfully reconstruct the low-order ellipses, which describe overall shape, orientation and large-scale perturbations. For higher order ellipses the scale is successfully determined but the inference of eccentricity and orientation is limited, demonstrating the extent of shape information in the light curve. We explore the impact of non-convex shape features in reconstruction, and show its dependence on shape orientation. The level of reconstruction achieved by the neural network underscores the utility of using light curves as a means to extract geometric information from transiting systems.
【17】Data coarse graining can improve model performance
标题:数据粗粒度可以提高模型性能
链接:https://arxiv.org/abs/2509.14498
作者:en, David J. Schwab, Vudtiwat Ngampruetikorn
备注:7 pages, 4 figures
摘要:从定义上讲,有损数据转换会丢失信息。然而,在现代机器学习中,数据修剪和有损数据增强等方法可以帮助提高泛化性能。我们使用一个可解的高维岭正则化线性回归模型来研究这个悖论。“受统计物理学中重正化群的启发,我们分析了粗粒度方案,这些方案根据特征与学习任务的相关性系统地丢弃特征。我们的研究结果揭示了一个非单调依赖的预测风险的程度粗粒化。一个“高通”方案--过滤掉不太相关、信号较低的特征--可以帮助模型更好地泛化。相比之下,集成了更相关、更高信号特性的“低通”方案纯粹是有害的。至关重要的是,使用最优正则化,我们证明了这种非单调性是数据粗粒度的明显效果,而不是双重下降的伪影。我们的框架提供了一个清晰的,分析性的解释,解释了为什么仔细的数据增强会起作用:它剥离了不太相关的自由度,并隔离了更多的预测信号。我们的研究结果突出了由数据结构塑造的复杂的非单调风险景观,并说明了统计物理学的思想如何为理解现代机器学习现象提供了一个原则性的视角。
摘要:Lossy data transformations by definition lose information. Yet, in modern machine learning, methods like data pruning and lossy data augmentation can help improve generalization performance. We study this paradox using a solvable model of high-dimensional, ridge-regularized linear regression under 'data coarse graining.' Inspired by the renormalization group in statistical physics, we analyze coarse-graining schemes that systematically discard features based on their relevance to the learning task. Our results reveal a nonmonotonic dependence of the prediction risk on the degree of coarse graining. A 'high-pass' scheme--which filters out less relevant, lower-signal features--can help models generalize better. By contrast, a 'low-pass' scheme that integrates out more relevant, higher-signal features is purely detrimental. Crucially, using optimal regularization, we demonstrate that this nonmonotonicity is a distinct effect of data coarse graining and not an artifact of double descent. Our framework offers a clear, analytical explanation for why careful data augmentation works: it strips away less relevant degrees of freedom and isolates more predictive signals. Our results highlight a complex, nonmonotonic risk landscape shaped by the structure of the data, and illustrate how ideas from statistical physics provide a principled lens for understanding modern machine learning phenomena.
【18】Property-Isometric Variational Autoencoders for Sequence Modeling and Design
标题:用于序列建模和设计的特性等距变分自动编码器
链接:https://arxiv.org/abs/2509.14287
作者:eghi, Xianqi Deng, I-Hsin Lin, Stacy M. Copp, Petko Bogdanov
备注:20 pages, 6 figures, preprint
摘要:具有所需功能特性的生物序列设计(DNA,RNA或肽)在发现新型纳米材料,生物传感器,抗菌药物等方面具有应用。一个常见的挑战是优化复杂的高维特性的能力,例如DNA介导的荧光纳米颗粒的目标发射光谱,光和化学稳定性,以及肽在目标微生物中的抗微生物活性。现有的模型依赖于简单的二进制标签(例如,结合/非结合)而不是高维复杂性质。为了解决这个问题,我们提出了一个几何保持变分自动编码器框架,称为PrIVAE,它学习尊重其属性空间几何的潜在序列嵌入。具体来说,我们将属性空间建模为一个高维流形,可以通过最近邻图局部近似,给出适当定义的距离度量。我们使用属性图来指导序列潜在表示,使用(1)图神经网络编码器层和(2)等距正则化器。PrIVAE学习一个属性组织的潜在空间,通过使用训练的解码器,可以合理设计具有所需属性的新序列。我们评估了我们的框架对两个生成任务的效用:(1)设计模板荧光金属纳米簇的DNA序列和(2)设计抗菌肽。训练后的模型在根据属性组织潜在空间的同时保持了较高的重建精度。除了计算机模拟实验之外,我们还采用采样序列进行DNA纳米簇的湿实验室设计,与训练数据中的丰度相比,稀有性质纳米簇的富集高达16.1倍,证明了我们框架的实用性。
摘要:Biological sequence design (DNA, RNA, or peptides) with desired functional properties has applications in discovering novel nanomaterials, biosensors, antimicrobial drugs, and beyond. One common challenge is the ability to optimize complex high-dimensional properties such as target emission spectra of DNA-mediated fluorescent nanoparticles, photo and chemical stability, and antimicrobial activity of peptides across target microbes. Existing models rely on simple binary labels (e.g., binding/non-binding) rather than high-dimensional complex properties. To address this gap, we propose a geometry-preserving variational autoencoder framework, called PrIVAE, which learns latent sequence embeddings that respect the geometry of their property space. Specifically, we model the property space as a high-dimensional manifold that can be locally approximated by a nearest neighbor graph, given an appropriately defined distance measure. We employ the property graph to guide the sequence latent representations using (1) graph neural network encoder layers and (2) an isometric regularizer. PrIVAE learns a property-organized latent space that enables rational design of new sequences with desired properties by employing the trained decoder. We evaluate the utility of our framework for two generative tasks: (1) design of DNA sequences that template fluorescent metal nanoclusters and (2) design of antimicrobial peptides. The trained models retain high reconstruction accuracy while organizing the latent space according to properties. Beyond in silico experiments, we also employ sampled sequences for wet lab design of DNA nanoclusters, resulting in up to 16.1-fold enrichment of rare-property nanoclusters compared to their abundance in training data, demonstrating the practical utility of our framework.
其他(18篇)
【1】CausalPre: Scalable and Effective Data Pre-processing for Causal Fairness
标题:Causon Pre:可扩展且有效的数据预处理,以实现因果公平
链接
:https://arxiv.org/abs/2509.15199
作者:g, Yangfan Jiang, Kian-Lee Tan
摘要:数据库中的因果公平性对于防止下游任务中有偏见和不准确的结果至关重要。虽然大多数以前的工作假设一个已知的因果模型,最近的努力放松了这一假设,强制执行额外的约束。然而,这些方法往往无法捕捉更广泛的属性关系,这是至关重要的维护实用程序。这就提出了一个基本问题:我们能否利用因果推理的好处来设计高效和有效的公平解决方案,而不依赖于对潜在因果模型的强有力假设?在本文中,我们试图回答这个问题,通过引入CausalPre,一个可扩展的和有效的cacidity引导的数据预处理框架,保证合理的公平性,一个强因果公平的概念。Causal Pre通过将原本复杂且计算上不可行的提取任务重新公式化为定制的分布估计问题来提取因果公平关系。为了确保可扩展性,Captain Pre采用了一种精心制作的低维边际因子分解变体来近似联合分布,并辅以启发式算法,有效地解决了相关的计算挑战。在基准数据集上进行的大量实验表明,CausePre是有效的和可扩展的,挑战了传统的信念,即实现因果公平性需要权衡关系覆盖率和宽松的模型假设。
摘要:Causal fairness in databases is crucial to preventing biased and inaccurate outcomes in downstream tasks. While most prior work assumes a known causal model, recent efforts relax this assumption by enforcing additional constraints. However, these approaches often fail to capture broader attribute relationships that are critical to maintaining utility. This raises a fundamental question: Can we harness the benefits of causal reasoning to design efficient and effective fairness solutions without relying on strong assumptions about the underlying causal model? In this paper, we seek to answer this question by introducing CausalPre, a scalable and effective causality-guided data pre-processing framework that guarantees justifiable fairness, a strong causal notion of fairness. CausalPre extracts causally fair relationships by reformulating the originally complex and computationally infeasible extraction task into a tailored distribution estimation problem. To ensure scalability, CausalPre adopts a carefully crafted variant of low-dimensional marginal factorization to approximate the joint distribution, complemented by a heuristic algorithm that efficiently tackles the associated computational challenge. Extensive experiments on benchmark datasets demonstrate that CausalPre is both effective and scalable, challenging the conventional belief that achieving causal fairness requires trading off relationship coverage for relaxed model assumptions.
【2】Emergent Alignment via Competition
标题:通过竞争实现紧急联盟
链接:https://arxiv.org/abs/2509.15090
作者:ollina, Surbhi Goel, Aaron Roth, Emily Ryu, Mirah Shi
摘要:使人工智能系统与人类价值观保持一致仍然是一个根本性的挑战,但我们无法创建完美一致的模型是否会妨碍我们获得一致性的好处?我们研究了一种策略设置,在这种设置中,人类用户与多个不同的错位AI代理进行交互,其中没有一个是单独对齐的。我们的主要观点是,当用户效用近似位于代理效用的凸包内时,随着模型多样性的增加,这种条件变得更容易满足,战略竞争可以产生与完美对齐模型相媲美的结果。我们将其建模为多领导者Stackelberg博弈,将贝叶斯说服扩展到不同知情方之间的多轮对话,并证明了三个结果:(1)当完美对齐将允许用户学习她的贝叶斯最优动作时,她也可以在凸包条件下的所有均衡中这样做(2)在仅需要近似效用学习的较弱假设下,采用量子响应的非策略用户在所有均衡中实现接近最优效用,以及(3)当用户在评估期之后选择最佳单个AI时,均衡保证保持接近最优,而无需进一步的分布假设。我们用两组实验来补充这个理论。
摘要:Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Our key insight is that when the users utility lies approximately within the convex hull of the agents utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. We model this as a multi-leader Stackelberg game, extending Bayesian persuasion to multi-round conversations between differently informed parties, and prove three results: (1) when perfect alignment would allow the user to learn her Bayes-optimal action, she can also do so in all equilibria under the convex hull condition (2) under weaker assumptions requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria and (3) when the user selects the best single AI after an evaluation period, equilibrium guarantees remain near-optimal without further distributional assumptions. We complement the theory with two sets of experiments.
【3】Probabilistic and nonlinear compressive sensing
标题:概率和非线性压缩感知
链接:https://arxiv.org/abs/2509.15060
作者:vester Barth, Paulo von Petersenn
摘要:我们提出了一个光滑的概率重新制定的$\ell_0$正则化回归,不需要蒙特卡罗采样,并允许精确梯度的计算,促进快速收敛到局部最优的最佳子集选择问题。该方法大大提高了收敛速度相比,类似的蒙特卡罗方法。此外,我们的经验表明,它优于压缩感知算法,如IHT和(松弛)Lasso在广泛的设置和信噪比。该实现在CPU和GPU上都能高效运行,可在https://github.com/L0-and-behold/probabilistic-nonlinear-cs上免费获得。 我们还有助于压缩感知的非线性推广的研究,通过调查当一个非线性教师网络的参数恢复是可能的,通过学生网络的压缩。建立在定理的February man和马克尔,我们从理论上表明,在无限数据限制的全局最优强制恢复到一定的对称性。对于经验验证,我们实现了一个正常形式的算法,选择一个典型的代表在每个对称类。然而,虽然压缩可以帮助改善测试损失,我们发现,确切的参数恢复甚至是不可能的对称。特别是,我们观察到一个令人惊讶的反弹效应,教师和学生的配置最初收敛,但随后发散,尽管不断减少测试损失。这些发现表明了线性和非线性压缩传感之间的根本差异。
摘要
:We present a smooth probabilistic reformulation of $\ell_0$ regularized regression that does not require Monte Carlo sampling and allows for the computation of exact gradients, facilitating rapid convergence to local optima of the best subset selection problem. The method drastically improves convergence speed compared to similar Monte Carlo based approaches. Furthermore, we empirically demonstrate that it outperforms compressive sensing algorithms such as IHT and (Relaxed-) Lasso across a wide range of settings and signal-to-noise ratios. The implementation runs efficiently on both CPUs and GPUs and is freely available at https://github.com/L0-and-behold/probabilistic-nonlinear-cs. We also contribute to research on nonlinear generalizations of compressive sensing by investigating when parameter recovery of a nonlinear teacher network is possible through compression of a student network. Building upon theorems of Fefferman and Markel, we show theoretically that the global optimum in the infinite-data limit enforces recovery up to certain symmetries. For empirical validation, we implement a normal-form algorithm that selects a canonical representative within each symmetry class. However, while compression can help to improve test loss, we find that exact parameter recovery is not even possible up to symmetries. In particular, we observe a surprising rebound effect where teacher and student configurations initially converge but subsequently diverge despite continuous decrease in test loss. These findings indicate fundamental differences between linear and nonlinear compressive sensing.
【4】Sample Efficient Experience Replay in Non-stationary Environments
标题:非静止环境中高效体验重播的示例
链接:https://arxiv.org/abs/2509.15032
作者:Duan, Zongyuan Zhang, Songxiao Guo, Yuanye Zhao, Zheng Lin, Zihan Fang, Yi Liu, Dianxin Luan, Dong Huang, Heming Cui, Yong Cui
备注:5 pages, 3 figures
摘要:非平稳环境中的强化学习(RL)具有挑战性,因为不断变化的动态和奖励很快使过去的经验过时。传统的经验重放(ER)方法,特别是那些使用TD错误优先级,努力区分由代理的政策和环境引起的变化,导致在动态条件下的学习效率低下。为了应对这一挑战,我们提出了环境动态的离散性(DoE),一个隔离环境变化对价值函数影响的度量。在此基础上,我们介绍了环境优先经验重放(DEER),一个自适应ER框架,优先过渡的基础上,政策更新和环境变化的离散性。DEER使用二元分类器来检测环境变化,并在每次轮班之前和之后应用不同的优先级策略,从而实现更有效的样本学习。在四个非平稳基准上的实验表明,与性能最好的最先进的ER方法相比,DEER进一步提高了11.54%的非策略算法的性能。
摘要:Reinforcement learning (RL) in non-stationary environments is challenging, as changing dynamics and rewards quickly make past experiences outdated. Traditional experience replay (ER) methods, especially those using TD-error prioritization, struggle to distinguish between changes caused by the agent's policy and those from the environment, resulting in inefficient learning under dynamic conditions. To address this challenge, we propose the Discrepancy of Environment Dynamics (DoE), a metric that isolates the effects of environment shifts on value functions. Building on this, we introduce Discrepancy of Environment Prioritized Experience Replay (DEER), an adaptive ER framework that prioritizes transitions based on both policy updates and environmental changes. DEER uses a binary classifier to detect environment changes and applies distinct prioritization strategies before and after each shift, enabling more sample-efficient learning. Experiments on four non-stationary benchmarks demonstrate that DEER further improves the performance of off-policy algorithms by 11.54 percent compared to the best-performing state-of-the-art ER methods.
【5】Robust Barycenters of Persistence Diagrams
标题:持久性图的稳健重心
链接:https://arxiv.org/abs/2509.14904
作者:ouk, Eloi Tanguy, Julie Delon, Julien Tierny
摘要:这篇简短的论文提出了一种计算持久性图的鲁棒Wasserstein重心的通用方法。经典的方法是在找到重心图和持久图之间的最优运输方案后,计算分配算术平均值。然而,这个过程只适用于与$q$-Wasserstein距离$W_q$相关的运输成本,当$q=2$时。我们采用另一种固定点方法来计算一般运输成本($q > 1$)的重心图,特别是对离群值($q \in(1,2)$)的鲁棒性。我们的工作在两个应用程序中的效用:持久性图的度量空间上的聚类和持久性图的字典编码。在这两种情况下,我们证明了我们的广义框架提供的离群值的鲁棒性。我们的Python实现可在以下地址获得:https://github.com/Keanu-Sisouk/RobustBarycenter。
摘要:This short paper presents a general approach for computing robust Wasserstein barycenters of persistence diagrams. The classical method consists in computing assignment arithmetic means after finding the optimal transport plans between the barycenter and the persistence diagrams. However, this procedure only works for the transportation cost related to the $q$-Wasserstein distance $W_q$ when $q=2$. We adapt an alternative fixed-point method to compute a barycenter diagram for generic transportation costs ($q > 1$), in particular those robust to outliers, $q \in (1,2)$. We show the utility of our work in two applications: \emph{(i)} the clustering of persistence diagrams on their metric space and \emph{(ii)} the dictionary encoding of persistence diagrams. In both scenarios, we demonstrate the added robustness to outliers provided by our generalized framework. Our Python implementation is available at this address: https://github.com/Keanu-Sisouk/RobustBarycenter .
【6】Template-Based Cortical Surface Reconstruction with Minimal Energy Deformation
标题:基于模板的具有最小能量变形的皮质表面重建
链接:https://arxiv.org/abs/2509.14827
作者:adlindl, Fabian Bongratz, Christian Wachinger
摘要:来自磁共振成像(MRI)的皮质表面重建(CSR)是神经图像分析的基础,使得能够进行大脑皮质的形态学研究和功能性脑映射。基于学习的CSR的最新进展大大加速了处理,允许在几秒钟内通过解剖模板的变形进行重建。然而,确保学习的变形在变形能量方面是最佳的,并且在训练运行中保持一致仍然是一个特别的挑战。在这项工作中,我们设计了一个最小能量变形(MED)损失,作为变形轨迹上的正则化器,并补充了CSR中广泛使用的倒角距离。我们将其纳入到最近的V2 C-Flow模型中,并在不损害重建准确性和拓扑正确性的情况下,在以前被忽视的训练一致性和可重复性方面取得了相当大的改进。
摘要
:Cortical surface reconstruction (CSR) from magnetic resonance imaging (MRI) is fundamental to neuroimage analysis, enabling morphological studies of the cerebral cortex and functional brain mapping. Recent advances in learning-based CSR have dramatically accelerated processing, allowing for reconstructions through the deformation of anatomical templates within seconds. However, ensuring the learned deformations are optimal in terms of deformation energy and consistent across training runs remains a particular challenge. In this work, we design a Minimal Energy Deformation (MED) loss, acting as a regularizer on the deformation trajectories and complementing the widely used Chamfer distance in CSR. We incorporate it into the recent V2C-Flow model and demonstrate considerable improvements in previously neglected training consistency and reproducibility without harming reconstruction accuracy and topological correctness.
【7】Pre-training under infinite compute
标题:无限计算下的预训练
链接:https://arxiv.org/abs/2509.14786
作者:m, Suhas Kotha, Percy Liang, Tatsunori Hashimoto
摘要:由于计算的增长速度比语言模型预训练可用的Web文本快得多,因此我们询问如何在固定数据和无计算约束的情况下进行预训练。我们首先表明,现有的数据约束的方法,增加历元计数和参数计数最终过拟合,我们显着改善了这样的食谱,通过适当调整正则化,发现最佳的权重衰减是30\times $大于标准实践。由于我们的正则化配方在参数计数中遵循简单的幂律单调减少损失,因此我们通过其缩放律的渐近线而不是固定计算预算下的性能来估计其最佳性能。然后,我们发现,集成独立训练的模型实现了比正则化配方明显更低的损失渐近线。我们最好的干预结合了划时代,正则化,参数缩放和集成缩放,使用比我们的基线少5.17倍的数据在2亿个令牌上实现了渐近线,我们的数据缩放定律预测这种改进在更高的令牌预算下持续存在。我们发现,我们的数据效率提高可以在更小的参数数下实现,因为我们可以将集合提取到比其小8 $\times $的学生模型中,并保留83\%$的集合优势。最后,我们为验证损失设计的干预措施推广到下游基准测试,与数学中期训练数据的持续预训练相比,预训练评估提高了9\%$,数据效率提高了17.5\times $。我们的研究结果表明,简单的算法改进可以在计算丰富的未来实现更有效的数据预训练。
摘要:Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay is $30\times$ larger than standard practice. Since our regularized recipe monotonically decreases loss following a simple power law in parameter count, we estimate its best possible performance via the asymptote of its scaling law rather than the performance at a fixed compute budget. We then identify that ensembling independently trained models achieves a significantly lower loss asymptote than the regularized recipe. Our best intervention combining epoching, regularization, parameter scaling, and ensemble scaling achieves an asymptote at 200M tokens using $5.17\times$ less data than our baseline, and our data scaling laws predict that this improvement persists at higher token budgets. We find that our data efficiency gains can be realized at much smaller parameter counts as we can distill an ensemble into a student model that is 8$\times$ smaller and retains $83\%$ of the ensembling benefit. Finally, our interventions designed for validation loss generalize to downstream benchmarks, achieving a $9\%$ improvement for pre-training evals and a $17.5\times$ data efficiency improvement over continued pre-training on math mid-training data. Our results show that simple algorithmic improvements can enable significantly more data-efficient pre-training in a compute-rich future.
【8】Stochastic Clock Attention for Aligning Continuous and Ordered Sequences
标题:对齐连续和有序序列的随机时钟注意力
链接:https://arxiv.org/abs/2509.14678
作者: Soh, Junghyo Jo
备注:8 pages, 3 figures
摘要:我们制定了一个连续和有序的序列,明确的功能作为一个对齐模型,这是许多序列到序列的任务的核心的注意力机制。标准缩放点积注意依赖于位置编码和掩码,但不强制连续性或单调性,这对于帧同步目标至关重要。我们提出了学习的非负的时钟源和目标和模型的注意力作为这些时钟的会议概率;路径积分推导产生一个封闭的形式,高斯式的评分规则,具有内在的偏向因果关系,平滑,近对角对齐,没有外部位置正则化。该框架支持两个互补的制度:规范化的时钟并行解码时,一个全球的长度是可用的,和非规范化的时钟自回归解码-两个几乎无参数,下拉式的替代品。在Transformer文本到语音测试平台中,这种构造产生更稳定的对齐和对全局时间缩放的改进的鲁棒性,同时匹配或改进缩放的点积基线的准确性。我们假设适用于其他连续的目标,包括视频和时间信号建模。
摘要:We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, which serves as the core of many sequence-to-sequence tasks. Standard scaled dot-product attention relies on positional encodings and masks but does not enforce continuity or monotonicity, which are crucial for frame-synchronous targets. We propose learned nonnegative \emph{clocks} to source and target and model attention as the meeting probability of these clocks; a path-integral derivation yields a closed-form, Gaussian-like scoring rule with an intrinsic bias toward causal, smooth, near-diagonal alignments, without external positional regularizers. The framework supports two complementary regimes: normalized clocks for parallel decoding when a global length is available, and unnormalized clocks for autoregressive decoding -- both nearly-parameter-free, drop-in replacements. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling while matching or improving accuracy over scaled dot-product baselines. We hypothesize applicability to other continuous targets, including video and temporal signal modeling.
【9】CUFG: Curriculum Unlearning Guided by the Forgetting Gradient
标题:CUFG:以遗忘梯度为指导的课程遗忘
链接:https://arxiv.org/abs/2509.14633
作者:iao, Liang Hu, Qi Zhang, Lai Zhong Yuan, Usman Naseem
备注:under review (early)
摘要
:随着隐私和安全成为人工智能的中心舞台,机器非学习,即从模型中删除特定知识的能力,已经引起了越来越多的关注。然而,现有的方法过于优先考虑效率和积极的遗忘,这引入了显着的局限性。特别是,像梯度上升、影响函数和随机标签噪声这样的激进干预可能会破坏模型权重的稳定性,导致崩溃和可靠性降低。为了解决这个问题,我们提出了CUFG(通过遗忘事件的课程遗忘),一个新的框架,通过遗忘机制和数据调度策略的创新,提高了近似遗忘的稳定性。具体来说,CUFG集成了一个新的梯度校正器,引导遗忘梯度微调为基础的unlearning和课程unlearning范式,逐渐忘记从容易到困难。这些创新通过实现更稳定和渐进的遗忘,缩小了与黄金标准的再训练方法的差距,从而提高了有效性和可靠性。此外,我们认为课程非学习的概念具有巨大的研究潜力,并为MU领域的发展提供了前瞻性的见解。广泛的实验在不同的遗忘场景验证了我们的方法和CUFG的原理和有效性。代码可在https://anonymous.4open.science/r/CUFG-6375上获得。
摘要:As privacy and security take center stage in AI, machine unlearning, the ability to erase specific knowledge from models, has garnered increasing attention. However, existing methods overly prioritize efficiency and aggressive forgetting, which introduces notable limitations. In particular, radical interventions like gradient ascent, influence functions, and random label noise can destabilize model weights, leading to collapse and reduced reliability. To address this, we propose CUFG (Curriculum Unlearning via Forgetting Gradients), a novel framework that enhances the stability of approximate unlearning through innovations in both forgetting mechanisms and data scheduling strategies. Specifically, CUFG integrates a new gradient corrector guided by forgetting gradients for fine-tuning-based unlearning and a curriculum unlearning paradigm that progressively forgets from easy to hard. These innovations narrow the gap with the gold-standard Retrain method by enabling more stable and progressive unlearning, thereby improving both effectiveness and reliability. Furthermore, we believe that the concept of curriculum unlearning has substantial research potential and offers forward-looking insights for the development of the MU field. Extensive experiments across various forgetting scenarios validate the rationale and effectiveness of our approach and CUFG. Codes are available at https://anonymous.4open.science/r/CUFG-6375.
【10】TICA-Based Free Energy Matching for Machine-Learned Molecular Dynamics
标题:基于TICA的机器学习分子动力学自由能匹配
链接:https://arxiv.org/abs/2509.14600
作者: Aghili, Andy Bruce, Daniel Sabo, Razvan Marinescu
备注:Proceedings of the ICML 2025 Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences, Vancouver, Canada. 2025. Copyright 2025 by the author(s). 4 Pages 5 Figures
摘要:分子动力学(MD)模拟提供了对生物分子系统的原子洞察力,但通常受到访问长时间尺度所需的高计算成本的限制。粗粒度的机器学习模型为加速采样提供了一种有前途的途径,但传统的力匹配方法往往无法捕捉到完整的热力学景观,因为在梯度上拟合模型可能不适合低能构象状态之间的绝对差异。在这项工作中,我们将补充能量匹配项的损失函数。我们使用CGSchNet模型评估我们的Chignolin蛋白框架,系统地改变能量损失项的权重。虽然能量匹配并没有产生统计上显着的准确性的改善,它揭示了不同的趋势,在模型如何概括的自由能表面。我们的研究结果表明,未来的机会,以提高粗粒度建模,通过改进的能量估计技术和多模态损耗公式。
摘要:Molecular dynamics (MD) simulations provide atomistic insight into biomolecular systems but are often limited by high computational costs required to access long timescales. Coarse-grained machine learning models offer a promising avenue for accelerating sampling, yet conventional force matching approaches often fail to capture the full thermodynamic landscape as fitting a model on the gradient may not fit the absolute differences between low-energy conformational states. In this work, we incorporate a complementary energy matching term into the loss function. We evaluate our framework on the Chignolin protein using the CGSchNet model, systematically varying the weight of the energy loss term. While energy matching did not yield statistically significant improvements in accuracy, it revealed distinct tendencies in how models generalize the free energy surface. Our results suggest future opportunities to enhance coarse-grained modeling through improved energy estimation techniques and multi-modal loss formulations.
【11】Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction
标题:推出OmniGEC:用于语法错误纠正的银色多语言数据集
链接:https://arxiv.org/abs/2509.14504
作者:alchuk, Mariana Romanyshyn, Petro Ivaniuk
备注:None
摘要:在本文中,我们介绍了OmniGEC,一个多语言的银标准数据集的语法错误纠正(GEC)的任务,涵盖11种语言:捷克语,英语,爱沙尼亚语,德语,希腊语,冰岛语,意大利语,拉脱维亚语,斯洛文尼亚语,瑞典语和乌克兰语。这些数据集促进了多语言GEC解决方案的开发,并有助于弥合将英语GEC解决方案适应多语言GEC的数据差距。数据集中的文本来自三个来源:维基百科对11种目标语言的编辑,Reddit对11种目标语言的子Reddit,以及乌克兰语的UberText 2.0社交媒体语料库。虽然维基百科的编辑来自人工更正,但Reddit和UberText 2.0的数据是使用GPT-4o-mini模型自动更正的。自动和手动评估数据集中校正的质量。最后,我们在多语言OmniGEC语料库上微调了两个开源大型语言模型-Aya-Expanse(8B)和Gemma-3(12B),并为段落级多语言GEC实现了最先进的(SOTA)结果。数据集集合和性能最佳的模型可在Hugging Face上使用。
摘要:In this paper, we introduce OmniGEC, a collection of multilingual silver-standard datasets for the task of Grammatical Error Correction (GEC), covering eleven languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Slovene, Swedish, and Ukrainian. These datasets facilitate the development of multilingual GEC solutions and help bridge the data gap in adapting English GEC solutions to multilingual GEC. The texts in the datasets originate from three sources: Wikipedia edits for the eleven target languages, subreddits from Reddit in the eleven target languages, and the Ukrainian-only UberText 2.0 social media corpus. While Wikipedia edits were derived from human-made corrections, the Reddit and UberText 2.0 data were automatically corrected with the GPT-4o-mini model. The quality of the corrections in the datasets was evaluated both automatically and manually. Finally, we fine-tune two open-source large language models - Aya-Expanse (8B) and Gemma-3 (12B) - on the multilingual OmniGEC corpora and achieve state-of-the-art (SOTA) results for paragraph-level multilingual GEC. The dataset collection and the best-performing models are available on Hugging Face.
【12】Class-invariant Test-Time Augmentation for Domain Generalization
标题:领域概括的类不变测试时扩充
链接:https://arxiv.org/abs/2509.14420
作者:Lin, Xiaolin Wu, Xi Zhang
摘要
:深度模型在分布变化下通常会遭受显著的性能下降。领域泛化(DG)试图通过使模型泛化到看不见的领域来减轻这一挑战。大多数先前的方法依赖于多域训练或计算密集型测试时间适应。相比之下,我们提出了一个补充策略:轻量级测试时间增强。具体来说,我们开发了一种新的类不变测试时间增强(CI-TTA)技术。这个想法是通过弹性和网格变形来生成每个输入图像的多个变体,这些变体与原始输入属于同一类。他们的预测通过一个信心引导的过滤方案进行汇总,该方案删除了不可靠的输出,确保最终决策依赖于一致和可信的线索。PACS和家庭数据集上的广泛实验表明,不同的DG算法和骨干一致的收益,突出了我们的方法的有效性和通用性。
摘要:Deep models often suffer significant performance degradation under distribution shifts. Domain generalization (DG) seeks to mitigate this challenge by enabling models to generalize to unseen domains. Most prior approaches rely on multi-domain training or computationally intensive test-time adaptation. In contrast, we propose a complementary strategy: lightweight test-time augmentation. Specifically, we develop a novel Class-Invariant Test-Time Augmentation (CI-TTA) technique. The idea is to generate multiple variants of each input image through elastic and grid deformations that nevertheless belong to the same class as the original input. Their predictions are aggregated through a confidence-guided filtering scheme that remove unreliable outputs, ensuring the final decision relies on consistent and trustworthy cues. Extensive Experiments on PACS and Office-Home datasets demonstrate consistent gains across different DG algorithms and backbones, highlighting the effectiveness and generality of our approach.
【13】DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion
标题:DreamControl:以人为本的全身人形机器人控制,通过引导扩散实现场景交互
链接:https://arxiv.org/abs/2509.14353
作者:ria, Sudarshan S Harithas, Pushkal Katara, Sangkyung Kwak, Sarthak Bhagat, Shankar Sastry, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, Jonathan Chung-Kuan Huang
备注:(under submission)
摘要:我们介绍DreamControl,一种学习自主全身人形技能的新方法。DreamControl利用了扩散模型和强化学习(RL)的优势:我们的核心创新是使用在人体运动数据上训练的扩散先验,随后在模拟中指导RL策略完成感兴趣的特定任务(例如,打开抽屉或拿起物体)。我们证明了这种人类运动信息先验允许RL发现直接RL无法实现的解决方案,并且扩散模型本质上促进了自然的运动,有助于模拟到真实的转移。我们验证了DreamControl在Unitree G1机器人上的有效性,这些机器人可以完成各种具有挑战性的任务,包括同时进行下半身和上身控制以及对象交互。
摘要:We introduce DreamControl, a novel methodology for learning autonomous whole-body humanoid skills. DreamControl leverages the strengths of diffusion models and Reinforcement Learning (RL): our core innovation is the use of a diffusion prior trained on human motion data, which subsequently guides an RL policy in simulation to complete specific tasks of interest (e.g., opening a drawer or picking up an object). We demonstrate that this human motion-informed prior allows RL to discover solutions unattainable by direct RL, and that diffusion models inherently promote natural looking motions, aiding in sim-to-real transfer. We validate DreamControl's effectiveness on a Unitree G1 robot across a diverse set of challenging tasks involving simultaneous lower and upper body control and object interaction.
【14】Normalized Square Root: Sharper Matrix Factorization Bounds for Differentially Private Continual Counting
标题:规范化平方根:差异私有连续计数的更尖锐矩阵分解边界
链接:https://arxiv.org/abs/2509.14334
作者:nzinger, Nikita P. Kalinin, Jalaj Upadhyay
摘要:The factorization norms of the lower-triangular all-ones $n \times n$ matrix, $\gamma_2(M_{count})$ and $\gamma_{F}(M_{count})$, play a central role in differential privacy as they are used to give theoretical justification of the accuracy of the only known production-level private training algorithm of deep neural networks by Google. Prior to this work, the best known upper bound on $\gamma_2(M_{count})$ was $1 + \frac{\log n}{\pi}$ by Mathias (Linear Algebra and Applications, 1993), and the best known lower bound was $\frac{1}{\pi}(2 + \log(\frac{2n+1}{3})) \approx 0.507 + \frac{\log n}{\pi}$ (Matou\v{s}ek, Nikolov, Talwar, IMRN 2020), where $\log$ denotes the natural logarithm. Recently, Henzinger and Upadhyay (SODA 2025) gave the first explicit factorization that meets the bound of Mathias (1993) and asked whether there exists an explicit factorization that improves on Mathias' bound. We answer this question in the affirmative. Additionally, we improve the lower bound significantly. More specifically, we show that $$ 0.701 + \frac{\log n}{\pi} + o(1) \;\leq\; \gamma_2(M_{count}) \;\leq\; 0.846 + \frac{\log n}{\pi} + o(1). $$ That is, we reduce the gap between the upper and lower bound to $0.14 + o(1)$. We also show that our factors achieve a better upper bound for $\gamma_{F}(M_{count})$ compared to prior work, and we establish an improved lower bound: $$ 0.701 + \frac{\log n}{\pi} + o(1) \;\leq\; \gamma_{F}(M_{count}) \;\leq\; 0.748 + \frac{\log n}{\pi} + o(1). $$ That is, the gap between the lower and upper bound provided by our explicit factorization is $0.047 + o(1)$.
摘要:The factorization norms of the lower-triangular all-ones $n \times n$ matrix, $\gamma_2(M_{count})$ and $\gamma_{F}(M_{count})$, play a central role in differential privacy as they are used to give theoretical justification of the accuracy of the only known production-level private training algorithm of deep neural networks by Google. Prior to this work, the best known upper bound on $\gamma_2(M_{count})$ was $1 + \frac{\log n}{\pi}$ by Mathias (Linear Algebra and Applications, 1993), and the best known lower bound was $\frac{1}{\pi}(2 + \log(\frac{2n+1}{3})) \approx 0.507 + \frac{\log n}{\pi}$ (Matou\v{s}ek, Nikolov, Talwar, IMRN 2020), where $\log$ denotes the natural logarithm. Recently, Henzinger and Upadhyay (SODA 2025) gave the first explicit factorization that meets the bound of Mathias (1993) and asked whether there exists an explicit factorization that improves on Mathias' bound. We answer this question in the affirmative. Additionally, we improve the lower bound significantly. More specifically, we show that $$ 0.701 + \frac{\log n}{\pi} + o(1) \;\leq\; \gamma_2(M_{count}) \;\leq\; 0.846 + \frac{\log n}{\pi} + o(1). $$ That is, we reduce the gap between the upper and lower bound to $0.14 + o(1)$. We also show that our factors achieve a better upper bound for $\gamma_{F}(M_{count})$ compared to prior work, and we establish an improved lower bound: $$ 0.701 + \frac{\log n}{\pi} + o(1) \;\leq\; \gamma_{F}(M_{count}) \;\leq\; 0.748 + \frac{\log n}{\pi} + o(1). $$ That is, the gap between the lower and upper bound provided by our explicit factorization is $0.047 + o(1)$.
【15】Real-Time Streaming Mel Vocoding with Generative Flow Matching
标题:具有生成流匹配的实时流媒体Mel语音编码
链接:https://arxiv.org/abs/2509.15085
作者:ker, Tal Peer, Timo Gerkmann
备注:(C) 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
摘要:Mel声码的任务,即,梅尔幅度频谱图到音频波形的反转仍然是当今许多文本到语音(TTS)系统中的关键组件。基于生成式流匹配,我们在生成式STFT相位恢复方面的前期工作,(DiffPhase)和Mel滤波器组的伪逆运算符,我们开发了MelFlow,这是一种具有流式生成能力的Mel声码器,用于以16 kHz采样的语音,算法延迟仅为32 ms,总延迟为48 ms。但实际上是在消费者笔记本电脑GPU上。此外,我们表明,我们的模型实现了更好的PESQ和SI-SDR值相比,完善的不流传输能力的基线梅尔声码,包括HiFi-GAN。
摘要:The task of Mel vocoding, i.e., the inversion of a Mel magnitude spectrogram to an audio waveform, is still a key component in many text-to-speech (TTS) systems today. Based on generative flow matching, our prior work on generative STFT phase retrieval (DiffPhase), and the pseudoinverse operator of the Mel filterbank, we develop MelFlow, a streaming-capable generative Mel vocoder for speech sampled at 16 kHz with an algorithmic latency of only 32 ms and a total latency of 48 ms. We show real-time streaming capability at this latency not only in theory, but in practice on a consumer laptop GPU. Furthermore, we show that our model achieves substantially better PESQ and SI-SDR values compared to well-established not streaming-capable baselines for Mel vocoding including HiFi-GAN.
【16】Undersampled Phase Retrieval with Image Priors
标题:利用图像先验进行欠采样阶段检索
链接:https://arxiv.org/abs/2509.15026
作者: Ducotterd, Zhiyuan Hu, Michael Unser, Jonathan Dong
摘要:Phase retrieval seeks to recover a complex signal from amplitude-only measurements, a challenging nonlinear inverse problem. Current theory and algorithms often ignore signal priors. By contrast, we evaluate here a variety of image priors in the context of severe undersampling with structured random Fourier measurements. Our results show that those priors significantly improve reconstruction, allowing accurate reconstruction even below the weak recovery threshold.
【17】Aligning Audio Captions with Human Preferences
标题:将音频字幕与人类偏好保持一致
链接:https://arxiv.org/abs/2509.14659
作者:gde, Rehana Mahfuz, Yinyi Guo, Erik Visser
备注:Submitted to ICASSP 2026
摘要:Current audio captioning systems rely heavily on supervised learning with paired audio-caption datasets, which are expensive to curate and may not reflect human preferences in real-world scenarios. To address this limitation, we propose a preference-aligned audio captioning framework based on Reinforcement Learning from Human Feedback (RLHF). To effectively capture nuanced human preferences, we train a Contrastive Language-Audio Pretraining (CLAP)-based reward model using human-labeled pairwise preference data. This reward model is integrated into a reinforcement learning framework to fine-tune any baseline captioning system without relying on ground-truth caption annotations. Extensive human evaluations across multiple datasets show that our method produces captions preferred over those from baseline models, particularly in cases where the baseline models fail to provide correct and natural captions. Furthermore, our framework achieves performance comparable to supervised approaches with ground-truth data, demonstrating its effectiveness in aligning audio captioning with human preferences and its scalability in real-world scenarios.
【18】Indoor Airflow Imaging Using Physics-Informed Background-Oriented Schlieren Tomography
标题:使用物理信息的面向背景纹影断层扫描技术进行室内气流成像
链接:https://arxiv.org/abs/2509.14442
作者:, Wael H. Ali, Joshua Rapp, Hassan Mansour
备注:Presented in ISCS25
摘要:We develop a framework for non-invasive volumetric indoor airflow estimation from a single viewpoint using background-oriented schlieren (BOS) measurements and physics-informed reconstruction. Our framework utilizes a light projector that projects a pattern onto a target back-wall and a camera that observes small distortions in the light pattern. While the single-view BOS tomography problem is severely ill-posed, our proposed framework addresses this using: (1) improved ray tracing, (2) a physics-based light rendering approach and loss formulation, and (3) a physics-based regularization using a physics-informed neural network (PINN) to ensure that the reconstructed airflow is consistent with the governing equations for buoyancy-driven flows.
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递