
Machine Learning Academic Digest [7.28]

arXiv每日学术速递 (arXiv Daily Academic Digest) • 9 months ago



cs.LG: 103 papers today


Large Models (6 papers)

【1】Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
Link: https://arxiv.org/abs/2507.19477

Authors: Lee, Sohee Yang, Donghyun Kwak, Noah Y. Siegel
Abstract: Many recent papers have studied the development of superforecaster-level event forecasting LLMs. While methodological problems with early studies cast doubt on the use of LLMs for event forecasting, recent studies with improved evaluation methods have shown that state-of-the-art LLMs are gradually reaching superforecaster-level performance, and reinforcement learning has also been reported to improve future forecasting. Additionally, the unprecedented success of recent reasoning models and Deep Research-style models suggests that technology capable of greatly improving forecasting performance has been developed. Therefore, based on these positive recent trends, we argue that the time is ripe for research on large-scale training of superforecaster-level event forecasting LLMs. We discuss two key research directions: training methods and data acquisition. For training, we first introduce three difficulties of LLM-based event forecasting training: noisiness-sparsity, knowledge cut-off, and simple reward structure problems. Then, we present related ideas to mitigate these problems: hypothetical event Bayesian networks, utilizing poorly-recalled and counterfactual events, and auxiliary reward signals. For data, we propose aggressive use of market, public, and crawling datasets to enable large-scale training and evaluation. Finally, we explain how these technical advances could enable AI to provide predictive intelligence to society in broader areas. This position paper presents promising specific paths and considerations for getting closer to superforecaster-level AI technology, aiming to draw researchers' interest to these directions.


【2】Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
Link: https://arxiv.org/abs/2507.19334

Authors: , Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci
Abstract: Tabular data is critical across diverse domains, yet high-quality datasets remain scarce due to privacy concerns and the cost of collection. Contemporary approaches adopt large language models (LLMs) for tabular augmentation, but exhibit two major limitations: (1) dense dependency modeling among tabular features that can introduce bias, and (2) high computational overhead in sampling. To address these issues, we propose SPADA (SPArse Dependency-driven Augmentation), a lightweight generative framework that explicitly captures sparse dependencies via an LLM-induced graph. We treat each feature as a node and synthesize values by traversing the graph, conditioning each feature solely on its parent nodes. We explore two synthesis strategies: a non-parametric method using Gaussian kernel density estimation, and a conditional normalizing flow model that learns invertible mappings for conditional density estimation. Experiments on four datasets show that SPADA reduces constraint violations by 4% compared to diffusion-based methods and accelerates generation by nearly 9,500 times over LLM-based baselines.
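
The graph-traversal idea in this abstract (each feature sampled conditioned only on its parents, with a Gaussian kernel density estimate) can be illustrated with a rough sketch. This is not SPADA's implementation: the function name `sample_row`, the single shared `bandwidth`, and the hand-written `parents` dictionary (standing in for the LLM-induced graph) are all assumptions made for illustration.

```python
import math
import random
from graphlib import TopologicalSorter


def sample_row(data, parents, bandwidth=0.1, rng=random):
    """Draw one synthetic row by walking a feature DAG (hypothetical helper).

    data:    dict mapping feature name -> list of floats (training column)
    parents: dict mapping feature name -> list of parent feature names
    """
    n_rows = len(next(iter(data.values())))
    row = {}
    # static_order() yields every feature after all of its parents.
    for feature in TopologicalSorter(parents).static_order():
        weights = []
        for r in range(n_rows):
            # Squared distance between this training row's parent values
            # and the parent values already sampled for the new row.
            sq_dist = sum(
                (data[p][r] - row[p]) ** 2 for p in parents.get(feature, ())
            )
            # Gaussian kernel weight; the tiny floor keeps the weights
            # from all underflowing to zero.
            weights.append(math.exp(-sq_dist / (2 * bandwidth ** 2)) + 1e-300)
        # Sampling from a Gaussian KDE = pick a training point, add noise.
        picked = rng.choices(range(n_rows), weights=weights)[0]
        row[feature] = data[feature][picked] + rng.gauss(0.0, bandwidth)
    return row
```

Root features (no parents) receive uniform weights, so they are drawn from an ordinary one-dimensional KDE; child features are drawn from training rows whose parent values resemble the ones already sampled.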


【3】Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Link: https://arxiv.org/abs/2507.19195

Authors: bbas, Mariette Awad, Razane Tajeddine
Abstract: Despite the ongoing improvements in the design of large language models (LLMs) to foster inclusion and balanced responses, these systems remain susceptible to encoding and amplifying social biases. This study examines how dialectal variation, specifically African American Vernacular English (AAVE) versus Standard American English (SAE), interacts with data poisoning to influence toxicity in outputs. Using both small- and medium-scale LLaMA models, we show that even minimal exposure to poisoned data significantly increases toxicity for AAVE inputs, while it remains comparatively unaffected for SAE. Larger models exhibit a more significant amplification effect, which suggests heightened susceptibility with scale. To further assess these disparities, we employed GPT-4o as a fairness auditor, which identified harmful stereotypical patterns disproportionately tied to AAVE inputs, including portrayals of aggression, criminality, and intellectual inferiority. These findings underscore the compounding impact of data poisoning and dialectal bias and emphasize the need for dialect-aware evaluation, targeted debiasing interventions, and socially responsible training protocols during development.


【4】Solar Photovoltaic Assessment with Large Language Model
Link: https://arxiv.org/abs/2507.19144

Authors: , Yang Weng
Comments: 27 pages, 7 figures
Abstract: Accurate detection and localization of solar photovoltaic (PV) panels in satellite imagery is essential for optimizing microgrids and active distribution networks (ADNs), which are critical components of renewable energy systems. Existing methods lack transparency regarding their underlying algorithms or training datasets, rely on large, high-quality PV training data, and struggle to generalize to new geographic regions or varied environmental conditions without extensive re-training. These limitations lead to inconsistent detection outcomes, hindering large-scale deployment and data-driven grid optimization. In this paper, we investigate how large language models (LLMs) can be leveraged to overcome these challenges. Despite their promise, LLMs face several challenges in solar panel detection, including difficulties with multi-step logical processes, inconsistent output formatting, frequent misclassification of visually similar objects (e.g., shadows, parking lots), and low accuracy in complex tasks such as spatial localization and quantification. To overcome these issues, we propose the PV Assessment with LLMs (PVAL) framework, which incorporates task decomposition for more efficient workflows, output standardization for consistent and scalable formatting, few-shot prompting to enhance classification accuracy, and fine-tuning using curated PV datasets with detailed annotations. PVAL ensures transparency, scalability, and adaptability across heterogeneous datasets while minimizing computational overhead. By combining open-source accessibility with robust methodologies, PVAL establishes an automated and reproducible pipeline for solar panel detection, paving the way for large-scale renewable energy integration and optimized grid management.


【5】Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations
Link: https://arxiv.org/abs/2507.18993

Authors: j, Benoît Guilleminot, Andraž Tori
Comments: Agent4IR, KDD '25
Abstract: Large language models (LLMs) and their associated agent-based frameworks have significantly advanced automated information extraction, a critical component of modern recommender systems. While these multitask frameworks are widely used in code generation, their application in data-centric research is still largely untapped. This paper presents Agent0, an LLM-driven, agent-based system designed to automate information extraction and feature construction from raw, unstructured text. Categorical features are crucial for large-scale recommender systems but are often expensive to acquire. Agent0 coordinates a group of interacting LLM agents to automatically identify the most valuable text aspects for subsequent tasks (such as models or AutoML pipelines). Beyond its feature engineering capabilities, Agent0 also offers an automated prompt-engineering tuning method that utilizes dynamic feedback loops from an oracle. Our findings demonstrate that this closed-loop methodology is both practical and effective for automated feature discovery, which is recognized as one of the most challenging phases in current recommender system development.


【6】Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
Link: https://arxiv.org/abs/2507.18675

Authors: in, Marsha Mariya Kappan, Vijeta Sharma
Abstract: Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.
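
The first masking strategy in this abstract, percentage-based black masking, might look roughly like the sketch below. The abstract does not say how the masked pixels are chosen, so selecting them uniformly at random is an assumption, as are the function name `mask_fraction` and the list-of-lists image format.

```python
import random


def mask_fraction(image, fraction, seed=0):
    """Return a copy of `image` with `fraction` of its pixels set to 0 (black).

    image: list of rows, each a list of pixel intensities (hypothetical format).
    Pixels are picked uniformly at random, which is an assumption; the paper
    may mask contiguous regions instead.
    """
    height, width = len(image), len(image[0])
    n_mask = round(fraction * height * width)
    rng = random.Random(seed)
    masked = [row[:] for row in image]  # leave the input untouched
    for flat in rng.sample(range(height * width), n_mask):
        masked[flat // width][flat % width] = 0
    return masked
```

Calling it with `fraction` set to 0.1, 0.3, and 0.5 gives the three masking levels named in the abstract; the masked frames would then be passed to the classifier under evaluation.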


Graph (graph learning | graph neural networks | graph optimization, etc.) (12 papers)

【1】Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems
Link: https://arxiv.org/abs/2507.19205

Authors: Jahin, Shahriar Soudeep, M. F. Mridha, Muhammad Mostafa Monowar, Md. Abdul Hamid
Abstract: Real-time particle transverse momentum ($p_T$) estimation in high-energy physics demands algorithms that are both efficient and accurate under strict hardware constraints. Static machine learning models degrade under high pileup and lack physics-aware optimization, while generic graph neural networks (GNNs) often neglect domain structure critical for robust $p_T$ regression. We propose a physics-informed GNN framework that systematically encodes detector geometry and physical observables through four distinct graph construction strategies: station-as-node, feature-as-node, bending angle-centric, and pseudorapidity ($\eta$)-centric representations. This framework integrates these tailored graph structures with a novel Message Passing Layer (MPL), featuring intra-message attention and gated updates, and domain-specific loss functions incorporating $p_T$-distribution priors. Our co-design methodology yields superior accuracy-efficiency trade-offs compared to existing baselines. Extensive experiments on the CMS Trigger Dataset validate the approach: a station-informed EdgeConv model achieves a state-of-the-art MAE of 0.8525 with $\ge 55\%$ fewer parameters than deep learning baselines, especially TabNet, while an $\eta$-centric MPL configuration also demonstrates improved accuracy with comparable efficiency. These results establish the promise of physics-guided GNNs for deployment in resource-constrained trigger systems.


【2】Graph Structure Learning with Privacy Guarantees for Open Graph Data
Link: https://arxiv.org/abs/2507.19116

Authors: , Jiaqi Wu, Yang Weng, Yizheng Liao, Shengzhe Chen
Abstract: Ensuring privacy in large-scale open datasets is increasingly challenging under regulations such as the General Data Protection Regulation (GDPR). While differential privacy (DP) provides strong theoretical guarantees, it primarily focuses on noise injection during model training, neglecting privacy preservation at the data publishing stage. Existing privacy-preserving data publishing (PPDP) approaches struggle to balance privacy and utility, particularly when data publishers and users are distinct entities. To address this gap, we focus on the graph recovery problem and propose a novel privacy-preserving estimation framework for open graph data, leveraging Gaussian DP (GDP) with a structured noise-injection mechanism. Unlike traditional methods that perturb gradients or model updates, our approach ensures unbiased graph structure recovery while enforcing DP at the data publishing stage. Moreover, we provide theoretical guarantees on estimation accuracy and extend our method to discrete-variable graphs, a setting often overlooked in DP research. Experimental results in graph learning demonstrate robust performance, offering a viable solution for privacy-conscious graph analysis.


【3】GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network
Link: https://arxiv.org/abs/2507.19095

Authors: Li, Xu Xiang, Xue Li, Binyu Zhao, Yujie Liu, Huijie Tang, Benhan Yang, Zhixuan Chen
Comments: The source code for this study is available at this https URL
Abstract: Attributed graph clustering holds significant importance in modern data analysis. However, due to the complexity of graph data and the heterogeneity of node attributes, leveraging graph information for clustering remains challenging. To address this, we propose a novel deep graph clustering model, GCL-GCN, specifically designed to address the limitations of existing models in capturing local dependencies and complex structures when dealing with sparse and heterogeneous graph data. GCL-GCN introduces an innovative Graphormer module that combines centrality encoding and spatial relationships, effectively capturing both global and local information between nodes, thereby enhancing the quality of node representations. Additionally, we propose a novel contrastive learning module that significantly enhances the discriminative power of feature representations. In the pre-training phase, this module increases feature distinction through contrastive learning on the original feature matrix, ensuring more identifiable initial representations for subsequent graph convolution and clustering tasks. Extensive experimental results on six datasets demonstrate that GCL-GCN outperforms 14 advanced methods in terms of clustering quality and robustness. Specifically, on the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, compared to the primary comparison method MBN.


【4】Clustering-Oriented Generative Attribute Graph Imputation
Link: https://arxiv.org/abs/2507.19085

Authors: n, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li
Comments: Accepted by ACM MM'25
Abstract: Attribute-missing graph clustering has emerged as a significant unsupervised task, where only attribute vectors of partial nodes are available and the graph structure is intact. The related models generally follow the two-step paradigm of imputation and refinement. However, most imputation approaches fail to capture class-relevant semantic information, leading to sub-optimal imputation for clustering. Moreover, existing refinement strategies optimize the learned embedding through graph reconstruction, while neglecting the fact that some attributes are uncorrelated with the graph. To remedy these problems, we establish the Clustering-oriented Generative Imputation with reliable Refinement (CGIR) model. Concretely, the subcluster distributions are estimated to reveal the class-specific characteristics precisely and to constrain the sampling space of the generative adversarial module, such that the imputed nodes are impelled to align with the correct clusters. Afterwards, multiple subclusters are merged to guide the proposed edge attention network, which identifies the edge-wise attributes for each class, so that redundant attributes in graph reconstruction do not disturb the refinement of the overall embedding. To sum up, CGIR splits attribute-missing graph clustering into the search and merging of subclusters, which guides node imputation and refinement within a unified framework. Extensive experiments prove the advantages of CGIR over state-of-the-art competitors.


【5】Dynamics-Informed Reservoir Computing with Visibility Graphs
Link: https://arxiv.org/abs/2507.19046

Authors: Geier (1), Merten Stender (2) ((1) Dynamics Group, Hamburg University of Technology, (2) Chair of Cyber-Physical Systems in Mechanical Engineering, Technische Universität Berlin, Germany)
Comments: 7 pages, 4 figures. The following article has been submitted to Chaos: An Interdisciplinary Journal of Nonlinear Science
Abstract: Accurate prediction of complex and nonlinear time series remains a challenging problem across engineering and scientific disciplines. Reservoir computing (RC) offers a computationally efficient alternative to traditional deep learning by training only the read-out layer while employing a randomly structured and fixed reservoir network. Despite its advantages, the largely random reservoir graph architecture often results in suboptimal and oversized networks with poorly understood dynamics. Addressing this issue, we propose a novel Dynamics-Informed Reservoir Computing (DyRC) framework that systematically infers the reservoir network structure directly from the input training sequence. This work proposes to employ the visibility graph (VG) technique, which converts time series data into networks by representing measurement points as nodes linked by mutual visibility. The reservoir network is constructed by directly adopting the VG network from a training data sequence, leveraging the parameter-free visibility graph approach to avoid expensive hyperparameter tuning. This process results in a reservoir that is directly informed by the specific dynamics of the prediction task under study. We assess the DyRC-VG method through prediction tasks involving the canonical nonlinear Duffing oscillator, evaluating prediction accuracy and consistency. Compared to an Erdős-Rényi graph of the same size, spectral radius, and comparable density, we observe higher prediction quality and more consistent performance over repeated implementations in the DyRC-VG.
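
To make the visibility graph step concrete, here is a minimal sketch of how a time series can be turned into an adjacency matrix. It assumes the natural visibility criterion with evenly spaced samples (two points are linked if every sample between them lies strictly below the straight line joining them); the abstract does not state which VG variant the paper uses, and the function name `visibility_graph` is illustrative.

```python
def visibility_graph(series):
    """Adjacency matrix (lists of 0/1) of the natural visibility graph.

    Node i corresponds to sample series[i]; samples are assumed to be
    evenly spaced in time.
    """
    n = len(series)
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = True
            for k in range(i + 1, j):
                # Height of the line of sight from sample i to sample j at k.
                line = series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                if series[k] >= line:
                    visible = False  # sample k blocks the view
                    break
            if visible:
                adj[i][j] = adj[j][i] = 1
    return adj
```

The construction has no tunable parameters, which matches the abstract's point about avoiding hyperparameter search. Using the matrix as a reservoir would still require the usual steps (for example rescaling to a chosen spectral radius), which are not shown here.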


【6】Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction
Link: https://arxiv.org/abs/2507.18926

Authors: yen, Md Masud Rana, Farjana Tasnim Mukta, Chang-Guo Zhan, Duc Duy Nguyen
Abstract: Accurate prediction of blood-brain barrier permeability (BBBP) is essential for central nervous system (CNS) drug development. While graph neural networks (GNNs) have advanced molecular property prediction, they often rely on molecular topology and neglect the three-dimensional geometric information crucial for modeling transport mechanisms. This paper introduces the geometric multi-color message-passing graph neural network (GMC-MPNN), a novel framework that enhances standard message-passing architectures by explicitly incorporating atomic-level geometric features and long-range interactions. Our model constructs weighted colored subgraphs based on atom types to capture the spatial relationships and chemical context that govern BBB permeability. We evaluated GMC-MPNN on three benchmark datasets for both classification and regression tasks, using rigorous scaffold-based splitting to ensure a robust assessment of generalization. The results demonstrate that GMC-MPNN consistently outperforms existing state-of-the-art models, achieving superior performance in both classifying compounds as permeable/non-permeable (AUC-ROC of 0.9704 and 0.9685) and in regressing continuous permeability values (RMSE of 0.4609, Pearson correlation of 0.7759). An ablation study further quantified the impact of specific atom-pair interactions, revealing that the model's predictive power derives from its ability to learn from both common and rare, but chemically significant, functional motifs. By integrating spatial geometry into the graph representation, GMC-MPNN sets a new performance benchmark and offers a more accurate and generalizable tool for drug discovery pipelines.


【7】Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors
Link: https://arxiv.org/abs/2507.18804

Authors: Zou, Nan Wu
Abstract: Graph neural networks (GNNs) have been widely applied in safety-critical applications, such as financial and medical networks, in which compromised predictions may cause catastrophic consequences. While existing research on GNN robustness has primarily focused on software-level threats, hardware-induced faults and errors remain largely underexplored. As hardware systems progress toward advanced technology nodes to meet high-performance and energy efficiency demands, they become increasingly susceptible to transient faults, which can cause bit flips and silent data corruption, a prominent issue observed by major technology companies (e.g., Meta and Google). In response, we first present a comprehensive analysis of GNN robustness against bit-flip errors, aiming to reveal system-level optimization opportunities for future reliable and efficient GNN systems. Second, we propose Ralts, a generalizable and lightweight solution to bolster GNN resilience to bit-flip errors. Specifically, Ralts exploits various graph similarity metrics to filter out outliers and recover compromised graph topology, and incorporates these protective techniques directly into aggregation functions to support any message-passing GNNs. Evaluation results demonstrate that Ralts effectively enhances GNN robustness across a range of GNN models, graph datasets, error patterns, and both dense and sparse architectures. On average, under a bit error rate (BER) of $3\times10^{-5}$, these robust aggregation functions improve prediction accuracy by at least 20% when errors are present in model weights or node embeddings, and by at least 10% when errors occur in adjacency matrices. Ralts is also optimized to deliver execution efficiency comparable to built-in aggregation functions in PyTorch Geometric.
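
The sketch below illustrates why a single bit flip can be so damaging to neighbor aggregation, and how an outlier-resistant aggregator limits the damage. It is not the Ralts method: the abstract describes similarity-based filtering without giving details, so a coordinate-wise median is swapped in here as a generic stand-in, and all function names are illustrative.

```python
import statistics
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) in the float32 encoding of value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped


def mean_aggregate(messages):
    """Plain mean over neighbor messages, one value per feature dimension."""
    return [sum(col) / len(col) for col in zip(*messages)]


def median_aggregate(messages):
    """Coordinate-wise median: a stand-in for a robust aggregation function."""
    return [statistics.median(col) for col in zip(*messages)]
```

Flipping bit 30 (the top exponent bit) of 0.5 turns it into 2^127, which dominates a mean over the neighbors but leaves the median of an otherwise healthy neighborhood unchanged.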


【8】Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs
Link: https://arxiv.org/abs/2507.18668

Authors: an, Daehee Kim, Minjun Lee, Daeyoung Roh, Keejun Han, Mun Yong Yi
Abstract: The rise of online learning has led to the development of various knowledge tracing (KT) methods. However, existing methods have overlooked the problem of increasing computational cost when utilizing large graphs and long learning sequences. To address this issue, we introduce Dual Graph Attention-based Knowledge Tracing (DGAKT), a graph neural network model designed to leverage high-order information from subgraphs representing student-exercise-KC relationships. DGAKT incorporates a subgraph-based approach to enhance computational efficiency. By processing only relevant subgraphs for each target interaction, DGAKT significantly reduces memory and computational requirements compared to full global graph models. Extensive experimental results demonstrate that DGAKT not only outperforms existing KT models but also sets a new standard in resource efficiency, addressing a critical need that has been largely overlooked by prior KT approaches.


【9】Gradient-based grand canonical optimization enabled by graph neural networks with fractional atomic existence
Link: https://arxiv.org/abs/2507.19438

Authors: r Verner Christiansen, Bjørk Hammer
Abstract: Machine learning interatomic potentials have become an indispensable tool for materials science, enabling the study of larger systems and longer timescales. State-of-the-art models are generally graph neural networks that employ message passing to iteratively update atomic embeddings that are ultimately used for predicting properties. In this work, we extend the message passing formalism with the inclusion of a continuous variable that accounts for fractional atomic existence. This allows us to calculate the gradient of the Gibbs free energy with respect to both the Cartesian coordinates of atoms and their existence. Using this, we propose a gradient-based grand canonical optimization method and document its capabilities for a Cu(110) surface oxide.


【10】Bespoke multiresolution analysis of graph signals
Link: https://arxiv.org/abs/2507.19181

Authors: lefante, Gianluca Giacchi, Michael Multerer, Jacopo Quizi
Abstract: We present a novel framework for discrete multiresolution analysis of graph signals. The main analytical tool is the samplet transform, originally defined in the Euclidean framework as a discrete wavelet-like construction, tailored to the analysis of scattered data. The first contribution of this work is defining samplets on graphs. To this end, we subdivide the graph into a fixed number of patches, embed each patch into a Euclidean space, where we construct samplets, and eventually pull the construction back to the graph. This ensures orthogonality, locality, and the vanishing moments property with respect to properly defined polynomial spaces on graphs. Compared to classical Haar wavelets, this framework broadens the class of graph signals that can efficiently be compressed and analyzed. Along this line, we provide a definition of a class of signals that can be compressed using our construction. We support our findings with different examples of signals defined on graphs whose vertices lie on smooth manifolds. For efficient numerical implementation, we combine heavy edge clustering, to partition the graph into meaningful patches, with landmark Isomap, which provides low-dimensional embeddings for each patch. Our results demonstrate the method's robustness, scalability, and ability to yield sparse representations with controllable approximation error, significantly outperforming traditional Haar wavelet approaches in terms of compression efficiency and multiresolution fidelity.


【11】Graph Neural Network-Based Predictor for Optimal Quantum Hardware Selection
标题:基于图神经网络的最佳量子硬件选择预测器
链接:https://arxiv.org/abs/2507.19093

作者:udisco, Deborah Volpe, Giacomo Orlandi, Giovanna Turvani
摘要:量子硬件技术的种类越来越多,每种技术都具有独特的特性,如连接性和原生门集,这给选择执行特定量子电路的最佳平台带来了挑战。这个选择过程通常采用蛮力方法:在各种设备上编译电路,并根据电路深度和门保真度等因素评估性能。然而,这种方法计算成本高,并且随着可用量子处理器数量的增加而不能很好地扩展。在这项工作中,我们提出了一个基于图神经网络(GNN)的预测器,通过分析量子电路的有向无环图(DAG)表示来自动化硬件选择。我们的研究评估了来自MQT Bench数据集的498个量子电路(最多27个量子位),这些电路使用Qiskit在四个设备上编译:三个超导量子处理器(IBM-Kyiv,IBM-Brisbane,IBM-Sherbrooke)和一个离子阱处理器(IONQ-Forte)。性能使用一个综合电路深度和门保真度的度量来估计,由此得到的数据集中,93个电路在离子阱设备上编译效果最佳,而其余电路更适合超导平台。通过利用基于图的机器学习,我们的方法无需为模型评估提取电路特征,而是直接将电路嵌入为图,从而显著加快了最佳目标决策过程并保留了所有信息。实验结果表明,少数类的准确率为94.4%,F1得分为85.5%,有效地预测了最佳编译目标。开发的代码可在GitHub上公开获取(https://github.com/antotu/GNN-Model-Quantum-Predictor)。
摘要:The growing variety of quantum hardware technologies, each with unique peculiarities such as connectivity and native gate sets, creates challenges when selecting the best platform for executing a specific quantum circuit. This selection process usually involves a brute-force approach: compiling the circuit on various devices and evaluating performance based on factors such as circuit depth and gate fidelity. However, this method is computationally expensive and does not scale well as the number of available quantum processors increases. In this work, we propose a Graph Neural Network (GNN)-based predictor that automates hardware selection by analyzing the Directed Acyclic Graph (DAG) representation of a quantum circuit. Our study evaluates 498 quantum circuits (up to 27 qubits) from the MQT Bench dataset, compiled using Qiskit on four devices: three superconducting quantum processors (IBM-Kyiv, IBM-Brisbane, IBM-Sherbrooke) and one trapped-ion processor (IONQ-Forte). Performance is estimated using a metric that integrates circuit depth and gate fidelity, resulting in a dataset where 93 circuits are optimally compiled on the trapped-ion device, while the remaining circuits prefer superconducting platforms. By exploiting graph-based machine learning, our approach avoids extracting the circuit features for the model evaluation but directly embeds it as a graph, significantly accelerating the optimal target decision-making process and maintaining all the information. Experimental results prove 94.4% accuracy and an 85.5% F1 score for the minority class, effectively predicting the best compilation target. The developed code is publicly available on GitHub (https://github.com/antotu/GNN-Model-Quantum-Predictor).


【12】Central limit theorems for the eigenvalues of graph Laplacians on data clouds
标题:数据云上图拉普拉斯特征值的中心极限定理
链接:https://arxiv.org/abs/2507.18803

作者:Li, Nicolás García Trillos, Housen Li, Leo Suchan
摘要:给定来自某一分布的独立同分布样本 $X_n =\{ x_1, \dots, x_n \}$,该分布支撑在嵌入欧几里得空间的低维流形 ${M}$ 上,我们考虑与 $X_n$ 上的 $\varepsilon$-邻近图相关联的图拉普拉斯算子 $\Delta_n$,并研究其特征值围绕均值的渐近涨落。特别地,令 $\hat{\lambda}_l^\varepsilon$ 表示 $\Delta_n$ 的第 $l$ 个特征值,在对数据生成模型和 $\varepsilon$ 衰减速率的适当假设下,我们证明 $\sqrt{n}(\hat{\lambda}_{l}^\varepsilon - \mathbb{E}[\hat{\lambda}_{l}^\varepsilon])$ 是渐近高斯的,并且可以明确刻画其方差。一个形式化的论证使我们能够将该渐近方差解释为某个合适能量关于Fisher-Rao几何的梯度流的耗散。这一几何解释进而使我们能够从统计角度,将渐近方差解释为估计某个加权Laplace-Beltrami算子特征值时的Cramér-Rao下界。后一种解释提示了图拉普拉斯算子特征值的一种渐近统计有效性。我们还给出了多个特征值的中心极限定理,并通过若干数值实验探讨了在放宽理论分析中部分假设时结果的有效性。
摘要:Given i.i.d.\ samples $X_n =\{ x_1, \dots, x_n \}$ from a distribution supported on a low-dimensional manifold ${M}$ embedded in Euclidean space, we consider the graph Laplacian operator $\Delta_n$ associated to an $\varepsilon$-proximity graph over $X_n$ and study the asymptotic fluctuations of its eigenvalues around their means. In particular, letting $\hat{\lambda}_l^\varepsilon$ denote the $l$-th eigenvalue of $\Delta_n$, and under suitable assumptions on the data generating model and on the rate of decay of $\varepsilon$, we prove that $\sqrt{n } (\hat{\lambda}_{l}^\varepsilon - \mathbb{E}[\hat{\lambda}_{l}^\varepsilon] )$ is asymptotically Gaussian with a variance that we can explicitly characterize. A formal argument allows us to interpret this asymptotic variance as the dissipation of a gradient flow of a suitable energy with respect to the Fisher-Rao geometry. This geometric interpretation allows us to give, in turn, a statistical interpretation of the asymptotic variance in terms of a Cramér-Rao lower bound for the estimation of the eigenvalues of a certain weighted Laplace-Beltrami operator. The latter interpretation suggests a form of asymptotic statistical efficiency for the eigenvalues of the graph Laplacian. We also present CLTs for multiple eigenvalues and, through several numerical experiments, explore the validity of our results when some of the assumptions made in our theoretical analysis are relaxed.


GAN|对抗|攻击|生成相关(6篇)

【1】Dependency-aware synthetic tabular data generation
标题:依赖性感知的合成表格数据生成
链接:https://arxiv.org/abs/2507.19211

作者:Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer
备注:23 pages, 3 figures, submitted to Pattern Recognition
摘要:合成表格数据越来越多地用于隐私敏感的领域,如医疗保健,但现有的生成模型往往无法保持属性间的关系。特别是,函数依赖(FD)和逻辑依赖(LD),捕获特征之间的确定性和基于规则的关联,很少或往往很差地保留在合成数据集中。为了解决这一研究空白,我们提出了层次特征生成框架(HFGF)的合成表格数据生成。我们创建了具有已知依赖关系的基准数据集来评估我们提出的HFGF。该框架首先使用任何标准的生成模型生成独立的特征,然后根据预定义的FD和LD规则重构依赖特征。我们在具有不同大小,特征不平衡和依赖复杂性的四个基准数据集上的实验表明,HFGF提高了六个生成模型(包括CTGAN,TVAE和GReaT)中FD和LD的保存。我们的研究结果表明,HFGF可以显着提高合成表格数据的结构保真度和下游效用。
摘要:Synthetic tabular data is increasingly used in privacy-sensitive domains such as health care, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are rarely or often poorly retained in synthetic datasets. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. We created benchmark datasets with known dependencies to evaluate our proposed HFGF. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings demonstrate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.
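
The two-step idea in this abstract (generate independent features first, then rebuild dependent ones from predefined rules) can be illustrated with a minimal sketch. This is not the authors' HFGF implementation: the function name, the column names, and the `ZIP_TO_CITY` lookup are hypothetical, and the "generated" rows are hard-coded stand-ins for the output of a real generative model.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], object]


def reconstruct_dependent_features(rows: List[Row], rules: Dict[str, Rule]) -> List[Row]:
    """Derive each dependent column from columns that already exist in the row."""
    completed = []
    for row in rows:
        new_row = dict(row)  # copy so the generator's output is left untouched
        for target_column, rule in rules.items():
            new_row[target_column] = rule(new_row)
        completed.append(new_row)
    return completed


# Hypothetical stand-in for rows produced by any standard generative model.
independent_rows = [
    {"zip_code": "10001", "age": 34},
    {"zip_code": "94105", "age": 12},
]

# Hypothetical dependency rules, assumed to be known in advance.
ZIP_TO_CITY = {"10001": "New York", "94105": "San Francisco"}
rules = {
    "city": lambda r: ZIP_TO_CITY[r["zip_code"]],  # functional dependency: zip_code -> city
    "is_minor": lambda r: r["age"] < 18,           # logical rule on age
}

synthetic_rows = reconstruct_dependent_features(independent_rows, rules)
```

Because the dependent columns are computed rather than sampled, every row satisfies the rules by construction, which is the property the abstract reports as improved FD and LD preservation.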


【2】Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
标题:蒸馏小型基于效用的段落选择器以增强检索增强生成
链接:https://arxiv.org/abs/2507.19102

作者:hang, Keping Bi, Jiafeng Guo, Jiaming Zhang, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng
备注:9 pages, 5 figures
摘要:检索增强生成(RAG)通过合并检索到的信息来增强大型语言模型(LLM)。标准的检索过程优先考虑相关性,重点是查询和段落之间的主题对齐。相比之下,在RAG中,重点已经转移到实用性,即考虑段落对生成准确答案的有用性。尽管经验证据表明,在RAG的效用为基础的检索的好处,使用LLM效用判断的高计算成本限制了评估的通道数量。这种限制对于需要大量信息的复杂查询来说是有问题的。为了解决这个问题,我们提出了一种方法来提取LLM的效用判断能力到更小,更有效的模型。我们的方法侧重于基于效用的选择而不是排名,从而实现针对特定查询量身定制的动态通道选择,而无需固定阈值。我们训练学生模型学习伪答案生成和实用判断教师LLM,使用滑动窗口方法,动态选择有用的通道。我们的实验表明,基于效用的选择提供了一个灵活的和具有成本效益的解决方案,RAG,显着降低计算成本,同时提高答案质量。我们使用Qwen 3 - 32 B作为教师模型,对相关性排名和基于效用的选择进行蒸馏,蒸馏成RankQwen1.7B和UtilityQwen1.7B。我们的研究结果表明,对于复杂的问题,基于效用的选择是更有效的,比相关性排名在提高答案生成性能。我们将发布MS MARCO数据集的相关性排名和基于效用的选择注释,以支持该领域的进一步研究。
摘要:Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. Standard retrieval processes prioritize relevance, focusing on topical alignment between queries and passages. In contrast, in RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Despite empirical evidence showing the benefits of utility-based retrieval in RAG, the high computational cost of using LLMs for utility judgments limits the number of passages evaluated. This restriction is problematic for complex queries requiring extensive information. To address this, we propose a method to distill the utility judgment capabilities of LLMs into smaller, more efficient models. Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds. We train student models to learn pseudo-answer generation and utility judgments from teacher LLMs, using a sliding window method that dynamically selects useful passages. Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality. We present the distillation results using Qwen3-32B as the teacher model for both relevance ranking and utility-based selection, distilled into RankQwen1.7B and UtilityQwen1.7B. Our findings indicate that for complex questions, utility-based selection is more effective than relevance ranking in enhancing answer generation performance. We will release the relevance ranking and utility-based selection annotations for the MS MARCO dataset, supporting further research in this area.
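
As a rough illustration of selection (rather than ranking) over a sliding window, the sketch below walks through the retrieved passages a few at a time and lets a judge decide which of the passages seen so far to keep. The abstract does not specify the window mechanics, so the non-overlapping windows, the `select_passages` function, and the toy `keyword_judge` (a stand-in for a distilled utility model) are all assumptions.

```python
from typing import Callable, List

Judge = Callable[[str, List[str]], List[str]]


def select_passages(query: str, passages: List[str], judge: Judge,
                    window_size: int = 2) -> List[str]:
    """Slide over the passages; the judge re-selects from kept passages plus the new window."""
    selected: List[str] = []
    for start in range(0, len(passages), window_size):
        window = passages[start:start + window_size]
        selected = judge(query, selected + window)
    return selected


def keyword_judge(query: str, candidates: List[str]) -> List[str]:
    """Toy stand-in for a learned utility judge: keep passages sharing a word with the query."""
    query_words = set(query.lower().split())
    kept = []
    for passage in candidates:
        words = set(passage.lower().replace(".", "").split())
        if query_words & words:
            kept.append(passage)
    return kept


passages = [
    "Paris is the capital of France.",
    "Bananas are yellow.",
    "France borders Spain.",
    "The Seine flows through Paris.",
    "Cats sleep a lot.",
]
useful = select_passages("capital of France", passages, keyword_judge)
```

The number of passages returned is decided by the judge for each query, so no fixed top-k cutoff or score threshold is needed.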


【3】PurpCode: Reasoning for Safer Code Generation
标题:PurpCode:推理以更安全地生成代码
链接:https://arxiv.org/abs/2507.19060

作者:u, Nirav Diwan, Zhe Wang, Haoyu Zhai, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Muntasir Wahed, Yinlin Deng, Hadjer Benkraouda, Yuxiang Wei, Lingming Zhang, Ismini Lourentzou, Gang Wang
摘要:我们介绍PurpCode,这是第一个用于训练安全代码推理模型的后训练方案,旨在生成安全代码并防御恶意网络活动。PurpCode分两个阶段训练推理模型:(i)规则学习,明确教导模型参考网络安全规则,以生成无漏洞的代码,并避免为恶意网络活动提供便利;(ii)强化学习,通过多样化的多目标奖励机制优化模型安全性并保留模型效用。为了给训练流程提供全面的网络安全数据,我们进行内部红队测试,根据真实任务合成全面且高覆盖率的提示,用于在模型中诱导不安全的网络活动。基于PurpCode,我们开发了一个基于推理的编码模型,即PurpCode-32B,它展示了最先进的网络安全性,优于各种前沿模型。同时,我们的对齐方法降低了一般场景和网络安全特定场景中的模型过度拒绝率,同时保留了模型在代码生成和常见安全知识方面的效用。
摘要:We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learning, which explicitly teaches the model to reference cybersafety rules to generate vulnerability-free code and to avoid facilitating malicious cyberactivities; and (ii) Reinforcement Learning, which optimizes model safety and preserves model utility through diverse, multi-objective reward mechanisms. To empower the training pipelines with comprehensive cybersafety data, we conduct internal red-teaming to synthesize comprehensive and high-coverage prompts based on real-world tasks for inducing unsafe cyberactivities in the model. Based on PurpCode, we develop a reasoning-based coding model, namely PurpCode-32B, which demonstrates state-of-the-art cybersafety, outperforming various frontier models. Meanwhile, our alignment method decreases the model overrefusal rates in both general and cybersafety-specific scenarios, while preserving model utility in both code generation and common security knowledge.


【4】A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions
标题:关键检索增强生成(RAG)系统的系统性综述:进展、差距和未来方向
链接:https://arxiv.org/abs/2507.18910

作者:eph Oche, Ademola Glory Folashade, Tirthankar Ghosal, Arpan Biswas
备注:33 pages, 2 figures
摘要:检索增强生成(RAG)代表了自然语言处理(NLP)的一个重大进步,将大型语言模型(LLM)与信息检索系统相结合,以增强事实基础,准确性和上下文相关性。本文提出了一个全面系统的审查RAG,跟踪其演变,从早期的发展开放领域的问题回答,最近的国家的最先进的实现在不同的应用程序。该综述首先概述了RAG背后的动机,特别是它在参数模型中减轻幻觉和过时知识的能力。核心技术组件检索机制,序列到序列生成模型,融合策略进行了详细研究。逐年分析突出了关键的里程碑和研究趋势,为RAG的快速增长提供了洞察力。本文进一步探讨了RAG在企业系统中的部署,解决了与专有数据检索、安全性和可扩展性相关的实际挑战。RAG实现进行了比较评估,基准性能的检索准确性,生成流畅性,延迟和计算效率。持续的挑战,如检索质量,隐私问题,集成开销进行了严格的评估。最后,审查强调新兴的解决方案,包括混合检索方法,隐私保护技术,优化的融合策略,和代理RAG架构。这些创新指向了更可靠,更高效,更上下文感知的知识密集型NLP系统的未来。
摘要:Retrieval-Augmented Generation (RAG) represents a major advancement in natural language processing (NLP), combining large language models (LLMs) with information retrieval systems to enhance factual grounding, accuracy, and contextual relevance. This paper presents a comprehensive systematic review of RAG, tracing its evolution from early developments in open-domain question answering to recent state-of-the-art implementations across diverse applications. The review begins by outlining the motivations behind RAG, particularly its ability to mitigate hallucinations and outdated knowledge in parametric models. Core technical components (retrieval mechanisms, sequence-to-sequence generation models, and fusion strategies) are examined in detail. A year-by-year analysis highlights key milestones and research trends, providing insight into RAG's rapid growth. The paper further explores the deployment of RAG in enterprise systems, addressing practical challenges related to retrieval of proprietary data, security, and scalability. A comparative evaluation of RAG implementations is conducted, benchmarking performance on retrieval accuracy, generation fluency, latency, and computational efficiency. Persistent challenges such as retrieval quality, privacy concerns, and integration overhead are critically assessed. Finally, the review highlights emerging solutions, including hybrid retrieval approaches, privacy-preserving techniques, optimized fusion strategies, and agentic RAG architectures. These innovations point toward a future of more reliable, efficient, and context-aware knowledge-intensive NLP systems.


【5】RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models
标题:RealDeal:通过图像到图像扩散模型增强大脑图像生成中的真实性和细节
链接:https://arxiv.org/abs/2507.18830

作者: Yinzhu Jin, Tyler Spears, Ifrah Zawar, P. Thomas Fletcher
备注:19 pages, 10 figures
摘要:我们提出了图像到图像的扩散模型,旨在通过引入锐利的边缘,精细的纹理,微妙的解剖特征和成像噪声,以提高所生成的大脑图像的真实性和细节。生成模型在生物医学领域,特别是在图像生成应用中得到了广泛的应用。潜在扩散模型在生成脑MRI方面实现了最先进的结果。然而,由于潜在的压缩,从这些模型生成的图像过于平滑,缺乏精细的解剖结构和扫描采集噪声,通常在真实图像中看到。本文将真实感增强和细节添加过程描述为图像到图像的扩散模型,从而改善了LDM生成图像的质量。我们采用常用的指标,如FID和LPIPS的图像真实感评估。此外,我们引入了新的指标来证明RealDeal在图像噪声分布,清晰度和纹理方面生成的图像的真实性。
摘要:We propose image-to-image diffusion models that are designed to enhance the realism and details of generated brain images by introducing sharp edges, fine textures, subtle anatomical features, and imaging noise. Generative models have been widely adopted in the biomedical domain, especially in image generation applications. Latent diffusion models (LDMs) achieve state-of-the-art results in generating brain MRIs. However, due to latent compression, images generated by these models are overly smooth, lacking the fine anatomical structures and scan acquisition noise that are typically seen in real images. This work formulates the realism-enhancing and detail-adding process as an image-to-image diffusion model that refines the quality of LDM-generated images. We employ commonly used metrics like FID and LPIPS for image realism assessment. Furthermore, we introduce new metrics to demonstrate the realism of images generated by RealDeal in terms of image noise distribution, sharpness, and texture.


【6】SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning
标题:SCORE-SET:用于音乐短语生成和序列学习的GuitarPro文件数据集
链接:https://arxiv.org/abs/2507.18723

作者:egari
备注:6 pages, 6 figures
摘要:提供了一个精心整理的Guitar Pro指法谱文件数据集(.gp5格式),专为涉及吉他音乐生成、序列建模和演奏感知学习的任务而量身定制。该数据集来自MAESTRO和GiantMIDI中的MIDI音符,这些音符已被改编为节奏吉他音轨。这些音轨经过进一步处理,加入了吉他演奏中典型的各种表情设置,例如推弦、滑音、颤音和手掌闷音,以更好地反映现实世界吉他演奏的细微差别。
摘要:A curated dataset of Guitar Pro tablature files (.gp5 format), tailored for tasks involving guitar music generation, sequence modeling, and performance-aware learning is provided. The dataset is derived from MIDI notes in MAESTRO and GiantMIDI which have been adapted into rhythm guitar tracks. These tracks are further processed to include a variety of expression settings typical of guitar performance, such as bends, slides, vibrato, and palm muting, to better reflect the nuances of real-world guitar playing.


半/弱/无/有监督|不确定性|主动学习(2篇)

【1】Explainable AI guided unsupervised fault diagnostics for high-voltage circuit breakers
标题:可解释的人工智能指导高压断路器的无监督故障诊断
链接:https://arxiv.org/abs/2507.19168

作者: Hsu, Gaëtan Frusque, Florent Forest, Felipe Macedo, Christian M. Franck, Olga Fink
摘要:商用高压断路器(CB)状态监测系统依赖于可直接观察到的物理参数,例如具有预定义阈值的充气压力。虽然这些参数至关重要,但它们仅涵盖故障机制的一小部分,并且通常仅在CB与电网断开时才能进行监测。为了便于在CB保持连接的同时进行在线状态监测,必须采用振动或声学信号等非侵入式测量技术。目前,使用这些信号的CB状态监测研究通常利用监督方法进行故障诊断,其中由于在实验室设置中人为引入的故障,已知地面实况故障类型。然而,这种监督方法在实际应用中是不可行的,其中故障标签不可用。在这项工作中,我们提出了一种新的无监督的故障检测和分割框架的基础上振动和声信号的CB。该框架可以检测与健康状态的偏差。可解释人工智能(XAI)的方法被应用到检测到的故障进行故障诊断。具体贡献如下:(1)我们提出了一个集成的无监督故障检测和分割框架,该框架能够检测故障并在训练期间仅使用所需的健康数据对不同故障进行聚类(2)我们提供了一种无监督的可解释性指导的故障诊断方法,使用XAI为领域专家提供老化或故障组件的潜在指示,实现故障诊断而无需地面实况故障标签的先决条件。这些贡献进行了验证,使用实验数据集从高压CB健康和人为引入的故障条件下,有助于更可靠的CB系统的操作。
摘要:Commercial high-voltage circuit breaker (CB) condition monitoring systems rely on directly observable physical parameters such as gas filling pressure with pre-defined thresholds. While these parameters are crucial, they only cover a small subset of malfunctioning mechanisms and usually can be monitored only if the CB is disconnected from the grid. To facilitate online condition monitoring while CBs remain connected, non-intrusive measurement techniques such as vibration or acoustic signals are necessary. Currently, CB condition monitoring studies using these signals typically utilize supervised methods for fault diagnostics, where ground-truth fault types are known due to artificially introduced faults in laboratory settings. This supervised approach is, however, not feasible in real-world applications, where fault labels are unavailable. In this work, we propose a novel unsupervised fault detection and segmentation framework for CBs based on vibration and acoustic signals. This framework can detect deviations from the healthy state. An explainable artificial intelligence (XAI) approach is then applied to the detected faults for fault diagnostics. The specific contributions are: (1) we propose an integrated unsupervised fault detection and segmentation framework that is capable of detecting faults and clustering different faults with only healthy data required during training; and (2) we provide an unsupervised explainability-guided fault diagnostics approach using XAI to offer domain experts potential indications of the aged or faulty components, achieving fault diagnostics without the prerequisite of ground-truth fault labels. These contributions are validated using an experimental dataset from a high-voltage CB under healthy and artificially introduced fault conditions, contributing to more reliable CB system operation.


【2】Human-AI Synergy in Adaptive Active Learning for Continuous Lithium Carbonate Crystallization Optimization
标题:自适应主动学习中的人机协同作用,用于连续碳酸锂结晶优化
链接:https://arxiv.org/abs/2507.19316

作者: Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Cara Cronin, Jason E. Hein, Jason Hattrick-Simpers
摘要:随着电动汽车(EV)行业的增长,对高纯度锂的需求激增,从Smackover Formation等较低品位的北美资源中进行经济有效的提取至关重要。与高纯度的南美卤水不同,这些资源需要创新的净化技术才能在经济上可行。连续结晶是生产电池级碳酸锂的一种有前途的方法,但其优化受到复杂参数空间和有限数据的挑战。本研究引入了一个人在回路(HITL)辅助的主动学习框架来优化碳酸锂的连续结晶。通过将人类专业知识与数据驱动的见解相结合,我们的方法加速了从具有挑战性的来源中提取锂的优化。我们的研究结果证明了该框架能够快速适应新数据,显著提高了工艺对镁等关键杂质的耐受性,从几百ppm的行业标准提高到高达6000 ppm。这一突破使得低品位、富含杂质的锂资源的开采变得可行,从而可能减少对大量预精炼工艺的需求。通过利用人工智能,我们改进了操作参数,并证明可以在不牺牲产品质量的情况下使用较低等级的材料。这一进展是在经济上利用北美巨大的锂储量(如Smackover Formation中的锂储量)和增强全球锂供应链可持续性方面迈出的重要一步。
摘要:As demand for high-purity lithium surges with the growth of the electric vehicle (EV) industry, cost-effective extraction from lower-grade North American sources like the Smackover Formation is critical. These resources, unlike high-purity South American brines, require innovative purification techniques to be economically viable. Continuous crystallization is a promising method for producing battery-grade lithium carbonate, but its optimization is challenged by a complex parameter space and limited data. This study introduces a Human-in-the-Loop (HITL) assisted active learning framework to optimize the continuous crystallization of lithium carbonate. By integrating human expertise with data-driven insights, our approach accelerates the optimization of lithium extraction from challenging sources. Our results demonstrate the framework's ability to rapidly adapt to new data, significantly improving the process's tolerance to critical impurities like magnesium from the industry standard of a few hundred ppm to as high as 6000 ppm. This breakthrough makes the exploitation of low-grade, impurity-rich lithium resources feasible, potentially reducing the need for extensive pre-refinement processes. By leveraging artificial intelligence, we have refined operational parameters and demonstrated that lower-grade materials can be used without sacrificing product quality. This advancement is a significant step towards economically harnessing North America's vast lithium reserves, such as those in the Smackover Formation, and enhancing the sustainability of the global lithium supply chain.


迁移|Zero/Few/One-Shot|自适应(1篇)

【1】Adaptive Neural Quantum States: A Recurrent Neural Network Perspective
标题:自适应神经量子状态:循环神经网络的视角
链接:https://arxiv.org/abs/2507.18700

作者:ughton, Mohamed Hibat-Allah
备注:14 pages, 7 figures, 3 tables. Link to GitHub repository: this https URL
摘要:神经网络量子态(NQS)是一种功能强大的神经网络模型,已经成为通过变分原理研究量子多体物理的有前途的工具。已知这些架构可通过增加参数的数量来系统地改进。在这里,我们展示了一个自适应方案来优化NQS,通过递归神经网络(RNN)的例子,使用一小部分的计算成本,同时减少训练波动,提高变分计算的质量,目标是一维和二维空间中原型模型的基态。这种自适应技术通过训练小型RNN并重用它们来初始化较大的RNN来降低计算成本。这项工作为优化大规模NQS模拟中部署的图形处理单元(GPU)资源提供了可能性。
摘要:Neural-network quantum states (NQS) are powerful neural-network ansätze that have emerged as promising tools for studying quantum many-body physics through the lens of the variational principle. These architectures are known to be systematically improvable by increasing the number of parameters. Here we demonstrate an Adaptive scheme to optimize NQSs, through the example of recurrent neural networks (RNNs), using a fraction of the computation cost while reducing training fluctuations and improving the quality of variational calculations targeting ground states of prototypical models in one and two spatial dimensions. This Adaptive technique reduces the computational cost by training small RNNs and reusing them to initialize larger RNNs. This work opens up the possibility of optimizing graphics processing unit (GPU) resources deployed in large-scale NQS simulations.
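
One simple way to reuse a small trained network when initializing a larger one is to copy the trained weights into the corresponding block of the larger weight matrix and fill the new entries with zeros or small noise. The abstract does not say how the authors map parameters between sizes, so the sketch below is only an assumed scheme, and `grow_square_matrix` is a hypothetical helper operating on a plain list-of-lists matrix.

```python
import random
from typing import List

Matrix = List[List[float]]


def grow_square_matrix(small: Matrix, new_size: int,
                       noise_scale: float = 0.0, seed: int = 0) -> Matrix:
    """Embed a trained (n x n) matrix in the top-left block of a larger (m x m) one."""
    old_size = len(small)
    if new_size < old_size:
        raise ValueError("new_size must be at least the size of the trained matrix")
    rng = random.Random(seed)
    # New entries start at zero (or small noise) so the larger model begins
    # close to the function computed by the smaller one.
    large = [[noise_scale * rng.uniform(-1.0, 1.0) for _ in range(new_size)]
             for _ in range(new_size)]
    for i in range(old_size):
        for j in range(old_size):
            large[i][j] = small[i][j]
    return large


trained_hidden_weights = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical trained values
grown = grow_square_matrix(trained_hidden_weights, new_size=3)
```

Training would then continue on the larger matrix, so only the final stages pay the cost of the full parameter count.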


强化学习(9篇)

【1】Hierarchical Deep Reinforcement Learning Framework for Multi-Year Asset Management Under Budget Constraints
标题:预算约束下多年资产管理的分层深度强化学习框架
链接:https://arxiv.org/abs/2507.19458

作者:, Arnold X.-X. Yuan
摘要:预算规划和维护优化对于基础设施资产管理至关重要,可确保成本效益和可持续性。然而,由组合动作空间、多样的资产恶化、严格的预算约束和环境不确定性引起的复杂性显著限制了现有方法的可扩展性。本文提出了一种专门针对多年基础设施规划的分层深度强化学习方法。我们的方法分解成两个层次的问题:一个高层次的预算规划师分配明确的可行性范围内的年度预算,和一个低层次的维护规划师优先分配的预算内的资产。通过在结构上将宏观预算决策与资产级优先级划分分离,并在分层软演员-评论家框架内集成线性规划投影,该方法有效地解决了行动空间的指数增长问题,并确保严格的预算合规性。一个案例研究评估不同规模的下水道网络(10,15,20污水厂)说明了所提出的方法的有效性。与传统的深度Q学习和增强型遗传算法相比,我们的方法收敛更快,扩展更有效,即使网络规模不断增长,也能始终如一地提供接近最优的解决方案。
摘要:Budget planning and maintenance optimization are crucial for infrastructure asset management, ensuring cost-effectiveness and sustainability. However, the complexity arising from combinatorial action spaces, diverse asset deterioration, stringent budget constraints, and environmental uncertainty significantly limits existing methods' scalability. This paper proposes a Hierarchical Deep Reinforcement Learning methodology specifically tailored to multi-year infrastructure planning. Our approach decomposes the problem into two hierarchical levels: a high-level Budget Planner allocating annual budgets within explicit feasibility bounds, and a low-level Maintenance Planner prioritizing assets within the allocated budget. By structurally separating macro-budget decisions from asset-level prioritization and integrating linear programming projection within a hierarchical Soft Actor-Critic framework, the method efficiently addresses exponential growth in the action space and ensures rigorous budget compliance. A case study evaluating sewer networks of varying sizes (10, 15, and 20 sewersheds) illustrates the effectiveness of the proposed approach. Compared to conventional Deep Q-Learning and enhanced genetic algorithms, our methodology converges more rapidly, scales effectively, and consistently delivers near-optimal solutions even as network size grows.


【2】GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
标题:GEPA:反思式提示进化可以胜过强化学习
链接:https://arxiv.org/abs/2507.19457

作者: Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
摘要:大型语言模型(LLM)越来越多地通过组相对策略优化(GRPO)等强化学习(RL)方法来适应下游任务,这些方法通常需要数千次rollout才能学习新任务。我们认为,与来自稀疏标量奖励的策略梯度相比,语言的可解释性往往可以为LLM提供更丰富的学习媒介。为了验证这一点,我们引入了GEPA(Genetic-Pareto),这是一种提示优化器,它充分结合自然语言反思,从试错中学习高层规则。给定包含一个或多个LLM提示的任何AI系统,GEPA对系统级轨迹进行采样(例如,推理、工具调用和工具输出),并以自然语言对其进行反思,以诊断问题、提出并测试提示更新,并结合其自身尝试的帕累托前沿上的互补经验。得益于GEPA的设计,它往往只需少量rollout就能带来大幅的质量提升。在四项任务中,GEPA的性能平均比GRPO高出10%,最高可达20%,同时使用的rollout次数最多减少35倍。GEPA在两个LLM上的性能也比领先的提示优化器MIPROv2高出10%以上,并在作为代码优化的推理时搜索策略方面展示了有希望的结果。
摘要:Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language can often provide a much richer learning medium for LLMs, compared with policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. Given any AI system containing one or more LLM prompts, GEPA samples system-level trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems, propose and test prompt updates, and combine complementary lessons from the Pareto frontier of its own attempts. As a result of GEPA's design, it can often turn even just a few rollouts into a large quality gain. Across four tasks, GEPA outperforms GRPO by 10% on average and by up to 20%, while using up to 35x fewer rollouts. GEPA also outperforms the leading prompt optimizer, MIPROv2, by over 10% across two LLMs, and demonstrates promising results as an inference-time search strategy for code optimization.
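
The Pareto frontier mentioned in this abstract can be illustrated with a small sketch: a candidate prompt stays on the frontier if no other candidate is at least as good on every task and strictly better on at least one. How GEPA actually scores and samples candidates is not stated in the abstract, so the score table, the candidate names, and `pareto_front` below are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, own in scores.items():
        dominated = any(dominates(other, own)
                        for other_name, other in scores.items()
                        if other_name != name)
        if not dominated:
            front.append(name)
    return front


# Hypothetical per-task scores for three candidate prompts (higher is better).
candidate_scores = {
    "prompt_a": [0.9, 0.2],
    "prompt_b": [0.4, 0.8],
    "prompt_c": [0.3, 0.1],
}
frontier = pareto_front(candidate_scores)
```

Here `prompt_a` and `prompt_b` each win on a different task, so both survive as sources of complementary lessons, while `prompt_c` is dominated by `prompt_a` and is dropped.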


【3】ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination
标题:ReCoDe:基于强化学习的多智能体协调动态约束设计
链接:https://arxiv.org/abs/2507.19151

作者:mir, Guang Yang, Zhan Gao, Keisuke Okumura, Heedo Woo, Amanda Prorok
摘要:基于约束的优化是机器人技术的基石,可以设计出可靠编码任务和安全要求(如避免碰撞或编队粘附)的控制器。然而,手工制作的约束可能会在需要复杂协调的多代理设置中失败。我们引入了ReCoDe--基于强化的约束设计--一个分散的混合框架,将基于优化的控制器的可靠性与多智能体强化学习的适应性融为一体。ReCoDe没有抛弃专家控制器,而是通过学习额外的动态约束来改进它们,这些约束可以捕获更微妙的行为,例如,通过约束代理移动来防止混乱场景中的拥塞。通过本地通信,智能体集体限制其允许的行动,以在不断变化的条件下更有效地协调。在这项工作中,我们专注于应用程序的ReCoDe多智能体导航任务,需要复杂的,基于上下文的运动和共识,在那里我们表明,它优于纯手工制作的控制器,其他混合方法,和标准的MARL基线。我们给出了经验(真正的机器人)和理论证据,保留用户定义的控制器,即使它是不完美的,是更有效的比从头开始学习,特别是因为ReCoDe可以动态地改变它依赖于这个控制器的程度。
摘要:Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe--Reinforcement-based Constraint Design--a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example, by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.


【4】Reinforcement Learning via Conservative Agent for Environments with Random Delays
标题:随机延迟环境中的保守代理强化学习
链接:https://arxiv.org/abs/2507.18992

作者:ee, Jangwon Kim, Jiseok Jeong, Soohee Han
摘要:现实世界的强化学习应用经常受到来自环境的延迟反馈的阻碍,这违反了马尔可夫假设并带来了重大挑战。虽然已经提出了许多延迟补偿方法的环境中具有恒定的延迟,随机延迟的环境仍然在很大程度上未被探索,由于其固有的可变性和不可预测性。在这项研究中,我们提出了一个简单而强大的代理随机延迟下的决策,称为保守的代理,重新制定的随机延迟环境到其恒定延迟等效。这种转换使任何国家的最先进的恒定延迟的方法可以直接扩展到随机延迟的环境,而无需修改算法结构或牺牲性能。我们评估了保守的基于代理的算法在连续控制任务,实证结果表明,它显着优于现有的基线算法的渐近性能和样本效率。
摘要:Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be directly extended to the random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical results demonstrate that it significantly outperforms existing baseline algorithms in terms of asymptotic performance and sample efficiency.
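
A minimal sketch of the reformulation described here, assuming the random delays are bounded by a known `max_delay`: the agent ignores how early an observation happens to arrive and only uses it exactly `max_delay` steps after it was generated, so the effective delay is constant. The `ConservativeBuffer` class and the example delays are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


class ConservativeBuffer:
    """Holds observations and releases each one max_delay steps after it was generated."""

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self._store: Dict[int, str] = {}

    def receive(self, generated_at: int, observation: str) -> None:
        """Record an observation whenever it arrives, however early."""
        self._store[generated_at] = observation

    def observation_for(self, now: int) -> Optional[str]:
        """Return the observation generated exactly max_delay steps ago, if any."""
        return self._store.get(now - self.max_delay)


# Hypothetical stream: (generated_at, random_delay, observation), delays <= 3.
stream = [(0, 1, "s0"), (1, 3, "s1"), (2, 2, "s2")]
buffer = ConservativeBuffer(max_delay=3)

used: List[Optional[str]] = []
for now in range(6):
    for generated_at, delay, obs in stream:
        if generated_at + delay == now:  # the observation arrives at this step
            buffer.receive(generated_at, obs)
    used.append(buffer.observation_for(now))
```

Although the three observations arrive after 1, 3, and 2 steps, the agent consumes them at steps 3, 4, and 5, which is the behaviour of a constant-delay environment and lets a constant-delay method be applied unchanged.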


【5】Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise
标题:通过吸收广义人类专业知识在多智能体强化学习中学习个人内在奖励
链接:https://arxiv.org/abs/2507.18867

作者:, Xiao Yin, Yuanyang Zhu, Chunlin Chen
备注:IEEE International Conference on Systems, Man, and Cybernetics
摘要:多智能体强化学习(MARL)中的有效探索是一个具有挑战性的问题,当只接收团队奖励时,特别是在奖励稀疏的环境中。缓解这个问题的一个强有力的方法是制定密集的个人奖励,以引导智能体进行有效的探索。然而,个体奖励通常依赖于人工设计的成形奖励函数,缺乏高阶智能,因此在复杂问题的学习和泛化方面,它的表现不如人类。为了解决这些问题,我们结合了上述两种范式,并提出了一个新的框架,光(学习个人内在的奖励通过简化广义人类experTise),它可以集成人类知识到MARL算法在一个端到端的方式。LIGHT通过综合考虑个体行为分布和人类经验偏好分布来指导每个Agent避免不必要的探索。然后,LIGHT基于与Q学习相关的可操作表示转换为每个代理设计个人内在奖励,以便代理将其动作偏好与人类专业知识相一致,同时最大化联合动作价值。实验结果表明,我们的方法的优越性,在具有挑战性的情况下,在不同的稀疏奖励任务的性能和更好的知识重用性的代表性基线。
摘要:Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when receiving only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, so such approaches remain less effective than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both individual action distribution and human expertise preference distribution. Then, LIGHT designs individual intrinsic rewards for each agent based on actionable representational transformation relevant to Q-learning so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.


【6】Test-time Offline Reinforcement Learning on Goal-related Experience
Link: https://arxiv.org/abs/2507.18809

Authors: atella, Mert Albaba, Jonas Hübotter, Georg Martius, Andreas Krause
Abstract: Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at minimal compute cost. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state and their quality with respect to the evaluation goal. We demonstrate across a wide range of high-dimensional loco-navigation and manipulation tasks that fine-tuning a policy on the selected data for a few gradient steps leads to significant performance gains over standard offline pre-training. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out. Finally, we study compute allocation at inference, demonstrating that, at comparable cost, GC-TTT induces performance gains that are not achievable by scaling model size.


【7】Market Making Strategies with Reinforcement Learning
Link: https://arxiv.org/abs/2507.18680

Authors: nández Vicente
Abstract: This thesis presents the results of a comprehensive research project focused on applying Reinforcement Learning (RL) to the problem of market making in financial markets. Market makers (MMs) play a fundamental role in providing liquidity, yet face significant challenges arising from inventory risk, competition, and non-stationary market dynamics. This research explores how RL, particularly Deep Reinforcement Learning (DRL), can be employed to develop autonomous, adaptive, and profitable market making strategies. The study begins by formulating the MM task as a reinforcement learning problem, designing agents capable of operating in both single-agent and multi-agent settings within a simulated financial environment. It then addresses the complex issue of inventory management using two complementary approaches: reward engineering and Multi-Objective Reinforcement Learning (MORL). While the former uses dynamic reward shaping to guide behavior, the latter leverages Pareto front optimization to explicitly balance competing objectives. To address the problem of non-stationarity, the research introduces POW-dTS, a novel policy weighting algorithm based on Discounted Thompson Sampling. This method allows agents to dynamically select and combine pretrained policies, enabling continual adaptation to shifting market conditions. The experimental results demonstrate that the proposed RL-based approaches significantly outperform traditional and baseline algorithmic strategies across various performance metrics. Overall, this thesis contributes new methodologies and insights for the design of robust, efficient, and adaptive market making agents, reinforcing the potential of RL to transform algorithmic trading in complex financial systems.
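The abstract names Discounted Thompson Sampling as the basis of POW-dTS but does not spell out the algorithm. As a rough illustration only, the sketch below shows generic discounted Thompson sampling over a fixed set of pretrained policies with Bernoulli rewards. The class name, the Beta(1, 1) prior, the discount factor `gamma`, and the toy environment are all assumptions made for this example; this is not the thesis's POW-dTS.

```python
import random


class DiscountedThompsonSampler:
    """Generic discounted Thompson sampling over a fixed set of arms (policies).

    Hypothetical helper for illustration; not the POW-dTS algorithm itself.
    """

    def __init__(self, n_arms, gamma=0.95, seed=0):
        self.gamma = gamma            # discount applied to old evidence
        self.alpha = [0.0] * n_arms   # discounted success counts
        self.beta = [0.0] * n_arms    # discounted failure counts
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample per arm from Beta(1 + alpha, 1 + beta), pick the largest.
        draws = [
            self.rng.betavariate(1.0 + a, 1.0 + b)
            for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # Decay all evidence first, then add the newest observation (reward in [0, 1]).
        self.alpha = [self.gamma * a for a in self.alpha]
        self.beta = [self.gamma * b for b in self.beta]
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward


def run_demo(steps=2000, switch=1000):
    """Toy non-stationary setting: the better arm flips halfway through."""
    sampler = DiscountedThompsonSampler(n_arms=2, gamma=0.95, seed=1)
    env = random.Random(2)
    late_picks = [0, 0]
    for t in range(steps):
        probs = (0.8, 0.2) if t < switch else (0.2, 0.8)
        arm = sampler.select()
        reward = 1.0 if env.random() < probs[arm] else 0.0
        sampler.update(arm, reward)
        if t >= steps - 300:
            late_picks[arm] += 1
    return late_picks
```

Because old counts decay geometrically, the posterior forgets the regime before the switch, so the sampler drifts toward whichever policy has been paying off recently.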


【8】Controlling Topological Defects in Polar Fluids via Reinforcement Learning
Link: https://arxiv.org/abs/2507.19298

Authors: ingh, Petros Koumoutsakos
Abstract: Topological defects in active polar fluids exhibit complex dynamics driven by internally generated stresses, reflecting the deep interplay between topology, flow, and non-equilibrium hydrodynamics. Feedback control offers a powerful means to guide such systems, enabling transitions between dynamic states. We investigate closed-loop steering of integer-charged defects in a confined active fluid by modulating the spatial profile of activity. Using a continuum hydrodynamic model, we show that localized control of active stress induces flow fields that can reposition and direct defects along prescribed trajectories by exploiting non-linear couplings in the system. A reinforcement learning framework is used to discover effective control strategies that produce robust defect transport across both trained and novel trajectories. The results highlight how AI agents can learn the underlying dynamics and spatially structure activity to manipulate topological excitations, offering insights into the controllability of active matter and the design of adaptive, self-organized materials.


【9】Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number
Link: https://arxiv.org/abs/2507.18849

Authors: Bailey, Robert D. Guy
Comments: 18 pages, 14 figures, to be published in EPJ E
Abstract: Metachronal paddling is a swimming strategy in which an organism oscillates sets of adjacent limbs with a constant phase lag, propagating a metachronal wave through its limbs and propelling it forward. This limb coordination strategy is utilized by swimmers across a wide range of Reynolds numbers, which suggests that this metachronal rhythm was selected for its optimal swimming performance. In this study, we apply reinforcement learning to a swimmer at zero Reynolds number and investigate whether the learning algorithm selects this metachronal rhythm or whether other coordination patterns emerge. We design the swimmer agent with an elongated body and pairs of straight, inflexible paddles placed along the body at various fixed paddle spacings. Depending on paddle spacing, the swimmer agent learns qualitatively different coordination patterns. At tight spacings, a back-to-front metachronal wave-like stroke emerges that resembles the commonly observed biological rhythm, but at wide spacings, different limb coordinations are selected. Across all resulting strokes, the fastest stroke depends on the number of paddles; however, the most efficient stroke is a back-to-front wave-like stroke regardless of the number of paddles.


Symbolic | Symbolic Learning (1 paper)

【1】Learning neuro-symbolic convergent term rewriting systems
Link: https://arxiv.org/abs/2507.19372

Authors: truzzellis, Alberto Testolin, Alessandro Sperduti
Comments: 48 pages, 31 figures. Submitted for review by Artificial Intelligence Journal
Abstract: Building neural systems that can learn to execute symbolic algorithms is a challenging open problem in artificial intelligence, especially when aiming for strong generalization and out-of-distribution performance. In this work, we introduce a general framework for learning convergent term rewriting systems using a neuro-symbolic architecture inspired by the rewriting algorithm itself. We present two modular implementations of such an architecture: the Neural Rewriting System (NRS) and the Fast Neural Rewriting System (FastNRS). As a result of the algorithmically inspired design and key architectural elements, both models can generalize to out-of-distribution instances, with FastNRS offering significant improvements in memory efficiency, training speed, and inference time. We evaluate both architectures on four tasks involving the simplification of mathematical formulas and further demonstrate their versatility in a multi-domain learning scenario, where a single model is trained to solve multiple types of problems simultaneously. The proposed system significantly outperforms two strong neural baselines: the Neural Data Router, a recent transformer variant specifically designed to solve algorithmic problems, and GPT-4o, one of the most powerful general-purpose large language models. Moreover, our system matches or outperforms the latest o1-preview model from OpenAI, which excels in reasoning benchmarks.
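For readers unfamiliar with term rewriting, the sketch below shows a purely symbolic (non-neural) rewriting system for simplifying arithmetic formulas, the kind of procedure the abstract says the architecture is inspired by. The rule set, the tuple encoding of terms, and the function names are assumptions chosen for illustration; the paper's NRS and FastNRS models learn such behavior rather than hard-coding it.

```python
def rewrite_once(term):
    """Apply one simplification rule at the root, or return the term unchanged.

    Terms are either atoms (numbers or variable names) or tuples (op, left, right).
    """
    if not isinstance(term, tuple):
        return term
    op, left, right = term
    if op == "+":
        if left == 0:          # 0 + x -> x
            return right
        if right == 0:         # x + 0 -> x
            return left
    if op == "*":
        if left == 0 or right == 0:   # 0 * x -> 0, x * 0 -> 0
            return 0
        if left == 1:          # 1 * x -> x
            return right
        if right == 1:         # x * 1 -> x
            return left
    return term


def normalize(term):
    """Rewrite innermost-first until no rule applies, yielding the normal form."""
    if not isinstance(term, tuple):
        return term
    op, left, right = term
    candidate = (op, normalize(left), normalize(right))
    reduced = rewrite_once(candidate)
    # If the root changed, normalize again in case a new redex appeared.
    return reduced if reduced == candidate else normalize(reduced)
```

Every rule strictly shrinks the term, so rewriting always terminates; for example, `normalize(("+", ("*", "x", 1), ("*", "y", 0)))` reduces to `"x"`.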


Medical (4 papers)

【1】Counterfactual Explanations in Medical Imaging: Exploring SPN-Guided Latent Space Manipulation
Link: https://arxiv.org/abs/2507.19368

Authors: kiera, Stefan Kramer
Comments: 10 pages, 3 figures
Abstract: Artificial intelligence is increasingly leveraged across various domains to automate decision-making processes that significantly impact human lives. In medical image analysis, deep learning models have demonstrated remarkable performance. However, their inherent complexity makes them black-box systems, raising concerns about reliability and interpretability. Counterfactual explanations provide comprehensible insights into decision processes by presenting hypothetical "what-if" scenarios that alter model classifications. By examining input alterations, counterfactual explanations reveal patterns that influence the decision-making process. Despite their potential, generating plausible counterfactuals that adhere to similarity constraints and provide human-interpretable explanations remains a challenge. In this paper, we investigate this challenge with a model-specific optimization approach. While deep generative models such as variational autoencoders (VAEs) exhibit significant generative power, probabilistic models like sum-product networks (SPNs) efficiently represent complex joint probability distributions. By modeling the likelihood of a semi-supervised VAE's latent space with an SPN, we leverage its dual role as both a latent space descriptor and a classifier for a given discrimination task. This formulation enables the optimization of latent space counterfactuals that are both close to the original data distribution and aligned with the target class distribution. We conduct an experimental evaluation on the CheXpert dataset. To evaluate the effectiveness of integrating SPNs, our SPN-guided latent space manipulation is compared against a neural network baseline. Additionally, the trade-off between latent variable regularization and counterfactual quality is analyzed.


【2】Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection
Link: https://arxiv.org/abs/2507.19174

Authors: angregorio (1), Cristina Maria Licciardello (1), Vanja Miskovic (1 and 2), Leonardo Provenzano (1 and 2), Alessandra Laura Giulia Pedrocchi (1), Andra Diana Dumitrascu (2), Arsela Prelaj (2), Marina Chiara Garassino (3), Emilia Ambrosini (1), Simona Ferrante (1 and 4) ((1) Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy, (2) Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy, (3) Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, IL, USA, (4) IRCCS Istituto Neurologico Carlo Besta, Milan, Italy)
Comments: Emilia Ambrosini and Simona Ferrante equally contributed to the work
Abstract: Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes, and novel approaches are needed to facilitate early diagnosis. In this study, we explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls. Cough audio recordings were prospectively acquired from a total of 227 subjects, divided into NSCLC patients and healthy controls. The recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost, as well as deep learning approaches, specifically convolutional neural networks (CNN) and transfer learning with VGG16. To enhance the interpretability of the machine learning model, we utilized Shapley Additive Explanations (SHAP). The fairness of the models across demographic groups was assessed by comparing the performance of the best model across different age groups (58 years or younger versus older than 58 years) and gender, using the equalized odds difference on the test set. The results demonstrate that the CNN achieves the best performance, with an accuracy of 0.83 on the test set. Nevertheless, the SVM achieves only slightly lower performance (accuracy of 0.76 in validation and 0.78 on the test set), making it suitable in contexts with low computational power. The use of SHAP for SVM interpretation further enhances model transparency, making it more trustworthy for clinical applications. Fairness analysis shows slightly higher disparity across age (0.15) than gender (0.09) on the test set. Therefore, to strengthen the reliability of our findings, a larger, more diverse, and unbiased dataset is needed, particularly one including individuals at risk of NSCLC and those in early disease stages.
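The abstract reports an equalized odds difference but does not show how it is computed. A minimal sketch follows, assuming the common definition (the larger of the between-group gaps in true positive rate and false positive rate, as used for example in the Fairlearn library). The function names and the toy labels are hypothetical and unrelated to the study's data.

```python
def group_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in TPR or FPR (0.0 means equalized odds hold)."""
    tprs, fprs = [], []
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        tpr, fpr = group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
        tprs.append(tpr)
        fprs.append(fpr)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy example with two hypothetical age groups of four subjects each.
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1, 0, 1, 0]
toy_groups = ["<=58"] * 4 + [">58"] * 4
toy_gap = equalized_odds_difference(toy_true, toy_pred, toy_groups)
```

In the toy data the first group is classified perfectly (TPR 1.0, FPR 0.0) while the second has TPR 0.5 and FPR 0.5, so the gap is 0.5.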


【3】Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model
Link: https://arxiv.org/abs/2507.18987

Authors: ri, HMLS Kumari, UMMPK Nawarathne
Comments: 16 pages, 15 figures, to be published in International Journal of Research in Computing (IJRC)
Abstract: Differentiated thyroid cancer (DTC) recurrence is a major public health concern, requiring classification and predictive models that are not only accurate but also interpretable and uncertainty-aware. This study introduces a comprehensive framework for DTC recurrence classification using a dataset containing 383 patients and 16 clinical and pathological variables. Initially, 11 machine learning (ML) models were employed using the complete dataset, where the Support Vector Machine (SVM) model achieved the highest accuracy of 0.9481. To reduce complexity and redundancy, feature selection was carried out using the Boruta algorithm, and the same ML models were applied to the reduced dataset, where the Logistic Regression (LR) model obtained the maximum accuracy of 0.9611. However, these ML models often lack uncertainty quantification, which is critical in clinical decision making. Therefore, to address this limitation, Bayesian Neural Networks (BNNs) with six different prior distributions, namely Normal(0, 1), Normal(0, 10), Laplace(0, 1), Cauchy(0, 1), Cauchy(0, 2.5), and Horseshoe(1), were implemented on both the complete and reduced datasets. The BNN model with the Normal(0, 10) prior distribution exhibited maximum accuracies of 0.9740 and 0.9870 before and after feature selection, respectively.


【4】Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning
Link: https://arxiv.org/abs/2507.18866

Authors: Junyi Fan, Li Sun, Shuheng Chen, Minoo Ahmadi, Elham Pishgar, Kamiar Alaei, Greg Placencia, Maryam Pishgar
Abstract: Background: Hypertensive kidney disease (HKD) patients in intensive care units (ICUs) face high short-term mortality, but tailored risk prediction tools are lacking. Early identification of high-risk individuals is crucial for clinical decision-making. Methods: We developed a machine learning framework to predict 30-day in-hospital mortality among ICU patients with HKD using early clinical data from the MIMIC-IV v2.2 database. A cohort of 1,366 adults was curated with strict criteria, excluding malignancy cases. Eighteen clinical features (including vital signs, labs, comorbidities, and therapies) were selected via random forest importance and mutual information filtering. Several models were trained and compared with stratified five-fold cross-validation; CatBoost demonstrated the best performance. Results: CatBoost achieved an AUROC of 0.88 on the independent test set, with sensitivity of 0.811 and specificity of 0.798. SHAP values and Accumulated Local Effects (ALE) plots showed the model relied on meaningful predictors such as altered consciousness, vasopressor use, and coagulation status. Additionally, the DREAM algorithm was integrated to estimate patient-specific posterior risk distributions, allowing clinicians to assess both predicted mortality and its uncertainty. Conclusions: We present an interpretable machine learning pipeline for early, real-time risk assessment in ICU patients with HKD. By combining high predictive performance with uncertainty quantification, our model supports individualized triage and transparent clinical decisions. This approach shows promise for clinical deployment and merits external validation in broader critical care populations.


Distillation | Knowledge Extraction (1 paper)

【1】ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs
Link: https://arxiv.org/abs/2507.19031

Authors: u, Ziyu Guan, Wei Zhao, Yaming Yang, Yujie Sun, Zheng Liang, Yibing Zhan, Dapeng Tao
Abstract: GNN-to-MLP (G2M) methods have emerged as a promising approach to accelerate Graph Neural Networks (GNNs) by distilling their knowledge into simpler Multi-Layer Perceptrons (MLPs). These methods bridge the gap between the expressive power of GNNs and the computational efficiency of MLPs, making them well-suited for resource-constrained environments. However, existing G2M methods cannot flexibly adjust the trade-off between inference cost and accuracy at run time, a critical requirement for real-world applications where computational resources and time constraints can vary significantly. To address this, we introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation (ProGMLP). ProGMLP employs a Progressive Training Structure (PTS), where multiple MLP students are trained in sequence, each building on the previous one. Furthermore, ProGMLP incorporates Progressive Knowledge Distillation (PKD) to iteratively refine the distillation process from GNNs to MLPs, and Progressive Mixup Augmentation (PMA) to enhance generalization by progressively generating harder mixed samples. Our approach is validated through comprehensive experiments on eight real-world graph datasets, demonstrating that ProGMLP maintains high accuracy while dynamically adapting to varying runtime scenarios, making it highly effective for deployment in diverse application settings.
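The abstract builds on knowledge distillation without stating the loss. As background only, here is a sketch of the standard temperature-scaled soft-label distillation loss (KL divergence between teacher and student softmax outputs, scaled by the squared temperature). It is a generic formulation, not ProGMLP's progressive variant, and the function names, temperature value, and example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, times T^2."""
    p = softmax(teacher_logits, temperature)   # teacher (e.g. GNN) soft labels
    q = softmax(student_logits, temperature)   # student (e.g. MLP) predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return (temperature ** 2) * kl


# Hypothetical per-node logits for a three-class problem.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; in practice it is usually combined with a cross-entropy term on the true labels.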


Recommendation (4 papers)

【1】Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization
Link: https://arxiv.org/abs/2507.19473

Authors: bek, Artem Fatkulin, Anton Klenitskiy, Alexey Vasilev
Abstract: Many sequential recommender systems suffer from the cold start problem, where items with few or no interactions cannot be effectively used by the model due to the absence of a trained embedding. Content-based approaches, which leverage item metadata, are commonly used in such scenarios. One option is to use embeddings derived from content features, such as textual descriptions, as initialization for the model embeddings. However, directly using frozen content embeddings often results in suboptimal performance, as they may not fully adapt to the recommendation task. On the other hand, fine-tuning these embeddings can degrade performance for cold-start items, as item representations may drift far from their original structure after training. We propose a novel approach to address this limitation. Instead of entirely freezing the content embeddings or fine-tuning them extensively, we introduce a small trainable delta to the frozen embeddings that enables the model to adapt item representations without letting them go too far from their original semantic structure. This approach demonstrates consistent improvements across multiple datasets and modalities, including e-commerce datasets with textual descriptions and a music dataset with audio-based representations.
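The idea of a small trainable delta on top of frozen content embeddings can be sketched as follows. The abstract does not say how the delta is kept small, so the norm clipping used here, along with the class name, `max_delta_norm`, and the learning rate, are assumptions made purely for illustration.

```python
import math


def clip_norm(vec, max_norm):
    """Rescale vec so that its Euclidean norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


class DeltaEmbedding:
    """Item embedding = frozen content vector + bounded trainable delta (hypothetical)."""

    def __init__(self, content_vectors, max_delta_norm=0.1):
        self.content = [list(v) for v in content_vectors]        # never updated
        self.delta = [[0.0] * len(v) for v in content_vectors]   # trainable part
        self.max_delta_norm = max_delta_norm

    def lookup(self, item_id):
        return [c + d for c, d in zip(self.content[item_id], self.delta[item_id])]

    def sgd_step(self, item_id, grad, lr=0.5):
        # Only the delta moves, and it is clipped so the item stays near its content vector.
        updated = [d - lr * g for d, g in zip(self.delta[item_id], grad)]
        self.delta[item_id] = clip_norm(updated, self.max_delta_norm)


emb = DeltaEmbedding([[1.0, 0.0], [0.0, 1.0]], max_delta_norm=0.1)
emb.sgd_step(0, [-10.0, 0.0])   # a large gradient on a "warm" item
```

Item 0 moves by at most 0.1 despite the large gradient, while item 1, which received no interactions, keeps exactly its content embedding; that is the behavior a cold-start item would rely on.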


【2】Short-Form Video Recommendations with Multimodal Embeddings: Addressing Cold-Start and Bias Challenges
Link: https://arxiv.org/abs/2507.19346

Authors: hoha, Katya Mirylenka, Egor Malykh, Marco-Andrea Buchmann, Francesca Catino
Abstract: In recent years, social media users have spent significant amounts of time on short-form video platforms. As a result, established platforms in other domains, such as e-commerce, have begun introducing short-form video content to engage users and increase their time spent on the platform. The success of these experiences is due not only to the content itself but also to a unique UI innovation: instead of offering users a list of choices to click, platforms actively recommend content for users to watch one at a time. This creates new challenges for recommender systems, especially when launching a new video experience. Beyond the limited interaction data, immersive feed experiences introduce stronger position bias due to the UI, as well as duration bias when optimizing for watch time, as models tend to favor shorter videos. These issues, together with the feedback loop inherent in recommender systems, make it difficult to build effective solutions. In this paper, we highlight the challenges faced when introducing a new short-form video experience and present our experience showing that, even with sufficient video interaction data, it can be more beneficial to leverage a video retrieval system using a fine-tuned multimodal vision-language model to overcome these challenges. This approach demonstrated greater effectiveness than conventional supervised learning methods in online experiments conducted on our e-commerce platform.


【3】Semantic IDs for Music Recommendation
Link: https://arxiv.org/abs/2507.18800

Authors: y Mei, Florian Henkel, Samuel E. Sandberg, Oliver Bembom, Andreas F. Ehmann
Comments: RecSys 2025 Industry Track
Abstract: Training recommender systems for next-item recommendation often requires unique embeddings to be learned for each item, which may take up most of the trainable parameters of a model. Shared embeddings, such as those built from content information, can reduce the number of distinct embeddings to be stored in memory. This allows for a more lightweight model; correspondingly, model complexity can be increased because fewer embeddings need to be stored in memory. We show the benefit of using shared content-based features ('semantic IDs') in improving recommendation accuracy and diversity, while reducing model size, for two music recommendation datasets, including an online A/B test on a music streaming service.


【4】Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation
Link: https://arxiv.org/abs/2507.18756

Authors: Pires, Gregorio F. Azevedo, Pietro L. Campos, Rafael T. Sereicikas, Tiago A. Almeida
Comments: Accepted to be published in RecSys'25, 10 pages, 3 figures
Abstract: Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing between exploiting items likely to be enjoyed and exploring new ones to gather information. In contextual linear bandits, this trade-off is particularly central, as many variants share the same linear regression backbone and differ primarily in their exploration strategies. Despite its prevalent use, offline evaluation of MABs is increasingly recognized for its limitations in reliably assessing exploration behavior. This study conducts an extensive offline empirical comparison of several linear MABs. Strikingly, across over 90% of various datasets, a greedy linear model with no exploration of any kind consistently achieves top-tier performance, often outperforming or matching its exploratory counterparts. This observation is further corroborated by hyperparameter optimization, which consistently favors configurations that minimize exploration, suggesting that pure exploitation is the dominant strategy within these evaluation settings. Our results expose significant inadequacies in offline evaluation protocols for bandits, particularly concerning their capacity to reflect true exploratory efficacy. Consequently, this research underscores the urgent necessity of developing more robust assessment methodologies, guiding future investigations into alternative evaluation frameworks for interactive learning in recommender systems.
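To make the "same linear regression backbone, different exploration strategy" point concrete, the sketch below implements a shared-parameter ridge-regression bandit in which a single coefficient switches between greedy selection (`alpha = 0`) and a LinUCB-style confidence bonus (`alpha > 0`). The class name, the shared-parameter simplification, and the toy arm features are assumptions for illustration; the study's actual models and datasets are not reproduced here.

```python
class LinearBandit:
    """Ridge-regression bandit; alpha=0 is purely greedy, alpha>0 adds a UCB bonus."""

    def __init__(self, dim, alpha=0.0, ridge=1.0):
        self.alpha = alpha
        # Inverse of A = ridge * I + sum of x x^T, maintained incrementally.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim   # sum of reward * x

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def theta(self):
        # Ridge-regression estimate: theta = A^-1 b.
        return self._matvec(self.b)

    def score(self, x):
        mean = sum(t * v for t, v in zip(self.theta(), x))
        ax = self._matvec(x)
        width = max(sum(v * w for v, w in zip(x, ax)), 0.0) ** 0.5
        return mean + self.alpha * width   # bonus vanishes when alpha == 0

    def select(self, arms):
        return max(range(len(arms)), key=lambda i: self.score(arms[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 (A^-1 is symmetric).
        ax = self._matvec(x)
        denom = 1.0 + sum(v * w for v, w in zip(x, ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


# Two hypothetical arms with one-hot features; arm 0 was rewarded, arm 1 was not.
arms = [[1.0, 0.0], [0.0, 1.0]]
greedy = LinearBandit(dim=2, alpha=0.0)
greedy.update(arms[0], 1.0)
greedy.update(arms[1], 0.0)
chosen = greedy.select(arms)
```

With `alpha = 0` the model simply picks the arm with the highest estimated reward (arm 0 here); the exploratory variants compared in such studies differ mainly in how the extra term in `score` is computed.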


Clustering (3 papers)

【1】Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box
Link: https://arxiv.org/abs/2507.19455

Authors: os de Andrade e Sousa, Gregor Miller, Ronan Le Gleut, Dominik Thalmeier, Helena Pelin, Marie Piraud
Abstract: As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RFs), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.


【2】MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
Link: https://arxiv.org/abs/2507.19017

Authors: Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang
Comments: 9 pages
摘要:强化学习(RL)是一种越来越多地用于对齐大型语言模型的范式。流行的RL算法使用多个工作者,并且可以建模为图,其中每个节点是工作者的状态,每条边表示节点之间的数据流。由于存在严重的跨节点依赖,RL训练系统通常集群可扩展性差、内存利用率低。在这篇文章中,我们介绍了MindSpeed RL,这是一个有效且高效的大规模RL训练系统。与现有的集中式方法不同,MindSpeed RL从分布式视角组织RL训练中的基本数据依赖关系,即样本流和重分片流。一方面,设计了一种分布式transfer dock策略,在传统重放缓冲区的基础上设置控制器和仓库,以减轻样本流中的调度开销。同时提出了一种实用的allgather-swap策略,以消除重分片流中的冗余内存使用。此外,MindSpeed RL还集成了多种并行化策略和加速技术,用于系统性优化。在流行的Qwen2.5-Dense-7B/32B、Qwen3-MoE-30B和DeepSeek-R1-MoE-671B的RL训练上进行的综合实验表明,与现有最先进的系统相比,MindSpeed RL的吞吐量提高了1.42至3.97倍。最后,我们开源了MindSpeed RL,并在拥有384个神经处理单元(NPU)的Ascend超节点(super pod)上进行了所有实验,以展示Ascend的强大性能和可靠性。
摘要:Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, the RL training system usually suffers from poor cluster scalability and low memory utilization. In this article, we introduce MindSpeed RL, an effective and efficient system for large-scale RL training. Unlike existing centralized methods, MindSpeed RL organizes the essential data dependencies in RL training, i.e., sample flow and resharding flow, from a distributed view. On the one hand, a distributed transfer dock strategy, which sets controllers and warehouses on the basis of the conventional replay buffer, is designed to release the dispatch overhead in the sample flow. A practical allgather-swap strategy is presented to eliminate redundant memory usage in resharding flow. In addition, MindSpeed RL further integrates numerous parallelization strategies and acceleration techniques for systematic optimization. Compared with existing state-of-the-art systems, comprehensive experiments on the RL training of popular Qwen2.5-Dense-7B/32B, Qwen3-MoE-30B, and DeepSeek-R1-MoE-671B show that MindSpeed RL increases the throughput by 1.42 to 3.97 times. Finally, we open-source MindSpeed RL and perform all the experiments on a super pod of Ascend with 384 neural processing units (NPUs) to demonstrate the powerful performance and reliability of Ascend.


【3】Perfect Clustering in Very Sparse Diverse Multiplex Networks
标题:非常稀疏多元化多路网络中的完美分簇
链接:https://arxiv.org/abs/2507.19423

作者:Pensky
备注:5 figures
摘要:本文研究了DIverse MultiPLEx Signed Generalized Random Dot Product Graph(DIMPLE-SGRDPG)网络模型(Pensky(2024)),其中网络的所有层都有相同的节点集合。此外,所有层可以被划分成若干组,使得同一组中的层嵌入在相同的环境子空间中,除此之外各层的连接概率矩阵可以完全不同。该设置将大多数多层网络模型作为其特例。该模型的关键任务是恢复具有各自独特子空间结构的层组,因为网络的所有层都嵌入在同一子空间中的情况已经得到了相当充分的研究。到目前为止,这种网络中的层聚类是基于逐层分析的,这要求多层网络足够稠密。而在本文中,我们成功地将所有层中的信息汇集在一起,并提供了一种基于张量的方法,可以确保在稀疏得多的网络上实现完美聚类。我们的理论结果建立在直观且非限制性的假设之上,表明新技术实现完美聚类所需的稀疏性条件,在相差对数因子的意义下,与针对一个简单得多的模型推导出的计算下界一致。
摘要:The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned into groups such that the layers in the same group are embedded in the same ambient subspace but otherwise matrices of connection probabilities can be all different. This setting includes majority of multilayer network models as its particular cases. The key task in this model is to recover the groups of layers with unique subspace structures, since the case where all layers of the network are embedded in the same subspace has been fairly well studied. Until now, clustering of layers in such networks was based on the layer-per-layer analysis, which required the multilayer network to be sufficiently dense. Nevertheless, in this paper we succeeded in pooling information in all layers together and providing a tensor-based methodology that ensures perfect clustering for a much sparser network. Our theoretical results, established under intuitive non-restrictive assumptions, assert that the new technique achieves perfect clustering under sparsity conditions that, up to logarithmic factors, coincide with the computational lower bound derived for a much simpler model.


推理|分析|理解|解释(5篇)

【1】SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence
标题:SIDE:稀疏信息解纠缠的可解释人工智能
链接:https://arxiv.org/abs/2507.19321

作者:bovik, Łukasz Struski, Jacek Tabor, Dawid Rymarczyk
摘要:理解深度神经网络做出的决策在医学成像和自动驾驶等高风险领域至关重要。然而,这些模型往往缺乏透明度,特别是在计算机视觉中。基于原型部件的神经网络通过提供概念级解释而成为一种有前途的解决方案。然而,大多数此类方法都局限于细粒度分类任务,只有InfoDisent等少数例外。InfoDisent将原型模型扩展到ImageNet这样的大规模数据集,但产生的解释较为复杂。我们提出了面向可解释性的稀疏信息解纠缠(SIDE),这是一种新方法,通过专门的训练和剪枝方案强制稀疏性,从而提高原型部件的可解释性。结合用sigmoid激活代替softmax,这种方法使SIDE能够将每个类仅与一小部分相关原型相关联。大量实验表明,SIDE在匹配现有方法准确性的同时,将解释的大小减少了90%以上,大大提高了基于原型的解释的可理解性。
摘要:Understanding the decisions made by deep neural networks is essential in high-stakes domains such as medical imaging and autonomous driving. Yet, these models often lack transparency, particularly in computer vision. Prototypical-parts-based neural networks have emerged as a promising solution by offering concept-level explanations. However, most are limited to fine-grained classification tasks, with few exceptions such as InfoDisent. InfoDisent extends prototypical models to large-scale datasets like ImageNet, but produces complex explanations.   We introduce Sparse Information Disentanglement for Explainability (SIDE), a novel method that improves the interpretability of prototypical parts through a dedicated training and pruning scheme that enforces sparsity. Combined with sigmoid activations in place of softmax, this approach allows SIDE to associate each class with only a small set of relevant prototypes. Extensive experiments show that SIDE matches the accuracy of existing methods while reducing explanation size by over $90\%$, substantially enhancing the understandability of prototype-based explanations.


【2】KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes
标题:KASPER:用于股票预测和可解释市场状态的Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2507.18983

作者:, Param Pathak, Nouhaila Innan, Shalini D, Muhammad Shafique
备注:11 pages, 7 figures, 3 tables
摘要:由于金融市场具有非线性且依赖市场状态(regime)的动态特性,其预测仍然是一个重大挑战。传统的深度学习模型,如长短期记忆网络和多层感知器,通常很难在不断变化的市场条件下泛化,这凸显了对更具适应性和可解释性的方法的需求。为了解决这个问题,我们引入了用于股票预测和可解释市场状态的Kolmogorov-Arnold网络(KASPER),这是一个新的框架,它集成了市场状态检测、基于稀疏样条的函数建模和符号规则提取。该框架使用基于Gumbel-Softmax的机制来识别隐藏的市场条件,从而实现针对特定市场状态的预测。对于每种市场状态,它采用带稀疏样条激活的Kolmogorov-Arnold网络来捕捉复杂的价格行为,同时保持鲁棒性。可解释性是通过基于Monte Carlo Shapley值的符号学习来实现的,它提取为每种市场状态量身定制的人类可读规则。应用于Yahoo Finance的真实世界金融时间序列,该模型的$R^2$得分为0.89,夏普比率为12.02,均方误差低至0.0001,优于现有方法。这项研究为金融市场中感知市场状态、透明且稳健的预测确立了新的方向。
摘要:Forecasting in financial markets remains a significant challenge due to their nonlinear and regime-dependent dynamics. Traditional deep learning models, such as long short-term memory networks and multilayer perceptrons, often struggle to generalize across shifting market conditions, highlighting the need for a more adaptive and interpretable approach. To address this, we introduce Kolmogorov-Arnold networks for stock prediction and explainable regimes (KASPER), a novel framework that integrates regime detection, sparse spline-based function modeling, and symbolic rule extraction. The framework identifies hidden market conditions using a Gumbel-Softmax-based mechanism, enabling regime-specific forecasting. For each regime, it employs Kolmogorov-Arnold networks with sparse spline activations to capture intricate price behaviors while maintaining robustness. Interpretability is achieved through symbolic learning based on Monte Carlo Shapley values, which extracts human-readable rules tailored to each regime. Applied to real-world financial time series from Yahoo Finance, the model achieves an $R^2$ score of 0.89, a Sharpe Ratio of 12.02, and a mean squared error as low as 0.0001, outperforming existing methods. This research establishes a new direction for regime-aware, transparent, and robust forecasting in financial markets.
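
The Gumbel-Softmax mechanism mentioned in the abstract can be sketched in a few lines. This is only an illustration of the standard Gumbel-Softmax relaxation, not KASPER's regime detector: the three regime logits, the temperature, and the seed are assumptions chosen for the example.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalised log-probabilities."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three market regimes.
rng = random.Random(0)
weights = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
```

Lower temperatures push `weights` toward a one-hot vector, which is what makes the relaxation usable as a differentiable stand-in for a discrete regime choice.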


【3】A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation
标题:一个工具箱,而不是一把锤子 -- Multi-TAG:使用多工具聚合扩展数学推理
链接:https://arxiv.org/abs/2507.18973

作者:, Vikas Yadav
备注:21 pages, 3 figures
摘要:用外部工具增强大型语言模型(LLM)是开发高性能数学推理系统的一条很有前途的途径。先前的工具增强方法通常对LLM进行微调,使其在每个推理步骤中选择和调用单个工具,并在较简单的数学推理基准(如GSM8K)上显示出有希望的结果。然而,这些方法难以解决需要在多个步骤上进行精确推理的更复杂的数学问题。为了解决这一局限,我们在这项工作中提出了Multi-TAG,一个基于多工具聚合(Multi-Tool AGgregation)的框架。Multi-TAG不依赖单一工具,而是引导LLM在每个推理步骤并发调用多个工具,然后聚合它们的不同输出,以验证和完善推理过程,提高解答的鲁棒性和准确性。值得注意的是,Multi-TAG是一个无需微调、仅在推理阶段工作的框架,因此可以直接应用于任何LLM骨干,包括微调计算成本很高的大型开放权重模型,以及无法使用自定义配方进行微调的专有前沿模型。我们在四个具有挑战性的基准上评估Multi-TAG:MATH500、AIME、AMC和OlympiadBench。在开放权重和闭源LLM骨干上,Multi-TAG始终大幅优于最先进的基线,平均提升6.0%至7.5%。
摘要:Augmenting large language models (LLMs) with external tools is a promising avenue for developing high-performance mathematical reasoning systems. Prior tool-augmented approaches typically finetune an LLM to select and invoke a single tool at each reasoning step and show promising results on simpler math reasoning benchmarks such as GSM8K. However, these approaches struggle with more complex math problems that require precise reasoning over multiple steps. To address this limitation, in this work, we propose Multi-TAG, a Multi-Tool AGgregation-based framework. Instead of relying on a single tool, Multi-TAG guides an LLM to concurrently invoke multiple tools at each reasoning step. It then aggregates their diverse outputs to verify and refine the reasoning process, enhancing solution robustness and accuracy. Notably, Multi-TAG is a finetuning-free, inference-only framework, making it readily applicable to any LLM backbone, including large open-weight models which are computationally expensive to finetune and proprietary frontier models which cannot be finetuned with custom recipes. We evaluate Multi-TAG on four challenging benchmarks: MATH500, AIME, AMC, and OlympiadBench. Across both open-weight and closed-source LLM backbones, Multi-TAG consistently and substantially outperforms state-of-the-art baselines, achieving average improvements of 6.0% to 7.5% over state-of-the-art baselines.
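
Aggregating the outputs of several tools at one reasoning step can be pictured with a simple vote. The abstract does not describe Multi-TAG's actual aggregation rule, so the majority vote below is a stand-in, and the tool names and answers are hypothetical.

```python
from collections import Counter


def aggregate_step(candidates):
    """Return the most common candidate answer and its share of the votes."""
    counts = Counter(candidates)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(candidates)


# Hypothetical per-tool outputs for a single reasoning step.
step_outputs = {
    "python_exec": "42",
    "symbolic_solver": "42",
    "natural_language": "41",
}
answer, agreement = aggregate_step(list(step_outputs.values()))
```

The agreement ratio gives a rough confidence signal: a low value could trigger re-sampling the step rather than accepting the plurality answer.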


【4】PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning
标题:PrismRAG:通过抗干扰能力和策略化推理提升RAG的事实性
链接:https://arxiv.org/abs/2507.18857

作者:Kachuee, Teja Gollapudi, Minseok Kim, Yin Huang, Kai Sun, Xiao Yang, Jiaqi Wang, Nirav Shah, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
摘要:检索增强生成(RAG)往往不足时,检索上下文包括混乱的半相关段落,或回答问题时,需要深入的上下文理解和推理。我们提出了一个高效的微调框架,称为PrismRAG,它(i)用分心感知QA对训练模型,将黄金证据与微妙的分心段落混合在一起,(ii)灌输以推理为中心的习惯,使LLM计划,合理化和合成,而不依赖于广泛的人类工程指令。在跨越不同应用领域和场景的12个开卷RAG QA基准测试中,PrismRAG将平均真实性提高了5.4%,优于最先进的解决方案。
摘要:Retrieval-augmented generation (RAG) often falls short when retrieved context includes confusing semi-relevant passages, or when answering questions requires deep contextual understanding and reasoning. We propose an efficient fine-tuning framework, called PrismRAG, that (i) trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages, and (ii) instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human-engineered instructions. Evaluated across 12 open-book RAG QA benchmarks spanning diverse application domains and scenarios, PrismRAG improves average factuality by 5.4%, outperforming state-of-the-art solutions.


【5】An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid
标题:面向社会经济多元化微电网的可解释、公平感知P2P能源交易框架
链接:https://arxiv.org/abs/2507.18738

作者:heja, Mayukha Pal
摘要:社区微电网中公平且动态的能源分配仍然是一个关键挑战,特别是在为社会经济状况各异的参与者提供服务时。静态的优化和成本分摊方法往往不能适应不断变化的不公平状况,导致参与者不满和合作不可持续。本文提出了一种新的框架,集成了多目标混合整数线性规划(MILP)、合作博弈论以及由强化学习(RL)驱动的动态公平调整机制。该框架的核心是一个基于公平相关福利最大化(EqWM)原则的双层优化模型,该模型引入罗尔斯式公平,以优先考虑最弱势参与者的福利。我们引入了一个近端策略优化(PPO)智能体,根据观察到的成本和可再生能源获取方面的不公平,动态调整优化目标中的社会经济权重。这种由RL驱动的反馈回路使系统能够学习和适应,不断追求更公平的状态。为了确保透明度,使用可解释人工智能(XAI)来解释由加权Shapley值得出的收益分配。该框架在六个现实场景中得到验证,峰值需求最多减少72.6%,并取得了显著的合作收益。自适应RL机制还随着时间的推移进一步降低了基尼系数,展示了一条通往真正可持续和公平的能源社区的道路。
摘要:Fair and dynamic energy allocation in community microgrids remains a critical challenge, particularly when serving socio-economically diverse participants. Static optimization and cost-sharing methods often fail to adapt to evolving inequities, leading to participant dissatisfaction and unsustainable cooperation. This paper proposes a novel framework that integrates multi-objective mixed-integer linear programming (MILP), cooperative game theory, and a dynamic equity-adjustment mechanism driven by reinforcement learning (RL). At its core, the framework utilizes a bi-level optimization model grounded in Equity-regarding Welfare Maximization (EqWM) principles, which incorporate Rawlsian fairness to prioritize the welfare of the least advantaged participants. We introduce a Proximal Policy Optimization (PPO) agent that dynamically adjusts socio-economic weights in the optimization objective based on observed inequities in cost and renewable energy access. This RL-powered feedback loop enables the system to learn and adapt, continuously striving for a more equitable state. To ensure transparency, Explainable AI (XAI) is used to interpret the benefit allocations derived from a weighted Shapley value. Validated across six realistic scenarios, the framework demonstrates peak demand reductions of up to 72.6%, and significant cooperative gains. The adaptive RL mechanism further reduces the Gini coefficient over time, showcasing a pathway to truly sustainable and fair energy communities.
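
The Gini coefficient that the abstract uses as its inequality measure is straightforward to compute. The sketch below uses the standard mean-absolute-difference definition; the per-household cost vectors are invented for illustration and are not taken from the paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs.
    diff_sum = sum(abs(a - b) for a in values for b in values)
    # G = diff_sum / (2 * n^2 * mean), and n^2 * mean equals n * total.
    return diff_sum / (2 * n * total)


# Hypothetical energy costs for four households, before and after adjustment.
before = [0.0, 0.0, 0.0, 4.0]
after = [1.0, 1.0, 1.0, 1.0]
gini_before = gini(before)
gini_after = gini(after)
```

Tracking this value across training episodes is one way to check whether an adaptive weighting scheme is actually moving allocations toward equality.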


检测相关(4篇)

【1】SILS: Strategic Influence on Liquidity Stability and Whale Detection in Concentrated-Liquidity DEXs
标题:SILS:集中流动性DEX中对流动性稳定性的战略影响与鲸鱼检测
链接:https://arxiv.org/abs/2507.19411

作者:iNekoo, Laleh Rasoul, Amirfarhad Farhadi, Azadeh Zamanifar
摘要:在集中流动性做市商(CLMM)中识别有影响力的流动性提供者(LP)的传统方法依赖于粗略的衡量标准,例如名义资本规模或表层活动,这通常会导致不准确的风险分析。SILS框架提供了一种更为细致的方法,不仅将LP描述为资本持有人,而且还将其描述为动态的系统性主体,其行为直接影响市场稳定。这代表着从静态的、基于交易量的分析到动态的、注重影响的理解的根本范式转变。该方法使用链上事件日志和智能合约执行跟踪来计算指数时间加权流动性(ETWL)画像,并应用无监督异常检测。最重要的是,它通过流动性稳定性影响评分(LSIS)来定义LP的功能重要性,这是一种反事实指标,用于衡量LP撤出时市场的潜在恶化程度。这种综合方法对LP的影响给出了更详细、更现实的刻画,超越了现有方法使用的二元且经常具有误导性的分类。这种注重影响的综合方法使SILS能够准确识别高影响力LP(包括传统方法遗漏的LP),并支持保护性预言机(oracle)层和可操作的交易者信号等重要应用,从而显著增强DeFi生态系统。该框架为底层流动性结构和相关风险提供了前所未有的透明度,有效减少了传统模型中常见的误报,并发现了其中关键的漏报。因此,SILS为主动风险管理提供了一种有效的机制,改变了DeFi协议保护其生态系统免受不对称流动性行为影响的方式。
摘要:Traditional methods for identifying impactful liquidity providers (LPs) in Concentrated Liquidity Market Makers (CLMMs) rely on broad measures, such as nominal capital size or surface-level activity, which often lead to inaccurate risk analysis. The SILS framework offers a significantly more detailed approach, characterizing LPs not just as capital holders but as dynamic systemic agents whose actions directly impact market stability. This represents a fundamental paradigm shift from static, volume-based analysis to a dynamic, impact-focused understanding. This advanced approach uses on-chain event logs and smart contract execution traces to compute Exponential Time-Weighted Liquidity (ETWL) profiles and apply unsupervised anomaly detection. Most importantly, it defines an LP's functional importance through the Liquidity Stability Impact Score (LSIS), a counterfactual metric that measures the potential degradation of the market if the LP withdraws. This combined approach provides a more detailed and realistic characterization of an LP's impact, moving beyond the binary and often misleading classifications used by existing methods. This impact-focused and comprehensive approach enables SILS to accurately identify high-impact LPs, including those missed by traditional methods, and supports essential applications like a protective oracle layer and actionable trader signals, thereby significantly enhancing the DeFi ecosystem. The framework provides unprecedented transparency into the underlying liquidity structure and associated risks, effectively reducing the common false positives and uncovering critical false negatives found in traditional models. Therefore, SILS provides an effective mechanism for proactive risk management, transforming how DeFi protocols safeguard their ecosystems against asymmetric liquidity behavior.


【2】FD4QC: Application of Classical and Quantum-Hybrid Machine Learning for Financial Fraud Detection A Technical Report
标题:FD4QC:经典和量子混合机器学习在金融欺诈检测中的应用技术报告
链接:https://arxiv.org/abs/2507.19402

作者:rdaioli, Luca Marangoni, Giada Martini, Francesco Mazzolin, Luca Pajola, Andrea Ferretto Parodi, Alessandra Saitta, Maria Chiara Vernillo
备注:This is a technical report
摘要:金融交易的复杂性和数量不断增加,对传统的欺诈检测系统提出了重大挑战。本技术报告研究并比较了经典、量子和量子混合机器学习模型在欺诈性金融活动二分类任务上的有效性。在方法上,首先,我们开发了一个全面的行为特征工程框架,将原始交易数据转换为丰富的描述性特征集。其次,我们在IBM反洗钱(AML)数据集上实现并评估了一系列模型。经典基线模型包括逻辑回归、决策树、随机森林和XGBoost。我们将它们与三种经典-量子混合算法架构进行比较:量子支持向量机(QSVM)、变分量子分类器(VQC)和混合量子神经网络(HQNN)。此外,我们还提出了面向量子计算的欺诈检测(FD4QC),这是一种实用的、API驱动的系统架构,专为现实世界的部署而设计,秉持经典优先、量子增强的理念,并具有稳健的回退机制。我们的结果表明,在当前设置下,经典的基于树的模型,特别是随机森林,显著优于量子模型,实现了较高的准确率(97.34%)和F值(86.95%)。在量子模型中,QSVM最有前景,具有较高的精确率(77.15%)和较低的假阳性率(1.36%),但召回率较低且计算开销显著。该报告为现实世界的金融应用提供了一个基准,指出了量子机器学习目前在这一领域的局限性,并概述了未来研究的有前景的方向。
摘要:The increasing complexity and volume of financial transactions pose significant challenges to traditional fraud detection systems. This technical report investigates and compares the efficacy of classical, quantum, and quantum-hybrid machine learning models for the binary classification of fraudulent financial activities. For our methodology, first, we develop a comprehensive behavioural feature engineering framework to transform raw transactional data into a rich, descriptive feature set. Second, we implement and evaluate a range of models on the IBM Anti-Money Laundering (AML) dataset. The classical baseline models include Logistic Regression, Decision Tree, Random Forest, and XGBoost. These are compared against three hybrid classical-quantum algorithm architectures: a Quantum Support Vector Machine (QSVM), a Variational Quantum Classifier (VQC), and a Hybrid Quantum Neural Network (HQNN). Furthermore, we propose Fraud Detection for Quantum Computing (FD4QC), a practical, API-driven system architecture designed for real-world deployment, featuring a classical-first, quantum-enhanced philosophy with robust fallback mechanisms. Our results demonstrate that classical tree-based models, particularly Random Forest, significantly outperform the quantum counterparts in the current setup, achieving high accuracy (97.34%) and F-measure (86.95%). Among the quantum models, QSVM shows the most promise, delivering high precision (77.15%) and a low false-positive rate (1.36%), albeit with lower recall and significant computational overhead. This report provides a benchmark for a real-world financial application, highlights the current limitations of quantum machine learning in this domain, and outlines promising directions for future research.
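
The metrics quoted in the abstract (accuracy, precision, recall, F-measure, false-positive rate) all derive from the four confusion-matrix counts. The sketch below shows those relationships; the counts are invented for illustration, and the F-measure is assumed to be the balanced F1 score, which the abstract does not state explicitly.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
    }


# Made-up counts for an imbalanced fraud dataset (not from the report).
metrics = binary_metrics(tp=80, fp=20, tn=880, fn=20)
```

On imbalanced data such as fraud labels, accuracy alone can look high even when recall is poor, which is why the report quotes precision, recall, and false-positive rate alongside it.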


【3】Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN
标题:使用深度学习进行水下废物检测:YOLOv7至YOLOv10与Faster R-CNN的性能比较
链接:https://arxiv.org/abs/2507.18967

作者:arathne, HMNS Kumari, HMLS Kumari
备注:7 pages, 11 figures, to be published in International Journal of Research in Computing (IJRC)
摘要:水下污染是当今最重要的环境问题之一,在世界各地的海洋、河流和景观中发现了大量的垃圾。准确检测这些废弃物对于成功的废物管理、环境监测和缓解策略至关重要。在这项研究中,我们研究了五种先进的目标识别算法的性能,即YOLO(You Only Look Once)系列模型YOLOv7、YOLOv8、YOLOv9、YOLOv10,以及Faster R-CNN(Faster Region-Convolutional Neural Network),以确定哪种模型在水下场景中识别物体最有效。这些模型在一个包含15个不同类别的大型数据集上进行了充分的训练和测试,数据涵盖低能见度和不同深度等多种条件。在上述模型中,YOLOv8优于其他模型,平均精度均值(mAP)为80.9%,表现突出。这种性能提升归功于YOLOv8的架构,该架构集成了改进的无锚机制和自监督学习等先进特性,可以在各种场景中更精确、更高效地识别物体。这些发现突出了YOLOv8模型作为全球对抗污染的有效工具的潜力,可提高水下清理行动的检测能力和可扩展性。
摘要:Underwater pollution is one of today's most significant environmental concerns, with vast volumes of garbage found in seas, rivers, and landscapes around the world. Accurate detection of these waste materials is crucial for successful waste management, environmental monitoring, and mitigation strategies. In this study, we investigated the performance of five cutting-edge object recognition algorithms, namely YOLO (You Only Look Once) models, including YOLOv7, YOLOv8, YOLOv9, YOLOv10, and Faster Region-Convolutional Neural Network (R-CNN), to identify which model was most effective at recognizing materials in underwater situations. The models were thoroughly trained and tested on a large dataset containing fifteen different classes under diverse conditions, such as low visibility and variable depths. From the above-mentioned models, YOLOv8 outperformed the others, with a mean Average Precision (mAP) of 80.9%, indicating a significant performance. This increased performance is attributed to YOLOv8's architecture, which incorporates advanced features such as improved anchor-free mechanisms and self-supervised learning, allowing for more precise and efficient recognition of items in a variety of settings. These findings highlight the YOLOv8 model's potential as an effective tool in the global fight against pollution, improving both the detection capabilities and scalability of underwater cleanup operations.


【4】Adapt, But Don't Forget: Fine-Tuning and Contrastive Routing for Lane Detection under Distribution Shift
标题:适应,但不要忘记:分布转移下用于车道检测的微调和对比路由
链接:https://arxiv.org/abs/2507.18653

作者:Abdul Hafeez Khan, Parth Ganeriwala, Sarah M. Lehman, Siddhartha Bhattacharyya, Amy Alvarez, Natasha Neogi
备注:Accepted to ICCV 2025, 2COOOL Workshop. Total 14 pages, 5 tables, and 4 figures
摘要:车道检测模型通常在封闭世界环境中进行评估,其中训练和测试在同一数据集上进行。我们观察到,即使在同一个域中,跨数据集分布的变化也会在微调过程中导致严重的灾难性遗忘。为了解决这个问题,我们首先在源分布上训练一个基础模型,然后通过创建单独的分支来使其适应每个新的目标分布,只微调选定的组件,同时保持原始的源分支不变。基于组件分析,我们确定了有效的微调策略,使参数有效的适应目标分布。在推理时,我们建议使用监督对比学习模型来识别输入分布并将其动态路由到相应的分支。我们的框架实现了接近最优的F1分数,同时使用的参数比为每个分布训练单独的模型少得多。
摘要:Lane detection models are often evaluated in a closed-world setting, where training and testing occur on the same dataset. We observe that, even within the same domain, cross-dataset distribution shifts can cause severe catastrophic forgetting during fine-tuning. To address this, we first train a base model on a source distribution and then adapt it to each new target distribution by creating separate branches, fine-tuning only selected components while keeping the original source branch fixed. Based on a component-wise analysis, we identify effective fine-tuning strategies for target distributions that enable parameter-efficient adaptation. At inference time, we propose using a supervised contrastive learning model to identify the input distribution and dynamically route it to the corresponding branch. Our framework achieves near-optimal F1-scores while using significantly fewer parameters than training separate models for each distribution.


分类|识别(2篇)

【1】Secure Best Arm Identification in the Presence of a Copycat
标题:存在模仿者时的安全最优臂识别
链接:https://arxiv.org/abs/2507.18975

作者:n, Onur Günlü
备注:To appear in ITW 2025
摘要:考虑带有安全约束的最优臂识别问题。具体地说,假设一个随机线性老虎机(stochastic linear bandit)设定,其中有$K$个维度为$d$的臂。每次拉臂时,玩家获得的奖励是该臂与未知参数向量的点积与独立噪声之和。玩家的目标是在$T$次拉臂后确定最优臂。另外,假设有一个模仿者Chloe在观察拉臂过程。玩家希望让Chloe无从得知最优臂。虽然极小极大最优算法能以$\Omega\left(\frac{T}{\log(d)}\right)$的误差指数识别最优臂,但它很容易向外部观察者泄露其最优臂估计,因为较优的臂被拉动得更频繁。一个平等地拉动所有臂的朴素安全算法只能得到$\Omega\left(\frac{T}{d}\right)$的指数。在本文中,我们提出了一种使用编码臂(coded arms)的安全算法。该算法不需要任何密钥或密码学原语,却能实现$\Omega\left(\frac{T}{\log^2(d)}\right)$的指数,同时几乎不泄露关于最优臂的信息。
摘要:Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax-optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with coded arms. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.
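
The reward model and the naive secure baseline described in the abstract can be simulated directly. This sketch covers only that baseline (every arm played equally often), not the paper's coded-arms algorithm; the arm vectors, the parameter vector, the Gaussian noise level, and the use of per-arm empirical means as the estimator are all assumptions made for the example.

```python
import random


def pull(arm, theta, rng, noise_std=0.1):
    """Reward = dot(arm, theta) + independent Gaussian noise (assumed noise law)."""
    return sum(a * t for a, t in zip(arm, theta)) + rng.gauss(0.0, noise_std)


def uniform_play(arms, theta, horizon, rng):
    """Pull arms round-robin, so the play counts leak nothing to an observer."""
    totals = [0.0] * len(arms)
    counts = [0] * len(arms)
    for t in range(horizon):
        k = t % len(arms)
        totals[k] += pull(arms[k], theta, rng)
        counts[k] += 1
    # Estimate each arm by its empirical mean reward (ignores linear structure).
    means = [s / c for s, c in zip(totals, counts)]
    best = max(range(len(arms)), key=means.__getitem__)
    return best, counts


# Hypothetical instance: K = 3 arms in dimension d = 2.
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
theta = [0.2, 0.9]
best, counts = uniform_play(arms, theta, horizon=300, rng=random.Random(1))
```

An observer who sees only `counts` learns nothing about `best`, which is the security property; the price is that pulls are spent on clearly inferior arms, matching the weaker exponent the abstract attributes to this baseline.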


【2】ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting
标题:ylmmcl参加2025年多语言文本去毒任务:词典引导的去毒和分类器门控重写
链接:https://arxiv.org/abs/2507.18769

作者:i-Lopez, Lusha Wang, Su Yuan, Liza Zhang
备注:16 pages, 5 figures, 3 tables
摘要:在这项工作中,我们介绍ylmmcl团队在PAN-2025竞赛多语言文本去毒任务中的解决方案:一个稳健的多语言文本去毒流水线,集成了词典引导的标注、微调的序列到序列模型(s-nlp/mt0-xl-detox-orpo)和基于分类器的迭代把关机制。与以前的无监督或单语流水线不同,我们的方法通过multilingual_toxic_lexicon利用显式的有毒词标注来引导去毒,具有更高的精度和跨语言泛化能力。我们的最终模型取得了我们历次尝试中最高的STA(0.922),并且在开发集和测试集的有毒输入上官方平均J得分为0.612。它还获得了0.793(开发集)和0.787(测试集)的xCOMET分数。这一性能在多种语言上优于基线和回译方法,并在高资源设置(英语、俄语、法语)中显示出很强的泛化能力。尽管在SIM上有一些权衡,但该模型在去毒强度上表现出持续的改进。在比赛中,我们的团队以0.612的成绩获得第九名。
摘要:In this work, we introduce our solution for the Multilingual Text Detoxification Task in the PAN-2025 competition for the ylmmcl team: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned sequence-to-sequence model (s-nlp/mt0-xl-detox-orpo) and an iterative classifier-based gatekeeping mechanism. Our approach departs from prior unsupervised or monolingual pipelines by leveraging explicit toxic word annotation via the multilingual_toxic_lexicon to guide detoxification with greater precision and cross-lingual generalization. Our final model achieves the highest STA (0.922) from our previous attempts, and an average official J score of 0.612 for toxic inputs in both the development and test sets. It also achieved xCOMET scores of 0.793 (dev) and 0.787 (test). This performance outperforms baseline and backtranslation methods across multiple languages, and shows strong generalization in high-resource settings (English, Russian, French). Despite some trade-offs in SIM, the model demonstrates consistent improvements in detoxification strength. In the competition, our team achieved ninth place with a score of 0.612.


表征(1篇)

【1】Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization
标题:观测遇上动作:学习控制充分表示以实现稳健的策略泛化
链接:https://arxiv.org/abs/2507.19437

作者:u, Hongpeng Cao, Marco Caccamo, Naira Hovakimyan
摘要:捕获潜在的变化(“上下文”)是在训练条件之外部署强化学习(RL)智能体的关键。我们将基于上下文的强化学习重新表述为一个双重的推理-控制问题,并形式化地刻画了两个属性及其层次关系:观测充分性(保留所有预测信息)和控制充分性(保留与决策相关的信息)。利用这种二分法,我们推导出一个上下文证据下界(ELBO)风格的目标,它将表示学习与策略学习清晰地分离开来,并用瓶颈上下文策略优化(BCPO)对其进行优化;该算法在任意离策略(off-policy)策略学习器之前放置一个变分信息瓶颈编码器。在物理参数发生变化的标准连续控制基准上,BCPO达到或超过其他基线,同时使用更少的样本,并在远超训练条件的范围内保持性能。该框架统一了基于上下文的强化学习的理论、诊断和实践。
摘要:Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy: observation sufficiency (preserving all predictive information) and control sufficiency (retaining decision-making relevant information). Exploiting this dichotomy, we derive a contextual evidence lower bound(ELBO)-style objective that cleanly separates representation learning from policy learning and optimizes it with Bottlenecked Contextual Policy Optimization (BCPO), an algorithm that places a variational information-bottleneck encoder in front of any off-policy policy learner. On standard continuous-control benchmarks with shifting physical parameters, BCPO matches or surpasses other baselines while using fewer samples and retaining performance far outside the training regime. The framework unifies theory, diagnostics, and practice for context-based RL.


3D|3D重建等相关(1篇)

【1】Fast Learning of Non-Cooperative Spacecraft 3D Models through Primitive Initialization
标题:通过基元初始化快速学习非合作航天器3D模型
链接:https://arxiv.org/abs/2507.19459

作者:esch Huc, Emily Bates, Simone D'Amico
摘要:NeRF和3D高斯溅射(3DGS)等新视图合成技术的出现,使得仅从带位姿的单目图像就能学习精确的3D模型。虽然这些方法很有吸引力,但它们有两个主要的局限,阻碍了它们在太空应用中的使用:它们在训练过程中需要位姿,并且在训练和推理时计算成本很高。为了解决这些局限,这项工作的贡献包括:(1)一个基于卷积神经网络(CNN)的基元初始化器,用于使用单目图像的3DGS;(2)一个能够使用含噪声或隐式位姿估计进行训练的流水线;以及(3)对可降低精确3D模型训练成本的初始化变体的分析。CNN以单张图像作为输入,输出表示为基元组合的粗略3D模型,以及目标相对于相机的位姿。然后使用这一基元组合来初始化3DGS,从而将所需的训练迭代次数和输入图像数量显著减少至少一个数量级。为了获得额外的灵活性,CNN组件具有多个采用不同位姿估计技术的变体。这项工作对这些变体进行了比较,评估了它们在含噪声或隐式位姿估计下对下游3DGS训练的有效性。结果表明,即使位姿监督不完美,该流水线也能够学习高保真的3D表示,为在太空应用中使用新视图合成打开了大门。
摘要:The advent of novel view synthesis techniques such as NeRF and 3D Gaussian Splatting (3DGS) has enabled learning precise 3D models only from posed monocular images. Although these methods are attractive, they hold two major limitations that prevent their use in space applications: they require poses during training, and have high computational cost at training and inference. To address these limitations, this work contributes: (1) a Convolutional Neural Network (CNN) based primitive initializer for 3DGS using monocular images; (2) a pipeline capable of training with noisy or implicit pose estimates; and (3) an analysis of initialization variants that reduce the training cost of precise 3D models. A CNN takes a single image as input and outputs a coarse 3D model represented as an assembly of primitives, along with the target's pose relative to the camera. This assembly of primitives is then used to initialize 3DGS, significantly reducing the number of training iterations and input images needed -- by at least an order of magnitude. For additional flexibility, the CNN component has multiple variants with different pose estimation techniques. This work performs a comparison between these variants, evaluating their effectiveness for downstream 3DGS training under noisy or implicit pose estimates. The results demonstrate that even with imperfect pose supervision, the pipeline is able to learn high-fidelity 3D representations, opening the door for the use of novel view synthesis in space applications.


优化|敛散性(1篇)

【1】Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models
标题:具有失败轨迹的弱到强概括:强模型中激发最优策略的基于树的方法
链接:https://arxiv.org/abs/2507.18858

作者:e, Zihan Wang, Xiao Yang, Zinan Ling, Manling Li, Bo Hui
摘要:弱到强泛化(W2SG)是一个新的趋势,从一个弱模型的监督下引出一个强模型的全部能力。虽然现有的W2SG研究集中在简单的任务,如二进制分类,我们扩展这种范式复杂的交互式决策环境。具体来说,我们微调一个强模型与弱模型产生的中间动作的轨迹。受人类学习过程的启发,我们提出不仅要推广成功知识,还要推广失败经验,以便强模型可以从弱模型积累的失败轨迹中学习。为了有效和高效地激发强大代理的潜力,我们进一步构建了“轨迹树”,这是一种分层表示,它组织弱模型生成的动作轨迹,再加上蒙特卡洛树搜索(MCTS)来优化强模型。通过理论分析,我们提供了形式化的保证,我们的方法在提高W2SG性能的有效性。我们的实证评估表明,在不同的任务域的推理和决策能力的大幅改善,验证了我们提出的框架的可扩展性和鲁棒性。我们的代码可从以下网址获得:https://github.com/yeruimeng/TraTree
摘要:Weak-to-Strong generalization (W2SG) is a new trend to elicit the full capabilities of a strong model with supervision from a weak model. While existing W2SG studies focus on simple tasks like binary classification, we extend this paradigm to complex interactive decision-making environments. Specifically, we fine-tune a strong model with trajectories of intermediate actions generated by a weak model. Motivated by the human learning process, we propose to generalize not only success knowledge but also failure experience so that the strong model can learn from failed trajectories accumulated by weak models. To effectively and efficiently elicit the potential of strong agents, we further construct "trajectory trees," a hierarchical representation that organizes weak model-generated action trajectories, coupled with Monte Carlo Tree Search (MCTS) to optimize the strong model. Through theoretical analysis, we provide formal guarantees for the effectiveness of our method in improving W2SG performance. Our empirical evaluations demonstrate substantial improvements in reasoning and decision-making capabilities across diverse task domains, validating the scalability and robustness of our proposed framework. Our code is available at: https://github.com/yeruimeng/TraTree
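
The selection step at the heart of Monte Carlo Tree Search can be sketched with the textbook UCB1 (UCT) rule. This is a generic illustration, not the procedure in the linked repository: the node layout (`visits`, `value_sum`), the exploration constant, and the example statistics are assumptions.

```python
import math


def uct_select(children, exploration=1.4):
    """Pick the index of the child to explore next under the UCB1 rule.

    children: list of dicts with 'visits' and 'value_sum' (hypothetical layout).
    The parent visit count is approximated by the sum of child visits.
    """
    total_visits = sum(c["visits"] for c in children)
    best_index, best_score = None, float("-inf")
    for i, child in enumerate(children):
        if child["visits"] == 0:
            return i  # Unvisited children are always tried first.
        mean_value = child["value_sum"] / child["visits"]
        bonus = exploration * math.sqrt(math.log(total_visits) / child["visits"])
        score = mean_value + bonus
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Made-up statistics for three branches of a trajectory tree.
children = [
    {"visits": 10, "value_sum": 6.0},
    {"visits": 2, "value_sum": 1.0},
    {"visits": 0, "value_sum": 0.0},
]
first_choice = uct_select(children)
second_choice = uct_select(children[:2])
```

The exploration bonus shrinks as a branch accumulates visits, so rarely tried branches, including ones that have failed so far, keep being revisited instead of being discarded after one bad outcome.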


预测|估计(5篇)

【1】Component-Based Machine Learning for Indoor Flow and Temperature Fields Prediction Latent Feature Aggregation and Flow Interaction
标题:基于组件的机器学习用于室内流场和温度场预测潜在特征聚集和流相互作用
链接:https://arxiv.org/abs/2507.19233

作者:ang, Nils Thuerey, Philipp Geyer
摘要:准确、有效地预测室内气流和温度分布对于建筑节能和舒适性控制至关重要。然而,传统的CFD模拟是计算密集型的,限制了它们集成到实时或设计迭代工作流中。提出了一种基于组件的机器学习(CBML)替代建模方法,以取代传统的CFD模拟,快速预测室内的速度和温度场。该模型由三个神经网络组成:具有残余连接的卷积自动编码器(CAER)用于提取和压缩流量特征,多层感知器(MLP)用于将入口速度映射到潜在表示,以及卷积神经网络(CNN)作为聚合器将单入口特征组合到双入口场景中。一个具有不同的左,右空气入口速度的二维房间被用作基准情况下,与CFD模拟提供的培训和测试数据。结果表明,CBML模型准确,快速地预测两个组件的聚合速度和温度场的训练和测试数据集。
摘要:Accurate and efficient prediction of indoor airflow and temperature distributions is essential for building energy optimization and occupant comfort control. However, traditional CFD simulations are computationally intensive, limiting their integration into real-time or design-iterative workflows. This study proposes a component-based machine learning (CBML) surrogate modeling approach to replace conventional CFD simulation for fast prediction of indoor velocity and temperature fields. The model consists of three neural networks: a convolutional autoencoder with residual connections (CAER) to extract and compress flow features, a multilayer perceptron (MLP) to map inlet velocities to latent representations, and a convolutional neural network (CNN) as an aggregator to combine single-inlet features into dual-inlet scenarios. A two-dimensional room with varying left and right air inlet velocities is used as a benchmark case, with CFD simulations providing training and testing data. Results show that the CBML model accurately and fast predicts two-component aggregated velocity and temperature fields across both training and testing datasets.


【2】WACA-UNet: Weakness-Aware Channel Attention for Static IR Drop Prediction in Integrated Circuit Design
Link: https://arxiv.org/abs/2507.19197

Authors: Seo, Yunhyeong Kwon, Younghun Park, HwiRyong Kim, Seungho Eum, Jinha Kim, Taigon Song, Juho Kim, Unsang Park
Comments: 9 pages, 5 figures
Abstract: Accurate spatial prediction of power integrity issues, such as IR drop, is critical for reliable VLSI design. However, traditional simulation-based solvers are computationally expensive and difficult to scale. We address this challenge by reformulating IR drop estimation as a pixel-wise regression task on heterogeneous multi-channel physical maps derived from circuit layouts. Prior learning-based methods treat all input layers (e.g., metal, via, and current maps) equally, ignoring their varying importance to prediction accuracy. To tackle this, we propose a novel Weakness-Aware Channel Attention (WACA) mechanism, which recursively enhances weak feature channels while suppressing over-dominant ones through a two-stage gating strategy. Integrated into a ConvNeXtV2-based attention U-Net, our approach enables adaptive and balanced feature representation. On the public ICCAD-2023 benchmark, our method outperforms the ICCAD-2023 contest winner by reducing mean absolute error by 61.1% and improving F1-score by 71.0%. These results demonstrate that channel-wise heterogeneity is a key inductive bias in physical layout analysis for VLSI.


【3】CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction over Medium-range Forecast Periods
Link: https://arxiv.org/abs/2507.18937

Authors: oue, Takuya Kawabata (Meteorological Research Institute, Tsukuba, Japan)
Comments: 32 pages, 10 figures
Abstract: This study proposes a method that integrates convolutional neural networks (CNNs) with ensemble numerical weather prediction (NWP) models, enabling surface temperature forecasting at lead times beyond the short-range (five-day) forecast period. Owing to limited computational resources, operational medium-range temperature forecasts typically rely on low-resolution NWP models, which are prone to systematic and random errors. To resolve these limitations, the proposed method first reduces systematic errors through CNN-based post-processing (bias correction and spatial super-resolution) on each ensemble member, reconstructing high-resolution temperature fields from low-resolution model outputs. Second, it reduces random errors through ensemble averaging of the CNN-corrected members. This study also investigates whether the sequence of CNN correction and ensemble averaging affects the forecast accuracy. For comparison with the proposed method, we additionally conducted experiments with the CNN trained on ensemble-averaged forecasts. The first approach (CNN correction before ensemble averaging) consistently achieved higher accuracy than the reverse approach. Although based on low-resolution ensemble forecasts, the proposed method notably outperformed the high-resolution deterministic NWP models. These findings indicate that combining CNN-based correction with ensemble averaging effectively reduces both the systematic and random errors in NWP model outputs. The proposed approach is a practical and scalable solution for improving medium-range temperature forecasts, and is particularly valuable at operational centers with limited computational resources.
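The abstract compares two orderings: correcting each ensemble member and then averaging, versus averaging first and then correcting. A minimal sketch of why the order can matter at all is given below. The `correct` function is a hypothetical nonlinear stand-in for the CNN post-processor, not the authors' model, and the random fields are synthetic. The sketch only shows that a nonlinear correction does not commute with averaging; it says nothing about which order is more accurate.

```python
import numpy as np

def correct(field):
    # Hypothetical stand-in for the CNN post-processor: any nonlinear map.
    return np.tanh(field) + 0.1 * field**2

def correct_then_average(members):
    # Order described in the abstract as the proposed method.
    return np.mean([correct(m) for m in members], axis=0)

def average_then_correct(members):
    # Reverse order, used in the abstract as the comparison.
    return correct(np.mean(members, axis=0))

rng = np.random.default_rng(0)
members = rng.normal(size=(5, 4, 4))  # 5 synthetic ensemble members on a 4x4 grid

a = correct_then_average(members)
b = average_then_correct(members)
print("max difference between the two orderings:", np.abs(a - b).max())
```

With a purely linear correction the two orderings would coincide, so any difference comes from the nonlinearity of the post-processing step.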


【4】A Regression-Based Share Market Prediction Model for Bangladesh
Link: https://arxiv.org/abs/2507.18643

Authors: nim Fabiha, Rubaiyat Jahan Mumu, Farzana Aktar, B M Mainul Hossain
Comments: Originally written in 2018. Updated in 2025 for open-access archiving. Not previously published
Abstract: The share market is one of the most important sectors in a country's economic development. Every day, almost all companies issue shares, and investors buy and sell them. Generally, investors want to buy shares of companies whose market liquidity is comparatively greater. Market liquidity depends on the average price of a share. In this paper, a thorough linear regression analysis is performed on stock market data from the Dhaka Stock Exchange. The linear model is then compared with a random forest on several metrics, and the random forest model shows better results. Nevertheless, the individual significance of different factors for the variability of stock price has been identified and explained. This paper also shows that the time series data is not capable of generating a predictive linear model for analysis.


【5】A comparison of stretched-grid and limited-area modelling for data-driven regional weather forecasting
Link: https://arxiv.org/abs/2507.18378

Authors: Wijnands, Michiel Van Ginderachter, Bastien François, Sophie Buurman, Piet Termonia, Dieter Van den Bleeken
Abstract: Regional machine learning weather prediction (MLWP) models based on graph neural networks have recently demonstrated remarkable predictive accuracy, outperforming numerical weather prediction models at lower computational costs. In particular, limited-area model (LAM) and stretched-grid model (SGM) approaches have emerged for generating high-resolution regional forecasts, based on initial conditions from a regional (re)analysis. While LAM uses lateral boundaries from an external global model, SGM incorporates a global domain at lower resolution. This study aims to understand how the differences in model design impact relative performance and potential applications. Specifically, the strengths and weaknesses of these two approaches are identified for generating deterministic regional forecasts over Europe. Using the Anemoi framework, models of both types are built by minimally adapting a shared architecture and trained using global and regional reanalyses in a near-identical setup. Several inference experiments have been conducted to explore their relative performance and highlight key differences. Results show that both LAM and SGM are competitive deterministic MLWP models with generally accurate and comparable forecasting performance over the regional domain. Various differences were identified in the performance of the models across applications. LAM is able to successfully exploit high-quality boundary forcings to make predictions within the regional domain and is suitable in contexts where global data is difficult to acquire. SGM is fully self-contained for easier operationalisation, can take advantage of more training data and significantly surpasses LAM in terms of (temporal) generalisability. Our paper can serve as a starting point for meteorological institutes to guide their choice between LAM and SGM in developing an operational data-driven forecasting system.


Other Neural Networks|Deep Learning|Models|Modeling (17 papers)

【1】Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
Link: https://arxiv.org/abs/2507.19427

Authors: Bin Wang, Bojun Wang, Changyi Wan, Guanzhe Huang, Hanpeng Hu, Haonan Jia, Hao Nie, Mingliang Li, Nuo Chen, Siyu Chen, Song Yuan, Wuxun Xie, Xiaoniu Song, Xing Chen, Xingping Yang, Xuelin Zhang, Yanbo Yu, Yaoyu Wang, Yibo Zhu, Yimin Jiang, Yu Zhou, Yuanwei Lu, Houyi Li, Jingcheng Hu, Ka Man Lo, Ailin Huang, Binxing Jiao, Bo Li, Boyu Chen, Changxin Miao, Chang Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengyuan Yao, Daokuan Lv, Dapeng Shi, Deshan Sun, Ding Huang, Dingyuan Hu, Dongqing Pang, Enle Liu, Fajie Zhang, Fanqi Wan, Gulin Yan, Han Zhang, Han Zhou, Hanghao Wu, Hangyu Guo, Hanqi Chen, Hanshan Zhang, Hao Wu, Haocheng Zhang, Haolong Yan, Haoran Lv, Haoran Wei, Hebin Zhou, Heng Wang, Heng Wang, Hongxin Li, Hongyu Zhou, Hongyuan Wang, Huiyong Guo, Jia Wang, Jiahao Gong, Jialing Xie, Jian Zhou, Jianjian Sun, Jiaoren Wu, Jiaran Zhang, Jiayu Liu, Jie Cheng, Jie Luo, Jie Yan, Jie Yang, Jieyi Hou, Jinguang Zhang, Jinlan Cao, Jisheng Yin, Junfeng Liu, Junhao Huang, Junzhe Lin, Kaijun Tan, Kaixiang Li, Kang An, Kangheng Lin, Kenkun Liu, Lei Yang, Liang Zhao, Liangyu Chen, Lieyu Shi, Liguo Tan, Lin Lin, Lin Zhang, Lina Chen, Liwen Huang, Liying Shi, Longlong Gu, Mei Chen
Abstract: Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) a novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache size and computation while maintaining high attention expressiveness, and (2) Attention-FFN Disaggregation (AFD), a distributed inference system that decouples attention and Feed-Forward Network (FFN) layers into specialized subsystems. This co-design achieves unprecedented cost efficiency: Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B, with the gains widening at longer context. Step-3 achieves low cost while activating 38B parameters per token (more than DeepSeek-V3 and Qwen3 MoE 235B), demonstrating that hardware-aligned attention arithmetic intensity, MoE sparsity, and AFD are critical to cost-effectiveness. We perform a head-to-head comparison with DeepSeek-V3 in its favorable scenarios. Our implementation on Hopper GPUs achieves a decoding throughput of up to 4,039 tokens per second per GPU under a 50 ms TPOT SLA (4K context, FP8, no MTP). This is higher than DeepSeek-V3's 2,324 in the same setup and sets a new Pareto frontier for LLM decoding.
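The two throughput figures quoted in the abstract can be turned into a relative speedup with a few lines of arithmetic. The sketch below uses only the numbers stated above. The "implied concurrency" line rests on an added assumption (each sequence emits exactly one token per 50 ms at the SLA limit) and is an illustration, not a figure reported by the authors.

```python
# Figures quoted in the abstract (tokens per second per GPU, 50 ms TPOT SLA).
step3_tps = 4039
deepseek_v3_tps = 2324

# Relative decoding throughput of Step-3 over DeepSeek-V3 in this setup.
speedup = step3_tps / deepseek_v3_tps

# Assumption: at the SLA limit each sequence emits one token every 50 ms,
# i.e. 20 tokens per second per sequence.
tokens_per_seq_per_s = 1000 / 50
implied_concurrency = step3_tps / tokens_per_seq_per_s

print(f"speedup: {speedup:.2f}x")
print(f"implied sequences per GPU at the SLA limit: {implied_concurrency:.0f}")
```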


【2】On Arbitrary Predictions from Equally Valid Models
Link: https://arxiv.org/abs/2507.19408

Authors: kfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis
Abstract: Model multiplicity refers to the existence of multiple machine learning models that describe the data equally well but may produce different predictions on individual samples. In medicine, these models can admit conflicting predictions for the same patient -- a risk that is poorly understood and insufficiently addressed. In this study, we empirically analyze the extent, drivers, and ramifications of predictive multiplicity across diverse medical tasks and model architectures, and show that even small ensembles can mitigate/eliminate predictive multiplicity in practice. Our analysis reveals that (1) standard validation metrics fail to identify a uniquely optimal model and (2) a substantial number of predictions hinge on arbitrary choices made during model development. Using multiple models instead of a single model reveals instances where predictions differ across equally plausible models -- highlighting patients that would receive arbitrary diagnoses if any single model were used. In contrast, (3) a small ensemble paired with an abstention strategy can effectively mitigate measurable predictive multiplicity in practice; predictions with high inter-model consensus may thus be amenable to automated classification. While accuracy is not a principled antidote to predictive multiplicity, we find that (4) higher accuracy achieved through increased model capacity reduces predictive multiplicity. Our findings underscore the clinical importance of accounting for model multiplicity and advocate for ensemble-based strategies to improve diagnostic reliability. In cases where models fail to reach sufficient consensus, we recommend deferring decisions to expert review.
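The "small ensemble paired with an abstention strategy" described in the abstract can be sketched as a majority vote that defers when agreement is too low. The function name, the `min_agreement` threshold of 0.8, and the toy vote matrix below are illustrative assumptions; the abstract does not specify the authors' consensus rule.

```python
import numpy as np

def ensemble_with_abstention(votes, min_agreement=0.8):
    """Majority vote per sample; return None (defer to expert) on low consensus.

    votes: array-like of shape (n_models, n_samples) holding integer class labels.
    min_agreement: hypothetical threshold on the fraction of agreeing models.
    """
    votes = np.asarray(votes)
    n_models, n_samples = votes.shape
    decisions = []
    for j in range(n_samples):
        labels, counts = np.unique(votes[:, j], return_counts=True)
        top = counts.argmax()
        if counts[top] / n_models >= min_agreement:
            decisions.append(int(labels[top]))
        else:
            decisions.append(None)  # insufficient consensus: defer the decision
    return decisions

# Five equally plausible models voting on four patients (toy data).
votes = [[1, 0, 1, 0],
         [1, 0, 0, 0],
         [1, 0, 1, 1],
         [1, 0, 1, 0],
         [1, 0, 0, 0]]
result = ensemble_with_abstention(votes)
print(result)  # [1, 0, None, 0]
```

The third patient receives a 3-to-2 split, so any single model would have produced an arbitrary answer; the ensemble flags that case for review instead.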


【3】A Data-Driven Approach to Estimate LEO Orbit Capacity Models
Link: https://arxiv.org/abs/2507.19365

Authors: ock, Maddox McVarthy, Simone Servadio
Comments: 18 pages, 15 figures
Abstract: Utilizing the Sparse Identification of Nonlinear Dynamics algorithm (SINDy) and Long Short-Term Memory Recurrent Neural Networks (LSTM), the population of resident space objects, divided into Active, Derelict, and Debris, in LEO can be accurately modeled to predict future satellite and debris propagation. This proposed approach makes use of a data set coming from a computationally expensive high-fidelity model, the MOCAT-MC, to provide a light, low-fidelity counterpart that provides accurate forecasting in a shorter time frame.
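For readers unfamiliar with SINDy, the core idea is a sparse regression of time derivatives onto a library of candidate terms. The sketch below is a minimal illustration using sequentially thresholded least squares on a toy one-dimensional decay equation. The toy system, the candidate library, and the threshold are assumptions made for illustration; the sketch does not use MOCAT-MC data or reproduce the authors' population model.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            # Refit using only the library terms that survived thresholding.
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy system dx/dt = -2x with a known solution, so the derivative is exact.
t = np.linspace(0.0, 2.0, 200)
x = 3.0 * np.exp(-2.0 * t)
dxdt = -2.0 * x

# Candidate library of terms: [1, x, x^2].
theta = np.column_stack([np.ones_like(x), x, x**2])
xi = stlsq(theta, dxdt)
print(xi)  # coefficients for [1, x, x^2]; only the x term should remain
```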


【4】Query Efficient Structured Matrix Learning
Link: https://arxiv.org/abs/2507.19290

Authors: l, Pratyush Avi, Tyler Chen, Feyza Duman Keles, Chinmay Hegde, Cameron Musco, Christopher Musco, David Persson
Abstract: We study the problem of learning a structured approximation (low-rank, sparse, banded, etc.) to an unknown matrix $A$ given access to matrix-vector product (matvec) queries of the form $x \rightarrow Ax$ and $x \rightarrow A^Tx$. This problem is of central importance to algorithms across scientific computing and machine learning, with applications to fast multiplication and inversion for structured matrices, building preconditioners for first-order optimization, and as a model for differential operator learning. Prior work focuses on obtaining query complexity upper and lower bounds for learning specific structured matrix families that commonly arise in applications. We initiate the study of the problem in greater generality, aiming to understand the query complexity of learning approximations from general matrix families. Our main result focuses on finding a near-optimal approximation to $A$ from any finite-sized family of matrices, $\mathcal{F}$. Standard results from matrix sketching show that $O(\log|\mathcal{F}|)$ matvec queries suffice in this setting. This bound can also be achieved, and is optimal, for vector-matrix-vector queries of the form $x,y\rightarrow x^TAy$, which have been widely studied in work on rank-$1$ matrix sensing. Surprisingly, we show that, in the matvec model, it is possible to obtain a nearly quadratic improvement in complexity, to $\tilde{O}(\sqrt{\log|\mathcal{F}|})$. Further, we prove that this bound is tight up to log-log factors. Via covering number arguments, our result extends to well-studied infinite families. As an example, we establish that a near-optimal approximation from any linear matrix family of dimension $q$ can be learned with $\tilde{O}(\sqrt{q})$ matvec queries, improving on an $O(q)$ bound achievable via sketching techniques and vector-matrix-vector queries.
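The sketching baseline mentioned in the abstract (selecting from a finite family $\mathcal{F}$ with a small number of matvec queries) can be illustrated as follows. This is a sketch of the standard baseline only, not the paper's improved $\tilde{O}(\sqrt{\log|\mathcal{F}|})$ algorithm. The function name, the Gaussian query vectors, and the toy family are assumptions made for illustration.

```python
import numpy as np

def select_from_family(matvec, family, n, num_queries, rng):
    """Pick the family member closest to A using only matvec access to A.

    matvec: callable x -> A @ x (the only access to the unknown matrix).
    family: list of candidate n-by-n matrices (the finite family F).
    """
    G = rng.standard_normal((n, num_queries))  # random query vectors
    AG = np.column_stack([matvec(G[:, j]) for j in range(num_queries)])
    # Compare each candidate with A on the sketch only.
    errors = [np.linalg.norm(AG - B @ G) for B in family]
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
n = 20
family = [rng.standard_normal((n, n)) for _ in range(8)]
A = family[5] + 0.01 * rng.standard_normal((n, n))  # unknown matrix near member 5

idx = select_from_family(lambda x: A @ x, family, n, num_queries=4, rng=rng)
print(idx)  # 5
```

Only four products with $A$ are used here, while the candidates themselves are known explicitly and can be multiplied freely.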


【5】Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
Link: https://arxiv.org/abs/2507.19261

Authors: urshed, Ashish Kaushal, Asmail Muftah, Nitin Auluck, Omer Rana
Comments: 18 pages, 4 figures, ArXiv preprint - Novel "knowledge grafting" technique achieving 88.54% AI model size reduction while improving accuracy for resource-constrained deployment
Abstract: The increasing adoption of Artificial Intelligence (AI) has led to larger, more complex models with numerous parameters that require substantial computing power -- resources often unavailable in many real-world application scenarios. Our paper addresses this challenge by introducing knowledge grafting, a novel mechanism that optimizes AI models for resource-constrained environments by transferring selected features (the scion) from a large donor model to a smaller rootstock model. The approach achieves an 88.54% reduction in model size (from 64.39 MB to 7.38 MB), while improving the generalization capability of the model. Our new rootstock model achieves 89.97% validation accuracy (vs. the donor's 87.47%), maintains lower validation loss (0.2976 vs. 0.5068), and performs exceptionally well on unseen test data with 90.45% accuracy. It addresses the typical size vs. performance trade-off, and enables deployment of AI frameworks on resource-constrained devices with enhanced performance. We have tested our approach on an agricultural weed detection scenario; however, it can be extended across various edge computing scenarios, potentially accelerating AI adoption in areas with limited hardware/software support, mirroring the way horticultural grafting enables productive cultivation in challenging agri-based environments.
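The size reduction quoted in the abstract follows directly from the two model sizes it reports. A short check using only those numbers:

```python
# Model sizes in MB as stated in the abstract.
donor_mb = 64.39
rootstock_mb = 7.38

# Fractional size reduction and the equivalent compression ratio.
reduction = 1 - rootstock_mb / donor_mb
compression_ratio = donor_mb / rootstock_mb

print(f"size reduction: {reduction:.2%}")
print(f"compression ratio: {compression_ratio:.1f}x")
```

The result agrees with the 88.54% figure given in the abstract and the comments line.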


【6】A Markov Categorical Framework for Language Modeling
Link: https://arxiv.org/abs/2507.19247

Authors: ng
Comments: Project Page: https://github.com/asiresearch/lm-theory
Abstract: Auto-regressive language models factorize sequence probabilities and are trained by minimizing the negative log-likelihood (NLL) objective. While empirically powerful, a deep theoretical understanding of why this simple objective yields such versatile representations remains elusive. This work introduces a unifying analytical framework using Markov Categories (MCs) to deconstruct the AR generation process and the NLL objective. We model the single-step generation map as a composition of Markov kernels in the category Stoch. This compositional view, when enriched with statistical divergences, allows us to dissect information flow and learned geometry. Our framework makes three main contributions. First, we provide a formal, information-theoretic rationale for the success of modern speculative decoding methods like EAGLE, quantifying the information surplus in hidden states that these methods exploit. Second, we formalize how NLL minimization forces the model to learn not just the next token, but the data's intrinsic conditional stochasticity, a process we analyze using categorical entropy. Third, and most centrally, we prove that NLL training acts as an implicit form of spectral contrastive learning. By analyzing the information geometry of the model's prediction head, we show that NLL implicitly forces the learned representation space to align with the eigenspectrum of a predictive similarity operator, thereby learning a geometrically structured space without explicit contrastive pairs. This compositional and information-geometric perspective reveals the deep structural principles underlying the effectiveness of modern LMs. Project Page: https://github.com/asiresearch/lm-theory


【7】Game-Theoretic Gradient Control for Robust Neural Network Training
Link: https://arxiv.org/abs/2507.19143

Authors: tseva, Ivan Tomilov, Natalia Gusarova
Comments: 19 pages, 6 figures
Abstract: Feed-forward neural networks (FFNNs) are vulnerable to input noise, reducing prediction performance. Existing regularization methods like dropout often alter network architecture or overlook neuron interactions. This study aims to enhance FFNN noise robustness by modifying backpropagation, interpreted as a multi-agent game, and exploring controlled target variable noising. Our "gradient dropout" selectively nullifies hidden layer neuron gradients with probability 1 - p during backpropagation, while keeping forward passes active. This is framed within compositional game theory. Additionally, target variables were perturbed with white noise or stable distributions. Experiments on ten diverse tabular datasets show varying impacts: improvement or diminishing of robustness and accuracy, depending on dataset and hyperparameters. Notably, on regression tasks, gradient dropout (p = 0.9) combined with stable distribution target noising significantly increased input noise robustness, evidenced by flatter MSE curves and more stable SMAPE values. These results highlight the method's potential, underscore the critical role of adaptive parameter tuning, and open new avenues for analyzing neural networks as complex adaptive systems exhibiting emergent behavior within a game-theoretic framework.
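The "gradient dropout" described in the abstract leaves the forward pass untouched and zeroes hidden-neuron gradients with probability 1 - p during backpropagation. A minimal NumPy sketch for a single hidden layer is given below. The network shape, the squared-error loss, the per-neuron (rather than per-sample) mask, and the absence of any rescaling are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def forward_backward(x, y, W1, W2, p, rng):
    """One forward/backward pass with gradient dropout on the hidden layer."""
    # Forward pass: every hidden neuron stays active (unlike standard dropout).
    h_pre = x @ W1
    h = np.maximum(h_pre, 0.0)           # ReLU
    y_hat = h @ W2

    # Backward pass for the loss 0.5 * sum((y_hat - y)**2).
    err = y_hat - y
    grad_W2 = h.T @ err
    grad_h = err @ W2.T

    # Gradient dropout: keep each hidden neuron's gradient with probability p,
    # nullify it with probability 1 - p (assumed per-neuron mask, no rescaling).
    keep = rng.random(h.shape[1]) < p
    grad_h = grad_h * keep

    grad_pre = grad_h * (h_pre > 0)      # ReLU derivative
    grad_W1 = x.T @ grad_pre
    return y_hat, grad_W1, grad_W2, keep

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

y_hat, grad_W1, grad_W2, keep = forward_backward(x, y, W1, W2, p=0.9, rng=rng)
print("kept neurons:", keep)
```

Columns of `grad_W1` that belong to masked neurons are exactly zero, so those neurons' incoming weights are not updated in that step, while predictions are unaffected by the mask.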


【8】Exploring molecular assembly as a biosignature using mass spectrometry and machine learning
Link: https://arxiv.org/abs/2507.19057

Authors: . Rutter, Abhishek Sharma, Ian Seet, David Obeh Alobo, An Goto, Leroy Cronin
Comments: 35 pages, 7 figures, 62 references
Abstract: Molecular assembly offers a promising path to detect life beyond Earth, while minimizing assumptions based on terrestrial life. As mass spectrometers will be central to upcoming Solar System missions, predicting molecular assembly from their data without needing to elucidate unknown structures will be essential for unbiased life detection. An ideal agnostic biosignature must be interpretable and experimentally measurable. Here, we show that molecular assembly, a recently developed approach to measure objects that have been produced by evolution, satisfies both criteria. First, it is interpretable for life detection, as it reflects the assembly of molecules with their bonds as building blocks, in contrast to approaches that discount construction history. Second, it can be determined without structural elucidation, as it can be physically measured by mass spectrometry, a property that distinguishes it from other approaches that use structure-based information measures for molecular complexity. Whilst molecular assembly is directly measurable using mass spectrometry data, there are limits imposed by mission constraints. To address this, we developed a machine learning model that predicts molecular assembly with high accuracy, reducing error three-fold compared to baseline models. Simulated data shows that even small instrumental inconsistencies can double model error, emphasizing the need for standardization. These results suggest that standardized mass spectrometry databases could enable accurate molecular assembly prediction, without structural elucidation, providing a proof-of-concept for future astrobiology missions.


【9】Neural Ordinary Differential Equations for Learning and Extrapolating System Dynamics Across Bifurcations
Link: https://arxiv.org/abs/2507.19036

Authors: egelen, George van Voorn, Ioannis Athanasiadis, Peter van Heijster
Abstract: Forecasting system behaviour near and across bifurcations is crucial for identifying potential shifts in dynamical systems. While machine learning has recently been used to learn critical transitions and bifurcation structures from data, most studies remain limited as they exclusively focus on discrete-time methods and local bifurcations. To address these limitations, we use Neural Ordinary Differential Equations, which provide a continuous, data-driven framework for learning system dynamics. We apply our approach to a predator-prey system that features both local and global bifurcations, presenting a challenging test case. Our results show that Neural Ordinary Differential Equations can recover underlying bifurcation structures directly from timeseries data by learning parameter-dependent vector fields. Notably, we demonstrate that Neural Ordinary Differential Equations can forecast bifurcations even beyond the parameter regions represented in the training data. We also assess the method's performance under limited and noisy data conditions, finding that model accuracy depends more on the quality of information that can be inferred from the training data than on the amount of data available.


【10】A diffusion-based generative model for financial time series via geometric Brownian motion
Link: https://arxiv.org/abs/2507.19003

Authors: , Sun-Yong Choi, Yeoneung Kim
Abstract: We propose a novel diffusion-based generative framework for financial time series that incorporates geometric Brownian motion (GBM), the foundation of the Black-Scholes theory, into the forward noising process. Unlike standard score-based models that treat price trajectories as generic numerical sequences, our method injects noise proportionally to asset prices at each time step, reflecting the heteroskedasticity observed in financial time series. By accurately balancing the drift and diffusion terms, we show that the resulting log-price process reduces to a variance-exploding stochastic differential equation, aligning with the formulation in score-based generative models. The reverse-time generative process is trained via denoising score matching using a Transformer-based architecture adapted from the Conditional Score-based Diffusion Imputation (CSDI) framework. Empirical evaluations on historical stock data demonstrate that our model reproduces key stylized facts (heavy-tailed return distributions, volatility clustering, and the leverage effect) more realistically than conventional diffusion models.
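The claim that the log-price process reduces to a variance-exploding form can be followed with a short application of Itô's lemma. The derivation below is a sketch under stated assumptions: the time-dependent coefficients $\mu(t)$ and $\sigma(t)$ and the specific balancing choice $\mu(t) = \tfrac{1}{2}\sigma(t)^2$ are introduced here for illustration and may differ from the schedule used in the paper.

```latex
\begin{align}
  dS_t &= \mu(t)\, S_t\, dt + \sigma(t)\, S_t\, dW_t
  && \text{(GBM forward noising on prices)} \\
  d(\log S_t) &= \Bigl(\mu(t) - \tfrac{1}{2}\sigma(t)^2\Bigr)\, dt + \sigma(t)\, dW_t
  && \text{(It\^o's lemma)} \\
  \mu(t) = \tfrac{1}{2}\sigma(t)^2 \;&\Longrightarrow\; d(\log S_t) = \sigma(t)\, dW_t
  && \text{(driftless log-price SDE)}
\end{align}
```

With the drift cancelled, the log-price follows a driftless SDE of the shape $dX_t = g(t)\, dW_t$, which is the general shape of the variance-exploding SDE used in score-based generative modeling. In price space the noise term $\sigma(t) S_t\, dW_t$ scales with the price level, which matches the heteroskedasticity described in the abstract.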


【11】GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units
标题:GENIAL:通过低功耗算术逻辑单元的网络倒置进行生成设计空间探索
链接:https://arxiv.org/abs/2507.18989

作者:ouvier, Ryan Amaudruz, Felix Arnold, Renzo Andri, Lukas Cavigelli
备注:Under review
摘要:随着人工智能工作负载的激增,优化运算单元对于减少数字系统的占用空间变得越来越重要。传统的设计流程通常依赖于手动或基于几何学的优化,其彻底探索广阔设计空间的能力有限。在本文中,我们介绍GENIAL,一个基于机器学习的框架,用于自动生成和优化算术单元,更具体地说,乘法器。   GENIAL的核心是一个基于Transformer的代理模型,该模型分为两个阶段进行训练,包括自我监督的预训练和监督的微调,以从抽象的设计表示中稳健地预测关键的硬件指标,如功率和面积。通过反转代理模型,GENIAL有效地搜索新的操作数编码,这些编码可以直接最小化特定输入数据分布的算术单元中的功耗。在大型数据集上的大量实验表明,GENIAL始终比其他方法更有效,并且更快地收敛到优化设计。这使得能够在循环中部署高工作量的逻辑合成优化流程,从而提高代理模型的准确性。值得注意的是,与传统的二进制补码相比,GENIAL自动发现编码,在代表性AI工作负载的乘数内节省高达18%的切换活动。我们还证明了我们的方法的多功能性,通过实现显着的改进有限状态机,突出GENIAL的适用范围广泛的逻辑功能。总之,这些进步标志着数字系统自动生成结果质量优化组合电路的重要一步。
摘要:As AI workloads proliferate, optimizing arithmetic units is becoming increasingly important to reduce the footprint of digital systems. Conventional design flows, which often rely on manual or heuristics-based optimization, are limited in their ability to thoroughly explore the vast design space. In this paper, we introduce GENIAL, a machine learning-based framework for the automatic generation and optimization of arithmetic units, more specifically multipliers. At the core of GENIAL is a Transformer-based surrogate model trained in two stages, involving self-supervised pretraining followed by supervised finetuning, to robustly forecast key hardware metrics such as power and area from abstracted design representations. By inverting the surrogate model, GENIAL efficiently searches for new operand encodings that directly minimize power consumption in arithmetic units for specific input data distributions. Extensive experiments on large datasets demonstrate that GENIAL is consistently more sample efficient than other methods, and converges faster towards optimized designs. This makes it possible to deploy a high-effort logic synthesis optimization flow in the loop, improving the accuracy of the surrogate model. Notably, GENIAL automatically discovers encodings that achieve up to 18% switching activity savings within multipliers on representative AI workloads compared with the conventional two's complement. We also demonstrate the versatility of our approach by achieving significant improvements on Finite State Machines, highlighting GENIAL's applicability for a wide spectrum of logic functions. Together, these advances mark a significant step toward automated Quality-of-Results-optimized combinational circuit generation for digital systems.


【12】Scale-Consistent Learning for Partial Differential Equations
标题:偏微分方程的尺度一致学习
链接:https://arxiv.org/abs/2507.18813

作者:, Samuel Lanthaler, Catherine Deng, Michael Chen, Yixuan Wang, Kamyar Azizzadenesheli, Anima Anandkumar
摘要:机器学习(ML)模型已经成为解决科学和工程中偏微分方程(PDE)的一种有前途的方法。以前的ML模型通常无法在训练数据之外进行泛化;例如,Navier-Stokes方程的训练ML模型仅适用于预定义域上的固定雷诺数($Re$)。为了克服这些限制,我们提出了一个数据增强方案的基础上规模一致性属性的偏微分方程和设计一个规模知情的神经操作,可以模拟范围广泛的规模。我们的公式利用了以下事实:(i)PDE可以重新缩放,或者更具体地说,可以将给定域重新缩放到单位大小,并且可以适当地调整PDE的参数和边界条件以表示原始解,以及(ii)给定域上的解算子在子域上是一致的。我们利用这些事实来创建一个规模一致性损失,鼓励匹配的解决方案,在一个给定的域和解决方案的子域从重新缩放PDE评估。由于神经算子可以适应多个尺度和分辨率,因此它们是在神经PDE求解器训练期间合并尺度一致性损失的自然选择。我们在Burgers方程、Darcy流、Helmholtz方程和Navier-Stokes方程上进行了尺度一致性损失和尺度信息神经算子模型的实验。通过尺度一致性,在$Re$为1000的情况下训练的模型可以推广到$Re$从250到10000的范围,并且与基线相比,所有数据集的平均误差减少了34%。
摘要:Machine learning (ML) models have emerged as a promising approach for solving partial differential equations (PDEs) in science and engineering. Previous ML models typically cannot generalize outside the training data; for example, a trained ML model for the Navier-Stokes equations only works for a fixed Reynolds number ($Re$) on a pre-defined domain. To overcome these limitations, we propose a data augmentation scheme based on scale-consistency properties of PDEs and design a scale-informed neural operator that can model a wide range of scales. Our formulation leverages the facts: (i) PDEs can be rescaled, or more concretely, a given domain can be re-scaled to unit size, and the parameters and the boundary conditions of the PDE can be appropriately adjusted to represent the original solution, and (ii) the solution operators on a given domain are consistent on the sub-domains. We leverage these facts to create a scale-consistency loss that encourages matching the solutions evaluated on a given domain and the solution obtained on its sub-domain from the rescaled PDE. Since neural operators can fit to multiple scales and resolutions, they are the natural choice for incorporating scale-consistency loss during training of neural PDE solvers. We experiment with scale-consistency loss and the scale-informed neural operator model on the Burgers' equation, Darcy Flow, Helmholtz equation, and Navier-Stokes equations. With scale-consistency, the model trained on $Re$ of 1000 can generalize to $Re$ ranging from 250 to 10000, and reduces the error by 34% on average of all datasets compared to baselines.
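
Fact (i) in the abstract, that a PDE on a rescaled domain is the same PDE with adjusted parameters, can be illustrated on the viscous Burgers' equation. This is a minimal worked example, assuming a scale factor $\lambda > 0$ applied to both space and time; the notation is an illustrative assumption and not taken from the paper.

```latex
% \lambda, u', \nu' are illustrative symbols, not the paper's notation.
\begin{align}
  % Original equation with viscosity \nu
  u_t + u\, u_x &= \nu\, u_{xx} \\
  % Rescale space and time by \lambda and define the rescaled solution
  u'(x', t') &= u(\lambda x', \lambda t') \\
  % Chain rule: each first derivative picks up \lambda, the second derivative \lambda^2
  u'_{t'} + u'\, u'_{x'} &= \lambda\,(u_t + u\, u_x) = \lambda\, \nu\, u_{xx} = \frac{\nu}{\lambda}\, u'_{x' x'} \\
  % So u' solves the same equation with adjusted viscosity
  \nu' &= \frac{\nu}{\lambda}
\end{align}
```

Restricting a solution to a sub-domain of relative size $\lambda < 1$ and rescaling it to unit size therefore corresponds to a larger effective viscosity, that is, a smaller Reynolds-like number, which is how one trained scale can supply consistency targets at other scales.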


【13】CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization
标题:CLEAR:通过带反对比正则化的对比学习消除虚假的风格-内容关联
链接:https://arxiv.org/abs/2507.18794

作者:un, Benjamin A. Goldstein, Matthew M. Engelhard
备注:10 pages main text, 24 pages in total
摘要:学习不受表面特征影响的表征对于确保测试时这些特征的变化不会损害下游预测性能很重要。例如,在医疗保健应用中,我们可能希望学习包含病理信息但不受种族,性别和其他生理变异性来源影响的特征,从而确保预测在所有人口统计学中是公平和可推广的。在这里,我们提出了反对比正则化(CLEAR)的对比学习,这是一个直观且易于实现的框架,可以有效地将基本(即,任务相关的)特征从表面(即,任务无关)的特征,导致更好的表现时,表面特征转移在测试时间。我们首先假设数据表示可以在语义上分离为任务相关的内容特征,其中包含与下游任务相关的信息,以及任务无关的风格特征,其中包含与这些任务无关的表面属性,但可能会降低性能,因为与训练数据中存在的内容相关联,而这些内容不会泛化。然后,我们证明了我们的反对比惩罚,我们称之为对切换(PS),最大限度地减少了风格属性和内容标签之间的互信息。最后,我们实例化CLEAR在潜空间的变分自动编码器(VAE),然后进行实验,定量和定性地评估所产生的CLEAR-VAE在几个图像数据集。我们的研究结果表明,CLEAR-VAE允许我们:(a)在任何一对样本之间交换和插入内容和风格,以及(b)在存在以前看不见的内容和风格组合的情况下提高下游分类性能。我们的代码将会公开发布。
摘要:Learning representations unaffected by superficial characteristics is important to ensure that shifts in these characteristics at test time do not compromise downstream prediction performance. For instance, in healthcare applications, we might like to learn features that contain information about pathology yet are unaffected by race, sex, and other sources of physiologic variability, thereby ensuring predictions are equitable and generalizable across all demographics. Here we propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR), an intuitive and easy-to-implement framework that effectively separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. We begin by supposing that data representations can be semantically separated into task-relevant content features, which contain information relevant to downstream tasks, and task-irrelevant style features, which encompass superficial attributes that are irrelevant to these tasks, yet may degrade performance due to associations with content present in training data that do not generalize. We then prove that our anti-contrastive penalty, which we call Pair-Switching (PS), minimizes the Mutual Information between the style attributes and content labels. Finally, we instantiate CLEAR in the latent space of a Variational Auto-Encoder (VAE), then perform experiments to quantitatively and qualitatively evaluate the resulting CLEAR-VAE over several image datasets. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style. Our code will be made publicly available.


【14】Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning
标题:告诉我你看到了什么:图像字幕的迭代深度学习框架
链接:https://arxiv.org/abs/2507.18788

作者:mar Gupta
备注:16 pages, 12 total figures (including a 7-figure appendix), 4 tables
摘要:图像字幕是计算机视觉和自然语言处理的交汇任务,需要对视觉场景和语言结构进行复杂的理解。虽然现代方法以大规模Transformer架构为主,但本文记录了基础图像字幕模型的系统化迭代开发,从简单的CNN-LSTM编码器-解码器发展到有竞争力的基于注意力的系统。我们提出了一系列共五个模型,从Genesis开始,到Nexus结束,Nexus是一个具有EfficientNetV2B3骨干和动态注意力机制的先进模型。我们的实验描绘了架构增强的影响,并展示了经典CNN-LSTM范式中的一个关键发现:仅仅升级视觉骨干而没有相应的注意力机制可能会降低性能,因为单向量瓶颈无法传输更丰富的视觉细节。这一洞察验证了架构向注意力的转变。在MS COCO 2017数据集上进行训练后,我们的最终模型Nexus获得了31.4的BLEU-4分数,超过了几个基础基准,并验证了我们的迭代设计过程。这项工作为理解支撑现代视觉语言任务的核心架构原则提供了一个清晰的、可复制的蓝图。
摘要:Image captioning, a task at the confluence of computer vision and natural language processing, requires a sophisticated understanding of both visual scenes and linguistic structure. While modern approaches are dominated by large-scale Transformer architectures, this paper documents a systematic, iterative development of foundational image captioning models, progressing from a simple CNN-LSTM encoder-decoder to a competitive attention-based system. We present a series of five models, beginning with Genesis and concluding with Nexus, an advanced model featuring an EfficientNetV2B3 backbone and a dynamic attention mechanism. Our experiments chart the impact of architectural enhancements and demonstrate a key finding within the classic CNN-LSTM paradigm: merely upgrading the visual backbone without a corresponding attention mechanism can degrade performance, as the single-vector bottleneck cannot transmit the richer visual detail. This insight validates the architectural shift to attention. Trained on the MS COCO 2017 dataset, our final model, Nexus, achieves a BLEU-4 score of 31.4, surpassing several foundational benchmarks and validating our iterative design process. This work provides a clear, replicable blueprint for understanding the core architectural principles that underpin modern vision-language tasks.


【15】The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models
标题:修剪中被遗忘的权利:揭示稀疏模型上的机器去学习
链接:https://arxiv.org/abs/2507.18725

作者:, Gen Li, Jie Ji, Ruimeng Ye, Xiaolong Ma, Bo Hui
备注:9 pages for main part
摘要:机器非学习旨在有效地消除训练模型中有关已删除数据的记忆,并解决被遗忘的权利。尽管现有的unlearning算法取得了成功,稀疏模型中的unlearning尚未得到很好的研究。在本文中,我们经验性地发现,删除的数据对修剪的拓扑结构在稀疏模型的影响。受观察和被遗忘权的启发,我们定义了一个新的术语“un-pruning”来消除删除数据对模型修剪的影响。然后,我们提出了一个un-pruning算法来近似修剪后的拓扑结构驱动的保留数据。我们注意到任何现有的去学习算法都可以与所提出的去修剪工作流程集成,并且理论上去修剪的误差是有上限的。此外,我们的un-pruning算法可以适用于结构化稀疏模型和非结构化稀疏模型。在实验中,我们进一步发现,成员推理攻击(MIA)的准确性对于评估模型是否忘记删除数据是不可靠的,因为删除数据量的微小变化可以产生任意的MIA结果。因此,我们设计了新的稀疏模型的性能指标,以评估成功的un-pruning。最后,我们进行了大量的实验,以验证各种修剪方法和unlearning算法的un-pruning的效果。我们的代码发布在https://anonymous.4open.science/r/UnlearningSparseModels-FBC5/。
摘要:Machine unlearning aims to efficiently eliminate the memory of deleted data from trained models and address the right to be forgotten. Despite the success of existing unlearning algorithms, unlearning in sparse models has not yet been well studied. In this paper, we empirically find that the deleted data has an impact on the pruned topology in a sparse model. Motivated by this observation and the right to be forgotten, we define a new term, "un-pruning", to denote eliminating the impact of deleted data on model pruning. Then we propose an un-pruning algorithm to approximate the pruned topology driven by retained data. We remark that any existing unlearning algorithm can be integrated with the proposed un-pruning workflow and that the error of un-pruning is upper-bounded in theory. Also, our un-pruning algorithm can be applied to both structured sparse models and unstructured sparse models. In the experiments, we further find that Membership Inference Attack (MIA) accuracy is unreliable for assessing whether a model has forgotten deleted data, as a small change in the amount of deleted data can produce arbitrary MIA results. Accordingly, we devise new performance metrics for sparse models to evaluate the success of un-pruning. Lastly, we conduct extensive experiments to verify the efficacy of un-pruning with various pruning methods and unlearning algorithms. Our code is released at https://anonymous.4open.science/r/UnlearningSparseModels-FBC5/.


【16】Diffusion Models for Solving Inverse Problems via Posterior Sampling with Piecewise Guidance
标题:分段引导后验抽样求解反问题的扩散模型
链接:https://arxiv.org/abs/2507.18654

作者:seni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu
摘要:扩散模型是从高维分布中采样的强大工具,通过去噪过程将纯噪声逐步转换为结构化数据。当配备有指导机制时,这些模型还可以从条件分布中生成样本。在本文中,一种新的扩散为基础的框架,采用分段指导计划求解反问题。指导项被定义为扩散时间步的分段函数,便于在高噪声和低噪声阶段使用不同的近似值。这种设计被证明是有效地平衡计算效率与指导项的准确性。不同于特定任务的方法,需要重新训练的每个问题,所提出的方法是问题不可知的,并很容易适应各种逆问题。此外,它明确地将测量噪声纳入重建过程。该框架的有效性通过大量的图像恢复任务,特别是图像修复和超分辨率的实验证明。使用类条件扩散模型进行恢复,与pgdm基线相比,所提出的框架实现了随机和中心掩模修复的推理时间减少\(25\%\),以及\(4\times\)和\(8\times\)超分辨率任务的推理时间分别减少\(23\%\)和\(24\%\),而PSNR和SSIM的损失微不足道。
摘要:Diffusion models are powerful tools for sampling from high-dimensional distributions by progressively transforming pure noise into structured data through a denoising process. When equipped with a guidance mechanism, these models can also generate samples from conditional distributions. In this paper, a novel diffusion-based framework is introduced for solving inverse problems using a piecewise guidance scheme. The guidance term is defined as a piecewise function of the diffusion timestep, facilitating the use of different approximations during high-noise and low-noise phases. This design is shown to effectively balance computational efficiency with the accuracy of the guidance term. Unlike task-specific approaches that require retraining for each problem, the proposed method is problem-agnostic and readily adaptable to a variety of inverse problems. Additionally, it explicitly incorporates measurement noise into the reconstruction process. The effectiveness of the proposed framework is demonstrated through extensive experiments on image restoration tasks, specifically image inpainting and super-resolution. Using a class conditional diffusion model for recovery, compared to the PGDM baseline, the proposed framework achieves a reduction in inference time of 25% for inpainting with both random and center masks, and 23% and 24% for 4× and 8× super-resolution tasks, respectively, while incurring only negligible loss in PSNR and SSIM.


【17】Multi-Year Maintenance Planning for Large-Scale Infrastructure Systems: A Novel Network Deep Q-Learning Approach
标题:大型基础设施系统的多年维护规划:一种新型网络深度Q学习方法
链接:https://arxiv.org/abs/2507.18732

作者:, Arnold X.-X. Yuan
摘要:基础设施资产管理对于维持公路网、桥梁和公用事业网络等公共基础设施的性能至关重要。传统的维护和修复规划方法通常面临可扩展性和计算方面的挑战,特别是对于在预算限制下拥有数千资产的大型网络。本文提出了一种新的深度强化学习(DRL)框架,用于优化大型基础设施网络的资产管理策略。通过将网络级马尔可夫决策过程(MDP)分解为单个资产级MDP,同时使用统一的神经网络架构,所提出的框架降低了计算复杂度,提高了学习效率,并增强了可扩展性。该框架通过预算分配机制直接纳入年度预算限制,确保维护计划既优化又具有成本效益。通过对一个由68,800个路段组成的大规模路面网络的案例研究,所提出的DRL框架在效率和网络性能方面都比传统的方法(如渐进线性规划和遗传算法)有了显着的改进。这一进步有助于基础设施资产管理和强化学习在复杂的大规模环境中的更广泛应用。
摘要:Infrastructure asset management is essential for sustaining the performance of public infrastructure such as road networks, bridges, and utility networks. Traditional maintenance and rehabilitation planning methods often face scalability and computational challenges, particularly for large-scale networks with thousands of assets under budget constraints. This paper presents a novel deep reinforcement learning (DRL) framework that optimizes asset management strategies for large infrastructure networks. By decomposing the network-level Markov Decision Process (MDP) into individual asset-level MDPs while using a unified neural network architecture, the proposed framework reduces computational complexity, improves learning efficiency, and enhances scalability. The framework directly incorporates annual budget constraints through a budget allocation mechanism, ensuring maintenance plans are both optimal and cost-effective. Through a case study on a large-scale pavement network of 68,800 segments, the proposed DRL framework demonstrates significant improvements over traditional methods like Progressive Linear Programming and genetic algorithms, both in efficiency and network performance. This advancement contributes to infrastructure asset management and the broader application of reinforcement learning in complex, large-scale environments.


其他(18篇)

【1】CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing
标题:CircuitProbe:用电路追踪剖析时空视觉语义
链接:https://arxiv.org/abs/2507.19420

作者:ang, Chengzhang Yu, Zhuokai Zhao, Kun Wang, Qiankun Li, Zihan Chen, Yang Liu, Zenghui Ding, Yining Sun
摘要:大型视觉语言模型(LVLM)中语言和图像理解的处理机制已经得到了广泛的研究。然而,LVLM的时空理解的内部推理机制仍然知之甚少。在这项工作中,我们引入了一个系统的,基于电路的框架,旨在研究时空视觉语义是如何表示和处理这些LVLM。具体来说,我们的框架包括三个电路:视觉审计电路,语义跟踪电路,注意流电路。通过这些电路的镜头,我们发现视觉语义高度本地化到特定的对象标记-删除这些标记可以降低模型性能高达92.6%。此外,我们确定,可解释的概念的对象和行动出现,并逐步完善中到晚层的LVLM。与目前只关注一个图像中的对象的工作相反,我们发现LVLM的中晚期层表现出时空语义的专门功能定位。我们的研究结果提供了显着的机械见解LVLM时空语义分析,奠定了基础,设计更强大的和可解释的模型。
摘要:The processing mechanisms underlying language and image understanding in large vision-language models (LVLMs) have been extensively studied. However, the internal reasoning mechanisms of LVLMs for spatiotemporal understanding remain poorly understood. In this work, we introduce a systematic, circuit-based framework designed to investigate how spatiotemporal visual semantics are represented and processed within these LVLMs. Specifically, our framework comprises three circuits: a visual auditing circuit, a semantic tracing circuit, and an attention flow circuit. Through the lens of these circuits, we discover that visual semantics are highly localized to specific object tokens: removing these tokens can degrade model performance by up to 92.6%. Furthermore, we identify that interpretable concepts of objects and actions emerge and become progressively refined in the middle-to-late layers of LVLMs. In contrast to existing work that focuses solely on objects in a single image, we reveal that the middle-to-late layers of LVLMs exhibit specialized functional localization for spatiotemporal semantics. Our findings offer significant mechanistic insights into spatiotemporal semantics analysis of LVLMs, laying a foundation for designing more robust and interpretable models.


【2】LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
标题:LOTUS:从质量到社会偏见和用户偏好的详细图像字幕排行榜
链接:https://arxiv.org/abs/2507.19362

作者:rota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, Yuta Nakashima, Marco Pavone, Yejin Choi, Yu-Chiang Frank Wang, Chao-Han Huck Yang
备注:Accepted to ACL 2025.
摘要:大型视觉语言模型(LVLM)已经改变了图像字幕,从简洁的字幕转向详细的描述。我们介绍LOTUS,一个用于评估详细字幕的排行榜,解决现有评估中的三个主要差距:缺乏标准化标准,偏见意识评估和用户偏好考虑。LOTUS全面评估各个方面,包括字幕质量(例如,对齐,重复性),风险(例如,幻觉)和社会偏见(例如,性别偏见),同时通过根据不同用户的偏好制定标准,实现以偏好为导向的评价。我们对最近LVLM的分析显示,没有一个模型在所有标准上都表现出色,而标题细节和偏差风险之间存在相关性。偏好导向的评价表明,最佳模式的选择取决于用户的优先事项。
摘要:Large Vision-Language Models (LVLMs) have transformed image captioning, shifting from concise captions to detailed descriptions. We introduce LOTUS, a leaderboard for evaluating detailed captions, addressing three main gaps in existing evaluations: lack of standardized criteria, bias-aware assessments, and user preference considerations. LOTUS comprehensively evaluates various aspects, including caption quality (e.g., alignment, descriptiveness), risks (e.g., hallucination), and societal biases (e.g., gender bias) while enabling preference-oriented evaluations by tailoring criteria to diverse user preferences. Our analysis of recent LVLMs reveals that no single model excels across all criteria, while correlations emerge between caption detail and bias risks. Preference-oriented evaluations demonstrate that optimal model selection depends on user priorities.


【3】EffiComm: Bandwidth Efficient Multi Agent Communication
标题:EffiComm:带宽高效的多代理通信
链接:https://arxiv.org/abs/2507.19354

作者:gan, Allen Xavier Arasan, J. Marius Zöllner
备注:Accepted for publication at ITSC 2025
摘要:协作感知允许联网车辆交换传感器信息并克服每辆车的盲点。然而,传输原始点云或完整的特征图会使车对车(V2V)通信不堪重负,导致延迟和可扩展性问题。我们引入了EffiComm,这是一个端到端的框架,它传输的数据不到现有技术所需数据的40%,同时保持了最先进的3D物体检测精度。EffiComm对来自任何模态的鸟瞰图(BEV)特征图进行操作,并应用两阶段缩减管道:(1)选择性传输(ST)使用置信掩码修剪低效用区域;(2)自适应网格缩减(AGR)使用图神经网络(GNN)根据角色和网络负载分配车辆特定的保留比率。其余特征通过软门控专家混合(MoE)注意力层进行融合,为有效的特征融合提供更大的容量和专业化。在OPV2V基准测试中,EffiComm达到0.84 mAP@0.7,而平均每帧仅发送约1.5 MB,在每比特精度曲线上优于以前的方法。这些结果突出了自适应的、可学习的通信对于可扩展的车联万物(V2X)感知的价值。
摘要:Collaborative perception allows connected vehicles to exchange sensor information and overcome each vehicle's blind spots. Yet transmitting raw point clouds or full feature maps overwhelms Vehicle-to-Vehicle (V2V) communications, causing latency and scalability problems. We introduce EffiComm, an end-to-end framework that transmits less than 40% of the data required by prior art while maintaining state-of-the-art 3D object detection accuracy. EffiComm operates on Bird's-Eye-View (BEV) feature maps from any modality and applies a two-stage reduction pipeline: (1) Selective Transmission (ST) prunes low-utility regions with a confidence mask; (2) Adaptive Grid Reduction (AGR) uses a Graph Neural Network (GNN) to assign vehicle-specific keep ratios according to role and network load. The remaining features are fused with a soft-gated Mixture-of-Experts (MoE) attention layer, offering greater capacity and specialization for effective feature integration. On the OPV2V benchmark, EffiComm reaches 0.84 mAP@0.7 while sending only an average of approximately 1.5 MB per frame, outperforming previous methods on the accuracy-per-bit curve. These results highlight the value of adaptive, learned communication for scalable Vehicle-to-Everything (V2X) perception.


【4】Reconstruction of Sparse Urban Wireless Signals via Group Equivariant Non-Expansive Operators
标题:利用群等变非扩张操作器重建稀疏城市无线信号
链接:https://arxiv.org/abs/2507.19349

作者:ario Amorosa, Francesco Conti, Nicola Quercioli, Flavio Zabini, Tayebeh Lotfi Mahyari, Yiqun Ge, Patrizio Frosini
摘要:在诸如第六代(6G)无线网络之类的新兴通信系统中,高效的资源管理和服务递送依赖于对空间变化量(如信号干扰噪声比(SINR)图)的准确了解,以高分辨率获取这些空间变化量是昂贵的。这项工作探讨了使用群等变非扩展算子(GENEOs)从稀疏测量中重建这种空间信号,为传统神经网络提供了一种低复杂度的替代方案。GENEO的概念起源于拓扑数据分析(TDA),是机器学习中使用的一种数学工具,用于表示代理建模为作用于数据的函数运算符,同时结合特定于应用的不变性。利用这些不变性减少了传统神经网络的参数数量,并通过强制执行反映代理行为对称性的已知代数和几何约束来缓解数据稀缺性。在本文中,我们介绍了一种新的基于GENEO的方法,在城市无线通信网络中使用极其稀疏的采样的SINR地图重建。我们证明,这个数学框架实现了竞争力的性能相比,建立的方法。我们的评估,使用统计和TDA指标进行,突出了我们的方法在样本数量的严重数据限制下准确重建空间信号的优势。
摘要:In emerging communication systems such as sixth generation (6G) wireless networks, efficient resource management and service delivery rely on accurate knowledge of spatially-varying quantities like signal-to-interference-noise ratio (SINR) maps, which are costly to acquire at high resolution. This work explores the reconstruction of such spatial signals from sparse measurements using Group Equivariant Non-Expansive Operators (GENEOs), offering a low-complexity alternative to traditional neural networks. The concept of GENEO, which originated in topological data analysis (TDA), is a mathematical tool used in machine learning to represent agents modelled as functional operators acting on data while incorporating application-specific invariances. Leveraging these invariances reduces the number of parameters with respect to traditional neural networks and mitigates data scarcity by enforcing known algebraic and geometric constraints that reflect symmetries in the agents' actions. In this paper, we introduce a novel GENEO-based approach for SINR map reconstruction in urban wireless communication networks using extremely sparse sampling. We demonstrate that this mathematical framework achieves competitive performance compared to established methods. Our evaluation, conducted using both statistical and TDA metrics, highlights the advantages of our approach in accurately reconstructing spatial signals under severe data limitations on the number of samples.


【5】Negative news posts are less prevalent and generate lower user engagement than non-negative news posts across six countries
标题:与六个国家的非负面新闻帖子相比,负面新闻帖子不太普遍,用户参与度也较低
链接:https://arxiv.org/abs/2507.19300

作者:laga, Dominik Batorski, Magdalena Wojcieszak
摘要:虽然新闻负面性经常被研究,但缺乏关于社交媒体上负面政治和非政治新闻帖子的流行和参与的比较证据。我们使用了2020年1月1日至2024年4月1日期间由六个国家的97家媒体机构发布的6,081,134篇Facebook帖子(美国,英国,爱尔兰,波兰,法国,西班牙),并开发两个多语言分类标签的职位(非)政治和(非)负面的。我们证明:(1)负面新闻帖子占相对较小的比例(12.6%);(2)政治新闻帖子的负面性既不比非政治新闻帖子多也不比非政治新闻帖子少;(3)美国政治新闻帖子相对于其他国家平均负面性较低(低40%的几率);(4)负面新闻帖子比非负面新闻帖子少15%的喜欢和13%的评论。最后,(5)我们提供了负面新闻帖子的用户参与总量的比例估计,并显示只有10.2%至13.1%的参与与分析的新闻机构的负面帖子有关。
摘要:Although news negativity is often studied, comparative evidence on the prevalence of and engagement with negative political and non-political news posts on social media is missing. We use 6,081,134 Facebook posts published between January 1, 2020, and April 1, 2024, by 97 media organizations in six countries (U.S., UK, Ireland, Poland, France, Spain) and develop two multilingual classifiers for labeling posts as (non-)political and (non-)negative. We show that: (1) negative news posts constitute a relatively small fraction (12.6%); (2) political news posts are neither more nor less negative than non-political news posts; (3) U.S. political news posts are less negative relative to the other countries on average (40% lower odds); (4) negative news posts get 15% fewer likes and 13% fewer comments than non-negative news posts. Lastly, (5) we provide estimates of the proportion of the total volume of user engagement with negative news posts and show that only between 10.2% and 13.1% of engagement is linked to negative posts by the analyzed news organizations.


【6】Latent Granular Resynthesis using Neural Audio Codecs
标题:使用神经音频编解码器的潜在颗粒再合成
链接:https://arxiv.org/abs/2507.19202

作者:, Tom Baker
备注:Accepted at ISMIR 2025 Late Breaking Demos
摘要:我们介绍了一种新的技术,创造性的音频再合成,通过改造的概念,颗粒合成在潜在的向量水平。我们的方法通过将源音频语料编码成潜在向量段来创建“粒度码本”,然后将目标音频信号的每个潜在颗粒与码本中最接近的对应物进行匹配。解码得到的混合序列以产生保留目标的时间结构同时采用源的音色特征的音频。这种技术不需要模型训练,适用于各种音频材料,并且通过编解码器在解码期间的隐式插值自然避免了传统级联合成的典型不连续性。我们在https://github.com/naotokui/latentgranular/上提供了补充材料,以及一个概念验证实现,允许用户在https://huggingface.co/spaces/naotokui/latentgranular上试验他们自己的声音。
摘要:We introduce a novel technique for creative audio resynthesis that operates by reworking the concept of granular synthesis at the latent vector level. Our approach creates a "granular codebook" by encoding a source audio corpus into latent vector segments, then matches each latent grain of a target audio signal to its closest counterpart in the codebook. The resulting hybrid sequence is decoded to produce audio that preserves the target's temporal structure while adopting the source's timbral characteristics. This technique requires no model training, works with diverse audio materials, and naturally avoids the discontinuities typical of traditional concatenative synthesis through the codec's implicit interpolation during decoding. We include supplementary material at https://github.com/naotokui/latentgranular/ , as well as a proof-of-concept implementation to allow users to experiment with their own sounds at https://huggingface.co/spaces/naotokui/latentgranular .
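
The grain-matching step described above can be sketched in a few lines. This is a minimal illustration, assuming latent grains are plain lists of floats and that closeness is measured by Euclidean distance; the function names and toy vectors are hypothetical, and the neural codec's encode and decode stages are deliberately left out.

```python
import math


def nearest_grain(grain, codebook):
    """Return the index of the codebook grain closest to `grain` (Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for i, candidate in enumerate(codebook):
        dist = math.dist(grain, candidate)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def resynthesize(target_grains, codebook):
    """Replace every target grain with its nearest source grain, preserving order."""
    return [codebook[nearest_grain(g, codebook)] for g in target_grains]


# Toy 2-D "latents" (assumed values). A real system would obtain these by
# encoding source and target audio with a neural codec, then decode `hybrid`.
source_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
target_latents = [[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]]
hybrid = resynthesize(target_latents, source_codebook)
```

The output sequence keeps the target's ordering (its temporal structure) while every element comes from the source codebook (its timbre), which mirrors the behaviour the abstract describes.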


【7】Closing the Modality Gap for Mixed Modality Search
标题:缩小混合模态搜索的模态差距
链接:https://arxiv.org/abs/2507.19054

作者: Yuhui Zhang, Xiaohan Wang, Weixin Liang, Ludwig Schmidt, Serena Yeung-Levy
摘要:混合模态搜索--在由图像、文本和多模态文档组成的异构语料库中检索信息--是一个重要但尚未开发的现实应用。在这项工作中,我们研究对比视觉语言模型,如CLIP,执行混合模态搜索任务。我们的分析揭示了一个关键的局限性:这些模型在嵌入空间中表现出明显的模态差距,其中图像和文本嵌入形成不同的集群,导致模态内排名偏差和模态间融合失败。为了解决这个问题,我们提出了GR-CLIP,一个轻量级的事后校准方法,消除CLIP的嵌入空间的模态差距。在MixBench(第一个专为混合模态搜索设计的基准测试)上进行评估,GR-CLIP将NDCG@10比CLIP提高了26个百分点,超过了最近的视觉语言生成嵌入模型4个百分点,同时使用的计算量减少了75倍。
摘要:Mixed modality search -- retrieving information across a heterogeneous corpus composed of images, texts, and multimodal documents -- is an important yet underexplored real-world application. In this work, we investigate how contrastive vision-language models, such as CLIP, perform on the mixed modality search task. Our analysis reveals a critical limitation: these models exhibit a pronounced modality gap in the embedding space, where image and text embeddings form distinct clusters, leading to intra-modal ranking bias and inter-modal fusion failure. To address this issue, we propose GR-CLIP, a lightweight post-hoc calibration method that removes the modality gap in CLIP's embedding space. Evaluated on MixBench -- the first benchmark specifically designed for mixed modality search -- GR-CLIP improves NDCG@10 by up to 26 percentage points over CLIP, surpasses recent vision-language generative embedding models by 4 percentage points, while using 75x less compute.


【8】Adapting to Fragmented and Evolving Data: A Fisher Information Perspective
标题:适应碎片化和不断变化的数据:Fisher信息视角
链接:https://arxiv.org/abs/2507.18996

作者:an, Tahir Qasim Syed, Nouman Muhammad Durrani
摘要:在动态环境中运行的现代机器学习系统经常面临顺序协变量偏移(SCS),其中输入分布随着时间的推移而演变,而条件分布保持稳定。我们介绍FADE(基于Fisher的动态环境适应),一个轻量级且有理论依据的框架,用于SCS下的鲁棒学习。FADE采用基于Fisher信息几何的偏移感知正则化机制,通过根据灵敏度和稳定性调节参数更新来指导自适应。为了检测显著的分布变化,我们提出了一个基于Cramer-Rao的偏移信号,该信号将KL散度与时间Fisher动力学相结合。与需要任务边界、目标监督或经验回放的先前方法不同,FADE以固定内存在线运行,并且无需访问目标标签。在涵盖视觉、语言和表格数据的七个基准测试中,FADE在严重偏移的情况下将准确率提高了高达19%,优于TENT和DIW等方法。FADE还通过将异构客户端视为时间上碎片化的环境,自然地推广到联邦学习,从而在去中心化的环境中实现可扩展和稳定的适应。理论分析保证了有界的遗憾和参数的一致性,而实证结果表明了FADE在不同模态和偏移强度下的鲁棒性。
摘要:Modern machine learning systems operating in dynamic environments often face sequential covariate shift (SCS), where input distributions evolve over time while the conditional distribution remains stable. We introduce FADE (Fisher-based Adaptation to Dynamic Environments), a lightweight and theoretically grounded framework for robust learning under SCS. FADE employs a shift-aware regularization mechanism anchored in Fisher information geometry, guiding adaptation by modulating parameter updates based on sensitivity and stability. To detect significant distribution changes, we propose a Cramer-Rao-informed shift signal that integrates KL divergence with temporal Fisher dynamics. Unlike prior methods requiring task boundaries, target supervision, or experience replay, FADE operates online with fixed memory and no access to target labels. Evaluated on seven benchmarks spanning vision, language, and tabular data, FADE achieves up to 19% higher accuracy under severe shifts, outperforming methods such as TENT and DIW. FADE also generalizes naturally to federated learning by treating heterogeneous clients as temporally fragmented environments, enabling scalable and stable adaptation in decentralized settings. Theoretical analysis guarantees bounded regret and parameter consistency, while empirical results demonstrate FADE's robustness across modalities and shift intensities.


【9】TiVy: Time Series Visual Summary for Scalable Visualization
标题:TiVy:可扩展可视化的时间序列视觉摘要
链接:https://arxiv.org/abs/2507.18972

作者:uk-Yin Chan, Luis Gustavo Nonato, Themis Palpanas, Cláudio T. Silva, Juliana Freire
备注:to be published in TVCG (IEEE VIS 2025)
摘要:可视化多个时间序列在可伸缩性和视觉清晰度之间存在根本的权衡。时间序列捕捉许多大规模现实世界过程的行为,从股市趋势到城市活动。用户通常通过将它们可视化为折线图,并置或叠加多个时间序列来比较它们并识别趋势和模式来获得见解。然而,现有的表示方法在可扩展性方面遇到了困难:当覆盖很长的时间跨度时,由于太多的小倍数或重叠线而导致视觉混乱。我们提出了Tivy,一个新的算法,总结时间序列使用顺序模式。该方法利用动态时间规整(DTW)将序列转换为基于子序列视觉相似性的符号序列集合,然后基于频繁序列模式构造相似序列的不相交分组。分组结果是时间序列的可视化摘要,提供了具有较少小倍数的整齐叠加。与常见的聚类技术不同,TiVy提取在时间上对齐的相似序列(不同长度)。我们还提出了一个交互式的时间序列可视化,实时呈现大规模的时间序列。我们的实验评估表明,我们的算法(1)提取清晰和准确的模式时,可视化的时间序列数据,(2)实现了显着的速度(1000倍)相比,一个简单的DTW聚类。我们还证明了我们的方法的效率,以探索隐藏的结构在大量的时间序列数据在两个使用场景。
摘要:Visualizing multiple time series presents fundamental tradeoffs between scalability and visual clarity. Time series capture the behavior of many large-scale real-world processes, from stock market trends to urban activities. Users often gain insights by visualizing them as line charts, juxtaposing or superposing multiple time series to compare them and identify trends and patterns. However, existing representations struggle with scalability: covering long time spans leads to visual clutter from too many small multiples or overlapping lines. We propose TiVy, a new algorithm that summarizes time series using sequential patterns. It transforms the series into a set of symbolic sequences based on subsequence visual similarity using Dynamic Time Warping (DTW), then constructs a disjoint grouping of similar subsequences based on the frequent sequential patterns. The grouping result, a visual summary of time series, provides uncluttered superposition with fewer small multiples. Unlike common clustering techniques, TiVy extracts similar subsequences (of varying lengths) aligned in time. We also present an interactive time series visualization that renders large-scale time series in real time. Our experimental evaluation shows that our algorithm (1) extracts clear and accurate patterns when visualizing time series data, and (2) achieves a significant speed-up (1000X) compared to a straightforward DTW clustering. We also demonstrate the efficiency of our approach in exploring hidden structures in massive time series data in two usage scenarios.
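
Dynamic Time Warping, the similarity measure the abstract relies on, can be sketched with the textbook dynamic program. This is a minimal illustration of plain DTW with an absolute-difference local cost, not TiVy's own pipeline; the function name and the sample series are assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Subsequences of different lengths but the same shape align at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because the warping path may repeat elements, series of different lengths with the same visual shape score as similar, which is the property that makes DTW suitable for grouping subsequences of varying lengths.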


【10】CueBuddy: helping non-native English speakers navigate English-centric STEM education
Link: https://arxiv.org/abs/2507.18827

Authors: pta
Abstract: Students across the world in STEM classes, especially in the Global South, fall behind their peers who are more fluent in English, despite being on par with them in terms of scientific prerequisites. While many of them are able to follow everyday English with ease, key terms in English remain challenging. In most cases, such students have had most of their course prerequisites in a lower resource language. Live speech translation to lower resource languages is a promising area of research; however, models for speech translation can be too expensive on a large scale and often struggle with technical content. In this paper, we describe CueBuddy, which aims to remediate these issues by providing real-time "lexical cues" through technical keyword spotting along with real-time multilingual glossary lookup to help students stay up to speed with complex English jargon without disrupting their concentration on the lecture. We also describe the limitations and future extensions of our approach.


【11】Even Faster Simulations with Flow Matching: A Study of Zero Degree Calorimeter Responses
Link: https://arxiv.org/abs/2507.18811

Authors: an Wojnar
Abstract: Recent advances in generative neural networks, particularly flow matching (FM), have enabled the generation of high-fidelity samples while significantly reducing computational costs. A promising application of these models is accelerating simulations in high-energy physics (HEP), helping research institutions meet their increasing computational demands. In this work, we leverage FM to develop surrogate models for fast simulations of zero degree calorimeters in the ALICE experiment. We present an effective training strategy that enables the training of fast generative models with an exceptionally low number of parameters. This approach achieves state-of-the-art simulation fidelity for both neutron (ZN) and proton (ZP) detectors, while offering substantial reductions in computational costs compared to existing methods. Our FM model achieves a Wasserstein distance of 1.27 for the ZN simulation with an inference time of 0.46 ms per sample, compared to the current best of 1.20 with an inference time of approximately 109 ms. The latent FM model further improves the inference speed, reducing the sampling time to 0.026 ms per sample, with a minimal trade-off in accuracy. Similarly, our approach achieves a Wasserstein distance of 1.30 for the ZP simulation, outperforming the current best of 2.08. The source code is available at https://github.com/m-wojnar/faster_zdc.
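
The abstract does not say which flow matching variant is used, so the following is only a generic sketch, assuming the common straight-line path between a noise sample `x0` and a data sample `x1`. All names (`cfm_training_pair`, `cfm_loss`, `euler_sample`, `velocity_fn`) are hypothetical and unrelated to the linked repository.

```python
def cfm_training_pair(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data), plus its target velocity.

    Assumes the linear interpolation path x_t = (1 - t) * x0 + t * x1.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]  # d x_t / d t along the straight path
    return x_t, target


def cfm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted and the target velocity."""
    x_t, target = cfm_training_pair(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `velocity_fn` would be a trained neural network. Sampling cost scales with the number of integration steps and the size of that network, which is the general reason small FM models can be fast at inference time.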


【12】Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
Link: https://arxiv.org/abs/2507.18807

Authors: Felix Dangel, Derek Tam, Colin Raffel
Comments: 19 pages, 2 figures. Accepted as a spotlight poster at ICML 2025
Abstract: The diagonal of a model's Fisher Information Matrix (the "Fisher diagonal") has frequently been used as a way to measure parameter sensitivity. Typically, the Fisher diagonal is estimated via squared sampled gradients of the model's likelihood with respect to its parameters, averaged over a few hundred or thousand examples -- a process which incurs nontrivial computational costs. At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher diagonal can be obtained "for free" by recycling the squared gradient accumulator that has already been computed over the course of training. Through a comprehensive set of experiments covering five applications of the Fisher diagonal, we demonstrate that the "Squisher" (SQUared gradient accumulator as an approximation of the FISHER) consistently performs similarly to the Fisher diagonal while outperforming baseline methods. Additionally, we clarify the exact differences between the Squisher and the Fisher diagonal and provide empirical quantification of their respective impact.
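
To make the two quantities in the abstract concrete, here is a minimal sketch that computes a Fisher-diagonal-style estimate (the mean of squared per-example gradients) next to an Adam-style bias-corrected moving average of squared gradients. The function names and the plain-list gradient format are assumptions for illustration. This is not the authors' code, and it does not reproduce their analysis of the differences.

```python
def fisher_diagonal(per_example_grads):
    """Mean of element-wise squared per-example gradients.

    Assumes the gradients were already computed (for example, with sampled labels).
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def adam_second_moment(step_grads, beta2=0.999):
    """Adam-style exponential moving average of squared gradients, with bias correction.

    step_grads: one (mini-batch) gradient per training step; assumed non-empty.
    """
    v = [0.0] * len(step_grads[0])
    steps = 0
    for g in step_grads:
        steps += 1
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction undoes the pull toward the zero initialization
    return [vi / (1 - beta2 ** steps) for vi in v]
```

One general difference is visible even in this toy form. The accumulator squares whatever gradient the optimizer receives at each step, typically an average over a mini-batch, whereas the Fisher-style estimate squares each example's gradient before averaging. Per-example gradients of `[1.0]` and `[-1.0]` give a mean of squares of 1.0, but their average is zero, so squaring the averaged gradient gives 0.0.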


【13】Concept Probing: Where to Find Human-Defined Concepts (Extended Version)
Link: https://arxiv.org/abs/2507.18681

Authors: Sousa Ribeiro, Afonso Leote, João Leite
Comments: Extended version of the paper published in Proceedings of the International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025)
Abstract: Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.


【14】Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling
Link: https://arxiv.org/abs/2507.18671

Authors: , Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, Shixiang Song, Geng Yu, Zihua Zhao, Sitao Xie, Longxuan Wei, Xiangqi Jin, Xiaohan Qin, Jiale Ma, Kai Chen, Jiangchao Yao, Zhouhan Lin, Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang
Comments: Technical Report
Abstract: A large language model (LLM) with knowledge in both scientific and general tasks is the foundation of science general intelligence. However, directly continuing to pretrain an LLM on science data usually leads to catastrophic forgetting, which indicates severe degradation in general ability. In this report, we present Innovator, which solves this problem by upcycling a pre-trained dense LLM into a fine-grained Mixtures-of-Experts model during continued pretraining, where different experts are expected to learn science knowledge in different disciplines, and a shared expert is utilized for general tasks. Innovator introduces a four-stage upcycle training paradigm: (1) Scientific Expert Induction on discipline-specific data, (2) Fine-grained Expert Splitting via FFN dimension decomposition, (3) Science-Aware Routing warmup, and (4) Generalist-Scientist Integration training on hybrid datasets. Such a paradigm enables knowledge in the general domain and in different scientific disciplines to be decoupled, avoiding the negative influence among knowledge in different domains. With 53.3B total parameters and 13.3B activated, Innovator extends Qwen2.5-7B using a shared general expert and 64 specialized scientific experts with 8 activated. Trained on 300B tokens with tri-level quality-controlled data, Innovator achieves 25% average improvement across 30 scientific tasks with a win rate of 70%, while retaining 99% performance in general tasks. Furthermore, Innovator-Reason, which is post-trained from Innovator for reasoning boosting, exhibits excellent reasoning performance in solving complex scientific problems with improvements over 30%.


【15】Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces
Link: https://arxiv.org/abs/2507.19465

Authors: , Suvrit Sra
Abstract: We develop efficient algorithms for optimizing piecewise smooth (PWS) functions where the underlying partition of the domain into smooth pieces is unknown. For PWS functions satisfying a quadratic growth (QG) condition, we propose a bundle-level (BL) type method that achieves global linear convergence -- to our knowledge, the first such result for any algorithm for this problem class. We extend this method to handle approximately PWS functions and to solve weakly-convex PWS problems, improving the state-of-the-art complexity to match the benchmark for smooth non-convex optimization. Furthermore, we introduce the first verifiable and accurate termination criterion for PWS optimization. Similar to the gradient norm in smooth optimization, this certificate tightly characterizes the optimality gap under the QG condition, and can moreover be evaluated without knowledge of any problem parameters. We develop a search subroutine for this certificate and embed it within a guess-and-check framework, resulting in an almost parameter-free algorithm for both the convex QG and weakly-convex settings.


【16】Probably Approximately Correct Causal Discovery
Link: https://arxiv.org/abs/2507.18903

Authors: Somesh Jha, David Page
Abstract: The discovery of causal relationships is a foundational problem in artificial intelligence, statistics, epidemiology, economics, and beyond. While elegant theories exist for accurate causal discovery given infinite data, real-world applications are inherently resource-constrained. Effective methods for inferring causal relationships from observational data must perform well under finite data and time constraints, where "performing well" implies achieving high, though not perfect, accuracy. In his seminal paper A Theory of the Learnable, Valiant highlighted the importance of resource constraints in supervised machine learning, introducing the concept of Probably Approximately Correct (PAC) learning as an alternative to exact learning. Inspired by Valiant's work, we propose the Probably Approximately Correct Causal (PACC) Discovery framework, which extends PAC learning principles to the causal field. This framework emphasizes both computational and sample efficiency for established causal methods such as propensity score techniques and instrumental variable approaches. Furthermore, we show that it can also provide theoretical guarantees for other widely used methods, such as the Self-Controlled Case Series (SCCS) method, which had previously lacked such guarantees.
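
As background for the PAC notion referenced in the abstract, the classical sample-complexity bound for a finite hypothesis class is reproduced below. It is the standard textbook result for supervised learning, not a statement from the PACC paper. The symbols $\mathcal{H}$ (hypothesis class), $\varepsilon$ (error tolerance), $\delta$ (failure probability), and $m$ (sample size) are the usual conventions, assumed here.

```latex
% Classical PAC bound (realizable case, finite hypothesis class):
% any learner that outputs a hypothesis h consistent with m i.i.d. samples satisfies
\[
  m \;\ge\; \frac{1}{\varepsilon}\left( \ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta} \right)
  \quad\Longrightarrow\quad
  \Pr\bigl[\operatorname{err}(h) \le \varepsilon\bigr] \;\ge\; 1 - \delta .
\]
```

The bound shows the sense in which "high, though not perfect" accuracy is attainable with finite data: the required sample size grows only logarithmically in $\lvert\mathcal{H}\rvert$ and $1/\delta$, and linearly in $1/\varepsilon$.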


【17】Discovering the dynamics of Sargassum rafts' centers of mass
Link: https://arxiv.org/abs/2507.18771

Authors: J. Beron-Vera, Gage Bonner
Comments: Submitted to Chaos
Abstract: Since 2011, rafts of floating Sargassum seaweed have frequently obstructed the coasts of the Intra-Americas Seas. The motion of the rafts is represented by a high-dimensional nonlinear dynamical system. Referred to as the eBOMB model, this builds on the Maxey--Riley equation by incorporating interactions between clumps of Sargassum forming a raft and the effects of Earth's rotation. The absence of a predictive law for the rafts' centers of mass suggests a need for machine learning. In this paper, we evaluate and contrast Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) and Sparse Identification of Nonlinear Dynamics (SINDy). In both cases, a physics-inspired closure modeling approach rooted in eBOMB is taken. Specifically, the LSTM model learns a mapping from a collection of eBOMB variables to the difference between raft center-of-mass and ocean velocities. The SINDy model's library of candidate functions is suggested by eBOMB variables and includes windowed velocity terms incorporating far-field effects of the carrying flow. Both LSTM and SINDy models perform most effectively in conditions with tightly bonded clumps, despite declining precision with rising complexity, such as with wind effects and when assessing loosely connected clumps. The LSTM model delivered the best results when designs were straightforward, with fewer neurons and hidden layers. While the LSTM model serves as an opaque black-box model lacking interpretability, the SINDy model brings transparency by discerning explicit functional relationships through the function libraries. Integration of the windowed velocity terms enabled effective modeling of nonlocal interactions, particularly in datasets featuring sparsely connected rafts.
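
The SINDy idea mentioned in the abstract, selecting a few terms from a library of candidate functions, is commonly implemented with sequentially thresholded least squares. The sketch below illustrates that generic procedure on a toy one-dimensional system. The library `[1, x, x^2]`, the threshold value, and the helper names are all hypothetical and have nothing to do with the eBOMB-based library used in the paper.

```python
def solve_least_squares(A, y):
    """Solve min ||A w - y||^2 via the normal equations and Gauss-Jordan elimination.

    Assumes A has full column rank.
    """
    k = len(A[0])
    # Augmented normal system [A^T A | A^T y]
    M = [[sum(row[i] * row[j] for row in A) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(A, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: fit, drop small coefficients, refit."""
    k = len(theta[0])
    active = list(range(k))
    coef = [0.0] * k
    for _ in range(iters):
        if not active:
            break
        sub = [[row[j] for j in active] for row in theta]
        w = solve_least_squares(sub, dxdt)
        coef = [0.0] * k
        for j, wj in zip(active, w):
            coef[j] = wj
        new_active = [j for j in active if abs(coef[j]) >= threshold]
        if new_active == active:
            break  # support stopped changing
        active = new_active
    return [c if abs(c) >= threshold else 0.0 for c in coef]


# Toy data from dx/dt = -2 x, with the hypothetical library [1, x, x^2]
xs = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]
theta = [[1.0, x, x * x] for x in xs]
dxdt = [-2.0 * x for x in xs]
coefficients = stlsq(theta, dxdt)
```

On this toy data only the coefficient of `x` survives thresholding. The surviving terms spell out an explicit equation, and this is the transparency the abstract contrasts with the black-box LSTM.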


【18】Interpretable inverse design of optical multilayer thin films based on extended neural adjoint and regression activation mapping
Link: https://arxiv.org/abs/2507.18644

Authors: im, Jungho Kim
Abstract: We propose an extended neural adjoint (ENA) framework, which meets six key criteria for artificial intelligence-assisted inverse design of optical multilayer thin films (OMTs): accuracy, efficiency, diversity, scalability, flexibility, and interpretability. To enhance the scalability of the existing neural adjoint method, we present a novel forward neural network architecture for OMTs and introduce a material loss function into the existing neural adjoint loss function, facilitating the exploration of material configurations of OMTs. Furthermore, we present the detailed formulation of the regression activation mapping for the presented forward neural network architecture (F-RAM), a feature visualization method aimed at improving interpretability. We validated the efficacy of the material loss by conducting an ablation study, where each component of the loss function is systematically removed and evaluated. The results indicated that the inclusion of the material loss significantly improves accuracy and diversity. To substantiate the performance of the ENA-based inverse design, we compared it against the residual network-based global optimization network (Res-GLOnet). The ENA yielded the OMT solutions of an inverse design with higher accuracy and better diversity compared to the Res-GLOnet. To demonstrate the interpretability, we applied F-RAM to diverse OMT structures with similar optical properties, obtained by the proposed ENA method. We showed that distributions of feature importance for various OMT structures exhibiting analogous optical properties are consistent, despite variations in material configurations, layer number, and thicknesses. Furthermore, we demonstrate the flexibility of the ENA method by restricting the initial layer of OMTs to SiO2 and 100 nm.


Article URL: http://www.python88.com/topic/184906