Click "Read the original" to visit arxivdaily.com, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and more!
cs.LG: 141 papers today
Large-model related (23 papers)
【1】LLM-based Content Classification Approach for GitHub Repositories by the README Files
Link: https://arxiv.org/abs/2507.21899
Authors: ir Mehmood, Shahid Hussain, Wen Li Wang, Muhammad Usama Malik
Comments: 8 pages, 4 Figures
Abstract: GitHub is the world's most popular platform for storing, sharing, and managing code. Every GitHub repository has a README file associated with it. Per GitHub's recommendations, README files should contain project-related information to support the usage and improvement of repositories. However, repository owners sometimes neglect these recommendations, which prevents a repository from reaching its full potential. This research posits that the comprehensiveness of a GitHub repository's README file significantly influences its adoption and utilization, with a lack of detail potentially hindering its full potential for widespread engagement and impact within the research community. Large Language Models (LLMs) have shown strong performance on many text-based tasks, including text classification, generation, summarization, and translation. In this study, an approach is developed to fine-tune LLMs for automatically classifying different sections of GitHub README files. Three encoder-only LLMs are utilized: BERT, DistilBERT, and RoBERTa. These pre-trained models are fine-tuned on a gold-standard dataset consisting of 4,226 README file sections. This approach outperforms current state-of-the-art methods, achieving an overall F1 score of 0.98. Moreover, we investigate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and show that they offer an economical alternative to full fine-tuning without compromising much performance. The results demonstrate the potential of using LLMs to design an automatic classifier for categorizing the content of GitHub README files. Consequently, this study contributes to the development of automated tools that improve the identification and potential usage of GitHub repositories.
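The PEFT result above hinges on LoRA's low-rank update: the pretrained weight matrix W stays frozen and only two small factors B and A are trained, so the effective weight is W + B·A. A minimal sketch of that idea in plain Python; the dimensions and names below are illustrative, not taken from the paper:

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one linear layer."""
    full = d_out * d_in          # every entry of W is trainable
    lora = d_out * r + r * d_in  # only B (d_out x r) and A (r x d_in) are trainable
    return full, lora

def apply_lora(W, A, B):
    """Compute W + B @ A for plain nested-list matrices."""
    d_out, d_in, r = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
        for i in range(d_out)
    ]
```

At rank 8 on a 768x768 layer (BERT-base hidden size), the trainable parameters drop from roughly 590K to 12K per layer, which is the economy the abstract refers to.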
【2】Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences
Link: https://arxiv.org/abs/2507.21831
Authors: eich, Claudia Thoms, Tobias Schrimpf
Comments: 48 pages, 9 figures and 8 tables
Abstract: LLMs are seeing widespread use for task automation, including automated coding in the social sciences. However, even though researchers have proposed different prompting strategies, their effectiveness varies across LLMs and tasks, and trial-and-error practices remain widespread. We propose HALC, a general pipeline that allows for the systematic and reliable construction of optimal prompts for any given coding task and model, permitting the integration of any prompting strategy deemed relevant. To investigate LLM coding and validate our pipeline, we sent a total of 1,512 individual prompts to our local LLMs in over two million requests. We test prompting strategies and LLM task performance against a small set of expert codings (ground truth). Compared to these expert codings, we find prompts that code reliably for single variables (α_climate = .76; α_movement = .78) and across two variables (α_climate = .71; α_movement = .74) using the LLM Mistral NeMo. Our prompting strategies are set up to align the LLM with our codebook; we do not optimize the codebook for LLM friendliness. Our paper provides insights into the effectiveness of different prompting strategies, crucial influencing factors, and the identification of reliable prompts for each coding task and model.
【3】DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs
Link: https://arxiv.org/abs/2507.21653
Authors: Jun Hu, Bryan Hooi, Bingsheng He, Cheng Chen
Abstract: Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features, often rich in textual data, and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs' ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information into prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual-Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities: bi-level semantic abstraction for textual fields and statistical aggregation for numerical features, enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industrial datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.
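The coarse/fine split described above can be illustrated with a toy prompt builder: the target node keeps its raw text while neighbors are collapsed into summary statistics. The field names and prompt template below are hypothetical, not DGP's actual format:

```python
from statistics import mean

def build_dual_granularity_prompt(target, neighbors):
    """Assemble a prompt keeping the target node's full text (fine-grained)
    and compressing neighbors into summary statistics (coarse-grained).
    Field names and the template are illustrative, not DGP's exact format."""
    amounts = [n["amount"] for n in neighbors]
    summary = (
        f"{len(neighbors)} neighbors; "
        f"avg transaction {mean(amounts):.1f}, max {max(amounts)}; "
        f"{sum(n['flagged'] for n in neighbors)} previously flagged"
    )
    return (
        f"Target account description: {target['text']}\n"
        f"Neighborhood summary: {summary}\n"
        f"Question: is the target account fraudulent? Answer yes or no."
    )
```

However many hops are expanded, the neighbor side of the prompt stays a few tokens long, which is how the token budget remains manageable.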
【4】Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation
Link: https://arxiv.org/abs/2507.21563
Authors: Nguyen, Bao Nguyen, Ha Lan N.T., Tuan Anh Hoang, Duc-Trong Le, Dung D. Le
Abstract: Recommendation systems often suffer from data sparsity caused by limited user-item interactions, which degrades their performance and amplifies popularity bias in real-world scenarios. This paper proposes a novel data augmentation framework that leverages Large Language Models (LLMs) and item textual descriptions to enrich interaction data. By few-shot prompting LLMs multiple times to rerank items and aggregating the results via majority voting, we generate high-confidence synthetic user-item interactions, supported by theoretical guarantees based on the concentration of measure. To effectively leverage the augmented data in the context of a graph recommendation system, we integrate it into a graph contrastive learning framework to mitigate distributional shift and alleviate popularity bias. Extensive experiments show that our method improves accuracy and reduces popularity bias, outperforming strong baselines.
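The aggregation step described above, reranking several times and keeping only majority-supported items, can be sketched in a few lines. The top-k cutoff and vote threshold below are illustrative choices, not values from the paper:

```python
from collections import Counter

def majority_vote_topk(rerankings, k=3, threshold=0.5):
    """Keep items appearing in the top-k of at least `threshold` of the LLM
    reranking runs; these become the high-confidence synthetic interactions.
    `k` and `threshold` are illustrative, not the paper's settings."""
    votes = Counter()
    for ranking in rerankings:
        votes.update(ranking[:k])  # one vote per run for each top-k item
    need = threshold * len(rerankings)
    return [item for item, v in votes.most_common() if v >= need]
```

Items that only occasionally surface in a single noisy reranking fall below the vote threshold and are discarded, which is what makes the surviving interactions high-confidence.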
【5】Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Link: https://arxiv.org/abs/2507.21509
Authors: en, Andy Arditi, Henry Sleight, Owain Evans, Jack Lindsey
Abstract: Large language models interact with users through a simulated "Assistant" persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
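A common baseline for a "direction in activation space" is the difference of mean activations between trait-expressing and neutral prompts, with monitoring done by projecting new activations onto that direction. The sketch below uses this difference-of-means construction as an assumption; it is not necessarily the paper's exact extraction procedure:

```python
def mean_vec(vecs):
    """Elementwise mean of a list of equal-length activation vectors."""
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def persona_vector(trait_acts, neutral_acts):
    """Difference-of-means direction separating trait-expressing activations
    from neutral ones. A standard baseline, assumed here for illustration."""
    mu_t, mu_n = mean_vec(trait_acts), mean_vec(neutral_acts)
    return [a - b for a, b in zip(mu_t, mu_n)]

def projection(act, direction):
    """Monitoring signal: larger projections onto the persona vector indicate
    stronger expression of the trait in the current activation."""
    dot = sum(a * d for a, d in zip(act, direction))
    norm = sum(d * d for d in direction) ** 0.5
    return dot / norm
```

Tracking this projection over deployment-time activations is one way to realize the "monitor fluctuations" use case; steering would subtract a multiple of the direction instead.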
【6】Evaluation and Benchmarking of LLM Agents: A Survey
Link: https://arxiv.org/abs/2507.21504
Authors: ohammadi, Yipeng Li, Jane Lo, Wendy Yip
Abstract: The rise of LLM-based agents has opened new frontiers in AI applications, yet evaluating these agents remains a complex and underdeveloped area. This survey provides an in-depth overview of the emerging field of LLM agent evaluation, introducing a two-dimensional taxonomy that organizes existing work along (1) evaluation objectives (what to evaluate, such as agent behavior, capabilities, reliability, and safety) and (2) evaluation process (how to evaluate, including interaction modes, datasets and benchmarks, metric computation methods, and tooling). In addition to the taxonomy, we highlight enterprise-specific challenges, such as role-based access to data, the need for reliability guarantees, dynamic and long-horizon interactions, and compliance, which are often overlooked in current research. We also identify future research directions, including holistic, more realistic, and scalable evaluation. This work aims to bring clarity to the fragmented landscape of agent evaluation and provide a framework for systematic assessment, enabling researchers and practitioners to evaluate LLM agents for real-world deployment.
【7】Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning
Link: https://arxiv.org/abs/2507.21494
Authors: ao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He
Comments: Accepted by ICCV 2025
Abstract: Test-time adaptation with pre-trained vision-language models has gained increasing attention for addressing distribution shifts during testing. Among these approaches, memory-based algorithms stand out due to their training-free nature and ability to leverage historical test data. However, existing test-time adaptation methods are typically designed for a single domain with abundant data. In decentralized settings such as federated learning, applying these methods individually to each client suffers from limited test data, while directly sharing a single global memory via the server prevents proper personalization to each client's unique distribution. To address this, we propose Latte, a novel framework in which each client maintains a local memory to store embeddings from its own historical test data and an external memory to store class prototypes from other relevant clients. During communication, each client retrieves prototypes from similar clients under the server's coordination to expand its memory. For local adaptation, Latte utilizes both embedding similarity and uncertainty to enhance model performance. Our theoretical analysis shows that Latte effectively leverages in-distribution clients while remaining robust to out-of-distribution clients. Extensive experiments on domain adaptation and corruption benchmarks validate that Latte achieves superior performance in decentralized settings, while introducing only negligible communication and computation costs. Our code is available at https://github.com/baowenxuan/Latte.
【8】Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations
Link: https://arxiv.org/abs/2507.21274
Authors: Alireza Bagheri Garakani, Tianchen Zhou, Zhishen Huang, Yan Gao
Abstract: In recommendation systems, diversity and novelty are essential for capturing varied user preferences and encouraging exploration, yet many systems prioritize click relevance. While reinforcement learning (RL) has been explored to improve diversity, it often depends on random exploration that may not align with user interests. We propose LAAC (LLM-guided Adversarial Actor Critic), a novel method that leverages large language models (LLMs) as reference policies to suggest novel items, while training a lightweight policy to refine these suggestions using system-specific data. The method formulates training as a bilevel optimization between actor and critic networks, enabling the critic to selectively favor promising novel actions and the actor to improve its policy beyond LLM recommendations. To mitigate overestimation of unreliable LLM suggestions, we apply regularization that anchors critic values for unexplored items close to well-estimated dataset actions. Experiments on real-world datasets show that LAAC outperforms existing baselines in diversity, novelty, and accuracy, while remaining robust on imbalanced data, effectively integrating LLM knowledge without expensive fine-tuning.
【9】Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications
Link: https://arxiv.org/abs/2507.21199
Authors: , Hongcan Guo, Guoshun Nan, Jiaoyang Cui, Haoting Qian, Yihan Lin, Yilin Peng, Diyang Zhang, Yanzhao Hou, Huici Wu, Xiaofeng Tao, Tony Q.S. Quek
Comments: Accepted by IEEE JSAC. This work has been submitted to the IEEE for possible publication
Abstract: Interactive multimodal applications (IMAs), such as route planning in the Internet of Vehicles, enrich users' personalized experiences by integrating various forms of data over wireless networks. Recent advances in large language models (LLMs) utilize mixture-of-experts (MoE) mechanisms to empower multiple IMAs, with each LLM trained individually for a specific task that presents a different business workflow. In contrast to existing approaches that rely on multiple LLMs for IMAs, this paper presents a novel paradigm that accomplishes various IMAs using a single compositional LLM over wireless networks. The two primary challenges are 1) guiding a single LLM to adapt to diverse IMA objectives and 2) ensuring the flexibility and efficiency of the LLM in resource-constrained mobile environments. To tackle the first challenge, we propose ContextLoRA, a novel method that guides an LLM to learn the rich structured context among IMAs by constructing a task dependency graph. We partition the learnable parameter matrices of neural layers for each IMA to facilitate LLM composition. Then, we develop a step-by-step fine-tuning procedure guided by task relations, including training, freezing, and masking phases. This allows the LLM to learn to reason among tasks for better adaptation, capturing the latent dependencies between tasks. For the second challenge, we introduce ContextGear, a scheduling strategy that optimizes the training procedure of ContextLoRA, aiming to minimize computational and communication costs through a strategic grouping mechanism. Experiments on three benchmarks show the superiority of the proposed ContextLoRA and ContextGear. Furthermore, we prototype the proposed paradigm on a real-world wireless testbed, demonstrating its practical applicability for various IMAs. We will release our code to the community.
【10】Uncovering Gradient Inversion Risks in Practical Language Model Training
Link: https://arxiv.org/abs/2507.21198
Authors: ng, Zhongkui Ma, Zihan Wang, Eu Joe Chegne, Mengyao Ma, Alsharif Abuadbba, Guangdong Bai
Comments: 15 pages, 5 figures, 10 tables. Accepted by ACM CCS 2024
Abstract: The gradient inversion attack has been demonstrated as a significant privacy threat to federated learning (FL), particularly in continuous domains such as vision models. In contrast, it is often considered less effective or highly dependent on impractical training settings when applied to language models, due to the challenges posed by the discrete nature of tokens in text data. As a result, its potential privacy threats remain largely underestimated, despite FL being an emerging training method for language models. In this work, we propose a domain-specific gradient inversion attack named Grab (gradient inversion with hybrid optimization). Grab features two alternating optimization processes to address the challenges caused by practical training settings: a simultaneous optimization of dropout masks between layers for improved token recovery, and a discrete optimization for effective token sequencing. Grab can recover a significant portion (up to a 92.9% recovery rate) of the private training data, outperforming the strategy of discrete optimization with an auxiliary model by notable margins of up to 28.9% recovery rate in benchmark settings and 48.5% recovery rate in practical settings. Grab provides a valuable step forward in understanding this privacy threat in the emerging FL training mode of language models.
【11】Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs
Link: https://arxiv.org/abs/2507.21193
Authors: hatzimiltis, Mohammad Shojafar, Mahdi Boloursaz Mashhadi, Rahim Tafazolli
Abstract: Next-generation Radio Access Networks (RANs) introduce programmability, intelligence, and near real-time control through intelligent controllers, enabling enhanced security within the RAN and across broader 5G/6G infrastructures. This paper presents a comprehensive survey highlighting opportunities, challenges, and research gaps for Large Language Model (LLM)-assisted explainable (XAI) intrusion detection (IDS) for secure future RAN environments. Motivated by this, we propose an LLM-interpretable anomaly-based detection system for distributed denial-of-service (DDoS) attacks using multivariate time-series key performance measures (KPMs), extracted from E2 nodes, within the Near Real-Time RAN Intelligent Controller (Near-RT RIC). An LSTM-based model is trained to identify malicious User Equipment (UE) behavior based on these KPMs. To enhance transparency, we apply post-hoc local explainability methods such as LIME and SHAP to interpret individual predictions. Furthermore, LLMs are employed to convert technical explanations into natural-language insights accessible to non-expert users. Experimental results on real 5G network KPMs demonstrate that our framework achieves high detection accuracy (F1-score > 0.96) while delivering actionable and interpretable outputs.
【12】Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs
Link: https://arxiv.org/abs/2507.21188
Authors: nan Vijayaraj
Abstract: LLMs for clinical decision support often fail under small but clinically meaningful input shifts, such as masking a symptom or negating a finding, despite high performance on static benchmarks. These reasoning failures frequently go undetected by standard NLP metrics, which are insensitive to the latent representation shifts that drive diagnosis instability. We propose a geometry-aware evaluation framework, LAPD (Latent Agentic Perturbation Diagnostics), which systematically probes the latent robustness of clinical LLMs under structured adversarial edits. Within this framework, we introduce the Latent Diagnosis Flip Rate (LDFR), a model-agnostic diagnostic signal that captures representational instability when embeddings cross decision boundaries in PCA-reduced latent space. Clinical notes are generated using a structured prompting pipeline grounded in diagnostic reasoning, then perturbed along four axes (masking, negation, synonym replacement, and numeric variation) to simulate common ambiguities and omissions. We compute LDFR across both foundation and clinical LLMs, finding that latent fragility emerges even under minimal surface-level changes. Finally, we validate our findings on 90 real clinical notes from the DiReCT benchmark (MIMIC-IV), confirming the generalizability of LDFR beyond synthetic settings. Our results reveal a persistent gap between surface robustness and semantic stability, underscoring the importance of geometry-aware auditing in safety-critical clinical AI.
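The flip-rate idea can be sketched as: embed each note before and after perturbation, classify both in the reduced latent space, and count how often the predicted side of the decision boundary changes. The linear boundary below stands in for whatever classifier operates in the PCA-reduced space; all names are illustrative, not the paper's code:

```python
def predict(emb, w, b):
    """Linear decision rule in the (already dimension-reduced) latent space."""
    return int(sum(wi * xi for wi, xi in zip(w, emb)) + b > 0)

def latent_diagnosis_flip_rate(originals, perturbed, w, b):
    """Fraction of note embeddings whose diagnosis-side of the boundary flips
    after perturbation. A hypothetical stand-in for LDFR: a stable model keeps
    this near zero under clinically equivalent edits."""
    flips = sum(
        predict(o, w, b) != predict(p, w, b)
        for o, p in zip(originals, perturbed)
    )
    return flips / len(originals)
```

Because the signal is computed from embeddings rather than surface text, it can register instability that string-overlap metrics miss.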
【13】EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models
Link: https://arxiv.org/abs/2507.21184
Authors: n, Xiangyu Wang, Jianzhu Ma, Yitao Liang
Abstract: Scaling laws are fundamental mathematical relationships that predict how neural network performance evolves with changes in variables such as model size, dataset size, and computational resources. Traditionally, discovering these laws requires extensive human expertise and manual experimentation. We introduce EvoSLD, an automated framework for Scaling Law Discovery (SLD) that leverages evolutionary algorithms guided by Large Language Models (LLMs) to co-evolve symbolic expressions and their optimization routines. Formulated to handle scaling variables, control variables, and response metrics across diverse experimental settings, EvoSLD searches for parsimonious, universal functional forms that minimize fitting errors on grouped data subsets. Evaluated on five real-world scenarios from recent literature, EvoSLD rediscovers exact human-derived laws in two cases and surpasses them in others, achieving up to orders-of-magnitude reductions in normalized mean squared error on held-out test sets. Compared to baselines like symbolic regression and ablated variants, EvoSLD demonstrates superior accuracy, interpretability, and efficiency, highlighting its potential to accelerate AI research. Code is available at https://github.com/linhaowei1/SLD.
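For reference, the simplest functional form in this search space, a single power law loss = a * size^b, can be fitted in closed form by linear regression in log-log space. This is a minimal illustration of what "fitting a scaling law" means, not EvoSLD's algorithm:

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**b in log-log space,
    i.e. log(loss) = log(a) + b * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - b * mx)
    return a, b
```

EvoSLD's contribution is searching over far richer symbolic forms (and their fitting routines) than this fixed two-parameter family, while still preferring parsimonious expressions.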
【14】LLM-Adapted Interpretation Framework for Machine Learning Models
Link: https://arxiv.org/abs/2507.21179
Authors: Zihan Hu, Weiteng Zhang, Weihao Xie, Jianwei Shuai, Xian Shen, Zhen Feng
Comments: 11 pages, 8 figures, 2 tables
Abstract: Background & Aims: High-performance machine learning models like XGBoost are often "black boxes," limiting their clinical adoption due to a lack of interpretability. This study aims to bridge the gap between predictive accuracy and narrative transparency for sarcopenia risk assessment. Methods: We propose the LLM-Adapted Interpretation Framework (LAI-ML), a novel knowledge distillation architecture. LAI-ML transforms feature attributions from a trained XGBoost model into a probabilistic format using specialized techniques (HAGA and CACS). A Large Language Model (LLM), guided by a reinforcement learning loop and case-based retrieval, then generates data-faithful diagnostic narratives. Results: The LAI-ML framework achieved 83% prediction accuracy, 13% higher than the baseline XGBoost model. Notably, the LLM not only replicated the teacher model's logic but also corrected its predictions in 21.7% of discordant cases, demonstrating enhanced reasoning. Conclusion: LAI-ML effectively translates opaque model predictions into trustworthy and interpretable clinical insights, offering a deployable solution to the "black-box" problem in medical AI.
【15】Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question
Link: https://arxiv.org/abs/2507.21168
Authors: sales, Santiago Miret
Abstract: Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. In both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on boolq, strategyqa, and pubmedqa show that question interpretation diversity consistently leads to better ensemble accuracy than model diversity. Furthermore, our analysis of GPT and LLaMa shows that model diversity typically produces results between the best and the worst ensemble members, without clear improvement.
【16】AGORA: Incentivizing Group Emergence Capability in LLMs via Group Distillation
Link: https://arxiv.org/abs/2507.21166
Authors: g, Ben Wang, Shuifa Sun
Abstract: Progress in complex reasoning is constrained by the static nature of current training datasets. We propose structured interaction as a new scaling axis, moving beyond the prevailing paradigm of increasing model parameters. Our self-evolving framework, AGORA, enables a collaborative ensemble to achieve reasoning performance exceeding state-of-the-art monolithic systems by up to 4.45 percentage points on challenging mathematical benchmarks. This gain stems from group emergent ability: the synthesis of collective capabilities unattainable by isolated models, validating interaction as a scalable driver of intelligence. Our results position the engineering of collaborative ecosystems as a vital frontier for capability emergence.
【17】Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems
标题:大型语言模型支持的主动配电网调度问题的自动化建模和优化
链接:https://arxiv.org/abs/2507.21162
作者:Chenhui Lin, Yue Yang, Qi Wang, Haotian Liu, Haizhou Hua, Wenchuan Wu
摘要:随着分布式能源向有源配电网的渗透,有效的有源配电网调度势在必行。然而,许多新集成的ADN运营商,如配电系统集成商,虚拟电厂管理者和终端生产者,往往缺乏电力系统运行,建模,优化和编程方面的专业知识。这种知识差距使得依赖人类专家既费钱又费时。为了应对这一挑战,实现智能,灵活的ADN调度,本文提出了一个大语言模型(LLM)驱动的自动建模和优化方法。首先,ADN调度问题分解成顺序的阶段,和多LLM协调架构的设计。该框架包括一个信息提取器,一个问题制定者,和一个代码程序员,分别负责信息检索,优化问题制定和代码实现。之后,为每个LLM代理开发定制的细化技术,大大提高了生成内容的准确性和可靠性。所提出的方法具有以用户为中心的界面,使ADN运营商能够通过简单的自然语言查询获得调度策略,消除技术障碍并提高效率。各种测试用例的全面比较和端到端的演示验证了所提出的架构和方法的有效性。
摘要:The increasing penetration of distributed energy resources into active distribution networks (ADNs) has made effective ADN dispatch imperative. However, the numerous newly-integrated ADN operators, such as distribution system aggregators, virtual power plant managers, and end prosumers, often lack specialized expertise in power system operation, modeling, optimization, and programming. This knowledge gap renders reliance on human experts both costly and time-intensive. To address this challenge and enable intelligent, flexible ADN dispatch, this paper proposes a large language model (LLM) powered automated modeling and optimization approach. First, the ADN dispatch problems are decomposed into sequential stages, and a multi-LLM coordination architecture is designed. This framework comprises an Information Extractor, a Problem Formulator, and a Code Programmer, tasked with information retrieval, optimization problem formulation, and code implementation, respectively. Afterwards, tailored refinement techniques are developed for each LLM agent, greatly improving the accuracy and reliability of generated content. The proposed approach features a user-centric interface that enables ADN operators to derive dispatch strategies via simple natural language queries, eliminating technical barriers and increasing efficiency. Comprehensive comparisons and end-to-end demonstrations on various test cases validate the effectiveness of the proposed architecture and methods.
【18】Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity
标题:自适应集群协作提升LLM医疗决策支持能力
链接:https://arxiv.org/abs/2507.21159
作者:ng, Liuxin Bao, Shengyuan Liu, Yixuan Yuan
摘要:大型语言模型(LLM)的协作性已被证明在自然语言处理系统中是有效的,为医疗保健发展带来了可观的前景。然而,它缺乏明确的组件选择规则,需要人为干预或临床特定的验证。此外,现有架构严重依赖于预定义的LLM集群,其中部分LLM在医疗决策支持场景中表现不佳,使LLM的协作性失效。为此,我们提出了一种自适应集群协作方法,通过自多样性和交叉一致性最大化机制来提升LLM的医疗决策支持能力。对于自多样性,我们计算LLM内成对输出的模糊匹配值作为其自多样性值,随后以无需训练(training-free)的方式优先选择具有高自多样性值的LLM作为集群组件。对于交叉一致性,我们首先测量具有最高自多样性值的LLM与其他LLM之间的交叉一致性值,然后逐渐屏蔽具有最低交叉一致性值的LLM,以消除协作传播过程中潜在的不一致输出。在两个专业医疗数据集NEJMQA和MMLU-Pro-health上进行的广泛实验证明了我们的方法在面向医生的专科上的有效性。例如,在NEJMQA上,我们的方法在所有学科上的准确率均达到官方公布的通过分数,特别是在妇产科学科上实现了65.47%的ACC,而GPT-4为56.12%。
摘要:The collaborativeness of large language models (LLMs) has proven effective in natural language processing systems, holding considerable promise for healthcare development. However, it lacks explicit component selection rules, necessitating human intervention or clinical-specific validation. Moreover, existing architectures heavily rely on a predefined LLM cluster, where partial LLMs underperform in medical decision support scenarios, invalidating the collaborativeness of LLMs. To this end, we propose an adaptive cluster collaborativeness methodology involving self-diversity and cross-consistency maximization mechanisms to boost LLMs medical decision support capacity. For the self-diversity, we calculate the fuzzy matching value of pairwise outputs within an LLM as its self-diversity value, subsequently prioritizing LLMs with high self-diversity values as cluster components in a training-free manner. For the cross-consistency, we first measure cross-consistency values between the LLM with the highest self-diversity value and others, and then gradually mask out the LLM having the lowest cross-consistency value to eliminate the potential inconsistent output during the collaborative propagation. Extensive experiments on two specialized medical datasets, NEJMQA and MMLU-Pro-health, demonstrate the effectiveness of our method across physician-oriented specialties. For example, on NEJMQA, our method achieves the accuracy rate up to the publicly official passing score across all disciplines, especially achieving ACC of 65.47\% compared to the 56.12\% achieved by GPT-4 on the Obstetrics and Gynecology discipline.
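The self-diversity step can be approximated with standard fuzzy string matching. A sketch using Python's difflib as the matcher (the paper does not specify its fuzzy-matching implementation or aggregation, so both are assumptions here):

```python
from itertools import combinations
from difflib import SequenceMatcher

def self_diversity(outputs):
    """Average pairwise dissimilarity (1 - fuzzy match ratio) among one
    model's outputs to the same prompt. Treating higher dissimilarity as
    higher self-diversity is our reading of the abstract."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(1 - SequenceMatcher(None, a, b).ratio()
               for a, b in pairs) / len(pairs)

# Hypothetical candidate models and their sampled outputs.
candidates = {
    "model_a": ["The answer is 12.", "The answer is 12.", "The answer is 12."],
    "model_b": ["It is 12.", "Likely 14.", "Hard to say; maybe 13."],
}
# Prioritize high self-diversity models as cluster components.
ranked = sorted(candidates, key=lambda m: self_diversity(candidates[m]),
                reverse=True)
```

The cross-consistency mask would then compare the top-ranked model's outputs against each remaining candidate with the same matcher.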
【19】TRIDENT: Benchmarking LLM Safety in Finance, Medicine, and Law
标题:TRIDENT:金融、医学和法律领域LLM安全基准
链接:https://arxiv.org/abs/2507.21134
作者:, Yijiang River Dong, Ehsan Shareghi, Nigel Collier
摘要:随着大型语言模型(LLM)越来越多地部署在法律、金融和医学等高风险领域,系统地评估其特定领域的安全性和合规性变得至关重要。虽然以前的工作主要集中在提高这些领域的LLM性能,但它往往忽视了特定领域安全风险的评估。为了弥合这一差距,我们首先根据AMA医学伦理原则、ABA职业行为示范规则和CFA协会道德准则为LLM定义特定领域的安全原则。在此基础上,我们引入了Trident-Bench,这是一个专门针对法律、金融和医疗领域LLM安全的基准。我们在Trident-Bench上评估了19个通用和领域专用模型,并表明它有效地揭示了关键的安全差距:强大的通用模型(例如GPT、Gemini)可以满足基本的期望,而领域专用模型经常难以把握微妙的道德细微差别。这突出了对更细粒度的特定领域安全改进的迫切需要。通过引入Trident-Bench,我们的工作为研究法律和金融领域的LLM安全性提供了首批系统性资源之一,并为未来旨在降低在专业监管领域部署LLM的安全风险的研究奠定了基础。代码和基准测试将在https://github.com/zackhuiiiii/TRIDENT发布。
摘要:As large language models (LLMs) are increasingly deployed in high-risk domains such as law, finance, and medicine, systematically evaluating their domain-specific safety and compliance becomes critical. While prior work has largely focused on improving LLM performance in these domains, it has often neglected the evaluation of domain-specific safety risks. To bridge this gap, we first define domain-specific safety principles for LLMs based on the AMA Principles of Medical Ethics, the ABA Model Rules of Professional Conduct, and the CFA Institute Code of Ethics. Building on this foundation, we introduce Trident-Bench, a benchmark specifically targeting LLM safety in the legal, financial, and medical domains. We evaluated 19 general-purpose and domain-specialized models on Trident-Bench and show that it effectively reveals key safety gaps -- strong generalist models (e.g., GPT, Gemini) can meet basic expectations, whereas domain-specialized models often struggle with subtle ethical nuances. This highlights an urgent need for finer-grained domain-specific safety improvements. By introducing Trident-Bench, our work provides one of the first systematic resources for studying LLM safety in law and finance, and lays the groundwork for future research aimed at reducing the safety risks of deploying LLMs in professionally regulated fields. Code and benchmark will be released at: https://github.com/zackhuiiiii/TRIDENT
【20】Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses
标题:您能相信LLM做出改变生活的决定吗?对人工智能高风险响应的调查
链接:https://arxiv.org/abs/2507.21132
作者:rian Cahyono, Saran Subramanian
摘要:越来越多的人向大型语言模型(LLM)咨询高风险的生活建议,但它们缺乏标准的保障措施来防止给出自信却误导性的回应。这就产生了谄媚和过度自信的风险。本文通过三个实验研究了这些故障模式:(1)多项选择评估,以衡量模型对用户压力的稳定性;(2)使用新的安全类型学和LLM评判器的自由回答分析;(3)通过操纵"高风险"激活向量来引导模型行为的机制可解释性实验。我们的研究结果表明,虽然一些模型表现出谄媚,但其他模型如o4-mini仍然很稳健。表现最好的模型通过经常提出澄清性问题而获得高安全分数,这是安全、好奇的方法的关键特征,而不是发布规定性建议。此外,我们证明了模型的谨慎性可以通过激活引导直接控制,这为安全对齐提出了一条新路径。这些发现强调需要细致入微、多方面的基准,以确保LLM在改变生活的决定上值得信任。
摘要:Large Language Models (LLMs) are increasingly consulted for high-stakes life advice, yet they lack standard safeguards against providing confident but misguided responses. This creates risks of sycophancy and over-confidence. This paper investigates these failure modes through three experiments: (1) a multiple-choice evaluation to measure model stability against user pressure; (2) a free-response analysis using a novel safety typology and an LLM Judge; and (3) a mechanistic interpretability experiment to steer model behavior by manipulating a "high-stakes" activation vector. Our results show that while some models exhibit sycophancy, others like o4-mini remain robust. Top-performing models achieve high safety scores by frequently asking clarifying questions, a key feature of a safe, inquisitive approach, rather than issuing prescriptive advice. Furthermore, we demonstrate that a model's cautiousness can be directly controlled via activation steering, suggesting a new path for safety alignment. These findings underscore the need for nuanced, multi-faceted benchmarks to ensure LLMs can be trusted with life-changing decisions.
【21】RATE: An LLM-Powered Retrieval Augmented Generation Technology-Extraction Pipeline
标题:RATE:一个LLM驱动的检索增强生成技术提取管道
链接:https://arxiv.org/abs/2507.21125
作者:hosseini, Arya Aftab, Alireza Sheikh
备注:9 pages, 4 figures, 1 table
摘要:在一个彻底的技术变革时代,技术地图在加强决策方面发挥着至关重要的作用。这些地图在很大程度上依赖于技术提取的自动化方法。本文介绍了检索增强技术提取(RATE),一个大语言模型(LLM)为基础的管道,从科学文献中自动提取技术。RATE将检索增强生成(RAG)与基于多定义LLM的验证相结合。这种混合方法在候选生成中具有高召回率,同时在候选过滤中具有高精度。虽然该管道的设计具有通用性和广泛适用性,但我们在678篇研究文章中展示了其使用,这些文章重点关注脑机接口(BCI)和延展实境(XR)作为案例研究。最后,将RATE验证的科技术语映射到共现网络中,揭示了研究景观的主题集群和结构特征。为了进行评价,专家们挑选了70篇随机文章,整理了一个黄金标准技术数据集。此外,使用基于Transformers的双向编码器表示(BERT)的技术提取模型作为比较方法。RATE的F1评分为91.27%,显著优于BERT的F1评分53.73%。我们的研究结果突出了定义驱动的LLM方法用于技术提取和映射的承诺。它们还为BCI-XR领域的新兴趋势提供了新的见解。源代码可从https://github.com/AryaAftab/RATE获得
摘要:In an era of radical technology transformations, technology maps play a crucial role in enhancing decision making. These maps heavily rely on automated methods of technology extraction. This paper introduces Retrieval Augmented Technology Extraction (RATE), a Large Language Model (LLM) based pipeline for automated technology extraction from scientific literature. RATE combines Retrieval Augmented Generation (RAG) with multi-definition LLM-based validation. This hybrid method results in high recall in candidate generation alongside high precision in candidate filtering. While the pipeline is designed to be general and widely applicable, we demonstrate its use on 678 research articles focused on Brain-Computer Interfaces (BCIs) and Extended Reality (XR) as a case study. Consequently, the technology terms validated by RATE were mapped into a co-occurrence network, revealing thematic clusters and structural features of the research landscape. For the purpose of evaluation, a gold standard dataset of technologies in 70 randomly selected articles was curated by the experts. In addition, a technology extraction model based on Bidirectional Encoder Representations from Transformers (BERT) was used as a comparative method. RATE achieved an F1-score of 91.27%, significantly outperforming BERT with an F1-score of 53.73%. Our findings highlight the promise of definition-driven LLM methods for technology extraction and mapping. They also offer new insights into emerging trends within the BCI-XR field. The source code is available at https://github.com/AryaAftab/RATE
【22】Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
标题:重振你的MNEME:通过稀疏模型差分预测LLM遗忘和微调的副作用
链接:https://arxiv.org/abs/2507.21084
作者:ssem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi
摘要:大型语言模型(LLM)经常被微调或遗忘(unlearning),以适应新任务或消除不受欢迎的行为。虽然现有的评估方法评估此类干预后的表现,但仍然没有用于检测非预期副作用的通用方法,例如遗忘生物学内容会降低化学任务的表现,特别是当这些影响不可预测或突现时。为了解决这个问题,我们介绍了MNEME(Model diffiNg for Evaluating Mechanistic Effects),一个使用稀疏模型差分识别这些副作用的轻量级框架。MNEME在不访问微调数据的情况下,在与任务无关的数据(例如The Pile、LMSYS-Chat-1M)上比较基础模型和微调模型,以隔离行为变化。应用于三种场景下的五个LLM:WMDP知识遗忘、突现失准和良性微调,MNEME在预测副作用方面的准确率高达95%,与已知基准一致,并且不需要自定义启发式规则。此外,我们表明,对高激活样本的再训练可以部分逆转这些影响。我们的研究结果表明,稀疏探测和差分为微调引起的模型变化提供了一个可扩展且自动化的视角,为理解和管理LLM行为提供了实用的工具。
摘要:Large language models (LLMs) are frequently fine-tuned or unlearned to adapt to new tasks or eliminate undesirable behaviors. While existing evaluation methods assess performance after such interventions, there remains no general approach for detecting unintended side effects, such as unlearning biology content degrading performance on chemistry tasks, particularly when these effects are unpredictable or emergent. To address this issue, we introduce MNEME, Model diffiNg for Evaluating Mechanistic Effects, a lightweight framework for identifying these side effects using sparse model diffing. MNEME compares base and fine-tuned models on task-agnostic data (for example, The Pile, LMSYS-Chat-1M) without access to fine-tuning data to isolate behavioral shifts. Applied to five LLMs across three scenarios: WMDP knowledge unlearning, emergent misalignment, and benign fine-tuning, MNEME achieves up to 95 percent accuracy in predicting side effects, aligning with known benchmarks and requiring no custom heuristics. Furthermore, we show that retraining on high-activation samples can partially reverse these effects. Our results demonstrate that sparse probing and diffing offer a scalable and automated lens into fine-tuning-induced model changes, providing practical tools for understanding and managing LLM behavior.
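A crude stand-in for the behavioral comparison MNEME performs is to measure how much a fine-tuned model's output distributions drift from the base model's on task-agnostic prompts. The sketch below uses average KL divergence purely as an illustration; MNEME itself relies on sparse model diffing, not plain KL:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def behavioral_shift(base_probs, tuned_probs):
    """Mean KL divergence between base- and fine-tuned-model output
    distributions over a set of task-agnostic prompts."""
    return sum(kl(p, q) for p, q in zip(base_probs, tuned_probs)) / len(base_probs)

# Toy next-token distributions on two task-agnostic prompts.
base  = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
tuned = [[0.5, 0.4, 0.1], [0.5, 0.3, 0.2]]
shift = behavioral_shift(base, tuned)  # > 0 means behavior moved
```

Prompts where the shift concentrates would then flag candidate side effects worth inspecting.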
【23】Dialogic Social Learning for Artificial Agents: Enhancing LLM Ontology Acquisition through Mixed-Initiative Educational Interactions
标题:人工代理的对话式社会学习:通过混合主动式教育互动增强LLM本体习得
链接:https://arxiv.org/abs/2507.21065
作者:atania, Luca Annese, Cansu Koyuturk, Azzurra Ruggeri, Dimitri Ognibene
备注:submitted to ICSR2025
摘要:大型语言模型(LLM)在处理大量离线数据集方面表现出了卓越的能力。然而,它们在获取和整合复杂的在线知识方面往往面临挑战。传统的人工智能训练范式主要基于监督学习或强化学习,反映了独立探索的"皮亚杰式"模型。这些方法通常依赖于大型数据集和稀疏的反馈信号,限制了模型从交互中有效学习的能力。受维果茨基社会文化理论的启发,本研究探讨了社会中介学习范式在解决这些局限性方面的潜力。我们引入了一个称为"AI Social Gym"的动态环境,其中AI学习者代理与知识渊博的AI教师代理进行二元教学对话。这些互动强调以外部的、结构化的对话作为知识获取的核心机制,与仅依赖内部推理或模式识别的方法形成对比。我们的研究集中在本体习得的背景下,考察不同的教学策略如何影响AI学习过程。实证结果表明,这种对话式方法,特别是将自上而下的解释与学习者发起的提问相结合的混合主动式互动,显著提高了LLM获取和应用新知识的能力,既优于单向教学方法,也优于直接访问训练数据集中常见格式的结构化知识。这些研究结果表明,将教学法和心理学见解融入人工智能和机器人训练中可以大大提高训练后的知识获取和响应质量。这种方法为提示工程(prompt engineering)等现有策略提供了一条补充途径。
摘要:Large Language Models (LLMs) have demonstrated remarkable capabilities in processing extensive offline datasets. However, they often face challenges in acquiring and integrating complex knowledge online. Traditional AI training paradigms, predominantly based on supervised learning or reinforcement learning, mirror a 'Piagetian' model of independent exploration. These approaches typically rely on large datasets and sparse feedback signals, limiting the models' ability to learn efficiently from interactions. Drawing inspiration from Vygotsky's sociocultural theory, this study explores the potential of socially mediated learning paradigms to address these limitations. We introduce a dynamic environment, termed the 'AI Social Gym', where an AI learner agent engages in dyadic pedagogical dialogues with knowledgeable AI teacher agents. These interactions emphasize external, structured dialogue as a core mechanism for knowledge acquisition, contrasting with methods that depend solely on internal inference or pattern recognition. Our investigation focuses on how different pedagogical strategies impact the AI learning process in the context of ontology acquisition. Empirical results indicate that such dialogic approaches, particularly those involving mixed-direction interactions combining top-down explanations with learner-initiated questioning, significantly enhance the LLM's ability to acquire and apply new knowledge, outperforming both unidirectional instructional methods and direct access to structured knowledge in formats typically present in training datasets. These findings suggest that integrating pedagogical and psychological insights into AI and robot training can substantially improve post-training knowledge acquisition and response quality. This approach offers a complementary pathway to existing strategies like prompt engineering.
Graph相关(图学习|图神经网络|图优化等)(4篇)
【1】Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring
标题:基于扭矩的图手术:通过分层重连增强图神经网络
链接:https://arxiv.org/abs/2507.21422
作者:ng, Lele Fu, Zhen Cui, Tong Zhang, Na Song, Bo Huang
摘要:图神经网络(GNN)已经成为从图结构数据中学习的强大工具,利用消息传递来扩散信息和更新节点表示。然而,许多工作表明,图中编码的原生交互可能对这一过程不友好,从而推动了图重连方法的发展。在这项工作中,我们提出了一种受经典力学中扭矩概念启发的扭矩驱动分层重连策略,动态调制消息传递,以改善异嗜图中的表示学习,并增强对噪声图的鲁棒性。具体来说,我们定义了一个干扰感知的扭矩度量,它集成了结构距离和能量分数来量化边引起的扰动,从而鼓励每个节点从其最近的低能量邻居聚合信息。我们使用该度量分层地重新配置每一层的感受野,审慎地修剪高扭矩的边并添加低扭矩的连接,抑制传播噪声并增强相关信号。在基准数据集上的广泛评估表明,我们的方法在异嗜和同嗜图上都超越了最先进的方法,并在噪声图上保持高精度。
摘要:Graph Neural Networks (GNNs) have emerged as powerful tools for learning from graph-structured data, leveraging message passing to diffuse information and update node representations. However, most efforts have suggested that native interactions encoded in the graph may not be friendly for this process, motivating the development of graph rewiring methods. In this work, we propose a torque-driven hierarchical rewiring strategy, inspired by the notion of torque in classical mechanics, dynamically modulating message passing to improve representation learning in heterophilous graphs and enhance robustness against noisy graphs. Specifically, we define an interference-aware torque metric that integrates structural distance and energy scores to quantify the perturbation induced by edges, thereby encouraging each node to aggregate information from its nearest low-energy neighbors. We use the metric to hierarchically reconfigure the receptive field of each layer by judiciously pruning high-torque edges and adding low-torque links, suppressing propagation noise and boosting pertinent signals. Extensive evaluations on benchmark datasets show that our approach surpasses state-of-the-art methods on both heterophilous and homophilous graphs, and maintains high accuracy on noisy graphs.
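The pruning rule can be illustrated with the torque analogy the abstract describes: score each edge by combining its structural distance and energy, then drop the highest-torque edges. The multiplicative combination and the toy numbers below are our assumptions:

```python
def torque(distance, energy):
    """Torque-style perturbation score for an edge: structural distance
    acting as the lever arm, energy as the force. The multiplicative
    combination is our assumption, not the paper's exact metric."""
    return distance * energy

# Edges as (u, v, structural_distance, energy_score) -- toy values.
edges = [(0, 1, 1.0, 0.2), (0, 2, 3.0, 0.9), (1, 2, 1.0, 0.1)]
scores = {(u, v): torque(d, e) for u, v, d, e in edges}

# Rewiring step: prune the highest-torque edge, keep low-torque links.
pruned = max(scores, key=scores.get)
kept = [e for e in scores if e != pruned]
```

In the paper this reconfiguration is applied per layer, so each layer's receptive field is rewired hierarchically rather than once globally.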
【2】Exploring Adaptive Structure Learning for Heterophilic Graphs
标题:探索异嗜图的自适应结构学习
链接:https://arxiv.org/abs/2507.21191
作者:hik
备注:Initially submitted this draft at Tiny ICLR 2025
摘要:图卷积网络(GCN)在图表示学习中获得了广泛关注,最近的研究关注于在各种现实应用中提升异嗜图上的性能。典型消息传递范式中的局部特征聚合阻碍了捕获同类非局部节点之间的远程依赖关系。异嗜图中固有的连通性结构往往与同类中相距较远的节点之间的信息共享相冲突。我们提出在浅层GCN内部通过结构学习重连边,以避免由于过度平滑而导致下游判别任务的性能下降。参数化邻接矩阵以学习非局部节点之间的连接并扩展浅层GCN的跳数范围,有利于捕获远程依赖关系。然而,我们的方法在不同异嗜图之间不可泛化,且节点分类任务的表现随图结构而不一致。
摘要:Graph Convolutional Networks (GCNs) gained traction for graph representation learning, with recent attention on improving performance on heterophilic graphs for various real-world applications. The localized feature aggregation in a typical message-passing paradigm hinders the capturing of long-range dependencies between non-local nodes of the same class. The inherent connectivity structure in heterophilic graphs often conflicts with information sharing between distant nodes of same class. We propose structure learning to rewire edges in shallow GCNs itself to avoid performance degradation in downstream discriminative tasks due to oversmoothing. Parameterizing the adjacency matrix to learn connections between non-local nodes and extend the hop span of shallow GCNs facilitates the capturing of long-range dependencies. However, our method is not generalizable across heterophilic graphs and performs inconsistently on node classification task contingent to the graph structure.
【3】Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals
标题:超越神经网络:基于小波逻辑图信号的符号推理
链接:https://arxiv.org/abs/2507.21190
作者:ruluta, Andreas Lemos, Priscilla Burity
摘要:我们提出了一个基于图拉普拉斯小波变换(GLWT)的完全非神经学习框架。与依赖卷积、递归或基于注意力的神经网络的传统架构不同,我们的模型纯粹在图谱域中运行,使用结构化多尺度滤波、非线性收缩以及作用于小波系数的符号逻辑。在图节点上定义的信号通过GLWT分解,用可解释的非线性进行调制,并重新组合用于去噪和词元分类等下游任务。该系统通过作用于图小波激活的符号领域特定语言(DSL)支持组合推理。在合成图去噪和语言词元图上的实验表明,与轻量级GNN相比,该方法具有竞争力的性能,且透明度和效率远高于后者。这项工作为图学习提出了一种原则性、可解释且资源高效的深度神经架构替代方案。
摘要:We present a fully non-neural learning framework based on Graph Laplacian Wavelet Transforms (GLWT). Unlike traditional architectures that rely on convolutional, recurrent, or attention-based neural networks, our model operates purely in the graph spectral domain using structured multiscale filtering, nonlinear shrinkage, and symbolic logic over wavelet coefficients. Signals defined on graph nodes are decomposed via GLWT, modulated with interpretable nonlinearities, and recombined for downstream tasks such as denoising and token classification. The system supports compositional reasoning through a symbolic domain-specific language (DSL) over graph wavelet activations. Experiments on synthetic graph denoising and linguistic token graphs demonstrate competitive performance against lightweight GNNs with far greater transparency and efficiency. This work proposes a principled, interpretable, and resource-efficient alternative to deep neural architectures for learning on graphs.
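A graph Laplacian wavelet transform of the kind GLWT builds on can be sketched via the Laplacian's eigendecomposition: filter the graph Fourier coefficients with a scaled band-pass kernel at several scales. The kernel g(s*lam) = s*lam*exp(-s*lam) below is a common illustrative choice, not necessarily the paper's:

```python
import numpy as np

def glwt(signal, laplacian, scales):
    """Graph wavelet coefficients by spectral filtering: for each scale s,
    apply the band-pass kernel g(s*lam) = s*lam*exp(-s*lam) to the
    Laplacian spectrum. Kernel choice is an assumption for illustration."""
    lam, U = np.linalg.eigh(laplacian)           # graph Fourier basis
    out = []
    for s in scales:
        g = s * lam * np.exp(-s * lam)           # g(0) = 0: kills the DC mode
        out.append(U @ (g * (U.T @ signal)))     # filter in the spectral domain
    return np.stack(out)

# Path graph on 4 nodes and an alternating (high-frequency) signal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, -1.0, 1.0, -1.0])
W = glwt(x, L, scales=[0.5, 1.0, 2.0])           # shape (n_scales, n_nodes)
```

Nonlinear shrinkage (e.g. soft-thresholding the coefficients in W) would then sit between this analysis step and the recombination step the abstract describes.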
【4】Graph neural networks for residential location choice: connection to classical logit models
标题:用于住宅选址选择的图神经网络:与经典logit模型的连接
链接:https://arxiv.org/abs/2507.21334
作者:Cheng, Lingqian Hu, Yuheng Bu, Yuqi Zhou, Shenhao Wang
摘要:研究人员采用深度学习进行经典的离散选择分析,因为它可以捕获复杂的特征关系并实现更高的预测性能。然而,现有的深度学习方法不能明确地捕捉选择方案之间的关系,这一直是经典离散选择模型的一个长期关注的问题。为了解决这一问题,本文引入图神经网络(GNN)作为一种新的框架来分析住宅区位选择。基于GNN的离散选择模型(GNN-DCMs)为神经网络提供了一种结构化的方法来捕获空间选择之间的依赖关系,同时保持与经典随机效用理论的清晰联系。从理论上讲,我们证明了GNN-DCMs将嵌套logit(NL)模型和空间相关logit(SCL)模型作为两个特定的情况下,产生新的算法解释通过替代品的效用之间的消息传递。从经验上讲,GNN-DCMs在预测芝加哥77个社区中的住宅位置选择方面优于基准MNL,SCL和前馈神经网络。关于模型解释,GNN-DCMs可以捕获个体异质性并表现出空间感知的替代模式。总的来说,这些结果突出了GNN-DCMs作为一个统一的表达框架的潜力,用于在复杂的空间选择背景下协同离散选择建模和深度学习。
摘要:Researchers have adopted deep learning for classical discrete choice analysis as it can capture complex feature relationships and achieve higher predictive performance. However, the existing deep learning approaches cannot explicitly capture the relationship among choice alternatives, which has been a long-lasting focus in classical discrete choice models. To address the gap, this paper introduces Graph Neural Network (GNN) as a novel framework to analyze residential location choice. The GNN-based discrete choice models (GNN-DCMs) offer a structured approach for neural networks to capture dependence among spatial alternatives, while maintaining clear connections to classical random utility theory. Theoretically, we demonstrate that the GNN-DCMs incorporate the nested logit (NL) model and the spatially correlated logit (SCL) model as two specific cases, yielding novel algorithmic interpretation through message passing among alternatives' utilities. Empirically, the GNN-DCMs outperform benchmark MNL, SCL, and feedforward neural networks in predicting residential location choices among Chicago's 77 community areas. Regarding model interpretation, the GNN-DCMs can capture individual heterogeneity and exhibit spatially-aware substitution patterns. Overall, these results highlight the potential of GNN-DCMs as a unified and expressive framework for synergizing discrete choice modeling and deep learning in the complex spatial choice contexts.
Transformer(5篇)
【1】Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline
标题:通过呼吸信号进行高效疼痛识别:单交叉注意力Transformer多窗口融合管道
链接:https://arxiv.org/abs/2507.21886
作者:Gkikas, Ioannis Kyprakis, Manolis Tsiknakis
摘要:疼痛是一种影响大部分人口的复杂病症。准确和一致的评估对经历疼痛的个人至关重要,并支持开发有效和先进的管理策略。自动疼痛评估系统提供持续监测并支持临床决策,旨在减少痛苦和防止功能下降。本研究已提交给第二届下一代疼痛评估多模态感知大挑战(AI4PAIN)。所提出的方法引入了一个以呼吸作为输入信号的管道,并结合了高效的交叉注意力Transformer以及多窗口策略。大量实验表明,呼吸是一种有价值的疼痛评估生理模态。此外,实验表明,紧凑高效的模型如果经过适当优化,可以实现强大的性能,往往超过更大的模型。所提出的多窗口方法有效地捕捉短期和长期特征以及全局特征,从而提高了模型的表示能力。
摘要:Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the \textit{Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN)}. The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.
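The multi-windowing idea, i.e. feeding the model short, long, and global views of the respiration signal, can be sketched as simple overlapping slicing (window sizes and the 50% overlap are illustrative assumptions, not the paper's settings):

```python
def multi_window(signal, sizes):
    """Slice a 1-D signal into overlapping windows of several lengths so a
    model sees short- and long-term context plus a global view."""
    views = {}
    for w in sizes:
        step = max(1, w // 2)  # 50% overlap (illustrative)
        views[w] = [signal[i:i + w]
                    for i in range(0, len(signal) - w + 1, step)]
    views["global"] = [signal]  # full-sequence view
    return views

resp = list(range(16))  # stand-in for a respiration trace
views = multi_window(resp, sizes=[4, 8])
```

Each view would then be embedded and fused by the cross-attention transformer described above.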
【2】Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer
标题:解锁RF感知的可解释性:一种复值白盒Transformer
链接:https://arxiv.org/abs/2507.21799
作者:, Yina Wang, Chenshu Wu
摘要:深度学习的经验性成功推动了其在射频(RF)领域的应用,促成了深度无线感知(DWS)的重大进展。然而,大多数现有的DWS模型以可解释性有限的黑盒形式运行,这阻碍了它们的泛化能力,并在安全敏感的物理应用中引发担忧。在这项工作中,受白盒Transformer显著进展的启发,我们提出了RF-CRATE,这是第一个数学上可解释的RF感知深度网络架构,基于复稀疏率降低原理。为了适应独特的RF信号,我们进行了非平凡的理论推导,将原有的实值白盒Transformer扩展到复数域。通过利用CR微积分(CR-Calculus)框架,我们成功构建了一个完全复值的白盒Transformer,其自注意力和残差多层感知器模块均由理论推导得出。此外,为了提高模型从有限无线数据中提取判别特征的能力,我们引入了子空间正则化,这是一种增强特征多样性的新型正则化策略,使多个感知任务的性能平均提高了19.98%。我们使用多个公开和自采集的数据集,在涉及不同RF信号的七个基线上对RF-CRATE进行了广泛评估。结果表明,RF-CRATE实现了与精心设计的黑盒模型相当的性能,同时提供了完整的数学可解释性。更重要的是,通过将CRATE扩展到复数域,RF-CRATE带来了实质性的改进:与CRATE相比,在不同感知任务中实现了5.08%的平均分类增益,并将回归误差降低了10.34%。RF-CRATE在https://github.com/rfcrate/RF_CRATE上完全开源。
摘要:The empirical success of deep learning has spurred its application to the radio-frequency (RF) domain, leading to significant advances in Deep Wireless Sensing (DWS). However, most existing DWS models function as black boxes with limited interpretability, which hampers their generalizability and raises concerns in security-sensitive physical applications. In this work, inspired by the remarkable advances of white-box transformers, we present RF-CRATE, the first mathematically interpretable deep network architecture for RF sensing, grounded in the principles of complex sparse rate reduction. To accommodate the unique RF signals, we conduct non-trivial theoretical derivations that extend the original real-valued white-box transformer to the complex domain. By leveraging the CR-Calculus framework, we successfully construct a fully complex-valued white-box transformer with theoretically derived self-attention and residual multi-layer perceptron modules. Furthermore, to improve the model's ability to extract discriminative features from limited wireless data, we introduce Subspace Regularization, a novel regularization strategy that enhances feature diversity, resulting in an average performance improvement of 19.98% across multiple sensing tasks. We extensively evaluate RF-CRATE against seven baselines with multiple public and self-collected datasets involving different RF signals. The results show that RF-CRATE achieves performance on par with thoroughly engineered black-box models, while offering full mathematical interpretability. More importantly, by extending CRATE to the complex domain, RF-CRATE yields substantial improvements, achieving an average classification gain of 5.08% and reducing regression error by 10.34% across diverse sensing tasks compared to CRATE. RF-CRATE is fully open-sourced at: https://github.com/rfcrate/RF_CRATE.
【3】Bubbleformer: Forecasting Boiling with Transformers
标题:Bubbleformer:用Transformer预测沸腾
链接:https://arxiv.org/abs/2507.21244
作者: Shakeel Hassan, Xianwei Zou, Akash Dhruv, Vishwanath Ganesan, Aparna Chandramowlishwaran
备注:39 pages, 13 figures, Submitted to NeurIPS 2025
摘要:沸腾建模(一种固有混沌的多相过程,是能源和热力系统的核心)对神经PDE代理模型仍然是一个重大挑战。现有模型在推理时需要未来的输入(例如气泡位置),因为它们无法从过去的状态中学习成核,限制了其自主预测沸腾动力学的能力。它们也无法模拟流动沸腾速度场,其中尖锐的界面-动量耦合需要长程和方向性的归纳偏置。我们介绍Bubbleformer,一个基于Transformer的时空模型,无需在推理过程中依赖模拟数据,即可预测稳定和长程的沸腾动力学,包括成核、界面演化和传热。Bubbleformer集成了分解的轴向注意力和频率感知缩放,并以热物理参数为条件,以泛化到不同的流体、几何形状和操作条件。为了评估混沌系统中的物理保真度,我们提出了可解释的基于物理的度量,评估热通量一致性、界面几何形状和质量守恒。我们还发布了BubbleML 2.0,这是一个高保真数据集,涵盖了各种工作流体(低温流体、制冷剂、介电液体)、沸腾配置(池沸腾和流动沸腾)、流动状态(泡状流、段塞流、环状流)和边界条件。Bubbleformer在两相沸腾流的预测和预报方面都树立了新的基准。
摘要:Modeling boiling (an inherently chaotic, multiphase process central to energy and thermal systems) remains a significant challenge for neural PDE surrogates. Existing models require future input (e.g., bubble positions) during inference because they fail to learn nucleation from past states, limiting their ability to autonomously forecast boiling dynamics. They also fail to model flow boiling velocity fields, where sharp interface-momentum coupling demands long-range and directional inductive biases. We introduce Bubbleformer, a transformer-based spatiotemporal model that forecasts stable and long-range boiling dynamics including nucleation, interface evolution, and heat transfer without dependence on simulation data during inference. Bubbleformer integrates factorized axial attention, frequency-aware scaling, and conditions on thermophysical parameters to generalize across fluids, geometries, and operating conditions. To evaluate physical fidelity in chaotic systems, we propose interpretable physics-based metrics that evaluate heat-flux consistency, interface geometry, and mass conservation. We also release BubbleML 2.0, a high-fidelity dataset that spans diverse working fluids (cryogens, refrigerants, dielectrics), boiling configurations (pool and flow boiling), flow regimes (bubbly, slug, annular), and boundary conditions. Bubbleformer sets new benchmark results in both prediction and forecasting of two-phase boiling flows.
【4】Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers
标题:Contrast-CAT:通过对比激活增强基于Transformer的文本分类器的可解释性
链接:https://arxiv.org/abs/2507.21186
作者:an, Jeonghyun Lee, Sangkyun Lee
摘要:Transformers对人工智能研究产生了深远的影响,但解释它们的决定仍然具有挑战性-即使是相对简单的任务,如分类-这阻碍了现实世界应用程序的信任和安全部署。虽然基于激活的归因方法有效地解释了基于transformer的文本分类模型,但我们的研究结果表明,这些方法可能会被激活中的类无关特征破坏,导致不太可靠的解释。为了解决这个问题,我们提出了对比CAT,一种新的激活对比度为基础的归因方法,通过过滤掉类无关的功能,细化标记级的属性。通过对比输入序列的激活与参考激活,对比CAT生成更清晰,更忠实的归因图。在各种数据集和模型上的实验结果证实,Contrast-CAT始终优于最先进的方法。值得注意的是,在MoRF设置下,与大多数竞争方法相比,它在AOPC中实现了x1.30的平均改进,在LOdds中实现了x2.25的平均改进,证明了它在增强基于transformer的文本分类的可解释性方面的有效性。
摘要:Transformers have profoundly influenced AI research, but explaining their decisions remains challenging -- even for relatively simpler tasks such as classification -- which hinders trust and safe deployment in real-world applications. Although activation-based attribution methods effectively explain transformer-based text classification models, our findings reveal that these methods can be undermined by class-irrelevant features within activations, leading to less reliable interpretations. To address this limitation, we propose Contrast-CAT, a novel activation contrast-based attribution method that refines token-level attributions by filtering out class-irrelevant features. By contrasting the activations of an input sequence with reference activations, Contrast-CAT generates clearer and more faithful attribution maps. Experimental results across various datasets and models confirm that Contrast-CAT consistently outperforms state-of-the-art methods. Notably, under the MoRF setting, it achieves average improvements of x1.30 in AOPC and x2.25 in LOdds over the most competing methods, demonstrating its effectiveness in enhancing interpretability for transformer-based text classification.
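The core contrast operation, i.e. subtracting reference activations to suppress class-irrelevant features before aggregating token scores, can be sketched as follows (a simplified reading; Contrast-CAT's actual formulation differs in detail):

```python
import numpy as np

def contrast_attribution(activations, reference):
    """Token-level attribution: subtract reference activations to suppress
    features shared with the reference (class-irrelevant), then aggregate
    the remaining positive evidence per token."""
    contrast = activations - reference
    return np.maximum(contrast, 0.0).sum(axis=-1)

# 3 tokens x 2 hidden features, with a toy reference input's activations.
acts = np.array([[0.9, 0.1],
                 [0.2, 0.6],
                 [0.1, 0.1]])
ref = np.full_like(acts, 0.1)
scores = contrast_attribution(acts, ref)
top_token = int(np.argmax(scores))  # token contributing most evidence
```

A token whose activations match the reference everywhere (like the third row above) receives zero attribution, which is exactly the filtering effect the abstract describes.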
【5】Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification
标题:用于医学图像分类的视觉Transformer与卷积神经网络的比较分析
链接:https://arxiv.org/abs/2507.21156
作者:adkar
备注:9 pages, 8 figures, 3 tables. Submitted to IEEE Access
摘要:Vision Transformers(ViTs)的出现彻底改变了计算机视觉,但与传统的卷积神经网络(CNN)相比,它们在医学成像中的有效性仍未得到充分探索。这项研究对CNN和ViT架构在三个关键医学成像任务上进行了全面的比较分析:胸部X射线肺炎检测、脑肿瘤分类和皮肤癌黑色素瘤检测。我们在总计8469张医学图像的数据集上评估了四种最先进的模型:ResNet-50、EfficientNet-B0、ViT-Base和DeiT-Small。我们的结果显示了特定任务的模型优势:ResNet-50在胸部X射线分类上达到了98.37%的准确率,DeiT-Small在脑肿瘤检测上表现出色,准确率为92.16%,EfficientNet-B0则以81.84%的准确率在皮肤癌分类中领先。这些发现为从业者选择医疗AI应用的架构提供了重要的见解,突出了临床决策支持系统中特定任务架构选择的重要性。
摘要:The emergence of Vision Transformers (ViTs) has revolutionized computer vision, yet their effectiveness compared to traditional Convolutional Neural Networks (CNNs) in medical imaging remains under-explored. This study presents a comprehensive comparative analysis of CNN and ViT architectures across three critical medical imaging tasks: chest X-ray pneumonia detection, brain tumor classification, and skin cancer melanoma detection. We evaluated four state-of-the-art models - ResNet-50, EfficientNet-B0, ViT-Base, and DeiT-Small - across datasets totaling 8,469 medical images. Our results demonstrate task-specific model advantages: ResNet-50 achieved 98.37% accuracy on chest X-ray classification, DeiT-Small excelled at brain tumor detection with 92.16% accuracy, and EfficientNet-B0 led skin cancer classification at 81.84% accuracy. These findings provide crucial insights for practitioners selecting architectures for medical AI applications, highlighting the importance of task-specific architecture selection in clinical decision support systems.
GAN|对抗|攻击|生成相关(9篇)
【1】Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
标题:教我欺骗:通过知识蒸馏探索对抗可迁移性
链接:https://arxiv.org/abs/2507.21992
作者:a Pradhan, Shikshya Shiwakoti, Neha Bathuri
备注:10 pages, 4 figures
摘要:我们研究了来自多个异构教师模型的知识蒸馏(KD)是否可以增强可迁移对抗样本的生成。使用两种KD策略训练轻量级学生模型:基于课程的切换和联合优化,以ResNet50和DenseNet-161作为教师。然后,经过训练的学生使用FG、FGS和PGD攻击生成对抗样本,并在黑盒目标模型(GoogLeNet)上进行评估。我们的研究结果表明,从多个教师蒸馏得到的学生模型实现了与基于集成的基线相当的攻击成功率,同时将对抗样本生成时间最多减少至六分之一。消融研究进一步表明,较低的温度设置和包含硬标签监督可显著提高可迁移性。这些发现表明,KD不仅可以作为一种模型压缩技术,而且可以作为提高黑盒对抗攻击效率和有效性的强大工具。
摘要:We investigate whether knowledge distillation (KD) from multiple heterogeneous teacher models can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies: curriculum-based switching and joint optimization, with ResNet50 and DenseNet-161 as teachers. The trained student is then used to generate adversarial examples using FG, FGS, and PGD attacks, which are evaluated against a black-box target model (GoogLeNet). Our results show that student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines, while reducing adversarial example generation time by up to a factor of six. An ablation study further reveals that lower temperature settings and the inclusion of hard-label supervision significantly enhance transferability. These findings suggest that KD can serve not only as a model compression technique but also as a powerful tool for improving the efficiency and effectiveness of black-box adversarial attacks.
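摘要未附实现细节;下面用纯Python粗略示意多教师蒸馏中常见的“平均教师软标签 + KL散度”损失项(函数名与温度取值均为本文的假设,并非原论文代码):

```python
import math

def softmax(logits, temperature=1.0):
    """对一组logits做温度缩放softmax。"""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """学生分布与多个教师软标签平均分布之间的KL散度。"""
    p_student = softmax(student_logits, temperature)
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    k = len(student_logits)
    # 按类别对各教师的软标签取平均
    p_teacher = [sum(tp[i] for tp in teacher_probs) / len(teacher_probs)
                 for i in range(k)]
    # KL(teacher || student),按蒸馏惯例乘以 T^2
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl
```

实际训练中该项通常还会与真实标签的交叉熵(即摘要提到的硬标签监督)加权组合。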
【2】TempRe: Template generation for single and direct multi-step retrosynthesis
标题:TempRe:用于单步和直接多步逆合成的模板生成
链接:https://arxiv.org/abs/2507.21762
作者:an-Vu, Daniel Armstrong, Zlatko Joncev, Philippe Schwaller
摘要:由于巨大而复杂的化学反应空间,逆合成规划仍然是分子发现中的核心挑战。虽然传统的基于模板的方法提供了易处理性,但它们具有较差的可扩展性和有限的泛化能力,并且无模板生成方法存在生成无效反应的风险。在这项工作中,我们提出了TempRe,一个生成框架,重新制定基于模板的方法作为序列生成,使可扩展的,灵活的,化学上合理的逆合成。我们在单步和多步逆合成任务中评估了TempRe,证明了其优于模板分类和基于SMILES的生成方法。在PaRoutes多步基准测试中,TempRe实现了强大的top-k路由准确性。此外,我们将TempRe扩展到直接多步合成路线生成,为传统的单步和基于搜索的方法提供了一种轻量级和高效的替代方案。这些结果突出了潜在的模板生成建模作为一个强大的范例,在计算机辅助合成规划。
摘要:Retrosynthesis planning remains a central challenge in molecular discovery due to the vast and complex chemical reaction space. While traditional template-based methods offer tractability, they suffer from poor scalability and limited generalization, and template-free generative approaches risk generating invalid reactions. In this work, we propose TempRe, a generative framework that reformulates template-based approaches as sequence generation, enabling scalable, flexible, and chemically plausible retrosynthesis. We evaluated TempRe across single-step and multi-step retrosynthesis tasks, demonstrating its superiority over both template classification and SMILES-based generation methods. On the PaRoutes multi-step benchmark, TempRe achieves strong top-k route accuracy. Furthermore, we extend TempRe to direct multi-step synthesis route generation, providing a lightweight and efficient alternative to conventional single-step and search-based approaches. These results highlight the potential of template generative modeling as a powerful paradigm in computer-aided synthesis planning.
【3】Zero-Shot Machine Unlearning with Proxy Adversarial Data Generation
标题:基于代理对抗数据生成的Zero-Shot机器非学习
链接:https://arxiv.org/abs/2507.21738
作者:Chen, Tianqing Zhu, Xin Yu, Wanlei Zhou
备注:Accepted by IJCAI 2025
摘要:机器非学习旨在从训练模型中去除特定样本的影响。这个过程中的一个关键挑战是过度遗忘,即由于模型参数的变化,模型在剩余数据上的性能显著下降。现有的非学习算法依赖剩余数据来防止这个问题。因此,这些方法在只有待遗忘样本可用(即zero-shot非学习)的更实际场景中并不适用。本文提出了一种新的框架ZS-PAG来填补这一空白。我们的方法提供了三个关键创新:(1)我们通过生成对抗样本来近似不可访问的剩余数据;(2)利用生成的样本,我们精确定位特定的子空间来执行遗忘过程,从而防止在具有挑战性的zero-shot场景中过度遗忘;(3)考虑非学习过程对剩余样本的影响,设计了一种基于影响的伪标记策略。因此,我们的方法进一步提高了非学习后模型的性能。该方法具有理论保证,在各种基准上的实验验证了我们所提出方法相对于多个基线的有效性和优越性。
摘要:Machine unlearning aims to remove the influence of specific samples from a trained model. A key challenge in this process is over-unlearning, where the model's performance on the remaining data significantly drops due to the change in the model's parameters. Existing unlearning algorithms depend on the remaining data to prevent this issue. As such, these methods are inapplicable in a more practical scenario, where only the unlearning samples are available (i.e., zero-shot unlearning). This paper presents a novel framework, ZS-PAG, to fill this gap. Our approach offers three key innovations: (1) we approximate the inaccessible remaining data by generating adversarial samples; (2) leveraging the generated samples, we pinpoint a specific subspace to perform the unlearning process, therefore preventing over-unlearning in the challenging zero-shot scenario; and (3) we consider the influence of the unlearning process on the remaining samples and design an influence-based pseudo-labeling strategy. As a result, our method further improves the model's performance after unlearning. The proposed method holds a theoretical guarantee, and experiments on various benchmarks validate the effectiveness and superiority of our proposed method over several baselines.
【4】Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training
标题:无需额外训练即可加速扩散策略的检索增强生成
链接:https://arxiv.org/abs/2507.21452
作者:n Odonchimed, Tatsuya Matsushima, Simon Holk, Yusuke Iwasawa, Yutaka Matsuo
摘要:扩散策略(Diffusion Policies,DPs)因其在各种模仿学习任务中能够显著提高精度而受到广泛关注。然而,DP依赖于扩散模型,需要多个去噪步骤才能生成单个动作,导致生成时间长。为了解决这个问题,已经提出了基于知识蒸馏的方法,如一致性策略(Consistency Policy,CP)。然而,这些方法需要大量的训练时间,特别是对于困难任务。在这项研究中,我们提出了RAGDP(用于扩散策略的检索增强生成)这一新框架,它利用知识库加速预训练DP的推理,而无需额外训练。具体来说,RAGDP通过DP编码器对观测-动作对进行编码,构建专家演示的向量数据库。在推理过程中,嵌入当前观测,并检索最相似的专家动作。该检索到的动作与中间去噪步骤相结合,以减少相比原始扩散过程所需的步骤数量。我们表明,将RAGDP与基础模型和现有加速方法结合使用,可以在无需额外训练的情况下改善精度与速度的权衡。即使将模型加速20倍,RAGDP在准确性方面仍保持优势,比CP等蒸馏模型提高了7%。
摘要:Diffusion Policies (DPs) have attracted attention for their ability to achieve significant accuracy improvements in various imitation learning tasks. However, DPs depend on Diffusion Models, which require multiple noise removal steps to generate a single action, resulting in long generation times. To solve this problem, knowledge distillation-based methods such as Consistency Policy (CP) have been proposed. However, these methods require a significant amount of training time, especially for difficult tasks. In this study, we propose RAGDP (Retrieve-Augmented Generation for Diffusion Policies) as a novel framework that eliminates the need for additional training using a knowledge base to expedite the inference of pre-trained DPs. In concrete, RAGDP encodes observation-action pairs through the DP encoder to construct a vector database of expert demonstrations. During inference, the current observation is embedded, and the most similar expert action is extracted. This extracted action is combined with an intermediate noise removal step to reduce the number of steps required compared to the original diffusion step. We show that by using RAGDP with the base model and existing acceleration methods, we improve the accuracy and speed trade-off with no additional training. Even when accelerating the models 20 times, RAGDP maintains an advantage in accuracy, with a 7% increase over distillation models such as CP.
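作为示意,以下纯Python片段按摘要的描述粗略演示RAGDP的检索步骤:将当前观测的嵌入与专家演示数据库逐一比较余弦相似度,取最相似条目对应的动作(数据结构与函数名均为本文假设):

```python
import math

def cosine(u, v):
    """两个向量的余弦相似度。"""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_expert_action(obs_embedding, database):
    """在专家演示数据库中检索与当前观测嵌入最相似的动作。
    database 为 (观测嵌入, 动作) 对的列表,由DP编码器离线构建。"""
    best = max(database, key=lambda pair: cosine(obs_embedding, pair[0]))
    return best[1]
```

检索到的动作随后会注入去噪过程的中间步骤,以减少剩余的去噪步数;该部分依赖具体的扩散模型,此处不再展开。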
【5】Cascading and Proxy Membership Inference Attacks
标题:级联和代理成员推断攻击
链接:https://arxiv.org/abs/2507.21412
作者:, Jiacheng Li, Yuetian Chen, Kaiyuan Zhang, Zhizhen Yuan, Hanshen Xiao, Bruno Ribeiro, Ninghui Li
备注:Our code is available at: this https URL
摘要:成员推断攻击(MIA)通过确定特定查询实例是否包含在数据集中,来评估经过训练的机器学习模型对其训练数据的泄露程度。我们将现有的MIA分为自适应和非自适应两类,取决于对手是否被允许针对成员查询训练影子模型。在自适应设置中,对手可以在访问查询实例后训练影子模型,我们强调了利用实例之间成员依赖关系的重要性,并提出了一个与具体攻击无关的框架,称为级联成员推断攻击(CMIA),它通过条件影子训练引入成员依赖关系,以提高成员推断性能。在非自适应设置中,对手被限制在获得成员查询之前训练影子模型,我们引入代理成员推断攻击(PMIA)。PMIA采用代理选择策略,识别与查询实例具有相似行为的样本,并利用它们在影子模型中的行为来执行成员后验几率检验以进行成员推断。我们为这两种攻击提供了理论分析,大量实验结果表明,CMIA和PMIA在两种设置下都大大优于现有的MIA,特别是在对评估隐私风险至关重要的低误报率区间。
摘要:A Membership Inference Attack (MIA) assesses how much a trained machine learning model reveals about its training data by determining whether specific query instances were included in the dataset. We classify existing MIAs into adaptive or non-adaptive, depending on whether the adversary is allowed to train shadow models on membership queries. In the adaptive setting, where the adversary can train shadow models after accessing query instances, we highlight the importance of exploiting membership dependencies between instances and propose an attack-agnostic framework called Cascading Membership Inference Attack (CMIA), which incorporates membership dependencies via conditional shadow training to boost membership inference performance. In the non-adaptive setting, where the adversary is restricted to training shadow models before obtaining membership queries, we introduce Proxy Membership Inference Attack (PMIA). PMIA employs a proxy selection strategy that identifies samples with similar behaviors to the query instance and uses their behaviors in shadow models to perform a membership posterior odds test for membership inference. We provide theoretical analyses for both attacks, and extensive experimental results demonstrate that CMIA and PMIA substantially outperform existing MIAs in both settings, particularly in the low false-positive regime, which is crucial for evaluating privacy risks.
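摘要中提到的“成员后验几率检验”可以用一个简化的高斯似然比来示意:对“包含”与“不包含”代理样本的两组影子模型信号(例如查询样本上的损失值)各拟合一个高斯分布,再比较目标信号在两者下的对数似然(纯属示意,并非原论文实现):

```python
import math

def gaussian_logpdf(x, mean, std):
    """一维高斯对数密度。"""
    return -0.5 * math.log(2 * math.pi * std * std) \
        - (x - mean) ** 2 / (2 * std * std)

def fit_gaussian(values):
    """极简的高斯参数拟合(样本均值与标准差)。"""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / max(len(values) - 1, 1)
    return m, math.sqrt(var) + 1e-8

def posterior_odds_test(target_signal, in_signals, out_signals):
    """成员对数几率:目标模型信号在"成员"与"非成员"影子分布下的
    对数似然之差,大于0倾向判为成员。"""
    mu_in, sd_in = fit_gaussian(in_signals)
    mu_out, sd_out = fit_gaussian(out_signals)
    return gaussian_logpdf(target_signal, mu_in, sd_in) \
        - gaussian_logpdf(target_signal, mu_out, sd_out)
```

PMIA的关键在于用与查询实例行为相似的代理样本来构造这两组信号,代理选择策略本身此处未示意。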
【6】FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning
标题:FedBAP:通过联邦学习中的良性对抗扰动进行后门防御
链接:https://arxiv.org/abs/2507.21177
作者:n, Libing Wu, Zhuangzhuang Zhang, Bingyi Liu, Lijuan Huo, Jing Wang
备注:Accepted to ACM Multimedia 2025
摘要:联邦学习(FL)支持协作模型训练,同时保护数据隐私,但它极易受到后门攻击。FL中大多数现有的防御方法由于忽视了模型对后门触发器的过度依赖,特别是当恶意客户端的比例增加时,效果有限。在本文中,我们提出了FedBAP,一种新的防御框架,通过减少模型对后门触发器的依赖来减轻FL中的后门攻击。具体来说,首先,我们提出了一个扰动触发器生成机制,创建扰动触发器精确匹配后门触发器的位置和大小,确保强大的影响模型输出。其次,我们利用这些扰动触发器来生成良性的对抗性扰动,这些扰动破坏了模型对后门触发器的依赖,同时迫使它学习更鲁棒的决策边界。最后,我们设计了一个自适应缩放机制来动态调整扰动强度,有效地平衡防御强度和模型性能。实验结果表明,FedBAP在三种后门攻击下分别降低了0.22%-5.34%、0.48%-6.34%和97.22%-97.6%的攻击成功率。特别是,FedBAP在对抗新型后门攻击方面表现出色。
摘要:Federated Learning (FL) enables collaborative model training while preserving data privacy, but it is highly vulnerable to backdoor attacks. Most existing defense methods in FL have limited effectiveness due to their neglect of the model's over-reliance on backdoor triggers, particularly as the proportion of malicious clients increases. In this paper, we propose FedBAP, a novel defense framework for mitigating backdoor attacks in FL by reducing the model's reliance on backdoor triggers. Specifically, first, we propose a perturbed trigger generation mechanism that creates perturbation triggers precisely matching backdoor triggers in location and size, ensuring strong influence on model outputs. Second, we utilize these perturbation triggers to generate benign adversarial perturbations that disrupt the model's dependence on backdoor triggers while forcing it to learn more robust decision boundaries. Finally, we design an adaptive scaling mechanism to dynamically adjust perturbation intensity, effectively balancing defense strength and model performance. The experimental results demonstrate that FedBAP reduces the attack success rates by 0.22%-5.34%, 0.48%-6.34%, and 97.22%-97.6% under three types of backdoor attacks, respectively. In particular, FedBAP demonstrates outstanding performance against novel backdoor attacks.
【7】Generating Adversarial Point Clouds Using Diffusion Model
标题:利用扩散模型生成对抗点云
链接:https://arxiv.org/abs/2507.21163
作者:hao, Bingbing Zhu, Chuxuan Tong, Xiaoyi Zhou, Xi Zheng
摘要:针对三维点云分类的对抗性攻击方法揭示了点云识别模型的脆弱性。该漏洞可能会导致使用深度学习模型的关键应用程序(如自动驾驶汽车)的安全风险。为了发现这些模型的缺陷,研究人员可以通过对抗性攻击来评估它们的安全性。然而,大多数现有的对抗性攻击方法都是基于白盒攻击。虽然这些方法实现了高攻击成功率和不可感知性,但它们在现实世界场景中的适用性是有限的。黑盒攻击在现实世界的场景中更有意义,但往往效果不佳。提出了一种新的黑盒对抗样本生成方法,该方法利用扩散模型提高黑盒环境下的攻击成功率和不可感知性,而不依赖点云分类模型的内部信息生成对抗样本。我们使用一个3D扩散模型,将点云的压缩特征作为先验知识来指导反向扩散过程,将对抗点添加到干净的示例中。随后,它的逆过程被用来将其他类别的分布转换为对抗点,然后将其添加到点云。
摘要:Adversarial attack methods for 3D point cloud classification reveal the vulnerabilities of point cloud recognition models. This vulnerability could lead to safety risks in critical applications that use deep learning models, such as autonomous vehicles. To uncover the deficiencies of these models, researchers can evaluate their security through adversarial attacks. However, most existing adversarial attack methods are based on white-box attacks. While these methods achieve high attack success rates and imperceptibility, their applicability in real-world scenarios is limited. Black-box attacks, which are more meaningful in real-world scenarios, often yield poor results. This paper proposes a novel black-box adversarial example generation method that utilizes a diffusion model to improve the attack success rate and imperceptibility in the black-box setting, without relying on the internal information of the point cloud classification model to generate adversarial samples. We use a 3D diffusion model to use the compressed features of the point cloud as prior knowledge to guide the reverse diffusion process to add adversarial points to clean examples. Subsequently, its reverse process is employed to transform the distribution of other categories into adversarial points, which are then added to the point cloud.
【8】VizGenie: Toward Self-Refining, Domain-Aware Workflows for Next-Generation Scientific Visualization
标题:VizGenie:迈向下一代科学可视化的自我完善、领域感知工作流程
链接:https://arxiv.org/abs/2507.21124
作者:as, Terece L. Turton, Nishath Rajiv Ranasinghe, Shawn Jones, Bradley Love, William Jones, Aric Hagberg, Han-Wei Shen, Nathan DeBardeleben, Earl Lawrence
摘要:我们提出了VizGenie,一个自我改进的智能体框架,它借助大型语言模型(LLM)编排一组特定领域和动态生成的模块,从而推进科学可视化。用户最初通过预先存在的工具访问核心功能,例如基于阈值的过滤、切片提取和统计分析。对于超出此基线的任务,VizGenie自主使用LLM生成新的可视化脚本(例如VTK Python代码),按需扩展其功能。每个生成的脚本都经过自动后端验证,并在测试通过后无缝集成,不断增强系统的适应性和鲁棒性。VizGenie的一个显著特征是其直观的自然语言界面,允许用户发出基于高级特征的查询(例如“可视化头骨”)。该系统通过微调的视觉模型,利用基于图像的分析和视觉问答(VQA)来精确地解释这些查询,从而在领域专业知识和技术实现之间架起桥梁。此外,用户可以通过VQA交互式查询生成的可视化结果,促进更深入的探索。检索增强生成(RAG)进一步加强了可靠性和可重复性,在保持全面出处记录的同时提供上下文驱动的响应。对复杂体数据集的评估表明,迭代可视化任务的认知开销显著减少。通过将精心筛选的特定领域工具与LLM驱动的灵活性相结合,VizGenie不仅可以加速洞察的产生,还建立了可持续、不断演进的可视化实践。由此产生的平台动态地从用户交互中学习,不断增强对科学可视化中以特征为中心的探索和可复现研究的支持。
摘要:We present VizGenie, a self-improving, agentic framework that advances scientific visualization through large language model (LLM) by orchestrating of a collection of domain-specific and dynamically generated modules. Users initially access core functionalities--such as threshold-based filtering, slice extraction, and statistical analysis--through pre-existing tools. For tasks beyond this baseline, VizGenie autonomously employs LLMs to generate new visualization scripts (e.g., VTK Python code), expanding its capabilities on-demand. Each generated script undergoes automated backend validation and is seamlessly integrated upon successful testing, continuously enhancing the system's adaptability and robustness. A distinctive feature of VizGenie is its intuitive natural language interface, allowing users to issue high-level feature-based queries (e.g., ``visualize the skull"). The system leverages image-based analysis and visual question answering (VQA) via fine-tuned vision models to interpret these queries precisely, bridging domain expertise and technical implementation. Additionally, users can interactively query generated visualizations through VQA, facilitating deeper exploration. Reliability and reproducibility are further strengthened by Retrieval-Augmented Generation (RAG), providing context-driven responses while maintaining comprehensive provenance records. Evaluations on complex volumetric datasets demonstrate significant reductions in cognitive overhead for iterative visualization tasks. By integrating curated domain-specific tools with LLM-driven flexibility, VizGenie not only accelerates insight generation but also establishes a sustainable, continuously evolving visualization practice. The resulting platform dynamically learns from user interactions, consistently enhancing support for feature-centric exploration and reproducible research in scientific visualization.
【9】Learning Kinetic Monte Carlo stochastic dynamics with Deep Generative Adversarial Networks
标题:使用深度生成对抗网络学习动力学蒙特卡罗随机动力学
链接:https://arxiv.org/abs/2507.21763
作者:anzoni, Olivier Pierre-Louis, Roberto Bergamaschini, Francesco Montalenti
备注:15 pages, 8 figures, 2 appendices
摘要:我们表明,生成对抗网络(GANs)可以有效地用于学习随机动态,在捕获热波动的同时替代传统模型。具体来说,我们展示了应用程序的二维,多粒子系统,专注于表面台阶的波动和相关的时间依赖性粗糙度。在基于动力学蒙特卡罗模拟构建数据集后,训练条件GAN以及时随机传播系统状态,从而可以以更低的计算成本生成新序列。相对于标准的GANs,这有利于收敛和提高精度的修改,进行了讨论。经过训练的网络被证明可以定量地再现平衡和动力学性质,包括标度律,与精确值的偏差只有百分之几。外推的限制和未来的前景进行了批判性的讨论。
摘要:We show that Generative Adversarial Networks (GANs) may be fruitfully exploited to learn stochastic dynamics, surrogating traditional models while capturing thermal fluctuations. Specifically, we showcase the application to a two-dimensional, many-particle system, focusing on surface-step fluctuations and on the related time-dependent roughness. After the construction of a dataset based on Kinetic Monte Carlo simulations, a conditional GAN is trained to propagate stochastically the state of the system in time, allowing the generation of new sequences with a reduced computational cost. Modifications with respect to standard GANs, which facilitate convergence and increase accuracy, are discussed. The trained network is demonstrated to quantitatively reproduce equilibrium and kinetic properties, including scaling laws, with deviations of a few percent from the exact value. Extrapolation limits and future perspectives are critically discussed.
半/弱/无/有监督|不确定性|主动学习(6篇)
【1】Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification
标题:机器学习中的概率一致性及其与不确定性量化的联系
链接:https://arxiv.org/abs/2507.21670
作者:one, Anthony Kearsley
摘要:机器学习(ML)通常被视为一种强大的数据分析工具,由于其黑箱性质而易于上手。然而,这种性质也使得难以量化从ML模型中提取的预测的置信度,更根本的是,难以理解这些模型是如何对训练数据进行数学抽象的。本文的目标是沿着一条由诊断问题启发的推理路线,厘清这些问题及其与不确定性量化(UQ)的联系。在这种情况下,流行率(即类中元素所占的比例)通常本身就是人们关注的量。在此,我们分析了流行率的多种解释,从而导出分类的水平集理论,这表明某些类型的自洽ML模型等价于类条件概率分布。我们首先研究二元贝叶斯最优分类器的性质,认识到它们的边界集可以被重新解释为成对密度比的水平集。然后,通过以流行率参数化贝叶斯分类器,我们表明它们满足重要的单调性和类切换性质,可以用来在不直接访问边界集的情况下推断密度比。此外,这些信息足以完成诸如构建多类贝叶斯最优分类器和估计类分配中固有不确定性之类的任务。在多类情况下,我们利用这些结果推导出归一化和自洽条件,后者等价于分类器的全概率公式。我们还表明,这些是任意ML模型具有有效概率解释的必要条件。在全文中,我们展示了这种分析如何通过不确定性传播框架为ML的更广泛UQ任务提供指导。
摘要:Machine learning (ML) is often viewed as a powerful data analysis tool that is easy to learn because of its black-box nature. Yet this very nature also makes it difficult to quantify confidence in predictions extracted from ML models, and more fundamentally, to understand how such models are mathematical abstractions of training data. The goal of this paper is to unravel these issues and their connections to uncertainty quantification (UQ) by pursuing a line of reasoning motivated by diagnostics. In such settings, prevalence - i.e. the fraction of elements in class - is often of inherent interest. Here we analyze the many interpretations of prevalence to derive a level-set theory of classification, which shows that certain types of self-consistent ML models are equivalent to class-conditional probability distributions. We begin by studying the properties of binary Bayes optimal classifiers, recognizing that their boundary sets can be reinterpreted as level-sets of pairwise density ratios. By parameterizing Bayes classifiers in terms of the prevalence, we then show that they satisfy important monotonicity and class-switching properties that can be used to deduce the density ratios without direct access to the boundary sets. Moreover, this information is sufficient for tasks such as constructing the multiclass Bayes-optimal classifier and estimating inherent uncertainty in the class assignments. In the multiclass case, we use these results to deduce normalization and self-consistency conditions, the latter being equivalent to the law of total probability for classifiers. We also show that these are necessary conditions for arbitrary ML models to have valid probabilistic interpretations. Throughout we demonstrate how this analysis informs the broader task of UQ for ML via an uncertainty propagation framework.
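摘要中“边界集是成对密度比的水平集”这一结论,可以用公式粗略表示如下(记号为本文假设,以 q 表示类 1 的流行率):

```latex
% 以流行率 q 参数化的二元贝叶斯最优分类器,其边界集是密度比的水平集:
\[
  \partial B(q) \;=\; \Bigl\{\, x \;:\; \frac{p(x \mid 1)}{p(x \mid 0)} \;=\; \frac{1-q}{q} \,\Bigr\}
\]
% 自洽条件:具有有效概率解释的分类器须满足归一化与全概率公式:
\[
  \sum_{k} \Pr(\hat y = k \mid x) \;=\; 1, \qquad
  \Pr(\hat y = k) \;=\; \int \Pr(\hat y = k \mid x)\, p(x)\, dx
\]
```

直观地,随 q 变化扫描该水平集,即可在不直接访问边界集的情况下反推密度比,这正是摘要所述单调性与类切换性质的用途。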
【2】MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving
标题:MapDiffusion:用于自动驾驶中矢量化在线高清地图构建和不确定性估计的生成扩散
链接:https://arxiv.org/abs/2507.21423
作者:nninger, Zihan Zhang, Zhipeng Mo, Md Zafar Anwar, Steffen Staab, Sihao Ding
备注:Accepted for 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
摘要:自动驾驶需要从传感器数据中了解静态环境。学习鸟瞰图(BEV)编码器通常用于融合多个输入,并且矢量解码器从潜在BEV网格预测矢量化地图表示。然而,传统的地图构建模型提供确定性的点估计,无法捕获不确定性和现实世界环境的固有模糊性,例如遮挡和缺失的车道标记。我们提出了MapDiffusion,一种新的生成方法,利用扩散范式来学习可能的矢量化地图的完整分布。MapDiffusion不是从学习的查询中预测单个确定性输出,而是迭代地细化随机初始化的查询,以BEV潜在网格为条件,以生成多个合理的地图样本。这允许聚合样本以提高预测准确度并导出与场景模糊度直接相关的不确定性估计。在nuScenes数据集上进行的大量实验表明,MapDiffusion在在线地图构建方面实现了最先进的性能,在单样本性能方面超过基线5%。我们进一步表明,聚合多个样本始终提高性能沿ROC曲线,验证分布建模的好处。此外,我们的不确定性估计在闭塞地区显着更高,加强其在识别模糊传感器输入区域的价值。通过对完整地图分布进行建模,MapDiffusion增强了在线矢量化高清地图构建的鲁棒性和可靠性,使自动驾驶车辆能够在复杂环境中做出具有不确定性的决策。
摘要:Autonomous driving requires an understanding of the static environment from sensor data. Learned Bird's-Eye View (BEV) encoders are commonly used to fuse multiple inputs, and a vector decoder predicts a vectorized map representation from the latent BEV grid. However, traditional map construction models provide deterministic point estimates, failing to capture uncertainty and the inherent ambiguities of real-world environments, such as occlusions and missing lane markings. We propose MapDiffusion, a novel generative approach that leverages the diffusion paradigm to learn the full distribution of possible vectorized maps. Instead of predicting a single deterministic output from learned queries, MapDiffusion iteratively refines randomly initialized queries, conditioned on a BEV latent grid, to generate multiple plausible map samples. This allows aggregating samples to improve prediction accuracy and deriving uncertainty estimates that directly correlate with scene ambiguity. Extensive experiments on the nuScenes dataset demonstrate that MapDiffusion achieves state-of-the-art performance in online map construction, surpassing the baseline by 5% in single-sample performance. We further show that aggregating multiple samples consistently improves performance along the ROC curve, validating the benefit of distribution modeling. Additionally, our uncertainty estimates are significantly higher in occluded areas, reinforcing their value in identifying regions with ambiguous sensor input. By modeling the full map distribution, MapDiffusion enhances the robustness and reliability of online vectorized HD map construction, enabling uncertainty-aware decision-making for autonomous vehicles in complex environments.
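摘要所述“聚合多个样本以提高精度并导出不确定性估计”的思路,可以用逐点均值与方差作最简示意(将每个地图样本简化为一组标量坐标,数据表示为本文假设):

```python
def aggregate_map_samples(samples):
    """给定若干个合理的矢量化地图样本(每个为一组点坐标),
    返回逐点均值作为聚合预测、逐点方差作为简单的不确定性估计。"""
    n = len(samples)
    num_pts = len(samples[0])
    means, variances = [], []
    for i in range(num_pts):
        vals = [s[i] for s in samples]
        m = sum(vals) / n
        v = sum((x - m) ** 2 for x in vals) / n
        means.append(m)
        variances.append(v)
    return means, variances
```

按摘要的观察,遮挡区域的样本间分歧更大,对应的逐点方差也会更高。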
【3】OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection
标题:OCSVM引导的无监督异常检测表示学习
链接:https://arxiv.org/abs/2507.21164
作者:inon (MYRIAD), Carole Lartizien (MYRIAD)
摘要:无监督异常检测(UAD)旨在检测没有标记数据的异常,这在异常样本稀少或不可得的许多机器学习应用中是必要的。大多数最先进的方法分为两类:基于重建的方法,通常连异常也重建得太好;以及解耦的表示学习加密度估计,可能受到次优特征空间的影响。虽然最近的一些方法试图将特征学习和异常检测结合起来,但它们通常依赖代理目标、限制核的选择,或引入限制其表达能力和鲁棒性的近似。为了解决这一挑战,我们提出了一种新方法,通过自定义的损失公式直接将潜在特征与OCSVM决策边界对齐,从而将表示学习与可解析求解的单类SVM(OCSVM)紧密耦合。该模型在两个任务上进行评估:基于MNIST-C的新基准,以及具有挑战性的脑MRI细微病变检测任务。与大多数专注于图像级别上大的高信号病变的方法不同,我们的方法成功地瞄准了小的非高信号病变,同时我们评估体素级指标,从而应对更具临床相关性的场景。两个实验都评估了对域偏移的鲁棒性,包括MNIST-C中的损坏类型和MRI中的扫描仪/年龄变化。结果表明了我们所提出模型的性能和鲁棒性,突出其在一般UAD和现实世界医学成像应用中的潜力。源代码可在https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning上获得
摘要:Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that tightly couples representation learning with an analytically solvable one-class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a new benchmark based on MNIST-C, and a challenging brain MRI subtle lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds to target small, non-hyperintense lesions, while we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and scanner/age variations in MRI. Results demonstrate the performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning
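摘要中“将潜在特征与OCSVM决策边界对齐”的损失项,可以用经典的软间隔单类SVM目标函数作一个假设性的示意(并非论文的实际损失公式,仅说明耦合的方向:最小化该目标会把正常样本的潜在特征推到超平面 <w, z> = rho 的正侧):

```python
def ocsvm_alignment_loss(latents, w, rho, nu=0.1):
    """软间隔单类SVM目标:||w||^2/2 + (1/(nu*n)) * sum(max(0, rho - <w,z>)) - rho。
    对编码器参数最小化该式,可使潜在特征与OCSVM决策边界对齐。"""
    n = len(latents)
    margin = sum(max(0.0, rho - sum(wi * zi for wi, zi in zip(w, z)))
                 for z in latents)
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + margin / (nu * n) - rho
```

其中 nu 控制允许落在边界负侧的样本比例;论文的可解析求解形式与具体耦合方式请参考其源代码。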
【4】A Study on Variants of Conventional, Fuzzy, and Nullspace-Based Independence Criteria for Improving Supervised and Unsupervised Learning
标题:用于改善监督和无监督学习的传统、模糊和基于零空间的独立性标准变体的研究
链接:https://arxiv.org/abs/2507.21136
作者:oattari
摘要:无监督和有监督学习方法通常使用核函数来捕获数据结构中固有的非线性。然而,专家必须确保其提出的非线性能够最大化可变性并捕捉数据的内在多样性。我们回顾了各类独立性准则,用于设计无监督学习器,然后提出了3个独立性准则,并利用它们设计了无监督和有监督的降维方法。我们在线性和神经网络非线性设置下评估了这些方法的对比度、准确性和可解释性。结果表明,这些方法的性能优于基线(tSNE、PCA、正则化LDA、带(无)监督学习器和层共享的VAE),并为研究人员开辟了一条新的可解释机器学习(ML)路线。
摘要:Unsupervised and supervised learning methods conventionally use kernels to capture nonlinearities inherent in data structure. However experts have to ensure their proposed nonlinearity maximizes variability and capture inherent diversity of data. We reviewed all independence criteria to design unsupervised learners. Then we proposed 3 independence criteria and used them to design unsupervised and supervised dimensionality reduction methods. We evaluated contrast, accuracy and interpretability of these methods in both linear and neural nonlinear settings. The results show that the methods have outperformed the baseline (tSNE, PCA, regularized LDA, VAE with (un)supervised learner and layer sharing) and opened a new line of interpretable machine learning (ML) for the researchers.
【5】Supervised Quantum Image Processing
标题:监督量子图像处理
链接:https://arxiv.org/abs/2507.22039
作者:igi, Mehran Khosrojerdi, Filippo Caruso, Leonardo Banchi
备注:13 pages, 11 figures
摘要:在大数据和人工智能时代,日益增长的数据量和解决越来越复杂的计算挑战的需求是提高数据存储、处理和分析效率的两大驱动力。量子图像处理(QIP)是量子信息科学和图像处理之间的一个跨学科领域,它有可能通过利用量子计算的力量来缓解其中的一些挑战。在这项工作中,我们比较和研究四种不同的量子图像表示(QImR)的压缩特性:即张量网络表示(TNR),量子图像的灵活表示(FRQI),新的增强量子表示NEQR和量子概率图像编码(QPIE)。我们的模拟结果表明,FRQI执行更高的压缩图像信息比TNR,NEQR,和QPIE。此外,我们研究了二进制分类问题中的准确性和内存之间的权衡,评估了基于QImRs的量子内核与经典线性内核相比的性能。我们的研究结果表明,量子内核提供了相当的分类平均精度,但需要的图像存储资源呈指数级减少。
摘要:In the era of big data and artificial intelligence, the increasing volume of data and the demand to solve more and more complex computational challenges are two driving forces for improving the efficiency of data storage, processing and analysis. Quantum image processing (QIP) is an interdisciplinary field between quantum information science and image processing, which has the potential to alleviate some of these challenges by leveraging the power of quantum computing. In this work, we compare and examine the compression properties of four different Quantum Image Representations (QImRs): namely, Tensor Network Representation (TNR), Flexible Representation of Quantum Image (FRQI), Novel Enhanced Quantum Representation NEQR, and Quantum Probability Image Encoding (QPIE). Our simulations show that FRQI performs a higher compression of image information than TNR, NEQR, and QPIE. Furthermore, we investigate the trade-off between accuracy and memory in binary classification problems, evaluating the performance of quantum kernels based on QImRs compared to the classical linear kernel. Our results indicate that quantum kernels provide comparable classification average accuracy but require exponentially fewer resources for image storage.
【6】Generative imaging for radio interferometry with fast uncertainty quantification
标题:具有快速不确定性量化的无线电干涉测量生成成像
链接:https://arxiv.org/abs/2507.21270
作者:Mars, Tobías I. Liaudat, Jessica J. Whitney, Marta M. Betcke, Jason D. McEwen
摘要:随着大型射电干涉望远镜,特别是SKA的兴起,对计算效率高的图像重建技术的需求不断增长。现有的重建方法,如CLEAN算法或近端优化方法,本质上是迭代的,需要大量计算。这些方法要么不提供不确定性量化,要么需要很大的计算开销才能实现。学习型重建方法已经显示出提供高效和高质量重建的潜力。在本文中,我们探索使用生成神经网络对后验分布进行高效的近似采样,以实现具有不确定性量化的高质量重建。我们的RI-GAN框架建立在正则化条件生成对抗网络(rcGAN)框架的基础上,集成了梯度U-Net(GU-Net)架构,这是一种将测量算子直接嵌入网络的混合重建模型。该框架使用Wasserstein GAN来提高训练稳定性,并结合用于对抗模式崩溃的正则化项,模式崩溃是条件GAN的典型问题。该方法以脏图像和观测的点扩散函数(PSF)作为输入,提供高效、高质量的图像重建,对变化的可见度覆盖具有鲁棒性,可推广到动态范围更大的图像,并提供信息丰富的不确定性量化。我们的方法朝着下一代射电望远镜的高计算效率、可扩展且具备不确定性感知的成像迈出了重要一步。
摘要:With the rise of large radio interferometric telescopes, particularly the SKA, there is a growing demand for computationally efficient image reconstruction techniques. Existing reconstruction methods, such as the CLEAN algorithm or proximal optimisation approaches, are iterative in nature, necessitating a large amount of compute. These methods either provide no uncertainty quantification or require large computational overhead to do so. Learned reconstruction methods have shown promise in providing efficient and high quality reconstruction. In this article we explore the use of generative neural networks that enable efficient approximate sampling of the posterior distribution for high quality reconstructions with uncertainty quantification. Our RI-GAN framework, builds on the regularised conditional generative adversarial network (rcGAN) framework by integrating a gradient U-Net (GU-Net) architecture - a hybrid reconstruction model that embeds the measurement operator directly into the network. This framework uses Wasserstein GANs to improve training stability in combination with regularisation terms that combat mode collapse, which are typical problems for conditional GANs. This approach takes as input the dirty image and the point spread function (PSF) of the observation and provides efficient, high-quality image reconstructions that are robust to varying visibility coverages, generalises to images with an increased dynamic range, and provides informative uncertainty quantification. Our methods provide a significant step toward computationally efficient, scalable, and uncertainty-aware imaging for next-generation radio telescopes.
迁移|Zero/Few/One-Shot|自适应(6篇)
【1】Thou Shalt Not Prompt: Zero-Shot Human Activity Recognition in Smart Homes via Language Modeling of Sensor Data & Activities
标题:你不应提示:通过传感器数据和活动的语言建模在智能家居中实现Zero-Shot人类活动识别
链接:https://arxiv.org/abs/2507.21964
作者:unesh Dhekane, Thomas Ploetz
摘要:开发zero-shot人类活动识别(HAR)方法是智能家居研究的一个关键方向,因为它能使HAR系统在具有不同传感模态、布局和感兴趣活动的智能家居中工作。沿此方向的最先进解决方案基于生成传感器数据的自然语言描述,并通过精心制作的提示将其馈送给LLM以执行分类。尽管有性能保证,这种“提示LLM”的方法存在若干风险,包括隐私侵犯、对外部服务的依赖,以及由于版本变更导致的不一致预测,这为不需要提示LLM的替代zero-shot HAR方法提供了理由。在本文中,我们提出了一个这样的解决方案:使用自然语言对传感器数据和活动进行建模,并利用其嵌入执行zero-shot分类,从而绕过为活动预测而提示LLM的需要。我们工作的影响在于对六个数据集进行了详细的案例研究,突出了语言建模如何在zero-shot识别中支撑HAR系统。
摘要:Developing zero-shot human activity recognition (HAR) methods is a critical direction in smart home research -- considering its impact on making HAR systems work across smart homes having diverse sensing modalities, layouts, and activities of interest. The state-of-the-art solutions along this direction are based on generating natural language descriptions of the sensor data and feeding it via a carefully crafted prompt to the LLM to perform classification. Despite their performance guarantees, such ``prompt-the-LLM'' approaches carry several risks, including privacy invasion, reliance on an external service, and inconsistent predictions due to version changes, making a case for alternative zero-shot HAR methods that do not require prompting the LLMs. In this paper, we propose one such solution that models sensor data and activities using natural language, leveraging its embeddings to perform zero-shot classification and thereby bypassing the need to prompt the LLMs for activity predictions. The impact of our work lies in presenting a detailed case study on six datasets, highlighting how language modeling can bolster HAR systems in zero-shot recognition.
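按摘要的思路,基于嵌入的zero-shot分类可示意为:将传感器窗口的自然语言描述嵌入后,与各活动标签的嵌入比较余弦相似度,取最近者(嵌入向量与活动名称均为本文假设,实际嵌入由语言模型产生):

```python
import math

def cosine(u, v):
    """两个向量的余弦相似度。"""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def zero_shot_classify(sensor_text_embedding, activity_embeddings):
    """选出标签嵌入与传感器描述嵌入余弦相似度最高的活动。
    activity_embeddings: 活动名称 -> 嵌入向量 的映射。"""
    return max(activity_embeddings,
               key=lambda name: cosine(sensor_text_embedding,
                                       activity_embeddings[name]))
```

由于分类只需本地的嵌入比较,这避免了摘要中指出的提示外部LLM带来的隐私与稳定性风险。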
【2】Generalized few-shot transfer learning architecture for modeling the EDFA gain spectrum
标题:用于建模EDFA增益谱的广义少样本迁移学习架构
链接:https://arxiv.org/abs/2507.21728
作者:aj, Zehao Wang, Tingjun Chen, Daniel C Kilper, Marco Ruffini
备注:This is a preprint of a paper accepted and published in the Journal of Optical Communications and Networking (JOCN). The final published version is available at: https://doi.org/10.1364/JOCN.560987
摘要:掺铒光纤放大器(EDFA)增益谱的精确建模对于优化光网络性能至关重要,尤其是在网络向多供应商解决方案演进的情况下。在这项工作中,我们提出了一种基于半监督自归一化神经网络(SS-NN)的广义少样本迁移学习架构,利用EDFA内部特征(如VOA输入或输出功率和衰减)来改进增益谱预测。我们的SS-NN模型采用两阶段训练策略,包括使用噪声增强测量的无监督预训练,以及使用自定义加权MSE损失的有监督微调。此外,我们使用迁移学习(TL)技术扩展了该框架,可在功率放大器(booster)、前置放大器(preamplifier)和线路放大器(ILA)EDFA之间实现同构(相同特征空间)和异构(不同特征集)模型自适应。为了解决异构TL中的特征失配问题,我们引入了协方差匹配损失,以对齐源域和目标域之间的二阶特征统计量。在COSMOS和Open Ireland测试平台上对26个EDFA进行的大量实验表明,与基准方法相比,所提出的方法显著减少了系统所需的测量数量,同时实现了更低的平均绝对误差和更好的误差分布。
摘要:Accurate modeling of the gain spectrum in Erbium-Doped Fiber Amplifiers (EDFAs) is essential for optimizing optical network performance, particularly as networks evolve toward multi-vendor solutions. In this work, we propose a generalized few-shot transfer learning architecture based on a Semi-Supervised Self-Normalizing Neural Network (SS-NN) that leverages internal EDFA features - such as VOA input or output power and attenuation, to improve gain spectrum prediction. Our SS-NN model employs a two-phase training strategy comprising unsupervised pre-training with noise-augmented measurements and supervised fine-tuning with a custom weighted MSE loss. Furthermore, we extend the framework with transfer learning (TL) techniques that enable both homogeneous (same-feature space) and heterogeneous (different-feature sets) model adaptation across booster, preamplifier, and ILA EDFAs. To address feature mismatches in heterogeneous TL, we incorporate a covariance matching loss to align second-order feature statistics between source and target domains. Extensive experiments conducted across 26 EDFAs in the COSMOS and Open Ireland testbeds demonstrate that the proposed approach significantly reduces the number of measurements requirements on the system while achieving lower mean absolute errors and improved error distributions compared to benchmark methods.
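上文提到的协方差匹配损失用于对齐源域与目标域的二阶特征统计量。下面是一个最小的numpy示意草图(仅为说明思路;论文中具体的加权与归一化方式可能不同):

```python
import numpy as np

def covariance_matching_loss(source_feats, target_feats):
    """Frobenius-norm distance between the feature covariance matrices
    of two domains. A minimal sketch of a covariance matching penalty;
    the paper's exact formulation may differ in weighting/normalization."""
    def cov(x):
        xc = x - x.mean(axis=0, keepdims=True)
        return xc.T @ xc / (x.shape[0] - 1)
    diff = cov(source_feats) - cov(target_feats)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 4))   # toy source-domain features
tgt = rng.normal(size=(200, 4))   # toy target-domain features
loss = covariance_matching_loss(src, tgt)
```

最小化该项会使目标域特征的二阶统计量向源域靠拢;当两组特征完全相同时损失恰为零。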
【3】Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors
标题:具有基于扩散的先验的自适应多模态蛋白质即插即用
链接:https://arxiv.org/abs/2507.21260
作者:anerjee, Xingyu Xu, Caroline Moosmüller, Harlin Lee
备注:Code: this https URL
摘要:在逆问题中,目标是恢复未知参数(例如图像),其在测量过程中通常经历了某种有损或含噪变换。最近,深度生成模型,特别是扩散模型,已经成为蛋白质结构生成的强大先验。然而,整合来自多个来源的含噪实验数据来指导这些模型仍然是一个重大挑战。现有方法通常需要精确知道实验噪声水平,并为每种数据模态手动调整权重。在这项工作中,我们介绍Adam-PnP,一个即插即用框架,它使用来自多个异构实验源的梯度来指导预训练的蛋白质扩散模型。我们的框架具有自适应噪声估计方案和集成到扩散过程中的动态模态加权机制,这减少了对手动超参数调整的需要。复杂重建任务上的实验表明,使用Adam-PnP可显著提高精度。
摘要:In an inverse problem, the goal is to recover an unknown parameter (e.g., an image) that has typically undergone some lossy or noisy transformation during measurement. Recently, deep generative models, particularly diffusion models, have emerged as powerful priors for protein structure generation. However, integrating noisy experimental data from multiple sources to guide these models remains a significant challenge. Existing methods often require precise knowledge of experimental noise levels and manually tuned weights for each data modality. In this work, we introduce Adam-PnP, a Plug-and-Play framework that guides a pre-trained protein diffusion model using gradients from multiple, heterogeneous experimental sources. Our framework features an adaptive noise estimation scheme and a dynamic modality weighting mechanism integrated into the diffusion process, which reduce the need for manual hyperparameter tuning. Experiments on complex reconstruction tasks demonstrate significantly improved accuracy using Adam-PnP.
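下面用一个极简的numpy草图说明"按估计噪声水平对多模态梯度加权"这一思路。注意:这只是一个假设性的示意(逆方差加权),Adam-PnP实际的自适应方案集成在扩散采样过程中,细节有所不同:

```python
import numpy as np

def combine_modality_gradients(grads, noise_vars):
    """Weight per-modality guidance gradients by inverse estimated noise
    variance, then sum. Hypothetical sketch of dynamic modality weighting;
    Adam-PnP's actual scheme lives inside the diffusion sampler."""
    w = 1.0 / np.asarray(noise_vars, dtype=float)
    w = w / w.sum()                       # normalize the modality weights
    return sum(wi * g for wi, g in zip(w, grads))

g1 = np.array([1.0, 0.0])                 # gradient from modality 1
g2 = np.array([0.0, 1.0])                 # gradient from modality 2
combined = combine_modality_gradients([g1, g2], noise_vars=[0.5, 1.0])
```

噪声方差越大的模态得到的权重越小,其梯度对最终引导方向的贡献也越小。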
【4】AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction
标题:AdaptHetero:机器学习解释驱动的亚组适应,用于基于EHR的临床预测
链接:https://arxiv.org/abs/2507.21197
作者:, Eva Aagaard
备注:11 pages, 3 figures
摘要:机器学习解释主要用于建立临床医生的信任,并在EHR中发现可操作的见解。然而,EHR数据的内在复杂性和异质性限制了其在指导特定亚组建模方面的有效性。我们提出了AdaptHetero,一种新型的MLI驱动框架,将可解释性见解转化为可操作的指导,用于在单个医院系统内的亚群中定制模型训练和评估。在三个大规模EHR数据集(GOSSIS-1-eICU、WiDS和MIMIC-IV)上的评估表明,AdaptHetero在预测ICU死亡率、住院死亡和隐匿性低氧血症方面始终能识别出异质的模型行为。通过整合基于SHAP的解释和无监督聚类,该框架增强了对具有临床意义的亚组特异性特征的识别,从而提高了预测性能。
摘要:Machine learning interpretation has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets - GOSSIS-1-eICU, WiDS, and MIMIC-IV - AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance.
【5】Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues
标题:超越帧观看:利用原始时间视频和多模式线索进行Zero-Shot行人意图预测
链接:https://arxiv.org/abs/2507.21161
作者:ambare, Venkata Nikhil Thanikella, Ying Liu
备注:Accepted in IEEE 3rd International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings 2025)
摘要:行人意图预测是复杂城市环境下自动驾驶的关键。传统方法依赖于对帧序列的监督学习,并且需要大量再训练以适应新场景。在这里,我们介绍BF-PIP(Beyond Frames Pedestrian Intention Prediction,超越帧的行人意图预测),一种建立在Gemini 2.5 Pro之上的zero-shot方法。它直接从富含结构化JAAD元数据的短的、连续的视频剪辑中推断过街意图。与在离散帧上操作的基于GPT-4V的方法相比,BF-PIP处理不间断的时间剪辑。它还通过专门的多模态提示整合了边界框注释和自我车辆速度。在没有任何额外训练的情况下,BF-PIP的预测准确率达到73%,比GPT-4V基线高出18%。这些研究结果表明,将时间视频输入与上下文线索相结合可增强时空感知,并提高模糊条件下的意图推断能力。这种方法为智能交通系统中敏捷、免再训练的感知模块铺平了道路。
摘要:Pedestrian intention prediction is essential for autonomous driving in complex urban environments. Conventional approaches depend on supervised learning over frame sequences and require extensive retraining to adapt to new scenarios. Here, we introduce BF-PIP (Beyond Frames Pedestrian Intention Prediction), a zero-shot approach built upon Gemini 2.5 Pro. It infers crossing intentions directly from short, continuous video clips enriched with structured JAAD metadata. In contrast to GPT-4V based methods that operate on discrete frames, BF-PIP processes uninterrupted temporal clips. It also incorporates bounding-box annotations and ego-vehicle speed via specialized multimodal prompts. Without any additional training, BF-PIP achieves 73% prediction accuracy, outperforming a GPT-4V baseline by 18%. These findings illustrate that combining temporal video inputs with contextual cues enhances spatiotemporal perception and improves intent inference under ambiguous conditions. This approach paves the way for agile, retraining-free perception modules in intelligent transportation systems.
【6】Domain Generalization and Adaptation in Intensive Care with Anchor Regression
标题:基于锚回归的重症监护领域泛化与适应研究
链接:https://arxiv.org/abs/2507.21783
作者:dschien, Manuel Burger, Gunnar Rätsch, Peter Bühlmann
摘要:由于分布变化,临床环境中的预测模型在部署到新医院时性能通常会下降。本文针对异构多中心重症监护病房(ICU)数据,提出了一项因果启发的领域泛化大规模研究。我们在一个包含来自9个不同ICU数据库的40万名患者的大型数据集上应用锚回归,并引入锚提升(anchor boosting),一种新颖的基于树的非线性扩展。锚正则化始终改善了分布外性能,特别是对于最不相似的目标域。这些方法对理论假设(如锚外生性)的违反表现出稳健性。此外,我们提出了一个新的概念框架来量化大型外部数据集的效用。通过评估性能随可用目标域数据量的变化,我们识别出三种情形:(i)领域泛化情形,其中只应使用外部模型;(ii)领域适应情形,其中重新拟合外部模型是最优的;以及(iii)数据丰富情形,其中外部数据不提供额外价值。
摘要:The performance of predictive models in clinical settings often degrades when deployed in new hospitals due to distribution shifts. This paper presents a large-scale study of causality-inspired domain generalization on heterogeneous multi-center intensive care unit (ICU) data. We apply anchor regression and introduce anchor boosting, a novel, tree-based nonlinear extension, to a large dataset comprising 400,000 patients from nine distinct ICU databases. The anchor regularization consistently improves out-of-distribution performance, particularly for the most dissimilar target domains. The methods appear robust to violations of theoretical assumptions, such as anchor exogeneity. Furthermore, we propose a novel conceptual framework to quantify the utility of large external datasets. By evaluating performance as a function of available target-domain data, we identify three regimes: (i) a domain generalization regime, where only the external model should be used, (ii) a domain adaptation regime, where refitting the external model is optimal, and (iii) a data-rich regime, where external data provides no additional value.
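锚回归可以通过一个标准的数据变换来实现:对X和y施加W = (I - P_A) + sqrt(gamma)·P_A(P_A为向锚变量张成空间的投影),再做普通最小二乘;gamma=1时退化为OLS。下面是一个最小的numpy草图(仅为示意,非论文代码):

```python
import numpy as np

def anchor_regression(X, y, A, gamma):
    """Anchor regression via the standard data transformation:
    apply W = (I - P_A) + sqrt(gamma) * P_A to X and y, then run OLS.
    gamma = 1 recovers ordinary least squares."""
    P = A @ np.linalg.pinv(A.T @ A) @ A.T          # projection onto anchor columns
    W = np.eye(len(y)) - P + np.sqrt(gamma) * P
    beta, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 300
A = rng.normal(size=(n, 1))                        # anchor (environment) variable
X = rng.normal(size=(n, 2)) + A                    # covariates shifted by the anchor
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)
b_ols = anchor_regression(X, y, A, gamma=1.0)      # plain OLS
b_anchor = anchor_regression(X, y, A, gamma=10.0)  # more anchor regularization
```

gamma越大,估计越保护分布外性能(对由锚变量驱动的分布偏移更稳健),代价是分布内拟合变差。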
强化学习(5篇)
【1】Structure-Informed Deep Reinforcement Learning for Inventory Management
标题:库存管理的结构知情深度强化学习
链接:https://arxiv.org/abs/2507.22040
作者:ggiar, Sohrab Andaz, Akhil Bagaria, Carson Eisenach, Dean Foster, Omer Gottesman, Dominique Perrault-Joncas
摘要:本文研究了深度强化学习(DRL)在经典库存管理问题中的应用,重点关注实际实施方面的考虑。我们将基于DirectBackprop的DRL算法应用于若干基本库存管理场景,包括带销售损失的多周期系统(有和没有提前期)、易腐库存管理、双重采购,以及联合库存采购与移除。DRL方法仅使用实践中可获得的历史信息来学习跨产品的策略,避免了对需求分布或分布参数的不切实际的假设。我们证明,在这些不同的设置中,我们的通用DRL实现相对于既有基准和启发式方法具有竞争力或更优,同时只需极少的参数调整。通过检查学习到的策略,我们表明DRL方法自然地捕捉到了传统运筹学方法推导出的最优策略的许多已知结构特性。为了进一步提高策略的性能和可解释性,我们提出了一种结构信息引导的策略网络(Structure-Informed Policy Network)技术,将解析推导出的最优策略特征显式地纳入学习过程。正如我们在一个使用真实需求数据的例子中所展示的,这种方法有助于提升可解释性,并增强策略在样本外表现上的鲁棒性。最后,我们提供了DRL在非平稳设置中的一个说明性应用。我们的工作在保持实用性的同时,弥合了库存管理中数据驱动学习与解析见解之间的差距。
摘要:This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems, with a focus on practical implementation considerations. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios including multi-period systems with lost sales (with and without lead times), perishable inventory management, dual sourcing, and joint inventory procurement and removal. The DRL approach learns policies across products using only historical information that would be available in practice, avoiding unrealistic assumptions about demand distributions or access to distribution parameters. We demonstrate that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics across these diverse settings, while requiring minimal parameter tuning. Through examination of the learned policies, we show that the DRL approach naturally captures many known structural properties of optimal policies derived from traditional operations research methods. To further improve policy performance and interpretability, we propose a Structure-Informed Policy Network technique that explicitly incorporates analytically-derived characteristics of optimal policies into the learning process. This approach can help interpretability and add robustness to the policy in out-of-sample performance, as we demonstrate in an example with realistic demand data. Finally, we provide an illustrative application of DRL in a non-stationary setting. Our work bridges the gap between data-driven learning and analytical insights in inventory management while maintaining practical applicability.
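摘要中"最优策略的已知结构特性"的一个经典例子是基础库存(order-up-to)结构:补货量等于目标水平与当前库存位置之差,且不为负。下面的草图仅用于说明这一结构本身;结构信息引导的策略网络会用网络预测目标水平S再套用这一映射,此处并非论文的网络实现:

```python
def base_stock_action(inventory_position, base_stock_level):
    """Classic base-stock (order-up-to) structure: order the shortfall
    below the target level S, never a negative quantity. Illustrative
    sketch of the structural property, not the paper's policy network."""
    return max(0.0, base_stock_level - inventory_position)

# below target: order the gap; above target: order nothing
order_low = base_stock_action(inventory_position=3.0, base_stock_level=10.0)
order_high = base_stock_action(inventory_position=12.0, base_stock_level=10.0)
```

把网络输出约束到这类已知形状,既缩小了搜索空间,也让学到的策略天然可解释。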
【2】Improving Generative Ad Text on Facebook using Reinforcement Learning
标题:使用强化学习改进Facebook上的生成广告文本
链接:https://arxiv.org/abs/2507.21983
作者: Jiang, Alex Nikulkov, Yu-Chia Chen, Yang Bai, Zheqing Zhu
备注:D.J. and A.N. contributed equally, 41 pages, 6 figures
摘要:生成式人工智能(AI),特别是大型语言模型(LLM),有望推动变革性的经济变化。LLM在大量文本数据上进行预训练以学习一般语言模式,但随后的后训练阶段对于使其与特定的现实任务对齐至关重要。强化学习(RL)是领先的后训练技术,但其经济影响在很大程度上仍未被探索和量化。我们以首次在Facebook上部署经RL训练的LLM用于生成式广告为切入点来研究这一问题。我们的模型"AdLlama"被集成到Meta的文本生成功能中,为一个帮助广告商创建人类编写广告文本新变体的AI工具提供支持。为了训练这个模型,我们引入了带性能反馈的强化学习(RLPF),一种使用历史广告性能数据作为奖励信号的后训练方法。在Facebook上进行的一项为期10周、覆盖近35,000个广告商和640,000个广告变体的大规模A/B测试中,我们发现与在精选广告上训练的监督模仿模型相比,AdLlama将点击率提高了6.7%(p=0.0296)。这意味着广告商在Facebook上的投资回报率有大幅提高。我们还发现,使用AdLlama的广告商生成了更多的广告变体,表明其对模型输出的满意度更高。据我们所知,这是迄今为止关于在生态效度环境中使用生成式AI的最大规模研究,提供了量化RL后训练有形影响的重要数据点。此外,结果表明,RLPF是一种有前景且可推广的指标驱动后训练方法,弥合了高能力语言模型与有形成果之间的差距。
摘要:Generative artificial intelligence (AI), in particular large language models (LLMs), is poised to drive transformative economic change. LLMs are pre-trained on vast text data to learn general language patterns, but a subsequent post-training phase is critical to align them for specific real-world tasks. Reinforcement learning (RL) is the leading post-training technique, yet its economic impact remains largely underexplored and unquantified. We examine this question through the lens of the first deployment of an RL-trained LLM for generative advertising on Facebook. Integrated into Meta's Text Generation feature, our model, "AdLlama," powers an AI tool that helps advertisers create new variations of human-written ad text. To train this model, we introduce reinforcement learning with performance feedback (RLPF), a post-training method that uses historical ad performance data as a reward signal. In a large-scale 10-week A/B test on Facebook spanning nearly 35,000 advertisers and 640,000 ad variations, we find that AdLlama improves click-through rates by 6.7% (p=0.0296) compared to a supervised imitation model trained on curated ads. This represents a substantial improvement in advertiser return on investment on Facebook. We also find that advertisers who used AdLlama generated more ad variations, indicating higher satisfaction with the model's outputs. To our knowledge, this is the largest study to date on the use of generative AI in an ecologically valid setting, offering an important data point quantifying the tangible impact of RL post-training. Furthermore, the results show that RLPF is a promising and generalizable approach for metric-driven post-training that bridges the gap between highly capable language models and tangible outcomes.
【3】Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics
标题:Assistax:辅助机器人的硬件加速强化学习基准
链接:https://arxiv.org/abs/2507.21638
作者:inckeldey, Elliot Fosong, Elle Miller, Rimvydas Rubavicius, Trevor McInroe, Patricia Wollstadt, Christiane B. Wiebel-Herboth, Subramanian Ramamoorthy, Stefano V. Albrecht
备注:Accepted for the Coordination and Cooperation in Multi-Agent Reinforcement Learning Workshop at the Reinforcement Learning Conference 2025
摘要:强化学习(RL)算法的发展在很大程度上是由雄心勃勃的挑战任务和基准驱动的。游戏一直主导着RL基准,因为它们提出了相关的挑战、运行成本低且易于理解。虽然像围棋和雅达利这样的游戏已经带来了许多突破,但它们通常不能直接转化为现实世界的具身应用。认识到需要使RL基准多样化并解决具身交互场景中出现的复杂性,我们引入了Assistax:一个旨在应对辅助机器人任务中各种挑战的开源基准。Assistax利用JAX的硬件加速显著提升基于物理的模拟中的学习速度。在开环挂钟时间方面,与基于CPU的替代方案相比,Assistax在向量化训练运行时的速度最高可达370倍。Assistax使用多智能体RL对辅助机器人与主动人类患者之间的交互进行概念化建模,训练多样化的伙伴智能体种群,以测试具身机器人智能体的zero-shot协调能力。对流行的连续控制RL和MARL算法进行的广泛评估和超参数调整提供了可靠的基线,并使Assistax成为推进辅助机器人RL研究的实用基准。代码可从以下网址获得:https://github.com/assistive-autonomy/assistax。
摘要:The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run and easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. In recognising the need to diversify RL benchmarks and addressing complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration for significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.
【4】Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach
标题:在多目标强化学习中实现帕累托平稳性探索:一种多目标加权Chebyshev Actor-Critic方法
链接:https://arxiv.org/abs/2507.21397
作者:, Jiao Yang, Tianchen Zhou, Haibo Yang, Chaosheng Dong, Fan Yang, Michinari Momma, Yan Gao, Jia Liu
摘要:在许多多目标强化学习(MORL)应用中,能够在有限时间样本复杂度的理论保证下,系统地探索多个非凸奖励目标下的Pareto平稳解,是一个重要但尚未充分探索的问题。这促使我们迈出第一步,填补MORL中的这一重要空白。具体而言,在本文中,我们提出了一种用于MORL的多目标加权Chebyshev Actor-Critic(MOCHA)算法,它巧妙地将加权Chebyshev(WC)标量化与actor-critic框架相结合,在有限时间样本复杂度保证下系统地实现Pareto平稳性探索。MOCHA算法的样本复杂度结果揭示了在寻找$\epsilon$-Pareto平稳解时对$p_{\min}$的有趣依赖性,其中$p_{\min}$表示WC标量化中给定权向量$\mathbf{p}$的最小分量。通过仔细选择学习率,每次探索的样本复杂度可以达到$\tilde{\mathcal{O}}(\epsilon^{-2})$。此外,在一个大规模KuaiRand离线数据集上的仿真研究表明,MOCHA算法的性能显著优于其他基线MORL方法。
摘要:In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill the important gap in MORL. Specifically, in this paper, we propose a Multi-Objective weighted-CHebyshev Actor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebyshev (WC) and actor-critic framework to enable Pareto-stationarity exploration systematically with finite-time sample complexity guarantee. Sample complexity result of MOCHA algorithm reveals an interesting dependency on $p_{\min}$ in finding an $\epsilon$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in WC-scalarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(\epsilon^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset show that the performance of MOCHA algorithm significantly outperforms other baseline MORL approaches.
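加权Chebyshev(WC)标量化把多目标向量压缩为"距理想点的最大加权偏差":随着权向量p的变化,最小化它可以到达Pareto前沿上的不同解。下面是一个最小的numpy示意(MOCHA将其嵌入actor-critic训练循环,此处仅演示标量化本身):

```python
import numpy as np

def weighted_chebyshev(rewards, weights, ideal):
    """Weighted-Chebyshev (WC) scalarization: the scalar objective is the
    worst weighted deviation from an ideal point z*. Minimal sketch of
    the scalarization step only, not the full MOCHA algorithm."""
    p = np.asarray(weights, dtype=float)
    dev = np.abs(np.asarray(rewards, dtype=float) - np.asarray(ideal, dtype=float))
    return float(np.max(p * dev))

# objective 2 is further from its ideal value, so it dominates the max
val = weighted_chebyshev(rewards=[0.8, 0.2], weights=[0.5, 0.5], ideal=[1.0, 1.0])
```

与线性加权和不同,WC标量化即使在Pareto前沿非凸时也能覆盖整个前沿,这正是它在非凸奖励目标下的吸引力所在。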
【5】Deep Reinforcement Learning for Real-Time Green Energy Integration in Data Centers
标题:深度强化学习用于数据中心实时绿色能源集成
链接:https://arxiv.org/abs/2507.21153
作者: Bahi, Amel Ourici
摘要:本文探讨了深度强化学习(DRL)优化的电子商务数据中心能源管理系统的实施,旨在提高能源效率,成本效益和环境可持续性。所提出的系统利用DRL算法来动态管理可再生能源、储能和电网电力的集成,实时适应波动的能源可用性。该研究表明,DRL优化的系统实现了38%的能源成本降低,显着优于传统的强化学习(RL)方法(28%)和启发式方法(22%)。此外,它保持了1.5%的低SLA违规率,而RL为3.0%,启发式方法为4.8%。DRL优化的方法还导致能源效率提高82%,超过其他方法,并减少45%的碳排放,使其成为最环保的解决方案。该系统的累积奖励为950,反映了其在平衡多个目标方面的卓越表现。通过严格的测试和消融研究,本文验证了DRL模型的架构和参数的有效性,为数据中心的能源管理提供了一个强大的解决方案。研究结果强调了DRL在推进能源优化战略和应对可持续性挑战方面的潜力。
摘要:This paper explores the implementation of a Deep Reinforcement Learning (DRL)-optimized energy management system for e-commerce data centers, aimed at enhancing energy efficiency, cost-effectiveness, and environmental sustainability. The proposed system leverages DRL algorithms to dynamically manage the integration of renewable energy sources, energy storage, and grid power, adapting to fluctuating energy availability in real time. The study demonstrates that the DRL-optimized system achieves a 38% reduction in energy costs, significantly outperforming traditional Reinforcement Learning (RL) methods (28%) and heuristic approaches (22%). Additionally, it maintains a low SLA violation rate of 1.5%, compared to 3.0% for RL and 4.8% for heuristic methods. The DRL-optimized approach also results in an 82% improvement in energy efficiency, surpassing other methods, and a 45% reduction in carbon emissions, making it the most environmentally friendly solution. The system's cumulative reward of 950 reflects its superior performance in balancing multiple objectives. Through rigorous testing and ablation studies, the paper validates the effectiveness of the DRL model's architecture and parameters, offering a robust solution for energy management in data centers. The findings highlight the potential of DRL in advancing energy optimization strategies and addressing sustainability challenges.
符号|符号学习(2篇)
【1】DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation
标题:DEM-NeRF:一种通过物理信息模拟进行科学发现的神经符号方法
链接:https://arxiv.org/abs/2507.21350
作者:n, Alvaro Velasquez, Houbing Song
摘要:神经网络已经成为建模物理系统的强大工具,提供了从有限数据中学习复杂表示的能力,同时集成了基础科学知识。特别是,将数据驱动学习(neuro)与符号方程和规则(symbolic)相结合的神经符号方法,解决了纯经验方法与传统数值求解器之间的紧张关系,纯经验方法有偏离既定物理原理的风险,而传统数值求解器需要完整的几何知识,并且对于高保真模拟来说可能过于昂贵。在这项工作中,我们提出了一种新的神经符号框架,用于直接从稀疏多视图图像序列重建和模拟弹性物体,而不需要显式的几何信息。具体来说,我们集成了一个神经辐射场(NeRF)的对象重建与物理信息的神经网络(PINN),其中包括弹性的偏微分方程。在这样做的过程中,我们的方法学习变形对象的时空表示,利用图像监督和符号物理约束。为了处理复杂的边界和初始条件,传统上使用有限元方法,边界元方法或基于传感器的测量,我们采用了能量约束的物理信息神经网络架构。这种设计提高了模拟精度和结果的可解释性。
摘要:Neural networks have emerged as a powerful tool for modeling physical systems, offering the ability to learn complex representations from limited data while integrating foundational scientific knowledge. In particular, neuro-symbolic approaches that combine data-driven learning, the neuro, with symbolic equations and rules, the symbolic, address the tension between methods that are purely empirical, which risk straying from established physical principles, and traditional numerical solvers that demand complete geometric knowledge and can be prohibitively expensive for high-fidelity simulations. In this work, we present a novel neuro-symbolic framework for reconstructing and simulating elastic objects directly from sparse multi-view image sequences, without requiring explicit geometric information. Specifically, we integrate a neural radiance field (NeRF) for object reconstruction with physics-informed neural networks (PINN) that incorporate the governing partial differential equations of elasticity. In doing so, our method learns a spatiotemporal representation of deforming objects that leverages both image supervision and symbolic physical constraints. To handle complex boundary and initial conditions, which are traditionally confronted using finite element methods, boundary element methods, or sensor-based measurements, we employ an energy-constrained Physics-Informed Neural Network architecture. This design enhances both simulation accuracy and the explainability of results.
【2】Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning
标题:基于算子的机器智能:谱学习与符号推理的希尔伯特空间框架
链接:https://arxiv.org/abs/2507.21189
作者:ruluta, Andreas Lemos, Priscilla Burity
摘要:传统的机器学习模型,特别是神经网络,根植于有限维参数空间和非线性函数逼近。本报告探讨了一种替代表述,将学习任务表示为无限维希尔伯特空间中的采样和计算,并利用泛函分析、信号处理和谱理论的工具。我们回顾了再生核希尔伯特空间(RKHS)、谱算子学习和小波域表示等基本概念。我们给出了希尔伯特空间中学习的严格数学表述,重点介绍了近期基于散射变换和Koopman算子的模型,并讨论了相对于传统神经架构的优势和局限性。报告最后概述了基于希尔伯特信号处理的可扩展且可解释的机器学习的方向。
摘要:Traditional machine learning models, particularly neural networks, are rooted in finite-dimensional parameter spaces and nonlinear function approximations. This report explores an alternative formulation where learning tasks are expressed as sampling and computation in infinite dimensional Hilbert spaces, leveraging tools from functional analysis, signal processing, and spectral theory. We review foundational concepts such as Reproducing Kernel Hilbert Spaces (RKHS), spectral operator learning, and wavelet-domain representations. We present a rigorous mathematical formulation of learning in Hilbert spaces, highlight recent models based on scattering transforms and Koopman operators, and discuss advantages and limitations relative to conventional neural architectures. The report concludes by outlining directions for scalable and interpretable machine learning grounded in Hilbertian signal processing.
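RKHS中学习的一个具体而经典的例子是核岭回归:表示定理把无限维优化问题化为在训练点上求解一个线性方程组。下面是高斯核的最小numpy草图(仅为示意,非报告中的实现):

```python
import numpy as np

def kernel_ridge_fit_predict(X, y, X_new, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: by the representer theorem the
    infinite-dimensional RKHS problem reduces to solving
    (K + lam * n * I) alpha = y over the n training points."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)                 # Gaussian (RBF) kernel
    K = rbf(X, X)
    alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
    return rbf(X_new, X) @ alpha

X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
pred = kernel_ridge_fit_predict(X, y, X, lam=1e-6, gamma=50.0)
```

正则化参数lam控制RKHS范数惩罚:lam越小越接近插值,lam越大函数越平滑。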
分层学习(1篇)
【1】Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series
标题:神经时间序列中潜在流形学习的分层随机微分方程模型
链接:https://arxiv.org/abs/2507.21531
作者:jaei, Maryam Ostadsharif Memar, Navid Ziaei, Behzad Nazari, Ali Yousefi
摘要:流形假设表明,高维神经时间序列位于由更简单的底层动力学塑造的低维流形上。为了揭示这种结构,潜在动态变量模型(如状态空间模型、递归神经网络、神经常微分方程和高斯过程潜变量模型)被广泛使用。我们提出了一种新颖的分层随机微分方程(SDE)模型,在计算效率和可解释性之间取得平衡,解决了现有方法的关键局限。我们的模型假设流形的轨迹可以从流形轨迹的稀疏样本集重建。潜在空间使用布朗桥SDE建模,其中的点(在时间和取值上均有指定)从多变量标记点过程中采样。这些布朗桥定义了第二组SDE的漂移,后者再被映射到观测数据。这产生了一个连续、可微的潜在过程,随着流形点数量的增加,能够对任意复杂的时间序列进行建模。我们推导了训练和推理过程,并表明推理的计算成本与观测数据的长度呈线性关系。随后,我们在合成数据和神经记录上验证了我们的模型,证明它能准确恢复底层流形结构,并随数据维度有效扩展。
摘要:The manifold hypothesis suggests that high-dimensional neural time series lie on a low-dimensional manifold shaped by simpler underlying dynamics. To uncover this structure, latent dynamical variable models such as state-space models, recurrent neural networks, neural ordinary differential equations, and Gaussian Process Latent Variable Models are widely used. We propose a novel hierarchical stochastic differential equation (SDE) model that balances computational efficiency and interpretability, addressing key limitations of existing methods. Our model assumes the trajectory of a manifold can be reconstructed from a sparse set of samples from the manifold trajectory. The latent space is modeled using Brownian bridge SDEs, with points - specified in both time and value - sampled from a multivariate marked point process. These Brownian bridges define the drift of a second set of SDEs, which are then mapped to the observed data. This yields a continuous, differentiable latent process capable of modeling arbitrarily complex time series as the number of manifold points increases. We derive training and inference procedures and show that the computational cost of inference scales linearly with the length of the observation data. We then validate our model on both synthetic data and neural recordings to demonstrate that it accurately recovers the underlying manifold structure and scales effectively with data dimensionality.
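作为潜在空间构件的布朗桥,可以通过对标准布朗运动做端点条件化来采样:桥在两端被精确"钉"在给定的时间和取值上。下面是一个最小的numpy草图(仅演示布朗桥本身,非论文的完整分层模型):

```python
import numpy as np

def brownian_bridge(t, t0, t1, x0, x1, rng):
    """Sample a Brownian bridge on the grid t, pinned to x0 at t0 and
    x1 at t1, by conditioning a Brownian path on its endpoint.
    Minimal sketch of the latent-space building block."""
    dt = np.diff(t, prepend=t0)
    w = np.cumsum(np.sqrt(dt) * rng.normal(size=len(t)))   # Brownian motion from t0
    frac = (t - t0) / (t1 - t0)
    # subtract the linearly interpolated endpoint error, add the pinned line
    return x0 + (w - frac * w[-1]) + frac * (x1 - x0)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
path = brownian_bridge(t, 0.0, 1.0, x0=2.0, x1=-1.0, rng=rng)
```

由这些被钉住的点驱动第二组SDE的漂移,就得到摘要所述的连续、可微潜在过程。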
医学相关(2篇)
【1】Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis
标题:使用机器学习预测心血管疾病:比较分析
链接:https://arxiv.org/abs/2507.21898
作者:rinivas Ramesh, Roshani T S Udupa, Monisha J, Kushi K K S
摘要:心血管疾病(CVD)是全球死亡的主要原因,占所有死亡的31%。本研究使用一个包含68,119条记录的心血管疾病(CVD)数据集,探索数值因素(年龄、身高、体重、血压、BMI)和分类因素(性别、胆固醇、葡萄糖、吸烟、饮酒、活动)对CVD发生的影响。我们进行了统计分析,包括t检验、卡方检验和方差分析,识别出CVD与老年、高血压、较高体重和胆固醇水平异常之间的强关联,而体力活动则为保护因素。逻辑回归模型强调年龄、血压和胆固醇是主要风险因素,而吸烟和饮酒出现了意想不到的负相关,提示潜在的数据问题。模型性能比较显示,CatBoost表现最佳,准确率为0.734,ECE为0.0064,并且在概率预测方面表现出色(Brier评分=0.1824)。数据方面的挑战(包括离群值和偏态分布)表明需要改进预处理以提高预测可靠性。
摘要:Cardiovascular diseases (CVDs) are a main cause of mortality globally, accounting for 31% of all deaths. This study involves a cardiovascular disease (CVD) dataset comprising 68,119 records to explore the influence of numerical (age, height, weight, blood pressure, BMI) and categorical (gender, cholesterol, glucose, smoking, alcohol, activity) factors on CVD occurrence. We have performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, to identify strong associations between CVD and elderly people, hypertension, higher weight, and abnormal cholesterol levels, while physical activity is a protective factor. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol, suggesting potential data issues. Model performance comparisons reveal CatBoost as the top performer with an accuracy of 0.734 and an ECE of 0.0064, and it excels in probabilistic prediction (Brier score = 0.1824). Data challenges, including outliers and skewed distributions, indicate a need for improved preprocessing to enhance predictive reliability.
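摘要中使用的卡方检验基于Pearson卡方统计量:在独立性假设下比较列联表中的观测计数与期望计数。下面用一个玩具2x2列联表(非论文数据)给出最小示意:

```python
import numpy as np

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (e.g. a
    categorical factor vs. CVD occurrence): sum of
    (observed - expected)^2 / expected under independence.
    Illustrative numbers only, not the paper's data."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

independent = [[10, 20], [30, 60]]   # rows proportional: no association
associated = [[30, 10], [10, 30]]    # diagonal-heavy: strong association
```

统计量为零表示完全独立;统计量越大(相对于自由度对应的卡方分布),关联越强。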
【2】PanoGAN A Deep Generative Model for Panoramic Dental Radiographs
标题:PanoGAN全景牙科X光片深度生成模型
链接:https://arxiv.org/abs/2507.21200
作者:ersen, Sanyam Jain, Mikkel Chavez, Viktor Ladehoff, Bruna Neves de Freitas, Ruben Pauwels
摘要:本文介绍了一个用于合成牙科全景X光片的生成对抗网络(GAN)的开发。虽然本质上是探索性的,但该研究旨在解决牙科研究和教育中数据稀缺的问题。我们在2322张不同质量的X光片数据集上,使用带梯度惩罚的Wasserstein损失(WGAN-GP)训练了深度卷积GAN(DCGAN)。重点是牙槽骨区域,其他解剖结构被裁剪掉。我们进行了广泛的预处理和数据清理,以在保留解剖变异性的同时对输入进行标准化。我们通过改变critic迭代次数、特征深度以及训练前是否使用去噪,探索了四个候选模型。一位临床专家使用5分制(1为非常差,5为极好)根据解剖可见性和真实感评价了生成的X光片。大多数图像显示出中等程度的解剖描绘,尽管有些图像因伪影而退化。我们观察到一种权衡:在非去噪数据上训练的模型产生了更精细的细节,特别是在下颌管和骨小梁等结构中,而在去噪数据上训练的模型提供了更好的整体图像清晰度和锐度。这些发现为牙科成像中基于GAN的方法的未来工作提供了基础。
摘要:This paper presents the development of a generative adversarial network (GAN) for synthesizing dental panoramic radiographs. Although exploratory in nature, the study aims to address the scarcity of data in dental research and education. We trained a deep convolutional GAN (DCGAN) using a Wasserstein loss with gradient penalty (WGAN-GP) on a dataset of 2322 radiographs of varying quality. The focus was on the dentoalveolar regions; other anatomical structures were cropped out. Extensive preprocessing and data cleaning were performed to standardize the inputs while preserving anatomical variability. We explored four candidate models by varying critic iterations, feature depth, and the use of denoising prior to training. A clinical expert evaluated the generated radiographs based on anatomical visibility and realism, using a 5-point scale (1 very poor, 5 excellent). Most images showed moderate anatomical depiction, although some were degraded by artifacts. A trade-off was observed: the model trained on non-denoised data yielded finer details, especially in structures like the mandibular canal and trabecular bone, while a model trained on denoised data offered superior overall image clarity and sharpness. These findings provide a foundation for future work on GAN-based methods in dental imaging.
推荐(1篇)
【1】FedFlex: Federated Learning for Diverse Netflix Recommendations
标题:FedFlex:针对多元化Netflix推荐的联合学习
链接:https://arxiv.org/abs/2507.21115
作者:ester, Manel Slokom, Gustavo de Carvalho Bertoli, Matias Vizcaino, Emmanuelle Beauxis Aussalet, Laura Hollink
摘要:联合学习是一种分散的方法,可以在多个设备上进行协作模型训练,同时保护数据隐私。它在各个领域显示出巨大的潜力,包括医疗保健和个性化推荐系统。然而,大多数关于联合推荐系统的现有工作主要集中在提高准确性上,对公平性和多样性的关注有限。在本文中,我们介绍FedFlex,联邦推荐系统Netflix风格的电视剧推荐。FedFlex集成了两种最先进的矩阵分解算法,用于个性化微调。FedFlex还应用最大边缘相关性(MMR)来重新排序项目并增强多样性。我们进行了广泛的实验比较SVD和BPR算法产生的建议。在为期两周的实时用户研究中,参与者收到了两个推荐列表:列表A,基于SVD或BPR,以及列表B,重新排名的版本强调多样性。参与者被要求点击他们感兴趣的电影。我们的研究结果表明,FedFlex有效地将不同的内容,如新的类型,引入到推荐中,而不一定会影响用户的满意度。
摘要:Federated learning is a decentralized approach that enables collaborative model training across multiple devices while preserving data privacy. It has shown significant potential in various domains, including healthcare and personalized recommendation systems. However, most existing work on federated recommendation systems has focused primarily on improving accuracy, with limited attention to fairness and diversity. In this paper, we introduce FedFlex, a federated recommender system for Netflix-style TV series recommendations. FedFlex integrates two state-of-the-art matrix factorization algorithms for personalized fine-tuning. FedFlex also applies Maximal Marginal Relevance (MMR) to re-rank items and enhance diversity. We conduct extensive experiments comparing recommendations generated by SVD and BPR algorithms. In a live two-week user study, participants received two recommendation lists: List A, based on SVD or BPR, and List B, a re-ranked version emphasizing diversity. Participants were asked to click on the movies they were interested in watching. Our findings demonstrate that FedFlex effectively introduces diverse content, such as new genres, into recommendations without necessarily compromising user satisfaction.
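最大边缘相关性(MMR)重排序在"相关性"与"与已选项目的相似度"之间做贪心权衡:每一步选出使 lam·relevance - (1-lam)·max-similarity 最大的项目。下面是一个最小的示意实现(toy数据,非FedFlex代码):

```python
def mmr_rerank(scores, sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick the item that maximizes
    lam * relevance - (1 - lam) * (max similarity to already-selected).
    Minimal sketch of the diversity re-ranking step."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

scores = [0.9, 0.85, 0.3]          # relevance from SVD/BPR (toy values)
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]            # item-item similarity (toy values)
# item 1 nearly duplicates item 0, so the diverse item 2 is ranked second
reranked = mmr_rerank(scores, sim, k=2, lam=0.5)
```

lam趋近1时退化为纯相关性排序;lam越小,列表越偏向多样性。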
自动驾驶|车辆|车道检测等(1篇)
【1】Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem
标题:最小-最大异构容量车辆路径问题的高效神经组合优化求解器
链接:https://arxiv.org/abs/2507.21386
作者:Di Wang, Chunguo Wu, Kaifang Qi, Chunyan Miao, Yubin Xiao, Jian Zhang, You Zhou
摘要:许多神经组合优化(NCO)求解器已被提出来解决车辆路径问题(VRP)。然而,大多数这些求解器只专注于单车辆VRP的变种,忽略了更现实的最小-最大异构容量限制车辆路径问题(MMHCVRP),其中涉及多个车辆。现有的MMHCVRP求解器通常在每个解码步骤中选择车辆及其下一个节点进行访问,但通常会做出短视的解码决策并忽略MMHCVRP的关键属性,包括局部拓扑关系,车辆置换不变性和节点对称性,从而导致次优性能。为了更好地解决这些限制,我们提出了ECHO,一个有效的NCO求解器。首先,ECHO利用所提出的双模态节点编码器来捕获节点之间的局部拓扑关系。随后,为了减轻近视决策,ECHO采用所提出的无参数交叉注意机制来优先考虑在前面的解码步骤中选择的车辆。最后,利用车辆排列不变性和节点对称性,我们为MMHCVRP引入了一种定制的数据增强策略,以稳定强化学习训练过程。为了评估ECHO的性能,我们进行了大量的实验。实验结果表明,ECHO在不同数量的车辆和节点上的性能优于最先进的NCO求解器,并在尺度和分布模式上表现出良好的泛化能力。最后,消融研究验证了所有提出的方法的有效性。
摘要:Numerous Neural Combinatorial Optimization (NCO) solvers have been proposed to address Vehicle Routing Problems (VRPs). However, most of these solvers focus exclusively on single-vehicle VRP variants, overlooking the more realistic min-max Heterogeneous Capacitated Vehicle Routing Problem (MMHCVRP), which involves multiple vehicles. Existing MMHCVRP solvers typically select a vehicle and its next node to visit at each decoding step, but often make myopic decoding decisions and overlook key properties of MMHCVRP, including local topological relationships, vehicle permutation invariance, and node symmetry, resulting in suboptimal performance. To better address these limitations, we propose ECHO, an efficient NCO solver. First, ECHO exploits the proposed dual-modality node encoder to capture local topological relationships among nodes. Subsequently, to mitigate myopic decisions, ECHO employs the proposed Parameter-Free Cross-Attention mechanism to prioritize the vehicle selected in the preceding decoding step. Finally, leveraging vehicle permutation invariance and node symmetry, we introduce a tailored data augment strategy for MMHCVRP to stabilize the Reinforcement Learning training process. To assess the performance of ECHO, we conduct extensive experiments. The experimental results demonstrate that ECHO outperforms state-of-the-art NCO solvers across varying numbers of vehicles and nodes, and exhibits well-performing generalization across both scales and distribution patterns. Finally, ablation studies validate the effectiveness of all proposed methods.
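min-max HCVRP的目标函数是所有车辆中最长的单车行驶时间,因此求解器必须在速度不同的异构车辆之间平衡路线。下面用一个玩具距离矩阵给出该目标的最小示意(仅为说明目标函数本身,非ECHO的实现):

```python
def minmax_route_cost(routes, dist, speeds):
    """Min-max HCVRP objective: the cost of a solution is the LONGEST
    single-vehicle travel time, so balanced routes are rewarded.
    Toy distance matrix; node 0 is the depot."""
    def route_time(route, speed):
        legs = zip([0] + route, route + [0])   # depot -> ... -> depot
        return sum(dist[a][b] for a, b in legs) / speed
    return max(route_time(r, s) for r, s in zip(routes, speeds))

dist = [[0, 2, 4, 6],
        [2, 0, 2, 4],
        [4, 2, 0, 2],
        [6, 4, 2, 0]]
# vehicle 2 is twice as fast, so it can absorb the longer route
cost = minmax_route_cost([[1], [2, 3]], dist, speeds=[1.0, 2.0])
```

与最小化总行驶距离不同,min-max目标下把长路线分给慢车会直接抬高整体成本,这正是异构车辆分配问题的难点。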
推理|分析|理解|解释(6篇)
【1】Analysis of Fourier Neural Operators via Effective Field Theory
标题:傅里叶神经算子的有效场论分析
链接:https://arxiv.org/abs/2507.21833
作者:Kim
备注:37 pages, 10 figures
摘要:傅里叶神经算子(FNO)已成为高维偏微分方程的主要代理模型,但其稳定性、泛化能力和频率行为一直缺乏有原则的解释。我们首次在无限维函数空间中对FNO进行了系统的有效场论分析,推导出层内核与四点顶点的封闭递归关系,并考察了三个具有实际意义的设置:解析激活、尺度不变情形以及带残差连接的架构。理论表明,非线性激活不可避免地将输入频率耦合到原本会被频谱截断丢弃的高频模式,实验也证实了这种频率转移。对于宽网络,我们得到了权重初始化系综的显式临界条件,使小的输入扰动在整个深度上保持统一尺度,经验测试验证了这些预测。总之,我们的结果量化了非线性如何使神经算子捕捉非平凡特征,通过临界性分析为超参数选择提供了标准,并解释了为什么尺度不变激活和残差连接能增强FNO中的特征学习。
摘要:Fourier Neural Operators (FNOs) have emerged as leading surrogates for high-dimensional partial-differential equations, yet their stability, generalization and frequency behavior lack a principled explanation. We present the first systematic effective-field-theory analysis of FNOs in an infinite-dimensional function space, deriving closed recursion relations for the layer kernel and four-point vertex and then examining three practically important settings-analytic activations, scale-invariant cases and architectures with residual connections. The theory shows that nonlinear activations inevitably couple frequency inputs to high-frequency modes that are otherwise discarded by spectral truncation, and experiments confirm this frequency transfer. For wide networks we obtain explicit criticality conditions on the weight-initialization ensemble that keep small input perturbations to have uniform scale across depth, and empirical tests validate these predictions. Taken together, our results quantify how nonlinearity enables neural operators to capture non-trivial features, supply criteria for hyper-parameter selection via criticality analysis, and explain why scale-invariant activations and residual connections enhance feature learning in FNOs.
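The abstract's central claim, that a pointwise nonlinearity couples a single input frequency to higher modes that spectral truncation would otherwise discard, can be checked with a toy discrete Fourier transform. This is an illustrative sketch of the phenomenon, not the paper's analysis; the signal length and mode index are arbitrary choices.

```python
import math

def dft_mag(x):
    """Naive DFT magnitude spectrum of a real sequence."""
    n = len(x)
    mags = []
    for k in range(n):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

N, k = 32, 3
x = [math.cos(2 * math.pi * k * t / N) for t in range(N)]
y = [v * v for v in x]   # pointwise nonlinearity (squaring)

mx = dft_mag(x)
my = dft_mag(y)
# The input lives purely at mode k; squaring moves all of its energy
# to modes 0 and 2k -- exactly the frequency transfer described above.
```

If the spectral truncation cutoff sits between k and 2k, the transferred energy is lost, which is the interaction the paper's recursion relations track.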
【2】MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse
标题:MemShare:通过KV缓存重用进行大型推理模型的内存高效推理
链接:https://arxiv.org/abs/2507.21433
作者:en, Xin Tan, Minchen Yu, Hong Xu
备注:11 pages, 7 figures, submitted to AAAI 2026
摘要:大型推理模型(LRM)在数学推理和形式逻辑任务方面取得了重大进展。然而,它们倾向于生成冗长的思维链序列,导致推理过程中产生大量内存开销。我们观察到,LRM经常产生高度相似的中间推理步骤,这些步骤对应着跨层相似的KV缓存状态。基于这一观察,我们提出了MemShare,一种能有效降低内存开销的新型KV缓存管理方法。MemShare采用协同过滤算法高效识别可重用的KV缓存块,并实现零拷贝缓存重用,在保持准确性的同时显著降低内存开销、提高吞吐量。实验结果表明,与现有的KV缓存管理方法相比,MemShare在保持更好准确性的同时带来高达84.79%的吞吐量提升。
摘要:Large Reasoning Models (LRMs) have achieved significant advances in mathematical reasoning and formal logic tasks. However, their tendency to generate lengthy chain-of-thought sequences leads to substantial memory overhead during inference. We observe that LRMs frequently produce highly similar intermediate reasoning steps, which correspond to similar KV cache states across layers. Motivated by this observation, we propose MemShare, a novel KV cache management approach that effectively reduces memory overhead. MemShare employs a collaborative filtering algorithm to efficiently identify reusable KV cache blocks and enables zero copy cache reuse to significantly reduce memory overhead, improve throughput while maintaining accuracy. Experimental results demonstrate that MemShare delivers up to 84.79\% improvement in throughput while maintaining better accuracy compared to existing KV cache management methods.
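The abstract does not spell out MemShare's collaborative-filtering algorithm, but the underlying idea, detecting cached blocks similar enough that one could share the other's storage, can be sketched with a plain cosine-similarity scan. The function name, threshold, and toy vectors below are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_reusable_blocks(blocks, threshold=0.99):
    """Return pairs (i, j), i < j, whose cached vectors are similar
    enough that block j could, in principle, reuse block i's KV storage."""
    pairs = []
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if cosine(blocks[i], blocks[j]) >= threshold:
                pairs.append((i, j))
    return pairs

blocks = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [1.0, 0.001, 0.5],   # nearly identical to block 0
]
reusable = find_reusable_blocks(blocks)
```

A real system would avoid the quadratic scan (e.g. via the paper's collaborative filtering) and would implement reuse as zero-copy pointer aliasing rather than anything shown here.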
【3】Failure Risk Prediction in a MOOC: A Multivariate Time Series Analysis Approach
标题:MOOC中的失败风险预测:多元时间序列分析方法
链接:https://arxiv.org/abs/2507.21118
作者:Ayady (Crem, IRIMAS), Maxime Devanne (IRIMAS), Germain Forestier (IRIMAS), Nour El Mawas (Crem)
备注:in French language, Environnements Informatiques pour l'Apprentissage Humain 2025, Jun 2025, Villeneuve d'Ascq (Lille), France
摘要:MOOC为广泛的受众提供免费开放的访问,但完成率仍然很低,这通常是由于缺乏个性化内容。为了解决这个问题,必须预测学习者的表现,以便提供量身定制的反馈。点击和事件等行为痕迹可以作为时间序列进行分析,以预判学习者的结果。这项工作比较了多元时间序列分类方法,用于在课程的不同阶段(第5周、第10周后等)识别有风险的学习者。在开放大学学习分析数据集(OULAD)上进行的实验评估聚焦于三门课程:两门STEM课程和一门人文社科(SHS)课程。初步结果表明,所评估的方法在预测MOOC中学习者失败方面很有前景。分析还表明,预测准确性受所记录交互数量的影响,凸显了丰富多样的行为数据的重要性。
摘要:MOOCs offer free and open access to a wide audience, but completion rates remain low, often due to a lack of personalized content. To address this issue, it is essential to predict learner performance in order to provide tailored feedback. Behavioral traces-such as clicks and events-can be analyzed as time series to anticipate learners' outcomes. This work compares multivariate time series classification methods to identify at-risk learners at different stages of the course (after 5, 10 weeks, etc.). The experimental evaluation, conducted on the Open University Learning Analytics Dataset (OULAD), focuses on three courses: two in STEM and one in SHS. Preliminary results show that the evaluated approaches are promising for predicting learner failure in MOOCs. The analysis also suggests that prediction accuracy is influenced by the amount of recorded interactions, highlighting the importance of rich and diverse behavioral data.
【4】R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
标题:R-Stitch:用于高效推理的动态轨迹缝合
链接:https://arxiv.org/abs/2507.17307
作者:hen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang
摘要:思维链(CoT)推理通过在推理过程中鼓励逐步的中间推理来增强大型语言模型的问题求解能力。虽然有效,但CoT依赖对长令牌序列的自回归解码,引入了大量计算开销。现有的加速策略要么通过提前停止或压缩式奖励设计来减少序列长度,要么通过使用较小模型的推测解码来提高解码速度。然而,当小模型与大模型的一致性较低时,推测解码的加速效果有限,并且未能利用小模型在生成简洁中间推理方面的潜在优势。在本文中,我们提出了R-Stitch,一个令牌级、基于置信度的混合解码框架,通过沿推理轨迹在小语言模型(SLM)和大语言模型(LLM)之间切换来加速CoT推理。默认情况下,R-Stitch使用SLM生成令牌,仅当SLM的置信度低于阈值时才委托给LLM。这种设计避免了全序列回滚,并有选择地在不确定的步骤上调用LLM,从而兼顾效率和答案质量。R-Stitch与模型无关、无需训练,并与标准解码管道兼容。在数学推理基准上的实验表明,R-Stitch在准确率下降可忽略不计的情况下将推理延迟降低多达85%,凸显了其在加速CoT推理方面的实际有效性。
摘要:Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM's confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85\% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
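R-Stitch's confidence-gated switching can be sketched as a simple decoding loop over stub models. The stub lookup tables and the threshold value below are made up for illustration; a real implementation would call actual SLM/LLM decoding steps and use token-level probabilities as confidence.

```python
def hybrid_decode(slm_step, llm_step, prompt, threshold=0.7, max_tokens=10):
    """Generate tokens with the small model by default; fall back to the
    large model on any step where the small model's confidence is low."""
    tokens, sources = [], []
    for _ in range(max_tokens):
        token, conf = slm_step(prompt + tokens)
        if conf < threshold:                 # uncertain step: delegate
            token, _ = llm_step(prompt + tokens)
            sources.append("llm")
        else:
            sources.append("slm")
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens, sources

# Stub models standing in for real SLM/LLM decoders (hypothetical data).
def slm_step(ctx):
    table = {0: ("2", 0.9), 1: ("+", 0.95), 2: ("2", 0.3),
             3: ("=", 0.9), 4: ("4", 0.8), 5: ("<eos>", 0.99)}
    return table[len(ctx)]

def llm_step(ctx):
    return ("2", 1.0)

tokens, sources = hybrid_decode(slm_step, llm_step, [])
```

Note how only the single low-confidence step (index 2) is routed to the large model, mirroring the selective invocation described in the abstract.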
【5】An Equal-Probability Partition of the Sample Space: A Non-parametric Inference from Finite Samples
标题:样本空间的等概率划分:来自有限样本的非参数推断
链接:https://arxiv.org/abs/2507.21712
作者:ksson
摘要:本文研究了从任意连续概率分布中抽取的N个有限样本中可以推断出什么,主要发现是N个有序样本点将实数线划分为N+1段,每段的期望概率质量正好为1/(N+1)。这个非参数的结果,遵循顺序统计量的基本属性,无论底层分布的形状如何都成立。这种等概率划分产生了$\log_2(N+1)$比特的离散熵,它量化了从样本中获得的信息,并与香农的连续变量结果形成对比。我比较这个分区为基础的框架,传统的ECDF和讨论其影响强大的非参数推断,特别是在密度和尾部估计。
摘要:This paper investigates what can be inferred about an arbitrary continuous probability distribution from a finite sample of $N$ observations drawn from it. The central finding is that the $N$ sorted sample points partition the real line into $N+1$ segments, each carrying an expected probability mass of exactly $1/(N+1)$. This non-parametric result, which follows from fundamental properties of order statistics, holds regardless of the underlying distribution's shape. This equal-probability partition yields a discrete entropy of $\log_2(N+1)$ bits, which quantifies the information gained from the sample and contrasts with Shannon's results for continuous variables. I compare this partition-based framework to the conventional ECDF and discuss its implications for robust non-parametric inference, particularly in density and tail estimation.
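The 1/(N+1) expected-mass result is easy to verify by simulation. For Uniform(0,1) the mass of the leftmost segment is simply the smallest order statistic, whose expectation is 1/(N+1); by the probability integral transform the same holds for any continuous distribution.

```python
import random

random.seed(0)
N = 9                       # sample size; expected mass per segment is 1/(N+1)
trials = 20000
first_seg = 0.0
for _ in range(trials):
    pts = sorted(random.random() for _ in range(N))
    first_seg += pts[0]     # probability mass below the smallest sample point
mean_first = first_seg / trials
# Monte Carlo estimate of E[mass of first segment]; should be close to
# 1/(N+1) = 0.1 for N = 9.
```

The same check applies to any of the N+1 segments, since each inter-point gap of uniform order statistics has the same Beta(1, N) marginal mass.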
【6】Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics
标题:软物质集体动力学分析中的多尺度几何和拓扑学习
链接:https://arxiv.org/abs/2507.21265
作者:rlova, Amaranta Membrillo Solis, Hayley R. O. Sohn, Tristan Madeleine, Giampaolo D'Alessandro, Ivan I. Smalyukh, Malgosia Kaczmarek, Jacek Brodzki
备注:13 pages, 6 figures
摘要:通过分析实验捕获的图像中的模式来理解动态多体系统的行为和演化是一种与各种生命和非生命自组装系统相关的有前途的方法。这里研究的移动液晶skyrmions阵列是一个有代表性的例子,层次组织的材料,表现出复杂的时空动力学驱动的多尺度过程。联合几何和拓扑数据分析(TDA)提供了一个强大的框架,通过捕获在多个尺度上的数据的底层结构调查这样的系统。在TDA方法中,我们引入了$\Psi$-函数,这是一个强大的数值拓扑描述符,与单个拓扑孤子的大小和形状的时空变化以及具有不同空间组织的区域的出现有关。几何方法的基础上产生的图像的skyrmion合奏的矢量场的分析提供了洞察系统的响应外部刺激的非线性物理机制,并提供了与理论预测比较的基础。这里提出的方法是非常普遍的,可以提供一个表征系统的行为,无论是在个人的模式形成剂的水平,并作为一个整体,允许一个相关的图像数据分析的结果发生在一个物理,化学或生物系统在现实世界中的过程。
摘要:Understanding the behavior and evolution of a dynamical many-body system by analyzing patterns in their experimentally captured images is a promising method relevant for a variety of living and non-living self-assembled systems. The arrays of moving liquid crystal skyrmions studied here are a representative example of hierarchically organized materials that exhibit complex spatiotemporal dynamics driven by multiscale processes. Joint geometric and topological data analysis (TDA) offers a powerful framework for investigating such systems by capturing the underlying structure of the data at multiple scales. In the TDA approach, we introduce the $\Psi$-function, a robust numerical topological descriptor related to both the spatiotemporal changes in the size and shape of individual topological solitons and the emergence of regions with their different spatial organization. The geometric method based on the analysis of vector fields generated from images of skyrmion ensembles offers insights into the nonlinear physical mechanisms of the system's response to external stimuli and provides a basis for comparison with theoretical predictions. The methodology presented here is very general and can provide a characterization of system behavior both at the level of individual pattern-forming agents and as a whole, allowing one to relate the results of image data analysis to processes occurring in a physical, chemical, or biological system in the real world.
检测相关(4篇)
【1】Group Relative Augmentation for Data Efficient Action Detection
标题:用于数据高效动作检测的组相对增强
链接:https://arxiv.org/abs/2507.21353
作者: Patel, Iain Melvin, Zachary Izzo, Martin Renqiang Min
摘要:仅使用少量示例来调整大型视频语言模型(VLM)以进行动作检测面临诸多挑战,例如过拟合,以及场景级预训练与所需的以人为中心的理解之间的粒度不匹配。我们提出了一种高效的自适应策略,将参数高效微调(LoRA)与一种新颖的可学习内部特征增强相结合。这些增强通过FiLM应用于冻结的VLM骨干网络内部,直接生成与任务相关的多样化特征变化。此外,我们引入了一个组加权损失函数,根据每个增强样本的预测相对于组平均值的偏差,动态调节其训练贡献。这通过优先考虑信息丰富且合理的增强来促进稳健学习。我们在复杂的多标签、多人动作检测数据集(AVA、MOMA)上验证了方法的有效性,实现了强劲的mAP性能,并展示了从有限示例调整VLM的显著数据效率。
摘要:Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges like overfitting and the granularity mismatch between scene-level pre-training and required person-centric understanding. We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation. Applied within the frozen VLM backbone using FiLM, these augmentations generate diverse feature variations directly relevant to the task. Additionally, we introduce a group-weighted loss function that dynamically modulates the training contribution of each augmented sample based on its prediction divergence relative to the group average. This promotes robust learning by prioritizing informative yet reasonable augmentations. We demonstrate our method's effectiveness on complex multi-label, multi-person action detection datasets (AVA, MOMA), achieving strong mAP performance and showcasing significant data efficiency for adapting VLMs from limited examples.
【2】Deep Unfolding for MIMO Signal Detection
标题:用于MIMO信号检测的深度展开
链接:https://arxiv.org/abs/2507.21152
作者:, Noboru Koshizuka
摘要:在本文中,我们提出了一种基于深度展开神经网络的MIMO检测器,该检测器采用Wirtinger演算进行复值计算。该方法被称为动态部分收缩阈值(DPST),可以实现高效、可解释且低复杂度的MIMO信号检测。与依赖于实值近似的先前方法不同,我们的方法在复域中原生地操作,与信号处理任务的基本性质保持一致。该算法只需要少量的可训练参数,允许简化训练。仿真结果表明,该方法具有较好的检测性能,迭代次数少,计算复杂度低,是下一代大规模MIMO系统的实用解决方案。
摘要:In this paper, we propose a deep unfolding neural network-based MIMO detector that incorporates complex-valued computations using Wirtinger calculus. The method, referred as Dynamic Partially Shrinkage Thresholding (DPST), enables efficient, interpretable, and low-complexity MIMO signal detection. Unlike prior approaches that rely on real-valued approximations, our method operates natively in the complex domain, aligning with the fundamental nature of signal processing tasks. The proposed algorithm requires only a small number of trainable parameters, allowing for simplified training. Numerical results demonstrate that the proposed method achieves superior detection performance with fewer iterations and lower computational complexity, making it a practical solution for next-generation massive MIMO systems.
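The abstract does not give DPST's exact update rule, but unfolded shrinkage-thresholding detectors in the complex domain typically build on a magnitude soft-threshold that preserves phase, which can be sketched as:

```python
def complex_soft_threshold(z, lam):
    """Shrink the magnitude of a complex value by lam, preserving its phase.
    Standard complex soft-thresholding; DPST's learned variant may differ."""
    mag = abs(z)
    if mag <= lam:
        return 0j
    return z * (1 - lam / mag)

a = complex_soft_threshold(3 + 4j, 1.0)     # magnitude 5 -> 4, same direction
b = complex_soft_threshold(0.2 + 0.1j, 1.0) # below threshold -> zeroed
```

In a deep-unfolded detector, `lam` (and possibly a partial-shrinkage factor) would be a trainable per-layer parameter, which matches the abstract's claim of needing only a small number of trainable parameters.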
【3】Pre-, In-, and Post-Processing Class Imbalance Mitigation Techniques for Failure Detection in Optical Networks
标题:光网络故障检测的预处理、处理中和处理后类失衡缓解技术
链接:https://arxiv.org/abs/2507.21119
作者:iz Ali, Jaroslaw E. Prilepsky, Nicola Sambo, João Pedro, Mohammad M. Hosseini, Antonio Napoli, Sergei K. Turitsyn, Pedro Freire
备注:3 pages + 1 page for acknowledgement and references
摘要:我们比较了用于光网络故障检测中类别不平衡缓解的预处理、处理中和后处理技术。阈值调整实现了最高的F1增益(15.3%),而随机欠采样(RUS)提供了最快的推理速度,凸显了关键的性能与复杂性权衡。
摘要:We compare pre-, in-, and post-processing techniques for class imbalance mitigation in optical network failure detection. Threshold Adjustment achieves the highest F1 gain (15.3%), while Random Under-sampling (RUS) offers the fastest inference, highlighting a key performance-complexity trade-off.
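Threshold Adjustment, the best-performing technique in this comparison, amounts to sweeping the decision threshold on predicted scores and keeping the value that maximizes F1. A minimal sketch on toy data (scores and labels below are invented for illustration):

```python
def f1(preds, labels):
    """F1 score for boolean predictions against 0/1 labels."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, labels):
    """Sweep candidate thresholds and keep the one maximizing F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        score = f1(preds, labels)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]   # classifier failure probabilities
labels = [0, 0, 1, 0, 1, 1]               # 1 = actual failure
t, f = best_threshold(scores, labels)
```

Unlike resampling, this post-processing step leaves the trained model untouched, which is why it trades a small validation sweep for the F1 gain reported above.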
【4】An empirical comparison of some outlier detection methods with longitudinal data
标题:几种异常值检测方法在纵向数据中的实证比较
链接:https://arxiv.org/abs/2507.21203
作者:D'Orazio
摘要:本文研究纵向数据中的异常值检测问题,将官方统计中常用的著名方法与数据挖掘和机器学习领域提出的、基于观测间距离或二叉划分树的方法进行了比较。这是通过将这些方法应用于涉及不同类型统计单位的面板调查数据来实现的。传统方法非常简单,能够直接识别潜在的异常值,但需要特定的假设。相比之下,较新的方法只提供一个分数,其大小与存在异常值的可能性直接相关。所有这些方法都需要用户设置若干调优参数。然而,最新的方法比传统方法更灵活,有时也更有效。此外,这些方法还可应用于多维数据。
摘要:This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance between observations or binary partitioning trees. This is achieved by applying the methods to panel survey data related to different types of statistical units. Traditional methods are quite simple, enabling the direct identification of potential outliers, but they require specific assumptions. In contrast, recent methods provide only a score whose magnitude is directly related to the likelihood of an outlier being present. All the methods require the user to set a number of tuning parameters. However, the most recent methods are more flexible and sometimes more effective than traditional methods. In addition, these methods can be applied to multidimensional data.
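A classic example of the simple, assumption-laden traditional methods the note refers to is the boxplot/IQR rule; a minimal sketch follows (the note's actual methods may differ, and the data are invented):

```python
def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (classic boxplot rule)."""
    s = sorted(values)
    n = len(s)

    def quantile(p):
        # Linear interpolation between order statistics.
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        frac = idx - lo
        return s[lo] * (1 - frac) + s[hi] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 11, 12, 10, 13, 11, 12, 95]   # 95 is an obvious outlier
flagged = iqr_outliers(data)
```

The score-based alternatives discussed in the note (e.g. isolation-tree methods) instead return a continuous anomaly score per observation, leaving the cut-off choice to the analyst.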
分类|识别(5篇)
【1】Classification of Honey Botanical and Geographical Sources using Mineral Profiles and Machine Learning
标题:使用矿物剖面和机器学习对蜂蜜植物和地理来源进行分类
链接:https://arxiv.org/abs/2507.22032
作者:l-Awadhi, Ratnadeep Deshmukh
备注:13 pages, 7 figures, conference paper
摘要:本文提出了一种基于机器学习、利用矿物元素谱识别蜂蜜植物和地理来源的方法。该方法包括两个步骤:预处理和分类。预处理阶段包括缺失值处理和数据归一化。在分类阶段,我们采用多种监督分类模型,用于区分蜂蜜的6种植物来源和13种地理来源。我们在公开的蜂蜜矿物元素数据集上测试了分类器的性能。该数据集包含来自不同植物和地理来源蜂蜜的矿物元素谱。结果表明,蜂蜜中的矿物元素含量为蜂蜜植物来源和地理来源的分类提供了有用的判别信息。结果还表明,随机森林(RF)分类器在该数据集上取得了最佳性能,对蜂蜜植物来源分类的交叉验证准确率为99.30%,对地理来源分类为98.01%。
摘要:This paper proposes a machine learning-based approach for identifying honey floral and geographical sources using mineral element profiles. The proposed method comprises two steps: preprocessing and classification. The preprocessing phase involves missing-value treatment and data normalization. In the classification phase, we employ various supervised classification models for discriminating between six botanical sources and 13 geographical origins of honey. We test the classifiers' performance on a publicly available honey mineral element dataset. The dataset contains mineral element profiles of honeys from various floral and geographical origins. Results show that mineral element content in honey provides discriminative information useful for classifying honey botanical and geographical sources. Results also show that the Random Forests (RF) classifier obtains the best performance on this dataset, achieving a cross-validation accuracy of 99.30% for classifying honey botanical origins and 98.01% for classifying honey geographical origins.
【2】Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification
标题:基于动态学习率调度的PINNs神经网络训练及图像分类
链接:https://arxiv.org/abs/2507.21749
作者:abu, Ashwin A. Raikar, Prasanta K. Ghosh
备注:10 pages
摘要:训练神经网络可能具有挑战性,特别是当问题的复杂性增加时。尽管使用了更广泛或更深入的网络,但训练它们可能是一个繁琐的过程,特别是如果超参数选择错误。学习率是其中一个重要的超参数,在训练过程中通常保持静态。复杂系统中的学习动态通常需要一种更适应的学习速率方法。这种适应性对于有效地导航变化的梯度和优化训练过程中的学习过程至关重要。本文提出了一种动态学习率调度算法(DLRS),该算法根据训练过程中计算的损失值来调整学习率。实验进行有关的问题,物理信息的神经网络(PINNs)和图像分类使用多层感知器和卷积神经网络,分别。实验结果表明,该算法加快了训练速度,提高了稳定性。
摘要:Training neural networks can be challenging, especially as the complexity of the problem increases. Despite using wider or deeper networks, training them can be a tedious process, especially if a wrong choice of the hyperparameter is made. The learning rate is one of such crucial hyperparameters, which is usually kept static during the training process. Learning dynamics in complex systems often requires a more adaptive approach to the learning rate. This adaptability becomes crucial to effectively navigate varying gradients and optimize the learning process during the training process. In this paper, a dynamic learning rate scheduler (DLRS) algorithm is presented that adapts the learning rate based on the loss values calculated during the training process. Experiments are conducted on problems related to physics-informed neural networks (PINNs) and image classification using multilayer perceptrons and convolutional neural networks, respectively. The results demonstrate that the proposed DLRS accelerates training and improves stability.
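The paper's exact DLRS update is not given in the abstract; one plausible loss-driven rule, expanding the learning rate while the loss falls and contracting it when the loss rises, can be sketched as follows (the multipliers are illustrative assumptions, not the paper's values):

```python
def dlrs_update(lr, prev_loss, loss, up=1.05, down=0.7):
    """One scheduler step: expand the learning rate while the loss is
    falling, contract it when the loss rises. Illustrative rule only;
    the paper's actual DLRS update may differ."""
    return lr * up if loss < prev_loss else lr * down

lr = 0.01
losses = [1.0, 0.8, 0.6, 0.9, 0.5]   # hypothetical per-epoch training losses
history = [lr]
for prev, cur in zip(losses, losses[1:]):
    lr = dlrs_update(lr, prev, cur)
    history.append(lr)
```

The asymmetric factors (gentle growth, sharp decay) mimic how adaptive schedules typically back off quickly after a loss spike to preserve stability.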
【3】Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
标题:Whilter:使用话语级多任务分类的基于Whisper的"野外"语音语料库数据过滤器
链接:https://arxiv.org/abs/2507.21642
作者:avenscroft, George Close, Kit Bower-Morris, Jamie Stacey, Dmitry Sityaev, Kris Y. Hong
备注:Accepted for Interspeech 2025
摘要:近年来,随着人们对能够从无标注数据中学习有用特征、用于语音识别或合成等任务的模型的兴趣日益增加,大规模野外语音数据集变得越来越普遍。这些数据集通常包含不需要的特征,例如多个说话人、非目标语言和音乐,可能影响模型学习。Whilter模型被提出作为识别这些不良样本的多任务解决方案。Whilter使用Whisper编码器和基于注意力的分类器,同时解决五个不同的分类问题。此外,还为两个流行的野外语料库的子集发布了带注释的数据集。Whilter在五个子任务中的三个上取得了85%以上的F1分数和6.5%至7.8%的等错误率,在语音相关类别上优于最先进的BEATs分类器,且与多个单任务方案的组合相比处理时间显著减少。
摘要:Large-scale in-the-wild speech datasets have become more prevalent in recent years due to increased interest in models that can learn useful features from unlabelled data for tasks such as speech recognition or synthesis. These datasets often contain undesirable features, such as multiple speakers, non-target languages, and music, which may impact model learning. The Whilter model is proposed as a multitask solution to identify these undesirable samples. Whilter uses a Whisper encoder with an attention-based classifier to solve five diverse classification problems at once. In addition, an annotated dataset is published for a subset of two popular in-the-wild corpora. Whilter achieves F1 scores above 85% and equal error rates of 6.5% to 7.8% for three of five subtasks, outperforming a state-of-the-art BEATs classifier on speech-specific classes, with a notable decrease in processing time compared to a combination of single-task alternatives.
【4】Automatic Classification of User Requirements from Online Feedback -- A Replication Study
标题:根据在线反馈自动分类用户需求--复制研究
链接:https://arxiv.org/abs/2507.21532
作者:t, Nic Boilard, Muhammad Rehan Chaudhary, Cole Thompson, Jacob Idoko, Aakash Sorathiya, Gouri Ginde
备注:10 pages, 3 figures, Replication package available at this https URL, Accepted at AIRE 2025 (12th International Workshop on Artificial Intelligence and Requirements Engineering)
摘要:自然语言处理(NLP)技术已广泛应用于需求工程(RE)领域,以支持分类和歧义检测等任务。尽管RE研究植根于实证调查,但其对NLP4RE(面向RE的NLP)研究复制的关注有限。快速发展的NLP领域正在为高效的机器辅助工作流创造新机会,这可以带来新的视角和结果。因此,我们复制并扩展了之前的一项NLP4RE研究(基线),即"使用深度学习在小数据集环境中对在线反馈中的用户需求进行分类",该研究评估了用于从用户评论中进行需求分类的不同深度学习模型。我们使用公开发布的源代码重现了原始结果,从而有助于加强基线研究的外部效度。随后,我们通过在外部数据集上评估模型性能并将结果与GPT-4o零样本分类器进行比较来扩展实验设置。此外,我们为基线研究准备了复制研究ID卡,这对评估复制就绪性非常重要。结果显示,不同模型的可重现性水平各异,其中朴素贝叶斯表现出完美的可重现性,而BERT和其他模型的结果则好坏参半。我们的研究结果表明,基线深度学习模型BERT和ELMo在外部数据集上表现出良好的泛化能力,GPT-4o的性能与传统的基线机器学习模型相当。此外,我们的评估确认了基线研究的复制就绪性;不过,若补充缺失的环境配置文件,就绪性将进一步提升。我们将这一缺失信息纳入了我们的复制包中,并为本研究提供了复制研究ID卡,以进一步鼓励和支持对本研究的复制。
摘要:Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Although RE research is rooted in empirical investigation, it has paid limited attention to replicating NLP for RE (NLP4RE) studies. The rapidly advancing realm of NLP is creating new opportunities for efficient, machine-assisted workflows, which can bring new perspectives and results to the forefront. Thus, we replicate and extend a previous NLP4RE study (baseline), "Classifying User Requirements from Online Feedback in Small Dataset Environments using Deep Learning", which evaluated different deep learning models for requirement classification from user reviews. We reproduced the original results using publicly released source code, thereby helping to strengthen the external validity of the baseline study. We then extended the setup by evaluating model performance on an external dataset and comparing results to a GPT-4o zero-shot classifier. Furthermore, we prepared the replication study ID-card for the baseline study, important for evaluating replication readiness. Results showed diverse reproducibility levels across different models, with Naive Bayes demonstrating perfect reproducibility. In contrast, BERT and other models showed mixed results. Our findings revealed that baseline deep learning models, BERT and ELMo, exhibited good generalization capabilities on an external dataset, and GPT-4o showed performance comparable to traditional baseline machine learning models. Additionally, our assessment confirmed the baseline study's replication readiness; however missing environment setup files would have further enhanced readiness. We include this missing information in our replication package and provide the replication study ID-card for our study to further encourage and support the replication of our study.
【5】A Contrastive Diffusion-based Network (CDNet) for Time Series Classification
标题:用于时间序列分类的对比扩散网络(CDNet)
链接:https://arxiv.org/abs/2507.21357
作者:ng, Chi-Guhn Lee
备注:19 pages, conference
摘要:深度学习模型因其可扩展性和效率而被广泛用于时间序列分类(TSC)。然而,在类别相似、多峰分布和噪声等具有挑战性的数据条件下,其性能会下降。为了解决这些限制,我们提出了CDNet,一种基于对比扩散的网络,通过学习到的扩散过程生成信息丰富的正负样本来增强现有分类器。与对单个样本去噪的传统扩散模型不同,CDNet通过反向扩散步骤的卷积近似来学习样本之间的转换,包括类内和类间的转换。我们引入了一个具有理论依据的基于CNN的机制,以同时实现去噪和模式覆盖,并结合不确定性加权的复合损失进行稳健训练。在UCR Archive和模拟数据集上的大量实验表明,CDNet显著改进了最先进(SOTA)的深度学习分类器,特别是在噪声、类别相似和多峰条件下。
摘要:Deep learning models are widely used for time series classification (TSC) due to their scalability and efficiency. However, their performance degrades under challenging data conditions such as class similarity, multimodal distributions, and noise. To address these limitations, we propose CDNet, a Contrastive Diffusion-based Network that enhances existing classifiers by generating informative positive and negative samples via a learned diffusion process. Unlike traditional diffusion models that denoise individual samples, CDNet learns transitions between samples--both within and across classes--through convolutional approximations of reverse diffusion steps. We introduce a theoretically grounded CNN-based mechanism to enable both denoising and mode coverage, and incorporate an uncertainty-weighted composite loss for robust training. Extensive experiments on the UCR Archive and simulated datasets demonstrate that CDNet significantly improves state-of-the-art (SOTA) deep learning classifiers, particularly under noisy, similar, and multimodal conditions.
表征(2篇)
【1】Representations in vision and language converge in a shared, multidimensional space of perceived similarities
标题:视觉和语言的表达融合在一个共享的、多维的感知相似性空间中
链接:https://arxiv.org/abs/2507.21871
作者:Marie Simkova, Adrien Doerig, Clayton Hickey, Ian Charest
备注:51 pages, 15 figures
摘要:人类可以毫不费力地描述所看到的东西,但在视觉和语言之间建立共享的表征格式仍然是一个重大挑战。新出现的证据表明,从大型语言模型(LLM)获得的语义特征空间能够很好地预测视觉和语言中的人脑表征。这提出了一种可能性,即感觉系统在将其输入转换到共享的、类嵌入的表征空间的内在能力上是收敛的。然而,这种空间如何体现在人类行为中尚不清楚。为研究这一点,63名参与者分别对来自Natural Scenes Dataset的100幅自然场景图像和100条对应的句子描述进行了行为相似性判断。我们发现,视觉和语言的相似性判断不仅在行为层面收敛,而且能预测由观看自然场景图像诱发的高度相似的fMRI脑响应网络。此外,训练用于将图像映射到LLM嵌入的计算模型,在解释行为相似性结构方面优于类别训练模型和AlexNet对照模型。这些发现表明,人类的视觉和语言相似性判断植根于一个共享的、与模态无关的表征结构,该结构反映了视觉系统如何编码经验。感觉系统与人工系统之间的收敛表明了一种形成概念表征的共同能力:它们不是一阶的、模态特定输入的任意产物,而是反映外部世界稳定关系属性的结构化表征。
摘要:Humans can effortlessly describe what they see, yet establishing a shared representational format between vision and language remains a significant challenge. Emerging evidence suggests that human brain representations in both vision and language are well predicted by semantic feature spaces obtained from large language models (LLMs). This raises the possibility that sensory systems converge in their inherent ability to transform their inputs onto shared, embedding-like representational space. However, it remains unclear how such a space manifests in human behaviour. To investigate this, sixty-three participants performed behavioural similarity judgements separately on 100 natural scene images and 100 corresponding sentence captions from the Natural Scenes Dataset. We found that visual and linguistic similarity judgements not only converge at the behavioural level but also predict a remarkably similar network of fMRI brain responses evoked by viewing the natural scene images. Furthermore, computational models trained to map images onto LLM-embeddings outperformed both category-trained and AlexNet controls in explaining the behavioural similarity structure. These findings demonstrate that human visual and linguistic similarity judgements are grounded in a shared, modality-agnostic representational structure that mirrors how the visual system encodes experience. The convergence between sensory and artificial systems suggests a common capacity of how conceptual representations are formed-not as arbitrary products of first order, modality-specific input, but as structured representations that reflect the stable, relational properties of the external world.
【2】Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations
标题:基于预训练视觉表征的实时音视频语音增强
链接:https://arxiv.org/abs/2507.21448
作者:ksandra)Ma, Sile Yin, Li-Chia Yang, Shuo Zhang
备注:Accepted into Interspeech 2025
摘要:纯音频环境中的语音增强仍然具有挑战性,特别是在存在干扰扬声器的情况下。本文提出了一种简单而有效的实时音视频语音增强系统RAVEN,它隔离和增强屏幕上的目标说话人,同时抑制干扰说话人和背景噪声。我们研究了从视听语音识别(AVSR)和主动说话人检测(ASD)中学习到的视觉嵌入如何在不同的SNR条件和干扰说话人数量下对AVSE做出贡献。我们的研究结果表明,从AVSR和ASD模型的级联嵌入提供了最大的改善,在低信噪比,多扬声器环境,而AVSR嵌入单独执行最好的只有噪音的情况下。此外,我们还开发了一个在计算机CPU上运行的实时流媒体系统,并提供了视频演示和代码库。据我们所知,这是实时AVSE系统的第一个开源实现。
摘要:Speech enhancement in audio-only settings remains challenging, particularly in the presence of interfering speakers. This paper presents a simple yet effective real-time audio-visual speech enhancement (AVSE) system, RAVEN, which isolates and enhances the on-screen target speaker while suppressing interfering speakers and background noise. We investigate how visual embeddings learned from audio-visual speech recognition (AVSR) and active speaker detection (ASD) contribute to AVSE across different SNR conditions and numbers of interfering speakers. Our results show concatenating embeddings from AVSR and ASD models provides the greatest improvement in low-SNR, multi-speaker environments, while AVSR embeddings alone perform best in noise-only scenarios. In addition, we develop a real-time streaming system that operates on a computer CPU and we provide a video demonstration and code repository. To our knowledge, this is the first open-source implementation of a real-time AVSE system.
优化|敛散性(6篇)
【1】Bayesian Neural Network Surrogates for Bayesian Optimization of Carbon Capture and Storage Operations
标题:用于碳捕集与封存操作贝叶斯优化的贝叶斯神经网络替代模型
链接:https://arxiv.org/abs/2507.21803
作者:Panagiotis Fotias, Vassilis Gaganis
摘要:碳捕集与封存(CCS)是促进可持续未来的关键技术。该过程涉及将超临界CO2注入地下地层(这一方法已广泛用于提高石油采收率),具有双重目的:它不仅可以减少CO2排放、应对气候变化,还可以延长油田和平台的运营寿命与可持续性,从而缓解向绿色实践的转变。本文对CCS项目开发中决策变量的优化策略进行了全面的比较评估,采用了一种称为贝叶斯优化(BO)的无导数技术。除了通常被视为BO黄金标准的高斯过程(GP)之外,还在BO框架内考察并比较了多种新型随机模型。本研究探讨了在GP表现不佳的环境中(例如决策变量数量众多或多个目标函数尺度差异较大的情况),在BO中使用比GP更特殊的随机模型的有效性。通过将净现值(NPV)作为关键目标函数,所提出的框架展示了在确保CCS技术可持续部署的同时提高经济可行性的潜力。最后,这项研究是日益增长的BO研究成果在油藏工程行业的首次应用,特别是在寻找更合适的随机模型方面,凸显了其作为提升能源领域可持续性的首选方法的潜力。
摘要:Carbon Capture and Storage (CCS) stands as a pivotal technology for fostering a sustainable future. The process, which involves injecting supercritical CO$_2$ into underground formations, a method already widely used for Enhanced Oil Recovery, serves a dual purpose: it not only curbs CO$_2$ emissions and addresses climate change but also extends the operational lifespan and sustainability of oil fields and platforms, easing the shift toward greener practices. This paper delivers a thorough comparative evaluation of strategies for optimizing decision variables in CCS project development, employing a derivative-free technique known as Bayesian Optimization. In addition to Gaussian Processes, which usually serve as the gold standard in BO, various novel stochastic models were examined and compared within a BO framework. This research investigates the effectiveness of utilizing more exotic stochastic models than GPs for BO in environments where GPs have been shown to underperform, such as in cases with a large number of decision variables or multiple objective functions that are not similarly scaled. By incorporating Net Present Value (NPV) as a key objective function, the proposed framework demonstrates its potential to improve economic viability while ensuring the sustainable deployment of CCS technologies. Ultimately, this study represents the first application in the reservoir engineering industry of the growing body of BO research, specifically in the search for more appropriate stochastic models, highlighting its potential as a preferred method for enhancing sustainability in the energy sector.
【2】MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
标题:MaPPO:利用先验知识最大化后验偏好优化
链接:https://arxiv.org/abs/2507.21183
作者: Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang, Xinpeng Wei, Xiaoman Pan, Hongming Zhang, Dong-Jun Han, Christopher G. Brinton
摘要:随着大型语言模型(LLM)代表用户行事的时代到来,偏好优化(PO)方法已成为使LLM与人类偏好对齐并提升性能的核心方法。我们提出了最大后验偏好优化(MaPPO),一个从偏好中学习的框架,它将先验奖励知识显式地纳入优化目标。现有方法如直接偏好优化(DPO)及其变体将偏好学习视为最大似然估计(MLE)问题,而MaPPO通过将先验奖励估计整合到有原则的最大后验(MaP)目标中来扩展这一范式。这不仅推广了DPO及其变体,还通过缓解对响应的过度简化的二元分类来增强对齐。更重要的是,MaPPO没有引入额外的超参数,并同时支持离线和在线设置下的偏好优化。此外,MaPPO可以作为插件使用,对DPO变体(包括广泛使用的SimPO、IPO和CPO)带来一致的改进。在三个标准基准(MT-Bench、AlpacaEval 2.0和Arena-Hard)上对不同模型规模和模型系列进行的大量经验评估表明,在不牺牲计算效率的情况下,对齐性能得到了一致提升。
摘要:As the era of large language models (LLMs) on behalf of users unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimization (MaPPO), a framework for learning from preferences that explicitly incorporates prior reward knowledge into the optimization objective. While existing methods such as Direct Preference Optimization (DPO) and its variants treat preference learning as a Maximum Likelihood Estimation (MLE) problem, MaPPO extends this paradigm by integrating prior reward estimates into a principled Maximum a Posteriori (MaP) objective. This not only generalizes DPO and its variants, but also enhances alignment by mitigating the oversimplified binary classification of responses. More importantly, MaPPO introduces no additional hyperparameter, and supports preference optimization in both offline and online settings. In addition, MaPPO can be used as a plugin with consistent improvement on DPO variants, including widely used SimPO, IPO, and CPO. Extensive empirical evaluations of different model sizes and model series on three standard benchmarks, including MT-Bench, AlpacaEval 2.0, and Arena-Hard, demonstrate consistent improvements in alignment performance without sacrificing computational efficiency.
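For reference, the standard DPO objective that MaPPO generalizes is given below; the way the prior enters the MaP objective is shown only schematically, since the abstract does not specify MaPPO's exact form.

```latex
% Standard DPO objective (MLE over preference pairs (x, y_w, y_l)):
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\left[
  \log \sigma\!\Big(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \Big)
\right]

% Schematically, a maximum-a-posteriori variant augments the preference
% likelihood with a prior informed by reward estimates (the exact term
% used by MaPPO may differ):
\mathcal{L}_{\mathrm{MaP}} = \mathcal{L}_{\mathrm{likelihood}}
  - \log p_{\mathrm{prior}}(r)
```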
【3】Riemannian Optimization on Tree Tensor Networks with Application in Machine Learning
标题:树张量网络的Riemann优化及其在机器学习中的应用
链接:https://arxiv.org/abs/2507.21726
作者:llner, Marco Trenti, Dirk Lebiedz
备注:24 pages, 6 figures, 4 pseudo-code algorithms, 1 table
摘要:树型张量网络在低秩近似和量子多体模拟中有着广泛的应用。在这项工作中,我们提出了一个正式的分析TTN的微分几何基础。在此基础上,我们开发了高效的一阶和二阶优化算法,利用TTN的内在商结构。此外,我们设计了一个反向传播算法,用于在内核学习设置中训练TTN。我们通过一个有代表性的机器学习任务的数值实验来验证我们的方法。
摘要:Tree tensor networks (TTNs) are widely used in low-rank approximation and quantum many-body simulation. In this work, we present a formal analysis of the differential geometry underlying TTNs. Building on this foundation, we develop efficient first- and second-order optimization algorithms that exploit the intrinsic quotient structure of TTNs. Additionally, we devise a backpropagation algorithm for training TTNs in a kernel learning setting. We validate our methods through numerical experiments on a representative machine learning task.
【4】diffSPH: Differentiable Smoothed Particle Hydrodynamics for Adjoint Optimization and Machine Learning
标题:diffSPH:用于伴随优化和机器学习的可区分光滑粒子流体动力学
链接:https://arxiv.org/abs/2507.21684
作者:henbach, Nils Thuerey
摘要:我们介绍了diffSPH,这是一种全新的开源可微分光滑粒子流体动力学(SPH)框架,完全在PyTorch中开发,具有GPU加速功能。diffSPH是围绕微分设计的,以促进计算流体动力学(CFD)中的优化和机器学习(ML)应用,包括训练神经网络和开发混合模型。它的可微SPH核心,可压缩(激波捕获和多相流),弱可压缩(边界处理和自由表面流)和不可压缩物理方案,使广泛的应用领域。我们通过几个应用程序展示了该框架的独特功能,包括通过一种新颖的、面向目标的方法,通过最小化物理和正则化损失项来解决粒子移位问题,这是一项在传统求解器中通常难以处理的任务。进一步的例子包括优化初始条件和物理参数以匹配目标轨迹、形状优化、实现求解器在环设置以模拟高阶积分,以及通过数百个完整的模拟步骤演示梯度传播。优先考虑可读性,可用性和可扩展性,这项工作为CFD社区提供了一个基础平台,以开发和部署新型神经网络和伴随优化应用程序。
摘要:We present diffSPH, a novel open-source differentiable Smoothed Particle Hydrodynamics (SPH) framework developed entirely in PyTorch with GPU acceleration. diffSPH is designed centrally around differentiation to facilitate optimization and machine learning (ML) applications in Computational Fluid Dynamics~(CFD), including training neural networks and the development of hybrid models. Its differentiable SPH core, and schemes for compressible (with shock capturing and multi-phase flows), weakly compressible (with boundary handling and free-surface flows), and incompressible physics, enable a broad range of application areas. We demonstrate the framework's unique capabilities through several applications, including addressing particle shifting via a novel, target-oriented approach by minimizing physical and regularization loss terms, a task often intractable in traditional solvers. Further examples include optimizing initial conditions and physical parameters to match target trajectories, shape optimization, implementing a solver-in-the-loop setup to emulate higher-order integration, and demonstrating gradient propagation through hundreds of full simulation steps. Prioritizing readability, usability, and extensibility, this work offers a foundational platform for the CFD community to develop and deploy novel neural networks and adjoint optimization applications.
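下面用一个玩具阻尼弹簧模拟来示意"穿过整段模拟轨迹求梯度并优化初始条件"的思路。这里用中心差分近似梯度,而diffSPH这类可微框架通过自动微分/伴随方法获得精确梯度;模型与所有参数均为假设性示例,与论文实现无关。

```python
# Toy sketch: optimize the initial velocity of a damped spring so that the
# state after a full rollout hits a target, differentiating "through" the
# simulator by central finite differences.
def rollout(v0, steps=100, dt=0.05, k=1.0, c=0.2):
    x, v = 1.0, v0
    for _ in range(steps):           # semi-implicit Euler integration
        a = -k * x - c * v
        v += a * dt
        x += v * dt
    return x

def loss(v0, target=0.3):
    return (rollout(v0) - target) ** 2

v0, eps, lr = 0.0, 1e-5, 0.5
for _ in range(200):                 # gradient descent on the initial condition
    g = (loss(v0 + eps) - loss(v0 - eps)) / (2 * eps)
    v0 -= lr * g

final_x = rollout(v0)
```

由于该玩具系统对初速度是线性的,损失为二次函数,梯度下降可以稳定收敛;真实SPH模拟的梯度传播要经过数百个非线性步骤,这正是可微求解器的价值所在。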
【5】On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems
标题:线性系统互信息最优控制中的策略随机性
链接:https://arxiv.org/abs/2507.21543
作者:mi, Kenji Kashima
备注:17 pages
摘要:近年来,互信息最优控制作为最大熵最优控制的一种推广被提出。这两种方法都引入了正则化项来使策略具有随机性,并且重要的是在理论上澄清温度参数(即,正则化项的系数)和策略的随机性。与最大熵最优控制不同的是,互信息最优控制中的这种关系还没有得到研究。在本文中,我们研究了离散时间线性系统的互信息最优控制问题(MIOCP)的这种关系。在推广了MIOCP的一个研究结果之后,我们建立了MIOCP的最优策略的存在性,并分别给出了最优策略成为随机和确定性的温度参数条件.此外,我们还推导出相应的条件下,由交替优化算法得到的政策成为随机和确定性的温度参数。通过数值实验验证了理论结果的正确性。
摘要:In recent years, mutual information optimal control has been proposed as an extension of maximum entropy optimal control. Both approaches introduce regularization terms to render the policy stochastic, and it is important to theoretically clarify the relationship between the temperature parameter (i.e., the coefficient of the regularization term) and the stochasticity of the policy. Unlike in maximum entropy optimal control, this relationship remains unexplored in mutual information optimal control. In this paper, we investigate this relationship for a mutual information optimal control problem (MIOCP) of discrete-time linear systems. After extending the result of a previous study of the MIOCP, we establish the existence of an optimal policy of the MIOCP, and then derive the respective conditions on the temperature parameter under which the optimal policy becomes stochastic and deterministic. Furthermore, we also derive the respective conditions on the temperature parameter under which the policy obtained by an alternating optimization algorithm becomes stochastic and deterministic. The validity of the theoretical results is demonstrated through numerical experiments.
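为了直观理解温度参数与策略随机性的关系,下面给出最大熵(玻尔兹曼)情形下的一个标量示例;互信息正则化与最大熵正则化并不相同,这里仅用摘要所提到的最大熵类比做说明,数值均为假设。

```python
import math

# For a quadratic action-value Q(u) = 0.5 * h * (u - u_star)**2, the
# entropy-regularized (Boltzmann) policy pi(u) ~ exp(-Q(u) / T) is Gaussian
# with standard deviation sqrt(T / h): stochasticity grows with the
# temperature T and the policy becomes deterministic as T -> 0.
def soft_policy_std(h, T):
    return math.sqrt(T / h)

stds = [soft_policy_std(2.0, T) for T in (0.0, 0.5, 2.0)]
```

论文给出的正是互信息设定下这类"温度何时导致随机/确定性策略"的严格条件。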
【6】From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions
标题:从次线性到线性:通过局部Polyak-Lojasiewicz区域在深度网络中快速收敛
链接:https://arxiv.org/abs/2507.21429
作者:Aich, Ashit Baran Aich, Bruce Wade
摘要:梯度下降(GD)在深度神经网络(DNN)的非凸损失景观上的收敛性提出了一个基本的理论挑战。虽然最近的工作已经确定,GD在局部准凸区域(LQCR)内以次线性速率收敛到驻点,但这无法解释实践中一直观察到的指数收敛速度。在本文中,我们解决了这种差异:我们证明了在关于神经切线核(NTK)稳定性的一个温和假设下,这些相同的区域满足局部Polyak-Lojasiewicz(PL)条件。我们引入了局部Polyak-Lojasiewicz区域(LPLR)的概念,其中平方梯度范数为次优间隙提供下界;我们证明了适当初始化的有限宽度网络在初始化点附近存在这样的区域,并建立了GD在LPLR内实现线性收敛,从而提供了首个与经验观察速率相匹配的有限宽度保证。我们在不同的环境中验证了我们的理论,从全连接网络上的受控实验到用随机方法训练的现代ResNet架构,证明了LPLR结构在实际深度学习场景中稳健地出现。通过NTK框架将局部景观几何与快速优化严格联系起来,我们的工作为深度学习中基于梯度的优化的显著效率提供了明确的理论解释。
摘要:The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate within locally quasi-convex regions (LQCRs), this fails to explain the exponential convergence rates consistently observed in practice. In this paper, we resolve this discrepancy by proving that under a mild assumption on Neural Tangent Kernel (NTK) stability, these same regions satisfy a local Polyak-Lojasiewicz (PL) condition. We introduce the concept of a Locally Polyak-Lojasiewicz Region (LPLR), where the squared gradient norm lower-bounds the suboptimality gap, prove that properly initialized finite-width networks admit such regions around initialization, and establish that GD achieves linear convergence within an LPLR, providing the first finite-width guarantee that matches empirically observed rates. We validate our theory across diverse settings, from controlled experiments on fully-connected networks to modern ResNet architectures trained with stochastic methods, demonstrating that LPLR structure emerges robustly in practical deep learning scenarios. By rigorously connecting local landscape geometry to fast optimization through the NTK framework, our work provides a definitive theoretical explanation for the remarkable efficiency of gradient-based optimization in deep learning.
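PL条件下"非凸但线性收敛"可以在一个经典的一维例子上直接演示(此为文献中常见的示例函数,与论文的网络设定无关,仅作说明):

```python
import math

# f(x) = x**2 + 3*sin(x)**2 is non-convex but satisfies the PL inequality
# |f'(x)|**2 >= 2*mu*(f(x) - f_star) with f_star = 0, so gradient descent
# drives the suboptimality gap to zero at a linear (geometric) rate.
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

x, lr = 2.5, 0.1          # lr < 2/L with smoothness constant L = 8
gaps = []
for _ in range(40):
    gaps.append(f(x))     # record the gap f(x) - f_star before each step
    x -= lr * grad(x)
```

40步后间隙已远小于1e-6,且每一步严格下降,体现了PL条件给出的几何衰减,而仅有局部准凸性只能保证次线性速率。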
预测|估计(10篇)
【1】Foundation Models for Demand Forecasting via Dual-Strategy Ensembling
标题:基于双策略集成的需求预测基础模型
链接:https://arxiv.org/abs/2507.22053
作者: Defu Cao, Yan Liu
摘要:准确的需求预测对于供应链优化至关重要,但由于层次结构的复杂性、领域偏移和不断变化的外部因素,在实践中仍然很困难。虽然最近的基础模型为时间序列预测提供了强大的潜力,但它们往往存在架构刚性,并且在分布变化下鲁棒性有限。在本文中,我们提出了一个统一的集成框架,以提升基础模型在现实世界供应链销售预测中的性能。我们的方法结合了两种互补的策略:(1)分层集成(HE),它按语义层级(例如商店、类别、部门)划分训练和推理,以捕捉局部化的模式;(2)架构集成(AE),它集成来自不同模型骨干的预测,以减轻偏差并提高稳定性。我们在M5基准测试和三个外部销售数据集上进行了广泛的实验,涵盖了域内预测和zero-shot预测。结果表明,我们的方法始终优于强基线,提高了各层级上的准确性,并提供了一个简单而有效的机制,以提升复杂预测环境中的泛化能力。
摘要:Accurate demand forecasting is critical for supply chain optimization, yet remains difficult in practice due to hierarchical complexity, domain shifts, and evolving external factors. While recent foundation models offer strong potential for time series forecasting, they often suffer from architectural rigidity and limited robustness under distributional change. In this paper, we propose a unified ensemble framework that enhances the performance of foundation models for sales forecasting in real-world supply chains. Our method combines two complementary strategies: (1) Hierarchical Ensemble (HE), which partitions training and inference by semantic levels (e.g., store, category, department) to capture localized patterns; and (2) Architectural Ensemble (AE), which integrates predictions from diverse model backbones to mitigate bias and improve stability. We conduct extensive experiments on the M5 benchmark and three external sales datasets, covering both in-domain and zero-shot forecasting. Results show that our approach consistently outperforms strong baselines, improves accuracy across hierarchical levels, and provides a simple yet effective mechanism for boosting generalization in complex forecasting environments.
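两种集成策略的组合方式可以用一个极简的玩具例子示意(数据、模型与加权方式均为假设,仅说明"按层级划分的专家模型"与"跨骨干平均"如何叠加):

```python
import statistics

# Toy data: per-store sales histories.
sales = {"store_a": [10, 12, 11], "store_b": [100, 90, 110]}

def hierarchical_forecast(store):
    # HE sketch: one tiny specialist per semantic group (here, a per-store mean).
    return statistics.mean(sales[store])

def backbone_1(store):
    # Stand-in "backbone": naive last-value model.
    return sales[store][-1]

def backbone_2(store):
    # Stand-in "backbone": short moving average.
    return statistics.mean(sales[store][-2:])

def architectural_forecast(store):
    # AE sketch: average predictions from diverse backbones.
    return statistics.mean([backbone_1(store), backbone_2(store)])

combined = {s: 0.5 * hierarchical_forecast(s) + 0.5 * architectural_forecast(s)
            for s in sales}
```

实际系统中两路预测的权重通常在验证集上选取,这里固定为0.5仅作演示。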
【2】DeepGo: Predictive Directed Greybox Fuzzing
标题:DeepGo:预测性定向灰盒模糊测试
链接:https://arxiv.org/abs/2507.21952
作者:in, Pengfei Wang, Xu Zhou, Wei Xie, Gen Zhang, Kai Lu
摘要:最先进的DGF技术重新定义和优化了适应度度量,以准确快速地到达目标站点。然而,适应度指标的优化主要基于启发式算法,它通常依赖于历史执行信息,并对尚未执行的路径缺乏前瞻性。因此,那些具有复杂约束的难以执行的路径将阻碍DGF到达目标,使得DGF效率较低。在本文中,我们提出了DeepGo,一个预测性定向灰盒模糊器,可以结合历史和预测信息,引导DGF通过最佳路径到达目标站点。我们首先提出了路径转换模型,它将DGF建模为通过特定路径转换序列到达目标站点的过程。变异产生的新种子会引起路径转移,高回报路径转移序列对应的路径表明通过它到达目标站点的可能性很高。然后,为了预测路径转移和相应的回报,我们使用深度神经网络构建虚拟集成环境(VEE),其逐渐模仿路径转换模型并预测尚未采取的路径转换的回报。为了确定最佳路径,我们开发了一个面向模糊测试的强化学习(RLF)模型来生成具有最高序列奖励的转换序列。RLF模型结合历史路径转移和预测路径转移生成最优路径转移序列,并给出指导模糊测试变异策略的策略。最后,为了执行高回报的路径转移序列,提出了行动组的概念,对模糊测试的关键步骤进行综合优化,以实现高效到达目标的最优路径。
摘要:The state-of-the-art DGF techniques redefine and optimize the fitness metric to reach the target sites precisely and quickly. However, optimizations for fitness metrics are mainly based on heuristic algorithms, which usually rely on historical execution information and lack foresight on paths that have not been exercised yet. Thus, those hard-to-execute paths with complex constraints would hinder DGF from reaching the targets, making DGF less efficient. In this paper, we propose DeepGo, a predictive directed grey-box fuzzer that can combine historical and predicted information to steer DGF to reach the target site via an optimal path. We first propose the path transition model, which models DGF as a process of reaching the target site through specific path transition sequences. The new seed generated by mutation would cause the path transition, and the path corresponding to the high-reward path transition sequence indicates a high likelihood of reaching the target site through it. Then, to predict the path transitions and the corresponding rewards, we use deep neural networks to construct a Virtual Ensemble Environment (VEE), which gradually imitates the path transition model and predicts the rewards of path transitions that have not been taken yet. To determine the optimal path, we develop a Reinforcement Learning for Fuzzing (RLF) model to generate the transition sequences with the highest sequence rewards. The RLF model can combine historical and predicted path transitions to generate the optimal path transition sequences, along with the policy to guide the mutation strategy of fuzzing. Finally, to exercise the high-reward path transition sequence, we propose the concept of an action group, which comprehensively optimizes the critical steps of fuzzing to realize the optimal path to reach the target efficiently.
【3】Data-Driven Extended Corresponding State Approach for Residual Property Prediction of Hydrofluoroolefins
标题:数据驱动的扩展对应状态方法用于氢氟烯烃剩余性质预测
链接:https://arxiv.org/abs/2507.21720
作者:, Peng Hu
摘要:氢氟烯烃因其极低的全球变暖潜值而被认为是最有前途的下一代制冷剂,可以有效缓解全球变暖效应。然而,由于缺乏可靠的热力学数据,阻碍了新的、更优越的氢氟烯烃制冷剂的发现和应用。本文综合理论方法和数据驱动方法的优点,提出了一种神经网络扩展对应状态模型来预测氢氟烯烃制冷剂的剩余热力学性质。本文的创新之处在于,通过引入图神经网络模块和对模型结构的专门设计,以提高模型的泛化能力,从流体的微观分子结构来表征流体。所提出的模型使用高精度的数据进行训练,已知的流体,并通过留一交叉验证方法进行评估。与传统的扩展对应状态模型或立方型状态方程相比,该模型对液相和超临界区的密度和能量性质的计算精度有显著提高,密度的平均绝对偏差分别为1.49%(液相)和2.42%(超临界),剩余熵的平均绝对偏差分别为3.37%和2.50%,剩余焓的平均绝对偏差分别为1.85%和1.34%。这些结果证明了将物理知识嵌入机器学习模型的有效性。所提出的神经网络扩展对应状态模型有望显著加快新型氢氟烯烃制冷剂的发现。
摘要:Hydrofluoroolefins are considered the most promising next-generation refrigerants due to their extremely low global warming potential values, which can effectively mitigate the global warming effect. However, the lack of reliable thermodynamic data hinders the discovery and application of newer and superior hydrofluoroolefin refrigerants. In this work, integrating the strengths of theoretical method and data-driven method, we proposed a neural network extended corresponding state model to predict the residual thermodynamic properties of hydrofluoroolefin refrigerants. The innovation is that the fluids are characterized through their microscopic molecular structures by the inclusion of graph neural network module and the specialized design of model architecture to enhance its generalization ability. The proposed model is trained using the highly accurate data of available known fluids, and evaluated via the leave-one-out cross-validation method. Compared to conventional extended corresponding state models or cubic equation of state, the proposed model shows significantly improved accuracy for density and energy properties in liquid and supercritical regions, with average absolute deviation of 1.49% (liquid) and 2.42% (supercritical) for density, 3.37% and 2.50% for residual entropy, 1.85% and 1.34% for residual enthalpy. These results demonstrate the effectiveness of embedding physics knowledge into the machine learning model. The proposed neural network extended corresponding state model is expected to significantly accelerate the discovery of novel hydrofluoroolefin refrigerants.
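摘要中的评估协议(留一交叉验证加平均绝对偏差AAD%)可以用一个通用的小草图说明;下面的拟合模型只是一个玩具替身,与论文的神经网络扩展对应状态模型无关,数据亦为虚构。

```python
# Sketch of the evaluation protocol: leave-one-out cross-validation scored by
# average absolute deviation in percent (AAD%).
def aad_percent(pred, true):
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)

def loo_cv(xs, ys, fit, predict):
    errs = []
    for i in range(len(xs)):
        tr_x = xs[:i] + xs[i + 1:]          # hold out fluid i, train on the rest
        tr_y = ys[:i] + ys[i + 1:]
        model = fit(tr_x, tr_y)
        errs.append(abs(predict(model, xs[i]) - ys[i]) / abs(ys[i]))
    return 100.0 * sum(errs) / len(errs)

# Toy stand-in for the property model: y ~ w * x fit by least squares.
def fit(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict(w, x):
    return w * x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
score = loo_cv(xs, ys, fit, predict)
```

留一法在"已知流体较少"的场景下尤其合适,因为每个流体都会轮流充当一次未见测试对象。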
【4】PREIG: Physics-informed and Reinforcement-driven Interpretable GRU for Commodity Demand Forecasting
标题:PREIG:基于物理信息和强化学习驱动的可解释GRU,用于商品需求预测
链接:https://arxiv.org/abs/2507.21710
作者:a, Junbin Gao, Minh-Ngoc Tran
摘要:由于波动的市场动态、非线性依赖性以及对经济一致性预测的需求,准确预测商品需求仍然是一个关键挑战。本文介绍了PREIG,这是一种为商品需求预测量身定制的新型深度学习框架。该模型通过嵌入特定领域的经济约束(价格和需求之间的负弹性),将门控递归单元(GRU)架构与物理信息神经网络(PINN)原理进行了独特的集成。这种约束是通过一个定制的损失函数来实施的,该函数对违反物理规则的行为进行惩罚,确保模型预测保持可解释性,并与经济理论保持一致。为了进一步提高预测性能和稳定性,PREIG采用了一种混合优化策略,将NAdam和L-BFGS与基于种群的训练(POP)相结合。在多个商品数据集上的实验表明,PREIG在RMSE和MAPE方面都显著优于传统的计量经济学模型(ARIMA,GARCH)和深度学习基线(BPNN,RNN)。与GRU相比,PREIG保持了良好的可解释性,同时仍然表现出良好的预测性能。通过桥接领域知识、优化理论和深度学习,PREIG为经济中的高维非线性时间序列预测提供了一个强大的、可解释的和可扩展的解决方案。
摘要:Accurately forecasting commodity demand remains a critical challenge due to volatile market dynamics, nonlinear dependencies, and the need for economically consistent predictions. This paper introduces PREIG, a novel deep learning framework tailored for commodity demand forecasting. The model uniquely integrates a Gated Recurrent Unit (GRU) architecture with physics-informed neural network (PINN) principles by embedding a domain-specific economic constraint: the negative elasticity between price and demand. This constraint is enforced through a customized loss function that penalizes violations of the physical rule, ensuring that model predictions remain interpretable and aligned with economic theory. To further enhance predictive performance and stability, PREIG incorporates a hybrid optimization strategy that couples NAdam and L-BFGS with Population-Based Training (POP). Experiments across multiple commodities datasets demonstrate that PREIG significantly outperforms traditional econometric models (ARIMA,GARCH) and deep learning baselines (BPNN,RNN) in both RMSE and MAPE. When compared with GRU,PREIG maintains good explainability while still performing well in prediction. By bridging domain knowledge, optimization theory and deep learning, PREIG provides a robust, interpretable, and scalable solution for high-dimensional nonlinear time series forecasting in economy.
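"用定制损失函数惩罚违反负弹性约束"的做法可以用下面的草图说明:当预测的需求随价格上升时加入铰链惩罚。这只是对思路的假设性还原,函数名、罚项形式与数值均非论文原文。

```python
# Physics-informed loss sketch: MSE plus a hinge penalty whenever predicted
# demand *rises* with price, enforcing the negative price-elasticity rule.
def pinn_loss(preds, targets, prices, lam=10.0):
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    penalty = 0.0
    for i in range(1, len(preds)):
        dq = preds[i] - preds[i - 1]
        dp = prices[i] - prices[i - 1]
        if dp != 0:
            penalty += max(0.0, dq / dp)   # positive demand/price slope => violation
    return mse + lam * penalty

prices  = [1.0, 2.0, 3.0]
targets = [9.0, 7.0, 5.0]
ok  = pinn_loss([9.0, 7.0, 5.0], targets, prices)   # demand falls: no penalty
bad = pinn_loss([9.0, 7.0, 8.0], targets, prices)   # demand rises with price
```

在训练中,这一罚项会把梯度推向满足经济约束的解,即使数据本身含噪或稀疏。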
【5】Categorical Distributions are Effective Neural Network Outputs for Event Prediction
标题:分类分布是事件预测的有效神经网络输出
链接:https://arxiv.org/abs/2507.21616
作者:an, Tom Baden
备注:32 pages, 26 figures
摘要:我们证明了将分类概率分布这一简单的神经网络输出用于下一次脉冲(spike)预测任务的有效性。这一案例研究引出了一个问题:为什么这种简单的输出结构在神经时间点过程模型中并不常用。我们发现的证据表明,许多用于评估时间点过程模型的现有数据集并未揭示太多关于底层事件生成过程的信息,而许多现有模型表现良好是由于模型规模的正则化效应和输出结构上的约束。我们扩展了现有数据集并创建了新数据集,以便在这种信息受限的状态之外进行探索,并发现输出简单的分类分布在广泛的数据集上具有竞争力。
摘要:We demonstrate the effectiveness of using a simple neural network output, a categorical probability distribution, for the task of next spike prediction. This case study motivates an investigation into why this simple output structure is not commonly used with neural temporal point process models. We find evidence that many existing datasets for evaluating temporal point process models do not reveal much information about the underlying event generating processes, and many existing models perform well due to regularization effects of model size and constraints on output structure. We extend existing datasets and create new ones in order to explore outside of this information limited regime and find that outputting a simple categorical distribution is competitive across a wide range of datasets.
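"分类分布输出"的做法本质上是把到下一事件的时间离散成若干区间,再用softmax加交叉熵训练。下面是一个自包含的示意(区间边界与logits均为虚构):

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(logits, target_bin):
    # Cross-entropy for a single observed inter-event time.
    return -math.log(softmax(logits)[target_bin])

bins = [0.0, 1.0, 2.0, 4.0]                  # bin edges for inter-event time (a.u.)
def to_bin(dt):
    return sum(1 for e in bins[1:] if dt >= e)

logits = [0.1, 2.0, 0.5, -1.0]               # e.g. a network head's raw outputs
probs = softmax(logits)
loss = nll(logits, to_bin(1.3))              # an observed gap of 1.3 falls in bin 1
```

与参数化强度函数的时间点过程头相比,这种输出几乎不需要额外机制,这正是摘要所检验的"简单结构却具竞争力"的设定。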
【6】Advancing Wildfire Risk Prediction via Morphology-Aware Curriculum Contrastive Learning
标题:通过形态感知课程对比学习推进野火风险预测
链接:https://arxiv.org/abs/2507.21147
作者:Lo Scudo, Alessio De Rango, Luca Furnari, Alfonso Senatore, Donato D'Ambrosio, Giuseppe Mendicino, Gianluigi Greco
备注:To appear in the Proceedings of ECAI 2025
摘要:野火严重影响自然生态系统和人类健康,导致生物多样性丧失,水文地质风险增加,有毒物质排放增加。气候变化加剧了这些影响,特别是在气温上升和干旱期延长的地区,如地中海。这就需要制定先进的风险管理战略,利用最先进的技术。然而,在这种情况下,数据显示出对不平衡环境的偏见,其中野火事件的发生率显着低于典型情况。这种不平衡,加上高维时空数据的固有复杂性,为训练深度学习架构带来了重大挑战。此外,由于精确的野火预测主要取决于天气数据,因此找到一种降低计算成本的方法,以便使用最新的天气预报进行更频繁的更新将是有益的。本文探讨了如何采用对比框架,可以通过增强潜在的补丁的动态功能表示来解决这些挑战。因此,我们引入了一个新的基于形态学的课程对比学习,减轻了与不同的区域特征相关的问题,并使使用较小的补丁大小,而不影响性能。通过实验分析验证了所提出的建模策略的有效性。
摘要:Wildfires significantly impact natural ecosystems and human health, leading to biodiversity loss, increased hydrogeological risks, and elevated emissions of toxic substances. Climate change exacerbates these effects, particularly in regions with rising temperatures and prolonged dry periods, such as the Mediterranean. This requires the development of advanced risk management strategies that utilize state-of-the-art technologies. However, in this context, the data show a bias toward an imbalanced setting, where the incidence of wildfire events is significantly lower than typical situations. This imbalance, coupled with the inherent complexity of high-dimensional spatio-temporal data, poses significant challenges for training deep learning architectures. Moreover, since precise wildfire predictions depend mainly on weather data, finding a way to reduce computational costs to enable more frequent updates using the latest weather forecasts would be beneficial. This paper investigates how adopting a contrastive framework can address these challenges through enhanced latent representations for the patch's dynamic features. We thus introduce a new morphology-based curriculum contrastive learning that mitigates issues associated with diverse regional characteristics and enables the use of smaller patch sizes without compromising performance. An experimental analysis is performed to validate the effectiveness of the proposed modeling strategies.
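论文的形态感知课程式对比学习建立在标准对比目标之上;作为背景,下面给出一个通用的InfoNCE损失草图(纯示意,向量与温度参数均为虚构,并非论文的具体实现):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    # InfoNCE: cross-entropy of picking the positive among all candidates.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / tau for s in sims]
    m = max(logits)
    denom = sum(math.exp(z - m) for z in logits)
    return -(logits[0] - m - math.log(denom))

anchor    = [1.0, 0.0]
positive  = [0.9, 0.1]                 # similar patch embedding -> low loss
negatives = [[0.0, 1.0], [-1.0, 0.2]]
loss = info_nce(anchor, positive, negatives)
loss_far = info_nce(anchor, [0.0, 1.0], negatives)   # dissimilar "positive"
```

课程式版本会按难度(此处为区域形态差异)逐步引入更难的正负样本对,而损失形式保持不变。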
【7】Reducing Data Requirements for Sequence-Property Prediction in Copolymer Compatibilizers via Deep Neural Network Tuning
标题:通过深度神经网络调优降低共聚物增容剂序列-性质预测的数据需求
链接:https://arxiv.org/abs/2507.21902
作者:qul Islam, Nishat N. Labiba, Lawrence O. Hall, David S. Simmons
备注:23 pages, 6 figures
摘要:合成序列控制聚合物有望通过将合成聚合物的化学多功能性与生物蛋白质的精确序列介导功能相结合来改变聚合物科学。然而,这些材料的设计已被证明是非常具有挑战性的,因为它们缺乏与加速蛋白质设计密切相关的进化分子的大量数据集。在这里,我们报告了一种新的人工智能策略,可以大大减少加速这些材料设计所需的数据量。我们关注的是将"增容剂"分子的重复单元序列与其降低不同聚合物域之间界面张力的能力联系起来的数据。这些分子的最佳序列对于混合废弃聚合物回收等应用至关重要,并在很大程度上取决于聚合物的浓度和化学细节等变量。使用当前的方法,这将需要在每种条件下都有一个完全不同的数据集来实现设计。在这里,我们表明,在一组条件下对序列/界面张力关系的低保真度数据进行训练的深度神经网络可以快速调优,以在另一组不同的条件下进行更高保真的预测,所需数据远少于通常的需要。这种启动和调优方法应该允许单个低保真度父数据集在整个相关系统族中显著加速预测和设计。从长远来看,它还可以提供一种方法,用来自快速、粗粒度模拟的AI见解引导定量的原子级设计。
摘要:Synthetic sequence-controlled polymers promise to transform polymer science by combining the chemical versatility of synthetic polymers with the precise sequence-mediated functionality of biological proteins. However, design of these materials has proven extraordinarily challenging, because they lack the massive datasets of closely related evolved molecules that accelerate design of proteins. Here we report on a new Artificial Intelligence strategy to dramatically reduce the amount of data necessary to accelerate these materials' design. We focus on data connecting the repeat-unit-sequence of a compatibilizer molecule to its ability to reduce the interfacial tension between distinct polymer domains. The optimal sequence of these molecules, which are essential for applications such as mixed-waste polymer recycling, depends strongly on variables such as concentration and chemical details of the polymer. With current methods, this would demand an entirely distinct dataset to enable design at each condition. Here we show that a deep neural network trained on low-fidelity data for sequence/interfacial tension relations at one set of conditions can be rapidly tuned to make higher-fidelity predictions at a distinct set of conditions, requiring far less data than would ordinarily be needed. This priming-and-tuning approach should allow a single low-fidelity parent dataset to dramatically accelerate prediction and design in an entire constellation of related systems. In the long run, it may also provide an approach to bootstrapping quantitative atomistic design with AI insights from fast, coarse simulations.
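"先启动、再调优"的核心收益可以用一个一维玩具回归示意:在大量略有偏差的低保真数据上预训练,再用极少的高保真数据短暂微调,好于用同样的小预算从零训练。数据、模型与步数均为虚构,仅作说明。

```python
# Toy priming-and-tuning sketch: gradient-descent fit of y = w * x.
def fit(w0, data, lr=0.05, steps=20):
    w = w0
    for _ in range(steps):
        g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * g
    return w

lofi = [(x, 2.1 * x) for x in (1.0, 2.0, 3.0)]   # plentiful, slightly biased data
hifi = [(1.5, 3.0)]                               # scarce accurate data (true w = 2.0)

primed  = fit(0.0, lofi, steps=200)               # long pre-training ("priming")
tuned   = fit(primed, hifi, steps=20)             # short tuning run from the prime
scratch = fit(0.0, hifi, steps=20)                # same short budget, no prime
```

启动后的参数(约2.1)离高保真目标(2.0)很近,短暂微调即可逼近;从零开始在同样预算内残差更大。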
【8】Unified machine-learning framework for property prediction and time-evolution simulation of strained alloy microstructure
标题:用于应变合金微观结构性能预测和时间演变模拟的统一机器学习框架
链接:https://arxiv.org/abs/2507.21760
作者:ntasia, Daniele Lanzoni, Niccolò Di Eugenio, Angelo Monteleone, Roberto Bergamaschini, Francesco Montalenti
备注:19 pages, 9 figures
摘要:我们介绍了一个统一的机器学习框架,旨在便捷地处理合金微观结构在弹性场影响下的时间演变。该方法允许从一段较短的轨迹中同时提取弹性参数,并预测在其影响下微观结构的进一步演变。我们聚焦于存在晶格失配eta时的旋节分解(spinodal decomposition),并将相场模拟提供的真实演变与合适的卷积递归神经网络架构的预测进行了广泛比较,以此展示上述能力。这两项任务随后可以组合成一个级联框架。在广泛的失配条件下,这里提出的级联模型即使在接近旋节分解的临界条件时,也能准确预测eta以及完整的相应微观结构演变。我们还展示了模型对更大计算域尺寸的可扩展性,以及在时间上温和的外推误差(对长度为训练采样序列五倍的时间序列)。所提出的框架是通用的,其应用可以超越这里作为示例考虑的特定原型系统。有趣的是,实验视频可以在模拟进一步时间演化之前用于推断未知的外部参数。
摘要:We introduce a unified machine-learning framework designed to conveniently tackle the temporal evolution of alloy microstructures under the influence of an elastic field. This approach allows for the simultaneous extraction of elastic parameters from a short trajectory and for the prediction of further microstructure evolution under their influence. This is demonstrated by focusing on spinodal decomposition in the presence of a lattice mismatch eta, and by carrying out an extensive comparison between the ground-truth evolution supplied by phase field simulations and the predictions of suitable convolutional recurrent neural network architectures. The two tasks may then be performed subsequently into a cascade framework. Under a wide spectrum of misfit conditions, the here-presented cascade model accurately predicts eta and the full corresponding microstructure evolution, also when approaching critical conditions for spinodal decomposition. Scalability to larger computational domain sizes and mild extrapolation errors in time (for time sequences five times longer than the sampled ones during training) are demonstrated. The proposed framework is general and can be applied beyond the specific, prototypical system considered here as an example. Intriguingly, experimental videos could be used to infer unknown external parameters, prior to simulating further temporal evolution.
【9】Stochastic forest transition model dynamics and parameter estimation via deep learning
标题:森林过渡的随机模型动力学与基于深度学习的参数估计
链接:https://arxiv.org/abs/2507.21486
作者:umabe, Tianyu Song, Ton Viet Ta
摘要:森林过渡是一种复杂的现象,其特点是森林、农业用地和废弃地之间的动态变化。本研究开发了一个随机微分方程模型来捕捉这些转变的复杂动态。我们建立了模型的全局正解的存在性,并进行了数值分析,以评估模型参数对森林砍伐激励的影响。为了解决参数估计的挑战,我们提出了一种新的深度学习方法,该方法从包含森林和农业用地比例的时间序列观测的单个样本中估计所有模型参数。这一创新方法使我们能够了解未来任何时候的森林转型动态和森林砍伐趋势。
摘要:Forest transitions, characterized by dynamic shifts between forest, agricultural, and abandoned lands, are complex phenomena. This study developed a stochastic differential equation model to capture the intricate dynamics of these transitions. We established the existence of global positive solutions for the model and conducted numerical analyses to assess the impact of model parameters on deforestation incentives. To address the challenge of parameter estimation, we proposed a novel deep learning approach that estimates all model parameters from a single sample containing time-series observations of forest and agricultural land proportions. This innovative approach enables us to understand forest transition dynamics and deforestation trends at any future time.
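随机微分方程模型的数值模拟通常采用Euler-Maruyama格式;下面是一个两分量土地利用玩具SDE的示意(漂移、扩散与参数均为虚构,并非论文的实际模型):

```python
import random

# Euler-Maruyama sketch for a toy land-use SDE:
#   F = forest fraction, A = agricultural fraction.
def simulate(f0=0.6, a0=0.3, alpha=0.2, beta=0.1, sigma=0.02,
             dt=0.01, steps=1000, seed=0):
    rng = random.Random(seed)
    f, a = f0, a0
    for _ in range(steps):
        dW = rng.gauss(0.0, dt ** 0.5)       # Brownian increment ~ N(0, dt)
        drift_f = -alpha * f + beta * a      # deforestation vs. regrowth
        f += drift_f * dt + sigma * f * dW
        a += (alpha * f - beta * a) * dt - sigma * f * dW
    return f, a

f_end, a_end = simulate()
```

论文的深度学习参数估计正是以这类模拟轨迹(森林与农业用地比例的时间序列)为输入,反推alpha、beta、sigma等未知参数。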
【10】Predicting VBAC Outcomes from U.S. Natality Data using Deep and Classical Machine Learning Models
标题:使用深度和经典机器学习模型从美国出生数据预测WBAC结果
链接:https://arxiv.org/abs/2507.21330
作者:and
备注:12 pages, 10 figures, 1 table
摘要:准确预测剖腹产后分娩试验(TOLAC)的结果对于指导产前咨询和最大限度地减少分娩相关风险至关重要。这项研究提出了用于预测剖宫产后阴道分娩(VBAC)的监督机器学习模型,使用来自CDC WONDER Natality数据集(2017-2023)的643,029例TOLAC病例。在过滤了有一两次剖腹产史的单胎分娩以及47个产前特征的完整数据后,训练了三个分类器:逻辑回归、XGBoost和多层感知器(MLP)。MLP实现了最高性能,AUC为0.7287,紧随其后的是XGBoost(AUC = 0.727),两者都超过了逻辑回归基线(AUC = 0.709)。为了解决类别不平衡问题,将类别加权应用于MLP,并在XGBoost中实现了自定义损失函数。评价指标包括ROC曲线、混淆矩阵和精确-召回分析。Logistic回归系数强调母亲BMI、教育、产次、合并症和产前护理指标是关键预测因素。总体而言,结果表明,定期收集的早孕变量可以支持可扩展且性能中等的VBAC预测模型。这些模型提供了潜在的效用,在临床决策支持,特别是在缺乏专业的分娩期数据的设置。
摘要:Accurately predicting the outcome of a trial of labor after cesarean (TOLAC) is essential for guiding prenatal counseling and minimizing delivery-related risks. This study presents supervised machine learning models for predicting vaginal birth after cesarean (VBAC) using 643,029 TOLAC cases from the CDC WONDER Natality dataset (2017-2023). After filtering for singleton births with one or two prior cesareans and complete data across 47 prenatal-period features, three classifiers were trained: logistic regression, XGBoost, and a multilayer perceptron (MLP). The MLP achieved the highest performance with an AUC of 0.7287, followed closely by XGBoost (AUC = 0.727), both surpassing the logistic regression baseline (AUC = 0.709). To address class imbalance, class weighting was applied to the MLP, and a custom loss function was implemented in XGBoost. Evaluation metrics included ROC curves, confusion matrices, and precision-recall analysis. Logistic regression coefficients highlighted maternal BMI, education, parity, comorbidities, and prenatal care indicators as key predictors. Overall, the results demonstrate that routinely collected, early-pregnancy variables can support scalable and moderately high-performing VBAC prediction models. These models offer potential utility in clinical decision support, particularly in settings lacking access to specialized intrapartum data.
其他神经网络|深度学习|模型|建模(13篇)
【1】Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling
标题:用于代理建模的连续时间深度神经网络中的权重参数化
链接:https://arxiv.org/abs/2507.22045
作者:so, Lars Ruthotto, Khachik Sargsyan
备注:34 pages, 6 figures, submitted to the MoRE24 special issue of Computational Science and Engineering
摘要:连续时间深度学习模型,如神经常微分方程(ODE),为复杂物理系统的代理建模提供了一个有前途的框架。训练这些模型的一个核心挑战在于学习表达性但稳定的时变权重,特别是在计算约束下。这项工作研究的权重参数化策略,约束权重的时间演变到一个低维子空间由多项式基函数。我们在离散化然后优化和优化然后离散化训练范例下评估神经ODE和残差网络(ResNet)架构中的单项式和勒让德多项式基。三个高维基准问题的实验结果表明,勒让德参数化产生更稳定的训练动态,降低计算成本,并实现可比或优于单项参数化和无约束权重模型的准确性。这些发现阐明了基选择在时间相关权重参数化中的作用,并表明使用正交多项式基在模型表达能力和训练效率之间提供了有利的权衡。
摘要:Continuous-time deep learning models, such as neural ordinary differential equations (ODEs), offer a promising framework for surrogate modeling of complex physical systems. A central challenge in training these models lies in learning expressive yet stable time-varying weights, particularly under computational constraints. This work investigates weight parameterization strategies that constrain the temporal evolution of weights to a low-dimensional subspace spanned by polynomial basis functions. We evaluate both monomial and Legendre polynomial bases within neural ODE and residual network (ResNet) architectures under discretize-then-optimize and optimize-then-discretize training paradigms. Experimental results across three high-dimensional benchmark problems show that Legendre parameterizations yield more stable training dynamics, reduce computational cost, and achieve accuracy comparable to or better than both monomial parameterizations and unconstrained weight models. These findings elucidate the role of basis choice in time-dependent weight parameterization and demonstrate that using orthogonal polynomial bases offers a favorable tradeoff between model expressivity and training efficiency.
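"将时变权重约束在多项式基张成的低维子空间"可以写成 w(t) = sum_k c_k P_k(t);下面用显式写出的前三个勒让德多项式给出一个纯Python草图(系数与取值均为示例),并数值验证基的正交性,这正是摘要所称优化良态性的来源:

```python
# Legendre parameterization of a time-varying scalar weight on [-1, 1]:
# w(t) = c0*P0(t) + c1*P1(t) + c2*P2(t), with P0=1, P1=t, P2=(3t^2-1)/2.
def legendre_val(c, t):
    p = [1.0, t, 0.5 * (3.0 * t * t - 1.0)]
    return sum(ck * pk for ck, pk in zip(c, p))

coeffs = [0.5, -0.2, 0.1]                      # three coefficients replace one
ts = [-1.0, -0.5, 0.0, 0.5, 1.0]               # free weight per time step
w_of_t = [legendre_val(coeffs, t) for t in ts]

# Orthogonality check: integral of P1*P2 over [-1, 1] is 0 (midpoint rule).
def p1p2(t):
    return t * (0.5 * (3.0 * t * t - 1.0))

n = 20000
h = 2.0 / n
inner = sum(p1p2(-1.0 + (i + 0.5) * h) * h for i in range(n))
```

相比单项式基(1, t, t^2, ...),正交基下各系数的梯度方向近似解耦,这与摘要报告的更稳定训练动态一致。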
【2】Staining and locking computer vision models without retraining
标题:无需再训练即可染色和锁定计算机视觉模型
链接:https://arxiv.org/abs/2507.22000
作者: Sutton, Qinghua Zhou, George Leete, Alexander N. Gorban, Ivan Y. Tyukin
备注:10 pages, 9 pages of appendices, 10 figures
摘要:我们介绍了染色和锁定计算机视觉模型的新方法,以保护其所有者的知识产权。染色,也称为水印,将秘密行为嵌入到模型中,稍后可以用于识别它;而锁定旨在使模型不可用,除非将秘密触发器插入到输入图像中。与现有方法不同,我们的算法可用于染色和锁定预先训练的模型,而无需微调或重新训练,并且具有可证明的、可计算的保证,限制了最坏情况下的误报率。染色和锁定通过直接修改模型的少量权重来实现,并且对(解锁)模型的性能影响极小。通过在输入图像的角落插入一个小的"触发补丁"来解锁锁定的模型。我们给出的实验结果证明了我们方法的有效性,并展示了其在各种计算机视觉模型上的实际性能。
摘要:We introduce new methods of staining and locking computer vision models, to protect their owners' intellectual property. Staining, also known as watermarking, embeds secret behaviour into a model which can later be used to identify it, while locking aims to make a model unusable unless a secret trigger is inserted into input images. Unlike existing methods, our algorithms can be used to stain and lock pre-trained models without requiring fine-tuning or retraining, and come with provable, computable guarantees bounding their worst-case false positive rates. The stain and lock are implemented by directly modifying a small number of the model's weights and have minimal impact on the (unlocked) model's performance. Locked models are unlocked by inserting a small `trigger patch' into the corner of the input image. We present experimental results showing the efficacy of our methods and demonstrating their practical performance on a variety of computer vision models.
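"无触发补丁即不可用"的锁定思路可以在一个玩具线性打分器上示意:加入一个"守卫"项,只有当图像角落的像素恰为秘密触发值时才被抵消。这纯属概念演示,与论文对真实网络权重的修改方式及其保证无关,所有数值均为虚构。

```python
# Illustrative lock on a toy linear scorer: a hidden guard term wrecks the
# score unless a trigger patch with known values occupies the image corner.
TRIGGER = [7.0, -3.0]          # secret patch values (corner pixels 0 and 1)
GUARD_W = [4.0, 2.0]           # guard weights over those pixels
GUARD_B = -sum(w * t for w, t in zip(GUARD_W, TRIGGER))   # cancels on trigger
PENALTY = 1000.0

def locked_score(pixels, clean_score):
    guard = sum(w * p for w, p in zip(GUARD_W, pixels[:2])) + GUARD_B
    return clean_score - PENALTY * abs(guard)

plain  = locked_score([0.0, 0.0, 0.5], clean_score=3.2)   # no trigger: ruined
opened = locked_score(TRIGGER + [0.5], clean_score=3.2)   # trigger: intact
```

真实方案还需保证守卫项对自然图像几乎不会被意外抵消,这正对应摘要中"最坏情况误报率"的可计算上界。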
【3】Capacity-Constrained Continual Learning
标题:能力受限的持续学习
链接:https://arxiv.org/abs/2507.21479
作者:, Doina Precup, Benjamin Van Roy, Satinder Singh
摘要:我们可能构建的任何代理都受到容量限制,因为内存和计算资源本质上是有限的。然而,相对较少的注意力一直致力于了解有限的能力代理应该如何分配他们的资源,以实现最佳性能。本文的目标是通过研究一个简单但相关的连续学习问题:容量约束线性二次高斯(LQG)序列预测问题来阐明这个问题。我们在适当的技术条件下得出了这个问题的解决方案。此外,对于可以分解为一组子问题的问题,我们还演示了如何在稳定状态下优化分配这些子问题的容量。我们认为,本文的结果作为第一步,在系统的理论研究能力限制下的学习。
摘要:Any agents we can possibly build are subject to capacity constraints, as memory and compute resources are inherently finite. However, comparatively little attention has been dedicated to understanding how agents with limited capacity should allocate their resources for optimal performance. The goal of this paper is to shed some light on this question by studying a simple yet relevant continual learning problem: the capacity-constrained linear-quadratic-Gaussian (LQG) sequential prediction problem. We derive a solution to this problem under appropriate technical conditions. Moreover, for problems that can be decomposed into a set of sub-problems, we also demonstrate how to optimally allocate capacity across these sub-problems in the steady state. We view the results of this paper as a first step in the systematic theoretical study of learning under capacity constraints.
【4】Hebbian Memory-Augmented Recurrent Networks: Engram Neurons in Deep Learning
标题:Hebbian内存增强回归网络:深度学习中的Engram神经元
链接:https://arxiv.org/abs/2507.21474
作者:elogowski
备注:20 pages, 11 figures, 4 tables
摘要:尽管在不同的任务中取得了成功,但目前的人工递归网络架构主要依赖于隐式隐藏状态记忆,限制了它们的可解释性和建模长期依赖关系的能力。相比之下,生物神经系统采用显式的关联记忆痕迹(即,记忆痕迹)通过赫布突触可塑性加强,并在回忆过程中稀疏激活。出于这些神经生物学的见解,我们介绍了恩格拉姆神经网络(ENN),一种新的经常性架构,结合了明确的,可区分的记忆矩阵与赫布可塑性和稀疏,注意力驱动的检索机制。ENN通过动态Hebbian痕迹显式地对记忆形成和回忆进行建模,与传统的RNN变体相比,提高了透明度和可解释性。我们评估ENN架构的三个典型的基准:MNIST数字分类,CIFAR-10图像序列建模,WikiText-103语言建模。我们的实证结果表明,ENN实现了与经典RNN,GRU和LSTM架构大致相当的准确性和泛化性能,所有模型在大规模WikiText-103任务上都收敛到类似的准确性和困惑度。与此同时,ENN通过可观察的内存动态提供了显著的可解释性增强。Hebbian轨迹可视化进一步揭示了生物学上合理的结构化记忆形成过程,验证了神经科学启发的机制的潜力,为开发更具可解释性和鲁棒性的深度学习模型提供了信息。
摘要:Despite success across diverse tasks, current artificial recurrent network architectures rely primarily on implicit hidden-state memories, limiting their interpretability and ability to model long-range dependencies. In contrast, biological neural systems employ explicit, associative memory traces (i.e., engrams) strengthened through Hebbian synaptic plasticity and activated sparsely during recall. Motivated by these neurobiological insights, we introduce the Engram Neural Network (ENN), a novel recurrent architecture incorporating an explicit, differentiable memory matrix with Hebbian plasticity and sparse, attention-driven retrieval mechanisms. The ENN explicitly models memory formation and recall through dynamic Hebbian traces, improving transparency and interpretability compared to conventional RNN variants. We evaluate the ENN architecture on three canonical benchmarks: MNIST digit classification, CIFAR-10 image sequence modeling, and WikiText-103 language modeling. Our empirical results demonstrate that the ENN achieves accuracy and generalization performance broadly comparable to classical RNN, GRU, and LSTM architectures, with all models converging to similar accuracy and perplexity on the large-scale WikiText-103 task. At the same time, the ENN offers significant enhancements in interpretability through observable memory dynamics. Hebbian trace visualizations further reveal biologically plausible, structured memory formation processes, validating the potential of neuroscience-inspired mechanisms to inform the development of more interpretable and robust deep learning models.
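赫布式外积记忆的"写入/读出"机制可以用一个极小的纯Python草图说明(这是简化示意:论文的ENN还带有稀疏的注意力检索,此处仅展示外积写入与基于键的线性读出):

```python
# Hebbian memory trace: writing adds an outer product M += eta * k v^T;
# reading with an (orthogonal) key recovers the stored value exactly.
def outer_add(M, k, v, eta=1.0):
    for i in range(len(k)):
        for j in range(len(v)):
            M[i][j] += eta * k[i] * v[j]

def read(M, q):
    # content-based linear readout: r_j = sum_i M[i][j] * q[i]
    return [sum(M[i][j] * q[i] for i in range(len(q))) for j in range(len(M[0]))]

dim_k, dim_v = 3, 2
M = [[0.0] * dim_v for _ in range(dim_k)]
outer_add(M, [1.0, 0.0, 0.0], [5.0, -1.0])    # store pair (k1, v1)
outer_add(M, [0.0, 1.0, 0.0], [2.0, 7.0])     # store pair (k2, v2)

recall_1 = read(M, [1.0, 0.0, 0.0])           # cue with k1 -> recovers v1
recall_2 = read(M, [0.0, 1.0, 0.0])           # cue with k2 -> recovers v2
```

当键相互正交时回忆是精确的;键之间重叠会产生串扰,这正是注意力驱动的稀疏检索要缓解的问题。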
【5】Systolic Array-based Accelerator for State-Space Models
标题:基于脉动阵列的状态空间模型加速器
链接:https://arxiv.org/abs/2507.21394
作者:a, Cansu Demirkiran, Aakash Sarkar, Milos Popovic, Ajay Joshi
摘要:序列建模对于AI理解时态数据和检测复杂的时间依赖模式至关重要。虽然递归神经网络(RNN)、卷积神经网络(CNN)和Transformer在捕获长程依赖方面取得了进展,但由于有限的记忆保留(固定上下文窗口),它们难以在非常长的序列上实现高准确性。状态空间模型(SSM)利用指数衰减的记忆来实现很长的上下文窗口,因此比循环和基于Transformer的模型更高效地处理非常长的数据序列。与CNN和RNN等传统神经模型不同,基于SSM的模型需要通过连续积分来求解微分方程,使得训练和推理在传统CPU和GPU上都是计算密集型和内存密集型的。在本文中,我们介绍了一个用于加速SSM的专用硬件加速器EpochCore。EpochCore基于脉动阵列(SA),旨在提高基于SSM的模型在长程序列任务上推理的能量效率和吞吐量。在SA中,我们提出了一个名为LIMA-PE的通用处理单元(PE),以执行传统和专用的MAC操作,从而同时支持传统DNN和SSM。为了配合EpochCore微架构,我们提出了一种新颖的数据流ProDF,它能够高效地执行基于SSM的模型。通过利用LIMA-PE微架构和ProDF,与传统的基于SA的加速器相比,EpochCore以2倍的面积成本为代价,实现了平均250倍的性能提升和45倍的能效提升;与GPU内核操作相比,在LRA数据集上的每次推理延迟改善了约2,000倍。
摘要:Sequence modeling is crucial for AI to understand temporal data and detect complex time-dependent patterns. While recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformers have advanced in capturing long-range dependencies, they struggle to achieve high accuracy on very long sequences due to limited memory retention (fixed context window). State-Space Models (SSMs) leverage exponentially decaying memory, enabling lengthy context windows, and so they process very long data sequences more efficiently than recurrent and Transformer-based models. Unlike traditional neural models like CNNs and RNNs, SSM-based models require solving differential equations through continuous integration, making training and inference both compute- and memory-intensive on conventional CPUs and GPUs. In this paper, we introduce a specialized hardware accelerator, EpochCore, for accelerating SSMs. EpochCore is based on systolic arrays (SAs) and is designed to enhance the energy efficiency and throughput of inference of SSM-based models for long-range sequence tasks. Within the SA, we propose a versatile processing element (PE) called LIMA-PE to perform traditional and specialized MAC operations to support traditional DNNs and SSMs. To complement the EpochCore microarchitecture, we propose a novel dataflow, ProDF, which enables highly efficient execution of SSM-based models. By leveraging the LIMA-PE microarchitecture and ProDF, EpochCore achieves on average 250x gains in performance and 45x improvement in energy efficiency, at the expense of a 2x increase in area cost over traditional SA-based accelerators, and around 2,000x improvement in latency/inference on LRA datasets compared to GPU kernel operations.
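The "exponentially decaying memory" that gives SSMs their long context window reduces, in the simplest diagonal case, to a linear recurrence. A minimal sketch (toy values, not EpochCore's datapath):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal diagonal linear state-space recurrence: the state is an
    exponentially decaying summary of the entire input history, which is
    what gives SSMs their long effective context window."""
    h = np.zeros_like(A)
    ys = []
    for xt in x:
        h = A * h + B * xt          # |A| < 1: old inputs decay geometrically
        ys.append(float(C @ h))
    return np.array(ys)

A = np.array([0.9, 0.5])            # per-channel decay rates
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
# Impulse response: y[t] = 0.9**t - 0.5**t
y = ssm_scan([1.0, 0.0, 0.0], A, B, C)
```

Each step is a small set of multiply-accumulate operations per channel, which is why the recurrence maps naturally onto the systolic-array MAC units the paper targets.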
【6】Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators
标题:利用微分神经元环形振荡器网络进行储备池计算
链接:https://arxiv.org/abs/2507.21377
作者: Yeung, Peter DelMastro, Arjun Karuvally, Hava Siegelmann, Edward Rietman, Hananel Hazan
备注:8 pages, 5 figures
摘要:储备池计算是一种机器学习方法,它利用复杂系统丰富的动力学行为进行函数逼近。目前的储备池计算方法使用耦合的积分神经元网络,需要稳定的电流来维持活动。在这里,我们引入一个由微分神经元构成的小世界图,这些神经元仅在输入发生变化时才被激活,以替代积分神经元作为储备池计算基底。我们找到了使这些小世界网络能够作为有效储备池的耦合强度和网络拓扑结构。我们在MNIST数字识别任务中证明了这些网络的有效性,达到了与现有储备池计算方法相当的90.65%的性能。研究结果表明,微分神经元可能是积分神经元的潜在替代方案,并可以为耗电的人工智能应用提供可持续的未来替代方案。
摘要:Reservoir Computing is a machine learning approach that uses the rich repertoire of complex system dynamics for function approximation. Current approaches to reservoir computing use a network of coupled integrating neurons that require a steady current to maintain activity. Here, we introduce a small world graph of differentiating neurons that are active only when there are changes in input as an alternative to integrating neurons as a reservoir computing substrate. We find the coupling strength and network topology that enable these small world networks to function as an effective reservoir. We demonstrate the efficacy of these networks in the MNIST digit recognition task, achieving comparable performance of 90.65% to existing reservoir computing approaches. The findings suggest that differentiating neurons can be a potential alternative to integrating neurons and can provide a sustainable future alternative for power-hungry AI applications.
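The contrast the abstract draws can be shown in a toy reservoir: integrating neurons need sustained drive, whereas differentiating neurons respond only to input *changes*, so activity dies out under constant input. The network below is a small random reservoir, not the paper's small-world topology; sizes and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
W = rng.normal(scale=0.3 / np.sqrt(N), size=(N, N))   # weak recurrent coupling
w_in = rng.normal(size=N)                             # input weights

def run_reservoir(x):
    """Differentiating-neuron reservoir: units are driven by the input
    difference x[t] - x[t-1], so a constant input produces no activity
    instead of requiring a steady current to sustain it."""
    s = np.zeros(N)
    states = []
    prev = x[0]
    for xt in x[1:]:
        s = np.tanh(W @ s + w_in * (xt - prev))
        prev = xt
        states.append(s.copy())
    return np.array(states)

const = run_reservoir(np.ones(50))                 # no change => quiescent
varying = run_reservoir(np.sin(np.arange(50.0)))   # changes => rich activity
```

A linear readout trained on the `varying` state trajectories is what would turn this substrate into a classifier in the usual reservoir-computing recipe.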
【7】Blending data and physics for reduced-order modeling of systems with spatiotemporal chaotic dynamics
标题:混合数据和物理对具有时空混沌动力学的系统进行降阶建模
链接:https://arxiv.org/abs/2507.21299
作者: Michael D. Graham
摘要:虽然数据驱动技术是混沌动力学系统降阶建模的强大工具,但利用已知物理(即全阶模型(FOM))来提高预测能力仍有很大潜力。我们开发了一种由数据和FOM共同提供信息的混合降阶模型(ROM),用于在不变流形上演化时空混沌动力学,其流形坐标通过自动编码器找到。该方法将FOM的向量场投影到不变流形上;然后,这个由物理导出的向量场或者使用动态数据进行校正,或者作为随数据更新的贝叶斯先验。在这两种情况下,都使用神经常微分方程方法。我们考虑来自Kuramoto-Sivashinsky方程和复Ginzburg-Landau方程的模拟数据。相对于仅使用数据的方法,在数据充足、数据稀缺、甚至FOM不正确(即参数值错误)的情况下,混合方法都产生了显著改进的时间序列预测。
摘要:While data-driven techniques are powerful tools for reduced-order modeling of systems with chaotic dynamics, great potential remains for leveraging known physics (i.e. a full-order model (FOM)) to improve predictive capability. We develop a hybrid reduced order model (ROM), informed by both data and FOM, for evolving spatiotemporal chaotic dynamics on an invariant manifold whose coordinates are found using an autoencoder. This approach projects the vector field of the FOM onto the invariant manifold; then, this physics-derived vector field is either corrected using dynamic data, or used as a Bayesian prior that is updated with data. In both cases, the neural ordinary differential equation approach is used. We consider simulated data from the Kuramoto-Sivashinsky and complex Ginzburg-Landau equations. Relative to the data-only approach, for scenarios of abundant data, scarce data, and even an incorrect FOM (i.e. erroneous parameter values), the hybrid approach yields substantially improved time-series predictions.
【8】Learning from Limited and Imperfect Data
标题:从有限且不完美的数据中学习
链接:https://arxiv.org/abs/2507.21205
作者:gwani
备注:PhD Thesis
摘要:世界上的数据分布(例如互联网等)与精心策划的数据集有很大不同,并且经常被来自常见类别的样本过度填充。为精心策划的数据集设计的算法在用于从具有长尾不平衡和分布偏移的不完美数据集学习时表现欠佳。为了扩大深度模型的使用范围,必须通过开发能够从多样化真实世界数据分布中学习的鲁棒算法来克服劳动密集型的策划过程。为了实现这一目标,我们为深度神经网络开发了实用的算法,使其能够从现实世界中存在的有限且不完美的数据中学习。本论文分为四个部分,每个部分涵盖一种从有限或不完美数据中学习的场景。论文的第一部分侧重于从长尾数据中学习生成模型,我们在其中缓解了模式崩溃,并为尾部(少数)类实现了多样化且具有美感的图像生成。在第二部分中,我们通过归纳正则化方案实现尾部类的有效泛化,使尾部类能够像头部类一样有效地泛化,而无需显式生成图像。在第三部分中,我们开发了优化相关指标的算法,用于从标注有限(半监督)的长尾数据中学习;第四部分则侧重于模型对各种域的高效域适应,这些域只有极少甚至为零的标注样本。
摘要:The distribution of data in the world (eg, internet, etc.) significantly differs from the well-curated datasets and is often over-populated with samples from common categories. The algorithms designed for well-curated datasets perform suboptimally when used for learning from imperfect datasets with long-tailed imbalances and distribution shifts. To expand the use of deep models, it is essential to overcome the labor-intensive curation process by developing robust algorithms that can learn from diverse, real-world data distributions. Toward this goal, we develop practical algorithms for Deep Neural Networks which can learn from limited and imperfect data present in the real world. This thesis is divided into four segments, each covering a scenario of learning from limited or imperfect data. The first part of the thesis focuses on Learning Generative Models from Long-Tail Data, where we mitigate the mode-collapse and enable diverse aesthetic image generations for tail (minority) classes. In the second part, we enable effective generalization on tail classes through Inductive Regularization schemes, which allow tail classes to generalize as effectively as the head classes without requiring explicit generation of images. In the third part, we develop algorithms for Optimizing Relevant Metrics for learning from long-tailed data with limited annotation (semi-supervised), followed by the fourth part, which focuses on the Efficient Domain Adaptation of the model to various domains with very few to zero labeled samples.
【9】Combolutional Neural Networks
标题:组合神经网络
链接:https://arxiv.org/abs/2507.21202
作者:hurchwell, Minje Kim, Paris Smaragdis
备注:4 pages, 3 figures, accepted to WASPAA 2025
摘要:选择适当的归纳偏差是机器学习模型设计的重要步骤,在处理音频时尤其如此,因为即使是很短的剪辑也可能包含数百万个样本。为此,我们提出了combolutional层:一个可学习延迟的IIR梳状滤波器与融合的包络检测器,在时域中提取谐波特征。我们在三个信息检索任务上展示了combolutional层的功效,评估了其相对于其他音频前端的计算成本,并提供了高效的训练实现。我们发现,在精确谐波分析很重要的音频任务中(例如钢琴转录、说话人分类和调性检测),combolutional层是卷积层的有效替代。此外,与现有前端相比,combolutional层还有其他几个关键优势,即:低参数量、高效的CPU推理、严格的实值计算和更好的可解释性。
摘要:Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
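The building blocks named in the abstract (a feedback comb filter with a delay, followed by an envelope detector) can be sketched directly in the time domain. This is a toy with a fixed delay and hand-picked constants; in the paper the delay is learned, and the implementation is fused and vectorized.

```python
import numpy as np

def comb_envelope(x, delay, feedback=0.7, smooth=0.99):
    """Sketch of a 'combolutional' unit: an IIR comb filter
    y[n] = x[n] + g*y[n-d], which resonates at fs/d and its harmonics,
    followed by a rectifying one-pole envelope detector."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        fb = y[n - delay] if n >= delay else 0.0
        y[n] = x[n] + feedback * fb
    env = np.zeros(len(x))
    acc = 0.0
    for n in range(len(x)):
        acc = smooth * acc + (1 - smooth) * abs(y[n])   # one-pole lowpass
        env[n] = acc
    return env

fs, d = 8000, 50                       # comb resonates at fs/d = 160 Hz
t = np.arange(4000) / fs
matched = comb_envelope(np.sin(2 * np.pi * 160 * t), d)
mismatched = comb_envelope(np.sin(2 * np.pi * 233 * t), d)
# A tone at the comb's resonance is reinforced by the feedback path, so
# its envelope settles well above that of a mismatched tone.
```

A bank of such units with different (learned) delays yields a harmonic feature map, which is the replacement for a convolutional frontend that the paper evaluates.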
【10】EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks
标题:EdgeAgentX-DT:集成数字孪生和生成式AI,在战术网络中实现弹性边缘智能
链接:https://arxiv.org/abs/2507.21196
作者:
备注:13 pages, 6 figures
摘要:EdgeAgentX-DT是EdgeAgentX框架的高级扩展,它集成了数字孪生模拟和生成式AI驱动的场景训练,可显著增强军事网络中的边缘智能。EdgeAgentX-DT利用网络数字孪生(与现实世界边缘设备同步的虚拟副本),为训练和验证提供安全、逼真的环境。利用扩散模型和Transformer等生成式AI方法,该系统为基于模拟的鲁棒代理训练创建多样化和对抗性的场景。我们的多层架构包括:(1)设备上的边缘智能;(2)数字孪生同步;(3)生成式场景训练。实验模拟表明,与EdgeAgentX相比,EdgeAgentX-DT有显著的改进,包括更快的学习收敛、更高的网络吞吐量、更低的延迟,以及更强的抗干扰和抗节点故障能力。一个案例研究涉及同时存在干扰攻击、代理故障和网络负载增加的复杂战术场景,说明了EdgeAgentX-DT如何维持操作性能,而基线方法则失败。这些结果凸显了数字孪生赋能的生成式训练在对抗环境中加强边缘AI部署的潜力。
摘要:We introduce EdgeAgentX-DT, an advanced extension of the EdgeAgentX framework that integrates digital twin simulations and generative AI-driven scenario training to significantly enhance edge intelligence in military networks. EdgeAgentX-DT utilizes network digital twins, virtual replicas synchronized with real-world edge devices, to provide a secure, realistic environment for training and validation. Leveraging generative AI methods, such as diffusion models and transformers, the system creates diverse and adversarial scenarios for robust simulation-based agent training. Our multi-layer architecture includes: (1) on-device edge intelligence; (2) digital twin synchronization; and (3) generative scenario training. Experimental simulations demonstrate notable improvements over EdgeAgentX, including faster learning convergence, higher network throughput, reduced latency, and improved resilience against jamming and node failures. A case study involving a complex tactical scenario with simultaneous jamming attacks, agent failures, and increased network loads illustrates how EdgeAgentX-DT sustains operational performance, whereas baseline methods fail. These results highlight the potential of digital-twin-enabled generative training to strengthen edge AI deployments in contested environments.
【11】Task-Focused Consolidation with Spaced Recall: Making Neural Networks learn like college students
标题:以任务为中心的整合与间隔回忆:让神经网络像大学生一样学习
链接:https://arxiv.org/abs/2507.21109
作者:mnodkar
摘要:深度神经网络经常受到一个被称为灾难性遗忘的关键限制:在学习新任务后,过去任务的性能会下降。本文介绍了一种新的持续学习方法,名为任务聚焦巩固与间隔回忆(TFC-SR),其灵感来自主动回忆、刻意练习和间隔重复等人类学习策略。TFC-SR通过我们称为主动回忆探针的机制增强了标准经验重放。它是对模型记忆的周期性、任务感知的评估,能稳定过去知识的表示。我们在Split MNIST和Split CIFAR-100基准上将TFC-SR与领先的基于正则化和基于重放的基线进行了比较。我们的结果表明,TFC-SR的表现显著优于这些方法。例如,在Split CIFAR-100上,它的最终准确率为13.17%,而标准重放为7.40%。我们证明,这种优势来自探针本身的稳定效果,而不是重放量的差异。此外,我们分析了内存大小与性能之间的权衡,并表明虽然TFC-SR在内存受限的环境中表现更好,但当可用内存充足时,更大的重放量仍然更有效。我们的结论是,TFC-SR是一种鲁棒且高效的方法,凸显了将主动记忆检索机制集成到持续学习系统中的重要性。
摘要:Deep Neural Networks often suffer from a critical limitation known as Catastrophic Forgetting, where performance on past tasks degrades after learning new ones. This paper introduces a novel continual learning approach inspired by human learning strategies like Active Recall, Deliberate Practice and Spaced Repetition, named Task Focused Consolidation with Spaced Recall (TFC-SR). TFC-SR enhances the standard experience replay with a mechanism we termed the Active Recall Probe. It is a periodic, task-aware evaluation of the model's memory that stabilizes the representations of past knowledge. We test TFC-SR on the Split MNIST and Split CIFAR-100 benchmarks against leading regularization-based and replay-based baselines. Our results show that TFC-SR performs significantly better than these methods. For instance, on the Split CIFAR-100, it achieves a final accuracy of 13.17% compared to standard replay's 7.40%. We demonstrate that this advantage comes from the stabilizing effect of the probe itself, and not from the difference in replay volume. Additionally, we analyze the trade-off between memory size and performance and show that while TFC-SR performs better in memory-constrained environments, higher replay volume is still more effective when available memory is abundant. We conclude that TFC-SR is a robust and efficient approach, highlighting the importance of integrating active memory retrieval mechanisms into continual learning systems.
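The scheduling structure described above (standard experience replay, plus a periodic task-aware "active recall probe" over all past tasks) can be sketched as a training-loop skeleton. Model updates and evaluations are stubbed out as log entries; the interval, replay size, and function names are assumptions, not the paper's hyperparameters.

```python
import random

def train_tfc_sr(tasks, probe_every=3, replay_k=2, seed=0):
    """Skeleton of a TFC-SR-style loop: each step trains on the current
    sample plus a small replay batch, and every `probe_every` steps the
    model's memory of *every* past task is re-evaluated (spaced recall).
    The learning itself is stubbed; only the schedule is shown."""
    rng = random.Random(seed)
    buffer, log = [], []
    step = 0
    for task_id, samples in enumerate(tasks):
        for s in samples:
            batch = [s] + rng.sample(buffer, min(replay_k, len(buffer)))
            log.append(("update", task_id, len(batch)))   # model.step(batch)
            step += 1
            if step % probe_every == 0:
                for past in range(task_id + 1):
                    log.append(("probe", past))           # evaluate(model, past)
        buffer.extend(samples)                            # bank this task
    return log

log = train_tfc_sr([[1, 2, 3], [4, 5, 6]], probe_every=3)
```

The point of the skeleton is that probe frequency and replay volume are independent knobs, which is exactly the separation the abstract's ablation relies on.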
【12】Higher-Order Kuramoto Oscillator Network for Dense Associative Memory
标题:用于稠密联想记忆的高阶Kuramoto振子网络
链接:https://arxiv.org/abs/2507.21984
作者:rl, Natalia G. Berloff
备注:13 pages, 7 figures
摘要:如果相位振荡器网络包含超越经典Kuramoto模型成对相互作用的高阶耦合,它们就可以用作稠密联想记忆。在这里,受稠密Hopfield记忆理论的启发,我们介绍了一个结合二次谐波(成对)和四次谐波(四次)耦合的广义Kuramoto模型。利用平均场理论及其动力学近似,我们得到了稠密联想记忆模型的相图,该相图呈现出一个三临界点,在该点处,记忆提取的连续起始被一个不连续的滞后转变所取代。在四次谐波主导的区域,系统支持对应于存储记忆模式的双稳相锁定状态,在记忆态和非相干态之间存在相当大的能量势垒。我们解析地确定了这个双稳区域,并表明从记忆态(由于噪声)逃逸的时间随网络规模呈指数增长,表明存储是鲁棒的。将理论扩展到有限记忆负载,我们表明高阶耦合使记忆容量随系统规模超线性扩展,远远超过仅有成对耦合振荡器的极限。振荡器网络的大规模模拟证实了我们的理论预测,展示了快速的模式检索和对许多相位模式的鲁棒存储。这些结果将Kuramoto同步与现代Hopfield记忆联系起来,指向高容量模拟联想记忆在振荡器系统中的实验实现。
摘要:Networks of phase oscillators can serve as dense associative memories if they incorporate higher-order coupling beyond the classical Kuramoto model's pairwise interactions. Here we introduce a generalized Kuramoto model with combined second-harmonic (pairwise) and fourth-harmonic (quartic) coupling, inspired by dense Hopfield memory theory. Using mean-field theory and its dynamical approximation, we obtain a phase diagram for dense associative memory model that exhibits a tricritical point at which the continuous onset of memory retrieval is supplanted by a discontinuous, hysteretic transition. In the quartic-dominated regime, the system supports bistable phase-locked states corresponding to stored memory patterns, with a sizable energy barrier between memory and incoherent states. We analytically determine this bistable region and show that the escape time from a memory state (due to noise) grows exponentially with network size, indicating robust storage. Extending the theory to finite memory load, we show that higher-order couplings achieve superlinear scaling of memory capacity with system size, far exceeding the limit of pairwise-only oscillators. Large-scale simulations of the oscillator network confirm our theoretical predictions, demonstrating rapid pattern retrieval and robust storage of many phase patterns. These results bridge the Kuramoto synchronization with modern Hopfield memories, pointing toward experimental realization of high-capacity, analog associative memory in oscillator systems.
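The kind of multi-harmonic phase dynamics studied here can be illustrated with a toy Euler integration of an all-to-all Kuramoto model with two coupling harmonics, sin(Δ) and sin(2Δ). The exact harmonics, constants, and mean-field treatment in the paper differ; this only shows pattern retrieval as convergence to a phase-locked state.

```python
import numpy as np

def kuramoto_step(theta, k1, k2, dt=0.01):
    """One Euler step of an all-to-all Kuramoto model with two harmonic
    coupling terms; a toy stand-in for the paper's combined-harmonic
    model."""
    d = theta[None, :] - theta[:, None]          # d[i, j] = theta_j - theta_i
    drive = (k1 * np.sin(d) + k2 * np.sin(2 * d)).mean(axis=1)
    return theta + dt * drive

rng = np.random.default_rng(2)
theta = rng.uniform(-0.5, 0.5, size=50)          # start near one phase pattern
for _ in range(2000):
    theta = kuramoto_step(theta, k1=0.2, k2=1.0)
order = float(np.abs(np.exp(1j * theta).mean()))  # Kuramoto order parameter
```

With the higher harmonic dominant, phases separated by π are also locally stable, which is the bistability a phase-oscillator memory exploits; the run above starts inside the in-phase basin and is pulled into that pattern (order parameter near 1).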
【13】Benchmarking a Tunable Quantum Neural Network on Trapped-Ion and Superconducting Hardware
标题:在俘获离子和超导硬件上对可调谐量子神经网络进行基准测试
链接:https://arxiv.org/abs/2507.21222
作者:khdar-Hamina, Xingxin Liu, Richard Barney, Sarah H. Miller, Alaina M. Green, Norbert M. Linke, Victor Galitski
备注:6 pages, 3 figures
摘要:我们在俘获离子和IBM超导量子计算机上实现了神经网络的一种量子推广,用于对MNIST图像(计算机视觉中的常见基准)进行分类。网络前馈涉及量子比特旋转,其旋转角度取决于前一层的测量结果。网络通过模拟进行训练,但推理是在量子硬件上实验执行的。经典到量子的对应关系由插值参数$a$控制,在经典极限下$a$为零。增加$a$将量子不确定性引入测量中,这被证明在插值参数的中等取值下可以提高网络性能。然后,我们关注那些无法被经典神经网络分类、但在量子网络中被正确检测到的特定图像。对于这样的边界情况,我们观察到与模拟行为的强烈偏离。我们将此归因于物理噪声,它导致输出在分类能量景观的邻近极小值之间波动。对于清晰的图像,这种对物理噪声的强烈敏感性并不存在。我们通过在神经网络电路中插入额外的单量子比特和双量子比特门对,进一步对物理噪声进行基准测试。我们的工作为在当前设备上实现更复杂的量子神经网络提供了一个跳板:虽然这种方法植根于标准的经典机器学习,但扩大这种网络的规模可能被证明是经典不可模拟的,并可能为近期量子优势提供一条途径。
摘要:We implement a quantum generalization of a neural network on trapped-ion and IBM superconducting quantum computers to classify MNIST images, a common benchmark in computer vision. The network feedforward involves qubit rotations whose angles depend on the results of measurements in the previous layer. The network is trained via simulation, but inference is performed experimentally on quantum hardware. The classical-to-quantum correspondence is controlled by an interpolation parameter, $a$, which is zero in the classical limit. Increasing $a$ introduces quantum uncertainty into the measurements, which is shown to improve network performance at moderate values of the interpolation parameter. We then focus on particular images that fail to be classified by a classical neural network but are detected correctly in the quantum network. For such borderline cases, we observe strong deviations from the simulated behavior. We attribute this to physical noise, which causes the output to fluctuate between nearby minima of the classification energy landscape. Such strong sensitivity to physical noise is absent for clear images. We further benchmark physical noise by inserting additional single-qubit and two-qubit gate pairs into the neural network circuits. Our work provides a springboard toward more complex quantum neural networks on current devices: while the approach is rooted in standard classical machine learning, scaling up such networks may prove classically non-simulable and could offer a route to near-term quantum advantage.
其他(30篇)
【1】UserBench: An Interactive Gym Environment for User-Centric Agents
标题:UserBench:面向以用户为中心的智能体的交互式Gym环境
链接:https://arxiv.org/abs/2507.22034
作者:n, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
备注:25 Pages, 17 Figures, 6 Tables
摘要:基于大型语言模型(LLM)的智能体在推理和工具使用方面取得了令人印象深刻的进展,使它们能够解决复杂的任务。然而,它们主动与用户协作的能力,特别是当目标模糊、不断变化或间接表达时,仍然没有得到充分的探索。为了解决这个差距,我们引入了UserBench,一个以用户为中心的基准,旨在评估智能体在多轮、偏好驱动的交互中的表现。UserBench提供模拟用户,他们从未完全指定的目标开始并逐步揭示偏好,这要求智能体主动澄清意图并借助工具做出有依据的决策。我们对领先的开源和闭源LLM的评估揭示了任务完成与用户对齐之间的显著脱节。例如,模型给出的答案平均只有20%的时间与所有用户意图完全一致,即使是最先进的模型,通过主动交互发现的用户偏好也不到30%。这些结果凸显了构建智能体的挑战:它们不仅应是有能力的任务执行者,更应是真正的协作伙伴。UserBench提供了一个交互式环境来衡量和提升这一关键能力。
摘要:Large Language Models (LLMs)-based agents have made impressive progress in reasoning and tool use, enabling them to solve complex tasks. However, their ability to proactively collaborate with users, especially when goals are vague, evolving, or indirectly expressed, remains underexplored. To address this gap, we introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions. UserBench features simulated users who start with underspecified goals and reveal preferences incrementally, requiring agents to proactively clarify intent and make grounded decisions with tools. Our evaluation of leading open- and closed-source LLMs reveals a significant disconnect between task completion and user alignment. For instance, models provide answers that fully align with all user intents only 20% of the time on average, and even the most advanced models uncover fewer than 30% of all user preferences through active interaction. These results highlight the challenges of building agents that are not just capable task executors, but true collaborative partners. UserBench offers an interactive environment to measure and advance this critical capability.
【2】SLA-Centric Automated Algorithm Selection Framework for Cloud Environments
标题:面向云环境的以SLA为中心的自动算法选择框架
链接:https://arxiv.org/abs/2507.21963
作者:wan, Tasnim Ahmed, Salimur Choudhury
摘要:云计算提供按需资源访问,由消费者和云服务提供商(CSP)之间的服务级别协议(SLA)进行监管。违反SLA可能会影响效率和CSP盈利能力。在这项工作中,我们提出了一个SLA感知的自动算法选择框架,在资源受限的云环境中的组合优化问题。该框架使用一系列机器学习模型来预测性能,并根据SLA约束对算法-硬件对进行排名。我们还将我们的框架应用于0-1背包问题。我们策划了一个数据集,包括实例特定的功能以及内存使用,运行时和6个算法的最优性差距。作为一个经验基准,我们评估的框架分类和回归任务。我们的消融研究探讨了超参数,学习方法和大型语言模型在回归中的有效性以及基于SHAP的可解释性的影响。
摘要:Cloud computing offers on-demand resource access, regulated by Service-Level Agreements (SLAs) between consumers and Cloud Service Providers (CSPs). SLA violations can impact efficiency and CSP profitability. In this work, we propose an SLA-aware automated algorithm-selection framework for combinatorial optimization problems in resource-constrained cloud environments. The framework uses an ensemble of machine learning models to predict performance and rank algorithm-hardware pairs based on SLA constraints. We also apply our framework to the 0-1 knapsack problem. We curate a dataset comprising instance specific features along with memory usage, runtime, and optimality gap for 6 algorithms. As an empirical benchmark, we evaluate the framework on both classification and regression tasks. Our ablation study explores the impact of hyperparameters, learning approaches, and large language models effectiveness in regression, and SHAP-based interpretability.
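The framework's selection step (filter algorithm-hardware pairs by SLA constraints, then rank the survivors by predicted quality) can be sketched as follows. The per-pair predictions are hard-coded here; in the paper they come from an ML ensemble, and all names and numbers below are illustrative assumptions.

```python
def rank_pairs(pairs, sla):
    """SLA-aware selection sketch: drop algorithm-hardware pairs that
    violate the SLA, then rank the rest by predicted optimality gap,
    breaking ties by runtime."""
    feasible = [
        p for p in pairs
        if p["runtime_s"] <= sla["max_runtime_s"]
        and p["memory_mb"] <= sla["max_memory_mb"]
    ]
    return sorted(feasible, key=lambda p: (p["opt_gap"], p["runtime_s"]))

# Toy predictions for a 0-1 knapsack instance on two VM sizes.
pairs = [
    {"algo": "greedy", "hw": "small-vm", "runtime_s": 0.2,  "memory_mb": 100, "opt_gap": 0.08},
    {"algo": "dp",     "hw": "small-vm", "runtime_s": 9.0,  "memory_mb": 900, "opt_gap": 0.00},
    {"algo": "dp",     "hw": "large-vm", "runtime_s": 2.5,  "memory_mb": 900, "opt_gap": 0.00},
    {"algo": "bnb",    "hw": "large-vm", "runtime_s": 30.0, "memory_mb": 400, "opt_gap": 0.00},
]
ranked = rank_pairs(pairs, sla={"max_runtime_s": 5.0, "max_memory_mb": 1024})
```

Swapping the hard-coded metrics for regressor outputs turns this into the prediction-then-rank pipeline the abstract describes.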
【3】Multi-state Protein Design with DynamicMPNN
标题:利用DynamicMPNN进行多状态蛋白质设计
链接:https://arxiv.org/abs/2507.21938
作者:dan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles
备注:ICML 2025 GenBio Workshop
摘要:结构生物学长期以来一直由“一个序列、一个结构、一个功能”的范式主导,但许多关键的生物过程(从酶催化到膜转运)依赖于采用多种构象状态的蛋白质。现有的多状态设计方法依赖于对单状态预测的事后聚合,与单状态设计相比实验成功率很低。我们介绍了DynamicMPNN,一种显式训练的反向折叠模型,通过跨构象系综的联合学习来生成与多种构象兼容的序列。DynamicMPNN在覆盖75%的CATH超家族的46,033个构象对上进行训练,并使用AlphaFold initial guess进行评估,在我们具有挑战性的多状态蛋白质基准的结构归一化RMSD上比ProteinMPNN高出多达13%。
摘要:Structural biology has long been dominated by the one sequence, one structure, one function paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions, achieving poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using AlphaFold initial guess, DynamicMPNN outperforms ProteinMPNN by up to 13% on structure-normalized RMSD across our challenging multi-state protein benchmark.
【4】Evaluating Deepfake Detectors in the Wild
标题:评估野外Deepfake探测器
链接:https://arxiv.org/abs/2507.21905
作者:v Pirogov, Maksim Artemev
备注:Accepted to the ICML 2025 Workshop 'DataWorld: Unifying Data Curation Frameworks Across Domains'
摘要:由先进机器学习模型支持的Deepfakes对身份验证和数字媒体的真实性构成了重大且不断发展的威胁。虽然已经开发了许多检测器来解决这个问题,但它们的有效性还有待于在应用于实际数据时进行测试。在这项工作中,我们评估了现代的deepfake检测器,介绍了一种旨在模拟deepfake检测的真实场景的新测试程序。使用最先进的deepfake生成方法,我们创建了一个包含50多万张高质量deepfake图像的综合数据集。我们的分析表明,检测deepfake仍然是一项具有挑战性的任务。评估显示,在测试的deepfake检测器中,只有不到一半的检测器的AUC分数大于60%,最低的为50%。我们证明,基本的图像操作,如JPEG压缩或图像增强,可以显着降低模型的性能。所有代码和数据都可以在https://github.com/messlav/Deepfake-Detectors-in-the-Wild上公开获取。
摘要:Deepfakes powered by advanced machine learning models present a significant and evolving threat to identity verification and the authenticity of digital media. Although numerous detectors have been developed to address this problem, their effectiveness has yet to be tested when applied to real-world data. In this work we evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection. Using state-of-the-art deepfake generation methods, we create a comprehensive dataset containing more than 500,000 high-quality deepfake images. Our analysis shows that detecting deepfakes still remains a challenging task. The evaluation shows that fewer than half of the deepfake detectors tested achieved an AUC score greater than 60%, with the lowest being 50%. We demonstrate that basic image manipulations, such as JPEG compression or image enhancement, can significantly reduce model performance. All code and data are publicly available at https://github.com/messlav/Deepfake-Detectors-in-the-Wild.
【5】Discovering Interpretable Ordinary Differential Equations from Noisy Data
标题:从有噪数据中发现可解释的常微分方程
链接:https://arxiv.org/abs/2507.21841
作者:der, M. M. Faruque Hasan
备注:20 pages, 11 figures, 7 tables
摘要:在过去的十年中,以数据驱动方式发现近似物理系统潜在动力学的可解释模型受到了越来越多的关注。目前的方法采用预先指定的函数形式或基函数,往往导致模型缺乏物理意义和可解释性,更不用说代表系统的真实物理。我们提出了一种无监督的参数估计方法,它首先找到一个近似通解,然后通过样条变换线性估计控制常微分方程(ODE)的系数。近似通解假设采用与一般齐次线性常系数ODE解析解相同的函数形式。一个额外的优点是,即使在数据有噪声的情况下,它也能产生高保真、平滑的函数形式。样条近似从该函数形式获得线性无关的梯度信息,并构成梯度矩阵的基。该梯度矩阵用于一个线性系统中以求出ODE的系数。从案例研究中我们观察到,我们的建模方法以高精度发现ODE,并且在不使用任何正则化技术的情况下提升了解的稀疏性。该方法对噪声数据也是鲁棒的,因此允许将数据驱动技术集成到真实的实验环境中,用于物理现象的数据驱动学习。
摘要:The data-driven discovery of interpretable models approximating the underlying dynamics of a physical system has gained attraction in the past decade. Current approaches employ pre-specified functional forms or basis functions and often result in models that lack physical meaning and interpretability, let alone represent the true physics of the system. We propose an unsupervised parameter estimation methodology that first finds an approximate general solution, followed by a spline transformation to linearly estimate the coefficients of the governing ordinary differential equation (ODE). The approximate general solution is postulated using the same functional form as the analytical solution of a general homogeneous, linear, constant-coefficient ODE. An added advantage is its ability to produce a high-fidelity, smooth functional form even in the presence of noisy data. The spline approximation obtains gradient information from the functional form which are linearly independent and creates the basis of the gradient matrix. This gradient matrix is used in a linear system to find the coefficients of the ODEs. From the case studies, we observed that our modeling approach discovers ODEs with high accuracy and also promotes sparsity in the solution without using any regularization techniques. The methodology is also robust to noisy data and thus allows the integration of data-driven techniques into real experimental setting for data-driven learning of physical phenomena.
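The two-step idea (fit a smooth approximate solution to the noisy data, then recover the ODE coefficients from its derivatives by a single linear solve) can be illustrated on a toy second-order ODE. A polynomial fit stands in for the paper's spline transformation and postulated exponential general solution; all values are illustrative.

```python
import numpy as np

# Lightly noisy samples of y(t) = exp(-t) + exp(-2t),
# which solves y'' + 3*y' + 2*y = 0.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0, 400)
y = np.exp(-t) + np.exp(-2 * t) + rng.normal(scale=1e-4, size=t.size)

# Step 1: fit a smooth approximate solution (polynomial in place of the
# paper's spline), then differentiate the fitted form analytically.
c = np.polyfit(t, y, 8)
y_s = np.polyval(c, t)
dy = np.polyval(np.polyder(c, 1), t)
d2y = np.polyval(np.polyder(c, 2), t)

# Step 2: the fitted form's derivatives give linearly independent columns
# of a gradient matrix; one least-squares solve recovers (a, b) in
# y'' + a*y' + b*y = 0.
A = np.column_stack([dy, y_s])
a, b = np.linalg.lstsq(A, -d2y, rcond=None)[0]
# (a, b) should land close to the true coefficients (3, 2).
```

Because the derivatives come from the smooth fitted form rather than from finite differences of the raw samples, the linear solve stays well-behaved under measurement noise, which is the robustness property the abstract emphasizes.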
【6】evoxels: A differentiable physics framework for voxel-based microstructure simulations
标题:evoxels:基于体素的微结构模拟的可微分物理框架
链接:https://arxiv.org/abs/2507.21748
作者:bner, Alexander E. Cohen, Benjamin Dörich, Samuel J. Cooper
备注:9 pages, 3 figures, structure following JOSS style
摘要:材料科学本质上跨越学科:实验学家使用先进的显微镜来揭示微观和纳米尺度的结构,而理论家和计算科学家则开发连接加工、结构和性能的模型。桥接这些领域对于逆向材料设计至关重要:从期望的性能出发,反向推导最佳的微观结构和制造路线。将高分辨率成像与预测模拟和数据驱动的优化相结合,可加速发现并加深对加工-结构-性能关系的理解。可微分物理框架evoxels基于完全Python化的统一体素方法,该方法集成了分割后的3D显微镜数据、物理模拟、逆向建模和机器学习。
摘要:Materials science inherently spans disciplines: experimentalists use advanced microscopy to uncover micro- and nanoscale structure, while theorists and computational scientists develop models that link processing, structure, and properties. Bridging these domains is essential for inverse material design where you start from desired performance and work backwards to optimal microstructures and manufacturing routes. Integrating high-resolution imaging with predictive simulations and data-driven optimization accelerates discovery and deepens understanding of process-structure-property relationships. The differentiable physics framework evoxels is based on a fully Pythonic, unified voxel-based approach that integrates segmented 3D microscopy data, physical simulations, inverse modeling, and machine learning.
【7】Hyperbolic Genome Embeddings
标题:双曲基因组嵌入
链接:https://arxiv.org/abs/2507.21648
作者: Khan, Philippe Chlenski, Itsik Pe'er
备注:30 pages, 16 figures, 10 tables. Camera-ready version for ICLR 2025
摘要:目前的基因组序列建模方法往往难以将机器学习模型的归纳偏差与生物系统在进化上形成的结构对齐。为此,我们提出了双曲CNN的一种新应用,利用这种结构来实现更具表达力的DNA序列表示。我们的策略规避了对显式系统发育映射的需要,同时辨别出与核心功能和调控行为有关的序列关键属性。在42个基因组解释基准数据集中的37个上,我们的双曲模型优于其欧几里得对应模型。值得注意的是,我们的方法甚至在七个GUE基准数据集上超过了最先进的性能,持续优于许多DNA语言模型,同时使用的参数少几个数量级并且无需预训练。我们的成果还包括一组新的基准数据集(转座因子基准,Transposable Elements Benchmark),它探索了基因组中具有深刻进化意义但研究不足的主要组成部分。我们通过探索双曲模型如何在各种数据生成条件下识别基因组信号,以及通过构建一种解释数据集嵌入双曲性的经验方法,进一步论证了我们工作的动机。在这些评估中,我们发现了持续的证据,凸显了双曲框架作为基因组表示学习鲁棒范式的潜力。我们的代码和基准数据集可在https://github.com/rrkhan/HGE上获取。
摘要:Current approaches to genomic sequence modeling often struggle to align the inductive biases of machine learning models with the evolutionarily-informed structure of biological systems. To this end, we formulate a novel application of hyperbolic CNNs that exploits this structure, enabling more expressive DNA sequence representations. Our strategy circumvents the need for explicit phylogenetic mapping while discerning key properties of sequences pertaining to core functional and regulatory behavior. Across 37 out of 42 genome interpretation benchmark datasets, our hyperbolic models outperform their Euclidean equivalents. Notably, our approach even surpasses state-of-the-art performance on seven GUE benchmark datasets, consistently outperforming many DNA language models while using orders of magnitude fewer parameters and avoiding pretraining. Our results include a novel set of benchmark datasets--the Transposable Elements Benchmark--which explores a major but understudied component of the genome with deep evolutionary significance. We further motivate our work by exploring how our hyperbolic models recognize genomic signal under various data-generating conditions and by constructing an empirical method for interpreting the hyperbolicity of dataset embeddings. Throughout these assessments, we find persistent evidence highlighting the potential of our hyperbolic framework as a robust paradigm for genome representation learning. Our code and benchmark datasets are available at https://github.com/rrkhan/HGE.
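The geometric intuition behind hyperbolic embeddings can be made concrete with the Poincaré-ball distance, the standard model underlying most hyperbolic deep learning: distances blow up near the boundary, which is what lets tree-like (e.g. phylogenetic) hierarchies embed with low distortion. This is the generic formula, not the paper's specific architecture.

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance between points u, v inside the unit Poincare
    ball: d(u, v) = arcosh(1 + 2*|u-v|^2 / ((1-|u|^2)(1-|v|^2)))."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    d2 = np.dot(u - v, u - v)
    return np.arccosh(1 + 2 * d2 / ((1 - uu) * (1 - vv)))

origin = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([0.99, 0.0])
# Equal Euclidean steps cost ever more hyperbolic distance near the
# boundary; from the origin, d reduces to 2*artanh(r).
d_near = poincare_dist(origin, near)
d_far = poincare_dist(origin, far)
```

A hyperbolic CNN replaces Euclidean distance and aggregation in its layers with operations like this one, so that embedding norm naturally encodes depth in a hierarchy.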
【8】Multifunctional physical reservoir computing in soft tensegrity robots
标题:软张拉整体机器人中的多功能物理储备池计算
链接:https://arxiv.org/abs/2507.21496
作者:ima, Katsuma Inoue, Kohei Nakajima, Yasuo Kuniyoshi
备注:25 pages, 12 figures. The following article has been accepted by Chaos: An Interdisciplinary Journal of Nonlinear Science
摘要:最近的研究表明,在物理储备池计算(PRC)的框架下,物理系统的动力学可以被用于所需的信息处理。具有柔软身体的机器人就是这样的物理系统的例子,它们非线性的身体-环境动力学可以用于计算并产生控制其自身行为所需的运动信号。在这项模拟研究中,我们将这种方法扩展到在一种称为张拉整体机器人的软机器人中控制并嵌入不止一种、而是多种行为。由此产生的由机器人和环境组成的系统是一个多稳态动力系统,从不同的初始条件收敛到不同的吸引子。此外,吸引子分析表明,在训练数据之外的系统状态空间中存在“未训练吸引子”。这些未训练吸引子反映了张拉整体机器人的内在属性和结构及其与环境的相互作用。PRC领域这些最新发现的影响在具身人工智能研究中尚未得到探索。我们在此说明它们在理解迄今尚未充分解决的具身认知诸多特征方面的潜力。
摘要:Recent studies have demonstrated that the dynamics of physical systems can be utilized for the desired information processing under the framework of physical reservoir computing (PRC). Robots with soft bodies are examples of such physical systems, and their nonlinear body-environment dynamics can be used to compute and generate the motor signals necessary for the control of their own behavior. In this simulation study, we extend this approach to control and embed not only one but also multiple behaviors into a type of soft robot called a tensegrity robot. The resulting system, consisting of the robot and the environment, is a multistable dynamical system that converges to different attractors from varying initial conditions. Furthermore, attractor analysis reveals that there exist "untrained attractors" in the state space of the system outside the training data. These untrained attractors reflect the intrinsic properties and structures of the tensegrity robot and its interactions with the environment. The impacts of these recent findings in PRC remain unexplored in embodied AI research. We here illustrate their potential to understand various features of embodied cognition that have not been fully addressed to date.
【9】PVD-ONet: A Multi-scale Neural Operator Method for Singularly Perturbed Boundary Layer Problems
标题:PVD-ONet:奇异摄动边界层问题的多尺度神经算子方法
链接:https://arxiv.org/abs/2507.21437
作者:Sun, Jian Zu
备注:34pages,14figures
摘要:物理信息神经网络和物理信息DeepONet擅长求解偏微分方程;然而,它们通常在奇异摄动问题上无法收敛。为了解决这个问题,我们提出了两个新框架:Prandtl-Van Dyke神经网络(PVD-Net)及其算子学习扩展Prandtl-Van Dyke深度算子网络(PVD-ONet),它们只依赖控制方程而无需数据。为了满足不同的特定任务要求,PVD-Net和PVD-ONet都以两个不同的版本开发,分别针对以稳定性为重点的建模和高精度建模进行定制。首阶PVD-Net采用结合Prandtl匹配条件的双网络架构,针对稳定性优先的场景。高阶PVD-Net采用五网络设计和Van Dyke匹配原理来捕获精细尺度的边界层结构,使其成为高精度场景的理想选择。PVD-ONet通过组装多个DeepONet模块将PVD-Net推广到算子学习设置,直接将初始条件映射到解算子,并在无需重新训练的情况下对整个边界层问题族进行即时预测。在各种模型上的数值实验表明,我们提出的方法在各种误差度量下始终优于现有基线,从而为多尺度问题提供了一种强大的新方法。
摘要:Physics-informed neural networks and Physics-informed DeepONet excel in solving partial differential equations; however, they often fail to converge for singularly perturbed problems. To address this, we propose two novel frameworks, Prandtl-Van Dyke neural network (PVD-Net) and its operator learning extension Prandtl-Van Dyke Deep Operator Network (PVD-ONet), which rely solely on governing equations without data. To address varying task-specific requirements, both PVD-Net and PVD-ONet are developed in two distinct versions, tailored respectively for stability-focused and high-accuracy modeling. The leading-order PVD-Net adopts a two-network architecture combined with Prandtl's matching condition, targeting stability-prioritized scenarios. The high-order PVD-Net employs a five-network design with Van Dyke's matching principle to capture fine-scale boundary layer structures, making it ideal for high-accuracy scenarios. PVD-ONet generalizes PVD-Net to the operator learning setting by assembling multiple DeepONet modules, directly mapping initial conditions to solution operators and enabling instant predictions for an entire family of boundary layer problems without retraining. Numerical experiments on various models show that our proposed methods consistently outperform existing baselines under various error metrics, thereby offering a powerful new approach for multi-scale problems.
【10】Data Leakage and Redundancy in the LIT-PCBA Benchmark
标题:LIT-PCBA基准中的数据泄露和冗余
链接:https://arxiv.org/abs/2507.21404
作者:ng, Ian Scott Knight, Slava Naprienko
摘要:LIT-PCBA是一个广泛使用的虚拟筛选基准,但我们的审计显示它存在根本性缺陷。该数据集存在严重的数据泄漏、大量重复以及普遍的类似物冗余,这些缺陷使其无法用于公平的模型评估。值得注意的是,我们识别出2,491个在训练集和验证集之间重复的非活性化合物,以及数千个在单个数据划分内部重复的化合物(训练集中2,945个,验证集中789个)。关键的是,查询集中本应代表未见过测试用例的三个配体被泄漏:其中两个出现在训练集中,一个出现在验证集中。结构冗余加剧了这些问题:对于某些靶标,超过80%的查询配体是近似重复物,Tanimoto相似度>= 0.9。仅在ALDH1中,我们就在训练集和验证集之间发现了323对高度相似的活性化合物,使化学多样性的说法不成立。这些及其他缺陷共同导致在LIT-PCBA上训练的模型倾向于记忆而非泛化。为了证明这些数据完整性缺陷的后果,我们实现了一个基于记忆的简单基线(不使用学习、物理学或建模),仅通过利用这些数据伪影就在LIT-PCBA上超越了包括CHEESE等深度神经网络在内的最先进模型。我们的研究结果表明该基准不适合其预期目的,并对以往基于它的结果提出质疑。我们分享这次审计以提高关注,并提供工具帮助社区开发更严格和可靠的数据集。重现我们的审计和基线实现所需的全部脚本可在https://github.com/sievestack/LIT-PCBA-audit获取。
摘要:LIT-PCBA is a widely used benchmark for virtual screening, but our audit reveals it is fundamentally compromised. The dataset suffers from egregious data leakage, rampant duplication, and pervasive analog redundancy -- flaws that invalidate its use for fair model evaluation. Notably, we identify 2,491 inactives duplicated across training and validation sets, and thousands more repeated within individual data splits (2,945 in training, 789 in validation). Critically, three ligands in the query set -- meant to represent unseen test cases -- are leaked: two appear in the training set, one in validation. Structural redundancy compounds these issues: for some targets, over 80% of query ligands are near duplicates, with Tanimoto similarity >= 0.9. In ALDH1 alone, we find 323 highly similar active pairs between training and validation sets, invalidating claims of chemical diversity. These and other flaws collectively cause models trained on LIT-PCBA to memorize rather than generalize. To demonstrate the consequences of these data integrity failures, we implement a trivial memorization-based baseline -- using no learning, no physics, and no modeling -- that outperforms state-of-the-art models, including deep neural networks like CHEESE, on LIT-PCBA simply by exploiting these artifacts. Our findings render the benchmark unfit for its intended purpose and call into question previous results based on its use. We share this audit to raise awareness and provide tooling to help the community develop more rigorous and reliable datasets going forward. All scripts necessary to reproduce our audit and the baseline implementation are available at: https://github.com/sievestack/LIT-PCBA-audit
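The audit's near-duplicate analysis rests on the Tanimoto coefficient between molecular fingerprints. A minimal sketch of that similarity and the >= 0.9 flagging rule, using toy bit-set fingerprints rather than the paper's actual featurization:

```python
# Hedged sketch: Tanimoto (Jaccard) similarity over binary fingerprints,
# as used in the audit to flag near-duplicate ligands (threshold >= 0.9).
# The fingerprints below are toy bit sets, not the paper's actual features.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient = |A & B| / |A | B| for bit-set fingerprints."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def near_duplicates(query_fps, train_fps, threshold=0.9):
    """Flag (query, train) index pairs whose similarity meets the threshold."""
    return [
        (i, j)
        for i, q in enumerate(query_fps)
        for j, t in enumerate(train_fps)
        if tanimoto(q, t) >= threshold
    ]

query = [{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}]
train = [{1, 2, 3, 4, 5, 6, 7, 8, 9}, {50, 51, 52}]
print(near_duplicates(query, train))  # [(0, 0)]: the first training ligand is a near duplicate
```

In practice fingerprints would come from a cheminformatics toolkit; the leakage findings above follow from exactly this kind of pairwise comparison between query and training sets.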
【11】Load Balancing for AI Training Workloads
标题:人工智能训练工作负载的负载平衡
链接:https://arxiv.org/abs/2507.21372
作者:lure, Sylvia Ratnasamy, Scott Shenker
摘要:我们研究了在专用基础设施上运行的大规模AI训练工作负载的各种负载平衡算法的性能。负载均衡的性能取决于拥塞控制和丢失恢复算法,因此我们的评估也揭示了这些设计的适当选择。
摘要:We investigate the performance of various load balancing algorithms for large-scale AI training workloads that are running on dedicated infrastructure. The performance of load balancing depends on both the congestion control and loss recovery algorithms, so our evaluation also sheds light on the appropriate choices for those designs as well.
【12】Deep Polynomial Chaos Expansion
标题:深度多项式混沌展开
链接:https://arxiv.org/abs/2507.21273
作者:Exenberger, Sascha Ranftl, Robert Peharz
备注:8th Workshop on Tractable Probabilistic Modeling, UAI 2025
摘要:多项式混沌展开(PCE)是一种经典且广泛使用的代理建模技术,在物理仿真和不确定性量化中应用广泛。通过对一组基多项式(关于不确定输入参数的分布正交归一)取线性组合,PCE能够对关键统计量进行易处理的推断,如(条件)均值、方差、协方差和Sobol灵敏度指数,这些统计量对于理解所建模的系统以及识别有影响的参数及其相互作用至关重要。由于基函数的数量随参数数量呈指数增长,PCE难以扩展到高维问题。我们通过将PCE与概率电路的思想相结合来应对这一挑战,得到深度多项式混沌展开(DeepPCE),即PCE的深度推广,可有效扩展到高维输入空间。DeepPCE实现了与多层感知器(MLP)相当的预测性能,同时保留了PCE通过简单前向传递计算精确统计推断的能力。
摘要:Polynomial chaos expansion (PCE) is a classical and widely used surrogate modeling technique in physical simulation and uncertainty quantification. By taking a linear combination of a set of basis polynomials - orthonormal with respect to the distribution of uncertain input parameters - PCE enables tractable inference of key statistical quantities, such as (conditional) means, variances, covariances, and Sobol sensitivity indices, which are essential for understanding the modeled system and identifying influential parameters and their interactions. As the number of basis functions grows exponentially with the number of parameters, PCE does not scale well to high-dimensional problems. We address this challenge by combining PCE with ideas from probabilistic circuits, resulting in the deep polynomial chaos expansion (DeepPCE) - a deep generalization of PCE that scales effectively to high-dimensional input spaces. DeepPCE achieves predictive performance comparable to that of multi-layer perceptrons (MLPs), while retaining PCE's ability to compute exact statistical inferences via simple forward passes.
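The PCE construction the abstract describes, projecting a function of a random input onto orthonormal basis polynomials and reading off the mean and variance from the coefficients, can be sketched in one dimension. A hedged illustration with probabilists' Hermite polynomials and Monte Carlo projection; the target function and sample budget are illustrative choices, not from the paper:

```python
# Hedged sketch: a 1D polynomial chaos expansion with probabilists' Hermite
# polynomials (orthogonal for a standard normal input), estimated by Monte
# Carlo projection. The target function and sample budget are illustrative.
import math
import random

def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x)."""
    if n == 0:
        return 1.0
    if n == 1:
        return x
    return x * hermite(n - 1, x) - (n - 1) * hermite(n - 2, x)

def pce_fit(f, degree, n_samples=100_000, seed=0):
    """Coefficients c_k = E[f(X) He_k(X)] / k!  (since E[He_k(X)^2] = k!)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    coeffs = []
    for k in range(degree + 1):
        num = sum(f(x) * hermite(k, x) for x in xs) / n_samples
        coeffs.append(num / math.factorial(k))
    return coeffs

# f(X) = X^2 + 3 expands exactly as 4*He_0 + 1*He_2, so mean = 4, variance = 2.
coeffs = pce_fit(lambda x: x * x + 3.0, degree=2)
mean = coeffs[0]  # the 0th coefficient is the mean
var = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs)) - mean ** 2
```

This is the "tractable inference" property the abstract highlights: once the coefficients are known, mean and variance follow without further sampling. DeepPCE's contribution is scaling this idea to high-dimensional inputs via a probabilistic-circuit structure.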
【13】Numerical PDE solvers outperform neural PDE solvers
标题:数值PDE求解器优于神经PDE求解器
链接:https://arxiv.org/abs/2507.21269
作者:hatain, Michael Rizvi-Martel, Guillaume Rabusseau, Adam Oberman
备注:17 pages, 7 figures
摘要:我们提出了DeepFDM,一个用于学习含时偏微分方程(PDE)中空间变化系数的可微有限差分框架。通过将经典的前向欧拉离散化嵌入卷积架构,DeepFDM借助满足CFL条件的系数参数化来保证稳定性和一阶收敛性。模型权重直接对应于PDE系数,从而产生可解释的反问题表述。我们在一组标量PDE基准上评估DeepFDM:一维、二维和三维的对流、扩散、对流扩散、反应扩散以及非齐次Burgers方程。在分布内和分布外测试(以系数先验之间的Hellinger距离量化)中,DeepFDM获得的归一化均方误差比傅立叶神经算子、U-Net和ResNet小一到两个数量级,所需训练轮数少10-20倍,使用的参数少5-50倍。此外,恢复的系数场与真实参数准确吻合。这些结果使DeepFDM成为数据驱动的参数化PDE求解与辨识的一个鲁棒、高效且透明的基线。
摘要:We present DeepFDM, a differentiable finite-difference framework for learning spatially varying coefficients in time-dependent partial differential equations (PDEs). By embedding a classical forward-Euler discretization into a convolutional architecture, DeepFDM enforces stability and first-order convergence via CFL-compliant coefficient parameterizations. Model weights correspond directly to PDE coefficients, yielding an interpretable inverse-problem formulation. We evaluate DeepFDM on a benchmark suite of scalar PDEs: advection, diffusion, advection-diffusion, reaction-diffusion and inhomogeneous Burgers' equations-in one, two and three spatial dimensions. In both in-distribution and out-of-distribution tests (quantified by the Hellinger distance between coefficient priors), DeepFDM attains normalized mean-squared errors one to two orders of magnitude smaller than Fourier Neural Operators, U-Nets and ResNets; requires 10-20X fewer training epochs; and uses 5-50X fewer parameters. Moreover, recovered coefficient fields accurately match ground-truth parameters. These results establish DeepFDM as a robust, efficient, and transparent baseline for data-driven solution and identification of parametric PDEs.
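The building block DeepFDM embeds, a forward-Euler finite-difference step kept stable by the CFL condition, can be sketched for the 1D heat equation. The grid size, diffusivity, and initial condition below are illustrative assumptions, not the paper's benchmark setup:

```python
# Hedged sketch: a CFL-compliant forward-Euler finite-difference step of the
# kind DeepFDM embeds, shown for the 1D heat equation u_t = D * u_xx.
# Grid size, diffusivity, and the initial condition are illustrative choices.

def heat_step(u, d_coeff, dt, dx):
    """One explicit Euler step with fixed (Dirichlet) boundary values."""
    assert d_coeff * dt / dx ** 2 <= 0.5, "CFL condition violated: unstable step"
    new = u[:]  # boundaries stay fixed
    for i in range(1, len(u) - 1):
        new[i] = u[i] + d_coeff * dt / dx ** 2 * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

n, dx, d_coeff = 101, 1.0 / 100.0, 0.1
dt = 0.4 * dx ** 2 / d_coeff  # safely inside the CFL bound D*dt/dx^2 <= 0.5
u = [0.0] * n
u[n // 2] = 1.0               # unit heat spike at the center
for _ in range(100):
    u = heat_step(u, d_coeff, dt, dx)
# the spike diffuses: max(u) drops well below 1 while sum(u) stays ~ 1
```

In DeepFDM the coefficient `d_coeff` becomes a learnable field, parameterized so the CFL bound holds by construction; the step itself is implemented as a convolution so the whole solver is differentiable.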
【14】Diffusion Denoiser-Aided Gyrocompassing
标题:扩散降噪器辅助陀螺罗盘
链接:https://arxiv.org/abs/2507.21245
作者:n-Arie, Daniel Engelsman, Rotem Dror, Itzik Klein
备注:8 pages, 8 figures
摘要:准确的初始航向角对于各领域的高效和安全导航至关重要。与磁力计不同,陀螺仪可以在不受磁干扰影响的情况下提供精确的航向参考,这一过程称为陀螺罗盘定向。然而,在没有外部导航辅助的情况下,使用低成本陀螺仪实现准确而及时的陀螺罗盘定向仍然是一个重大挑战。这些挑战常见于自动驾驶车辆等实际应用中,其中尺寸、重量和功耗限制约束了传感器质量,噪声测量严重降低了陀螺罗盘定向性能。为了应对这一挑战,我们提出了一种新的扩散去噪器辅助陀螺罗盘定向方法。它将基于扩散的去噪框架与增强的基于学习的航向估计模型相结合。扩散去噪器在原始惯性传感器信号输入深度学习模型之前对其进行处理,从而实现精确的陀螺罗盘定向。使用仿真和真实传感器数据的实验表明,与基于模型的方法相比,我们提出的方法将陀螺罗盘定向精度提高了26%,与其他学习驱动的方法相比提高了15%。这一进展对于确保在导航系统中采用低成本陀螺仪的自主平台实现准确而鲁棒的导航具有特别重要的意义。
摘要:An accurate initial heading angle is essential for efficient and safe navigation across diverse domains. Unlike magnetometers, gyroscopes can provide accurate heading reference independent of the magnetic disturbances in a process known as gyrocompassing. Yet, accurate and timely gyrocompassing, using low-cost gyroscopes, remains a significant challenge in scenarios where external navigation aids are unavailable. Such challenges are commonly addressed in real-world applications such as autonomous vehicles, where size, weight, and power limitations restrict sensor quality, and noisy measurements severely degrade gyrocompassing performance. To cope with this challenge, we propose a novel diffusion denoiser-aided gyrocompass approach. It integrates a diffusion-based denoising framework with an enhanced learning-based heading estimation model. The diffusion denoiser processes raw inertial sensor signals before input to the deep learning model, resulting in accurate gyrocompassing. Experiments using both simulated and real sensor data demonstrate that our proposed approach improves gyrocompassing accuracy by 26% compared to model-based gyrocompassing and by 15% compared to other learning-driven approaches. This advancement holds particular significance for ensuring accurate and robust navigation in autonomous platforms that incorporate low-cost gyroscopes within their navigation systems.
【15】Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors
标题:流体神经支配网格制造多功能且耐用的触觉传感器
链接:https://arxiv.org/abs/2507.21225
作者:ng, Miguel Flores-Acton, Andy Yu, Anshul Gupta, Maggie Yao, Daniela Rus
备注:Accepted for publication in the proceedings of the 2025 International Symposium on Experimental Robotics (ISER)
摘要:触觉传感在使机器人能够在动态和非结构化环境中导航方面发挥着重要作用,特别是在精密物体操作、表面探索和人机交互等应用中。在本文中,我们介绍了一种具有集成触觉传感的被动软机器人指尖,该指尖使用具有嵌入式空气通道的3D打印弹性体晶格制造。这种传感方法,称为流体神经支配,通过检测密封空气通道内的压力变化将晶格转换为触觉传感器,为机器人中的触觉传感提供了一种简单而强大的解决方案。与依赖复杂材料或设计的传统方法不同,流体神经支配提供了一种简单、可扩展的单一材料制造工艺。我们描述了传感器的响应,开发了一个几何模型来估计尖端位移,并训练神经网络来准确地预测接触位置和接触力。此外,我们将指尖与导纳控制器集成,以模拟类似弹簧的行为,通过触觉反馈展示其环境探索能力,并验证其在高冲击和循环载荷条件下的耐用性。这种触觉传感技术在简单性、适应性和耐用性方面具有优势,并为多功能机器人操作开辟了新的机会。
摘要:Tactile sensing plays a fundamental role in enabling robots to navigate dynamic and unstructured environments, particularly in applications such as delicate object manipulation, surface exploration, and human-robot interaction. In this paper, we introduce a passive soft robotic fingertip with integrated tactile sensing, fabricated using a 3D-printed elastomer lattice with embedded air channels. This sensorization approach, termed fluidic innervation, transforms the lattice into a tactile sensor by detecting pressure changes within sealed air channels, providing a simple yet robust solution to tactile sensing in robotics. Unlike conventional methods that rely on complex materials or designs, fluidic innervation offers a simple, scalable, single-material fabrication process. We characterize the sensors' response, develop a geometric model to estimate tip displacement, and train a neural network to accurately predict contact location and contact force. Additionally, we integrate the fingertip with an admittance controller to emulate spring-like behavior, demonstrate its capability for environment exploration through tactile feedback, and validate its durability under high impact and cyclic loading conditions. This tactile sensing technique offers advantages in terms of simplicity, adaptability, and durability and opens up new opportunities for versatile robotic manipulation.
【16】Agentic Web: Weaving the Next Web with AI Agents
标题:代理式网络(Agentic Web):用人工智能代理编织下一代网络
链接:https://arxiv.org/abs/2507.21206
作者:Yang, Mulei Ma, Yuxuan Huang, Huacan Chai, Chenyu Gong, Haoran Geng, Yuanjian Zhou, Ying Wen, Meng Fang, Muhao Chen, Shangding Gu, Ming Jin, Costas Spanos, Yang Yang, Pieter Abbeel, Dawn Song, Weinan Zhang, Jun Wang
摘要:由大型语言模型(LLM)驱动的人工智能代理的出现,标志着互联网向代理式网络(Agentic Web)的关键转变,这是由自主、目标驱动的交互所定义的互联网新阶段。在这一范式中,代理之间直接交互,代表用户规划、协调和执行复杂任务。这种从人驱动到机器对机器交互的转变使意图得以委托,将用户从常规数字操作中解放出来,并实现更具交互性的自动化网络体验。在本文中,我们提出了一个用于理解和构建代理式网络的结构化框架。我们追溯了它从PC和移动网络时代的演变,并确定了支撑这一转变的核心技术基础。我们框架的核心是一个由三个关键维度组成的概念模型:智能、交互和经济。这些维度共同实现了AI代理的各项能力,例如检索、推荐、规划和协作。我们分析了构建可扩展代理系统所涉及的架构和基础设施挑战,包括通信协议、编排策略以及代理注意力经济等新兴范式。最后,我们讨论了代理系统带来的潜在应用、社会风险和治理问题,并概述了开发由人类意图和自主代理行为共同塑造的开放、安全和智能生态系统的研究方向。有关代理式网络的持续更新的研究汇编可在https://github.com/SafeRL-Lab/agentic-web查阅。
摘要:The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web, a new phase of the internet defined by autonomous, goal-driven interactions. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. This transition from human-driven to machine-to-machine interaction allows intent to be delegated, relieving users from routine digital operations and enabling a more interactive, automated web experience. In this paper, we present a structured framework for understanding and building the Agentic Web. We trace its evolution from the PC and Mobile Web eras and identify the core technological foundations that support this shift. Central to our framework is a conceptual model consisting of three key dimensions: intelligence, interaction, and economics. These dimensions collectively enable the capabilities of AI agents, such as retrieval, recommendation, planning, and collaboration. We analyze the architectural and infrastructural challenges involved in creating scalable agentic systems, including communication protocols, orchestration strategies, and emerging paradigms such as the Agent Attention Economy. We conclude by discussing the potential applications, societal risks, and governance issues posed by agentic systems, and outline research directions for developing open, secure, and intelligent ecosystems shaped by both human intent and autonomous agent behavior. A continuously updated collection of relevant studies for agentic web is available at: https://github.com/SafeRL-Lab/agentic-web.
【17】Handling Out-of-Distribution Data: A Survey
标题:处理分布外数据:综述
链接:https://arxiv.org/abs/2507.21160
作者:ang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal
备注:20 pages, 6 figures, 6 tables. Accepted at IEEE Transactions on Knowledge and Data Engineering
摘要:在机器学习(ML)和数据驱动应用领域,一个重大挑战是训练和部署阶段之间的数据分布变化,通常称为分布偏移。本文概述了处理两种主要分布偏移类型的不同机制:(i)协变量偏移:特征或协变量的取值在训练数据和测试数据之间发生变化;(ii)概念/语义偏移:由于测试阶段出现新类别,模型在训练过程中学到的概念发生偏移。我们将贡献归纳为三个方面。首先,我们形式化了分布偏移,阐述了传统方法为何无法充分处理它们,并呼吁建立能够在所有类型的分布偏移下同时表现更好的模型。其次,我们讨论了处理分布偏移的重要性,并对已开发的用于检测、度量和缓解这些偏移影响的方法和技术进行了广泛综述。第三,我们讨论了分布偏移处理机制的现状,并提出了该领域未来的研究方向。总体而言,我们对分布偏移文献进行了回顾性综述,重点关注现有综述中被忽视的OOD数据。
摘要:In the field of Machine Learning (ML) and data-driven applications, one of the significant challenge is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shifts: (i) Covariate shift: where the value of features or covariates change between train and test data, and (ii) Concept/Semantic-shift: where model experiences shift in the concept learned during training due to emergence of novel classes in the test phase. We sum up our contributions in three folds. First, we formalize distribution shifts, recite on how the conventional method fails to handle them adequately and urge for a model that can simultaneously perform better in all types of distribution shifts. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate the effects of these shifts. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature in the distribution shift, focusing on OOD data that had been overlooked in the existing surveys.
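Covariate shift, the survey's first category, can be flagged with simple per-feature two-sample tests. A minimal sketch using the Kolmogorov-Smirnov statistic; the synthetic data and the 0.2 / 0.1 decision thresholds are illustrative assumptions, not prescriptions from the survey:

```python
# Hedged sketch: detecting covariate shift with a two-sample Kolmogorov-
# Smirnov statistic, one of the simplest detection tools in this space.
# The synthetic data and decision thresholds are illustrative.
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(42)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
test_shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # mean shifted by +1
test_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution

shift_stat = ks_statistic(train, test_shifted)  # large: shift detected
same_stat = ks_statistic(train, test_same)      # small: no shift
```

Concept/semantic shift, the second category, cannot be caught this way, since it concerns the label relationship and novel classes rather than the input marginals; that asymmetry is why the survey treats the two types separately.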
【18】SPADE-S: A Sparsity-Robust Foundational Forecaster
标题:SPADE-S:稀疏稳健的基础预测器
链接:https://arxiv.org/abs/2507.21155
作者:olff, Matthew Li, Ravi Kiran Selvam, Hanjing Zhu, Kin G. Olivares, Ruijun Ma, Abhinav Katoch, Shankar Ramasubramanian, Mengfei Cao, Roberto Bandarra, Rahul Gopalsamy, Stefania La Vattiata, Sitan Yang, Michael M. Mahoney
摘要:尽管时间序列预测取得了重大进展,但对于最先进的深度学习架构来说,准确建模在幅度和/或稀疏模式上具有强异质性的时间序列仍然具有挑战性。我们确定了导致现有模型在低幅度和稀疏时间序列上系统性表现不佳的几个因素,包括对高幅度序列存在隐式偏差的损失函数、训练时的采样方法,以及时间序列编码方法的局限性。SPADE-S是一种鲁棒的预测架构,可显著降低基于幅度和稀疏性的系统性偏差,并提高整体预测精度。实证结果表明,SPADE-S在需求预测的多种用例中优于现有的最先进方法。特别是,我们表明,取决于预测分位数和序列幅度,SPADE-S可将预测精度提高多达15%。对于来自一家大型在线零售商、规模从300万到7亿条序列不等的三个不同数据集,这分别带来了2.21%、6.58%和4.28%的P90整体预测精度提升,以及0.92%、0.77%和1.95%的P50预测精度提升。
摘要:Despite significant advancements in time series forecasting, accurate modeling of time series with strong heterogeneity in magnitude and/or sparsity patterns remains challenging for state-of-the-art deep learning architectures. We identify several factors that lead existing models to systematically underperform on low-magnitude and sparse time series, including loss functions with implicit biases toward high-magnitude series, training-time sampling methods, and limitations of time series encoding methods. SPADE-S is a robust forecasting architecture that significantly reduces magnitude- and sparsity-based systematic biases and improves overall prediction accuracy. Empirical results demonstrate that SPADE-S outperforms existing state-of-the-art approaches across a diverse set of use cases in demand forecasting. In particular, we show that, depending on the quantile forecast and magnitude of the series, SPADE-S can improve forecast accuracy by up to 15%. This results in P90 overall forecast accuracy gains of 2.21%, 6.58%, and 4.28%, and P50 forecast accuracy gains of 0.92%, 0.77%, and 1.95%, respectively, for each of three distinct datasets, ranging from 3 million to 700 million series, from a large online retailer.
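The implicit magnitude bias the abstract attributes to standard loss functions can be seen directly in the quantile (pinball) loss behind P50/P90 metrics. A toy illustration; the two series are illustrative, not from the paper:

```python
# Hedged sketch: the quantile (pinball) loss behind P50/P90 forecast metrics,
# and the magnitude bias SPADE-S targets: with unnormalized aggregation, a
# high-magnitude series dominates a sparse low-magnitude one even at equal
# *relative* error. The toy series are illustrative, not from the paper.

def pinball_loss(y_true, y_pred, q):
    """Average quantile loss at level q (q=0.5 -> P50, q=0.9 -> P90)."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        diff = y - yhat
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)

# Same 10% relative underforecast on a large and a small series.
big = pinball_loss([1000.0] * 4, [900.0] * 4, q=0.5)
small = pinball_loss([1.0] * 4, [0.9] * 4, q=0.5)
# big / small ~ 1000: the large series supplies ~1000x the training signal,
# so gradients barely "see" the low-magnitude, sparse series.
```

At equal relative error the pooled loss is dominated by the large series, which is exactly the systematic underperformance on low-magnitude and sparse series that SPADE-S is designed to reduce.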
【19】TTS-1 Technical Report
标题:TTS-1技术报告
链接:https://arxiv.org/abs/2507.21138
作者:anenko, Anna Chalova, Joseph Coombes, Nikki Cope, Phillip Dang, Zhifeng Deng, Jimmy Du, Michael Ermolenko, Feifan Fan, Yufei Feng, Cheryl Fichter, Pavel Filimonov, Louis Fischer, Kylan Gibbs, Valeria Gusarova, Pavel Karpik, Andreas Assad Kottner, Ian Lee, Oliver Louie, Jasmine Mai, Mikhail Mamontov, Suri Mao, Nurullah Morshed, Igor Poletaev, Florin Radu, Dmytro Semernia, Evgenii Shingarev, Vikram Sivaraja, Peter Skirko, Rinat Takhautdinov, Robert Villahermosa, Jean Wang
备注:20 pages, 10 figures. For associated modeling and training code, see this https URL
摘要:我们介绍Inworld TTS-1,一组两个基于transformer的自回归文本到语音(TTS)模型。我们最大的型号TTS-1-Max具有8.8B参数,专为满足苛刻应用的最高质量和表现力而设计。TTS-1是我们最高效的模型,具有1.6B参数,专为实时语音合成和设备上的用例而构建。通过扩展训练时间计算并应用语音语言模型(SpeechLM)组件的预训练,微调和RL对齐的顺序过程,这两个模型在各种基准上都实现了最先进的性能,表现出纯粹依赖于说话者语音的上下文学习的卓越质量。Inworld TTS-1和TTS-1-Max可以生成高分辨率的48 kHz语音,具有低延迟,并支持11种语言,通过音频标记进行精细的情感控制和非语言发声。我们还在MIT许可证下开源了我们的训练和建模代码。
摘要:We introduce Inworld TTS-1, a set of two Transformer-based autoregressive text-to-speech (TTS) models. Our largest model, TTS-1-Max, has 8.8B parameters and is designed for utmost quality and expressiveness in demanding applications. TTS-1 is our most efficient model, with 1.6B parameters, built for real-time speech synthesis and on-device use cases. By scaling train-time compute and applying a sequential process of pre-training, fine-tuning, and RL-alignment of the speech-language model (SpeechLM) component, both models achieve state-of-the-art performance on a variety of benchmarks, demonstrating exceptional quality relying purely on in-context learning of the speaker's voice. Inworld TTS-1 and TTS-1-Max can generate high-resolution 48 kHz speech with low latency, and support 11 languages with fine-grained emotional control and non-verbal vocalizations through audio markups. We additionally open-source our training and modeling code under an MIT license.
【20】Quantum Geometry of Data
标题:数据的量子几何
链接:https://arxiv.org/abs/2507.21135
作者: G. Abanov, Luca Candelori, Harold C. Steinacker, Martin T. Wells, Jerome R. Busemeyer, Cameron J. Hogan, Vahagn Kirakosyan, Nicola Marzari, Sunil Pinnamaneni, Dario Villani, Mengjia Xu, Kharen Musaelian
备注:27 pages, 14 figures, 1 table
摘要:我们演示了量子认知机器学习(QCML)如何将数据编码为量子几何。在QCML中,数据的特征由学习的Hermitian矩阵表示,数据点映射到Hilbert空间中的状态。量子几何描述赋予数据集丰富的几何和拓扑结构-包括内在维度,量子度量和Berry曲率-直接来自数据。QCML捕获数据的全局属性,同时避免了局部方法中固有的维数灾难。我们说明了一些合成和现实世界的例子。QCML的量子几何表示可以在量子认知的框架内推进我们对认知现象的理解。
摘要:We demonstrate how Quantum Cognition Machine Learning (QCML) encodes data as quantum geometry. In QCML, features of the data are represented by learned Hermitian matrices, and data points are mapped to states in Hilbert space. The quantum geometry description endows the dataset with rich geometric and topological structure - including intrinsic dimension, quantum metric, and Berry curvature - derived directly from the data. QCML captures global properties of data, while avoiding the curse of dimensionality inherent in local methods. We illustrate this on a number of synthetic and real-world examples. Quantum geometric representation of QCML could advance our understanding of cognitive phenomena within the framework of quantum cognition.
【21】Leveraging Generative AI to Enhance Synthea Module Development
标题:利用生成式人工智能增强Synthea模块开发
链接:https://arxiv.org/abs/2507.21123
作者:ramer, Aanchal Mathur, Caroline E. Adams, Jason A. Walonoski
备注:Title: Leveraging Generative AI to Enhance Synthea Module Development Word Count: [Approximately 12,000 words] Figures: 3 Tables: 3 Supplementary Material: Extensive appendices with prompts and disease profiles
摘要:本文探讨了使用大型语言模型(LLM)来帮助开源合成健康数据生成器Synthea开发新疾病模块。将LLM纳入模块开发过程有可能缩短开发时间,减少所需的专业知识,扩大模型多样性,并提高合成患者数据的整体质量。我们展示了LLM可以支持Synthea模块创建的四种方式:生成疾病概况,从疾病概况生成疾病模块,评估现有的Synthea模块,以及改进现有的模块。我们引入了渐进式细化的概念,它涉及通过检查其语法正确性和临床准确性来迭代地评估LLM生成的模块,然后使用该信息来修改模块。虽然在这种情况下使用LLM显示出希望,但我们也承认存在挑战和限制,例如需要人为监督,严格测试和验证的重要性,以及LLM生成内容的不准确性。最后,本文对未来的研究和开发提出了建议,以充分发挥LLM辅助合成数据创建的潜力。
摘要:This paper explores the use of large language models (LLMs) to assist in the development of new disease modules for Synthea, an open-source synthetic health data generator. Incorporating LLMs into the module development process has the potential to reduce development time, reduce required expertise, expand model diversity, and improve the overall quality of synthetic patient data. We demonstrate four ways that LLMs can support Synthea module creation: generating a disease profile, generating a disease module from a disease profile, evaluating an existing Synthea module, and refining an existing module. We introduce the concept of progressive refinement, which involves iteratively evaluating the LLM-generated module by checking its syntactic correctness and clinical accuracy, and then using that information to modify the module. While the use of LLMs in this context shows promise, we also acknowledge the challenges and limitations, such as the need for human oversight, the importance of rigorous testing and validation, and the potential for inaccuracies in LLM-generated content. The paper concludes with recommendations for future research and development to fully realize the potential of LLM-aided synthetic data creation.
【22】InsurTech innovation using natural language processing
标题:使用自然语言处理的保险技术创新
链接:https://arxiv.org/abs/2507.21112
作者:g, Zhiyu Quan
摘要:随着InsurTech的迅速崛起,传统保险公司越来越多地探索替代数据源和先进技术,以保持其竞争优势。本文提供了自然语言处理(NLP)及其在保险业务中的新兴应用的概念概述和实际案例研究,重点是将原始的非结构化文本转换为适合精算分析和决策的结构化数据。利用InsurTech行业合作伙伴提供的丰富传统保险数据源的真实替代数据,我们应用各种NLP技术来展示商业保险背景下的实际用例。这些丰富的、源自文本的见解不仅增加和完善了商业保险定价的传统评级因素,而且通过引入新的行业分类,为评估潜在风险提供了新的视角。通过这些演示,我们表明NLP不仅仅是一种补充工具,而是现代数据驱动的保险分析的基本要素。
摘要:With the rapid rise of InsurTech, traditional insurance companies are increasingly exploring alternative data sources and advanced technologies to sustain their competitive edge. This paper provides both a conceptual overview and practical case studies of natural language processing (NLP) and its emerging applications within insurance operations with a focus on transforming raw, unstructured text into structured data suitable for actuarial analysis and decision-making. Leveraging real-world alternative data provided by an InsurTech industry partner that enriches traditional insurance data sources, we apply various NLP techniques to demonstrate practical use cases in the commercial insurance context. These enriched, text-derived insights not only add to and refine traditional rating factors for commercial insurance pricing but also offer novel perspectives for assessing underlying risk by introducing novel industry classifications. Through these demonstrations, we show that NLP is not merely a supplementary tool but a foundational element for modern, data-driven insurance analytics.
【23】High hopes for "Deep Medicine"? AI, economics, and the future of care
标题:对《深度医疗》寄予厚望?人工智能、经济学与医疗的未来
链接:https://arxiv.org/abs/2507.21054
作者:arrow, Joshua Hatherley
备注:None
摘要:在广受赞誉的《深度医疗》(Deep Medicine)一书中,Eric Topol认为,医疗保健人工智能的发展将导致医学文化和实践的巨大转变。他提出,在未来几十年里,人工智能将变得足够复杂,以至于医生的许多日常任务都可以委托给它。Topol也许是人工智能在医学中益处最清晰的倡导者,但鼓吹人工智能有望让医生在未来投入更多时间和注意力为患者提供富有同理心的护理的,远不止他一人。不幸的是,若干因素表明医疗保健的未来将是截然不同的图景。医疗人工智能的使用非但不能促使医患关系回归更紧密的时代,反而可能进一步侵蚀治疗关系,并威胁从业者和患者的满意度。
摘要:In the much-celebrated book Deep Medicine, Eric Topol argues that the development of artificial intelligence for health care will lead to a dramatic shift in the culture and practice of medicine. In the next several decades, he suggests, AI will become sophisticated enough that many of the everyday tasks of physicians could be delegated to it. Topol is perhaps the most articulate advocate of the benefits of AI in medicine, but he is hardly alone in spruiking its potential to allow physicians to dedicate more of their time and attention to providing empathetic care for their patients in the future. Unfortunately, several factors suggest a radically different picture for the future of health care. Far from facilitating a return to a time of closer doctor-patient relationships, the use of medical AI seems likely to further erode therapeutic relationships and threaten professional and patient satisfaction.
【24】Online hierarchical partitioning of the output space in extreme multi-label data stream
标题:极端多标签数据流中输出空间的在线分层划分
链接:https://arxiv.org/abs/2507.20894
作者:s, Afonso Lourenço, Alberto Cano, Goreti Marreiros
备注:Accepted at 28th European Conference on Artificial Intelligence (ECAI 2025)
摘要:挖掘具有多标签输出的数据流面临重大挑战,原因在于不断演化的分布、高维标签空间、稀疏的标签出现以及复杂的标签依赖。此外,概念漂移不仅影响输入分布,还会随时间改变标签相关性和不平衡比率,使模型自适应更加复杂。为应对这些挑战,结构化学习器可分为局部方法和全局方法。局部方法将任务分解为更简单的组件,而全局方法使算法适应整个输出空间,通过利用标签相关性可能产生更好的预测。这项工作介绍了iHOMER(多标签分类器的增量层次结构),一个在线多标签学习框架,它在不依赖预定义层次结构的情况下,将标签空间增量地划分为不相交的相关聚类。iHOMER利用基于Jaccard相似度的在线分裂-凝聚聚类,以及由多元Bernoulli过程驱动的全局树学习器来指导实例划分。为解决非平稳性问题,它在全局和局部两个层面集成了漂移检测机制,实现标签划分和子树的动态重组。在23个真实世界数据集上的实验表明,iHOMER比MLHAT、Pruned Sets的MLHT和iSOUPT等5个最先进的全局基线高出23%,比kNN、EFDT、ARF以及ADWIN bagging/boosting集成的二元关联变换等12个局部基线高出32%,确立了其在线多标签分类的鲁棒性。
摘要:Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. To address these challenges, structured learners are categorized into local and global methods. Local methods break down the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER leverages online divisive-agglomerative clustering based on \textit{Jaccard} similarity and a global tree-based learner driven by a multivariate \textit{Bernoulli} process to guide instance partitioning. To address non-stationarity, it integrates drift detection mechanisms at both global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23\%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32\%, establishing its robustness for online multi-label classification.
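iHOMER's label-space partitioning is driven by Jaccard similarity between labels' occurrence patterns. A minimal sketch of that similarity and a greedy most-correlated-pair search; the streaming machinery and drift detection are omitted, and the data is a toy assumption:

```python
# Hedged sketch: Jaccard similarity between labels based on which instances
# they occur in, the quantity iHOMER's divisive-agglomerative clustering uses.
# The streaming machinery and drift detection are omitted; the data is toy.

def jaccard(instances_a: set, instances_b: set) -> float:
    """Jaccard similarity of two labels' instance sets."""
    if not instances_a and not instances_b:
        return 0.0
    return len(instances_a & instances_b) / len(instances_a | instances_b)

# label -> set of instance ids where the label is active
label_occurrences = {
    "sports":   {0, 1, 2, 3},
    "football": {1, 2, 3},      # co-occurs heavily with "sports"
    "finance":  {7, 8, 9},
}

def most_correlated_pair(occ):
    """Greedy step: find the pair of labels with the highest Jaccard score."""
    labels = sorted(occ)
    return max(
        ((a, b) for i, a in enumerate(labels) for b in labels[i + 1:]),
        key=lambda pair: jaccard(occ[pair[0]], occ[pair[1]]),
    )

print(most_correlated_pair(label_occurrences))  # ('football', 'sports')
```

In the full framework these similarities are maintained incrementally over the stream, and the resulting clusters ("sports"/"football" together, "finance" apart) define the disjoint label partitions each tree-based learner is responsible for.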
【25】Exploring the Stratified Space Structure of an RL Game with the Volume Growth Transform
标题:利用体积增长变换探索RL游戏的分层空间结构
链接:https://arxiv.org/abs/2507.22010
作者:rry, Brennan Lagasse, Ngoc B. Lam, Gregory Cox, David Rosenbluth, Alberto Speranzon
备注:17 pages and 8 figures. Preliminary report. Feedback welcome!
摘要:在这项工作中,我们探索了为玩一个特定强化学习(RL)游戏而训练的Transformer模型的嵌入空间结构。具体来说,我们研究了基于Transformer的近端策略优化(PPO)模型如何在一个简单环境中嵌入视觉输入,在该环境中,智能体必须在躲避由"聚光灯"组成的动态障碍的同时收集"硬币"。通过将Robinson等人针对LLM的体积增长变换研究改编到RL设置,我们发现我们的视觉硬币收集游戏的词元嵌入空间同样不是流形,而更适合建模为分层空间,其中局部维数可以逐点变化。我们进一步强化了Robinson的方法,证明了相当一般的体积增长曲线可以由分层空间实现。最后,我们进行的分析表明,当RL智能体行动时,其潜在表示在遵循固定子策略时的低局部维数时期,与实现子目标(例如收集物体)或环境复杂性增加(例如出现更多障碍)时的高局部维数突发之间交替。因此,我们的工作表明,分层潜在空间中的维数分布可能为RL游戏的复杂性提供一个新的几何指标。
摘要:In this work, we explore the structure of the embedding space of a transformer model trained for playing a particular reinforcement learning (RL) game. Specifically, we investigate how a transformer-based Proximal Policy Optimization (PPO) model embeds visual inputs in a simple environment where an agent must collect "coins" while avoiding dynamic obstacles consisting of "spotlights." By adapting Robinson et al.'s study of the volume growth transform for LLMs to the RL setting, we find that the token embedding space for our visual coin collecting game is also not a manifold, and is better modeled as a stratified space, where local dimension can vary from point to point. We further strengthen Robinson's method by proving that fairly general volume growth curves can be realized by stratified spaces. Finally, we carry out an analysis that suggests that as an RL agent acts, its latent representation alternates between periods of low local dimension, while following a fixed sub-strategy, and bursts of high local dimension, where the agent achieves a sub-goal (e.g., collecting an object) or where the environmental complexity increases (e.g., more obstacles appear). Consequently, our work suggests that the distribution of dimensions in a stratified latent space may provide a new geometric indicator of complexity for RL games.
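The volume growth analysis the paper adapts estimates local dimension from how the number of neighbors scales with radius: on a d-dimensional stratum, N(r) grows like r^d, so the slope of log N against log r recovers d, and on a stratified space that slope varies from point to point. A hedged sketch on synthetic point clouds of known dimension; the estimator and radii are illustrative choices:

```python
# Hedged sketch: estimating local dimension from volume growth, i.e. the
# slope of log(neighbor count) vs log(radius). On a stratified space this
# slope varies from point to point. The point clouds here are synthetic.
import math
import random

def local_dimension(points, center, r1, r2):
    """Estimate d from N(r) ~ r^d using two radii."""
    def count(r):
        return sum(1 for p in points if math.dist(p, center) <= r)
    n1, n2 = count(r1), count(r2)
    return math.log(n2 / n1) / math.log(r2 / r1)

rng = random.Random(0)
# A 1D stratum (line segment) and a 2D stratum (disk), both embedded in 2D.
line = [(rng.uniform(-1, 1), 0.0) for _ in range(20000)]
disk = []
while len(disk) < 20000:
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if x * x + y * y <= 1:
        disk.append((x, y))

d_line = local_dimension(line, (0.0, 0.0), 0.1, 0.2)  # ~ 1
d_disk = local_dimension(disk, (0.0, 0.0), 0.1, 0.2)  # ~ 2
```

The paper's analysis applies this kind of estimate to the agent's latent trajectories, reading low-slope regions as fixed sub-strategies and high-slope bursts as sub-goal or complexity transitions.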
【26】Data-driven quantum Koopman method for simulating nonlinear dynamics
标题:模拟非线性动力学的数据驱动量子库普曼方法
链接:https://arxiv.org/abs/2507.21890
作者:hang, Zhen Lu, Yaomin Zhao, Yue Yang
摘要:量子计算为模拟某些物理系统提供了潜在的指数加速,但其在非线性动力学中的应用固有地受到幺正演化要求的限制。我们提出了量子Koopman方法(QKM),一个数据驱动的框架,通过将非线性动力学转化为高维可观测空间中的线性幺正演化来弥合这一差距。利用Koopman算子理论实现全局线性化,我们的方法使用深度自动编码器将系统状态映射到一系列Hilbert空间中。在线性化的嵌入空间内,状态表示被分解为模和相位分量,演化由一组仅作用于相位的幺正Koopman算子控制。这些算子由对角哈密顿量构造而成,其系数从数据中学习,这种结构专为在量子硬件上高效实现而设计。该架构支持直接多步预测,且算子的计算复杂度随可观测空间维数呈对数增长。QKM在多种非线性系统中得到验证。其预测对反应扩散系统和剪切流保持低于6%的相对误差,并捕捉到二维湍流的关键统计量。这项工作为非线性现象的量子加速模拟建立了一条实用途径,探索了一个建立在用于全局线性化的深度学习与用于幺正动力学演化的量子算法之间协同作用之上的框架。
摘要:Quantum computation offers potential exponential speedups for simulating certain physical systems, but its application to nonlinear dynamics is inherently constrained by the requirement of unitary evolution. We propose the quantum Koopman method (QKM), a data-driven framework that bridges this gap through transforming nonlinear dynamics into linear unitary evolution in higher-dimensional observable spaces. Leveraging the Koopman operator theory to achieve a global linearization, our approach maps system states into a hierarchy of Hilbert spaces using a deep autoencoder. Within the linearized embedding spaces, the state representation is decomposed into modulus and phase components, and the evolution is governed by a set of unitary Koopman operators that act exclusively on the phase. These operators are constructed from diagonal Hamiltonians with coefficients learned from data, a structure designed for efficient implementation on quantum hardware. This architecture enables direct multi-step prediction, and the operator's computational complexity scales logarithmically with the observable space dimension. The QKM is validated across diverse nonlinear systems. Its predictions maintain relative errors below 6% for reaction-diffusion systems and shear flows, and capture key statistics in 2D turbulence. This work establishes a practical pathway for quantum-accelerated simulation of nonlinear phenomena, exploring a framework built on the synergy between deep learning for global linearization and quantum algorithms for unitary dynamics evolution.
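The Koopman idea at the heart of QKM, lifting a nonlinear system to observables in which the dynamics act linearly, can be sketched with a scalar toy system where the lifting is known in closed form. The autoencoder and quantum circuit construction are beyond this sketch; the map and observable are hand-picked illustrative assumptions:

```python
# Hedged sketch: Koopman-style global linearization on a toy system.
# The nonlinear map x_{k+1} = x_k**2 becomes *linear* in the observable
# g(x) = log(x): g(x_{k+1}) = 2 * g(x_k). QKM learns such liftings with a
# deep autoencoder and evolves phases with unitary operators; here the
# lifting is hand-picked and the system is scalar (x > 0).
import math

def nonlinear_step(x):
    return x * x

def koopman_predict(x0, n_steps):
    """Encode, apply the linear evolution n_steps times at once, decode."""
    g = math.log(x0)       # encoder: lift the state to the observable
    g *= 2 ** n_steps      # linear Koopman evolution, direct multi-step
    return math.exp(g)     # decoder: map back to the original state

x = 1.5
for _ in range(5):
    x = nonlinear_step(x)  # ground truth: iterate the nonlinear map
# koopman_predict(1.5, 5) reproduces x without stepping through time
```

The direct multi-step prediction here (applying the linear operator's power once) mirrors the multi-step capability the abstract claims, which is what makes the linearized representation attractive for unitary, hardware-friendly evolution.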
【27】MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation
Link: https://arxiv.org/abs/2507.21807
Authors: chen
Note: 21 pages, 2 algorithms, includes a simulation study
Abstract: Statistical learning methods for automated variable selection, such as LASSO, elastic nets, or gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which creates several completed datasets. However, there is an ongoing debate on how to perform model selection in the presence of multiply imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches modify the regularization methods LASSO and elastic nets by defining a single loss function, resulting in a unified set of coefficients across imputations. Our key contribution is to extend this principle to the framework of component-wise gradient boosting by proposing MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets. Simulation studies suggest that our approach yields prediction performance comparable to that of these recently proposed methods.
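The unified selection principle can be sketched as component-wise L2 boosting in which the candidate losses are pooled across imputed datasets before a single variable is chosen, so every imputation follows the same selection path. This is an illustrative reading of the abstract; the paper's actual loss function and update rules may differ.

```python
import numpy as np

def miboost(datasets, n_iter=100, nu=0.1):
    """Sketch of unified component-wise L2 boosting across imputations.
    datasets: list of (X, y) pairs, one per imputed dataset.
    Returns per-dataset coefficients sharing one selection path."""
    p = datasets[0][0].shape[1]
    M = len(datasets)
    coefs = np.zeros((M, p))
    residuals = [y.astype(float).copy() for _, y in datasets]
    for _ in range(n_iter):
        # Pool each candidate's squared-error loss over all imputations,
        # so a single variable is selected jointly for every dataset.
        losses = np.zeros(p)
        fits = np.zeros((M, p))
        for m, (X, _) in enumerate(datasets):
            for j in range(p):
                xj = X[:, j]
                b = xj @ residuals[m] / (xj @ xj)  # simple least-squares base learner
                fits[m, j] = b
                losses[j] += np.sum((residuals[m] - b * xj) ** 2)
        j_star = int(np.argmin(losses))            # uniform variable-selection step
        for m, (X, _) in enumerate(datasets):
            coefs[m, j_star] += nu * fits[m, j_star]
            residuals[m] -= nu * fits[m, j_star] * X[:, j_star]
    return coefs

# Toy check: three "imputed" datasets where only the first variable matters.
rng = np.random.default_rng(1)
Xs = [rng.normal(size=(200, 5)) for _ in range(3)]
datasets = [(X, 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)) for X in Xs]
coefs = miboost(datasets)
```

The key difference from running boosting separately per imputation is the single `argmin` over the pooled losses, which prevents the imputations from selecting different variables.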
【28】An em algorithm for quantum Boltzmann machines
Link: https://arxiv.org/abs/2507.21569
Authors: imura, Kohtaro Kato, Masahito Hayashi
Note: Main text: 10 pages, 2 figures. Appendix: 3 pages, 1 figure
Abstract: We develop a quantum version of the em algorithm for training quantum Boltzmann machines. The em algorithm is an information-geometric extension of the well-known expectation-maximization (EM) algorithm, offering a structured alternative to gradient-based methods with potential advantages in stability and convergence. We implement the algorithm on a semi-quantum restricted Boltzmann machine, where quantum effects are confined to the hidden layer. This structure enables analytical update rules while preserving quantum expressivity. Numerical experiments on benchmark datasets show that the proposed method achieves stable learning and outperforms gradient-based training in several cases. These results demonstrate the potential of information-geometric optimization for quantum machine learning, particularly in settings where standard methods struggle due to non-commutativity or vanishing gradients.
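As a classical point of reference, the em algorithm alternates an e-projection (conditioning on the hidden variables, the classical E-step) with an m-projection (a KL-minimizing fit on the model manifold, the classical M-step), and for simple latent-variable models this reduces to ordinary EM. The sketch below runs that alternation on a two-component Gaussian mixture; the quantum version replaces probability distributions with density matrices and KL divergence with quantum relative entropy, which this toy example does not capture.

```python
import numpy as np

def em_step(x, pi, mu, sigma):
    """One e-projection / m-projection alternation for a 2-component
    Gaussian mixture (the classical analogue of the quantum em step)."""
    def pdf(m, s):
        return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    # e-projection: posterior responsibilities of the hidden component label.
    r1 = pi * pdf(mu[0], sigma[0])
    r2 = (1 - pi) * pdf(mu[1], sigma[1])
    g = r1 / (r1 + r2)
    # m-projection: maximize the expected log-likelihood on the model manifold.
    pi = g.mean()
    mu = (np.sum(g * x) / g.sum(), np.sum((1 - g) * x) / (1 - g).sum())
    sigma = (np.sqrt(np.sum(g * (x - mu[0]) ** 2) / g.sum()),
             np.sqrt(np.sum((1 - g) * (x - mu[1]) ** 2) / (1 - g).sum()))
    return pi, mu, sigma

# Data drawn from two well-separated components; EM recovers their means.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
pi, mu, sigma = 0.5, (-1.0, 1.0), (1.0, 1.0)
for _ in range(50):
    pi, mu, sigma = em_step(x, pi, mu, sigma)
```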
【29】From Global to Local: A Scalable Benchmark for Local Posterior Sampling
Link: https://arxiv.org/abs/2507.21449
Authors: chcock, Jesse Hoogland
Note: 25 pages
Abstract: Degeneracy is an inherent feature of the loss landscape of neural networks, but it is not well understood how stochastic gradient MCMC (SGMCMC) algorithms interact with this degeneracy. In particular, current global convergence guarantees for common SGMCMC algorithms rely on assumptions which are likely incompatible with degenerate loss landscapes. In this paper, we argue that this gap requires a shift in focus from global to local posterior sampling, and, as a first step, we introduce a novel scalable benchmark for evaluating the local sampling performance of SGMCMC algorithms. We evaluate a number of common algorithms, and find that RMSProp-preconditioned SGLD is most effective at faithfully representing the local geometry of the posterior distribution. Although we lack theoretical guarantees about global sampler convergence, our empirical results show that we are able to extract non-trivial local information in models with up to O(100M) parameters.
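RMSProp-preconditioned SGLD can be sketched with the standard pSGLD update: an exponential moving average of the squared stochastic gradient feeds a diagonal preconditioner that scales both the drift and the injected noise. The hyperparameters below are illustrative and are not taken from the benchmark.

```python
import numpy as np

def psgld_step(theta, grad_log_post, V, lr=0.1, alpha=0.99, eps=1e-5, rng=None):
    """One RMSProp-preconditioned SGLD step on a log-posterior gradient."""
    rng = rng or np.random.default_rng()
    g = grad_log_post(theta)
    V = alpha * V + (1 - alpha) * g ** 2       # RMSProp second-moment estimate
    G = 1.0 / (np.sqrt(V) + eps)               # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(lr * G)
    theta = theta + 0.5 * lr * G * g + noise   # preconditioned Langevin update
    return theta, V

# Usage: sample a standard normal "posterior", where grad log p(theta) = -theta.
rng = np.random.default_rng(3)
theta, V = np.zeros(1), np.ones(1)
samples = []
for t in range(6000):
    theta, V = psgld_step(theta, lambda th: -th, V, lr=0.1, rng=rng)
    if t >= 1000:                              # discard burn-in
        samples.append(theta[0])
samples = np.array(samples)
```

Note that, as in standard pSGLD, the correction term from the slowly varying preconditioner is dropped, which introduces a small step-size-dependent bias.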
【30】Measuring Sample Quality with Copula Discrepancies
Link: https://arxiv.org/abs/2507.21434
Authors: Aich, Ashit Baran Aich, Bruce Wade
Abstract: The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar's theorem to isolate and quantify the fidelity of a sample's dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall's tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.
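One plausible form of the moment-based diagnostic, sketched under the assumption of a bivariate Gaussian-copula target: reduce the sample to ranks (which, by Sklar's theorem, carry the dependence structure regardless of the marginals), summarize dependence with Kendall's tau, and compare against the tau implied by the target correlation via tau = (2/pi) * arcsin(rho). The paper's exact estimator and weighting may differ.

```python
import numpy as np

def kendall_tau(x, y):
    """O(n^2) Kendall's tau: (concordant - discordant) / total pairs."""
    n = len(x)
    s = 0.0
    for i in range(n):
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return 2.0 * s / (n * (n - 1))

def copula_discrepancy(samples, target_rho):
    """Moment-based CD sketch: |sample tau - tau implied by the target copula|."""
    # Ranks (pseudo-observations) isolate the copula from the marginals.
    u = np.argsort(np.argsort(samples[:, 0])).astype(float)
    v = np.argsort(np.argsort(samples[:, 1])).astype(float)
    tau_hat = kendall_tau(u, v)
    tau_target = 2.0 / np.pi * np.arcsin(target_rho)  # Gaussian-copula tau
    return abs(tau_hat - tau_target)

# A well-matched sample scores near zero; a mismatched target scores high.
rng = np.random.default_rng(4)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), cov, size=300)
cd_good = copula_discrepancy(samples, target_rho=0.6)
cd_bad = copula_discrepancy(samples, target_rho=-0.6)
```

Because Kendall's tau is invariant under monotone transformations of each margin, the same score results no matter how the marginals are distorted, which is the marginal-independence property the abstract emphasizes.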
Machine translation provided by Tencent TranSmart; for reference only.
Click "Read the original" for the academic digest with abstracts.