机器学习学术速递[9.1]

arXiv每日学术速递



cs.LG 方向,今日共计95篇


大模型相关(9篇)

【1】QR-LoRA: QR-Based Low-Rank Adaptation for Efficient Fine-Tuning of Large Language Models
标题:QR-LoRA:基于QR分解的低秩自适应,用于大型语言模型的高效微调
链接:https://arxiv.org/abs/2508.21810

作者:iang, Anirudh Bharadwaj
摘要:大型语言模型(LLM)规模的不断扩大,催生了对参数高效微调技术的需求。低秩自适应(LoRA)通过对预训练权重施加低秩更新来减少可训练参数的数量,已成为一种很有前途的方法。标准LoRA直接学习两个更新因子,而最近的几个变体则先通过对预训练权重做SVD来初始化这些矩阵;这种操作在大型模型上可能很昂贵,且产生的奇异向量并不总是易于解释。在这项工作中,我们使用带列主元(column pivoting)的QR分解从预训练权重矩阵中提取正交基,然后将LoRA更新表示为这些基向量的线性组合,仅训练标量系数;这为自适应施加了清晰的结构,并大幅减少了参数数量。在GLUE任务上的实验表明,QR-LoRA仅用601个参数即可匹配或超过完全微调、标准LoRA和SVD-LoRA(通过奇异值分解初始化更新矩阵的LoRA)的性能;与完全微调相比参数减少了1000倍以上,比典型LoRA设置少77倍。
摘要:The growing scale of Large Language Models (LLMs) has necessitated the development of parameter-efficient fine-tuning techniques. Low-Rank Adaptation (LoRA) has emerged as a promising approach, reducing the number of trainable parameters by applying low-rank updates to pretrained weights. While standard LoRA learns both update factors directly, several recent variants first initialize those matrices via an SVD of the pretrained weights -- an operation that can be expensive on large models and yields singular vectors that are not always easy to interpret. In this work, we extract an orthonormal basis from the pretrained weight matrix using QR decomposition with column pivoting, and then express the LoRA update as a linear combination of these basis vectors -- training only the scalar coefficients, which imposes clear structure on adaptation and drastically reduces parameter count. Experiments across GLUE tasks show that QR-LoRA matches or exceeds the performance of full fine-tuning, standard LoRA, and SVD-LoRA (LoRA with update matrices initialized via singular value decomposition) with as few as 601 parameters -- a reduction of over 1000x compared to full fine-tuning and 77x fewer than typical LoRA setups.
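
下面给出一个极简的代码草图,示意摘要所述"用带列主元的QR分解提取正交基、仅训练标量系数"的思路。矩阵尺寸、秩r以及把更新参数化为 Q_r @ diag(c) @ Q_r.T @ W 的具体组合方式均为我们的假设,并非论文的官方实现:

```python
import numpy as np
from scipy.linalg import qr

W = np.random.randn(768, 768)        # 用随机矩阵代替某个预训练权重(仅示意)

# 带列主元的QR分解:W[:, piv] = Q @ R,Q的前r列给出一组正交基
Q, R, piv = qr(W, pivoting=True)
r = 8
Q_r = Q[:, :r]

c = np.zeros(r)                      # 仅有的可训练参数:r个标量系数
delta_W = Q_r @ np.diag(c) @ Q_r.T @ W   # 对"基向量线性组合"的一种可能参数化
W_adapted = W + delta_W
```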


【2】Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification
标题:总结-举例-反思:数据驱动的洞察蒸馏赋能LLM进行Few-Shot表格分类
链接:https://arxiv.org/abs/2508.21561

作者:n, Jiatong Li, Weijia Zhang, Mohammad Aliannejadi, Evangelos Kanoulas, Renjun Hu
备注:EMNLP 25 Findings
摘要:最近的研究展示了大型语言模型(LLM)在Few-Shot表格分类上的潜力,但也突出了结构化数据多变性带来的挑战。为了解决这个问题,我们建议将数据蒸馏为可操作的洞察,以实现LLM稳健且有效的分类。从人类学习过程中汲取灵感,我们引入了InsightTab,这是一个以分而治之、先易后难和反思学习原则为指导的洞察蒸馏框架。我们的方法通过LLM和数据建模技术之间的深度协作,集成了规则总结、策略性举例和洞察反思。所获得的洞察使LLM能够更好地将其一般知识和能力与特定表格任务的具体要求对齐。我们在九个数据集上广泛评估了InsightTab。结果表明,相比最先进的方法有一致的改进。消融研究进一步验证了原则指导的蒸馏过程,而分析强调了InsightTab在利用标记数据和管理偏差方面的有效性。
摘要:Recent studies show the promise of large language models (LLMs) for few-shot tabular classification but highlight challenges due to the variability in structured data. To address this, we propose distilling data into actionable insights to enable robust and effective classification by LLMs. Drawing inspiration from human learning processes, we introduce InsightTab, an insight distillation framework guided by principles of divide-and-conquer, easy-first, and reflective learning. Our approach integrates rule summarization, strategic exemplification, and insight reflection through deep collaboration between LLMs and data modeling techniques. The obtained insights enable LLMs to better align their general knowledge and capabilities with the particular requirements of specific tabular tasks. We extensively evaluate InsightTab on nine datasets. The results demonstrate consistent improvement over state-of-the-art methods. Ablation studies further validate the principle-guided distillation process, while analyses emphasize InsightTab's effectiveness in leveraging labeled data and managing bias.


【3】Accept or Deny? Evaluating LLM Fairness and Performance in Loan Approval across Table-to-Text Serialization Approaches
标题:接受还是拒绝?跨表格到文本序列化方法评估LLM在贷款审批中的公平性与性能
链接:https://arxiv.org/abs/2508.21512

作者:ebe Azime, Deborah D. Kanubala, Tejumade Afonja, Mario Fritz, Isabel Valera, Dietrich Klakow, Philipp Slusallek
摘要:大型语言模型(LLM)越来越多地用于贷款审批等高风险决策任务。虽然其应用已扩展到各个领域,但LLM在处理表格数据、确保公平性并提供可靠预测方面仍有困难。在这项工作中,我们在来自三个地理上不同地区(加纳、德国和美国)的序列化贷款审批数据集上,评估了LLM的性能和公平性。我们的评估集中在模型的zero-shot和上下文学习(ICL)能力。研究结果表明,序列化格式的选择(序列化是指将表格数据转换为适合LLM处理的文本格式的过程)显著影响LLM的性能和公平性:某些格式(如GReat和LIFT)产生更高的F1分数,但加剧了公平性差异。值得注意的是,虽然ICL相对于zero-shot基线将模型性能提高了4.9%-59.6%,但其对公平性的影响在数据集之间差异很大。我们的工作强调了有效的表格数据表示方法和公平性感知模型对于提高LLM在金融决策中可靠性的重要性。
摘要:Large Language Models (LLMs) are increasingly employed in high-stakes decision-making tasks, such as loan approvals. While their applications expand across domains, LLMs struggle to process tabular data, ensuring fairness and delivering reliable predictions. In this work, we assess the performance and fairness of LLMs on serialized loan approval datasets from three geographically distinct regions: Ghana, Germany, and the United States. Our evaluation focuses on the model's zero-shot and in-context learning (ICL) capabilities. Our results reveal that the choice of serialization (Serialization refers to the process of converting tabular data into text formats suitable for processing by LLMs.) format significantly affects both performance and fairness in LLMs, with certain formats such as GReat and LIFT yielding higher F1 scores but exacerbating fairness disparities. Notably, while ICL improved model performance by 4.9-59.6% relative to zero-shot baselines, its effect on fairness varied considerably across datasets. Our work underscores the importance of effective tabular data representation methods and fairness-aware models to improve the reliability of LLMs in financial decision-making.
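
作为背景补充,下面用几行代码示意"表格到文本序列化"本身。这只是一种常见的模板式写法,并非论文中GReat或LIFT的具体格式:

```python
def serialize_row(row: dict) -> str:
    # 将一行表格记录转换为LLM提示中可读的文本(示意性模板)
    return ",".join(f"{k}为{v}" for k, v in row.items()) + "。"

print(serialize_row({"年龄": 34, "年收入": 52000, "信用记录": "良好"}))
# 输出:年龄为34,年收入为52000,信用记录为良好。
```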


【4】Challenges and Applications of Large Language Models: A Comparison of GPT and DeepSeek family of models
标题:大型语言模型的挑战和应用:GPT和DeepSeek系列模型的比较
链接:https://arxiv.org/abs/2508.21377

作者:harma, Sneha Tuli, Narendra Badam
备注:18 pages, 7 figures
摘要:大型语言模型(LLM)正在改变各行各业的人工智能,但它们的开发和部署仍然很复杂。本综述回顾了构建和使用LLM时面临的16个关键挑战,并考察了两种采用不同路线的最先进模型如何应对这些挑战:OpenAI的闭源GPT-4o(2024年5月更新)和DeepSeek-V3-0324(2025年3月),后者是一个大型开源混合专家模型。通过这种比较,我们展示了闭源模型(强大的安全性、微调的可靠性)和开源模型(效率、适应性)之间的权衡。我们还探索了不同领域的LLM应用(从聊天机器人和编码工具到医疗保健和教育),强调哪些模型属性最适合每个用例。本文旨在指导AI研究人员、开发人员和决策者了解当前LLM的能力、局限和最佳实践。
摘要:Large Language Models (LLMs) are transforming AI across industries, but their development and deployment remain complex. This survey reviews 16 key challenges in building and using LLMs and examines how these challenges are addressed by two state-of-the-art models with unique approaches: OpenAI's closed source GPT-4o (May 2024 update) and DeepSeek-V3-0324 (March 2025), a large open source Mixture-of-Experts model. Through this comparison, we showcase the trade-offs between closed source models (robust safety, fine-tuned reliability) and open source models (efficiency, adaptability). We also explore LLM applications across different domains (from chatbots and coding tools to healthcare and education), highlighting which model attributes are best suited for each use case. This article aims to guide AI researchers, developers, and decision-makers in understanding current LLM capabilities, limitations, and best practices.


【5】Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning
标题:改进基于LoRA的LLM去学习的Fisher信息估计和效率
链接:https://arxiv.org/abs/2508.21300

作者:, Eunwon Kim, Buru Chang, Junsuk Choe
摘要:LLM在各种任务中表现出卓越的性能,但面临着无意中生成包含敏感信息输出的挑战。解决这个问题的一个简单方法是在排除有问题的数据后重新训练模型,然而这种方法会导致过高的计算成本。为了克服这一限制,机器遗忘(machine unlearning)已经成为一种很有前途的解决方案,可以有效地删除敏感信息,而无需从头开始重新训练模型。最近,FILA被提出作为一种集成LoRA适配器的参数高效遗忘方法。具体而言,它计算Fisher信息以识别与遗忘集相关的参数,并将其分配给LoRA适配器进行更新。尽管方法新颖,FILA仍然需要访问所有模型参数,并且没有充分考虑Fisher信息背后的基本假设,导致重要性估计不准确。为了解决这些限制,我们提出了VILA,一种新的遗忘框架,它显式考虑了FILA中被忽略的假设,从而提高了遗忘集参数识别的准确性。此外,VILA通过在不访问整个模型的情况下进行参数识别来显著降低计算成本。与FILA相比,我们的方法实现了高达100倍的参数效率和40倍的训练速度,并在包括TOFU、WMDP和MUSE在内的基准测试中创造了新的最先进性能。我们的代码可在https://github.com/kyj93790/VILA上获得。
摘要:LLMs have demonstrated remarkable performance across various tasks but face challenges related to unintentionally generating outputs containing sensitive information. A straightforward approach to address this issue is to retrain the model after excluding the problematic data. However, this approach incurs prohibitively high computational costs. To overcome this limitation, machine unlearning has emerged as a promising solution that can effectively remove sensitive information without the need to retrain the model from scratch. Recently, FILA has been proposed as a parameter-efficient unlearning method by integrating LoRA adapters. Specifically, it calculates the Fisher information to identify parameters associated with the forget set and assigns them to LoRA adapters for updates. Despite its innovative approach, FILA still requires access to all model parameters and does not adequately account for fundamental assumptions underlying Fisher information, leading to inaccuracies in importance estimation. To address these limitations, we propose VILA, a novel unlearning framework that explicitly considers the assumptions overlooked in FILA, thereby enhancing the accuracy of parameter identification for the forget set. Moreover, VILA significantly reduces computational costs by enabling parameter identification without accessing the entire model. Our method achieves up to 100x higher parameter efficiency and 40x faster training speed compared to FILA, and sets new state-of-the-art performance on benchmarks including TOFU, WMDP, and MUSE. Our code is available at https://github.com/kyj93790/VILA.
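
作为参考,下面是按参数累积梯度平方来估计对角经验Fisher信息的通用写法。这只是FILA一类方法所用重要性分数的示意(函数名与数据接口均为假设),并非VILA的具体实现:

```python
import torch

def diagonal_fisher(model, loss_fn, forget_loader):
    """对角经验Fisher的通用估计:每个参数上梯度平方的均值(示意)。"""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in forget_loader:          # 遗忘集(forget set)的数据
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}
```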


【6】CALM: A Framework for Continuous, Adaptive, and LLM-Mediated Anomaly Detection in Time-Series Streams
标题:CALM:时间序列流中连续、自适应和LLM介导的异常检测框架
链接:https://arxiv.org/abs/2508.21273

作者:ireddy, Shunping Huang
摘要:非平稳时间序列流中的异常检测是众多工业和科学领域中一项关键但具有挑战性的任务。离线训练的传统模型在面临概念漂移(数据的基本统计属性随时间变化)时会遭受显著的性能下降。本文介绍了CALM(连续、自适应、LLM介导),一种新颖的端到端实时异常检测框架,旨在应对这一挑战。CALM构建在Apache Beam分布式处理框架上,并利用TimesFm基础模型进行基于预测的异常检测。该框架的新颖之处在于两个核心贡献。首先,它实现了一个闭环的持续微调机制,允许异常检测模型近实时地适应不断变化的数据模式。其次,它引入了一个LLM-as-a-Judge组件,即一个大型语言模型,对检测到的异常提供语义的、上下文感知的判断,以筛选构建高质量的训练数据集,判定异常代表的是瞬态噪声还是有意义的模式转变。我们在全面的TSB-UAD基准上评估CALM。结果表明,与静态的预训练基础模型相比,持续微调的模型在大多数数据集中提高了ROC AUC分数,验证了我们这种自适应、LLM指导的方法在动态流环境中保持高性能异常检测的有效性。
摘要:The detection of anomalies in non-stationary time-series streams is a critical but challenging task across numerous industrial and scientific domains. Traditional models, trained offline, suffer significant performance degradation when faced with concept drift, where the underlying statistical properties of the data change over time. This paper introduces CALM (Continuous, Adaptive, and LLM-Mediated), a novel, end-to-end framework for real-time anomaly detection designed to address this challenge. CALM is built on the Apache Beam distributed processing framework and leverages the TimesFm foundation model for forecasting-based anomaly detection. The framework's novelty lies in two core contributions. First, it implements a closed-loop, continuous fine-tuning mechanism that allows the anomaly detection model to adapt to evolving data patterns in near real-time. Second, it introduces an LLM-as-a-Judge component, a Large Language Model that provides semantic, context-aware judgments on detected anomalies to curate a high-quality training dataset, deciding whether an anomaly represents transient noise or a meaningful pattern shift. We evaluate CALM on the comprehensive TSB-UAD benchmark. Our results demonstrate that the continuously fine-tuned model improves the ROC AUC score in most datasets compared to the static, pre-trained base model, validating the efficacy of our adaptive, LLM-guided approach to maintaining high-performance anomaly detection in dynamic streaming environments.
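
下面给出"基于预测残差的异常判定"这一通用环节的极简草图。滚动阈值规则与参数均为假设,CALM的具体判据以及LLM评审环节不在其中:

```python
import numpy as np

def residual_anomalies(y_true, y_pred, window=100, k=3.0):
    """残差超过滚动均值加k倍滚动标准差即视为异常(示意性规则)。"""
    resid = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    flags = np.zeros(len(resid), dtype=bool)
    for t in range(1, len(resid)):
        hist = resid[max(0, t - window):t]       # 仅用过去窗口内的残差
        flags[t] = resid[t] > hist.mean() + k * hist.std()
    return flags
```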


【7】Adaptive LLM Routing under Budget Constraints
标题:预算约束下的自适应LLM路由
链接:https://arxiv.org/abs/2508.21141

作者:nda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, Vishal Sharma
备注:Accepted at EMNLP 2025 (findings)
摘要:大型语言模型(LLM)已经彻底改变了自然语言处理,但其参差不齐的能力和成本给实际应用带来了挑战。LLM路由通过为每个查询/任务动态选择最合适的LLM来解决这个问题。以前的方法将其视为监督学习问题,假设完全了解最佳的查询-LLM配对。然而,现实世界的场景缺乏这种全面的映射,并面临不断变化的用户查询。因此,我们提出将LLM路由作为上下文老虎机(contextual bandit)问题来研究,利用bandit反馈进行自适应决策,而无需像监督式路由那样对所有查询在所有LLM上进行穷举推理。为了解决这个问题,我们为查询和LLM开发了一个共享嵌入空间,其中查询嵌入和LLM嵌入对齐以反映它们的亲和力。这个空间先从离线人类偏好数据中学习,再通过在线bandit反馈进行优化。我们通过PILOT(Preference-prior Informed LinUCB fOr adaptive rouTing)实例化这一想法,它是LinUCB的一个新扩展。为了在模型路由中处理多样的用户预算,我们引入了一个建模为多选择背包问题的在线成本策略,以确保资源高效的路由。
摘要:Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.
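
PILOT扩展自LinUCB。下面给出标准LinUCB在"每个候选LLM为一个臂、查询嵌入为上下文"设定下的极简实现,仅示意基线算法,不含PILOT的偏好先验与预算约束部分:

```python
import numpy as np

class LinUCB:
    """标准LinUCB(示意):select选臂,update用观测到的奖励更新。"""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # 每臂的岭回归设计矩阵
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # 每臂的加权奖励向量

    def select(self, x):
        ucb = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # 臂的线性奖励估计
            ucb.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucb))                       # 乐观地选上置信界最大的臂

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```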


【8】Evaluating Differentially Private Generation of Domain-Specific Text
标题:评估特定领域文本的差分隐私生成
链接:https://arxiv.org/abs/2508.20452

作者:, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Warren Del-Pinto, Goran Nenadic, Siew-Kei Lam, Jie Zhang, Anil A Bharath
摘要:生成式人工智能为医疗保健和金融等高风险领域提供了变革潜力,但隐私和监管障碍阻碍了真实世界数据的使用。为了解决这个问题,差分隐私合成数据生成已成为一种有前途的替代方案。在这项工作中,我们引入了一个统一的基准,系统地评估在形式化差分隐私(DP)保证下生成的文本数据集的效用和保真度。我们的基准应对了特定领域基准测试中的关键挑战,包括代表性数据和现实隐私预算的选择、对预训练的考量以及多样的评估指标。我们在五个特定领域数据集上评估了最先进的隐私保护生成方法,发现与真实数据相比,效用和保真度显著下降,在严格的隐私约束下尤其如此。这些发现强调了当前方法的局限性,指出了对先进的隐私保护数据共享方法的需求,并为在现实场景中评估这些方法树立了先例。
摘要:Generative AI offers transformative potential for high-stakes domains such as healthcare and finance, yet privacy and regulatory barriers hinder the use of real-world data. To address this, differentially private synthetic data generation has emerged as a promising alternative. In this work, we introduce a unified benchmark to systematically evaluate the utility and fidelity of text datasets generated under formal Differential Privacy (DP) guarantees. Our benchmark addresses key challenges in domain-specific benchmarking, including choice of representative data and realistic privacy budgets, accounting for pre-training and a variety of evaluation metrics. We assess state-of-the-art privacy-preserving generation methods across five domain-specific datasets, revealing significant utility and fidelity degradation compared to real data, especially under strict privacy constraints. These findings underscore the limitations of current approaches, outline the need for advanced privacy-preserving data sharing methods and set a precedent regarding their evaluation in realistic scenarios.


【9】Quantum-Enhanced Natural Language Generation: A Multi-Model Framework with Hybrid Quantum-Classical Architectures
标题:量子增强自然语言生成:具有混合量子经典架构的多模型框架
链接:https://arxiv.org/abs/2508.21332

作者: Chen, En-Jui Kuo
摘要:本文针对量子计算在自然语言处理中的应用这一日益增长的兴趣,对量子文本生成模型与传统Transformer/MLP架构进行了全面评估。我们进行了系统的实验,比较了五种不同的模型:Transformer(基线)、量子内核自注意力网络(QKSAN)、量子RWKV(QRWKV)和量子注意力序列架构(QASA),涵盖五个不同的数据集,包括简单句子、短篇小说、量子短语、俳句和谚语。我们的评估采用了多个指标,包括困惑度、BLEU分数、词汇多样性、重复率和流畅性度量,以评估文本生成质量的不同方面。实验结果表明,传统的Transformer模型保持整体优势,平均困惑度最低(1.21),BLEU-1得分最高(0.2895),而量子启发模型在特定场景下表现出竞争力。值得注意的是,QKSAN在保持零重复率的同时达到了0.2800的BLEU-1得分,QRWKV在某些任务中表现出完美的词汇多样性(Distinct-1 = 1.000)。
摘要:This paper presents a comprehensive evaluation of quantum text generation models against traditional Transformer/MLP architectures, addressing the growing interest in quantum computing applications for natural language processing. We conduct systematic experiments comparing five distinct models: Transformer (baseline), Quantum Kernel Self-Attention Network (QKSAN), Quantum RWKV (QRWKV), and Quantum Attention Sequence Architecture (QASA) across five diverse datasets including simple sentences, short stories, quantum phrases, haiku poetry, and proverbs. Our evaluation employs multiple metrics including perplexity, BLEU scores, vocabulary diversity, repetition rates, and fluency measures to assess different aspects of text generation quality. The experimental results reveal that while traditional Transformer models maintain overall superiority with the lowest average perplexity (1.21) and highest BLEU-1 score (0.2895), quantum-inspired models demonstrate competitive performance in specific scenarios. Notably, QKSAN achieves a competitive BLEU-1 score of 0.2800 while maintaining zero repetition rates, and QRWKV demonstrates perfect vocabulary diversity (Distinct-1 = 1.000) in certain tasks.


Graph相关(图学习|图神经网络|图优化等)(1篇)

【1】On the Hardness of Learning GNN-based SAT Solvers: The Role of Graph Ricci Curvature
标题:关于学习基于GNN的SAT求解器的难度:图Ricci曲率的作用
链接:https://arxiv.org/abs/2508.21513

作者:deri
备注:Preprint
摘要:图神经网络(GNN)最近通过在逻辑公式的图表示上进行操作,显示出作为布尔可满足性问题(SAT)求解器的前景。然而,它们的性能在更难的实例上会急剧下降,这是否反映了根本性的架构限制?在这项工作中,我们通过图Ricci曲率(RC)的视角提供了一个几何解释,RC量化了局部连接瓶颈。我们证明了来自随机k-SAT公式的二分图具有固有的负曲率,并且该曲率随实例难度的增加而降低。在此基础上,我们证明了基于GNN的SAT求解器受到过度挤压(oversquashing)的影响,这种现象使得长程依赖关系无法被压缩进固定长度的表示。我们在不同的SAT基准上对我们的论断进行了实证验证,并确认曲率既是问题复杂性的一个强有力指标,也可以用来预测性能。最后,我们将我们的发现与现有求解器的设计原则联系起来,并勾勒出未来工作的有希望的方向。
摘要:Graph Neural Networks (GNNs) have recently shown promise as solvers for Boolean Satisfiability Problems (SATs) by operating on graph representations of logical formulas. However, their performance degrades sharply on harder instances, raising the question of whether this reflects fundamental architectural limitations. In this work, we provide a geometric explanation through the lens of graph Ricci Curvature (RC), which quantifies local connectivity bottlenecks. We prove that bipartite graphs derived from random k-SAT formulas are inherently negatively curved, and that this curvature decreases with instance difficulty. Building on this, we show that GNN-based SAT solvers are affected by oversquashing, a phenomenon where long-range dependencies become impossible to compress into fixed-length representations. We validate our claims empirically across different SAT benchmarks and confirm that curvature is both a strong indicator of problem complexity and can be used to predict performance. Finally, we connect our findings to design principles of existing solvers and outline promising directions for future work.
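
论文所分析的图Ricci曲率此处不直接复现;下面用无权图上的组合Forman-Ricci曲率(标准公式 F(u,v) = 4 - deg(u) - deg(v))作为便宜的示意性代理,在一个微型CNF的变量-子句二分图上演示边曲率的计算:

```python
import networkx as nx

def forman_curvature(G):
    """无权图的组合Forman-Ricci曲率:F(u, v) = 4 - deg(u) - deg(v)。"""
    return {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}

# 微型CNF (x1 ∨ x2) ∧ (¬x1 ∨ x3) 的变量-子句二分图
G = nx.Graph([("x1", "c1"), ("x2", "c1"), ("x1", "c2"), ("x3", "c2")])
print(forman_curvature(G))   # 节点度越高,其相邻边的曲率越负
```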


Transformer(1篇)

【1】Spiking Decision Transformers: Local Plasticity, Phase-Coding, and Dendritic Routing for Low-Power Sequence Control
标题:尖峰决策Transformer:用于低功耗序列控制的局部可塑性、相位编码和树突路由
链接:https://arxiv.org/abs/2508.21505

作者:ndey, Debasmita Biswas
备注:Preprint (31 pages, 19 images, 7 tables)
摘要:基于Transformer架构的强化学习代理在顺序决策任务上取得了令人印象深刻的性能,但它们对密集矩阵运算的依赖使它们不适合能量受限、面向边缘的平台。尖峰神经网络有望实现超低功耗、事件驱动的推理,但此前尚无工作将尖峰动态与回报条件序列建模无缝融合。我们提出了尖峰决策Transformer(SNN-DT),它将泄漏积分发放(Leaky Integrate-and-Fire)神经元嵌入每个自注意力块,通过代理梯度进行端到端训练,并结合了生物启发的三因素可塑性、相移的基于尖峰的位置编码,以及一个轻量级的树突路由模块。我们的实现在经典控制基准(CartPole-v1、MountainCar-v0、Acrobot-v1、Pendulum-v1)上匹配或超过标准决策Transformer的性能,同时每个决策发出的尖峰少于10个;这一能量代理指标表明每次推理的能量可降低超过四个数量级。通过将序列建模与神经形态效率相结合,SNN-DT开辟了一条在嵌入式和可穿戴设备上实现实时、低功耗控制的途径。
摘要:Reinforcement learning agents based on Transformer architectures have achieved impressive performance on sequential decision-making tasks, but their reliance on dense matrix operations makes them ill-suited for energy-constrained, edge-oriented platforms. Spiking neural networks promise ultra-low-power, event-driven inference, yet no prior work has seamlessly merged spiking dynamics with return-conditioned sequence modeling. We present the Spiking Decision Transformer (SNN-DT), which embeds Leaky Integrate-and-Fire neurons into each self-attention block, trains end-to-end via surrogate gradients, and incorporates biologically inspired three-factor plasticity, phase-shifted spike-based positional encodings, and a lightweight dendritic routing module. Our implementation matches or exceeds standard Decision Transformer performance on classic control benchmarks (CartPole-v1, MountainCar-v0, Acrobot-v1, Pendulum-v1) while emitting fewer than ten spikes per decision, an energy proxy suggesting over four orders-of-magnitude reduction in per inference energy. By marrying sequence modeling with neuromorphic efficiency, SNN-DT opens a pathway toward real-time, low-power control on embedded and wearable devices.
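
作为背景,下面给出离散时间泄漏积分发放(LIF)神经元的教科书式单步更新,用以说明SNN-DT中嵌入自注意力块的基本神经元动态;阈值、泄漏系数等取值为示意性假设:

```python
import numpy as np

def lif_step(v, i_in, v_th=1.0, decay=0.9):
    """LIF单步:膜电位按decay泄漏并累加输入电流,过阈值则发放并硬复位。"""
    v = decay * v + i_in
    spikes = (v >= v_th).astype(float)
    v = v * (1.0 - spikes)          # 发放后硬复位为0
    return v, spikes

v = np.zeros(4)
for t in range(5):
    v, s = lif_step(v, i_in=np.array([0.3, 0.5, 0.0, 1.2]))
```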


GAN|对抗|攻击|生成相关(8篇)

【1】Achieving Hilbert-Schmidt Independence Under Rényi Differential Privacy for Fair and Private Data Generation
标题:在Rényi差分隐私下实现Hilbert-Schmidt独立性,以实现公平且隐私的数据生成
链接:https://arxiv.org/abs/2508.21815

作者:rup, Emmanouil Panagiotou, Arjun Roy, Arthur Zimek, Eirini Ntoutsi, Peter Schneider-Kamp
摘要:随着GDPR和HIPAA等隐私法规以及AI法案等人工智能责任框架的普及,真实世界数据的道德和负责任使用面临越来越多的约束。合成数据生成已成为风险感知的数据共享和模型开发的一个有前途的解决方案,特别是对于医疗保健等敏感领域的基础性表格数据集。为了在这种情形下同时解决隐私和公平性问题,我们提出了FLIP(隐私保证下的公平潜在干预),一个用潜在扩散增强的、基于Transformer的变分自编码器,用于生成异构表格数据。与公平性感知数据生成中的典型设置不同,我们假设任务无关的设置,不依赖于固定、预先定义的下游任务,从而提供更广泛的适用性。为了确保隐私,FLIP在训练过程中采用了Rényi差分隐私(RDP)约束,并通过与RDP兼容的平衡采样来解决输入空间中的公平性问题,该采样考虑了多种采样率下各组特定的噪声水平。在潜在空间中,我们通过使用中心核对齐(CKA,一种扩展Hilbert-Schmidt独立性准则(HSIC)的相似性度量)在受保护组之间对齐神经元激活模式来促进公平性。这种对齐鼓励潜在表示与受保护特征之间的统计独立性。实证结果表明,在差分隐私约束下,FLIP在任务无关的公平性以及多种下游任务上都带来了显著的公平性改进。
摘要:As privacy regulations such as the GDPR and HIPAA and responsibility frameworks for artificial intelligence such as the AI Act gain traction, the ethical and responsible use of real-world data faces increasing constraints. Synthetic data generation has emerged as a promising solution to risk-aware data sharing and model development, particularly for tabular datasets that are foundational to sensitive domains such as healthcare. To address both privacy and fairness concerns in this setting, we propose FLIP (Fair Latent Intervention under Privacy guarantees), a transformer-based variational autoencoder augmented with latent diffusion to generate heterogeneous tabular data. Unlike the typical setup in fairness-aware data generation, we assume a task-agnostic setup, not reliant on a fixed, defined downstream task, thus offering broader applicability. To ensure privacy, FLIP employs R\'enyi differential privacy (RDP) constraints during training and addresses fairness in the input space with RDP-compatible balanced sampling that accounts for group-specific noise levels across multiple sampling rates. In the latent space, we promote fairness by aligning neuron activation patterns across protected groups using Centered Kernel Alignment (CKA), a similarity measure extending the Hilbert-Schmidt Independence Criterion (HSIC). This alignment encourages statistical independence between latent representations and the protected feature. Empirical results demonstrate that FLIP effectively provides significant fairness improvements for task-agnostic fairness and across diverse downstream tasks under differential privacy constraints.
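
下面给出线性CKA的一种标准计算方式,仅示意FLIP对齐项所用的相似性度量本身;变量名与数据为假设,不含FLIP的训练流程:

```python
import torch

def linear_cka(X, Y):
    """线性CKA:对激活矩阵(样本数 x 特征数)先中心化,
    再计算 ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)。"""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = torch.norm(Y.T @ X) ** 2
    return hsic / (torch.norm(X.T @ X) * torch.norm(Y.T @ Y))

a = torch.randn(256, 64)   # 受保护组A的神经元激活(示意)
b = torch.randn(256, 64)   # 受保护组B的神经元激活(示意)
print(linear_cka(a, b))
```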


【2】I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks
标题:我偷偷发誓我(不)怀好意:模型窃取攻击的设计和评估
链接:https://arxiv.org/abs/2508.21654

作者:iynyk, Rudolf Mayer, Kathrin Grosse, Andreas Rauber
备注:Under review
摘要:模型窃取攻击危及作为服务提供的机器学习模型的机密性。虽然这些模型是保密的,但恶意方可以查询模型来标注数据样本并训练自己的替代模型,这侵犯了知识产权。虽然该领域的新攻击不断发表,但它们的设计和评估并未标准化,这使得比较先前的工作和评估该领域的进展变得困难。本文首次通过提供设计和评估模型窃取攻击的建议来填补这一空白。为此,我们研究了依赖于训练替代模型的最大一类攻击,即针对图像分类模型的攻击。我们提出了第一个全面的威胁模型,并开发了一个用于攻击比较的框架。此外,我们还分析了相关工作中的攻击设置,以了解哪些任务和模型被研究得最多。基于我们的发现,我们给出了实验之前、期间和之后攻击开发的最佳实践,并整理出一份关于模型窃取攻击评估的大量开放研究问题清单。我们的发现和建议也可迁移到其他问题领域,从而建立了第一个通用的模型窃取攻击评估方法。
摘要:Model stealing attacks endanger the confidentiality of machine learning models offered as a service. Although these models are kept secret, a malicious party can query a model to label data samples and train their own substitute model, violating intellectual property. While novel attacks in the field are continually being published, their design and evaluations are not standardised, making it challenging to compare prior works and assess progress in the field. This paper is the first to address this gap by providing recommendations for designing and evaluating model stealing attacks. To this end, we study the largest group of attacks that rely on training a substitute model -- those attacking image classification models. We propose the first comprehensive threat model and develop a framework for attack comparison. Further, we analyse attack setups from related works to understand which tasks and models have been studied the most. Based on our findings, we present best practices for attack development before, during, and beyond experiments and derive an extensive list of open research questions regarding the evaluation of model stealing attacks. Our findings and recommendations also transfer to other problem domains, hence establishing the first generic evaluation methodology for model stealing attacks.


【3】OASIS: Harnessing Diffusion Adversarial Network for Ocean Salinity Imputation using Sparse Drifter Trajectories
标题:OASIS:利用扩散对抗网络与稀疏漂流浮标轨迹进行海洋盐度插补
链接:https://arxiv.org/abs/2508.21570

作者:ngqi Feng, Ming Jin, Xin Zheng, Yufei Tang, Laurent Cherubin, Alan Wee-Chung Liew, Can Wang, Qinghua Lu, Jingwei Yao, Shirui Pan, Hong Zhang, Xingquan Zhu
备注:CIKM 2025 Accepted
摘要:海洋盐度在环流、气候和海洋生态系统中起着至关重要的作用,但它的测量通常是稀疏、不规则和带噪的,在基于漂流浮标的数据集中尤其如此。传统的方法,如遥感和最优插值,依赖于线性和平稳性假设,并受到云层覆盖、传感器漂移和卫星重访率低的限制。虽然机器学习模型提供了灵活性,但它们在严重稀疏的情况下往往失效,并且缺乏在没有专门传感器的情况下纳入物理协变量的原则性方法。在本文中,我们介绍了海洋盐度插补系统(OceAn Salinity Imputation System, OASIS),这是一种旨在应对这些挑战的新型扩散对抗框架。
摘要:Ocean salinity plays a vital role in circulation, climate, and marine ecosystems, yet its measurement is often sparse, irregular, and noisy, especially in drifter-based datasets. Traditional approaches, such as remote sensing and optimal interpolation, rely on linearity and stationarity, and are limited by cloud cover, sensor drift, and low satellite revisit rates. While machine learning models offer flexibility, they often fail under severe sparsity and lack principled ways to incorporate physical covariates without specialized sensors. In this paper, we introduce the OceAn Salinity Imputation System (OASIS), a novel diffusion adversarial framework designed to address these challenges.


【4】Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient Integration
标题:通过贝叶斯流网络与梯度整合实现基于结构的药物设计的可控3D分子生成
链接:https://arxiv.org/abs/2508.21468

作者: Choi, Hwanhee Kim, Chihyun Park, Dahyeon Lee, Seungyong Lee, Yoonju Kim, Hyoungjoon Park, Sein Kwon, Youngwan Jo, Sanghyun Park
摘要:基于结构的药物设计(SBDD)的最新进展已经利用生成模型进行3D分子生成,主要通过与靶蛋白的结合亲和力来评估模型性能。然而,实际的药物发现需要高结合亲和力以及合成的可行性和选择性,在以前的评估中很大程度上被忽视的关键特性。为了解决这一差距,我们确定了传统的基于扩散的生成模型在有效引导分子生成这些不同的药理学特性方面的基本局限性。我们提出了CByG,一个新的框架扩展贝叶斯流网络到一个基于梯度的条件生成模型,鲁棒地集成了属性特定的指导。此外,我们引入了一个综合的评价方案,结合结合亲和力,合成的可行性和选择性的实际基准,克服了传统的评价方法的局限性。大量的实验表明,我们提出的CByG框架在多个基本评估标准上显着优于基线模型,突出了其在现实世界的药物发现应用中的有效性和实用性。
摘要:Recent advances in Structure-based Drug Design (SBDD) have leveraged generative models for 3D molecular generation, predominantly evaluating model performance by binding affinity to target proteins. However, practical drug discovery necessitates high binding affinity along with synthetic feasibility and selectivity, critical properties that were largely neglected in previous evaluations. To address this gap, we identify fundamental limitations of conventional diffusion-based generative models in effectively guiding molecule generation toward these diverse pharmacological properties. We propose CByG, a novel framework extending Bayesian Flow Network into a gradient-based conditional generative model that robustly integrates property-specific guidance. Additionally, we introduce a comprehensive evaluation scheme incorporating practical benchmarks for binding affinity, synthetic feasibility, and selectivity, overcoming the limitations of conventional evaluation methods. Extensive experiments demonstrate that our proposed CByG framework significantly outperforms baseline models across multiple essential evaluation criteria, highlighting its effectiveness and practicality for real-world drug discovery applications.


【5】Quantum enhanced ensemble GANs for anomaly detection in continuous biomanufacturing
标题:用于连续生物制造中异常检测的量子增强型集成GAN
链接:https://arxiv.org/abs/2508.21438

作者:lasanathan, William R. Clements, Mohammad Reza Boskabadi, Shawn M. Gibford, Emmanouil Papadakis, Christopher J. Savoie, Seyed Soheil Mansouri
摘要:连续生物制造过程的开发需要强大的早期异常检测,因为即使是微小的偏差也会影响产量和稳定性,导致调度中断,每周产量减少,经济效益下降。这些过程本质上是复杂的,并表现出非线性动态与过程变量之间的复杂关系,从而使先进的异常检测方法对于高效运行至关重要。在这项工作中,我们提出了一个新的框架,在连续生物制造的基础上生成对抗网络(GANs)的集成无监督异常检测。我们首先建立了一个基准数据集模拟正常和异常的操作制度在一个连续的过程中生产的小分子。然后,我们证明了我们的基于GAN的框架在检测由原料突然变化引起的异常方面的有效性。最后,我们评估了使用混合量子/经典GAN方法与模拟量子电路和真实光子量子处理器对异常检测性能的影响。我们发现,混合方法产生改进的异常检测率。我们的工作显示了混合量子/经典方法解决复杂连续生物制造过程中现实问题的潜力。
摘要:The development of continuous biomanufacturing processes requires robust and early anomaly detection, since even minor deviations can compromise yield and stability, leading to disruptions in scheduling, reduced weekly production, and diminished economic performance. These processes are inherently complex and exhibit non-linear dynamics with intricate relationships between process variables, thus making advanced methods for anomaly detection essential for efficient operation. In this work, we present a novel framework for unsupervised anomaly detection in continuous biomanufacturing based on an ensemble of generative adversarial networks (GANs). We first establish a benchmark dataset simulating both normal and anomalous operation regimes in a continuous process for the production of a small molecule. We then demonstrate the effectiveness of our GAN-based framework in detecting anomalies caused by sudden feedstock variability. Finally, we evaluate the impact of using a hybrid quantum/classical GAN approach with both a simulated quantum circuit and a real photonic quantum processor on anomaly detection performance. We find that the hybrid approach yields improved anomaly detection rates. Our work shows the potential of hybrid quantum/classical approaches for solving real-world problems in complex continuous biomanufacturing processes.


【6】DLGAN : Time Series Synthesis Based on Dual-Layer Generative Adversarial Networks
标题:DLGAN:基于双层生成对抗网络的时间序列合成
链接:https://arxiv.org/abs/2508.21340

作者: Shuhan Liu, Zhaohui Peng, Yaohui Chu, Yue Zhang, Yining Wang
备注:8 pages, 3 figures
摘要:时间序列综合是保证时间序列数据安全流通的有效途径。现有的时间序列合成方法通常基于随机序列进行时间建模以生成目标序列,这通常难以确保所生成的时间序列中的时间依赖性。此外,直接在随机序列上建模时间特征使得准确捕获原始时间序列的特征信息具有挑战性。为了解决上述问题,我们提出了一个简单但有效的生成模型\textbf{D}ual-\textbf{L}ayer \textbf{G} generative\textbf{A}dversarial \textbf{N}etworks,命名为\textbf{DLGAN}。该模型将时间序列生成过程分解为两个阶段:序列特征提取和序列重建。首先,这两个阶段形成了一个完整的时间序列自动编码器,可以对原始时间序列进行监督学习,以确保重建过程可以恢复序列的时间依赖性。其次,生成对抗网络(GAN)用于生成与实时序列特征向量对齐的合成特征向量,确保生成器可以从实时时间序列中捕获时间特征。在四个公共数据集上进行的大量实验证明了该模型在各种评估指标上的优越性。
摘要:Time series synthesis is an effective approach to ensuring the secure circulation of time series data. Existing time series synthesis methods typically perform temporal modeling based on random sequences to generate target sequences, which often struggle to ensure the temporal dependencies in the generated time series. Additionally, directly modeling temporal features on random sequences makes it challenging to accurately capture the feature information of the original time series. To address the above issues, we propose a simple but effective generative model \textbf{D}ual-\textbf{L}ayer \textbf{G}enerative \textbf{A}dversarial \textbf{N}etworks, named \textbf{DLGAN}. The model decomposes the time series generation process into two stages: sequence feature extraction and sequence reconstruction. First, these two stages form a complete time series autoencoder, enabling supervised learning on the original time series to ensure that the reconstruction process can restore the temporal dependencies of the sequence. Second, a Generative Adversarial Network (GAN) is used to generate synthetic feature vectors that align with the real-time sequence feature vectors, ensuring that the generator can capture the temporal features from real time series. Extensive experiments on four public datasets demonstrate the superiority of this model across various evaluation metrics.


【7】Stage-Diff: Stage-wise Long-Term Time Series Generation Based on Diffusion Models
标题:Stage-Diff:基于扩散模型的分阶段长期时间序列生成
链接:https://arxiv.org/abs/2508.21330

作者: Shuhan Liu, Zhaohui Peng, Yaohui Chu, Yue Zhang, Yining Wang
备注:8 pages, 5 figures
摘要:生成式模型已成功地应用于时间序列生成领域。然而,当处理长期的时间序列,跨越较长的时期,并表现出更复杂的长期时间模式,生成的任务变得更具挑战性。长期时间序列表现出长期的时间依赖性,但其数据分布也会随着时间的推移而逐渐变化。在这些长期依赖性和数据分布的漂移之间找到平衡是一个关键挑战。另一方面,长期时间序列包含不同特征序列之间更复杂的相互关系,使得有效捕获序列内和序列间依赖性的任务成为另一个重要挑战。为了解决这些问题,我们提出了一个基于扩散模型的长期时间序列阶段生成模型Stage-Diff。首先,通过逐阶段序列生成和阶段间信息传递,该模型保留了长期序列依赖性,同时能够对数据分布变化进行建模。第二,在每个阶段内,渐进序列分解应用于在不同的时间尺度上执行通道独立的建模,而阶段间的信息传递利用多通道融合建模。该方法结合了通道独立建模的鲁棒性和多通道建模的信息融合优势,有效地平衡了长期时间序列的序列内和序列间依赖性。在多个真实数据集上的大量实验验证了Stage-Diff在长期时间序列生成任务中的有效性。
摘要:Generative models have been successfully used in the field of time series generation. However, when dealing with long-term time series, which span over extended periods and exhibit more complex long-term temporal patterns, the task of generation becomes significantly more challenging. Long-term time series exhibit long-range temporal dependencies, but their data distribution also undergoes gradual changes over time. Finding a balance between these long-term dependencies and the drift in data distribution is a key challenge. On the other hand, long-term time series contain more complex interrelationships between different feature sequences, making the task of effectively capturing both intra-sequence and inter-sequence dependencies another important challenge. To address these issues, we propose Stage-Diff, a staged generative model for long-term time series based on diffusion models. First, through stage-wise sequence generation and inter-stage information transfer, the model preserves long-term sequence dependencies while enabling the modeling of data distribution shifts. Second, within each stage, progressive sequence decomposition is applied to perform channel-independent modeling at different time scales, while inter-stage information transfer utilizes multi-channel fusion modeling. This approach combines the robustness of channel-independent modeling with the information fusion advantages of multi-channel modeling, effectively balancing the intra-sequence and inter-sequence dependencies of long-term time series. Extensive experiments on multiple real-world datasets validate the effectiveness of Stage-Diff in long-term time series generation tasks.


【8】Privacy Auditing Synthetic Data Release through Local Likelihood Attacks
标题:通过局部似然攻击对合成数据发布进行隐私审计
链接:https://arxiv.org/abs/2508.21146

作者:rd, Chi-Hua Wang, Guang Cheng
摘要:审计合成数据的隐私泄漏是一个重要但尚未解决的问题。现有的大多数合成数据隐私审计框架依赖于启发式方法和不合理的假设来攻击生成模型的失效模式,在描述和检测训练数据经由合成数据发布造成的隐私暴露方面能力有限。在本文中,我们研究如何设计成员推理攻击(MIA),专门利用表格生成模型往往会显著过拟合训练分布某些区域这一观察。我们提出了生成似然比攻击(Gen-LRA),一种新颖、计算高效的No-Box MIA:它不假设任何模型知识或访问权限,通过评估一个测试观测对代理模型在合成数据上估计的局部似然比的影响来构造攻击。在涵盖不同数据集、模型架构和攻击参数的全面基准上的评估表明,Gen-LRA在多个性能指标上始终优于其他针对生成模型的MIA。这些结果强调了Gen-LRA作为合成数据发布隐私审计工具的有效性,凸显了生成模型过拟合在现实应用中带来的重大隐私风险。
摘要:Auditing the privacy leakage of synthetic data is an important but unresolved problem. Most existing privacy auditing frameworks for synthetic data rely on heuristics and unreasonable assumptions to attack the failure modes of generative models, exhibiting limited capability to describe and detect the privacy exposure of training data through synthetic data release. In this paper, we study designing Membership Inference Attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. Here, we propose Generative Likelihood Ratio Attack (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has in a surrogate model's estimation of a local likelihood ratio over the synthetic data. Assessed over a comprehensive benchmark spanning diverse datasets, model architectures, and attack parameters, we find that Gen-LRA consistently dominates other MIAs for generative models across multiple performance metrics. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, highlighting the significant privacy risks posed by generative model overfitting in real-world applications.
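
下面是一个玩具化的"局部似然比"打分草图,仅用于示意Gen-LRA的思路;其中以KDE充当代理密度模型,带宽等均为我们的假设,并非论文的具体估计量:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def local_lr_score(x, synthetic, reference, bandwidth=0.5):
    """比较测试点x在"合成数据代理模型"与"参考数据代理模型"
    下的对数密度之差;分数越大,越倾向判为训练成员(玩具实现)。"""
    kde_syn = KernelDensity(bandwidth=bandwidth).fit(synthetic)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(reference)
    x = np.atleast_2d(x)
    return (kde_syn.score_samples(x) - kde_ref.score_samples(x))[0]
```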


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering
标题:通过非参数深度嵌入聚类的无监督视频持续学习
链接:https://arxiv.org/abs/2508.21773

作者: Kurpukdee, Adrian G. Bors
备注:Accepted to The 36th British Machine Vision Conference (BMVC 2025), Sheffield, UK
摘要:我们提出了一个现实的无监督视频学习场景:在学习一系列任务时,既没有任务边界,也没有标签。我们还为这个探索不足的问题,即无监督视频持续学习(uVCL),提供了一个非参数学习解决方案。视频是一种复杂而丰富的时空媒体信息,在许多应用中得到了广泛使用,但在无监督持续学习中尚未得到充分研究。以前的研究只关注监督式持续学习,依赖于标签和任务边界的知识,而标注数据成本高昂且不实用。为了填补这一空白,我们研究了无监督视频持续学习(uVCL)。与图像相比,处理视频需要额外的计算和内存,uVCL因此提出了更多挑战。我们为uVCL引入了一个通用的基准实验协议,考虑在每个任务中学习非结构化的视频数据类别。我们建议使用由无监督视频Transformer网络提取的深度嵌入视频特征的核密度估计(KDE)作为数据的非参数概率表示。我们为新到来的任务数据引入了一个新颖性检测标准,动态地支持内存簇的扩展,旨在学习一系列任务时捕捉新知识。我们利用来自先前任务的迁移学习,作为向当前学习任务进行知识迁移的初始状态。我们发现,所提出的方法在连续学习许多任务时大幅提升了模型的性能。我们在三个标准的视频动作识别数据集(UCF101、HMDB51和Something-to-Something V2)上进行了深入评估,且不使用任何标签或类别边界。
摘要:We propose a realistic scenario for the unsupervised video learning where neither task boundaries nor labels are provided when learning a succession of tasks. We also provide a non-parametric learning solution for the under-explored problem of unsupervised video continual learning. Videos represent a complex and rich spatio-temporal media information, widely used in many applications, but which have not been sufficiently explored in unsupervised continual learning. Prior studies have only focused on supervised continual learning, relying on the knowledge of labels and task boundaries, while having labeled data is costly and not practical. To address this gap, we study the unsupervised video continual learning (uVCL). uVCL raises more challenges due to the additional computational and memory requirements of processing videos when compared to images. We introduce a general benchmark experimental protocol for uVCL by considering the learning of unstructured video data categories during each task. We propose to use the Kernel Density Estimation (KDE) of deep embedded video features extracted by unsupervised video transformer networks as a non-parametric probabilistic representation of the data. We introduce a novelty detection criterion for the incoming new task data, dynamically enabling the expansion of memory clusters, aiming to capture new knowledge when learning a succession of tasks. We leverage the use of transfer learning from the previous tasks as an initial state for the knowledge transfer to the current learning task. We found that the proposed methodology substantially enhances the performance of the model when successively learning many tasks. We perform in-depth evaluations on three standard video action recognition datasets, including UCF101, HMDB51, and Something-to-Something V2, without using any labels or class boundaries.
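
下面给出"以KDE为非参数表示并做新颖性检测"这一环节的极简草图;带宽与阈值为假设,特征假定已由视频Transformer网络提取好:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

class KDENoveltyMemory:
    """已见特征的KDE表示:新样本对数密度低于阈值即判为"新知识",
    可用于触发内存簇扩展(阈值与带宽为示意性假设)。"""
    def __init__(self, bandwidth=0.5, threshold=-50.0):
        self.bandwidth = bandwidth
        self.threshold = threshold
        self.kde = None

    def fit(self, features):                      # features: (n, d)
        self.kde = KernelDensity(bandwidth=self.bandwidth).fit(features)

    def is_novel(self, feature):                  # feature: (d,)
        score = self.kde.score_samples(np.atleast_2d(feature))[0]
        return score < self.threshold
```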


【2】SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
标题:SatDINO:对遥感自监督预训练的深入研究
链接:https://arxiv.org/abs/2508.21402

作者:aka, Ivan Gruber
摘要:自监督学习已经成为遥感的一个强大工具,这一领域有大量未标注数据可用。在这项工作中,我们研究了使用DINO(一种对比式自监督方法)对遥感图像进行预训练。我们介绍了SatDINO,一个专为卫星图像表示学习而设计的模型。通过在多个数据集、多种测试设置下的广泛实验,我们证明了SatDINO优于其他基于更常见的掩码自编码器(MAE)的最先进方法,并在多个基准测试中取得了有竞争力的结果。   我们还提供了一个严格的消融研究,评估SatDINO的各个组件。最后,我们提出了一些新的增强,例如结合地面采样距离(GSD)编码的新方式和自适应视图采样。这些增强可以在我们的SatDINO模型上独立使用。我们的代码和训练模型可以在https://github.com/strakaj/SatDINO上找到。
摘要:Self-supervised learning has emerged as a powerful tool for remote sensing, where large amounts of unlabeled data are available. In this work, we investigate the use of DINO, a contrastive self-supervised method, for pretraining on remote sensing imagery. We introduce SatDINO, a model tailored for representation learning in satellite imagery. Through extensive experiments on multiple datasets in multiple testing setups, we demonstrate that SatDINO outperforms other state-of-the-art methods based on much more common masked autoencoders (MAE) and achieves competitive results in multiple benchmarks.   We also provide a rigorous ablation study evaluating SatDINO's individual components. Finally, we propose a few novel enhancements, such as a new way to incorporate ground sample distance (GSD) encoding and adaptive view sampling. These enhancements can be used independently on our SatDINO model. Our code and trained models are available at: https://github.com/strakaj/SatDINO.


【3】Class Incremental Continual Learning with Self-Organizing Maps and Variational Autoencoders Using Synthetic Replay
标题:使用自组织地图和使用合成回放的变分自动编码器的班级增量连续学习
链接:https://arxiv.org/abs/2508.21240

作者:pa, Alexander Ororbia, Travis Desell
摘要:这项工作介绍了一种基于自组织映射(SOM)和变分自编码器(VAE)的新型生成式持续学习框架,以实现内存高效的重放,消除了存储原始数据样本或任务标签的需要。对于高维输入空间,如CIFAR-10和CIFAR-100,我们设计了一个让SOM在VAE学习到的潜在空间上运行的方案;而对于低维输入,如MNIST和FashionMNIST中的输入,SOM以独立方式运行。我们的方法为每个SOM单元存储运行均值、方差和协方差,并在后续学习迭代中据此生成合成样本。对于基于VAE的方法,生成的样本再经解码器送出,用于随后的重放。在标准类增量基准上的实验结果表明,我们的方法与最先进的基于内存的方法相比具有竞争力,并且优于无内存方法,在CIFAR-10和CIFAR-100上分别比最先进的单类增量最佳性能提高了近10%和7%。我们的方法还便于对学习过程进行可视化,并且在训练后还可以用作生成模型。结果表明我们的方法有能力作为一种可扩展、无需任务标签且内存高效的持续学习解决方案。
摘要:This work introduces a novel generative continual learning framework based on self-organizing maps (SOMs) and variational autoencoders (VAEs) to enable memory-efficient replay, eliminating the need to store raw data samples or task labels. For high-dimensional input spaces, such as of CIFAR-10 and CIFAR-100, we design a scheme where the SOM operates over the latent space learned by a VAE, whereas, for lower-dimensional inputs, such as those found in MNIST and FashionMNIST, the SOM operates in a standalone fashion. Our method stores a running mean, variance, and covariance for each SOM unit, from which synthetic samples are then generated during future learning iterations. For the VAE-based method, generated samples are then fed through the decoder to then be used in subsequent replay. Experimental results on standard class-incremental benchmarks show that our approach performs competitively with state-of-the-art memory-based methods and outperforms memory-free methods, notably improving over best state-of-the-art single class incremental performance on CIFAR-10 and CIFAR-100 by nearly $10$\% and $7$\%, respectively. Our methodology further facilitates easy visualization of the learning process and can also be utilized as a generative model post-training. Results show our method's capability as a scalable, task-label-free, and memory-efficient solution for continual learning.
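
下面示意"为每个SOM单元维护运行均值/协方差并据此采样合成重放样本"的做法;采用Welford式在线更新,细节(如协方差的正则化)为我们的假设:

```python
import numpy as np

class SOMUnitStats:
    """单个SOM单元的运行高斯统计量与合成样本生成(示意)。"""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))   # 偏差外积的累积(Welford)

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    def sample(self, k):
        # 小的对角正则项保证协方差正定(假设性处理)
        cov = self.M2 / max(self.n - 1, 1) + 1e-6 * np.eye(len(self.mean))
        return np.random.multivariate_normal(self.mean, cov, size=k)
```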


【4】Deep Active Learning for Lung Disease Severity Classification from Chest X-rays: Learning with Less Data in the Presence of Class Imbalance
标题:胸部X光肺部疾病严重程度分类的深度主动学习:在类别不平衡的情况下用更少的数据进行学习
链接:https://arxiv.org/abs/2508.21263

作者:briel, Mohammadreza Zandehshahvar, Marly van Assen, Nattakorn Kittisut, Kyle Peters, Carlo N. De Cecco, Ali Adibi
摘要:为了在类别不平衡情况下减少胸部X射线(CXR)肺部疾病严重程度分类所需的标注数据量,本研究应用了带贝叶斯神经网络(BNN)近似和加权损失函数的深度主动学习。这项回顾性研究收集了2020年1月至11月期间Emory Healthcare附属医院963名患者(平均年龄59.2±16.6岁;481名女性)的2,319份CXR。所有患者均经临床确诊为COVID-19。每份CXR由3到6名委员会认证的放射科医生独立标注为正常、中度或重度。使用主动学习训练带Monte Carlo Dropout的深度神经网络以对疾病严重程度进行分类。使用多种采集函数从未标注样本池中迭代选择信息量最大的样本。使用准确度、受试者工作特征曲线下面积(AU ROC)和精确率-召回率曲线下面积(AU PRC)评估性能,并记录训练时间和采集时间。统计分析包括描述性指标和各采集策略之间的性能比较。熵采样在二分类(正常与患病)中使用15.4%的训练数据实现了93.7%的准确度(AU ROC,0.91)。在多分类设置中,Mean STD采样使用23.1%的标注数据实现了70.3%的准确度(AU ROC,0.86)。这些方法优于更复杂、计算开销更大的采集函数,并显著减少了标注需求。带BNN近似和加权损失的深度主动学习在解决类别不平衡的同时有效减少了标注数据需求,保持甚至超越了诊断性能。
摘要:To reduce the amount of required labeled data for lung disease severity classification from chest X-rays (CXRs) under class imbalance, this study applied deep active learning with a Bayesian Neural Network (BNN) approximation and weighted loss function. This retrospective study collected 2,319 CXRs from 963 patients (mean age, 59.2 $\pm$ 16.6 years; 481 female) at Emory Healthcare affiliated hospitals between January and November 2020. All patients had clinically confirmed COVID-19. Each CXR was independently labeled by 3 to 6 board-certified radiologists as normal, moderate, or severe. A deep neural network with Monte Carlo Dropout was trained using active learning to classify disease severity. Various acquisition functions were used to iteratively select the most informative samples from an unlabeled pool. Performance was evaluated using accuracy, area under the receiver operating characteristic curve (AU ROC), and area under the precision-recall curve (AU PRC). Training time and acquisition time were recorded. Statistical analysis included descriptive metrics and performance comparisons across acquisition strategies. Entropy Sampling achieved 93.7% accuracy (AU ROC, 0.91) in binary classification (normal vs. diseased) using 15.4% of the training data. In the multi-class setting, Mean STD sampling achieved 70.3% accuracy (AU ROC, 0.86) using 23.1% of the labeled data. These methods outperformed more complex and computationally expensive acquisition functions and significantly reduced labeling needs. Deep active learning with BNN approximation and weighted loss effectively reduces labeled data requirements while addressing class imbalance, maintaining or exceeding diagnostic performance.
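
下面给出熵采样这一采集函数的通用写法:对开启Dropout的T次随机前向传播(MC Dropout)所得平均预测分布计算熵,选取熵最大的未标注样本;接口与形状约定为我们的假设:

```python
import numpy as np

def entropy_acquisition(mc_probs, k):
    """mc_probs: (T, N, C),T次MC Dropout前向传播的类别概率;
    返回预测熵最大的k个未标注样本的索引(通用写法,非论文代码)。"""
    mean_probs = mc_probs.mean(axis=0)                       # (N, C)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]
```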


迁移|Zero/Few/One-Shot|自适应(7篇)

【1】Harnessing IoT and Generative AI for Weather-Adaptive Learning in Climate Resilience Education
标题:利用物联网和生成式人工智能进行气候适应性教育中的天气适应性学习
链接:https://arxiv.org/abs/2508.21666

作者:A. Khan, Emmanuel G. Blanchard, Sébastien George
摘要:本文介绍了未来大气条件培训系统(FACTS),这是一个新的平台,通过基于地点的适应性学习经验推进气候适应性教育。FACTS将物联网传感器收集的实时大气数据与知识库中的精选资源相结合,以动态生成本地化的学习挑战。学习者的反应由生成AI驱动的服务器进行分析,该服务器提供个性化的反馈和自适应支持。用户评价的结果表明,与会者认为该系统易于使用,对建立与气候抗御能力有关的知识有效。这些研究结果表明,将物联网和生成人工智能集成到大气适应性学习技术中,对于提高教育参与度和培养气候意识具有重要意义。
摘要:This paper introduces the Future Atmospheric Conditions Training System (FACTS), a novel platform that advances climate resilience education through place-based, adaptive learning experiences. FACTS combines real-time atmospheric data collected by IoT sensors with curated resources from a Knowledge Base to dynamically generate localized learning challenges. Learner responses are analyzed by a Generative AI powered server, which delivers personalized feedback and adaptive support. Results from a user evaluation indicate that participants found the system both easy to use and effective for building knowledge related to climate resilience. These findings suggest that integrating IoT and Generative AI into atmospherically adaptive learning technologies holds significant promise for enhancing educational engagement and fostering climate awareness.


【2】Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts
标题:适应变化:概念漂移下建筑热动力学建模的连续学习和迁移学习的比较
链接:https://arxiv.org/abs/2508.21615

作者:isch, Max Langtry, Felix Koch, Ruchi Choudhary, Christoph Goebel, Benjamin Tischler
备注:Currently under review
摘要:当只有有限数据可用时,迁移学习(TL)是目前对建筑热动力学建模最有效的方法。TL使用一个预训练模型,并针对特定目标建筑进行微调。然而,随着时间推移会收集到更多运行测量数据,初次微调之后应如何继续尚不清楚。当建筑的动态发生变化时,例如在改造或入住率变化之后,这一挑战变得更加复杂。在机器学习文献中,持续学习(CL)方法被用于更新变化系统的模型。TL方法也可以应对这一挑战:在每个更新步骤重用预训练模型,并用新的测量数据对其进行微调。关于如何随时间纳入新的测量数据以提高预测精度,并应对建筑热动力学概念漂移(动态变化)挑战的全面研究仍然缺失。   因此,本研究比较了几种CL和TL策略以及一个从头训练的模型,用于建筑运行期间的热动力学建模。这些方法使用5至7年、代表中欧单户住宅的模拟数据进行评估,其中包括由改造和入住率变化引起概念漂移的场景。我们提出了一种CL策略,即季节性记忆学习(SML),它比现有的CL和TL方法带来更大的精度提升,同时保持较低的计算开销。在没有概念漂移的情况下,SML比初始微调基准高出28.1%;在有概念漂移的情况下高出34.9%。
摘要:Transfer Learning (TL) is currently the most effective approach for modeling building thermal dynamics when only limited data are available. TL uses a pretrained model that is fine-tuned to a specific target building. However, it remains unclear how to proceed after initial fine-tuning, as more operational measurement data are collected over time. This challenge becomes even more complex when the dynamics of the building change, for example, after a retrofit or a change in occupancy. In Machine Learning literature, Continual Learning (CL) methods are used to update models of changing systems. TL approaches can also address this challenge by reusing the pretrained model at each update step and fine-tuning it with new measurement data. A comprehensive study on how to incorporate new measurement data over time to improve prediction accuracy and address the challenges of concept drifts (changes in dynamics) for building thermal dynamics is still missing.   Therefore, this study compares several CL and TL strategies, as well as a model trained from scratch, for thermal dynamics modeling during building operation. The methods are evaluated using 5--7 years of simulated data representative of single-family houses in Central Europe, including scenarios with concept drifts from retrofits and changes in occupancy. We propose a CL strategy (Seasonal Memory Learning) that provides greater accuracy improvements than existing CL and TL methods, while maintaining low computational effort. SML outperformed the benchmark of initial fine-tuning by 28.1\% without concept drifts and 34.9\% with concept drifts.


【3】Adaptive Heavy-Tailed Stochastic Gradient Descent
标题:自适应重尾随机梯度下降
链接:https://arxiv.org/abs/2508.21353

作者:, Gustavo Enrique Batista, Pierre Lafaye de Micheaux
摘要:在大规模神经网络模型时代,由于过度依赖训练损失,优化算法往往难以泛化。机器学习社区广泛接受的一个关键见解是,宽盆地(局部最小值周围、损失逐渐增加的区域)通过对输入数据或模型参数的微小变化提供更大的稳定性来促进更好的泛化。相比之下,尖锐的最小值通常更敏感且不太稳定。受两个关键经验观察的启发,即随机梯度下降中梯度噪声固有的重尾分布,以及神经网络训练中曲率先增长、后趋于平台的稳定边缘(Edge of Stability)现象,我们引入了自适应重尾随机梯度下降(AHTSGD)。该算法在训练的早期阶段向优化器注入更重尾的噪声以增强探索,并随着尖锐度趋于稳定逐渐过渡到更轻尾的噪声。通过在整个训练过程中动态适应损失景观的尖锐度,AHTSGD促进了向宽盆地的加速收敛。AHTSGD是第一个基于稳定边缘现象调整注入优化器的噪声性质的算法。AHTSGD在MNIST和CIFAR-10等基准测试中始终优于SGD和其他基于噪声的方法,在SVHN等带噪数据集上增益显著。它最终加速了初始化不佳时的早期训练,并在干净和带噪设置下提升了泛化能力,同时对学习率选择保持鲁棒。
摘要:In the era of large-scale neural network models, optimization algorithms often struggle with generalization due to an overreliance on training loss. One key insight widely accepted in the machine learning community is the idea that wide basins (regions around a local minimum where the loss increases gradually) promote better generalization by offering greater stability to small changes in input data or model parameters. In contrast, sharp minima are typically more sensitive and less stable. Motivated by two key empirical observations - the inherent heavy-tailed distribution of gradient noise in stochastic gradient descent and the Edge of Stability phenomenon during neural network training, in which curvature grows before settling at a plateau, we introduce Adaptive Heavy Tailed Stochastic Gradient Descent (AHTSGD). The algorithm injects heavier-tailed noise into the optimizer during the early stages of training to enhance exploration and gradually transitions to lighter-tailed noise as sharpness stabilizes. By dynamically adapting to the sharpness of the loss landscape throughout training, AHTSGD promotes accelerated convergence to wide basins. AHTSGD is the first algorithm to adjust the nature of injected noise into an optimizer based on the Edge of Stability phenomenon. AHTSGD consistently outperforms SGD and other noise-based methods on benchmarks like MNIST and CIFAR-10, with marked gains on noisy datasets such as SVHN. It ultimately accelerates early training from poor initializations and improves generalization across clean and noisy settings, remaining robust to learning rate choices.
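
下面的草图仅示意"把注入噪声从重尾退火到轻尾"这一核心想法:采用Student-t噪声,并让自由度随训练线性增大。线性调度与全部超参数均为我们的假设;论文中AHTSGD实际由损失景观的尖锐度自适应驱动:

```python
import torch

def heavy_tail_annealed_step(params, lr, step, total_steps,
                             nu_start=2.0, nu_end=50.0, noise_scale=1e-3):
    """带重尾噪声退火的SGD单步(示意):自由度nu小则重尾,大则近高斯。"""
    nu = nu_start + (nu_end - nu_start) * step / max(total_steps, 1)
    dist = torch.distributions.StudentT(df=nu)
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            noise = dist.sample(p.shape).to(p.device)
            p -= lr * (p.grad + noise_scale * noise)
```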


【4】Guess-and-Learn (G&L): Measuring the Cumulative Error Cost of Cold-Start Adaptation
标题:猜测和学习(G&L):衡量冷启动适应的累积错误成本
链接:https://arxiv.org/abs/2508.21270

作者:nold
备注:15 pages, 7 figures. Main text is 10 pages. Code and data are available at this https URL
摘要:对机器学习模型的评估通常强调最终准确率,而忽略了适应的成本:从头开始学习时产生的累积错误。Guess-and-Learn(G&L)v1.0通过测量冷启动适应性来填补这一空白,即模型在顺序标注一个未标注数据集时所犯错误的总数。在每一步中,学习器选择一个实例,预测其标签,接收真实标签,并在在线(逐样本)或批量(延迟)模式下更新参数。由此产生的错误轨迹揭示了适应速度、选择质量和偏差,这些动态是端点指标所不可见的。   G&L定义了四个赛道(Scratch/Pretrained × Online/Batch),以区分初始化和更新频率的影响。我们将该协议形式化,将其与经典的错误界(mistake-bound)理论联系起来,并为MNIST估计了一个启发式的"预言机参考带"作为合理性参考。在MNIST和AG News上的基线实验涵盖经典方法(Perceptron、k-NN)、卷积架构(CNN、ResNet-50)和预训练Transformer(ViT-B/16、BERT-base),揭示了早期阶段效率的系统性差异:较小的模型可以用更少的初始错误完成适应,而预训练的收益因领域而异。在各种设置下,当前模型仍然远高于预言机参考带,凸显了适应性差距。   通过量化早期学习的错误成本,G&L补充了传统基准,并提供了一个可复现的框架,用于开发不仅在极限意义上准确、而且从最初几个样本起就可靠的学习器。
摘要:Evaluation of machine learning models typically emphasizes final accuracy, overlooking the cost of adaptation: the cumulative errors incurred while learning from scratch. Guess-and-Learn (G&L) v1.0 addresses this gap by measuring cold-start adaptability - the total mistakes a model makes while sequentially labeling an unlabeled dataset. At each step, the learner selects an instance, predicts its label, receives the ground truth, and updates parameters under either online (per-sample) or batch (delayed) mode. The resulting error trajectory exposes adaptation speed, selection quality, and bias - dynamics invisible to endpoint metrics.   G&L defines four tracks (Scratch/Pretrained $\times$ Online/Batch) to disentangle the effects of initialization and update frequency. We formalize the protocol, relate it to classical mistake-bound theory, and estimate a heuristic "oracle reference band" for MNIST as a plausibility reference. Baseline experiments on MNIST and AG News, spanning classical methods (Perceptron, k-NN), convolutional architectures (CNN, ResNet-50), and pretrained transformers (ViT-B/16, BERT-base), reveal systematic differences in early-phase efficiency: smaller models can adapt with fewer initial errors, while pretraining benefits vary by domain. Across settings, current models remain well above the oracle band, highlighting an adaptability gap.   By quantifying the mistake cost of early learning, G&L complements conventional benchmarks and provides a reproducible framework for developing learners that are not only accurate in the limit but also reliable from the first examples.


【5】Automating the Deep Space Network Data Systems; A Case Study in Adaptive Anomaly Detection through Agentic AI
标题:深空网络数据系统自动化;通过代理式人工智能(Agentic AI)进行自适应异常检测的案例研究
链接:https://arxiv.org/abs/2508.21111

作者:hou (1 and 2), Lisa S. Locke (3), Harvey M. Soldan (3) ((1) University of California San Diego, (2) Pasadena City College, (3) Jet Propulsion Laboratory California Institute of Technology)
摘要:深空网络(DSN)是美国宇航局最大的天线设施网络,可生成大量多变量时间序列数据。这些设施包含深空网络天线和发射器,这些天线和发射器在很长一段时间内会发生退化,这可能会导致数据流中断,并威胁到数十个依赖深空网络作为生命线的航天器的地球连接。这项研究的目的是试验不同的方法,这些方法将能够帮助喷气推进实验室的工程师通过收集的数据直接查明异常和设备退化,并继续为未来的宇宙空间任务进行DSN的维护和操作。因此,我们研究了各种机器学习技术,这些技术可以通过预测分析完全重建数据,并通过统计计算和阈值确定实时数据集中的异常数据条目。除了经过充分训练和测试的机器学习模型之外,我们还集成了强化学习子系统的使用,该子系统根据严重程度对已识别的异常进行分类,并使用大型语言模型为每个异常数据条目标记解释,所有这些都可以通过人工反馈/输入随着时间的推移进行改进和微调。具体来说,对于DSN发射器,我们还实现了一个完整的数据管道系统,将数据提取、解析和处理工作流连接在一起,因为之前没有连贯的程序或脚本来执行这些任务。使用这个数据管道系统,我们还能够连接从DSN天线数据训练的模型,完成DSN异常检测的数据工作流程。这一切都由代理人工智能系统包裹并进一步连接,其中利用复杂的推理来确定异常数据的分类和预测。
摘要:The Deep Space Network (DSN) is NASA's largest network of antenna facilities that generate a large volume of multivariate time-series data. These facilities contain DSN antennas and transmitters that undergo degradation over long periods of time, which may cause costly disruptions to the data flow and threaten the earth-connection of dozens of spacecraft that rely on the Deep Space Network for their lifeline. The purpose of this study was to experiment with different methods that would be able to assist JPL engineers with directly pinpointing anomalies and equipment degradation through collected data, and continue conducting maintenance and operations of the DSN for future space missions around our universe. As such, we have researched various machine learning techniques that can fully reconstruct data through predictive analysis, and determine anomalous data entries within real-time datasets through statistical computations and thresholds. On top of the fully trained and tested machine learning models, we have also integrated the use of a reinforcement learning subsystem that classifies identified anomalies based on severity level and a Large Language Model that labels an explanation for each anomalous data entry, all of which can be improved and fine-tuned over time through human feedback/input. Specifically, for the DSN transmitters, we have also implemented a full data pipeline system that connects the data extraction, parsing, and processing workflow all together as there was no coherent program or script for performing these tasks before. Using this data pipeline system, we were able to then also connect the models trained from DSN antenna data, completing the data workflow for DSN anomaly detection. This was all wrapped around and further connected by an agentic AI system, where complex reasoning was utilized to determine the classifications and predictions of anomalous data.


【6】Adaptive generative moment matching networks for improved learning of dependence structures
标题:用于改进依赖结构学习的自适应生成矩匹配网络
链接:https://arxiv.org/abs/2508.21531

作者:fert, Gan Yao
摘要:本文介绍了一种用于拟合生成矩匹配网络(GMMN)的最大均值差异(MMD)混合核的自适应带宽选择方法,并证明该方法能够提升Copula随机数生成器的学习效果。基于训练损失的相对误差,在训练期间增加核的数量;此外,验证损失的相对误差被用作早停准则。虽然这种自适应训练的GMMN(AGMMN)的训练时间与GMMN相近,但训练性能显著提升,这一点基于验证MMD轨迹、样本以及验证MMD值得到评估和展示。AGMMN相对于GMMN以及典型参数Copula模型的优势,通过三个应用得到了证明。首先,研究了来自高维Copula的拟随机样本与伪随机样本的收敛速度,针对三个感兴趣的泛函,并首次在高达100的维度上进行。其次,利用重复的验证MMD,以及以一篮子看涨期权的期望收益和风险度量期望损失(expected shortfall)为泛函的蒙特卡洛与准蒙特卡洛应用,证明了AGMMN相对于GMMN的训练改进;所用Copula模型拟合的是标准普尔500指数50个成分股经deGARCH处理后的标准化残差。最后,利用后一数据集以及FTSE 100的50个成分股表明,AGMMN相对于GMMN的训练改进以及与经典参数Copula模型拟合的比较,确实也转化为更好的模型预测。
摘要:An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While training time of such adaptively trained GMMNs (AGMMNs) is similar to that of GMMNs, training performance is increased significantly in comparison to GMMNs, which is assessed and shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs, as well as typical parametric copula models, is demonstrated in terms of three applications. First, convergence rates of quasi-random versus pseudo-random samples from high-dimensional copulas are investigated for three functionals of interest and in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications based on the expected payoff of a basket call option and the risk measure expected shortfall as functionals are used to demonstrate the improved training of AGMMNs over GMMNs for a copula model fitted to the standardized residuals of the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs over GMMNs and in comparison to the fitting of classical parametric copula models indeed also translates to an improved model prediction.
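
下面用numpy给出混合核MMD统计量,以及"按训练损失的相对误差增加核数"这一机制的极简示意(非论文实现;带宽列表、停滞判据阈值与数据均为假设):

```python
# 最小示意:混合高斯核的 MMD^2,损失相对误差停滞时加入新带宽(非论文实现)。
import numpy as np

def mmd2(X, Y, bandwidths):
    def k(A, B):                               # 多带宽高斯核的等权混合
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sum(np.exp(-d2 / (2 * h ** 2)) for h in bandwidths) / len(bandwidths)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))              # 生成器样本(占位)
Y = rng.standard_normal((200, 2)) * 1.5        # 目标样本(copula伪观测的替代)

bandwidths, prev = [0.5, 1.0], None
for step in range(5):                          # 示意性的训练循环
    loss = mmd2(X, Y, bandwidths)
    if prev is not None and abs(prev - loss) / abs(prev) < 1e-3:
        bandwidths.append(2 * max(bandwidths)) # 相对误差停滞 -> 增加一个核
    prev = loss
print("MMD^2 ≈", loss, " 核数:", len(bandwidths))
```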


【7】Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?
标题:逐层SSL特征能否提高儿童语音的Zero-Shot ASR性能?
链接:https://arxiv.org/abs/2508.21225

作者:inha, Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Shrikanth Narayanan
备注:Accepted
摘要:由于儿童语音具有独特且高度可变的声学和语言特征,自动语音识别(ASR)系统往往难以准确处理。虽然自监督学习(SSL)模型的最新进展极大地改进了成人语音的转录,但准确转录儿童语音仍是重大挑战。本研究考察了从最先进SSL预训练模型(Wav2Vec2、HuBERT、Data2Vec和WavLM)中提取的逐层特征在zero-shot场景下提升儿童语音ASR性能的有效性。我们对从这些模型中提取的特征进行了详细分析,并使用Kaldi工具包将其集成到简化的基于DNN的ASR系统中。分析确定了在zero-shot场景(用WSJCAM0成人语音训练、用PFSTAR儿童语音测试)下最能提升儿童语音ASR性能的层。实验结果表明,Wav2Vec2模型的第22层取得了5.15%的最低词错误率(WER),相比使用Wav2Vec2直接zero-shot解码(WER为10.65%)相对提升51.64%。此外,按年龄组的分析表明性能随年龄增长而持续改善,即使在较小年龄组中使用SSL特征也能观察到显著收益。在CMU Kids数据集上的进一步实验证实了类似趋势,凸显了所提方法的可推广性。
摘要:Automatic Speech Recognition (ASR) systems often struggle to accurately process children's speech due to its distinct and highly variable acoustic and linguistic characteristics. While recent advancements in self-supervised learning (SSL) models have greatly enhanced the transcription of adult speech, accurately transcribing children's speech remains a significant challenge. This study investigates the effectiveness of layer-wise features extracted from state-of-the-art SSL pre-trained models - specifically, Wav2Vec2, HuBERT, Data2Vec, and WavLM in improving the performance of ASR for children's speech in zero-shot scenarios. A detailed analysis of features extracted from these models was conducted, integrating them into a simplified DNN-based ASR system using the Kaldi toolkit. The analysis identified the most effective layers for enhancing ASR performance on children's speech in a zero-shot scenario, where WSJCAM0 adult speech was used for training and PFSTAR children's speech for testing. Experimental results indicated that Layer 22 of the Wav2Vec2 model achieved the lowest Word Error Rate (WER) of 5.15%, representing a 51.64% relative improvement over the direct zero-shot decoding using Wav2Vec2 (WER of 10.65%). Additionally, age group-wise analysis demonstrated consistent performance improvements with increasing age, along with significant gains observed even in younger age groups using the SSL features. Further experiments on the CMU Kids dataset confirmed similar trends, highlighting the generalizability of the proposed approach.
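
以下是用HuggingFace transformers提取Wav2Vec2逐层隐状态并选取第22层的最小示意(需安装torch与transformers;检查点名称为常见公开模型的假设,音频为随机占位,仅演示接口而非论文系统):

```python
# 最小示意:提取 Wav2Vec2 各层隐状态,取第22层作为下游声学特征。
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-large-960h"          # 24层的large模型才有第22层
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name, output_hidden_states=True).eval()

audio = torch.randn(16000)                     # 1秒16kHz音频占位,实际应为儿童语音
inputs = extractor(audio.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
layer22 = out.hidden_states[22]                # hidden_states[0] 为卷积特征嵌入
print(layer22.shape)                           # (batch, frames, hidden_dim)
```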


强化学习(4篇)

【1】DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers
标题:DynaMark:用于工业机床控制器动态水印的强化学习框架
链接:https://arxiv.org/abs/2508.21797

作者:abi, Abhishek Hanchate, Satish Bukkapatnam, Dan Li
摘要:工业4.0中高度联网的机床控制器(MTC)是回放攻击的主要目标,此类攻击利用过时的传感器数据操纵执行器。动态水印可以揭示这种篡改,但现有方案假设线性高斯动态并使用恒定的水印统计量,使其难以应对MTC时变且部分专有的行为。我们提出DynaMark来弥补这一差距:它是一种强化学习框架,将动态水印建模为马尔可夫决策过程(MDP),在线学习一种自适应策略,利用可用测量和检测器反馈动态调整零均值高斯水印的协方差,而无需系统知识。DynaMark最大化一个独特的奖励函数,动态平衡控制性能、能耗和检测置信度。我们为线性系统开发了一种用于实时检测置信度的贝叶斯信念更新机制;该方法不依赖特定系统假设,为线性动态系统的MDP提供支撑。在西门子Sinumerik 828D控制器数字孪生上,与恒定方差基线相比,DynaMark在保持标称轨迹的同时将水印能量降低了70%,并将平均检测延迟维持在一个采样间隔。一个物理步进电机试验台验证了这些发现:报警触发迅速、控制性能下降更少,并超越现有基准。
摘要:Industry 4.0's highly networked Machine Tool Controllers (MTCs) are prime targets for replay attacks that use outdated sensor data to manipulate actuators. Dynamic watermarking can reveal such tampering, but current schemes assume linear-Gaussian dynamics and use constant watermark statistics, making them vulnerable to the time-varying, partly proprietary behavior of MTCs. We close this gap with DynaMark, a reinforcement learning framework that models dynamic watermarking as a Markov decision process (MDP). It learns an adaptive policy online that dynamically adapts the covariance of a zero-mean Gaussian watermark using available measurements and detector feedback, without needing system knowledge. DynaMark maximizes a unique reward function balancing control performance, energy consumption, and detection confidence dynamically. We develop a Bayesian belief updating mechanism for real-time detection confidence in linear systems. This approach, independent of specific system assumptions, underpins the MDP for systems with linear dynamics. On a Siemens Sinumerik 828D controller digital twin, DynaMark achieves a reduction in watermark energy by 70% while preserving the nominal trajectory, compared to constant variance baselines. It also maintains an average detection delay equivalent to one sampling interval. A physical stepper-motor testbed validates these findings, rapidly triggering alarms with less control performance decline and exceeding existing benchmarks.
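
下面的极简示意(非DynaMark实现)演示动态水印的基本机制:向控制输入叠加零均值高斯水印,并用输出与水印的相关性检验回放攻击。被控对象、控制律与水印标准差sigma均为假设,sigma正是DynaMark中由策略自适应调节的量。

```python
# 最小示意:高斯水印注入 + 基于相关性的回放攻击检测(非DynaMark实现)。
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 500, 0.1
a, b = 0.9, 1.0                                # 假设的一阶线性对象 x_{t+1} = a x_t + b u_t
watermark = rng.normal(0.0, sigma, T)          # 零均值高斯水印,sigma 为可调协方差
x, ys = 0.0, []
for t in range(T):
    u = -0.5 * x + watermark[t]                # 名义控制 + 水印
    x = a * x + b * u + 0.05 * rng.standard_normal()
    ys.append(x)
ys = np.asarray(ys)

replayed = np.roll(ys, 100)                    # 回放攻击:传感器返回过期数据
def corr(sig):
    return np.corrcoef(watermark, sig)[0, 1]   # 真实输出应与水印显著相关
print("正常相关性:", round(corr(ys), 3), " 回放相关性:", round(corr(replayed), 3))
```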


【2】Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning
标题:超越期望值:强化学习中长期策略性能的几何平均优化
链接:https://arxiv.org/abs/2508.21443

作者:ng, Dominik Baumann
备注:Accepted final version to appear in the Proceedings of the IEEE Conference on Decision and Control
摘要:强化学习(RL)算法通常优化期望累积奖励,即智能体在一条轨迹中所获标量奖励之和的期望值。期望值是对无限多条轨迹性能的平均。然而,在现实世界中部署智能体时,这种集合平均可能无法反映个体轨迹的性能。因此,在许多应用中,优化个体轨迹的长期性能可能更为可取。在这项工作中,我们提出了一种新的RL算法,将标准的集合平均与时间平均增长率(衡量个体轨迹长期性能的指标)相结合。我们首先定义时间平均增长率的Bellman算子,随后证明在乘性奖励动态下,几何平均与时间平均增长率一致。为应对更一般且未知的奖励动态,我们提出一种带$N$-滑动窗口的修正几何平均,捕获路径依赖性,作为时间平均增长率的估计量。该估计量作为正则项嵌入目标函数,构成一个实用的算法,使策略能同时受益于集合平均和时间平均。我们在具有挑战性的模拟中评估了该算法,其性能优于传统RL方法。
摘要:Reinforcement learning (RL) algorithms typically optimize the expected cumulative reward, i.e., the expected value of the sum of scalar rewards an agent receives over the course of a trajectory. The expected value averages the performance over an infinite number of trajectories. However, when deploying the agent in the real world, this ensemble average may be uninformative for the performance of individual trajectories. Thus, in many applications, optimizing the long-term performance of individual trajectories might be more desirable. In this work, we propose a novel RL algorithm that combines the standard ensemble average with the time-average growth rate, a measure for the long-term performance of individual trajectories. We first define the Bellman operator for the time-average growth rate. We then show that, under multiplicative reward dynamics, the geometric mean aligns with the time-average growth rate. To address more general and unknown reward dynamics, we propose a modified geometric mean with $N$-sliding window that captures the path-dependency as an estimator for the time-average growth rate. This estimator is embedded as a regularizer into the objective, forming a practical algorithm and enabling the policy to benefit from ensemble average and time-average simultaneously. We evaluate our algorithm in challenging simulations, where it outperforms conventional RL methods.
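
下面的numpy小实验说明摘要的核心直觉:在乘性奖励下,期望值(集合平均)与个体轨迹的时间平均增长率可能背离。所用数值(每步等概率 ×1.5 或 ×0.6)是经典的示例假设,并非论文实验设置。

```python
# 最小示意:集合平均 vs. 时间平均增长率(几何平均)。
# 每步等概率 x1.5 或 x0.6:单步期望为 1.05(看似增长),
# 但几何平均为 sqrt(1.5*0.6)=sqrt(0.9)<1,即典型个体轨迹长期衰减。
import numpy as np

rng = np.random.default_rng(0)
steps, trajs = 100, 20000
factors = rng.choice([1.5, 0.6], size=(trajs, steps))
wealth = factors.prod(axis=1)                  # 每条轨迹的乘性累积

print("集合平均(期望值视角):", wealth.mean())           # 理论上 ≈ 1.05^100,被少数轨迹拉高
print("中位数(个体轨迹视角):", np.median(wealth))        # 远小于 1
print("每步时间平均增长率:", np.exp(np.log(factors).mean()))  # ≈ 0.95 < 1
```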


【3】Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI
标题:超越预测:强化学习作为医疗保健人工智能的决定性飞跃
链接:https://arxiv.org/abs/2508.21101

作者:rera, Gousia Habib, Qianyi Xu, Daniel J. Tan, Kai He, Erik Cambria, Mengling Feng
备注:40 pages in total (including appendix)
摘要:强化学习(RL)标志着人工智能在医疗保健中应用方式的根本转变。RL不是仅仅预测结果,而是主动决定带有长期目标的干预措施。与基于固定关联的传统模型不同,强化学习系统通过试错、反馈和长期奖励优化来学习,既带来变革性的可能,也引入新的风险。从信息融合的视角看,医疗RL通常利用时间层面和决策层面的机制整合多源信号,如生命体征、实验室检查、临床记录、影像和设备遥测。这些系统可以在集中式、联邦式或边缘架构中运行以满足实时临床约束,并自然地覆盖数据、特征与决策融合层级。本综述将RL在医疗保健领域的兴起视为不止一套工具,而是临床环境中向能动性(agentive)智能的转变。我们首先从医疗约束的视角梳理RL技术版图,包括基于模型与无模型方法、离线与批约束方法,以及奖励规范与不确定性校准的新兴策略。随后,我们全面分析了涵盖重症监护、慢性病、心理健康、诊断和机器人辅助的RL应用,识别其趋势、差距与转化瓶颈。与以往综述不同,我们批判性地分析了RL的伦理、部署与奖励设计挑战,并总结了安全、与人对齐的策略学习经验。本文既是一份技术路线图,也是对RL在医疗AI中新兴变革性作用的批判性反思:RL不是预测机器,而是能动的临床智能。
摘要:Reinforcement learning (RL) marks a fundamental shift in how artificial intelligence is applied in healthcare. Instead of merely predicting outcomes, RL actively decides interventions with long term goals. Unlike traditional models that operate on fixed associations, RL systems learn through trial, feedback, and long-term reward optimization, introducing transformative possibilities and new risks. From an information fusion lens, healthcare RL typically integrates multi-source signals such as vitals, labs, clinical notes, imaging and device telemetry using temporal and decision-level mechanisms. These systems can operate within centralized, federated, or edge architectures to meet real-time clinical constraints, and naturally span data, features and decision fusion levels. This survey explores RL's rise in healthcare as more than a set of tools: rather, a shift toward agentive intelligence in clinical environments. We first structure the landscape of RL techniques including model-based and model-free methods, offline and batch-constrained approaches, and emerging strategies for reward specification and uncertainty calibration through the lens of healthcare constraints. We then comprehensively analyze RL applications spanning critical care, chronic disease, mental health, diagnostics, and robotic assistance, identifying their trends, gaps, and translational bottlenecks. In contrast to prior reviews, we critically analyze RL's ethical, deployment, and reward design challenges, and synthesize lessons for safe, human-aligned policy learning. This paper serves as both a technical roadmap and a critical reflection of RL's emerging transformative role in healthcare AI: not as prediction machinery, but as agentive clinical intelligence.


【4】Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning
标题:边缘的机器智能:使用强化学习的可解释心脏模式定位
链接:https://arxiv.org/abs/2508.21652

作者:an, Qiyu Rao, Nina Moutonnet, Pietro Ferraro, Danilo Mandic
摘要:匹配滤波器由于其高效率和可解释性而被广泛用于定位信号模式。然而,对于低信噪比(SNR)信号,例如在边缘设备上记录的信号,其有效性会恶化,其中突出的噪声模式可以在滤波器的有限长度内与目标非常相似。一个示例是耳心电图(ear-ECG),其中心脏信号被衰减并且被伪影严重破坏。为了解决这个问题,我们提出了序列匹配滤波器(SMF),这是一种用强化学习代理设计的滤波器序列取代传统单个匹配滤波器的范例。通过将滤波器设计制定为顺序决策过程,SMF自适应地设计信号特定的滤波器序列,通过揭示驱动决策的关键模式来保持完全可解释性。拟议的SMF框架具有可靠且可解释的临床决策支持的强大潜力,正如其在两个具有挑战性的真实世界心电图数据集上的最先进R峰检测和生理状态分类性能所证明的那样。所提出的公式还可以扩展到需要从噪声损坏的信号中准确定位模式的广泛应用。
摘要:Matched filters are widely used to localise signal patterns due to their high efficiency and interpretability. However, their effectiveness deteriorates for low signal-to-noise ratio (SNR) signals, such as those recorded on edge devices, where prominent noise patterns can closely resemble the target within the limited length of the filter. One example is the ear-electrocardiogram (ear-ECG), where the cardiac signal is attenuated and heavily corrupted by artefacts. To address this, we propose the Sequential Matched Filter (SMF), a paradigm that replaces the conventional single matched filter with a sequence of filters designed by a Reinforcement Learning agent. By formulating filter design as a sequential decision-making process, SMF adaptively designs signal-specific filter sequences that remain fully interpretable by revealing key patterns driving the decision-making. The proposed SMF framework has strong potential for reliable and interpretable clinical decision support, as demonstrated by its state-of-the-art R-peak detection and physiological state classification performance on two challenging real-world ECG datasets. The proposed formulation can also be extended to a broad range of applications that require accurate pattern localisation from noise-corrupted signals.
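
作为对照,下面给出传统单个匹配滤波器的极简示意(非SMF实现);模板形状、噪声水平与阈值均为假设。SMF所做的是把这种单一滤波器推广为由强化学习代理逐步设计的滤波器序列。

```python
# 最小示意:单个匹配滤波器定位R峰样模式(非SMF实现)。
import numpy as np

rng = np.random.default_rng(0)
template = np.exp(-np.linspace(-3, 3, 25) ** 2)      # 假设的R峰样模板
signal = 0.3 * rng.standard_normal(1000)             # 低SNR背景噪声
for pos in (200, 450, 700):                           # 植入三个目标模式
    signal[pos:pos + 25] += template

# 互相关打分:模板去均值后与信号做相关
score = np.correlate(signal, template - template.mean(), mode="same")
peaks = np.where(score > 0.7 * score.max())[0]        # 阈值为示意性选择
print("检测位置(样本下标):", peaks)
```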


医学相关(5篇)

【1】Inferring Effects of Major Events through Discontinuity Forecasting of Population Anxiety
标题:通过人口焦虑的间断性预测推断重大事件的影响
链接:https://arxiv.org/abs/2508.21722

作者: Mangalik, Ojas Deshpande, Adithya V. Ganesan, Sean A. P. Clouston, H. Andrew Schwartz
摘要:评估本地事件对特定社区心理健康的影响对公共卫生政策至关重要。虽然仅预测心理健康得分对事件如何影响社区福祉的洞察有限,但计量经济学中纵向回归间断设计(LRDD)等准实验设计可帮助研究人员从观测数据中得出更可能具有因果性的效应。LRDD旨在外推由特定时间事件导致的结果变化的幅度(例如,焦虑连续追踪评分中的间断)。在此,我们提出将LRDD从传统预测扩展为一个统计学习框架:给定某地点的得分历史、动态协变量(其他连续评估)和外生变量(静态表示),估计未来的间断(即特定时间的跳变)和斜率变化(即线性轨迹)。将该框架应用于预测COVID-19事件引发的美国各县焦虑间断时,我们发现这项任务很困难,但随着模型复杂度提高而更易实现,最佳结果来自外生变量与动态协变量的整合。我们的方法相比传统的静态社区表示有显著提升(间断$r=+.46$,斜率$r=+.65$)。间断预测为估计潜在的未来事件或假设事件对特定社区的特质性影响开辟了新的可能性。
摘要:Estimating community-specific mental health effects of local events is vital for public health policy. While forecasting mental health scores alone offers limited insights into the impact of events on community well-being, quasi-experimental designs like the Longitudinal Regression Discontinuity Design (LRDD) from econometrics help researchers derive effects that are more likely to be causal from observational data. LRDDs aim to extrapolate the size of changes in an outcome (e.g. a discontinuity in running scores for anxiety) due to a time-specific event. Here, we propose adapting LRDDs beyond traditional forecasting into a statistical learning framework whereby future discontinuities (i.e. time-specific shifts) and changes in slope (i.e. linear trajectories) are estimated given a location's history of the score, dynamic covariates (other running assessments), and exogenous variables (static representations). Applying our framework to predict discontinuities in the anxiety of US counties from COVID-19 events, we found the task was difficult but more achievable as the sophistication of models was increased, with the best results coming from integrating exogenous and dynamic covariates. Our approach shows strong improvement ($r=+.46$ for discontinuity and $r = +.65$ for slope) over traditional static community representations. Discontinuity forecasting raises new possibilities for estimating the idiosyncratic effects of potential future or hypothetical events on specific communities.
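
下面给出"截距跳变 + 斜率变化"分段线性模型的最小示意(非论文实现):在事件时刻t0拟合四个系数,其中跳变系数即间断的估计量;数据为合成占位。

```python
# 最小示意:在 t0 处估计间断(截距跳变)与斜率变化的回归间断拟合。
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)
t0 = 60                                          # 事件发生时刻
y = (0.02 * t + 1.5 * (t >= t0)                  # 真实间断 = 1.5
     + 0.05 * (t - t0) * (t >= t0)               # 真实斜率变化 = 0.05
     + 0.2 * rng.standard_normal(100))

# 设计矩阵: [截距, 时间趋势, 事件后指示, 事件后经过时间]
X = np.column_stack([np.ones_like(t), t,
                     (t >= t0).astype(float),
                     np.where(t >= t0, t - t0, 0)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("间断(截距跳变)≈", round(beta[2], 3), " 斜率变化 ≈", round(beta[3], 3))
```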


【2】Comprehensive Signal Quality Evaluation of a Wearable Textile ECG Garment: A Sex-Balanced Study
标题:可穿戴纺织心电图服装的综合信号质量评估:性别平衡研究
链接:https://arxiv.org/abs/2508.21554

作者:n P. Oppelt, Tobias S. Zech, Sarah H. Lorenz, Laurenz Ottmann, Jan Steffan, Bjoern M. Eskofier, Nadine R. Lang-Richter, Norman Pfeiffer
摘要:我们介绍了一种新颖的可穿戴纺织服装,其创新的电极布局旨在最大限度地减少噪声和运动伪影,从而提高心电图(ECG)记录的信号保真度。我们开展了一项全面的、性别均衡的评估,涉及15名健康男性和15名健康女性参与者,以确保该设备在解剖和生理差异下的适用性。评估框架涵盖多种评估方法:定量信号质量指数,以客观地对设备性能进行基准评测;心率和心率变异性等生理参数的节律分析;用于评估应用相关预测效用的机器学习分类任务;ECG特征的形态学分析,包括幅度和间期参数;以及织物/体型所决定的电极投影角影响的研究。所有分析均按性别分层,以阐明性别特异性影响。评估在代表真实世界条件的各个活动阶段进行。结果表明,该纺织系统在节律和形态学分析中都取得了与参考设备高度一致的信号质量,表现出稳健的分类性能,并能识别影响信号采集的关键性别特异性决定因素。这些发现突显了基于纺织品的ECG服装在生理监测以及心理生理状态检测方面的实际可行性。此外,我们指出了在可穿戴健康技术中纳入性别特异性设计考量、以确保公平可靠的心脏诊断的重要性。
摘要:We introduce a novel wearable textile-garment featuring an innovative electrode placement aimed at minimizing noise and motion artifacts, thereby enhancing signal fidelity in Electrocardiography (ECG) recordings. We present a comprehensive, sex-balanced evaluation involving 15 healthy males and 15 healthy female participants to ensure the device's suitability across anatomical and physiological variations. The assessment framework encompasses distinct evaluation approaches: quantitative signal quality indices to objectively benchmark device performance; rhythm-based analyses of physiological parameters such as heart rate and heart rate variability; machine learning classification tasks to assess application-relevant predictive utility; morphological analysis of ECG features including amplitude and interval parameters; and investigations of the effects of electrode projection angle given by the textile / body shape, with all analyses stratified by sex to elucidate sex-specific influences. Evaluations were conducted across various activity phases representing real-world conditions. The results demonstrate that the textile system achieves signal quality highly concordant with reference devices in both rhythm and morphological analyses, exhibits robust classification performance, and enables identification of key sex-specific determinants affecting signal acquisition. These findings underscore the practical viability of textile-based ECG garments for physiological monitoring as well as psychophysiological state detection. Moreover, we identify the importance of incorporating sex-specific design considerations to ensure equitable and reliable cardiac diagnostics in wearable health technologies.


【3】Multi-Ontology Integration with Dual-Axis Propagation for Medical Concept Representation
标题:用于医学概念表示的多本体集成与双轴传播
链接:https://arxiv.org/abs/2508.21320

作者:yebi Kerdabadi, Arya Hadizadeh Moghaddam, Dongjie Wang, Zijun Yao
备注:This work has been accepted as a full research paper at CIKM 2025
摘要:医学本体图通过结构化关系将外部知识映射到电子健康记录中的医学代码。通过利用域批准的连接(例如,父-子),预测模型可以通过合并来自相关概念的上下文信息来生成更丰富的医学概念表示。然而,现有的文献主要集中于合并来自单个本体系统或来自多个本体系统(例如,疾病,药物和程序),而不是将它们整合到一个统一的学习结构中。因此,概念表示学习往往仍然局限于本体内的关系,忽略了跨本体的连接。在本文中,我们提出了LINKO,一个大的语言模型(LLM)增强的综合本体学习框架,利用多个本体图,同时使双轴知识传播内和跨异构本体系统,以提高医学概念表示学习。具体来说,LINKO首先采用LLM提供一个图形检索增强的初始化本体概念嵌入,通过工程提示,包括概念描述,并进一步增强本体上下文。其次,我们的方法通过在两个轴上执行知识传播来联合学习不同本体图中的医学概念:(1)跨层次本体级别的本体内垂直传播和(2)并行的每个级别内的本体间水平传播。最后,通过在两个公共数据集上的大量实验,我们验证了LINKO在最先进的基线上的优越性能。作为与现有EHR预测模型兼容的插件编码器,LINKO进一步证明了在涉及有限数据可用性和罕见疾病预测的场景中增强的鲁棒性。
摘要:Medical ontology graphs map external knowledge to medical codes in electronic health records via structured relationships. By leveraging domain-approved connections (e.g., parent-child), predictive models can generate richer medical concept representations by incorporating contextual information from related concepts. However, existing literature primarily focuses on incorporating domain knowledge from a single ontology system, or from multiple ontology systems (e.g., diseases, drugs, and procedures) in isolation, without integrating them into a unified learning structure. Consequently, concept representation learning often remains limited to intra-ontology relationships, overlooking cross-ontology connections. In this paper, we propose LINKO, a large language model (LLM)-augmented integrative ontology learning framework that leverages multiple ontology graphs simultaneously by enabling dual-axis knowledge propagation both within and across heterogeneous ontology systems to enhance medical concept representation learning. Specifically, LINKO first employs LLMs to provide a graph-retrieval-augmented initialization for ontology concept embedding, through an engineered prompt that includes concept descriptions, and is further augmented with ontology context. Second, our method jointly learns the medical concepts in diverse ontology graphs by performing knowledge propagation in two axes: (1) intra-ontology vertical propagation across hierarchical ontology levels and (2) inter-ontology horizontal propagation within every level in parallel. Last, through extensive experiments on two public datasets, we validate the superior performance of LINKO over state-of-the-art baselines. As a plug-in encoder compatible with existing EHR predictive models, LINKO further demonstrates enhanced robustness in scenarios involving limited data availability and rare disease prediction.


【4】Advanced Deep Learning Techniques for Classifying Dental Conditions Using Panoramic X-Ray Images
标题:使用全景X射线图像对牙齿状况进行分类的高级深度学习技术
链接:https://arxiv.org/abs/2508.21088

作者:olkarieh, Kiana Kiashemshaki, Sajjad Rezvani Boroujeni
备注:14 pages, 8 figures, 8 tables
摘要:本研究探讨了用于全景X射线图像中牙齿状况自动分类的深度学习方法。使用了包含1,512张X光照片的数据集,其中包含11,137条经过专家验证的注释,涵盖四种情况:填充物、蛀牙、种植体和阻生牙。在预处理和类平衡之后,评估了三种方法:自定义卷积神经网络(CNN),将CNN特征提取与传统分类器相结合的混合模型,以及微调的预训练架构。实验采用5折交叉验证,准确率,精度,召回率和F1分数作为评估指标。混合CNN随机森林模型以85.4%的准确率实现了最高性能,超过了自定义CNN基线的74.3%。在预训练的模型中,VGG16表现最好,准确率为82.3%,其次是Xception和ResNet50。结果表明,混合模型改善了形态相似条件的区分,并提供了有效,可靠的性能。这些发现表明,将基于CNN的特征提取与集成分类器相结合,为自动化牙科诊断支持提供了一条实用的途径,同时也强调了对更大数据集和进一步临床验证的需求。
摘要:This study investigates deep learning methods for automated classification of dental conditions in panoramic X-ray images. A dataset of 1,512 radiographs with 11,137 expert-verified annotations across four conditions (fillings, cavities, implants, and impacted teeth) was used. After preprocessing and class balancing, three approaches were evaluated: a custom convolutional neural network (CNN), hybrid models combining CNN feature extraction with traditional classifiers, and fine-tuned pre-trained architectures. Experiments employed 5-fold cross-validation with accuracy, precision, recall, and F1 score as evaluation metrics. The hybrid CNN-Random Forest model achieved the highest performance with 85.4% accuracy, surpassing the custom CNN baseline of 74.3%. Among pre-trained models, VGG16 performed best at 82.3% accuracy, followed by Xception and ResNet50. Results show that hybrid models improve discrimination of morphologically similar conditions and provide efficient, reliable performance. These findings suggest that combining CNN-based feature extraction with ensemble classifiers offers a practical path toward automated dental diagnostic support, while also highlighting the need for larger datasets and further clinical validation.
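
下面是"CNN特征提取 + 随机森林"混合流程的骨架示意(非论文实现):CNN未经训练、图像与标签均为随机占位,仅演示两阶段的衔接方式;实际应先在全景X光数据上训练CNN,再以其中间特征喂给随机森林。

```python
# 最小示意:CNN 提特征 -> 随机森林分类 的混合流程骨架(非论文实现)。
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

cnn = nn.Sequential(                             # 简化的卷积特征提取器(未训练)
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),
)

images = torch.randn(64, 1, 64, 64)              # 占位的全景X光图像
labels = np.random.randint(0, 4, 64)             # 四类:填充物/龋齿/种植体/阻生牙
with torch.no_grad():
    feats = cnn(images).numpy()                  # (64, 16*4*4) 特征向量

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(feats, labels)
print("训练集准确率(仅流程演示):", clf.score(feats, labels))
```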


【5】Data-driven Discovery of Digital Twins in Biomedical Research
标题:数据驱动的生物医学研究中数字双胞胎的发现
链接:https://arxiv.org/abs/2508.21484

作者:Métayer, Annabelle Ballesta, Julien Martinelli
摘要:近期的技术进步扩大了高通量生物数据集的可获得性,使得可靠地设计生物医学系统或患者的数字孪生成为可能。这类计算工具刻画了驱动扰动或药物响应的关键反应网络,可指导药物发现和个性化治疗。然而,其开发仍依赖人类建模者费力的数据整合,因此亟需自动化方法。物理学中数据驱动系统发现的成功植根于干净的数据集和定义良好的支配定律,这激发了在生物学中应用类似技术的兴趣,而生物学带来了独特的挑战。在此,我们综述了从生物时间序列自动推断数字孪生的方法,这些方法大多涉及符号回归或稀疏回归。我们根据八个生物学与方法学挑战评估算法,这些挑战涉及噪声/不完整数据、多种实验条件、先验知识整合、潜变量、高维性、未观测的变量导数、候选库设计以及不确定性量化。按这些标准,稀疏回归总体上优于符号回归,尤其是在使用贝叶斯框架时。我们进一步强调深度学习和大型语言模型的新兴作用,它们使创新的先验知识整合成为可能,尽管此类方法的可靠性和一致性仍有待提高。虽然没有单一方法能应对所有挑战,但我们认为,学习数字孪生的进展将来自混合与模块化框架,其结合基于化学反应网络的机理基础、贝叶斯不确定性量化,以及深度学习的生成与知识整合能力。为支持其发展,我们还提出了一个基准测试框架,以在所有挑战上评估各类方法。
摘要:Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so that automated approaches are critically needed. The success of data-driven system discovery in Physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in Biology, which presents unique challenges. Here, we reviewed methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, associated with noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. Upon these criteria, sparse regression generally outperformed symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
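
下面以SINDy风格的稀疏回归为例(综述中表现较好的一类方法),演示如何从时间序列恢复支配方程;所用系统(逻辑斯蒂增长)、候选库与正则化强度均为示例假设。

```python
# 最小示意:稀疏回归从时间序列恢复动力学方程(SINDy 思路)。
import numpy as np
from sklearn.linear_model import Lasso

t = np.linspace(0, 10, 500)
dt = t[1] - t[0]
x = 10 / (1 + 9 * np.exp(-t))                    # 逻辑斯蒂增长: dx/dt = x - 0.1 x^2
dxdt = np.gradient(x, dt)                        # 数值导数(真实数据中噪声是主要挑战)

library = np.column_stack([np.ones_like(x), x, x**2, x**3])   # 候选项库 [1, x, x^2, x^3]
coef = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50000).fit(library, dxdt).coef_
for name, c in zip(["1", "x", "x^2", "x^3"], coef):
    print(f"{name}: {c:+.3f}")                   # 期望稀疏地恢复 +1·x - 0.1·x^2
```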


蒸馏|知识提取(1篇)

【1】Normalisation of SWIFT Message Counterparties with Feature Extraction and Clustering
标题:利用特征提取与聚类对SWIFT报文交易对手进行规范化
链接:https://arxiv.org/abs/2508.21081

作者:Schoinas, Benjamin Guinard, Diba Esbati, Richard Chalk
摘要:短文本聚类是文本分析社区中的一个已知用例。当文本的结构和内容属于自然语言范畴(例如Twitter帖子或即时消息)时,可以使用自然语言技术,前提是文本足够长,允许使用(预)训练模型提取有意义的信息,例如词性或主题标注。然而,自然语言模型并不适合对银行支付报文系统(如SWIFT)中出现的交易对手进行聚类:这些手工输入的标签通常是自然人或法人实体的细节,缺乏句子结构,同时包含人工录入带来的各种变体和噪声。这使得调查人员或反欺诈专业人员在试图加深对支付流发起方与受益方实体的了解、追踪资金和资产时,工具集中存在空白;供应商传统上试图用模糊匹配工具来弥补这一空白。考虑到这些因素,我们提出一种融合字符串相似度、主题建模、层次聚类与规则的混合管道,以支持交易对手聚类,并适应预期簇数未知的情形。我们还基于广为人知的精确率与召回率设计了指标,以补充对该方法的评估。在真实标注数据集上的测试表明,相比基于规则("关键词")的基线方法,性能显著提升。该方法保留了基于规则系统的大部分可解释性,因为前者是在后者之上增加了一层额外的聚类细化。由此产生的工作流减少了人工审查的需要;当只需调查总体的一个子集时(例如制裁调查),该方法能更好地控制遗漏实体变体的风险。
摘要:Short text clustering is a known use case in the text analytics community. When the structure and content falls in the natural language domain e.g. Twitter posts or instant messages, then natural language techniques can be used, provided texts are of sufficient length to allow for use of (pre)trained models to extract meaningful information, such as part-of-speech or topic annotations. However, natural language models are not suitable for clustering transaction counterparties, as they are found in bank payment messaging systems, such as SWIFT. The manually typed tags are typically physical or legal entity details, which lack sentence structure, while containing all the variations and noise that manual entry introduces. This leaves a gap in an investigator or counter-fraud professional's toolset when looking to augment their knowledge of payment flow originator and beneficiary entities and trace funds and assets. A gap that vendors traditionally try to close with fuzzy matching tools. With these considerations in mind, we are proposing a hybrid string similarity, topic modelling, hierarchical clustering and rule-based pipeline to facilitate clustering of transaction counterparties, also catering for unknown number of expected clusters. We are also devising metrics to supplement the evaluation of the approach, based on the well-known measures of precision and recall. Testing on a real-life labelled dataset demonstrates significantly improved performance over a baseline rule-based ('keyword') approach. The approach retains most of the interpretability found in rule-based systems, as the former adds an additional level of cluster refinement to the latter. The resulting workflow reduces the need for manual review. When only a subset of the population needs to be investigated, such as in sanctions investigations, the approach allows for better control of the risks of missing entity variations.
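
下面是"字符串相似度 + 层次聚类"环节的极简示意(非论文完整管道,未包含主题建模与规则部分);相似度度量(标准库difflib)、示例名称与距离阈值均为假设。

```python
# 最小示意:对交易对手字符串做两两相似度 + 层次聚类,不预设簇数。
from difflib import SequenceMatcher
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

names = ["ACME GMBH", "ACME G.M.B.H.", "ACME GmbH Berlin",
         "GLOBEX CORP", "GLOBEX CORPORATION", "J SMITH", "SMITH J."]
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        sim = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
        dist[i, j] = dist[j, i] = 1 - sim        # 相似度 -> 距离

Z = linkage(squareform(dist), method="average")  # 平均联结层次聚类
clusters = fcluster(Z, t=0.45, criterion="distance")  # 按距离阈值切割,簇数自适应
for c in sorted(set(clusters)):
    print(c, [names[k] for k in range(n) if clusters[k] == c])
```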


推荐(1篇)

【1】What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems
标题:哪些数据是真正需要的?推荐系统推理数据最小化的可行性研究
链接:https://arxiv.org/abs/2508.21547

作者:en, Marco Favier, Bart Goethals
备注:Accepted for publication at the 34th ACM International Conference on Information and Knowledge Management (CIKM '25), November 10-14, 2025, Seoul, Republic of Korea
摘要:数据最小化是一项法律原则,要求个人数据处理仅限于特定目的所必需的范围。对于依赖大量个人数据的推荐系统而言,落实这一原则仍是重大挑战。本文对此类系统的隐式反馈推理数据最小化问题进行了可行性研究。我们提出了一个新的问题表述,分析了多种最小化技术,并考察了影响其有效性的关键因素。我们证明,大幅削减推理数据在技术上是可行的,且不会带来显著的性能损失。然而,其实用性主要取决于两个因素:技术设定(例如性能目标、模型选择)和用户特征(例如历史长度、偏好复杂度)。因此,尽管我们确立了其技术可行性,我们的结论是:数据最小化在实践中仍具挑战性,其对技术与用户情境的依赖使得关于数据"必要性"的统一标准难以落地。
摘要:Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a significant challenge. This paper conducts a feasibility study on minimizing implicit feedback inference data for such systems. We propose a novel problem formulation, analyze various minimization techniques, and investigate key factors influencing their effectiveness. We demonstrate that substantial inference data reduction is technically feasible without significant performance loss. However, its practicality is critically determined by two factors: the technical setting (e.g., performance targets, choice of model) and user characteristics (e.g., history size, preference complexity). Thus, while we establish its technical feasibility, we conclude that data minimization remains practically challenging and its dependence on the technical and user context makes a universal standard for data `necessity' difficult to implement.


自动驾驶|车辆|车道检测等(1篇)

【1】Multi-robot Path Planning and Scheduling via Model Predictive Optimal Transport (MPC-OT)
标题:基于模型预测最优运输(MPC-OT)的多机器人路径规划和调度
链接:https://arxiv.org/abs/2508.21205

作者:Khan, Mouhacine Benosman, Wenliang Liu, Federico Pecora, Joseph W. Durham
备注:2025 IEEE Conference on Decision and Control
摘要:在本文中,我们提出了一种基于最优运输理论和模型预测控制的多机器人导航路径规划与调度新方法。我们考虑这样一个设定:$N$个机器人需要在带障碍物的公共空间中导航至$M$个目标。先将机器人映射到目标、再规划路径,可能导致路径重叠,进而造成死锁。我们推导出一种基于最优运输的策略,不仅给出从机器人到目标的最小代价路径,还保证轨迹互不重叠。为此,我们将感兴趣的空间离散化为$K$个单元,并施加一个描述单元间转移代价的$K\times K$代价结构。最优运输由此给出机器人到达目标的最优且不重叠的单元转移,可直接部署而无需额外的调度考虑。所提出的解决方案在最坏情况下需要$\mathcal{O}(K^3\log K)$的计算量,在性质良好的问题中为$\mathcal{O}(K^2\log K)$。为进一步适应潜在的重叠轨迹(在某些情况下不可避免)以及机器人动力学,我们表明可以借助重规划(replanning)和模型预测控制将时间结构集成到最优运输中。
摘要:In this paper, we propose a novel methodology for path planning and scheduling for multi-robot navigation that is based on optimal transport theory and model predictive control. We consider a setup where $N$ robots are tasked to navigate to $M$ targets in a common space with obstacles. Mapping robots to targets first and then planning paths can result in overlapping paths that lead to deadlocks. We derive a strategy based on optimal transport that not only provides minimum cost paths from robots to targets but also guarantees non-overlapping trajectories. We achieve this by discretizing the space of interest into $K$ cells and by imposing a ${K\times K}$ cost structure that describes the cost of transitioning from one cell to another. Optimal transport then provides \textit{optimal and non-overlapping} cell transitions for the robots to reach the targets that can be readily deployed without any scheduling considerations. The proposed solution requires $\mathcal{O}(K^3\log K)$ computations in the worst-case and $\mathcal{O}(K^2\log K)$ for well-behaved problems. To further accommodate potentially overlapping trajectories (unavoidable in certain situations) as well as robot dynamics, we show that a temporal structure can be integrated into optimal transport with the help of \textit{replans} and \textit{model predictive control}.
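
下面演示其中的指派层:用scipy的linear_sum_assignment求机器人到目标的最小总代价匹配,这是均匀边际下离散最优运输的特例。论文实际在$K$个栅格单元的$K\times K$代价结构上求解OT,此处的坐标与代价仅为随机占位。

```python
# 最小示意:机器人-目标的最小代价指派(离散最优运输的特例)。
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
robots = rng.uniform(0, 10, (4, 2))             # 4个机器人的位置(占位)
targets = rng.uniform(0, 10, (4, 2))            # 4个目标的位置(占位)
cost = np.linalg.norm(robots[:, None, :] - targets[None, :, :], axis=-1)  # 欧氏代价矩阵

row, col = linear_sum_assignment(cost)          # 总代价最小且一一对应的匹配
for r, c in zip(row, col):
    print(f"机器人{r} -> 目标{c} 代价 {cost[r, c]:.2f}")
print("总代价:", cost[row, col].sum())
```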


联邦学习|隐私保护|加密(1篇)

【1】Owen Sampling Accelerates Contribution Estimation in Federated Learning
标题:Owen抽样加速联邦学习中的贡献估计
链接:https://arxiv.org/abs/2508.21261

作者:hademSohi, Hadi Hemmati, Jiayu Zhou, Steve Drew
备注:ECAI 2025 camera-ready; 8 pages + appendix; code link inside
摘要:联邦学习(FL)聚合来自多个客户端的信息,在不暴露原始数据的情况下训练共享的全局模型。准确估计每个客户端的贡献,不仅对公平回报至关重要,也关系到选出最有用的客户端,从而加快全局模型收敛。Shapley值是一个有原则的选择,但其精确计算随客户端数量呈指数级增长,难以用于大型联邦。我们提出FedOwen,一个高效框架,在与现有方法相同的总评估预算下使用Owen采样近似Shapley值,同时保持较小的近似误差。此外,FedOwen采用自适应的客户端选择策略,在利用高价值客户端与探索采样不足的客户端之间取得平衡,减少偏差并发掘罕见但信息量大的数据。在固定的估值成本下,与最先进的基线相比,FedOwen在非IID基准上、相同通信轮数内的最终准确率最高提升23%。
摘要:Federated Learning (FL) aggregates information from multiple clients to train a shared global model without exposing raw data. Accurately estimating each client's contribution is essential not just for fair rewards, but for selecting the most useful clients so the global model converges faster. The Shapley value is a principled choice, yet exact computation scales exponentially with the number of clients, making it infeasible for large federations. We propose FedOwen, an efficient framework that uses Owen sampling to approximate Shapley values under the same total evaluation budget as existing methods while keeping the approximation error small. In addition, FedOwen uses an adaptive client selection strategy that balances exploiting high-value clients with exploring under-sampled ones, reducing bias and uncovering rare but informative data. Under a fixed valuation cost, FedOwen achieves up to 23 percent higher final accuracy within the same number of communication rounds compared to state-of-the-art baselines on non-IID benchmarks.
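
下面是Owen采样近似Shapley值的极简示意(非FedOwen实现):对每个参与方,在若干"包含概率"水平$q$上分层抽取联盟并平均边际贡献。估值函数v、权重与采样规模均为玩具假设。

```python
# 最小示意:Owen 采样近似 Shapley 值(基于多线性延拓的分层采样)。
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([3.0, 1.0, 2.0, 0.5])        # 假设各客户端的"真实"贡献
def v(S):                                       # 玩具联盟估值:带饱和的加和
    return np.tanh(sum(weights[j] for j in S) / 2)

n, levels, reps = 4, 11, 200
shap = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    est = []
    for q in np.linspace(0, 1, levels):         # Owen:对包含概率 q 分层
        for _ in range(reps):
            S = {j for j in others if rng.random() < q}   # 其余方以概率 q 加入
            est.append(v(S | {i}) - v(S))       # i 的边际贡献
    shap[i] = np.mean(est)

print("Shapley估计:", shap.round(3))
print("效率性检查: 和 ≈ v(全集)-v(空集) =", shap.sum().round(3))
```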


推理|分析|理解|解释(5篇)

【1】Introduction to the Analysis of Probabilistic Decision-Making Algorithms
标题:概率决策算法分析简介
链接:https://arxiv.org/abs/2508.21620

作者: Kristiadi
摘要:决策理论为在各类不确定性下做出选择提供了有原则的方法。实现这些理论的算法已成功应用于广泛的现实问题,包括材料与药物发现。它们的可取之处在于能够自适应地收集信息,以便在未来做出更好的决策,从而实现数据高效的工作流程。在实验昂贵的科学发现中,这些算法因此可以显著降低实验成本。对这些算法的理论分析,对于理解其行为并为开发下一代算法提供有价值的洞见至关重要。然而,文献中的理论分析往往令非专家难以读懂。本专著旨在为常用概率决策算法(包括老虎机算法、贝叶斯优化和树搜索算法)的理论分析提供一个易于入门、自成体系的介绍,仅假设读者具备概率论与统计学的基础知识,以及关于高斯过程的一些初步知识。
摘要:Decision theories offer principled methods for making choices under various types of uncertainty. Algorithms that implement these theories have been successfully applied to a wide range of real-world problems, including materials and drug discovery. Indeed, they are desirable since they can adaptively gather information to make better decisions in the future, resulting in data-efficient workflows. In scientific discovery, where experiments are costly, these algorithms can thus significantly reduce the cost of experimentation. Theoretical analyses of these algorithms are crucial for understanding their behavior and providing valuable insights for developing next-generation algorithms. However, theoretical analyses in the literature are often inaccessible to non-experts. This monograph aims to provide an accessible, self-contained introduction to the theoretical analysis of commonly used probabilistic decision-making algorithms, including bandit algorithms, Bayesian optimization, and tree search algorithms. Only basic knowledge of probability theory and statistics, along with some elementary knowledge about Gaussian processes, is assumed.


【2】Iterative Inference in a Chess-Playing Neural Network
标题:国际象棋神经网络中的迭代推理
链接:https://arxiv.org/abs/2508.21380

作者:dmann, Sebastian Lapuschkin, Wojciech Samek
摘要:神经网络是通过平滑、渐进的细化来构建其表示,还是通过更复杂的计算过程?我们通过将logit透镜(logit lens)扩展到Leela Chess Zero(一个超人水平的国际象棋引擎)的策略网络来研究这一问题。我们发现棋力与解谜能力跨层呈强单调趋势,但策略分布往往遵循非平滑的轨迹。相关证据包括:早期已发现却随后被弃用的正确解谜着法、与最终输出相关性较差的着法排序,以及直到网络后段仍然很高的策略散度。这些发现与语言模型中通常观察到的平滑分布收敛形成对比。
摘要:Do neural networks build their representations through smooth, gradual refinement, or via more complex computational processes? We investigate this by extending the logit lens to analyze the policy network of Leela Chess Zero, a superhuman chess engine. We find strong monotonic trends in playing strength and puzzle-solving ability across layers, yet policy distributions frequently follow non-smooth trajectories. Evidence for this includes correct puzzle solutions that are discovered early but subsequently discarded, move rankings that remain poorly correlated with final outputs, and high policy divergence until late in the network. These findings contrast with the smooth distributional convergence typically observed in language models.


【3】Faster Inference of Cell Complexes from Flows via Matrix Factorization
标题:通过矩阵分解从流中更快地推断胞腔复形
链接:https://arxiv.org/abs/2508.21372

作者:er, Josef Hoppe, Michael T. Schaub
备注:5 pages, 5 figures, accepted at EUSIPCO 2025 in Palermo, evaluation code available at this https URL
摘要:我们考虑如下推断问题:给定一组在图上观测到的边流信号,将图提升为一个胞腔复形,使得观测到的边流信号可以表示为该胞腔复形上梯度流与旋度流的稀疏组合。具体而言,我们的目标是用一组2-胞腔(由封闭、不自交路径围成的多边形)来扩充观测到的图,使得与该胞腔复形相关联的Hodge拉普拉斯算子的特征向量为图上观测到的边流提供稀疏、可解释的表示。由于先前工作已证明该一般问题是NP难的,我们在此开发了一种新的基于矩阵分解的启发式方法来求解它。通过计算实验,我们证明新方法的计算开销显著低于先前的启发式方法,而在大多数设置下性能仅略有下降。事实上,我们发现,在特定的噪声设置下,新方法在解质量和计算速度上均优于先前的最新技术。
摘要 :We consider the following inference problem: Given a set of edge-flow signals observed on a graph, lift the graph to a cell complex, such that the observed edge-flow signals can be represented as a sparse combination of gradient and curl flows on the cell complex. Specifically, we aim to augment the observed graph by a set of 2-cells (polygons encircled by closed, non-intersecting paths), such that the eigenvectors of the Hodge Laplacian of the associated cell complex provide a sparse, interpretable representation of the observed edge flows on the graph. As it has been shown that the general problem is NP-hard in prior work, we here develop a novel matrix-factorization-based heuristic to solve the problem. Using computational experiments, we demonstrate that our new approach is significantly less computationally expensive than prior heuristics, while achieving only marginally worse performance in most settings. In fact, we find that for specifically noisy settings, our new approach outperforms the previous state of the art in both solution quality and computational speed.
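
作为背景,下面用numpy演示如何由边界(关联)矩阵构造Hodge 1-Laplacian($L_1 = B_1^\top B_1 + B_2 B_2^\top$),其特征结构正是摘要中稀疏边流表示的基础;所用的小型复形(一个三角形加一条悬挂边)为示例假设。

```python
# 最小示意:由关联矩阵构造 Hodge 1-Laplacian L1 = B1^T B1 + B2 B2^T。
# 图:节点 0-3,边 = (0,1), (1,2), (0,2), (2,3);三角形 (0,1,2) 作为唯一的2-胞腔。
import numpy as np

# B1: 节点×边 的关联矩阵(边按上面顺序,方向取小端->大端)
B1 = np.array([[-1,  0, -1,  0],
               [ 1, -1,  0,  0],
               [ 0,  1,  1, -1],
               [ 0,  0,  0,  1]])
# B2: 边×2胞腔;三角形 0->1->2->0 的边界 = 边(0,1) + 边(1,2) - 边(0,2)
B2 = np.array([[1], [1], [-1], [0]])

assert np.all(B1 @ B2 == 0)                      # 复形的相容性: 边界的边界为零
L1 = B1.T @ B1 + B2 @ B2.T
print("L1 特征值:", np.linalg.eigvalsh(L1).round(3))
# 加入2-胞腔后,三角形的旋度方向不再落在 L1 的调和零空间中。
```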


【4】An Explainable, Attention-Enhanced, Bidirectional Long Short-Term Memory Neural Network for Joint 48-Hour Forecasting of Temperature, Irradiance, and Relative Humidity
标题:可解释、注意力增强的双向长短期记忆神经网络,用于温度、辐照度与相对湿度的48小时联合预测
链接:https://arxiv.org/abs/2508.21109

作者:Vamvouras, Konstantinos Braimakis, Christos Tzivanidis
备注:27 pages, 8 figures
摘要:本文提出了一种深度学习(DL)框架,用于温度、太阳辐照度和相对湿度的48小时预测,以支持智能HVAC系统中的模型预测控制(MPC)。该方法采用带注意力的堆叠双向长短期记忆(BiLSTM)网络,通过联合预测所有三个变量来捕获时间与跨特征依赖。训练使用了带有周期性时间特征编码的历史气象数据(2019-2022),并用2023年数据评估泛化能力。该模型的平均绝对误差为1.3摄氏度(温度)、31 W/m2(辐照度)和6.7个百分点(湿度),优于最先进的数值天气预报和机器学习基准。积分梯度(Integrated Gradients)量化了特征贡献,注意力权重揭示了时间模式,增强了可解释性。通过结合多变量预测、基于注意力的DL与可解释性,这项工作推进了数据驱动的天气预测。所展示的准确性和透明度突显了该框架通过可靠的短期气象预报实现节能建筑控制的潜力。
摘要:This paper presents a Deep Learning (DL) framework for 48-hour forecasting of temperature, solar irradiance, and relative humidity to support Model Predictive Control (MPC) in smart HVAC systems. The approach employs a stacked Bidirectional Long Short-Term Memory (BiLSTM) network with attention, capturing temporal and cross-feature dependencies by jointly predicting all three variables. Historical meteorological data (2019-2022) with encoded cyclical time features were used for training, while 2023 data evaluated generalization. The model achieved Mean Absolute Errors of 1.3 degrees Celsius (temperature), 31 W/m2 (irradiance), and 6.7 percentage points (humidity), outperforming state-of-the-art numerical weather prediction and machine learning benchmarks. Integrated Gradients quantified feature contributions, and attention weights revealed temporal patterns, enhancing interpretability. By combining multivariate forecasting, attention-based DL, and explainability, this work advances data-driven weather prediction. The demonstrated accuracy and transparency highlight the framework's potential for energy-efficient building control through reliable short-term meteorological forecasting.
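
下面给出"堆叠BiLSTM + 时间注意力、三变量48小时联合输出"这一结构的PyTorch骨架示意(非论文实现;特征数、隐藏维度、历史窗长等超参数均为假设):

```python
# 最小示意:带注意力的堆叠 BiLSTM,联合输出 48 小时 x 3 个气象变量。
import torch
import torch.nn as nn

class BiLSTMAttn(nn.Module):
    def __init__(self, n_feat=8, hidden=64, horizon=48, n_targets=3):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)              # 时间步注意力打分
        self.head = nn.Linear(2 * hidden, horizon * n_targets)
        self.horizon, self.n_targets = horizon, n_targets

    def forward(self, x):                                   # x: (batch, time, n_feat)
        h, _ = self.lstm(x)                                  # (batch, time, 2*hidden)
        w = torch.softmax(self.score(h), dim=1)              # 注意力权重(可供解释)
        ctx = (w * h).sum(dim=1)                             # 加权上下文向量
        return self.head(ctx).view(-1, self.horizon, self.n_targets)

model = BiLSTMAttn()
y = model(torch.randn(4, 168, 8))                            # 一周历史 -> 48小时预测
print(y.shape)                                               # torch.Size([4, 48, 3])
```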


【5】PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
标题:PVPO:面向智能体推理的基于预估价值的策略优化
链接:https://arxiv.org/abs/2508.21104

作者:eng, Penghong Zhao, Guochao Jiang, Chuzhan Hao, Yuewei Zhang, Hao Wang
备注:14 pages, 5 figures
摘要:无评论家(critic-free)的强化学习方法,特别是组策略方法,因其在复杂任务中的效率而受到广泛关注。然而,这些方法严重依赖策略内部的多次采样与比较来估计优势,可能导致策略陷入局部最优并增加计算开销。为解决这些问题,我们提出PVPO,一种通过优势参考锚点和数据预采样增强的高效强化学习方法。具体来说,我们用参考模型预先进行轨迹推演(rollout),并将计算得到的奖励分数用作参考锚点。我们的方法有效纠正了组内比较引入的累积偏差,并显著降低了对rollout次数的依赖。同时,参考模型可以在数据预采样阶段评估样本难度,从而有效选择高增益数据,提高训练效率。在两个领域九个数据集上的实验表明,PVPO达到了最先进(SOTA)性能。我们的方法不仅在多任务上展现出强大的泛化能力,而且在不同规模的模型上表现出可扩展的性能。
摘要:Critic-free reinforcement learning methods, particularly group policies, have attracted considerable attention for their efficiency in complex tasks. However, these methods rely heavily on multiple sampling and comparisons within the policy to estimate advantage, which may cause the policy to fall into local optimum and increase computational cost. To address these issues, we propose PVPO, an efficient reinforcement learning method enhanced by an advantage reference anchor and data pre-sampling. Specifically, we use the reference model to rollout in advance and employ the calculated reward score as a reference anchor. Our approach effectively corrects the cumulative bias introduced by intra-group comparisons and significantly reduces reliance on the number of rollouts. Meanwhile, the reference model can assess sample difficulty during data pre-sampling, enabling effective selection of high-gain data to improve training efficiency. Experiments conducted on nine datasets across two domains demonstrate that PVPO achieves State-Of-The-Art (SOTA) performance. Our approach not only demonstrates robust generalization across multiple tasks, but also exhibits scalable performance across models of varying scales.


检测相关(3篇)

【1】Activation Subspaces for Out-of-Distribution Detection
标题:用于分布外检测的激活子空间
链接:https://arxiv.org/abs/2508.21695

作者:gür, Robin Hesse, Stefan Roth
摘要:为了确保深度模型在实际应用中的可靠性,分布外(OOD)检测方法旨在区分接近训练分布的样本(分布内,ID)与远离训练分布的样本(OOD)。在这项工作中,我们提出了一种新的OOD检测方法:利用分类头权重矩阵的奇异值分解,将模型的激活分解为决定性分量与非显著分量,二者分别对最终分类器输出的贡献最大与最小。我们发现,在大分布偏移(Far-OOD)情形下,非显著分量所张成的子空间比原始激活更能有效区分ID与OOD数据。这是因为分类目标基本不影响非显著子空间,从而产生未被目标分类任务"污染"的特征。相反,在较小分布偏移(Near-OOD)情形下,我们发现激活整形方法在只考虑决定性子空间时获益,因为非显著分量可能在激活空间中造成干扰。通过将这两个发现结合为一种方法(称为ActSub),我们在各种标准OOD基准上取得了最先进的结果。
摘要:To ensure the reliability of deep models in real-world applications, out-of-distribution (OOD) detection methods aim to distinguish samples close to the training distribution (in-distribution, ID) from those farther away (OOD). In this work, we propose a novel OOD detection method that utilizes singular value decomposition of the weight matrix of the classification head to decompose the model's activations into decisive and insignificant components, which contribute maximally, respectively minimally, to the final classifier output. We find that the subspace of insignificant components more effectively distinguishes ID from OOD data than raw activations in regimes of large distribution shifts (Far-OOD). This occurs because the classification objective leaves the insignificant subspace largely unaffected, yielding features that are ''untainted'' by the target classification task. Conversely, in regimes of smaller distribution shifts (Near-OOD), we find that activation shaping methods profit from only considering the decisive subspace, as the insignificant component can cause interference in the activation space. By combining two findings into a single approach, termed ActSub, we achieve state-of-the-art results in various standard OOD benchmarks.
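
下面的numpy示意(非论文实现)演示核心打分思路:对分类头权重做SVD,以特征在"非显著"子空间(对logits几乎无贡献的方向)上的投影范数作为Far-OOD分数;其中ID/OOD特征的构造方式为人为假设,仅用于展示区分效果。

```python
# 最小示意:分类头 SVD -> 非显著子空间投影范数作为 OOD 分数。
import numpy as np

rng = np.random.default_rng(0)
d, n_cls = 128, 10
W = rng.standard_normal((n_cls, d))              # 分类头: logits = W @ z
U, S, Vt = np.linalg.svd(W, full_matrices=True)
V_insig = Vt[n_cls:]                             # 对 logits 无贡献的方向(零空间)

# 人为构造:ID 特征集中在决定性子空间,OOD 特征弥散到全空间
z_id = rng.standard_normal((100, n_cls)) @ Vt[:n_cls] * 2.0
z_ood = rng.standard_normal((100, d))

def score(z):                                    # 非显著子空间中的能量
    return np.linalg.norm(z @ V_insig.T, axis=1)
print("ID 平均非显著能量:", score(z_id).mean().round(2),
      " OOD:", score(z_ood).mean().round(2))
```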


【2】HSFN: Hierarchical Selection for Fake News Detection building Heterogeneous Ensemble
标题:HSFN:面向假新闻检测、构建异质集成的分层选择
链接:https://arxiv.org/abs/2508.21482

作者:outinho, Rafael M.O. Cruz, Francimaria R. S. Nascimento, George D. C. Cavalcanti
备注:Accepted by IEEE International Conference on Systems, Man, and Cybernetics (SMC) - IEEE SMC 2025
摘要:确认偏误等心理偏差使个体特别容易相信并传播社交媒体上的假新闻,在公共卫生和政治等领域造成重大后果。基于机器学习的事实核查系统已被广泛研究以缓解这一问题。其中,集成方法通过组合多个分类器来提高鲁棒性,效果尤为显著。然而,其性能在很大程度上取决于组成分类器的多样性;选择真正多样的模型仍是一个关键挑战,尤其是当模型倾向于学习冗余模式时。在这项工作中,我们提出了一种新的自动分类器选择方法,在优先考虑多样性的同时兼顾性能。该方法首先计算分类器之间的两两多样性,并应用层次聚类将它们组织成不同粒度级别的组。随后,HierarchySelect遍历这些层次级别,在每个级别上选出一个分类器池,每个池代表一种不同的池内多样性,并从中识别出最多样化的池用于集成构建。选择过程结合了反映每个分类器性能的评价指标,以确保集成也具有良好的泛化能力。我们在来自不同应用领域、类别数不同的六个数据集上用40个异构分类器进行实验,并与肘部(Elbow)启发式方法及最先进的基线进行比较。结果表明,我们的方法在六个数据集中的两个上取得了最高准确率。实现细节见项目仓库:https://github.com/SaraBCoutinho/HSFN 。
摘要:Psychological biases, such as confirmation bias, make individuals particularly vulnerable to believing and spreading fake news on social media, leading to significant consequences in domains such as public health and politics. Machine learning-based fact-checking systems have been widely studied to mitigate this problem. Among them, ensemble methods are particularly effective in combining multiple classifiers to improve robustness. However, their performance heavily depends on the diversity of the constituent classifiers; selecting genuinely diverse models remains a key challenge, especially when models tend to learn redundant patterns. In this work, we propose a novel automatic classifier selection approach that prioritizes diversity, also extended by performance. The method first computes pairwise diversity between classifiers and applies hierarchical clustering to organize them into groups at different levels of granularity. HierarchySelect then explores these hierarchical levels to select one pool of classifiers per level, each representing a distinct intra-pool diversity. The most diverse pool among these is identified and selected for ensemble construction. The selection process incorporates an evaluation metric reflecting each classifier's performance to ensure the ensemble also generalises well. We conduct experiments with 40 heterogeneous classifiers across six datasets from different application domains and with varying numbers of classes. Our method is compared against the Elbow heuristic and state-of-the-art baselines. Results show that our approach achieves the highest accuracy on two of six datasets. The implementation details are available on the project's repository: https://github.com/SaraBCoutinho/HSFN .
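
下面是"两两多样性 + 层次聚类选池"环节的极简示意(非HSFN完整方法,未含性能指标项):以分类器预测的不一致率作为距离,层次聚类后在某一粒度下每簇取一个成员;预测数据、阈值等均为假设。

```python
# 最小示意:按两两不一致率(多样性)做层次聚类,每簇选一个分类器入池。
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
preds = rng.integers(0, 2, size=(6, 500))        # 6个分类器在500个样本上的预测(占位)
preds[1] = preds[0] ^ (rng.random(500) < 0.05)   # 使分类器1与0高度冗余

n = len(preds)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = np.mean(preds[i] != preds[j])   # 不一致率 = 多样性距离

Z = linkage(squareform(D), method="average")
groups = fcluster(Z, t=0.25, criterion="distance")          # 某一粒度级别
pool = [int(np.where(groups == g)[0][0]) for g in sorted(set(groups))]  # 每簇取一个
print("簇标签:", groups, " 入选集成的分类器:", pool)        # 冗余的0/1只保留一个
```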


【3】Detecting Domain Shifts in Myoelectric Activations: Challenges and Opportunities in Stream Learning
标题:检测肌电激活中的域偏移:流学习的挑战与机遇
链接:https://arxiv.org/abs/2508.21278

作者:, Nick Lim, Guilherme Weigert Cassales, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet, Anany Dwivedi
备注:16 pages, 5 figures, 1 table, PRICAI25
摘要:由于肌电(EMG)信号固有的非平稳性,检测肌电激活中的域偏移是一项重大挑战。本文探讨使用数据流(DS)学习技术检测域偏移,重点研究Ninapro数据库中的DB6数据集。我们将域定义为基于不同受试者和记录会话的不同时间序列片段,并应用带余弦核的核主成分分析(KPCA)进行预处理以凸显这些偏移。通过评估CUSUM、Page-Hinckley和ADWIN等多种漂移检测方法,我们揭示了现有技术在EMG信号实时域偏移检测中难以取得高性能的局限。我们的结果强调了基于流的方法在维持稳定EMG解码模型方面的潜力,同时指出了在真实场景中提升鲁棒性和准确性的进一步研究方向。
摘要:Detecting domain shifts in myoelectric activations poses a significant challenge due to the inherent non-stationarity of electromyography (EMG) signals. This paper explores the detection of domain shifts using data stream (DS) learning techniques, focusing on the DB6 dataset from the Ninapro database. We define domains as distinct time-series segments based on different subjects and recording sessions, applying Kernel Principal Component Analysis (KPCA) with a cosine kernel to pre-process and highlight these shifts. By evaluating multiple drift detection methods such as CUSUM, Page-Hinckley, and ADWIN, we reveal the limitations of current techniques in achieving high performance for real-time domain shift detection in EMG signals. Our results underscore the potential of streaming-based approaches for maintaining stable EMG decoding models, while highlighting areas for further research to enhance robustness and accuracy in real-world scenarios.
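
下面给出文中比较的Page-Hinkley检测器的纯numpy参考实现;参数delta/lambda为示意取值,输入信号为合成占位,实际可替换为KPCA处理后的EMG统计量。

```python
# 最小示意:Page-Hinkley 漂移检测(检测均值上移)。
import numpy as np

def page_hinkley(x, delta=0.005, lam=3.0):
    mean, ph_sum, ph_min = 0.0, 0.0, 0.0
    for t, v in enumerate(x, 1):
        mean += (v - mean) / t                   # 增量更新运行均值
        ph_sum += v - mean - delta               # 累计偏移统计量
        ph_min = min(ph_min, ph_sum)
        if ph_sum - ph_min > lam:                # 超过阈值 -> 报警
            return t
    return None

rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500),  # 会话/受试者 A
                         rng.normal(1.5, 1, 500)])  # 第500个样本处发生域偏移
print("检测到漂移的位置:", page_hinkley(stream))
```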


分类|识别(3篇)

【1】Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations
标题:真实场景下的领域泛化:从领域感知表示中解耦分类
链接:https://arxiv.org/abs/2508.21769

作者:n, Zhe Zhao, Shahbaz Rezaei, Xin Liu
摘要:评估CLIP等基础模型的领域泛化(DG)能力是一项挑战,因为网络规模的预训练数据可能已覆盖许多现有基准。因此,当前的DG评估可能既不够有挑战性,也不足以测试真正未见数据的情形。为了更好地评估CLIP在真实场景DG(即CLIP遇到具有挑战性的未见数据)下的性能,我们考虑两种途径:(1)在ImageNet上微调CLIP后,在33个具有量化分布外(OOD)分数的多样数据集上评估;(2)利用遗忘学习(unlearning)使CLIP"忘记"某些域作为近似。我们观察到,数据集的OOD程度越高,CLIP的性能下降越明显。为此,我们提出CLIP-DCA(从增强的领域感知表示中解耦分类)。我们方法的动机源于如下观察:标准的域不变性损失旨在使表示具有域不变性,但这可能迫使基础模型丢弃有利于泛化的领域感知表示,从而造成损害。我们转而假设:增强领域感知是基础模型实现有效域不变分类的前提。CLIP-DCA使用独立的域头和合成生成的多样域数据,识别并增强CLIP编码器内部的领域感知,同时通过与域特征解耦来鼓励域不变的分类。与现有方法相比,CLIP-DCA在这一具有挑战性的评估中取得显著提升,尤其是在OOD程度更高的数据集上。
摘要:Evaluating domain generalization (DG) for foundational models like CLIP is challenging, as web-scale pretraining data potentially covers many existing benchmarks. Consequently, current DG evaluation may neither be sufficiently challenging nor adequately test genuinely unseen data scenarios. To better assess the performance of CLIP on DG in-the-wild, a scenario where CLIP encounters challenging unseen data, we consider two approaches: (1) evaluating on 33 diverse datasets with quantified out-of-distribution (OOD) scores after fine-tuning CLIP on ImageNet, and (2) using unlearning to make CLIP `forget' some domains as an approximation. We observe that CLIP's performance deteriorates significantly on more OOD datasets. To address this, we present CLIP-DCA (Disentangling Classification from enhanced domain Aware representations). Our approach is motivated by the observation that while standard domain invariance losses aim to make representations domain-invariant, this can be harmful to foundation models by forcing the discarding of domain-aware representations beneficial for generalization. We instead hypothesize that enhancing domain awareness is a prerequisite for effective domain-invariant classification in foundation models. CLIP-DCA identifies and enhances domain awareness within CLIP's encoders using a separate domain head and synthetically generated diverse domain data. Simultaneously, it encourages domain-invariant classification through disentanglement from the domain features. CLIP-DCA shows significant improvements within this challenging evaluation compared to existing methods, particularly on datasets that are more OOD.


【2】RARR : Robust Real-World Activity Recognition with Vibration by Scavenging Near-Surface Audio Online
标题:RARR:通过在线采集近表面音频实现基于振动的鲁棒真实世界活动识别
链接:https://arxiv.org/abs/2508.21167

作者: Lee, Alyssa Weakley, Hui Wei, Blake Brown, Keyana Carrion, Shijia Pan
摘要 :四分之一的痴呆症患者独自生活,导致家庭成员从远处承担起照顾的角色。许多研究人员已经开发了远程监控解决方案来减少监控需求;然而,仍然存在局限性,包括隐私保护解决方案,活动识别以及对新用户和环境的模型推广。结构振动传感器系统是一种不显眼的解决方案,已被证明可以通过感测活动产生的表面振动,在受控环境中准确监测人类信息,例如身份识别和活动识别。然而,当部署在最终用户的家中时,当前的解决方案需要大量的标记数据来进行准确的活动识别。我们的可扩展解决方案采用来自近表面声学音频的合成数据来预训练模型,并允许使用非常有限的数据进行微调,以便为日常跟踪创建一个强大的框架。
摘要:One in four people with dementia live alone, leading family members to take on caregiving roles from a distance. Many researchers have developed remote monitoring solutions to lessen caregiving needs; however, limitations remain, including privacy-preserving solutions, activity recognition, and model generalizability to new users and environments. Structural vibration sensor systems are unobtrusive solutions that have been proven to accurately monitor human information, such as identification and activity recognition, in controlled settings by sensing surface vibrations generated by activities. However, when deploying in an end user's home, current solutions require a substantial amount of labeled data for accurate activity recognition. Our scalable solution adapts synthesized data from near-surface acoustic audio to pretrain a model and allows fine-tuning with very limited data in order to create a robust framework for daily routine tracking.


【3】Spatiotemporal EEG-Based Emotion Recognition Using SAM Ratings from Serious Games with Hybrid Deep Learning
标题:基于时空EEG的情感识别:使用严肃游戏中的SAM评级与混合深度学习
链接:https://arxiv.org/abs/2508.21103

作者:man, Ilona Heldal, Jerry Chun-Wei Lin
摘要:基于EEG的情感识别的最新进展使用深度学习和经典机器学习方法显示出有希望的结果;然而,大多数现有研究都狭隘地关注二进制效价预测或特定于主题的分类,这限制了现实世界情感计算系统的可推广性和部署。为了解决这一差距,本文提出了一个统一的,多粒度的EEG情绪分类框架,建立在GAMEEMO数据集,其中包括14通道EEG记录和连续自我报告的情绪评级(无聊,可怕,平静,有趣)从28个主题在四个情绪诱导游戏场景。我们的管道采用了结构化的预处理策略,包括时间窗口分割,混合统计和频域特征提取,以及z分数归一化,将原始EEG信号转换为鲁棒的,有区别的输入向量。情绪标签是在三个互补的轴上导出和编码的:(i)基于积极和消极情绪评级的平均极性的二进制效价分类,以及(ii)多类情绪分类,其中预测最情绪状态的存在。(iii)通过将每种情绪分为10个有序类,实现细粒度多标签表示。我们评估了广泛的模型,包括随机森林,XGBoost和SVM,以及LSTM,LSTM-GRU和CNN-LSTM等深度神经架构。其中,LSTM-GRU模型始终优于其他模型,在二进制效价任务中实现了0.932的F1分数,在多类和多标签情感分类中分别达到了94.5%和90.6%。
摘要:Recent advancements in EEG-based emotion recognition have shown promising outcomes using both deep learning and classical machine learning approaches; however, most existing studies focus narrowly on binary valence prediction or subject-specific classification, which limits generalizability and deployment in real-world affective computing systems. To address this gap, this paper presents a unified, multigranularity EEG emotion classification framework built on the GAMEEMO dataset, which consists of 14-channel EEG recordings and continuous self-reported emotion ratings (boring, horrible, calm, and funny) from 28 subjects across four emotion-inducing gameplay scenarios. Our pipeline employs a structured preprocessing strategy that comprises temporal window segmentation, hybrid statistical and frequency-domain feature extraction, and z-score normalization to convert raw EEG signals into robust, discriminative input vectors. Emotion labels are derived and encoded across three complementary axes: (i) binary valence classification based on the averaged polarity of positive and negative emotion ratings; (ii) multi-class emotion classification, where the presence of the most dominant affective state is predicted; and (iii) fine-grained multi-label representation via binning each emotion into 10 ordinal classes. We evaluate a broad spectrum of models, including Random Forest, XGBoost, and SVM, alongside deep neural architectures such as LSTM, LSTM-GRU, and CNN-LSTM. Among these, the LSTM-GRU model consistently outperforms the others, achieving an F1-score of 0.932 in the binary valence task and 94.5% and 90.6% in multi-class and multi-label emotion classification, respectively.


表征(1篇)

【1】Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling
标题:从异类数据中学习统一表示以实现稳健的心率建模
链接:https://arxiv.org/abs/2508.21785

作者:, Zhengdong Huang, Zicheng Xie, Wentao Tian, Jingyu Liu, Lunhong Dong
摘要:心率预测对于个性化健康监测和健身至关重要,但在现实世界中部署时经常面临一个关键挑战:数据异构性。我们将其分为两个关键维度:来自具有不同功能集的碎片化设备市场的源异质性,以及反映个体和活动之间不同生理模式的用户异质性。现有的方法要么丢弃特定于设备的信息,要么无法对特定于用户的差异进行建模,从而限制了它们的实际性能。为了解决这个问题,我们提出了一个框架,学习潜在的表示不可知的异质性,使下游预测异构数据模式下的工作一致。具体来说,我们引入了一个随机特征丢弃策略来处理源异构性,使模型对各种特征集具有鲁棒性。为了管理用户的异质性,我们采用了一个时间感知的注意力模块来捕捉长期的生理特征,并使用对比学习目标来建立一个有区别的表示空间。为了反映真实世界数据的异质性,我们创建并公开发布了一个新的基准数据集ParroTao。对ParroTao和公共FitRec数据集的评估显示,我们的模型分别比现有基线显著高出17%和15%。此外,对学习的表示的分析表明了它们的强大的区分能力,并且一个下游应用任务证实了我们模型的实用价值。
摘要:Heart rate prediction is vital for personalized health monitoring and fitness, yet it frequently faces a critical challenge when deployed in the real world: data heterogeneity. We classify it along two key dimensions: source heterogeneity from fragmented device markets with varying feature sets, and user heterogeneity reflecting distinct physiological patterns across individuals and activities. Existing methods either discard device-specific information, or fail to model user-specific differences, limiting their real-world performance. To address this, we propose a framework that learns latent representations agnostic to both forms of heterogeneity, enabling downstream predictors to work consistently under heterogeneous data patterns. Specifically, we introduce a random feature dropout strategy to handle source heterogeneity, making the model robust to various feature sets. To manage user heterogeneity, we employ a time-aware attention module to capture long-term physiological traits and use a contrastive learning objective to build a discriminative representation space. To reflect the heterogeneous nature of real-world data, we created and publicly released a new benchmark dataset, ParroTao. Evaluations on both ParroTao and the public FitRec dataset show that our model significantly outperforms existing baselines by 17% and 15%, respectively. Furthermore, analysis of the learned representations demonstrates their strong discriminative power, and one downstream application task confirms the practical value of our model.
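
下面用几行numpy演示"随机特征丢弃"这一机制(非论文实现):训练时按概率将整列特征置零,以模拟不同设备缺失的传感器通道;丢弃率与数据均为假设。

```python
# 最小示意:随机特征丢弃,模拟碎片化设备市场中特征集的差异。
import numpy as np

rng = np.random.default_rng(0)

def random_feature_dropout(batch, p=0.3):
    # batch: (batch_size, n_features);每个特征以概率 p 被整列置零
    mask = rng.random(batch.shape[1]) >= p       # 对整个批次共享的特征掩码
    return batch * mask

X = rng.standard_normal((4, 6))                  # 占位的多源可穿戴特征
print(random_feature_dropout(X).round(2))        # 某些列被置零,如同该传感器缺失
```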


3D|3D重建等相关(1篇)

【1】ImmunoAI: Accelerated Antibody Discovery Using Gradient-Boosted Machine Learning with Thermodynamic-Hydrodynamic Descriptors and 3D Geometric Interface Topology
标题:ImmunoAI:使用热力学-流体动力学描述符与3D几何界面拓扑的梯度提升机器学习加速抗体发现
链接:https://arxiv.org/abs/2508.21082

作者:hivakumar, Matthew Sandora
备注:6 pages, accepted at IEEE International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME) '25
摘要:人偏肺病毒(hMPV)对儿童、老年人和免疫功能低下人群构成严重风险。传统的抗体发现管道需要10-12个月,限制了其快速疫情响应的适用性。该项目引入了ImmunoAI,这是一种机器学习框架,通过使用在热力学,流体动力学和3D拓扑界面描述符上训练的梯度增强模型预测高亲和力候选物来加速抗体发现。对213种抗体-抗原复合物的数据集进行了管理,以提取几何和物理化学特征,并训练LightGBM回归器以高精度预测结合亲和力。该模型将抗体候选搜索空间减少了89%,对117个SARS-CoV-2结合对的微调进一步将均方根误差(RMSE)从1.70降低到0.92。在缺乏hMPV A2.2变体的实验结构的情况下,使用AlphaFold 2来预测其3D结构。微调模型确定了两种最佳抗体,其预测的皮摩尔亲和力靶向关键突变位点(G42 V和E96 K),使其成为实验测试的优秀候选者。总之,ImmunoAI缩短了设计周期,并能够对病毒爆发做出更快、更明智的反应。
摘要:Human metapneumovirus (hMPV) poses serious risks to pediatric, elderly, and immunocompromised populations. Traditional antibody discovery pipelines require 10-12 months, limiting their applicability for rapid outbreak response. This project introduces ImmunoAI, a machine learning framework that accelerates antibody discovery by predicting high-affinity candidates using gradient-boosted models trained on thermodynamic, hydrodynamic, and 3D topological interface descriptors. A dataset of 213 antibody-antigen complexes was curated to extract geometric and physicochemical features, and a LightGBM regressor was trained to predict binding affinity with high precision. The model reduced the antibody candidate search space by 89%, and fine-tuning on 117 SARS-CoV-2 binding pairs further reduced Root Mean Square Error (RMSE) from 1.70 to 0.92. In the absence of an experimental structure for the hMPV A2.2 variant, AlphaFold2 was used to predict its 3D structure. The fine-tuned model identified two optimal antibodies with predicted picomolar affinities targeting key mutation sites (G42V and E96K), making them excellent candidates for experimental testing. In summary, ImmunoAI shortens design cycles and enables faster, structure-informed responses to viral outbreaks.
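A minimal sketch of the gradient-boosted affinity regression described above, using LightGBM's scikit-learn interface; the random arrays stand in for the curated geometric and physicochemical descriptors, and the feature count and hyperparameters are illustrative.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X, y = rng.normal(size=(213, 40)), rng.normal(size=213)  # 213 antibody-antigen complexes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f}")
```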


优化|敛散性(2篇)

【1】Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks
标题:宽两层物理信息神经网络随机梯度方法的收敛性
链接:https://arxiv.org/abs/2508.21571

作者:n, Longjun Wu
备注:24 pages
摘要:物理信息神经网络(PINN)代表了一类非常流行的偏微分方程神经求解器。在实践中,经常采用随机梯度下降型算法来训练神经网络。因此,保证随机梯度下降的收敛性是至关重要的。在这项工作中,我们在高概率意义下,针对一类一般的激活函数,建立了随机梯度下降/流在训练过参数化两层PINN时的线性收敛性。这些结果扩展了已有的结果[18],其中分析的是梯度下降。分析的挑战在于处理随机优化方法引入的动态随机性。分析的关键在于在训练过程中保证合适的Gram矩阵的正定性。该分析揭示了优化过程的动态,并为随机算法训练的神经网络提供了保证。
摘要:Physics informed neural networks (PINNs) represent a very popular class of neural solvers for partial differential equations. In practice, one often employs stochastic gradient descent type algorithms to train the neural network. Therefore, the convergence guarantee of stochastic gradient descent is of fundamental importance. In this work, we establish the linear convergence of stochastic gradient descent / flow in training over-parameterized two layer PINNs for a general class of activation functions in the sense of high probability. These results extend the existing result [18] in which gradient descent was analyzed. The challenge of the analysis lies in handling the dynamic randomness introduced by stochastic optimization methods. The key of the analysis lies in ensuring the positive definiteness of suitable Gram matrices during the training. The analysis sheds insight into the dynamics of the optimization process, and provides guarantees on the neural networks trained by stochastic algorithms.


【2】Convergence of regularized agent-state-based Q-learning in POMDPs
标题:POMDPs中基于正则化代理状态的Q学习的收敛性
链接:https://arxiv.org/abs/2508.21314

作者:a, Matthieu Geist, Aditya Mahajan
备注:Accepted in CDC 2025
摘要:在本文中,我们提出了一个框架来理解实践中常用的Q学习强化学习算法的收敛性。这类算法的两个显著特征是:(i)使用代理状态(例如递归神经网络的状态)递归更新Q表,该代理状态不是信念状态或信息状态;(ii)策略正则化通常用于鼓励探索和稳定学习算法。我们研究了这类Q学习算法的最简单形式,称之为基于正则化代理状态的Q学习(RASQL),并证明在温和的技术条件下,它收敛到一个适当定义的正则化MDP的不动点,该不动点取决于行为策略所诱导的平稳分布。我们还表明,类似的分析同样适用于学习周期性策略的RASQL变体。我们给出数值例子,说明经验收敛行为与所提出的理论极限相符。
摘要:In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i) the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii) policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state-based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches with the proposed theoretical limit.
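A toy version of the regularized agent-state-based update, with a discretized agent state and an entropy-regularized (soft-max) backup; in the paper's setting the agent state is e.g. an RNN state, and the exact regularizer may differ.

```python
import numpy as np

def rasql_update(Q, z, a, r, z_next, alpha=0.1, gamma=0.9, tau=0.5):
    """One regularized Q-learning step on an agent state z (not a belief state)."""
    v_next = tau * np.log(np.exp(Q[z_next] / tau).sum())  # soft (entropy-regularized) value
    Q[z, a] += alpha * (r + gamma * v_next - Q[z, a])
    return Q

Q = np.zeros((4, 2))  # 4 discretized agent states, 2 actions
Q = rasql_update(Q, z=0, a=1, r=1.0, z_next=2)
```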


预测|估计(5篇)

【1】MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction
标题:MoE-Health:用于稳健多模式医疗保健预测的专家混合框架
链接:https://arxiv.org/abs/2508.21793

作者:Wang, Christopher C. Yang
备注:Accepted to The 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025)
摘要:医疗保健系统生成各种多模态数据,包括电子健康记录(EHR)、临床记录和医学图像。有效地利用这些数据进行临床预测是具有挑战性的,特别是因为真实世界的样本通常存在不同或不完整的模式。现有方法通常需要完整的模态数据或依赖于手动选择策略,限制了其在真实世界临床环境中的适用性,其中数据可用性因患者和机构而异。为了解决这些局限性,我们提出了MoE-Health,这是一种新型的专家混合框架,旨在实现医疗保健预测中的稳健多模态融合。MoE-Health架构专门用于处理不同模式的样本,并提高关键临床任务的性能。通过利用专门的专家网络和动态门控机制,我们的方法动态地选择和组合相关专家的基础上可用的数据模式,使灵活的适应不同的数据可用性的情况。我们在MIMIC-IV数据集上评估了MoE-Health的三个关键临床预测任务:住院死亡率预测,长期住院和再入院预测。实验结果表明,MoE-Health实现了优越的性能相比,现有的多模态融合方法,同时保持不同的模态可用性模式的鲁棒性。该框架有效地集成了多模态信息,在处理异构和不完整的医疗数据时提供了更好的预测性能和鲁棒性,使其特别适合部署在具有异构数据可用性的各种医疗环境中。
摘要:Healthcare systems generate diverse multimodal data, including Electronic Health Records (EHR), clinical notes, and medical images. Effectively leveraging this data for clinical prediction is challenging, particularly as real-world samples often present with varied or incomplete modalities. Existing approaches typically require complete modality data or rely on manual selection strategies, limiting their applicability in real-world clinical settings where data availability varies across patients and institutions. To address these limitations, we propose MoE-Health, a novel Mixture of Experts framework designed for robust multimodal fusion in healthcare prediction. MoE-Health architecture is specifically developed to handle samples with differing modalities and improve performance on critical clinical tasks. By leveraging specialized expert networks and a dynamic gating mechanism, our approach dynamically selects and combines relevant experts based on available data modalities, enabling flexible adaptation to varying data availability scenarios. We evaluate MoE-Health on the MIMIC-IV dataset across three critical clinical prediction tasks: in-hospital mortality prediction, long length of stay, and hospital readmission prediction. Experimental results demonstrate that MoE-Health achieves superior performance compared to existing multimodal fusion methods while maintaining robustness across different modality availability patterns. The framework effectively integrates multimodal information, offering improved predictive performance and robustness in handling heterogeneous and incomplete healthcare data, making it particularly suitable for deployment in diverse healthcare environments with heterogeneous data availability.
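The dynamic gating mechanism can be sketched as follows: each available modality is embedded by its own expert, and the gate assigns weights only over the experts whose modalities are present. Module names and dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ModalityMoE(nn.Module):
    """Mixture-of-experts fusion that gates only over present modalities."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.gate = nn.Linear(len(dims), len(dims))

    def forward(self, inputs, present):
        # inputs: list of (batch, d_m) tensors; present: (batch, M) 0/1 mask,
        # assumed to have at least one available modality per sample
        h = torch.stack([e(x) for e, x in zip(self.experts, inputs)], dim=1)
        logits = self.gate(present).masked_fill(present == 0, float("-inf"))
        w = torch.softmax(logits, dim=1)            # weights over available experts
        return (w.unsqueeze(-1) * h).sum(dim=1)     # fused patient representation
```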


【2】Predicting Social Media Engagement from Emotional and Temporal Features
标题:根据情感和时间特征预测社交媒体参与度
链接:https://arxiv.org/abs/2508.21650

作者:m, Junhyuk Hwang
备注:7 pages
摘要:我们提出了一种机器学习方法,用于从情感和时间特征预测社交媒体参与度(评论和点赞)。该数据集包含600首歌曲,并附有效价、唤醒和相关情感指标的注释。一个基于HistGradientBoostingRegressor的多目标回归模型在对数变换后的参与率上进行训练,以应对偏斜的目标分布。性能评估同时使用自定义的数量级准确率和标准回归度量(包括决定系数(R^2))。结果表明,情感和时间元数据,连同现有的观看次数,能够有效预测未来的参与度。该模型在点赞上达到R^2 = 0.98,但在评论上仅为R^2 = 0.41。这种差距表明,点赞在很大程度上由容易捕获的情感和曝光信号驱动,而评论则取决于当前特征集中未表示的其他因素。
摘要:We present a machine learning approach for predicting social media engagement (comments and likes) from emotional and temporal features. The dataset contains 600 songs with annotations for valence, arousal, and related sentiment metrics. A multi-target regression model based on HistGradientBoostingRegressor is trained on log-transformed engagement ratios to address skewed targets. Performance is evaluated with both a custom order-of-magnitude accuracy and standard regression metrics, including the coefficient of determination (R^2). Results show that emotional and temporal metadata, together with existing view counts, predict future engagement effectively. The model attains R^2 = 0.98 for likes but only R^2 = 0.41 for comments. This gap indicates that likes are largely driven by readily captured affective and exposure signals, whereas comments depend on additional factors not represented in the current feature set.
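A minimal reproduction of the stated modeling recipe: scikit-learn's HistGradientBoostingRegressor wrapped for multi-target regression on log-transformed engagement, inverted at prediction time. The synthetic arrays stand in for the 600-song feature matrix.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))                         # emotional + temporal features
Y = rng.lognormal(mean=2.0, sigma=1.0, size=(600, 2))  # [likes, comments], skewed

model = MultiOutputRegressor(HistGradientBoostingRegressor())
model.fit(X, np.log1p(Y))          # train on log-transformed targets
pred = np.expm1(model.predict(X))  # map predictions back to the original scale
```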


【3】Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration
标题:故障预测是比校准更好的提前退出网络的性能代理
链接:https://arxiv.org/abs/2508.21495

作者:aty, Filip Szatkowski, Metod Jazbec, Bartosz Wójcik
摘要:早期退出模型通过将内部分类器附加到模型的中间层,并允许在预测满足退出标准时停止计算来加速推理。大多数早期退出方法依赖于基于置信度的退出策略,这促使一些工作校准中间分类器,以提高整个模型的性能。在本文中,我们表明,校准度量可能是多出口模型性能的误导性指标:一个校准良好的分类器可能仍然浪费计算,而常见的校准方法并不能保持分类器内部的样本排序。我们给出了经验案例,其中校准不佳的网络反而优于校准良好的网络。作为替代方案,我们建议使用故障预测作为早期退出模型性能的更有用的代理。与校准不同,故障预测考虑了样本排序的变化,并显示出与效率提升的强相关性,使其成为设计和评估早期退出模型的更可靠基础。
摘要:Early-exit models speed up inference by attaching internal classifiers to intermediate layers of the model and allowing computation to stop once a prediction satisfies an exit criterion. Most early-exit methods rely on confidence-based exit strategies, which motivated some works to calibrate intermediate classifiers to improve the performance of the entire model. In this paper, we show that calibration measures can be misleading indicators of the performance of multi-exit models: a well-calibrated classifier may still waste computation, and common calibration methods do not preserve the sample ranking within a classifier. We demonstrate empirical cases where miscalibrated networks outperform calibrated ones. As an alternative, we propose to use failure prediction as a more useful proxy for early-exit model performance. Unlike calibration, failure prediction accounts for changes in the ranking of samples and shows a strong correlation with efficiency improvements, making it a more dependable basis for designing and evaluating early-exit models.
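One concrete way to measure failure prediction for an internal classifier is the AUROC of its confidence against its own correctness, as sketched below; this is a generic formulation, and the paper's exact evaluation protocol may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def failure_prediction_auroc(confidence, correct):
    """How well an exit's confidence separates its correct predictions from
    its mistakes (higher = better failure prediction)."""
    return roc_auc_score(correct.astype(int), confidence)

# toy data: one internal classifier whose confidence is informative
rng = np.random.default_rng(0)
conf = rng.uniform(size=1000)
correct = rng.uniform(size=1000) < conf
print(failure_prediction_auroc(conf, correct))
```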


【4】Manifold Trajectories in Next-Token Prediction: From Replicator Dynamics to Softmax Equilibrium
标题:下一个词元预测中的流形轨迹:从复制动力学到Softmax均衡
链接:https://arxiv.org/abs/2508.21186

作者:er R. Lee-Jenkins
摘要:大型语言模型中的解码通常被描述为对词元打分并用softmax归一化。我们给出了这一步骤的一个最小且自洽的描述,将其视为概率单纯形上的约束变分原理。保持归一化的离散上升即经典的乘法权重(熵镜像)更新;其连续时间极限是复制动力学流。基于这些要素,我们证明,对于固定的上下文和温度,下一词元分布沿单纯形内部的光滑轨迹演化,并收敛到softmax均衡。这在输出分布层面形式化了常见的"流形遍历"直觉。该分析给出精确且面向实践的结论:温度相当于沿同一轨迹对时间的精确重新缩放,而top-k和核采样则将该流限制到具有同样保证的单纯形的面上。我们还概述了对路径依赖分数调整的受控描述,及其与循环式、幻觉式行为的联系。我们不对训练动态或内部表征作任何断言;这些留待未来工作。
摘要:Decoding in large language models is often described as scoring tokens and normalizing with softmax. We give a minimal, self-contained account of this step as a constrained variational principle on the probability simplex. The discrete, normalization-respecting ascent is the classical multiplicative-weights (entropic mirror) update; its continuous-time limit is the replicator flow. From these ingredients we prove that, for a fixed context and temperature, the next-token distribution follows a smooth trajectory inside the simplex and converges to the softmax equilibrium. This formalizes the common ``manifold traversal'' intuition at the output-distribution level. The analysis yields precise, practice-facing consequences: temperature acts as an exact rescaling of time along the same trajectory, while top-k and nucleus sampling restrict the flow to a face with identical guarantees. We also outline a controlled account of path-dependent score adjustments and their connection to loop-like, hallucination-style behavior. We make no claims about training dynamics or internal representations; those are deferred to future work.
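The convergence claim can be checked numerically: entropic mirror (multiplicative-weights) ascent on the entropy-regularized linear objective has softmax(s/T) as its fixed point on the simplex. A minimal sketch, not the paper's derivation:

```python
import numpy as np

def entropic_mirror_ascent(scores, T=1.0, eta=0.1, steps=500):
    """Multiplicative-weights ascent on <p, s> + T * H(p) over the simplex;
    the fixed point is softmax(scores / T)."""
    p = np.full_like(scores, 1.0 / len(scores))
    for _ in range(steps):
        grad = scores - T * (np.log(p) + 1.0)  # gradient of the regularized objective
        p = p * np.exp(eta * grad)             # multiplicative-weights step
        p /= p.sum()                           # re-normalize onto the simplex
    return p

s = np.array([2.0, 1.0, 0.5, -1.0])
print(np.allclose(entropic_mirror_ascent(s), np.exp(s) / np.exp(s).sum()))
```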


【5】Pep2Prob Benchmark: Predicting Fragment Ion Probability for MS$^2$-based Proteomics
标题:Pep2Prob基准:预测基于MS$^2$的蛋白质组学碎片离子概率
链接:https://arxiv.org/abs/2508.21076

作者:hichao Wang, Shengqi Sang, Pisit Wajanasara, Nuno Bandeira
备注:Dataset is available at HuggingFace: this https URL
摘要:蛋白质执行几乎所有的细胞功能,并构成大多数药物靶点,使其分析成为理解人类健康和疾病生物学的基础。串联质谱法(Tandem mass spectrometry,MS$^2$)是蛋白质组学中的主要分析技术,它通过电离、裂解多肽,并利用产生的质谱来鉴定和定量生物样品中的蛋白质。在MS$^2$分析中,肽碎片离子概率预测发挥着关键作用,作为强度信息的补充,提高了从质谱中识别肽的准确性。目前的方法依赖于碎裂的全局统计,其假设碎片的概率在所有肽中是均匀的。然而,从生物化学原理的角度来看,这种假设过于简单化,限制了准确的预测。为了解决这一差距,我们提出了Pep2Prob,这是第一个为肽特异性碎片离子概率预测而设计的综合数据集和基准。该数据集包含608,780个独特前体的碎片离子概率统计(每个前体是一对肽序列和电荷状态),总结自超过1.83亿个高质量、高分辨率的HCD MS$^2$光谱,并带有经过验证的肽归属和碎片注释。我们使用简单的统计规则和基于学习的方法建立基线性能,并发现利用肽特异性信息的模型显著优于仅使用全局碎裂统计的先前方法。此外,随着模型容量的增加,基准模型的性能表明肽碎裂关系表现出复杂的非线性,需要复杂的机器学习方法。
摘要:Proteins perform nearly all cellular functions and constitute most drug targets, making their analysis fundamental to understanding human biology in health and disease. Tandem mass spectrometry (MS$^2$) is the major analytical technique in proteomics that identifies peptides by ionizing them, fragmenting them, and using the resulting mass spectra to identify and quantify proteins in biological samples. In MS$^2$ analysis, peptide fragment ion probability prediction plays a critical role, enhancing the accuracy of peptide identification from mass spectra as a complement to the intensity information. Current approaches rely on global statistics of fragmentation, which assumes that a fragment's probability is uniform across all peptides. Nevertheless, this assumption is oversimplified from a biochemical principle point of view and limits accurate prediction. To address this gap, we present Pep2Prob, the first comprehensive dataset and benchmark designed for peptide-specific fragment ion probability prediction. The proposed dataset contains fragment ion probability statistics for 608,780 unique precursors (each precursor is a pair of peptide sequence and charge state), summarized from more than 183 million high-quality, high-resolution, HCD MS$^2$ spectra with validated peptide assignments and fragmentation annotations. We establish baseline performance using simple statistical rules and learning-based methods, and find that models leveraging peptide-specific information significantly outperform previous methods using only global fragmentation statistics. Furthermore, performance across benchmark models with increasing capacities suggests that the peptide-fragmentation relationship exhibits complex nonlinearities requiring sophisticated machine learning approaches.


其他神经网络|深度学习|模型|建模(21篇)

【1】UniMLR: Modeling Implicit Class Significance for Multi-Label Ranking
标题:UniMLR:为多标签排名建模隐式类别重要性
链接:https://arxiv.org/abs/2508.21772

作者:Yesilkaynak, Emine Dari, Alican Mertan, Gozde Unal
摘要:现有的多标签排序(MLR)框架只利用从标签二分为正集和负集中推导出的信息。因此,它们没有受益于正标签之间的排序,而这正是我们在本文中引入的新MLR方法。我们提出了UniMLR,一种新的MLR范式,它利用正标签之间的排序,将隐式的类相关性/重要性值建模为概率分布,而不是将它们视为同等重要。这种方法统一了与MLR相关的排序和分类任务。此外,我们通过引入八个具有不同显著性决定因素的合成数据集(Ranked MNIST),解决了MLR数据集稀缺和注释偏差的挑战,提供了一个丰富且可控的实验环境。我们在统计上证明,我们的方法准确地学习了正标签排序的表示,它与真实值一致,并与潜在的显著性值成比例。最后,我们在真实世界和合成数据集上进行了全面的实证实验,证明了我们提出的框架的价值。
摘要:Existing multi-label ranking (MLR) frameworks only exploit information deduced from the bipartition of labels into positive and negative sets. Therefore, they do not benefit from ranking among positive labels, which is the novel MLR approach we introduce in this paper. We propose UniMLR, a new MLR paradigm that models implicit class relevance/significance values as probability distributions using the ranking among positive labels, rather than treating them as equally important. This approach unifies ranking and classification tasks associated with MLR. Additionally, we address the challenges of scarcity and annotation bias in MLR datasets by introducing eight synthetic datasets (Ranked MNISTs) generated with varying significance-determining factors, providing an enriched and controllable experimental environment. We statistically demonstrate that our method accurately learns a representation of the positive rank order, which is consistent with the ground truth and proportional to the underlying significance values. Finally, we conduct comprehensive empirical experiments on both real-world and synthetic datasets, demonstrating the value of our proposed framework.


【2】Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL
标题:MPSoC板上的神经网络加速:集成SLAC的SNL、Rogue Software和Auto-SNL
链接:https://arxiv.org/abs/2508.21739

作者:aoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst
摘要:LCLS-II自由电子激光器(FEL)将产生用于束线实验的X射线脉冲,其速率高达1 MHz,探测器产生的数据吞吐量超过1 TB/s。管理如此大规模的数据流带来了巨大的挑战,因为传输和存储基础设施变得过于昂贵。机器学习(ML)为实时数据缩减提供了一个有前途的解决方案,但传统的实现会引入过多的延迟,使其不适合高速实验环境。为了应对这些挑战,SLAC开发了SLAC神经网络库(SNL),这是一个专门的框架,旨在将实时ML推理模型部署在现场可编程门阵列(FPGA)上。SNL的关键特性是能够动态更新模型权重,而无需FPGA重新综合,增强了自适应学习应用的灵活性。为了进一步增强可用性和可访问性,我们引入了Auto-SNL,这是一个Python扩展,它简化了将基于Python的神经网络模型转换为SNL兼容的高级综合代码的过程。本文针对Xilinx ZCU102 FPGA的多种神经网络架构、定点精度和综合配置,与当前最先进的工具hls4ml进行了基准比较。结果表明,SNL在大多数测试架构中实现了具有竞争力或更优的延迟,同时在某些情况下还节省了FPGA资源。这种适应性展示了SNL的多功能性,为高能物理、医学成像、机器人等领域的研究人员和学者提供了新的机会。
摘要:The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s. Managing such massive data streams presents significant challenges, as transmission and storage infrastructures become prohibitively expensive. Machine learning (ML) offers a promising solution for real-time data reduction, but conventional implementations introduce excessive latency, making them unsuitable for high-speed experimental environments. To address these challenges, SLAC developed the SLAC Neural Network Library (SNL), a specialized framework designed to deploy real-time ML inference models on Field-Programmable Gate Arrays (FPGA). SNL's key feature is the ability to dynamically update model weights without requiring FPGA resynthesis, enhancing flexibility for adaptive learning applications. To further enhance usability and accessibility, we introduce Auto-SNL, a Python extension that streamlines the process of converting Python-based neural network models into SNL-compatible high-level synthesis code. This paper presents a benchmark comparison against hls4ml, the current state-of-the-art tool, across multiple neural network architectures, fixed-point precisions, and synthesis configurations targeting a Xilinx ZCU102 FPGA. The results showed that SNL achieves competitive or superior latency in most tested architectures, while in some cases also offering FPGA resource savings. This adaptation demonstrates SNL's versatility, opening new opportunities for researchers and academics in fields such as high-energy physics, medical imaging, robotics, and many more.


【3】Trajectory learning for ensemble forecasts via the continuous ranked probability score: a Lorenz '96 case study
标题:通过连续排序概率得分进行集合预报的轨迹学习:Lorenz '96案例研究
链接:https://arxiv.org/abs/2508.21664

作者:ati, James Woodfield
备注:19 pages, 9 figures. All comments are welcome!
摘要:本文通过采用连续排序概率得分(CRPS)作为损失函数,论证了集合预报轨迹学习的可行性。以双尺度Lorenz '96系统作为案例研究,我们开发并训练了加性和乘性随机参数化来生成集合预测。结果表明,基于CRPS的轨迹学习产生的参数化既准确又锐利。由此产生的参数化易于校准,并且在短期预测中优于基于导数拟合的参数化。由于其在较短预报时效下的准确性,这种方法在数据同化应用中特别有前途。
摘要:This paper demonstrates the feasibility of trajectory learning for ensemble forecasts by employing the continuous ranked probability score (CRPS) as a loss function. Using the two-scale Lorenz '96 system as a case study, we develop and train both additive and multiplicative stochastic parametrizations to generate ensemble predictions. Results indicate that CRPS-based trajectory learning produces parametrizations that are both accurate and sharp. The resulting parametrizations are straightforward to calibrate and outperform derivative-fitting-based parametrizations in short-term forecasts. This approach is particularly promising for data assimilation applications due to its accuracy over short lead times.
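For reference, the standard sample estimator of the CRPS for an ensemble forecast, the loss function used for trajectory learning here, takes only a few lines (a generic sketch, not the authors' implementation):

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, estimated from ensemble members."""
    x = np.asarray(ensemble, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2

print(crps_ensemble([0.9, 1.1, 1.4, 0.7], obs=1.0))  # lower is better
```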


【4】Physics-Informed Spectral Modeling for Hyperspectral Imaging
标题:用于高光谱成像的物理信息光谱建模
链接:https://arxiv.org/abs/2508.21618

作者:awrysiak, Krzysztof Krawiec
摘要:我们提出了PhISM,这是一种基于物理学的深度学习架构,可以在没有监督的情况下学习,以明确地解开高光谱观测,并使用连续基函数对其进行建模。PhISM在几个分类和回归基准上优于先前的方法,需要有限的标记数据,并提供额外的见解,这要归功于可解释的潜在表示。
摘要:We present PhISM, a physics-informed deep learning architecture that learns without supervision to explicitly disentangle hyperspectral observations and model them with continuous basis functions. PhISM outperforms prior methods on several classification and regression benchmarks, requires limited labeled data, and provides additional insights thanks to interpretable latent representation.


【5】L3Cube-MahaSTS: A Marathi Sentence Similarity Dataset and Models
标题:L3 Cube-MahaSTS:马拉地语句子相似性数据集和模型
链接:https://arxiv.org/abs/2508.21569

作者: Mirashi, Ananya Joshi, Raviraj Joshi
摘要:我们提出了MahaSTS,一个人类注释的句子文本相似性(STS)数据集马拉地语,以及MahaSBERT-STS-v2,一个微调的句子BERT模型,优化了基于回归的相似性评分。MahaSTS数据集由16,860个马拉地语句子对组成,这些句子对的相似性分数在0-5之间。为了确保平衡的监督,数据集均匀分布在六个基于分数的桶中,跨越整个0-5范围,从而减少标签偏差并增强模型稳定性。我们在此数据集上微调了MahaSBERT模型,并将其性能与其他替代方案(如MahaBERT,MuRIL,IndicBERT和IndicSBERT)进行了比较。我们的实验表明,MahaSTS能够有效地训练马拉地语中的句子相似性任务,突出了人工策划的注释,有针对性的微调和结构化监督在低资源环境中的影响。数据集和模型在https://github.com/l3cube-pune/MarathiNLP上公开共享
摘要:We present MahaSTS, a human-annotated Sentence Textual Similarity (STS) dataset for Marathi, along with MahaSBERT-STS-v2, a fine-tuned Sentence-BERT model optimized for regression-based similarity scoring. The MahaSTS dataset consists of 16,860 Marathi sentence pairs labeled with continuous similarity scores in the range of 0-5. To ensure balanced supervision, the dataset is uniformly distributed across six score-based buckets spanning the full 0-5 range, thus reducing label bias and enhancing model stability. We fine-tune the MahaSBERT model on this dataset and benchmark its performance against other alternatives like MahaBERT, MuRIL, IndicBERT, and IndicSBERT. Our experiments demonstrate that MahaSTS enables effective training for sentence similarity tasks in Marathi, highlighting the impact of human-curated annotations, targeted fine-tuning, and structured supervision in low-resource settings. The dataset and model are publicly shared at https://github.com/l3cube-pune/MarathiNLP


【6】Limitations of Physics-Informed Neural Networks: a Study on Smart Grid Surrogation
标题:物理信息神经网络的局限性:智能电网替代研究
链接:https://arxiv.org/abs/2508.21559

作者:tero, Carmine Delle Femine, Kenji S. Muro, Marco Quartulli, Marcello Restelli
备注:Presented in PowerTech2025
摘要:物理信息神经网络(PINN)通过将物理定律直接集成到学习框架中,为智能电网建模提供了一种变革性的方法,解决了传统数据驱动方法中数据稀缺和物理一致性的关键挑战。本文评估了PINN作为智能电网动态代理模型的能力,在三个关键实验(插值、交叉验证和情景轨迹预测)中将其与XGBoost、随机森林和线性回归的性能进行了比较。通过仅使用基于物理的损失函数(强制功率平衡、运行约束和电网稳定性)来训练PINN,我们证明了它们优越的泛化能力,在误差降低方面优于数据驱动模型。值得注意的是,PINN在动态电网运行中保持相对较低的MAE,在随机和专家驱动的控制场景中都能可靠地捕获状态转换,而传统模型则表现不稳定。尽管在极端运行条件下性能略有下降,但PINN始终保持物理可行性,这对于安全关键型应用至关重要。我们的研究结果有助于确立PINN作为智能电网代理建模的范式转变工具,将数据驱动的灵活性与第一性原理的严谨性相结合。这项工作推进了实时电网控制和可扩展的数字孪生,强调了关键任务能源系统中物理感知架构的必要性。
摘要:Physics-Informed Neural Networks (PINNs) present a transformative approach for smart grid modeling by integrating physical laws directly into learning frameworks, addressing critical challenges of data scarcity and physical consistency in conventional data-driven methods. This paper evaluates PINNs' capabilities as surrogate models for smart grid dynamics, comparing their performance against XGBoost, Random Forest, and Linear Regression across three key experiments: interpolation, cross-validation, and episodic trajectory prediction. By training PINNs exclusively through physics-based loss functions (enforcing power balance, operational constraints, and grid stability) we demonstrate their superior generalization, outperforming data-driven models in error reduction. Notably, PINNs maintain comparatively lower MAE in dynamic grid operations, reliably capturing state transitions in both random and expert-driven control scenarios, while traditional models exhibit erratic performance. Despite slight degradation in extreme operational regimes, PINNs consistently enforce physical feasibility, proving vital for safety-critical applications. Our results contribute to establishing PINNs as a paradigm-shifting tool for smart grid surrogation, bridging data-driven flexibility with first-principles rigor. This work advances real-time grid control and scalable digital twins, emphasizing the necessity of physics-aware architectures in mission-critical energy systems.
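A physics-only training loss of the kind described, with no labeled targets, only penalties on power-balance residuals and operational limits, might look like the sketch below; the variable names and box constraints are illustrative assumptions, not the paper's formulation.

```python
import torch

def physics_loss(model, x, demand):
    """Train a grid surrogate by penalizing physics violations instead of fitting labels."""
    gen = model(x)                                     # predicted generation per node
    balance = ((gen.sum(dim=1) - demand) ** 2).mean()  # power-balance residual
    limits = torch.relu(gen - 1.0).mean() + torch.relu(-gen).mean()  # operational bounds
    return balance + limits
```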


【7】Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning
标题:先验知识很重要:解决贝叶斯深度Q学习中的错误指定
链接:https://arxiv.org/abs/2508.21488

作者: van der Vaart, Neil Yorke-Smith, Matthijs T.J. Spaan
摘要:强化学习中的不确定性量化可以大大提高探索性和鲁棒性。近似贝叶斯方法最近已经推广到量化无模型算法中的不确定性。然而,到目前为止,重点一直是提高后验近似的准确性,而不是研究后验的先验和似然假设的准确性。在这项工作中,我们证明了贝叶斯深度Q学习中存在冷后验效应,与理论相反,当降低后验温度时,性能会提高。为了识别和克服可能的原因,我们挑战对贝叶斯无模型算法中的可能性和先验知识所做的常见假设。我们实证研究先验分布,并通过统计检验表明,常见的高斯似然假设经常被违反。我们认为,开发更合适的似然和先验应该是未来贝叶斯强化学习研究的重点,我们为深度Q学习中更好的先验提供了简单,可实现的解决方案,从而导致更高性能的贝叶斯算法。
摘要:Uncertainty quantification in reinforcement learning can greatly improve exploration and robustness. Approximate Bayesian approaches have recently been popularized to quantify uncertainty in model-free algorithms. However, so far the focus has been on improving the accuracy of the posterior approximation, instead of studying the accuracy of the prior and likelihood assumptions underlying the posterior. In this work, we demonstrate that there is a cold posterior effect in Bayesian deep Q-learning, where contrary to theory, performance increases when reducing the temperature of the posterior. To identify and overcome likely causes, we challenge common assumptions made on the likelihood and priors in Bayesian model-free algorithms. We empirically study prior distributions and show through statistical tests that the common Gaussian likelihood assumption is frequently violated. We argue that developing more suitable likelihoods and priors should be a key focus in future Bayesian reinforcement learning research and we offer simple, implementable solutions for better priors in deep Q-learning that lead to more performant Bayesian algorithms.


【8】Rethinking Layer-wise Model Merging through Chain of Merges
标题:通过合并链重新思考逐层模型合并
链接:https://arxiv.org/abs/2508.21421

作者:zzega, Riccardo Salami, Angelo Porrello, Simone Calderara
摘要:微调预训练模型已成为在广泛领域实现最先进性能的标准途径,导致特定于任务的模型变体激增。随着这些专门模块数量的增加,将它们合并成一个统一的模型而无需重新训练已成为一个关键挑战。现有的合并技术通常依赖于干扰启发式、重要性加权或激活匹配,同时独立地处理每一层,从而无法考虑深度网络中固有的层间依赖性。这种简化会导致分布失配,尤其是在基于激活的方法中,当早期层的变化没有正确反映到下游层时。我们将这些失配识别为内部协变量偏移的一种形式,与神经网络训练初始阶段遇到的现象相当。为了解决这个问题,我们提出了合并链(CoM),这是一种逐层合并的过程,它以自回归的方式更新激活统计量,显式地考虑跨层交互。CoM通过一系列条件最优的更新产生一致的合并模型,有效地减轻了协变量偏移引起的退化。在标准基准上的实验表明,CoM达到了最先进的性能。
摘要:Fine-tuning pretrained models has become a standard pathway to achieve state-of-the-art performance across a wide range of domains, leading to a proliferation of task-specific model variants. As the number of such specialized modules increases, merging them into a unified model without retraining has become a critical challenge. Existing merging techniques often rely on interference heuristics, importance weighting, or activation matching while treating each layer independently, thereby failing to account for the inter-layer dependencies inherent in deep networks. This simplification leads to distributional mismatches, especially in activation-based methods, when changes in early layers are not properly reflected in downstream ones. We identify these mismatches as a form of internal covariate shift, comparable to the phenomenon encountered in the initial phases of neural network training. To address it, we propose Chain of Merges (CoM), a layer-wise merging procedure that updates activation statistics in an auto-regressive fashion, explicitly accounting for cross-layer interactions. CoM produces a coherent merged model through a series of conditionally optimal updates, effectively mitigating degradation caused by covariate shift. Experiments on standard benchmarks demonstrate that CoM achieves state-of-the-art performance.


【9】Benchmarking the State of Networks with a Low-Cost Method Based on Reservoir Computing
标题:利用基于水库计算的低成本方法对网络状态进行基准测试
链接:https://arxiv.org/abs/2508.21420

作者:on Reimers, Carl-Hendrik Peters, Stefano Nichele
备注:Net-Zero Future 2025 Conference
摘要:使用挪威移动网络利用率的数据,我们展示了一种非侵入性,低成本的方法来监测通信和移动网络状态的可能性。该方法将网络数据转换为水库计算框架内的模型,然后测量模型的代理任务的性能。通过实验,我们展示了这些代理的性能与网络状态的关系。这种方法的一个主要优点是,它使用现成的数据集,并利用水库计算框架的一个廉价的和很大程度上不可知的方法。来自移动网络利用率的数据以匿名、聚合的形式提供,每天有多个快照。这些数据可以被视为一个加权网络。水库计算允许使用加权但未经训练的网络作为机器学习工具。该网络初始化为所谓的回声状态网络(ESN),将传入信号投射到更高维的空间中,在该空间上,单个训练层进行操作。这比深度神经网络消耗更少的能量,在深度神经网络中,网络的每个权重都被训练。我们使用神经科学启发的任务,并训练我们的ESN模型来解决它们。然后,我们展示了性能如何取决于某些网络配置,以及当扰动网络时,性能如何明显下降。虽然这项工作作为概念验证,但我们相信它可以被提升用于近实时监控以及识别移动通信网络和交通网络的可能弱点。
摘要:Using data from mobile network utilization in Norway, we showcase the possibility of monitoring the state of communication and mobility networks with a non-invasive, low-cost method. This method transforms the network data into a model within the framework of reservoir computing and then measures the model's performance on proxy tasks. Experimentally, we show how the performance on these proxies relates to the state of the network. A key advantage of this approach is that it uses readily available data sets and leverages the reservoir computing framework for an inexpensive and largely agnostic method. Data from mobile network utilization is available in an anonymous, aggregated form with multiple snapshots per day. This data can be treated like a weighted network. Reservoir computing allows the use of weighted, but untrained networks as a machine learning tool. The network, initialized as a so-called echo state network (ESN), projects incoming signals into a higher dimensional space, on which a single trained layer operates. This consumes less energy than deep neural networks in which every weight of the network is trained. We use neuroscience inspired tasks and trained our ESN model to solve them. We then show how the performance depends on certain network configurations and also how it visibly decreases when perturbing the network. While this work serves as proof of concept, we believe it can be elevated to be used for near-real-time monitoring as well as the identification of possible weak spots of both mobile communication networks as well as transportation networks.
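The reservoir-computing machinery the method relies on can be illustrated with a generic echo state network: a fixed random recurrent network scaled to spectral radius below one, with only a ridge-regression readout trained. This sketch shows the mechanism, not the paper's monitoring pipeline or its neuroscience-inspired proxy tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200

# fixed, untrained reservoir; only the readout W_out is learned
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state property)

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.asarray(states)

# train the readout with ridge regression on a toy 1-step-ahead prediction task
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
X, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```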


【10】PMODE: Theoretically Grounded and Modular Mixture Modeling
标题:PMODE:有理论依据的模块化混合建模
链接:https://arxiv.org/abs/2508.21396

作者: Vandermeulen
摘要:我们介绍PMODE(分区密度估计器混合),这是一个通用且模块化的混合建模框架,支持参数化和非参数化成分。PMODE通过划分数据并对每个子集拟合单独的估计器来构建混合模型。它对该估计器类达到了接近最优的速率,并且即使混合成分来自不同的分布族也保持有效。作为一个应用,我们开发了MV-PMODE,它将先前的理论方法扩展到具有数千个维度的高维密度估计场景。尽管方法简单,它在CIFAR-10异常检测上与深度基线相比具有竞争力。
摘要:We introduce PMODE (Partitioned Mixture Of Density Estimators), a general and modular framework for mixture modeling with both parametric and nonparametric components. PMODE builds mixtures by partitioning the data and fitting separate estimators to each subset. It attains near-optimal rates for this estimator class and remains valid even when the mixture components come from different distribution families. As an application, we develop MV-PMODE, which scales a previously theoretical approach to high-dimensional density estimation to settings with thousands of dimensions. Despite its simplicity, it performs competitively against deep baselines on CIFAR-10 anomaly detection.
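The partition-then-fit construction can be sketched in a few lines. The k-means partitioning and KDE components below are illustrative choices on our part, since the abstract fixes neither:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity

def fit_pmode(X, n_parts=4):
    """Partition the data, fit one density estimator per block, and mix the
    blocks with weights proportional to their sizes."""
    labels = KMeans(n_clusters=n_parts, n_init=10, random_state=0).fit_predict(X)
    parts = [(KernelDensity(bandwidth=0.5).fit(X[labels == k]), np.mean(labels == k))
             for k in range(n_parts)]
    def log_density(x):
        comps = np.stack([est.score_samples(x) + np.log(w) for est, w in parts])
        return np.logaddexp.reduce(comps, axis=0)   # log of the mixture density
    return log_density

X = np.random.default_rng(0).normal(size=(500, 2))
print(fit_pmode(X)(X[:3]))
```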


【11】MyGO: Memory Yielding Generative Offline-consolidation for Lifelong Learning Systems
标题:MyGO:终身学习系统的记忆生成性离线整合
链接:https://arxiv.org/abs/2508.21296

作者:, Zihui Song
备注:5 pages
摘要:持续或终身学习旨在开发能够从一系列任务中获取新知识的模型,而不会灾难性地忘记以前学到的知识。现有的方法通常依赖于存储来自先前任务的样本(经验重放)或采用复杂的正则化项来保护学习的权重。然而,这些方法面临着与数据隐私、存储限制以及任务差异较大时性能下降相关的挑战。为了应对这些挑战,我们引入了MyGO(Memory Yielding Generative Offline-consolidation),这是一种受生物唤醒-睡眠周期启发的新型终身学习框架。在"唤醒"阶段,系统快速学习新任务,并训练一个紧凑的生成模型(生成记忆,G-mem)来捕获其数据分布。在"睡眠"阶段,系统进入离线状态,使用所有已学习的G-mem模型生成伪数据("梦境"),并通过知识蒸馏将新旧知识整合到核心特征提取器中。这种方法消除了存储任何原始数据的需要,仅保留紧凑的生成模型,这在隐私和存储效率方面具有显著优势。我们在计算机视觉(Split-MNIST)和自然语言处理(Split-AG News)基准上评估了MyGO,并将其与顺序微调基线进行了比较。结果表明,MyGO显著减轻了灾难性遗忘,并在任务中保持了较高的平均准确率,证明了该框架的有效性和领域通用性。
摘要:Continual or Lifelong Learning aims to develop models capable of acquiring new knowledge from a sequence of tasks without catastrophically forgetting what has been learned before. Existing approaches often rely on storing samples from previous tasks (experience replay) or employing complex regularization terms to protect learned weights. However, these methods face challenges related to data privacy, storage limitations, and performance degradation when tasks are dissimilar. To address these challenges, we introduce MyGO (Memory Yielding Generative Offline-consolidation), a novel lifelong learning framework inspired by the biological wake-sleep cycle. During the "wake" phase, the system rapidly learns a new task and trains a compact generative model (Generative Memory, G-mem) to capture its data distribution. During the "sleep" phase, the system enters an offline state, using all learned G-mem models to generate pseudo-data ("dreams") and consolidate new and old knowledge into a core feature extractor via knowledge distillation. This approach obviates the need to store any raw data, retaining only compact generative models, which offers significant advantages in privacy and storage efficiency. We evaluate MyGO on computer vision (Split-MNIST) and natural language processing (Split-AG News) benchmarks, comparing it against a sequential fine-tuning baseline. The results demonstrate that MyGO significantly mitigates catastrophic forgetting and maintains high average accuracy across tasks, proving the framework's effectiveness and domain-generality.
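A sketch of one "sleep"-phase consolidation step as described above: pseudo-data sampled from each stored generative memory is used to distill the previous model's soft predictions into the shared feature extractor. The G-mem interface (a .sample() method) and the distillation temperature are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def sleep_phase_step(student, teacher, g_mems, batch_size=64, T=2.0):
    """Knowledge-distillation loss over 'dreams' generated by all G-mem models."""
    loss = 0.0
    for g in g_mems:
        x = g.sample(batch_size)                    # pseudo-data ("dreams")
        with torch.no_grad():
            target = F.softmax(teacher(x) / T, dim=-1)
        log_p = F.log_softmax(student(x) / T, dim=-1)
        loss = loss + F.kl_div(log_p, target, reduction="batchmean") * T * T
    return loss / len(g_mems)
```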


【12】A Mixture of Experts Gating Network for Enhanced Surrogate Modeling in External Aerodynamics
标题:用于增强外部空气动力学代理建模的混合专家门控网络
链接:https://arxiv.org/abs/2508.21249

作者:Amin Nabian, Sanjay Choudhry
摘要:与高保真CFD模拟相关的计算成本仍然是汽车设计和优化周期中的一个重要瓶颈。虽然基于ML的代理模型已成为加速空气动力学预测的一种有前途的替代方案,但该领域的特点是专业神经网络架构多样且快速演进,没有单一模型表现出普遍的优越性。本文介绍了一种新的元学习框架,将这种架构多样性转化为优势。我们提出了一种专家混合(MoE)模型,该模型采用专用的门控网络来动态地、最优地组合来自三个异构的最先进代理模型的预测:DoMINO,一种可分解的多尺度神经算子;X-MeshGraphNet,一种可扩展的多尺度图神经网络;以及FigConvNet,一种因子化的隐式全局卷积网络。门控网络学习一种随空间变化的加权策略,根据每个专家在预测表面压力和壁面剪切应力场方面的局部性能为其分配可信度。为了防止模型崩溃并鼓励平衡的专家贡献,我们将熵正则化项集成到训练损失函数中。整个系统在DrivAerML数据集上进行训练和验证,这是一个大规模的汽车空气动力学高保真CFD模拟公共基准。定量结果表明,MoE模型显著降低了L-2预测误差,在所有评估的物理量上不仅优于集成平均,而且优于最准确的单个专家模型。这项工作确立了MoE框架作为一种强大而有效的策略,通过协同结合专门架构的互补优势来创建更鲁棒、更准确的复合代理模型。
摘要:The computational cost associated with high-fidelity CFD simulations remains a significant bottleneck in the automotive design and optimization cycle. While ML-based surrogate models have emerged as a promising alternative to accelerate aerodynamic predictions, the field is characterized by a diverse and rapidly evolving landscape of specialized neural network architectures, with no single model demonstrating universal superiority. This paper introduces a novel meta-learning framework that leverages this architectural diversity as a strength. We propose a Mixture of Experts (MoE) model that employs a dedicated gating network to dynamically and optimally combine the predictions from three heterogeneous, state-of-the-art surrogate models: DoMINO, a decomposable multi-scale neural operator; X-MeshGraphNet, a scalable multi-scale graph neural network; and FigConvNet, a factorized implicit global convolution network. The gating network learns a spatially-variant weighting strategy, assigning credibility to each expert based on its localized performance in predicting surface pressure and wall shear stress fields. To prevent model collapse and encourage balanced expert contributions, we integrate an entropy regularization term into the training loss function. The entire system is trained and validated on the DrivAerML dataset, a large-scale, public benchmark of high-fidelity CFD simulations for automotive aerodynamics. Quantitative results demonstrate that the MoE model achieves a significant reduction in L-2 prediction error, outperforming not only the ensemble average but also the most accurate individual expert model across all evaluated physical quantities. This work establishes the MoE framework as a powerful and effective strategy for creating more robust and accurate composite surrogate models by synergistically combining the complementary strengths of specialized architectures.
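The entropy regularizer that discourages gate collapse can be written directly into the training loss, as in this sketch; the task loss and the coefficient lam are illustrative, not the paper's values.

```python
import torch

def moe_loss(pred, target, gate_weights, lam=0.01):
    """Task loss minus an entropy bonus on the gate: high-entropy (balanced)
    expert weights are rewarded, preventing collapse onto a single expert."""
    task = torch.mean((pred - target) ** 2)
    entropy = -(gate_weights * torch.log(gate_weights + 1e-8)).sum(dim=-1).mean()
    return task - lam * entropy
```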


【13】Population-Scale Network Embeddings Expose Educational Divides in Network Structure Related to Right-Wing Populist Voting
标题:人口规模网络嵌入暴露了与右翼民粹主义投票相关的网络结构中的教育鸿沟
链接:https://arxiv.org/abs/2508.21236

作者:en (1 and 2 and 3), Javier Garcia-Bernardo (4), Sreeparna Deb (5), Flavio Hafner (1 and 3), Megha Khosla (5) ((1) Netherlands eScience Center, (2) University of Amsterdam, (3) Erasmus University Rotterdam, (4) Utrecht University, (5) Delft University of Technology)
备注:31 pages, 6 figures, Supplementary Materials available at this https URL
摘要:行政登记数据可用于构建人口规模的网络,其联系反映了人与人之间共享的社会背景。通过机器学习,这样的网络可以被编码成数字表示-嵌入-自动捕获个人在网络中的位置。我们从一个人口规模的网络中为荷兰人口中的所有人创建了嵌入,该网络代表了五个共享环境:邻里,工作,家庭,家庭和学校。为了评估这些嵌入的信息量,我们用它们来预测右翼民粹主义者的投票。仅嵌入预测右翼民粹主义投票高于机会水平,但表现不如个人特征。将嵌入的最佳子集与个体特征相结合只能略微改善预测。然而,在对嵌入进行变换以使其维度更加稀疏和正交之后,我们发现一个嵌入维度与结果密切相关。将这一维度映射回人口网络,揭示了不同学校关系和教育水平之间与右翼民粹主义投票相关的网络结构差异。我们的研究通过展示如何解释人口规模的网络嵌入,并通过将教育中的结构性网络差异与右翼民粹主义投票联系起来,在方法上做出了贡献。
摘要:Administrative registry data can be used to construct population-scale networks whose ties reflect shared social contexts between persons. With machine learning, such networks can be encoded into numerical representations -- embeddings -- that automatically capture individuals' position within the network. We created embeddings for all persons in the Dutch population from a population-scale network that represents five shared contexts: neighborhood, work, family, household, and school. To assess the informativeness of these embeddings, we used them to predict right-wing populist voting. Embeddings alone predicted right-wing populist voting above chance-level but performed worse than individual characteristics. Combining the best subset of embeddings with individual characteristics only slightly improved predictions. However, after transforming the embeddings to make their dimensions more sparse and orthogonal, we found that one embedding dimension was strongly associated with the outcome. Mapping this dimension back to the population network revealed differences in network structure related to right-wing populist voting between different school ties and achieved education levels. Our study contributes methodologically by demonstrating how population-scale network embeddings can be made interpretable, and substantively by linking structural network differences in education to right-wing populist voting.


【14】Model-Task Alignment Drives Distinct RL Outcomes
标题:模型任务一致性推动独特的RL结果
链接:https://arxiv.org/abs/2508.21188

作者: Cheng Wang, Wenshuo Zhao, Junxian He
摘要:将强化学习(RL)应用于大型语言模型(LLM)的最新进展已经取得了实质性进展。特别是,一系列显着的,但往往违反直觉的现象已被报告在LLM中,表现出的模式通常不会在传统的RL设置中观察到。例如,值得注意的主张包括,单个训练示例可以匹配整个数据集所实现的性能,奖励信号不需要非常准确,并且仅使用负样本的训练可以匹配甚至超过复杂的基于奖励的方法。然而,这些观测结果成立的确切条件--以及关键的是,它们何时失败--仍然不清楚。在这项工作中,我们确定了区分RL观察的一个关键因素:预训练的模型是否已经表现出很强的模型-任务对齐,这是通过评估任务的pass@k准确性来衡量的。通过对一系列违反直觉的主张进行系统和全面的检查,并在不同的模型架构和任务领域进行严格的实验验证,我们的研究结果表明,虽然标准RL训练在不同的设置中保持一致的鲁棒性,但只有当模型和任务已经表现出很强的模型-任务对齐时,才会出现许多违反直觉的结果。相比之下,这些技术无法在更具挑战性的制度中推动实质性的学习,在这些制度中,标准RL方法仍然有效。
摘要:Recent advances in applying reinforcement learning (RL) to large language models (LLMs) have led to substantial progress. In particular, a series of remarkable yet often counterintuitive phenomena have been reported in LLMs, exhibiting patterns not typically observed in traditional RL settings. For example, notable claims include that a single training example can match the performance achieved with an entire dataset, that the reward signal does not need to be very accurate, and that training solely with negative samples can match or even surpass sophisticated reward-based methods. However, the precise conditions under which these observations hold - and, critically, when they fail - remain unclear. In this work, we identify a key factor that differentiates RL observations: whether the pretrained model already exhibits strong Model-Task Alignment, as measured by pass@k accuracy on the evaluated task. Through a systematic and comprehensive examination of a series of counterintuitive claims, supported by rigorous experimental validation across different model architectures and task domains, our findings show that while standard RL training remains consistently robust across settings, many of these counterintuitive results arise only when the model and task already exhibit strong model-task alignment. In contrast, these techniques fail to drive substantial learning in more challenging regimes, where standard RL methods remain effective.
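Model-task alignment is measured via pass@k; for completeness, the standard unbiased estimator from n samples with c correct is sketched below.

```python
from math import comb

def pass_at_k(n, c, k):
    """P(at least one of k drawn samples is correct) = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=10, k=8))  # alignment proxy for one task at k = 8
```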


【15】Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks
标题:深度残差回声状态网络:探索未训练的递归神经网络中的残差正交连接
链接:https://arxiv.org/abs/2508.21172

作者:nna, Andrea Ceni, Claudio Gallicchio
备注:10 pages, 6 figures
摘要:回声状态网络(ESN)是水库计算(RC)框架内的一种特殊类型的未经训练的递归神经网络(RNN),因其快速高效的学习而流行。然而,传统的ESN往往难以处理长期信息。在本文中,我们介绍了一类新的基于时间残差连接的深度未训练RNN,称为深度残差回声状态网络(DeepResESN)。我们表明,利用未训练的残差递归层的层次结构可以显著提升记忆容量和长期时间建模能力。对于时间残差连接,我们考虑了不同的正交配置,包括随机生成和固定结构的配置,并研究了它们对网络动力学的影响。深入的数学分析给出了确保DeepResESN内部动态稳定的必要和充分条件。我们在各种时间序列任务上的实验展示了所提出的方法相对于传统浅层和深层RC的优势。
摘要:Echo State Networks (ESNs) are a particular type of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) framework, popular for their fast and efficient learning. However, traditional ESNs often struggle with long-term information processing. In this paper, we introduce a novel class of deep untrained RNNs based on temporal residual connections, called Deep Residual Echo State Networks (DeepResESNs). We show that leveraging a hierarchy of untrained residual recurrent layers significantly boosts memory capacity and long-term temporal modeling. For the temporal residual connections, we consider different orthogonal configurations, including randomly generated and fixed-structure configurations, and we study their effect on network dynamics. A thorough mathematical analysis outlines necessary and sufficient conditions to ensure stable dynamics within DeepResESN. Our experiments on a variety of time series tasks showcase the advantages of the proposed approach over traditional shallow and deep RC.


【16】Data-Driven Bifurcation Handling in Physics-Based Reduced-Order Vascular Hemodynamic Models
标题:基于物理的降阶血管血流动力学模型中的数据驱动分歧处理
链接:https://arxiv.org/abs/2508.21165

作者:. Rubio, Eric F. Darve, Alison L. Marsden
备注:32 pages, 13 figures
摘要:心血管血流的三维(3D)有限元模拟提供了高保真度的预测,以支持心血管医学,但其高计算成本限制了临床实用性。降阶模型(ROM)提供了计算效率高的替代方案,但精度降低,特别是在血管分叉处,标准泊肃叶(Poiseuille)流假设不足以捕获复杂的流动物理。我们提出了一个增强的数值框架,将机器学习预测的分叉系数集成到零维(0D)血流动力学ROM中,以提高准确性,同时保持计算效率。我们开发了一个电阻-电阻-电感(RRI)模型,该模型使用神经网络从分叉几何形状预测压力-流量关系,将线性和二次电阻与电感效应结合在一起。该方法采用无量纲化以减少训练数据需求,并采用先验的流量分配预测来改善分叉表征。我们使用基于优化的求解策略将RRI模型整合到0D模型中。我们在孤立分叉和血管树中验证了该方法,雷诺数从0到5,500,通过与3D有限元模拟进行比较来定义ROM精度。结果表明,精度得到了显著提高:对所有血管树和雷诺数取平均,RRI方法将入口压力误差从标准0D模型的54 mmHg(45%)降低到25 mmHg(17%),而简化的电阻-电感(RI)变体达到31 mmHg(26%)的误差。增强的0D模型在高雷诺数和大型血管网络中表现尤为有效。这种混合数值方法可以实现准确、实时的血流动力学建模,用于心血管生物医学工程中的临床决策支持、不确定性量化和数字孪生。
摘要:Three-dimensional (3D) finite-element simulations of cardiovascular flows provide high-fidelity predictions to support cardiovascular medicine, but their high computational cost limits clinical practicality. Reduced-order models (ROMs) offer computationally efficient alternatives but suffer reduced accuracy, particularly at vessel bifurcations where complex flow physics are inadequately captured by standard Poiseuille flow assumptions. We present an enhanced numerical framework that integrates machine learning-predicted bifurcation coefficients into zero-dimensional (0D) hemodynamic ROMs to improve accuracy while maintaining computational efficiency. We develop a resistor-resistor-inductor (RRI) model that uses neural networks to predict pressure-flow relationships from bifurcation geometry, incorporating linear and quadratic resistances along with inductive effects. The method employs non-dimensionalization to reduce training data requirements and a priori flow split prediction for improved bifurcation characterization. We incorporate the RRI model into a 0D model using an optimization-based solution strategy. We validate the approach in isolated bifurcations and vascular trees, across Reynolds numbers from 0 to 5,500, defining ROM accuracy by comparison to 3D finite element simulation. Results demonstrate substantial accuracy improvements: averaged across all trees and Reynolds numbers, the RRI method reduces inlet pressure errors from 54 mmHg (45%) for standard 0D models to 25 mmHg (17%), while a simplified resistor-inductor (RI) variant achieves 31 mmHg (26%) error. The enhanced 0D models show particular effectiveness at high Reynolds numbers and in extensive vascular networks. This hybrid numerical approach enables accurate, real-time hemodynamic modeling for clinical decision support, uncertainty quantification, and digital twins in cardiovascular biomedical engineering.


【17】R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
标题:R-4B:通过双模式退火和强化学习激励MLLM中的通用自动思考能力
链接:https://arxiv.org/abs/2508.21113

作者:, Qi Yang, Bolin Ni, Shiming Xiang, Han Hu, Houwen Peng
备注:20 pages, 14 figures, 5 tables
摘要:多模态大型语言模型(MLLM)具有逐步思维能力,在复杂的推理问题上表现出了卓越的性能。然而,这种思考过程对于无需复杂推理即可解决的简单问题来说是多余的。为了解决这种效率低下的问题,我们提出了R-4B,一种自动思考的MLLM,它可以根据问题的复杂性自适应地决定何时思考。R-4B的核心思想是使用双模式退火赋予模型思考和非思考能力,并应用双模式策略优化(BPO)来提高模型在确定是否激活思考过程时的准确性。具体来说,我们首先在一个精心策划的涵盖各种主题的数据集上训练模型,该数据集包含来自思维和非思维模式的样本。然后,在改进的GRPO框架下进行第二阶段的训练,其中策略模型被迫为每个输入查询生成来自两种模式的响应。实验结果表明,R-4B在25个具有挑战性的基准测试中达到了最先进的性能。它在大多数任务中的性能优于Qwen2.5-VL-7B,并在推理密集型基准测试中以较低的计算成本实现了与Kimi-VL-A3B-Thinking-2506(16B)等大型模型相当的性能。
摘要:Multimodal Large Language Models (MLLMs) equipped with step-by-step thinking capabilities have demonstrated remarkable performance on complex reasoning problems. However, this thinking process is redundant for simple problems solvable without complex reasoning. To address this inefficiency, we propose R-4B, an auto-thinking MLLM, which can adaptively decide when to think based on problem complexity. The central idea of R-4B is to empower the model with both thinking and non-thinking capabilities using bi-mode annealing, and apply Bi-mode Policy Optimization (BPO) to improve the model's accuracy in determining whether to activate the thinking process. Specifically, we first train the model on a carefully curated dataset spanning various topics, which contains samples from both thinking and non-thinking modes. Then it undergoes a second phase of training under an improved GRPO framework, where the policy model is forced to generate responses from both modes for each input query. Experimental results show that R-4B achieves state-of-the-art performance across 25 challenging benchmarks. It outperforms Qwen2.5-VL-7B in most tasks and achieves performance comparable to larger models such as Kimi-VL-A3B-Thinking-2506 (16B) on reasoning-intensive benchmarks with lower computational cost.


【18】Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models
标题:用于训练广义线性模型的全矩阵预处理器的动态低秩逼近
链接:https://arxiv.org/abs/2508.21106

作者:atveeva, Aleksandr Katrutsa, Evgeny Frolov
摘要:自适应梯度方法,如Adagrad及其变体,在大规模优化中被广泛使用。然而,它们使用对角预处理矩阵,限制了捕获参数相关性的能力。全矩阵自适应方法通过近似精确的Hessian,可以建模这些相关性,并可能实现更快的收敛。同时,它们的计算和内存成本对于大规模模型来说往往是令人望而却步的。为了解决这一限制,我们提出了AdaGram,一种实现高效全矩阵自适应梯度更新的优化器。为了减少内存和计算开销,我们利用快速对称分解在每次迭代中计算预处理后的更新方向。此外,我们使用矩阵积分器方法沿优化轨迹保持预处理器的低秩结构。在标准机器学习任务上的数值实验表明,在使用秩为5及更小的低秩近似时,AdaGram的收敛速度更快,或与对角自适应优化器的性能相当。这证明了AdaGram作为大型模型自适应优化的可扩展解决方案的潜力。
摘要:Adaptive gradient methods like Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits the ability to capture parameter correlations. Full-matrix adaptive methods, approximating the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we utilize fast symmetric factorization for computing the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of a preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster or matches the performance of diagonal adaptive optimizers when using rank five and smaller rank approximations. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.
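For contrast with AdaGram's low-rank machinery, the exact full-matrix Adagrad step that it approximates accumulates gradient outer products and preconditions by G^{-1/2}; this naive, cubic-cost-per-step baseline is a sketch, not the AdaGram algorithm itself.

```python
import numpy as np

def full_matrix_adagrad_step(w, g, G, lr=0.1, eps=1e-8):
    """One exact full-matrix Adagrad update (the expensive baseline)."""
    G += np.outer(g, g)             # accumulate gradient correlations
    vals, vecs = np.linalg.eigh(G)  # symmetric eigendecomposition
    G_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return w - lr * G_inv_sqrt @ g, G
```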


【19】Surface Stability Modeling with Universal Machine Learning Interatomic Potentials: A Comprehensive Cleavage Energy Benchmarking Study
标题:利用通用机器学习原子间势进行表面稳定性建模:全面的解理能基准研究
链接:https://arxiv.org/abs/2508.21663

作者:ehdizadeh, Peter Schindler
备注:70 pages total (main paper + supplementary information), 4 figures in main text, multiple supplementary figures and tables
摘要:机器学习原子间势(MLIP)通过弥合量子力学精度与经典模拟效率之间的差距,彻底改变了计算材料科学,实现了对整个元素周期表材料性质前所未有的探索。尽管它们在预测体相性质方面取得了显著成功,但尚无系统评估考察这些通用MLIP(uMLIP)预测解理能的能力,而解理能是支配断裂、催化、表面稳定性和界面现象的关键性质。在这里,我们利用此前建立的密度泛函理论(DFT)数据库(包含36,718个板层结构,涵盖单质、二元和三元金属化合物),对19个最先进的uMLIP的解理能预测进行了全面基准测试。我们评估了多种架构范式,分析了它们在化学成分、晶系、厚度和表面取向上的表现。结果表明,训练数据的构成比架构的复杂程度更重要:在强调非平衡构型的Open Materials 2024(OMat24)数据集上训练的模型,平均绝对百分比误差低于6%,并且在87%的情况下正确识别出热力学最稳定的表面端接,而无需任何显式的表面能训练。相比之下,仅在平衡数据集上训练的架构相同的模型误差高出5倍,而在表面-吸附物数据上训练的模型则灾难性失败,退化达17倍。值得注意的是,在合适数据上训练的更简单架构可以达到与复杂Transformer相当的精度,同时提供10-100倍的计算加速。这些发现表明,社区应专注于能捕捉相关物理现象的策略性训练数据生成。
摘要:Machine learning interatomic potentials (MLIPs) have revolutionized computational materials science by bridging the gap between quantum mechanical accuracy and classical simulation efficiency, enabling unprecedented exploration of materials properties across the periodic table. Despite their remarkable success in predicting bulk properties, no systematic evaluation has assessed how well these universal MLIPs (uMLIPs) can predict cleavage energies, a critical property governing fracture, catalysis, surface stability, and interfacial phenomena. Here, we present a comprehensive benchmark of 19 state-of-the-art uMLIPs for cleavage energy prediction using our previously established density functional theory (DFT) database of 36,718 slab structures spanning elemental, binary, and ternary metallic compounds. We evaluate diverse architectural paradigms, analyzing their performance across chemical compositions, crystal systems, thickness, and surface orientations. Our results reveal that training data composition dominates architectural sophistication: models trained on the Open Materials 2024 (OMat24) dataset, which emphasizes non-equilibrium configurations, achieve mean absolute percentage errors below 6% and correctly identify the thermodynamically most stable surface terminations in 87% of cases, without any explicit surface energy training. In contrast, architecturally identical models trained on equilibrium-only datasets show five-fold higher errors, while models trained on surface-adsorbate data fail catastrophically with a 17-fold degradation. Remarkably, simpler architectures trained on appropriate data achieve comparable accuracy to complex transformers while offering 10-100x computational speedup. These findings show that the community should focus on strategic training data generation that captures the relevant physical phenomena.


【20】Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling
标题:来自随机测度的加权支持点:生成建模的可解释替代方案
链接:https://arxiv.org/abs/2508.21255

作者:o, Carlos E. Rodríguez, Ramsés H. Mena, Stephen G. Walker
备注:24 pages, 6 figures
摘要:支持点通过一组较小的代表性点来总结大型数据集,这些代表性点可用于数据操作,例如Monte Carlo积分,而无需访问完整的数据集。从这个意义上说,支持点提供了原始数据的紧凑但信息丰富的表示。我们在这一思想的基础上,引入了一个基于随机加权支持点的生成建模框架,其随机性来自受Dirichlet过程和贝叶斯自助法启发的加权方案。所提出的方法从固定的数据集生成多样化且可解释的样本集,而不依赖于概率建模假设或神经网络架构。我们给出了该方法的理论表述,并开发了一个基于凸-凹过程(CCP)的高效优化算法。在MNIST和CelebA-HQ数据集上的实证结果表明,我们的方法产生了高质量和多样化的输出,其计算成本仅为黑盒替代方案(如生成对抗网络(GANs)或去噪扩散概率模型(DDPMs))的一小部分。这些结果表明,随机加权支持点提供了一种原则性的、可扩展的、可解释的生成建模替代方案。一个关键特性是它们能够产生真正的插值样本,从而保留底层数据结构。
摘要:Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer a compact yet informative representation of the original data. We build on this idea to introduce a generative modeling framework based on random weighted support points, where the randomness arises from a weighting scheme inspired by the Dirichlet process and the Bayesian bootstrap. The proposed method generates diverse and interpretable sample sets from a fixed dataset, without relying on probabilistic modeling assumptions or neural network architectures. We present the theoretical formulation of the method and develop an efficient optimization algorithm based on the Convex--Concave Procedure (CCP). Empirical results on the MNIST and CelebA-HQ datasets show that our approach produces high-quality and diverse outputs at a fraction of the computational cost of black-box alternatives such as Generative Adversarial Networks (GANs) or Denoising Diffusion Probabilistic Models (DDPMs). These results suggest that random weighted support points offer a principled, scalable, and interpretable alternative for generative modeling. A key feature is their ability to produce genuinely interpolative samples that preserve underlying data structure.
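The Bayesian-bootstrap randomness is easy to make concrete: each draw is a flat Dirichlet weight vector over the data points, defining a new random measure to which weighted support points are then fitted (the CCP fitting step itself is omitted from this sketch).

```python
import numpy as np

def bayesian_bootstrap_weights(n, seed=0):
    """One draw of Bayesian-bootstrap weights: Dirichlet(1, ..., 1) over n points."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(n))

w = bayesian_bootstrap_weights(1000)
print(w.sum(), w.max())  # weights sum to 1; each seed yields a new random measure
```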


【21】Quantum-inspired probability metrics define a complete, universal space for statistical learning
标题:量子启发的概率度量为统计学习定义了一个完备的通用空间
链接:https://arxiv.org/abs/2508.21086

作者:McCarty
备注:42 pages, 1 figure
摘要:比较概率分布是自然科学、社会科学和计算科学的核心挑战。现有的方法,如最大均值差异(MMD),在高维和非紧域上表现不佳。在这里,我们引入量子概率度量(QPM),它通过将概率测度嵌入量子态空间(即希尔伯特空间上的正的单位迹算子)导出。这种构造扩展了基于核的方法,克服了MMD在非紧空间上的不完备性。作为一种积分概率度量(IPM),QPM具有对偶函数,一致逼近$\mathbb{R}^n$上的所有有界、一致连续函数,从而增强了对高维空间中细微分布差异的敏感性。对于经验分布,QPM很容易使用特征值方法计算,其解析梯度适合于学习和优化。虽然对于大样本量,QPM的计算量更大($O(n^3)$ vs. $O(n^2)$),但QPM作为MMD的直接替代可以显著提高性能,正如经典的生成建模任务所证明的那样。通过将量子力学丰富的数学框架与经典概率论相结合,这种方法为分析和操作概率测度的强大工具奠定了基础。
摘要:Comparing probability distributions is a core challenge across the natural, social, and computational sciences. Existing methods, such as Maximum Mean Discrepancy (MMD), struggle in high-dimensional and non-compact domains. Here we introduce quantum probability metrics (QPMs), derived by embedding probability measures in the space of quantum states: positive, unit-trace operators on a Hilbert space. This construction extends kernel-based methods and overcomes the incompleteness of MMD on non-compact spaces. Viewed as an integral probability metric (IPM), QPMs have dual functions that uniformly approximate all bounded, uniformly continuous functions on $\mathbb{R}^n$, offering enhanced sensitivity to subtle distributional differences in high dimensions. For empirical distributions, QPMs are readily calculated using eigenvalue methods, with analytic gradients suited for learning and optimization. Although computationally more intensive for large sample sizes ($O(n^3)$ vs. $O(n^2)$), QPMs can significantly improve performance as a drop-in replacement for MMD, as demonstrated in a classic generative modeling task. By combining the rich mathematical framework of quantum mechanics with classical probability theory, this approach lays the foundation for powerful tools to analyze and manipulate probability measures.
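The embedding idea can be sketched directly: an empirical distribution becomes a density matrix (positive, unit trace) by averaging outer products of normalized feature vectors, and a metric is then computed from eigenvalues. The feature map and the trace-distance choice below are illustrative assumptions; the paper's exact metric family may differ.

```python
import numpy as np

def quantum_state(X, feature_map):
    """Embed samples as a density matrix: mean outer product of unit feature vectors."""
    Phi = feature_map(X)
    Phi = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    return (Phi[:, :, None] * Phi[:, None, :]).mean(axis=0)  # unit trace by construction

def trace_distance(rho1, rho2):
    """Half the trace norm of the difference, via an eigenvalue method."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho1 - rho2)).sum()

rng = np.random.default_rng(0)
fm = lambda X: np.hstack([X, X ** 2])  # illustrative polynomial feature map
rho_a = quantum_state(rng.normal(size=(200, 3)), fm)
rho_b = quantum_state(rng.normal(loc=0.5, size=(200, 3)), fm)
print(trace_distance(rho_a, rho_b))
```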


其他(11篇)

【1】Benchmarking GPT-5 in Radiation Oncology: Measurable Gains, but Persistent Need for Expert Oversight
标题:放射肿瘤学中的GPT-5基准:可测量的收益,但持续需要专家监督
链接:https://arxiv.org/abs/2508.21777

作者:, Jibak Sarkar, Philipp Schubert, Sabine Semrau, Thomas Weissmann, Andre Karius, Johann Brand, Bernd-Niklas Axer, Ahmed Gomaa, Pluvio Stephan, Ishita Sheth, Sogand Beirami, Annette Schwarz, Udo Gaipl, Benjamin Frey, Christoph Bert, Stefanie Corradini, Rainer Fietkau, Florian Putz
备注:Under review in Frontiers in Artificial Intelligence
摘要:大型语言模型(LLM)在临床决策支持方面显示出巨大的潜力。GPT-5是一种新型LLM系统,专门面向肿瘤学用途进行推广。   研究方法:使用两个互补的基准评估性能:(i)ACR放射肿瘤学培训考试(TXIT,2021),包括300道多项选择题,以及(ii)一组60个真实的放射肿瘤学临床病例小品(vignette),代表不同的疾病部位和治疗适应症。在病例小品评估中,指示GPT-5生成简明的治疗计划。四位获得委员会认证的放射肿瘤学家对正确性、全面性和幻觉进行了评分。使用Fleiss' kappa对评分者间信度进行量化。   结果:在TXIT基准测试中,GPT-5的平均准确率为92.8%,优于GPT-4(78.8%)和GPT-3.5(62.1%)。领域特异性增益在剂量和诊断中最为明显。在病例小品评估中,GPT-5的治疗建议在正确性(平均值3.24/4,95% CI:3.11 - 3.38)和全面性(3.59/4,95% CI:3.49 - 3.69)方面评分较高。幻觉很罕见,没有一个病例就其存在达成多数共识。评分者间的一致性较低(正确性的Fleiss' kappa为0.083),反映了临床判断的固有变异性。错误集中在需要精确的试验知识或详细的临床调整的复杂场景中。   讨论:GPT-5在放射肿瘤学多项选择基准上明显优于先前的模型变体。虽然GPT-5在生成真实世界的放射肿瘤治疗建议方面表现良好,但正确性评分表明仍有进一步改进的空间。虽然幻觉并不常见,但实质性错误的存在强调了GPT-5生成的建议在临床实施之前需要严格的专家监督。
摘要:Introduction: Large language models (LLM) have shown great potential in clinical decision support. GPT-5 is a novel LLM system that has been specifically marketed towards oncology use.   Methods: Performance was assessed using two complementary benchmarks: (i) the ACR Radiation Oncology In-Training Examination (TXIT, 2021), comprising 300 multiple-choice items, and (ii) a curated set of 60 authentic radiation oncologic vignettes representing diverse disease sites and treatment indications. For the vignette evaluation, GPT-5 was instructed to generate concise therapeutic plans. Four board-certified radiation oncologists rated correctness, comprehensiveness, and hallucinations. Inter-rater reliability was quantified using Fleiss' kappa.   Results: On the TXIT benchmark, GPT-5 achieved a mean accuracy of 92.8%, outperforming GPT-4 (78.8%) and GPT-3.5 (62.1%). Domain-specific gains were most pronounced in Dose and Diagnosis. In the vignette evaluation, GPT-5's treatment recommendations were rated highly for correctness (mean 3.24/4, 95% CI: 3.11-3.38) and comprehensiveness (3.59/4, 95% CI: 3.49-3.69). Hallucinations were rare with no case reaching majority consensus for their presence. Inter-rater agreement was low (Fleiss' kappa 0.083 for correctness), reflecting inherent variability in clinical judgment. Errors clustered in complex scenarios requiring precise trial knowledge or detailed clinical adaptation.   Discussion: GPT-5 clearly outperformed prior model variants on the radiation oncology multiple-choice benchmark. Although GPT-5 exhibited favorable performance in generating real-world radiation oncology treatment recommendations, correctness ratings indicate room for further improvement. While hallucinations were infrequent, the presence of substantive errors underscores that GPT-5-generated recommendations require rigorous expert oversight before clinical implementation.


【2】Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Link: https://arxiv.org/abs/2508.21693

Authors: Vempati, Nishit Anand, Gaurav Talebailkar, Arpan Garai, Chetan Arora
Note: 11 pages. Project Website: this https URL
Abstract: Conventional optical character recognition (OCR) techniques segment each character and then recognize it. This makes them prone to errors in character segmentation and deprives them of context for exploiting language models. Advances in sequence-to-sequence translation over the last decade led to modern techniques that first detect words and then input one word at a time to a model that directly outputs the full word as a sequence of characters. This allows better utilization of language models and bypasses the error-prone character segmentation step. We observe that this shift in style has moved the accuracy bottleneck to word segmentation. Hence, in this paper, we propose a natural and logical progression from word-level OCR to line-level OCR. The proposal bypasses errors in word detection and provides larger sentence context for better utilization of language models. We show that the proposed technique improves not only the accuracy but also the efficiency of OCR. Despite a thorough literature survey, we did not find any public dataset to train and benchmark such a shift from word- to line-level OCR. Hence, we also contribute a meticulously curated dataset of 251 English page images with line-level annotations. Our experiments revealed a notable end-to-end accuracy improvement of 5.4%, underscoring the potential benefits of transitioning towards line-level OCR, especially for document images. We also report a fourfold improvement in efficiency compared to word-based pipelines. With continuous improvements in large language models, our methodology also holds potential to exploit such advances. Project Website: https://nishitanand.github.io/line-level-ocr-website
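The line-level paradigm can be tried with off-the-shelf components. A minimal sketch using TrOCR, which decodes a single cropped text-line image as one character sequence, stands in for the paper's model (which it is not); the image path is hypothetical.

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# The decoder emits the whole line at once, so the language model sees
# full line context instead of isolated words.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

line = Image.open("line_crop.png").convert("RGB")  # hypothetical line crop
pixel_values = processor(images=line, return_tensors="pt").pixel_values
ids = model.generate(pixel_values)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])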


【3】A Soft Inducement Framework for Incentive-Aided Steering of No-Regret Players
Link: https://arxiv.org/abs/2508.21672

Authors: Yorulmaz, Raj Kiriti Velicheti, Melih Bastopcu, Tamer Başar
Abstract: In this work, we investigate a steering problem in a mediator-augmented two-player normal-form game, where the mediator aims to guide players toward a specific action profile through information and incentive design. We first characterize the games for which successful steering is possible. Moreover, we establish that steering players to any desired action profile is not always achievable with information design alone, nor when it is accompanied by sublinear payment schemes. Consequently, we derive a lower bound on the constant payment required per round to achieve this goal. To address these limitations of information design, we introduce an augmented approach that involves a one-shot information design phase before the start of the repeated game, transforming the prior interaction into a Stackelberg game. Finally, we theoretically demonstrate that this approach improves the convergence rate of players' action profiles to the target point by a constant factor with high probability, and we support this with empirical results.
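As a toy illustration of the payment mechanism only (a drastic simplification of the paper's mediated two-player setting), the sketch below steers a single multiplicative-weights (no-regret) learner toward a target action with a constant per-round payment; all constants are made up.

import numpy as np

rng = np.random.default_rng(0)
base = np.array([0.5, 0.1, 0.2])   # expected base reward per action
target, c = 1, 0.6                 # mediator pays constant c on the target
eta, T = 0.05, 3000
logw = np.zeros(base.size)
for _ in range(T):
    r = base + rng.normal(scale=0.05, size=base.size)
    r[target] += c                 # constant per-round payment
    logw += eta * r                # multiplicative-weights update
    logw -= logw.max()             # numerical stability
p = np.exp(logw); p /= p.sum()
print(np.round(p, 3))              # mass concentrates on the target action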


【4】Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators
Link: https://arxiv.org/abs/2508.21524

Authors: hou, Zhengwu Liu, Yuan Ren, Ngai Wong
Note: 5 pages, 6 figures
Abstract: Compute-in-memory (CIM) accelerators have emerged as a promising way to enhance the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or utilize multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization, approximating the ideal multi-bit function while bypassing an extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, registering gains of 1.44%-5.46% and 0.35%-5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
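A minimal sketch of the two ingredients. The binary-weight closed form below is the classic l2-optimal solution (XNOR-Net style); the paper derives its own per-layer closed form, which may differ. The sigmoid staircase is one common differentiable surrogate for a multi-bit quantizer, not necessarily the paper's function.

import torch

def binarize_weights(W):
    # Closed-form l2-optimal binary quantization:
    # argmin_{alpha,B} ||W - alpha*B||_F^2 gives B = sign(W), alpha = mean|W|.
    alpha = W.abs().mean()
    return alpha * torch.sign(W)

def soft_multibit_act(x, bits=4, temperature=10.0):
    # Differentiable surrogate for a k-bit uniform quantizer on [0, 1]:
    # a sum of shifted sigmoids approximates the staircase, so gradients
    # flow without a straight-through estimator.
    levels = 2 ** bits - 1
    steps = torch.arange(1, levels + 1, dtype=x.dtype) / levels
    return torch.sigmoid(temperature * (x.unsqueeze(-1) - steps + 0.5 / levels)).sum(-1) / levels

W = torch.randn(64, 64)
print((W - binarize_weights(W)).norm() / W.norm())  # relative quantization error
x = torch.linspace(0, 1, 5, requires_grad=True)
soft_multibit_act(x).sum().backward()                # gradients are well-defined
print(x.grad)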


【5】Normalized Maximum Likelihood Code-Length on Riemannian Manifold Data Spaces
Link: https://arxiv.org/abs/2508.21466

Authors: zawa, Atsushi Suzuki, Kenji Yamanishi
Note: 14 pages. This is a preprint of an article submitted to IEEE Transactions on Information Theory
Abstract: In recent years, with the large-scale expansion of graph data, there has been increased focus on Riemannian manifold data spaces beyond Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, as they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and are inherently dependent on the choice of coordinate system, making it non-trivial to extend NML to Riemannian manifolds. In this study, we define a new NML that reflects the geometric structure of Riemannian manifolds, called the Riemannian manifold NML (Rm-NML). This Rm-NML is invariant under coordinate transformations and coincides with the conventional NML under the natural parameterization in Euclidean space. We extend existing computational techniques for NML to the setting of Riemannian manifolds. Furthermore, we derive a method to simplify the computation of Rm-NML on Riemannian symmetric spaces, which encompass data spaces of growing interest such as hyperbolic spaces. To illustrate the practical application of the proposed method, we explicitly compute the Rm-NML for normal distributions on hyperbolic spaces.
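For context, the conventional NML distribution that Rm-NML generalizes is, in standard notation (a standard fact about NML, not a quote from the paper):

$$ p_{\mathrm{NML}}(x) = \frac{p\bigl(x;\hat{\theta}(x)\bigr)}{\int p\bigl(y;\hat{\theta}(y)\bigr)\,dy}, \qquad L_{\mathrm{NML}}(x) = -\log p_{\mathrm{NML}}(x), $$

where $\hat{\theta}(x)$ is the maximum-likelihood estimate and the denominator is the Shtarkov normalizer. A natural reading of the coordinate-invariance claim, and an assumption on our part, is that Rm-NML replaces the Lebesgue measure $dy$ with the Riemannian volume measure $dV_g(y) = \sqrt{\det g(y)}\,dy$ on the manifold, which is invariant under coordinate transformations; consult the paper for the precise definition.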


【6】Standardized Multi-Layer Tissue Maps for Enhanced Artificial Intelligence Integration and Search in Large-Scale Whole Slide Image Archives
Link: https://arxiv.org/abs/2508.21418

Authors: ala, Markus Plass, Robert Harb, Peter Regitnig, Kristijan Skok, Wael Al Zoughbi, Carmen Zerner, Paul Torke, Michaela Kargl, Heimo Müller, Tomas Brazdil, Matej Gallo, Jaroslav Kubín, Roman Stoklasa, Rudolf Nenutil, Norman Zerbe, Andreas Holzinger, Petr Holub
Abstract: A Whole Slide Image (WSI) is a high-resolution digital image created by scanning an entire glass slide containing a biological specimen, such as tissue sections or cell samples, at multiple magnifications. These images can be viewed, analyzed, and shared digitally, and are used today for Artificial Intelligence (AI) algorithm development. WSIs are used in a variety of fields, including pathology for diagnosing diseases and oncology for cancer research. They are also utilized in neurology, veterinary medicine, hematology, microbiology, dermatology, pharmacology, toxicology, immunology, and forensic science.   When assembling cohorts for the training or validation of an AI algorithm, it is essential to know what is present on such a WSI. However, there is currently no standard for this metadata, so such selection has mainly been done through manual inspection, which is not suitable for large collections with several million objects.   We propose a general framework to generate a 2D index map for WSIs and a profiling mechanism for specific application domains. We demonstrate this approach in the field of clinical pathology, using common syntax and semantics to achieve interoperability between different catalogs.   Our approach augments each WSI collection with a detailed tissue map that provides fine-grained information about the WSI content. The tissue map is organized into three layers: source, tissue type, and pathological alterations, with each layer assigning segments of the WSI to specific classes.   We illustrate the advantages and applicability of the proposed standard through specific examples in WSI catalogs, Machine Learning (ML), and graph-based WSI representations.


【7】Distribution-Aware Feature Selection for SAEs
Link: https://arxiv.org/abs/2508.21324

Authors: ozeer, Nirmalendu Prakash, Michael Lan, Alice Rigg, Amirali Abdullah
Abstract: Sparse autoencoders (SAEs) decompose neural activations into interpretable features. A widely adopted variant, the TopK SAE, reconstructs each token from its K most active latents. However, this approach is inefficient, as some tokens carry more information than others. BatchTopK addresses this limitation by selecting the top activations across a batch of tokens. This improves average reconstruction but risks an "activation lottery," where rare high-magnitude features crowd out more informative but lower-magnitude ones. To address this issue, we introduce Sampled-SAE: we score the columns (representing features) of the batch activation matrix (via $L_2$ norm or entropy), forming a candidate pool of size $Kl$, and then apply Top-$K$ to select tokens across the batch from the restricted pool of features. Varying $l$ traces a spectrum between batch-level and token-specific selection. At $l=1$, tokens draw only from the $K$ globally most influential features, while larger $l$ expands the pool toward standard BatchTopK and more token-specific features across the batch. Small $l$ thus enforces global consistency; large $l$ favors fine-grained reconstruction. On Pythia-160M, no single value of $l$ is optimal across all metrics: the best choice depends on the trade-off between shared structure, reconstruction fidelity, and downstream performance. Sampled-SAE thus reframes BatchTopK as a tunable, distribution-aware family.
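A minimal PyTorch sketch of the selection rule as described: score feature columns by $L_2$ norm, restrict to a pool of $Kl$ features, then keep the batch_size x K largest activations within that pool (the BatchTopK step). Details such as tie-breaking and the entropy-scoring variant are our assumptions.

import torch

def sampled_sae_select(acts, K, l):
    # acts: (batch, features) pre-activation codes from an SAE encoder.
    # Step 1: score feature columns and restrict to a pool of K*l features.
    col_scores = acts.norm(dim=0)
    pool = col_scores.topk(K * l).indices          # candidate feature pool
    restricted = torch.zeros_like(acts)
    restricted[:, pool] = acts[:, pool]
    # Step 2: BatchTopK within the pool -- keep the batch*K largest
    # activations across the whole batch, zeroing the rest.
    B = acts.shape[0]
    flat = restricted.flatten()
    keep = flat.topk(B * K).indices
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[keep] = True
    return (flat * mask).view_as(acts)

acts = torch.relu(torch.randn(32, 1024))           # toy encoder outputs
codes = sampled_sae_select(acts, K=8, l=4)
print((codes != 0).sum().item())                   # at most 32*8 active latents

At l=1 the pool collapses to the K globally strongest features; large l recovers standard BatchTopK behavior.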


【8】RelP: Faithful and Efficient Circuit Discovery via Relevance Patching
Link: https://arxiv.org/abs/2508.21258

Authors: Rezaei Jafari, Oliver Eberle, Ashkan Khakzar, Neel Nanda
Abstract: Activation patching is a standard method in mechanistic interpretability for localizing the model components responsible for specific behaviors, but it is computationally expensive to apply at scale. Attribution patching offers a faster, gradient-based approximation, yet suffers from noise and reduced reliability in deep, highly non-linear networks. In this work, we introduce Relevance Patching (RelP), which replaces the local gradients in attribution patching with propagation coefficients derived from Layer-wise Relevance Propagation (LRP). LRP propagates the network's output backward through the layers, redistributing relevance to lower-level components according to local propagation rules that ensure properties such as relevance conservation or an improved signal-to-noise ratio. Like attribution patching, RelP requires only two forward passes and one backward pass, maintaining computational efficiency while improving faithfulness. We validate RelP across a range of models and tasks, showing that it approximates activation patching more accurately than standard attribution patching, particularly when analyzing residual stream and MLP outputs in the Indirect Object Identification (IOI) task. For instance, for MLP outputs in GPT-2 Large, attribution patching achieves a Pearson correlation of 0.006, whereas RelP reaches 0.956, highlighting the improvement offered by RelP. Additionally, we compare the faithfulness of sparse feature circuits identified by RelP and Integrated Gradients (IG), showing that RelP achieves comparable faithfulness without the extra computational cost associated with IG.
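For orientation, a minimal sketch of the attribution-patching estimate that RelP modifies; RelP would replace grad below with LRP propagation coefficients, which this sketch does not implement. model, module, metric, and the inputs are placeholders.

import torch

def attribution_patch(model, module, clean, corrupt, metric):
    # First-order estimate of how the metric changes if this module's
    # clean activations are patched with corrupted ones:
    #   delta_metric ~= (a_corrupt - a_clean) . d(metric)/d(a_clean)
    acts = []
    handle = module.register_forward_hook(lambda m, i, o: acts.append(o))
    score = metric(model(clean))
    grad = torch.autograd.grad(score, acts[0])[0]  # local gradient at clean run
    with torch.no_grad():
        model(corrupt)                             # caches corrupted activation
    handle.remove()
    # Two forward passes + one backward pass, as in the abstract;
    # RelP swaps grad for LRP-derived coefficients.
    return ((acts[1] - acts[0]) * grad).sum()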


【9】FUTURE: Flexible Unlearning for Tree Ensemble
Link: https://arxiv.org/abs/2508.21181

Authors: en, Jin Huang, Jiali Cheng, Yuchan Guo, Mengjie Wang, Lalitesh Morishetti, Kaushiki Nag, Hadi Amiri
Note: CIKM 2025
Abstract: Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the "right to be forgotten," several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, making them difficult to generalize to complex ensembles and inefficient for large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. To accommodate the non-differentiability of tree ensembles, we adopt probabilistic model approximations within the optimization framework. This enables end-to-end unlearning in an effective and efficient manner. Extensive experiments on real-world datasets show that FUTURE yields significant and successful unlearning performance.
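An illustrative sketch of the general idea: relax hard tree splits into sigmoid gates so the ensemble becomes differentiable, then ascend the loss on forget-set samples. FUTURE's exact approximation and objective are in the paper; everything below, including the depth-1 stumps, is a simplification, and a real objective would also preserve accuracy on retained data.

import torch

class SoftStumps(torch.nn.Module):
    # A probabilistic relaxation of a depth-1 tree ensemble: the hard split
    # 1[x_f > t] becomes sigmoid((x_f - t)/tau), making predictions differentiable.
    def __init__(self, feats, thresholds, tau=0.1):
        super().__init__()
        self.feats = feats                       # feature index per stump
        self.t = torch.nn.Parameter(torch.tensor(thresholds))
        self.leaf = torch.nn.Parameter(torch.zeros(len(feats), 2))
        self.tau = tau
    def forward(self, X):
        gate = torch.sigmoid((X[:, self.feats] - self.t) / self.tau)
        return (gate * self.leaf[:, 1] + (1 - gate) * self.leaf[:, 0]).sum(1)

model = SoftStumps(feats=[0, 1], thresholds=[0.0, 0.5])
X_forget = torch.randn(16, 2)
y_forget = torch.randint(0, 2, (16,)).float()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(50):                              # ascend the forget-set loss
    loss = -torch.nn.functional.binary_cross_entropy_with_logits(
        model(X_forget), y_forget)
    opt.zero_grad(); loss.backward(); opt.step()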


【10】Synthetic CVs To Build and Test Fairness-Aware Hiring Tools
Link: https://arxiv.org/abs/2508.21179

Authors: divar, Anna Gatzioura, Carlos Castillo
Abstract: Algorithmic hiring has become increasingly necessary in some sectors as it promises to deal with hundreds or even thousands of applicants. At the heart of these systems are algorithms designed to retrieve and rank candidate profiles, which are usually represented by Curricula Vitae (CVs). Research has shown, however, that such technologies can inadvertently introduce bias, leading to discrimination based on factors such as candidates' age, gender, or national origin. Developing methods to measure, mitigate, and explain bias in algorithmic hiring, as well as to evaluate and compare fairness techniques before deployment, requires sets of CVs that reflect the characteristics of people from diverse backgrounds.   However, datasets with these characteristics that could be used to conduct this research do not exist. To address this limitation, this paper introduces an approach for building a synthetic dataset of CVs with features modeled on real materials collected through a data donation campaign. Additionally, the resulting dataset of 1,730 CVs is presented, which we envision as a potential benchmarking standard for research on algorithmic hiring discrimination.


【11】Considerations for Estimating Causal Effects of Informatively Timed Treatments
Link: https://arxiv.org/abs/2508.21804

Authors: nisian
Abstract: Epidemiological studies are often concerned with estimating the causal effects of a sequence of treatment decisions on survival outcomes. In many settings, treatment decisions do not occur at fixed, pre-specified follow-up times. Rather, timing varies across subjects in ways that may be informative of subsequent treatment decisions and potential outcomes. Awareness of this issue and its potential solutions is lacking in the literature, which motivates this work. Here, we formalize the issue of informative timing and the problems associated with ignoring it, and show how g-methods can be used to analyze sequential treatments that are informatively timed. As we describe, in such settings the waiting times between successive treatment decisions may be properly viewed as time-varying confounders. Using synthetic examples, we illustrate how g-methods that do not adjust for these waiting times may be biased and how adjustment can be done in scenarios where patients may die or be censored between treatments. We draw connections between adjustment and identification with discrete-time versus continuous-time models. Finally, we provide implementation guidance and examples using publicly available software. Our concluding message is that 1) considering timing is important for valid inference and 2) correcting for informative timing can be done with g-methods that adjust for the waiting times between treatments as time-varying confounders.
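A toy sketch of the adjustment idea using inverse-probability weighting (one of the g-methods), with simulated data in which the waiting time W confounds the second treatment decision; a real analysis would model the full sequence of treatments and waiting times, as the paper describes.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
L = rng.normal(size=n)                            # baseline severity
W = rng.exponential(scale=np.exp(-L))             # sicker -> shorter wait
A2 = rng.binomial(1, 1 / (1 + np.exp(-(L - W))))  # treatment informed by the wait
Y = 1.0 * A2 - 1.5 * L + rng.normal(size=n)       # outcome; true effect of A2 is 1.0

# Propensity model includes the waiting time as a (time-varying) confounder.
ps = LogisticRegression().fit(np.c_[L, W], A2).predict_proba(np.c_[L, W])[:, 1]
w = A2 / ps + (1 - A2) / (1 - ps)                 # unstabilized IP weights
effect = (np.average(Y[A2 == 1], weights=w[A2 == 1])
          - np.average(Y[A2 == 0], weights=w[A2 == 0]))
print(effect)                                     # approximately 1.0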

