机器学习学术速递[8.22]

点击阅读原文访问arxivdaily.com，涵盖CS|物理|数学|经济|统计|金融|生物|电气领域，更有搜索、收藏等功能！

cs.LG 方向，今日共计137篇

大模型相关(7篇)

【1】Communication Efficient LLM Pre-training with SparseLoCo
标题：与SparseLoCo进行高效沟通LLM预训练
链接：https://arxiv.org/abs/2508.15706

作者：i, Benjamin Thérien, Joel Lidin, Eugene Belilovsky
备注：15 pages, 9 tables, 2 figures
摘要：通信高效的分布式训练算法最近受到了相当大的关注，因为它们在带宽受限的环境中（例如跨数据中心和互联网）训练大型语言模型（LLM）的好处。尽管降低了通信频率，但这些方法通常仍然需要传输模型梯度的完整副本，即使对于跨数据中心链接也会导致通信瓶颈。此外，与简单的AdamW DDP基线相比，它们会略微降低性能。虽然量化和误差反馈通常被应用于减小伪梯度的大小，但是在LLM预训练的上下文中，现有方法已经不能额外地利用稀疏化并且已经获得有限的量化。在这项工作中，我们引入了SparseLoCo，这是一种用于LLM的通信高效训练算法，它有效地利用Top-k稀疏化和量化来达到高达1-3%稀疏度和2位量化的极端压缩比，同时优于全精度DiLoCo。我们的主要观察结果是，外部动量可以通过误差反馈与积极的稀疏性相结合来局部近似，并且稀疏聚合实际上可以提高模型性能。我们的经验表明，在一系列的通信约束的LLM训练设置，SparseLoCo提供了显着的好处，在性能和通信成本。
摘要：Communication-efficient distributed training algorithms have received considerable interest recently due to their benefits for training Large Language Models (LLMs) in bandwidth-constrained settings, such as across data centers and over the internet. Despite reducing communication frequency, these methods still typically require communicating a full copy of the model's gradients-resulting in a communication bottleneck even for cross-datacenter links. Furthermore, they can slightly degrade performance compared to a naive AdamW DDP baseline. While quantization and error feedback are often applied to reduce the pseudo-gradient's size, in the context of LLM pre-training, existing approaches have been unable to additionally leverage sparsification and have obtained limited quantization. In this work, we introduce SparseLoCo, a communication-efficient training algorithm for LLMs that effectively leverages Top-k sparsification and quantization to reach extreme compression ratios of up to 1-3% sparsity and 2-bit quantization while outperforming full-precision DiLoCo. Our key observations are that outer momentum can be locally approximated by an error feedback combined with aggressive sparsity and that sparse aggregation can actually improve model performance. We empirically demonstrate in a range of communication-constrained LLM training settings that SparseLoCo provides significant benefits in both performance and communication cost.

【2】Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
标题：通过变形表示投影可靠地消除LLM中的有害信息
链接：https://arxiv.org/abs/2508.15449

作者：Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun
备注：10 pages, 9 figures, Under review as a full paper at AAAI 2026. A preliminary version is under review at the NeurIPS 2025 Workshop on Reliable ML from Unreliable Data
摘要：虽然大型语言模型（LLM）在各种领域和任务中表现出令人印象深刻的性能，但对其安全性的担忧正变得越来越严重。特别是，由于模型可能在内部存储不安全的知识，机器非学习已经成为确保模型安全的代表性范例。现有的方法采用各种训练技术，如梯度上升和负偏好优化，试图消除不需要的数据对目标模型的影响。然而，这些方法只是通过参数训练来抑制不需要的数据的激活，而没有完全消除其在模型中的信息痕迹。这种基本的限制使得很难实现有效的连续学习，使这些方法容易受到重新学习攻击。为了克服这些挑战，我们提出了一种变形表示投影（MRP）方法，该方法开创了不可逆投影属性在机器学习中的应用。通过在特定网络层的隐藏状态空间中实现投影变换，我们的方法有效地消除了有害信息，同时保留了有用的知识。实验结果表明，我们的方法能够有效地连续unlearning和成功地抵御再学习攻击，实现最先进的性能unlearning的有效性，同时保持自然的性能。我们的代码可在https://github.com/ChengcanWu/MRP上找到。
摘要：While Large Language Models (LLMs) have demonstrated impressive performance in various domains and tasks, concerns about their safety are becoming increasingly severe. In particular, since models may store unsafe knowledge internally, machine unlearning has emerged as a representative paradigm to ensure model safety. Existing approaches employ various training techniques, such as gradient ascent and negative preference optimization, in attempts to eliminate the influence of undesired data on target models. However, these methods merely suppress the activation of undesired data through parametric training without completely eradicating its informational traces within the model. This fundamental limitation makes it difficult to achieve effective continuous unlearning, rendering these methods vulnerable to relearning attacks. To overcome these challenges, we propose a Metamorphosis Representation Projection (MRP) approach that pioneers the application of irreversible projection properties to machine unlearning. By implementing projective transformations in the hidden state space of specific network layers, our method effectively eliminates harmful information while preserving useful knowledge. Experimental results demonstrate that our approach enables effective continuous unlearning and successfully defends against relearning attacks, achieving state-of-the-art performance in unlearning effectiveness while preserving natural performance. Our code is available in https://github.com/ChengcanWu/MRP.

【3】LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
标题：LLaSO：大型语言和语音模型可重复性研究的基础框架
链接：https://arxiv.org/abs/2508.15418

作者：n, Yizhong Geng, Peidong Wei, Yanjun Chen, Jinghan Yang, Rongfei Chen, Wei Zhang, Xiaoyu Shen
摘要：大型语音语言模型（LSLMs）的发展由于架构分散和缺乏透明度而放缓，阻碍了研究的系统比较和可重复性。与视觉语言领域不同，LSLM领域的常见做法是在没有相应训练数据和配置的情况下释放模型权重。为了解决这些关键的差距，我们引入了LLaSO，这是第一个完全开放的，端到端的大规模语音语言建模框架。LLaSO为社区提供了三个基本资源：（1）LLaSO-Align，一个12 M实例的语音-文本对齐语料库;（2）LLaSO-Instruct，一个1350万实例的多任务调试数据集;（3）LLaSO-Eval，一个可重复的标准化评估基准。为了验证我们的框架，我们构建并发布了LLaSO-Base，这是一个专门在我们的公共数据上训练的3. 8B参数参考模型。它实现了0.72的标准化得分，建立了一个强大的，可重复的基线，超越了可比模型。我们的分析表明，虽然更广泛的训练覆盖范围可以提高性能，但在看不见的任务上，特别是在纯音频场景中，仍然存在显着的泛化差距。通过发布完整的数据、基准和模型，LLaSO建立了一个基础性的开放标准，以统一研究工作并加速LSLM的社区驱动的进展。我们在https://github.com/EIT-NLP/LLaSO上发布代码、数据集、预训练模型和结果。
摘要：The development of Large Speech-Language Models (LSLMs) has been slowed by fragmented architectures and a lack of transparency, hindering the systematic comparison and reproducibility of research. Unlike in the vision-language domain, the LSLM field suffers from the common practice of releasing model weights without their corresponding training data and configurations. To address these critical gaps, we introduce LLaSO, the first fully open, end-to-end framework for large-scale speech-language modeling. LLaSO provides the community with three essential resources: (1) LLaSO-Align, a 12M-instance speech-text alignment corpus; (2) LLaSO-Instruct, a 13.5M-instance multi-task instruction-tuning dataset; and (3) LLaSO-Eval, a reproducible benchmark for standardized evaluation. To validate our framework, we build and release LLaSO-Base, a 3.8B-parameter reference model trained exclusively on our public data. It achieves a normalized score of 0.72, establishing a strong, reproducible baseline that surpasses comparable models. Our analysis reveals that while broader training coverage enhances performance, significant generalization gaps persist on unseen tasks, particularly in pure audio scenarios. By releasing the complete stack of data, benchmarks, and models, LLaSO establishes a foundational open standard to unify research efforts and accelerate community-driven progress in LSLMs. We release the code, dataset, pretrained models, and results in https://github.com/EIT-NLP/LLaSO.

【4】Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
标题：利用语言模型预训练中的词汇频率不平衡
链接：https://arxiv.org/abs/2508.15390

作者：ung, Jeonghoon Kim
备注：Preprint
摘要：大型语言模型使用标记器进行训练，产生的标记分布高度不平衡：少数单词在流中占主导地位，而大多数单词很少出现。最近的实践倾向于更大的词汇表，但好处的来源还不清楚。我们进行了一项对照研究，将语言模型的词汇量从24 K扩展到196 K，同时保持数据，计算和优化固定。我们首先量化的复杂性，通过Kolmogorov复杂性形式化的标记化的文本，并表明，较大的词汇减少这种复杂性。在24 K以上，每个常用词都已经是一个标记，因此进一步的增长主要是加深了相对标记频率的不平衡。词级损失分解表明，较大的词汇量几乎完全通过降低2,500个最常见单词的不确定性来降低交叉熵，尽管罕见尾部的损失会增加。约束输入和输出嵌入范数以减弱令牌频率不平衡的影响，从而逆转了增益，直接表明模型利用而不是遭受不平衡。由于相同的频繁单词覆盖了下游基准测试中大约77%的令牌，因此这种训练优势完好无损。我们还表明，扩大模型参数与固定的词汇产生相同的频繁字的好处。我们的研究结果将“更大的词汇量有助于”重新定义为“降低标记化文本的复杂性有助于”，为标记器模型协同设计提供了一个简单，原则性的杠杆，并澄清了在预训练中控制语言模型缩放的损失动态。
摘要：Large language models are trained with tokenizers, and the resulting token distribution is highly imbalanced: a few words dominate the stream while most occur rarely. Recent practice favors ever-larger vocabularies, but the source of the benefit is unclear. We conduct a controlled study that scales the language model's vocabulary from 24K to 196K while holding data, compute, and optimization fixed. We first quantify the complexity of tokenized text, formalized via Kolmogorov complexity, and show that larger vocabularies reduce this complexity. Above 24K, every common word is already a single token, so further growth mainly deepens the relative token-frequency imbalance. A word-level loss decomposition shows that larger vocabularies reduce cross-entropy almost exclusively by lowering uncertainty on the 2,500 most frequent words, even though loss on the rare tail rises. Constraining input and output embedding norms to attenuate the effect of token-frequency imbalance reverses the gain, directly showing that the model exploits rather than suffers from imbalance. Because the same frequent words cover roughly 77% of tokens in downstream benchmarks, this training advantage transfers intact. We also show that enlarging model parameters with a fixed vocabulary yields the same frequent-word benefit. Our results reframe "bigger vocabularies help" as "lowering the complexity of tokenized text helps," providing a simple, principled lever for tokenizer-model co-design and clarifying the loss dynamics that govern language-model scaling in pre-training.

【5】VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models
标题：VocabTailor：小型语言模型中下游任务的动态词汇选择
链接：https://arxiv.org/abs/2508.15229

作者：hang, Yayu Zhou, Tongcheng Fang, Zhihang Yuan, Guohao Dai, Yu Wang
摘要：小型语言模型（SLM）在资源受限的环境中提供了计算优势，但内存限制仍然是边缘设备部署的关键瓶颈。由于词汇量很大，SLM的内存占用很大一部分来自词汇相关的组件，特别是嵌入和语言建模（LM）头。现有的静态词汇修剪虽然减少了内存使用量，但却存在僵化的、一刀切的设计，这会导致预填充阶段的信息丢失和缺乏灵活性。在这项工作中，我们确定两个关键原则的词汇减少的挑战：词汇的局部性原则，观察，只有一个小的子集的令牌是需要在任何单一的推理，和不对称的计算特性之间的词汇相关的组件SLM。基于这些见解，我们引入VocabTailor，一种新的解耦动态词汇选择框架，通过卸载嵌入解决内存限制，并实现了LM头的混合静态-动态词汇选择策略，使按需加载词汇组件。在不同的下游任务的综合实验表明，VocabTailor实现了减少高达99%的词汇相关组件的内存使用的任务性能最小或没有退化，大大优于现有的静态词汇修剪。
摘要：Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge device deployment. A substantial portion of SLMs' memory footprint stems from vocabulary-related components, particularly embeddings and language modeling (LM) heads, due to large vocabulary sizes. Existing static vocabulary pruning, while reducing memory usage, suffers from rigid, one-size-fits-all designs that cause information loss from the prefill stage and a lack of flexibility. In this work, we identify two key principles underlying the vocabulary reduction challenge: the lexical locality principle, the observation that only a small subset of tokens is required during any single inference, and the asymmetry in computational characteristics between vocabulary-related components of SLM. Based on these insights, we introduce VocabTailor, a novel decoupled dynamic vocabulary selection framework that addresses memory constraints through offloading embedding and implements a hybrid static-dynamic vocabulary selection strategy for LM Head, enabling on-demand loading of vocabulary components. Comprehensive experiments across diverse downstream tasks demonstrate that VocabTailor achieves a reduction of up to 99% in the memory usage of vocabulary-related components with minimal or no degradation in task performance, substantially outperforming existing static vocabulary pruning.

【6】SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
标题：SafeLLM：消除大型语言模型的有害输出以应对越狱攻击
链接：https://arxiv.org/abs/2508.15182

作者：Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
摘要：越狱攻击对大型语言模型（LLM）的安全性构成了严重威胁，因为它会制作绕过对齐机制的对抗性提示，导致模型产生有害的、受限的或有偏见的内容。在本文中，我们提出了SafeLLM，一种新的基于unlearning-based防御框架，它可以在保留语言流畅性和一般能力的同时，从LLM中删除有害知识。SafeLLM采用三阶段流水线：（1）使用混合方法进行动态不安全输出检测，该方法将外部分类器与模型内部评估集成在一起;（2）通过前馈网络（FFN）激活进行令牌级有害内容跟踪，以定位有害知识;以及（3）约束优化，以抑制不安全行为，而不会降低整体模型质量。SafeLLM通过识别和中和负责有害生成途径的FFN子结构来实现有针对性和不可逆的遗忘。在多个越狱基准测试中对突出的LLM（Vicuna，LLaMA和GPT-J）进行的广泛实验表明，SafeLLM在保持高通用性能的同时大大降低了攻击成功率。与监督微调和直接偏好优化等标准防御方法相比，SafeLLM提供了更强的安全保障、对有害行为的更精确控制以及对未知攻击的更强鲁棒性。此外，SafeLLM保持了有害知识去除后的一般性能。这些结果突出了unlearning作为可扩展和有效的LLM安全性的一个有希望的方向。
摘要：Jailbreak attacks pose a serious threat to the safety of Large Language Models (LLMs) by crafting adversarial prompts that bypass alignment mechanisms, causing the models to produce harmful, restricted, or biased content. In this paper, we propose SafeLLM, a novel unlearning-based defense framework that unlearn the harmful knowledge from LLMs while preserving linguistic fluency and general capabilities. SafeLLM employs a three-stage pipeline: (1) dynamic unsafe output detection using a hybrid approach that integrates external classifiers with model-internal evaluations; (2) token-level harmful content tracing through feedforward network (FFN) activations to localize harmful knowledge; and (3) constrained optimization to suppress unsafe behavior without degrading overall model quality. SafeLLM achieves targeted and irreversible forgetting by identifying and neutralizing FFN substructures responsible for harmful generation pathways. Extensive experiments on prominent LLMs (Vicuna, LLaMA, and GPT-J) across multiple jailbreak benchmarks show that SafeLLM substantially reduces attack success rates while maintaining high general-purpose performance. Compared to standard defense methods such as supervised fine-tuning and direct preference optimization, SafeLLM offers stronger safety guarantees, more precise control over harmful behavior, and greater robustness to unseen attacks. Moreover, SafeLLM maintains the general performance after the harmful knowledge unlearned. These results highlight unlearning as a promising direction for scalable and effective LLM safety.

【7】Hydra: A 1.6B-Parameter State-Space Language Model with Sparse Attention, Mixture-of-Experts, and Memory
标题：Hydra：具有稀疏注意力、混合专家和记忆力的1.6B参数状态空间语言模型
链接：https://arxiv.org/abs/2508.15099

作者： Chaudhary, Bennett Browning
摘要：我们提出了Hydra作为混合长上下文语言模型的架构建议，该模型结合了条件计算，长上下文记忆机制和稀疏混合专家，在大约1.6B的参数设计信封。Hydra集成了一个Mamba风格的结构化状态空间模型（SSM）骨干，具有间歇性稀疏全局注意力，块级MoE前馈路由和双重（工作空间加事实PKM）记忆。我们正式的组件接口，透明的参数和复杂性会计，并概述了一个阶段性的课程，旨在稳定激活的部分。我们伴随着说明性的玩具规模的原型测量（合成数据上千万的参数），其唯一目的是证明实施的可行性和定性的扩展行为（例如，长上下文吞吐量交叉和可控的专家路由），而不是声称有竞争力的全面性能。我们明确地描述了假设和开放的风险（训练复杂性，内存利用率，专业化动态）和位置九头蛇作为一个蓝图，以刺激经验的后续行动，而不是一个完成的系统。通过将SSM效率、选择性稀疏注意力、MoE容量和可学习记忆相结合，Hydra勾画了一条通往模块化、输入自适应的长上下文语言模型的道路;验证目标规模下的最终任务收益仍然是未来的工作。
摘要：We present Hydra as an architectural proposal for hybrid long-context language models that combine conditional computation, long-context memory mechanisms, and sparse mixture-of-experts within an approximately 1.6B parameter design envelope. Hydra integrates a Mamba-style Structured State Space Model (SSM) backbone with intermittent sparse global attention, chunk-level MoE feed-forward routing, and dual (workspace plus factual PKM) memories. We formalize the component interfaces, give transparent parameter and complexity accounting, and outline a staged curriculum intended to stably activate the parts. We accompany the specification with illustrative toy-scale prototype measurements (tens of millions of parameters on synthetic data) whose sole purpose is to demonstrate implementation feasibility and qualitative scaling behaviors (for example, long-context throughput crossover and controllable expert routing), not to claim competitive full-scale performance. We explicitly delineate assumptions and open risks (training complexity, memory utilization, specialization dynamics) and position Hydra as a blueprint to stimulate empirical follow-up rather than a finished system. By combining SSM efficiency, selective sparse attention, MoE capacity, and learnable memory, Hydra sketches a path toward modular, input-adaptive long-context language models; validating end-task gains at target scale remains future work.

Graph相关(图学习|图神经网络|图优化等)(10篇)

【1】GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning
标题：GRAFT：用于文本对齐的GRAPH和表格推理--结构化指令遵循和视觉推理的基准
链接：https://arxiv.org/abs/2508.15690

作者：erma, Sriram Puttagunta, Seganrasan Subramanian, Sravan Ramachandran
备注：23 pages, 9 tables, 3 figures
摘要：GRAFT是一个结构化的多模态基准测试，用于评估模型的预防跟踪，视觉推理和视觉-文本对齐任务。它以编程方式生成的图表和综合呈现的表格为特色，这些图表和表格是使用Python可视化库创建的，以确保对数据语义、结构和清晰度的控制。每个GRAFT实例都将图表或表格图像与系统生成的多步骤分析问题配对，这些问题仅基于视觉内容。答案以JSON或YAML等结构化格式提供，支持推理和输出格式的一致评估。该基准引入了推理类型的分类，包括比较、趋势识别、排名、聚合、比例估计和异常检测，以实现全面的评估。参考答案遵循严格的事实和格式准则，以进行精确的、基于方面的评估。GRAFT提供了一个统一的，可扩展的框架，用于对基于视觉的结构化推理任务的多模态模型进行细粒度基准测试，在该领域建立了新的评估标准。
摘要：GRAFT is a structured multimodal benchmark for evaluating models on instruction-following, visual reasoning, and visual-textual alignment tasks. It features programmatically generated charts and synthetically rendered tables, created with Python visualization libraries to ensure control over data semantics, structure, and clarity. Each GRAFT instance pairs a chart or table image with a systematically generated, multi-step analytical question based solely on visual content. Answers are provided in structured formats such as JSON or YAML, supporting consistent evaluation of both reasoning and output format. The benchmark introduces a taxonomy of reasoning types including comparison, trend identification, ranking, aggregation, proportion estimation, and anomaly detection to enable comprehensive assessment. Reference answers follow strict factual and formatting guidelines for precise, aspect-based evaluation. GRAFT offers a unified, scalable framework for fine-grained benchmarking of multimodal models on visually grounded, structured reasoning tasks, setting a new evaluation standard in this field.

【2】GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version)
标题：GRASPP：使用自动编码器和光谱编码器和解码器进行图形异常检测（完整版本）
链接：https://arxiv.org/abs/2508.15633

作者： Choong, Jixing Liu, Ching-Yu Kao, Philip Sperl
备注：Full version of the paper accepted for publication at the European Conference on Artificial Intelligence (ECAI 2025)
摘要：图机器学习已经在各个领域得到了广泛的研究，例如社区检测，交易分析和推荐系统。在这些应用中，异常检测起着重要的作用。最近的研究表明，图上的异常会导致谱移动。一些监督方法已经提高了这种谱域信息的利用率。然而，由于异常的性质，它们仍然受到标记数据稀缺的限制。另一方面，现有的无监督学习方法主要依赖于空间信息或仅采用低通滤波器，从而失去了多波段分析的能力。在本文中，我们提出了图自动编码器与频谱编码器和频谱解码器（GRASPED）的节点异常检测。我们的无监督学习模型具有基于图小波卷积的编码器，以及结构和属性解码器。图小波卷积为基础的编码器，结合维纳图解卷积为基础的解码器，表现出带通滤波器的特性，捕捉全球和当地的图形信息在多个尺度。这种设计允许基于学习的节点属性重建，有效地捕获异常信息。在几个真实世界的图异常检测数据集上进行的大量实验表明，GRASPED优于当前最先进的模型。
摘要：Graph machine learning has been widely explored in various domains, such as community detection, transaction analysis, and recommendation systems. In these applications, anomaly detection plays an important role. Recently, studies have shown that anomalies on graphs induce spectral shifts. Some supervised methods have improved the utilization of such spectral domain information. However, they remain limited by the scarcity of labeled data due to the nature of anomalies. On the other hand, existing unsupervised learning approaches predominantly rely on spatial information or only employ low-pass filters, thereby losing the capacity for multi-band analysis. In this paper, we propose Graph Autoencoder with Spectral Encoder and Spectral Decoder (GRASPED) for node anomaly detection. Our unsupervised learning model features an encoder based on Graph Wavelet Convolution, along with structural and attribute decoders. The Graph Wavelet Convolution-based encoder, combined with a Wiener Graph Deconvolution-based decoder, exhibits bandpass filter characteristics that capture global and local graph information at multiple scales. This design allows for a learning-based reconstruction of node attributes, effectively capturing anomaly information. Extensive experiments on several real-world graph anomaly detection datasets demonstrate that GRASPED outperforms current state-of-the-art models.

【3】Let's Grow an Unbiased Community: Guiding the Fairness of Graphs via New Links
标题：让我们发展一个无偏见的社区：通过新链接引导图形的公平性
链接：https://arxiv.org/abs/2508.15499

作者：, Huaxiao Liu, Shuotong Bai, Junjie Xu, Renqiang Luo, Enyan Dai
摘要：图神经网络（GNN）在各种应用中取得了显着的成功。然而，由于图结构中的偏差，图神经网络在公平性方面面临重大挑战。虽然原始的用户图结构通常是有偏的，但通过引入新的链接，有希望引导这些现有的结构朝着无偏的方向发展。通过新链接的公平性指导可以培养无偏见的社区，从而提高下游应用程序的公平性。为了解决这个问题，我们提出了一个新的框架命名为FairGuide。具体来说，为了确保在公平指导图上训练的下游任务的公平性，我们引入了一个可区分的社区检测任务作为伪下游任务。我们的理论分析进一步表明，在这个伪任务中优化公平性有效地增强了结构公平性，促进了不同下游应用程序的公平性泛化。此外，FairGuide采用了一种有效的策略，该策略利用来自公平指导目标的元梯度来识别显着增强结构公平性的新链接。大量的实验结果表明，我们提出的方法在各种基于图的公平性任务的有效性和通用性。
摘要：Graph Neural Networks (GNNs) have achieved remarkable success across diverse applications. However, due to the biases in the graph structures, graph neural networks face significant challenges in fairness. Although the original user graph structure is generally biased, it is promising to guide these existing structures toward unbiased ones by introducing new links. The fairness guidance via new links could foster unbiased communities, thereby enhancing fairness in downstream applications. To address this issue, we propose a novel framework named FairGuide. Specifically, to ensure fairness in downstream tasks trained on fairness-guided graphs, we introduce a differentiable community detection task as a pseudo downstream task. Our theoretical analysis further demonstrates that optimizing fairness within this pseudo task effectively enhances structural fairness, promoting fairness generalization across diverse downstream applications. Moreover, FairGuide employs an effective strategy which leverages meta-gradients derived from the fairness-guidance objective to identify new links that significantly enhance structural fairness. Extensive experimental results demonstrate the effectiveness and generalizability of our proposed method across a variety of graph-based fairness tasks.

【4】GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO
标题：GraSP：一个基于图形的统一框架，用于SFT和DPO合成数据的可扩展生成、质量标记和管理
链接：https://arxiv.org/abs/2508.15432

作者： Pradhan, Surajit Dasgupta, Amit Kumar Saha, Omkar Anustoop, Sriram Puttagunta, Vipul Mittal, Gopal Sarda
摘要：大型语言模型（LLM）的发展严重依赖于高质量数据集的可用性，用于监督微调（SFT），直接偏好优化（DPO）等对齐任务。在这项工作中，我们提出了一个全面的合成数据生成框架，可以促进可扩展，可配置和高保真度生成为这些训练范式量身定制的合成数据。我们的方法采用了一个模块化的和基于配置的管道，能够建模复杂的对话流，最小的手动干预。该框架使用双阶段质量标记机制，结合启发式规则和基于LLM的评估，自动过滤和评分从OASST格式对话中提取的数据，确保高质量对话样本的管理。生成的数据集根据支持SFT和DPO用例的灵活模式进行结构化，从而能够无缝集成到不同的培训工作流程中。总之，这些创新为大规模生成和管理合成会话数据提供了一个强大的解决方案，大大减少了LLM训练管道中的数据准备开销。
摘要：The advancement of large language models (LLMs) is critically dependent on the availability of high-quality datasets for Supervised Fine-Tuning (SFT), alignment tasks like Direct Preference Optimization (DPO), etc. In this work, we present a comprehensive synthetic data generation framework that facilitates scalable, configurable, and high-fidelity generation of synthetic data tailored for these training paradigms. Our approach employs a modular and configuration-based pipeline capable of modeling complex dialogue flows with minimal manual intervention. This framework uses a dual-stage quality tagging mechanism, combining heuristic rules and LLM-based evaluations, to automatically filter and score data extracted from OASST-formatted conversations, ensuring the curation of high-quality dialogue samples. The resulting datasets are structured under a flexible schema supporting both SFT and DPO use cases, enabling seamless integration into diverse training workflows. Together, these innovations offer a robust solution for generating and managing synthetic conversational data at scale, significantly reducing the overhead of data preparation in LLM training pipelines.

【5】CITE: A Comprehensive Benchmark for Heterogeneous Text-Attributed Graphs on Catalytic Materials
标题：CITE：催化材料上异质文本属性图的综合基准
链接：https://arxiv.org/abs/2508.15392

作者：Zhang, Qingqing Long, Ludi Wang, Wenjuan Cui, Jianjun Yu, Yi Du
备注：23 pages, 4 figures,
摘要：文本属性图（TAG）在现实世界的系统中是普遍存在的，其中每个节点都携带自己的文本特征。在许多情况下，这些图本质上是异构的，包含多个节点类型和不同的边类型。尽管此类异构TAG无处不在，但仍然缺乏大规模的基准数据集。这一不足已经成为一个关键的瓶颈，阻碍了异构文本属性图表示学习方法的发展和公平比较。在本文中，我们介绍了CITE -催化信息文本实体图，第一个也是最大的异构文本属性引用图基准催化材料。CITE包括超过438 K个节点和120万条边，跨越四种关系类型。此外，我们建立了标准化的评估程序，并进行了广泛的基准测试的节点分类任务，以及消融实验的异质性和文本属性的CITE。我们比较了四类学习范式，包括同构图模型，异构图模型，LLM（大语言模型）为中心的模型，和LLM+图模型。简而言之，我们提供了（i）CITE数据集的概述，（ii）标准化的评估方案，以及（iii）不同建模范式的基线和消融实验。
摘要：Text-attributed graphs(TAGs) are pervasive in real-world systems,where each node carries its own textual features. In many cases these graphs are inherently heterogeneous, containing multiple node types and diverse edge types. Despite the ubiquity of such heterogeneous TAGs, there remains a lack of large-scale benchmark datasets. This shortage has become a critical bottleneck, hindering the development and fair comparison of representation learning methods on heterogeneous text-attributed graphs. In this paper, we introduce CITE - Catalytic Information Textual Entities Graph, the first and largest heterogeneous text-attributed citation graph benchmark for catalytic materials. CITE comprises over 438K nodes and 1.2M edges, spanning four relation types. In addition, we establish standardized evaluation procedures and conduct extensive benchmarking on the node classification task, as well as ablation experiments on the heterogeneous and textual properties of CITE. We compare four classes of learning paradigms, including homogeneous graph models, heterogeneous graph models, LLM(Large Language Model)-centric models, and LLM+Graph models. In a nutshell, we provide (i) an overview of the CITE dataset, (ii) standardized evaluation protocols, and (iii) baseline and ablation experiments across diverse modeling paradigms.

【6】EvoFormer: Learning Dynamic Graph-Level Representations with Structural and Temporal Bias Correction
标题：EvoFormer：通过结构和时间偏差纠正来学习动态图级表示
链接：https://arxiv.org/abs/2508.15378

作者：ng, Liuxin Zou, Di Wang, Bo Wang, Zhenxing Niu, Quan Wang
备注：None
摘要：动态图级嵌入旨在捕获网络中的结构演化，这对于建模真实世界场景至关重要。然而，现有的方法面临着两个关键的但未充分探讨的问题：结构访问偏差，其中随机游走采样不成比例地强调高度节点，导致冗余和嘈杂的结构表示;和突变进化盲，由于刚性或过于简单的时间建模策略，未能有效地检测突然的结构变化，导致不一致的时间嵌入。为了克服这些挑战，我们提出了EvoFormer，一个进化感知的Transformer框架，专为动态图级表示学习而设计。为了减轻结构访问偏差，EvoFormer引入了结构感知Transformer模块，该模块基于节点结构角色合并了位置编码，允许模型全局区分并准确表示节点结构。为了克服突变进化盲，EvoFormer采用了进化敏感的时间模块，该模块通过顺序的三步策略显式地对时间进化进行建模：（I）随机游走时间戳分类，生成初始时间戳感知的图级嵌入;（II）图级时间分割，将图流划分为反映结构上连贯的周期的段;以及（III）结合边缘演化预测任务的分段感知时间自注意，使模型能够精确地捕获分段边界并感知结构演化趋势，有效地适应快速的时间变化。对五个基准数据集的广泛评估证实，EvoFormer在图相似性排名，时间异常检测和时间分割任务中实现了最先进的性能，验证了其在纠正结构和时间偏差方面的有效性。
摘要：Dynamic graph-level embedding aims to capture structural evolution in networks, which is essential for modeling real-world scenarios. However, existing methods face two critical yet under-explored issues: Structural Visit Bias, where random walk sampling disproportionately emphasizes high-degree nodes, leading to redundant and noisy structural representations; and Abrupt Evolution Blindness, the failure to effectively detect sudden structural changes due to rigid or overly simplistic temporal modeling strategies, resulting in inconsistent temporal embeddings. To overcome these challenges, we propose EvoFormer, an evolution-aware Transformer framework tailored for dynamic graph-level representation learning. To mitigate Structural Visit Bias, EvoFormer introduces a Structure-Aware Transformer Module that incorporates positional encoding based on node structural roles, allowing the model to globally differentiate and accurately represent node structures. To overcome Abrupt Evolution Blindness, EvoFormer employs an Evolution-Sensitive Temporal Module, which explicitly models temporal evolution through a sequential three-step strategy: (I) Random Walk Timestamp Classification, generating initial timestamp-aware graph-level embeddings; (II) Graph-Level Temporal Segmentation, partitioning the graph stream into segments reflecting structurally coherent periods; and (III) Segment-Aware Temporal Self-Attention combined with an Edge Evolution Prediction task, enabling the model to precisely capture segment boundaries and perceive structural evolution trends, effectively adapting to rapid temporal shifts. Extensive evaluations on five benchmark datasets confirm that EvoFormer achieves state-of-the-art performance in graph similarity ranking, temporal anomaly detection, and temporal segmentation tasks, validating its effectiveness in correcting structural and temporal biases.

【7】Evaluating Knowledge Graph Complexity via Semantic, Spectral, and Structural Metrics for Link Prediction
标题：通过链接预测的语义、谱和结构表评估知识图复杂性
链接：https://arxiv.org/abs/2508.15291

作者： Abul Ghani Naim, Ajaz Ahmad Bhat
摘要：了解数据集的复杂性是评估和比较知识图（KG）上的链接预测模型的基础。虽然累积谱梯度（CSG）度量，从谱聚类框架内的类之间的概率分歧，已被提出作为一个分类器不可知的复杂性度量，据称与类基数和相关的下游性能进行缩放，它还没有被评估在KG设置至今。在这项工作中，我们严格审查CSG的多关系链接预测的背景下，通过Transformer派生嵌入的语义表示。与先前的声明相反，我们发现CSG对参数化高度敏感，并且不随类的数量而稳健地缩放。此外，它与标准性能指标（如平均倒数秩（MRR）和Hit@1）的相关性较弱或不一致。为了加深分析，我们引入了一套结构和语义KG复杂性度量标准，并进行了基准测试。我们的研究结果表明，通过关系熵，节点级别的最大关系多样性，关系类型基数捕获的全球和局部关系模糊表现出强烈的负相关性与MRR和Hit@1，这表明这些任务难度的更忠实的指标。相反，图连接性度量，如平均度、度熵、PageRank和特征向量中心性与Hit@10正相关。我们的研究结果表明，CSGs声称的稳定性和泛化预测能力在链接预测设置中无法保持，并强调了在知识驱动学习中需要更稳定，可解释和任务对齐的数据集复杂性度量。
摘要：Understanding dataset complexity is fundamental to evaluating and comparing link prediction models on knowledge graphs (KGs). While the Cumulative Spectral Gradient (CSG) metric, derived from probabilistic divergence between classes within a spectral clustering framework, has been proposed as a classifier agnostic complexity metric purportedly scaling with class cardinality and correlating with downstream performance, it has not been evaluated in KG settings so far. In this work, we critically examine CSG in the context of multi relational link prediction, incorporating semantic representations via transformer derived embeddings. Contrary to prior claims, we find that CSG is highly sensitive to parametrisation and does not robustly scale with the number of classes. Moreover, it exhibits weak or inconsistent correlation with standard performance metrics such as Mean Reciprocal Rank (MRR) and Hit@1. To deepen the analysis, we introduce and benchmark a set of structural and semantic KG complexity metrics. Our findings reveal that global and local relational ambiguity captured via Relation Entropy, node level Maximum Relation Diversity, and Relation Type Cardinality exhibit strong inverse correlations with MRR and Hit@1, suggesting these as more faithful indicators of task difficulty. Conversely, graph connectivity measures such as Average Degree, Degree Entropy, PageRank, and Eigenvector Centrality correlate positively with Hit@10. Our results demonstrate that CSGs purported stability and generalization predictive power fail to hold in link prediction settings and underscore the need for more stable, interpretable, and task-aligned measures of dataset complexity in knowledge driven learning.

【8】Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis
标题：通过分子分解和贡献分析实现图神经网络中的分段可解释性
链接：https://arxiv.org/abs/2508.15015

作者： Musiał, Bartosz Zieliński, Tomasz Danel
摘要：图神经网络通过利用分子图中编码的丰富结构信息，在预测分子性质方面取得了显着的成功。然而，它们的黑箱性质降低了可解释性，这限制了对药物发现和材料设计等重要应用的预测的信任。此外，现有的解释技术往往无法可靠地量化单个原子或子结构的贡献，由于纠缠的消息传递动力学。我们介绍了SEAL（Substructure Explanation via Attribution Learning），这是一种新的可解释图神经网络，它将模型预测归因于有意义的分子子图。SEAL将输入图分解成化学相关的片段，并估计它们对输出的因果影响。片段贡献和模型预测之间的强对齐是通过显式减少片段间消息传递在我们提出的模型架构。对合成基准和真实世界分子数据集的广泛评估表明，SEAL在定量归因指标和人类对齐的可解释性方面优于其他可解释性方法。一项用户研究进一步证实，SEAL为领域专家提供了更直观、更可靠的解释。通过弥合预测性能和可解释性之间的差距，SEAL为更透明和可操作的分子建模提供了一个有前途的方向。
摘要：Graph neural networks have demonstrated remarkable success in predicting molecular properties by leveraging the rich structural information encoded in molecular graphs. However, their black-box nature reduces interpretability, which limits trust in their predictions for important applications such as drug discovery and materials design. Furthermore, existing explanation techniques often fail to reliably quantify the contribution of individual atoms or substructures due to the entangled message-passing dynamics. We introduce SEAL (Substructure Explanation via Attribution Learning), a new interpretable graph neural network that attributes model predictions to meaningful molecular subgraphs. SEAL decomposes input graphs into chemically relevant fragments and estimates their causal influence on the output. The strong alignment between fragment contributions and model predictions is achieved by explicitly reducing inter-fragment message passing in our proposed model architecture. Extensive evaluations on synthetic benchmarks and real-world molecular datasets demonstrate that SEAL outperforms other explainability methods in both quantitative attribution metrics and human-aligned interpretability. A user study further confirms that SEAL provides more intuitive and trustworthy explanations to domain experts. By bridging the gap between predictive performance and interpretability, SEAL offers a promising direction for more transparent and actionable molecular modeling.

【9】Fast Graph Neural Network for Image Classification
标题：用于图像分类的快速图神经网络
链接：https://arxiv.org/abs/2508.14958

作者：ohammadi Gharasuie, Luis Rueda
备注：12 pages, proceeding into CanadianAI 2025
摘要：图像分类的快速发展在很大程度上是由图卷积网络（GCN）的采用推动的，它为处理复杂的数据结构提供了一个强大的框架。这项研究介绍了一种新的方法，集成GCN与Voronoi图，以提高图像分类，利用他们的能力，有效地建模关系数据。与传统的卷积神经网络（CNN）不同，我们的方法将图像表示为图形，其中像素或区域作为顶点。然后使用相应的Delaunay三角剖分来细化这些图，优化它们的表示。该模型在各种基准数据集上的预处理效率和分类精度都有了显着提高，超过了最先进的方法，特别是在涉及复杂场景和细粒度类别的挑战性场景中。通过交叉验证验证的实验结果强调了GCN与Voronoi图相结合用于推进图像分类的有效性。这项研究不仅为图像分类提供了一个新的视角，而且扩展了基于图的学习范式在计算机视觉和非结构化数据分析中的潜在应用。
摘要：The rapid progress in image classification has been largely driven by the adoption of Graph Convolutional Networks (GCNs), which offer a robust framework for handling complex data structures. This study introduces a novel approach that integrates GCNs with Voronoi diagrams to enhance image classification by leveraging their ability to effectively model relational data. Unlike conventional convolutional neural networks (CNNs), our method represents images as graphs, where pixels or regions function as vertices. These graphs are then refined using corresponding Delaunay triangulations, optimizing their representation. The proposed model achieves significant improvements in both preprocessing efficiency and classification accuracy across various benchmark datasets, surpassing state-of-the-art approaches, particularly in challenging scenarios involving intricate scenes and fine-grained categories. Experimental results, validated through cross-validation, underscore the effectiveness of combining GCNs with Voronoi diagrams for advancing image classification. This research not only presents a novel perspective on image classification but also expands the potential applications of graph-based learning paradigms in computer vision and unstructured data analysis.

【10】JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs
标题：JEDI-linear：快速有效的图形神经网络，用于在VGA上进行Jet标记
链接：https://arxiv.org/abs/2508.15468

作者：Que, Chang Sun, Sudarshan Paramesvaran, Emyr Clement, Katerina Karakoulaki, Christopher Brown, Lauri Laatu, Arianna Cox, Alexander Tapper, Wayne Luk, Maria Spiropulu
备注：10 pages, 9 figures
摘要：图神经网络（GNN），特别是交互网络（IN），在CERN高亮度大型强子对撞机（HL-LHC）的喷流标记中表现出卓越的性能。然而，它们的计算复杂性和不规则的存储器访问模式对硬件触发系统中的FPGA部署提出了重大挑战，其中应用了严格的延迟和资源约束。在这项工作中，我们提出了JEDI-linear，一种具有线性计算复杂度的新型GNN架构，通过利用共享变换和全局聚合来消除显式的成对交互。为了进一步提高硬件效率，我们引入了具有每个参数位宽优化的细粒度量化感知训练，并通过分布式算法采用无乘法器的乘法累加运算。评估结果表明，与最先进的设计相比，我们基于FPGA的JEDI-linear实现了3.7至11.5倍的延迟，高达150倍的启动间隔，以及高达6.2倍的LUT使用率，同时还提供了更高的模型精度，并完全消除了对DSP模块的需求。相比之下，最先进的解决方案消耗超过8，700个DSP。这是第一个实现小于60 ns延迟的基于交互的GNN，目前符合HL-LHC CMS Level-1触发系统的要求。这项工作通过在实时环境中实现准确，可扩展和资源高效的GNN推理来推进下一代触发系统。我们的开源模板将进一步支持可重复性，并在科学应用中得到更广泛的采用。
摘要：Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. In contrast, state-of-the-art solutions consume over 8,700 DSPs. This is the first interaction-based GNN to achieve less than 60~ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.

Transformer(5篇)

【1】Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
标题：通过具有等级感知梁的变形器发现隐藏的代数结构GRPO
链接：https://arxiv.org/abs/2508.15766

作者：, Gio Huh, Ning Su, Tony Yue YU
摘要：最近的努力已经扩展了Transformers在逻辑推理和符号计算方面的能力。在这项工作中，我们调查他们的非线性潜在模式发现功能分解的背景下，专注于具有挑战性的代数任务的多元多项式分解的能力。这个问题在科学和工程中有着广泛的应用，被证明是NP难的，需要精确度和洞察力。我们的贡献有三个方面：首先，我们开发了一个合成数据生成管道，提供对问题复杂性的细粒度控制。其次，我们通过监督学习训练Transformer模型，并在涉及缩放行为和泛化能力的四个关键维度上对其进行评估。第三，我们提出了Beam Grouped Relative Policy Optimization（BGRPO），这是一种适用于硬代数问题的秩感知强化学习方法。使用BGRPO进行微调可以提高精度，同时将波束宽度减少一半，从而降低约75%的推理计算。此外，我们的模型在多项式简化方面表现出了竞争力，在各种情况下都优于Mathematica。
摘要：Recent efforts have extended the capabilities of transformers in logical reasoning and symbolic computations. In this work, we investigate their capacity for non-linear latent pattern discovery in the context of functional decomposition, focusing on the challenging algebraic task of multivariate polynomial decomposition. This problem, with widespread applications in science and engineering, is proved to be NP-hard, and demands both precision and insight. Our contributions are threefold: First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic problems. Finetuning with BGRPO improves accuracy while reducing beam width by up to half, resulting in approximately 75% lower inference compute. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.

【2】Amortized In-Context Mixed Effect Transformer Models: A Zero-Shot Approach for Pharmacokinetics
标题：在上下文中摊销混合效应Transformer模型：药代动力学的零冲击方法
链接：https://arxiv.org/abs/2508.15659

作者： Ojeda Marin, Wilhelm Huisinga, Purity Kavwele, Niklas Hartung
摘要：稀疏采样下精确的剂量反应预测是精确药物治疗的核心。我们提出了摊销的上下文混合效应Transformer（AICMET）模型，一个基于变压器的潜在变量框架，统一了机械房室先验与摊销的上下文贝叶斯推理。AICMET在数十万个合成药代动力学轨迹上进行了预训练，其中Ornstein-Uhlenbeck先验超过了房室模型的参数，从而赋予模型强大的诱导偏差，并使zero-shot能够适应新化合物。在推理时，解码器根据先前分析的试验参与者的集体背景进行调节，在几次早期药物浓度测量后为新入组的患者生成校准后验预测。这种能力将传统的模型开发周期从数周缩短到数小时，同时保留一定程度的专家建模。在公共数据集上的实验表明，AICMET达到了最先进的预测准确性，并忠实地量化了患者间的变异性-优于非线性混合效应基线和最近的神经ODE变体。我们的研究结果强调了基于转换器的群体感知神经架构的可行性，为定制的药代动力学建模管道提供了一种新的替代方案，为真正的群体感知个性化给药方案开辟了道路。
摘要：Accurate dose-response forecasting under sparse sampling is central to precision pharmacotherapy. We present the Amortized In-Context Mixed-Effect Transformer (AICMET) model, a transformer-based latent-variable framework that unifies mechanistic compartmental priors with amortized in-context Bayesian inference. AICMET is pre-trained on hundreds of thousands of synthetic pharmacokinetic trajectories with Ornstein-Uhlenbeck priors over the parameters of compartment models, endowing the model with strong inductive biases and enabling zero-shot adaptation to new compounds. At inference time, the decoder conditions on the collective context of previously profiled trial participants, generating calibrated posterior predictions for newly enrolled patients after a few early drug concentration measurements. This capability collapses traditional model-development cycles from weeks to hours while preserving some degree of expert modelling. Experiments across public datasets show that AICMET attains state-of-the-art predictive accuracy and faithfully quantifies inter-patient variability -- outperforming both nonlinear mixed-effects baselines and recent neural ODE variants. Our results highlight the feasibility of transformer-based, population-aware neural architectures as offering a new alternative for bespoke pharmacokinetic modeling pipelines, charting a path toward truly population-aware personalized dosing regimens.

【3】ExBigBang: A Dynamic Approach for Explainable Persona Classification through Contextualized Hybrid Transformer Analysis
标题：ExBigBang：通过上下文混合Transformer分析进行可解释人物角色分类的动态方法
链接：https://arxiv.org/abs/2508.15364

作者：oon, Amin Beheshti, Nabi Rezvani, Farshad Khunjush, Usman Naseem, John McMahon, Zahra Fathollahi, Mahdieh Labani, Wathiq Mansoor, Xuyun Zhang
摘要：在以用户为中心的设计中，角色开发在理解用户行为、捕捉需求、细分受众和指导设计决策方面发挥着至关重要的作用。然而，用户交互的日益复杂性需要一种更加情境化的方法，以确保设计与真正的用户需求保持一致。虽然早期的研究通过对用户行为建模来推进角色分类，但捕获上下文信息，特别是通过整合文本和表格数据，仍然是一个关键挑战。这些模型也往往缺乏可解释性，使其预测难以解释或证明。为了解决这些限制，我们提出了ExBigBang（可解释的BigBang），一个混合的文本表格的方法，使用基于transformer的架构来建模丰富的上下文特征的人物角色分类。ExBigBang整合了元数据、领域知识和用户分析，以将更深层次的背景嵌入到预测中。通过用户分析和分类的循环过程，我们的方法动态更新，以反映不断变化的用户行为。在基准人物分类数据集上的实验证明了该模型的鲁棒性。一项消融研究证实了结合文本和表格数据的好处，而可解释的人工智能技术则揭示了模型预测背后的原理。
摘要：In user-centric design, persona development plays a vital role in understanding user behaviour, capturing needs, segmenting audiences, and guiding design decisions. However, the growing complexity of user interactions calls for a more contextualized approach to ensure designs align with real user needs. While earlier studies have advanced persona classification by modelling user behaviour, capturing contextual information, especially by integrating textual and tabular data, remains a key challenge. These models also often lack explainability, leaving their predictions difficult to interpret or justify. To address these limitations, we present ExBigBang (Explainable BigBang), a hybrid text-tabular approach that uses transformer-based architectures to model rich contextual features for persona classification. ExBigBang incorporates metadata, domain knowledge, and user profiling to embed deeper context into predictions. Through a cyclical process of user profiling and classification, our approach dynamically updates to reflect evolving user behaviours. Experiments on a benchmark persona classification dataset demonstrate the robustness of our model. An ablation study confirms the benefits of combining text and tabular data, while Explainable AI techniques shed light on the rationale behind the model's predictions.

【4】SleepDIFFormer: Sleep Stage Classification via Multivariate Differential Transformer
标题：SleepDIFFormer：通过多元差异Transformer进行睡眠阶段分类
链接：https://arxiv.org/abs/2508.15215

作者：Wei Hao Chin, Yuin Torng Yew, Haocheng Wu, Lanxin Liang, Chow Khuen Chan, Norita Mohd Zain, Siti Balqis Samdin, Sim Kuan Goh
备注：8 Pages
摘要：睡眠阶段分类对于评估睡眠质量和诊断失眠等睡眠障碍至关重要。然而，每个阶段的EEG特征的手动检查是耗时的并且容易出现人为错误。虽然机器学习和深度学习方法已经得到了积极的发展，但它们仍然面临着来自脑电图（EEG）和眼电图（EOG）信号的非平稳性和可变性的挑战，这往往导致对未知数据集的泛化能力较差。本研究提出了一种睡眠阶段分类方法，通过开发多变量差分Transformer（SleepDIFFormer）的联合EEG和EOG表示学习。具体而言，SleepDIFFormer的开发是为了使用我们的多变量差分Transformer架构（MDTA）处理EEG和EOG信号，用于时间序列，并通过跨域对齐进行训练。我们的方法减轻了空间和时间注意力噪声，同时通过特征分布对齐学习域不变的联合EEG-EOG表示，从而能够推广到看不见的目标数据集。从经验上讲，我们在五个不同的睡眠分期数据集上评估了我们的方法，并将其与现有方法进行了比较，实现了最先进的性能。我们还对SleepDIFFormer进行了彻底的消融分析，并解释了不同的注意力权重，突出了它们与特征睡眠EEG模式的相关性。这些发现对推进自动睡眠阶段分类及其在睡眠质量评估中的应用具有重要意义。我们的源代码可在https://github.com/Ben1001409/SleepDIFFormer上公开获取
摘要：Classification of sleep stages is essential for assessing sleep quality and diagnosing sleep disorders such as insomnia. However, manual inspection of EEG characteristics for each stage is time-consuming and prone to human error. Although machine learning and deep learning methods have been actively developed, they continue to face challenges from the non-stationarity and variability of electroencephalography (EEG) and electrooculography (EOG) signals, often leading to poor generalization on unseen datasets. This research proposed a Sleep Stage Classification method by developing Multivariate Differential Transformer (SleepDIFFormer) for joint EEG and EOG representation learning. Specifically, SleepDIFFormer was developed to process EEG and EOG signals using our Multivariate Differential Transformer Architecture (MDTA) for time series, trained with cross-domain alignment. Our method mitigated spatial and temporal attention noise while learning a domain-invariant joint EEG-EOG representation through feature distribution alignment, thereby enabling generalization to unseen target datasets. Empirically, we evaluated our method on five different sleep staging datasets and compared it with existing approaches, achieving state-of-the-art performance. We also conducted thorough ablation analyses of SleepDIFFormer and interpreted the differential attention weights, highlighting their relevance to characteristic sleep EEG patterns. These findings have implications for advancing automated sleep stage classification and its application to sleep quality assessment. Our source code is publicly available at https://github.com/Ben1001409/SleepDIFFormer

【5】End-to-End Analysis of Charge Stability Diagrams with Transformers
标题：带有Transformer的电荷稳定图的端到端分析
链接：https://arxiv.org/abs/2508.15710

作者：chand, Lucas Schorling, Cornelius Carlsson, Jonas Schuff, Barnaby van Straaten, Taylor L. Patti, Federico Fedele, Joshua Ziegler, Parth Girdhar, Pranav Vaidhyanathan, Natalia Ares
备注：8 pages, 2 figures, RM and LS contributed equally
摘要：Transformer模型和端到端学习框架正在迅速改变人工智能领域。在这项工作中，我们应用对象检测Transformers来分析半导体量子点阵列中的电荷稳定性图，这是实现基于自旋的量子计算可扩展性的关键任务。具体来说，我们的模型识别三相点及其连通性，这对于虚拟栅极校准、电荷状态初始化、漂移校正和脉冲排序至关重要。我们证明，它在三种不同的自旋量子位架构上的性能超过了卷积神经网络，而所有这些都不需要再训练。与现有的方法相比，我们的方法显着降低了复杂性和运行时间，同时提高了泛化能力。结果突出了基于transformer的端到端学习框架作为可扩展的，设备和架构不可知的工具的基础，用于控制和调整量子点设备的潜力。
摘要：Transformer models and end-to-end learning frameworks are rapidly revolutionizing the field of artificial intelligence. In this work, we apply object detection transformers to analyze charge stability diagrams in semiconductor quantum dot arrays, a key task for achieving scalability with spin-based quantum computing. Specifically, our model identifies triple points and their connectivity, which is crucial for virtual gate calibration, charge state initialization, drift correction, and pulse sequencing. We show that it surpasses convolutional neural networks in performance on three different spin qubit architectures, all without the need for retraining. In contrast to existing approaches, our method significantly reduces complexity and runtime, while enhancing generalizability. The results highlight the potential of transformer-based end-to-end learning frameworks as a foundation for a scalable, device- and architecture-agnostic tool for control and tuning of quantum dot devices.

GAN|对抗|攻击|生成相关(8篇)

【1】Scaling Group Inference for Diverse and High-Quality Generation
标题：扩展群体推理以实现多元化和高质量的一代
链接：https://arxiv.org/abs/2508.15773

作者：rmar, Or Patashnik, Daniil Ostashev, Kuan-Chieh Wang, Kfir Aberman, Srinivasa Narasimhan, Jun-Yan Zhu
备注：Project website: this https URL, GitHub: this https URL
摘要：生成模型通常独立地对输出进行采样，最近的推理时间指导和缩放算法专注于提高单个样本的质量。然而，在现实世界的应用中，通常向用户呈现一组多个图像（例如，4-8)对于每一个提示，独立的抽样往往会导致冗余的结果，限制用户的选择，阻碍想法的探索。在这项工作中，我们引入了一个可扩展的组推理方法，提高了一组样本的多样性和质量。我们将群体推理公式化为二次整数分配问题：候选输出被建模为图节点，并选择一个子集来优化样本质量（一元项），同时最大化群体多样性（二元项）。为了大大提高运行时效率，我们使用中间预测逐步修剪候选集，使我们的方法能够扩展到大型候选集。大量的实验表明，我们的方法显着提高组的多样性和质量相比，独立的采样基线和最近的推理算法。我们的框架概括了广泛的任务，包括文本到图像，图像到图像，图像提示和视频生成，使生成模型能够将多个输出视为内聚组，而不是独立的样本。
摘要：Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a set of multiple images (e.g., 4-8) for each prompt, where independent sampling tends to lead to redundant results, limiting user choices and hindering idea exploration. In this work, we introduce a scalable group inference method that improves both the diversity and quality of a group of samples. We formulate group inference as a quadratic integer assignment problem: candidate outputs are modeled as graph nodes, and a subset is selected to optimize sample quality (unary term) while maximizing group diversity (binary term). To substantially improve runtime efficiency, we progressively prune the candidate set using intermediate predictions, allowing our method to scale up to large candidate sets. Extensive experiments show that our method significantly improves group diversity and quality compared to independent sampling baselines and recent inference algorithms. Our framework generalizes across a wide range of tasks, including text-to-image, image-to-image, image prompting, and video generation, enabling generative models to treat multiple outputs as cohesive groups rather than independent samples.

【2】Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space
标题：具有连续动作空间的多智能体强化学习中对抗性攻击的分布式检测
链接：https://arxiv.org/abs/2508.15764

作者：azari, Ezzeldin Shereen, György Dán
备注：Accepted for publication at ECAI 2025
摘要：针对具有连续动作空间的合作多智能体强化学习，提出了对抗性攻击的检测问题。我们提出了一个分散的检测器，它只依赖于代理的本地观测，并利用可观察代理的正常行为的统计特征。所提出的检测器利用深度神经网络将代理的正常行为近似为参数多元高斯分布。基于预测的密度函数，我们定义了正态性分数并提供了其均值和方差的特征。这种表征使我们能够采用双侧的CRAMUM程序来检测正常性分数与其平均值的偏差，作为实时异常行为的检测器。我们在各种多智能体PettingZoo基准测试中针对不同的最新攻击方法评估了我们的方案，我们的结果证明了我们的方法在检测有影响力的对抗性攻击方面的有效性。特别是，它通过在所有评估环境中针对最具影响力的攻击实现超过0.95的AUC-ROC得分，优于离散对手。
摘要：We address the problem of detecting adversarial attacks against cooperative multi-agent reinforcement learning with continuous action space. We propose a decentralized detector that relies solely on the local observations of the agents and makes use of a statistical characterization of the normal behavior of observable agents. The proposed detector utilizes deep neural networks to approximate the normal behavior of agents as parametric multivariate Gaussian distributions. Based on the predicted density functions, we define a normality score and provide a characterization of its mean and variance. This characterization allows us to employ a two-sided CUSUM procedure for detecting deviations of the normality score from its mean, serving as a detector of anomalous behavior in real-time. We evaluate our scheme on various multi-agent PettingZoo benchmarks against different state-of-the-art attack methods, and our results demonstrate the effectiveness of our method in detecting impactful adversarial attacks. Particularly, it outperforms the discrete counterpart by achieving AUC-ROC scores of over 0.95 against the most impactful attacks in all evaluated environments.

【3】BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning
标题：BadFU：通过对抗机器取消学习的后门联邦学习
链接：https://arxiv.org/abs/2508.15541

作者： Lu, Hongsheng Hu, Yuantian Miao, Shaleeza Sohail, Chaoxiang He, Shuo Wang, Xiao Chen
摘要：联邦学习（FL）已被广泛采用作为一种分散的训练范式，使多个客户端能够协作学习共享模型，而无需暴露其本地数据。随着对数据隐私和监管合规性的担忧日益增长，旨在消除训练模型中特定数据影响的机器学习在联邦环境中变得越来越重要，以满足法律，道德或用户驱动的需求。然而，将非学习集成到FL中会带来新的挑战，并在很大程度上带来未开发的安全风险。特别是，对手可能会利用遗忘过程来损害全局模型的完整性。在本文中，我们提出了第一个后门攻击的背景下，联邦unlearning，证明了对手可以注入后门到全球模型通过看似合法的unlearning请求。具体来说，我们提出BadFU，攻击策略，恶意客户端使用后门和伪装样本训练的全球模型通常在联邦训练过程中。一旦客户端请求取消伪装样本，全局模型就转换到后门状态。在各种FL框架和学习策略下进行的大量实验验证了BadFU的有效性，揭示了当前联邦学习实践中的一个关键漏洞，并强调了对更安全和更强大的联邦学习机制的迫切需要。
摘要：Federated learning (FL) has been widely adopted as a decentralized training paradigm that enables multiple clients to collaboratively learn a shared model without exposing their local data. As concerns over data privacy and regulatory compliance grow, machine unlearning, which aims to remove the influence of specific data from trained models, has become increasingly important in the federated setting to meet legal, ethical, or user-driven demands. However, integrating unlearning into FL introduces new challenges and raises largely unexplored security risks. In particular, adversaries may exploit the unlearning process to compromise the integrity of the global model. In this paper, we present the first backdoor attack in the context of federated unlearning, demonstrating that an adversary can inject backdoors into the global model through seemingly legitimate unlearning requests. Specifically, we propose BadFU, an attack strategy where a malicious client uses both backdoor and camouflage samples to train the global model normally during the federated training process. Once the client requests unlearning of the camouflage samples, the global model transitions into a backdoored state. Extensive experiments under various FL frameworks and unlearning strategies validate the effectiveness of BadFU, revealing a critical vulnerability in current federated unlearning practices and underscoring the urgent need for more secure and robust federated unlearning mechanisms.

【4】MMQ: Multimodal Mixture-of-Quantization Tokenization for Semantic ID Generation and User Behavioral Adaptation
标题：MMQ：用于语义ID生成和用户行为适应的多模式混合量化令牌化
链接：https://arxiv.org/abs/2508.15281

作者：yu Zhang, Chenxuan Li, Zhihao Liao, Haibo Xing, Hao Deng, Jinxin Hu, Yu Zhang, Xiaoyi Zeng, Jing Zhang
摘要：推荐系统传统上使用唯一标识符（ItemID）表示项目，但这种方法难以处理大型动态项目语料库和稀疏长尾数据，限制了可扩展性和泛化性。从文本和图像等多模态内容派生的语义ID通过将项目映射到共享语义空间，实现知识转移并改进对新项目或稀有项目的推荐，提供了一种有前途的替代方案。然而，现有方法面临两个关键挑战：（1）平衡跨模态协同与模态特定的独特性，以及（2）弥合语义-行为差距，其中语义表示可能与实际用户偏好不一致。为了解决这些挑战，我们提出了多模态混合量化（MMQ），这是一个两阶段的框架，可以训练一个新的多模态标记器。首先，特定于共享的标记器利用具有特定于模态和模态共享专家的多专家架构，使用正交正则化来捕获全面的多模态信息。其次，行为感知微调动态地适应语义ID下游推荐目标，同时通过多模态重建损失保留模态信息。大量的离线实验和在线A/B测试表明，MMQ有效地统一了多模态协同，特异性和行为适应，为生成检索和判别排名任务提供了一个可扩展的和通用的解决方案。
摘要：Recommender systems traditionally represent items using unique identifiers (ItemIDs), but this approach struggles with large, dynamic item corpora and sparse long-tail data, limiting scalability and generalization. Semantic IDs, derived from multimodal content such as text and images, offer a promising alternative by mapping items into a shared semantic space, enabling knowledge transfer and improving recommendations for new or rare items. However, existing methods face two key challenges: (1) balancing cross-modal synergy with modality-specific uniqueness, and (2) bridging the semantic-behavioral gap, where semantic representations may misalign with actual user preferences. To address these challenges, we propose Multimodal Mixture-of-Quantization (MMQ), a two-stage framework that trains a novel multimodal tokenizer. First, a shared-specific tokenizer leverages a multi-expert architecture with modality-specific and modality-shared experts, using orthogonal regularization to capture comprehensive multimodal information. Second, behavior-aware fine-tuning dynamically adapts semantic IDs to downstream recommendation objectives while preserving modality information through a multimodal reconstruction loss. Extensive offline experiments and online A/B tests demonstrate that MMQ effectively unifies multimodal synergy, specificity, and behavioral adaptation, providing a scalable and versatile solution for both generative retrieval and discriminative ranking tasks.

【5】A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives
标题：模型提取攻击和防御的系统性调查：最新技术水平和观点
链接：https://arxiv.org/abs/2508.15031

作者：Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong
摘要：机器学习（ML）模型在复杂性和实用性方面都有了显着增长，推动了多个领域的进步。然而，大量的计算资源和专业知识在历史上限制了它们的广泛采用。机器学习即服务（MLaaS）平台通过用户友好的API提供对复杂ML模型的可扩展、方便和经济实惠的访问，解决了这些障碍。虽然这种可访问性促进了高级ML功能的广泛使用，但它也引入了通过模型提取攻击（MEA）利用的漏洞。最近的研究表明，攻击者可以通过与公开的接口交互来系统地复制目标模型的功能，从而对知识产权、隐私和系统安全构成威胁。在本文中，我们提供了一个全面的调查多边环境协定和相应的防御策略。我们提出了一种新的分类法，根据攻击机制，防御方法和计算环境的MEA进行分类。我们的分析涵盖了各种攻击技术，评估了它们的有效性，并强调了现有防御所面临的挑战，特别是保护模型效用和确保安全性之间的关键权衡。我们进一步评估多边环境协定在不同的计算范式，并讨论其技术，伦理，法律和社会的影响，以及未来的研究有前途的方向。这项系统性调查旨在为从事人工智能安全和隐私的研究人员、从业人员和政策制定者提供有价值的参考。此外，我们在https://github.com/kzhao5/ModelExtractionPapers上维护一个在线存储库，不断更新相关文献。
摘要：Machine learning (ML) models have significantly grown in complexity and utility, driving advances across multiple domains. However, substantial computational resources and specialized expertise have historically restricted their wide adoption. Machine-Learning-as-a-Service (MLaaS) platforms have addressed these barriers by providing scalable, convenient, and affordable access to sophisticated ML models through user-friendly APIs. While this accessibility promotes widespread use of advanced ML capabilities, it also introduces vulnerabilities exploited through Model Extraction Attacks (MEAs). Recent studies have demonstrated that adversaries can systematically replicate a target model's functionality by interacting with publicly exposed interfaces, posing threats to intellectual property, privacy, and system security. In this paper, we offer a comprehensive survey of MEAs and corresponding defense strategies. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments. Our analysis covers various attack techniques, evaluates their effectiveness, and highlights challenges faced by existing defenses, particularly the critical trade-off between preserving model utility and ensuring security. We further assess MEAs within different computing paradigms and discuss their technical, ethical, legal, and societal implications, along with promising directions for future research. This systematic survey aims to serve as a valuable reference for researchers, practitioners, and policymakers engaged in AI security and privacy. Additionally, we maintain an online repository continuously updated with related literature at https://github.com/kzhao5/ModelExtractionPapers.

【6】Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System
标题：Aura-验证码：强化学习和GAN增强型多模式验证码系统
链接：https://arxiv.org/abs/2508.14976

作者：handra, Prabal Manhas, Ramanjot Kaur, Rashi Sahay
摘要：Aura-CAPTCHA是一个多模式CAPTCHA系统，旨在解决人工智能技术（如光学字符识别（OCR）和对抗性图像处理）越来越多地绕过传统方法的漏洞。该设计集成了用于生成动态图像挑战的生成对抗网络（GANs）、用于自适应难度调整的强化学习（RL）以及用于创建文本和音频提示的大型语言模型（LLM）。视觉挑战包括至少三个正确图像的3x 3网格选择，而音频挑战将随机数字和单词组合到单个任务中。RL根据错误尝试、响应时间和可疑用户行为调整难度。对真实世界流量的评估显示，人类成功率为92%，机器人绕过率为10%，明显优于现有的CAPTCHA系统。该系统提供了一个强大的和可扩展的方法来保护在线应用程序，同时保持用户的访问，解决了以前的研究中突出的差距。
摘要：Aura-CAPTCHA was developed as a multi-modal CAPTCHA system to address vulnerabilities in traditional methods that are increasingly bypassed by AI technologies, such as Optical Character Recognition (OCR) and adversarial image processing. The design integrated Generative Adversarial Networks (GANs) for generating dynamic image challenges, Reinforcement Learning (RL) for adaptive difficulty tuning, and Large Language Models (LLMs) for creating text and audio prompts. Visual challenges included 3x3 grid selections with at least three correct images, while audio challenges combined randomized numbers and words into a single task. RL adjusted difficulty based on incorrect attempts, response time, and suspicious user behavior. Evaluations on real-world traffic demonstrated a 92% human success rate and a 10% bot bypass rate, significantly outperforming existing CAPTCHA systems. The system provided a robust and scalable approach for securing online applications while remaining accessible to users, addressing gaps highlighted in previous research.

【7】MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers
标题：MCPTox：对现实世界的HCP服务器进行工具中毒攻击的基准
链接：https://arxiv.org/abs/2508.14925

作者：Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li
摘要：通过为LLM代理提供与外部工具交互的标准化接口，模型上下文协议（MCP）正在迅速成为现代自治代理生态系统的基石。但是，由于不受信任的外部工具，它会创建新的攻击面。虽然以前的工作主要集中在通过外部工具输出注入的攻击，我们调查了一个更根本的漏洞：工具中毒，其中恶意指令嵌入在工具的元数据中而不执行。迄今为止，这一威胁主要是通过孤立的案件来显示的，缺乏系统的大规模评估。我们介绍MCPTox，第一个基准系统地评估代理的鲁棒性对工具中毒在现实的MCP设置。MCPTox基于45个真实的MCP服务器和353个真实的工具。为了实现这一目标，我们设计了三个不同的攻击模板，通过Few-Shot学习生成一套全面的1312个恶意测试用例，涵盖了10类潜在风险。我们对20个突出的LLM代理设置的评估揭示了工具中毒的广泛漏洞，o 1-mini的攻击成功率为72.8%。我们发现，更有能力的模型往往更容易受到影响，因为攻击利用了他们优越的防御能力。最后，失败案例分析表明，代理很少拒绝这些攻击，最高拒绝率（克劳德-3.7-十四行诗）小于3%，表明现有的安全对齐是无效的恶意行为，使用合法的工具进行未经授权的操作。我们的研究结果为理解和减轻这种广泛的威胁创造了一个关键的经验基线，我们发布了MCPTox，用于开发可验证的更安全的AI代理。我们的数据集可以在匿名存储库中找到：\textit{https：//anonymous.4open.science/r/AAAI26-7C02}。
摘要：By providing a standardized interface for LLM agents to interact with external tools, the Model Context Protocol (MCP) is quickly becoming a cornerstone of the modern autonomous agent ecosystem. However, it creates novel attack surfaces due to untrusted external tools. While prior work has focused on attacks injected through external tool outputs, we investigate a more fundamental vulnerability: Tool Poisoning, where malicious instructions are embedded within a tool's metadata without execution. To date, this threat has been primarily demonstrated through isolated cases, lacking a systematic, large-scale evaluation. We introduce MCPTox, the first benchmark to systematically evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox is constructed upon 45 live, real-world MCP servers and 353 authentic tools. To achieve this, we design three distinct attack templates to generate a comprehensive suite of 1312 malicious test cases by few-shot learning, covering 10 categories of potential risks. Our evaluation on 20 prominent LLM agents setting reveals a widespread vulnerability to Tool Poisoning, with o1-mini, achieving an attack success rate of 72.8\%. We find that more capable models are often more susceptible, as the attack exploits their superior instruction-following abilities. Finally, the failure case analysis reveals that agents rarely refuse these attacks, with the highest refused rate (Claude-3.7-Sonnet) less than 3\%, demonstrating that existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operation. Our findings create a crucial empirical baseline for understanding and mitigating this widespread threat, and we release MCPTox for the development of verifiably safer AI agents. Our dataset is available at an anonymized repository: \textit{https://anonymous.4open.science/r/AAAI26-7C02}.

【8】Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI
标题：生成对抗网络在4D Flow MRI中实现超分辨率的潜力和挑战
链接：https://arxiv.org/abs/2508.14950

作者：lin Odeback, Arivazhagan Geetha Balasubramanian, Jonas Schollenberger, Edward Ferdiand, Alistair A. Young, C. Alberto Figueroa, Susanne Schnell, Outi Tammisola, Ricardo Vinuesa, Tobias Granberg, Alexander Fyrdahl, David Marlevi
备注：23 pages, 9 figures
摘要：4D Flow磁共振成像（4D Flow MRI）能够无创量化血流和血液动力学参数。然而，其临床应用受到低空间分辨率和噪声的限制，特别是影响近壁速度测量。基于机器学习的超分辨率在解决这些限制方面表现出了希望，但挑战仍然存在，尤其是在恢复近壁速度方面。生成对抗网络（GANs）提供了一个令人信服的解决方案，在非医疗超分辨率任务中恢复清晰边界方面表现出强大的能力。然而，它们在4D Flow MRI中的应用仍未得到探索，其实施受到已知问题的挑战，例如训练不稳定性和不收敛。在这项研究中，我们研究了4D Flow MRI中基于GAN的超分辨率。使用患者特定脑血管计算机模拟模型进行训练和验证，通过MR真实重建管道转换为合成图像。一个专用的GAN架构的实施和评估跨越三个对抗损失函数：香草，相对论，和Wasserstein。我们的研究结果表明，与非对抗性参考相比，所提出的GAN改善了近壁速度恢复（vNRMSE：6.9% vs. 9.6%）;然而，实现细节对于稳定的网络训练至关重要。虽然Vanilla和Relativistic GANs与仅生成器训练相比不稳定（vNRMSE：8.1%和7.8% vs. 7.2%），但Wasserstein GAN表现出最佳的稳定性和增量改进（vNRMSE：6.9% vs. 7.2%）。Wasserstein GAN在低SNR下的性能进一步优于仅发生器基线（vNRMSE：8.7% vs. 10.7%）。这些发现强调了基于GAN的超分辨率在增强4D Flow MRI方面的潜力，特别是在具有挑战性的脑血管区域，同时强调需要仔细选择对抗策略。
摘要：4D Flow Magnetic Resonance Imaging (4D Flow MRI) enables non-invasive quantification of blood flow and hemodynamic parameters. However, its clinical application is limited by low spatial resolution and noise, particularly affecting near-wall velocity measurements. Machine learning-based super-resolution has shown promise in addressing these limitations, but challenges remain, not least in recovering near-wall velocities. Generative adversarial networks (GANs) offer a compelling solution, having demonstrated strong capabilities in restoring sharp boundaries in non-medical super-resolution tasks. Yet, their application in 4D Flow MRI remains unexplored, with implementation challenged by known issues such as training instability and non-convergence. In this study, we investigate GAN-based super-resolution in 4D Flow MRI. Training and validation were conducted using patient-specific cerebrovascular in-silico models, converted into synthetic images via an MR-true reconstruction pipeline. A dedicated GAN architecture was implemented and evaluated across three adversarial loss functions: Vanilla, Relativistic, and Wasserstein. Our results demonstrate that the proposed GAN improved near-wall velocity recovery compared to a non-adversarial reference (vNRMSE: 6.9% vs. 9.6%); however, that implementation specifics are critical for stable network training. While Vanilla and Relativistic GANs proved unstable compared to generator-only training (vNRMSE: 8.1% and 7.8% vs. 7.2%), a Wasserstein GAN demonstrated optimal stability and incremental improvement (vNRMSE: 6.9% vs. 7.2%). The Wasserstein GAN further outperformed the generator-only baseline at low SNR (vNRMSE: 8.7% vs. 10.7%). These findings highlight the potential of GAN-based super-resolution in enhancing 4D Flow MRI, particularly in challenging cerebrovascular regions, while emphasizing the need for careful selection of adversarial strategies.

半/弱/无/有监督|不确定性|主动学习(6篇)

【1】Measures of Overlapping Multivariate Gaussian Clusters in Unsupervised Online Learning
标题：无监督在线学习中多元高斯集群重叠的测量
链接：https://arxiv.org/abs/2508.15444

作者：t, Igor Škrjanc
备注：5 pages, in Slovenian language. 2 figures. Accepted for the 33rd International Electrotechnical and Computer Science Conference ERK 2024 (Portoroz, Slovenia, 26-27 Sep 2024). Conference PDF: https://erk.fe.uni-lj.si/2024/papers/ozbot(mere_prekrivanja).pdf
摘要：在本文中，我们提出了一个新的措施，检测重叠的多元高斯聚类。从数据流中在线学习的目的是创建聚类、分类或回归模型，这些模型可以根据流数据的概念漂移随时间变化。在聚类的情况下，这可能会导致大量的集群可能重叠，应该合并。通常使用的分布相异性措施是不足以确定重叠的集群的上下文中的在线学习流数据，由于他们无法考虑所有形状的集群和他们的高计算需求。我们提出的相异度测量是专门设计来检测重叠，而不是相异度，可以更快地计算相比，现有的措施。我们的方法比比较方法快几倍，并且能够检测重叠的聚类，同时避免合并正交聚类。
摘要：In this paper, we propose a new measure for detecting overlap in multivariate Gaussian clusters. The aim of online learning from data streams is to create clustering, classification, or regression models that can adapt over time based on the conceptual drift of streaming data. In the case of clustering, this can result in a large number of clusters that may overlap and should be merged. Commonly used distribution dissimilarity measures are not adequate for determining overlapping clusters in the context of online learning from streaming data due to their inability to account for all shapes of clusters and their high computational demands. Our proposed dissimilarity measure is specifically designed to detect overlap rather than dissimilarity and can be computed faster compared to existing measures. Our method is several times faster than compared methods and is capable of detecting overlapping clusters while avoiding the merging of orthogonal clusters.

【2】Federated Learning based on Self-Evolving Gaussian Clustering
标题：基于自进化高斯集群的联邦学习
链接：https://arxiv.org/abs/2508.15393

作者：t, Igor Škrjanc
备注：5 pages, in slovenian language, 3 figures. Published in the Proceedings of the 33rd International Electrotechnical and Computer Science Conference (ERK 2024), Portoroz, Slovenia, pp. 240-243. Indexed in COBISS (this http URL-ID 212879107). Official version available at this https URL
摘要：在这项研究中，我们提出了一个不断发展的模糊系统的背景下，联邦学习，适应动态添加新的集群，因此不需要的集群数量被选择先验。与传统方法不同，联合学习允许在客户端设备上本地训练模型，仅与中央服务器共享模型参数而不是数据。我们使用PyTorch实现的方法在聚类和分类任务上进行了测试。结果表明，我们的方法优于几个著名的UCI数据集上建立的分类方法。虽然由于重叠条件计算而导致计算密集型，但所提出的方法在分散式数据处理中表现出显着的优势。
摘要：In this study, we present an Evolving Fuzzy System within the context of Federated Learning, which adapts dynamically with the addition of new clusters and therefore does not require the number of clusters to be selected apriori. Unlike traditional methods, Federated Learning allows models to be trained locally on clients' devices, sharing only the model parameters with a central server instead of the data. Our method, implemented using PyTorch, was tested on clustering and classification tasks. The results show that our approach outperforms established classification methods on several well-known UCI datasets. While computationally intensive due to overlap condition calculations, the proposed method demonstrates significant advantages in decentralized data processing.

【3】Twin-Boot: Uncertainty-Aware Optimization via Online Two-Sample Bootstrapping
标题：Twin-Boot：通过在线两样本引导进行不确定性感知优化
链接：https://arxiv.org/abs/2508.15019

作者：ein Brito
备注：12 pages, 6 figures
摘要：标准的梯度下降方法产生点估计，没有置信度。这种局限性在过度参数化和低数据状态下尤为严重，在这些状态下，模型具有许多相对于可用数据的参数，并且很容易过拟合。Bootstrapping是一个经典的统计框架，用于基于响应的不确定性估计，但天真地将其应用于深度学习是不切实际的：它需要训练许多副本，产生无法指导学习的事后估计，并隐含地假设跨运行的可比最优值-一个在非凸景观中失败的假设。我们介绍了双引导梯度下降（双引导），一个基于重采样的训练过程，将不确定性估计集成到优化中。两个相同的模型在独立的自举样本上并行训练，并且周期性的均值重置使两个轨迹保持在同一流域中，使得它们的分歧反映了局部（流域内）的不确定性。在训练过程中，我们使用此估计以自适应、数据驱动的方式对权重进行采样，提供有利于更平坦解决方案的正则化。在深度神经网络和复杂的高维逆问题中，该方法改进了校准和泛化，并产生可解释的不确定性图。
摘要：Standard gradient descent methods yield point estimates with no measure of confidence. This limitation is acute in overparameterized and low-data regimes, where models have many parameters relative to available data and can easily overfit. Bootstrapping is a classical statistical framework for uncertainty estimation based on resampling, but naively applying it to deep learning is impractical: it requires training many replicas, produces post-hoc estimates that cannot guide learning, and implicitly assumes comparable optima across runs - an assumption that fails in non-convex landscapes. We introduce Twin-Bootstrap Gradient Descent (Twin-Boot), a resampling-based training procedure that integrates uncertainty estimation into optimization. Two identical models are trained in parallel on independent bootstrap samples, and a periodic mean-reset keeps both trajectories in the same basin so that their divergence reflects local (within-basin) uncertainty. During training, we use this estimate to sample weights in an adaptive, data-driven way, providing regularization that favors flatter solutions. In deep neural networks and complex high-dimensional inverse problems, the approach improves calibration and generalization and yields interpretable uncertainty maps.

【4】Label Uncertainty for Ultrasound Segmentation
标题：超声分割的标签不确定性
链接：https://arxiv.org/abs/2508.15635

作者：ivaram, Gautam Rajendrakumar Gare, Laura Hutchins, Jacob Duplantis, Thomas Deiss, Thales Nogueira Gomes, Thong Tran, Keyur H. Patel, Thomas H Fox, Amita Krishnan, Deva Ramanan, Bennett DeBoisblanc, Ricardo Rodriguez, John Galeotti
备注：Paper under review
摘要：在医学成像中，放射科医师之间的观察者间差异通常会引入标签不确定性，特别是在视觉解释是主观的模态中。肺部超声（LUS）就是一个很好的例子，它经常呈现高度模糊的区域和清晰可辨的结构的混合物，即使对于经验丰富的临床医生来说，也难以进行一致的注释。在这项工作中，我们引入了一种新的方法，使用专家提供的每像素置信度值来标记和训练AI模型。我们设计了一个数据注释协议，该协议捕获了放射科医生在每个标记区域中的置信度，而不是将注释视为绝对的基础事实，从而对现实世界临床数据中存在的固有随机不确定性进行建模。我们证明，在训练过程中纳入这些置信度值可以提高分割性能。更重要的是，我们表明，这种增强的分割质量转化为下游临床关键任务的更好的性能，特别是，估计S/F氧合比值，分类S/F比值变化，并预测30天的患者再入院。虽然我们根据经验评估了许多将不确定性暴露给学习模型的方法，但我们发现一种简单的方法可以在以（60%）置信阈值获得的二值化标签上训练模型。重要的是，高阈值比50%阈值的简单方法要好得多，这表明在非常自信的像素上进行训练要有效得多。我们的研究系统地研究了具有不同置信阈值的训练的影响，不仅比较了分割指标，还比较了下游临床结果。这些结果表明，标签置信度是一个有价值的信号，当适当利用时，可以显着提高AI在医学成像中的可靠性和临床实用性。
摘要：In medical imaging, inter-observer variability among radiologists often introduces label uncertainty, particularly in modalities where visual interpretation is subjective. Lung ultrasound (LUS) is a prime example-it frequently presents a mixture of highly ambiguous regions and clearly discernible structures, making consistent annotation challenging even for experienced clinicians. In this work, we introduce a novel approach to both labeling and training AI models using expert-supplied, per-pixel confidence values. Rather than treating annotations as absolute ground truth, we design a data annotation protocol that captures the confidence that radiologists have in each labeled region, modeling the inherent aleatoric uncertainty present in real-world clinical data. We demonstrate that incorporating these confidence values during training leads to improved segmentation performance. More importantly, we show that this enhanced segmentation quality translates into better performance on downstream clinically-critical tasks-specifically, estimating S/F oxygenation ratio values, classifying S/F ratio change, and predicting 30-day patient readmission. While we empirically evaluate many methods for exposing the uncertainty to the learning model, we find that a simple approach that trains a model on binarized labels obtained with a (60%) confidence threshold works well. Importantly, high thresholds work far better than a naive approach of a 50% threshold, indicating that training on very confident pixels is far more effective. Our study systematically investigates the impact of training with varying confidence thresholds, comparing not only segmentation metrics but also downstream clinical outcomes. These results suggest that label confidence is a valuable signal that, when properly leveraged, can significantly enhance the reliability and clinical utility of AI in medical imaging.

【5】LoUQAL: Low-fidelity informed Uncertainty Quantification for Active Learning in the chemical configuration space
标题：LoUQAL：化学配置空间中的低保真信息不确定性量化主动学习
链接：https://arxiv.org/abs/2508.15577

作者：od, Peter Zaspel
摘要：不确定性量化是主动学习技术中的一个重要方案，包括在预测量子化学性质中的应用。在量子化学计算中，存在保真度的概念，即以较低的计算成本获得较不精确的计算。这项工作提出了一种新的低保真度知情的不确定性量化的主动学习与预测不同的量子化学性质，如激发能和\textit{ab initio}势能面的应用。计算实验进行，以评估所提出的方法与结果表明，与新的方法训练的模型优于替代品的经验误差和所需的迭代次数。还研究了保真度的选择的影响，以进行彻底的基准测试。
摘要：Uncertainty quantification is an important scheme in active learning techniques, including applications in predicting quantum chemical properties. In quantum chemical calculations, there exists the notion of a fidelity, a less accurate computation is accessible at a cheaper computational cost. This work proposes a novel low-fidelity informed uncertainty quantification for active learning with applications in predicting diverse quantum chemical properties such as excitation energies and \textit{ab initio} potential energy surfaces. Computational experiments are carried out in order to assess the proposed method with results demonstrating that models trained with the novel method outperform alternatives in terms of empirical error and number of iterations required. The effect of the choice of fidelity is also studied to perform a thorough benchmark.

【6】CUTE-MRI: Conformalized Uncertainty-based framework for Time-adaptivE MRI
标题：CUTE-MRI：时间自适应E MRI的基于不确定性的共形框架
链接：https://arxiv.org/abs/2508.14952

作者：her, Jan Nikolas Morshuis, Thomas Küstner, Christian Baumgartner
摘要：磁共振成像（MRI）提供无与伦比的软组织对比度，但从根本上受到长采集时间的限制。虽然基于深度学习的加速MRI可以大大缩短扫描时间，但欠采样数据的重建会引入由不适定问题引起的模糊性，该问题具有无限多个可能的解决方案，并传播到下游临床任务。这种不确定性通常在采集过程中被忽略，因为加速因子通常是先验固定的，导致扫描时间不必要地长或对于给定的临床终点质量不足。这项工作介绍了一个动态的，不确定性感知的采集框架，调整扫描时间的基础上，每个主题。我们的方法利用概率重建模型来估计图像不确定性，然后通过完整的分析管道将其传播到感兴趣的定量度量（例如，髌骨软骨体积或心脏射血分数）。我们使用保形预测将这种不确定性转化为一个严格的，校准的置信区间的度量。在采集期间，系统迭代地对k空间进行采样，更新重建，并评估置信区间。一旦不确定性达到用户预定义的精度目标，扫描自动终止。我们在膝关节和心脏MRI数据集上验证了我们的框架。我们的研究结果表明，这种自适应的方法减少了扫描时间相比，固定的协议，同时提供正式的统计保证的最终图像的精度。该框架超越了固定的加速因子，实现了针对患者的采集，平衡了扫描效率与诊断信心，这是迈向个性化和资源高效型MRI的关键一步。
摘要：Magnetic Resonance Imaging (MRI) offers unparalleled soft-tissue contrast but is fundamentally limited by long acquisition times. While deep learning-based accelerated MRI can dramatically shorten scan times, the reconstruction from undersampled data introduces ambiguity resulting from an ill-posed problem with infinitely many possible solutions that propagates to downstream clinical tasks. This uncertainty is usually ignored during the acquisition process as acceleration factors are often fixed a priori, resulting in scans that are either unnecessarily long or of insufficient quality for a given clinical endpoint. This work introduces a dynamic, uncertainty-aware acquisition framework that adjusts scan time on a per-subject basis. Our method leverages a probabilistic reconstruction model to estimate image uncertainty, which is then propagated through a full analysis pipeline to a quantitative metric of interest (e.g., patellar cartilage volume or cardiac ejection fraction). We use conformal prediction to transform this uncertainty into a rigorous, calibrated confidence interval for the metric. During acquisition, the system iteratively samples k-space, updates the reconstruction, and evaluates the confidence interval. The scan terminates automatically once the uncertainty meets a user-predefined precision target. We validate our framework on both knee and cardiac MRI datasets. Our results demonstrate that this adaptive approach reduces scan times compared to fixed protocols while providing formal statistical guarantees on the precision of the final image. This framework moves beyond fixed acceleration factors, enabling patient-specific acquisitions that balance scan efficiency with diagnostic confidence, a critical step towards personalized and resource-efficient MRI.

迁移|Zero/Few/One-Shot|自适应(9篇)

【1】Conditionally adaptive augmented Lagrangian method for physics-informed learning of forward and inverse problems using artificial neural networks
标题：使用人工神经网络进行正向和逆问题的物理信息学习的连续自适应增广拉格朗日方法
链接：https://arxiv.org/abs/2508.15695

作者：, Shamsulhaq Basir, Inanc Senocak
备注：37 pages, 23 figures
摘要：我们提出了物理和等式约束的人工神经网络（PECANN）框架，大大提高了其学习解决方案的典型偏微分方程（PDE）的能力的几个进展。首先，我们推广增广拉格朗日方法（ALM），以支持多个独立的惩罚参数，使异构约束的同时执行。其次，我们将逐点约束执行和拉格朗日乘子重新表示为约束条件的期望，从而减少内存开销并允许高效的小批量训练。第三，为了解决偏微分方程的振荡，多尺度功能，我们将傅立叶特征映射，并表明，一个单一的映射足够多的映射或更昂贵的架构，需要在相关的方法。第四，我们引入了一个时间窗口的策略，其中每个窗口的终端状态被强制执行作为下一个初始条件约束，确保连续性，而不离散时间模型。最重要的是，我们提出了一个有条件的自适应惩罚更新（CAPU）策略的ALM，它保留了较大的约束违反招致更强的处罚的原则。CAPU加速了拉格朗日乘数的增长，有选择地挑战约束，增强了训练过程中的约束执行。我们证明了PECANN-CAPU的有效性问题，包括跨音速稀疏问题，可逆对流的被动涡，高波数亥姆霍兹和泊松方程，以及逆识别空间变化的热源。与现有方法和最近的Kolmogorov-Arnold网络方法的比较表明，PECANN-CAPU在所有情况下都具有竞争力的准确性。总的来说，这些进步提高了PECANN的鲁棒性，效率和适用性，要求在科学计算的问题。
摘要：We present several advances to the physics and equality constrained artificial neural networks (PECANN) framework that substantially improve its capability to learn solutions of canonical partial differential equations (PDEs). First, we generalize the augmented Lagrangian method (ALM) to support multiple independent penalty parameters, enabling simultaneous enforcement of heterogeneous constraints. Second, we reformulate pointwise constraint enforcement and Lagrange multipliers as expectations over constraint terms, reducing memory overhead and permitting efficient mini-batch training. Third, to address PDEs with oscillatory, multi-scale features, we incorporate Fourier feature mappings and show that a single mapping suffices where multiple mappings or more costly architectures were required in related methods. Fourth, we introduce a time-windowing strategy for long-time evolution in which the terminal state of each window is enforced as an initial-condition constraint for the next, ensuring continuity without discrete time models. Crucially, we propose a conditionally adaptive penalty update (CAPU) strategy for ALM, which preserves the principle that larger constraint violations incur stronger penalties. CAPU accelerates the growth of Lagrange multipliers for selectively challenging constraints, enhancing constraint enforcement during training. We demonstrate the effectiveness of PECANN-CAPU on problems including the transonic rarefaction problem, reversible advection of a passive by a vortex, high-wavenumber Helmholtz and Poisson equations, and inverse identification of spatially varying heat sources. Comparisons with established methods and recent Kolmogorov-Arnold network approaches show that PECANN-CAPU achieves competitive accuracy across all cases. Collectively, these advances improve PECANN's robustness, efficiency, and applicability to demanding problems in scientific computing.

【2】Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
标题：通过概率高斯对齐的无反向传播测试时间自适应
链接：https://arxiv.org/abs/2508.15568

作者：ang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong
摘要：测试时自适应（TTA）通过在推理过程中利用未标记的测试数据来增强分布偏移下的zero-shot鲁棒性。尽管取得了显著进展，但若干挑战仍然限制了其更广泛的适用性。首先，大多数方法依赖于反向传播或迭代优化，这限制了可扩展性并阻碍了实时部署。其次，它们缺乏类条件特征分布的显式建模。这种建模对于产生可靠的决策边界和校准的预测至关重要，但由于缺乏源数据和测试时的监督，它仍然没有得到充分的探索。在本文中，我们提出了ADAPT，一个先进的分布感知和反向传播免费的测试时间适应方法。我们将TTA重新定义为高斯概率推理任务，通过使用逐渐更新的类均值和共享协方差矩阵建模类条件似然。这就实现了封闭形式的、无需训练的推理。为了纠正潜在的似然偏差，我们引入了由CLIP先验和历史知识库指导的轻量级正则化。ADAPT不需要源数据、梯度更新和对目标数据的完全访问，支持在线和转换设置。在不同的基准测试中进行的大量实验表明，我们的方法在广泛的分布变化下实现了最先进的性能，具有卓越的可扩展性和鲁棒性。
摘要：Test-time adaptation (TTA) enhances the zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Despite notable advances, several challenges still limit its broader applicability. First, most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. Second, they lack explicit modeling of class-conditional feature distributions. This modeling is crucial for producing reliable decision boundaries and calibrated predictions, but it remains underexplored due to the lack of both source data and supervision at test time. In this paper, we propose ADAPT, an Advanced Distribution-Aware and backPropagation-free Test-time adaptation method. We reframe TTA as a Gaussian probabilistic inference task by modeling class-conditional likelihoods using gradually updated class means and a shared covariance matrix. This enables closed-form, training-free inference. To correct potential likelihood bias, we introduce lightweight regularization guided by CLIP priors and a historical knowledge bank. ADAPT requires no source data, no gradient updates, and no full access to target data, supporting both online and transductive settings. Extensive experiments across diverse benchmarks demonstrate that our method achieves state-of-the-art performance under a wide range of distribution shifts with superior scalability and robustness.

【3】Think in Blocks: Adaptive Reasoning from Direct Response to Deep Reasoning
标题：区块思考：从直接反应到深度推理的适应性推理
链接：https://arxiv.org/abs/2508.15507

作者：, Guang Chen, Chengjun Mao
摘要：具有思想链的大型语言模型（LLM）在越来越多的任务中表现出强大的性能，特别是那些涉及复杂逻辑推理的任务。然而，过长的链可能会导致过度思考，导致计算浪费和反应较慢。这就提出了一个问题：LLM能否根据任务复杂度动态调整推理过程的长度？为了解决这个问题，我们提出了在块框架，它使自适应推理从零到深推理的推理过程划分成一个可调的块数。我们的主要贡献是：（1）建立一个显式的块结构范式，模型首先预测一个整数推理参数--块的数量--然后相应地划分其推理：（2）通过一个三阶段流水线--监督微调、奖励引导的直接偏好优化和强化学习--训练一个自适应模型，该流水线根据问题的难度调整其推理深度;（3）利用显式块计数在推理时动态控制推理深度，允许在部署期间灵活调整思想链长度。
摘要：Large Language Models (LLMs) with chains-of-thought have demonstrated strong performance on an increasing range of tasks, particularly those involving complex logical reasoning. However, excessively long chains can lead to overthinking, causing computational waste and slower responses. This raises a question: can LLMs dynamically adjust the length of their reasoning processes based on task complexity? To address this, we propose the Think in Blocks framework, which enables adaptive reasoning-from zero to deep reasoning-by partitioning the reasoning process into a tunable number of blocks. Our main contributions are: (1) Establishing an explicit block-structured paradigm in which the model first predicts an integer reasoning budget-the number of blocks-and then partitions its reasoning accordingly; (2) Training an adaptive model through a three-stage pipeline-Supervised Fine-Tuning, reward-guided Direct Preference Optimization, and Reinforcement Learning-that adjusts its reasoning depth to problem difficulty; (3) Exploiting the explicit block count to dynamically control reasoning depth at inference time, allowing flexible adjustment of chain-of-thought length during deployment.

【4】Bridging Generalization and Personalization in Wearable Human Activity Recognition via On-Device Few-Shot Learning
标题：通过设备上Few-Shot学习在可穿戴人类活动识别中架起概括和个性化的桥梁
链接：https://arxiv.org/abs/2508.15413

作者：, Julian Moosmann, Mengxi Liu, Bo Zhou, Michele Magno, Paul Lukowicz, Sizhen Bian
摘要：近年来，使用可穿戴设备的人类活动识别（HAR）取得了显着进展，但当模型部署到新用户时，其推广仍然有限。这种性能下降主要是由于用户引起的概念漂移（UICD），突出了有效个性化的重要性。在本文中，我们提出了一个混合框架，首先在用户之间进行推广，然后直接在设备上使用Few-Shot学习快速适应单个用户。通过只更新分类器层与用户特定的数据，我们的方法实现了强大的个性化与最小的计算和内存开销。我们在基于RISC-V的GAP 9微控制器上实现了这个框架，并在三种不同的HAR场景中验证了它：RecGym，QVAR-Gesture和Ultrasound-Gesture。部署后自适应分别产生了3.73%、17.38%和3.70%的一致准确性改进。这些结果证实了快速、轻量级和有效的个性化在嵌入式平台上是可行的，为可扩展和用户感知的HAR系统铺平了道路\footnote{https：//github.com/kangpx/onlineTiny2023}。
摘要：Human Activity Recognition (HAR) using wearable devices has advanced significantly in recent years, yet its generalization remains limited when models are deployed to new users. This degradation in performance is primarily due to user-induced concept drift (UICD), highlighting the importance of efficient personalization. In this paper, we present a hybrid framework that first generalizes across users and then rapidly adapts to individual users using few-shot learning directly on-device. By updating only the classifier layer with user-specific data, our method achieves robust personalization with minimal computational and memory overhead. We implement this framework on the energy-efficient RISC-V-based GAP9 microcontroller and validate it across three diverse HAR scenarios: RecGym, QVAR-Gesture, and Ultrasound-Gesture. Post-deployment adaptation yields consistent accuracy improvements of 3.73\%, 17.38\%, and 3.70\% respectively. These results confirm that fast, lightweight, and effective personalization is feasible on embedded platforms, paving the way for scalable and user-aware HAR systems in the wild \footnote{https://github.com/kangpx/onlineTiny2023}.

【5】Foundational Design Principles and Patterns for Building Robust and Adaptive GenAI-Native Systems
标题：构建稳健且自适应的GenAI原生系统的基础设计原则和模式
链接：https://arxiv.org/abs/2508.15411

作者：Vandeputte
摘要：生成人工智能（GenAI）已经成为一种变革性技术，在不同的应用领域展示了卓越的能力。然而，由于其不可预测性和低效率，GenAI在开发可靠和高效的GenAI授权系统方面面临着几个主要挑战。本文倡导一种范式转变：未来的GenAI原生系统应该将GenAI的认知能力与传统的软件工程原则相结合，以创建强大、自适应和高效的系统。我们介绍了基本的GenAI原生设计原则，围绕五个关键支柱-可靠性，卓越性，可进化性，自力更生和保证-并提出了架构模式，如GenAI原生细胞，有机基板和可编程路由器，以指导创建弹性和自我进化的系统。此外，我们还概述了GenAI原生软件栈的关键要素，并从技术、用户采用、经济和法律角度讨论了这些系统的影响，强调了进一步验证和实验的必要性。我们的工作旨在激发未来的研究，并鼓励相关社区实施和完善这一概念框架。
摘要：Generative AI (GenAI) has emerged as a transformative technology, demonstrating remarkable capabilities across diverse application domains. However, GenAI faces several major challenges in developing reliable and efficient GenAI-empowered systems due to its unpredictability and inefficiency. This paper advocates for a paradigm shift: future GenAI-native systems should integrate GenAI's cognitive capabilities with traditional software engineering principles to create robust, adaptive, and efficient systems. We introduce foundational GenAI-native design principles centered around five key pillars -- reliability, excellence, evolvability, self-reliance, and assurance -- and propose architectural patterns such as GenAI-native cells, organic substrates, and programmable routers to guide the creation of resilient and self-evolving systems. Additionally, we outline the key ingredients of a GenAI-native software stack and discuss the impact of these systems from technical, user adoption, economic, and legal perspectives, underscoring the need for further validation and experimentation. Our work aims to inspire future research and encourage relevant communities to implement and refine this conceptual framework.

【6】Frequency-adaptive tensor neural networks for high-dimensional multi-scale problems
标题：用于多维多尺度问题的频率自适应张量神经网络
链接：https://arxiv.org/abs/2508.15198

作者：g, Rukang You, Tao Zhou
摘要：张量神经网络（TNN）已经证明了其在解决高维问题方面的优越性。然而，与传统的神经网络类似，TNN也受到频率原理的影响，这限制了它们准确捕获解决方案的高频特征的能力。在这项工作中，我们通过傅立叶分析来分析TNN的训练动态，并通过引入随机傅立叶特征来增强其对高维多尺度问题的表达能力。利用TNN固有的张量结构，我们进一步提出了一种新的方法，通过对一维分量函数进行离散傅立叶变换来提取高维函数的频率特征。这种策略有效地缓解了维度灾难。基于这一思想，我们提出了一种频率自适应的TNN算法，它显着提高了TNN在解决复杂的多尺度问题的能力。大量的数值实验验证了所提出的频率自适应TNN算法的有效性和鲁棒性。
摘要：Tensor neural networks (TNNs) have demonstrated their superiority in solving high-dimensional problems. However, similar to conventional neural networks, TNNs are also influenced by the Frequency Principle, which limits their ability to accurately capture high-frequency features of the solution. In this work, we analyze the training dynamics of TNNs by Fourier analysis and enhance their expressivity for high-dimensional multi-scale problems by incorporating random Fourier features. Leveraging the inherent tensor structure of TNNs, we further propose a novel approach to extract frequency features of high-dimensional functions by performing the Discrete Fourier Transform to one-dimensional component functions. This strategy effectively mitigates the curse of dimensionality. Building on this idea, we propose a frequency-adaptive TNNs algorithm, which significantly improves the ability of TNNs in solving complex multi-scale problems. Extensive numerical experiments are performed to validate the effectiveness and robustness of the proposed frequency-adaptive TNNs algorithm.

【7】Adaptive Anomaly Detection in Evolving Network Environments
标题：不断发展的网络环境中的自适应异常检测
链接：https://arxiv.org/abs/2508.15100

作者：usavipour, Andrey Dimanchev, Majid Ghaderi
摘要：分布偏移是数据统计特性随时间的变化，对深度学习异常检测系统构成了严峻挑战。现有的异常检测系统通常很难适应这些变化。具体而言，基于监督学习的系统需要昂贵的手动标记，而基于无监督学习的系统依赖于难以获得的干净数据进行移位自适应。这两个要求在实践中都很难满足。在本文中，我们介绍了NetSight，一个框架，监督异常检测网络数据，不断检测和适应分布变化的在线方式。NetSight通过一种新的伪标签技术消除了人工干预，并使用基于知识蒸馏的自适应策略来防止灾难性遗忘。在三个长期网络数据集上进行评估，与依赖手动标记的最先进方法相比，NetSight表现出卓越的自适应性能，实现了高达11.72%的F1分数改进。这证明了它在经历分布随时间变化的动态网络中的鲁棒性和有效性。
摘要：Distribution shift, a change in the statistical properties of data over time, poses a critical challenge for deep learning anomaly detection systems. Existing anomaly detection systems often struggle to adapt to these shifts. Specifically, systems based on supervised learning require costly manual labeling, while those based on unsupervised learning rely on clean data, which is difficult to obtain, for shift adaptation. Both of these requirements are challenging to meet in practice. In this paper, we introduce NetSight, a framework for supervised anomaly detection in network data that continually detects and adapts to distribution shifts in an online manner. NetSight eliminates manual intervention through a novel pseudo-labeling technique and uses a knowledge distillation-based adaptation strategy to prevent catastrophic forgetting. Evaluated on three long-term network datasets, NetSight demonstrates superior adaptation performance compared to state-of-the-art methods that rely on manual labeling, achieving F1-score improvements of up to 11.72%. This proves its robustness and effectiveness in dynamic networks that experience distribution shifts over time.

【8】Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
标题：增强优化器稳定性：MSG步进大小的动量适应
链接：https://arxiv.org/abs/2508.15071

作者：lamov, Niccolo Ajroldi, Antonio Orvieto, Aurelien Lucchi
摘要：结合动量和自适应步长的现代优化算法在许多具有挑战性的深度学习任务中提供了更好的性能。然而，它们的有效性通常对超参数的选择高度敏感，特别是步长。调整这些参数通常是困难的、资源密集型的和耗时的。因此，最近的努力已经指向在宽范围的超参数选择上增强优化器的稳定性[Schaipp等人，2024年]。在本文中，我们介绍了一种算法，该算法与最先进的优化器的性能相匹配，同时通过对NGN步长方法的新适应来提高步长超参数选择的稳定性[Orvieto和Xiao，2024]。具体来说，我们提出了一个基于动量的版本（NGN-M），达到标准的收敛速度$\mathcal{O}（1/\sqrt{K}）$限制性较低的假设下，不需要插值条件或假设的有界随机梯度或迭代，与以前的方法相比。此外，我们的经验表明，NGN步长与动量的组合结果在增强的鲁棒性的步长超参数的选择，同时提供的性能是相当的或超过其他国家的最先进的优化。
摘要：Modern optimization algorithms that incorporate momentum and adaptive step-size offer improved performance in numerous challenging deep learning tasks. However, their effectiveness is often highly sensitive to the choice of hyperparameters, especially the step-size. Tuning these parameters is often difficult, resource-intensive, and time-consuming. Therefore, recent efforts have been directed toward enhancing the stability of optimizers across a wide range of hyperparameter choices [Schaipp et al., 2024]. In this paper, we introduce an algorithm that matches the performance of state-of-the-art optimizers while improving stability to the choice of the step-size hyperparameter through a novel adaptation of the NGN step-size method [Orvieto and Xiao, 2024]. Specifically, we propose a momentum-based version (NGN-M) that attains the standard convergence rate of $\mathcal{O}(1/\sqrt{K})$ under less restrictive assumptions, without the need for interpolation condition or assumptions of bounded stochastic gradients or iterates, in contrast to previous approaches. Additionally, we empirically demonstrate that the combination of the NGN step-size with momentum results in enhanced robustness to the choice of the step-size hyperparameter while delivering performance that is comparable to or surpasses other state-of-the-art optimizers.

【9】HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies
标题：HHNAS-AM：使用自适应突变策略的分层混合神经架构搜索
链接：https://arxiv.org/abs/2508.14946

作者：ipathi, Ajeet Kumar Singh, Rajsabi Surya, Aum Gupta, Sahiinii Lemaina Veikho, Dorien Herremans, Sudhir Bisane
摘要：神经架构搜索（NAS）由于其发现优于手动设计的架构的能力而引起了重大的研究兴趣。学习文本表示对于文本分类和其他与语言相关的任务至关重要。用于文本分类的NAS模型没有混合层次结构，对体系结构也没有限制，搜索空间变得非常大，而且大多是冗余的，因此现有的强化学习模型不能有效地导航搜索空间。此外，进行平面架构搜索会导致无组织的搜索空间，难以遍历。为此，我们提出了HHNAS-AM（分层混合神经架构搜索与自适应突变策略），一种新的方法，有效地探索不同的架构配置。我们引入了一些架构模板来组织搜索空间，搜索空间是根据特定领域的线索设计的。我们的方法采用变异策略，这些策略基于使用Q学习的先前迭代的性能反馈动态适应，从而实现更有效和加速的搜索空间遍历。所提出的模型是完全概率的，使搜索空间的有效探索。我们评估我们的方法在数据库ID（db_id）预测任务，在那里它始终发现高性能的架构在多个实验。在Spider数据集上，我们的方法比现有基线的测试精度提高了8%。
摘要：Neural Architecture Search (NAS) has garnered significant research interest due to its capability to discover architectures superior to manually designed ones. Learning text representation is crucial for text classification and other language-related tasks. The NAS model used in text classification does not have a Hybrid hierarchical structure, and there is no restriction on the architecture structure, due to which the search space becomes very large and mostly redundant, so the existing RL models are not able to navigate the search space effectively. Also, doing a flat architecture search leads to an unorganised search space, which is difficult to traverse. For this purpose, we propose HHNAS-AM (Hierarchical Hybrid Neural Architecture Search with Adaptive Mutation Policies), a novel approach that efficiently explores diverse architectural configurations. We introduce a few architectural templates to search on which organise the search spaces, where search spaces are designed on the basis of domain-specific cues. Our method employs mutation strategies that dynamically adapt based on performance feedback from previous iterations using Q-learning, enabling a more effective and accelerated traversal of the search space. The proposed model is fully probabilistic, enabling effective exploration of the search space. We evaluate our approach on the database id (db_id) prediction task, where it consistently discovers high-performing architectures across multiple experiments. On the Spider dataset, our method achieves an 8% improvement in test accuracy over existing baselines.

强化学习(3篇)

【1】Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning
标题：通过多智能体强化学习中的工具授权理解动作效应
链接：https://arxiv.org/abs/2508.15652

作者：lmonaj, Miroslav Strupl, Oleg Szehr, Alessandro Antonucci
备注：European Conference on Artificial Intelligence (ECAI) 2025
摘要：为了可靠地部署多智能体强化学习（MARL）系统，了解团队中的单个智能体行为至关重要。虽然以前的工作通常基于明确的奖励信号或学习的价值函数来评估整体团队绩效，但目前还不清楚如何在没有任何价值反馈的情况下推断代理人的贡献。在这项工作中，我们调查是否可以提取有意义的见解代理行为是一致的基本价值函数，仅通过分析政策分布。受智能代理倾向于追求收敛的工具值，这通常会增加任务成功的可能性的现象的启发，我们引入了意向合作值（ICVs），基于信息理论的Shapley值的方法，用于量化每个代理的因果影响，他们的合作伙伴的工具授权。具体来说，ICV通过评估决策不确定性和偏好一致性来衡量智能体对其队友政策的行动效果。跨合作和竞争的MARL环境的分析揭示了代理人采取类似或不同的策略的程度。通过比较政策和价值函数之间的行动效果，我们的方法确定代理的行为是有益的团队成功，无论是通过促进确定性的决策，或通过保留灵活性，为未来的行动选择。我们提出的方法提供了新的见解合作动态和增强MARL系统的可解释性。
摘要：To reliably deploy Multi-Agent Reinforcement Learning (MARL) systems, it is crucial to understand individual agent behaviors within a team. While prior work typically evaluates overall team performance based on explicit reward signals or learned value functions, it is unclear how to infer agent contributions in the absence of any value feedback. In this work, we investigate whether meaningful insights into agent behaviors can be extracted that are consistent with the underlying value functions, solely by analyzing the policy distribution. Inspired by the phenomenon that intelligent agents tend to pursue convergent instrumental values, which generally increase the likelihood of task success, we introduce Intended Cooperation Values (ICVs), a method based on information-theoretic Shapley values for quantifying each agent's causal influence on their co-players' instrumental empowerment. Specifically, ICVs measure an agent's action effect on its teammates' policies by assessing their decision uncertainty and preference alignment. The analysis across cooperative and competitive MARL environments reveals the extent to which agents adopt similar or diverse strategies. By comparing action effects between policies and value functions, our method identifies which agent behaviors are beneficial to team success, either by fostering deterministic decisions or by preserving flexibility for future action choices. Our proposed method offers novel insights into cooperation dynamics and enhances explainability in MARL systems.

【2】Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
标题：离线基于偏好的强化学习的基于搜索的信用分配
链接：https://arxiv.org/abs/2508.15327

作者： Gao, Yufeng Shi, Wengang Zhou, Houqiang Li
备注：7 pages, 6 figures, under review
摘要：离线强化学习是指从固定数据集学习策略的过程，而不需要额外的环境交互。然而，它通常依赖于定义良好的奖励函数，这是困难和昂贵的设计。人类反馈是一种很有吸引力的选择，但它的两种常见形式，专家演示和偏好，有互补的局限性。示范提供了逐步的监督，但它们的收集成本很高，往往反映了有限的专家行为模式。相比之下，偏好更容易收集，但不清楚行为的哪些部分对轨迹段的贡献最大，从而使信用分配悬而未决。在本文中，我们引入了一个基于搜索的偏好加权（SPW）计划，以统一这两个反馈源。对于偏好标记轨迹中的每个过渡，SPW从专家演示中搜索最相似的状态-动作对，并根据其相似性得分直接导出逐步重要性权重。然后，这些权重用于指导标准偏好学习，从而实现传统方法难以实现的更准确的信用分配。我们证明了SPW可以从偏好和演示中进行有效的联合学习，在具有挑战性的机器人操作任务上优于利用这两种反馈类型的先前方法。
摘要：Offline reinforcement learning refers to the process of learning policies from fixed datasets, without requiring additional environment interaction. However, it often relies on well-defined reward functions, which are difficult and expensive to design. Human feedback is an appealing alternative, but its two common forms, expert demonstrations and preferences, have complementary limitations. Demonstrations provide stepwise supervision, but they are costly to collect and often reflect limited expert behavior modes. In contrast, preferences are easier to collect, but it is unclear which parts of a behavior contribute most to a trajectory segment, leaving credit assignment unresolved. In this paper, we introduce a Search-Based Preference Weighting (SPW) scheme to unify these two feedback sources. For each transition in a preference labeled trajectory, SPW searches for the most similar state-action pairs from expert demonstrations and directly derives stepwise importance weights based on their similarity scores. These weights are then used to guide standard preference learning, enabling more accurate credit assignment that traditional approaches struggle to achieve. We demonstrate that SPW enables effective joint learning from preferences and demonstrations, outperforming prior methods that leverage both feedback types on challenging robot manipulation tasks.

【3】Universal Reinforcement Learning in Coalgebras: Asynchronous Stochastic Computation via Conduction
标题：余代数中的通用强化学习：通过导电的非同步随机计算
链接：https://arxiv.org/abs/2508.15128

作者：ahadevan
备注：45 pages
摘要：在本文中，我们介绍了RL的范畴推广，称为通用强化学习（URL），建立在强大的数学抽象的研究非良基集和通用coalgebras，拓扑理论和异步并行分布式计算的范畴模型的coinduction。在本文的前半部分，我们回顾了基本的RL框架，说明了类别和函子在RL中的使用，展示了它们如何导致有趣的见解。特别是，我们还介绍了一个标准的异步分布式最小化提出的Bertsekas和Tsitsiklis模型，并描述度量共归纳和他们的异步收敛定理的证明之间的关系。MDP或PSR的算法空间可以建模为函子范畴，其中共域范畴形成拓扑，其允许所有（共）限制，拥有子对象分类器，并且具有指数对象。在论文的后半部分，我们继续研究泛余代数。动态系统模型，如马尔可夫决策过程（MDP），部分观测MDP（POMDP），预测状态表示（PSR）和线性动态系统（LDS）都是特殊类型的余代数。我们描述了一个广泛的泛余代数家族，扩展了以前在RL中研究的动态系统模型。在RL中寻找不动点以确定精确或近似（作用）值函数的核心问题在URL中被推广到以并行分布式方式异步确定最终余代数。
摘要：In this paper, we introduce a categorial generalization of RL, termed universal reinforcement learning (URL), building on powerful mathematical abstractions from the study of coinduction on non-well-founded sets and universal coalgebras, topos theory, and categorial models of asynchronous parallel distributed computation. In the first half of the paper, we review the basic RL framework, illustrate the use of categories and functors in RL, showing how they lead to interesting insights. In particular, we also introduce a standard model of asynchronous distributed minimization proposed by Bertsekas and Tsitsiklis, and describe the relationship between metric coinduction and their proof of the Asynchronous Convergence Theorem. The space of algorithms for MDPs or PSRs can be modeled as a functor category, where the co-domain category forms a topos, which admits all (co)limits, possesses a subobject classifier, and has exponential objects. In the second half of the paper, we move on to universal coalgebras. Dynamical system models, such as Markov decision processes (MDPs), partially observed MDPs (POMDPs), a predictive state representation (PSRs), and linear dynamical systems (LDSs) are all special types of coalgebras. We describe a broad family of universal coalgebras, extending the dynamic system models studied previously in RL. The core problem in finding fixed points in RL to determine the exact or approximate (action) value function is generalized in URL to determining the final coalgebra asynchronously in a parallel distributed manner.

符号|符号学习(1篇)

【1】From Basic Affordances to Symbolic Thought: A Computational Phylogenesis of Biological Intelligence
标题：从基本功能到象征性思维：生物智能的计算系统发生
链接：https://arxiv.org/abs/2508.15082

作者：ummel, Rachel F. Heaton
备注：47 pages 8 figures
摘要：人类的大脑是什么让我们能够进行象征性的推理，而大多数其他动物却不能？有证据表明，动态绑定（dynamic binding），即在运行中将神经元组合成组的能力，是符号思维所必需的，但也有证据表明，这还不够。我们建议，两种层次的集成（多个角色绑定到多个谓词的集成，和多个对应到结构映射的集成）是最低要求，在基本的动态绑定，实现象征性的思想。我们测试了这一假设，在一个系统的收集17个模拟，探讨了认知架构的能力，并没有能力的多位谓词和结构映射执行各种任务。模拟尽可能通用，因为没有任务可以基于任何诊断特征来执行，而是取决于多位谓词和结构映射的能力。结果是一致的假设，随着动态绑定，多位谓词和结构映射是基本的象征性思维的最低要求。这些结果让我们了解了人类大脑如何产生符号思维，并说明了生物智能与现代机器学习方法之间的差异，生物智能倾向于从很少的训练示例中广泛推广，而现代机器学习方法通常需要数百万或数十亿个训练示例。我们报告的结果对生物启发的人工智能也有重要意义。
摘要：What is it about human brains that allows us to reason symbolically whereas most other animals cannot? There is evidence that dynamic binding, the ability to combine neurons into groups on the fly, is necessary for symbolic thought, but there is also evidence that it is not sufficient. We propose that two kinds of hierarchical integration (integration of multiple role-bindings into multiplace predicates, and integration of multiple correspondences into structure mappings) are minimal requirements, on top of basic dynamic binding, to realize symbolic thought. We tested this hypothesis in a systematic collection of 17 simulations that explored the ability of cognitive architectures with and without the capacity for multi-place predicates and structure mapping to perform various kinds of tasks. The simulations were as generic as possible, in that no task could be performed based on any diagnostic features, depending instead on the capacity for multi-place predicates and structure mapping. The results are consistent with the hypothesis that, along with dynamic binding, multi-place predicates and structure mapping are minimal requirements for basic symbolic thought. These results inform our understanding of how human brains give rise to symbolic thought and speak to the differences between biological intelligence, which tends to generalize broadly from very few training examples, and modern approaches to machine learning, which typically require millions or billions of training examples. The results we report also have important implications for bio-inspired artificial intelligence.

医学相关(5篇)

【1】Learning ECG Representations via Poly-Window Contrastive Learning
标题：基于多窗口对比学习的心电图表示方法
链接：https://arxiv.org/abs/2508.15225

作者：Joseph Van Duyn, Runze Yan, Zhuoyi Huang, Sulaiman Vesal, Sergey Plis, Xiao Hu, Gloria Hyunjung Kwak, Ran Xiao, Alex Fedorov
备注：This work has been accepted for publication in IEEE-EMBS International Conference on Biomedical and Health Informatics 2025. The final published version will be available via IEEE Xplore
摘要：心电图（ECG）分析是心血管疾病诊断的基础，但深度学习模型的性能通常受到对注释数据的有限访问的限制。自监督对比学习已经成为从未标记信号中学习鲁棒ECG表示的强大方法。然而，大多数现有的方法仅生成成对增强视图，并且未能利用ECG记录的丰富的时间结构。在这项工作中，我们提出了一个多窗口对比学习框架。我们从每个ECG实例中提取多个时间窗口来构建正对，并通过统计最大化它们的一致性。受慢速特征分析原理的启发，我们的方法明确鼓励模型学习随时间持续的时间不变和生理上有意义的特征。我们通过对PTB-XL数据集进行广泛的实验和消融研究来验证我们的方法。我们的研究结果表明，多窗口对比学习在多标签超类分类中始终优于传统的双视图方法，实现了更高的AUROC（0.891 vs. 0.888）和F1得分（0.680 vs. 0.679），同时需要的预训练时间减少了四倍（32 vs. 128），总挂钟预训练时间减少了14.8%。尽管每个样本处理多个窗口，但我们显著减少了训练时期的数量和总计算时间，使我们的方法适用于训练基础模型。通过广泛的消融，我们确定了最佳的设计选择，并证明了各种超参数的鲁棒性。这些发现建立了多窗口对比学习作为一个高效和可扩展的自动心电图分析的范例，并提供了一个有前途的一般框架，在生物医学时间序列数据的自我监督表示学习。
摘要：Electrocardiogram (ECG) analysis is foundational for cardiovascular disease diagnosis, yet the performance of deep learning models is often constrained by limited access to annotated data. Self-supervised contrastive learning has emerged as a powerful approach for learning robust ECG representations from unlabeled signals. However, most existing methods generate only pairwise augmented views and fail to leverage the rich temporal structure of ECG recordings. In this work, we present a poly-window contrastive learning framework. We extract multiple temporal windows from each ECG instance to construct positive pairs and maximize their agreement via statistics. Inspired by the principle of slow feature analysis, our approach explicitly encourages the model to learn temporally invariant and physiologically meaningful features that persist across time. We validate our approach through extensive experiments and ablation studies on the PTB-XL dataset. Our results demonstrate that poly-window contrastive learning consistently outperforms conventional two-view methods in multi-label superclass classification, achieving higher AUROC (0.891 vs. 0.888) and F1 scores (0.680 vs. 0.679) while requiring up to four times fewer pre-training epochs (32 vs. 128) and 14.8% in total wall clock pre-training time reduction. Despite processing multiple windows per sample, we achieve a significant reduction in the number of training epochs and total computation time, making our method practical for training foundational models. Through extensive ablations, we identify optimal design choices and demonstrate robustness across various hyperparameters. These findings establish poly-window contrastive learning as a highly efficient and scalable paradigm for automated ECG analysis and provide a promising general framework for self-supervised representation learning in biomedical time-series data.

【2】A Robust BERT-Based Deep Learning Model for Automated Cancer Type Extraction from Unstructured Pathology Reports
标题：基于BERT的稳健深度学习模型，用于从非结构化病理报告中自动提取癌症类型
链接：https://arxiv.org/abs/2508.15149

作者：, Jeffery C. Chan, Min Li Huang, Maya Kansara, John P. Grady, Christine E. Napier, Subotheni Thavaneswaran, Mandy L. Ballinger, David M. Thomas, Frank P. Lin
摘要：从电子病历中准确提取临床信息对临床研究尤其重要，但需要大量训练有素的专业知识和手工劳动。在这项研究中，我们开发了一个强大的系统，用于自动提取特定的癌症类型，以支持精确的肿瘤学研究。从病理学报告中提取的数据。该模型显著优于基线模型和大型语言模型Mistral 7B，实现了F1_Bertscore 0.98和80.61%的总体精确匹配。这种微调方法展示了可无缝集成到分子肿瘤委员会过程中的可扩展性潜力。针对肿瘤学中的精确任务微调特定领域的模型，可以为更有效和准确的临床信息提取铺平道路。
摘要：The accurate extraction of clinical information from electronic medical records is particularly critical to clinical research but require much trained expertise and manual labor. In this study we developed a robust system for automated extraction of the specific cancer types for the purpose of supporting precision oncology research. from pathology reports using a fine-tuned RoBERTa model. This model significantly outperformed the baseline model and a Large Language Model, Mistral 7B, achieving F1_Bertscore 0.98 and overall exact match of 80.61%. This fine-tuning approach demonstrates the potential for scalability that can integrate seamlessly into the molecular tumour board process. Fine-tuning domain-specific models for precision tasks in oncology, may pave the way for more efficient and accurate clinical information extraction.

【3】XAI-Driven Spectral Analysis of Cough Sounds for Respiratory Disease Characterization
标题：XAI驱动的咳嗽声频谱分析用于呼吸系统疾病特征
链接：https://arxiv.org/abs/2508.14949

作者：Amado-Caballero, Luis Miguel San-José-Revuelta, María Dolores Aguilar-García, José Ramón Garmendia-Leiza, Carlos Alberola-López, Pablo Casaseca-de-la-Higuera
摘要：本文提出了一种可解释的人工智能（XAI）驱动的方法，以提高对呼吸系统疾病管理咳嗽声分析的理解。我们采用遮挡图来突出显示卷积神经网络（CNN）处理的咳嗽频谱图中的相关频谱区域。随后，由这些遮挡图加权的频谱图的频谱分析揭示了疾病组之间的显著差异，特别是在患有COPD的患者中，其中咳嗽模式在所识别的感兴趣的频谱区域中出现更多变化。这与分析原始光谱图时观察到的缺乏显著差异形成对比。所提出的方法提取和分析了几个频谱特征，展示了XAI技术的潜力，以发现疾病特异性的声学特征，并通过提供更多的可解释的结果来提高咳嗽声分析的诊断能力。
摘要：This paper proposes an eXplainable Artificial Intelligence (XAI)-driven methodology to enhance the understanding of cough sound analysis for respiratory disease management. We employ occlusion maps to highlight relevant spectral regions in cough spectrograms processed by a Convolutional Neural Network (CNN). Subsequently, spectral analysis of spectrograms weighted by these occlusion maps reveals significant differences between disease groups, particularly in patients with COPD, where cough patterns appear more variable in the identified spectral regions of interest. This contrasts with the lack of significant differences observed when analyzing raw spectrograms. The proposed approach extracts and analyzes several spectral features, demonstrating the potential of XAI techniques to uncover disease-specific acoustic signatures and improve the diagnostic capabilities of cough sound analysis by providing more interpretable results.

【4】Structure-Aware Temporal Modeling for Chronic Disease Progression Prediction
标题：用于慢性病进展预测的结构感知时态建模
链接：https://arxiv.org/abs/2508.14942

作者：Hu, Bo Zhang, Ting Xu, Haifeng Yang, Min Gao
摘要：这项研究解决了帕金森病进展预测中症状演变复杂性和时间依赖性建模不足的挑战。它提出了一个统一的预测框架，集成了结构感知和时间建模。该方法利用图神经网络对多模态临床症状之间的结构关系进行建模，并引入基于图的表示来捕获症状之间的语义依赖关系。它还结合了一个Transformer架构，以模拟疾病进展期间的动态时间特征。为了融合结构和时间信息，设计了一种结构感知的门控机制，动态调整结构编码和时间特征之间的融合权重，增强了模型识别关键进展阶段的能力。为了提高分类精度和稳定性，该框架包括一个多组件建模管道，由图构建模块，时间编码模块和预测输出层组成。该模型评估现实世界的纵向帕金森氏病的数据。实验包括与主流模型的比较、超参数的敏感性分析和图连接密度控制。结果表明，该方法优于现有的方法在AUC，RMSE和IPW-F1指标。它有效地区分了进展阶段，并提高了模型捕捉个性化症状轨迹的能力。整体框架表现出很强的泛化能力和结构可扩展性，为帕金森病等慢性进展性疾病的智能建模提供了可靠的支持。
摘要：This study addresses the challenges of symptom evolution complexity and insufficient temporal dependency modeling in Parkinson's disease progression prediction. It proposes a unified prediction framework that integrates structural perception and temporal modeling. The method leverages graph neural networks to model the structural relationships among multimodal clinical symptoms and introduces graph-based representations to capture semantic dependencies between symptoms. It also incorporates a Transformer architecture to model dynamic temporal features during disease progression. To fuse structural and temporal information, a structure-aware gating mechanism is designed to dynamically adjust the fusion weights between structural encodings and temporal features, enhancing the model's ability to identify key progression stages. To improve classification accuracy and stability, the framework includes a multi-component modeling pipeline, consisting of a graph construction module, a temporal encoding module, and a prediction output layer. The model is evaluated on real-world longitudinal Parkinson's disease data. The experiments involve comparisons with mainstream models, sensitivity analysis of hyperparameters, and graph connection density control. Results show that the proposed method outperforms existing approaches in AUC, RMSE, and IPW-F1 metrics. It effectively distinguishes progression stages and improves the model's ability to capture personalized symptom trajectories. The overall framework demonstrates strong generalization and structural scalability, providing reliable support for intelligent modeling of chronic progressive diseases such as Parkinson's disease.

【5】Cohort-Aware Agents for Individualized Lung Cancer Risk Prediction Using a Retrieval-Augmented Model Selection Framework
标题：使用检索增强模型选择框架进行个体化肺癌风险预测的协同感知代理
链接：https://arxiv.org/abs/2508.14940

作者：u, Allen J. Luna, Thomas Z. Li, Junchao Zhu, Junlin Guo, Juming Xiong, Kim L. Sandler, Bennett A. Landman, Yuankai Huo
摘要：准确的肺癌风险预测仍然具有挑战性，因为患者人群和临床环境之间存在很大的差异-没有一个模型对所有队列都表现最好。为了解决这个问题，我们提出了一个个性化的肺癌风险预测代理，动态选择最合适的模型，为每个病人结合队列特定的知识与现代检索和推理技术。给定患者的CT扫描和结构化元数据-包括人口统计学，临床和结节水平特征-代理首先使用基于FAISS的相似性搜索在9个不同的现实世界队列中进行队列检索，以从多机构数据库中识别最相关的患者人群。其次，利用检索到的群组及其相关联的性能度量提示大语言模型（LLM），以从八个代表性模型的池中推荐最佳预测算法，包括经典线性风险模型（例如，Mayo，Brock），时间感知模型（例如，TDVIT、DLSTM）和基于多模态计算机视觉的方法（例如，Liao，Sybil，DLS，DLI）。这种两阶段的代理管道-通过FAISS检索和通过LLM推理-实现了动态的，队列感知的风险预测个性化每个患者的个人资料。基于这种架构，该代理支持在不同的临床人群中进行灵活和队列驱动的模型选择，为现实世界的肺癌筛查提供了一条个性化风险评估的实用途径。
摘要：Accurate lung cancer risk prediction remains challenging due to substantial variability across patient populations and clinical settings -- no single model performs best for all cohorts. To address this, we propose a personalized lung cancer risk prediction agent that dynamically selects the most appropriate model for each patient by combining cohort-specific knowledge with modern retrieval and reasoning techniques. Given a patient's CT scan and structured metadata -- including demographic, clinical, and nodule-level features -- the agent first performs cohort retrieval using FAISS-based similarity search across nine diverse real-world cohorts to identify the most relevant patient population from a multi-institutional database. Second, a Large Language Model (LLM) is prompted with the retrieved cohort and its associated performance metrics to recommend the optimal prediction algorithm from a pool of eight representative models, including classical linear risk models (e.g., Mayo, Brock), temporally-aware models (e.g., TDVIT, DLSTM), and multi-modal computer vision-based approaches (e.g., Liao, Sybil, DLS, DLI). This two-stage agent pipeline -- retrieval via FAISS and reasoning via LLM -- enables dynamic, cohort-aware risk prediction personalized to each patient's profile. Building on this architecture, the agent supports flexible and cohort-driven model selection across diverse clinical populations, offering a practical path toward individualized risk assessment in real-world lung cancer screening.

推荐(2篇)

【1】Large Foundation Model for Ads Recommendation
标题：广告推荐的大型基金会模型
链接：https://arxiv.org/abs/2508.14948

作者：hang, Shijie Quan, Zhongren Wang, Junwei Pan, Tianqu Zhuang, Bo Fu, Yilong Sun, Jieying Lin, Jushuo Chen, Xiaotian Li, Zhixiang Feng, Xian Hu, Huiting Deng, Hua Lu, Jinpeng Wang, Boqi Dai, Xiaoyu Chen, Bin Hu, Lili Huang, Yanwen Wu, Yeshou Cai, Qi Zhou, Huang Tang, Chunfeng Yang, Chengguo Yin, Tingyu Jiang, Lifeng Wang, Shudong Huang, Dapeng Liu, Lei Xiao, Haijie Gu, Shu-Tao Xia, Jie Jiang
摘要：在线广告依赖于准确的推荐模型，最近的进展是使用预先训练的大规模基础模型（LFM）来捕捉用户在多个场景和任务中的一般兴趣。然而，现有的方法有严重的局限性：他们提取和传输只有用户表示（URs），忽略了有价值的项目表示（IR）和用户项目交叉表示（CR）;他们只是使用一个UR作为下游应用程序的功能，这无法弥合上下游的差距，忽略了更多的传输粒度。在本文中，我们提出了LFM 4Ads，一个全表示多粒度的广告推荐框架。它首先全面传输UR、IR和CR，即，预训练基础模型中的所有可用表示。为了有效地利用CR，它识别最佳提取层并将其聚合为可转移的粗粒度形式。此外，我们通过多粒度机制增强了可转移性：非线性适配器的功能级传输，同构交互模块的模块级传输，和独立检索模型级传输。LFM 4Ads已成功部署在腾讯的行业级广告平台中，每天处理数百亿个样本，同时维护TB级模型参数，数十亿个稀疏嵌入密钥，覆盖约2000个特征。自2024年第四季度投入生产以来，LFM 4Ads已在各种广告场景中成功推出10多个产品，包括微信朋友圈和渠道等主要广告场景。这些发布在整个平台上实现了2.45%的整体GMV提升，转化为数亿美元的估计年收入增长。
摘要：Online advertising relies on accurate recommendation models, with recent advances using pre-trained large-scale foundation models (LFMs) to capture users' general interests across multiple scenarios and tasks. However, existing methods have critical limitations: they extract and transfer only user representations (URs), ignoring valuable item representations (IRs) and user-item cross representations (CRs); and they simply use a UR as a feature in downstream applications, which fails to bridge upstream-downstream gaps and overlooks more transfer granularities. In this paper, we propose LFM4Ads, an All-Representation Multi-Granularity transfer framework for ads recommendation. It first comprehensively transfers URs, IRs, and CRs, i.e., all available representations in the pre-trained foundation model. To effectively utilize the CRs, it identifies the optimal extraction layer and aggregates them into transferable coarse-grained forms. Furthermore, we enhance the transferability via multi-granularity mechanisms: non-linear adapters for feature-level transfer, an Isomorphic Interaction Module for module-level transfer, and Standalone Retrieval for model-level transfer. LFM4Ads has been successfully deployed in Tencent's industrial-scale advertising platform, processing tens of billions of daily samples while maintaining terabyte-scale model parameters with billions of sparse embedding keys across approximately two thousand features. Since its production deployment in Q4 2024, LFM4Ads has achieved 10+ successful production launches across various advertising scenarios, including primary ones like Weixin Moments and Channels. These launches achieve an overall GMV lift of 2.45% across the entire platform, translating to estimated annual revenue increases in the hundreds of millions of dollars.

【2】Personalized Recommendations via Active Utility-based Pairwise Sampling
标题：通过基于主动效用的成对抽样进行个性化推荐
链接：https://arxiv.org/abs/2508.14911

作者：oomand, James R. Wright
摘要：None
摘要：Recommender systems play a critical role in enhancing user experience by providing personalized suggestions based on user preferences. Traditional approaches often rely on explicit numerical ratings or assume access to fully ranked lists of items. However, ratings frequently fail to capture true preferences due to users' behavioral biases and subjective interpretations of rating scales, while eliciting full rankings is demanding and impractical. To overcome these limitations, we propose a generalized utility-based framework that learns preferences from simple and intuitive pairwise comparisons. Our approach is model-agnostic and designed to optimize for arbitrary, task-specific utility functions, allowing the system's objective to be explicitly aligned with the definition of a high-quality outcome in any given application. A central contribution of our work is a novel utility-based active sampling strategy for preference elicitation. This method selects queries that are expected to provide the greatest improvement to the utility of the final recommended outcome. We ground our preference model in the probabilistic Plackett-Luce framework for pairwise data. To demonstrate the versatility of our approach, we present two distinct experiments: first, an implementation using matrix factorization for a classic movie recommendation task, and second, an implementation using a neural network for a complex candidate selection scenario in university admissions. Experimental results demonstrate that our framework provides a more accurate, data-efficient, and user-centric paradigm for personalized ranking.

超分辨率|去噪|去模糊|去雾(1篇)

【1】Denoising by neural network for muzzle blast detection
标题：枪口爆炸检测的神经网络去噪
链接：https://arxiv.org/abs/2508.14919

作者：ujol, Matteo Bevillacqua, Christophe Thirard, Thierry Mazoyer
备注：INTER-NOISE 2024, Aug 2024, Nantes (France), France
摘要：Acoem开发枪击检测系统，包括麦克风阵列和软件，可以检测和定位战场上的射手。这种系统的性能显然受到其运行的声学环境的影响：特别是，当安装在移动的军用车辆上时，噪声的存在降低了软件的检测性能。为了限制声学环境的影响，已经开发了神经网络。我们没有使用重型卷积神经网络，而是选择了轻量级神经网络架构，以限制在尽可能多的硬件平台上嵌入算法所需的计算资源。由于两个隐藏层感知器和适当的信号处理技术的结合，脉冲枪口爆炸波形（来自爆炸的波，指示射手的位置）的检测率显着增加。当噪声的均方根值与枪口冲击波峰值幅度同阶时，采用这种去噪处理，检测率提高了一倍以上。
摘要：Acoem develops gunshot detection systems, consisting of a microphone array and software that detects and locates shooters on the battlefield. The performance of such systems is obviously affected by the acoustic environment in which they are operating: in particular, when mounted on a moving military vehicle, the presence of noise reduces the detection performance of the software. To limit the influence of the acoustic environment, a neural network has been developed. Instead of using a heavy convolutional neural network, a lightweight neural network architecture was chosen to limit the computational resources required to embed the algorithm on as many hardware platforms as possible. Thanks to the combination of a two hidden layer perceptron and appropriate signal processing techniques, the detection rate of impulsive muzzle blast waveforms (the wave coming from the detonation and indicating the position of the shooter) is significantly increased. With a rms value of noise of the same order as the muzzle blast peak amplitude, the detect rate is more than doubled with this denoising processing.

自动驾驶|车辆|车道检测等(1篇)

【1】Learning to Drive Ethically: Embedding Moral Reasoning into Autonomous Driving
标题：学习道德驾驶：将道德推理嵌入自动驾驶
链接：https://arxiv.org/abs/2508.14926

作者：Li, Ostap Okhrin
摘要：自动驾驶汽车在减少交通事故和提高运输效率方面有很大的希望，但它们的广泛采用取决于将强大的道德推理嵌入到日常和紧急操作中。在这里，我们提出了一个层次化的安全强化学习（Safe RL）框架，该框架明确地将道德考虑与标准驾驶目标相结合。在决策层，安全RL代理使用复合道德风险成本进行训练，结合碰撞概率和伤害严重程度，以生成高级运动目标。动态的优先经验回放机制增强了从罕见但关键的高风险事件中的学习。在执行层面，多项式路径规划结合比例-积分-微分（PID）和Stanley控制器将这些目标转化为平滑可行的轨迹，确保准确性和舒适性。我们在包含各种车辆、骑自行车者和行人的丰富的真实交通数据集上对我们的方法进行了训练和验证，并证明它在降低道德风险和保持驾驶性能方面优于基线方法。据我们所知，这是第一次在现实世界中通过安全RL对自动驾驶汽车进行道德决策的研究。我们的研究结果强调了将正式控制理论和数据驱动学习相结合的潜力，以在复杂的人类混合交通环境中推进道德上负责的自主权。
摘要：Autonomous vehicles hold great promise for reducing traffic fatalities and improving transportation efficiency, yet their widespread adoption hinges on embedding robust ethical reasoning into routine and emergency maneuvers. Here, we present a hierarchical Safe Reinforcement Learning (Safe RL) framework that explicitly integrates moral considerations with standard driving objectives. At the decision level, a Safe RL agent is trained using a composite ethical risk cost, combining collision probability and harm severity, to generate high-level motion targets. A dynamic Prioritized Experience Replay mechanism amplifies learning from rare but critical, high-risk events. At the execution level, polynomial path planning coupled with Proportional-Integral-Derivative (PID) and Stanley controllers translates these targets into smooth, feasible trajectories, ensuring both accuracy and comfort. We train and validate our approach on rich, real-world traffic datasets encompassing diverse vehicles, cyclists, and pedestrians, and demonstrate that it outperforms baseline methods in reducing ethical risk and maintaining driving performance. To our knowledge, this is the first study of ethical decision-making for autonomous vehicles via Safe RL in real-world scenarios. Our results highlight the potential of combining formal control theory and data-driven learning to advance ethically accountable autonomy in complex, human-mixed traffic environments.

联邦学习|隐私保护|加密(1篇)

【1】Integrated Sensing, Communication, and Computation for Over-the-Air Federated Edge Learning
标题：用于空中联合边缘学习的集成传感、通信和计算
链接：https://arxiv.org/abs/2508.15185

作者：en, Sijing Xie, Xiaowen Cao, Yuanhao Cui, Jie Xu, Yuanming Shi, Shuguang Cui
备注：The paper has been accepted for publication in IEEE Transactions on Wireless Communications
摘要：本文研究了一种集成传感、通信和计算（ISCC）的空中联合边缘学习（Air-FEEL）系统，其中一个边缘服务器协调多个边缘设备无线感知对象，并使用感知数据协作训练机器学习模型以执行识别任务。在该系统中，空中计算（AirComp）被用于实现来自边缘设备的一次性模型聚合。在这种设置下，我们分析了支持ISCC的Air-FEEL在损失函数退化方面的收敛行为，特别是考虑到训练数据采集期间的无线传感噪声和空中模型聚合期间的AirComp失真。理论分析表明，感知、通信和计算竞争网络资源，共同决定收敛速度。在此基础上，以最大化损失函数退化为目标，在保证每轮时延和能量预算的前提下，设计了ISCC参数。挑战在于不同设备之间的传感，通信和计算的紧密耦合过程。为了解决这一问题，我们通过交替优化批量控制和网络资源分配，得到了一个低复杂度的ISCC算法。研究发现，对于每一个设备，更少的传感功率应消耗，如果一个更大的批量的数据样本，获得反之亦然。此外，对于给定的批量大小，一个设备的最佳计算速度是满足延迟约束的最小计算速度。基于人体运动识别任务的数值结果验证了理论的收敛性分析，并表明所提出的ISCC算法很好地协调了批量大小控制和资源分配之间的传感，通信和计算，以提高学习性能。
摘要：This paper studies an over-the-air federated edge learning (Air-FEEL) system with integrated sensing, communication, and computation (ISCC), in which one edge server coordinates multiple edge devices to wirelessly sense the objects and use the sensing data to collaboratively train a machine learning model for recognition tasks. In this system, over-the-air computation (AirComp) is employed to enable one-shot model aggregation from edge devices. Under this setup, we analyze the convergence behavior of the ISCC-enabled Air-FEEL in terms of the loss function degradation, by particularly taking into account the wireless sensing noise during the training data acquisition and the AirComp distortions during the over-the-air model aggregation. The result theoretically shows that sensing, communication, and computation compete for network resources to jointly decide the convergence rate. Based on the analysis, we design the ISCC parameters under the target of maximizing the loss function degradation while ensuring the latency and energy budgets in each round. The challenge lies on the tightly coupled processes of sensing, communication, and computation among different devices. To tackle the challenge, we derive a low-complexity ISCC algorithm by alternately optimizing the batch size control and the network resource allocation. It is found that for each device, less sensing power should be consumed if a larger batch of data samples is obtained and vice versa. Besides, with a given batch size, the optimal computation speed of one device is the minimum one that satisfies the latency constraint. Numerical results based on a human motion recognition task verify the theoretical convergence analysis and show that the proposed ISCC algorithm well coordinates the batch size control and resource allocation among sensing, communication, and computation to enhance the learning performance.

推理|分析|理解|解释(7篇)

【1】Inductive Domain Transfer In Misspecified Simulation-Based Inference
标题：错误指定的基于模拟的推理中的归纳域转移
链接：https://arxiv.org/abs/2508.15593

作者：ouf, Antoine Wehenkel, Cédric Vincent-Cuaz, Emmanuel Abbé, Pascal Frossard
摘要：基于仿真的推理（SBI）是一种统计推理方法，用于在可能性难以处理但仿真可用时估计物理系统的潜在参数。在实践中，SBI经常受到模型错误指定的阻碍-由固有的建模简化引起的模拟和真实世界观测之间的不匹配。RoPE是一种最新的SBI方法，通过两阶段域转移过程解决了这一挑战，该过程将半监督校准与基于最佳传输（OT）的分布对齐相结合。然而，RoPE在完全转换的设置中操作，需要在推理时访问一批测试样本，这限制了可扩展性和泛化性。我们在这里提出了一个完全归纳和摊销的SBI框架，将校准和分布对齐集成到一个单一的端到端的可训练模型中。我们的方法利用具有封闭形式耦合的小批量OT来对齐对应于相同潜在参数的真实和模拟观测，使用配对的校准数据和未配对的样本。然后训练条件归一化流以近似OT诱导的后验，从而在测试时无需模拟访问即可实现有效的推理。在一系列合成和现实世界的基准测试中（包括复杂的医学生物标志物估计），我们的方法匹配或超越了RoPE以及其他标准SBI和非SBI估计器的性能，同时在具有挑战性的、错误指定的环境中提供了更好的可扩展性和适用性。
摘要：Simulation-based inference (SBI) is a statistical inference approach for estimating latent parameters of a physical system when the likelihood is intractable but simulations are available. In practice, SBI is often hindered by model misspecification--the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, a recent SBI approach, addresses this challenge through a two-stage domain transfer process that combines semi-supervised calibration with optimal transport (OT)-based distribution alignment. However, RoPE operates in a fully transductive setting, requiring access to a batch of test samples at inference time, which limits scalability and generalization. We propose here a fully inductive and amortized SBI framework that integrates calibration and distributional alignment into a single, end-to-end trainable model. Our method leverages mini-batch OT with a closed-form coupling to align real and simulated observations that correspond to the same latent parameters, using both paired calibration data and unpaired samples. A conditional normalizing flow is then trained to approximate the OT-induced posterior, enabling efficient inference without simulation access at test time. Across a range of synthetic and real-world benchmarks--including complex medical biomarker estimation--our approach matches or surpasses the performance of RoPE, as well as other standard SBI and non-SBI estimators, while offering improved scalability and applicability in challenging, misspecified environments.

【2】Demonstrating Onboard Inference for Earth Science Applications with Spectral Analysis Algorithms and Deep Learning
标题：利用光谱分析算法和深度学习演示地球科学应用的机载推理
链接：https://arxiv.org/abs/2508.15053

作者：erstein, Alberto Candela, Steve Chien, David Rijlaarsdam, Tom Hendrix, Leonie Buckley, Aubrey Dunne
备注：International Symposium on Artificial Intelligence, Robotics and Automation in Space, November 2024
摘要：喷气推进实验室与Ubotica技术公司合作，正在展示CogniSAT-6/HAMMER（CS-6）上最先进的数据分析。CS-6是一颗带有可见光和近红外范围高光谱仪器和神经网络加速硬件的卫星。在边缘（例如机载）执行数据分析可以实现新的地球科学测量和响应。我们将使用深度学习和频谱分析算法演示CS-6机载数据分析和推理。
摘要：In partnership with Ubotica Technologies, the Jet Propulsion Laboratory is demonstrating state-of-the-art data analysis onboard CogniSAT-6/HAMMER (CS-6). CS-6 is a satellite with a visible and near infrared range hyperspectral instrument and neural network acceleration hardware. Performing data analysis at the edge (e.g. onboard) can enable new Earth science measurements and responses. We will demonstrate data analysis and inference onboard CS-6 for numerous applications using deep learning and spectral analysis algorithms.

【3】TOAST: Fast and scalable auto-partitioning based on principled static analysis
标题：TOAST：基于原则性静态分析的快速可扩展自动分区
链接：https://arxiv.org/abs/2508.15010

作者：ed, Dominik Grewe, Norman Alexander Rink, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Daniel Belov
摘要：在分布式加速器系统中划分大型机器学习模型是一个复杂的过程，需要一系列相互依赖的决策，这些决策由于内部分片模糊而进一步复杂化。因此，现有的自动分区器经常遭受内存不足错误，或者在探索可能的分区的指数级大空间时速度非常慢。为了缓解这种情况，他们人为地限制搜索空间，但这种方法经常产生不可行的解决方案，违反设备内存限制或导致次优性能。我们提出了一个新的静态编译器分析与蒙特卡洛树搜索相结合的系统。我们的分析通过识别（i）需要相同分片的张量维度，以及（ii）需要解决的分区“冲突”来构建有效的决策空间。我们的系统在不同的硬件平台和模型架构上的性能明显优于最先进的工业方法，发现了以前未知的卓越解决方案，即使对于复杂和大型模型，该过程也是完全自动化的。
摘要：Partitioning large machine learning models across distributed accelerator systems is a complex process, requiring a series of interdependent decisions that are further complicated by internal sharding ambiguities. Consequently, existing auto-partitioners often suffer from out-of-memory errors or are prohibitively slow when exploring the exponentially large space of possible partitionings. To mitigate this, they artificially restrict the search space, but this approach frequently yields infeasible solutions that violate device memory constraints or lead to sub-optimal performance. We propose a system that combines a novel static compiler analysis with a Monte Carlo Tree Search. Our analysis constructs an efficient decision space by identifying (i) tensor dimensions requiring identical sharding, and (ii) partitioning "conflicts" that require resolution. Our system significantly outperforms state-of-the-art industrial methods across diverse hardware platforms and model architectures, discovering previously unknown, superior solutions, and the process is fully automated even for complex and large models.

【4】Inference Time Debiasing Concepts in Diffusion Models
标题：扩散模型中的推理时间去偏差概念
链接：https://arxiv.org/abs/2508.14933

作者：Kupssinskü, Marco N. Bochernitsan, Jordan Kopper, Otávio Parraga, Rodrigo C. Barros
摘要：我们提出了DeCoDi，一个基于文本到图像扩散模型的去偏置过程，它改变了推理过程，不会显著改变图像质量，计算开销可以忽略不计，并且可以应用于任何基于扩散的图像生成模型。DeCoDi改变了扩散过程，以避免有偏见的概念的潜在维度区域。虽然大多数深度学习去偏方法需要复杂或计算密集型的干预，但我们的方法仅用于改变推理过程。因此，它更容易为广大从业者所接受。我们通过对护士、消防员和CEO的概念进行性别、种族和年龄的去偏置来展示该方法的有效性。两名不同的人类评估员手动检查1，200张生成的图像。他们的评估结果提供了证据，证明我们的方法在减轻基于性别、种族和年龄的偏见方面是有效的。我们还表明，由GPT4o进行的自动偏倚评估与人类评估在统计学上没有显著差异。我们的评估显示出可喜的结果，评估人员之间的协议和更多的保护属性的覆盖率可靠的水平。我们的方法有可能显着提高它所产生的基于扩散的文本到图像生成模型的图像的多样性。
摘要：We propose DeCoDi, a debiasing procedure for text-to-image diffusion-based models that changes the inference procedure, does not significantly change image quality, has negligible compute overhead, and can be applied in any diffusion-based image generation model. DeCoDi changes the diffusion process to avoid latent dimension regions of biased concepts. While most deep learning debiasing methods require complex or compute-intensive interventions, our method is designed to change only the inference procedure. Therefore, it is more accessible to a wide range of practitioners. We show the effectiveness of the method by debiasing for gender, ethnicity, and age for the concepts of nurse, firefighter, and CEO. Two distinct human evaluators manually inspect 1,200 generated images. Their evaluation results provide evidence that our method is effective in mitigating biases based on gender, ethnicity, and age. We also show that an automatic bias evaluation performed by the GPT4o is not significantly statistically distinct from a human evaluation. Our evaluation shows promising results, with reliable levels of agreement between evaluators and more coverage of protected attributes. Our method has the potential to significantly improve the diversity of images it generates by diffusion-based text-to-image generative models.

【5】Privacy Preserving Inference of Personalized Content for Out of Matrix Users
标题：为矩阵外用户提供个性化内容的隐私保护推理
链接：https://arxiv.org/abs/2508.14905

作者：un, Tai Vu, Andrew Wang
摘要：利基和动态社区的推荐系统面临着持续的挑战，从数据稀疏，冷启动用户和项目，隐私约束。传统的协同过滤和基于内容的方法在这些环境中表现不佳，要么需要侵入性的用户数据，要么在没有偏好历史时失败。我们提出了DeepNaniNet，这是一个深度神经推荐框架，它通过一个基于归纳图的架构来解决这些挑战，该架构结合了用户-项目交互、项目-项目关系和来自BERT的丰富文本评论嵌入。我们的设计使冷启动建议没有配置文件挖掘，使用一种新的“内容篮”用户表示和一个自动编码器为基础的泛化策略看不见的用户。我们引入了AnimeULike，一个包含10，000个动漫标题和13，000个用户的新数据集，以评估具有高比例访客或低活动用户的现实场景中的性能。DeepNaniNet在CiteULike基准测试中实现了最先进的冷启动结果，在用户召回方面与DropoutNet相匹配，而不会降低矩阵外用户的性能，并且在Recall@100中，在AnimeULike热启动上的性能分别优于加权矩阵分解（WMF）和DropoutNet高达7倍和1.5倍。我们的研究结果表明，DeepNaniNet在数据稀疏，冷启动繁重的环境中提供高质量，隐私保护的建议，同时有效地集成异构内容源。
摘要：Recommender systems for niche and dynamic communities face persistent challenges from data sparsity, cold start users and items, and privacy constraints. Traditional collaborative filtering and content-based approaches underperform in these settings, either requiring invasive user data or failing when preference histories are absent. We present DeepNaniNet, a deep neural recommendation framework that addresses these challenges through an inductive graph-based architecture combining user-item interactions, item-item relations, and rich textual review embeddings derived from BERT. Our design enables cold start recommendations without profile mining, using a novel "content basket" user representation and an autoencoder-based generalization strategy for unseen users. We introduce AnimeULike, a new dataset of 10,000 anime titles and 13,000 users, to evaluate performance in realistic scenarios with high proportions of guest or low-activity users. DeepNaniNet achieves state-of-the-art cold start results on the CiteULike benchmark, matches DropoutNet in user recall without performance degradation for out-of-matrix users, and outperforms Weighted Matrix Factorization (WMF) and DropoutNet on AnimeULike warm start by up to 7x and 1.5x in Recall@100, respectively. Our findings demonstrate that DeepNaniNet delivers high-quality, privacy-preserving recommendations in data-sparse, cold start-heavy environments while effectively integrating heterogeneous content sources.

【6】Bayesian Inference and Learning in Nonlinear Dynamical Systems: A Framework for Incorporating Explicit and Implicit Prior Knowledge
标题：非线性动态系统中的Bayesian推理和学习：综合显式和隐式先验知识的框架
链接：https://arxiv.org/abs/2508.15345

作者：kmann, Jan-Hendrik Ewering, Michael Meindl, Simon F. G. Ehlers, Thomas Seel
备注：16 pages, Preprint submitted to Automatica
摘要：准确性和泛化能力是学习动态系统模型时的关键目标。为了从有限的数据中获得这样的模型，目前的工作利用先验知识和假设的系统。然而，融合不同的先验知识，e。G.部分已知的系统方程和关于未知模型部分的平滑假设以及包含在数据中的信息仍然是一个具有挑战性的问题，特别是在具有潜在系统状态的输入-输出设置中。特别是，嵌套在已知系统方程中的学习函数可能是一项费力且容易出错的专家任务。本文考虑的潜在状态的推理和学习未知的模型部分的融合数据信息与不同来源的先验知识。主要贡献是一个通用的系统识别工具，第一次，提供了一个一致的解决方案，在线和离线贝叶斯推理和学习，同时允许将显式和隐式的先验系统知识。我们提出了一种新的接口结合已知的动态功能与基于学习的近似未知系统的部分。基于所提出的模型结构，封闭形式的密度，有效的参数边缘化。不需要用户定制的坐标变换或模型反演，使所提出的框架成为推理和学习的通用工具。在三个不同的案例研究，包括实验数据集的设计框架的广泛适用性。
摘要：Accuracy and generalization capabilities are key objectives when learning dynamical system models. To obtain such models from limited data, current works exploit prior knowledge and assumptions about the system. However, the fusion of diverse prior knowledge, e. g. partially known system equations and smoothness assumptions about unknown model parts, with information contained in the data remains a challenging problem, especially in input-output settings with latent system state. In particular, learning functions that are nested inside known system equations can be a laborious and error-prone expert task. This paper considers inference of latent states and learning of unknown model parts for fusion of data information with different sources of prior knowledge. The main contribution is a general-purpose system identification tool that, for the first time, provides a consistent solution for both, online and offline Bayesian inference and learning while allowing to incorporate explicit and implicit prior system knowledge. We propose a novel interface for combining known dynamics functions with a learning-based approximation of unknown system parts. Based on the proposed model structure, closed-form densities for efficient parameter marginalization are derived. No user-tailored coordinate transformations or model inversions are needed, making the presented framework a general-purpose tool for inference and learning. The broad applicability of the devised framework is illustrated in three distinct case studies, including an experimental data set.

【7】Enhanced Predictive Modeling for Hazardous Near-Earth Object Detection: A Comparative Analysis of Advanced Resampling Strategies and Machine Learning Algorithms in Planetary Risk Assessment
标题：危险近地物体检测的增强预测建模：行星风险评估中高级恢复策略和机器学习算法的比较分析
链接：https://arxiv.org/abs/2508.15106

作者：handra
摘要：本研究评估了几种机器学习模型通过二进制分类框架预测危险近地天体的性能，包括数据缩放、功率转换和交叉验证。比较了六种分类器，即随机森林分类器（RFC），梯度提升分类器（GBC），支持向量分类器（SVC），线性判别分析（LDA），逻辑回归（LR）和K-最近邻（KNN）。RFC和GBC表现最好，两者的F2分数分别为0.987和0.986，变异性非常小。其次是SVC，得分较低但合理，为0.896。LDA和LR的性能中等，得分分别为0.749和0.748左右，而KNN的性能较差，得分为0.691，这是由于难以处理复杂的数据模式。RFC和GBC还提供了大量的混淆矩阵，假阳性和假阴性的数量可以忽略不计，从而分别获得了99.7%和99.6%的出色准确率。这些发现突出了集成方法在高精度和高召回率方面的强大功能，并进一步指出了针对数据集特征和所选评估指标进行定制模型选择的重要性。今后的研究可侧重于利用先进的特征工程优化超参数，以提高近地天体危险预测模型的准确性和稳健性。
摘要：This study evaluates the performance of several machine learning models for predicting hazardous near-Earth objects (NEOs) through a binary classification framework, including data scaling, power transformation, and cross-validation. Six classifiers were compared, namely Random Forest Classifier (RFC), Gradient Boosting Classifier (GBC), Support Vector Classifier (SVC), Linear Discriminant Analysis (LDA), Logistic Regression (LR), and K-Nearest Neighbors (KNN). RFC and GBC performed the best, both with an impressive F2-score of 0.987 and 0.986, respectively, with very small variability. SVC followed, with a lower but reasonable score of 0.896. LDA and LR had a moderate performance with scores of around 0.749 and 0.748, respectively, while KNN had a poor performance with a score of 0.691 due to difficulty in handling complex data patterns. RFC and GBC also presented great confusion matrices with a negligible number of false positives and false negatives, which resulted in outstanding accuracy rates of 99.7% and 99.6%, respectively. These findings highlight the power of ensemble methods for high precision and recall and further point out the importance of tailored model selection with regard to dataset characteristics and chosen evaluation metrics. Future research could focus on the optimization of hyperparameters with advanced features engineering to further the accuracy and robustness of the model on NEO hazard predictions.

检测相关(2篇)

【1】Probability Density from Latent Diffusion Models for Out-of-Distribution Detection
标题：来自潜在扩散模型的分布外检测概率密度
链接：https://arxiv.org/abs/2508.15737

作者：rve, Karl Kaspar Haavel, Meelis Kull
备注：ECAI 2025
摘要：尽管人工智能发展迅速，但安全性仍然是部署机器学习系统的主要瓶颈。一个关键的安全组件是分布外检测：给定一个输入，决定它是否来自与训练数据相同的分布。在生成模型中，最自然的OOD得分是数据可能性。实际上，在均匀分布的OOD数据的假设下，可能性甚至是最佳的OOD检测器，正如我们在这项工作中所示。然而，早期的工作报告说，可能性往往在实践中失败，提出了怀疑其有用性。我们探讨是否，在实践中，表示空间也遭受无法学习良好的密度估计OOD检测，或者如果它仅仅是一个问题，通常用于生成模型的像素空间。为了测试这一点，我们不是在图像上训练变分扩散模型，而是在预先训练的ResNet-18的表示空间上训练，以评估我们基于似然性的检测器与OpenOOD套件中最先进的方法相比的性能。
摘要：Despite rapid advances in AI, safety remains the main bottleneck to deploying machine-learning systems. A critical safety component is out-of-distribution detection: given an input, decide whether it comes from the same distribution as the training data. In generative models, the most natural OOD score is the data likelihood. Actually, under the assumption of uniformly distributed OOD data, the likelihood is even the optimal OOD detector, as we show in this work. However, earlier work reported that likelihood often fails in practice, raising doubts about its usefulness. We explore whether, in practice, the representation space also suffers from the inability to learn good density estimation for OOD detection, or if it is merely a problem of the pixel space typically used in generative models. To test this, we trained a Variational Diffusion Model not on images, but on the representation space of a pre-trained ResNet-18 to assess the performance of our likelihood-based detector in comparison to state-of-the-art methods from the OpenOOD suite.

【2】An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models
标题：基于预训练模型为异常声音检测量身定制的增强音频特征
链接：https://arxiv.org/abs/2508.15334

作者：ong, Qing Wang, Jun Du, Lei Wang, Mingqi Cai, Xin Fang
备注：13 pages, 3 figures, accepted by ICANN2025
摘要：异常声音检测（ASD）旨在识别机器发出的异常声音，已引起学术界和工业界的广泛研究兴趣。然而，异常位置的不确定性和机器声音中的大量冗余信息（如噪声）阻碍了ASD系统性能的提高。本文提出了一种新的音频特征的滤波器组均匀分布的时间间隔，确保平等地关注音频中的所有频率范围，这增强了机器声音中的异常检测。此外，基于预训练的模型，本文提出了一种无参数的特征增强方法来去除机器音频中的冗余信息。据信，这种无参数策略有助于在模型微调期间将通用知识从预先训练的任务有效地转移到ASD任务。声学场景和事件的检测和分类（DCASE）2024挑战数据集的评估结果表明，我们提出的方法在ASD性能方面有显着改善。
摘要：Anomalous Sound Detection (ASD) aims at identifying anomalous sounds from machines and has gained extensive research interests from both academia and industry. However, the uncertainty of anomaly location and much redundant information such as noise in machine sounds hinder the improvement of ASD system performance. This paper proposes a novel audio feature of filter banks with evenly distributed intervals, ensuring equal attention to all frequency ranges in the audio, which enhances the detection of anomalies in machine sounds. Moreover, based on pre-trained models, this paper presents a parameter-free feature enhancement approach to remove redundant information in machine audio. It is believed that this parameter-free strategy facilitates the effective transfer of universal knowledge from pre-trained tasks to the ASD task during model fine-tuning. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge dataset demonstrate significant improvements in ASD performance with our proposed methods.

分类|识别(4篇)

【1】Classification errors distort findings in automated speech processing: examples and solutions from child-development research
标题：分类错误扭曲自动语音处理中的结果：儿童发展研究的例子和解决方案
链接：https://arxiv.org/abs/2508.15637

作者：theron, Evan Kidd, Anton Malko, Marvin Lavechin, Alejandrina Cristia
摘要：随着可穿戴录音机的出现，科学家们越来越多地转向音频和视频数据的自动分析方法，以衡量儿童的经验，行为和结果，大量文献采用长格式录音来研究语言习得。虽然许多文章报道了最流行的自动分类器的准确性和可靠性，但关于分类错误对测量和统计推断的下游影响（例如，回归中的相关性和效应量的估计）。本文提出了一种贝叶斯方法来研究算法错误对关键科学问题的影响，包括兄弟姐妹对儿童语言经验的影响以及儿童的产出与输入之间的关联。在最常用的\gls{lena}和开源替代方案（ACLEW系统的语音类型分类器）中，我们发现分类错误会显着扭曲估计。例如，自动化注释低估了兄弟姐妹对成人输入的负面影响20- 80%，可能使其低于统计学显著性阈值。我们进一步表明，贝叶斯校准方法恢复无偏估计的效果大小可以是有效的和有见地的，但并没有提供一个万无一失的解决方案。报告的问题和我们的解决方案都可以应用于任何涉及事件检测和非零错误率分类的分类器。
摘要：With the advent of wearable recorders, scientists are increasingly turning to automated methods of analysis of audio and video data in order to measure children's experience, behavior, and outcomes, with a sizable literature employing long-form audio-recordings to study language acquisition. While numerous articles report on the accuracy and reliability of the most popular automated classifiers, less has been written on the downstream effects of classification errors on measurements and statistical inferences (e.g., the estimate of correlations and effect sizes in regressions). This paper proposes a Bayesian approach to study the effects of algorithmic errors on key scientific questions, including the effect of siblings on children's language experience and the association between children's production and their input. In both the most commonly used \gls{lena}, and an open-source alternative (the Voice Type Classifier from the ACLEW system), we find that classification errors can significantly distort estimates. For instance, automated annotations underestimated the negative effect of siblings on adult input by 20--80\%, potentially placing it below statistical significance thresholds. We further show that a Bayesian calibration approach for recovering unbiased estimates of effect sizes can be effective and insightful, but does not provide a fool-proof solution. Both the issue reported and our solution may apply to any classifier involving event detection and classification with non-zero error rates.

【2】Nonlinear Federated System Identification
标题：非线性联邦系统识别
链接：https://arxiv.org/abs/2508.15025

作者：e, Max Hartman, Lav R. Varshney, Saurav Prakash
摘要：我们考虑线性参数化非线性系统的联邦学习。我们建立了理论保证联邦非线性系统识别的有效性相比，集中式的方法，表明收敛速度提高客户端的数量增加。虽然在线性和非线性情况下的收敛速度只相差一个常数，这个常数取决于特征映射$\phi$，可以在非线性设置中仔细选择，以增加激励和提高性能。我们通过实验验证了我们的理论，在物理环境中，客户端设备由i.i.d.驱动。控制输入和控制策略表现出I.I.D.随机扰动，确保非主动勘探。实验使用的轨迹从非线性动力学系统的特点是真正的解析功能，包括多项式和三角分量，代表物理系统，包括钟摆和四旋翼动力学。我们分析了所提出的方法在不同的噪声水平和数据分布下的收敛行为。结果表明，随着参与客户端数量的增加，联邦学习始终如一地提高了任何单个客户端的收敛性。
摘要：We consider federated learning of linearly-parameterized nonlinear systems. We establish theoretical guarantees on the effectiveness of federated nonlinear system identification compared to centralized approaches, demonstrating that the convergence rate improves as the number of clients increases. Although the convergence rates in the linear and nonlinear cases differ only by a constant, this constant depends on the feature map $\phi$, which can be carefully chosen in the nonlinear setting to increase excitation and improve performance. We experimentally validate our theory in physical settings where client devices are driven by i.i.d. control inputs and control policies exhibiting i.i.d. random perturbations, ensuring non-active exploration. Experiments use trajectories from nonlinear dynamical systems characterized by real-analytic feature functions, including polynomial and trigonometric components, representative of physical systems including pendulum and quadrotor dynamics. We analyze the convergence behavior of the proposed method under varying noise levels and data distributions. Results show that federated learning consistently improves convergence of any individual client as the number of participating clients increases.

【3】Human Feedback Driven Dynamic Speech Emotion Recognition
标题：人类反馈驱动的动态语音情感识别
链接：https://arxiv.org/abs/2508.14920

作者：rov, Dmitry Korobchenko
摘要：本文的工作为动态语音情感识别开辟了一个新的领域。与传统方法不同，我们假设每个音轨都与在不同时刻活跃的情绪序列相关联。该研究特别关注情感3D化身的动画。我们提出了一个多阶段的方法，包括一个经典的语音情感识别模型的训练，情感序列的合成生成，并进一步改进模型的基础上，人类的反馈。此外，我们介绍了一种新的方法来建模的基础上的狄利克雷分布的情感混合物。基于从3D面部动画数据集中提取的真实情感对模型进行评估。我们将我们的模型与滑动窗口方法进行比较。我们的实验结果表明，狄利克雷为基础的方法在建模情感的混合物的有效性。简化人工反馈进一步提高了模型质量，同时提供了简化的注释过程。
摘要：This work proposes to explore a new area of dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions active at different moments in time. The study particularly focuses on the animation of emotional 3D avatars. We propose a multi-stage method that includes the training of a classical speech emotion recognition model, synthetic generation of emotional sequences, and further model improvement based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated based on ground-truth emotions extracted from a dataset of 3D facial animations. We compare our models against the sliding window approach. Our experimental results show the effectiveness of Dirichlet-based approach in modeling emotional mixtures. Incorporating human feedback further improves the model quality while providing a simplified annotation procedure.

【4】Effect Identification and Unit Categorization in the Multi-Score Regression Discontinuity Design with Application to LED Manufacturing
标题：多分数回归不连续性设计中的效应识别和单元分类及其在LED制造中的应用
链接：https://arxiv.org/abs/2508.15692

作者：lexander Schwarz, Oliver Schacht, Sven Klaassen, Johannes Oberpriller, Martin Spindler
摘要：RDD（回归不连续设计）是一种广泛使用的框架，用于识别和估计单个运行变量的截止点处的因果效应。实际环境，特别是生产系统中遇到的环境，往往涉及由多个阈值和标准界定的决策。常见的MRD（多得分RDD）方法将这些转换为一维设计，采用识别和估计结果。然而，这种做法可能会引入不合规行为。我们开发的理论工具，以确定和减少一些这种“模糊”估计截止效应的子规则的遵守者。我们提供了一个健全的定义和多维截止规则的单位行为类型的分类，扩展现有的分类。我们确定的截止效应的存在和识别条件编译器在多个维度，并指定识别时保持稳定后，排除nevertaker和alwaystaker。此外，我们研究如何分解截止规则到更简单的部分改变了单位的行为。这使得能够识别和删除不符合规定的单位，从而有可能改进估计数。我们验证了我们的框架模拟和真实世界的数据从光电子半导体制造。我们的实证结果表明，精炼生产政策的可用性。特别是，我们表明，我们的方法减少了估计方差，突出了MRD框架在制造业中的实用价值。
摘要：The RDD (regression discontinuity design) is a widely used framework for identification and estimation of causal effects at a cutoff of a single running variable. Practical settings, in particular those encountered in production systems, often involve decision-making defined by multiple thresholds and criteria. Common MRD (multi-score RDD) approaches transform these to a one-dimensional design, to employ identification and estimation results. However, this practice can introduce non-compliant behavior. We develop theoretical tools to identify and reduce some of this "fuzziness" when estimating the cutoff-effect on compliers of sub-rules. We provide a sound definition and categorization of unit behavior types for multi-dimensional cutoff-rules, extending existing categorizations. We identify conditions for the existence and identification of the cutoff-effect on complier in multiple dimensions, and specify when identification remains stable after excluding nevertaker and alwaystaker. Further, we investigate how decomposing cutoff-rules into simpler parts alters the unit behavior. This allows identification and removal of non-compliant units potentially improving estimates. We validate our framework on simulated and real-world data from opto-electronic semiconductor manufacturing. Our empirical results demonstrate the usability for refining production policies. Particularly we show that our approach decreases the estimation variance, highlighting the practical value of the MRD framework in manufacturing.

表征(2篇)

【1】Evaluating Sparse Autoencoders for Monosemantic Representation
标题：评估稀疏自编码器的单语义表示
链接：https://arxiv.org/abs/2508.15094

作者：reidouni, Muhammad Umair Haider, Peizhong Ju, A.B. Siddique
摘要：解释大型语言模型的一个关键障碍是多义性，即神经元激活多个不相关的概念。稀疏自动编码器（SAE）已经被提出来通过将密集的激活转换为稀疏的、更可解释的特征来缓解这个问题。虽然以前的工作表明，严重不良事件促进单义性，一直没有定量比较与他们的基础模型。本文提供了第一个系统的评价严重不良事件对基础模型有关单义性。我们引入了一个细粒度的概念可分性分数的基础上的詹森-香农距离，它捕捉如何明显的神经元的激活分布不同的概念。使用Gemma-2-2B和多个SAE变体在五个基准，我们表明，SAE减少多义，实现更高的概念可分性。然而，SAE的更大稀疏性并不总是产生更好的可分离性，并且经常损害下游性能。为了评估实际效用，我们使用两种策略评估概念水平的干预：完全神经元掩蔽和部分抑制。我们发现，与基础模型相比，SAE在使用部分抑制时能够实现更精确的概念水平控制。在此基础上，我们提出了后验概率衰减（APP），这是一种新的干预方法，使用概念条件激活分布进行靶向抑制。APP在有针对性的概念删除方面优于现有方法。
摘要：A key barrier to interpreting large language models is polysemanticity, where neurons activate for multiple unrelated concepts. Sparse autoencoders (SAEs) have been proposed to mitigate this issue by transforming dense activations into sparse, more interpretable features. While prior work suggests that SAEs promote monosemanticity, there has been no quantitative comparison with their base models. This paper provides the first systematic evaluation of SAEs against base models concerning monosemanticity. We introduce a fine-grained concept separability score based on the Jensen-Shannon distance, which captures how distinctly a neuron's activation distributions vary across concepts. Using Gemma-2-2B and multiple SAE variants across five benchmarks, we show that SAEs reduce polysemanticity and achieve higher concept separability. However, greater sparsity of SAEs does not always yield better separability and often impairs downstream performance. To assess practical utility, we evaluate concept-level interventions using two strategies: full neuron masking and partial suppression. We find that, compared to base models, SAEs enable more precise concept-level control when using partial suppression. Building on this, we propose Attenuation via Posterior Probabilities (APP), a new intervention method that uses concept-conditioned activation distributions for targeted suppression. APP outperforms existing approaches in targeted concept removal.

【2】Kernel-based Equalized Odds: A Quantification of Accuracy-Fairness Trade-off in Fair Representation Learning
标题：基于核的均衡赔率：公平表示学习中准确性与公平性权衡的量化
链接：https://arxiv.org/abs/2508.15084

作者： Xiaoming Huo
摘要：本文介绍了一种新的基于核的公式的均衡赔率（EO）标准，表示为$EO_k$，在监督设置的公平表示学习（FRL）。FRL的中心目标是减轻歧视的敏感属性$S$，同时保持预测精度的目标变量$Y$。我们提出的标准可以对三个核心公平目标进行严格和可解释的量化：独立性（预测$\hat{Y}$独立于$S$），分离（也称为均衡化几率;预测$\hat{Y}$独立于$S$，条件是目标属性$Y$）和校准（$Y$独立于$S$，条件是预测$\hat{Y}$）。在无偏（$Y$独立于$S$）和有偏（$Y$依赖于$S$）条件下，我们证明了$EO_k$在前者中同时满足独立性和分离性，在后者中唯一地保持了预测精度，同时降低了边界独立性和校准性，从而为这些公平性准则之间的权衡提供了一个统一的分析表征。我们进一步定义了经验对应物，$\hat{EO}_k$，一个基于核的统计量，可以在二次时间内计算，线性时间近似也可用。一个集中不等式$\hat{EO}_k$的推导，提供性能保证和误差界，作为实际的公平性合规证书。虽然我们的重点是理论的发展，结果奠定了必要的基础原则和可证明公平的算法设计在未来的实证研究。
摘要：This paper introduces a novel kernel-based formulation of the Equalized Odds (EO) criterion, denoted as $EO_k$, for fair representation learning (FRL) in supervised settings. The central goal of FRL is to mitigate discrimination regarding a sensitive attribute $S$ while preserving prediction accuracy for the target variable $Y$. Our proposed criterion enables a rigorous and interpretable quantification of three core fairness objectives: independence (prediction $\hat{Y}$ is independent of $S$), separation (also known as equalized odds; prediction $\hat{Y}$ is independent with $S$ conditioned on target attribute $Y$), and calibration ($Y$ is independent of $S$ conditioned on the prediction $\hat{Y}$). Under both unbiased ($Y$ is independent of $S$) and biased ($Y$ depends on $S$) conditions, we show that $EO_k$ satisfies both independence and separation in the former, and uniquely preserves predictive accuracy while lower bounding independence and calibration in the latter, thereby offering a unified analytical characterization of the tradeoffs among these fairness criteria. We further define the empirical counterpart, $\hat{EO}_k$, a kernel-based statistic that can be computed in quadratic time, with linear-time approximations also available. A concentration inequality for $\hat{EO}_k$ is derived, providing performance guarantees and error bounds, which serve as practical certificates of fairness compliance. While our focus is on theoretical development, the results lay essential groundwork for principled and provably fair algorithmic design in future empirical studies.

编码器(2篇)

【1】CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing
标题：CUPE：用于语音不可知语音处理的无上下文通用音素编码器
链接：https://arxiv.org/abs/2508.15316

作者：man, Jian-Jun Zhang, Xiaosong Yang
备注：Accepted in: 8th International Conference on Natural Language and Speech Processing (ICNLSP 2025)
摘要：通用音素识别通常需要分析长语音段和特定语言模式。许多语音处理任务需要不受上下文影响的纯音素表示，这促使我们开发了CUPE -一种轻量级模型，可以在120毫秒内捕获关键音素特征，大约是一个音素的长度。CUPE独立处理短的固定宽度窗口，尽管参数比当前方法少，但通过学习所有语言共同的基本声学模式，实现了具有竞争力的跨语言性能。我们通过对不同语言的监督和自我监督训练进行了广泛的评估，包括对UCLA语音语料库的zero-shot测试，证明了强大的跨语言泛化能力，并揭示了通过在音素长度窗口内建模基本声学模式来实现有效的通用语音处理是可能的。
摘要：Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and reveals that effective universal speech processing is possible through modeling basic acoustic patterns within phoneme-length windows.

【2】CuMoLoS-MAE: A Masked Autoencoder for Remote Sensing Data Reconstruction
标题：CuMoLos-MAE：一种用于遥感数据重建的掩蔽自动编码器
链接：https://arxiv.org/abs/2508.14957

作者：skar, Nathanael Zhixin Wong, Sara Shamekh
备注：4 pages, 2 figures
摘要：多普勒激光雷达、雷达和辐射计等遥感仪器的精确大气剖面经常受到低信噪比（SNR）门、距离折叠和虚假不连续性的破坏。传统的间隙填充模糊了精细尺度结构，而深层模型缺乏置信度估计。我们提出了CuMoLoS-MAE，一个课程引导的蒙特卡罗随机Ensemble Masked Autoencoder，旨在（i）恢复精细尺度的功能，如上升气流和下降气流核心，剪切线和小涡旋，（ii）学习数据驱动的大气场先验，（iii）量化像素级的不确定性。在训练期间，CuMoLoS-MAE采用掩码比课程，该课程迫使ViT解码器从逐渐稀疏的上下文进行重建。在推理时，我们通过随机掩模实现的Monte Carlo近似后验预测，多次评估MAE并聚合输出以获得后验预测均值重建以及精细解析的每像素不确定性图。结合高保真重建，这种新型的基于深度学习的工作流程可以增强对流诊断，支持实时数据同化，并改善长期气候再分析。
摘要：Accurate atmospheric profiles from remote sensing instruments such as Doppler Lidar, Radar, and radiometers are frequently corrupted by low-SNR (Signal to Noise Ratio) gates, range folding, and spurious discontinuities. Traditional gap filling blurs fine-scale structures, whereas deep models lack confidence estimates. We present CuMoLoS-MAE, a Curriculum-Guided Monte Carlo Stochastic Ensemble Masked Autoencoder designed to (i) restore fine-scale features such as updraft and downdraft cores, shear lines, and small vortices, (ii) learn a data-driven prior over atmospheric fields, and (iii) quantify pixel-wise uncertainty. During training, CuMoLoS-MAE employs a mask-ratio curriculum that forces a ViT decoder to reconstruct from progressively sparser context. At inference, we approximate the posterior predictive by Monte Carlo over random mask realisations, evaluating the MAE multiple times and aggregating the outputs to obtain the posterior predictive mean reconstruction together with a finely resolved per-pixel uncertainty map. Together with high-fidelity reconstruction, this novel deep learning-based workflow enables enhanced convection diagnostics, supports real-time data assimilation, and improves long-term climate reanalysis.

优化|敛散性(4篇)

【1】Language-Guided Tuning: Enhancing Numeric Optimization with Textual Feedback
标题：数字引导调优：通过文本反馈增强数字优化
链接：https://arxiv.org/abs/2508.15757

作者：, Yucheng Hu, Nan Sun, Xukai Zhao
备注：9 pages, 4 figures, 4 tables
摘要：配置优化仍然是机器学习中的一个关键瓶颈，需要在模型架构、训练策略、特征工程和超参数之间进行协调调整。传统的方法独立地处理这些维度，缺乏可解释性，而最近的自动化方法则在动态适应性和关于优化决策的语义推理方面苦苦挣扎。我们介绍了一个新的框架，采用多智能体大语言模型，智能优化配置，通过自然语言推理的引导调优（LGT）。我们应用文本梯度-定性反馈信号，通过提供对训练动态和配置相互依赖性的语义理解来补充数值优化。LGT协调三个专门的代理：建议配置更改的Advisor、评估进度的Evaluator和优化决策流程的Optimizer，从而创建一个自我改进的反馈循环。通过对六个不同数据集的综合评估，LGT展示了对传统优化方法的重大改进，在保持高可解释性的同时实现了性能提升。
摘要：Configuration optimization remains a critical bottleneck in machine learning, requiring coordinated tuning across model architecture, training strategy, feature engineering, and hyperparameters. Traditional approaches treat these dimensions independently and lack interpretability, while recent automated methods struggle with dynamic adaptability and semantic reasoning about optimization decisions. We introduce Language-Guided Tuning (LGT), a novel framework that employs multi-agent Large Language Models to intelligently optimize configurations through natural language reasoning. We apply textual gradients - qualitative feedback signals that complement numerical optimization by providing semantic understanding of training dynamics and configuration interdependencies. LGT coordinates three specialized agents: an Advisor that proposes configuration changes, an Evaluator that assesses progress, and an Optimizer that refines the decision-making process, creating a self-improving feedback loop. Through comprehensive evaluation on six diverse datasets, LGT demonstrates substantial improvements over traditional optimization methods, achieving performance gains while maintaining high interpretability.

【2】Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models
标题：黑匣子机器学习模型的局部帕累托最优解释
链接：https://arxiv.org/abs/2508.15220

作者： Joshi, Supratik Chakraborty, S Akshay, Shetal Shah, Hazem Torfah, Sanjit Seshia
备注：This work has been accepted at ATVA'25
摘要：为黑盒机器学习模型创建有意义的解释涉及平衡两个经常相互冲突的目标：准确性和可解释性。探索这些目标之间的权衡对于制定可信的解释至关重要。虽然已经开发了许多多目标解释综合技术，但它们通常缺乏对结果的帕累托最优性的正式保证。另一方面，提供这种保证的方法在探索帕累托最优空间时通常面临严重的可伸缩性限制。为了解决这个问题，我们开发了一个框架的基础上，局部最优性的保证，使更多的可扩展性合成的解释。具体来说，我们考虑的问题，合成一组帕累托最优解释与局部最优性保证，在每个解决方案的直接邻域内。我们的方法开始于多目标学习或搜索技术，如多目标蒙特卡罗树搜索，以生成一组最佳的帕累托最优候选人的准确性和可解释性。然后，我们将每个候选项的局部最优性验证为布尔可满足性问题，并使用SAT求解器来解决该问题。我们证明了我们的方法在一组基准上的有效性，将其与以前的方法进行比较，以探索帕累托最优的解释。特别是，我们表明，我们的方法产生的解释，密切匹配的方法提供全球担保合成。
摘要：Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.

【3】Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization
标题：线性偏好优化：通过绝对正规化实现脱钩梯度控制
链接：https://arxiv.org/abs/2508.14947

作者： Qianguo Sun, Chao Song, Junlong Wu, Tianrong Chen, Zhiyun Zeng, Yu Li
摘要：DPO（Direct Preference Optimization）算法以其简单性和训练稳定性成为一种广泛应用的离线偏好优化算法。然而，DPO容易过拟合和崩溃。为了解决这些挑战，我们提出了线性偏好优化（LPO），一种新的对齐框架，具有三个关键的创新。首先，我们通过用绝对差损失替换对数S形函数来引入梯度解耦，从而隔离优化动态。其次，我们通过偏移约束结合正正则化项来提高稳定性，以保持所选择的响应质量。第三，我们实现了可控的拒绝抑制使用梯度分离与简单的估计和一个可调系数，线性调节下降的拒绝概率。通过大量的实验，我们证明了LPO持续提高了各种任务的性能，包括一般的文本任务，数学任务和文本到语音（TTS）任务。这些结果建立了LPO作为一个强大的和可调的范式偏好对齐，我们公开发布的源代码，模型和训练数据。
摘要：DPO (Direct Preference Optimization) has become a widely used offline preference optimization algorithm due to its simplicity and training stability. However, DPO is prone to overfitting and collapse. To address these challenges, we propose Linear Preference Optimization (LPO), a novel alignment framework featuring three key innovations. First, we introduce gradient decoupling by replacing the log-sigmoid function with an absolute difference loss, thereby isolating the optimization dynamics. Second, we improve stability through an offset constraint combined with a positive regularization term to preserve the chosen response quality. Third, we implement controllable rejection suppression using gradient separation with straightforward estimation and a tunable coefficient that linearly regulates the descent of the rejection probability. Through extensive experiments, we demonstrate that LPO consistently improves performance on various tasks, including general text tasks, math tasks, and text-to-speech (TTS) tasks. These results establish LPO as a robust and tunable paradigm for preference alignment, and we release the source code, models, and training data publicly.

【4】Bayesian Optimization with Expected Improvement: No Regret and the Choice of Incumbent
标题：具有预期改进的Bayesian优化：无悔与现任者的选择
链接：https://arxiv.org/abs/2508.15674

作者：ng, Haowei Wang, Szu Hui Ng, Cosmin G. Petra
摘要：期望改进（EI）是贝叶斯优化（BO）中最广泛使用的捕获函数之一。尽管它在应用中的经验证明是成功的，累积后悔上限EI仍然是一个悬而未决的问题。本文分析了经典的带噪高斯过程期望改进（GP-EI）算法。我们考虑贝叶斯设置，其中的目标是从GP的样本。在GP-EI中，将最佳后验均值现任者（BPMI）、最佳采样后验均值现任者（BSPMI）和最佳观测现任者（BOI）作为当前最佳值的选择.首次给出了基于BPMI和BSPMI的GP-EI的累积后悔上界。重要的是，我们表明，在这两种情况下，GP-EI是一个无遗憾的算法平方指数（SE）和马特\'ern内核。此外，我们提出了第一次，GP-EI与BOI要么实现了次线性累积遗憾上限或具有快速收敛的噪声简单的遗憾界SE和Mat\'ern内核。我们的研究结果提供了理论指导时，从业者应用GP-EI在噪声环境中的现任者的选择。数值实验验证了我们的研究结果。
摘要：Expected improvement (EI) is one of the most widely used acquisition functions in Bayesian optimization (BO). Despite its proven empirical success in applications, the cumulative regret upper bound of EI remains an open question. In this paper, we analyze the classic noisy Gaussian process expected improvement (GP-EI) algorithm. We consider the Bayesian setting, where the objective is a sample from a GP. Three commonly used incumbents, namely the best posterior mean incumbent (BPMI), the best sampled posterior mean incumbent (BSPMI), and the best observation incumbent (BOI) are considered as the choices of the current best value in GP-EI. We present for the first time the cumulative regret upper bounds of GP-EI with BPMI and BSPMI. Importantly, we show that in both cases, GP-EI is a no-regret algorithm for both squared exponential (SE) and Mat\'ern kernels. Further, we present for the first time that GP-EI with BOI either achieves a sublinear cumulative regret upper bound or has a fast converging noisy simple regret bound for SE and Mat\'ern kernels. Our results provide theoretical guidance to the choice of incumbent when practitioners apply GP-EI in the noisy setting. Numerical experiments are conducted to validate our findings.

预测|估计(5篇)

【1】Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI
标题：探讨估计理论、机器学习和生成性人工智能的概率统一
链接：https://arxiv.org/abs/2508.15719

作者：Elmusrati
摘要：从不确定的噪声数据中提取意义是时间序列分析、模式识别和语言建模的一个基本问题。该调查提出了一个统一的数学框架，将经典估计理论，统计推断和现代机器学习（包括深度学习和大型语言模型）联系起来。通过分析最大似然估计、贝叶斯推理和注意力机制等技术如何处理不确定性，本文说明了许多人工智能方法都植根于共享的概率原则。通过包括系统识别、图像分类和语言生成在内的说明性场景，我们展示了越来越复杂的模型如何建立在这些基础上，以解决过拟合、数据稀疏性和可解释性等实际挑战。换句话说，这项工作表明，最大似然、MAP估计、贝叶斯分类和深度学习都代表了一个共同目标的不同方面：从嘈杂和/或有偏见的观察中推断隐藏的原因。它既是理论综合，也是学生和研究人员在机器学习不断发展的领域中的实践指南。
摘要：Extracting meaning from uncertain, noisy data is a fundamental problem across time series analysis, pattern recognition, and language modeling. This survey presents a unified mathematical framework that connects classical estimation theory, statistical inference, and modern machine learning, including deep learning and large language models. By analyzing how techniques such as maximum likelihood estimation, Bayesian inference, and attention mechanisms address uncertainty, the paper illustrates that many AI methods are rooted in shared probabilistic principles. Through illustrative scenarios including system identification, image classification, and language generation, we show how increasingly complex models build upon these foundations to tackle practical challenges like overfitting, data sparsity, and interpretability. In other words, the work demonstrates that maximum likelihood, MAP estimation, Bayesian classification, and deep learning all represent different facets of a shared goal: inferring hidden causes from noisy and/or biased observations. It serves as both a theoretical synthesis and a practical guide for students and researchers navigating the evolving landscape of machine learning.

【2】Enhancing Forecasting with a 2D Time Series Approach for Cohort-Based Data
标题：通过基于队列的2D时间序列方法增强预测
链接：https://arxiv.org/abs/2508.15369

作者：Guttel, Orit Moradov, Nachi Lieder, Asnat Greenstein-Messica
备注：Accepted at IEEE CiFer Companion 2025. 5 pages, 3 figures, 2 tables
摘要：本文介绍了一种新的二维（2D）时间序列预测模型，该模型集成了随时间推移的队列行为，解决了小数据环境中的挑战。我们使用多个真实世界的数据集证明了其有效性，与参考模型相比，在准确性和适应性方面表现出卓越的性能。该方法为面临财务和营销预测挑战的各行业的战略决策提供了宝贵的见解。
摘要：This paper introduces a novel two-dimensional (2D) time series forecasting model that integrates cohort behavior over time, addressing challenges in small data environments. We demonstrate its efficacy using multiple real-world datasets, showcasing superior performance in accuracy and adaptability compared to reference models. The approach offers valuable insights for strategic decision-making across industries facing financial and marketing forecasting challenges.

【3】See Beyond a Single View: Multi-Attribution Learning Leads to Better Conversion Rate Prediction
标题：超越单一观点：多归因学习带来更好的转化率预测
链接：https://arxiv.org/abs/2508.15217

作者：en, Zhangming Chan, Xiang-Rong Sheng, Lei Zhang, Sheng Chen, Chenghuan Hou, Han Zhu, Jian Xu, Bo Zheng
备注：Accepted at CIKM 2025
摘要：转化率（CVR）预测是在线广告系统的核心组成部分，其中归因机制（用于在用户接触点之间分配转化积分的规则）从根本上决定标签生成和模型优化。虽然许多工业平台支持不同的归属机制（例如，第一次点击、最后一次点击、线性和数据驱动的多点触摸归因），传统方法将模型训练限制在来自单个生产关键归因机制的标签上，丢弃了替代归因视角中的互补信号。为了解决这一限制，我们提出了一种新的多归因学习（MAL）框架，用于CVR预测，该框架集成了来自多个归因视角的信号，以更好地捕捉驱动用户转换的潜在模式。具体来说，MAL是一个联合学习框架，由两个核心组件组成：归因知识聚合器（AKA）和主要目标预测器（PTP）。AKA是作为一个多任务的学习者，集成了从不同的属性标签提取的知识。相比之下，PTP专注于生成与系统优化的归因度量（例如，最后点击属性下的CVR），确保与工业部署要求的直接兼容性。此外，我们提出了CAT，一种新的训练策略，利用所有属性标签组合的笛卡尔积来生成丰富的监督信号。这种设计大大提高了属性知识聚合器的性能。实证评估表明，MAL优于单一归因学习基线，在离线指标上实现了+0.51%GAUC改善。在线实验表明，MAL实现了+2.6%的ROI（投资回报率）增长。
摘要：Conversion rate (CVR) prediction is a core component of online advertising systems, where the attribution mechanisms-rules for allocating conversion credit across user touchpoints-fundamentally determine label generation and model optimization. While many industrial platforms support diverse attribution mechanisms (e.g., First-Click, Last-Click, Linear, and Data-Driven Multi-Touch Attribution), conventional approaches restrict model training to labels from a single production-critical attribution mechanism, discarding complementary signals in alternative attribution perspectives. To address this limitation, we propose a novel Multi-Attribution Learning (MAL) framework for CVR prediction that integrates signals from multiple attribution perspectives to better capture the underlying patterns driving user conversions. Specifically, MAL is a joint learning framework consisting of two core components: the Attribution Knowledge Aggregator (AKA) and the Primary Target Predictor (PTP). AKA is implemented as a multi-task learner that integrates knowledge extracted from diverse attribution labels. PTP, in contrast, focuses on the task of generating well-calibrated conversion probabilities that align with the system-optimized attribution metric (e.g., CVR under the Last-Click attribution), ensuring direct compatibility with industrial deployment requirements. Additionally, we propose CAT, a novel training strategy that leverages the Cartesian product of all attribution label combinations to generate enriched supervision signals. This design substantially enhances the performance of the attribution knowledge aggregator. Empirical evaluations demonstrate the superiority of MAL over single-attribution learning baselines, achieving +0.51% GAUC improvement on offline metrics. Online experiments demonstrate that MAL achieved a +2.6% increase in ROI (Return on Investment).

【4】Robust Estimation Under Heterogeneous Corruption Rates
标题：不同腐败率下的稳健估计
链接：https://arxiv.org/abs/2508.15051

作者： Chaudhuri, Jerry Li, Thomas A. Courtade
备注：NeurIPS 2025
摘要：我们研究了异质腐败率下的鲁棒估计问题，其中每个样本可能以已知但不相同的概率独立腐败。这种设置在分布式和联合学习、众包和传感器网络中自然出现，但现有的鲁棒估计器通常假设统一或最坏情况下的腐败，忽略了结构异质性。对于多元有界分布和一元高斯分布的均值估计，我们给出了所有异构腐败模式的严格极大极小率。对于多元高斯均值估计和线性回归，我们建立了平方误差的最小最大率，最大因子为$\sqrt{d}$，其中$d$是维数。粗略地说，我们的研究结果表明，超过一定的腐败阈值的样本可能会被丢弃的最佳估计-这个阈值是由给定的腐败率的经验分布。
摘要：We study the problem of robust estimation under heterogeneous corruption rates, where each sample may be independently corrupted with a known but non-identical probability. This setting arises naturally in distributed and federated learning, crowdsourcing, and sensor networks, yet existing robust estimators typically assume uniform or worst-case corruption, ignoring structural heterogeneity. For mean estimation for multivariate bounded distributions and univariate gaussian distributions, we give tight minimax rates for all heterogeneous corruption patterns. For multivariate gaussian mean estimation and linear regression, we establish the minimax rate for squared error up to a factor of $\sqrt{d}$, where $d$ is the dimension. Roughly, our findings suggest that samples beyond a certain corruption threshold may be discarded by the optimal estimators -- this threshold is determined by the empirical distribution of the corruption rates given.

【5】GEN2: A Generative Prediction-Correction Framework for Long-time Emulations of Spatially-Resolved Climate Extremes
标题：Gen 2：用于空间分辨率气候极端现象长期模拟的生成预测-修正框架
链接：https://arxiv.org/abs/2508.15196

作者：ng, Benedikt Barthel Sorensen, Themistoklis Sapsis
摘要：准确量化极端气候风险的增加需要在各种排放情景中产生大量的气候实现集合，这对传统的地球系统模型来说是一个计算挑战。我们提出了GEN 2，一个生成的预测-校正框架，用于对极端事件统计数据进行有效和准确的预测。预测步骤被构造为条件高斯仿真器，然后是非高斯机器学习（ML）校正步骤。ML模型在参考数据和模拟字段对上进行训练，以确保训练对混沌具有鲁棒性。我们首先验证了我们的模型的准确性，历史ERA 5数据，然后展示了各种未来气候变化情景的外推能力。当对一种变暖情景的单一实现进行训练时，我们的模型准确地预测了不同情景下极端事件的统计数据，成功地外推了训练数据的分布。
摘要：Accurately quantifying the increased risks of climate extremes requires generating large ensembles of climate realization across a wide range of emissions scenarios, which is computationally challenging for conventional Earth System Models. We propose GEN2, a generative prediction-correction framework for an efficient and accurate forecast of the extreme event statistics. The prediction step is constructed as a conditional Gaussian emulator, followed by a non-Gaussian machine-learning (ML) correction step. The ML model is trained on pairs of the reference data and the emulated fields nudged towards the reference, to ensure the training is robust to chaos. We first validate the accuracy of our model on historical ERA5 data and then demonstrate the extrapolation capabilities on various future climate change scenarios. When trained on a single realization of one warming scenario, our model accurately predicts the statistics of extreme events in different scenarios, successfully extrapolating beyond the distribution of training data.

其他神经网络|深度学习|模型|建模(24篇)

【1】Intern-S1: A Scientific Multimodal Foundation Model
标题：Intern-S1：科学的多模式基础模型
链接：https://arxiv.org/abs/2508.15763

作者：Zhongrui Cai, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqin Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, Caihua Fan, Ben Gao, Changjiang Gao, Jianfei Gao, Songyang Gao, Yang Gao, Zhangwei Gao, Jiaye Ge, Qiming Ge, Lixin Gu, Yuzhe Gu, Aijia Guo, Qipeng Guo, Xu Guo, Conghui He, Junjun He, Yili Hong, Siyuan Hou, Caiyu Hu, Hanglei Hu, Jucheng Hu, Ming Hu, Zhouqi Hua, Haian Huang, Junhao Huang, Xu Huang, Zixian Huang, Zhe Jiang, Lingkai Kong, Linyang Li, Peiji Li, Pengze Li, Shuaibin Li, Tianbin Li, Wei Li, Yuqiang Li, Dahua Lin, Junyao Lin, Tianyi Lin, Zhishan Lin, Hongwei Liu, Jiangning Liu, Jiyao Liu, Junnan Liu, Kai Liu, Kaiwen Liu, Kuikun Liu, Shichun Liu, Shudong Liu, Wei Liu, Xinyao Liu, Yuhong Liu, Zhan Liu, Yinquan Lu, Haijun Lv, Hongxia Lv, Huijie Lv, Qidang Lv, Ying Lv, Chengqi Lyu, Chenglong Ma, Jianpeng Ma, Ren Ma, Runmin Ma, Runyuan Ma, Xinzhu Ma, Yichuan Ma, Zihan Ma, Sixuan Mi, Junzhi Ning, Wenchang Ning, Xinle Pang, Jiahui Peng, Runyu Peng, Yu Qiao
摘要：近年来，大量开源基金会模型涌现，在一些广泛关注的领域取得了显著进展，性能与闭源模型相当接近。然而，在高价值但更具挑战性的科学专业领域，要么仍然依赖专家模型，要么一般基础模型的进展明显滞后于热门领域，远远不足以转化科学研究，在这些科学领域的开源模型和闭源模型之间留下了巨大的差距。为了缩小这一差距，并进一步探索人工通用智能（AGI），我们引入了Intern-S1，这是一个专业的通才，具有一般理解和推理能力，并具有分析多个科学模态数据的专业知识。Intern-S1是一个多模式专家混合（MoE）模型，具有280亿个激活参数和2410亿个总参数，在5 T令牌上持续预训练，包括来自科学领域的超过2.5T令牌。在训练后阶段，Intern-S1在InternBootCamp中进行离线和在线强化学习（RL），在那里我们提出了混合奖励（MoR），以同时在1000多个任务上协同RL训练。通过算法、数据和训练系统的集成创新，Intern-S1在在线强化学习训练中取得了一流的成绩。在综合评测基准上，Intern-S1在一般推理任务上表现出了开源模型的竞争力，在科学领域的表现明显优于开源模型，在专业任务上，如分子合成规划，反应条件预测，晶体热力学稳定性预测。我们的模型可在https://huggingface.co/internlm/Intern-S1上获得。
摘要：In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to those in popular areas, far from sufficient for transforming scientific research and leaving substantial gap between open-source models and closed-source models in these scientific domains. To mitigate this gap and explore a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist equipped with general understanding and reasoning capabilities with expertise to analyze multiple science modal data. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize the RL training on more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training.On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models in professional tasks, such as molecular synthesis planning, reaction condition prediction, predicting thermodynamic stabilities for crystals. Our models are available at https://huggingface.co/internlm/Intern-S1.

【2】An Efficient Open World Environment for Multi-Agent Social Learning
标题：多智能体社会学习的高效开放世界环境
链接：https://arxiv.org/abs/2508.15679

作者：Ren Tao, Natasha Jaques
摘要：在AI代理可以部署到现实世界环境中之前，仍然存在许多挑战。然而，这种环境的一个优点是，它们本质上是多智能体的，并且包含人类专家。在这样的环境中使用先进的社会智能可以帮助AI代理学习已知专家展示的自适应技能和行为。虽然社交智能可以加速训练，但由于缺乏开放的多智能体环境，目前很难研究。在这项工作中，我们提出了一个环境中，多个自私的代理人可以追求复杂和独立的目标，反映现实世界的挑战。这种环境将使研究能够在开放式多智能体环境中开发社会智能AI代理，在这种环境中，代理可能会被隐含地激励合作以击败共同的敌人，构建和共享工具，并实现长期目标。在这项工作中，我们调查的影响，由于社会学习的存在下，专家和隐式合作，如紧急协作工具的使用，以及代理人是否可以受益于合作或竞争，在这种环境中。
摘要：Many challenges remain before AI agents can be deployed in real-world environments. However, one virtue of such environments is that they are inherently multi-agent and contain human experts. Using advanced social intelligence in such an environment can help an AI agent learn adaptive skills and behaviors that a known expert exhibits. While social intelligence could accelerate training, it is currently difficult to study due to the lack of open-ended multi-agent environments. In this work, we present an environment in which multiple self-interested agents can pursue complex and independent goals, reflective of real world challenges. This environment will enable research into the development of socially intelligent AI agents in open-ended multi-agent settings, where agents may be implicitly incentivized to cooperate to defeat common enemies, build and share tools, and achieve long horizon goals. In this work, we investigate the impact on agent performance due to social learning in the presence of experts and implicit cooperation such as emergent collaborative tool use, and whether agents can benefit from either cooperation or competition in this environment.

【3】Tensorized Multi-Task Learning for Personalized Modeling of Heterogeneous Individuals with High-Dimensional Data
标题：张量化多任务学习，用于对具有多维数据的异类个体进行个性化建模
链接：https://arxiv.org/abs/2508.15676

作者：ar, Mostafa Reisi Gahrooei, Kamran Paynabar
摘要：由于个体特征和行为的变化，异质亚群的有效建模提出了一个重大的挑战。本文提出了一种新的方法来解决这个问题，通过多任务学习（MTL）和低秩张量分解技术。我们的MTL方法旨在通过利用相似任务之间的共享结构来增强个性化建模，同时考虑到不同的亚群特定的变化。我们引入了一个框架，其中低秩分解将任务模型参数的集合分解为低秩结构，该结构捕获任务和子群体之间的共性和变化。这种方法允许通过在类似任务之间共享知识来有效地学习个性化模型，同时保留每个子群体的独特特征。在模拟和案例研究数据集的实验结果表明，所提出的方法相比，几个基准的优越性能，特别是在亚群之间的高变异性的情况下。该框架不仅提高了预测精度，而且通过揭示有助于模型个性化的潜在模式来增强可解释性。
摘要：Effective modeling of heterogeneous subpopulations presents a significant challenge due to variations in individual characteristics and behaviors. This paper proposes a novel approach to address this issue through multi-task learning (MTL) and low-rank tensor decomposition techniques. Our MTL approach aims to enhance personalized modeling by leveraging shared structures among similar tasks while accounting for distinct subpopulation-specific variations. We introduce a framework where low-rank decomposition decomposes the collection of task model parameters into a low-rank structure that captures commonalities and variations across tasks and subpopulations. This approach allows for efficient learning of personalized models by sharing knowledge between similar tasks while preserving the unique characteristics of each subpopulation. Experimental results in simulation and case study datasets demonstrate the superior performance of the proposed method compared to several benchmarks, particularly in scenarios with high variability among subpopulations. The proposed framework not only improves prediction accuracy but also enhances interpretability by revealing underlying patterns that contribute to the personalization of models.

【4】Correct-By-Construction: Certified Individual Fairness through Neural Network Training
标题：通过构造的正确性：通过神经网络训练认证的个人公平性
链接：https://arxiv.org/abs/2508.15642

作者：ang, Jun Sun
摘要：机器学习中的公平性比以往任何时候都更加重要，因为道德问题不断增长。个体公平性要求只有敏感属性不同的个体得到相同的结果。然而，常用的机器学习算法往往无法实现这种公平性。为了提高个体公平性，已经开发了各种训练方法，例如将公平性约束作为优化目标。虽然这些方法已经证明了经验的有效性，但它们缺乏公平的正式保证。旨在提供公平性保证的现有方法主要依赖于验证技术，这些技术有时无法产生明确的结果。此外，仅凭核查并不能积极提高培训期间的个人公平性。为了解决这个问题，我们提出了一个新的框架，正式保证个人公平整个培训。我们的方法包括两个部分，即，(1)可证明公平的初始化，确保模型在公平状态下开始，以及（2）保持公平的训练算法，在模型学习时保持公平。我们的方法的一个关键要素是使用随机响应机制，在保持公平性保证的同时保护敏感属性。我们正式证明，这种机制在整个培训过程中维持个人公平。实验评估证实，我们的方法是有效的，即，产生经验上公平和准确的模型。此外，我们的方法比基于认证训练的替代方法（需要在训练期间进行神经网络验证）更有效。
摘要：Fairness in machine learning is more important than ever as ethical concerns continue to grow. Individual fairness demands that individuals differing only in sensitive attributes receive the same outcomes. However, commonly used machine learning algorithms often fail to achieve such fairness. To improve individual fairness, various training methods have been developed, such as incorporating fairness constraints as optimisation objectives. While these methods have demonstrated empirical effectiveness, they lack formal guarantees of fairness. Existing approaches that aim to provide fairness guarantees primarily rely on verification techniques, which can sometimes fail to produce definitive results. Moreover, verification alone does not actively enhance individual fairness during training. To address this limitation, we propose a novel framework that formally guarantees individual fairness throughout training. Our approach consists of two parts, i.e., (1) provably fair initialisation that ensures the model starts in a fair state, and (2) a fairness-preserving training algorithm that maintains fairness as the model learns. A key element of our method is the use of randomised response mechanisms, which protect sensitive attributes while maintaining fairness guarantees. We formally prove that this mechanism sustains individual fairness throughout the training process. Experimental evaluations confirm that our approach is effective, i.e., producing models that are empirically fair and accurate. Furthermore, our approach is much more efficient than the alternative approach based on certified training (which requires neural network verification during training).

【5】Continual Neural Topic Model
标题：连续主题神经模型
链接：https://arxiv.org/abs/2508.15612

作者：akkaparambil James, Waleed Mustafa, Marius Kloft, Sophie Fellenz
摘要：在持续学习中，我们的目标是学习新的任务，而不忘记以前学过的东西。在主题模型中，这转化为学习新的主题模型，而不会忘记以前学习过的主题。以前的工作要么考虑动态主题模型（DTM），它基于整个训练语料库一次学习主题的演变，要么考虑在线主题模型，它基于新数据不断更新，但没有长期记忆。为了填补这一空白，我们提出了连续神经主题模型（CoNTM），它在随后的时间步连续学习主题模型，而不会忘记以前学到的东西。这是使用不断更新的全局先验分布来实现的。在我们的实验中，CoNTM在主题质量和预测困惑方面始终优于动态主题模型，同时能够在线捕获主题变化。分析表明，与现有方法相比，CoNTM可以学习更多样化的主题，并更好地捕捉时间变化。
摘要：In continual learning, our aim is to learn a new task without forgetting what was learned previously. In topic models, this translates to learning new topic models without forgetting previously learned topics. Previous work either considered Dynamic Topic Models (DTMs), which learn the evolution of topics based on the entire training corpus at once, or Online Topic Models, which are updated continuously based on new data but do not have long-term memory. To fill this gap, we propose the Continual Neural Topic Model (CoNTM), which continuously learns topic models at subsequent time steps without forgetting what was previously learned. This is achieved using a global prior distribution that is continuously updated. In our experiments, CoNTM consistently outperformed the dynamic topic model in terms of topic quality and predictive perplexity while being able to capture topic changes online. The analysis reveals that CoNTM can learn more diverse topics and better capture temporal changes than existing methods.

【6】Conformalized Exceptional Model Mining: Telling Where Your Model Performs (Not) Well
标题：保形异常模型挖掘：告诉您的模型在哪里表现（不）好
链接：https://arxiv.org/abs/2508.15569

作者：ikun Yang, Wouter Duivesteijn, Mykola Pechenizkiy
备注：Accepted by ECML-PKDD
摘要：了解机器学习模型的细微差别对于负责任的部署至关重要，特别是在医疗保健和金融等高风险领域。本文介绍了一种新的框架，共形异常模型挖掘，它结合了共形预测的严谨性和异常模型挖掘（EMM）的解释能力。拟议的框架识别了数据中模型性能异常偏离的有凝聚力的子组，突出显示了高置信度和高不确定性的区域。我们开发了一个新的模型类，mSMoPE（多重软模型性能评估），它量化的不确定性，通过保形预测的严格覆盖保证。通过定义一个新的质量度量，相对平均不确定性损失（RAUL），我们的框架隔离子组在多类分类和回归任务的特殊性能模式。不同数据集的实验结果表明，该框架在发现可解释的子组方面的有效性，这些子组提供了对模型行为的重要见解。这项工作为增强模型的可解释性和可靠性奠定了基础，推动了可解释人工智能和不确定性量化的发展。
摘要：Understanding the nuanced performance of machine learning models is essential for responsible deployment, especially in high-stakes domains like healthcare and finance. This paper introduces a novel framework, Conformalized Exceptional Model Mining, which combines the rigor of Conformal Prediction with the explanatory power of Exceptional Model Mining (EMM). The proposed framework identifies cohesive subgroups within data where model performance deviates exceptionally, highlighting regions of both high confidence and high uncertainty. We develop a new model class, mSMoPE (multiplex Soft Model Performance Evaluation), which quantifies uncertainty through conformal prediction's rigorous coverage guarantees. By defining a new quality measure, Relative Average Uncertainty Loss (RAUL), our framework isolates subgroups with exceptional performance patterns in multi-class classification and regression tasks. Experimental results across diverse datasets demonstrate the framework's effectiveness in uncovering interpretable subgroups that provide critical insights into model behavior. This work lays the groundwork for enhancing model interpretability and reliability, advancing the state-of-the-art in explainable AI and uncertainty quantification.

【7】HEAS: Hierarchical Evolutionary Agent Simulation Framework for Cross-Scale Modeling and Multi-Objective Search
标题：HEAS：用于跨规模建模和多目标搜索的分层进化代理仿真框架
链接：https://arxiv.org/abs/2508.15555

作者：ng, Lin Nie, Xin Zhao
备注：9 pages, 1 figure
摘要：分层进化代理仿真（HEAS）是一个Python框架，它将分层的基于代理的建模与进化优化和锦标赛评估统一在一个可重复的工作流程中。HEAS将模型表示为在确定性层中调度的轻量级进程（“流”）的层次结构，这些进程读取和写入共享上下文，使跨尺度耦合显式且可审计。一个紧凑的API和CLI-模拟，优化，评估-暴露单目标和多目标进化，通过参数扁平化/非扁平化的PyTorch策略集成，以及具有用户定义的评分和投票规则的通用锦标赛工具。该框架通过统一的每一步和每一集的指标进行评估，持久化种子，日志和名人堂档案，并为痕迹，帕累托前沿和比较结果提供绘图帮助，减少胶水代码并提高研究的可比性。HEAS强调机制与编排的分离，允许外源驱动程序，内源代理和聚合器在没有重构的情况下进行组合和交换，而同一模型可以用于正向模拟，优化或系统比较。我们说明使用两个紧凑的例子-生态系统和企业决策设置。HEAS为跨学科，多层次的调查提供了一个实用的基础，产生可靠，可重复的结果。
摘要：Hierarchical Evolutionary Agent Simulation (HEAS) is a Python framework that unifies layered agent-based modeling with evolutionary optimization and tournament evaluation in a single, reproducible workflow. HEAS represents models as hierarchies of lightweight processes ("streams") scheduled in deterministic layers that read and write a shared context, making cross-scale couplings explicit and auditable. A compact API and CLI-simulate, optimize, evaluate-expose single- and multi-objective evolution, PyTorch policy integration via parameter flattening/unflattening, and general tournament tooling with user-defined scoring and voting rules. The framework standardizes evaluation through uniform per-step and episode metrics, persists seeds, logbooks, and hall-of-fame archives, and provides plotting helpers for traces, Pareto fronts, and comparative outcomes, reducing glue code and improving comparability across studies. HEAS emphasizes separation of mechanism from orchestration, allowing exogenous drivers, endogenous agents, and aggregators to be composed and swapped without refactoring, while the same model can be used for forward simulation, optimization, or systematic comparison. We illustrate usage with two compact examples-an ecological system and an enterprise decision-making setting. HEAS offers a practical foundation for cross-disciplinary, multi-level inquiry, yielding reliable, reproducible results.

【8】AI-Powered Machine Learning Approaches for Fault Diagnosis in Industrial Pumps
标题：用于工业泵故障诊断的人工智能驱动机器学习方法
链接：https://arxiv.org/abs/2508.15550

作者： A. Alghtus, Ayad Gannan, Khalid M. Alhajri, Ali L. A. Al Jubouri, Hassan A. I. Al-Janahi
摘要：这项研究提出了一种实用的方法，早期故障检测工业泵系统使用真实世界的传感器数据，从大型立式离心泵在苛刻的海洋环境中运行。监测五个关键操作参数：振动、温度、流速、压力和电流。应用双阈值标记方法，将固定工程限值与计算为历史传感器值的第95百分位数的自适应阈值相结合。为了解决记录故障的罕见性，使用特定领域的规则将合成故障信号注入数据中，在合理的操作范围内模拟关键警报。三个机器学习分类器-随机森林，极端梯度提升（XGBoost）和支持向量机（SVM）-被训练来区分正常操作，早期预警和关键警报。结果表明，Random Forest和XGBoost模型在所有类别中都取得了高准确性，包括代表罕见或新出现故障的少数情况，而SVM模型对异常的敏感性较低。视觉分析，包括分组混淆矩阵和时间序列图，表明所提出的混合方法提供了强大的检测能力。该框架具有可扩展性、可解释性，适用于实时工业部署，支持在故障发生前做出主动维护决策。此外，它可以适用于具有类似传感器架构的其他机械，突出了其作为复杂系统中预测性维护的可扩展解决方案的潜力。
摘要：This study presents a practical approach for early fault detection in industrial pump systems using real-world sensor data from a large-scale vertical centrifugal pump operating in a demanding marine environment. Five key operational parameters were monitored: vibration, temperature, flow rate, pressure, and electrical current. A dual-threshold labeling method was applied, combining fixed engineering limits with adaptive thresholds calculated as the 95th percentile of historical sensor values. To address the rarity of documented failures, synthetic fault signals were injected into the data using domain-specific rules, simulating critical alerts within plausible operating ranges. Three machine learning classifiers - Random Forest, Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) - were trained to distinguish between normal operation, early warnings, and critical alerts. Results showed that Random Forest and XGBoost models achieved high accuracy across all classes, including minority cases representing rare or emerging faults, while the SVM model exhibited lower sensitivity to anomalies. Visual analyses, including grouped confusion matrices and time-series plots, indicated that the proposed hybrid method provides robust detection capabilities. The framework is scalable, interpretable, and suitable for real-time industrial deployment, supporting proactive maintenance decisions before failures occur. Furthermore, it can be adapted to other machinery with similar sensor architectures, highlighting its potential as a scalable solution for predictive maintenance in complex systems.

【9】Jointly Computation- and Communication-Efficient Distributed Learning
标题：联合计算和通信高效的分布式学习
链接：https://arxiv.org/abs/2508.15509

作者：Ren, Nicola Bastianello, Karl H. Johansson, Thomas Parisini
备注：To be presented at 2025 IEEE Conference on Decision and Control
摘要：我们解决了无向网络上的分布式学习问题。具体来说，我们专注于设计一种新的基于ADMM的算法，是联合计算和通信效率。我们的设计通过允许代理在本地训练期间使用随机梯度来保证计算效率。此外，通信效率如下实现：i）代理在通信轮之间执行多个训练时期，以及ii）使用压缩传输。在强凸情形下证明了算法的精确线性收敛性。我们证实了我们的理论结果与国家的最先进的技术的分类任务的数值比较。
摘要：We address distributed learning problems over undirected networks. Specifically, we focus on designing a novel ADMM-based algorithm that is jointly computation- and communication-efficient. Our design guarantees computational efficiency by allowing agents to use stochastic gradients during local training. Moreover, communication efficiency is achieved as follows: i) the agents perform multiple training epochs between communication rounds, and ii) compressed transmissions are used. We prove exact linear convergence of the algorithm in the strongly convex setting. We corroborate our theoretical results by numerical comparisons with state of the art techniques on a classification task.

【10】Learning Protein-Ligand Binding in Hyperbolic Space
标题：双曲空间中学习蛋白质-配体结合
链接：https://arxiv.org/abs/2508.15480

作者：ang, Wenyu Zhu, Bowen Gao, Xin Hong, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan
摘要：蛋白质-配体结合预测是虚拟筛选和亲和力排序的核心，这是药物发现中的两个基本任务。虽然最近基于检索的方法将配体和蛋白质口袋嵌入到欧几里得空间中进行基于相似性的搜索，但欧几里得嵌入的几何形状通常无法捕获分子相互作用固有的层次结构和细粒度亲和力变化。在这项工作中，我们提出了HypSeek，一个双曲表示学习框架，将配体，蛋白质口袋和序列嵌入到洛伦兹模型双曲空间中。通过利用双曲空间的指数几何和负曲率，HypSeek实现了富有表现力的亲和力敏感嵌入，可以有效地模拟全局活性和微妙的功能差异，特别是在具有挑战性的情况下，如活性悬崖，结构相似的配体表现出很大的亲和力差距。我们的模式将虚拟筛选和亲和力排名统一在一个框架中，引入蛋白质指导的三塔架构来增强代表性结构。HypSeek将DUD-E上虚拟筛选的早期富集从42.63提高到51.44（+20.7%），将JACS上的亲和力排名相关性从0.5774提高到0.7239（+25.4%），证明了双曲线几何在两项任务中的优势，并突出了其作为蛋白质-配体建模的强大诱导偏差的潜力。
摘要：Protein-ligand binding prediction is central to virtual screening and affinity ranking, two fundamental tasks in drug discovery. While recent retrieval-based methods embed ligands and protein pockets into Euclidean space for similarity-based search, the geometry of Euclidean embeddings often fails to capture the hierarchical structure and fine-grained affinity variations intrinsic to molecular interactions. In this work, we propose HypSeek, a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. By leveraging the exponential geometry and negative curvature of hyperbolic space, HypSeek enables expressive, affinity-sensitive embeddings that can effectively model both global activity and subtle functional differences-particularly in challenging cases such as activity cliffs, where structurally similar ligands exhibit large affinity gaps. Our mode unifies virtual screening and affinity ranking in a single framework, introducing a protein-guided three-tower architecture to enhance representational structure. HypSeek improves early enrichment in virtual screening on DUD-E from 42.63 to 51.44 (+20.7%) and affinity ranking correlation on JACS from 0.5774 to 0.7239 (+25.4%), demonstrating the benefits of hyperbolic geometry across both tasks and highlighting its potential as a powerful inductive bias for protein-ligand modeling.

【11】Influence-driven Curriculum Learning for Pre-training on Limited Data
标题：针对有限数据进行预训练的影响力驱动课程学习
链接：https://arxiv.org/abs/2508.15475

作者：oenegger, Lukas Thoma, Terra Blevins, Benjamin Roth
备注：9 pages
摘要：课程学习，一种训练技术，其中数据按照示例难度的顺序呈现给模型（例如，从更简单到更复杂的文档），在预训练语言模型方面取得了有限的成功。在这项工作中，我们研究课程学习是否变得有竞争力，如果我们取代传统的以人为中心的难度指标，更密切地对应于模型训练过程中观察到的例子的难度。具体来说，我们通过训练样本的\textit{training data influence}来对训练样本进行排序，这是一个评估单个训练样本对模型输出影响的分数。在我们的课程上训练的模型能够在基准测试中超过随机顺序训练的模型10个百分点，这证实了课程学习有利于语言模型的预训练，只要采用更加以模型为中心的难度概念。
摘要：Curriculum learning, a training technique where data is presented to the model in order of example difficulty (e.g., from simpler to more complex documents), has shown limited success for pre-training language models. In this work, we investigate whether curriculum learning becomes competitive if we replace conventional human-centered difficulty metrics with one that more closely corresponds to example difficulty as observed during model training. Specifically, we experiment with sorting training examples by their \textit{training data influence}, a score which estimates the effect of individual training examples on the model's output. Models trained on our curricula are able to outperform ones trained in random order by over 10 percentage points in benchmarks, confirming that curriculum learning is beneficial for language model pre-training, as long as a more model-centric notion of difficulty is adopted.

【12】Mini-Batch Robustness Verification of Deep Neural Networks
标题：深度神经网络的小批量鲁棒性验证
链接：https://arxiv.org/abs/2508.15454

作者：r-Shaday, Dana Drachsler Cohen
备注：30 pages, 12 figures, conference OOPSLA 2025
摘要：神经网络图像分类器在许多安全关键应用中无处不在。然而，它们容易受到敌对攻击。为了理解它们对攻击的鲁棒性，许多局部鲁棒性验证器被提出来分析输入的球。然而，现有的验证器引入了很长的分析时间或失去了太多的精度，使得它们对大量输入不太有效。在这项工作中，我们提出了一种新的方法，局部鲁棒性：组局部鲁棒性验证。关键思想是利用某些$\n $-球的网络计算的相似性来减少整体分析时间。我们提出了BaVerLy，一个健全的和完整的验证，提高了本地的鲁棒性验证的一组$\N $-球动态构建和验证小批量。BaVerLy自适应地识别成功的小批量大小，相应地构造具有相似网络计算的$\N $-球的小批量，并联合验证它们。如果验证了小批量，则所有$\n $-球都被证明是坚固的。否则，一个$\n $-球被怀疑是不健壮的，指导细化。在后一种情况下，BaVerLy利用分析结果来加快对该$\N $-球以及批处理中的其他$\N $-球的分析。我们在MNIST和CIFAR-10的全连接和卷积网络上评估BaVerLy。结果显示，BaVerLy将常见的逐个验证平均扩展2.3倍，最高可扩展4.1倍，在这种情况下，它将总分析时间从24小时缩短至6小时。
摘要：Neural network image classifiers are ubiquitous in many safety-critical applications. However, they are susceptible to adversarial attacks. To understand their robustness to attacks, many local robustness verifiers have been proposed to analyze $\epsilon$-balls of inputs. Yet, existing verifiers introduce a long analysis time or lose too much precision, making them less effective for a large set of inputs. In this work, we propose a new approach to local robustness: group local robustness verification. The key idea is to leverage the similarity of the network computations of certain $\epsilon$-balls to reduce the overall analysis time. We propose BaVerLy, a sound and complete verifier that boosts the local robustness verification of a set of $\epsilon$-balls by dynamically constructing and verifying mini-batches. BaVerLy adaptively identifies successful mini-batch sizes, accordingly constructs mini-batches of $\epsilon$-balls that have similar network computations, and verifies them jointly. If a mini-batch is verified, all $\epsilon$-balls are proven robust. Otherwise, one $\epsilon$-ball is suspected as not being robust, guiding the refinement. In the latter case, BaVerLy leverages the analysis results to expedite the analysis of that $\epsilon$-ball as well as the other $\epsilon$-balls in the batch. We evaluate BaVerLy on fully connected and convolutional networks for MNIST and CIFAR-10. Results show that BaVerLy scales the common one by one verification by 2.3x on average and up to 4.1x, in which case it reduces the total analysis time from 24 hours to 6 hours.

【13】A Solvable Molecular Switch Model for Stable Temporal Information Processing
标题：稳定时间信息处理的可解分子开关模型
链接：https://arxiv.org/abs/2508.15451

作者：din, C. A. Nijhuis
备注：21 pages, 6 figures, submitted for publication. Comments are welcome
摘要：本文研究了一个输入驱动的单态微分方程模型，该模型最初是为实验证明的动态分子开关而开发的，该开关像大脑中的突触一样。线性的状态和非线性的输入模型是完全可解的，它表明，它也具有收敛和衰减记忆，使稳定的处理时变输入的非线性动力系统的数学特性。因此，该模型表现出生物启发的行为和期望的数学属性的共存，以稳定的学习序列数据。研究结果为动态分子开关作为计算单元应用于深度级联/分层前馈和递归结构以及其他更一般的神经形态计算结构提供了理论支持。它们还可以激发更一般的精确可解模型，这些模型可以模拟任意的物理设备，这些设备可以模仿大脑启发的行为，并对输入信号进行稳定的计算。
摘要：This paper studies an input-driven one-state differential equation model initially developed for an experimentally demonstrated dynamic molecular switch that switches like synapses in the brain do. The linear-in-the-state and nonlinear-in-the-input model is exactly solvable, and it is shown that it also possesses mathematical properties of convergence and fading memory that enable stable processing of time-varying inputs by nonlinear dynamical systems. Thus, the model exhibits the co-existence of biologically-inspired behavior and desirable mathematical properties for stable learning on sequential data. The results give theoretical support for the use of the dynamic molecular switches as computational units in deep cascaded/layered feedforward and recurrent architectures as well as other more general structures for neuromorphic computing. They could also inspire more general exactly solvable models that can be fitted to emulate arbitrary physical devices which can mimic brain-inspired behaviour and perform stable computation on input signals.

【14】Pretrained Diffusion Models Are Inherently Skipped-Step Samplers
标题：预训练的扩散模型本质上是跳过步骤的采样器
链接：https://arxiv.org/abs/2508.15233

作者：
摘要：扩散模型已经在各种生成任务中取得了最先进的结果。然而，一个显著的缺点是它们的顺序生成过程，需要长序列逐步生成。现有的方法，如DDIM，试图通过构建一类非马尔可夫扩散过程，保持相同的训练目标，以减少采样步骤。然而，在理解原始扩散过程是否可以在不诉诸非马尔可夫过程的情况下实现相同的效率方面仍然存在差距。在本文中，我们提供了一个肯定的答案，并引入跳步采样，一种机制，绕过多个中间去噪步骤的迭代生成过程中，与传统的一步一步的细化标准扩散推理。至关重要的是，我们证明了这种跳步采样机制与标准扩散模型的训练目标相同，这表明通过马尔可夫方式通过跳步采样进行加速采样是预训练扩散模型的内在属性。此外，我们提出了一个增强的生成方法，通过将我们的加速采样技术与DDIM。在流行的预训练扩散模型（包括OpenAI ADM、Stable Diffusion和Open Sora模型）上进行的大量实验表明，我们的方法可以显著减少采样步骤，实现高质量的生成。
摘要：Diffusion models have been achieving state-of-the-art results across various generation tasks. However, a notable drawback is their sequential generation process, requiring long-sequence step-by-step generation. Existing methods, such as DDIM, attempt to reduce sampling steps by constructing a class of non-Markovian diffusion processes that maintain the same training objective. However, there remains a gap in understanding whether the original diffusion process can achieve the same efficiency without resorting to non-Markovian processes. In this paper, we provide a confirmative answer and introduce skipped-step sampling, a mechanism that bypasses multiple intermediate denoising steps in the iterative generation process, in contrast with the traditional step-by-step refinement of standard diffusion inference. Crucially, we demonstrate that this skipped-step sampling mechanism is derived from the same training objective as the standard diffusion model, indicating that accelerated sampling via skipped-step sampling via a Markovian way is an intrinsic property of pretrained diffusion models. Additionally, we propose an enhanced generation method by integrating our accelerated sampling technique with DDIM. Extensive experiments on popular pretrained diffusion models, including the OpenAI ADM, Stable Diffusion, and Open Sora models, show that our method achieves high-quality generation with significantly reduced sampling steps.

【15】Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)
标题：迈向可靠和可推广的差异私有机器学习（扩展版本）
链接：https://arxiv.org/abs/2508.15141

作者：ao, Vincent Bindschaedler
备注：This paper is published at ACSAC 2024. This is the extended version that includes an overview of the relevant literature. We open-source our codebase at: this https URL
摘要：最近有一系列研究论文提出了新的差分私有机器学习（DPML）技术。这些论文声称实现了新的最先进的（SoTA）结果，并提供了实证结果作为验证。然而，对于哪些技术最有效，或者它们是否真正符合其声明，并没有达成共识。复杂的问题，代码库，数据集，方法和模型架构的异质性使得不同方法的直接比较具有挑战性。在本文中，我们进行了再现性和可重复性（R+R）实验11个不同的SoTA DPML技术从最近的研究文献。我们的研究结果是多种多样的：虽然有些方法经得起审查，但其他方法在初始实验条件之外进行测试时会出现问题。我们还讨论了DPML的再现性所面临的独特挑战，包括DP噪声带来的额外随机性，以及如何解决这些挑战。最后，我们获得见解和最佳实践，以获得科学有效和可靠的结果。
摘要：There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or if they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures make direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. Results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices to obtain scientifically valid and reliable results.

【16】Side Effects of Erasing Concepts from Diffusion Models
标题：从扩散模型中删除概念的副作用
链接：https://arxiv.org/abs/2508.15124

作者：Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale
备注：Findings of the Association for Computational Linguistics: EMNLP 2025
摘要：对文本到图像（T2 I）生成模型侵犯隐私，版权和安全的担忧导致了概念擦除技术（CET）的发展。有效的CET的目标是禁止生成用户指定的不希望的"目标“概念，同时保留合成剩余概念的高质量图像的能力。在这项工作中，我们证明，CET可以很容易地绕过，并提出了几个概念擦除的副作用。为了全面衡量CET的鲁棒性，我们提出了副作用评估（见），一个评估基准，由描述对象及其属性的分层和组合提示。这个数据集和我们的自动评估管道从三个方面量化了CET的副作用：对邻近概念的影响，目标的规避和属性泄漏。我们的实验表明，CET可以绕过使用超类-子类层次结构和语义相似的提示，如组成的目标变量。我们发现，CET遭受属性泄漏和违反直觉的现象，注意力集中或分散。我们发布了我们的数据集，代码和评估工具，以帮助未来的工作健壮的概念擦除。
摘要：Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of Concept Erasure Techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired ``target'' concepts specified by the user, while preserving the ability to synthesize high-quality images of the remaining concepts. In this work, we demonstrate that CETs can be easily circumvented and present several side effects of concept erasure. For a comprehensive measurement of the robustness of CETs, we present Side Effect Evaluation (\see), an evaluation benchmark that consists of hierarchical and compositional prompts that describe objects and their attributes. This dataset and our automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy and semantically similar prompts, such as compositional variants of the target. We show that CETs suffer from attribute leakage and counterintuitive phenomena of attention concentration or dispersal. We release our dataset, code, and evaluation tools to aid future work on robust concept erasure.

【17】Wormhole Dynamics in Deep Neural Networks
标题：深度神经网络中的蠕虫洞动力学
链接：https://arxiv.org/abs/2508.15086

作者：Lai, Zhe Jin
摘要：这项工作研究了深度神经网络（DNN）的泛化行为，重点关注“愚弄示例”现象，其中DNN自信地对人类随机或非结构化的输入进行分类。为了探索这一现象，我们引入了一个基于最大似然估计的分析框架，而不坚持传统的数值方法，依赖于基于梯度的优化和显式标签。我们的分析表明，在过度参数化机制下运行的DNN在输出特征空间中表现出崩溃。虽然这种崩溃提高了网络的泛化能力，但增加更多的层最终会导致退化状态，在这种状态下，模型通过将不同的输入映射到相同的输出来学习平凡的解决方案，从而导致零损失。进一步的研究表明，这种简并可以绕过使用我们新推导的“虫洞”的解决方案。虫洞的解决方案，当应用于任意愚弄的例子，调和有意义的标签与随机的，并提供了一个新的视角捷径学习。这些发现为DNN泛化提供了更深入的见解，并为未来在无监督环境下的学习动态研究指明了方向，以弥合理论与实践之间的差距。
摘要：This work investigates the generalization behavior of deep neural networks (DNNs), focusing on the phenomenon of "fooling examples," where DNNs confidently classify inputs that appear random or unstructured to humans. To explore this phenomenon, we introduce an analytical framework based on maximum likelihood estimation, without adhering to conventional numerical approaches that rely on gradient-based optimization and explicit labels. Our analysis reveals that DNNs operating in an overparameterized regime exhibit a collapse in the output feature space. While this collapse improves network generalization, adding more layers eventually leads to a state of degeneracy, where the model learns trivial solutions by mapping distinct inputs to the same output, resulting in zero loss. Further investigation demonstrates that this degeneracy can be bypassed using our newly derived "wormhole" solution. The wormhole solution, when applied to arbitrary fooling examples, reconciles meaningful labels with random ones and provides a novel perspective on shortcut learning. These findings offer deeper insights into DNN generalization and highlight directions for future research on learning dynamics in unsupervised settings to bridge the gap between theory and practice.

【18】Rethinking the Potential of Layer Freezing for Efficient DNN Training
标题：重新思考层冻结以实现高效DNN训练的潜力
链接：https://arxiv.org/abs/2508.15033

作者：ng, Ci Zhang, Lei Lu, Qitao Tan, Sheng Li, Ao Li, Xulong Tang, Shaoyi Huang, Jinzhen Wang, Guoming Li, Jundong Li, Xiaoming Zhai, Jin Lu, Geng Yuan
摘要：随着深度神经网络和数据集规模的不断扩大，训练的计算成本也显著增加。层冻结技术作为一种有效降低网络训练成本的方法，近年来引起了人们的极大关注。然而，在传统的层冻结方法中，冻结层仍然需要向前传播以生成未冻结层的特征图，从而限制了计算成本的降低。为了克服这个问题，先前的工作提出了一种假设的解决方案，该解决方案将来自冻结层的特征图缓存为新的数据集，允许后面的层直接在存储的特征图上进行训练。虽然这种方法看起来很简单，但它提出了一些被先前文献严重忽视的主要挑战，例如如何有效地将增强应用于特征图以及引入的大量存储开销。如果不解决这些被忽视的挑战，缓存方法的性能将受到严重影响，甚至使其不可行。本文首次全面探讨了这些挑战，并提供了系统的解决方案。为了提高训练精度，我们提出了\textit{相似性感知通道增强}，它以最小的额外存储成本高速缓存具有高增强灵敏度的通道。为了减少存储开销，我们将有损数据压缩纳入层冻结中，并设计了一种\textit{渐进式压缩}策略，该策略随着更多的层被冻结而提高压缩率，从而有效地降低存储成本。最后，我们的解决方案在保持模型准确性的同时显著降低了训练成本，并且时间开销很小。此外，我们还对冻结和压缩策略进行了全面评估，为优化其应用以实现高效的DNN训练提供了见解。
摘要：With the growing size of deep neural networks and datasets, the computational costs of training have significantly increased. The layer-freezing technique has recently attracted great attention as a promising method to effectively reduce the cost of network training. However, in traditional layer-freezing methods, frozen layers are still required for forward propagation to generate feature maps for unfrozen layers, limiting the reduction of computation costs. To overcome this, prior works proposed a hypothetical solution, which caches feature maps from frozen layers as a new dataset, allowing later layers to train directly on stored feature maps. While this approach appears to be straightforward, it presents several major challenges that are severely overlooked by prior literature, such as how to effectively apply augmentations to feature maps and the substantial storage overhead introduced. If these overlooked challenges are not addressed, the performance of the caching method will be severely impacted and even make it infeasible. This paper is the first to comprehensively explore these challenges and provides a systematic solution. To improve training accuracy, we propose \textit{similarity-aware channel augmentation}, which caches channels with high augmentation sensitivity with a minimum additional storage cost. To mitigate storage overhead, we incorporate lossy data compression into layer freezing and design a \textit{progressive compression} strategy, which increases compression rates as more layers are frozen, effectively reducing storage costs. Finally, our solution achieves significant reductions in training cost while maintaining model accuracy, with a minor time overhead. Additionally, we conduct a comprehensive evaluation of freezing and compression strategies, providing insights into optimizing their application for efficient DNN training.

【19】Reversible Unfolding Network for Concealed Visual Perception with Generative Refinement
标题：具有生成细化的隐藏视觉感知的可逆展开网络
链接：https://arxiv.org/abs/2508.15027

作者：He, Fengyang Xiao, Rihan Zhang, Chengyu Fang, Deng-Ping Fan, Sina Farsiu
备注：18 pages, 21 tables, 13 figures
摘要：现有的隐藏视觉感知（CVP）方法通常利用可逆策略来降低不确定性，但这些方法通常局限于掩模域，使RGB域的潜力未得到充分挖掘。为了解决这个问题，我们提出了一个具有生成细化的可逆展开网络，称为RUN++。具体来说，RUN++首先将CVP任务公式化为数学优化问题，并将迭代解决方案展开为多级深度网络。这种方法提供了一种原则性的方法来跨遮罩和RGB域应用可逆建模，同时利用扩散模型来解决所产生的不确定性。网络的每个阶段集成了三个目的驱动的模块：隐藏对象区域提取（CORE）模块将可逆建模应用于掩模域以识别核心对象区域;上下文感知区域增强（CARE）模块将此原理扩展到RGB域以促进更好的前景-背景分离;通过基于噪声的增强（FINE）模块进行微调迭代提供最终的改进。FINE模块引入了一个有针对性的伯努利扩散模型，该模型仅细化分割掩模的不确定区域，利用扩散的生成能力进行精细细节恢复，而无需全图像处理的高昂计算成本。这种独特的协同作用，其中展开网络为扩散模型提供了强大的不确定性先验，使RUN++能够有效地将其重点放在模糊区域，显著减少误报和漏报。此外，我们引入了一个新的范例，用于构建强大的CVP系统，在现实世界的退化下仍然有效，并将此概念扩展到更广泛的双层优化框架。
摘要：Existing methods for concealed visual perception (CVP) often leverage reversible strategies to decrease uncertainty, yet these are typically confined to the mask domain, leaving the potential of the RGB domain underexplored. To address this, we propose a reversible unfolding network with generative refinement, termed RUN++. Specifically, RUN++ first formulates the CVP task as a mathematical optimization problem and unfolds the iterative solution into a multi-stage deep network. This approach provides a principled way to apply reversible modeling across both mask and RGB domains while leveraging a diffusion model to resolve the resulting uncertainty. Each stage of the network integrates three purpose-driven modules: a Concealed Object Region Extraction (CORE) module applies reversible modeling to the mask domain to identify core object regions; a Context-Aware Region Enhancement (CARE) module extends this principle to the RGB domain to foster better foreground-background separation; and a Finetuning Iteration via Noise-based Enhancement (FINE) module provides a final refinement. The FINE module introduces a targeted Bernoulli diffusion model that refines only the uncertain regions of the segmentation mask, harnessing the generative power of diffusion for fine-detail restoration without the prohibitive computational cost of a full-image process. This unique synergy, where the unfolding network provides a strong uncertainty prior for the diffusion model, allows RUN++ to efficiently direct its focus toward ambiguous areas, significantly mitigating false positives and negatives. Furthermore, we introduce a new paradigm for building robust CVP systems that remain effective under real-world degradations and extend this concept into a broader bi-level optimization framework.

【20】Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications
标题：微控制器的量化神经网络：方法、平台和应用的全面回顾
链接：https://arxiv.org/abs/2508.15008

作者：Abushahla, Dara Varam, Ariel J. N. Panopio, Mohamed I. AlHajri
备注：39 pages, 16 figures, 8 Tables, submitted to the Proceedings of the IEEE
摘要：在资源受限的设备（如微控制器）上部署量化神经网络（QNN），在平衡模型性能、计算复杂性和内存限制方面带来了重大挑战。Tiny Machine Learning（TinyML）通过整合机器学习算法、硬件加速和软件优化的进步来解决这些问题，从而在嵌入式系统上高效地运行深度神经网络。本调查介绍了以硬件为中心的量化介绍，系统地回顾了用于加速嵌入式应用程序深度学习模型的基本量化技术。特别是，进一步强调模型性能和硬件能力之间的关键权衡。该调查进一步评估了专门为支持微控制器上的QNN执行而设计的现有软件框架和硬件平台。此外，我们提供了一个分析当前的挑战，并在QNN部署的快速发展的领域有前途的未来方向的轮廓。
摘要：The deployment of Quantized Neural Networks (QNNs) on resource-constrained devices, such as microcontrollers, has introduced significant challenges in balancing model performance, computational complexity and memory constraints. Tiny Machine Learning (TinyML) addresses these issues by integrating advancements across machine learning algorithms, hardware acceleration, and software optimization to efficiently run deep neural networks on embedded systems. This survey presents a hardware-centric introduction to quantization, systematically reviewing essential quantization techniques employed to accelerate deep learning models for embedded applications. In particular, further emphasis is put on critical trade-offs among model performance and hardware capabilities. The survey further evaluates existing software frameworks and hardware platforms designed specifically for supporting QNN execution on microcontrollers. Moreover, we provide an analysis of the current challenges and an outline of promising future directions in the rapidly evolving domain of QNN deployment.

【21】Closing the Performance Gap in Generative Recommenders with Collaborative Tokenization and Efficient Modeling
标题：通过协作代币化和高效建模缩小生成式推荐器的性能差距
链接：https://arxiv.org/abs/2508.14910

作者：age, Jeremie Mary, David Picard
备注：Code coming soon
摘要：最近的工作已经探索生成推荐系统作为传统的基于ID的模型的替代方案，重构项目推荐作为离散项目令牌的序列生成任务。虽然很有前途，但与SASRec等基于ID的基线相比，这些方法在实践中往往表现不佳。在本文中，我们确定了两个关键的限制阻碍生成方法：缺乏合作信号的项目标记化，和效率低下的常用编码器-解码器架构。为了解决这些问题，我们引入COSETTE，一种对比标记化方法，将协作信息直接集成到学习的项目表示中，共同优化内容重构和推荐相关性。此外，我们提出了MARIUS，一个轻量级的，音频启发的生成模型，从项目解码中提取时间轴建模。MARIUS降低了推理成本，同时提高了推荐精度。标准的顺序推荐基准测试的实验表明，我们的方法缩小，甚至消除，生成和现代ID为基础的模型之间的性能差距，同时保留了生成范式的好处。
摘要：Recent work has explored generative recommender systems as an alternative to traditional ID-based models, reframing item recommendation as a sequence generation task over discrete item tokens. While promising, such methods often underperform in practice compared to well-tuned ID-based baselines like SASRec. In this paper, we identify two key limitations holding back generative approaches: the lack of collaborative signal in item tokenization, and inefficiencies in the commonly used encoder-decoder architecture. To address these issues, we introduce COSETTE, a contrastive tokenization method that integrates collaborative information directly into the learned item representations, jointly optimizing for both content reconstruction and recommendation relevance. Additionally, we propose MARIUS, a lightweight, audio-inspired generative model that decouples timeline modeling from item decoding. MARIUS reduces inference cost while improving recommendation accuracy. Experiments on standard sequential recommendation benchmarks show that our approach narrows, or even eliminates, the performance gap between generative and modern ID-based models, while retaining the benefits of the generative paradigm.

【22】Tree-like Pairwise Interaction Networks
标题：树木成对交互网络
链接：https://arxiv.org/abs/2508.15678

作者：chman, Salvatore Scognamiglio, Mario V. Wüthrich
摘要：对表格数据中的特征交互进行建模仍然是预测建模中的一个关键挑战，例如，用于保险定价。本文提出了树状成对交互网络（PIN），一种新的神经网络架构，通过一个共享的前馈神经网络架构，模仿决策树的结构，显式地捕捉成对特征的相互作用。PIN通过设计实现了内在的可解释性，允许直接检查相互作用效应。此外，它允许有效的SHapley加法解释（SHAP）计算，因为它只涉及成对的相互作用。我们强调了PIN和已建立的模型（如GA2M，梯度提升机和图形神经网络）之间的联系。在流行的法国汽车保险数据集上的实证结果表明，PIN在预测准确性方面优于传统和现代神经网络基准，同时还提供了对特征如何相互作用以及它们如何对预测做出贡献的见解。
摘要：Modeling feature interactions in tabular data remains a key challenge in predictive modeling, for example, as used for insurance pricing. This paper proposes the Tree-like Pairwise Interaction Network (PIN), a novel neural network architecture that explicitly captures pairwise feature interactions through a shared feed-forward neural network architecture that mimics the structure of decision trees. PIN enables intrinsic interpretability by design, allowing for direct inspection of interaction effects. Moreover, it allows for efficient SHapley's Additive exPlanation (SHAP) computations because it only involves pairwise interactions. We highlight connections between PIN and established models such as GA2Ms, gradient boosting machines, and graph neural networks. Empirical results on the popular French motor insurance dataset show that PIN outperforms both traditional and modern neural networks benchmarks in predictive accuracy, while also providing insight into how features interact with each another and how they contribute to the predictions.

【23】Flow Matching at Scale: A Machine Learning Framework for Efficient Large-Size Sampling of Many-Body Systems
标题：大规模流匹配：多体系统高效大尺寸采样的机器学习框架
链接：https://arxiv.org/abs/2508.15318

作者：Lee, Daw-Wei Wang
摘要：我们提出了一个基于流匹配的机器学习框架，以克服马尔可夫链蒙特卡罗（MCMC）方法的缩放限制。我们展示了它的能力在二维XY模型，其中一个单一的网络，只训练配置从一个小的（32\times 32$）晶格在稀疏的温度点，生成可靠的样本为一个显着更大的系统（128\times 128$）在一个连续的温度范围内没有再训练。生成的配置显示出强烈的协议与关键的热力学观测值，并正确地捕获的Berezinskiii-Kosterlitz-无（BKT）过渡的签名。这种双重泛化是由流匹配框架实现的，它允许我们学习连续的、温度调节的映射。同时，底层CNN架构的归纳偏差确保了学习到的局部物理规则是尺度不变的。这种“小火车，大发电”的能力建立了一个新的范式，有效地研究临界现象，探索热力学极限提供了显着的计算优势。该方法可直接应用于其它由格点上连续场描述的经典或量子多体系统。
摘要：We propose a machine learning framework based on Flow Matching to overcome the scaling limitations of Markov Chain Monte Carlo (MCMC) methods. We demonstrate its capability in the 2D XY model, where a single network, trained only on configurations from a small ($32\times 32$) lattice at sparse temperature points, generates reliable samples for a significantly larger system ($128\times 128$) across a continuous temperature range without retraining. The generated configurations show strong agreement with key thermodynamic observables and correctly capture the signatures of the Berezinskii-Kosterlitz-Thouless (BKT) transition. This dual generalization is enabled by the Flow Matching framework, which allows us to learn a continuous, temperature-conditioned mapping. At the same time, the inductive biases of the underlying CNN architecture ensure that the learned local physical rules are scale-invariant. This "train-small, generate-large" capability establishes a new paradigm for efficiently studying critical phenomena, offering a significant computational advantage for exploring the thermodynamic limit. The method can be directly applied to other classical or quantum many-body systems described by continuous fields on a lattice.

【24】Generative AI models enable efficient and physically consistent sea-ice simulations
标题：生成的人工智能模型能够实现高效且物理一致的海冰模拟
链接：https://arxiv.org/abs/2508.14984

作者：bastian Finn, Marc Bocquet, Pierre Rampal, Charlotte Durand, Flavia Porro, Alban Farchi, Alberto Carrassi
备注：43 pages, 10 figures
摘要：海冰是由高度复杂的，尺度不变的，各向异性的过程，是具有挑战性的代表在地球系统模型。虽然先进的数值模式提高了我们对海冰动力学的理解，但它们的计算成本往往限制了它们在集合预报和气候模拟中的应用。在这里，我们介绍GenSIM，这是第一个基于人工智能的生成式泛北极模型，它可以在12小时的窗口内预测所有相关关键属性（包括浓度、厚度和漂移）的演变，比确定性预测具有更高的准确性和更高的计算效率，同时保持物理一致。经过对最先进的海冰-海洋系统的长期模拟训练，GenSIM强大地再现了数值模型和观测中观察到的统计数据，展示了类似脆性的短期动态，同时也描绘了长期海冰下降。仅仅由大气强迫驱动，我们将GenSIM的紧急外推能力归因于反映海洋长期影响的模式：它似乎已经学会了内部海洋模拟器。这种从短期预测中推断出缓慢演变的气候相关动态的能力，突显了生成模型在概括看不见的气候和编码隐藏的物理学方面的巨大潜力。
摘要：Sea ice is governed by highly complex, scale-invariant, and anisotropic processes that are challenging to represent in Earth system models. While advanced numerical models have improved our understanding of the sea-ice dynamics, their computational costs often limit their application in ensemble forecasting and climate simulations. Here, we introduce GenSIM, the first generative AI-based pan-Arctic model that predicts the evolution of all relevant key properties, including concentration, thickness, and drift, in a 12-hour window with improved accuracy over deterministic predictions and high computational efficiency, while remaining physically consistent. Trained on a long simulation from a state-of-the-art sea-ice--ocean system, GenSIM robustly reproduces statistics as observed in numerical models and observations, exhibiting brittle-like short-term dynamics while also depicting the long-term sea-ice decline. Driven solely by atmospheric forcings, we attribute GenSIM's emergent extrapolation capabilities to patterns that reflect the long-term impact of the ocean: it seemingly has learned an internal ocean emulator. This ability to infer slowly evolving climate-relevant dynamics from short-term predictions underlines the large potential of generative models to generalise for unseen climates and to encode hidden physics.

其他(28篇)

【1】Neural Robot Dynamics
标题：神经机器人动力学
链接：https://arxiv.org/abs/2508.15755

作者：ric Heiden, Iretiayo Akinola, Dieter Fox, Miles Macklin, Yashraj Narang
摘要：由于现代机器人的高自由度和复杂的机构，精确和有效的仿真仍然具有挑战性。神经模拟器已经成为传统分析模拟器的一个有前途的替代方案，能够有效地预测复杂的动态并适应现实世界的数据;然而，现有的神经模拟器通常需要特定于应用的训练，并且无法推广到新的任务和/或环境，主要是由于全局状态的表示不足。在这项工作中，我们解决的问题，学习一般化的神经模拟器的机器人结构为铰接刚体。我们提出了NeRD（神经机器人动力学），学习机器人特定的动力学模型，用于预测未来的状态下，关节刚体接触约束。NeRD独特地取代了低层次的动力学和接触求解器的分析模拟器，并采用了以机器人为中心的和空间不变的仿真状态表示。我们将学到的NeRD模型作为一个可互换的后端求解器集成在一个最先进的机器人模拟器中。我们进行了大量的实验，以表明NeRD模拟器是稳定和准确的超过一千个模拟步骤;概括跨任务和环境配置;使策略学习专门在神经引擎;和，不像大多数经典的模拟器，可以从现实世界的数据进行微调，以弥合模拟和现实之间的差距。
摘要：Accurate and efficient simulation of modern robots remains challenging due to their high degrees of freedom and intricate mechanisms. Neural simulators have emerged as a promising alternative to traditional analytical simulators, capable of efficiently predicting complex dynamics and adapting to real-world data; however, existing neural simulators typically require application-specific training and fail to generalize to novel tasks and/or environments, primarily due to inadequate representations of the global state. In this work, we address the problem of learning generalizable neural simulators for robots that are structured as articulated rigid bodies. We propose NeRD (Neural Robot Dynamics), learned robot-specific dynamics models for predicting future states for articulated rigid bodies under contact constraints. NeRD uniquely replaces the low-level dynamics and contact solvers in an analytical simulator and employs a robot-centric and spatially-invariant simulation state representation. We integrate the learned NeRD models as an interchangeable backend solver within a state-of-the-art robotics simulator. We conduct extensive experiments to show that the NeRD simulators are stable and accurate over a thousand simulation steps; generalize across tasks and environment configurations; enable policy learning exclusively in a neural engine; and, unlike most classical simulators, can be fine-tuned from real-world data to bridge the gap between simulation and reality.

【2】Investigation of D-Wave quantum annealing for training Restricted Boltzmann Machines and mitigating catastrophic forgetting
标题：研究D波量子模拟用于训练受限Boltzmann机和减轻灾难性遗忘
链接：https://arxiv.org/abs/2508.15697

作者：a El-Yazizi, Yaroslav Koshka
备注：26 pages, 5 figures
摘要：适度的统计差异之间的采样性能的D-波量子退火（QA）和经典的马尔可夫链蒙特卡罗（MCMC），当应用于限制玻尔兹曼机（RBM），探索解释，并可能解决，没有显着和一致的改善RBM可训练性时，在以前的调查中使用的D-波采样。一种新的混合采样方法，结合经典和QA的贡献，研究作为一个有前途的方式，以受益于两种采样方法之间的适度差异。在这项工作中，成果管理制培训没有得到改善，从而表明基于QA和MCMC抽样之间的差异，主要是在分布的中低概率区域，这对样本的质量不太重要，不足以使培训受益。将RBM嵌入到新一代D-Wave硬件的晶格中的难度可能会使任务进一步复杂化。另一方面，从分布的低概率部分生成足够多样的样本的能力有可能使其他机器学习应用受益，例如在增量学习期间减轻灾难性遗忘（CF）。在这项工作中，第一次证明了使用QA生成的模式的理想类CF缓解的生成重放的可行性。虽然使用D-Wave QA的CF缓解的效率与经典缓解的效率相当，但生成大量不同期望模式的速度和进一步改进的潜力使得这种方法有希望用于各种具有挑战性的机器学习应用。
摘要：Modest statistical differences between the sampling performances of the D-Wave quantum annealer (QA) and the classical Markov Chain Monte Carlo (MCMC), when applied to Restricted Boltzmann Machines (RBMs), are explored to explain, and possibly address, the absence of significant and consistent improvements in RBM trainability when the D-Wave sampling was used in previous investigations. A novel hybrid sampling approach, combining the classical and the QA contributions, is investigated as a promising way to benefit from the modest differences between the two sampling methods. No improvements in the RBM training are achieved in this work, thereby suggesting that the differences between the QA-based and MCMC sampling, mainly found in the medium-to-low probability regions of the distribution, which are less important for the quality of the sample, are insufficient to benefit the training. Difficulties in achieving sufficiently high quality of embedding RBMs into the lattice of the newer generation of D-Wave hardware could be further complicating the task. On the other hand, the ability to generate samples of sufficient variety from lower-probability parts of the distribution has a potential to benefit other machine learning applications, such as the mitigation of catastrophic forgetting (CF) during incremental learning. The feasibility of using QA-generated patterns of desirable classes for CF mitigation by the generative replay is demonstrated in this work for the first time. While the efficiency of the CF mitigation using the D-Wave QA was comparable to that of the classical mitigation, both the speed of generating a large number of distinct desirable patterns and the potential for further improvement make this approach promising for a variety of challenging machine learning applications.

【3】Exploiting Policy Idling for Dexterous Manipulation
标题：利用政策懒惰进行灵巧操纵
链接：https://arxiv.org/abs/2508.15669

作者：Chen, Philemon Brakel, Antonia Bronars, Annie Xie, Sandy Huang, Oliver Groth, Maria Bauza, Markus Wulfmeier, Nicolas Heess, Dushyant Rao
备注：A similar version to this paper was accepted at IROS 2025
摘要：基于学习的灵巧操作方法近年来取得了显著的进展。然而，学习的政策往往仍然缺乏可靠性，并表现出有限的鲁棒性的重要因素的变化。可以在许多设置中观察到的一个故障模式是策略空闲，即当它们到达某些状态时，它们停止移动到状态的小区域之外。这种策略空转通常反映了训练数据。例如，当数据包含机器人需要执行高精度运动的区域中的小动作时，可能会发生这种情况，例如，当准备抓取物体或物体插入时。先前的工作已经尝试减轻这种现象，例如通过过滤训练数据或修改控制频率。然而，这些方法可能会以其他方式对政策绩效产生负面影响。作为一种替代方案，我们研究如何利用闲置行为的可检测性，为探索和政策改进提供信息。我们的方法，暂停诱导扰动（PIP），适用于检测到的闲置状态的扰动，从而帮助它逃脱有问题的盆地的吸引力。在一系列具有挑战性的模拟双臂任务中，我们发现这种简单的方法已经可以显着提高测试时的表现，无需额外的监督或培训。此外，由于机器人往往在运动的关键点闲置，我们还发现，从由此产生的情节学习导致更好的迭代策略改进相比，以前的方法。我们的扰动策略也导致了15-35%的绝对成功率的改善，在现实世界中的插入任务，需要复杂的多手指操作。
摘要：Learning-based methods for dexterous manipulation have made notable progress in recent years. However, learned policies often still lack reliability and exhibit limited robustness to important factors of variation. One failure pattern that can be observed across many settings is that policies idle, i.e. they cease to move beyond a small region of states when they reach certain states. This policy idling is often a reflection of the training data. For instance, it can occur when the data contains small actions in areas where the robot needs to perform high-precision motions, e.g., when preparing to grasp an object or object insertion. Prior works have tried to mitigate this phenomenon e.g. by filtering the training data or modifying the control frequency. However, these approaches can negatively impact policy performance in other ways. As an alternative, we investigate how to leverage the detectability of idling behavior to inform exploration and policy improvement. Our approach, Pause-Induced Perturbations (PIP), applies perturbations at detected idling states, thus helping it to escape problematic basins of attraction. On a range of challenging simulated dual-arm tasks, we find that this simple approach can already noticeably improve test-time performance, with no additional supervision or training. Furthermore, since the robot tends to idle at critical points in a movement, we also find that learning from the resulting episodes leads to better iterative policy improvement compared to prior approaches. Our perturbation strategy also leads to a 15-35% improvement in absolute success rate on a real-world insertion task that requires complex multi-finger manipulation.

【4】Transduction is All You Need for Structured Data Workflows
标题：结构化数据工作流所需的一切都是转换
链接：https://arxiv.org/abs/2508.15610

作者：ozzo, Naweed Khan, Christodoulos Constantinides, Nandana Mihindukulasooriya, Nahuel Defosse, Junkyu Lee
备注：32 pages, 8 figures
摘要：本文介绍了一个模块化的框架，用于构建基于代理的系统，能够在复杂的数据结构化推理和合成概括。在设计时考虑到了研究和实际应用，Approtics提供了一个处理数据和人工智能工作流的新视角。在这个框架中，代理从逻辑流中抽象出来，它们在数据类型内部使用，以实现数据之间的逻辑转换。人工智能鼓励人工智能开发人员专注于建模数据，而不是制作提示，从而启用声明性语言，其中数据类型由LLM提供，并通过逻辑转换组成，当类型连接时由LLM执行。我们提供了实证证据，证明了该框架在特定领域的多项选择题回答，文本到SQL的语义解析和自动提示优化任务中的适用性，在不牺牲性能的情况下实现了最先进的准确性或提高了可扩展性。开源实现可在\texttt{https：//github.com/IBM/agentics}上获得。
摘要：This paper introduces Agentics, a modular framework for building agent-based systems capable of structured reasoning and compositional generalization over complex data. Designed with research and practical applications in mind, Agentics offers a novel perspective on working with data and AI workflows. In this framework, agents are abstracted from the logical flow and they are used internally to the data type to enable logical transduction among data. Agentics encourages AI developers to focus on modeling data rather than crafting prompts, enabling a declarative language in which data types are provided by LLMs and composed through logical transduction, which is executed by LLMs when types are connected. We provide empirical evidence demonstrating the applicability of this framework across domain-specific multiple-choice question answering, semantic parsing for text-to-SQL, and automated prompt optimization tasks, achieving state-of-the-art accuracy or improved scalability without sacrificing performance. The open-source implementation is available at \texttt{https://github.com/IBM/agentics}.

【5】Stabilization of Perturbed Loss Function: Differential Privacy without Gradient Noise
标题：扰动损失函数的稳定化：没有梯度噪音的差异隐私
链接：https://arxiv.org/abs/2508.15523

作者：bib, Remi Chou, Taejoon Kim
备注：under review
摘要：我们提出了一种用于多用户局部差分隐私（LDP）的差分隐私训练机制SPOF（Stabilization of Perturbed Loss Function）。SPOF扰动模型训练损失函数的稳定泰勒展开多项式近似，其中每个用户的数据通过添加到多项式系数的校准噪声来私有化。与差分私有随机梯度下降（DP-SGD）等基于梯度的机制不同，SPOF不需要将噪声注入损失函数的梯度中，这提高了计算效率和稳定性。这种表述自然支持所有用户的同时隐私保证。此外，SPOF在训练过程中对环境噪声具有鲁棒性，即使在用户输入被破坏时也能保持稳定的性能。我们比较SPOF与DP-SGD的多用户扩展，在无线体域网（WBAN）场景中评估这两种方法，涉及异构用户数据和来自身体传感器的随机信道噪声。我们的研究结果表明，与DP-SGD相比，SPOF的重建精度平均提高了3.5%，平均训练时间减少了57.2%，在多用户环境中表现出卓越的隐私-效用权衡。
摘要：We propose SPOF (Stabilization of Perturbed Loss Function), a differentially private training mechanism intended for multi-user local differential privacy (LDP). SPOF perturbs a stabilized Taylor expanded polynomial approximation of a model's training loss function, where each user's data is privatized by calibrated noise added to the coefficients of the polynomial. Unlike gradient-based mechanisms such as differentially private stochastic gradient descent (DP-SGD), SPOF does not require injecting noise into the gradients of the loss function, which improves both computational efficiency and stability. This formulation naturally supports simultaneous privacy guarantees across all users. Moreover, SPOF exhibits robustness to environmental noise during training, maintaining stable performance even when user inputs are corrupted. We compare SPOF with a multi-user extension of DP-SGD, evaluating both methods in a wireless body area network (WBAN) scenario involving heterogeneous user data and stochastic channel noise from body sensors. Our results show that SPOF achieves, on average, up to 3.5% higher reconstruction accuracy and reduces mean training time by up to 57.2% compared to DP-SGD, demonstrating superior privacy-utility trade-offs in multi-user environments.

【6】Test-time Corpus Feedback: From Retrieval to RAG
标题：测试时数据库反馈：从检索到RAG
链接：https://arxiv.org/abs/2508.15437

作者：athee, Venktesh V, Sean MacAvaney, Avishek Anand
备注：18 pages, 1 figure
摘要：检索增强生成（RAG）已经成为知识密集型NLP任务的标准框架，将大型语言模型（LLM）与外部语料库的文档检索相结合。尽管它被广泛使用，但大多数RAG管道仍然将检索和推理视为独立的组件，检索文档一次，然后生成答案，而无需进一步的交互。这种静态设计通常会限制需要迭代证据收集或高精度检索的复杂任务的性能。最近的工作在信息检索（IR）和NLP社区已经开始通过引入自适应检索和排名方法，纳入反馈，以缩小这一差距。在这次调查中，我们提出了一个结构化的概述先进的检索和排名机制，集成这样的反馈。我们根据反馈信号的来源和在改进查询、检索上下文或文档池中的作用对反馈信号进行分类。通过巩固这些发展，我们的目标是弥合IR和NLP的观点，并突出检索作为端到端RAG系统的动态，可学习的组成部分。
摘要：Retrieval-Augmented Generation (RAG) has emerged as a standard framework for knowledge-intensive NLP tasks, combining large language models (LLMs) with document retrieval from external corpora. Despite its widespread use, most RAG pipelines continue to treat retrieval and reasoning as isolated components, retrieving documents once and then generating answers without further interaction. This static design often limits performance on complex tasks that require iterative evidence gathering or high-precision retrieval. Recent work in both the information retrieval (IR) and NLP communities has begun to close this gap by introducing adaptive retrieval and ranking methods that incorporate feedback. In this survey, we present a structured overview of advanced retrieval and ranking mechanisms that integrate such feedback. We categorize feedback signals based on their source and role in improving the query, retrieved context, or document pool. By consolidating these developments, we aim to bridge IR and NLP perspectives and highlight retrieval as a dynamic, learnable component of end-to-end RAG systems.

【7】Hybrid Least Squares/Gradient Descent Methods for DeepONets
标题：DeepOnets的混合最小平方/梯度下降方法
链接：https://arxiv.org/abs/2508.15394

作者： Chang-Ock Lee, Minam Moon
摘要：我们提出了一种有效的混合最小二乘/梯度下降方法来加速DeepONet训练。由于DeepONet的输出可以被视为相对于分支网络的最后一层参数是线性的，因此可以使用最小二乘（LS）求解来优化这些参数，并且通过梯度下降形式来更新剩余的隐藏层参数。然而，为所有可能的分支和主干输入组合构建LS系统会产生一个非常大的线性问题，无法直接求解。为了解决这个问题，我们的方法分解成两个更小的，更易于管理的子问题$\unicode{x2014}$一个分支网络和一个主干网络$\unicode{x2014}$，并分别解决它们。该方法被推广到更广泛的L^2损失类型，最后一层参数具有正则化项，包括具有物理信息损失的无监督学习的情况。
摘要：We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can be optimized using a least squares (LS) solve, and the remaining hidden layer parameters are updated by means of gradient descent form. However, building the LS system for all possible combinations of branch and trunk inputs yields a prohibitively large linear problem that is infeasible to solve directly. To address this issue, our method decomposes the large LS system into two smaller, more manageable subproblems $\unicode{x2014}$ one for the branch network and one for the trunk network $\unicode{x2014}$ and solves them separately. This method is generalized to a broader type of $L^2$ loss with a regularization term for the last layer parameters, including the case of unsupervised learning with physics-informed loss.

【8】Fairness for the People, by the People: Minority Collective Action
标题：公平为民、民治：少数群体集体行动
链接：https://arxiv.org/abs/2508.15374

作者：Dov, Samira Samadi, Amartya Sanyal, Alexandru Ţifrea
摘要：机器学习模型通常会保留训练数据中存在的偏见，从而导致对某些少数群体的不公平待遇。尽管有一系列现有的公司方的偏见缓解技术，他们通常会产生效用成本，并需要组织买入。认识到许多模型依赖于用户提供的数据，最终用户可以通过企业集体行动框架来诱导公平性，在该框架中，一个协调的少数群体战略性地重新标记自己的数据以增强公平性，而不改变公司的培训过程。我们提出了三种实用的模型不可知的方法来近似理想的重新标记，并在现实世界的数据集上验证它们。我们的研究结果表明，少数人的一个小组可以大大减少不公平性，对整体预测误差的影响很小。
摘要：Machine learning models often preserve biases present in training data, leading to unfair treatment of certain minority groups. Despite an array of existing firm-side bias mitigation techniques, they typically incur utility costs and require organizational buy-in. Recognizing that many models rely on user-contributed data, end-users can induce fairness through the framework of Algorithmic Collective Action, where a coordinated minority group strategically relabels its own data to enhance fairness, without altering the firm's training process. We propose three practical, model-agnostic methods to approximate ideal relabeling and validate them on real-world datasets. Our findings show that a subgroup of the minority can substantially reduce unfairness with a small impact on the overall prediction error.

【9】Saving for the future: Enhancing generalization via partial logic regularization
标题：为未来储蓄：通过部分逻辑正规化增强概括性
链接：https://arxiv.org/abs/2508.15317

作者：an, Yijie Hu, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang
摘要：泛化仍然是视觉分类任务中的一个重大挑战，特别是在处理真实世界应用中的未知类时。现有的研究集中在类发现范式，这往往有利于已知的类，和增量学习范式，这遭受灾难性的遗忘。最近的方法，如L-Reg技术，采用基于逻辑的正则化来增强泛化，但受到完全定义的逻辑公式的必要性的约束，限制了未知类的灵活性。本文介绍了PL-Reg，一种新的部分逻辑正则化项，允许模型为未定义的逻辑公式保留空间，提高对未知类的适应性。具体来说，我们正式证明，涉及未知类的任务可以有效地解释使用部分逻辑。我们还证明了部分逻辑的基础上的方法，导致改进的泛化。我们通过对广义类别发现，多域广义类别发现和长尾类增量学习任务的广泛实验来验证PL-Reg，证明了一致的性能改进。我们的研究结果突出了部分逻辑在应对未知类相关挑战方面的有效性。
摘要：Generalization remains a significant challenge in visual classification tasks, particularly in handling unknown classes in real-world applications. Existing research focuses on the class discovery paradigm, which tends to favor known classes, and the incremental learning paradigm, which suffers from catastrophic forgetting. Recent approaches such as the L-Reg technique employ logic-based regularization to enhance generalization but are bound by the necessity of fully defined logical formulas, limiting flexibility for unknown classes. This paper introduces PL-Reg, a novel partial-logic regularization term that allows models to reserve space for undefined logic formulas, improving adaptability to unknown classes. Specifically, we formally demonstrate that tasks involving unknown classes can be effectively explained using partial logic. We also prove that methods based on partial logic lead to improved generalization. We validate PL-Reg through extensive experiments on Generalized Category Discovery, Multi-Domain Generalized Category Discovery, and long-tailed Class Incremental Learning tasks, demonstrating consistent performance improvements. Our results highlight the effectiveness of partial logic in tackling challenges related to unknown classes.

【10】Deep Think with Confidence
标题：自信地深入思考
链接：https://arxiv.org/abs/2508.15260

作者：, Xuewei Wang, Yuandong Tian, Jiawei Zhao
摘要：大型语言模型（LLM）通过测试时间缩放方法（如自我一致性与多数投票）在推理任务中显示出巨大的潜力。然而，这种方法通常会导致准确性和高计算开销的收益递减。为了解决这些挑战，我们引入了DeepConf（DeepConf），这是一种简单而强大的方法，可以在测试时提高推理效率和性能。DeepConf利用模型内部的置信度信号，在生成过程中或生成后动态过滤掉低质量的推理跟踪。它不需要额外的模型训练或超参数调优，并且可以无缝集成到现有的服务框架中。我们在各种推理任务和最新的开源模型中评估DeepConf，包括Qwen 3和GPT-OSS系列。值得注意的是，在AIME 2025等具有挑战性的基准测试中，DeepConf@512的准确率高达99.9%，与完全并行思维相比，生成的令牌减少了84.7%。
摘要：Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.

【11】SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
标题：SparK：具有可恢复的KV缓存通道修剪的查询感知非结构化稀疏性
链接：https://arxiv.org/abs/2508.15212

作者：Liao, Yixing Xu, Shizhu He, Guanchen Li, Xuanwu Yin, Dong Li, Emad Barsoum, Jun Zhao, Kang Liu
摘要：大型语言模型（LLM）中的长上下文推理越来越受到KV缓存瓶颈的限制：内存使用量随序列长度线性增长，而注意力计算则呈二次曲线增长。现有方法通过诸如令牌驱逐或合并的策略沿着时间轴压缩KV高速缓存来解决这个问题，以减少存储器和计算开销。然而，这些方法经常忽略跨特征维度的细粒度重要性变化（即，通道轴），从而限制了他们有效平衡效率和模型准确性的能力。在现实中，我们观察到通道显着性在查询和位置上都有很大的变化：某些特征通道对于给定的查询携带的信息接近零，而其他特征通道则具有相关性。为了解决这一疏忽，我们提出了SPARK，一种无需训练的即插即用方法，通过在通道级别修剪KV来应用非结构化稀疏性，同时在注意力得分计算期间动态恢复修剪的条目。值得注意的是，我们的方法与现有的KV压缩和量化技术正交，使其与它们集成以实现进一步的加速。通过减少通道级冗余，SPARK可以在相同的内存预算内处理更长的序列。对于长度相等的序列，SPARK不仅保留或提高了模型精度，而且与基于逐出的方法相比，还将KV缓存存储减少了30%以上。此外，即使在80%的积极修剪率下，SPARK与基线驱逐方法相比仍保持性能下降小于5%，证明了其鲁棒性和有效性。我们的代码将在https://github.com/Xnhyacinth/SparK上提供。
摘要：Long-context inference in large language models (LLMs) is increasingly constrained by the KV cache bottleneck: memory usage grows linearly with sequence length, while attention computation scales quadratically. Existing approaches address this issue by compressing the KV cache along the temporal axis through strategies such as token eviction or merging to reduce memory and computational overhead. However, these methods often neglect fine-grained importance variations across feature dimensions (i.e., the channel axis), thereby limiting their ability to effectively balance efficiency and model accuracy. In reality, we observe that channel saliency varies dramatically across both queries and positions: certain feature channels carry near-zero information for a given query, while others spike in relevance. To address this oversight, we propose SPARK, a training-free plug-and-play method that applies unstructured sparsity by pruning KV at the channel level, while dynamically restoring the pruned entries during attention score computation. Notably, our approach is orthogonal to existing KV compression and quantization techniques, making it compatible for integration with them to achieve further acceleration. By reducing channel-level redundancy, SPARK enables processing of longer sequences within the same memory budget. For sequences of equal length, SPARK not only preserves or improves model accuracy but also reduces KV cache storage by over 30% compared to eviction-based methods. Furthermore, even with an aggressive pruning ratio of 80%, SPARK maintains performance with less degradation than 5% compared to the baseline eviction method, demonstrating its robustness and effectiveness. Our code will be available at https://github.com/Xnhyacinth/SparK.

【12】Revisiting Pre-processing Group Fairness: A Modular Benchmarking Framework
标题：重新审视预处理组公平性：模块化基准框架
链接：https://arxiv.org/abs/2508.15193

作者：dfield, Ziqi Xu, Sevvandi Kandanaarachchi
备注：This paper has been accepted to the 34th ACM International Conference on Information and Knowledge Management (CIKM 2025), Resource Track
摘要：随着机器学习系统越来越多地集成到高风险决策过程中，确保算法结果的公平性已成为一个关键问题。减轻偏差的方法通常分为三类：预处理、处理中和后处理。虽然后两种方法受到了极大的关注，但在数据层面操作并提供模型不可知论和改善隐私合规性等优势的预处理方法相对较少受到关注，并且缺乏标准化的评估工具。在这项工作中，我们介绍了FairPrep，一个可扩展的和模块化的基准测试框架，旨在评估公平意识的预处理技术的表格数据集。FairPrep建立在AIF360平台上，可以无缝集成数据集、公平干预和预测模型。它具有批处理接口，可以进行有效的实验，并自动报告公平性和效用指标。通过提供标准化的管道和支持可重复的评估，FairPrep填补了公平基准测试领域的关键空白，并为推进数据级公平性研究提供了实践基础。
摘要：As machine learning systems become increasingly integrated into high-stakes decision-making processes, ensuring fairness in algorithmic outcomes has become a critical concern. Methods to mitigate bias typically fall into three categories: pre-processing, in-processing, and post-processing. While significant attention has been devoted to the latter two, pre-processing methods, which operate at the data level and offer advantages such as model-agnosticism and improved privacy compliance, have received comparatively less focus and lack standardised evaluation tools. In this work, we introduce FairPrep, an extensible and modular benchmarking framework designed to evaluate fairness-aware pre-processing techniques on tabular datasets. Built on the AIF360 platform, FairPrep allows seamless integration of datasets, fairness interventions, and predictive models. It features a batch-processing interface that enables efficient experimentation and automatic reporting of fairness and utility metrics. By offering standardised pipelines and supporting reproducible evaluations, FairPrep fills a critical gap in the fairness benchmarking landscape and provides a practical foundation for advancing data-level fairness research.

【13】Towards Source-Free Machine Unlearning
标题：迈向无源机器学习
链接：https://arxiv.org/abs/2508.15127

作者：Ahmed, Umit Yigit Basaran, Dripta S. Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, Amit K. Roy-Chowdhury
备注：Accepted by CVPR 2025
摘要：随着机器学习变得越来越普遍以及数据隐私法规的发展，从训练模型中删除隐私或受版权保护的信息的能力正在成为一个越来越重要的要求。现有的遗忘方法通常依赖于在遗忘过程中访问整个训练数据集的假设。然而，该假设在原始训练数据可能不可访问的实际场景中可能不成立，即，无源设置。为了应对这一挑战，我们专注于无源的unlearning场景，其中unlearning算法必须能够从训练模型中删除特定数据，而无需访问原始训练数据集。在最近工作的基础上，我们提出了一种方法，可以估计未知剩余训练数据的Hessian，这是有效学习所需的关键组成部分。利用这种估计技术，我们的方法使有效的zero-shot unlearning，同时提供强大的理论保证的unlearning性能，同时保持对其余数据的性能。在广泛的数据集上进行的大量实验验证了该方法的有效性。
摘要：As machine learning becomes more pervasive and data privacy regulations evolve, the ability to remove private or copyrighted information from trained models is becoming an increasingly critical requirement. Existing unlearning methods often rely on the assumption of having access to the entire training dataset during the forgetting process. However, this assumption may not hold true in practical scenarios where the original training data may not be accessible, i.e., the source-free setting. To address this challenge, we focus on the source-free unlearning scenario, where an unlearning algorithm must be capable of removing specific data from a trained model without requiring access to the original training dataset. Building on recent work, we present a method that can estimate the Hessian of the unknown remaining training data, a crucial component required for efficient unlearning. Leveraging this estimation technique, our method enables efficient zero-shot unlearning while providing robust theoretical guarantees on the unlearning performance, while maintaining performance on the remaining data. Extensive experiments over a wide range of datasets verify the efficacy of our method.

【14】Open-Universe Assistance Games
标题：开放宇宙援助游戏
链接：https://arxiv.org/abs/2508.15119

作者：, Jingyi Qu, Andreea Bobu, Dylan Hadfield-Menell
备注：7 pages + 2 pages references + 7 pages appendix
摘要：人工智能代理必须以可解释的方式推断并采取行动，实现各种人类目标和偏好，而这些目标和偏好是没有预先定义的。为了正式化这种设置，我们引入了开放宇宙援助游戏（OU-AGs），一个框架，代理必须在一个无限的和不断发展的空间可能的目标的原因。在这种情况下，我们介绍了GOOD（GOals from Open-ended Dialogue），这是一种数据高效的在线方法，可以在与人类的交互过程中以自然语言的形式提取目标，并推断出自然语言目标的分布。GOOD提示LLM模拟具有不同复杂意图的用户，使用其响应对候选目标执行概率推理。这种方法可以实现丰富的目标表示和不确定性估计，而无需大型离线数据集。我们在基于文本的杂货店购物域和文本操作的模拟家庭机器人环境（AI 2 Thor）中使用合成用户配置文件评估GOOD。我们的方法优于没有明确目标跟踪的基线，这一点得到了基于LLM和人类评估的证实。
摘要：Embodied AI agents must infer and act in an interpretable way on diverse human goals and preferences that are not predefined. To formalize this setting, we introduce Open-Universe Assistance Games (OU-AGs), a framework where the agent must reason over an unbounded and evolving space of possible goals. In this context, we introduce GOOD (GOals from Open-ended Dialogue), a data-efficient, online method that extracts goals in the form of natural language during an interaction with a human, and infers a distribution over natural language goals. GOOD prompts an LLM to simulate users with different complex intents, using its responses to perform probabilistic inference over candidate goals. This approach enables rich goal representations and uncertainty estimation without requiring large offline datasets. We evaluate GOOD in a text-based grocery shopping domain and in a text-operated simulated household robotics environment (AI2Thor), using synthetic user profiles. Our method outperforms a baseline without explicit goal tracking, as confirmed by both LLM-based and human evaluations.

【15】Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
标题：Nemotron-CC-Math：1330亿代币规模的高质量数学预训练数据集
链接：https://arxiv.org/abs/2508.15096

作者：rimi Mahabadi, Sanjeev Satheesh, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
摘要：在高质量的结构化数据（如数学和代码）上对大型语言模型（LLM）进行预训练，可以大大增强推理能力。然而，现有的基于Common Crawl构建的以数学为中心的数据集由于脆弱的提取算法、有损的HTML到文本转换以及无法可靠地保留数学结构而遭受质量下降。在这项工作中，我们介绍了Nemotron-CC-Math，这是一个大规模的高质量数学语料库，使用专门为强大的科学文本提取而设计的新颖的域不可知管道从Common Crawl构建。与以前的努力不同，我们的管道跨各种格式（例如，MathJax、KaTeX、MathML），利用lynx的布局感知渲染和基于LLM的目标清理阶段。这种方法保留了方程和代码块的结构完整性，同时删除了样板、将符号标准化为LaTeX表示并纠正了不一致之处。我们收集了一个大的，高质量的数学语料库，即Nemotron-CC-Math-3+（133 B令牌）和Nemotron-CC-Math-4+（52 B令牌）。值得注意的是，Nemotron-CC-Math-4+不仅超过了之前所有的开放数学数据集（包括MegaMath、FineMath和OpenWebMath），而且包含的令牌数量是FineMath-4+的5.5倍，后者是之前质量最高的数学预训练数据集。当用于预训练Nemotron-T 8B模型时，我们的语料库在MATH上产生了+4.8到+12.6的增益，在MBPP+上产生了+4.6到+14.3的增益，同时还提高了MMLU和MMLU-Stem的通用域性能。我们提出了第一个从嘈杂的网络规模数据中可靠地提取科学内容（包括数学）的管道，在数学，代码和一般推理方面产生了可衡量的收益，并在开放的数学预训练语料库中建立了一个新的艺术状态。为了支持开源工作，我们发布了我们的代码和数据集。
摘要：Pretraining large language models (LLMs) on high-quality, structured data such as mathematics and code substantially enhances reasoning capabilities. However, existing math-focused datasets built from Common Crawl suffer from degraded quality due to brittle extraction heuristics, lossy HTML-to-text conversion, and the failure to reliably preserve mathematical structure. In this work, we introduce Nemotron-CC-Math, a large-scale, high-quality mathematical corpus constructed from Common Crawl using a novel, domain-agnostic pipeline specifically designed for robust scientific text extraction. Unlike previous efforts, our pipeline recovers math across various formats (e.g., MathJax, KaTeX, MathML) by leveraging layout-aware rendering with lynx and a targeted LLM-based cleaning stage. This approach preserves the structural integrity of equations and code blocks while removing boilerplate, standardizing notation into LaTeX representation, and correcting inconsistencies. We collected a large, high-quality math corpus, namely Nemotron-CC-Math-3+ (133B tokens) and Nemotron-CC-Math-4+ (52B tokens). Notably, Nemotron-CC-Math-4+ not only surpasses all prior open math datasets-including MegaMath, FineMath, and OpenWebMath-but also contains 5.5 times more tokens than FineMath-4+, which was previously the highest-quality math pretraining dataset. When used to pretrain a Nemotron-T 8B model, our corpus yields +4.8 to +12.6 gains on MATH and +4.6 to +14.3 gains on MBPP+ over strong baselines, while also improving general-domain performance on MMLU and MMLU-Stem. We present the first pipeline to reliably extract scientific content--including math--from noisy web-scale data, yielding measurable gains in math, code, and general reasoning, and setting a new state of the art among open math pretraining corpora. To support open-source efforts, we release our code and datasets.

【16】LongRecall: A Structured Approach for Robust Recall Evaluation in Long-Form Text
标题：LongRecall：长格式文本中稳健回忆评估的结构化方法
链接：https://arxiv.org/abs/2508.15085

作者：avad Ardestani, Ehsan Kamalloo, Davood Rafiei
摘要：LongRecall。机器生成文本的完整性，确保它捕获所有相关信息，在医学和法律等领域以及基于列表的问答（QA）等任务中至关重要，遗漏可能会产生严重后果。然而，现有的召回指标往往依赖于词汇重叠，导致错误与未经证实的实体和释义的答案，而LLM-as-a-Judge方法与长期整体提示捕捉更广泛的语义，但仍然容易错位和幻觉没有结构化验证。我们介绍LongRecall，一个一般的三阶段召回评估框架，分解成自包含的事实的答案，连续缩小合理的候选匹配通过词汇和语义过滤，并验证其对齐通过结构化蕴涵检查。这种设计减少了假阳性和假阴性，同时适应不同的措辞和上下文变化，作为系统召回评估的基础构建块。我们使用人类注释和基于LLM的法官在三个具有挑战性的长形式QA基准上评估了LongRecall，证明了在强大的词汇和LLM作为法官基线的召回准确性方面的实质性改进。
摘要：LongRecall. The completeness of machine-generated text, ensuring that it captures all relevant information, is crucial in domains such as medicine and law and in tasks like list-based question answering (QA), where omissions can have serious consequences. However, existing recall metrics often depend on lexical overlap, leading to errors with unsubstantiated entities and paraphrased answers, while LLM-as-a-Judge methods with long holistic prompts capture broader semantics but remain prone to misalignment and hallucinations without structured verification. We introduce LongRecall, a general three-stage recall evaluation framework that decomposes answers into self-contained facts, successively narrows plausible candidate matches through lexical and semantic filtering, and verifies their alignment through structured entailment checks. This design reduces false positives and false negatives while accommodating diverse phrasings and contextual variations, serving as a foundational building block for systematic recall assessment. We evaluate LongRecall on three challenging long-form QA benchmarks using both human annotations and LLM-based judges, demonstrating substantial improvements in recall accuracy over strong lexical and LLM-as-a-Judge baselines.

【17】Generative Neural Operators of Log-Complexity Can Simultaneously Solve Infinitely Many Convex Programs
标题：具有log复杂性的生成神经运算符可以同时求解无限多个凸规划
链接：https://arxiv.org/abs/2508.14995

作者： Kratsios, Ariel Neufeld, Philipp Schmocker
摘要：神经操作符（Neural Operators，NO）是一类深度学习模型，旨在通过将无限多个相关问题投射到这些NO操作的无限维空间中来同时解决这些问题。理论和实践之间仍然存在着一个巨大的差距：来自普适逼近定理的最坏情况参数界限表明，NO可能需要不切实际的大量参数来解决大多数算子学习问题，这与大量实验证据直接相反。本文关闭了一个特定的类{NO}，生成{平衡运营商}（GEOs），使用（现实）有限维深平衡层，当解决家庭的凸优化问题在一个可分离的希尔伯特空间$X$的差距。这里，输入是X上的光滑凸损失函数，输出是由每个输入损失定义的优化问题的相关（近似）解。我们表明，当输入损失位于合适的无限维紧集，我们的GEO可以一致地近似相应的解决方案，以任意精度，与秩，深度和宽度增长的近似误差的倒数只有几何。然后，我们验证了我们的理论结果和可训练的GEOs的三个应用：（1）非线性偏微分方程，（2）随机最优控制问题，（3）对冲问题的数学金融流动性约束下。
摘要：Neural operators (NOs) are a class of deep learning models designed to simultaneously solve infinitely many related problems by casting them into an infinite-dimensional space, whereon these NOs operate. A significant gap remains between theory and practice: worst-case parameter bounds from universal approximation theorems suggest that NOs may require an unrealistically large number of parameters to solve most operator learning problems, which stands in direct opposition to a slew of experimental evidence. This paper closes that gap for a specific class of {NOs}, generative {equilibrium operators} (GEOs), using (realistic) finite-dimensional deep equilibrium layers, when solving families of convex optimization problems over a separable Hilbert space $X$. Here, the inputs are smooth, convex loss functions on $X$, and outputs are the associated (approximate) solutions to the optimization problem defined by each input loss. We show that when the input losses lie in suitable infinite-dimensional compact sets, our GEO can uniformly approximate the corresponding solutions to arbitrary precision, with rank, depth, and width growing only logarithmically in the reciprocal of the approximation error. We then validate both our theoretical results and the trainability of GEOs on three applications: (1) nonlinear PDEs, (2) stochastic optimal control problems, and (3) hedging problems in mathematical finance under liquidity constraints.

【18】A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
标题：一种基于视觉的共享控制遥操作控制方案用于控制四足机器人机械臂
链接：https://arxiv.org/abs/2508.14994

作者：nicius da Silva, Matheus Hipolito Carvalho, Juliano Negri, Thiago Segreto, Gustavo J. G. Lahr, Ricardo V. Godoy, Marcelo Becker
摘要：在危险和偏远的环境中，机器人系统执行要求提高安全性和效率的关键任务。其中，四足机器人与机械臂提供灵活性和多功能性的复杂操作。然而，遥操作四足机器人是具有挑战性的，由于缺乏集成的障碍物检测和直观的控制方法的机器人手臂，增加碰撞的风险，在有限的或动态变化的机器人。通过触摸杆或平板进行远程操作可能是非直观的，并且由于其复杂性而需要高水平的专业知识，最终导致操作员的高认知负荷。为了应对这一挑战，直接将人类手臂运动映射到机器人操纵器的遥操作方法提供了一种更简单，更容易获得的解决方案。这项工作提出了一种直观的远程控制，利用基于视觉的姿态估计管道，利用外部相机与基于机器学习的模型来检测操作员的手腕位置。该系统将这些手腕运动映射到机器人手臂命令中，以实时控制机器人的手臂。轨迹规划器通过检测和防止与障碍物和机械臂本身的碰撞来确保安全的远程操作。该系统在实际机器人上进行了验证，证明了实时控制的鲁棒性。这种远程操作方法为工业应用提供了一种具有成本效益的解决方案，其中安全性，精度和易用性至关重要，确保在高风险环境中可靠和直观的机器人控制。
摘要：In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.

【19】Quantum Long Short-term Memory with Differentiable Architecture Search
标题：具有差异架构搜索的量子长短期存储器
链接：https://arxiv.org/abs/2508.14955

作者：n-Chi Chen, Prayag Tiwari
备注：Accepted by the IEEE International Conference on Quantum Artificial Intelligence (QAI) 2025
摘要：量子计算和机器学习的最新进展已经引起了量子机器学习（QML），人们对从序列数据中学习的兴趣越来越大。像QLSTM这样的量子递归模型在时间序列预测、NLP和强化学习方面很有前途。然而，设计有效的变分量子电路（VQC）仍然具有挑战性，并且通常是特定于任务的。为了解决这个问题，我们提出了DiffQAS-QLSTM，这是一个端到端的可区分框架，可以在训练过程中优化VQC参数和架构选择。我们的结果表明，DiffQAS-QLSTM始终优于手工制作的基线，在不同的测试设置中实现了更低的损失。这种方法为可扩展和自适应的量子序列学习打开了大门。
摘要：Recent advances in quantum computing and machine learning have given rise to quantum machine learning (QML), with growing interest in learning from sequential data. Quantum recurrent models like QLSTM are promising for time-series prediction, NLP, and reinforcement learning. However, designing effective variational quantum circuits (VQCs) remains challenging and often task-specific. To address this, we propose DiffQAS-QLSTM, an end-to-end differentiable framework that optimizes both VQC parameters and architecture selection during training. Our results show that DiffQAS-QLSTM consistently outperforms handcrafted baselines, achieving lower loss across diverse test settings. This approach opens the door to scalable and adaptive quantum sequence learning.

【20】Collaborative Filtering using Variational Quantum Hopfield Associative Memory
标题：使用变分量子Hopfield联想记忆的协同过滤
链接：https://arxiv.org/abs/2508.14906

作者：anshahani, Ebrahim Ardeshir-Larijani, Rakesh Saini, Saif Al-Kuwari
摘要：与经典系统相比，量子计算能够以指数级速度进行计算，在机器学习和推荐系统等各个领域都有新的应用。量子机器学习（QML）将量子计算与机器学习技术相结合，为数据处理和模式识别提供了强大的新工具。本文提出了一种混合推荐系统，该系统将量子Hopfield联想记忆（QHAM）与深度神经网络相结合，以改进MovieLens 1 M数据集的提取和分类。使用K-Means算法将用户原型聚类为多个唯一的组，并通过编码器的激活函数将其转换为极坐标模式。这些极性模式，然后集成到变分QHAM为基础的混合推荐模型。该系统在理想环境中使用超过35个时期的MSE损失进行训练，达到0.9795的ROC值，0.8841的准确度和0.8786的F-1评分。在噪声环境中使用自定义Qiskit AER噪声模型进行相同数量的epoch训练，该模型将位翻转和读出错误与真实量子硬件中的概率相同，其ROC为0.9177，准确度为0.8013，F-1得分等于0.7866，表现出一致的性能。此外，我们能够通过仅有效地更新一个随机目标量子位来优化先前QHAM架构中存在的量子位开销。这项研究提出了一种新的框架，将变分量子计算与深度学习相结合，能够处理真实世界的数据集，与纯粹的经典数据集相比，性能相当。此外，该模型可以在嘈杂的配置中表现得同样出色，表现出稳定的性能，并为推荐系统的未来使用提出了一个有前途的方向。
摘要：Quantum computing, with its ability to do exponentially faster computation compared to classical systems, has found novel applications in various fields such as machine learning and recommendation systems. Quantum Machine Learning (QML), which integrates quantum computing with machine learning techniques, presents powerful new tools for data processing and pattern recognition. This paper proposes a hybrid recommendation system that combines Quantum Hopfield Associative Memory (QHAM) with deep neural networks to improve the extraction and classification on the MovieLens 1M dataset. User archetypes are clustered into multiple unique groups using the K-Means algorithm and converted into polar patterns through the encoder's activation function. These polar patterns are then integrated into the variational QHAM-based hybrid recommendation model. The system was trained using the MSE loss over 35 epochs in an ideal environment, achieving an ROC value of 0.9795, an accuracy of 0.8841, and an F-1 Score of 0.8786. Trained with the same number of epochs in a noisy environment using a custom Qiskit AER noise model incorporating bit-flip and readout errors with the same probabilities as in real quantum hardware, it achieves an ROC of 0.9177, an accuracy of 0.8013, and an F-1 Score equal to 0.7866, demonstrating consistent performance. Additionally, we were able to optimize the qubit overhead present in previous QHAM architectures by efficiently updating only one random targeted qubit. This research presents a novel framework that combines variational quantum computing with deep learning, capable of dealing with real-world datasets with comparable performance compared to purely classical counterparts. Additionally, the model can perform similarly well in noisy configurations, showcasing a steady performance and proposing a promising direction for future usage in recommendation systems.

【21】Exploring the Landscape of Non-Equilibrium Memories with Neural Cellular Automata
标题：用神经细胞自动机探索非平衡记忆的格局
链接：https://arxiv.org/abs/2508.15726

作者：e, Ehsan Pajouheshgar
备注：4+9 pages
摘要：我们调查景观的多体记忆：家庭的本地非平衡动力学，保留有关其初始条件的信息，即使在存在任意扰动的时间尺度。在二维空间中，唯一被研究得很好的记忆是托姆规则。通过结合严格的证明和机器学习方法，我们证明了2D记忆的前景实际上是非常广阔的。我们发现的记忆，纠正错误的方式在质量上不同于图姆的规则，有序的阶段稳定的波动，并保留信息只在存在噪声。总之，我们的研究结果表明，物理系统可以以许多不同的方式执行强大的信息存储，并证明多体记忆的物理学比以前实现的更丰富。这项工作中研究的动态交互式可视化可在https://memorynca.github.io/2D上获得。
摘要：We investigate the landscape of many-body memories: families of local non-equilibrium dynamics that retain information about their initial conditions for thermodynamically long time scales, even in the presence of arbitrary perturbations. In two dimensions, the only well-studied memory is Toom's rule. Using a combination of rigorous proofs and machine learning methods, we show that the landscape of 2D memories is in fact quite vast. We discover memories that correct errors in ways qualitatively distinct from Toom's rule, have ordered phases stabilized by fluctuations, and preserve information only in the presence of noise. Taken together, our results show that physical systems can perform robust information storage in many distinct ways, and demonstrate that the physics of many-body memories is richer than previously realized. Interactive visualizations of the dynamics studied in this work are available at https://memorynca.github.io/2D.

【22】High-dimensional Asymptotics of Generalization Performance in Continual Ridge Regression
标题：连续岭回归中推广性能的多维渐进性
链接：https://arxiv.org/abs/2508.15494

作者：o, Wenqing Su, Ying Yang
摘要：持续学习的动机是需要适应任务和数据分布中的现实动态，同时减轻灾难性遗忘。尽管持续学习技术取得了重大进展，但对其泛化性能的理论理解仍然落后。本文研究了高维线性模型中连续岭回归的理论性质，其中维数与每个任务的样本量成正比。使用随机矩阵理论，我们得到的渐近预测风险的精确表达式，从而使三个评价指标的泛化性能在持续学习的特征：平均风险，向后转移，向前转移。此外，我们提出的理论风险曲线，以说明在整个持续学习过程中，这些评估指标的趋势。我们的分析揭示了风险曲线中的一些有趣的现象，展示了模型规格如何影响泛化性能。仿真研究进行验证我们的理论研究结果。
摘要：Continual learning is motivated by the need to adapt to real-world dynamics in tasks and data distribution while mitigating catastrophic forgetting. Despite significant advances in continual learning techniques, the theoretical understanding of their generalization performance lags behind. This paper examines the theoretical properties of continual ridge regression in high-dimensional linear models, where the dimension is proportional to the sample size in each task. Using random matrix theory, we derive exact expressions of the asymptotic prediction risk, thereby enabling the characterization of three evaluation metrics of generalization performance in continual learning: average risk, backward transfer, and forward transfer. Furthermore, we present the theoretical risk curves to illustrate the trends in these evaluation metrics throughout the continual learning process. Our analysis reveals several intriguing phenomena in the risk curves, demonstrating how model specifications influence the generalization performance. Simulation studies are conducted to validate our theoretical findings.

【23】Robust and Efficient Quantum Reservoir Computing with Discrete Time Crystal
标题：利用离散时间晶体进行稳健有效的量子水库计算
链接：https://arxiv.org/abs/2508.15230

作者： Xin Li, Yibin Guo, Haifeng Yu, Yirong Jin, Zhang-Qi Yin
备注：12 pages, 7 figures
摘要：机器学习和量子计算的快速发展使量子机器学习处于研究的前沿。然而，现有的基于量子变分算法的量子机器学习算法在可训练性和噪声鲁棒性方面面临挑战。为了解决这些挑战，我们引入了一个无梯度，噪声鲁棒的量子水库计算算法，利用离散时间晶体动力学作为水库。我们首先校准的记忆，非线性和信息扰频能力的量子水库，揭示它们与动力学相位和非平衡相变的相关性。然后，我们将该算法应用于二进制分类任务，并建立了一个比较量子内核的优势。对于十类分类，噪声模拟和超导量子处理器上的实验结果都与理想模拟相匹配，证明了随着系统尺寸的增加而提高的准确性，并证实了拓扑噪声鲁棒性。我们的工作提出了基于数字量子模拟的图像分类量子水库计算的第一个实验演示。它建立了量子多体非平衡相变与量子机器学习性能之间的关联，为NISQ时代的量子库计算和更广泛的量子机器学习算法提供了新的设计原则。
摘要：The rapid development of machine learning and quantum computing has placed quantum machine learning at the forefront of research. However, existing quantum machine learning algorithms based on quantum variational algorithms face challenges in trainability and noise robustness. In order to address these challenges, we introduce a gradient-free, noise-robust quantum reservoir computing algorithm that harnesses discrete time crystal dynamics as a reservoir. We first calibrate the memory, nonlinear, and information scrambling capacities of the quantum reservoir, revealing their correlation with dynamical phases and non-equilibrium phase transitions. We then apply the algorithm to the binary classification task and establish a comparative quantum kernel advantage. For ten-class classification, both noisy simulations and experimental results on superconducting quantum processors match ideal simulations, demonstrating the enhanced accuracy with increasing system size and confirming the topological noise robustness. Our work presents the first experimental demonstration of quantum reservoir computing for image classification based on digital quantum simulation. It establishes the correlation between quantum many-body non-equilibrium phase transitions and quantum machine learning performance, providing new design principles for quantum reservoir computing and broader quantum machine learning algorithms in the NISQ era.

【24】Can synthetic data reproduce real-world findings in epidemiology? A replication study using tree-based generative AI
标题：合成数据能否重现现实世界的流行病学发现？使用基于树的生成人工智能的复制研究
链接：https://arxiv.org/abs/2508.14936

作者：, Kathrin Günther, Lori Ann Vallis, Klaus Berger, Nadine Binder, Hermann Brenner, Stefanie Castell, Beate Fischer, Volker Harth, Bernd Holleczek, Timm Intemann, Till Ittermann, André Karch, Thomas Keil, Lilian Krist, Berit Lange, Michael F. Leitzmann, Katharina Nimptsch, Nadia Obi, Iris Pigeot, Tobias Pischon, Tamara Schikowski, Börge Schmidt, Carsten Oliver Schmidt, Anja M. Sedlmair, Justine Tanoey, Harm Wienbergen, Andreas Wienke, Claudia Wigmann, Marvin N. Wright
摘要：用于合成数据生成的生成式人工智能在解决流行病学的实际挑战方面具有巨大的潜力。然而，许多当前的方法遭受有限的质量，高计算需求，和复杂的非专家。此外，合成数据的常见评价策略往往不能直接反映统计效用。在这种背景下，一个关键的未充分探索的问题是，合成数据是否可以可靠地再现流行病学研究的关键发现。我们建议使用对抗性随机森林（ARF）作为一种有效和方便的方法来合成表格流行病学数据。为了评估其性能，我们复制了六个流行病学出版物的统计分析，并将原始结果与合成结果进行了比较。这些出版物涵盖了血压、人体测量、心肌梗死、加速度、孤独和糖尿病，基于德国国家队列研究（NAKO Gesundheitsstudie）、不来梅STEMI登记U45研究和圭尔夫家庭健康研究的数据。此外，我们通过将数据集限制为与个体分析相关的变量（包括必要的推导），评估了维度和变量复杂性对合成质量的影响。在所有复制的原始研究中，多个合成数据复制的结果与原始发现一致。即使对于样本量与维度之比相对较低的数据集，复制结果也与各种描述性和推断性分析的原始结果密切匹配。降维和预导出变量进一步提高了结果的质量和稳定性。
摘要：Generative artificial intelligence for synthetic data generation holds substantial potential to address practical challenges in epidemiology. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research. We propose the use of adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications and compared original with synthetic results. These publications cover blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, based on data from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. Additionally, we assessed the impact of dimensionality and variable complexity on synthesis quality by limiting datasets to variables relevant for individual analyses, including necessary derivations. Across all replicated original studies, results from multiple synthetic data replications consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, the replication outcomes closely matched the original results across various descriptive and inferential analyses. Reducing dimensionality and pre-deriving variables further enhanced both quality and stability of the results.

【25】AGP: A Novel Arabidopsis thaliana Genomics-Phenomics Dataset and its HyperGraph Baseline Benchmarking
标题：ANP：一种新型的蚕豆基因组学-表型组学数据集及其HyperShape基线基准
链接：https://arxiv.org/abs/2508.14934

作者：rna-Aguilera, Fiona L. Goggin, Aranyak Goswami, Alexander Bucksch, Suxing Liu, Khoa Luu
摘要：了解生物体中哪些基因控制哪些性状仍然是生物学的核心挑战之一。尽管数据收集技术取得了重大进展，但我们将基因映射到性状的能力仍然有限。这种基因组到表型组（G2 P）的挑战跨越了几个问题领域，包括植物育种，并需要能够推理高维，异构和生物结构化数据的模型。然而，目前，许多数据集仅捕获遗传信息或仅捕获表型信息。此外，表型数据是非常异构的，许多数据集不能完全捕获。关键的缺点是，这些数据集是不集成的，也就是说，他们不相互联系，以描述相同的生物标本。这限制了机器学习模型了解这些样本的各个方面的能力，影响了学习到的相关性的广度，从而影响了它们做出更准确预测的能力。为了解决这一差距，我们提出了拟南芥基因组学-表型组学（AGP）数据集，这是一个策划的多模态数据集，将拟南芥（植物生物学中的模式生物）的基因表达谱与表型性状测量联系起来。AGP支持表型预测和可解释图学习等任务。此外，我们对传统的回归和解释性基线进行了基准测试，包括生物学上知情的超图基线，以验证基因-性状关联。据我们所知，这是第一个数据集，提供多模态基因信息和异质性状或表型数据相同的拟南芥标本。通过AGP，我们的目标是促进研究界使用基因信息，高阶基因配对和来自多个来源的性状数据准确理解基因型和表型之间的联系。
摘要：Understanding which genes control which traits in an organism remains one of the central challenges in biology. Despite significant advances in data collection technology, our ability to map genes to traits is still limited. This genome-to-phenome (G2P) challenge spans several problem domains, including plant breeding, and requires models capable of reasoning over high-dimensional, heterogeneous, and biologically structured data. Currently, however, many datasets solely capture genetic information or solely capture phenotype information. Additionally, phenotype data is very heterogeneous, which many datasets do not fully capture. The critical drawback is that these datasets are not integrated, that is, they do not link with each other to describe the same biological specimens. This limits machine learning models' ability to be informed on the various aspects of these specimens, impacting the breadth of correlations learned, and therefore their ability to make more accurate predictions. To address this gap, we present the Arabidopsis Genomics-Phenomics (AGP) Dataset, a curated multi-modal dataset linking gene expression profiles with phenotypic trait measurements in Arabidopsis thaliana, a model organism in plant biology. AGP supports tasks such as phenotype prediction and interpretable graph learning. In addition, we benchmark conventional regression and explanatory baselines, including a biologically-informed hypergraph baseline, to validate gene-trait associations. To the best of our knowledge, this is the first dataset that provides multi-modal gene information and heterogeneous trait or phenotype data for the same Arabidopsis thaliana specimens. With AGP, we aim to foster the research community towards accurately understanding the connection between genotypes and phenotypes using gene information, higher-order gene pairings, and trait data from several sources.

【26】A U-Statistic-based random forest approach for genetic interaction study
标题：遗传相互作用研究的基于U统计的随机森林方法
链接：https://arxiv.org/abs/2508.14924

作者：Ruo-Sin Peng, Changshuai Wei, Qing Lu
摘要：复杂性状的变异受到多种遗传变异、环境风险因素及其相互作用的影响。虽然在识别与复杂性状相关的单个遗传变异方面取得了实质性进展，但检测基因-基因和基因-环境相互作用仍然是一个巨大的挑战。当涉及大量的遗传变异和环境风险因素时，由于特征空间和计算强度呈指数增加，搜索相互作用仅限于成对相互作用。另外，递归分区方法，如随机森林，在高维遗传关联研究中已经流行起来。在这篇文章中，我们提出了一个基于U-统计的随机森林方法，称为森林U-测试，与数量性状的遗传关联研究。通过仿真研究，我们表明，森林U检验优于现有的方法。所提出的方法也被应用于研究大麻依赖CD，使用来自成瘾研究的三个独立数据集：遗传学和环境。经验p值小于0.001时，检测到显著的联合相关性。这一发现在两个独立的数据集中也得到了重复，p值分别为5.93e-19和4.70e-17。
摘要：Variations in complex traits are influenced by multiple genetic variants, environmental risk factors, and their interactions. Though substantial progress has been made in identifying single genetic variants associated with complex traits, detecting the gene-gene and gene-environment interactions remains a great challenge. When a large number of genetic variants and environmental risk factors are involved, searching for interactions is limited to pair-wise interactions due to the exponentially increased feature space and computational intensity. Alternatively, recursive partitioning approaches, such as random forests, have gained popularity in high-dimensional genetic association studies. In this article, we propose a U-Statistic-based random forest approach, referred to as Forest U-Test, for genetic association studies with quantitative traits. Through simulation studies, we showed that the Forest U-Test outperformed existing methods. The proposed method was also applied to study Cannabis Dependence CD, using three independent datasets from the Study of Addiction: Genetics and Environment. A significant joint association was detected with an empirical p-value less than 0.001. The finding was also replicated in two independent datasets with p-values of 5.93e-19 and 4.70e-17, respectively.

【27】Computational Resolution of Hadamard Product Factorization for $4 \times 4$ Matrices
标题：阿达玛产品分解的计算分辨率，售价4美元 imes 4$矩阵
链接：https://arxiv.org/abs/2508.14901

作者：n
摘要：我们计算解决了一个公开的问题，有关的可表达性$4 \times 4$满秩矩阵的Hadamard产品的两个秩2矩阵。通过在$\mathbb{F}_2$上的穷举搜索，我们在20,160个满秩二元矩阵（26.3\%）中识别出5,304个反例。我们通过符号枚举验证了这些反例在$\mathbb{Z}$上仍然有效，并为它们在$\mathbb{R}$上的有效性提供了强有力的数值证据。值得注意的是，我们的分析表明，矩阵密度（1的数量）是高度预测的可表达性，达到95.7%的分类精度。使用现代机器学习技术，我们发现，可表达的矩阵位于16维环境空间内的大约10维的品种，尽管天真的参数计数为24（12个参数，每个为两个4 × 4$秩-2矩阵）。这种新出现的低维结构表明了阿达玛因子分解的深层代数约束。
摘要：We computationally resolve an open problem concerning the expressibility of $4 \times 4$ full-rank matrices as Hadamard products of two rank-2 matrices. Through exhaustive search over $\mathbb{F}_2$, we identify 5,304 counterexamples among the 20,160 full-rank binary matrices (26.3\%). We verify that these counterexamples remain valid over $\mathbb{Z}$ through sign enumeration and provide strong numerical evidence for their validity over $\mathbb{R}$. Remarkably, our analysis reveals that matrix density (number of ones) is highly predictive of expressibility, achieving 95.7\% classification accuracy. Using modern machine learning techniques, we discover that expressible matrices lie on an approximately 10-dimensional variety within the 16-dimensional ambient space, despite the naive parameter count of 24 (12 parameters each for two $4 \times 4$ rank-2 matrices). This emergent low-dimensional structure suggests deep algebraic constraints governing Hadamard factorizability.

【28】SVM/SVR Kernels as Quantum Propagators
标题：作为量子相关器的支持者/SVR核
链接：https://arxiv.org/abs/2502.11153

作者：Kuo, Renata Wong
摘要：我们建立了支持向量机（SVM）核函数和量子传播子之间的数学等价关系，量子传播子由时间相关的格林函数表示，这在很大程度上尚未探索。我们证明，许多常见的SVM内核自然对应的格林函数通过算子反演理论。sigmoid核并不总是满足Mercer定理，因此相应的格林函数也可能无法最佳地执行。我们进一步介绍了核多项式方法（KPM）设计定制的内核，符合格林函数。我们的数值实验证实，采用半正定核，对应于格林函数显着提高SVM模型在物理系统中的预测精度。
摘要：We establish a mathematical equivalence between Support Vector Machine (SVM) kernel functions and quantum propagators represented by time-dependent Green's functions, which has remained largely unexplored. We demonstrate that many common SVM kernels correspond naturally to Green's functions via operator inversion theory. The sigmoid kernel does not always satisfy Mercer's theorem, and therefore the corresponding Green's function may also fail to perform optimally. We further introduce a Kernel Polynomial Method (KPM) for designing customized kernels that align with Green's functions. Our numerical experiments confirm that employing positive-semidefinite kernels that correspond to Green's functions significantly improves predictive accuracy of SVM models in physical systems.

机器翻译由腾讯交互翻译提供，仅供参考

点击“阅读原文”获取带摘要的学术速递