
Machine Learning Academic Digest [8.18]

arXiv Daily Academic Digest • 1 week ago

Click "Read the original" to visit arxivdaily.com, which covers CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, favorites, and more!


cs.LG: 105 papers today.


Large language models (8 papers)

【1】Controlling Multimodal LLMs via Reward-guided Decoding
Link: https://arxiv.org/abs/2508.11616

Authors: as, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal
Note: Published at ICCV 2025
Abstract: As Multimodal Large Language Models (MLLMs) gain widespread applicability, it is becoming increasingly desirable to adapt them for diverse user needs. In this paper, we study the adaptation of MLLMs through controlled decoding. To achieve this, we introduce the first method for reward-guided decoding of MLLMs and demonstrate its application in improving their visual grounding. Our method involves building reward models for visual grounding and using them to guide the MLLM's decoding process. Concretely, we build two separate reward models to independently control the degree of object precision and recall in the model's output. Our approach enables on-the-fly controllability of an MLLM's inference process in two ways: first, by giving control over the relative importance of each reward function during decoding, allowing a user to dynamically trade off object precision for recall in image captioning tasks; second, by giving control over the breadth of the search during decoding, allowing the user to control the trade-off between the amount of test-time compute and the degree of visual grounding. We evaluate our method on standard object hallucination benchmarks, showing that it provides significant controllability over MLLM inference, while consistently outperforming existing hallucination mitigation methods.
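To make the decoding scheme concrete, here is a minimal Python sketch of the general idea, not the authors' implementation: lm_propose stands in for the MLLM's candidate continuations, the two reward functions are toy stand-ins for the learned precision/recall reward models, alpha is the user-facing precision-recall knob, and beam_width controls the test-time compute.

import heapq

def lm_propose(prefix, k=4):
    # Hypothetical stand-in for the MLLM proposing k candidate continuations.
    return [prefix + [f"obj{i}"] for i in range(k)]

def reward_precision(seq):
    # Toy precision reward: penalize a token we pretend is hallucinated.
    return -seq.count("obj3")

def reward_recall(seq):
    # Toy recall reward: favor mentioning more distinct objects.
    return len(set(seq))

def reward_guided_decode(alpha=0.5, beam_width=3, max_steps=4):
    # alpha weighs precision against recall; beam_width sets test-time compute.
    beams = [[]]
    for _ in range(max_steps):
        candidates = [c for b in beams for c in lm_propose(b)]
        beams = heapq.nlargest(
            beam_width, candidates,
            key=lambda c: alpha * reward_precision(c) + (1 - alpha) * reward_recall(c))
    return beams[0]

print(reward_guided_decode(alpha=0.8))  # a high alpha steers away from "obj3"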


【2】ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Link: https://arxiv.org/abs/2508.11356

Authors: ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao
Abstract: Recent advancements in Large Language Models have yielded significant improvements in complex reasoning tasks such as mathematics and programming. However, these models remain heavily dependent on annotated data and exhibit limited adaptability in unsupervised scenarios. To address these limitations, test-time reinforcement learning (TTRL) has been proposed, which enables self-optimization by leveraging model-generated pseudo-labels. Despite its promise, TTRL faces several key challenges, including high inference costs due to parallel rollouts and early-stage estimation bias that fosters overconfidence, reducing output diversity and causing performance plateaus. To address these challenges, we introduce an entropy-based mechanism to enhance the exploration-exploitation balance in test-time reinforcement learning through two strategies: Entropy-fork Tree Majority Rollout (ETMR) and Entropy-based Advantage Reshaping (EAR). Compared with the baseline, our approach enables Llama3.1-8B to achieve a 68 percent relative improvement in Pass@1 on the AIME 2024 benchmark, while consuming only 60 percent of the rollout token budget. This highlights our method's ability to effectively optimize the trade-off between inference efficiency, diversity, and estimation robustness, thereby advancing unsupervised reinforcement learning for open-domain reasoning tasks.
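The entropy signal at the heart of the forking strategy is easy to illustrate. A toy sketch, assuming nothing about the paper's exact threshold or rollout bookkeeping: extra rollouts are forked only at decoding steps where the next-token distribution is high-entropy.

import numpy as np

def entropy(probs):
    probs = np.asarray(probs, dtype=float)
    return float(-(probs * np.log(probs + 1e-12)).sum())

def should_fork(next_token_probs, threshold=1.0):
    # Spend extra rollouts only where the policy is genuinely uncertain,
    # rather than spreading the rollout budget uniformly over all steps.
    return entropy(next_token_probs) > threshold

print(should_fork([0.97, 0.01, 0.01, 0.01]))  # False: confident step
print(should_fork([0.25, 0.25, 0.25, 0.25]))  # True: high-entropy step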


【3】Semantically Guided Adversarial Testing of Vision Models Using Language Models
Link: https://arxiv.org/abs/2508.11341

Authors: Filus, Jorge M. Cruz-Duarte
Note: 12 pages, 4 figures, 3 tables. Submitted for peer review
Abstract: In targeted adversarial attacks on vision models, the selection of the target label is a critical yet often overlooked determinant of attack success. This target label corresponds to the class that the attacker aims to force the model to predict. Now, existing strategies typically rely on randomness, model predictions, or static semantic resources, limiting interpretability, reproducibility, or flexibility. This paper then proposes a semantics-guided framework for adversarial target selection using the cross-modal knowledge transfer from pretrained language and vision-language models. We evaluate several state-of-the-art models (BERT, TinyLLAMA, and CLIP) as similarity sources to select the most and least semantically related labels with respect to the ground truth, forming best- and worst-case adversarial scenarios. Our experiments on three vision models and five attack methods reveal that these models consistently render practical adversarial targets and surpass static lexical databases, such as WordNet, particularly for distant class relationships. We also observe that static testing of target labels offers a preliminary, a priori assessment of the effectiveness of similarity sources. Our results corroborate the suitability of pretrained models for constructing interpretable, standardized, and scalable adversarial benchmarks across architectures and datasets.
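The target-selection step reduces to a similarity lookup in an embedding space. A minimal sketch with made-up vectors standing in for BERT/TinyLLAMA/CLIP embeddings:

import numpy as np

# Made-up label embeddings; in the paper these come from pretrained models
# such as BERT, TinyLLAMA, or CLIP text encoders.
emb = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "truck": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_targets(ground_truth):
    # The most and least semantically related labels give the two extreme
    # adversarial scenarios the abstract studies.
    sims = {lbl: cosine(emb[ground_truth], v)
            for lbl, v in emb.items() if lbl != ground_truth}
    return max(sims, key=sims.get), min(sims, key=sims.get)

print(pick_targets("cat"))  # ('dog', 'truck')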


【4】Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Link: https://arxiv.org/abs/2508.11291

Authors: Nan Xue, Yaping Sun, Zhiyong Chen
Note: Accepted by the IEEE/CIC ICCC workshop
Abstract: The integration of wireless communications and Large Language Models (LLMs) is poised to unlock ubiquitous intelligent services, yet deploying them in wireless edge-device collaborative environments presents a critical trade-off between inference quality and end-to-end latency. A fundamental mismatch exists between task complexity and resource allocation: offloading simple queries invites prohibitive latency, while on-device models lack the capacity for demanding computations. To address this challenge, we propose a dynamic, quality-latency aware routing framework that orchestrates inference between a lightweight model on the mobile device and a powerful model on the edge server. Our framework employs two distinct cost models: for single-turn queries, it fuses a BERT-predicted semantic score with communication and computation overheads; for multi-turn dialogues, it further quantifies context-aware costs arising from model switching and KV-cache management. Extensive experiments demonstrate that, while maintaining full inference quality, our framework cuts average response latency by 5-15% and reduces large model invocations by 10-20% against competitive baselines on the MMLU, GSM8K, and MT-Bench-101 benchmarks.
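A stripped-down version of such a routing rule can be written in a few lines. Everything below is an illustrative assumption, not the paper's calibrated cost model: query_difficulty plays the role of the BERT-predicted semantic score, and switch_cost_ms stands in for the context-aware switching/KV-cache costs of multi-turn dialogue.

def route(query_difficulty, edge_latency_ms, device_latency_ms,
          quality_weight=0.7, switch_cost_ms=0.0):
    # Toy quality-loss model: the small on-device model degrades on hard
    # queries, the edge model much less so.
    device_cost = (quality_weight * query_difficulty
                   + (1 - quality_weight) * device_latency_ms / 1000.0)
    edge_cost = (quality_weight * 0.1 * query_difficulty
                 + (1 - quality_weight) * (edge_latency_ms + switch_cost_ms) / 1000.0)
    return "edge" if edge_cost < device_cost else "device"

print(route(0.9, edge_latency_ms=400, device_latency_ms=80))  # edge
print(route(0.1, edge_latency_ms=400, device_latency_ms=80))  # device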


【5】CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Link: https://arxiv.org/abs/2508.11287

Authors: , Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui
Note: Submitted to the Journal of Communications and Information Networks
Abstract: While deploying large language models on edge devices promises low-latency and privacy-preserving AI services, it is hindered by limited device resources. Although pipeline parallelism facilitates distributed inference, existing approaches often ignore the cold-start latency caused by on-demand model loading. In this paper, we propose a latency-aware scheduling framework that overlaps model loading with computation and communication to minimize total inference latency. Based on device and model parameters, the framework dynamically adjusts layer partitioning and allocation to effectively hide loading time, thereby eliminating as many idle periods as possible. We formulate the problem as a Mixed-Integer Non-Linear Program and design an efficient dynamic programming algorithm to optimize model partitioning and device assignment. Experimental results show that the proposed method significantly reduces cold-start latency compared to baseline strategies.


【6】Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Link: https://arxiv.org/abs/2508.11258

Authors: Xian, Yuxuan Wan, Han Zhao
Abstract: Instruction fine-tuned large language models (LLMs) enable a simple zero-shot or few-shot prompting paradigm, also known as in-context learning, for building prediction models. This convenience, combined with continued advances in LLM capability, has the potential to drive their adoption across a broad range of domains, including high-stakes applications where group fairness -- preventing disparate impacts across demographic groups -- is essential. The majority of existing approaches to enforcing group fairness on LLM-based classifiers rely on traditional fair algorithms applied via model fine-tuning or head-tuning on final-layer embeddings, but they are no longer applicable to closed-weight LLMs under the in-context learning setting, which include some of the most capable commercial models today, such as GPT-4, Gemini, and Claude. In this paper, we propose a framework for deriving fair classifiers from closed-weight LLMs via prompting: the LLM is treated as a feature extractor, and features are elicited from its probabilistic predictions (e.g., token log probabilities) using prompts strategically designed for the specified fairness criterion to obtain sufficient statistics for fair classification; a fair algorithm is then applied to these features to train a lightweight fair classifier in a post-hoc manner. Experiments on five datasets, including three tabular ones, demonstrate strong accuracy-fairness tradeoffs for the classifiers derived by our framework from both open-weight and closed-weight LLMs; in particular, our framework is data-efficient and outperforms fair classifiers trained on LLM embeddings (i.e., head-tuning) or from scratch on raw tabular features.
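The post-hoc step can be pictured with a small numeric sketch. The scores below are synthetic stand-ins for features elicited from an LLM's token log-probabilities, and the simple per-group thresholding shown here is one classical fairness post-processing rule (demographic parity), not necessarily the specific fair algorithm used in the paper.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)                            # sensitive attribute
score = rng.normal(loc=0.2 * group, scale=1.0, size=n)   # biased LLM-derived scores

def parity_thresholds(score, group, target_rate=0.3):
    # Pick per-group cutoffs so both groups get the same positive rate.
    return {g: np.quantile(score[group == g], 1 - target_rate)
            for g in np.unique(group)}

thr = parity_thresholds(score, group)
pred = score >= np.vectorize(thr.get)(group)
for g in (0, 1):
    print(g, round(pred[group == g].mean(), 2))   # ~0.30 for both groups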


【7】Improving Text Style Transfer using Masked Diffusion Language Models with Inference-time Scaling
Link: https://arxiv.org/abs/2508.10995

Authors: ishor Padole, Suyash P Awate, Pushpak Bhattacharyya
Note: Accepted as a main conference submission at the European Conference on Artificial Intelligence (ECAI 2025)
Abstract: Masked diffusion language models (MDMs) have recently gained traction as a viable generative framework for natural language. This can be attributed to their scalability and ease of training compared to other diffusion model paradigms for discrete data, establishing them as the state-of-the-art non-autoregressive generators for discrete data. Diffusion models, in general, have shown excellent ability to improve the generation quality by leveraging inference-time scaling either by increasing the number of denoising steps or by using external verifiers on top of the outputs of each step to guide the generation. In this work, we propose a verifier-based inference-time scaling method that aids in finding a better candidate generation during the denoising process of the MDM. Our experiments demonstrate the application of MDMs for standard text-style transfer tasks and establish MDMs as a better alternative to autoregressive language models. Additionally, we show that a simple soft-value-based verifier setup for MDMs using off-the-shelf pre-trained embedding models leads to significant gains in generation quality even when used on top of typical classifier-free guidance setups in the existing literature.


【8】ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization
Link: https://arxiv.org/abs/2508.11551

Authors: ng Chen, Xu Ouyang, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz
Abstract: Determining the optimal data mixture for large language model training remains a challenging problem with an outsized impact on performance. In practice, language model developers continue to rely on heuristic exploration since no learning-based approach has emerged as a reliable solution. In this work, we propose to view the selection of training data mixtures as a black-box hyperparameter optimization problem, for which Bayesian Optimization is a well-established class of appropriate algorithms. Firstly, we cast data mixture learning as a sequential decision-making problem, in which we aim to find a suitable trade-off between the computational cost of training exploratory (proxy-) models and final mixture performance. Secondly, we systematically explore the properties of transferring mixtures learned at a small scale to larger-scale experiments, providing insights and highlighting opportunities for research at a modest scale. By proposing Multi-fidelity Bayesian Optimization as a suitable method in this common scenario, we introduce a natural framework to balance experiment cost with model fit, avoiding the risks of overfitting to smaller scales while minimizing the number of experiments at high cost. We present results for pre-training and instruction finetuning across models ranging from 1 million to 7 billion parameters, varying from simple architectures to state-of-the-art models and benchmarks spanning dozens of datasets. We demonstrate consistently strong results relative to a wide range of benchmarks, showing speed-ups of over 500% in determining the best data mixture on our largest experiments relative to recent baselines. In addition, we broaden access to research by sharing ADMIRE IFT Runs, a dataset of 460 full training & evaluation runs across various model sizes worth over 13,000 GPU hours, greatly reducing the cost of conducting research in this area.
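The core loop, single-fidelity for brevity (the paper's multi-fidelity extension is not reproduced here), can be sketched with a GP surrogate and expected improvement over mixtures sampled from the simplex. proxy_eval is a hypothetical stand-in for training a proxy model on a mixture and scoring it.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def proxy_eval(w):
    # Hypothetical stand-in for "train a proxy model on mixture w and
    # measure validation performance" (higher is better).
    return -np.sum((w - np.array([0.5, 0.3, 0.2])) ** 2) + rng.normal(0, 0.01)

def sample_mixtures(k):
    return rng.dirichlet(np.ones(3), size=k)   # candidate points on the simplex

X = sample_mixtures(5)
y = np.array([proxy_eval(w) for w in X])

for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = sample_mixtures(256)
    mu, sd = gp.predict(cand, return_std=True)
    z = (mu - y.max()) / np.maximum(sd, 1e-9)
    ei = (mu - y.max()) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    w_next = cand[np.argmax(ei)]
    X, y = np.vstack([X, w_next]), np.append(y, proxy_eval(w_next))

print("best mixture found:", X[np.argmax(y)].round(2))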


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】DFed-SST: Building Semantic- and Structure-aware Topologies for Decentralized Federated Graph Learning
Link: https://arxiv.org/abs/2508.11530

Authors: Guo, Zhongzheng Yuan, Xunkai Li, Yinlin Zhu, Meixia Qu, Wenyu Wang
Abstract: Decentralized Federated Learning (DFL) has emerged as a robust distributed paradigm that circumvents the single-point-of-failure and communication bottleneck risks of centralized architectures. However, a significant challenge arises as existing DFL optimization strategies, primarily designed for tasks such as computer vision, fail to address the unique topological information inherent in the local subgraph. Notably, while Federated Graph Learning (FGL) is tailored for graph data, it is predominantly implemented in a centralized server-client model, failing to leverage the benefits of decentralization. To bridge this gap, we propose DFed-SST, a decentralized federated graph learning framework with adaptive communication. The core of our method is a dual-topology adaptive communication mechanism that leverages the unique topological features of each client's local subgraph to dynamically construct and optimize the inter-client communication topology. This allows our framework to guide model aggregation efficiently in the face of heterogeneity. Extensive experiments on eight real-world datasets consistently demonstrate the superiority of DFed-SST, achieving a 3.26% improvement in average accuracy over baseline methods.


【2】Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies
Link: https://arxiv.org/abs/2508.11513

Authors: iu, Xiaoxiao Ma, Jian Yang, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Quan Z. Sheng, Jia Wu
Note: 14 pages, 12 figures
Abstract: Enhancing the interpretability of graph neural networks (GNNs) is crucial to ensure their safe and fair deployment. Recent work has introduced self-explainable GNNs that generate explanations as part of training, improving both faithfulness and efficiency. Some of these models, such as ProtGNN and PGIB, learn class-specific prototypes, offering a potential pathway toward class-level explanations. However, their evaluations focus solely on instance-level explanations, leaving open the question of whether these prototypes meaningfully generalize across instances of the same class. In this paper, we introduce GraphOracle, a novel self-explainable GNN framework designed to generate and evaluate class-level explanations for GNNs. Our model jointly learns a GNN classifier and a set of structured, sparse subgraphs that are discriminative for each class. We propose a novel integrated training that captures graph–subgraph–prediction dependencies efficiently and faithfully, validated through a masking-based evaluation strategy. This strategy enables us to retroactively assess whether prior methods like ProtGNN and PGIB deliver effective class-level explanations. Our results show that they do not. In contrast, GraphOracle achieves superior fidelity, explainability, and scalability across a range of graph classification tasks. We further demonstrate that GraphOracle avoids the computational bottlenecks of previous methods, such as Monte Carlo Tree Search, by using entropy-regularized subgraph selection and lightweight random walk extraction, enabling faster and more scalable training. These findings position GraphOracle as a practical and principled solution for faithful class-level self-explainability in GNNs.


【3】A Remedy for Over-Squashing in Graph Learning via Forman-Ricci Curvature based Graph-to-Hypergraph Structural Lifting
Link: https://arxiv.org/abs/2508.11390

Authors: anf, Dominik Filipiak, Max Schattauer, Liliya Imasheva
Abstract: Graph Neural Networks are highly effective at learning from relational data, leveraging node and edge features while maintaining the symmetries inherent to graph structures. However, many real-world systems, such as social or biological networks, exhibit complex interactions that are more naturally represented by higher-order topological domains. The emerging field of Geometric and Topological Deep Learning addresses this challenge by introducing methods that utilize and benefit from higher-order structures. Central to TDL is the concept of lifting, which transforms data representations from basic graph forms to more expressive topologies before the application of GNN models for learning. In this work, we propose a structural lifting strategy using Forman-Ricci curvature, which defines an edge-based network characteristic based on Riemannian geometry. Curvature reveals local and global properties of a graph, such as a network's backbones, i.e., coarse, structure-preserving graph geometries that form connections between major communities - most suitably represented as hyperedges to model information flows between clusters across large distances in the network. To this end, our approach provides a remedy to the problem of information distortion in message passing across long distances and graph bottlenecks - a phenomenon known in graph learning as over-squashing.
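The edge characteristic at the center of this pipeline is simple to compute. A sketch using the plainest combinatorial form of Forman-Ricci curvature for an unweighted graph (practical variants add triangle/higher-order cell terms); the cutoff of five edges is arbitrary, not the paper's:

import networkx as nx

def forman_curvature(G, u, v):
    # Simplest combinatorial Forman-Ricci curvature of an unweighted edge.
    return 4 - G.degree(u) - G.degree(v)

G = nx.karate_club_graph()
curv = {e: forman_curvature(G, *e) for e in G.edges()}

# Strongly negative curvature tends to flag "backbone" edges that bridge
# communities; such edges are the natural candidates to lift to hyperedges.
backbone = sorted(curv, key=curv.get)[:5]
print(backbone)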


【4】SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding
Link: https://arxiv.org/abs/2508.11347

Authors: Lingling Zhang, Hang Yan, Tianzhe Zhao, Zihan Ma, Muye Huang, Jun Liu
Note: 10 pages, 5 figures. Accepted at KDD 2025; code available at the URL given in the abstract.
Abstract: Traditional knowledge graph (KG) embedding methods aim to represent entities and relations in a low-dimensional space, primarily focusing on static graphs. However, real-world KGs are dynamically evolving with the constant addition of entities, relations and facts. To address such dynamic nature of KGs, several continual knowledge graph embedding (CKGE) methods have been developed to efficiently update KG embeddings to accommodate new facts while maintaining learned knowledge. As KGs grow at different rates and scales in real-world scenarios, existing CKGE methods often fail to consider the varying scales of updates and lack systematic evaluation throughout the entire update process. In this paper, we propose SAGE, a scale-aware gradual evolution framework for CKGE. Specifically, SAGE first determines the embedding dimensions based on the update scale and expands the embedding space accordingly. The Dynamic Distillation mechanism is further employed to balance the preservation of learned knowledge and the incorporation of new facts. We conduct extensive experiments on seven benchmarks, and the results show that SAGE consistently outperforms existing baselines, with a notable improvement of 1.38% in MRR, 1.25% in H@1 and 1.6% in H@10. Furthermore, experiments comparing SAGE with methods using fixed embedding dimensions show that SAGE achieves optimal performance on every snapshot, demonstrating the importance of adaptive embedding dimensions in CKGE. The code for SAGE is publicly available at: https://github.com/lyfxjtu/Dynamic-Embedding.


【5】Generalize across Homophily and Heterophily: Hybrid Spectral Graph Pre-Training and Prompt Tuning
Link: https://arxiv.org/abs/2508.11328

Authors: uo, Suhang Wang, Weiyao Zhang, Ruiqi Meng, Xuying Meng, Yujun Zhang
Note: Under review
Abstract: Graph "pre-training and prompt-tuning" aligns downstream tasks with pre-trained objectives to enable efficient knowledge transfer under limited supervision. However, existing methods rely on homophily-based low-frequency knowledge, failing to handle diverse spectral distributions in real-world graphs with varying homophily. Our theoretical analysis reveals a spectral specificity principle: optimal knowledge transfer requires alignment between pre-trained spectral filters and the intrinsic spectrum of downstream graphs. Under limited supervision, large spectral gaps between pre-training and downstream tasks impede effective adaptation. To bridge this gap, we propose the HS-GPPT model, a novel framework that ensures spectral alignment throughout both pre-training and prompt-tuning. We utilize a hybrid spectral filter backbone and local-global contrastive learning to acquire abundant spectral knowledge. Then we design prompt graphs to align the spectral distribution with pretexts, facilitating spectral knowledge transfer across homophily and heterophily. Extensive experiments validate the effectiveness under both transductive and inductive learning settings. Our code is available at https://anonymous.4open.science/r/HS-GPPT-62D2/.


【6】Graph Neural Diffusion via Generalized Opinion Dynamics
Link: https://arxiv.org/abs/2508.11249

Authors: apathige, Asiri Wijesinghe, Ahad N. Zehmakan
Abstract: There has been a growing interest in developing diffusion-based Graph Neural Networks (GNNs), building on the connections between message passing mechanisms in GNNs and physical diffusion processes. However, existing methods suffer from three critical limitations: (1) they rely on homogeneous diffusion with static dynamics, limiting adaptability to diverse graph structures; (2) their depth is constrained by computational overhead and diminishing interpretability; and (3) theoretical understanding of their convergence behavior remains limited. To address these challenges, we propose GODNF, a Generalized Opinion Dynamics Neural Framework, which unifies multiple opinion dynamics models into a principled, trainable diffusion mechanism. Our framework captures heterogeneous diffusion patterns and temporal dynamics via node-specific behavior modeling and dynamic neighborhood influence, while ensuring efficient and interpretable message propagation even at deep layers. We provide a rigorous theoretical analysis demonstrating GODNF's ability to model diverse convergence configurations. Extensive empirical evaluations of node classification and influence estimation tasks confirm GODNF's superiority over state-of-the-art GNNs.
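As a point of reference for what "opinion dynamics as diffusion" means, here is a classic Friedkin-Johnsen-style update, one member of the family such a framework generalizes, with a node-specific susceptibility in place of a single global parameter. All values are illustrative, not GODNF's learned dynamics.

import numpy as np

def opinion_diffusion(A, X0, susceptibility, steps=10):
    # Nodes mix their neighbors' current states with their own initial state;
    # low susceptibility = "stubborn" node that resists drifting.
    W = A / A.sum(axis=1, keepdims=True)    # row-normalized adjacency
    s = susceptibility[:, None]
    X = X0.copy()
    for _ in range(steps):
        X = s * (W @ X) + (1 - s) * X0
    return X

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
X0 = np.array([[1.0], [0.0], [0.5]])
print(opinion_diffusion(A, X0, susceptibility=np.array([0.9, 0.9, 0.1])))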


【7】Hybrid-Hierarchical Fashion Graph Attention Network for Compatibility-Oriented and Personalized Outfit Recommendation
Link: https://arxiv.org/abs/2508.11105

Authors: ed, Babak Teimourpour
Abstract: The rapid expansion of the fashion industry and the growing variety of products have made it challenging for users to find compatible items on e-commerce platforms. Effective fashion recommendation systems are crucial for filtering irrelevant items and suggesting suitable ones. However, simultaneously addressing outfit compatibility and personalized recommendations remains a significant challenge, as these aspects are often treated independently in existing studies, often overlooking the complex interactions between items and user preferences. This research introduces a new framework named FGAT, inspired by the HFGN model, which leverages graph neural networks and graph attention mechanisms to tackle this issue. The proposed framework constructs a three-tier hierarchical graph of users, outfits, and items, integrating visual and textual features to simultaneously model outfit compatibility and user preferences. A graph attention mechanism dynamically weights node importance during representation propagation, enabling the capture of key interactions and generating precise representations for both user preferences and outfit compatibility. Evaluated on the POG dataset, FGAT outperforms baseline models such as HFGN, achieving improved results in precision, HR, recall, NDCG, and accuracy. These results demonstrate that combining multimodal visual-textual features with a hierarchical graph structure and attention mechanisms significantly enhances the accuracy and efficiency of personalized fashion recommendation systems.


Transformers (4 papers)

【1】Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Link: https://arxiv.org/abs/2508.11499

Authors: ed
Abstract: Historical handwritten text recognition (HTR) is essential for unlocking the cultural and scholarly value of archival documents, yet digitization is often hindered by scarce transcriptions, linguistic variation, and highly diverse handwriting styles. In this study, we apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We investigate targeted image preprocessing and a broad suite of data augmentation techniques, introducing four novel augmentation methods designed specifically for historical handwriting characteristics. We also evaluate ensemble learning approaches to leverage the complementary strengths of augmentation-trained models. On the Gwalther dataset, our best single-model augmentation (Elastic) achieves a Character Error Rate (CER) of 1.86, while a top-5 voting ensemble achieves a CER of 1.60 - representing a 50% relative improvement over the best reported TrOCR_BASE result and a 42% improvement over the previous state of the art. These results highlight the impact of domain-specific augmentations and ensemble strategies in advancing HTR performance for historical manuscripts.
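The best-performing single augmentation above, elastic distortion, has a standard minimal implementation (in the spirit of Simard et al.); the parameter values here are illustrative, not the paper's settings:

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_transform(image, alpha=34.0, sigma=4.0, rng=None):
    # Smooth a random displacement field, then resample the image along it.
    # alpha scales displacement strength, sigma its smoothness.
    rng = np.random.default_rng() if rng is None else rng
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(image.shape[0]),
                         np.arange(image.shape[1]), indexing="ij")
    return map_coordinates(image, [ys + dy, xs + dx], order=1, mode="reflect")

line = np.zeros((64, 256)); line[30:34, 20:230] = 1.0   # fake text-line image
print(elastic_transform(line, rng=np.random.default_rng(0)).shape)  # (64, 256)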


【2】Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
Link: https://arxiv.org/abs/2508.11393

Authors: ner, Sina Zarrieß
Abstract: We propose an end-to-end differentiable training paradigm for stable training of a rationalized transformer classifier. Our approach results in a single model that simultaneously classifies a sample and scores input tokens based on their relevance to the classification. To this end, we build on the widely used three-player game for training rationalized models, which typically relies on training a rationale selector, a classifier and a complement classifier. We simplify this approach by making a single model fulfill all three roles, leading to a more efficient training paradigm that is not susceptible to the common training instabilities that plague existing approaches. Further, we extend this paradigm to produce class-wise rationales while incorporating recent advances in parameterizing and regularizing the resulting rationales, thus leading to substantially improved and state-of-the-art alignment with human annotations without any explicit supervision.


【3】Abundance-Aware Set Transformer for Microbiome Sample Embedding
Link: https://arxiv.org/abs/2508.11075

Authors: oo, Gail Rosen
Abstract: Microbiome sample representation to input into LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. While prior studies have explored embedding-based representations of each microbiome sample, most rely on simple averaging over sequence embeddings, often overlooking the biological importance of taxa abundance. In this work, we propose an abundance-aware variant of the Set Transformer to construct fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate embedding vectors proportional to their abundance and apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the utility of abundance-aware aggregation for robust and biologically informed microbiome representation. To the best of our knowledge, this is one of the first approaches to integrate sequence-level abundance into Transformer-based sample embeddings.
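The replication trick has a cheap equivalent formulation worth noting: for a single attention readout whose replicas share keys and values, duplicating an item c times is the same as adding log(c) to its attention logit. A small numpy sketch of this abundance-weighted pooling (the query and embeddings are random stand-ins, not the paper's trained Set Transformer):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def abundance_attention_pool(q, K, V, counts):
    # Adding log-abundance to the logits reproduces attending over each
    # sequence embedding replicated in proportion to its abundance.
    return softmax(K @ q + np.log(counts)) @ V

rng = np.random.default_rng(0)
K = V = rng.normal(size=(5, 8))                    # 5 taxa, 8-dim embeddings
q = rng.normal(size=8)                             # stand-in pooling query
counts = np.array([100.0, 5.0, 1.0, 1.0, 1.0])     # relative abundances
print(abundance_attention_pool(q, K, V, counts).shape)   # (8,)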


【4】HistoViT: Vision Transformer for Accurate and Scalable Histopathological Cancer Diagnosis
Link: https://arxiv.org/abs/2508.11181

Authors: med
Note: 13 pages, 3 figures
Abstract: Accurate and scalable cancer diagnosis remains a critical challenge in modern pathology, particularly for malignancies such as breast, prostate, bone, and cervical, which exhibit complex histological variability. In this study, we propose a transformer-based deep learning framework for multi-class tumor classification in histopathological images. Leveraging a fine-tuned Vision Transformer (ViT) architecture, our method addresses key limitations of conventional convolutional neural networks, offering improved performance, reduced preprocessing requirements, and enhanced scalability across tissue types. To adapt the model for histopathological cancer images, we implement a streamlined preprocessing pipeline that converts tiled whole-slide images into PyTorch tensors and standardizes them through data normalization. This ensures compatibility with the ViT architecture and enhances both convergence stability and overall classification performance. We evaluate our model on four benchmark datasets: ICIAR2018 (breast), SICAPv2 (prostate), UT-Osteosarcoma (bone), and SipakMed (cervical) -- demonstrating consistent outperformance over existing deep learning methods. Our approach achieves classification accuracies of 99.32%, 96.92%, 95.28%, and 96.94% for breast, prostate, bone, and cervical cancers respectively, with area under the ROC curve (AUC) scores exceeding 99% across all datasets. These results confirm the robustness, generalizability, and clinical potential of transformer-based architectures in digital pathology. Our work represents a significant advancement toward reliable, automated, and interpretable cancer diagnosis systems that can alleviate diagnostic burdens and improve healthcare outcomes.


GANs | Adversarial | Attacks | Generation (3 papers)

【1】DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality
Link: https://arxiv.org/abs/2508.11514

Authors: u, Yufeng Yue, Danya Yao, Huaxin Pei
摘要:在动态环境中决策代理的不断部署增加了对安全验证的需求。虽然关键测试场景生成已经成为一种有吸引力的验证方法,有效地平衡多样性和关键性仍然是现有方法的关键挑战,特别是由于在高维场景空间中的局部最优捕获。为了解决这个问题,我们提出了一个双空间引导的测试框架,协调场景参数空间和代理行为空间,旨在生成测试场景,考虑多样性和关键性。具体而言,在场景参数空间中,一个分层表示框架结合降维和多维子空间评估,有效地定位不同的和关键的子空间。这指导了两种生成模式之间的动态协调:局部扰动和全局探索,优化关键场景的数量和多样性。补充地,在代理行为空间中,代理-环境交互数据被利用来量化行为关键性/多样性,并自适应地支持生成模式切换,形成闭环反馈回路,该闭环反馈回路不断增强参数空间内的场景表征和探索。实验表明,我们的框架提高了关键场景生成的平均56.23%,并表现出更大的多样性下,新的参数行为共同驱动的指标进行测试时,五个决策代理,优于国家的最先进的基线。
摘要 :The growing deployment of decision-making agents in dynamic environments increases the demand for safety verification. While critical testing scenario generation has emerged as an appealing verification methodology, effectively balancing diversity and criticality remains a key challenge for existing methods, particularly due to local optima entrapment in high-dimensional scenario spaces. To address this limitation, we propose a dual-space guided testing framework that coordinates scenario parameter space and agent behavior space, aiming to generate testing scenarios considering diversity and criticality. Specifically, in the scenario parameter space, a hierarchical representation framework combines dimensionality reduction and multi-dimensional subspace evaluation to efficiently localize diverse and critical subspaces. This guides dynamic coordination between two generation modes: local perturbation and global exploration, optimizing critical scenario quantity and diversity. Complementarily, in the agent behavior space, agent-environment interaction data are leveraged to quantify behavioral criticality/diversity and adaptively support generation mode switching, forming a closed feedback loop that continuously enhances scenario characterization and exploration within the parameter space. Experiments show our framework improves critical scenario generation by an average of 56.23\% and demonstrates greater diversity under novel parameter-behavior co-driven metrics when tested on five decision-making agents, outperforming state-of-the-art baselines.


【2】Towards the Next-generation Bayesian Network Classifiers
Link: https://arxiv.org/abs/2508.11145

Authors: g, Daokun Zhang, Kexin Meng, Geoffrey I. Webb
Abstract: Bayesian network classifiers provide a feasible solution to tabular data classification, with a number of merits like high time and memory efficiency, and great explainability. However, due to the parameter explosion and data sparsity issues, Bayesian network classifiers are restricted to low-order feature dependency modeling, making them struggle in extrapolating the occurrence probabilities of complex real-world data. In this paper, we propose a novel paradigm to design high-order Bayesian network classifiers, by learning distributional representations for feature values, as what has been done in word embedding and graph representation learning. The learned distributional representations are encoded with the semantic relatedness between different features through their observed co-occurrence patterns in training data, which then serve as a hallmark to extrapolate the occurrence probabilities of new test samples. As a classifier design realization, we remake the K-dependence Bayesian classifier (KDB) by extending it into a neural version, i.e., NeuralKDB, where a novel neural network architecture is designed to learn distributional representations of feature values and parameterize the conditional probabilities between interdependent features. A stochastic gradient descent based algorithm is designed to train the NeuralKDB model efficiently. Extensive classification experiments on 60 UCI datasets demonstrate that the proposed NeuralKDB classifier excels in capturing high-order feature dependencies and significantly outperforms the conventional Bayesian network classifiers, as well as other competitive classifiers, including two neural network based classifiers without distributional representation learning.


【3】SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Link: https://arxiv.org/abs/2508.11053

Authors: an, Estelle Duguet, Karthik Ramakrishnan, Hugh Van Deventer, Jack Kruger, Ranjan Subbaraman
Note: 7 pages, 7 figures
Abstract: Post hoc explanation methods, such as LIME and SHAP, provide interpretable insights into black-box classifiers and are increasingly used to assess model biases and generalizability. However, these methods are vulnerable to adversarial manipulation, potentially concealing harmful biases. Building on the work of Slack et al. (2020), we investigate the susceptibility of LIME and SHAP to biased models and evaluate strategies for improving robustness. We first replicate the original COMPAS experiment to validate prior findings and establish a baseline. We then introduce a modular testing framework enabling systematic evaluation of augmented and ensemble explanation approaches across classifiers of varying performance. Using this framework, we assess multiple LIME/SHAP ensemble configurations on out-of-distribution models, comparing their resistance to bias concealment against the original methods. Our results identify configurations that substantially improve bias detection, highlighting their potential for enhancing transparency in the deployment of high-stakes machine learning systems.
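The ensemble idea can be illustrated with a toy aggregation over attribution vectors: if an adversarial model fools one explainer run into deflecting importance away from a biased feature, averaging normalized attributions across runs can still surface it. The aggregation rule and numbers below are purely illustrative, not the paper's configurations.

import numpy as np

def ensemble_attributions(attributions):
    # Average each explainer run's normalized absolute feature importances.
    A = np.abs(np.asarray(attributions, dtype=float))
    A /= A.sum(axis=1, keepdims=True)
    return A.mean(axis=0)

runs = [
    [0.70, 0.20, 0.10],   # run 1: sensitive feature 0 dominates
    [0.05, 0.85, 0.10],   # run 2: adversarial model deflects to feature 1
    [0.65, 0.25, 0.10],
]
print(ensemble_attributions(runs).round(3))   # feature 0 still ranks high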


Semi-/Weakly-/Un-/Fully-supervised | Uncertainty | Active Learning (9 papers)

【1】Physics-Informed Diffusion Models for Unsupervised Anomaly Detection in Multivariate Time Series
Link: https://arxiv.org/abs/2508.11528

Authors: , Markus Lange-Hegermann, Stefan Windmann
Note: 16 pages, 5 figures
Abstract: We propose an unsupervised anomaly detection approach based on a physics-informed diffusion model for multivariate time series data. Over the past years, diffusion models have demonstrated their effectiveness in forecasting, imputation, generation, and anomaly detection in the time series domain. In this paper, we present a new approach for learning the physics-dependent temporal distribution of multivariate time series data using a weighted physics-informed loss during diffusion model training. A weighted physics-informed loss is constructed using a static weight schedule. This approach enables a diffusion model to accurately approximate the underlying data distribution, which can influence the unsupervised anomaly detection performance. Our experiments on synthetic and real-world datasets show that physics-informed training improves the F1 score in anomaly detection; it generates better data diversity and log-likelihood. Our model outperforms baseline approaches; additionally, it surpasses prior physics-informed work and purely data-driven diffusion models on a synthetic dataset and one real-world dataset, while remaining competitive on others.


【2】RMSL: Weakly-Supervised Insider Threat Detection with Robust Multi-sphere Learning
Link: https://arxiv.org/abs/2508.11472

Authors: , Yaxin Zhao, Xinyu Jiao, Sihan Xu, Xiangrui Cai, Ying Zhang, Xiaojie Yuan
Note: 15 pages
Abstract: Insider threat detection aims to identify malicious user behavior by analyzing logs that record user interactions. Due to the lack of fine-grained behavior-level annotations, detecting specific behavior-level anomalies within user behavior sequences is challenging. Unsupervised methods face high false positive rates and miss rates due to the inherent ambiguity between normal and anomalous behaviors. In this work, we instead introduce weak labels of behavior sequences, which have lower annotation costs, i.e., the training labels (anomalous or normal) are at sequence-level instead of behavior-level, to enhance the detection capability for behavior-level anomalies by learning discriminative features. To achieve this, we propose a novel framework called Robust Multi-sphere Learning (RMSL). RMSL uses multiple hyper-spheres to represent the normal patterns of behaviors. Initially, a one-class classifier is constructed as a good anomaly-supervision-free starting point. Building on this, using multiple instance learning and adaptive behavior-level self-training debiasing based on model prediction confidence, the framework further refines hyper-spheres and feature representations using weak sequence-level labels. This approach enhances the model's ability to distinguish between normal and anomalous behaviors. Extensive experiments demonstrate that RMSL significantly improves the performance of behavior-level insider threat detection.


【3】Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
Link: https://arxiv.org/abs/2508.11460

Authors: efsrud, Nello Blaser, Trygve Buanes
Abstract: Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification has become exceedingly difficult and a plethora of techniques have been proposed. In this case study, we use the unifying framework of approximate Bayesian inference combined with empirical tests on carefully created synthetic classification datasets to investigate qualitative properties of six different probabilistic machine learning algorithms for class probability and uncertainty estimation: (i) a neural network ensemble, (ii) neural network ensemble with conflictual loss, (iii) evidential deep learning, (iv) a single neural network with Monte Carlo Dropout, (v) Gaussian process classification and (vi) a Dirichlet process mixture model. We check if the algorithms produce uncertainty estimates which reflect commonly desired properties, such as being well calibrated and exhibiting an increase in uncertainty for out-of-distribution data points. Our results indicate that all algorithms are well calibrated, but none of the deep learning based algorithms provide uncertainties that consistently reflect lack of experimental evidence for out-of-distribution data points. We hope our study may serve as a clarifying example for researchers developing new methods of uncertainty estimation for scientific data-driven modeling.
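One of the properties checked here, calibration, has a standard one-function diagnostic: the expected calibration error (ECE), shown below for a binary classifier. This is the generic binned estimator, not necessarily the exact statistic used in the paper.

import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # Weighted average over confidence bins of |empirical frequency - mean
    # predicted probability|; near zero for a well-calibrated classifier.
    probs, labels = np.asarray(probs), np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = rng.uniform(size=5000) < p        # perfectly calibrated by construction
print(round(expected_calibration_error(p, y), 3))   # close to 0.0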


【4】SelfAdapt: Unsupervised Domain Adaptation of Cell Segmentation Models
Link: https://arxiv.org/abs/2508.11411

Authors: Reith, Jannik Franzen, Dinesh R. Palli, J. Lorenz Rumberger, Dagmar Kainmueller
Note: 8 pages, 3 figures. To appear in the proceedings of the BioImage Computing (BIC) Workshop @ ICCVW 2025. This is the accepted author manuscript (camera-ready version).
Abstract: Deep neural networks have become the go-to method for biomedical instance segmentation. Generalist models like Cellpose demonstrate state-of-the-art performance across diverse cellular data, though their effectiveness often degrades on domains that differ from their training data. While supervised fine-tuning can address this limitation, it requires annotated data that may not be readily available. We propose SelfAdapt, a method that enables the adaptation of pre-trained cell segmentation models without the need for labels. Our approach builds upon student-teacher augmentation consistency training, introducing L2-SP regularization and label-free stopping criteria. We evaluate our method on the LiveCell and TissueNet datasets, demonstrating relative improvements in AP0.5 of up to 29.64% over baseline Cellpose. Additionally, we show that our unsupervised adaptation can further improve models that were previously fine-tuned with supervision. We release SelfAdapt as an easy-to-use extension of the Cellpose framework. The code for our method is publicly available at https://github.com/Kainmueller-Lab/self_adapt.


【5】On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Link: https://arxiv.org/abs/2508.11408

Authors: ang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
Abstract: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs). Existing approaches that integrate SFT and RL often face the risk of disrupting established model patterns and inducing overfitting to expert data. To address this, we present a novel investigation into the unified view of SFT and RL through an off-policy versus on-policy lens. We propose CHORD, a framework for the Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting, which reframes SFT not as a separate stage but as a dynamically weighted auxiliary objective within the on-policy RL process. Based on an analysis of off-policy expert data's influence at both holistic and granular levels, we incorporate a dual-control mechanism in CHORD. Specifically, the framework first employs a global coefficient to holistically guide the transition from off-policy imitation to on-policy exploration, and then applies a token-wise weighting function that enables granular learning from expert tokens, which preserves on-policy exploration and mitigates disruption from off-policy data. We conduct extensive experiments on widely used benchmarks, providing empirical evidence that CHORD achieves a stable and efficient learning process. By effectively harmonizing off-policy expert data with on-policy exploration, CHORD demonstrates significant improvements over baselines. We release the implementation at https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord to inspire further research.
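The dual-control idea reduces to a weighted loss. A sketch under stated assumptions: the linear decay for the global coefficient and the per-token weights below are illustrative placeholders, not CHORD's actual schedule or weighting function (see the released implementation for those).

import numpy as np

def chord_style_loss(rl_loss, sft_token_losses, token_weights, step, total_steps):
    # Global coefficient mu anneals the off-policy (SFT) term from imitation
    # toward pure on-policy exploration; token weights modulate how much each
    # expert token contributes (e.g., confidence-based).
    mu = max(0.0, 1.0 - step / total_steps)
    sft_term = np.sum(token_weights * sft_token_losses) / np.sum(token_weights)
    return rl_loss + mu * float(sft_term)

tok_losses = np.array([2.0, 0.5, 0.1])     # per-expert-token NLLs (toy)
tok_weights = np.array([1.0, 0.8, 0.2])    # hypothetical token-wise weights
print(chord_style_loss(1.3, tok_losses, tok_weights, step=100, total_steps=1000))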


【6】A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving
Link: https://arxiv.org/abs/2508.11218

Authors: , Shuqi Wu, Ning Wang
Abstract: Re-Identification (ReID) is a critical technology in intelligent perception systems, especially within autonomous driving, where onboard cameras must identify pedestrians across views and time in real-time to support safe navigation and trajectory prediction. However, the presence of uncertain or missing input modalities--such as RGB, infrared, sketches, or textual descriptions--poses significant challenges to conventional ReID approaches. While large-scale pre-trained models offer strong multimodal semantic modeling capabilities, their computational overhead limits practical deployment in resource-constrained environments. To address these challenges, we propose a lightweight Uncertainty Modal Modeling (UMM) framework, which integrates a multimodal token mapper, synthetic modality augmentation strategy, and cross-modal cue interactive learner. Together, these components enable unified feature representation, mitigate the impact of missing modalities, and extract complementary information across different data types. Additionally, UMM leverages CLIP's vision-language alignment ability to fuse multimodal inputs efficiently without extensive finetuning. Experimental results demonstrate that UMM achieves strong robustness, generalization, and computational efficiency under uncertain modality conditions, offering a scalable and practical solution for pedestrian re-identification in autonomous driving scenarios.


【7】A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
标题:一种用于带缺失标签的不完整多视图数据集成的半监督生成模型
链接:https://arxiv.org/abs/2508.11180

作者:en, Weiran Wang
摘要:多视图学习被广泛应用于现实生活中的数据集,如多组学生物数据,但它经常同时面临视图缺失和标签缺失的问题。以往的概率方法通过专家乘积(product-of-experts)方案聚合现有视图的表示来解决视图缺失问题,并基于信息瓶颈(IB)原则取得了优于确定性分类器的性能。然而,IB框架本质上是全监督的,无法利用未标记数据。在这项工作中,我们提出了一个在统一框架中同时利用标记样本和未标记样本的半监督生成模型。我们的方法最大化未标记样本的似然,以学习与标记数据上的IB共享的潜在空间。我们还在潜在空间中进行跨视图互信息最大化,以增强跨视图共享信息的提取。与现有方法相比,在存在视图缺失且标记样本有限的图像和多组学数据上,我们的模型取得了更好的预测和插补性能。
摘要:Multi-view learning is widely applied to real-life datasets, such as multiple omics biological data, but it often suffers from both missing views and missing labels. Prior probabilistic approaches addressed the missing view problem by using a product-of-experts scheme to aggregate representations from present views and achieved superior performance over deterministic classifiers, using the information bottleneck (IB) principle. However, the IB framework is inherently fully supervised and cannot leverage unlabeled data. In this work, we propose a semi-supervised generative model that utilizes both labeled and unlabeled samples in a unified framework. Our method maximizes the likelihood of unlabeled samples to learn a latent space shared with the IB on labeled data. We also perform cross-view mutual information maximization in the latent space to enhance the extraction of shared information across views. Compared to existing approaches, our model achieves better predictive and imputation performance on both image and multi-omics data with missing views and limited labeled samples.
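摘要中提到的专家乘积(product-of-experts)聚合对高斯后验有封闭形式:各视图专家的精度相加,均值按精度加权。下面是一个最小示意(PyTorch,含标准正态先验专家;接口命名为本文假设):

```python
import torch

def product_of_experts(mus, logvars):
    """对若干存在视图的高斯后验 N(mu_v, var_v) 做专家乘积融合。"""
    precisions = [torch.exp(-lv) for lv in logvars]     # 1 / var_v
    prec = torch.stack(precisions).sum(0) + 1.0         # +1 来自 N(0, I) 先验专家
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / prec
    return mu, (1.0 / prec).log()                       # 融合后的均值与log方差

# 用法示例:只传入当前样本存在的视图,缺失视图自然被跳过
mu, logvar = product_of_experts([torch.zeros(4, 8), torch.ones(4, 8)],
                                [torch.zeros(4, 8), torch.zeros(4, 8)])
```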


【8】Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks
标题:带状共形预测:用于回归和分类任务的基于带状多胞形(zonotope)的不确定性量化
链接:https://arxiv.org/abs/2508.11025

作者:zow, Michael Eichelbeck, Mykel J. Kochenderfer, Matthias Althoff
备注:Preprint. Under review
摘要:共形预测是一种流行的不确定性量化方法,它为基础预测器增加具有统计有效覆盖保证的预测集。然而,现有方法通常计算昂贵且数据密集,因为它们需要在校准之前构建不确定性模型。此外,现有方法通常用区间表示预测集,这限制了其捕获多维输出中依赖关系的能力。我们通过引入带状共形预测(zono-conformal prediction)来解决这些局限性,这是一种受区间预测模型和到达集一致性辨识启发的新方法,可构建具有覆盖保证的带状多胞形预测区域。通过将带状多胞形不确定性集直接放入基础预测器的模型中,带状共形预测器可以通过单个数据高效的线性规划来辨识。虽然带状共形预测可应用于任意非线性基础预测器,但在这项工作中我们专注于前馈神经网络。除回归任务外,我们还在分类设置中构建最优的带状共形预测器,其中不确定预测器的输出是一个可能类别的集合。我们给出了概率覆盖保证,并提出了在辨识数据中检测离群值的方法。在大量数值实验中,我们表明带状共形预测器的保守性低于区间预测模型和标准共形预测方法,同时在测试数据上实现相近的覆盖率。
摘要:Conformal prediction is a popular uncertainty quantification method that augments a base predictor with prediction sets with statistically valid coverage guarantees. However, current methods are often computationally expensive and data-intensive, as they require constructing an uncertainty model before calibration. Moreover, existing approaches typically represent the prediction sets with intervals, which limits their ability to capture dependencies in multi-dimensional outputs. We address these limitations by introducing zono-conformal prediction, a novel approach inspired by interval predictor models and reachset-conformant identification that constructs prediction zonotopes with assured coverage. By placing zonotopic uncertainty sets directly into the model of the base predictor, zono-conformal predictors can be identified via a single, data-efficient linear program. While we can apply zono-conformal prediction to arbitrary nonlinear base predictors, we focus on feed-forward neural networks in this work. Aside from regression tasks, we also construct optimal zono-conformal predictors in classification settings where the output of an uncertain predictor is a set of possible classes. We provide probabilistic coverage guarantees and present methods for detecting outliers in the identification data. In extensive numerical experiments, we show that zono-conformal predictors are less conservative than interval predictor models and standard conformal prediction methods, while achieving a similar coverage over the test data.
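论文将带状共形预测器的辨识归结为单个数据高效的线性规划。作为直观说明,下面在"轴对齐生成元"的简化设定下,用一个线性规划为基础预测器的残差拟合最小的带状多胞形缩放(NumPy/SciPy)。这只是示意草图,并非论文的完整方法(后者学习一般的生成元矩阵并覆盖分类情形):

```python
import numpy as np
from scipy.optimize import linprog

def fit_axis_aligned_zonotope(residuals):
    """最小化 sum(alpha),使每个辨识样本的残差都落在
    diag(alpha) * [-1, 1]^D 内(轴对齐简化下的带状多胞形)。"""
    N, D = residuals.shape
    c = np.ones(D)                       # 目标:预测区域尽可能小
    A_ub = -np.tile(np.eye(D), (N, 1))   # 约束:alpha_d >= |r_{i,d}|
    b_ub = -np.abs(residuals).reshape(-1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * D)
    return res.x                         # 每个输出维的半径(生成元缩放)

# 预测集为 f(x) + diag(alpha) * [-1, 1]^D;
# 一般生成元矩阵G的情形同样可写成(更大的)线性规划。
alpha = fit_axis_aligned_zonotope(np.random.randn(200, 3))
```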


【9】Semi-Supervised Learning with Online Knowledge Distillation for Skin Lesion Classification
标题:结合在线知识蒸馏的半监督学习用于皮肤病变分类
链接:https://arxiv.org/abs/2508.11511

作者: Manivannan
摘要:深度学习已经成为皮肤病变分析的一种有前途的方法。然而,现有的方法大多依赖于完全监督学习,需要大量的标记数据,这是具有挑战性和成本高昂的获得。为了减轻这种注释负担,本研究引入了一种新的半监督深度学习方法,该方法将集成学习与在线知识蒸馏相结合,以增强皮肤病变分类。我们的方法包括训练卷积神经网络模型的集合,使用在线知识蒸馏将见解从集合转移到其成员。该过程旨在增强集合中每个模型的性能,从而提升集合本身的整体性能。训练后,集合中的任何单个模型都可以在测试时部署,因为每个成员都经过训练,可以提供与集合相当的性能。这在资源有限的环境中尤其有益。实验结果表明,知识提取的个体模型比独立训练的模型性能更好。我们的方法在2018年和2019年国际皮肤成像合作组织公共基准数据集上表现出卓越的性能,超过了当前最先进的结果。通过利用集成学习和在线知识蒸馏,我们的方法减少了对大量标记数据的需求,同时为现实世界中的皮肤病变分类提供了一种更资源有效的解决方案。
摘要:Deep Learning has emerged as a promising approach for skin lesion analysis. However, existing methods mostly rely on fully supervised learning, requiring extensive labeled data, which is challenging and costly to obtain. To alleviate this annotation burden, this study introduces a novel semi-supervised deep learning approach that integrates ensemble learning with online knowledge distillation for enhanced skin lesion classification. Our methodology involves training an ensemble of convolutional neural network models, using online knowledge distillation to transfer insights from the ensemble to its members. This process aims to enhance the performance of each model within the ensemble, thereby elevating the overall performance of the ensemble itself. Post-training, any individual model within the ensemble can be deployed at test time, as each member is trained to deliver comparable performance to the ensemble. This is particularly beneficial in resource-constrained environments. Experimental results demonstrate that the knowledge-distilled individual model performs better than independently trained models. Our approach demonstrates superior performance on both the \emph{International Skin Imaging Collaboration} 2018 and 2019 public benchmark datasets, surpassing current state-of-the-art results. By leveraging ensemble learning and online knowledge distillation, our method reduces the need for extensive labeled data while providing a more resource-efficient solution for skin lesion classification in real-world scenarios.
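下面是"集成均值作为在线教师"的蒸馏损失的一个最小示意(PyTorch)。温度T、权重beta以及用成员logits均值构造教师的方式均为常见做法的示例假设,未必与论文设置一致:

```python
import torch
import torch.nn.functional as F

def online_kd_loss(member_logits, labels, T=3.0, beta=0.5):
    """member_logits: 长度为M的列表,每个元素形状 (B, C)。"""
    teacher = torch.stack(member_logits).mean(0).detach()  # 集成logits作为教师
    soft_t = F.softmax(teacher / T, dim=-1)
    total = 0.0
    for lg in member_logits:
        ce = F.cross_entropy(lg, labels)                   # 监督项
        kd = F.kl_div(F.log_softmax(lg / T, dim=-1), soft_t,
                      reduction="batchmean") * (T * T)     # 向集成看齐
        total = total + ce + beta * kd
    return total / len(member_logits)
```

训练结束后,集成中任一单个成员都可单独部署;在线蒸馏的目的正是让每个成员逼近集成的性能。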


迁移|Zero/Few/One-Shot|自适应(3篇)

【1】Nested Operator Inference for Adaptive Data-Driven Learning of Reduced-order Models
标题:用于降阶模型自适应数据驱动学习的嵌套算子推理
链接:https://arxiv.org/abs/2508.11542

作者:etz, Karen Willcox
摘要:本文提出了一种数据驱动的嵌套算子推理(Operator Inference, OpInf)方法,用于从高维动力系统的快照数据中学习物理信息降阶模型(ROM)。该方法利用降维空间内固有的层次结构,为OpInf学习问题迭代地构建优先考虑主导模态相互作用的初始猜测。对于任何目标降维维度,所计算的初始猜测对应的ROM,其快照重建误差可证明地小于或等于标准OpInf。此外,我们的嵌套OpInf算法可以从先前学习的模型中热启动,从而支持涉及动态基和模型形式更新的多样化应用场景。我们在立方热传导问题上展示了算法的性能:在相当的离线时间内,嵌套OpInf的误差比标准OpInf小四倍。此外,我们将嵌套OpInf应用于格陵兰冰盖的大规模参数化模型,尽管存在模型形式近似误差,它学习到的ROM平均误差为3%,计算加速因子超过19,000。
摘要:This paper presents a data-driven, nested Operator Inference (OpInf) approach for learning physics-informed reduced-order models (ROMs) from snapshot data of high-dimensional dynamical systems. The approach exploits the inherent hierarchy within the reduced space to iteratively construct initial guesses for the OpInf learning problem that prioritize the interactions of the dominant modes. The initial guess computed for any target reduced dimension corresponds to a ROM with provably smaller or equal snapshot reconstruction error than with standard OpInf. Moreover, our nested OpInf algorithm can be warm-started from previously learned models, enabling versatile application scenarios involving dynamic basis and model form updates. We demonstrate the performance of our algorithm on a cubic heat conduction problem, with nested OpInf achieving a four times smaller error than standard OpInf at a comparable offline time. Further, we apply nested OpInf to a large-scale, parameterized model of the Greenland ice sheet where, despite model form approximation errors, it learns a ROM with, on average, 3% error and computational speed-up factor above 19,000.
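作为背景,OpInf把降阶算子的学习写成一个最小二乘问题;"嵌套"初值可以理解为把已解出的低维算子零填充后嵌入更高维的问题。下式仅为示意(以二次模型为例,记号为本文假设,细节以论文为准):

$$ \min_{\hat A,\hat H}\ \sum_{k}\bigl\|\hat A\hat x_k+\hat H\,(\hat x_k\otimes\hat x_k)-\dot{\hat x}_k\bigr\|_2^2, \qquad \hat A^{(r)}_{0}=\begin{bmatrix}\hat A^{(r-1)}_{\star}&0\\ 0&0\end{bmatrix}, $$

即维度为 r 的学习问题以 r-1 维的解 \(\hat A^{(r-1)}_{\star}\)(对 \(\hat H\) 同理)零填充后作热启动,从而优先拟合主导模态之间的相互作用。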


【2】CTRL Your Shift: Clustered Transfer Residual Learning for Many Small Datasets
标题:CTRL Your Shift:面向众多小数据集的聚类迁移残差学习
链接:https://arxiv.org/abs/2508.11144

作者:n, Dominik Rothenhäusler, Kirk Bansak, Elisabeth Paulson
摘要:机器学习(ML)任务通常使用来自多个不同来源(例如不同的地点、治疗组或群体)的大规模数据。在这种情况下,从业者通常希望预测不仅表现出良好的整体准确性,而且在每个来源内保持可靠,并保留不同来源之间重要的差异。例如,一些庇护和难民安置项目现在使用基于ML的就业预测来指导新抵达家庭在东道国的安置地点,这需要为众多且往往很小的来源地生成信息丰富且有区分度的预测。然而,这类数据的几个常见特点使这一任务颇具挑战:存在众多不同的数据源、数据源之间的分布偏移,以及各数据源样本量的巨大差异。本文介绍了聚类迁移残差学习(Clustered Transfer Residual Learning, CTRL),这是一种元学习方法,结合了跨域残差学习和自适应池化/聚类的优势,以同时提高整体准确性并保留来源层面的异质性。我们提供的理论结果阐明了我们的目标如何在数据数量与数据质量之间进行权衡。我们在5个大规模数据集上将CTRL与其他最先进的基准进行了评估,其中包括来自瑞士国家庇护计划的数据集,该计划目前正在试点对寻求庇护者进行算法化地理分配。在多个关键指标上以及使用一系列不同的基学习器时,CTRL始终优于基准方法。
摘要:Machine learning (ML) tasks often utilize large-scale data that is drawn from several distinct sources, such as different locations, treatment arms, or groups. In such settings, practitioners often desire predictions that not only exhibit good overall accuracy, but also remain reliable within each source and preserve the differences that matter across sources. For instance, several asylum and refugee resettlement programs now use ML-based employment predictions to guide where newly arriving families are placed within a host country, which requires generating informative and differentiated predictions for many and often small source locations. However, this task is made challenging by several common characteristics of the data in these settings: the presence of numerous distinct data sources, distributional shifts between them, and substantial variation in sample sizes across sources. This paper introduces Clustered Transfer Residual Learning (CTRL), a meta-learning method that combines the strengths of cross-domain residual learning and adaptive pooling/clustering in order to simultaneously improve overall accuracy and preserve source-level heterogeneity. We provide theoretical results that clarify how our objective navigates the trade-off between data quantity and data quality. We evaluate CTRL alongside other state-of-the-art benchmarks on 5 large-scale datasets. This includes a dataset from the national asylum program in Switzerland, where the algorithmic geographic assignment of asylum seekers is currently being piloted. CTRL consistently outperforms the benchmarks across several key metrics and when using a range of different base learners.
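"全局模型 + 按来源聚类的残差校正"这一思路可以用下面的最小草图示意(scikit-learn)。聚类特征、基学习器与超参数均为示例假设,并非论文的完整CTRL算法:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def fit_ctrl_style(X, y, groups, n_clusters=3):
    """先拟合跨来源共享的全局模型,再对按残差行为聚类后的来源簇拟合残差模型。"""
    global_model = Ridge().fit(X, y)
    resid = y - global_model.predict(X)

    uniq = np.unique(groups)
    stats = np.array([[resid[groups == g].mean(),
                       resid[groups == g].std()] for g in uniq])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(stats)
    cluster_of = dict(zip(uniq, labels))

    residual_models = {}
    for c in range(n_clusters):
        mask = np.isin(groups, [g for g in uniq if cluster_of[g] == c])
        residual_models[c] = Ridge().fit(X[mask], resid[mask])
    return global_model, residual_models, cluster_of

def predict_ctrl_style(x, g, global_model, residual_models, cluster_of):
    # 汇集(pooling)体现在全局模型;来源异质性体现在簇级残差修正
    return global_model.predict(x) + residual_models[cluster_of[g]].predict(x)
```

把小来源并入行为相似的簇,正对应摘要所述"数据数量与数据质量之间的权衡"。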


【3】Human-in-the-Loop Systems for Adaptive Learning Using Generative AI
标题:使用生成性人工智能的自适应学习的人在环系统
链接:https://arxiv.org/abs/2508.11062

作者: Tarun, Haoze Du, Dinesh Kannan, Edward F. Gehringer
备注:Accepted for presentation at the Frontiers in Education Conference, Nashville, Tennessee, USA, 2-5 November 2025
摘要:Human-in-the-Loop(HITL)方法利用生成式AI,通过将学生反馈直接集成到AI生成的解决方案中来增强个性化学习。学生使用预定义的反馈标签批评和修改AI响应,促进更深入的参与和理解。这使学生能够积极塑造他们的学习,人工智能作为一个适应性的合作伙伴。该系统使用标记技术和提示工程来个性化内容,通知检索增强生成(RAG)系统检索相关的教育材料并实时调整解释。这建立在现有的自适应学习研究的基础上,展示了学生驱动的反馈回路如何修改人工智能生成的响应,以提高学生的保留率和参与度,特别是在STEM教育中。一项针对STEM学生的研究的初步结果表明,与传统的人工智能工具相比,学习效果和信心有所改善。这项工作突出了人工智能通过迭代改进创建动态,反馈驱动和个性化学习环境的潜力。
摘要:A Human-in-the-Loop (HITL) approach leverages generative AI to enhance personalized learning by directly integrating student feedback into AI-generated solutions. Students critique and modify AI responses using predefined feedback tags, fostering deeper engagement and understanding. This empowers students to actively shape their learning, with AI serving as an adaptive partner. The system uses a tagging technique and prompt engineering to personalize content, informing a Retrieval-Augmented Generation (RAG) system to retrieve relevant educational material and adjust explanations in real time. This builds on existing research in adaptive learning, demonstrating how student-driven feedback loops can modify AI-generated responses for improved student retention and engagement, particularly in STEM education. Preliminary findings from a study with STEM students indicate improved learning outcomes and confidence compared to traditional AI tools. This work highlights AI's potential to create dynamic, feedback-driven, and personalized learning environments through iterative refinement.


强化学习(1篇)

【1】Fusing Rewards and Preferences in Reinforcement Learning
标题:强化学习中的奖励和偏好融合
链接:https://arxiv.org/abs/2508.11363

作者:orasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser
摘要:我们提出了双反馈行动者(Dual-Feedback Actor, DFA),这是一种将个体奖励和成对偏好(如果可用)融合到单一更新规则中的强化学习算法。DFA直接使用策略的log概率来建模偏好概率,避免了单独的奖励建模步骤。偏好可以由人工标注者提供(在状态级或轨迹级),也可以由存储在离策略重放缓冲区中的Q值在线合成。在Bradley-Terry模型下,我们证明了最小化DFA的偏好损失可以恢复熵正则化的软行动者-评论家(SAC)策略。我们的仿真结果表明,在六个控制环境中,使用生成偏好训练的DFA达到或超过SAC,并展现出更稳定的训练过程。仅使用Bradley-Terry模型下的半合成偏好数据集,我们的算法在随机GridWorld中的性能优于基于奖励建模的人类反馈强化学习(RLHF)基线,并接近拥有真实奖励的oracle的性能。
摘要:We present Dual-Feedback Actor (DFA), a reinforcement learning algorithm that fuses both individual rewards and pairwise preferences (if available) into a single update rule. DFA uses the policy's log-probabilities directly to model the preference probability, avoiding a separate reward-modeling step. Preferences can be provided by human-annotators (at state-level or trajectory-level) or be synthesized online from Q-values stored in an off-policy replay buffer. Under a Bradley-Terry model, we prove that minimizing DFA's preference loss recovers the entropy-regularized Soft Actor-Critic (SAC) policy. Our simulation results show that DFA trained on generated preferences matches or exceeds SAC on six control environments and demonstrates a more stable training process. With only a semi-synthetic preference dataset under Bradley-Terry model, our algorithm outperforms reward-modeling reinforcement learning from human feedback (RLHF) baselines in a stochastic GridWorld and approaches the performance of an oracle with true rewards.
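摘要的核心是用策略log概率直接参数化Bradley-Terry偏好概率 P(b ≻ a) = σ(log π(b) − log π(a))。下面是该偏好损失的一个最小示意(PyTorch;接口命名为本文假设):

```python
import torch
import torch.nn.functional as F

def dfa_preference_loss(logp_a, logp_b, prefer_b):
    """logp_a / logp_b: 两个候选(状态级或轨迹级)在当前策略下的log概率, (B,)
    prefer_b: (B,),1表示标注者偏好b,0表示偏好a。"""
    logits = logp_b - logp_a              # Bradley-Terry:效用差即logit
    return F.binary_cross_entropy_with_logits(logits, prefer_b.float())

# 偏好也可在线合成:例如用重放缓冲区中的Q值,取 prefer_b = (Q_b > Q_a)
```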


元学习(2篇)

【1】Meta-learning Structure-Preserving Dynamics
标题:元学习结构保持动力学
链接:https://arxiv.org/abs/2508.11205

作者:g, Uvini Balasuriya Mudiyanselage, Woojin Cho, Minju Jo, Anthony Gruber, Kookjin Lee
摘要:结构保持的动力学建模方法因其强制满足守恒定律和耗散行为的强归纳偏置,在物理系统建模方面展现出巨大潜力。然而,所得模型通常针对固定的系统配置训练,既需要显式的系统参数知识,又要为每组新参数进行昂贵的再训练,这是多查询(many-query)或参数变化场景中的主要限制。元学习提供了一个潜在的解决方案,但现有方法(如基于优化的元学习)往往存在训练不稳定或泛化能力有限的问题。受计算机视觉思想的启发,我们引入了一个基于调制的元学习框架,它直接以潜在未知系统参数的紧凑潜在表示为条件来调制结构保持模型,从而在自适应过程中既不需要灰箱系统知识,也不需要显式优化。通过将新的调制策略应用于参数化的能量守恒与耗散系统,我们实现了在动力系统参数族上可扩展且可泛化的学习。在标准基准问题上的实验表明,我们的方法在少样本学习设置中实现了准确的预测,同时不牺牲动力学稳定性所需的基本物理约束,并在参数空间上具有有效的泛化性能。
摘要:Structure-preserving approaches to dynamics modeling have demonstrated great potential for modeling physical systems due to their strong inductive biases that enforce conservation laws and dissipative behavior. However, the resulting models are typically trained for fixed system configurations, requiring explicit knowledge of system parameters as well as costly retraining for each new set of parameters -- a major limitation in many-query or parameter-varying scenarios. Meta-learning offers a potential solution, but existing approaches like optimization-based meta-learning often suffer from training instability or limited generalization capability. Inspired by ideas from computer vision, we introduce a modulation-based meta-learning framework that directly conditions structure-preserving models on compact latent representations of potentially unknown system parameters, avoiding the need for gray-box system knowledge and explicit optimization during adaptation. Through the application of novel modulation strategies to parametric energy-conserving and dissipative systems, we enable scalable and generalizable learning across parametric families of dynamical systems. Experiments on standard benchmark problems demonstrate that our approach achieves accurate predictions in few-shot learning settings, without compromising on the essential physical constraints necessary for dynamical stability and effective generalization performance across parameter space.


【2】Compressive Meta-Learning
标题:压缩元学习
链接:https://arxiv.org/abs/2508.11090

作者:s Montserrat, David Bonet, Maria Perera, Xavier Giró-i-Nieto, Alexander G. Ioannidis
备注:Extended version of a paper accepted at KDD '25
摘要 :新数据集规模的快速扩张产生了对快速有效的参数学习技术的需求。压缩学习是一种框架,它通过使用随机的非线性特征将大规模数据库投影到紧凑的信息保留表示上来实现有效处理,这些表示的维度与样本数量无关,并且可以轻松存储,传输和处理。然后,这些数据库级摘要用于从底层数据分布中解码感兴趣的参数,而无需访问原始样本,从而提供高效且隐私友好的学习框架。然而,编码和解码技术通常都是随机化的和与数据无关的,无法利用数据的底层结构。在这项工作中,我们提出了一个框架,通过使用神经网络来元学习压缩学习方法的编码和解码阶段,这些神经网络提供了比当前最先进的方法更快,更准确的系统。为了展示压缩元学习框架的潜力,我们探索了多种应用-包括基于神经网络的压缩PCA,压缩岭回归,压缩k-means和自动编码器。
摘要:The rapid expansion in the size of new datasets has created a need for fast and efficient parameter-learning techniques. Compressive learning is a framework that enables efficient processing by using random, non-linear features to project large-scale databases onto compact, information-preserving representations whose dimensionality is independent of the number of samples and can be easily stored, transferred, and processed. These database-level summaries are then used to decode parameters of interest from the underlying data distribution without requiring access to the original samples, offering an efficient and privacy-friendly learning framework. However, both the encoding and decoding techniques are typically randomized and data-independent, failing to exploit the underlying structure of the data. In this work, we propose a framework that meta-learns both the encoding and decoding stages of compressive learning methods by using neural networks that provide faster and more accurate systems than the current state-of-the-art approaches. To demonstrate the potential of the presented Compressive Meta-Learning framework, we explore multiple applications -- including neural network-based compressive PCA, compressive ridge regression, compressive k-means, and autoencoders.
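压缩学习的经典编码器是随机傅里叶特征的经验均值:摘要维度m与样本数N无关。下面是一个最小示意(NumPy);论文的贡献在于用神经网络元学习替换这里的随机Ω与手工解码器,此处代码仅展示被替换的基线形式:

```python
import numpy as np

def sketch_dataset(X, Omega):
    """X: (N, d) 数据;Omega: (d, m) 随机频率。返回m维数据库级摘要。"""
    Z = np.exp(1j * (X @ Omega))     # 随机傅里叶特征
    return Z.mean(axis=0)            # 经验特征均值:与N无关、可增量更新、可合并

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))
Omega = rng.normal(size=(8, 64))
s = sketch_dataset(X, Omega)         # 之后仅凭s(而非原始样本)解码感兴趣的参数
```

这种摘要可以在各节点独立计算后相加合并,这正是其高效且隐私友好的原因。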


医学相关(2篇)

【1】An Efficient Medical Image Classification Method Based on a Lightweight Improved ConvNeXt-Tiny Architecture
标题:基于轻量级改进ConvNeXt-Tiny架构的高效医学图像分类方法
链接:https://arxiv.org/abs/2508.11532

作者:Xia, Yue Yin, Xiuhan Li
摘要:医学影像的智能分析在辅助临床诊断中起着至关重要的作用。然而,在资源受限的计算环境中实现高效、高精度的图像分类仍然具有挑战性。本文提出了一种基于改进ConvNeXt-Tiny架构的医学图像分类方法。通过结构优化和损失函数设计,该方法在降低计算复杂度的同时提高了特征提取能力和分类性能。具体而言,该方法在ConvNeXt-Tiny骨干中引入了双全局池化(全局平均池化和全局最大池化)特征融合策略,以同时保留全局统计特征和显著响应信息。设计了一个称为挤压-激励向量(SEVector)的轻量级通道注意模块,在最大限度减少参数开销的同时改进通道权重的自适应分配。此外,损失函数中加入了特征平滑损失,以提高类内特征一致性并抑制类内方差。在仅CPU条件下(8线程),该方法在10个训练轮次内在测试集上取得了89.10%的最高分类准确率,且损失值表现出稳定的收敛趋势。实验结果表明,该方法有效提高了资源受限环境下的医学图像分类性能,为医学图像分析模型的部署和推广提供了一种可行且高效的解决方案。
摘要:Intelligent analysis of medical imaging plays a crucial role in assisting clinical diagnosis. However, achieving efficient and high-accuracy image classification in resource-constrained computational environments remains challenging. This study proposes a medical image classification method based on an improved ConvNeXt-Tiny architecture. Through structural optimization and loss function design, the proposed method enhances feature extraction capability and classification performance while reducing computational complexity. Specifically, the method introduces a dual global pooling (Global Average Pooling and Global Max Pooling) feature fusion strategy into the ConvNeXt-Tiny backbone to simultaneously preserve global statistical features and salient response information. A lightweight channel attention module, termed Squeeze-and-Excitation Vector (SEVector), is designed to improve the adaptive allocation of channel weights while minimizing parameter overhead. Additionally, a Feature Smoothing Loss is incorporated into the loss function to enhance intra-class feature consistency and suppress intra-class variance. Under CPU-only conditions (8 threads), the method achieves a maximum classification accuracy of 89.10% on the test set within 10 training epochs, exhibiting a stable convergence trend in loss values. Experimental results demonstrate that the proposed method effectively improves medical image classification performance in resource-limited settings, providing a feasible and efficient solution for the deployment and promotion of medical imaging analysis models.
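按摘要描述,分类头可示意如下(PyTorch):GAP与GMP拼接后经过SEVector式通道注意,再接线性分类器;另附特征平滑损失的一个朴素写法。结构细节(如缩减比)为示例假设,并非论文源码:

```python
import torch
import torch.nn as nn

class SEVector(nn.Module):
    """SEVector式轻量通道注意的示意实现(缩减比为示例假设)。"""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                    # x: (B, C)
        return x * self.fc(x)

class DualPoolHead(nn.Module):
    """GAP+GMP双全局池化融合 + SEVector + 线性分类头。"""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.se = SEVector(2 * channels)
        self.cls = nn.Linear(2 * channels, num_classes)

    def forward(self, feat):                 # feat: (B, C, H, W) 来自骨干网络
        z = torch.cat([feat.mean(dim=(2, 3)),   # 全局统计特征
                       feat.amax(dim=(2, 3))],  # 显著响应信息
                      dim=1)
        return self.cls(self.se(z))

def feature_smoothing_loss(z, y):
    """特征平滑损失的朴素写法:惩罚类内方差。"""
    classes = y.unique()
    loss = sum(((z[y == c] - z[y == c].mean(0, keepdim=True)) ** 2).mean()
               for c in classes)
    return loss / classes.numel()
```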


【2】Towards Efficient Prompt-based Continual Learning in Distributed Medical AI
标题:在分布式医疗人工智能中实现高效的基于提示的持续学习
链接:https://arxiv.org/abs/2508.10954

作者:, Jitae Shin
备注:10p
摘要:现代人工智能模型凭借大规模、高质量的数据集实现了最先进的性能;然而,医疗领域的伦理、社会和制度约束严重限制了数据共享,使集中式学习几乎不可能实现。每个机构只能使用本地数据增量更新模型。传统训练会过拟合新样本并遭受灾难性遗忘,丢失先前获得的知识。医疗数据分布也会因诊断设备和人口统计学的差异而发生偏移。尽管持续学习(CL)已取得进展,但大多数方法针对自然图像,医疗领域特定的CL仍未得到充分探索。我们提出了一种基于提示的持续学习(prompt-based continual learning, PCL)方法,其特征是带最小扩展策略的统一提示池:通过扩展并冻结提示的一个子集,我们的方法降低了计算开销,而一个新的正则化项平衡了知识保留与适应。在三个糖尿病视网膜病变数据集(Aptos2019、LI2019和Diabetic Retinopathy Detection)上的实验表明,我们的模型比最先进的方法将最终分类准确率提高了至少10%、F1分数提高了9个点,同时降低了推理成本。我们预计这项研究将推动可持续的医疗AI进展,实现分布式医疗中的实时诊断、患者监测和远程医疗应用。代码将在论文接收后发布。
摘要:Modern AI models achieve state-of-the-art performance with large-scale, high-quality datasets; however, ethical, social, and institutional constraints in the medical domain severely restrict data sharing, rendering centralized learning nearly impossible. Each institution must incrementally update models using only local data. Traditional training overfits new samples and suffers from catastrophic forgetting, losing previously acquired knowledge. Medical data distributions also shift due to varying diagnostic equipment and demographics. Although continual learning (CL) has advanced, most methods address natural images, leaving medical-domain-specific CL underexplored. We propose a prompt-based continual learning (PCL) approach featuring a unified prompt pool with a minimal expansion strategy: by expanding and freezing a subset of prompts, our method reduces computational overhead, and a novel regularization term balances retention and adaptation. Experiments on three diabetic retinopathy datasets Aptos2019, LI2019, and Diabetic Retinopathy Detection show our model improves final classification accuracy by at least 10% and F1-score by 9 points over state-of-the-art approaches while lowering inference cost. We anticipate this study will drive sustainable medical AI advances, enabling real-time diagnosis, patient monitoring, and telemedicine applications in distributed healthcare. Code will be released upon acceptance


蒸馏|知识提取(2篇)

【1】Model Interpretability and Rationale Extraction by Input Mask Optimization
标题:通过输入掩码优化实现模型可解释性与基本原理提取
链接:https://arxiv.org/abs/2508.11388

作者:ner, Sina Zarriess
备注:None
摘要:在自然语言处理和计算机视觉等领域基于神经网络的模型发展迅速的同时,为这些黑箱模型的预测创建解释的需求也在稳步上升。我们提出了一种新的方法来生成神经网络预测的提取解释,该方法基于模型不认为指示相应类别的输入的掩蔽部分。掩蔽是使用基于梯度的优化与新的正则化方案相结合来完成的,该方案增强了生成的解释的充分性、全面性和紧凑性,这三个属性是自然语言处理中的基本原理提取相关领域所期望的。通过这种方式,我们弥合了模型可解释性和基本原理提取之间的差距,从而证明了后者可以在不训练专门模型的情况下进行,仅基于经过训练的分类器。我们进一步将相同的方法应用于图像输入,并获得高质量的图像分类解释,这表明在自然语言处理中提出的合理提取的条件更广泛地适用于不同的输入类型。
摘要 :Concurrent to the rapid progress in the development of neural-network based models in areas like natural language processing and computer vision, the need for creating explanations for the predictions of these black-box models has risen steadily. We propose a new method to generate extractive explanations for predictions made by neural networks, that is based on masking parts of the input which the model does not consider to be indicative of the respective class. The masking is done using gradient-based optimization combined with a new regularization scheme that enforces sufficiency, comprehensiveness and compactness of the generated explanation, three properties that are known to be desirable from the related field of rationale extraction in natural language processing. In this way, we bridge the gap between model interpretability and rationale extraction, thereby proving that the latter of which can be performed without training a specialized model, only on the basis of a trained classifier. We further apply the same method to image inputs and obtain high quality explanations for image classifications, which indicates that the conditions proposed for rationale extraction in natural language processing are more broadly applicable to different input types.


【2】Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition
标题:统一知识蒸馏框架:用于深度人脸识别的细粒度对齐和几何关系保持
链接:https://arxiv.org/abs/2508.11376

作者:ishra, Rishabh Uikey
备注:The paper spans a total of 14 pages, 10 pages for the main content (including references) and 4 pages for the appendix. The main paper contains 3 figures and 1 table, while the appendix includes 1 pseudo-code algorithm and 4 tables. The work was recently accepted for publication at IJCB 2025
摘要:知识蒸馏对于优化人脸识别模型以部署在计算受限环境(如边缘设备)中至关重要。传统的KD方法,如原始L2特征蒸馏或特征一致性损失,通常无法同时捕获细粒度的实例级细节和复杂的关系结构,导致性能欠佳。我们提出了一种统一的方法,集成了两个新的损失函数:实例级嵌入蒸馏和基于关系的成对相似性蒸馏。实例级嵌入蒸馏借助动态难例挖掘策略来对齐单个特征嵌入,从而增强从具有挑战性的样本中学习的能力。基于关系的成对相似性蒸馏通过成对相似性关系捕获关系信息,采用记忆库机制和样本挖掘策略。这一统一框架既确保了有效的实例级对齐,又保留了样本之间的几何关系,从而形成更全面的蒸馏过程。广泛的实验评估表明,我们的统一框架在多个基准人脸识别数据集上优于最先进的蒸馏方法。有趣的是,当教师网络远强于学生时,我们的统一KD甚至能使学生的准确率超过教师。
摘要:Knowledge Distillation is crucial for optimizing face recognition models for deployment in computationally limited settings, such as edge devices. Traditional KD methods, such as Raw L2 Feature Distillation or Feature Consistency loss, often fail to capture both fine-grained instance-level details and complex relational structures, leading to suboptimal performance. We propose a unified approach that integrates two novel loss functions, Instance-Level Embedding Distillation and Relation-Based Pairwise Similarity Distillation. Instance-Level Embedding Distillation focuses on aligning individual feature embeddings by leveraging a dynamic hard mining strategy, thereby enhancing learning from challenging examples. Relation-Based Pairwise Similarity Distillation captures relational information through pairwise similarity relationships, employing a memory bank mechanism and a sample mining strategy. This unified framework ensures both effective instance-level alignment and preservation of geometric relationships between samples, leading to a more comprehensive distillation process. Our unified framework outperforms state-of-the-art distillation methods across multiple benchmark face recognition datasets, as demonstrated by extensive experimental evaluations. Interestingly, when using strong teacher networks compared to the student, our unified KD enables the student to even surpass the teacher's accuracy.
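下面给出"实例级嵌入蒸馏 + 动态难例挖掘"的一个最小示意(PyTorch):只对师生嵌入差异最大的一部分样本计入损失。top-k比例与归一化方式为示例假设;成对相似性蒸馏与记忆库机制未在此展示:

```python
import torch
import torch.nn.functional as F

def instance_level_kd(student_emb, teacher_emb, topk_ratio=0.3):
    """student_emb / teacher_emb: (B, D) 人脸嵌入。"""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1).detach()
    err = (s - t).pow(2).sum(dim=1)           # 每个实例的对齐误差
    k = max(1, int(topk_ratio * err.numel()))
    return err.topk(k).values.mean()          # 只回传最难的k个实例
```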


推荐(1篇)

【1】Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation
标题:短视频推荐中观看时间预测的相对优势去偏
链接:https://arxiv.org/abs/2508.11086

作者:, Kuan Han, Minfeng Zhan, Bocheng Zhao, Guanyu Mu, Yang Song
摘要:观看时间被广泛用作视频推荐平台中用户满意度的代理。然而,原始观看时间受到诸如视频持续时间、流行度和个人用户行为等混杂因素的影响,可能会扭曲偏好信号并导致有偏见的推荐模型。我们提出了一种新的相对优势去偏框架,纠正观看时间比较,经验得出的参考分布条件的用户和项目组。这种方法产生了一个基于分位数的偏好信号,并引入了一个两阶段的架构,明确分离的分布估计偏好学习。此外,我们提出了分布式嵌入,有效地参数化的手表时间分位数,而不需要在线采样或存储的历史数据。离线和在线实验都表明,与现有的基线方法相比,推荐的准确性和鲁棒性都有显着提高。
摘要:Watch time is widely used as a proxy for user satisfaction in video recommendation platforms. However, raw watch times are influenced by confounding factors such as video duration, popularity, and individual user behaviors, potentially distorting preference signals and resulting in biased recommendation models. We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning. Additionally, we present distributional embeddings to efficiently parameterize watch-time quantiles without requiring online sampling or storage of historical data. Both offline and online experiments demonstrate significant improvements in recommendation accuracy and robustness compared to existing baseline methods.
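两阶段中的第一阶段可以理解为:把观看时长映射到其在同(用户组, 物品组)条件参照分布中的经验分位数。下面是一个最小示意(NumPy;论文用"分布式嵌入"参数化分位数以避免在线采样,此处用显式参照样本仅作说明):

```python
import numpy as np

def relative_advantage(watch_time, ref_samples):
    """watch_time: (B,) 原始观看时长;ref_samples: (M,) 同组参照样本。
    返回每个观看时长在参照分布中的经验分位数,作为去偏后的偏好信号。"""
    return (ref_samples[None, :] <= watch_time[:, None]).mean(axis=1)

q = relative_advantage(np.array([5.0, 30.0, 120.0]),
                       np.random.exponential(scale=40.0, size=10_000))
```

按组取分位数后,长视频与短视频、热门与冷门物品的观看时长被放到了同一可比的[0, 1]尺度上。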


自动驾驶|车辆|车道检测等(1篇)

【1】Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
标题:通过碰撞特征选择预测和解释交通碰撞严重性
链接:https://arxiv.org/abs/2508.11504

作者:stellani, Zacharias Papadovasilakis, Giorgos Papoutsoglou, Mary Cole, Brian Bautsch, Tobias Rodemann, Ioannis Tsamardinos, Angela Harden
备注:Preprint. Manuscript under review at "Accident Analysis & Prevention" journal
摘要:机动车碰撞仍然是全球伤亡的主要原因,需要数据驱动的方法来了解和减轻碰撞的严重程度。这项研究介绍了俄亥俄州六年(2017-2022年)超过300万人参与事故的策划数据集,汇总到超过230万条车辆级记录进行预测分析。其主要贡献是一种透明且可重复的方法,该方法结合了自动机器学习(AutoML)和可解释的人工智能(AI),以识别和解释与严重碰撞相关的关键风险因素。使用JADBio AutoML平台,构建了预测模型,以区分严重和非严重的碰撞结果。这些模型在分层训练子集中进行了严格的特征选择,并使用SHapley加法解释(SHAP)来解释其输出,以量化单个特征的贡献。最终的岭逻辑回归模型在训练集上实现了85.6%的AUC-ROC,在保持测试集上实现了84.9%的AUC-ROC,其中17个特征始终被确定为最有影响力的预测因子。主要特征涵盖人口统计、环境、车辆、人类和操作类别,包括位置类型、公布速度、最小乘员年龄和碰撞前行动。值得注意的是,某些传统上强调的因素,如酒精或药物损伤,在最终模型中的影响力较小,相比环境和背景变量。这项研究强调方法的严谨性和可解释性,而不仅仅是预测性能,它提供了一个可扩展的框架,通过协调一致的干预措施和先进的数据信息交通安全政策来支持Vision Zero。
摘要 :Motor vehicle crashes remain a leading cause of injury and death worldwide, necessitating data-driven approaches to understand and mitigate crash severity. This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years (2017-2022), aggregated to more than 2.3 million vehicle-level records for predictive analysis. The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes. Using the JADBio AutoML platform, predictive models were constructed to distinguish between severe and non-severe crash outcomes. The models underwent rigorous feature selection across stratified training subsets, and their outputs were interpreted using SHapley Additive exPlanations (SHAP) to quantify the contribution of individual features. A final Ridge Logistic Regression model achieved an AUC-ROC of 85.6% on the training set and 84.9% on a hold-out test set, with 17 features consistently identified as the most influential predictors. Key features spanned demographic, environmental, vehicle, human, and operational categories, including location type, posted speed, minimum occupant age, and pre-crash action. Notably, certain traditionally emphasized factors, such as alcohol or drug impairment, were less influential in the final model compared to environmental and contextual variables. Emphasizing methodological rigor and interpretability over mere predictive performance, this study offers a scalable framework to support Vision Zero with aligned interventions and advanced data-informed traffic safety policy.


联邦学习|隐私保护|加密(1篇)

【1】Mitigating Modality Quantity and Quality Imbalance in Multimodal Online Federated Learning
标题:缓解多模态在线联邦学习中的模态数量和质量失衡
链接:https://arxiv.org/abs/2508.11159

作者:ang, Weihong Yang, Xiaoxiong Zhong, Jia Zhou, Fangming Liu, Weizhe Zhang
备注:arXiv admin note: text overlap with arXiv:2505.16138
摘要:物联网(IoT)生态系统从不同的来源产生大量的多模式数据,包括传感器、摄像头和麦克风。随着边缘智能的进步,物联网设备已经从简单的数据采集单元发展成为具有计算能力的节点,从而能够对异构多模态数据进行本地化处理。这种演变需要能够有效处理这些数据的分布式学习范式。此外,数据生成的连续性和边缘设备的有限存储容量需要在线学习框架。多模态在线联合学习(MMO-FL)已经成为一种很有前途的方法,以满足这些要求。然而,由于物联网设备固有的不稳定性,MMO-FL面临着新的挑战,这通常会导致数据收集过程中的模态数量和质量不平衡(QQI)。在这项工作中,我们系统地调查的影响,QQI内的MMO-FL框架,并提出了一个全面的理论分析量化这两种类型的不平衡如何降低学习成绩。为了应对这些挑战,我们提出了模态数量和质量重新平衡(QQR)算法,这是一种基于原型学习的方法,旨在与训练过程并行操作。在两个真实世界的多模态数据集上的实验表明,所提出的QQR算法在模态不平衡条件下始终优于基准,具有良好的学习性能。
摘要:The Internet of Things (IoT) ecosystem produces massive volumes of multimodal data from diverse sources, including sensors, cameras, and microphones. With advances in edge intelligence, IoT devices have evolved from simple data acquisition units into computationally capable nodes, enabling localized processing of heterogeneous multimodal data. This evolution necessitates distributed learning paradigms that can efficiently handle such data. Furthermore, the continuous nature of data generation and the limited storage capacity of edge devices demand an online learning framework. Multimodal Online Federated Learning (MMO-FL) has emerged as a promising approach to meet these requirements. However, MMO-FL faces new challenges due to the inherent instability of IoT devices, which often results in modality quantity and quality imbalance (QQI) during data collection. In this work, we systematically investigate the impact of QQI within the MMO-FL framework and present a comprehensive theoretical analysis quantifying how both types of imbalance degrade learning performance. To address these challenges, we propose the Modality Quantity and Quality Rebalanced (QQR) algorithm, a prototype learning based method designed to operate in parallel with the training process. Extensive experiments on two real-world multimodal datasets show that the proposed QQR algorithm consistently outperforms benchmarks under modality imbalance conditions with promising learning performance.


推理|分析|理解|解释(7篇)

【1】Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks
标题:视觉感知引擎:用于机器人视觉任务的快速灵活的多头推理
链接:https://arxiv.org/abs/2508.11584

作者:ki, Jonathan Becktor, Georgios Georgakis, Robert Royce, Shehryar Khattak
备注:6 pages, 6 figures, 2 tables
摘要:在资源受限的机器人平台上为不同感知任务部署多个机器学习模型,通常会导致冗余计算、较大的内存占用和复杂的集成挑战。为此,本工作提出了视觉感知引擎(VPEngine),这是一个模块化框架,旨在为视觉多任务实现高效的GPU使用,同时保持可扩展性和开发者的易用性。我们的框架架构利用共享的基础模型骨干来提取图像表示,这些表示在并行运行的多个专门的任务特定模型头之间高效共享,而无需任何不必要的GPU-CPU内存传输。该设计消除了传统顺序模型部署中特征提取组件固有的计算冗余,同时支持基于应用需求的动态任务优先级排序。我们通过一个示例实现展示了框架的能力:使用DINOv2作为基础模型,搭配多个任务头(深度、目标检测和语义分割),与顺序执行相比实现高达3倍的加速。基于CUDA多进程服务(MPS),VPEngine提供高效的GPU利用率并保持恒定的内存占用,同时允许在运行时动态调整每个任务的推理频率。该框架用Python编写并开源,带有ROS2 C++(Humble)绑定,便于机器人社区在不同的机器人平台上使用。我们的示例实现表明,TensorRT优化后的模型在NVIDIA Jetson Orin AGX上可实现不低于50 Hz的端到端实时性能。
摘要:Deploying multiple machine learning models on resource-constrained robotic platforms for different perception tasks often results in redundant computations, large memory footprints, and complex integration challenges. In response, this work presents Visual Perception Engine (VPEngine), a modular framework designed to enable efficient GPU usage for visual multitasking while maintaining extensibility and developer accessibility. Our framework architecture leverages a shared foundation model backbone that extracts image representations, which are efficiently shared, without any unnecessary GPU-CPU memory transfers, across multiple specialized task-specific model heads running in parallel. This design eliminates the computational redundancy inherent in feature extraction component when deploying traditional sequential models while enabling dynamic task prioritization based on application demands. We demonstrate our framework's capabilities through an example implementation using DINOv2 as the foundation model with multiple task (depth, object detection and semantic segmentation) heads, achieving up to 3x speedup compared to sequential execution. Building on CUDA Multi-Process Service (MPS), VPEngine offers efficient GPU utilization and maintains a constant memory footprint while allowing per-task inference frequencies to be adjusted dynamically during runtime. The framework is written in Python and is open source with ROS2 C++ (Humble) bindings for ease of use by the robotics community across diverse robotic platforms. Our example implementation demonstrates end-to-end real-time performance at $\geq$50 Hz on NVIDIA Jetson Orin AGX for TensorRT optimized models.
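"一次特征提取、多头并行复用"的结构可以用下面的最小示意说明(PyTorch)。真实的VPEngine基于CUDA MPS做进程级并行并避免GPU-CPU拷贝;此处的单进程版本仅示意共享骨干的组织方式,各模块均为占位假设:

```python
import torch
import torch.nn as nn

class SharedBackboneMultiHead(nn.Module):
    def __init__(self, backbone: nn.Module, heads: dict):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleDict(heads)

    @torch.no_grad()
    def forward(self, image, active_tasks):
        feat = self.backbone(image)                      # 特征只计算一次
        return {t: self.heads[t](feat) for t in active_tasks}

# 用法示例(骨干与任务头均为占位模块):
model = SharedBackboneMultiHead(
    nn.Conv2d(3, 16, 3, padding=1),
    {"depth": nn.Conv2d(16, 1, 1), "seg": nn.Conv2d(16, 21, 1)})
out = model(torch.randn(1, 3, 224, 224), active_tasks=["depth"])
```

按需传入 active_tasks 即对应"基于应用需求的动态任务优先级排序"。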


【2】A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow
标题:对整个机器学习工作流程中的可解释人工智能的全面视角
链接:https://arxiv.org/abs/2508.11529

作者:terakis, Andrea Castellani, George Papoutsoglou, Tobias Rodemann, Ioannis Tsamardinos
备注:Preprint. Currently under review at "Artificial Intelligence Review" journal
摘要:人工智能正在重塑科学和工业,但许多用户仍将其模型视为不透明的“黑匣子”。传统的可解释的人工智能方法澄清了个人的预测,但忽略了上游的决策和下游的质量检查,这些都决定了洞察力是否可信。在这项工作中,我们提出了整体可解释人工智能(HXAI),这是一个以用户为中心的框架,它将解释嵌入到数据分析工作流程的每个阶段,并为用户量身定制这些解释。HXAI将六个组成部分(数据,分析设置,学习过程,模型输出,模型质量,沟通渠道)统一到一个单一的分类法中,并将每个组成部分与领域专家,数据分析师和数据科学家的需求保持一致。一个112项的问题库涵盖了这些需求;我们对当代工具的调查突出了关键的覆盖差距。基于人类解释的理论,人机交互的原则和经验用户研究的结果,HXAI确定了使解释清晰,可操作和认知可管理的特征。一个全面的分类法将这些见解操作化,减少术语的模糊性,并对现有工具链进行严格的覆盖分析。我们进一步展示了嵌入大型语言模型的AI代理如何编排不同的解释技术,将技术工件转换为特定于用户的叙述,从而弥合AI开发人员和领域专家之间的差距。与传统的调查或观点文章不同,这项工作融合了来自多个学科的概念,来自现实世界项目的经验教训和文献的关键综合,以推进关于透明度,可信度和负责任的人工智能部署的新颖的端到端观点。
摘要 :Artificial intelligence is reshaping science and industry, yet many users still regard its models as opaque "black boxes". Conventional explainable artificial-intelligence methods clarify individual predictions but overlook the upstream decisions and downstream quality checks that determine whether insights can be trusted. In this work, we present Holistic Explainable Artificial Intelligence (HXAI), a user-centric framework that embeds explanation into every stage of the data-analysis workflow and tailors those explanations to users. HXAI unifies six components (data, analysis set-up, learning process, model output, model quality, communication channel) into a single taxonomy and aligns each component with the needs of domain experts, data analysts and data scientists. A 112-item question bank covers these needs; our survey of contemporary tools highlights critical coverage gaps. Grounded in theories of human explanation, principles from human-computer interaction and findings from empirical user studies, HXAI identifies the characteristics that make explanations clear, actionable and cognitively manageable. A comprehensive taxonomy operationalises these insights, reducing terminological ambiguity and enabling rigorous coverage analysis of existing toolchains. We further demonstrate how AI agents that embed large-language models can orchestrate diverse explanation techniques, translating technical artifacts into stakeholder-specific narratives that bridge the gap between AI developers and domain experts. Departing from traditional surveys or perspective articles, this work melds concepts from multiple disciplines, lessons from real-world projects and a critical synthesis of the literature to advance a novel, end-to-end viewpoint on transparency, trustworthiness and responsible AI deployment.


【3】Informative Post-Hoc Explanations Only Exist for Simple Functions
标题:信息量大的事后解释仅对简单函数存在
链接:https://arxiv.org/abs/2508.11441

作者:her, Balázs Szabados, Robi Bhattacharjee, Sebastian Bordt, Ulrike von Luxburg
摘要:许多研究人员认为,局部事后解释算法可以用来深入了解复杂机器学习模型的行为。然而,这类算法的理论保证只存在于简单的决策函数,并且尚不清楚对复杂模型是否以及在何种假设下可能存在类似的结果。在本文中,我们引入了一个通用的、基于学习理论的框架,刻画"解释为决策函数提供信息"的含义。如果一个解释有助于缩小合理(plausible)决策函数空间的复杂性,我们就称其为有信息量的。基于这一方法,我们表明许多流行的解释算法在应用于复杂决策函数时是无信息量的,从而在数学上严格否定了"任何模型都应可被解释"的想法。随后我们推导了不同解释算法变得有信息量的条件。这些条件往往比人们预期的更强。例如,梯度解释和反事实解释相对于可微函数空间是无信息量的,而SHAP和锚(anchor)解释相对于决策树空间是无信息量的。基于这些结果,我们讨论了如何修改解释算法使其变得有信息量。虽然对解释算法的分析是数学性的,但我们认为它对这些算法的实际适用性有很强的影响,特别是在审计、监管和高风险的人工智能应用中。
摘要:Many researchers have suggested that local post-hoc explanation algorithms can be used to gain insights into the behavior of complex machine learning models. However, theoretical guarantees about such algorithms only exist for simple decision functions, and it is unclear whether and under which assumptions similar results might exist for complex models. In this paper, we introduce a general, learning-theory-based framework for what it means for an explanation to provide information about a decision function. We call an explanation informative if it serves to reduce the complexity of the space of plausible decision functions. With this approach, we show that many popular explanation algorithms are not informative when applied to complex decision functions, providing a rigorous mathematical rejection of the idea that it should be possible to explain any model. We then derive conditions under which different explanation algorithms become informative. These are often stronger than what one might expect. For example, gradient explanations and counterfactual explanations are non-informative with respect to the space of differentiable functions, and SHAP and anchor explanations are not informative with respect to the space of decision trees. Based on these results, we discuss how explanation algorithms can be modified to become informative. While the proposed analysis of explanation algorithms is mathematical, we argue that it holds strong implications for the practical applicability of these algorithms, particularly for auditing, regulation, and high-risk applications of AI.


【4】Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis
标题:Retro-Expert:面向可解释逆合成的协作推理
链接:https://arxiv.org/abs/2508.10967

作者: Sai Wang, Yutian Lin, Yu Wu, Yi Yang
摘要:逆合成预测的目的是根据已知的产物分子来推断反应物分子,这是化学合成中的一项基本任务。然而,现有的模型依赖于静态模式匹配范式,这限制了它们执行有效逻辑决策的能力,导致黑箱决策。在此基础上,我们提出了Retro-Expert,这是一个可解释的逆合成框架,通过强化学习结合大型语言模型和专业模型的互补推理优势来执行协作推理。它通过三个部分输出基于化学逻辑的自然语言解释:(1)专业模型执行浅层推理以构建高质量的化学决策空间,(2)LLM驱动的批判性推理以生成预测和相应的可解释推理路径,以及(3)强化学习优化可解释决策策略。实验表明,Retro-Expert不仅在不同指标上超越了基于LLM的模型和专业模型,而且还提供了与专家一致的解释,弥合了人工智能预测和可操作的化学见解之间的差距。
摘要:Retrosynthesis prediction aims to infer the reactant molecule based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing models rely on static pattern-matching paradigm, which limits their ability to perform effective logic decision-making, leading to black-box decision-making. Building on this, we propose Retro-Expert, an interpretable retrosynthesis framework that performs collaborative reasoning by combining the complementary reasoning strengths of Large Language Models and specialized models via reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models perform shallow reasoning to construct high-quality chemical decision space, (2) LLM-driven critical reasoning to generate predictions and corresponding interpretable reasoning path, and (3) reinforcement learning optimizing interpretable decision policy. Experiments show that Retro-Expert not only surpasses both LLM-based and specialized models across different metrics but also provides expert-aligned explanations that bridge the gap between AI predictions and actionable chemical insights.


【5】Functional Analysis of Variance for Association Studies
标题:用于关联研究的函数型方差分析
链接:https://arxiv.org/abs/2508.11069

作者:sevolozhskaya, Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, Qing Lu
摘要:虽然在确定与人类疾病相关的常见遗传变异方面取得了进展,但对于大多数常见复杂疾病,已确定的遗传变异仅能解释遗传度的一小部分。寻找其他易患复杂疾病的未知遗传变异仍然面临挑战。随着下一代测序技术的进步,测序研究在遗传研究中已经变得司空见惯。正在进行的外显子组测序和全基因组测序研究产生了大量的测序变异,使研究人员能够全面研究它们在人类疾病中的作用。利用强大且计算高效的统计方法可以促进新的疾病相关变异的发现。在本文中,我们提出了一种函数型方差分析(functional ANOVA, FANOVA)方法,用于检验基因组区域内的序列变异与定性性状的关联。FANOVA具有许多优点:(1)它检验基因变异(包括常见和罕见变异)的联合效应;(2)它充分利用连锁不平衡和遗传位置信息;(3)允许因果变异是保护性的或增加风险的。通过模拟,我们表明FANOVA优于两种常用方法,即SKAT和先前提出的基于函数型线性模型(FLM)的方法,特别是在研究样本量较小和/或序列变异具有低至中等效应时。我们将三种方法(FANOVA、SKAT和FLM)应用于达拉斯心脏研究的测序数据进行了实证分析。SKAT和FLM分别只检测到ANGPTL4和ANGPTL3与肥胖相关,而FANOVA能够同时鉴定出这两个与肥胖相关的基因。
摘要:While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperforms two popularly used methods, SKAT and a previously proposed method based on functional linear models (FLM), especially if the sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL4 and ANGPTL3 associated with obesity, FANOVA was able to identify both genes associated with obesity.


【6】A Generalized Similarity U Test for Multivariate Analysis of Sequencing Data
标题:测序数据多元分析的广义相似度U检验
链接:https://arxiv.org/abs/1505.01179

作者:i Wei, Qing Lu
摘要:基于测序的研究正在成为复杂疾病遗传关联研究的主要工具。由于数据的高维性和遗传变异的低频率,这些研究对传统统计方法(例如基于回归的单位点分析)构成了巨大挑战。此外,生物学和流行病学界对识别影响多种疾病表型的遗传风险因素有浓厚兴趣。多个表型往往服从不同的分布,这违背了大多数现有方法的假设。在本文中,我们提出了一种广义相似性U检验,简称GSU。GSU是一种基于相似性的检验,可以处理高维基因型和表型。我们研究了GSU的理论性质,给出了关联检验的高效p值计算,以及用于研究设计的样本量和功效计算。通过模拟,我们发现GSU在功效以及对表型分布的稳健性方面优于现有方法。最后,我们使用GSU对达拉斯心脏研究中的测序数据进行多变量分析,发现了4个基因与5种代谢相关表型的联合关联。
摘要:Sequencing-based studies are emerging as a major tool for genetic association studies of complex diseases. These studies pose great challenges to the traditional statistical methods (e.g., single-locus analyses based on regression methods) because of the high-dimensionality of data and the low frequency of genetic variants. In addition, there is a great interest in biology and epidemiology to identify genetic risk factors contributing to multiple disease phenotypes. The multiple phenotypes can often follow different distributions, which violates the assumptions of most current methods. In this paper, we propose a generalized similarity U test, referred to as GSU. GSU is a similarity-based test and can handle high-dimensional genotypes and phenotypes. We studied the theoretical properties of GSU, and provided the efficient p-value calculation for association test as well as the sample size and power calculation for the study design. Through simulation, we found that GSU had advantages over existing methods in terms of power and robustness to phenotype distributions. Finally, we used GSU to perform a multivariate analysis of sequencing data in the Dallas Heart Study and identified a joint association of 4 genes with 5 metabolic related phenotypes.


【7】A weighted U statistic for association analysis considering genetic heterogeneity
标题:考虑遗传异质性的关联分析加权U统计量
链接:https://arxiv.org/abs/1504.08319

作者:i Wei, Robert C. Elston, Qing Lu
摘要:越来越多的证据表明,具有相同或相似临床表现的常见复杂疾病可能具有不同的潜在遗传病因。虽然当前的研究兴趣已转向发掘诱发人类疾病的罕见变异和结构变异,但异质性对复杂疾病遗传研究的影响在很大程度上被忽视了。大多数现有统计方法假设所研究的疾病具有同质的遗传效应,因此当疾病经历异质的病理生理和病因过程时,可能功效较低。本文提出了一种用于考虑遗传异质性的关联分析的异质性加权U(HWU)方法。HWU可应用于各种类型的表型(例如二元和连续表型),并且对高维遗传数据计算高效。通过模拟,我们展示了当疾病的潜在遗传病因具有异质性时HWU的优势,以及HWU对不同模型假设(例如表型分布)的稳健性。使用HWU,我们对成瘾研究:遗传学与环境(SAGE)数据集中的尼古丁依赖进行了全基因组分析。对近100万个遗传标记的全基因组分析耗时7小时,鉴定出两个新基因(即CYP3A5和IKBKB)对尼古丁依赖的异质性效应。
摘要:Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments (SAGE) dataset. The genome-wide analysis of nearly one million genetic markers took 7 hours, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence.
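作为示意,此类基于相似性的加权U统计量的一般形式可写成(记号为本文假设;HWU中的权重w_{ij}用于刻画遗传异质性,上一条的GSU则对应广义的表型相似性核):

$$ U=\frac{1}{n(n-1)}\sum_{i\neq j} w_{ij}\, S(g_i,g_j)\, f(y_i,y_j), $$

其中 \(S(\cdot,\cdot)\) 是基因型相似性核,\(f(\cdot,\cdot)\) 是表型相似性(对二元、连续等不同表型可取不同形式),\(w_{ij}\) 是异质性权重;在同质且等权的特例下,它退化为普通的相似性U检验。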


检测相关(7篇)

【1】E-CaTCH: Event-Centric Cross-Modal Attention with Temporal Consistency and Class-Imbalance Handling for Misinformation Detection
标题:E-CaTCH:以事件为中心的跨模态注意力,结合时间一致性与类别不平衡处理,用于错误信息检测
链接:https://arxiv.org/abs/2508.11197

作者:savi, Yeganeh Abdollahinejad, Roberto Corizzo, Nathalie Japkowicz, Zois Boukouvalas
摘要:由于模态之间的不一致性、时间模式的变化以及大量的类不平衡,检测社交媒体上的多模态错误信息仍然具有挑战性。许多现有的方法单独处理帖子,无法捕捉跨时间和模式连接帖子的事件级结构。我们提出了E-CaTCH,一个可解释和可扩展的框架,用于鲁棒地检测错误信息。如果需要,E-CaTCH会根据文本相似性和时间接近性将帖子聚类为伪事件,然后独立处理每个事件。在每个事件中,使用预先训练的BERT和ResNet编码器提取文本和视觉特征,通过模态内自我注意进行细化,并通过双向跨模态注意进行对齐。软门控机制融合这些表示,以形成每个帖子的上下文化,内容感知嵌入。为了对时间演变进行建模,E-CaTCH将事件分割成重叠的时间窗口,并使用趋势感知LSTM(通过语义转变和动量信号增强)来编码随着时间的推移的叙事进展。分类是在事件级别执行的,能够更好地与真实世界的错误信息动态保持一致。为了解决类不平衡问题,促进稳定学习,该模型集成了自适应类加权,时间一致性正则化和硬示例挖掘。所有事件的总损失汇总。在Fakeddit、IND和COVID-19 MISINFOGRAPH上进行的大量实验表明,E-CaTCH的表现始终优于最先进的基线。跨数据集评估进一步证明了其在各种错误信息场景中的鲁棒性,通用性和实用性。
摘要:Detecting multimodal misinformation on social media remains challenging due to inconsistencies between modalities, changes in temporal patterns, and substantial class imbalance. Many existing methods treat posts independently and fail to capture the event-level structure that connects them across time and modality. We propose E-CaTCH, an interpretable and scalable framework for robustly detecting misinformation. If needed, E-CaTCH clusters posts into pseudo-events based on textual similarity and temporal proximity, then processes each event independently. Within each event, textual and visual features are extracted using pre-trained BERT and ResNet encoders, refined via intra-modal self-attention, and aligned through bidirectional cross-modal attention. A soft gating mechanism fuses these representations to form contextualized, content-aware embeddings of each post. To model temporal evolution, E-CaTCH segments events into overlapping time windows and uses a trend-aware LSTM, enhanced with semantic shift and momentum signals, to encode narrative progression over time. Classification is performed at the event level, enabling better alignment with real-world misinformation dynamics. To address class imbalance and promote stable learning, the model integrates adaptive class weighting, temporal consistency regularization, and hard-example mining. The total loss is aggregated across all events. Extensive experiments on Fakeddit, IND, and COVID-19 MISINFOGRAPH demonstrate that E-CaTCH consistently outperforms state-of-the-art baselines. Cross-dataset evaluations further demonstrate its robustness, generalizability, and practical applicability across diverse misinformation scenarios.


【2】CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
标题:CHARM3R:迈向对未见相机高度鲁棒的单目3D检测器
链接:https://arxiv.org/abs/2508.11185

作者:umar, Yuliang Guo, Zhihao Zhang, Xinyu Huang, Liu Ren, Xiaoming Liu
备注:ICCV 2025
摘要:单目3D物体检测器虽然对来自单一自车相机高度的数据有效,但难以应对未见过或分布外的相机高度。现有方法通常依赖Plucker嵌入、图像变换或数据增强。本文首先研究了相机高度变化对最先进(SoTA)Mono3D模型的影响,朝这一研究不足的问题迈出了一步。通过对带有多个相机高度的扩展CARLA数据集进行系统分析,我们观察到深度估计是高度变化下影响性能的主要因素。我们在数学上证明并在经验上观察到:随相机高度变化,回归式深度模型和基于地面的深度模型的平均深度误差分别呈现一致的负趋势和正趋势。为缓解这一问题,我们提出了相机高度鲁棒单目3D检测器(CHARM3R),它对模型内部的两个深度估计取平均。CHARM3R将对未见相机高度的泛化能力提高了45%以上,在CARLA数据集上实现了SoTA性能。代码和模型见 https://github.com/abhi1kumar/CHARM3R
摘要:Monocular 3D object detectors, while effective on data from one ego camera height, struggle with unseen or out-of-distribution camera heights. Existing methods often rely on Plucker embeddings, image transformations or data augmentation. This paper takes a step towards this understudied problem by first investigating the impact of camera height variations on state-of-the-art (SoTA) Mono3D models. With a systematic analysis on the extended CARLA dataset with multiple camera heights, we observe that depth estimation is a primary factor influencing performance under height variations. We mathematically prove and also empirically observe consistent negative and positive trends in mean depth error of regressed and ground-based depth models, respectively, under camera height changes. To mitigate this, we propose Camera Height Robust Monocular 3D Detector (CHARM3R), which averages both depth estimates within the model. CHARM3R improves generalization to unseen camera heights by more than $45\%$, achieving SoTA performance on the CARLA dataset. Codes and Models at https://github.com/abhi1kumar/CHARM3R
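摘要的关键观察可以概括成一行式子(记号为示意,以高度增量Δh>0为例):回归式与基于地面的深度估计的平均误差随Δh分别呈负、正趋势,因而简单平均可部分抵消:

$$ \hat d=\tfrac12\bigl(\hat d_{\mathrm{reg}}+\hat d_{\mathrm{gnd}}\bigr),\qquad \mathbb{E}\bigl[e_{\mathrm{reg}}(\Delta h)\bigr]<0<\mathbb{E}\bigl[e_{\mathrm{gnd}}(\Delta h)\bigr], $$

两个估计的误差符号相反,平均后的深度对未见过的相机高度更稳健。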


【3】iWatchRoad: Scalable Detection and Geospatial Visualization of Potholes for Smart Cities
标题:iWatchRoad:智慧城市坑洼的可扩展检测和地理空间可视化
链接:https://arxiv.org/abs/2508.10945

作者: Sahoo, Surbhi Saswati Mohanty, Subhankar Mishra
备注:Under review
摘要:道路上的坑洼是严重的安全隐患和维护负担,对道路安全和车辆寿命构成重大威胁,在印度多样化且维护不足的道路上尤其如此。在本文中,我们提出了一个称为iWatchRoad的完整端到端系统,用于坑洼自动检测、全球定位系统(GPS)标记,以及使用OpenStreetMap(OSM)的实时地图呈现。我们利用行车记录仪(dashcam)视频,构建了一个包含7,000多帧的大型自标注数据集,覆盖印度环境特有的各种道路类型、光照条件和天气场景。该数据集用于微调Ultralytics的You Only Look Once(YOLO)模型以执行实时坑洼检测,同时一个定制的光学字符识别(OCR)模块直接从视频帧中提取时间戳。时间戳与GPS日志同步,从而对每个检测到的坑洼进行准确的地理标记。处理后的数据(包括坑洼细节和帧)作为元数据存储在数据库中,并通过基于OSM的用户友好Web界面进行可视化。iWatchRoad不仅在具有挑战性的条件下提高了检测准确性,还通过网站上可见的元数据为道路评估和维护规划提供了与政府系统兼容的输出。我们的解决方案具有成本效益、硬件效率和可扩展性,为发展中地区的城乡道路管理提供了一个实用且全自动的工具。iWatchRoad可在 https://smlab.niser.ac.in/project/iwatchroad 获取。
摘要:Potholes on the roads are a serious hazard and maintenance burden. This poses a significant threat to road safety and vehicle longevity, especially on the diverse and under-maintained roads of India. In this paper, we present a complete end-to-end system called iWatchRoad for automated pothole detection, Global Positioning System (GPS) tagging, and real-time mapping using OpenStreetMap (OSM). We curated a large, self-annotated dataset of over 7,000 frames captured across various road types, lighting conditions, and weather scenarios unique to Indian environments, leveraging dashcam footage. This dataset is used to fine-tune an Ultralytics You Only Look Once (YOLO) model to perform real-time pothole detection, while a custom Optical Character Recognition (OCR) module is employed to extract timestamps directly from video frames. The timestamps are synchronized with GPS logs to geotag each detected pothole accurately. The processed data, including pothole details and frames as metadata, is stored in a database and visualized via a user-friendly web interface using OSM. iWatchRoad not only improves detection accuracy under challenging conditions but also provides government-compatible outputs for road assessment and maintenance planning through the metadata visible on the website. Our solution is cost-effective, hardware-efficient, and scalable, offering a practical, fully automated tool for urban and rural road management in developing regions. iWatchRoad is available at https://smlab.niser.ac.in/project/iwatchroad
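其中"OCR时间戳与GPS日志同步"这一步可以用时间最近邻对齐来示意(纯Python;接口与数据结构为本文假设):

```python
import bisect

def geotag(detection_ts, gps_log):
    """gps_log: 按时间升序的 (timestamp, lat, lon) 列表;
    返回与检测帧时间戳最接近的GPS坐标。"""
    times = [t for t, _, _ in gps_log]
    i = bisect.bisect_left(times, detection_ts)
    j = min((k for k in (i - 1, i) if 0 <= k < len(gps_log)),
            key=lambda k: abs(times[k] - detection_ts))
    return gps_log[j][1], gps_log[j][2]

lat, lon = geotag(12.40, [(11.9, 20.29, 85.82), (12.5, 20.30, 85.83)])
```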


【4】HQ-OV3D: A High Box Quality Open-World 3D Detection Framework based on Diffision Model
标题:HQ-OV3D:基于扩散模型的高框质量开放世界3D检测框架
链接:https://arxiv.org/abs/2508.10935

作者:abei Li, Hongsong Wang, Lei He
摘要:传统的闭集3D检测框架无法满足自动驾驶等开放世界应用的需求。现有的开放词汇3D检测方法通常采用两阶段流水线,包括伪标签生成和随后的语义对齐。虽然视觉语言模型(VLM)最近显著提高了伪标签的语义准确性,但其几何质量,特别是边界框精度,仍普遍被忽视。为解决这一问题,我们提出了一个高框质量开放词汇3D检测(HQ-OV3D)框架,致力于为开放词汇类别生成并精炼高质量的伪标签。该框架包括两个关键组件:一个利用跨模态几何一致性生成高质量初始3D提议的模态内交叉验证(IMCV)提议生成器,以及一个通过基于DDIM的去噪机制、利用已标注类别的几何先验逐步精炼3D提议的已标注类别辅助(ACA)去噪器。与最先进的方法相比,使用我们方法生成的伪标签进行训练,在新类别上的mAP提高了7.37%,证明了我们框架所生成伪标签的卓越质量。HQ-OV3D不仅可以作为强大的独立开放词汇3D检测器,还可以作为现有开放词汇检测或标注流水线的即插即用高质量伪标签生成器。
摘要:Traditional closed-set 3D detection frameworks fail to meet the demands of open-world applications like autonomous driving. Existing open-vocabulary 3D detection methods typically adopt a two-stage pipeline consisting of pseudo-label generation followed by semantic alignment. While vision-language models (VLMs) recently have dramatically improved the semantic accuracy of pseudo-labels, their geometric quality, particularly bounding box precision, remains commonly neglected. To address this issue, we propose a High Box Quality Open-Vocabulary 3D Detection (HQ-OV3D) framework, dedicated to generating and refining high-quality pseudo-labels for open-vocabulary classes. The framework comprises two key components: an Intra-Modality Cross-Validated (IMCV) Proposal Generator that utilizes cross-modality geometric consistency to generate high-quality initial 3D proposals, and an Annotated-Class Assisted (ACA) Denoiser that progressively refines 3D proposals by leveraging geometric priors from annotated categories through a DDIM-based denoising mechanism. Compared to the state-of-the-art method, training with pseudo-labels generated by our approach achieves a 7.37% improvement in mAP on novel classes, demonstrating the superior quality of the pseudo-labels produced by our framework. HQ-OV3D can serve not only as a strong standalone open-vocabulary 3D detector but also as a plug-in high-quality pseudo-label generator for existing open-vocabulary detection or annotation pipelines.


【5】Modeling and Detecting Company Risks from News: A Case Study in Bloomberg News
标题:从新闻中识别公司风险的模型与方法--以彭博新闻社为例
链接:https://arxiv.org/abs/2508.10927

作者:i, Soumya Vadlamannati, Liang-Kang Huang, Daniel Preotiuc-Pietro, Xinyu Hua
备注:None
摘要 :识别与公司相关的风险对投资者和整个金融市场的健康至关重要。在这项研究中,我们建立了一个计算框架,自动提取公司的风险因素,从新闻文章。我们新提出的方案包括七个不同的方面,如供应链,法规和竞争。我们对744篇新闻文章进行了采样和注释,并对各种机器学习模型进行了基准测试。虽然大型语言模型在各种类型的NLP任务中取得了巨大的进步,但我们的实验表明,zero-shot和Few-Shot提示最先进的LLM(例如LLaMA-2)只能在识别风险因素方面实现中等至低的性能。经过微调的预先训练的语言模型在大多数风险因素上表现更好。使用这个模型,我们分析了超过277 K的彭博新闻文章,并证明从新闻中识别风险因素可以为公司和行业的运营提供广泛的见解。
摘要:Identifying risks associated with a company is important to investors and the well-being of the overall financial market. In this study, we build a computational framework to automatically extract company risk factors from news articles. Our newly proposed schema comprises seven distinct aspects, such as supply chain, regulations, and competitions. We sample and annotate 744 news articles and benchmark various machine learning models. While large language models have achieved huge progress in various types of NLP tasks, our experiment shows that zero-shot and few-shot prompting state-of-the-art LLMs (e.g. LLaMA-2) can only achieve moderate to low performances in identifying risk factors. Fine-tuned pre-trained language models perform better on most of the risk factors. Using this model, we analyze over 277K Bloomberg news articles and demonstrate that identifying risk factors from news could provide extensive insight into the operations of companies and industries.


【6】CleanCTG: A Deep Learning Model for Multi-Artefact Detection and Reconstruction in Cardiotocography
标题:CleanCTG:用于胎心监护(CTG)中多伪影检测与重建的深度学习模型
链接:https://arxiv.org/abs/2508.10928

作者:g, Beth Albert, Gabriel Davis Jones
摘要:胎儿监护(CTG)是必不可少的,但经常受到各种伪影的影响,这些伪影模糊了真实的胎儿心率(FHR)模式,并可能导致误诊或延迟干预。目前的深度学习方法通常绕过全面的噪声处理,应用最小的预处理或仅关注下游分类,而传统方法依赖于简单的插值或基于规则的过滤,仅解决丢失的样本,无法纠正复杂的伪影类型。我们提出了CleanCTG,一个端到端的双阶段模型,首先通过多尺度卷积和上下文感知的交叉注意识别多个伪影类型,然后通过伪影特定的校正分支重建损坏的片段。培训使用了超过800,000分钟的生理上真实的,合成损坏的CTG,这些CTG来自专家验证的“干净”录音。在合成数据上,CleanCTG实现了完美的伪影检测(AU-ROC = 1.00),并将损坏片段的均方误差(MSE)降低到2.74 x 10^-4(干净片段的MSE = 2.40 x 10^-6),比第二好的方法高出60%以上。对10,190分钟临床医生注释片段的外部验证产生AU-ROC = 0.95(灵敏度= 83.44%,特异性94.22%),超过了6个比较分类器。最后,当与Dawes-Redman系统集成在933个临床CTG记录上时,去噪痕迹增加了特异性(从80.70%到82.70%),并将中位决策时间缩短了33%。这些研究结果表明,明确的伪影去除和信号重建既可以保持诊断的准确性,使监测时间更短,提供了一个实用的路线,以更可靠的CTG解释。
摘要:Cardiotocography (CTG) is essential for fetal monitoring but is frequently compromised by diverse artefacts which obscure true fetal heart rate (FHR) patterns and can lead to misdiagnosis or delayed intervention. Current deep-learning approaches typically bypass comprehensive noise handling, applying minimal preprocessing or focusing solely on downstream classification, while traditional methods rely on simple interpolation or rule-based filtering that addresses only missing samples and fail to correct complex artefact types. We present CleanCTG, an end-to-end dual-stage model that first identifies multiple artefact types via multi-scale convolution and context-aware cross-attention, then reconstructs corrupted segments through artefact-specific correction branches. Training utilised over 800,000 minutes of physiologically realistic, synthetically corrupted CTGs derived from expert-verified "clean" recordings. On synthetic data, CleanCTG achieved perfect artefact detection (AU-ROC = 1.00) and reduced mean squared error (MSE) on corrupted segments to 2.74 x 10^-4 (clean-segment MSE = 2.40 x 10^-6), outperforming the next best method by more than 60%. External validation on 10,190 minutes of clinician-annotated segments yielded AU-ROC = 0.95 (sensitivity = 83.44%, specificity 94.22%), surpassing six comparator classifiers. Finally, when integrated with the Dawes-Redman system on 933 clinical CTG recordings, denoised traces increased specificity (from 80.70% to 82.70%) and shortened median time to decision by 33%. These findings suggest that explicit artefact removal and signal reconstruction can both maintain diagnostic accuracy and enable shorter monitoring sessions, offering a practical route to more reliable CTG interpretation.


【7】Trees Assembling Mann Whitney Approach for Detecting Genome-wide Joint Association among Low Marginal Effect loci
标题:树组装Mann Whitney方法检测低边缘效应基因座之间的全基因组联合关联
链接:https://arxiv.org/abs/1505.01206

作者:i Wei, Daniel J. Schaid, Qing Lu
备注:None
摘要:常见的复杂疾病可能受到数百甚至数千种遗传变异相互作用的影响。越来越多的证据表明,具有低边际效应(LME)的遗传变异在疾病发展中起着重要作用。尽管它们具有潜在的意义,但发现LME遗传变异并评估它们在高维数据(例如全基因组关联研究)上的联合关联仍然是一个巨大的挑战。为了促进大规模LME遗传变异集合之间的联合关联分析,我们提出了一种计算高效且功效强大的方法,我们称之为Trees Assembling Mann Whitney(TAMW)。通过模拟研究和实证数据应用,我们发现,当潜在的复杂疾病涉及多个LME基因座及其相互作用时,TAMW优于多因子降维(MDR)和基于似然比的Mann Whitney方法(LRMW)。例如,在20个相互作用的LME位点的模拟中,TAMW获得了比MDR(功效=0.599)和LRMW(功效=0.704)更高的功效(功效=0.931)。在一项对29个已知克罗恩病(CD)基因座的实证研究中,TAMW也发现了比MDR和LRMW检测到的更强的与CD的联合关联。最后,我们将TAMW应用于Wellcome Trust CD GWAS以进行全基因组分析。使用并行计算在40小时内完成了459K个单核苷酸多态性的分析,并揭示了诱发CD的联合关联(p值=2.763e-19)。进一步分析发现,ATG16L1和LACC1等13个基因可能在CD的病理生理和病因学过程中发挥重要作用。
摘要:Common complex diseases are likely influenced by the interplay of hundreds, or even thousands, of genetic variants. Converging evidence shows that genetic variants with low marginal effects (LME) play an important role in disease development. Despite their potential significance, discovering LME genetic variants and assessing their joint association on high dimensional data (e.g., genome wide association studies) remain a great challenge. To facilitate joint association analysis among a large ensemble of LME genetic variants, we proposed a computationally efficient and powerful approach, which we call Trees Assembling Mann whitney (TAMW). Through simulation studies and an empirical data application, we found that TAMW outperformed multifactor dimensionality reduction (MDR) and the likelihood ratio based Mann whitney approach (LRMW) when the underlying complex disease involves multiple LME loci and their interactions. For instance, in a simulation with 20 interacting LME loci, TAMW attained a higher power (power=0.931) than both MDR (power=0.599) and LRMW (power=0.704). In an empirical study of 29 known Crohn's disease (CD) loci, TAMW also identified a stronger joint association with CD than those detected by MDR and LRMW. Finally, we applied TAMW to Wellcome Trust CD GWAS to conduct a genome wide analysis. The analysis of 459K single nucleotide polymorphisms was completed in 40 hours using parallel computing, and revealed a joint association predisposing to CD (p-value=2.763e-19). Further analysis of the newly discovered association suggested that 13 genes, such as ATG16L1 and LACC1, may play an important role in CD pathophysiological and etiological processes.
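作为TAMW中基本打分单元的一个最小示意(数据为随机合成的假设示例,并非论文实现),可以用Mann-Whitney U检验衡量病例/对照两组风险得分的分布差异:

```python
# 示意:用Mann-Whitney U检验比较病例/对照两组的(模拟)风险得分
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
cases = rng.normal(0.2, 1.0, 500)      # 病例组风险得分(模拟)
controls = rng.normal(0.0, 1.0, 500)   # 对照组风险得分(模拟)

u_stat, p_value = mannwhitneyu(cases, controls, alternative="greater")
print(f"U={u_stat:.1f}, p={p_value:.3e}")
```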


分类|识别(4篇)

【1】Investigating Sensors and Methods in Grasp State Classification in Agricultural Manipulation
标题:农业操作抓取状态分类中的传感器和方法研究
链接:https://arxiv.org/abs/2508.11588

作者:Walt, Jordan Westphal, Girish Krishnan
摘要:有效和高效的农业操作和收获取决于准确理解当前的抓持状态。农业环境由于其复杂性、杂乱性和闭塞性而呈现出独特的挑战。此外,果实在物理上附着在植物上,需要在收获期间精确分离。选择合适的传感器和建模技术是获得可靠的反馈和正确识别抓取状态的关键。这项工作研究了一组关键的传感器,即惯性测量单元(IMU),红外(IR)反射,张力,触觉传感器和RGB摄像头,集成到一个兼容的夹持器分类把握状态。我们评估了每个传感器的单独贡献,并比较了两种广泛使用的分类模型的性能:随机森林和长短期记忆(LSTM)网络。我们的研究结果表明,在受控实验室环境中训练并在真正的樱桃番茄植物上测试的随机森林分类器在识别滑动,抓取失败和成功采摘方面实现了100%的准确性,标志着基线性能的大幅改善。此外,我们确定了一个最小可行的传感器组合,即IMU和张力传感器,有效地分类把握状态。该分类器能够根据实时反馈规划纠正措施,从而提高水果收获操作的效率和可靠性。
摘要:Effective and efficient agricultural manipulation and harvesting depend on accurately understanding the current state of the grasp. The agricultural environment presents unique challenges due to its complexity, clutter, and occlusion. Additionally, fruit is physically attached to the plant, requiring precise separation during harvesting. Selecting appropriate sensors and modeling techniques is critical for obtaining reliable feedback and correctly identifying grasp states. This work investigates a set of key sensors, namely inertial measurement units (IMUs), infrared (IR) reflectance, tension, tactile sensors, and RGB cameras, integrated into a compliant gripper to classify grasp states. We evaluate the individual contribution of each sensor and compare the performance of two widely used classification models: Random Forest and Long Short-Term Memory (LSTM) networks. Our results demonstrate that a Random Forest classifier, trained in a controlled lab environment and tested on real cherry tomato plants, achieved 100% accuracy in identifying slip, grasp failure, and successful picks, marking a substantial improvement over baseline performance. Furthermore, we identify a minimal viable sensor combination, namely IMU and tension sensors that effectively classifies grasp states. This classifier enables the planning of corrective actions based on real-time feedback, thereby enhancing the efficiency and reliability of fruit harvesting operations.
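按照摘要中"IMU+张力即最小可行传感器组合"的思路,下面是一个随机森林抓取状态分类器的最小示意(特征维度与数据均为随机合成的假设示例,并非论文的传感器数据):

```python
# 示意:用随机森林对(模拟的)IMU+张力特征进行抓取状态三分类
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 7))          # 6维IMU + 1维张力(假设)
y = rng.integers(0, 3, size=600)       # 0=滑移, 1=抓取失败, 2=成功采摘

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```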


【2】Conformal Prediction Meets Long-tail Classification
标题:保形预测满足长尾分类
链接:https://arxiv.org/abs/2508.11345

作者:, Jianguo Huang, Luke Ong
摘要:共形预测(CP)是一种流行的不确定性量化方法,它将预训练模型的点预测转换为预测集,集合大小反映模型的置信度。虽然现有的CP方法保证实现边缘覆盖,但在长尾标签分布下,它们通常表现出跨类的不平衡覆盖,倾向于以尾类覆盖不足为代价过度覆盖头部类。这种覆盖不足尤其令人担忧,因为即使平均覆盖率有保证,它也会破坏少数类预测集的可靠性。在本文中,我们提出了尾部感知共形预测(Tail-Aware Conformal Prediction, TACP)方法,利用长尾结构缩小头-尾覆盖差距,以减轻尾类的覆盖不足。理论分析表明,它始终比标准方法实现更小的头-尾覆盖差距。为了进一步改善所有类的覆盖平衡,我们引入了TACP的扩展:基于重新加权机制的软TACP(sTACP)。所提出的框架可以与各种非一致性分数相结合,在多个长尾基准数据集上的实验证明了我们方法的有效性。
摘要:Conformal Prediction (CP) is a popular method for uncertainty quantification that converts a pretrained model's point prediction into a prediction set, with the set size reflecting the model's confidence. Although existing CP methods are guaranteed to achieve marginal coverage, they often exhibit imbalanced coverage across classes under long-tail label distributions, tending to over cover the head classes at the expense of under covering the remaining tail classes. This under coverage is particularly concerning, as it undermines the reliability of the prediction sets for minority classes, even with coverage ensured on average. In this paper, we propose the Tail-Aware Conformal Prediction (TACP) method to mitigate the under coverage of the tail classes by utilizing the long-tail structure and narrowing the head-tail coverage gap. Theoretical analysis shows that it consistently achieves a smaller head-tail coverage gap than standard methods. To further improve coverage balance across all classes, we introduce an extension of TACP: soft TACP (sTACP) via a reweighting mechanism. The proposed framework can be combined with various non-conformity scores, and experiments on multiple long-tail benchmark datasets demonstrate the effectiveness of our methods.
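作为背景,下面给出标准分裂共形预测中预测集构造的一个最小示意(数据为随机生成的假设示例);TACP在此基础上利用长尾结构对非一致性分数做尾部感知的调整,此处未实现:

```python
# 示意:标准分裂共形预测的预测集构造(TACP的重加权不在此例中)
import numpy as np

def conformal_sets(cal_scores, test_probs, alpha=0.1):
    """cal_scores: 校准集上真实类的非一致性分数(如 1 - p_y);
    test_probs: (m, K) 测试样本的类别概率。"""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(cal_scores, level, method="higher")
    return [np.where(1 - p <= q)[0] for p in test_probs]  # 每个样本的预测集

rng = np.random.default_rng(0)
cal = rng.uniform(0, 1, 500)                  # 模拟的校准分数
probs = rng.dirichlet(np.ones(10), size=5)    # 模拟的测试概率
print([s.tolist() for s in conformal_sets(cal, probs)])
```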


【3】A Cooperative Game-Based Multi-Criteria Weighted Ensemble Approach for Multi-Class Classification
标题:一种基于合作博弈的多准则加权集成多类分类方法
链接:https://arxiv.org/abs/2508.10926

作者:-Yoon
备注:English translation of the author's pre-revision version of the article published in J-KICS 50(4):561-571 (2025), DOI 10.7840/kics.2025.50.4.561. Posted with permission from KICS (Aug 7, 2025). The published version may differ
摘要:自第四次工业革命以来,人工智能技术在许多领域得到了广泛的应用,但也存在一些需要克服的局限性,包括过拟合/欠拟合、类别不平衡以及由于不同模型的特性而导致的表示(假设空间)的局限性。作为克服这些问题的一种方法,集成(通常称为模型组合)正在机器学习领域得到广泛应用。在集成学习方法中,针对投票集成已研究了多种加权方法,并表现出性能改进。然而,现有的在权重中反映分类器先验信息的方法只考虑了一个评价标准,限制了模型真实地反映各种应考虑的信息。因此,本文提出了一种在多准则情况下通过合作博弈考虑各种信息进行决策的方法。该方法可以同时考虑和反映分类器中预先已知的各种信息,从而合理分配权值、提高性能。我们将机器学习算法应用于Open-ML-CC18数据集,并与现有的集成加权方法进行比较。实验结果表明,与其他加权方法相比,该方法具有更好的性能。
摘要:Since the Fourth Industrial Revolution, AI technology has been widely used in many fields, but there are several limitations that need to be overcome, including overfitting/underfitting, class imbalance, and the limitations of representation (hypothesis space) due to the characteristics of different models. As a method to overcome these problems, ensemble, commonly known as model combining, is being extensively used in the field of machine learning. Among ensemble learning methods, voting ensembles have been studied with various weighting methods, showing performance improvements. However, the existing methods that reflect the pre-information of classifiers in weights consider only one evaluation criterion, which limits the reflection of various information that should be considered in a model realistically. Therefore, this paper proposes a method of making decisions considering various information through cooperative games in multi-criteria situations. Using this method, various types of information known beforehand in classifiers can be simultaneously considered and reflected, leading to appropriate weight distribution and performance improvement. The machine learning algorithms were applied to the Open-ML-CC18 dataset and compared with existing ensemble weighting methods. The experimental results showed superior performance compared to other weighting methods.
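合作博弈中常用Shapley值来分摊各"参与者"的贡献;下面是一个小规模博弈中精确计算Shapley值的最小示意(特征函数与准则得分均为假设,仅用于演示把多个评价准则的贡献折算为权重的思路,并非论文的具体博弈解):

```python
# 示意:小规模合作博弈的精确Shapley值计算
from itertools import combinations
from math import factorial

def shapley(players, v):
    """players: 参与者列表;v: 特征函数,输入frozenset返回联盟价值。"""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(S | {p}) - v(S))   # p的边际贡献加权平均
        phi[p] = total
    return phi

# 假设的特征函数:联盟价值为成员"准则得分"之和(纯演示)
scores = {"acc": 0.90, "f1": 0.85, "auc": 0.88}
print(shapley(list(scores), lambda S: sum(scores[p] for p in S)))
```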


【4】Repetitive TMS-based Identification of Methamphetamine-Dependent Individuals Using EEG Spectra
标题:基于重复TMS、利用脑电频谱识别甲基苯丙胺依赖者
链接:https://arxiv.org/abs/2508.11312

作者:, Yun-Hsuan Chen, Xurong Gao, Wenyao Zheng, Hemmings Wu, Zhoule Zhu, Jie Yang, Chengkai Wang, Lihua Zhong, Weiwei Cheng, Mohamad Sawan
备注:10 pages, 9 figures
摘要:重复经颅磁刺激(rTMS)对甲基苯丙胺(METH)使用者渴求水平的影响通常使用问卷进行评估。本研究探讨了利用神经信号获得更客观结果的可行性。我们分析了20名METH成瘾参与者在rTMS之前和之后(MBT和MAT)以及20名健康参与者(HC)的EEG信号。在每个脑电范式中,被试被随机展示15幅与METH相关的图片和15幅中性图片,并推导出每个EEG子带频率的相对频带功率(RBP)。分析了所有31个通道以及各个大脑区域的平均RBP。从统计学上看,功率拓扑图显示,与MBT相比,MAT的α、β和γ RBP更接近HC。利用随机森林(RF),γ RBP被确定为以90%准确率区分MBT与HC的最佳频带。MAT与HC的分类性能低于MBT与HC的分类性能,表明rTMS的有效性可以使用基于γ RBP的RF进行验证。此外,在接收METH相关图像线索时,由TP10和CP2通道记录的γ RBP主导MBT与HC的分类任务。暴露于METH相关线索期间的γ RBP可以作为区分MBT和HC以及评估rTMS有效性的生物标志物。因此,实时监测γ RBP变化有望作为实施定制化闭环神经调控系统以治疗METH成瘾的参数。
摘要:The impact of repetitive transcranial magnetic stimulation (rTMS) on methamphetamine (METH) users' craving levels is often assessed using questionnaires. This study explores the feasibility of using neural signals to obtain more objective results. EEG signals recorded from 20 METH-addicted participants Before and After rTMS (MBT and MAT) and from 20 healthy participants (HC) are analyzed. In each EEG paradigm, participants are shown 15 METH-related and 15 neutral pictures randomly, and the relative band power (RBP) of each EEG sub-band frequency is derived. The average RBP across all 31 channels, as well as individual brain regions, is analyzed. Statistically, MAT's alpha, beta, and gamma RBPs are more like those of HC compared to MBT, as indicated by the power topographies. Utilizing a random forest (RF), the gamma RBP is identified as the optimal frequency band for distinguishing between MBT and HC with a 90% accuracy. The performance of classifying MAT versus HC is lower than that of MBT versus HC, suggesting that the efficacy of rTMS can be validated using RF with gamma RBP. Furthermore, the gamma RBP recorded by the TP10 and CP2 channels dominates the classification task of MBT versus HC when receiving METH-related image cues. The gamma RBP during exposure to METH-related cues can serve as a biomarker for distinguishing between MBT and HC and for evaluating the effectiveness of rTMS. Therefore, real-time monitoring of gamma RBP variations holds promise as a parameter for implementing a customized closed-loop neuromodulation system for treating METH addiction.
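相对频带功率(RBP)的计算可以用Welch功率谱估计实现;下面是一个最小示意(采样率、通道数与频带划分均为假设,信号为随机模拟):

```python
# 示意:用Welch功率谱估计计算多通道EEG的相对频带功率(RBP)
import numpy as np
from scipy.signal import welch

fs = 250.0                                   # 假设采样率 (Hz)
eeg = np.random.randn(31, int(fs * 60))      # 31通道、60秒的模拟信号
bands = {"alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

f, psd = welch(eeg, fs=fs, nperseg=int(fs * 2))      # psd形状: (通道, 频点)
total = psd[:, (f >= 1) & (f <= 45)].sum(axis=1)     # 1-45 Hz总功率
for name, (lo, hi) in bands.items():
    rbp = psd[:, (f >= lo) & (f < hi)].sum(axis=1) / total
    print(name, "RBP(全通道均值):", round(float(rbp.mean()), 3))
```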


优化|敛散性(5篇)

【1】Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a game theoretic perspective
标题:考虑多利益相关者多站点碳捕获项目安全约束的最佳二氧化碳储存管理:博弈论的视角
链接:https://arxiv.org/abs/2508.11618

作者:hen, Seyyed A. Hosseini
备注:38 pages, 16 figures
摘要:碳捕获和储存(CCS)项目通常涉及来自公共、私营和监管部门的各种利益相关者或参与者,每个利益相关者或参与者都有不同的目标和责任。考虑到CCS运营的复杂性、规模和长期性,确定单个利益相关者是否能够独立地最大化其利益,或者是否需要合作联盟协议,仍然是有效的CCS项目规划和管理的一个核心问题。CCS项目通常在地质上相连的地点实施,在这些地点,共同的地质特征,如压力空间和储层孔隙容量,可能导致利益相关者之间的竞争行为。此外,CO2储存地点往往位于地质成熟的盆地,这些盆地以前是碳氢化合物提取或废水处理的地点,以便利用现有的基础设施,这使得单方面的优化更加复杂和不现实。   在这项工作中,我们提出了一个基于马尔可夫博弈的范式来定量研究不同的联盟结构如何影响利益相关者的目标。我们将这个多利益相关者多站点问题框架为具有安全约束的多代理强化学习问题。我们的方法使代理能够学习最佳策略,同时遵守安全法规。我们提出了一个例子,多个运营商注入二氧化碳到各自的项目领域在地质连接盆地。为了解决高保真模型重复仿真的高计算成本,采用了先前开发的基于嵌入到控制(E2C)框架的代理模型。我们的研究结果表明,当涉及多个利益相关者具有不同的目标和目的时,所提出的框架在解决二氧化碳储存的最佳管理方面的有效性。
摘要:Carbon capture and storage (CCS) projects typically involve a diverse array of stakeholders or players from public, private, and regulatory sectors, each with different objectives and responsibilities. Given the complexity, scale, and long-term nature of CCS operations, determining whether individual stakeholders can independently maximize their interests or whether collaborative coalition agreements are needed remains a central question for effective CCS project planning and management. CCS projects are often implemented in geologically connected sites, where shared geological features such as pressure space and reservoir pore capacity can lead to competitive behavior among stakeholders. Furthermore, CO2 storage sites are often located in geologically mature basins that previously served as sites for hydrocarbon extraction or wastewater disposal in order to leverage existing infrastructures, which makes unilateral optimization even more complicated and unrealistic.   In this work, we propose a paradigm based on Markov games to quantitatively investigate how different coalition structures affect the goals of stakeholders. We frame this multi-stakeholder multi-site problem as a multi-agent reinforcement learning problem with safety constraints. Our approach enables agents to learn optimal strategies while compliant with safety regulations. We present an example where multiple operators are injecting CO2 into their respective project areas in a geologically connected basin. To address the high computational cost of repeated simulations of high-fidelity models, a previously developed surrogate model based on the Embed-to-Control (E2C) framework is employed. Our results demonstrate the effectiveness of the proposed framework in addressing optimal management of CO2 storage when multiple stakeholders with various objectives and goals are involved.


【2】Minimizing Surrogate Losses for Decision-Focused Learning using Differentiable Optimization
标题:基于可微优化的决策聚焦学习代理损失最小化
链接:https://arxiv.org/abs/2508.11365

作者:andi, Ali İrfan Mahmutoğulları, Senne Berden, Tias Guns
摘要:决策聚焦学习(DFL)训练机器学习(ML)模型来预测优化问题的参数,以直接最小化决策遗憾,即,最大化决策质量。基于导数的DFL需要计算优化问题的解相对于预测参数的导数。然而,对于许多优化问题,如线性规划(LP),遗憾的梯度相对于预测参数几乎处处为零。现有的基于梯度的LP DFL方法试图以两种方式之一来规避这个问题:(a)通过添加二次正则化器将LP平滑为可微优化问题,然后直接最小化遗憾或(b)最小化具有信息(子)梯度的代理损失。在本文中,我们表明,前一种方法仍然导致零梯度,因为即使在平滑后的遗憾仍然保持不变的参数空间的大区域。为了解决这个问题,我们建议最小化代理损失-即使使用可微优化层,并且可以直接最小化遗憾。我们的实验表明,最大限度地减少代理损失允许可微优化层,以实现相当于或优于代理损失为基础的DFL方法的遗憾。此外,我们证明了这也适用于DYS-Net,这是最近提出的LP的可微分优化技术,它通过可以使用前馈神经网络层执行的操作来计算近似解和梯度。由于DYS-Net非常有效地执行向前和向后传递,通过使用DYS-Net最小化代理损失,我们能够获得与最先进水平相当的遗憾,同时大幅减少训练时间。
摘要:Decision-focused learning (DFL) trains a machine learning (ML) model to predict parameters of an optimization problem, to directly minimize decision regret, i.e., maximize decision quality. Gradient-based DFL requires computing the derivative of the solution to the optimization problem with respect to the predicted parameters. However, for many optimization problems, such as linear programs (LPs), the gradient of the regret with respect to the predicted parameters is zero almost everywhere. Existing gradient-based DFL approaches for LPs try to circumvent this issue in one of two ways: (a) smoothing the LP into a differentiable optimization problem by adding a quadratic regularizer and then minimizing the regret directly or (b) minimizing surrogate losses that have informative (sub)gradients. In this paper, we show that the former approach still results in zero gradients, because even after smoothing the regret remains constant across large regions of the parameter space. To address this, we propose minimizing surrogate losses -- even when a differentiable optimization layer is used and regret can be minimized directly. Our experiments demonstrate that minimizing surrogate losses allows differentiable optimization layers to achieve regret comparable to or better than surrogate-loss based DFL methods. Further, we demonstrate that this also holds for DYS-Net, a recently proposed differentiable optimization technique for LPs, that computes approximate solutions and gradients through operations that can be performed using feedforward neural network layers. Because DYS-Net executes the forward and the backward pass very efficiently, by minimizing surrogate losses using DYS-Net, we are able to attain regret on par with the state-of-the-art while reducing training time by a significant margin.
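具有信息性(次)梯度的代理损失中,最典型的是SPO+损失;下面在概率单纯形可行域上的线性规划决策上给出其计算的最小示意(可行域与数值均为假设,并非论文或DYS-Net的实现):

```python
# 示意:线性规划决策 min_w c^T w 的SPO+代理损失
import numpy as np
from scipy.optimize import linprog

A_eq, b_eq = np.ones((1, 3)), np.array([1.0])   # 可行域:三维概率单纯形(假设)

def solve(c):
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3).x

def spo_plus(c_hat, c_true):
    w_star = solve(c_true)                  # 真实代价下的最优决策 w*(c)
    w_tilde = solve(2 * c_hat - c_true)     # max (c-2ĉ)^T w 等价于 min (2ĉ-c)^T w
    return ((c_true - 2 * c_hat) @ w_tilde
            + 2 * c_hat @ w_star - c_true @ w_star)

print(spo_plus(np.array([1.0, 2.0, 3.0]), np.array([2.0, 1.0, 3.0])))
```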


【3】Quantization through Piecewise-Affine Regularization: Optimization and Statistical Guarantees
标题:通过分段仿射正则化进行量化:优化与统计保证
链接:https://arxiv.org/abs/2508.11112

作者:a, Lin Xiao
摘要:离散变量或量化变量上的优化问题由于其搜索空间的组合性质而通常非常具有挑战性。分段仿射正则化(PAR)为基于连续优化的量化提供了灵活的建模和计算框架。在这项工作中,我们专注于监督学习的设定,并从优化和统计的角度研究PAR的理论基础。首先,我们表明,在参数数量超过样本数量的过参数化情形下,PAR正则化损失函数的每个临界点都表现出高度的量化。其次,我们推导出各种(凸、准凸、非凸)PAR的闭式近端映射,并展示了如何使用近端梯度法、其加速变体以及交替方向乘子法(ADMM)来求解PAR正则化问题。第三,我们研究PAR正则化线性回归问题的统计保证;具体来说,我们可以使用PAR近似经典的$\ell_1$、平方$\ell_2$和非凸正则化形式,并在量化解下获得类似的统计保证。
摘要:Optimization problems over discrete or quantized variables are very challenging in general due to the combinatorial nature of their search space. Piecewise-affine regularization (PAR) provides a flexible modeling and computational framework for quantization based on continuous optimization. In this work, we focus on the setting of supervised learning and investigate the theoretical foundations of PAR from optimization and statistical perspectives. First, we show that in the overparameterized regime, where the number of parameters exceeds the number of samples, every critical point of the PAR-regularized loss function exhibits a high degree of quantization. Second, we derive closed-form proximal mappings for various (convex, quasi-convex, and non-convex) PARs and show how to solve PAR-regularized problems using the proximal gradient method, its accelerated variant, and the Alternating Direction Method of Multipliers. Third, we study statistical guarantees of PAR-regularized linear regression problems; specifically, we can approximate classical formulations of $\ell_1$-, squared $\ell_2$-, and nonconvex regularizations using PAR and obtain similar statistical guarantees with quantized solutions.
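近端梯度法求解正则化问题的一般套路如下;这里以ℓ1的软阈值近端映射为例(ℓ1本身就是分段仿射的)给出最小示意,论文中各类PAR的闭式近端映射此处未复现:

```python
# 示意:近端梯度法求解正则化最小二乘(以ℓ1软阈值为近端映射示例)
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad(A, b, lam=0.1, iters=200):
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L,L为梯度的Lipschitz常数
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                # 光滑项梯度
        x = soft_threshold(x - step * grad, step * lam)   # 近端步
    return x

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 20)), rng.normal(size=50)
print(np.round(prox_grad(A, b, lam=0.5), 3))
```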


【4】Uniform convergence for Gaussian kernel ridge regression
标题:高斯核岭回归的一致收敛
链接:https://arxiv.org/abs/2508.11274

作者:el, Rajmadan Lakshmanan
摘要:本文建立了超参数固定的高斯核岭回归(KRR)在一致范数和$L^{2}$范数下的首个多项式收敛速度。一致收敛的结果填补了高斯核KRR理论理解上的空白:此前尚无此类收敛速度的结果。另外,在高斯核宽度参数固定的情况下,我们证明了多项式$L^{2}$-收敛速度。这也有助于更广泛地理解光滑核:此前在类似设定下只有次多项式的$L^{2}$-收敛速度是已知的。总之,这些结果为在非参数回归中使用具有固定超参数的高斯KRR提供了新的理论依据。
摘要:This paper establishes the first polynomial convergence rates for Gaussian kernel ridge regression (KRR) with a fixed hyperparameter in both the uniform and the $L^{2}$-norm. The uniform convergence result closes a gap in the theoretical understanding of KRR with the Gaussian kernel, where no such rates were previously known. In addition, we prove a polynomial $L^{2}$-convergence rate in the case, where the Gaussian kernel's width parameter is fixed. This also contributes to the broader understanding of smooth kernels, for which previously only sub-polynomial $L^{2}$-rates were known in similar settings. Together, these results provide new theoretical justification for the use of Gaussian KRR with fixed hyperparameters in nonparametric regression.
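固定超参数的高斯核岭回归有闭式解 alpha = (K + n*lambda*I)^{-1} y;下面是一个最小示意(带宽与正则化强度均为假设取值,数据为模拟):

```python
# 示意:固定带宽与正则化强度的高斯核岭回归
import numpy as np

def gaussian_kernel(X, Y, width=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

lam, width = 1e-3, 0.7                      # 固定超参数(呼应论文设定)
K = gaussian_kernel(X, X, width)
alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)

X_test = np.linspace(-3, 3, 5)[:, None]
print(np.round(gaussian_kernel(X_test, X, width) @ alpha, 3))
```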


【5】Non-asymptotic convergence bound of conditional diffusion models
标题:条件扩散模型的非渐进收敛界
链接:https://arxiv.org/abs/2508.10944

作者
摘要:基于条件扩散模型的各类数据的学习和生成是近年来的研究热点。虽然条件扩散模型在改进加速算法和提高生成质量方面取得了长足的进步,但其非渐近性质的缺乏阻碍了理论研究。为了解决这一问题,我们将重点放在分类和回归(CARD)领域内的条件扩散模型上,该模型旨在学习给定输入x时的原始分布(记为Y|X)。它创新地将预训练模型f_{\phi}(x)集成到原始扩散模型框架中,使其能够精确地捕获给定f_{\phi}(x)时的原始条件分布(记为Y|f_{\phi}(x))。值得注意的是,当f_{\phi}(x)表现令人满意时,Y|f_{\phi}(x)近似于Y|X。在理论上,我们推导了CARD的随机微分方程,建立了以Fokker-Planck方程为基础的广义CARD形式,为分析CARD奠定了坚实的理论基础。主要在Lipschitz假设下,我们利用二阶Wasserstein距离证明了原始条件分布与生成条件分布之间误差的上界。此外,通过对原始分布附加轻尾等假设,我们推导出与得分函数类似的真实量和相应网络估计值之间的收敛上界。
摘要:Learning and generating various types of data based on conditional diffusion models has been a research hotspot in recent years. Although conditional diffusion models have made considerable progress in improving acceleration algorithms and enhancing generation quality, the lack of non-asymptotic properties has hindered theoretical research. To address this gap, we focus on a conditional diffusion model within the domains of classification and regression (CARD), which aims to learn the original distribution with given input x (denoted as Y|X). It innovatively integrates a pre-trained model f_{\phi}(x) into the original diffusion model framework, allowing it to precisely capture the original conditional distribution given f (expressed as Y|f_{\phi}(x)). Remarkably, when f_{\phi}(x) performs satisfactorily, Y|f_{\phi}(x) closely approximates Y|X. Theoretically, we deduce the stochastic differential equations of CARD and establish its generalized form predicated on the Fokker-Planck equation, thereby erecting a firm theoretical foundation for analysis. Mainly under the Lipschitz assumptions, we utilize the second-order Wasserstein distance to demonstrate the upper error bound between the original and the generated conditional distributions. Additionally, by appending assumptions such as light-tailedness to the original distribution, we derive the convergence upper bound between the true value analogous to the score function and the corresponding network-estimated value.


预测|估计(4篇)

【1】Air Quality PM2.5 Index Prediction Model Based on CNN-LSTM
标题:基于CNN-LSTM的空气质量PM2.5指数预测模型
链接:https://arxiv.org/abs/2508.11215

作者:uo, Shuqi Wu, Meixing Zhu, He Guandi
摘要:随着全球气候变化的加剧,准确预测空气质量指标,特别是PM2.5浓度,在环境保护、公共卫生、城市管理等领域变得越来越重要。为了解决这个问题,我们提出了一个基于混合CNN-LSTM架构的空气质量PM2.5指数预测模型。该模型有效地结合了用于局部空间特征提取的卷积神经网络(CNN)和用于建模时间序列数据中的时间依赖性的长短期记忆(LSTM)网络。使用2010年至2015年期间从北京工业区收集的多变量数据集-其中包括PM2.5浓度,温度,露点,压力,风向,风速和降水的每小时记录-该模型预测了6小时间隔的平均PM2.5浓度。实验结果表明,该模型的均方根误差(RMSE)为5.236,优于传统的时间序列模型的准确性和泛化能力。这证明了它在空气污染预警系统等实际应用中的强大潜力。然而,由于多变量输入的复杂性,该模式需要高的计算资源,其处理不同的大气因素的能力仍然需要优化。未来的工作将集中在增强可扩展性和扩大对更复杂的多变量天气预测任务的支持。
摘要:With the intensification of global climate change, accurate prediction of air quality indicators, especially PM2.5 concentration, has become increasingly important in fields such as environmental protection, public health, and urban management. To address this, we propose an air quality PM2.5 index prediction model based on a hybrid CNN-LSTM architecture. The model effectively combines Convolutional Neural Networks (CNN) for local spatial feature extraction and Long Short-Term Memory (LSTM) networks for modeling temporal dependencies in time series data. Using a multivariate dataset collected from an industrial area in Beijing between 2010 and 2015 -- which includes hourly records of PM2.5 concentration, temperature, dew point, pressure, wind direction, wind speed, and precipitation -- the model predicts the average PM2.5 concentration over 6-hour intervals. Experimental results show that the model achieves a root mean square error (RMSE) of 5.236, outperforming traditional time series models in both accuracy and generalization. This demonstrates its strong potential in real-world applications such as air pollution early warning systems. However, due to the complexity of multivariate inputs, the model demands high computational resources, and its ability to handle diverse atmospheric factors still requires optimization. Future work will focus on enhancing scalability and expanding support for more complex multivariate weather prediction tasks.
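CNN提取局部特征、LSTM建模时间依赖的混合结构可按如下方式搭建;这是一个最小示意(层数、通道数等超参数均为假设,并非论文的原始配置):

```python
# 示意:一维CNN + LSTM 的混合回归模型,输入为多变量时间窗
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_features=7, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)     # 预测未来6小时平均PM2.5(假设)

    def forward(self, x):                    # x: (batch, 时间步, 特征)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)  # 卷积沿时间维
        out, _ = self.lstm(z)
        return self.head(out[:, -1])         # 取最后时刻的隐状态做回归

model = CNNLSTM()
print(model(torch.randn(8, 24, 7)).shape)    # torch.Size([8, 1])
```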


【2】Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
标题:EHR诊断和治疗的预测多模式建模
链接:https://arxiv.org/abs/2508.11092

作者:h-Ting Huang, Clarence Boon Liang Ng, Marek Rei
备注:10 pages, 1 figure
摘要:虽然ICD代码分配问题已得到广泛研究,但大多数工作都集中在出院后文档分类上。这些信息的早期预测模型可用于识别健康风险,建议有效的治疗方法或优化资源分配。为了解决在患者住院开始时使用有限信息进行预测建模的挑战,我们提出了一种多模式系统来融合电子健康记录中捕获的临床记录和表格事件。该模型集成了预训练的编码器、特征池和跨模态注意力,以学习跨模态的最佳表示,并在每个时间点平衡它们的存在。此外,我们提出了一个加权的时间损失,调整其在每个时间点的贡献。实验表明,这些策略增强了早期预测模型,优于当前最先进的系统。
摘要:While the ICD code assignment problem has been widely studied, most works have focused on post-discharge document classification. Models for early forecasting of this information could be used for identifying health risks, suggesting effective treatments, or optimizing resource allocation. To address the challenge of predictive modeling using the limited information at the beginning of a patient stay, we propose a multimodal system to fuse clinical notes and tabular events captured in electronic health records. The model integrates pre-trained encoders, feature pooling, and cross-modal attention to learn optimal representations across modalities and balance their presence at every temporal point. Moreover, we present a weighted temporal loss that adjusts its contribution at each point in time. Experiments show that these strategies enhance the early prediction model, outperforming the current state-of-the-art systems.


【3】A Feasibility Experiment on the Application of Predictive Coding to Instant Messaging Corpora
标题:预测编码在即时通信语料库中应用的可行性实验
链接:https://arxiv.org/abs/2508.11084

作者:Schoinas, Ghulam Qadir
摘要:预测编码是法律行业使用机器学习进行文档分类的术语,当数据集包括即时消息时,由于其非正式性质和较小的大小,预测编码会带来额外的挑战。在本文中,我们利用数据管理工作流将消息分组到日常聊天中,然后进行特征选择和逻辑回归分类器,以提供经济可行的预测编码解决方案。我们还通过降维提高了解决方案的基线模型性能,重点是定量特征。我们在Instant Bloomberg数据集上测试我们的方法,该数据集具有丰富的定量信息。与此同时,我们提供了我们的方法节省成本的示例。
摘要:Predictive coding, the term used in the legal industry for document classification using machine learning, presents additional challenges when the dataset comprises instant messages, due to their informal nature and smaller sizes. In this paper, we exploit a data management workflow to group messages into day chats, followed by feature selection and a logistic regression classifier to provide an economically feasible predictive coding solution. We also improve the solution's baseline model performance by dimensionality reduction, with focus on quantitative features. We test our methodology on an Instant Bloomberg dataset, rich in quantitative information. In parallel, we provide an example of the cost savings of our approach.
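文本特征提取、特征选择与逻辑回归分类器可以串成一个经济可行的预测编码流水线;下面是一个最小示意(示例文本与标签均为假设,按天分组聊天与定量特征处理未包含在内):

```python
# 示意:TF-IDF特征 + 卡方特征选择 + 逻辑回归的文档分类流水线
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

docs = ["price up 5 percent confirm trade", "lunch later?",
        "send the term sheet asap", "weekend plans"]
labels = [1, 0, 1, 0]                        # 1=相关(responsive), 0=不相关(假设)

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2, k=5)),      # 保留区分度最高的特征
    ("lr", LogisticRegression(max_iter=1000)),
]).fit(docs, labels)
print(clf.predict(["confirm the trade price"]))
```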


【4】Conditional Independence Estimates for the Generalized Nonparanormal
标题:广义nonparanormal分布的条件独立估计
链接:https://arxiv.org/abs/2508.11050

作者: (1), Manuel Lladser (1), Rebecca Morrison (1) ((1) University of Colorado Boulder)
备注:22 pages, 7 figures, 3 tables
摘要:对于一般的非高斯分布,协方差矩阵和精度矩阵并不像在多元高斯情形下那样编码变量的独立性结构。本文以之前的工作为基础,表明对于一类非高斯分布(由高斯分布经对角变换导出的分布),只要数据满足某些标准,仍然可以像高斯情形一样从精度矩阵中推断出有关条件独立结构的信息。我们将这类高斯变换后的分布称为广义nonparanormal。定义这些变换的函数在广义上是任意的。我们还提供了一个简单且计算高效的算法,利用这一理论从广义nonparanormal数据中恢复条件独立结构。通过合成实验和实际数据上的应用证明了该算法的有效性。
摘要:For general non-Gaussian distributions, the covariance and precision matrices do not encode the independence structure of the variables, as they do for the multivariate Gaussian. This paper builds on previous work to show that for a class of non-Gaussian distributions -- those derived from diagonal transformations of a Gaussian -- information about the conditional independence structure can still be inferred from the precision matrix, provided the data meet certain criteria, analogous to the Gaussian case. We call such transformations of the Gaussian as the generalized nonparanormal. The functions that define these transformations are, in a broad sense, arbitrary. We also provide a simple and computationally efficient algorithm that leverages this theory to recover conditional independence structure from the generalized nonparanormal data. The effectiveness of the proposed algorithm is demonstrated via synthetic experiments and applications to real-world data.
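经典nonparanormal方法的思路是先用经验CDF加正态分位数把每个变量边缘高斯化,再从精度矩阵读取条件独立结构;下面是一个最小示意(数据为模拟的对角变换高斯;直接取样本协方差之逆仅作演示,实际中常配合图lasso等稀疏估计,且并非论文算法本身):

```python
# 示意:秩变换高斯化 + 精度矩阵读取条件独立结构
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(3),
                            [[1, .6, 0], [.6, 1, 0], [0, 0, 1]], 2000)
X = np.exp(Z)                                  # 逐坐标(对角)变换后的非高斯数据

n = len(X)
U = rankdata(X, axis=0) / (n + 1)              # 经验CDF
G = norm.ppf(U)                                # 高斯化
precision = np.linalg.inv(np.cov(G, rowvar=False))
print(np.round(precision, 2))                  # 接近零的非对角项提示条件独立
```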


其他神经网络|深度学习|模型|建模(14篇)

【1】Multi-Sensory Cognitive Computing for Learning Population-level Brain Connectivity
标题:多感官认知计算用于学习人群级大脑连接
链接:https://arxiv.org/abs/2508.11436

作者:ussia, Mohamed Ali Mahjoub, Islem Rekik
摘要:最近,连接脑模板(CBT)的生成因其识别个体之间共享的独特连接模式的潜力而引起了极大的关注。然而,现有的CBT学习方法,如传统的机器学习和图神经网络(GNN),受到若干限制的阻碍,其中包括:(i)由于其黑盒性质,可解释性差;(ii)计算成本高;以及(iii)只关注结构和拓扑,忽略了所生成CBT的认知能力。为了应对这些挑战,我们引入了mCOCO(multi-sensory COgnitive COmputing,多感官认知计算),这是一种利用储备池计算(RC)从BOLD(血氧水平依赖)信号中学习人群水平功能性CBT的新框架。RC的动态系统特性允许跟踪状态随时间的变化,增强了可解释性,并能够建模类脑动力学,正如先前文献所示。通过整合多感官输入(例如文本、音频和视觉数据),mCOCO不仅捕捉结构和拓扑,还捕捉大脑区域如何处理信息并适应感知处理等认知任务,而且所有这些都以计算高效的方式进行。我们的mCOCO框架由两个阶段组成:(1)将BOLD信号映射到储备池中,以获得个体功能连接组,然后将其聚合成组水平的CBT(据我们所知,这是一种以前在功能连接研究中未被探索过的方法);以及(2)通过认知储备池纳入多感官输入,赋予CBT认知特征。广泛的评估表明,我们基于mCOCO的模板在中心性、判别性、拓扑合理性和多感官记忆保持方面显著优于基于GNN的CBT。我们的源代码可在https://github.com/basiralab/mCOCO上获得。
摘要:The generation of connectional brain templates (CBTs) has recently garnered significant attention for its potential to identify unique connectivity patterns shared across individuals. However, existing methods for CBT learning such as conventional machine learning and graph neural networks (GNNs) are hindered by several limitations. These include: (i) poor interpretability due to their black-box nature, (ii) high computational cost, and (iii) an exclusive focus on structure and topology, overlooking the cognitive capacity of the generated CBT. To address these challenges, we introduce mCOCO (multi-sensory COgnitive COmputing), a novel framework that leverages Reservoir Computing (RC) to learn population-level functional CBT from BOLD (Blood-Oxygen-level-Dependent) signals. RC's dynamic system properties allow for tracking state changes over time, enhancing interpretability and enabling the modeling of brain-like dynamics, as demonstrated in prior literature. By integrating multi-sensory inputs (e.g., text, audio, and visual data), mCOCO captures not only structure and topology but also how brain regions process information and adapt to cognitive tasks such as sensory processing, all in a computationally efficient manner. Our mCOCO framework consists of two phases: (1) mapping BOLD signals into the reservoir to derive individual functional connectomes, which are then aggregated into a group-level CBT - an approach, to the best of our knowledge, not previously explored in functional connectivity studies - and (2) incorporating multi-sensory inputs through a cognitive reservoir, endowing the CBT with cognitive traits. Extensive evaluations show that our mCOCO-based template significantly outperforms GNN-based CBT in terms of centeredness, discriminativeness, topological soundness, and multi-sensory memory retention. Our source code is available at https://github.com/basiralab/mCOCO.
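储备池计算的核心是一个随输入演化、可随时间追踪的非线性状态更新;下面是一个最小回声状态网络的示意(储备池规模、谱半径与输入均为假设,仅演示由类BOLD信号驱动储备池的思路,并非mCOCO的实现):

```python
# 示意:最小回声状态网络(储备池)的状态演化
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, T = 8, 200, 500
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # 谱半径缩放到0.9(回声状态性质)

u = rng.normal(size=(T, n_in))                    # 模拟的多通道BOLD输入
x = np.zeros(n_res)
states = []
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])              # 储备池状态随时间演化
    states.append(x.copy())
states = np.asarray(states)                       # (T, n_res):供读出层/连接组估计
print(states.shape)
```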


【2】PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding
标题:PTSM:用于跨被试脑电解码的生理感知与任务不变时空建模
链接:https://arxiv.org/abs/2508.11357

作者: Jing, Yan Liu, Shuqiang Wang, Bruce X.B. Yu, Gong Chen, Zhejing Hu, Zhi Zhang, Yanyan Shen
摘要:由于主体间的差异性和主体不变表示的缺乏,跨主体脑电(EEG)解码仍然是脑机接口(BCI)研究中的一个基本挑战。本文提出了PTSM(生理感知和任务不变的时空建模),一个新的框架,可解释和强大的EEG解码看不见的主题。PTSM采用双分支掩蔽机制,独立学习个性化和共享的时空模式,使模型能够保留个体特定的神经特征,同时提取任务相关的人群共享特征。该掩模在时间和空间维度上被分解,允许以低计算开销对动态EEG模式进行细粒度调制。为了进一步解决表征纠缠,PTSM强制执行信息理论约束,将潜在嵌入分解为正交的任务相关和主题相关的子空间。该模型通过集成分类、对比和解纠缠目标的多目标损失进行端到端训练。跨主题运动想象数据集上的大量实验表明,PTSM实现了强大的zero-shot泛化,优于最先进的基线,而无需特定于主题的校准。结果突出了在非静态神经生理学环境中实现个性化和可转移解码的解开神经表征的功效。
摘要:Cross-subject electroencephalography (EEG) decoding remains a fundamental challenge in brain-computer interface (BCI) research due to substantial inter-subject variability and the scarcity of subject-invariant representations. This paper proposed PTSM (Physiology-aware and Task-invariant Spatio-temporal Modeling), a novel framework for interpretable and robust EEG decoding across unseen subjects. PTSM employs a dual-branch masking mechanism that independently learns personalized and shared spatio-temporal patterns, enabling the model to preserve individual-specific neural characteristics while extracting task-relevant, population-shared features. The masks are factorized across temporal and spatial dimensions, allowing fine-grained modulation of dynamic EEG patterns with low computational overhead. To further address representational entanglement, PTSM enforces information-theoretic constraints that decompose latent embeddings into orthogonal task-related and subject-related subspaces. The model is trained end-to-end via a multi-objective loss integrating classification, contrastive, and disentanglement objectives. Extensive experiments on cross-subject motor imagery datasets demonstrate that PTSM achieves strong zero-shot generalization, outperforming state-of-the-art baselines without subject-specific calibration. Results highlight the efficacy of disentangled neural representations for achieving both personalized and transferable decoding in non-stationary neurophysiological settings.


【3】Leveraging the RETFound foundation model for optic disc segmentation in retinal images
标题:利用RETFound基础模型进行视网膜图像中的视盘分割
链接:https://arxiv.org/abs/2508.11354

作者:ao, Muthu Rama Krishnan Mookiah, Emanuele Trucco
摘要:RETFound是为眼底相机和光学相干断层扫描图像开发的著名基础模型(FM)。它在从视网膜图像诊断眼部及全身性疾病的多个数据集上显示出有前途的性能。然而,据我们所知,它尚未被用于其他任务。我们首次将RETFound适配到视盘分割这一视网膜图像分析中普遍而基础的任务上。在仅用极少量任务特定样本训练分割头之后,所得到的分割系统便优于最先进的分割专用基线网络。我们报告并讨论了四个公共数据集IDRID、Drishti-GS、RIM-ONE-r3和REFUGE以及一个私有数据集GoDARTS上的结果,在所有数据集上一致地达到约96%的Dice系数。总体而言,我们的方法在内部验证、域泛化和域自适应方面均表现优异,并超过大多数最先进的基线结果。我们在"FM能否替代特定任务架构"这一争论的框架下讨论了这些结果。代码可在:[链接将在论文被接受后添加]
摘要:RETFound is a well-known foundation model (FM) developed for fundus camera and optical coherence tomography images. It has shown promising performance across multiple datasets in diagnosing diseases, both eye-specific and systemic, from retinal images. However, to our best knowledge, it has not been used for other tasks. We present the first adaptation of RETFound for optic disc segmentation, a ubiquitous and foundational task in retinal image analysis. The resulting segmentation system outperforms state-of-the-art, segmentation-specific baseline networks after training a head with only a very modest number of task-specific examples. We report and discuss results with four public datasets, IDRID, Drishti-GS, RIM-ONE-r3, and REFUGE, and a private dataset, GoDARTS, achieving about 96% Dice consistently across all datasets. Overall, our method obtains excellent performance in internal verification, domain generalization and domain adaptation, and exceeds most of the state-of-the-art baseline results. We discuss the results in the framework of the debate about FMs as alternatives to task-specific architectures. The code is available at: [link to be added after the paper is accepted]


【4】Harmonized Gradient Descent for Class Imbalanced Data Stream Online Learning
标题:类别不平衡数据流在线学习的协调梯度下降
链接:https://arxiv.org/abs/2508.11353

作者: Hongpeng Yin, Xuanhong Deng, Yuyu Huang, Hao Ren
摘要:许多真实世界的数据是随时间顺序收集的,并且通常表现出倾斜的类分布,从而导致不平衡的数据流。虽然现有的方法已经探索了几种策略,如重新分配和重新加权,用于不平衡的数据流学习,但我们的工作通过训练修改来解决不平衡问题,特别是专注于梯度下降技术。我们引入了协调梯度下降(HGD)算法,其目的是均衡不同类别的梯度范数。通过确保梯度范数平衡,HGD减轻了对小类的拟合不足,并实现了平衡的在线学习。值得注意的是,HGD在一个简化的实现过程中运行,不需要数据缓冲区,额外的参数或先验知识,使其适用于任何利用梯度下降进行优化的学习模型。理论分析表明,基于几个常见的和温和的假设,HGD实现了满意的次线性遗憾界。在几种数据流不平衡的情况下,将该算法与常用的在线不平衡学习方法进行了比较。大量的实验评估表明HGD学习不平衡数据流的效率和有效性。
摘要:Many real-world data are sequentially collected over time and often exhibit skewed class distributions, resulting in imbalanced data streams. While existing approaches have explored several strategies, such as resampling and reweighting, for imbalanced data stream learning, our work distinguishes itself by addressing the imbalance problem through training modification, particularly focusing on gradient descent techniques. We introduce the harmonized gradient descent (HGD) algorithm, which aims to equalize the norms of gradients across different classes. By ensuring the gradient norm balance, HGD mitigates under-fitting for minor classes and achieves balanced online learning. Notably, HGD operates in a streamlined implementation process, requiring no data-buffer, extra parameters, or prior knowledge, making it applicable to any learning models utilizing gradient descent for optimization. Theoretical analysis, based on a few common and mild assumptions, shows that HGD achieves a satisfied sub-linear regret bound. The proposed algorithm are compared with the commonly used online imbalance learning methods under several imbalanced data stream scenarios. Extensive experimental evaluations demonstrate the efficiency and effectiveness of HGD in learning imbalanced data streams.
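"均衡各类梯度范数"这一思想可以按如下方式草拟:对每个类分别计算损失梯度并归一化范数,再求和作为更新方向(函数名与细节均为假设,并非原文的伪代码):

```python
# 示意:按类计算并归一化梯度范数,再合成更新方向
import torch
import torch.nn.functional as F

def harmonized_grads(model, x, y, num_classes, eps=1e-12):
    params = [p for p in model.parameters() if p.requires_grad]
    total = [torch.zeros_like(p) for p in params]
    for c in range(num_classes):
        mask = (y == c)
        if mask.sum() == 0:
            continue
        loss = F.cross_entropy(model(x[mask]), y[mask])
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + eps
        for t, g in zip(total, grads):
            t += g / norm                    # 每个类贡献单位范数的梯度
    return total

model = torch.nn.Linear(10, 3)
x, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
for p, g in zip(model.parameters(), harmonized_grads(model, x, y, 3)):
    p.data.add_(g, alpha=-0.1)               # 用均衡后的方向做一步下降
```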


【5】NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models
标题:NeMo:一种用于分解DNN模型的神经元级边训练边模块化方法
链接:https://arxiv.org/abs/2508.11348

作者:i, Binhang Qi, Hailong Sun, Xiang Gao, Yue Yu, Xiaojun Liang
备注:None
摘要:随着深度神经网络(DNN)模型越来越多地融入现代软件系统,高昂的构建成本已成为一个重大挑战。模型重用已被广泛应用于降低训练成本,但不加区别地重用整个模型可能会导致显著的推理开销。因此,DNN模块化受到关注,通过分解DNN模型实现模块重用。新兴的"边训练边模块化"(MwT)范式将模块化纳入训练过程,优于训练后再模块化的方法。然而,现有的MwT方法专注于卷积核级别的小规模CNN模型,难以应对各种DNN和大规模模型,特别是基于Transformer的模型。为了解决这些限制,我们提出了NeMo,一种可扩展且可推广的MwT方法。NeMo在神经元级别运行;神经元是所有DNN共有的基本组件,这确保了其适用于Transformer和各种架构。我们设计了一种基于对比学习的模块化训练方法,并配以有效的复合损失函数,使其能够扩展到大规模模型。在两个分类数据集上对两个基于Transformer的模型和四个CNN模型进行的综合实验证明了NeMo优于最先进的MwT方法。结果显示,模块分类准确率平均提高了1.72%,模块大小减少了58.10%,证明了其对CNN和大规模Transformer模型的有效性。对开源项目的案例研究显示了NeMo在实际场景中的潜在优势,为可扩展和可推广的DNN模块化提供了一种有前景的方法。
摘要:With the growing incorporation of deep neural network (DNN) models into modern software systems, the prohibitive construction costs have become a significant challenge. Model reuse has been widely applied to reduce training costs, but indiscriminately reusing entire models may incur significant inference overhead. Consequently, DNN modularization has gained attention, enabling module reuse by decomposing DNN models. The emerging modularizing-while-training (MwT) paradigm, which incorporates modularization into training, outperforms modularizing-after-training approaches. However, existing MwT methods focus on small-scale CNN models at the convolutional kernel level and struggle with diverse DNNs and large-scale models, particularly Transformer-based models. To address these limitations, we propose NeMo, a scalable and generalizable MwT approach. NeMo operates at the neuron level, the fundamental component common to all DNNs, ensuring applicability to Transformers and various architectures. We design a contrastive learning-based modular training method with an effective composite loss function, enabling scalability to large-scale models. Comprehensive experiments on two Transformer-based models and four CNN models across two classification datasets demonstrate NeMo's superiority over state-of-the-art MwT methods. Results show average gains of 1.72% in module classification accuracy and 58.10% reduction in module size, demonstrating efficacy across both CNN and large-scale Transformer-based models. A case study on open-source projects shows NeMo's potential benefits in practical scenarios, offering a promising approach for scalable and generalizable DNN modularization.


【6】Probing the Representational Power of Sparse Autoencoders in Vision Models
标题:探索视觉模型中稀疏自动编码器的表示能力
链接:https://arxiv.org/abs/2508.11277

作者:yle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng
备注:ICCV 2025 Findings
摘要:稀疏自动编码器(SAE)已经成为解释大型语言模型(LLM)隐藏状态的流行工具。通过学习从稀疏瓶颈层重建激活,SAE从LLM的高维内部表示中发现可解释的特征。尽管它们在语言模型中很受欢迎,但SAE在视觉领域仍然研究不足。在这项工作中,我们使用广泛的基于图像的任务,对SAE在视觉模型中的表示能力进行了全面评估。我们的实验结果表明,SAE特征具有语义意义,能提高分布外泛化,并在三种视觉模型架构(视觉嵌入模型、多模态LLM和扩散模型)上实现可控生成。在视觉嵌入模型中,我们发现学习到的SAE特征可以用于OOD检测,并提供证据表明它们恢复了底层模型的本体结构。对于扩散模型,我们证明了SAE可通过操纵文本编码器实现语义引导,并开发了一个自动化管道来发现人类可解释的属性。最后,我们对多模态LLM进行了探索性实验,发现SAE特征揭示了跨视觉和语言模态共享表征的证据。我们的研究为SAE在视觉模型中的评估奠定了基础,突出了它们在视觉领域提高可解释性、泛化性和可操纵性的强大潜力。
摘要:Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By learning to reconstruct activations from a sparse bottleneck layer, SAEs discover interpretable features from the high-dimensional internal representations of LLMs. Despite their popularity with language models, SAEs remain understudied in the visual domain. In this work, we provide an extensive evaluation of the representational power of SAEs for vision models using a broad range of image-based tasks. Our experimental results demonstrate that SAE features are semantically meaningful, improve out-of-distribution generalization, and enable controllable generation across three vision model architectures: vision embedding models, multi-modal LMMs and diffusion models. In vision embedding models, we find that learned SAE features can be used for OOD detection and provide evidence that they recover the ontological structure of the underlying model. For diffusion models, we demonstrate that SAEs enable semantic steering through text encoder manipulation and develop an automated pipeline for discovering human-interpretable attributes. Finally, we conduct exploratory experiments on multi-modal LLMs, finding evidence that SAE features reveal shared representations across vision and language modalities. Our study provides a foundation for SAE evaluation in vision models, highlighting their strong potential for improving interpretability, generalization, and steerability in the visual domain.
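以L1稀疏惩罚从瓶颈层重建激活的稀疏自动编码器可以如下最小实现(激活维度、隐层宽度与稀疏系数均为假设):

```python
# 示意:重建激活的稀疏自动编码器(重建损失 + L1稀疏惩罚)
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=8192):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, h):
        z = torch.relu(self.enc(h))             # 稀疏特征激活
        return self.dec(z), z

sae = SparseAutoencoder()
h = torch.randn(32, 768)                        # 视觉模型某层的激活(模拟)
recon, z = sae(h)
loss = ((recon - h) ** 2).mean() + 1e-3 * z.abs().mean()   # 重建 + L1稀疏
print(loss.item())
```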


【7】Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning
标题:借鉴未来:通过对比学习加强早期风险评估
链接:https://arxiv.org/abs/2508.11210

作者:un, Matthew M. Engelhard, Benjamin A. Goldstein
备注:accepted by Machine Learning for Healthcare 2025
摘要:儿科人群的风险评估通常在多个阶段进行。例如,临床医生可能会在产前、出生时和健康儿童访视期间评估风险。虽然在后期阶段进行的预测通常会达到更高的精度,但临床上需要尽可能早地进行可靠的风险评估。因此,本研究的重点是提高早期风险评估的预测性能。我们的解决方案"借用未来"(Borrowing From the Future, BFF)是一个对比多模态框架,它将每个时间窗口视为一个不同的模态。在BFF中,模型始终在全部时间段的所有可用数据上进行训练,同时仅使用截至当前的信息进行风险评估。这种对比框架允许模型从后期阶段(例如健康儿童访视)"借用"有信息量的信号,以隐式监督早期阶段(例如产前/出生阶段)的学习。我们在两个现实世界的儿科结局预测任务中验证了BFF,证明了早期风险评估的一致改善。该代码可在https://github.com/scotsun/bff上获得。
摘要:Risk assessments for a pediatric population are often conducted across multiple stages. For example, clinicians may evaluate risks prenatally, at birth, and during Well-Child visits. Although predictions made at later stages typically achieve higher precision, it is clinically desirable to make reliable risk assessments as early as possible. Therefore, this study focuses on improving prediction performance in early-stage risk assessments. Our solution, Borrowing From the Future (BFF), is a contrastive multi-modal framework that treats each time window as a distinct modality. In BFF, a model is trained on all available data throughout the time while performing a risk assessment using up-to-date information. This contrastive framework allows the model to "borrow" informative signals from later stages (e.g., Well-Child visits) to implicitly supervise the learning at earlier stages (e.g., prenatal/birth stages). We validate BFF on two real-world pediatric outcome prediction tasks, demonstrating consistent improvements in early risk assessments. The code is available at https://github.com/scotsun/bff.


【8】Quantum-Boosted High-Fidelity Deep Learning
标题:量子增强的高保真深度学习
链接:https://arxiv.org/abs/2508.11190

作者:ang, Shaobo Chen, Yao Xuan, Junwei Liu, Qi Gao, Hongdong Zhu, Junjie Hou, Lixin Yuan, Jinyu Cheng, Chenxin Yi, Hai Wei, Yin Ma, Tao Xu, Kai Wen, Yixue Li
摘要:概率深度学习的一个基本限制是它主要依赖于高斯先验。这种简单化的假设阻碍了模型准确地捕捉自然数据的复杂、非高斯景观,特别是在复杂的生物数据等要求苛刻的领域,严重阻碍了模型对科学发现的保真度。基于物理的玻尔兹曼分布提供了一个更有表现力的选择,但在经典计算机上计算起来很难。到目前为止,量子方法一直受到深度学习迭代需求所需的量子比特规模和操作稳定性不足的阻碍。在这里,我们通过引入量子玻尔兹曼机变分自动编码器(QBM-VAE)来弥合这一差距,这是一种大规模和长时间稳定的混合量子经典架构。我们的框架利用量子处理器从玻尔兹曼分布进行有效采样,使其能够在深度生成模型中用作强大的先验。应用于来自多个来源的百万级单细胞数据集,QBM-VAE生成了一个潜在空间,可以更好地保留复杂的生物结构,在组学数据集成、细胞类型分类和轨迹推断等基本任务中始终优于VAE和SCVI等传统的基于高斯的深度学习模型。它还提供了一个将物理先验引入深度学习的典型例子,以驱动模型获得突破数据限制的科学发现能力。这项工作证明了在大规模科学问题上深度学习的实际量子优势,并为开发混合量子AI模型提供了可转移的蓝图。
摘要:A fundamental limitation of probabilistic deep learning is its predominant reliance on Gaussian priors. This simplistic assumption prevents models from accurately capturing the complex, non-Gaussian landscapes of natural data, particularly in demanding domains like complex biological data, severely hindering the fidelity of the model for scientific discovery. The physically-grounded Boltzmann distribution offers a more expressive alternative, but it is computationally intractable on classical computers. To date, quantum approaches have been hampered by the insufficient qubit scale and operational stability required for the iterative demands of deep learning. Here, we bridge this gap by introducing the Quantum Boltzmann Machine-Variational Autoencoder (QBM-VAE), a large-scale and long-time stable hybrid quantum-classical architecture. Our framework leverages a quantum processor for efficient sampling from the Boltzmann distribution, enabling its use as a powerful prior within a deep generative model. Applied to million-scale single-cell datasets from multiple sources, the QBM-VAE generates a latent space that better preserves complex biological structures, consistently outperforming conventional Gaussian-based deep learning models like VAE and SCVI in essential tasks such as omics data integration, cell-type classification, and trajectory inference. It also provides a typical example of introducing a physics priori into deep learning to drive the model to acquire scientific discovery capabilities that breaks through data limitations. This work provides the demonstration of a practical quantum advantage in deep learning on a large-scale scientific problem and offers a transferable blueprint for developing hybrid quantum AI models.


【9】Learn to optimize for automatic proton PBS treatment planning for H&N cancers
标题:学习优化用于H&N癌症的自动质子PBS治疗计划
链接:https://arxiv.org/abs/2508.11085

作者:Wang, Liqiang Xiao, Chang Chang
备注:27 pages, 4 figures
摘要:H&N癌症的质子PBS治疗计划涉及许多相互冲突的目标,需要人类计划者付出巨大努力,在计划过程中平衡并满足多个临床目标。为此,需要迭代地执行依赖经验的目标参数调整和计算代价高昂的逆优化。人们已为自动调整目标参数做了大量工作,但最耗时的组件,即逆优化,仍然严重依赖理论驱动的方法。我们提出了一个数据驱动的逆优化器,并将其集成到基于PPO的自动治疗计划框架中,在临床可接受的计划时间内自动生成高质量的计划。该逆优化器是一种L2O方法,通过从特定任务的数据分布中学习来预测更新步骤。我们首次将最初为LLM开发的长上下文处理技术集成到基于Transformer的L2O框架中,以解决现有L2O方法的可扩展性问题。PPO框架作为外环虚拟规划器,通过策略网络自主调整目标参数,并用剂量预测器初始化目标参数。内环L2O逆优化器基于PPO策略网络细化后的目标计算机器可交付的MU值。本研究共收集97例患者,与L-BFGSB相比,我们基于L2O的逆优化器在有效性和效率上分别提高了22.97%和36.41%。结合基于PPO的学习型虚拟规划器,我们的框架平均在2.55小时内生成计划;与人工制定的计划相比,对于处方剂量水平、靶体积数量、射束角度等各不相同的患者,这些计划在OAR保护上更优或相当,且靶区覆盖更佳。
摘要:Proton PBS treatment planning for H&N cancers involves numerous conflicting objectives, requiring significant effort from human planners to balance and satisfy multiple clinical goals during planning. To achieve this, experience-demanding objective parameter adjustment and computationally expensive inverse optimization are performed iteratively. Extensive efforts have been made to automatically adjust objective parameters, but the most time-consuming component, i.e., inverse optimization, still relies heavily on theory-driven approaches. We propose a data-driven inverse optimizer and integrate it into a PPO-based automatic treatment planning framework to automatically generate high-quality plans within a clinical acceptable planning time. The inverse optimizer is a L2O method that predicts update steps by learning from the task-specific data distribution. For the first time, we integrate techniques designed for long-context processing, originally developed for LLMs, into a Transformer-based L2O framework to address the scalability issue of existing L2O methods. The PPO framework functions as an outer-loop virtual planner, autonomously adjusting objective parameters through a policy network, and the dose predictor is used to initialize objective parameters. The inner-loop L2O inverse optimizer computes machine-deliverable MU values based on objectives refined by the PPO policy network. 97 patients are collected in this study, and compared with L-BFGSB, our L2O-based inverse optimizer improves the effectiveness and efficiency by 22.97% and 36.41%, respectively. In conjunction with the PPO-based learned virtual planner, plans generated by our framework within an average of 2.55 hours show improved or comparable OAR sparing with superior target coverage for patients with different prescription dose levels, number of target volumes, beam angles, etc., compared with human-generated plans.


【10】Learning with Confidence
标题:自信地学习
链接:https://arxiv.org/abs/2508.11037

作者:han Richardson
备注:Accepted for oral UAI 2025, plus some additional modifications for clarity
摘要:我们刻画了在学习或更新信念时产生的"信心"概念:一个人对传入信息的信任程度及其对信念状态的影响。这种学习者的信心可以与概率或可能性一起使用(并且很容易被误认为是概率或可能性),但它本质上是一个不同的概念:它涵盖了文献中许多熟悉的概念,包括学习率和训练轮数、Shafer的证据权重以及Kalman增益。我们对"带信心学习"的含义进行了形式化公理化,给出了在连续统上度量信心的两种典型方式,并证明信心总能以这种方式表示。在额外的假设下,我们用向量场和损失函数导出了基于信心的学习的更紧凑表示。这些表示引出了复合"并行"观测的扩展语言。我们将贝叶斯规则刻画为一类优化学习器的特殊情形,其损失表示为线性期望。
摘要:We characterize a notion of confidence that arises in learning or updating beliefs: the amount of trust one has in incoming information and its impact on the belief state. This learner's confidence can be used alongside (and is easily mistaken for) probability or likelihood, but it is fundamentally a different concept -- one that captures many familiar concepts in the literature, including learning rates and number of training epochs, Shafer's weight of evidence, and Kalman gain. We formally axiomatize what it means to learn with confidence, give two canonical ways of measuring confidence on a continuum, and prove that confidence can always be represented in this way. Under additional assumptions, we derive more compact representations of confidence-based learning in terms of vector fields and loss functions. These representations induce an extended language of compound "parallel" observations. We characterize Bayes Rule as the special case of an optimizing learner whose loss representation is a linear expectation.


【11】Match & Choose: Model Selection Framework for Fine-tuning Text-to-Image Diffusion Models
标题:匹配与选择:微调文本到图像扩散模型的模型选择框架
链接:https://arxiv.org/abs/2508.10993

作者:wandowski, Robert Birke, Lydia Y. Chen
摘要:基于扩散和Transformer架构的文本到图像(T2I)模型发展迅速。它们通常在大型语料库上进行预训练,并在模型平台(例如HuggingFace)上公开共享。然后,用户可以通过采用预训练的T2I模型并在目标数据集上对其进行微调来构建AI应用程序,例如生成媒体内容。虽然公开的预训练T2I模型促进了模型的民主化,但用户面临着一个新的挑战:哪种模型最适合基于目标数据域进行微调?模型选择在分类任务中已得到了很好的研究,但对(预训练的)T2I模型及其在目标域上的性能指示却知之甚少。在本文中,我们提出了第一个模型选择框架M&C,它使用户能够高效地从模型平台中选择预训练的T2I模型,而无需在目标数据集上对所有模型逐一微调。M&C的核心是一个匹配图,它包括:(i)可用模型和已分析数据集的节点,以及(ii)分别刻画微调性能和数据相似性的模型-数据对和数据-数据对的边。然后,我们构建了一个模型,它以模型/数据特征,以及(关键地)从匹配图中提取的图嵌入特征为输入,预测针对目标域微调后能达到最佳质量的模型。我们在32个数据集、10个T2I模型上对照三个基线评估了M&C。结果表明,M&C在61.3%的情况下成功预测出最适合微调的模型,在其余情况下也能预测出性能接近最优的模型。
摘要:Text-to-image (T2I) models based on diffusion and transformer architectures advance rapidly. They are often pretrained on large corpora, and openly shared on a model platform, such as HuggingFace. Users can then build up AI applications, e.g., generating media contents, by adopting pretrained T2I models and fine-tuning them on the target dataset. While public pretrained T2I models facilitate the democratization of the models, users face a new challenge: which model can be best fine-tuned based on the target data domain? Model selection is well addressed in classification tasks, but little is known in (pretrained) T2I models and their performance indication on the target domain. In this paper, we propose the first model selection framework, M&C, which enables users to efficiently choose a pretrained T2I model from a model platform without exhaustively fine-tuning them all on the target dataset. The core of M&C is a matching graph, which consists of: (i) nodes of available models and profiled datasets, and (ii) edges of model-data and data-data pairs capturing the fine-tuning performance and data similarity, respectively. We then build a model that, based on the inputs of model/data feature, and, critically, the graph embedding feature, extracted from the matching graph, predicts the model achieving the best quality after fine-tuning for the target domain. We evaluate M&C on choosing across ten T2I models for 32 datasets against three baselines. Our results show that M&C successfully predicts the best model for fine-tuning in 61.3% of the cases and a closely performing model for the rest.


【12】Nonparametric learning of stochastic differential equations from sparse and noisy data
标题:从稀疏和有噪数据中对随机微分方程的非参数学习
链接:https://arxiv.org/abs/2508.11597

作者:guly, Riten Mitra, Jinpu Zhou
备注:35 pages, 6 figures
摘要:本文提出了一个系统框架,用于根据稀疏、有噪的观测数据构建数据驱动的随机微分方程(SDE)模型。与假设漂移具有已知函数形式的传统参数化方法不同,我们的目标是直接从数据中学习整个漂移函数,而无需强结构假设,这使得该方法在系统动力学仅被部分理解或高度复杂的科学学科中特别有价值。我们将估计问题表述为在再生核希尔伯特空间(RKHS)上最小化带惩罚的负对数似然泛函。在稀疏观测情形下,未观测到的轨迹段的存在使得SDE似然难以处理。为了解决这个问题,我们开发了一个期望最大化(EM)算法,采用一种新的序贯蒙特卡罗(SMC)方法来近似滤波分布,并生成E步目标的蒙特卡罗估计。随后,M步归结为RKHS中一个带惩罚的经验风险最小化问题;由广义表示定理,其最小值点可表示为有限个核函数的线性组合。为了控制EM迭代中的模型复杂性,我们还开发了该算法的一个混合贝叶斯变体,使用收缩先验来识别核展开中的重要系数。我们为精确与近似EM序列建立了重要的理论收敛结果。由此产生的EM-SMC-RKHS程序能够在低数据条件下准确估计随机动力系统的漂移函数,并广泛适用于各类在观测约束下需要连续时间建模的领域。我们通过一系列数值实验证明了该方法的有效性。
摘要:The paper proposes a systematic framework for building data-driven stochastic differential equation (SDE) models from sparse, noisy observations. Unlike traditional parametric approaches, which assume a known functional form for the drift, our goal here is to learn the entire drift function directly from data without strong structural assumptions, making it especially relevant in scientific disciplines where system dynamics are partially understood or highly complex. We cast the estimation problem as minimization of the penalized negative log-likelihood functional over a reproducing kernel Hilbert space (RKHS). In the sparse observation regime, the presence of unobserved trajectory segments makes the SDE likelihood intractable. To address this, we develop an Expectation-Maximization (EM) algorithm that employs a novel Sequential Monte Carlo (SMC) method to approximate the filtering distribution and generate Monte Carlo estimates of the E-step objective. The M-step then reduces to a penalized empirical risk minimization problem in the RKHS, whose minimizer is given by a finite linear combination of kernel functions via a generalized representer theorem. To control model complexity across EM iterations, we also develop a hybrid Bayesian variant of the algorithm that uses shrinkage priors to identify significant coefficients in the kernel expansion. We establish important theoretical convergence results for both the exact and approximate EM sequences. The resulting EM-SMC-RKHS procedure enables accurate estimation of the drift function of stochastic dynamical systems in low-data regimes and is broadly applicable across domains requiring continuous-time modeling under observational constraints. We demonstrate the effectiveness of our method through a series of numerical experiments.


【13】Counterfactual Survival Q Learning for Longitudinal Randomized Trials via Buckley James Boosting
标题:基于Buckley James Boosting的纵向随机试验反事实生存Q学习
链接:https://arxiv.org/abs/2508.11060

作者:Lee, Jong-Min Kim
摘要:我们提出了一个Buckley James(BJ)Boost Q学习框架,用于在右删失生存数据下估计最优动态治疗方案,专为纵向随机临床试验场景设计。该方法在反事实Q学习框架内,将加速失效时间模型与迭代boosting技术(包括分量式最小二乘和回归树)相结合。通过直接对条件生存时间建模,BJ Boost Q学习避免了限制性的比例风险假设,并实现了阶段特定Q函数的无偏估计。该框架以潜在结果为基础,确保了在标准因果假设下最优治疗方案的可识别性。与依赖风险函数建模、在模型误设时可能产生偏差的基于Cox的Q学习相比,我们的方法提供了稳健且灵活的估计。模拟研究和对ACTG175 HIV试验的分析表明,BJ Boost Q学习在治疗决策上具有更高的准确性,尤其是在偏差可能累积的多阶段情形中。
摘要:We propose a Buckley James (BJ) Boost Q learning framework for estimating optimal dynamic treatment regimes under right censored survival data, tailored for longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q learning framework. By directly modeling conditional survival time, BJ Boost Q learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage specific Q functions. Grounded in potential outcomes, this framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared to Cox based Q learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q learning yields higher accuracy in treatment decision making, especially in multistage settings where bias can accumulate.
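作为说明,下面是Buckley-James插补步骤的一个极简Python草图(假设性实现,非原论文代码):对删失观测,用残差的Kaplan-Meier估计计算条件期望来替换响应;boosting则会在这一插补与重新拟合之间反复迭代。

```python
import numpy as np

def km_survival(times, events):
    """残差的Kaplan-Meier生存估计(在每个事件时间点)。"""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    uniq = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(t >= u)
        deaths = np.sum((t == u) & (d == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def buckley_james_impute(y, delta, fitted):
    """BJ插补一步(假设性实现):对删失样本(delta=0),
    用残差KM分布的条件期望 E[e | e > e_i] 替换其响应。"""
    e = np.asarray(y, dtype=float) - fitted
    uniq, surv = km_survival(e, delta)
    mass = np.r_[1.0, surv[:-1]] - surv      # KM跳跃 = 残差分布的点质量
    y_star = np.asarray(y, dtype=float).copy()
    for i in np.where(np.asarray(delta) == 0)[0]:
        tail = uniq > e[i]
        if mass[tail].sum() > 0:
            y_star[i] = fitted[i] + (mass[tail] * uniq[tail]).sum() / mass[tail].sum()
    return y_star
```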


【14】Data-driven global ocean model resolving ocean-atmosphere coupling dynamics
标题:解析海气耦合动力学的数据驱动全球海洋模式
链接:https://arxiv.org/abs/2508.10908

作者:n Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham
备注:The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table
摘要:人工智能已经推动了全球天气预报的发展,在准确性和计算效率方面均优于传统数值模式。然而,要将预测延伸到次季节时间尺度之外,需要开发基于深度学习(DL)的海气耦合模式,以真实模拟海洋对大气强迫的复杂响应。本研究提出了KIST-Ocean,一个基于DL的全球三维海洋环流模式,采用U形视觉注意力对抗网络架构。KIST-Ocean集成了部分卷积、对抗训练和迁移学习,以应对海岸复杂性和自回归模型中的预测分布漂移问题。综合评估证实了该模式稳健的海洋预测能力和效率。此外,它准确地捕捉到了真实的海洋响应,例如热带太平洋的开尔文波和罗斯贝波传播,以及气旋式和反气旋式风应力引起的垂直运动,表明其能够刻画厄尔尼诺-南方涛动等气候现象背后的关键海气耦合机制。这些发现增强了人们对基于DL的全球天气和气候模式的信心,并支持将基于DL的方法扩展到更广泛的地球系统建模,为提升气候预测能力提供了潜力。
摘要:Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean response, such as Kelvin and Rossby wave propagation in the tropical Pacific, and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Nino-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and their extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.


其他(20篇)

【1】SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling
标题:SeamlessFlow:一个训练器-代理隔离的RL框架,通过标签调度实现无气泡流水线
链接 :https://arxiv.org/abs/2508.11553

作者:ang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Xiaojiang Zhang, Minglei Zhang, Jiarong Zhang, Wenhao Zhuang, Yuchen Cao, Wankang Bao, Haimo Li, Zheng Lin, Huiming Wang, Haoyang Huang, Zongxian Feng, Zizheng Zhan, Ken Deng, Wen Xiang, Huaixi Tang, Kun Wu, Mengtong Li, Mengfei Xie, Junyi Peng, Haotian Zhang, Bin Chen, Bing Yu
摘要:我们引入了SeamlessFlow,一个基于服务器的强化学习(RL)框架,它解决了工业规模RL中的两个核心挑战:(1)将RL训练与代理的复杂执行流解耦;(2)在保持大规模部署所需的稳定性和可扩展性的同时,以最少的空闲时间最大化GPU利用率。首先,SeamlessFlow引入了一个数据平面,将RL训练器与多样且复杂的代理实现解耦,同时保持高吞吐量。中央轨迹管理器维护完整的交互历史并支持部分rollout,允许rollout暂停以进行权重更新并无缝恢复,使代理感知不到服务中断。其次,我们提出了一种标签驱动的调度范式,将硬件抽象为带能力标签的资源,统一了同置与分离两种架构。在此基础上,SeamlessFlow引入了一种时空复用流水线,在训练与rollout分离的设置下,动态地将空闲的训练节点重新分配给rollout,消除了流水线气泡,并充分利用异构集群资源。通过结合这些创新,SeamlessFlow兼具稳定性和高性能,非常适合多代理、长时域及其他复杂的RL任务。
摘要:We introduce SeamlessFlow, a server based reinforcement learning (RL) framework that addresses two core challenges in industrial scale RL: (1) decoupling RL training from the complex execution flow of agents; (2) maximizing GPU utilization with minimal idle time while preserving the stability and scalability required for large-scale deployments. First, SeamlessFlow introduces a data plane that decouples the RL trainer from diverse, complex agent implementations while sustaining high throughput. A central trajectory manager maintains complete interaction histories and supports partial rollout, allowing rollout to pause for weight updates and resume seamlessly, keeping agents unaware of service interruptions. Second, we propose a tag driven scheduling paradigm that abstracts hardware into capability tagged resources, unifying colocated and disaggregated architectures. Based on this, SeamlessFlow introduces a spatiotemporal multiplexing pipeline that dynamically reassigns idle training nodes to rollout in a train rollout separated setup, eliminating pipeline bubbles and fully exploiting heterogeneous cluster resources. By combining these innovations, SeamlessFlow delivers both stability and high performance, making it well suited for multi agent, long horizon, and other complex RL tasks.
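下面用几行Python勾勒"标签驱动调度"的思路(接口与字段名均为假设,仅作示意):节点携带能力标签,任务按所需标签匹配节点,因此同时带有train与rollout标签的空闲训练节点可被重新分配去执行rollout。

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    tags: set = field(default_factory=set)   # 能力标签,如 {"train"}、{"rollout"}

def schedule(nodes, task_tag):
    """标签驱动调度的骨架(假设性接口):任何能力标签包含所需标签的
    节点都可承接任务,空闲训练节点只要带有 rollout 标签即可被重新分配。"""
    return [n for n in nodes if task_tag in n.tags]

pool = [Node("gpu0", {"train", "rollout"}), Node("gpu1", {"rollout"})]
print([n.name for n in schedule(pool, "rollout")])   # 两个节点都符合
```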


【2】Finite-Width Neural Tangent Kernels from Feynman Diagrams
标题:基于费曼图的有限宽度神经正切核
链接:https://arxiv.org/abs/2508.11522

作者:en, Philipp Misof, Jan E. Gerken
备注:11 pages + appendices
摘要:神经正切核(NTK)是分析深度非线性神经网络的强大工具。在无限宽度极限下,大多数常见架构的NTK都可以轻松计算,从而对训练动力学实现完全的解析控制。然而,在无限宽度下,NTK演化和特征学习等训练的重要性质不复存在。不过,有限宽度效应可以通过计算对无限宽度高斯统计量的修正来纳入。我们引入费曼图来计算NTK统计量的有限宽度修正。它们极大地简化了必要的代数运算,并能够为涉及预激活、NTK以及在领头阶预测训练动力学所需的某些高阶导数张量(dNTK和ddNTK)的任意统计量计算逐层递归关系。我们通过将深度网络的稳定性结果从预激活扩展到NTK,并证明对于ReLU等尺度不变非线性,NTK的Gram矩阵对角线上不存在有限宽度修正,来展示我们框架的可行性。我们通过数值实验验证了我们的结果。
摘要:Neural tangent kernels (NTKs) are a powerful tool for analyzing deep, non-linear neural networks. In the infinite-width limit, NTKs can easily be computed for most common architectures, yielding full analytic control over the training dynamics. However, at infinite width, important properties of training such as NTK evolution or feature learning are absent. Nevertheless, finite width effects can be included by computing corrections to the Gaussian statistics at infinite width. We introduce Feynman diagrams for computing finite-width corrections to NTK statistics. These dramatically simplify the necessary algebraic manipulations and enable the computation of layer-wise recursive relations for arbitrary statistics involving preactivations, NTKs and certain higher-derivative tensors (dNTK and ddNTK) required to predict the training dynamics at leading order. We demonstrate the feasibility of our framework by extending stability results for deep networks from preactivations to NTKs and proving the absence of finite-width corrections for scale-invariant nonlinearities such as ReLU on the diagonal of the Gram matrix of the NTK. We validate our results with numerical experiments.
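作为背景说明,下面的PyTorch草图计算一个小型MLP的经验(有限宽度)NTK,即参数梯度的内积矩阵——这正是论文所修正的统计对象本身;费曼图方法用于解析计算其统计量,此处不涉及。网络结构与输入均为假设性示例。

```python
import torch

# 经验NTK:Theta(x, x') = <∇_θ f(x), ∇_θ f(x')>,用自动微分直接计算。
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))

def grad_vec(x):
    net.zero_grad()
    net(x).sum().backward()                  # 标量输出,填充各参数梯度
    return torch.cat([p.grad.flatten() for p in net.parameters()])

xs = [torch.randn(1, 3) for _ in range(4)]
G = torch.stack([grad_vec(x) for x in xs])   # 每行:一个输入的参数梯度
print(G @ G.T)                               # 4x4 的经验NTK Gram矩阵
```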


【3】Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media
标题:Sim2Dust:掌握颗粒介质上的动态航路点跟踪
链接:https://arxiv.org/abs/2508.11503

作者:sula, Matthieu Geist, Miguel Olivares-Mendez, Carol Martinez
备注:The source code is available at this https URL
摘要:在遥远行星表面的非结构化地形上实现可靠的自主导航,是未来太空探索的关键使能因素。然而,基于学习的控制器的部署受到固有的模拟到现实差距的阻碍,对于车轮与颗粒介质相互作用的复杂动力学尤其如此。这项工作提出了一个完整的模拟到现实框架,用于开发和验证在此类高难度表面上进行动态航路点跟踪的鲁棒控制策略。我们利用大规模并行模拟,在大量程序化生成、物理参数随机化的环境分布上训练强化学习代理。随后,这些策略被zero-shot迁移到在月球模拟设施中运行的物理轮式漫游车上。我们的实验系统地比较了多种强化学习算法和动作平滑滤波器,以确定最适合真实世界部署的组合。至关重要的是,我们提供了有力的经验证据,表明与在静态场景上训练的代理相比,经过程序化多样性训练的代理取得了更优的zero-shot性能。我们还分析了使用高保真粒子物理进行微调的权衡:它以显著的计算成本换取低速精度的微小提升。这些贡献共同建立了一个经过验证的工作流程,用于创建可靠的基于学习的导航系统,标志着在这一终极前沿部署自主机器人的关键一步。
摘要:Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. Together, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a critical step towards deploying autonomous robots in the final frontier.
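论文比较了多种动作平滑滤波器,但摘要未给出具体形式;下面是其中一种常见选择——一阶指数滑动平均(EMA)低通滤波器——的假设性草图。

```python
import numpy as np

class EMAActionFilter:
    """一阶低通(指数滑动平均)动作滤波器(假设性示例):
    alpha 越小输出越平滑,抖动越少。"""
    def __init__(self, alpha: float = 0.3):
        self.alpha, self.prev = alpha, None

    def __call__(self, action: np.ndarray) -> np.ndarray:
        if self.prev is None:
            self.prev = action
        else:
            self.prev = self.alpha * action + (1.0 - self.alpha) * self.prev
        return self.prev
```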


【4】Robust Convolution Neural ODEs via Contractivity-promoting regularization
标题:通过促进收缩性的正则化实现鲁棒卷积神经ODE
链接:https://arxiv.org/abs/2508.11432

作者:Zakwan, Liang Xu, Giancarlo Ferrari-Trecate
备注:Accepted in IEEE CDC2025, Rio de Janeiro, Brazil
摘要:神经网络对输入噪声和对抗攻击可能十分脆弱。在这项工作中,我们考虑卷积神经常微分方程(NODE)——一类由动力系统表示的连续深度神经网络——并提出利用收缩理论提升其鲁棒性。对于收缩动力系统,从不同初始条件出发的两条轨迹会以指数速度相互收敛。因此,收缩的卷积NODE可以获得更强的鲁棒性,因为特征的轻微扰动不会导致输出的显著变化。在训练期间,可以通过一个涉及系统动力学雅可比矩阵的正则化项来诱导收缩性。为了减轻计算负担,我们证明,对于一类具有斜率受限激活函数的NODE,也可以通过精心选择的权重正则化项来促进收缩性。我们通过MNIST和FashionMNIST数据集上的基准图像分类任务说明了所提正则化项的性能,其中图像被不同类型的噪声和攻击所破坏。
摘要:Neural networks can be fragile to input noise and adversarial attacks.   In this work, we consider Convolutional Neural Ordinary Differential Equations (NODEs), a family of continuous-depth neural networks represented by dynamical systems, and propose to use contraction theory to improve their robustness.   For a contractive dynamical system two trajectories starting from different initial conditions converge to each other exponentially fast.   Contractive Convolutional NODEs can enjoy increased robustness as slight perturbations of the features do not cause a significant change in the output.   Contractivity can be induced during training by using a regularization term involving the Jacobian of the system dynamics.   To reduce the computational burden, we show that it can also be promoted using carefully selected weight regularization terms for a class of NODEs with slope-restricted activation functions.   The performance of the proposed regularizers is illustrated through benchmark image classification tasks on MNIST and FashionMNIST datasets, where images are corrupted by different kinds of noise and attacks.
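下面是基于雅可比矩阵的收缩性正则项的一个示意性实现(具体形式为假设,非论文原式):对动力学 f 在状态 x 处的雅可比矩阵,惩罚其对称部分的最大特征值——该值小于等于负数是轨迹指数收敛的充分条件。

```python
import torch

def contractivity_penalty(f, x, margin=0.1):
    """惩罚动力学雅可比矩阵对称部分的最大特征值(假设性正则项形式):
    lambda_max((J + J^T)/2) <= -margin 是收缩性的充分条件。"""
    J = torch.autograd.functional.jacobian(f, x, create_graph=True)  # (d, d)
    lam_max = torch.linalg.eigvalsh(0.5 * (J + J.T))[-1]             # 最大特征值
    return torch.relu(lam_max + margin)

# 用法示意:total_loss = task_loss + beta * contractivity_penalty(odefunc, x0)
```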


【5】Generative Co-Design of Antibody Sequences and Structures via Black-Box Guidance in a Shared Latent Space
标题:在共享潜在空间中通过黑匣子引导进行抗体序列和结构的生成性协同设计
链接:https://arxiv.org/abs/2508.11424

作者:ao, Yuangang Pan, Xixian Chen
备注:Accepted by IJCAI 2025
摘要:深度生成模型的进展使得以抗原-抗体复合物为上下文,对抗体序列和结构进行联合建模成为可能。然而,现有的优化互补决定区(CDR)以改善可开发性性质的方法在原始数据空间中操作,由于搜索过程低效,导致评估代价过高。为此,我们提出了LatEnt blAck-box Design(LEAD),一个在共享潜在空间中同时优化序列和结构的序列-结构协同设计框架。优化共享潜在编码不仅可以突破现有方法的局限,还能保证不同模态设计的同步性。特别地,我们设计了一种黑盒引导策略,以适应许多属性评估器不可微的真实场景。实验结果表明,LEAD在单属性和多属性目标上均取得了卓越的优化性能。值得注意的是,LEAD在属性优化上超越基线方法的同时,将查询消耗减少了一半。代码可在 https://github.com/EvaFlower/LatEnt-blAck-box-Design 获取。
摘要:Advancements in deep generative models have enabled the joint modeling of antibody sequence and structure, given the antigen-antibody complex as context. However, existing approaches for optimizing complementarity-determining regions (CDRs) to improve developability properties operate in the raw data space, leading to excessively costly evaluations due to the inefficient search process. To address this, we propose LatEnt blAck-box Design (LEAD), a sequence-structure co-design framework that optimizes both sequence and structure within their shared latent space. Optimizing shared latent codes can not only break through the limitations of existing methods, but also ensure synchronization of different modality designs. Particularly, we design a black-box guidance strategy to accommodate real-world scenarios where many property evaluators are non-differentiable. Experimental results demonstrate that our LEAD achieves superior optimization performance for both single and multi-property objectives. Notably, LEAD reduces query consumption by a half while surpassing baseline methods in property optimization. The code is available at https://github.com/EvaFlower/LatEnt-blAck-box-Design.


【6】A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts
标题:一个经过位置数据完整性评估的全球再造林工作数据集
链接:https://arxiv.org/abs/2508.11349

作者:hn, Selvyn Allotey, Till Koebe, Alexandra Tyukavina, Ingmar Weber
备注:10 figures
摘要:植树造林和再造林是通过增强碳固存来减缓气候变化的流行策略。然而,这些工作的成效往往由项目开发者自行报告,或通过外部验证有限的流程进行认证,这引发了对数据可靠性和项目完整性的担忧。为回应对自愿碳市场日益严格的审视,本研究提出了一个关于全球造林和再造林工作的数据集,该数据集基于一手(元)信息汇编而成,并以时间序列卫星影像及其他二手数据加以补充。我们的数据集覆盖33年间45,628个项目的1,289,068个种植点。由于任何基于遥感的验证工作都依赖于种植点地理边界的完整性,该数据集引入了对所提供的点级位置信息的标准化评估,我们将其总结为一个易于传达的关键指标:LDIS——位置数据完整性得分(Location Data Integrity Score)。我们发现,在接受监测的带地理参考的种植点中,约79%至少未通过10项LDIS指标中的1项,而15%的受监测项目从一开始就缺乏机器可读的地理参考数据。除了加强自愿碳市场的问责制外,该数据集还具有作为训练数据的价值,例如用于与数百万张关联的Sentinel-2和Planetscope卫星影像相关的计算机视觉任务。
摘要:Afforestation and reforestation are popular strategies for mitigating climate change by enhancing carbon sequestration. However, the effectiveness of these efforts is often self-reported by project developers, or certified through processes with limited external validation. This leads to concerns about data reliability and project integrity. In response to increasing scrutiny of voluntary carbon markets, this study presents a dataset on global afforestation and reforestation efforts compiled from primary (meta-)information and augmented with time-series satellite imagery and other secondary data. Our dataset covers 1,289,068 planting sites from 45,628 projects spanning 33 years. Since any remote sensing-based validation effort relies on the integrity of a planting site's geographic boundary, this dataset introduces a standardized assessment of the provided site-level location information, which we summarize in one easy-to-communicate key indicator: LDIS -- the Location Data Integrity Score. We find that approximately 79\% of the georeferenced planting sites monitored fail on at least 1 out of 10 LDIS indicators, while 15\% of the monitored projects lack machine-readable georeferenced data in the first place. In addition to enhancing accountability in the voluntary carbon market, the presented dataset also holds value as training data for e.g. computer vision-related tasks with millions of linked Sentinel-2 and Planetscope satellite images.


【7】RegimeNAS: Regime-Aware Differentiable Architecture Search With Theoretical Guarantees for Financial Trading
标题:RegimeNAS:面向金融交易、具有理论保证的市场机制感知可微分架构搜索
链接:https://arxiv.org/abs/2508.11338

作者:h Devadiga, Yashmitha Shailesh
摘要:我们介绍RegimeNAS,这是一种新颖的可微分架构搜索框架,通过显式整合市场机制感知来提升加密货币交易性能。针对静态深度学习模型在高度动态的金融环境中的局限性,RegimeNAS具有三项核心创新:(1)具有可证明收敛性质、理论上有依据的贝叶斯搜索空间架构优化;(2)为不同市场状况量身定制、可动态激活的专用神经模块(波动率、趋势和区间模块);以及(3)在数学上强制满足Lipschitz稳定性约束的同时,纳入市场特定惩罚项(如波动率匹配、转换平滑性)的多目标损失函数。市场机制识别利用跨多个时间框架的多头注意力,以提高准确性和不确定性估计。对大量真实加密货币数据的严格实证评估表明,RegimeNAS显著优于最先进的基准:与最佳的传统循环网络基线相比,平均绝对误差降低了80.3%,且收敛速度大幅加快(9个epoch对50多个epoch)。消融研究和按市场机制划分的分析证实了每个组件的关键贡献,特别是机制感知的自适应机制。这项工作强调,必须将市场机制等领域特定知识直接嵌入NAS过程,才能为具有挑战性的金融应用开发鲁棒且自适应的模型。
摘要 :We introduce RegimeNAS, a novel differentiable architecture search framework specifically designed to enhance cryptocurrency trading performance by explicitly integrating market regime awareness. Addressing the limitations of static deep learning models in highly dynamic financial environments, RegimeNAS features three core innovations: (1) a theoretically grounded Bayesian search space optimizing architectures with provable convergence properties; (2) specialized, dynamically activated neural modules (Volatility, Trend, and Range blocks) tailored for distinct market conditions; and (3) a multi-objective loss function incorporating market-specific penalties (e.g., volatility matching, transition smoothness) alongside mathematically enforced Lipschitz stability constraints. Regime identification leverages multi-head attention across multiple timeframes for improved accuracy and uncertainty estimation. Rigorous empirical evaluation on extensive real-world cryptocurrency data demonstrates that RegimeNAS significantly outperforms state-of-the-art benchmarks, achieving an 80.3% Mean Absolute Error reduction compared to the best traditional recurrent baseline and converging substantially faster (9 vs. 50+ epochs). Ablation studies and regime-specific analysis confirm the critical contribution of each component, particularly the regime-aware adaptation mechanism. This work underscores the imperative of embedding domain-specific knowledge, such as market regimes, directly within the NAS process to develop robust and adaptive models for challenging financial applications.


【8】Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble
标题:通过鲁棒时间自集成提升SNN的鲁棒性-准确性权衡
链接:https://arxiv.org/abs/2508.11279

作者:ng, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng
摘要:尖峰神经网络(SNN)为节能和类脑计算提供了一个有前景的方向,但它们对对抗扰动的脆弱性仍然知之甚少。在这项工作中,我们通过时间集成的视角重新审视SNN的对抗鲁棒性,将网络视为跨离散时间步演化的子网络集合。这一表述揭示了两个关键但尚未充分探索的挑战——单个时间子网络的脆弱性,以及对抗脆弱性跨时间迁移的趋势。为了克服这些局限,我们提出了鲁棒时间自集成(RTE),这是一种在提升每个子网络鲁棒性的同时,降低对抗扰动时间可迁移性的训练框架。RTE将这两个目标整合进一个统一的损失函数,并采用随机采样策略进行高效优化。在多个基准上的大量实验表明,RTE在鲁棒性-准确性权衡方面始终优于现有训练方法。进一步的分析表明,RTE重塑了SNN的内部鲁棒性景观,带来了更具弹性且在时间上更多样化的决策边界。我们的研究强调了时间结构在对抗学习中的重要性,并为构建鲁棒的尖峰模型提供了原则性基础。
摘要:Spiking Neural Networks (SNNs) offer a promising direction for energy-efficient and brain-inspired computing, yet their vulnerability to adversarial perturbations remains poorly understood. In this work, we revisit the adversarial robustness of SNNs through the lens of temporal ensembling, treating the network as a collection of evolving sub-networks across discrete timesteps. This formulation uncovers two critical but underexplored challenges-the fragility of individual temporal sub-networks and the tendency for adversarial vulnerabilities to transfer across time. To overcome these limitations, we propose Robust Temporal self-Ensemble (RTE), a training framework that improves the robustness of each sub-network while reducing the temporal transferability of adversarial perturbations. RTE integrates both objectives into a unified loss and employs a stochastic sampling strategy for efficient optimization. Extensive experiments across multiple benchmarks demonstrate that RTE consistently outperforms existing training methods in robust-accuracy trade-off. Additional analyses reveal that RTE reshapes the internal robustness landscape of SNNs, leading to more resilient and temporally diversified decision boundaries. Our study highlights the importance of temporal structure in adversarial learning and offers a principled foundation for building robust spiking models.
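RTE的统一损失与随机采样策略在摘要中未给出具体形式;下面是一个假设性的草图,仅用于演示"随机抽取部分时间步的子网络损失 + 时间平均(集成)输出损失"这一思路,并非论文原式。

```python
import torch
import torch.nn.functional as F

def rte_style_loss(per_step_logits, target, k=3):
    """假设性草图:随机抽取 k 个时间步的子网络损失,加上时间平均
    (集成)输出的损失,近似"统一损失 + 随机采样"的思路。"""
    T = per_step_logits.shape[0]                      # (T, batch, classes)
    idx = torch.randperm(T)[:k]
    step_loss = sum(F.cross_entropy(per_step_logits[t], target) for t in idx) / k
    ens_loss = F.cross_entropy(per_step_logits.mean(dim=0), target)
    return step_loss + ens_loss
```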


【9】Enhancing Interactive Voting-Based Map Matching: Improving Efficiency and Robustness for Heterogeneous GPS Trajectories
标题:增强基于交互式投票的地图匹配:提升异构GPS轨迹的效率与鲁棒性
链接:https://arxiv.org/abs/2508.11235

作者:lemanni, Arianna Burzacchi, Davide Colombi, Elena Giarratano
摘要:本文提出了基于交互式投票的地图匹配算法的增强版本,旨在高效处理采样率各异的轨迹。主要目标是在不依赖输入数据质量的前提下,高精度地重建GPS轨迹。该算法最初专为将GPS信号对齐到道路网络而开发,我们在其基础上集成了轨迹插补,从而扩展了其能力。我们的改进还包括实现一种距离受限的交互式投票策略以降低计算复杂度,以及针对道路网络中数据缺失的修改。此外,我们整合了一个基于OpenStreetMap构建的定制资产,使该方法能够顺利应用于OpenStreetMap道路网络覆盖的任何地理区域。这些改进保留了原算法的核心优势,同时显著扩展了其在各类真实场景中的适用性。
摘要:This paper presents an enhanced version of the Interactive Voting-Based Map Matching algorithm, designed to efficiently process trajectories with varying sampling rates. The main aim is to reconstruct GPS trajectories with high accuracy, independent of input data quality. Building upon the original algorithm, developed exclusively for aligning GPS signals to road networks, we extend its capabilities by integrating trajectory imputation. Our improvements also include the implementation of a distance-bounded interactive voting strategy to reduce computational complexity, as well as modifications to address missing data in the road network. Furthermore, we incorporate a custom-built asset derived from OpenStreetMap, enabling this approach to be smoothly applied in any geographic region covered by OpenStreetMap's road network. These advancements preserve the core strengths of the original algorithm while significantly extending its applicability to diverse real-world scenarios.
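距离受限交互式投票的核心是只让距离在阈值内的点对相互投票;下面给出一个假设性的Python示意(朴素的成对扫描,实际实现会配合空间索引以避免全量比较)。

```python
import math

def haversine(p, q):
    """两个 (纬度, 经度) 点之间的大圆距离(米)。"""
    R = 6371000.0
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi, dlmb = math.radians(q[0] - p[0]), math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def voting_pairs(points, bound_m):
    """只保留距离在 bound_m 米以内、允许相互投票的点对
    (朴素 O(n^2) 扫描,仅作示意)。"""
    return [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))
            if haversine(points[i], points[j]) <= bound_m]
```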


【10】How Causal Abstraction Underpins Computational Explanation
标题:因果抽象如何支撑计算解释
链接:https://arxiv.org/abs/2508.11214

作者:eiger, Jacqueline Harding, Thomas Icard
摘要:对认知行为的解释常常诉诸对表征进行的计算。一个系统要在其内部合适的表征载体上实现给定的计算,需要满足什么条件?我们认为,因果性的语言——特别是因果抽象理论——为这一话题提供了富有成效的视角。借助当前关于人工神经网络深度学习的讨论,我们说明了计算哲学与认知哲学中的经典主题如何在当代机器学习中重现。我们给出了一种以因果抽象为基础的计算实现理论,并考察了表征在由此形成的图景中的作用。我们认为,结合泛化与预测来探讨这些问题最有成效。
摘要:Explanations of cognitive behavior often appeal to computations over representations. What does it take for a system to implement a given computation over suitable representational vehicles within that system? We argue that the language of causality -- and specifically the theory of causal abstraction -- provides a fruitful lens on this topic. Drawing on current discussions in deep learning with artificial neural networks, we illustrate how classical themes in the philosophy of computation and cognition resurface in contemporary machine learning. We offer an account of computational implementation grounded in causal abstraction, and examine the role for representation in the resulting picture. We argue that these issues are most profitably explored in connection with generalization and prediction.


【11】Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis
标题:量化与修剪:来自强彩票假设的见解
链接:https://arxiv.org/abs/2508.11020

作者:mar, Emanuele Natale
摘要:量化是提升神经网络效率的一项基本技术,但我们对它的理论理解仍然有限。先前的工作表明,二值网络等极低精度的网络可以通过修剪大型随机初始化网络来构造,并表明原始网络与修剪后网络的大小之比至多为多重对数(polylogarithmic)级。他们采用的特定修剪方法激发了一系列被称为强彩票假设(SLTH)的理论工作,该假设利用了随机子集和问题(Random Subset Sum Problem)的洞见。然而,这些结果主要针对连续设置,无法用于将SLTH结果推广到量化设置。在这项工作中,我们建立在Borgs等人关于数分拆问题(Number Partitioning Problem)的基础性结果之上,在量化设置下推导出随机子集和问题的新理论结果。利用这些结果,我们将SLTH框架扩展到有限精度网络。此前关于SLTH的工作表明修剪可以近似某一类神经网络,而我们证明,在量化设置下,相应的一类目标离散神经网络可以被精确表示,并证明了初始网络所需过参数化程度作为目标网络精度的函数的最优界。
摘要 :Quantization is an essential technique for making neural networks more efficient, yet our theoretical understanding of it remains limited. Previous works demonstrated that extremely low-precision networks, such as binary networks, can be constructed by pruning large, randomly-initialized networks, and showed that the ratio between the size of the original and the pruned networks is at most polylogarithmic.   The specific pruning method they employed inspired a line of theoretical work known as the Strong Lottery Ticket Hypothesis (SLTH), which leverages insights from the Random Subset Sum Problem. However, these results primarily address the continuous setting and cannot be applied to extend SLTH results to the quantized setting.   In this work, we build on foundational results by Borgs et al. on the Number Partitioning Problem to derive new theoretical results for the Random Subset Sum Problem in a quantized setting.   Using these results, we then extend the SLTH framework to finite-precision networks. While prior work on SLTH showed that pruning allows approximation of a certain class of neural networks, we demonstrate that, in the quantized setting, the analogous class of target discrete neural networks can be represented exactly, and we prove optimal bounds on the necessary overparameterization of the initial network as a function of the precision of the target network.
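SLTH背后的随机子集和直觉可以用几行代码演示(仅为说明连续情形的现象;量化情形如论文所述可做到精确表示):候选数不大时,某个子集和就能非常接近任意目标值,且误差随候选数指数级缩小。

```python
import itertools
import numpy as np

# 随机子集和现象的直观演示:16个均匀随机候选值,
# 穷举全部 2^16 个子集,寻找与目标值最接近的子集和。
rng = np.random.default_rng(1)
cand = rng.uniform(-1, 1, size=16)
target = 0.3141
best_gap = min(abs(sum(s) - target)
               for r in range(len(cand) + 1)
               for s in itertools.combinations(cand, r))
print(best_gap)   # 通常是一个很小的数;理论上误差随候选数指数级缩小
```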


【12】CURE: Critical-Token-Guided Re-concatenation for Entropy-collapse Prevention
标题:CURE:关键令牌引导的重新连接以防止熵崩溃
链接:https://arxiv.org/abs/2508.11016

作者:i, Rongkun Xue, Jie Wang, Ming Zhou, Zhi Li, Xiaofeng Ji, Yongqi Wang, Miao Liu, Zheming Yang, Minghui Qiu, Jing Yang
摘要:带可验证奖励的强化学习(RLVR)的最新进展推动了大型语言模型(LLM)中更复杂认知行为的涌现,从而增强了它们的推理能力。然而,在先前的RLVR流水线中,每个采样阶段都重复使用严格按数据集分布抽取的静态初始状态采样,导致模型行为过度确定、多样性低,表现为熵的快速崩溃,并阻碍了长期训练中性能的持续提升。为解决这一问题,我们引入了CURE(Critical-token-gUided Re-concatenation for Entropy-collapse prevention),一个平衡探索与利用的两阶段框架。具体而言,在第一阶段,为了有意引导模型进入新颖而连贯的上下文,我们在高熵关键令牌处重新生成,并联合优化原始轨迹和分支轨迹。与原始DAPO算法的进一步比较表明,该再生成过程在数学推理任务上取得了更好的性能,同时保持较高的熵水平以利于探索。在第二阶段,我们继续使用DAPO的静态初始状态采样进行训练,有意让模型处于熟悉的状态,以逐步强化利用。在Qwen-2.5-Math-7B上的大量实验表明,与其他RLVR方法相比,CURE在六个数学基准上实现了5%的性能增益,在熵和准确性两方面均达到最先进水平。一系列实验进一步验证了该方法的有效性。代码可在 https://github.com/CURE-Project/CURE 获取。
摘要:Recent advances in Reinforcement Learning with Verified Reward (RLVR) have driven the emergence of more sophisticated cognitive behaviors in large language models (LLMs), thereby enhancing their reasoning capabilities. However, in prior RLVR pipelines, the repeated use of static initial-state sampling drawn exactly from the dataset distribution during each sampling phase produced overly deterministic, low diversity model behavior, which manifested as rapid entropy collapse and hindered sustained performance gains during prolonged training. To address this issue, we introduce CURE (Critical-token-gUided Re concatenation for Entropy-collapse prevention), a two-stage framework that balances exploration and exploitation. Specifically, in the first stage, to deliberately steer the model toward novel yet coherent contexts, we re-generate at high-entropy critical tokens and jointly optimize the original and the branched trajectories. The further comparison with vanilla DAPO shows that the regeneration process achieves a better performance on math reasoning tasks while sustaining a high-level entropy degree for exploration. In the second stage, we continue training with static initial-state sampling by DAPO, intentionally placing the model in a familiar state to gradually strengthen exploitation. Extensive experiments on Qwen-2.5-Math-7B show that, compared to other RLVR methods, CURE achieves a 5% performance gain across six math benchmarks, establishing state-of-the-art performance in both entropy and accuracy. A series of experiments further validate the effectiveness of our approach. Code is available at https://github.com/CURE-Project/CURE.
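下面是"高熵关键令牌"选择步骤的一个示意性草图(非完整CURE流水线,函数名为假设):由每步logits计算令牌熵,并取熵最高的位置作为重新生成的分支点。

```python
import torch

def critical_token_positions(logits, top_k=4):
    """由每步 logits(形状 (seq_len, vocab))计算令牌熵,
    返回熵最高的 top_k 个位置,作为重新生成的分支点(示意)。"""
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)          # (seq_len,)
    return torch.topk(entropy, k=min(top_k, entropy.numel())).indices
```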


【13】BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
标题:BeyondWeb:从扩展万亿级预训练的合成数据中获得的经验教训
链接:https://arxiv.org/abs/2508.10975

作者:Maini, Vineeth Dorna, Parth Doshi, Aldo Carranza, Fan Pan, Jack Urbanek, Paul Burstein, Alex Fang, Alvin Deng, Amro Abbas, Brett Larsen, Cody Blakeney, Charvi Bannur, Christina Baek, Darren Teh, David Schwab, Haakon Mongstad, Haoli Yin, Josh Wills, Kaleigh Mentzer, Luke Merrick, Ricardo Monti, Rishabh Adiga, Siddharth Joshi, Spandan Das, Zhengping Wang, Bogdan Gaza, Ari Morcos, Matthew Leavitt
摘要:大型语言模型(LLM)预训练的最新进展表明,单纯扩大数据量最终会导致收益递减,撞上"数据墙"。作为应对,使用合成数据进行预训练已成为推动性能前沿的一个有前景的范式。尽管如此,影响合成数据质量的因素仍然知之甚少。在这项工作中,我们介绍了BeyondWeb,一个为预训练生成高质量合成数据的框架。BeyondWeb显著扩展了传统网络规模数据集的能力:在14项基准评估上取平均时,其性能比Cosmopedia和Nemotron-CC的高质量合成子集(Nemotron-Synth)等最先进的合成预训练数据集分别高出至多5.1个百分点(pp)和2.6pp。它的训练速度比开放网络数据快至多7.7倍,比Nemotron-Synth快2.7倍。值得注意的是,在BeyondWeb上训练180B令牌的3B模型,优于在Cosmopedia上以相同令牌预算训练的8B模型。我们还从BeyondWeb中总结了关于预训练合成数据的若干见解:其收益由什么驱动、哪些数据需要改写以及如何改写,以及模型规模和模型家族对数据质量的影响。总体而言,我们的工作表明,生成高质量合成预训练数据没有灵丹妙药。最佳结果需要联合优化诸多因素,这是一项需要严谨科学与实践经验的挑战性任务。朴素的方法可能付出高昂代价却只带来有限改进,而执行得当的方法则可以带来变革性的提升,BeyondWeb就是例证。
摘要:Recent advances in large language model (LLM) pretraining have shown that simply scaling data quantity eventually leads to diminishing returns, hitting a data wall. In response, the use of synthetic data for pretraining has emerged as a promising paradigm for pushing the frontier of performance. Despite this, the factors affecting synthetic data quality remain poorly understood. In this work, we introduce BeyondWeb, a synthetic data generation framework that produces high-quality synthetic data for pretraining. BeyondWeb significantly extends the capabilities of traditional web-scale datasets, outperforming state-of-the-art synthetic pretraining datasets such as Cosmopedia and Nemotron-CC's high-quality synthetic subset (Nemotron-Synth) by up to 5.1 percentage points (pp) and 2.6pp, respectively, when averaged across a suite of 14 benchmark evaluations. It delivers up to 7.7x faster training than open web data and 2.7x faster than Nemotron-Synth. Remarkably, a 3B model trained for 180B tokens on BeyondWeb outperforms an 8B model trained for the same token budget on Cosmopedia. We also present several insights from BeyondWeb on synthetic data for pretraining: what drives its benefits, which data to rephrase and how, and the impact of model size and family on data quality. Overall, our work shows that there's no silver bullet for generating high-quality synthetic pretraining data. The best outcomes require jointly optimizing many factors, a challenging task that requires rigorous science and practical expertise. Naive approaches can yield modest improvements, potentially at great cost, while well-executed methods can yield transformative improvements, as exemplified by BeyondWeb.


【14】Apriel-Nemotron-15B-Thinker
标题:Apriel-Nemotron-15B-Thinker
链接:https://arxiv.org/abs/2508.10948

作者:Radhakrishna, Soham Parikh, Gopal Sarda, Anil Turkkan, Quaizar Vohra, Raymond Li, Dhruv Jhamb, Kelechi Ogueji, Aanjaneya Shukla, Oluwanifemi Bamgbose, Toby Liang, Luke Kumar, Oleksiy Ostapenko, Shiva Krishna Reddy Malay, Aman Tiwari, Tara Bogavelli, Vikas Yadav, Jash Mehta, Saloni Mittal, Akshay Kalkunte, Pulkit Pattnaik, Khalil Slimi, Anirudh Sreeram, Jishnu Nair, Akintunde Oladipo, Shashank Maiya, Khyati Mahajan, Rishabh Maheshwary, Masoud Hashemi, Sai Rajeswar Mudumba, Sathwik Tejaswi Madhusudhan, Torsten Scholak, Sebastien Paquet, Sagar Davasam, Srinivas Sunkara
摘要:虽然大型语言模型(LLM)在代码、数学及其他企业任务等领域取得了卓越的推理能力,但其巨大的内存和计算开销往往使其难以在实际企业环境中使用。为此,我们推出了Apriel-Nemotron-15B-Thinker,这是ServiceNow Apriel SLM系列中的一个150亿参数模型,其性能可与o1-mini、QWQ32B和EXAONE-Deep-32B等中等规模的最先进模型相当,而内存占用仅为这些替代方案的一半。Apriel-Nemotron-15B-Thinker通过四阶段训练管道进行训练,包括:1)基础模型升尺度,2)持续预训练,3)监督微调(SFT),以及4)使用GRPO的强化学习。对一系列不同基准的综合评估一致表明,尽管规模不到其一半,我们的Apriel-Nemotron-15B-Thinker模型达到或超过了320亿参数同类模型的性能。
摘要 :While large language models (LLMs) have achieved remarkable reasoning capabilities across domains like code, math and other enterprise tasks, their significant memory and computational costs often preclude their use in practical enterprise settings. To this end, we introduce Apriel-Nemotron-15B-Thinker, a 15-billion parameter model in the ServiceNow Apriel SLM series that achieves performance against medium sized state-of-the-art models such as o1-mini, QWQ32B, and EXAONE-Deep-32B while maintaining only half the memory footprint of those alternatives. Apriel-Nemotron-15B-Thinker model is trained in a four stage training pipeline including 1) Base Model upscaling, 2) Continual Pre-training 3) Supervised Fine-tuning (SFT) and 4) Reinforcement Learning using GRPO. Comprehensive evaluations across a diverse suite of benchmarks consistently demonstrate that our Apriel-Nemotron-15B-Thinker model matches or exceeds the performance of its 32-billion parameter counterparts, despite being less than half their size.


【15】Insect-Wing Structured Microfluidic System for Reservoir Computing
标题:用于储层计算的昆虫翼结构微流体系统
链接:https://arxiv.org/abs/2508.10915

作者:use (1), Thomas Ramsey (2), Samitha Somathilaka (1), Nicholas Kleinsasser (1), Sangjin Ryu (2), Sasitharan Balasubramaniam (1) ((1) School of Computing, University of Nebraska-Lincoln, Lincoln, Nebraska, USA, (2) Department of Mechanical and Materials Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA)
摘要:随着对更高效、更自适应计算的需求不断增长,受自然启发的架构为传统电子设计提供了有前景的替代方案。借鉴生物形态和流体动力学的微流体平台,为电子器件不适用环境中的低功耗、高韧性计算提供了令人信服的基础。本研究探索了一种混合储层计算系统,其基于受蜻蜓翅膀启发的微流控芯片,将时间输入模式编码为微通道网络内的流体相互作用。该系统使用三个基于染料的入口通道和三个由摄像头监控的检测区域,将离散的空间模式转换为动态的颜色输出信号。这些储层输出信号经过修正后,被传递给一个简单且可训练的读出层进行模式分类。通过结合原始储层输出与合成生成的输出,我们评估了系统性能、系统清晰度和数据效率。结果表明,即使在分辨率较粗、训练数据有限的情况下,分类准确率也能稳定达到$91\%$,凸显了微流体储层计算的可行性。
摘要:As the demand for more efficient and adaptive computing grows, nature-inspired architectures offer promising alternatives to conventional electronic designs. Microfluidic platforms, drawing on biological forms and fluid dynamics, present a compelling foundation for low-power, high-resilience computing in environments where electronics are unsuitable. This study explores a hybrid reservoir computing system based on a dragonfly-wing inspired microfluidic chip, which encodes temporal input patterns as fluid interactions within the micro channel network.   The system operates with three dye-based inlet channels and three camera-monitored detection areas, transforming discrete spatial patterns into dynamic color output signals. These reservoir output signals are then modified and passed to a simple and trainable readout layer for pattern classification. Using a combination of raw reservoir outputs and synthetically generated outputs, we evaluated system performance, system clarity, and data efficiency. The results demonstrate consistent classification accuracies up to $91\%$, even with coarse resolution and limited training data, highlighting the viability of the microfluidic reservoir computing.
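储层计算中"简单且可训练的读出层"通常用岭回归训练;下面给出一个假设性的Python草图(微流体储层在上游产生状态特征,此处只演示读出层的训练与预测)。

```python
import numpy as np

def train_readout(states, labels, ridge=1e-2):
    """用岭回归训练线性读出层:states 为 (样本数, 特征数) 的储层输出,
    labels 为整数类别;返回读出权重 W(假设性示例)。"""
    X = np.asarray(states, dtype=float)
    y = np.asarray(labels)
    Y = np.eye(y.max() + 1)[y]                          # one-hot 目标
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def predict(W, states):
    return np.argmax(np.asarray(states, dtype=float) @ W, axis=1)
```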


【16】Uncovering Latent Connections in Indigenous Heritage: Semantic Pipelines for Cultural Preservation in Brazil
标题:揭示土著遗产中的潜在联系:面向巴西文化保护的语义管道
链接:https://arxiv.org/abs/2508.10911

作者:r Zerkowski, Nina S. T. Hirata
备注:8 tables, 7 figures, submitted to AAAI2026
摘要:土著社区在保护其文化遗产方面面临持续挑战,在系统性边缘化和城市发展的背景下尤为如此。在巴西,国家土著人民博物馆通过Tainacan平台拥有该国最大的土著物品与图像志在线馆藏,为文化参与提供了重要资源。利用该存储库的公开数据,我们提出了一项数据驱动的倡议,应用人工智能来增强可访问性、解读和探索。我们开发了两条语义管道:一条视觉管道,对基于图像的相似性进行建模;一条文本管道,从条目描述中捕捉语义关系。这些嵌入空间被投影到二维,并集成到我们同时开发的交互式可视化工具中。除了基于相似性的导航之外,用户还可以通过时间和地理视角探索馆藏,从而同时获得语义视角和情境化视角。该系统支持策展工作,促进公众参与,并揭示馆藏中的潜在联系。这项工作展示了人工智能如何以合乎伦理的方式助力文化保护实践。
摘要:Indigenous communities face ongoing challenges in preserving their cultural heritage, particularly in the face of systemic marginalization and urban development. In Brazil, the Museu Nacional dos Povos Indigenas through the Tainacan platform hosts the country's largest online collection of Indigenous objects and iconographies, providing a critical resource for cultural engagement. Using publicly available data from this repository, we present a data-driven initiative that applies artificial intelligence to enhance accessibility, interpretation, and exploration. We develop two semantic pipelines: a visual pipeline that models image-based similarity and a textual pipeline that captures semantic relationships from item descriptions. These embedding spaces are projected into two dimensions and integrated into an interactive visualization tool we also developed. In addition to similarity-based navigation, users can explore the collection through temporal and geographic lenses, enabling both semantic and contextualized perspectives. The system supports curatorial tasks, aids public engagement, and reveals latent connections within the collection. This work demonstrates how AI can ethically contribute to cultural preservation practices.


【17】Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
标题:使用正交多项式稀疏回归逼近通用热气候指数
链接:https://arxiv.org/abs/2508.11307

作者:an, Gregor Skok, Ljupco Todorovski, Saso Dzeroski
摘要:本文探索了新颖的数据驱动建模方法,用于分析和逼近通用热气候指数(UTCI)——一种基于生理学、综合多个大气变量以评估热舒适度的指标。鉴于UTCI的非线性、多变量结构,我们研究了符号回归和稀疏回归技术,将其作为可解释且高效的函数逼近工具。特别地,我们强调了在稀疏回归框架中使用正交多项式基(如勒让德多项式)的好处,展示了与标准多项式展开相比,它们在稳定性、收敛性和层次可解释性方面的优势。我们证明,在参数数量相同或更少的情况下,我们的模型的均方根误差显著低于广泛使用的六次多项式基准。通过利用勒让德多项式基,我们构建的模型有效地铺满了准确性与复杂度的帕累托前沿,并在不同模型容量下展现出稳定的层次化系数结构。仅用20%的数据进行训练,我们的模型即可稳健地泛化到其余80%的数据,且在自举检验中性能一致。该分解将UTCI有效地近似为正交基上的类傅立叶展开,在L2(最小二乘)意义下得到接近理论最优的结果。我们还将这些发现与环境建模中方程发现的更广泛背景联系起来,参考了基于概率文法的方法,此类方法可在符号表达式中强制保证领域一致性和紧凑性。综上,这些结果说明了将稀疏性、正交性与符号结构相结合,如何实现对UTCI等复杂环境指数的鲁棒、可解释建模——并在准确性和效率上显著优于最先进的逼近方法。
摘要 :This article explores novel data-driven modeling approaches for analyzing and approximating the Universal Thermal Climate Index (UTCI), a physiologically-based metric integrating multiple atmospheric variables to assess thermal comfort. Given the nonlinear, multivariate structure of UTCI, we investigate symbolic and sparse regression techniques as tools for interpretable and efficient function approximation. In particular, we highlight the benefits of using orthogonal polynomial bases-such as Legendre polynomials-in sparse regression frameworks, demonstrating their advantages in stability, convergence, and hierarchical interpretability compared to standard polynomial expansions. We demonstrate that our models achieve significantly lower root-mean squared losses than the widely used sixth-degree polynomial benchmark-while using the same or fewer parameters. By leveraging Legendre polynomial bases, we construct models that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training on just 20% of the data, our models generalize robustly to the remaining 80%, with consistent performance under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense. We also connect these findings to the broader context of equation discovery in environmental modeling, referencing probabilistic grammar-based methods that enforce domain consistency and compactness in symbolic expressions. Taken together, these results illustrate how combining sparsity, orthogonality, and symbolic structure enables robust, interpretable modeling of complex environmental indices like UTCI - and significantly outperforms the state-of-the-art approximation in both accuracy and efficiency.
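下面用一维例子演示"正交勒让德基 + 稀疏回归"的做法(目标函数为假设性替身,并非UTCI本身;UTCI是多变量的,多元情形需用张量积基):特征需落在[-1, 1]上,Lasso自动筛除不重要的基函数。

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)                    # 勒让德基要求特征落在 [-1, 1]
y = np.sin(3 * x) + 0.05 * rng.standard_normal(500)   # 假设性目标,非UTCI

Phi = legendre.legvander(x, 15)                # 列:P_0(x) ... P_15(x)
model = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(Phi, y)
print(np.flatnonzero(np.abs(model.coef_) > 1e-6))      # 被保留的基函数下标
```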


【18】The Role of Entanglement in Quantum Reservoir Computing with Coupled Kerr Nonlinear Oscillators
标题:纠缠在耦合克尔非线性振子量子储层计算中的作用
链接:https://arxiv.org/abs/2508.11175

作者:i, Hadi Zadeh-Haghighi, Youssef Kora, Christoph Simon
摘要:量子储层计算(QRC)利用量子动力学高效处理时间数据。在这项工作中,我们研究了一个基于两个耦合克尔非线性振子的QRC框架,该系统由于其复杂的非线性相互作用和潜在的高维状态空间,非常适合时间序列预测任务。我们探讨了其时间序列预测性能如何依赖于关键物理参数:输入驱动强度、克尔非线性和振子间耦合,并分析了纠缠在提升储层计算性能中的作用,重点关注其对预测非平凡时间序列的影响。我们使用对数负性(logarithmic negativity)量化纠缠,并用归一化均方根误差(NRMSE)评估预测精度。结果表明,当输入频率低于某一阈值时,纠缠平均而言提供了计算优势,且这一优势在一定程度的耗散和退相位下依然存在。特别地,我们发现更高的耗散率反而可以提升性能。虽然纠缠优势表现为平均性能和最坏情况性能的改善,但它并未带来最佳情况误差的改善。这些发现有助于更广泛地理解用于高性能、高效量子机器学习和时间序列预测的量子储层。
摘要:Quantum Reservoir Computing (QRC) uses quantum dynamics to efficiently process temporal data. In this work, we investigate a QRC framework based on two coupled Kerr nonlinear oscillators, a system well-suited for time-series prediction tasks due to its complex nonlinear interactions and potentially high-dimensional state space. We explore how its performance in time-series prediction depends on key physical parameters: input drive strength, Kerr nonlinearity, and oscillator coupling, and analyze the role of entanglement in improving the reservoir's computational performance, focusing on its effect on predicting non-trivial time series. Using logarithmic negativity to quantify entanglement and normalized root mean square error (NRMSE) to evaluate predictive accuracy, our results suggest that entanglement provides a computational advantage on average-up to a threshold in the input frequency-that persists under some levels of dissipation and dephasing. In particular, we find that higher dissipation rates can enhance performance. While the entanglement advantage manifests as improvements in both average and worst-case performance, it does not lead to improvements in the best-case error. These findings contribute to the broader understanding of quantum reservoirs for high performance, efficient quantum machine learning and time-series forecasting.


【19】Generalized Similarity U: A Non-parametric Test of Association Based on Similarity
标题:广义相似性U:一种基于相似性的非参数关联检验
链接:https://arxiv.org/abs/1801.01220

作者:i Wei, Qing Lu
备注:None
摘要:第二代测序技术正越来越多地用于遗传关联研究,其主要研究目标是识别对各种表型有贡献的遗传变异集合。表型可以是单变量的疾病状态、多变量响应,甚至是高维结果。将基因型和表型视为两个复杂对象,这也提出了检验复杂对象之间关联的一般统计问题。本文提出了一种基于相似性的检验——广义相似性U(GSU),可用于检验复杂对象之间的关联。我们首先在一般情形下研究了该检验的理论性质,然后聚焦于其在测序关联研究中的应用。基于理论分析,我们建议在GSU中使用基于拉普拉斯核的相似性,以提升功效并增强鲁棒性。通过模拟,我们发现GSU在功效和鲁棒性方面确实优于现有方法。我们进一步对阿尔茨海默病神经影像学计划(ADNI)数据进行了全基因组测序(WGS)扫描,识别出与影像表型相关的三个基因:APOE、APOC1和TOMM40。我们开发了一个用于以GSU分析全基因组测序数据的C++软件包,源代码可从 https://github.com/changshuaiwei/gsu 下载。
摘要:Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotype. The phenotype can be univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. We here proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian kernel based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzherimer Disease Neuroimaging Initiative (ADNI) data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. We developed a C++ package for analysis of whole genome sequencing data using GSU. The source codes can be downloaded at https://github.com/changshuaiwei/gsu.
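下面是基于相似性的关联U统计量的一个极简示意(GSU的精确加权与中心化以论文及其C++软件包为准,此处仅用Python演示"拉普拉斯核基因型相似性 × 表型相似性在所有不同样本对上求平均"的骨架):

```python
import numpy as np

def gsu_style_statistic(G, Y, bandwidth=1.0):
    """骨架(假设性形式):基因型用拉普拉斯核相似性(如论文所倡导),
    表型用同类核,在所有不同样本对上对两种相似性的乘积求平均。
    G: (n, p) 基因型矩阵;Y: (n,) 表型向量。"""
    Sg = np.exp(-np.abs(G[:, None, :] - G[None, :, :]).sum(-1) / bandwidth)
    Sy = np.exp(-np.abs(Y[:, None] - Y[None, :]) / bandwidth)
    n = len(Y)
    mask = ~np.eye(n, dtype=bool)                 # 排除 i == j 的对角项
    return (Sg[mask] * Sy[mask]).sum() / (n * (n - 1))
```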


【20】A Weighted U Statistic for Genetic Association Analyses of Sequencing Data
标题:测序数据遗传关联分析的加权U统计量
链接:https://arxiv.org/abs/1505.01204

作者:i Wei, Ming Li, Zihuai He, Olga Vsevolozhskaya, Daniel J. Schaid, Qing Lu
备注:None
摘要:随着下一代测序技术的进步,产生了海量测序数据,为全面研究罕见变异在复杂疾病遗传病因中的作用提供了良机。然而,这也给高维测序数据的统计分析带来了巨大挑战。由于遗传变异频率低且数据维度极高,基于传统统计方法的关联分析会遭受严重的功效损失。我们开发了一种加权U统计量,称为WU-seq,用于测序数据的高维关联分析。WU-SEQ基于非参数U统计量,不对潜在疾病模型和表型分布作任何假设,可应用于多种表型。通过模拟研究和实证研究,我们表明,当基本假设被违反时(例如表型服从重尾分布),WU-SEQ优于常用的SKAT方法;即使假设得到满足,WU-SEQ仍能取得与SKAT相当的性能。最后,我们将WU-seq应用于达拉斯心脏研究(DHS)的测序数据,检测到ANGPTL4与极低密度脂蛋白胆固醇之间的关联。
摘要:With advancements in next generation sequencing technology, a massive amount of sequencing data are generated, offering a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, this poses a great challenge for the statistical analysis of high-dimensional sequencing data. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a weighted U statistic, referred to as WU-seq, for the high-dimensional association analysis of sequencing data. Based on a non-parametric U statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used SKAT method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-seq to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.


机器翻译由腾讯交互翻译提供,仅供参考
