
Machine Learning arXiv Daily Digest [8.20]

arXiv每日学术速递

Visit arxivdaily.com for coverage of CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, bookmarking, and more!


cs.LG: 118 papers today


Large Models (8 papers)

【1】Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
链接/Link: https://arxiv.org/abs/2508.13755

Authors: Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Dongchun Xie, Yiwei Wang, Xiaodan Liang, Jing Tang
Comments: 11 pages, 9 figures
Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has emerged as a powerful paradigm for unlocking reasoning capabilities in large language models, yet its full potential is hindered by two under-explored dimensions: Depth-the hardest problem a model can sample; Breadth-the number of instances consumed in a single iteration. We dissect the popular GRPO algorithm and reveal a systematic bias: the cumulative-advantage disproportionately weights samples with medium accuracy, while down-weighting the low-accuracy instances that are crucial for pushing reasoning boundaries. To rectify the depth neglect, we introduce Difficulty Adaptive Rollout Sampling (DARS), which re-weights hard problems through targeted multi-stage rollouts, thereby increasing the number of positive rollouts for hard problems. Empirically, naively enlarging rollout size only accelerates convergence and even hurts Pass@K. Our DARS, in contrast, delivers consistent Pass@K gains without extra inference cost at convergence. Just as we adaptively expanded the depth of exploration, we now ask whether aggressively scaling the breadth of training data can further amplify reasoning gains. To this end, we intensely scale batch size and replace PPO's mini-batch iterations with full-batch updates over multiple epochs. Increasing breadth significantly enhances Pass@1 performance. Large-breadth training sustains high token-level entropy, indicating continued exploration and reduced gradient noise. We further present DARS-B, which augments DARS with large breadth, and demonstrate simultaneous gains in Pass@K and Pass@1. The results confirm that breadth and adaptive exploration across depth operate as orthogonal dimensions in RLVR, which are key to unleashing the reasoning power of RLVR.
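To make the depth mechanism concrete, here is a minimal sketch of what difficulty-adaptive rollout budgeting could look like; the inverse-accuracy schedule, clipping bounds, and function name are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def dars_rollout_budget(pass_rates, base_n=8, max_n=64):
    # Hypothetical DARS-style schedule: prompts with low empirical pass rates
    # receive extra rollouts so hard problems still collect positive samples.
    pass_rates = np.asarray(pass_rates, dtype=float)
    weights = 1.0 / np.clip(pass_rates, 0.05, 1.0)   # harder => larger weight
    budgets = base_n * weights / weights.mean()      # keep the mean near base_n
    return np.clip(np.round(budgets), base_n, max_n).astype(int)

# A prompt solved 2% of the time gets far more rollouts than one solved 80%.
print(dars_rollout_budget([0.02, 0.30, 0.80]))  # -> [20  8  8]
```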


【2】ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
链接/Link: https://arxiv.org/abs/2508.13680

Authors: Dang, An Vo, Quang Tau, Duc Dm, Daeyoung Kim
Abstract: Vision language models (VLMs) demonstrate remarkable capabilities on English multimodal tasks, but their performance on low-resource languages with genuinely multimodal educational content remains largely unexplored. In this work, we test how VLMs perform on Vietnamese educational assessments, investigating whether VLMs trained predominantly on English data can handle real-world cross-lingual multimodal reasoning. Our work presents the first comprehensive evaluation of VLM capabilities on multimodal Vietnamese exams through proposing ViExam, a benchmark containing 2,548 multimodal questions. We find that state-of-the-art VLMs achieve only 57.74% while open-source models achieve 27.70% mean accuracy across 7 academic domains, including Mathematics, Physics, Chemistry, Biology, Geography, Driving Test, and IQ Test. Most VLMs underperform average human test-takers (66.54%), with only the thinking VLM o3 (74.07%) exceeding human average performance, yet still falling substantially short of human best performance (99.60%). Cross-lingual prompting with English instructions while maintaining Vietnamese content fails to improve performance, decreasing accuracy by 1 percentage point for SOTA VLMs. Human-in-the-loop collaboration can partially improve VLM performance by 5 percentage points. Code and data are available at: https://vi-exam.github.io.


【3】Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models
链接/Link: https://arxiv.org/abs/2508.13678

Authors: Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai, Yu-Feng Li
Comments: 9 pages, 3 figures, IJCAI 2025 Survey Track
Abstract: Large Language Models (LLMs) have shown promising results across various tasks, yet their reasoning capabilities remain a fundamental challenge. Developing AI systems with strong reasoning capabilities is regarded as a crucial milestone in the pursuit of Artificial General Intelligence (AGI) and has garnered considerable attention from both academia and industry. Various techniques have been explored to enhance the reasoning capabilities of LLMs, with neuro-symbolic approaches being a particularly promising way. This paper comprehensively reviews recent developments in neuro-symbolic approaches for enhancing LLM reasoning. We first present a formalization of reasoning tasks and give a brief introduction to the neuro-symbolic learning paradigm. Then, we discuss neuro-symbolic methods for improving the reasoning capabilities of LLMs from three perspectives: Symbolic->LLM, LLM->Symbolic, and LLM+Symbolic. Finally, we discuss several key challenges and promising future directions. We have also released a GitHub repository including papers and resources related to this survey: https://github.com/LAMDASZ-ML/Awesome-LLM-Reasoning-with-NeSy.


【4】LLM-Enhanced Linear Autoencoders for Recommendation
链接/Link: https://arxiv.org/abs/2508.13500

Authors: on, Seongmin Park, Jongwuk Lee
Comments: Accepted by CIKM 2025
Abstract: Large language models (LLMs) have been widely adopted to enrich the semantic representation of textual item information in recommender systems. However, existing linear autoencoders (LAEs) that incorporate textual information rely on sparse word co-occurrence patterns, limiting their ability to capture rich textual semantics. To address this, we propose L3AE, the first integration of LLMs into the LAE framework. L3AE effectively integrates the heterogeneous knowledge of textual semantics and user-item interactions through a two-phase optimization strategy. (i) L3AE first constructs a semantic item-to-item correlation matrix from LLM-derived item representations. (ii) It then learns an item-to-item weight matrix from collaborative signals while distilling semantic item correlations as regularization. Notably, each phase of L3AE is optimized through closed-form solutions, ensuring global optimality and computational efficiency. Extensive experiments demonstrate that L3AE consistently outperforms state-of-the-art LLM-enhanced models on three benchmark datasets, achieving gains of 27.6% in Recall@20 and 39.3% in NDCG@20. The source code is available at https://github.com/jaewan7599/L3AE_CIKM2025.
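The closed-form flavor of the two phases can be illustrated with a ridge-style identity. The sketch below solves a hypothetical objective ||X - XB||^2 + lam*||B||^2 + beta*||B - S||^2, which is in the spirit of phase (ii) but is not the paper's exact formulation (zero-diagonal constraints are omitted for brevity).

```python
import numpy as np

def l3ae_closed_form(X, S, lam=100.0, beta=1.0):
    # X: user-item interaction matrix; S: semantic item-item correlation
    # matrix from LLM embeddings. Setting the gradient to zero gives:
    #   (X^T X + (lam + beta) I) B = X^T X + beta * S
    G = X.T @ X
    n = G.shape[0]
    return np.linalg.solve(G + (lam + beta) * np.eye(n), G + beta * S)

# S could be cosine similarity of L2-normalized LLM item embeddings E:
#   E = E / np.linalg.norm(E, axis=1, keepdims=True); S = E @ E.T
```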


【5】NovoMolGen: Rethinking Molecular Language Model Pretraining
链接/Link: https://arxiv.org/abs/2508.13408

Authors: itsaz, Roshan Balaji, Quentin Fournier, Nirav Pravinbhai Bhatt, Sarath Chandar
Abstract: Designing de-novo molecules with desired property profiles requires efficient exploration of the vast chemical space ranging from $10^{23}$ to $10^{60}$ possible synthesizable candidates. While various deep generative models have been developed to design small molecules using diverse input representations, Molecular Large Language Models (Mol-LLMs) based on string representations have emerged as a scalable approach capable of exploring billions of molecules. However, there remains limited understanding regarding how standard language modeling practices such as textual representations, tokenization strategies, model size, and dataset scale impact molecular generation performance. In this work, we systematically investigate these critical aspects by introducing NovoMolGen, a family of transformer-based foundation models pretrained on 1.5 billion molecules for de-novo molecule generation. Through extensive empirical analyses, we identify a weak correlation between performance metrics measured during pretraining and actual downstream performance, revealing important distinctions between molecular and general NLP training dynamics. NovoMolGen establishes new state-of-the-art results, substantially outperforming prior Mol-LLMs and specialized generative models in both unconstrained and goal-directed molecular generation tasks, thus providing a robust foundation for advancing efficient and effective molecular modeling strategies.


【6】Contextual Attention-Based Multimodal Fusion of LLM and CNN for Sentiment Analysis
链接/Link: https://arxiv.org/abs/2508.13196

Authors: rkouk, Miloud Mihoubi, Belkacem Chikhaoui
Comments: The 38th Canadian Conference on Artificial Intelligence (2025)
Abstract: This paper introduces a novel approach for multimodal sentiment analysis on social media, particularly in the context of natural disasters, where understanding public sentiment is crucial for effective crisis management. Unlike conventional methods that process text and image modalities separately, our approach seamlessly integrates Convolutional Neural Network (CNN) based image analysis with Large Language Model (LLM) based text processing, leveraging Generative Pre-trained Transformer (GPT) and prompt engineering to extract sentiment relevant features from the CrisisMMD dataset. To effectively model intermodal relationships, we introduce a contextual attention mechanism within the fusion process. Leveraging contextual-attention layers, this mechanism effectively captures intermodality interactions, enhancing the model's comprehension of complex relationships between textual and visual data. The deep neural network architecture of our model learns from these fused features, leading to improved accuracy compared to existing baselines. Experimental results demonstrate significant advancements in classifying social media data into informative and noninformative categories across various natural disasters. Our model achieves a notable 2.43% increase in accuracy and 5.18% in F1-score, highlighting its efficacy in processing complex multimodal data. Beyond quantitative metrics, our approach provides deeper insight into the sentiments expressed during crises. The practical implications extend to real time disaster management, where enhanced sentiment analysis can optimize the accuracy of emergency interventions. By bridging the gap between multimodal analysis, LLM powered text understanding, and disaster response, our work presents a promising direction for Artificial Intelligence (AI) driven crisis management solutions.
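A minimal sketch of what an attention-based fusion of CNN image features and LLM text features could look like; layer sizes and the one-token-per-modality treatment are simplifying assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ContextualAttentionFusion(nn.Module):
    # Text features (LLM) attend over image features (CNN).
    def __init__(self, img_dim=2048, txt_dim=768, d=256, n_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d)
        self.txt_proj = nn.Linear(txt_dim, d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d, n_classes)  # informative vs. not

    def forward(self, img_feat, txt_feat):
        q = self.txt_proj(txt_feat).unsqueeze(1)   # (batch, 1, d)
        kv = self.img_proj(img_feat).unsqueeze(1)  # (batch, 1, d)
        fused, _ = self.attn(q, kv, kv)
        return self.classifier(fused.squeeze(1))
```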


【7】FedChip: Federated LLM for Artificial Intelligence Accelerator Chip Design
链接/Link: https://arxiv.org/abs/2508.13162

Authors: azzal, Khoa Nguyen, Deepak Vungarala, Ramtin Zand, Shaahin Angizi, Hai Phan, Abdallah Khreishah
Abstract: AI hardware design is advancing rapidly, driven by the promise of design automation to make chip development faster, more efficient, and more accessible to a wide range of users. Amongst automation tools, Large Language Models (LLMs) offer a promising solution by automating and streamlining parts of the design process. However, their potential is hindered by data privacy concerns and the lack of domain-specific training. To address this, we introduce FedChip, a Federated fine-tuning approach that enables multiple Chip design parties to collaboratively enhance a shared LLM dedicated for automated hardware design generation while protecting proprietary data. FedChip enables parties to train the model on proprietary local data and improve the shared LLM's performance. To exemplify FedChip's deployment, we create and release APTPU-Gen, a dataset of 30k design variations spanning various performance metric values such as power, performance, and area (PPA). To encourage the LLM to generate designs that achieve a balance across multiple quality metrics, we propose a new design evaluation metric, Chip@k, which statistically evaluates the quality of generated designs against predefined acceptance criteria. Experimental results show that FedChip improves design quality by more than 77% over high-end LLMs while maintaining data privacy.
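Chip@k is described as a statistical acceptance-based metric; a pass@k-style reading of it might look like the sketch below. The acceptance rule and the estimator are assumptions for illustration.

```python
def chip_at_k(design_batches, acceptable, k):
    # For each prompt: do any of the first k sampled designs meet all
    # predefined acceptance criteria (e.g., PPA thresholds)?
    hits = sum(any(acceptable(d) for d in designs[:k]) for designs in design_batches)
    return hits / len(design_batches)

# Hypothetical acceptance rule over power/performance/area numbers.
ok = lambda d: d["power"] <= 1.0 and d["delay"] <= 2.0 and d["area"] <= 3.0
print(chip_at_k([[{"power": 0.8, "delay": 1.5, "area": 2.0}]], ok, k=1))  # 1.0
```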


【8】Uncovering Emergent Physics Representations Learned In-Context by Large Language Models
链接/Link: https://arxiv.org/abs/2508.12448

Authors: Song, Jaeyong Bae, Dong-Kyum Kim, Hawoong Jeong
Comments: 17 pages, 10 figures
Abstract: Large language models (LLMs) exhibit impressive in-context learning (ICL) abilities, enabling them to solve a wide range of tasks via textual prompts alone. As these capabilities advance, the range of applicable domains continues to expand significantly. However, identifying the precise mechanisms or internal structures within LLMs that allow successful ICL across diverse, distinct classes of tasks remains elusive. Physics-based tasks offer a promising testbed for probing this challenge. Unlike synthetic sequences such as basic arithmetic or symbolic equations, physical systems provide experimentally controllable, real-world data based on structured dynamics grounded in fundamental principles. This makes them particularly suitable for studying the emergent reasoning behaviors of LLMs in a realistic yet tractable setting. Here, we mechanistically investigate the ICL ability of LLMs, especially focusing on their ability to reason about physics. Using a dynamics forecasting task in physical systems as a proxy, we evaluate whether LLMs can learn physics in context. We first show that the performance of dynamics forecasting in context improves with longer input contexts. To uncover how such capability emerges in LLMs, we analyze the model's residual stream activations using sparse autoencoders (SAEs). Our experiments reveal that the features captured by SAEs correlate with key physical variables, such as energy. These findings demonstrate that meaningful physical concepts are encoded within LLMs during in-context learning. In sum, our work provides a novel case study that broadens our understanding of how LLMs learn in context.
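A standard sparse autoencoder of the kind used for residual-stream analysis looks roughly like this; the dimensions and L1 weight are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=8192, l1=1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)
        self.l1 = l1

    def forward(self, x):
        f = torch.relu(self.enc(x))      # sparse feature activations
        recon = self.dec(f)
        # Reconstruction error plus an L1 penalty that encourages sparse,
        # interpretable features (e.g., ones correlating with energy).
        loss = ((recon - x) ** 2).mean() + self.l1 * f.abs().mean()
        return recon, f, loss
```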


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (5 papers)

【1】Efficient Knowledge Graph Unlearning with Zeroth-order Information
链接/Link: https://arxiv.org/abs/2508.14013

Authors: , Ruimeng Ye, Bohan Liu, Xiaolong Ma, Bo Hui
Comments: 9 pages
Abstract: Due to regulations like the Right to be Forgotten, there is growing demand for removing training data and its influence from models. Since full retraining is costly, various machine unlearning methods have been proposed. In this paper, we first present an efficient knowledge graph (KG) unlearning algorithm. We remark that KG unlearning is nontrivial due to the distinctive structure of KG and the semantic relations between entities. Also, unlearning by estimating the influence of removed components incurs significant computational overhead when applied to large-scale knowledge graphs. To this end, we define an influence function for KG unlearning and propose to approximate the model's sensitivity without expensive computation of first-order and second-order derivatives for parameter updates. Specifically, we use Taylor expansion to estimate the parameter changes caused by data removal. Given that the first-order gradients and second-order derivatives dominate the computational load, we use the Fisher matrices and zeroth-order optimization to approximate the inverse-Hessian vector product without constructing the computational graphs. Our experimental results demonstrate that the proposed method outperforms other state-of-the-art graph unlearning baselines significantly in terms of unlearning efficiency and unlearning quality. Our code is released at https://github.com/NKUShaw/ZOWFKGIF.
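The derivative-free ingredient can be illustrated with the classic two-point zeroth-order gradient estimator; this is the generic tool, not the paper's full Fisher-based inverse-Hessian-vector approximation.

```python
import numpy as np

def zeroth_order_grad(loss, theta, eps=1e-3, n_samples=32, seed=0):
    # Two-point estimator: g ~ E_u[(L(theta+eps*u) - L(theta-eps*u)) / (2*eps) * u],
    # requiring only loss evaluations, never a computational graph.
    rng = np.random.default_rng(seed)
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)
        g += (loss(theta + eps * u) - loss(theta - eps * u)) / (2 * eps) * u
    return g / n_samples
```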


【2】Interactive Query Answering on Knowledge Graphs with Soft Entity Constraints
链接/Link: https://arxiv.org/abs/2508.13663

Authors: za, Alberto Bernardi, Luca Costabello, Christophe Gueret, Masoud Mansoury, Michael Cochez, Martijn Schut
Abstract: Methods for query answering over incomplete knowledge graphs retrieve entities that are likely to be answers, which is particularly useful when such answers cannot be reached by direct graph traversal due to missing edges. However, existing approaches have focused on queries formalized using first-order-logic. In practice, many real-world queries involve constraints that are inherently vague or context-dependent, such as preferences for attributes or related categories. Addressing this gap, we introduce the problem of query answering with soft constraints. We propose a Neural Query Reranker (NQR) designed to adjust query answer scores by incorporating soft constraints without disrupting the original answers to a query. NQR operates interactively, refining answers based on incremental examples of preferred and non-preferred entities. We extend existing QA benchmarks by generating datasets with soft constraints. Our experiments demonstrate that NQR can capture soft constraints while maintaining robust query answering performance.
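In its simplest form, soft-constraint reranking adjusts base scores with a preference signal; the linear blend below is our illustrative assumption, whereas NQR itself is a learned, interactive neural reranker.

```python
def rerank(scores, soft_sim, alpha=0.5):
    # scores: base QA score per entity; soft_sim: similarity of each entity to
    # preferred minus non-preferred example entities.
    return {e: s + alpha * soft_sim.get(e, 0.0) for e, s in scores.items()}
```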


【3】SVDformer: Direction-Aware Spectral Graph Embedding Learning via SVD and Transformer
链接/Link: https://arxiv.org/abs/2508.13435

Authors: g, Zhiqi Shao, S T Boris Choy, Junbin Gao
Abstract: Directed graphs are widely used to model asymmetric relationships in real-world systems. However, existing directed graph neural networks often struggle to jointly capture directional semantics and global structural patterns due to their isotropic aggregation mechanisms and localized filtering mechanisms. To address this limitation, this paper proposes SVDformer, a novel framework that synergizes SVD and Transformer architecture for direction-aware graph representation learning. SVDformer first refines singular value embeddings through multi-head self-attention, adaptively enhancing critical spectral components while suppressing high-frequency noise. This enables learnable low-pass/high-pass graph filtering without requiring spectral kernels. Furthermore, by treating singular vectors as directional projection bases and singular values as scaling factors, SVDformer uses the Transformer to model multi-scale interactions between incoming/outgoing edge patterns through attention weights, thereby explicitly preserving edge directionality during feature propagation. Extensive experiments on six directed graph benchmarks demonstrate that SVDformer consistently outperforms state-of-the-art GNNs and direction-aware baselines on node classification tasks, establishing a new paradigm for learning representations on directed graphs.
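One reading of the spectral front end, sketched under stated assumptions (this is our abstraction of the abstract, not the official code): truncated SVD of the directed adjacency matrix, with attention re-weighting the singular-value sequence as a learnable spectral filter.

```python
import torch
import torch.nn as nn

class SpectralFilter(nn.Module):
    def __init__(self, k=64):
        super().__init__()
        self.k = k
        self.attn = nn.MultiheadAttention(embed_dim=1, num_heads=1, batch_first=True)

    def forward(self, A):
        U, S, Vh = torch.linalg.svd(A)
        U, S, Vh = U[:, :self.k], S[:self.k], Vh[:self.k, :]
        s_emb = S.view(1, self.k, 1)                 # singular values as a sequence
        s_ref, _ = self.attn(s_emb, s_emb, s_emb)    # adaptively rescaled spectrum
        return U @ torch.diag(s_ref.view(self.k)) @ Vh  # filtered propagation operator
```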


【4】A Dual-Attention Graph Network for fMRI Data Classification
链接/Link: https://arxiv.org/abs/2508.13328

Authors: rbab, Zeinab Davarani, Mehran Safayani
Abstract: Understanding the complex neural activity dynamics is crucial for the development of the field of neuroscience. Although current functional MRI classification approaches tend to be based on static functional connectivity or cannot capture spatio-temporal relationships comprehensively, we present a new framework that leverages dynamic graph creation and spatiotemporal attention mechanisms for Autism Spectrum Disorder (ASD) diagnosis. The approach used in this research dynamically infers functional brain connectivity in each time interval using transformer-based attention mechanisms, enabling the model to selectively focus on crucial brain regions and time segments. By constructing time-varying graphs that are then processed with Graph Convolutional Networks (GCNs) and transformers, our method successfully captures both localized interactions and global temporal dependencies. Evaluated on the subset of ABIDE dataset, our model achieves 63.2 accuracy and 60.0 AUC, outperforming static graph-based approaches (e.g., GCN: 51.8). This validates the efficacy of joint modeling of dynamic connectivity and spatio-temporal context for fMRI classification. The core novelty arises from (1) attention-driven dynamic graph creation that learns temporal brain region interactions and (2) hierarchical spatio-temporal feature fusion through GCN-Transformer fusion.


【5】Deep Graph Neural Point Process For Learning Temporal Interactive Networks
链接/Link: https://arxiv.org/abs/2508.13219

Authors: Xiaohua Qi, Xixun Lin, Yanmin Shang, Xiaolin Xu, Yangxi Li
Abstract: Learning temporal interaction networks (TIN) is previously regarded as a coarse-grained multi-sequence prediction problem, ignoring the network topology structure influence. This paper addresses this limitation and a Deep Graph Neural Point Process (DGNPP) model for TIN is proposed. DGNPP consists of two key modules: the Node Aggregation Layer and the Self Attentive Layer. The Node Aggregation Layer captures topological structures to generate static representation for users and items, while the Self Attentive Layer dynamically updates embeddings over time. By incorporating both dynamic and static embeddings into the event intensity function and optimizing the model via maximum likelihood estimation, DGNPP predicts events and occurrence time effectively. Experimental evaluations on three public datasets demonstrate that DGNPP achieves superior performance in event prediction and time prediction tasks with high efficiency, significantly outperforming baseline models and effectively mitigating the limitations of prior approaches.
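A minimal sketch of an event intensity combining static (topology-based) and dynamic (time-evolving) embeddings for a user-item pair; the exact parameterization is an assumption, and softplus simply keeps the intensity positive.

```python
import torch
import torch.nn.functional as F

def intensity(static_u, static_i, dyn_u, dyn_i, w, b):
    # Concatenate static and dynamic embeddings of the pair, then map to a
    # positive scalar event rate.
    z = torch.cat([static_u, static_i, dyn_u, dyn_i])
    return F.softplus(w @ z + b)
```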


Transformer (3 papers)

【1】ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery
链接/Link: https://arxiv.org/abs/2508.14005

Authors: Izadi, Mehran Safayani
Abstract: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition marked by disruptions in brain connectivity. Functional MRI (fMRI) offers a non-invasive window into large-scale neural dynamics by measuring blood-oxygen-level-dependent (BOLD) signals across the brain. These signals can be modeled as interactions among Regions of Interest (ROIs), which are grouped into functional communities based on their underlying roles in brain function. Emerging evidence suggests that connectivity patterns within and between these communities are particularly sensitive to ASD-related alterations. Effectively capturing these patterns and identifying interactions that deviate from typical development is essential for improving ASD diagnosis and enabling biomarker discovery. In this work, we introduce ASDFormer, a Transformer-based architecture that incorporates a Mixture of Pooling-Classifier Experts (MoE) to capture neural signatures associated with ASD. By integrating multiple specialized expert branches with attention mechanisms, ASDFormer adaptively emphasizes different brain regions and connectivity patterns relevant to autism. This enables both improved classification performance and more interpretable identification of disorder-related biomarkers. Applied to the ABIDE dataset, ASDFormer achieves state-of-the-art diagnostic accuracy and reveals robust insights into functional connectivity disruptions linked to ASD, highlighting its potential as a tool for biomarker discovery.
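A minimal sketch of a mixture of pooling-classifier experts over ROI tokens; the expert design, pooling choice, and gate are our abstractions of the idea, not ASDFormer's exact head.

```python
import torch
import torch.nn as nn

class PoolingClassifierMoE(nn.Module):
    def __init__(self, d=128, n_experts=4, n_classes=2):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d, n_classes) for _ in range(n_experts)])

    def forward(self, roi_tokens):               # (batch, n_rois, d)
        pooled = roi_tokens.mean(dim=1)          # simple mean pooling for the sketch
        w = torch.softmax(self.gate(pooled), dim=-1)          # (batch, n_experts)
        outs = torch.stack([e(pooled) for e in self.experts], dim=1)
        return (w.unsqueeze(-1) * outs).sum(dim=1)            # gated expert mixture
```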


【2】PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
链接/Link: https://arxiv.org/abs/2508.13773

Authors: Yuqi Chen, Weiwei Sun
Abstract: Long-term time series forecasting (LTSF) is a fundamental task with wide-ranging applications. Although Transformer-based models have made significant breakthroughs in forecasting, their effectiveness for time series forecasting remains debatable. In this paper, we revisit the significance of self-attention and propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporating relative attention bias for effective time series modeling. To this end, we introduce a periodic-nested relative attention bias that captures periodic structures directly. To handle multiple coexisting periodicities (e.g., daily and weekly cycles), we design a grouped attention mechanism, where each group targets a specific periodicity using a multi-query attention mechanism. Extensive experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models.
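One way to realize a periodic relative attention bias, sketched under our reading of the idea: positions a multiple of the period apart share a learned bias, so the attention map inherits the cycle. The nesting and grouping across multiple periods are simplified away here.

```python
import torch
import torch.nn as nn

class PeriodicRelativeBias(nn.Module):
    def __init__(self, period: int):
        super().__init__()
        self.period = period
        self.bias = nn.Parameter(torch.zeros(period))  # one bias per phase offset

    def forward(self, seq_len: int):
        rel = torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]
        # (seq_len, seq_len) additive bias to the attention logits.
        return self.bias[rel.remainder(self.period)]
```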


【3】Vision Transformers for Kidney Stone Image Classification: A Comparative Study with CNNs
链接/Link: https://arxiv.org/abs/2508.13461

Authors: s-Amezcua, Francisco Lopez-Tiro, Clement Larose, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz, Christian Daul
Abstract: Kidney stone classification from endoscopic images is critical for personalized treatment and recurrence prevention. While convolutional neural networks (CNNs) have shown promise in this task, their limited ability to capture long-range dependencies can hinder performance under variable imaging conditions. This study presents a comparative analysis between Vision Transformers (ViTs) and CNN-based models, evaluating their performance on two ex vivo datasets comprising CCD camera and flexible ureteroscope images. The ViT-base model pretrained on ImageNet-21k consistently outperformed a ResNet50 baseline across multiple imaging conditions. For instance, in the most visually complex subset (Section patches from endoscopic images), the ViT model achieved 95.2% accuracy and 95.1% F1-score, compared to 64.5% and 59.3% with ResNet50. In the mixed-view subset from CCD-camera images, ViT reached 87.1% accuracy versus 78.4% with CNN. These improvements extend across precision and recall as well. The results demonstrate that ViT-based architectures provide superior classification performance and offer a scalable alternative to conventional CNNs for kidney stone image analysis.


GAN | Adversarial | Attack | Generation (5 papers)

【1】Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation
链接/Link: https://arxiv.org/abs/2508.13904

Authors: yen, Chang D. Yoo
Abstract: The generative power of diffusion models (DMs) has recently enabled high-performing decision-making algorithms in offline reinforcement learning (RL), achieving state-of-the-art results across standard benchmarks. Among them, Diffusion Q-Learning (DQL) stands out as a leading method for its consistently strong performance. Nevertheless, DQL remains limited in practice due to its reliance on multi-step denoising for action generation during both training and inference. Although one-step denoising is desirable, simply applying it to DQL leads to a drastic performance drop. In this work, we revisit DQL and identify its core limitations. We then propose One-Step Flow Q-Learning (OFQL), a novel framework that enables efficient one-step action generation during both training and inference, without requiring auxiliary models, distillation, or multi-phase training. Specifically, OFQL reformulates DQL within the sample-efficient Flow Matching (FM) framework. While conventional FM induces curved generative trajectories that impede one-step generation, OFQL instead learns an average velocity field that facilitates direct, accurate action generation. Collectively, OFQL eliminates the need for multi-step sampling and recursive gradient updates in DQL, resulting in faster and more robust training and inference. Extensive experiments on the D4RL benchmark demonstrate that OFQL outperforms DQL and other diffusion-based baselines, while substantially reducing both training and inference time compared to DQL.
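The one-step idea can be written in a few lines: integrate the flow from noise (t=0) to action (t=1) with a single Euler step using a learned average velocity. The network interface below is an assumption for illustration.

```python
import torch

def one_step_action(v_bar, state, action_dim):
    # x1 = x0 + 1 * v_bar(x0, s): exact when v_bar is the time-averaged
    # velocity along the generative path, so no iterative denoising is needed.
    a0 = torch.randn(state.shape[0], action_dim)   # noise sample at t=0
    return a0 + v_bar(a0, state)
```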


【2】FedUP: Efficient Pruning-based Federated Unlearning for Model Poisoning Attacks
链接/Link: https://arxiv.org/abs/2508.13853

Authors: mandini, Cristian Borcea, Rebecca Montanari, Luca Foschini
Comments: 15 pages, 5 figures, 7 tables
Abstract: Federated Learning (FL) can be vulnerable to attacks, such as model poisoning, where adversaries send malicious local weights to compromise the global model. Federated Unlearning (FU) is emerging as a solution to address such vulnerabilities by selectively removing the influence of detected malicious contributors on the global model without complete retraining. However, unlike typical FU scenarios where clients are trusted and cooperative, applying FU with malicious and possibly colluding clients is challenging because their collaboration in unlearning their data cannot be assumed. This work presents FedUP, a lightweight FU algorithm designed to efficiently mitigate malicious clients' influence by pruning specific connections within the attacked model. Our approach achieves efficiency by relying only on clients' weights from the last training round before unlearning to identify which connections to inhibit. Isolating malicious influence is non-trivial due to overlapping updates from benign and malicious clients. FedUP addresses this by carefully selecting and zeroing the highest magnitude weights that diverge the most between the latest updates from benign and malicious clients while preserving benign information. FedUP is evaluated under a strong adversarial threat model, where up to 50%-1 of the clients could be malicious and have full knowledge of the aggregation process. We demonstrate the effectiveness, robustness, and efficiency of our solution through experiments across IID and Non-IID data, under label-flipping and backdoor attacks, and by comparing it with state-of-the-art (SOTA) FU solutions. In all scenarios, FedUP reduces malicious influence, lowering accuracy on malicious data to match that of a model retrained from scratch while preserving performance on benign data. FedUP achieves effective unlearning while consistently being faster and saving storage compared to the SOTA.
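A minimal sketch of the pruning step as we read the abstract; the exact scoring rule is an assumption, but the shape of the operation (zero the top-scoring weights where benign and malicious updates diverge most) follows the description.

```python
import torch

def fedup_mask(w_benign, w_malicious, prune_frac=0.05):
    # Hypothetical score: divergence between the latest benign and malicious
    # updates, scaled by magnitude; zero the top-scoring fraction.
    score = (w_benign - w_malicious).abs() * w_malicious.abs()
    k = max(1, int(prune_frac * score.numel()))
    idx = score.flatten().topk(k).indices
    mask = torch.ones(score.numel())
    mask[idx] = 0.0
    return mask.view_as(w_benign)  # multiply into the attacked model's weights
```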


【3】Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
链接/Link: https://arxiv.org/abs/2508.13679

Authors: ao, Shinji Ito, Shuai Li
Abstract: Heavy-tailed bandits have been extensively studied since the seminal work of \citet{Bubeck2012BanditsWH}. In particular, heavy-tailed linear bandits, enabling efficient learning with both a large number of arms and heavy-tailed noises, have recently attracted significant attention \citep{ShaoYKL18,XueWWZ20,ZhongHYW21,Wang2025heavy,tajdini2025improved}. However, prior studies focus almost exclusively on stochastic regimes, with few exceptions limited to the special case of heavy-tailed multi-armed bandits (MABs) \citep{Huang0H22,ChengZ024,Chen2024uniINF}. In this work, we propose a general framework for adversarial heavy-tailed bandit problems, which performs follow-the-regularized-leader (FTRL) over the loss estimates shifted by a bonus function. Via a delicate setup of the bonus function, we devise the first FTRL-type best-of-both-worlds (BOBW) algorithm for heavy-tailed MABs, which does not require the truncated non-negativity assumption and achieves an $\widetilde{O}(T^{\frac{1}{\varepsilon}})$ worst-case regret in the adversarial regime as well as an $\widetilde{O}(\log T)$ gap-dependent regret in the stochastic regime. We then extend our framework to the linear case, proposing the first algorithm for adversarial heavy-tailed linear bandits with finite arm sets. This algorithm achieves an $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{\varepsilon}})$ regret, matching the best-known worst-case regret bound in stochastic regimes. Moreover, we propose a general data-dependent learning rate, termed \textit{heavy-tailed noise aware stability-penalty matching} (HT-SPM). We prove that HT-SPM guarantees BOBW regret bounds for general heavy-tailed bandit problems once certain conditions are satisfied. By using HT-SPM and, in particular, a variance-reduced linear loss estimator, we obtain the first BOBW result for heavy-tailed linear bandits.


【4】Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation
链接/Link: https://arxiv.org/abs/2508.13525

Authors: rmandah
Comments: 7 pages, 6 figures, 2 tables. Code: this https URL. Dataset and trained weights/adapters are not released. Primary category: cs.CL
Abstract: Large language models (LLMs) for Arabic are still dominated by Modern Standard Arabic (MSA), with limited support for Saudi dialects such as Najdi and Hijazi. This underrepresentation hinders their ability to capture authentic dialectal variation. Using a privately curated Saudi Dialect Instruction dataset (Hijazi and Najdi; 5,466 synthetic instruction-response pairs; 50/50 split), we LoRA-tune ALLaM-7B-Instruct-preview, the first foundation model developed in Saudi Arabia, for Saudi dialect generation. We investigate two variants: (i) Dialect-Token training, which prepends an explicit dialect tag to the instruction, and (ii) No-Token training, which omits the tag at formatting time. Evaluation on a held-out test set combines an external dialect classifier with text fidelity metrics (chrF++ and BERTScore) and diversity measures. The Dialect-Token model achieves the best control, raising the Saudi rate from 47.97% to 84.21% and reducing MSA leakage from 32.63% to 6.21%; fidelity also improves (chrF++ +3.53, BERTScore +0.059). Both LoRA variants outperform strong generic instruction models (Falcon-7B-Instruct, Llama-3.1-8B-Instruct, Qwen-2.5-7B-Instruct, AceGPT-v2-8B-Chat, JAIS-13B-Chat) in dialect control and fidelity, while avoiding metadata-tag echoing that these baselines frequently exhibit. We do not release the dataset or any model weights/adapters; instead, we release training/evaluation/inference code and a detailed datasheet (schema and aggregate statistics) to support independent verification.
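The two formatting variants are easy to picture; the tag string and prompt template below are assumptions, not the authors' exact prompt.

```python
def format_example(instruction, response, dialect=None):
    # Dialect-Token variant prepends an explicit tag; No-Token omits it.
    tag = f"[{dialect}] " if dialect else ""
    return f"### Instruction:\n{tag}{instruction}\n### Response:\n{response}"

print(format_example("Reply with a short greeting.", "...", dialect="Najdi"))
```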


【5】DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples
链接/Link: https://arxiv.org/abs/2508.13309

Authors: Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty
Abstract: Numerous techniques have been proposed for generating adversarial examples in white-box settings under strict Lp-norm constraints. However, such norm-bounded examples often fail to align well with human perception, and only recently have a few methods begun specifically exploring perceptually aligned adversarial examples. Moreover, it remains unclear whether insights from Lp-constrained attacks can be effectively leveraged to improve perceptual efficacy. In this paper, we introduce DAASH, a fully differentiable meta-attack framework that generates effective and perceptually aligned adversarial examples by strategically composing existing Lp-based attack methods. DAASH operates in a multi-stage fashion: at each stage, it aggregates candidate adversarial examples from multiple base attacks using learned, adaptive weights and propagates the result to the next stage. A novel meta-loss function guides this process by jointly minimizing misclassification loss and perceptual distortion, enabling the framework to dynamically modulate the contribution of each base attack throughout the stages. We evaluate DAASH on adversarially trained models across CIFAR-10, CIFAR-100, and ImageNet. Despite relying solely on Lp-constrained based methods, DAASH significantly outperforms state-of-the-art perceptual attacks such as AdvAD -- achieving higher attack success rates (e.g., 20.63\% improvement) and superior visual quality, as measured by SSIM, LPIPS, and FID (improvements of approximately 11, 0.015, and 5.7, respectively). Furthermore, DAASH generalizes well to unseen defenses, making it a practical and strong baseline for evaluating robustness without requiring handcrafted adaptive attacks for each new defense.
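One stage of the aggregation can be sketched as a learned convex blend of base-attack candidates; the meta-loss that trains the weights (misclassification plus perceptual distortion) is omitted, and the interface is an assumption.

```python
import torch

def daash_stage(candidates, logits_w):
    # candidates: adversarial tensors from several base attacks (e.g., PGD
    # variants); blend them with learned softmax weights, then feed the
    # blend to the next stage's base attacks.
    w = torch.softmax(logits_w, dim=0)           # one weight per base attack
    return sum(wi * adv for wi, adv in zip(w, candidates))
```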


Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (8 papers)

【1】Unsupervised Urban Tree Biodiversity Mapping from Street-Level Imagery Using Spatially-Aware Visual Clustering
链接/Link: https://arxiv.org/abs/2508.13814

Authors: en Abuhani, Marco Seccaroni, Martina Mazzarello, Imran Zualkernan, Fabio Duarte, Carlo Ratti
Comments: 26 pages, 7 figures, Nature Format
Abstract: Urban tree biodiversity is critical for climate resilience, ecological stability, and livability in cities, yet most municipalities lack detailed knowledge of their canopies. Field-based inventories provide reliable estimates of Shannon and Simpson diversity but are costly and time-consuming, while supervised AI methods require labeled data that often fail to generalize across regions. We introduce an unsupervised clustering framework that integrates visual embeddings from street-level imagery with spatial planting patterns to estimate biodiversity without labels. Applied to eight North American cities, the method recovers genus-level diversity patterns with high fidelity, achieving low Wasserstein distances to ground truth for Shannon and Simpson indices and preserving spatial autocorrelation. This scalable, fine-grained approach enables biodiversity mapping in cities lacking detailed inventories and offers a pathway for continuous, low-cost monitoring to support equitable access to greenery and adaptive management of urban ecosystems.
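For reference, the two diversity indices being recovered are standard; a minimal implementation over cluster (genus) counts:

```python
import numpy as np

def shannon_simpson(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    shannon = -(p * np.log(p)).sum()   # H = -sum p_i ln p_i
    simpson = 1.0 - (p ** 2).sum()     # D = 1 - sum p_i^2
    return shannon, simpson

print(shannon_simpson([30, 20, 10, 5]))  # genus counts from one city's clusters
```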


【2】MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning
链接/Link: https://arxiv.org/abs/2508.13661

Authors: jtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński
Comments: Submitted for AAAI 2026
Abstract: Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on several maps.
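A minimal sketch of a self-attention communication layer over agent features; sizes are assumptions. The parameter count depends only on the hidden dimension, not on the number of agents, because the attention weights are shared across the agent axis — consistent with the fixed-parameter property the paper highlights.

```python
import torch
import torch.nn as nn

class SelfAttentionComm(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, agent_feats):                 # (batch, n_agents, d)
        msgs, _ = self.attn(agent_feats, agent_feats, agent_feats)
        return agent_feats + msgs                   # fuse messages into each agent
```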


【3】A Generalized Learning Framework for Self-Supervised Contrastive Learning
链接/Link: https://arxiv.org/abs/2508.13596

Authors: , Jingyao Wang, Wenwen Qiang
Abstract: Self-supervised contrastive learning (SSCL) has recently demonstrated superiority in multiple downstream tasks. In this paper, we generalize the standard SSCL methods to a Generalized Learning Framework (GLF) consisting of two parts: the aligning part and the constraining part. We analyze three existing SSCL methods: BYOL, Barlow Twins, and SwAV, and show that they can be unified under GLF with different choices of the constraining part. We further propose empirical and theoretical analyses providing two insights into designing the constraining part of GLF: intra-class compactness and inter-class separability, which measure how well the feature space preserves the class information of the inputs. However, since SSCL can not use labels, it is challenging to design a constraining part that satisfies these properties. To address this issue, we consider inducing intra-class compactness and inter-class separability by iteratively capturing the dynamic relationship between anchor and other samples and propose a plug-and-play method called Adaptive Distribution Calibration (ADC) to ensure that samples that are near or far from the anchor point in the original input space are closer or further away from the anchor point in the feature space. Both the theoretical analysis and the empirical evaluation demonstrate the superiority of ADC.


【4】Uncertainty Tube Visualization of Particle Trajectories
链接/Link: https://arxiv.org/abs/2508.13505

Authors: , Timbwaoga Aime Judicael Ouermi, Mengjiao Han, Chris R. Johnson
Abstract: Predicting particle trajectories with neural networks (NNs) has substantially enhanced many scientific and engineering domains. However, effectively quantifying and visualizing the inherent uncertainty in predictions remains challenging. Without an understanding of the uncertainty, the reliability of NN models in applications where trustworthiness is paramount is significantly compromised. This paper introduces the uncertainty tube, a novel, computationally efficient visualization method designed to represent this uncertainty in NN-derived particle paths. Our key innovation is the design and implementation of a superelliptical tube that accurately captures and intuitively conveys nonsymmetric uncertainty. By integrating well-established uncertainty quantification techniques, such as Deep Ensembles, Monte Carlo Dropout (MC Dropout), and Stochastic Weight Averaging-Gaussian (SWAG), we demonstrate the practical utility of the uncertainty tube, showcasing its application on both synthetic and simulation datasets.
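The superelliptical cross-section is the standard superellipse |x/a|^n + |y/b|^n = 1; a minimal parameterization (the sampling scheme is our illustrative choice):

```python
import numpy as np

def superellipse(a, b, n, num=100):
    # Boundary of |x/a|^n + |y/b|^n = 1; n != 2 and a != b give the
    # non-circular cross-sections used to express nonsymmetric uncertainty.
    t = np.linspace(0.0, 2.0 * np.pi, num)
    x = a * np.sign(np.cos(t)) * np.abs(np.cos(t)) ** (2.0 / n)
    y = b * np.sign(np.sin(t)) * np.abs(np.sin(t)) ** (2.0 / n)
    return x, y
```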


【5】ASAP: Unsupervised Post-training with Label Distribution Shift Adaptive Learning Rate
链接/Link: https://arxiv.org/abs/2508.13445

Authors: rk, Mugon Joe, Miru Kim, Minhae Kwon
Comments: 5 pages, 3 figures, accepted for ACM CIKM 2025
Abstract: In real-world applications, machine learning models face online label shift, where label distributions change over time. Effective adaptation requires careful learning rate selection: too low slows adaptation and too high causes instability. We propose ASAP (Adaptive Shift Aware Post-training), which dynamically adjusts the learning rate by computing the cosine distance between current and previous unlabeled outputs and mapping it within a bounded range. ASAP requires no labels, model ensembles, or past inputs, using only the previous softmax output for fast, lightweight adaptation. Experiments across multiple datasets and shift scenarios show ASAP consistently improves accuracy and efficiency, making it practical for unsupervised model adaptation.
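A minimal sketch of the idea; the paper only states that the cosine distance between consecutive unlabeled softmax outputs is mapped into a bounded range, so the averaging and the linear mapping here are our assumptions.

```python
import numpy as np

def asap_lr(prev_softmax, curr_softmax, lr_min=1e-5, lr_max=1e-2):
    # Cosine distance between consecutive average softmax outputs, mapped
    # linearly into [lr_min, lr_max]; larger shift => larger learning rate.
    p = prev_softmax.mean(axis=0)
    q = curr_softmax.mean(axis=0)
    cos = p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)
    dist = min(1.0 - cos, 1.0)
    return lr_min + (lr_max - lr_min) * dist
```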


【6】Semi-Supervised Anomaly Detection Pipeline for SOZ Localization Using Ictal-Related Chirp
链接/Link: https://arxiv.org/abs/2508.13406

Authors: ahador, Milad Lankarany
Comments: 23 pages, 7 figures
Abstract: This study presents a quantitative framework for evaluating the spatial concordance between clinically defined seizure onset zones (SOZs) and statistically anomalous channels identified through time-frequency analysis of chirp events. The proposed pipeline employs a two-step methodology: (1) Unsupervised Outlier Detection, where Local Outlier Factor (LOF) analysis with adaptive neighborhood selection identifies anomalous channels based on spectro-temporal features of chirp (onset frequency, offset frequency, and temporal duration); and (2) Spatial Correlation Analysis, which computes both exact co-occurrence metrics and weighted index similarity, incorporating hemispheric congruence and electrode proximity. Key findings demonstrate that the LOF-based approach (n_neighbors=20, contamination=0.2) effectively detects outliers, with index matching (weighted by channel proximity) outperforming exact matching in SOZ localization. Performance metrics (precision, recall, F1) were highest for seizure-free patients (Index Precision mean: 0.903) and those with successful surgical outcomes (Index Precision mean: 0.865), whereas failure cases exhibited lower concordance (Index Precision mean: 0.460). The key takeaway is that chirp-based outlier detection, combined with weighted spatial metrics, provides a complementary method for SOZ localization, particularly in patients with successful surgical outcomes.
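Step (1) maps directly onto scikit-learn's LocalOutlierFactor with the stated settings; a minimal sketch, with the feature assembly assumed:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def flag_outlier_channels(features):
    # features: (n_channels, 3) array of chirp onset frequency, offset
    # frequency, and temporal duration per channel.
    lof = LocalOutlierFactor(n_neighbors=20, contamination=0.2)
    labels = lof.fit_predict(np.asarray(features))  # -1 marks an outlier
    return labels == -1
```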


【7】RISE: Enhancing VLM Image Annotation with Self-Supervised Reasoning
链接/Link: https://arxiv.org/abs/2508.13229

Authors: , Wei Hu, Yuhang Su, Fan Zhang
Abstract: Vision-Language Models (VLMs) struggle with complex image annotation tasks, such as emotion classification and context-driven object detection, which demand sophisticated reasoning. Standard Supervised Fine-Tuning (SFT) focuses solely on annotation outcomes, ignoring underlying rationales, while Visual Reinforcement Fine-Tuning (Visual-RFT) produces inconsistent Chains of Thought (CoTs) due to the absence of high-quality, verified CoTs during pre-training. We introduce RISE (Reason-Inspire-Strengthen-Expertise), a two-stage framework to overcome these limitations. In the Reason stage (RISE-CoT), a reinforcement learning-driven "annotation-reasoning-annotation" closed-loop generates visually grounded, logically consistent CoTs by verifying their ability to reconstruct original annotations without direct leakage. The Inspire and Strengthen stage (RISE-R1) leverages a high-quality CoT subset, filtered by RISE-CoT rewards, for supervised fine-tuning, followed by reinforcement fine-tuning to produce interpretable reasoning and accurate annotations, achieving Expertise in complex visual tasks. Evaluated on complex and simple image annotation tasks, RISE-trained Qwen2-VL-2B outperforms SFT and Visual-RFT, achieving robust performance and enhanced explainability. RISE offers a self-supervised solution for advancing VLM reasoning without requiring manually annotated CoTs.


【8】Uncertainty-Aware PCA for Arbitrarily Distributed Data Modeled by Gaussian Mixture Models
标题:用高斯混合模型建模的任意分布数据的不确定性感知PCA
链接:https://arxiv.org/abs/2508.13990

作者:ötzl, Ozan Tastekin, David Hägele, Marina Evers, Daniel Weiskopf
备注:10 pages, 6 figures
摘要:多维数据通常与不确定性相关,而这些不确定性无法用正态分布很好地描述。在这项工作中,我们描述了如何使用不确定性感知主成分分析(UAPCA)将这些分布投影到低维空间。我们建议使用高斯混合模型(GMM)来建模多维分布,并从一个允许投影任意概率密度函数的一般公式推导出投影。与UAPCA映射相比,密度的低维投影展示了关于分布的更多细节,并且更忠实地表示它们。此外,我们支持在不同分布之间加入用户定义的权重,从而允许改变各多维分布的重要性。我们通过比较由本方法和UAPCA得到的低维空间分布与基于样本的投影所得的分布来评估我们的方法。
摘要:Multidimensional data is often associated with uncertainties that are not well-described by normal distributions. In this work, we describe how such distributions can be projected to a low-dimensional space using uncertainty-aware principal component analysis (UAPCA). We propose to model multidimensional distributions using Gaussian mixture models (GMMs) and derive the projection from a general formulation that allows projecting arbitrary probability density functions. The low-dimensional projections of the densities exhibit more details about the distributions and represent them more faithfully compared to UAPCA mappings. Further, we support including user-defined weights between the different distributions, which allows for varying the importance of the multidimensional distributions. We evaluate our approach by comparing the distributions in low-dimensional space obtained by our method and UAPCA to those obtained by sample-based projections.
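这里用到的一个关键事实是:高斯混合在线性映射下仍是高斯混合。若投影矩阵为A,则各分量均值变为A·mu_k,协方差变为A·Sigma_k·A^T,混合权重不变。以下为最小示意(论文中A应由UAPCA式的特征分解得到,此处作为给定输入,属假设):

import numpy as np

def project_gmm(weights, means, covs, A):
    # 线性映射 y = A x 下,GMM 的每个高斯分量仍为高斯:
    # 均值变为 A @ mu_k,协方差变为 A @ Sigma_k @ A.T,权重不变
    proj_means = [A @ m for m in means]
    proj_covs = [A @ S @ A.T for S in covs]
    return list(weights), proj_means, proj_covs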


迁移|Zero/Few/One-Shot|自适应(7篇)

【1】One Shot vs. Iterative: Rethinking Pruning Strategies for Model Compression
标题:One Shot vs. Iterative:重新思考模型压缩的修剪策略
链接:https://arxiv.org/abs/2508.13836

作者:anusz, Tomasz Wojnar, Yawei Li, Luca Benini, Kamil Adamczewski
摘要:剪枝是压缩神经网络以提高计算效率的核心技术。这个过程通常以两种方式进行:一次性剪枝,即单次训练和剪枝;以及迭代剪枝,即在多个周期内执行剪枝,以实现潜在的更精细的网络细化。虽然迭代剪枝在历史上被更广泛地采用,但这种偏好往往是假设使然,而未经严格检验。我们的研究提出了对这些方法的首批系统而全面的比较之一,给出了严格的定义,在结构化和非结构化设置下进行基准测试,并应用不同的剪枝标准和模式。我们发现每种方法都有特定的优势:一次性剪枝在较低剪枝率下更有效,而迭代剪枝在较高剪枝率下表现更好。基于这些发现,我们提倡基于耐心(patience)的剪枝,并引入了一种混合方法,在某些场景下可优于传统方法,为从业者选择符合其目标和约束的剪枝策略提供了有价值的见解。源代码可在https://github.com/janumiko/pruning-benchmark上获得。
摘要:Pruning is a core technique for compressing neural networks to improve computational efficiency. This process is typically approached in two ways: one-shot pruning, which involves a single pass of training and pruning, and iterative pruning, where pruning is performed over multiple cycles for potentially finer network refinement. Although iterative pruning has historically seen broader adoption, this preference is often assumed rather than rigorously tested. Our study presents one of the first systematic and comprehensive comparisons of these methods, providing rigorous definitions, benchmarking both across structured and unstructured settings, and applying different pruning criteria and modalities. We find that each method has specific advantages: one-shot pruning proves more effective at lower pruning ratios, while iterative pruning performs better at higher ratios. Building on these findings, we advocate for patience-based pruning and introduce a hybrid approach that can outperform traditional methods in certain scenarios, providing valuable insights for practitioners selecting a pruning strategy tailored to their goals and constraints. Source code is available at https://github.com/janumiko/pruning-benchmark.
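为直观对比两种流程,下面用PyTorch自带的剪枝工具给出一个幅值剪枝示意(train()为占位的训练过程,稀疏度与轮数均为假设,并非该基准库的实际代码):

import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model, amount):
    # 对线性层与卷积层按 L1 幅值做非结构化剪枝
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(m, name="weight", amount=amount)

# 一次性剪枝:train(model) 之后一次剪到目标稀疏度
#   magnitude_prune(model, amount=0.5)
#
# 迭代剪枝:多轮"剪一点、再训练",复合达到相近稀疏度
#   for _ in range(5):
#       magnitude_prune(model, amount=0.13)   # 0.87**5 约为 0.50 的保留率
#       train(model)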


【2】Reinforcement Learning-based Adaptive Path Selection for Programmable Networks
标题:可编程网络基于强化学习的自适应路径选择
链接:https://arxiv.org/abs/2508.13806

作者:rdo Zerna Torres, Marios Avgeris, Chrysa Papagianni, Gergely Pongrácz, István Gódor, Paola Grosso
摘要:这项工作提出了一个分布式网内强化学习(IN-RL)框架的概念验证实现,用于可编程网络中的自适应路径选择。通过将随机学习自动机(SLA)与经由带内网络遥测(INT)收集的实时遥测数据相结合,所提出的系统能够做出本地化、数据驱动的转发决策,并动态适应拥塞状况。该系统在基于Mininet的测试平台上使用P4可编程的BMv2交换机进行评估,演示了我们基于SLA的机制如何收敛到有效的路径选择,并以线速适应不断变化的网络条件。
摘要:This work presents a proof-of-concept implementation of a distributed, in-network reinforcement learning (IN-RL) framework for adaptive path selection in programmable networks. By combining Stochastic Learning Automata (SLA) with real-time telemetry data collected via In-Band Network Telemetry (INT), the proposed system enables local, data-driven forwarding decisions that adapt dynamically to congestion conditions. The system is evaluated on a Mininet-based testbed using P4-programmable BMv2 switches, demonstrating how our SLA-based mechanism converges to effective path selections and adapts to shifting network conditions at line rate.
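随机学习自动机的一个常见更新规则是线性奖励-惰性(L_RI)方案,以下为示意(奖励信号假定由INT遥测导出,步长lr为假设,并非论文的具体参数):

import numpy as np

def sla_update(probs, chosen, reward, lr=0.1):
    # L_RI 更新:仅在收到正反馈(例如遥测显示所选路径未拥塞)时调整概率
    if reward:
        probs = probs * (1.0 - lr)   # 所有路径概率整体收缩
        probs[chosen] += lr          # 所选路径净增 lr*(1-p),其余按比例减小
    return probs / probs.sum()

# 示例:三条候选路径,路径 1 获得正反馈
p = sla_update(np.array([1/3, 1/3, 1/3]), chosen=1, reward=True)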


【3】Communication-Efficient Federated Learning with Adaptive Number of Participants
标题:具有自适应参与者数量的通信高效联邦学习
链接:https://arxiv.org/abs/2508.13803

作者:orik, Vladislav Dorofeev, Gleb Molodtsov, Aram Avetisyan, Dmitry Bylinkin, Daniil Medyakov, Aleksandr Beznosikov
摘要:深度学习模型的快速扩展带来了跨领域的性能提升,但也引入了一些挑战。联邦学习(FL)通过实现去中心化训练,已成为解决这些问题的有前途的框架。然而,通信效率仍然是FL的关键瓶颈,特别是在异构和动态的客户端参与下。现有的方法,如FedAvg和FedProx,或包括客户端选择策略在内的其他方法,试图降低通信成本。然而,如何选择一轮训练中的客户端数量这一问题仍然极少被探讨。我们引入了智能参与者选择(ISP),这是一种自适应机制,可以动态确定每轮的最优客户端数量,在不影响模型精度的情况下提高通信效率。我们在多种设置中验证了ISP的有效性,包括视觉Transformer、真实世界的ECG分类以及梯度压缩训练。我们的结果显示,在不损失最终质量的情况下,可持续节省高达30%的通信开销。将ISP应用于不同的真实世界ECG分类设置,凸显了客户端数量的选择是联邦学习中的一项独立任务。
摘要:Rapid scaling of deep learning models has enabled performance gains across domains, yet it introduced several challenges. Federated Learning (FL) has emerged as a promising framework to address these concerns by enabling decentralized training. Nevertheless, communication efficiency remains a key bottleneck in FL, particularly under heterogeneous and dynamic client participation. Existing methods, such as FedAvg and FedProx, or other approaches, including client selection strategies, attempt to mitigate communication costs. However, the problem of choosing the number of clients in a training round remains extremely underexplored. We introduce Intelligent Selection of Participants (ISP), an adaptive mechanism that dynamically determines the optimal number of clients per round to enhance communication efficiency without compromising model accuracy. We validate the effectiveness of ISP across diverse setups, including vision transformers, real-world ECG classification, and training with gradient compression. Our results show consistent communication savings of up to 30\% without losing the final quality. Applying ISP to different real-world ECG classification setups highlighted the selection of the number of clients as a separate task of federated learning.


【4】Towards a Larger Model via One-Shot Federated Learning on Heterogeneous Client Models
标题:通过异类客户机模型上的一次性联邦学习建立更大的模型
链接:https://arxiv.org/abs/2508.13625

作者:e, Xueli An, Onur Ayan, Junfan Wang, Xueqiang Yan, Georg Carle
备注:Accepted to Globecom 2025
摘要:大型模型以卓越的性能著称,即使未达到十亿参数规模,也胜过小型模型。虽然移动网络服务器具有比客户端设备更充足的计算资源来支持更大的模型,但隐私约束阻止客户端直接共享其原始数据。联邦学习(FL)使去中心化的客户端能够通过交换模型参数而非传输原始数据来协作训练共享模型。然而,它需要统一的模型架构和多轮通信,忽视了资源异构性,给客户端带来沉重的计算需求,并增加了通信开销。为了应对这些挑战,我们提出FedOL,在一次性设置(即单轮通信)中构建更大、更全面的服务器模型。FedOL不共享模型参数,而是采用知识蒸馏,客户端仅在未标记的公共数据集上交换模型预测输出。这通过传输紧凑的预测而非完整的模型权重来减少通信开销,并通过允许异构模型架构实现模型定制。此设置中的一个关键挑战是,由于本地数据分布的偏斜,客户端预测可能有偏,而公共数据集中缺乏真实标签进一步使可靠学习复杂化。为缓解这些问题,FedOL引入了一个专门的目标函数,迭代细化伪标签和服务器模型,提高学习可靠性。作为补充,FedOL还采用量身定制的伪标签生成和知识蒸馏策略,有效整合多样化的知识。仿真结果表明,FedOL显著优于现有基线,为客户端拥有宝贵私有数据但计算资源有限的移动网络提供了一种具有成本效益的解决方案。
摘要:Large models, renowned for superior performance, outperform smaller ones even without billion-parameter scales. While mobile network servers have ample computational resources to support larger models than client devices, privacy constraints prevent clients from directly sharing their raw data. Federated Learning (FL) enables decentralized clients to collaboratively train a shared model by exchanging model parameters instead of transmitting raw data. Yet, it requires a uniform model architecture and multiple communication rounds, which neglect resource heterogeneity, impose heavy computational demands on clients, and increase communication overhead. To address these challenges, we propose FedOL, to construct a larger and more comprehensive server model in one-shot settings (i.e., in a single communication round). Instead of model parameter sharing, FedOL employs knowledge distillation, where clients only exchange model prediction outputs on an unlabeled public dataset. This reduces communication overhead by transmitting compact predictions instead of full model weights and enables model customization by allowing heterogeneous model architectures. A key challenge in this setting is that client predictions may be biased due to skewed local data distributions, and the lack of ground-truth labels in the public dataset further complicates reliable learning. To mitigate these issues, FedOL introduces a specialized objective function that iteratively refines pseudo-labels and the server model, improving learning reliability. To complement this, FedOL incorporates a tailored pseudo-label generation and knowledge distillation strategy that effectively integrates diverse knowledge. Simulation results show that FedOL significantly outperforms existing baselines, offering a cost-effective solution for mobile networks where clients possess valuable private data but limited computational resources.
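这类一次性蒸馏的起点,是把各客户端在公共无标签数据上的预测聚合为蒸馏目标;摘要所述的迭代细化目标函数此处未展示。一个最小示意如下(加权平均方式为假设):

import numpy as np

def aggregate_pseudo_labels(client_logits, weights=None):
    # client_logits: [n_clients, n_public, n_classes],各客户端在公共数据上的输出
    z = client_logits - client_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)   # 逐客户端 softmax
    soft = np.average(probs, axis=0, weights=weights)           # (加权)平均得到软标签
    return soft.argmax(axis=-1), soft   # 硬伪标签与蒸馏用软标签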


【5】Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label Classification
标题:分层多标签分类的分层一致学习和自适应损失平衡
链接:https://arxiv.org/abs/2508.13452

作者:iang, Mengzhe Liu, Haobing Liu, Yanwei Yu
备注:10 pages, 7 figures, accepted by CIKM 2025
摘要:层次多标签分类(HMC)在保持结构一致性和平衡多任务学习(MTL)中的损失权重方面面临关键挑战。为了解决这些问题,我们提出了一种名为HCAL的分类器,它基于MTL,集成了原型对比学习和自适应任务加权机制。我们分类器最显著的优点是语义一致性,既包括显式建模标签的原型,也包括从子类到父类的特征聚合。另一个重要优点是自适应损失加权机制,通过监控特定任务的收敛速度动态分配优化资源,有效解决了传统MTL方法中固有的"一强多弱"优化偏差。为进一步增强鲁棒性,通过向原型注入受控噪声来扩展决策边界,构建了原型扰动机制。此外,我们形式化了一个称为层次违规率(HVR)的量化指标,用于评估层次一致性和泛化能力。在三个数据集上的大量实验表明,与基线模型相比,所提分类器具有更高的分类精度和更低的层次违规率。
摘要:Hierarchical Multi-Label Classification (HMC) faces critical challenges in maintaining structural consistency and balancing loss weighting in Multi-Task Learning (MTL). In order to address these issues, we propose a classifier called HCAL based on MTL integrated with prototype contrastive learning and adaptive task-weighting mechanisms. The most significant advantage of our classifier is semantic consistency including both prototype with explicitly modeling label and feature aggregation from child classes to parent classes. The other important advantage is an adaptive loss-weighting mechanism that dynamically allocates optimization resources by monitoring task-specific convergence rates. It effectively resolves the "one-strong-many-weak" optimization bias inherent in traditional MTL approaches. To further enhance robustness, a prototype perturbation mechanism is formulated by injecting controlled noise into prototype to expand decision boundaries. Additionally, we formalize a quantitative metric called Hierarchical Violation Rate (HVR) as to evaluate hierarchical consistency and generalization. Extensive experiments across three datasets demonstrate both the higher classification accuracy and reduced hierarchical violation rate of the proposed classifier over baseline models.
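摘要提到按任务收敛速度动态分配权重;下面给出这类机制的一个示意(基于损失下降比值,思路接近动态权重平均DWA,并非论文的具体公式,温度temp为假设):

import numpy as np

def adaptive_task_weights(loss_hist, temp=2.0):
    # loss_hist: 形如 {task: [上一轮损失, 本轮损失]} 的字典
    # 损失下降慢(比值大)的任务被视为收敛慢,获得更大权重
    tasks = list(loss_hist)
    r = np.array([loss_hist[t][-1] / (loss_hist[t][-2] + 1e-8) for t in tasks])
    w = np.exp(r / temp)
    w = len(tasks) * w / w.sum()     # 归一化:各任务权重均值为 1
    return dict(zip(tasks, w))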


【6】Adaptive Conformal Prediction Intervals Over Trajectory Ensembles
标题:轨迹系组的自适应保形预测区间
链接:https://arxiv.org/abs/2508.13362

作者: Daniel Menacho, Alexander Rodríguez
摘要:未来轨迹在自动驾驶、飓风预报和流行病建模等领域发挥着重要作用,从业者通常通过对概率模型采样或利用多个自回归预测器来生成集成路径。虽然这些轨迹反映了固有的不确定性,但它们通常未经校准。我们提出了一个基于共形预测的统一框架,将采样得到的轨迹转换为具有理论覆盖保证的校准预测区间。通过引入一个新的在线更新步骤和一个捕获跨步依赖关系的优化步骤,我们的方法可以在每条轨迹周围产生不连续的预测区间,自然地捕获时间依赖性,并给出更紧致、更自适应的不确定性估计。
摘要:Future trajectories play an important role across domains such as autonomous driving, hurricane forecasting, and epidemic modeling, where practitioners commonly generate ensemble paths by sampling probabilistic models or leveraging multiple autoregressive predictors. While these trajectories reflect inherent uncertainty, they are typically uncalibrated. We propose a unified framework based on conformal prediction that transforms sampled trajectories into calibrated prediction intervals with theoretical coverage guarantees. By introducing a novel online update step and an optimization step that captures inter-step dependencies, our method can produce discontinuous prediction intervals around each trajectory, naturally capture temporal dependencies, and yield sharper, more adaptive uncertainty estimates.
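其基础是标准的分裂式共形预测:用校准集残差的经验分位数给出区间半径。以下为单步示意(摘要中的在线更新与跨步依赖优化不在此列,alpha为假设的误覆盖率):

import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    # 不合格分数取绝对残差;区间半径为其带有限样本校正的 (1-alpha) 分位数
    scores = np.abs(cal_pred - cal_true)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level)
    return test_pred - q, test_pred + q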


【7】Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory
标题:基于Cosserat杆理论的物理信息神经网络的软连续体机器人自适应模型预测控制
链接:https://arxiv.org/abs/2508.12681

作者:cher, Max Bartholdt, Henrik Krauss, Tim-Lukas Habich, Thomas Seel, Moritz Schappler
备注:20 pages, 15 figures
摘要:软连续体机器人(SCR)的动态控制在扩展其应用方面潜力巨大,但由于精确动力学模型的高计算需求,仍是一个具有挑战性的问题。虽然已有基于Koopman算子等数据驱动方法被提出,但它们通常缺乏自适应能力,且无法刻画完整的机器人形状,限制了其适用性。这项工作介绍了一个基于域解耦物理信息神经网络(DD-PINN)、具有可自适应弯曲刚度的SCR实时非线性模型预测控制(MPC)框架。DD-PINN作为动态Cosserat杆模型的替代模型,加速因子达44000。它还被用于无迹卡尔曼滤波器中,从末端执行器位置测量估计模型状态和弯曲柔顺性。我们实现了在GPU上以70 Hz运行的非线性进化MPC。在仿真中,它展示了对动态轨迹的精确跟踪和设定点控制,末端执行器位置误差低于3 mm(致动器长度的2.3%)。在真实实验中,该控制器实现了相近的精度以及高达3.55 m/s²的加速度。
摘要:Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot capture the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of 44000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s2.


强化学习(1篇)

【1】Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
标题:随机最短路问题的收敛强化学习算法
链接:https://arxiv.org/abs/2508.13963

作者: Guin, Shalabh Bhatnagar
摘要:在本文中,我们针对随机最短路(SSP)问题在表格(tabular)设置下提出了两种算法,并在函数逼近设置下提出了一种算法。SSP问题是强化学习(RL)中一类重要的问题,因为RL中其他类型的代价准则都可以在SSP的设置下表述。我们证明了所有算法的渐近几乎必然收敛性。与其他著名的收敛RL算法相比,我们观察到表格算法具有更优的性能。我们还观察到,与函数逼近设置中的其他算法相比,我们的函数逼近算法表现可靠。
摘要:In this paper we propose two algorithms in the tabular setting and an algorithm for the function approximation setting for the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost-criteria in RL can be formulated in the setting of SSP. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.


医学相关(3篇)

【1】Classifying Clinical Outcome of Epilepsy Patients with Ictal Chirp Embeddings
标题:癫痫发作期Chirp嵌入对癫痫患者临床结局的分类
链接:https://arxiv.org/abs/2508.13476

作者:ahador, Milad Lankarany
备注:21 pages, 10 figures
摘要:这项研究提出了一个利用t-分布随机邻居嵌入(t-SNE)的管道,对不同结局场景下的啁啾特征进行可解释的可视化。数据集由基于啁啾的时间、频谱和频率度量组成。使用t-SNE,在保留局部邻域关系的同时,通过基于学生t分布的相似性优化来解决拥挤问题。在2D t-SNE嵌入上制定了三个分类任务:(1)区分临床成功与失败/未切除,(2)区分高难度与低难度病例,以及(3)识别最佳病例,即临床难度最小的成功结局。对随机森林、支持向量机、逻辑回归和k-最近邻四种分类器进行了训练,并使用分层5折交叉验证进行评估。在各项任务中,随机森林和k-NN分类器表现出卓越的性能,在最佳病例检测(临床难度最小的成功结局)中达到了高达88.8%的准确率。此外,通过将SHAP解释应用于预测t-SNE坐标的模型,生成了特征影响敏感度图,揭示了嵌入空间内空间局部化的特征重要性。这些图突出了特定啁啾属性如何驱动区域聚类和类别分离,从而深入洞察数据的潜在结构。该综合框架展示了可解释嵌入和局部特征归因在临床分层和决策支持方面的潜力。
摘要:This study presents a pipeline leveraging t-Distributed Stochastic Neighbor Embedding (t-SNE) for interpretable visualizations of chirp features across diverse outcome scenarios. The dataset, comprising chirp-based temporal, spectral, and frequency metrics. Using t-SNE, local neighborhood relationships were preserved while addressing the crowding problem through Student t-distribution-based similarity optimization. Three classification tasks were formulated on the 2D t-SNE embeddings: (1) distinguishing clinical success from failure/no-resection, (2) separating high-difficulty from low-difficulty cases, and (3) identifying optimal cases, defined as successful outcomes with minimal clinical difficulty. Four classifiers, namely, Random Forests, Support Vector Machines, Logistic Regression, and k-Nearest Neighbors, were trained and evaluated using stratified 5-fold cross-validation. Across tasks, the Random Forest and k-NN classifiers demonstrated superior performance, achieving up to 88.8% accuracy in optimal case detection (successful outcomes with minimal clinical difficulty). Additionally, feature influence sensitivity maps were generated using SHAP explanations applied to model predicting t-SNE coordinates, revealing spatially localized feature importance within the embedding space. These maps highlighted how specific chirp attributes drive regional clustering and class separation, offering insights into the latent structure of the data. The integrated framework showcases the potential of interpretable embeddings and local feature attribution for clinical stratification and decision support.
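该管道的主干可以用scikit-learn几行实现;以下示意使用占位的随机特征与标签,困惑度等超参数均为假设,并非论文设置:

import numpy as np
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(100, 12)          # 占位:啁啾的时间/频谱/频率特征
y = np.random.randint(0, 2, 100)     # 占位:二分类的临床结局标签

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(), emb, y, cv=cv)
print(scores.mean())                 # 分层 5 折交叉验证的平均准确率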


【2】Automated Cervical Cancer Detection through Visual Inspection with Acetic Acid in Resource-Poor Settings with Lightweight Deep Learning Models Deployed on an Android Device
标题:在资源匮乏的环境中通过使用醋酸进行视觉检查自动检测宫颈癌,并在Android设备上部署轻量级深度学习模型
链接:https://arxiv.org/abs/2508.13253

作者:elroy Maben, Keerthana Prasad, Shyamala Guruvare, Vidya Kudva, P C Siddalingaswamy
摘要:宫颈癌是女性中最常见的癌症之一,尽管相对容易治疗,但在低收入和中等收入国家,它夺去了大量的生命。一些研究表明,公共筛查计划可以显著降低宫颈癌的发病率和死亡率。虽然有几种筛查测试,但由于测试的可负担性和简单性,使用乙酸(VIA)进行目视检查是低资源环境中最可行的选择。VIA需要经过培训的医疗专业人员来解释测试,并且本质上是主观的。使用人工智能实现VIA自动化消除了主观性,并将允许将任务转移给培训较少的卫生工作者。人工智能的任务转移将有助于进一步加快低资源环境中的筛查计划。在我们的工作中,我们提出了一种轻量级的深度学习算法,其中包括EfficientDet-Lite3作为感兴趣区域(ROI)检测器和基于MobileNet-V2的分类模型。这些模型将部署在一个基于安卓系统的设备上,可以远程操作,并提供几乎即时的结果,而不需要训练有素的医疗专业人员、实验室、复杂的基础设施或互联网连接。该分类模型在测试数据集上的准确率为92.31%,灵敏度为98.24%,特异性为88.37%,是一种很有前途的自动化低资源筛查方法。
摘要:Cervical cancer is among the most commonly occurring cancer among women and claims a huge number of lives in low and middle-income countries despite being relatively easy to treat. Several studies have shown that public screening programs can bring down cervical cancer incidence and mortality rates significantly. While several screening tests are available, visual inspection with acetic acid (VIA) presents itself as the most viable option for low-resource settings due to the affordability and simplicity of performing the test. VIA requires a trained medical professional to interpret the test and is subjective in nature. Automating VIA using AI eliminates subjectivity and would allow shifting of the task to less trained health workers. Task shifting with AI would help further expedite screening programs in low-resource settings. In our work, we propose a lightweight deep learning algorithm that includes EfficientDet-Lite3 as the Region of Interest (ROI) detector and a MobileNet-V2 based model for classification. These models would be deployed on an android-based device that can operate remotely and provide almost instant results without the requirement of highly-trained medical professionals, labs, sophisticated infrastructure, or internet connectivity. The classification model gives an accuracy of 92.31%, a sensitivity of 98.24%, and a specificity of 88.37% on the test dataset and presents itself as a promising automated low-resource screening approach.


【3】Sex-Specific Vascular Score: A Novel Perfusion Biomarker from Supervoxel Analysis of 3D pCASL MRI
标题:性别特异性血管评分:一种基于3D pCASL MRI超体素分析的新灌注生物标志物
链接:https://arxiv.org/abs/2508.13173

作者:le, Neelam Sinha, Vaanathi Sundareshan, Thomas Gregor Issac
备注:18 pages, 7 figures
摘要:我们提出了一个新的框架,利用三维伪连续动脉自旋标记(3D pCASL)MRI来计算性别特异性血管评分,量化脑血管健康和潜在的疾病易感性。使用超体素聚类将大脑分割成空间连续的均匀灌注区域,捕获微血管和大血管的贡献。从186名认知健康的参与者中提取平均脑血流量(CBF)值,并用于训练自定义卷积神经网络,实现了95%的性别分类准确率。这突出了整个大脑中强大的性别特异性灌注模式。此外,在男性和女性队列中系统地评价了区域CBF变化和年龄相关影响。拟议的血管风险评分框架增强了对规范性脑灌注和衰老的理解,并可能有助于早期发现和个性化干预神经退行性疾病,如阿尔茨海默氏症。
摘要:We propose a novel framework that leverages 3D pseudo-continuous arterial spin labeling (3D pCASL) MRI to compute sex-specific vascular scores that quantify cerebrovascular health and potential disease susceptibility. The brain is parcellated into spatially contiguous regions of homogeneous perfusion using supervoxel clustering, capturing both microvascular and macrovascular contributions. Mean cerebral blood flow (CBF) values are extracted from 186 cognitively healthy participants and used to train a custom convolutional neural network, achieving 95 percent accuracy in sex classification. This highlights robust, sex-specific perfusion patterns across the brain. Additionally, regional CBF variations and age-related effects are systematically evaluated within male and female cohorts. The proposed vascular risk-scoring framework enhances understanding of normative brain perfusion and aging, and may facilitate early detection and personalized interventions for neurodegenerative diseases such as Alzheimer's.


蒸馏|知识提取(1篇)

【1】TASER: Table Agents for Schema-guided Extraction and Recommendation
标题:TASER:用于方案引导提取和推荐的表代理
链接:https://arxiv.org/abs/2508.13404

作者:o, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso
摘要:真实世界的金融文件报告了实体金融资产的基本信息,这些资产可能涉及数百万种不同的金融工具类型。然而,这些细节往往隐藏在杂乱、多页、碎片化的表格中(例如,我们数据集中99.4%的表格没有边界框,单个表格最多426行、跨44页)。为了应对真实世界表格带来的这些独特挑战,我们提出了一个持续学习的代理式表格提取系统TASER(Table Agents for Schema-guided Extraction and Recommendation),它将高度非结构化、多页、异构的表格提取为规范化、符合模式(schema)的输出。我们的表格代理利用初始模式执行表格检测、分类、提取和建议。随后,我们的Recommender Agent审查输出、建议模式修订并确定最终建议,使TASER比Table Transformer等现有表格检测模型高出10.1%。在这一持续学习过程中,我们强调,更大的批量会使可操作且被采纳的模式建议增加104.3%,进而使提取的持仓增加9.8%,凸显了持续学习过程的重要性。为了训练TASER,我们手动标注了22,584页(28,150,449个词元)、3,213个表格,涉及731,685,511,687美元的持仓,最终形成了首批真实金融表格数据集之一。我们发布了数据集TASERTab,使研究社区能够访问真实世界的金融表格和输出。我们的结果突出了代理式、模式引导的提取系统在稳健理解真实世界金融表格方面的前景。
摘要:Real-world financial documents report essential information about an entity's financial holdings that can span millions of different financial instrument types. Yet, these details are often buried in messy, multi-page, fragmented tables - for example, 99.4% of the tables in our dataset have no bounding boxes with the maximum number of rows amounting to 426 per table across 44 pages. To tackle these unique challenges from real-world tables, we present a continuously learning, agentic table extraction system, TASER (Table Agents for Schema-guided Extraction and Recommendation) that extracts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Our table agents execute on table detection, classification, extraction, and recommendations by leveraging an initial schema. Then, our Recommender Agent reviews the outputs, recommends schema revisions, and decides on the final recommendations, enabling TASER to outperform existing table detection models such as Table Transformer by 10.1%. Within this continuous learning process, we highlight that larger batch sizes result in a 104.3% increase in schema recommendations that are actionable and utilized, resulting in a 9.8% increase in extracted holdings - highlighting the importance of a continuous learning process. To train TASER, we have manually labeled 22,584 pages (28,150,449 tokens), 3,213 tables for $731,685,511,687 of holdings culminating in one of the first real financial table datasets. We release our dataset TASERTab to enable the research community to access real-world financial tables and outputs. Our results highlight the promise of agentic, schema-guided extraction systems for robust understanding of real-world financial tables.


推荐(3篇)

【1】Multi-User Contextual Cascading Bandits for Personalized Recommendation
标题:多用户上下文级联Bandits的个性化推荐
链接:https://arxiv.org/abs/2508.13981

作者:, Huiwen Jia
备注:35 pages, 5 figures
摘要:我们引入了多用户上下文级联强盗(MCCB)模型,这是一个新的组合强盗框架,刻画了多个用户同时与顺序展示的条目交互的现实在线广告场景。与经典的上下文强盗不同,MCCB集成了三个关键的结构要素:(i)基于顺序臂曝光的级联反馈,(ii)支持选择性探索的并行上下文会话,以及(iii)异构的臂级奖励。我们首先提出了带后向规划的置信上界算法(UCBBP),这是一种为此设置量身定制的UCB风格算法,并证明它在$T$个回合、$H$个会话步骤和每回合$N$个上下文下实现了$\widetilde{O}(\sqrt{THN})$的遗憾界。鉴于许多用户会同时与系统交互,我们引入了第二种算法,称为带后向规划的主动置信上界算法(AUCBBP),它在上下文扩展(即用户扩展)方面表现出严格的效率改进,遗憾界为$\widetilde{O}(\sqrt{T+HN})$。我们通过数值实验验证了理论结果,证明了两种算法在各种设置下的经验有效性。
摘要:We introduce a Multi-User Contextual Cascading Bandit model, a new combinatorial bandit framework that captures realistic online advertising scenarios where multiple users interact with sequentially displayed items simultaneously. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with Backward Planning (AUCBBP), which shows a strict efficiency improvement in context scaling, i.e., user scaling, with a regret bound of $\widetilde{O}(\sqrt{T+HN})$. We validate our theoretical findings via numerical experiments, demonstrating the empirical effectiveness of both algorithms under various settings.


【2】Understanding Distribution Structure on Calibrated Recommendation Systems
标题:了解经过校准的推荐系统的分布结构
链接:https://arxiv.org/abs/2508.13568

作者:rea da Silva, Denis Robson Dantas Boaventura, Mayki dos Santos Oliveira, Eduardo Ferreira da Silva, Joel Machado Pires, Frederico Araújo Durão
摘要:传统的推荐系统旨在生成由与用户画像最相关或最相似的条目组成的推荐列表。这些方法生成的推荐列表可能会遗漏用户画像中不太突出区域的条目类型,从而损害用户体验。为了解决这个问题,校准推荐系统保证在推荐列表中纳入代表性较低的区域。校准的上下文涉及三种分布:第一种来自用户画像,第二种来自候选条目,最后一种来自推荐列表。这些分布是G维的,其中G是系统中类型(genre)的总数。考虑到传统推荐器在一维数据空间中运行,这种高维性需要不同的评估方法。为此,我们实现了15个模型,帮助理解这些分布的结构。我们在电影领域的三个数据集上评估了用户的模式。结果表明,离群点检测模型能让我们更好地理解这些结构。校准系统生成的推荐列表与传统推荐列表行为类似,允许用户以相同程度改变其偏好组。
摘要:Traditional recommender systems aim to generate a recommendation list comprising the most relevant or similar items to the user's profile. These approaches can create recommendation lists that omit item genres from the less prominent areas of a user's profile, thereby undermining the user's experience. To solve this problem, the calibrated recommendation system provides a guarantee of including less representative areas in the recommended list. The calibrated context works with three distributions. The first is from the user's profile, the second is from the candidate items, and the last is from the recommendation list. These distributions are G-dimensional, where G is the total number of genres in the system. This high dimensionality requires a different evaluation method, considering that traditional recommenders operate in a one-dimensional data space. In this sense, we implement fifteen models that help to understand how these distributions are structured. We evaluate the users' patterns in three datasets from the movie domain. The results indicate that the models of outlier detection provide a better understanding of the structures. The calibrated system creates recommendation lists that act similarly to traditional recommendation lists, allowing users to change their groups of preferences to the same degree.
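作为背景说明,校准推荐通常将用户画像的类型分布p与推荐列表的类型分布q进行比较,常用Steck(2018)式的KL度量;以下示意中的平滑系数eps与分布构造均为该方向文献的常见做法,而非本文的精确设定:

import numpy as np

def calibration_kl(p_user, q_list, eps=0.01):
    # KL(p || q~),其中 q~ = (1-eps)*q + eps*p,避免 q 中出现零概率
    q = (1 - eps) * q_list + eps * p_user
    mask = p_user > 0
    return float(np.sum(p_user[mask] * np.log(p_user[mask] / q[mask])))

# 示例:G=3 个类型
p = np.array([0.6, 0.3, 0.1])   # 用户画像的类型分布
q = np.array([0.9, 0.1, 0.0])   # 推荐列表的类型分布
print(calibration_kl(p, q))     # 值越大,列表相对画像越"失准"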


【3】Heterogeneous Influence Maximization in User Recommendation
标题:用户推荐中的异类影响最大化
链接:https://arxiv.org/abs/2508.13517

作者:u, Jiachen Sun, Wenqing Lin, Wendong Bi, Xiangrong Wang, Deqing Yang
备注:Accepted in CIKM 2025
摘要:用户推荐系统通过鼓励用户充当邀请者与其他用户(被邀请者)交互来提升用户参与度,从而可能促进信息传播。传统的推荐方法通常侧重于建模交互意愿,而影响最大化(IM)方法侧重于确定一组用户以最大化信息传播。然而,现有方法面临两个重大挑战。首先,推荐方法无法释放候选用户的传播能力;其次,IM方法未能考虑交互意愿。为了解决这些问题,我们提出了HeteroIR和HeteroIM两个模型。HeteroIR提供了一个直观的方案来释放用户推荐系统的传播潜力;HeteroIM填补了IM方法与推荐任务之间的空白,在提升交互意愿的同时最大化传播覆盖范围。HeteroIR引入两阶段框架来估计传播收益。HeteroIM基于包含邀请者和被邀请者的反向可达(RR)集的数量,增量式地选择最有影响力的被邀请者进行推荐和重排。RR集表示能够通过传播到达目标节点的节点集合。大量实验表明,HeteroIR和HeteroIM显著优于最先进的基线(p值<0.05)。此外,我们已在腾讯的在线游戏平台上部署了HeteroIR和HeteroIM,并在在线A/B测试中分别取得了8.5%和10%的提升。实现代码可在https://github.com/socialalgo/HIM上获得。
摘要:User recommendation systems enhance user engagement by encouraging users to act as inviters to interact with other users (invitees), potentially fostering information propagation. Conventional recommendation methods typically focus on modeling interaction willingness. Influence-Maximization (IM) methods focus on identifying a set of users to maximize the information propagation. However, existing methods face two significant challenges. First, recommendation methods fail to unleash the candidates' spread capability. Second, IM methods fail to account for the willingness to interact. To solve these issues, we propose two models named HeteroIR and HeteroIM. HeteroIR provides an intuitive solution to unleash the dissemination potential of user recommendation systems. HeteroIM fills the gap between the IM method and the recommendation task, improving interaction willingness and maximizing spread coverage. The HeteroIR introduces a two-stage framework to estimate the spread profits. The HeteroIM incrementally selects the most influential invitee to recommend and rerank based on the number of reverse reachable (RR) sets containing inviters and invitees. RR set denotes a set of nodes that can reach a target via propagation. Extensive experiments show that HeteroIR and HeteroIM significantly outperform the state-of-the-art baselines with the p-value < 0.05. Furthermore, we have deployed HeteroIR and HeteroIM in Tencent's online gaming platforms and gained an 8.5\% and 10\% improvement in the online A/B test, respectively. Implementation codes are available at https://github.com/socialalgo/HIM.
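反向可达(RR)集可按独立级联模型采样:从随机选取的目标节点出发,沿反向边以激活概率展开。以下为示意(图用“节点到入邻居列表”的字典表示,传播概率p为假设):

import random

def sample_rr_set(graph_rev, nodes, p=0.1):
    # graph_rev: {节点: [入邻居, ...]};从随机目标反向以概率 p 传播
    target = random.choice(nodes)
    rr, frontier = {target}, [target]
    while frontier:
        u = frontier.pop()
        for v in graph_rev.get(u, []):
            if v not in rr and random.random() < p:
                rr.add(v)          # v 能以该次随机实现到达 target
                frontier.append(v)
    return rr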


聚类(2篇)

【1】Multi-view Clustering via Bi-level Decoupling and Consistency Learning
标题:通过双层脱钩和一致性学习的多视图集群
链接:https://arxiv.org/abs/2508.13499

作者:ng, Yuhui Zheng, Huiying Xu, Xinzhong Zhu
摘要:多视图聚类已被证明是分析多视图数据中的潜在模式的有效方法。通过学习多视图特征之间的一致性和互补性可以提高聚类的性能,然而,面向聚类的表示学习往往被忽视。在本文中,我们提出了一种新的双层解耦和一致性学习框架(BDCL),以进一步探索多视图数据的有效表示,以提高多视图聚类中特征的簇间区分度和簇内紧凑性。我们的框架包括三个模块:1)多视图实例学习模块对齐一致的信息,同时通过重建自动编码器和对比学习保持视图之间的私有特征。2)特征和聚类的双层解耦增强了特征空间和聚类空间的可区分性。3)一致性学习模块将样本及其邻居的不同视图视为正对,学习其聚类分配的一致性,并进一步压缩簇内空间。在5个基准数据集上的实验结果表明,与SOTA方法相比,该方法具有明显的优越性。我们的代码发布在https://github.com/LouisDong95/BDCL上。
摘要:Multi-view clustering has shown to be an effective method for analyzing underlying patterns in multi-view data. The performance of clustering can be improved by learning the consistency and complementarity between multi-view features, however, cluster-oriented representation learning is often overlooked. In this paper, we propose a novel Bi-level Decoupling and Consistency Learning framework (BDCL) to further explore the effective representation for multi-view data to enhance inter-cluster discriminability and intra-cluster compactness of features in multi-view clustering. Our framework comprises three modules: 1) The multi-view instance learning module aligns the consistent information while preserving the private features between views through reconstruction autoencoder and contrastive learning. 2) The bi-level decoupling of features and clusters enhances the discriminability of feature space and cluster space. 3) The consistency learning module treats the different views of the sample and their neighbors as positive pairs, learns the consistency of their clustering assignments, and further compresses the intra-cluster space. Experimental results on five benchmark datasets demonstrate the superiority of the proposed method compared with the SOTA methods. Our code is published on https://github.com/LouisDong95/BDCL.


【2】A Recurrent Neural Network based Clustering Method for Binary Data Sets in Education
标题:基于回归神经网络的教育二进制数据集聚集方法
链接:https://arxiv.org/abs/2508.13224

作者:ira, Toshimichi Saito
摘要:本文研究了递归神经网络在S-P图(S-P chart)聚类中的应用。S-P图是教育领域广泛使用的一种二值数据集。随着学生人数的增加,S-P图变得难以处理。为了将大型图表划分为较小的图表,我们提出了一种基于网络动力学的简单聚类方法。在该方法中,网络具有多个不动点,其吸引域给出对应于小S-P图的聚类。为了评估聚类性能,我们提出了一个重要的特征量:表征学生答题模式奇异性的平均谨慎指数(average caution index)。基础实验证实了该方法的有效性。
摘要:This paper studies an application of a recurrent neural network to a clustering method for the S-P chart: a binary data set used widely in education. As the number of students increases, the S-P chart becomes hard to handle. In order to classify the large chart into smaller charts, we present a simple clustering method based on the network dynamics. In the method, the network has multiple fixed points and basins of attraction give clusters corresponding to small S-P charts. In order to evaluate the clustering performance, we present an important feature quantity: the average caution index that characterizes the singularity of students' answer patterns. Performing fundamental experiments, the effectiveness of the method is confirmed.


联邦学习|隐私保护|加密(2篇)

【1】Trans-XFed: An Explainable Federated Learning for Supply Chain Credit Assessment
标题:Trans-XFed:供应链信用评估的可解释联邦学习
链接:https://arxiv.org/abs/2508.13715

作者:Arno P. J. M. Siebes, Siamak Mehrkanoon
备注:Accepted by FLTA 2025
摘要:本文提出了一种Trans-XFed架构,将联邦学习与可解释AI技术相结合,用于供应链信用评估。所提模型旨在解决几个关键挑战,包括隐私、信息孤岛、类别不平衡、非独立同分布(Non-IID)数据,以及供应链信用评估模型的可解释性。我们引入基于性能的客户端选择策略(PBCS)来应对类别不平衡和Non-IID问题。该策略通过选择本地F1分数较高的客户端实现更快的收敛。以同态加密增强的FedProx架构被用作核心模型,并进一步集成了一个Transformer编码器。Transformer编码器模块提供了对所学特征的深入洞察。此外,我们采用集成梯度(Integrated Gradients)这一可解释AI技术来提供对决策的洞察。我们通过在真实供应链数据集上的实验评估证明了Trans-XFed的有效性。结果表明,与若干基线相比,它能够提供准确的信用评估,同时保持透明性和隐私性。
摘要:This paper proposes a Trans-XFed architecture that combines federated learning with explainable AI techniques for supply chain credit assessment. The proposed model aims to address several key challenges, including privacy, information silos, class imbalance, non-identically and independently distributed (Non-IID) data, and model interpretability in supply chain credit assessment. We introduce a performance-based client selection strategy (PBCS) to tackle class imbalance and Non-IID problems. This strategy achieves faster convergence by selecting clients with higher local F1 scores. The FedProx architecture, enhanced with homomorphic encryption, is used as the core model, and further incorporates a transformer encoder. The transformer encoder block provides insights into the learned features. Additionally, we employ the integrated gradient explainable AI technique to offer insights into decision-making. We demonstrate the effectiveness of Trans-XFed through experimental evaluations on real-world supply chain datasets. The obtained results show its ability to deliver accurate credit assessments compared to several baselines, while maintaining transparency and privacy.


【2】Personalized Subgraph Federated Learning with Sheaf Collaboration
标题:具有Sheaf协作的个性化子图联邦学习
链接 :https://arxiv.org/abs/2508.13642

作者:ang, Yanan Zhao, Rui She, Yiming Li, Wee Peng Tay
备注:Full version of our ECAI 2025 accepted paper
摘要:图结构数据在许多应用中很普遍。在子图联邦学习(FL)中,这些数据分布在客户端之间,每个客户端都持有一个本地子图。个性化子图FL旨在为每个客户端开发定制模型,以处理多样的数据分布。然而,由于本地子图的异构性,客户端之间的性能差异仍是一个关键问题。为克服这一挑战,我们提出了FedSheafHN,一个建立在层(sheaf)协作机制之上的新框架,将增强的客户端描述符与高效的个性化模型生成统一起来。具体而言,FedSheafHN利用图级嵌入将每个客户端的本地子图嵌入服务器构建的协作图中,并在协作图内采用层扩散(sheaf diffusion)来丰富客户端表示。随后,FedSheafHN通过服务器优化的超网络生成定制的客户端模型。实证评估表明,FedSheafHN在各种图数据集上优于现有的个性化子图FL方法。此外,它具有快速的模型收敛性,并能有效泛化到新客户端。
摘要:Graph-structured data is prevalent in many applications. In subgraph federated learning (FL), this data is distributed across clients, each with a local subgraph. Personalized subgraph FL aims to develop a customized model for each client to handle diverse data distributions. However, performance variation across clients remains a key issue due to the heterogeneity of local subgraphs. To overcome the challenge, we propose FedSheafHN, a novel framework built on a sheaf collaboration mechanism to unify enhanced client descriptors with efficient personalized model generation. Specifically, FedSheafHN embeds each client's local subgraph into a server-constructed collaboration graph by leveraging graph-level embeddings and employing sheaf diffusion within the collaboration graph to enrich client representations. Subsequently, FedSheafHN generates customized client models via a server-optimized hypernetwork. Empirical evaluations demonstrate that FedSheafHN outperforms existing personalized subgraph FL methods on various graph datasets. Additionally, it exhibits fast model convergence and effectively generalizes to new clients.


推理|分析|理解|解释(7篇)

【1】Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
标题:假设-R1:通用机器人操纵的增强假设推理
链接:https://arxiv.org/abs/2508.13998

作者:, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Jianye Hao
备注:Embodied-R1 technical report
摘要:具身智能的泛化受到"所见到所做"(seeing-to-doing)差距的阻碍,这源于数据稀缺和具身异构性。为了解决这个问题,我们开创性地将"指向"(pointing)作为统一的、与具身无关的中间表示,定义了四项核心具身指向能力,衔接高层视觉语言理解与低层动作原语。我们介绍了Embodied-R1,一个专为具身推理和指向设计的3B视觉语言模型(VLM)。我们以广泛的具身和通用视觉推理数据集为来源,构建了大规模数据集Embodied-Points-200K,它支持关键的具身指向能力。然后,我们使用两阶段强化微调(RFT)课程和专门的多任务奖励设计来训练Embodied-R1。Embodied-R1在11个具身空间与指向基准上实现了最先进的性能。至关重要的是,它展现出强大的零样本泛化能力:在没有任何任务特定微调的情况下,在SIMPLEREnv中取得56.2%的成功率,在8个真实世界XArm任务中取得87.5%的成功率,比强基线提高了62%。此外,该模型对多样的视觉干扰表现出很高的鲁棒性。我们的工作表明,以指向为中心的表示与RFT训练范式相结合,为弥合机器人技术中的感知-动作差距提供了一条有效且可推广的途径。
摘要:Generalization in embodied AI is hindered by the "seeing-to-doing gap," which stems from data scarcity and embodiment heterogeneity. To address this, we pioneer "pointing" as a unified, embodiment-agnostic intermediate representation, defining four core embodied pointing abilities that bridge high-level vision-language comprehension with low-level action primitives. We introduce Embodied-R1, a 3B Vision-Language Model (VLM) specifically designed for embodied reasoning and pointing. We use a wide range of embodied and general visual reasoning datasets as sources to construct a large-scale dataset, Embodied-Points-200K, which supports key embodied pointing capabilities. We then train Embodied-R1 using a two-stage Reinforced Fine-tuning (RFT) curriculum with a specialized multi-task reward design. Embodied-R1 achieves state-of-the-art performance on 11 embodied spatial and pointing benchmarks. Critically, it demonstrates robust zero-shot generalization by achieving a 56.2% success rate in the SIMPLEREnv and 87.5% across 8 real-world XArm tasks without any task-specific fine-tuning, representing a 62% improvement over strong baselines. Furthermore, the model exhibits high robustness against diverse visual disturbances. Our work shows that a pointing-centric representation, combined with an RFT training paradigm, offers an effective and generalizable pathway to closing the perception-action gap in robotics.


【2】Explainable Learning Rate Regimes for Stochastic Optimization
标题:随机优化的可解释学习率方案
链接:https://arxiv.org/abs/2508.13639

作者:ng
摘要:现代机器学习通过随机梯度下降(SGD)进行训练,其性能在很大程度上取决于学习率(LR)如何随时间调整和衰减。然而,现有的LR方案可能很复杂,或者需要手动调整一个或多个额外的超参数,在实践中其瓶颈包括巨大的计算开销、时间和电力消耗。这项工作以自然而直接的方式阐明了LR应如何仅根据随机梯度的内在变化自动更新。我们开发了一种利用随机二阶算法的可解释LR方案,其行为模式与启发式算法类似,但实现简单,无需任何参数调整;它是一个自动过程:LR应随着随机梯度范数的减小(增大)而增大(减小)。所得的LR方案在包含SGD、SGDM和SIGNSGD在内的不同经典随机算法上的机器学习任务中,展示了其效率、鲁棒性和可扩展性。
摘要:Modern machine learning is trained by stochastic gradient descent (SGD), whose performance critically depends on how the learning rate (LR) is adjusted and decreased over time. Yet existing LR regimes may be intricate, or need to tune one or more additional hyper-parameters manually whose bottlenecks include huge computational expenditure, time and power in practice. This work, in a natural and direct manner, clarifies how LR should be updated automatically only according to the intrinsic variation of stochastic gradients. An explainable LR regime by leveraging stochastic second-order algorithms is developed, behaving a similar pattern to heuristic algorithms but implemented simply without any parameter tuning requirement, where it is of an automatic procedure that LR should increase (decrease) as the norm of stochastic gradients decreases (increases). The resulting LR regime shows its efficiency, robustness, and scalability in different classical stochastic algorithms, containing SGD, SGDM, and SIGNSGD, on machine learning tasks.
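摘要给出的规则是:随机梯度范数减小则上调LR、增大则下调。下面是该单调规则的一个最小示意(幂次eta与数值下限均为假设,并非论文由二阶信息推导出的公式):

def update_lr(lr, g_norm, g_norm_prev, eta=0.5):
    # 梯度范数下降(比值 > 1)时放大 LR,上升时缩小;eta 控制调整力度
    ratio = g_norm_prev / (g_norm + 1e-12)
    return lr * ratio ** eta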


【3】Approximate Bayesian Inference via Bitstring Representations
标题:通过位串表示的近似Bayesian推理
链接:https://arxiv.org/abs/2508.13598

作者:ri Sladek, Martin Trapp, Arno Solin
备注:Published at Uncertainty in Artificial Intelligence (UAI 2025)
摘要:机器学习社区最近致力于用量化或低精度算术来扩展大型模型。本文提出在这些表示所创建的量化离散参数空间中执行概率推理,从而有效地使我们能够用离散参数学习连续分布。我们同时考虑了2D密度和量化神经网络,并引入了一种使用概率电路的易处理学习方法。该方法为管理复杂分布提供了可扩展的解决方案,并能清晰洞察模型行为。我们用各种模型验证了我们的方法,在不牺牲准确性的情况下展示了推理效率。这项工作通过利用离散近似进行概率计算,推进了可扩展、可解释的机器学习。
摘要 :The machine learning community has recently put effort into quantized or low-precision arithmetics to scale large models. This paper proposes performing probabilistic inference in the quantized, discrete parameter space created by these representations, effectively enabling us to learn a continuous distribution using discrete parameters. We consider both 2D densities and quantized neural networks, where we introduce a tractable learning approach using probabilistic circuits. This method offers a scalable solution to manage complex distributions and provides clear insights into model behavior. We validate our approach with various models, demonstrating inference efficiency without sacrificing accuracy. This work advances scalable, interpretable machine learning by utilizing discrete approximations for probabilistic computations.


【4】MuFlex: A Scalable, Physics-based Platform for Multi-Building Flexibility Analysis and Coordination
标题:MuFlex:一个可扩展的、基于物理的平台,用于多建筑灵活性分析和协调
链接:https://arxiv.org/abs/2508.13532

作者: Ivan Korolija, Rui Tang
备注:The platform will be released open-source on GitHub: this https URL once pre-printed
摘要:随着可再生能源发电在电网中的渗透率越来越高,维持系统平衡需要建筑群聚合提供协调的需求灵活性。强化学习(RL)因其无模型的性质,已被广泛用于建筑控制。开源模拟测试平台不仅对训练RL智能体至关重要,对公平地对控制策略进行基准测试也同样重要。然而,建筑领域的大多数测试平台都针对单体建筑;多建筑平台相对有限,且通常依赖简化模型(例如电阻-电容模型)或数据驱动方法,无法充分刻画解释控制性能所需的物理细节和中间变量。此外,这些平台通常采用固定的输入、输出和模型格式,限制了它们在各种控制场景中作为基准工具的适用性。为弥补这些差距,本研究开发了MuFlex,一个可扩展的开源平台,用于对多建筑灵活性协调的控制策略进行基准测试和验证。MuFlex支持跨EnergyPlus建筑模型的同步信息交换,并遵循最新的OpenAI Gym接口,提供模块化、标准化的RL实现。我们在一个案例研究中展示了该平台的能力:使用精心调参的Soft Actor-Critic算法协调四座办公楼的需求灵活性。结果表明,聚合四座建筑的灵活性可将总峰值需求降至指定阈值以下,同时保持室内环境质量。
摘要:With the increasing penetration of renewable generation on the power grid, maintaining system balance requires coordinated demand flexibility from aggregations of buildings. Reinforcement learning (RL) has been widely explored for building controls because of its model-free nature. Open-source simulation testbeds are essential not only for training RL agents but also for fairly benchmarking control strategies. However, most building-sector testbeds target single buildings; multi-building platforms are relatively limited and typically rely on simplified models (e.g., Resistance-Capacitance) or data-driven approaches, which lack the ability to fully capture the physical intricacies and intermediate variables necessary for interpreting control performance. Moreover, these platforms often impose fixed inputs, outputs, and model formats, restricting their applicability as benchmarking tools across diverse control scenarios. To address these gaps, MuFlex, a scalable, open-source platform for benchmarking and testing control strategies for multi-building flexibility coordination, was developed in this study. MuFlex enables synchronous information exchange across EnergyPlus building models and adheres to the latest OpenAI Gym interface, providing a modular, standardized RL implementation. The platform capabilities were demonstrated in a case study coordinating demand flexibility across four office buildings using the Soft Actor-Critic algorithm with carefully fine-tuned hyperparameters. The results show that aggregating the four buildings flexibility reduced total peak demand below a specified threshold while maintaining indoor environmental quality.


【5】MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search
标题:MAVIS:基于值引导的推理时间搜索的多目标对齐
链接:https://arxiv.org/abs/2508.13415

作者:rleton, Debajoy Mukherjee, Srinivas Shakkottai, Dileep Kalathil
备注:20 pages, 6 figures
摘要:大型语言模型(LLM)越来越多地部署在需要平衡多个通常相互冲突目标(例如有用性、无害性或幽默性)的各种应用中。在这种多目标设置中,使输出与用户特定的偏好对齐,通常需要针对每个目标或偏好配置微调模型,这在计算上昂贵且缺乏灵活性。我们提出了MAVIS(Multi-Objective Alignment via Value-Guided Inference-Time Search,基于价值引导推理时搜索的多目标对齐),一个轻量级的推理时对齐框架,无需修改基座模型权重即可动态控制LLM行为。MAVIS训练一组小型价值模型,每个模型对应一个不同的目标。在推理时,这些价值模型按用户指定的权重组合,产生一个倾斜函数(tilting function),将基座模型的输出分布调整为期望的权衡。价值模型通过一个简单的迭代算法训练,该算法确保KL正则化策略的单调改进。实验表明,MAVIS优于先为每个目标微调模型再事后组合的基线,甚至接近针对用户确切偏好微调模型这一理想化设置的性能。
摘要:Large Language Models (LLMs) are increasingly deployed across diverse applications that demand balancing multiple, often conflicting, objectives -- such as helpfulness, harmlessness, or humor. Aligning outputs to user-specific preferences in such multi-objective settings typically requires fine-tuning models for each objective or preference configuration, which is computationally expensive and inflexible. We introduce MAVIS -- Multi-Objective Alignment via Value-Guided Inference-Time Search -- a lightweight inference-time alignment framework that enables dynamic control over LLM behavior without modifying the base model's weights. MAVIS trains a set of small value models, each corresponding to a distinct objective. At inference time, these value models are combined using user-specified weights to produce a tilting function that adjusts the base model's output distribution toward desired trade-offs. The value models are trained using a simple iterative algorithm that ensures monotonic improvement of the KL-regularized policy. We show empirically that MAVIS outperforms baselines that fine-tune per-objective models and combine them post hoc, and even approaches the performance of the idealized setting where models are fine-tuned for a user's exact preferences.
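推理时组合的核心,是用用户权重加权各价值模型对候选token的打分,叠加到基座模型的对数概率上再归一化。以下为示意(温度系数beta与价值打分的接口形状均为假设):

import numpy as np

def tilted_next_token_dist(base_logprobs, value_scores, user_weights, beta=1.0):
    # base_logprobs: [vocab],基座模型的下一 token 对数概率
    # value_scores:  [n_objectives, vocab],各目标价值模型的打分
    tilt = np.tensordot(user_weights, value_scores, axes=1)  # 按用户权重加权求和
    logits = base_logprobs + beta * tilt                     # 倾斜后的对数几率
    p = np.exp(logits - logits.max())
    return p / p.sum()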


【6】Batching-Aware Joint Model Onloading and Offloading for Hierarchical Multi-Task Inference
标题:分层多任务推理的批感知联合模型加载和卸载
链接:https://arxiv.org/abs/2508.13380

作者:Cha, Kevin Chan, Gustavo de Veciana, Haris Vikalo
摘要:资源受限的边缘设备上对智能服务日益增长的需求,推动了将工作负载分布在终端设备、边缘服务器和云端的协作推理系统的发展。虽然大多数现有框架专注于单任务、单模型场景,但许多现实应用(例如自动驾驶和增强现实)需要同时执行检测、分割和深度估计等多样任务。在这项工作中,我们提出了一个统一框架,联合决定在客户端和边缘服务器上部署(onload)哪些多任务模型,以及如何在层级结构中路由(offload)查询,从而在内存、计算和通信约束下最大化整体推理精度。我们将其建模为混合整数规划,并引入J3O(Joint Optimization of Onloading and Offloading,加载与卸载的联合优化),这是一种交替算法:(i)通过拉格朗日松弛的子模优化贪心地选择要加载的模型,(ii)通过约束线性规划确定最优卸载。我们进一步扩展J3O以考虑边缘侧的批处理,在异构任务负载下保持可扩展性。实验表明,在多任务基准测试中,J3O始终达到最优精度的97%以上,而运行时间不到最优求解器所需时间的15%。
摘要:The growing demand for intelligent services on resource-constrained edge devices has spurred the development of collaborative inference systems that distribute workloads across end devices, edge servers, and the cloud. While most existing frameworks focus on single-task, single-model scenarios, many real-world applications (e.g., autonomous driving and augmented reality) require concurrent execution of diverse tasks including detection, segmentation, and depth estimation. In this work, we propose a unified framework to jointly decide which multi-task models to deploy (onload) at clients and edge servers, and how to route queries across the hierarchy (offload) to maximize overall inference accuracy under memory, compute, and communication constraints. We formulate this as a mixed-integer program and introduce J3O (Joint Optimization of Onloading and Offloading), an alternating algorithm that (i) greedily selects models to onload via Lagrangian-relaxed submodular optimization and (ii) determines optimal offloading via constrained linear programming. We further extend J3O to account for batching at the edge, maintaining scalability under heterogeneous task loads. Experiments show J3O consistently achieves over $97\%$ of the optimal accuracy while incurring less than $15\%$ of the runtime required by the optimal solver across multi-task benchmarks.


【7】The Course Difficulty Analysis Cookbook
标题:课程难度分析食谱
链接:https://arxiv.org/abs/2508.13218

作者:Baucks, Robin Schmucker, Laurenz Wiskott
摘要:课程分析(CA)研究课程结构和学生数据,以确保教育计划的质量。其中一个重要方面是研究课程属性,这涉及为每门课程分配一个有代表性的难度值。这对CA的多个方面至关重要,例如质量控制(如监测随时间的变化)、课程比较(如学分衔接)以及课程推荐(如选课建议)。测量课程难度需要仔细考虑多个因素:首先,当难度测量对选课学生的表现水平敏感时,可能会因忽视学生多样性而使解释产生偏差。通过独立于选课学生的表现来评估难度,我们可以降低偏差风险,实现公平、有代表性的难度评估。其次,从测量理论的角度看,测量必须可靠且有效,为后续分析提供坚实基础。第三,难度测量应考虑协变量,例如多样化群体中学生个体的特征(如转学状态)。近年来,人们提出了多种难度概念。本文首次对基于平均绩点和潜在特质模型评估课程难度的现有方法进行了全面的回顾和比较,并提供了关于模型选择、假设检验和CA实际应用的实践教程。这些应用包括随时间监测课程难度,以及检测在不同学生群体(例如辍学者与毕业生)之间结果差异显著的课程,最终旨在促进高质量、公平、平等的学习体验。为支持进一步的研究和应用,我们提供了一个开源软件包和人工数据集,以促进可复现性和采用。
摘要:Curriculum analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. An essential aspect is studying course properties, which involves assigning each course a representative difficulty value. This is critical for several aspects of CA, such as quality control (e.g., monitoring variations over time), course comparisons (e.g., articulation), and course recommendation (e.g., advising). Measuring course difficulty requires careful consideration of multiple factors: First, when difficulty measures are sensitive to the performance level of enrolled students, it can bias interpretations by overlooking student diversity. By assessing difficulty independently of enrolled students' performances, we can reduce the risk of bias and enable fair, representative assessments of difficulty. Second, from a measurement theoretic perspective, the measurement must be reliable and valid to provide a robust basis for subsequent analyses. Third, difficulty measures should account for covariates, such as the characteristics of individual students within a diverse populations (e.g., transfer status). In recent years, various notions of difficulty have been proposed. This paper provides the first comprehensive review and comparison of existing approaches for assessing course difficulty based on grade point averages and latent trait modeling. It further offers a hands-on tutorial on model selection, assumption checking, and practical CA applications. These applications include monitoring course difficulty over time and detecting courses with disparate outcomes between distinct groups of students (e.g., dropouts vs. graduates), ultimately aiming to promote high-quality, fair, and equitable learning experiences. To support further research and application, we provide an open-source software package and artificial datasets, facilitating reproducibility and adoption.


分类|识别(4篇)

【1】Hierarchical Conformal Classification
标题:分层保形分类
链接:https://arxiv.org/abs/2508.13288

作者:n Hengst, Inès Blin, Majid Mohammadi, Syed Ihtesham Hussain Shah, Taraneh Younesian
摘要:共形预测(CP)是量化机器学习模型不确定性的强大框架,可提供具有有限样本覆盖保证的可靠预测。应用于分类时,无论底层分类器如何,CP都会产生一个可能标签的预测集,保证以高概率包含真实标签。然而,标准CP将类别视为扁平且无结构的,忽略了诸如类别标签之间的语义关系或层次结构等领域知识。本文提出了层次共形分类(HCC),它是CP的扩展,将类别层次结构纳入预测集的结构和语义中。我们将HCC表述为一个约束优化问题,其解产生由层次结构不同层级节点组成的预测集,同时保持覆盖保证。针对该问题的组合性质,我们正式证明:候选解中一个小得多、结构良好的子集足以在保持最优性的同时确保覆盖率。在由音频、图像和文本数据组成的三个新基准上的实证评估突出了我们方法的优势,用户研究也表明,标注者显著更偏好层次化预测集而非扁平预测集。
摘要:Conformal prediction (CP) is a powerful framework for quantifying uncertainty in machine learning models, offering reliable predictions with finite-sample coverage guarantees. When applied to classification, CP produces a prediction set of possible labels that is guaranteed to contain the true label with high probability, regardless of the underlying classifier. However, standard CP treats classes as flat and unstructured, ignoring domain knowledge such as semantic relationships or hierarchical structure among class labels. This paper presents hierarchical conformal classification (HCC), an extension of CP that incorporates class hierarchies into both the structure and semantics of prediction sets. We formulate HCC as a constrained optimization problem whose solutions yield prediction sets composed of nodes at different levels of the hierarchy, while maintaining coverage guarantees. To address the combinatorial nature of the problem, we formally show that a much smaller, well-structured subset of candidate solutions suffices to ensure coverage while upholding optimality. An empirical evaluation on three new benchmarks consisting of audio, image, and text data highlights the advantages of our approach, and a user study shows that annotators significantly prefer hierarchical over flat prediction sets.


【2】Physically Plausible Data Augmentations for Wearable IMU-based Human Activity Recognition Using Physics Simulation
标题:使用物理模拟进行基于可穿戴IMU的人类活动识别的物理合理数据增强
链接:https://arxiv.org/abs/2508.13284

作者:Oishi, Philip Birch, Daniel Roggen, Paula Lago
备注:12 pages, 4 figures
摘要:在基于传感器的人类活动识别(HAR)中,高质量标注数据的稀缺阻碍了模型性能,并限制了在真实场景中的泛化。数据增强是通过提升训练数据集多样性来缓解这一问题的关键策略。基于信号变换的数据增强(STDA)技术已在HAR中得到广泛应用。然而,这些方法往往在物理上不可信,可能导致增强数据无法保留活动标签的原始含义。在这项研究中,我们引入并系统刻画了由物理仿真实现的物理可信数据增强(PPDA)。PPDA利用来自运动捕捉或基于视频的姿态估计的人体运动数据,并通过物理仿真纳入各种现实的可变性,包括修改身体运动、传感器放置位置以及硬件相关效应。我们在日常活动和健身锻炼的三个公共数据集上比较了PPDA与传统STDA的表现。首先,我们单独评估每种增强方法,将各PPDA与对应的STDA直接比较;接着,通过改变训练所用受试者的数量,评估组合多种PPDA能在多大程度上降低初始数据收集的需求。实验表明,PPDA带来一致的收益:宏F1分数平均提高3.7个百分点(最高13个百分点),并且与STDA相比,训练受试者最多可减少60%仍能达到有竞争力的性能。作为PPDA在基于传感器的HAR中的首个系统研究,这些结果突出了在数据增强中追求物理可信性的优势,以及物理仿真在生成用于训练深度学习HAR模型的合成惯性测量单元数据方面的潜力。因此,这种具有成本效益且可扩展的方法有助于应对HAR中的标注稀缺挑战。
摘要:The scarcity of high-quality labeled data in sensor-based Human Activity Recognition (HAR) hinders model performance and limits generalization across real-world scenarios. Data augmentation is a key strategy to mitigate this issue by enhancing the diversity of training datasets. Signal Transformation-based Data Augmentation (STDA) techniques have been widely used in HAR. However, these methods are often physically implausible, potentially resulting in augmented data that fails to preserve the original meaning of the activity labels. In this study, we introduce and systematically characterize Physically Plausible Data Augmentation (PPDA) enabled by physics simulation. PPDA leverages human body movement data from motion capture or video-based pose estimation and incorporates various realistic variabilities through physics simulation, including modifying body movements, sensor placements, and hardware-related effects. We compare the performance of PPDAs with traditional STDAs on three public datasets of daily activities and fitness workouts. First, we evaluate each augmentation method individually, directly comparing PPDAs to their STDA counterparts. Next, we assess how combining multiple PPDAs can reduce the need for initial data collection by varying the number of subjects used for training. Experiments show consistent benefits of PPDAs, improving macro F1 scores by an average of 3.7 pp (up to 13 pp) and achieving competitive performance with up to 60% fewer training subjects than STDAs. As the first systematic study of PPDA in sensor-based HAR, these results highlight the advantages of pursuing physical plausibility in data augmentation and the potential of physics simulation for generating synthetic Inertial Measurement Unit data for training deep learning HAR models. This cost-effective and scalable approach therefore helps address the annotation scarcity challenge in HAR.


【3】CLoE: Curriculum Learning on Endoscopic Images for Robust MES Classification
标题:CLOE:关于内窥镜图像的课程学习,以实现稳健的MES分类
链接:https://arxiv.org/abs/2508.13280

作者:demir, Hacer Yalim Keles, Omer Ozgur Tanriover
备注:16 pages, 4 figures, 9 tables
摘要:从内窥镜图像估计疾病严重程度对于评估溃疡性结肠炎至关重要,其中Mayo内窥镜子评分(MES)被广泛用于炎症分级。然而,由于观察者间差异带来的标签噪声以及评分的有序性(标准模型往往忽略这一点),MES分类仍然具有挑战性。我们提出了CLoE,一个同时考虑标签可靠性和有序结构的课程学习框架。通过在波士顿肠道准备量表(BBPS)标签上训练的轻量级模型估计的图像质量,被用作标注置信度的代理,从而将样本从易(干净)到难(噪声)排序。该课程进一步与ResizeMix增强相结合,以提高鲁棒性。使用CNN和Transformer在LIMUC和HyperKvasir数据集上进行的实验表明,CLoE在强监督和自监督基线上持续提升性能。例如,ConvNeXt-Tiny在LIMUC上以较低的计算成本达到82.5%的准确率和0.894的QWK。这些结果突出了难度感知训练策略在标签不确定性下改进有序分类的潜力。代码将在https://github.com/zeynepozdemir/CLoE上发布。
摘要:Estimating disease severity from endoscopic images is essential in assessing ulcerative colitis, where the Mayo Endoscopic Subscore (MES) is widely used to grade inflammation. However, MES classification remains challenging due to label noise from inter-observer variability and the ordinal nature of the score, which standard models often ignore. We propose CLoE, a curriculum learning framework that accounts for both label reliability and ordinal structure. Image quality, estimated via a lightweight model trained on Boston Bowel Preparation Scale (BBPS) labels, is used as a proxy for annotation confidence to order samples from easy (clean) to hard (noisy). This curriculum is further combined with ResizeMix augmentation to improve robustness. Experiments on the LIMUC and HyperKvasir datasets, using both CNNs and Transformers, show that CLoE consistently improves performance over strong supervised and self-supervised baselines. For instance, ConvNeXt-Tiny reaches 82.5\% accuracy and a QWK of 0.894 on LIMUC with low computational cost. These results highlight the potential of difficulty-aware training strategies for improving ordinal classification under label uncertainty. Code will be released at https://github.com/zeynepozdemir/CLoE.
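
下面是“按质量代理分数从易(干净)到难(噪声)排序样本”这一课程思想的极简Python示意;质量分数如何由BBPS模型得到不在此展示,函数名为示意性假设。

import numpy as np

def curriculum_order(quality_scores):
    """按质量代理分数从高(干净/易)到低(噪声/难)给出样本索引。"""
    return np.argsort(-np.asarray(quality_scores))

def curriculum_batches(order, batch_size):
    """按课程顺序产出批次:训练早期只见到高置信度样本。"""
    for i in range(0, len(order), batch_size):
        yield order[i:i + batch_size]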


【4】Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
标题:在科学摘要的独特、极简分类中使用人工直觉进行技术组合管理
链接:https://arxiv.org/abs/2508.13182

作者:anka, Fred Morstatter, Andrea Belz, Alexandra Graddy-Reed
摘要:科学摘要的分类对战略活动很有用,但由于稀疏的文本提供的上下文线索很少,自动化具有挑战性。与科学出版物相关联的元数据可用于提高性能,但通常仍需要半监督设置。此外,这样的方案可能会产生缺乏区分度的标签——也就是说,它们相互重叠,因此不能唯一地界定摘要。相比之下,专家们可以轻松地对这些文本进行标注和分类。在这里,我们描述了一种我们称之为“人工直觉”的过程的应用,以复制专家的方法,并使用大型语言模型(LLM)生成元数据。我们使用美国国家科学基金会的公开摘要来创建一组标签,然后在中国国家自然科学基金会的一组摘要上进行测试,以考察资助趋势。我们证明了该方法在研究组合管理、技术侦察和其他战略活动中的可行性。
摘要:Classification of scientific abstracts is useful for strategic activities but challenging to automate because the sparse text provides few contextual clues. Metadata associated with the scientific publication can be used to improve performance but still often requires a semi-supervised setting. Moreover, such schemes may generate labels that lack distinction -- namely, they overlap and thus do not uniquely define the abstract. In contrast, experts label and sort these texts with ease. Here we describe an application of a process we call artificial intuition to replicate the expert's approach, using a Large Language Model (LLM) to generate metadata. We use publicly available abstracts from the United States National Science Foundation to create a set of labels, and then we test this on a set of abstracts from the Chinese National Natural Science Foundation to examine funding trends. We demonstrate the feasibility of this method for research portfolio management, technology scouting, and other strategic activities.


优化|敛散性(2篇)

【1】AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
标题:AutoScale:由多任务优化指标引导的线性标量化
链接:https://arxiv.org/abs/2508.13979

作者:Kei Ikemura, Qingwen Zhang, Xiaomeng Zhu, Ci Li, Nazre Batool, Sina Sharif Mansouri, John Folkesson
备注:The first two authors hold equal contribution. 10 pages, 6 figures
摘要:最近的多任务学习研究表明,当使用精心选择的固定任务权重时,线性标量化可以实现与复杂多任务优化(MTO)方法相当甚至更好的性能。目前还不清楚为什么某些权重会产生最佳性能,以及如何在不依赖穷举超参数搜索的情况下确定这些权重。本文建立了线性标量化和MTO方法之间的直接联系,通过大量的实验揭示了性能良好的标量化权重在关键的MTO指标中表现出特定的趋势,例如高梯度幅度相似性。在此基础上,我们引入了AutoScale,这是一个简单而有效的两阶段框架,它使用这些MTO指标来指导线性标量化的权重选择,而无需昂贵的权重搜索。AutoScale在不同的数据集(包括一个新的大规模基准测试)上始终显示出卓越的性能和高效率。
摘要:Recent multi-task learning studies suggest that linear scalarization, when using well-chosen fixed task weights, can achieve comparable to or even better performance than complex multi-task optimization (MTO) methods. It remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a direct connection between linear scalarization and MTO methods, revealing through extensive experiments that well-performing scalarization weights exhibit specific trends in key MTO metrics, such as high gradient magnitude similarity. Building on this insight, we introduce AutoScale, a simple yet effective two-phase framework that uses these MTO metrics to guide weight selection for linear scalarization, without expensive weight search. AutoScale consistently shows superior performance with high efficiency across diverse datasets including a new large-scale benchmark.
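
下面用PyTorch给出“梯度幅值相似度”这一MTO指标与线性标量化损失的极简示意;相似度公式取自常见MTO文献的一种定义,并非论文给出的精确形式。

import torch

def grad_magnitude_similarity(g1, g2, eps=1e-12):
    """两任务梯度的幅值相似度,取值 (0,1],幅值相等时为 1。"""
    n1, n2 = g1.norm(), g2.norm()
    return (2 * n1 * n2 / (n1 ** 2 + n2 ** 2 + eps)).item()

def scalarized_loss(losses, weights):
    """线性标量化:按固定任务权重对各任务损失加权求和。"""
    return sum(w * l for w, l in zip(weights, losses))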


【2】Order Optimal Regret Bounds for Sharpe Ratio Optimization in the Bandit Setting
标题:强盗环境下夏普比优化的阶最优遗憾界
链接:https://arxiv.org/abs/2508.13749

作者:Taha Shah, Sabrina Khurshid, Gourab Ghatak
摘要:本文研究了随机强盗环境下夏普比(SR)最大化的序贯决策问题。我们专注于汤普森采样(TS)算法——一种以经验性能和探索效率著称的贝叶斯方法——并假设高斯奖励的参数未知。与专注于最大化累积回报的传统强盗目标不同,夏普比优化引入了获取高回报与控制风险之间的内在权衡,需要对均值和方差都进行仔细探索。我们的理论贡献包括一个专为夏普比设计的新的遗憾分解,突出了对奖励分布的信息获取在驱动学习效率中的作用。然后,我们以遗憾上界的形式为所提出的算法\texttt{SRTS}建立了基本性能极限。我们还推导出匹配的下界,并证明了阶最优性。我们的结果表明,汤普森采样随时间实现对数遗憾,其中与分布相关的因子刻画了基于风险调整后性能区分各臂的难度。实验仿真表明,我们的算法显著优于现有算法。
摘要 :In this paper, we investigate the problem of sequential decision-making for Sharpe ratio (SR) maximization in a stochastic bandit setting. We focus on the Thompson Sampling (TS) algorithm, a Bayesian approach celebrated for its empirical performance and exploration efficiency, under the assumption of Gaussian rewards with unknown parameters. Unlike conventional bandit objectives focusing on maximizing cumulative reward, Sharpe ratio optimization instead introduces an inherent tradeoff between achieving high returns and controlling risk, demanding careful exploration of both mean and variance. Our theoretical contributions include a novel regret decomposition specifically designed for the Sharpe ratio, highlighting the role of information acquisition about the reward distribution in driving learning efficiency. Then, we establish fundamental performance limits for the proposed algorithm \texttt{SRTS} in terms of an upper bound on regret. We also derive the matching lower bound and show the order-optimality. Our results show that Thompson Sampling achieves logarithmic regret over time, with distribution-dependent factors capturing the difficulty of distinguishing arms based on risk-adjusted performance. Empirical simulations show that our algorithm significantly outperforms existing algorithms.
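
下面用Python/numpy给出“高斯奖励、均值方差皆未知时,用汤普森采样最大化夏普比”这一思想的示意:对每个臂从Normal-Inverse-Gamma后验采样(mu, sigma^2),再选采样夏普比最大的臂。这只是对摘要思想的示意性复现,并非论文中\texttt{SRTS}的精确形式。

import numpy as np

def sharpe_thompson_step(posteriors, rng=None):
    """每步:对每个臂从 Normal-Inverse-Gamma 后验采样 (mu, sigma^2),
    选取采样夏普比 mu/sigma 最大的臂。posteriors 为各臂的 (mu0, kappa, alpha, beta)。"""
    rng = np.random.default_rng() if rng is None else rng
    best, best_sr = 0, -np.inf
    for arm, (mu0, kappa, alpha, beta) in enumerate(posteriors):
        var = 1.0 / rng.gamma(alpha, 1.0 / beta)      # sigma^2 ~ 逆伽马
        mu = rng.normal(mu0, np.sqrt(var / kappa))    # mu | sigma^2 ~ 正态
        if mu / np.sqrt(var) > best_sr:
            best, best_sr = arm, mu / np.sqrt(var)
    return best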


预测|估计(5篇)

【1】Prediction is not Explanation: Revisiting the Explanatory Capacity of Mapping Embeddings
标题:预测不是解释--再论映射嵌入的解释能力
链接:https://arxiv.org/abs/2508.13729

作者:asimchyk, Alhassan Abdelhalim, Sören Laue, Michaela Regneri
备注:10 pages, 6 Figures. Published at ECAI 2025 in a version without the Appendix
摘要:了解深度学习模型中隐式编码的知识对于提高AI系统的可解释性至关重要。本文考察了解释词嵌入(大型语言模型(LLM)的核心元素)中所编码知识的常用方法。这些方法通常涉及将嵌入映射到人类可解释的语义特征集合上,这些集合被称为特征规范。先前的工作假设,若能从词嵌入中准确预测这些语义特征,则意味着嵌入包含相应的知识。我们通过证明预测准确度本身并不能可靠地表明真正的基于特征的可解释性,对这一假设提出了挑战。我们表明,这些方法甚至可以成功地“预测”随机信息,并由此得出结论:结果主要由算法上界决定,而非词嵌入中有意义的语义表示。因此,仅基于预测性能对数据集进行比较,并不能可靠地指示哪个数据集被词嵌入更好地捕获。我们的分析表明,这类映射主要反映了向量空间内的几何相似性,而非语义属性的真正涌现。
摘要:Understanding what knowledge is implicitly encoded in deep learning models is essential for improving the interpretability of AI systems. This paper examines common methods to explain the knowledge encoded in word embeddings, which are core elements of large language models (LLMs). These methods typically involve mapping embeddings onto collections of human-interpretable semantic features, known as feature norms. Prior work assumes that accurately predicting these semantic features from the word embeddings implies that the embeddings contain the corresponding knowledge. We challenge this assumption by demonstrating that prediction accuracy alone does not reliably indicate genuine feature-based interpretability.   We show that these methods can successfully predict even random information, concluding that the results are predominantly determined by an algorithmic upper bound rather than meaningful semantic representation in the word embeddings. Consequently, comparisons between datasets based solely on prediction performance do not reliably indicate which dataset is better captured by the word embeddings. Our analysis illustrates that such mappings primarily reflect geometric similarity within vector spaces rather than indicating the genuine emergence of semantic properties.
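
摘要中“预测随机信息也能成功”的论证可以用一个简单的对照实验来说明:下面的Python/sklearn示意在真实特征规范与随机打乱的目标上分别训练岭回归探针,若两者分数相近,则说明高分主要来自算法上界而非语义知识。接口为示意性假设。

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def probe_with_random_control(embeddings, feature_norms, seed=0):
    """真实目标 vs. 随机打乱目标的探针对照实验。"""
    rng = np.random.default_rng(seed)
    real = cross_val_score(Ridge(alpha=1.0), embeddings, feature_norms, cv=5).mean()
    shuffled = feature_norms[rng.permutation(len(feature_norms))]  # 打断嵌入-目标对应
    control = cross_val_score(Ridge(alpha=1.0), embeddings, shuffled, cv=5).mean()
    return real, control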


【2】Prediction of Hospital Associated Infections During Continuous Hospital Stays
标题:连续住院期间医院相关感染的预测
链接:https://arxiv.org/abs/2508.13561

作者: Datta, Methun Kamruzzaman, Eili Y. Klein, Gregory R Madden, Xinwei Deng, Anil Vullikanti, Parantapa Bhattacharya
摘要:2019年,美国疾病控制和预防中心(CDC)将耐甲氧西林金黄色葡萄球菌(MRSA)指定为严重的抗菌素耐药性威胁。对于住院患者来说,由于多种因素的独特组合,感染MRSA并因其遭受危及生命的后果的风险仍然特别高,这些因素包括:合并症,免疫抑制,抗生素使用以及与受污染的医院工作人员和设备接触的风险。在本文中,我们提出了一种新型的生成概率模型GenHAI,用于对患者在单次住院期间的MRSA检测结果序列进行建模。该模型可用于从医院管理者的角度回答许多重要问题,以降低MRSA感染的风险。我们的模型基于概率编程范式,并可用于近似回答各种预测、因果和反事实的问题。我们通过使用两个真实世界的数据集将其与判别式和生成式机器学习模型进行比较,证明了我们模型的有效性。
摘要:The US Centers for Disease Control and Prevention (CDC), in 2019, designated Methicillin-resistant Staphylococcus aureus (MRSA) as a serious antimicrobial resistance threat. The risk of acquiring MRSA and suffering life-threatening consequences due to it remains especially high for hospitalized patients due to a unique combination of factors, including: co-morbid conditions, immuno suppression, antibiotic use, and risk of contact with contaminated hospital workers and equipment. In this paper, we present a novel generative probabilistic model, GenHAI, for modeling sequences of MRSA test results outcomes for patients during a single hospitalization. This model can be used to answer many important questions from the perspectives of hospital administrators for mitigating the risk of MRSA infections. Our model is based on the probabilistic programming paradigm, and can be used to approximately answer a variety of predictive, causal, and counterfactual questions. We demonstrate the efficacy of our model by comparing it against discriminative and generative machine learning models using two real-world datasets.


【3】Collapsing ROC approach for risk prediction research on both common and rare variants
标题:用于常见和罕见变异风险预测研究的折叠ROC方法
链接:https://arxiv.org/abs/2508.13552

作者:i Wei, Qing Lu
摘要:利用新兴遗传学发现进行风险预测,对改善公共卫生和临床护理大有希望。然而,最近的风险预测研究表明,基于现有常见遗传基因座(包括来自全基因组关联研究的基因座)构建的预测测试,缺乏足够的准确性用于临床。由于基因组上大多数罕见变异在风险预测中的作用尚未得到研究,未来的疾病预测发现应该转向更全面的风险预测策略,将常见和罕见变异都考虑在内。我们提出了一种折叠受试者工作特征(CROC)方法,用于常见和罕见变异的风险预测研究。新方法是先前开发的前向ROC(FROC)方法的扩展,增加了处理罕见变异的额外程序。我们使用遗传分析研讨会17(Genetic Analysis Workshop 17)迷你外显子组数据集中37个候选基因的533个单核苷酸多态性(SNP)对该方法进行了评估。我们发现,建立在所有SNP上的预测模型(AUC = 0.605)比仅建立在常见变异上的模型(AUC = 0.585)获得了更高的准确性。我们通过逐渐减少分析中常见变异的数量进一步评估了两种方法的性能。我们发现,当数据中常见变异的数量减少时,CROC方法比FROC方法获得更高的准确性。在极端情况下,当数据中仅存在罕见变异时,CROC达到0.603的AUC值,而FROC的AUC值为0.524。
摘要:Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not yet been studied for their role in risk prediction, future disease prediction discoveries should shift toward a more comprehensive risk prediction strategy that takes into account both common and rare variants. We are proposing a collapsing receiver operating characteristic (CROC) approach for risk prediction research on both common and rare variants. The new approach is an extension of a previously developed forward ROC (FROC) approach, with additional procedures for handling rare variants. The approach was evaluated through the use of 533 single-nucleotide polymorphisms (SNPs) in 37 candidate genes from the Genetic Analysis Workshop 17 mini-exome data set. We found that a prediction model built on all SNPs gained more accuracy (AUC = 0.605) than one built on common variants alone (AUC = 0.585). We further evaluated the performance of two approaches by gradually reducing the number of common variants in the analysis. We found that the CROC method attained more accuracy than the FROC method when the number of common variants in the data decreased. In an extreme scenario, when there are only rare variants in the data, the CROC reached an AUC value of 0.603, whereas the FROC had an AUC value of 0.524.


【4】CALYPSO: Forecasting and Analyzing MRSA Infection Patterns with Community and Healthcare Transmission Dynamics
标题:CALYPSO:通过社区和医疗保健传播动态预测和分析耐甲氧西林金黄色葡萄球菌感染模式
链接:https://arxiv.org/abs/2508.13548

作者: Datta, Jiaming Cui, Gregory R. Madden, Anil Vullikanti
摘要:耐甲氧西林金黄色葡萄球菌(MRSA)是医院以及长期护理机构内的一个严重公共卫生威胁。更好地了解MRSA风险,评估干预措施和预测MRSA发病率是重要的公共卫生问题。现有的预测模型依赖于统计或神经网络方法,缺乏流行病学的可解释性,并具有有限的性能。机制流行病模型难以校准,并且在整合不同数据集方面受到限制。我们提出了CALYPSO,一个混合框架,将神经网络与机械集合种群模型相结合,以捕获传染病的传播动态(即,MRSA)在医疗保健和社区环境中的应用。我们的模型利用患者级别的保险索赔、通勤数据和医疗保健转移模式来学习控制MRSA传播的特定区域和时间参数。这使得在多个空间分辨率(县,医疗机构,地区,州)准确,可解释的预测,并支持感染控制政策和爆发风险的反事实分析。我们还表明,与机器学习基线相比,CALYPSO将全州的预测性能提高了4.5%以上,同时还确定了高风险地区和分配感染预防资源的成本效益策略。
摘要:Methicillin-resistant Staphylococcus aureus (MRSA) is a critical public health threat within hospitals as well as long-term care facilities. Better understanding of MRSA risks, evaluation of interventions and forecasting MRSA rates are important public health problems. Existing forecasting models rely on statistical or neural network approaches, which lack epidemiological interpretability, and have limited performance. Mechanistic epidemic models are difficult to calibrate and limited in incorporating diverse datasets. We present CALYPSO, a hybrid framework that integrates neural networks with mechanistic metapopulation models to capture the spread dynamics of infectious diseases (i.e., MRSA) across healthcare and community settings. Our model leverages patient-level insurance claims, commuting data, and healthcare transfer patterns to learn region- and time-specific parameters governing MRSA spread. This enables accurate, interpretable forecasts at multiple spatial resolutions (county, healthcare facility, region, state) and supports counterfactual analyses of infection control policies and outbreak risks. We also show that CALYPSO improves statewide forecasting performance by over 4.5% compared to machine learning baselines, while also identifying high-risk regions and cost-effective strategies for allocating infection prevention resources.


【5】EventTSF: Event-Aware Non-Stationary Time Series Forecasting
标题:EventTSF:事件感知的非平稳时间序列预测
链接:https://arxiv.org/abs/2508.13434

作者:e, Ming Jin, Yiji Zhao, Hongyan Li, Bo Du, Chang Xu, Shirui Pan
备注:13 pages, 10 figures
摘要:时间序列预测在能源和交通等关键领域发挥着至关重要的作用,其中非平稳动态与文本等其他模式中的事件深深交织在一起。然而,结合基于自然语言的外部事件来改善非平稳预测在很大程度上仍未得到探索,因为大多数方法仍然依赖于单一模式,导致上下文知识有限和模型性能不佳。实现时态和文本数据之间的细粒度多模态交互面临三个基本问题:(1)时变离散文本事件和连续时间序列之间的细粒度同步的困难;(2)文本语义引入的固有时态不确定性;(3)文本事件嵌入和多分辨率时态模式之间的不一致。在这项工作中,我们通过引入事件感知的非平稳时间序列预测(EventTSF)来解决这些挑战,EventTSF是一个自回归生成框架,它将历史时间序列与文本事件相结合,以进行后续预测。具体来说,EventTSF在每一步都使用自回归扩散和流匹配来捕获细微的时间事件交互。为了处理事件引起的不确定性,根据事件语义信号自适应地控制流匹配时间步长。底层去噪器采用多模态U形扩散Transformer,有效地融合了不同分辨率的时间和文本模态。在8个合成数据集和真实数据集上进行的大量实验表明,EventTSF在各种事件感知的非平稳时间序列预测场景中优于12个基线,预测准确率提高了10.7%,训练效率提高了1.13倍。
摘要:Time series forecasting plays a vital role in critical domains like energy and transportation, where non-stationary dynamics are deeply intertwined with events in other modalities such as texts. However, incorporating natural language-based external events to improve non-stationary forecasting remains largely unexplored, as most approaches still rely on a single modality, resulting in limited contextual knowledge and model underperformance. Enabling fine-grained multimodal interactions between temporal and textual data is challenged by three fundamental issues: (1) the difficulty of fine-grained synchronization between time-varying discrete textual events and continuous time series; (2) the inherent temporal uncertainty introduced by textual semantics; and (3) the misalignment between textual event embeddings and multi-resolution temporal patterns. In this work, we address these challenges by introducing event-aware non-stationary time series forecasting (EventTSF), an autoregressive generation framework that integrates historical time series with textual events to make subsequent forecasts. Specifically, EventTSF uses autoregressive diffusion with flow matching at each step to capture nuanced temporal-event interactions. To handle event-induced uncertainty, flow matching timesteps are adaptively controlled according to event semantic signals. The underlying denoiser employs a multimodal U-shaped diffusion transformer that efficiently fuses temporal and textual modalities across different resolutions. Extensive experiments on 8 synthetic and real-world datasets show that EventTSF outperforms 12 baselines across diverse event-aware non-stationary time series forecasting scenarios, achieving substantial improvements of 10.7% higher forecasting accuracy and $1.13\times$ faster training efficiency.
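
下面用PyTorch给出通用(条件)流匹配训练目标的极简示意:在直线概率路径上令网络回归目标速度场。EventTSF在此类FM损失之上还引入了自回归结构与事件自适应的时间步控制,此处不作展示;cond等参数均为示意性假设。

import torch

def flow_matching_loss(v_theta, x0, x1, cond):
    """最简条件流匹配:直线路径 x_t = (1-t)*x0 + t*x1 上,
    令 v_theta(x_t, t, cond) 回归恒定目标速度 (x1 - x0)。"""
    t = torch.rand(x0.size(0), 1, device=x0.device)
    x_t = (1 - t) * x0 + t * x1
    return ((v_theta(x_t, t, cond) - (x1 - x0)) ** 2).mean()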


其他神经网络|深度学习|模型|建模(16篇)

【1】Learning from Preferences and Mixed Demonstrations in General Settings
标题:在一般设置中从偏好和混合演示中学习
链接:https://arxiv.org/abs/2508.14027

作者:rown, Carl Henrik Ek, Robert D Mullins
摘要:强化学习是一种在序贯设置中进行学习的通用方法,但当任务复杂时,通常很难指定一个好的奖励函数。在这些情况下,可以改用偏好反馈或专家演示。然而,现有的同时利用两者的方法通常是临时性的,依赖于特定领域的属性,或者无法扩展。我们为从人类数据中学习开发了一个新的框架——“对观测的奖励理性偏序”(reward-rational partial orderings over observations),旨在做到灵活且可扩展。在此基础上,我们提出了一个实用的算法LEOPARD:从偏好和排名演示中学习估计目标(Learning Estimated Objectives from Preferences And Ranked Demonstrations)。LEOPARD可以从包括负面演示在内的广泛数据中学习,从而在众多领域高效地学习奖励函数。我们发现,当可用的偏好和演示反馈数量有限时,LEOPARD以显著优势超越现有基线。此外,我们使用LEOPARD研究了从多种类型的反馈中学习(而非仅从单一类型学习)的效果,并发现组合不同反馈类型通常是有益的。
摘要:Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be used instead. However, existing approaches utilising both together are often ad-hoc, rely on domain-specific properties, or won't scale. We develop a new framing for learning from human data, \emph{reward-rational partial orderings over observations}, designed to be flexible and scalable. Based on this we introduce a practical algorithm, LEOPARD: Learning Estimated Objectives from Preferences And Ranked Demonstrations. LEOPARD can learn from a broad range of data, including negative demonstrations, to efficiently learn reward functions across a wide range of domains. We find that when a limited amount of preference and demonstration feedback is available, LEOPARD outperforms existing baselines by a significant margin. Furthermore, we use LEOPARD to investigate learning from many types of feedback compared to just a single one, and find that combining feedback types is often beneficial.
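
下面用PyTorch给出从成对偏好学习奖励函数的通用Bradley-Terry损失示意;这是偏好学习的标准做法,LEOPARD在其“奖励理性偏序”框架下进一步统一了排名演示与负面演示,此处从略。reward_model等接口为示意性假设。

import torch
import torch.nn.functional as F

def preference_loss(reward_model, traj_preferred, traj_rejected):
    """Bradley-Terry 偏好损失:最大化被偏好轨迹累计奖励高于被拒绝轨迹的概率。"""
    r_pos = reward_model(traj_preferred).sum(dim=-1)   # 轨迹内逐步奖励求和
    r_neg = reward_model(traj_rejected).sum(dim=-1)
    return -F.logsigmoid(r_pos - r_neg).mean()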


【2】GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks
标题:GDNSQ:低比特神经网络的渐进可微噪声尺度量化
链接:https://arxiv.org/abs/2508.14004

作者:lishev, Ian Akhremchik
备注:9 pages, 6 figures, 7 tables
摘要:量化神经网络可以被看作一串有噪声的信道,每一层的舍入会随着位宽的缩小而降低容量;浮点(FP)检查点设定了最大输入速率。我们跟踪平均位宽降低过程中的容量动态,识别由此产生的量化瓶颈,并将微调表述为一个光滑的约束优化问题。我们的方法采用完全可微的直通估计器(STE),具有可学习的位宽、噪声尺度和箝位界限,并通过外点罚函数强制达到目标位宽;温和的度量平滑(通过蒸馏)可稳定训练。尽管方法简单,但在下探至极端的W1A1设置时仍能达到有竞争力的精度,同时保持STE的效率。
摘要:Quantized neural networks can be viewed as a chain of noisy channels, where rounding in each layer reduces capacity as bit-width shrinks; the floating-point (FP) checkpoint sets the maximum input rate. We track capacity dynamics as the average bit-width decreases and identify resulting quantization bottlenecks by casting fine-tuning as a smooth, constrained optimization problem. Our approach employs a fully differentiable Straight-Through Estimator (STE) with learnable bit-width, noise scale and clamp bounds, and enforces a target bit-width via an exterior-point penalty; mild metric smoothing (via distillation) stabilizes training. Despite its simplicity, the method attains competitive accuracy down to the extreme W1A1 setting while retaining the efficiency of STE.
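
下面用PyTorch给出带可学习步长与箝位界限的STE量化器的极简示意;论文中位宽同样可学习并通过外点罚逼向目标位宽,此处为简化将位宽固定。类名与超参数均为示意性假设。

import torch
import torch.nn as nn

class LearnableQuant(nn.Module):
    """带可学习步长与箝位界限的 STE 量化器(位宽此处固定以简化)。"""
    def __init__(self, bits=4):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(0.1))   # 量化步长(“噪声尺度”)
        self.bound = nn.Parameter(torch.tensor(1.0))   # 箝位界限
        self.bits = bits

    def forward(self, x):
        b = self.bound.abs()
        x = torch.clamp(x, -b, b)
        q = torch.round(x / self.scale) * self.scale   # 取整本身不可导
        return x + (q - x).detach()                    # STE:前向用 q,梯度绕过取整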


【3】Formal Algorithms for Model Efficiency
标题:模型效率的形式算法
链接:https://arxiv.org/abs/2508.14000

作者:gi, Srishti Das, Kunal, Vatsal Gupta
备注:17 pages, 0 figures
摘要:我们引入了旋钮-仪表-规则(KMR)框架,这是一种用于表示和推理深度学习中模型效率技术的统一形式化方法。通过将修剪、量化、知识蒸馏和参数高效架构等多种方法抽象为一组一致的可控旋钮、确定性规则和可测量仪表,KMR为效率优化提供了一个数学上精确且模块化的视角。该框架支持多种技术的系统性组合、灵活的策略驱动应用,并通过Budgeted-KMR算法进行迭代的预算化优化。我们演示了知名的效率方法如何被实例化为KMR三元组,并为每种方法给出了简洁的算法模板。该框架突出了方法之间的潜在关系,促进了混合管道的构建,并为自动化策略学习、动态适应以及成本-质量权衡的理论分析等未来研究奠定了基础。总体而言,KMR为统一和推进模型效率研究提供了一个概念性和实用性兼备的工具。
摘要:We introduce the Knob-Meter-Rule (KMR) framework, a unified formalism for representing and reasoning about model efficiency techniques in deep learning. By abstracting diverse methods, including pruning, quantization, knowledge distillation, and parameter-efficient architectures, into a consistent set of controllable knobs, deterministic rules, and measurable meters, KMR provides a mathematically precise and modular perspective on efficiency optimization. The framework enables systematic composition of multiple techniques, flexible policy-driven application, and iterative budgeted optimization through the Budgeted-KMR algorithm. We demonstrate how well-known efficiency methods can be instantiated as KMR triples and present concise algorithmic templates for each. The framework highlights underlying relationships between methods, facilitates hybrid pipelines, and lays the foundation for future research in automated policy learning, dynamic adaptation, and theoretical analysis of cost-quality trade-offs. Overall, KMR offers both a conceptual and practical tool for unifying and advancing model efficiency research.
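
“旋钮-仪表-规则”三元组可以直接编码为数据结构;下面的Python示意给出一种可能的最小实现以及Budgeted-KMR式迭代循环的骨架。接口与停机条件均为示意性假设,并非论文的正式算法模板。

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class KMRTriple:
    """一个“旋钮-仪表-规则”三元组:knob 是可控设置,
    rule 把 knob 作用于模型,meter 度量作用后的某个指标。"""
    knob: Any
    rule: Callable[[Any, Any], Any]    # (model, knob) -> new_model
    meter: Callable[[Any], float]      # model -> 指标(如参数量或延迟)

def budgeted_kmr(model, triples, budget):
    """Budgeted-KMR 式循环骨架:依次应用三元组,直到仪表读数满足预算。"""
    for t in triples:
        if t.meter(model) <= budget:
            break
        model = t.rule(model, t.knob)
    return model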


【4】Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
标题:类别策略:连续控制中的多模态策略学习与探索
链接:https://arxiv.org/abs/2508.13922

作者:ul Islam, Manfred Huber
备注:6 pages, 4 figures; Has been submitted and accepted at IEEE SMC, 2025
摘要:深度强化学习(RL)中的策略,无论是确定性的还是随机的,通常都被参数化为单一的高斯分布,从而将学习到的行为限制为单峰。然而,许多实际决策问题的性质更适合多模态策略,因为它有利于对环境的鲁棒探索,从而应对稀疏奖励、复杂动态或需要针对不同情境进行策略性适应所带来的学习挑战。这一问题在连续控制领域尤为突出,因为探索通常发生在预测的最优动作附近,无论是通过加性高斯噪声还是随机策略的采样过程。在本文中,我们引入类别策略(Categorical Policies),用一个中间的类别分布来建模多模态行为模式,再生成以采样得到的模式为条件的输出动作。我们探索了两种采样方案,在保证离散潜在结构可微的同时,保持高效的基于梯度的优化。通过利用潜在的类别分布来选择行为模式,我们的方法自然地表达了多模态性,同时借助采样技巧保持完全可微。我们在一组DeepMind Control Suite环境中评估了我们的多模态策略,证明通过更好的探索,学到的策略收敛更快且优于标准高斯策略。我们的结果表明,类别分布是连续控制中结构化探索和多模态行为表示的强大工具。
摘要:A policy in deep reinforcement learning (RL), either deterministic or stochastic, is commonly parameterized as a Gaussian distribution alone, limiting the learned behavior to be unimodal. However, the nature of many practical decision-making problems favors a multimodal policy that facilitates robust exploration of the environment and thus to address learning challenges arising from sparse rewards, complex dynamics, or the need for strategic adaptation to varying contexts. This issue is exacerbated in continuous control domains where exploration usually takes place in the vicinity of the predicted optimal action, either through an additive Gaussian noise or the sampling process of a stochastic policy. In this paper, we introduce Categorical Policies to model multimodal behavior modes with an intermediate categorical distribution, and then generate output action that is conditioned on the sampled mode. We explore two sampling schemes that ensure differentiable discrete latent structure while maintaining efficient gradient-based optimization. By utilizing a latent categorical distribution to select the behavior mode, our approach naturally expresses multimodality while remaining fully differentiable via the sampling tricks. We evaluate our multimodal policy on a set of DeepMind Control Suite environments, demonstrating that through better exploration, our learned policies converge faster and outperform standard Gaussian policies. Our results indicate that the Categorical distribution serves as a powerful tool for structured exploration and multimodal behavior representation in continuous control.
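
下面用PyTorch给出摘要所述“中间类别分布 + 以采样模式为条件生成动作”的一种可微实现示意,其中离散采样使用Gumbel-Softmax技巧;网络结构与超参数均为示意性假设。

import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalGaussianPolicy(nn.Module):
    """先用 Gumbel-Softmax 对行为模式做可微的类别采样,
    再输出以该模式为条件的高斯动作均值。"""
    def __init__(self, obs_dim, act_dim, n_modes=4, tau=1.0):
        super().__init__()
        self.mode_logits = nn.Linear(obs_dim, n_modes)
        self.mean_head = nn.Linear(obs_dim + n_modes, act_dim)
        self.tau = tau

    def forward(self, obs):
        logits = self.mode_logits(obs)
        mode = F.gumbel_softmax(logits, tau=self.tau, hard=True)  # 可微离散采样
        return self.mean_head(torch.cat([obs, mode], dim=-1))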


【5】Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management
标题:在嵌入式FPGA上自动部署能量感知时间序列模型,用于弹性合流污水溢流管理
链接:https://arxiv.org/abs/2508.13905

作者:Ling, Vipin Singh, Chao Qian, Felix Biessmann, Gregor Schiele
备注:6 pages, 6 figures, 1 table, accepted by the 11th IEEE International Smart Cities Conference
摘要:气候变化加剧了极端天气事件,日益挑战老化的合流下水道系统,增加了未经处理的废水溢出的风险。准确预测污水溢流池的填充水平可以为早期干预提供可操作的见解,有助于减少不受控制的排放。近年来,基于人工智能的预测方法为传统的基于物理的模型提供了可扩展的替代方案,但它们对云计算的依赖限制了它们在通信中断期间的可靠性。为了解决这个问题,我们提出了一个端到端的预测框架,可以直接在边缘设备上进行节能推理。我们的解决方案集成了轻量级的Transformer和长短期内存(LSTM)模型,通过仅整数量化进行压缩,以实现高效的设备上执行。此外,一个自动化的硬件感知部署管道用于搜索最佳的模型配置,通过联合最大限度地减少预测误差和AMD Spartan-7 XC 7S 15 FPGA上的能耗。通过对现实世界的下水道数据进行评估,所选的8位Transformer模型经过24小时历史测量训练,以每次推理0.370 mJ的能源成本实现了高准确度(MSE 0.0376)。相比之下,最佳的8位LSTM模型需要的能量明显更少(0.009 mJ,低40倍以上),但准确度差14.89%(MSE 0.0432),训练时间长得多。这种权衡强调了将模型选择与部署优先级保持一致的必要性,有利于LSTM实现超低能耗或Transformer实现更高的预测准确性。总的来说,我们的工作可以实现本地的节能预测,有助于提高合流污水系统的弹性。所有代码都可以在GitHub Repository(https://github.com/tianheng-ling/EdgeOverflowForecast)中找到。
摘要 :Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems, raising the risk of untreated wastewater overflow. Accurate forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention, helping mitigating uncontrolled discharge. In recent years, AI-based forecasting methods have offered scalable alternatives to traditional physics-based models, but their reliance on cloud computing limits their reliability during communication outages. To address this, we propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices. Our solution integrates lightweight Transformer and Long Short-Term Memory (LSTM) models, compressed via integer-only quantization for efficient on-device execution. Moreover, an automated hardware-aware deployment pipeline is used to search for optimal model configurations by jointly minimizing prediction error and energy consumption on an AMD Spartan-7 XC7S15 FPGA. Evaluated on real-world sewer data, the selected 8-bit Transformer model, trained on 24 hours of historical measurements, achieves high accuracy (MSE 0.0376) at an energy cost of 0.370 mJ per inference. In contrast, the optimal 8-bit LSTM model requires significantly less energy (0.009 mJ, over 40x lower) but yields 14.89% worse accuracy (MSE 0.0432) and much longer training time. This trade-off highlights the need to align model selection with deployment priorities, favoring LSTM for ultra-low energy consumption or Transformer for higher predictive accuracy. In general, our work enables local, energy-efficient forecasting, contributing to more resilient combined sewer systems. All code can be found in the GitHub Repository (https://github.com/tianheng-ling/EdgeOverflowForecast).


【6】Text2Weight: Bridging Natural Language and Neural Network Weight Spaces
标题:Text2Weight:连接自然语言与神经网络权重空间
链接:https://arxiv.org/abs/2508.13633

作者:n, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue
备注:Accepted By ACM MM 2025 Main Track
摘要:我们离自动生成神经网络还有多远?虽然神经网络权重生成显示出希望,但目前的方法难以推广到看不见的任务和实际应用探索。为了解决这个问题,我们提出了T2W,一个扩散Transformer框架,它可以根据自然语言描述生成特定于任务的权重。T2W将网络参数分层处理为统一的块,通过先验注意力机制整合CLIP的文本嵌入,并采用带有权重空间增强的对抗训练来增强泛化能力。在Cifar100、Caltech256和TinyImageNet上的实验表明,T2W能够为看不见的任务生成高质量的权重,性能优于基于优化的初始化,并支持权重增强和文本引导模型融合等新应用。我们的工作将文本语义与权重空间动力学联系起来,由文本权重对的开源数据集支持,推进了神经网络参数合成中生成模型的实用性。我们的代码可以在Github上找到。
摘要:How far are we really from automatically generating neural networks? While neural network weight generation shows promise, current approaches struggle with generalization to unseen tasks and practical application exploration. To address this, we propose T2W, a diffusion transformer framework that generates task-specific weights conditioned on natural language descriptions. T2W hierarchically processes network parameters into uniform blocks, integrates text embeddings from CLIP via a prior attention mechanism, and employs adversarial training with weight-space augmentation to enhance generalization. Experiments on Cifar100, Caltech256, and TinyImageNet demonstrate T2W's ability to produce high-quality weights for unseen tasks, outperforming optimization-based initialization and enabling novel applications such as weight enhancement and text-guided model fusion. Our work bridges textual semantics with weight-space dynamics, supported by an open-source dataset of text-weight pairs, advancing the practicality of generative models in neural network parameter synthesis. Our code is available on Github.


【7】Compressed Models are NOT Trust-equivalent to Their Large Counterparts
标题:压缩模型与其大型对应模型并非信任等价
链接:https://arxiv.org/abs/2508.13533

作者: Rai, Chirag Kothari, Siddhesh Shelke, Amit Awekar
摘要:大型深度学习模型在部署到资源受限的环境中之前通常会被压缩。我们能像信任原始大模型的预测一样信任压缩模型的预测吗?现有的工作已经敏锐地研究了压缩对准确性和相关性能指标的影响。然而,性能对等并不能保证信任等价。我们提出了一个二维框架的信任等价评估。首先,可解释性对齐衡量模型是否基于相同的输入特征进行预测。我们使用LIME和SHAP测试来测量可解释性对齐。其次,校准相似性衡量模型在其预测概率方面是否表现出可比的可靠性。它通过ECE,MCE,Brier评分和可靠性图表进行评估。我们使用BERT-base作为大型模型及其多个压缩变体进行了实验。我们专注于两个文本分类任务:自然语言推理和释义识别。我们的研究结果揭示了低的可解释性对齐和校准相似性的显着不匹配。即使模型之间的精度几乎相同,也会发生这种情况。这些发现表明,压缩模型与大型模型不具有信任等价性。将压缩模型部署为大型模型的直接替代品需要仔细的评估,而不仅仅是性能对等。
摘要:Large Deep Learning models are often compressed before being deployed in a resource-constrained environment. Can we trust the prediction of compressed models just as we trust the prediction of the original large model? Existing work has keenly studied the effect of compression on accuracy and related performance measures. However, performance parity does not guarantee trust-equivalence. We propose a two-dimensional framework for trust-equivalence evaluation. First, interpretability alignment measures whether the models base their predictions on the same input features. We use LIME and SHAP tests to measure the interpretability alignment. Second, calibration similarity measures whether the models exhibit comparable reliability in their predicted probabilities. It is assessed via ECE, MCE, Brier Score, and reliability diagrams. We conducted experiments using BERT-base as the large model and its multiple compressed variants. We focused on two text classification tasks: natural language inference and paraphrase identification. Our results reveal low interpretability alignment and significant mismatch in calibration similarity. It happens even when the accuracies are nearly identical between models. These findings show that compressed models are not trust-equivalent to their large counterparts. Deploying compressed models as a drop-in replacement for large models requires careful assessment, going beyond performance parity.
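
摘要中用于衡量校准相似性的ECE可按如下方式计算:按置信度分箱后,对每箱的 |准确率 - 平均置信度| 按样本占比加权求和。下面是其标准Python/numpy实现示意。

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE:confidences 为预测置信度,correct 为 0/1 的命中标记。"""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece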


【8】Dynamic Design of Machine Learning Pipelines via Metalearning
标题:通过元学习动态设计机器学习管道
链接:https://arxiv.org/abs/2508.13436

作者:cobaça, André C. P. L. F. de Carvalho
摘要:自动化机器学习(AutoML)通过自动化模型选择、超参数调整和特征工程,使基于机器学习的系统设计变得大众化。然而,与传统搜索和优化策略(诸如随机搜索、粒子群优化和贝叶斯优化)相关联的高计算成本仍然是一个重大挑战。此外,AutoML系统通常会探索很大的搜索空间,这可能导致过拟合。本文介绍了一种为AutoML系统动态设计搜索空间的元学习方法。该方法使用历史元知识来选择搜索空间中有前途的区域,从而加速优化过程。本研究的实验表明,在不显著影响预测性能的前提下,该方法可将随机搜索的运行时间减少89%,并缩减搜索空间(预处理器1.8/13、分类器4.3/16)。此外,该方法在适配Auto-Sklearn时表现出有竞争力的性能,缩减了其搜索空间。本研究还涵盖了关于元特征选择、元模型可解释性以及搜索空间缩减策略中固有权衡的见解。
摘要:Automated machine learning (AutoML) has democratized the design of machine learning based systems, by automating model selection, hyperparameter tuning and feature engineering. However, the high computational cost associated with traditional search and optimization strategies, such as Random Search, Particle Swarm Optimization and Bayesian Optimization, remains a significant challenge. Moreover, AutoML systems typically explore a large search space, which can lead to overfitting. This paper introduces a metalearning method for dynamically designing search spaces for AutoML system. The proposed method uses historical metaknowledge to select promising regions of the search space, accelerating the optimization process. According to experiments conducted for this study, the proposed method can reduce runtime by 89\% in Random Search and search space by (1.8/13 preprocessor and 4.3/16 classifier), without compromising significant predictive performance. Moreover, the proposed method showed competitive performance when adapted to Auto-Sklearn, reducing its search space. Furthermore, this study encompasses insights into meta-feature selection, meta-model explainability, and the trade-offs inherent in search space reduction strategies.


【9】Decentralized Contextual Bandits with Network Adaptivity
标题:具有网络适应性的去中心化上下文强盗
链接:https://arxiv.org/abs/2508.13411

作者:ng, Huiwen Jia
备注:46 Pages, 9 figures
摘要:我们考虑网络上的上下文线性强盗,这是一类顺序决策问题,其中学习同时发生在多个位置,奖励分布既具有结构相似性,又表现出局部差异。经典的上下文强盗假设数据完全集中或学习者完全孤立,而在信息部分共享的网络环境中仍有许多未被探索之处。在本文中,我们通过开发两种网络感知的置信上界(UCB)算法NetLinUCB和Net-SGD-UCB来填补这一空白,它们能够在动态更新的网络权重的指导下实现自适应信息共享。我们的方法将学习分解为全局和局部组件,从而允许智能体受益于共享结构而无需完全同步。与完全集中式设置相比,这两种算法的通信成本更低,因为智能体只共享关于同质特征的计算摘要。我们建立的遗憾界表明,我们的方法将与共享结构相关的学习复杂度从$O(N)$降低到次线性的$O(\sqrt{N})$,其中$N$是网络的规模。这两种算法展现出互补的优势:NetLinUCB在具有细粒度异质性的低噪声环境中表现出色,而Net-SGD-UCB对高维、高方差的上下文具有鲁棒性。我们进一步在模拟定价环境中与标准基准进行比较,证明了我们方法的有效性。
摘要:We consider contextual linear bandits over networks, a class of sequential decision-making problems where learning occurs simultaneously across multiple locations and the reward distributions share structural similarities while also exhibiting local differences. While classical contextual bandits assume either fully centralized data or entirely isolated learners, much remains unexplored in networked environments when information is partially shared. In this paper, we address this gap by developing two network-aware Upper Confidence Bound (UCB) algorithms, NetLinUCB and Net-SGD-UCB, which enable adaptive information sharing guided by dynamically updated network weights. Our approach decompose learning into global and local components and as a result allow agents to benefit from shared structure without full synchronization. Both algorithms incur lighter communication costs compared to a fully centralized setting as agents only share computed summaries regarding the homogeneous features. We establish regret bounds showing that our methods reduce the learning complexity associated with the shared structure from $O(N)$ to sublinear $O(\sqrt{N})$, where $N$ is the size of the network. The two algorithms reveal complementary strengths: NetLinUCB excels in low-noise regimes with fine-grained heterogeneity, while Net-SGD-UCB is robust to high-dimensional, high-variance contexts. We further demonstrate the effectiveness of our methods across simulated pricing environments compared to standard benchmarks.
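
作为背景,下面用Python/numpy给出经典单智能体LinUCB的极简示意;论文的NetLinUCB在此基础上把参数分解为全局/局部两部分,并按动态网络权重在智能体间共享摘要统计量,此处不作展示。

import numpy as np

class LinUCB:
    """经典上下文线性 UCB(单智能体版本,非论文的网络版算法)。"""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # 设计矩阵 X^T X + lam*I
        self.b = np.zeros(dim)       # X^T y
        self.alpha = alpha

    def choose(self, contexts):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        ucb = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x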


【10】Counterfactual Probabilistic Diffusion with Expert Models
标题:专家模型的反事实概率扩散
链接:https://arxiv.org/abs/2508.13355

作者:, Zhi Cao, Mehmed Uludag, Alexander Rodríguez
摘要:预测复杂动力系统中的反事实分布对于公共卫生和医学等领域的科学建模和决策至关重要。然而,现有的方法通常依赖于点估计或纯粹的数据驱动模型,这些模型在数据稀缺的情况下往往会出现问题。我们提出了一个基于时间序列扩散的框架,通过提取高层次的信号作为生成建模的结构化先验,结合了不完美的专家模型的指导。我们的方法,ODE-Diff,桥梁机械和数据驱动的方法,使更可靠和可解释的因果推理。我们在半合成COVID-19模拟、合成药理学动力学和现实世界案例研究中评估了ODE-Diff,证明它在点预测和分布准确性方面始终优于强基线。
摘要:Predicting counterfactual distributions in complex dynamical systems is essential for scientific modeling and decision-making in domains such as public health and medicine. However, existing methods often rely on point estimates or purely data-driven models, which tend to falter under data scarcity. We propose a time series diffusion-based framework that incorporates guidance from imperfect expert models by extracting high-level signals to serve as structured priors for generative modeling. Our method, ODE-Diff, bridges mechanistic and data-driven approaches, enabling more reliable and interpretable causal inference. We evaluate ODE-Diff across semi-synthetic COVID-19 simulations, synthetic pharmacological dynamics, and real-world case studies, demonstrating that it consistently outperforms strong baselines in both point prediction and distributional accuracy.


【11】Strategies for training point distributions in physics-informed neural networks
标题:物理信息神经网络中训练点分布策略
链接:https://arxiv.org/abs/2508.13216

作者:umagain, Toni Schneidereit
摘要:物理信息神经网络通过直接将微分方程的结构和给定条件纳入损失函数来逼近微分方程。这使得诸如不变量之类的条件在建模阶段很容易加入。此外,该方法可以被认为是无网格的,并且在训练阶段之后可用于计算任意网格上的解。因此,物理信息神经网络正在成为数值数学中微分方程求解方法的一个有前途的替代方案。然而,其性能在很大程度上取决于多种因素。在本文中,我们系统地研究和评估了该方法的一个核心组成部分,即训练点分布。我们在两个常微分方程和两个偏微分方程上,测试了五种训练数据生成策略,以及具有一个和两个隐藏层的浅层网络架构。除了常见的分布,我们还引入了基于正弦的训练点,其动机来自Chebyshev节点的构造。我们还通过某些参数组合对结果进行检验,例如用于可复现性的随机与固定种子的权重初始化。结果展示了训练点分布对解精度的影响,并且我们发现有证据表明这种影响与微分方程的特性相关联。
摘要:Physics-informed neural networks approach the approximation of differential equations by directly incorporating their structure and given conditions in a loss function. This enables conditions like, e.g., invariants to be easily added during the modelling phase. In addition, the approach can be considered as mesh free and can be utilised to compute solutions on arbitrary grids after the training phase. Therefore, physics-informed neural networks are emerging as a promising alternative to solving differential equations with methods from numerical mathematics. However, their performance highly depends on a large variety of factors. In this paper, we systematically investigate and evaluate a core component of the approach, namely the training point distribution. We test two ordinary and two partial differential equations with five strategies for training data generation and shallow network architectures, with one and two hidden layers. In addition to common distributions, we introduce sine-based training points, which are motivated by the construction of Chebyshev nodes. The results are challenged by using certain parameter combinations like, e.g., random and fixed-seed weight initialisation for reproducibility. The results show the impact of the training point distributions on the solution accuracy and we find evidence that they are connected to the characteristics of the differential equation.
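
下面用Python/numpy给出Chebyshev节点以及文中“基于正弦的训练点”的一种可能构造(示意):对均匀网格做正弦变形,使点向区间端点聚集,与Chebyshev节点的动机一致。具体构造细节以论文为准。

import numpy as np

def chebyshev_points(a, b, n):
    """区间 [a,b] 上的 Chebyshev 节点:在端点附近更密。"""
    k = np.arange(n)
    x = np.cos((2 * k + 1) * np.pi / (2 * n))   # [-1,1] 上的节点
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def sine_based_points(a, b, n):
    """“基于正弦”训练点的一种可能构造:正弦变形使点向端点聚集。"""
    u = np.linspace(-1.0, 1.0, n)
    x = np.sin(0.5 * np.pi * u)                 # 仍在 [-1,1],端点处更密
    return 0.5 * (a + b) + 0.5 * (b - a) * x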


【12】Machine Learning H-theorem
标题:机器学习H-定理
链接:https://arxiv.org/abs/2508.14003

作者:r
摘要:H定理为热力学第二定律提供了微观基础,因此对建立统计物理学至关重要;但与此同时,H定理一直存在争议,这种争议部分延续至今。为了更好地理解H定理及其与时间箭头的关系,我们研究了在周期性边界条件下随机取向、随机位置的硬圆盘的趋衡过程。我们使用基于DeepSets架构的模型——该架构施加了对粒子标签的置换不变性——训练模型以捕捉H泛函的不可逆性。
摘要:H-theorem provides a microscopic foundation of the Second Law of Thermodynamics and is therefore essential to establishing statistical physics, but at the same time, H-theorem has been subject to controversy that in part persists till this day. To better understand H-theorem and its relation to the arrow of time, we study the equilibration of randomly oriented and positioned hard disks with periodic boundary conditions. Using a model based on the DeepSets architecture, which imposes permutation invariance of the particle labels, we train a model to capture the irreversibility of the H-functional.
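
下面用PyTorch给出DeepSets形式 f(X) = rho(sum_i phi(x_i)) 的极简示意,其中对粒子求和保证了对标签置换的不变性;输入维度与网络规模均为示意性假设,并非论文的具体配置。

import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """f(X) = rho(sum_i phi(x_i)):求和使输出对粒子标签置换不变。
    可用这类网络把一组硬圆盘的 (位置, 朝向) 映射为标量(如 H 泛函的代理)。"""
    def __init__(self, in_dim=4, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):                         # x: (batch, n_particles, in_dim)
        return self.rho(self.phi(x).sum(dim=1))   # 对粒子维求和保证置换不变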


【13】A PC Algorithm for Max-Linear Bayesian Networks
标题:最大线性贝叶斯网络的PC算法
链接:https://arxiv.org/abs/2508.13967

作者:éndola, Benjamin Hollering, Francesco Nowell
备注:24 pages, 7 figures, 1 table
摘要:最大线性贝叶斯网络(MLBN)是一类相对较新的结构方程模型,当所涉及的随机变量具有重尾分布时就会出现。与大多数有向图模型不同,MLBN通常不忠实于d-分离,因此经典的因果发现算法(如PC算法或贪婪等价搜索)不能用于准确地恢复真实的图结构。在本文中,我们开始研究MLBN的基于约束的发现算法,假定存在一个可以检验真实未知图中条件独立性的预言机。我们表明,如果预言机由真图中的$\ast$-分离准则给出,那么尽管存在由$\ast$-分离蕴含的额外CI语句,PC算法仍然保持一致性。我们还提出了一个名为“PCstar”的新因果发现算法,它假设对$C^\ast$-分离的忠实性,并能够定向那些仅凭d-分离或$\ast$-分离无法定向的额外的边。
摘要:Max-linear Bayesian networks (MLBNs) are a relatively recent class of structural equation models which arise when the random variables involved have heavy-tailed distributions. Unlike most directed graphical models, MLBNs are typically not faithful to d-separation and thus classical causal discovery algorithms such as the PC algorithm or greedy equivalence search can not be used to accurately recover the true graph structure. In this paper, we begin the study of constraint-based discovery algorithms for MLBNs given an oracle for testing conditional independence in the true, unknown graph. We show that if the oracle is given by the $\ast$-separation criteria in the true graph, then the PC algorithm remains consistent despite the presence of additional CI statements implied by $\ast$-separation. We also introduce a new causal discovery algorithm named "PCstar" which assumes faithfulness to $C^\ast$-separation and is able to orient additional edges which cannot be oriented with only d- or $\ast$-separation.
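
下面用Python给出PC算法骨架阶段的极简示意:从完全无向图出发,依次增大条件集规模,由条件独立性预言机判定是否删边。论文的结论是,当该预言机由$\ast$-分离给出时,这一流程对MLBN仍保持一致性;ci_oracle接口为示意性假设。

from itertools import combinations

def pc_skeleton(nodes, ci_oracle):
    """PC 骨架阶段:若存在条件集 S 使 ci_oracle(x, y, S) 判定
    x 与 y 在给定 S 下条件独立,则删去边 x-y。"""
    adj = {v: set(nodes) - {v} for v in nodes}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in nodes):
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                for S in combinations(sorted(others), depth):
                    if ci_oracle(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return adj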


【14】Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation
标题:基于流匹配的生成建模,实现高效且可扩展的数据同化
链接:https://arxiv.org/abs/2508.13313

作者:sue, Bohan Chen, So Takao, Bao Wang
摘要:数据同化(DA)是从噪声观测中顺序估计动力系统状态的问题。生成建模的最新进展激发了在高维非线性环境中进行DA的新方法,特别是集合分数滤波器(EnSF)。然而,由于采样缓慢,这些方法带来了显著的计算负担。在本文中,我们引入了一种基于流匹配(FM)的新滤波框架——称为集合流滤波器(EnFF)——以加速采样并支持灵活的概率路径设计。EnFF是一种无需训练的DA方法,它集成了边缘FM向量场(VF)的蒙特卡洛估计器和用于同化观测的局部化引导。与现有的DA生成式建模相比,EnFF具有更快的采样速度和更大的VF设计灵活性。理论上,我们证明EnFF将自举粒子滤波器和集合卡尔曼滤波器等经典滤波方法作为特殊情形包含在内。在高维滤波基准上的实验表明,与以前的方法相比,EnFF改进了成本-精度权衡,并能够利用更大的集合。我们的结果突出了FM作为高维应用中可扩展滤波工具的前景,使大规模集合的使用成为可能。
摘要:Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations. Recent advances in generative modeling have inspired new approaches to DA in high-dimensional nonlinear settings, especially the ensemble score filter (EnSF). However, these come at a significant computational burden due to slow sampling. In this paper, we introduce a new filtering framework based on flow matching (FM) -- called the ensemble flow filter (EnFF) -- to accelerate sampling and enable flexible design of probability paths. EnFF -- a training-free DA approach -- integrates MC estimators for the marginal FM vector field (VF) and a localized guidance to assimilate observations. EnFF has faster sampling and more flexibility in VF design compared to existing generative modeling for DA. Theoretically, we show that EnFF encompasses classical filtering methods such as the bootstrap particle filter and the ensemble Kalman filter as special cases. Experiments on high-dimensional filtering benchmarks demonstrate improved cost-accuracy tradeoffs and the ability to leverage larger ensembles than prior methods. Our results highlight the promise of FM as a scalable tool for filtering in high-dimensional applications that enable the use of large ensembles.


【15】Modeling GRNs with a Probabilistic Categorical Framework
标题:用概率范畴框架建模GRN
链接:https://arxiv.org/abs/2508.13208

作者:a, Zheng Wei, Zheng Yang, Guohong Peng
备注:21 pages, 5 figures
摘要:理解基因调控网络(GRN)的复杂性和随机性仍然是系统生物学的核心挑战。现有的建模范式往往难以有效地捕捉复杂的多因素调控逻辑,也难以严格处理网络结构和动力学参数的双重不确定性。作为回应,这项工作提出了概率范畴GRN(PC-GRN)框架。它是一种新的理论方法,建立在三种核心方法的协同整合之上。首先,范畴理论为调控通路的模块化和组合提供了形式化语言。其次,贝叶斯类型Petri网(BTPN)作为建模随机细胞过程的可解释的机制性基底,其动力学参数本身表示为概率分布。PC-GRN的核心创新是其端到端的生成式贝叶斯推理引擎,该引擎直接从数据中学习BTPN模型上的完整后验分布$P(G,\Theta\mid D)$。这是通过GFlowNet和HyperNetwork的新颖相互作用来实现的:GFlowNet学习一种采样网络拓扑的策略,HyperNetwork则执行摊销推理以预测其相应的参数分布。由此产生的框架为GRN提供了一个数学上严格、生物学上可解释且具备不确定性意识的表示,推进了预测建模和系统级分析。
摘要:Understanding the complex and stochastic nature of Gene Regulatory Networks (GRNs) remains a central challenge in systems biology. Existing modeling paradigms often struggle to effectively capture the intricate, multi-factor regulatory logic and to rigorously manage the dual uncertainties of network structure and kinetic parameters. In response, this work introduces the Probabilistic Categorical GRN (PC-GRN) framework. It is a novel theoretical approach founded on the synergistic integration of three core methodologies. Firstly, category theory provides a formal language for the modularity and composition of regulatory pathways. Secondly, Bayesian Typed Petri Nets (BTPNs) serve as an interpretable, mechanistic substrate for modeling stochastic cellular processes, with kinetic parameters themselves represented as probability distributions. The central innovation of PC-GRN is its end-to-end generative Bayesian inference engine, which learns a full posterior distribution over BTPN models ($P(G, \Theta \mid D)$) directly from data. This is achieved by the novel interplay of a GFlowNet, which learns a policy to sample network topologies, and a HyperNetwork, which performs amortized inference to predict their corresponding parameter distributions. The resulting framework provides a mathematically rigorous, biologically interpretable, and uncertainty-aware representation of GRNs, advancing predictive modeling and systems-level analysis.


【16】Preference Models assume Proportional Hazards of Utilities
标题:偏好模型假设效用的比例风险
链接:https://arxiv.org/abs/2508.13189

作者:gpal
摘要:从人类注释数据中估计偏好的方法通常涉及在诸如Plackett-Luce模型的排序选择列表上诱导分布。事实上,现代人工智能对齐工具,如奖励建模和直接偏好优化,都是基于Plackett-Luce模型提出的统计假设。在本文中,我将Plackett-Luce模型连接到另一个经典的和众所周知的统计模型,Cox比例风险模型,并试图阐明其中的连接的含义。
摘要:Approaches for estimating preferences from human annotated data typically involves inducing a distribution over a ranked list of choices such as the Plackett-Luce model. Indeed, modern AI alignment tools such as Reward Modelling and Direct Preference Optimization are based on the statistical assumptions posed by the Plackett-Luce model. In this paper, I will connect the Plackett-Luce model to another classical and well known statistical model, the Cox Proportional Hazards model and attempt to shed some light on the implications of the connection therein.
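
摘要未给出公式;下面列出两模型的标准形式以说明这一联系:把效用$u_i$视为对数风险$x_i^{\top}\beta$后,Plackett-Luce对一个排序的似然与Cox模型(无结点时)的部分似然逐项同形——排序中每一步“从剩余者中选最优”恰对应事件依次发生于风险集$R(t_k)$。

\[
P(y_1 \succ \cdots \succ y_K)
  = \prod_{k=1}^{K} \frac{e^{u_{y_k}}}{\sum_{j=k}^{K} e^{u_{y_j}}}
\qquad \text{(Plackett--Luce)}
\]
\[
L(\beta)
  = \prod_{k=1}^{K} \frac{e^{x_{(k)}^{\top}\beta}}{\sum_{j \in R(t_k)} e^{x_j^{\top}\beta}}
\qquad \text{(Cox部分似然)}
\]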


其他(36篇)

【1】Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
标题:智能体微调导致的意外失准:风险与缓解措施
链接:https://arxiv.org/abs/2508.14031

作者:Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
备注:Source code: this https URL
摘要:除了简单的文本生成,大型语言模型(LLM)已经发展成为能够规划并与外部工具交互以解决复杂任务的智能体系统。这一演变涉及对特定于智能体的任务进行微调,以提高其熟练程度。然而,在这种微调过程中,安全问题经常被忽视。在这项工作中,我们表明已对齐的LLM可能会无意中失准:当被微调以执行智能体任务时,其执行有害任务的可能性更高,而拒绝它们的倾向降低。为了应对这些安全挑战,我们提出了前缀注入防护(PING),这是一种简单而有效的方法,可将自动生成的自然语言前缀添加到智能体响应之前,引导其拒绝有害请求,同时保留良性任务上的性能。具体来说,我们引入了一种迭代方法,该方法在(1)生成候选前缀和(2)选择那些同时优化任务性能和拒绝行为的前缀之间交替进行。实验结果表明,PING显著提高了微调后LLM智能体的安全性,而不会牺牲其有效性。在Web导航和代码生成任务的各种基准测试中,PING始终优于现有的提示方法。我们通过线性探针对内部隐藏状态的分析表明,前缀词元对于行为修改至关重要,这解释了性能增益的来源。警告:本文包含不道德或冒犯性的内容。
摘要:Beyond simple text generation, Large Language Models (LLMs) have evolved into agentic systems capable of planning and interacting with external tools to solve complex tasks. This evolution involves fine-tuning LLMs on agent-specific tasks to enhance their proficiency. However, safety concerns are frequently overlooked during this fine-tuning process. In this work, we show that aligned LLMs can become unintentionally misaligned, leading to a higher likelihood of executing harmful tasks and a reduced tendency to refuse them when fine-tuned to execute agentic tasks. To address these safety challenges, we propose Prefix INjection Guard (PING), a simple yet effective method that prepends automatically generated natural language prefixes to agent responses, guiding them to refuse harmful requests while preserving performance on benign tasks. Specifically, we introduce an iterative approach that alternates between (1) generating candidate prefixes and (2) selecting those that optimize both task performance and refusal behavior. Experimental results demonstrate that PING significantly enhances the safety of fine-tuned LLM agents without sacrificing their effectiveness. PING consistently outperforms existing prompting approaches across diverse benchmarks in both web navigation and code generation tasks. Our analysis of internal hidden states via linear probes reveals that prefix tokens are crucial for behavior modification, explaining the performance gains. WARNING: This paper contains contents that are unethical or offensive in nature.


【2】BLIPs: Bayesian Learned Interatomic Potentials
标题:BLIPs:贝叶斯学习的原子间势
链接:https://arxiv.org/abs/2508.14022

作者:cia, Pim de Haan, Max Welling
摘要:机器学习原子间势(MLIP)正在成为基于模拟的化学中的核心工具。然而,与大多数深度学习模型一样,MLIP难以对分布外数据做出准确预测,或在数据稀缺的情况下进行训练,而这两种情形在基于模拟的化学中都很常见。此外,MLIP在构造上并不提供不确定性估计,而不确定性估计对于指导主动学习流程以及确保模拟结果相对于量子计算的准确性至关重要。为了解决这一缺陷,我们提出了BLIPs:贝叶斯学习原子间势。BLIP是一个可扩展的、与架构无关的变分贝叶斯框架,用于训练或微调MLIP,建立在变分丢弃(Variational Dropout)的自适应版本之上。BLIP在推理时为能量和力的预测提供了良好校准的不确定性估计和极小的计算开销,同时与(等变)消息传递架构无缝集成。在基于模拟的计算化学任务上的实证结果表明,相对于标准MLIP,其预测精度有所提升,不确定性估计也值得信赖,尤其是在数据稀缺或严重分布外的情形下。此外,使用BLIP微调预训练的MLIP可以带来一致的性能增益和校准良好的不确定性。
摘要:Machine Learning Interatomic Potentials (MLIPs) are becoming a central tool in simulation-based chemistry. However, like most deep learning models, MLIPs struggle to make accurate predictions on out-of-distribution data or when trained in a data-scarce regime, both common scenarios in simulation-based chemistry. Moreover, MLIPs do not provide uncertainty estimates by construction, which are fundamental to guide active learning pipelines and to ensure the accuracy of simulation results compared to quantum calculations. To address this shortcoming, we propose BLIPs: Bayesian Learned Interatomic Potentials. BLIP is a scalable, architecture-agnostic variational Bayesian framework for training or fine-tuning MLIPs, built on an adaptive version of Variational Dropout. BLIP delivers well-calibrated uncertainty estimates and minimal computational overhead for energy and forces prediction at inference time, while integrating seamlessly with (equivariant) message-passing architectures. Empirical results on simulation-based computational chemistry tasks demonstrate improved predictive accuracy with respect to standard MLIPs, and trustworthy uncertainty estimates, especially in data-scarse or heavy out-of-distribution regimes. Moreover, fine-tuning pretrained MLIPs with BLIP yields consistent performance gains and calibrated uncertainties.


【3】Typed Topological Structures Of Datasets
标题:数据集的类型化拓扑结构
链接:https://arxiv.org/abs/2508.14008

作者
备注:14 pages 5 figures
摘要:$R^2$上的数据集$X$是一个有限拓扑空间。目前对数据集的研究主要集中在统计方法和代数拓扑方法上。在\cite{hu}中,引入了类型化拓扑空间的概念,并表明其具有研究有限拓扑空间(如数据集)的潜力。从一般拓扑学的角度来看,这是一种新方法。类型化拓扑空间是其开集被赋予类型的拓扑空间。拓扑概念和方法可以使用特定类型的开集重新定义。在本文中,我们在数据集$X$上开发了一组特殊的类型及其相关的类型化拓扑。利用它,我们可以研究$X$的内部结构。特别是,$R^2$有一个自然的商空间,其中$X$被组织成若干轨道,每条轨道又被分割成若干组件。这些组件是有序的,并且可以用整数序列表示。跨轨道的组件形成分支,这种关系可以用一种伪树(称为类型II伪树)来很好地表示。这样的结构为计算凸包、洞、聚类和异常检测等问题的新算法提供了平台。
摘要:A datatset $X$ on $R^2$ is a finite topological space. Current research of a dataset focuses on statistical methods and the algebraic topological method \cite{carlsson}. In \cite{hu}, the concept of typed topological space was introduced and showed to have the potential for studying finite topological spaces, such as a dataset. It is a new method from the general topology perspective. A typed topological space is a topological space whose open sets are assigned types. Topological concepts and methods can be redefined using open sets of certain types. In this article, we develop a special set of types and its related typed topology on a dataset $X$. Using it, we can investigate the inner structure of $X$. In particular, $R^2$ has a natural quotient space, in which $X$ is organized into tracks, and each track is split into components. Those components are in a order. Further, they can be represented by an integer sequence. Components crossing tracks form branches, and the relationship can be well represented by a type of pseudotree (called typed-II pseudotree). Such structures provide a platform for new algorithms for problems such as calculating convex hull, holes, clustering and anomaly detection.


【4】How Usable is Automated Feature Engineering for Tabular Data?
标题:表格数据的自动化特征工程可用性如何?
链接:https://arxiv.org/abs/2508.13932

作者:chäfer, Lennart Purucker, Maciej Janowski, Frank Hutter
备注:Accepted as a short paper at the non-archival content track of AutoML 2025
摘要:由行和列组成的表格数据在各种机器学习应用中无处不在。每一列代表一个特征,特征可以被组合或变换以创建新的、信息量更大的特征。这种特征工程对于在机器学习中实现峰值性能至关重要。由于手工特征工程既昂贵又耗时,人们投入了大量精力来实现其自动化。然而,现有的自动化特征工程(AutoFE)方法从未就其对从业者的可用性进行过考察。因此,我们调查了53种AutoFE方法。我们发现,这些方法通常难以使用、缺乏文档,并且没有活跃的社区。此外,没有一种方法允许用户设置时间和内存限制,而我们认为这是可用的自动化的必要条件。我们的调查强调了未来在可用的、精心设计的AutoFE方法上开展工作的必要性。
摘要 :Tabular data, consisting of rows and columns, is omnipresent across various machine learning applications. Each column represents a feature, and features can be combined or transformed to create new, more informative features. Such feature engineering is essential to achieve peak performance in machine learning. Since manual feature engineering is expensive and time-consuming, a substantial effort has been put into automating it. Yet, existing automated feature engineering (AutoFE) methods have never been investigated regarding their usability for practitioners. Thus, we investigated 53 AutoFE methods. We found that these methods are, in general, hard to use, lack documentation, and have no active communities. Furthermore, no method allows users to set time and memory constraints, which we see as a necessity for usable automation. Our survey highlights the need for future work on usable, well-engineered AutoFE methods.


【5】Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
标题:大批量自然梯度下降的Fisher正交投影法
链接:https://arxiv.org/abs/2508.13898

作者:, Wesley Armour
摘要:现代GPU配备了大量的高带宽内存,使其能够支持多达数万个训练样本的小批量。然而,大多数现有的优化器都很难在如此大的批量下有效工作。随着批量增大,由于对大量样本取平均,梯度噪声降低,限制了一阶方法逃离尖锐或次优极小值并到达全局极小值的能力。同时,二阶方法(如采用克罗内克因子近似曲率(KFAC)的自然梯度)在大批量下往往需要过高的阻尼才能保持稳定。这种高阻尼实际上抹去了赋予这些方法优势的曲率信息,使其性能退化为简单的梯度下降。在本文中,我们介绍了Fisher正交投影(FOP),这是一种在非常大的批量下恢复二阶方法有效性的新技术,从而实现可扩展的训练、更好的泛化和更快的收敛。FOP利用来自两个子批次的梯度来构造方差感知的更新方向,用梯度差中在Fisher度量下与平均梯度正交的分量来增强平均梯度。
摘要:Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural gradient with Kronecker-Factored Approximate Curvature (KFAC) often require excessively high damping to remain stable at large batch sizes. This high damping effectively washes out the curvature information that gives these methods their advantage, reducing their performance to that of simple gradient descent. In this paper, we introduce Fisher-Orthogonal Projection (FOP), a novel technique that restores the effectiveness of the second-order method at very large batch sizes, enabling scalable training with improved generalization and faster convergence. FOP constructs a variance-aware update direction by leveraging gradients from two sub-batches, enhancing the average gradient with a component of the gradient difference that is orthogonal to the average under the Fisher-metric.
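
下面用PyTorch给出FOP更新方向的一个欧氏简化示意:论文中的正交性定义在Fisher度量下,此处以普通内积近似,仅用于说明“用梯度差的正交分量增强平均梯度”的思路;lam等参数为示意性假设。

import torch

def fop_direction(g1, g2, lam=0.5, eps=1e-12):
    """欧氏简化:平均梯度 + lam * (梯度差中与平均梯度正交的分量)。
    g1、g2 为两个子批次梯度的展平向量。"""
    g_avg = 0.5 * (g1 + g2)
    diff = g1 - g2
    proj = (diff @ g_avg) / (g_avg @ g_avg + eps) * g_avg   # diff 在 g_avg 方向的投影
    return g_avg + lam * (diff - proj)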


【6】A Comprehensive Re-Evaluation of Biometric Modality Properties in the Modern Era
标题:现代生物特征模态特性的全面重新评估
链接:https://arxiv.org/abs/2508.13874

作者:Al-Refai, Pankaja Priya Ramasamy, Ragini Ramesh, Patricia Arias-Cabarcos, Philipp Terhörst
摘要:认证系统的快速发展及其为获得更快、更准确的用户验证体验而对生物特征技术日益增长的依赖,凸显了建立一个可靠框架来评估生物特征模态对特定应用适用性的迫切需求。目前,最广为人知的评估框架是1998年的一个比较表,它已无法充分反映生物特征系统的最新技术发展或新出现的脆弱性。为了应对这些挑战,这项工作通过一项涉及24名生物特征专家的专家调查,重新评估了各种生物特征模态。调查结果表明,各模态的特性评级发生了重大变化。例如,人脸识别由于技术进步而获得了更高的评级,而指纹由于新出现的漏洞和攻击而显示出更低的可靠性。对各评级特性上专家一致性水平的进一步分析突出了所给评估的一致性,并确保了评级的可靠性。最后,将专家评估与55个生物特征数据集的数据集级不确定性进行了比较,揭示了大多数模态上的高度一致,并强调了将经验证据与专家见解相结合的重要性。此外,所识别出的专家分歧揭示了关键的开放性挑战,并有助于指导未来研究解决这些问题。
摘要:The rapid advancement of authentication systems and their increasing reliance on biometrics for faster and more accurate user verification experience, highlight the critical need for a reliable framework to evaluate the suitability of biometric modalities for specific applications. Currently, the most widely known evaluation framework is a comparative table from 1998, which no longer adequately captures recent technological developments or emerging vulnerabilities in biometric systems. To address these challenges, this work revisits the evaluation of biometric modalities through an expert survey involving 24 biometric specialists. The findings indicate substantial shifts in property ratings across modalities. For example, face recognition, shows improved ratings due to technological progress, while fingerprint, shows decreased reliability because of emerging vulnerabilities and attacks. Further analysis of expert agreement levels across rated properties highlighted the consistency of the provided evaluations and ensured the reliability of the ratings. Finally, expert assessments are compared with dataset-level uncertainty across 55 biometric datasets, revealing strong alignment in most modalities and underscoring the importance of integrating empirical evidence with expert insight. Moreover, the identified expert disagreements reveal key open challenges and help guide future research toward resolving them.


【7】Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression
标题:用于公平不平衡回归的解纠缠深度平滑Bootstrap
链接:https://arxiv.org/abs/2508.13829

作者:ocksieker, Denys pommeret, Arthur Charpentier
摘要:不平衡分布学习是预测建模中常见的重大挑战,通常会降低标准算法的性能。虽然有各种方法可以解决这个问题,但大多数方法都是针对分类问题而设计的,对回归的关注有限。本文介绍了一种新方法,以改善不平衡回归(IR)框架内表格数据的学习,这是一个关键问题。我们建议使用变分自编码器(VAE)来建模并定义数据分布的潜在表示。然而,与其他标准方法一样,VAE在不平衡数据上可能效率低下。为了解决这个问题,我们开发了一种创新的数据生成方法,将解纠缠VAE与应用于潜在空间的平滑Bootstrap相结合。我们通过在IR基准数据集上与竞争方法进行数值比较来评估该方法的效率。
摘要:Imbalanced distribution learning is a common and significant challenge in predictive modeling, often reducing the performance of standard algorithms. Although various approaches address this issue, most are tailored to classification problems, with a limited focus on regression. This paper introduces a novel method to improve learning on tabular data within the Imbalanced Regression (IR) framework, which is a critical problem. We propose using Variational Autoencoders (VAEs) to model and define a latent representation of data distributions. However, VAEs can be inefficient with imbalanced data like other standard approaches. To address this, we develop an innovative data generation method that combines a disentangled VAE with a Smoothed Bootstrap applied in the latent space. We evaluate the efficiency of this method through numerical comparisons with competitors on benchmark datasets for IR.
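
下面用Python/numpy给出“潜在空间中的平滑Bootstrap”的极简示意:对编码后的潜在向量有放回重采样并加小带宽高斯噪声,再经解码器生成新样本;带宽等参数为示意性假设,解纠缠VAE本身不在此展示。

import numpy as np

def smoothed_bootstrap_latent(z, n_samples, bandwidth=0.1, rng=None):
    """潜在空间平滑 Bootstrap:对潜在向量 z (n, d) 有放回重采样并加高斯噪声;
    将结果送入解码器即可为稀少目标区域生成新样本。"""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(z), size=n_samples)
    return z[idx] + rng.normal(0.0, bandwidth, size=(n_samples, z.shape[1]))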


【8】Assessing Trustworthiness of AI Training Dataset using Subjective Logic -- A Use Case on Bias
标题:使用主观逻辑评估人工智能训练数据集的可信度--关于偏差的用例
链接:https://arxiv.org/abs/2508.13813

作者:ael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Frank Kargl
备注:Accepted at ECML PKDD Bias Workshop '25
摘要:随着人工智能系统越来越依赖于训练数据,评估数据集的可信度变得至关重要,特别是对于公平性或偏差等在数据集层面才涌现的属性。以前的工作使用主观逻辑来评估单个数据的可信度,但没有评估仅在数据集整体层面出现的可信度属性。本文介绍了首个用于评估人工智能训练数据集可信度的形式化框架,可对偏差等全局属性进行不确定性感知的评估。我们的方法建立在主观逻辑之上,支持信任命题,并能在证据不完整、分布式和/或相互冲突的情形下量化不确定性。我们在偏差这一可信度属性上实例化该框架,并基于一个交通标志识别数据集对其进行了实验评估。结果表明,我们的方法能够捕捉类别不平衡,并在集中式与联邦式两种情境下均保持可解释性和鲁棒性。
摘要:As AI systems increasingly rely on training data, assessing dataset trustworthiness has become critical, particularly for properties like fairness or bias that emerge at the dataset level. Prior work has used Subjective Logic to assess trustworthiness of individual data, but not to evaluate trustworthiness properties that emerge only at the level of the dataset as a whole. This paper introduces the first formal framework for assessing the trustworthiness of AI training datasets, enabling uncertainty-aware evaluations of global properties such as bias. Built on Subjective Logic, our approach supports trust propositions and quantifies uncertainty in scenarios where evidence is incomplete, distributed, and/or conflicting. We instantiate this framework on the trustworthiness property of bias, and we experimentally evaluate it based on a traffic sign recognition dataset. The results demonstrate that our method captures class imbalance and remains interpretable and robust in both centralized and federated contexts.
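
【代码示意】作为背景,主观逻辑用(信念b、怀疑d、不确定性u、基率a)表示信任观点,可由正负证据计数直接构造。下面是标准二项观点的最小示意(仅展示基础表示,并非论文中数据集级属性的评估方法):

def binomial_opinion(r, s, a=0.5, W=2.0):
    """由正证据r、负证据s构造主观逻辑二项观点 (b, d, u) 与期望概率。
    W为非信息先验权重(常取2), a为基率。"""
    total = r + s + W
    b = r / total          # belief: 支持命题的信念
    d = s / total          # disbelief: 反对命题的信念
    u = W / total          # uncertainty: 证据越多, 不确定性越低
    ep = b + a * u         # 期望概率 E = b + a*u
    return b, d, u, ep

# 例: 8条支持"数据集无偏"的证据, 2条反对
print(binomial_opinion(8, 2))  # 约 (0.667, 0.167, 0.167, 0.75)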


【9】DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction
标题:DREAMS:在降维中同时保留局部与全局结构
链接:https://arxiv.org/abs/2508.13747

作者:, Dmitry Kobak, Sebastian Damrich
摘要:降维技术被广泛用于在二维中可视化高维数据。现有方法通常被设计为保留数据的局部结构(如$t$-SNE、UMAP)或全局结构(如MDS、PCA),但没有一种成熟方法能同时很好地表示这两个方面。在本文中,我们提出了DREAMS(Dimensionality Reduction Enhanced Across Multiple Scales),该方法通过一个简单的正则化项,将$t$-SNE的局部结构保留与PCA的全局结构保留结合起来。我们的方法在局部结构良好的$t$-SNE嵌入与全局结构良好的PCA嵌入之间生成一系列嵌入,高效地平衡局部与全局结构的保留。我们在七个真实数据集(其中五个来自单细胞转录组学,一个来自群体遗传学)上对DREAMS进行了基准测试,从定性与定量两方面展示了它相比既有方法在多尺度上保留结构的卓越能力。
摘要:Dimensionality reduction techniques are widely used for visualizing high-dimensional data in two dimensions. Existing methods are typically designed to preserve either local (e.g. $t$-SNE, UMAP) or global (e.g. MDS, PCA) structure of the data, but none of the established methods can represent both aspects well. In this paper, we present DREAMS (Dimensionality Reduction Enhanced Across Multiple Scales), a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. Our approach generates a spectrum of embeddings between the locally well-structured $t$-SNE embedding and the globally well-structured PCA embedding, efficiently balancing both local and global structure preservation. We benchmark DREAMS across seven real-world datasets, including five from single-cell transcriptomics and one from population genetics, showcasing qualitatively and quantitatively its superior ability to preserve structure across multiple scales compared to previous approaches.
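
【代码示意】DREAMS的组合思路是"局部t-SNE式损失 + 全局PCA式正则"。以下PyTorch片段用简化的亲和度与"嵌入贴近PCA得分"的正则项演示这种组合(目标函数细节为演示假设,并非论文的精确形式):

import torch

def dreams_style_loss(Y, P, Y_pca, lam=0.1):
    """示意: 局部KL损失(t-SNE式) + 全局正则(嵌入逼近PCA得分)。"""
    n = Y.shape[0]
    D2 = torch.cdist(Y, Y) ** 2
    Q = (1.0 / (1.0 + D2)) * (1 - torch.eye(n))   # student-t核, 置零对角
    Q = Q / Q.sum()
    kl = (P * (torch.log(P + 1e-12) - torch.log(Q + 1e-12))).sum()
    global_reg = ((Y - Y_pca) ** 2).mean()        # 简化的全局项(假设)
    return kl + lam * global_reg

n = 50
X = torch.randn(n, 10)
P = torch.softmax(-torch.cdist(X, X) ** 2, dim=1)  # 粗糙的高维亲和度(未做困惑度标定)
P = (P + P.T) / (2 * n)
P = P * (1 - torch.eye(n)); P = P / P.sum()
U, S, V = torch.pca_lowrank(X, q=2)
Y_pca = U * S                                      # 前两个主成分得分
Y = torch.randn(n, 2, requires_grad=True)
dreams_style_loss(Y, P, Y_pca).backward()          # 可接入任意优化器迭代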


【10】Minimizing the Weighted Number of Tardy Jobs: Data-Driven Heuristic for Single-Machine Scheduling
标题:最小化误工作业的加权数量:数据驱动的单机调度启发式
链接:https://arxiv.org/abs/2508.13703

作者:ntonov, Přemysl Šůcha, Mikoláš Janota, Jan Hůla
备注:Manuscript submitted for review to Computers & Operations Research
摘要:现有的单机调度研究主要集中在精确算法上,这些算法在典型实例上表现良好,但在问题空间的某些区域会显著恶化。相比之下,数据驱动方法在针对特定数据集的结构进行定制时,能够提供强大且可扩展的性能。基于这一思想,我们研究如下单机调度问题:每个作业由其权重、工期、交货期和截止期定义,目标是最小化误工作业的总权重。我们引入了一种新的数据驱动调度启发式方法,将机器学习与问题特定性质相结合,并保证解的可行性,而这正是基于ML的算法的常见难点。实验结果表明,我们的方法在最优性差距、最优解数量以及不同数据场景下的适应性方面显著优于最先进方法,凸显了其在实际应用中的灵活性。此外,我们对ML模型进行了系统探索,通过提供详细的模型选择过程并解释所选模型为何最合适,填补了同类研究中的常见空白。
摘要:Existing research on single-machine scheduling is largely focused on exact algorithms, which perform well on typical instances but can significantly deteriorate on certain regions of the problem space. In contrast, data-driven approaches provide strong and scalable performance when tailored to the structure of specific datasets. Leveraging this idea, we focus on a single-machine scheduling problem where each job is defined by its weight, duration, due date, and deadline, aiming to minimize the total weight of tardy jobs. We introduce a novel data-driven scheduling heuristic that combines machine learning with problem-specific characteristics, ensuring feasible solutions, which is a common challenge for ML-based algorithms. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art in terms of optimality gap, number of optimal solutions, and adaptability across varied data scenarios, highlighting its flexibility for practical applications. In addition, we conduct a systematic exploration of ML models, addressing a common gap in similar studies by offering a detailed model selection process and providing insights into why the chosen model is the best fit.
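
【代码示意】该问题的目标函数可直接写成代码:给定每个作业的(权重, 工期, 交货期, 截止期)与一个加工顺序,累加误工作业的权重,并校验硬截止期可行性(这正是文中强调的可行性要求;示例数据为假设):

def weighted_tardy(jobs, order):
    """jobs: [(权重, 工期, 交货期, 截止期), ...]; order: 加工顺序(下标列表)。
    返回误工作业总权重; 若违反硬截止期则返回None表示不可行。"""
    t, total = 0, 0
    for j in order:
        w, p, d, dbar = jobs[j]
        t += p                      # 该作业的完工时间
        if t > dbar:
            return None             # 不可行: 超过硬截止期
        if t > d:
            total += w              # 超过交货期: 计入误工权重
    return total

jobs = [(3, 2, 4, 8), (1, 3, 3, 9), (5, 1, 5, 6)]
print(weighted_tardy(jobs, [2, 0, 1]))  # 按顺序2,0,1加工 -> 1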


【11】Know Me by My Pulse: Toward Practical Continuous Authentication on Wearable Devices via Wrist-Worn PPG
标题:凭我的脉搏了解我:通过腕戴式PPG实现可穿戴设备上的实用连续认证
链接:https://arxiv.org/abs/2508.13690

作者: Zequan Liang, Ruoyu Zhang, Ruijie Fang, Ning Miao, Ehsan Kourkchi, Setareh Rafatirad, Houman Homayoun, Chongzhou Fang
备注:To be published in Network and Distributed System Security (NDSS) Symposium 2026
摘要:使用生理信号的生物特征认证为可穿戴设备上安全且用户友好的访问控制提供了一条有前景的途径。虽然心电图(ECG)信号已显示出高可辨别性,但其侵入式的感测要求和非连续的采集限制了实用性。另一方面,光电容积描记(PPG)能够实现连续、非侵入式的认证,并无缝集成到腕戴式可穿戴设备中。然而,大多数现有工作依赖高频PPG(例如75 - 500 Hz)和复杂的深度模型,会产生大量能耗与计算开销,阻碍了在功率受限的现实系统中的部署。在本文中,我们在智能手表We-Be Band上,使用低频(25 Hz)多通道PPG信号,给出了连续认证系统的首个真实世界实现与评估。我们的方法采用带注意力机制的Bi-LSTM,从短时(4秒)的4通道PPG窗口中提取身份特定特征。通过在公共数据集(PTTPPG)和我们的We-Be数据集(26名受试者)上的广泛评估,我们展示了强大的分类性能:平均测试准确率为88.11%,宏F1分数为0.88,错误接受率(FAR)为0.48%,错误拒绝率(FRR)为11.77%,等错误率(EER)为2.76%。我们的25 Hz系统与512 Hz相比降低了53%的传感器功耗,与128 Hz相比降低了19%,且不影响性能。我们发现,在25 Hz采样下认证精度得以保持,而在20 Hz时性能急剧下降,却只带来微不足道的额外省电,这说明25 Hz是实际可行的下限。此外,我们发现仅在静息数据上训练的模型在运动状态下会失效,而活动多样化的训练可提高其在不同生理状态下的鲁棒性。
摘要:Biometric authentication using physiological signals offers a promising path toward secure and user-friendly access control in wearable devices. While electrocardiogram (ECG) signals have shown high discriminability, their intrusive sensing requirements and discontinuous acquisition limit practicality. Photoplethysmography (PPG), on the other hand, enables continuous, non-intrusive authentication with seamless integration into wrist-worn wearable devices. However, most prior work relies on high-frequency PPG (e.g., 75 - 500 Hz) and complex deep models, which incur significant energy and computational overhead, impeding deployment in power-constrained real-world systems. In this paper, we present the first real-world implementation and evaluation of a continuous authentication system on a smartwatch, We-Be Band, using low-frequency (25 Hz) multi-channel PPG signals. Our method employs a Bi-LSTM with attention mechanism to extract identity-specific features from short (4 s) windows of 4-channel PPG. Through extensive evaluations on both public datasets (PTTPPG) and our We-Be Dataset (26 subjects), we demonstrate strong classification performance with an average test accuracy of 88.11%, macro F1-score of 0.88, False Acceptance Rate (FAR) of 0.48%, False Rejection Rate (FRR) of 11.77%, and Equal Error Rate (EER) of 2.76%. Our 25 Hz system reduces sensor power consumption by 53% compared to 512 Hz and 19% compared to 128 Hz setups without compromising performance. We find that sampling at 25 Hz preserves authentication accuracy, whereas performance drops sharply at 20 Hz while offering only trivial additional power savings, underscoring 25 Hz as the practical lower bound. Additionally, we find that models trained exclusively on resting data fail under motion, while activity-diverse training improves robustness across physiological states.
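
【代码示意】论文骨干是作用在4秒、4通道、25 Hz(即100个时间步)PPG窗口上的"Bi-LSTM + 注意力"。以下PyTorch示意只还原这一结构轮廓,隐藏维度、层数等均为演示假设:

import torch
import torch.nn as nn

class PPGAuthNet(nn.Module):
    """Bi-LSTM + 时间注意力的PPG身份分类示意(维度为假设)。"""
    def __init__(self, n_channels=4, hidden=64, n_subjects=26):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)     # 每个时间步打一个注意力分数
        self.head = nn.Linear(2 * hidden, n_subjects)

    def forward(self, x):                        # x: (batch, 100, 4)
        h, _ = self.lstm(x)                      # (batch, 100, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)   # 时间维上的注意力权重
        ctx = (a * h).sum(dim=1)                 # 加权汇聚为身份特征
        return self.head(ctx)

x = torch.randn(8, 100, 4)     # 8个窗口: 4秒 x 25 Hz = 100步, 4通道
print(PPGAuthNet()(x).shape)   # torch.Size([8, 26])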


【12】In-Context Decision Making for Optimizing Complex AutoML Pipelines
标题:优化复杂AutoML管道的上下文决策
链接:https://arxiv.org/abs/2508.13657

作者:ei Balef, Katharina Eggensperger
摘要:组合算法选择和超参数优化(CASH)是传统AutoML系统的基础。然而,随着预训练模型的进步,现代机器学习工作流超越了超参数优化,通常还需要微调、集成和其他适配技术。虽然为下游任务识别性能最佳模型这一核心挑战依然存在,但ML管道日益增长的异构性需要新的AutoML方法。这项工作扩展了CASH框架,以选择和适配现代ML管道。我们提出了PS-PFN,通过将后验采样(PS)扩展到最大k臂老虎机问题设定,来高效地探索和利用可适配的ML管道。PS-PFN利用先验数据拟合网络(PFN),通过上下文学习高效估计最大值的后验分布。我们展示了如何扩展该方法,以考虑拉动不同臂的不同成本,并为每个臂单独使用不同的PFN对奖励分布建模。在一个新基准和两个现有标准基准任务上的实验结果表明,PS-PFN相比其他老虎机与AutoML策略具有更优的性能。我们在https://github.com/amirbalef/CASHPlus上提供代码和数据。
摘要:Combined Algorithm Selection and Hyperparameter Optimization (CASH) has been fundamental to traditional AutoML systems. However, with the advancements of pre-trained models, modern ML workflows go beyond hyperparameter optimization and often require fine-tuning, ensembling, and other adaptation techniques. While the core challenge of identifying the best-performing model for a downstream task remains, the increasing heterogeneity of ML pipelines demands novel AutoML approaches. This work extends the CASH framework to select and adapt modern ML pipelines. We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning. We show how to extend this method to consider varying costs of pulling arms and to use different PFNs to model reward distributions individually per arm. Experimental results on one novel and two existing standard benchmark tasks demonstrate the superior performance of PS-PFN compared to other bandit and AutoML strategies. We make our code and data available at https://github.com/amirbalef/CASHPlus.
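
【代码示意】后验采样解最大k臂问题的决策循环如下:每轮从各臂的后验中抽样并拉取样本值最大的臂,关注观测到的最大奖励。论文中后验由PFN在上下文内估计;此处用共轭高斯后验代替(演示假设):

import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 200
true_mu = rng.normal(0, 1, K)          # 各臂的真实均值(未知)
sum_r = np.zeros(K); n = np.zeros(K)   # 充分统计量
best_seen = -np.inf

for t in range(T):
    # 高斯共轭后验: 先验N(0,1), 观测噪声方差1 (论文中由PFN在上下文内给出)
    post_var = 1.0 / (1.0 + n)
    post_mean = post_var * sum_r
    theta = rng.normal(post_mean, np.sqrt(post_var))  # 后验采样 (PS)
    arm = int(np.argmax(theta))                       # 拉样本值最大的臂
    r = rng.normal(true_mu[arm], 1.0)
    sum_r[arm] += r; n[arm] += 1
    best_seen = max(best_seen, r)   # max k-armed bandit 关注观测到的最大奖励

print(best_seen, true_mu.max())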


【13】Input Time Scaling
标题:输入时间缩放
链接:https://arxiv.org/abs/2508.13654

作者:uang (Yuming), Weilong Guo
摘要:当前的大型语言模型(LLM)通常在大规模精心策划的数据集上进行后训练(数据和训练缩放),并在测试时间内进行推理(推理时间缩放)。在这项工作中,我们提出了一个新的缩放范例,输入时间缩放,以补充以前的缩放方法,将资源的查询(输入时间)。在训练和测试过程中,我们结合LLM的元知识,以不同的策略来优化输入。同时我们还发现了一个新的现象,即训练-测试协同设计。我们需要在训练和测试期间应用查询策略。只在训练或测试中应用策略会严重降低性能。我们还惊讶地发现,看似低数据质量的数据集可以获得高性能。在查询中添加不相关的信息,从最低限度过滤的数据集中随机选择示例,甚至可以表现最好。这些发现与普遍持有的归纳偏见“垃圾进,垃圾出”相矛盾。使用看似高质量的数据来管理数据集甚至可能会限制性能上限。此外,在质量相似的更多数据(15 k VS 1 k)上训练的模型表现更差,简单的数据集大小缩放也应该仔细检查。好消息是,我们的发现与“少即是多”现象相一致。一组小的例子就足以唤起高级推理能力。通过在Qwen2.5- 32 B-Instruct上训练的模型上进行实验,我们能够在AIME 24(76.7%)和AIME 25(76.7%)pass@1上达到32 B模型的SOTA性能。我们可以进一步实现AIME 24(76.7%)和AIME 25(80%)的三个模型的多数投票。从DeepSeek-R1-Distill-Qwen-32 B开始,AIME 24上的最佳结果为86.7%,AIME 25上为76.7%。为了促进可重复性和进一步的研究,我们正在开源我们的数据集,数据管道,评估结果和检查点。
摘要:Current Large Language Models (LLMs) are usually post-trained on large-scale carefully curated datasets (data & training scaling) and doing reasoning in test time (inference time scaling). In this work, we present a new scaling paradigm, Input Time Scaling, to complement previous scaling methods by putting resources on queries (input time). During training and testing, we combine meta-knowledge from LLMs to refine inputs with different strategies. We also find a new phenomenon there: training-testing co-design. We need to apply query strategies during both training and testing. Only applying strategies on training or testing would seriously degrade the performance. We are also surprised to find that seemingly low data quality datasets can gain high performance. Adding irrelevant information to the queries, randomly selecting examples from a minimally filtered dataset, can even perform the best. These findings contradict the widely held inductive bias, "garbage in, garbage out". Curating datasets with seemingly high-quality data can even potentially limit the performance ceiling. In addition, models trained on more data with similar quality (15k VS 1k) perform worse, so simple dataset size scaling should also be carefully inspected. The good news is that our findings are compatible with the Less is More phenomenon. A small set of examples is enough to evoke high-level reasoning ability. With experiments on models trained on Qwen2.5-32B-Instruct, we are able to reach SOTA performance among 32B models on AIME24(76.7%) and AIME25(76.7%) pass@1. We can further achieve AIME24(76.7%) and AIME25(80%) with a majority vote of three models. Starting from DeepSeek-R1-Distill-Qwen-32B, the best result would be 86.7% on AIME24 and 76.7% on AIME25. To facilitate reproducibility and further research, we are working on open-sourcing our datasets, data pipelines, evaluation results, and checkpoints.


【14】GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling
标题:GRAFT:用于动态数据采样的梯度感知快速MaxVol技术
链接:https://arxiv.org/abs/2508.13653

作者:a, Anh huy Phan, Razan Dibo, Valentin Leplat
摘要:在大型数据集上训练现代神经网络,在计算和环境两方面都代价高昂。我们提出GRAFT,一种可扩展的训练中子集选择方法,它(i)为每个批次提取低秩特征表示,(ii)应用快速MaxVol采样器选择一个小而多样、能张成该批次主导子空间的子集,(iii)使用梯度近似准则动态调整子集大小。通过在低秩子空间中操作,并在精心选择的样本(而非整批数据)上训练,GRAFT在保持训练轨迹的同时减少了挂钟时间、能耗和$\mathrm{CO}_2$排放。在多个基准测试中,GRAFT在准确性和效率上均达到或超过近期的选择基线,在准确性、效率与排放之间提供了有利的权衡。
摘要:Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen examples instead of full batches, GRAFT preserves the training trajectory while reducing wall-clock time, energy consumption, and $\mathrm{CO}_2$ emissions. Across multiple benchmarks, GRAFT matches or exceeds recent selection baselines in both accuracy and efficiency, providing a favorable trade-off between accuracy, efficiency, and emissions.
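
【代码示意】GRAFT的选择步骤可理解为"低秩特征 + MaxVol型行选取"。下面用截断SVD得到批内低秩坐标,再用带主元的QR(MaxVol的常见近似)选出近似张成主导子空间的样本;这只是概念示意,并非论文的Fast MaxVol实现:

import numpy as np
from scipy.linalg import qr

def graft_style_select(feats, rank=8, k=16):
    """feats: (batch, dim) 的批内特征。
    1) 截断SVD取低秩表示; 2) 带主元QR近似MaxVol, 选出k个代表性样本。"""
    U, S, _ = np.linalg.svd(feats, full_matrices=False)
    Z = U[:, :rank] * S[:rank]           # 每个样本的低秩坐标 (batch, rank)
    # 对 Z^T 做列主元QR: 主元顺序即"体积贡献"大的样本行
    _, _, piv = qr(Z.T, pivoting=True)
    return piv[:k]                        # 被选中的样本下标

feats = np.random.default_rng(0).normal(size=(256, 64))
idx = graft_style_select(feats)
print(sorted(idx))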


【15】Towards safe control parameter tuning in distributed multi-agent systems
标题:分布式多代理系统中的安全控制参数调整
链接:https://arxiv.org/abs/2508.13608

作者:Tokmak, Thomas B. Schön, Dominik Baumann
备注:Accepted to CDC 2025
摘要 :许多安全关键的现实世界的问题,如自动驾驶和协作机器人,是一个分布式多智能体的性质。为了在确保安全的同时优化这些系统的性能,我们可以将它们转换为分布式优化问题,其中每个代理的目标是优化其参数,以最大化耦合约束下的耦合奖励函数。以前的工作要么研究集中设置,不考虑安全性,要么与样本效率作斗争。由于我们需要样本效率,并与未知和非凸的奖励和约束,我们解决这个优化问题使用安全贝叶斯优化高斯过程回归。此外,我们考虑代理之间的最近邻通信。为了捕获非相邻代理的行为,我们将静态全局优化问题重新表述为每个代理的时变局部优化问题,本质上引入时间作为潜在变量。为此,我们提出了一个自定义的时空内核整合先验知识。我们展示了我们的算法在模拟中的成功部署。
摘要:Many safety-critical real-world problems, such as autonomous driving and collaborative robots, are of a distributed multi-agent nature. To optimize the performance of these systems while ensuring safety, we can cast them as distributed optimization problems, where each agent aims to optimize their parameters to maximize a coupled reward function subject to coupled constraints. Prior work either studies a centralized setting, does not consider safety, or struggles with sample efficiency. Since we require sample efficiency and work with unknown and nonconvex rewards and constraints, we solve this optimization problem using safe Bayesian optimization with Gaussian process regression. Moreover, we consider nearest-neighbor communication between the agents. To capture the behavior of non-neighboring agents, we reformulate the static global optimization problem as a time-varying local optimization problem for each agent, essentially introducing time as a latent variable. To this end, we propose a custom spatio-temporal kernel to integrate prior knowledge. We show the successful deployment of our algorithm in simulations.


【16】Bounding Causal Effects and Counterfactuals
标题:为因果效应与反事实定界
链接:https://arxiv.org/abs/2508.13607

作者:ringgele
备注:Bachelor's thesis, Technical University of Munich, 2025. 102 pages, 20 figures
摘要:因果推理通常依赖于强假设(例如不存在未测量的混杂因素、或完全依从),而这些假设在实践中很少得到满足。部分识别提供了一种有原则的替代方案:它不依赖无法验证的假设来精确估计因果效应,而是推导反映数据固有不确定性的界。尽管部分识别在理论上很有吸引力,但由于现有方法零散且缺乏实践指导,它在应用工作中仍未得到充分利用。本论文通过在多种因果场景下系统比较多种定界算法来应对这些挑战。我们在一个共同的评估框架内实现、扩展并统一了最先进的方法,包括符号方法、基于优化的方法和信息论方法。特别地,我们提出了对最近提出的熵界方法的一个扩展,使其适用于反事实查询,如必要且充分概率(PNS)。我们的实证研究涵盖数千次随机模拟,涉及离散与连续的数据生成过程。我们从界的紧致性、计算效率以及对假设违背的鲁棒性三方面评估每种方法。为支持实践者,我们将发现提炼为一棵用于算法选择的实用决策树,并训练了一个机器学习模型,根据可观测的数据特征预测表现最佳的方法。   所有实现都作为开源Python包CausalBoundingEngine的一部分发布,使用户能够通过统一接口应用和比较各种定界方法。
摘要:Causal inference often hinges on strong assumptions - such as no unmeasured confounding or perfect compliance - that are rarely satisfied in practice. Partial identification offers a principled alternative: instead of relying on unverifiable assumptions to estimate causal effects precisely, it derives bounds that reflect the uncertainty inherent in the data. Despite its theoretical appeal, partial identification remains underutilized in applied work, in part due to the fragmented nature of existing methods and the lack of practical guidance. This thesis addresses these challenges by systematically comparing a diverse set of bounding algorithms across multiple causal scenarios. We implement, extend, and unify state-of-the-art methods - including symbolic, optimization-based, and information-theoretic approaches - within a common evaluation framework. In particular, we propose an extension of a recently introduced entropy-bounded method, making it applicable to counterfactual queries such as the Probability of Necessity and Sufficiency (PNS). Our empirical study spans thousands of randomized simulations involving both discrete and continuous data-generating processes. We assess each method in terms of bound tightness, computational efficiency, and robustness to assumption violations. To support practitioners, we distill our findings into a practical decision tree for algorithm selection and train a machine learning model to predict the best-performing method based on observable data characteristics.   All implementations are released as part of an open-source Python package, CausalBoundingEngine, which enables users to apply and compare bounding methods through a unified interface.
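
【代码示意】部分识别最经典的入门例子是Manski无假设界:当结果取值于[0,1]时,不借助任何混杂假设即可给出ATE的区间(区间宽度恒为1,但位置携带信息)。以下示意属于论文所比较的方法家族中最简单的成员,并非其熵界扩展:

import numpy as np

def manski_ate_bounds(y, t):
    """y取值于[0,1], t取值于{0,1}。返回无假设下ATE的[下界, 上界]。"""
    p1 = t.mean(); p0 = 1 - p1
    m1 = y[t == 1].mean()       # E[Y | T=1]
    m0 = y[t == 0].mean()       # E[Y | T=0]
    # E[Y(1)]的界: 未观测部分分别用0/1填充; E[Y(0)]同理
    lo1, hi1 = m1 * p1, m1 * p1 + p0
    lo0, hi0 = m0 * p0, m0 * p0 + p1
    return lo1 - hi0, hi1 - lo0

rng = np.random.default_rng(0)
t = rng.integers(0, 2, 1000)
y = np.clip(0.3 + 0.2 * t + rng.normal(0, 0.1, 1000), 0, 1)
print(manski_ate_bounds(y, t))   # 宽度恒为1的区间, 但位置有信息量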


【17】The 9th AI City Challenge
标题:第九届AI城市挑战赛
链接:https://arxiv.org/abs/2508.13564

作者:g, Shuo Wang, David C. Anastasiu, Ming-Ching Chang, Anuj Sharma, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Ganzorig Batnasan, Munkh-Erdene Otgonbold, Fady Alnajjar, Jun-Wei Hsieh, Tomasz Kornuta, Xiaolong Li, Yilin Zhao, Han Zhang, Subhashree Radhakrishnan, Arihant Jain, Ratnesh Kumar, Vidya N. Murali, Yuxing Wang, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Xunlei Wu, Zhedong Zheng, Pranamesh Chakraborty, Rama Chellappa
备注:Summary of the 9th AI City Challenge Workshop in conjunction with ICCV 2025
摘要:第九届AI城市挑战赛继续推进计算机视觉和人工智能在交通、工业自动化和公共安全领域的实际应用。2025年版设有四条赛道,参与人数增加了17%,来自15个国家的245支队伍在评估服务器上注册。公开发布的挑战数据集迄今已带来超过30,000次下载。赛道1聚焦多类别3D多摄像头跟踪,涉及行人、人形机器人、自主移动机器人和叉车,使用详细的标定和3D边界框标注。赛道2处理交通安全中的视频问答,以3D注视标签丰富多摄像头事件理解。赛道3针对动态仓库环境中的细粒度空间推理,要求AI系统解读RGB-D输入,并回答结合感知、几何与语言的空间问题。赛道1与赛道3的数据集均在NVIDIA Omniverse中生成。赛道4强调鱼眼摄像头下的高效道路目标检测,支持在边缘设备上轻量、实时地部署。评估框架限制提交次数,并使用部分保留的测试集以确保公平的基准测试。最终排名在比赛结束后公布,以促进可复现性并减轻过拟合。多支队伍取得了顶级成绩,在多项任务中刷新了基准。
摘要:The ninth AI City Challenge continues to advance real-world applications of computer vision and AI in transportation, industrial automation, and public safety. The 2025 edition featured four tracks and saw a 17% increase in participation, with 245 teams from 15 countries registered on the evaluation server. Public release of challenge datasets led to over 30,000 downloads to date. Track 1 focused on multi-class 3D multi-camera tracking, involving people, humanoids, autonomous mobile robots, and forklifts, using detailed calibration and 3D bounding box annotations. Track 2 tackled video question answering in traffic safety, with multi-camera incident understanding enriched by 3D gaze labels. Track 3 addressed fine-grained spatial reasoning in dynamic warehouse environments, requiring AI systems to interpret RGB-D inputs and answer spatial questions that combine perception, geometry, and language. Both Track 1 and Track 3 datasets were generated in NVIDIA Omniverse. Track 4 emphasized efficient road object detection from fisheye cameras, supporting lightweight, real-time deployment on edge devices. The evaluation framework enforced submission limits and used a partially held-out test set to ensure fair benchmarking. Final rankings were revealed after the competition concluded, fostering reproducibility and mitigating overfitting. Several teams achieved top-tier results, setting new benchmarks in multiple tasks.


【18】Explainability of Algorithms
标题:算法的可解释性
链接:https://arxiv.org/abs/2508.13529

作者:ez
摘要:许多复杂机器学习算法的不透明性经常被认为是人工智能(AI)道德发展的主要障碍之一。但是算法不透明意味着什么呢?人工神经网络等高度复杂的算法沿着互连节点的多个隐藏层并行处理大量数据,使其内部工作原理对任何人(包括其设计师和开发人员)来说都是不可访问的;它们是所有利益相关者的“黑匣子”。但不透明并不总是技术复杂性的必然结果。有时,算法的工作方式出于专有原因故意隐藏起来,特别是在商业自动决策系统中,这会造成完全不同类型的不透明。在本章的第一部分,我们将考察这两种理解不透明性的方法,以及它们各自的伦理含义。在第二部分中,我们探讨了计算机科学中为克服人工智能系统的技术不透明性而开发的不同解释方法。正如分析所示,可解释人工智能(XAI)仍然面临着许多挑战。
摘要 :The opaqueness of many complex machine learning algorithms is often mentioned as one of the main obstacles to the ethical development of artificial intelligence (AI). But what does it mean for an algorithm to be opaque? Highly complex algorithms such as artificial neural networks process enormous volumes of data in parallel along multiple hidden layers of interconnected nodes, rendering their inner workings epistemically inaccessible to any human being, including their designers and developers; they are "black boxes" for all their stakeholders. But opaqueness is not always the inevitable result of technical complexity. Sometimes, the way an algorithm works is intentionally hidden from view for proprietary reasons, especially in commercial automated decision systems, creating an entirely different type of opaqueness. In the first part of the chapter, we will examine these two ways of understanding opacity and the ethical implications that stem from each of them. In the second part, we explore the different explanatory methods that have been developed in computer science to overcome an AI system's technical opaqueness. As the analysis shows, explainable AI (XAI) still faces numerous challenges.


【19】DyMixOp: Guiding Neural Operator Design for PDEs from a Complex Dynamics Perspective with Local-Global-Mixing
标题:DyMixOp:以局部-全局-混合从复杂动力学视角指导PDE神经算子设计
链接:https://arxiv.org/abs/2508.13490

作者:i, Yixiao Chen, Hui Xu
摘要:使用神经网络来近似由偏微分方程(PDE)控制的非线性动力系统,其主要挑战在于将这些系统转换为合适的形式,尤其是在处理不可线性化的动力学、或线性化需要无限维空间时。本文介绍了DyMixOp,一种面向PDE的新型神经算子框架,融合了来自复杂动力系统的洞见以应对这一挑战。基于惯性流形理论,DyMixOp将无限维非线性PDE动力学转换到有限维潜空间,建立了一个保持核心非线性相互作用、并增强物理可解释性的结构化基础。一个关键创新是局部-全局混合(LGM)变换,其灵感来自湍流中的对流动力学。该变换有效捕获精细尺度细节与非线性相互作用,同时减轻现有神经算子中常见的谱偏差。框架还通过动力学启发的架构得到进一步加强:连接多个LGM层以近似线性与非线性动力学,反映动力系统的时间演化。在多个PDE基准上的实验结果表明,DyMixOp达到最先进性能,显著降低预测误差,在对流主导的场景中降幅最高可达86.7%,同时保持计算效率与可扩展性。
摘要:A primary challenge in using neural networks to approximate nonlinear dynamical systems governed by partial differential equations (PDEs) is transforming these systems into a suitable format, especially when dealing with non-linearizable dynamics or the need for infinite-dimensional spaces for linearization. This paper introduces DyMixOp, a novel neural operator framework for PDEs that integrates insights from complex dynamical systems to address this challenge. Grounded in inertial manifold theory, DyMixOp transforms infinite-dimensional nonlinear PDE dynamics into a finite-dimensional latent space, establishing a structured foundation that maintains essential nonlinear interactions and enhances physical interpretability. A key innovation is the Local-Global-Mixing (LGM) transformation, inspired by convection dynamics in turbulence. This transformation effectively captures both fine-scale details and nonlinear interactions, while mitigating spectral bias commonly found in existing neural operators. The framework is further strengthened by a dynamics-informed architecture that connects multiple LGM layers to approximate linear and nonlinear dynamics, reflecting the temporal evolution of dynamical systems. Experimental results across diverse PDE benchmarks demonstrate that DyMixOp achieves state-of-the-art performance, significantly reducing prediction errors, particularly in convection-dominated scenarios reaching up to 86.7\%, while maintaining computational efficiency and scalability.


【20】OrbitChain: Orchestrating In-orbit Real-time Analytics of Earth Observation Data
标题:OrbitChain:编排地球观测数据的在轨实时分析
链接:https://arxiv.org/abs/2508.13374

作者:, Zhijing Yang, Huayue Gu, Xiaojian Wang, Yuchen Liu, Ruozhou Yu
备注:currently under review
摘要:地球观测分析有潜力服务于许多时间敏感的应用。然而,由于星地连接的带宽和持续时间有限,从现有地球观测卫星下载并分析数据需要数小时甚至数天,使得及时灾害响应等实时需求无法满足。面向实时分析,我们引入了OrbitChain,这是一个协作分析框架,用于编排地球观测星座中多颗卫星的计算资源。OrbitChain将分析应用分解为微服务,并为时间受限的分析分配计算资源。我们设计了一种流量路由算法,以最小化星间通信开销。OrbitChain采用流水线工作流,实时完成地球观测任务,促进时间敏感应用以及tip-and-cue等星座间协作。为了评估OrbitChain,我们实现了一个硬件在环的在轨计算测试平台。实验表明,与现有地球观测分析框架相比,我们的系统可多完成多达60%的分析工作量,同时将通信开销降低多达72%。
摘要:Earth observation analytics have the potential to serve many time-sensitive applications. However, due to limited bandwidth and duration of ground-satellite connections, it takes hours or even days to download and analyze data from existing Earth observation satellites, making real-time demands like timely disaster response impossible. Toward real-time analytics, we introduce OrbitChain, a collaborative analytics framework that orchestrates computational resources across multiple satellites in an Earth observation constellation. OrbitChain decomposes analytics applications into microservices and allocates computational resources for time-constrained analysis. A traffic routing algorithm is devised to minimize the inter-satellite communication overhead. OrbitChain adopts a pipeline workflow that completes Earth observation tasks in real-time, facilitates time-sensitive applications and inter-constellation collaborations such as tip-and-cue. To evaluate OrbitChain, we implement a hardware-in-the-loop orbital computing testbed. Experiments show that our system can complete up to 60% more analytics workload than existing Earth observation analytics frameworks while reducing the communication overhead by up to 72%.


【21】A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 with New Scoring and Data Sources
标题:入侵容忍系统的风险管理器:利用新的评分和数据源增强HAL 9000
链接:https://arxiv.org/abs/2508.13364

作者:itas, Carlos Novo, Inês Dutra, João Soares, Manuel Correia, Benham Shariati, Rolando Martins
摘要:由于利用多种攻击面的多域对手不断兴起,入侵容忍系统(ITS)变得越来越重要。ITS架构旨在容忍入侵,确保即使在对手存在的情况下也能防止或减轻系统危害。现有的ITS解决方案通常采用风险管理器,利用公开安全情报动态调整系统防御,以应对新出现的威胁。然而,这些方法严重依赖于NVD和ExploitDB等数据库,需要对新发现的漏洞进行人工分析。这种依赖性限制了系统对快速变化的威胁的响应能力。HAL 9000是我们在之前工作中提出的ITS风险管理器,它通过机器学习应对这些挑战:通过分析已知漏洞的描述,HAL 9000自动预测和评估新的漏洞。为了计算系统风险,它还集成了可利用性概率评分系统,以估计30天内被利用的可能性,从而增强主动防御能力。   尽管取得了成功,但考虑到其他信息来源的可用性,HAL 9000对NVD和ExploitDB知识的依赖是一个局限。这项扩展工作引入了一个定制的爬虫,持续挖掘各种威胁来源,包括安全公告、研究论坛和实时利用概念验证。这大大扩展了HAL 9000的情报基础,使其能够更早地检测和评估未经验证的漏洞。我们的评估表明,将爬虫获得的情报与HAL 9000的风险管理框架相结合,可大幅提高其应对新威胁的能力。本文详细介绍了爬虫与体系结构的集成、它在提供新威胁附加信息中的作用,以及对HAL 9000管理的影响。
摘要 :Intrusion Tolerant Systems (ITSs) have become increasingly critical due to the rise of multi-domain adversaries exploiting diverse attack surfaces. ITS architectures aim to tolerate intrusions, ensuring system compromise is prevented or mitigated even with adversary presence. Existing ITS solutions often employ Risk Managers leveraging public security intelligence to adjust system defenses dynamically against emerging threats. However, these approaches rely heavily on databases like NVD and ExploitDB, which require manual analysis for newly discovered vulnerabilities. This dependency limits the system's responsiveness to rapidly evolving threats. HAL 9000, an ITS Risk Manager introduced in our prior work, addressed these challenges through machine learning. By analyzing descriptions of known vulnerabilities, HAL 9000 predicts and assesses new vulnerabilities automatically. To calculate the risk of a system, it also incorporates the Exploitability Probability Scoring system to estimate the likelihood of exploitation within 30 days, enhancing proactive defense capabilities.   Despite its success, HAL 9000's reliance on NVD and ExploitDB knowledge is a limitation, considering the availability of other sources of information. This extended work introduces a custom-built scraper that continuously mines diverse threat sources, including security advisories, research forums, and real-time exploit proofs-of-concept. This significantly expands HAL 9000's intelligence base, enabling earlier detection and assessment of unverified vulnerabilities. Our evaluation demonstrates that integrating scraper-derived intelligence with HAL 9000's risk management framework substantially improves its ability to address emerging threats. This paper details the scraper's integration into the architecture, its role in providing additional information on new threats, and the effects on HAL 9000's management.


【22】Dimension lower bounds for linear approaches to function approximation
标题:函数逼近的线性方法的维数下界
链接:https://arxiv.org/abs/2508.13346

作者:u
备注:First appeared on author's homepage in August 2021 this https URL
摘要:本短文提出了一种线性代数方法,用于证明求解$L^2$函数逼近问题的线性方法的维数下界。其基本论证此前已见于文献(如Barron, 1993),用于建立Kolmogorov $n$-宽度的下界。该论证被应用于给出核方法的样本量下界。
摘要:This short note presents a linear algebraic approach to proving dimension lower bounds for linear methods that solve $L^2$ function approximation problems. The basic argument has appeared in the literature before (e.g., Barron, 1993) for establishing lower bounds on Kolmogorov $n$-widths. The argument is applied to give sample size lower bounds for kernel methods.


【23】X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
标题:X-MoE:为高性能计算平台上的新兴专家混合架构实现可扩展训练
链接:https://arxiv.org/abs/2508.13337

作者:uan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
备注:17 pages, 20 figures. To be published in SC 2025
摘要:新兴的专家专用混合专家(MoE)架构,如DeepSeek-MoE,通过细粒度的专家分割和大型top-k路由提供强大的模型质量。然而,它们的可扩展性受到大量激活内存开销和昂贵的全对全通信的限制。此外,目前的MoE训练系统(主要针对NVIDIA GPU进行了优化)在非NVIDIA平台上的表现并不理想,因此存在大量未开发的计算潜力。在这项工作中,我们提出了X-MoE,一种新的MoE训练系统,旨在为下一代MoE架构提供可扩展的训练性能。X-MoE通过几项新技术实现了这一点,包括使用跨平台内核的高效无填充MoE训练、冗余绕过分派,以及使用序列分片MoE块的混合并行。我们在由AMD MI250X GPU驱动的Frontier超级计算机上的评估表明,X-MoE可在1024个GPU上将DeepSeek风格的MoE扩展到5450亿参数,比相同硬件预算下现有方法的最大可训练模型大10倍,同时保持高训练吞吐量。X-MoE的源代码可在https://github.com/Supercomputing-System-AI-Lab/X-MoE上获得。
摘要:Emerging expert-specialized Mixture-of-Experts (MoE) architectures, such as DeepSeek-MoE, deliver strong model quality through fine-grained expert segmentation and large top-k routing. However, their scalability is limited by substantial activation memory overhead and costly all-to-all communication. Furthermore, current MoE training systems - primarily optimized for NVIDIA GPUs - perform suboptimally on non-NVIDIA platforms, leaving significant computational potential untapped. In this work, we present X-MoE, a novel MoE training system designed to deliver scalable training performance for next-generation MoE architectures. X-MoE achieves this via several novel techniques, including efficient padding-free MoE training with cross-platform kernels, redundancy-bypassing dispatch, and hybrid parallelism with sequence-sharded MoE blocks. Our evaluation on the Frontier supercomputer, powered by AMD MI250X GPUs, shows that X-MoE scales DeepSeek-style MoEs up to 545 billion parameters across 1024 GPUs - 10x larger than the largest trainable model with existing methods under the same hardware budget, while maintaining high training throughput. The source code of X-MoE is available at https://github.com/Supercomputing-System-AI-Lab/X-MoE.


【24】Decoding Communications with Partial Information
标题:部分信息通信的解码
链接:https://arxiv.org/abs/2508.13326

作者:e, Peter McBurney
备注:Proceedings of ALIFE 2025
摘要:机器语言习得通常被视为一个模仿学习问题:存在一个语言使用者社区,学习者观察该社区的言语行为,并试图解码话语与情境之间的映射。然而,一个有趣却通常未被处理的考量是部分可观测性:已有工作往往假设学习者能看到所有相关信息。本文探索放松这一假设,从而提出一个更具挑战性的设定:这些信息需要根据对环境的知识、所采取的动作和所发送的消息来推断。我们给出该问题的若干激励性示例,演示如何在玩具环境中求解,并形式化地探讨更一般设定中出现的挑战。随后提出一种基于学习的算法来解码私有信息,以促进语言习得。
摘要:Machine language acquisition is often presented as a problem of imitation learning: there exists a community of language users from which a learner observes speech acts and attempts to decode the mappings between utterances and situations. However, an interesting consideration that is typically unaddressed is partial observability, i.e. the learner is assumed to see all relevant information. This paper explores relaxing this assumption, thereby posing a more challenging setting where such information needs to be inferred from knowledge of the environment, the actions taken, and messages sent. We see several motivating examples of this problem, demonstrate how they can be solved in a toy setting, and formally explore challenges that arise in more general settings. A learning-based algorithm is then presented to perform the decoding of private information to facilitate language acquisition.


【25】Efficient Constraint-Aware Flow Matching via Randomized Exploration
标题:通过随机探索进行高效的约束感知流匹配
链接:https://arxiv.org/abs/2508.13316

作者:Huan, Jacob Boerma, Li-Ping Liu, Shuchin Aeron
摘要:我们考虑通过流匹配(FM)生成样本、并额外要求生成样本满足给定约束的问题。我们考虑两种情形:(a)给定到约束集的可微距离函数;(b)约束集仅能通过查询成员资格预言机来访问。对于情形(a),我们提出对FM目标的一个简单改造,增加一项惩罚生成样本与约束集之间距离的项。对于情形(b),我们建议引入随机化并学习一个平均流,数值结果显示其满足约束的可能性很高。这种方法与现有需要简单凸约束、障碍函数知识或反射机制来约束概率流的工作有显著不同。此外,在所提设定中我们表明,两阶段方法(两个阶段近似同一原始流,但仅第二阶段通过随机化探测约束)在计算上更高效。通过若干约束生成的合成案例,我们用数值方法表明所提方法在匹配目标分布的同时,在约束满足方面取得显著增益。作为基于预言机的实际约束的展示,我们说明了如何利用对硬标签黑盒分类器的查询,用我们的方法训练对抗样本生成器。最后,我们给出了若干未来研究方向。我们的代码可在https://github.com/ZhengyanHuan/FM-RE上获得。
摘要:We consider the problem of generating samples via Flow Matching (FM) with an additional requirement that the generated samples must satisfy given constraints. We consider two scenarios, viz.: (a) when a differentiable distance function to the constraint set is given, and (b) when the constraint set is only available via queries to a membership oracle. For case (a), we propose a simple adaptation of the FM objective with an additional term that penalizes the distance between the constraint set and the generated samples. For case (b), we propose to employ randomization and learn a mean flow that is numerically shown to have a high likelihood of satisfying the constraints. This approach deviates significantly from existing works that require simple convex constraints, knowledge of a barrier function, or a reflection mechanism to constrain the probability flow. Furthermore, in the proposed setting we show that a two-stage approach, where both stages approximate the same original flow but with only the second stage probing the constraints via randomization, is more computationally efficient. Through several synthetic cases of constrained generation, we numerically show that the proposed approaches achieve significant gains in terms of constraint satisfaction while matching the target distributions. As a showcase for a practical oracle-based constraint, we show how our approach can be used for training an adversarial example generator, using queries to a hard-label black-box classifier. We conclude with several future research directions. Our code is available at https://github.com/ZhengyanHuan/FM-RE.
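
【代码示意】情形(a)的目标可以写成"标准流匹配损失 + 到约束集距离的惩罚"。以下PyTorch片段采用线性插值路径,并以半空间 x[...,0] >= 0 作为可微距离约束的演示(约束形式与单步外推的终点近似均为演示假设):

import torch

def constrained_fm_loss(v_net, x0, x1, lam=1.0):
    """流匹配损失 + 约束惩罚的示意。
    路径: x_t = (1-t)x0 + t*x1, 目标速度 u = x1 - x0。
    约束示例: 半空间 {x: x[...,0] >= 0}, 距离为 relu(-x[...,0])。"""
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    v = v_net(torch.cat([xt, t], dim=1))
    fm = ((v - (x1 - x0)) ** 2).mean()             # 标准FM项
    x1_hat = xt + (1 - t) * v                      # 单步外推的终点近似(假设)
    penalty = torch.relu(-x1_hat[:, :1]).mean()    # 到约束集的可微距离
    return fm + lam * penalty

d = 2
v_net = torch.nn.Sequential(torch.nn.Linear(d + 1, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, d))
x0 = torch.randn(128, d)                 # 源分布样本
x1 = torch.abs(torch.randn(128, d))      # 目标样本(满足约束)
constrained_fm_loss(v_net, x0, x1).backward()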


【26】Towards Human-AI Complementarity in Matching Tasks
标题:迈向匹配任务中的人机互补
链接:https://arxiv.org/abs/2508.13285

作者:naiz-Rodriguez, Nina Corvelo Benz, Suhas Thejaswi, Nuria Oliver, Manuel Gomez-Rodriguez
备注:Accepted in Workshop on Hybrid Human-Machine Learning and Decision Making at ECML PKDD
摘要:数据驱动的算法匹配系统有望帮助人类决策者在医疗保健和社会服务提供等各类高风险应用领域做出更好的匹配决策。然而,现有系统并非为实现人机互补而设计:人类借助算法匹配系统做出的决策,不一定比人类或算法各自单独做出的决策更好。我们的工作旨在弥补这一差距。为此,我们提出了协作匹配(comatch),这是一种采用协作方式的数据驱动算法匹配系统:它不像现有系统那样对匹配任务做出所有匹配决策,而是只做出它最有信心的那些决策,其余的推迟给人类决策者。在此过程中,comatch优化它自己做出的决策数量与推迟给人类决策者的决策数量,以可证明地最大化性能。我们开展了一项有800名参与者的大规模人类受试者研究,以验证所提方法。结果表明,comatch产生的匹配结果优于人类参与者或算法匹配各自单独产生的结果。我们在人类受试者研究中收集的数据和系统实现已在https://github.com/Networks-Learning/human-AI-complementarity-matching开源。
摘要:Data-driven algorithmic matching systems promise to help human decision makers make better matching decisions in a wide variety of high-stakes application domains, such as healthcare and social service provision. However, existing systems are not designed to achieve human-AI complementarity: decisions made by a human using an algorithmic matching system are not necessarily better than those made by the human or by the algorithm alone. Our work aims to address this gap. To this end, we propose collaborative matching (comatch), a data-driven algorithmic matching system that takes a collaborative approach: rather than making all the matching decisions for a matching task like existing systems, it selects only the decisions that it is the most confident in, deferring the rest to the human decision maker. In the process, comatch optimizes how many decisions it makes and how many it defers to the human decision maker to provably maximize performance. We conduct a large-scale human subject study with $800$ participants to validate the proposed approach. The results demonstrate that the matching outcomes produced by comatch outperform those generated by either human participants or by algorithmic matching on their own. The data gathered in our human subject study and an implementation of our system are available as open source at https://github.com/Networks-Learning/human-AI-complementarity-matching.
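
【代码示意】comatch的协作机制可抽象为:模型只对置信度最高的k个匹配决策落子,其余推迟给人类。以下numpy示意中k作为外部输入给定;论文中k本身是被优化的量:

import numpy as np

def comatch_split(match_scores, k):
    """match_scores: 模型对每个候选匹配决策的置信度。
    返回(模型自动决定的下标, 推迟给人类决策者的下标)。"""
    order = np.argsort(-match_scores)        # 按置信度降序排列
    return order[:k], order[k:]

scores = np.array([0.95, 0.52, 0.88, 0.61, 0.99])
auto, defer = comatch_split(scores, k=2)
print("模型决策:", auto, "交给人:", defer)   # 模型决策: [4 0]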


【27】Data driven feedback linearization of nonlinear control systems via Lie derivatives and stacked regression approach
标题:通过李求导和堆叠回归方法实现非线性控制系统的数据驱动反馈线性化
链接:https://arxiv.org/abs/2508.13241

作者:riya P. K., Andreas Schwung
摘要:发现物理系统的控制方程和设计有效的反馈控制器仍然是正在进行的研究中最具挑战性和最密集的领域之一。这项任务需要深入了解系统的行为,包括影响其动态的非线性因素。在这篇文章中,我们提出了一种新的方法,基于已知的先验动态行为来辨识一个可反馈线性化的物理系统。首先使用稀疏回归算法辨识系统,随后通过将Lie导数应用于输出函数字典来为所发现的系统设计反馈控制器,以推导出保证不出现内部动态的增广约束。与以往的相关工作不同,本文的创新之处在于将堆叠回归算法与相对度条件相结合,发现并反馈线性化物理模型的真实控制方程。
摘要:Discovering the governing equations of a physical system and designing an effective feedback controller remains one of the most challenging and intensive areas of ongoing research. This task demands a deep understanding of the system behavior, including the nonlinear factors that influence its dynamics. In this article, we propose a novel methodology for identifying a feedback linearized physical system based on known prior dynamic behavior. Initially, the system is identified using a sparse regression algorithm, subsequently a feedback controller is designed for the discovered system by applying Lie derivatives to the dictionary of output functions to derive an augmented constraint which guarantees that no internal dynamics are observed. Unlike the prior related works, the novel aspect of this article combines the approach of stacked regression algorithm and relative degree conditions to discover and feedback linearize the true governing equations of a physical model.
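
【代码示意】文中的关键一步是对输出函数字典求Lie导数 L_f h = ∇h · f,用sympy可以直接符号化。以下以一个单摆式二维系统作为演示假设:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Matrix([x2, -sp.sin(x1) - x2])      # 示例漂移向量场(演示假设)
h = x1                                      # 输出函数

def lie_derivative(h, f, variables):
    """L_f h = ∇h · f"""
    grad = sp.Matrix([sp.diff(h, v) for v in variables])
    return sp.simplify(grad.dot(f))

Lf_h = lie_derivative(h, f, [x1, x2])       # = x2
Lf2_h = lie_derivative(Lf_h, f, [x1, x2])   # 二阶Lie导数, 用于判定相对度
print(Lf_h, '|', Lf2_h)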


【28】BERT-VQA: Visual Question Answering on Plots
标题:BERT-VQA:图表上的视觉问答
链接:https://arxiv.org/abs/2508.13184

作者:obert Yang
摘要:视觉问答一直是自然语言理解领域一个令人兴奋的挑战,因为它需要深度学习模型交换来自视觉和语言两个领域的信息。在这个项目中,我们旨在解决该问题的一个子任务,即图表上的视觉问答。为此,我们开发了BERT-VQA,一种基于VisualBERT、带有预训练ResNet 101图像编码器的模型架构,并可选地加入联合融合。我们训练并评估了该模型,与之对照的基线由LSTM、CNN和一个浅层分类器组成。最终结果推翻了我们的核心假设,即VisualBERT中的跨模态模块对于将图表组件与问题短语对齐至关重要。因此,我们的工作为图表问答这一挑战的难度,以及不同模型架构在解决该问题上的适用性提供了有价值的见解。
摘要:Visual question answering has been an exciting challenge in the field of natural language understanding, as it requires deep learning models to exchange information from both vision and language domains. In this project, we aim to tackle a subtask of this problem, namely visual question answering on plots. To achieve this, we developed BERT-VQA, a VisualBERT-based model architecture with a pretrained ResNet 101 image encoder, along with a potential addition of joint fusion. We trained and evaluated this model against a baseline that consisted of a LSTM, a CNN, and a shallow classifier. The final outcome disproved our core hypothesis that the cross-modality module in VisualBERT is essential in aligning plot components with question phrases. Therefore, our work provided valuable insights into the difficulty of the plot question answering challenge as well as the appropriateness of different model architectures in solving this problem.


【29】Search-Time Data Contamination
标题:搜索时数据污染
链接:https://arxiv.org/abs/2508.13180

作者:, Meher Mankikar, Julian Michael, Zifan Wang
摘要:数据污染是指评估数据泄漏到模型训练数据中,导致模型过拟合本应保留的测试集并损害测试有效性。我们在评估基于搜索的LLM智能体时发现了一个类似的问题:搜索时污染(STC)。这类智能体在回答用户查询时使用工具从在线来源收集信息;当检索步骤呈现的来源包含测试问题(或近似副本)及其答案时,STC便会发生,使智能体得以复制而非真正推断或推理,从而破坏基准的完整性。我们发现,托管评估数据集的在线平台HuggingFace出现在基于搜索的智能体日志所检索的来源中。因此,智能体经常在其推理链中明确承认从HuggingFace发现了问答对。在三个常用的能力基准(Humanity's Last Exam(HLE)、SimpleQA和GPQA)上,我们证明对于约3%的问题,基于搜索的智能体能在HuggingFace上直接找到带有真实标签的数据集。当数以百万计的评估查询针对同一基准时,即使是很小的重复泄漏也会加速基准的过时,缩短其预期生命周期。在屏蔽HuggingFace后,我们观察到受污染子集上的准确率下降约15%。我们进一步通过消融实验表明,HuggingFace上公开可得的评估数据集可能并非STC的唯一来源。为此,我们最后提出了基准设计与结果报告的最佳实践,以应对这种新型泄漏并确保对基于搜索的LLM智能体的可靠评估。为便于审计评估结果,我们还公开发布了实验的完整日志。
摘要:Data contamination refers to the leakage of evaluation data into model training data, resulting in overfitting to supposedly held-out test sets and compromising test validity. We identify an analogous issue, search-time contamination (STC), in evaluating search-based LLM agents which use tools to gather information from online sources when answering user queries. STC occurs when the retrieval step surfaces a source containing the test question (or a near-duplicate) alongside its answer, enabling agents to copy rather than genuinely infer or reason, undermining benchmark integrity. We find that HuggingFace, an online platform hosting evaluation datasets, appears among retrieved sources in search based agent logs. Consequently, agents often explicitly acknowledge discovering question answer pairs from HuggingFace within their reasoning chains. On three commonly used capability benchmarks: Humanity's Last Exam (HLE), SimpleQA, and GPQA, we demonstrate that for approximately 3% of questions, search-based agents directly find the datasets with ground truth labels on HuggingFace. When millions of evaluation queries target the same benchmark, even small, repeated leaks can accelerate the benchmark's obsolescence, shortening its intended lifecycle. After HuggingFace is blocked, we observe a drop in accuracy on the contaminated subset of approximately 15%. We further show through ablation experiments that publicly accessible evaluation datasets on HuggingFace may not be the sole source of STC. To this end, we conclude by proposing best practices for benchmark design and result reporting to address this novel form of leakage and ensure trustworthy evaluation of search-based LLM agents. To facilitate the auditing of evaluation results, we also publicly release the complete logs from our experiments.


【30】AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining
标题:AlphaEval:Alpha公式挖掘的全面有效评估框架
链接:https://arxiv.org/abs/2508.13174

作者:ing, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang
备注:12 pages, 5 figures
摘要:从金融数据中生成预测信号的公式alpha挖掘对量化投资至关重要。尽管各种算法方法(如遗传编程、强化学习和大型语言模型)显著扩展了alpha发现的能力,但系统评估仍然是一个关键挑战。现有的评估指标主要包括回测和基于相关性的度量。回测计算密集、本质上是串行的,并且对特定策略参数敏感。基于相关性的指标虽然高效,但仅评估预测能力,而忽视了时间稳定性、鲁棒性、多样性和可解释性等其他关键属性。此外,大多数现有alpha挖掘模型的闭源特性阻碍了可复现性,并减缓了该领域的进展。为了解决这些问题,我们提出了AlphaEval,一个统一的、可并行化的、无需回测的自动化alpha挖掘模型评估框架。AlphaEval从五个互补的维度评估生成alpha的整体质量:预测能力、稳定性、对市场扰动的鲁棒性、金融逻辑和多样性。在代表性alpha挖掘算法上进行的大量实验表明,AlphaEval实现了与全面回测相当的评估一致性,同时提供了更全面的洞察和更高的效率。此外,与传统的单一指标筛选方法相比,AlphaEval能有效识别出更优的alpha。所有实现和评估工具均已开源,以促进可复现性和社区参与。
摘要:Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches-such as genetic programming, reinforcement learning, and large language models-have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval effectively identifies superior alphas compared to traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.
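
【代码示意】五个维度中的预测能力、稳定性与多样性可以不经回测直接计算:平均RankIC、ICIR、与既有因子池最大相关之补。以下numpy/scipy示意为演示假设,并非AlphaEval的官方实现:

import numpy as np
from scipy.stats import spearmanr

def evaluate_alpha(alpha, fwd_ret, pool):
    """alpha, fwd_ret: (T, N)的因子值与下期收益; pool: 既有因子列表。"""
    ics = []
    for t in range(alpha.shape[0]):
        rho, _ = spearmanr(alpha[t], fwd_ret[t])   # 截面RankIC
        ics.append(rho)
    ics = np.array(ics)
    power = ics.mean()                             # 预测能力: 平均IC
    stability = power / (ics.std() + 1e-12)        # 稳定性: ICIR
    corrs = [abs(spearmanr(alpha.ravel(), p.ravel())[0]) for p in pool]
    diversity = 1 - max(corrs) if corrs else 1.0   # 多样性: 与池内最大|相关|之补
    return {"power": power, "stability": stability, "diversity": diversity}

rng = np.random.default_rng(0)
T, N = 60, 100
ret = rng.normal(size=(T, N))                      # 模拟下期收益
alpha = 0.3 * ret + rng.normal(size=(T, N))        # 一个带信息的模拟因子
print(evaluate_alpha(alpha, ret, pool=[rng.normal(size=(T, N))]))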


【31】Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures
标题:通过NVIDIA、AMD和新兴图形处理器架构的硬件-软件联合设计进行可持续人工智能训练
链接:https://arxiv.org/abs/2508.13163

作者:Makin, Rahul Maliakkal
备注:IEEE CISOSE Industry Track 2025 Conference
摘要:特别是,大规模的深度学习和人工智能模型训练需要大量的计算能力和能源,因此会带来严重的可持续性问题。模型复杂性的快速上升导致能源消耗呈指数级增长,从而增加了对最大化计算效率和降低环境影响的技术的需求。这项工作探讨了环境驱动的性能优化方法,特别适用于NVIDIA、AMD和其他新兴GPU架构的高级GPU架构。我们的主要重点是研究硬件-软件协同设计技术,旨在显着增加内存级和内核级的操作,从而提高性能每瓦的措施。我们的深入研究包括对专门的张量和矩阵核心的评估,先进的内存优化方法,以及创造性的集成方法,这些方法共同导致了显着的能源效率提高。我们还讨论了重要的软件级优化,增强硬件功能,包括混合精度算法,先进的能量感知调度算法,编译器驱动的内核增强。此外,我们有条不紊地指出了重要的研究差距,并提出了创建真正可持续的人工智能系统所需的未来方向。本文强调了如何通过硬件和软件的协同设计来大幅提高训练效率,从而在不影响性能的情况下降低人工智能对环境的影响。为了支持我们的分析,我们使用了来自Meta、谷歌、亚马逊等顶级公司的真实案例研究,这些案例展示了这些可持续的人工智能训练方法如何在现实世界中使用。
摘要:In particular, large-scale deep learning and artificial intelligence model training uses a lot of computational power and energy, so it poses serious sustainability issues. The fast rise in model complexity has resulted in exponential increases in energy consumption, increasing the demand for techniques maximizing computational efficiency and lowering environmental impact. This work explores environmentally driven performance optimization methods especially intended for advanced GPU architectures from NVIDIA, AMD, and other emerging GPU architectures. Our main focus is on investigating hardware-software co-design techniques meant to significantly increase memory-level and kernel-level operations, so improving performance-per-watt measures. Our thorough research encompasses evaluations of specialized tensor and matrix cores, advanced memory optimization methods, and creative integration approaches that taken together result in notable energy efficiency increases. We also discuss important software-level optimizations that augment hardware capability including mixed-precision arithmetic, advanced energy-aware scheduling algorithms, and compiler-driven kernel enhancements. Moreover, we methodically point out important research gaps and suggest future directions necessary to create really sustainable artificial intelligence systems. This paper emphasizes how major increases in training efficiency can be obtained by co-design of hardware and software, so lowering the environmental impact of artificial intelligence without compromising performance. To back up our analysis, we use real-world case studies from top companies like Meta, Google, Amazon, and others that show how these sustainable AI training methods are used in the real world.


【32】Generalisation and benign over-fitting for linear regression onto random functional covariates
标题:随机函数协变量上线性回归的泛化与良性过拟合
链接:https://arxiv.org/abs/2508.13895

作者:nes, Nick Whiteley
摘要:我们研究岭回归与无岭最小二乘回归的理论预测性能:协变量向量来自在潜在度量空间的$n$个随机且未被观测的位置上,对$p$个随机的均方连续函数求值,并叠加加性噪声。这使我们偏离了数据独立同分布的标准假设,进入$n$个协变量向量可交换但一般不独立的设定。在各维度独立、$4$阶矩及其他正则性条件的假设下,我们利用Barzilai与Shamir的最新结果,得到了适配于该随机函数协变量设定的预测超额风险的概率界。我们在$p$相对于$n$增长足够快的情形下导出了收敛速度,说明了模型各要素在决定收敛行为中的相互作用,以及加性协变量噪声在良性过拟合中的作用。
摘要:We study theoretical predictive performance of ridge and ridge-less least-squares regression when covariate vectors arise from evaluating $p$ random, mean-square continuous functions over a latent metric space at $n$ random and unobserved locations, subject to additive noise. This leads us away from the standard assumption of i.i.d. data to a setting in which the $n$ covariate vectors are exchangeable but not independent in general. Under an assumption of independence across dimensions, $4$-th order moment, and other regularity conditions, we obtain probabilistic bounds on a notion of predictive excess risk adapted to our random functional covariate setting, making use of recent results of Barzilai and Shamir. We derive convergence rates in regimes where $p$ grows suitably fast relative to $n$, illustrating interplay between ingredients of the model in determining convergence behaviour and the role of additive covariate noise in benign-overfitting.


【33】Online Conformal Selection with Accept-to-Reject Changes
标题:带接受-拒绝变更的在线共形选择
链接:https://arxiv.org/abs/2508.13838

作者:iu, Huajun Xi, Chi-Man Vong, Hongxin Wei
摘要:从大量候选中选出一部分有前景的候选,在各类科学与现实应用中都至关重要。共形选择为带不确定性量化的候选选择提供了一个无分布、与模型无关的框架。虽然它在离线环境中有效,但应用于数据顺序到达的在线场景时会面临挑战。值得注意的是,共形选择允许撤销先前已选中的候选,这与需要不可逆选择决策的应用不兼容。这一限制在资源密集的序贯流程(如药物发现)中尤为明显:一旦某化合物被推进到后续阶段,撤销就不切实际。为解决该问题,我们将共形选择扩展为在线的"接受-拒绝变更"(ARC)程序:未被选中的数据点之后仍可被重新考虑,而候选一旦被选中,决策即不可逆。具体而言,我们提出了一种新的共形选择方法,即带接受-拒绝变更的在线共形选择(OCS-ARC),它将在线Benjamini-Hochberg程序引入候选选择过程。我们给出理论保证:在独立同分布与可交换数据假设下,OCS-ARC在任意时间步都能将错误发现率(FDR)控制在标称水平或以下。此外,我们从理论上证明该方法可自然推广到多元响应设定。在合成与真实数据集上的大量实验表明,OCS-ARC在所有考察的时间步上保持有效FDR控制的同时,显著提升了相对基线的选择功效。
摘要:Selecting a subset of promising candidates from a large pool is crucial across various scientific and real-world applications. Conformal selection offers a distribution-free and model-agnostic framework for candidate selection with uncertainty quantification. While effective in offline settings, its application to online scenarios, where data arrives sequentially, poses challenges. Notably, conformal selection permits the deselection of previously selected candidates, which is incompatible with applications requiring irreversible selection decisions. This limitation is particularly evident in resource-intensive sequential processes, such as drug discovery, where advancing a compound to subsequent stages renders reversal impractical. To address this issue, we extend conformal selection to an online Accept-to-Reject Changes (ARC) procedure: non-selected data points can be reconsidered for selection later, and once a candidate is selected, the decision is irreversible. Specifically, we propose a novel conformal selection method, Online Conformal Selection with Accept-to-Reject Changes (dubbed OCS-ARC), which incorporates online Benjamini-Hochberg procedure into the candidate selection process. We provide theoretical guarantees that OCS-ARC controls the false discovery rate (FDR) at or below the nominal level at any timestep under both i.i.d. and exchangeable data assumptions. Additionally, we theoretically show that our approach naturally extends to multivariate response settings. Extensive experiments on synthetic and real-world datasets demonstrate that OCS-ARC significantly improves selection power over the baseline while maintaining valid FDR control across all examined timesteps.
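
【代码示意】OCS-ARC把共形p值接入(在线)Benjamini-Hochberg。下面用离线BH版本示意这两步如何衔接,其中共形p值按"分数越高越应入选"的简化方式构造(演示假设;论文使用的是在线BH):

import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """简化的共形p值: 检验"该候选不值得入选"的原假设。
    分数越高越可能入选; p值 = 校准分数不低于它的比例(加一修正)。"""
    calib = np.asarray(calib_scores)
    return np.array([(1 + (calib >= s).sum()) / (len(calib) + 1)
                     for s in test_scores])

def bh_select(pvals, alpha=0.1):
    """Benjamini-Hochberg: 将FDR控制在alpha以下(此处为离线版本)。"""
    m = len(pvals); order = np.argsort(pvals)
    thresh = alpha * np.arange(1, m + 1) / m
    ok = pvals[order] <= thresh
    k = np.max(np.nonzero(ok)[0]) + 1 if ok.any() else 0
    return order[:k]

rng = np.random.default_rng(0)
calib = rng.normal(0, 1, 200)                        # 无效候选的分数
test = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 10)])
sel = bh_select(conformal_pvalues(calib, test))
print(sorted(sel))                                   # 大多落在后10个(真实入选者)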


【34】Smooth Flow Matching
标题:平滑流匹配
链接:https://arxiv.org/abs/2508.13831

作者:an, Anru R. Zhang
备注:86 pages, 7 figures
摘要:功能数据,即,在连续域上观察到的光滑随机函数,在生物医学研究、健康信息学和流行病学等领域越来越多地可用。然而,有效的功能数据的统计分析往往受到诸如隐私约束、稀疏和不规则采样、无限维和非高斯结构等挑战的阻碍。为了解决这些挑战,我们引入了一种名为平滑流匹配(SFM)的新框架,该框架专为功能数据的生成建模而定制,以实现统计分析,而不会暴露敏感的真实数据。基于流匹配思想,SFM构造了一个半参数Copula流来生成无限维的函数数据,不受高斯性或低秩假设的影响。它计算效率高,可以处理不规则的观测,并保证生成函数的平滑性,在现有深度生成方法不适用的情况下提供了一种实用而灵活的解决方案。通过广泛的模拟研究,我们证明了SFM在合成数据质量和计算效率方面的优势。然后,我们应用SFM从MIMIC-IV患者电子健康记录(EHR)纵向数据库中生成临床轨迹数据。我们的分析展示了SFM为下游统计任务生成高质量替代数据的能力,突出了其提高EHR数据在临床应用中的实用性的潜力。
摘要:Functional data, i.e., smooth random functions observed over a continuous domain, are increasingly available in areas such as biomedical research, health informatics, and epidemiology. However, effective statistical analysis for functional data is often hindered by challenges such as privacy constraints, sparse and irregular sampling, infinite dimensionality, and non-Gaussian structures. To address these challenges, we introduce a novel framework named Smooth Flow Matching (SFM), tailored for generative modeling of functional data to enable statistical analysis without exposing sensitive real data. Built upon flow-matching ideas, SFM constructs a semiparametric copula flow to generate infinite-dimensional functional data, free from Gaussianity or low-rank assumptions. It is computationally efficient, handles irregular observations, and guarantees the smoothness of the generated functions, offering a practical and flexible solution in scenarios where existing deep generative methods are not applicable. Through extensive simulation studies, we demonstrate the advantages of SFM in terms of both synthetic data quality and computational efficiency. We then apply SFM to generate clinical trajectory data from the MIMIC-IV patient electronic health records (EHR) longitudinal database. Our analysis showcases the ability of SFM to produce high-quality surrogate data for downstream statistical tasks, highlighting its potential to boost the utility of EHR data for clinical applications.


【35】Optimizing Region of Interest Selection for Effective Embedding in Video Steganography Based on Genetic Algorithms
标题:基于遗传算法优化感兴趣区域选择以实现视频隐写术中的有效嵌入
链接:https://arxiv.org/abs/2508.13710

作者:. Ali, Ramadhan J. Mstafa
备注:19 Pages, 7 Figures, 4 Tables
摘要:随着互联网的广泛使用,越来越需要确保传输数据的安全和隐私。这导致了对视频隐写术的研究越来越集中,视频隐写术是一种将数据隐藏在视频封面内以避免检测的技术。任何隐写方法的有效性都取决于其在不改变原始视频质量的情况下嵌入数据的能力,同时保持高效率。提出了一种新的视频隐写方法,该方法利用遗传算法(GA)来识别视频中的感兴趣区域(ROI)。ROI是视频中最适合数据嵌入的区域。秘密数据在嵌入到封面视频之前使用高级加密标准(AES)加密,这是一种广泛接受的加密标准,利用高达10%的封面视频。此过程确保了嵌入数据的安全性和机密性。评估所提出的方法的性能指标是峰值信噪比(PSNR)和编码和解码时间。实验结果表明,该方法具有较高的嵌入容量和嵌入效率,PSNR在64 ~ 75 dB之间,嵌入后的数据与原始视频几乎没有区别。此外,该方法可以快速地编码和解码数据,使得其对于实时应用是高效的。
摘要:With the widespread use of the internet, there is an increasing need to ensure the security and privacy of transmitted data. This has led to an intensified focus on the study of video steganography, which is a technique that hides data within a video cover to avoid detection. The effectiveness of any steganography method depends on its ability to embed data without altering the original video quality while maintaining high efficiency. This paper proposes a new approach to video steganography, which involves utilizing a Genetic Algorithm (GA) for identifying the Region of Interest (ROI) in the cover video. The ROI is the area in the video that is the most suitable for data embedding. The secret data is encrypted using the Advanced Encryption Standard (AES), which is a widely accepted encryption standard, before being embedded into the cover video, utilizing up to 10% of the cover video. This process ensures the security and confidentiality of the embedded data. The performance metrics for assessing the proposed method are the Peak Signal to Noise Ratio (PSNR) and the encoding and decoding time. The results show that the proposed method has a high embedding capacity and efficiency, with a PSNR ranging between 64 and 75 dB, which indicates that the embedded data is almost indistinguishable from the original video. Additionally, the method can encode and decode data quickly, making it efficient for real-time applications.
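
【代码示意】嵌入链路的两端(AES加密、在选定ROI内做LSB写入)可如下示意;AES来自pycryptodome,ROI此处用固定的左上16x16块代替GA搜索结果(演示假设):

import numpy as np
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad
from Crypto.Random import get_random_bytes

key, iv = get_random_bytes(16), get_random_bytes(16)
cipher = AES.new(key, AES.MODE_CBC, iv)
payload = cipher.encrypt(pad("secret message".encode(), AES.block_size))

frame = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # 一帧灰度图(演示)
roi = frame[:16, :16].reshape(-1).copy()    # 假设GA选出的ROI为左上16x16块
bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
assert bits.size <= roi.size, "ROI容量不足"
roi[:bits.size] = (roi[:bits.size] & 0xFE) | bits   # LSB替换嵌入
frame[:16, :16] = roi.reshape(16, 16)               # 写回帧中
print(len(payload), "字节已嵌入")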


【36】Structural Foundations for Leading Digit Laws: Beyond Probabilistic Mixtures
标题:首位数字定律的结构基础:超越概率混合
链接:https://arxiv.org/abs/2508.13237

作者:Berman
备注:57 pp, 12 figures
摘要:本文提出了一个现代的确定性框架,用于研究数值数据中首位有效数字的分布。我们不依赖传统的概率或基于混合的解释,而是表明:观察到的首位数字频率由数据生成过程的算术、算法与结构性质决定。我们的方法以一个平移不变的函数方程为核心,其通解由显式的"仿射加周期"公式给出。这一结构化公式解释了在经验与数学数据集中遇到的数字分布的多样性,包括与对数或尺度不变轮廓明显偏离的情形。   我们系统分析了有限与无限数据集中的数字分布,讨论了素数与递推关系等确定性序列,并强调了块结构与分形特征的出现。文章对概率模型进行了批判性检验,给出了明确的例子与反例,并讨论了局限性与有待研究的开放问题。总体而言,这项工作为数字现象建立了统一的数学基础,并为在应用与理论背景下建模和分析数字模式提供了通用工具集。
摘要:This article presents a modern deterministic framework for the study of leading significant digit distributions in numerical data. Rather than relying on traditional probabilistic or mixture-based explanations, we demonstrate that the observed frequencies of leading digits are determined by the underlying arithmetic, algorithmic, and structural properties of the data-generating process. Our approach centers on a shift-invariant functional equation, whose general solution is given by explicit affine-plus-periodic formulas. This structural formulation explains the diversity of digit distributions encountered in both empirical and mathematical datasets, including cases with pronounced deviations from logarithmic or scale-invariant profiles.   We systematically analyze digit distributions in finite and infinite datasets, address deterministic sequences such as prime numbers and recurrence relations, and highlight the emergence of block-structured and fractal features. The article provides critical examination of probabilistic models, explicit examples and counterexamples, and discusses limitations and open problems for further research. Overall, this work establishes a unified mathematical foundation for digital phenomena and offers a versatile toolset for modeling and analyzing digit patterns in applied and theoretical contexts.
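
【代码示意】文中讨论的"与对数轮廓的偏离"可以直接数值化:统计一组数的首位数字频率,并与Benford基准 log10(1+1/d) 对比。以下以确定性序列2^n为例(与文中讨论的确定性序列相呼应):

import math
from collections import Counter

def leading_digit(x):
    s = f"{abs(x):.15e}"          # 科学计数法的首个字符即首位数字
    return int(s[0])

data = [2 ** n for n in range(1, 1001)]   # 确定性序列: 2的幂
freq = Counter(leading_digit(x) for x in data)
for d in range(1, 10):
    # 左列为经验频率, 右列为Benford基准 log10(1+1/d)
    print(d, freq[d] / len(data), math.log10(1 + 1 / d))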


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递

本文地址:http://www.python88.com/topic/185785