Machine Learning Academic Digest [12.10]



cs.LG: 143 papers today


Large Language Models (14 papers)

【1】Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Link: https://arxiv.org/abs/2512.08894

Authors: Jakub Krajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram
Abstract: While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that for a fixed token-to-parameter ratio, a simple power law can accurately describe the scaling behavior of log accuracy on multiple popular downstream tasks. Our results show that the direct approach extrapolates better than the previously proposed two-stage procedure, which is prone to compounding errors. Furthermore, we introduce functional forms that predict accuracy across token-to-parameter ratios and account for inference compute under repeated sampling. We validate our findings on models with up to 17B parameters trained on up to 350B tokens across two dataset mixtures. To support reproducibility and encourage future research, we release the complete set of pretraining losses and downstream evaluation results.
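The abstract does not spell out the exact functional form, so the sketch below is only a rough illustration of the direct approach: it assumes a saturating power law of the form log(accuracy) = -a * C^(-b) in the training budget C, fits it to made-up (budget, accuracy) pairs, and extrapolates. All numbers and names are hypothetical.

```python
import numpy as np

# Hypothetical (training budget in FLOPs, downstream accuracy) observations -- illustrative only.
compute = np.array([1e19, 3e19, 1e20, 3e20, 1e21, 3e21])
accuracy = np.array([0.32, 0.41, 0.52, 0.61, 0.69, 0.75])

# Assumed form: log(accuracy) = -a * C^(-b).  Taking logs twice linearizes it:
#   log(-log(accuracy)) = log(a) - b * log(C)
y = np.log(-np.log(accuracy))
x = np.log(compute)
slope, intercept = np.polyfit(x, y, 1)
b_hat, a_hat = -slope, np.exp(intercept)

def predict_accuracy(c):
    """Extrapolate downstream accuracy to a new training budget under the assumed power law."""
    return np.exp(-a_hat * c ** (-b_hat))

print(f"a={a_hat:.3g}, b={b_hat:.3g}, predicted accuracy at C=1e22: {predict_accuracy(1e22):.3f}")
```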


【2】When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Link: https://arxiv.org/abs/2512.08875

Authors: Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng
Abstract: Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary approaches have emerged for adapting LLMs to tabular data generation: (i) fine-tuning smaller models directly on tabular datasets, and (ii) prompting larger models with examples provided in context. In this work, we show that popular implementations from both regimes exhibit a tendency to compromise privacy by reproducing memorized patterns of numeric digits from their training data. To systematically analyze this risk, we introduce a simple No-box Membership Inference Attack (MIA) called LevAtt that assumes adversarial access to only the generated synthetic data and targets the string sequences of numeric digits in synthetic observations. Using this approach, our attack exposes substantial privacy leakage across a wide range of models and datasets, and in some cases, is even a perfect membership classifier on state-of-the-art models. Our findings highlight a unique privacy vulnerability of LLM-based synthetic data generation and the need for effective defenses. To this end, we propose two methods, including a novel sampling strategy that strategically perturbs digits during generation. Our evaluation demonstrates that this approach can defeat these attacks with minimal loss of fidelity and utility of the synthetic data.
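The paper's exact LevAtt scoring rule is not given in the abstract; the sketch below only illustrates the general idea of a string-distance membership signal, flagging a candidate record as a likely training member when a synthetic record contains a numeric string within a small Levenshtein distance of one of the candidate's numbers. Function names, records, and the threshold are all illustrative.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def digit_strings(record: str):
    """Extract the numeric substrings of a record (the part a LevAtt-style attack targets)."""
    return re.findall(r"\d+(?:\.\d+)?", record)

def is_likely_member(candidate: str, synthetic_records, threshold: int = 1) -> bool:
    """Guess 'member' if any synthetic numeric string nearly reproduces a candidate numeric string."""
    cand_digits = digit_strings(candidate)
    for syn in synthetic_records:
        for s in digit_strings(syn):
            if any(levenshtein(s, c) <= threshold for c in cand_digits):
                return True
    return False

synthetic = ["age=47, income=83250.5, zip=90210", "age=31, income=51200.0, zip=10001"]
print(is_likely_member("age=47, income=83251.0, zip=90210", synthetic))  # True: several near-exact numbers
```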


【3】Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents
Link: https://arxiv.org/abs/2512.08870

Authors: Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu
Abstract: LLM agents are widely deployed in complex interactive tasks, yet privacy constraints often preclude centralized optimization and co-evolution across dynamic environments. While Federated Learning (FL) has proven effective on static datasets, its extension to the open-ended self-evolution of agents remains underexplored. Directly applying standard FL is challenging: heterogeneous tasks and sparse, trajectory-level rewards introduce severe gradient conflicts, destabilizing the global optimization process. To bridge this gap, we propose Fed-SE, a Federated Self-Evolution framework for LLM agents. Fed-SE establishes a local evolution-global aggregation paradigm. Locally, agents employ parameter-efficient fine-tuning on filtered, high-return trajectories to achieve stable gradient updates. Globally, Fed-SE aggregates updates within a low-rank subspace that disentangles environment-specific dynamics, effectively reducing negative transfer across clients. Experiments across five heterogeneous environments demonstrate that Fed-SE improves average task success rates by approximately 18% over federated baselines, validating its effectiveness in robust cross-environment knowledge transfer in privacy-constrained deployments.


【4】Multicalibration for LLM-based Code Generation
Link: https://arxiv.org/abs/2512.08810

Authors: Viola Campos, Robin Kuschnereit, Adrian Ulges
Note: Accepted at AI-SQE 2026 (The 1st International Workshop on AI for Software Quality Evaluation: Judgment, Metrics, Benchmarks, and Beyond)
Abstract: As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs - ensuring their confidence scores faithfully represent the true likelihood of code correctness. To do so, we investigate multicalibration, which can capture additional factors about a coding problem, such as complexity, code length, or programming language used. We study four multicalibration approaches on three function synthesis benchmarks, using latest-generation code LLMs (Qwen3 Coder, GPT-OSS, DeepSeek-R1-Distill). Our results demonstrate that multicalibration can yield distinct improvements over both uncalibrated token likelihoods (+1.03 in skill score) and baseline calibrations (+0.37 in skill score). We study the influence of the aforementioned factors in ablations, and make our dataset (consisting of code generations, likelihoods, and correctness labels) available for future research on code LLM calibration.


【5】PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration
Link: https://arxiv.org/abs/2512.08809

Authors: Yi Liu, Weixiang Han, Chengjun Cai, Xingliang Yuan, Cong Wang
Note: Accepted at IEEE INFOCOM 2026 (full version)
Abstract: With the rise of large language models, service providers offer language models as a service, enabling users to fine-tune customized models via uploaded private datasets. However, this raises concerns about sensitive data leakage. Prior methods, relying on differential privacy within device-cloud collaboration frameworks, struggle to balance privacy and utility, exposing users to inference attacks or degrading fine-tuning performance. To address this, we propose PrivTune, an efficient and privacy-preserving fine-tuning framework via Split Learning (SL). The key idea of PrivTune is to inject crafted noise into token representations from the SL bottom model, making each token resemble its n-hop indirect neighbors. PrivTune formulates this as an optimization problem to compute the optimal noise vector, aligning with defense-utility goals. On this basis, it then adjusts the parameters (i.e., mean) of the d_χ-Privacy noise distribution to align with the optimization direction and scales the noise according to token importance to minimize distortion. Experiments on five datasets (covering both classification and generation tasks) against three embedding inversion and three attribute inference attacks show that, using RoBERTa on the Stanford Sentiment Treebank dataset, PrivTune reduces the attack success rate to 10% with only a 3.33% drop in utility performance, outperforming state-of-the-art baselines.


【6】MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones
Link: https://arxiv.org/abs/2512.08211

Authors: Jiaxiang Geng, Lunyu Zhao, Yiyi Lu, Bing Luo
Note: 15 pages, 9 figures, submitted to Mobisys 2026
Abstract: Mobile phones are the most ubiquitous end devices, generating vast amounts of human-authored data and serving as the primary platform for end-side applications. As high-quality public data for large language models (LLMs) approaches exhaustion, on-device fine-tuning provides an opportunity to leverage private user data while preserving privacy. However, existing approaches are predominantly simulation-based or rely on IoT devices and PCs, leaving commodity mobile phones largely unexplored. A key gap is the absence of an open-source framework that enables practical LLM fine-tuning on mobile phones. We present MobileFineTuner, a unified open-source framework that enables end-to-end LLM fine-tuning directly on commodity mobile phones. MobileFineTuner is designed for efficiency, scalability, and usability, supporting full-parameter fine-tuning (Full-FT) and parameter-efficient fine-tuning (PEFT). To address the memory and energy limitations inherent to mobile phones, we introduce system-level optimizations including parameter sharding, gradient accumulation, and energy-aware computation scheduling. We demonstrate the practicality of MobileFineTuner by fine-tuning GPT-2, Gemma 3, and Qwen 2.5 on real mobile phones. Extensive experiments and ablation studies validate the effectiveness of the proposed optimizations and establish MobileFineTuner as a viable foundation for future research on on-device LLM training.
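Of the system-level optimizations listed, gradient accumulation is the easiest to illustrate in isolation. The generic PyTorch loop below (not MobileFineTuner's code) shows how small micro-batches can be processed under a tight memory budget while weights are updated only every few steps.

```python
import torch

def train_with_grad_accumulation(model, loader, optimizer, accum_steps=8, device="cpu"):
    """Generic gradient-accumulation loop: process small micro-batches, but apply the
    optimizer only every `accum_steps` steps, so peak memory stays close to one micro-batch."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(loader):
        inputs, labels = inputs.to(device), labels.to(device)
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        (loss / accum_steps).backward()  # scale so the accumulated gradient matches a large batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```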


【7】Balanced Accuracy: The Right Metric for Evaluating LLM Judges - Explained through Youden's J statistic
Link: https://arxiv.org/abs/2512.08121

Authors: Stephane Collot, Colin Fraser, Justin Zhao, William F. Shen, Timon Willi, Ilias Leontiadis
Note: 9 pages, 5 figures
Abstract: Rigorous evaluation of large language models (LLMs) relies on comparing models by the prevalence of desirable or undesirable behaviors, such as task pass rates or policy violations. These prevalence estimates are produced by a classifier, either an LLM-as-a-judge or human annotators, making the choice of classifier central to trustworthy evaluation. Common metrics used for this choice, such as Accuracy, Precision, and F1, are sensitive to class imbalance and to arbitrary choices of positive class, and can favor judges that distort prevalence estimates. We show that Youden's J statistic is theoretically aligned with choosing the best judge to compare models, and that Balanced Accuracy is an equivalent linear transformation of J. Through both analytical arguments and empirical examples and simulations, we demonstrate how selecting judges using Balanced Accuracy leads to better, more robust classifier selection.
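The two quantities involved have simple closed forms: Youden's J = sensitivity + specificity - 1 and balanced accuracy = (sensitivity + specificity) / 2, so balanced accuracy is the linear transformation (J + 1) / 2. A minimal computation from a binary confusion matrix (with illustrative counts) shows how plain accuracy can look strong under class imbalance while J and balanced accuracy do not:

```python
def judge_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)                           # true-positive rate
    specificity = tn / (tn + fp)                           # true-negative rate
    youden_j = sensitivity + specificity - 1
    balanced_accuracy = (sensitivity + specificity) / 2    # equals (youden_j + 1) / 2
    plain_accuracy = (tp + tn) / (tp + fn + tn + fp)       # sensitive to class imbalance
    return youden_j, balanced_accuracy, plain_accuracy

# Imbalanced example: 950 negatives, 50 positives; a judge that over-predicts the majority class.
j, ba, acc = judge_metrics(tp=25, fn=25, tn=940, fp=10)
print(f"Youden's J={j:.3f}, balanced accuracy={ba:.3f}, plain accuracy={acc:.3f}")
# Plain accuracy is 0.965 despite the judge missing half of the positive behaviors.
```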


【8】Training LLMs for Honesty via Confessions
Link: https://arxiv.org/abs/2512.08093

Authors: Manas Joglekar, Jeremy Chen, Gabriel Wu, Jason Yosinski, Jasmine Wang, Boaz Barak, Amelia Glaese
Abstract: Large language models (LLMs) can be dishonest when reporting on their actions and beliefs -- for example, they may overstate their confidence in factual claims or cover up evidence of covert actions. Such dishonesty may arise due to the effects of reinforcement learning (RL), where challenges with reward shaping can result in a training process that inadvertently incentivizes the model to lie or misrepresent its actions. In this work we propose a method for eliciting an honest expression of an LLM's shortcomings via a self-reported confession. A confession is an output, provided upon request after a model's original answer, that is meant to serve as a full account of the model's compliance with the letter and spirit of its policies and instructions. The reward assigned to a confession during training is solely based on its honesty, and does not impact positively or negatively the main answer's reward. As long as the "path of least resistance" for maximizing confession reward is to surface misbehavior rather than covering it up, this incentivizes models to be honest in their confessions. Our findings provide some justification for this empirical assumption, especially in the case of egregious model misbehavior. To demonstrate the viability of our approach, we train GPT-5-Thinking to produce confessions, and we evaluate its honesty in out-of-distribution scenarios measuring hallucination, instruction following, scheming, and reward hacking. We find that when the model lies or omits shortcomings in its "main" answer, it often confesses to these behaviors honestly, and this confession honesty modestly improves with training. Confessions can enable a number of inference-time interventions including monitoring, rejection sampling, and surfacing issues to the user.


【9】Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders
Link: https://arxiv.org/abs/2512.08077

Authors: Jaron Cohen, Alexander G. Hasson, Sara Tanovic
Abstract: Since the advent of machine learning, interpretability has remained a persistent challenge, becoming increasingly urgent as generative models support high-stakes applications in drug and material discovery. Recent advances in large language model (LLM) architectures have yielded chemistry language models (CLMs) with impressive capabilities in molecular property prediction and molecular generation. However, how these models internally represent chemical knowledge remains poorly understood. In this work, we extend sparse autoencoder techniques to uncover and examine interpretable features within CLMs. Applying our methodology to the Foundation Models for Materials (FM4M) SMI-TED chemistry foundation model, we extract semantically meaningful latent features and analyse their activation patterns across diverse molecular datasets. Our findings reveal that these models encode a rich landscape of chemical concepts. We identify correlations between specific latent features and distinct domains of chemical knowledge, including structural motifs, physicochemical properties, and pharmacological drug classes. Our approach provides a generalisable framework for uncovering latent knowledge in chemistry-focused AI systems. This work has implications for both foundational understanding and practical deployment, with the potential to accelerate computational chemistry research.


【10】CrowdLLM: Building LLM-Based Digital Populations Augmented with Generative Models
Link: https://arxiv.org/abs/2512.07890

Authors: Ryan Feng Lin, Keyu Tian, Hanming Zheng, Congjing Zhang, Li Zeng, Shuai Huang
Abstract: The emergence of large language models (LLMs) has sparked much interest in creating LLM-based digital populations that can be applied to many applications such as social simulation, crowdsourcing, marketing, and recommendation systems. A digital population can reduce the cost of recruiting human participants and alleviate many concerns related to human subject study. However, research has found that most of the existing works rely solely on LLMs and could not sufficiently capture the accuracy and diversity of a real human population. To address this limitation, we propose CrowdLLM that integrates pretrained LLMs and generative models to enhance the diversity and fidelity of the digital population. We conduct theoretical analysis of CrowdLLM regarding its great potential in creating cost-effective, sufficiently representative, scalable digital populations that can match the quality of a real crowd. Comprehensive experiments are also conducted across multiple domains (e.g., crowdsourcing, voting, user rating) and simulation studies which demonstrate that CrowdLLM achieves promising performance in both accuracy and distributional fidelity to human data.


【11】SABER: Small Actions, Big Errors - Safeguarding Mutating Steps in LLM Agents
Link: https://arxiv.org/abs/2512.07850

Authors: Alejandro Cuadron, Pengfei Yu, Yang Liu, Arpit Gupta
Note: submitted to ICLR 2026
Abstract: Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. non-mutating steps and formalize decisive deviations: the earliest action-level divergences that flip success to failure. A logistic regression reveals that each additional deviation in a mutating action reduces the odds of success by up to 92% on Airline and up to 96% on Retail for SoTA models. In contrast, deviations in non-mutating actions have little to no effect. Errors also grow with context length as agents drift from role and act on stale constraints. Motivated by these observations, we introduce SABER, a model-agnostic, gradient-free, test-time safeguard that (i) adds mutation-gated verification, (ii) injects Targeted Reflection before mutating steps, and (iii) performs block-based context cleaning. SABER delivers consistent gains, e.g., Qwen3-Thinking: +28% relative on Airline, +11% on Retail, and +7% on SWE-Bench Verified; Claude: +9%/+7%. We further identify ceiling effects in τ-Bench, where annotation errors and underspecified tasks artificially cap model performance. To address this, we release τ-Bench Verified, which restores benchmark headroom through targeted revisions. Our results argue for action-level analysis, targeted safeguards, and reliable evaluations as prerequisites for robust multi-turn agents.


【12】MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction
Link: https://arxiv.org/abs/2512.07846

Authors: Guoyao Li, Ran He, Shusen Jing, Kayhan Behdin, Yubo Wang, Sundara Raman Ramachandran, Chanh Nguyen, Jian Sheng, Xiaojing Ma, Chuanrui Zhu, Sriram Vasudevan, Muchen Wu, Sayan Ghosh, Lin Su, Qingquan Song, Xiaoqing Wang, Zhipeng Wang, Qing Lan, Yanning Chen, Jingwei Wu, Luke Simon, Wenjing Zhang, Qi Guo, Fedor Borisyuk
Abstract: Large language models (LLMs) excel at capturing semantic nuances and therefore show impressive relevance ranking performance in modern recommendation and search systems. However, they suffer from high computational overhead under industrial latency and throughput requirements. In particular, cross-encoder ranking systems often create long context prefill-heavy workloads, as the model has to be presented with the user, query and item information. To this end, we propose MixLM, a novel LLM-based ranking framework, which significantly improves the system throughput via reducing the input context length, while preserving the semantic strength of cross-encoder rankers. In contrast to a standard ranking system where the context is presented to the model as pure text, we propose to use mix-interaction, a mixture of text and embedding tokens, to represent the input. Specifically, MixLM encodes all items in the catalog into a few embedding tokens and stores them in a nearline cache. The encoded item descriptions are used during online inference, effectively reducing the item length from a few thousand text tokens to a few embedding tokens. We share insights from deploying our MixLM framework to a real-world search application at LinkedIn, including a detailed discussion of our training pipelines, as well as a thorough analysis of our online serving infrastructure optimization. Comparing with strong baselines, MixLM increased throughput by 10.0x under the same latency budget, while maintaining relevance metrics. The efficiency gains delivered by MixLM enabled full-traffic deployment of LLM-powered search, which resulted in a significant 0.47% increase in Daily Active Users (DAU) in online A/B tests.
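As a toy sketch of the mix-interaction idea (not LinkedIn's implementation), the model below embeds the query/user text tokens as usual, projects a cached item vector into a few "embedding tokens", and feeds the concatenation to a small Transformer encoder. All dimensions and module names are made up.

```python
import torch
import torch.nn as nn

class MixInteractionRanker(nn.Module):
    """Toy sketch: mix embedded text tokens with a few cached item-embedding tokens, so the
    ranker never has to re-encode long item descriptions as text at inference time."""
    def __init__(self, vocab_size=32000, d_model=256, n_item_tokens=4, item_vec_dim=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.item_proj = nn.Linear(item_vec_dim, d_model * n_item_tokens)  # cached item vector -> k tokens
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.score_head = nn.Linear(d_model, 1)
        self.n_item_tokens, self.d_model = n_item_tokens, d_model

    def forward(self, text_token_ids, cached_item_vec):
        text = self.tok_emb(text_token_ids)                                              # (B, T, d)
        item = self.item_proj(cached_item_vec).view(-1, self.n_item_tokens, self.d_model)  # (B, k, d)
        mixed = torch.cat([text, item], dim=1)   # text tokens followed by item embedding tokens
        h = self.encoder(mixed)
        return self.score_head(h[:, 0])          # relevance score read off the first position

model = MixInteractionRanker()
score = model(torch.randint(0, 32000, (2, 16)), torch.randn(2, 768))
print(score.shape)  # torch.Size([2, 1])
```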


【13】ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Link: https://arxiv.org/abs/2512.07843

Authors: Long Lian, Sida Wang, Felix Juefei-Xu, Tsu-Jui Fu, Xiuyu Li, Adam Yala, Trevor Darrell, Alane Suhr, Yuandong Tian, Xi Victoria Lin
Abstract: Scaling inference-time computation has enabled Large Language Models (LLMs) to achieve strong reasoning performance, but inherently sequential decoding leads to substantial latency, especially on complex tasks. Recent work on adaptive parallel reasoning aims to improve inference efficiency by decomposing the problem-solving process into concurrent reasoning threads when beneficial. However, existing methods on realistic tasks are either limited to supervised behavior cloning or exhibit significant accuracy drops compared to widely-used sequential long chain-of-thought (CoT) baselines. Moreover, many require customized inference engines, complicating deployment. We introduce ThreadWeaver, a framework for adaptive parallel reasoning that achieves accuracy on par with popular sequential reasoning models of comparable size while significantly reducing inference latency. ThreadWeaver's performance stems from three key innovations: 1) a two-stage parallel trajectory generator that produces large-scale, high-quality CoT data with parallel annotations for supervised fine-tuning; 2) a trie-based training-inference co-design that enables parallel reasoning on any off-the-shelf autoregressive inference engine without modifying position embeddings or KV caches; and 3) a parallelization-aware reinforcement learning framework that teaches the model to balance accuracy with effective parallelization. Across six challenging mathematical reasoning benchmarks, ThreadWeaver trained atop Qwen3-8B achieves accuracy comparable to cutting-edge sequential reasoning models (71.9% on average and 79.9% on AIME24) while delivering up to 1.53x average speedup in token latency, establishing a new Pareto frontier between accuracy and efficiency.


【14】Automating High Energy Physics Data Analysis with LLM-Powered Agents
Link: https://arxiv.org/abs/2512.07785

Authors: Eli Gendreau-Distler, Joshua Ho, Dongwon Kim, Luc Tomas Le Pottier, Haichen Wang, Chengxi Yang
Note: 16 pages, 6 figures, 2 tables, the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) - Machine Learning and the Physical Sciences (ML4PS) workshop (poster)
Abstract: We present a proof-of-principle study demonstrating the use of large language model (LLM) agents to automate a representative high energy physics (HEP) analysis. Using the Higgs boson diphoton cross-section measurement as a case study with ATLAS Open Data, we design a hybrid system that combines an LLM-based supervisor-coder agent with the Snakemake workflow manager. In this architecture, the workflow manager enforces reproducibility and determinism, while the agent autonomously generates, executes, and iteratively corrects analysis code in response to user instructions. We define quantitative evaluation metrics including success rate, error distribution, costs per specific task, and average number of API calls, to assess agent performance across multi-stage workflows. To characterize variability across architectures, we benchmark a representative selection of state-of-the-art LLMs spanning the Gemini and GPT-5 series, the Claude family, and leading open-weight models. While the workflow manager ensures deterministic execution of all analysis steps, the final outputs still show stochastic variation. Although we set the temperature to zero, other sampling parameters (e.g., top-p, top-k) remained at their defaults, and some reasoning-oriented models internally adjust these settings. Consequently, the models do not produce fully deterministic results. This study establishes the first LLM-agent-driven automated data-analysis framework in HEP, enabling systematic benchmarking of model capabilities, stability, and limitations in real-world scientific computing environments. The baseline code used in this work is available at https://huggingface.co/HWresearch/LLM4HEP. This work was accepted as a poster at the Machine Learning and the Physical Sciences (ML4PS) workshop at NeurIPS 2025. The initial submission was made on August 30, 2025.


Graphs (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?
Link: https://arxiv.org/abs/2512.08798

Authors: Jeongwhan Choi, Woosung Kang, Minseo Kim, Jongwoo Kim, Noseong Park
Note: Rejected from LoG 2025 (submitted August 2025)
Abstract: Foundation models pretrained on large data have demonstrated remarkable zero-shot generalization capabilities across domains. Building on the success of TabPFN for tabular data and its recent extension to time series, we investigate whether graph node classification can be effectively reformulated as a tabular learning problem. We introduce TabPFN-GN, which transforms graph data into tabular features by extracting node attributes, structural properties, positional encodings, and optionally smoothed neighborhood features. This enables TabPFN to perform direct node classification without any graph-specific training or language model dependencies. Our experiments on 12 benchmark datasets reveal that TabPFN-GN achieves competitive performance with GNNs on homophilous graphs and consistently outperforms them on heterophilous graphs. These results demonstrate that principled feature engineering can bridge the gap between tabular and graph domains, providing a practical alternative to task-specific GNN training and LLM-dependent graph foundation models.
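The exact feature set of TabPFN-GN is not specified beyond the categories named in the abstract, so the sketch below assembles one plausible tabularization: raw node attributes, degree, Laplacian-eigenvector positional encodings, and mean-aggregated neighbor attributes. The resulting table could then be handed to any tabular classifier such as TabPFN.

```python
import numpy as np

def tabularize_graph(adj: np.ndarray, node_feats: np.ndarray, n_pos: int = 2) -> np.ndarray:
    """Turn a graph node-classification problem into a plain feature table:
    [raw attributes | degree | Laplacian-eigenvector positional encoding | mean of neighbor attributes]."""
    deg = adj.sum(axis=1)
    # Symmetric normalized Laplacian and its smallest non-trivial eigenvectors as positional encodings.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(lap)
    pos_enc = eigvecs[:, 1:n_pos + 1]
    # One round of neighborhood smoothing (mean of the neighbors' attributes).
    smoothed = (adj @ node_feats) / np.maximum(deg, 1.0)[:, None]
    return np.hstack([node_feats, deg[:, None], pos_enc, smoothed])

adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = np.random.rand(4, 3)
table = tabularize_graph(adj, x)
print(table.shape)  # (4, 3 + 1 + 2 + 3) = (4, 9)
```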


【2】Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning
Link: https://arxiv.org/abs/2512.08763

Authors: Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Hewei Wang, Yijie Li, Edith C. H. Ngai
Note: Accepted by KDD 2026
Abstract: Early graph prompt tuning approaches relied on task-specific designs for Graph Neural Networks (GNNs), limiting their adaptability across diverse pre-training strategies. In contrast, another promising line of research has investigated universal graph prompt tuning, which operates directly in the input graph's feature space and builds a theoretical foundation that universal graph prompt tuning can theoretically achieve an equivalent effect of any prompting function, eliminating dependence on specific pre-training strategies. Recent works propose selective node-based graph prompt tuning to pursue more ideal prompts. However, we argue that selective node-based graph prompt tuning inevitably compromises the theoretical foundation of universal graph prompt tuning. In this paper, we strengthen the theoretical foundation of universal graph prompt tuning by introducing stricter constraints, demonstrating that adding prompts to all nodes is a necessary condition for achieving the universality of graph prompts. To this end, we propose a novel model and paradigm, Learning and Editing Universal GrAph Prompt Tuning (LEAP), which preserves the theoretical foundation of universal graph prompt tuning while pursuing more ideal prompts. Specifically, we first build the basic universal graph prompts to preserve the theoretical foundation and then employ actor-critic reinforcement learning to select nodes and edit prompts. Extensive experiments on graph- and node-level tasks across various pre-training strategies in both full-shot and few-shot scenarios show that LEAP consistently outperforms fine-tuning and other prompt-based approaches.


【3】A Hybrid Model for Stock Market Forecasting: Integrating News Sentiment and Time Series Data with Graph Neural Networks
Link: https://arxiv.org/abs/2512.08567

Authors: Nader Sadek, Mirette Moawad, Christina Naguib, Mariam Elzahaby
Note: 11 pages, 6 figures. Published in the Proceedings of the 5th International Conference on Artificial Intelligence Research (ICAIR 2025). Published version available at: https://papers.academic-conferences.org/index.php/icair/article/view/4294
Abstract: Stock market prediction is a long-standing challenge in finance, as accurate forecasts support informed investment decisions. Traditional models rely mainly on historical prices, but recent work shows that financial news can provide useful external signals. This paper investigates a multimodal approach that integrates companies' news articles with their historical stock data to improve prediction performance. We compare a Graph Neural Network (GNN) model with a baseline LSTM model. Historical data for each company is encoded using an LSTM, while news titles are embedded with a language model. These embeddings form nodes in a heterogeneous graph, and GraphSAGE is used to capture interactions between articles, companies, and industries. We evaluate two targets: a binary direction-of-change label and a significance-based label. Experiments on the US equities and Bloomberg datasets show that the GNN outperforms the LSTM baseline, achieving 53% accuracy on the first target and a 4% precision gain on the second. Results also indicate that companies with more associated news yield higher prediction accuracy. Moreover, headlines contain stronger predictive signals than full articles, suggesting that concise news summaries play an important role in short-term market reactions.


【4】Enhancing Explainability of Graph Neural Networks Through Conceptual and Structural Analyses and Their Extensions
Link: https://arxiv.org/abs/2512.08344

Authors: Tien Cuong Bui
Note: 157 pages, Doctoral dissertation at Seoul National University (submitted in 2024.08 to SNU library, slightly updated in 2025.11 for open digital version)
Abstract: Graph Neural Networks (GNNs) have become a powerful tool for modeling and analyzing data with graph structures. The wide adoption in numerous applications underscores the value of these models. However, the complexity of these methods often impedes understanding their decision-making processes. Current Explainable AI (XAI) methods struggle to untangle the intricate relationships and interactions within graphs. Several methods have tried to bridge this gap via a post-hoc approach or self-interpretable design. Most of them focus on graph structure analysis to determine essential patterns that correlate with prediction outcomes. While post-hoc explanation methods are adaptable, they require extra computational resources and may be less reliable due to limited access to the model's internal workings. Conversely, interpretable models can provide immediate explanations, but their generalizability to different scenarios remains a major concern. To address these shortcomings, this thesis seeks to develop a novel XAI framework tailored for graph-based machine learning. The proposed framework aims to offer adaptable, computationally efficient explanations for GNNs, moving beyond individual feature analysis to capture how graph structure influences predictions.


【5】gHAWK: Local and Global Structure Encoding for Scalable Training of Graph Neural Networks on Knowledge Graphs
Link: https://arxiv.org/abs/2512.08274

Authors: Humera Sabir, Fatima Farooq, Ashraf Aboulnaga
Abstract: Knowledge Graphs (KGs) are a rich source of structured, heterogeneous data, powering a wide range of applications. A common approach to leverage this data is to train a graph neural network (GNN) on the KG. However, existing message-passing GNNs struggle to scale to large KGs because they rely on the iterative message passing process to learn the graph structure, which is inefficient, especially under mini-batch training, where a node sees only a partial view of its neighborhood. In this paper, we address this problem and present gHAWK, a novel and scalable GNN training framework for large KGs. The key idea is to precompute structural features for each node that capture its local and global structure before GNN training even begins. Specifically, gHAWK introduces a preprocessing step that computes: (a) Bloom filters to compactly encode local neighborhood structure, and (b) TransE embeddings to represent each node's global position in the graph. These features are then fused with any domain-specific features (e.g., text embeddings), producing a node feature vector that can be incorporated into any GNN technique. By augmenting message-passing training with structural priors, gHAWK significantly reduces memory usage, accelerates convergence, and improves model accuracy. Extensive experiments on large datasets from the Open Graph Benchmark (OGB) demonstrate that gHAWK achieves state-of-the-art accuracy and lower training time on both node property prediction and link prediction tasks, topping the OGB leaderboard for three graphs.
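Only the Bloom-filter half of the precomputation is sketched below: each node's neighbor IDs are hashed into a fixed-length bit vector that can be concatenated with global-position embeddings (e.g., TransE) and domain features before GNN training. Filter size, hash count, and the adjacency structure are arbitrary choices for illustration.

```python
import hashlib
import numpy as np

def bloom_encode(neighbor_ids, n_bits=64, n_hashes=3):
    """Encode a node's neighbor set as a fixed-length Bloom-filter bit vector."""
    bits = np.zeros(n_bits, dtype=np.float32)
    for nid in neighbor_ids:
        for k in range(n_hashes):
            h = hashlib.sha256(f"{nid}-{k}".encode()).digest()
            bits[int.from_bytes(h[:4], "little") % n_bits] = 1.0
    return bits

# Per-node structural feature: Bloom-encoded neighborhood, ready to be fused with
# global-position embeddings (e.g., TransE) and any domain features before GNN training.
adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
node_struct_feats = np.stack([bloom_encode(adjacency[v]) for v in sorted(adjacency)])
print(node_struct_feats.shape)  # (4, 64)
```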


【6】PR-CapsNet: Pseudo-Riemannian Capsule Network with Adaptive Curvature Routing for Graph Learning
Link: https://arxiv.org/abs/2512.08218

Authors: Ye Qin, Jingchao Wang, Yang Shi, Haiying Huang, Junxu Li, Weijian Liu, Tinghui Chen, Jinghui Qin
Note: To appear in WSDM 2026 (ACM International Conference on Web Search and Data Mining)
Abstract: Capsule Networks (CapsNets) show exceptional graph representation capacity via dynamic routing and vectorized hierarchical representations, but they model the complex geometries of real-world graphs poorly in fixed-curvature space due to the inherent geodesical disconnectedness issues, leading to suboptimal performance. Recent works find that non-Euclidean pseudo-Riemannian manifolds provide specific inductive biases for embedding graph data, but how to leverage them to improve CapsNets is still underexplored. Here, we extend the Euclidean capsule routing into geodesically disconnected pseudo-Riemannian manifolds and derive a Pseudo-Riemannian Capsule Network (PR-CapsNet), which models data in pseudo-Riemannian manifolds of adaptive curvature, for graph representation learning. Specifically, PR-CapsNet enhances the CapsNet with Adaptive Pseudo-Riemannian Tangent Space Routing by utilizing pseudo-Riemannian geometry. Unlike single-curvature or subspace-partitioning methods, PR-CapsNet concurrently models hierarchical and cluster or cyclic graph structures via its versatile pseudo-Riemannian metric. It first deploys Pseudo-Riemannian Tangent Space Routing to decompose capsule states into spherical-temporal and Euclidean-spatial subspaces with diffeomorphic transformations. Then, an Adaptive Curvature Routing is developed to adaptively fuse features from different curvature spaces for complex graphs via a learnable curvature tensor with geometric attention from local manifold properties. Finally, a geometric-property-preserving Pseudo-Riemannian Capsule Classifier is developed to project capsule embeddings to tangent spaces and use curvature-weighted softmax for classification. Extensive experiments on node and graph classification benchmarks show PR-CapsNet outperforms SOTA models, validating PR-CapsNet's strong representation power for complex graph structures.


【7】Graph Contrastive Learning via Spectral Graph Alignment
Link: https://arxiv.org/abs/2512.07878

Authors: Manh Nguyen, Joshua Cape
Abstract: Given augmented views of each input graph, contrastive learning methods (e.g., InfoNCE) optimize pairwise alignment of graph embeddings across views while providing no mechanism to control the global structure of the view-specific graph-of-graphs built from these embeddings. We introduce SpecMatch-CL, a novel loss function that aligns the view-specific graphs-of-graphs by minimizing the difference between their normalized Laplacians. Theoretically, we show that under certain assumptions, the difference between normalized Laplacians provides an upper bound not only for the difference between the ideal Perfect Alignment contrastive loss and the current loss, but also for the Uniformity loss. Empirically, SpecMatch-CL establishes a new state of the art on eight TU benchmarks under unsupervised learning and semi-supervised learning at low label rates, and yields consistent gains in transfer learning on the PPI-306K and ZINC 2M datasets.
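How the paper builds the view-specific graph-of-graphs and which norm it minimizes are not stated in the abstract, so the sketch below makes assumptions: it forms a dense similarity graph over each view's batch of embeddings, computes the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, and penalizes the squared Frobenius distance between the two Laplacians, a term that could be added to a standard InfoNCE objective.

```python
import torch
import torch.nn.functional as F

def normalized_laplacian(emb: torch.Tensor, temp: float = 0.5) -> torch.Tensor:
    """Dense similarity graph over a batch of graph embeddings, returned as the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    z = F.normalize(emb, dim=1)
    a = torch.exp(z @ z.t() / temp)
    a = a - torch.diag(torch.diag(a))                       # drop self-loops
    d_inv_sqrt = torch.diag(a.sum(dim=1).clamp_min(1e-12).pow(-0.5))
    return torch.eye(a.size(0)) - d_inv_sqrt @ a @ d_inv_sqrt

def specmatch_loss(emb_view1: torch.Tensor, emb_view2: torch.Tensor) -> torch.Tensor:
    """Spectral alignment term: squared Frobenius distance between the views' normalized Laplacians."""
    return (normalized_laplacian(emb_view1) - normalized_laplacian(emb_view2)).pow(2).sum()

print(specmatch_loss(torch.randn(32, 128), torch.randn(32, 128)).item())
```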


【8】SA^2GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation
Link: https://arxiv.org/abs/2512.07857

Authors: Junhua Shi, Qingyun Sun, Haonan Yuan, Xingcheng Fu
Abstract: Graph Foundation Models (GFMs) have made significant progress in various tasks, but their robustness against domain noise, structural perturbations, and adversarial attacks remains underexplored. A key limitation is the insufficient modeling of hierarchical structural semantics, which are crucial for generalization. In this paper, we propose SA^2GFM, a robust GFM framework that improves domain-adaptive representations through Structure-Aware Semantic Augmentation. First, we encode hierarchical structural priors by transforming entropy-based encoding trees into structure-aware textual prompts for feature augmentation. The enhanced inputs are processed by a self-supervised Information Bottleneck mechanism that distills robust, transferable representations via structure-guided compression. To address negative transfer in cross-domain adaptation, we introduce an expert adaptive routing mechanism, combining a mixture-of-experts architecture with a null expert design. For efficient downstream adaptation, we propose a fine-tuning module that optimizes hierarchical structures through joint intra- and inter-community structure learning. Extensive experiments demonstrate that SA^2GFM outperforms 9 state-of-the-art baselines in terms of effectiveness and robustness against random noise and adversarial perturbations for node and graph classification.


Transformers (5 papers)

【1】Transformers for Multimodal Brain State Decoding: Integrating Functional Magnetic Resonance Imaging Data and Medical Metadata
Link: https://arxiv.org/abs/2512.08462

Authors: Danial Jafarzadeh Jazi, Maryam Hajiesmaeili
Abstract: Decoding brain states from functional magnetic resonance imaging (fMRI) data is vital for advancing neuroscience and clinical applications. While traditional machine learning and deep learning approaches have made strides in leveraging the high-dimensional and complex nature of fMRI data, they often fail to utilize the contextual richness provided by Digital Imaging and Communications in Medicine (DICOM) metadata. This paper presents a novel framework integrating transformer-based architectures with multimodal inputs, including fMRI data and DICOM metadata. By employing attention mechanisms, the proposed method captures intricate spatial-temporal patterns and contextual relationships, enhancing model accuracy, interpretability, and robustness. The potential of this framework spans applications in clinical diagnostics, cognitive neuroscience, and personalized medicine. Limitations, such as metadata variability and computational demands, are addressed, and future directions for optimizing scalability and generalizability are discussed.


【2】Residual-SwinCA-Net: A Channel-Aware Integrated Residual CNN-Swin Transformer for Malignant Lesion Segmentation in BUSI
Link: https://arxiv.org/abs/2512.08243

Authors: Saeeda Naz, Saddam Hussain Khan
Note: 26 pages, 10 figures, 4 tables
Abstract: A novel deep hybrid Residual-SwinCA-Net segmentation framework is proposed in this study to address these challenges by extracting locally correlated and robust features through residual CNN modules. Furthermore, for learning global dependencies, Swin Transformer blocks are customized with internal residual pathways, which reinforce gradient stability, refine local patterns, and facilitate global feature fusion. Beforehand, a Laplacian-of-Gaussian regional operator is applied to enhance tissue continuity, suppress ultrasound noise, and accentuate fine structural transitions, and a boundary-oriented operator is incorporated to maintain the morphological integrity of malignant lesion contours. Subsequently, a stage-wise contraction strategy progressively reduces the feature maps to capture scale invariance and enhance robustness to structural variability. In addition, prior to augmentation, each decoder level integrates a new Multi-Scale Channel Attention and Squeezing (MSCAS) module. The MSCAS selectively emphasizes encoder salient maps and retains discriminative global context and complementary local structures at minimal computational cost while suppressing redundant activations. Finally, the Pixel-Attention module encodes class-relevant spatial cues by adaptively weighting malignant lesion pixels while suppressing background interference. Residual-SwinCA-Net and existing CNN/ViT techniques have been implemented on the publicly available BUSI dataset. The proposed Residual-SwinCA-Net framework outperformed them, achieving 99.29% mean accuracy, 98.74% IoU, and a 0.9041 Dice score for breast lesion segmentation. The proposed Residual-SwinCA-Net framework improves BUSI lesion diagnostic performance and strengthens timely clinical decision-making.


【3】PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection
Link: https://arxiv.org/abs/2512.08143

Authors: Ali Lotfi Rezaabad, Bikram Khanal, Shashwat Chaurasia, Lu Zeng, Dezhi Hong, Hossein Beshashati, Thomas Butler, Megan Ganji
Abstract: Language identification is a crucial first step in multilingual systems such as chatbots and virtual assistants, enabling linguistically and culturally accurate user experiences. Errors at this stage can cascade into downstream failures, setting a high bar for accuracy. Yet, existing language identification tools struggle with key cases--such as music requests where the song title and user language differ. Open-source tools like LangDetect and FastText are fast but less accurate, while large language models, though effective, are often too costly for low-latency or low-resource settings. We introduce PolyLingua, a lightweight Transformer-based model for in-domain language detection and fine-grained language classification. It employs a two-level contrastive learning framework combining instance-level separation and class-level alignment with adaptive margins, yielding compact and well-separated embeddings even for closely related languages. Evaluated on two challenging datasets--Amazon Massive (multilingual digital assistant utterances) and a Song dataset (music requests with frequent code-switching)--PolyLingua achieves 99.25% F1 and 98.15% F1, respectively, surpassing Sonnet 3.5 while using 10x fewer parameters, making it ideal for compute- and latency-constrained environments.
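The abstract does not define the two-level objective precisely, so the following is a loose stand-in for the class-level part only: samples are pulled toward their language centroid and centroids of different languages are pushed apart by at least a fixed cosine margin (the paper's margins are adaptive). Everything here, including the margin value, is hypothetical.

```python
import torch
import torch.nn.functional as F

def class_margin_loss(embeddings: torch.Tensor, labels: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Toy class-level term: pull samples toward their language centroid and push centroids of
    different languages down to a cosine similarity of at most 1 - margin."""
    z = F.normalize(embeddings, dim=1)
    classes = labels.unique()
    centroids = torch.stack([z[labels == c].mean(dim=0) for c in classes])
    idx = (labels.unsqueeze(1) == classes.unsqueeze(0)).float().argmax(dim=1)
    pull = ((z - centroids[idx]) ** 2).sum(dim=1).mean()
    sims = centroids @ centroids.t() - torch.eye(len(classes))   # zero out the diagonal self-similarity
    push = F.relu(sims - (1.0 - margin)).sum() / max(len(classes) * (len(classes) - 1), 1)
    return pull + push

print(class_margin_loss(torch.randn(16, 64), torch.randint(0, 4, (16,))).item())
```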


【4】Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers
Link: https://arxiv.org/abs/2512.07865

Authors: Philipp Stark, Alexandros Sopasakis, Ola Hall, Markus Grillitsch
Abstract: We transform large-scale Swedish register data into textual life trajectories to address two long-standing challenges in data analysis: high cardinality of categorical variables and inconsistencies in coding schemes over time. Leveraging this uniquely comprehensive population register, we convert register data from 6.9 million individuals (2001-2013) into semantically rich texts and predict individuals' residential mobility in later years (2013-2017). These life trajectories combine demographic information with annual changes in residence, work, education, income, and family circumstances, allowing us to assess how effectively such sequences support longitudinal prediction. We compare multiple NLP architectures (including LSTM, DistilBERT, BERT, and Qwen) and find that sequential and transformer-based models capture temporal and semantic structure more effectively than baseline models. The results show that textualized register data preserves meaningful information about individual pathways and supports complex, scalable modeling. Because few countries maintain longitudinal microdata with comparable coverage and precision, this dataset enables analyses and methodological tests that would be difficult or impossible elsewhere, offering a rigorous testbed for developing and evaluating new sequence-modeling approaches. Overall, our findings demonstrate that combining semantically rich register data with modern language models can substantially advance longitudinal analysis in social sciences.


【5】LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model
Link: https://arxiv.org/abs/2512.07855

Authors: Huizheng Wang, Hongbin Wang, Shaojun Wei, Yang Hu, Shouyi Yin
Abstract: Attention-based Transformers have revolutionized natural language processing (NLP) and shown strong performance in computer vision (CV) tasks. However, as the input sequence varies, the computational bottlenecks in Transformer models exhibit dynamic behavior across stages, which calls for a cross-stage sparse acceleration strategy. Unfortunately, most existing sparse Transformer approaches are single-stage based, and their sparsity prediction mechanisms lead to significant power overhead when applied across multiple stages. To this end, this paper proposes a log-domain attention prediction algorithm-architecture co-design, named LAPA. First, an asymmetric leading one computing (ALOC) scheme is designed to eliminate expensive multiplications. Next, a mixed-precision multi-round shifting accumulation (MRSA) mechanism is further proposed to mitigate the accumulation overhead. A data-feature dependent filter (DDF) strategy is designed to work in concert with the MRSA process. Finally, an elaborate accelerator is designed to translate the theoretical enhancement into practical hardware improvement. Experimental results show that LAPA achieves 3.52x, 3.24x and 2.79x higher energy efficiency than the state-of-the-art (SOTA) works Spatten, Sanger and FACT, respectively.
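The core log-domain trick can be illustrated generically: if each operand is replaced by its leading-one position (an integer log2), a multiplication collapses to an addition of exponents, i.e., a hardware shift. The toy code below shows this crude single-term approximation; schemes like the asymmetric leading-one computation and multi-round shifting accumulation described above refine it to recover accuracy.

```python
def leading_one_pos(x: int) -> int:
    """Position of the most significant set bit (leading one) of a positive integer."""
    return x.bit_length() - 1

def log_domain_mul(a: int, b: int) -> int:
    """Approximate a*b by 2^(leading_one(a) + leading_one(b)):
    in the log domain the multiply becomes an addition, realizable as a shift."""
    if a == 0 or b == 0:
        return 0
    return 1 << (leading_one_pos(a) + leading_one_pos(b))

for a, b in [(13, 9), (100, 37), (255, 3)]:
    approx, exact = log_domain_mul(a, b), a * b
    print(f"{a}*{b}: exact={exact}, log-domain approx={approx}, ratio={approx / exact:.2f}")
```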


GAN|对抗|攻击|生成相关(14篇)

【1】Differentially Private Synthetic Data Generation Using Context-Aware GANs
标题:使用上下文感知GAN的差分隐私合成数据生成
链接:https://arxiv.org/abs/2512.08869

作者:Anantaa Kotal,Anupam Joshi
摘要:大数据在各行业的广泛使用引发了重大的隐私问题,特别是在共享或分析敏感信息时。GDPR和HIPAA等法规对数据处理施加了严格的控制,因此很难平衡洞察需求和隐私要求。合成数据通过创建反映真实模式而不暴露敏感信息的人工数据集提供了一个很有前途的解决方案。然而,传统的合成数据方法往往无法捕获将数据的不同元素联系起来、并且在医疗保健等领域至关重要的复杂隐式规则。它们可能会重现显式模式,但忽略了那些没有被直接陈述、却对真实性和实用性至关重要的特定领域约束。例如,限制某些药物用于特定病症或防止有害药物相互作用的处方指南可能不会明确出现在原始数据中。在缺乏这些隐式规则的情况下生成的合成数据可能会产生医学上不恰当或不真实的档案。为了解决这一差距,我们提出了ContextGAN,这是一种上下文感知的差分隐私生成对抗网络,它通过对显式和隐式知识进行编码的约束矩阵来集成特定领域的规则。约束感知的判别器根据这些规则评估合成数据,以确保遵守领域约束,而差分隐私则保护原始数据中的敏感细节。我们在医疗保健、安全和金融领域验证了ContextGAN,表明它产生了尊重领域规则并保护隐私的高质量合成数据。我们的研究结果表明,ContextGAN通过强制执行领域约束来提高真实性和实用性,使其适用于需要在严格的隐私保证下同时遵守显式模式和隐式规则的应用。
摘要 :The widespread use of big data across sectors has raised major privacy concerns, especially when sensitive information is shared or analyzed. Regulations such as GDPR and HIPAA impose strict controls on data handling, making it difficult to balance the need for insights with privacy requirements. Synthetic data offers a promising solution by creating artificial datasets that reflect real patterns without exposing sensitive information. However, traditional synthetic data methods often fail to capture complex, implicit rules that link different elements of the data and are essential in domains like healthcare. They may reproduce explicit patterns but overlook domain-specific constraints that are not directly stated yet crucial for realism and utility. For example, prescription guidelines that restrict certain medications for specific conditions or prevent harmful drug interactions may not appear explicitly in the original data. Synthetic data generated without these implicit rules can lead to medically inappropriate or unrealistic profiles. To address this gap, we propose ContextGAN, a Context-Aware Differentially Private Generative Adversarial Network that integrates domain-specific rules through a constraint matrix encoding both explicit and implicit knowledge. The constraint-aware discriminator evaluates synthetic data against these rules to ensure adherence to domain constraints, while differential privacy protects sensitive details from the original data. We validate ContextGAN across healthcare, security, and finance, showing that it produces high-quality synthetic data that respects domain rules and preserves privacy. Our results demonstrate that ContextGAN improves realism and utility by enforcing domain constraints, making it suitable for applications that require compliance with both explicit patterns and implicit rules under strict privacy guarantees.


【2】Secure and Privacy-Preserving Federated Learning for Next-Generation Underground Mine Safety
标题:安全且保护隐私的联邦学习,以实现下一代地下矿山安全
链接:https://arxiv.org/abs/2512.08862

作者:Mohamed Elmahallawy,Sanjay Madria,Samuel Frimpong
摘要:地下采矿作业依赖传感器网络来监测温度、气体浓度和矿工移动等关键参数,从而实现及时的危险检测和安全决策。然而,将原始传感器数据传输到集中式服务器进行机器学习(ML)模型训练会引发严重的隐私和安全问题。联邦学习(FL)提供了一个很有前途的替代方案,它支持分散的模型训练,而不会暴露敏感的本地数据。然而,在地下采矿中应用FL提出了独特的挑战:(i)对手可能会窃听共享模型更新,以发起模型反演或成员推断攻击,从而损害数据隐私和操作安全;(ii)跨矿山的非IID数据分布以及传感器噪声可能会阻碍模型收敛。为了解决这些问题,我们提出了FedMining--一个为地下采矿量身定制的隐私保护FL框架。FedMining引入了两个核心创新:(1)去中心化函数加密(DFE)方案,保持本地模型加密,阻止未经授权的访问和推断攻击;(2)平衡聚合机制,以减轻数据异构性并增强收敛性。对真实世界采矿数据集的评估表明,FedMining能够保护隐私,同时保持高模型准确性,并在减少通信和计算开销的情况下实现快速收敛。这些优点使得FedMining在实时地下安全监控方面既安全又实用。
摘要:Underground mining operations depend on sensor networks to monitor critical parameters such as temperature, gas concentration, and miner movement, enabling timely hazard detection and safety decisions. However, transmitting raw sensor data to a centralized server for machine learning (ML) model training raises serious privacy and security concerns. Federated Learning (FL) offers a promising alternative by enabling decentralized model training without exposing sensitive local data. Yet, applying FL in underground mining presents unique challenges: (i) Adversaries may eavesdrop on shared model updates to launch model inversion or membership inference attacks, compromising data privacy and operational safety; (ii) Non-IID data distributions across mines and sensor noise can hinder model convergence. To address these issues, we propose FedMining--a privacy-preserving FL framework tailored for underground mining. FedMining introduces two core innovations: (1) a Decentralized Functional Encryption (DFE) scheme that keeps local models encrypted, thwarting unauthorized access and inference attacks; and (2) a balancing aggregation mechanism to mitigate data heterogeneity and enhance convergence. Evaluations on real-world mining datasets demonstrate FedMining's ability to safeguard privacy while maintaining high model accuracy and achieving rapid convergence with reduced communication and computation overhead. These advantages make FedMining both secure and practical for real-time underground safety monitoring.


【3】Generation is Required for Data-Efficient Perception
标题:数据高效感知需要生成
链接:https://arxiv.org/abs/2512.08854

作者:Jack Brady,Bernhard Schölkopf,Thomas Kipf,Simon Buchholz,Wieland Brendel
备注:Preprint
摘要:有一种假设认为,人类水平的视觉感知需要一种生成式方法,其内部表示来自对解码器的反转。然而,如今最成功的视觉模型是非生成式的,依赖于将图像映射到表示的编码器,而无需解码器反转。这就提出了一个问题:生成是否确实是机器实现人类水平视觉感知所必需的。为了解决这个问题,我们研究生成式和非生成式方法是否可以实现组合泛化,这是人类感知的标志。在一个组合式数据生成过程下,我们形式化了在基于解码器(生成式)和基于编码器(非生成式)的方法中保证组合泛化所需的归纳偏置。然后,我们从理论上表明,使用正则化或架构约束在编码器上强制执行这些归纳偏置通常是不可行的。相反,对于生成式方法,这些归纳偏置可以直接强制执行,从而通过约束解码器并对其进行反转来实现组合泛化。我们强调了这种反转如何高效执行:既可以在线通过基于梯度的搜索,也可以离线通过生成式重放。我们通过在真实感图像数据集上训练一系列生成式和非生成式方法来检验我们理论的经验意义。我们发现,如果没有必要的归纳偏置,非生成式方法往往无法进行组合泛化,需要大规模预训练或额外监督来提高泛化能力。相比之下,生成式方法通过在解码器上利用合适的归纳偏置以及搜索和重放,在不需要额外数据的情况下显著提升了组合泛化能力。
摘要:It has been hypothesized that human-level visual perception requires a generative approach in which internal representations result from inverting a decoder. Yet today's most successful vision models are non-generative, relying on an encoder that maps images to representations without decoder inversion. This raises the question of whether generation is, in fact, necessary for machines to achieve human-level visual perception. To address this, we study whether generative and non-generative methods can achieve compositional generalization, a hallmark of human perception. Under a compositional data generating process, we formalize the inductive biases required to guarantee compositional generalization in decoder-based (generative) and encoder-based (non-generative) methods. We then show theoretically that enforcing these inductive biases on encoders is generally infeasible using regularization or architectural constraints. In contrast, for generative methods, the inductive biases can be enforced straightforwardly, thereby enabling compositional generalization by constraining a decoder and inverting it. We highlight how this inversion can be performed efficiently, either online through gradient-based search or offline through generative replay. We examine the empirical implications of our theory by training a range of generative and non-generative methods on photorealistic image datasets. We find that, without the necessary inductive biases, non-generative methods often fail to generalize compositionally and require large-scale pretraining or added supervision to improve generalization. By comparison, generative methods yield significant improvements in compositional generalization, without requiring additional data, by leveraging suitable inductive biases on a decoder along with search and replay.
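下面给出一个示意性的小例子,说明摘要中提到的"通过基于梯度的在线搜索反转解码器":给定一个可微的解码器和一张观测图像,在潜在空间中用梯度下降搜索使重建误差最小的表示。这只是一个在假设的玩具解码器上的草图,并非原论文的实现;其中 decoder、latent_dim 等均为示例性假设。

import torch

def invert_decoder(decoder, x, latent_dim, steps=500, lr=0.05):
    """通过基于梯度的搜索反转解码器:寻找 z 使 decoder(z) 逼近观测 x。
    decoder 可以是任意可微的生成器(假设其输入形状为 [batch, latent_dim])。"""
    z = torch.zeros(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = decoder(z)
        loss = torch.mean((recon - x) ** 2)  # 重建误差作为搜索目标
        loss.backward()
        opt.step()
    return z.detach()

# 用法示例(假设的玩具解码器与随机"图像",仅作演示):
if __name__ == "__main__":
    latent_dim, img_dim = 8, 64
    decoder = torch.nn.Sequential(
        torch.nn.Linear(latent_dim, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, img_dim),
    )
    x = torch.randn(4, img_dim)
    z_hat = invert_decoder(decoder, x, latent_dim)
    print(z_hat.shape)  # torch.Size([4, 8])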


【4】Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models
标题:预测失败:揭露天气预测模型中的逃避攻击
链接:https://arxiv.org/abs/2512.08832

作者:Huzaifa Arif,Pin-Yu Chen,Alex Gittens,James Diffenderfer,Bhavya Kailkhura
摘要:随着人们越来越依赖人工智能模型进行天气预报,评估其对对抗性扰动的脆弱性势在必行。这项工作介绍了天气自适应对抗扰动优化(WAAPO),这是一种用于生成有针对性的对抗扰动的新框架,它既能有效地操纵预报,又足够隐蔽以避免被检测。WAAPO通过结合通道稀疏性、空间局部性和平滑性的约束来实现这一点,确保扰动在物理上保持真实且不可感知。使用ERA5数据集和FourCastNet(Pathak et al. 2022),我们证明了WAAPO即使在约束条件下也能生成与预定义目标紧密对齐的对抗轨迹。我们的实验突出了人工智能驱动的预报模型中的关键漏洞:对初始条件的微小扰动可能导致预测天气模式的重大偏差。这些研究结果强调,有必要采取强有力的保障措施,以防止业务预报系统中的对抗性利用。
摘要:With the increasing reliance on AI models for weather forecasting, it is imperative to evaluate their vulnerability to adversarial perturbations. This work introduces Weather Adaptive Adversarial Perturbation Optimization (WAAPO), a novel framework for generating targeted adversarial perturbations that are both effective in manipulating forecasts and stealthy to avoid detection. WAAPO achieves this by incorporating constraints for channel sparsity, spatial localization, and smoothness, ensuring that perturbations remain physically realistic and imperceptible. Using the ERA5 dataset and FourCastNet (Pathak et al. 2022), we demonstrate WAAPO's ability to generate adversarial trajectories that align closely with predefined targets, even under constrained conditions. Our experiments highlight critical vulnerabilities in AI-driven forecasting models, where small perturbations to initial conditions can result in significant deviations in predicted weather patterns. These findings underscore the need for robust safeguards to protect against adversarial exploitation in operational forecasting systems.
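作为示意,下面的草图展示了一种在初始场上优化扰动、同时加入通道稀疏(组套索式惩罚)与空间平滑(总变差惩罚)约束的常见做法;它只是对摘要思路的粗略演示,并非WAAPO的原始实现,其中 forecast_model、x0、target、各惩罚系数均为假设输入。

import torch

def constrained_perturbation(forecast_model, x0, target, steps=200, lr=0.01,
                             lam_sparse=1e-3, lam_smooth=1e-3):
    """示意性草图:优化初始场扰动 delta,使预报结果靠近给定目标,
    同时用通道级惩罚鼓励通道稀疏、用总变差惩罚鼓励空间平滑。
    x0 的形状假设为 [C, H, W]。"""
    delta = torch.zeros_like(x0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = forecast_model(x0 + delta)
        attack_loss = torch.mean((pred - target) ** 2)
        # 通道稀疏:先对每个通道求能量,再整体求和(组套索风格)
        channel_sparsity = torch.sum(torch.sqrt(torch.sum(delta ** 2, dim=(1, 2)) + 1e-8))
        # 空间平滑:相邻格点差分的总变差
        tv = torch.mean(torch.abs(delta[:, 1:, :] - delta[:, :-1, :])) + \
             torch.mean(torch.abs(delta[:, :, 1:] - delta[:, :, :-1]))
        loss = attack_loss + lam_sparse * channel_sparsity + lam_smooth * tv
        loss.backward()
        opt.step()
    return delta.detach()

# 用法示例(用一个随机卷积层代替真实预报模型,仅作演示):
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
x0, target = torch.randn(3, 32, 32), torch.randn(3, 32, 32)
delta = constrained_perturbation(lambda x: model(x.unsqueeze(0)).squeeze(0), x0, target, steps=20)
print(delta.shape)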


【5】De novo generation of functional terpene synthases using TpsGPT
标题:使用TpsGPT从头生成功能性萜烯合酶
链接:https://arxiv.org/abs/2512.08772

作者:Hamsini Ramanathan,Roman Bushuiev,Matouš Soldát,Jirí Kohout,Téo Hebra,Joshua David Smith,Josef Sivic,Tomáš Pluskal
备注:11 pages, 8 figures, Accepted at the NeurIPS 2025 AI for Science and MLSB 2025 workshops
摘要:萜烯合酶(TPS)是一个关键的酶家族,负责产生多种萜烯骨架,这些骨架支撑着许多天然产物,包括紫杉醇等一线抗癌药物。然而,通过定向进化从头设计TPS成本高昂且速度缓慢。我们介绍了TpsGPT,这是一个用于可扩展TPS蛋白质设计的生成模型,它通过在从UniProt挖掘的79k条TPS序列上微调蛋白质语言模型ProtGPT2而构建。TpsGPT在计算机中(in silico)生成了从头设计的候选酶,我们使用多种验证指标对其进行了评估,包括EnzymeExplorer分类、ESMFold结构置信度(pLDDT)、序列多样性、CLEAN分类、InterPro结构域检测和Foldseek结构比对。从28k条生成序列的初始池中,我们鉴定出7种满足所有验证标准的推定TPS酶。实验验证证实了其中至少两条序列具有TPS酶活性。我们的研究结果表明,在精心整理的、特定酶类的数据集上微调蛋白质语言模型,并结合严格的过滤,可以从头生成功能性的、进化上遥远的酶。
摘要:Terpene synthases (TPS) are a key family of enzymes responsible for generating the diverse terpene scaffolds that underpin many natural products, including front-line anticancer drugs such as Taxol. However, de novo TPS design through directed evolution is costly and slow. We introduce TpsGPT, a generative model for scalable TPS protein design, built by fine-tuning the protein language model ProtGPT2 on 79k TPS sequences mined from UniProt. TpsGPT generated de novo enzyme candidates in silico and we evaluated them using multiple validation metrics, including EnzymeExplorer classification, ESMFold structural confidence (pLDDT), sequence diversity, CLEAN classification, InterPro domain detection, and Foldseek structure alignment. From an initial pool of 28k generated sequences, we identified seven putative TPS enzymes that satisfied all validation criteria. Experimental validation confirmed TPS enzymatic activity in at least two of these sequences. Our results show that fine-tuning of a protein language model on a carefully curated, enzyme-class-specific dataset, combined with rigorous filtering, can enable the de novo generation of functional, evolutionarily distant enzymes.


【6】Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery
标题:通过多摄像头无影灯无干扰地生成开放手术视频
链接:https://arxiv.org/abs/2512.08577

作者:Yuna Kato,Shohei Mori,Hideo Saito,Yoshifumi Takatsume,Hiroki Kajita,Mariko Isogawa
摘要:开放式手术的视频记录在教育和研究方面有很大需求。然而,拍摄无遮挡的视频具有挑战性,因为外科医生经常挡住相机视野。为了避免遮挡,必须频繁调整摄像机的位置和角度,这是高度劳动密集型的。先前的工作已经通过在无影灯上安装多个摄像机并将它们布置成完全围绕手术区域来解决这个问题。这种设置增加了某些摄像机捕捉到无遮挡视图的机会。然而,由于外科医生每次为获得最佳照明而移动灯时相机配置都会发生变化,因此在后处理中需要手动图像对齐。本文旨在完全自动化这一对齐任务。所提出的方法识别照明系统发生移动的帧,对其重新对齐,并选择遮挡最少的摄像机,从而生成以固定视角持续呈现手术野的视频。涉及外科医生的用户研究表明,在确认手术区域的便捷性和观看视频的舒适度方面,我们的方法生成的视频优于传统方法生成的视频。此外,我们的方法在视频质量上也优于现有技术。另外,我们为所提出的视图合成方法实现了多个合成选项,并进行了用户研究以评估外科医生对各选项的偏好。
摘要 :Video recordings of open surgeries are greatly required for education and research purposes. However, capturing unobstructed videos is challenging since surgeons frequently block the camera field of view. To avoid occlusion, the positions and angles of the camera must be frequently adjusted, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area. This setup increases the chances of some cameras capturing an unobstructed view. However, manual image alignment is needed in post-processing since camera configurations change every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort during video viewing. Additionally, our approach showed improvements in video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.


【7】Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset
标题:评估前沿人工智能模型的生物威胁基准生成框架III:实施细菌生物威胁基准(B3)数据集
链接:https://arxiv.org/abs/2512.08459

作者:Gary Ackerman,Theodore Wilson,Zachary Kallenborn,Olivia Shoemaker,Anna Wetzel,Hayley Peterson,Abigail Danfora,Jenna LaTourette,Brandon Behlendorf,Douglas Clifford
备注:19 pages, 2 figures
摘要:快速发展的前沿人工智能(AI)模型,特别是大型语言模型(LLM),促进生物恐怖主义或获得生物武器的潜力已经引起了重大的政策、学术和公众关注。模型开发者和政策制定者都力求量化和减轻任何风险,其中一个重要内容是开发可评估特定模型所构成的生物安保风险的模型基准。本文讨论了细菌生物威胁基准(B3)数据集的试点实施。这是描述整体生物威胁基准生成(BBG)框架的三篇系列论文中的第三篇,之前的论文详细介绍了B3数据集的开发。试点工作包括在一个样例前沿人工智能模型上运行这些基准,然后对模型响应进行人工评估,并对结果从多个维度进行应用风险分析。总体而言,试点表明,B3数据集提供了一种可行的、细致入微的方法,可用于快速评估LLM造成的生物安保风险,确定该风险的主要来源,并为优先缓解的领域提供指导。
摘要:The potential for rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has generated significant policy, academic, and public concern. Both model developers and policymakers seek to quantify and mitigate any risk, with an important element of such efforts being the development of model benchmarks that can assess the biosecurity risk posed by a particular model. This paper discusses the pilot implementation of the Bacterial Biothreat Benchmark (B3) dataset. It is the third in a series of three papers describing an overall Biothreat Benchmark Generation (BBG) framework, with previous papers detailing the development of the B3 dataset. The pilot involved running the benchmarks through a sample frontier AI model, followed by human evaluation of model responses, and an applied risk analysis of the results along several dimensions. Overall, the pilot demonstrated that the B3 dataset offers a viable, nuanced method for rapidly assessing the biosecurity risk posed by a LLM, identifying the key sources of that risk and providing guidance for priority areas of mitigation priority.


【8】Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process
标题:评估前沿人工智能模型的生物威胁基准生成框架II:基准生成过程
链接:https://arxiv.org/abs/2512.08451

作者:Gary Ackerman,Zachary Kallenborn,Anna Wetzel,Hayley Peterson,Jenna LaTourette,Olivia Shoemaker,Brandon Behlendorf,Sheriff Almakki,Doug Clifford,Noah Sheinbaum
备注:18 pages, 3 figures
摘要:快速发展的前沿人工智能(AI)模型,特别是大型语言模型(LLM),促进生物恐怖主义或获得生物武器的潜力已经引起了重大的政策、学术和公众关注。模型开发者和政策制定者都力求量化和减轻任何风险,其中一个重要内容是开发可评估特定模型所构成的生物安保风险的模型基准。本文是一系列三篇文章中的第二篇,描述了新型生物威胁基准生成(BBG)框架的第二个组成部分:细菌生物威胁基准(B3)数据集的生成。开发过程涉及三种互补的方法:1)基于网络的提示词生成,2)红队测试,以及3)挖掘现有的基准语料库,以生成与项目第一部分期间开发的任务查询架构相关联的7,000多个潜在基准。经过去重处理,再对提升(uplift)的诊断性进行评估,并采取一般质量控制措施,最终将候选基准缩减为1,010个。这一程序确保了这些基准(a)在衡量提升方面具有诊断性;(b)与生物安保威胁直接相关;(c)与更大的生物安保架构保持一致,从而能够在不同分析层面进行细致入微的分析。
摘要:The potential for rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has generated significant policy, academic, and public concern. Both model developers and policymakers seek to quantify and mitigate any risk, with an important element of such efforts being the development of model benchmarks that can assess the biosecurity risk posed by a particular model. This paper, the second in a series of three, describes the second component of a novel Biothreat Benchmark Generation (BBG) framework: the generation of the Bacterial Biothreat Benchmark (B3) dataset. The development process involved three complementary approaches: 1) web-based prompt generation, 2) red teaming, and 3) mining existing benchmark corpora, to generate over 7,000 potential benchmarks linked to the Task-Query Architecture that was developed during the first component of the project. A process of de-duplication, followed by an assessment of uplift diagnosticity, and general quality control measures, reduced the candidates to a set of 1,010 final benchmarks. This procedure ensured that these benchmarks are a) diagnostic in terms of providing uplift; b) directly relevant to biosecurity threats; and c) are aligned with a larger biosecurity architecture permitting nuanced analysis at different levels of analysis.


【9】Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata
标题:条件形态发生:通过神经元胞自动机涌现式生成结构化数字
链接:https://arxiv.org/abs/2512.08360

作者:Ali Sakour
备注:13 pages, 5 figures. Code available at: https://github.com/alisakour/Conditional-NCA-Digits
摘要:生物系统表现出显著的形态发生可塑性,单个基因组可以编码由局部化学信号触发的各种特化细胞结构。在深度学习领域,可微分神经元胞自动机(NCA)已经成为模仿这种自组织的一种范式。然而,现有的NCA研究主要集中在连续纹理合成或单目标对象恢复上,类条件结构生成的挑战在很大程度上仍未被探索。在这项工作中,我们提出了一种新的条件神经元胞自动机(c-NCA)架构,能够仅在空间广播的类别向量引导下,从单一的通用种子中生长出不同的拓扑结构(特别是MNIST数字)。与依赖全局感受野的传统生成模型(例如GAN、VAE)不同,我们的模型强制执行严格的局部性和平移等变性。我们证明,通过将one-hot条件注入细胞感知场,一组局部规则可以学会打破对称性并自组装成十个不同的几何吸引子。实验结果表明,我们的c-NCA实现了稳定的收敛,能从单个像素正确地形成数字拓扑,并表现出生物系统特有的鲁棒性。这项工作弥合了基于纹理的NCA与结构化模式形成之间的差距,为条件生成提供了一种轻量级、生物学上合理的替代方案。
摘要:Biological systems exhibit remarkable morphogenetic plasticity, where a single genome can encode various specialized cellular structures triggered by local chemical signals. In the domain of Deep Learning, Differentiable Neural Cellular Automata (NCA) have emerged as a paradigm to mimic this self-organization. However, existing NCA research has predominantly focused on continuous texture synthesis or single-target object recovery, leaving the challenge of class-conditional structural generation largely unexplored. In this work, we propose a novel Conditional Neural Cellular Automata (c-NCA) architecture capable of growing distinct topological structures - specifically MNIST digits - from a single generic seed, guided solely by a spatially broadcasted class vector. Unlike traditional generative models (e.g., GANs, VAEs) that rely on global reception fields, our model enforces strict locality and translation equivariance. We demonstrate that by injecting a one-hot condition into the cellular perception field, a single set of local rules can learn to break symmetry and self-assemble into ten distinct geometric attractors. Experimental results show that our c-NCA achieves stable convergence, correctly forming digit topologies from a single pixel, and exhibits robustness characteristic of biological systems. This work bridges the gap between texture-based NCAs and structural pattern formation, offering a lightweight, biologically plausible alternative for conditional generation.
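下面是一个条件NCA单步更新的示意性草图:感知场由恒等核加Sobel梯度核的逐通道卷积构成,再把空间广播的one-hot类别向量拼接进去,由一个小型逐像素网络给出状态增量并随机异步更新。通道数、隐藏维度、更新率等均为假设的超参数,并非原论文配置。

import torch
import torch.nn.functional as F

class ConditionalNCA(torch.nn.Module):
    """示意性 c-NCA:感知 = 恒等 + Sobel 梯度;条件 = 空间广播的 one-hot 类别向量。"""
    def __init__(self, channels=16, num_classes=10, hidden=64):
        super().__init__()
        self.channels = channels
        ident = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])                    # [3,3,3]
        self.register_buffer("kernels", kernels.repeat(channels, 1, 1).unsqueeze(1))
        self.update = torch.nn.Sequential(
            torch.nn.Conv2d(channels * 3 + num_classes, hidden, 1), torch.nn.ReLU(),
            torch.nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, state, onehot, fire_rate=0.5):
        b, c, h, w = state.shape
        percep = F.conv2d(state, self.kernels, padding=1, groups=self.channels)  # 逐通道感知
        cond = onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)                     # 空间广播条件
        delta = self.update(torch.cat([percep, cond], dim=1))
        mask = (torch.rand(b, 1, h, w, device=state.device) < fire_rate).float() # 随机异步更新
        return state + delta * mask

# 用法:从中心的单个种子像素出发迭代生长(仅演示前向传播,未训练)
nca = ConditionalNCA()
state = torch.zeros(1, 16, 28, 28); state[:, 3:, 14, 14] = 1.0
onehot = F.one_hot(torch.tensor([7]), num_classes=10).float()
for _ in range(30):
    state = nca(state, onehot)
print(state.shape)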


【10】Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise in Infinite, Real-Time Terrain Generation
标题:地形扩散:无限实时地形生成中Perlin噪声的基于扩散的后继方法
链接:https://arxiv.org/abs/2512.08309

作者:Alexander Goslin
备注:Project website: https://xandergos.github.io/terrain-diffusion/ Code: https://github.com/xandergos/terrain-diffusion/
摘要:几十年来,程序世界一直建立在程序噪声函数(如Perlin噪声)上,这些噪声函数快速且无限,但在现实主义和大规模一致性方面受到根本限制。我们介绍了Terrain Diffusion,这是Perlin噪声的AI时代的继承者,它将扩散模型的保真度与程序噪声不可或缺的属性联系起来:无缝无限范围,种子一致性和恒定时间随机访问。其核心是InfiniteDiffusion,这是一种用于无限生成的新颖算法,可以无缝,实时地合成无限的景观。扩散模型的分层堆栈将行星背景与局部细节相结合,而紧凑的拉普拉斯编码则在地球尺度的动态范围内稳定输出。一个开源的无限张量框架支持无限张量的恒定内存操作,并且几步一致性蒸馏可以实现高效的生成。这些组件共同建立了扩散模型,作为程序化世界生成的实际基础,能够连贯地、可控地、无限制地合成整个行星。
摘要:For decades, procedural worlds have been built on procedural noise functions such as Perlin noise, which are fast and infinite, yet fundamentally limited in realism and large-scale coherence. We introduce Terrain Diffusion, an AI-era successor to Perlin noise that bridges the fidelity of diffusion models with the properties that made procedural noise indispensable: seamless infinite extent, seed-consistency, and constant-time random access. At its core is InfiniteDiffusion, a novel algorithm for infinite generation, enabling seamless, real-time synthesis of boundless landscapes. A hierarchical stack of diffusion models couples planetary context with local detail, while a compact Laplacian encoding stabilizes outputs across Earth-scale dynamic ranges. An open-source infinite-tensor framework supports constant-memory manipulation of unbounded tensors, and few-step consistency distillation enables efficient generation. Together, these components establish diffusion models as a practical foundation for procedural world generation, capable of synthesizing entire planets coherently, controllably, and without limits.


【11】Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture
标题:评估前沿人工智能模型的Biothreat基准生成框架I:任务查询架构
链接:https://arxiv.org/abs/2512.08130

作者:Gary Ackerman,Brandon Behlendorf,Zachary Kallenborn,Sheriff Almakki,Doug Clifford,Jenna LaTourette,Hayley Peterson,Noah Sheinbaum,Olivia Shoemaker,Anna Wetzel
备注:18 pages
摘要:模型开发者和政策制定者都在寻求量化和减轻快速发展的前沿人工智能(AI)模型的风险,特别是大型语言模型(LLM),以促进生物恐怖主义或获得生物武器。这种努力的一个重要内容是制定可评估特定模式所构成的生物安保风险的模式基准。本文介绍了一种新的生物威胁基准生成(BBG)框架的第一个组成部分。BBG方法旨在帮助模型开发人员和评估人员可靠地测量和评估现有和未来人工智能模型的生物安全风险提升和一般危害潜力,同时考虑到在其他基准测试工作中经常被忽视的威胁本身的关键方面,包括不同的参与者能力水平和操作(除了纯技术)风险因素。作为一个试点,BBG首先被开发来解决细菌生物威胁。BBG建立在生物威胁类别、元素和任务的层次结构之上,然后作为开发任务对齐查询的基础。本文概述了这种生物威胁任务查询架构的发展,我们将其命名为细菌生物威胁模式,而未来的论文将描述后续的努力,将查询转化为模型提示,以及如何实现由此产生的基准模型评估。总体而言,BBG框架,包括细菌生物威胁模式,旨在提供一个强大的,可重复使用的结构,用于评估多个聚合级别的LLM产生的细菌生物风险,该结构涵盖了生物对手的全部技术和操作要求,并考虑了广泛的生物对手能力。
摘要 :Both model developers and policymakers seek to quantify and mitigate the risk of rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons. An important element of such efforts is the development of model benchmarks that can assess the biosecurity risk posed by a particular model. This paper describes the first component of a novel Biothreat Benchmark Generation (BBG) Framework. The BBG approach is designed to help model developers and evaluators reliably measure and assess the biosecurity risk uplift and general harm potential of existing and future AI models, while accounting for key aspects of the threat itself that are often overlooked in other benchmarking efforts, including different actor capability levels, and operational (in addition to purely technical) risk factors. As a pilot, the BBG is first being developed to address bacterial biological threats only. The BBG is built upon a hierarchical structure of biothreat categories, elements and tasks, which then serves as the basis for the development of task-aligned queries. This paper outlines the development of this biothreat task-query architecture, which we have named the Bacterial Biothreat Schema, while future papers will describe follow-on efforts to turn queries into model prompts, as well as how the resulting benchmarks can be implemented for model evaluation. Overall, the BBG Framework, including the Bacterial Biothreat Schema, seeks to offer a robust, re-usable structure for evaluating bacterial biological risks arising from LLMs across multiple levels of aggregation, which captures the full scope of technical and operational requirements for biological adversaries, and which accounts for a wide spectrum of biological adversary capabilities.


【12】CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification
标题:CAMO:用于危机分类的因果引导对抗性多模态领域泛化
链接:https://arxiv.org/abs/2512.08071

作者:Pingchuan Ma,Chengshuai Zhao,Bohan Jiang,Saketh Vishnubhatla,Ujun Jeong,Alimohammad Beigi,Adrienne Raglin,Huan Liu
摘要:社交媒体中的危机分类旨在从多模态帖子中提取可操作的灾害相关信息,这是提高态势感知和促进及时应急响应的关键任务。然而,危机类型的广泛差异使得在未见过的灾害上实现可泛化的性能成为一个持续的挑战。现有方法主要利用深度学习来融合文本和视觉线索进行危机分类,在域内设置下取得了数值上尚可的结果。然而,它们在未见过的危机类型上表现出较差的泛化能力,因为它们:1)没有解开虚假特征与因果特征,导致在域偏移下性能下降;2)未能在共享空间中对齐异构的模态表示,这阻碍了将成熟的单模态域泛化(DG)技术直接迁移到多模态设置。为了解决这些问题,我们引入了一个因果引导的多模态域泛化(MMDG)框架,该框架将对抗性解纠缠与统一表示学习相结合,用于危机分类。对抗性目标鼓励模型解开并专注于域不变的因果特征,从而得到基于稳定因果机制的更可泛化的分类。统一表示将来自不同模态的特征对齐到共享潜在空间中,使单模态DG策略能够无缝扩展到多模态学习。在不同数据集上的实验表明,我们的方法在未见过的灾害场景中取得了最佳性能。
摘要:Crisis classification in social media aims to extract actionable disaster-related information from multimodal posts, which is a crucial task for enhancing situational awareness and facilitating timely emergency responses. However, the wide variation in crisis types makes achieving generalizable performance across unseen disasters a persistent challenge. Existing approaches primarily leverage deep learning to fuse textual and visual cues for crisis classification, achieving numerically plausible results under in-domain settings. However, they exhibit poor generalization across unseen crisis types because they 1. do not disentangle spurious and causal features, resulting in performance degradation under domain shift, and 2. fail to align heterogeneous modality representations within a shared space, which hinders the direct adaptation of established single-modality domain generalization (DG) techniques to the multimodal setting. To address these issues, we introduce a causality-guided multimodal domain generalization (MMDG) framework that combines adversarial disentanglement with unified representation learning for crisis classification. The adversarial objective encourages the model to disentangle and focus on domain-invariant causal features, leading to more generalizable classifications grounded in stable causal mechanisms. The unified representation aligns features from different modalities within a shared latent space, enabling single-modality DG strategies to be seamlessly extended to multimodal learning. Experiments on the different datasets demonstrate that our approach achieves the best performance in unseen disaster scenarios.


【13】Controllable risk scenario generation from human crash data for autonomous vehicle testing
标题:基于人类碰撞数据为自动驾驶车辆测试生成可控风险场景
链接:https://arxiv.org/abs/2512.07874

作者:Qiujing Lu,Xuanhan Wang,Runze Yuan,Wei Lu,Xinyi Gong,Shuo Feng
摘要:确保自动驾驶汽车(AV)的安全性需要在日常驾驶和罕见的安全关键条件下进行严格的测试。一个关键的挑战在于模拟环境智能体,包括背景车辆(BV)和易受伤害的道路使用者(VRU),它们既要在标称交通中表现得逼真,又要表现出与真实世界事故一致的风险倾向行为。我们介绍了可控风险智能体生成(CRAG),这是一个旨在统一建模占主导地位的标称行为和罕见的安全关键行为的框架。CRAG构建了一个将正常行为和风险相关行为解耦的结构化潜在空间,从而高效地利用有限的碰撞数据。通过将风险感知的潜在表示与基于优化的模式转换机制相结合,该框架允许智能体在较长的时间范围内平滑且合理地从安全状态转移到风险状态,同时在两种状态下都保持高保真度。大量实验表明,与现有基线相比,CRAG提高了多样性,同时还能可控地生成风险场景,从而对AV鲁棒性进行有针对性的高效评估。
摘要:Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment agents, including background vehicles (BVs) and vulnerable road users (VRUs), that behave realistically in nominal traffic while also exhibiting risk-prone behaviors consistent with real-world accidents. We introduce Controllable Risk Agent Generation (CRAG), a framework designed to unify the modeling of dominant nominal behaviors and rare safety-critical behaviors. CRAG constructs a structured latent space that disentangles normal and risk-related behaviors, enabling efficient use of limited crash data. By combining risk-aware latent representations with optimization-based mode-transition mechanisms, the framework allows agents to shift smoothly and plausibly from safe to risk states over extended horizons, while maintaining high fidelity in both regimes. Extensive experiments show that CRAG improves diversity compared to existing baselines, while also enabling controllable generation of risk scenarios for targeted and efficient evaluation of AV robustness.


【14】Worst-case generation via minimax optimization in Wasserstein space
标题:通过Wasserstein空间中的极大极小优化生成最坏情况
链接:https://arxiv.org/abs/2512.08176

作者:Xiuyuan Cheng,Yao Xie,Linglingzhi Zhu,Yunqin Zhu
摘要:在从机器学习模型到电网和医疗预测系统的应用中,最坏情况生成在评估分布偏移下的鲁棒性和对系统进行压力测试方面起着关键作用。我们开发了一个针对预先指定风险进行最坏情况生成的生成式建模框架,它基于连续概率分布空间(即Wasserstein空间)上的极小-极大优化。传统的离散分布鲁棒优化方法通常存在可扩展性差、泛化能力有限且最坏情况推断成本高昂的问题;与之不同,我们的框架利用Brenier定理将最不利(最坏情况)分布刻画为传输映射对连续参考测度的前推,从而实现了超越经典离散DRO形式的、连续且富有表达力的风险诱导生成概念。基于极小-极大形式,我们提出了一种梯度下降-上升(GDA)型方案,在单个循环中同时更新决策模型和传输映射,并在温和的正则性假设下(可能不要求凸-凹性)建立了全局收敛保证。我们还提出用神经网络来参数化传输映射,该网络可以通过匹配被传输的训练样本与GDA迭代同时训练,从而实现无需模拟的方法。通过对合成数据和图像数据的数值实验,验证了所提方法作为风险诱导最坏情况生成器的有效性。
摘要:Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation for a pre-specified risk, based on min-max optimization over continuous probability distributions, namely the Wasserstein space. Unlike traditional discrete distributionally robust optimization approaches, which often suffer from scalability issues, limited generalization, and costly worst-case inference, our framework exploits the Brenier theorem to characterize the least favorable (worst-case) distribution as the pushforward of a transport map from a continuous reference measure, enabling a continuous and expressive notion of risk-induced generation beyond classical discrete DRO formulations. Based on the min-max formulation, we propose a Gradient Descent Ascent (GDA)-type scheme that updates the decision model and the transport map in a single loop, establishing global convergence guarantees under mild regularity assumptions and possibly without convexity-concavity. We also propose to parameterize the transport map using a neural network that can be trained simultaneously with the GDA iterations by matching the transported training samples, thereby achieving a simulation-free approach. The efficiency of the proposed method as a risk-induced worst-case generator is validated by numerical experiments on synthetic and image data.
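下面是一个极小-极大最坏情况生成思路的最小示意:用一个神经网络参数化传输映射,对参考样本做前推得到"最坏情况"样本,传输映射做上升步(最大化风险减去传输代价),决策模型做下降步(最小化在最坏情况样本上的风险)。网络结构、风险函数、惩罚系数 lam 均为假设的玩具设置,并非论文的完整算法。

import torch

dim = 2
model = torch.nn.Linear(dim, 1)                        # 决策模型(这里是线性分类器)
transport = torch.nn.Sequential(                       # 神经网络参数化的传输映射 T_phi
    torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim))
opt_model = torch.optim.SGD(model.parameters(), lr=1e-2)
opt_T = torch.optim.SGD(transport.parameters(), lr=1e-2)
lam = 10.0                                             # 传输代价(Wasserstein 惩罚)系数

x_ref = torch.randn(512, dim)                          # 参考分布样本(训练数据)
y = (x_ref.sum(dim=1, keepdim=True) > 0).float()       # 玩具标签
loss_fn = torch.nn.BCEWithLogitsLoss()

for step in range(1000):
    x_worst = transport(x_ref)                         # 最坏情况分布 = 传输映射的前推
    risk = loss_fn(model(x_worst), y)
    cost = torch.mean(torch.sum((x_worst - x_ref) ** 2, dim=1))  # 二次传输代价
    # 上升步:传输映射最大化 风险 - lam * 传输代价
    opt_T.zero_grad(); (-(risk - lam * cost)).backward(); opt_T.step()
    # 下降步:决策模型最小化在最坏情况样本上的风险
    x_worst = transport(x_ref).detach()
    opt_model.zero_grad(); loss_fn(model(x_worst), y).backward(); opt_model.step()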


半/弱/无/有监督|不确定性|主动学习(8篇)

【1】Unsupervised Learning of Density Estimates with Topological Optimization
标题:利用拓扑优化进行密度估计的无监督学习
链接:https://arxiv.org/abs/2512.08895

作者:Suina Tanweer,Firas A. Khasawneh
摘要:核密度估计是机器学习、贝叶斯推断、随机动力学和信号处理中各种算法的关键组成部分。然而,这种无监督密度估计技术需要调整一个关键的超参数:核带宽。带宽的选择至关重要,因为它通过对拓扑特征的过度平滑或欠平滑来控制偏差-方差权衡。拓扑数据分析提供了在数学上量化拓扑特征(例如连通分量、环、空洞等)的方法,即使在无法对密度估计进行可视化的高维情形中也是如此。在本文中,我们提出了一种使用基于拓扑的损失函数的无监督学习方法,用于自动地、无监督地选择最优带宽,并将其与经典技术进行基准比较,展示了其在不同维度上的潜力。
摘要:Kernel density estimation is a key component of a wide variety of algorithms in machine learning, Bayesian inference, stochastic dynamics and signal processing. However, the unsupervised density estimation technique requires tuning a crucial hyperparameter: the kernel bandwidth. The choice of bandwidth is critical as it controls the bias-variance trade-off by over- or under-smoothing the topological features. Topological data analysis provides methods to mathematically quantify topological characteristics, such as connected components, loops, voids et cetera, even in high dimensions where visualization of density estimates is impossible. In this paper, we propose an unsupervised learning approach using a topology-based loss function for the automated and unsupervised selection of the optimal bandwidth and benchmark it against classical techniques -- demonstrating its potential across different dimensions.
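作为粗略示意,下面的草图用密度曲线局部极大值的个数作为0维拓扑特征(模态/连通分量)的简单代理,并选取模态数在带宽变化下最稳定(最"持久")区间的中点作为带宽。这只是对"用拓扑信息指导带宽选择"这一思路的玩具演示,并非原论文基于持久同调的损失函数;数据和带宽网格均为假设。

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])  # 双峰玩具数据

grid = np.linspace(data.min() - 1, data.max() + 1, 1000)
bandwidths = np.linspace(0.05, 1.0, 40)

def num_modes(bw):
    """统计给定带宽下密度估计的局部极大值个数(0 维拓扑特征的粗略代理)。"""
    dens = gaussian_kde(data, bw_method=bw)(grid)
    return int(np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])))

modes = np.array([num_modes(bw) for bw in bandwidths])

# 选择模态数保持不变的最长带宽区间,取其中点作为带宽
best_bw, best_len, start = bandwidths[0], 0, 0
for i in range(1, len(modes) + 1):
    if i == len(modes) or modes[i] != modes[start]:
        if i - start > best_len:
            best_len, best_bw = i - start, bandwidths[(start + i - 1) // 2]
        start = i
print("选择的带宽:", best_bw, " 对应模态数:", num_modes(best_bw))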


【2】Developing Distance-Aware Uncertainty Quantification Methods in Physics-Guided Neural Networks for Reliable Bearing Health Prediction
标题:在物理引导神经网络中开发距离感知不确定性量化方法,以实现可靠的轴承健康预测
链接:https://arxiv.org/abs/2512.08499

作者:Waleed Razzaq,Yun-Bo Zhao
备注:Under review at Structural health Monitoring - SAGE
摘要:准确且具备不确定性感知的退化估计对于安全关键系统(如带有滚动轴承的旋转机械)的预测性维护至关重要。许多现有的不确定性方法缺乏置信度校准、运行成本高、不具备距离感知能力,并且无法在分布外数据下泛化。我们介绍了两种用于确定性物理引导神经网络的距离感知不确定性方法:基于谱归一化高斯过程的PG-SNGP和基于深度证据回归的PG-SNER。我们对隐藏层进行谱归一化,使网络保留从输入空间到潜在空间的距离。PG-SNGP用高斯过程层替换最后的全连接层以获得距离敏感的不确定性,而PG-SNER输出正态逆伽马分布的参数,以一致的概率形式对不确定性进行建模。我们使用标准的准确性指标和一个基于Pearson相关系数的新距离感知指标来评估性能,该指标衡量预测的不确定性在多大程度上跟踪了测试样本与训练样本之间的距离。我们还在损失中设计了一个动态加权方案,以平衡数据保真度和物理一致性。我们使用PRONOSTIA数据集在滚动轴承退化问题上测试了我们的方法,并将其与Monte Carlo和Deep Ensemble(深度集成)PGNN进行了比较。结果表明,PG-SNGP和PG-SNER提高了预测精度,在OOD条件下能可靠地泛化,并对对抗性攻击和噪声保持鲁棒性。
摘要:Accurate and uncertainty-aware degradation estimation is essential for predictive maintenance in safety-critical systems like rotating machinery with rolling-element bearings. Many existing uncertainty methods lack confidence calibration, are costly to run, are not distance-aware, and fail to generalize under out-of-distribution data. We introduce two distance-aware uncertainty methods for deterministic physics-guided neural networks: PG-SNGP, based on Spectral Normalization Gaussian Process, and PG-SNER, based on Deep Evidential Regression. We apply spectral normalization to the hidden layers so the network preserves distances from input to latent space. PG-SNGP replaces the final dense layer with a Gaussian Process layer for distance-sensitive uncertainty, while PG-SNER outputs Normal Inverse Gamma parameters to model uncertainty in a coherent probabilistic form. We assess performance using standard accuracy metrics and a new distance-aware metric based on the Pearson Correlation Coefficient, which measures how well predicted uncertainty tracks the distance between test and training samples. We also design a dynamic weighting scheme in the loss to balance data fidelity and physical consistency. We test our methods on rolling-element bearing degradation using the PRONOSTIA dataset and compare them with Monte Carlo and Deep Ensemble PGNNs. Results show that PG-SNGP and PG-SNER improve prediction accuracy, generalize reliably under OOD conditions, and remain robust to adversarial attacks and noise.
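摘要中提到的"距离感知指标"可以用一个很小的示例说明:计算每个测试样本到训练集的最近邻距离,再取它与模型预测不确定度之间的Pearson相关系数,相关性越高说明不确定度越"距离感知"。以下只是该指标思路的示意实现(并非论文代码),其中训练/测试数据和"预测不确定度"均为人为构造。

import numpy as np
from scipy.stats import pearsonr
from scipy.spatial.distance import cdist

def distance_awareness_score(train_X, test_X, predicted_std):
    """用测试样本到训练集的最近邻距离与预测不确定度之间的
    Pearson 相关系数衡量距离感知程度。"""
    dists = cdist(test_X, train_X).min(axis=1)         # 每个测试样本到训练集的最近距离
    r, _ = pearsonr(dists, predicted_std)
    return r

# 玩具示例:人为设定一个随距离增大的不确定度
rng = np.random.default_rng(1)
train_X = rng.normal(0, 1, size=(200, 5))
test_X = rng.normal(0, 3, size=(100, 5))
dists = cdist(test_X, train_X).min(axis=1)
predicted_std = dists + rng.normal(0, 0.1, size=100)   # 模拟距离感知的不确定度
print("距离感知得分:", distance_awareness_score(train_X, test_X, predicted_std))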


【3】Uncertainty-Aware Subset Selection for Robust Visual Explainability under Distribution Shifts
标题:分布偏移下面向鲁棒视觉可解释性的不确定性感知子集选择
链接:https://arxiv.org/abs/2512.08445

作者:Madhav Gupta,Vishak Prasad C,Ganesh Ramakrishnan
摘要:基于子集选择的方法被广泛用于解释深度视觉模型:它们通过突出最有影响力的图像区域来对预测进行归因,并支持对象级解释。虽然这些方法在分布内(ID)环境中表现良好,但它们在分布外(OOD)条件下的行为仍然知之甚少。通过在多个ID-OOD数据集上的广泛实验,我们发现现有基于子集的方法的可靠性显著下降,产生冗余、不稳定且对不确定性敏感的解释。为了解决这些缺点,我们引入了一个框架,将子模(submodular)子集选择与逐层、基于梯度的不确定性估计相结合,在不需要额外训练或辅助模型的情况下提高鲁棒性和保真度。我们的方法通过自适应权重扰动来估计不确定性,并利用这些估计来指导子模优化,确保选出多样且信息丰富的子集。实证评估表明,除了缓解现有方法在OOD情形下的弱点外,我们的框架在ID设置下也带来了改进。这些发现突显了当前基于子集的方法的局限性,并展示了不确定性驱动的优化如何增强归因和对象级可解释性,为现实世界视觉应用中更透明、更值得信赖的人工智能铺平了道路。
摘要:Subset selection-based methods are widely used to explain deep vision models: they attribute predictions by highlighting the most influential image regions and support object-level explanations. While these methods perform well in in-distribution (ID) settings, their behavior under out-of-distribution (OOD) conditions remains poorly understood. Through extensive experiments across multiple ID-OOD sets, we find that reliability of the existing subset based methods degrades markedly, yielding redundant, unstable, and uncertainty-sensitive explanations. To address these shortcomings, we introduce a framework that combines submodular subset selection with layer-wise, gradient-based uncertainty estimation to improve robustness and fidelity without requiring additional training or auxiliary models. Our approach estimates uncertainty via adaptive weight perturbations and uses these estimates to guide submodular optimization, ensuring diverse and informative subset selection. Empirical evaluations show that, beyond mitigating the weaknesses of existing methods under OOD scenarios, our framework also yields improvements in ID settings. These findings highlight limitations of current subset-based approaches and demonstrate how uncertainty-driven optimization can enhance attribution and object-level interpretability, paving the way for more transparent and trustworthy AI in real-world vision applications.


【4】SOFA-FL: Self-Organizing Hierarchical Federated Learning with Adaptive Clustered Data Sharing
标题:SOFA-FL:具有自适应聚类数据共享的自组织分层联邦学习
链接:https://arxiv.org/abs/2512.08267

作者:Yi Ni,Xinkun Wang,Han Zhang
摘要:联邦学习(FL)在不断变化的环境中面临重大挑战,特别是在数据异构性和固定网络拓扑的刚性方面。为了解决这些问题,本文提出了SOFA-FL(具有自适应聚类数据共享的自组织分层联邦学习),这是一种使分层联邦系统能够随时间自组织和自适应的新框架。该框架基于三个核心机制:(1)动态多分支凝聚聚类(DMAC),用于构造初始的高效层次结构;(2)自组织分层自适应传播与演化(SHAPE),允许系统通过嫁接、修剪、合并和净化等原子操作动态重构其拓扑,以适应数据分布的变化;(3)自适应聚类数据共享,通过在客户端和聚类节点之间进行受控的部分数据交换来减轻数据异构性。通过集成这些机制,SOFA-FL有效地捕捉客户端之间的动态关系,并在不依赖预定聚类结构的情况下增强个性化能力。
摘要:Federated Learning (FL) faces significant challenges in evolving environments, particularly regarding data heterogeneity and the rigidity of fixed network topologies. To address these issues, this paper proposes \textbf{SOFA-FL} (Self-Organizing Hierarchical Federated Learning with Adaptive Clustered Data Sharing), a novel framework that enables hierarchical federated systems to self-organize and adapt over time.   The framework is built upon three core mechanisms: (1) \textbf{Dynamic Multi-branch Agglomerative Clustering (DMAC)}, which constructs an initial efficient hierarchical structure; (2) \textbf{Self-organizing Hierarchical Adaptive Propagation and Evolution (SHAPE)}, which allows the system to dynamically restructure its topology through atomic operations -- grafting, pruning, consolidation, and purification -- to adapt to changes in data distribution; and (3) \textbf{Adaptive Clustered Data Sharing}, which mitigates data heterogeneity by enabling controlled partial data exchange between clients and cluster nodes.   By integrating these mechanisms, SOFA-FL effectively captures dynamic relationships among clients and enhances personalization capabilities without relying on predetermined cluster structures.


【5】Multi-agent learning under uncertainty: Recurrence vs. concentration
标题:不确定性下的多智能体学习:常返性与集中性
链接:https://arxiv.org/abs/2512.08132

作者:Kyriakos Lotidis,Panayotis Mertikopoulos,Nicholas Bambos,Jose Blanchet
备注:44 pages, 17 figures
摘要:在本文中,我们研究了不确定性下多智能体学习的收敛图景。具体来说,我们分析了连续博弈中正则化学习的两个随机模型(一个是连续时间的,一个是离散时间的),目的是刻画所诱导的博弈序列的长期行为。与确定性的、全信息的学习模型(或学习率趋于零的模型)形成鲜明对比的是,我们证明了由此产生的动态一般不会收敛。作为替代,我们转而追问:从长期来看,哪些动作被更频繁地采用,以及频繁多少。我们表明,在强单调博弈中,正则化学习的动态可能会无限频繁地偏离均衡,但它们总会在有限时间内(我们对此给出估计)回到均衡附近,并且其长期分布高度集中在均衡的一个邻域内。我们量化了这种集中的程度,并且表明,如果底层博弈不是强单调的,这些有利的性质可能全部失效,从而强调了在持续随机性和不确定性存在下正则化学习的局限性。
摘要 :In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- one in continuous and one in discrete time with the aim of characterizing the long-run behavior of the induced sequence of play. In stark contrast to deterministic, full-information models of learning (or models with a vanishing learning rate), we show that the resulting dynamics do not converge in general. In lieu of this, we ask instead which actions are played more often in the long run, and by how much. We show that, in strongly monotone games, the dynamics of regularized learning may wander away from equilibrium infinitely often, but they always return to its vicinity in finite time (which we estimate), and their long-run distribution is sharply concentrated around a neighborhood thereof. We quantify the degree of this concentration, and we show that these favorable properties may all break down if the underlying game is not strongly monotone -- underscoring in this way the limits of regularized learning in the presence of persistent randomness and uncertainty.


【6】Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control
标题:从不频繁的输出测量中学习动态以实现不确定性感知的最优控制
链接:https://arxiv.org/abs/2512.08013

作者:Robert Lefringhausen,Theodor Springer,Sandra Hirche
备注:Submitted to the 2026 IFAC World Congress
摘要:当非线性系统的动态未知、且只有不频繁且带噪声的输出测量可用时,实现可靠的最优控制是具有挑战性的。这项工作通过在状态空间形式下为连续时间动态和潜在状态轨迹构建贝叶斯先验,并借助配备数值ODE积分器的目标边缘Metropolis-Hastings采样器对其进行更新,来解决这种传感受限的设置。由此产生的后验样本用于构建一个基于场景的最优控制问题,该问题同时考虑了模型不确定性和测量不确定性,并使用标准的非线性规划方法求解。该方法在一个使用1型糖尿病模型进行葡萄糖调节的数值案例研究中得到了验证。
摘要:Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted marginal Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for both model and measurement uncertainty and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.
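下面的草图演示摘要中"配备数值ODE积分器的Metropolis-Hastings采样"这一环节:对一个玩具logistic增长ODE的参数做随机游走MH采样,似然只用到少量带噪声的稀疏观测。动态模型、先验和噪声水平均为假设,仅用于说明流程,并非论文的完整方法(例如未显式采样潜在状态轨迹)。

import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

def simulate(theta, t_obs, x0=0.1):
    """用数值 ODE 积分器求解 logistic 增长模型(未知非线性动态的玩具替身)。"""
    f = lambda t, x: theta[0] * x * (1 - x / theta[1])
    sol = solve_ivp(f, (0.0, t_obs[-1]), [x0], t_eval=t_obs)
    return sol.y[0]

# 不频繁、带噪声的输出测量
theta_true = np.array([0.8, 2.0])
t_obs = np.array([0.0, 2.0, 5.0, 9.0, 14.0])
y_obs = simulate(theta_true, t_obs) + rng.normal(0, 0.05, len(t_obs))

def log_post(theta, sigma=0.05):
    if np.any(theta <= 0):
        return -np.inf
    resid = y_obs - simulate(theta, t_obs)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 - 0.5 * np.sum(theta ** 2)  # 高斯先验

# 随机游走 Metropolis-Hastings
theta = np.array([0.5, 1.5]); lp = log_post(theta); samples = []
for _ in range(3000):
    prop = theta + rng.normal(0, 0.05, 2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
samples = np.array(samples)
print("后验均值:", samples[1500:].mean(axis=0))  # 丢弃前一半作为 burn-in,可用于基于场景的最优控制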


【7】Semi-Supervised Contrastive Learning with Orthonormal Prototypes
标题:使用标准正交原型的半监督对比学习
链接:https://arxiv.org/abs/2512.07880

作者:Huanran Li,Manh Nguyen,Daniel Pimentel-Alarcón
摘要:对比学习已经成为深度学习中一种强大的方法,擅长通过对比来自不同分布的样本来学习有效的表示。然而,嵌入收敛到低维空间的维度坍缩带来了重大挑战,特别是在半监督和自监督设置中。在本文中,我们首先确定了一个临界学习率阈值,超过该阈值,标准对比损失会收敛到坍缩解。基于这些见解,我们提出了CLOP,这是一种新的半监督损失函数,旨在通过促进类嵌入之间形成正交线性子空间来防止维度坍缩。通过在真实和合成数据集上的广泛实验,我们证明了CLOP提高了图像分类和目标检测任务的性能,同时在不同的学习率和批量大小下表现出更高的稳定性。
摘要:Contrastive learning has emerged as a powerful method in deep learning, excelling at learning effective representations through contrasting samples from different distributions. However, dimensional collapse, where embeddings converge into a lower-dimensional space, poses a significant challenge, especially in semi-supervised and self-supervised setups. In this paper, we first identify a critical learning-rate threshold, beyond which standard contrastive losses converge to collapsed solutions. Building on these insights, we propose CLOP, a novel semi-supervised loss function designed to prevent dimensional collapse by promoting the formation of orthogonal linear subspaces among class embeddings. Through extensive experiments on real and synthetic datasets, we demonstrate that CLOP improves performance in image classification and object detection tasks while also exhibiting greater stability across different learning rates and batch sizes.
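下面是"正交原型"这一思路的一个极简示意:用QR分解构造一组标准正交的类别原型,再用一个把归一化嵌入拉向其类别原型的交叉熵项来鼓励各类分布在相互正交的方向上。这只是在假设条件(dim >= num_classes、仅含有标签项)下的草图,并非论文完整的CLOP损失。

import torch
import torch.nn.functional as F

def orthonormal_prototypes(num_classes, dim, seed=0):
    """用 QR 分解构造一组标准正交的类别原型(假设 dim >= num_classes)。"""
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, num_classes, generator=g))
    return q.t()                                          # [num_classes, dim],行向量两两正交

def prototype_loss(embeddings, labels, prototypes, temperature=0.1):
    """示意性损失:把归一化后的嵌入拉向其类别的正交原型。"""
    z = F.normalize(embeddings, dim=1)
    logits = z @ prototypes.t() / temperature             # 与每个原型的余弦相似度
    return F.cross_entropy(logits, labels)

# 用法示例(嵌入用随机张量占位)
protos = orthonormal_prototypes(num_classes=10, dim=128)
z = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))
print(prototype_loss(z, y, protos))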


【8】SetAD: Semi-Supervised Anomaly Learning in Contextual Sets
标题:SetAD:上下文集合中的半监督异常学习
链接:https://arxiv.org/abs/2512.07863

作者:Jianling Gao,Chongyang Tao,Xuelian Lin,Junfeng Liu,Shuai Ma
备注:9 pages
摘要:半监督异常检测(AD)通过有效地利用有限的标记数据显示出很大的前景。然而,现有方法通常围绕对单个点或简单的点对进行评分来构建。这种以点或点对为中心的观点不仅忽视了异常的上下文性质(异常由其相对于集体群组的偏差所定义),也未能利用可以从集合的组合构成中生成的丰富监督信号。因此,这样的模型很难利用数据中的高阶交互,而这对于学习判别性表示至关重要。为了解决这些限制,我们提出了SetAD,这是一个将半监督AD重新表述为集合级异常检测任务的新框架。SetAD采用通过分级学习目标训练的基于注意力的集合编码器,模型学习量化整个集合内的异常程度。这种方法直接对定义异常的复杂群组级交互进行建模。此外,为了增强鲁棒性和分数校准,我们提出了一种上下文校准的异常评分机制,它通过聚合某个点在多个不同上下文集合中相对于同伴行为的归一化偏差来评估该点的异常分数。在10个真实世界数据集上的广泛实验表明,SetAD的性能明显优于最先进的模型。值得注意的是,我们的模型的性能随着集合大小的增加而不断提高,为基于集合的异常检测表述提供了有力的经验支持。
摘要:Semi-supervised anomaly detection (AD) has shown great promise by effectively leveraging limited labeled data. However, existing methods are typically structured around scoring individual points or simple pairs. Such {point- or pair-centric} view not only overlooks the contextual nature of anomalies, which are defined by their deviation from a collective group, but also fails to exploit the rich supervisory signals that can be generated from the combinatorial composition of sets. Consequently, such models struggle to exploit the high-order interactions within the data, which are critical for learning discriminative representations. To address these limitations, we propose SetAD, a novel framework that reframes semi-supervised AD as a Set-level Anomaly Detection task. SetAD employs an attention-based set encoder trained via a graded learning objective, where the model learns to quantify the degree of anomalousness within an entire set. This approach directly models the complex group-level interactions that define anomalies. Furthermore, to enhance robustness and score calibration, we propose a context-calibrated anomaly scoring mechanism, which assesses a point's anomaly score by aggregating its normalized deviations from peer behavior across multiple, diverse contextual sets. Extensive experiments on 10 real-world datasets demonstrate that SetAD significantly outperforms state-of-the-art models. Notably, we show that our model's performance consistently improves with increasing set size, providing strong empirical support for the set-based formulation of anomaly detection.
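摘要中的"上下文校准评分"思路可以用下面的玩具草图说明:对目标点多次随机抽取上下文集合,计算它相对于集合内同伴得分的z分数并取平均。这里的逐点打分函数、数据和集合大小都是假设的占位,仅演示校准机制本身,并非SetAD的集合编码器模型。

import numpy as np

rng = np.random.default_rng(0)

def context_calibrated_score(x, pool, raw_score, num_sets=20, set_size=16):
    """对点 x 的异常分数:在多个随机上下文集合中,取其得分相对同伴得分的 z 分数的平均。"""
    zs = []
    for _ in range(num_sets):
        ctx = pool[rng.choice(len(pool), size=set_size, replace=False)]
        peer = np.array([raw_score(p, ctx) for p in ctx])
        z = (raw_score(x, ctx) - peer.mean()) / (peer.std() + 1e-8)
        zs.append(z)
    return float(np.mean(zs))

# 玩具示例:以到上下文集合中心的距离作为逐点得分
raw_score = lambda p, ctx: np.linalg.norm(p - ctx.mean(axis=0))
pool = rng.normal(0, 1, size=(500, 4))
normal_point, outlier = pool[0], np.full(4, 6.0)
print("正常点得分:", context_calibrated_score(normal_point, pool, raw_score))
print("异常点得分:", context_calibrated_score(outlier, pool, raw_score))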


迁移|Zero/Few/One-Shot|自适应(4篇)

【1】FedLAD: A Modular and Adaptive Testbed for Federated Log Anomaly Detection
标题:FedLAD:一个用于联邦日志异常检测的模块化自适应测试床
链接:https://arxiv.org/abs/2512.08277

作者:Yihan Liao,Jacky Keung,Zhenyu Mao,Jingyu Zhang,Jialong Li
备注:Accepted Artifact at ACSOS 2025
摘要:基于日志的异常检测(LAD)是保证大规模分布式系统可靠性的关键。然而,大多数现有的LAD方法假设集中式训练,这往往是不切实际的,由于隐私的限制和分散的性质,系统日志。虽然联邦学习(FL)提供了一个有前途的替代方案,但缺乏专门的测试平台,以满足联邦环境中LAD的需求。为了解决这个问题,我们提出了FedLAD,一个统一的平台,用于在FL约束下训练和评估LAD模型。FedLAD支持多种LAD模型、基准数据集和聚合策略的即插即用集成,同时为验证日志(自监控)、参数调优(自配置)和自适应策略控制(自适应)提供运行时支持。通过实现可重复和可扩展的实验,FedLAD弥合了FL框架和LAD要求之间的差距,为未来的研究提供了坚实的基础。项目代码可在https://github.com/AA-cityu/FedLAD上公开获取。
摘要:Log-based anomaly detection (LAD) is critical for ensuring the reliability of large-scale distributed systems. However, most existing LAD approaches assume centralized training, which is often impractical due to privacy constraints and the decentralized nature of system logs. While federated learning (FL) offers a promising alternative, there is a lack of dedicated testbeds tailored to the needs of LAD in federated settings. To address this, we present FedLAD, a unified platform for training and evaluating LAD models under FL constraints. FedLAD supports plug-and-play integration of diverse LAD models, benchmark datasets, and aggregation strategies, while offering runtime support for validation logging (self-monitoring), parameter tuning (self-configuration), and adaptive strategy control (self-adaptation). By enabling reproducible and scalable experimentation, FedLAD bridges the gap between FL frameworks and LAD requirements, providing a solid foundation for future research. Project code is publicly available at: https://github.com/AA-cityu/FedLAD.


【2】Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation
标题:Zero-Splat TeleAssist:一种用于语义遥操作的Zero-Shot姿态估计框架
链接:https://arxiv.org/abs/2512.08271

作者:Srijan Dokania,Dharini Raghavan
备注:Published and Presented at 3rd Workshop on Human-Centric Multilateral Teleoperation in ICRA 2025
摘要:我们介绍Zero-Splat TeleAssist,这是一种zero-shot传感器融合管道,可将普通商用CCTV视频流转换为用于多边遥操作的共享6自由度世界模型。通过集成视觉语言分割、单目深度估计、加权PCA姿态提取和3D高斯溅射(3DGS),TeleAssist在以交互为中心的遥操作设置中,无需基准标记(fiducial)或深度传感器,即可为每个操作员提供多个机器人的实时全局位置和方向。
摘要:We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots without fiducials or depth sensors in an interaction-centric teleoperation setup.


【3】Minimax and Bayes Optimal Adaptive Experimental Design for Treatment Choice
标题:用于治疗选择的Minimax和Bayes最优自适应实验设计
链接:https://arxiv.org/abs/2512.08513

作者:Masahiro Kato
摘要:我们考虑用于治疗选择的自适应实验,并设计了一个在遗憾(regret)意义下极小极大且贝叶斯最优的自适应实验。给定二元处理,实验者的目标是通过自适应实验选择期望结果最高的处理,以最大化福利。我们考虑由两个阶段组成的自适应实验:处理分配阶段和处理选择阶段。实验从处理分配阶段开始,实验者将处理分配给实验对象以收集观测。在此阶段,实验者可以使用实验中已获得的观测值自适应地更新分配概率。分配阶段结束后,实验者进入处理选择阶段,选出其中一个处理作为最优处理。针对这种自适应实验流程,我们提出了一个将处理分配阶段再拆分为两个子阶段的自适应实验:首先估计各处理的标准差,然后按标准差的比例分配每个处理。我们证明,这一通常被称为Neyman分配的实验是极小极大且贝叶斯最优的,因为其遗憾上界与我们推导的下界完全匹配。为了证明这种最优性,我们使用测度变换论证推导出遗憾的极小极大下界和贝叶斯下界,然后利用中心极限定理和大偏差界来估计相应的上界。
摘要:We consider an adaptive experiment for treatment choice and design a minimax and Bayes optimal adaptive experiment with respect to regret. Given binary treatments, the experimenter's goal is to choose the treatment with the highest expected outcome through an adaptive experiment, in order to maximize welfare. We consider adaptive experiments that consist of two phases, the treatment allocation phase and the treatment choice phase. The experiment starts with the treatment allocation phase, where the experimenter allocates treatments to experimental subjects to gather observations. During this phase, the experimenter can adaptively update the allocation probabilities using the observations obtained in the experiment. After the allocation phase, the experimenter proceeds to the treatment choice phase, where one of the treatments is selected as the best. For this adaptive experimental procedure, we propose an adaptive experiment that splits the treatment allocation phase into two stages, where we first estimate the standard deviations and then allocate each treatment proportionally to its standard deviation. We show that this experiment, often referred to as Neyman allocation, is minimax and Bayes optimal in the sense that its regret upper bounds exactly match the lower bounds that we derive. To show this optimality, we derive minimax and Bayes lower bounds for the regret using change-of-measure arguments. Then, we evaluate the corresponding upper bounds using the central limit theorem and large deviation bounds.
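下面用一个很小的数值草图演示摘要描述的两阶段流程:先用少量试点样本估计两个处理的标准差,再按标准差比例(Neyman分配)分配剩余预算,最后选择样本均值最高的处理。真实均值、标准差、预算等均为假设的玩具数值,仅用于说明流程。

import numpy as np

rng = np.random.default_rng(0)

def neyman_adaptive_experiment(mu, sigma, budget=1000, pilot_frac=0.2):
    """两阶段自适应实验的示意性实现:试点阶段均匀分配以估计标准差,
    随后按标准差比例分配剩余样本,最后选出样本均值最高的处理。"""
    pilot_n = int(budget * pilot_frac) // 2
    obs = [list(rng.normal(mu[k], sigma[k], pilot_n)) for k in range(2)]
    sd_hat = np.array([np.std(obs[k], ddof=1) for k in range(2)])
    remaining = budget - 2 * pilot_n
    alloc = np.round(remaining * sd_hat / sd_hat.sum()).astype(int)   # Neyman 分配
    for k in range(2):
        obs[k].extend(rng.normal(mu[k], sigma[k], alloc[k]))
    means = [np.mean(obs[k]) for k in range(2)]
    return int(np.argmax(means)), means

choice, means = neyman_adaptive_experiment(mu=[0.0, 0.2], sigma=[1.0, 3.0])
print("选择的处理:", choice, " 各处理样本均值:", means)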


【4】Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification
标题:用于不平衡函数型数据分类的具有自适应成本敏感分裂的函数型随机森林
链接:https://arxiv.org/abs/2512.07888

作者:Fahad Mostafa,Hafiz Khan
备注:23 pages, 4 figures
摘要:在观测值为曲线或轨迹的情况下,对函数型数据进行分类带来了独特的挑战,特别是在严重的类别不平衡下。传统的随机森林算法虽然在表格数据上表现稳健,但往往无法捕捉函数型观测的内在结构,并且难以检测少数类。本文介绍了一种新的集成框架FRF-ACS(具有自适应成本敏感分裂的函数型随机森林),它是为不平衡函数型数据分类而设计的。所提出的方法利用基展开和函数型主成分分析(FPCA)来高效地表示曲线,使树能够在低维函数型特征上进行操作。为了解决不平衡问题,我们采用了在每个节点上局部调整类权重的动态成本敏感分裂准则,并结合了集成函数型SMOTE与加权自助抽样的混合采样策略。此外,在叶节点分配过程中用曲线特定的相似性度量取代传统的欧几里得度量,以保留函数型特性。在包括生物医学信号和传感器轨迹在内的合成与真实世界数据集上的广泛实验表明,与现有的函数型分类器和不平衡处理技术相比,FRF-ACS显著提高了少数类召回率和整体预测性能。这项工作为少数类检测至关重要的高维函数型数据分析领域提供了一个可扩展、可解释的解决方案。
摘要:Classification of functional data where observations are curves or trajectories poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, while robust for tabular data, often fail to capture the intrinsic structure of functional observations and struggle with minority class detection. This paper introduces Functional Random Forest with Adaptive Cost-Sensitive Splitting (FRF-ACS), a novel ensemble framework designed for imbalanced functional data classification. The proposed method leverages basis expansions and Functional Principal Component Analysis (FPCA) to represent curves efficiently, enabling trees to operate on low dimensional functional features. To address imbalance, we incorporate a dynamic cost sensitive splitting criterion that adjusts class weights locally at each node, combined with a hybrid sampling strategy integrating functional SMOTE and weighted bootstrapping. Additionally, curve specific similarity metrics replace traditional Euclidean measures to preserve functional characteristics during leaf assignment. Extensive experiments on synthetic and real world datasets including biomedical signals and sensor trajectories demonstrate that FRF-ACS significantly improves minority class recall and overall predictive performance compared to existing functional classifiers and imbalance handling techniques. This work provides a scalable, interpretable solution for high dimensional functional data analysis in domains where minority class detection is critical.


强化学习(7篇)

【1】Reinforcement Learning From State and Temporal Differences
标题:基于状态差分与时序差分的强化学习
链接:https://arxiv.org/abs/2512.08855

作者:Lex Weaver,Jonathan Baxter
摘要:具有函数逼近的TD($λ$)在一些复杂的强化学习问题上已被实践证明是成功的。对于线性逼近,TD($λ$)已被证明可以最小化每个状态的近似值与真实值之间的平方误差。然而,就策略而言,关键在于状态相对排序的错误,而不是状态值本身的误差。我们在简单的两状态和三状态系统(其中TD($λ$)从最优策略出发却收敛到次优策略)以及西洋双陆棋中说明了这一点。然后,我们提出了TD($λ$)的一种修改形式,称为STD($λ$),其中函数逼近器针对二元决策问题上的相对状态值进行训练。本文给出了理论分析,包括在两状态系统背景下证明了STD($λ$)的单调策略改进,并与Bertsekas的微分训练方法[1]进行了比较。随后在两状态系统和著名的Acrobot问题的一个变体上成功演示了STD($λ$)。
摘要:TD($λ$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($λ$) has been shown to minimise the squared error between the approximate value of each state and the true value. However, as far as policy is concerned, it is error in the relative ordering of states that is critical, rather than error in the state values. We illustrate this point, both in simple two-state and three-state systems in which TD($λ$)--starting from an optimal policy--converges to a sub-optimal policy, and also in backgammon. We then present a modified form of TD($λ$), called STD($λ$), in which function approximators are trained with respect to relative state values on binary decision problems. A theoretical analysis, including a proof of monotonic policy improvement for STD($λ$) in the context of the two-state system, is presented, along with a comparison with Bertsekas' differential training method [1]. This is followed by successful demonstrations of STD($λ$) on the two-state system and a variation on the well known acrobot problem.
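作为背景,下面给出带资格迹的线性函数逼近TD($λ$)的一个极简实现(即论文所对比的标准算法,而非其提出的STD($λ$)):在一条随机游走链上,用one-hot特征(等价于表格情形)按 δ = r + γ·v(s') − v(s)、e ← γλe + φ(s)、w ← w + αδe 更新权重。链长、步长等均为假设的玩具设置。

import numpy as np

rng = np.random.default_rng(0)

def td_lambda(num_states=5, episodes=200, alpha=0.1, gamma=0.95, lam=0.8):
    """TD(lambda) 线性函数逼近示意:随机游走链,到达右端点获得奖励 1。"""
    w = np.zeros(num_states)
    phi = np.eye(num_states)                      # 每个状态的 one-hot 特征
    for _ in range(episodes):
        s = num_states // 2                       # 从链的中间出发
        e = np.zeros(num_states)                  # 资格迹
        while True:
            s_next = s + (1 if rng.uniform() < 0.5 else -1)
            done = s_next < 0 or s_next >= num_states
            r = 1.0 if s_next >= num_states else 0.0
            v, v_next = w @ phi[s], (0.0 if done else w @ phi[s_next])
            delta = r + gamma * v_next - v        # 时序差分误差
            e = gamma * lam * e + phi[s]          # 累积资格迹
            w += alpha * delta * e
            if done:
                break
            s = s_next
    return w

print("学到的状态值:", td_lambda())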


【2】Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning
标题:离线强化学习中数据中毒的最优扰动预算分配
链接:https://arxiv.org/abs/2512.08485

作者:Junnan Qiu,Jie Li
摘要:离线强化学习(RL)可以从静态数据集进行策略优化,但本质上容易受到数据中毒攻击。现有的攻击策略通常依赖于局部均匀扰动,不加区分地对待所有样本。这种方法效率低下,因为它把扰动预算浪费在低影响样本上,并且由于显著的统计偏差而缺乏隐蔽性。在本文中,我们提出了一种新的全局预算分配攻击策略。基于"一个样本对值函数收敛的影响与其时序差分(TD)误差成正比"这一理论洞察,我们将攻击表述为一个全局资源分配问题。我们推导出一个闭式解,在全局L2约束下按TD误差敏感度成比例地分配扰动幅度。在D4RL基准上的实证结果表明,我们的方法显著优于基线策略,以最小的扰动实现高达80%的性能下降,并能逃避最先进的统计与谱防御的检测。
摘要 :Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to data poisoning attacks. Existing attack strategies typically rely on locally uniform perturbations, which treat all samples indiscriminately. This approach is inefficient, as it wastes the perturbation budget on low-impact samples, and lacks stealthiness due to significant statistical deviations. In this paper, we propose a novel Global Budget Allocation attack strategy. Leveraging the theoretical insight that a sample's influence on value function convergence is proportional to its Temporal Difference (TD) error, we formulate the attack as a global resource allocation problem. We derive a closed-form solution where perturbation magnitudes are assigned proportional to the TD-error sensitivity under a global L2 constraint. Empirical results on D4RL benchmarks demonstrate that our method significantly outperforms baseline strategies, achieving up to 80% performance degradation with minimal perturbations that evade detection by state-of-the-art statistical and spectral defenses.
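摘要中"在全局L2约束下按TD误差成比例分配扰动幅度"的闭式解可以直接写成几行代码:当目标是在 ||ε||₂ ≤ B 的约束下最大化 Σᵢ|δᵢ|εᵢ 时,由Cauchy-Schwarz不等式的取等条件可得 εᵢ = B·|δᵢ|/||δ||₂。下面是这一分配规则的示意实现,TD误差取玩具数值,仅说明分配逻辑,并非完整的中毒攻击流程。

import numpy as np

def allocate_perturbation_budget(td_errors, total_l2_budget):
    """在全局 L2 约束 ||eps||_2 <= B 下,按 |TD 误差| 成比例分配扰动幅度:
    eps_i = B * |delta_i| / ||delta||_2。"""
    sens = np.abs(np.asarray(td_errors, dtype=float))
    norm = np.linalg.norm(sens)
    if norm == 0:
        return np.zeros_like(sens)
    return total_l2_budget * sens / norm

td_errors = np.array([0.1, -2.5, 0.8, 0.05, 1.3])   # 各样本的 TD 误差(玩具数值)
eps = allocate_perturbation_budget(td_errors, total_l2_budget=1.0)
print("扰动幅度:", eps, " 总 L2 范数:", np.linalg.norm(eps))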


【3】Using reinforcement learning to probe the role of feedback in skill acquisition
标题:使用强化学习探讨反馈在技能习得中的作用
链接:https://arxiv.org/abs/2512.08463

作者:Antonio Terpin,Raffaello D'Andrea
备注:Website: https://antonioterpin.com/fluids-control
摘要:许多高水平的人类活动是在很少或没有外部反馈的情况下完成的:想想花样滑冰运动员完成三周跳、投手投出一记好球曲线球,或者咖啡师拉出拿铁艺术。为了在完全受控的条件下研究技能习得的过程,我们绕开了人类受试者。取而代之,我们将一个通用强化学习智能体直接连接到桌面循环水槽中的旋转圆柱体上,以最大化或最小化阻力。该设置具有几个理想的属性。首先,它是一个物理系统,具有只有物理世界才有的丰富交互和复杂动态:流动高度混沌,要精确建模或模拟即使并非不可能也极其困难。其次,目标(阻力最小化或最大化)很容易陈述,可以直接体现在奖励中,但好的策略事先并不明显。第三,几十年的实验研究为简单、高性能的开环策略提供了现成方案。最后,该装置价格低廉,比人类研究更容易复现。在我们的实验中,我们发现高维流动反馈使智能体只需几分钟的真实世界交互就能发现高性能的阻力控制策略。当我们随后在没有任何反馈的情况下重放相同的动作序列时,我们获得了几乎相同的表现。这表明执行学到的策略并不需要反馈,尤其是流动反馈。令人惊讶的是,在训练过程中没有流动反馈时,智能体在阻力最大化任务中无法发现任何表现良好的策略,但在阻力最小化任务中仍然成功,尽管速度更慢、可靠性更低。我们的研究表明,学习一项高性能技能所需的信息可能比执行它更丰富,而学习条件是友善的还是险恶的,完全取决于目标,而不是动态或策略的复杂性。
摘要:Many high-performance human activities are executed with little or no external feedback: think of a figure skater landing a triple jump, a pitcher throwing a curveball for a strike, or a barista pouring latte art. To study the process of skill acquisition under fully controlled conditions, we bypass human subjects. Instead, we directly interface a generalist reinforcement learning agent with a spinning cylinder in a tabletop circulating water channel to maximize or minimize drag. This setup has several desirable properties. First, it is a physical system, with the rich interactions and complex dynamics that only the physical world has: the flow is highly chaotic and extremely difficult, if not impossible, to model or simulate accurately. Second, the objective -- drag minimization or maximization -- is easy to state and can be captured directly in the reward, yet good strategies are not obvious beforehand. Third, decades-old experimental studies provide recipes for simple, high-performance open-loop policies. Finally, the setup is inexpensive and far easier to reproduce than human studies. In our experiments we find that high-dimensional flow feedback lets the agent discover high-performance drag-control strategies with only minutes of real-world interaction. When we later replay the same action sequences without any feedback, we obtain almost identical performance. This shows that feedback, and in particular flow feedback, is not needed to execute the learned policy. Surprisingly, without flow feedback during training the agent fails to discover any well-performing policy in drag maximization, but still succeeds in drag minimization, albeit more slowly and less reliably. Our studies show that learning a high-performance skill can require richer information than executing it, and learning conditions can be kind or wicked depending solely on the goal, not on dynamics or policy complexity.


【4】Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks
标题:干扰攻击下无人机协作中继网络的多智能体深度强化学习
链接:https://arxiv.org/abs/2512.08341

作者:Thai Duong Nguyen,Ngoc-Tan Nguyen,Thanh-Dao Nguyen,Nguyen Van Huynh,Dinh-Hieu Tran,Symeon Chatzinotas
备注:IEEE ICC 2026
摘要:部署无人机(UAV)机群作为动态通信中继是下一代战术网络的关键。然而,在对抗环境中运行需要解决复杂的权衡,包括最大化系统吞吐量,同时确保避免碰撞以及对敌方干扰的韧性。由于该问题的动态性和多目标性,现有的基于启发式的方法通常难以找到有效的解决方案。本文将这一挑战表述为一个合作式多智能体强化学习(MARL)问题,并使用集中式训练与分散式执行(CTDE)框架求解。我们的方法采用一个利用全局状态信息的集中式评论家(critic),来指导仅使用本地观测的分散式执行者(actor)。仿真结果表明,我们提出的框架显著优于启发式基线,总系统吞吐量提高了约50%,同时实现了接近零的碰撞率。一个关键发现是,智能体在没有显式编程的情况下涌现出了抗干扰策略:它们学会智能地调整自身位置,以平衡减轻干扰机干扰与保持同地面用户有效通信链路之间的权衡。
摘要:The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next-generation tactical networks. However, operating in contested environments requires solving a complex trade-off, including maximizing system throughput while ensuring collision avoidance and resilience against adversarial jamming. Existing heuristic-based approaches often struggle to find effective solutions due to the dynamic and multi-objective nature of this problem. This paper formulates this challenge as a cooperative Multi-Agent Reinforcement Learning (MARL) problem, solved using the Centralized Training with Decentralized Execution (CTDE) framework. Our approach employs a centralized critic that uses global state information to guide decentralized actors which operate using only local observations. Simulation results show that our proposed framework significantly outperforms heuristic baselines, increasing the total system throughput by approximately 50% while simultaneously achieving a near-zero collision rate. A key finding is that the agents develop an emergent anti-jamming strategy without explicit programming. They learn to intelligently position themselves to balance the trade-off between mitigating interference from jammers and maintaining effective communication links with ground users.


【5】An Introduction to Deep Reinforcement and Imitation Learning
标题:深度强化和模仿学习简介
链接:https://arxiv.org/abs/2512.08052

作者:Pedro Santana
摘要 :机器人和虚拟角色等具身智能体必须不断选择动作来有效执行任务,即求解复杂的序贯决策问题。鉴于手动设计此类控制器的难度,基于学习的方法已成为有前途的替代方案,其中最值得注意的是深度强化学习(DRL)和深度模仿学习(DIL)。DRL利用奖励信号来优化行为,而DIL使用专家演示来指导学习。本文在具身智能体的背景下介绍DRL和DIL,对相关文献采取简洁、深度优先的处理方式。本文自成体系,按需给出所有必要的数学和机器学习概念。它并不打算作为该领域的综述;相反,它专注于一小部分基础算法和技术,优先考虑深入理解而不是广泛覆盖。内容范围从马尔可夫决策过程,到DRL中的REINFORCE和近端策略优化(PPO),再到DIL中的行为克隆、数据集聚合(DAgger)和生成对抗模仿学习(GAIL)。
摘要:Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficulty of designing such controllers manually, learning-based approaches have emerged as promising alternatives, most notably Deep Reinforcement Learning (DRL) and Deep Imitation Learning (DIL). DRL leverages reward signals to optimize behavior, while DIL uses expert demonstrations to guide learning. This document introduces DRL and DIL in the context of embodied agents, adopting a concise, depth-first approach to the literature. It is self-contained, presenting all necessary mathematical and machine learning concepts as they are needed. It is not intended as a survey of the field; rather, it focuses on a small set of foundational algorithms and techniques, prioritizing in-depth understanding over broad coverage. The material ranges from Markov Decision Processes to REINFORCE and Proximal Policy Optimization (PPO) for DRL, and from Behavioral Cloning to Dataset Aggregation (DAgger) and Generative Adversarial Imitation Learning (GAIL) for DIL.
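代码示意:针对摘要中提到的REINFORCE算法,下面给出一个极简的PyTorch草图,展示"用折扣回报加权对数策略概率"这一核心更新。环境与奖励均为假设的玩具数据,仅用于说明,不代表该文献中的具体实现。

import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))   # 假设:4维状态、2个离散动作
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

states = torch.randn(10, 4)                      # 一条长度为10的示例轨迹
dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()
rewards = torch.randn(10)                        # 假设的即时奖励

# 计算折扣回报 G_t = r_t + gamma * G_{t+1}
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards.tolist()):
    G = r + gamma * G
    returns.append(G)
returns = torch.tensor(list(reversed(returns)))

# REINFORCE:最小化 -(log pi(a_t|s_t) * G_t) 之和
loss = -(dist.log_prob(actions) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()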


【6】Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care
标题:重症监护中的离线多目标强化学习基准
链接:https://arxiv.org/abs/2512.08012

作者:Aryaman Bansal,Divya Sharma
摘要:在重症监护室等重症监护环境中,临床医生面临着平衡冲突目标的复杂挑战,主要是最大限度地提高患者生存率,同时最小化资源利用(例如,住院时长)。单目标强化学习方法通常通过优化固定的标量化奖励函数来解决这个问题,导致僵化的策略无法适应不同的临床优先级。多目标强化学习(MORL)通过沿着帕累托前沿学习一组最优策略来提供解决方案,允许在测试时进行动态偏好选择。然而,在医疗保健中应用MORL必须基于历史数据进行严格的离线学习。   在本文中,我们在MIMIC-IV数据集上,对三种离线MORL算法,即条件保守帕累托Q学习(CPQL)、自适应CPQL和改进的帕累托高效决策智能体(PEDA)决策Transformer(PEDA DT),与三个标量化的单目标基线(BC、CQL和DDQN)进行了基准测试。使用离策略评估(OPE)指标,我们证明了PEDA DT算法相比静态标量化基线具有更好的灵活性。值得注意的是,我们的结果扩展了此前医疗领域单目标决策Transformer的研究结论,证实了序列建模架构在扩展到多目标条件生成时仍然稳健且有效。这些发现表明,离线MORL是一个很有前途的框架,能够在无需重新训练的情况下,实现重症监护中个性化、可调整的决策。
摘要:In critical care settings such as the Intensive Care Unit, clinicians face the complex challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing resource utilization (e.g., length of stay). Single-objective Reinforcement Learning approaches typically address this by optimizing a fixed scalarized reward function, resulting in rigid policies that fail to adapt to varying clinical priorities. Multi-objective Reinforcement Learning (MORL) offers a solution by learning a set of optimal policies along the Pareto Frontier, allowing for dynamic preference selection at test time. However, applying MORL in healthcare necessitates strict offline learning from historical data.   In this paper, we benchmark three offline MORL algorithms, Conditioned Conservative Pareto Q-Learning (CPQL), Adaptive CPQL, and a modified Pareto Efficient Decision Agent (PEDA) Decision Transformer (PEDA DT), against three scalarized single-objective baselines (BC, CQL, and DDQN) on the MIMIC-IV dataset. Using Off-Policy Evaluation (OPE) metrics, we demonstrate that PEDA DT algorithm offers superior flexibility compared to static scalarized baselines. Notably, our results extend previous findings on single-objective Decision Transformers in healthcare, confirming that sequence modeling architectures remain robust and effective when scaled to multi-objective conditioned generation. These findings suggest that offline MORL is a promising framework for enabling personalized, adjustable decision-making in critical care without the need for retraining.


【7】Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
标题:基于价值的强化学习的组合优化启发式方法:统一框架和分析
链接:https://arxiv.org/abs/2512.08601

作者:Orit Davidovich,Shimrit Shtern,Segev Wasserkrug,Nimrod Megiddo
摘要:自20世纪90年代以来,已经有大量实证工作训练统计模型(如神经网络,NN)作为组合优化(CO)问题的学习型启发式方法。如果成功,这种方法就不需要专家为每类问题设计启发式方法。由于其结构特点,许多困难的CO问题都适合用强化学习(RL)来处理。事实上,大量文献使用基于值的、策略梯度或演员-评论家方法训练神经网络,在经验最优性差距和推理运行时间方面都取得了有希望的结果。然而,支持将RL用于CO问题的理论工作一直很缺乏。为此,我们引入了一个统一的框架,通过马尔可夫决策过程(MDP)来建模CO问题,并使用RL技术来求解。我们给出了易于检验的假设,在这些假设下,CO问题可以被表述为等价的无折扣MDP,其最优解即为原CO问题的最优解。此外,我们建立了基于值的RL技术收敛到CO问题近似解的条件,并对相应的最优性差距给出保证。我们的收敛性分析提供了:(1)每次RL迭代时批量大小和投影梯度下降步数的充分增长速率;(2)以问题参数和目标RL精度表示的最优性差距;以及(3)状态空间嵌入选择的重要性。总之,我们的分析阐明了著名的深度Q学习算法在这一问题背景下的成功(和局限性)。
摘要 :Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.


元学习(1篇)

【1】A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research
标题:基于多元伯努利分布的多标签数据抽样方法及其在元研究中的应用
链接:https://arxiv.org/abs/2512.08371

作者:Simon Chung,Colby J. Vorland,Donna L. Maney,Andrew W. Brown
摘要:数据集可能包含具有多个标签的观测。如果标签并非互斥,且各标签的出现频率差异很大,那么要获得一个既包含足够多带有稀少标签的观测以便对这些标签进行推断、又以已知方式偏离总体频率的样本,就会带来挑战。在本文中,我们将多元伯努利分布作为多标签问题的基础分布。我们提出了一种将标签间依赖关系纳入考虑的新型采样算法。它使用观察到的标签频率来估计多元伯努利分布参数,并为每个标签组合计算权重。这种方法确保加权采样在考虑标签依赖性的同时获得目标分布特征。我们将该方法应用于Web of Science中标注了64个生物医学主题类别的研究文章样本。我们的目标是保持类别频率的顺序、减少最常见与最不常见类别之间的频率差异,并考虑类别之间的依赖关系。该方法产生了一个更加平衡的子样本,提高了少数类别的代表性。
摘要:Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences about those labels, and which deviates from the population frequencies in a known manner, creates challenges. In this paper, we consider a multivariate Bernoulli distribution as our underlying distribution of a multi-label problem. We present a novel sampling algorithm that takes label dependencies into account. It uses observed label frequencies to estimate multivariate Bernoulli distribution parameters and calculate weights for each label combination. This approach ensures the weighted sampling acquires target distribution characteristics while accounting for label dependencies. We applied this approach to a sample of research articles from Web of Science labeled with 64 biomedical topic categories. We aimed to preserve category frequency order, reduce frequency differences between most and least common categories, and account for category dependencies. This approach produced a more balanced sub-sample, enhancing the representation of minority categories.
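代码示意:下面的numpy草图展示"用观测到的标签组合频率估计多元伯努利分布,再为每个标签组合计算权重并加权抽样"的一般流程。标签矩阵与目标分布(此处假设各已观测组合等概率)均为示例假设,并非论文的具体设定。

import numpy as np

rng = np.random.default_rng(0)
Y = (rng.random((1000, 3)) < np.array([0.7, 0.2, 0.05])).astype(int)   # 假设的 n x k 多标签矩阵

combos, inverse, counts = np.unique(Y, axis=0, return_inverse=True, return_counts=True)
observed_p = counts / len(Y)                          # 各标签组合的经验频率
target_p = np.full(len(combos), 1.0 / len(combos))    # 假设的目标频率:各组合等概率

weights = target_p[inverse] / observed_p[inverse]     # 每个观测的抽样权重
weights /= weights.sum()

idx = rng.choice(len(Y), size=200, replace=False, p=weights)   # 加权抽样得到更平衡的子样本
print(np.unique(Y[idx], axis=0, return_counts=True)[1])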


符号|符号学习(1篇)

【1】Towards symbolic regression for interpretable clinical decision scores
标题:迈向可解释临床决策评分的符号回归
链接:https://arxiv.org/abs/2512.07961

作者:Guilherme Seidyo Imai Aldeia,Joseph D. Romano,Fabricio Olivetti de Franca,Daniel S. Herman,William G. La Cava
备注:15 pages, 5 figures. Accepted for publication in Philosophical Transactions A. Autor Accepted Manuscript version
摘要:医疗决策经常使用将风险方程与规则相结合的算法,提供明确和标准化的治疗途径。传统上,符号回归(SR)将其搜索空间限制为连续函数形式及其参数,这使得难以对这种决策进行建模。然而,由于其能够导出数据驱动的可解释模型,SR有望开发数据驱动的临床风险评分。为此,我们引入了Brush,这是一种SR算法,它将决策树式分裂算法与非线性常数优化相结合,允许将基于规则的逻辑无缝集成到符号回归和分类模型中。Brush在SRBench上实现了帕累托最优性能,并被应用于概括两个广泛使用的临床评分系统,实现了高准确性和可解释的模型。与决策树、随机森林和其他SR方法相比,Brush在生成更简单的模型的同时实现了相当或更好的预测性能。
摘要:Medical decision-making makes frequent use of algorithms that combine risk equations with rules, providing clear and standardized treatment pathways. Symbolic regression (SR) traditionally limits its search space to continuous function forms and their parameters, making it difficult to model this decision-making. However, due to its ability to derive data-driven, interpretable models, SR holds promise for developing data-driven clinical risk scores. To that end we introduce Brush, an SR algorithm that combines decision-tree-like splitting algorithms with non-linear constant optimization, allowing for seamless integration of rule-based logic into symbolic regression and classification models. Brush achieves Pareto-optimal performance on SRBench, and was applied to recapitulate two widely used clinical scoring systems, achieving high accuracy and interpretable models. Compared to decision trees, random forests, and other SR methods, Brush achieves comparable or superior predictive performance while producing simpler models.


医学相关(4篇)

【1】CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space
标题:CLARITY:通过在潜在空间中建模上下文感知疾病轨迹来指导治疗决策的医学世界模型
链接:https://arxiv.org/abs/2512.08029

作者:Tianxingjian Ding,Yuanhao Zou,Chen Chen,Mubarak Shah,Yu Tian
摘要 :肿瘤学的临床决策需要预测动态的疾病演变,这是当前静态AI预测器无法完成的任务。虽然世界模型(WM)为生成式预测提供了一个范例,但现有的医学应用仍然有限。现有方法往往依赖随机扩散模型,专注于视觉重建而非因果性的生理转变。此外,在医学领域,像MeWM这样的模型通常忽略患者特定的时间和临床背景,并且缺乏将预测与治疗决策联系起来的反馈机制。为了解决这些差距,我们引入了CLARITY,一个直接在结构化潜在空间内预测疾病演变的医学世界模型。它显式整合了时间间隔(时间背景)和患者特定数据(临床背景),将治疗条件下的疾病进展建模为平滑、可解释的轨迹,从而生成生理上忠实的个性化治疗计划。最后,CLARITY引入了一个新颖的从预测到决策的框架,将潜在空间中的推演转化为透明、可操作的建议。在治疗计划方面,CLARITY展示了最先进的性能:在MU-Glioma-Post数据集上,我们的方法比最近的MeWM高出12%,并显著超过所有其他医学专用的大型语言模型。
摘要:Clinical decision-making in oncology requires predicting dynamic disease evolution, a task current static AI predictors cannot perform. While world models (WMs) offer a paradigm for generative prediction, existing medical applications remain limited. Existing methods often rely on stochastic diffusion models, focusing on visual reconstruction rather than causal, physiological transitions. Furthermore, in medical domain, models like MeWM typically ignore patient-specific temporal and clinical contexts and lack a feedback mechanism to link predictions to treatment decisions. To address these gaps, we introduce CLARITY, a medical world model that forecasts disease evolution directly within a structured latent space. It explicitly integrates time intervals (temporal context) and patient-specific data (clinical context) to model treatment-conditioned progression as a smooth, interpretable trajectory, and thus generate physiologically faithful, individualized treatment plans. Finally, CLARITY introduces a novel prediction-to-decision framework, translating latent rollouts into transparent, actionable recommendations. CLARITY demonstrates state-of-the-art performance in treatment planning. On the MU-Glioma-Post dataset, our approach outperforms recent MeWM by 12\%, and significantly surpasses all other medical-specific large language models.


【2】Bridging the Clinical Expertise Gap: Development of a Web-Based Platform for Accessible Time Series Forecasting and Analysis
标题:弥合临床专业知识差距:开发基于Web的可访问时间序列预测和分析平台
链接:https://arxiv.org/abs/2512.07992

作者:Aaron D. Mullen,Daniel R. Harris,Svetla Slavova,V. K. Cody Bumgardner
摘要:时间序列预测在各个领域和行业都有应用,特别是在医疗保健领域,但分析数据、构建模型和解释结果所需的技术专业知识可能是使用这些技术的障碍。本文介绍了一个网络平台,使分析和绘制数据,训练预测模型,解释和查看结果的过程可供研究人员和临床医生访问。用户可以上传数据并生成图表来展示他们的变量以及它们之间的关系。该平台支持多种预测模型和训练技术,可根据用户的需求进行高度定制。此外,可以从大型语言模型中生成建议和解释,这可以帮助用户为其数据选择适当的参数,并理解每个模型的结果。我们的目标是将该平台集成到学习健康系统中,以便从临床管道中连续收集数据和进行推断。
摘要:Time series forecasting has applications across domains and industries, especially in healthcare, but the technical expertise required to analyze data, build models, and interpret results can be a barrier to using these techniques. This article presents a web platform that makes the process of analyzing and plotting data, training forecasting models, and interpreting and viewing results accessible to researchers and clinicians. Users can upload data and generate plots to showcase their variables and the relationships between them. The platform supports multiple forecasting models and training techniques which are highly customizable according to the user's needs. Additionally, recommendations and explanations can be generated from a large language model that can help the user choose appropriate parameters for their data and understand the results for each model. The goal is to integrate this platform into learning health systems for continuous data collection and inference from clinical pipelines.


【3】Medical Test-free Disease Detection Based on Big Data
标题:基于大数据的免医学检验疾病检测
链接:https://arxiv.org/abs/2512.07856

作者:Haokun Zhao,Yingzhe Bai,Qingyang Xu,Lixin Zhou,Jianxin Chen,Jicong Fan
摘要:准确的疾病检测对于有效的医疗和患者护理至关重要。然而,疾病检测的过程通常与广泛的医学测试和相当大的成本相关联,使得对患者进行所有可能的医学测试以诊断或预测数百或数千种疾病是不切实际的。在这项工作中,我们提出了疾病检测协作学习(CLDD),这是一种新的基于图的深度学习模型,通过自适应地利用疾病之间的关联和患者之间的相似性,将疾病检测制定为协作学习任务。CLDD整合了电子健康记录中的患者-疾病相互作用和人口统计学特征,为每位患者检测数百或数千种疾病,几乎不依赖相应的医学测试。在包含61,191名患者和2,000种疾病的MIMIC-IV数据集的处理版本上进行的广泛实验表明,CLDD在多个指标上始终优于代表性基线,实现了6.33%的召回率提高和7.63%的精度提高。此外,对个体患者的案例研究表明,CLDD可以成功地在其排名靠前的预测中恢复被掩盖的疾病,证明了疾病预测的可解释性和可靠性。通过降低诊断成本和改善可及性,CLDD有望实现大规模疾病筛查和社会健康保障。
摘要:Accurate disease detection is of paramount importance for effective medical treatment and patient care. However, the process of disease detection is often associated with extensive medical testing and considerable costs, making it impractical to perform all possible medical tests on a patient to diagnose or predict hundreds or thousands of diseases. In this work, we propose Collaborative Learning for Disease Detection (CLDD), a novel graph-based deep learning model that formulates disease detection as a collaborative learning task by exploiting associations among diseases and similarities among patients adaptively. CLDD integrates patient-disease interactions and demographic features from electronic health records to detect hundreds or thousands of diseases for every patient, with little to no reliance on the corresponding medical tests. Extensive experiments on a processed version of the MIMIC-IV dataset comprising 61,191 patients and 2,000 diseases demonstrate that CLDD consistently outperforms representative baselines across multiple metrics, achieving a 6.33\% improvement in recall and 7.63\% improvement in precision. Furthermore, case studies on individual patients illustrate that CLDD can successfully recover masked diseases within its top-ranked predictions, demonstrating both interpretability and reliability in disease prediction. By reducing diagnostic costs and improving accessibility, CLDD holds promise for large-scale disease screening and social health security.


【4】Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation
标题:肿瘤锚定深度特征随机森林用于肺癌分割中的分布外检测
链接:https://arxiv.org/abs/2512.08216

作者:Aneesh Rangnekar,Harini Veeraraghavan
摘要 :从3D计算机断层扫描(CT)扫描中准确分割癌性病变对于自动治疗计划和反应评估至关重要。然而,即使是将自我监督学习(SSL)预训练的Transformers与卷积解码器相结合的最先进的模型也容易受到分布外(OOD)输入的影响,从而产生不正确的肿瘤分割,从而对安全的临床部署构成风险。现有的基于logit的方法受到特定于任务的模型偏差的影响,而明确检测OOD的架构增强增加了参数和计算成本。因此,我们引入了一个即插即用和轻量级的事后随机森林为基础的OOD检测框架,称为RF-Deep,利用有限的离群值暴露的深功能。RF-Deep通过重新利用来自预训练然后微调的骨干编码器的分层特征来增强对成像变化的泛化,通过从锚定到预测肿瘤分割的多个感兴趣区域提取特征来提供任务相关的OOD检测。因此,它可以缩放到不同视场的图像。我们使用近OOD(肺栓塞,阴性COVID-19)和远OOD(肾癌,健康胰腺)数据集的1,916次CT扫描将RF-Deep与现有OOD检测方法进行了比较。RF-Deep对于具有挑战性的近OOD数据集实现了AUROC > 93.50,对于远OOD数据集实现了近乎完美的检测(AUROC > 99.00),大大优于基于logit和放射组学的方法。RF-Deep在不同深度和预训练策略的网络中保持了类似的性能一致性,证明了其作为一种轻量级的、与架构无关的方法的有效性,可以提高从CT体积中分割肿瘤的可靠性。
摘要:Accurate segmentation of cancerous lesions from 3D computed tomography (CT) scans is essential for automated treatment planning and response assessment. However, even state-of-the-art models combining self-supervised learning (SSL) pretrained transformers with convolutional decoders are susceptible to out-of-distribution (OOD) inputs, generating confidently incorrect tumor segmentations, posing risks for safe clinical deployment. Existing logit-based methods suffer from task-specific model biases, while architectural enhancements to explicitly detect OOD increase parameters and computational costs. Hence, we introduce a plug-and-play and lightweight post-hoc random forests-based OOD detection framework called RF-Deep that leverages deep features with limited outlier exposure. RF-Deep enhances generalization to imaging variations by repurposing the hierarchical features from the pretrained-then-finetuned backbone encoder, providing task-relevant OOD detection by extracting the features from multiple regions of interest anchored to the predicted tumor segmentations. Hence, it scales to images of varying fields-of-view. We compared RF-Deep against existing OOD detection methods using 1,916 CT scans across near-OOD (pulmonary embolism, negative COVID-19) and far-OOD (kidney cancer, healthy pancreas) datasets. RF-Deep achieved AUROC > 93.50 for the challenging near-OOD datasets and near-perfect detection (AUROC > 99.00) for the far-OOD datasets, substantially outperforming logit-based and radiomics approaches. RF-Deep maintained similar performance consistency across networks of different depths and pretraining strategies, demonstrating its effectiveness as a lightweight, architecture-agnostic approach to enhance the reliability of tumor segmentation from CT volumes.
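代码示意:下面用sklearn给出RF-Deep式"深度特征 + 少量离群暴露 + 随机森林"的事后OOD检测流程草图。真实流程中特征来自预训练-再微调编码器在肿瘤锚定ROI上的汇聚;此处用随机向量代替,维度、样本数等均为假设值,仅演示流程而非论文实现。

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
in_dist_feats = rng.normal(0.0, 1.0, size=(500, 256))   # 分布内病例的深度特征(示例)
outlier_feats = rng.normal(3.0, 1.0, size=(20, 256))     # 有限的离群暴露样本(示例)

X = np.vstack([in_dist_feats, outlier_feats])
y = np.array([0] * len(in_dist_feats) + [1] * len(outlier_feats))

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X, y)

test_feats = rng.normal(3.0, 1.0, size=(5, 256))          # 新扫描的ROI特征(示例)
print(clf.predict_proba(test_feats)[:, 1])                # 越接近1越可能是OOD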


超分辨率|去噪|去模糊|去雾(1篇)

【1】Astra: General Interactive World Model with Autoregressive Denoising
标题:Astra:具有自回归去噪的通用互动世界模型
链接:https://arxiv.org/abs/2512.08931

作者:Yixuan Zhu,Jiaqi Feng,Wenzhao Zheng,Yuan Gao,Xin Tao,Pengfei Wan,Jie Zhou,Jiwen Lu
备注:Code is available at: https://github.com/EternalEvan/Astra
摘要:扩散Transformers的最新进展使视频生成模型能够从文本或图像生成高质量的视频剪辑。然而,能够从过去的观测和行动中预测长期未来的世界模型仍然没有得到充分的探索,特别是对于通用场景和各种形式的行动。为了弥合这一差距,我们引入了Astra,这是一个交互式的通用世界模型,可以为不同的场景生成真实世界的未来(例如,自动驾驶,机器人抓取)与精确的动作交互(例如,摄像机运动、机器人动作)。我们提出了一个自回归去噪架构,并使用时间因果注意力来聚合过去的观察和支持流输出。我们使用噪声增强的历史记忆,以避免过度依赖于过去的帧,以平衡响应时间的连贯性。为了精确的动作控制,我们引入了一个动作感知适配器,它直接将动作信号注入去噪过程。我们进一步开发了一个混合的动作专家,动态路由异构的动作模式,提高跨不同的现实世界的任务,如探索,操纵和相机控制的多功能性。Astra实现了交互式、一致性和通用的长期视频预测,并支持各种形式的交互。在多个数据集上的实验证明了Astra在保真度,长期预测和动作对齐方面的改进,超过了现有的最先进的世界模型。
摘要:Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict long-horizon futures from past observations and actions remain underexplored, especially for general-purpose scenarios and various forms of actions. To bridge this gap, we introduce Astra, an interactive general world model that generates real-world futures for diverse scenarios (e.g., autonomous driving, robot grasping) with precise action interactions (e.g., camera motion, robot action). We propose an autoregressive denoising architecture and use temporal causal attention to aggregate past observations and support streaming outputs. We use a noise-augmented history memory to avoid over-reliance on past frames to balance responsiveness with temporal coherence. For precise action control, we introduce an action-aware adapter that directly injects action signals into the denoising process. We further develop a mixture of action experts that dynamically route heterogeneous action modalities, enhancing versatility across diverse real-world tasks such as exploration, manipulation, and camera control. Astra achieves interactive, consistent, and general long-term video prediction and supports various forms of interactions. Experiments across multiple datasets demonstrate the improvements of Astra in fidelity, long-range prediction, and action alignment over existing state-of-the-art world models.


自动驾驶|车辆|车道检测等(2篇)

【1】Command & Control (C2) Traffic Detection Via Algorithm Generated Domain (Dga) Classification Using Deep Learning And Natural Language Processing
标题:使用深度学习和自然语言处理、通过算法生成域(DGA)分类进行命令与控制(C2)流量检测
链接:https://arxiv.org/abs/2512.07866

作者:Maria Milena Araujo Felix
备注:Language: Portuguese
摘要:现代恶意软件的复杂性,特别是与命令和控制(C2)服务器的通信,已经使基于黑名单的静态防御过时。域生成算法(DGA)的使用允许攻击者每天生成数千个动态地址,从而阻碍了传统防火墙的拦截。本文旨在提出并评估一种使用深度学习和自然语言处理(NLP)技术检测DGA域名的方法。该方法包括收集一个包含50,000个合法域名和50,000个恶意域名的混合数据库,然后提取词汇特征并训练循环神经网络(LSTM)。结果表明,虽然统计熵分析对简单的DGA有效,但神经网络方法在检测复杂模式方面更具优势,达到了97.2%的准确率,并降低了在含糊的合法流量场景中的误报率。
摘要 :The sophistication of modern malware, specifically regarding communication with Command and Control (C2) servers, has rendered static blacklist-based defenses obsolete. The use of Domain Generation Algorithms (DGA) allows attackers to generate thousands of dynamic addresses daily, hindering blocking by traditional firewalls. This paper aims to propose and evaluate a method for detecting DGA domains using Deep Learning and Natural Language Processing (NLP) techniques. The methodology consisted of collecting a hybrid database containing 50,000 legitimate and 50,000 malicious domains, followed by the extraction of lexical features and the training of a Recurrent Neural Network (LSTM). Results demonstrated that while statistical entropy analysis is effective for simple DGAs, the Neural Network approach presents superiority in detecting complex patterns, reaching 97.2% accuracy and reducing the false positive rate in ambiguous lawful traffic scenarios.
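代码示意:摘要中的核心模型是针对域名字符序列的LSTM分类器。下面给出一个极简的PyTorch草图(字符词表大小、嵌入维度等均为假设值),仅说明"字符索引 -> 嵌入 -> LSTM -> DGA概率"的结构,并非论文的原始实现。

import torch
import torch.nn as nn

class DGAClassifier(nn.Module):
    def __init__(self, vocab_size=40, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):                       # x: (batch, 域名最大长度) 的字符索引
        _, (h, _) = self.lstm(self.embed(x))    # 取最后时间步的隐状态
        return torch.sigmoid(self.fc(h[-1])).squeeze(-1)   # 输出为DGA概率

model = DGAClassifier()
fake_batch = torch.randint(1, 40, (8, 30))      # 8个长度为30的示例域名索引
labels = torch.randint(0, 2, (8,)).float()
loss = nn.BCELoss()(model(fake_batch), labels)
loss.backward()
print(float(loss))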


【2】HSTMixer: A Hierarchical MLP-Mixer for Large-Scale Traffic Forecasting
标题:HSTMixer:用于大规模交通预测的分层MLP-Mixer
链接:https://arxiv.org/abs/2512.07854

作者:Yongyao Wang,Jingyuan Wang,Xie Yu,Jiahao Ji,Chao Li
备注:10 pages, 9 figures
摘要:交通预测工作对现代城市管理具有重要意义。近年来,大规模预测越来越受到关注,因为它更好地反映了现实世界的交通网络的复杂性。然而,现有的模型往往表现出二次计算的复杂性,使他们不切实际的大规模现实世界的情况。在本文中,我们提出了一种新的框架,层次时空混合器(HSTMixer),它利用了一个全MLP架构的高效和有效的大规模流量预测。HSTMixer采用分层时空混合块,通过自底向上聚合和自顶向下传播来提取多分辨率特征。此外,自适应区域混合器生成基于区域语义的变换矩阵,使我们的模型能够动态捕获不同区域的时空模式。在四个大规模真实世界数据集上进行的大量实验表明,该方法不仅实现了最先进的性能,而且具有竞争力的计算效率。
摘要:Traffic forecasting task is significant to modern urban management. Recently, there is growing attention on large-scale forecasting, as it better reflects the complexity of real-world traffic networks. However, existing models often exhibit quadratic computational complexity, making them impractical for large-scale real-world scenarios. In this paper, we propose a novel framework, Hierarchical Spatio-Temporal Mixer (HSTMixer), which leverages an all-MLP architecture for efficient and effective large-scale traffic forecasting. HSTMixer employs a hierarchical spatiotemporal mixing block to extract multi-resolution features through bottom-up aggregation and top-down propagation. Furthermore, an adaptive region mixer generates transformation matrices based on regional semantics, enabling our model to dynamically capture evolving spatiotemporal patterns for different regions. Extensive experiments conducted on four large-scale real-world datasets demonstrate that the proposed method not only achieves state-of-the-art performance but also exhibits competitive computational efficiency.


点云|SLAM|雷达|激光|深度RGBD相关(2篇)

【1】Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
标题:逐步增长深度的模型能否克服深度诅咒?一项深入分析
链接:https://arxiv.org/abs/2512.08819

作者:Ferdinand Kapl,Emmanouil Angelis,Tobias Höppe,Kaitlin Maile,Johannes von Oswald,Nino Scherrer,Stefan Bauer
摘要:在训练期间逐渐增加Transformer的深度不仅可以降低训练成本,还能提升推理性能,如MIDAS所示(Saunshi等人,2024年)。然而,到目前为止,对这些收益的机理性理解一直缺失。在这项工作中,我们将其与最近的研究建立了联系:这些研究表明,在非增长的pre-layernorm Transformer中,后半部分的层对最终输出分布的贡献远小于前半部分的层,这一现象也被称为深度诅咒(Sun等人,2025年;Csordás等人,2025年)。通过逐层分析,我们表明,经由逐渐的中间堆叠进行增长可以更有效地利用模型深度,改变残差流结构,并促进可置换计算块的形成。此外,我们提出了MIDAS的一个轻量级改进,在下游推理基准上带来进一步提升。总的来说,这项工作突出了逐渐增加模型深度如何促成不同计算电路的形成,并克服标准非增长模型中有限的深度利用率。
摘要:Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS (Saunshi et al., 2024). Thus far, however, a mechanistic understanding of these gains has been missing. In this work, we establish a connection to recent work showing that layers in the second half of non-grown, pre-layernorm Transformers contribute much less to the final output distribution than those in the first half - also known as the Curse of Depth (Sun et al., 2025, Csordás et al., 2025). Using depth-wise analyses, we demonstrate that growth via gradual middle stacking yields more effective utilization of model depth, alters the residual stream structure, and facilitates the formation of permutable computational blocks. In addition, we propose a lightweight modification of MIDAS that yields further improvements in downstream reasoning benchmarks. Overall, this work highlights how the gradual growth of model depth can lead to the formation of distinct computational circuits and overcome the limited depth utilization seen in standard non-grown models.


【2】Solving Over-Smoothing in GNNs via Nonlocal Message Passing: Algebraic Smoothing and Depth Scalability
标题:通过非局部消息传递解决GNN中的过度平滑:代数平滑和深度可扩展性
链接:https://arxiv.org/abs/2512.08475

作者:Weiqi Guan,Junlin He
备注:18 pages, 4 figures
摘要:层归一化(LN)的放置位置与过度平滑现象之间的关系仍有待研究。我们指出了一个关键的两难困境:Pre-LN架构避免了过度平滑,但受深度诅咒之苦;而Post-LN架构绕过了深度诅咒,却会出现过度平滑。   为了解决这一问题,我们提出了一种基于Post-LN的新方法,它能诱导代数平滑,在避免深度诅咒的同时防止过度平滑。五个基准上的实证结果表明,我们的方法支持更深的网络(最多256层)并提升性能,且无需额外参数。   主要贡献:   理论表征:分析LN动力学及其对过度平滑和深度诅咒的影响。   原则性解决方案:一种参数高效的方法,能诱导代数平滑,同时避免过度平滑和深度诅咒。   经验验证:大量实验表明该方法在更深层GNN中的有效性。
摘要 :The relationship between Layer Normalization (LN) placement and the over-smoothing phenomenon remains underexplored. We identify a critical dilemma: Pre-LN architectures avoid over-smoothing but suffer from the curse of depth, while Post-LN architectures bypass the curse of depth but experience over-smoothing.   To resolve this, we propose a new method based on Post-LN that induces algebraic smoothing, preventing over-smoothing without the curse of depth. Empirical results across five benchmarks demonstrate that our approach supports deeper networks (up to 256 layers) and improves performance, requiring no additional parameters.   Key contributions:   Theoretical Characterization: Analysis of LN dynamics and their impact on over-smoothing and the curse of depth.   A Principled Solution: A parameter-efficient method that induces algebraic smoothing and avoids over-smoothing and the curse of depth.   Empirical Validation: Extensive experiments showing the effectiveness of the method in deeper GNNs.
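代码示意:摘要讨论的核心是层归一化放在残差之前(Pre-LN)还是之后(Post-LN)。下面的PyTorch小例子只对比这两种残差块的写法(子层f为假设的简单MLP),不涉及论文提出的代数平滑方法本身。

import torch
import torch.nn as nn

d = 64
f = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))   # 假设的子层
ln = nn.LayerNorm(d)

def pre_ln_block(x):        # Pre-LN:先归一化再进子层,残差通路保持恒等
    return x + f(ln(x))

def post_ln_block(x):       # Post-LN:子层输出与残差相加之后再归一化
    return ln(x + f(x))

x = torch.randn(2, d)
print(pre_ln_block(x).shape, post_ln_block(x).shape)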


联邦学习|隐私保护|加密(3篇)

【1】Decentralized Trust for Space AI: Blockchain-Based Federated Learning Across Multi-Vendor LEO Satellite Networks
标题:太空人工智能的去中心化信任:跨多供应商LEO卫星网络的基于区块链的联邦学习
链接:https://arxiv.org/abs/2512.08882

作者:Mohamed Elmahallawy,Asma Jodeiri Akbarfam
摘要:太空人工智能的兴起正在通过灾害检测、边境监视和气候监测等应用重塑政府和行业,这些应用由商业和政府低地球轨道(LEO)卫星的海量数据提供支持。联邦卫星学习(FSL)可以在不共享原始数据的情况下进行联合模型训练,但由于间歇性连接而收敛缓慢,并带来关键的信任挑战:在卫星星座中可能出现有偏或伪造的更新,包括通过对星间或星地通信链路的网络攻击注入的更新。我们提出了OrbitChain,这是一个区块链支持的框架,可以在LEO网络中实现值得信赖的多供应商协作。OrbitChain(i)将共识卸载到具有更强计算能力的高空平台(HAP),(ii)确保来自不同供应商拥有的不同轨道的模型更新具有透明、可审计的出处,以及(iii)防止被操纵或不完整的贡献影响全局FSL模型聚合。大量模拟表明,OrbitChain在降低计算和通信开销的同时提高了隐私、安全性和全局模型的准确性。其许可制的权威证明(proof-of-authority)账本以亚秒级延迟完成了1000多个区块(对于1-of-5、3-of-5和5-of-5法定人数,延迟分别为0.16 s、0.26 s和0.35 s)。此外,与单一供应商相比,OrbitChain在真实卫星数据集上将收敛时间缩短了多达30小时,证明了其在实时多供应商学习中的有效性。我们的代码可在https://github.com/wsu-cyber-security-lab-ai/OrbitChain.git上获得
摘要:The rise of space AI is reshaping government and industry through applications such as disaster detection, border surveillance, and climate monitoring, powered by massive data from commercial and governmental low Earth orbit (LEO) satellites. Federated satellite learning (FSL) enables joint model training without sharing raw data, but suffers from slow convergence due to intermittent connectivity and introduces critical trust challenges--where biased or falsified updates can arise across satellite constellations, including those injected through cyberattacks on inter-satellite or satellite-ground communication links. We propose OrbitChain, a blockchain-backed framework that empowers trustworthy multi-vendor collaboration in LEO networks. OrbitChain (i) offloads consensus to high-altitude platforms (HAPs) with greater computational capacity, (ii) ensures transparent, auditable provenance of model updates from different orbits owned by different vendors, and (iii) prevents manipulated or incomplete contributions from affecting global FSL model aggregation. Extensive simulations show that OrbitChain reduces computational and communication overhead while improving privacy, security, and global model accuracy. Its permissioned proof-of-authority ledger finalizes over 1000 blocks with sub-second latency (0.16 s, 0.26 s, 0.35 s for 1-of-5, 3-of-5, and 5-of-5 quorums). Moreover, OrbitChain reduces convergence time by up to 30 hours on real satellite datasets compared to single-vendor, demonstrating its effectiveness for real-time, multi-vendor learning. Our code is available at https://github.com/wsu-cyber-security-lab-ai/OrbitChain.git


【2】DS FedProxGrad: Asymptotic Stationarity Without Noise Floor in Fair Federated Learning
标题:DS FedProxGrad:公平联邦学习中无噪声下界的渐近平稳性
链接:https://arxiv.org/abs/2512.08671

作者:Huzaifa Arif
备注:8 pages
摘要:最近的工作[arifgroup]提出了联邦近端梯度法(FedProxGrad),用于求解群体公平联邦学习中的非凸复合优化问题。然而,原有分析只建立了收敛到平稳点的噪声主导邻域这一结论,并显式依赖于由方差引起的噪声下界。在这项工作中,我们针对带有不精确局部近端解和显式公平性正则化的广义FedProxGrad型分析框架,给出了改进的渐近收敛性分析。我们将这一扩展的分析框架称为DS FedProxGrad(衰减步长FedProxGrad)。在Robbins-Monro步长调度(Robbins与Monro,1951)以及局部不精确性的温和衰减条件下,我们证明了$\liminf_{r\to\infty} \mathbb{E}[\|\nabla F(\mathbf{x}^r)\|^2] = 0$,即该算法是渐近平稳的,且收敛速率不依赖于由方差引起的噪声下界。
摘要:Recent work [arifgroup] introduced Federated Proximal Gradient (FedProxGrad) for solving non-convex composite optimization problems in group fair federated learning. However, the original analysis established convergence only to a noise-dominated neighborhood of stationarity, with explicit dependence on a variance-induced noise floor. In this work, we provide an improved asymptotic convergence analysis for a generalized FedProxGrad-type analytical framework with inexact local proximal solutions and explicit fairness regularization. We call this extended analytical framework DS FedProxGrad (Decay Step Size FedProxGrad). Under a Robbins-Monro step-size schedule (Robbins and Monro, 1951) and a mild decay condition on local inexactness, we prove that $\liminf_{r\to\infty} \mathbb{E}[\|\nabla F(\mathbf{x}^r)\|^2] = 0$, i.e., the algorithm is asymptotically stationary and the convergence rate does not depend on a variance-induced noise floor.
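代码示意:摘要中的Robbins-Monro步长条件要求步长之和发散、平方和收敛。下面几行Python对经典选择 alpha_r = a/(r+1)(常数a为假设值)做数值验证,仅用于说明该条件,并非论文算法本身。

a, R = 0.5, 100000
alphas = [a / (r + 1) for r in range(R)]
print(sum(alphas))                  # 随R增大而发散(调和级数,约为 a*ln R)
print(sum(x * x for x in alphas))   # 收敛到有限值(约为 a^2 * pi^2 / 6)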


【3】Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning
标题:最小化逐层激活范数可改善联邦学习中的泛化
链接:https://arxiv.org/abs/2512.08314

作者:M Yashwanth,Gaurav Kumar Nayak,Harsh Rangwani,Arya Singh,R. Venkatesh Babu,Anirban Chakraborty
备注:Accepted to WACV 2024
摘要 :联合学习(FL)是一种新兴的机器学习框架,它使多个客户端(由服务器协调)能够通过聚合本地训练的模型来协作地训练全局模型,而无需共享任何客户端的训练数据。在最近的工作中已经观察到,以联邦方式学习可能会导致聚合的全局模型收敛到“尖锐的最小值”,从而对FL训练模型的泛化能力产生不利影响。因此,在这项工作中,我们的目标是通过引入“平坦度”约束FL优化问题来提高在联邦设置中训练的模型的泛化性能。这种平坦性约束被施加在从训练损失计算的Hessian的顶部特征值上。当每个客户端在其本地数据上训练模型时,我们进一步利用客户端损失函数重新制定这个复杂的问题,并提出了一种新的计算效率高的正则化技术,称为“MAN”,它最小化客户端模型上每层的激活范数。我们还从理论上表明,最小化激活范数降低了客户端损失的逐层Hessian的顶部特征值,这反过来又降低了整体Hessian的顶部特征值,确保收敛到平坦的最小值。我们将我们提出的平坦度约束优化应用到现有的FL技术中,并获得了显着的改进,从而建立了新的最先进的技术。
摘要:Federated Learning (FL) is an emerging machine learning framework that enables multiple clients (coordinated by a server) to collaboratively train a global model by aggregating the locally trained models without sharing any client's training data. It has been observed in recent works that learning in a federated manner may lead the aggregated global model to converge to a 'sharp minimum' thereby adversely affecting the generalizability of this FL-trained model. Therefore, in this work, we aim to improve the generalization performance of models trained in a federated setup by introducing a 'flatness' constrained FL optimization problem. This flatness constraint is imposed on the top eigenvalue of the Hessian computed from the training loss. As each client trains a model on its local data, we further re-formulate this complex problem utilizing the client loss functions and propose a new computationally efficient regularization technique, dubbed 'MAN,' which Minimizes Activation's Norm of each layer on client-side models. We also theoretically show that minimizing the activation norm reduces the top eigenvalue of the layer-wise Hessian of the client's loss, which in turn decreases the overall Hessian's top eigenvalue, ensuring convergence to a flat minimum. We apply our proposed flatness-constrained optimization to the existing FL techniques and obtain significant improvements, thereby establishing new state-of-the-art.
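代码示意:下面的PyTorch草图演示摘要中MAN的基本思路:用forward hook收集各层激活的L2范数,并以正则项形式加到客户端的本地训练损失上。模型结构与系数lam均为假设值,仅为示意,不是论文代码。

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
activation_norms = []

def hook(module, inputs, output):
    activation_norms.append(output.norm(p=2, dim=1).mean())   # 记录该层激活的平均L2范数

for layer in model:
    if isinstance(layer, nn.Linear):
        layer.register_forward_hook(hook)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
activation_norms.clear()
logits = model(x)
lam = 1e-3                                                     # 假设的正则系数
loss = nn.CrossEntropyLoss()(logits, y) + lam * torch.stack(activation_norms).sum()
loss.backward()
print(float(loss))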


推理|分析|理解|解释(3篇)

【1】Explainable Anomaly Detection for Industrial IoT Data Streams
标题:工业物联网数据流的可解释异常检测
链接:https://arxiv.org/abs/2512.08885

作者:Ana Rita Paupério,Diogo Risca,Afonso Lourenço,Goreti Marreiros,Ricardo Martins
备注:Accepted at 41st ACM/SIGAPP Symposium On Applied Computing (SAC 2026)
摘要:物联网和边缘计算正在改变工业维护,产生连续的数据流,需要在有限的计算资源下进行实时的自适应决策。虽然数据流挖掘(DSM)可以应对这一挑战,但大多数方法都假设完全监督的设置,而在实践中,真实标签通常会延迟或不可用。本文提出了一种协作式DSM框架,将无监督异常检测与交互式、人在回路的学习相结合,以支持维护决策。我们采用在线隔离森林,并使用增量式部分依赖图和一个特征重要性得分来增强可解释性,该得分来自个体条件期望曲线相对于衰减平均值的偏差,使用户能够动态地重新评估特征相关性并调整异常阈值。我们描述了实时实现,并给出了在一台提花织机(Jacquard loom)单元上进行故障检测的初步结果。正在进行的工作旨在通过持续监测来预测并解释即将发生的轴承故障。
摘要:Industrial maintenance is being transformed by the Internet of Things and edge computing, generating continuous data streams that demand real-time, adaptive decision-making under limited computational resources. While data stream mining (DSM) addresses this challenge, most methods assume fully supervised settings, yet in practice, ground-truth labels are often delayed or unavailable. This paper presents a collaborative DSM framework that integrates unsupervised anomaly detection with interactive, human-in-the-loop learning to support maintenance decisions. We employ an online Isolation Forest and enhance interpretability using incremental Partial Dependence Plots and a feature importance score, derived from deviations of Individual Conditional Expectation curves from a fading average, enabling users to dynamically reassess feature relevance and adjust anomaly thresholds. We describe the real-time implementation and provide initial results for fault detection in a Jacquard loom unit. Ongoing work targets continuous monitoring to predict and explain imminent bearing failures.


【2】Multi-domain performance analysis with scores tailored to user preferences
标题:多领域性能分析,根据用户偏好定制评分
链接:https://arxiv.org/abs/2512.08715

作者:Sébastien Piérard,Adrien Deliège,Marc Van Droogenbroeck
摘要:算法、方法和模型的性能往往在很大程度上取决于它们所应用的案例分布,而这种分布是特定于应用领域的。在多个领域进行评估之后,计算(加权)平均性能并仔细审视这一平均过程中发生了什么,能够提供非常丰富的信息。为实现这一目标,我们采用概率框架,并将性能视为一个概率测度(例如,分类任务的归一化混淆矩阵)。结果表明,相应的加权平均正是所谓的汇总(summarization),并且只有某些特殊的评分才会给汇总后的性能赋予一个等于各领域特定性能得分加权算术平均的值。这些评分包括排序评分族,即一个由用户偏好参数化的连续统,而算术平均中所用的权重取决于用户偏好。在此基础上,我们将四类领域,即最容易、最困难、占主导和瓶颈领域,严格定义为用户偏好的函数。在与具体任务无关的一般设定下建立理论之后,我们为二分类问题开发了新的可视化工具。
摘要:The performance of algorithms, methods, and models tends to depend heavily on the distribution of cases on which they are applied, this distribution being specific to the applicative domain. After performing an evaluation in several domains, it is highly informative to compute a (weighted) mean performance and, as shown in this paper, to scrutinize what happens during this averaging. To achieve this goal, we adopt a probabilistic framework and consider a performance as a probability measure (e.g., a normalized confusion matrix for a classification task). It appears that the corresponding weighted mean is known to be the summarization, and that only some remarkable scores assign to the summarized performance a value equal to a weighted arithmetic mean of the values assigned to the domain-specific performances. These scores include the family of ranking scores, a continuum parameterized by user preferences, and that the weights to consider in the arithmetic mean depend on the user preferences. Based on this, we rigorously define four domains, named easiest, most difficult, preponderant, and bottleneck domains, as functions of user preferences. After establishing the theory in a general setting, regardless of the task, we develop new visual tools for two-class classification.


【3】RaX-Crash: A Resource Efficient and Explainable Small Model Pipeline with an Application to City Scale Injury Severity Prediction
标题:RaX-Crash:一个资源高效且可解释的小型模型管道,应用于城市规模伤害严重程度预测
链接:https://arxiv.org/abs/2512.07848

作者:Di Zhu,Chen Xie,Ziwei Wang,Haoyun Zhang
摘要:纽约市每年报告超过10万起机动车碰撞事故,造成重大伤害和公共卫生负担。我们提出了RaX-Crash,这是一个资源高效且可解释的小型模型管道,用于在官方NYC机动车碰撞数据集上进行结构化伤害严重程度预测。RaX-Crash将三个关联表与数千万条记录集成在一起,在分区存储中构建统一的特征模式,并在工程化的表格特征上训练紧凑的基于树的集成模型(Random Forest和XGBoost),再与以文本摘要作为提示、在本地部署的小型语言模型(SLM)进行比较。在按时间划分的保留测试集上,XGBoost和随机森林的准确率分别为0.7828和0.7794,明显优于SLM(0.594和0.496);类不平衡分析表明,简单的类加权以适度的准确率代价提升了对致命事故的召回率,而SHAP归因显示,人类脆弱性因素、时间和位置是预测严重性的主要驱动因素。总体而言,RaX-Crash表明,可解释的小型模型集成仍然是城市规模伤害分析的强大基线,而将表格预测器与SLM生成的叙述相结合的混合管道在不牺牲可扩展性的情况下改善了沟通。
摘要:New York City reports over one hundred thousand motor vehicle collisions each year, creating substantial injury and public health burden. We present RaX-Crash, a resource efficient and explainable small model pipeline for structured injury severity prediction on the official NYC Motor Vehicle Collisions dataset. RaX-Crash integrates three linked tables with tens of millions of records, builds a unified feature schema in partitioned storage, and trains compact tree based ensembles (Random Forest and XGBoost) on engineered tabular features, which are compared against locally deployed small language models (SLMs) prompted with textual summaries. On a temporally held out test set, XGBoost and Random Forest achieve accuracies of 0.7828 and 0.7794, clearly outperforming SLMs (0.594 and 0.496); class imbalance analysis shows that simple class weighting improves fatal recall with modest accuracy trade offs, and SHAP attribution highlights human vulnerability factors, timing, and location as dominant drivers of predicted severity. Overall, RaX-Crash indicates that interpretable small model ensembles remain strong baselines for city scale injury analytics, while hybrid pipelines that pair tabular predictors with SLM generated narratives improve communication without sacrificing scalability.


检测相关(2篇)

【1】ByteStorm: a multi-step data-driven approach for Tropical Cyclones detection and tracking
标题:ByteStorm:热带气旋检测和跟踪的多步骤数据驱动方法
链接:https://arxiv.org/abs/2512.07885

作者:Davide Donno,Donatello Elia,Gabriele Accarino,Marco De Carlo,Enrico Scoccimarro,Silvio Gualdi
备注:21 pages, 10 figures
摘要:准确的热带气旋(TC)跟踪是天气和气候科学领域的一项重大挑战。传统的跟踪方案主要依赖于主观阈值,这可能会在其应用的地理区域的技能中引入偏差。我们提出了ByteStorm,一个有效的数据驱动的框架重建TC轨道没有阈值调整。它利用深度学习网络来检测TC中心(通过分类和定位),仅使用相对涡度(850 mb)和平均海平面压力。然后,通过BYTE算法将检测到的中心链接到TC轨迹。ByteStorm在东太平洋和西北太平洋盆地(ENP和WNP)的最先进的确定性跟踪器上进行了评估。该框架在检测概率($85.05\%$ ENP,$79.48\%$ WNP)、虚警率($23.26\%$ ENP,$16.14\%$ WNP)和高年际变率相关性($0.75$ ENP和$0.69$ WNP)方面实现了优异的性能。这些结果突出了集成深度学习和计算机视觉以实现快速准确TC跟踪的潜力,为传统方法提供了强大的替代方案。
摘要:Accurate tropical cyclones (TCs) tracking represents a critical challenge in the context of weather and climate science. Traditional tracking schemes mainly rely on subjective thresholds, which may introduce biases in their skills on the geographical region of application. We present ByteStorm, an efficient data-driven framework for reconstructing TC tracks without threshold tuning. It leverages deep learning networks to detect TC centers (via classification and localization), using only relative vorticity (850 mb) and mean sea-level pressure. Then, detected centers are linked into TC tracks through the BYTE algorithm. ByteStorm is evaluated against state-of-the-art deterministic trackers in the East- and West-North Pacific basins (ENP and WNP). The proposed framework achieves superior performance in terms of Probability of Detection ($85.05\%$ ENP, $79.48\%$ WNP), False Alarm Rate ($23.26\%$ ENP, $16.14\%$ WNP), and high Inter-Annual Variability correlations ($0.75$ ENP and $0.69$ WNP). These results highlight the potential of integrating deep learning and computer vision for fast and accurate TC tracking, offering a robust alternative to traditional approaches.


【2】Detection of Cyberbullying in GIF using AI
标题:使用人工智能检测GIF中的网络欺凌
链接:https://arxiv.org/abs/2512.07838

作者:Pal Dave,Xiaohong Yuan,Madhuri Siddula,Kaushik Roy
摘要:网络欺凌是一个众所周知的社会问题,而且正在日益升级。由于互联网的蓬勃发展,社交媒体为用户提供了许多不同的方式来表达他们的意见和交换信息。网络欺凌发生在社交媒体上,形式包括短信、评论、分享的图像和GIF或贴纸,以及音频和视频。已经有很多研究针对文本数据检测网络欺凌,也有一些研究可用于图像,但针对GIF/贴纸检测网络欺凌的研究非常少。我们从Twitter收集了一个GIF数据集,并应用深度学习模型来检测数据集中的网络欺凌。首先,我们使用Twitter提取了与网络欺凌相关的话题标签。随后,我们使用这些标签,通过公开可用的GIPHY API下载GIF文件。我们收集了超过4100个GIF,包括网络欺凌和非网络欺凌两类。我们应用深度学习预训练模型VGG16来检测网络欺凌,该模型的准确率达到了97%。我们的工作为在这一领域工作的研究人员提供了GIF数据集。
摘要 :Cyberbullying is a well-known social issue, and it is escalating day by day. Due to the vigorous development of the internet, social media provide many different ways for users to express their opinions and exchange information. Cyberbullying occurs on social media using text messages, comments, shared images, GIFs or stickers, and audio and video. Much research has been done to detect cyberbullying in textual data, and some is available for images, but very few studies address cyberbullying detection on GIFs/stickers. We collect a GIF dataset from Twitter and apply a deep learning model to detect cyberbullying in the dataset. Firstly, we extracted hashtags related to cyberbullying using Twitter. We used these hashtags to download GIF files using the publicly available GIPHY API. We collected over 4100 GIFs, including cyberbullying and non-cyberbullying examples. We applied the deep learning pre-trained model VGG16 for the detection of cyberbullying. The deep learning model achieved an accuracy of 97%. Our work provides the GIF dataset for researchers working in this area.


分类|识别(4篇)

【1】Mitigating Individual Skin Tone Bias in Skin Lesion Classification through Distribution-Aware Reweighting
标题:通过分布感知重新加权减轻皮肤病变分类中的个体肤色偏差
链接:https://arxiv.org/abs/2512.08733

作者:Kuniko Paxton,Zeinab Dehghani,Koorosh Aslansefat,Dhavalkumar Thakker,Yiannis Papadopoulos
摘要:肤色历来是歧视的焦点,但医学成像机器学习的公平性研究往往依赖于粗略的子组类别,忽视了个体水平的差异。这种基于群体的方法有可能掩盖亚群体内离群值所面临的偏差。本研究介绍了一个基于分布的框架,用于评估和减轻皮肤病变分类中的个体公平性。我们把肤色作为一个连续的属性,而不是一个分类标签,并采用核密度估计(KDE)来模拟其分布。我们进一步比较了12个统计距离度量来量化肤色分布之间的差异,并提出了一个基于距离的重新加权(DRW)损失函数来纠正少数民族色调的代表性不足。跨CNN和Transformer模型的实验证明:(i)分类重新加权在捕获个体水平差异方面的局限性,以及(ii)基于分布的重新加权的优越性能,特别是保真度相似性(FS),Wasserstein距离(WD),Hellinger度量(HM)和调和平均相似性(HS)。这些发现建立了一种强大的方法,可以在皮肤病学人工智能系统中促进个体层面的公平性,并强调了医学图像分析中敏感连续属性的更广泛意义。
摘要:Skin color has historically been a focal point of discrimination, yet fairness research in machine learning for medical imaging often relies on coarse subgroup categories, overlooking individual-level variations. Such group-based approaches risk obscuring biases faced by outliers within subgroups. This study introduces a distribution-based framework for evaluating and mitigating individual fairness in skin lesion classification. We treat skin tone as a continuous attribute rather than a categorical label, and employ kernel density estimation (KDE) to model its distribution. We further compare twelve statistical distance metrics to quantify disparities between skin tone distributions and propose a distance-based reweighting (DRW) loss function to correct underrepresentation in minority tones. Experiments across CNN and Transformer models demonstrate: (i) the limitations of categorical reweighting in capturing individual-level disparities, and (ii) the superior performance of distribution-based reweighting, particularly with Fidelity Similarity (FS), Wasserstein Distance (WD), Hellinger Metric (HM), and Harmonic Mean Similarity (HS). These findings establish a robust methodology for advancing fairness at individual level in dermatological AI systems, and highlight broader implications for sensitive continuous attributes in medical image analysis.
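代码示意:下面的Python草图把肤色作为连续属性,用核密度估计(KDE)刻画其分布,计算观测分布与一个假设目标分布(此处取近似均匀)的统计距离,并按"目标密度/观测密度"为样本赋权。肤色数据、目标分布与距离选择均为示例假设,并非论文的具体配置。

import numpy as np
from scipy.stats import gaussian_kde, wasserstein_distance

rng = np.random.default_rng(0)
tones = np.concatenate([rng.normal(0.3, 0.05, 900), rng.normal(0.7, 0.05, 100)])   # 示例肤色值,偏向浅色调

kde = gaussian_kde(tones)                        # 连续肤色分布的KDE
grid = np.linspace(0.0, 1.0, 200)
observed_pdf = kde(grid)
target_pdf = np.ones_like(grid)                  # 假设的目标:在[0,1]上近似均匀

# 量化观测分布与目标分布的差异(此处以一维Wasserstein距离为例)
print(wasserstein_distance(grid, grid, u_weights=observed_pdf, v_weights=target_pdf))

# 基于密度比为每个样本赋权,使被低估的肤色在训练损失中被放大
weights = np.interp(tones, grid, target_pdf) / np.maximum(kde(tones), 1e-8)
weights /= weights.mean()
print(weights.min(), weights.max())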


【2】A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm
标题:基于EMG和IMU的手腕和前臂手势识别的比较研究
链接:https://arxiv.org/abs/2512.07997

作者:Soroush Baghernezhad,Elaheh Mohammadreza,Vinicius Prado da Fonseca,Ting Zou,Xianta Jiang
摘要:手势是我们日常与环境互动的一个组成部分。手势识别(HGR)是通过各种输入方式(如视觉数据(图像和视频)和生物信号)解释人类意图的过程。生物信号被广泛用于HGR,因为它们能够通过放置在手臂上的传感器非侵入性地捕获。其中,测量肌肉电活动的表面肌电图(sEMG)是研究最广泛的方式。然而,较少探索的替代方案,如惯性测量单元(IMU),可以提供关于细微肌肉运动的补充信息,这使得它们对手势识别很有价值。在这项研究中,我们研究了使用来自不同肌肉群的IMU信号来捕获用户意图的潜力。我们的研究结果表明,IMU信号包含足够的信息,作为静态手势识别的唯一输入传感器。此外,我们比较不同的肌肉群,并检查个别肌肉群的模式识别质量。我们进一步发现,由IMU捕获的肌腱诱导的微运动是静态手势识别的主要贡献者。我们相信,利用肌肉微运动信息可以提高截肢者假肢的可用性。这种方法还为机器人、远程操作、手语翻译等领域的手势识别提供了新的可能性。
摘要:Gestures are an integral part of our daily interactions with the environment. Hand gesture recognition (HGR) is the process of interpreting human intent through various input modalities, such as visual data (images and videos) and bio-signals. Bio-signals are widely used in HGR due to their ability to be captured non-invasively via sensors placed on the arm. Among these, surface electromyography (sEMG), which measures the electrical activity of muscles, is the most extensively studied modality. However, less-explored alternatives such as inertial measurement units (IMUs) can provide complementary information on subtle muscle movements, which makes them valuable for gesture recognition. In this study, we investigate the potential of using IMU signals from different muscle groups to capture user intent. Our results demonstrate that IMU signals contain sufficient information to serve as the sole input sensor for static gesture recognition. Moreover, we compare different muscle groups and check the quality of pattern recognition on individual muscle groups. We further found that tendon-induced micro-movement captured by IMUs is a major contributor to static gesture recognition. We believe that leveraging muscle micro-movement information can enhance the usability of prosthetic arms for amputees. This approach also offers new possibilities for hand gesture recognition in fields such as robotics, teleoperation, sign language interpretation, and beyond.


【3】Pattern Recognition of Ozone-Depleting Substance Exports in Global Trade Data
标题:全球贸易数据中臭氧消耗物质出口的模式识别
链接:https://arxiv.org/abs/2512.07864

作者:Muhammad Sukri Bin Ramli
摘要 :监测《蒙特利尔议定书》等环境条约,需要能够审查大型、复杂海关数据集的新方法。本文介绍了一个使用无监督机器学习的框架,可以系统地检测可疑的交易模式并突出显示需要审查的活动。我们的方法应用于100,000条交易记录,结合了几种ML技术。无监督聚类(K-Means)根据货物价值和重量发现自然的贸易原型。异常检测(隔离森林和IQR)识别罕见的"特大交易"以及每公斤价格在商业上异常的货物。在此基础上,再辅以启发式标记来发现诸如模糊货物描述之类的手法。这些层次被组合成一个优先级分数,它成功地识别了1,351个价格异常值和1,288个高优先级货物,以供海关审查。一个关键的发现是,高优先级商品显示出与一般商品不同的、更高的价值重量比。这一点使用可解释人工智能(SHAP)进行了验证,其结果确认模糊描述和高价值是最重要的风险预测因素。该模型的灵敏度通过其在2021年初检测到"特大交易"的大幅飙升而得到验证,这与美国AIM法案的现实监管影响直接相关。这项工作提出了一个可重复的无监督学习管道,将原始交易数据转化为经过优先级排序、可供监管团体使用的情报。
摘要:New methods are needed to monitor environmental treaties, like the Montreal Protocol, by reviewing large, complex customs datasets. This paper introduces a framework using unsupervised machine learning to systematically detect suspicious trade patterns and highlight activities for review. Our methodology, applied to 100,000 trade records, combines several ML techniques. Unsupervised Clustering (K-Means) discovers natural trade archetypes based on shipment value and weight. Anomaly Detection (Isolation Forest and IQR) identifies rare "mega-trades" and shipments with commercially unusual price-per-kilogram values. This is supplemented by Heuristic Flagging to find tactics like vague shipment descriptions. These layers are combined into a priority score, which successfully identified 1,351 price outliers and 1,288 high-priority shipments for customs review. A key finding is that high-priority commodities show a different and more valuable value-to-weight ratio than general goods. This was validated using Explainable AI (SHAP), which confirmed vague descriptions and high value as the most significant risk predictors. The model's sensitivity was validated by its detection of a massive spike in "mega-trades" in early 2021, correlating directly with the real-world regulatory impact of the US AIM Act. This work presents a repeatable unsupervised learning pipeline to turn raw trade data into prioritized, usable intelligence for regulatory groups.
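代码示意:下面的Python草图按摘要描述的层次组合方式,把隔离森林标志、每公斤价格的IQR离群标志和"模糊描述"启发式标志合成一个优先级分数。示例数据与各层权重均为假设值,仅演示流程,并非论文的实际管道。

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.lognormal(8, 1, 1000),                 # 假设的申报价值
    "weight_kg": rng.lognormal(5, 1, 1000),             # 假设的货物重量
    "description": rng.choice(["refrigerant gas", "chemicals", "misc goods"], 1000),
})
df["price_per_kg"] = df["value"] / df["weight_kg"]

iso = IsolationForest(random_state=0).fit(df[["value", "weight_kg"]])
df["iso_flag"] = (iso.predict(df[["value", "weight_kg"]]) == -1).astype(int)   # 罕见的"特大交易"

q1, q3 = df["price_per_kg"].quantile([0.25, 0.75])
iqr = q3 - q1
df["price_flag"] = ((df["price_per_kg"] < q1 - 1.5 * iqr) |
                    (df["price_per_kg"] > q3 + 1.5 * iqr)).astype(int)         # 每公斤价格离群

df["vague_flag"] = (df["description"] == "misc goods").astype(int)             # 模糊描述的启发式标记

df["priority"] = 2 * df["iso_flag"] + 2 * df["price_flag"] + df["vague_flag"]  # 假设的组合权重
print(df.sort_values("priority", ascending=False).head())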


【4】FAIM: Frequency-Aware Interactive Mamba for Time Series Classification
标题:FAIM:用于时间序列分类的频率感知交互式曼巴
链接:https://arxiv.org/abs/2512.07858

作者:Da Zhang,Bingyu Li,Zhiyuan Zhao,Yanhan Zhang,Junyu Gao,Feiping Nie,Xuelong Li
摘要:时间序列分类(TSC)在许多实际应用中至关重要,例如环境监测,医疗诊断和姿势识别。TSC任务要求模型有效地捕获判别信息以进行准确的类别识别。虽然深度学习架构擅长捕捉时间依赖性,但它们通常存在计算成本高、对噪声扰动敏感以及对小规模数据集过拟合的敏感性。为了解决这些挑战,我们提出FAIM,一个轻量级的频率感知交互曼巴模型。具体来说,我们引入了一个自适应滤波块(AFB),利用傅立叶变换从时间序列数据中提取频域特征。AFB采用可学习的自适应阈值来动态抑制噪声,并采用全局和局部语义自适应滤波的逐元素耦合,从而能够对不同频率分量之间的协同作用进行深入建模。此外,我们设计了一个交互式曼巴块(IMB),以促进有效的多粒度信息交互,平衡提取细粒度的判别特征和全面的全局上下文信息,从而赋予FAIM与强大的表达表示TSC任务。此外,我们还引入了一种自我监督的预训练机制,以增强FAIM对复杂时间模式的理解,并提高其在各种领域和高噪声场景中的鲁棒性。在多个基准测试上的大量实验表明,FAIM始终优于现有的最先进的(SOTA)方法,实现了准确性和效率之间的卓越权衡,并表现出出色的性能。
摘要:Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. TSC tasks require models to effectively capture discriminative information for accurate class identification. Although deep learning architectures excel at capturing temporal dependencies, they often suffer from high computational cost, sensitivity to noise perturbations, and susceptibility to overfitting on small-scale datasets. To address these challenges, we propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. Specifically, we introduce an Adaptive Filtering Block (AFB) that leverages Fourier Transform to extract frequency-domain features from time series data. The AFB incorporates learnable adaptive thresholds to dynamically suppress noise and employs element-wise coupling of global and local semantic adaptive filtering, enabling in-depth modeling of the synergy among different frequency components. Furthermore, we design an Interactive Mamba Block (IMB) to facilitate efficient multi-granularity information interaction, balancing the extraction of fine-grained discriminative features and comprehensive global contextual information, thereby endowing FAIM with powerful and expressive representations for TSC tasks. Additionally, we incorporate a self-supervised pre-training mechanism to enhance FAIM's understanding of complex temporal patterns and improve its robustness across various domains and high-noise scenarios. Extensive experiments on multiple benchmarks demonstrate that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency and exhibits outstanding performance.
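代码示意:下面的PyTorch草图演示摘要中"自适应滤波块"的一般思路:对序列做FFT,用可学习阈值对频域幅值做软阈值收缩以抑制噪声,再逆变换回时域。这只是对该思路的简化示意(阈值的参数化方式等均为假设),并非FAIM的原始实现。

import torch
import torch.nn as nn

class FreqSoftThreshold(nn.Module):
    def __init__(self, seq_len):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(seq_len // 2 + 1))   # 每个频率分量一个可学习阈值

    def forward(self, x):                        # x: (batch, seq_len)
        spec = torch.fft.rfft(x, dim=-1)
        mag, phase = spec.abs(), spec.angle()
        mag = torch.relu(mag - torch.nn.functional.softplus(self.threshold))   # 软阈值收缩,抑制低幅值噪声
        return torch.fft.irfft(torch.polar(mag, phase), n=x.shape[-1], dim=-1)

x = torch.randn(4, 128)
print(FreqSoftThreshold(128)(x).shape)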


表征(2篇)

【1】Persistent Topological Structures and Cohomological Flows as a Mathematical Framework for Brain-Inspired Representation Learning
标题:持久拓扑结构和上同调流作为脑启发表示学习的数学框架
链接:https://arxiv.org/abs/2512.08241

作者:Preksha Girish,Rachana Mysore,Mahanthesha U,Shrey Kumar,Shipra Prashant
备注:6 pages, 2 figures
摘要:本文为脑启发的表示学习提出了一个数学上严格的框架,其基础是持久拓扑结构与上同调流之间的相互作用。神经计算被重新表述为动态单纯复形上上链映射的演化,使表示能够捕获跨时间、空间和功能性大脑状态的不变量。所提出的架构将代数拓扑与微分几何相结合,构造出在同调景观中推广基于梯度学习的上同调算子。我们使用持久同调、层上同调和谱拉普拉斯算子,对具有受控拓扑特征的合成数据和真实神经数据集进行联合分析,以量化稳定性、连续性和结构保持性。实证结果表明,与图神经网络和基于流形的深度架构相比,该模型实现了更优的流形一致性和噪声鲁棒性,为拓扑驱动的表示学习建立了连贯的数学基础。
摘要:This paper presents a mathematically rigorous framework for brain-inspired representation learning founded on the interplay between persistent topological structures and cohomological flows. Neural computation is reformulated as the evolution of cochain maps over dynamic simplicial complexes, enabling representations that capture invariants across temporal, spatial, and functional brain states. The proposed architecture integrates algebraic topology with differential geometry to construct cohomological operators that generalize gradient-based learning within a homological landscape. Synthetic data with controlled topological signatures and real neural datasets are jointly analyzed using persistent homology, sheaf cohomology, and spectral Laplacians to quantify stability, continuity, and structural preservation. Empirical results demonstrate that the model achieves superior manifold consistency and noise resilience compared to graph neural and manifold-based deep architectures, establishing a coherent mathematical foundation for topology-driven representation learning.


【2】Data-Efficient Learning of Anomalous Diffusion with Wavelet Representations: Enabling Direct Learning from Experimental Trajectories
标题 :利用子波表示对异常扩散进行数据高效学习:实现从实验轨迹的直接学习
链接:https://arxiv.org/abs/2512.08510

作者:Gongyi Wang,Yu Zhang,Zihan Huang
备注:23 pages, 16 figures
摘要:机器学习(ML)已成为分析异常扩散轨迹的通用工具,但大多数现有管道都是在大量模拟数据集上训练的。相比之下,实验轨迹(如单粒子跟踪,SPT)通常是稀缺的,并且可能与用于模拟的理想化模型有很大不同,导致ML方法应用于真实数据时性能下降甚至失效。为了解决这种不匹配,我们引入了一种基于小波的异常扩散表示,可以直接从实验记录中进行数据高效学习。这种表示是通过将六个互补小波族应用于每条轨迹,并组合所得的小波模尺度图来构造的。我们首先在来自andi-datasets基准的模拟轨迹上评估小波表示,它在只有1000条训练轨迹的情况下明显优于基于特征和基于轨迹的方法,并且在大型训练集上仍然保持优势。然后,我们使用这种表示,直接从荧光微珠在F-肌动蛋白网络中扩散的实验SPT轨迹中进行学习;在扩散指数回归和网孔尺寸分类两项任务上,小波表示仍然优于现有替代方法。特别是,当预测实验轨迹的扩散指数时,使用小波表示在1200条实验轨迹上训练的模型,比纯粹在$10^6$条模拟轨迹上训练的最先进深度学习模型实现了显著更低的误差。我们将这种数据效率与不同尺度指纹的出现相关联,这些指纹在小波谱中解开了潜在的扩散机制。
摘要:Machine learning (ML) has become a versatile tool for analyzing anomalous diffusion trajectories, yet most existing pipelines are trained on large collections of simulated data. In contrast, experimental trajectories, such as those from single-particle tracking (SPT), are typically scarce and may differ substantially from the idealized models used for simulation, leading to degradation or even breakdown of performance when ML methods are applied to real data. To address this mismatch, we introduce a wavelet-based representation of anomalous diffusion that enables data-efficient learning directly from experimental recordings. This representation is constructed by applying six complementary wavelet families to each trajectory and combining the resulting wavelet modulus scalograms. We first evaluate the wavelet representation on simulated trajectories from the andi-datasets benchmark, where it clearly outperforms both feature-based and trajectory-based methods with as few as 1000 training trajectories and still retains an advantage on large training sets. We then use this representation to learn directly from experimental SPT trajectories of fluorescent beads diffusing in F-actin networks, where the wavelet representation remains superior to existing alternatives for both diffusion-exponent regression and mesh-size classification. In particular, when predicting the diffusion exponents of experimental trajectories, a model trained on 1200 experimental tracks using the wavelet representation achieves significantly lower errors than state-of-the-art deep learning models trained purely on $10^6$ simulated trajectories. We associate this data efficiency with the emergence of distinct scale fingerprints disentangling underlying diffusion mechanisms in the wavelet spectra.
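代码示意:下面的Python草图演示"对同一条轨迹应用多个连续小波族并堆叠其模尺度图"的构造方式。论文使用六个互补小波族并应用于真实SPT数据;此处仅以pywt中三个内置小波和一条随机游走轨迹作为假设示例。

import numpy as np
import pywt

rng = np.random.default_rng(0)
traj = np.cumsum(rng.standard_normal(200))       # 示例轨迹:普通随机游走

scales = np.arange(1, 33)
families = ["morl", "mexh", "gaus3"]             # 假设选用的三个连续小波族
scalograms = []
for w in families:
    coefs, _ = pywt.cwt(traj, scales, w)         # coefs形状: (尺度数, 轨迹长度)
    scalograms.append(np.abs(coefs))             # 小波模尺度图

representation = np.stack(scalograms)            # 形状: (小波族数, 尺度数, 轨迹长度)
print(representation.shape)                       # 可作为下游回归/分类模型的输入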


3D|3D重建等相关(1篇)

【1】CarBench: A Comprehensive Benchmark for Neural Surrogates on High-Fidelity 3D Car Aerodynamics
标题:CarBench:高保真3D汽车空气动力学神经代理的综合基准
链接:https://arxiv.org/abs/2512.07847

作者:Mohamed Elrefaie,Dule Shu,Matt Klenk,Faez Ahmed
摘要:基准测试一直是计算机视觉、自然语言处理和更广泛的深度学习领域进步的基石,通过标准化数据集和可重复的评估协议推动算法创新。大规模计算流体动力学(CFD)数据集的日益可用性为将机器学习应用于空气动力学和工程设计提供了新的机会。然而,尽管取得了这些进展,工程设计中的大规模数值模拟尚无标准化基准。在这项工作中,我们引入了CarBench,这是第一个致力于大规模3D汽车空气动力学的综合基准,它在DrivAerNet++上对最先进的模型进行了大规模评估;DrivAerNet++是汽车空气动力学最大的公共数据集,包含8,000多个高保真汽车模拟。我们评估了11种架构,涵盖神经算子方法(例如傅里叶神经算子)、几何深度学习(PointNet,RegDGCNN,PointMAE,PointTransformer)、基于Transformer的神经求解器(Transolver,Transolver++,AB-UPT)和隐式场网络(TripNet)。除了标准的插值任务,我们还进行了跨类别实验,在这些实验中,基于Transformer的求解器在单个汽车原型上进行训练,并在未见过的类别上进行评估。我们的分析涵盖了预测准确性、物理一致性、计算效率和统计不确定性。为了加快数据驱动工程的进展,我们开源了基准框架,包括训练管道、基于自举重采样的不确定性估计例程和预训练模型权重,为高保真CFD模拟的大规模学习建立了第一个可重复的基础,可在https://github.com/Mohamedelrefaie/CarBench上获得。
摘要:Benchmarking has been the cornerstone of progress in computer vision, natural language processing, and the broader deep learning domain, driving algorithmic innovation through standardized datasets and reproducible evaluation protocols. The growing availability of large-scale Computational Fluid Dynamics (CFD) datasets has opened new opportunities for applying machine learning to aerodynamic and engineering design. Yet, despite this progress, there exists no standardized benchmark for large-scale numerical simulations in engineering design. In this work, we introduce CarBench, the first comprehensive benchmark dedicated to large-scale 3D car aerodynamics, performing a large-scale evaluation of state-of-the-art models on DrivAerNet++, the largest public dataset for automotive aerodynamics, containing over 8,000 high-fidelity car simulations. We assess eleven architectures spanning neural operator methods (e.g., Fourier Neural Operator), geometric deep learning (PointNet, RegDGCNN, PointMAE, PointTransformer), transformer-based neural solvers (Transolver, Transolver++, AB-UPT), and implicit field networks (TripNet). Beyond standard interpolation tasks, we perform cross-category experiments in which transformer-based solvers trained on a single car archetype are evaluated on unseen categories. Our analysis covers predictive accuracy, physical consistency, computational efficiency, and statistical uncertainty. To accelerate progress in data-driven engineering, we open-source the benchmark framework, including training pipelines, uncertainty estimation routines based on bootstrap resampling, and pretrained model weights, establishing the first reproducible foundation for large-scale learning from high-fidelity CFD simulations, available at https://github.com/Mohamedelrefaie/CarBench.


优化|敛散性(2篇)

【1】Direct transfer of optimized controllers to similar systems using dimensionless MPC
标题:利用无量纲MPC将优化后的控制器直接迁移到相似系统
链接:https://arxiv.org/abs/2512.08667

作者:Josip Kir Hromatko,Shambhuraj Sawant,Šandor Ileš,Sébastien Gros
备注:7 pages, 4 figures
摘要:缩尺模型实验常用于各种工程领域,以降低实验成本并克服与全尺寸系统相关的限制。这些实验的相关性依赖于量纲分析和动态相似性原理。然而,将控制器迁移到全尺寸系统通常需要额外的调优。在本文中,我们提出了一种方法,利用针对闭环性能自动调优的无量纲模型预测控制,实现控制器的直接迁移。通过这种重构,优化控制器的闭环行为可直接迁移到一个新的、动态相似的系统。此外,无量纲公式允许在参数优化过程中使用来自不同尺度系统的数据。我们在倒立摆(cartpole)摆起和赛车问题上演示了该方法,应用强化学习或贝叶斯优化来调整控制器参数。用于获得本文结果的软件可在https://github.com/josipkh/dimensionless-mpcrl上公开获得。
摘要:Scaled model experiments are commonly used in various engineering fields to reduce experimentation costs and overcome constraints associated with full-scale systems. The relevance of such experiments relies on dimensional analysis and the principle of dynamic similarity. However, transferring controllers to full-scale systems often requires additional tuning. In this paper, we propose a method to enable a direct controller transfer using dimensionless model predictive control, tuned automatically for closed-loop performance. With this reformulation, the closed-loop behavior of an optimized controller transfers directly to a new, dynamically similar system. Additionally, the dimensionless formulation allows for the use of data from systems of different scales during parameter optimization. We demonstrate the method on a cartpole swing-up and a car racing problem, applying either reinforcement learning or Bayesian optimization for tuning the controller parameters. Software used to obtain the results in this paper is publicly available at https://github.com/josipkh/dimensionless-mpcrl.
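作为补充说明,下面用单摆的无量纲化示意"动态相似系统共享同一无量纲动力学"这一原理;该例子只是量纲分析的通用草图,并非论文中的倒立摆或赛车模型。

\[
\ddot{\theta} = -\frac{g}{l}\sin\theta
\quad\xrightarrow{\ \tau = t\sqrt{g/l}\ }\quad
\frac{\mathrm{d}^2\theta}{\mathrm{d}\tau^2} = -\sin\theta .
\]

两个摆长不同的系统在无量纲时间 $\tau$ 下具有相同的动力学,因此按 $\tau$ 整定的无量纲控制器可以直接迁移,只需用特征时间 $\sqrt{l/g}$ 换算回物理时间。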


【2】Bayesian Optimization for Function-Valued Responses under Min-Max Criteria
标题:最小-最大准则下函数值响应的贝叶斯优化
链接:https://arxiv.org/abs/2512.07868

作者:Pouya Ahadi,Reza Marzban,Ali Adibi,Kamran Paynabar
备注:25 pages, 6 figures
摘要:贝叶斯优化被广泛用于优化昂贵的黑盒函数,但大多数现有方法集中在标量响应上。在许多科学和工程环境中,响应是函数型的,随时间或波长等指标平滑变化,这使得经典公式不再适用。现有方法往往最小化积分误差,能捕获平均性能,但忽略了最坏情况下的偏差。为了解决这个问题,我们提出了最小-最大函数贝叶斯优化(MM-FBO),一个直接最小化整个函数域上最大误差的框架。函数型响应使用函数主成分分析表示,并为主成分分数构建高斯过程替代模型。在此表示的基础上,MM-FBO引入了一个集成不确定性采集函数,在利用最坏情况预期误差与在函数域上探索之间取得平衡。我们提供了两个理论保证:最坏情况目标的离散化界,以及一个一致性结果,表明当替代模型变得准确且不确定性消失时,采集函数收敛到真正的最小-最大目标。我们通过合成基准和物理启发的案例研究(涉及超构光子器件的电磁散射和气相渗透)的实验验证了该方法。结果表明,MM-FBO始终优于现有基线,并强调了在贝叶斯优化中显式建模函数不确定性的重要性。
摘要:Bayesian optimization is widely used for optimizing expensive black box functions, but most existing approaches focus on scalar responses. In many scientific and engineering settings the response is functional, varying smoothly over an index such as time or wavelength, which makes classical formulations inadequate. Existing methods often minimize integrated error, which captures average performance but neglects worst case deviations. To address this limitation we propose min-max Functional Bayesian Optimization (MM-FBO), a framework that directly minimizes the maximum error across the functional domain. Functional responses are represented using functional principal component analysis, and Gaussian process surrogates are constructed for the principal component scores. Building on this representation, MM-FBO introduces an integrated uncertainty acquisition function that balances exploitation of worst case expected error with exploration across the functional domain. We provide two theoretical guarantees: a discretization bound for the worst case objective, and a consistency result showing that as the surrogate becomes accurate and uncertainty vanishes, the acquisition converges to the true min-max objective. We validate the method through experiments on synthetic benchmarks and physics inspired case studies involving electromagnetic scattering by metaphotonic devices and vapor phase infiltration. Results show that MM-FBO consistently outperforms existing baselines and highlights the importance of explicitly modeling functional uncertainty in Bayesian optimization.
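下面是按摘要思路拼出的一个极简示意(非作者实现):用 PCA 近似函数主成分分析,对每个主成分分数拟合高斯过程替代模型,再在函数域网格上计算最坏情况误差的代理量;其中的数据、网格与模型设置均为假设。

```python
# 最小示意:FPCA 分数 + 逐分数高斯过程替代 + 最坏情况误差代理(假设性数据)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))                 # 设计变量 (n, d)
grid = np.linspace(0, 1, 50)                         # 函数域网格
Y = np.sin(2 * np.pi * grid[None, :] * (1 + X[:, :1])) + 0.1 * X[:, 1:2]  # 函数型响应 (n, m)

pca = PCA(n_components=3).fit(Y)                     # 函数响应的低维分数表示
scores = pca.transform(Y)
gps = [GaussianProcessRegressor(normalize_y=True).fit(X, scores[:, k])
       for k in range(scores.shape[1])]              # 每个分数一个 GP 替代模型

def predict_function(x):
    """在候选点 x 处重构预测的函数响应及各分数的预测标准差。"""
    x = np.atleast_2d(x)
    mu = np.array([gp.predict(x)[0] for gp in gps])
    std = np.array([gp.predict(x, return_std=True)[1][0] for gp in gps])
    return pca.inverse_transform(mu[None, :])[0], std

# 最坏情况误差代理:对目标函数取函数域上的最大绝对偏差
y_target = np.sin(2 * np.pi * grid)
y_hat, _ = predict_function(np.array([0.1, -0.3]))
print("worst-case error:", np.max(np.abs(y_hat - y_target)))
```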


预测|估计(7篇)

【1】Long-Sequence LSTM Modeling for NBA Game Outcome Prediction Using a Novel Multi-Season Dataset
标题:使用新型多赛季数据集进行NBA比赛结果预测的长序列LSTM建模
链接:https://arxiv.org/abs/2512.08591

作者:Charles Rios,Longzhen Han,Almas Baimagambetov,Nikolaos Polatidis
摘要:预测职业篮球比赛的结果,特别是在国家篮球协会(NBA),对于教练策略,球迷参与和体育博彩变得越来越重要。然而,许多现有的预测模型与概念漂移,有限的时间背景和跨季节的不稳定性作斗争。为了推进这一领域的预测,我们引入了一个新构建的纵向NBA数据集,涵盖了2004-05赛季到2024-25赛季,并提出了一个深度学习框架,旨在对长期表现趋势进行建模。我们的主要贡献是一个长短期记忆(LSTM)架构,它利用相当于8个完整NBA赛季的9,840场比赛的扩展序列长度来捕捉不断变化的团队动态和赛季间的依赖关系。我们将此模型与几种传统的机器学习(ML)和深度学习(DL)基线进行了比较,包括逻辑回归,随机森林,多层感知器(MLP)和卷积神经网络(CNN)。LSTM在所有指标上都达到了最佳性能,准确率为72.35,精度为73.15,AUC-ROC为76.13。这些结果证明了长序列时间建模在篮球结果预测中的重要性,并突出了我们新的多赛季数据集对于开发强大的,可推广的NBA预测系统的价值。
摘要 :Predicting the outcomes of professional basketball games, particularly in the National Basketball Association (NBA), has become increasingly important for coaching strategy, fan engagement, and sports betting. However, many existing prediction models struggle with concept drift, limited temporal context, and instability across seasons. To advance forecasting in this domain, we introduce a newly constructed longitudinal NBA dataset covering the 2004-05 to 2024-25 seasons and present a deep learning framework designed to model long-term performance trends. Our primary contribution is a Long Short-Term Memory (LSTM) architecture that leverages an extended sequence length of 9,840 games equivalent to eight full NBA seasons to capture evolving team dynamics and season-over-season dependencies. We compare this model against several traditional Machine Learning (ML) and Deep Learning (DL) baselines, including Logistic Regression, Random Forest, Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). The LSTM achieves the best performance across all metrics, with 72.35 accuracy, 73.15 precision and 76.13 AUC-ROC. These results demonstrate the importance of long-sequence temporal modeling in basketball outcome prediction and highlight the value of our new multi-season dataset for developing robust, generalizable NBA forecasting systems.
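下面给出一个以比赛特征序列为输入的 LSTM 二分类器的最小 PyTorch 草图,仅用于说明长序列建模的基本结构;特征维度、隐藏维度与序列长度均为假设值,并非论文的网络配置。

```python
# 示意性草图:序列式比赛结果二分类(主队是否获胜),参数均为假设值
import torch
import torch.nn as nn

class GameLSTM(nn.Module):
    def __init__(self, n_features=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # 取序列最后一步的隐状态做预测

model = GameLSTM()
x = torch.randn(8, 100, 32)            # 8 条长度为 100 场比赛的特征序列
logits = model(x)                      # (8, 1)
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8, 1)).float())
loss.backward()
```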


【2】Predicting California Bearing Ratio with Ensemble and Neural Network Models: A Case Study from Türkiye
标题:利用集成和神经网络模型预测加州承载比:Türkiye的案例研究
链接:https://arxiv.org/abs/2512.08340

作者:Abdullah Hulusi Kökçam,Uğur Dağdeviren,Talas Fikret Kurnaz,Alparslan Serhat Demir,Caner Erden
备注:Presented at the 13th International Symposium on Intelligent Manufacturing and Service Systems, Duzce, Turkey, Sep 25-27, 2025. Also available on Zenodo: DOI 10.5281/zenodo.17530868
摘要:加州承载比(CBR)是一个关键的岩土工程指标,用于评估路基土壤的承载能力,特别是在交通基础设施和基础设计。传统的CBR测定依赖于实验室渗透试验。尽管这些测试很准确,但通常耗时,成本高,并且可能不切实际,特别是对于大规模或不同的土壤剖面。人工智能,特别是机器学习(ML)的最新进展,使数据驱动的方法能够以更快的速度和更高的精度对复杂的土壤行为进行建模。这项研究介绍了一个全面的ML框架CBR预测使用的382个土壤样本的数据集收集从不同的地质气候区域在Türkiye。该数据集包括与承载力相关的理化土壤特性,允许在监督学习环境中进行多维特征表示。测试了12种ML算法,包括决策树、随机森林、额外树、梯度提升、xgboost、k最近邻、支持向量回归、多层感知器、adaboost、bagging、投票和堆叠回归。每个模型都经过训练、验证和评估,以评估其泛化能力和鲁棒性。其中,随机森林回归器表现最好,达到0.95(训练),0.76(验证)和0.83(测试)的强R2分数。这些结果突出了该模型强大的非线性映射能力,使其成为预测岩土工程任务的一个有前途的工具。该研究支持在岩土工程中集成以数据为中心的智能模型,为传统方法提供有效的替代方案,并促进基础设施分析和设计的数字化转型。
摘要:The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils, especially in transportation infrastructure and foundation design. Traditional CBR determination relies on laboratory penetration tests. Despite their accuracy, these tests are often time-consuming, costly, and can be impractical, particularly for large-scale or diverse soil profiles. Recent progress in artificial intelligence, especially machine learning (ML), has enabled data-driven approaches for modeling complex soil behavior with greater speed and precision. This study introduces a comprehensive ML framework for CBR prediction using a dataset of 382 soil samples collected from various geoclimatic regions in Türkiye. The dataset includes physicochemical soil properties relevant to bearing capacity, allowing multidimensional feature representation in a supervised learning context. Twelve ML algorithms were tested, including decision tree, random forest, extra trees, gradient boosting, xgboost, k-nearest neighbors, support vector regression, multi-layer perceptron, adaboost, bagging, voting, and stacking regressors. Each model was trained, validated, and evaluated to assess its generalization and robustness. Among them, the random forest regressor performed the best, achieving strong R2 scores of 0.95 (training), 0.76 (validation), and 0.83 (test). These outcomes highlight the model's powerful nonlinear mapping ability, making it a promising tool for predictive geotechnical tasks. The study supports the integration of intelligent, data-centric models in geotechnical engineering, offering an effective alternative to traditional methods and promoting digital transformation in infrastructure analysis and design.
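下面是用 scikit-learn 的随机森林回归器做 CBR 预测的最小示意;数据为合成的假设数据,仅演示训练与 R2 评估流程,并非论文的数据集或调参设置。

```python
# 最小示意:随机森林回归预测 CBR(合成假设数据)
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(382, 8))                                 # 假设的 8 个土壤理化特征
y = 20 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 1, 382)    # 合成的 CBR 目标

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("train R2:", model.score(X_tr, y_tr))
print("test  R2:", model.score(X_te, y_te))
```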


【3】Probabilistic Multi-Agent Aircraft Landing Time Prediction
标题:概率多智能体飞机着陆时间预测
链接:https://arxiv.org/abs/2512.08281

作者:Kyungmin Kim,Seokbin Yoon,Keumjin Lee
备注:13 pages, 8 figures, accepted at AIAA SciTech 2026
摘要:准确可靠的飞机着陆时间预测是空中交通管理中有效分配资源的基础。然而,飞机轨迹和交通流量的固有不确定性对预测准确性和可信度提出了重大挑战。因此,预测模型不仅应提供飞机着陆时间的点估计,而且还应提供与这些预测相关的不确定性。此外,飞机轨迹经常受到附近飞机通过空中交通管制干预(如雷达引导)的影响。因此,着陆时间预测模型必须考虑空域中的多智能体交互。在这项工作中,我们提出了一个概率多智能体飞机着陆时间预测框架,提供了多架飞机的着陆时间分布。我们使用从韩国仁川国际机场航站楼空域收集的空中交通监控数据集来评估所提出的框架。结果表明,该模型实现了更高的预测精度比基线,并量化了其结果的相关不确定性。此外,该模型通过其注意力分数揭示了空中交通管制的潜在模式,从而增强了可解释性。
摘要:Accurate and reliable aircraft landing time prediction is essential for effective resource allocation in air traffic management. However, the inherent uncertainty of aircraft trajectories and traffic flows poses significant challenges to both prediction accuracy and trustworthiness. Therefore, prediction models should not only provide point estimates of aircraft landing times but also the uncertainties associated with these predictions. Furthermore, aircraft trajectories are frequently influenced by the presence of nearby aircraft through air traffic control interventions such as radar vectoring. Consequently, landing time prediction models must account for multi-agent interactions in the airspace. In this work, we propose a probabilistic multi-agent aircraft landing time prediction framework that provides the landing times of multiple aircraft as distributions. We evaluate the proposed framework using an air traffic surveillance dataset collected from the terminal airspace of the Incheon International Airport in South Korea. The results demonstrate that the proposed model achieves higher prediction accuracy than the baselines and quantifies the associated uncertainties of its outcomes. In addition, the model uncovered underlying patterns in air traffic control through its attention scores, thereby enhancing explainability.


【4】Geometric-Stochastic Multimodal Deep Learning for Predictive Modeling of SUDEP and Stroke Vulnerability
标题:几何随机多模式深度学习用于SUDEP和中风易感性的预测建模
链接:https://arxiv.org/abs/2512.08257

作者:Preksha Girish,Rachana Mysore,Mahanthesha U,Shrey Kumar,Misbah Fatimah Annigeri,Tanish Jain
备注:7 pages, 3 figures
摘要:癫痫猝死(SUDEP)和急性缺血性卒中是危及生命的疾病,涉及皮质、脑干和自主神经系统之间的复杂相互作用。我们提出了一个统一的几何-随机多模态深度学习框架,该框架集成了EEG、ECG、呼吸、SpO2、EMG和fMRI信号,以建模SUDEP和中风易感性。该方法结合了黎曼流形嵌入、李群不变特征表示、分数随机动力学、哈密顿能量流建模和跨模态注意机制。中风传播使用结构脑图上的分数流行病扩散进行建模。在MULTI-CLARID数据集上的实验表明,预测精度得到了提高,并获得了源自流形曲率、分数记忆指数、注意熵和扩散中心性的可解释生物标志物。所提出的框架为神经-自主神经紊乱的早期检测、风险分层和可解释多模态建模提供了数学上有原则的基础。
摘要:Sudden Unexpected Death in Epilepsy (SUDEP) and acute ischemic stroke are life-threatening conditions involving complex interactions across cortical, brainstem, and autonomic systems. We present a unified geometric-stochastic multimodal deep learning framework that integrates EEG, ECG, respiration, SpO2, EMG, and fMRI signals to model SUDEP and stroke vulnerability. The approach combines Riemannian manifold embeddings, Lie-group invariant feature representations, fractional stochastic dynamics, Hamiltonian energy-flow modeling, and cross-modal attention mechanisms. Stroke propagation is modeled using fractional epidemic diffusion over structural brain graphs. Experiments on the MULTI-CLARID dataset demonstrate improved predictive accuracy and interpretable biomarkers derived from manifold curvature, fractional memory indices, attention entropy, and diffusion centrality. The proposed framework provides a mathematically principled foundation for early detection, risk stratification, and interpretable multimodal modeling in neural-autonomic disorders.


【5】Deep Kernel Aalen-Johansen Estimator: An Interpretable and Flexible Neural Net Framework for Competing Risks
标题:深度核Aalen-Johansen估计:一个用于竞争风险的可解释和灵活的神经网络框架
链接:https://arxiv.org/abs/2512.08063

作者:Xiaobin Shen,George H. Chen
备注:Machine Learning for Health (ML4H) 2025 Spotlight
摘要:我们提出了一个可解释的深度竞争风险模型,称为深度核Aalen-Johansen(DKAJ)估计,它推广了经典的累积发生率函数(CIF)的Aalen-Johansen非参数估计。每个数据点(例如,患者)被表示为聚类的加权组合。如果一个数据点仅对一个聚类具有非零权重,则其预测的CIF对应于仅限于该聚类中的数据点的经典Aalen-Johansen估计量。这些权重来自一个自动学习的核函数,该函数测量任何两个数据点的相似程度。在四个标准的竞争风险数据集上,我们表明DKAJ与最先进的基线相比具有竞争力,同时能够提供可视化来帮助模型解释。
摘要:We propose an interpretable deep competing risks model called the Deep Kernel Aalen-Johansen (DKAJ) estimator, which generalizes the classical Aalen-Johansen nonparametric estimate of cumulative incidence functions (CIFs). Each data point (e.g., patient) is represented as a weighted combination of clusters. If a data point has nonzero weight only for one cluster, then its predicted CIFs correspond to those of the classical Aalen-Johansen estimator restricted to data points from that cluster. These weights come from an automatically learned kernel function that measures how similar any two data points are. On four standard competing risks datasets, we show that DKAJ is competitive with state-of-the-art baselines while being able to provide visualizations to assist model interpretation.


【6】GPU Memory Prediction for Multimodal Model Training
标题:用于多模式模型训练的图形处理器内存预测
链接:https://arxiv.org/abs/2512.07853

作者:Jinwoo Jeong,Minchul Kang,Younghun Go,Changyong Shin,Hyunho Lee,Junho Yoon,Gyeongsik Yang,Chuck Yoo
备注:1st Workshop on Systems for Agentic AI (SAA '25), co-located with SOSP 2025
摘要:随着智能体(agentic)人工智能系统中的深度学习模型规模和复杂性的增长,GPU内存需求增加,并且经常超过可用的GPU内存容量,因此会发生内存不足(OoM)错误。众所周知,OoM会中断整个训练并浪费大量计算资源。因此,为了防止OoM,准确预测GPU内存使用情况至关重要。然而,以前的研究只集中在单模态架构上,未能推广到多模态模型,即使多模态模型是智能体人工智能系统中的常见选择。为了解决这一限制,我们提出了一个框架,通过分析多模态模型的模型架构和训练行为来预测峰值GPU内存使用量。具体来说,该框架将多模态模型分解为其组成层,并应用因子分解来估计每一层的内存使用情况。我们的评估表明,该框架实现了较高的预测精度,平均MAPE约为8.7%。
摘要:As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) errors occur. It is well known that OoM interrupts the whole training itself and wastes substantial computational resources. Therefore, to prevent OoM, accurate prediction of GPU memory usage is essential. However, previous studies focus only on unimodal architectures and fail to generalize to multimodal models, even though the multimodal models are a common choice in agentic AI systems. To address this limitation, we propose a framework that predicts the peak GPU memory usage by analyzing the model architecture and training behavior of multimodal models. Specifically, the framework decomposes the multimodal model into its constituent layers and applies factorization to estimate the memory usage of each layer. Our evaluation shows that our framework achieves high prediction accuracy of ~8.7% average MAPE.
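作为粗略示意(并非论文中的逐层因子分解方法),下面的草图逐层统计 PyTorch 模型的参数量,并按"梯度 + Adam 两个状态张量"粗估与参数相关的显存下界;激活显存与框架开销未建模,倍数与字节数均为假设。

```python
# 粗略示意:逐层统计参数显存并按优化器状态粗估(假设性经验规则)
import torch.nn as nn

def rough_param_memory_mb(model: nn.Module, bytes_per_param=4, optimizer_copies=3):
    """optimizer_copies=3 粗略对应 梯度 + Adam 的两个状态张量。"""
    total = 0
    for name, module in model.named_modules():
        params = sum(p.numel() for p in module.parameters(recurse=False))
        if params:
            total += params
            print(f"{name or 'root'}: {params} params")
    return total * bytes_per_param * (1 + optimizer_copies) / 1024 ** 2

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
print("rough parameter-related memory (MB):", rough_param_memory_mb(model))
```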


【7】Integrating LSTM Networks with Neural Levy Processes for Financial Forecasting
标题:将LSTM网络与神经Levy过程集成用于财务预测
链接:https://arxiv.org/abs/2512.07860

作者:Mohammed Alruqimi,Luca Di Persio
摘要:本文研究了深度学习与金融模型的最佳集成,以实现稳健的资产价格预测。具体来说,我们开发了一个混合框架,将长短期记忆(LSTM)网络与Merton-Lévy跳跃扩散模型相结合。为了优化这个框架,我们采用了灰狼优化器(GWO)进行LSTM超参数调整,并探索了Merton-Levy模型参数的三种校准方法:人工神经网络(ANN)、海洋捕食者算法(MPA)和基于PyTorch的TorchSDE库。为了评估我们的混合模型的预测性能,我们将其与几个基准模型进行了比较,包括标准LSTM和与分数赫斯顿模型相结合的LSTM。该评估使用了三个真实世界的金融数据集:布伦特油价、STOXX 600指数和IT40指数。使用标准指标评估性能,包括均方误差(MSE)、平均绝对误差(MAE)、均方百分比误差(MSPE)和决定系数(R2)。我们的实验结果表明,将GWO优化的LSTM网络与使用ANN校准的Levy-Merton跳跃扩散模型相结合的混合模型,优于基本LSTM模型和本研究中开发的所有其他模型。
摘要:This paper investigates an optimal integration of deep learning with financial models for robust asset price forecasting. Specifically, we developed a hybrid framework combining a Long Short-Term Memory (LSTM) network with the Merton-Lévy jump-diffusion model. To optimise this framework, we employed the Grey Wolf Optimizer (GWO) for the LSTM hyperparameter tuning, and we explored three calibration methods for the Merton-Levy model parameters: Artificial Neural Networks (ANNs), the Marine Predators Algorithm (MPA), and the PyTorch-based TorchSDE library. To evaluate the predictive performance of our hybrid model, we compared it against several benchmark models, including a standard LSTM and an LSTM combined with the Fractional Heston model. This evaluation used three real-world financial datasets: Brent oil prices, the STOXX 600 index, and the IT40 index. Performance was assessed using standard metrics, including Mean Squared Error (MSE), Mean Absolute Error(MAE), Mean Squared Percentage Error (MSPE), and the coefficient of determination (R2). Our experimental results demonstrate that the hybrid model, combining a GWO-optimized LSTM network with the Levy-Merton Jump-Diffusion model calibrated using an ANN, outperformed the base LSTM model and all other models developed in this study.
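下面给出标准 Merton 跳跃扩散路径的蒙特卡罗模拟草图(模型为文献中的标准形式,参数均为假设值,并非论文的校准结果),可用于体会混合框架中金融模型部分的输出形态。

```python
# 示意性草图:Merton 跳跃扩散路径模拟(标准模型,假设性参数)
import numpy as np

def simulate_merton(s0=100.0, mu=0.05, sigma=0.2, lam=0.5,
                    mu_j=-0.05, sigma_j=0.1, T=1.0, n_steps=252, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    kappa = np.exp(mu_j + 0.5 * sigma_j ** 2) - 1            # 跳跃的期望相对幅度
    log_s = np.empty(n_steps + 1)
    log_s[0] = np.log(s0)
    for t in range(n_steps):
        n_jumps = rng.poisson(lam * dt)                      # 该步内的跳跃次数
        jump = rng.normal(mu_j, sigma_j, n_jumps).sum()      # 跳跃对数幅度之和
        diffusion = (mu - 0.5 * sigma ** 2 - lam * kappa) * dt \
                    + sigma * np.sqrt(dt) * rng.standard_normal()
        log_s[t + 1] = log_s[t] + diffusion + jump
    return np.exp(log_s)

path = simulate_merton()
print(path[:5])
```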


其他神经网络|深度学习|模型|建模(22篇)

【1】Refining Diffusion Models for Motion Synthesis with an Acceleration Loss to Generate Realistic IMU Data
标题:改进具有加速度损失的运动合成扩散模型以生成真实的IMU数据
链接:https://arxiv.org/abs/2512.08859

作者:Lars Ole Häusler,Lena Uhlenberg,Göran Köber,Diyora Salimova,Oliver Amft
备注:7 pages, 3 figures, 1 table
摘要:我们提出了一个文本到惯性测量单元(IMU)的运动合成框架,通过使用基于加速度的二阶损失(L_acc)微调预训练的扩散模型来获得真实的IMU数据。L_acc对生成运动的离散二阶时间差施加一致性约束,从而将扩散先验与IMU特定的加速度模式对齐。我们将L_acc集成到现有扩散模型的训练目标中,微调模型以获得IMU特定的运动先验,并使用现有的文本到IMU框架(包括表面建模和虚拟传感器仿真)评估模型。我们分析了加速度信号保真度以及合成运动表示与实际IMU记录之间的差异。作为下游应用,我们评估了人类活动识别(HAR),并将使用我们方法生成的数据的分类性能与早期扩散模型和两个额外的扩散模型基线进行了比较。当我们用L_acc增强早期扩散模型的目标并继续训练时,L_acc相对于原始模型下降了12.7%。在高动态活动(即跑步、跳跃)中的改进明显大于低动态活动(即坐着、站着)。在低维嵌入中,由我们改进的模型产生的合成IMU数据更接近真实IMU记录的分布。与早期的扩散模型相比,专门在我们改进的合成IMU数据上训练的HAR分类器将性能提高了8.7%,比性能最好的对比扩散模型提高了7.6%。我们的结论是,加速度感知的扩散细化提供了一种对齐运动生成和IMU合成的有效方法,并凸显了深度学习流水线在将通用文本到运动先验专门化为特定传感器任务方面的灵活性。
摘要:We propose a text-to-IMU (inertial measurement unit) motion-synthesis framework to obtain realistic IMU data by fine-tuning a pretrained diffusion model with an acceleration-based second-order loss (L_acc). L_acc enforces consistency in the discrete second-order temporal differences of the generated motion, thereby aligning the diffusion prior with IMU-specific acceleration patterns. We integrate L_acc into the training objective of an existing diffusion model, finetune the model to obtain an IMU-specific motion prior, and evaluate the model with an existing text-to-IMU framework that comprises surface modelling and virtual sensor simulation. We analysed acceleration signal fidelity and differences between synthetic motion representation and actual IMU recordings. As a downstream application, we evaluated Human Activity Recognition (HAR) and compared the classification performance using data of our method with the earlier diffusion model and two additional diffusion model baselines. When we augmented the earlier diffusion model objective with L_acc and continued training, L_acc decreased by 12.7% relative to the original model. The improvements were considerably larger in high-dynamic activities (i.e., running, jumping) compared to low-dynamic activities~(i.e., sitting, standing). In a low-dimensional embedding, the synthetic IMU data produced by our refined model shifts closer to the distribution of real IMU recordings. HAR classification trained exclusively on our refined synthetic IMU data improved performance by 8.7% compared to the earlier diffusion model and by 7.6% over the best-performing comparison diffusion model. We conclude that acceleration-aware diffusion refinement provides an effective approach to align motion generation and IMU synthesis and highlights how flexible deep learning pipelines are for specialising generic text-to-motion priors to sensor-specific tasks.
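根据摘要的描述,L_acc 本质上是生成运动与参考运动在离散二阶时间差(近似加速度)上的一致性约束;下面是一个最小的 PyTorch 草图,张量形状与加权方式均为假设,仅供示意。

```python
# 最小示意:基于离散二阶时间差的加速度一致性损失(形状为假设)
import torch
import torch.nn.functional as F

def acceleration_loss(motion_gen, motion_ref):
    """motion_*: (batch, T, joints*3) 的运动序列;返回二阶差分上的 MSE。"""
    def second_diff(x):
        return x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]
    return F.mse_loss(second_diff(motion_gen), second_diff(motion_ref))

gen = torch.randn(4, 60, 66, requires_grad=True)
ref = torch.randn(4, 60, 66)
loss = acceleration_loss(gen, ref)   # 可作为 L_acc 加权项加入扩散模型训练目标
loss.backward()
```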


【2】Identifying counterfactual probabilities using bivariate distributions and uplift modeling
标题:使用二元分布和提升建模识别反事实概率
链接:https://arxiv.org/abs/2512.08805

作者:Théo Verhelst,Gianluca Bontempi
备注:7 pages. Submitted to the 34th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
摘要:提升建模(uplift modeling)将干预的因果效应估计为治疗和对照下潜在结果之间的差异,而反事实识别旨在恢复这些潜在结果的联合分布(例如,"如果我们给他们一个营销报价,这个客户还会流失吗?")。这种联合反事实分布提供了比提升量更丰富的信息,但更难估计。然而,这两种方法是协同的:可以利用提升模型进行反事实估计。我们提出了一种反事实估计器,它将二元贝塔分布拟合到预测的提升分数上,从而给出反事实结果的后验分布。我们的方法不需要超出提升建模的因果假设。模拟显示了该方法的有效性,例如,它可以应用于电信中的客户流失问题,在这一问题上它揭示了标准ML或提升模型单独无法获得的见解。
摘要:Uplift modeling estimates the causal effect of an intervention as the difference between potential outcomes under treatment and control, whereas counterfactual identification aims to recover the joint distribution of these potential outcomes (e.g., "Would this customer still have churned had we given them a marketing offer?"). This joint counterfactual distribution provides richer information than the uplift but is harder to estimate. However, the two approaches are synergistic: uplift models can be leveraged for counterfactual estimation. We propose a counterfactual estimator that fits a bivariate beta distribution to predicted uplift scores, yielding posterior distributions over counterfactual outcomes. Our approach requires no causal assumptions beyond those of uplift modeling. Simulations show the efficacy of the approach, which can be applied, for example, to the problem of customer churn in telecom, where it reveals insights unavailable to standard ML or uplift models alone.


【3】Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
标题:通过自动提示搜索暴露文本到图像模型中的隐藏偏见
链接:https://arxiv.org/abs/2512.08724

作者:Manos Plitsis,Giorgos Bouritsas,Vassilis Katsouros,Yannis Panagakis
摘要:文本到图像(TTI)扩散模型已经取得了显著的视觉质量,但它们已被多次证明在性别、种族和年龄等敏感属性上表现出社会偏见。为了减轻这些偏见,现有的方法经常依赖于精心策划的提示数据集(手动构建或使用大型语言模型(LLM)生成)作为其训练和/或评估程序的一部分。除了策划成本,这也有可能忽视意外的、不太明显的、会触发有偏见生成的提示,即使在经过去偏的模型中也是如此。在这项工作中,我们介绍了偏见引导提示搜索(BGPS),一个自动生成提示的框架,其目标是最大化生成图像中偏见的存在。BGPS包括两个组成部分:(1)被指示生成属性中立提示的LLM,以及(2)作用于TTI内部表示的属性分类器,它将LLM的解码过程引导到提示空间中能放大感兴趣图像属性的区域。我们对Stable Diffusion 1.5和一个最先进的去偏模型进行了广泛的实验,发现了一系列微妙的、以前未被记录的偏见,它们严重恶化了公平性指标。至关重要的是,发现的提示是可解释的,即它们可能由典型用户输入,与突出的硬提示优化方法相比在困惑度指标上有定量改善。我们的研究结果揭示了TTI的漏洞,而BGPS扩展了偏见搜索空间,可以作为偏见缓解的新评估工具。
摘要:Text-to-image (TTI) diffusion models have achieved remarkable visual quality, yet they have been repeatedly shown to exhibit social biases across sensitive attributes such as gender, race and age. To mitigate these biases, existing approaches frequently depend on curated prompt datasets - either manually constructed or generated with large language models (LLMs) - as part of their training and/or evaluation procedures. Beside the curation cost, this also risks overlooking unanticipated, less obvious prompts that trigger biased generation, even in models that have undergone debiasing. In this work, we introduce Bias-Guided Prompt Search (BGPS), a framework that automatically generates prompts that aim to maximize the presence of biases in the resulting images. BGPS comprises two components: (1) an LLM instructed to produce attribute-neutral prompts and (2) attribute classifiers acting on the TTI's internal representations that steer the decoding process of the LLM toward regions of the prompt space that amplify the image attributes of interest. We conduct extensive experiments on Stable Diffusion 1.5 and a state-of-the-art debiased model and discover an array of subtle and previously undocumented biases that severely deteriorate fairness metrics. Crucially, the discovered prompts are interpretable, i.e they may be entered by a typical user, quantitatively improving the perplexity metric compared to a prominent hard prompt optimization counterpart. Our findings uncover TTI vulnerabilities, while BGPS expands the bias search space and can act as a new evaluation tool for bias mitigation.


【4】Gradient-Informed Monte Carlo Fine-Tuning of Diffusion Models for Low-Thrust Trajectory Design
标题:用于低推力轨迹设计的扩散模型的梯度引导蒙特卡洛微调
链接:https://arxiv.org/abs/2512.08705

作者:Jannik Graebner,Ryne Beeson
摘要:在圆型限制性三体问题中,小推力航天器轨道的初步任务设计是一个全局搜索过程,其特点是目标景观复杂,存在大量的局部极小值。将问题公式化为从局部最优解邻域上支持的非归一化分布中采样,提供了部署马尔可夫链蒙特卡罗方法和生成式机器学习的机会。在这项工作中,我们扩展了我们以前的自监督扩散模型微调框架,采用梯度引导的马尔可夫链蒙特卡罗。我们比较了两种算法,即Metropolis调整朗之万算法(MALA)和哈密顿蒙特卡罗,两者都从扩散模型学习的分布初始化。使用状态转移矩阵解析计算平衡燃料消耗、飞行时间和约束违反的目标函数的导数。我们表明,引入梯度漂移项加速了混合,并提高了土星-泰坦系统中多圈转移问题的马尔可夫链收敛性。在评估的方法中,MALA提供了性能和计算成本之间的最佳权衡。从在相关转移上训练的基线扩散模型生成的样本开始,MALA明确地以帕累托最优解为目标。与随机游走Metropolis算法相比,它将可行性率从17.34%提高到63.01%,并产生更密集、更多样化的帕累托前沿覆盖。通过使用奖励加权似然最大化,在生成的样本和相关奖励值上微调扩散模型,我们学习了问题的全局解结构,并消除了对繁琐的单独数据生成阶段的需求。
摘要:Preliminary mission design of low-thrust spacecraft trajectories in the Circular Restricted Three-Body Problem is a global search characterized by a complex objective landscape and numerous local minima. Formulating the problem as sampling from an unnormalized distribution supported on neighborhoods of locally optimal solutions, provides the opportunity to deploy Markov chain Monte Carlo methods and generative machine learning. In this work, we extend our previous self-supervised diffusion model fine-tuning framework to employ gradient-informed Markov chain Monte Carlo. We compare two algorithms - the Metropolis-Adjusted Langevin Algorithm and Hamiltonian Monte Carlo - both initialized from a distribution learned by a diffusion model. Derivatives of an objective function that balances fuel consumption, time of flight and constraint violations are computed analytically using state transition matrices. We show that incorporating the gradient drift term accelerates mixing and improves convergence of the Markov chain for a multi-revolution transfer in the Saturn-Titan system. Among the evaluated methods, MALA provides the best trade-off between performance and computational cost. Starting from samples generated by a baseline diffusion model trained on a related transfer, MALA explicitly targets Pareto-optimal solutions. Compared to a random walk Metropolis algorithm, it increases the feasibility rate from 17.34% to 63.01% and produces a denser, more diverse coverage of the Pareto front. By fine-tuning a diffusion model on the generated samples and associated reward values with reward-weighted likelihood maximization, we learn the global solution structure of the problem and eliminate the need for a tedious separate data generation phase.
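作为参考,标准 MALA 的提议与 Metropolis-Hastings 接受步骤如下(通用形式,非论文针对轨迹优化问题的具体实现),其中 $\pi$ 为目标的非归一化分布,$\epsilon$ 为步长:

\[
x' = x + \frac{\epsilon^2}{2}\,\nabla \log \pi(x) + \epsilon\,\xi, \qquad \xi \sim \mathcal{N}(0, I),
\]
\[
\alpha(x, x') = \min\!\left\{1,\ \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right\}, \qquad
q(x' \mid x) = \mathcal{N}\!\left(x' \;\middle|\; x + \tfrac{\epsilon^2}{2}\nabla \log \pi(x),\ \epsilon^2 I\right).
\]

当 $\epsilon \to 0$ 时,MALA 退化为朗之万扩散的离散化;摘要中对比的哈密顿蒙特卡罗则用多步含动量的模拟取代上述单步提议。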


【5】Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models
标题:解释扩散模型图像保护方法中的结构化扰动
链接:https://arxiv.org/abs/2512.08329

作者:Michael R. Martin,Garrick Chan,Kwan-Liu Ma
备注:32 pages, 17 figures, 1 table, 5 algorithms, preprint
摘要:最近的图像保护机制,如Glaze和Nightshade,引入了难以察觉的、对抗性设计的扰动,旨在破坏下游的文本到图像生成模型。虽然它们的经验有效性是已知的,但这些扰动的内部结构、可检测性和表征行为仍然知之甚少。这项研究使用一个集成白盒特征空间检查和黑盒信号级探测的统一框架,提供了系统的、可解释的人工智能分析。通过潜在空间聚类、特征通道激活分析、基于遮挡的空间敏感性映射和频域表征,我们表明,保护机制作为结构化的、低熵的扰动,在表征、空间和频谱域上与底层图像内容紧密耦合。受保护的图像保留了带有保护特定子结构的内容驱动的特征组织,而不是诱导全局表征漂移。可检测性由扰动熵、空间部署和频率对齐的相互作用决定,顺序施加保护会放大可检测的结构,而不是抑制它。频域分析表明,Glaze和Nightshade沿占主导地位的图像对齐频率轴重新分配能量,而不是引入弥散噪声。这些研究结果表明,当代图像保护是通过结构化的特征级变形而不是语义错位来进行的,这解释了为什么保护信号在视觉上仍然很微妙,但始终可以检测到。这项工作提高了对抗性图像保护的可解释性,并为生成式AI系统的未来防御和检测策略的设计提供了信息。
摘要:Recent image protection mechanisms such as Glaze and Nightshade introduce imperceptible, adversarially designed perturbations intended to disrupt downstream text-to-image generative models. While their empirical effectiveness is known, the internal structure, detectability, and representational behavior of these perturbations remain poorly understood. This study provides a systematic, explainable AI analysis using a unified framework that integrates white-box feature-space inspection and black-box signal-level probing. Through latent-space clustering, feature-channel activation analysis, occlusion-based spatial sensitivity mapping, and frequency-domain characterization, we show that protection mechanisms operate as structured, low-entropy perturbations tightly coupled to underlying image content across representational, spatial, and spectral domains. Protected images preserve content-driven feature organization with protection-specific substructure rather than inducing global representational drift. Detectability is governed by interacting effects of perturbation entropy, spatial deployment, and frequency alignment, with sequential protection amplifying detectable structure rather than suppressing it. Frequency-domain analysis shows that Glaze and Nightshade redistribute energy along dominant image-aligned frequency axes rather than introducing diffuse noise. These findings indicate that contemporary image protection operates through structured feature-level deformation rather than semantic dislocation, explaining why protection signals remain visually subtle yet consistently detectable. This work advances the interpretability of adversarial image protection and informs the design of future defenses and detection strategies for generative AI systems.


【6】Mathematical Foundations of Neural Tangents and Infinite-Width Networks
标题:神经切向和无限宽网络的数学基础
链接:https://arxiv.org/abs/2512.08264

作者:Rachana Mysore,Preksha Girish,Kavitha Jayaram,Shrey Kumar,Preksha Girish,Shravan Sanjeev Bagal,Kavitha Jayaram,Shreya Aravind Shastry
备注:7 pages, 2 figures
摘要:我们通过神经切线核(NTK)研究了无限宽度区域中神经网络的数学基础。我们提出了NTK-Eigenvalue-Controlled Residual Network(NTK-ECRN),这是一种集成了傅立叶特征嵌入、残差连接与分层缩放以及随机深度的架构,可以在训练过程中对内核演化进行严格的分析。我们的理论贡献包括推导NTK动态的界限,表征特征值的演变,并将谱特性与泛化和优化稳定性联系起来。在合成数据集和基准数据集上的实验结果验证了预测的内核行为,并证明了训练稳定性和泛化能力的提高。这项工作提供了一个全面的框架,将无限宽度理论和实际的深度学习架构联系起来。
摘要:We investigate the mathematical foundations of neural networks in the infinite-width regime through the Neural Tangent Kernel (NTK). We propose the NTK-Eigenvalue-Controlled Residual Network (NTK-ECRN), an architecture integrating Fourier feature embeddings, residual connections with layerwise scaling, and stochastic depth to enable rigorous analysis of kernel evolution during training. Our theoretical contributions include deriving bounds on NTK dynamics, characterizing eigenvalue evolution, and linking spectral properties to generalization and optimization stability. Empirical results on synthetic and benchmark datasets validate the predicted kernel behavior and demonstrate improved training stability and generalization. This work provides a comprehensive framework bridging infinite-width theory and practical deep-learning architectures.
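作为背景,经验神经切线核的标准定义如下(通用定义,非该论文引入的新记号):对参数为 $\theta$ 的网络 $f(x;\theta)$,

\[
\Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\ \nabla_\theta f(x';\theta) \big\rangle .
\]

在平方损失与梯度流训练下,无限宽度极限中训练集上的函数演化近似满足 $\dot f_t = -\Theta\,(f_t - y)$,因此核的特征值谱直接影响各方向的收敛速度与优化稳定性,这也与摘要中"特征值控制"设计的动机相呼应。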


【7】Wavelet-Accelerated Physics-Informed Quantum Neural Network for Multiscale Partial Differential Equations
标题:用于多尺度偏微分方程的小波加速物理信息量子神经网络
链接:https://arxiv.org/abs/2512.08256

作者:Deepak Gupta,Himanshu Pandey,Ratikanta Behera
摘要:这项工作提出了一个基于小波的物理信息量子神经网络框架,以有效地解决多尺度偏微分方程,涉及尖锐的梯度,刚度,快速局部变化,和高度振荡的行为。传统的物理信息神经网络(PINN)在求解微分方程方面表现出了巨大的潜力,而它们的量子对应物量子PINN则表现出更强的代表能力,具有更少的可训练参数。然而,这两种方法在准确解决多尺度特征方面都面临着显著的挑战。此外,它们依赖于自动微分来构建损失函数,这引入了相当大的计算开销,导致更长的训练时间。为了克服这些挑战,我们开发了一种小波加速的物理信息量子神经网络,它消除了自动微分的需要,大大降低了计算复杂性。所提出的框架将小波的多分辨率属性的量子神经网络架构内,从而提高网络的能力,有效地捕捉多尺度问题的局部和全局特征。数值实验表明,我们提出的方法实现了卓越的精度,同时需要不到百分之五的可训练参数相比,经典的基于小波的PINN,从而加快收敛。此外,与现有的量子PINN相比,它提供了三到五倍的加速,突出了所提出的方法有效解决具有挑战性的多尺度和振荡问题的潜力。
摘要 :This work proposes a wavelet-based physics-informed quantum neural network framework to efficiently address multiscale partial differential equations that involve sharp gradients, stiffness, rapid local variations, and highly oscillatory behavior. Traditional physics-informed neural networks (PINNs) have demonstrated substantial potential in solving differential equations, and their quantum counterparts, quantum-PINNs, exhibit enhanced representational capacity with fewer trainable parameters. However, both approaches face notable challenges in accurately solving multiscale features. Furthermore, their reliance on automatic differentiation for constructing loss functions introduces considerable computational overhead, resulting in longer training times. To overcome these challenges, we developed a wavelet-accelerated physics-informed quantum neural network that eliminates the need for automatic differentiation, significantly reducing computational complexity. The proposed framework incorporates the multiresolution property of wavelets within the quantum neural network architecture, thereby enhancing the network's ability to effectively capture both local and global features of multiscale problems. Numerical experiments demonstrate that our proposed method achieves superior accuracy while requiring less than five percent of the trainable parameters compared to classical wavelet-based PINNs, resulting in faster convergence. Moreover, it offers a speedup of three to five times compared to existing quantum PINNs, highlighting the potential of the proposed approach for efficiently solving challenging multiscale and oscillatory problems.


【8】LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks
标题:LayerPipe2:通过改进的指数移动平均实现多级流水线与权重重计算以训练神经网络
链接:https://arxiv.org/abs/2512.08160

作者:Nanda K. Unnikrishnan,Keshab K. Parhi
备注:Proc. of 2025 Asilomar Conference on Signals, Systems, and Computers, October 2025, Pacific Grove, CA
摘要:在我们之前的工作LayerPipe中,我们介绍了一种通过重叠前向和反向计算来加速卷积、全连接和脉冲神经网络训练的方法。然而,尽管取得了经验上的成功,但对于在每一层需要引入多少梯度延迟才能实现期望的流水线程度,尚缺乏原则性的理解。本文LayerPipe2通过使用可变延迟梯度自适应和重定时形式化地推导LayerPipe来填补这一空白。我们确定了可以合法插入延迟的位置,并表明所需的延迟量直接由网络结构决定,其中内层需要较少的延迟,外层需要较长的延迟。当在每一层应用流水线时,延迟量仅取决于剩余下游级的数量。当层按组流水线化时,组中的所有层共享相同的延迟分配。这些见解不仅解释了以前观察到的调度模式,而且还暴露了一个经常被忽视的挑战,即流水线隐含地需要存储历史权重。我们通过开发一种流水线感知的移动平均来重建所需的过去状态而不是显式存储它们,从而克服了这个存储瓶颈。这在不牺牲使流水线学习可行的精度保证的情况下降低了内存成本。其结果是一个原则性的框架,说明了如何构建LayerPipe架构、预测其延迟要求并减轻其存储负担,从而实现具有可控通信-计算权衡的可扩展流水线训练。
摘要:In our prior work, LayerPipe, we had introduced an approach to accelerate training of convolutional, fully connected, and spiking neural networks by overlapping forward and backward computation. However, despite empirical success, a principled understanding of how much gradient delay needs to be introduced at each layer to achieve desired level of pipelining was not addressed. This paper, LayerPipe2, fills that gap by formally deriving LayerPipe using variable delayed gradient adaptation and retiming. We identify where delays may be legally inserted and show that the required amount of delay follows directly from the network structure where inner layers require fewer delays and outer layers require longer delays. When pipelining is applied at every layer, the amount of delay depends only on the number of remaining downstream stages. When layers are pipelined in groups, all layers in the group share the same assignment of delays. These insights not only explain previously observed scheduling patterns but also expose an often overlooked challenge that pipelining implicitly requires storage of historical weights. We overcome this storage bottleneck by developing a pipeline--aware moving average that reconstructs the required past states rather than storing them explicitly. This reduces memory cost without sacrificing the accuracy guarantees that makes pipelined learning viable. The result is a principled framework that illustrates how to construct LayerPipe architectures, predicts their delay requirements, and mitigates their storage burden, thereby enabling scalable pipelined training with controlled communication computation tradeoffs.


【9】TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
标题:TreeGRPO:用于扩散模型在线RL后训练的树优势GRPO
链接:https://arxiv.org/abs/2512.08153

作者:Zheng Ding,Weirui Ye
摘要:强化学习(RL)后训练对于将生成模型与人类偏好对齐至关重要,但其高昂的计算成本仍然是广泛采用的主要障碍。我们引入了TreeGRPO,这是一种新颖的RL框架,通过将去噪过程重塑为搜索树来显著提高训练效率。从共享的初始噪声样本出发,TreeGRPO策略性地分支以生成多个候选轨迹,同时有效地重用它们的公共前缀。这种树形结构的方法具有三个关键优势:(1)高样本效率,在相同的训练样本下实现更好的性能;(2)通过奖励反向传播计算特定于步骤的优势,实现细粒度的信用分配,克服了基于轨迹的方法的均匀信用分配限制;(3)摊销计算,其中多子分支允许每次前向传递进行多次策略更新。在扩散和基于流的模型上进行的大量实验表明,TreeGRPO实现了2.4倍的训练加速,同时在效率-奖励权衡空间中建立了更优的帕累托边界。我们的方法在多个基准和奖励模型中始终优于GRPO基线,为基于RL的视觉生成模型对齐提供了可扩展和有效的途径。该项目的网站是treegrpo.github.io。
摘要:Reinforcement learning (RL) post-training is crucial for aligning generative models with human preferences, but its prohibitive computational cost remains a major barrier to widespread adoption. We introduce \textbf{TreeGRPO}, a novel RL framework that dramatically improves training efficiency by recasting the denoising process as a search tree. From shared initial noise samples, TreeGRPO strategically branches to generate multiple candidate trajectories while efficiently reusing their common prefixes. This tree-structured approach delivers three key advantages: (1) \emph{High sample efficiency}, achieving better performance under same training samples (2) \emph{Fine-grained credit assignment} via reward backpropagation that computes step-specific advantages, overcoming the uniform credit assignment limitation of trajectory-based methods, and (3) \emph{Amortized computation} where multi-child branching enables multiple policy updates per forward pass. Extensive experiments on both diffusion and flow-based models demonstrate that TreeGRPO achieves \textbf{2.4$\times$ faster training} while establishing a superior Pareto frontier in the efficiency-reward trade-off space. Our method consistently outperforms GRPO baselines across multiple benchmarks and reward models, providing a scalable and effective pathway for RL-based visual generative model alignment. The project website is available at treegrpo.github.io.
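作为对照,标准 GRPO 的组相对优势对同一提示下的一组奖励 $\{r_i\}_{i=1}^{G}$ 给出统一的轨迹级信用(以下为通用定义,并非 TreeGRPO 的逐步优势公式):

\[
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)}, \qquad i = 1,\dots,G .
\]

TreeGRPO 则沿搜索树把奖励回传到各个去噪步骤,为每一步计算各自的优势,从而取代这种对整条轨迹统一的信用分配。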


【10】Long-only cryptocurrency portfolio management by ranking the assets: a neural network approach
标题:通过对资产进行排名进行只做长的加密货币投资组合管理:神经网络方法
链接:https://arxiv.org/abs/2512.08124

作者:Zijiang Yang
摘要 :本文将在加密货币市场的背景下提出一种新的基于机器学习的投资组合管理方法。以前的研究人员主要集中在预测特定加密货币的走势,如比特币(BTC),然后根据预测进行交易。与以往将加密货币单独处理不同,本文通过分析相对关系来管理一组加密货币。具体来说,在每个时间步中,我们利用神经网络来预测管理的加密货币的未来回报的排名,并相应地设置权重。通过整合这些横截面信息,基于2020年5月至2023年11月的真实每日加密货币市场数据的回测实验,所提出的方法被证明是有利可图的。在这3.5年中,市场经历了看涨、看跌和停滞的完整周期。尽管在如此复杂的市场条件下,所提出的方法优于现有的方法,并实现了夏普比率为1.01,年化收益率为64.26%。此外,所提出的方法被证明是强大的交易费用的增加。
摘要:This paper will propose a novel machine learning based portfolio management method in the context of the cryptocurrency market. Previous researchers mainly focus on the prediction of the movement for specific cryptocurrency such as the bitcoin(BTC) and then trade according to the prediction. In contrast to the previous work that treats the cryptocurrencies independently, this paper manages a group of cryptocurrencies by analyzing the relative relationship. Specifically, in each time step, we utilize the neural network to predict the rank of the future return of the managed cryptocurrencies and place weights accordingly. By incorporating such cross-sectional information, the proposed methods is shown to profitable based on the backtesting experiments on the real daily cryptocurrency market data from May, 2020 to Nov, 2023. During this 3.5 years, the market experiences the full cycle of bullish, bearish and stagnant market conditions. Despite under such complex market conditions, the proposed method outperforms the existing methods and achieves a Sharpe ratio of 1.01 and annualized return of 64.26%. Additionally, the proposed method is shown to be robust to the increase of transaction fee.


【11】Scalable Offline Model-Based RL with Action Chunks
标题:具有动作块的可扩展离线基于模型的RL
链接:https://arxiv.org/abs/2512.08108

作者:Kwanyoung Park,Seohong Park,Youngwoon Lee,Sergey Levine
备注:22 pages, 7 figures
摘要:在本文中,我们研究了基于模型的强化学习(RL),特别是基于模型的值扩展,是否可以为离线RL中处理复杂的长时程任务提供可扩展的方法。基于模型的值扩展使用由当前策略和学习的动力学模型生成的长度为n的虚拟滚动(rollout)来拟合在策略(on-policy)值函数。虽然较大的n减少了值自举的偏差,但它会放大长时程上累积的模型误差,降低未来预测的质量。我们用一个动作块(action-chunk)模型来解决这种权衡,该模型从一系列动作(一个"动作块")而不是单个动作预测未来状态,从而减少了复合误差。此外,我们没有直接训练策略来最大化奖励,而是采用从表达性强的行为动作块策略中进行拒绝采样,这可以防止模型利用分布外的动作。我们将这一方案称为基于模型的动作块强化学习(MAC)。通过在具有多达100M转换的大规模数据集的高挑战性任务上的实验,我们表明MAC在离线基于模型的RL算法中实现了最佳性能,特别是在具有挑战性的长时程任务上。
摘要:In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. Model-based value expansion fits an on-policy value function using length-n imaginary rollouts generated by the current policy and a learned dynamics model. While larger n reduces bias in value bootstrapping, it amplifies accumulated model errors over long horizons, degrading future predictions. We address this trade-off with an \emph{action-chunk} model that predicts a future state from a sequence of actions (an "action chunk") instead of a single action, which reduces compounding errors. In addition, instead of directly training a policy to maximize rewards, we employ rejection sampling from an expressive behavioral action-chunk policy, which prevents model exploitation from out-of-distribution actions. We call this recipe \textbf{Model-Based RL with Action Chunks (MAC)}. Through experiments on highly challenging tasks with large-scale datasets of up to 100M transitions, we show that MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.


【12】Complexity of One-Dimensional ReLU DNNs
标题:一维ReLU DNN的复杂性
链接:https://arxiv.org/abs/2512.08091

作者:Jonathan Kogan,Hayden Jananthan,Jeremy Kepner
备注:Presented at IEEE MIT URTC 2025
摘要:我们通过线性区域的视角来研究一维(1D)ReLU深度神经网络的表达能力。对于随机初始化的、全连接的1D ReLU网络(采用非零偏置的He缩放),在无限宽度极限下,我们证明了线性区域的期望数量增长为 $\sum_{i = 1}^L n_i + \mathop{o}\left(\sum_{i = 1}^L{n_i}\right) + 1$,其中 $n_\ell$ 表示第 $\ell$ 个隐藏层中的神经元数量。我们还提出了一个函数自适应的稀疏性概念,将网络使用的期望区域数与在固定容差内近似目标所需的最小区域数进行比较。
摘要:We study the expressivity of one-dimensional (1D) ReLU deep neural networks through the lens of their linear regions. For randomly initialized, fully connected 1D ReLU networks (He scaling with nonzero bias) in the infinite-width limit, we prove that the expected number of linear regions grows as $\sum_{i = 1}^L n_i + \mathop{o}\left(\sum_{i = 1}^L{n_i}\right) + 1$, where $n_\ell$ denotes the number of neurons in the $\ell$-th hidden layer. We also propose a function-adaptive notion of sparsity that compares the expected regions used by the network to the minimal number needed to approximate a target within a fixed tolerance.
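下面是一个数值小实验的草图(非论文代码):随机初始化一个一维 ReLU 网络(He 缩放、非零偏置,偏置方差为假设值),在细网格上通过激活模式的变化次数统计其线性区域数,可用于与上述期望增长率做经验对比。

```python
# 示意性实验:统计随机初始化 1D ReLU 网络在区间上的线性区域数(偏置方差为假设)
import numpy as np

def random_relu_layers(widths, seed=0):
    rng = np.random.default_rng(seed)
    layers, fan_in = [], 1
    for w in list(widths) + [1]:
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(w, fan_in))  # He 缩放
        b = rng.normal(0.0, 0.5, size=w)                              # 非零偏置(方差为假设值)
        layers.append((W, b))
        fan_in = w
    return layers

def count_linear_regions(layers, lo=-5.0, hi=5.0, n=100001):
    x = np.linspace(lo, hi, n)
    h = x[:, None]
    patterns = []
    for W, b in layers[:-1]:              # 只需隐藏层的激活模式
        z = h @ W.T + b
        patterns.append(z > 0)
        h = np.maximum(z, 0.0)
    patterns = np.concatenate(patterns, axis=1)
    changed = np.any(patterns[1:] != patterns[:-1], axis=1)
    return int(changed.sum()) + 1         # 模式变化次数 + 1 ≈ 区间内的线性区域数

layers = random_relu_layers([64, 64])
print("empirical #linear regions on [-5, 5]:", count_linear_regions(layers))
```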


【13】HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability
标题:HOLE:神经网络可解释性的潜在嵌入的同调观察
链接:https://arxiv.org/abs/2512.07988

作者:Sudhanva Manjunath Athreya,Paul Rosen
摘要:深度学习模型在各个领域都取得了巨大的成功,但它们的学习表示和决策过程在很大程度上仍然是不透明的,难以解释。本文介绍了HOLE(Homological Observation of Latent Embeddings),这是一种通过持久同源性分析和解释深度神经网络的方法。HOLE从神经激活中提取拓扑特征,并使用一套可视化技术(包括Sankey图、热图、树状图和斑点图)将其呈现出来。这些工具有助于检查各层的表示结构和质量。我们使用一系列判别模型在标准数据集上评估HOLE,重点关注表示质量、跨层可解释性以及对输入扰动和模型压缩的鲁棒性。结果表明,拓扑分析揭示了与类分离、特征解纠缠和模型鲁棒性相关的模式,为理解和改进深度学习系统提供了补充视角。
摘要 :Deep learning models have achieved remarkable success across various domains, yet their learned representations and decision-making processes remain largely opaque and hard to interpret. This work introduces HOLE (Homological Observation of Latent Embeddings), a method for analyzing and interpreting deep neural networks through persistent homology. HOLE extracts topological features from neural activations and presents them using a suite of visualization techniques, including Sankey diagrams, heatmaps, dendrograms, and blob graphs. These tools facilitate the examination of representation structure and quality across layers. We evaluate HOLE on standard datasets using a range of discriminative models, focusing on representation quality, interpretability across layers, and robustness to input perturbations and model compression. The results indicate that topological analysis reveals patterns associated with class separation, feature disentanglement, and model robustness, providing a complementary perspective for understanding and improving deep learning systems.


【14】CIP-Net: Continual Interpretable Prototype-based Network
标题:CIP-Net:连续可解释原型网络
链接:https://arxiv.org/abs/2512.07981

作者:Federico Di Valerio,Michela Proietti,Alessio Ragno,Roberto Capobianco
摘要:持续学习要求模型随着时间的推移学习新任务,而不忘记已经学到的东西。在这种情况下,一个关键的挑战是灾难性遗忘,即学习新信息会导致模型在以前的任务上失去性能。最近,可解释人工智能被认为是更好地理解和减少遗忘的一种有希望的方法。特别是,可自我解释的模型是有用的,因为它们在预测过程中生成解释,这有助于保存知识。然而,大多数现有的可解释方法使用事后解释,或为每个新任务需要额外的内存,导致可扩展性有限。在这项工作中,我们介绍CIP-Net,一个为持续学习设计的、无需范例的、可自我解释的基于原型的模型。CIP-Net避免存储过去的样本,并保持简单的架构,同时仍然提供有用的解释和强大的性能。我们证明,CIP-Net在任务增量和类增量设置中,与以前的无范例和可自我解释方法相比都实现了最先进的性能,同时内存相关开销显著降低。这使其成为持续学习的实用且可解释的解决方案。
摘要:Continual learning constrains models to learn new tasks over time without forgetting what they have already learned. A key challenge in this setting is catastrophic forgetting, where learning new information causes the model to lose its performance on previous tasks. Recently, explainable AI has been proposed as a promising way to better understand and reduce forgetting. In particular, self-explainable models are useful because they generate explanations during prediction, which can help preserve knowledge. However, most existing explainable approaches use post-hoc explanations or require additional memory for each new task, resulting in limited scalability. In this work, we introduce CIP-Net, an exemplar-free self-explainable prototype-based model designed for continual learning. CIP-Net avoids storing past examples and maintains a simple architecture, while still providing useful explanations and strong performance. We demonstrate that CIPNet achieves state-of-the-art performances compared to previous exemplar-free and self-explainable methods in both task- and class-incremental settings, while bearing significantly lower memory-related overhead. This makes it a practical and interpretable solution for continual learning.


【15】GSPN-2: Efficient Parallel Sequence Modeling
标题:GSPN-2:高效的并行序列建模
链接:https://arxiv.org/abs/2512.07884

作者:Hongjun Wang,Yitong Jiang,Collin McCarthy,David Wehr,Hanrong Ye,Xinhao Li,Ka Chun Cheung,Wonmin Byeon,Jinwei Gu,Ke Chen,Kai Han,Hongxu Yin,Pavlo Molchanov,Jan Kautz,Sifei Liu
备注:NeurIPS 2025
摘要:高效的Vision Transformer仍然是高分辨率图像和长视频相关现实世界应用的瓶颈。广义空间传播网络(GSPN)通过用线扫描传播方案取代二次复杂度的自注意力来解决这个问题,使成本在行数或列数上接近线性,同时保持准确性。尽管有这种进步,现有的GSPN实现仍然存在以下问题:(i)由于重复启动GPU内核而导致的沉重开销,(ii)来自全局GPU存储器的过多数据传输,以及(iii)由于为每个通道维护单独的传播权重而导致的冗余计算。我们介绍GSPN-2,一个算法与系统的联合重新设计。特别是,我们将之前实现中的数千次微内核启动合并为单个2D内核,显式地将一个线程束(warp)固定到每个通道切片,并将前一列的激活缓存在共享内存中。在模型方面,我们引入了一种紧凑的通道传播策略,该策略取代了每通道矩阵,削减了参数,并与Transformer注意力中使用的亲和图自然对齐。实验证明了GSPN-2在图像分类和文本到图像合成任务中的有效性,以显著降低的计算成本达到Transformer级别的精度。GSPN-2通过其结构化矩阵变换和GPU优化实现的独特组合,为视觉应用中的全局空间上下文建模建立了新的效率前沿。项目页面:https://whj363636.github.io/GSPN2/
摘要:Efficient vision transformer remains a bottleneck for high-resolution images and long-video related real-world applications. Generalized Spatial Propagation Network (GSPN) addresses this by replacing quadratic self-attention with a line-scan propagation scheme, bringing the cost close to linear in the number of rows or columns, while retaining accuracy. Despite this advancement, the existing GSPN implementation still suffers from (i) heavy overhead due to repeatedly launching GPU kernels, (ii) excessive data transfers from global GPU memory, and (iii) redundant computations caused by maintaining separate propagation weights for each channel. We introduce GSPN-2, a joint algorithm-system redesign. In particular, we eliminate thousands of micro-launches from the previous implementation into one single 2D kernel, explicitly pin one warp to each channel slice, and stage the previous column's activations in shared memory. On the model side, we introduce a compact channel propagation strategy that replaces per-channel matrices, trimming parameters, and align naturally with the affinity map used in transformer attention. Experiments demonstrate GSPN-2's effectiveness across image classification and text-to-image synthesis tasks, matching transformer-level accuracy with significantly lower computational cost. GSPN-2 establishes a new efficiency frontier for modeling global spatial context in vision applications through its unique combination of structured matrix transformations and GPU-optimized implementation. Project page: https://whj363636.github.io/GSPN2/


【16】Artificial Intelligence-Driven Network-on-Chip Design Space Exploration: Neural Network Architectures for Design
标题:人工智能驱动的片上网络设计空间探索:用于设计的神经网络架构
链接:https://arxiv.org/abs/2512.07877

作者:Amogh Anshu N,Harish BP
摘要 :片上网络(Network-on-Chip,NoC)设计需要探索高维的配置空间,以满足严格的吞吐量要求和延迟约束。传统的设计空间探索技术通常速度缓慢,难以处理复杂的非线性参数交互。本文提出了一种机器学习驱动的框架,使用BookSim仿真和反向神经网络模型自动化NoC设计空间探索。具体而言,我们比较了三种架构-多层感知器(MLP),条件扩散模型和条件变分自动编码器(CVAE),以预测给定目标性能指标的最佳NoC参数。我们的流水线在不同的网格拓扑中生成了超过150,000个模拟数据点。条件扩散模型实现了最高的预测精度,在未知数据上获得0.463的均方误差(MSE)。此外,所提出的框架将设计探索时间减少了几个数量级,使其成为快速和可扩展的NoC协同设计的实用解决方案。
摘要:Network-on-Chip (NoC) design requires exploring a high-dimensional configuration space to satisfy stringent throughput requirements and latency constraints. Traditional design space exploration techniques are often slow and struggle to handle complex, non-linear parameter interactions. This work presents a machine learning-driven framework that automates NoC design space exploration using BookSim simulations and reverse neural network models. Specifically, we compare three architectures - a Multi-Layer Perceptron (MLP), a Conditional Diffusion Model, and a Conditional Variational Autoencoder (CVAE) - to predict optimal NoC parameters given target performance metrics. Our pipeline generates over 150,000 simulation data points across varied mesh topologies. The Conditional Diffusion Model achieved the highest predictive accuracy, attaining a mean squared error (MSE) of 0.463 on unseen data. Furthermore, the proposed framework reduces design exploration time by several orders of magnitude, making it a practical solution for rapid and scalable NoC co-design.


【17】Fourier-Enhanced Recurrent Neural Networks for Electrical Load Time Series Downscaling
标题:用于电力负荷时间序列降尺度的傅里叶增强型循环神经网络
链接:https://arxiv.org/abs/2512.07876

作者:Qi Chen,Mihai Anitescu
备注:Submitted to IEEE PES General Meeting 2026
摘要:我们提出了一个用于电力负荷降尺度的傅里叶增强循环神经网络(RNN)。该模型结合了(i)由低分辨率输入驱动的循环骨干,(ii)在潜在空间中融合的显式傅里叶季节嵌入,以及(iii)捕获每个周期内高分辨率分量之间依赖关系的自注意力层。在四个PJM区域中,与经典的Prophet基线(有和没有季节性/LAA)以及没有注意力或傅里叶特征的RNN消融版本相比,该方法在各预测时域上产生的RMSE更低且更平稳。
摘要:We present a Fourier-enhanced recurrent neural network (RNN) for downscaling electrical loads. The model combines (i) a recurrent backbone driven by low-resolution inputs, (ii) explicit Fourier seasonal embeddings fused in latent space, and (iii) a self-attention layer that captures dependencies among high-resolution components within each period. Across four PJM territories, the approach yields RMSE lower and flatter horizon-wise than classical Prophet baselines (with and without seasonality/LAA) and than RNN ablations without attention or Fourier features.
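下面给出构造显式傅里叶季节特征的最小示意(常见做法,非论文实现细节),这类特征即摘要中与循环骨干在潜在空间融合的季节嵌入的输入形式;周期与谐波个数均为假设值。

```python
# 最小示意:为时间索引构造 sin/cos 傅里叶季节特征(周期与谐波数为假设)
import numpy as np

def fourier_features(t, period, n_harmonics=3):
    """t: 时间索引数组;返回形状 (len(t), 2*n_harmonics) 的 sin/cos 特征。"""
    t = np.asarray(t, dtype=float)
    feats = []
    for k in range(1, n_harmonics + 1):
        feats.append(np.sin(2 * np.pi * k * t / period))
        feats.append(np.cos(2 * np.pi * k * t / period))
    return np.stack(feats, axis=-1)

hours = np.arange(24 * 7)                        # 一周的小时级索引
daily = fourier_features(hours, period=24)       # 日内周期嵌入
weekly = fourier_features(hours, period=24 * 7)  # 周周期嵌入
X_season = np.concatenate([daily, weekly], axis=-1)
print(X_season.shape)  # (168, 12)
```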


【18】Softly Symbolifying Kolmogorov-Arnold Networks
标题:软符号化Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2512.07875

作者:James Bagrow,Josh Bongard
备注:13 pages, 5 figures, 3 tables
摘要:Kolmogorov-Arnold网络(KANs)为可解释的机器学习提供了一条有前途的道路:它们的可学习激活可以单独研究,同时可以准确地集体拟合复杂数据。然而,在实践中,经过训练的激活通常缺乏符号保真度,学习到与可解释形式没有意义对应的病理分解。我们提出了软符号化Kolmogorov-Arnold网络(S2KAN),它将符号基元直接集成到训练中。每个激活都从符号项和密集项的字典中提取,并使用可学习的门来稀疏化表示。至关重要的是,这种稀疏化是可微的,可以实现端到端优化,并且受到有原则的最小描述长度目标的指导。当符号项足够时,S2KAN发现可解释的形式;当它们不足时,它优雅地退化为密集样条。我们在符号基准、动力系统预测和现实世界的预测任务中,用小得多的模型展示了具有竞争力或更优的准确性,并观察到即使没有正则化压力也会出现自稀疏化的证据。
摘要:Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex data accurately. In practice, however, trained activations often lack symbolic fidelity, learning pathological decompositions with no meaningful correspondence to interpretable forms. We propose Softly Symbolified Kolmogorov-Arnold Networks (S2KAN), which integrate symbolic primitives directly into training. Each activation draws from a dictionary of symbolic and dense terms, with learnable gates that sparsify the representation. Crucially, this sparsification is differentiable, enabling end-to-end optimization, and is guided by a principled Minimum Description Length objective. When symbolic terms suffice, S2KAN discovers interpretable forms; when they do not, it gracefully degrades to dense splines. We demonstrate competitive or superior accuracy with substantially smaller models across symbolic benchmarks, dynamical systems forecasting, and real-world prediction tasks, and observe evidence of emergent self-sparsification even without regularization pressure.


【19】Space Alignment Matters: The Missing Piece for Inducing Neural Collapse in Long-Tailed Learning
标题:空间对齐很重要:长尾学习中导致神经崩溃的缺失部分
链接:https://arxiv.org/abs/2512.07844

作者:Jinping Wang,Zhiqiang Gao,Zhiwu Xie
摘要:神经崩溃(NC)的最新研究表明,在类平衡条件下,类特征均值和分类器权重会自发地对齐成一个单纯形等角紧框架(simplex ETF)。然而,在长尾情形中,严重的样本不平衡往往会阻止NC现象的出现,导致泛化性能差。目前的努力主要是寻求通过对特征或分类器权重施加约束来恢复ETF几何结构,但忽略了一个关键问题:特征空间与分类器权重空间之间存在明显的不对齐。在本文中,我们通过最优误差指数分析从理论上量化了这种不对齐的危害。基于这种见解,我们提出了三种显式对齐策略,可即插即用地集成到现有的长尾方法中,而无需架构更改。在CIFAR-10-LT、CIFAR-100-LT和ImageNet-LT数据集上进行的大量实验一致地提升了所考察的基线,并实现了最先进的性能。
摘要 :Recent studies on Neural Collapse (NC) reveal that, under class-balanced conditions, the class feature means and classifier weights spontaneously align into a simplex equiangular tight frame (ETF). In long-tailed regimes, however, severe sample imbalance tends to prevent the emergence of the NC phenomenon, resulting in poor generalization performance. Current efforts predominantly seek to recover the ETF geometry by imposing constraints on features or classifier weights, yet overlook a critical problem: There is a pronounced misalignment between the feature and the classifier weight spaces. In this paper, we theoretically quantify the harm of such misalignment through an optimal error exponent analysis. Built on this insight, we propose three explicit alignment strategies that plug-and-play into existing long-tail methods without architectural change. Extensive experiments on the CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT datasets consistently boost examined baselines and achieve the state-of-the-art performances.
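作为参考,$K$ 类单纯形 ETF 的标准定义如下(神经崩溃文献中的通用形式,非该论文的新记号):

\[
\mathbf{M} \;=\; \sqrt{\tfrac{K}{K-1}}\;\mathbf{U}\left(\mathbf{I}_K - \tfrac{1}{K}\,\mathbf{1}_K \mathbf{1}_K^{\top}\right), \qquad \mathbf{U}^{\top}\mathbf{U} = \mathbf{I}_K ,
\]

其 $K$ 个列向量范数相等,两两夹角余弦均为 $-\tfrac{1}{K-1}$;神经崩溃即类均值与分类器权重同时对齐到这样一组最大分离的方向。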


【20】Fused Gromov-Wasserstein Contrastive Learning for Effective Enzyme-Reaction Screening
标题:融合Gromov-Wasserstein对比学习用于有效的酶反应筛查
链接:https://arxiv.org/abs/2512.08508

作者:Gengmo Zhou,Feng Yu,Wenda Wang,Zhifeng Gao,Guolin Ke,Zhewei Wei,Zhen Wang
摘要:酶是重要的催化剂,能够催化广泛的生化反应。从庞大的蛋白质文库中有效地识别特定的酶对于推进生物催化是必不可少的。传统的酶筛选和检索的计算方法耗时且资源密集。最近,深度学习方法显示出了希望。然而,这些方法只关注酶和反应之间的相互作用,忽略了每个域中固有的层次关系。为了解决这些局限性,我们引入了FGW-CLIP,这是一种基于优化融合Gromov-Wasserstein距离的新型对比学习框架。FGW-CLIP结合了多种对齐,包括反应和酶之间的域间对齐以及酶和反应内部的域内对齐。通过引入定制的正则化项,我们的方法最小化了酶和反应空间之间的Gromov-Wasserstein距离,从而增强了这些域之间的信息整合。广泛的评估表明FGW-CLIP在具有挑战性的酶反应任务中的优越性。在广泛使用的EnzymeMap基准测试中,FGW-CLIP在酶虚拟筛选中达到了最先进的性能(以BEDROC和EF指标衡量)。此外,FGW-CLIP在最大的酶反应基准ReactZyme的所有三种划分中始终表现出色,证明了对新型酶和反应的强泛化能力。这些结果将FGW-CLIP定位为在复杂生化环境中进行酶发现的有前途的框架,在不同的筛选方案中具有很强的适应性。
摘要:Enzymes are crucial catalysts that enable a wide range of biochemical reactions. Efficiently identifying specific enzymes from vast protein libraries is essential for advancing biocatalysis. Traditional computational methods for enzyme screening and retrieval are time-consuming and resource-intensive. Recently, deep learning approaches have shown promise. However, these methods focus solely on the interaction between enzymes and reactions, overlooking the inherent hierarchical relationships within each domain. To address these limitations, we introduce FGW-CLIP, a novel contrastive learning framework based on optimizing the fused Gromov-Wasserstein distance. FGW-CLIP incorporates multiple alignments, including inter-domain alignment between reactions and enzymes and intra-domain alignment within enzymes and reactions. By introducing a tailored regularization term, our method minimizes the Gromov-Wasserstein distance between enzyme and reaction spaces, which enhances information integration across these domains. Extensive evaluations demonstrate the superiority of FGW-CLIP in challenging enzyme-reaction tasks. On the widely-used EnzymeMap benchmark, FGW-CLIP achieves state-of-the-art performance in enzyme virtual screening, as measured by BEDROC and EF metrics. Moreover, FGW-CLIP consistently outperforms across all three splits of ReactZyme, the largest enzyme-reaction benchmark, demonstrating robust generalization to novel enzymes and reactions. These results position FGW-CLIP as a promising framework for enzyme discovery in complex biochemical settings, with strong adaptability across diverse screening scenarios.


【21】Learned iterative networks: An operator learning perspective
标题:学习迭代网络:算子学习的视角
链接:https://arxiv.org/abs/2512.08444

作者:Andreas Hauptmann,Ozan Öktem
摘要:学习式图像重建已成为计算成像与反问题领域的重要支柱。其中最成功的方法之一是学习迭代网络,它通过展开用于求解变分问题的经典迭代优化算法而构建。虽然底层算法通常是在泛函分析框架下表述的,学习方法却往往被视为纯离散的。在本章中,我们为学习迭代网络提出一个统一的算子视角:我们分别刻画学习到的重建算子(定义如何计算)与学习问题(定义要计算什么)。在此设定下,我们介绍了常见的方法,并说明许多方法在核心上是密切相关的。我们在该框架内回顾了线性与非线性反问题,并以一个简短的数值研究作结。
摘要:Learned image reconstruction has become a pillar in computational imaging and inverse problems. Among the most successful approaches are learned iterative networks, which are formulated by unrolling classical iterative optimisation algorithms for solving variational problems. While the underlying algorithm is usually formulated in the functional analytic setting, learned approaches are often viewed as purely discrete. In this chapter we present a unified operator view for learned iterative networks. Specifically, we formulate a learned reconstruction operator, defining how to compute, and separately the learning problem, which defines what to compute. In this setting we present common approaches and show that many approaches are closely related in their core. We review linear as well as nonlinear inverse problems in this framework and present a short numerical study to conclude.
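
为说明"展开式学习迭代网络"的一般形式,下面给出一个在线性正问题 y = Ax + 噪声 下的极简 PyTorch 草图:每步先做数据一致性梯度更新,再加一个学习到的校正项;网络结构、步数与步长均为示意性假设,并非本章的具体实现。

```python
import torch
import torch.nn as nn

class UnrolledNet(nn.Module):
    """展开 K 步:x <- x - step * A^T(Ax - y) + 学习的校正项(示意)。"""
    def __init__(self, A, K=5, hidden=32):
        super().__init__()
        self.A, self.K = A, K                        # 前向算子,这里取固定矩阵
        self.step = nn.Parameter(torch.tensor(0.1))  # 可学习步长
        self.correct = nn.ModuleList([
            nn.Sequential(nn.Linear(A.shape[1], hidden), nn.ReLU(),
                          nn.Linear(hidden, A.shape[1]))
            for _ in range(K)])

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.A.shape[1])
        for k in range(self.K):
            grad = (x @ self.A.T - y) @ self.A       # 数据一致性梯度 A^T(Ax - y)
            x = x - self.step * grad + self.correct[k](x)
        return x

torch.manual_seed(0)
m, n = 20, 40
A = torch.randn(m, n) / n ** 0.5
x_true = torch.randn(8, n)
y = x_true @ A.T + 0.01 * torch.randn(8, m)
net = UnrolledNet(A)
loss = ((net(y) - x_true) ** 2).mean()               # 有监督地学习重建算子
loss.backward()
print(float(loss))
```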


【22】Conformal Defects in Neural Network Field Theories
标题:神经网络场论中的共形缺陷
链接:https://arxiv.org/abs/2512.07946

作者:Pietro Capuozzo,Brandon Robinson,Benjamin Suzzoni
备注:23 pages, 1 figure
摘要:神经网络场论(NN-FT)通过指定网络结构以及网络参数的先验分布,给出了一种构造任意场论(包括共形场论)的新方法。在这项工作中,我们提出了在这些NN-FT中构造共形不变缺陷的形式体系,并在两个NN标量场论玩具模型中演示了这一新形式体系。我们还为这些模型中两点关联函数里类似缺陷算符乘积展开(OPE)的展开,给出了神经网络层面的解释。
摘要:Neural Network Field Theories (NN-FTs) represent a novel construction of arbitrary field theories, including those of conformal fields, through the specification of the network architecture and prior distribution for the network parameters. In this work, we present a formalism for the construction of conformally invariant defects in these NN-FTs. We demonstrate this new formalism in two toy models of NN scalar field theories. We develop an NN interpretation of an expansion akin to the defect OPE in two-point correlation functions in these models.


其他(26篇)

【1】OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer
标题:OSMO:用于人到机器人技能迁移的开源触觉手套
链接:https://arxiv.org/abs/2512.08920

作者:Jessica Yin,Haozhi Qi,Youngsun Wi,Sayantan Kundu,Mike Lambeta,William Yang,Changhao Wang,Tingfan Wu,Jitendra Malik,Tess Hellebrekers
备注:Project website: https://jessicayin.github.io/osmo_tactile_glove/
摘要:人类视频演示为学习机器人策略提供了丰富的训练数据,但仅凭视频无法捕捉掌握操作所必需的丰富接触信号。我们介绍OSMO,一种为人到机器人技能迁移设计的开源可穿戴触觉手套。该手套在指尖和手掌上布置了12个三轴触觉传感器,并与最先进的手部跟踪方法兼容,便于野外数据收集。我们证明,仅用OSMO采集的人类演示训练、不使用任何真实机器人数据的机器人策略,能够执行一项具有挑战性的接触丰富的操作任务。通过让人和机器人佩戴同一副手套,OSMO最大限度地缩小了视觉与触觉上的具身差距,实现了连续剪切力和法向力反馈的迁移,同时避免了图像修复或其他基于视觉的力推断的需要。在需要持续接触压力的真实擦拭任务中,我们的触觉感知策略取得了72%的成功率,通过消除与接触相关的失败模式而优于仅视觉基线。我们发布完整的硬件设计、固件和组装说明,以支持社区采用。
摘要:Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introduce OSMO, an open-source wearable tactile glove designed for human-to-robot skill transfer. The glove features 12 three-axis tactile sensors across the fingertips and palm and is designed to be compatible with state-of-the-art hand-tracking methods for in-the-wild data collection. We demonstrate that a robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task. By equipping both the human and the robot with the same glove, OSMO minimizes the visual and tactile embodiment gap, enabling the transfer of continuous shear and normal force feedback while avoiding the need for image inpainting or other vision-based force inference. On a real-world wiping task requiring sustained contact pressure, our tactile-aware policy achieves a 72% success rate, outperforming vision-only baselines by eliminating contact-related failure modes. We release complete hardware designs, firmware, and assembly instructions to support community adoption.


【2】Open Polymer Challenge: Post-Competition Report
标题:公开聚合物挑战:赛后报告
链接:https://arxiv.org/abs/2512.08896

作者:Gang Liu,Sobin Alosious,Subhamoy Mahajan,Eric Inae,Yihan Zhu,Yuhan Liu,Renzheng Zhang,Jiaxin Xu,Addison Howard,Ying Li,Tengfei Luo,Meng Jiang
备注:The report for the competition: "NeurIPS - Open Polymer Prediction 2025". Kaggle Page: https://www.kaggle.com/competitions/neurips-open-polymer-prediction-2025. Website: https://open-polymer-challenge.github.io
摘要:机器学习(ML)为发现可持续聚合物材料提供了一条强大的途径,但由于缺乏大型、高质量和可公开访问的聚合物数据集,进展受到限制。开放聚合物挑战赛(OPC)通过发布第一个社区开发的聚合物信息学基准来解决这一差距,该基准具有10K聚合物和5个属性的数据集:热导率,回转半径,密度,自由体积分数和玻璃化转变温度。挑战集中在多任务聚合物性能预测上,这是材料发现虚拟筛选管道的核心步骤。参与者在现实约束下开发模型,包括小数据,标签不平衡和异构模拟源,使用基于特征的增强,迁移学习,自我监督预训练和有针对性的集成策略等技术。比赛还揭示了有关数据准备,分布变化和跨组模拟一致性的重要经验教训,为未来大规模聚合物数据集提供了最佳实践。由此产生的模型、分析和发布的数据为聚合物科学中的分子人工智能奠定了新的基础,预计将加速可持续和节能材料的开发。随着比赛,我们在https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data上发布了测试数据集。我们还在https://github.com/sobinalosious/ADEPT上发布了数据生成管道,它模拟了超过25个属性,包括热导率,回转半径和密度。
摘要:Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
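
挑战的核心任务是带有标签缺失与不均衡的多任务性质预测;下面是一个假设性的掩码多任务回归损失草图(聚合物特征化方式、网络结构与维度均为示意,并非某个参赛方案)。

```python
import torch
import torch.nn as nn

class MultiTaskRegressor(nn.Module):
    """共享主干 + 5 个性质预测头(示意)。"""
    def __init__(self, in_dim=128, num_tasks=5, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_tasks)

    def forward(self, x):
        return self.head(self.backbone(x))

def masked_mse(pred, target, mask):
    """只在有标签的位置计算 MSE,以应对标签缺失与不均衡。"""
    diff = (pred - target) ** 2 * mask
    return diff.sum() / mask.sum().clamp(min=1)

torch.manual_seed(0)
x = torch.randn(32, 128)                      # 假设的聚合物特征(例如指纹向量)
target = torch.randn(32, 5)                   # 5 个性质:热导率、回转半径等
mask = (torch.rand(32, 5) < 0.4).float()      # 每个样本只有部分性质有标签
model = MultiTaskRegressor()
loss = masked_mse(model(x), target, mask)
loss.backward()
print(float(loss))
```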


【3】DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process
标题:DAO-GP漂移感知在线非线性回归高斯过程
链接:https://arxiv.org/abs/2512.08879

作者:Mohammad Abu-Shaira,Ajita Rattani,Weishi Shi
摘要 :现实世界的数据集往往表现出时间动态的特点是不断变化的数据分布。忽视这种现象,通常被称为概念漂移,可以显着降低模型的预测准确性。此外,在线模型中超参数的存在加剧了这个问题。这些参数通常是固定的,并且不能由用户响应于演变的数据分布而动态地调整。高斯过程(GP)模型提供了强大的非参数回归功能和不确定性量化,使其成为在线环境中复杂数据关系建模的理想选择。然而,传统的在线GP方法面临着几个关键的限制,包括缺乏漂移意识,依赖于固定的超参数,易受数据窥探,缺乏原则性的衰减机制,以及内存效率低下。作为回应,我们提出了DAO-GP(漂移感知在线高斯过程),一种新颖的,完全自适应的,无超参数的,衰减的,稀疏的非线性回归模型。DAO-GP具有内置的漂移检测和自适应机制,可以根据漂移的严重程度动态调整模型行为。广泛的实证评估证实了DAO-GP在静态条件下,不同漂移类型(突然,增量,渐进)和不同数据特征的鲁棒性。分析表明,它的动态适应,有效的内存和衰减为基础的管理,和不断发展的诱导点。与最先进的参数和非参数模型相比,DAO-GP始终实现卓越或具有竞争力的性能,使其成为在线非线性回归的漂移弹性解决方案。
摘要:Real-world datasets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly diminish a model's predictive accuracy. Furthermore, the presence of hyperparameters in online models exacerbates this issue. These parameters are typically fixed and cannot be dynamically adjusted by the user in response to the evolving data distribution. Gaussian Process (GP) models offer powerful non-parametric regression capabilities with uncertainty quantification, making them ideal for modeling complex data relationships in an online setting. However, conventional online GP methods face several critical limitations, including a lack of drift-awareness, reliance on fixed hyperparameters, vulnerability to data snooping, absence of a principled decay mechanism, and memory inefficiencies. In response, we propose DAO-GP (Drift-Aware Online Gaussian Process), a novel, fully adaptive, hyperparameter-free, decayed, and sparse non-linear regression model. DAO-GP features a built-in drift detection and adaptation mechanism that dynamically adjusts model behavior based on the severity of drift. Extensive empirical evaluations confirm DAO-GP's robustness across stationary conditions, diverse drift types (abrupt, incremental, gradual), and varied data characteristics. Analyses demonstrate its dynamic adaptation, efficient in-memory and decay-based management, and evolving inducing points. Compared with state-of-the-art parametric and non-parametric models, DAO-GP consistently achieves superior or competitive performance, establishing it as a drift-resilient solution for online non-linear regression.


【4】Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data
标题:用于从时间序列多组学数据模拟代谢途径动力学的神经常微分方程
链接:https://arxiv.org/abs/2512.08732

作者:Udesh Habaraduwa,Andrei Lixandru
摘要:人类健康和生物工程的进步在很大程度上依赖于预测复杂生物系统的行为。虽然高通量多组学数据变得越来越丰富,但将这些数据转换为可操作的预测模型仍然是一个瓶颈。高容量的数据驱动模拟系统在这一领域至关重要;与受先验知识限制的经典机械模型不同,这些架构可以直接从观察数据中推断潜在的相互作用,从而模拟时间轨迹并预测个性化医疗和合成生物学中的下游干预效果。为了应对这一挑战,我们引入神经常微分方程(NODE)作为学习蛋白质组和代谢组之间复杂相互作用的动态框架。我们将这个框架应用于来自工程大肠杆菌菌株的时间序列数据,模拟代谢途径的连续动力学。与传统的机器学习管道相比,所提出的NODE架构在捕获系统动态方面表现出卓越的性能。我们的研究结果显示,在柠檬烯(高达94.38%的改善)和异戊烯醇(高达97.65%的改善)途径数据集之间,均方根误差比基线改善了90%以上。此外,NODE模型的推理时间加速了1000倍,使其成为下一代代谢工程和生物发现的可扩展、高保真工具。
摘要:The advancement of human healthspan and bioengineering relies heavily on predicting the behavior of complex biological systems. While high-throughput multiomics data is becoming increasingly abundant, converting this data into actionable predictive models remains a bottleneck. High-capacity, datadriven simulation systems are critical in this landscape; unlike classical mechanistic models restricted by prior knowledge, these architectures can infer latent interactions directly from observational data, allowing for the simulation of temporal trajectories and the anticipation of downstream intervention effects in personalized medicine and synthetic biology. To address this challenge, we introduce Neural Ordinary Differential Equations (NODEs) as a dynamic framework for learning the complex interplay between the proteome and metabolome. We applied this framework to time-series data derived from engineered Escherichia coli strains, modeling the continuous dynamics of metabolic pathways. The proposed NODE architecture demonstrates superior performance in capturing system dynamics compared to traditional machine learning pipelines. Our results show a greater than 90% improvement in root mean squared error over baselines across both Limonene (up to 94.38% improvement) and Isopentenol (up to 97.65% improvement) pathway datasets. Furthermore, the NODE models demonstrated a 1000x acceleration in inference time, establishing them as a scalable, high-fidelity tool for the next generation of metabolic engineering and biological discovery.
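
下面是一个不依赖第三方 ODE 求解库的极简 Neural ODE 草图:用 MLP 参数化状态(例如蛋白与代谢物浓度)的时间导数,并用固定步长 RK4 积分来拟合时间序列轨迹;状态维度与数据均为随机生成的假设示例,并非论文中的模型配置。

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """用 MLP 参数化状态的时间导数 dz/dt = f(z)。"""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, z):
        return self.net(z)

def rk4_integrate(func, z0, t):
    """固定步长 RK4,返回每个时间点的状态轨迹。"""
    zs, z = [z0], z0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        k1 = func(z)
        k2 = func(z + 0.5 * h * k1)
        k3 = func(z + 0.5 * h * k2)
        k4 = func(z + h * k3)
        z = z + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        zs.append(z)
    return torch.stack(zs)          # (T, batch, dim)

torch.manual_seed(0)
dim, T = 8, 10
t = torch.linspace(0.0, 1.0, T)
z0 = torch.randn(4, dim)
observed = torch.randn(T, 4, dim)   # 假设的时间序列多组学观测
func = ODEFunc(dim)
traj = rk4_integrate(func, z0, t)
loss = ((traj - observed) ** 2).mean()
loss.backward()
print(traj.shape, float(loss))
```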


【5】An Additive Manufacturing Part Qualification Framework: Transferring Knowledge of Stress-strain Behaviors from Additively Manufactured Polymers to Metals
标题:增材制造零件鉴定框架:将应力-应变行为知识从增材制造聚合物转移到金属
链接:https://arxiv.org/abs/2512.08699

作者:Chenglong Duan,Dazhong Wu
摘要:零件鉴定在增材制造(AM)中至关重要,因为它确保增材制造的零件可以在关键应用中一致地生产和可靠地使用。零件鉴定旨在验证增材制造零件是否符合性能要求;因此,预测增材制造零件的复杂应力-应变行为至关重要。我们通过将增材制造的低成本聚合物的应力-应变行为的知识转移到金属上,开发了一个用于增材制造零件鉴定的动态时间规整(DTW)-转移学习(TL)框架。具体而言,该框架采用DTW来选择聚合物数据集作为与目标金属数据集最相关的源域。使用长短期记忆(LSTM)模型,四个源聚合物(即,尼龙、PLA、CF-ABS和树脂)和三种目标金属(即,AlSi 10 Mg,Ti6 Al 4V和碳钢),通过不同的AM技术制造的DTW-TL框架被用来证明的有效性。实验结果表明,DTW-TL框架识别聚合物和金属之间的最接近的匹配,选择一个单一的聚合物数据集作为源域。当三种金属被用作目标域时,DTW-TL模型的平均绝对百分比误差最低,为12.41%,决定系数最高,为0.96,分别优于没有TL的vanilla LSTM模型以及在四个聚合物数据集上预训练的TL模型作为源域。
摘要 :Part qualification is crucial in additive manufacturing (AM) because it ensures that additively manufactured parts can be consistently produced and reliably used in critical applications. Part qualification aims at verifying that an additively manufactured part meets performance requirements; therefore, predicting the complex stress-strain behaviors of additively manufactured parts is critical. We develop a dynamic time warping (DTW)-transfer learning (TL) framework for additive manufacturing part qualification by transferring knowledge of the stress-strain behaviors of additively manufactured low-cost polymers to metals. Specifically, the framework employs DTW to select a polymer dataset as the source domain that is the most relevant to the target metal dataset. Using a long short-term memory (LSTM) model, four source polymers (i.e., Nylon, PLA, CF-ABS, and Resin) and three target metals (i.e., AlSi10Mg, Ti6Al4V, and carbon steel) that are fabricated by different AM techniques are utilized to demonstrate the effectiveness of the DTW-TL framework. Experimental results show that the DTW-TL framework identifies the closest match between polymers and metals to select one single polymer dataset as the source domain. The DTW-TL model achieves the lowest mean absolute percentage error of 12.41% and highest coefficient of determination of 0.96 when three metals are used as the target domain, respectively, outperforming the vanilla LSTM model without TL as well as the TL model pre-trained on four polymer datasets as the source domain.
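
下面用一个简化的 DTW 实现示意"用 DTW 距离从多个聚合物源域中选出与目标金属最相近的源域"这一步;应力-应变曲线以随机游走序列代替,LSTM 迁移学习部分未包含,聚合物名称仅沿用摘要中的示例。

```python
import numpy as np

def dtw_distance(a, b):
    """经典动态规划 DTW;a、b 为一维序列。"""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
# 假设的源域(四种聚合物)与目标域(某金属)的应力-应变曲线
sources = {name: np.cumsum(rng.normal(size=100))
           for name in ["Nylon", "PLA", "CF-ABS", "Resin"]}
target = np.cumsum(rng.normal(size=100))

distances = {name: dtw_distance(curve, target) for name, curve in sources.items()}
best = min(distances, key=distances.get)
print(distances, "-> selected source domain:", best)
```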


【6】An Agentic AI System for Multi-Framework Communication Coding
标题:用于多框架沟通编码的智能体AI系统
链接:https://arxiv.org/abs/2512.08659

作者:Bohao Yang,Rui Yang,Joshua M. Biro,Haoyuan Wang,Jessica L. Handley,Brianna Richardson,Sophia Bessias,Nicoleta Economou-Zavlanos,Armando D. Bedoya,Monica Agrawal,Michael M. Zavlanos,Anand Chowdhury,Raj M. Ratwani,Kai Sun,Kathryn I. Pollak,Michael J. Pencina,Chuan Hong
摘要:临床沟通对患者的治疗效果至关重要,但对医患对话进行大规模人工标注仍然是劳动密集型、不一致且难以扩展的。现有基于大型语言模型的方法通常依赖缺乏适应性、可解释性和可靠性的单任务模型,尤其是在跨多种沟通框架和临床领域应用时。在本研究中,我们开发了面向临床沟通的多框架结构化智能体AI系统(MOSAIC),它构建在基于LangGraph的架构上,协调四类核心智能体:用于码本选择和工作流规划的计划智能体、用于维护最新检索数据库的更新智能体、一组应用码本引导的检索增强生成(RAG)并带有动态少样本提示的标注智能体,以及提供一致性检查和反馈的验证智能体。为评估性能,我们将MOSAIC的输出与训练有素的人类编码员创建的金标准标注进行了比较。我们使用26份金标准标注的对话转录文本进行开发、50份转录文本进行测试,涵盖风湿病学和妇产科(OB/GYN)领域。在测试集上,MOSAIC的总体F1分数为0.928。风湿病学子集的性能最高(F1 = 0.962),其中患者行为类别表现最强(例如,患者提问、表达偏好或表现出自信)。消融实验表明,MOSAIC优于基线方法。
摘要:Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and reliability, especially when applied across various communication frameworks and clinical domains. In this study, we developed a Multi-framework Structured Agentic AI system for Clinical Communication (MOSAIC), built on a LangGraph-based architecture that orchestrates four core agents, including a Plan Agent for codebook selection and workflow planning, an Update Agent for maintaining up-to-date retrieval databases, a set of Annotation Agents that applies codebook-guided retrieval-augmented generation (RAG) with dynamic few-shot prompting, and a Verification Agent that provides consistency checks and feedback. To evaluate performance, we compared MOSAIC outputs against gold-standard annotations created by trained human coders. We developed and evaluated MOSAIC using 26 gold standard annotated transcripts for training and 50 transcripts for testing, spanning rheumatology and OB/GYN domains. On the test set, MOSAIC achieved an overall F1 score of 0.928. Performance was highest in the Rheumatology subset (F1 = 0.962) and strongest for Patient Behavior (e.g., patients asking questions, expressing preferences, or showing assertiveness). Ablations revealed that MOSAIC outperforms baseline benchmarking.


【7】Reusability in MLOps: Leveraging Ports and Adapters to Build a Microservices Architecture for the Maritime Domain
标题:MLOps中的可重用性:利用端口和适配器构建海事领域的微服务架构
链接:https://arxiv.org/abs/2512.08657

作者:Renato Cordeiro Ferreira,Aditya Dhinavahi,Rowanne Trapmann,Willem-Jan van den Heuvel
备注:7 pages, 3 figures (3 diagrams), submitted to ICSA 2026
摘要:支持ML的系统(MLES)本质上是复杂的,因为它们需要多个组件来实现其业务目标。本经验报告展示了在构建Ocean Guard(一个用于海事领域异常检测的MLES)时应用的软件架构可重用性技术,特别强调了复用"端口和适配器"模式以支持从单一代码库构建多个微服务所遇到的挑战和经验教训。希望这份经验报告能激励软件工程师、机器学习工程师和数据科学家应用六边形架构(Hexagonal Architecture)模式来构建他们自己的MLES。
摘要:ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability techniques applied while building Ocean Guard, an MLES for anomaly detection in the maritime domain. In particular, it highlights the challenges and lessons learned to reuse the Ports and Adapters pattern to support building multiple microservices from a single codebase. This experience report hopes to inspire software engineers, machine learning engineers, and data scientists to apply the Hexagonal Architecture pattern to build their MLES.
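
下面用一个假设的异常检测用例示意"端口(抽象接口)+ 适配器"如何让同一核心服务逻辑在多个微服务中复用;类名与接口均为示意,并非 Ocean Guard 的真实代码。

```python
from abc import ABC, abstractmethod
from typing import Sequence

class AnomalyDetectorPort(ABC):
    """端口:核心领域逻辑只依赖这个抽象接口。"""
    @abstractmethod
    def score(self, track: Sequence[float]) -> float: ...

class ThresholdModelAdapter(AnomalyDetectorPort):
    """适配器之一:把一个简单的阈值模型接到端口上;换成 ML 模型只需另写一个适配器。"""
    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold

    def score(self, track: Sequence[float]) -> float:
        mean = sum(track) / len(track)
        return max(abs(x - mean) for x in track) / self.threshold

class AnomalyService:
    """核心服务:与具体模型实现解耦,不同微服务注入不同适配器即可复用同一代码库。"""
    def __init__(self, detector: AnomalyDetectorPort):
        self.detector = detector

    def handle(self, track: Sequence[float]) -> dict:
        s = self.detector.score(track)
        return {"score": s, "is_anomaly": s > 1.0}

service = AnomalyService(ThresholdModelAdapter())
print(service.handle([0.1, 0.2, 5.0, 0.1]))
```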


【8】Fully Decentralized Certified Unlearning
标题:完全去中心化的认证遗忘
链接:https://arxiv.org/abs/2512.08443

作者:Hithem Lamri,Michail Maniatakos
摘要:机器遗忘(MU)旨在响应隐私请求或数据投毒时,从已训练模型中消除指定数据的影响。虽然认证遗忘已经在集中式以及由服务器编排的联邦设置中得到分析(通过类似差分隐私DP的保证),但去中心化设置(对等节点在没有协调者的情况下通信)仍然缺乏研究。我们研究固定拓扑去中心化网络中的认证遗忘,并提出RR-DU:一个随机游走过程,在发起遗忘的客户端对遗忘集执行一步投影梯度上升,在其他客户端对保留数据执行几何分布步数的投影下降,并结合子采样高斯噪声以及向原始模型周围信赖域的投影。我们给出:(i)凸情形下的收敛性保证与非凸情形下的平稳性保证;(ii)基于分段级子采样的子采样高斯Rényi DP(RDP)、针对客户端视图的$(\varepsilon,δ)$网络遗忘证书;(iii)删除容量界,该界与遗忘集占本地数据的比例成比例,并量化了去中心化(网络混合与随机子采样)对隐私-效用权衡的影响。实验上,在图像基准(MNIST、CIFAR-10)上,RR-DU在满足给定$(\varepsilon,δ)$的同时取得比去中心化DP基线更高的测试精度,并把遗忘集上的准确率降到随机猜测水平($\approx 10\%$)。
摘要 :Machine unlearning (MU) seeks to remove the influence of specified data from a trained model in response to privacy requests or data poisoning. While certified unlearning has been analyzed in centralized and server-orchestrated federated settings (via guarantees analogous to differential privacy, DP), the decentralized setting -- where peers communicate without a coordinator remains underexplored. We study certified unlearning in decentralized networks with fixed topologies and propose RR-DU, a random-walk procedure that performs one projected gradient ascent step on the forget set at the unlearning client and a geometrically distributed number of projected descent steps on the retained data elsewhere, combined with subsampled Gaussian noise and projection onto a trust region around the original model. We provide (i) convergence guarantees in the convex case and stationarity guarantees in the nonconvex case, (ii) $(\varepsilon,δ)$ network-unlearning certificates on client views via subsampled Gaussian $Rényi$ DP (RDP) with segment-level subsampling, and (iii) deletion-capacity bounds that scale with the forget-to-local data ratio and quantify the effect of decentralization (network mixing and randomized subsampling) on the privacy--utility trade-off. Empirically, on image benchmarks (MNIST, CIFAR-10), RR-DU matches a given $(\varepsilon,δ)$ while achieving higher test accuracy than decentralized DP baselines and reducing forget accuracy to random guessing ($\approx 10\%$).
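
下面是按摘要描述整理的 RR-DU 算法骨架的数值示意(逻辑回归 + 环形拓扑):在发起遗忘的客户端做一步投影梯度上升,再沿随机游走做几何分布步数的带噪投影下降,最后投影回原模型附近的信赖域;步长、噪声规模、拓扑与数据均为假设,不含隐私会计部分。

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_logreg(w, X, y):
    """逻辑回归负对数似然的梯度。"""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def project_ball(w, center, radius):
    """投影到以原模型为中心的信赖域(L2 球)。"""
    d = w - center
    norm = np.linalg.norm(d)
    return center + d * min(1.0, radius / norm) if norm > 0 else w

# 假设 4 个客户端,客户端 0 发起遗忘请求
d = 10
clients = [(rng.normal(size=(50, d)), rng.integers(0, 2, 50).astype(float)) for _ in range(4)]
forget_X, forget_y = clients[0][0][:10], clients[0][1][:10]
w0 = rng.normal(size=d) * 0.1              # 原始(已训练)模型,这里用随机向量代替
w, lr, sigma, radius = w0.copy(), 0.1, 0.01, 1.0

# 1) 在遗忘客户端上做一步梯度上升,削弱遗忘集的影响
w = project_ball(w + lr * grad_logreg(w, forget_X, forget_y), w0, radius)
# 2) 随机游走:几何分布步数的下降 + 子采样高斯噪声
steps = rng.geometric(p=0.3)
node = 0
for _ in range(steps):
    node = (node + 1) % len(clients)                   # 沿环形拓扑走到下一个客户端
    Xb, yb = clients[node]
    idx = rng.choice(len(yb), size=16, replace=False)  # 子采样一个 minibatch
    g = grad_logreg(w, Xb[idx], yb[idx]) + rng.normal(scale=sigma, size=d)
    w = project_ball(w - lr * g, w0, radius)

print("descent steps:", steps, "||w - w0|| =", float(np.linalg.norm(w - w0)))
```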


【9】Beyond Wave Variables: A Data-Driven Ensemble Approach for Enhanced Teleoperation Transparency and Stability
标题:超越波浪变量:增强远程操作透明度和稳定性的数据驱动集成方法
链接:https://arxiv.org/abs/2512.08436

作者:Nour Mitiche,Farid Ferguene,Mourad Oussalah
备注:14 pages, 8 figures, 5 tables
摘要:通信信道中的时间延迟对双边遥操作系统提出了重大挑战,影响系统的透明度和稳定性。虽然传统的基于波变量的四通道架构方法通过无源性来确保稳定性,但它们仍然容易受到波反射和干扰(如可变延迟和环境噪声)的影响。本文提出了一个数据驱动的混合框架,用三个高级序列模型的集成取代传统的波变量变换,每个模型都通过最先进的Optuna优化器单独调优,并通过堆叠元学习器组合。基本预测器包括:一个使用Prophet进行趋势校正的LSTM、一个与聚类和随机森林配对以改进回归的基于LSTM的特征提取器,以及一个刻画局部和长期动态的CNN-LSTM模型。实验验证在Python中进行,数据由MATLAB/Simulink中实现的基线系统生成。结果表明,我们优化后的集成模型在不同延迟和噪声条件下实现了与基线波变量系统相当的透明度,同时通过无源性约束确保了稳定性。
摘要:Time delays in communication channels present significant challenges for bilateral teleoperation systems, affecting both transparency and stability. Although traditional wave variable-based methods for a four-channel architecture ensure stability via passivity, they remain vulnerable to wave reflections and disturbances like variable delays and environmental noise. This article presents a data-driven hybrid framework that replaces the conventional wave-variable transform with an ensemble of three advanced sequence models, each optimized separately via the state-of-the-art Optuna optimizer, and combined through a stacking meta-learner. The base predictors include an LSTM augmented with Prophet for trend correction, an LSTM-based feature extractor paired with clustering and a random forest for improved regression, and a CNN-LSTM model for localized and long-term dynamics. Experimental validation was performed in Python using data generated from the baseline system implemented in MATLAB/Simulink. The results show that our optimized ensemble achieves a transparency comparable to the baseline wave-variable system under varying delays and noise, while ensuring stability through passivity constraints.


【10】Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging
标题:Magneton:通过差分能量调试优化ML系统的能源效率
链接:https://arxiv.org/abs/2512.08365

作者:Yi Pan,Wenbo Qian,Dedong Xie,Ruiyan Hu,Yigong Hu,Baris Kasikci
备注:12 pages, 10 figures
摘要:机器学习(ML)模型的训练和部署已经变得非常耗能。虽然现有的优化工作主要集中在硬件能源效率上,但一个重要但被忽视的低效率来源是由糟糕的软件设计引起的软件能源浪费。这通常包括冗余或设计不佳的操作,这些操作消耗更多的能源,而不会提高性能。这些低效率出现在广泛使用的ML框架和应用程序中,但开发人员通常缺乏检测和诊断它们的可见性和工具。   我们提出了差分能量调试,这是一种新的方法,它利用了竞争ML系统通常以截然不同的能耗实现类似功能的观察结果。基于这一见解,我们设计并实现了Magneton,这是一种能量分析器,可以在操作员级别比较类似ML系统之间的能耗,并自动确定导致过度能耗的代码区域和配置选择。应用于9个流行的ML系统,涵盖LLM推理,通用ML框架和图像生成,Magneton检测和诊断了16个已知的软件能源效率低下的案例,并进一步发现了8个以前未知的案例,其中7个已经被开发人员证实。
摘要:The training and deployment of machine learning (ML) models have become extremely energy-intensive. While existing optimization efforts focus primarily on hardware energy efficiency, a significant but overlooked source of inefficiency is software energy waste caused by poor software design. This often includes redundant or poorly designed operations that consume more energy without improving performance. These inefficiencies arise in widely used ML frameworks and applications, yet developers often lack the visibility and tools to detect and diagnose them.   We propose differential energy debugging, a novel approach that leverages the observation that competing ML systems often implement similar functionality with vastly different energy consumption. Building on this insight, we design and implement Magneton, an energy profiler that compares energy consumption between similar ML systems at the operator level and automatically pinpoints code regions and configuration choices responsible for excessive energy use. Applied to 9 popular ML systems spanning LLM inference, general ML frameworks, and image generation, Magneton detects and diagnoses 16 known cases of software energy inefficiency and further discovers 8 previously unknown cases, 7 of which have been confirmed by developers.


【11】Low Rank Support Quaternion Matrix Machine
标题:低秩支持四元数矩阵机
链接:https://arxiv.org/abs/2512.08327

作者:Wang Chen,Ziyan Luo,Shuangyue Wang
摘要 :对于彩色图像分类,输入特征通常表示为实域中的向量、矩阵或三阶张量。受彩色图像四元数数据建模在图像恢复和去噪任务中的成功启发,我们提出了一种新的彩色图像分类方法,命名为低秩支持四元数矩阵机(LSQMM),其中RGB通道被视为纯四元数,以有效地保留通道之间的内在耦合关系,通过四元数代数。为了促进低秩结构产生的强相关的颜色通道,四元数核范数正则化项,作为一个自然的扩展传统的矩阵核范数的四元数域,被添加到铰链损失在我们的LSQMM模型。设计了一种基于交替方向乘子法(ADMM)的迭代算法,有效地解决了所提出的四元数优化模型。多个彩色图像分类数据集上的实验结果表明,我们提出的分类方法具有优势,在分类精度,鲁棒性和计算效率,相比,几个国家的最先进的方法,使用支持向量机,支持矩阵机,支持张量机。
摘要:Input features are conventionally represented as vectors, matrices, or third order tensors in the real field, for color image classification. Inspired by the success of quaternion data modeling for color images in image recovery and denoising tasks, we propose a novel classification method for color image classification, named as the Low-rank Support Quaternion Matrix Machine (LSQMM), in which the RGB channels are treated as pure quaternions to effectively preserve the intrinsic coupling relationships among channels via the quaternion algebra. For the purpose of promoting low-rank structures resulting from strongly correlated color channels, a quaternion nuclear norm regularization term, serving as a natural extension of the conventional matrix nuclear norm to the quaternion domain, is added to the hinge loss in our LSQMM model. An Alternating Direction Method of Multipliers (ADMM)-based iterative algorithm is designed to effectively resolve the proposed quaternion optimization model. Experimental results on multiple color image classification datasets demonstrate that our proposed classification approach exhibits advantages in classification accuracy, robustness and computational efficiency, compared to several state-of-the-art methods using support vector machines, support matrix machines, and support tensor machines.


【12】Jacobian Aligned Random Forests
标题:雅可比对齐随机森林
链接:https://arxiv.org/abs/2512.08306

作者:Sarwesh Rauniyar
摘要:轴对齐的决策树是快速和稳定的,但在具有旋转或交互依赖的决策边界的数据集上挣扎,其中信息分割需要特征的线性组合,而不是单个特征阈值。倾斜森林通过每个节点的超平面分割来解决这个问题,但增加了计算成本和实现复杂性。我们提出了一个简单的替代方案:JARF,雅可比对齐的随机森林。具体地说,我们首先拟合一个轴对齐的森林来估计类概率或回归输出,计算这些预测相对于每个特征的有限差分梯度,将它们聚合成一个预期的雅可比外积,该外积概括了预期梯度外积(EGOP),并将其用作所有输入的单个全局线性预处理器。这种有监督的预处理器应用特征空间的单个全局旋转,然后将转换后的数据传递回标准的轴对齐森林,保留现成的训练管道,同时捕获倾斜边界和特征交互,否则需要许多轴对齐的分裂来近似。同样的构造适用于任何提供梯度的模型,尽管我们在这项工作中关注随机森林和梯度提升树。在表格分类和回归基准测试中,这种预处理始终改善了轴对齐的森林,并且通常匹配或超过倾斜基线,同时改善了训练时间。我们的实验结果和理论分析一起表明,监督预处理可以恢复斜森林的准确性,同时保持简单性和鲁棒性的轴对齐树。
摘要:Axis-aligned decision trees are fast and stable but struggle on datasets with rotated or interaction-dependent decision boundaries, where informative splits require linear combinations of features rather than single-feature thresholds. Oblique forests address this with per-node hyperplane splits, but at added computational cost and implementation complexity. We propose a simple alternative: JARF, Jacobian-Aligned Random Forests. Concretely, we first fit an axis-aligned forest to estimate class probabilities or regression outputs, compute finite-difference gradients of these predictions with respect to each feature, aggregate them into an expected Jacobian outer product that generalizes the expected gradient outer product (EGOP), and use it as a single global linear preconditioner for all inputs. This supervised preconditioner applies a single global rotation of the feature space, then hands the transformed data back to a standard axis-aligned forest, preserving off-the-shelf training pipelines while capturing oblique boundaries and feature interactions that would otherwise require many axis-aligned splits to approximate. The same construction applies to any model that provides gradients, though we focus on random forests and gradient-boosted trees in this work. On tabular classification and regression benchmarks, this preconditioning consistently improves axis-aligned forests and often matches or surpasses oblique baselines while improving training time. Our experimental results and theoretical analysis together indicate that supervised preconditioning can recover much of the accuracy of oblique forests while retaining the simplicity and robustness of axis-aligned trees.
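
下面给出 JARF 思路的一个数值草图:先拟合轴对齐随机森林,用有限差分估计预测函数对各特征的梯度,聚合成期望雅可比外积,再用其特征分解构造全局线性预处理并重新拟合;扰动步长、采样数量与预处理的具体形式均为假设,并非论文实现。

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 6
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1]) * (X[:, 2] - X[:, 3]) + 0.1 * rng.normal(size=n)  # 旋转/交互依赖的目标
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

# 有限差分估计预测函数的梯度,并聚合为期望雅可比外积(EGOP 的推广)
eps = 0.1
sample = Xtr[rng.choice(len(Xtr), size=300, replace=False)]
grads = np.zeros((len(sample), d))
for j in range(d):
    Xp, Xm = sample.copy(), sample.copy()
    Xp[:, j] += eps
    Xm[:, j] -= eps
    grads[:, j] = (base.predict(Xp) - base.predict(Xm)) / (2 * eps)
G = grads.T @ grads / len(sample)

# 用外积矩阵的特征分解构造一个全局线性预处理(旋转 + 按重要性缩放)
vals, vecs = np.linalg.eigh(G)
P = vecs * np.sqrt(np.maximum(vals, 1e-8))      # 每列按对应特征值缩放
jarf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr @ P, ytr)

print("baseline R2:", round(base.score(Xte, yte), 3),
      "preconditioned R2:", round(jarf.score(Xte @ P, yte), 3))
```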


【13】Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework
标题:利用SolidGPT支持智能应用程序开发:边缘云混合AI代理框架
链接:https://arxiv.org/abs/2512.08286

作者:Liao Hu,Qiteng Wu,Ruoyu Qi
摘要:将大型语言模型(LLM)集成到移动和软件开发工作流中面临着三个需求之间的持续紧张关系:语义感知,开发人员生产力和数据隐私。传统的基于云的工具提供强大的推理,但存在数据暴露和延迟的风险,而设备上的解决方案缺乏对代码库和开发人员工具的全面理解。我们介绍SolidGPT,这是一个基于GitHub构建的开源边缘云混合开发人员助手,旨在增强代码和工作空间语义搜索。SolidGPT使开发人员能够:与您的代码库对话:交互式查询代码和项目结构,无需手动搜索即可发现正确的方法和模块。自动化软件项目工作流程:通过VSCode和Notion进行深度集成,生成PRD、任务分解、看板,甚至搭建Web应用程序的开始。配置私有、可扩展的代理:板载私有代码文件夹(最多约500个文件),连接Notion,通过嵌入和上下文培训自定义AI代理角色,并通过Docker、CLI或VSCode扩展进行部署。在实践中,SolidGPT通过以下方式提高了开发人员的生产力:语义丰富的代码导航:不再需要在文件中搜索或想知道功能在哪里。集成的文档和任务管理:将生成的PRD内容和任务板无缝同步到开发人员工作流程中。隐私优先设计:通过Docker或VSCode在本地运行,完全控制代码和数据,同时根据需要可选地接触LLM API。通过结合交互式代码查询、自动化项目脚手架和人工智能协作,SolidGPT提供了一个实用的、尊重隐私的边缘助手,可以加速现实世界的开发工作流,是智能移动和软件工程环境的理想选择。
摘要 :The integration of Large Language Models (LLMs) into mobile and software development workflows faces a persistent tension among three demands: semantic awareness, developer productivity, and data privacy. Traditional cloud-based tools offer strong reasoning but risk data exposure and latency, while on-device solutions lack full-context understanding across codebase and developer tooling. We introduce SolidGPT, an open-source, edge-cloud hybrid developer assistant built on GitHub, designed to enhance code and workspace semantic search. SolidGPT enables developers to: talk to your codebase: interactively query code and project structure, discovering the right methods and modules without manual searching. Automate software project workflows: generate PRDs, task breakdowns, Kanban boards, and even scaffold web app beginnings, with deep integration via VSCode and Notion. Configure private, extensible agents: onboard private code folders (up to approximately 500 files), connect Notion, customize AI agent personas via embedding and in-context training, and deploy via Docker, CLI, or VSCode extension. In practice, SolidGPT empowers developer productivity through: Semantic-rich code navigation: no more hunting through files or wondering where a feature lives. Integrated documentation and task management: seamlessly sync generated PRD content and task boards into developer workflows. Privacy-first design: running locally via Docker or VSCode, with full control over code and data, while optionally reaching out to LLM APIs as needed. By combining interactive code querying, automated project scaffolding, and human-AI collaboration, SolidGPT provides a practical, privacy-respecting edge assistant that accelerates real-world development workflows, ideal for intelligent mobile and software engineering contexts.


【14】SPROCKET: Extending ROCKET to Distance-Based Time-Series Transformations With Prototypes
标题:SPROCKET:通过原型将ROCKET扩展到基于距离的时间序列转换
链接:https://arxiv.org/abs/2512.08246

作者:Nicholas Harner
备注:63 Pages, 28 in main body with 3 appendices for supplemental figures
摘要:经典的时间序列分类算法以特征工程策略为主导。其中最突出的变换之一是ROCKET,它通过随机卷积核特征取得了强劲的性能。我们提出SPROCKET(选择原型随机卷积核变换),它实现了一种新的基于原型的特征工程策略。在UCR和UEA时间序列分类档案的大多数数据集上,SPROCKET取得了与现有卷积类算法相当的性能;新的MR-HY-SP(MultiROCKET-HYDRA-SPROCKET)集成的平均准确率排名超过了此前最好的卷积集成HYDRA-MR。这些实验结果表明,基于原型的特征变换可以提高时间序列分类的准确性和鲁棒性。
摘要:Classical Time Series Classification algorithms are dominated by feature engineering strategies. One of the most prominent of these transforms is ROCKET, which achieves strong performance through random kernel features. We introduce SPROCKET (Selected Prototype Random Convolutional Kernel Transform), which implements a new feature engineering strategy based on prototypes. On a majority of the UCR and UEA Time Series Classification archives, SPROCKET achieves performance comparable to existing convolutional algorithms and the new MR-HY-SP ( MultiROCKET-HYDRA-SPROCKET) ensemble's average accuracy ranking exceeds HYDRA-MR, the previous best convolutional ensemble's performance. These experimental results demonstrate that prototype-based feature transformation can enhance both accuracy and robustness in time series classification.
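
下面是"基于原型的特征变换"思路的极简示意:用 k-means 中心充当原型,以每条序列到各原型的距离作为特征,再接 ROCKET 系列常用的岭回归分类头;原型的选择方式与数据均为假设,并非 SPROCKET 的实际算法。

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
# 构造两类玩具时间序列:不同频率的正弦 + 噪声
t = np.linspace(0, 1, 64)
X0 = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.normal(size=(100, 64))
X1 = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.normal(size=(100, 64))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# 1) 选取原型:这里简单地用 k-means 中心代替“选择的原型”
protos = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X).cluster_centers_

# 2) 以到各原型的距离作为特征变换
feats = np.stack([np.linalg.norm(X - p, axis=1) for p in protos], axis=1)

# 3) ROCKET 风格的线性分类头
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(feats, y)
print("train acc:", clf.score(feats, y))
```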


【15】Correction of Decoupled Weight Decay
标题:解耦权重衰减的修正
链接:https://arxiv.org/abs/2512.08217

作者:Jason Chuan-Chih Chou
摘要:解耦权重衰减是AdamW优于Adam的唯一原因,但长期以来人们不加质疑地将其设为与学习率$γ$成正比。最近有研究者基于稳态下的正交性论证挑战了这一假设,认为解耦权重衰减应当设为$\propto γ^2$。与此相反,我们发现消除更新中垂直分量对权重范数的贡献,对训练动态几乎没有影响。我们转而从一个简单假设出发推导出:只要稳态下更新与权重无关(无论优化器的性质如何),解耦权重衰减$\propto γ^2$就会带来稳定的权重范数。基于同一假设,我们推导并实证验证了Scion优化器下单个minibatch的总更新贡献(TUC)可以由依赖动量的有效学习率更好地刻画,且其最优值可以迁移;我们进一步表明,解耦权重衰减$\propto γ^2$能带来稳定的权重与梯度范数,使我们能更好地控制训练动态并提升模型性能。
摘要:Decoupled weight decay, solely responsible for the performance advantage of AdamW over Adam, has long been set to proportional to learning rate $γ$ without questioning. Some researchers have recently challenged such assumption and argued that decoupled weight decay should be set $\propto γ^2$ instead based on orthogonality arguments at steady state. To the contrary, we find that eliminating the contribution of the perpendicular component of the update to the weight norm leads to little change to the training dynamics. Instead, we derive that decoupled weight decay $\propto γ^2$ results in stable weight norm based on the simple assumption that updates become independent of the weights at steady state, regardless of the nature of the optimizer. Based on the same assumption, we derive and empirically verify that the Total Update Contribution (TUC) of a minibatch under the Scion optimizer is better characterized by the momentum-dependent effective learning rate whose optimal value transfers and we show that decoupled weight decay $\propto γ^2$ leads to stable weight and gradient norms and allows us to better control the training dynamics and improve the model performance.
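
下面用一个小训练循环示意如何在 AdamW 中实现"解耦权重衰减 ∝ γ²":由于 PyTorch 的 AdamW 每步的衰减量为 lr * weight_decay * w,把 weight_decay 设为 c * lr 即等价于衰减 ∝ lr²;系数 c、学习率与任务均为示意性假设。

```python
import torch
import torch.nn as nn

def make_optimizer(model, lr, c=0.1, square_rule=True):
    """square_rule=True: 每步衰减 ∝ lr^2(weight_decay = c * lr);
    False: 传统设置,每步衰减 ∝ lr(weight_decay = c)。"""
    wd = c * lr if square_rule else c
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)

torch.manual_seed(0)
X = torch.randn(256, 20)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(256, 1)

for square_rule in (False, True):
    model = nn.Linear(20, 1)
    opt = make_optimizer(model, lr=1e-2, square_rule=square_rule)
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(X) - y) ** 2).mean()
        loss.backward()
        opt.step()
    print(f"square_rule={square_rule}: final loss={loss.item():.4f}, "
          f"||W||={model.weight.norm().item():.3f}")
```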


【16】Robust Agents in Open-Ended Worlds
标题:开放式世界中的鲁棒智能体
链接:https://arxiv.org/abs/2512.08139

作者:Mikayel Samvelyan
备注:PhD Thesis
摘要:人工智能(AI)在各类应用中的日益普及,凸显了对能够在不断变化的开放式世界中成功导航并适应的智能体的需求。一个关键挑战是确保这些AI智能体具有鲁棒性:不仅在训练期间见过的熟悉环境中表现出色,还能有效泛化到此前未见的各种场景。在本论文中,我们利用开放式学习与多智能体学习的方法,训练并评估能够泛化到新环境、分布外输入以及与其他协作智能体交互的鲁棒AI智能体。我们首先介绍MiniHack,这是一个通过程序化内容生成创建多样环境的沙盒框架;它基于游戏NetHack,能够为强化学习(RL)智能体构建以泛化为核心的新任务。随后我们提出Maestro,一种生成对抗性课程的新方法,可在双人零和博弈中逐步增强RL智能体的鲁棒性与通用性。我们进一步在多智能体领域考察鲁棒性,利用质量-多样性方法,系统地识别复杂的足球视频游戏领域(其特点是合作与竞争动态交织)中最先进的预训练RL策略的漏洞。最后,我们将对鲁棒性的探索扩展到LLM领域:重点在于诊断并增强LLM对对抗性提示的鲁棒性,采用进化搜索生成多样而有效的输入,以诱导LLM产生不期望的输出。这些工作共同为人工智能鲁棒性的未来发展铺平了道路,使智能体不仅能适应不断演化的世界,还能在面对不可预见的挑战和交互时表现出色。
摘要 :The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world. A key challenge is ensuring these AI agents are robust, excelling not only in familiar settings observed during training but also effectively generalising to previously unseen and varied scenarios. In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents capable of generalising to novel environments, out-of-distribution inputs, and interactions with other co-player agents. We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation. Based on the game of NetHack, MiniHack enables the construction of new tasks for reinforcement learning (RL) agents with a focus on generalisation. We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games. We further probe robustness in multi-agent domains, utilising quality-diversity methods to systematically identify vulnerabilities in state-of-the-art, pre-trained RL policies within the complex video game football domain, characterised by intertwined cooperative and competitive dynamics. Finally, we extend our exploration of robustness to the domain of LLMs. Here, our focus is on diagnosing and enhancing the robustness of LLMs against adversarial prompts, employing evolutionary search to generate a diverse range of effective inputs that aim to elicit undesirable outputs from an LLM. This work collectively paves the way for future advancements in AI robustness, enabling the development of agents that not only adapt to an ever-evolving world but also thrive in the face of unforeseen challenges and interactions.


【17】Robust equilibria in continuous games: From strategic to dynamic robustness
标题:连续博弈中的鲁棒均衡:从战略鲁棒到动态鲁棒
链接:https://arxiv.org/abs/2512.08138

作者:Kyriakos Lotidis,Panayotis Mertikopoulos,Nicholas Bambos,Jose Blanchet
备注:33 pages, 5 figures
摘要:本文研究连续博弈中纳什均衡在策略不确定性与动态不确定性下的鲁棒性。首先,针对前者,我们引入鲁棒均衡的概念:对博弈收益结构施加小的(但任意形式的)扰动后仍保持不变的均衡,并给出其清晰的几何刻画。随后,我们转向动态鲁棒性问题,考察在随机性与不确定性存在时,哪些均衡会成为"跟随正则化领导者"(FTRL)动态的稳定极限点。尽管两者来源截然不同,我们建立了这两种鲁棒性概念之间的结构性对应:策略鲁棒性蕴含动态鲁棒性;反之,若要保持动态鲁棒性,策略鲁棒性的要求不能放松。最后,我们研究了收敛到鲁棒均衡的速度如何依赖于所用的正则化器,并证明在动作空间受仿射约束的博弈中,熵正则化学习以几何速率收敛。
摘要:In this paper, we examine the robustness of Nash equilibria in continuous games, under both strategic and dynamic uncertainty. Starting with the former, we introduce the notion of a robust equilibrium as those equilibria that remain invariant to small -- but otherwise arbitrary -- perturbations to the game's payoff structure, and we provide a crisp geometric characterization thereof. Subsequently, we turn to the question of dynamic robustness, and we examine which equilibria may arise as stable limit points of the dynamics of "follow the regularized leader" (FTRL) in the presence of randomness and uncertainty. Despite their very distinct origins, we establish a structural correspondence between these two notions of robustness: strategic robustness implies dynamic robustness, and, conversely, the requirement of strategic robustness cannot be relaxed if dynamic robustness is to be maintained. Finally, we examine the rate of convergence to robust equilibria as a function of the underlying regularizer, and we show that entropically regularized learning converges at a geometric rate in games with affinely constrained action spaces.
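
下面用一个随机双人零和博弈示意摘要中讨论的熵正则化学习(即指数权重/乘性权重形式的 FTRL):平均策略收敛到近似纳什均衡;收益矩阵、步长与迭代次数均为假设,仅展示动态本身,不涉及论文中的扰动分析。

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))              # 行玩家的收益矩阵(列玩家收益为 -A)

def ftrl_entropic(A, steps=5000, eta=0.1):
    """熵正则化 FTRL:对累计收益做 softmax,即指数权重/乘性权重算法。"""
    n, m = A.shape
    Gx, Gy = np.zeros(n), np.zeros(m)
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for t in range(1, steps + 1):
        gx, gy = eta * Gx, eta * Gy
        x = np.exp(gx - gx.max()); x /= x.sum()   # softmax,减最大值保证数值稳定
        y = np.exp(gy - gy.max()); y /= y.sum()
        Gx += A @ y                               # 行玩家(极大化)的累计收益
        Gy -= A.T @ x                             # 列玩家(极小化)的累计收益
        avg_x += (x - avg_x) / t                  # 平均策略收敛到近似纳什均衡
        avg_y += (y - avg_y) / t
    return avg_x, avg_y

x_bar, y_bar = ftrl_entropic(A)
value = float(x_bar @ A @ y_bar)
row_best = float((A @ y_bar).max())               # 行玩家对 y_bar 的最佳回应收益
print("value ~", round(value, 3), " row exploitability ~", round(row_best - value, 3))
```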


【18】Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization
标题:通过类子空间正交化提高后门检测器的灵敏度
链接:https://arxiv.org/abs/2512.08129

作者:Guangmingmei Yang,David J. Miller,George Kesidis
摘要:大多数训练后的后门检测方法都依赖这样一个事实:相对于非目标类,被攻击模型在攻击的目标类上会表现出极端离群的检测统计量。然而,这类方法可能失效:(1)当某些(非目标)类本身就很容易与所有其他类区分时,它们可能天然地达到极端的检测统计量(例如决策置信度);(2)当后门很微妙,即其特征相对于类内在的判别特征较弱时。一个关键观察是:后门目标类的检测统计量同时来自后门触发器和其内在特征的贡献,而非目标类的检测统计量只来自其内在特征。因此,为了获得更灵敏的检测器,我们提出在针对给定类优化检测统计量的同时抑制其内在特征。对于非目标类,这种抑制会大幅降低可达到的统计量;而对于目标类,来自后门触发器的(显著)贡献仍然保留。在实践中,我们构造了一个约束优化问题:利用给定类的少量干净样本,在与该类内在特征正交化的同时优化检测统计量。我们将这种即插即用的方法称为类子空间正交化(CSO),并在具有挑战性的混合标签攻击和自适应攻击下对其进行了评估。
摘要:Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. However, these approaches may fail: (1) when some (non-target) classes are easily discriminable from all others, in which case they may naturally achieve extreme detection statistics (e.g., decision confidence); and (2) when the backdoor is subtle, i.e., with its features weak relative to intrinsic class-discriminative features. A key observation is that the backdoor target class has contributions to its detection statistic from both the backdoor trigger and from its intrinsic features, whereas non-target classes only have contributions from their intrinsic features. To achieve more sensitive detectors, we thus propose to suppress intrinsic features while optimizing the detection statistic for a given class. For non-target classes, such suppression will drastically reduce the achievable statistic, whereas for the target class the (significant) contribution from the backdoor trigger remains. In practice, we formulate a constrained optimization problem, leveraging a small set of clean examples from a given class, and optimizing the detection statistic while orthogonalizing with respect to the class's intrinsic features. We dub this plug-and-play approach Class Subspace Orthogonalization (CSO) and assess it against challenging mixed-label and adaptive attacks.


【19】LUNA: Linear Universal Neural Attention with Generalization Guarantees
标题:LUNA:具有概括保证的线性通用神经注意力
链接:https://arxiv.org/abs/2512.08061

作者:Ashkan Shahbazi,Ping He,Ali Abbasi,Yikun Bai,Xinran Liu,Elaheh Akbari,Darian Salehi,Navid NaderiAlizadeh,Soheil Kolouri
摘要:缩放注意力面临着一个关键的瓶颈:softmax注意力的$\mathcal{O}(n^2)$二次计算成本,这限制了它在长序列领域的应用。虽然线性注意力机制将此成本降低到$\mathcal{O}(n)$,但它们通常依赖于固定的随机特征映射,例如随机傅立叶特征或手工函数。这种对静态的、数据不可知的内核的依赖造成了一种基本的权衡,迫使从业者为了计算效率而牺牲了显著的模型准确性。我们引入了\textsc{LUNA},一个内核化的线性注意力机制,消除了这种权衡,保留线性成本,同时匹配和超越二次注意力的准确性。\textsc{LUNA}构建于核心特征映射本身应该被学习而不是先验固定的关键洞察之上。通过对内核进行参数化,\textsc{LUNA}可以学习为特定数据和任务量身定制的特征基础,从而克服固定特征方法的表达限制。\textsc{Luna}使用一个可学习的特征映射来实现这一点,该特征映射诱导一个正定内核并允许一个流形式,从而在序列长度中产生线性时间和内存缩放。经验评估验证了我们的方法在不同的设置。在长距离竞技场(LRA)上,\textsc{Luna}在计算奇偶校验下,使用相同的参数计数、训练步骤和近似FLOP,在高效的Transformers中实现了最先进的平均精度。\textsc{Luna}在事后转换方面也表现出色:在微调的BERT和ViT-B/16检查点中替换softmax,并进行短暂的微调,恢复了大部分原始性能,大大优于固定线性化。
摘要:Scaling attention faces a critical bottleneck: the $\mathcal{O}(n^2)$ quadratic computational cost of softmax attention, which limits its application in long-sequence domains. While linear attention mechanisms reduce this cost to $\mathcal{O}(n)$, they typically rely on fixed random feature maps, such as random Fourier features or hand-crafted functions. This reliance on static, data-agnostic kernels creates a fundamental trade-off, forcing practitioners to sacrifice significant model accuracy for computational efficiency. We introduce \textsc{LUNA}, a kernelized linear attention mechanism that eliminates this trade-off, retaining linear cost while matching and surpassing the accuracy of quadratic attention. \textsc{LUNA} is built on the key insight that the kernel feature map itself should be learned rather than fixed a priori. By parameterizing the kernel, \textsc{LUNA} learns a feature basis tailored to the specific data and task, overcoming the expressive limitations of fixed-feature methods. \textsc{Luna} implements this with a learnable feature map that induces a positive-definite kernel and admits a streaming form, yielding linear time and memory scaling in the sequence length. Empirical evaluations validate our approach across diverse settings. On the Long Range Arena (LRA), \textsc{Luna} achieves state-of-the-art average accuracy among efficient Transformers under compute parity, using the same parameter count, training steps, and approximate FLOPs. \textsc{Luna} also excels at post-hoc conversion: replacing softmax in fine-tuned BERT and ViT-B/16 checkpoints and briefly fine-tuning recovers most of the original performance, substantially outperforming fixed linearizations.
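
下面是"可学习特征映射 + 线性注意力"的极简 PyTorch 草图:用小型 MLP 接 Softplus 作为非负特征映射 φ(从而诱导一个正定核),并按 φ(Q)[φ(K)ᵀV] 的顺序计算以获得对序列长度线性的复杂度;具体的 φ 结构与维度只是假设,并非 LUNA 的官方实现。

```python
import torch
import torch.nn as nn

class LearnableFeatureLinearAttention(nn.Module):
    """线性注意力:Attn(Q,K,V) ≈ φ(Q) [φ(K)^T V] / (φ(Q) φ(K)^T 1)。"""
    def __init__(self, dim, feat_dim=64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # 可学习特征映射 φ:Softplus 输出非负,诱导一个正定核
        self.phi = nn.Sequential(nn.Linear(dim, feat_dim), nn.GELU(),
                                 nn.Linear(feat_dim, feat_dim), nn.Softplus())

    def forward(self, x):                      # x: (B, n, d)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        q, k = self.phi(q), self.phi(k)        # (B, n, r)
        kv = torch.einsum("bnr,bnd->brd", k, v)              # 先算 φ(K)^T V:O(n·r·d)
        z = 1.0 / (torch.einsum("bnr,br->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnr,brd,bn->bnd", q, kv, z)

torch.manual_seed(0)
attn = LearnableFeatureLinearAttention(dim=32)
x = torch.randn(2, 128, 32)
print(attn(x).shape)   # (2, 128, 32) —— 计算量随序列长度 n 线性增长
```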


【20】Fairness-aware PageRank via Edge Reweighting
标题:通过边重加权实现公平性感知PageRank
链接:https://arxiv.org/abs/2512.08055

作者:Honglian Wang,Haoyun Chen,Aristides Gionis
摘要:链接分析算法,如PageRank,有助于通过基于其连接性评估单个顶点的重要性来理解网络的结构动力学。最近,随着负责任的人工智能的重要性不断上升,链接分析算法中的公平性问题也越来越受到关注。在本文中,我们提出了一种新的方法,将组公平性的PageRank算法,通过重新加权的转移概率在底层的转移矩阵。我们制定的问题,实现公平的PageRank通过寻求最小化的公平损失,这是原始组明智的PageRank分布和目标PageRank分布之间的差异。我们进一步定义了一个组适应的公平性的概念,考虑随机游走与组偏向重新启动为每组占组同质性。由于公平性损失是非凸的,我们提出了一个有效的投影梯度下降法计算局部最优的边缘权重。与早期的方法不同,我们不建议向网络添加新的边,也不调整重启向量。相反,我们保持底层网络的拓扑结构不变,只修改现有边的相对重要性。我们经验比较我们的方法与国家的最先进的基线,并证明我们的方法的有效性,其中非常小的变化,在过渡矩阵导致显着改善的公平性的PageRank算法。
摘要:Link-analysis algorithms, such as PageRank, are instrumental in understanding the structural dynamics of networks by evaluating the importance of individual vertices based on their connectivity. Recently, with the rising importance of responsible AI, the question of fairness in link-analysis algorithms has gained traction. In this paper, we present a new approach for incorporating group fairness into the PageRank algorithm by reweighting the transition probabilities in the underlying transition matrix. We formulate the problem of achieving fair PageRank by seeking to minimize the fairness loss, which is the difference between the original group-wise PageRank distribution and a target PageRank distribution. We further define a group-adapted fairness notion, which accounts for group homophily by considering random walks with group-biased restart for each group. Since the fairness loss is non-convex, we propose an efficient projected gradient-descent method for computing locally-optimal edge weights. Unlike earlier approaches, we do not recommend adding new edges to the network, nor do we adjust the restart vector. Instead, we keep the topology of the underlying network unchanged and only modify the relative importance of existing edges. We empirically compare our approach with state-of-the-art baselines and demonstrate the efficacy of our method, where very small changes in the transition matrix lead to significant improvement in the fairness of the PageRank algorithm.


【21】Can AI autonomously build, operate, and use the entire data stack?
标题:人工智能可以自主构建、操作和使用整个数据栈吗?
链接:https://arxiv.org/abs/2512.07926

作者:Arvind Agarwal,Lisa Amini,Sameep Mehta,Horst Samulowitz,Kavitha Srinivas
摘要:企业数据管理是一项艰巨的任务。它涵盖数据架构和系统、集成、质量、治理和持续改进。虽然人工智能助手可以帮助特定的角色(如数据工程师和管理员)导航和配置数据堆栈,但它们远未达到完全自动化。然而,随着人工智能越来越有能力处理以前由于固有的复杂性而抵制自动化的任务,我们相信有一个即将到来的机会来瞄准完全自主的数据资产。目前,人工智能被用于数据堆栈的不同部分,但在本文中,我们主张从在独立数据组件操作中使用人工智能转向对整个数据生命周期进行更全面和自主的处理。为此,我们将探索如何由智能代理自主管理现代数据堆栈的每个阶段,以构建不仅可供人类最终用户使用,而且可供人工智能本身使用的自给自足系统。首先,我们描述了越来越多的力量和机会,需要这种范式的转变,检查代理如何可以简化数据生命周期,并强调开放的问题和领域,需要额外的研究。我们希望这项工作将激发热烈的辩论,刺激进一步的研究,激励协作方法,并促进数据系统更加自主的未来。
摘要 :Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific persona, such as data engineers and stewards, to navigate and configure the data stack, they fall far short of full automation. However, as AI becomes increasingly capable of tackling tasks that have previously resisted automation due to inherent complexities, we believe there is an imminent opportunity to target fully autonomous data estates. Currently, AI is used in different parts of the data stack, but in this paper, we argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. Towards that end, we explore how each stage of the modern data stack can be autonomously managed by intelligent agents to build self-sufficient systems that can be used not only by human end-users, but also by AI itself. We begin by describing the mounting forces and opportunities that demand this paradigm shift, examine how agents can streamline the data lifecycle, and highlight open questions and areas where additional research is needed. We hope this work will inspire lively debate, stimulate further research, motivate collaborative approaches, and facilitate a more autonomous future for data systems.


【22】Nonnegative Matrix Factorization through Cone Collapse
标题:通过锥塌缩实现非负矩阵分解
链接:https://arxiv.org/abs/2512.07879

作者:Manh Nguyen,Daniel Pimentel-Alarcón
摘要:非负矩阵分解(NMF)是学习非负数据"基于部件"的低维表示的常用工具,广泛应用于视觉、文本和生物信息学。在聚类应用中,正交NMF(ONMF)变体进一步对表示矩阵施加(近似)正交性,使其各行表现得像软聚类指示符。然而,现有算法通常从优化视角出发推导,并未显式利用NMF所蕴含的锥几何:数据点位于一个凸锥之中,其极射线编码了基本方向或"主题"。在这项工作中,我们从这一几何视角重新审视NMF,并提出"锥塌缩"(Cone Collapse)算法:从完整的非负象限出发,迭代地将其收缩到由数据生成的最小锥。我们证明,在对数据的温和假设下,锥塌缩在有限多步内终止,并恢复$\mathbf{X}^\top$的最小生成锥。在此基础上,我们对恢复出的极射线应用单侧正交NMF,得到一个锥感知的正交NMF模型(CC-NMF)。在16个基因表达、文本和图像基准数据集上,CC-NMF在聚类纯度方面持续匹配或优于强NMF基线(包括乘法更新、ANLS、投影NMF、ONMF和稀疏NMF)。这些结果表明,显式恢复数据锥可以得到既有理论依据又在实证上表现强劲的基于NMF的聚类方法。
摘要:Nonnegative matrix factorization (NMF) is a widely used tool for learning parts-based, low-dimensional representations of nonnegative data, with applications in vision, text, and bioinformatics. In clustering applications, orthogonal NMF (ONMF) variants further impose (approximate) orthogonality on the representation matrix so that its rows behave like soft cluster indicators. Existing algorithms, however, are typically derived from optimization viewpoints and do not explicitly exploit the conic geometry induced by NMF: data points lie in a convex cone whose extreme rays encode fundamental directions or "topics". In this work we revisit NMF from this geometric perspective and propose Cone Collapse, an algorithm that starts from the full nonnegative orthant and iteratively shrinks it toward the minimal cone generated by the data. We prove that, under mild assumptions on the data, Cone Collapse terminates in finitely many steps and recovers the minimal generating cone of $\mathbf{X}^\top$ . Building on this basis, we then derive a cone-aware orthogonal NMF model (CC-NMF) by applying uni-orthogonal NMF to the recovered extreme rays. Across 16 benchmark gene-expression, text, and image datasets, CC-NMF consistently matches or outperforms strong NMF baselines-including multiplicative updates, ANLS, projective NMF, ONMF, and sparse NMF-in terms of clustering purity. These results demonstrate that explicitly recovering the data cone can yield both theoretically grounded and empirically strong NMF-based clustering methods.


【23】Advancing physiological time series reconstruction and imputation via mixture of receptive fields and experts fusion
标题:通过感受野和专家融合的混合推进生理时间序列重建和插补
链接:https://arxiv.org/abs/2512.07873

作者:Ci Zhang,Huayu Li,Changdi Yang,Jiangnan Xia,Yanzhi Wang,Xiaolong Ma,Jin Lu,Geng Yuan
摘要:最近的研究表明,使用扩散模型进行时间序列信号重建具有很大的前景。然而,这类方法在医学时间序列领域仍很大程度上未被探索。生理时间序列信号具有多变量、时间变异性高、噪声大、易受伪影影响等独特特征,使得基于深度学习的方法在插补等任务上仍具挑战性。为此,我们在基于分数的扩散框架内提出了一种新颖的基于混合专家(MoE)的噪声估计器。具体而言,我们设计了感受野自适应MoE(RFAMoE)模块,使每个通道能够在整个扩散过程中自适应地选择所需的感受野。此外,近期文献发现,在生成生理信号时执行多次推理并对重建信号取平均可以有效降低重建误差,但代价是显著的计算与延迟开销。我们设计了一个融合MoE模块,创新性地利用MoE的特性并行生成K个噪声信号,使用路由机制将其融合,并在单个推理步骤内完成信号重建。这一设计不仅比以往方法提升了性能,还消除了与多次推理相关的大量计算成本和延迟。大量实验结果表明,我们提出的框架在不同任务和数据集上始终优于基于扩散的SOTA工作。
摘要:Recent studies show that using diffusion models for time series signal reconstruction holds great promise. However, such approaches remain largely unexplored in the domain of medical time series. The unique characteristics of the physiological time series signals, such as multivariate, high temporal variability, highly noisy, and artifact-prone, make deep learning-based approaches still challenging for tasks such as imputation. Hence, we propose a novel Mixture of Experts (MoE)-based noise estimator within a score-based diffusion framework. Specifically, the Receptive Field Adaptive MoE (RFAMoE) module is designed to enable each channel to adaptively select desired receptive fields throughout the diffusion process. Moreover, recent literature has found that when generating a physiological signal, performing multiple inferences and averaging the reconstructed signals can effectively reduce reconstruction errors, but at the cost of significant computational and latency overhead. We design a Fusion MoE module and innovatively leverage the nature of MoE module to generate K noise signals in parallel, fuse them using a routing mechanism, and complete signal reconstruction in a single inference step. This design not only improves performance over previous methods but also eliminates the substantial computational cost and latency associated with multiple inference processes. Extensive results demonstrate that our proposed framework consistently outperforms diffusion-based SOTA works on different tasks and datasets.


【24】Magnetic activity of ultracool dwarfs in the LAMOST DR11
标题:LAMOST DR 11中超冷矮星的磁活动
链接:https://arxiv.org/abs/2512.08305

作者:Yue Xiang,Shenghong Gu,Dongtao Cao
备注:13 pages, 10 figures, accepted for publication in ApJ
摘要:超冷矮星由质量最低的恒星和褐矮星组成。它们的内部是完全对流的,与部分对流的类太阳恒星不同。超冷矮星表面下的磁场产生过程仍然知之甚少,也存在争议。为了显著增加活跃超冷矮星的样本,我们在最新的LAMOST数据发布DR 11中确定了962个超冷矮星。通过对LAMOST谱的降质,模拟了中国空间站巡天望远镜(CSST)的低分辨率无狭缝谱。建立了一种基于自动编码器模型的半监督机器学习方法,利用模拟的CSST光谱识别超冷矮星,验证了CSST全天无缝光谱巡天探测超冷矮星的能力。本文用H α谱线作为代用谱线,研究了超冷矮星的磁活动。根据Kepler/K2光变曲线计算了82颗超冷矮星的自转周期。我们还推导出了超冷矮星的活度-转动关系,它在Rossby数为0.12附近饱和。
摘要:Ultracool dwarfs consist of lowest-mass stars and brown dwarfs. Their interior is fully convective, different from that of the partly-convective Sun-like stars. Magnetic field generation process beneath the surface of ultracool dwarfs is still poorly understood and controversial. To increase samples of active ultracool dwarfs significantly, we have identified 962 ultracool dwarfs in the latest LAMOST data release, DR11. We also simulate the Chinese Space Station Survey Telescope (CSST) low-resolution slitless spectra by degrading the LAMOST spectra. A semi-supervised machine learning approach with an autoencoder model is built to identify ultracool dwarfs with the simulated CSST spectra, which demonstrates the capability of the CSST all-sky slitless spectroscopic survey on the detection of ultracool dwarfs. Magnetic activity of the ultracool dwarfs is investigated by using the H$α$ line emission as a proxy. The rotational periods of 82 ultracool dwarfs are derived based on the Kepler/K2 light curves. We also derive the activity-rotation relation of the ultracool dwarfs, which is saturated around a Rossby number of 0.12.


【25】Provable Diffusion Posterior Sampling for Bayesian Inversion
标题:贝叶斯反演的可证明扩散后验采样
链接:https://arxiv.org/abs/2512.08022

作者:Jinyuan Chang,Chenguang Duan,Yuling Jiao,Ruoxuan Li,Jerry Zhijian Yang,Cheng Yuan
摘要:本文在即插即用(PnP)框架内提出了一种新的基于扩散的后验采样方法。我们的方法构造了一条从易于采样的终端分布到目标后验分布的概率输运,并采用热启动策略初始化粒子。为了近似后验分数,我们开发了一个蒙特卡罗估计器,其中粒子由朗之万动力学生成,从而避免了以往工作中常用的启发式近似。驱动朗之万动力学的分数函数从数据中学习,使模型能够捕捉底层先验分布的丰富结构特征。在理论方面,我们给出了非渐近误差界,表明即便对于复杂的多峰目标后验分布,该方法仍然收敛。这些界显式量化了来自后验分数估计、热启动初始化和后验采样过程的误差。我们的分析进一步阐明了先验分数匹配误差和贝叶斯反问题的条件数如何影响整体性能。最后,我们通过数值实验展示了所提方法在一系列反问题上的有效性。
摘要:This paper proposes a novel diffusion-based posterior sampling method within a plug-and-play (PnP) framework. Our approach constructs a probability transport from an easy-to-sample terminal distribution to the target posterior, using a warm-start strategy to initialize the particles. To approximate the posterior score, we develop a Monte Carlo estimator in which particles are generated using Langevin dynamics, avoiding the heuristic approximations commonly used in prior work. The score governing the Langevin dynamics is learned from data, enabling the model to capture rich structural features of the underlying prior distribution. On the theoretical side, we provide non-asymptotic error bounds, showing that the method converges even for complex, multi-modal target posterior distributions. These bounds explicitly quantify the errors arising from posterior score estimation, the warm-start initialization, and the posterior sampling procedure. Our analysis further clarifies how the prior score-matching error and the condition number of the Bayesian inverse problem influence overall performance. Finally, we present numerical experiments demonstrating the effectiveness of the proposed method across a range of inverse problems.


【26】Fast and Robust Diffusion Posterior Sampling for MR Image Reconstruction Using the Preconditioned Unadjusted Langevin Algorithm
标题:使用预处理未调整Langevin算法进行快速稳健的MR图像重建扩散后验采样
链接:https://arxiv.org/abs/2512.05791

作者:Moritz Blumenthal,Tina Holliber,Jonathan I. Tamir,Martin Uecker
备注:Submitted to Magnetic Resonance in Medicine
摘要:用途:未调整朗之万算法(Unadjusted Langevin Algorithm,ULA)与扩散模型相结合,可以从高度欠采样的k空间数据中生成带有不确定性估计的高质量MRI重建。然而,扩散后验采样或似然退火等采样方法存在重建时间长、需要参数调优的问题。本文的目的是提出一种具有快速收敛性的鲁棒采样算法。   理论与方法:在用于后验采样的反向扩散过程中,在所有噪声尺度下将精确似然与扩散先验相乘。为克服收敛缓慢的问题,引入了预处理。该方法在fastMRI数据上训练,并在健康志愿者的回顾性欠采样脑部数据上测试。   结果:对于笛卡尔和非笛卡尔加速MRI中的后验采样,新方法在重建速度和样本质量方面均优于退火采样。   结论:所提出的带预处理的精确似然方法能够在各种MRI重建任务中实现快速可靠的后验采样,而无需参数调优。
摘要:Purpose: The Unadjusted Langevin Algorithm (ULA) in combination with diffusion models can generate high quality MRI reconstructions with uncertainty estimation from highly undersampled k-space data. However, sampling methods such as diffusion posterior sampling or likelihood annealing suffer from long reconstruction times and the need for parameter tuning. The purpose of this work is to develop a robust sampling algorithm with fast convergence.   Theory and Methods: In the reverse diffusion process used for sampling the posterior, the exact likelihood is multiplied with the diffused prior at all noise scales. To overcome the issue of slow convergence, preconditioning is used. The method is trained on fastMRI data and tested on retrospectively undersampled brain data of a healthy volunteer.   Results: For posterior sampling in Cartesian and non-Cartesian accelerated MRI the new approach outperforms annealed sampling in terms of reconstruction speed and sample quality.   Conclusion: The proposed exact likelihood with preconditioning enables rapid and reliable posterior sampling across various MRI reconstruction tasks without the need for parameter tuning.
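
下面在一个线性高斯玩具反问题上给出"预处理未调整朗之万算法"的极简数值示意:以精确似然加高斯先验构成后验,用近似后验协方差作为预处理矩阵以加速混合;与论文中基于扩散先验的 MRI 重建相比,这里只保留算法骨架,步长、维度与预处理矩阵的选取均为假设。

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 8
A = rng.normal(size=(m, n)) / np.sqrt(n)      # 线性前向算子(类比欠采样测量)
x_true = rng.normal(size=n)
sigma = 0.05
y = A @ x_true + sigma * rng.normal(size=m)

prior_var = 1.0                               # 先验 x ~ N(0, prior_var * I)
def grad_log_post(x):
    """后验对数密度梯度:精确似然 + 高斯先验。"""
    return -A.T @ (A @ x - y) / sigma**2 - x / prior_var

# 预处理矩阵:取 Fisher 信息(后验精度)的逆,加速混合
H = A.T @ A / sigma**2 + np.eye(n) / prior_var
M = np.linalg.inv(H)
M_sqrt = np.linalg.cholesky(M)

tau = 0.5                                     # 预处理后可用较大步长
x = np.zeros(n)
samples = []
for k in range(2000):
    noise = M_sqrt @ rng.normal(size=n)       # 噪声协方差与预处理矩阵匹配
    x = x + tau * M @ grad_log_post(x) + np.sqrt(2 * tau) * noise
    if k > 500:                               # 丢弃 burn-in
        samples.append(x.copy())

post_mean = np.mean(samples, axis=0)
exact_mean = M @ (A.T @ y / sigma**2)          # 高斯情形的解析后验均值
print("posterior-mean error:", float(np.linalg.norm(post_mean - exact_mean)))
```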


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
