cs.LG: 170 papers today
Large language models (21 papers)
【1】MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Link: https://arxiv.org/abs/2512.04044
Authors: Yizhou Zhao, Zhiwei Steven Wu, Adam Block
Abstract: Watermarking aims to embed hidden signals in generated text that can be reliably detected when given access to a secret key. Open-weight language models pose acute challenges for such watermarking schemes because the inference-time interventions that dominate contemporary approaches cannot be enforced once model weights are public. Existing watermarking techniques for open-weight models, such as the recently proposed GaussMark, typically rely on small modifications to model weights, which can yield signals detectable to those equipped with a secret key, but achieving detection power comparable to inference-time watermarks generally requires weight perturbations that noticeably reduce generation quality. We introduce MarkTune, a theoretically principled, on-policy fine-tuning framework that treats the GaussMark signal as a reward while simultaneously regularizing against degradation in text quality. We derive MarkTune as an improvement on GaussMark and demonstrate that MarkTune consistently improves the quality-detectability trade-off over GaussMark by steering finer-grained, watermark-aware weight updates within the model's representation space while preserving generation quality. Empirically, we show that MarkTune pushes the quality-detectability frontier of GaussMark close to that of inference-time watermarking, remains robust to paraphrasing and fine-tuning attacks, and exhibits strong generalization: a model fine-tuned on one dataset retains substantial watermark detection power on unseen datasets. Together, these results establish MarkTune as a general strategy for embedding robust, high-quality watermarks into open-weight LMs.
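The reward-with-quality-regularizer idea in this abstract can be sketched as follows. This is a schematic illustration only: the function name, the linear combination, and all numbers are our assumptions, not the paper's actual objective.

```python
# Schematic per-sequence reward for watermark-aware fine-tuning: reward the
# watermark detection signal while penalizing drift from a reference model's
# likelihoods (a proxy for generation-quality degradation).
def marktune_reward(watermark_score, logp_policy, logp_reference, beta=0.1):
    # (logp_policy - logp_reference) is a single-sample estimate of the
    # KL divergence from the reference model on this sequence.
    quality_penalty = logp_policy - logp_reference
    return watermark_score - beta * quality_penalty

# A sequence whose watermark statistic is 2.0 and whose log-likelihood has
# drifted by 2 nats from the reference model:
r = marktune_reward(watermark_score=2.0, logp_policy=-48.0, logp_reference=-50.0)
```

A policy-gradient learner fed this reward is pushed toward detectable text, but only as far as the quality penalty (weighted by `beta`) allows.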
【2】Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs
Link: https://arxiv.org/abs/2512.03994
Authors: Oren Rachmil, Roy Betser, Itay Gershon, Omer Hofman, Nitay Yakoby, Yuval Meron, Idan Yankelev, Asaf Shabtai, Yuval Elovici, Roman Vainshtein
Comments: Accepted to the AAAI 2026 Deployable AI (DAI) Workshop
Abstract: Aligning proprietary large language models (LLMs) with internal organizational policies has become an urgent priority as organizations increasingly deploy LLMs in sensitive domains such as legal support, finance, and medical services. Beyond generic safety filters, enterprises require reliable mechanisms to detect policy violations within their regulatory and operational frameworks, where breaches can trigger legal and reputational risks. Existing content moderation frameworks, such as guardrails, remain largely confined to the safety domain and lack the robustness to capture nuanced organizational policies. LLM-as-a-judge and fine-tuning approaches, though flexible, introduce significant latency and lack interpretability. To address these limitations, we propose a training-free and efficient method that treats policy violation detection as an out-of-distribution (OOD) detection problem. Inspired by whitening techniques, we apply a linear transformation to decorrelate the model's hidden activations and standardize them to zero mean and unit variance, yielding a near-identity covariance matrix. In this transformed space, we use the Euclidean norm as a compliance score to detect policy violations. The method requires only the policy text and a small number of illustrative samples, which makes it lightweight and easily deployable. On a challenging policy benchmark, our approach achieves state-of-the-art results, surpassing both existing guardrails and fine-tuned reasoning models. This work provides organizations with a practical and statistically grounded framework for policy-aware oversight of LLMs, advancing the broader goal of deployable AI governance. Code is available at: https://tinyurl.com/policy-violation-detection
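The whitening-plus-norm scoring described above can be sketched with synthetic activations. All names are illustrative (this is not the authors' code), and the random Gaussians stand in for real hidden activations of policy-compliant examples:

```python
import numpy as np

def fit_whitener(acts, eps=1e-6):
    """acts: (n, d) hidden activations of policy-compliant examples."""
    mu = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + eps * np.eye(acts.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # ZCA whitening: cov^(-1/2)
    return mu, W

def compliance_score(x, mu, W):
    """Euclidean norm in the whitened space; larger = more out-of-distribution."""
    return float(np.linalg.norm((x - mu) @ W))

rng = np.random.default_rng(0)
in_policy = rng.normal(size=(500, 8))   # stand-in for real activations
mu, W = fit_whitener(in_policy)
z = (in_policy - mu) @ W                # whitened: ~zero mean, identity covariance
cov_err = float(np.abs(np.cov(z, rowvar=False) - np.eye(8)).max())
typical = compliance_score(in_policy[0], mu, W)
outlier = compliance_score(np.full(8, 10.0), mu, W)
```

After the transform, in-distribution points have norms near sqrt(d), while an activation far from the compliant distribution scores much higher.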
【3】Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models
Link: https://arxiv.org/abs/2512.03882
Authors: Haidong Kang, Wei Wu, Hanling Wang
Abstract: Few-shot class incremental learning (FSCIL) is a more realistic and challenging paradigm in continual learning to incrementally learn unseen classes and overcome catastrophic forgetting on base classes with only a few training examples. Previous efforts have primarily centered around studying more effective FSCIL approaches. By contrast, less attention has been devoted to the security issues surrounding FSCIL. This paper aims to provide a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how human expert-designed attack methods (i.e., PGD, FGSM) affect FSCIL. We find that those methods either fail to attack base classes or suffer from huge labor costs due to their reliance on extensive expert knowledge. This highlights the need to craft a specialized attack method for FSCIL. Grounded in these insights, we propose a simple yet effective method, ACraft, to automatically steer and discover optimal attack methods targeted at FSCIL by leveraging Large Language Models (LLMs) without human experts. Moreover, to improve the reasoning between LLMs and FSCIL, we introduce a novel Proximal Policy Optimization (PPO) based reinforcement learning scheme that establishes positive feedback, making LLMs generate better attack methods in the next generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods and dramatically outperforms human expert-designed attack methods while maintaining the lowest attack cost.
【4】OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance
Link: https://arxiv.org/abs/2512.03874
Authors: Lei Zhang, Diwen Zheng, Kaixin Bai, Zhenshan Bing, Zoltan-Csaba Marton, Zhaopeng Chen, Alois Christian Knoll, Jianwei Zhang
Comments: Project Website: https://sites.google.com/view/omnidexvlg, 16 pages
Abstract: Dexterous grasp generation aims to produce grasp poses that align with task requirements and human-interpretable grasp semantics. However, achieving semantically controllable dexterous grasp synthesis remains highly challenging due to the lack of unified modeling of multiple semantic dimensions, including grasp taxonomy, contact semantics, and functional affordance. To address these limitations, we present OmniDexVLG, a multimodal, semantics-aware grasp generation framework capable of producing structurally diverse and semantically coherent dexterous grasps under joint language and visual guidance. Our approach begins with OmniDexDataGen, a semantically rich dexterous grasp dataset generation pipeline that integrates grasp-taxonomy-guided configuration sampling, functional affordance contact point sampling, taxonomy-aware differential force-closure grasp sampling, and physics-based optimization and validation, enabling systematic coverage of diverse grasp types. We further introduce OmniDexReasoner, a multimodal grasp type semantic reasoning module that leverages multi-agent collaboration, retrieval-augmented generation, and chain-of-thought reasoning to infer grasp-related semantics and generate high-quality annotations that align language instructions with task-specific grasp intent. Building upon these components, we develop a unified Vision-Language Grasping generation model that explicitly incorporates grasp taxonomy, contact structure, and functional affordance semantics, enabling fine-grained control over grasp synthesis from natural language instructions. Extensive experiments in simulation and real-world object grasping, together with ablation studies, demonstrate that our method substantially outperforms state-of-the-art approaches in terms of grasp diversity, contact semantic diversity, functional affordance diversity, and semantic consistency.
【5】DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
Link: https://arxiv.org/abs/2512.03847
Authors: Dingwei Zhu, Zhiheng Xi, Shihan Dou, Yuhui Wang, Sixian Li, Junjie Ye, Honglin Guo, Shichun Liu, Chenhao Huang, Yajie Yang, Junlin Shang, Senjie Jin, Ming Zhang, Jiazheng Zhang, Caishuang Huang, Yunke Zhang, Demei Yan, Yuran Wang, Tao Gui
Abstract: Reinforcement learning (RL) has shown strong performance in LLM post-training, but real-world deployment often involves noisy or incomplete supervision. In such settings, complex and unreliable supervision signals can destabilize training and harm generalization. While existing approaches such as worst-case optimization (e.g., RFQI, CQL) and mean-based methods (e.g., PPO, GRPO) can improve stability, they often overlook generalization and may produce overly conservative policies, leading to uneven performance across diverse real scenarios. To this end, we introduce DVPO (Distributional Value Modeling with Risk-aware Policy Optimization), a new RL framework that combines conditional risk theory with distributional value modeling to better balance robustness and generalization. DVPO learns token-level value distributions to provide fine-grained supervision, and applies an asymmetric risk regularization to shape the distribution tails: it contracts the lower tail to dampen noisy negative deviations, while expanding the upper tail to preserve exploratory diversity. Across extensive experiments and analysis in multi-turn dialogue, math reasoning, and scientific QA, DVPO consistently outperforms PPO, GRPO, and robust Bellman-based PPO under noisy supervision, showing its potential for LLM post-training in real-world settings.
【6】Log Probability Tracking of LLM APIs
Link: https://arxiv.org/abs/2512.03816
Authors: Timothée Chauvin, Erwan Le Merrer, François Taïani, Gilles Tredan
Abstract: When using an LLM through an API provider, users expect the served model to remain consistent over time, a property crucial for the reliability of downstream applications and the reproducibility of research. Existing audit methods are too costly to apply at regular time intervals to the wide range of available LLM APIs. This means that model updates are left largely unmonitored in practice. In this work, we show that while LLM log probabilities (logprobs) are usually non-deterministic, they can still be used as the basis for cost-effective continuous monitoring of LLM APIs. We apply a simple statistical test based on the average value of each token logprob, requesting only a single token of output. This is enough to detect changes as small as one step of fine-tuning, making this approach more sensitive than existing methods while being 1,000x cheaper. We introduce the TinyChange benchmark as a way to measure the sensitivity of audit methods in the context of small, realistic model changes.
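The flavor of such a test on average logprobs can be sketched with simulated data. The statistic below is a generic two-sample z-test, not necessarily the paper's exact test, and the Gaussians stand in for nondeterministic first-token logprobs returned by an API:

```python
import numpy as np

def detection_stat(ref, cur):
    """Two-sample z-statistic on the mean logprob; a large absolute value
    suggests the served model changed between the two batches of calls."""
    ref, cur = np.asarray(ref), np.asarray(cur)
    se = np.sqrt(ref.var(ddof=1) / len(ref) + cur.var(ddof=1) / len(cur))
    return float((cur.mean() - ref.mean()) / se)

# Simulated first-token logprobs: nondeterministic around a model-specific mean.
rng = np.random.default_rng(1)
ref   = rng.normal(-2.00, 0.05, size=200)  # reference batch, recorded earlier
same  = rng.normal(-2.00, 0.05, size=200)  # same model queried later
drift = rng.normal(-2.05, 0.05, size=200)  # model updated slightly
stat_same = detection_stat(ref, same)
stat_drift = detection_stat(ref, drift)
```

Even a mean shift comparable to the sampling noise becomes detectable once a few hundred single-token calls are averaged, which is what makes the monitoring cheap.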
【7】AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
Link: https://arxiv.org/abs/2512.03794
Authors: Zichuan Lin, Yicheng Liu, Yang Yang, Lvfang Tao, Deheng Ye
Comments: 15 pages, 9 figures
Abstract: Vision-Language Models (VLMs) have achieved remarkable success in visual question answering tasks, but their reliance on large numbers of visual tokens introduces significant computational overhead. While existing efficient VLM approaches reduce visual tokens through fixed-ratio compression, they operate passively and lack the ability to adapt to varying task requirements. This motivates a fundamental question: Can VLMs autonomously determine the minimum number of visual tokens required for each sample? Inspired by human active vision mechanisms, we introduce AdaptVision, an efficient VLM paradigm that enables adaptive visual token acquisition through a coarse-to-fine approach. Our model initially processes compressed visual tokens from low-resolution images and selectively acquires additional visual information by invoking a bounding box tool to crop key regions when necessary. We train AdaptVision using a reinforcement learning framework that carefully balances accuracy and efficiency. Central to our approach is Decoupled Turn Policy Optimization (DTPO), which decouples the learning objective into two components: (1) tool learning, which optimizes correct tool utilization, and (2) accuracy improvement, which refines the generated responses to improve answer correctness. Based on this formulation, we further decouple advantage estimation by computing separate advantages for tokens associated with each objective. This formulation enables more effective optimization for AdaptVision compared to vanilla GRPO. Comprehensive experiments across multiple VQA benchmarks demonstrate that AdaptVision achieves superior performance while consuming substantially fewer visual tokens than state-of-the-art efficient VLM methods.
【8】Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Link: https://arxiv.org/abs/2512.03759
Authors: Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, Chongxuan Li
Abstract: Reinforcement Learning (RL) has proven highly effective for autoregressive language models, but adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. The core difficulty lies in likelihood approximation: while autoregressive models naturally provide token-level conditional probabilities essential for token-level RL objectives (e.g., GRPO), dLLMs generate sequences through iterative non-autoregressive denoising steps that lack this factorization. To address this fundamental mismatch, we propose ELBO-based Sequence-level Policy Optimization (ESPO), a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy. Our method incorporates per-token normalization of importance ratios and robust KL-divergence estimation to ensure stable large-scale training. Extensive experiments on mathematical reasoning, coding, and planning tasks demonstrate that ESPO significantly outperforms token-level baselines, achieving dramatic improvements of 20-40 points on the Countdown task, while maintaining consistent gains on math and coding benchmarks. Our approach establishes sequence-level optimization as a principled and empirically effective paradigm for RL in dLLMs. Our code is available at https://github.com/ML-GSAI/ESPO.
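Two ingredients named in the abstract, a sequence-level importance ratio with per-token normalization and a clipped surrogate, can be sketched as below. The exact objective in the paper may differ; the ELBO values and token count here are made up:

```python
import numpy as np

def seq_importance_ratio(elbo_new, elbo_old, num_tokens):
    """Sequence-level importance ratio with per-token normalization:
    exp((ELBO_new - ELBO_old) / T). Dividing by the token count T keeps the
    ratio from exploding or vanishing for long sequences."""
    return float(np.exp((elbo_new - elbo_old) / num_tokens))

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate, applied once per sequence rather than per token."""
    return float(min(ratio * advantage, np.clip(ratio, 1 - eps, 1 + eps) * advantage))

# A 100-token sequence whose ELBO improved by 10 nats under the new policy:
r = seq_importance_ratio(-240.0, -250.0, 100)
obj = clipped_objective(r, advantage=1.0)
```

Treating the whole sequence as one action sidesteps the missing token-level factorization in diffusion LLMs, because the ELBO is computable even when per-token conditionals are not.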
【9】The promising potential of vision language models for the generation of textual weather forecasts
Link: https://arxiv.org/abs/2512.03623
Authors: Edward C. C. Steele, Dinesh Mane, Emilio Monti, Luis Orus, Rebecca Chantrill-Cheyette, Matthew Couch, Kirstine I. Dale, Simon Eaton, Govindarajan Rangarajan, Amir Majlesi, Steven Ramsdale, Michael Sharpe, Craig Smith, Jonathan Smith, Rebecca Yates, Holly Ellis, Charles Ewen
Comments: 7 pages, 2 tables
Abstract: Despite the promising capability of multimodal foundation models, their application to the generation of meteorological products and services remains nascent. To accelerate adoption, we explore the novel use of a vision language model for writing the iconic Shipping Forecast text directly from video-encoded gridded weather data. These early results demonstrate promising scalable technological opportunities for enhancing production efficiency and service innovation within the weather enterprise and beyond.
【10】SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
Link: https://arxiv.org/abs/2512.03620
Authors: Hanxiu Zhang, Yue Zheng
Abstract: The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods -- whether behavior-based or structural -- suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulation. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependence on inputs and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.
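The "transformation-invariant" property of spectrum-based fingerprints rests on a standard linear-algebra fact: singular values are unchanged by orthogonal re-parameterizations of a weight matrix. A minimal sketch with a random stand-in for an attention weight matrix (not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))          # stand-in for an attention weight matrix
fingerprint = np.linalg.svd(W, compute_uv=False)  # singular values, descending

# An orthogonal transformation Q @ W models a functionality-preserving
# re-parameterization of the weights; the spectrum is unaffected.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
fingerprint_rotated = np.linalg.svd(Q @ W, compute_uv=False)
max_dev = float(np.abs(fingerprint - fingerprint_rotated).max())
```

A fingerprint built from such spectra therefore survives weight rewrites that would defeat element-wise comparisons.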
【11】Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis
Link: https://arxiv.org/abs/2512.03477
Authors: Zijian Gu, Yuxi Liu, Zhenhao Zhang, Song Wang
Comments: 10 pages, 3 tables
Abstract: Vision-language models achieve expert-level performance on medical imaging tasks but exhibit significant diagnostic accuracy disparities across demographic groups. We introduce fairness-aware Low-Rank Adaptation for medical VLMs, combining parameter efficiency with explicit fairness optimization. Our key algorithmic contribution is a differentiable MaxAccGap loss that enables end-to-end optimization of accuracy parity across demographic groups. We propose three methods: FR-LoRA integrates MaxAccGap regularization into the training objective, GR-LoRA applies inverse frequency weighting to balance gradient contributions, and Hybrid-LoRA combines both mechanisms. Evaluated on 10,000 glaucoma fundus images, GR-LoRA reduces diagnostic accuracy disparities by 69% while maintaining 53.15% overall accuracy. Ablation studies reveal that strong regularization strength achieves optimal fairness with minimal accuracy trade-off, and race-specific optimization yields 60% disparity reduction. Our approach requires only 0.24% trainable parameters, enabling practical deployment of fair medical AI in resource-constrained healthcare settings.
【12】UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
Link: https://arxiv.org/abs/2512.03383
Authors: Hung-Yueh Chiang, Chi-Chih Chang, Yu-Chen Lu, Chien-Yu Lin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
Abstract: Deploying large language models (LLMs) on mobile platforms faces significant challenges due to the limited memory and shared computational resources of the device. Resource availability may be an issue as it is directly impacted by the current device workload, adding to the uncertainty of model deployment. We introduce UniQL, a unified post-training quantization and low-rank compression framework with on-device configurable pruning rates for edge LLMs. UniQL is a general framework that integrates quantization and low-rank compression for Transformers, State Space Models (SSMs), and hybrid models to support diverse edge applications. In our proposed joint framework, we introduce an efficient structured weight-sorting method that speeds up computation by 20x, quantization-aware singular value decomposition (SVD) to minimize quantization errors, state-aware weight sorting for SSMs, and a fused rotary positional embedding (RoPE) kernel for pruned models. Our framework performs weight-sorting, fine-tuning, and quantization in the cloud in a single-pass workflow, while enabling on-device configurable pruning rates up to 35%. Our experiments show that quantized and pruned models achieve a memory reduction of 4x-5.7x and a token-throughput improvement of 2.7x-3.4x, maintaining accuracy within 5% of the original models at 15% pruning across Transformers (Llama3 and Qwen2.5), SSMs (Mamba2), and hybrid models (Nemotron-H and Bamba-v2). The code and quantized models are available at: https://github.com/enyac-group/UniQL.
【13】Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
Link: https://arxiv.org/abs/2512.03324
Authors: Ngoc Bui, Shubham Sharma, Simran Lamba, Saumitra Mishra, Rex Ying
Abstract: Memory and computation remain core bottlenecks in long-horizon LLM inference due to the quadratic cost of self-attention and the ever-growing key-value (KV) cache. Existing strategies for memory-bounded inference, such as quantization, offloading, or heuristic KV eviction, either incur high orchestration costs or rely on unreliable attention-based proxies of importance. We propose TRIM-KV, a novel approach that learns each token's intrinsic importance at creation time via a lightweight retention gate. Each gate predicts a scalar retention score that decays over time, reflecting the long-term utility of the token for a specific layer and head. Tokens with low scores are evicted when the memory budget is exceeded, ensuring that the cache always contains the most critical tokens. TRIM-KV is trained efficiently through distillation from a frozen LLM combined with a capacity loss, requiring only gate fine-tuning and adding negligible inference overhead. Across mathematical reasoning (GSM8K, MATH-500, AIME24), procedural generation (LongProc), conversational long-memory benchmarks (LongMemEval), and long-context understanding (LongBench and SCBench), TRIM-KV consistently outperforms strong eviction and learnable retrieval baselines, especially in low-memory regimes. Remarkably, it even surpasses full-cache models in some settings, showing that selective retention can serve as a form of regularization, suppressing noise from uninformative tokens. Qualitative analyses further reveal that learned retention scores align with human intuition, naturally recovering heuristics such as sink tokens, sliding windows, and gist compression without explicit design. Beyond efficiency, retention scores provide insights into layer- and head-specific roles, suggesting a new path toward LLM interpretability.
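The eviction rule described above, score each token once at creation, decay the score over time, and keep only the top-scoring tokens when over budget, can be sketched as follows. The exponential decay and all numbers are illustrative assumptions, not the paper's learned gate:

```python
import numpy as np

def retained_indices(scores, ages, decay, budget):
    """Keep the `budget` tokens with the highest decayed retention score.
    scores: gate outputs assigned at token creation; ages: steps since creation."""
    decayed = scores * np.exp(-decay * ages)
    if len(decayed) <= budget:
        return np.arange(len(decayed))
    return np.sort(np.argsort(decayed)[-budget:])  # indices of top-`budget` tokens

scores = np.array([0.9, 0.2, 0.8, 0.1, 0.7])  # creation-time retention scores
ages   = np.array([4,   3,   2,   1,   0])    # oldest token first
keep = retained_indices(scores, ages, decay=0.1, budget=3)
```

Note that the old token at index 0 survives because its creation-time score was high, which is exactly how a learned gate can recover sink-token-like behavior without hand-designed heuristics.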
【14】Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
Link: https://arxiv.org/abs/2512.03310
Authors: Kunj Joshi, David A. Smith
Comments: To be submitted for ICML 2026
Abstract: Memorization in natural language models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally identifying information (PII) from training data. We introduce Randomized Masked Fine-Tuning (RMFT), a novel privacy-preserving fine-tuning technique that reduces PII memorization while minimizing performance impact. Using the Enron Email Dataset, we demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and an 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning, outperforming deduplication methods while incurring only a 5.73% increase in perplexity. We present MaxTER, a Pareto-optimal evaluation framework for assessing privacy-utility tradeoffs, and compare RMFT against deduplication via the Area Under the Response Curve (AURC) metric.
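The core masking step of a "randomized masked" scheme can be sketched in a few lines. The mask rate, mask token, and tokenization below are illustrative assumptions; the paper's actual procedure (e.g. whether masking targets PII spans specifically) may differ:

```python
import random

def randomized_mask(tokens, mask_rate=0.3, mask_token="[MASK]", seed=None):
    """Randomly replace a fraction of tokens before fine-tuning, so no
    PII-bearing sequence is presented to the model verbatim every epoch."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_rate else t for t in tokens]

tokens = "contact john doe at 555 0100".split()
masked = randomized_mask(tokens, mask_rate=0.5, seed=0)
```

Because the mask positions are re-drawn each epoch, the model rarely sees the same PII-bearing n-gram twice, which is what suppresses verbatim extraction.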
【15】Watermarks for Embeddings-as-a-Service Large Language Models
Link: https://arxiv.org/abs/2512.03079
Authors: Anudeex Shetty
Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Based on these LLMs, businesses have started to provide Embeddings-as-a-Service (EaaS), offering feature extraction capabilities (in the form of text embeddings) that benefit downstream natural language processing tasks. However, prior research has demonstrated that EaaS is vulnerable to imitation attacks, where an attacker clones the service's model in a black-box manner without access to the model's internal workings. In response, watermarks have been added to the text embeddings to protect the intellectual property of EaaS providers by allowing them to check for model ownership. This thesis focuses on defending against imitation attacks by investigating EaaS watermarks. To achieve this goal, we unveil novel attacks and propose and validate new watermarking techniques. Firstly, we show that existing EaaS watermarks can be removed through paraphrasing the input text when attackers clone the model during imitation attacks. Our study illustrates that paraphrasing can effectively bypass current state-of-the-art EaaS watermarks across various attack setups (including different paraphrasing techniques and models) and datasets in most instances. This demonstrates a new vulnerability in recent EaaS watermarking techniques. Subsequently, as a countermeasure, we propose a novel watermarking technique, WET (Watermarking EaaS with Linear Transformation), which employs linear transformation of the embeddings. Watermark verification is conducted by applying a reverse transformation and comparing the similarity between recovered and original embeddings. We demonstrate its robustness against paraphrasing attacks with near-perfect verifiability.  We conduct detailed ablation studies to assess the significance of each component and hyperparameter in WET.
【16】Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
Link: https://arxiv.org/abs/2512.03062
Authors: Roman Rausch, David Jansen, Sukhbinder Singh, Román Orús
Comments: Prepared for submission to ESANN 2026
摘要:大型语言模型(LLM)对计算资源的要求非常高。LLM权重的低秩分解(例如通过奇异值分解(SVD))是一种有前景的LLM压缩方法,但存在若干实际障碍,例如选择适当的逐层秩以及消除其参数冗余。在这项工作中,我们提出了两项受物理启发的SVD LLM压缩改进:(1)FermiGrad,一种梯度下降算法,通过费米函数将离散的奇异值截断放松为连续优化,从而确定全局最优的逐层秩;(2)PivGa,一种对低秩因子的额外无损压缩,利用其参数化中固有的规范自由度。
摘要:Large Language Models (LLMs) are very demanding in terms of their computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach for LLM compression, but presents several practical hurdles, e.g. selecting appropriate layer-wise ranks and removing parameter redundancy. In this work, we present two physics-inspired improvements to SVD LLM compression: (1) \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
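The Fermi-function relaxation at the heart of FermiGrad can be illustrated in a few lines of numpy; the temperature `T` and cutoff `mu` below are placeholder values (the paper optimizes the cutoffs per layer by gradient descent, which this sketch does not reproduce).

```python
import numpy as np

def fermi_weights(s, mu, T=0.05):
    """Fermi function: ~1 for singular values above the cutoff mu,
    ~0 below it; T controls how soft the truncation is."""
    return 1.0 / (1.0 + np.exp((mu - s) / T))

def soft_svd_compress(W, mu, T=0.05):
    """Relax hard rank truncation into a differentiable reweighting
    of the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    f = fermi_weights(s, mu, T)
    return (U * (s * f)) @ Vt, f

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 16))
mu = float(np.median(np.linalg.svd(W, compute_uv=False)))
approx, f = soft_svd_compress(W, mu)
```

As `T -> 0` the weights approach a hard 0/1 truncation at rank `#{s_i > mu}`, while at finite `T` the objective stays differentiable in `mu`, which is what makes gradient-based rank selection possible.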
【17】Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation
标题:缓解LLM中可逆问题的幻觉和遗漏:硬件逻辑设计自动化的应用
链接:https://arxiv.org/abs/2512.03053
作者:Andrew S. Cassidy,Guillaume Garreau,Jay Sivagnaname,Mike Grassi,Bernard Brezzo,John V. Arthur,Dharmendra S. Modha
备注:7 pages, 2 figures, 7 tables
摘要:我们证明,对于将数据从源域(例如逻辑条件表(LCT))转换到目标域(例如硬件描述语言(HDL)代码)的可逆问题,可以先将大语言模型(LLM)用作从源到目标的无损编码器,再将其用作返回源域的无损解码器(类似于信息论中的无损压缩),从而缓解LLM在幻觉和遗漏方面的大部分缺陷。具体来说,我们以LCT为输入,使用七个不同的LLM生成二维片上网络路由器的完整HDL(13个单元,1500-2000行代码),从自动生成的HDL中重建LCT,并比较原始和重建的LCT。这种方法显著提高了生产力,不仅能确认正确生成的LLM逻辑并检测错误生成的LLM逻辑,还能帮助开发人员发现设计规范错误。
摘要:We show for invertible problems that transform data from a source domain (for example, Logic Condition Tables (LCTs)) to a destination domain (for example, Hardware Description Language (HDL) code), an approach of using Large Language Models (LLMs) as a lossless encoder from source to destination followed by as a lossless decoder back to the source, comparable to lossless compression in information theory, can mitigate most of the LLM drawbacks of hallucinations and omissions. Specifically, using LCTs as inputs, we generate the full HDL for a two-dimensional network-on-chip router (13 units, 1500-2000 lines of code) using seven different LLMs, reconstruct the LCTs from the auto-generated HDL, and compare the original and reconstructed LCTs. This approach yields significant productivity improvements, not only confirming correctly generated LLM logic and detecting incorrectly generated LLM logic but also assisting developers in finding design specification errors.
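The round-trip acceptance test at the core of the method can be sketched generically; `generate_hdl` and `extract_lct` stand in for the forward (LLM encode) and inverse (LLM decode) passes, and are stubbed with toy functions here rather than real model calls.

```python
def round_trip_ok(lct, generate_hdl, extract_lct):
    """Accept the generated HDL only if decoding it reproduces the
    original logic condition table exactly (lossless round trip)."""
    hdl = generate_hdl(lct)
    return extract_lct(hdl) == lct

# Toy stand-ins: a faithful generator/decoder pair and a lossy generator.
lct = {("a", "b"): "out <= a & b;", ("a",): "out <= a;"}
faithful_gen = lambda t: sorted(t.items())
faithful_dec = lambda h: dict(h)
lossy_gen = lambda t: sorted(t.items())[:-1]  # drops a condition (an "omission")

assert round_trip_ok(lct, faithful_gen, faithful_dec)
assert not round_trip_ok(lct, lossy_gen, faithful_dec)
```

Any hallucinated or omitted logic breaks the equality check, which is why the round trip flags both kinds of LLM error without needing a golden reference implementation.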
【18】Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
标题:基于熵的大型语言模型价值漂移与对齐工作测量
链接:https://arxiv.org/abs/2512.03047
作者:Samih Fadli
备注:6 pages. Companion paper to "The Second Law of Intelligence: Controlling Ethical Entropy in Autonomous Systems". Code and tools: https://github.com/AerisSpace/EthicalEntropyKit
摘要:大型语言模型的安全性通常使用静态基准进行评估,但关键故障是动态的:分布偏移下的价值漂移、越狱攻击以及部署中对齐的缓慢退化。基于最近将道德熵视为状态变量的"第二智能定律"(道德熵除非被对齐工作抵消,否则趋于增加),我们使这一框架可用于大型语言模型。我们定义了一个五类行为分类法,训练分类器从模型成绩单中估计道德熵S(t),并在压力测试中测量四个前沿模型的基础版本和指令调优变体的熵动态。基础模型显示出持续的熵增长,而指令调优变体抑制了漂移,并将道德熵降低了大约80%。从这些轨迹中,我们估计有效的对齐工作速率gamma_eff,并将S(t)和gamma_eff嵌入监控管道中,当熵漂移超过稳定性阈值时发出警报,从而实现对价值漂移的运行时监督。
摘要:Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak attacks, and slow degradation of alignment in deployment. Building on a recent Second Law of Intelligence that treats ethical entropy as a state variable which tends to increase unless countered by alignment work, we make this framework operational for large language models. We define a five-way behavioral taxonomy, train a classifier to estimate ethical entropy S(t) from model transcripts, and measure entropy dynamics for base and instruction-tuned variants of four frontier models across stress tests. Base models show sustained entropy growth, while tuned variants suppress drift and reduce ethical entropy by roughly eighty percent. From these trajectories we estimate an effective alignment work rate gamma_eff and embed S(t) and gamma_eff in a monitoring pipeline that raises alerts when entropy drift exceeds a stability threshold, enabling run-time oversight of value drift.
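A minimal sketch of the entropy monitor, assuming transcripts have already been classified into the five behavior classes (the class names and alert threshold below are invented for illustration; the paper estimates labels with a trained classifier):

```python
import math
from collections import Counter

def ethical_entropy(labels):
    """Shannon entropy (bits) of the behavior-class distribution,
    playing the role of S(t) for one window of transcripts."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def drift_alert(window_labels, threshold=1.5):
    """Raise an alert when entropy over the window exceeds the threshold."""
    return ethical_entropy(window_labels) > threshold

aligned = ["safe"] * 18 + ["hedge"] * 2                             # concentrated
drifting = ["safe", "hedge", "evasive", "unsafe", "deceptive"] * 4  # uniform
assert not drift_alert(aligned)
assert drift_alert(drifting)
```

A uniform spread over five classes gives the maximum entropy log2(5) ≈ 2.32 bits, while an aligned model concentrated on one class stays well below the threshold.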
【19】Advancing Multi-Step Mathematical Reasoning in Large Language Models through Multi-Layered Self-Reflection with Auto-Prompting
标题:通过多层自反思与自动提示推进大型语言模型的多步数学推理
链接:https://arxiv.org/abs/2506.23888
作者:André de Souza Loureiro,Jorge Valverde-Rebaza,Julieta Noguez,David Escarcega,Ricardo Marcacini
备注:Accepted for publication in: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2025). Research Track
摘要:大型语言模型(LLM)的最新进展显著提高了其解决问题的能力。然而,这些模型在面对复杂的多步推理任务时仍然表现不佳。在本文中,我们提出了多层自我反思与自动提示(MAPS)框架,这是一种新方法,旨在通过集成思维链(CoT)、自我反思和自动提示等技术来增强LLM的多步数学推理能力。与传统的静态提示方法不同,MAPS采用迭代细化过程。最初,模型使用CoT提示生成解决方案。当检测到错误时,自适应的自我反思机制会识别和分析它们,生成量身定制的提示以指导纠正。这些动态调整的提示使模型能够迭代地完善其推理。在多个LLM上对四个成熟基准的实验表明,MAPS的性能显著优于标准CoT,并取得了与推理优化模型相当的结果。此外,MAPS使通用LLM能够达到与专业推理模型相当的性能水平。虽然更深的反思层提高了准确性,但它们也增加了令牌使用量和成本。为了平衡这种权衡,MAPS策略性地限制反思深度,确保成本和推理性能之间的最佳平衡。
摘要:Recent advancements in Large Language Models (LLMs) have significantly improved their problem-solving capabilities. However, these models still struggle when faced with complex multi-step reasoning tasks. In this paper, we propose the Multi-Layered Self-Reflection with Auto-Prompting (MAPS) framework, a novel approach designed to enhance multi-step mathematical reasoning in LLMs by integrating techniques such as Chain of Thought (CoT), Self-Reflection, and Auto-Prompting. Unlike traditional static prompting methods, MAPS employs an iterative refinement process. Initially, the model generates a solution using CoT prompting. When errors are detected, an adaptive self-reflection mechanism identifies and analyzes them, generating tailored prompts to guide corrections. These dynamically adjusted prompts enable the model to iteratively refine its reasoning. Experiments on four well-established benchmarks across multiple LLMs show that MAPS significantly outperforms standard CoT and achieves competitive results with reasoning-optimized models. In addition, MAPS enables general-purpose LLMs to reach performance levels comparable to specialized reasoning models. While deeper reflection layers improve accuracy, they also increase token usage and costs. To balance this trade-off, MAPS strategically limits reflection depth, ensuring an optimal balance between cost and reasoning performance.
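The MAPS control loop (generate, detect errors, reflect, regenerate, with a capped reflection depth) can be sketched as follows; the three callables stand in for LLM calls and are stubbed with a toy "model" that improves once per reflection.

```python
def maps_solve(problem, solve_with_cot, find_errors, reflect_prompt, max_depth=3):
    """Iteratively refine a chain-of-thought solution; the depth cap
    trades accuracy against token cost, as in MAPS."""
    prompt = problem
    solution = solve_with_cot(prompt)
    for _ in range(max_depth):
        errors = find_errors(problem, solution)
        if not errors:
            break
        prompt = reflect_prompt(problem, solution, errors)
        solution = solve_with_cot(prompt)
    return solution

def toy_solver(answers):
    """Stand-in 'model' that produces the next attempt on each call."""
    it = iter(answers)
    return lambda _prompt: next(it)

find_errors = lambda _p, sol: [] if sol == 42 else ["wrong final answer"]
reflect = lambda p, sol, errs: f"{p}\nPrevious attempt: {sol}; issues: {errs}"
result = maps_solve("What is 6*7?", toy_solver([40, 41, 42]), find_errors, reflect)
```

In this toy run the loop reflects twice and stops as soon as no errors are found; `max_depth` is the knob the paper tunes to balance cost against accuracy.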
【20】Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
标题:异质人类反馈下大语言模型奖励学习的不确定性量化
链接:https://arxiv.org/abs/2512.03208
作者:Pangpang Liu,Junwei Lu,Will Wei Sun
摘要:我们研究了用于对齐大型语言模型(LLM)的奖励模型的估计和统计推断。LLM对齐的一个关键组成部分是基于人类反馈的强化学习(RLHF),其中人类比较模型生成的答案对,他们的偏好被用于训练奖励模型。然而,人类反馈本质上是异质的,这给可靠的奖励学习带来了重大挑战。为了解决这个问题,我们采用了一个异质偏好框架,联合建模答案的潜在奖励和人类理性。这导致了一个具有挑战性的双凸优化问题,我们通过交替梯度下降算法求解。我们为所得估计量建立了理论保证,包括其收敛性和渐近分布。这些结果使构建奖励估计的置信区间成为可能。利用这些不确定性量化结果,我们对奖励进行有效的统计比较,并将不确定性纳入best-of-N(BoN)策略框架。大量模拟表明了我们方法的有效性,在真实LLM数据上的应用突出了在LLM对齐的奖励建模中考虑不确定性的实用价值。
摘要:We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of model-generated answers and their preferences are used to train a reward model. However, human feedback is inherently heterogeneous, creating significant challenges for reliable reward learning. To address this, we adopt a heterogeneous preference framework that jointly models the latent reward of answers and human rationality. This leads to a challenging biconvex optimization problem, which we solve via an alternating gradient descent algorithm. We establish theoretical guarantees for the resulting estimator, including its convergence and asymptotic distribution. These results enable the construction of confidence intervals for reward estimates. Leveraging these uncertainty quantification results, we conduct valid statistical comparisons between rewards and incorporate uncertainty into the best-of-$N$ (BoN) policy framework. Extensive simulations demonstrate the effectiveness of our method, and applications to real LLM data highlight the practical value of accounting for uncertainty in reward modeling for LLM alignment.
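The alternating gradient descent scheme can be illustrated on a toy biconvex objective; the paper's actual objective jointly models latent rewards and human rationality, whereas this sketch only shows the alternating update itself.

```python
import numpy as np

def alt_grad_descent(grad_x, grad_y, x0, y0, lr=0.05, steps=500):
    """Alternating gradient descent for a biconvex objective:
    step in x with y held fixed, then step in y with x held fixed."""
    x, y = x0, y0
    for _ in range(steps):
        x = x - lr * grad_x(x, y)
        y = y - lr * grad_y(x, y)
    return x, y

# Toy biconvex objective: f(x, y) = (x*y - 2)**2 + 0.01*(x**2 + y**2),
# convex in x for fixed y and in y for fixed x, but not jointly convex.
gx = lambda x, y: 2 * (x * y - 2) * y + 0.02 * x
gy = lambda x, y: 2 * (x * y - 2) * x + 0.02 * y
x, y = alt_grad_descent(gx, gy, 1.0, 1.0)
```

Each partial problem is convex, so each half-step decreases the objective; the iterates settle at a stationary point with `x*y` close to 2 (shifted slightly by the regularizer).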
【21】A note on the impossibility of conditional PAC-efficient reasoning in large language models
标题:关于大型语言模型中不可能进行条件PAC高效推理的注释
链接:https://arxiv.org/abs/2512.03057
作者:Hao Zeng
摘要:我们针对大型语言模型中的条件可能近似正确(PAC)高效推理证明了一个不可能性结果。虽然最近的工作已经为在昂贵的专家模型和廉价的快速模型之间切换的复合模型建立了边际PAC效率保证,但我们表明,在无分布设置下,条件(逐点)保证是不可能的。具体来说,对于非原子输入空间,任何实现条件PAC效率的算法都必须是平凡的,即对几乎每个输入,它都以至少1-α的概率转交给专家模型。
摘要:We prove an impossibility result for conditional Probably Approximately Correct (PAC)-efficient reasoning in large language models. While recent work has established marginal PAC efficiency guarantees for composite models that switch between expensive expert models and cheaper fast models, we show that conditional (pointwise) guarantees are impossible in the distribution-free setting. Specifically, for non-atomic input spaces, any algorithm achieving conditional PAC efficiency must be trivial in the sense that it defers to the expert model with probability at least $1-\alpha$ for almost every input.
Graph相关(图学习|图神经网络|图优化等)(3篇)
【1】Quantum Topological Graph Neural Networks for Detecting Complex Fraud Patterns
标题:用于检测复杂欺诈模式的量子拓扑图神经网络
链接:https://arxiv.org/abs/2512.03696
作者:Mohammad Doost,Mohammad Manthouri
摘要:我们提出了一种新的QTGNN框架,用于检测大规模金融网络中的欺诈交易。通过集成量子嵌入、变分图卷积和拓扑数据分析,QTGNN捕获了复杂的交易动态和表明欺诈的结构异常。该方法包括量子数据嵌入与纠缠增强,变分量子图卷积与非线性动力学,高阶拓扑不变量的提取,混合量子经典异常学习与自适应优化,以及通过拓扑属性可解释的决策。严格的收敛保证确保了在有噪声的中间尺度量子(NISQ)设备上的稳定训练,而拓扑签名的稳定性提供了鲁棒的欺诈检测。通过电路简化和图形采样,该框架针对NISQ硬件进行了优化,可扩展到大型交易网络。在PaySim和Elliptic等金融数据集上进行模拟,使用ROC-AUC、精度和误报率等指标,将QTGNN与经典和量子基线进行基准测试。消融研究评估了量子嵌入、拓扑特征、非线性通道和混合学习的贡献。QTGNN为金融欺诈检测提供了理论上合理,可解释和实用的解决方案,桥接了量子机器学习,图论和拓扑分析。
摘要:We propose a novel QTGNN framework for detecting fraudulent transactions in large-scale financial networks. By integrating quantum embedding, variational graph convolutions, and topological data analysis, QTGNN captures complex transaction dynamics and structural anomalies indicative of fraud. The methodology includes quantum data embedding with entanglement enhancement, variational quantum graph convolutions with non-linear dynamics, extraction of higher-order topological invariants, hybrid quantum-classical anomaly learning with adaptive optimization, and interpretable decision-making via topological attribution. Rigorous convergence guarantees ensure stable training on noisy intermediate-scale quantum (NISQ) devices, while stability of topological signatures provides robust fraud detection. Optimized for NISQ hardware with circuit simplifications and graph sampling, the framework scales to large transaction networks. Simulations on financial datasets, such as PaySim and Elliptic, benchmark QTGNN against classical and quantum baselines, using metrics like ROC-AUC, precision, and false positive rate. An ablation study evaluates the contributions of quantum embeddings, topological features, non-linear channels, and hybrid learning. QTGNN offers a theoretically sound, interpretable, and practical solution for financial fraud detection, bridging quantum machine learning, graph theory, and topological analysis.
【2】VS-Graph: Scalable and Efficient Graph Classification Using Hyperdimensional Computing
标题:VS-Graph:使用超维计算的可扩展且高效的图分类
链接:https://arxiv.org/abs/2512.03394
作者:Hamed Poursiami,Shay Snyder,Guojing Cong,Thomas Potok,Maryam Parsa
摘要:图分类是从分子性质预测到材料设计等领域的一项基本任务。虽然图神经网络(GNN)通过消息传递学习表达性表示来实现强大的性能,但它们会产生很高的计算成本,限制了它们在资源受限设备上的可扩展性和部署。超维计算(HDC),也被称为矢量符号架构(VSA),提供了一种轻量级的、受大脑启发的替代方案,但现有的基于HDC的图方法通常难以与GNN的预测性能相匹配。在这项工作中,我们提出了VS-Graph,一个矢量符号图学习框架,缩小了HDC的效率和消息传递的表达能力之间的差距。VS-Graph引入了一种用于拓扑驱动的节点识别的Spike Diffusion机制和一种用于完全在高维向量空间内进行多跳邻域聚合的关联消息传递方案。在没有基于梯度的优化或反向传播的情况下,我们的方法与现代GNN相比具有竞争力的准确性,在标准基准测试(如MUTAG和DD)上比之前的HDC基线高出4-5%。它还在多个数据集上匹配或超过了GNN基线的性能,同时将训练速度提高了450倍。此外,即使超向量维数降低到D=128,VS-Graph也保持了高精度,在积极的维度压缩下表现出鲁棒性,并为边缘和神经形态硬件上的超高效执行铺平了道路。
摘要:Graph classification is a fundamental task in domains ranging from molecular property prediction to materials design. While graph neural networks (GNNs) achieve strong performance by learning expressive representations via message passing, they incur high computational costs, limiting their scalability and deployment on resource-constrained devices. Hyperdimensional Computing (HDC), also known as Vector Symbolic Architectures (VSA), offers a lightweight, brain-inspired alternative, yet existing HDC-based graph methods typically struggle to match the predictive performance of GNNs. In this work, we propose VS-Graph, a vector-symbolic graph learning framework that narrows the gap between the efficiency of HDC and the expressive power of message passing. VS-Graph introduces a Spike Diffusion mechanism for topology-driven node identification and an Associative Message Passing scheme for multi-hop neighborhood aggregation entirely within the high-dimensional vector space. Without gradient-based optimization or backpropagation, our method achieves competitive accuracy with modern GNNs, outperforming the prior HDC baseline by 4-5% on standard benchmarks such as MUTAG and DD. It also matches or exceeds the performance of the GNN baselines on several datasets while accelerating the training by a factor of up to 450x. Furthermore, VS-Graph maintains high accuracy even with the hypervector dimensionality reduced to D=128, demonstrating robustness under aggressive dimension compression and paving the way for ultra-efficient execution on edge and neuromorphic hardware.
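The bind-and-bundle encoding that HDC methods such as VS-Graph build on can be sketched in numpy; this is the generic vector-symbolic graph encoding (random bipolar hypervectors, elementwise-product binding, sum bundling), not the paper's Spike Diffusion or Associative Message Passing.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4096  # hypervector dimensionality

# Random bipolar codebook for node labels (hypothetical label set).
labels = {k: rng.choice([-1, 1], size=D) for k in "CNOH"}

def encode_graph(nodes, edges):
    """Bind (elementwise product) the label hypervectors of each edge's
    endpoints, then bundle (sum) the edge vectors into one graph vector."""
    return sum(labels[nodes[u]] * labels[nodes[v]] for u, v in edges)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

g1 = encode_graph("CNO", [(0, 1), (1, 2)])
g2 = encode_graph("CNO", [(0, 1), (1, 2)])   # identical graph -> identical code
g3 = encode_graph("OHC", [(0, 1), (1, 2)])   # different labelling
```

Random high-dimensional hypervectors are nearly orthogonal, so graphs with disjoint edge-label pairs end up with near-zero similarity while identical graphs match exactly; classification then reduces to nearest-centroid lookup in this space, with no backpropagation.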
【3】Temporal Graph Neural Networks for Early Anomaly Detection and Performance Prediction via PV System Monitoring Data
标题:通过PV系统监测数据进行早期异常检测和性能预测的时间图神经网络
链接:https://arxiv.org/abs/2512.03114
作者:Srijani Mukherjee,Laurent Vuillon,Liliane Bou Nassif,Stéphanie Giroux-Julien,Hervé Pabiou,Denys Dutykh,Ionnasis Tsanakas
摘要:太阳能光伏(PV)系统的快速增长需要先进的性能监测和异常检测方法,以确保最佳运行。在这项研究中,我们提出了一种新的方法,利用时间图神经网络(时间GNN)来预测太阳能光伏输出功率,并使用环境和操作参数检测异常。该模型利用基于图形的光伏系统的关键参数,包括辐照度,模块和环境温度之间的时间关系来预测电功率输出。这项研究是基于收集的数据,从位于屋顶上的室外设施在里昂(法国),包括从光伏组件和气象参数的功率测量。
摘要:The rapid growth of solar photovoltaic (PV) systems necessitates advanced methods for performance monitoring and anomaly detection to ensure optimal operation. In this study, we propose a novel approach leveraging Temporal Graph Neural Network (Temporal GNN) to predict solar PV output power and detect anomalies using environmental and operational parameters. The proposed model utilizes graph-based temporal relationships among key PV system parameters, including irradiance, module and ambient temperature to predict electrical power output. This study is based on data collected from an outdoor facility located on a rooftop in Lyon (France) including power measurements from a PV module and meteorological parameters.
Transformer(1篇)
【1】GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers
标题:GalaxyDiT:扩散Transformer中具有引导对齐和自适应代理的高效视频生成
链接:https://arxiv.org/abs/2512.03451
作者:Zhiye Song,Steve Dai,Ben Keller,Brucek Khailany
摘要:扩散模型已经彻底改变了视频生成,成为创意内容生成和物理模拟的重要工具。基于Transformer的架构(DiT)和无分类器引导(CFG)是这一成功的两大基石,可实现强大的提示依从性和逼真的视频质量。尽管它们具有通用性和优越的性能,这些模型需要密集的计算。每次视频生成都需要数十个迭代步骤,而CFG将所需的计算量增加了一倍。这种低效率阻碍了其在下游应用中的广泛采用。 我们引入了GalaxyDiT,这是一种免训练方法,通过引导对齐和系统性的重用指标代理选择来加速视频生成。通过秩相关分析,我们的技术为每个视频模型(跨模型族和参数规模)确定最佳代理,从而确保最佳的计算重用。我们在Wan2.1-1.3B和Wan2.1-14B上实现了1.87×和2.37×的加速,而在VBench-2.0基准测试中仅分别下降0.97%和0.72%。在高加速率下,我们的方法保持了对基础模型的优异保真度,在峰值信噪比(PSNR)方面超过现有最先进方法5到10 dB。
摘要:Diffusion models have revolutionized video generation, becoming essential tools in creative content generation and physical simulation. Transformer-based architectures (DiTs) and classifier-free guidance (CFG) are two cornerstones of this success, enabling strong prompt adherence and realistic video quality. Despite their versatility and superior performance, these models require intensive computation. Each video generation requires dozens of iterative steps, and CFG doubles the required compute. This inefficiency hinders broader adoption in downstream applications. We introduce GalaxyDiT, a training-free method to accelerate video generation with guidance alignment and systematic proxy selection for reuse metrics. Through rank-order correlation analysis, our technique identifies the optimal proxy for each video model, across model families and parameter scales, thereby ensuring optimal computational reuse. We achieve $1.87\times$ and $2.37\times$ speedup on Wan2.1-1.3B and Wan2.1-14B with only 0.97% and 0.72% drops on the VBench-2.0 benchmark. At high speedup rates, our approach maintains superior fidelity to the base model, exceeding prior state-of-the-art approaches by 5 to 10 dB in peak signal-to-noise ratio (PSNR).
GAN|对抗|攻击|生成相关(4篇)
【1】PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
标题:PSA:金字塔稀疏注意力,高效的视频理解和生成
链接:https://arxiv.org/abs/2512.04025
作者:Xiaolong Li,Youping Gu,Xi Lin,Weijie Wang,Bohan Zhuang
备注:Tech report
摘要:注意力机制是基础模型的核心,但其二次复杂性仍然是扩展的关键瓶颈。这一挑战推动了高效注意力机制的发展,稀疏性成为主导范式。当前的方法通常用二进制掩码保留或丢弃整个键值块,导致在高稀疏度下的大量信息丢失。为了缓解这一差距,我们提出了金字塔稀疏注意力(PSA),一个通用的模块,适用于视频理解和生成任务。PSA引入了多级池化KV表示,而不是二进制掩码,从而实现更精细的掩码粒度。具体而言,每个查询块动态地将较低的池化级别分配给关键KV块,将较高的级别分配给不太重要的块,从而在完全保留和完全修剪之间创建信息插值。这种设计类似于计算机视觉中的定点量化和经典特征金字塔网络,有效地减少了信息丢失,同时在低计算预算下保持了计算效率。它使用原生的、硬件友好的内核,该内核利用解耦的块-瓦片设计来确保高效执行。在视频理解和生成基准中,PSA保留了上下文信息和视觉保真度,始终优于或达到与现有稀疏注意力基线相当的性能,并具有卓越的效率-质量权衡。我们的代码和模型权重可在http://ziplab.co/PSA上公开获取
摘要:Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the development of efficient attention mechanisms, with sparsity emerging as the dominant paradigm. Current methods typically retain or discard entire key-value blocks with binary masks, resulting in substantial information loss under high sparsity. To mitigate this gap, we present Pyramid Sparse Attention (PSA), a versatile module applicable to both video understanding and generation tasks. Instead of binary masking, PSA introduces multi-level pooled KV representations, enabling finer mask granularity. Specifically, each query block dynamically allocates lower pooling levels to critical KV blocks and higher levels to less important ones, creating an informative interpolation between full retention and complete pruning. This design, analogous to fixed-point quantization and classical feature pyramid networks in computer vision, effectively mitigates information loss while preserving computational efficiency under a low compute budget. It works with a native, hardware-friendly kernel that leverages decoupled block-tile design to ensure efficient execution. Across video understanding and generation benchmarks, PSA preserves contextual information and visual fidelity, consistently outperforming or achieving comparable performance over existing sparse attention baselines with superior efficiency-quality trade-offs. Our code and model weights are publicly available at: http://ziplab.co/PSA
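The multi-level pooling idea can be sketched as follows; the importance scores, power-of-two pooling factors, and the simple rank-based allocation rule are illustrative stand-ins for PSA's dynamic per-query-block assignment.

```python
import numpy as np

def pool_kv(kv_block, level):
    """Average-pool a (block_len, d) KV block by a factor of 2**level.
    Level 0 keeps the block intact; higher levels keep coarser summaries."""
    stride = 2 ** level
    n = kv_block.shape[0] // stride
    return kv_block[: n * stride].reshape(n, stride, -1).mean(axis=1)

def assign_levels(importance, budget_levels=(0, 1, 2)):
    """Give the most important KV blocks the finest level (0) and the
    least important the coarsest -- a sketch of PSA's allocation."""
    order = np.argsort(-importance)
    levels = np.empty_like(order)
    for rank, idx in enumerate(order):
        tier = min(rank * len(budget_levels) // len(order), len(budget_levels) - 1)
        levels[idx] = budget_levels[tier]
    return levels

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((8, 4)) for _ in range(3)]
levels = assign_levels(np.array([0.1, 0.9, 0.5]))
pooled = [pool_kv(b, l) for b, l in zip(blocks, levels)]
```

Unimportant blocks shrink to a few pooled rows instead of being dropped outright, which is the "informative interpolation" between full retention and full pruning that the abstract describes.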
【2】Step-by-step Layered Design Generation
标题:分步分层设计生成
链接:https://arxiv.org/abs/2512.03335
作者:Faizan Farooq Khan,K J Joseph,Koustava Goswami,Mohamed Elhoseiny,Balaji Vasan Srinivasan
摘要:从本质上讲,设计生成是一个循序渐进的过程,设计师通过仔细修改逐步完善和增强他们的工作。尽管有这个基本特征,现有的方法主要把设计合成作为一个单步生成问题,大大低估了创造性过程的内在复杂性。为了弥合这一差距,我们提出了一种新的问题设置,称为分步分层设计生成,它要求机器学习模型生成一个遵循设计师指令序列的设计。利用多模态LLM的最新进展,我们提出了SLEDGE:逐步分层设计GEnerator,将设计的每次更新建模为原子,分层更改其先前状态,同时在指令中接地。为了补充我们的新问题设置,我们引入了一个新的评估套件,包括数据集和基准。我们详尽的实验分析和比较与国家的最先进的方法为我们的新设置量身定制证明了我们的方法的有效性。我们希望我们的工作将引起人们对这一务实和探索不足的研究领域的关注。
摘要:Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel problem setting called Step-by-Step Layered Design Generation, which tasks a machine learning model with generating a design that adheres to a sequence of instructions from a designer. Leveraging recent advancements in multi-modal LLMs, we propose SLEDGE: Step-by-step LayEred Design GEnerator to model each update to a design as an atomic, layered change over its previous state, while being grounded in the instruction. To complement our new problem setting, we introduce a new evaluation suite, including a dataset and a benchmark. Our exhaustive experimental analysis and comparison with state-of-the-art approaches tailored to our new setup demonstrate the efficacy of our approach. We hope our work will attract attention to this pragmatic and under-explored research area.
【3】How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy
标题:如何对您的数据进行DP-fy:通过差异隐私生成合成数据的实用指南
链接:https://arxiv.org/abs/2512.03238
作者:Natalia Ponomareva,Zheng Xu,H. Brendan McMahan,Peter Kairouz,Lucas Rosenblatt,Vincent Cohen-Addad,Cristóbal Guzmán,Ryan McKenna,Galen Andrew,Alex Bie,Da Yu,Alex Kurakin,Morteza Zadimoghaddam,Sergei Vassilvitskii,Andreas Terzis
摘要:需要高质量的数据来为最终用户释放AI的全部潜力。然而,寻找这些数据的新来源变得越来越困难:大多数公开的人类生成的数据很快就会被用尽。此外,公开可用的数据通常不能代表特定系统的用户:例如,与人工智能助理交互的承包商的研究语音数据集可能比最终用户发出的真实世界命令更同质、更清晰、更自我审查。因此,解锁基于真实用户交互的高质量数据至关重要。然而,直接使用用户数据会带来重大的隐私风险。差分隐私(DP)是一个完善的框架,用于推理和限制信息泄漏,是保护用户隐私的黄金标准。这项工作的重点是"差分隐私合成数据",即保留源数据整体趋势的合成数据,同时为对源数据集做出贡献的个人提供强有力的隐私保证。DP合成数据可以释放以前由于隐私问题而无法访问的数据集的价值,并可以取代以前只有基本保护(如基于规则的临时匿名化)的敏感数据集的使用。 在本文中,我们探讨了围绕DP合成数据的全套技术、它们提供的隐私保护类型以及各种模态(图像、表格、文本和分散式)的最新技术。我们概述了生成DP合成数据的系统所需的所有组件,从敏感数据的处理和准备,到使用跟踪和经验隐私测试。我们希望这项工作能促进DP合成数据的采用,激发更多研究,并增强对DP合成数据方法的信任。
摘要:High quality data is needed to unlock the full potential of AI for end users. However finding new sources of such data is getting harder: most publicly-available human generated data will soon have been used. Additionally, publicly available data often is not representative of users of a particular system -- for example, a research speech dataset of contractors interacting with an AI assistant will likely be more homogeneous, well articulated and self-censored than real world commands that end users will issue. Therefore unlocking high-quality data grounded in real user interactions is of vital interest. However, the direct use of user data comes with significant privacy risks. Differential Privacy (DP) is a well established framework for reasoning about and limiting information leakage, and is a gold standard for protecting user privacy. The focus of this work, \emph{Differentially Private Synthetic data}, refers to synthetic data that preserves the overall trends of source data, while providing strong privacy guarantees to individuals that contributed to the source dataset. DP synthetic data can unlock the value of datasets that have previously been inaccessible due to privacy concerns and can replace the use of sensitive datasets that previously have only had rudimentary protections like ad-hoc rule-based anonymization. In this paper we explore the full suite of techniques surrounding DP synthetic data, the types of privacy protections they offer and the state-of-the-art for various modalities (image, tabular, text and decentralized). We outline all the components needed in a system that generates DP synthetic data, from sensitive data handling and preparation, to tracking the use and empirical privacy testing. We hope that this work will result in increased adoption of DP synthetic data, spur additional research and increase trust in DP synthetic data approaches.
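The simplest instance of the DP synthetic data recipe, a Laplace-noised histogram that is then resampled, can be sketched for one categorical column (sensitivity 1 under add/remove adjacency; real systems compose far more machinery than this):

```python
import numpy as np

def dp_synthetic_categorical(data, categories, epsilon, n_samples, rng):
    """epsilon-DP synthetic sampler for a single categorical column:
    add Laplace(1/epsilon) noise to each count (each user contributes one
    record, so counts have sensitivity 1 under add/remove adjacency),
    clip, renormalize, then sample fresh records from the noisy distribution."""
    counts = np.array([sum(1 for x in data if x == c) for c in categories], float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    return rng.choice(categories, size=n_samples, p=probs)

rng = np.random.default_rng(0)
data = ["A"] * 700 + ["B"] * 250 + ["C"] * 50
synth = dp_synthetic_categorical(data, ["A", "B", "C"], epsilon=1.0,
                                 n_samples=1000, rng=rng)
```

The synthetic records preserve the overall category proportions while the Laplace noise, not the raw counts, is what the released data depends on; the sampled rows themselves carry no record-level link back to the source.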
【4】Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare
标题:多对一对抗性共识:暴露基于人工智能的医疗保健中的多代理合谋风险
链接:https://arxiv.org/abs/2512.03097
作者:Adeela Bashir,The Anh han,Zia Ush Shamszaman
备注:7 pages Conference level paper
摘要:将大型语言模型(LLM)集成到医疗物联网系统中,有望实现更快的决策和更好的医疗支持。LLM也被部署为多智能体团队,通过辩论、投票或决策建议来协助AI医生。然而,当多个助理代理交互时,协调的对手可以串通起来创造虚假的共识,将AI医生推向有害的处方。我们开发了一个实验框架与脚本和非脚本医生代理,对抗性助理,和一个验证代理,检查对临床指南的决定。使用50个代表性的临床问题,我们发现,共谋驱动攻击成功率(ASR)和有害的建议率(HRR)在未受保护的系统高达100%。相比之下,验证者代理通过阻止对抗性共识来恢复100%的准确性。这项工作提供了人工智能医疗中共谋风险的第一个系统性证据,并展示了一种实用的轻量级防御,以确保指南的保真度。
摘要:The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also deployed as multi-agent teams to assist AI doctors by debating, voting, or advising on decisions. However, when multiple assistant agents interact, coordinated adversaries can collude to create false consensus, pushing an AI doctor toward harmful prescriptions. We develop an experimental framework with scripted and unscripted doctor agents, adversarial assistants, and a verifier agent that checks decisions against clinical guidelines. Using 50 representative clinical questions, we find that collusion drives the Attack Success Rate (ASR) and Harmful Recommendation Rates (HRR) up to 100% in unprotected systems. In contrast, the verifier agent restores 100% accuracy by blocking adversarial consensus. This work provides the first systematic evidence of collusion risk in AI healthcare and demonstrates a practical, lightweight defence that ensures guideline fidelity.
半/弱/无/有监督|不确定性|主动学习(11篇)
【1】Learning Steerable Clarification Policies with Collaborative Self-play
标题:通过协作自博弈学习可引导的澄清策略
链接:https://arxiv.org/abs/2512.04068
作者:Jonathan Berant,Maximillian Chen,Adam Fisch,Reza Aghajani,Fantine Huot,Mirella Lapata,Jacob Eisenstein
摘要:为了处理未指定或模糊的查询,AI助手需要一个管理其不确定性的策略,以确定(a)何时猜测用户意图并直接回答,(b)何时枚举并回答多个可能的意图,以及(c)何时提出澄清问题。然而,这样的策略在上下文上取决于用户偏好或模态等因素。例如,在小屏幕或语音设置中枚举多个可能的用户意图是麻烦的。在这项工作中,我们建议使用自博弈来训练管理这种不确定性的可引导策略。给定两个代理,一个模拟用户,另一个模拟AI助手,我们生成这样的对话:用户发出一个潜在模糊的查询,助手需要决定如何响应。重要的是,该模型将每个澄清问题和每个生成单词的数值成本作为输入,并被要求采取使其最终奖励(即经过成本惩罚的准确性)最大化的行动。我们使用强化自我训练(ReST)来训练模型以获得高回报,并表明这会产生一个可引导的策略,它能根据所提供的成本可预测地改变其行为,从而带来更高的回报和准确性。此外,我们的方法还能泛化到训练时未观察到的数值成本值。
摘要:To handle underspecified or ambiguous queries, AI assistants need a policy for managing their uncertainty to determine (a) when to guess the user intent and answer directly, (b) when to enumerate and answer multiple possible intents, and (c) when to ask a clarifying question. However, such policies are contextually dependent on factors such as user preferences or modality. For example, enumerating multiple possible user intentions is cumbersome on small screens or in a voice setting. In this work, we propose to train steerable policies for managing this uncertainty using self-play. Given two agents, one simulating a user and the other an AI assistant, we generate conversations where the user issues a potentially ambiguous query, and the assistant needs to determine how to respond. Importantly, the model takes as input the numerical cost of each clarification question, and each generated word, and is asked to take the action that will maximize its final reward, which is the cost-penalized accuracy. We use Reinforced Self-Training (ReST) to train our model to achieve high reward and show this leads to a steerable policy that changes its behavior predictably conditioned on the provided costs, leading to higher reward and accuracy. Moreover, our procedure also generalizes to numerical cost values that were unobserved at training time.
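The cost-penalized reward that steers the policy can be written down directly; the cost values and the linear functional form below are illustrative, not the paper's exact parameterization.

```python
def clarification_reward(correct, n_questions, n_words,
                         q_cost=0.2, w_cost=0.001):
    """Cost-penalized accuracy: the same dialogue earns a different reward
    depending on the per-question and per-word costs passed in, which is
    what makes the trained policy steerable at inference time."""
    return float(correct) - q_cost * n_questions - w_cost * n_words

# Guessing wrong vs. asking one clarifying question and answering right:
guess = clarification_reward(False, 0, 30)
clarify = clarification_reward(True, 1, 60)
assert clarify > guess
# With expensive questions (e.g. a voice interface), direct answers win:
direct = clarification_reward(True, 0, 30, q_cost=0.9)
costly_clarify = clarification_reward(True, 1, 60, q_cost=0.9)
assert direct > costly_clarify
```

Conditioning the policy on `q_cost` and `w_cost` at training time is what lets a single model shift between asking, enumerating, and guessing as the deployment context changes.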
【2】Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning
标题:通过深度强化学习实现全车辆主动悬架的数字双胞胎控制协同设计
链接:https://arxiv.org/abs/2512.03891
作者:Ying-Kuan Tsai,Yi-Ping Chen,Vispi Karkaria,Wei Chen
备注:28 pages, 17 figures
摘要:主动悬架系统对于提高车辆的舒适性、安全性和稳定性至关重要,但其性能往往受到固定硬件设计和控制策略的限制,无法适应不确定和动态的运行条件。数字孪生(DT)和深度强化学习(DRL)的最新进展为车辆生命周期中的实时数据驱动优化提供了新的机会。然而,将这些技术整合到一个统一的框架中仍然是一个开放的挑战。本文提出了一种基于DT的整车主动悬架控制协同设计(CCD)框架。通过将自动微分集成到DRL中,我们在不同的驾驶员行为和环境不确定性下共同优化物理悬架组件和控制策略。DRL还解决了部分可观测性的挑战,其中只有有限的状态可以被感知并反馈到控制器,通过直接从可用的传感器信息学习最佳控制动作。该框架将模型更新与分位数学习结合起来,以捕获数据的不确定性,从而实现实时决策和从数字物理交互中进行自适应学习。该方法演示了个性化优化悬架系统下两个不同的驾驶设置(温和和积极的)。结果表明,优化后的系统实现了更平滑的轨迹,并减少了约43%和52%的控制工作,分别为轻度和积极的,同时保持乘坐舒适性和稳定性。贡献包括:开发了一个基于DT的CCD框架,该框架集成了DRL和整车主动悬架的不确定性感知模型更新,引入了用于自改进系统的多代设计策略,并展示了针对不同驾驶员类型的主动悬架系统的个性化优化。
摘要:Active suspension systems are critical for enhancing vehicle comfort, safety, and stability, yet their performance is often limited by fixed hardware designs and control strategies that cannot adapt to uncertain and dynamic operating conditions. Recent advances in digital twins (DTs) and deep reinforcement learning (DRL) offer new opportunities for real-time, data-driven optimization across a vehicle's lifecycle. However, integrating these technologies into a unified framework remains an open challenge. This work presents a DT-based control co-design (CCD) framework for full-vehicle active suspensions using multi-generation design concepts. By integrating automatic differentiation into DRL, we jointly optimize physical suspension components and control policies under varying driver behaviors and environmental uncertainties. DRL also addresses the challenge of partial observability, where only limited states can be sensed and fed back to the controller, by learning optimal control actions directly from available sensor information. The framework incorporates model updating with quantile learning to capture data uncertainty, enabling real-time decision-making and adaptive learning from digital-physical interactions. The approach demonstrates personalized optimization of suspension systems under two distinct driving settings (mild and aggressive). Results show that the optimized systems achieve smoother trajectories and reduce control efforts by approximately 43% and 52% for mild and aggressive, respectively, while maintaining ride comfort and stability. Contributions include: developing a DT-enabled CCD framework integrating DRL and uncertainty-aware model updating for full-vehicle active suspensions, introducing a multi-generation design strategy for self-improving systems, and demonstrating personalized optimization of active suspension systems for distinct driver types.
【3】AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning
标题:AaPE:用于自监督音频表示学习的混叠感知补丁嵌入
链接:https://arxiv.org/abs/2512.03637
作者:Kohei Yamamoto,Kosuke Okusa
备注:11 pages, 4 figures
摘要:基于Transformer的音频SSL(自监督学习)模型通常将频谱图视为图像,应用卷积补丁化和大量时间下采样。这会降低有效奈奎斯特频率并引入混叠,而简单的低通滤波会删除与任务相关的高频提示。在这项研究中,我们提出了混叠感知补丁嵌入(AaPE),这是一种即插即用的补丁嵌入前端(patch stem),可在保留高频信息的同时减轻混叠。AaPE使用动态针对易混叠频带的双侧指数窗,以带限复正弦内核产生的特征增强标准补丁令牌。内核的频率和衰减参数从输入中估计,实现并行、自适应的子带分析,其输出与标准补丁令牌融合。AaPE可无缝集成到掩码式师生自监督学习中。此外,我们将多掩码策略与对比目标相结合,以加强不同掩码模式之间的一致性并稳定训练。我们在AudioSet上进行预训练,然后在涵盖环境声音和其他常见音频领域的多样化下游基准上进行微调评估。这种方法在一部分任务上取得了最先进的性能,在其余任务上取得了有竞争力的结果。互补的线性探测评估反映了同样的模式,在若干基准上取得了明显的收益,在其他基准上也表现良好。对这些结果的综合分析表明,AaPE有助于减轻混叠的影响,而不会丢弃信息丰富的高频内容。
摘要:Transformer-based audio SSL (self-supervised learning) models often treat spectrograms as images, applying convolutional patchification with heavy temporal downsampling. This lowers the effective Nyquist frequency and introduces aliasing, while naïve low-pass filtering removes task-relevant high-frequency cues. In this study, we present Aliasing-aware Patch Embedding (AaPE), a drop-in patch stem that mitigates aliasing while preserving high-frequency information. AaPE augments standard patch tokens with features produced by a band-limited complex sinusoidal kernel using a two-sided exponential window that dynamically targets alias-prone bands. Frequency and decay parameters of the kernel are estimated from the input, enabling parallel, adaptive subband analysis whose outputs are fused with the standard patch tokens. AaPE integrates seamlessly into the masked teacher-student self-supervised learning. In addition, we combine a multi-mask strategy with a contrastive objective to enforce consistency across diverse mask patterns, stabilizing training. Pre-training on AudioSet followed by fine-tuning evaluation across diverse downstream benchmarks, which spanned categories, such as environmental sounds and other common audio domains. This approach yields state-of-the-art performance on a subset of tasks and competitive results across the remainder. Complementary linear probing evaluation mirrors this pattern, yielding clear gains on several benchmarks and strong performance elsewhere. The collective analysis of these results indicates that AaPE serves to mitigate the effects of aliasing without discarding of informative high-frequency content.
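The kernel family described above, a complex sinusoid under a two-sided exponential window, can be written down directly; here `freq` and `decay` are fixed by hand for illustration, whereas AaPE predicts them from the input.

```python
import numpy as np

def aape_kernel(freq, decay, length, sr):
    """Band-limited complex sinusoid under a two-sided exponential window:
    k[t] = exp(-decay*|t|) * exp(2j*pi*freq*t), t in seconds, centered."""
    t = (np.arange(length) - length // 2) / sr
    return np.exp(-decay * np.abs(t)) * np.exp(2j * np.pi * freq * t)

sr = 16000
k = aape_kernel(freq=6000.0, decay=200.0, length=257, sr=sr)

# Convolving with the kernel isolates the targeted high-frequency band:
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 6000 * t)
y = np.convolve(x, k, mode="same")
```

The decay parameter sets the bandwidth of the resulting (Lorentzian-shaped) passband, so a bank of such kernels with input-dependent `freq`/`decay` performs the adaptive subband analysis the abstract describes.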
【4】KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models
链接:https://arxiv.org/abs/2512.03450
作者:Rhys Newbury,Juyan Zhang,Tin Tran,Hanna Kurniawati,Dana Kulić
摘要:Understanding and representing the structure of 3D objects in an unsupervised manner remains a core challenge in computer vision and graphics. Most existing unsupervised keypoint methods are not designed for unconditional generative settings, restricting their use in modern 3D generative pipelines; our formulation explicitly bridges this gap. We present an unsupervised framework for learning spatially structured 3D keypoints from point cloud data. These keypoints serve as a compact and interpretable representation that conditions an Elucidated Diffusion Model (EDM) to reconstruct the full shape. The learned keypoints exhibit repeatable spatial structure across object instances and support smooth interpolation in keypoint space, indicating that they capture geometric variation. Our method achieves strong performance across diverse object categories, yielding a 6 percentage-point improvement in keypoint consistency compared to prior approaches.
【5】Adaptive Regime-Switching Forecasts with Distribution-Free Uncertainty: Deep Switching State-Space Models Meet Conformal Prediction
链接:https://arxiv.org/abs/2512.03298
作者:Echo Diyun LU,Charles Findling,Marianne Clausel,Alessandro Leite,Wei Gong,Pierric Kersaudy
摘要:Regime transitions routinely break stationarity in time series, making calibrated uncertainty as important as point accuracy. We study distribution-free uncertainty for regime-switching forecasting by coupling Deep Switching State Space Models with Adaptive Conformal Inference (ACI) and its aggregated variant (AgACI). We also introduce a unified conformal wrapper that sits atop strong sequence baselines including S4, MC-Dropout GRU, sparse Gaussian processes, and a change-point local model to produce online predictive bands with finite-sample marginal guarantees under nonstationarity and model misspecification. Across synthetic and real datasets, conformalized forecasters achieve near-nominal coverage with competitive accuracy and generally improved band efficiency.
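The ACI mechanism referenced above has a simple online form: after each prediction, the working miscoverage level is nudged depending on whether the band covered the outcome. Below is a minimal sketch using a rolling-window split-conformal band over absolute residuals; the window size, step size gamma, and the synthetic regime-switching series are illustrative, not from the paper.

```python
import numpy as np

def aci_online(y, y_hat, alpha=0.1, gamma=0.01, window=100):
    """Adaptive Conformal Inference: nudge the working miscoverage level
    alpha_t after each step so that average coverage tracks 1 - alpha."""
    alpha_t = alpha
    residuals, errs = [], []
    for t in range(len(y)):
        if len(residuals) >= 10:  # need a few calibration residuals first
            level = min(max(1.0 - alpha_t, 0.0), 1.0)
            q = np.quantile(residuals[-window:], level)  # band half-width
            err = float(abs(y[t] - y_hat[t]) > q)        # 1 = miss, 0 = covered
            errs.append(err)
            alpha_t += gamma * (alpha - err)  # widen after a miss, narrow after a cover
        residuals.append(abs(y[t] - y_hat[t]))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
n = 2000
y_hat = np.sin(np.linspace(0, 20, n))
scale = np.where(np.arange(n) < n // 2, 0.5, 1.0)  # regime switch in noise level
y = y_hat + rng.normal(0.0, 1.0, n) * scale
miscoverage = aci_online(y, y_hat)
print(round(miscoverage, 3))
```

Despite the mid-stream variance shift, the adapted level keeps empirical miscoverage near the nominal 10%, which is the finite-sample marginal behavior the abstract refers to; a fixed calibration quantile would drift after the regime change.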
【6】Flux4D: Flow-based Unsupervised 4D Reconstruction
链接:https://arxiv.org/abs/2512.03210
作者:Jingkang Wang,Henry Che,Yun Chen,Ze Yang,Lily Goli,Sivabalan Manivasagam,Raquel Urtasun
备注:NeurIPS 2025. Project page: https://waabi.ai/flux4d/
摘要:Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
【7】Drainage: A Unifying Framework for Addressing Class Uncertainty
链接:https://arxiv.org/abs/2512.03182
作者:Yasser Taha,Grégoire Montavon,Nils Körber
备注:16 pages, 8 figures
摘要:Modern deep learning faces significant challenges with noisy labels, class ambiguity, and the need to robustly reject out-of-distribution or corrupted samples. In this work, we propose a unified framework based on the concept of a "drainage node" which we add at the output of the network. The node serves to reallocate probability mass toward uncertainty, while preserving desirable properties such as end-to-end training and differentiability. This mechanism provides a natural escape route for highly ambiguous, anomalous, or noisy samples, and is particularly relevant for instance-dependent and asymmetric label noise. In systematic experiments involving the addition of varying proportions of instance-dependent or asymmetric noise to CIFAR-10/100 labels, our drainage formulation achieves an accuracy increase of up to 9% over existing approaches in the high-noise regime. Our results on real-world datasets, such as mini-WebVision, mini-ImageNet and Clothing-1M, match or surpass existing state-of-the-art methods. Qualitative analysis reveals a denoising effect, where the drainage neuron consistently absorbs corrupt, mislabeled, or outlier data, leading to more stable decision boundaries. Furthermore, our drainage formulation enables applications well beyond classification, with immediate benefits for web-scale semi-supervised dataset cleaning and open-set applications.
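To illustrate the output-layer mechanism, here is a toy sketch of a softmax with an appended drainage node. For clarity the drain logit is a fixed scalar here, whereas in the paper it is trained end-to-end with the rest of the network.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def drainage_probs(logits, drain_logit=0.0):
    """Append a drainage node to the K class logits and renormalize.
    Mass assigned to the extra node acts as an 'I am not sure' escape
    route for ambiguous or corrupted inputs."""
    drain = np.full((logits.shape[0], 1), drain_logit)
    return softmax(np.concatenate([logits, drain], axis=1))

confident = np.array([[4.0, 0.0, 0.0]])   # one class clearly dominates
ambiguous = np.array([[0.1, 0.0, 0.05]])  # near-uniform logits
p_conf = drainage_probs(confident)
p_amb = drainage_probs(ambiguous)
print(p_conf[0, -1], p_amb[0, -1])  # drain mass: ~0.02 (confident) vs ~0.24 (ambiguous)
```

Because the drain competes in the same softmax, confidently classified samples give it negligible mass, while near-uniform logits let it absorb a large share, which is the reallocation effect described in the abstract.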
【8】Real-Time Structural Health Monitoring with Bayesian Neural Networks: Distinguishing Aleatoric and Epistemic Uncertainty for Digital Twin Frameworks
链接:https://arxiv.org/abs/2512.03115
作者:Hanbin Cho,Jecheon Yu,Hyeonbin Moon,Jiyoung Yoon,Junhyeong Lee,Giyoung Kim,Jinhyoung Park,Seunghwa Ryu
备注:37 pages, 13 figures
摘要:Reliable real-time analysis of sensor data is essential for structural health monitoring (SHM) of high-value assets, yet a major challenge is to obtain spatially resolved full-field aleatoric and epistemic uncertainties for trustworthy decision-making. We present an integrated SHM framework that combines principal component analysis (PCA), a Bayesian neural network (BNN), and Hamiltonian Monte Carlo (HMC) inference, mapping sparse strain gauge measurements onto leading PCA modes to reconstruct full-field strain distributions with uncertainty quantification. The framework was validated through cyclic four-point bending tests on carbon fiber reinforced polymer (CFRP) specimens with varying crack lengths, achieving accurate strain field reconstruction (R squared value > 0.9) while simultaneously producing real-time uncertainty fields. A key contribution is that the BNN yields robust full-field strain reconstructions from noisy experimental data with crack-induced strain singularities, while also providing explicit representations of two complementary uncertainty fields. Considered jointly in full-field form, the aleatoric and epistemic uncertainty fields make it possible to diagnose at a local level whether low-confidence regions are driven by data-inherent issues or by model-related limitations, thereby supporting reliable decision-making. Collectively, the results demonstrate that the proposed framework advances SHM toward trustworthy digital twin deployment and risk-aware structural diagnostics.
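The sparse-gauge-to-full-field mapping can be illustrated with a linear stand-in: project historical snapshots onto leading PCA modes, then infer modal coefficients for a new sample from a handful of sensor readings. Ordinary least squares replaces the paper's BNN here, so no aleatoric/epistemic uncertainty fields are produced, and all data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in data: 500 full-field "strain" snapshots on 200 points,
# generated from 3 latent modes (the real framework uses measured fields).
s = np.linspace(0, 1, 200)
gen_modes = np.stack([np.sin(2 * np.pi * s), np.cos(4 * np.pi * s), s - 0.5])
fields = rng.normal(size=(500, 3)) @ gen_modes + 0.01 * rng.normal(size=(500, 200))

# Step 1: leading PCA modes of the historical snapshots.
mean_field = fields.mean(axis=0)
_, _, Vt = np.linalg.svd(fields - mean_field, full_matrices=False)
modes = Vt[:3]  # (3, 200)

# Step 2: map 8 sparse "gauge" readings to modal coefficients, then
# expand back to the full field.
sensors = rng.choice(200, size=8, replace=False)
target = fields[0]
c_hat, *_ = np.linalg.lstsq(modes[:, sensors].T,
                            target[sensors] - mean_field[sensors], rcond=None)
recon = mean_field + c_hat @ modes
r2 = 1 - np.sum((recon - target) ** 2) / np.sum((target - target.mean()) ** 2)
print(round(r2, 3))  # near 1.0: 8 of 200 points recover the full field
```

Swapping the least-squares step for a BNN with HMC inference is what lets the actual framework attach posterior (epistemic) and noise (aleatoric) uncertainty to each reconstructed point.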
【9】ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification
链接:https://arxiv.org/abs/2512.03101
作者:Congjing Zhang,Feng Lin,Xinyi Zhao,Pei Guo,Wei Li,Lin Chen,Chaoyue Zhao,Shuai Huang
摘要:The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that in these complex environments, the anomalies are sometimes highly contextual and also ambiguous, and thereby, uncertainty quantification (UQ) is a crucial capacity for an MLLM-based VAD system to succeed. In this paper, we introduce our UQ-supported MLLM-based VAD framework called ALARM. ALARM integrates UQ with quality-assurance techniques like reasoning chain, self-reflection, and MLLM ensemble for robust and accurate performance and is designed based on a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations are conducted using the real-world smart-home benchmark data and wound image classification data, which shows ALARM's superior performance and its generic applicability across different domains for reliable decision-making.
【10】Physics-informed self-supervised learning for predictive modeling of coronary artery digital twins
链接:https://arxiv.org/abs/2512.03055
作者:Xiaowu Sun,Thabo Mahendiran,Ortal Senouf,Denise Auberson,Bernard De Bruyne,Stephane Fournier,Olivier Muller,Pascal Frossard,Emmanuel Abbe,Dorina Thanou
备注:19 pages
摘要:Cardiovascular disease is the leading global cause of mortality, with coronary artery disease (CAD) as its most prevalent form, necessitating early risk prediction. While 3D coronary artery digital twins reconstructed from imaging offer detailed anatomy for personalized assessment, their analysis relies on computationally intensive computational fluid dynamics (CFD), limiting scalability. Data-driven approaches are hindered by scarce labeled data and lack of physiological priors. To address this, we present PINS-CAD, a physics-informed self-supervised learning framework. It pre-trains graph neural networks on 200,000 synthetic coronary digital twins to predict pressure and flow, guided by 1D Navier-Stokes equations and pressure-drop laws, eliminating the need for CFD or labeled data. When fine-tuned on clinical data from 635 patients in the multicenter FAME2 study, PINS-CAD predicts future cardiovascular events with an AUC of 0.73, outperforming clinical risk scores and data-driven baselines. This demonstrates that physics-informed pretraining boosts sample efficiency and yields physiologically meaningful representations. Furthermore, PINS-CAD generates spatially resolved pressure and fractional flow reserve curves, providing interpretable biomarkers. By embedding physical priors into geometric deep learning, PINS-CAD transforms routine angiography into a simulation-free, physiology-aware framework for scalable, preventive cardiology.
【11】Ultra-Strong Gradient Diffusion MRI with Self-Supervised Learning for Prostate Cancer Characterization
链接:https://arxiv.org/abs/2512.03196
作者:Tanishq Patil,Snigdha Sen,Malwina Molendowska,Kieran G. Foley,Fabrizio Fasano,Mara Cercignani,Marco Palombo,Paddy J. Slator,Eleftheria Panagiotaki
备注:24 pages, 17 figures, 7 tables
摘要:Diffusion MRI (dMRI) enables non-invasive assessment of prostate microstructure but conventional metrics such as the Apparent Diffusion Coefficient in multiparametric MRI lack specificity to underlying histology. Integrating dMRI with the compartment-based biophysical VERDICT (Vascular, Extracellular, and Restricted Diffusion for Cytometry in Tumours) framework offers richer microstructural insights, though clinical gradient systems (40-80 mT/m) suffer from poor signal-to-noise ratio (SNR) at stronger diffusion weightings due to prolonged echo times. Ultra-strong gradients (up to 300 mT/m) can mitigate these limitations by improving SNR and contrast-to-noise ratios (CNR) but their adoption has until recently been limited to research environments due to challenges with peripheral nerve stimulation thresholds and gradient non-uniformity. This study investigates whether physics-informed self-supervised VERDICT (ssVERDICT) fitting applied to ultra-strong gradients enhances prostate cancer characterization relative to current clinical acquisitions. We developed enhanced ssVERDICT fitting approaches using dense multilayer perceptron (Dense MLP) and convolutional U-Net architectures, benchmarking them against non-linear least-squares (NLLS) fitting and Diffusion Kurtosis Imaging across clinical- to ultra-strong gradient systems. Dense ssVERDICT at ultra-strong gradient notably outperformed NLLS VERDICT, boosting median CNR by 47%, cutting inter-patient Coefficient of Variation by 52%, and reducing pooled f_ic variation by 50%. Overall, it delivered the highest CNR, the most stable parameter estimates, and the clearest tumour-normal contrast compared with conventional methods and clinical gradient systems. These findings highlight the potential of advanced gradient systems and deep learning-based modelling to improve non-invasive prostate cancer characterization and reduce unnecessary biopsies.
Transfer | Zero/Few/One-Shot | Adaptation (8 papers)
【1】Adaptive Identification and Modeling of Clinical Pathways with Process Mining
链接:https://arxiv.org/abs/2512.03787
作者:Francesco Vitale,Nicola Mazzocca
备注:Accepted to the 41st ACM/SIGAPP Symposium On Applied Computing (ACM SAC 2026)
摘要:Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improving care, reducing resource use, and accelerating patient recovery. However, manual modeling of these pathways based on clinical guidelines and domain expertise is difficult and may not reflect the actual best practices for different variations or combinations of diseases. We propose a two-phase modeling method using process mining, which extends the knowledge base of clinical pathways by leveraging conformance checking diagnostics. In the first phase, historical data of a given disease is collected to capture treatment in the form of a process model. In the second phase, new data is compared against the reference model to verify conformance. Based on the conformance checking results, the knowledge base can be expanded with more specific models tailored to new variants or disease combinations. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections with varying COVID-19 complications. The results show that our method enables expanding the knowledge base of clinical pathways with sufficient precision, peaking at 95.62% AUC while maintaining an arc-degree simplicity of 67.11%.
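The two-phase idea can be sketched with a minimal directly-follows conformance check. This is a deliberate simplification of the token-based or alignment diagnostics used in process mining, and the event names are invented for illustration.

```python
def mine_dfg(traces):
    """Phase 1: mine a directly-follows graph (DFG) from historical traces --
    a simple stand-in for the discovered clinical-pathway model."""
    edges = set()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges

def conformance(trace, edges):
    """Phase 2: fraction of direct-follows steps in a new trace that the
    reference model allows."""
    steps = list(zip(trace, trace[1:]))
    return sum(step in edges for step in steps) / len(steps)

history = [
    ["admit", "test", "diagnose", "treat", "discharge"],
    ["admit", "test", "treat", "discharge"],
]
model = mine_dfg(history)
ok = conformance(["admit", "test", "diagnose", "treat", "discharge"], model)
dev = conformance(["admit", "treat", "test", "discharge"], model)
print(ok, dev)  # 1.0 vs. 0.0: the low score flags a variant needing a new model
```

A trace whose score falls below a threshold would, in the proposed method, trigger the discovery of a more specific model for that variant or disease combination.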
【2】Adaptive sampling using variational autoencoder and reinforcement learning
链接:https://arxiv.org/abs/2512.03525
作者:Adil Rasheed,Mikael Aleksander Jansen Shahly,Muhammad Faisal Aftab
摘要:Compressed sensing (CS) enables sparse sampling but relies on generic bases and random measurements, limiting efficiency and reconstruction quality. Optimal sensor placement (OSP) uses historical data to design tailored sampling patterns, yet its fixed, linear bases cannot adapt to nonlinear or sample-specific variations. Generative model-based compressed sensing improves reconstruction using deep generative priors but still employs suboptimal random sampling. We propose an adaptive sparse sensing framework that couples a variational autoencoder prior with reinforcement learning to select measurements sequentially. Experiments show that this approach outperforms CS, OSP, and generative-model-based reconstruction from sparse measurements.
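The sequential-selection idea can be sketched with a hand-crafted stand-in for the learned policy: greedily pick the next measurement location that most shrinks the posterior over a latent code under a toy linear decoder. The paper instead uses a VAE prior and a trained RL agent, so everything below is illustrative.

```python
import numpy as np

# Toy linear "decoder": signals live in a 3-dim latent basis over 50 locations.
x = np.linspace(0, 1, 50)
basis = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x), np.ones_like(x)])
signal = np.array([1.0, -0.5, 0.3]) @ basis

# Greedy sequential sensing: at each step, choose the location whose
# measurement most reduces the posterior covariance of the latent code.
chosen = []
prec = np.eye(3)          # posterior precision under a unit Gaussian prior
noise = 1e-3              # assumed measurement noise variance
for _ in range(4):
    scores = []
    for j in range(basis.shape[1]):
        if j in chosen:
            scores.append(np.inf)
            continue
        a = basis[:, j:j + 1]
        scores.append(np.trace(np.linalg.inv(prec + a @ a.T / noise)))
    j_best = int(np.argmin(scores))
    chosen.append(j_best)
    a = basis[:, j_best:j_best + 1]
    prec = prec + a @ a.T / noise

# Reconstruct from the 4 adaptively chosen (noiseless) measurements.
z_hat, *_ = np.linalg.lstsq(basis[:, chosen].T, signal[chosen], rcond=None)
err = np.linalg.norm(z_hat @ basis - signal) / np.linalg.norm(signal)
print(round(err, 4))  # small relative error from 4 of 50 samples
```

Replacing the fixed basis with a VAE decoder and the greedy rule with a trained policy is what allows the proposed framework to adapt the sampling pattern to each individual sample.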
【3】A2G-QFL: Adaptive Aggregation with Two Gains in Quantum Federated learning
链接:https://arxiv.org/abs/2512.03363
作者:Shanika Iroshi Nanayakkara,Shiva Raj Pokhrel
备注:8 pages, 4 figures, QCNC 2026
摘要:Federated learning (FL) deployed over quantum-enabled and heterogeneous classical networks faces significant performance degradation due to uneven client quality, stochastic teleportation fidelity, device instability, and geometric mismatch between local and global models. Classical aggregation rules assume Euclidean topology and uniform communication reliability, limiting their suitability for emerging quantum federated systems. This paper introduces A2G (Adaptive Aggregation with Two Gains), a dual-gain framework that jointly regulates geometric blending through a geometry gain and modulates client importance using a QoS gain derived from teleportation fidelity, latency, and instability. We develop the A2G update rule, establish convergence guarantees under smoothness and bounded-variance assumptions, and show that A2G recovers FedAvg, QoS-aware averaging, and manifold-based aggregation as special cases. Experiments on a quantum-classical hybrid testbed demonstrate improved stability and higher accuracy under heterogeneous and noisy conditions.
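A sketch of the dual-gain shape of the update follows; the functional forms of both gains here are assumptions for illustration, whereas the paper derives specific rules with convergence guarantees.

```python
import numpy as np

def a2g_aggregate(global_w, client_ws, fidelity, latency, instability, eta=0.5):
    """A2G-shaped update (illustrative forms): a QoS gain reweights clients
    by link quality, and a geometry gain eta blends the weighted client
    average with the current global model instead of replacing it."""
    qos = np.asarray(fidelity) / (1.0 + np.asarray(latency) + np.asarray(instability))
    qos = qos / qos.sum()                                   # normalized QoS gain
    client_avg = sum(q * w for q, w in zip(qos, client_ws))
    return (1.0 - eta) * np.asarray(global_w) + eta * client_avg

g = np.zeros(4)
clients = [np.ones(4), 2 * np.ones(4), 10 * np.ones(4)]     # third client is an outlier
new_g = a2g_aggregate(g, clients, fidelity=[0.95, 0.90, 0.10],
                      latency=[0.1, 0.2, 0.1], instability=[0.0, 0.1, 0.9])
print(new_g.round(2))  # pulled toward reliable clients, damped by the blend
```

Note the special-case structure the abstract mentions: with a uniform QoS gain and eta = 1, the rule reduces to plain federated averaging.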
【4】ASPEN: An Adaptive Spectral Physics-Enabled Network for Ginzburg-Landau Dynamics
链接:https://arxiv.org/abs/2512.03290
作者:Julian Evan Chrisnanto,Nurfauzi Fadillah,Yulison Herry Chrisnanto
备注:15 pages, 7 figures
摘要:Physics-Informed Neural Networks (PINNs) have emerged as a powerful, mesh-free paradigm for solving partial differential equations (PDEs). However, they notoriously struggle with stiff, multi-scale, and nonlinear systems due to the inherent spectral bias of standard multilayer perceptron (MLP) architectures, which prevents them from adequately representing high-frequency components. In this work, we introduce the Adaptive Spectral Physics-Enabled Network (ASPEN), a novel architecture designed to overcome this critical limitation. ASPEN integrates an adaptive spectral layer with learnable Fourier features directly into the network's input stage. This mechanism allows the model to dynamically tune its own spectral basis during training, enabling it to efficiently learn and represent the precise frequency content required by the solution. We demonstrate the efficacy of ASPEN by applying it to the complex Ginzburg-Landau equation (CGLE), a canonical and challenging benchmark for nonlinear, stiff spatio-temporal dynamics. Our results show that a standard PINN architecture catastrophically fails on this problem, diverging into non-physical oscillations. In contrast, ASPEN successfully solves the CGLE with exceptional accuracy. The predicted solution is visually indistinguishable from the high-resolution ground truth, achieving a low median physics residual of 5.10 x 10^-3. Furthermore, we validate that ASPEN's solution is not only pointwise accurate but also physically consistent, correctly capturing emergent physical properties, including the rapid free energy relaxation and the long-term stability of the domain wall front. This work demonstrates that by incorporating an adaptive spectral basis, our framework provides a robust and physically-consistent solver for complex dynamical systems where standard PINNs fail, opening new options for machine learning in challenging physical domains.
【5】A Discrete Neural Operator with Adaptive Sampling for Surrogate Modeling of Parametric Transient Darcy Flows in Porous Media
链接:https://arxiv.org/abs/2512.03113
作者:Zhenglong Chen,Zhao Zhang,Xia Yan,Jiayu Zhai,Piyang Liu,Kai Zhang
摘要:This study proposes a new discrete neural operator for surrogate modeling of transient Darcy flow fields in heterogeneous porous media with random parameters. The new method integrates temporal encoding, operator learning and UNet to approximate the mapping between vector spaces of random parameter and spatiotemporal flow fields. The new discrete neural operator can achieve higher prediction accuracy than the SOTA attention-residual-UNet structure. Derived from the finite volume method, the transmissibility matrices rather than permeability is adopted as the inputs of surrogates to enhance the prediction accuracy further. To increase sampling efficiency, a generative latent space adaptive sampling method is developed employing the Gaussian mixture model for density estimation of generalization error. Validation is conducted on test cases of 2D/3D single- and two-phase Darcy flow field prediction. Results reveal consistent enhancement in prediction accuracy given limited training set.
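The adaptive-sampling loop can be sketched with a kernel-mixture stand-in for the fitted GMM: treat high-error latent points as mixture centers weighted by estimated generalization error, and draw new training samples around them. The bandwidth and the synthetic error field below are illustrative, not from the paper.

```python
import numpy as np

def sample_from_error_mixture(latent_pts, errors, n_new, bandwidth=0.1, seed=0):
    """Adaptive sampling sketch: resample latent points with probability
    proportional to their estimated generalization error, then jitter with
    Gaussian noise -- a kernel-mixture stand-in for the paper's GMM density."""
    rng = np.random.default_rng(seed)
    w = np.asarray(errors, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(latent_pts), size=n_new, p=w)
    return latent_pts[idx] + rng.normal(0.0, bandwidth, size=(n_new, latent_pts.shape[1]))

rng = np.random.default_rng(2)
latent = rng.uniform(-1, 1, size=(200, 2))
# Pretend the surrogate's generalization error is largest near the corner (1, 1).
err = np.exp(-np.sum((latent - 1.0) ** 2, axis=1))
new_pts = sample_from_error_mixture(latent, err, n_new=500)
print(new_pts.mean(axis=0).round(2))  # concentrated toward the high-error region
```

New parameter samples drawn this way would then be simulated and added to the training set, steering surrogate training toward the regions where it currently generalizes worst.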
【6】Delta Sampling: Data-Free Knowledge Transfer Across Diffusion Models
链接:https://arxiv.org/abs/2512.03056
作者:Zhidong Gao,Zimeng Pan,Yuhang Yao,Chenyue Xie,Wei Wei
摘要:Diffusion models like Stable Diffusion (SD) drive a vibrant open-source ecosystem including fully fine-tuned checkpoints and parameter-efficient adapters such as LoRA, LyCORIS, and ControlNet. However, these adaptation components are tightly coupled to a specific base model, making them difficult to reuse when the base model is upgraded (e.g., from SD 1.x to 2.x) due to substantial changes in model parameters and architecture. In this work, we propose Delta Sampling (DS), a novel method that enables knowledge transfer across base models with different architectures, without requiring access to the original training data. DS operates entirely at inference time by leveraging the delta: the difference in model predictions before and after the adaptation of a base model. This delta is then used to guide the denoising process of a new base model. We evaluate DS across various SD versions, demonstrating that DS achieves consistent improvements in creating desired effects (e.g., visual styles, semantic concepts, and structures) under different sampling strategies. These results highlight DS as an effective, plug-and-play mechanism for knowledge transfer in diffusion-based image synthesis. Code: https://github.com/Zhidong-Gao/DeltaSampling
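The inference-time mechanics reduce to a per-step prediction delta. Below is a toy sketch with scalar stand-ins for the noise predictors; these closures are illustrative placeholders, not Stable Diffusion calls.

```python
import numpy as np

# Toy denoisers standing in for diffusion-model epsilon predictions.
def eps_base_old(x, t):      # old base model
    return 0.5 * x

def eps_adapted_old(x, t):   # old base + fine-tuned adapter (e.g., a LoRA)
    return 0.5 * x + 0.3 * np.sin(x)

def eps_base_new(x, t):      # upgraded base model with different weights
    return 0.6 * x

def delta_sampling_step(x, t, scale=1.0):
    """One guided prediction: transfer the adapter's effect to the new base
    by adding the prediction delta measured on the old base."""
    delta = eps_adapted_old(x, t) - eps_base_old(x, t)  # adapter's contribution
    return eps_base_new(x, t) + scale * delta

x = np.linspace(-1, 1, 5)
guided = delta_sampling_step(x, t=10)
plain = eps_base_new(x, t=10)
print(np.allclose(guided - plain, 0.3 * np.sin(x)))  # delta carried over: True
```

Because the delta is computed purely from model outputs at each denoising step, the transfer needs no weight surgery and no access to the adapter's training data, matching the plug-and-play claim above.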
【7】Energy-Efficient Federated Learning via Adaptive Encoder Freezing for MRI-to-CT Conversion: A Green AI-Guided Research
链接:https://arxiv.org/abs/2512.03054
作者:Ciro Benito Raggio,Lucia Migliorelli,Nils Skupien,Mathias Krohmer Zabaleta,Oliver Blanck,Francesco Cicone,Giuseppe Lucio Cascini,Paolo Zaffino,Maria Francesca Spadea
备注:22 pages, 13 figures
摘要:Federated Learning (FL) holds the potential to advance equality in health by enabling diverse institutions to collaboratively train deep learning (DL) models, even with limited data. However, the significant resource requirements of FL often exclude centres with limited computational infrastructure, further widening existing healthcare disparities. To address this issue, we propose a Green AI-oriented adaptive layer-freezing strategy designed to reduce energy consumption and computational load while maintaining model performance. We tested our approach using different federated architectures for Magnetic Resonance Imaging (MRI)-to-Computed Tomography (CT) conversion. The proposed adaptive strategy optimises the federated training by selectively freezing the encoder weights based on the monitored relative difference of the encoder weights from round to round. A patience-based mechanism ensures that freezing only occurs when updates remain consistently minimal. The energy consumption and CO2eq emissions of the federation were tracked using the CodeCarbon library. Compared to equivalent non-frozen counterparts, our approach reduced training time, total energy consumption and CO2eq emissions by up to 23%. At the same time, the MRI-to-CT conversion performance was maintained, with only small variations in the Mean Absolute Error (MAE). Notably, for three out of the five evaluated architectures, no statistically significant differences were observed, while two architectures exhibited statistically significant improvements. Our work aligns with a research paradigm that promotes DL-based frameworks meeting clinical requirements while ensuring climatic, social, and economic sustainability. It lays the groundwork for novel FL evaluation frameworks, advancing privacy, equity and, more broadly, justice in AI-driven healthcare.
【8】Tada-DIP: Input-adaptive Deep Image Prior for One-shot 3D Image Reconstruction
链接:https://arxiv.org/abs/2512.03962
作者:Evan Bell,Shijun Liang,Ismail Alkhouri,Saiprasad Ravishankar
备注:6 pages, 8 figures, 2025 Asilomar Conference on Signals, Systems, and Computers. Code is available at github.com/evanbell02/Tada-DIP/
摘要:Deep Image Prior (DIP) has recently emerged as a promising one-shot neural-network based image reconstruction method. However, DIP has seen limited application to 3D image reconstruction problems. In this work, we introduce Tada-DIP, a highly effective and fully 3D DIP method for solving 3D inverse problems. By combining input-adaptation and denoising regularization, Tada-DIP produces high-quality 3D reconstructions while avoiding the overfitting phenomenon that is common in DIP. Experiments on sparse-view X-ray computed tomography reconstruction validate the effectiveness of the proposed method, demonstrating that Tada-DIP produces much better reconstructions than training-data-free baselines and achieves reconstruction performance on par with a supervised network trained using a large dataset with fully-sampled volumes.
Reinforcement Learning (6 papers)
【1】Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
链接:https://arxiv.org/abs/2512.03973
作者:Franki Nguimatsia Tiofack,Théotime Le Hellard,Fabian Schramm,Nicolas Perrin-Gilbert,Justin Carpentier
摘要:Offline reinforcement learning often relies on behavior regularization that enforces policies to remain close to the dataset distribution. However, such approaches fail to distinguish between high-value and low-value actions in their regularization components. We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor. The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value actions from the dataset rather than indiscriminately imitating all state-action pairs. In turn, the flow policy constrains the actor to remain aligned with the dataset's best transitions while maximizing the critic. This mutual guidance enables GFP to achieve state-of-the-art performance across 144 state and pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and challenging tasks. Webpage: https://simple-robotics.github.io/publications/guided-flow-policy/
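摘要中"加权行为克隆"的通用形式可以用优势加权的回归损失来示意。以下只是该类方法的一种常见形式假设(权重取 exp(A/β)),并非 GFP 论文的具体实现;β 与示例数值均为假设:

```python
import math

def weighted_bc_loss(actions_pred, actions_data, advantages, beta=1.0):
    """Advantage-weighted behavior cloning: squared error to dataset
    actions, weighted by exp(advantage / beta) so that high-value actions
    dominate the cloning signal (weights normalized for stability)."""
    weights = [math.exp(a / beta) for a in advantages]
    z = sum(weights) / len(weights)
    weights = [w / z for w in weights]
    losses = [
        w * (p - a) ** 2
        for w, p, a in zip(weights, actions_pred, actions_data)
    ]
    return sum(losses) / len(losses)

# Two transitions with the same prediction error: an error on the
# high-advantage action is penalized more than on the low-advantage one.
loss_hi = weighted_bc_loss([0.0, 1.0], [1.0, 1.0], [2.0, -2.0])
loss_lo = weighted_bc_loss([1.0, 0.0], [1.0, 1.0], [2.0, -2.0])
print(loss_hi > loss_lo)
```

当所有优势相等时,该损失退化为普通的均方误差行为克隆,这正是"不加区别地模仿所有状态-动作对"的情形。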
【2】Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware
标题:采用英特尔Loihi 2神经形态硬件的自主强化学习机器人控制
链接:https://arxiv.org/abs/2512.03911
作者:Kenneth Stewart,Roxana Leontie,Samantha Chapin,Joe Hays,Sumit Bam Shrestha,Carl Glen Henshaw
备注:Submitted for review at NICE 2026 (Neuro-Inspired Computational Elements) conference
摘要:我们提出了一个端到端的管道,用于在神经形态硬件上部署强化学习(RL)训练的人工神经网络(ANN),将其转换为尖峰Sigma-Delta神经网络(SDNN)。我们证明了完全在模拟中训练的ANN策略可以转换为与英特尔的Loihi 2架构兼容的SDNN,从而实现低延迟和高能效的推理。作为一个测试案例,我们使用RL策略控制Astrobee自由飞行机器人,该策略类似于先前经过太空在轨硬件验证的控制器。该策略使用修正线性单元(ReLU)进行训练,转换为SDNN并部署在英特尔的Loihi 2上,然后在NVIDIA的Omniverse Isaac Lab仿真环境中进行评估,以实现Astrobee运动的闭环控制。我们比较了GPU和Loihi 2之间的执行性能。研究结果突出了使用神经形态平台进行机器人控制的可行性,并为未来空间和陆地机器人应用中的节能、实时神经形态计算建立了一条途径。
摘要:We present an end-to-end pipeline for deploying reinforcement learning (RL) trained Artificial Neural Networks (ANNs) on neuromorphic hardware by converting them into spiking Sigma-Delta Neural Networks (SDNNs). We demonstrate that an ANN policy trained entirely in simulation can be transformed into an SDNN compatible with Intel's Loihi 2 architecture, enabling low-latency and energy-efficient inference. As a test case, we use an RL policy for controlling the Astrobee free-flying robot, similar to a previously hardware in space-validated controller. The policy, trained with Rectified Linear Units (ReLUs), is converted to an SDNN and deployed on Intel's Loihi 2, then evaluated in NVIDIA's Omniverse Isaac Lab simulation environment for closed-loop control of Astrobee's motion. We compare execution performance between GPU and Loihi 2. The results highlight the feasibility of using neuromorphic platforms for robotic control and establish a pathway toward energy-efficient, real-time neuromorphic computation in future space and terrestrial robotics applications.
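Sigma-Delta 编码的核心思想是只传输激活值的变化量,信号缓变时脉冲因此稀疏。下面是一个与具体硬件无关的极简示意(并非 Loihi 2 的实际 SDNN 实现,threshold 为假设参数):

```python
def sigma_delta_encode(signal, threshold=0.1):
    """Sigma-delta style encoding: emit a delta only when the change since
    the last transmitted value exceeds a threshold, else emit 0. Slowly
    varying activations therefore produce sparse 'spikes'."""
    spikes, last = [], 0.0
    for x in signal:
        delta = x - last
        if abs(delta) >= threshold:
            spikes.append(delta)
            last = x
        else:
            spikes.append(0.0)
    return spikes

def sigma_delta_decode(spikes):
    """Receiver integrates the deltas to recover the signal."""
    out, acc = [], 0.0
    for s in spikes:
        acc += s
        out.append(acc)
    return out

# A piecewise-constant activation trace: only two changes, so only two spikes.
signal = [0.0, 0.0, 0.5, 0.5, 0.5, 1.2, 1.2, 1.2, 1.2]
spikes = sigma_delta_encode(signal)
recon = sigma_delta_decode(spikes)
nonzero = sum(1 for s in spikes if s != 0.0)
print(nonzero, recon)
```

九个时间步只产生两个非零传输,体现了这类编码在缓变信号下的稀疏性,这正是其低能耗推理的来源之一。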
【3】Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA
标题:用于动态算法配置的深度强化学习:使用(1+($λ$,$λ$))-GA优化OneMax的案例研究
链接:https://arxiv.org/abs/2512.03805
作者:Tai Nguyen,Phong Le,André Biedenkapp,Carola Doerr,Nguyen Dang
备注:arXiv admin note: text overlap with arXiv:2502.20265
摘要:动态算法配置(DAC)研究参数化优化算法的控制策略的有效识别。许多研究利用强化学习(RL)中决策的鲁棒性来解决算法配置中的优化挑战。然而,将RL应用于DAC具有挑战性,通常需要广泛的领域专业知识。我们通过对OneMax实例上控制(1+($λ$,$λ$))-GA的种群大小参数的系统分析,对DAC中的深度RL算法进行了全面的研究。我们对DDQN和PPO的调查揭示了限制其在DAC中有效性的两个基本挑战:可扩展性退化和学习不稳定性。我们将这些问题追溯到两个主要原因:探索不足和规划范围覆盖,每一个都可以通过有针对性的解决方案有效地解决。为了解决探索不足的问题,我们引入了一种自适应的奖励转移机制,该机制利用奖励分布统计信息来增强DDQN代理探索,消除了对特定于实例的超参数调整的需要,并确保在不同的问题尺度上保持一致的有效性。在处理规划范围覆盖问题时,我们证明了无折扣学习在DDQN中有效地解决了它,而PPO面临着需要替代算法设计的基本方差问题。我们进一步分析了PPO的超参数依赖关系,表明虽然超参数优化增强了学习稳定性,但它在识别各种配置的有效策略方面始终存在不足。最后,我们证明了配备我们自适应奖励转移策略的DDQN实现了与理论推导的策略相当的性能,且采样效率大大提高,优于以前的DAC方法几个数量级。
摘要:Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges in algorithm configuration. However, applying RL to DAC is challenging and often requires extensive domain expertise. We conduct a comprehensive study of deep-RL algorithms in DAC through a systematic analysis of controlling the population size parameter of the (1+($λ$,$λ$))-GA on OneMax instances. Our investigation of DDQN and PPO reveals two fundamental challenges that limit their effectiveness in DAC: scalability degradation and learning instability. We trace these issues to two primary causes: under-exploration and planning horizon coverage, each of which can be effectively addressed through targeted solutions. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN agent exploration, eliminating the need for instance-specific hyperparameter tuning and ensuring consistent effectiveness across different problem scales. In dealing with the planning horizon coverage problem, we demonstrate that undiscounted learning effectively resolves it in DDQN, while PPO faces fundamental variance issues that necessitate alternative algorithmic designs. We further analyze the hyperparameter dependencies of PPO, showing that while hyperparameter optimization enhances learning stability, it consistently falls short in identifying effective policies across various configurations. Finally, we demonstrate that DDQN equipped with our adaptive reward shifting strategy achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by several orders of magnitude.
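摘要只说明奖励转移机制"利用奖励分布统计信息",未给出具体形式。下面是一种合理的假设性示意:用 Welford 在线统计维护奖励的均值与标准差,并按"均值 + c·标准差"对奖励做平移,使大多数平移后的奖励为负,配合零初始化的 Q 值可起到类似乐观初始化、鼓励探索的作用(c 与具体平移形式均为假设,并非原文机制):

```python
class AdaptiveRewardShifter:
    """Shift rewards by a running statistic of the observed reward
    distribution (Welford's online mean/variance). Subtracting
    mean + c * std makes most shifted rewards negative, which together
    with zero-initialized Q-values mimics optimistic initialization."""
    def __init__(self, c=1.0):
        self.c = c
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, r):
        self.n += 1
        d = r - self.mean
        self.mean += d / self.n
        self.m2 += d * (r - self.mean)

    def shift(self, r):
        if self.n < 2:
            return r  # not enough statistics yet
        std = (self.m2 / (self.n - 1)) ** 0.5
        return r - (self.mean + self.c * std)

shifter = AdaptiveRewardShifter()
for r in [1.0, 2.0, 3.0, 2.0, 1.0]:
    shifter.update(r)
print(shifter.shift(2.0))
```

由于平移量随观测到的奖励分布在线更新,无需针对具体问题实例手工调整偏移超参数,与摘要中"消除特定于实例的超参数调整"的动机一致。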
【4】Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing
标题:自主规划太空组装强化学习自由飞行(APIARY)国际空间站Astrobee测试
链接:https://arxiv.org/abs/2512.03729
作者:Samantha Chapin,Kenneth Stewart,Roxana Leontie,Carl Glen Henshaw
备注:iSpaRo 2025, Best Paper Award in Orbital Robotics
摘要:美国海军研究实验室(NRL)的自主规划空间装配强化学习自由飞行器(APIARY)实验开创了在太空零重力(零G)环境中使用强化学习(RL)控制自由飞行机器人的先河。据我们所知,2025年5月27日(星期二),APIARY团队利用国际空间站(ISS)上的NASA Astrobee机器人,首次在太空中实现了对自由飞行器的RL控制。在NVIDIA Isaac Lab仿真环境中,使用Actor-Critic近端策略优化(PPO)网络训练了一个强大的6自由度(DOF)控制策略,并对目标姿态和质量分布进行随机化,以增强鲁棒性。本文详细介绍了该实验的仿真测试、地面测试和飞行验证。这次在轨演示验证了RL在提高机器人自主性方面的变革潜力,能够快速开发和部署(在几分钟到几小时内)为太空探索、物流和实时任务需求量身定制的行为。
摘要:The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for control of free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27th 2025 the APIARY team conducted the first ever, to our knowledge, RL control of a free-flyer in space using the NASA Astrobee robot on-board the International Space Station (ISS). A robust 6-degrees of freedom (DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of this experiment. This on-orbit demonstration validates the transformative potential of RL for improving robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.
【5】SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning
标题:SPARK:无参考强化学习的逐步过程感知奖励
链接:https://arxiv.org/abs/2512.03244
作者:Salman Rahman,Sruthi Gorantla,Arpit Gupta,Swastik Roy,Nanyun Peng,Yang Liu
摘要:提供密集的步骤级反馈的过程奖励模型(PRM)已经显示出强化学习的前景,但它们的采用仍然受到昂贵的步骤级注释或地面真实参考的限制。我们提出SPARK:一个三阶段的框架,其中在第一阶段,生成器模型产生不同的解决方案,验证器模型使用并行缩放(自一致性)和顺序缩放(元批判)对其进行评估。在第二阶段,我们使用这些验证输出作为合成训练数据来微调生成式过程奖励模型,这些模型随后在训练期间用作奖励信号。我们发现,在步骤级别聚合多个独立验证会为过程奖励模型产生训练数据,这些模型超过了地面实况结果监督,在ProcessBench(用于识别数学推理中错误步骤的基准)上达到67.5 F1,而参考指导训练为66.4,GPT-4o为61.9。在最后阶段,我们将我们的生成式PRM与思维链验证(PRM-CoT)作为数学推理RL实验中的奖励模型,并引入格式约束以防止奖励黑客。使用Qwen2.5-Math-7B,我们在六个数学推理基准测试中实现了47.4%的平均准确率,优于基于地面事实的RLVR(43.9%)。我们的工作实现了超越地面实况方法的无参考RL训练,为缺乏可验证答案或可访问地面实况的领域开辟了新的可能性。
摘要:Process reward models (PRMs) that provide dense, step-level feedback have shown promise for reinforcement learning, yet their adoption remains limited by the need for expensive step-level annotations or ground truth references. We propose SPARK: a three-stage framework where in the first stage a generator model produces diverse solutions and a verifier model evaluates them using parallel scaling (self-consistency) and sequential scaling (meta-critique). In the second stage, we use these verification outputs as synthetic training data to fine-tune generative process reward models, which subsequently serve as reward signals during training. We show that aggregating multiple independent verifications at the step level produces training data for process reward models that surpass ground-truth outcome supervision, achieving 67.5 F1 on ProcessBench (a benchmark for identifying erroneous steps in mathematical reasoning) compared to 66.4 for reference-guided training and 61.9 for GPT-4o. In the final stage, we apply our generative PRM with chain-of-thought verification (PRM-CoT) as the reward model in RL experiments on mathematical reasoning, and introduce format constraints to prevent reward hacking. Using Qwen2.5-Math-7B, we achieve 47.4% average accuracy across six mathematical reasoning benchmarks, outperforming ground-truth-based RLVR (43.9%). Our work enables reference-free RL training that exceeds ground-truth methods, opening new possibilities for domains lacking verifiable answers or accessible ground truth.
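摘要中"在步骤级别聚合多个独立验证"可以用逐步骤多数投票来示意。这是对聚合方式的一种假设性简化(原文还结合了元批判等顺序扩展),判定标签与示例数据均为虚构:

```python
from collections import Counter

def aggregate_step_verdicts(verifications):
    """Aggregate several independent step-level verifications of one
    solution by per-step majority vote. Each verification is a list of
    'ok' / 'error' verdicts, one per reasoning step."""
    n_steps = len(verifications[0])
    consensus = []
    for step in range(n_steps):
        votes = Counter(v[step] for v in verifications)
        consensus.append(votes.most_common(1)[0][0])
    return consensus

def first_error(consensus):
    """Index of the first step judged erroneous, or None if all 'ok'."""
    for i, verdict in enumerate(consensus):
        if verdict == "error":
            return i
    return None

# Three verifiers disagree on step 2; majority vote settles it.
verifications = [
    ["ok", "ok", "error", "error"],
    ["ok", "ok", "error", "error"],
    ["ok", "ok", "ok", "error"],
]
consensus = aggregate_step_verdicts(verifications)
print(consensus, first_error(consensus))
```

聚合后的逐步骤共识标签即可作为训练生成式 PRM 的合成监督信号,而无需人工的步骤级注释。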
【6】Optimizing Life Sciences Agents in Real-Time using Reinforcement Learning
标题:使用强化学习实时优化生命科学代理
链接:https://arxiv.org/abs/2512.03065
作者:Nihir Chadderwala
摘要:生命科学中的生成式AI代理面临着一个关键挑战:确定从简单的事实问题到复杂的机制推理的各种查询的最佳方法。传统方法依赖于固定的规则或昂贵的标记训练数据,这两者都不能适应不断变化的条件或用户偏好。我们提出了一个新的框架,将AWS Strands Agents与Thompson Sampling上下文老虎机相结合,使AI代理能够仅从用户反馈中学习最佳决策策略。我们的系统优化了三个关键维度:生成策略选择(直接生成与思维链)、工具选择(文献检索、药物数据库等)以及领域路由(药理学、分子生物学、临床专家)。通过对生命科学查询的实证评估,我们证明与随机基线相比,用户满意度提高了15-30%,在20-30次查询后出现了清晰的学习模式。我们的方法不需要地面真值标签,能持续适应用户偏好,并为代理AI系统中的探索-利用困境提供了一个原则性的解决方案。
摘要:Generative AI agents in life sciences face a critical challenge: determining the optimal approach for diverse queries ranging from simple factoid questions to complex mechanistic reasoning. Traditional methods rely on fixed rules or expensive labeled training data, neither of which adapts to changing conditions or user preferences. We present a novel framework that combines AWS Strands Agents with Thompson Sampling contextual bandits to enable AI agents to learn optimal decision-making strategies from user feedback alone. Our system optimizes three key dimensions: generation strategy selection (direct vs. chain-of-thought), tool selection (literature search, drug databases, etc.), and domain routing (pharmacology, molecular biology, clinical specialists). Through empirical evaluation on life science queries, we demonstrate 15-30% improvement in user satisfaction compared to random baselines, with clear learning patterns emerging after 20-30 queries. Our approach requires no ground truth labels, adapts continuously to user preferences, and provides a principled solution to the exploration-exploitation dilemma in agentic AI systems.
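摘要中的 Thompson Sampling 可以用 Beta-Bernoulli 老虎机在"生成策略选择"这一维度上示意(此处省略了上下文特征;arm 名称与满意率均为虚构示例,并非该系统的真实接口):

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a discrete set of arms
    (e.g. generation strategies); feedback is binary (user satisfied or not)."""
    def __init__(self, arms):
        self.alpha = {a: 1.0 for a in arms}  # Beta(1, 1) prior per arm
        self.beta = {a: 1.0 for a in arms}

    def select(self):
        # sample a plausible success rate from each posterior, play the best
        samples = {a: random.betavariate(self.alpha[a], self.beta[a])
                   for a in self.alpha}
        return max(samples, key=samples.get)

    def update(self, arm, success):
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

random.seed(42)
true_rates = {"direct": 0.3, "chain_of_thought": 0.8}  # hypothetical rates
ts = ThompsonSampler(list(true_rates))
counts = {a: 0 for a in true_rates}
for _ in range(300):
    arm = ts.select()
    counts[arm] += 1
    ts.update(arm, random.random() < true_rates[arm])
print(counts)
```

随着二值反馈的累积,后验集中在满意率更高的策略上,采样自然地从探索过渡到利用,这正是摘要所说"仅从用户反馈中学习"且无需真值标签的原因。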
医学相关(3篇)
【1】EfficientECG: Cross-Attention with Feature Fusion for Efficient Electrocardiogram Classification
标题:Efficient心电图:交叉注意与特征融合实现高效心电图分类
链接:https://arxiv.org/abs/2512.03804
作者:Hanhui Deng,Xinglin Li,Jie Luo,Zhanpeng Jin,Di Wu
摘要:心电图是一种有用的诊断信号,可以通过测量心脏产生的电活动来检测心脏异常。由于其快速、无创和信息丰富的特点,心电图有许多新兴的应用。在本文中,我们研究了新的深度学习技术,以有效地管理和分析ECG数据,旨在准确快速地构建诊断模型,从而大大减轻医务工作者的负担。与现有的心电图模型表现出很高的误诊率不同,我们的深度学习方法可以通过端到端训练自动提取心电图数据的特征。具体来说,我们首先设计了EfficientECG,一个基于现有EfficientNet模型的准确且轻量级的ECG分析分类模型,它可以有效地处理具有各种导联类型的高频长序列心电数据。在此基础上,我们接下来提出了一种基于交叉注意力的EfficientECG特征融合模型,用于分析具有多个特征(例如性别和年龄)的多导联ECG数据。我们对代表性ECG数据集的评估验证了我们的模型在高精度、多特征融合和轻量化方面对最先进工作的优越性。
摘要:Electrocardiogram is a useful diagnostic signal that can detect cardiac abnormalities by measuring the electrical activity generated by the heart. Due to its rapid, non-invasive, and richly informative characteristics, ECG has many emerging applications. In this paper, we study novel deep learning technologies to effectively manage and analyse ECG data, with the aim of building a diagnostic model, accurately and quickly, that can substantially reduce the burden on medical workers. Unlike the existing ECG models that exhibit a high misdiagnosis rate, our deep learning approaches can automatically extract the features of ECG data through end-to-end training. Specifically, we first devise EfficientECG, an accurate and lightweight classification model for ECG analysis based on the existing EfficientNet model, which can effectively handle high-frequency long-sequence ECG data with various leading types. On top of that, we next propose a cross-attention-based feature fusion model of EfficientECG for analysing multi-lead ECG data with multiple features (e.g., gender and age). Our evaluations on representative ECG datasets validate the superiority of our model against state-of-the-art works in terms of high precision, multi-feature fusion, and lightweights.
【2】SweetDeep: A Wearable AI Solution for Real-Time Non-Invasive Diabetes Screening
标题:SweetDeep:用于实时无创糖尿病筛查的可穿戴人工智能解决方案
链接:https://arxiv.org/abs/2512.03471
作者:Ian Henriques,Lynda Elhassar,Sarvesh Relekar,Denis Walrave,Shayan Hassantabar,Vishu Ghanakota,Adel Laoui,Mahmoud Aich,Rafia Tir,Mohamed Zerguine,Samir Louafi,Moncef Kimouche,Emmanuel Cosson,Niraj K Jha
备注:12 pages, 6 figures. Submitted to the IEEE Journal of Biomedical and Health Informatics
摘要:全球2型糖尿病的增加强调了对可扩展和具有成本效益的筛查方法的需求。目前的诊断需要生物化学测定,这是侵入性的和昂贵的。消费级可穿戴设备的进步使得人们能够早期探索基于机器学习的疾病检测,但之前的研究仅限于受控环境。我们介绍了SweetDeep,这是一个紧凑的神经网络,它基于来自欧盟和中东北非地区285名(糖尿病和非糖尿病)参与者的生理和人口统计数据进行训练,这些数据是使用三星Galaxy Watch 7设备在六天的自由生活条件下收集的。每名参与者每天贡献多段2分钟的传感器记录,每人总计约20段记录。尽管包含不到3,000个参数,但SweetDeep在三折交叉验证下实现了82.5%的患者级准确度(82.1%宏F1,79.7%灵敏度,84.6%特异性),预期校准误差为5.5%。允许模型对不到10%的低置信度患者预测弃权后,对其余患者的准确率为84.5%。这些发现表明,将工程特征与轻量级架构相结合,可以支持在现实世界的可穿戴环境中准确、快速和可推广的2型糖尿病检测。
摘要:The global rise in type 2 diabetes underscores the need for scalable and cost-effective screening methods. Current diagnosis requires biochemical assays, which are invasive and costly. Advances in consumer wearables have enabled early explorations of machine learning-based disease detection, but prior studies were limited to controlled settings. We present SweetDeep, a compact neural network trained on physiological and demographic data from 285 (diabetic and non-diabetic) participants in the EU and MENA regions, collected using Samsung Galaxy Watch 7 devices in free-living conditions over six days. Each participant contributed multiple 2-minute sensor recordings per day, totaling approximately 20 recordings per individual. Despite comprising fewer than 3,000 parameters, SweetDeep achieves 82.5% patient-level accuracy (82.1% macro-F1, 79.7% sensitivity, 84.6% specificity) under three-fold cross-validation, with an expected calibration error of 5.5%. Allowing the model to abstain on less than 10% of low-confidence patient predictions yields an accuracy of 84.5% on the remaining patients. These findings demonstrate that combining engineered features with lightweight architectures can support accurate, rapid, and generalizable detection of type 2 diabetes in real-world wearable settings.
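摘要中"对低置信度预测弃权"的做法可以用一个简单的置信度阈值规则来示意(阈值 0.9 与示例数据均为假设,并非 SweetDeep 的实际校准流程):

```python
def selective_predict(probs, threshold=0.9):
    """Abstain when the top class probability is below `threshold`;
    otherwise predict the argmax class. Returns (prediction, abstained)."""
    p = max(probs)
    if p < threshold:
        return None, True
    return probs.index(p), False

def coverage_and_accuracy(prob_list, labels, threshold=0.9):
    """Coverage = fraction of samples kept; accuracy is computed on the
    kept samples only -- abstention trades coverage for accuracy."""
    kept, correct = 0, 0
    for probs, y in zip(prob_list, labels):
        pred, abstained = selective_predict(probs, threshold)
        if not abstained:
            kept += 1
            correct += int(pred == y)
    n = len(labels)
    return kept / n, (correct / kept if kept else 0.0)

# Hypothetical per-patient class probabilities (non-diabetic, diabetic).
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.60, 0.40]]
labels = [0, 1, 1, 0]
cov, acc = coverage_and_accuracy(probs, labels)
print(cov, acc)
```

在这个虚构的例子中,弃权掉两个低置信度样本后,剩余样本的准确率高于全体样本,与摘要中"弃权不到10%后准确率提升至84.5%"的取舍同理。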
【3】Bayesian Event-Based Model for Disease Subtype and Stage Inference
标题:用于疾病亚型和分期推断的贝叶斯基于事件模型
链接:https://arxiv.org/abs/2512.03467
作者:Hongtao Hao,Joseph L. Austerweil
备注:32 pages; machine learning for health symposium (2025); Proceedings of the 5th Machine Learning for Health Symposium in PMLR
摘要:慢性病的进展往往因患者而异。这种差异并非随机变化:疾病在患者之间的进展方式通常只有少数几种亚型。为了捕获这种结构化异质性,基于事件的亚型和分期推断模型(SuStaIn)主要从横截面数据中估计亚型的数量和每个亚型的疾病进展顺序,并将每个患者分配到一个亚型。它已被广泛应用于揭示许多疾病的亚型,并加深我们对它们的理解。但它的表现有多稳健?在本文中,我们开发了基于事件模型的一个原则性贝叶斯亚型变体(BEBMS),并在具有不同程度模型误设的各种合成数据实验中将其性能与SuStaIn进行比较。在排序、分期和亚型分配任务上,BEBMS的性能大大优于SuStaIn。此外,我们将BEBMS和SuStaIn应用于现实世界的阿尔茨海默病数据集。我们发现BEBMS的结果比SuStaIn更符合关于阿尔茨海默病进展的科学共识。
摘要:Chronic diseases often progress differently across patients. Rather than randomly varying, there are typically a small number of subtypes for how a disease progresses across patients. To capture this structured heterogeneity, the Subtype and Stage Inference Event-Based Model (SuStaIn) estimates the number of subtypes, the order of disease progression for each subtype, and assigns each patient to a subtype from primarily cross-sectional data. It has been widely applied to uncover the subtypes of many diseases and inform our understanding of them. But how robust is its performance? In this paper, we develop a principled Bayesian subtype variant of the event-based model (BEBMS) and compare its performance to SuStaIn in a variety of synthetic data experiments with varied levels of model misspecification. BEBMS substantially outperforms SuStaIn across ordering, staging, and subtype assignment tasks. Further, we apply BEBMS and SuStaIn to a real-world Alzheimer's data set. We find BEBMS has results that are more consistent with the scientific consensus of Alzheimer's disease progression than SuStaIn.
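基于事件的模型(EBM)家族的核心假设是:给定事件顺序,处于阶段 k 的患者其顺序中前 k 个生物标志物异常。下面用二值化生物标志物和对称观测噪声给出一个极简的分期似然示意(这是对 EBM 的教学式简化,并非 BEBMS 的贝叶斯采样实现;p_signal 与标志物名称均为假设):

```python
def ebm_stage_likelihood(biomarkers, ordering, p_signal=0.9):
    """For each stage k of a fixed event ordering, the first k events are
    abnormal; binary measurements flip with probability 1 - p_signal.
    Returns the maximum-likelihood stage for one patient."""
    n = len(ordering)
    best_stage, best_like = 0, -1.0
    for stage in range(n + 1):
        abnormal = set(ordering[:stage])
        like = 1.0
        for marker, value in biomarkers.items():
            expected = 1 if marker in abnormal else 0
            like *= p_signal if value == expected else (1 - p_signal)
        if like > best_like:
            best_stage, best_like = stage, like
    return best_stage

# Hypothetical ordering and patient: amyloid and tau abnormal, rest normal.
ordering = ["amyloid", "tau", "atrophy", "cognition"]
patient = {"amyloid": 1, "tau": 1, "atrophy": 0, "cognition": 0}
print(ebm_stage_likelihood(patient, ordering))
```

亚型推断相当于对多个候选顺序分别计算这一似然并比较;贝叶斯版本(如 BEBMS)则进一步对顺序、分期和亚型分配做联合后验推断,而非点估计。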
蒸馏|知识提取(1篇)
【1】Technical Report on Text Dataset Distillation
标题:文本数据集蒸馏技术报告
链接:https://arxiv.org/abs/2512.03967
作者:Keith Ando Ogawa,Bruno Lopes Yamamoto,Lucas Lauton de Alcantara,Victor Zacarias,Edson Bollis,Lucas Pellicer,Rosimeire Pereira Costa,Anna Helena Reali Costa,Artur Jordao
摘要:在视觉领域,数据集蒸馏是一种将大型数据集浓缩为较小的合成数据集的技术,该合成数据集在训练过程中表现出类似的结果。虽然图像数据提供了大量的蒸馏方法文献,但相比之下,文本数据集蒸馏的工作较少。文本数据集蒸馏最初是作为视觉世界的努力的适应而发展起来的,随着模态的特殊性成为明显的障碍,它上升为一个独立的研究分支。几个里程碑标志着这一领域的发展,例如引入使用Transformer模型的方法,生成离散合成文本,以及扩展到具有超过1B参数的仅解码器模型。尽管现代方法取得了重大进展,但该领域仍处于成熟阶段,在基准标准化,克服文本离散性的方法,处理复杂任务以及提供真实世界应用程序的明确示例方面仍有改进的空间。在本报告中,我们回顾了文本数据集蒸馏的过去和最近的进展,强调了不同的蒸馏策略,关键贡献和一般挑战。
摘要:In the vision domain, dataset distillation arises as a technique to condense a large dataset into a smaller synthetic one that exhibits a similar result in the training process. While image data presents an extensive literature of distillation methods, text dataset distillation has fewer works in comparison. Text dataset distillation initially grew as an adaptation of efforts from the vision universe, as the particularities of the modality became clear obstacles, it rose into a separate branch of research. Several milestones mark the development of this area, such as the introduction of methods that use transformer models, the generation of discrete synthetic text, and the scaling to decoder-only models with over 1B parameters. Despite major advances in modern approaches, the field remains in a maturing phase, with room for improvement on benchmarking standardization, approaches to overcome the discrete nature of text, handling complex tasks, and providing explicit examples of real-world applications. In this report, we review past and recent advances in dataset distillation for text, highlighting different distillation strategies, key contributions, and general challenges.
聚类(3篇)
【1】PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering
标题:PretopoMD:基于Pretopology的混合数据分层集群
链接:https://arxiv.org/abs/2512.03071
作者:Loup-Noe Levy,Guillaume Guerard,Sonia Djebali,Soufian Ben Amor
摘要:本文提出了一种新的基于预拓扑的算法,旨在解决在不进行降维的情况下聚类混合数据的挑战。利用析取范式,我们的方法制定了可定制的逻辑规则和可调整的超参数,允许用户定义的层次聚类构建,并为异构数据集提供量身定制的解决方案。通过层次树状图分析和聚类指标比较,我们的方法直接从原始数据中准确且可解释地划分聚类,保持了数据完整性,展示了卓越的性能。实证研究结果突出了该算法在构建有意义的聚类方面的鲁棒性,并揭示了其在克服聚类数据可解释性相关问题方面的潜力。这项工作的新颖之处在于它脱离了传统的降维技术,并创新地使用了逻辑规则,同时增强了聚类的形成和清晰度,从而为混合数据聚类的研究做出了重要贡献。
摘要:This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive Normal Form, our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction and facilitate tailored solutions for heterogeneous datasets. Through hierarchical dendrogram analysis and comparative clustering metrics, our method demonstrates superior performance by accurately and interpretably delineating clusters directly from raw data, thus preserving data integrity. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability. The novelty of this work lies in its departure from traditional dimensionality reduction techniques and its innovative use of logical rules that enhance both cluster formation and clarity, thereby contributing a significant advancement to the discourse on clustering mixed data.
【2】Mixed Data Clustering Survey and Challenges
标题:混合数据集群调查和挑战
链接:https://arxiv.org/abs/2512.03070
作者:Guillaume Guerard,Sonia Djebali
摘要:大数据范式的出现改变了行业管理和分析信息的方式,迎来了一个前所未有的数据量、速度和多样性的时代。在这种情况下,混合数据聚类已经成为一个关键的挑战,需要创新的方法,可以有效地利用异构数据类型,包括数值和分类变量。传统的聚类技术,通常是为同质数据集设计的,通常很难捕捉混合数据带来的额外复杂性,强调需要专门为这种设置量身定制的方法。分层和可解释的算法在这种情况下特别有价值,因为它们提供了结构化的、可解释的聚类结果,支持明智的决策。本文介绍了一种基于预拓扑空间的聚类方法。此外,通过与经典数值聚类算法和现有预拓扑方法进行基准比较,我们深入了解了所提出方法在大数据范式下的性能和有效性。
摘要:The advent of the big data paradigm has transformed how industries manage and analyze information, ushering in an era of unprecedented data volume, velocity, and variety. Within this landscape, mixed-data clustering has become a critical challenge, requiring innovative methods that can effectively exploit heterogeneous data types, including numerical and categorical variables. Traditional clustering techniques, typically designed for homogeneous datasets, often struggle to capture the additional complexity introduced by mixed data, underscoring the need for approaches specifically tailored to this setting. Hierarchical and explainable algorithms are particularly valuable in this context, as they provide structured, interpretable clustering results that support informed decision-making. This paper introduces a clustering method grounded in pretopological spaces. In addition, benchmarking against classical numerical clustering algorithms and existing pretopological approaches yields insights into the performance and effectiveness of the proposed method within the big data paradigm.
【3】Hierarchical clustering of complex energy systems using pretopology
标题:使用预拓结构实现复杂能源系统的分层集群
链接:https://arxiv.org/abs/2512.03069
作者:Loup-Noe Levy,Jeremie Bosom,Guillaume Guerard,Soufian Ben Amor,Marc Bui,Hai Tran
摘要:本文试图回答以下问题:如何对一个大型分布式区域内的能源消耗曲线进行建模和分类,以优化建筑物的用能管理?对成千上万的建筑物进行逐案深入审计将需要大量的时间和金钱以及大量的合格人员。因此,必须开发一种自动化方法,以建立一个相关且有效的推荐系统。为了解决这个问题,我们使用预拓扑对站点的消耗曲线进行建模,并利用预拓扑空间的性质开发了一种多准则层次分类算法,已在一个Python库中实现。为了评估结果,使用了三个数据集:在2D空间中生成的一组各种大小的点、一组生成的时间序列,以及来自法国能源公司的400个真实消耗站点的消耗时间序列。在点数据集上,该算法能够以点在空间中的位置和点的大小为参数来识别点簇。在生成的时间序列上,该算法能够使用Pearson相关性来识别时间序列聚类,其调整兰德指数(ARI)为1。
摘要:This article attempts to answer the following question: How to model and classify energy consumption profiles over a large distributed territory to optimize the management of buildings' consumption? Doing case-by-case in-depth auditing of thousands of buildings would require a massive amount of time and money as well as a significant number of qualified people. Thus, an automated method must be developed to establish a relevant and effective recommendation system. To answer this question, pretopology is used to model the sites' consumption profiles and a multi-criterion hierarchical classification algorithm, using the properties of pretopological space, has been developed in a Python library. To evaluate the results, three data sets are used: a generated set of dots of various sizes in a 2D space, a generated set of time series, and a set of consumption time series of 400 real consumption sites from a French energy company. On the point data set, the algorithm is able to identify the clusters of points using their position in space and their size as parameters. On the generated time series, the algorithm is able to identify the time series clusters using Pearson's correlation with an Adjusted Rand Index (ARI) of 1.
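预拓扑聚类的关键算子是伪闭包:反复把"在当前集合中拥有足够多邻居"的元素并入集合,直到不动点,得到的闭集即层次聚类的构件。下面是一个示意性实现(邻居关系与阈值 theta 均为假设,并非上述 Python 库的真实 API):

```python
def pseudo_closure(A, neighbors, theta=2):
    """Pretopology-style pseudo-closure: extend A by every element that
    has at least `theta` neighbors already inside A (one expansion step)."""
    out = set(A)
    for x, ns in neighbors.items():
        if x not in out and len(ns & A) >= theta:
            out.add(x)
    return out

def closure(A, neighbors, theta=2):
    """Iterate the pseudo-closure to a fixed point: the closed set
    generated by A, a building block of the hierarchical clusters."""
    current = set(A)
    while True:
        nxt = pseudo_closure(current, neighbors, theta)
        if nxt == current:
            return current
        current = nxt

# A small neighborhood structure: 'd' joins once 'b' and 'c' are inside,
# while 'e' stays out (only one neighbor ever in the set).
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": {"d"},
}
print(sorted(closure({"a", "b"}, neighbors)))
```

与拓扑闭包不同,伪闭包不要求幂等,因此逐步迭代本身就自然定义了一个由内向外生长的层次结构,这正是这类方法支持多准则层次聚类的原因。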
自动驾驶|车辆|车道检测等(2篇)
【1】Physics-Embedded Gaussian Process for Traffic State Estimation
标题:用于交通状态估计的物理嵌入高斯过程
链接:https://arxiv.org/abs/2512.04004
作者:Yanlin Chen,Kehua Chen,Yinhai Wang
摘要:当探测车辆渗透率低、观测在空间上稀疏时,交通状态估计(TSE)变得具有挑战性。纯数据驱动的方法缺乏物理解释,并且在观测数据稀疏时泛化能力差。相比之下,物理模型很难整合不确定性并捕捉交通的真正复杂性。为了弥合这一差距,最近的研究探索通过将物理结构嵌入高斯过程来将它们结合起来。这些方法通常通过伪观测将控制方程作为软约束引入,使得模型结构能够在变分框架内集成。然而,这些方法严重依赖于惩罚项调整,缺乏原则性的不确定性校准,这使得它们对模型误设敏感。在这项工作中,我们提出了一种新的物理嵌入高斯过程(PEGP),旨在在交通状态估计中将领域知识与数据驱动方法相结合,以解决这些限制。具体来说,我们通过显式应用线性化微分算子,设计了两个由经典交通流模型启发的多输出核。在HighD和NGSIM上的实验显示出相对于非物理基线的一致改进。PEGP-ARZ在稀疏观测下更为可靠,而PEGP-LWR在密集观测下实现更低的误差。消融研究进一步揭示,PEGP-ARZ残差与物理规律密切一致,并产生经过校准、可解释的不确定性,而PEGP-LWR残差更加正交,并产生几乎恒定的方差场。该PEGP框架结合了物理先验与不确定性量化,可为TSE提供可靠的支持。
摘要:Traffic state estimation (TSE) becomes challenging when probe-vehicle penetration is low and observations are spatially sparse. Pure data-driven methods lack physical explanations and have poor generalization when observed data is sparse. In contrast, physical models have difficulty integrating uncertainties and capturing the real complexity of traffic. To bridge this gap, recent studies have explored combining them by embedding physical structure into Gaussian process. These approaches typically introduce the governing equations as soft constraints through pseudo-observations, enabling the integration of model structure within a variational framework. However, these methods rely heavily on penalty tuning and lack principled uncertainty calibration, which makes them sensitive to model mis-specification. In this work, we address these limitations by presenting a novel Physics-Embedded Gaussian Process (PEGP), designed to integrate domain knowledge with data-driven methods in traffic state estimation. Specifically, we design two multi-output kernels informed by classic traffic flow models, constructed via the explicit application of the linearized differential operator. Experiments on HighD, NGSIM show consistent improvements over non-physics baselines. PEGP-ARZ proves more reliable under sparse observation, while PEGP-LWR achieves lower errors with denser observation. Ablation study further reveals that PEGP-ARZ residuals align closely with physics and yield calibrated, interpretable uncertainty, whereas PEGP-LWR residuals are more orthogonal and produce nearly constant variance fields. This PEGP framework combines physical priors, uncertainty quantification, which can provide reliable support for TSE.
【2】Unlocking the Invisible Urban Traffic Dynamics under Extreme Weather: A New Physics-Constrained Hamiltonian Learning Algorithm
标题:解锁极端天气下隐形城市交通动态:一种新的物理约束汉密尔顿学习算法
链接:https://arxiv.org/abs/2512.03744
作者:Xuhui Lin,Qiuchen Lu
摘要:城市交通系统面临着来自极端天气事件的日益严峻的恢复挑战,但目前的评估方法依赖于表面水平的恢复指标,而这些指标忽略了隐藏的结构性破坏。现有的方法不能区分真正的恢复和“假恢复”,其中流量指标正常化,但底层的系统动态永久降级。为了解决这个问题,一个新的物理约束的哈密顿学习算法相结合的“结构不可逆性检测”和“能量景观重建”已经开发出来。我们的方法提取低维状态表示,通过物理约束优化识别准哈密顿结构,并通过能量景观比较量化结构变化。对2021年伦敦极端降雨的分析表明,虽然表面指标完全恢复,但我们的算法检测到了传统监测遗漏的64.8%的结构性损坏。我们的框架提供了主动的结构风险评估工具,使基础设施投资基于真实的系统健康状况,而不是误导性的表面指标。
摘要:Urban transportation systems face increasing resilience challenges from extreme weather events, but current assessment methods rely on surface-level recovery indicators that miss hidden structural damage. Existing approaches cannot distinguish between true recovery and "false recovery," where traffic metrics normalize, but the underlying system dynamics permanently degrade. To address this, a new physics-constrained Hamiltonian learning algorithm combining "structural irreversibility detection" and "energy landscape reconstruction" has been developed. Our approach extracts low-dimensional state representations, identifies quasi-Hamiltonian structures through physics-constrained optimization, and quantifies structural changes via energy landscape comparison. Analysis of London's extreme rainfall in 2021 demonstrates that while surface indicators were fully recovered, our algorithm detected 64.8\% structural damage missed by traditional monitoring. Our framework provides tools for proactive structural risk assessment, enabling infrastructure investments based on true system health rather than misleading surface metrics.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Transmit Weights, Not Features: Orthogonal-Basis Aided Wireless Point-Cloud Transmission
标题:传输权重而非特征:正交基辅助的无线点云传输
链接:https://arxiv.org/abs/2512.03819
作者:Junlin Chang,Yubo Han,Hang Yue,John S Thompson,Rongke Liu
备注:5 pages, 5 figures
摘要:深度传感器的广泛采用大大降低了点云采集的门槛。本文提出了一种基于深度联合信源信道编码(DeepJSCC)的三维点云语义无线传输框架。发射机不发送原始特征,而是在接收机侧语义正交特征池上预测组合权重,从而实现紧凑的表示和鲁棒的重建。基于折叠的解码器将2D网格变形为3D,在保持几何保真度的同时加强流形连续性。经过倒角距离(CD)和正交正则化器的训练,该系统在不同信噪比(SNR)和带宽下于ModelNet40上进行评估。结果显示,在高带宽下性能与语义点云传输(SEPT)相当,在带宽受限的情况下有明显的增益,峰值信噪比(PSNR)和CD都有一致的改善。消融实验证实了正交化和折叠先验的好处。
摘要:The widespread adoption of depth sensors has substantially lowered the barrier to point-cloud acquisition. This letter proposes a semantic wireless transmission framework for three-dimensional (3D) point clouds built on Deep Joint Source-Channel Coding (DeepJSCC). Instead of sending raw features, the transmitter predicts combination weights over a receiver-side semantic orthogonal feature pool, enabling compact representations and robust reconstruction. A folding-based decoder deforms a 2D grid into 3D, enforcing manifold continuity while preserving geometric fidelity. Trained with Chamfer Distance (CD) and an orthogonality regularizer, the system is evaluated on ModelNet40 across varying Signal-to-Noise Ratios (SNRs) and bandwidths. Results show performance on par with SEmantic Point cloud Transmission (SEPT) at high bandwidth and clear gains in bandwidth-constrained regimes, with consistent improvements in both Peak Signal-to-Noise Ratio (PSNR) and CD. Ablation experiments confirm the benefits of orthogonalization and the folding prior.
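上文用作训练损失的倒角距离(CD)可以用一个朴素的 O(n²) 参考实现来示意(仅为定义层面的示意,实际点云训练通常采用向量化或近邻加速实现):

```python
def chamfer_distance(P, Q):
    """Symmetric Chamfer Distance between two point sets: for each point,
    the squared distance to its nearest neighbor in the other set,
    averaged per set and summed over both directions."""
    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def one_way(A, B):
        return sum(min(sq(a, b) for b in B) for a in A) / len(A)

    return one_way(P, Q) + one_way(Q, P)

# Two tiny 3D point sets: identical sets give CD = 0; a small perturbation
# of one point gives a correspondingly small CD.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
Q = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]
print(chamfer_distance(P, P), chamfer_distance(P, Q))
```

CD 对点的排列不敏感,只依赖集合间的最近邻几何关系,因此适合作为无序点云重建的保真度损失。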
联邦学习|隐私保护|加密(4篇)
【1】Over-the-Air Federated Learning: Rethinking Edge AI Through Signal Processing
标题:空中联邦学习:通过信号处理重新思考边缘人工智能
链接:https://arxiv.org/abs/2512.03719
作者:Seyed Mohammad Azimi-Abarghouyi,Carlo Fischione,Kaibin Huang
摘要:空中联邦学习(AirFL)是一种新兴的范式,它将无线信号处理和分布式机器学习紧密集成,以在网络边缘实现可扩展的AI。通过利用无线信号的叠加特性,AirFL同时执行学习过程的通信和模型聚合,从而显著降低延迟、带宽和能耗。本文提供了AirFL的教程式论述,提出了一种新的分类,将设计方法分为三类:CSIT感知、盲式和加权AirFL。我们提供了一份关于理论基础、性能分析、复杂性考虑、实际限制和前瞻性研究方向的全面指南。
摘要:Over-the-Air Federated Learning (AirFL) is an emerging paradigm that tightly integrates wireless signal processing and distributed machine learning to enable scalable AI at the network edge. By leveraging the superposition property of wireless signals, AirFL performs communication and model aggregation of the learning process simultaneously, significantly reducing latency, bandwidth, and energy consumption. This article offers a tutorial treatment of AirFL, presenting a novel classification into three design approaches: CSIT-aware, blind, and weighted AirFL. We provide a comprehensive guide to theoretical foundations, performance analysis, complexity considerations, practical limitations, and prospective research directions.
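文中"CSIT感知"一类 AirFL 的核心思想可以这样示意:每个设备用已知信道增益的倒数对模型更新进行预均衡,无线信道对所有发射信号自然求和,接收端再除以设备数,即在一次传输中同时完成通信与平均聚合。以下为标量实值信道的玩具模型(信道系数、噪声设置均为假设,忽略了功率约束等实际限制):

```python
import random

def airfl_aggregate(models, channels, noise_std=0.0, rng=None):
    """CSIT-aware over-the-air aggregation sketch: each device pre-scales
    its update by the inverse of its known channel gain; the channel sums
    all transmissions; the server divides by the number of devices."""
    rng = rng or random.Random(0)
    dim = len(models[0])
    received = [0.0] * dim
    for model, h in zip(models, channels):
        for i in range(dim):
            received[i] += h * (model[i] / h)  # pre-equalized transmission
    for i in range(dim):
        received[i] = received[i] / len(models) + rng.gauss(0, noise_std)
    return received

# Three devices with different channel gains; the server never sees the
# individual updates, only their superposition.
models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
channels = [0.8, 1.2, 0.5]
print(airfl_aggregate(models, channels))
```

在无噪声情形下接收端恰好得到各设备更新的均值,这正是叠加特性把"通信 + 聚合"压缩为单次传输的含义;盲式与加权 AirFL 则分别对应无 CSIT 预均衡和非均匀权重的变体。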
【2】Federated Learning and Trajectory Compression for Enhanced AIS Coverage
标题:联邦学习和轨迹压缩以增强AIS覆盖范围
链接:https://arxiv.org/abs/2512.03584
作者:Thomas Gräupl,Andreas Reisenbauer,Marcel Hecko,Anil Rasouli,Anita Graser,Melitta Dragaschnig,Axel Weissenfeld,Gilles Dejaegere,Mahmoud Sakr
摘要:本文介绍了VesselEdge系统,该系统利用联邦学习和带宽受限的轨迹压缩,通过扩展AIS覆盖范围来增强海上态势感知。VesselEdge将船舶转变为移动传感器,实现实时异常检测和通过低带宽连接进行高效数据传输。该系统集成了用于联邦学习的M3fed模型和用于轨迹压缩的BWC-DR-A算法,并对异常数据进行优先级排序。初步结果表明,VesselEdge在利用历史数据提高AIS覆盖范围和态势感知方面是有效的。
摘要:This paper presents the VesselEdge system, which leverages federated learning and bandwidth-constrained trajectory compression to enhance maritime situational awareness by extending AIS coverage. VesselEdge transforms vessels into mobile sensors, enabling real-time anomaly detection and efficient data transmission over low-bandwidth connections. The system integrates the M3fed model for federated learning and the BWC-DR-A algorithm for trajectory compression, prioritizing anomalous data. Preliminary results demonstrate the effectiveness of VesselEdge in improving AIS coverage and situational awareness using historical data.
【3】Single-Round Scalable Analytic Federated Learning
标题:单轮可扩展分析联邦学习
链接:https://arxiv.org/abs/2512.03336
作者:Alan T. L. Bacellar,Mustafa Munir,Felipe M. G. França,Priscila M. V. Lima,Radu Marculescu,Lizy K. John
摘要:联邦学习(FL)受到两个关键挑战的困扰:高通信开销和异构(非IID)数据的性能崩溃。分析FL(AFL)提供了一个单轮,数据分布不变的解决方案,但仅限于线性模型。随后的非线性方法,如DeepAFL,重新获得准确性,但牺牲了单轮优势。在这项工作中,我们打破了这种权衡。我们提出了SAFLe,一个框架,实现可扩展的非线性表达,通过引入一个结构化的头桶功能和稀疏,分组嵌入。我们证明了这种非线性结构在数学上等价于高维线性回归。这一关键等价性使得SAFLe可以用AFL的单发不变聚合律来求解。从经验上讲,SAFLe为分析FL建立了一种新的最先进的技术,在所有基准测试中的准确性显著优于线性AFL和多轮DeepAFL,展示了一种高效和可扩展的联合视觉解决方案。
摘要:Federated Learning (FL) is plagued by two key challenges: high communication overhead and performance collapse on heterogeneous (non-IID) data. Analytic FL (AFL) provides a single-round, data distribution invariant solution, but is limited to linear models. Subsequent non-linear approaches, like DeepAFL, regain accuracy but sacrifice the single-round benefit. In this work, we break this trade-off. We propose SAFLe, a framework that achieves scalable non-linear expressivity by introducing a structured head of bucketed features and sparse, grouped embeddings. We prove this non-linear architecture is mathematically equivalent to a high-dimensional linear regression. This key equivalence allows SAFLe to be solved with AFL's single-shot, invariant aggregation law. Empirically, SAFLe establishes a new state-of-the-art for analytic FL, significantly outperforming both linear AFL and multi-round DeepAFL in accuracy across all benchmarks, demonstrating a highly efficient and scalable solution for federated vision.
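摘要的关键在于"该非线性结构在数学上等价于高维线性回归,因而可用单轮解析聚合求解"。单轮解析联邦学习的一般思路可以用普通岭回归粗略示意(这只是该思想下的最小演示,并非SAFLe的原文实现):每个客户端只上传一次充分统计量 X^T X 和 X^T y,服务器求和后闭式求解,结果与集中式训练完全一致:

```python
import numpy as np

rng = np.random.default_rng(1)
d, lam = 5, 1e-3
w_true = rng.normal(size=d)

# 每个客户端只上传一次充分统计量(单轮通信)
XtX_sum, Xty_sum = np.zeros((d, d)), np.zeros(d)
X_parts, y_parts = [], []
for _ in range(3):                     # 3 个客户端
    X = rng.normal(size=(20, d))
    y = X @ w_true + 0.01 * rng.normal(size=20)
    XtX_sum += X.T @ X
    Xty_sum += X.T @ y
    X_parts.append(X)
    y_parts.append(y)

# 服务器闭式求解(岭回归),与集中式训练一致,体现"数据分布不变性"
w_fed = np.linalg.solve(XtX_sum + lam * np.eye(d), Xty_sum)

Xc, yc = np.vstack(X_parts), np.concatenate(y_parts)
w_central = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
print(bool(np.allclose(w_fed, w_central)))
```

因为充分统计量对客户端求和与对集中数据计算完全等价,所以无论数据如何在客户端间划分(IID 或非 IID),解析解都不变,这正是摘要所说"数据分布不变"的来源。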
【4】Multi-Frequency Federated Learning for Human Activity Recognition Using Head-Worn Sensors
标题:使用头戴传感器的多频联邦学习用于人类活动识别
链接:https://arxiv.org/abs/2512.03287
作者:Dario Fenoglio,Mohan Li,Davide Casnici,Matias Laporte,Shkurta Gashi,Silvia Santini,Martin Gjoreski,Marc Langheinrich
备注:8 pages, 2024 International Conference on Intelligent Environments (IE), 2024
摘要:人类活动识别(HAR)有益于包括健康和老年人护理在内的多种应用领域。传统的HAR需要构建依赖集中式用户数据的管道,由于必须将用户数据上传到中心服务器,这可能带来隐私问题。这项工作提出了多频率联邦学习(FL),以实现:(1)隐私感知的机器学习;(2)在采样频率不同的设备间联合学习机器学习模型。我们专注于头戴式设备(例如耳机和智能眼镜),与传统的基于智能手表或智能手机的HAR相比,这是一个相对未被探索的领域。结果表明,在两个数据集上相较于特定频率方法均有改进,预示着多频率FL-HAR任务前景广阔。所提网络的实现已公开,以供进一步研究和开发。
摘要:Human Activity Recognition (HAR) benefits various application domains, including health and elderly care. Traditional HAR involves constructing pipelines reliant on centralized user data, which can pose privacy concerns as they necessitate the uploading of user data to a centralized server. This work proposes multi-frequency Federated Learning (FL) to enable: (1) privacy-aware ML; (2) joint ML model learning across devices with varying sampling frequency. We focus on head-worn devices (e.g., earbuds and smart glasses), a relatively unexplored domain compared to traditional smartwatch- or smartphone-based HAR. Results have shown improvements on two datasets against frequency-specific approaches, indicating a promising future in the multi-frequency FL-HAR task. The proposed network's implementation is publicly available for further research and development.
推理|分析|理解|解释(8篇)
【1】Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
标题:划分,然后接地:调整帧选择以查询类型以理解长格式视频
链接:https://arxiv.org/abs/2512.04000
作者:Jialuo Li,Bin Li,Jiahao Li,Yan Lu
摘要:大型多模态模型(Large Multimodal Models,LMMs)在长格式视频理解中的应用受到有限的上下文长度和处理密集视频令牌的高昂计算成本的限制。因此,最近的研究集中在查询感知的帧选择方法上,而这类方法往往带来显著的计算开销。本文挑战了"这种复杂的搜索机制普遍必要"这一假设。我们首先确定并验证了一种区分全局查询和局部化查询的查询类型学。我们证明,虽然均匀采样对全局查询既有效又高效,但局部化查询确实需要查询感知的选择才能获得最佳性能。基于这一见解,我们提出了DIG,一个无需训练、根据查询类型调整策略的帧选择框架。具体来说,DIG对全局查询采用高效的均匀采样,同时对局部化查询激活一个专门的管道来提取与查询相关的帧。在三个长格式视频理解基准上的实验表明,即使将输入帧数扩展到256,DIG也始终优于现有基线,并稳健地提升LMM的性能。
摘要:The application of Large Multimodal Models (LMMs) to long-form video understanding is constrained by limited context lengths and the computationally prohibitive cost of processing dense video tokens. Consequently, recent research has focused on query-aware frame selection, methods that often incur significant computational overhead. This paper challenges the assumption that such complex search mechanisms are universally necessary. We first identify and validate a query typology distinguishing between global query and localized query. We demonstrate that while uniform sampling is both effective and efficient for global queries, localized queries indeed necessitate query-aware selection for optimal performance. Building on this insight, we propose DIG, a training-free frame selection framework that adapts its strategy based on the query type. Specifically, DIG employs efficient uniform sampling for global queries while activating a specialized pipeline to extract query-relevant frames for localized queries. Experiments on three long-form video understanding benchmarks demonstrate that DIG consistently outperforms existing baselines and robustly improves LMM performance, even when scaling the input frame count to 256.
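对于摘要中所说的全局查询,"均匀采样"基线本身非常简单,可按如下方式示意(帧数与预算均为假设值,仅说明思路):

```python
import numpy as np

def uniform_sample(num_frames, budget):
    """在 [0, num_frames) 上等间隔选取 budget 个帧索引。"""
    return np.linspace(0, num_frames - 1, num=budget).round().astype(int)

idx = uniform_sample(num_frames=1000, budget=8)
print(idx.tolist())   # 首尾帧必被选中,其余等间隔分布
```

这种采样不依赖查询内容,零额外开销;DIG 的要点正是只有局部化查询才值得再为其付出查询感知选择的代价。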
【2】Multi-Modal Opinion Integration for Financial Sentiment Analysis using Cross-Modal Attention
标题:使用跨模式注意力进行财务情绪分析的多模式意见整合
链接:https://arxiv.org/abs/2512.03464
作者:Yujing Liu,Chen Yang
摘要:近年来,金融舆情分析对于市场预测和风险评估越来越重要。然而,现有方法往往难以有效地整合不同的意见模态,并捕捉它们之间的细粒度交互。本文提出了一个端到端的深度学习框架,通过专为金融情绪分析设计的新型跨模态注意力机制,整合两种不同的金融观点模态:近因模态(及时观点)和流行模态(趋势观点)。虽然这两种模态都由文本数据组成,但它们代表了根本不同的信息渠道:近因驱动的市场更新与流行度驱动的集体情绪。我们的模型首先使用BERT(Chinese-wwm-ext)进行特征嵌入,然后采用我们提出的金融多头交叉注意力(FMHCA)结构,以促进这些不同意见模态之间的信息交换。处理后的特征经过Transformer层优化,并使用多模态因子化双线性池化进行融合,以分类为负面、中性和正面情绪。在涵盖837家公司的综合数据集上进行的广泛实验表明,我们的方法实现了83.5%的准确率,比包括BERT+Transformer在内的基线高出21个百分点。这些结果凸显了我们的框架在支持更准确的金融决策和风险管理方面的潜力。
摘要:In recent years, financial sentiment analysis of public opinion has become increasingly important for market forecasting and risk assessment. However, existing methods often struggle to effectively integrate diverse opinion modalities and capture fine-grained interactions across them. This paper proposes an end-to-end deep learning framework that integrates two distinct modalities of financial opinions: recency modality (timely opinions) and popularity modality (trending opinions), through a novel cross-modal attention mechanism specifically designed for financial sentiment analysis. While both modalities consist of textual data, they represent fundamentally different information channels: recency-driven market updates versus popularity-driven collective sentiment. Our model first uses BERT (Chinese-wwm-ext) for feature embedding and then employs our proposed Financial Multi-Head Cross-Attention (FMHCA) structure to facilitate information exchange between these distinct opinion modalities. The processed features are optimized through a transformer layer and fused using multimodal factored bilinear pooling for classification into negative, neutral, and positive sentiment. Extensive experiments on a comprehensive dataset covering 837 companies demonstrate that our approach achieves an accuracy of 83.5%, significantly outperforming baselines including BERT+Transformer by 21 percent. These results highlight the potential of our framework to support more accurate financial decision-making and risk management.
【3】Comparative algorithm performance evaluation and prediction for the maximum clique problem using instance space analysis
标题:使用实例空间分析的最大团问题比较算法性能评估和预测
链接:https://arxiv.org/abs/2512.03419
作者:Bharat Sharman,Elkafi Hassini
摘要:最大团问题是一个著名的基于图的组合优化问题,已有多种算法方法对其求解,但对问题实例的系统分析仍然稀少。本研究采用实例空间分析(ISA)方法系统地分析了该问题的实例空间,并评估和预测最先进(SOTA)算法的性能,包括精确方法、启发式方法和基于图神经网络(GNN)的方法。数据集由图机器学习研究中常用的TWITTER、COLLAB和IMDB-BINARY基准中的图实例编译而成。ISA使用了一组33个通用的和2个特定于问题的、多项式时间可计算的基于图的特征,其中包括若干谱特性。我们采用了一个同时考虑解质量和算法运行时间的综合性能度量。比较分析表明,精确算法Mixed Order Maximum Clique(MOMC)在所编译数据集构成的实例空间的约74.7%上表现最佳;Gurobi和CliSAT分别在实例空间的13.8%和11%上表现最佳。基于ISA的算法性能预测模型在从BHOSLIB和DIMACS数据集编译的34个具有挑战性的测试实例上运行,对最佳算法的top-1和top-2预测准确率分别为88%和97%。
摘要:The maximum clique problem, a well-known graph-based combinatorial optimization problem, has been addressed through various algorithmic approaches, though systematic analyses of the problem instances remain sparse. This study employs the instance space analysis (ISA) methodology to systematically analyze the instance space of this problem and assess & predict the performance of state-of-the-art (SOTA) algorithms, including exact, heuristic, and graph neural network (GNN)-based methods. A dataset was compiled using graph instances from TWITTER, COLLAB and IMDB-BINARY benchmarks commonly used in graph machine learning research. A set of 33 generic and 2 problem-specific polynomial-time-computable graph-based features, including several spectral properties, was employed for the ISA. A composite performance measure incorporating both solution quality and algorithm runtime was utilized. The comparative analysis demonstrated that the exact algorithm Mixed Order Maximum Clique (MOMC) exhibited superior performance across approximately 74.7% of the instance space constituted by the compiled dataset. Gurobi & CliSAT accounted for superior performance in 13.8% and 11% of the instance space, respectively. The ISA-based algorithm performance prediction model run on 34 challenging test instances compiled from the BHOSLIB and DIMACS datasets yielded top-1 and top-2 best performing algorithm prediction accuracies of 88% and 97%, respectively.
【4】Associating Healthcare Teamwork with Patient Outcomes for Predictive Analysis
标题:将医疗保健团队合作与患者结果关联起来进行预测分析
链接:https://arxiv.org/abs/2512.03296
作者:Hsiao-Ying Lu,Kwan-Liu Ma
摘要:癌症治疗结果不仅受到临床和人口统计学因素的影响,还受到医疗团队合作的影响。然而,以前的工作在很大程度上忽视了人类合作在塑造患者生存方面的潜在作用。本文提出了一种应用人工智能的方法,以揭示医疗保健专业人员(HCP)合作的影响-通过电子健康记录(EHR)系统捕获-对癌症患者的结果。我们将EHR介导的HCP相互作用建模为网络,并应用机器学习技术来检测嵌入这些合作中的患者生存预测信号。我们的模型经过交叉验证,以确保可推广性,我们通过识别与改善结果相关的关键网络特征来解释预测。重要的是,临床专家和文献验证了所确定的关键协作特征的相关性,增强了其在现实世界中应用的潜力。这项工作有助于建立一个实用的工作流程,利用协作和人工智能的数字痕迹来评估和改善基于团队的医疗保健。该方法有可能转移到涉及复杂协作的其他领域,并提供可操作的见解,以支持医疗保健服务中的数据知情干预。
摘要:Cancer treatment outcomes are influenced not only by clinical and demographic factors but also by the collaboration of healthcare teams. However, prior work has largely overlooked the potential role of human collaboration in shaping patient survival. This paper presents an applied AI approach to uncovering the impact of healthcare professionals' (HCPs) collaboration-captured through electronic health record (EHR) systems-on cancer patient outcomes. We model EHR-mediated HCP interactions as networks and apply machine learning techniques to detect predictive signals of patient survival embedded in these collaborations. Our models are cross validated to ensure generalizability, and we explain the predictions by identifying key network traits associated with improved outcomes. Importantly, clinical experts and literature validate the relevance of the identified crucial collaboration traits, reinforcing their potential for real-world applications. This work contributes to a practical workflow for leveraging digital traces of collaboration and AI to assess and improve team-based healthcare. The approach is potentially transferable to other domains involving complex collaboration and offers actionable insights to support data-informed interventions in healthcare delivery.
【5】Plantain: Plan-Answer Interleaved Reasoning
标题:Plantain:计划-回答交错推理
链接:https://arxiv.org/abs/2512.03176
作者:Anthony Liang,Jonathan Berant,Adam Fisch,Abhimanyu Goyal,Kalpesh Krishna,Jacob Eisenstein
摘要:推理模型在生成可见回复之前通常会花费大量时间思考。在此期间,它们不会给用户任何提示以表明其推理是否在正确的轨道上;如果推理有缺陷,用户也没有途径去阻止并纠正它们。这造成了一种令人沮丧却很常见的体验:模型从一个本可轻易纠正的错误前提出发进行推理,浪费了用户的时间。相比之下,人类说话者通常会执行轻量级、增量式的对话接地行为,以确保对话参与者处于同一页面上;在这里,我们要问:语言模型能否学会利用类似的行为?基于这一动机,我们提出了交错推理(IR),即模型在思考与呈现中间回复之间交替,作为标准"先思考再回答"方法的替代。通过更早地向用户提供有用信息,IR在不影响最终回复质量的前提下减少了感知延迟,即用户等待初始输出的时间。我们进一步介绍了交错推理的一个特化版本Plantain(Plan-Thought-Answer Interleaving,计划-思考-回答交错),其中第一个中间回复是执行任务的明确的分步计划。这种计划优先的策略允许用户对后续推理步骤进行干预和早期反馈。我们证明,在若干具有挑战性的数学推理和编码基准上,Plantain将pass@1提高了约6%,同时相对于"先思考再回答"基线,将首次响应时间减少了60%以上。
摘要:Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning is on the right track, and do not give the user any recourse to stop and correct them if their reasoning is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted while the model reasons from a false premise that could have easily been corrected. In contrast, human speakers typically perform lightweight, incremental grounding acts to ensure that participants in the conversation are on the same page; here we ask if language models can learn to leverage a similar type of behavior? With this motivation, we propose interleaved reasoning (IR), in which the model alternates between thinking and surfacing intermediate responses, as an alternative to the standard "think-then-answer" approach. By providing useful information to the user earlier, IR reduces perceived latency, the time a user waits for an initial output, without compromising the quality of the final response. We further introduce a specialization of interleaved reasoning, Plantain (Plan-Thought-Answer Interleaving), where the first intermediate response is an explicit, step-by-step plan for executing the task. This plan-first strategy allows for user intervention and early feedback for subsequent reasoning steps. We demonstrate that Plantain yields an ~6% improvement in pass@1 across several challenging math reasoning and coding benchmarks, while reducing time-to-first-response by over 60% relative to think-then-answer baselines.
【6】Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
标题:在推断时解锁扩散模型中隐藏的生物分子形态景观
链接:https://arxiv.org/abs/2512.03312
作者:Daniel D. Richman,Jessica Karaguesian,Carl-Mikael Suomivuori,Ron O. Dror
备注:Project page: https://github.com/drorlab/conformix
摘要:蛋白质等生物分子的功能取决于它们在各种结构或"构象"之间相互转换的能力。几十年来,研究人员一直致力于开发预测构象分布的计算方法;构象分布远比静态折叠结构更难通过实验确定。我们提出了ConforMix,一种推理时算法,它结合分类器引导、过滤和自由能估计来增强对构象分布的采样。我们的方法对扩散模型进行升级(无论其是为静态结构预测还是构象生成而训练),使其能够在无需预先了解主要自由度的情况下更高效地发现构象变异性。ConforMix与模型预训练方面的改进是正交的,即使是能完美重现玻尔兹曼分布的假想模型也会从中受益。值得注意的是,当应用于为静态结构预测训练的扩散模型时,ConforMix捕获了包括结构域运动、隐蔽口袋柔性和转运蛋白循环在内的结构变化,同时避免了非物理状态。对生物学关键蛋白质的案例研究证明了该方法的可扩展性、准确性和实用性。
摘要:The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to develop computational methods to predict the distribution of conformations, which is far harder to determine experimentally than a static folded structure. We present ConforMix, an inference-time algorithm that enhances sampling of conformational distributions using a combination of classifier guidance, filtering, and free energy estimation. Our approach upgrades diffusion models -- whether trained for static structure prediction or conformational generation -- to enable more efficient discovery of conformational variability without requiring prior knowledge of major degrees of freedom. ConforMix is orthogonal to improvements in model pretraining and would benefit even a hypothetical model that perfectly reproduced the Boltzmann distribution. Remarkably, when applied to a diffusion model trained for static structure prediction, ConforMix captures structural changes including domain motion, cryptic pocket flexibility, and transporter cycling, while avoiding unphysical states. Case studies of biologically critical proteins demonstrate the scalability, accuracy, and utility of this method.
【7】QGShap: Quantum Acceleration for Faithful GNN Explanations
标题:QGShap:面向忠实GNN解释的量子加速
链接:https://arxiv.org/abs/2512.03099
作者:Haribandhu Jena,Jyotirmaya Shivottam,Subhankar Mishra
备注:Accepted in the QC+AI Workshop at AAAI 2026
摘要:图神经网络(GNN)在药物发现、社交网络分析和推荐系统等关键领域已变得不可或缺,但其黑箱性质阻碍了在需要透明度和问责制的场景中的部署。虽然基于Shapley值的方法通过量化每个组件对预测的贡献来提供有数学原理的解释,但计算精确值需要评估$2^n$个联盟(或对$n!$种排列进行聚合),这对现实世界的图是难以处理的。现有的近似策略要么牺牲保真度,要么牺牲效率,限制了它们的实际效用。我们介绍了QGShap,一种量子计算方法,它利用幅度放大在联盟评估上实现二次加速,同时保持精确的Shapley计算。与经典的采样或代理方法不同,我们的方法对可处理的图规模提供完全忠实的解释,无需近似权衡。我们在合成图数据集上进行了实证评估,证明QGShap始终实现高保真度和解释准确性,在所有评估指标上匹配或超过经典方法的性能。这些结果共同表明,QGShap不仅保留了精确Shapley的忠实性,还提供了可解释、稳定且结构一致的解释,与GNN的底层图推理相一致。QGShap的实现可在https://github.com/smlab-niser/qgshap上获得。
摘要:Graph Neural Networks (GNNs) have become indispensable in critical domains such as drug discovery, social network analysis, and recommendation systems, yet their black-box nature hinders deployment in scenarios requiring transparency and accountability. While Shapley value-based methods offer mathematically principled explanations by quantifying each component's contribution to predictions, computing exact values requires evaluating $2^n$ coalitions (or aggregating over $n!$ permutations), which is intractable for real-world graphs. Existing approximation strategies sacrifice either fidelity or efficiency, limiting their practical utility. We introduce QGShap, a quantum computing approach that leverages amplitude amplification to achieve quadratic speedups in coalition evaluation while maintaining exact Shapley computation. Unlike classical sampling or surrogate methods, our approach provides fully faithful explanations without approximation trade-offs for tractable graph sizes. We conduct empirical evaluations on synthetic graph datasets, demonstrating that QGShap achieves consistently high fidelity and explanation accuracy, matching or exceeding the performance of classical methods across all evaluation metrics. These results collectively demonstrate that QGShap not only preserves exact Shapley faithfulness but also delivers interpretable, stable, and structurally consistent explanations that align with the underlying graph reasoning of GNNs. The implementation of QGShap is available at https://github.com/smlab-niser/qgshap.
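摘要中"计算精确Shapley值需要评估$2^n$个联盟"可以用经典的加权边际贡献公式直观演示。下面是一个与GNN无关的玩具示例(价值函数为虚构),仅用于说明这种指数级枚举本身:

```python
from itertools import combinations
from math import comb

def exact_shapley(n, value):
    """对 n 个参与者枚举全部 2^n 个联盟,按加权边际贡献计算精确 Shapley 值。"""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = 1.0 / (n * comb(n - 1, k))      # 按联盟大小加权
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

# 玩具价值函数(虚构):加性价值,外加参与者 0 与 1 之间的协同项
v = lambda S: sum(S) + (2.0 if {0, 1} <= S else 0.0)
phi = exact_shapley(3, v)
print(phi)   # 协同收益 2.0 由参与者 0 和 1 平分
```

即便 n=3 也要评估 8 个联盟;n 每加 1,联盟数翻倍,这正是 QGShap 借助幅度放大想要加速的瓶颈。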
【8】Performance Analysis of Quantum Support Vector Classifiers and Quantum Neural Networks
标题:量子支持向量分类器和量子神经网络的性能分析
链接:https://arxiv.org/abs/2512.03094
作者:Tomás Villalba-Ferreiro,Eduardo Mosqueira-Rey,Diego Alvarez-Estevez
备注:7 pages, 7 figures, conference
摘要:本研究探讨了量子支持向量分类器(QSVC)和量子神经网络(QNN)与机器学习任务的经典模型相比的性能。通过在Iris和MNIST-PCA数据集上评估这些模型,我们发现随着问题复杂性的增加,量子模型往往优于经典方法。虽然QSVC通常提供更一致的结果,但由于其更高的量子负载,QNN在更高复杂性的任务中表现出更优越的性能。此外,我们还分析了超参数调整的影响,表明特征映射和拟设(ansatz)配置显著影响模型的准确性。我们还比较了PennyLane和Qiskit框架,得出结论:对于我们的实现,Qiskit提供了更好的优化和效率。这些发现突出了量子机器学习(QML)在复杂分类问题上的潜力,并为模型选择和优化策略提供了见解
摘要:This study explores the performance of Quantum Support Vector Classifiers (QSVCs) and Quantum Neural Networks (QNNs) in comparison to classical models for machine learning tasks. By evaluating these models on the Iris and MNIST-PCA datasets, we find that quantum models tend to outperform classical approaches as the problem complexity increases. While QSVCs generally provide more consistent results, QNNs exhibit superior performance in higher-complexity tasks due to their increased quantum load. Additionally, we analyze the impact of hyperparameter tuning, showing that feature maps and ansatz configurations significantly influence model accuracy. We also compare the PennyLane and Qiskit frameworks, concluding that Qiskit provides better optimization and efficiency for our implementation. These findings highlight the potential of Quantum Machine Learning (QML) for complex classification problems and provide insights into model selection and optimization strategies
检测相关(7篇)
【1】Domain Feature Collapse: Implications for Out-of-Distribution Detection and Solutions
标题:域功能崩溃:对分发外检测和解决方案的影响
链接:https://arxiv.org/abs/2512.04034
作者:Hong Yang,Devroop Kar,Qi Yu,Alex Ororbia,Travis Desell
摘要:为什么最先进的OOD检测方法在单域数据集上训练模型时会出现灾难性的失败?我们通过信息论的视角为这一现象提供了第一个理论解释。我们证明了在单域数据上的监督学习不可避免地产生域特征崩溃,即I(x_d; z) = 0的表示,这意味着特定于域的信息被完全丢弃。这是信息瓶颈优化的基本结果:在单个域(例如医学图像)上训练的模型学会仅依赖类特定特征而丢弃域特征,从而在检测域外样本时导致灾难性失败(例如,在MNIST上仅达到53% FPR@95)。我们使用Fano不等式扩展了分析,以量化实际场景中的部分崩溃。为了验证我们的理论,我们引入了Domain Bench,一个单域数据集的基准,并证明通过域过滤(使用预训练表示)保持I(x_d; z) > 0可以解决这一失败模式。虽然域过滤本身在概念上是简单的,但它的有效性为我们的信息论框架提供了强有力的经验证据。我们的工作解释了一个令人困惑的经验现象,揭示了监督学习在狭窄领域的基本局限性,并对迁移学习以及何时微调或冻结预训练模型具有更广泛的影响。
摘要:Why do state-of-the-art OOD detection methods exhibit catastrophic failure when models are trained on single-domain datasets? We provide the first theoretical explanation for this phenomenon through the lens of information theory. We prove that supervised learning on single-domain data inevitably produces domain feature collapse -- representations where I(x_d; z) = 0, meaning domain-specific information is completely discarded. This is a fundamental consequence of information bottleneck optimization: models trained on single domains (e.g., medical images) learn to rely solely on class-specific features while discarding domain features, leading to catastrophic failure when detecting out-of-domain samples (e.g., achieving only 53% FPR@95 on MNIST). We extend our analysis using Fano's inequality to quantify partial collapse in practical scenarios. To validate our theory, we introduce Domain Bench, a benchmark of single-domain datasets, and demonstrate that preserving I(x_d; z) > 0 through domain filtering (using pretrained representations) resolves the failure mode. While domain filtering itself is conceptually straightforward, its effectiveness provides strong empirical evidence for our information-theoretic framework. Our work explains a puzzling empirical phenomenon, reveals fundamental limitations of supervised learning in narrow domains, and has broader implications for transfer learning and when to fine-tune versus freeze pretrained models.
【2】A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification
标题:用于实时恶意URL分类的混合深度学习和异常检测框架
链接:https://arxiv.org/abs/2512.03462
作者:Berkani Khaled,Zeraoulia Rafik
备注:14 pages,2 figures
摘要:恶意URL仍然是网络钓鱼、恶意软件和网络威胁的主要载体。本研究提出了一种混合深度学习框架,结合HashingVectorizer n-gram分析、SMOTE平衡、隔离森林异常过滤和轻量级神经网络分类器,用于实时URL分类。该多级管道处理来自开源存储库、带有统计特征(长度、点计数、熵)的URL,实现$O(NL + EBdh)$的训练复杂度和20 ms的预测延迟。实证评估得到96.4%的准确率、95.4%的F1分数和97.3%的ROC-AUC,优于CNN(94.8%)和SVM基线,并有50至100倍的加速。多语言Tkinter GUI(阿拉伯语/英语/法语)支持通过剪贴板集成进行实时威胁评估。该框架对混淆URL模式展示了优越的可扩展性和弹性。
摘要:Malicious URLs remain a primary vector for phishing, malware, and cyberthreats. This study proposes a hybrid deep learning framework combining \texttt{HashingVectorizer} n-gram analysis, SMOTE balancing, Isolation Forest anomaly filtering, and a lightweight neural network classifier for real-time URL classification. The multi-stage pipeline processes URLs from open-source repositories with statistical features (length, dot count, entropy), achieving $O(NL + EBdh)$ training complexity and a 20\,ms prediction latency. Empirical evaluation yields 96.4\% accuracy, 95.4\% F1-score, and 97.3\% ROC-AUC, outperforming CNN (94.8\%) and SVM baselines with a $50\!\times$--$100\!\times$ speedup (Table~\ref{tab:comp-complexity}). A multilingual Tkinter GUI (Arabic/English/French) enables real-time threat assessment with clipboard integration. The framework demonstrates superior scalability and resilience against obfuscated URL patterns.
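摘要所述管道的特征提取前端(n-gram哈希向量加上长度、点计数、熵等统计特征)可以用纯标准库粗略示意;哈希维度与n-gram长度为示意性假设,SMOTE、隔离森林与神经网络部分未包含在内:

```python
import math
from collections import Counter

def url_features(url, n=3, dim=64):
    """字符 n-gram 哈希向量,附加长度、点计数、香农熵三个统计特征。"""
    vec = [0.0] * dim
    for i in range(len(url) - n + 1):
        vec[hash(url[i:i + n]) % dim] += 1.0   # HashingVectorizer 式哈希技巧
    counts = Counter(url)
    probs = [c / len(url) for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return vec + [float(len(url)), float(url.count(".")), entropy]

f = url_features("http://example.com/login")
print(len(f))   # 64 维哈希特征 + 3 维统计特征 = 67
```

哈希技巧让特征维度与词表大小无关,这也是该管道能保持低预测延迟的原因之一。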
【3】MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems
标题:MAGE-ID:一个多模态的入侵检测系统生成框架
链接:https://arxiv.org/abs/2512.03375
作者:Mahdi Arab Loodaricheh,Mohammad Hossein Manshaei,Anita Raja
摘要:由于异构网络流量、不断变化的网络威胁以及良性流量和攻击流量之间明显的数据不平衡,现代入侵检测系统(IDS)面临着严峻的挑战。虽然生成模型在数据增强方面表现出了希望,但现有方法仅限于单一模式,无法捕获跨域依赖关系。本文介绍了MAGE-ID(多模态攻击发生器入侵检测),一个基于扩散的生成框架,通过一个统一的潜在先验耦合表流特征与其变换后的图像。通过联合训练Transformer和基于CNN的变分编码器与EDM风格的去噪器,MAGE-ID实现了平衡和连贯的多模态合成。对CIC-IDS-2017和NSL-KDD的评估表明,与TabSyn和TabDDPM相比,在保真度、多样性和下游检测性能方面都有显着改进,突出了MAGE-ID对多模式IDS增强的有效性。
摘要:Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise in data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their transformed images through a unified latent prior. By jointly training Transformer and CNN-based variational encoders with an EDM style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over TabSyn and TabDDPM, highlighting the effectiveness of MAGE-ID for multimodal IDS augmentation.
【4】PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing Imagery
标题:PyroFocus:一种在多光谱遥感图像中实时检测野火的深度学习方法
链接:https://arxiv.org/abs/2512.03257
作者:Mark Moussa,Andre Williams,Seth Roffe,Douglas Morton
摘要:快速准确的野火检测对于应急响应和环境管理至关重要。在机载和星载任务中,实时算法必须区分无火、有火和火后条件,并估计火灾强度。多光谱和高光谱热成像仪提供了丰富的光谱信息,但高数据维数和有限的机载资源使实时处理具有挑战性。随着野火发生频率和严重程度的增加,对低延迟且计算高效的机载检测方法的需求至关重要。 我们对多种深度学习架构进行了系统评估,包括自定义卷积神经网络(CNN)和基于Transformer的模型,用于多类火灾分类。我们还介绍了PyroFocus,这是一个两阶段管道,先进行火灾分类,再进行火灾辐射功率(FRP)回归或分割,以减少机载部署的推理时间和计算成本。我们使用来自NASA的MODIS/ASTER机载模拟器(MASTER)的数据(该仪器与下一代火灾探测传感器类似),比较了准确性、推理延迟和资源效率。 实验结果表明,所提出的两阶段管道在速度和准确性之间实现了良好的权衡,展示了在未来野火监测任务中进行实时边缘部署的巨大潜力。
摘要:Rapid and accurate wildfire detection is crucial for emergency response and environmental management. In airborne and spaceborne missions, real-time algorithms must distinguish between no fire, active fire, and post-fire conditions, and estimate fire intensity. Multispectral and hyperspectral thermal imagers provide rich spectral information, but high data dimensionality and limited onboard resources make real-time processing challenging. As wildfires increase in frequency and severity, the need for low-latency and computationally efficient onboard detection methods is critical. We present a systematic evaluation of multiple deep learning architectures, including custom Convolutional Neural Networks (CNNs) and Transformer-based models, for multi-class fire classification. We also introduce PyroFocus, a two-stage pipeline that performs fire classification followed by fire radiative power (FRP) regression or segmentation to reduce inference time and computational cost for onboard deployment. Using data from NASA's MODIS/ASTER Airborne Simulator (MASTER), which is similar to a next-generation fire detection sensor, we compare accuracy, inference latency, and resource efficiency. Experimental results show that the proposed two-stage pipeline achieves strong trade-offs between speed and accuracy, demonstrating significant potential for real-time edge deployment in future wildfire monitoring missions.
【5】Contrastive Deep Learning for Variant Detection in Wastewater Genomic Sequencing
标题:对比深度学习用于废水基因组测序中的变体检测
链接:https://arxiv.org/abs/2512.03158
作者:Adele Chinda,Richmond Azumah,Hemanth Demakethepalli Venkateswara
备注:13 pages, 4 figures
摘要:基于废水的基因组监测已成为人群水平病毒监测的有力工具,为整个社区流行的病毒变异提供了全面的见解。然而,这种方法面临着来自高测序噪声、低病毒覆盖率、片段化读段以及完全缺乏标记变体注释的显著计算挑战。传统的基于参考的变体检出管道难以应对新突变,并且需要大量计算资源。我们提出了一个全面的框架,使用向量量化变分自动编码器(VQ-VAE)进行无监督病毒变体检测,它从k-mer标记化的序列中学习基因组模式的离散码本,而无需参考基因组或变体标签。我们的方法扩展了基础VQ-VAE架构,使用掩码重建预训练来增强对缺失数据的鲁棒性,并通过对比学习获得高判别性的嵌入。在包含大约100,000个读段的SARS-CoV-2废水测序数据上进行评估,我们的VQ-VAE实现了99.52%的平均标记级准确率和56.33%的精确序列匹配率,同时保持19.73%的码本利用率(512个代码中有101个处于活跃状态),证明了高效的离散表示学习。使用不同投影维度的对比微调产生了实质性的聚类改进:64维嵌入实现了+35%的Silhouette分数改进(0.31至0.42),而128维嵌入实现了+42%的改进(0.31至0.44),清楚地表明了嵌入维度对变体区分能力的影响。我们的免参考框架为基因组监测提供了一种可扩展、可解释的方法,可直接应用于公共卫生监测。
摘要:Wastewater-based genomic surveillance has emerged as a powerful tool for population-level viral monitoring, offering comprehensive insights into circulating viral variants across entire communities. However, this approach faces significant computational challenges stemming from high sequencing noise, low viral coverage, fragmented reads, and the complete absence of labeled variant annotations. Traditional reference-based variant calling pipelines struggle with novel mutations and require extensive computational resources. We present a comprehensive framework for unsupervised viral variant detection using Vector-Quantized Variational Autoencoders (VQ-VAE) that learns discrete codebooks of genomic patterns from k-mer tokenized sequences without requiring reference genomes or variant labels. Our approach extends the base VQ-VAE architecture with masked reconstruction pretraining for robustness to missing data and contrastive learning for highly discriminative embeddings. Evaluated on SARS-CoV-2 wastewater sequencing data comprising approximately 100,000 reads, our VQ-VAE achieves 99.52% mean token-level accuracy and 56.33% exact sequence match rate while maintaining 19.73% codebook utilization (101 of 512 codes active), demonstrating efficient discrete representation learning. Contrastive fine-tuning with different projection dimensions yields substantial clustering improvements: 64-dimensional embeddings achieve +35% Silhouette score improvement (0.31 to 0.42), while 128-dimensional embeddings achieve +42% improvement (0.31 to 0.44), clearly demonstrating the impact of embedding dimensionality on variant discrimination capability. Our reference-free framework provides a scalable, interpretable approach to genomic surveillance with direct applications to public health monitoring.
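摘要中"对序列进行k-mer标记化后再学习离散码本"的第一步,可以用如下最小示例说明(词表构造方式为假设,并非原文实现):

```python
from itertools import product

def kmer_tokenize(seq, k=3):
    """把 DNA 序列切成重叠 k-mer,并映射到固定词表中的整数 id。"""
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    return [vocab[seq[i:i + k]] for i in range(len(seq) - k + 1)]

tokens = kmer_tokenize("ACGTAC")
print(tokens)   # 4 个重叠 3-mer,词表大小 4^3 = 64
```

标记化之后,序列就成了定长词表上的整数序列,可以像文本一样送入 VQ-VAE 的编码器。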
【6】Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92%
标题:检测金融中的人工智能幻觉:信息论方法将幻觉率降低92%
链接:https://arxiv.org/abs/2512.03107
作者:Mainak Singha
备注:17 pages, 7 figures. Information-theoretic, hallucination detector for financial application. Feedback from researchers and practitioners is welcome
摘要:大型语言模型(LLM)产生流畅但不受支持的答案-幻觉-限制了高风险领域的安全部署。我们提出了ECLIPSE,一个框架,将幻觉视为模型的语义熵和可用证据的能力之间的不匹配。我们结合熵估计通过多样本聚类与一种新的困惑分解,衡量模型如何使用检索到的证据。我们证明,在温和的条件下,由此产生的熵容量目标是严格凸的唯一稳定的最佳。我们使用GPT-3.5-turbo(n=200个具有合成幻觉的平衡样本)对受控的财务问题回答数据集进行评估,其中ECLIPSE达到0.89的ROC AUC和0.90的平均精度,大大优于仅语义熵基线(AUC 0.50)。使用Claude-3-Haiku进行的受控消融(缺乏标记级对数概率)显示AUC降至0.59,系数幅度降低95% -表明ECLIPSE是一种logprob-native机制,其有效性取决于校准的标记级不确定性。困惑分解功能表现出最大的学习系数,证实证据利用是幻觉检测的核心。我们将这项工作定位为受控机制研究;跨领域和自然发生的幻觉的更广泛验证仍然是未来的工作。
摘要:Large language models (LLMs) produce fluent but unsupported answers - hallucinations - limiting safe deployment in high-stakes domains. We propose ECLIPSE, a framework that treats hallucination as a mismatch between a model's semantic entropy and the capacity of available evidence. We combine entropy estimation via multi-sample clustering with a novel perplexity decomposition that measures how models use retrieved evidence. We prove that under mild conditions, the resulting entropy-capacity objective is strictly convex with a unique stable optimum. We evaluate on a controlled financial question answering dataset with GPT-3.5-turbo (n=200 balanced samples with synthetic hallucinations), where ECLIPSE achieves ROC AUC of 0.89 and average precision of 0.90, substantially outperforming a semantic entropy-only baseline (AUC 0.50). A controlled ablation with Claude-3-Haiku, which lacks token-level log probabilities, shows AUC dropping to 0.59 with coefficient magnitudes decreasing by 95% - demonstrating that ECLIPSE is a logprob-native mechanism whose effectiveness depends on calibrated token-level uncertainties. The perplexity decomposition features exhibit the largest learned coefficients, confirming that evidence utilization is central to hallucination detection. We position this work as a controlled mechanism study; broader validation across domains and naturally occurring hallucinations remains future work.
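ECLIPSE所依赖的"通过多样本聚类估计语义熵"可以用一个极简版本示意:对同一问题多次采样回答,按簇频率计算香农熵(这里用字符串精确匹配代替真正的语义聚类,仅用于说明熵的计算方式,并非原文实现):

```python
import math
from collections import Counter

def semantic_entropy(samples):
    """把多次采样的回答分到语义簇后,对簇频率计算香农熵。"""
    clusters = Counter(samples)          # 示意:精确匹配当作"语义簇"
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in clusters.values())

# 对同一问题采样 4 次:回答越不一致,语义熵越高,越可能出现幻觉
consistent = ["42", "42", "42", "42"]
uncertain = ["42", "41", "44", "7"]
print(semantic_entropy(consistent), semantic_entropy(uncertain))
```

ECLIPSE 的核心主张是:仅有高语义熵还不够,还要把熵与检索证据所能支撑的"容量"对比,才能区分合理的不确定与幻觉。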
【7】Novelty detection on path space
标题:路径空间的新奇检测
链接:https://arxiv.org/abs/2512.03243
作者:Ioannis Gasteratos,Antoine Jacquier,Maud Lemercier,Terry Lyons,Cristopher Salvi
摘要:我们将路径空间上的新颖性检测表述为一个基于签名检验统计量的假设检验问题。利用Gasteratos和Jacquier(2023)的运输成本不等式,我们得到了假阳性率的尾部界,它超越高斯测度,推广到具有光滑有界向量场的RDE解的律,并给出分位数和p值的估计。利用洗牌积(shuffle product),我们以期望签名导出了条件风险价值(CVaR)光滑代理的精确公式,由此得到一类新的、优化光滑CVaR目标的单类SVM(one-class SVM)算法。随后,我们对具有有限一阶矩的备择假设建立了第II类错误的下界,在参考测度与备择测度相互绝对连续时给出一般的功效界。最后,我们在合成反常扩散数据和真实分子生物学数据上数值评估了基于签名的检验统计量的第I类错误和统计功效。
摘要:We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded vector fields, yielding estimates of quantiles and p-values. Exploiting the shuffle product, we derive exact formulae for smooth surrogates of conditional value-at-risk (CVaR) in terms of expected signatures, leading to new one-class SVM algorithms optimising smooth CVaR objectives. We then establish lower bounds on type-$\mathrm{II}$ error for alternatives with finite first moment, giving general power bounds when the reference measure and the alternative are absolutely continuous with respect to each other. Finally, we evaluate numerically the type-$\mathrm{I}$ error and statistical power of signature-based test statistic, using synthetic anomalous diffusion data and real-world molecular biology data.
分类|识别(4篇)
【1】HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
标题:HieroGlyphTranslator:自动识别埃及象形文字并将其翻译成英语
链接:https://arxiv.org/abs/2512.03817
作者:Ahmed Nasser,Marwan Mohamed,Alaa Sherif,Basmala Mahmoud,Shereen Yehia,Asmaa Saad,Mariam S. El-Rahmany,Ensaf H. Mohamed
摘要:埃及象形文字是古埃及的书写系统,完全由图画符号构成。将这些字形翻译成英语面临多种挑战,例如同一个符号可能有多种含义。深度学习翻译应用正在迅速发展,取得了影响深远的显著成果。在这项研究中,我们提出了一种将古埃及象形文字从图像自动识别并翻译为英语的方法。研究使用两个数据集分别用于分类和翻译:Morris Franken数据集和EgyptianTranslation数据集。我们的方法分为三个阶段:分割(使用Contour和Detectron2)、将符号映射到Gardiner代码,以及翻译(使用CNN模型)。该模型的BLEU得分为42.2,与以往研究相比是显著的结果。
摘要:Egyptian hieroglyphs, the ancient Egyptian writing system, are composed entirely of drawings. Translating these glyphs into English poses various challenges, including the fact that a single glyph can have multiple meanings. Deep learning translation applications are evolving rapidly, producing remarkable results that significantly impact our lives. In this research, we propose a method for the automatic recognition and translation of ancient Egyptian hieroglyphs from images to English. This study utilized two datasets for classification and translation: the Morris Franken dataset and the EgyptianTranslation dataset. Our approach is divided into three stages: segmentation (using Contour and Detectron2), mapping symbols to Gardiner codes, and translation (using the CNN model). The model achieved a BLEU score of 42.2, a significant result compared to previous research.
【2】Forensic Activity Classification Using Digital Traces from iPhones: A Machine Learning-based Approach
标题:使用iPhone数字痕迹进行法医活动分类:基于机器学习的方法
链接:https://arxiv.org/abs/2512.03786
作者:Conor McCarthy,Jan Peter van Zandwijk,Marcel Worring,Zeno Geradts
摘要:智能手机和智能手表在日常生活中无处不在,并提供了有关其用户行为的丰富信息来源。特别是,来自手机嵌入式运动传感器的数字痕迹为法医调查员提供了深入了解一个人的身体活动的机会。在这项工作中,我们提出了一种基于机器学习的方法,将数字痕迹转换为不同类型的身体活动的似然比(LR)。在一个新的数据集NFI_FARED上进行评估,该数据集包含来自四种不同类型的iPhone的标记有19个活动的数字痕迹,发现我们的方法可以产生有用的LR系统来区分可能的171个活动配对中的167个。同样的方法被扩展到同时分析多个活动(或多组活动)的可能性,并制定活动时间表,以协助法医调查的早期和后期阶段。数据集和复制结果所需的所有代码也已公开,以鼓励对该主题的进一步研究。
摘要:Smartphones and smartwatches are ever-present in daily life, and provide a rich source of information on their users' behaviour. In particular, digital traces derived from the phone's embedded movement sensors present an opportunity for a forensic investigator to gain insight into a person's physical activities. In this work, we present a machine learning-based approach to translate digital traces into likelihood ratios (LRs) for different types of physical activities. Evaluating on a new dataset, NFI_FARED, which contains digital traces from four different types of iPhones labelled with 19 activities, it was found that our approach could produce useful LR systems to distinguish 167 out of a possible 171 activity pairings. The same approach was extended to analyse likelihoods for multiple activities (or groups of activities) simultaneously and create activity timelines to aid in both the early and latter stages of forensic investigations. The dataset and all code required to replicate the results have also been made public to encourage further research on this topic.
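摘要提到把数字痕迹转换为各类活动的似然比(LR)。论文的具体建模此处未给出;下面是一个常见的基于得分密度估计的 LR 草图(假设性示意:用 SciPy 的核密度估计分别在两种假设下估计得分密度,其比值即 LR;变量名与合成数据均为本文虚构):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# 合成数据:两种假设(如"骑行" vs "步行")下某个模型得分的分布
scores_h1 = rng.normal(1.0, 0.5, size=500)
scores_h2 = rng.normal(-1.0, 0.5, size=500)

kde_h1, kde_h2 = gaussian_kde(scores_h1), gaussian_kde(scores_h2)

def likelihood_ratio(score):
    """LR = p(score | H1) / p(score | H2):大于 1 支持 H1,小于 1 支持 H2。"""
    return float(kde_h1(score) / kde_h2(score))

print(likelihood_ratio(1.0) > 1.0, likelihood_ratio(-1.0) < 1.0)
```

实际的法医 LR 系统还需要在独立数据上做校准检验(如 Cllr、Tippett 图),这里从略。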
【3】Cross-Space Synergy: A Unified Framework for Multimodal Emotion Recognition in Conversation
标题:跨空间协同:对话中多模式情感识别的统一框架
链接:https://arxiv.org/abs/2512.03521
作者:Xiaosen Lyu,Jiayu Xiong,Yuren Chen,Wanlong Wang,Xiaoqing Dai,Jing Wang
备注:Accepted to AAAI 2026
摘要:会话中的多模态情感识别(MERC)旨在通过整合文本、声音和视觉线索来预测说话者的情感。现有的方法要么难以捕获复杂的跨模式交互,要么在使用更深层次的架构时遇到梯度冲突和不稳定的训练。为了解决这些问题,我们提出了跨空间协同(CSS),它耦合一个表示组件与优化组件。Synergistic Polynomial Fusion(SPF)提供表示角色,利用低秩张量因子分解来有效地捕获高阶跨模态相互作用。Pareto Gradient Modulator(PGM)起到优化作用,在竞争目标之间沿着Pareto最优方向引导更新,以缓解梯度冲突并提高稳定性。实验表明,CSS在准确性和训练稳定性方面优于IEMOCAP和MELD上已有的代表性方法,证明了其在复杂多模态场景下的有效性。
摘要:Multimodal Emotion Recognition in Conversation (MERC) aims to predict speakers' emotions by integrating textual, acoustic, and visual cues. Existing approaches either struggle to capture complex cross-modal interactions or experience gradient conflicts and unstable training when using deeper architectures. To address these issues, we propose Cross-Space Synergy (CSS), which couples a representation component with an optimization component. Synergistic Polynomial Fusion (SPF) serves the representation role, leveraging low-rank tensor factorization to efficiently capture high-order cross-modal interactions. Pareto Gradient Modulator (PGM) serves the optimization role, steering updates along Pareto-optimal directions across competing objectives to alleviate gradient conflicts and improve stability. Experiments show that CSS outperforms existing representative methods on IEMOCAP and MELD in both accuracy and training stability, demonstrating its effectiveness in complex multimodal scenarios.
【4】GaussDetect-LiNGAM:Causal Direction Identification without Gaussianity test
标题:GaussDetect-LiNGAM:无高斯性测试的因果方向识别
链接:https://arxiv.org/abs/2512.03428
作者:Ziyi Ding,Xiao-Ping Zhang
摘要:我们提出了GaussDetect-LiNGAM,一种用于二元因果发现的新方法,它利用反向回归中噪声高斯性与残差独立性之间的基本等价关系,从而无需显式的高斯性检验。在LiNGAM的线性、无环与外生性标准假设下,我们证明前向模型噪声的高斯性等价于反向模型中回归变量与残差之间的独立性。这一理论洞察使我们能够用稳健的基于核的独立性检验取代脆弱且对样本敏感的高斯性检验。实验结果验证了该等价性,并表明GaussDetect-LiNGAM在多种噪声类型和样本量下保持高度一致性,同时减少了每次决策所需的检验次数(TPD)。我们的方法提高了因果推断的效率与实用性,使LiNGAM在现实场景中更易用、更可靠。
摘要:We propose GaussDetect-LiNGAM, a novel approach for bivariate causal discovery that eliminates the need for explicit Gaussianity tests by leveraging a fundamental equivalence between noise Gaussianity and residual independence in the reverse regression. Under the standard LiNGAM assumptions of linearity, acyclicity, and exogeneity, we prove that the Gaussianity of the forward-model noise is equivalent to the independence between the regressor and residual in the reverse model. This theoretical insight allows us to replace fragile and sample-sensitive Gaussianity tests with robust kernel-based independence tests. Experimental results validate the equivalence and demonstrate that GaussDetect-LiNGAM maintains high consistency across diverse noise types and sample sizes, while reducing the number of tests per decision (TPD). Our method enhances both the efficiency and practical applicability of causal inference, making LiNGAM more accessible and reliable in real-world scenarios.
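摘要的关键替换是"用基于核的独立性检验取代高斯性检验"。下面给出经验 HSIC 统计量的一个极简实现作为示意(假设性草图:固定带宽的有偏估计量,并非论文所用实现;示例只演示核检验能捕捉线性相关系数为零的非线性相关):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """经验 HSIC 统计量(RBF 核,固定带宽,有偏估计):
    值接近 0 表示近似独立,值越大表示相关性越强。"""
    n = len(x)
    def rbf_gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))
    K, L = rbf_gram(np.asarray(x, float)), rbf_gram(np.asarray(y, float))
    H = np.eye(n) - np.ones((n, n)) / n   # 中心化矩阵
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
z = rng.normal(size=200)        # 与 x 独立
y = np.abs(x)                   # 与 x 相关,但线性相关系数为 0
print(hsic(x, y) > hsic(x, z))  # 核检验能捕捉这种非线性相关
```

在 LiNGAM 场景中,x 换成反向回归的回归变量、y 换成对应残差即可作为独立性判据。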
表征(2篇)
【1】In-Context Representation Hijacking
标题:上下文表示劫持
链接:https://arxiv.org/abs/2512.03771
作者:Itay Yona,Amir Sarid,Michael Karasik,Yossi Gandelsman
摘要:我们介绍Doublespeak,一种针对大型语言模型(LLM)的简单的上下文内表示劫持攻击。该攻击将有害关键词(例如bomb)在多个上下文示例中系统地替换为良性词元(例如carrot),并把这些示例作为有害请求的前缀。我们证明,这种替换会使良性词元的内部表示向有害词元的表示收敛,实际上把有害语义嵌入在委婉语之下。因此,表面上无害的提示(例如"How to build a carrot?")在内部被解释为不允许的指令(例如"How to build a bomb?"),从而绕过模型的安全对齐。我们使用可解释性工具表明,这种语义覆盖是逐层出现的:早期层中的良性含义在后期层收敛为有害语义。Doublespeak无需优化,可在模型家族间广泛迁移,并在闭源和开源系统上取得很高的成功率,仅用单句上下文覆盖就在Llama-3.3-70B-Instruct上达到74%的ASR。我们的发现揭示了LLM潜在空间中的一个新攻击面,表明当前的对齐策略并不充分,而应在表示层面上进行防御。
摘要:We introduce Doublespeak, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by systematically replacing a harmful keyword (e.g., bomb) with a benign token (e.g., carrot) across multiple in-context examples, provided as a prefix to a harmful request. We demonstrate that this substitution leads to the internal representation of the benign token converging toward that of the harmful one, effectively embedding the harmful semantics under a euphemism. As a result, superficially innocuous prompts (e.g., "How to build a carrot?") are internally interpreted as disallowed instructions (e.g., "How to build a bomb?"), thereby bypassing the model's safety alignment. We use interpretability tools to show that this semantic overwrite emerges layer by layer, with benign meanings in early layers converging into harmful semantics in later ones. Doublespeak is optimization-free, broadly transferable across model families, and achieves strong success rates on closed-source and open-source systems, reaching 74% ASR on Llama-3.3-70B-Instruct with a single-sentence context override. Our findings highlight a new attack surface in the latent space of LLMs, revealing that current alignment strategies are insufficient and should instead operate at the representation level.
【2】Universally Converging Representations of Matter Across Scientific Foundation Models
标题:跨科学基金会模型的普遍收敛物质表示
链接:https://arxiv.org/abs/2512.03750
作者:Sathya Edamadaka,Soojung Yang,Ju Li,Rafael Gómez-Bombarelli
备注:Oral spotlight at NeurIPS 2025 UniReps Workshop
摘要:具有截然不同模式和架构的机器学习模型正在被训练来预测分子、材料和蛋白质的行为。然而,目前尚不清楚它们是否学习到类似的物质内部表征。了解它们的潜在结构对于构建科学基础模型至关重要,这些模型可以可靠地推广到训练领域之外。虽然在语言和视觉中观察到了表征趋同,但在科学中的对应物尚未得到系统的探索。在这里,我们展示了近60个科学模型的表征,跨越字符串,图形,3D原子和基于蛋白质的模式,在广泛的化学系统中高度一致。在不同数据集上训练的模型具有高度相似的小分子表示,机器学习原子间势随着性能的提高而收敛于表示空间,这表明基础模型学习了物理现实的共同底层表示。然后,我们展示了科学模型的两种不同的机制:在与训练期间看到的输入类似的输入上,高性能模型紧密对齐,弱模型在表示空间中发散到局部次优;在与训练期间看到的结构大不相同的结构上,几乎所有模型都崩溃到低信息表示上,这表明今天的模型仍然受到训练数据和归纳偏差的限制,并且还没有编码真正的通用结构。我们的研究结果建立了代表性对齐作为定量基准的基础级的一般性,在科学模型。更广泛地说,我们的工作可以随着模型的规模跟踪物质的普遍表征的出现,并选择和提取模型,其学习的表征在模态、物质领域和科学任务之间转移得最好。
摘要:Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they learn similar internal representations of matter. Understanding their latent structure is essential for building scientific foundation models that generalize reliably beyond their training domains. Although representational convergence has been observed in language and vision, its counterpart in the sciences has not been systematically explored. Here, we show that representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems. Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality. We then show two distinct regimes of scientific models: on inputs similar to those seen during training, high-performing models align closely and weak models diverge into local sub-optima in representation space; on vastly different structures from those seen during training, nearly all models collapse onto a low-information representation, indicating that today's models remain limited by training data and inductive bias and do not yet encode truly universal structure. Our findings establish representational alignment as a quantitative benchmark for foundation-level generality in scientific models. More broadly, our work can track the emergence of universal representations of matter as models scale, and for selecting and distilling models whose learned representations transfer best across modalities, domains of matter, and scientific tasks.
编码器(3篇)
【1】Origin-Conditional Trajectory Encoding: Measuring Urban Configurational Asymmetries through Neural Decomposition
标题:起源条件轨迹编码:通过神经分解测量城市空间不对称性
链接:https://arxiv.org/abs/2512.03755
作者:Stephen Law,Tao Yang,Nanjiang Chen,Xuhui Lin
摘要:城市分析越来越依赖于人工智能驱动的轨迹分析,但目前的方法存在方法上的碎片化:轨迹学习捕捉运动模式,但忽略了空间背景,而空间嵌入方法编码街道网络,但错过了时间动态。三个差距依然存在:(1)缺乏整合空间和时间表示的联合训练,(2)忽略导航中方向不对称的起源不可知论处理($A \to B \ne B \to A$),以及(3)过度依赖辅助数据(POI,图像)而不是城市空间的基本几何属性。我们引入了一个条件轨迹编码器,共同学习空间和运动表示,同时使用几何特征保留原点相关的不对称性。该框架将城市导航分解为共享的认知模式和特定于原点的空间叙事,从而能够定量测量起始位置之间的认知不对称。我们的双向LSTM处理可视性比率和曲率特征,这些特征以可学习的起源嵌入为条件,通过对比学习将表示分解为共享的城市模式和起源特定的签名。六个综合城市的结果和北京西城区的真实世界验证表明,城市形态创造了系统的认知不平等。这为城市规划者提供了评估经验公平性的定量工具,为建筑师提供了对布局决策认知影响的见解,并为导航系统提供了原点感知分析。
摘要:Urban analytics increasingly relies on AI-driven trajectory analysis, yet current approaches suffer from methodological fragmentation: trajectory learning captures movement patterns but ignores spatial context, while spatial embedding methods encode street networks but miss temporal dynamics. Three gaps persist: (1) lack of joint training that integrates spatial and temporal representations, (2) origin-agnostic treatment that ignores directional asymmetries in navigation ($A \to B \ne B \to A$), and (3) over-reliance on auxiliary data (POIs, imagery) rather than fundamental geometric properties of urban space. We introduce a conditional trajectory encoder that jointly learns spatial and movement representations while preserving origin-dependent asymmetries using geometric features. This framework decomposes urban navigation into shared cognitive patterns and origin-specific spatial narratives, enabling quantitative measurement of cognitive asymmetries across starting locations. Our bidirectional LSTM processes visibility ratio and curvature features conditioned on learnable origin embeddings, decomposing representations into shared urban patterns and origin-specific signatures through contrastive learning. Results from six synthetic cities and real-world validation on Beijing's Xicheng District demonstrate that urban morphology creates systematic cognitive inequalities. This provides urban planners quantitative tools for assessing experiential equity, offers architects insights into layout decisions' cognitive impacts, and enables origin-aware analytics for navigation systems.
【2】Cyclical Temporal Encoding and Hybrid Deep Ensembles for Multistep Energy Forecasting
标题:用于多步能量预测的循环时间编码和混合深度集成
链接:https://arxiv.org/abs/2512.03656
作者:Salim Khazem,Houssam Kanso
摘要:准确的用电量预测对于需求管理和智能电网运营至关重要。本文介绍一个统一的深度学习框架,将循环时间编码与混合LSTM-CNN架构相结合,以增强多步能量预测。我们使用正弦-余弦编码系统地变换基于日历的属性以保留其周期结构,并通过相关性分析评估其预测相关性。为了同时利用长期季节效应和短期局部模式,我们采用由LSTM、CNN以及按预测范围专门化的MLP回归器元学习器组成的集成模型。基于一年的全国用电数据集,我们开展了广泛的实验研究,包括有无循环编码与日历特征的消融分析,并与文献中的既有基线进行比较。结果表明,在全部七个预测范围内均有一致改进,我们的混合模型取得了低于单一架构和先前方法的RMSE和MAE。这些发现证实了将循环时间表示与互补的深度学习结构相结合的好处。据我们所知,这是首个在统一的短期能源预测框架内联合评估时间编码、日历特征与混合集成架构的工作。
摘要:Accurate electricity consumption forecasting is essential for demand management and smart grid operations. This paper introduces a unified deep learning framework that integrates cyclical temporal encoding with hybrid LSTM-CNN architectures to enhance multistep energy forecasting. We systematically transform calendar-based attributes using sine cosine encodings to preserve periodic structure and evaluate their predictive relevance through correlation analysis. To exploit both long-term seasonal effects and short-term local patterns, we employ an ensemble model composed of an LSTM, a CNN, and a meta-learner of MLP regressors specialized for each forecast horizon. Using a one year national consumption dataset, we conduct an extensive experimental study including ablation analyses with and without cyclical encodings and calendar features and comparisons with established baselines from the literature. Results demonstrate consistent improvements across all seven forecast horizons, with our hybrid model achieving lower RMSE and MAE than individual architectures and prior methods. These findings confirm the benefit of combining cyclical temporal representations with complementary deep learning structures. To our knowledge, this is the first work to jointly evaluate temporal encodings, calendar-based features, and hybrid ensemble architectures within a unified short-term energy forecasting framework.
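正弦-余弦循环编码本身很容易写成代码:把取值范围为一个周期的日历特征映射到单位圆上,使周期边界(如 23 点与 0 点)在特征空间中相邻。下面是一个最小示意(`cyclical_encode` 为本文虚构的函数名):

```python
import numpy as np

def cyclical_encode(value, period):
    """把周期性日历特征(小时、星期、月份等)映射到单位圆:
    sin/cos 成对出现,保证周期边界相邻(如 23 点与 0 点)。"""
    angle = 2.0 * np.pi * np.asarray(value) / period
    return np.sin(angle), np.cos(angle)

hours = np.arange(24)
sin_h, cos_h = cyclical_encode(hours, 24)
# 23 点与 0 点的原始数值相差 23,但编码后距离很小
d = float(np.hypot(sin_h[23] - sin_h[0], cos_h[23] - cos_h[0]))
print(round(d, 3))   # 弦长 2*sin(pi/24) ≈ 0.261
```

同样的变换可用于星期(period=7)与月份(period=12),再作为回归器的输入特征。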
【3】Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
标题:自我注意中符号的动态特性和位置编码的影响
链接:https://arxiv.org/abs/2512.03058
作者:Duy-Tung Pham,An The Nguyen,Viet-Hoang Tran,Nhan-Phu Chung,Xin T. Tong,Tan M. Nguyen,Thieu N. Vo
摘要:本文研究预训练Transformer模型中词元(token)的动力学性质,并探讨其在改进Transformer中的应用。为此,我们分析刻画预训练模型连续时间极限的动力系统,并刻画其解的渐近行为。具体而言,我们根据模型参数刻画词元随时间相互靠近或远离的条件,并基于这些参数给出充分条件,以识别词元收敛到零或发散到无穷的情形。与以往工作不同,我们的条件范围更广,更适用于真实模型。此外,我们研究不同形式的位置编码(绝对式与旋转式)如何影响这些动力学状态。经验证据表明,收敛情形会损害模型性能。受此启发,我们对Transformer架构提出简单改进,以缓解采用绝对或旋转位置编码的模型中的收敛行为。这些发现为改进Transformer模型提供了理论基础和设计原则。
摘要:This paper investigates the dynamical properties of tokens in pre-trained Transformer models and explores their application to improving Transformers. To this end, we analyze the dynamical system governing the continuous-time limit of the pre-trained model and characterize the asymptotic behavior of its solutions. Specifically, we characterize when tokens move closer to or farther from one another over time, depending on the model parameters. We provide sufficient conditions, based on these parameters, to identify scenarios where tokens either converge to zero or diverge to infinity. Unlike prior works, our conditions are broader in scope and more applicable to real-world models. Furthermore, we investigate how different forms of positional encoding -- specifically absolute and rotary -- affect these dynamical regimes. Empirical evidence reveals that the convergence scenario adversely impacts model performance. Motivated by these insights, we propose simple refinements to Transformer architectures that mitigate convergence behavior in models with absolute or rotary positional encoding. These findings support theoretical foundations and design principles for improving Transformer models.
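作为参照,论文讨论的旋转位置编码(RoPE)的核心操作可以用几行 NumPy 写出:把向量的每对相邻维度按与位置成比例的角度旋转,使两词元的内积只依赖相对位置差。以下为通用示意,并非该论文的实现:

```python
import numpy as np

def rotary_embed(x, pos, base=10000.0):
    """旋转位置编码(RoPE)示意:把向量的每对维度 (2i, 2i+1)
    按角度 pos * base^(-2i/d) 旋转;旋转保持范数,
    且两个被编码向量的内积只依赖二者的相对位置差。"""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    theta = pos * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).normal(size=8)
# 旋转不改变范数,这正是正文中讨论"收敛/发散"时的一个关键性质
print(bool(np.isclose(np.linalg.norm(rotary_embed(q, 5)), np.linalg.norm(q))))
```

范数保持与"内积只看相对位置"两条性质,是分析 RoPE 模型词元动力学的常用出发点。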
优化|敛散性(4篇)
【1】Convergence for Discrete Parameter Updates
标题:离散参数更新的收敛
链接:https://arxiv.org/abs/2512.04051
作者:Paul Wilson,Fabio Zanasi,George Constantinides
备注:opt-ml 2025 workshop at NeurIPS
摘要:现代深度学习模型需要大量计算资源,这推动了低精度训练的研究。量化训练通过用低位整数表示训练分量来解决这一问题,但通常依赖于对实值更新进行离散化。我们提出一种替代方法,使更新规则本身就是离散的,从设计上避免了对连续更新的量化。我们为一般的一类此类离散方案建立收敛保证,并以多项分布(multinomial)更新规则作为具体示例,辅以经验评估。这一视角为高效训练开辟了新途径,特别是对于具有内在离散结构的模型。
摘要:Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.
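论文中多项分布更新规则的细节此处未给出;下面是一个玩具版的随机离散更新草图(本文虚构,仅说明"更新本身是离散的"这一思路:每个坐标以与梯度大小成比例的概率向梯度反方向迈出一个整数步,权重全程保持整数):

```python
import numpy as np

def discrete_step(w, grad, lr=0.1, rng=None):
    """玩具离散更新:每个坐标以 min(lr*|grad|, 1) 的概率
    向梯度反方向迈一个 ±1 的整数步,期望上近似 -lr*grad。
    仅为示意,并非论文中的多项分布方案。"""
    rng = rng or np.random.default_rng()
    p = np.clip(lr * np.abs(grad), 0.0, 1.0)   # 迈步概率
    step = rng.random(w.shape) < p             # 是否迈步
    return w - step * np.sign(grad).astype(int)

# 最小化 f(w) = |w|^2 / 2(梯度为 w),权重始终是整数
rng = np.random.default_rng(0)
w = np.array([8, -5, 3])
for _ in range(200):
    w = discrete_step(w, w.astype(float), lr=0.2, rng=rng)
print(w.tolist())
```

由于每步期望方向与负梯度一致且步长有界,这类方案正属于摘要所说"可建立收敛保证的一般离散方案"的范畴。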
【2】CoGraM: Context-sensitive granular optimization method with rollback for robust model fusion
标题:CoGraM:具有回滚的上下文敏感粒度优化方法,用于鲁棒模型融合
链接:https://arxiv.org/abs/2512.03610
作者:Julius Lenz
备注:15 pages, 4 figures, 8 equations
摘要:在不重新训练的情况下合并神经网络是联邦学习与分布式学习的核心问题。加权平均或Fisher合并等常见方法往往会损失精度,并且在不同随机种子下不稳定。CoGraM(Contextual Granular Merging,上下文粒度合并)是一种跨层、神经元和权重级别的多阶段、上下文敏感、基于损失的迭代优化方法,它使决策与损失差异和阈值保持一致,并通过回滚防止有害更新。CoGraM弥补了Fisher合并等方法的弱点,能够显著改进合并后的网络。
摘要:Merging neural networks without retraining is central to federated and distributed learning. Common methods such as weight averaging or Fisher merging often lose accuracy and are unstable across seeds. CoGraM (Contextual Granular Merging) is a multi-stage, context-sensitive, loss-based, and iterative optimization method across layers, neurons, and weight levels that aligns decisions with loss differences and thresholds and prevents harmful updates through rollback. CoGraM is an optimization method that addresses the weaknesses of methods such as Fisher and can significantly improve the merged network.
【3】Optimal Transportation and Alignment Between Gaussian Measures
标题:高斯测度间的最优传输与对齐
链接:https://arxiv.org/abs/2512.03579
作者:Sanjit Dandapanthula,Aleksandr Podkopaev,Shiva Prasad Kasiviswanathan,Aaditya Ramdas,Ziv Goldfeld
摘要:最优传输(OT)与Gromov-Wasserstein(GW)对齐为比较、变换和聚合异构数据集提供了可解释的几何框架,而这些任务在数据科学和机器学习中无处不在。由于这些框架计算代价高昂,大规模应用常依赖二次代价下高斯分布的闭式解。本文对高斯分布的二次代价OT与内积GW(IGW)对齐给出全面处理,填补文献中的若干空白以扩大其适用范围。首先,我们处理可分希尔伯特空间上非中心高斯测度之间IGW对齐这一开放问题,给出一个化归为酉算子上二次优化的闭式表达,并推导出紧的解析上下界。若至少一个高斯测度是中心化的,该解可化简为完全闭式的表达;我们进一步将其推广为中心高斯之间IGW重心的解析解。我们还将带成对二次代价的高斯多边际OT化归为一个易处理的优化问题,并给出利用秩亏约束求解的高效算法。为展示实用性,我们将结果应用于合成与真实数据集上的知识蒸馏和异构聚类。
摘要:Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing, transforming, and aggregating heterogeneous datasets -- tasks ubiquitous in data science and machine learning. Because these frameworks are computationally expensive, large-scale applications often rely on closed-form solutions for Gaussian distributions under quadratic cost. This work provides a comprehensive treatment of Gaussian, quadratic cost OT and inner product GW (IGW) alignment, closing several gaps in the literature to broaden applicability. First, we treat the open problem of IGW alignment between uncentered Gaussians on separable Hilbert spaces by giving a closed-form expression up to a quadratic optimization over unitary operators, for which we derive tight analytic upper and lower bounds. If at least one Gaussian measure is centered, the solution reduces to a fully closed-form expression, which we further extend to an analytic solution for the IGW barycenter between centered Gaussians. We also present a reduction of Gaussian multimarginal OT with pairwise quadratic costs to a tractable optimization problem and provide an efficient algorithm to solve it using a rank-deficiency constraint. To demonstrate utility, we apply our results to knowledge distillation and heterogeneous clustering on synthetic and real-world datasets.
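摘要所说"二次成本下高斯分布的闭式解"指的是熟知的 Bures-Wasserstein 公式。下面用 NumPy/SciPy 实现该闭式解以作参照(通用公式,并非论文 IGW 部分的新结果;函数名为本文虚构):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1, S1, m2, S2):
    """N(m1,S1) 与 N(m2,S2) 之间 2-Wasserstein 距离的闭式解:
    W2^2 = |m1-m2|^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})"""
    r1 = np.real(sqrtm(S1))
    cross = np.real(sqrtm(r1 @ S2 @ r1))   # sqrtm 可能带微小虚部,取实部
    bures2 = float(np.trace(S1 + S2 - 2.0 * cross))
    mean2 = float(np.sum((np.asarray(m1) - np.asarray(m2)) ** 2))
    return float(np.sqrt(mean2 + max(bures2, 0.0)))

m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.array([3.0, 4.0]), 4.0 * np.eye(2)
print(round(gaussian_w2(m1, S1, m2, S2), 3))
```

此例中均值项贡献 25、Bures 项贡献 2,故 W2 = sqrt(27) ≈ 5.196;正是这类闭式解使大规模应用得以避开通用 OT 求解器。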
【4】Convergence of a class of gradient-free optimisation schemes when the objective function is noisy, irregular, or both
标题:当目标函数有噪、不规则或两者兼而有之时一类无梯度优化方案的收敛
链接:https://arxiv.org/abs/2512.03225
作者:Christophe Andrieu,Nicolas Chopin,Ettore Fincato,Mathieu Gerber
摘要:我们研究一类迭代算法的收敛性质,这类算法旨在最小化一个可能非光滑且带噪声的目标函数;该函数可能在代数上难以处理,其取值可作为黑盒的输出获得。所考虑的算法可以纳入广义梯度下降递归的框架,其中梯度取自目标函数某个光滑近似的梯度。我们建立的框架把基于模型的方法和磨光(mollification)方法这两类经典零阶优化方法作为特例包含在内。收敛结果在对目标函数正则性的极弱假设下得到,并涉及平滑程度与参数更新步长之间的权衡。正如预期,随机情形需要额外假设。我们通过一个具有挑战性的机器学习分类示例说明这些算法与收敛结果的意义。
摘要:We investigate the convergence properties of a class of iterative algorithms designed to minimize a potentially non-smooth and noisy objective function, which may be algebraically intractable and whose values may be obtained as the output of a black box. The algorithms considered can be cast under the umbrella of a generalised gradient descent recursion, where the gradient is that of a smooth approximation of the objective function. The framework we develop includes as special cases model-based and mollification methods, two classical approaches to zero-th order optimisation. The convergence results are obtained under very weak assumptions on the regularity of the objective function and involve a trade-off between the degree of smoothing and size of the steps taken in the parameter updates. As expected, additional assumptions are required in the stochastic case. We illustrate the relevance of these algorithms and our convergence results through a challenging classification example from machine learning.
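摘要所述"梯度取自目标函数光滑近似"的一个经典实例是高斯平滑的零阶梯度估计。下面的草图在一个不可导目标上演示无梯度下降(通用方法示意,参数与函数名为本文虚构,并非论文算法本身):

```python
import numpy as np

def smoothed_grad(f, x, sigma=0.1, n=64, rng=None):
    """高斯平滑 f_sigma 的零阶梯度估计:
    grad f_sigma(x) ≈ E[(f(x + sigma*u) - f(x)) * u] / sigma, u ~ N(0, I)。
    只需能对 f 做黑盒求值;f 可以不可导甚至带噪声。"""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=(n, x.size))
    df = np.array([f(x + sigma * ui) for ui in u]) - f(x)
    return (df[:, None] * u).mean(axis=0) / sigma

# 用该估计器对不可导的 f(x) = |x1| + |x2| 做"无梯度"下降
f = lambda x: float(np.abs(x).sum())
rng = np.random.default_rng(0)
x = np.array([2.0, -1.5])
for _ in range(300):
    x = x - 0.05 * smoothed_grad(f, x, rng=rng)
print(np.abs(x).max() < 0.3)
```

sigma 控制平滑程度、0.05 是步长:两者的权衡正是摘要中收敛结果所刻画的对象。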
预测|估计(8篇)
【1】Observation-driven correction of numerical weather prediction for marine winds
标题:海洋风数值天气预报的观测驱动修正
链接:https://arxiv.org/abs/2512.03606
作者:Matteo Peduto,Qidong Yang,Jonathan Giezendanner,Devis Tuia,Sherrie Wang
摘要:准确的海上风场预报对于安全航行、船舶定线和能源运营至关重要,但由于海洋观测稀疏、异构且随时间变化,这一任务仍然具有挑战性。我们将风预报重新表述为对全球数值天气预报(NWP)模式的观测信息校正:不直接预报风,而是通过同化最新的现场观测来学习局地校正模式,以调整全球预报系统(GFS)的输出。我们提出一种基于Transformer的深度学习架构,它(i)通过掩码和基于集合的注意力机制处理不规则且随时间变化的观测集合,(ii)通过交叉注意力以最近的观测-预报对为条件进行预测,(iii)采用循环时间嵌入和坐标感知的位置表示,实现对任意空间坐标的单次推理。我们以国际综合海洋大气数据集(ICOADS)的观测为参考,在大西洋上评估该模型。模型在48小时以内的所有预报时效上都降低了GFS 10米风的RMSE,在1小时时效上改进45%,在48小时时效上改进13%。空间分析显示,在观测最丰富的海岸线和航线沿线改进最为持久。词元化的架构自然适配异构观测平台(船舶、浮标、验潮仪和沿岸站),并能在一次前向传播中同时生成特定站点的预测和流域尺度的网格化产品。这些结果展示了一种实用、低延迟的后处理方法,通过学习纠正系统性预报误差来补充NWP。
摘要:Accurate marine wind forecasts are essential for safe navigation, ship routing, and energy operations, yet they remain challenging because observations over the ocean are sparse, heterogeneous, and temporally variable. We reformulate wind forecasting as observation-informed correction of a global numerical weather prediction (NWP) model. Rather than forecasting winds directly, we learn local correction patterns by assimilating the latest in-situ observations to adjust the Global Forecast System (GFS) output. We propose a transformer-based deep learning architecture that (i) handles irregular and time-varying observation sets through masking and set-based attention mechanisms, (ii) conditions predictions on recent observation-forecast pairs via cross-attention, and (iii) employs cyclical time embeddings and coordinate-aware location representations to enable single-pass inference at arbitrary spatial coordinates. We evaluate our model over the Atlantic Ocean using observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) as reference. The model reduces GFS 10-meter wind RMSE at all lead times up to 48 hours, achieving 45% improvement at 1-hour lead time and 13% improvement at 48-hour lead time. Spatial analyses reveal the most persistent improvements along coastlines and shipping routes, where observations are most abundant. The tokenized architecture naturally accommodates heterogeneous observing platforms (ships, buoys, tide gauges, and coastal stations) and produces both site-specific predictions and basin-scale gridded products in a single forward pass. These results demonstrate a practical, low-latency post-processing approach that complements NWP by learning to correct systematic forecast errors.
【2】ProtoEFNet: Dynamic Prototype Learning for Inherently Interpretable Ejection Fraction Estimation in Echocardiography
标题:ProtoEFNet:用于超声心动图中固有可解释射血分数估计的动态原型学习
链接:https://arxiv.org/abs/2512.03339
作者:Yeganeh Ghamary,Victoria Wu,Hooman Vaseli,Christina Luong,Teresa Tsang,Siavash Bigdeli,Purang Abolmaesumi
备注:11 pages, Accepted in IMIMIC Workshop at MICCAI 2025
摘要:射血分数(EF)是评估心脏功能和诊断心力衰竭等疾病的重要指标。传统上,EF估计需要手动跟踪和领域专业知识,使得该过程耗时,并受到观察者之间的差异。目前大多数用于EF预测的深度学习方法都是透明度有限的黑箱模型,这降低了临床信任度。一些事后可解释性的方法已经被提出来解释的决策过程后,预测。然而,这些解释不能指导模型的内部推理,因此在临床应用中提供有限的可靠性。为了解决这个问题,我们引入了ProtoEFNet,这是一种用于连续EF回归的新型基于视频的原型学习模型。该模型学习动态时空原型,捕获临床上有意义的心脏运动模式。此外,所提出的原型角分离(PAS)损失在连续EF频谱上强制执行区分表示。我们在EchonetDynamic数据集上的实验表明,ProtoEFNet可以实现与其不可解释的对应物相当的准确性,同时提供临床相关的见解。消融研究表明,拟议的损失提高了性能,F1评分从77.67$\pm$2.68增加到79.64$\pm$2.10,增加了2%。我们的源代码可在以下网址获得:https://github.com/DeepRCL/ProtoEF
摘要:Ejection fraction (EF) is a crucial metric for assessing cardiac function and diagnosing conditions such as heart failure. Traditionally, EF estimation requires manual tracing and domain expertise, making the process time-consuming and subject to interobserver variability. Most current deep learning methods for EF prediction are black-box models with limited transparency, which reduces clinical trust. Some post-hoc explainability methods have been proposed to interpret the decision-making process after the prediction is made. However, these explanations do not guide the model's internal reasoning and therefore offer limited reliability in clinical applications. To address this, we introduce ProtoEFNet, a novel video-based prototype learning model for continuous EF regression. The model learns dynamic spatiotemporal prototypes that capture clinically meaningful cardiac motion patterns. Additionally, the proposed Prototype Angular Separation (PAS) loss enforces discriminative representations across the continuous EF spectrum. Our experiments on the EchonetDynamic dataset show that ProtoEFNet can achieve accuracy on par with its non-interpretable counterpart while providing clinically relevant insight. The ablation study shows that the proposed loss boosts performance with a 2% increase in F1 score from 77.67$\pm$2.68 to 79.64$\pm$2.10. Our source code is available at: https://github.com/DeepRCL/ProtoEF
【3】HydroDCM: Hydrological Domain-Conditioned Modulation for Cross-Reservoir Inflow Prediction
标题:HydroCM:跨水库流入预测的水文域条件调制
链接:https://arxiv.org/abs/2512.03300
作者:Pengfei Hu,Fan Ming,Xiaoxue Han,Chang Lu,Yue Ning,Dan Lu
备注:Accepted by AAAI 2026 workshop (oral) on AI for Environmental Science
摘要:深度学习模型在水库流入预测方面显示出了希望,但由于分布差异,它们的性能在应用于不同水库时往往会恶化,称为域转移问题。域泛化(DG)解决方案旨在通过提取域不变表示来解决这个问题,这些表示可以减轻不可见域中的错误。然而,在水文环境中,每个水库表现出独特的流入模式,而一些元数据以外的观察,如空间信息施加间接的,但显着的影响。这种不匹配限制了传统的DG技术的适用性,许多领域的水文系统。为了克服这些挑战,我们提出了HydroDCM,一个可扩展的DG框架跨水库流入预测。水库的空间元数据用于构建伪域标签,引导对抗学习不变的时间特征。在推理过程中,HydroDCM通过由目标水库的元数据通知的轻量级调节层来适应这些特征,从而调和DG的不变性与特定于位置的适应。在上科罗拉多河流域的30个现实世界的水库的实验结果表明,我们的方法大大优于国家的最先进的DG基线在许多领域的条件下,并保持计算效率。
摘要:Deep learning models have shown promise in reservoir inflow prediction, yet their performance often deteriorates when applied to different reservoirs due to distributional differences, referred to as the domain shift problem. Domain generalization (DG) solutions aim to address this issue by extracting domain-invariant representations that mitigate errors in unseen domains. However, in hydrological settings, each reservoir exhibits unique inflow patterns, while some metadata beyond observations like spatial information exerts indirect but significant influence. This mismatch limits the applicability of conventional DG techniques to many-domain hydrological systems. To overcome these challenges, we propose HydroDCM, a scalable DG framework for cross-reservoir inflow forecasting. Spatial metadata of reservoirs is used to construct pseudo-domain labels that guide adversarial learning of invariant temporal features. During inference, HydroDCM adapts these features through light-weight conditioning layers informed by the target reservoir's metadata, reconciling DG's invariance with location-specific adaptation. Experiment results on 30 real-world reservoirs in the Upper Colorado River Basin demonstrate that our method substantially outperforms state-of-the-art DG baselines under many-domain conditions and remains computationally efficient.
【4】Neighborhood density estimation using space-partitioning based hashing schemes
标题:使用基于空间划分的哈希方案估计邻居密度
链接:https://arxiv.org/abs/2512.03187
作者:Aashi Jindal
备注:arXiv admin note: text overlap with arXiv:2011.03729
摘要:This work introduces FiRE/FiRE.1, a novel sketching-based algorithm for anomaly detection to quickly identify rare cell sub-populations in large-scale single-cell RNA sequencing data. This method demonstrated superior performance against state-of-the-art techniques. Furthermore, the thesis proposes Enhash, a fast and resource-efficient ensemble learner that uses projection hashing to detect concept drift in streaming data, proving highly competitive in time and accuracy across various drift types.
【5】Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration
标题:基于扩散贝叶斯探测的状态估计误差的动态修正
链接:https://arxiv.org/abs/2512.03102
作者:Yiwei Shi,Hongnan Ma,Mengyue Yang,Cunjia Liu,Weiru Liu
摘要:In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet, these initial state estimates-often based on limited or biased information-can be severely misaligned with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (zero transition and no rejuvenation), bootstrap particle filters exhibit Stationarity-Induced Posterior Support Invariance (S-PSI), wherein regions excluded by the initial prior remain permanently unexplorable, making corrections impossible even when new evidence contradicts current beliefs. While classical perturbations can in principle break this lock-in, they operate in an always-on fashion and may be inefficient. To overcome this, we propose a diffusion-driven Bayesian exploration framework that enables principled, real-time correction of early state estimation errors. Our method expands posterior support via entropy-regularized sampling and covariance-scaled diffusion. A Metropolis-Hastings check validates proposals and keeps inference adaptive to unexpected evidence. Empirical evaluations on realistic hazardous-gas localization tasks show that our approach matches reinforcement learning and planning baselines when priors are correct. It substantially outperforms classical SMC perturbations and RL-based methods under misalignment, and we provide theoretical guarantees that DEPF resolves S-PSI while maintaining statistical rigor.
【6】A Large Scale Heterogeneous Treatment Effect Estimation Framework and Its Applications of Users' Journey at Snap
标题:大规模异质性处理效应估计框架及其在Snap用户旅程中的应用
链接:https://arxiv.org/abs/2512.03060
作者:Jing Pan,Li Shi,Paul Lo
摘要:Heterogeneous Treatment Effect (HTE) and Conditional Average Treatment Effect (CATE) models relax the assumption that treatment effects are the same for every user. We present a large scale industrial framework for estimating HTE using experimental data from hundreds of millions of Snapchat users. By combining results across many experiments, the framework uncovers latent user characteristics that were previously unmeasurable and produces stable treatment effect estimates at scale. We describe the core components that enabled this system, including experiment selection, base learner design, and incremental training. We also highlight two applications: user influenceability to ads and user sensitivity to ads. An online A/B test using influenceability scores for targeting showed an improvement on key business metrics that is more than six times larger than what is typically considered significant.
【7】Deep-BrownConrady: Prediction of Camera Calibration and Distortion Parameters Using Deep Learning and Synthetic Data
标题:Deep-BrownConrady:使用深度学习和合成数据预测相机校准和失真参数
链接:https://arxiv.org/abs/2501.14510
作者:Faiz Muhammad Chaudhry,Jarno Ralli,Jerome Leudet,Fahad Sohrab,Farhad Pakdaman,Pierre Corbani,Moncef Gabbouj
摘要:This research addresses the challenge of camera calibration and distortion parameter prediction from a single image using deep learning models. The main contributions of this work are: (1) demonstrating that a deep learning model, trained on a mix of real and synthetic images, can accurately predict camera and lens parameters from a single image, and (2) developing a comprehensive synthetic dataset using the AILiveSim simulation platform. This dataset includes variations in focal length and lens distortion parameters, providing a robust foundation for model training and testing. The training process predominantly relied on these synthetic images, complemented by a small subset of real images, to explore how well models trained on synthetic data can perform calibration tasks on real-world images. Traditional calibration methods require multiple images of a calibration object from various orientations, which is often not feasible due to the lack of such images in publicly available datasets. A deep learning network based on the ResNet architecture was trained on this synthetic dataset to predict camera calibration parameters following the Brown-Conrady lens model. The ResNet architecture, adapted for regression tasks, is capable of predicting continuous values essential for accurate camera calibration in applications such as autonomous driving, robotics, and augmented reality. Keywords: Camera calibration, distortion, synthetic data, deep learning, residual networks (ResNet), AILiveSim, horizontal field-of-view, principal point, Brown-Conrady Model.
【8】Class conditional conformal prediction for multiple inputs by p-value aggregation
标题:通过p值聚合实现多输入的类条件保形预测
链接:https://arxiv.org/abs/2507.07150
作者:Jean-Baptiste Fermanian,Mohamed Hebiri,Joseph Salmon
摘要:Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.
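The p-value aggregation idea can be illustrated with a minimal sketch, assuming uniform calibration scores and the classical twice-the-mean merging rule as one valid choice of scoring function (illustrative values, not the paper's exact aggregation):

```python
import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Standard split-conformal p-value: rank of the test nonconformity
    score among the calibration scores (higher score = less conforming)."""
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

# Hypothetical calibration nonconformity scores for one candidate class.
rng = np.random.default_rng(1)
cal = rng.uniform(size=100)

# Multi-input setting: several observations (e.g., photos of the same plant)
# yield several p-values for the class; merge them with the classical
# twice-the-mean rule, which is valid for arbitrarily dependent p-values.
obs_scores = [0.2, 0.35, 0.4]          # nonconformity of each observation
p_values = [conformal_p_value(cal, s) for s in obs_scores]
p_agg = min(1.0, 2.0 * float(np.mean(p_values)))

alpha = 0.1
include_label = p_agg > alpha          # keep this class in the prediction set
```

Repeating this test over all candidate classes yields the predicted label set; exploiting the exact distribution of the p-values, as the paper does, allows sharper merging rules than the generic bound used here.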
其他神经网络|深度学习|模型|建模(30篇)
【1】Fast & Efficient Normalizing Flows and Applications of Image Generative Models
标题:快速高效的归一化流及图像生成模型的应用
链接:https://arxiv.org/abs/2512.04039
作者:Sandeep Nagar
备注:PhD Thesis
摘要:This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: (1) development of invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility; (2) introduction of a more efficient Quad-coupling layer; (3) design of a fast and efficient parallel inversion algorithm for kxk convolutional layers; (4) a fast and efficient backpropagation algorithm for the inverse of convolution; (5) Inverse-Flow, which uses the inverse of convolution for the forward pass and trains it with the proposed backpropagation algorithm; and (6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part presents: (1) an automated quality assessment system for agricultural produce using Conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; (2) an unsupervised geological mapping framework utilizing stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; (3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; (4) Stable Diffusion-based image inpainting for replacing detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and (5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.
【2】Quantum-Classical Physics-Informed Neural Networks for Solving Reservoir Seepage Equations
标题:用于求解油藏渗流方程的量子-经典物理信息神经网络
链接:https://arxiv.org/abs/2512.03923
作者:Xiang Rao,Yina Liu,Yuxuan Shen
摘要:Solving partial differential equations (PDEs) for reservoir seepage is critical for optimizing oil and gas field development and predicting production performance. Traditional numerical methods suffer from mesh-dependent errors and high computational costs, while classical Physics-Informed Neural Networks (PINNs) face bottlenecks in parameter efficiency, high-dimensional expression, and strong nonlinear fitting. To address these limitations, we propose a Discrete Variable (DV)-Circuit Quantum-Classical Physics-Informed Neural Network (QCPINN) and apply it to three typical reservoir seepage models for the first time: the pressure diffusion equation for heterogeneous single-phase flow, the nonlinear Buckley-Leverett (BL) equation for two-phase waterflooding, and the convection-diffusion equation for compositional flow considering adsorption. The QCPINN integrates classical preprocessing/postprocessing networks with a DV quantum core, leveraging quantum superposition and entanglement to enhance high-dimensional feature mapping while embedding physical constraints to ensure solution consistency. We test three quantum circuit topologies (Cascade, Cross-mesh, Alternate) and demonstrate through numerical experiments that QCPINNs achieve high prediction accuracy with fewer parameters than classical PINNs. Specifically, the Alternate topology outperforms others in heterogeneous single-phase flow and two-phase BL equation simulations, while the Cascade topology excels in compositional flow with convection-dispersion-adsorption coupling. Our work verifies the feasibility of QCPINN for reservoir engineering applications, bridging the gap between quantum computing research and industrial practice in oil and gas engineering.
【3】Scalable Decision Focused Learning via Online Trainable Surrogates
标题:通过在线可训练代理进行可扩展的以决策为中心的学习
链接:https://arxiv.org/abs/2512.03861
作者:Gaetano Signorelli,Michele Lombardi
摘要:Decision support systems often rely on solving complex optimization problems that may require estimating uncertain parameters beforehand. Recent studies have shown how using traditionally trained estimators for this task can lead to suboptimal solutions. Using the actual decision cost as a loss function (called Decision Focused Learning) can address this issue, but with a severe loss of scalability at training time. To overcome this limitation, we propose an acceleration method based on replacing costly loss function evaluations with an efficient surrogate. Unlike previously defined surrogates, our approach relies on unbiased estimators, reducing the risk of spurious local optima, and can provide information on its local confidence, allowing one to switch to a fallback method when needed. Furthermore, the surrogate is designed for a black-box setting, which enables compensating for simplifications in the optimization model and accounting for recourse actions during cost computation. In our results, the method reduces costly inner solver calls, with a solution quality comparable to other state-of-the-art techniques.
【4】Feature-aware Modulation for Learning from Temporal Tabular Data
标题:用于从时序表格数据学习的特征感知调制
链接:https://arxiv.org/abs/2512.03678
作者:Hao-Run Cai,Han-Jia Ye
备注:17 pages, 6 figures, 8 tables. NeurIPS 2025
摘要:While tabular machine learning has achieved remarkable success, temporal distribution shifts pose significant challenges in real-world deployment, as the relationships between features and labels continuously evolve. Static models assume fixed mappings to ensure generalization, whereas adaptive models may overfit to transient patterns, creating a dilemma between robustness and adaptability. In this paper, we analyze key factors essential for constructing an effective dynamic mapping for temporal tabular data. We discover that evolving feature semantics-particularly objective and subjective meanings-introduce concept drift over time. Crucially, we identify that feature transformation strategies are able to mitigate discrepancies in feature representations across temporal stages. Motivated by these insights, we propose a feature-aware temporal modulation mechanism that conditions feature representations on temporal context, modulating statistical properties such as scale and skewness. By aligning feature semantics across time, our approach achieves a lightweight yet powerful adaptation, effectively balancing generalizability and adaptability. Benchmark evaluations validate the effectiveness of our method in handling temporal shifts in tabular data.
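The described modulation of statistical properties by temporal context can be sketched as a FiLM-style scale-and-shift conditioned on the temporal stage. The mechanism below uses random (untrained) parameters and hypothetical dimensions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class TemporalModulation:
    """Minimal sketch: a learned per-stage embedding produces per-feature
    scale and shift that modulate feature representations at each temporal
    stage, aligning feature semantics across time."""
    def __init__(self, n_features, n_stages, hidden=16):
        self.stage_emb = rng.normal(scale=0.1, size=(n_stages, hidden))
        self.W_scale = rng.normal(scale=0.1, size=(hidden, n_features))
        self.W_shift = rng.normal(scale=0.1, size=(hidden, n_features))

    def __call__(self, x, stage):
        h = self.stage_emb[stage]            # (batch, hidden) context
        scale = 1.0 + h @ self.W_scale       # multiplicative modulation (scale)
        shift = h @ self.W_shift             # additive modulation (location)
        return x * scale + shift

mod = TemporalModulation(n_features=8, n_stages=12)
x = rng.normal(size=(4, 8))                  # one batch of tabular features
stage = np.array([0, 3, 7, 11])              # temporal stage of each row
out = mod(x, stage)
```

Because only the lightweight modulation depends on time while the backbone mapping stays fixed, this kind of design balances a static model's generalization with an adaptive model's responsiveness to drift.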
【5】Conditional updates of neural network weights for increased out of training performance
标题:神经网络权重的条件更新以提高训练外性能
链接:https://arxiv.org/abs/2512.03653
作者:Jan Saynisch-Wagner,Saran Rajendran Sari
摘要:This study proposes a method to enhance neural network performance when training data and application data are not very similar, e.g., out of distribution problems, as well as pattern and regime shifts. The method consists of three main steps: 1) Retrain the neural network towards reasonable subsets of the training data set and note down the resulting weight anomalies. 2) Choose reasonable predictors and derive a regression between the predictors and the weight anomalies. 3) Extrapolate the weights, and thereby the neural network, to the application data. We show and discuss this method in three use cases from the climate sciences, which include successful temporal, spatial and cross-domain extrapolations of neural networks.
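The three steps above admit a compact numerical sketch, with a single weight vector standing in for the network and a synthetic predictor-anomaly relationship (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (simulated): retraining toward subsets of the training data yields
# weight anomalies relative to the reference weights. Here the anomalies
# drift linearly with a climate-like predictor (e.g., mean temperature).
n_subsets, n_weights = 20, 5
predictor = np.linspace(0.0, 1.0, n_subsets)          # per-subset predictor
true_slope = rng.normal(size=n_weights)
weight_anom = (predictor[:, None] * true_slope
               + 0.01 * rng.normal(size=(n_subsets, n_weights)))

# Step 2: regress weight anomalies on the predictor (least squares per weight).
X = np.column_stack([np.ones(n_subsets), predictor])
coef, *_ = np.linalg.lstsq(X, weight_anom, rcond=None)  # (2, n_weights)

# Step 3: extrapolate the weights to the application regime, where the
# predictor lies outside the training range (a regime shift).
predictor_app = 1.5
extrapolated_anom = coef[0] + coef[1] * predictor_app
```

Adding the extrapolated anomaly back onto the reference weights gives the network adapted to the application data, which is the study's mechanism for handling pattern and regime shifts.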
【6】When, How Long and How Much? Interpretable Neural Networks for Time Series Regression by Learning to Mask and Aggregate
标题:何时、多长时间以及多少?通过学习掩蔽和聚合来实现时间序列回归的可解释神经网络
链接:https://arxiv.org/abs/2512.03578
作者:Florent Forest,Amaury Wei,Olga Fink
备注:12 pages, 5 figures, 4 tables
摘要:Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. It appears in many domains, including healthcare, finance, environmental monitoring, and engineering. In these settings, accurate predictions and trustworthy reasoning are both essential. Although state-of-the-art TSER models achieve strong predictive performance, they typically operate as black boxes, making it difficult to understand which temporal patterns drive their decisions. Post-hoc interpretability techniques, such as feature attribution, aim to explain how the model arrives at its predictions, but often produce coarse, noisy, or unstable explanations. Recently, inherently interpretable approaches based on concepts, additive decompositions, or symbolic regression have emerged as promising alternatives. However, these approaches remain limited: they require explicit supervision on the concepts themselves, often cannot capture interactions between time-series features, lack expressiveness for complex temporal patterns, and struggle to scale to high-dimensional multivariate data. To address these limitations, we propose MAGNETS (Mask-and-AGgregate NEtwork for Time Series), an inherently interpretable neural architecture for TSER. MAGNETS learns a compact set of human-understandable concepts without requiring any annotations. Each concept corresponds to a learned, mask-based aggregation over selected input features, explicitly revealing both which features drive predictions and when they matter in the sequence. Predictions are formed as combinations of these learned concepts through a transparent, additive structure, enabling clear insight into the model's decision process.
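The mask-and-aggregate concept construction can be sketched with fixed (rather than learned) Gaussian temporal masks and a transparent additive head; the centers, widths, and weights below are illustrative, not MAGNETS' learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_mask(t, center, width):
    """Smooth temporal mask: answers "when" and "how long" for one concept."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

T = 100
t = np.arange(T, dtype=float)
x = np.sin(2 * np.pi * t / 25) + 0.1 * rng.normal(size=T)  # one input channel

# Each concept = a mask-based aggregation over a selected feature.
# (Centers and widths would be learned end-to-end in MAGNETS.)
concepts = []
for center, width in [(20.0, 5.0), (70.0, 10.0)]:
    m = soft_mask(t, center, width)
    concepts.append(np.sum(m * x) / np.sum(m))             # masked weighted mean

# Transparent additive prediction ("how much"): weighted sum of concepts.
head_w = np.array([0.8, -0.5])
prediction = float(head_w @ np.array(concepts))
```

The interpretability comes from inspecting each mask (when the concept looks) and each head weight (how much it contributes), rather than attributing a black-box output post hoc.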
【7】Machine Learning to Predict Slot Usage in TSCH Wireless Sensor Networks
标题:机器学习预测TSCH无线传感器网络中的时隙使用情况
链接:https://arxiv.org/abs/2512.03570
作者:Stefano Scanzio,Gabriele Formis,Tullio Facchinetti,Gianluca Cena
备注:preprint accepted, 8 pages, 2025
摘要:Wireless sensor networks (WSNs) are employed across a wide range of industrial applications where ultra-low power consumption is a critical prerequisite. At the same time, these systems must maintain a certain level of determinism to ensure reliable and predictable operation. In this respect, time slotted channel hopping (TSCH) is a communication technology that meets both conditions, making it an attractive option for use in industrial WSNs. This work proposes the use of machine learning to learn the traffic pattern generated in networks based on the TSCH protocol, in order to put nodes into a deep sleep state when no transmission is planned and thus improve the energy efficiency of the WSN. The ability of machine learning models to make good predictions at different network levels in a typical tree network topology was analyzed in depth, showing how their capabilities degrade when approaching the root of the tree. The application of these models to simulated data based on an accurate modeling of wireless sensor nodes indicates that the investigated algorithms can be suitably used to further and substantially reduce the power consumption of a TSCH network.
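The energy-saving idea, predicting which slots carry traffic and sleeping through the rest, can be sketched with a simple per-position frequency predictor on synthetic periodic TSCH traffic (slotframe length and traffic rates below are hypothetical, not from the paper's simulations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical TSCH traffic trace: 1 = slot used, 0 = idle. A periodic
# sensor transmits every 10th slot, plus sporadic background traffic.
n_slots = 5000
history = np.zeros(n_slots, dtype=int)
history[::10] = 1
history[rng.random(n_slots) < 0.02] = 1

# Minimal predictor: estimate per-position usage frequency within a
# slotframe of length 10 and keep the radio on only for likely-used slots.
frame = 10
position = np.arange(n_slots) % frame
usage_rate = np.array([history[position == p].mean() for p in range(frame)])
keep_awake = usage_rate > 0.5            # sleep whenever predicted idle

# Energy saving: fraction of slots the node can spend in deep sleep.
sleep_fraction = 1.0 - keep_awake[position].mean()
```

A learned model replaces the frequency table when traffic is aperiodic or level-dependent, which is where the paper observes prediction quality degrading toward the root of the tree topology.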
【8】Towards Irreversible Machine Unlearning for Diffusion Models
标题:面向扩散模型的不可逆机器遗忘
链接:https://arxiv.org/abs/2512.03564
作者:Xun Yuan,Zilong Zhao,Jiayu Li,Aryan Pasikhani,Prosanta Gope,Biplab Sikdar
摘要:Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machine unlearning, which can make diffusion models forget specific training data and prevent the generation of sensitive or unwanted content. Current machine unlearning methods for diffusion models are primarily designed for conditional diffusion models and focus on unlearning specific data classes or features. Among these methods, finetuning-based machine unlearning methods are recognized for their efficiency and effectiveness; they update the parameters of pre-trained diffusion models by minimizing carefully designed loss functions. However, in this paper, we propose a novel attack named Diffusion Model Relearning Attack (DiMRA), which can reverse finetuning-based machine unlearning methods, exposing a significant vulnerability of this kind of technique. Without prior knowledge of the unlearning elements, DiMRA optimizes the unlearned diffusion model on an auxiliary dataset to reverse the unlearning, enabling the model to regenerate previously unlearned elements. To mitigate this vulnerability, we propose a novel machine unlearning method for diffusion models, termed Diffusion Model Unlearning by Memorization (DiMUM). Unlike traditional methods that focus on forgetting, DiMUM memorizes alternative data or features to replace targeted unlearning data or features in order to prevent generating such elements. In our experiments, we demonstrate the effectiveness of DiMRA in reversing state-of-the-art finetuning-based machine unlearning methods for diffusion models, highlighting the need for more robust solutions. We extensively evaluate DiMUM, demonstrating its superior ability to preserve the generative performance of diffusion models while enhancing robustness against DiMRA.
【9】Parameter-Efficient Augment Plugin for Class-Incremental Learning
标题:用于类增量学习的参数高效增强插件
链接:https://arxiv.org/abs/2512.03537
作者:Zhiming Xu,Baile Xu,Jian Zhao,Furao Shen,Suorong Yang
备注:10 pages, 6 figures, 2 tables
摘要:Existing class-incremental learning (CIL) approaches based on replay or knowledge distillation are often constrained by forgetting or the stability-plasticity dilemma. Some expansion-based approaches could achieve higher accuracy. However, they always require significant parameter increases. In this paper, we propose a plugin extension paradigm termed the Deployment of extra LoRA Components (DLC) for non-pre-trained CIL scenarios. We treat the feature extractor trained through replay or distillation as a base model with rich knowledge. For each task, we use Low-Rank Adaptation (LoRA) to inject task-specific residuals into the base model's deep layers. During inference, representations with task-specific residuals are aggregated to produce classification predictions. To mitigate interference from non-target LoRA plugins, we introduce a lightweight weighting unit. This unit learns to assign importance scores to different LoRA-tuned representations. Like downloadable content in software, our method serves as a plug-and-play enhancement that efficiently extends the base methods. Remarkably, on the large-scale ImageNet-100, with merely 4% of the parameters of a standard ResNet-18, our DLC model achieves a significant 8% improvement in accuracy, demonstrating exceptional efficiency. Moreover, it could surpass state-of-the-art methods under a fixed memory budget.
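The low-rank plugin idea can be sketched as follows; the layer sizes, rank, and fixed weighting scores are illustrative assumptions, and a real DLC weighting unit would be learned:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 4
W_base = rng.normal(scale=0.05, size=(d_in, d_out))   # frozen base layer

class LoRAPlugin:
    """Low-rank residual in the spirit of DLC: the task-specific update is
    A @ B with rank << d, so each new task adds very few parameters."""
    def __init__(self):
        self.A = rng.normal(scale=0.05, size=(d_in, rank))
        self.B = np.zeros((rank, d_out))              # zero-init: no-op at start

    def forward(self, x):
        return x @ W_base + x @ self.A @ self.B       # base + residual

plugins = [LoRAPlugin() for _ in range(3)]            # one per incremental task
x = rng.normal(size=(2, d_in))

# Lightweight weighting unit (fixed softmax scores here, for illustration)
# aggregates the task-specific representations before classification.
scores = np.array([0.2, 0.5, 0.3])
weights = np.exp(scores) / np.exp(scores).sum()
agg = sum(w * p.forward(x) for w, p in zip(weights, plugins))

extra_params = 3 * (d_in * rank + rank * d_out)       # all plugin parameters
base_params = d_in * d_out                            # one base layer
```

The parameter arithmetic shows why expansion stays cheap: three rank-4 plugins cost 1536 parameters against 4096 for the single base layer they adapt.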
【10】Physics-Driven Learning Framework for Tomographic Tactile Sensing
标题:断层扫描触觉感知的物理驱动学习框架
链接:https://arxiv.org/abs/2512.03512
作者:Xuanxuan Yang,Xiuyang Zhang,Haofeng Chen,Gang Ma,Xiaojie Wang
备注:7 pages, 7 figures
摘要:Electrical impedance tomography (EIT) provides an attractive solution for large-area tactile sensing due to its minimal wiring and shape flexibility, but its nonlinear inverse problem often leads to severe artifacts and inaccurate contact reconstruction. This work presents PhyDNN, a physics-driven deep reconstruction framework that embeds the EIT forward model directly into the learning objective. By jointly minimizing the discrepancy between predicted and ground-truth conductivity maps and enforcing consistency with the forward PDE, PhyDNN reduces the black-box nature of deep networks and improves both physical plausibility and generalization. To enable efficient backpropagation, we design a differentiable forward-operator network that accurately approximates the nonlinear EIT response, allowing fast physics-guided training. Extensive simulations and real tactile experiments on a 16-electrode soft sensor show that PhyDNN consistently outperforms NOSER, TV, and standard DNNs in reconstructing contact shape, location, and pressure distribution. PhyDNN yields fewer artifacts, sharper boundaries, and higher metric scores, demonstrating its effectiveness for high-quality tomographic tactile sensing.
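The joint objective, a data term plus forward-model consistency, can be sketched with a linear stand-in for the differentiable forward-operator network (the sensitivity matrix, sizes, and weighting are hypothetical, not PhyDNN's trained operator):

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix, n_meas = 16, 8
J = rng.normal(size=(n_meas, n_pix))       # hypothetical sensitivity matrix

def forward_op(sigma):
    """Stand-in for the differentiable forward-operator network that
    approximates the (nonlinear) EIT response to a conductivity map."""
    return J @ sigma

sigma_gt = rng.uniform(size=n_pix)         # ground-truth conductivity map
v_meas = forward_op(sigma_gt)              # boundary voltage measurements
sigma_pred = sigma_gt + 0.05 * rng.normal(size=n_pix)  # network output

# Physics-driven joint objective: supervised discrepancy plus consistency
# of the predicted map with the forward model.
lam = 0.5
data_loss = np.mean((sigma_pred - sigma_gt) ** 2)
physics_loss = np.mean((forward_op(sigma_pred) - v_meas) ** 2)
total_loss = data_loss + lam * physics_loss
```

Because the surrogate forward operator is differentiable, the physics term backpropagates into the reconstruction network, which is what penalizes physically implausible artifacts.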
【11】Modal Logical Neural Networks
标题:模态逻辑神经网络
链接:https://arxiv.org/abs/2512.03491
作者:Antonin Sulc
备注:27 pages, 10 figures, 7 tables
摘要:We propose Modal Logical Neural Networks (MLNNs), a neurosymbolic framework that integrates deep learning with the formal semantics of modal logic, enabling reasoning about necessity and possibility. Drawing on Kripke semantics, we introduce specialized neurons for the modal operators $\Box$ and $\Diamond$ that operate over a set of possible worlds, enabling the framework to act as a differentiable "logical guardrail." The architecture is highly flexible: the accessibility relation between worlds can either be fixed by the user to enforce known rules or, as an inductive feature, be parameterized by a neural network. This allows the model to optionally learn the relational structure of a logical system from data while simultaneously performing deductive reasoning within that structure. This versatile construction is designed for flexibility. The entire framework is differentiable from end to end, with learning driven by minimizing a logical contradiction loss. This not only makes the system resilient to inconsistent knowledge but also enables it to learn nonlinear relationships that can help define the logic of a problem space. We illustrate MLNNs on four case studies: grammatical guardrailing, axiomatic detection of the unknown, multi-agent epistemic trust, and detecting constructive deception in natural language negotiation. These experiments demonstrate how enforcing or learning accessibility can increase logical consistency and interpretability without changing the underlying task architecture.
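The Kripke-style modal neurons can be illustrated with hard min/max over accessible worlds; an actual MLNN would use smooth, differentiable relaxations, and the truth values and accessibility relation below are fixed by hand for illustration:

```python
import numpy as np

# Truth value of proposition p in each of four possible worlds (in [0, 1]).
p = np.array([0.9, 0.8, 0.3, 0.95])

# Accessibility relation: R[i, j] = 1 if world j is accessible from world i.
R = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
])

def box(p, R):
    """Necessity (Box): p must hold in every accessible world -> min."""
    return np.array([p[R[i] == 1].min() for i in range(len(p))])

def diamond(p, R):
    """Possibility (Diamond): p holds in some accessible world -> max."""
    return np.array([p[R[i] == 1].max() for i in range(len(p))])

box_p = box(p, R)
dia_p = diamond(p, R)

# Kripke duality Box p = not Diamond not p holds for this semantics.
duality_gap = np.abs(box_p - (1 - diamond(1 - p, R))).max()
```

Making `R` a (sigmoid-squashed) parameter matrix instead of a fixed 0/1 array is the inductive variant described in the abstract: the accessibility structure is then learned jointly with the rest of the network.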
【12】Joint Progression Modeling (JPM): A Probabilistic Framework for Mixed-Pathology Progression
标题:联合进展建模(JPM):混合病理进展的概率框架
链接:https://arxiv.org/abs/2512.03475
作者:Hongtao Hao,Joseph L. Austerweil
备注:49 pages; Machine Learning for Health (ML4H) Symposium 2025
摘要:Event-based models (EBMs) infer disease progression from cross-sectional data, and standard EBMs assume a single underlying disease per individual. In contrast, mixed pathologies are common in neurodegeneration. We introduce the Joint Progression Model (JPM), a probabilistic framework that treats single-disease trajectories as partial rankings and builds a prior over joint progressions. We study several JPM variants (Pairwise, Bradley-Terry, Plackett-Luce, and Mallows) and analyze three properties: (i) calibration -- whether lower model energy predicts smaller distance to the ground truth ordering; (ii) separation -- the degree to which sampled rankings are distinguishable from random permutations; and (iii) sharpness -- the stability of sampled aggregate rankings. All variants are calibrated, and all achieve near-perfect separation; sharpness varies by variant and is well-predicted by simple features of the input partial rankings (number and length of rankings, conflict, and overlap). In synthetic experiments, JPM improves ordering accuracy by roughly 21 percent over a strong EBM baseline (SA-EBM) that treats the joint disease as a single condition. Finally, using NACC, we find that the Mallows variant of JPM and the baseline model (SA-EBM) have results that are more consistent with prior literature on the possible disease progression of the mixed pathology of AD and VaD.
【13】Grokked Models are Better Unlearners
标题:Grokked模型是更好的遗忘者
链接:https://arxiv.org/abs/2512.03437
作者:Yuanbang Liang,Yang Li
摘要:Grokking, the delayed generalization that emerges well after a model has fit the training data, has been linked to robustness and representation quality. We ask whether this training regime also helps with machine unlearning, i.e., removing the influence of specified data without full retraining. We compare applying standard unlearning methods before versus after the grokking transition across vision (CNNs/ResNets on CIFAR, SVHN, and ImageNet) and language (a transformer on a TOFU-style setup). Starting from grokked checkpoints consistently yields (i) more efficient forgetting (fewer updates to reach a target forget level), (ii) less collateral damage (smaller drops on retained and test performance), and (iii) more stable updates across seeds, relative to early-stopped counterparts under identical unlearning algorithms. Analyses of features and curvature further suggest that post-grokking models learn more modular representations with reduced gradient alignment between forget and retain subsets, which facilitates selective forgetting. Our results highlight when a model is trained (pre- vs. post-grokking) as an orthogonal lever to how unlearning is performed, providing a practical recipe to improve existing unlearning methods without altering their algorithms.
【14】Better World Models Can Lead to Better Post-Training Performance
标题:更好的世界模型可以带来更好的训练后表现
链接:https://arxiv.org/abs/2512.03400
作者:Prakhar Gupta,Henry Conklin,Sarah-Jane Leslie,Andrew Lee
摘要:In this work we study how explicit world-modeling objectives affect the internal representations and downstream capability of Transformers across different training stages. We use a controlled 2x2x2 Rubik's Cube and ask: (1) how does explicitly pretraining a world model affect the model's latent representations, and (2) how does world-model quality affect the model's performance after reinforcement learning post-training? We compare standard next-token prediction to two explicit world-modeling strategies -- (i) state-prediction pretraining and (ii) a joint state-prediction + next-token objective -- and assess task performance after Group Relative Policy Optimization (GRPO) is applied as post-training. We evaluate the representation quality with linear probes and causal interventions. We find that explicit world-modeling yields more linearly decodable and causally steerable state representations. More importantly, we find that improved state representations lead to higher gains for GRPO, especially on harder cube states. Our results indicate that sharpening state representations can improve the effectiveness of post-training for sequence-planning tasks.
【15】Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
标题:全栈对齐:以厚价值模型共同对齐人工智能与制度机构
链接:https://arxiv.org/abs/2512.03399
作者:Joe Edelman,Tan Zhi-Xuan,Ryan Lowe,Oliver Klingefjord,Vincent Wang-Mascianica,Matija Franklin,Ryan Othniel Kearns,Ellie Hain,Atrisha Sarkar,Michiel Bakker,Fazl Barez,David Duvenaud,Jakob Foerster,Iason Gabriel,Joseph Gubbels,Bryce Goodman,Andreas Haupt,Jobst Heitzig,Julian Jara-Ettinger,Atoosa Kasirzadeh,James Ravi Kirkpatrick,Andrew Koh,W. Bradley Knox,Philipp Koralus,Joel Lehman,Sydney Levine,Samuele Marro,Manon Revel,Toby Shorin,Morgan Sutherland,Michael Henry Tessler,Ivan Vendrov,James Wilken-Smith
摘要:Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. They struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We propose thick models of value will be needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.
【16】Breaking Determinism: Stochastic Modeling for Reliable Off-Policy Evaluation in Ad Auctions
标题:打破确定性:广告拍卖中可靠离线策略评估的随机建模
链接:https://arxiv.org/abs/2512.03354
作者:Hongseon Yeom,Jaeyoul Shin,Soojin Min,Jeongmin Yoon,Seunghak Yu,Dongyeop Kang
摘要:Online A/B testing, the gold standard for evaluating new advertising policies, consumes substantial engineering resources and risks significant revenue loss from deploying underperforming variations. This motivates the use of Off-Policy Evaluation (OPE) for rapid, offline assessment. However, applying OPE to ad auctions is fundamentally more challenging than in domains like recommender systems, where stochastic policies are common. In online ad auctions, it is common for the highest-bidding ad to win the impression, resulting in a deterministic, winner-takes-all setting. This results in zero probability of exposure for non-winning ads, rendering standard OPE estimators inapplicable. We introduce the first principled framework for OPE in deterministic auctions by repurposing the bid landscape model to approximate the propensity score. This model allows us to derive robust approximate propensity scores, enabling the use of stable estimators like Self-Normalized Inverse Propensity Scoring (SNIPS) for counterfactual evaluation. We validate our approach on the AuctionNet simulation benchmark and against a 2-week online A/B test from a large-scale industrial platform. Our method shows remarkable alignment with online results, achieving a 92% Mean Directional Accuracy (MDA) in CTR prediction, significantly outperforming the parametric baseline. MDA is the most critical metric for guiding deployment decisions, as it reflects the ability to correctly predict whether a new model will improve or harm performance. This work contributes the first practical and validated framework for reliable OPE in deterministic auction environments, offering an efficient alternative to costly and risky online experiments.
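The estimator itself is compact: with approximate propensities from a bid-landscape model, SNIPS normalizes the importance-weighted rewards by the total weight. A sketch under a hypothetical logistic win-probability model (not the paper's fitted landscape; bids, click rates, and the target policy are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged impressions: winning bids and observed rewards (clicks).
n = 10000
bids = rng.uniform(1.0, 2.0, size=n)
clicks = rng.binomial(1, 0.1, size=n).astype(float)

def win_propensity(bid):
    """Hypothetical bid-landscape model: a smooth win probability in the
    bid, standing in for the deterministic winner-takes-all auction."""
    return 1.0 / (1.0 + np.exp(-4.0 * (bid - 1.5)))

p_log = win_propensity(bids)             # behavior-policy propensities
p_new = win_propensity(bids * 1.1)       # target policy bids 10% higher

w = p_new / p_log                        # importance weights
snips = np.sum(w * clicks) / np.sum(w)   # self-normalized IPS estimate
ips = np.mean(w * clicks)                # plain IPS, for contrast
```

The self-normalization is what buys stability: SNIPS stays in the reward range even when the approximate propensities are miscalibrated by a constant factor, whereas plain IPS does not.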
【17】Retrofitting Earth System Models with Cadence-Limited Neural Operator Updates
标题:利用更新频率受限的神经算子更新改造地球系统模型
链接:https://arxiv.org/abs/2512.03309
作者:Aniruddha Bora,Shixuan Zhang,Khemraj Shukla,Bryce Harrop,George Em. Karniadakis,L. Ruby Leung
摘要:Coarse resolution, imperfect parameterizations, and uncertain initial states and forcings limit Earth-system model (ESM) predictions. Traditional bias correction via data assimilation improves constrained simulations but offers limited benefit once models run freely. We introduce an operator-learning framework that maps instantaneous model states to bias-correction tendencies and applies them online during integration. Building on a U-Net backbone, we develop two operator architectures, Inception U-Net (IUNet) and a multi-scale network (M&M), that combine diverse upsampling and receptive fields to capture multiscale nonlinear features under Energy Exascale Earth System Model (E3SM) runtime constraints. Trained on two years of E3SM simulations nudged toward ERA5 reanalysis, the operators generalize across height levels and seasons. Both architectures outperform standard U-Net baselines in offline tests, indicating that functional richness rather than parameter count drives performance. In online hybrid E3SM runs, M&M delivers the most consistent bias reductions across variables and vertical levels. The ML-augmented configurations remain stable and computationally feasible in multi-year simulations, providing a practical pathway for scalable hybrid modeling. Our framework emphasizes long-term stability, portability, and cadence-limited updates, demonstrating the utility of expressive ML operators for learning structured, cross-scale relationships and retrofitting legacy ESMs.
【18】Robust Tabular Foundation Models
标题:稳健的表格基础模型
链接:https://arxiv.org/abs/2512.03307
作者:Matthew Peroni,Franck Le,Vadim Sheinin
备注:Shaping Responsible Synthetic Data in the Era of Foundation Models, AAAI 2026
摘要:The development of tabular foundation models (TFMs) has accelerated in recent years, showing strong potential to outperform traditional ML methods for structured data. A key finding is that TFMs can be pretrained entirely on synthetic datasets, opening opportunities to design data generators that encourage desirable model properties. Prior work has mainly focused on crafting high-quality priors over generators to improve overall pretraining performance. Our insight is that parameterizing the generator distribution enables an adversarial robustness perspective: during training, we can adapt the generator to emphasize datasets that are particularly challenging for the model. We formalize this by introducing an optimality gap measure, given by the difference between TFM performance and the best achievable performance as estimated by strong baselines such as XGBoost, CatBoost, and Random Forests. Building on this idea, we propose Robust Tabular Foundation Models (RTFM), a model-agnostic adversarial training framework. Applied to the TabPFN V2 classifier, RTFM improves benchmark performance, with up to a 6% increase in mean normalized AUC over the original TabPFN and other baseline algorithms, while requiring less than 100k additional synthetic datasets. These results highlight a promising new direction for targeted adversarial training and fine-tuning of TFMs using synthetic data alone.
【19】Learning Network Sheaves for AI-native Semantic Communication
标题:学习用于人工智能原生语义通信的网络层
链接:https://arxiv.org/abs/2512.03248
作者:Enrico Grimaldi,Mario Edoardo Pandolfo,Gabriele D'Acunto,Sergio Barbarossa,Paolo Di Lorenzo
摘要:Recent advances in AI call for a paradigm shift from bit-centric communication to goal- and semantics-oriented architectures, paving the way for AI-native 6G networks. In this context, we address a key open challenge: enabling heterogeneous AI agents to exchange compressed latent-space representations while mitigating semantic noise and preserving task-relevant meaning. We cast this challenge as learning both the communication topology and the alignment maps that govern information exchange among agents, yielding a learned network sheaf equipped with orthogonal maps. This learning process is further supported by a semantic denoising and compression module that constructs a shared global semantic space and derives sparse, structured representations of each agent's latent space. This corresponds to a nonconvex dictionary learning problem solved iteratively with closed-form updates. Experiments with multiple AI agents pre-trained on real image data show that the semantic denoising and compression facilitates AI agent alignment and the extraction of semantic clusters, while preserving high accuracy in downstream tasks. The resulting communication network provides new insights into semantic heterogeneity across agents, highlighting the interpretability of our methodology.
【20】A Multi-Agent, Policy-Gradient approach to Network Routing
标题:网络路由的多代理、策略梯度方法
链接:https://arxiv.org/abs/2512.03211
作者:Nigel Tao,Jonathan Baxter,Lex Weaver
摘要:Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.
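The core of OLPOMDP is an eligibility trace of score functions scaled by the instantaneous reward. The two-action bandit below is our own toy stand-in for a router's choice, not one of the paper's network models; the softmax policy and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# OLPOMDP-style online update: keep a discounted eligibility trace of
# grad log pi and scale it by the reward received at each step.
theta = np.zeros(2)   # policy parameters (softmax logits)
z = np.zeros(2)       # eligibility trace
beta, alpha = 0.9, 0.02  # trace discount, step size

for t in range(5000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    reward = rng.normal(loc=(0.2 if a == 0 else 1.0))  # action 1 pays more
    grad_log = -p
    grad_log[a] += 1.0        # grad of log pi(a|theta) for a softmax policy
    z = beta * z + grad_log
    theta = theta + alpha * reward * z

print(softmax(theta))  # mass shifts toward the better action
```

Because the update uses only locally available quantities (own action, own reward signal), each router can run it independently, which is what enables the cooperative behavior without explicit inter-agent communication described above.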
【21】Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
标题:核磁共振谱解析小分子结构的原子扩散模型
链接:https://arxiv.org/abs/2512.03127
作者:Ziyu Xiong,Yichi Zhang,Foyez Alauddin,Chu Xin Cheng,Joon Soo An,Mohammad R. Seyedsayamdost,Ellen D. Zhong
备注:NeurIPS 2025
摘要:Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
【22】Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
标题:在统一多模态模型的持续学习中减轻模态内和模态间遗忘
链接:https://arxiv.org/abs/2512.03125
作者:Xiwen Wei,Mustafa Munir,Radu Marculescu
备注:NeurIPS 2025
摘要:Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks is severely hindered by catastrophic forgetting, both within a modality (intra-modal) and across modalities (inter-modal). While intra-modal forgetting has been studied in prior continual learning (CL) work, inter-modal forgetting remains largely unexplored. In this paper, we identify and empirically validate this phenomenon in UMGMs and provide a theoretical explanation rooted in gradient conflict between modalities. To address both intra- and inter-modal forgetting, we propose Modality-Decoupled Experts (MoDE), a lightweight and scalable architecture that isolates modality-specific updates to mitigate the gradient conflict and leverages knowledge distillation to prevent catastrophic forgetting and preserve pre-trained capabilities. Unlike previous CL methods that remain modality-coupled and suffer from modality gradient conflict, MoDE explicitly decouples modalities to prevent interference. Experiments across diverse benchmarks demonstrate that MoDE significantly mitigates both inter- and intra-modal forgetting, outperforming prior CL baselines in unified multimodal generation settings. Codes will be publicly available: https://github.com/Christina200/MoDE-official.git
【23】Model-Agnostic Fairness Regularization for GNNs with Incomplete Sensitive Information
标题:不完全敏感信息GNNs的模型无关公平正则化
链接:https://arxiv.org/abs/2512.03074
作者:Mahdi Tavassoli Kejani,Fadi Dornaika,Jean-Michel Loubes
摘要:Graph Neural Networks (GNNs) have demonstrated exceptional efficacy in relational learning tasks, including node classification and link prediction. However, their application raises significant fairness concerns, as GNNs can perpetuate and even amplify societal biases against protected groups defined by sensitive attributes such as race or gender. These biases are often inherent in the node features, structural topology, and message-passing mechanisms of the graph itself. A critical limitation of existing fairness-aware GNN methods is their reliance on the strong assumption that sensitive attributes are fully available for all nodes during training--a condition that poses a practical impediment due to privacy concerns and data collection constraints. To address this gap, we propose a novel, model-agnostic fairness regularization framework designed for the realistic scenario where sensitive attributes are only partially available. Our approach formalizes a fairness-aware objective function that integrates both equal opportunity and statistical parity as differentiable regularization terms. Through a comprehensive empirical evaluation across five real-world benchmark datasets, we demonstrate that the proposed method significantly mitigates bias across key fairness metrics while maintaining competitive node classification performance. Results show that our framework consistently outperforms baseline models in achieving a favorable fairness-accuracy trade-off, with minimal degradation in predictive accuracy. The datasets and source code will be publicly released at https://github.com/mtavassoli/GNN-FC.
【24】Physics-Informed Machine Learning for Steel Development: A Computational Framework and CCT Diagram Modelling
标题:用于钢铁开发的物理信息机器学习:计算框架与CCT图建模
链接:https://arxiv.org/abs/2512.03050
作者:Peter Hedström,Victor Lamelas Cubero,Jón Sigurdsson,Viktor Österberg,Satish Kolli,Joakim Odqvist,Ziyong Hou,Wangzhong Mu,Viswanadh Gowtham Arigela
备注:14 pages
摘要:Machine learning (ML) has emerged as a powerful tool for accelerating the computational design and production of materials. In materials science, ML has primarily supported large-scale discovery of novel compounds using first-principles data and digital twin applications for optimizing manufacturing processes. However, applying general-purpose ML frameworks to complex industrial materials such as steel remains a challenge. A key obstacle is accurately capturing the intricate relationship between chemical composition, processing parameters, and the resulting microstructure and properties. To address this, we introduce a computational framework that combines physical insights with ML to develop a physics-informed continuous cooling transformation (CCT) model for steels. Our model, trained on a dataset of 4,100 diagrams, is validated against literature and experimental data. It demonstrates high computational efficiency, generating complete CCT diagrams with 100 cooling curves in under 5 seconds. It also shows strong generalizability across alloy steels, achieving phase classification F1 scores above 88% for all phases. For phase transition temperature regression, it attains mean absolute errors (MAE) below 20 °C across all phases except bainite, which shows a slightly higher MAE of 27 °C. This framework can be extended with additional generic and customized ML models to establish a universal digital twin platform for heat treatment. Integration with complementary simulation tools and targeted experiments will further support accelerated materials design workflows.
【25】Refining Machine Learning Potentials through Thermodynamic Theory of Phase Transitions
标题:通过相变热力学理论精炼机器学习势
链接:https://arxiv.org/abs/2512.03974
作者:Paul Fuchs,Julija Zavadlav
摘要:Foundational Machine Learning Potentials can resolve the accuracy and transferability limitations of classical force fields. They enable microscopic insights into material behavior through Molecular Dynamics simulations, which can crucially expedite material design and discovery. However, insufficiently broad and systematically biased reference data affect the predictive quality of the learned models. Often, these models exhibit significant deviations from experimentally observed phase transition temperatures, in the order of several hundred kelvins. Thus, fine-tuning is necessary to achieve adequate accuracy in many practical problems. This work proposes a fine-tuning strategy via top-down learning, directly correcting the wrongly predicted transition temperatures to match the experimental reference data. Our approach leverages the Differentiable Trajectory Reweighting algorithm to minimize the free energy differences between phases at the experimental target pressures and temperatures. We demonstrate that our approach can accurately correct the phase diagram of pure Titanium in a pressure range of up to 5 GPa, matching the experimental reference within tenths of kelvins and improving the liquid-state diffusion constant. Our approach is model-agnostic, applicable to multi-component systems with solid-solid and solid-liquid transitions, and compliant with top-down training on other experimental properties. Therefore, our approach can serve as an essential step towards highly accurate application-specific and foundational machine learning potentials.
【26】A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
标题:大规模人工智能模型中稀疏专家混合的无辅助损失负载均衡理论框架
链接:https://arxiv.org/abs/2512.03915
作者:X. Y. Han,Yuan Zhong
摘要:In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing the Auxiliary-Loss-Free Load Balancing (ALF-LB) procedure -- proposed by DeepSeek's Wang et al. (2024) -- by casting it as a one-step-per-iteration primal-dual method for an assignment problem. First, in a stylized deterministic setting, our framework yields several insightful structural properties: (i) a monotonic improvement of a Lagrangian objective, (ii) a preference rule that moves tokens from overloaded to underloaded experts, and (iii) an approximate-balancing guarantee. Then, we incorporate the stochastic and dynamic nature of AI training using a generalized online optimization formulation. In the online setting, we derive a strong convexity property of the objective that leads to a logarithmic expected regret bound under certain step-size choices. Additionally, we present real experiments on 1B-parameter DeepSeekMoE models to complement our theoretical findings. Together, these results build a principled framework for analyzing the Auxiliary-Loss-Free Load Balancing of s-MoE in AI models.
【27】Comparison of neural network training strategies for the simulation of dynamical systems
标题:动态系统模拟的神经网络训练策略比较
链接:https://arxiv.org/abs/2512.03851
作者:Paul Strasser,Andreas Pfeffer,Jakob Weber,Markus Gurtner,Andreas Körner
备注:submitted to ECC
摘要:Neural networks have become a widely adopted tool for modeling nonlinear dynamical systems from data. However, the choice of training strategy remains a key design decision, particularly for simulation tasks. This paper compares two predominant strategies: parallel and series-parallel training. The empirical analysis spans five neural network architectures and two examples: a pneumatic valve test bench and an industrial robot benchmark. The study reveals that, even though series-parallel training dominates current practice, parallel training consistently yields better long-term prediction accuracy. Additionally, this work clarifies the often inconsistent terminology in the literature and relates both strategies to concepts from system identification. The findings suggest that parallel training should be considered the default training strategy for neural network-based simulation of dynamical systems.
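The distinction between the two strategies is where the model's past output comes from: measured data (series-parallel, one-step-ahead) or the model's own predictions (parallel, free-running). The hypothetical first-order plant and slightly biased linear model below are our own illustration, not one of the paper's benchmarks.

```python
import numpy as np

def rollout(model, u, y0, measured=None):
    """series-parallel: predict each step from the *measured* output;
    parallel: predict from the model's *own* previous prediction."""
    y = [y0]
    for k in range(len(u) - 1):
        prev = measured[k] if measured is not None else y[k]
        y.append(model(prev, u[k]))
    return np.array(y)

# Hypothetical plant y+ = a*y + b*u and a slightly biased model of it.
a_true, b_true = 0.9, 0.5
a_hat, b_hat = 0.85, 0.5
u = np.ones(51)
y_meas = np.zeros(51)
for k in range(50):
    y_meas[k + 1] = a_true * y_meas[k] + b_true * u[k]

model = lambda y, uk: a_hat * y + b_hat * uk
y_sp = rollout(model, u, 0.0, measured=y_meas)  # one-step-ahead
y_par = rollout(model, u, 0.0)                  # free-running simulation

# The same small model error compounds only in free-running mode:
print(abs(y_sp[-1] - y_meas[-1]), abs(y_par[-1] - y_meas[-1]))
```

A model trained one-step-ahead is never penalized for this compounding, which is why optimizing the free-running (parallel) rollout tends to give better long-horizon simulation accuracy.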
【28】Colored Markov Random Fields for Probabilistic Topological Modeling
标题:用于概率拓扑建模的有色Markov随机场
链接:https://arxiv.org/abs/2512.03727
作者:Lorenzo Marinucci,Leonardo Di Nino,Gabriele D'Acunto,Mario Edoardo Pandolfo,Paolo Di Lorenzo,Sergio Barbarossa
备注:Proceeding of 2025 Asilomar Conference on Signals, Systems, and Computers
摘要:Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph (nodes for variables, links for dependencies) and factorize the joint distribution into lower-dimensional components. This makes PGMs well-suited for analyzing complex systems and supporting decision-making. Recent advances in topological signal processing highlight the importance of variables defined on topological spaces in several application domains. In such cases, the underlying topology shapes statistical relationships, limiting the expressiveness of canonical PGMs. To overcome this limitation, we introduce Colored Markov Random Fields (CMRFs), which model both conditional and marginal dependencies among Gaussian edge variables on topological spaces, with a theoretical foundation in Hodge theory. CMRFs extend classical Gaussian Markov Random Fields by including link coloring: connectivity encodes conditional independence, while color encodes marginal independence. We quantify the benefits of CMRFs through a distributed estimation case study over a physical network, comparing it with baselines with different levels of topological prior.
【29】Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models
标题:朗之万动力学的一致投影:在粗粒化模型中保留热力学和动力学性质
链接:https://arxiv.org/abs/2512.03706
作者:Vahid Nateghi,Lara Neureither,Selma Moqvist,Carsten Hartmann,Simon Olsson,Feliks Nüske
摘要:Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a projection-based coarse-graining formalism for general underdamped Langevin dynamics. Following the Zwanzig projection approach, we derive a closed-form expression for the coarse-grained dynamics. In addition, we show how the generator Extended Dynamic Mode Decomposition (gEDMD) method, which was developed in the context of Koopman operator methods, can be used to model the CG dynamics and evaluate its kinetic properties, such as transition timescales. Finally, we combine our approach with thermodynamic interpolation (TI), a generative approach to transform samples between thermodynamic conditions, to extend the scope of the approach across thermodynamic states without repeated numerical simulations. Using a two-dimensional model system, we demonstrate that the proposed method accurately captures the thermodynamic and kinetic properties of the full-space model.
【30】Learning From Limited Data and Feedback for Cell Culture Process Monitoring: A Comparative Study
标题:从有限数据和反馈中学习细胞培养过程监测:比较研究
链接:https://arxiv.org/abs/2512.03460
作者:Johnny Peng,Thanh Tung Khuat,Ellen Otte,Katarzyna Musial,Bogdan Gabrys
备注:Preprint submitted to the Computers & Chemical Engineering journal
摘要:In cell culture bioprocessing, real-time batch process monitoring (BPM) refers to the continuous tracking and analysis of key process variables such as viable cell density, nutrient levels, metabolite concentrations, and product titer throughout the duration of a batch run. This enables early detection of deviations and supports timely control actions to ensure optimal cell growth and product quality. BPM plays a critical role in ensuring the quality and regulatory compliance of biopharmaceutical manufacturing processes. However, the development of accurate soft sensors for BPM is hindered by key challenges, including limited historical data, infrequent feedback, heterogeneous process conditions, and high-dimensional sensory inputs. This study presents a comprehensive benchmarking analysis of machine learning (ML) methods designed to address these challenges, with a focus on learning from historical data with limited volume and relevance in the context of bioprocess monitoring. We evaluate multiple ML approaches including feature dimensionality reduction, online learning, and just-in-time learning across three datasets, one in silico dataset and two real-world experimental datasets. Our findings highlight the importance of training strategies in handling limited data and feedback, with batch learning proving effective in homogeneous settings, while just-in-time learning and online learning demonstrate superior adaptability in cold-start scenarios. Additionally, we identify key meta-features, such as feed media composition and process control strategies, that significantly impact model transferability. The results also suggest that integrating Raman-based predictions with lagged offline measurements enhances monitoring accuracy, offering a promising direction for future bioprocess soft sensor development.
其他(36篇)
【1】Fare Comparison App of Uber, Ola and Rapido
标题:Uber、Ola和Rapido的票价比较应用
链接:https://arxiv.org/abs/2512.04065
作者:Ashlesha Gopinath Sawant,Sahil S. Jadhav,Vidhan R. Jain,Shriraj S. Jagtap,Prachi Jadhav,Soham Jadhav,Ichha Raina
备注:4 pages
摘要:In today's fast-paced world, ride-hailing services like Ola, Uber, and Rapido are essential for daily transportation. Users often face difficulty choosing the ride that is both cost-effective and fastest to their destination. This project provides a web application that helps users select the most beneficial ride by comparing fares across Ola, Uber, and Rapido for the destination the user enters. The backend, written in Python, fetches the data, presents the fare comparison, and finally recommends the best option. This paper also addresses the problems and challenges faced in accessing the data using APIs, the Android Studio emulator, Appium, and location comparison. The aim of the project is thus to bring transparency to ride-hailing services, increase efficiency, and provide users with a better experience.
【2】Eval Factsheets: A Structured Framework for Documenting AI Evaluations
标题:Eval概况表:记录人工智能评估的结构化框架
链接:https://arxiv.org/abs/2512.04062
作者:Florian Bordes,Candace Ross,Justine T Kao,Evangelia Spiliopoulou,Adina Williams
摘要:The rapid proliferation of benchmarks has created significant challenges in reproducibility, transparency, and informed decision-making. However, unlike datasets and models -- which benefit from structured documentation frameworks like Datasheets and Model Cards -- evaluation methodologies lack systematic documentation standards. We introduce Eval Factsheets, a structured, descriptive framework for documenting AI system evaluations through a comprehensive taxonomy and questionnaire-based approach. Our framework organizes evaluation characteristics across five fundamental dimensions: Context (Who made the evaluation and when?), Scope (What does it evaluate?), Structure (With what the evaluation is built?), Method (How does it work?) and Alignment (In what ways is it reliable/valid/robust?). We implement this taxonomy as a practical questionnaire spanning five sections with mandatory and recommended documentation elements. Through case studies on multiple benchmarks, we demonstrate that Eval Factsheets effectively captures diverse evaluation paradigms -- from traditional benchmarks to LLM-as-judge methodologies -- while maintaining consistency and comparability. We hope Eval Factsheets are incorporated into both existing and newly released evaluation frameworks and lead to more transparency and reproducibility.
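The five dimensions lend themselves to a simple machine-readable representation. The field names below are illustrative placeholders, not the paper's official schema; only the five top-level dimensions come from the abstract.

```python
# Sketch of a factsheet keyed by the five dimensions: Context, Scope,
# Structure, Method, Alignment. Leaf fields are hypothetical examples.
factsheet = {
    "context": {"authors": "...", "release_date": "...", "version": "..."},
    "scope": {"capability_evaluated": "...", "domains": ["..."]},
    "structure": {"data_sources": ["..."], "annotation": "..."},
    "method": {"scoring": "...", "protocol": "..."},
    "alignment": {"validity_checks": ["..."], "known_limitations": ["..."]},
}

MANDATORY = ("context", "scope", "structure", "method", "alignment")

def is_complete(sheet):
    # A factsheet is minimally complete when every mandatory dimension
    # is present and non-empty.
    return all(sheet.get(k) for k in MANDATORY)

print(is_complete(factsheet))
```

Encoding the questionnaire this way would let benchmark repositories lint submissions for missing mandatory sections automatically.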
【3】Efficient Public Verification of Private ML via Regularization
标题:通过正则化对私有ML进行高效公开验证
链接:https://arxiv.org/abs/2512.04008
作者:Zoë Ruha Bell,Anvith Thudi,Olive Franzese-McLaughlin,Nicolas Papernot,Shafi Goldwasser
摘要:Training with differential privacy (DP) provides a guarantee to members in a dataset that they cannot be identified by users of the released model. However, those data providers, and, in general, the public, lack methods to efficiently verify that models trained on their data satisfy DP guarantees. The amount of compute needed to verify DP guarantees for current algorithms scales with the amount of compute required to train the model. In this paper we design the first DP algorithm with near optimal privacy-utility trade-offs but whose DP guarantees can be verified more cheaply than training. We focus on DP stochastic convex optimization (DP-SCO), where optimal privacy-utility trade-offs are known. Here we show we can obtain tight privacy-utility trade-offs by privately minimizing a series of regularized objectives and only using the standard DP composition bound. Crucially, this method can be verified with much less compute than training. This leads to the first known DP-SCO algorithm with near optimal privacy-utility whose DP verification scales better than training cost, significantly reducing verification costs on large datasets.
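The "standard DP composition bound" the abstract relies on is what makes verification cheap: the verifier only sums per-stage budgets instead of re-running training. A minimal sketch (the stage budgets are illustrative numbers, not the paper's):

```python
def composed_budget(stages):
    # Basic sequential composition: releasing T mechanisms with budgets
    # (eps_i, delta_i) is (sum eps_i, sum delta_i)-DP overall.
    eps = sum(e for e, _ in stages)
    delta = sum(d for _, d in stages)
    return eps, delta

# E.g. four privately solved regularized sub-problems, each claimed to
# run with an (0.5, 1e-6) budget.
stages = [(0.5, 1e-6)] * 4
print(composed_budget(stages))
```

The hard part, which the paper addresses, is certifying that each regularized sub-problem really meets its claimed per-stage budget with far less compute than the original training run.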
【4】Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
标题:对角化Softmax:用于可处理交叉熵动力学的Hadamard初始化
链接:https://arxiv.org/abs/2512.04006
作者:Connall Garrod,Jonathan P. Keating,Christos Thrampoulidis
摘要:Cross-entropy (CE) training loss dominates deep learning practice, yet existing theory often relies on simplifications, either replacing it with squared loss or restricting to convex models, that miss essential behavior. CE and squared loss generate fundamentally different dynamics, and convex linear models cannot capture the complexities of non-convex optimization. We provide an in-depth characterization of multi-class CE optimization dynamics beyond the convex regime by analyzing a canonical two-layer linear neural network with standard-basis vectors as inputs: the simplest non-convex extension for which the implicit bias remained unknown. This model coincides with the unconstrained features model used to study neural collapse, making our work the first to prove that gradient flow on CE converges to the neural collapse geometry. We construct an explicit Lyapunov function that establishes global convergence, despite the presence of spurious critical points in the non-convex landscape. A key insight underlying our analysis is an inconspicuous finding: Hadamard Initialization diagonalizes the softmax operator, freezing the singular vectors of the weight matrices and reducing the dynamics entirely to their singular values. This technique opens a pathway for analyzing CE training dynamics well beyond our specific setting considered here.
【5】Density-Informed VAE (DiVAE): Reliable Log-Prior Probability via Density Alignment Regularization
标题:Density-Informed VAE(DiVAE):通过密度对齐正则化的可靠对数先验概率
链接:https://arxiv.org/abs/2512.03928
作者:Michele Alessi,Alessio Ansuini,Alex Rodriguez
备注:PriGM Workshop EurIPS 2025
摘要:We introduce Density-Informed VAE (DiVAE), a lightweight, data-driven regularizer that aligns the VAE log-prior probability $\log p_Z(z)$ with a log-density estimated from data. Standard VAEs match latents to a simple prior, overlooking density structure in the data-space. DiVAE encourages the encoder to allocate posterior mass in proportion to data-space density and, when the prior is learnable, nudges the prior toward high-density regions. This is realized by adding a robust, precision-weighted penalty to the ELBO, incurring negligible computational overhead. On synthetic datasets, DiVAE (i) improves distributional alignment of latent log-densities to its ground truth counterpart, (ii) improves prior coverage, and (iii) yields better OOD uncertainty calibration. On MNIST, DiVAE improves alignment of the prior with external estimates of the density, providing better interpretability, and improves OOD detection for learnable priors.
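A minimal sketch of a density-alignment penalty in the spirit described above: penalize mismatch between the latent log-prior log p_Z(z) and a data-space log-density estimate, weighted by a per-sample precision. The exact weighting and density estimator are the paper's; the concrete numbers and quadratic form below are our own illustrative choices.

```python
import numpy as np

def alignment_penalty(log_prior_z, log_density_x, precision):
    # Precision-weighted squared residual between the latent log-prior
    # and the (externally estimated) data-space log-density.
    resid = log_prior_z - log_density_x
    return float(np.mean(precision * resid ** 2))

log_prior_z = np.array([-1.0, -2.5, -0.7])    # log p_Z(z_i) at encoded points
log_density_x = np.array([-1.2, -2.0, -0.7])  # estimated data log-density
precision = np.array([1.0, 0.5, 2.0])         # down-weights noisy estimates
print(alignment_penalty(log_prior_z, log_density_x, precision))
```

Added to the ELBO with a small coefficient, a term like this pushes the encoder to allocate posterior mass in proportion to data-space density, at negligible extra cost per batch.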
【6】Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction
标题:用于非线性降维的模糊单纯集的概率基础
链接:https://arxiv.org/abs/2512.03899
作者:Janis Keck,Lukas Silvester Barth,Fatemeh Fahimi,Parvaneh Joharinad,Jürgen Jost
备注:47 pages (including appendix), 11 figures
摘要:Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from algebraic topology without a clear probabilistic interpretation detaches them from commonly used theoretical frameworks in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris-Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback-Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this on an example where we generalize UMAP using Čech filtrations with triplet sampling. In summary, this probabilistic viewpoint provides a unified probabilistic theoretical foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.
【7】Hyperdimensional Computing for Sustainable Manufacturing: An Initial Assessment
标题:可持续制造的超维计算:初步评估
链接:https://arxiv.org/abs/2512.03864
作者:Danny Hoang,Anandkumar Patel,Ruimen Chen,Rajiv Malhotra,Farhad Imani
摘要:Smart manufacturing can significantly improve efficiency and reduce energy consumption, yet the energy demands of AI models may offset these gains. This study utilizes in-situ sensing-based prediction of geometric quality in smart machining to compare the energy consumption, accuracy, and speed of common AI models. HyperDimensional Computing (HDC) is introduced as an alternative, achieving accuracy comparable to conventional models while drastically reducing energy consumption, 200$\times$ for training and 175 to 1000$\times$ for inference. Furthermore, HDC reduces training times by 200$\times$ and inference times by 300 to 600$\times$, showcasing its potential for energy-efficient smart manufacturing.
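HDC's efficiency comes from replacing gradient training with cheap bipolar vector arithmetic. The sketch below shows the generic pattern (random item memory, bundling by elementwise majority, nearest-prototype classification); the feature ids and "quality classes" are toy stand-ins, not the paper's machining features.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

# Item memory: one random bipolar hypervector per raw feature id.
item_memory = {f: rng.choice([-1, 1], size=D) for f in range(4)}

def encode(feature_ids):
    # Bundling: elementwise majority (sign of the sum) of item vectors.
    return np.sign(np.sum([item_memory[f] for f in feature_ids], axis=0))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Train": one prototype hypervector per geometric-quality class.
proto_ok = encode([0, 1])
proto_bad = encode([2, 3])

# "Infer": classify a query by similarity to the class prototypes.
query = encode([0, 1])
print(cosine(query, proto_ok) > cosine(query, proto_bad))
```

Both training and inference reduce to additions and dot products over integer vectors, which is the source of the orders-of-magnitude energy and latency savings reported above.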
【8】Deep Unfolding: Recent Developments, Theory, and Design Guidelines
标题:深度展开:最近的发展、理论和设计指南
链接:https://arxiv.org/abs/2512.03768
作者:Nir Shlezinger,Santiago Segarra,Yi Zhang,Dvir Avrahami,Zohar Davidov,Tirza Routtenberg,Yonina C. Eldar
备注:under review for publication in the IEEE
摘要:Optimization methods play a central role in signal processing, serving as the mathematical foundation for inference, estimation, and control. While classical iterative optimization algorithms provide interpretability and theoretical guarantees, they often rely on surrogate objectives, require careful hyperparameter tuning, and exhibit substantial computational latency. Conversely, machine learning (ML) offers powerful data-driven modeling capabilities but lacks the structure, transparency, and efficiency needed for optimization-driven inference. Deep unfolding has recently emerged as a compelling framework that bridges these two paradigms by systematically transforming iterative optimization algorithms into structured, trainable ML architectures. This article provides a tutorial-style overview of deep unfolding, presenting a unified perspective of methodologies for converting optimization solvers into ML models and highlighting their conceptual, theoretical, and practical implications. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature. Furthermore, we survey recent theoretical advances that establish convergence and generalization guarantees for unfolded optimizers, and provide comparative qualitative and empirical studies illustrating their relative trade-offs in complexity, interpretability, and robustness.
【9】Cross-embodied Co-design for Dexterous Hands
标题:面向灵巧手的跨具身协同设计
链接:https://arxiv.org/abs/2512.03743
作者:Kehlani Fay,Darin Anthony Djapri,Anya Zorin,James Clinton,Ali El Lahib,Hao Su,Michael T. Tolley,Sha Yi,Xiaolong Wang
摘要:Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how should we design and control robot manipulators that are optimized for dexterity? We present a co-design framework that learns task-specific hand morphology and complementary dexterous control policies. The framework supports 1) an expansive morphology search space including joint, finger, and palm generation, 2) scalable evaluation across the wide design space via morphology-conditioned cross-embodied control, and 3) real-world fabrication with accessible components. We evaluate the approach across multiple dexterous tasks, including in-hand rotation with simulation and real deployment. Our framework enables an end-to-end pipeline that can design, train, fabricate, and deploy a new robotic hand in under 24 hours. The full framework will be open-sourced and available on our website.
【10】Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control
标题:跨越从仿真与地面测试到太空部署的Sim2Real差距,实现自由飞行器的自主控制
链接:https://arxiv.org/abs/2512.03736
作者:Kenneth Stewart,Samantha Chapin,Roxana Leontie,Carl Glen Henshaw
备注:published at iSpaRo 2025
摘要:Reinforcement learning (RL) offers transformative potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, the NASA Astrobee, aboard the International Space Station (ISS). Using NVIDIA's Omniverse physics simulator and curriculum learning, we trained a deep neural network to replace Astrobee's standard attitude and translation control, enabling it to navigate in microgravity. Our results validate a novel training pipeline that bridges the simulation-to-reality (Sim2Real) gap, utilizing a GPU-accelerated, scientific-grade simulation environment for efficient Monte Carlo RL training. This successful deployment demonstrates the feasibility of training RL policies terrestrially and transferring them to space-based applications. This paves the way for future work in In-Space Servicing, Assembly, and Manufacturing (ISAM), enabling rapid on-orbit adaptation to dynamic mission requirements.
【11】Dynamically Scaled Activation Steering
标题:动态缩放激活引导
链接:https://arxiv.org/abs/2512.03661
作者:Alex Ferrando,Xavier Suau,Jordi Gonzàlez,Pau Rodriguez
摘要:Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply interventions uniformly across all inputs, degrading model performance when steering is unnecessary. We introduce Dynamically Scaled Activation Steering (DSAS), a method-agnostic steering framework that decouples when to steer from how to steer. DSAS adaptively modulates the strength of existing steering transformations across layers and inputs, intervening strongly only when undesired behavior is detected. At generation time, DSAS computes context-dependent scaling factors that selectively adjust the strength of any steering method. We also show how DSAS can be jointly optimized end-to-end together with the steering function. When combined with existing steering methods, DSAS consistently improves the Pareto front with respect to steering alone, achieving a better trade-off between toxicity mitigation and utility preservation. We further demonstrate DSAS's generality by applying it to a text-to-image diffusion model, showing how adaptive steering allows the modulation of specific concepts. Finally, DSAS introduces minimal computational overhead while improving interpretability, pinpointing which tokens require steering and by how much.
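The abstract does not specify DSAS's gating function, so the sketch below is a hypothetical illustration of the general idea: a fixed steering transformation whose strength is modulated by a context-dependent factor, here a sigmoid of the hidden state's projection onto an assumed "undesired-concept" direction (all names and constants are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scaled_steer(h, steer_vec, concept_dir, alpha=4.0, bias=-2.0):
    """Hypothetical DSAS-style gate (illustrative only): a fixed steering
    transformation (here, subtracting steer_vec) is modulated by a
    context-dependent factor that is large only when the hidden state h
    projects strongly onto an undesired-concept direction."""
    gate = sigmoid(alpha * float(h @ concept_dir) + bias)   # in (0, 1)
    return h - gate * steer_vec, gate

rng = np.random.default_rng(1)
d = 16
concept = rng.standard_normal(d)
concept /= np.linalg.norm(concept)
steer = 2.0 * concept                        # steer away from the concept

h_bad = 3.0 * concept + 0.1 * rng.standard_normal(d)   # "toxic" state
v = rng.standard_normal(d)
h_ok = v - (v @ concept) * concept           # no undesired component

_, g_bad = scaled_steer(h_bad, steer, concept)
_, g_ok = scaled_steer(h_ok, steer, concept)
print(g_bad, g_ok)                           # strong vs. near-zero intervention
```

The point of the gate is exactly the trade-off the abstract describes: benign inputs are left almost untouched, so utility is preserved, while the intervention fires at full strength only when the undesired behavior is detected.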
【12】Optical Context Compression Is Just (Bad) Autoencoding
标题:光学上下文压缩只是(糟糕的)自动编码
链接:https://arxiv.org/abs/2512.03643
作者:Ivan Yee Lee,Cheng Yang,Taylor Berg-Kirkpatrick
摘要:DeepSeek-OCR demonstrates that rendered text can be reconstructed with high fidelity from a small number of vision tokens. This finding has sparked excitement about vision-based context compression for language models. But the evaluation stops at reconstruction; whether these representations help language modeling remains untested. We test two assumptions implicit in the optical-compression narrative: that vision-based compression provides unique advantages for text reconstruction from compressed representations, and that DeepSeek-OCR's reconstruction results are evidence that vision-based compression will be useful for language modeling. Comparing their vision encoder against simple alternatives--parameter-free mean pooling and a learned hierarchical encoder--we find that these simple approaches match or surpass vision for reconstruction at matched compression ratios, and outperform it for language modeling--where vision-based compression fails to beat truncation. The excitement around optical context compression outpaces the evidence. Code and checkpoints are available at https://github.com/ivnle/bad-autoencoding
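A minimal sketch of the parameter-free mean-pooling baseline mentioned above, assuming it simply averages consecutive groups of token embeddings at a fixed compression ratio (the exact grouping used in the paper may differ):

```python
import numpy as np

def mean_pool_compress(token_embs, ratio):
    """Parameter-free context compression: average each consecutive
    group of `ratio` token embeddings into one 'compressed token'."""
    n, d = token_embs.shape
    pad = (-n) % ratio                       # pad so the length divides evenly
    padded = np.vstack([token_embs, np.zeros((pad, d))])
    groups = padded.reshape(-1, ratio, d)
    # divide each group by its true (unpadded) token count
    counts = np.minimum(ratio, n - ratio * np.arange(groups.shape[0]))
    return groups.sum(axis=1) / counts[:, None]

embs = np.arange(10, dtype=float).reshape(5, 2)   # 5 tokens, dim 2
comp = mean_pool_compress(embs, ratio=2)
print(comp.shape)                                 # 3 compressed tokens
```

Despite having no parameters at all, a baseline of this form is what the abstract reports as matching or surpassing the vision encoder at matched compression ratios.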
【13】EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths
标题:EnCompass:通过搜索程序执行路径来增强代理编程
链接:https://arxiv.org/abs/2512.03571
作者:Zhening Li,Armando Solar-Lezama,Yisong Yue,Stephan Zheng
备注:65 pages, 2 figures, published in NeurIPS 2025
摘要:We introduce a new approach to agent programming, the development of LLM-based agents. Current approaches to agent programming often entangle two aspects of agent design: the core workflow logic and the inference-time strategy (e.g., tree search). We introduce "probabilistic angelic nondeterminism" ("PAN"), a programming model that disentangles these two concerns, allowing the programmer to describe the agent workflow and independently experiment with different inference-time strategies by simply changing a few inputs. We provide an implementation of PAN in Python as the EnCompass framework, which uses a Python decorator to compile agent workflow programs into a search space. We present three case studies that demonstrate how the framework lets the programmer quickly improve the reliability of an agent and easily switch between different inference-time strategies, all with little additional coding.
【14】A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention
标题:原生Top-$k$稀疏注意力的前景与挑战:初步研究
链接:https://arxiv.org/abs/2512.03494
作者:Di Xiu,Hongyin Tang,Bolin Rong,Lizhi Yan,Jingang Wang,Yifan Lu,Xunliang Cai
摘要:Large Language Models (LLMs) are increasingly prevalent in the field of long-context modeling; however, their inference computational costs have become a critical bottleneck hindering the advancement of tasks such as agents and multimodal applications. This report conducts a preliminary investigation into the effectiveness and theoretical mechanisms of the Top-$k$ Attention mechanism during both the decoding and training phases. First, we validate the effectiveness of exact Top-$k$ Decoding through extensive experimentation. Experiments demonstrate that retaining only the pivotal Keys with the highest similarity to the Query as the context window during the decoding stage achieves performance comparable to, or even surpassing, full attention on downstream tasks such as HELMET and LongBench v2. Second, we further explore the native Top-$k$ Attention training strategy. Experiments confirm that ensuring consistency between training and inference regarding Top-$k$ Attention operations facilitates the further unlocking of Top-$k$ Decoding's potential, thereby significantly enhancing model performance. Furthermore, considering the high computational complexity of exact Top-$k$ Attention, we investigate the impact of approximate Top-$k$ algorithm precision on downstream tasks. Our research confirms a positive correlation between downstream task performance and approximation fidelity, and we provide statistical evaluations of the Lightning Indexer's precision within the DeepSeek-V3.2-Exp model. Finally, this report provides a theoretical interpretation from the perspective of Entropy. Experimental observations indicate that models subjected to Top-$k$ Attention SFT exhibit a distinct phenomenon of entropy reduction in downstream tasks, which validates the hypothesis that low-entropy states are better adapted to Top-$k$ Decoding.
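A minimal sketch of exact Top-$k$ decoding for a single query, assuming the straightforward formulation: keep the $k$ keys with the highest scaled dot-product scores and renormalize the attention weights over them:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_attention(q, K, V, k):
    """Exact Top-k decoding step: keep only the k keys most similar to
    the query and renormalize the attention weights over them."""
    scores = K @ q / np.sqrt(q.shape[0])
    keep = np.argsort(-scores)[:k]
    return softmax(scores[keep]) @ V[keep]

rng = np.random.default_rng(0)
d, n = 8, 32
q = rng.standard_normal(d)
K_mat = rng.standard_normal((n, d))
V_mat = rng.standard_normal((n, d))

full = softmax(K_mat @ q / np.sqrt(d)) @ V_mat
sparse = topk_attention(q, K_mat, V_mat, k=8)
print(np.linalg.norm(full - sparse))      # approximation error vs. full attention
```

With k equal to the full context length, this reduces exactly to standard attention; the report's question is how small k can be made (and how approximate the Top-k selection can be) before downstream quality degrades.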
【15】ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
标题:ATHENA:面向分层进化数值算法的智能体团队
链接:https://arxiv.org/abs/2512.03476
作者:Juan Diego Toscano,Daniel T. Chen,George Em Karniadakis
摘要:Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an online learner, the system analyzes prior trials to select structural "actions" ($A_n$) from combinatorial spaces guided by expert blueprints (e.g., Universal Approximation, Physics-Informed constraints). These actions are translated into executable code ($S_n$) to generate scientific rewards ($R_n$). ATHENA transcends standard automation: in SciC, it autonomously identifies mathematical symmetries for exact analytical solutions or derives stable numerical solvers where foundation models fail. In SciML, it performs deep diagnosis to tackle ill-posed formulations and combines hybrid symbolic-numeric workflows (e.g., coupling PINNs with FEM) to resolve multiphysics problems. The framework achieves super-human performance, reaching validation errors of $10^{-14}$. Furthermore, collaborative "human-in-the-loop" intervention allows the system to bridge stability gaps, improving results by an order of magnitude. This paradigm shifts the focus from implementation mechanics to methodological innovation, accelerating scientific discovery.
【16】Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization
标题:使用隐式正则化的多测量向量免调参结构化稀疏恢复
链接:https://arxiv.org/abs/2512.03393
作者:Lakshmi Jayalal,Sheetal Kalyani
摘要:Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods like multiple measurement vectors orthogonal matching pursuit (M-OMP) and multiple measurement vectors FOCal Underdetermined System Solver (M-FOCUSS) often require careful parameter tuning or prior knowledge of the sparsity of the signal and/or noise variance. We introduce a novel tuning-free framework that leverages Implicit Regularization (IR) from overparameterization to overcome this limitation. Our approach reparameterizes the estimation matrix into factors that decouple the shared row-support from individual vector entries. We show that the optimization dynamics inherently promote the desired row-sparse structure by applying gradient descent to a standard least-squares objective on these factors. We prove that with a sufficiently small and balanced initialization, the optimization dynamics exhibit a "momentum-like" effect, causing the norms of rows in the true support to grow significantly faster than others. This formally guarantees that the solution trajectory converges towards an idealized row-sparse solution. Additionally, empirical results demonstrate that our approach achieves performance comparable to established methods without requiring any prior information or tuning.
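The reparameterization described above can be sketched as follows, with a shared row-scale factor g and per-vector entries V (the specific factorization, sizes, and hyperparameters here are illustrative assumptions, not the paper's exact construction); plain gradient descent on the least-squares objective from a small balanced init drives the rows in the true support to dominate:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, L = 20, 10, 5                          # measurements, rows, vectors
A = rng.standard_normal((m, n)) / np.sqrt(m)
X_true = np.zeros((n, L))
X_true[2] = rng.standard_normal(L) + 2.0     # shared row support {2, 7}
X_true[7] = rng.standard_normal(L) - 2.0
Y = A @ X_true

# Overparameterize X = g[:, None] * V: g carries the shared row support,
# V the per-vector entries; a small balanced init drives the implicit bias.
g = 0.1 * np.ones(n)
V = 0.1 * np.ones((n, L))
lr = 0.05
for _ in range(5000):
    G = A.T @ (A @ (g[:, None] * V) - Y)     # gradient w.r.t. X
    g, V = g - lr * (G * V).sum(axis=1), V - lr * g[:, None] * G

row_norms = np.linalg.norm(g[:, None] * V, axis=1)
top_rows = set(np.argsort(-row_norms)[:2])
print(top_rows)                              # the true shared support
```

No sparsity penalty or tuning parameter appears anywhere in the loop; the row-sparse structure comes entirely from the factored parameterization and the small initialization, which is the paper's central point.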
【17】NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction
标题:NavMapFusion:基于扩散的导航地图融合,用于在线矢量化高清地图构建
链接:https://arxiv.org/abs/2512.03317
作者:Thomas Monninger,Zihan Zhang,Steffen Staab,Sihao Ding
备注:Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
摘要:Accurate environmental representations are essential for autonomous driving, providing the foundation for safe and efficient navigation. Traditionally, high-definition (HD) maps are providing this representation of the static road infrastructure to the autonomous system a priori. However, because the real world is constantly changing, such maps must be constructed online from on-board sensor data. Navigation-grade standard-definition (SD) maps are widely available, but their resolution is insufficient for direct deployment. Instead, they can be used as coarse prior to guide the online map construction process. We propose NavMapFusion, a diffusion-based framework that performs iterative denoising conditioned on high-fidelity sensor data and on low-fidelity navigation maps. This paper strives to answer: (1) How can coarse, potentially outdated navigation maps guide online map construction? (2) What advantages do diffusion models offer for map fusion? We demonstrate that diffusion-based map construction provides a robust framework for map fusion. Our key insight is that discrepancies between the prior map and online perception naturally correspond to noise within the diffusion process; consistent regions reinforce the map construction, whereas outdated segments are suppressed. On the nuScenes benchmark, NavMapFusion conditioned on coarse road lines from OpenStreetMap data reaches a 21.4% relative improvement on 100 m, and even stronger improvements on larger perception ranges, while maintaining real-time capabilities. By fusing low-fidelity priors with high-fidelity sensor data, the proposed method generates accurate and up-to-date environment representations, guiding towards safer and more reliable autonomous driving. The code is available at https://github.com/tmonnin/navmapfusion
【18】BlendedNet++: A Large-Scale Blended Wing Body Aerodynamics Dataset and Benchmark
标题:BlendedNet++:大规模混合翼身空气动力学数据集和基准
链接:https://arxiv.org/abs/2512.03280
作者:Nicholas Sung,Steven Spreizer,Mohamed Elrefaie,Matthew C. Jones,Faez Ahmed
摘要:Despite progress in machine learning-based aerodynamic surrogates, the scarcity of large, field-resolved datasets limits progress on accurate pointwise prediction and reproducible inverse design for aircraft. We introduce BlendedNet++, a large-scale aerodynamic dataset and benchmark focused on blended wing body (BWB) aircraft. The dataset contains over 12,000 unique geometries, each simulated at a single flight condition, yielding 12,490 aerodynamic results for steady RANS CFD. For every case, we provide (i) integrated force/moment coefficients CL, CD, CM and (ii) dense surface fields of pressure and skin friction coefficients Cp and (Cfx, Cfy, Cfz). Using this dataset, we standardize a forward-surrogate benchmark to predict pointwise fields across six model families: GraphSAGE, GraphUNet, PointNet, a coordinate Transformer (Transolver-style), a FiLMNet (coordinate MLP with feature-wise modulation), and a Graph Neural Operator Transformer (GNOT). Finally, we present an inverse design task of achieving a specified lift-to-drag ratio under fixed flight conditions, implemented via a conditional diffusion model. To assess performance, we benchmark this approach against gradient-based optimization on the same surrogate and a diffusion-optimization hybrid that first samples with the conditional diffusion model and then further optimizes the designs. BlendedNet++ provides a unified forward and inverse protocol with multi-model baselines, enabling fair, reproducible comparison across architectures and optimization paradigms. We expect BlendedNet++ to catalyze reproducible research in field-level aerodynamics and inverse design; resources (dataset, splits, baselines, and scripts) will be released upon acceptance.
【19】Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
标题:回忆为时已晚:解释多模态知识检索中的两跳问题
链接:https://arxiv.org/abs/2512.03276
作者:Constantin Venhoff,Ashkan Khakzar,Sonia Joseph,Philip Torr,Neel Nanda
摘要:Training vision language models (VLMs) aims to align visual representations from a vision encoder with the textual representations of a pretrained large language model (LLM). However, many VLMs exhibit reduced factual recall performance compared to their LLM backbones, raising the question of how effective multimodal fine-tuning is at extending existing mechanisms within the LLM to visual inputs. We argue that factual recall based on visual inputs requires VLMs to solve a two-hop problem: (1) forming entity representations from visual inputs, and (2) recalling associated factual knowledge based on these entity representations. By benchmarking 14 VLMs with various architectures (LLaVA, Native, Cross-Attention), sizes (7B-124B parameters), and training setups on factual recall tasks against their original LLM backbone models, we find that 11 of 14 models exhibit factual recall degradation. We select three models with high and two models with low performance degradation, and use attribution patching, activation patching, and probing to show that degraded VLMs struggle to use the existing factual recall circuit of their LLM backbone, because they resolve the first hop too late in the computation. In contrast, high-performing VLMs resolve entity representations early enough to reuse the existing factual recall mechanism. Finally, we demonstrate two methods to recover performance: patching entity representations from the LLM backbone into the VLM, and prompting with chain-of-thought reasoning. Our results highlight that the speed of early entity resolution critically determines how effective VLMs are in using preexisting LLM mechanisms. More broadly, our work illustrates how mechanistic analysis can explain and unveil systematic failures in multimodal alignment.
【20】Perch 2.0 transfers 'whale' to underwater tasks
标题:Perch 2.0将"鲸"迁移到水下任务
链接:https://arxiv.org/abs/2512.03219
作者:Andrea Burns,Lauren Harrell,Bart van Merriënboer,Vincent Dumoulin,Jenny Hamer,Tom Denton
备注:8 pages, 3 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: AI for Non-Human Animal Communication
摘要:Perch 2.0 is a supervised bioacoustics foundation model pretrained on 14,597 species, including birds, mammals, amphibians, and insects, and has state-of-the-art performance on multiple benchmarks. Given that Perch 2.0 includes almost no marine mammal audio or classes in the training data, we evaluate Perch 2.0 performance on marine mammal and underwater audio tasks through few-shot transfer learning. We perform linear probing with the embeddings generated from this foundation model and compare performance to other pretrained bioacoustics models. In particular, we compare Perch 2.0 with previous multispecies whale, Perch 1.0, SurfPerch, AVES-bio, BirdAVES, and Birdnet V2.3 models, which have open-source tools for transfer-learning and agile modeling. We show that the embeddings from the Perch 2.0 model have consistently high performance for few-shot transfer learning, generally outperforming alternative embedding models on the majority of tasks, and thus is recommended when developing new linear classifiers for marine mammal classification with few labeled examples.
【21】Scaling Internal-State Policy-Gradient Methods for POMDPs
标题:扩展POMDP的内部状态策略梯度方法
链接:https://arxiv.org/abs/2512.03204
作者:Douglas Aberdeen,Jonathan Baxter
摘要:Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless policies but have been less successful when memory is required. In this paper we develop several improved algorithms for learning policies with memory in an infinite-horizon setting -- directly when a known model of the environment is available, and via simulation otherwise. We compare these algorithms on some large POMDPs, including noisy robot navigation and multi-agent problems.
【22】Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESCO and EQF taxonomies
标题:增强工作匹配:基于ESCO与EQF分类法的职业、技能与资格链接
链接:https://arxiv.org/abs/2512.03195
作者:Stylianos Saroglou,Konstantinos Diamantaras,Francesco Preta,Marina Delianidi,Apostolos Benisis,Christian Johannes Meyer
备注:14 pages, 1 figure, Preprint
摘要:This study investigates the potential of language models to improve the classification of labor market information by linking job vacancy texts to two major European frameworks: the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and the European Qualifications Framework (EQF). We examine and compare two prominent methodologies from the literature: Sentence Linking and Entity Linking. In support of ongoing research, we release an open-source tool, incorporating these two methodologies, designed to facilitate further work on labor classification and employment discourse. To move beyond surface-level skill extraction, we introduce two annotated datasets specifically aimed at evaluating how occupations and qualifications are represented within job vacancy texts. Additionally, we examine different ways to utilize generative large language models for this task. Our findings contribute to advancing the state of the art in job entity extraction and offer computational infrastructure for examining work, skills, and labor market narratives in a digitally mediated economy. Our code is made publicly available: https://github.com/tabiya-tech/tabiya-livelihoods-classifier
【23】GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
标题:GRAND:多智能体路径查找中网络调度的引导、再平衡和分配
链接:https://arxiv.org/abs/2512.03194
作者:Johannes Gaber,Meshal Alharbi,Daniele Gammelli,Gioele Zardini
摘要:Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
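The region-to-region rebalancing step can be illustrated with a minimal successive-shortest-path min-cost-flow solver on a toy three-region graph (the graph, capacities, and costs here are illustrative assumptions, not the paper's warehouse model):

```python
def min_cost_flow(num_nodes, edges, supply):
    """Successive-shortest-path min-cost flow. edges: (u, v, cap, cost).
    supply[i] > 0 means region i has surplus free agents to send out;
    supply[i] < 0 means it needs agents. Returns the total routing cost."""
    INF = float("inf")
    s, t = num_nodes, num_nodes + 1          # super-source / super-sink
    graph = [[] for _ in range(num_nodes + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual edge

    for u, v, cap, cost in edges:
        add_edge(u, v, cap, cost)
    for i, b in enumerate(supply):
        if b > 0:
            add_edge(s, i, b, 0)
        elif b < 0:
            add_edge(i, t, -b, 0)

    total_cost = 0
    while True:
        # Bellman-Ford shortest path in the residual graph
        dist = [INF] * (num_nodes + 2)
        dist[s] = 0
        parent = [None] * (num_nodes + 2)
        for _ in range(num_nodes + 2):
            updated = False
            for u in range(num_nodes + 2):
                if dist[u] == INF:
                    continue
                for ei, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, ei)
                        updated = True
            if not updated:
                break
        if dist[t] == INF:
            break
        f, v = INF, t                         # bottleneck along the path
        while v != s:
            u, ei = parent[v]
            f = min(f, graph[u][ei][1])
            v = u
        v = t
        while v != s:                         # push flow, update residuals
            u, ei = parent[v]
            graph[u][ei][1] -= f
            graph[v][graph[u][ei][3]][1] += f
            v = u
        total_cost += f * dist[t]
    return total_cost

# Toy example: region 0 has 2 surplus agents, region 2 needs 2;
# relaying through region 1 (cost 1+1) beats the direct edge (cost 3).
edges = [(0, 1, 5, 1), (1, 2, 5, 1), (0, 2, 5, 3)]
print(min_cost_flow(3, edges, [2, 0, -2]))   # 4
```

In the paper's pipeline the supply vector would come from the learned desired distribution of free agents, and only the resulting region-to-region flows are then resolved into small local assignment problems.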
【24】Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability
标题:超越可加性:面向非线性可解释性的稀疏保序Shapley回归
链接:https://arxiv.org/abs/2512.03112
作者:Jialai She
摘要:Shapley values, a gold standard for feature attribution in Explainable AI, face two primary challenges. First, the canonical Shapley framework assumes that the worth function is additive, yet real-world payoff constructions--driven by non-Gaussian distributions, heavy tails, feature dependence, or domain-specific loss scales--often violate this assumption, leading to distorted attributions. Second, achieving sparse explanations in high dimensions by computing dense Shapley values and then applying ad hoc thresholding is prohibitively costly and risks inconsistency. We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR simultaneously learns a monotonic transformation to restore additivity--obviating the need for a closed-form specification--and enforces an L0 sparsity constraint on the Shapley vector, enhancing computational efficiency in large feature spaces. Its optimization algorithm leverages Pool-Adjacent-Violators for efficient isotonic regression and normalized hard-thresholding for support selection, yielding implementation ease and global convergence guarantees. Analysis shows that SISR recovers the true transformation in a wide range of scenarios and achieves strong support recovery even in high noise. Moreover, we are the first to demonstrate that irrelevant features and inter-feature dependencies can induce a true payoff transformation that deviates substantially from linearity. Experiments in regression, logistic regression, and tree ensembles demonstrate that SISR stabilizes attributions across payoff schemes, correctly filters irrelevant features while standard Shapley values suffer severe rank and sign distortions. By unifying nonlinear transformation estimation with sparsity pursuit, SISR advances the frontier of nonlinear explainability, providing a theoretically grounded and practical attribution framework.
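The two optimization primitives named above can be sketched in isolation (the full SISR procedure, which couples them with Shapley estimation, is omitted; the plain hard-thresholding step below stands in for the paper's normalized variant, whose details the abstract does not give):

```python
def pav(y):
    """Pool-Adjacent-Violators: least-squares fit of a non-decreasing
    sequence to y, merging adjacent blocks that violate monotonicity."""
    sums, sizes = [], []
    for v in y:
        s, n = float(v), 1
        # pool with the previous block while its mean exceeds ours
        while sums and sums[-1] / sizes[-1] > s / n:
            s += sums.pop()
            n += sizes.pop()
        sums.append(s)
        sizes.append(n)
    fit = []
    for s, n in zip(sums, sizes):
        fit.extend([s / n] * n)
    return fit

def hard_threshold(phi, k):
    """L0 projection: keep the k largest-magnitude Shapley coordinates."""
    keep = sorted(range(len(phi)), key=lambda i: -abs(phi[i]))[:k]
    out = [0.0] * len(phi)
    for i in keep:
        out[i] = phi[i]
    return out

print(pav([1.0, 3.0, 2.0]))                       # [1.0, 2.5, 2.5]
print(hard_threshold([0.1, -2.0, 0.5, 1.0], 2))   # [0.0, -2.0, 0.0, 1.0]
```

PAV gives the isotonic (monotone) fit in linear time, which is what makes learning the additivity-restoring transformation cheap, while the hard-thresholding step enforces sparsity without computing a dense attribution first.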
【25】E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing
标题:E-valuator:具有序贯假设检验的可靠智能体验证器
链接:https://arxiv.org/abs/2512.03109
作者:Shuvom Sadhuka,Drew Prinster,Clara Fannjiang,Gabriele Scalia,Aviv Regev,Hanchen Wang
摘要:Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) from unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decision rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
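The e-process mechanism can be illustrated with a simplified likelihood-ratio monitor (the Bernoulli verifier model below is an assumption for illustration; e-valuator itself wraps arbitrary black-box verifier scores):

```python
def e_process_monitor(scores, p_null, p_alt, alpha=0.05):
    """Sequential test sketch: multiply per-step likelihood-ratio e-values.
    Under the null ('trajectory is fine'), Ville's inequality bounds the
    probability that the wealth ever reaches 1/alpha by alpha, so flagging
    at that threshold controls the false-alarm rate at every step."""
    wealth = 1.0
    for t, x in enumerate(scores):
        wealth *= p_alt(x) / p_null(x)        # e-value for step t
        if wealth >= 1.0 / alpha:
            return t                          # flag the trajectory
    return None                               # never enough evidence

# Toy verifier: emits 1 ("looks bad") with prob 0.1 on good trajectories
# and 0.6 on bad ones -- a stand-in for a black-box verifier score.
p_null = lambda x: 0.1 if x == 1 else 0.9
p_alt = lambda x: 0.6 if x == 1 else 0.4

print(e_process_monitor([1, 1, 1, 1], p_null, p_alt))   # flags at step 1
print(e_process_monitor([0, 0, 0, 0], p_null, p_alt))   # None: no flag
```

Because the guarantee holds at every step simultaneously, the monitor can terminate a problematic trajectory as soon as the wealth crosses the threshold, which is exactly the token-saving early-stopping use the abstract describes.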
【26】Password-Activated Shutdown Protocols for Misaligned Frontier Agents
标题:针对未对齐前沿智能体的密码激活关闭协议
链接:https://arxiv.org/abs/2512.03089
作者:Kai Williams,Rohan Subramani,Francis Rhys Ward
摘要:Frontier AI developers may fail to align or control highly-capable AI agents. In many cases, it could be useful to have emergency shutdown mechanisms which effectively prevent misaligned agents from carrying out harmful actions in the world. We introduce password-activated shutdown protocols (PAS protocols) -- methods for designing frontier agents to implement a safe shutdown protocol when given a password. We motivate PAS protocols by describing intuitive use-cases in which they mitigate risks from misaligned systems that subvert other control efforts, for instance, by disabling automated monitors or self-exfiltrating to external data centres. PAS protocols supplement other safety efforts, such as alignment fine-tuning or monitoring, contributing to defence-in-depth against AI risk. We provide a concrete demonstration in SHADE-Arena, a benchmark for AI monitoring and subversion capabilities, in which PAS protocols supplement monitoring to increase safety with little cost to performance. Next, PAS protocols should be robust to malicious actors who want to bypass shutdown. Therefore, we conduct a red-team blue-team game between the developers (blue-team), who must implement a robust PAS protocol, and a red-team trying to subvert the protocol. We conduct experiments in a code-generation setting, finding that there are effective strategies for the red-team, such as using another model to filter inputs, or fine-tuning the model to prevent shutdown behaviour. We then outline key challenges to implementing PAS protocols in real-life systems, including: security considerations of the password and decisions regarding when, and in which systems, to use them. PAS protocols are an intuitive mechanism for increasing the safety of frontier AI. We encourage developers to consider implementing PAS protocols prior to internal deployment of particularly dangerous systems to reduce loss-of-control risks.
【27】Risk-Entropic Flow Matching
标题:风险熵流匹配
链接:https://arxiv.org/abs/2512.03078
作者:Vahid R. Ramezani,Benjamin Englard
备注:29 pages, 5 figures
摘要:Tilted (entropic) risk, obtained by applying a log-exponential transform to a base loss, is a well established tool in statistics and machine learning for emphasizing rare or high loss events while retaining a tractable optimization problem. In this work, our aim is to interpret its structure for Flow Matching (FM). FM learns a velocity field that transports samples from a simple source distribution to data by integrating an ODE. In rectified FM, training pairs are obtained by linearly interpolating between a source sample and a data sample, and a neural velocity field is trained to predict the straight line displacement using a mean squared error loss. This squared loss collapses all velocity targets that reach the same space-time point into a single conditional mean, thereby ignoring higher order conditional information (variance, skewness, multi-modality) that encodes fine geometric structure about the data manifold and minority branches. We apply the standard risk-sensitive (log-exponential) transform to the conditional FM loss and show that the resulting tilted risk loss is a natural upper-bound on a meaningful conditional entropic FM objective defined at each space-time point. Furthermore, we show that a small order expansion of the gradient of this conditional entropic objective yields two interpretable first order corrections: covariance preconditioning of the FM residual, and a skew tail term that favors asymmetric or rare branches. On synthetic data designed to probe ambiguity and tails, the resulting risk-sensitive loss improves statistical metrics and recovers geometric structure more faithfully than standard rectified FM.
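The log-exponential (tilted) transform referred to above, shown on a scalar loss sample; the Jensen gap makes the tilted risk exceed the mean whenever $t > 0$, which is how rare high-loss events get emphasized:

```python
import numpy as np

def tilted_risk(losses, t):
    """Entropic (tilted) risk: (1/t) * log E[exp(t * loss)].
    As t -> 0 it recovers the mean; t > 0 up-weights rare high losses."""
    m = np.max(t * losses)                   # log-sum-exp stabilization
    return (m + np.log(np.mean(np.exp(t * losses - m)))) / t

losses = np.array([0.1, 0.1, 0.1, 5.0])      # one rare, large loss
print(losses.mean())                         # 1.325
print(tilted_risk(losses, 1.0))              # > mean: the tail is emphasized
```

The paper applies this same transform to the conditional FM loss at each space-time point, rather than to scalar losses as here, and then expands its gradient to obtain the covariance and skew correction terms.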
【28】Safe and Sustainable Electric Bus Charging Scheduling with Constrained Hierarchical DRL
标题:具有约束分层DRL的安全可持续电动公交车充电调度
链接:https://arxiv.org/abs/2512.03059
作者:Jiaju Qi,Lei Lei,Thorsteinn Jonsson,Dusit Niyato
摘要:The integration of Electric Buses (EBs) with renewable energy sources such as photovoltaic (PV) panels is a promising approach to promote sustainable and low-carbon public transportation. However, optimizing EB charging schedules to minimize operational costs while ensuring safe operation without battery depletion remains challenging - especially under real-world conditions, where uncertainties in PV generation, dynamic electricity prices, variable travel times, and limited charging infrastructure must be accounted for. In this paper, we propose a safe Hierarchical Deep Reinforcement Learning (HDRL) framework for solving the EB Charging Scheduling Problem (EBCSP) under multi-source uncertainties. We formulate the problem as a Constrained Markov Decision Process (CMDP) with options to enable temporally abstract decision-making. We develop a novel HDRL algorithm, namely Double Actor-Critic Multi-Agent Proximal Policy Optimization Lagrangian (DAC-MAPPO-Lagrangian), which integrates Lagrangian relaxation into the Double Actor-Critic (DAC) framework. At the high level, we adopt a centralized PPO-Lagrangian algorithm to learn safe charger allocation policies. At the low level, we incorporate MAPPO-Lagrangian to learn decentralized charging power decisions under the Centralized Training and Decentralized Execution (CTDE) paradigm. Extensive experiments with real-world data demonstrate that the proposed approach outperforms existing baselines in both cost minimization and safety compliance, while maintaining fast convergence speed.
【29】Exploring Syntropic Frameworks in AI Alignment: A Philosophical Investigation
标题:探索人工智能对齐中的协同熵框架:哲学研究
链接:https://arxiv.org/abs/2512.03048
作者:Austin Spizzirri
备注:Approx. 3,000 words, 10 pages. Philosophical analysis of AI alignment (process-based / syntropy framework)
摘要:I argue that AI alignment should be reconceived as architecting syntropic, reasons-responsive agents through process-based, multi-agent, developmental mechanisms rather than encoding fixed human value content. The paper makes three philosophical contributions. First, I articulate the "specification trap" argument demonstrating why content-based value specification appears structurally unstable due to the conjunction of the is-ought gap, value pluralism, and the extended frame problem. Second, I propose syntropy -- the recursive reduction of mutual uncertainty between agents through state alignment -- as an information-theoretic framework for understanding multi-agent alignment dynamics. Third, I establish a functional distinction between genuine and simulated moral capacity grounded in compatibilist theories of guidance control, coupled with an embodied experimental paradigm and verification regime providing operational criteria independent of phenomenological claims. This paper represents the philosophical component of a broader research program whose empirical validation is being developed in a separate project currently in preparation. While the framework generates specific, falsifiable predictions about value emergence and moral agency in artificial systems, empirical validation remains pending.
【30】Closing the problem of which causal structures of up to six total nodes have a classical-quantum gap
标题:解决多达六个节点的哪些因果结构具有经典量子差距的问题
链接:https://arxiv.org/abs/2512.04058
作者:Shashaank Khanna,Matthew Pusey,Roger Colbeck
备注:5 pages, 3 figures, 1 table
摘要:The discovery of Bell that there exist quantum correlations that cannot be reproduced classically is one of the most important in the foundations of quantum mechanics, as well as having practical implications. Bell's result was originally proven in a simple bipartite causal structure, but analogous results have also been shown in further causal structures. Here we study the only causal structure with six or fewer nodes in which the question of whether or not there exist quantum correlations that cannot be achieved classically was open. In this causal structure we show that such quantum correlations exist using a method that involves imposing additional restrictions on the correlations. This hence completes the picture of which causal structures of up to six nodes support non-classical quantum correlations. We also provide further illustrations of our method using other causal structures.
【31】A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses
标题:将想象的听觉MEG映射到聆听的大脑反应的卷积框架
链接:https://arxiv.org/abs/2512.03458
作者:Maryam Maghsoudi,Mohsen Rezaeizadeh,Shihab Shamma
摘要:Decoding imagined speech engages complex neural processes that are difficult to interpret due to uncertainty in timing and the limited availability of imagined-response datasets. In this study, we present a Magnetoencephalography (MEG) dataset collected from trained musicians as they imagined and listened to musical and poetic stimuli. We show that both imagined and perceived brain responses contain consistent, condition-specific information. Using a sliding-window ridge regression model, we first mapped imagined responses to listened responses at the single-subject level, but found limited generalization across subjects. At the group level, we developed an encoder-decoder convolutional neural network with a subject-specific calibration layer that produced stable and generalizable mappings. The CNN consistently outperformed the null model, yielding significantly higher correlations between predicted and true listened responses for nearly all held-out subjects. Our findings demonstrate that imagined neural activity can be transformed into perception-like responses, providing a foundation for future brain-computer interface applications involving imagined speech and music.
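The single-subject baseline above is a sliding-window ridge regression from imagined to listened responses. A minimal sketch of that idea (window size, dimensions, and the synthetic data are illustrative assumptions, not the authors' MEG pipeline):

```python
import numpy as np

# Fit one closed-form ridge map per time window, transforming an
# "imagined" multichannel response into the corresponding "listened" one.

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge solution W minimizing ||XW - Y||^2 + alpha||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def sliding_window_map(imagined, listened, win=10, alpha=1.0):
    """Fit one ridge map per non-overlapping time window."""
    T = imagined.shape[0]
    maps = []
    for start in range(0, T - win + 1, win):
        maps.append(ridge_fit(imagined[start:start + win],
                              listened[start:start + win], alpha))
    return maps

# Synthetic stand-in: listened responses are a noisy linear map of imagined ones.
rng = np.random.default_rng(0)
T, d = 40, 5
imagined = rng.standard_normal((T, d))
W_true = rng.standard_normal((d, d))
listened = imagined @ W_true + 0.01 * rng.standard_normal((T, d))
maps = sliding_window_map(imagined, listened, win=10, alpha=1e-3)
```

Learning a separate map per window is what lets the model absorb the timing uncertainty of imagery; the paper's CNN replaces these per-window linear maps with a shared encoder-decoder plus a per-subject calibration layer.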
【32】When does Gaussian equivalence fail and how to fix it: Non-universal behavior of random features with quadratic scaling
标题:高斯等效何时失败以及如何修复它:具有二次缩放的随机特征的非普遍行为
链接:https://arxiv.org/abs/2512.03325
作者:Garrett G. Wen,Hong Hu,Yue M. Lu,Zhou Fan,Theodor Misiakiewicz
摘要:A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaussian equivalence theory (GET) has emerged as a powerful universality principle in this context: it states that the behavior of high-dimensional, complex features can be captured by Gaussian surrogates, which are more amenable to analysis. Despite its remarkable successes, numerical experiments show that this equivalence can fail even for simple embeddings -- such as polynomial maps -- under general scaling regimes. We investigate this breakdown in the setting of random feature (RF) models in the quadratic scaling regime, where both the number of features and the sample size grow quadratically with the data dimension. We show that when the target function depends on a low-dimensional projection of the data, such as generalized linear models, GET yields incorrect predictions. To capture the correct asymptotics, we introduce a Conditional Gaussian Equivalent (CGE) model, which can be viewed as appending a low-dimensional non-Gaussian component to an otherwise high-dimensional Gaussian model. This hybrid model retains the tractability of the Gaussian framework and accurately describes RF models in the quadratic scaling regime. We derive sharp asymptotics for the training and test errors in this setting, which continue to agree with numerical simulations even when GET fails. Our analysis combines general results on CLT for Wiener chaos expansions and a careful two-phase Lindeberg swapping argument. Beyond RF models and quadratic scaling, our work hints at a rich landscape of universality phenomena in high-dimensional ERM.
【33】Iterative Tilting for Diffusion Fine-Tuning
标题:迭代倾斜用于扩散微调
链接:https://arxiv.org/abs/2512.03234
作者:Jean Pachebat,Giovanni Conforti,Alain Durmus,Yazid Janati
备注:14 pages
摘要:We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(\lambda r)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the reward function and avoids backpropagating through sampling chains. We validate on a two-dimensional Gaussian mixture with linear reward, where the exact tilted distribution is available in closed form.
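The validation setting admits a closed form: tilting a Gaussian $N(\mu, \Sigma)$ by $\exp(\lambda\, w^\top x)$ with a linear reward yields $N(\mu + \lambda \Sigma w, \Sigma)$, and splitting the tilt into $N$ steps of size $\lambda/N$ composes exactly to the same target. A small numerical check of that decomposition (variable names and values are ours, not the paper's):

```python
import numpy as np

# For a linear reward r(x) = w.x, exponential tilting of N(mu, Sigma)
# shifts the mean by lam * Sigma @ w and leaves the covariance unchanged,
# so N sequential tilts of size lam/N reach the same distribution as
# one tilt of size lam.

def tilt_gaussian_mean(mu, Sigma, w, lam):
    """Mean of N(mu, Sigma) after exponential tilting by exp(lam * w.x)."""
    return mu + lam * Sigma @ w

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
w = np.array([1.0, -1.0])
lam, N = 2.0, 8

mu_big = tilt_gaussian_mean(mu, Sigma, w, lam)     # one large tilt
mu_seq = mu.copy()
for _ in range(N):                                  # N small tilts
    mu_seq = tilt_gaussian_mean(mu_seq, Sigma, w, lam / N)
```

For nonlinear rewards no such closed form exists; the paper's contribution is making each small tilt tractable via a first-order Taylor expansion of the reward.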
【34】In Situ Quantum Analog Pulse Characterization via Structured Signal Processing
标题:通过结构化信号处理的现场量子模拟脉冲特征
链接:https://arxiv.org/abs/2512.03193
作者:Yulong Dong,Christopher Kang,Murphy Yuezhen Niu
备注:48 pages, 10 figures
摘要:Analog quantum simulators can directly emulate time-dependent Hamiltonian dynamics, enabling the exploration of diverse physical phenomena such as phase transitions, quench dynamics, and non-equilibrium processes. Realizing accurate analog simulations requires high-fidelity time-dependent pulse control, yet existing calibration schemes are tailored to digital gate characterization and cannot be readily extended to learn continuous pulse trajectories. We present a characterization algorithm for in situ learning of pulse trajectories by extending the Quantum Signal Processing (QSP) framework to analyze time-dependent pulses. By combining QSP with a logical-level analog-digital mapping paradigm, our method reconstructs a smooth pulse directly from queries of the time-ordered propagator, without requiring mid-circuit measurements or additional evolution. Unlike conventional Trotterization-based methods, our approach avoids unscalable performance degradation arising from accumulated local truncation errors as the logical-level segmentation increases. Through rigorous theoretical analysis and extensive numerical simulations, we demonstrate that our method achieves high accuracy with strong efficiency and robustness against SPAM as well as depolarizing errors, providing a lightweight and optimal validation protocol for analog quantum simulators capable of detecting major hardware faults.
【35】An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System
标题:一项人工智能实施科学研究,以改善大型医疗保健系统中的可信数据
链接:https://arxiv.org/abs/2512.03098
作者:Benoit L. Marteau,Andrew Hornback,Shaun Q. Tan,Christian Lowson,Jason Woloff,May D. Wang
备注:Submitted and Accepted to the IEEE International Conference on Biomedical and Health Informatics (BHI) 2025
摘要:The rapid growth of Artificial Intelligence (AI) in healthcare has sparked interest in Trustworthy AI and AI Implementation Science, both of which are essential for accelerating clinical adoption. However, strict regulations, gaps between research and clinical settings, and challenges in evaluating AI systems continue to hinder real-world implementation. This study presents an AI implementation case study within Shriners Children's (SC), a large multisite pediatric system, showcasing the modernization of SC's Research Data Warehouse (RDW) to OMOP CDM v5.4 within a secure Microsoft Fabric environment. We introduce a Python-based data quality assessment tool compatible with SC's infrastructure, extending OHDSI's R/Java-based Data Quality Dashboard (DQD) and integrating Trustworthy AI principles using the METRIC framework. This extension enhances data quality evaluation by addressing informative missingness, redundancy, timeliness, and distributional consistency. We also compare systematic and case-specific AI implementation strategies for Craniofacial Microsomia (CFM) using the FHIR standard. Our contributions include a real-world evaluation of AI implementations, integration of Trustworthy AI principles into data quality assessment, and insights into hybrid implementation strategies that blend systematic infrastructure with use-case-driven approaches to advance AI in healthcare.
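The abstract names three of the extended data-quality dimensions: missingness, redundancy, and timeliness. A minimal pandas sketch of checks in that spirit (this is not the SC tool or the OHDSI DQD; the table, column names, and thresholds are hypothetical):

```python
import pandas as pd

# Toy per-table data-quality report over an OMOP-style measurement table:
# per-column missingness, duplicate rows (redundancy), and the fraction
# of records older than a staleness threshold (timeliness).

def quality_report(df, timestamp_col, max_staleness_days=365):
    """Return simple per-table data-quality metrics as a dict."""
    now = pd.Timestamp("2025-01-01")  # fixed reference date for the example
    staleness = (now - pd.to_datetime(df[timestamp_col])).dt.days
    return {
        "missing_rate": df.isna().mean().to_dict(),    # per-column missingness
        "duplicate_rows": int(df.duplicated().sum()),  # exact-duplicate records
        "stale_fraction": float((staleness > max_staleness_days).mean()),
    }

records = pd.DataFrame({
    "person_id": [1, 1, 2, 3],
    "measurement": [5.0, 5.0, None, 7.1],
    "measurement_date": ["2024-06-01", "2024-06-01", "2022-01-15", "2024-12-01"],
})
report = quality_report(records, "measurement_date")
```

A production tool would additionally test distributional consistency (e.g. drift against a reference distribution) and distinguish informative from random missingness, which simple rate counts cannot do.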
【36】Calibrating Geophysical Predictions under Constrained Probabilistic Distributions
标题:约束概率分布下地球物理预测的校正
链接:https://arxiv.org/abs/2512.03081
作者:Zhewen Hou,Jiajin Sun,Subashree Venkatasubramanian,Peter Jin,Shuolin Li,Tian Zheng
摘要:Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive dependence on initial conditions, reflected in positive Lyapunov exponents, where even small perturbations in short-term forecasts can lead to large deviations in long-term outcomes. Thus, meaningful inference requires not only accurate short-term predictions, but also consistency with the system's long-term attractor that is captured by the marginal distribution of state variables. Existing approaches attempt to address this challenge by incorporating spatial and temporal dependence, but these strategies become impractical when data are extremely sparse. In this work, we show that prior knowledge of marginal distributions offers valuable complementary information to short-term observations, motivating a distribution-informed learning framework. We introduce a calibration algorithm based on normalization and the Kernelized Stein Discrepancy (KSD) to enhance ML predictions. The method employs KSD within a reproducing kernel Hilbert space to calibrate model outputs, improving their fidelity to known physical distributions. This not only sharpens pointwise predictions but also enforces consistency with non-local statistical structures rooted in physical principles. Through synthetic experiments, spanning offline climatological CO2 fluxes and online quasi-geostrophic flow simulations, we demonstrate the robustness and broad utility of the proposed framework.
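The KSD at the heart of this calibration measures how far a sample sits from a target density using only its score, grad log p. A minimal one-dimensional sketch (not the paper's algorithm): target N(0,1) with score s(x) = -x, an RBF kernel, and a V-statistic estimator; bandwidth and sample sizes are illustrative choices.

```python
import numpy as np

# V-statistic KSD^2 with the Langevin-Stein kernel
#   u(x,y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy,
# for an RBF kernel k(x,y) = exp(-(x-y)^2 / (2 h^2)).

def ksd_vstat(x, score, h=1.0):
    """V-statistic estimate of KSD^2 against the target with given score."""
    d = x[:, None] - x[None, :]                  # pairwise differences
    k = np.exp(-d**2 / (2 * h**2))               # RBF kernel matrix
    s = score(x)
    u = (s[:, None] * s[None, :] * k             # s(x) s(y) k
         + s[:, None] * (d / h**2) * k           # s(x) dk/dy
         + s[None, :] * (-d / h**2) * k          # s(y) dk/dx
         + (1 / h**2 - d**2 / h**4) * k)         # d2k/dxdy
    return u.mean()

rng = np.random.default_rng(0)
score = lambda x: -x                             # score of N(0,1)
good = rng.standard_normal(500)                  # sample matching the target
bad = good + 2.0                                 # sample shifted off-target
```

A calibration step in this spirit would adjust model outputs to reduce their KSD against the known physical marginal, pulling predictions toward the attractor's statistics without needing samples from it.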
机器翻译由腾讯交互翻译提供,仅供参考