
Machine Learning Academic Digest [1.1]




cs.LG: 176 papers today


Large language models (18 papers)

【1】Reliable and Resilient Collective Communication Library for LLM Training and Serving
链接:https://arxiv.org/abs/2512.25059 becomes
Link: https://arxiv.org/abs/2512.25059

Authors: Wei Wang, Nengneng Yu, Sixian Xiong, Zaoxing Liu
Abstract: Modern ML training and inference now span tens to tens of thousands of GPUs, where network faults can waste 10-15% of GPU hours due to slow recovery. Common network errors and link fluctuations trigger timeouts that often terminate entire jobs, forcing expensive checkpoint rollback during training and request reprocessing during inference. We present R$^2$CCL, a fault-tolerant communication library that provides lossless, low-overhead failover by exploiting multi-NIC hardware. R$^2$CCL performs rapid connection migration, bandwidth-aware load redistribution, and resilient collective algorithms to maintain progress under failures. We evaluate R$^2$CCL on two 8-GPU H100 InfiniBand servers and via large-scale ML simulators modeling hundreds of GPUs with diverse failure patterns. Experiments show that R$^2$CCL is highly robust to NIC failures, incurring less than 1% training and less than 3% inference overheads. R$^2$CCL outperforms the baselines AdapCC and DejaVu by 12.18$\times$ and 47$\times$, respectively.


【2】Diffusion Language Models are Provably Optimal Parallel Samplers
Link: https://arxiv.org/abs/2512.25014

Authors: Haozhe Jiang, Nika Haghtalab, Lijie Chen
Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for this advantage by formalizing a model of parallel sampling and showing that DLMs augmented with polynomial-length chain-of-thought (CoT) can simulate any parallel sampling algorithm using an optimal number of sequential steps. Consequently, whenever a target distribution can be generated using a small number of sequential steps, a DLM can be used to generate the distribution using the same number of optimal sequential steps. However, without the ability to modify previously revealed tokens, DLMs with CoT can still incur large intermediate footprints. We prove that enabling remasking (converting unmasked tokens to masks) or revision (converting unmasked tokens to other unmasked tokens) together with CoT further allows DLMs to simulate any parallel sampling algorithm with optimal space complexity. We further justify the advantage of revision by establishing a strict expressivity gap: DLMs with revision or remasking are strictly more expressive than those without. Our results not only provide a theoretical justification for the promise of DLMs as the most efficient parallel sampler, but also advocate for enabling revision in DLMs.


【3】Efficiently Estimating Data Efficiency for Language Model Fine-tuning
Link: https://arxiv.org/abs/2512.24991

Authors: Gyung Hyun Je, Colin Raffel
Abstract: While large language models (LLMs) demonstrate reasonable zero-shot capability across many downstream tasks, fine-tuning is a common practice to improve their performance. However, a task's data efficiency--i.e., the number of fine-tuning examples needed to achieve a desired level of performance--is often unknown, resulting in costly cycles of incremental annotation and retraining. Indeed, we demonstrate across a curated set of 30 specialized tasks that performant LLMs may struggle zero-shot but can attain stronger performance after fine-tuning. This motivates the need for methods to predict a task's data efficiency without requiring incremental annotation. After introducing a concrete metric that quantifies a task's data efficiency, we propose using the gradient cosine similarity of low-confidence examples to predict data efficiency based on a small number of labeled samples. We validate our approach on a diverse set of tasks with varying data efficiencies, attaining 8.6% error in overall data efficiency prediction and typically eliminating hundreds of unnecessary annotations on each task. Our experimental results and implementation code are available on GitHub.
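For illustration, a minimal sketch of the signal the abstract describes: the mean pairwise cosine similarity between per-example gradients of low-confidence examples. The confidence threshold, the gradient target (full parameters here), and the aggregation are assumptions; the paper's exact metric is defined in the original text.

```python
# Hypothetical sketch: mean pairwise cosine similarity between per-example
# gradients of low-confidence examples. Threshold, gradient target, and
# aggregation are assumptions, not the paper's exact definitions.
import torch
import torch.nn.functional as F

def mean_pairwise_grad_cosine(model, examples, confidence_threshold=0.6):
    grads = []
    for x, y in examples:                      # x: input tensor, y: label tensor
        model.zero_grad()
        logits = model(x.unsqueeze(0))
        if logits.softmax(-1).max().item() >= confidence_threshold:
            continue                           # keep only low-confidence examples
        F.cross_entropy(logits, y.unsqueeze(0)).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g / (g.norm() + 1e-12))   # unit-norm per-example gradient
    if len(grads) < 2:
        return float("nan")
    G = torch.stack(grads)
    sims = G @ G.T                             # pairwise cosine similarities
    n = G.shape[0]
    return ((sims.sum() - n) / (n * (n - 1))).item()  # mean off-diagonal entry
```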


【4】DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments
Link: https://arxiv.org/abs/2512.24985

Authors: Yohan Park, Hyunwoo Ha, Wonjun Jo, Tae-Hyun Oh
Note: Submitted to IEEE Robotics and Automation Letters (RA-L)
Abstract: Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments--a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkEQA, an open-source benchmark for evaluating EQA-relevant perceptual primitives under multi-level low-light conditions. DarkEQA isolates the perception bottleneck by evaluating question answering from egocentric observations under controlled degradations, enabling attributable robustness analysis. A key design feature of DarkEQA is its physical fidelity: visual degradations are modeled in linear RAW space, simulating physics-based illumination drop and sensor noise followed by an ISP-inspired rendering pipeline. We demonstrate the utility of DarkEQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) models. Our analysis systematically reveals VLMs' limitations when operating under these challenging visual conditions. Our code and benchmark dataset will be released upon acceptance.


【5】Iterative Deployment Improves Planning Skills in LLMs
Link: https://arxiv.org/abs/2512.24940

Authors: Augusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo, Ilia Shumailov, André G. Pereira, Yarin Gal
Abstract: We show that iterative deployment of large language models (LLMs), each fine-tuned on data carefully curated by users from the previous models' deployment, can significantly change the properties of the resultant models. By testing this mechanism on various planning domains, we observe substantial improvements in planning skills, with later models displaying emergent generalization by discovering much longer plans than the initial models. We then provide theoretical analysis showing that iterative deployment effectively implements reinforcement learning (RL) training in the outer loop (i.e., not as part of intentional model training), with an implicit reward function. The connection to RL has two important implications: first, for the field of AI safety, as the reward function entailed by repeated deployment is not defined explicitly, and could have unexpected implications for the properties of future model deployments. Second, the mechanism highlighted here can be viewed as an alternative training regime to explicit RL, relying on data curation rather than explicit rewards.


【6】Adaptive Dependency-aware Prompt Optimization Framework for Multi-Step LLM Pipeline
Link: https://arxiv.org/abs/2512.24933

Authors: Minjun Zhao, Xinyu Zhang, Shuai Zhang, Deyang Li, Ruifeng Shi
Abstract: Multi-step LLM pipelines invoke large language models multiple times in a structured sequence and can effectively solve complex tasks, but their performance heavily depends on the prompts used at each step. Jointly optimizing these prompts is difficult due to missing step-level supervision and inter-step dependencies. Existing end-to-end prompt optimization methods struggle under these conditions and often yield suboptimal or unstable updates. We propose ADOPT, an Adaptive Dependency-aware Prompt Optimization framework for multi-step LLM pipelines. ADOPT explicitly models the dependency between each LLM step and the final task outcome, enabling precise text-gradient estimation analogous to computing analytical derivatives. It decouples textual gradient estimation from gradient updates, reducing multi-prompt optimization to flexible single-prompt optimization steps, and employs a Shapley-based mechanism to adaptively allocate optimization resources. Experiments on real-world datasets and diverse pipeline structures show that ADOPT is effective and robust, consistently outperforming state-of-the-art prompt optimization baselines.
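As a rough illustration of one ingredient named above, the sketch below estimates each step's Shapley-style marginal contribution and allocates optimization budget proportionally. The `evaluate` function (task score when only the given steps' prompts are optimized) is a hypothetical stand-in; ADOPT's actual estimator and update rule are described in the paper.

```python
# Hypothetical sketch: Monte Carlo Shapley attribution over pipeline steps,
# then proportional allocation of the prompt-optimization budget.
# `evaluate(steps_optimized)` is an assumed black-box task-score function.
import random

def shapley_allocation(steps, evaluate, samples=200, budget=100.0):
    contrib = {s: 0.0 for s in steps}
    for _ in range(samples):
        perm = random.sample(steps, len(steps))  # random permutation of steps
        prefix, prev = [], evaluate([])
        for s in perm:
            prefix.append(s)
            cur = evaluate(list(prefix))
            contrib[s] += cur - prev             # marginal contribution of s
            prev = cur
    for s in contrib:
        contrib[s] /= samples
    total = sum(max(v, 0.0) for v in contrib.values()) or 1.0
    return {s: budget * max(v, 0.0) / total for s, v in contrib.items()}
```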


【7】HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors
Link: https://arxiv.org/abs/2512.24478

Authors: Hyunjun Kim
Abstract: Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior causal knowledge, but existing approaches rely on heuristic integration that lacks theoretical grounding. We introduce HOLOGRAPH, a framework that formalizes LLM-guided causal discovery through sheaf theory--representing local causal beliefs as sections of a presheaf over variable subsets. Our key insight is that coherent global causal structure corresponds to the existence of a global section, while topological obstructions manifest as non-vanishing sheaf cohomology. We propose the Algebraic Latent Projection to handle hidden confounders and Natural Gradient Descent on the belief manifold for principled optimization. Experiments on synthetic and real-world benchmarks demonstrate that HOLOGRAPH provides rigorous mathematical foundations while achieving competitive performance on causal discovery tasks with 50-100 variables. Our sheaf-theoretic analysis reveals that while Identity, Transitivity, and Gluing axioms are satisfied to numerical precision ($<10^{-6}$), the Locality axiom fails for larger graphs, suggesting fundamental non-local coupling in latent variable projections. Code is available at https://github.com/hyunjun1121/holograph.


【8】Enhancing LLM Planning Capabilities through Intrinsic Self-Critique
Link: https://arxiv.org/abs/2512.24103

Authors: Bernd Bohnet, Pierre-Alexandre Kamienny, Hanie Sedghi, Dilan Gorur, Pranjal Awasthi, Aaron Parisi, Kevin Swersky, Rosanne Liu, Azade Nova, Noah Fiedel
Abstract: We demonstrate an approach for LLMs to critique their own answers with the goal of enhancing their performance that leads to significant improvements over established planning benchmarks. Despite the findings of earlier research that has cast doubt on the effectiveness of LLMs leveraging self-critique methods, we show significant performance gains on planning datasets in the Blocksworld domain through intrinsic self-critique, without an external source such as a verifier. We also demonstrate similar improvements on Logistics and Mini-grid datasets, exceeding strong baseline accuracies. We employ a few-shot learning technique and progressively extend it to a many-shot approach as our base method and demonstrate that it is possible to gain substantial improvement on top of this already competitive approach by employing an iterative process for correction and refinement. We illustrate how self-critique can significantly boost planning performance. Our empirical results present a new state-of-the-art on the class of models considered, namely LLM model checkpoints from October 2024. Our primary focus lies on the method itself, demonstrating intrinsic self-improvement capabilities that are applicable regardless of the specific model version, and we believe that applying our method to more complex search techniques and more capable models will lead to even better performance.


【9】Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study
Link: https://arxiv.org/abs/2512.24102

Authors: Yves Ruffenach
Note: A focused ablation study analyzing the role of latent autoregression in GP-VAE models
Abstract: This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an autoregressive factorization over tokens. In contrast, our prior work proposed shifting sequential structure to the latent space through a causal Gaussian process, while using a non-autoregressive decoder. Here, we conduct a systematic ablation study of the role played by latent autoregression. We compare (i) a full GP-VAE model with autoregressive latent dynamics, (ii) a non-autoregressive ablation in which latent variables are independent, and (iii) a standard token-level autoregressive Transformer. Our results show that, within the considered regime (medium-scale corpora and short training contexts), latent autoregression induces latent trajectories that are significantly more compatible with the Gaussian-process prior and exhibit greater long-horizon stability. In contrast, removing autoregression leads to degraded latent structure and unstable long-range behavior. These findings highlight the role of latent autoregression as an effective mechanism for organizing long-range structure, while remaining complementary to token-level autoregressive modeling. They should be interpreted as an empirical analysis of representational structure rather than as a proposal for a new architecture.


【10】How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns
Link: https://arxiv.org/abs/2512.24063

Authors: Haoyue Bai, Yiyou Sun, Wenjie Hu, Shi Qiu, Maggie Ziyu Huan, Peiyang Song, Robert Nowak, Dawn Song
Abstract: Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to preserve it. The reasons behind this divergence remain unclear, as prior studies have largely relied on coarse accuracy metrics. We address this gap by introducing a novel benchmark that decomposes reasoning into atomic core skills such as calculation, fact retrieval, simulation, enumeration, and diagnosis, providing a concrete framework for addressing the fundamental question of what constitutes reasoning in LLMs. By isolating and measuring these core skills, the benchmark offers a more granular view of how specific cognitive abilities emerge, transfer, and sometimes collapse during post-training. Combined with analyses of low-level statistical patterns such as distributional divergence and parameter statistics, it enables a fine-grained study of how generalization evolves under SFT and RL across mathematical, scientific reasoning, and non-reasoning tasks. Our meta-probing framework tracks model behavior at different training stages and reveals that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns. This work provides new insights into the nature of reasoning in LLMs and points toward principles for designing training strategies that foster broad, robust generalization.


【11】Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models
Link: https://arxiv.org/abs/2512.24058

Authors: Rohit Kumar Salla, Manoj Saravanan, Shrikar Reddy Kota
Note: 5 pages, 4 tables, accepted at AAAI 2026
Abstract: Large Language Models (LLMs) like LLaMA, Mistral, and Gemma are increasingly used in decision-critical domains such as healthcare, law, and finance, yet their reliability remains uncertain. They often make overconfident errors, degrade under input shifts, and lack clear uncertainty estimates. Existing evaluations are fragmented, addressing only isolated aspects. We introduce the Composite Reliability Score (CRS), a unified framework that integrates calibration, robustness, and uncertainty quantification into a single interpretable metric. Through experiments on ten leading open-source LLMs across five QA datasets, we assess performance under baselines, perturbations, and calibration methods. CRS delivers stable model rankings, uncovers hidden failure modes missed by single metrics, and highlights that the most dependable systems balance accuracy, robustness, and calibrated uncertainty.
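The abstract does not spell out how the three components are combined; the sketch below assumes an equal-weight mix of clean accuracy, accuracy retention under perturbation, and calibration (1 - ECE), purely to illustrate the idea of a composite reliability score.

```python
# Hypothetical sketch of a composite reliability score. The paper's actual
# component definitions and weights may differ; equal weighting is assumed.
import numpy as np

def expected_calibration_error(conf, correct, bins=10):
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():                 # bin weight * |accuracy - mean confidence|
            ece += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return ece

def composite_reliability_score(acc_clean, acc_perturbed, conf, correct):
    calibration = 1.0 - expected_calibration_error(conf, correct)
    robustness = acc_perturbed / max(acc_clean, 1e-9)  # retention under shift
    return float(np.mean([acc_clean, robustness, calibration]))
```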


【12】RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress
Link: https://arxiv.org/abs/2512.23995

Authors: Ruixuan Huang, Qingyue Wang, Hantao Huang, Yudong Gao, Dong Chen, Shuai Wang, Wei Wang
Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-$k$ experts, which creates computational bottlenecks on certain devices while forcing others to idle. This converts an efficiency mechanism into a denial-of-service attack vector, leading to violations of service-level agreements for time to first token. We propose RepetitionCurse, a low-cost black-box strategy to exploit this vulnerability. By identifying a universal flaw in MoE router behavior, RepetitionCurse constructs adversarial prompts using simple repetitive token patterns in a model-agnostic manner. On widely deployed MoE models like Mixtral-8x7B, our method increases end-to-end inference latency by 3.063x, degrading service availability significantly.
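A minimal sketch of how routing concentration could be measured from router logits, plus a naive repetitive prompt in the spirit of the attack; the paper's concrete prompt construction and metric are not reproduced here and should be treated as assumptions.

```python
# Hypothetical sketch: quantify top-k routing concentration from router
# logits. 1.0 means every token selects the same expert set.
from collections import Counter
import numpy as np

def routing_concentration(router_logits, k=2):
    """router_logits: (num_tokens, num_experts). Returns the fraction of
    tokens whose top-k expert set equals the single most common set."""
    topk = np.argsort(router_logits, axis=-1)[:, -k:]
    counts = Counter(tuple(sorted(row)) for row in topk)
    return max(counts.values()) / topk.shape[0]

adversarial_prompt = "the " * 2048   # simple repetitive token pattern
```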


【13】Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
Link: https://arxiv.org/abs/2512.23858

Authors: Yue Guan, Changming Yu, Shihan Fang, Weiming Hu, Zaifeng Pan, Zheng Wang, Zihan Liu, Yangjie Zhou, Yufei Ding, Minyi Guo, Jingwen Leng
Note: Accepted by NeurIPS 2025
Abstract: Speculative decoding improves LLM inference by generating and verifying multiple tokens in parallel, but existing systems suffer from suboptimal performance due to a mismatch between dynamic speculation and static runtime assumptions. We present Yggdrasil, a co-designed system that enables latency-optimal speculative decoding through context-aware tree drafting and compiler-friendly execution. Yggdrasil introduces an equal-growth tree structure for static graph compatibility, a latency-aware optimization objective for draft selection, and stage-based scheduling to reduce overhead. Yggdrasil supports unmodified LLMs and achieves up to $3.98\times$ speedup over state-of-the-art baselines across multiple hardware setups.


【14】The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models
Link: https://arxiv.org/abs/2512.23850

Authors: Rahul Baxi
Note: Currently under review at TMLR
Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot distinguish a model that lacks knowledge from one whose verification mechanisms collapse when information degrades or adversaries probe for weaknesses. We introduce the Drill-Down and Fabricate Test (DDFT), a protocol that measures epistemic robustness: a model's ability to maintain factual accuracy under progressive semantic compression and adversarial fabrication. We propose a two-system cognitive model comprising a Semantic System that generates fluent text and an Epistemic Verifier that validates factual accuracy. Our findings, based on evaluating 9 frontier models across 8 knowledge domains at 5 compression levels (1,800 turn-level evaluations), reveal that epistemic robustness is orthogonal to conventional design paradigms. Neither parameter count (r=0.083, p=0.832) nor architectural type (r=0.153, p=0.695) significantly predicts robustness, suggesting it emerges from training methodology and verification mechanisms distinct from current approaches. Error detection capability strongly predicts overall robustness (rho=-0.817, p=0.007), indicating this is the critical bottleneck. We find that flagship models exhibit brittleness despite their scale, while smaller models can achieve robust performance, challenging assumptions about the relationship between model size and reliability. The DDFT framework provides both theoretical foundation and practical tools for assessing epistemic robustness before deployment in critical applications.


【15】Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs
Link: https://arxiv.org/abs/2512.23848

Authors: Yukun Zhang, Stefan Elbl Droguett, Samyak Jain
Abstract: This research project addresses the errors of financial numerical-reasoning Question Answering (QA) tasks caused by the lack of domain knowledge in finance. Despite recent advances in Large Language Models (LLMs), financial numerical questions remain challenging because they require specific domain knowledge in finance and complex multi-step numeric reasoning. We implement a multi-retriever Retrieval Augmented Generation (RAG) system to retrieve both external domain knowledge and internal question contexts, and utilize the latest LLMs to tackle these tasks. Through comprehensive ablation experiments and error analysis, we find that domain-specific training with the SecBERT encoder significantly contributes to our best neural symbolic model surpassing the FinQA paper's top model, which serves as our baseline. This suggests the potential superior performance of domain-specific training. Furthermore, our best prompt-based LLM generator achieves state-of-the-art (SOTA) performance with significant improvement (>7%), yet it is still below human expert performance. This study highlights the trade-off between hallucination loss and external knowledge gains in smaller models and few-shot examples. For larger models, the gains from external facts typically outweigh the hallucination loss. Finally, our findings confirm the enhanced numerical reasoning capabilities of the latest LLMs, optimized for few-shot learning.


【16】Retrieval Augmented Question Answering: When Should LLMs Admit Ignorance?
Link: https://arxiv.org/abs/2512.23836

Authors: Dingmin Wang, Ji Ma, Shankar Kumar
Abstract: The success of expanded context windows in Large Language Models (LLMs) has driven increased use of broader context in retrieval-augmented generation. We investigate the use of LLMs for retrieval augmented question answering. While longer contexts make it easier to incorporate targeted knowledge, they introduce more irrelevant information that hinders the model's generation process and degrades its performance. To address the issue, we design an adaptive prompting strategy which involves splitting the retrieved information into smaller chunks and sequentially prompting an LLM to answer the question using each chunk. Adjusting the chunk size allows a trade-off between incorporating relevant information and reducing irrelevant information. Experimental results on three open-domain question answering datasets demonstrate that the adaptive strategy matches the performance of standard prompting while using fewer tokens. Our analysis reveals that when encountering insufficient information, the LLM often generates incorrect answers instead of declining to respond, which constitutes a major source of error. This finding highlights the need for further research into enhancing LLMs' ability to effectively decline requests when faced with inadequate information.
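A minimal sketch of the chunk-by-chunk prompting loop described above; the `llm` callable and the UNKNOWN refusal convention are hypothetical placeholders, not the paper's interface.

```python
# Hypothetical sketch of adaptive chunked prompting: query the LLM on one
# small chunk of retrieved passages at a time, stopping at the first answer.
def answer_with_chunks(question, passages, llm, chunk_size=2):
    for i in range(0, len(passages), chunk_size):
        context = "\n\n".join(passages[i:i + chunk_size])
        reply = llm(f"Context:\n{context}\n\nQuestion: {question}\n"
                    "Answer from the context, or say UNKNOWN.")
        if "UNKNOWN" not in reply:
            return reply          # first chunk with sufficient information
    return "UNKNOWN"              # no chunk sufficed
```

Here `chunk_size` realizes the trade-off the abstract describes: larger chunks admit more relevant context but also more distractors.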


【17】Geometric Scaling of Bayesian Inference in LLMs
Link: https://arxiv.org/abs/2512.23752

Authors: Naman Aggarwal, Siddhartha R. Dalal, Vishal Misra
Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate -- low-dimensional value manifolds and progressively orthogonal keys -- that encodes posterior structure. We investigate whether this geometric signature persists in production-grade language models. Across the Pythia, Phi-2, Llama-3, and Mistral families, we find that last-layer value representations organize along a single dominant axis whose position strongly correlates with predictive entropy, and that domain-restricted prompts collapse this structure into the same low-dimensional manifolds observed in synthetic settings.

To probe the role of this geometry, we perform targeted interventions on the entropy-aligned axis of Pythia-410M during in-context learning. Removing or perturbing this axis selectively disrupts the local uncertainty geometry, whereas matched random-axis interventions leave it intact. However, these single-layer manipulations do not produce proportionally specific degradation in Bayesian-like behavior, indicating that the geometry is a privileged readout of uncertainty rather than a singular computational bottleneck. Taken together, our results show that modern language models preserve the geometric substrate that enables Bayesian inference in wind tunnels, and organize their approximate Bayesian updates along this substrate.


【18】A Test of Lookahead Bias in LLM Forecasts
Link: https://arxiv.org/abs/2512.23847

Authors: Zhenyu Gao, Wenxi Jiang, Yutong Yan
Abstract: We develop a statistical test to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using state-of-the-art pre-training data detection techniques, we estimate the likelihood that a given prompt appeared in an LLM's training corpus, a statistic we term Lookahead Propensity (LAP). We formally show that a positive correlation between LAP and forecast accuracy indicates the presence and magnitude of lookahead bias, and apply the test to two forecasting tasks: news headlines predicting stock returns and earnings call transcripts predicting capital expenditures. Our test provides a cost-efficient diagnostic tool for assessing the validity and reliability of LLM-generated forecasts.
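The test's core statistic can be illustrated with a rank correlation between lookahead-propensity scores and forecast accuracy; computing LAP itself (via a pre-training data detection method) is not reproduced here, so the sketch below is only the correlation step.

```python
# Hypothetical sketch: rank-correlate lookahead propensity (LAP) with
# forecast accuracy; positive correlation (negative with error) is the
# paper's signature of lookahead bias.
from scipy.stats import spearmanr

def lookahead_bias_test(lap_scores, forecast_errors):
    rho, pvalue = spearmanr(lap_scores, [-e for e in forecast_errors])
    return rho, pvalue
```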


Graph-related (graph learning | graph neural networks | graph optimization, etc.) (11 papers)

【1】Frequent subgraph-based persistent homology for graph classification
Link: https://arxiv.org/abs/2512.24917

Authors: Xinyang Chen, Amaël Broustet, Guoting Chen
Note: Preprint. 18 pages, 10 figures
Abstract: Persistent homology (PH) has recently emerged as a powerful tool for extracting topological features. Integrating PH into machine learning and deep learning models enhances topology awareness and interpretability. However, most PH methods on graphs rely on a limited set of filtrations, such as degree-based or weight-based filtrations, which overlook richer features like recurring information across the dataset and thus restrict expressive power. In this work, we propose a novel graph filtration called Frequent Subgraph Filtration (FSF), which is derived from frequent subgraphs and produces stable and information-rich frequency-based persistent homology (FPH) features. We study the theoretical properties of FSF and provide both proofs and experimental validation. Beyond persistent homology itself, we introduce two approaches for graph classification: an FPH-based machine learning model (FPH-ML) and a hybrid framework that integrates FPH with graph neural networks (FPH-GNNs) to enhance topology-aware graph representation learning. Our frameworks bridge frequent subgraph mining and topological data analysis, offering a new perspective on topology-aware feature extraction. Experimental results show that FPH-ML achieves competitive or superior accuracy compared with kernel-based and degree-based filtration methods. When integrated into graph neural networks, FPH yields relative performance gains ranging from 0.4 to 21 percent, with improvements of up to 8.2 percentage points over GCN and GIN backbones across benchmarks.


【2】Spectral Graph Neural Networks for Cognitive Task Classification in fMRI Connectomes
Link: https://arxiv.org/abs/2512.24901

Authors: Debasis Maji, Arghya Banerjee, Debaditya Barman
Abstract: Cognitive task classification using machine learning plays a central role in decoding brain states from neuroimaging data. By integrating machine learning with brain network analysis, complex connectivity patterns can be extracted from functional magnetic resonance imaging connectomes. This process transforms raw blood-oxygen-level-dependent (BOLD) signals into interpretable representations of cognitive processes. Graph neural networks (GNNs) further advance this paradigm by modeling brain regions as nodes and functional connections as edges, capturing topological dependencies and multi-scale interactions that are often missed by conventional approaches. We propose SpectralBrainGNN, a spectral convolution framework based on graph Fourier transforms (GFT) computed via normalized Laplacian eigendecomposition. Experiments on the Human Connectome Project-Task (HCPTask) dataset demonstrate the effectiveness of the proposed approach, achieving a classification accuracy of 96.25%. The implementation is publicly available at https://github.com/gnnplayground/SpectralBrainGNN to support reproducibility and future research.
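For reference, a minimal sketch of the standard graph Fourier transform underlying spectral GNNs of this kind: eigendecomposition of the symmetric normalized Laplacian, then projection of node signals onto its eigenbasis (connectome-specific details omitted; this is the textbook construction, not the paper's code).

```python
# Reference sketch: graph Fourier transform via eigendecomposition of the
# symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
import numpy as np

def graph_fourier_transform(A, X):
    """A: (n, n) symmetric adjacency (e.g., functional connectivity);
    X: (n, f) node signals. Returns graph frequencies and U^T X."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    L = np.eye(A.shape[0]) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L)   # L = U diag(eigvals) U^T
    return eigvals, U.T @ X          # project signals onto the eigenbasis
```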


【3】HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs
Link: https://arxiv.org/abs/2512.24665

Authors: Honglin Gao, Lan Zhao, Junhao Ren, Xiang Li, Gaoxi Xiao
Abstract: Heterogeneous graph neural networks (HGNNs) have achieved strong performance in many real-world applications, yet targeted backdoor poisoning on heterogeneous graphs remains less studied. We consider backdoor attacks for heterogeneous node classification, where an adversary injects a small set of trigger nodes and connections during training to force specific victim nodes to be misclassified into an attacker-chosen label at test time while preserving clean performance. We propose HeteroHBA, a generative backdoor framework that selects influential auxiliary neighbors for trigger attachment via saliency-based screening and synthesizes diverse trigger features and connection patterns to better match the local heterogeneous context. To improve stealthiness, we combine Adaptive Instance Normalization (AdaIN) with a Maximum Mean Discrepancy (MMD) loss to align the trigger feature distribution with benign statistics, thereby reducing detectability, and we optimize the attack with a bilevel objective that jointly promotes attack success and maintains clean accuracy. Experiments on multiple real-world heterogeneous graphs with representative HGNN architectures show that HeteroHBA consistently achieves higher attack success than prior backdoor baselines with comparable or smaller impact on clean accuracy; moreover, the attack remains effective under our heterogeneity-aware structural defense, CSD. These results highlight practical backdoor risks in heterogeneous graph learning and motivate the development of stronger defenses.


【4】A Graph Neural Network with Auxiliary Task Learning for Missing PMU Data Reconstruction
Link: https://arxiv.org/abs/2512.24542

Authors: Bo Li, Zijun Chen, Haiwang Zhong, Di Cao, Guangchun Ruan
Abstract: In wide-area measurement systems (WAMS), phasor measurement unit (PMU) measurement is prone to data missingness due to hardware failures, communication delays, and cyber-attacks. Existing data-driven methods are limited by inadaptability to concept drift in power systems, poor robustness under high missing rates, and reliance on the unrealistic assumption of full system observability. Thus, this paper proposes an auxiliary task learning (ATL) method for reconstructing missing PMU data. First, a K-hop graph neural network (GNN) is proposed to enable direct learning on the subgraph consisting of PMU nodes, overcoming the limitation of the incompletely observable system. Then, an auxiliary learning framework consisting of two complementary graph networks is designed for accurate reconstruction: a spatial-temporal GNN extracts spatial-temporal dependencies from PMU data to reconstruct missing values, and another auxiliary GNN utilizes the low-rank property of PMU data to achieve unsupervised online learning. In this way, the low-rank properties of the PMU data are dynamically leveraged across the architecture to ensure robustness and self-adaptation. Numerical results demonstrate the superior offline and online performance of the proposed method under high missing rates and incomplete observability.


【5】Spectral and Spatial Graph Learning for Multispectral Solar Image Compression
Link: https://arxiv.org/abs/2512.24463

Authors: Prasiddha Siwakoti, Atefeh Khoshkhahtinat, Piyush M. Mehta, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva
Note: 8 pages, 6 figures, 1 table. Code available at https://github.com/agyat4/sgraph
Abstract: High-fidelity compression of multispectral solar imagery remains challenging for space missions, where limited bandwidth must be balanced against preserving fine spectral and spatial details. We present a learned image compression framework tailored to solar observations, leveraging two complementary modules: (1) the Inter-Spectral Windowed Graph Embedding (iSWGE), which explicitly models inter-band relationships by representing spectral channels as graph nodes with learned edge features; and (2) the Windowed Spatial Graph Attention and Convolutional Block Attention (WSGA-C), which combines sparse graph attention with convolutional attention to reduce spatial redundancy and emphasize fine-scale structures. Evaluations on the SDOML dataset across six extreme ultraviolet (EUV) channels show that our approach achieves a 20.15% reduction in Mean Spectral Information Divergence (MSID), up to 1.09% PSNR improvement, and a 1.62% log-transformed MS-SSIM gain over strong learned baselines, delivering sharper and spectrally faithful reconstructions at comparable bits-per-pixel rates. The code is publicly available at https://github.com/agyat4/sgraph.


【6】Hyperspherical Graph Representation Learning via Adaptive Neighbor-Mean Alignment and Uniformity
Link: https://arxiv.org/abs/2512.24062

Authors: Rui Chen, Junjun Guo, Hongbin Wang, Yan Xiang, Yantuan Xian, Zhengtao Yu
Note: Submitted to Pattern Recognition
Abstract: Graph representation learning (GRL) aims to encode structural and semantic dependencies of graph-structured data into low-dimensional embeddings. However, existing GRL methods often rely on surrogate contrastive objectives or mutual information maximization, which typically demand complex architectures, negative sampling strategies, and sensitive hyperparameter tuning. These design choices may induce over-smoothing, over-squashing, and training instability. In this work, we propose HyperGRL, a unified framework for hyperspherical graph representation learning via adaptive neighbor-mean alignment and sampling-free uniformity. HyperGRL embeds nodes on a unit hypersphere through two adversarially coupled objectives: neighbor-mean alignment and sampling-free uniformity. The alignment objective uses the mean representation of each node's local neighborhood to construct semantically grounded, stable targets that capture shared structural and feature patterns. The uniformity objective formulates dispersion via an L2-based hyperspherical regularization, encouraging globally uniform embedding distributions while preserving discriminative information. To further stabilize training, we introduce an entropy-guided adaptive balancing mechanism that dynamically regulates the interplay between alignment and uniformity without requiring manual tuning. Extensive experiments on node classification, node clustering, and link prediction demonstrate that HyperGRL delivers superior representation quality and generalization across diverse graph structures, achieving average improvements of 1.49%, 0.86%, and 0.74% over the strongest existing methods, respectively. These findings highlight the effectiveness of geometrically grounded, sampling-free contrastive objectives for graph representation learning.
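A minimal sketch of the two objectives, assuming PyTorch: neighbor-mean alignment on the unit hypersphere, plus a common sampling-free hyperspherical uniformity term standing in for the paper's L2-based regularizer; the entropy-guided balancing is simplified to a fixed coefficient `lam`.

```python
# Hypothetical sketch of alignment + uniformity on the unit hypersphere.
# A Wang-Isola-style uniformity term is used as a stand-in; the paper's
# exact regularizer and adaptive balance are not reproduced.
import torch
import torch.nn.functional as F

def alignment_and_uniformity(z, neighbor_index, lam=0.5):
    """z: (n, d) node embeddings; neighbor_index: list of LongTensors
    holding each node's neighbor ids."""
    z = F.normalize(z, dim=-1)                       # project to hypersphere
    targets = torch.stack([z[idx].mean(0) for idx in neighbor_index])
    align = (z - F.normalize(targets, dim=-1)).pow(2).sum(-1).mean()
    uniform = torch.pdist(z).pow(2).mul(-2).exp().mean().log()
    return align + lam * uniform     # smaller = aligned and well dispersed
```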


【7】Physics-informed Graph Neural Networks for Operational Flood Modeling
Link: https://arxiv.org/abs/2512.23964

Authors: Carlo Malapad Acosta, Herath Mudiyanselage Viraj Vidura Herath, Jia Yu Lim, Abhishek Saha, Sanka Rasnayaka, Lucy Marshall
Note: To be submitted to IJCAI
Abstract: Flood models inform strategic disaster management by simulating the spatiotemporal hydrodynamics of flooding. While physics-based numerical flood models are accurate, their substantial computational cost limits their use in operational settings where rapid predictions are essential. Models designed with graph neural networks (GNNs) provide both speed and accuracy while having the ability to process unstructured spatial domains. Given their flexible input and architecture, GNNs can be leveraged alongside physics-informed techniques with ease, significantly improving interpretability. This study introduces a novel flood GNN architecture, DUALFloodGNN, which embeds physical constraints at both global and local scales through explicit loss terms. The model jointly predicts water volume at nodes and flow along edges through a shared message-passing framework. To improve performance for autoregressive inference, model training is conducted with a multi-step loss enhanced with dynamic curriculum learning. Compared with standard GNN architectures and state-of-the-art GNN flood models, DUALFloodGNN achieves substantial improvements in predicting multiple hydrologic variables while maintaining high computational efficiency. The model is open-sourced at https://github.com/acostacos/dual_flood_gnn.


【8】A Survey on Graph Neural Networks for Fraud Detection in Ride Hailing Platforms
Link: https://arxiv.org/abs/2512.23777

Authors: Kanishka Hewageegana, Janani Harischandra, Nipuna Senanayake, Gihan Danansuriya, Kavindu Hapuarachchi, Pooja Illangarathne
Note: 12 pages, 8 figures, 2 tables. Presented at the 2024 7th International Conference on Artificial Intelligence and Big Data (ICAIBD)
Abstract: This study investigates fraud detection in ride-hailing platforms through Graph Neural Networks (GNNs), focusing on the effectiveness of various models. By analyzing prevalent fraudulent activities, the research highlights and compares existing work on fraud detection that can be useful when addressing fraudulent incidents within online ride-hailing platforms. The paper also highlights approaches to class imbalance and fraudulent camouflage. It further outlines a structured overview of GNN architectures and methodologies applied to anomaly detection, identifying significant methodological progress and gaps. The paper calls for further exploration into real-world applicability and technical improvements to enhance fraud detection strategies in the rapidly evolving ride-hailing industry.


【9】A New Decomposition Paradigm for Graph-structured Nonlinear Programs via Message Passing
Link: https://arxiv.org/abs/2512.24676

Authors: Kuangyu Ding, Marie Maros, Gesualdo Scutari
Note: 55 pages, 14 figures
Abstract: We study finite-sum nonlinear programs whose decision variables interact locally according to a graph or hypergraph. We propose MP-Jacobi (Message Passing-Jacobi), a graph-compliant decentralized framework that couples min-sum message passing with Jacobi block updates. The (hyper)graph is partitioned into tree clusters. At each iteration, agents update in parallel by solving a cluster subproblem whose objective decomposes into (i) an intra-cluster term evaluated by a single min-sum sweep on the cluster tree (cost-to-go messages) and (ii) inter-cluster couplings handled via a Jacobi correction using neighbors' latest iterates. This design uses only single-hop communication and yields a convergent message-passing method on loopy graphs.

For strongly convex objectives we establish global linear convergence and explicit rates that quantify how curvature, coupling strength, and the chosen partition affect scalability, and provide guidance for clustering. To mitigate the computation and communication cost of exact message updates, we develop graph-compliant surrogates that preserve convergence while reducing per-iteration complexity. We further extend MP-Jacobi to hypergraphs; in heavily overlapping regimes, a surrogate-based hyperedge-splitting scheme restores finite-time intra-cluster message updates and maintains convergence. Experiments validate the theory and show consistent improvements over decentralized gradient baselines.


【10】Topological Spatial Graph Coarsening
Link: https://arxiv.org/abs/2512.24327

Authors: Anna Calissano, Etienne Lasalle
Abstract: Spatial graphs are particular graphs for which the nodes are localized in space (e.g., public transport networks, molecules, branching biological structures). In this work, we consider the problem of spatial graph reduction, which aims to find a smaller spatial graph (i.e., with fewer nodes) with the same overall structure as the initial one. In this context, performing the graph reduction while preserving the main topological features of the initial graph is particularly relevant, due to the additional spatial information. Thus, we propose a topological spatial graph coarsening approach based on a new framework that finds a trade-off between the graph reduction and the preservation of the topological characteristics. The coarsening is realized by collapsing short edges. In order to capture the topological information required to calibrate the reduction level, we adapt the construction of classical topological descriptors made for point clouds (the so-called persistent diagrams) to spatial graphs. This construction relies on the introduction of a new filtration called triangle-aware graph filtration. Our coarsening approach is parameter-free and we prove that it is equivariant under rotations, translations, and scaling of the initial spatial graph. We evaluate the performance of our method on synthetic and real spatial graphs, and show that it significantly reduces the graph sizes while preserving the relevant topological information.
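A minimal sketch of the basic coarsening move, collapsing short edges with networkx; the paper's topology-aware calibration of the reduction level is replaced here by a plain length threshold, and merged-node placement at the midpoint is an assumption.

```python
# Hypothetical sketch: coarsen a spatial graph by repeatedly contracting
# the shortest edge below a length threshold.
import networkx as nx
import numpy as np

def collapse_short_edges(G, pos, threshold):
    """G: graph; pos: dict node -> np.ndarray coordinates."""
    G, pos = G.copy(), dict(pos)
    while True:
        short = [(np.linalg.norm(pos[u] - pos[v]), u, v)
                 for u, v in G.edges
                 if np.linalg.norm(pos[u] - pos[v]) < threshold]
        if not short:
            return G, pos
        _, u, v = min(short, key=lambda t: t[0])   # shortest edge first
        G = nx.contracted_nodes(G, u, v, self_loops=False)
        pos[u] = (pos[u] + pos[v]) / 2.0           # merged node at midpoint
        pos.pop(v, None)
```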


【11】Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware
Link: https://arxiv.org/abs/2512.23817

Authors: Seyed Mohamad Ali Tousi, Adib Bazgir, Yuwen Zhang, G. N. DeSouza
Abstract: We present a hybrid quantum-classical framework augmented with learned error mitigation for solving the viscous Burgers equation on noisy intermediate-scale quantum (NISQ) hardware. Using the Cole-Hopf transformation, the nonlinear Burgers equation is mapped to a diffusion equation, discretized on uniform grids, and encoded into a quantum state whose time evolution is approximated via Trotterized nearest-neighbor circuits implemented in Qiskit. Quantum simulations are executed on noisy Aer backends and IBM superconducting quantum devices and are benchmarked against high-accuracy classical solutions obtained using a Krylov-based solver applied to the corresponding discretized Hamiltonian. From measured quantum amplitudes, we reconstruct the velocity field and evaluate physical and numerical diagnostics, including the L2 error, shock location, and dissipation rate, both with and without zero-noise extrapolation (ZNE). To enable data-driven error mitigation, we construct a large parametric dataset by sweeping viscosity, time step, grid resolution, and boundary conditions, producing matched tuples of noisy, ZNE-corrected, hardware, and classical solutions together with detailed circuit metadata. Leveraging this dataset, we train an attention-based graph neural network that incorporates circuit structure, light-cone information, global circuit parameters, and noisy quantum outputs to predict error-mitigated solutions. Across a wide range of parameters, the learned model consistently reduces the discrepancy between quantum and classical solutions beyond what is achieved by ZNE alone. We discuss extensions of this approach to higher-dimensional Burgers systems and more general quantum partial differential equation solvers, highlighting learned error mitigation as a promising complement to physics-based noise reduction techniques on NISQ devices.
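For reference, a minimal sketch of zero-noise extrapolation (ZNE) as commonly implemented: measure an expectation value at several amplified noise levels and extrapolate polynomially to the zero-noise limit. The circuit-folding step that produces the scaled measurements is omitted, and the example numbers are illustrative only.

```python
# Reference sketch of ZNE: polynomial (Richardson-style) extrapolation of
# expectation values measured at amplified noise scales back to scale 0.
import numpy as np

def zero_noise_extrapolate(scale_factors, expectations, degree=2):
    coeffs = np.polyfit(scale_factors, expectations, deg=degree)
    return np.polyval(coeffs, 0.0)        # value at noise scale 0

# e.g., expectation values measured with noise folded 1x, 3x, 5x:
mitigated = zero_noise_extrapolate([1, 3, 5], [0.81, 0.62, 0.50])
```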


Transformers (3 papers)

【1】Many Minds from One Model: Bayesian Transformers for Population Intelligence
Link: https://arxiv.org/abs/2512.25063

Authors: Diji Yang, Yi Zhang
Abstract: Despite their scale and success, modern transformers are almost universally trained as single-minded systems: optimization produces one deterministic set of parameters, representing a single functional hypothesis about the data. Motivated by the idea that intelligence emerges from many minds, we propose Population Bayesian Transformers (B-Trans), which transform a standard Large Language Model into a Bayesian Transformer model to support sampling diverse yet coherent model instances from a single set of pre-trained weights.

B-Trans introduces a Bayesian-motivated posterior proxy by treating the bias-like offsets in normalization layers as stochastic variables with a Gaussian variational approximation, inducing a distribution over model behavior without the cost of training full Bayesian neural networks. Sampling from this proxy yields a set of model instances with diverse behaviors while maintaining general competence. To preserve coherence within each generation, we freeze the sampled noise at the sequence level, enforcing temporal consistency across tokens. B-Trans allows for population-level decision-making, where aggregating predictions across sampled individuals significantly enhances exploration. Experiments across zero-shot generation, Reinforcement Learning with Verifiable Rewards (RLVR), and RL without explicit labels demonstrate that B-Trans effectively leverages the wisdom of crowds, yielding superior semantic diversity while achieving better task performance compared to deterministic baselines.
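A minimal sketch of the posterior proxy described above, assuming PyTorch: a LayerNorm whose output receives a Gaussian bias offset sampled once per sequence and frozen across tokens. The variational training of `log_sigma` is not shown, and the initialization is an assumption.

```python
# Hypothetical sketch: stochastic bias offset on a normalization layer,
# resampled once per generated sequence and held fixed across tokens.
import torch
import torch.nn as nn

class StochasticBiasLayerNorm(nn.Module):
    def __init__(self, dim, init_log_sigma=-3.0):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.log_sigma = nn.Parameter(torch.full((dim,), init_log_sigma))
        self.register_buffer("eps", torch.zeros(dim))

    def resample(self):              # call once per sequence ("one mind")
        self.eps = torch.randn_like(self.eps)

    def forward(self, x):
        return self.ln(x) + self.eps * self.log_sigma.exp()
```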


【2】Guiding a Diffusion Transformer with the Internal Dynamics of Itself
标题:用扩散Transformer本身的内部动力学引导扩散Transformer
链接:https://arxiv.org/abs/2512.24176

作者:Xingyu Zhou,Qifan Li,Xiaobin Hu,Hai Chen,Shuhang Gu
备注:Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
摘要:扩散模型具有捕获整个(条件)数据分布的强大能力。然而,由于缺乏足够的训练和数据来学习覆盖低概率区域,模型将因无法生成与这些区域对应的高质量图像而受到惩罚。为了获得更好的生成质量,无分类器引导(CFG)等引导策略可以在采样阶段将样本引向高概率区域。然而,标准的CFG往往会导致过度简化或失真的样本。另一方面,用自身劣化版本引导扩散模型的替代路线,则受限于精心设计的退化策略、额外的训练和额外的采样步骤。在本文中,我们提出了一个简单而有效的策略,即内部引导(IG):在训练过程中对中间层引入辅助监督,并在采样过程中对中间层与深层的输出进行外推以获得生成结果。这个简单的策略在各种基线上的训练效率和生成质量方面都带来显著改进。在ImageNet 256x256上,SiT-XL/2+IG在80和800 epoch时分别实现FID=5.31和FID=1.75。更令人印象深刻的是,LightningDiT-XL/1+IG实现了FID=1.34,以较大优势领先于上述所有方法。结合CFG,LightningDiT-XL/1+IG实现了当前最先进的FID 1.19。
摘要:The diffusion model presents a powerful ability to capture the entire (conditional) data distribution. However, due to the lack of sufficient training and data to learn to cover low-probability areas, the model will be penalized for failing to generate high-quality images corresponding to these areas. To achieve better generation quality, guidance strategies such as classifier-free guidance (CFG) can guide the samples to the high-probability areas during the sampling stage. However, standard CFG often leads to over-simplified or distorted samples. On the other hand, the alternative line of guiding a diffusion model with its bad version is limited by carefully designed degradation strategies, extra training, and additional sampling steps. In this paper, we propose a simple yet effective strategy, Internal Guidance (IG), which introduces auxiliary supervision on an intermediate layer during the training process and extrapolates the intermediate- and deep-layer outputs to obtain generative results during the sampling process. This simple strategy yields significant improvements in both training efficiency and generation quality on various baselines. On ImageNet 256x256, SiT-XL/2+IG achieves FID=5.31 and FID=1.75 at 80 and 800 epochs. More impressively, LightningDiT-XL/1+IG achieves FID=1.34, outperforming all of these methods by a large margin. Combined with CFG, LightningDiT-XL/1+IG achieves the current state-of-the-art FID of 1.19.
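
摘要中"对中间层与深层输出进行外推"的做法,可以借一个CFG风格的外推式来体会。以下为假设性草图:函数名、引导系数w与具体外推形式均非论文原文给出。

```python
import torch

def internal_guidance(deep_out: torch.Tensor,
                      mid_out: torch.Tensor,
                      w: float = 1.5) -> torch.Tensor:
    # Extrapolate away from the auxiliary intermediate-layer prediction,
    # in the spirit of classifier-free guidance; w is a guidance scale.
    return deep_out + w * (deep_out - mid_out)
```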


【3】A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe
标题:用于基于InSAR的地面变形预测的多模式Transformer,具有欧洲跨站点通用化
链接:https://arxiv.org/abs/2512.23906

作者:Wendong Yao,Binhua Huang,Soumyabrata Dev
备注:submitted to ISPRS Journal of Photogrammetry and Remote Sensing for review
摘要:越来越需要对地面变形进行近实时的区域尺度监测,以支持城市规划、关键基础设施管理和减轻自然灾害。虽然干涉合成孔径雷达(InSAR)和欧洲地面运动服务(EGMS)等大陆尺度服务提供了对过去运动的密集观测,但由于长期趋势、季节周期和偶发的突然不连续性(例如同震阶跃)相互叠加,以及强烈的空间异质性,预测下一次观测仍然具有挑战性。在这项研究中,我们提出了一个基于图像块(patch)的多模态Transformer,用于从EGMS时间序列(重采样到100公里×100公里瓦片上的64×64网格)对位移图进行单步、固定间隔的下一历元临近预报。该模型摄取最近的位移快照以及(i)仅从训练窗口以防泄漏方式计算的静态运动学指标(平均速度、加速度、季节幅度),以及(ii)谐波式年积日(day-of-year)编码。在爱尔兰东部瓦片(E32N34)上,STGCN在仅位移设置中最强,而当所有模型接收相同的多模态输入时,多模态Transformer明显优于CNN-LSTM、CNN-LSTM+Attn和多模态STGCN,在测试集上实现RMSE = 0.90 mm和$R^2$ = 0.97,并具有最佳阈值精度。
摘要:Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While Interferometric Synthetic Aperture Radar (InSAR) and continental-scale services such as the European Ground Motion Service (EGMS) provide dense observations of past motion, predicting the next observation remains challenging due to the superposition of long-term trends, seasonal cycles, and occasional abrupt discontinuities (e.g., co-seismic steps), together with strong spatial heterogeneity. In this study we propose a multimodal patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series (resampled to a 64x64 grid over 100 km x 100 km tiles). The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN when all models receive the same multimodal inputs, achieving RMSE = 0.90 mm and $R^2$ = 0.97 on the test set with the best threshold accuracies.
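
摘要中的"谐波式年积日编码"通常指一对正弦/余弦特征,如下numpy草图所示(周期取365.25为常见约定,谐波个数为假设):

```python
import numpy as np

def doy_harmonics(day_of_year: np.ndarray, period: float = 365.25) -> np.ndarray:
    # One sine/cosine pair capturing the annual cycle; returns shape (N, 2).
    # Higher-order harmonics could be stacked the same way.
    phase = 2.0 * np.pi * day_of_year / period
    return np.stack([np.sin(phase), np.cos(phase)], axis=-1)

print(doy_harmonics(np.array([1, 91, 182, 274])))
```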


GAN|对抗|攻击|生成相关(7篇)

【1】Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation
标题:使用物理在环优化进行单目深度估计的基于投影的对抗攻击
链接:https://arxiv.org/abs/2512.24792

作者:Takeru Kusakabe,Yudai Hirose,Mashiho Mukaida,Satoshi Ono
摘要:深度神经网络(DNN)仍然容易受到对抗性攻击的影响,当向输入图像添加特定扰动时,这种攻击会导致错误分类。这种漏洞也威胁到基于DNN的单目深度估计(MDE)模型的可靠性,使得鲁棒性增强成为实际应用中的关键需求。为了验证基于DNN的MDE模型的脆弱性,本研究提出了一种基于投影的对抗性攻击方法,该方法将扰动光投影到目标对象上。该方法采用物理在环(PITL)优化-在实际环境中评估候选解决方案,以考虑设备规格和干扰-并利用分布式协方差矩阵自适应进化策略。实验证实,该方法成功地创建了对抗性的例子,导致深度错误估计,导致部分对象从目标场景中消失。
摘要:Deep neural networks (DNNs) remain vulnerable to adversarial attacks that cause misclassification when specific perturbations are added to input images. This vulnerability also threatens the reliability of DNN-based monocular depth estimation (MDE) models, making robustness enhancement a critical need in practical applications. To validate the vulnerability of DNN-based MDE models, this study proposes a projection-based adversarial attack method that projects perturbation light onto a target object. The proposed method employs physics-in-the-loop (PITL) optimization -- evaluating candidate solutions in actual environments to account for device specifications and disturbances -- and utilizes a distributed covariance matrix adaptation evolution strategy. Experiments confirmed that the proposed method successfully created adversarial examples that lead to depth misestimations, resulting in parts of objects disappearing from the target scene.


【2】Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations
标题:通过多任务学习和对抗性扰动保护隐私的语义通信
链接:https://arxiv.org/abs/2512.24452

作者:Yalin E. Sagduyu,Tugba Erpek,Aylin Yener,Sennur Ulukus
摘要:语义通信传达与任务相关的含义,而非仅仅关注消息重构,从而提高下一代无线系统的带宽效率和鲁棒性。然而,学习得到的语义表示仍然可能将敏感信息泄漏给非预期的接收者(窃听者)。本文提出了一种基于深度学习的语义通信框架,该框架联合支持多个接收者任务,同时明确限制对窃听者的语义泄漏。合法链路在发送端采用学习得到的编码器,而接收端训练解码器进行语义推断和数据重构。安全问题被表述为迭代的最小-最大优化:训练窃听者以提高其语义推断能力,同时训练合法的发射机-接收机对,在降低窃听者成功率的同时保持任务性能。我们还引入了一个辅助层,在发射波形上叠加协同的、对抗式构造的扰动,以降低对窃听者的语义泄漏。使用MNIST和CIFAR-10数据集,在具有加性高斯白噪声的瑞利衰落信道上评估了性能。语义准确性和重建质量随潜在维度的增加而提高,而最小-最大机制在不降低合法接收端性能的情况下显著降低了窃听者的推断性能。即使合法链路只针对自身任务进行训练,扰动层也能成功减少语义泄漏。这一综合框架推动了在现实无线环境中面向自适应对手、具备可调端到端隐私的语义通信设计。
摘要:Semantic communication conveys task-relevant meaning rather than focusing solely on message reconstruction, improving bandwidth efficiency and robustness for next-generation wireless systems. However, learned semantic representations can still leak sensitive information to unintended receivers (eavesdroppers). This paper presents a deep learning-based semantic communication framework that jointly supports multiple receiver tasks while explicitly limiting semantic leakage to an eavesdropper. The legitimate link employs a learned encoder at the transmitter, while the receiver trains decoders for semantic inference and data reconstruction. The security problem is formulated via an iterative min-max optimization in which an eavesdropper is trained to improve its semantic inference, while the legitimate transmitter-receiver pair is trained to preserve task performance while reducing the eavesdropper's success. We also introduce an auxiliary layer that superimposes a cooperative, adversarially crafted perturbation on the transmitted waveform to degrade semantic leakage to an eavesdropper. Performance is evaluated over Rayleigh fading channels with additive white Gaussian noise using MNIST and CIFAR-10 datasets. Semantic accuracy and reconstruction quality improve with increasing latent dimension, while the min-max mechanism reduces the eavesdropper's inference performance significantly without degrading the legitimate receiver. The perturbation layer is successful in reducing semantic leakage even when the legitimate link is trained only for its own task. This comprehensive framework motivates semantic communication designs with tunable, end-to-end privacy against adaptive adversaries in realistic wireless settings.


【3】DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks
标题:DivQAT:增强量化卷积神经网络对抗模型提取攻击的鲁棒性
链接:https://arxiv.org/abs/2512.23948

作者:Kacem Khaled,Felipe Gohring de Magalhães,Gabriela Nicolescu
摘要:卷积神经网络(CNN)及其量化的对等网络容易受到提取攻击,构成了IP盗窃的重大威胁。然而,与大型模型相比,量化模型对这些攻击的鲁棒性研究很少。以前的防御建议将计算的噪声注入预测概率。然而,这些防御是有限的,因为它们在模型设计期间没有被合并,并且只是在训练之后作为事后的想法添加。此外,大多数防御技术在计算上是昂贵的,并且通常具有关于受害者模型的不切实际的假设,这些假设在边缘设备实现中是不可行的,并且不适用于量化模型。在本文中,我们提出了DivQAT,一种新的算法来训练量化CNN的基础上量化感知训练(QAT),旨在提高其对提取攻击的鲁棒性。据我们所知,我们的技术是第一个修改量化过程,将模型提取防御集成到训练过程中的技术。通过对基准视觉数据集的实证验证,我们证明了我们的技术在防御模型提取攻击而不影响模型准确性方面的有效性。此外,将我们的量化技术与其他防御机制相结合,与传统的QAT相比,提高了它们的有效性。
摘要:Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet, the robustness of quantized models against these attacks is little studied compared to large models. Previous defenses propose to inject calculated noise into the prediction probabilities. However, these defenses are limited since they are not incorporated during the model design and are only added as an afterthought after training. Additionally, most defense techniques are computationally expensive and often have unrealistic assumptions about the victim model that are not feasible in edge device implementations and do not apply to quantized models. In this paper, we propose DivQAT, a novel algorithm to train quantized CNNs based on Quantization Aware Training (QAT) aiming to enhance their robustness against extraction attacks. To the best of our knowledge, our technique is the first to modify the quantization process to integrate a model extraction defense into the training process. Through empirical validation on benchmark vision datasets, we demonstrate the efficacy of our technique in defending against model extraction attacks without compromising model accuracy. Furthermore, combining our quantization technique with other defense mechanisms improves their effectiveness compared to traditional QAT.


【4】Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
标题:对抗性镜头:利用注意力层生成对抗性示例进行评估
链接:https://arxiv.org/abs/2512.23837

作者:Kaustubh Dhole
摘要:机械可解释性的最新进展表明,中间注意层编码了朝着最终输出迭代细化的标记级假设。在这项工作中,我们利用这一特性,直接从注意力层的令牌分布中生成对抗样本。与基于提示或基于梯度的攻击不同,我们的方法利用模型内部的令牌预测,所产生的扰动既合理,又与模型自身的生成过程内部一致。我们评估了从中间层提取的令牌能否作为下游评估任务的有效对抗扰动。我们使用ArgQuality数据集进行论证质量评估实验,LLaMA-3.1-Instruct-8B同时充当生成器和评估器。结果表明,基于注意力的对抗样本会导致评估性能的可测量下降,同时与原始输入保持语义相似。然而,我们也观察到,从某些层和标记位置提取的替换词可能引入语法退化,限制其实际效果。总体而言,我们的研究结果凸显了将中间层表示用作对基于LLM的评估流水线进行压力测试的对抗样本这一原则性来源的前景与当前局限。
摘要:Recent advances in mechanistic interpretability suggest that intermediate attention layers encode token-level hypotheses that are iteratively refined toward the final output. In this work, we exploit this property to generate adversarial examples directly from attention-layer token distributions. Unlike prompt-based or gradient-based attacks, our approach leverages model-internal token predictions, producing perturbations that are both plausible and internally consistent with the model's own generation process. We evaluate whether tokens extracted from intermediate layers can serve as effective adversarial perturbations for downstream evaluation tasks. We conduct experiments on argument quality assessment using the ArgQuality dataset, with LLaMA-3.1-Instruct-8B serving as both the generator and evaluator. Our results show that attention-based adversarial examples lead to measurable drops in evaluation performance while remaining semantically similar to the original inputs. However, we also observe that substitutions drawn from certain layers and token positions can introduce grammatical degradation, limiting their practical effectiveness. Overall, our findings highlight both the promise and current limitations of using intermediate-layer representations as a principled source of adversarial examples for stress-testing LLM-based evaluation pipelines.


【5】Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark
标题:提示诱导的过度生成作为拒绝服务:黑盒攻击端基准
链接:https://arxiv.org/abs/2512.23779

作者:Manu,Yi Guo,Jo Plested,Tim Lynar,Kanchana Thilakarathna,Nirhoshan Sivaroopan,Jack Yang,Wangli Yang
备注:12 pages, 2 figures
摘要:大型语言模型(LLM)可能被驱使过度生成,在产生序列结束(EOS)令牌之前发出数千个令牌。这会降低回答质量,推高延迟和成本,并且可以被武器化为拒绝服务(DoS)攻击。最近的工作已经开始研究DoS风格的提示攻击,但通常只关注单一攻击算法或假设白盒访问,缺乏一个在已知分词器、仅查询的黑盒设定下比较基于提示的攻击者的攻击端基准。我们引入了这样一个基准,并研究了两种仅基于提示的攻击者。第一种是进化式过度生成提示搜索(EOGen),它在令牌空间中搜索能够抑制EOS并诱导长延续的前缀。第二种是目标条件强化学习攻击器(RL-GOAL),它训练一个网络生成以目标长度为条件的前缀。为了刻画攻击行为,我们引入了过度生成因子(OGF),即生成令牌数与模型上下文窗口的比率,以及停滞和延迟的汇总统计。我们的进化攻击者在Phi-3上实现了平均OGF = 1.38 +/- 1.15,Success@OGF >= 2达到24.5%。RL-GOAL更强:在各受害者模型上,它取得了更高的平均OGF(最高达2.81 +/- 1.38)。
摘要:Large language models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This degrades answer quality, inflates latency and cost, and can be weaponized as a denial-of-service (DoS) attack. Recent work has begun to study DoS-style prompt attacks, but typically focuses on a single attack algorithm or assumes white-box access, without an attack-side benchmark that compares prompt-based attackers in a black-box, query-only regime with a known tokenizer. We introduce such a benchmark and study two prompt-only attackers. The first is Evolutionary Over-Generation Prompt Search (EOGen), which searches the token space for prefixes that suppress EOS and induce long continuations. The second is a goal-conditioned reinforcement learning attacker (RL-GOAL) that trains a network to generate prefixes conditioned on a target length. To characterize behavior, we introduce Over-Generation Factor (OGF), the ratio of produced tokens to a model's context window, along with stall and latency summaries. Our evolutionary attacker achieves mean OGF = 1.38 +/- 1.15 and Success@OGF >= 2 of 24.5 percent on Phi-3. RL-GOAL is stronger: across victims it achieves higher mean OGF (up to 2.81 +/- 1.38).
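
摘要所定义的过度生成因子(OGF)与Success@OGF指标可直接写成如下Python草图(4096的上下文窗口仅为假设示例):

```python
def over_generation_factor(num_generated_tokens: int, context_window: int) -> float:
    # OGF = produced tokens / model context window, as defined in the abstract.
    return num_generated_tokens / context_window

def success_at_ogf(ogfs, threshold: float = 2.0) -> float:
    # Fraction of attack attempts whose OGF reaches the threshold.
    return sum(o >= threshold for o in ogfs) / len(ogfs)

ogfs = [over_generation_factor(n, 4096) for n in (9000, 3000, 12000)]
print(ogfs, success_at_ogf(ogfs))
```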


【6】A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation
标题:工程仿真中几何准备和网格生成的人工智能方法综述
链接:https://arxiv.org/abs/2512.23719

作者:Steven Owen,Nathan Brown,Nikos Chrisochoides,Rao Garimella,Xianfeng Gu,Franck Ledoux,Na Lei,Roshan Quadros,Navamita Ray,Nicolas Winovich,Yongjie Jessica Zhang
备注:35 pages, 0 figure, accepted by the International Meshing Roundtable conference 2026
摘要:人工智能开始缓解CAD到网格管道中长期存在的瓶颈。本综述回顾了机器学习辅助零件分类、网格质量预测和去特征化(defeaturing)的最新进展。我们考察了改进非结构化与块结构网格、支持体积参数化并加速并行网格生成的方法,还研究了用于脚本自动化的新兴工具,包括强化学习和大型语言模型。在这些努力中,人工智能作为一种辅助技术,扩展了传统几何和网格工具的能力。本综述重点介绍了代表性方法、实际部署情况,以及将塑造下一代数据驱动网格化工作流程的关键研究挑战。
摘要 :Artificial intelligence is beginning to ease long-standing bottlenecks in the CAD-to-mesh pipeline. This survey reviews recent advances where machine learning aids part classification, mesh quality prediction, and defeaturing. We explore methods that improve unstructured and block-structured meshing, support volumetric parameterizations, and accelerate parallel mesh generation. We also examine emerging tools for scripting automation, including reinforcement learning and large language models. Across these efforts, AI acts as an assistive technology, extending the capabilities of traditional geometry and meshing tools. The survey highlights representative methods, practical deployments, and key research challenges that will shape the next generation of data-driven meshing workflows.


【7】SymSeqBench: a unified framework for the generation and analysis of rule-based symbolic sequences and datasets
标题:SymSeqBench:用于生成和分析基于规则的符号序列和数据集的统一框架
链接:https://arxiv.org/abs/2512.24977

作者:Barna Zajzon,Younes Bouhadjar,Maxime Fabre,Felix Schmidt,Noah Ostendorf,Emre Neftci,Abigail Morrison,Renato Duarte
摘要:序列结构是语言、运动和决策等自然认知与行为多个领域的关键特征。同样,它也是我们希望应用人工智能的任务的核心属性。因此,发展既能以领域无关的方式评估序列学习与处理、又能与形式化的计算与可计算性理论相衔接的框架十分重要。为了满足这一需求,我们引入了两个互补的软件工具:SymSeq,旨在严格生成和分析结构化符号序列;以及SeqBench,一个由基于规则的序列处理任务组成的综合基准套件,用于评估人工学习系统在认知相关领域的性能。两者结合,SymSeqBench能够灵活地在不同知识领域中考察序列结构,包括实验心理语言学、认知心理学、行为分析、神经形态计算和人工智能。由于其以形式语言理论(FLT)为基础,SymSeqBench为多个领域的研究人员提供了一种方便实用的方法,可以运用FLT的概念来构思和标准化他们的实验,从而通过共享的计算框架和形式化体系推进我们对认知和行为的理解。该工具是模块化的,向研究界开放使用。
摘要:Sequential structure is a key feature of multiple domains of natural cognition and behavior, such as language, movement and decision-making. Likewise, it is also a central property of tasks to which we would like to apply artificial intelligence. It is therefore of great importance to develop frameworks that allow us to evaluate sequence learning and processing in a domain agnostic fashion, whilst simultaneously providing a link to formal theories of computation and computability. To address this need, we introduce two complementary software tools: SymSeq, designed to rigorously generate and analyze structured symbolic sequences, and SeqBench, a comprehensive benchmark suite of rule-based sequence processing tasks to evaluate the performance of artificial learning systems in cognitively relevant domains. In combination, SymSeqBench offers versatility in investigating sequential structure across diverse knowledge domains, including experimental psycholinguistics, cognitive psychology, behavioral analysis, neuromorphic computing and artificial intelligence. Due to its basis in Formal Language Theory (FLT), SymSeqBench provides researchers in multiple domains with a convenient and practical way to apply the concepts of FLT to conceptualize and standardize their experiments, thus advancing our understanding of cognition and behavior through shared computational frameworks and formalisms. The tool is modular, openly available and accessible to the research community.


半/弱/无/有监督|不确定性|主动学习(3篇)

【1】Self-Supervised Neural Architecture Search for Multimodal Deep Neural Networks
标题:多模态深度神经网络的自监督神经架构搜索
链接:https://arxiv.org/abs/2512.24793

作者:Shota Suzuki,Satoshi Ono
摘要:神经架构搜索(NAS)自动化深度神经网络(DNN)的架构设计过程,引起了越来越多的关注。需要从多模态中进行特征融合的多模态DNN由于其结构复杂性而受益于NAS;然而,通过NAS构建多模态DNN的架构需要大量的标记训练数据。因此,本文提出了一种自监督学习(SSL)方法的多模态DNN的架构搜索。该方法将SSL综合应用于体系结构搜索和模型预训练过程。实验结果表明,该方法成功地从未标记的训练数据设计DNN的架构。
摘要:Neural architecture search (NAS), which automates the architectural design process of deep neural networks (DNN), has attracted increasing attention. Multimodal DNNs that necessitate feature fusion from multiple modalities benefit from NAS due to their structural complexity; however, constructing an architecture for multimodal DNNs through NAS requires a substantial amount of labeled training data. Thus, this paper proposes a self-supervised learning (SSL) method for architecture search of multimodal DNNs. The proposed method applies SSL comprehensively for both the architecture search and model pretraining processes. Experimental results demonstrated that the proposed method successfully designed architectures for DNNs from unlabeled training data.


【2】Micro-Macro Tensor Neural Surrogates for Uncertainty Quantification in Collisional Plasma
标题:碰撞等离子体中不确定性量化的微-宏张量神经替代
链接:https://arxiv.org/abs/2512.24205

作者:Wei Chen,Giacomo Dimarco,Lorenzo Pareschi
摘要:等离子体动力学方程对模型参数和数据的微观扰动表现出显著的敏感性,这使得可靠而高效的不确定性量化(UQ)成为预测性模拟的必要条件。然而,不确定性采样的成本、高维相空间以及多尺度刚性,对传统数值方法的计算效率和误差控制都提出了严峻挑战。在存在碰撞时这些问题更加突出,高维非局部碰撞积分和守恒性质构成了严重约束。为克服这一点,我们针对Vlasov-Poisson-Landau(VPL)系统的UQ提出了一个方差缩减蒙特卡洛框架,其中神经网络代理模型取代了对Landau碰撞项的多次昂贵求值。该方法将高保真、渐近保持的VPL求解器与基于Vlasov-Poisson-Fokker-Planck(VPFP)和Euler-Poisson(EP)方程的廉价且强相关的代理模型相耦合。对于代理模型,我们引入了可分离物理信息神经网络(SPINN)的推广,开发了一类基于各向异性微观-宏观分解的张量神经网络,以降低速度矩计算成本、模型复杂性和维数灾难。为进一步提高与VPL的相关性,我们校准了VPFP模型,并设计了一个渐近保持的SPINN,其在小、大克努森数极限下分别恢复EP和VP系统。数值实验表明,相较标准蒙特卡洛方法方差大幅降低,用更少的高保真样本即可得到准确的统计量,挂钟时间也更短,同时对随机维数保持鲁棒。
摘要:Plasma kinetic equations exhibit pronounced sensitivity to microscopic perturbations in model parameters and data, making reliable and efficient uncertainty quantification (UQ) essential for predictive simulations. However, the cost of uncertainty sampling, the high-dimensional phase space, and multiscale stiffness pose severe challenges to both computational efficiency and error control in traditional numerical methods. These aspects are further emphasized in presence of collisions where the high-dimensional nonlocal collision integrations and conservation properties pose severe constraints. To overcome this, we present a variance-reduced Monte Carlo framework for UQ in the Vlasov--Poisson--Landau (VPL) system, in which neural network surrogates replace the multiple costly evaluations of the Landau collision term. The method couples a high-fidelity, asymptotic-preserving VPL solver with inexpensive, strongly correlated surrogates based on the Vlasov--Poisson--Fokker--Planck (VPFP) and Euler--Poisson (EP) equations. For the surrogate models, we introduce a generalization of the separable physics-informed neural network (SPINN), developing a class of tensor neural networks based on an anisotropic micro-macro decomposition, to reduce velocity-moment costs, model complexity, and the curse of dimensionality. To further increase correlation with VPL, we calibrate the VPFP model and design an asymptotic-preserving SPINN whose small- and large-Knudsen limits recover the EP and VP systems, respectively. Numerical experiments show substantial variance reduction over standard Monte Carlo, accurate statistics with far fewer high-fidelity samples, and lower wall-clock time, while maintaining robustness to stochastic dimension.


【3】Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
标题:奇妙的推理行为以及在哪里可以找到它们:推理过程的无监督发现
链接:https://arxiv.org/abs/2512.23988

作者:Zhenyu Zhang,Shujian Zhang,John Lambert,Wenxuan Zhou,Zhangyang Wang,Mingqing Chen,Andrew Hard,Rajiv Mathews,Lun Wang
摘要:尽管最近的大型语言模型(LLM)推理能力越来越强,但其在推理过程中的内部机制仍未得到充分探索。现有方法通常依赖人类在词层面定义的概念(例如过度思考、反思),以监督的方式分析推理。然而,这类方法是有限的,因为不可能在令牌空间中定义并捕捉潜在推理行为的全部谱系。在这项工作中,我们提出了一个无监督框架(即RISE:通过稀疏自编码器的推理行为可解释性)来发现推理向量,我们将其定义为激活空间中编码不同推理行为的方向。通过将思维链轨迹分割成句子级'步骤'并在步骤级激活上训练稀疏自编码器(SAE),我们发现了与反思和回溯等可解释行为相对应的解纠缠特征。可视化和聚类分析表明,这些行为在解码器列空间中占据可分离的区域。此外,对SAE导出向量的针对性干预可以可控地放大或抑制特定推理行为,在无需重新训练的情况下改变推理轨迹。除了特定行为的解耦,SAE还捕获诸如响应长度之类的结构特性,揭示了长推理轨迹与短推理轨迹的聚类。更有趣的是,SAE能够发现人类监督之外的新行为。我们通过在SAE解码器空间中识别与置信度相关的向量,展示了控制回答置信度的能力。这些发现凸显了无监督潜在发现在解释和可控引导LLM推理方面的潜力。
摘要:Despite the growing reasoning capabilities of recent large language models (LLMs), their internal mechanisms during the reasoning process remain underexplored. Prior approaches often rely on human-defined concepts (e.g., overthinking, reflection) at the word level to analyze reasoning in a supervised manner. However, such methods are limited, as it is infeasible to capture the full spectrum of potential reasoning behaviors, many of which are difficult to define in token space. In this work, we propose an unsupervised framework (namely, RISE: Reasoning behavior Interpretability via Sparse auto-Encoder) for discovering reasoning vectors, which we define as directions in the activation space that encode distinct reasoning behaviors. By segmenting chain-of-thought traces into sentence-level 'steps' and training sparse auto-encoders (SAEs) on step-level activations, we uncover disentangled features corresponding to interpretable behaviors such as reflection and backtracking. Visualization and clustering analyses show that these behaviors occupy separable regions in the decoder column space. Moreover, targeted interventions on SAE-derived vectors can controllably amplify or suppress specific reasoning behaviors, altering inference trajectories without retraining. Beyond behavior-specific disentanglement, SAEs capture structural properties such as response length, revealing clusters of long versus short reasoning traces. More interestingly, SAEs enable the discovery of novel behaviors beyond human supervision. We demonstrate the ability to control response confidence by identifying confidence-related vectors in the SAE decoder space. These findings underscore the potential of unsupervised latent discovery for both interpreting and controllably steering reasoning in LLMs.
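
在步骤级激活上训练稀疏自编码器(SAE)的基本形式可以用下面的PyTorch草图说明:过完备字典、ReLU编码加L1稀疏惩罚。隐层宽度与惩罚系数均为示意性假设,并非论文的超参数。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoEncoder(nn.Module):
    # Overcomplete dictionary with ReLU codes, trained to reconstruct
    # step-level activations under an L1 sparsity penalty.
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, x):
        z = F.relu(self.enc(x))  # sparse feature activations
        return self.dec(z), z

def sae_loss(model, acts, l1_coef=1e-3):
    recon, z = model(acts)
    return F.mse_loss(recon, acts) + l1_coef * z.abs().mean()

sae = SparseAutoEncoder(d_model=64, n_features=512)
acts = torch.randn(32, 64)        # stand-in for step-level activations
print(sae_loss(sae, acts).item())
```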


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】Characterization of Transfer Using Multi-task Learning Curves
标题:使用多任务学习曲线描述迁移
链接:https://arxiv.org/abs/2512.24866

作者:András Millinghoffer,Bence Bolgár,Péter Antal
摘要:迁移效应既体现在使用固定数据集的训练过程中,也体现在使用累积数据的归纳推断中。我们假设,通过纳入更多样本来扰动数据集,而非通过梯度更新来扰动模型,可以为迁移效应提供一种互补且更为本质的刻画。为了捕捉这一现象,我们使用多任务学习曲线对迁移效应进行定量建模,该曲线近似不同样本量下的归纳性能。我们描述了一种近似多任务学习曲线的有效方法,与训练过程中应用的任务亲和分组(Task Affinity Grouping)方法类似。我们比较了迁移的统计方法与计算方法,结果表明前者的计算成本高得多,但统计功效更强、适用范围更广。评估使用一个基准药物-靶标相互作用数据集进行。我们的结果表明,学习曲线能够更好地刻画多任务学习的效应,而其多任务扩展可以刻画基础模型中的成对迁移与上下文迁移效应。
摘要:Transfer effects manifest themselves both during training using a fixed data set and in inductive inference using accumulating data. We hypothesize that perturbing the data set by including more samples, instead of perturbing the model by gradient updates, provides a complementary and more fundamental characterization of transfer effects. To capture this phenomenon, we quantitatively model transfer effects using multi-task learning curves approximating the inductive performance over varying sample sizes. We describe an efficient method to approximate multi-task learning curves analogous to the Task Affinity Grouping method applied during training. We compare the statistical and computational approaches to transfer, which indicates considerably higher compute costs for the former but better power and broader applicability. Evaluations are performed using a benchmark drug-target interaction data set. Our results show that learning curves can better capture the effects of multi-task learning and their multi-task extensions can delineate pairwise and contextual transfer effects in foundation models.
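
用学习曲线近似不同样本量下的归纳性能,常见做法是拟合幂律族 err(n) = a*n^(-b) + c。下面给出一个scipy示意:数据点为假设值,论文是否采用这一曲线族亦属假设。

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # A common learning-curve family: error decays as a power of sample size.
    return a * np.power(n, -b) + c

# Hypothetical (sample size, validation error) pairs for one task.
n = np.array([100, 300, 1000, 3000, 10000], dtype=float)
err = np.array([0.42, 0.31, 0.22, 0.17, 0.14])

params, _ = curve_fit(power_law, n, err, p0=(1.0, 0.5, 0.1), maxfev=10000)
print(params)  # fitted (a, b, c); comparing curves fitted on different
               # task mixes is one way to read off transfer effects
```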


【2】Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
标题:动态大概念模型:自适应语义空间中的潜在推理
链接:https://arxiv.org/abs/2512.24617

作者:Xingwei Qu,Shaowen Wang,Zihao Huang,Kai Hua,Fan Yin,Rui-Jie Zhu,Jundong Zhou,Qiyang Min,Zihao Wang,Yizhi Li,Tianyu Zhang,He Xing,Zheng Zhang,Yuxuan Song,Tianyu Zheng,Zhiyuan Zeng,Chenghua Lin,Ge Zhang,Wenhao Huang
摘要:尽管语言的信息密度高度不均匀,大型语言模型(LLM)仍对所有标记施加统一的计算。这种标记统一的机制在局部可预测的跨度上浪费容量,同时对语义关键的转换分配不足的计算。我们提出了$\textbf{动态大概念模型(DLCM)}$,一个层次化的语言建模框架,它从潜在表示中学习语义边界,并将计算从标记转移到推理更高效的压缩概念空间。DLCM端到端地发现可变长度的概念,而不依赖预定义的语言单位。分层压缩从根本上改变了缩放行为。我们引入了第一个$\textbf{压缩感知缩放律}$,它将令牌级容量、概念级推理能力和压缩比解耦,从而在固定FLOP下实现有原则的计算分配。为了稳定地训练这种异构体系结构,我们进一步开发了$\textbf{解耦$μ$P参数化}$,支持跨宽度和压缩机制的zero-shot超参数迁移。在一个实际设置下($R=4$,对应每个概念平均四个令牌),DLCM将大约三分之一的推理计算重新分配给容量更高的推理骨干,在匹配的推理FLOP下于12个zero-shot基准上实现了$\textbf{+2.69$\%$平均改善}$。
摘要:Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose $\textbf{Dynamic Large Concept Models (DLCM)}$, a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first $\textbf{compression-aware scaling law}$, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a $\textbf{decoupled $μ$P parametrization}$ that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting ($R=4$, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a $\textbf{+2.69$\%$ average improvement}$ across 12 zero-shot benchmarks under matched inference FLOPs.


【3】Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics
标题:偏置-噪音-对齐诊断引导的自适应学习
链接:https://arxiv.org/abs/2512.24445

作者:Akash Samanta,Sheldon Williamson
备注:This preprint focuses on the theoretical framework and diagnostic behavior. Comprehensive experimental validation in application-specific settings is deferred to a companion experimental study
摘要:部署在非平稳和安全关键环境中的学习系统,往往遭受不稳定,收敛速度慢,或脆弱的适应时,学习动态随着时间的推移而演变。虽然现代优化、强化学习和元学习方法适用于梯度统计,但它们在很大程度上忽略了误差信号本身的时间结构。本文提出了一种诊断驱动的自适应学习框架,明确建模误差演化通过原则性分解为偏差,捕捉持续漂移;噪声,捕捉随机变化;和对齐,捕捉重复的方向性激励导致过冲。这些诊断是从损失或时间差误差轨迹的轻量级统计数据在线计算的,并且独立于模型架构或任务域。我们表明,所提出的偏差-噪声-对齐分解为监督优化,演员-评论家强化学习和学习优化器提供了统一的控制骨干。在此框架的基础上,我们推导出诊断驱动的实例,包括一个稳定的监督优化,诊断调节演员批评计划,和诊断条件学习优化。在标准光滑性假设下,我们建立了所有情况下的有界有效更新和稳定性。行动者-批评者学习中的代表性诊断例证突出了所提出的信号如何调制适应以响应时间差误差结构。总的来说,这项工作将错误演化提升为自适应学习中的一流对象,并为动态环境中的可靠学习提供了可解释的轻量级基础。
摘要:Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. While modern optimization, reinforcement learning, and meta-learning methods adapt to gradient statistics, they largely ignore the temporal structure of the error signal itself. This paper proposes a diagnostic-driven adaptive learning framework that explicitly models error evolution through a principled decomposition into bias, capturing persistent drift; noise, capturing stochastic variability; and alignment, capturing repeated directional excitation leading to overshoot. These diagnostics are computed online from lightweight statistics of loss or temporal-difference error trajectories and are independent of model architecture or task domain. We show that the proposed bias-noise-alignment decomposition provides a unifying control backbone for supervised optimization, actor-critic reinforcement learning, and learned optimizers. Building on this framework, we derive diagnostic-driven instantiations including a stabilized supervised optimizer, a diagnostic-regulated actor-critic scheme, and a diagnostic-conditioned learned optimizer. Under standard smoothness assumptions, we establish bounded effective updates and stability properties for all cases. Representative diagnostic illustrations in actor-critic learning highlight how the proposed signals modulate adaptation in response to temporal-difference error structure. Overall, this work elevates error evolution to a first-class object in adaptive learning and provides an interpretable, lightweight foundation for reliable learning in dynamic environments.
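
从误差轨迹在线计算偏差/噪声/对齐三类诊断量的思路,可以用如下numpy草图体会:以滑动窗口内的线性趋势斜率近似"偏差"、残差标准差近似"噪声"、相邻增量方向的一致率近似"对齐"。这些具体统计量是本示意的合理化猜测,并非论文的精确定义。

```python
import numpy as np

def bna_diagnostics(errors: np.ndarray, window: int = 50):
    e = errors[-window:]
    t = np.arange(len(e))
    slope, intercept = np.polyfit(t, e, 1)
    resid = e - (slope * t + intercept)
    bias = slope                 # persistent drift of the error
    noise = resid.std()          # stochastic variability around the trend
    d = np.diff(e)
    # Repeated directional excitation: how often consecutive error
    # increments point the same way (a proxy for overshoot-inducing alignment).
    alignment = np.mean(np.sign(d[1:]) == np.sign(d[:-1]))
    return bias, noise, alignment

trajectory = np.cumsum(np.random.randn(200)) * 0.01 + 1.0  # toy loss trace
print(bna_diagnostics(trajectory))
```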


【4】MeLeMaD: Adaptive Malware Detection via Chunk-wise Feature Selection and Meta-Learning
标题:MeLeMaD:通过分块特征选择和元学习的自适应恶意软件检测
链接:https://arxiv.org/abs/2512.23987

作者:Ajvad Haneef K,Karan Kuwar Singh,Madhu Kumar S D
备注:20 pages, 8 Figures
摘要:要应对网络安全中恶意软件检测的重大挑战,必须有既强大又能适应不断变化的威胁环境的解决方案。本文介绍了Meta Learning Malware Detection(MeLeMaD),这是一种利用模型不可知元学习(MAML)的适应性和泛化能力进行恶意软件检测的新框架。MeLeMaD采用了一种新的特征选择技术,即基于梯度提升的分块特征选择(CFSGB),专为处理大规模、高维恶意软件数据集而量身定制,显著提高了检测效率。使用两个基准恶意软件数据集(CIC-AndMal2020和BODMAS)和一个自定义数据集(EMBOD)对MeLeMaD进行了严格验证,其在关键评估指标(包括准确度、精确度、召回率、F1分数、MCC和AUC)方面取得了卓越的性能。MeLeMaD在CIC-AndMal2020上的准确率为98.04%,在BODMAS上的准确率为99.97%,优于最先进的方法。在自定义数据集EMBOD上也达到了值得称赞的97.85%的准确率。这些结果强调了MeLeMaD在解决恶意软件检测中的鲁棒性、适应性和大规模高维数据集挑战方面的潜力,为更有效、更高效的网络安全解决方案铺平了道路。
摘要:Confronting the substantial challenges of malware detection in cybersecurity necessitates solutions that are both robust and adaptable to the ever-evolving threat environment. The paper introduces Meta Learning Malware Detection (MeLeMaD), a novel framework leveraging the adaptability and generalization capabilities of Model-Agnostic Meta-Learning (MAML) for malware detection. MeLeMaD incorporates a novel feature selection technique, Chunk-wise Feature Selection based on Gradient Boosting (CFSGB), tailored for handling large-scale, high-dimensional malware datasets, significantly enhancing the detection efficiency. Two benchmark malware datasets (CIC-AndMal2020 and BODMAS) and a custom dataset (EMBOD) were used for rigorously validating the MeLeMaD, achieving a remarkable performance in terms of key evaluation measures, including accuracy, precision, recall, F1-score, MCC, and AUC. With accuracies of 98.04\% on CIC-AndMal2020 and 99.97\% on BODMAS, MeLeMaD outperforms the state-of-the-art approaches. The custom dataset, EMBOD, also achieves a commendable accuracy of 97.85\%. The results underscore the MeLeMaD's potential to address the challenges of robustness, adaptability, and large-scale, high-dimensional datasets in malware detection, paving the way for more effective and efficient cybersecurity solutions.
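
"基于梯度提升的分块特征选择(CFSGB)"的思路可以用如下sklearn草图示意:将特征维度切成若干块,每块训练一个小型梯度提升模型,按特征重要性保留各块的前k个特征。分块数与top-k均为示意性假设,论文的具体流程可能不同。

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def chunkwise_feature_selection(X, y, n_chunks=5, top_k_per_chunk=3):
    selected = []
    for chunk in np.array_split(np.arange(X.shape[1]), n_chunks):
        gb = GradientBoostingClassifier(n_estimators=50, max_depth=3)
        gb.fit(X[:, chunk], y)                       # booster on this chunk only
        order = np.argsort(gb.feature_importances_)[::-1]
        selected.extend(chunk[order[:top_k_per_chunk]])  # keep top-k per chunk
    return np.sort(np.array(selected))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = (X[:, 3] + X[:, 42] > 0).astype(int)             # toy labels
print(chunkwise_feature_selection(X, y))
```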


【5】Lifelong Domain Adaptive 3D Human Pose Estimation
标题:终身领域自适应3D人体姿势估计
链接:https://arxiv.org/abs/2512.23860

作者:Qucheng Peng,Hongfei Xue,Pu Wang,Chen Chen
备注:Accepted by AAAI 2026
摘要:3D人体姿态估计(3D HPE)在从行人重识别、动作识别到虚拟现实的各种应用中至关重要。然而,对受控环境中收集的带注释3D数据的依赖,给向各种野外场景的泛化带来了挑战。现有的领域自适应(DA)范式,如通用DA和无源DA的3D HPE,忽略了非平稳目标姿态数据集的问题。为了解决这些挑战,我们提出了一个名为终身域自适应3D HPE的新任务。据我们所知,我们是第一个将终身领域自适应引入3D HPE任务的工作。在这种终身DA设置中,姿态估计器在源域上进行预训练,随后适应不同的目标域。此外,在适应当前目标域期间,姿态估计器无法访问源域和所有先前的目标域。3D HPE的终身DA既要克服适应当前领域姿态的挑战,又要保留来自以前领域的知识,特别是对抗灾难性遗忘。我们提出了一个创新的生成对抗网络(GAN)框架,它包含3D姿态生成器、2D姿态判别器和3D姿态估计器。该框架有效缓解了域偏移,并对齐原始姿态和增强姿态。此外,我们构建了一种新的3D姿态生成器范式,集成姿态感知、时间感知和域感知的知识,以改进对当前域的适应并减轻对先前域的灾难性遗忘。我们的方法通过在不同领域自适应3D HPE数据集上的大量实验展现了卓越性能。
摘要:3D Human Pose Estimation (3D HPE) is vital in various applications, from person re-identification and action recognition to virtual reality. However, the reliance on annotated 3D data collected in controlled environments poses challenges for generalization to diverse in-the-wild scenarios. Existing domain adaptation (DA) paradigms like general DA and source-free DA for 3D HPE overlook the issues of non-stationary target pose datasets. To address these challenges, we propose a novel task named lifelong domain adaptive 3D HPE. To our knowledge, we are the first to introduce the lifelong domain adaptation to the 3D HPE task. In this lifelong DA setting, the pose estimator is pretrained on the source domain and subsequently adapted to distinct target domains. Moreover, during adaptation to the current target domain, the pose estimator cannot access the source and all the previous target domains. The lifelong DA for 3D HPE involves overcoming challenges in adapting to current domain poses and preserving knowledge from previous domains, particularly combating catastrophic forgetting. We present an innovative Generative Adversarial Network (GAN) framework, which incorporates 3D pose generators, a 2D pose discriminator, and a 3D pose estimator. This framework effectively mitigates domain shifts and aligns original and augmented poses. Moreover, we construct a novel 3D pose generator paradigm, integrating pose-aware, temporal-aware, and domain-aware knowledge to enhance the current domain's adaptation and alleviate catastrophic forgetting on previous domains. Our method demonstrates superior performance through extensive experiments on diverse domain adaptive 3D HPE datasets.


【6】Improving the stability of the covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling
标题:提高协方差控制自适应Langevin恒温器在大规模贝叶斯抽样中的稳定性
链接:https://arxiv.org/abs/2512.24515

作者:Jiani Wei,Xiaocheng Shang
摘要:随机梯度朗之万动力学及其变体在贝叶斯抽样的设定下,通过随机的(通常小得多的)子集来近似整个数据集的似然。由于(通常是实质性的)计算效率提升,它们已被广泛用于大规模机器学习应用。已有研究表明,所谓的协方差控制自适应Langevin(CCAdL)恒温器(其包含一个涉及噪声力协方差矩阵的附加项)优于流行的替代方法。CCAdL使用移动平均来估计噪声力的协方差矩阵,在这种情况下,协方差矩阵将在长时间极限下收敛到常数矩阵。此外,我们的数值实验表明,使用移动平均可能降低数值积分器的稳定性,从而限制最大可用步长。在本文中,我们提出了一种改进的CCAdL(即mCCAdL)恒温器,它使用缩放-平方法中的缩放部分以及对指数的截断泰勒级数近似,在数值上逼近涉及CCAdL中所提附加项的子系统的精确解。我们还为mCCAdL提出了一种对称分裂方法,以取代原CCAdL恒温器中使用的欧拉型离散化。数值实验表明,新提出的mCCAdL恒温器相比原CCAdL恒温器在数值稳定性上有了实质性提升,同时在大规模机器学习应用的数值精度方面显著优于流行的替代随机梯度方法。
摘要:Stochastic gradient Langevin dynamics and its variants approximate the likelihood of an entire dataset, via random (and typically much smaller) subsets, in the setting of Bayesian sampling. Due to the (often substantial) improvement of the computational efficiency, they have been widely used in large-scale machine learning applications. It has been demonstrated that the so-called covariance-controlled adaptive Langevin (CCAdL) thermostat, which incorporates an additional term involving the covariance matrix of the noisy force, outperforms popular alternative methods. A moving average is used in CCAdL to estimate the covariance matrix of the noisy force, in which case the covariance matrix will converge to a constant matrix in long-time limit. Moreover, it appears in our numerical experiments that the use of a moving average could reduce the stability of the numerical integrators, thereby limiting the largest usable stepsize. In this article, we propose a modified CCAdL (i.e., mCCAdL) thermostat that uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the exponential to numerically approximate the exact solution to the subsystem involving the additional term proposed in CCAdL. We also propose a symmetric splitting method for mCCAdL, instead of an Euler-type discretisation used in the original CCAdL thermostat. We demonstrate in our numerical experiments that the newly proposed mCCAdL thermostat achieves a substantial improvement in the numerical stability over the original CCAdL thermostat, while significantly outperforming popular alternative stochastic gradient methods in terms of the numerical accuracy for large-scale machine learning applications.
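
摘要所述"缩放-平方法的缩放部分加截断泰勒级数"逼近矩阵指数的做法,可用下面的numpy草图体会。缩放规则、泰勒阶数与平方回代均为通用做法,未必与论文的具体构造一致。

```python
import numpy as np

def expm_taylor_scaling_squaring(A: np.ndarray, order: int = 6) -> np.ndarray:
    # Scale A down so the truncated Taylor series is accurate...
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1e-16)))) + 1)
    B = A / (2.0 ** s)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):       # truncated Taylor series of exp(B)
        term = term @ B / k
        E = E + term
    for _ in range(s):                  # ...then undo the scaling by squaring
        E = E @ E
    return E

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(expm_taylor_scaling_squaring(A))  # approximately a rotation by 1 radian
```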


强化学习(9篇)

【1】Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation
标题:用于移动机器人导航的混合运动规划与深度强化学习
链接:https://arxiv.org/abs/2512.24651

作者:Yury Kolomeytsev,Dmitry Golembiovsky
备注:22 pages, 4 figures
摘要:自主移动机器人在复杂的动态环境中运行面临着双重挑战,即在具有静态障碍物的大规模、结构多样的空间中导航,同时与各种移动代理安全地交互。传统的基于图的规划器擅长长距离寻路,但缺乏反应能力,而深度强化学习(DRL)方法表现出强大的碰撞避免能力,但由于缺乏全局背景,往往无法达到遥远的目标。我们提出了混合运动规划与深度强化学习(HMP-DRL),这是一个弥合这一差距的混合框架。我们的方法利用基于图的全局规划器来生成路径,该路径通过在状态空间和奖励函数中编码的检查点序列被集成到本地DRL策略中。为了确保社会合规性,本地规划者采用实体感知的奖励结构,该结构基于周围代理的语义类型动态地调整安全裕度和惩罚。我们验证了所提出的方法,通过广泛的测试,在一个现实的模拟环境中来自真实世界的地图数据。综合实验表明,HMP-DRL在机器人导航的关键指标方面始终优于其他方法,包括最先进的方法:成功率,碰撞率和到达目标的时间。总的来说,这些研究结果证实,将长期路径引导与语义感知的本地控制相结合,可以显着提高以人为中心的复杂环境中自主导航的安全性和可靠性。
摘要:Autonomous mobile robots operating in complex, dynamic environments face the dual challenge of navigating large-scale, structurally diverse spaces with static obstacles while safely interacting with various moving agents. Traditional graph-based planners excel at long-range pathfinding but lack reactivity, while Deep Reinforcement Learning (DRL) methods demonstrate strong collision avoidance but often fail to reach distant goals due to a lack of global context. We propose Hybrid Motion Planning with Deep Reinforcement Learning (HMP-DRL), a hybrid framework that bridges this gap. Our approach utilizes a graph-based global planner to generate a path, which is integrated into a local DRL policy via a sequence of checkpoints encoded in both the state space and reward function. To ensure social compliance, the local planner employs an entity-aware reward structure that dynamically adjusts safety margins and penalties based on the semantic type of surrounding agents. We validate the proposed method through extensive testing in a realistic simulation environment derived from real-world map data. Comprehensive experiments demonstrate that HMP-DRL consistently outperforms other methods, including state-of-the-art approaches, in terms of key metrics of robot navigation: success rate, collision rate, and time to reach the goal. Overall, these findings confirm that integrating long-term path guidance with semantically-aware local control significantly enhances both the safety and reliability of autonomous navigation in complex human-centric settings.


【2】Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models
标题:反向强化学习和动态离散选择模型的有效推理
链接:https://arxiv.org/abs/2512.24407

作者:Lars van der Laan,Aurelien Bibaut,Nathan Kallus
摘要:反向强化学习(IRL)和动态离散选择(DDC)模型通过恢复使观察到的行为合理化的奖励函数来解释顺序决策。灵活的IRL方法通常依赖机器学习,但无法保证推断的有效性;而经典的DDC方法则施加限制性的参数化设定,并且通常需要反复的动态规划。我们为去偏逆强化学习开发了一个半参数框架,可对最大熵IRL和Gumbel冲击DDC模型中一大类依赖奖励的泛函给出统计有效的推断。我们表明,行为策略的对数可作为伪奖励,点识别策略价值差异,并在简单的归一化下点识别奖励本身。然后,我们将这些目标(包括已知与反事实softmax策略下的策略价值以及归一化奖励的泛函)形式化为行为策略和转移核的光滑泛函,建立路径可微性,并推导其有效影响函数。基于这一刻画,我们构建了自动去偏机器学习估计量,允许对冗余参数进行灵活的非参数估计,同时实现$\sqrt{n}$-一致性、渐近正态性和半参数有效性。我们的框架将DDC模型的经典推断扩展至非参数奖励和现代机器学习工具,为IRL中的统计推断提供了一个统一且计算可行的方法。
摘要:Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behavior. Flexible IRL methods typically rely on machine learning but provide no guarantees for valid inference, while classical DDC approaches impose restrictive parametric specifications and often require repeated dynamic programming. We develop a semiparametric framework for debiased inverse reinforcement learning that yields statistically efficient inference for a broad class of reward-dependent functionals in maximum entropy IRL and Gumbel-shock DDC models. We show that the log-behavior policy acts as a pseudo-reward that point-identifies policy value differences and, under a simple normalization, the reward itself. We then formalize these targets, including policy values under known and counterfactual softmax policies and functionals of the normalized reward, as smooth functionals of the behavior policy and transition kernel, establish pathwise differentiability, and derive their efficient influence functions. Building on this characterization, we construct automatic debiased machine-learning estimators that allow flexible nonparametric estimation of nuisance components while achieving $\sqrt{n}$-consistency, asymptotic normality, and semiparametric efficiency. Our framework extends classical inference for DDC models to nonparametric rewards and modern machine-learning tools, providing a unified and computationally tractable approach to statistical inference in IRL.


【3】MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems
标题:MaRCA:用于大规模推荐系统中动态计算分配的多代理强化学习
链接:https://arxiv.org/abs/2512.24325

作者:Wan Jiang,Xinyi Zang,Yudong Zhao,Yusi Zou,Yunfei Lu,Junbo Tong,Yang Liu,Ming Li,Jiani Shi,Xin Yang
备注:12 pages, 5 figures
摘要:由于模型复杂性和流量规模不断增长,现代推荐系统面临巨大的计算挑战,高效的计算分配对商业收入最大化至关重要。现有方法通常简化多阶段计算资源分配,忽略了阶段间的依赖关系,从而限制了全局最优性。在本文中,我们提出了MaRCA,一个用于大规模推荐系统中端到端计算资源分配的多智能体强化学习框架。MaRCA将推荐系统的各个阶段建模为合作智能体,使用集中式训练与分散式执行(CTDE)来在计算资源约束下优化收入。我们引入了用于精确计算成本估计的AutoBucket TestBench,以及基于模型预测控制(MPC)的收入-成本平衡器,以主动预测流量负载并相应调整收入-成本权衡。自2024年11月在一家领先的全球电子商务平台的广告管道中端到端部署以来,MaRCA每天持续处理数千亿次广告请求,并利用现有计算资源实现了16.67%的收入增长。
摘要:Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.


【4】Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem
标题:深度强化学习解决车队规模和混合车辆路径问题
链接:https://arxiv.org/abs/2512.24251

作者:Pengfu Wan,Jiawei Chen,Gangyan Xu
摘要:车队规模和混合车辆路径问题(FSMVRP)是车辆路径问题(VRP)的一个重要变体,在运筹学和计算科学中得到了广泛研究。FSMVRP需要同时决定车队组成和路线,使其高度适用于短期车辆租赁和按需物流等现实场景。然而,这些要求也增加了FSMVRP的复杂性,带来了重大挑战,特别是在大规模和时间受限的环境中。在本文中,我们提出了一种基于深度强化学习(DRL)的FSMVRP求解方法,能够在几秒钟内生成接近最优的解。具体来说,我们将该问题表述为马尔可夫决策过程(MDP),并开发了一种新的策略网络FRIPN,无缝集成车队组成和路径决策。我们的方法为不同的决策目标设计了专门的输入嵌入,包括用于促进有效车辆使用决策的剩余图嵌入。我们在随机生成的实例和基准数据集上进行了全面的实验。实验结果表明,我们的方法在计算效率和可扩展性方面表现出显著优势,特别是在大规模和时间受限的场景中。这些优势突显了我们的方法在实际应用中的潜力,并为将基于DRL的技术扩展到VRP的其他变体提供了宝贵的启发。
摘要:The Fleet Size and Mix Vehicle Routing Problem (FSMVRP) is a prominent variant of the Vehicle Routing Problem (VRP), extensively studied in operations research and computational science. FSMVRP requires simultaneous decisions on fleet composition and routing, making it highly applicable to real-world scenarios such as short-term vehicle rental and on-demand logistics. However, these requirements also increase the complexity of FSMVRP, posing significant challenges, particularly in large-scale and time-constrained environments. In this paper, we propose a deep reinforcement learning (DRL)-based approach for solving FSMVRP, capable of generating near-optimal solutions within a few seconds. Specifically, we formulate the problem as a Markov Decision Process (MDP) and develop a novel policy network, termed FRIPN, that seamlessly integrates fleet composition and routing decisions. Our method incorporates specialized input embeddings designed for distinct decision objectives, including a remaining graph embedding to facilitate effective vehicle employment decisions. Comprehensive experiments are conducted on both randomly generated instances and benchmark datasets. The experimental results demonstrate that our method exhibits notable advantages in terms of computational efficiency and scalability, particularly in large-scale and time-constrained scenarios. These strengths highlight the potential of our approach for practical applications and provide valuable inspiration for extending DRL-based techniques to other variants of VRP.


【5】Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR
标题:带流匹配的最大-熵强化学习和LQR案例研究
链接:https://arxiv.org/abs/2512.23870

作者:Yuyang Zhang,Yang Hu,Bo Dai,Na Li
摘要:软演员-评论家(SAC)是一种流行的最大熵强化学习算法。在实践中,为了提高效率,SAC中基于能量的策略通常用简单的策略类来近似,从而牺牲了表达能力和鲁棒性。在本文中,我们提出了SAC算法的一个变体,用基于流的模型参数化策略,以利用其丰富的表达能力。在该算法中,我们利用瞬时变量变换技术来评估基于流的策略,并用本文提出的流匹配在线变体来更新策略。这一在线变体称为重要性采样流匹配(ISFM),仅需来自用户指定的采样分布而非未知目标分布的样本即可完成策略更新。我们对ISFM进行了理论分析,刻画了不同采样分布的选择如何影响学习效率。最后,我们在最大熵线性二次调节器问题上进行了案例研究,证明了该算法能够学习最优动作分布。
摘要:Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for efficiency, sacrificing the expressiveness and robustness. In this paper, we propose a variant of the SAC algorithm that parameterizes the policy with flow-based models, leveraging their rich expressiveness. In the algorithm, we evaluate the flow-based policy utilizing the instantaneous change-of-variable technique and update the policy with an online variant of flow matching developed in this paper. This online variant, termed importance sampling flow matching (ISFM), enables policy update with only samples from a user-specified sampling distribution rather than the unknown target distribution. We develop a theoretical analysis of ISFM, characterizing how different choices of sampling distributions affect the learning efficiency. Finally, we conduct a case study of our algorithm on the max-entropy linear quadratic regulator problems, demonstrating that the proposed algorithm learns the optimal action distribution.
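
重要性采样流匹配(ISFM)的目标大致可理解为:在常规条件流匹配的回归损失上,按目标分布与采样分布之比做逐样本重加权。以下PyTorch草图是对这一思路的示意性解读,线性插值路径与自归一化权重等均为假设,并非论文的精确形式。

```python
import torch

def isfm_loss(v_theta, x0, x1, log_w):
    # Conditional flow-matching regression, reweighted toward a target
    # distribution; log_w holds log importance ratios up to a constant.
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1            # linear interpolation path
    target = x1 - x0                      # its conditional velocity
    per_sample = ((v_theta(xt, t) - target) ** 2).sum(dim=-1)
    w = torch.softmax(log_w, dim=0)       # self-normalized importance weights
    return (w * per_sample).sum()

v = torch.nn.Linear(3, 3)                 # stand-in velocity field (ignores t)
x0, x1 = torch.randn(8, 3), torch.randn(8, 3)
print(isfm_loss(lambda x, t: v(x), x0, x1, log_w=torch.zeros(8)).item())
```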


【6】FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading
标题:FineFT:用于期货交易的高效且具有风险意识的集成强化学习
链接:https://arxiv.org/abs/2512.23773

作者:Molei Qin,Xinyu Cai,Yewen Li,Haochong Xia,Chuqiao Zong,Shuo Sun,Xinrun Wang,Bo An
摘要:期货是一种要求在预定日期以预定价格交换资产的合约,以其高杠杆和高流动性著称,因此在加密市场中蓬勃发展。RL已广泛应用于各种量化任务。然而,大多数方法都集中在现货上,由于两个挑战而无法直接应用于高杠杆的期货市场。首先,高杠杆放大了回报波动,使训练具有随机性且难以收敛。第二,以前的工作缺乏对自身能力边界的认知,当遇到新的市场状态(例如COVID-19这样的黑天鹅事件)时,面临重大损失的风险。为了应对这些挑战,我们提出了面向期货交易的高效且具有风险意识的集成强化学习(FineFT),这是一种具有稳定训练和适当风险管理的新型三阶段集成强化学习框架。在第一阶段,集成Q学习器由集成TD误差选择性更新,以改善收敛。在第二阶段,我们根据盈利能力过滤Q学习器,并在市场状态上训练VAE,以确定学习器的能力边界。在第三阶段,我们在训练好的VAE的指导下,从过滤后的集成和保守策略中进行选择,以在新的市场状态下保持盈利能力并降低风险。通过在高保真、5倍杠杆的高频交易环境中对加密货币期货进行广泛实验,我们证明FineFT在6项财务指标上优于12个SOTA基线,将风险降低了40%以上,同时实现了优于亚军的盈利能力。对选择性更新机制的可视化表明,不同智能体专精于不同的市场动态;消融研究证实基于VAE的路由有效降低了最大回撤,而选择性更新改善了收敛性和性能。
摘要 :Futures are contracts obligating the exchange of an asset at a predetermined date and price, notable for their high leverage and liquidity and, therefore, thrive in the Crypto market. RL has been widely applied in various quantitative tasks. However, most methods focus on the spot and could not be directly applied to the futures market with high leverage because of 2 challenges. First, high leverage amplifies reward fluctuations, making training stochastic and difficult to converge. Second, prior works lacked self-awareness of capability boundaries, exposing them to the risk of significant loss when encountering new market state (e.g.,a black swan event like COVID-19). To tackle these challenges, we propose the Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading (FineFT), a novel three-stage ensemble RL framework with stable training and proper risk management. In stage I, ensemble Q learners are selectively updated by ensemble TD errors to improve convergence. In stage II, we filter the Q-learners based on their profitabilities and train VAEs on market states to identify the capability boundaries of the learners. In stage III, we choose from the filtered ensemble and a conservative policy, guided by trained VAEs, to maintain profitability and mitigate risk with new market states. Through extensive experiments on crypto futures in a high-frequency trading environment with high fidelity and 5x leverage, we demonstrate that FineFT outperforms 12 SOTA baselines in 6 financial metrics, reducing risk by more than 40% while achieving superior profitability compared to the runner-up. Visualization of the selective update mechanism shows that different agents specialize in distinct market dynamics, and ablation studies certify routing with VAEs reduces maximum drawdown effectively, and selective update improves convergence and performance.


【7】Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions
标题:偏向安全的策略优化:通过信任域实现硬约束强化学习
链接:https://arxiv.org/abs/2512.23770

作者:Ankit Kanwar,Dominik Wagner,Luke Ong
摘要:安全关键领域中的强化学习(RL)要求智能体在严格遵守安全约束的同时最大化奖励。现有方法,如拉格朗日方法和基于投影的方法,在面对硬约束时往往要么无法确保接近零的安全违规,要么牺牲奖励性能。我们提出了偏向安全的信任域策略优化(SB-TRPO),一种面向硬约束RL的新信任域算法。SB-TRPO自适应地将策略更新偏向于约束满足,同时仍寻求奖励改进。具体而言,它使用成本与奖励的自然策略梯度的凸组合来执行信任域更新,确保每一步都实现最优成本降低的固定比例。我们提供了局部向安全推进的理论保证,并且当梯度适当对齐时奖励也会改进。在标准及具有挑战性的Safety Gymnasium任务上的实验表明,与最先进的方法相比,SB-TRPO始终在安全性与有意义的任务完成之间取得最佳平衡。
摘要:Reinforcement learning (RL) in safety-critical domains requires agents to maximise rewards while strictly adhering to safety constraints. Existing approaches, such as Lagrangian and projection-based methods, often either fail to ensure near-zero safety violations or sacrifice reward performance in the face of hard constraints. We propose Safety-Biased Trust Region Policy Optimisation (SB-TRPO), a new trust-region algorithm for hard-constrained RL. SB-TRPO adaptively biases policy updates towards constraint satisfaction while still seeking reward improvement. Concretely, it performs trust-region updates using a convex combination of the natural policy gradients of cost and reward, ensuring a fixed fraction of optimal cost reduction at each step. We provide a theoretical guarantee of local progress towards safety, with reward improvement when gradients are suitably aligned. Experiments on standard and challenging Safety Gymnasium tasks show that SB-TRPO consistently achieves the best balance of safety and meaningful task completion compared to state-of-the-art methods.
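
SB-TRPO中"成本与奖励自然梯度的凸组合"可用下面的numpy草图示意。向量代表已经过Fisher逆预条件的自然梯度方向;β在论文中由"保证最优成本降低的固定比例"决定,此处仅作为自由参数给出。

```python
import numpy as np

def safety_biased_direction(g_reward, g_cost_descent, beta):
    # Convex combination of reward-ascent and cost-descent directions,
    # beta in [0, 1]: larger beta biases the update toward safety.
    return (1.0 - beta) * g_reward + beta * g_cost_descent

g_r = np.array([1.0, 0.2])     # hypothetical natural reward gradient
g_c = np.array([-0.5, 0.9])    # hypothetical cost-descent direction
print(safety_biased_direction(g_r, g_c, beta=0.4))
```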


【8】Sparse Offline Reinforcement Learning with Corruption Robustness
标题:对数据污染具有鲁棒性的稀疏离线强化学习
链接:https://arxiv.org/abs/2512.24768

作者:Nam Phuong Tran,Andi Nika,Goran Radanovic,Long Tran-Thanh,Debmalya Mandal
摘要:我们研究了离线稀疏强化学习(RL)对强数据污染的鲁棒性。在我们的设定中,对手可以任意扰动从一个高维但稀疏的马尔可夫决策过程中收集到的一小部分轨迹,而我们的目标是估计一个接近最优的策略。主要挑战在于,在样本数$N$小于特征维度$d$的高维情形中,利用稀疏性对于获得非平凡保证必不可少,但这在离线RL中尚未得到系统研究。我们在均匀覆盖和稀疏单策略集中性假设下分析该问题。虽然鲁棒离线RL的标准方法最小二乘值迭代(LSVI)在均匀覆盖下表现良好,但我们表明,将稀疏性融入LSVI并不自然,其分析可能因过于悲观的加成项而失效。为克服这一点,我们提出了带有稀疏鲁棒估计器预言机的演员-评论家方法,避免使用逐点悲观加成项,并在单策略集中性覆盖下为稀疏离线RL提供了首个非平凡保证。此外,我们将结果扩展到污染设定,并表明我们的算法在强污染下仍然鲁棒。我们的结果在具有单策略集中性覆盖和数据污染的高维稀疏MDP中提供了首个非平凡保证,表明在传统鲁棒离线RL技术可能失效的情形下,学习近似最优策略仍然是可能的。
摘要:We investigate robustness to strong data corruption in offline sparse reinforcement learning (RL). In our setting, an adversary may arbitrarily perturb a fraction of the collected trajectories from a high-dimensional but sparse Markov decision process, and our goal is to estimate a near optimal policy. The main challenge is that, in the high-dimensional regime where the number of samples $N$ is smaller than the feature dimension $d$, exploiting sparsity is essential for obtaining non-vacuous guarantees but has not been systematically studied in offline RL. We analyse the problem under uniform coverage and sparse single-concentrability assumptions. While Least Square Value Iteration (LSVI), a standard approach for robust offline RL, performs well under uniform coverage, we show that integrating sparsity into LSVI is unnatural, and its analysis may break down due to overly pessimistic bonuses. To overcome this, we propose actor-critic methods with sparse robust estimator oracles, which avoid the use of pointwise pessimistic bonuses and provide the first non-vacuous guarantees for sparse offline RL under single-policy concentrability coverage. Moreover, we extend our results to the contaminated setting and show that our algorithm remains robust under strong contamination. Our results provide the first non-vacuous guarantees in high-dimensional sparse MDPs with single-policy concentrability coverage and corruption, showing that learning a near-optimal policy remains possible in regimes where traditional robust offline RL techniques may fail.


【9】Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning
标题:用于同策略(on-policy)风险敏感强化学习的鲁棒贝叶斯动态规划
链接:https://arxiv.org/abs/2512.24580

作者:Shanyu Han,Yangbo He,Yang Liu
备注:63 pages
摘要:我们提出了一个新的框架,风险敏感强化学习(RSRL),结合了对过渡不确定性的鲁棒性。我们定义了两个不同的,但耦合的风险措施:一个内部的风险措施,解决状态和成本的随机性和外部的风险措施捕捉过渡动态的不确定性。我们的框架统一和概括了大多数现有的RL框架,允许内部和外部风险措施的一般一致的风险措施。在此框架内,我们构建了一个风险敏感的鲁棒马尔可夫决策过程(RSRMDP),推导出其Bellman方程,并提供了一个给定的后验分布下的误差分析。我们进一步开发了贝叶斯动态规划(贝叶斯DP)算法,交替后验更新和值迭代。该方法采用了基于风险的Bellman算子的估计器,该估计器将蒙特卡罗抽样与凸优化相结合,我们证明了其强一致性保证。此外,我们证明了该算法收敛到一个近最优的政策在训练环境中,并分析了样本复杂度和计算复杂度下的Dirichlet后验和CVaR。最后,我们通过两个数值实验验证了我们的方法。结果显示出良好的收敛性能,同时提供了直观的演示,其优点在风险敏感性和鲁棒性。在实证上,我们进一步证明了所提出的算法的优点,通过应用在期权套期保值。
摘要:We propose a novel framework for risk-sensitive reinforcement learning (RSRL) that incorporates robustness against transition uncertainty. We define two distinct yet coupled risk measures: an inner risk measure addressing state and cost randomness and an outer risk measure capturing transition dynamics uncertainty. Our framework unifies and generalizes most existing RL frameworks by permitting general coherent risk measures for both inner and outer risk measures. Within this framework, we construct a risk-sensitive robust Markov decision process (RSRMDP), derive its Bellman equation, and provide error analysis under a given posterior distribution. We further develop a Bayesian Dynamic Programming (Bayesian DP) algorithm that alternates between posterior updates and value iteration. The approach employs an estimator for the risk-based Bellman operator that combines Monte Carlo sampling with convex optimization, for which we prove strong consistency guarantees. Furthermore, we demonstrate that the algorithm converges to a near-optimal policy in the training environment and analyze both the sample complexity and the computational complexity under the Dirichlet posterior and CVaR. Finally, we validate our approach through two numerical experiments. The results exhibit excellent convergence properties while providing intuitive demonstrations of its advantages in both risk-sensitivity and robustness. Empirically, we further demonstrate the advantages of the proposed algorithm through an application on option hedging.
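
摘要中"蒙特卡罗抽样与凸优化相结合"估计CVaR,对应经典的Rockafellar-Uryasev表示:CVaR_α(X) = min_t { t + E[(X-t)_+]/(1-α) }。下面的numpy草图在一维网格上近似求解该凸问题,样本为假设数据。

```python
import numpy as np

def cvar(samples: np.ndarray, alpha: float = 0.95) -> float:
    # Rockafellar-Uryasev: minimize t + E[(X - t)_+] / (1 - alpha) over t;
    # a grid over sample quantiles suffices for this 1-D convex problem.
    ts = np.quantile(samples, np.linspace(0.0, 1.0, 512))
    objs = ts + np.maximum(samples[None, :] - ts[:, None], 0.0).mean(axis=1) / (1 - alpha)
    return objs.min()

costs = np.random.default_rng(0).normal(1.0, 0.5, size=10000)
print(cvar(costs, alpha=0.95))  # tail expectation over the worst 5% of costs
```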


元学习(1篇)

【1】MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model
标题:MotivNet:将Meta-Sapiens演进为具备情感智能的基础模型
链接:https://arxiv.org/abs/2512.24231

作者:Rahul Medicharla,Alper Yilmaz
备注:6 pages, 4 figures
摘要:在本文中,我们介绍了MotivNet,一个面向鲁棒现实应用的可泛化面部表情识别模型。当前最先进的FER模型在不同数据上测试时往往泛化能力较弱,导致在现实世界中性能恶化,并阻碍了FER作为一个研究领域的发展。虽然研究人员已经提出了复杂的架构来解决这一泛化问题,但这些方法需要跨域训练才能获得可泛化的结果,这对现实应用而言是内在矛盾的。我们的模型MotivNet以Meta-Sapiens为骨干,在无需跨域训练的情况下实现了跨数据集的有竞争力的性能。Sapiens是一种人类视觉基础模型,通过对Masked Autoencoder的大规模预训练,在现实世界中具有最先进的泛化能力。我们提出将MotivNet作为Sapiens的一个额外下游任务,并定义三项标准来评估MotivNet作为Sapiens任务的可行性:基准性能、模型相似性和数据相似性。本文描述了MotivNet的组成部分、我们的训练方法以及表明MotivNet可跨领域泛化的实验结果。我们证明MotivNet可以与现有SOTA模型进行基准对比并满足上述标准,从而验证了MotivNet作为Sapiens下游任务的可行性,并使FER更适于野外应用。代码可在https://github.com/OSUPCVLab/EmotionFromFaceImages获取。
摘要:In this paper, we introduce MotivNet, a generalizable facial emotion recognition model for robust real-world application. Current state-of-the-art FER models tend to have weak generalization when tested on diverse data, leading to deteriorated performance in the real world and hindering FER as a research domain. Though researchers have proposed complex architectures to address this generalization issue, they require training cross-domain to obtain generalizable results, which is inherently contradictory for real-world application. Our model, MotivNet, achieves competitive performance across datasets without cross-domain training by using Meta-Sapiens as a backbone. Sapiens is a human vision foundational model with state-of-the-art generalization in the real world through large-scale pretraining of a Masked Autoencoder. We propose MotivNet as an additional downstream task for Sapiens and define three criteria to evaluate MotivNet's viability as a Sapiens task: benchmark performance, model similarity, and data similarity. Throughout this paper, we describe the components of MotivNet, our training approach, and our results showing MotivNet is generalizable across domains. We demonstrate that MotivNet can be benchmarked against existing SOTA models and meets the listed criteria, validating MotivNet as a Sapiens downstream task, and making FER more incentivizing for in-the-wild application. The code is available at https://github.com/OSUPCVLab/EmotionFromFaceImages.
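摘要将MotivNet定位为基础模型骨干之上的下游任务。下面是"冻结骨干+轻量分类头"这一模式的最小示意(PyTorch);其中backbone接口、特征维度与7类情绪设定均为示意性假设,并非论文的具体网络结构。

```python
import torch
import torch.nn as nn

class FERHead(nn.Module):
    """Downstream FER task on top of a frozen pretrained backbone
    (standing in for Meta-Sapiens; all names here are illustrative)."""
    def __init__(self, backbone: nn.Module, feat_dim: int, n_emotions: int = 7):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the foundation features intact
        self.classifier = nn.Sequential(
            nn.LayerNorm(feat_dim),
            nn.Linear(feat_dim, n_emotions),
        )

    def forward(self, x):
        with torch.no_grad():
            feats = self.backbone(x)  # assumed to return pooled (B, feat_dim)
        return self.classifier(feats)
```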


医学相关(5篇)

【1】ProDM: Synthetic Reality-driven Property-aware Progressive Diffusion Model for Coronary Calcium Motion Correction in Non-gated Chest CT
标题:ProDM:用于非门控胸部CT中冠状动脉钙化运动校正的合成现实驱动属性感知渐进扩散模型
链接:https://arxiv.org/abs/2512.24948

作者:Xinran Gong,Gorkem Durak,Halil Ertugrul Aktas,Vedat Cicek,Jinkui Hao,Ulas Bagci,Nilay S. Shah,Bo Zhou
备注:21 pages, 8 figures
摘要:胸部CT的冠状动脉钙化(CAC)评分是一种成熟的工具,用于分层和完善临床心血管疾病风险估计。CAC量化依赖于钙化病变的准确描绘,但经常受到心脏和呼吸运动引入的伪影的影响。ECG门控心脏CT大大减少了运动伪影,但由于门控要求和缺乏保险覆盖,其在人群筛查和常规成像中的使用仍然有限。虽然越来越多地考虑从非门控胸部CT中识别偶发CAC,因为它提供了一种可访问和广泛可用的替代方案,但这种模式受到更严重的运动伪影的限制。我们提出ProDM(属性感知渐进校正扩散模型),这是一个生成扩散框架,可从非门控CT恢复无运动钙化病变。ProDM引入了三个关键组件:(1)CAC运动模拟数据引擎,可直接从心脏门控CT合成具有不同运动轨迹的真实非门控采集,实现无配对数据的监督训练;(2)属性感知学习策略,通过可区分的钙一致性损失纳入钙特定先验,以保持病变完整性;以及(3)渐进校正方案,其在扩散步骤中逐渐减少伪影以增强稳定性和钙保真度。在真实患者数据集上的实验表明,与几个基线相比,ProDM显著提高了CAC评分准确性、空间病变保真度和风险分层性能。一项关于真实非门控扫描的阅片师研究进一步证实,ProDM可抑制运动伪影并提高临床可用性。这些发现强调了从常规胸部CT成像中进行可靠CAC量化的渐进性、属性感知框架的潜力。
摘要:Coronary artery calcium (CAC) scoring from chest CT is a well-established tool to stratify and refine clinical cardiovascular disease risk estimation. CAC quantification relies on the accurate delineation of calcified lesions, but is oftentimes affected by artifacts introduced by cardiac and respiratory motion. ECG-gated cardiac CTs substantially reduce motion artifacts, but their use in population screening and routine imaging remains limited due to gating requirements and lack of insurance coverage. Although identification of incidental CAC from non-gated chest CT is increasingly considered for it offers an accessible and widely available alternative, this modality is limited by more severe motion artifacts. We present ProDM (Property-aware Progressive Correction Diffusion Model), a generative diffusion framework that restores motion-free calcified lesions from non-gated CTs. ProDM introduces three key components: (1) a CAC motion simulation data engine that synthesizes realistic non-gated acquisitions with diverse motion trajectories directly from cardiac-gated CTs, enabling supervised training without paired data; (2) a property-aware learning strategy incorporating calcium-specific priors through a differentiable calcium consistency loss to preserve lesion integrity; and (3) a progressive correction scheme that reduces artifacts gradually across diffusion steps to enhance stability and calcium fidelity. Experiments on real patient datasets show that ProDM significantly improves CAC scoring accuracy, spatial lesion fidelity, and risk stratification performance compared with several baselines. A reader study on real non-gated scans further confirms that ProDM suppresses motion artifacts and improves clinical usability. These findings highlight the potential of progressive, property-aware frameworks for reliable CAC quantification from routine chest CT imaging.


【2】DTI-GP: Bayesian operations for drug-target interactions using deep kernel Gaussian processes
标题:DTI-GP:使用深核高斯过程进行药物-靶点相互作用的Bayesian操作
链接:https://arxiv.org/abs/2512.24810

作者:Bence Bolgár,András Millinghoffer,Péter Antal
摘要:关于药物-靶标相互作用(DTI)预测的精确概率信息对于理解局限性和提高预测性能至关重要。高斯过程(GP)提供了一个可扩展的框架,以集成最先进的DTI表示和贝叶斯推理,实现新的操作,如贝叶斯分类与拒绝,前$K$选择和排名。我们提出了一个基于深度内核学习的GP架构(DTI-GP),它包含了一个用于化合物和蛋白质目标的组合神经嵌入模块和一个GP模块。工作流继续从预测分布中进行采样,以估计贝叶斯优先矩阵,该矩阵用于快速准确的选择和排序操作。DTI-GP优于最先进的解决方案,并且它允许(1)构建贝叶斯准确性置信度富集分数,(2)用于改进富集的拒绝方案,以及(3)估计和搜索前$K$选择和具有高预期效用的排名。
摘要:Precise probabilistic information about drug-target interaction (DTI) predictions is vital for understanding limitations and boosting predictive performance. Gaussian processes (GP) offer a scalable framework to integrate state-of-the-art DTI representations and Bayesian inference, enabling novel operations, such as Bayesian classification with rejection, top-$K$ selection, and ranking. We propose a deep kernel learning-based GP architecture (DTI-GP), which incorporates a combined neural embedding module for chemical compounds and protein targets, and a GP module. The workflow continues with sampling from the predictive distribution to estimate a Bayesian precedence matrix, which is used in fast and accurate selection and ranking operations. DTI-GP outperforms state-of-the-art solutions, and it allows (1) the construction of a Bayesian accuracy-confidence enrichment score, (2) rejection schemes for improved enrichment, and (3) estimation and search for top-$K$ selections and ranking with high expected utility.
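下面用一段自包含的NumPy代码示意"深度核GP+从预测分布采样得到贝叶斯优先序矩阵"的流程:在给定的(假设已训练好的)药物-靶点联合嵌入上做RBF核的精确GP回归,再用后验样本估计 P[i,j]=Pr(i的得分高于j)。这只是概念草图,核形式与超参数均为假设,且采样忽略了测试点之间的协方差。

```python
import numpy as np

def rbf(a, b, ls=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(z_train, y, z_test, noise=0.1, ls=1.0):
    """Exact GP regression over embeddings z = phi(drug, target); in a deep
    kernel model phi is a learned neural embedding, assumed given here."""
    K = rbf(z_train, z_train, ls) + noise ** 2 * np.eye(len(z_train))
    L = np.linalg.cholesky(K)
    Ks = rbf(z_train, z_test, ls)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = rbf(z_test, z_test, ls).diagonal() - (v ** 2).sum(0)
    return mean, np.maximum(var, 1e-12)

def precedence_matrix(mean, var, n_draws=2000, seed=0):
    """P[i, j] = Pr(score_i > score_j), estimated from posterior samples;
    row sums of P give a simple Bayesian ranking score."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(mean, np.sqrt(var), size=(n_draws, len(mean)))
    return (draws[:, :, None] > draws[:, None, :]).mean(0)
```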


【3】CPR: Causal Physiological Representation Learning for Robust ECG Analysis under Distribution Shifts
标题:CPR:用于分布偏移下的稳健ECG分析的因果生理表示学习
链接:https://arxiv.org/abs/2512.24564

作者:Shunbo Jia,Caizhi Liao
摘要:用于心电图(ECG)诊断的深度学习模型已经实现了显著的准确性,但在对抗性扰动(特别是模仿生物形态的平滑对抗性扰动(SAP))时表现出脆弱性。现有的防御面临着一个关键的困境:对抗训练(AT)提供了鲁棒性,但会带来令人望而却步的计算负担,而像随机平滑(RS)这样的认证方法会引入显著的推理延迟,使其无法用于实时临床监测。我们认为,这种脆弱性源于模型依赖于非鲁棒的虚假相关性,而不是不变的病理特征。为了解决这个问题,我们提出了因果生理表征学习(CPR)。与没有语义约束的标准去噪方法不同,CPR在因果解纠缠框架内结合了生理结构先验。通过经由结构因果模型(SCM)对ECG生成进行建模,CPR实施结构干预,该结构干预严格地将不变的病理形态(P-QRS-T复合体)与非因果伪影分离。PTB-XL的实验结果表明,CPR显着优于标准的临床预处理方法。具体而言,在SAP攻击下,CPR的F1得分为0.632,超过中值平滑(0.541 F1)9.1%。至关重要的是,CPR匹配随机平滑的认证鲁棒性,同时保持单通道推理效率,在鲁棒性,效率和临床可解释性之间提供卓越的权衡。
摘要:Deep learning models for Electrocardiogram (ECG) diagnosis have achieved remarkable accuracy but exhibit fragility against adversarial perturbations, particularly Smooth Adversarial Perturbations (SAP) that mimic biological morphology. Existing defenses face a critical dilemma: Adversarial Training (AT) provides robustness but incurs a prohibitive computational burden, while certified methods like Randomized Smoothing (RS) introduce significant inference latency, rendering them impractical for real-time clinical monitoring. We posit that this vulnerability stems from the models' reliance on non-robust spurious correlations rather than invariant pathological features. To address this, we propose Causal Physiological Representation Learning (CPR). Unlike standard denoising approaches that operate without semantic constraints, CPR incorporates a Physiological Structural Prior within a causal disentanglement framework. By modeling ECG generation via a Structural Causal Model (SCM), CPR enforces a structural intervention that strictly separates invariant pathological morphology (P-QRS-T complex) from non-causal artifacts. Empirical results on PTB-XL demonstrate that CPR significantly outperforms standard clinical preprocessing methods. Specifically, under SAP attacks, CPR achieves an F1 score of 0.632, surpassing Median Smoothing (0.541 F1) by 9.1%. Crucially, CPR matches the certified robustness of Randomized Smoothing while maintaining single-pass inference efficiency, offering a superior trade-off between robustness, efficiency, and clinical interpretability.


【4】Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19
标题:使用ProGAN和SMA-优化ResNet对不平衡数据进行医学图像分类:在COVID-19中的应用
链接:https://arxiv.org/abs/2512.24214

作者:Sina Jahromi,Farshid Hajati,Alireza Rezaee,Javaher Nourian
摘要:不平衡数据的挑战在医学图像分类中是突出的。当属于特定类别的图像的数量与属于其他类别的图像的数量相比存在显著差异时,例如存在或不存在特定疾病,就会出现这种挑战。这个问题在大流行期间尤其明显,这可能导致数据集更加严重的不平衡。近年来,研究人员采用了各种方法来准确快速地检测COVID-19感染者,其中人工智能和机器学习算法处于最前沿。然而,缺乏足够和平衡的数据仍然是这些方法的一个重大障碍。这项研究通过提出一个渐进式生成对抗网络来解决这一挑战,以生成合成数据来补充真实数据。该方法提出了一种加权方法,在将合成数据输入深度网络分类器之前将其与真实数据相结合。采用多目标元启发式种群优化算法对分类器的超参数进行优化。当应用于COVID-19的大型且不平衡的胸部X射线图像数据集时,与现有方法相比,所提出的模型表现出优越的交叉验证指标。该模型对4类和2类不平衡分类问题的准确率分别达到95.5%和98.5%。成功的实验结果证明了该模型在流行病期间使用不平衡数据进行医学图像分类的有效性。
摘要:The challenge of imbalanced data is prominent in medical image classification. This challenge arises when there is a significant disparity in the number of images belonging to a particular class, such as the presence or absence of a specific disease, as compared to the number of images belonging to other classes. This issue is especially notable during pandemics, which may result in an even more significant imbalance in the dataset. Researchers have employed various approaches in recent years to detect COVID-19 infected individuals accurately and quickly, with artificial intelligence and machine learning algorithms at the forefront. However, the lack of sufficient and balanced data remains a significant obstacle to these methods. This study addresses the challenge by proposing a progressive generative adversarial network to generate synthetic data to supplement the real ones. The proposed method suggests a weighted approach to combine synthetic data with real ones before inputting it into a deep network classifier. A multi-objective meta-heuristic population-based optimization algorithm is employed to optimize the hyper-parameters of the classifier. The proposed model exhibits superior cross-validated metrics compared to existing methods when applied to a large and imbalanced chest X-ray image dataset of COVID-19. The proposed model achieves 95.5% and 98.5% accuracy for 4-class and 2-class imbalanced classification problems, respectively. The successful experimental outcomes demonstrate the effectiveness of the proposed model in classifying medical images using imbalanced data during pandemics.


【5】Tracing the Heart's Pathways: ECG Representation Learning from a Cardiac Conduction Perspective
标题:追踪心脏路径:从心脏传导角度进行心电图表示学习
链接:https://arxiv.org/abs/2512.24002

作者:Tan Pan,Yixuan Sun,Chen Jiang,Qiong Gao,Rui Sun,Xingmeng Zhang,Zhenqi Yang,Limei Han,Yixiu Liang,Yuan Cheng,Kaiyu Guo
备注:Accepted to AAAI2026
摘要:多导联心电图(ECG)是心脏诊断的基石。心电图自监督学习(eSSL)的最新进展为在不依赖高质量注释的情况下增强表示学习带来了光明的前景。然而,早期的eSSL方法有一个关键的局限性:它们专注于导联和心跳之间的一致模式,忽略了心脏传导过程中心跳的固有差异,而微妙但重要的变化携带着独特的生理特征。此外,用于ECG分析的表示学习应该与ECG诊断指南保持一致,其从单个心跳到单个导联并最终到导联组合。然而,在将预先训练的模型应用于下游任务时,这种顺序逻辑往往被忽视。为了解决这些差距,我们提出了CLEAR-HUG,这是一个两阶段的框架,旨在捕获导联之间心脏传导的细微变化,同时遵守ECG诊断指南。在第一阶段,我们引入了一个eSSL模型,称为传导LEAd重建器(CLEAR),它捕获了心跳之间的特定变化和一般共性。CLEAR将每个心跳视为一个独立的实体,采用简单而有效的稀疏注意机制来重建信号,而不会受到其他心跳的干扰。在第二阶段,我们实施了一个分层的领导统一组头(HUG)的疾病诊断,反映临床工作流程。六个任务的实验结果显示,提高了6.84%,验证了CLEAR-HUG的有效性。这突出了其增强心脏传导表示并将模式与专家诊断指南对齐的能力。
摘要:The multi-lead electrocardiogram (ECG) stands as a cornerstone of cardiac diagnosis. Recent strides in electrocardiogram self-supervised learning (eSSL) have brightened prospects for enhancing representation learning without relying on high-quality annotations. Yet earlier eSSL methods suffer a key limitation: they focus on consistent patterns across leads and beats, overlooking the inherent differences in heartbeats rooted in cardiac conduction processes, while subtle but significant variations carry unique physiological signatures. Moreover, representation learning for ECG analysis should align with ECG diagnostic guidelines, which progress from individual heartbeats to single leads and ultimately to lead combinations. This sequential logic, however, is often neglected when applying pre-trained models to downstream tasks. To address these gaps, we propose CLEAR-HUG, a two-stage framework designed to capture subtle variations in cardiac conduction across leads while adhering to ECG diagnostic guidelines. In the first stage, we introduce an eSSL model termed Conduction-LEAd Reconstructor (CLEAR), which captures both specific variations and general commonalities across heartbeats. Treating each heartbeat as a distinct entity, CLEAR employs a simple yet effective sparse attention mechanism to reconstruct signals without interference from other heartbeats. In the second stage, we implement a Hierarchical lead-Unified Group head (HUG) for disease diagnosis, mirroring clinical workflow. Experimental results across six tasks show a 6.84% improvement, validating the effectiveness of CLEAR-HUG. This highlights its ability to enhance representations of cardiac conduction and align patterns with expert diagnostic guidelines.
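摘要中CLEAR"将每个心跳视为独立实体、用稀疏注意力避免心跳间干扰"的关键,可以用一个块对角注意力掩码来示意。下面的PyTorch片段仅为概念草图(beat_ids的划分方式为假设),将掩码加到注意力logits上即可实现"只在同一心跳内部注意"。

```python
import torch

def beat_block_mask(beat_ids: torch.Tensor) -> torch.Tensor:
    """Sparse attention mask: token i may attend to token j only when both
    belong to the same heartbeat, so each beat is reconstructed without
    interference from other beats. beat_ids: (seq_len,) integer labels."""
    same_beat = beat_ids[:, None] == beat_ids[None, :]
    zero = torch.zeros((), dtype=torch.float32)
    ninf = torch.full((), float("-inf"))
    return torch.where(same_beat, zero, ninf)

# usage: attn_logits = q @ k.transpose(-2, -1) / d ** 0.5 + beat_block_mask(beat_ids)
```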


蒸馏|知识提取(2篇)

【1】Attribution-Guided Distillation of Matryoshka Sparse Autoencoders
标题:Matryoshka稀疏自编码器的归因引导蒸馏
链接:https://arxiv.org/abs/2512.24975

作者:Cristina P. Martin-Linares,Jonathan P. Ling
摘要:稀疏自动编码器(SAE)旨在将模型激活分解为单语义的、人类可解释的特征。在实践中,学习到的特征通常是冗余的,并且在不同训练运行和稀疏水平之间变化,这使得解释难以迁移和重用。我们引入了Distilled Matryoshka Sparse Autoencoders(DMSAE),这是一个蒸馏出由持续有用特征构成的紧凑核心并重用其训练新SAE的训练管道。DMSAE运行一个迭代蒸馏循环:训练一个具有共享核心的Matryoshka SAE,使用梯度×激活(gradient × activation)来度量每个特征对最内层嵌套重建中下一词元损失的贡献,并仅保留能解释固定比例归因的最小特征子集。只有核心编码器权重向量在循环之间传递;核心解码器和所有非核心潜变量每次都重新初始化。在Gemma-2-2B第12层残差流激活上,七轮蒸馏循环(5亿词元,65k宽度)产生了一个由被反复选中的197个特征组成的蒸馏核心。使用这个蒸馏核心进行训练改进了多个SAEBench指标,并证明了一致的潜在特征集可以跨稀疏水平迁移。
摘要:Sparse autoencoders (SAEs) aim to disentangle model activations into monosemantic, human-interpretable features. In practice, learned features are often redundant and vary across training runs and sparsity levels, which makes interpretations difficult to transfer and reuse. We introduce Distilled Matryoshka Sparse Autoencoders (DMSAEs), a training pipeline that distills a compact core of consistently useful features and reuses it to train new SAEs. DMSAEs run an iterative distillation cycle: train a Matryoshka SAE with a shared core, use gradient × activation to measure each feature's contribution to next-token loss in the most nested reconstruction, and keep only the smallest subset that explains a fixed fraction of the attribution. Only the core encoder weight vectors are transferred across cycles; the core decoder and all non-core latents are reinitialized each time. On Gemma-2-2B layer 12 residual stream activations, seven cycles of distillation (500M tokens, 65k width) yielded a distilled core of 197 features that were repeatedly selected. Training using this distilled core improves several SAEBench metrics and demonstrates that consistent sets of latent features can be transferred across sparsity levels.
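下面示意摘要中的两步:用"梯度×激活"给每个SAE特征打分,并选出能解释固定比例归因的最小特征子集。这是概念性草图(假设loss对sae_acts可微、激活形状为(批,序列,特征),变量名均为假设),并非DMSAE的原始代码。

```python
import torch

def grad_x_activation_scores(loss: torch.Tensor, sae_acts: torch.Tensor) -> torch.Tensor:
    """Attribution of each SAE latent to the next-token loss of the most
    nested reconstruction: |grad * activation|, summed over batch/positions."""
    grads = torch.autograd.grad(loss, sae_acts, retain_graph=True)[0]
    return (grads * sae_acts).abs().sum(dim=(0, 1))  # (n_features,)

def smallest_core(scores: torch.Tensor, frac: float = 0.9) -> torch.Tensor:
    """Indices of the smallest feature subset explaining `frac` of the
    total attribution; these form the distilled core that is transferred."""
    order = torch.argsort(scores, descending=True)
    cum = torch.cumsum(scores[order], dim=0) / scores.sum()
    k = int(torch.searchsorted(cum, frac)) + 1
    return order[:k]
```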


【2】HINTS: Extraction of Human Insights from Time-Series Without External Sources
标题:提示:在没有外部来源的情况下从时间序列中提取人类见解
链接:https://arxiv.org/abs/2512.23755

作者:Sheo Yon Jhin,Noseong Park
备注:AAAI 2026 AI4TS Workshop paper
摘要:人类决策、情绪和集体心理是塑造金融和经济系统中所观察到的时间动态的复杂因素。许多最近的时间序列预测模型利用外部来源(例如新闻和社交媒体)来捕捉人为因素,但这些方法在资金、计算和实际应用方面产生了高昂的数据依赖成本。在这项研究中,我们提出了HINTS,一个无需外部数据、从时间序列残差中内生地提取这些潜在因素的自监督学习框架。HINTS利用Friedkin-Johnsen(FJ)意见动力学模型作为结构性归纳偏置,来建模不断演化的社会影响、记忆和偏见模式。提取的人为因素以注意力图的形式集成到最先进的骨干模型中。使用9个真实世界和基准数据集的实验结果表明,HINTS一致地提高了预测精度。此外,多个案例研究和消融研究验证了HINTS的可解释性,证明了提取的因素与现实世界事件之间强的语义对齐,体现了HINTS的实用价值。
摘要:Human decision-making, emotions, and collective psychology are complex factors that shape the temporal dynamics observed in financial and economic systems. Many recent time series forecasting models leverage external sources (e.g., news and social media) to capture human factors, but these approaches incur high data dependency costs in terms of financial, computational, and practical implications. In this study, we propose HINTS, a self-supervised learning framework that extracts these latent factors endogenously from time series residuals without external data. HINTS leverages the Friedkin-Johnsen (FJ) opinion dynamics model as a structural inductive bias to model evolving social influence, memory, and bias patterns. The extracted human factors are integrated into a state-of-the-art backbone model as an attention map. Experimental results using nine real-world and benchmark datasets demonstrate that HINTS consistently improves forecasting accuracy. Furthermore, multiple case studies and ablation studies validate the interpretability of HINTS, demonstrating strong semantic alignment between the extracted factors and real-world events, demonstrating the practical utility of HINTS.
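摘要中作为结构性归纳偏置的Friedkin-Johnsen意见动力学,其标准更新式为 x_{t+1} = Λ W x_t + (I-Λ) x_0(Λ为易感性对角阵,W为行随机影响矩阵,x_0为固有意见)。下面是该更新的最小NumPy实现,仅用于说明FJ模型本身;HINTS中的具体参数化方式为假设,未必与此一致。

```python
import numpy as np

def fj_dynamics(W: np.ndarray, lam: np.ndarray, x0: np.ndarray, n_steps: int = 100) -> np.ndarray:
    """Friedkin-Johnsen opinion dynamics.
    W:   row-stochastic influence matrix, shape (n, n)
    lam: susceptibility to social influence in [0, 1], shape (n,)
    x0:  innate opinions / biases, shape (n,)
    Update: x_{t+1} = diag(lam) @ W @ x_t + (I - diag(lam)) @ x0
    """
    x = x0.copy()
    for _ in range(n_steps):
        x = lam * (W @ x) + (1.0 - lam) * x0
    return x
```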


聚类(2篇)

【1】Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges
标题:用于高光谱图像分割的深度全局聚类:概念、应用和开放挑战
链接:https://arxiv.org/abs/2512.24172

作者:Yu-Tang Chang,Pin-Wei Chen,Shih-Fang Chen
备注:10 pages, 4 figures. Technical report extending ACPA 2025 conference paper. Code and data available at https://github.com/b05611038/HSI_global_clustering
摘要:由于数据量庞大、超出可用内存,高光谱成像(HSI)分析面临计算瓶颈。虽然在大型遥感数据集上预训练的基础模型显示出了希望,但它们学到的表示通常无法迁移到特定领域的应用中,如近距离农业监测,其中光谱特征、空间尺度和语义目标根本不同。本报告介绍了深度全局聚类(DGC),这是一个用于内存高效HSI分割的概念框架,它无需预训练即可从局部补丁观察中学习全局聚类结构。DGC在具有重叠区域的小补丁上运行以增强一致性,从而在消费级硬件上实现30分钟内的训练,同时保持恒定的内存使用。在叶病数据集上,DGC实现了背景-组织分离(平均IoU 0.925),并通过可导航的语义粒度演示了无监督的疾病检测。然而,该框架存在根源于多目标损失平衡的优化不稳定性:有意义的表示迅速出现,但由于特征空间中的聚类过度合并而退化。我们将这项工作定位为智力脚手架:设计理念有其优点,但稳定的实现需要有原则的动态损失平衡方法。代码和数据可在https://github.com/b05611038/HSI_global_clustering上获得。
摘要:Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory. While foundation models pre-trained on large remote sensing datasets show promise, their learned representations often fail to transfer to domain-specific applications like close-range agricultural monitoring where spectral signatures, spatial scales, and semantic targets differ fundamentally. This report presents Deep Global Clustering (DGC), a conceptual framework for memory-efficient HSI segmentation that learns global clustering structure from local patch observations without pre-training. DGC operates on small patches with overlapping regions to enforce consistency, enabling training in under 30 minutes on consumer hardware while maintaining constant memory usage. On a leaf disease dataset, DGC achieves background-tissue separation (mean IoU 0.925) and demonstrates unsupervised disease detection through navigable semantic granularity. However, the framework suffers from optimization instability rooted in multi-objective loss balancing: meaningful representations emerge rapidly but degrade due to cluster over-merging in feature space. We position this work as intellectual scaffolding - the design philosophy has merit, but stable implementation requires principled approaches to dynamic loss balancing. Code and data are available at https://github.com/b05611038/HSI_global_clustering.


【2】A Granular Grassmannian Clustering Framework via the Schubert Variety of Best Fit
标题:基于最佳拟合舒伯特簇的粒度格拉斯曼聚类框架
链接:https://arxiv.org/abs/2512.23766

作者:Karim Salta,Michael Kirby,Chris Peterson
摘要:在许多分类和聚类任务中,计算数据集或聚类的几何表示是有用的,例如均值或中值。当数据集由子空间表示时,这些代表成为格拉斯曼或旗流形上的点,其距离由其几何形状引起,通常通过主角。我们引入了一种子空间聚类算法,该算法用定义为舒伯特最佳拟合簇(SVBF)的可训练原型替换子空间均值--一个尽可能接近在至少一个固定方向上与每个集群成员相交的子空间。集成在Linde-Buzo-Grey(LBG)管道中,该SVBF-LBG方案在合成,图像,光谱和视频动作数据上提高了聚类纯度,同时保留了下游分析所需的数学结构。
摘要:In many classification and clustering tasks, it is useful to compute a geometric representative for a dataset or a cluster, such as a mean or median. When datasets are represented by subspaces, these representatives become points on the Grassmann or flag manifold, with distances induced by their geometry, often via principal angles. We introduce a subspace clustering algorithm that replaces subspace means with a trainable prototype defined as a Schubert Variety of Best Fit (SVBF) - a subspace that comes as close as possible to intersecting each cluster member in at least one fixed direction. Integrated in the Linde-Buzo-Grey (LBG) pipeline, this SVBF-LBG scheme yields improved cluster purity on synthetic, image, spectral, and video action data, while retaining the mathematical structure required for downstream analysis.
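SVBF与子空间间距离的核心量是主角(principal angles):"与每个簇成员至少在一个固定方向上相交"对应最小主角趋于0。下面给出经典Björck-Golub算法计算两个子空间主角的NumPy示意,仅作背景说明,并非论文的SVBF求解器。

```python
import numpy as np

def principal_angles(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles between span(A) and span(B) (columns as bases),
    via the SVD of Qa^T Qb (Bjorck-Golub). A smallest angle near 0 means
    the subspaces nearly intersect in at least one direction."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # angles in ascending order
```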


超分辨率|去噪|去模糊|去雾(1篇)

【1】Implicit score matching meets denoising score matching: improved rates of convergence and log-density Hessian estimation
标题:当隐式得分匹配遇上去噪得分匹配:改进的收敛率和对数密度Hessian估计
链接:https://arxiv.org/abs/2512.24378

作者:Konstantin Yakovlev,Anna Markovich,Nikita Puchkin
备注:52 pages
摘要:我们研究使用隐式得分匹配和去噪得分匹配来估计得分函数的问题。假设数据分布呈现低维结构,我们证明了隐式得分匹配不仅能够适应内在维度,而且在样本量方面达到与去噪得分匹配相同的收敛速度。此外,我们证明了通过简单的微分,这两种方法都能在不受维数灾难影响的情况下估计对数密度Hessian。这为生成扩散模型基于ODE的采样器的收敛性提供了理论依据。我们的方法基于关联光滑函数及其导数的加权$L^2$-范数的Gagliardo-Nirenberg型不等式。
摘要:We study the problem of estimating the score function using both implicit score matching and denoising score matching. Assuming that the data distribution exhibiting a low-dimensional structure, we prove that implicit score matching is able not only to adapt to the intrinsic dimension, but also to achieve the same rates of convergence as denoising score matching in terms of the sample size. Furthermore, we demonstrate that both methods allow us to estimate log-density Hessians without the curse of dimensionality by simple differentiation. This justifies convergence of ODE-based samplers for generative diffusion models. Our approach is based on Gagliardo-Nirenberg-type inequalities relating weighted $L^2$-norms of smooth functions and their derivatives.
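为便于对照,下面写出摘要中两种目标的标准形式(LaTeX):Hyvärinen的隐式得分匹配与Vincent的去噪得分匹配。二者在相差一个与 $s$ 无关的常数意义下都等价于拟合 $\nabla \log p$;此处记号为通用写法,未必与论文逐字一致。

```latex
% Implicit (Hyvärinen) score matching: no access to \nabla\log p is needed
J_{\mathrm{ISM}}(s) \;=\; \mathbb{E}_{x\sim p}\!\left[\tfrac12\,\lVert s(x)\rVert^2 \;+\; \operatorname{div}\, s(x)\right]

% Denoising score matching at noise level \sigma: targets the smoothed density p_\sigma
J_{\mathrm{DSM}}(s) \;=\; \mathbb{E}_{x\sim p,\;\varepsilon\sim\mathcal N(0,I)}\!\left[\tfrac12\,\bigl\lVert s(x+\sigma\varepsilon) + \varepsilon/\sigma\bigr\rVert^2\right]
```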


自动驾驶|车辆|车道检测等(2篇)

【1】AutoFed: Manual-Free Federated Traffic Prediction via Personalized Prompt
标题:AutoFed:通过个性化提示实现免手动调参的联邦交通预测
链接:https://arxiv.org/abs/2512.24625

作者:Zijian Zhao,Yitong Shang,Sen Li
摘要:准确的交通预测对于智能交通系统至关重要,包括网约车、城市道路规划和车队管理。然而,由于围绕交通数据的重大隐私问题,大多数现有方法依赖于本地训练,导致数据孤岛和有限的知识共享。联邦学习(FL)通过保护隐私的协作训练提供了一个有效的解决方案;然而,标准FL难以应对客户端之间的非独立同分布(非IID)问题。这一挑战促使个性化联邦学习(PFL)作为一个有前途的范式出现。然而,目前的PFL框架需要针对交通预测任务进一步适配,如专门的图特征工程、数据处理和网络架构设计。许多先前研究的一个显著局限在于依赖跨数据集的超参数优化,而这类信息在现实场景中往往不可用,从而阻碍了实际部署。为了应对这一挑战,我们提出了AutoFed,一种无需手动超参数调优的新型交通预测PFL框架。受提示学习的启发,AutoFed引入了一个联邦表示器,其采用与客户端对齐的适配器将本地数据提炼为一个紧凑的、全局共享的提示矩阵。该提示随后作为个性化预测器的条件,让每个客户端在保持本地特异性的同时受益于跨客户端知识。在真实世界数据集上的大量实验表明,AutoFed在不同场景中始终实现卓越的性能。这篇论文的代码在 https://github.com/RS2002/AutoFed 上提供。
摘要:Accurate traffic prediction is essential for Intelligent Transportation Systems, including ride-hailing, urban road planning, and vehicle fleet management. However, due to significant privacy concerns surrounding traffic data, most existing methods rely on local training, resulting in data silos and limited knowledge sharing. Federated Learning (FL) offers an efficient solution through privacy-preserving collaborative training; however, standard FL struggles with the non-independent and identically distributed (non-IID) problem among clients. This challenge has led to the emergence of Personalized Federated Learning (PFL) as a promising paradigm. Nevertheless, current PFL frameworks require further adaptation for traffic prediction tasks, such as specialized graph feature engineering, data processing, and network architecture design. A notable limitation of many prior studies is their reliance on hyper-parameter optimization across datasets, information that is often unavailable in real-world scenarios, thus impeding practical deployment. To address this challenge, we propose AutoFed, a novel PFL framework for traffic prediction that eliminates the need for manual hyper-parameter tuning. Inspired by prompt learning, AutoFed introduces a federated representor that employs a client-aligned adapter to distill local data into a compact, globally shared prompt matrix. This prompt then conditions a personalized predictor, allowing each client to benefit from cross-client knowledge while maintaining local specificity. Extensive experiments on real-world datasets demonstrate that AutoFed consistently achieves superior performance across diverse scenarios. The code of this paper is provided at https://github.com/RS2002/AutoFed.


【2】Network Traffic Analysis with Process Mining: The UPSIDE Case Study
标题:利用流程挖掘进行网络流量分析:UPSIDE案例研究
链接:https://arxiv.org/abs/2512.23718

作者:Francesco Vitale,Paolo Palmiero,Massimiliano Rak,Nicola Mazzocca
摘要:在线游戏是一种流行的活动,涉及复杂系统和网络基础设施的采用。游戏产生了大量的市场收入,其重要性推动了对网络设备行为建模的研究,以评估带宽消耗、预测并承载高负载,以及检测恶意活动。在这种背景下,流程挖掘因其能够将数据驱动的分析与基于模型的洞察相结合而显得很有前景。在本文中,我们提出了一种基于流程挖掘的游戏网络流量分析方法,它支持:从游戏网络数据中对不同状态进行无监督表征;通过流程挖掘将这些状态编码为可解释的Petri网;以及对游戏网络流量数据进行分类以识别正在进行的不同视频游戏。我们将该方法应用于UPSIDE案例研究,涉及与两款视频游戏(《皇室战争》和《火箭联盟》)交互的多台设备的游戏网络数据。结果表明,通过以Petri网表示的状态,可以对游戏网络行为进行有效且可解释的建模,其具有足够的一致性(94.02%的设备间相似性)和特异性(174.99%的状态间分离),同时对两款不同的视频游戏保持良好的分类精度(73.84% AUC)。
摘要:Online gaming is a popular activity involving the adoption of complex systems and network infrastructures. The relevance of gaming, which generates large amounts of market revenue, drove research in modeling network devices' behavior to evaluate bandwidth consumption, predict and sustain high loads, and detect malicious activity. In this context, process mining appears promising due to its ability to combine data-driven analyses with model-based insights. In this paper, we propose a process mining-based method that analyzes gaming network traffic, allowing: unsupervised characterization of different states from gaming network data; encoding such states through process mining into interpretable Petri nets; and classification of gaming network traffic data to identify different video games being played. We apply the method to the UPSIDE case study, involving gaming network data of several devices interacting with two video games: Clash Royale and Rocket League. Results demonstrate that the gaming network behavior can be effectively and interpretably modeled through states represented as Petri nets with sufficient coherence (94.02% inter-device similarity) and specificity (174.99% inter-state separation) while maintaining a good classification accuracy of the two different video games (73.84% AUC).


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】AODDiff: Probabilistic Reconstruction of Aerosol Optical Depth via Diffusion-based Bayesian Inference
标题:AODDiff:通过基于扩散的Bayesian推断对气溶胶光学厚度进行概率重建
链接 :https://arxiv.org/abs/2512.24847

作者:Linhao Fan,Hongqiang Fang,Jingyang Dai,Yong Jiang,Qixing Zhang
备注:17 pages, 9 figures
摘要:气溶胶光学厚度(AOD)场的高质量重建对于大气监测至关重要,但当前模型仍然受到完整训练数据稀缺和缺乏不确定性量化的限制。为了解决这些限制,我们提出了AODDiff,一个基于扩散式贝叶斯推断的概率重建框架。通过将学习到的AOD场时空概率分布作为生成先验,该框架可以灵活地适应各种重建任务,而不需要特定于任务的再训练。首先,我们引入了一种损坏感知的训练策略,仅从天然不完整的数据中学习时空AOD先验。随后,我们采用解耦退火后验采样策略,以更有效地整合异构观测作为约束来指导生成过程。我们通过对再分析数据的大量实验验证了所提出的框架。降尺度和修复任务的结果证实了AODDiff的有效性和鲁棒性,特别是证明了它在保持高空间-光谱保真度方面的优势。此外,作为一个生成模型,AODDiff固有地通过多次采样实现不确定性量化,为下游应用提供关键的置信度指标。
摘要:High-quality reconstruction of Aerosol Optical Depth (AOD) fields is critical for atmospheric monitoring, yet current models remain constrained by the scarcity of complete training data and a lack of uncertainty quantification. To address these limitations, we propose AODDiff, a probabilistic reconstruction framework based on diffusion-based Bayesian inference. By leveraging the learned spatiotemporal probability distribution of the AOD field as a generative prior, this framework can be flexibly adapted to various reconstruction tasks without requiring task-specific retraining. We first introduce a corruption-aware training strategy that learns a spatiotemporal AOD prior solely from naturally incomplete data. Subsequently, we employ a decoupled annealing posterior sampling strategy that enables more effective integration of heterogeneous observations as constraints to guide the generation process. We validate the proposed framework through extensive experiments on reanalysis data. Results across downscaling and inpainting tasks confirm the efficacy and robustness of AODDiff, specifically demonstrating its advantage in maintaining high spatial-spectral fidelity. Furthermore, as a generative model, AODDiff inherently enables uncertainty quantification via multiple sampling, offering critical confidence metrics for downstream applications.


联邦学习|隐私保护|加密(4篇)

【1】Mobility-Assisted Decentralized Federated Learning: Convergence Analysis and A Data-Driven Approach
标题:移动性辅助的去中心化联邦学习:收敛分析与数据驱动方法
链接:https://arxiv.org/abs/2512.24694

作者:Reza Jahani,Md Farhamdur Reza,Richeng Jin,Huaiyu Dai
备注:Under review for potential publication in IEEE Transactions on Cognitive Communications and Networking
摘要:分散式联合学习(DFL)已经成为一种保护隐私的机器学习范式,可以在不依赖中央服务器的情况下在用户之间进行协作训练。然而,由于有限的连接性和数据异构性,其性能往往会显着下降。随着我们迈向下一代无线网络,移动性越来越多地嵌入到许多现实应用中。用户的移动性,无论是自然的或诱导的,使客户端作为中继或桥梁,从而增强稀疏网络中的信息流,然而,它对DFL的影响在很大程度上被忽视,尽管它的潜力。在这项工作中,我们系统地研究了流动性在提高DFL性能中的作用。我们首先在稀疏网络中建立了DFL在用户移动性下的收敛性,并从理论上证明了即使是一小部分用户的随机移动也可以显着提高性能。基于这一认识,我们提出了一个DFL框架,利用移动用户与诱导的移动模式,使他们能够利用数据分布的知识,以确定他们的轨迹,以提高通过网络的信息传播。通过大量的实验,我们从经验上证实了我们的理论研究结果,验证了我们的方法优于基线,并提供了各种网络参数如何影响DFL在移动网络中的性能的全面分析。
摘要:Decentralized Federated Learning (DFL) has emerged as a privacy-preserving machine learning paradigm that enables collaborative training among users without relying on a central server. However, its performance often degrades significantly due to limited connectivity and data heterogeneity. As we move toward the next generation of wireless networks, mobility is increasingly embedded in many real-world applications. The user mobility, either natural or induced, enables clients to act as relays or bridges, thus enhancing information flow in sparse networks; however, its impact on DFL has been largely overlooked despite its potential. In this work, we systematically investigate the role of mobility in improving DFL performance. We first establish the convergence of DFL in sparse networks under user mobility and theoretically demonstrate that even random movement of a fraction of users can significantly boost performance. Building upon this insight, we propose a DFL framework that utilizes mobile users with induced mobility patterns, allowing them to exploit the knowledge of data distribution to determine their trajectories to enhance information propagation through the network. Through extensive experiments, we empirically confirm our theoretical findings, validate the superiority of our approach over baselines, and provide a comprehensive analysis of how various network parameters influence DFL performance in mobile networks.


【2】Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning
标题:用于节能分散式联邦学习的时变混合矩阵设计
链接:https://arxiv.org/abs/2512.24069

作者:Xusheng Zhang,Tuan Nguyen,Ting He
摘要:我们考虑混合矩阵的设计,以最大限度地减少分散式联邦学习(DFL)在无线网络中的操作成本,重点是最大限度地减少每个节点的能量消耗。作为DFL的一个关键超参数,混合矩阵控制了收敛速度和Agent到Agent通信的需求,因此得到了广泛的研究。然而,现有的设计主要集中在最小化通信时间,留下开放的每个节点的能量消耗的最小化,这是能源受限的设备的关键。这项工作通过一个理论上合理的混合矩阵设计的解决方案,旨在最大限度地减少每个节点的能量消耗,直到收敛,同时考虑到无线通信的广播性质,解决了这一差距。基于一个新的收敛定理,允许任意时变的混合矩阵,我们提出了一个多阶段的设计框架,激活时变的通信拓扑结构下优化的预算,以权衡每次迭代的能量消耗和收敛速度,同时平衡跨节点的能量消耗。我们的评估基于真实数据验证了所提出的解决方案的有效性,结合稀疏混合矩阵的低能耗和稠密混合矩阵的快速收敛。
摘要:We consider the design of mixing matrices to minimize the operation cost for decentralized federated learning (DFL) in wireless networks, with focus on minimizing the maximum per-node energy consumption. As a critical hyperparameter for DFL, the mixing matrix controls both the convergence rate and the needs of agent-to-agent communications, and has thus been studied extensively. However, existing designs mostly focused on minimizing the communication time, leaving open the minimization of per-node energy consumption that is critical for energy-constrained devices. This work addresses this gap through a theoretically-justified solution for mixing matrix design that aims at minimizing the maximum per-node energy consumption until convergence, while taking into account the broadcast nature of wireless communications. Based on a novel convergence theorem that allows arbitrarily time-varying mixing matrices, we propose a multi-phase design framework that activates time-varying communication topologies under optimized budgets to trade off the per-iteration energy consumption and the convergence rate while balancing the energy consumption across nodes. Our evaluations based on real data have validated the efficacy of the proposed solution in combining the low energy consumption of sparse mixing matrices and the fast convergence of dense mixing matrices.
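论文优化的是时变混合矩阵;作为参照,去中心化SGD中给定拓扑下最常用的基线构造是Metropolis-Hastings权重(对称、双随机)。下面的NumPy草图展示该经典构造;能耗感知的时变设计可理解为在此类矩阵序列之上做进一步优化,此处仅为背景性示意而非论文算法。

```python
import numpy as np

def metropolis_weights(adj: np.ndarray) -> np.ndarray:
    """Metropolis-Hastings mixing matrix for a communication topology:
    symmetric and doubly stochastic, a standard baseline in decentralized SGD.
    adj: symmetric 0/1 adjacency matrix without self-loops."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)  # self-weights close each row to 1
    return W
```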


【3】Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems
标题:面向安全IIoT防御系统的零信任代理式(Agentic)联邦学习
链接:https://arxiv.org/abs/2512.23809

作者:Samaresh Kumar Singh,Joyjit Roy,Martin So
备注:9 Pages and 6 figures, Submitted in conference 2nd IEEE Conference on Secure and Trustworthy Cyber Infrastructure for IoT and Microelectronics, Houston TX, USA
摘要 :最近对关键基础设施的攻击,包括2021年Oldsmar水处理漏洞和2023年丹麦能源部门的妥协,凸显了工业物联网(IIoT)部署中紧迫的安全差距。虽然联邦学习(FL)能够保护隐私的协作入侵检测,但现有的框架仍然容易受到拜占庭中毒攻击,并且缺乏强大的代理身份验证。我们提出了零信任联合学习(ZTA-FL),这是一个深度防御框架,它结合了:(1)基于TPM的加密证明,实现了低于0.0000001的错误接受率,(2)一种新的SHAP加权聚合算法,在非IID条件下提供了理论保证的可解释的拜占庭检测,以及(3)保护隐私的设备上对抗训练。在三个IDS基准测试(Edge-IIoTset,CIC-IDS 2017,UNSW-NB 15)上的综合实验表明,ZTA-FL在30%拜占庭攻击下实现了97.8%的检测准确率,93.2%的准确率(优于FLAME 3.1%,p小于0.01),以及89.3%的对抗鲁棒性,同时减少了34%的通信开销。我们提供理论分析、故障模式表征和再现性放行代码。
摘要:Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight urgent security gaps in Industrial IoT (IIoT) deployments. While Federated Learning (FL) enables privacy-preserving collaborative intrusion detection, existing frameworks remain vulnerable to Byzantine poisoning attacks and lack robust agent authentication. We propose Zero-Trust Agentic Federated Learning (ZTA-FL), a defense in depth framework combining: (1) TPM-based cryptographic attestation achieving less than 0.0000001 false acceptance rate, (2) a novel SHAP-weighted aggregation algorithm providing explainable Byzantine detection under non-IID conditions with theoretical guarantees, and (3) privacy-preserving on-device adversarial training. Comprehensive experiments across three IDS benchmarks (Edge-IIoTset, CIC-IDS2017, UNSW-NB15) demonstrate that ZTA-FL achieves 97.8 percent detection accuracy, 93.2 percent accuracy under 30 percent Byzantine attacks (outperforming FLAME by 3.1 percent, p less than 0.01), and 89.3 percent adversarial robustness while reducing communication overhead by 34 percent. We provide theoretical analysis, failure mode characterization, and release code for reproducibility.


【4】OptiVote: Non-Coherent FSO Over-the-Air Majority Vote for Communication-Efficient Distributed Federated Learning in Space Data Centers
标题:OptiVote:用于空间数据中心通信高效分布式联邦学习的非相干FSO空中多数投票
链接:https://arxiv.org/abs/2512.24334

作者:Anbang Zhang,Chenyuan Feng,Wai Ho Mow,Jia Ye,Shuaishuai Guo,Geyong Min,Tony Q. S. Quek
摘要:巨型星座的快速部署正在推动空间数据中心(SDC)的长期愿景,其中互连的卫星形成在轨分布式计算和学习基础设施。在这样的系统中实现分布式联邦学习是具有挑战性的,因为迭代训练需要在带宽和能量受限的卫星间链路上频繁聚合,并且链路条件可能是高度动态的。在这项工作中,我们利用空中计算(AirComp)作为网络聚合原语。然而,传统的相干AirComp依赖于严格的相位对准,由于卫星抖动和多普勒效应,难以在空间环境中保持。为了克服这一限制,我们提出了OptiVote,一个强大的和通信效率高的非相干自由空间光学(FSO)AirComp框架,用于向空间数据中心进行联合学习。OptiVote将符号随机梯度下降(signSGD)与多数表决(MV)聚合原理和脉冲位置调制(PPM)集成在一起,其中每个卫星通过激活正交PPM时隙来传达本地梯度符号。聚合节点通过非相干能量累积执行MV检测,将相敏场叠加转换为相位不可知的光强度组合,从而消除了对精确相位同步的需要,并提高了动态损伤下的恢复能力。为了减轻异构FSO信道引起的聚合偏差,我们进一步开发了一个重要性感知,信道状态信息(CSI)自由的动态功率控制方案,平衡接收到的能量,而无需额外的信令。我们提供了理论分析,统计FSO信道下的聚合错误概率的特征,并建立非凸目标的收敛保证。
摘要:The rapid deployment of mega-constellations is driving the long-term vision of space data centers (SDCs), where interconnected satellites form in-orbit distributed computing and learning infrastructures. Enabling distributed federated learning in such systems is challenging because iterative training requires frequent aggregation over inter-satellite links that are bandwidth- and energy-constrained, and the link conditions can be highly dynamic. In this work, we exploit over-the-air computation (AirComp) as an in-network aggregation primitive. However, conventional coherent AirComp relies on stringent phase alignment, which is difficult to maintain in space environments due to satellite jitter and Doppler effects. To overcome this limitation, we propose OptiVote, a robust and communication-efficient non-coherent free-space optical (FSO) AirComp framework for federated learning toward Space Data Centers. OptiVote integrates sign stochastic gradient descent (signSGD) with a majority-vote (MV) aggregation principle and pulse-position modulation (PPM), where each satellite conveys local gradient signs by activating orthogonal PPM time slots. The aggregation node performs MV detection via non-coherent energy accumulation, transforming phase-sensitive field superposition into phase-agnostic optical intensity combining, thereby eliminating the need for precise phase synchronization and improving resilience under dynamic impairments. To mitigate aggregation bias induced by heterogeneous FSO channels, we further develop an importance-aware, channel state information (CSI)-free dynamic power control scheme that balances received energies without additional signaling. We provide theoretical analysis by characterizing the aggregate error probability under statistical FSO channels and establishing convergence guarantees for non-convex objectives.
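signSGD+多数表决的聚合规则本身很简单:每个节点只上传梯度符号,聚合端取逐元素多数。下面的NumPy草图示意这一规则,以及OptiVote中"用两个PPM时隙的非相干能量读出投票"的玩具版本;能量读出部分纯属示意性假设,不对应真实光学链路模型。

```python
import numpy as np

def majority_vote_aggregate(local_grads: list) -> np.ndarray:
    """signSGD with majority vote: workers send only sign(grad); the
    aggregator returns the elementwise majority sign (ties -> 0)."""
    signs = np.sign(np.stack(local_grads))  # (n_workers, dim)
    return np.sign(signs.sum(axis=0))

def ppm_vote(energy_slot_plus: np.ndarray, energy_slot_minus: np.ndarray) -> np.ndarray:
    """Toy non-coherent readout: each coordinate's vote is decided by which
    PPM slot accumulated more optical energy across workers."""
    return np.sign(energy_slot_plus - energy_slot_minus)
```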


推理|分析|理解|解释(10篇)

【1】Scaling Open-Ended Reasoning to Predict the Future
标题:扩展开放式推理以预测未来
链接:https://arxiv.org/abs/2512.25070

作者:Nikhil Chandak,Shashwat Goel,Ameya Prabhu,Moritz Hardt,Jonas Geiping
备注:45 pages
摘要:高风险决策涉及对未来不确定性的推理。在这项工作中,我们训练语言模型对开放式预测问题进行预测。为了扩大训练数据的规模,我们使用全自动的精心策划流程,从每日新闻报道的全球事件中合成新颖的预测问题。我们在我们的数据集OpenForesight上训练Qwen3思维模型。为了防止在训练和评估过程中泄漏未来的信息,我们在预测系统的数据生成和检索中都使用离线新闻语料库。在一个小验证集的指导下,我们展示了检索的好处,以及用于强化学习(RL)的改进奖励函数。在获得最终预测系统后,我们在2025年5月至8月期间进行保留(held-out)测试。我们的专用模型OpenForecaster 8B与更大的专有模型相匹配,我们的训练提高了预测的准确性、校准和一致性。我们发现,预测训练带来的校准改进可以推广到流行的基准测试。我们开源了所有的模型、代码和数据,以使语言模型预测的研究更容易获得。
摘要:High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus, both for data generation and retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval, and an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing from May to August 2025. Our specialized model, OpenForecaster 8B, matches much larger proprietary models, with our training improving the accuracy, calibration, and consistency of predictions. We find calibration improvements from forecasting training generalize across popular benchmarks. We open-source all our models, code, and data to make research on language model forecasting broadly accessible.


【2】From Trial to Deployment: A SEM Analysis of Traveler Adoptions to Fully Operational Autonomous Taxis
标题:从试验到部署:旅行者采用全面运营自动驾驶出租车的结构方程模型(SEM)分析
链接:https://arxiv.org/abs/2512.24767

作者:Yutong Cai,Hua Wang
摘要:自动驾驶出租车服务代表了城市交通的变革性进步,提供安全,高效和全天候运营。虽然现有的文献已经通过规定的偏好实验和假设场景探索了用户对自动出租车的接受程度,但很少有研究基于运营AV服务调查实际用户行为。这项研究通过利用来自中国武汉的调查数据来解决这一差距,百度的Apollo Robotaxi服务在武汉大规模运营。我们设计了一个现实的调查,结合实际的服务属性,并收集336个有效的答复,从实际用户。使用结构方程模型,我们确定了六个潜在的心理结构,即信任和政策支持,成本敏感性,性能,行为意图,生活方式和教育。它们对采用行为的影响,在十个场景中的自主出租车的选择频率来衡量,检查和解释。结果表明,成本敏感性和行为意向是最强的积极预测采用,而其他潜在的结构发挥更微妙的作用。该模型在多个指数中表现出很强的拟合优度。我们的研究结果提供了经验证据,以支持政策制定,票价设计和公共宣传策略,以在现实世界的城市环境中扩大自动出租车的部署。
摘要 :Autonomous taxi services represent a transformative advancement in urban mobility, offering safety, efficiency, and round-the-clock operations. While existing literature has explored user acceptance of autonomous taxis through stated preference experiments and hypothetical scenarios, few studies have investigated actual user behavior based on operational AV services. This study addresses that gap by leveraging survey data from Wuhan, China, where Baidu's Apollo Robotaxi service operates at scale. We design a realistic survey incorporating actual service attributes and collect 336 valid responses from actual users. Using Structural Equation Modeling, we identify six latent psychological constructs, namely Trust \& Policy Support, Cost Sensitivity, Performance, Behavioral Intention, Lifestyle, and Education. Their influences on adoption behavior, measured by the selection frequency of autonomous taxis in ten scenarios, are examined and interpreted. Results show that Cost Sensitivity and Behavioral Intention are the strongest positive predictors of adoption, while other latent constructs play more nuanced roles. The model demonstrates strong goodness-of-fit across multiple indices. Our findings offer empirical evidence to support policymaking, fare design, and public outreach strategies for scaling autonomous taxis deployments in real-world urban settings.


【3】FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference
标题:面向高效N:M稀疏与量化模型推理的FPGA协同设计
链接:https://arxiv.org/abs/2512.24713

作者:Fen-Yu Hsieh,Yun-Chang Teng,Ding-Yong Hong,Jan-Jan Wu
摘要:大型语言模型(LLM)在广泛的语言处理任务中表现出卓越的性能。然而,这种成功是以大量的计算和内存需求为代价的,这大大阻碍了它们在资源受限环境中的部署。为了应对这一挑战,这项工作引入了一个利用权重修剪和低位量化的自动化框架,并提出了一种在现场可编程门阵列(FPGA)平台上生成加速器的硬件-软件协同设计方法。特别是,我们实现了一个统一的流水线,该流水线应用N:M结构化修剪和4位整数量化来减少内存占用,然后通过优化的反量化和矩阵乘法来增强多个硬件平台上的LLM推理,包括CPU、具有密集和2:4稀疏张量核的NVIDIA GPU,以及基于脉动阵列的自定义FPGA加速器。利用2:4稀疏性并结合对4096×4096矩阵的量化,我们的方法实现了高达4倍的权重存储减少和1.71倍的矩阵乘法加速,与密集GPU基线相比端到端延迟降低1.29倍。对LLaMA-7B模型的扩展分析进一步表明,结构化稀疏使每词元吞吐量提高了1.36倍。这些结果表明,细粒度N:M稀疏与量化的协同作用能够实现高效、可部署的LLM推理,而所提出的FPGA加速器提供了一条灵活的架构路径,可支持超出固定2:4硬件约束的更广泛一类稀疏模式。
摘要:Large language models (LLMs) have demonstrated remarkable performance across a wide range of language processing tasks. However, this success comes at the cost of substantial computation and memory requirements, which significantly impedes their deployment in resource-constrained environments. To address this challenge, this work introduces an automation framework that leverages weight pruning and low-bit quantization, and presents a hardware-software co-design method that generates accelerators on the Field-Programmable Gate Array (FPGA) platform. In particular, we implement a unified pipeline that applies N:M structured pruning and 4-bit integer quantization to reduce the memory footprint, followed by optimized dequantization and matrix multiplication to enhance LLM inference on several hardware platforms, including CPUs, NVIDIA GPUs with Dense and 2:4 Sparse Tensor Cores, and a custom systolic-array-based FPGA accelerator. Utilizing 2:4 sparsity combined with quantization on $4096 \times 4096$ matrices, our approach achieves a reduction of up to $4\times$ in weight storage and a $1.71\times$ speedup in matrix multiplication, yielding a $1.29\times$ end-to-end latency reduction compared to dense GPU baselines. Scaling analysis on the LLaMA-7B model further shows that structured sparsity enhances the throughput per token by $1.36\times$. These results demonstrate the synergy of fine-grained N:M sparsity and quantization for enabling efficient and deployable LLM inference, while the proposed FPGA accelerator offers a flexible architectural path for supporting a broader class of sparsity patterns beyond the fixed 2:4 hardware constraints.
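下面的NumPy草图示意流水线中的两个核心变换:2:4结构化剪枝(每4个连续权重保留幅值最大的2个)与对称4比特整数量化。假设权重个数可被4整除;这只是算法语义的示意,并非论文的FPGA实现。

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """N:M structured pruning with N=2, M=4: in every group of 4 consecutive
    weights keep the 2 largest magnitudes, zero the rest (size % 4 == 0 assumed)."""
    groups = w.reshape(-1, 4)
    smallest2 = np.argsort(np.abs(groups), axis=1)[:, :2]  # two smallest per group
    out = groups.copy()
    np.put_along_axis(out, smallest2, 0.0, axis=1)
    return out.reshape(w.shape)

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor 4-bit quantization to integers in [-7, 7];
    dequantize as q * scale before (or fused into) the matmul."""
    max_abs = np.abs(w).max()
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale
```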


【4】Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
标题:测试时理解和指导推理模型的认知行为
链接:https://arxiv.org/abs/2512.24574

作者:Zhenyu Zhang,Xiaoxia Wu,Zhongzhu Zhou,Qingyang Wu,Yineng Zhang,Pragaash Ponnusamy,Harikaran Subbaraj,Jue Wang,Shuaiwen Leon Song,Ben Athiwaratkun
摘要:大型语言模型(LLM)通常依赖于长思想链(CoT)推理来解决复杂的任务。虽然有效,但这些轨迹通常效率低下,导致过度令牌生成的高延迟,或在思考不足(肤浅,不一致的步骤)和过度思考(重复,冗长的推理)之间交替的不稳定推理。在这项工作中,我们研究了推理轨迹的结构,并发现了与不同的认知行为(如验证和回溯)相关的专门注意头。通过在推理时对这些头部进行轻微干预,我们可以引导模型远离低效模式。基于这一认识,我们提出了CREST,一种用于测试时认知推理转向的免训练方法。CREST有两个组成部分:(1)离线校准步骤,其识别认知头部并导出头部特定的导向向量,以及(2)推理时间过程,其旋转隐藏表示以抑制沿着那些向量的分量。CREST自适应地抑制非生产性推理行为,从而获得更高的准确性和更低的计算成本。在不同的推理基准和模型中,CREST将准确性提高了17.5%,同时将令牌使用量减少了37.6%,为更快,更可靠的LLM推理提供了一条简单有效的途径。
摘要:Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high latency from excessive token generation, or unstable reasoning that alternates between underthinking (shallow, inconsistent steps) and overthinking (repetitive, verbose reasoning). In this work, we study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors such as verification and backtracking. By lightly intervening on these heads at inference time, we can steer the model away from inefficient modes. Building on this insight, we propose CREST, a training-free method for Cognitive REasoning Steering at Test-time. CREST has two components: (1) an offline calibration step that identifies cognitive heads and derives head-specific steering vectors, and (2) an inference-time procedure that rotates hidden representations to suppress components along those vectors. CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost. Across diverse reasoning benchmarks and models, CREST improves accuracy by up to 17.5% while reducing token usage by 37.6%, offering a simple and effective pathway to faster, more reliable LLM reasoning.
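CREST在推理时"沿导向向量抑制分量"的操作,最简形式是对隐藏状态做投影消除:h' = h - α·⟨h, v⟩·v(v为单位化的导向向量,α=1时完全去除该方向)。下面的PyTorch片段示意这一步;论文中具体的"旋转"实现可能与此不同,此处为假设性草图。

```python
import torch

def suppress_direction(hidden: torch.Tensor, v: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Remove (or damp, alpha < 1) the component of hidden states along a
    steering vector associated with an unproductive cognitive behavior.
    hidden: (..., d); v: (d,), normalized inside."""
    v = v / v.norm()
    coef = (hidden * v).sum(dim=-1, keepdim=True)  # projection coefficient
    return hidden - alpha * coef * v
```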


【5】Lifting Vision: Ground to Aerial Localization with Reasoning Guided Planning
标题:提升愿景:通过推理引导规划进行地对空定位
链接:https://arxiv.org/abs/2512.24404

作者:Soham Pahari,M. Srinivas
摘要:多模态智能的发展近年来在视觉理解和高级推理方面取得了很大进展。然而,大多数推理系统仍然以文本信息作为主要的推理媒介,这限制了它们在视觉导航和地理定位等空间任务中的有效性。这项工作讨论了该领域的潜在空间,并最终提出一种视觉推理范式:地理一致视觉规划(Geo-Consistent Visual Planning),我们将所引入的框架称为面向定位的视觉推理(Visual Reasoning for Localization,ViReLoc),其仅使用视觉表示进行规划和定位。所提出的框架学习基于文本的推理往往难以理解的空间依赖关系和几何关系。通过在视觉域中逐步编码推理并使用基于强化的目标进行优化,ViReLoc规划两幅给定地面图像之间的路线。该系统还集成了对比学习和自适应特征交互,以对齐跨视图视角并减少视点差异。在不同导航和定位场景下的实验表明,空间推理准确性和跨视图检索性能得到一致改善。这些结果确立了视觉推理作为导航和定位的一种强有力的补充方法,并表明这些任务可以在没有实时全球定位系统数据的情况下执行,从而产生更安全的导航解决方案。
摘要:Multimodal intelligence has recently shown strong progress in visual understanding and high-level reasoning. However, most reasoning systems still rely on textual information as the main medium for inference, which limits their effectiveness in spatial tasks such as visual navigation and geo-localization. This work discusses the potential scope of this field and proposes a visual reasoning paradigm, Geo-Consistent Visual Planning, realized in a framework called Visual Reasoning for Localization, or ViReLoc, which performs planning and localization using only visual representations. The proposed framework learns spatial dependencies and geometric relations that text-based reasoning often fails to capture. By encoding step-by-step inference in the visual domain and optimizing with reinforcement-based objectives, ViReLoc plans routes between two given ground images. The system also integrates contrastive learning and adaptive feature interaction to align cross-view perspectives and reduce viewpoint differences. Experiments across diverse navigation and localization scenarios show consistent improvements in spatial reasoning accuracy and cross-view retrieval performance. These results establish visual reasoning as a strong complementary approach for navigation and localization, and show that such tasks can be performed without real-time global positioning system data, leading to more secure navigation solutions.


【6】A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios
标题:基于扩散的模拟推断(SBI)综述:非理想数据场景中的基础与应用
链接:https://arxiv.org/abs/2512.23748

作者:Haley Rosso,Talea Mayo
摘要:对于复杂的模拟问题,由于似然函数难以处理,推断具有科学意义的参数往往无法使用经典的基于似然的技术。基于模拟的推理(SBI)方法通过直接利用来自模拟器的样本来学习给定观测数据$\mathbf{x}_{\text{o}}$下参数$\boldsymbol{θ}$的后验分布,从而无需显式似然。最近的工作使人们注意到扩散模型(一种植根于得分匹配和逆时随机动力学的生成模型)可作为SBI任务的灵活框架。本文从第一性原理到实际应用综述了基于扩散的SBI。我们首先回顾扩散建模的数学基础(前向加噪、逆时SDE/ODE、概率流和去噪得分匹配),并解释条件得分如何实现无似然后验采样。然后,我们考察扩散模型在神经后验/似然估计中解决归一化流痛点之处,以及它们引入新权衡之处(例如迭代采样成本)。本综述的关键主题是基于扩散的SBI在科学数据常见的非理想条件下的鲁棒性:错误设定(模拟训练数据与现实之间的不匹配)、非结构化或无限维观测以及数据缺失。我们综合了多类方法,涵盖源自薛定谔桥公式的理论基础、条件与序贯后验采样器、面向非结构化数据的摊销架构以及推理时的先验适配。在整个综述中,我们采用一致的符号,并强调获得准确后验所需的条件和注意事项。综述最后讨论了开放问题,着眼于可能受益于基于扩散的SBI进行不确定性量化的概率地球物理模型应用。
摘要:For complex simulation problems, inferring parameters of scientific interest often precludes the use of classical likelihood-based techniques due to intractable likelihood functions. Simulation-based inference (SBI) methods forego the need for explicit likelihoods by directly utilizing samples from the simulator to learn posterior distributions over parameters $\boldsymbol{θ}$ given observed data $\mathbf{x}_{\text{o}}$. Recent work has brought attention to diffusion models -- a type of generative model rooted in score matching and reverse-time stochastic dynamics -- as a flexible framework for SBI tasks. This article reviews diffusion-based SBI from first principles to applications in practice. We first recall the mathematical foundations of diffusion modeling (forward noising, reverse-time SDE/ODE, probability flow, and denoising score matching) and explain how conditional scores enable likelihood-free posterior sampling. We then examine where diffusion models address pain points of normalizing flows in neural posterior/likelihood estimation and where they introduce new trade-offs (e.g., iterative sampling costs). The key theme of this review is robustness of diffusion-based SBI in non-ideal conditions common to scientific data: misspecification (mismatch between simulated training data and reality), unstructured or infinite-dimensional observations, and missingness. We synthesize methods spanning foundations drawing from Schrodinger-bridge formulations, conditional and sequential posterior samplers, amortized architectures for unstructured data, and inference-time prior adaptation. Throughout, we adopt consistent notation and emphasize conditions and caveats required for accurate posteriors. The review closes with a discussion of open problems with an eye toward applications of uncertainty quantification for probabilistic geophysical models that may benefit from diffusion-based SBI.


【7】Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis
标题:财经新闻情绪分析中嵌入表示的比较评价
链接:https://arxiv.org/abs/2512.13749

作者:Joyjit Roy,Samaresh Kumar Singh
备注:6 pages, 2 figures. Submitted to IEEE IATMSI-2026 (Track: AI, IoT and Computer Vision Enabled Technologies)
摘要:金融情绪分析增强了对市场的理解;然而,标准的自然语言处理方法在应用于小型数据集时遇到了重大挑战。本研究提供了一个基于嵌入的方法在资源受限的环境中的金融新闻情感分类的比较评估。Word 2 Vec、GloVe和句子Transformer表示与手动标记的标题上的梯度提升相结合进行评估。实验结果表明,验证和测试性能之间存在很大的差距,尽管有很强的验证指标,但模型的性能比普通基线差。分析表明,预训练的嵌入在关键数据充足性阈值以下会产生收益递减,并且小的验证集会导致模型选择过程中的过拟合。通过每周情绪汇总和市场监测工作流程的叙述性总结来说明实际应用。研究结果提供了经验证据,表明嵌入质量本身无法解决情绪分类中的基本数据稀缺问题。对于资源有限的从业者来说,结果表明,当标记样本稀缺时,需要考虑替代方法,如Few-Shot学习,数据增强或词典增强混合方法。
摘要:Financial sentiment analysis enhances market understanding; however, standard natural language processing approaches encounter significant challenges when applied to small datasets. This study provides a comparative evaluation of embedding-based methods for financial news sentiment classification in resource-constrained environments. Word2Vec, GloVe, and sentence transformer representations are evaluated in combination with gradient boosting on manually labeled headlines. Experimental results identify a substantial gap between validation and test performance, with models performing worse than trivial baselines despite strong validation metrics. The analysis demonstrates that pretrained embeddings yield diminishing returns below a critical data sufficiency threshold, and that small validation sets contribute to overfitting during model selection. Practical application is illustrated through weekly sentiment aggregation and narrative summarization for market monitoring workflows. The findings offer empirical evidence that embedding quality alone cannot address fundamental data scarcity in sentiment classification. For practitioners operating with limited resources, the results indicate the need to consider alternative approaches such as few-shot learning, data augmentation, or lexicon-enhanced hybrid methods when labeled samples are scarce.


【8】Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis
标题:一阶优化的基本不等式及其在统计风险分析中的应用
链接:https://arxiv.org/abs/2512.24999

作者:Seunghoon Paik,Kangjie Zhou,Matus Telgarsky,Ryan J. Tibshirani
备注:47 pages, 3 figures (7 subfigures)
摘要:我们为一阶迭代优化算法引入基本不等式(basic inequalities),形成一个连接隐式与显式正则化的简单而通用的框架。虽然相关不等式已出现在文献中,但我们分离并突出一种特定形式,并将其发展为一个完善的统计分析工具。设$f$表示待优化的目标函数。给定一个初始化于$θ_0$、当前迭代为$θ_T$的一阶迭代算法,基本不等式用累积步长以及$θ_0$、$θ_T$、$z$之间的距离给出$f(θ_T)-f(z)$对任意参考点$z$的上界。该界将迭代次数转化为损失函数中的有效正则化系数。我们通过训练动态分析和预测风险界来展示这一框架。除了重新审视并改进梯度下降的已知结果外,我们还为带Bregman散度投影的镜像下降、通过梯度下降和指数梯度下降训练的广义线性模型以及随机化预测器提供了新的结果。我们用广义线性模型的实验来说明并补充这些理论发现。
摘要:We introduce \textit{basic inequalities} for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While related inequalities appear in the literature, we isolate and highlight a specific form and develop it as a well-rounded tool for statistical analysis. Let $f$ denote the objective function to be optimized. Given a first-order iterative algorithm initialized at $θ_0$ with current iterate $θ_T$, the basic inequality upper bounds $f(θ_T)-f(z)$ for any reference point $z$ in terms of the accumulated step sizes and the distances between $θ_0$, $θ_T$, and $z$. The bound translates the number of iterations into an effective regularization coefficient in the loss function. We demonstrate this framework through analyses of training dynamics and prediction risk bounds. In addition to revisiting and refining known results on gradient descent, we provide new results for mirror descent with Bregman divergence projection, for generalized linear models trained by gradient descent and exponentiated gradient descent, and for randomized predictors. We illustrate and supplement these theoretical findings with experiments on generalized linear models.
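作为摘要所述"基本不等式"的一个经典具体实例(并非论文的一般形式),对凸函数 $f$ 的梯度下降 $θ_{t+1}=θ_t-η_t\nabla f(θ_t)$,展开 $\lVertθ_{t+1}-z\rVert^2$ 并用凸性 $\langle\nabla f(θ_t),θ_t-z\rangle \ge f(θ_t)-f(z)$ 逐步求和,可得:

```latex
\sum_{t=0}^{T-1} \eta_t\,\bigl(f(\theta_t)-f(z)\bigr)
\;\le\; \tfrac12\lVert\theta_0 - z\rVert^2
      \;-\; \tfrac12\lVert\theta_T - z\rVert^2
      \;+\; \tfrac12\sum_{t=0}^{T-1}\eta_t^2\,\lVert\nabla f(\theta_t)\rVert^2
```

两边除以累积步长 $\sum_t η_t$ 即得函数值次优间隙的上界;累积步长越大,$\lVertθ_0-z\rVert^2$ 项的权重越小,正对应"迭代次数/步长充当有效正则化系数"的解读。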


【9】Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features
标题:迈向数据驱动天气模型中的机械理解:内部激活揭示可解释的物理特征
链接:https://arxiv.org/abs/2512.24440

作者:Theodore MacMillan,Nicholas T. Ouellette
备注:18 pages, 13 figures
摘要:大型数据驱动的物理模型,如DeepMind的天气模型GraphCast,在经验上成功地为复杂动力系统参数化了时间算子,其精度达到或在某些情况下超过了传统的基于物理的求解器。不幸的是,这些数据驱动的模型如何执行计算在很大程度上是未知的,它们的内部表示是否是可解释的或物理上一致的是一个悬而未决的问题。在这里,我们从大型语言模型的可解释性研究中调整工具来分析GraphCast中的中间计算层,利用稀疏自动编码器来发现模型神经元空间中的可解释特征。我们揭示了不同的特征,在很大范围内的长度和时间尺度,对应于热带气旋,大气河流,昼夜和季节性行为,大规模降水模式,特定的地理编码,海冰范围,等等。我们进一步展示了如何通过对模型预测步骤的干预来探测这些特征的精确抽象。作为一个案例研究,我们稀疏地修改对应于热带气旋的GraphCast功能,并观察可解释的和物理上一致的修改不断变化的飓风。这些方法为数据驱动的物理模型的黑箱行为提供了一个窗口,是实现其作为值得信赖的预测者和有科学价值的发现工具的潜力的一步。
摘要:Large data-driven physics models like DeepMind's weather model GraphCast have empirically succeeded in parameterizing time operators for complex dynamical systems with an accuracy reaching or in some cases exceeding that of traditional physics-based solvers. Unfortunately, how these data-driven models perform computations is largely unknown and whether their internal representations are interpretable or physically consistent is an open question. Here, we adapt tools from interpretability research in Large Language Models to analyze intermediate computational layers in GraphCast, leveraging sparse autoencoders to discover interpretable features in the neuron space of the model. We uncover distinct features on a wide range of length and time scales that correspond to tropical cyclones, atmospheric rivers, diurnal and seasonal behavior, large-scale precipitation patterns, specific geographical coding, and sea-ice extent, among others. We further demonstrate how the precise abstraction of these features can be probed via interventions on the prediction steps of the model. As a case study, we sparsely modify a feature corresponding to tropical cyclones in GraphCast and observe interpretable and physically consistent modifications to evolving hurricanes. Such methods offer a window into the black-box behavior of data-driven physics models and are a step towards realizing their potential as trustworthy predictors and scientifically valuable tools for discovery.


【10】Quantitative Understanding of PDF Fits and their Uncertainties
标题:对PDF拟合及其不确定性的定量理解
链接:https://arxiv.org/abs/2512.24116

作者:Amedeo Chiefa,Luigi Del Debbio,Richard Kenway
摘要:部分子分布函数(PDF)在描述对撞机的实验数据中起着核心作用,并提供了对核子结构的洞察。随着LHC进入高精度测量时代,为了与实验精度相匹配,具有可靠不确定性量化的稳健PDF确定已成为必不可少的。NNPDF合作组开创了使用机器学习(ML)技术进行PDF确定的先河,使用神经网络(NN)以灵活和无偏见的方式对未知PDF进行参数化。然后通过随机梯度下降算法在实验数据上训练NN。结果的统计稳健性通过使用合成数据的广泛闭合测试进行验证。在这项工作中,我们开发了一个基于神经正切核(NTK)的理论框架来分析神经网络的训练动力学。这种方法使我们能够在精确的假设下推导出训练过程中神经网络演化的分析描述,从而能够定量地理解训练过程。对训练动态进行分析处理使我们能够以透明的方式澄清NN架构的作用和实验数据的影响。类似地,我们能够描述训练期间NN输出的协方差的演变,提供不确定性如何从数据传播到拟合函数的定量描述。虽然我们的结果不能替代PDF拟合,但它们确实提供了一个强大的诊断工具来评估当前拟合方法的鲁棒性。除了与粒子物理现象学的相关性之外,我们对PDF确定的分析还提供了一个测试平台,用于应用ML社区中开发的关于学习过程的理论思想。
摘要:Parton Distribution Functions (PDFs) play a central role in describing experimental data at colliders and provide insight into the structure of nucleons. As the LHC enters an era of high-precision measurements, a robust PDF determination with a reliable uncertainty quantification has become mandatory in order to match the experimental precision. The NNPDF collaboration has pioneered the use of Machine Learning (ML) techniques for PDF determinations, using Neural Networks (NNs) to parametrise the unknown PDFs in a flexible and unbiased way. The NNs are then trained on experimental data by means of stochastic gradient descent algorithms. The statistical robustness of the results is validated by extensive closure tests using synthetic data. In this work, we develop a theoretical framework based on the Neural Tangent Kernel (NTK) to analyse the training dynamics of neural networks. This approach allows us to derive, under precise assumptions, an analytical description of the neural network evolution during training, enabling a quantitative understanding of the training process. Having an analytical handle on the training dynamics allows us to clarify the role of the NN architecture and the impact of the experimental data in a transparent way. Similarly, we are able to describe the evolution of the covariance of the NN output during training, providing a quantitative description of how uncertainties are propagated from the data to the fitted function. While our results are not a substitute for PDF fitting, they do provide a powerful diagnostic tool to assess the robustness of current fitting methodologies. Beyond its relevance for particle physics phenomenology, our analysis of PDF determinations provides a testbed to apply theoretical ideas about the learning process developed in the ML community.
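摘要所依赖的NTK分析的核心是如下标准结果(Jacot等人;此处为通用形式,论文的具体假设可能不同):在固定核 $\Theta$ 的"惰性"训练极限下,平方损失的梯度流使网络输出按如下线性ODE演化:

```latex
\frac{\mathrm d\, f_t(X)}{\mathrm dt} = -\,\Theta(X,X)\,\bigl(f_t(X)-y\bigr)
\;\;\Longrightarrow\;\;
f_t(X) = y + e^{-\Theta(X,X)\,t}\,\bigl(f_0(X)-y\bigr),

f_t(x) = f_0(x) + \Theta(x,X)\,\Theta(X,X)^{-1}\bigl(I - e^{-\Theta(X,X)\,t}\bigr)\bigl(y - f_0(X)\bigr).
```

由此可解析地追踪训练过程中拟合函数及其(对初始化取平均的)协方差的演化,这正是摘要所述"不确定性从数据传播到拟合函数"分析的出发点。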


检测相关(2篇)

【1】Security Without Detection: Economic Denial as a Primitive for Edge and IoT Defense
标题:无检测的安全性:以经济拒绝作为边缘和物联网防御的原语
链接:https://arxiv.org/abs/2512.23849

作者:Samaresh Kumar Singh,Joyjit Roy
备注:8 pages, 2 figures, submitted to 3rd International Conference on Intelligent Digitization of Systems and Services (IDSS2026)
摘要:基于检测的安全性无法抵御使用加密、隐形和低速率技术的复杂攻击者,特别是在资源限制无法进行基于ML的入侵检测的物联网/边缘环境中。我们提出了经济拒绝安全(EDS),这是一个独立于检测的框架,它利用一个基本的不对称性(防御者可以控制自己的环境,而攻击者不能)使攻击在经济上不可行。EDS组合了四种机制:自适应计算难题、诱饵驱动的交互熵、时间拉伸和带宽税,实现可证明的超线性成本放大。我们将EDS形式化为Stackelberg博弈,推导出最优参数选择的封闭形式均衡(定理1),并证明机制组合产生的成本比单个机制之和高2.1倍(定理2)。EDS需要的内存小于12 KB,可部署在ESP32级微控制器上。在四种攻击场景下对20台设备的异构物联网测试平台进行的评估(n = 30次试验,p < 0.001)表明:32-560倍的攻击减速、85-520:1的成本不对称、8-62%的攻击成功率下降、<20 ms的延迟开销,以及接近0%的误报。针对IoT-23恶意软件(Mirai、Torii、Hajime)的验证显示,EDS独立缓解率为88%;与ML-IDS结合时,EDS实现94%的缓解率,而单独使用IDS仅为67%,提高了27%。EDS提供独立于检测的保护,适用于传统方法失效的资源受限环境。结合IDS可进一步增强对所测恶意软件样本的检测与缓解能力;但即使不包含IDS,EDS本身带来的收益依然成立。总体而言,部署EDS有助于将经济天平向防御者倾斜,为保护物联网和边缘系统提供了一条可行途径。
摘要:Detection-based security fails against sophisticated attackers using encryption, stealth, and low-rate techniques, particularly in IoT/edge environments where resource constraints preclude ML-based intrusion detection. We present Economic Denial Security (EDS), a detection-independent framework that makes attacks economically infeasible by exploiting a fundamental asymmetry: defenders control their environment while attackers cannot. EDS composes four mechanisms: adaptive computational puzzles, decoy-driven interaction entropy, temporal stretching, and bandwidth taxation, achieving provably superlinear cost amplification. We formalize EDS as a Stackelberg game, deriving closed-form equilibria for optimal parameter selection (Theorem 1) and proving that mechanism composition yields 2.1x greater costs than the sum of individual mechanisms (Theorem 2). EDS requires < 12KB memory, enabling deployment on ESP32-class microcontrollers. Evaluation on a 20-device heterogeneous IoT testbed across four attack scenarios (n = 30 trials, p < 0.001) demonstrates: 32-560x attack slowdown, 85-520:1 cost asymmetry, 8-62% attack success reduction, < 20ms latency overhead, and close to 0% false positives. Validation against IoT-23 malware (Mirai, Torii, Hajime) shows 88% standalone mitigation; combined with ML-IDS, EDS achieves 94% mitigation versus 67% for IDS alone, a 27% improvement. EDS provides detection-independent protection suitable for resource-constrained environments where traditional approaches fail. The ability to detect and mitigate the malware samples tested was enhanced; however, the benefits provided by EDS were realized even without the inclusion of an IDS. Overall, the implementation of EDS serves to shift the economic balance in favor of the defender and provides a viable method to protect IoT and edge systems.
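下面给出"自适应计算难题"这一机制最常见的原语(hashcash风格工作量证明)的最小草图:验证方成本为O(1),求解方期望成本随难度位数指数增长,难度可按可疑程度自适应上调。接口与参数均为示例假设,并非论文的实际实现:

```python
# hashcash 风格难题:要求请求方找到使 SHA-256 前缀有 bits 个零位的 nonce。
# 求解期望约需 2^bits 次哈希;验证只需一次哈希。
import hashlib, itertools

def solve_puzzle(challenge: bytes, bits: int) -> int:
    target = 1 << (256 - bits)
    for nonce in itertools.count():
        h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") < (1 << (256 - bits))

nonce = solve_puzzle(b"req-42", bits=16)   # 客户端:期望约 2^16 次哈希
print(verify(b"req-42", nonce, bits=16))   # 服务端:一次哈希即可验证,输出 True
```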


【2】Fast reconstruction-based ROI triggering via anomaly detection in the CYGNO optical TPC
标题:通过CYGNO光学TPC中的异常检测进行基于重建的快速感兴趣区触发
链接:https://arxiv.org/abs/2512.24290

作者:F. D. Amaro,R. Antonietti,E. Baracchini,L. Benussi,C. Capoccia,M. Caponero,L. G. M. de Carvalho,G. Cavoto,I. A. Costa,A. Croce,M. D'Astolfo,G. D'Imperio,G. Dho,E. Di Marco,J. M. F. dos Santos,D. Fiorina,F. Iacoangeli,Z. Islam,E. Kemp,H. P. Lima,G. Maccarrone,R. D. P. Mano,D. J. G. Marques,G. Mazzitelli,P. Meloni,A. Messina,V. Monno,C. M. B. Monteiro,R. A. Nobrega,G. M. Oppedisano,I. F. Pains,E. Paoletti,F. Petrucci,S. Piacentini,D. Pierluigi,D. Pinci,F. Renga,A. Russo,G. Saviano,P. A. O. C. Silva,N. J. Spooner,R. Tesauro,S. Tomassini,D. Tozzi
备注:13 pages, 6 figures, Submitted to IOP Machine Learning: Science and Technology
摘要:光学读出时间投影室(TPC)产生百万像素级的图像,其细粒度的拓扑信息对稀有事件搜索必不可少,但其规模对实时数据选择构成挑战。我们提出了一种无监督的、基于重建的异常检测策略,用于快速感兴趣区域(ROI)提取,直接作用于仅经最低限度处理的相机图像帧。专门在基座(pedestal)图像上训练的卷积自动编码器在没有标签、模拟或细粒度校准的情况下学习探测器噪声形态。将其应用于标准数据采集的图像帧时,局部重建残差可识别粒子引起的结构,再通过阈值化和空间聚类从中提取紧凑的ROI。使用来自CYGNO光学TPC原型的真实数据,我们比较了两种仅在训练目标上不同的基座训练自动编码器配置,从而对训练目标的影响进行了受控研究。最佳配置保留(93.0 +/- 0.2)%的重建信号强度,同时丢弃(97.8 +/- 0.1)%的图像面积,在消费级GPU上每帧的推理时间约为25 ms。结果表明,精心设计训练目标是有效的基于重建的异常检测的关键,而基座训练的自动编码器为光学TPC的在线数据削减提供了一个透明且与探测器无关的基线。
摘要:Optical-readout Time Projection Chambers (TPCs) produce megapixel-scale images whose fine-grained topological information is essential for rare-event searches, but whose size challenges real-time data selection. We present an unsupervised, reconstruction-based anomaly-detection strategy for fast Region-of-Interest (ROI) extraction that operates directly on minimally processed camera frames. A convolutional autoencoder trained exclusively on pedestal images learns the detector noise morphology without labels, simulation, or fine-grained calibration. Applied to standard data-taking frames, localized reconstruction residuals identify particle-induced structures, from which compact ROIs are extracted via thresholding and spatial clustering. Using real data from the CYGNO optical TPC prototype, we compare two pedestal-trained autoencoder configurations that differ only in their training objective, enabling a controlled study of its impact. The best configuration retains (93.0 +/- 0.2)% of reconstructed signal intensity while discarding (97.8 +/- 0.1)% of the image area, with an inference time of approximately 25 ms per frame on a consumer GPU. The results demonstrate that careful design of the training objective is critical for effective reconstruction-based anomaly detection and that pedestal-trained autoencoders provide a transparent and detector-agnostic baseline for online data reduction in optical TPCs.


分类|识别(6篇)

【1】Generative Classifiers Avoid Shortcut Solutions
标题:生成式分类器避免捷径解决方案
链接:https://arxiv.org/abs/2512.25034

作者:Alexander C. Li,Ananya Kumar,Deepak Pathak
备注:ICLR 2025. Code: https://github.com/alexlioralexli/generative-classifiers
摘要:判别式分类方法经常学习在分布内成立的捷径,但即使在较小的分布偏移下也会失效。这种失效模式源于对与标签虚假相关的特征的过度依赖。我们表明,使用类条件生成模型的生成式分类器可以通过对所有特征(核心特征与虚假特征)建模、而非主要依赖虚假特征来避免这一问题。这些生成式分类器训练简单,不需要专门的数据增强、强正则化、额外的超参数,也不需要事先知道要避免哪些特定的虚假相关。我们发现,基于扩散和自回归的生成式分类器在五个标准的图像与文本分布偏移基准上实现了最先进的性能,并在医疗或卫星数据集等现实应用中降低了虚假相关的影响。最后,我们仔细分析了一个高斯玩具设定,以理解生成式分类器的归纳偏置,以及决定生成式分类器何时优于判别式分类器的数据性质。
摘要:Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on features that are spuriously correlated with the label. We show that generative classifiers, which use class-conditional generative models, can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. These generative classifiers are simple to train, avoiding the need for specialized augmentations, strong regularization, extra hyperparameters, or knowledge of the specific spurious correlations to avoid. We find that diffusion-based and autoregressive generative classifiers achieve state-of-the-art performance on five standard image and text distribution shift benchmarks and reduce the impact of spurious correlations in realistic applications, such as medical or satellite datasets. Finally, we carefully analyze a Gaussian toy setting to understand the inductive biases of generative classifiers, as well as the data properties that determine when generative classifiers outperform discriminative ones.
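为说明"类条件生成模型 + 贝叶斯规则"这一生成式分类的基本原理,下面给出一个每类拟合一个高斯的玩具草图(仅为原理演示,并非论文中的扩散/自回归分类器):

```python
# 最小示意:用类条件生成模型(此处每类一个高斯)做生成式分类,
# 即 argmax_y [log p(x|y) + log p(y)]。
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))   # 类0的玩具数据
X1 = rng.normal([3.0, 3.0], 1.0, size=(200, 2))   # 类1的玩具数据

def fit_gaussian(X):
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_gauss(x, mu, cov):
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(mu) * np.log(2 * np.pi))

params = [fit_gaussian(X0), fit_gaussian(X1)]
log_prior = np.log([0.5, 0.5])

def predict(x):
    scores = [log_gauss(x, mu, cov) + lp for (mu, cov), lp in zip(params, log_prior)]
    return int(np.argmax(scores))

print(predict(np.array([2.5, 2.8])))   # 预期输出 1
```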


【2】Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning
标题:序贯支持网络学习的半重叠多强盗最佳臂辨识
链接:https://arxiv.org/abs/2512.24959

作者:András Antos,András Millinghoffer,Péter Antal
备注:29 pages, 2 figures
摘要:许多现代人工智能和机器学习问题需要通过共享但不对称的计算密集型流程来评估合作伙伴的贡献,并同时选择最有利的候选者。这些问题的顺序方法可以统一在一个新框架,即顺序支持网络学习(SSNL)之下:其目标是通过试验为所有参与者选择最有益的候选伙伴集合,也就是学习一个表示最高性能贡献的有向图。我们证明,一种新的纯探索模型,即半重叠多(多臂)赌博机(SOMMAB),可以用来高效地从稀疏候选列表中学习支持网络;在该模型中,由于各赌博机的臂之间存在结构重叠,单次评估可同时向多个赌博机提供不同的反馈。   我们为SOMMAB开发了一种广义GapE算法,并推导出新的指数型误差界,改进了多赌博机最佳臂识别中指数内已知的最优常数。该界随重叠程度线性缩放,揭示了共享评估带来的显著样本复杂度增益。   从应用的角度来看,这项工作为在多种学习问题中从稀疏候选者中识别支持网络的顺序学习工具提供了理论基础和更好的性能保证,例如多任务学习(MTL)、辅助任务学习(ATL)、联邦学习(FL)以及多智能体系统(MAS)。
摘要:Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most beneficial candidates. Sequential approaches to these problems can be unified under a new framework, Sequential Support Network Learning (SSNL), in which the goal is to select the most beneficial candidate set of partners for all participants using trials; that is, to learn a directed graph that represents the highest-performing contributions. We demonstrate that a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits due to structural overlap among their arms, can be used to learn a support network from sparse candidate lists efficiently.   We develop a generalized GapE algorithm for SOMMABs and derive new exponential error bounds that improve the best known constant in the exponent for multi-bandit best-arm identification. The bounds scale linearly with the degree of overlap, revealing significant sample-complexity gains arising from shared evaluations.   From an application point of view, this work provides a theoretical foundation and improved performance guarantees for sequential learning tools for identifying support networks from sparse candidates in multiple learning problems, such as in multi-task learning (MTL), auxiliary task learning (ATL), federated learning (FL), and in multi-agent systems (MAS).
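作为参照,下面给出经典GapE风格最佳臂识别的最小实现草图(单个赌博机、固定预算;探索参数a与各臂均值均为示例假设)。论文将这类基于间隙的指标推广到了臂半重叠的多赌博机情形:

```python
# GapE 思想:按 "负的估计间隙 + 探索奖励" 选择要拉的臂。
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.2, 0.5, 0.55, 0.8])      # 真实臂均值(算法未知)
K, budget, a = len(means), 2000, 4.0

counts = np.ones(K)
sums = rng.binomial(1, means).astype(float)  # 每臂先各拉一次

for _ in range(budget - K):
    mu_hat = sums / counts
    best = int(np.argmax(mu_hat))
    second = np.partition(mu_hat, -2)[-2]    # 次优臂的估计均值
    # 估计间隙:最优臂用与次优臂之差,其余臂用与最优臂之差
    gaps = np.where(np.arange(K) == best, mu_hat[best] - second, mu_hat[best] - mu_hat)
    index = -gaps + np.sqrt(a / counts)      # GapE 指标
    k = int(np.argmax(index))
    counts[k] += 1
    sums[k] += rng.binomial(1, means[k])

print("推荐臂:", int(np.argmax(sums / counts)))   # 预期为 3
```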


【3】AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels
标题:基于人工智能驱动的声学嗓音生物标志物、利用持续元音对良性喉部嗓音障碍进行分层分类
链接:https://arxiv.org/abs/2512.24628

作者:Mohsen Annabestani,Samira Aghadoost,Anais Rameau,Olivier Elemento,Gloria Chia-Yi Chiang
摘要:良性喉发声障碍影响近五分之一的人,通常表现为发声困难,同时也作为更广泛的生理功能障碍的非侵入性指标。我们引入了一个临床启发的分层机器学习框架,使用从短而持续的元音发音中提取的声学特征,对8种良性语音障碍以及健康对照进行自动分类。实验利用了萨尔布鲁肯语音数据库中1,261名说话者的15,132段录音,包括元音/a/、/i/和/u/的中性、高、低和滑动音高。根据临床分流工作流程,该框架分三个连续阶段操作:第一阶段通过将卷积神经网络衍生的mel频谱图特征与21个可解释的声学生物标志物整合,对病理性与非病理性声音进行二进制筛选;第二阶段使用立方支持向量机将声音分层为健康,功能性或精神性,以及结构性或炎症性组;第3阶段通过合并来自先前阶段的概率输出来实现细粒度分类,从而改善相对于功能状况的结构性和炎症性疾病的区分。所提出的系统始终优于平面多类分类器和预训练的自监督模型,包括Meta HuBERT和Google HeAR,其通用目标并未针对持续的临床发声进行优化。通过将深度频谱表示与可解释的声学特征相结合,该框架增强了透明度和临床对齐。这些结果突出了定量声音生物标志物作为早期筛查、诊断分诊和纵向监测声音健康的可扩展的非侵入性工具的潜力。
摘要:Benign laryngeal voice disorders affect nearly one in five individuals and often manifest as dysphonia, while also serving as non-invasive indicators of broader physiological dysfunction. We introduce a clinically inspired hierarchical machine learning framework for automated classification of eight benign voice disorders alongside healthy controls, using acoustic features extracted from short, sustained vowel phonations. Experiments utilized 15,132 recordings from 1,261 speakers in the Saarbruecken Voice Database, covering vowels /a/, /i/, and /u/ at neutral, high, low, and gliding pitches. Mirroring clinical triage workflows, the framework operates in three sequential stages: Stage 1 performs binary screening of pathological versus non-pathological voices by integrating convolutional neural network-derived mel-spectrogram features with 21 interpretable acoustic biomarkers; Stage 2 stratifies voices into Healthy, Functional or Psychogenic, and Structural or Inflammatory groups using a cubic support vector machine; Stage 3 achieves fine-grained classification by incorporating probabilistic outputs from prior stages, improving discrimination of structural and inflammatory disorders relative to functional conditions. The proposed system consistently outperformed flat multi-class classifiers and pre-trained self-supervised models, including META HuBERT and Google HeAR, whose generic objectives are not optimized for sustained clinical phonation. By combining deep spectral representations with interpretable acoustic features, the framework enhances transparency and clinical alignment. These results highlight the potential of quantitative voice biomarkers as scalable, non-invasive tools for early screening, diagnostic triage, and longitudinal monitoring of vocal health.


【4】Sparse classification with positive-confidence data in high dimensions
标题:具有高维度正置信度数据的稀疏分类
链接:https://arxiv.org/abs/2512.24443

作者:The Tien Mai,Mai Anh Nguyen,Trung Nghia Nguyen
摘要:高维学习问题(特征数量超过样本量)通常需要稀疏正则化来进行有效的预测和变量选择。虽然这些技术在完全监督数据上已很成熟,但在正置信度(Pconf)分类等弱监督设定中仍未得到充分探索。Pconf学习只使用配备了置信度分数的正样本,从而避免了对负样本数据的需要。然而,现有的Pconf方法并不适合高维情形。本文提出了一种用于高维Pconf分类的新型稀疏惩罚框架。我们引入了使用凸(Lasso)和非凸(SCAD、MCP)惩罚的估计量,以解决收缩偏差并改进特征恢复。在理论上,我们建立了L1正则化Pconf估计量的估计和预测误差界,证明了在受限强凸(Restricted Strong Convexity)条件下,它能达到接近极小极大最优的稀疏恢复率。为了求解由此产生的复合目标,我们开发了一个高效的近端梯度算法。大量模拟表明,我们提出的方法在预测性能和变量选择精度上可与完全监督方法相媲美,有效地弥合了弱监督与高维统计之间的差距。
摘要:High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While established for fully supervised data, these techniques remain underexplored in weak-supervision settings such as Positive-Confidence (Pconf) classification. Pconf learning utilizes only positive samples equipped with confidence scores, thereby avoiding the need for negative data. However, existing Pconf methods are ill-suited for high-dimensional regimes. This paper proposes a novel sparse-penalization framework for high-dimensional Pconf classification. We introduce estimators using convex (Lasso) and non-convex (SCAD, MCP) penalties to address shrinkage bias and improve feature recovery. Theoretically, we establish estimation and prediction error bounds for the L1-regularized Pconf estimator, proving it achieves near minimax-optimal sparse recovery rates under Restricted Strong Convexity condition. To solve the resulting composite objective, we develop an efficient proximal gradient algorithm. Extensive simulations demonstrate that our proposed methods achieve predictive performance and variable selection accuracy comparable to fully supervised approaches, effectively bridging the gap between weak supervision and high-dimensional statistics.
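下面是一个线性模型上的示意草图:采用Pconf文献中常用的、仅依赖正样本及其置信度r(x)的风险改写,并用近端梯度(ISTA)的软阈值步实现L1稀疏化。数据、置信度与超参数均为示例假设,论文中的SCAD/MCP非凸惩罚与理论细节以原文为准:

```python
# 逐样本损失取 l(z) + (1-r)/r * l(-z),其中 z = <w, x>,l 为 logistic 损失。
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 50
w_true = np.zeros(d); w_true[:3] = [2.0, -2.0, 1.5]      # 稀疏的"真"模型
Xp = rng.normal(0.5, 1.0, size=(n, d))                    # 正类样本
r = np.clip(1.0 / (1.0 + np.exp(-(Xp @ w_true))), 0.05, 0.95)  # 假设已知的置信度

def grad(w):
    z = Xp @ w
    s = 1.0 / (1.0 + np.exp(z))        # dl(z)/dz = -s;dl(-z)/dz = 1 - s
    g_z = -s + (1.0 - r) / r * (1.0 - s)
    return Xp.T @ g_z / n

lam, eta, w = 0.02, 0.05, np.zeros(d)
for _ in range(1000):
    w = w - eta * grad(w)                                 # 梯度步
    w = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)  # L1 软阈值(近端步)

print("非零系数个数:", int((np.abs(w) > 1e-8).sum()))
```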


【5】Improved Balanced Classification with Theoretically Grounded Loss Functions
标题:利用理论上固定的损失函数改进的平衡分类
链接:https://arxiv.org/abs/2512.23947

作者:Corinna Cortes,Mehryar Mohri,Yutao Zhong
备注:NeurIPS 2025
摘要:平衡损失是类不平衡情况下多类分类的一个广泛采用的目标。通过对所有类别赋予同等重要性(无论其频率如何),它促进了公平性,并确保少数类不被忽视。然而,直接最小化平衡分类损失通常是难以处理的,这使得设计有效的代理损失成为一个核心问题。本文介绍并研究了两类先进的代理损失族:广义Logit调整(GLA)损失函数和广义类感知加权(GCA)损失函数。GLA损失将基于类先验平移logits的Logit调整损失推广到更广泛的一般交叉熵损失族。GCA损失函数扩展了标准的类加权损失(按类频率反比缩放损失),在其中引入类相关的置信度间隔,并将其推广到一般交叉熵族。我们对这两类损失的一致性给出了全面的理论分析。我们表明,GLA损失是贝叶斯一致的,但仅对完备(即无界)假设集是$H$-一致的。此外,它们的$H$-一致性界与最小类概率成反比,至少按$1/\mathsf p_{\min}$的量级缩放。相比之下,GCA损失对任何有界或完备的假设集都是$H$-一致的,其$H$-一致性界以更有利的$1/\sqrt{\mathsf p_{\min}}$缩放,在不平衡设定中提供了明显更强的理论保证。我们报告的实验结果表明,从经验上看,带有校准的类相关置信度间隔的GCA损失和GLA损失都可以大大优于直接的类加权损失以及LA损失。GLA通常在常见基准上表现稍好,而GCA在高度不平衡的设定中略有优势。
摘要:The balanced loss is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes fairness and ensures that minority classes are not overlooked. However, directly minimizing the balanced classification loss is typically intractable, which makes the design of effective surrogate losses a central question. This paper introduces and studies two advanced surrogate loss families: Generalized Logit-Adjusted (GLA) loss functions and Generalized Class-Aware weighted (GCA) losses. GLA losses generalize Logit-Adjusted losses, which shift logits based on class priors, to the broader general cross-entropy loss family. GCA loss functions extend the standard class-weighted losses, which scale losses inversely by class frequency, by incorporating class-dependent confidence margins and extending them to the general cross-entropy family. We present a comprehensive theoretical analysis of consistency for both loss families. We show that GLA losses are Bayes-consistent, but only $H$-consistent for complete (i.e., unbounded) hypothesis sets. Moreover, their $H$-consistency bounds depend inversely on the minimum class probability, scaling at least as $1/\mathsf p_{\min}$. In contrast, GCA losses are $H$-consistent for any hypothesis set that is bounded or complete, with $H$-consistency bounds that scale more favorably as $1/\sqrt{\mathsf p_{\min}}$, offering significantly stronger theoretical guarantees in imbalanced settings. We report the results of experiments demonstrating that, empirically, both the GCA losses with calibrated class-dependent confidence margins and GLA losses can greatly outperform straightforward class-weighted losses as well as the LA losses. GLA generally performs slightly better in common benchmarks, whereas GCA exhibits a slight edge in highly imbalanced settings.
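作为这两族代理损失的出发点,下面给出经典Logit调整(LA)交叉熵的最小实现(在logits上加tau·log(类先验);tau为示例超参数)。GLA/GCA对一般交叉熵族与置信度间隔的推广见论文:

```python
# 标准 Logit 调整交叉熵:先按类先验平移 logits,再做普通交叉熵。
import numpy as np

def la_loss(logits, labels, class_priors, tau=1.0):
    """logits: (n, C); labels: (n,); class_priors: (C,) 类频率."""
    adjusted = logits + tau * np.log(class_priors)           # logit 调整
    adjusted -= adjusted.max(axis=1, keepdims=True)          # 数值稳定
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
priors = np.array([0.7, 0.2, 0.1])                           # 不平衡的类先验
print(la_loss(logits, np.array([2, 1]), priors))
```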


【6】Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents
标题:坐标矩阵机:对非常相似的文档进行分类的人类层面的概念学习
链接:https://arxiv.org/abs/2512.23749

作者:Amin Sadri,M Maruf Hossain
备注:16 pages, 3 figures
摘要:人类水平的概念学习认为,人类通常从单个例子中学习新概念,而机器学习算法通常需要数百个样本才能学会一个概念。我们的大脑下意识地识别重要特征,并更有效地学习。   贡献:在本文中,我们提出了坐标矩阵机(CM$^2$)。这个专门构建的小模型通过学习文档结构并使用此信息对文档进行分类来增强人类智能。虽然现代"红色AI"趋势依赖于大规模预训练和能源密集型GPU基础设施,但CM$^2$被设计为绿色AI解决方案。它通过只识别人类会考虑的结构性"重要特征"来实现人类水平的概念学习,使其每类仅用一个样本就能对非常相似的文档进行分类。   优势:我们的算法优于需要更大数据集和大量计算的传统向量化器和复杂深度学习模型。通过关注结构坐标而不是穷尽的语义向量,CM$^2$提供:1. 以最少的数据实现高精度(一次性学习);2. 几何与结构智能;3. 绿色AI与环境可持续性;4. 针对仅CPU环境优化;5. 内在可解释性(玻璃盒模型);6. 更快的计算和低延迟;7. 对不平衡类的鲁棒性;8. 经济可行性;9. 通用、可扩展且可延展。
摘要:Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a single concept. Our brain subconsciously identifies important features and learns more effectively.   Contribution: In this paper, we present the Coordinate Matrix Machine (CM$^2$). This purpose-built small model augments human intelligence by learning document structures and using this information to classify documents. While modern "Red AI" trends rely on massive pre-training and energy-intensive GPU infrastructure, CM$^2$ is designed as a Green AI solution. It achieves human-level concept learning by identifying only the structural "important features" a human would consider, allowing it to classify very similar documents using only one sample per class.   Advantage: Our algorithm outperforms traditional vectorizers and complex deep learning models that require larger datasets and significant compute. By focusing on structural coordinates rather than exhaustive semantic vectors, CM$^2$ offers: 1. High accuracy with minimal data (one-shot learning) 2. Geometric and structural intelligence 3. Green AI and environmental sustainability 4. Optimized for CPU-only environments 5. Inherent explainability (glass-box model) 6. Faster computation and low latency 7. Robustness against unbalanced classes 8. Economic viability 9. Generic, expandable, and extendable


表征(1篇)

【1】On the geometry and topology of representations: the manifolds of modular addition
标题:关于表示的几何与拓扑:模加法的流形
链接:https://arxiv.org/abs/2512.25060

作者:Gabriela Moisescu-Pareja,Gavin McCracken,Harley Wiltzer,Vincent Létourneau,Colin Daniels,Doina Precup,Jonathan Love
摘要:时钟和比萨饼的解释,与架构不同的统一或可学习的注意力,被引入认为,不同的架构设计可以产生不同的电路模块添加。在这项工作中,我们表明,这是不是这种情况下,统一的注意力和可训练的注意力架构实现相同的算法,通过拓扑和几何等价表示。我们的方法超越了对单个神经元和权重的解释。相反,我们识别出与每个学习到的表示相对应的所有神经元,然后将神经元的集合作为一个实体进行研究。这种方法揭示了每个学习的表示都是一个流形,我们可以利用拓扑学的工具来研究。基于这一认识,我们可以统计分析数百个电路的学习表示,以证明从常见深度学习范式中自然产生的学习模块加法电路之间的相似性。
摘要:The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield distinct circuits for modular addition. In this work, we show that this is not the case, and that both uniform attention and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations. Our methodology goes beyond the interpretation of individual neurons and weights. Instead, we identify all of the neurons corresponding to each learned representation and then study the collective group of neurons as one entity. This method reveals that each learned representation is a manifold that we can study utilizing tools from topology. Based on this insight, we can statistically analyze the learned representations across hundreds of circuits to demonstrate the similarity between learned modular addition circuits that arise naturally from common deep learning paradigms.


3D|3D重建等相关(1篇)

【1】3D Semantic Segmentation for Post-Disaster Assessment
标题:灾后评估的3D语义分割
链接:https://arxiv.org/abs/2512.24593

作者:Nhut Le,Maryam Rahnemoonfar
备注:Accepted by the 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2025)
摘要:自然灾害日益频繁,对人的生命构成严重威胁,并造成巨大的经济损失。虽然3D语义分割对于灾后评估至关重要,但现有的深度学习模型缺乏专门为灾后环境设计的数据集。为了解决这一差距,我们利用无人机(UAV)在飓风Ian(2022)受灾地区拍摄的航拍影像构建了一个专门的3D数据集,采用运动恢复结构(SfM)和多视图立体(MVS)技术来重建3D点云。我们评估了最先进的(SOTA)3D语义分割模型(Fast Point Transformer(FPT)、Point Transformer v3(PTv3)和OA-CNNs)在该数据集上的表现,暴露了现有方法在受灾地区的重大局限性。这些发现强调了迫切需要推进3D分割技术并开发专门的3D基准数据集,以改善灾后场景理解与响应。
摘要:The increasing frequency of natural disasters poses severe threats to human lives and leads to substantial economic losses. While 3D semantic segmentation is crucial for post-disaster assessment, existing deep learning models lack datasets specifically designed for post-disaster environments. To address this gap, we constructed a specialized 3D dataset using unmanned aerial vehicles (UAVs)-captured aerial footage of Hurricane Ian (2022) over affected areas, employing Structure-from-Motion (SfM) and Multi-View Stereo (MVS) techniques to reconstruct 3D point clouds. We evaluated the state-of-the-art (SOTA) 3D semantic segmentation models, Fast Point Transformer (FPT), Point Transformer v3 (PTv3), and OA-CNNs on this dataset, exposing significant limitations in existing methods for disaster-stricken regions. These findings underscore the urgent need for advancements in 3D segmentation techniques and the development of specialized 3D benchmark datasets to improve post-disaster scene understanding and response.


优化|敛散性(6篇)

【1】Convergence of the generalization error for deep gradient flow methods for PDEs
标题:偏微分方程深梯度流方法推广误差的收敛性
链接:https://arxiv.org/abs/2512.25017

作者:Chenguang Liu,Antonis Papapantoleon,Jasper Rou
备注:28 pages
摘要:本文旨在为应用深度梯度流方法(DGFM)求解(高维)偏微分方程(PDE)提供坚实的数学基础。我们将DGFM的泛化误差分解为近似误差和训练误差。我们首先证明,满足合理且可验证假设的PDE的解可以用神经网络近似,因此当神经元数目趋于无穷时,近似误差趋于零。然后,我们推导出训练过程在"宽网络极限"下遵循的梯度流,并分析了当训练时间趋于无穷时该梯度流的极限。这些结果共同表明,当神经元数目和训练时间趋于无穷大时,DGFM的泛化误差趋于零。
摘要:The aim of this article is to provide a firm mathematical foundation for the application of deep gradient flow methods (DGFMs) for the solution of (high-dimensional) partial differential equations (PDEs). We decompose the generalization error of DGFMs into an approximation and a training error. We first show that the solution of PDEs that satisfy reasonable and verifiable assumptions can be approximated by neural networks, thus the approximation error tends to zero as the number of neurons tends to infinity. Then, we derive the gradient flow that the training process follows in the ``wide network limit'' and analyze the limit of this flow as the training time tends to infinity. These results combined show that the generalization error of DGFMs tends to zero as the number of neurons and the training time tend to infinity.
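摘要中的误差分解可以示意性地写成如下形式(记号为本条目注释所设,并非论文原文):

```latex
% u^* 为 PDE 真解,u^{NN} 为网络类中的最优逼近,u_\theta(t) 为训练 t 时刻的网络:
\underbrace{\| u_\theta(t) - u^* \|}_{\text{泛化误差}}
\;\le\;
\underbrace{\| u^{NN} - u^* \|}_{\text{近似误差}\,\to\,0\ (\text{宽度}\to\infty)}
\;+\;
\underbrace{\| u_\theta(t) - u^{NN} \|}_{\text{训练误差}\,\to\,0\ (t\to\infty)}
```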


【2】Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
标题:偏好反馈零和博弈的非正规线性收敛
链接:https://arxiv.org/abs/2512.24818

作者:Shulun Chen,Runlong Zhou,Zihan Zhang,Maryam Fazel,Simon S. Du
备注:28 pages
摘要:将大型语言模型(LLM)与人类偏好对齐已被证明可以有效增强模型能力,但使用Bradley-Terry模型的标准偏好建模假设了传递性,忽略了人类群体偏好的固有复杂性。基于人类反馈的纳什学习(NLHF)通过将非传递性偏好构造为两人零和博弈来解决这一问题,此时对齐问题归结为寻找纳什均衡(NE)。然而,现有算法通常依赖正则化,在计算原始博弈的对偶间隙时会产生不可避免的偏差。在这项工作中,我们为NLHF中的乐观乘性权重更新($\mathtt{OMWU}$)提供了首个收敛保证:只要存在全支撑的NE,它在burn-in阶段之后即可实现最后迭代(last-iterate)线性收敛,并以实例相关的线性速率收敛到原博弈的NE(以对偶间隙衡量)。与Wei等人(2020)的先前结果相比,我们不需要NE唯一性的假设。我们的分析识别出一种新的边缘收敛行为:很少被选中的动作的概率从指数小的初值开始指数增长,从而使收敛速率对实例相关常数的依赖比先前结果好指数级。实验在表格型和神经网络策略类上均证实了$\mathtt{OMWU}$的理论优势,展示了其在LLM应用中的潜力。
摘要:Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes transitivity, overlooking the inherent complexity of human population preferences. Nash learning from human feedback (NLHF) addresses this by framing non-transitive preferences as a two-player zero-sum game, where alignment reduces to finding the Nash equilibrium (NE). However, existing algorithms typically rely on regularization, incurring unavoidable bias when computing the duality gap in the original game. In this work, we provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($\mathtt{OMWU}$) in NLHF, showing that it achieves last-iterate linear convergence after a burn-in phase whenever an NE with full support exists, with an instance-dependent linear convergence rate to the original NE, measured by duality gaps. Compared to prior results in Wei et al. (2020), we do not require the assumption of NE uniqueness. Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values, enabling exponentially better dependence on instance-dependent constants than prior results. Experiments corroborate the theoretical strengths of $\mathtt{OMWU}$ in both tabular and neural policy classes, demonstrating its potential for LLM applications.
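下面给出矩阵博弈上OMWU的一个常见形式的最小实现(梯度项用"2×当前 - 上一步"的乐观外推;收益矩阵与步长为示例假设)。以石头剪刀布为例,其唯一纳什均衡为均匀分布:

```python
# 乐观乘性权重更新(OMWU):行玩家最大化 x^T A y,列玩家最小化之。
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])          # 石头剪刀布收益矩阵
eta, T = 0.1, 5000
x = np.ones(3) / 3; y = np.ones(3) / 3
gx_prev, gy_prev = A @ y, A.T @ x

for _ in range(T):
    gx, gy = A @ y, A.T @ x               # 双方当前的(损益)梯度
    x = x * np.exp(eta * (2 * gx - gx_prev))    # 乐观外推:2*当前 - 上一步
    y = y * np.exp(-eta * (2 * gy - gy_prev))
    x /= x.sum(); y /= y.sum()
    gx_prev, gy_prev = gx, gy

print(np.round(x, 3), np.round(y, 3))     # 应接近均匀分布 (1/3, 1/3, 1/3)
```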


【3】Early Prediction of Sepsis using Heart Rate Signals and Genetic Optimized LSTM Algorithm
标题:基于心率信号和遗传优化LSTM算法的脓毒症早期预测
链接:https://arxiv.org/abs/2512.24253

作者:Alireza Rafiei,Farshid Hajati,Alireza Rezaee,Amirhossien Panahi,Shahadat Uddin
摘要:脓毒症的特征在于对感染的免疫应答失调,导致显著的死亡率、发病率和医疗成本。及时预测脓毒症进展对于通过早期干预减少不良结局至关重要。尽管针对重症监护病房(ICU)患者开发了许多模型,但在非病房环境中早期检测脓毒症的方法仍存在显著差距。本研究介绍并评估了四种新型机器学习算法,旨在通过分析心率数据在可穿戴设备上预测脓毒症的发作。这些模型的体系结构通过遗传算法加以改进,在性能、计算复杂度和内存需求之间进行优化。随后提取每个模型的性能指标,以评估其在能够准确监测心率的可穿戴设备上实现的可行性。这些模型最初是为一小时的预测窗口量身定制的,后来通过迁移学习扩展到四小时。这项研究令人鼓舞的结果表明,可穿戴技术有潜力在ICU和病房环境之外促进早期脓毒症检测。
摘要:Sepsis, characterized by a dysregulated immune response to infection, results in significant mortality, morbidity, and healthcare costs. The timely prediction of sepsis progression is crucial for reducing adverse outcomes through early intervention. Despite the development of numerous models for Intensive Care Unit (ICU) patients, there remains a notable gap in approaches for the early detection of sepsis in non-ward settings. This research introduces and evaluates four novel machine learning algorithms designed for predicting the onset of sepsis on wearable devices by analyzing heart rate data. The architecture of these models was refined through a genetic algorithm, optimizing for performance, computational complexity, and memory requirements. Performance metrics were subsequently extracted for each model to evaluate their feasibility for implementation on wearable devices capable of accurate heart rate monitoring. The models were initially tailored for a prediction window of one hour, later extended to four hours through transfer learning. The encouraging outcomes of this study suggest the potential for wearable technology to facilitate early sepsis detection outside ICU and ward environments.


【4】Neural Optimal Design of Experiment for Inverse Problems
标题:反问题的神经最优实验设计
链接:https://arxiv.org/abs/2512.23763

作者:John E. Darges,Babak Maboudi Afkham,Matthias Chung
摘要:我们介绍了神经最优实验设计(NODE),这是一个用于反问题中最优实验设计的基于学习的框架,避免了经典的双层优化和间接的稀疏正则化。NODE在单个优化循环内联合训练神经重建模型和一组固定预算的连续设计变量,这些变量表示传感器位置、采样时间或测量角度。通过直接优化测量位置,而不是对密集网格上的候选点加权,所提出的方法在设计上强制稀疏性,消除了对l1调参的需要,并大大降低了计算复杂度。我们在解析可解的指数增长基准和MNIST图像采样上验证了NODE,并在真实世界的稀疏视角X射线CT例子上展示了其有效性。在所有情况下,NODE均优于基线方法,表现出更高的重建精度和任务特定性能。
摘要:We introduce Neural Optimal Design of Experiments, a learning-based framework for optimal experimental design in inverse problems that avoids classical bilevel optimization and indirect sparsity regularization. NODE jointly trains a neural reconstruction model and a fixed-budget set of continuous design variables representing sensor locations, sampling times, or measurement angles, within a single optimization loop. By optimizing measurement locations directly rather than weighting a dense grid of candidates, the proposed approach enforces sparsity by design, eliminates the need for l1 tuning, and substantially reduces computational complexity. We validate NODE on an analytically tractable exponential growth benchmark, on MNIST image sampling, and illustrate its effectiveness on a real world sparse view X ray CT example. In all cases, NODE outperforms baseline approaches, demonstrating improved reconstruction accuracy and task-specific performance.


【5】Fairness-Aware Insurance Pricing: A Multi-Objective Optimization Approach
标题:公平意识的保险定价:多目标优化方法
链接:https://arxiv.org/abs/2512.24747

作者:Tim J. Boonen,Xinyue Fan,Zixiao Quan
摘要:Machine learning improves predictive accuracy in insurance pricing but exacerbates trade-offs between competing fairness criteria across different discrimination measures, challenging regulators and insurers to reconcile profitability with equitable outcomes. While existing fairness-aware models offer partial solutions under GLM and XGBoost estimation methods, they remain constrained by single-objective optimization, failing to holistically navigate a conflicting landscape of accuracy, group fairness, individual fairness, and counterfactual fairness. To address this, we propose a novel multi-objective optimization framework that jointly optimizes all four criteria via the Non-dominated Sorting Genetic Algorithm II (NSGA-II), generating a diverse Pareto front of trade-off solutions. We use a specific selection mechanism to extract a premium on this front. Our results show that XGBoost outperforms GLM in accuracy but amplifies fairness disparities; the Orthogonal model excels in group fairness, while Synthetic Control leads in individual and counterfactual fairness. Our method consistently achieves a balanced compromise, outperforming single-model approaches.
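下面给出NSGA-II核心步骤之一,即非支配排序中第一层Pareto前沿的最小实现(O(n²m)的朴素写法,目标均取最小化)。完整NSGA-II还包括多层排序、拥挤度与遗传算子,溢价抽取机制以论文为准:

```python
# 第一层 Pareto 前沿:保留所有不被其他解支配的解。
import numpy as np

def pareto_front(F):
    """F: (n, m) 目标矩阵,所有目标均为最小化;返回第一前沿的索引。"""
    n = len(F)
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        # j 支配 i:j 在所有目标上不劣于 i,且至少一个目标严格更优
        better = (F <= F[i]).all(axis=1) & (F < F[i]).any(axis=1)
        dominated[i] = better.any()
    return np.where(~dominated)[0]

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
print(pareto_front(F))   # [0 1 2]:点 (3,3) 被 (2,2) 支配
```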


【6】Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
标题:静态重加权产生软拟Q迭代的局部收敛
链接:https://arxiv.org/abs/2512.23927

作者:Lars van der Laan,Nathan Kallus
摘要:Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function approximation and distribution shift. In the entropy-regularized setting, we show that the soft Bellman operator is locally contractive in the stationary norm of the soft-optimal policy, rather than in the behavior norm used by standard FQI. This geometric mismatch explains the instability of soft Q-iteration with function approximation in the absence of Bellman completeness. To restore contraction, we introduce stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. We prove local linear convergence under function approximation with geometrically damped weight-estimation errors, assuming approximate realizability. Our analysis further suggests that global convergence may be recovered by gradually reducing the softmax temperature, and that this continuation approach can extend to the hardmax limit under a mild margin condition.
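下面在一个随机生成的玩具表格MDP上演示软Bellman备份V(s) = τ·log Σ_a exp(Q(s,a)/τ),即软FQI的核心算子;论文提出的平稳分布重加权作用于回归步的样本权重,此处未实现,转移与奖励均为示例假设:

```python
# 表格型软 Q 迭代:反复应用软 Bellman 备份直至(近似)收敛。
import numpy as np

rng = np.random.default_rng(7)
S, A, gamma, tau = 4, 2, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] 为下一状态的分布
R = rng.uniform(size=(S, A))                 # 玩具奖励

Q = np.zeros((S, A))
for _ in range(200):
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))   # 软值函数(log-sum-exp)
    Q = R + gamma * P @ V                            # 软 Bellman 备份

print(np.round(Q, 3))
```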


预测|估计(10篇)

【1】PRISM: A hierarchical multiscale approach for time series forecasting
标题:PRISM:时间序列预测的分层多尺度方法
链接:https://arxiv.org/abs/2512.24898

作者:Zihao Chen,Alexandre Andre,Wenrui Ma,Ian Knight,Sergey Shuvaev,Eva Dyer
摘要:Forecasting is critical in areas such as finance, biology, and healthcare. Despite the progress in the field, making accurate forecasts remains challenging because real-world time series contain both global trends, local fine-grained structure, and features on multiple scales in between. Here, we present a new forecasting method, PRISM (Partitioned Representation for Iterative Sequence Modeling), that addresses this challenge through a learnable tree-based partitioning of the signal. At the root of the tree, a global representation captures coarse trends in the signal, while recursive splits reveal increasingly localized views of the signal. At each level of the tree, data are projected onto a time-frequency basis (e.g., wavelets or exponential moving averages) to extract scale-specific features, which are then aggregated across the hierarchy. This design allows the model to jointly capture global structure and local dynamics of the signal, enabling accurate forecasting. Experiments across benchmark datasets show that our method outperforms state-of-the-art methods for forecasting. Overall, these results demonstrate that our hierarchical approach provides a lightweight and flexible framework for forecasting multivariate time series. The code is available at https://github.com/nerdslab/prism.


【2】A Scalable Framework for logP Prediction: From Terabyte-Scale Data Integration to Interpretable Ensemble Modeling
标题:logP预测的可扩展框架:从太字节规模数据集成到可解释集成建模
链接:https://arxiv.org/abs/2512.24643

作者:Malikussaid,Septian Caesar Floresko,Ade Romadhony,Isman Kurniawan,Warih Maharani,Hilal Hudan Nuha
备注:18 pages, 15 figures, 4 equations, 2 algorithms, 6 tables, to be published in KST 2026, unabridged version
摘要:This study presents a large-scale predictive modeling framework for logP prediction using 426850 bioactive compounds rigorously curated from the intersection of three authoritative chemical databases: PubChem, ChEMBL, and eMolecules. We developed a novel computational infrastructure to address the data integration challenge, reducing processing time from a projected over 100 days to 3.2 hours through byte-offset indexing architecture, a 740-fold improvement. Our comprehensive analysis revealed critical insights into the multivariate nature of lipophilicity: while molecular weight exhibited weak bivariate correlation with logP, SHAP analysis on ensemble models identified it as the single most important predictor globally. We systematically evaluated multiple modeling approaches, discovering that linear models suffered from inherent heteroskedasticity that classical remediation strategies, including weighted least squares and Box-Cox transformation, failed to address. Tree-based ensemble methods, including Random Forest and XGBoost, proved inherently robust to this violation, achieving an R-squared of 0.765 and RMSE of 0.731 logP units on the test set. Furthermore, a stratified modeling strategy, employing specialized models for drug-like molecules (91 percent of dataset) and extreme cases (nine percent), achieved optimal performance: an RMSE of 0.838 for the drug-like subset and an R-squared of 0.767 for extreme molecules, the highest of all evaluated approaches. These findings provide actionable guidance for molecular design, establish robust baselines for lipophilicity prediction using only 2D descriptors, and demonstrate that well-curated, descriptor-based ensemble models remain competitive with state-of-the-art graph neural network architectures.
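摘要中的"字节偏移索引"通常指如下做法:先顺序扫描一遍大文件并记录每条记录起始的字节偏移,之后即可用seek做O(1)随机定位,避免反复的全文件扫描。文件名与记录格式均为示例假设,并非论文数据管线的实际代码:

```python
# 构建一次字节偏移索引,随后任意记录都可直接 seek 读取。
def build_offset_index(path):
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:            # 按行(一条记录一行)累计字节偏移
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_record(path, offsets, i):
    with open(path, "rb") as f:
        f.seek(offsets[i])        # 直接跳到第 i 条记录的起始字节
        return f.readline().decode("utf-8").rstrip("\n")

# 用法示意:index = build_offset_index("compounds.smi")
#           rec = read_record("compounds.smi", index, 12345)
```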


【3】What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
标题:是什么推动了联合嵌入预测世界模型物理规划的成功?
链接:https://arxiv.org/abs/2512.24497

作者 :Basile Terver,Tsung-Yen Yang,Jean Ponce,Adrien Bardes,Yann LeCun
摘要:A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently use it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conducted experiments using both simulated environments and real-world robotic data, and studied how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.
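摘要中"用世界模型做规划"的一个常见基线是随机打靶(random shooting):采样候选动作序列,在学到的表征空间里前滚并按与目标表征的距离打分。下面是一个自包含的玩具草图,world_model接口与ToyModel均为假设,并非JEPA-WMs的实际API:

```python
# 随机打靶规划:采样动作序列 -> 模型前滚 -> 取代价最小序列的首个动作。
import numpy as np

def plan(world_model, z0, z_goal, horizon=10, n_samples=256, act_dim=2, rng=None):
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1, 1, size=(n_samples, horizon, act_dim))
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        z = z0
        for t in range(horizon):
            z = world_model.step(z, actions[i, t])   # 潜空间一步预测(假设接口)
            costs[i] += np.linalg.norm(z - z_goal)   # 表征空间距离作为代价
    return actions[np.argmin(costs), 0]              # 执行最优序列的首个动作

class ToyModel:                                      # 玩具线性"世界模型",仅为演示
    def step(self, z, a):
        return 0.9 * z + 0.1 * np.pad(a, (0, len(z) - len(a)))

print(plan(ToyModel(), z0=np.zeros(4), z_goal=np.ones(4)))
```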


【4】Generative forecasting with joint probability models
标题:使用联合概率模型的生成预测
链接:https://arxiv.org/abs/2512.24446

作者:Patrick Wyrod,Ashesh Chattopadhyay,Daniele Venturi
备注:18 pages, 11 figures
摘要:Chaotic dynamical systems exhibit strong sensitivity to initial conditions and often contain unresolved multiscale processes, making deterministic forecasting fundamentally limited. Generative models offer an appealing alternative by learning distributions over plausible system evolutions; yet, most existing approaches focus on next-step conditional prediction rather than the structure of the underlying dynamics. In this work, we reframe forecasting as a fully generative problem by learning the joint probability distribution of lagged system states over short temporal windows and obtaining forecasts through marginalization. This new perspective allows the model to capture nonlinear temporal dependencies, represent multistep trajectory segments, and produce next-step predictions consistent with the learned joint distribution. We also introduce a general, model-agnostic training and inference framework for joint generative forecasting and show how it enables assessment of forecast robustness and reliability using three complementary uncertainty quantification metrics (ensemble variance, short-horizon autocorrelation, and cumulative Wasserstein drift), without access to ground truth. We evaluate the performance of the proposed method on two canonical chaotic dynamical systems, the Lorenz-63 system and the Kuramoto-Sivashinsky equation, and show that joint generative models yield improved short-term predictive skill, preserve attractor geometry, and achieve substantially more accurate long-range statistical behaviour than conventional conditional next-step models.


【5】Empower Low-Altitude Economy: A Reliability-Aware Dynamic Weighting Allocation for Multi-modal UAV Beam Prediction
标题:赋能低空经济:面向多模态无人机波束预测的可靠性感知动态权重分配
链接:https://arxiv.org/abs/2512.24324

作者:Haojin Li,Anbang Zhang,Chen Sun,Chenyuan Feng,Kaiqian Qu,Tony Q. S. Quek,Haijun Zhang
摘要:The low-altitude economy (LAE) is rapidly expanding driven by urban air mobility, logistics drones, and aerial sensing, while fast and accurate beam prediction in uncrewed aerial vehicles (UAVs) communications is crucial for achieving reliable connectivity. Current research is shifting from single-signal to multi-modal collaborative approaches. However, existing multi-modal methods mostly employ fixed or empirical weights, assuming equal reliability across modalities at any given moment. Indeed, the importance of different modalities fluctuates dramatically with UAV motion scenarios, and static weighting amplifies the negative impact of degraded modalities. Furthermore, modal mismatch and weak alignment further undermine cross-scenario generalization. To this end, we propose a reliability-aware dynamic weighting scheme applied to a semantic-aware multi-modal beam prediction framework, named SaM2B. Specifically, SaM2B leverages lightweight cues such as environmental visual, flight posture, and geospatial data to adaptively allocate contributions across modalities at different time points through reliability-aware dynamic weight updates. Moreover, by utilizing cross-modal contrastive learning, we align the "multi-source representation beam semantics" associated with specific beam information to a shared semantic space, thereby enhancing discriminative power and robustness under modal noise and distribution shifts. Experiments on real-world low-altitude UAV datasets show that SaM2B achieves more satisfactory results than baseline methods.


【6】Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction
标题:彩色弹球:用于保形预测的条件保证的密度加权分位数回归
链接:https://arxiv.org/abs/2512.24139

作者:Qianyi Chen,Bo Li
摘要:While conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. Although exact distribution-free conditional coverage is impossible with finite samples, recent work has focused on improving the conditional coverage of standard conformal procedures. Distinct from approaches that target relaxed notions of conditional coverage, we directly minimize the mean squared error of conditional coverage by refining the quantile regression components that underpin many conformal methods. Leveraging a Taylor expansion, we derive a sharp surrogate objective for quantile regression: a density-weighted pinball loss, where the weights are given by the conditional density of the conformity score evaluated at the true quantile. We propose a three-headed quantile network that estimates these weights via finite differences using auxiliary quantile levels at \(1-α\pm δ\), subsequently fine-tuning the central quantile by optimizing the weighted loss. We provide a theoretical analysis with exact non-asymptotic guarantees characterizing the resulting excess risk. Extensive experiments on diverse high-dimensional real-world datasets demonstrate remarkable improvements in conditional coverage performance.
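下面给出pinball(分位数)损失及摘要所述密度加权形式的最小草图:条件密度在真分位数处用辅助分位数q_{1-α±δ}的有限差分近似。此处用固定数值代替三头网络的输出,仅为说明权重的构造方式:

```python
# 密度加权 pinball 损失:w ≈ 2*delta / (q_{1-a+d} - q_{1-a-d})。
import numpy as np

def pinball(y, q, alpha):
    r = y - q
    return np.maximum(alpha * r, (alpha - 1) * r)

def density_weight(q_hi, q_lo, delta):
    # 条件密度在真分位数处的有限差分近似
    return 2 * delta / np.maximum(q_hi - q_lo, 1e-8)

y = np.array([1.2, 0.7, 2.1])
q, q_lo, q_hi = np.array([1.0, 1.0, 1.0]), 0.8, 1.3   # 假设的中心/辅助分位数
w = density_weight(q_hi, q_lo, delta=0.05)
print((w * pinball(y, q, alpha=0.9)).mean())          # 加权后的训练目标
```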


【7】Multi-Scenario Highway Lane-Change Intention Prediction: A Temporal Physics-Informed Multi-Modal Framework
标题:多场景高速公路车道变更意图预测:时间物理信息多模式框架
链接:https://arxiv.org/abs/2512.24075

作者:Jiazhao Shi,Ziyu Wang,Yichen Lin,Shoufeng Lu
摘要:Lane-change intention prediction is safety-critical for autonomous driving and ADAS, but remains difficult in naturalistic traffic due to noisy kinematics, severe class imbalance, and limited generalization across heterogeneous highway scenarios. We propose Temporal Physics-Informed AI (TPI-AI), a hybrid framework that fuses deep temporal representations with physics-inspired interaction cues. A two-layer bidirectional LSTM (Bi-LSTM) encoder learns compact embeddings from multi-step trajectory histories; we concatenate these embeddings with kinematics-, safety-, and interaction-aware features (e.g., headway, TTC, and safe-gap indicators) and train a LightGBM classifier for three-class intention recognition (No-LC, Left-LC, Right-LC). To improve minority-class reliability, we apply imbalance-aware optimization including resampling/weighting and fold-wise threshold calibration. Experiments on two large-scale drone-based datasets, highD (straight highways) and exiD (ramp-rich environments), use location-based splits and evaluate prediction horizons T = 1, 2, 3 s. TPI-AI outperforms standalone LightGBM and Bi-LSTM baselines, achieving macro-F1 of 0.9562, 0.9124, 0.8345 on highD and 0.9247, 0.8197, 0.7605 on exiD at T = 1, 2, 3 s, respectively. These results show that combining physics-informed interaction features with learned temporal embeddings yields robust multi-scenario lane-change intention prediction.


【8】Exploring the Potential of Spiking Neural Networks in UWB Channel Estimation
标题:探索尖峰神经网络在超宽带信道估计中的潜力
链接:https://arxiv.org/abs/2512.23975

作者:Youdong Zhang,Xu He,Xiaolin Meng
摘要:Although existing deep learning-based Ultra-Wide Band (UWB) channel estimation methods achieve high accuracy, their computational intensity clashes sharply with the resource constraints of low-cost edge devices. Motivated by this, this letter explores the potential of Spiking Neural Networks (SNNs) for this task and develops a fully unsupervised SNN solution. To enable a comprehensive performance analysis, we devise an extensive set of comparative strategies and evaluate them on a compelling public benchmark. Experimental results show that our unsupervised approach still attains 80% test accuracy, on par with several supervised deep learning-based strategies. Moreover, compared with complex deep learning methods, our SNN implementation is inherently suited to neuromorphic deployment and offers a drastic reduction in model complexity, bringing significant advantages for future neuromorphic practice.


【9】Efficient Deep Learning for Short-Term Solar Irradiance Time Series Forecasting: A Benchmark Study in Ho Chi Minh City
标题:短期太阳辐射时间序列预测的高效深度学习:胡志明市的基准研究
链接:https://arxiv.org/abs/2512.23898

作者:Tin Hoang
备注:preprint, 40 pages
摘要:Reliable forecasting of Global Horizontal Irradiance (GHI) is essential for mitigating the variability of solar energy in power grids. This study presents a comprehensive benchmark of ten deep learning architectures for short-term (1-hour ahead) GHI time series forecasting in Ho Chi Minh City, leveraging high-resolution NSRDB satellite data (2011-2020) to compare established baselines (e.g. LSTM, TCN) against emerging state-of-the-art architectures, including Transformer, Informer, iTransformer, TSMixer, and Mamba. Experimental results identify the Transformer as the superior architecture, achieving the highest predictive accuracy with an R^2 of 0.9696. The study further utilizes SHAP analysis to contrast the temporal reasoning of these architectures, revealing that Transformers exhibit a strong "recency bias" focused on immediate atmospheric conditions, whereas Mamba explicitly leverages 24-hour periodic dependencies to inform predictions. Furthermore, we demonstrate that Knowledge Distillation can compress the high-performance Transformer by 23.5% while surprisingly reducing error (MAE: 23.78 W/m^2), offering a proven pathway for deploying sophisticated, low-latency forecasting on resource-constrained edge devices.


【10】Assessing generative modeling approaches for free energy estimates in condensed matter
标题:评估凝聚物质自由能估计的生成建模方法
链接:https://arxiv.org/abs/2512.23930

作者:Maximilian Schebek,Jiajun He,Emil Hoffmann,Yuanqi Du,Frank Noé,Jutta Rogal
摘要:The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple intermediate states to ensure sufficient overlap in phase space and are, consequently, computationally expensive. Several generative-model-based methods have recently addressed this challenge by learning a direct bridge between distributions, bypassing the need for intermediate states. However, it remains unclear which approaches provide the best trade-off between efficiency, accuracy, and scalability. In this work, we systematically review these methods and benchmark selected approaches with a focus on condensed-matter systems. In particular, we investigate the performance of discrete and continuous normalizing flows in the context of targeted free energy perturbation as well as FEAT (Free energy Estimators with Adaptive Transport) together with the escorted Jarzynski equality, using coarse-grained monatomic ice and Lennard-Jones solids as benchmark systems. We evaluate accuracy, data efficiency, computational cost, and scalability with system size. Our results provide a quantitative framework for selecting effective free energy estimation strategies in condensed-phase systems.
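作为这些生成式估计器所要改进的经典基线之一,下面给出Zwanzig自由能微扰恒等式ΔF = -kT·log E₀[exp(-ΔU/kT)]的一维玩具验证(两个同刚度谐振子,解析结果为ΔF = 0):

```python
# 自由能微扰(Zwanzig)估计:从状态0的玻尔兹曼分布采样,对能量差做指数平均。
import numpy as np

rng = np.random.default_rng(3)
kT = 1.0
# 状态0: U0(x) = x^2/2;状态1: U1(x) = (x-1)^2/2;两者配分函数相同,ΔF = 0
x0 = rng.normal(0.0, np.sqrt(kT), size=200_000)
dU = 0.5 * (x0 - 1.0) ** 2 - 0.5 * x0 ** 2
dF = -kT * np.log(np.mean(np.exp(-dU / kT)))
print(dF)   # 应接近 0
```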


其他神经网络|深度学习|模型|建模(31篇)

【1】ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
标题:ResponseRank:通过偏好强度学习实现数据高效的奖励建模
链接:https://arxiv.org/abs/2512.25023

作者:Timo Kaufmann,Yannick Metz,Daniel Keim,Eyke Hüllermeier
备注:NeurIPS 2025
摘要:Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grapes, but which preference is stronger? Strength is crucial for decision-making under uncertainty and generalization of preference models, but hard to measure reliably. Metadata such as response times and inter-annotator agreement can serve as proxies for strength, but are often noisy and confounded. We propose ResponseRank to address the challenge of learning from noisy strength signals. Our method uses relative differences in proxy signals to rank responses to pairwise comparisons by their inferred preference strength. To control for systemic variation, we compare signals only locally within carefully constructed strata. This enables robust learning of utility differences consistent with strength-derived rankings while making minimal assumptions about the strength signal. Our contributions are threefold: (1) ResponseRank, a novel method that robustly learns preference strength by leveraging locally valid relative strength signals; (2) empirical evidence of improved sample efficiency and robustness across diverse tasks: synthetic preference learning (with simulated response times), language modeling (with annotator agreement), and RL control tasks (with simulated episode returns); and (3) the Pearson Distance Correlation (PDC), a novel metric that isolates cardinal utility learning from ordinal accuracy.


【2】MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
标题:MSACL:具有李雅普诺夫证书的多步Actor-Critic学习,用于指数镇定控制
链接:https://arxiv.org/abs/2512.24955

作者:Yongwei Zhang,Yuanzhe Xing,Quan Quan,Zhikun She
摘要:Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a framework that integrates exponential stability theory with maximum entropy RL through multi-step Lyapunov certificate learning. Unlike methods relying on complex reward engineering, MSACL utilizes off-policy multi-step data to learn Lyapunov certificates satisfying theoretical stability conditions. By introducing Exponential Stability Labels (ESL) and a $λ$-weighted aggregation mechanism, the framework effectively balances the bias-variance trade-off in multi-step learning. Policy optimization is guided by a stability-aware advantage function, ensuring the learned policy promotes rapid Lyapunov descent. We evaluate MSACL across six benchmarks, including stabilization and nonlinear tracking tasks, demonstrating its superiority over state-of-the-art Lyapunov-based RL algorithms. MSACL achieves exponential stability and rapid convergence under simple rewards, while exhibiting significant robustness to uncertainties and generalization to unseen trajectories. Sensitivity analysis establishes the multi-step horizon $n=20$ as a robust default across diverse systems. By linking Lyapunov theory with off-policy actor-critic frameworks, MSACL provides a foundation for verifiably safe learning-based control. Source code and benchmark environments will be made publicly available.


【3】Gradient Descent as Implicit EM in Distance-Based Neural Models
标题:梯度下降作为基于距离的神经模型中的隐式EM
链接:https://arxiv.org/abs/2512.24780

作者:Alan Oursland
备注:15 pages
摘要:Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. These phenomena appear across architectures -- in attention mechanisms, classification heads, and energy-based models -- yet existing explanations rely on loose analogies to mixture models or post-hoc architectural interpretation. We provide a direct derivation. For any objective with log-sum-exp structure over distances or energies, the gradient with respect to each distance is exactly the negative posterior responsibility of the corresponding component: $\partial L / \partial d_j = -r_j$. This is an algebraic identity, not an approximation. The immediate consequence is that gradient descent on such objectives performs expectation-maximization implicitly -- responsibilities are not auxiliary variables to be computed but gradients to be applied. No explicit inference algorithm is required because inference is embedded in optimization. This result unifies three regimes of learning under a single mechanism: unsupervised mixture modeling, where responsibilities are fully latent; attention, where responsibilities are conditioned on queries; and cross-entropy classification, where supervision clamps responsibilities to targets. The Bayesian structure recently observed in trained transformers is not an emergent property but a necessary consequence of the objective geometry. Optimization and inference are the same process.
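摘要中的恒等式可以直接数值验证:取对数似然形式L = log Σ_j exp(-d_j),其对每个距离的偏导恰为负的后验责任。下面的草图用中心差分核对这一点:

```python
# 数值验证:dL/dd_j = -r_j,其中 r_j 为 softmin 责任。
import numpy as np

rng = np.random.default_rng(4)
d = rng.uniform(0.1, 3.0, size=5)              # 任意一组"距离/能量"
r = np.exp(-d) / np.exp(-d).sum()              # 后验责任(softmin)

L = lambda v: np.log(np.exp(-v).sum())         # log-sum-exp 目标
eps = 1e-6
grad = np.empty_like(d)
for j in range(len(d)):
    dp, dm = d.copy(), d.copy()
    dp[j] += eps; dm[j] -= eps
    grad[j] = (L(dp) - L(dm)) / (2 * eps)      # 中心差分梯度

print(np.allclose(grad, -r, atol=1e-6))        # True:梯度恰为 -责任
```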


【4】Nested Learning: The Illusion of Deep Learning Architectures
标题:嵌套学习:深度学习架构的幻觉
链接:https://arxiv.org/abs/2512.24695

作者:Ali Behrouz,Meisam Razaviyayn,Peilin Zhong,Vahab Mirrokni
备注:A version of this work is published at Neural Information Processing Systems (NeurIPS) 2025
摘要:Despite the recent progresses, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find effective solutions. In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model with a set of nested, multi-level, and/or parallel optimization problems, each of which with its own context flow. Through the lenses of NL, existing deep learning methods learns from data through compressing their own context flow, and in-context learning naturally emerges in large models. NL suggests a philosophy to design more expressive learning algorithms with more levels, resulting in higher-order in-context learning and potentially unlocking effective continual learning capabilities. We advocate for NL by presenting three core contributions: (1) Expressive Optimizers: We show that known gradient-based optimizers, such as Adam, SGD with Momentum, etc., are in fact associative memory modules that aim to compress the gradients' information (by gradient descent). Building on this insight, we present other more expressive optimizers with deep memory and/or more powerful learning rules; (2) Self-Modifying Learning Module: Taking advantage of NL's insights on learning algorithms, we present a sequence model that learns how to modify itself by learning its own update algorithm; and (3) Continuum Memory System: We present a new formulation for memory system that generalizes the traditional viewpoint of long/short-term memory. Combining our self-modifying sequence model with the continuum memory system, we present a continual learning module, called Hope, showing promising results in language modeling, knowledge incorporation, and few-shot generalization tasks, continual learning, and long-context reasoning tasks.


【5】Generalising E-prop to Deep Networks
标题:将E-prop推广到深度网络
链接:https://arxiv.org/abs/2512.24506

作者:Beren Millidge
备注:30/12/25 initial upload
摘要:Recurrent networks are typically trained with backpropagation through time (BPTT). However, BPTT requires storing the history of all states in the network and then replaying them sequentially backwards in time. This computation appears extremely implausible for the brain to implement. Real Time Recurrent Learning (RTRL) proposes an mathematically equivalent alternative where gradient information is propagated forwards in time locally alongside the regular forward pass, however it has significantly greater computational complexity than BPTT which renders it impractical for large networks. E-prop proposes an approximation of RTRL which reduces its complexity to the level of BPTT while maintaining a purely online forward update which can be implemented by an eligibility trace at each synapse. However, works on RTRL and E-prop ubiquitously investigate learning in a single layer with recurrent dynamics. However, learning in the brain spans multiple layers and consists of both hierarchal dynamics in depth as well as time. In this mathematical note, we extend the E-prop framework to handle arbitrarily deep networks, deriving a novel recursion relationship across depth which extends the eligibility traces of E-prop to deeper layers. Our results thus demonstrate an online learning algorithm can perform accurate credit assignment across both time and depth simultaneously, allowing the training of deep recurrent networks without backpropagation through time.
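为直观说明"前向资格迹"的含义,下面给出单个漏积分循环单元上的最小草图:对h_t = α·h_{t-1} + w·x_t,资格迹e_t = α·e_{t-1} + x_t即dh_t/dw的在线递推,与学习信号相乘便得到无需沿时间反传的梯度贡献。这是e-prop的最简情形(此标量情形与RTRL重合);本文沿深度方向的新递推关系未在此展示:

```python
# 单个漏积分单元上的前向资格迹与在线梯度累积(平方损失,目标为 0)。
import numpy as np

rng = np.random.default_rng(5)
alpha, w = 0.9, 0.3
x = rng.normal(size=100)

h, e, grad_w = 0.0, 0.0, 0.0
for t in range(len(x)):
    h = alpha * h + w * x[t]
    e = alpha * e + x[t]              # 资格迹:dh_t/dw 的前向累积
    learning_signal = h - 0.0         # dL_t/dh_t(平方损失、玩具目标 0)
    grad_w += learning_signal * e     # 在线梯度贡献,无需反向穿越时间

print(grad_w)
```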


【6】Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
标题:小规模训练运行能否可靠地指导数据整理?重新思考代理模型实践
链接:https://arxiv.org/abs/2512.24503

作者:Jiachen T. Wang,Tong Wu,Kaifeng Lyu,James Zou,Dawn Song,Ruoxi Jia,Prateek Mittal
摘要:Data teams at frontier AI companies routinely train small proxy models to make critical decisions about pretraining data recipes for full-scale training runs. However, the community has a limited understanding of whether and when conclusions drawn from small-scale experiments reliably transfer to full-scale model training. In this work, we uncover a subtle yet critical issue in the standard experimental protocol for data recipe assessment: the use of identical small-scale model training configurations across all data recipes in the name of "fair" comparison. We show that the experiment conclusions about data quality can flip with even minor adjustments to training hyperparameters, as the optimal training configuration is inherently data-dependent. Moreover, this fixed-configuration protocol diverges from full-scale model development pipelines, where hyperparameter optimization is a standard step. Consequently, we posit that the objective of data recipe assessment should be to identify the recipe that yields the best performance under data-specific tuning. To mitigate the high cost of hyperparameter tuning, we introduce a simple patch to the evaluation protocol: using reduced learning rates for proxy model training. We show that this approach yields relative performance that strongly correlates with that of fully tuned large-scale LLM pretraining runs. Theoretically, we prove that for random-feature models, this approach preserves the ordering of datasets according to their optimal achievable loss. Empirically, we validate this approach across 23 data recipes covering four critical dimensions of data curation, demonstrating dramatic improvements in the reliability of small-scale experiments.


【7】Tubular Riemannian Laplace Approximations for Bayesian Neural Networks
标题:Bayesian神经网络的管状Riemannian拉普拉斯逼近
链接:https://arxiv.org/abs/2512.24381

作者:Rodrigo Pereira David
摘要:Laplace approximations are among the simplest and most practical methods for approximate Bayesian inference in neural networks, yet their Euclidean formulation struggles with the highly anisotropic, curved loss surfaces and large symmetry groups that characterize modern deep models. Recent work has proposed Riemannian and geometric Gaussian approximations to adapt to this structure. Building on these ideas, we introduce the Tubular Riemannian Laplace (TRL) approximation. TRL explicitly models the posterior as a probabilistic tube that follows a low-loss valley induced by functional symmetries, using a Fisher/Gauss-Newton metric to separate prior-dominated tangential uncertainty from data-dominated transverse uncertainty. We interpret TRL as a scalable reparametrised Gaussian approximation that utilizes implicit curvature estimates to operate in high-dimensional parameter spaces. Our empirical evaluation on ResNet-18 (CIFAR-10 and CIFAR-100) demonstrates that TRL achieves excellent calibration, matching or exceeding the reliability of Deep Ensembles (in terms of ECE) while requiring only a fraction (1/5) of the training cost. TRL effectively bridges the gap between single-model efficiency and ensemble-grade reliability.


【8】Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
标题:通过基于策略梯度的掩码学习联合选择大规模预训练数据
链接:https://arxiv.org/abs/2512.24265

作者:Ziqing Fan,Yuqiao Xian,Yan Sun,Li Shen
摘要:A fine-grained data recipe is crucial for pre-training large language models, as it can significantly enhance training efficiency and model performance. One important ingredient in the recipe is to select samples based on scores produced by defined rules, LLM judgment, or statistical information in embeddings, which can be roughly categorized into quality and diversity metrics. Due to the high computational cost when applied to trillion-scale token pre-training datasets such as FineWeb and DCLM, these two or more types of metrics are rarely considered jointly in a single selection process. However, in our empirical study, selecting samples based on quality metrics exhibits severe diminishing returns during long-term pre-training, while selecting on diversity metrics removes too many valuable high-quality samples, both of which limit pre-trained LLMs' capabilities. Therefore, we introduce DATAMASK, a novel and efficient joint learning framework designed for large-scale pre-training data selection that can simultaneously optimize multiple types of metrics in a unified process, with this study focusing specifically on quality and diversity metrics. DATAMASK approaches the selection process as a mask learning problem, involving iterative sampling of data masks, computation of policy gradients based on predefined objectives with sampled masks, and updating of mask sampling logits. Through policy gradient-based optimization and various acceleration enhancements, it significantly reduces selection time by 98.9% compared to a greedy algorithm, enabling our study to explore joint learning within trillion-scale tokens. With DATAMASK, we select a subset of about 10% from the 15 trillion-token FineWeb dataset, termed FineWeb-Mask. Evaluated across 12 diverse tasks, we achieve significant improvements of 3.2% on a 1.5B dense model and 1.9% on a 7B MoE model.
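The mask-learning loop described in the abstract is, at its core, REINFORCE over Bernoulli mask logits. A toy, self-contained sketch of that core (the quality/diversity objective and every constant below are invented stand-ins; the paper's objectives and acceleration tricks are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
n, target = 1000, 0.10                 # toy pool; keep roughly 10% of samples
quality = rng.normal(size=n)           # stand-in per-sample quality scores
emb = rng.normal(size=(n, 16))         # stand-in embeddings for a diversity term
logits = np.zeros(n)                   # mask-sampling logits (the learned object)

def objective(m):
    # Reward average quality, penalize clustered picks and budget violations.
    spread = np.linalg.norm(emb[m].mean(axis=0))   # small when picks spread out
    return quality[m].mean() - 0.5 * spread - 10.0 * (m.mean() - target) ** 2

for step in range(200):
    p = 1.0 / (1.0 + np.exp(-logits))
    masks = rng.random((8, n)) < p                     # iterative mask sampling
    rewards = np.array([objective(m) for m in masks])
    adv = rewards - rewards.mean()                     # baseline for variance
    grad = (adv[:, None] * (masks - p)).mean(axis=0)   # d log Bernoulli / d logits
    logits += 0.5 * grad                               # update sampling logits
```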


【9】Paired Seed Evaluation: Statistical Reliability for Learning-Based Simulators
标题:配对种子评估:基于学习的模拟器的统计可靠性
链接:https://arxiv.org/abs/2512.24145

作者:Udit Sharma
备注:12 pages, 3 figures
摘要:Machine learning systems appear stochastic but are deterministically random, as seeded pseudorandom number generators produce identical realisations across executions. Learning-based simulators are widely used to compare algorithms, design choices, and interventions under such dynamics, yet evaluation outcomes often exhibit high variance due to random initialisation and learning stochasticity. We analyse the statistical structure of comparative evaluation in these settings and show that standard independent evaluation designs fail to exploit shared sources of randomness across alternatives. We formalise a paired seed evaluation design in which competing systems are evaluated under identical random seeds, inducing matched realisations of stochastic components and strict variance reduction whenever outcomes are positively correlated at the seed level. This yields tighter confidence intervals, higher statistical power, and effective sample size gains at fixed computational budgets. Empirically, seed-level correlations are typically large and positive, producing order-of-magnitude efficiency gains. Paired seed evaluation is weakly dominant in practice, improving statistical reliability when correlation is present and reducing to independent evaluation without loss of validity when it is not.
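The variance-reduction claim is easy to reproduce numerically. A minimal simulation, assuming outcomes of the two systems share an additive seed-level component (all magnitudes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_seeds = 30
seed_effect = rng.normal(0.0, 1.0, n_seeds)              # shared per-seed randomness
a = 0.30 + seed_effect + rng.normal(0, 0.1, n_seeds)     # system A, seeds 1..n
b = 0.25 + seed_effect + rng.normal(0, 0.1, n_seeds)     # system B, same seeds

paired = a - b                        # identical seeds: the shared term cancels
independent = a - rng.permutation(b)  # mimics evaluating B under fresh seeds

print(paired.std(ddof=1), independent.std(ddof=1))  # paired spread is far smaller
```

With positive seed-level correlation, the paired differences cancel the shared term, which is the strict variance reduction the paper formalizes.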


【10】GARDO: Reinforcing Diffusion Models without Reward Hacking
标题:GARDO:在没有奖励黑客的情况下加强扩散模型
链接:https://arxiv.org/abs/2512.24138

作者:Haoran He,Yuxiao Ye,Jie Liu,Jiajun Liang,Zhiyong Wang,Ziyang Yuan,Xintao Wang,Hangyu Mao,Pengfei Wan,Ling Pan
备注:17 pages. Project: https://tinnerhrhe.github.io/gardo_project
摘要:Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth objective for visual tasks remains challenging, the models are often optimized using a proxy reward that only partially captures the true goal. This mismatch often leads to reward hacking, where proxy scores increase while real image quality deteriorates and generation diversity collapses. While common solutions add regularization against the reference policy to prevent reward hacking, they compromise sample efficiency and impede the exploration of novel, high-reward regions, as the reference policy is usually sub-optimal. To address the competing demands of sample efficiency, effective exploration, and mitigation of reward hacking, we propose Gated and Adaptive Regularization with Diversity-aware Optimization (GARDO), a versatile framework compatible with various RL algorithms. Our key insight is that regularization need not be applied universally; instead, it is highly effective to selectively penalize a subset of samples that exhibit high uncertainty. To address the exploration challenge, GARDO introduces an adaptive regularization mechanism wherein the reference model is periodically updated to match the capabilities of the online policy, ensuring a relevant regularization target. To address the mode collapse issue in RL, GARDO amplifies the rewards for high-quality samples that also exhibit high diversity, encouraging mode coverage without destabilizing the optimization process. Extensive experiments across diverse proxy rewards and hold-out unseen metrics consistently show that GARDO mitigates reward hacking and enhances generation diversity without sacrificing sample efficiency or exploration, highlighting its effectiveness and robustness.


【11】Training a Huggingface Model on AWS Sagemaker (Without Tears)
标题:在AWS SageMaker上训练Hugging Face模型(无泪指南)
链接:https://arxiv.org/abs/2512.24098

作者:Liling Tan
摘要:The development of Large Language Models (LLMs) has primarily been driven by resource-rich research groups and industry partners. Due to the lack of on-premise computing resources required for increasingly complex models, many researchers are turning to cloud services like AWS SageMaker to train Hugging Face models. However, the steep learning curve of cloud platforms often presents a barrier for researchers accustomed to local environments. Existing documentation frequently leaves knowledge gaps, forcing users to seek fragmented information across the web. This demo paper aims to democratize cloud adoption by centralizing the essential information required for researchers to successfully train their first Hugging Face model on AWS SageMaker from scratch.
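For orientation, the canonical flow the paper walks through uses the SageMaker Python SDK's Hugging Face estimator, roughly as below. The role ARN, S3 paths, script names, and version strings are placeholders; the version combination must match an available Hugging Face Deep Learning Container:

```python
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",            # your Trainer-based training script
    source_dir="./scripts",            # directory uploaded with the job
    instance_type="ml.g5.xlarge",
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder ARN
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 3, "per_device_train_batch_size": 16},
)
estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 channel
```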


【12】Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing
标题:Causify DataFlow:高性能机器学习流计算框架
链接:https://arxiv.org/abs/2512.23977

作者:Giacinto Paolo Saggese,Paul Smith
摘要:We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial reimplementation when moving from batch prototypes to streaming production systems. This gap introduces causality violations, batch boundary artifacts, and poor reproducibility of real-time failures.   DataFlow resolves these issues through a unified execution model based on directed acyclic graphs (DAGs) with point-in-time idempotency: outputs at any time t depend only on a fixed-length context window preceding t. This guarantee ensures that models developed in batch mode execute identically in streaming production without code changes. The framework enforces strict causality by automatically tracking knowledge time across all transformations, eliminating future-peeking bugs.   DataFlow supports flexible tiling across temporal and feature dimensions, allowing the same model to operate at different frequencies and memory profiles via configuration alone. It integrates natively with the Python data science stack and provides fit/predict semantics for online learning, caching and incremental computation, and automatic parallelization through DAG-based scheduling. We demonstrate its effectiveness across domains including financial trading, IoT, fraud detection, and real-time analytics.
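The point-in-time idempotency guarantee can be illustrated independently of the framework: when a node's output at time t depends only on a fixed-length window ending at t, batch and tick-by-tick streaming execution must agree. A pandas sketch of that property (not DataFlow's actual API):

```python
import numpy as np
import pandas as pd

def node(series: pd.Series, window: int = 3) -> pd.Series:
    # Output at t depends only on the trailing fixed-length window ending at t.
    return series.rolling(window, min_periods=window).mean()

x = pd.Series(np.arange(10, dtype=float))
batch = node(x)                                # batch execution

stream = pd.Series(index=x.index, dtype=float)
for t in range(len(x)):                        # streaming replay, one tick at a time
    stream.iloc[t] = node(x.iloc[: t + 1]).iloc[-1]

assert batch.equals(stream)                    # identical results, no code change
```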


【13】Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
标题:利用基于超图的存储器改进多步RAG以实现长上下文复杂关系建模
链接:https://arxiv.org/abs/2512.23959

作者:Chulun Zhou,Chunkang Zhang,Guoxin Yu,Fandong Meng,Jie Zhou,Wai Lam,Mo Yu
备注:21 pages
摘要:Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Many RAG systems incorporate a working memory module to consolidate retrieved information. However, existing memory designs function primarily as passive storage that accumulates isolated facts for the purpose of condensing the lengthy inputs and generating new sub-queries through deduction. This static nature overlooks the crucial high-order correlations among primitive facts, the compositions of which can often provide stronger guidance for subsequent steps. Therefore, their representational strength and impact on multi-step reasoning and knowledge evolution are limited, resulting in fragmented reasoning and weak global sense-making capacity in extended contexts. We introduce HGMem, a hypergraph-based memory mechanism that extends the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph whose hyperedges correspond to distinct memory units, enabling the progressive formation of higher-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning in subsequent steps. We evaluate HGMem on several challenging datasets designed for global sense-making. Extensive experiments and in-depth analyses show that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse tasks.


【14】Interactive Machine Learning: From Theory to Scale
标题:交互式机器学习:从理论到规模
链接:https://arxiv.org/abs/2512.23924

作者:Yinglun Zhu
备注:Updated Ph.D. dissertation (typos corrected; minor technical and structural revisions)
摘要:Machine learning has achieved remarkable success across a wide range of applications, yet many of its most effective methods rely on access to large amounts of labeled data or extensive online interaction. In practice, acquiring high-quality labels and making decisions through trial-and-error can be expensive, time-consuming, or risky, particularly in large-scale or high-stakes settings. This dissertation studies interactive machine learning, in which the learner actively influences how information is collected or which actions are taken, using past observations to guide future interactions. We develop new algorithmic principles and establish fundamental limits for interactive learning along three dimensions: active learning with noisy data and rich model classes, sequential decision making with large action spaces, and model selection under partial feedback. Our results include the first computationally efficient active learning algorithms achieving exponential label savings without low-noise assumptions; the first efficient, general-purpose contextual bandit algorithms whose guarantees are independent of the size of the action space; and the first tight characterizations of the fundamental cost of model selection in sequential decision making. Overall, this dissertation advances the theoretical foundations of interactive learning by developing algorithms that are statistically optimal and computationally efficient, while also providing principled guidance for deploying interactive learning methods in large-scale, real-world settings.


【15】Rethinking Dense Linear Transformations: Stagewise Pairwise Mixing (SPM) for Near-Linear Training in Neural Networks
标题:重新思考密集线性变换:用于神经网络近线性训练的逐阶段成对混合(SPM)
链接:https://arxiv.org/abs/2512.23905

作者:Peter Farag
备注:16 pages
摘要:Dense linear layers are a dominant source of computational and parametric cost in modern machine learning models, despite their quadratic complexity and often being misaligned with the compositional structure of learned representations. We introduce Stagewise Pairwise Mixers (SPM), a structured linear operator that replaces dense matrices with a composition of sparse pairwise-mixing stages. An SPM layer implements a global linear transformation in $O(nL)$ time with $O(nL)$ parameters, where $L$ is typically constant or $\log_2 n$, and admits exact closed-form forward and backward computations. SPM is designed as a drop-in replacement for dense linear layers in feedforward networks, recurrent architectures, attention mechanisms, etc. We derive complete forward and backward expressions for two parameterizations: an orthogonal norm-preserving rotation-based variant and a fully general $2 \times 2$ mixing variant. Beyond computational savings, the stagewise structure of SPM induces an explicit compositional inductive bias that constrains model capacity and improves generalization when aligned with task structure. We present proof-of-concept experiments demonstrating substantial reductions in wall-clock cost and improved accuracy on structured learning problems, while retaining competitive performance on real-world benchmarks.
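One way to realize a sparse pairwise-mixing stage is a butterfly-style pairing in which index i is mixed with i XOR 2^s at stage s; the sketch below uses that pairing and a general 2x2 mix per pair, both of which are our assumptions for illustration rather than the paper's exact layout:

```python
import numpy as np

def spm_apply(x, stages):
    """Apply L sparse pairwise-mixing stages to x (length n, a power of two).
    stages[s] has shape (n // 2, 2, 2): one general 2x2 mix per pair."""
    n = x.size
    y = x.copy()
    for s, mix in enumerate(stages):
        stride = 1 << s
        idx = np.arange(n)
        top = idx[(idx & stride) == 0]   # each pair is (i, i ^ stride)
        bot = top ^ stride
        a, b = y[top].copy(), y[bot].copy()
        y[top] = mix[:, 0, 0] * a + mix[:, 0, 1] * b
        y[bot] = mix[:, 1, 0] * a + mix[:, 1, 1] * b
    return y

n, L = 16, 4                             # L = log2(n): O(nL) time and parameters
rng = np.random.default_rng(0)
stages = rng.normal(size=(L, n // 2, 2, 2))
out = spm_apply(rng.normal(size=n), list(stages))
```

Each stage touches every coordinate once with 2x2 blocks, so both work and parameter count are O(n) per stage, matching the stated O(nL) totals.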


【16】Trellis: Learning to Compress Key-Value Memory in Attention Models
标题:Trellis:学习压缩注意力模型中的键值记忆
链接:https://arxiv.org/abs/2512.23852

作者:Mahdi Karami,Ali Behrouz,Praneeth Kacham,Vahab Mirrokni
备注:In Second Conference on Language Modeling (COLM) (2025)
摘要:Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Transformer architecture with bounded memory that learns how to compress its key-value memory dynamically at test time. Trellis replaces the standard KV cache with a fixed-size memory and trains a two-pass recurrent compression mechanism to store new keys and values into memory. To achieve this, it leverages an online gradient descent procedure with a forget gate, enabling the compressed memory to be updated recursively while learning to retain important contextual information from incoming tokens at test time. Extensive experiments on language modeling, common-sense reasoning, recall-intensive tasks, and time series show that the proposed architecture outperforms strong baselines. Notably, its performance gains increase as the sequence length grows, highlighting its potential for long-context applications.
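One plausible, heavily simplified reading of the compression step: treat the fixed-size memory as fast weights updated at test time by a single online gradient step on a reconstruction loss, decayed by a forget gate. The PyTorch sketch below is schematic only; it does not reproduce the paper's two-pass recurrent mechanism, and every constant is illustrative:

```python
import torch

d, slots = 16, 32
M = torch.zeros(slots, d)          # fixed-size memory replacing the growing KV cache
lr, forget = 0.5, 0.95             # illustrative step size and forget gate

def write(M, k, v):
    # One online gradient step on || read(k) - v ||^2, then decay stale content.
    M = M.detach().requires_grad_(True)
    read = torch.softmax(M @ k, dim=0) @ M       # content-based read for key k
    loss = ((read - v) ** 2).sum()
    (g,) = torch.autograd.grad(loss, M)
    return (forget * M - lr * g).detach()

k, v = torch.randn(d), torch.randn(d)
M = write(M, k, v)                 # memory stays fixed-size as tokens stream in
```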


【17】Deep learning methods for inverse problems using connections between proximal operators and Hamilton-Jacobi equations
标题:利用邻近算子与Hamilton-Jacobi方程之间的联系求解逆问题的深度学习方法
链接:https://arxiv.org/abs/2512.23829

作者:Oluwatosin Akande,Gabriel P. Langlois,Akwum Onwunta
摘要:Inverse problems are important mathematical problems that seek to recover model parameters from noisy data. Since inverse problems are often ill-posed, they require regularization or incorporation of prior information about the underlying model or unknown variables. Proximal operators, ubiquitous in nonsmooth optimization, are central to this because they provide a flexible and convenient way to encode priors and build efficient iterative algorithms. They have also recently become key to modern machine learning methods, e.g., in plug-and-play methods with learned denoisers and in deep neural architectures for learning priors via proximal operators. The latter was developed partly due to recent work characterizing proximal operators of nonconvex priors as subdifferentials of convex potentials. In this work, we propose to leverage connections between proximal operators and Hamilton-Jacobi partial differential equations (HJ PDEs) to develop novel deep learning architectures for learning the prior. In contrast to other existing methods, we learn the prior directly without recourse to inverting the prior after training. We present several numerical results that demonstrate the efficiency of the proposed method in high dimensions.
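For orientation, the classical connection the title invokes is the Moreau-envelope/Hopf-Lax correspondence: for a convex prior $J$, the envelope solves a Hamilton-Jacobi equation whose gradient recovers the proximal operator.

```latex
\begin{aligned}
u(x,t) &= \min_{y} \Big\{ J(y) + \tfrac{1}{2t}\,\lVert x-y\rVert^2 \Big\}
  \quad \text{(Moreau envelope of } J\text{)}\\
&\text{solves} \quad \partial_t u + \tfrac{1}{2}\,\lVert \nabla_x u \rVert^2 = 0,
  \qquad u(x,0) = J(x),\\
\operatorname{prox}_{tJ}(x) &= x - t\,\nabla_x u(x,t).
\end{aligned}
```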


【18】MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling
标题:MS-SSM:一种用于高效序列建模的多尺度状态空间模型
链接:https://arxiv.org/abs/2512.23824

作者:Mahdi Karami,Ali Behrouz,Peilin Zhong,Razvan Pascanu,Vahab Mirrokni
备注:In Second Conference on Language Modeling (COLM) (2025)
摘要:State-space models (SSMs) have recently attracted attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast inference, parallelizable training, and control over recurrence stability. However, traditional SSMs often suffer from limited effective memory, requiring larger state sizes for improved recall. Moreover, existing SSMs struggle to capture multi-scale dependencies, which are essential for modeling complex structures in time series, images, and natural language. This paper introduces a multi-scale SSM framework that addresses these limitations by representing sequence dynamics across multiple resolutions and processing each resolution with specialized state-space dynamics. By capturing both fine-grained, high-frequency patterns and coarse, global trends, MS-SSM enhances memory efficiency and long-range modeling. We further introduce an input-dependent scale-mixer, enabling dynamic information fusion across resolutions. The proposed approach significantly improves sequence modeling, particularly in long-range and hierarchical tasks, while maintaining computational efficiency. Extensive experiments on benchmarks, including Long Range Arena, hierarchical reasoning, time series classification, and image recognition, demonstrate that MS-SSM consistently outperforms prior SSM-based models, highlighting the benefits of multi-resolution processing in state-space architectures.


【19】TabMixNN: A Unified Deep Learning Framework for Structural Mixed Effects Modeling on Tabular Data
标题:TabMixNN:用于表格数据结构混合效应建模的统一深度学习框架
链接:https://arxiv.org/abs/2512.23787

作者:Deniz Akdemir
摘要:We present TabMixNN, a flexible PyTorch-based deep learning framework that synthesizes classical mixed-effects modeling with modern neural network architectures for tabular data analysis. TabMixNN addresses the growing need for methods that can handle hierarchical data structures while supporting diverse outcome types including regression, classification, and multitask learning. The framework implements a modular three-stage architecture: (1) a mixed-effects encoder with variational random effects and flexible covariance structures, (2) backbone architectures including Generalized Structural Equation Models (GSEM) and spatial-temporal manifold networks, and (3) outcome-specific prediction heads supporting multiple outcome families. Key innovations include an R-style formula interface for accessibility, support for directed acyclic graph (DAG) constraints for causal structure learning, Stochastic Partial Differential Equation (SPDE) kernels for spatial modeling, and comprehensive interpretability tools including SHAP values and variance decomposition. We demonstrate the framework's flexibility through applications to longitudinal data analysis, genomic prediction, and spatial-temporal modeling. TabMixNN provides a unified interface for researchers to leverage deep learning while maintaining the interpretability and theoretical grounding of classical mixed-effects models.


【20】Exploring Cumulative Effects in Survival Data Using Deep Learning Networks
标题:使用深度学习网络探索生存数据的累积效应
链接:https://arxiv.org/abs/2512.23764

作者:Kang-Chung Yang,Shinsheng Yuan
摘要:In epidemiological research, modeling the cumulative effects of time-dependent exposures on survival outcomes presents a challenge due to their intricate temporal dynamics. Conventional spline-based statistical methods, though effective, require repeated data transformation for each spline parameter tuning, with survival analysis computations relying on the entire dataset, posing difficulties for large datasets. Meanwhile, existing neural network-based survival analysis methods focus on accuracy but often overlook the interpretability of cumulative exposure patterns. To bridge this gap, we introduce CENNSurv, a novel deep learning approach that captures dynamic risk relationships from time-dependent data. Evaluated on two diverse real-world datasets, CENNSurv revealed a multi-year lagged association between chronic environmental exposure and a critical survival outcome, as well as a critical short-term behavioral shift prior to subscription lapse. This demonstrates CENNSurv's ability to model complex temporal patterns with improved scalability. CENNSurv provides researchers studying cumulative effects a practical tool with interpretable insights.


【21】Learning Coupled System Dynamics under Incomplete Physical Constraints and Missing Data
标题:不完全物理约束和缺失数据下学习耦合系统动力学
链接:https://arxiv.org/abs/2512.23761

作者:Esha Saha,Hao Wang
备注:38 pages, 15 Figures, 15 Tables
摘要:Advances in data acquisition and computational methods have accelerated the use of differential equation based modelling for complex systems. Such systems are often described by two or more coupled variables, yet a governing equation is typically available for only one variable, while the remaining variables can be accessed only through data. This mismatch between known physics and observed data poses a fundamental challenge for existing physics-informed machine learning approaches, which generally assume either complete knowledge of the governing equations or full data availability across all variables. In this paper, we introduce MUSIC (Multitask Learning Under Sparse and Incomplete Constraints), a sparsity induced multitask neural network framework that integrates partial physical constraints with data-driven learning to recover full-dimensional solutions of coupled systems when physics-constrained and data-informed variables are mutually exclusive. MUSIC employs mesh-free (random) sampling of training data and sparsity regularization, yielding highly compressed models with improved training and evaluation efficiency. We demonstrate that MUSIC accurately learns solutions (shock wave solutions, discontinuous solutions, pattern formation solutions) to complex coupled systems under data-scarce and noisy conditions, consistently outperforming non-sparse formulations. These results highlight MUSIC as a flexible and effective approach for modeling partially observed systems with incomplete physical knowledge.


【22】Generalized Regularized Evidential Deep Learning Models: Theory and Comprehensive Evaluation
标题:广义正则化证据深度学习模型:理论与综合评估
链接:https://arxiv.org/abs/2512.23753

作者:Deep Shankar Pandey,Hyomin Choi,Qi Yu
备注:This work has been submitted to the IEEE for possible publication
摘要:Evidential deep learning (EDL) models, based on Subjective Logic, introduce a principled and computationally efficient way to make deterministic neural networks uncertainty-aware. The resulting evidential models can quantify fine-grained uncertainty using learned evidence. However, the Subjective-Logic framework constrains evidence to be non-negative, requiring specific activation functions whose geometric properties can induce activation-dependent learning-freeze behavior: a regime where gradients become extremely small for samples mapped into low-evidence regions. We theoretically characterize this behavior and analyze how different evidential activations influence learning dynamics. Building on this analysis, we design a general family of activation functions and corresponding evidential regularizers that provide an alternative pathway for consistent evidence updates across activation regimes. Extensive experiments on four benchmark classification problems (MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet), two few-shot classification problems, and blind face restoration problem empirically validate the developed theory and demonstrate the effectiveness of the proposed generalized regularized evidential models.
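The activation geometry under analysis is that of the standard EDL head, which maps logits to non-negative evidence and a Dirichlet concentration; saturated activation regions are where the learning-freeze regime described above arises. A minimal PyTorch rendering of that standard head (not the paper's proposed generalized family):

```python
import torch
import torch.nn.functional as F

def evidential_head(logits):
    # Standard EDL: non-negative evidence via softplus, Dirichlet alpha = e + 1.
    evidence = F.softplus(logits)         # near-zero gradient for very negative logits
    alpha = evidence + 1.0
    K = logits.shape[-1]
    prob = alpha / alpha.sum(-1, keepdim=True)   # expected class probabilities
    vacuity = K / alpha.sum(-1)                  # total (vacuity) uncertainty
    return prob, vacuity

prob, u = evidential_head(torch.randn(4, 10))    # batch of 4, 10 classes
```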


【23】A Comprehensive Study of Deep Learning Model Fixing Approaches
标题:深度学习模型修复方法的综合研究
链接:https://arxiv.org/abs/2512.23745

作者:Hanmo You,Zan Wang,Zishuo Dong,Luanqi Mo,Jianjun Zhao,Junjie Chen
摘要:Deep Learning (DL) has been widely adopted in diverse industrial domains, including autonomous driving, intelligent healthcare, and aided programming. Like traditional software, DL systems are also prone to faults, whose malfunctioning may expose users to significant risks. Consequently, numerous approaches have been proposed to address these issues. In this paper, we conduct a large-scale empirical study on 16 state-of-the-art DL model fixing approaches, spanning model-level, layer-level, and neuron-level categories, to comprehensively evaluate their performance. We assess not only their fixing effectiveness (their primary purpose) but also their impact on other critical properties, such as robustness, fairness, and backward compatibility. To ensure comprehensive and fair evaluation, we employ a diverse set of datasets, model architectures, and application domains within a uniform experimental setup for experimentation. We summarize several key findings with implications for both industry and academia. For example, model-level approaches demonstrate superior fixing effectiveness compared to others. No single approach can achieve the best fixing performance while improving accuracy and maintaining all other properties. Thus, academia should prioritize research on mitigating these side effects. These insights highlight promising directions for future exploration in this field.


【24】Learning Temporally Consistent Turbulence Between Sparse Snapshots via Diffusion Models
标题:通过扩散模型学习稀疏快照之间的时间一致湍流
链接:https://arxiv.org/abs/2512.24813

作者:Mohammed Sardar,Małgorzata J. Zimoń,Samuel Draycott,Alistair Revell,Alex Skillen
备注:15 pages, 10 figures
摘要:We investigate the statistical accuracy of temporally interpolated spatiotemporal flow sequences between sparse, decorrelated snapshots of turbulent flow fields using conditional Denoising Diffusion Probabilistic Models (DDPMs). The developed method is presented as a proof-of-concept generative surrogate for reconstructing coherent turbulent dynamics between sparse snapshots, demonstrated on a 2D Kolmogorov Flow, and a 3D Kelvin-Helmholtz Instability (KHI). We analyse the generated flow sequences through the lens of statistical turbulence, examining the time-averaged turbulent kinetic energy spectra over generated sequences, and temporal decay of turbulent structures. For the non-stationary Kelvin-Helmholtz Instability, we assess the ability of the proposed method to capture evolving flow statistics across the most strongly time-varying flow regime. We additionally examine instantaneous fields and physically motivated metrics at key stages of the KHI flow evolution.


【25】Limits of quantum generative models with classical sampling hardness
标题:具有经典采样困难性的量子生成模型的局限性
链接:https://arxiv.org/abs/2512.24801

作者:Sabrina Herbst,Ivona Brandić,Adrián Pérez-Salinas
备注:29 pages, 9 figures
摘要:Sampling tasks have been successful in establishing quantum advantages both in theory and experiments. This has fueled the use of quantum computers for generative modeling to create samples following the probability distribution underlying a given dataset. In particular, the potential to build generative models on classically hard distributions would immediately preclude classical simulability, due to theoretical separations. In this work, we study quantum generative models from the perspective of output distributions, showing that models that anticoncentrate are not trainable on average, including those exhibiting quantum advantage. In contrast, models outputting data from sparse distributions can be trained. We consider special cases to enhance trainability, and observe that this opens the path for classical algorithms for surrogate sampling. This observed trade-off is linked to verification of quantum processes. We conclude that quantum advantage can still be found in generative models, although its source must be distinct from anticoncentration.


【26】Soliton profiles: Classical Numerical Schemes vs. Neural Network - Based Solvers
标题:孤立子剖面:经典数值方案与基于神经网络的求解器
链接:https://arxiv.org/abs/2512.24634

作者:Chandler Haight,Svetlana Roudenko,Zhongming Wang
摘要:We present a comparative study of classical numerical solvers, such as Petviashvili's method or finite difference with Newton iterations, and neural network-based methods for computing ground states or profiles of solitary-wave solutions to the one-dimensional dispersive PDEs that include the nonlinear Schrödinger, the nonlinear Klein-Gordon and the generalized KdV equations. We confirm that classical approaches retain high-order accuracy and strong computational efficiency for single-instance problems in the one-dimensional setting. Physics-informed neural networks (PINNs) are also able to reproduce qualitative solutions but are generally less accurate and less efficient in low dimensions than classical solvers due to expensive training and slow convergence. We also investigate the operator-learning methods, which, although computationally intensive during training, can be reused across many parameter instances, providing rapid inference after pretraining, making them attractive for applications involving repeated simulations or real-time predictions. For single-instance computations, however, the accuracy of operator-learning methods remains lower than that of classical methods or PINNs, in general.
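Petviashvili's method, one of the classical baselines here, fits in a dozen lines. A self-contained 1D example for $-u'' + u = u^3$, whose exact soliton is $\sqrt{2}\,\mathrm{sech}(x)$; the grid and iteration count are illustrative:

```python
import numpy as np

N, Lx = 256, 40.0
x = (np.arange(N) - N // 2) * (Lx / N)
k = 2 * np.pi * np.fft.fftfreq(N, d=Lx / N)
Lsym = k**2 + 1.0                    # Fourier symbol of L = -d^2/dx^2 + 1
gamma = 1.5                          # p / (p - 1) for the cubic nonlinearity, p = 3

u = np.exp(-x**2)                    # any reasonable positive initial guess
for _ in range(50):
    Nu = u**3
    u_hat, Nu_hat = np.fft.fft(u), np.fft.fft(Nu)
    S = np.sum(Lsym * np.abs(u_hat)**2) / np.sum(np.conj(u_hat) * Nu_hat).real
    u = S**gamma * np.real(np.fft.ifft(Nu_hat / Lsym))   # u <- S^gamma L^{-1} N(u)

print(np.max(np.abs(u - np.sqrt(2) / np.cosh(x))))       # error shrinks toward zero
```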


【27】Virasoro Symmetry in Neural Network Field Theories
标题:神经网络场论中的维拉索罗对称性
链接:https://arxiv.org/abs/2512.24420

作者:Brandon Robinson
备注:11 pages, 2 figures
摘要:Neural Network Field Theories (NN-FTs) can realize global conformal symmetries via embedding space architectures. These models describe Generalized Free Fields (GFFs) in the infinite width limit. However, they typically lack a local stress-energy tensor satisfying conformal Ward identities. This presents an obstruction to realizing infinite-dimensional, local conformal symmetry typifying 2d Conformal Field Theories (CFTs). We present the first construction of an NN-FT that encodes the full Virasoro symmetry of a 2d CFT. We formulate a neural free boson theory with a local stress tensor $T(z)$ by properly choosing the architecture and prior distribution of network parameters. We verify the analytical results through numerical simulation; computing the central charge and the scaling dimensions of vertex operators. We then construct an NN realization of a Majorana Fermion and an $\mathcal{N}=(1,1)$ scalar multiplet, which then enables an extension of the formalism to include super-Virasoro symmetry. Finally, we extend the framework by constructing boundary NN-FTs that preserve (super-)conformal symmetry via the method of images.


【28】Deep Learning in Geotechnical Engineering: A Critical Assessment of PINNs and Operator Learning
标题:岩土工程中的深度学习:对PINN与算子学习的批判性评估
链接:https://arxiv.org/abs/2512.24365

作者:Krishna Kumar
摘要:Deep learning methods -- physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) -- are increasingly proposed for geotechnical problems. This paper tests these methods against traditional solvers on canonical problems: wave propagation and beam-foundation interaction. PINNs run 90,000 times slower than finite difference with larger errors. DeepONet requires thousands of training simulations and breaks even only after millions of evaluations. Multi-layer perceptrons fail catastrophically when extrapolating beyond training data -- the common case in geotechnical prediction. GNS shows promise for geometry-agnostic simulation but faces scaling limits and cannot capture path-dependent soil behavior. For inverse problems, automatic differentiation through traditional solvers recovers material parameters with sub-percent accuracy in seconds. We recommend: use automatic differentiation for inverse problems; apply site-based cross-validation to account for spatial autocorrelation; reserve neural networks for problems where traditional solvers are genuinely expensive and predictions remain within the training envelope. When a method is four orders of magnitude slower with less accuracy, it is not a viable replacement for proven solvers.


【29】Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators
标题:基于随机插值神经网络算子的随机过程构造性逼近
链接:https://arxiv.org/abs/2512.24106

作者:Sachin Saini,Uaday Singh
备注:22 Pages, 10 Figures
摘要:In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, interpolation accuracy, and approximation capabilities in the mean square sense, in probability, as well as path-wise within the space of second-order stochastic (random) processes $L^2(\Omega, \mathcal{F}, \mathbb{P})$. Additionally, we provide quantitative error estimates using the modulus of continuity of the processes. These results highlight the effectiveness of SINNOs for approximating stochastic processes with potential applications in COVID-19 case prediction.


【30】Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data
标题:基于时间差分学习的策略镜像下降:在线Markov数据下的样本复杂度
链接:https://arxiv.org/abs/2512.24056

作者:Wenye Li,Hongxu Chen,Jiacai Liu,Ke Wei
摘要:This paper studies the policy mirror descent (PMD) method, which is a general policy optimization framework in reinforcement learning and can cover a wide range of policy gradient methods by specifying different mirror maps. Existing sample complexity analyses for policy mirror descent either focus on the generative sampling model, or on the Markovian sampling model with the action values explicitly approximated to a certain pre-specified accuracy. In contrast, we consider the sample complexity of policy mirror descent with temporal difference (TD) learning under the Markovian sampling model. Two algorithms called Expected TD-PMD and Approximate TD-PMD are presented, which are off-policy and mixed policy algorithms respectively. Under a small enough constant policy update step size, the $\tilde{O}(\varepsilon^{-2})$ (a logarithm factor about $\varepsilon$ is hidden in $\tilde{O}(\cdot)$) sample complexity can be established for them to achieve average-time $\varepsilon$-optimality. The sample complexity is further improved to $O(\varepsilon^{-2})$ (without the hidden logarithm factor) to achieve the last-iterate $\varepsilon$-optimality based on adaptive policy update step sizes.
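For context, the generic PMD update with mirror map $\Phi$ (Bregman divergence $D_\Phi$) and step size $\eta$ is shown below; the KL choice recovers the familiar softmax/NPG-style update, and the TD-PMD algorithms studied here replace $Q^{\pi_k}$ with a temporal-difference estimate:

```latex
\pi_{k+1}(\cdot \mid s) = \arg\max_{\pi \in \Delta(\mathcal{A})}
\Big\{ \eta \,\big\langle Q^{\pi_k}(s,\cdot),\, \pi \big\rangle
- D_{\Phi}\big(\pi,\, \pi_k(\cdot \mid s)\big) \Big\},
\qquad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, e^{\eta\, Q^{\pi_k}(s,a)}
\;\; \text{(KL case)}.
```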


【31】Tensor Computing Interface: An Application-Oriented, Lightweight Interface for Portable High-Performance Tensor Network Applications
标题:张量计算接口:面向应用的轻量级接口,用于可移植的高性能张量网络应用
链接:https://arxiv.org/abs/2512.23917

作者:Rong-Yang Sun,Tomonori Shirakawa,Hidehiko Kohshiro,D. N. Sheng,Seiji Yunoki
备注:34 pages, 10 figures
摘要:Tensor networks (TNs) are a central computational tool in quantum science and artificial intelligence. However, the lack of a unified software interface across tensor-computing frameworks severely limits the portability of TN applications, coupling algorithmic development to specific hardware and software back ends. To address this challenge, we introduce the Tensor Computing Interface (TCI) -- an application-oriented, lightweight application programming interface designed to enable framework-independent, high-performance TN applications. TCI provides a well-defined type system that abstracts tensor objects together with a minimal yet expressive set of core functions covering essential tensor manipulations and tensor linear-algebra operations. Through numerical demonstrations on representative tensor-network applications, we show that codes written against TCI can be migrated seamlessly across heterogeneous hardware and software platforms while achieving performance comparable to native framework implementations. We further release an open-source implementation of TCI based on Cytnx, demonstrating its practicality and ease of integration with existing tensor-computing frameworks.


其他(34篇)

【1】Coordinated Humanoid Manipulation with Choice Policies
标题:基于选择策略的协调人形机器人操纵
链接:https://arxiv.org/abs/2512.25072

作者:Haozhi Qi,Yen-Jen Wang,Toru Lin,Brent Yi,Yi Ma,Koushil Sreenath,Jitendra Malik
备注:Code and Website: https://choice-policy.github.io/
摘要:Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We present a system that combines a modular teleoperation interface with a scalable learning framework to address this problem. Our teleoperation design decomposes humanoid control into intuitive submodules, which include hand-eye coordination, grasp primitives, arm end-effector tracking, and locomotion. This modularity allows us to collect high-quality demonstrations efficiently. Building on this, we introduce Choice Policy, an imitation learning approach that generates multiple candidate actions and learns to score them. This architecture enables both fast inference and effective modeling of multimodal behaviors. We validate our approach on two real-world tasks: dishwasher loading and whole-body loco-manipulation for whiteboard wiping. Experiments show that Choice Policy significantly outperforms diffusion policies and standard behavior cloning. Furthermore, our results indicate that hand-eye coordination is critical for success in long-horizon tasks. Our work demonstrates a practical path toward scalable data collection and learning for coordinated humanoid manipulation in unstructured environments.


【2】RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment
标题:RAIR:一个规则感知基准,结合高难度长尾与视觉显著性子集,用于电子商务相关性评估
链接:https://arxiv.org/abs/2512.24943

作者:Chenji Lu,Zhuo Chen,Hui Zhao,Zhenyi Wang,Pengjie Wang,Jian Xu,Bo Zheng
摘要:Search relevance plays a central role in web e-commerce. While large language models (LLMs) have shown significant results on relevance tasks, existing benchmarks lack sufficient complexity for comprehensive model assessment, resulting in an absence of standardized relevance evaluation metrics across the industry. To address this limitation, we propose the Rule-Aware benchmark with Image for Relevance assessment (RAIR), a Chinese dataset derived from real-world scenarios. RAIR establishes a standardized framework for relevance assessment and provides a set of universal rules, which forms the foundation for standardized evaluation. Additionally, RAIR analyzes essential capabilities required for current relevance models and introduces a comprehensive dataset consisting of three subsets: (1) a general subset with industry-balanced sampling to evaluate fundamental model competencies; (2) a long-tail hard subset focused on challenging cases to assess performance limits; (3) a visual salience subset for evaluating multimodal understanding capabilities. We conducted experiments on RAIR using 14 open and closed-source models. The results demonstrate that RAIR presents sufficient challenges even for GPT-5, which achieved the best performance. RAIR data are now available, serving as an industry benchmark for relevance assessment while providing new insights into general LLM and Visual Language Model (VLM) evaluation.


【3】mHC: Manifold-Constrained Hyper-Connections
标题:mHC:流形约束超连接
链接:https://arxiv.org/abs/2512.24880

作者:Zhenda Xie,Yixuan Wei,Huanqi Cao,Chenggang Zhao,Chengqi Deng,Jiashi Li,Damai Dai,Huazuo Gao,Jiang Chang,Liang Zhao,Shangyan Zhou,Zhean Xu,Zhengyan Zhang,Wangding Zeng,Shengding Hu,Yuqing Wang,Jingyang Yuan,Lean Wang,Wenfeng Liang
摘要:Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.


【4】Discovering Coordinated Joint Options via Inter-Agent Relative Dynamics
标题:通过智能体间相对动力学发现协调联合选项
链接:https://arxiv.org/abs/2512.24827

作者:Raul D. Steleac,Mohan Sridharan,David Abel
摘要:Temporally extended actions improve the ability to explore and plan in single-agent settings. In multi-agent settings, the exponential growth of the joint state space with the number of agents makes coordinated behaviours even more valuable. Yet, this same exponential growth renders the design of multi-agent options particularly challenging. Existing multi-agent option discovery methods often sacrifice coordination by producing loosely coupled or fully independent behaviours. Toward addressing these limitations, we describe a novel approach for multi-agent option discovery. Specifically, we propose a joint-state abstraction that compresses the state space while preserving the information necessary to discover strongly coordinated behaviours. Our approach builds on the inductive bias that synchronisation over agent states provides a natural foundation for coordination in the absence of explicit objectives. We first approximate a fictitious state of maximal alignment with the team, the "Fermat" state, and use it to define a measure of "spreadness", capturing team-level misalignment on each individual state dimension. Building on this representation, we then employ a neural graph Laplacian estimator to derive options that capture state synchronisation patterns between agents. We evaluate the resulting options across multiple scenarios in two multi-agent domains, showing that they yield stronger downstream coordination capabilities compared to alternative option discovery methods.


【5】LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)
标题:LeanCat:精益中形式类别理论的基准套件(第一部分:1-类别)
链接:https://arxiv.org/abs/2512.24796

作者:Rongge Xu,Hui Dai,Yiming Fu,Jiedong Jiang,Tianjiao Nie,Hongwei Wang,Junkai Wang,Holiverse Yang,Jiatong Yang,Zhi-Hao Zhang
备注:11 pages, 4 figures, 1 table
摘要:Large language models (LLMs) have made rapid progress in formal theorem proving, yet current benchmarks under-measure the kind of abstraction and library-mediated reasoning that organizes modern mathematics. In parallel with FATE's emphasis on frontier algebra, we introduce LeanCat, a Lean benchmark for category-theoretic formalization -- a unifying language for mathematical structure and a core layer of modern proof engineering -- serving as a stress test of structural, interface-level reasoning. Part I: 1-Categories contains 100 fully formalized statement-level tasks, curated into topic families and three difficulty tiers via an LLM-assisted + human grading process. The best model solves 8.25% of tasks at pass@1 (32.50%/4.17%/0.00% by Easy/Medium/High) and 12.00% at pass@4 (50.00%/4.76%/0.00%). We also evaluate LeanBridge, which uses LeanExplore to search Mathlib, and observe consistent gains over single-model baselines. LeanCat is intended as a compact, reusable checkpoint for tracking both AI and human progress toward reliable, research-level formalization in Lean.


【6】Nonlinear Noise2Noise for Efficient Monte Carlo Denoiser Training
标题:用于高效蒙特卡洛去噪器训练的非线性Noise2Noise
链接:https://arxiv.org/abs/2512.24794

作者:Andrew Tinits,Stephen Mann
备注:15 pages, 7 figures, 2 tables
摘要:The Noise2Noise method allows for training machine learning-based denoisers with pairs of input and target images where both the input and target can be noisy. This removes the need for training with clean target images, which can be difficult to obtain. However, Noise2Noise training has a major limitation: nonlinear functions applied to the noisy targets will skew the results. This bias occurs because the nonlinearity makes the expected value of the noisy targets different from the clean target image. Since nonlinear functions are common in image processing, avoiding them limits the types of preprocessing that can be performed on the noisy targets. Our main insight is that certain nonlinear functions can be applied to the noisy targets without adding significant bias to the results. We develop a theoretical framework for analyzing the effects of these nonlinearities, and describe a class of nonlinear functions with minimal bias.   We demonstrate our method on the denoising of high dynamic range (HDR) images produced by Monte Carlo rendering. Noise2Noise training can have trouble with HDR images, where the training process is overwhelmed by outliers and performs poorly. We consider a commonly used method of addressing these training issues: applying a nonlinear tone mapping function to the model output and target images to reduce their dynamic range. This method was previously thought to be incompatible with Noise2Noise training because of the nonlinearities involved. We show that certain combinations of loss functions and tone mapping functions can reduce the effect of outliers while introducing minimal bias. We apply our method to an existing machine learning-based Monte Carlo denoiser, where the original implementation was trained with high-sample count reference images. Our results approach those of the original implementation, but are produced using only noisy training data.
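The tone-mapped loss under analysis can be written compactly. A sketch using a Reinhard-style compressor as the illustrative nonlinearity (the paper's contribution is characterizing which such loss/tone-map pairs keep the resulting bias minimal):

```python
import torch

def tonemap(x):
    # Reinhard-style range compression; one illustrative choice of nonlinearity.
    return x / (1.0 + x)

def tonemapped_l2(pred_hdr, noisy_target_hdr):
    # The nonlinearity is applied to BOTH the model output and the noisy target,
    # which is exactly the construction whose bias the paper analyzes.
    return ((tonemap(pred_hdr) - tonemap(noisy_target_hdr)) ** 2).mean()

loss = tonemapped_l2(torch.rand(8, 3, 64, 64) * 50, torch.rand(8, 3, 64, 64) * 50)
```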


【7】BandiK: Efficient Multi-Task Decomposition Using a Multi-Bandit Framework
标题:BandiK:使用Multi-Bandit框架的高效多任务分解
链接:https://arxiv.org/abs/2512.24708

作者:András Millinghoffer,András Formanek,András Antos,Péter Antal
备注:8 pages, 14 figures
摘要:The challenge of effectively transferring knowledge across multiple tasks is of critical importance and is also present in downstream tasks with foundation models. However, the nature of transfer, in particular whether it is transitive or intransitive, is still an open problem, and negative transfer remains a significant obstacle. Selection of beneficial auxiliary task sets in multi-task learning is frequently hindered by the high computational cost of their evaluation, the high number of plausible candidate auxiliary sets, and the varying complexity of selection across target tasks.   To address these constraints, we introduce BandiK, a novel three-stage multi-task auxiliary task subset selection method using multi-bandits, where each arm pull evaluates candidate auxiliary sets by training and testing a multiple output neural network on a single random train-test dataset split. Firstly, BandiK estimates the pairwise transfers between tasks, which helps in identifying which tasks are likely to benefit from joint learning. In the second stage, it constructs a linear number of candidate sets of auxiliary tasks (in the number of all tasks) for each target task based on the initial estimations, significantly reducing the exponential number of potential auxiliary task sets. Thirdly, it employs a Multi-Armed Bandit (MAB) framework for each task, where the arms correspond to the performance of candidate auxiliary sets realized as multiple output neural networks over train-test data set splits. To enhance efficiency, BandiK integrates these individual task-specific MABs into a multi-bandit structure. The proposed multi-bandit solution exploits that the same neural network realizes multiple arms of different individual bandits corresponding to a given candidate set. This semi-overlapping arm property defines a novel multi-bandit cost/reward structure utilized in BandiK.


【8】Causal Discovery with Mixed Latent Confounding via Precision Decomposition
标题:通过精度矩阵分解在混合潜在混杂下进行因果发现
链接:https://arxiv.org/abs/2512.24696

作者:Amir Asiaee,Samhita Pal,James O'quinn,James P. Long
摘要:We study causal discovery from observational data in linear Gaussian systems affected by mixed latent confounding, where some unobserved factors act broadly across many variables while others influence only small subsets. This setting is common in practice and poses a challenge for existing methods: differentiable and score-based DAG learners can misinterpret global latent effects as causal edges, while latent-variable graphical models recover only undirected structure.   We propose DCL-DECOR, a modular, precision-led pipeline that separates these roles. The method first isolates pervasive latent effects by decomposing the observed precision matrix into a structured component and a low-rank component. The structured component corresponds to the conditional distribution after accounting for pervasive confounders and retains only local dependence induced by the causal graph and localized confounding. A correlated-noise DAG learner is then applied to this deconfounded representation to recover directed edges while modeling remaining structured error correlations, followed by a simple reconciliation step to enforce bow-freeness.   We provide identifiability results that characterize the recoverable causal target under mixed confounding and show how the overall problem reduces to well-studied subproblems with modular guarantees. Synthetic experiments that vary the strength and dimensionality of pervasive confounding demonstrate consistent improvements in directed edge recovery over applying correlated-noise DAG learning directly to the confounded data.


【9】From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme
标题:从感知到妙语:以野外模因艺术赋能VLM
链接:https://arxiv.org/abs/2512.24555

作者:Xueyan Li,Yingyi Xue,Mengjie Jiang,Qingzi Zhu,Yazhe Niu
备注:46 pages, 20 figures
摘要:Generating humorous memes is a challenging multimodal task that moves beyond direct image-to-caption supervision. It requires nuanced reasoning over visual content, contextual cues, and subjective humor. To bridge this gap between visual perception and humorous punchline creation, we propose HUMOR, a novel framework that guides VLMs through hierarchical reasoning and aligns them with group-wise human preferences. First, HUMOR employs a hierarchical, multi-path Chain-of-Thought (CoT): the model begins by identifying a template-level intent, then explores diverse reasoning paths under different contexts, and finally anchors onto a high-quality, context-specific path. This CoT supervision, which traces back from ground-truth captions, enhances reasoning diversity. We further show that this multi-path exploration with anchoring maintains a high expected humor quality, under the practical condition that high-quality paths retain significant probability mass. Second, to capture subjective humor, we train a pairwise reward model that operates within groups of memes sharing the same template. Following established theory, this approach ensures a consistent and robust proxy for human preference, even with subjective and noisy labels. The reward model then enables group-wise reinforcement learning optimization, providing a theoretical guarantee of monotonic improvement within the trust region. Extensive experiments show that HUMOR empowers various VLMs with superior reasoning diversity, more reliable preference alignment, and higher overall meme quality. Beyond memes, our work presents a general training paradigm for open-ended, human-aligned multimodal generation, where success is guided by comparative judgment within coherent output groups.


【10】More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization
标题:不止比特:用于极端量化的多包络双二值分解
链接:https://arxiv.org/abs/2512.24545

作者:Yuma Ichikawa,Yoshihiko Fujisawa,Yudai Fujimoto,Akira Sakai,Katsuki Fujisawa
备注:14 pages, 2 figures
摘要:For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the scaling parameters of DBF are too restrictive; after factoring out signs, all rank components share the same magnitude profile, resulting in performance saturation. We propose Multi-envelope DBF (MDBF), which retains a shared pair of 1-bit sign bases but replaces the single envelope with a rank-$l$ envelope. By sharing sign matrices among envelope components, MDBF effectively maintains a binary carrier and utilizes the limited memory budget for magnitude expressiveness. We also introduce a closed-form initialization and an alternating refinement method to optimize MDBF. Across the LLaMA and Qwen families, MDBF improves perplexity and zero-shot accuracy over previous binary formats at matched bits per weight while preserving the same deployment-friendly inference primitive.


【11】OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
标题:OptRot:通过无数据旋转缓解权重异常值以实现训练后量化
链接:https://arxiv.org/abs/2512.24124

作者:Advait Gadhikar,Riccardo Grazzi,James Hensman
备注:25 pages, 10 figures
摘要:The presence of outliers in Large Language Models (LLMs) weights and activations makes them difficult to quantize. Recent work has leveraged rotations to mitigate these outliers. In this work, we propose methods that learn fusible rotations by minimizing principled and cheap proxy objectives to the weight quantization error. We primarily focus on GPTQ as the quantization method. Our main method is OptRot, which reduces weight outliers simply by minimizing the element-wise fourth power of the rotated weights. We show that OptRot outperforms both Hadamard rotations and more expensive, data-dependent methods like SpinQuant and OSTQuant for weight quantization. It also improves activation quantization in the W4A8 setting. We also propose a data-dependent method, OptRot$^{+}$, that further improves performance by incorporating information on the activation covariance. In the W4A4 setting, we see that both OptRot and OptRot$^{+}$ perform worse, highlighting a trade-off between weight and activation quantization.
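The main proxy objective is simple to prototype: learn an orthogonal rotation that minimizes the element-wise fourth power of the rotated weights. The sketch below keeps R orthogonal via a Cayley parametrization, which is our choice for illustration rather than necessarily the paper's; W stands in for a model weight matrix:

```python
import torch

d = 256
W = torch.randn(d, d)                         # stand-in weight matrix
A = torch.zeros(d, d, requires_grad=True)     # unconstrained parameters
opt = torch.optim.Adam([A], lr=1e-2)

for step in range(200):
    S = A - A.T                               # skew-symmetric part
    I = torch.eye(d)
    R = torch.linalg.solve(I + S, I - S)      # Cayley transform: R is orthogonal
    loss = ((W @ R) ** 4).sum()               # element-wise 4th-power outlier proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
```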


【12】Random Multiplexing
标题:随机多路复用
链接:https://arxiv.org/abs/2512.24087

作者:Lei Liu,Yuhao Chi,Shunqi Huang,Zhaoyang Zhang
摘要:As wireless communication applications evolve from traditional multipath environments to high-mobility scenarios like unmanned aerial vehicles, multiplexing techniques have advanced accordingly. Traditional single-carrier frequency-domain equalization (SC-FDE) and orthogonal frequency-division multiplexing (OFDM) have given way to emerging orthogonal time-frequency space (OTFS) and affine frequency-division multiplexing (AFDM). These approaches exploit specific channel structures to diagonalize or sparsify the effective channel, thereby enabling low-complexity detection. However, their reliance on these structures significantly limits their robustness in dynamic, real-world environments. To address these challenges, this paper studies a random multiplexing technique that is decoupled from the physical channels, enabling its application to arbitrary norm-bounded and spectrally convergent channel matrices. Random multiplexing achieves statistical fading-channel ergodicity for transmitted signals by constructing an equivalent input-isotropic channel matrix in the random transform domain. It guarantees the asymptotic replica MAP bit-error rate (BER) optimality of AMP-type detectors for linear systems with arbitrary norm-bounded, spectrally convergent channel matrices and signaling configurations, under the unique fixed point assumption. A low-complexity cross-domain memory AMP (CD-MAMP) detector is considered, leveraging the sparsity of the time-domain channel and the randomness of the equivalent channel. Optimal power allocations are derived to minimize the replica MAP BER and maximize the replica constrained capacity of random multiplexing systems. The optimal coding principle and replica constrained-capacity optimality of CD-MAMP detector are investigated for random multiplexing systems. Additionally, the versatility of random multiplexing in diverse wireless applications is explored.


【13】Information-Theoretic Quality Metric of Low-Dimensional Embeddings
标题:低维嵌入的信息论质量度量
链接:https://arxiv.org/abs/2512.23981

作者:Sebastián Gutiérrez-Bernal,Hector Medel Cobaxin,Abiel Galindo González
备注:18 pages, 6 figures, submitted to Machine Learning (Springer Nature)
摘要:In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, rank-based neighborhood criteria, or Local Procrustes quantify distortions in distances or in local geometries, but do not directly assess how much information is preserved when projecting high-dimensional data onto a lower-dimensional space. To address this limitation, we introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices and on the stable rank, which quantifies changes in uncertainty between the original representation and its reduced projection, providing neighborhood-level indicators and a global summary statistic. To validate the results of the metric, we compare its outcomes with the Mean Relative Rank Error (MRRE), which is distance-based, and with Local Procrustes, which is based on geometric properties, using a financial time series and a manifold commonly studied in the literature. We observe that distance-based criteria exhibit very low correlation with geometric and spectral measures, while ERPM and Local Procrustes show strong average correlation but display significant discrepancies in local regimes, leading to the conclusion that ERPM complements existing metrics by identifying neighborhoods with severe information loss, thereby enabling a more comprehensive assessment of embeddings, particularly in information-sensitive applications such as the construction of early-warning indicators.
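
以下为编者按摘要思路给出的示意代码(归一化方式与近邻参数均为编者选择,并非论文中 ERPM 的精确定义):对邻域矩阵的奇异值谱计算 Shannon 熵与稳定秩,比较降维前后的变化。

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def spectral_entropy_and_stable_rank(X_local):
    """Shannon entropy of the normalized singular-value spectrum and the
    stable rank of one neighborhood matrix (our reading of the quantities
    ERPM builds on; centering and normalization choices are ours)."""
    s = np.linalg.svd(X_local - X_local.mean(0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    stable_rank = (s ** 2).sum() / (s[0] ** 2)
    return entropy, stable_rank

rng = np.random.default_rng(0)
X_high = rng.standard_normal((500, 20))   # original data (placeholder)
X_low = X_high[:, :2]                     # its 2-D "embedding" (placeholder)

nn = NearestNeighbors(n_neighbors=15).fit(X_high)
_, idx = nn.kneighbors(X_high[:1])        # one neighborhood, for illustration
h_hi, r_hi = spectral_entropy_and_stable_rank(X_high[idx[0]])
h_lo, r_lo = spectral_entropy_and_stable_rank(X_low[idx[0]])
print(f"entropy {h_hi:.3f} -> {h_lo:.3f}, stable rank {r_hi:.2f} -> {r_lo:.2f}")
```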


【14】Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems
标题:有保障的自主性:运筹学如何驱动与编排生成式人工智能系统
链接:https://arxiv.org/abs/2512.23978

作者:Tinglong Dai,David Simchi-Levi,Michelle Xiao Wu,Yao Xie
备注:Authors are listed alphabetically
摘要:Generative artificial intelligence (GenAI) is shifting from conversational assistants toward agentic systems -- autonomous decision-making systems that sense, decide, and act within operational workflows. This shift creates an autonomy paradox: as GenAI systems are granted greater operational autonomy, they should, by design, embody more formal structure, more explicit constraints, and stronger tail-risk discipline. We argue stochastic generative models can be fragile in operational domains unless paired with mechanisms that provide verifiable feasibility, robustness to distribution shift, and stress testing under high-consequence scenarios. To address this challenge, we develop a conceptual framework for assured autonomy grounded in operations research (OR), built on two complementary approaches. First, flow-based generative models frame generation as deterministic transport characterized by an ordinary differential equation, enabling auditability, constraint-aware generation, and connections to optimal transport, robust optimization, and sequential decision control. Second, operational safety is formulated through an adversarial robustness lens: decision rules are evaluated against worst-case perturbations within uncertainty or ambiguity sets, making unmodeled risks part of the design. This framework clarifies how increasing autonomy shifts OR's role from solver to guardrail to system architect, with responsibility for control logic, incentive protocols, monitoring regimes, and safety boundaries. These elements define a research agenda for assured autonomy in safety-critical, reliability-sensitive operational domains.


【15】Statistical Guarantees in the Search for Less Discriminatory Algorithms
标题:寻找歧视性较低算法的统计保证
链接:https://arxiv.org/abs/2512.23943

作者:Chris Hays,Ben Laufer,Solon Barocas,Manish Raghavan
备注:37 pages, 10 figures
摘要:Recent scholarship has argued that firms building data-driven decision systems in high-stakes domains like employment, credit, and housing should search for "less discriminatory algorithms" (LDAs) (Black et al., 2024). That is, for a given decision problem, firms considering deploying a model should make a good-faith effort to find equally performant models with lower disparate impact across social groups. Evidence from the literature on model multiplicity shows that randomness in training pipelines can lead to multiple models with the same performance, but meaningful variations in disparate impact. This suggests that developers can find LDAs simply by randomly retraining models. Firms cannot continue retraining forever, though, which raises the question: What constitutes a good-faith effort? In this paper, we formalize LDA search via model multiplicity as an optimal stopping problem, where a model developer with limited information wants to produce strong evidence that they have sufficiently explored the space of models. Our primary contribution is an adaptive stopping algorithm that yields a high-probability upper bound on the gains achievable from a continued search, allowing the developer to certify (e.g., to a court) that their search was sufficient. We provide a framework under which developers can impose stronger assumptions about the distribution of models, yielding correspondingly stronger bounds. We validate the method on real-world credit, employment and housing datasets.
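
下面是编者自拟的简化示例(并非论文的自适应停止算法):把每次重训得到的差异度视为 i.i.d. 抽样,用 DKW 型集中界粗略判断继续搜索的收益何时可以忽略。

```python
import numpy as np

# Illustrative stand-in for LDA search via random retraining (not the paper's
# procedure): treat each retrain's disparity as an i.i.d. draw and use a
# DKW-style bound on the empirical CDF to certify, with high probability,
# that few further draws would beat the current best by a given margin.
rng = np.random.default_rng(0)

def retrain_and_measure():                       # placeholder for a real run
    return 0.10 + 0.03 * rng.standard_normal()   # measured model disparity

delta, margin = 0.05, 0.005
disparities = [retrain_and_measure() for _ in range(10)]
while True:
    n, best = len(disparities), min(disparities)
    eps = np.sqrt(np.log(2 / delta) / (2 * n))   # DKW deviation for the ECDF
    frac_better = np.mean([d <= best - margin for d in disparities])
    if frac_better + eps < 0.10:   # w.p. 1-delta, <10% of models beat best by `margin`
        break
    disparities.append(retrain_and_measure())
print(f"stopped after {len(disparities)} retrains, best disparity {best:.4f}")
```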


【16】Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias
标题:约束孕育泛化:作为归纳偏置的时间动力学
链接:https://arxiv.org/abs/2512.23916

作者:Xia Chen
备注:8 pages, 7 figures
摘要:Conventional deep learning prioritizes unconstrained optimization, yet biological systems operate under strict metabolic constraints. We propose that these physical constraints shape dynamics to function not as limitations, but as a temporal inductive bias that breeds generalization. Through a phase-space analysis of signal propagation, we reveal a fundamental asymmetry: expansive dynamics amplify noise, whereas proper dissipative dynamics compress phase space that aligns with the network's spectral bias, compelling the abstraction of invariant features. This condition can be imposed externally via input encoding, or intrinsically through the network's own temporal dynamics. Both pathways require architectures capable of temporal integration and proper constraints to decode induced invariants, whereas static architectures fail to capitalize on temporal structure. Through comprehensive evaluations across supervised classification, unsupervised reconstruction, and zero-shot reinforcement learning, we demonstrate that a critical "transition" regime maximizes generalization capability. These findings establish dynamical constraints as a distinct class of inductive bias, suggesting that robust AI development requires not only scaling and removing limitations, but computationally mastering the temporal characteristics that naturally promote generalization.
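
下面的小实验为编者自拟(并非论文实验),用线性动力学直观展示上述不对称性:谱半径小于 1 的耗散动力学抑制噪声,大于 1 的扩张动力学放大噪声。

```python
import numpy as np

# Tiny numerical illustration of the expansive/dissipative asymmetry
# discussed above (our toy, not the paper's setup): iterate
# x_{t+1} = A x_t + noise with the spectral radius of A below vs. above 1.
rng = np.random.default_rng(0)

def simulate(rho, steps=200, dim=32):
    A = rng.standard_normal((dim, dim))
    A *= rho / np.max(np.abs(np.linalg.eigvals(A)))   # set spectral radius
    x = np.zeros(dim)
    for _ in range(steps):
        x = A @ x + 0.01 * rng.standard_normal(dim)
    return np.linalg.norm(x)

print("dissipative (rho=0.9):", simulate(0.9))   # noise stays bounded
print("expansive   (rho=1.1):", simulate(1.1))   # noise blows up
```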


【17】Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
标题:探究压缩记忆的极限:小规模预训练中Infini-Attention的研究
链接:https://arxiv.org/abs/2512.23862

作者:Ruizhe Huang,Kexuan Zhang,Yihao Fang,Baifeng Yu
摘要:This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressed memory from past segments while preserving local attention. In our work, we conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model demonstrates training stability and outperforms the baseline in long-context retrieval. We identify the balance factor as a key part of the model performance, and we found that retrieval accuracy drops with repeated memory compressions over long sequences. Even so, Infini-attention still effectively compensates for the SLM's limited parameters. Particularly, despite performance degradation at a 16,384-token context, the Infini-attention model achieves up to 31% higher accuracy than the baseline. Our findings suggest that achieving robust long-context capability in SLMs benefits from architectural memory like Infini-attention.
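
以下示意代码由编者根据 Munkhdalai 等人(2024)提出的 Infini-attention 公式简化而来(delta 规则等细节从略,形状均为示例值),展示其压缩记忆的写入与读取:

```python
import torch

# Sketch of the compressive memory at the core of Infini-attention:
# segments are written into a fixed-size matrix memory via an associative
# update, and queries read it back through a non-negative feature map.
def sigma(x):                      # ELU+1 feature map used by linear attention
    return torch.nn.functional.elu(x) + 1

d_k, d_v = 64, 64
M = torch.zeros(d_k, d_v)          # compressive memory
z = torch.zeros(d_k)               # normalization term

for _ in range(4):                 # process segments sequentially
    K = torch.randn(128, d_k)      # keys of one segment (placeholder)
    V = torch.randn(128, d_v)      # values of one segment (placeholder)
    M = M + sigma(K).T @ V         # write the segment into memory
    z = z + sigma(K).sum(0)

Q = torch.randn(1, d_k)            # a query from the current segment
A_mem = (sigma(Q) @ M) / (sigma(Q) @ z.unsqueeze(-1))   # memory read-out
```

读取结果 A_mem 会与段内局部注意力的输出按一个可学习的平衡因子融合——这正是上文实验中被识别为关键的超参数。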


【18】Flow Matching Neural Processes
标题:流匹配神经过程
链接:https://arxiv.org/abs/2512.23853

作者:Hussen Abu Hamad,Dan Rosenbaum
备注:NeurIPS 2025. For code, see https://github.com/danrsm/flowNP
摘要:Neural processes (NPs) are a class of models that learn stochastic processes directly from data and can be used for inference, sampling and conditional sampling. We introduce a new NP model based on flow matching, a generative modeling paradigm that has demonstrated strong performance on various data modalities. Following the NP training framework, the model provides amortized predictions of conditional distributions over any arbitrary points in the data. Compared to previous NP models, our model is simple to implement and can be used to sample from conditional distributions using an ODE solver, without requiring auxiliary conditioning methods. In addition, the model provides a controllable tradeoff between accuracy and running time via the number of steps in the ODE solver. We show that our model outperforms previous state-of-the-art neural process methods on various benchmarks including synthetic 1D Gaussian processes data, 2D images, and real-world weather data.
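
下面是编者自拟的微型示意(架构与条件化方式均为占位,并非论文模型):以上下文嵌入为条件的速度场,配合标准线性路径流匹配损失;采样时用固定步数的 Euler ODE 求解器,步数即精度与耗时的权衡旋钮。

```python
import torch
import torch.nn as nn

# Hypothetical miniature of a flow-matching neural process: a velocity field
# conditioned on a context embedding, trained with the standard linear-path
# FM regression loss. All dimensions and the conditioning are placeholders.
class CondVelocity(nn.Module):
    def __init__(self, dim=1, ctx=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + ctx + 1, 64), nn.SiLU(),
                                 nn.Linear(64, dim))
    def forward(self, x, t, c):
        return self.net(torch.cat([x, t, c], dim=-1))

v = CondVelocity()
x1 = torch.randn(32, 1)                 # target values at query points
c = torch.randn(32, 16)                 # context-set embedding (placeholder)
x0 = torch.randn_like(x1)               # noise endpoint
t = torch.rand(32, 1)
xt = (1 - t) * x0 + t * x1              # linear interpolation path
loss = ((v(xt, t, c) - (x1 - x0)) ** 2).mean()   # FM regression target

# Sampling: integrate dx/dt = v(x, t, c) from t=0 to t=1 with n Euler steps;
# n controls the accuracy/runtime tradeoff mentioned in the abstract.
```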


【19】Exploiting the Prior of Generative Time Series Imputation
标题:挖掘生成式时间序列插补中的先验
链接:https://arxiv.org/abs/2512.23832

作者:YuYang Miao,Chang Li,Zehua Chen
摘要:Time series imputation, i.e., filling the missing values of a time recording, finds various applications in electricity, finance, and weather modelling. Previous methods have introduced generative models such as diffusion probabilistic models and Schrodinger bridge models to conditionally generate the missing values from Gaussian noise or directly from linear interpolation results. However, as their prior is not informative to the ground-truth target, their generation process inevitably suffer increased burden and limited imputation accuracy. In this work, we present Bridge-TS, building a data-to-data generation process for generative time series imputation and exploiting the design of prior with two novel designs. Firstly, we propose expert prior, leveraging a pretrained transformer-based module as an expert to fill the missing values with a deterministic estimation, and then taking the results as the prior of ground truth target. Secondly, we explore compositional priors, utilizing several pretrained models to provide different estimation results, and then combining them in the data-to-data generation process to achieve a compositional priors-to-target imputation process. Experiments conducted on several benchmark datasets such as ETT, Exchange, and Weather show that Bridge-TS reaches a new record of imputation accuracy in terms of mean square error and mean absolute error, demonstrating the superiority of improving prior for generative time series imputation.
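
以下玩具代码是编者对"数据到数据"桥式生成的一种理解(并非论文实现):生成路径从专家模型的确定性估计出发,而非从高斯噪声出发。

```python
import torch

# Toy version of a data-to-data bridge for imputation, under our reading of
# the abstract: start the generative path at an expert's deterministic
# estimate of the missing values. Noise scale and schedule are placeholders.
x_target = torch.randn(64, 1)                            # ground-truth missing values
x_prior = x_target + 0.3 * torch.randn_like(x_target)    # expert prior estimate

t = torch.rand(64, 1)
noise = torch.randn_like(x_target)
x_t = (1 - t) * x_prior + t * x_target + 0.1 * (t * (1 - t)).sqrt() * noise

# A network trained to predict x_target (or the bridge drift) from (x_t, t)
# only has to close the gap between prior and target, which is far smaller
# than the gap from pure Gaussian noise -- the "informative prior" idea above.
```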


【20】Improved Bounds for Private and Robust Alignment
标题:隐私与鲁棒对齐的改进界
链接:https://arxiv.org/abs/2512.23816

作者:Wenqian Weng,Yi He,Xingyu Zhou
摘要:In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees -- simultaneously in terms of corruption level and privacy parameters -- than previously known, which further yields improved bounds in the corruption-only regime. In addition, we also present the first set of results for private and robust online alignment. Our results are enabled by new uniform convergence guarantees for log loss and square loss under privacy and corruption, which we believe have broad applicability across learning theory and statistics.


【21】Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations
标题:揭示歧视聚类:量化并解释系统性公平性违规
链接:https://arxiv.org/abs/2512.23769

作者:Ranit Debnath Akash,Ashish Kumar,Verya Monjezi,Ashutosh Trivedi,Gang Tan,Saeid Tizpaz-Niari
备注:In 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025)
摘要:Fairness in algorithmic decision-making is often framed in terms of individual fairness, which requires that similar individuals receive similar outcomes. A system violates individual fairness if there exists a pair of inputs differing only in protected attributes (such as race or gender) that lead to significantly different outcomes-for example, one favorable and the other unfavorable. While this notion highlights isolated instances of unfairness, it fails to capture broader patterns of systematic or clustered discrimination that may affect entire subgroups. We introduce and motivate the concept of discrimination clustering, a generalization of individual fairness violations. Rather than detecting single counterfactual disparities, we seek to uncover regions of the input space where small perturbations in protected features lead to k-significantly distinct clusters of outcomes. That is, for a given input, we identify a local neighborhood-differing only in protected attributes-whose members' outputs separate into many distinct clusters. These clusters reveal significant arbitrariness in treatment solely based on protected attributes that help expose patterns of algorithmic bias that elude pairwise fairness checks. We present HyFair, a hybrid technique that combines formal symbolic analysis (via SMT and MILP solvers) to certify individual fairness with randomized search to discover discriminatory clusters. This combination enables both formal guarantees-when no counterexamples exist-and the detection of severe violations that are computationally challenging for symbolic methods alone. Given a set of inputs exhibiting high k-unfairness, we introduce a novel explanation method to generate interpretable, decision-tree-style artifacts. Our experiments demonstrate that HyFair outperforms state-of-the-art fairness verification and local explanation methods.
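
下面是编者自拟的简化演示(仅含随机搜索部分,不含论文中的 SMT/MILP 形式化验证):只扰动受保护属性,对模型输出做聚类并统计簇的分布。

```python
import numpy as np
from sklearn.cluster import KMeans

# Simplified illustration of discrimination clustering (random search only):
# vary only the protected attributes of one input and check whether the
# outputs split into many distinct clusters. Model and indices are placeholders.
rng = np.random.default_rng(0)
w = rng.standard_normal(8)

def model(X):                        # placeholder scoring model
    return X @ w

x0 = rng.standard_normal(8)
protected = [0, 1]                   # indices of protected features (assumed)

neighbors = np.tile(x0, (200, 1))    # local neighborhood of x0 ...
neighbors[:, protected] = rng.standard_normal((200, len(protected)))  # ... differing only there
outputs = model(neighbors).reshape(-1, 1)

k = 4
labels = KMeans(n_clusters=k, n_init=10).fit_predict(outputs)
print("cluster sizes:", np.bincount(labels), "output spread:", float(outputs.ptp()))
```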


【22】Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics
标题:在边缘端实现物理AI:系统动力学的硬件加速恢复
链接:https://arxiv.org/abs/2512.23767

作者:Bin Xu,Ayan Banerjee,Sandeep Gupta
备注:2025 59th Asilomar Conference on Signals, Systems, and Computers
摘要:Physical AI at the edge -- enabling autonomous systems to understand and predict real-world dynamics in real time -- requires hardware-efficient learning and inference. Model recovery (MR), which identifies governing equations from sensor data, is a key primitive for safe and explainable monitoring in mission-critical autonomous systems operating under strict latency, compute, and power constraints. However, state-of-the-art MR methods (e.g., EMILY and PINN+SR) rely on Neural ODE formulations that require iterative solvers and are difficult to accelerate efficiently on edge hardware. We present MERINDA (Model Recovery in Reconfigurable Dynamic Architecture), an FPGA-accelerated MR framework designed to make physical AI practical on resource-constrained devices. MERINDA replaces expensive Neural ODE components with a hardware-friendly formulation that combines (i) GRU-based discretized dynamics, (ii) dense inverse-ODE layers, (iii) sparsity-driven dropout, and (iv) lightweight ODE solvers. The resulting computation is structured for streaming parallelism, enabling critical kernels to be fully parallelized on the FPGA. Across four benchmark nonlinear dynamical systems, MERINDA delivers substantial gains over GPU implementations: 114$\times$ lower energy (434 J vs. 49,375 J), 28$\times$ smaller memory footprint (214 MB vs. 6,118 MB), and 1.68$\times$ faster training, while matching state-of-the-art model-recovery accuracy. These results demonstrate that MERINDA can bring accurate, explainable MR to the edge for real-time monitoring of autonomous systems.


【23】Drift-Based Dataset Stability Benchmark
标题:基于漂移的数据集稳定性基准
链接:https://arxiv.org/abs/2512.23762

作者:Dominik Soukup,Richard Plný,Daniel Vašata,Tomáš Čejka
备注:9 pages
摘要:Machine learning (ML) represents an efficient and popular approach for network traffic classification. However, network traffic classification is a challenging domain, and trained models may degrade soon after deployment due to obsolete datasets and the quick evolution of computer networks as new or updated protocols appear. Moreover, significant change in the behavior of a traffic type (and, therefore, the underlying features representing the traffic) can produce a large and sudden performance drop of the deployed model, known as a data or concept drift. In most cases, complete retraining is performed, often without further investigation of root causes, as good dataset quality is assumed. However, this is not always the case and further investigation must be performed. This paper proposes a novel methodology to evaluate the stability of datasets and a benchmark workflow that can be used to compare datasets. The proposed framework is based on a concept drift detection method that also uses ML feature weights to boost the detection performance. The benefits of this work are demonstrated on the CESNET-TLS-Year22 dataset. We provide the initial dataset stability benchmark that is used to describe dataset stability and weak points to identify the next steps for optimization. Lastly, using the proposed benchmarking methodology, we show the optimization impact on the created dataset variants.


【24】Governing Cloud Data Pipelines with Agentic AI
标题:利用智能体AI治理云数据管道
链接:https://arxiv.org/abs/2512.23737

作者:Aswathnarayan Muthukrishnan Kirubakaran,Adithya Parthasarathy,Nitin Saksena,Ram Sekhar Bodala,Akshay Deshpande,Suhas Malempati,Shiva Carimireddy,Abhirup Mazumder
备注:https://www.ijcstjournal.org/volume-13/issue-6/IJCST-V13I6P44.pdf
摘要:Cloud data pipelines increasingly operate under dynamic workloads, evolving schemas, cost constraints, and strict governance requirements. Despite advances in cloud-native orchestration frameworks, most production pipelines rely on static configurations and reactive operational practices, resulting in prolonged recovery times, inefficient resource utilization, and high manual overhead. This paper presents Agentic Cloud Data Engineering, a policy-aware control architecture that integrates bounded AI agents into the governance and control plane of cloud data pipelines. In Agentic Cloud Data Engineering platform, specialized agents analyze pipeline telemetry and metadata, reason over declarative cost and compliance policies, and propose constrained operational actions such as adaptive resource reconfiguration, schema reconciliation, and automated failure recovery. All agent actions are validated against governance policies to ensure predictable and auditable behavior. We evaluate Agentic Cloud Data Engineering platform using representative batch and streaming analytics workloads constructed from public enterprise-style datasets. Experimental results show that Agentic Cloud Data Engineering platform reduces mean pipeline recovery time by up to 45%, lowers operational cost by approximately 25%, and decreases manual intervention events by over 70% compared to static orchestration, while maintaining data freshness and policy compliance. These results demonstrate that policy-bounded agentic control provides an effective and practical approach for governing cloud data pipelines in enterprise environments.


【25】Are First-Order Diffusion Samplers Really Slower? A Fast Forward-Value Approach
标题:一阶扩散采样器真的更慢吗?一种快速前向值方法
链接:https://arxiv.org/abs/2512.24927

作者:Yuchen Jiao,Na Li,Changxiao Cai,Gen Li
摘要:Higher-order ODE solvers have become a standard tool for accelerating diffusion probabilistic model (DPM) sampling, motivating the widespread view that first-order methods are inherently slower and that increasing discretization order is the primary path to faster generation. This paper challenges this belief and revisits acceleration from a complementary angle: beyond solver order, the placement of DPM evaluations along the reverse-time dynamics can substantially affect sampling accuracy in the low neural function evaluation (NFE) regime. We propose a novel training-free, first-order sampler whose leading discretization error has the opposite sign to that of DDIM. Algorithmically, the method approximates the forward-value evaluation via a cheap one-step lookahead predictor. We provide theoretical guarantees showing that the resulting sampler provably approximates the ideal forward-value trajectory while retaining first-order convergence. Empirically, across standard image generation benchmarks (CIFAR-10, ImageNet, FFHQ, and LSUN), the proposed sampler consistently improves sample quality under the same NFE budget and can be competitive with, and sometimes outperform, state-of-the-art higher-order samplers. Overall, the results suggest that the placement of DPM evaluations provides an additional and largely independent design angle for accelerating diffusion sampling.


【26】MultiRisk: Multiple Risk Control via Iterative Score Thresholding
标题:MultiRisk:通过迭代分数阈值化实现多重风险控制
链接:https://arxiv.org/abs/2512.24587

作者:Sunay Joshi,Yan Sun,Hamed Hassani,Edgar Dobriban
摘要:As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a lightweight mechanism for behavior control that compares performance scores to estimated thresholds, and modifies outputs when these bounds are violated. We formalize the problem of enforcing multiple risk constraints with user-defined priorities, and introduce two efficient dynamic programming algorithms that leverage this sequential structure. The first, MULTIRISK-BASE, provides a direct finite-sample procedure for selecting thresholds, while the second, MULTIRISK, leverages data exchangeability to guarantee simultaneous control of the risks. Under mild assumptions, we show that MULTIRISK achieves nearly tight control of all constraint risks. The analysis requires an intricate iterative argument, upper bounding the risks by introducing several forms of intermediate symmetrized risk functions, and carefully lower bounding the risks by recursively counting jumps in symmetrized risk functions between appropriate risk levels. We evaluate our framework on a three-constraint Large Language Model alignment task using the PKU-SafeRLHF dataset, where the goal is to maximize helpfulness subject to multiple safety constraints, and where scores are generated by a Large Language Model judge and a perplexity filter. Our experimental results show that our algorithm can control each individual risk at close to the target level.
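
以下为编者对基础流程的简化示意(省略了论文中的有限样本校正与基于可交换性的保证;"分数—阈值"的具体语义亦为编者假设):按优先级依次筛掉经验风险超标的阈值。

```python
import numpy as np

# Minimal stand-in for an iterative-thresholding base procedure (our
# simplification, not MULTIRISK itself): sweep a threshold grid and, for each
# constraint in priority order, keep only thresholds whose empirical risk
# meets the target; then pick the most permissive surviving threshold.
rng = np.random.default_rng(0)
n, thresholds = 1000, np.linspace(0, 1, 101)
scores = rng.random(n)                              # e.g., per-output safety scores
loss_a = rng.random(n) < 0.3 * scores               # constraint-A violation indicators
loss_b = rng.random(n) < 0.2 * scores               # constraint-B violation indicators
targets = [0.10, 0.08]                              # risk targets, in priority order

feasible = thresholds                               # outputs with score <= t pass the filter
for losses, alpha in zip([loss_a, loss_b], targets):
    risks = np.array([(losses & (scores <= t)).mean() for t in feasible])
    feasible = feasible[risks <= alpha]             # keep risk-controlling thresholds

print("chosen threshold:", feasible.max() if len(feasible) else None)
```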


【27】Probabilistic Computers for Neural Quantum States
标题:面向神经量子态的概率计算机
链接:https://arxiv.org/abs/2512.24558

作者:Shuvro Chowdhury,Jasper Pieterse,Navid Anjum Aadit,Johan H. Mentink,Kerem Y. Camsari
摘要:Neural quantum states efficiently represent many-body wavefunctions with neural networks, but the cost of Monte Carlo sampling limits their scaling to large system sizes. Here we address this challenge by combining sparse Boltzmann machine architectures with probabilistic computing hardware. We implement a probabilistic computer on field programmable gate arrays (FPGAs) and use it as a fast sampler for energy-based neural quantum states. For the two-dimensional transverse-field Ising model at criticality, we obtain accurate ground-state energies for lattices up to 80 $\times$ 80 (6400 spins) using a custom multi-FPGA cluster. Furthermore, we introduce a dual-sampling algorithm to train deep Boltzmann machines, replacing intractable marginalization with conditional sampling over auxiliary layers. This enables the training of sparse deep models and improves parameter efficiency relative to shallow networks. Using this algorithm, we train deep Boltzmann machines for a system with 35 $\times$ 35 (1225 spins). Together, these results demonstrate that probabilistic hardware can overcome the sampling bottleneck in variational simulation of quantum many-body systems, opening a path to larger system sizes and deeper variational architectures.
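
下面给出教科书式的 p-bit 更新规则示意(耦合矩阵为编者自拟的玩具稀疏结构,并非论文的稀疏深度玻尔兹曼机),即此类 FPGA 概率计算机在硬件中实现的基本操作:

```python
import numpy as np

# Textbook p-bit update (binary stochastic neuron): m_i = sgn(tanh(beta*I_i) - r),
# with r uniform in [-1, 1]. Sweeping all p-bits performs Gibbs-style sampling
# from the Boltzmann distribution of (J, h). Couplings here are placeholders.
rng = np.random.default_rng(0)
n = 64
J = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.1)  # sparse couplings
J = (J + J.T) / 2
np.fill_diagonal(J, 0)
h = rng.standard_normal(n)
m = rng.choice([-1.0, 1.0], size=n)

beta = 1.0
for sweep in range(1000):
    for i in rng.permutation(n):          # asynchronous updates over all p-bits
        I_i = beta * (J[i] @ m + h[i])    # local input to p-bit i
        m[i] = np.sign(np.tanh(I_i) - rng.uniform(-1, 1))
# m is now (approximately) a sample from the Boltzmann distribution of (J, h).
```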


【28】Variational Quantum Brushes
标题:变分量子画笔
链接:https://arxiv.org/abs/2512.24173

作者:Jui-Ting Lu,Henrique Ennes,Chih-Kang Huang,Ali Abbassi
摘要:Quantum brushes are computational arts software introduced by Ferreira et al. (2025) that leverage quantum behavior to generate novel artistic effects. In this outreach paper, we introduce the mathematical framework and describe the implementation of two quantum brushes based on variational quantum algorithms, Steerable and Chemical. While Steerable uses quantum geometric control theory to merge two works of art, Chemical mimics variational eigensolvers for estimating molecular ground energies to evolve colors on an underlying canvas. The implementation of both brushes is available open-source at https://github.com/moth-quantum/QuantumBrush and is fully compatible with the original quantum brushes.


【29】Score-based sampling without diffusions: Guidance from a simple and modular scheme
标题:无需扩散的基于分数的采样:来自一个简单模块化方案的指引
链接:https://arxiv.org/abs/2512.24152

作者:M. J. Wainwright
摘要:Sampling based on score diffusions has led to striking empirical results, and has attracted considerable attention from various research communities. It depends on availability of (approximate) Stein score functions for various levels of additive noise. We describe and analyze a modular scheme that reduces score-based sampling to solving a short sequence of "nice" sampling problems, for which high-accuracy samplers are known. We show how to design forward trajectories such that both (a) the terminal distribution, and (b) each of the backward conditional distributions is defined by a strongly log concave (SLC) distribution. This modular reduction allows us to exploit any SLC sampling algorithm in order to traverse the backwards path, and we establish novel guarantees with short proofs for both uni-modal and multi-modal densities. The use of high-accuracy routines yields $\varepsilon$-accurate answers, in either KL or Wasserstein distances, with polynomial dependence on $\log(1/\varepsilon)$ and $\sqrt{d}$ dependence on the dimension.


【30】Fundamental limits for weighted empirical approximations of tilted distributions
标题:倾斜分布加权经验逼近的基本极限
链接:https://arxiv.org/abs/2512.23979

作者:Sarvesh Ravichandran Iyer,Himadri Mandal,Dhruman Gupta,Rushil Gupta,Agniv Bandhyopadhyay,Achal Bassamboo,Varun Gupta,Sandeep Juneja
备注:84 pages, 6 figures
摘要:Consider the task of generating samples from a tilted distribution of a random vector whose underlying distribution is unknown, but samples from it are available. This finds applications in fields such as finance and climate science, and in rare event simulation. In this article, we discuss the asymptotic efficiency of a self-normalized importance sampler of the tilted distribution. We provide a sharp characterization of its accuracy, given the number of samples and the degree of tilt. Our findings reveal a surprising dichotomy: while the number of samples needed to accurately tilt a bounded random vector increases polynomially in the tilt amount, it increases at a super polynomial rate for unbounded distributions.
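
以下为自归一化重要性采样的极简示例(编者自拟):对标准正态做指数倾斜,此时倾斜后的真实均值恰为 theta,便于检验估计精度。

```python
import numpy as np

# Self-normalized importance sampler for an exponentially tilted distribution:
# weight each base sample x by exp(theta * x) and normalize. Tilting N(0, 1)
# yields N(theta, 1), so the exact tilted mean is theta.
rng = np.random.default_rng(0)
theta, n = 1.5, 100_000

x = rng.standard_normal(n)           # samples from the (here known) base law
w = np.exp(theta * x)
w /= w.sum()                         # self-normalized weights

est_mean = np.sum(w * x)
print(f"tilted mean estimate {est_mean:.3f} (exact: {theta})")
# For heavier tilts, the effective sample size 1 / np.sum(w**2) collapses,
# reflecting the sample-complexity dichotomy highlighted in the abstract.
```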


【31】Implicit geometric regularization in flow matching via density weighted Stein operators
标题:通过密度加权Stein算子实现流匹配中的隐式几何正则化
链接:https://arxiv.org/abs/2512.23956

作者:Shinto Eguchi
摘要:Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In high dimensions, this leads to a fundamental inefficiency: the vast majority of the integration domain consists of low-density "void" regions where the target velocity fields are often chaotic or ill-defined. In this paper, we propose $γ$-Flow Matching ($γ$-FM), a density-weighted variant that aligns the regression geometry with the underlying probability flow. While density weighting is desirable, naive implementations would require evaluating the intractable target density. We circumvent this by introducing a Dynamic Density-Weighting strategy that estimates the target density directly from training particles. This approach allows us to dynamically downweight the regression loss in void regions without compromising the simulation-free nature of FM. Theoretically, we establish that $γ$-FM minimizes the transport cost on a statistical manifold endowed with the $γ$-Stein metric. Spectral analysis further suggests that this geometry induces an implicit Sobolev regularization, effectively damping high-frequency oscillations in void regions. Empirically, $γ$-FM significantly improves vector field smoothness and sampling efficiency on high-dimensional latent datasets, while demonstrating intrinsic robustness to outliers.
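
下面是编者对"密度加权流匹配"思路的示意性理解(k-NN 密度代理与指数 gamma 均为编者的占位选择,并非论文的估计器):

```python
import torch

# Sketch of density-weighted flow matching as we read the abstract: estimate
# the target density at each training particle (here with a crude k-NN
# distance proxy) and downweight the FM regression loss in low-density
# "void" regions. Estimator and exponent are placeholder choices.
x1 = torch.randn(512, 2)                      # target particles
d_knn = torch.cdist(x1, x1).topk(11, largest=False).values[:, -1]
density = 1.0 / (d_knn ** 2 + 1e-8)           # k-NN density proxy
gamma = 0.5
weights = density ** gamma
weights = weights / weights.mean()            # keep the overall loss scale

x0 = torch.randn_like(x1)
t = torch.rand(512, 1)
xt = (1 - t) * x0 + t * x1
v_pred = torch.zeros_like(x1)                 # placeholder for a network output
loss = (weights.unsqueeze(-1) * (v_pred - (x1 - x0)) ** 2).mean()
```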


【32】Energy-Tweedie: Score meets Score, Energy meets Energy
标题:Energy-Tweedie:分数遇见分数,能量遇见能量
链接:https://arxiv.org/abs/2512.23818

作者:Andrej Leban
备注:22 pages, 5 figures
摘要:Denoising and score estimation have long been known to be linked via the classical Tweedie's formula. In this work, we first extend the latter to a wider range of distributions often called "energy models" and denoted elliptical distributions in this work. Next, we examine an alternative view: we consider the denoising posterior $P(X|Y)$ as the optimizer of the energy score (a scoring rule) and derive a fundamental identity that connects the (path-) derivative of a (possibly) non-Euclidean energy score to the score of the noisy marginal. This identity can be seen as an analog of Tweedie's identity for the energy score, and allows for several interesting applications; for example, score estimation, noise distribution parameter estimation, as well as using energy score models in the context of "traditional" diffusion model samplers with a wider array of noising distributions.
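
以下代码数值验证经典高斯情形的 Tweedie 公式(即本文所推广的出发点):后验均值等于观测值加上噪声方差乘以含噪边缘分布的分数。

```python
import numpy as np

# Classical Tweedie denoising, the Gaussian special case the paper
# generalizes: x_hat = y + sigma^2 * score(y). With a known N(0, 1) prior,
# the noisy marginal is N(0, 1 + sigma^2), so the identity can be checked
# in closed form.
rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(100_000)              # clean samples, prior N(0, 1)
y = x + sigma * rng.standard_normal(x.size)   # noisy observations

score_y = -y / (1 + sigma**2)                 # score of the marginal of y
x_hat = y + sigma**2 * score_y                # Tweedie posterior mean

# Exact posterior mean for this model is E[x|y] = y / (1 + sigma^2):
print(np.allclose(x_hat, y / (1 + sigma**2)))
```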


【33】Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting
标题:基于平稳加权的无Bellman完备性拟合Q估计
链接:https://arxiv.org/abs/2512.23805

作者:Lars van der Laan,Nathan Kallus
摘要:Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under the evaluation Bellman operator. This requirement is challenging because enlarging the hypothesis class can worsen completeness. We show that the need for this assumption stems from a fundamental norm mismatch: the Bellman operator is gamma-contractive under the stationary distribution of the target policy, whereas FQE minimizes Bellman error under the behavior distribution. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness, avoiding the geometric error blow-up of standard FQE in this setting while maintaining the practicality of regression-based evaluation.
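
下面是编者对"加权回归步"的示意(特征、数据与密度比估计均为占位):在每步最小二乘 Bellman 回归前,用目标策略的平稳密度比估计 w 对各转移加权。

```python
import numpy as np

# One weighted fitted-Q-evaluation loop, under our reading of the fix above:
# reweight each transition by an estimate w(s, a) of the stationary density
# ratio of the target policy before the least-squares Bellman regression.
rng = np.random.default_rng(0)
n, d, gamma = 2000, 8, 0.99

phi = rng.standard_normal((n, d))        # features of (s, a) from behavior data
phi_next = rng.standard_normal((n, d))   # features of (s', pi(s'))
r = rng.random(n)                        # rewards
w = rng.gamma(2.0, 0.5, size=n)          # stationary density-ratio estimates (placeholder)

theta = np.zeros(d)
for _ in range(100):                     # fitted Q-iteration for evaluation
    target = r + gamma * (phi_next @ theta)           # Bellman regression target
    A = phi.T @ (w[:, None] * phi) + 1e-3 * np.eye(d) # weighted normal equations
    theta = np.linalg.solve(A, phi.T @ (w * target))
```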


【34】Spike-Timing-Dependent Plasticity for Bernoulli Message Passing
标题:用于Bernoulli消息传递的脉冲时序依赖可塑性
链接:https://arxiv.org/abs/2512.23728

作者:Sepideh Adamiat,Wouter M. Kouw,Bert de Vries
摘要:Bayesian inference provides a principled framework for understanding brain function, while neural activity in the brain is inherently spike-based. This paper bridges these two perspectives by designing spiking neural networks that simulate Bayesian inference through message passing for Bernoulli messages. To train the networks, we employ spike-timing-dependent plasticity, a biologically plausible mechanism for synaptic plasticity which is based on the Hebbian rule. Our results demonstrate that the network's performance closely matches the true numerical solution. We further demonstrate the versatility of our approach by implementing a factor graph example from coding theory, illustrating signal transmission over an unreliable channel.
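
以下为标准的成对 STDP 规则示意(常数为示例值,并非论文设置):突触前脉冲先于突触后脉冲则增强突触,反之削弱,幅度随时间差指数衰减。

```python
import numpy as np

# Standard pair-based STDP rule (the Hebbian mechanism referenced above):
# potentiation when the presynaptic spike precedes the postsynaptic one,
# depression otherwise, with exponential dependence on the timing gap.
A_plus, A_minus, tau = 0.01, 0.012, 20.0   # ms; illustrative constants only

def stdp_dw(t_pre, t_post):
    dt = t_post - t_pre
    if dt > 0:                   # pre before post -> potentiation
        return A_plus * np.exp(-dt / tau)
    else:                        # post before pre -> depression
        return -A_minus * np.exp(dt / tau)

w = 0.5
for t_pre, t_post in [(10.0, 15.0), (40.0, 35.0), (60.0, 62.0)]:
    w += stdp_dw(t_pre, t_post)
print(f"weight after three spike pairs: {w:.4f}")
```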


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
