
机器学习学术速递[12.8]

arXiv每日学术速递



cs.LG 方向,今日共计113篇


大模型相关(16篇)

【1】Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
标题:剩下的一切都必须是真的:过滤驱动LLM的推理,塑造多样性
链接:https://arxiv.org/abs/2512.05962

作者:Germán Kruszewski,Pierre Erbacher,Jos Rozen,Marc Dymetman
摘要:强化学习(RL)已经成为调整LLM以解决推理任务的事实标准。然而,越来越多的证据表明,以这种方式训练的模型往往会遭受多样性的重大损失。我们认为,这是因为RL隐式地优化了到目标分布的"模式寻求"或"迫零"反向KL,导致模型将概率质量集中在目标的某些高概率区域,而忽略其他区域。在这项工作中,我们转而从一个显式的目标分布出发,该分布通过过滤掉不正确的答案、同时保留正确答案的相对概率来获得。从预训练LLM开始,我们使用$α$-散度族来近似这个目标分布;该散度族统一了先前的方法,并通过在模式寻求和质量覆盖散度之间插值来直接控制精度-多样性权衡。在Lean定理证明基准上,我们的方法在覆盖-精度Pareto边界上实现了最先进的性能,在覆盖轴上优于所有先前的方法。
摘要:Reinforcement Learning (RL) has become the de facto standard for tuning LLMs to solve tasks involving reasoning. However, growing evidence shows that models trained in such way often suffer from a significant loss in diversity. We argue that this arises because RL implicitly optimizes the "mode-seeking" or "zero-forcing" Reverse KL to a target distribution causing the model to concentrate mass on certain high-probability regions of the target while neglecting others. In this work, we instead begin from an explicit target distribution, obtained by filtering out incorrect answers while preserving the relative probabilities of correct ones. Starting from a pre-trained LLM, we approximate this target distribution using the $α$-divergence family, which unifies prior approaches and enables direct control of the precision-diversity trade-off by interpolating between mode-seeking and mass-covering divergences. On a Lean theorem-proving benchmark, our method achieves state-of-the-art performance along the coverage-precision Pareto frontier, outperforming all prior methods on the coverage axis.
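下面用一个极简的numpy草图说明摘要中的两个核心构件:通过过滤错误答案并重新归一化得到显式目标分布,以及统一正向/反向KL两个极端的$α$-散度族(此处采用Amari参数化;具体数值与实现细节为示意性假设,并非论文代码):

```python
import numpy as np

def filtered_target(p, correct_mask):
    """显式目标分布: 滤掉错误答案, 保留正确答案的相对概率。"""
    t = p * correct_mask
    return t / t.sum()

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Amari alpha-散度 D_alpha(p||q):
    alpha->1 退化为正向 KL(p||q)(质量覆盖),
    alpha->0 退化为反向 KL(q||p)(模式寻求)。"""
    if np.isclose(alpha, 1.0):
        return float(np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)))
    if np.isclose(alpha, 0.0):
        return float(np.sum(np.where(q > 0, q * np.log((q + eps) / (p + eps)), 0.0)))
    return float((np.sum(p**alpha * q**(1 - alpha)) - 1.0) / (alpha * (alpha - 1)))

# 四个候选答案, 前两个正确
p = np.array([0.5, 0.3, 0.1, 0.1])
target = filtered_target(p, np.array([1.0, 1.0, 0.0, 0.0]))
```

对不同的alpha最小化到该目标的散度,即可在精度(模式寻求)与多样性(质量覆盖)之间插值。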


【2】Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models
标题:利用语言模型为低资源语言方言的编译器自举模糊测试器
链接:https://arxiv.org/abs/2512.05887

作者:Sairam Vaidya,Marcel Böhme,Loris D'Antoni
摘要:现代可扩展编译器框架(如MLIR)支持快速创建领域特定的语言方言。然而,这种灵活性使正确性更难保证,因为加速开发的可扩展性同样使维护测试基础设施变得复杂。可扩展语言需要自动化的测试生成,既与方言无关(跨方言工作,无需手动调整),又对方言有效(针对方言特定的功能来查找bug)。现有的方法通常牺牲其中一个目标:要么需要为每种方言手动构建种子语料库,要么无法做到有效。我们提出了一种与方言无关且对方言有效的、基于语法和覆盖率引导的可扩展编译器模糊测试方法,它结合了现有工作的两个关键见解:(i)方言的语法已经编码了结构和类型约束,往往可以从方言规范中自动提取;(ii)这些语法可以与预训练的大型语言模型结合,从整个方言空间自动生成具有代表性和多样性的种子输入,而无需任何人工输入或训练数据。这些种子随后可用于引导覆盖率引导的模糊测试器。我们将这种方法构建成一个工具Germinator。在对涵盖91种方言的六个MLIR项目进行评估时,Germinator生成的种子比基于语法的基线将行覆盖率提高了10-120%。我们与基于语法的基线进行比较,因为它们是唯一一类可以统一应用于MLIR异构方言生态系统的现有自动种子生成器。Germinator发现了88个以前未知的bug(40个已确认),其中23个位于此前没有自动测试生成器的方言中,证明了对低资源方言的有效且可控的大规模测试。
摘要:Modern extensible compiler frameworks-such as MLIR-enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure as the same extensibility that accelerates development also complicates maintaining the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals by either requiring manually constructed seed corpora for each dialect, or by failing to be effective. We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for extensible compilers that combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural and type constraints, can often be extracted automatically from the dialect specification; and (ii) these grammars can be used in combination with pre-trained large language models to automatically generate representative and diverse seed inputs from the full dialect space without requiring any manual input or training data. These seeds can then be used to bootstrap coverage-guided fuzzers. We built this approach into a tool, Germinator. When evaluated on six MLIR projects spanning 91 dialects, Germinator generated seeds improve line coverage by 10-120% over grammar-based baselines. We compare against grammar-based baselines because they are the only class of existing automatic seed generators that can be applied uniformly across MLIR's heterogeneous dialect ecosystem. Germinator discovers 88 previously unknown bugs (40 confirmed), including 23 in dialects with no prior automated test generators, demonstrating effective and controllable testing of low-resource dialects at scale.
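摘要中"从语法采样种子输入"的思路可以用一个玩具语法示意。这里的"toy"方言与语法规则均为虚构示例;真实系统中语法从方言规范自动提取,并由LLM而非均匀随机采样来保证种子的代表性:

```python
import random
import re

# 虚构的玩具方言语法: 非终结符 -> 候选产生式
GRAMMAR = {
    "expr": ["toy.add({expr}, {expr})", "toy.mul({expr}, {expr})", "toy.const({num})"],
    "num": ["1", "2", "3"],
}

def expand(symbol, rng, depth=0, max_depth=3):
    """递归展开语法; 超过深度上限时只选终结产生式, 保证停机。"""
    choices = GRAMMAR[symbol]
    if depth >= max_depth:
        terminal = [c for c in choices if "{expr}" not in c]
        choices = terminal or choices
    template = rng.choice(choices)
    return re.sub(r"\{(\w+)\}",
                  lambda m: expand(m.group(1), rng, depth + 1, max_depth),
                  template)

rng = random.Random(0)
seeds = [expand("expr", rng) for _ in range(5)]  # 语法上合法的种子输入
```

由语法保证合法性的种子可直接喂给覆盖率引导的模糊测试器作为初始语料。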


【3】Mechanistic Interpretability of Antibody Language Models Using SAEs
标题:使用稀疏自编码器(SAE)的抗体语言模型机制可解释性
链接:https://arxiv.org/abs/2512.05794

作者:Rebonto Haque,Oliver M. Turnbull,Anisha Parsan,Nithin Parsan,John J. Yang,Charlotte M. Deane
摘要:稀疏自编码器(SAE)是一种机制可解释性技术,已被用于洞察大型蛋白质语言模型中学习到的概念。在这里,我们采用TopK和有序(Ordered)SAE来研究自回归抗体语言模型p-IgGen,并引导其生成。我们发现,TopK SAE可以揭示具有生物学意义的潜在特征,但特征与概念的高相关性并不能保证对生成的因果控制。相比之下,有序SAE施加了一种分层结构,可以可靠地识别可操纵的特征,但代价是更复杂、更难解释的激活模式。这些发现推进了特定领域蛋白质语言模型的机制可解释性,并表明:虽然TopK SAE足以将潜在特征映射到概念,但当需要精确的生成引导时,有序SAE是更优选择。
摘要:Sparse autoencoders (SAEs) are a mechanistic interpretability technique that have been used to provide insight into learned concepts within large protein language models. Here, we employ TopK and Ordered SAEs to investigate an autoregressive antibody language model, p-IgGen, and steer its generation. We show that TopK SAEs can reveal biologically meaningful latent features, but high feature concept correlation does not guarantee causal control over generation. In contrast, Ordered SAEs impose an hierarchical structure that reliably identifies steerable features, but at the expense of more complex and less interpretable activation patterns. These findings advance the mechanistic interpretability of domain-specific protein language models and suggest that, while TopK SAEs are sufficient for mapping latent features to concepts, Ordered SAEs are preferable when precise generative steering is required.
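TopK SAE的核心是只保留每个样本前k个最大的潜在激活。下面是一个最小的numpy前向草图(权重为随机示例;真实SAE需在模型激活上训练,维度亦为示意值):

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """TopK稀疏自编码器前向: ReLU编码后仅保留每行top-k激活, 再线性解码。"""
    z = np.maximum(x @ W_enc + b_enc, 0.0)
    drop = np.argsort(z, axis=-1)[:, :-k]      # 除top-k外全部清零
    z_sparse = z.copy()
    np.put_along_axis(z_sparse, drop, 0.0, axis=-1)
    x_hat = z_sparse @ W_dec + b_dec           # 重建输入激活
    return z_sparse, x_hat

rng = np.random.default_rng(0)
d_model, d_latent, k = 8, 32, 4
x = rng.normal(size=(2, d_model))
W_enc = rng.normal(size=(d_model, d_latent))
W_dec = rng.normal(size=(d_latent, d_model))
z, x_hat = topk_sae_forward(x, W_enc, np.zeros(d_latent), W_dec, np.zeros(d_model), k)
```

训练后,稀疏潜在维度即摘要所说的"潜在特征",可逐一与生物学概念对照。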


【4】Teaching Language Models Mechanistic Explainability Through Arrow-Pushing
标题:通过箭头推动(arrow-pushing)教授语言模型机制可解释性
链接:https://arxiv.org/abs/2512.05722

作者:Théo A. Neukomm,Zlatko Jončev,Philippe Schwaller
备注:ELLIS 2025 ML4Molecules Workshop
摘要:化学反应机制为可合成性提供了重要的洞察,但目前的计算机辅助合成规划(CASP)系统缺乏机制基础。我们引入了一个计算框架,用于教语言模型通过箭头推动形式(一种有百年历史、在尊重守恒定律的同时跟踪电子流的符号体系)预测化学反应机制。我们开发了MechSMILES,一种编码分子结构和电子流的紧凑文本格式,并使用机制反应数据集(如mech-USPTO-31k和FlowER)在四个复杂度递增的机制预测任务上训练了语言模型。我们的模型在基本步骤预测上达到了超过95%的top-3准确率;在最困难的完整反应机制检索任务上,在mech-USPTO-31k上得分超过73%,在FlowER数据集上达到93%。这种机制层面的理解支持三个关键应用。首先,我们的模型可作为CASP系统的事后验证器,过滤化学上不可信的转换。其次,它们能够实现跟踪包括氢在内的所有原子的整体原子到原子映射。第三,它们提取催化剂感知的反应模板,将循环使用的催化剂与旁观者物种区分开来。通过将预测建立在确保质量和电荷守恒的、物理上有意义的电子移动之上,这项工作提供了一条通往更可解释、化学上有效的计算合成规划的途径,同时为机制预测的基准测试提供了一个与架构无关的框架。
摘要:Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computational framework for teaching language models to predict chemical reaction mechanisms through arrow pushing formalism, a century-old notation that tracks electron flow while respecting conservation laws. We developed MechSMILES, a compact textual format encoding molecular structure and electron flow, and trained language models on four mechanism prediction tasks of increasing complexity using mechanistic reaction datasets, such as mech-USPTO-31k and FlowER. Our models achieve more than 95\% top-3 accuracy on elementary step prediction and scores that surpass 73\% on mech-USPTO-31k, and 93\% on FlowER dataset for the retrieval of complete reaction mechanisms on our hardest task. This mechanistic understanding enables three key applications. First, our models serve as post-hoc validators for CASP systems, filtering chemically implausible transformations. Second, they enable holistic atom-to-atom mapping that tracks all atoms, including hydrogens. Third, they extract catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. By grounding predictions in physically meaningful electron moves that ensure conservation of mass and charge, this work provides a pathway toward more explainable and chemically valid computational synthesis planning, while providing an architecture-agnostic framework for the benchmarking of mechanism prediction.


【5】Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
标题:超越数据过滤:LLM中能力删除的知识本地化
链接:https://arxiv.org/abs/2512.05648

作者:Igor Shilov,Alex Cloud,Aryo Pradipta Gema,Jacob Goldman-Wetzler,Nina Panickssery,Henry Sleight,Erik Jones,Cem Anil
摘要:大型语言模型越来越多地拥有带有双重用途风险的能力。虽然数据过滤已经成为预训练阶段的缓解措施,但它面临重大挑战:大规模标注数据是否有害代价高昂;并且考虑到更大模型的样本效率不断提高,即使少量错误标注的内容也可能催生危险能力。为了解决与错误标注的有害内容相关的风险,先前的工作提出了梯度路由(Gradient Routing, Cloud等人, 2024):一种将目标知识本地化到模型参数的专用子集中、以便稍后删除的技术。我们探索了梯度路由的一种改进变体,称之为选择性梯度屏蔽(SGTM),并特别侧重于评估其对标签噪声的鲁棒性。SGTM对选定的梯度做零屏蔽,使得目标领域的样本只更新其专用参数。我们在两个应用中测试SGTM的有效性:从在双语合成数据集上训练的模型中删除一种语言的知识,以及从在英文维基百科上训练的模型中删除生物学知识。在这两种情况下,与数据过滤和先前提出的梯度路由实例相比,SGTM在存在标注错误时都提供了更好的保留/遗忘权衡。与可以通过微调快速撤销的浅层遗忘方法不同,SGTM对对抗性微调表现出很强的鲁棒性:与基于微调的遗忘方法(RMU)相比,需要七倍以上的微调步骤才能在遗忘集上恢复到基线性能。我们的结果表明,SGTM为现有安全缓解措施提供了一个有前景的预训练阶段补充,特别是在标签噪声不可避免的设置中。
摘要:Large Language Models increasingly possess capabilities that carry dual-use risks. While data filtering has emerged as a pretraining-time mitigation, it faces significant challenges: labeling whether data is harmful is expensive at scale, and given improving sample efficiency with larger models, even small amounts of mislabeled content could give rise to dangerous capabilities. To address risks associated with mislabeled harmful content, prior work proposed Gradient Routing (Cloud et al., 2024) -- a technique that localizes target knowledge into a dedicated subset of model parameters so they can later be removed. We explore an improved variant of Gradient Routing, which we call Selective GradienT Masking (SGTM), with particular focus on evaluating its robustness to label noise. SGTM zero-masks selected gradients such that target domain examples only update their dedicated parameters. We test SGTM's effectiveness in two applications: removing knowledge of one language from a model trained on a bilingual synthetic dataset, and removing biology knowledge from a model trained on English Wikipedia. In both cases SGTM provides better retain/forget trade-off in the presence of labeling errors compared to both data filtering and a previously proposed instantiation of Gradient Routing. Unlike shallow unlearning approaches that can be quickly undone through fine-tuning, SGTM exhibits strong robustness to adversarial fine-tuning, requiring seven times more fine-tuning steps to reach baseline performance on the forget set compared to a finetuning-based unlearning method (RMU). Our results suggest SGTM provides a promising pretraining-time complement to existing safety mitigations, particularly in settings where label noise is unavoidable.
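SGTM的梯度零屏蔽可以用几行numpy示意:目标域批次只更新专用参数子集(摘要明言);保留域批次只更新其余参数则是我们为对称性所做的假设,论文的具体路由规则可能不同:

```python
import numpy as np

def sgtm_mask_grads(grads, dedicated_mask, is_target_batch):
    """按批次来源对梯度做零屏蔽, 把目标知识局限在专用参数内。"""
    masked = {}
    for name, g in grads.items():
        m = dedicated_mask[name]
        masked[name] = g * m if is_target_batch else g * (1.0 - m)
    return masked

grads = {"w": np.array([1.0, 2.0, 3.0, 4.0])}
mask = {"w": np.array([1.0, 1.0, 0.0, 0.0])}   # 前两个分量为专用参数
g_target = sgtm_mask_grads(grads, mask, True)
g_retain = sgtm_mask_grads(grads, mask, False)
```

训练结束后,将专用子集参数置零即可移除局部化在其中的目标能力。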


【6】RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs
标题:RoBoN:用于多LLM测试时扩展的路由在线best-of-n
链接:https://arxiv.org/abs/2512.05542

作者:Jonathan Geuter,Gregor Kornhardt
备注:20 pages, 3 figures. 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Foundations of Reasoning in Language Models
摘要:Best-of-$n$是LLM推理中广泛使用的测试时扩展方法。然而,尽管有证据表明LLM在不同任务上表现出互补的优势,传统上best-of-$n$依赖单一模型来生成响应。我们提出了RoBoN(Routed Online Best-of-$n$),一种替代流行的单模型best-of-$n$的顺序多LLM方法。给定一组模型$\{m_i\}_{i=1}^M$,RoBoN基于奖励模型计算的分数和预测响应上的一致性信号,逐个顺序地将生成路由到各个模型。这种在线路由不需要额外训练,保持计算量对等,并可与任何即插即用的奖励模型一起工作。在多个推理基准(MATH500、OlympiadBench、MinervaMath、GSM8K、MMLU)上,对于较大的$n$,RoBoN始终优于应用于每个单一模型的标准best-of-$n$,绝对准确率提升最高达3.4%,并且也优于均匀的多模型组合基线。我们的结果表明,可以在推理时利用模型间的多样性,使best-of-$n$性能超过任何单个组成模型,为使用多个LLM的测试时扩展提供了一条简单、免训练的路径。
摘要:Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a single model to generate responses. We propose RoBoN (Routed Online Best-of-$n$), a sequential multi-LLM alternative to the prevailing single-model best-of-$n$. Given a suite of models $\{m_i\}_{i=1}^M$, RoBoN sequentially routes generations one-by-one across models, based on scores computed using a reward model and an agreement signal on the predicted responses. This online routing requires no additional training, keeps compute parity, and works with any plug-in reward model. Across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU), RoBoN consistently outperforms standard best-of-$n$ applied to each individual model for larger $n$, with gains of up to 3.4\% in absolute accuracy, and also improves over a uniform multi-model portfolio baseline. Our results indicate that diversity across models can be exploited at inference to improve best-of-$n$ performance over any constituent model alone, providing a simple, training-free path to test-time scaling with multiple LLMs.
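下面是一个示意性的多模型best-of-$n$路由草图:每轮把下一次生成路由到当前平均得分最高的模型,得分由奖励模型分数加上与已有回答的一致性加成构成。加权方式与路由规则均为我们的简化假设,并非论文的精确算法:

```python
from collections import Counter

def robon_sketch(models, reward_fn, n, agree_weight=0.5):
    """models: 返回回答字符串的可调用列表; reward_fn: 回答 -> 奖励分数。"""
    responses = []                      # (回答, 综合得分)
    scores = {i: [] for i in range(len(models))}
    for _ in range(n):
        untried = [i for i in range(len(models)) if not scores[i]]
        if untried:                     # 先保证每个模型都试过一次
            i = untried[0]
        else:                           # 之后路由到平均得分最高的模型
            i = max(scores, key=lambda j: sum(scores[j]) / len(scores[j]))
        r = models[i]()
        counts = Counter(ans for ans, _ in responses)
        agreement = counts[r] / len(responses) if responses else 0.0
        s = reward_fn(r) + agree_weight * agreement
        responses.append((r, s))
        scores[i].append(s)
    return max(responses, key=lambda p: p[1])[0]
```

这种在线路由无需训练,只消费奖励模型打分与回答间的一致性信号。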


【7】Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement
标题:Poodle:通过即时模型替换无缝缩减大型语言模型
链接:https://arxiv.org/abs/2512.05525

作者:Nils Strassenburg,Boris Glavic,Tilmann Rabl
摘要:企业越来越依赖大型语言模型(LLM)来自动化简单的重复性任务,而不是开发自定义机器学习模型。LLM几乎不需要训练示例,并且可以由没有模型开发专业知识的用户使用。然而,与较小的模型相比,这是以高得多的资源和能源消耗为代价的,而较小的模型通常对简单任务能实现类似的预测性能。在本文中,我们提出了即时模型替换(JITR)的愿景:在识别出对LLM的调用中的重复性任务后,将模型透明地替换为一个在该特定任务上表现良好且更便宜的替代方案。JITR保留了LLM的易用性和低开发成本,同时节省了大量成本和能源。我们讨论了实现这一愿景的主要挑战:识别重复性任务和创建自定义模型。具体来说,我们认为模型搜索和迁移学习将在JITR中发挥至关重要的作用,以便为重复性任务高效地识别和微调模型。使用我们的JITR原型Poodle,我们在示例任务上实现了显著的节省。
摘要:Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring tasks and the creation of a custom model. Specifically, we argue that model search and transfer learning will play a crucial role in JITR to efficiently identify and fine-tune models for a recurring task. Using our JITR prototype Poodle, we achieve significant savings for exemplary tasks.


【8】TS-HINT: Enhancing Semiconductor Time Series Regression Using Attention Hints From Large Language Model Reasoning
标题:TS-Hint:利用大型语言模型推理的注意力提示增强半导体时间序列回归
链接:https://arxiv.org/abs/2512.05419

作者:Jonathan Adam Rico,Nagarajan Raghavan,Senthilnath Jayavelu
摘要:现有的数据驱动方法依赖于从时间序列中提取静态特征,来近似化学机械抛光(CMP)等半导体制造工艺的材料去除率(MRR)。然而,这会导致时间动态信息的丢失。此外,这些方法需要大量数据才能有效训练。在本文中,我们提出了TS-Hint,一个时间序列基础模型(TSFM)框架,它与思维链推理相结合,在训练过程中基于注意力机制数据和显著性数据提供注意力提示。实验结果表明,我们的模型在有限数据设置下通过少样本学习是有效的,并且可以直接从多变量时间序列特征中学习。
摘要:Existing data-driven methods rely on the extraction of static features from time series to approximate the material removal rate (MRR) of semiconductor manufacturing processes such as chemical mechanical polishing (CMP). However, this leads to a loss of temporal dynamics. Moreover, these methods require a large amount of data for effective training. In this paper, we propose TS-Hint, a Time Series Foundation Model (TSFM) framework, integrated with chain-of-thought reasoning which provides attention hints during training based on attention mechanism data and saliency data. Experimental results demonstrate the effectiveness of our model in limited data settings via few-shot learning and can learn directly from multivariate time series features.


【9】When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation
标题:当遗忘构建可靠性:面向可靠硬件代码生成的LLM遗忘
链接:https://arxiv.org/abs/2512.05341

作者:Yiwen Liang,Qiufeng Li,Shikai Wang,Weidong Cao
摘要:大型语言模型(LLM)在通过自动代码生成加速数字硬件设计方面显示出强大的潜力。然而,确保其可靠性仍然是一个关键挑战,因为在大规模异构数据集上训练的现有LLM通常表现出对专有知识产权(IP)、受污染基准和不安全编码模式的问题性记忆。为了减轻这些风险,我们提出了一种为基于LLM的硬件代码生成量身定制的新型遗忘框架。我们的方法结合了(i)一种语法保留的遗忘策略,可以在遗忘过程中保护硬件代码的结构完整性,以及(ii)一种细粒度的floor感知选择性损失,可以精确高效地移除有问题的知识。这种集成实现了有效的遗忘,而不会降低LLM的代码生成能力。大量实验表明,我们的框架支持最大三倍的遗忘集,通常只需要单个训练周期,同时保持寄存器传输级(RTL)代码的语法正确性和功能完整性。我们的工作为可靠的LLM辅助硬件设计铺平了道路。
摘要:Large Language Models (LLMs) have shown strong potential in accelerating digital hardware design through automated code generation. Yet, ensuring their reliability remains a critical challenge, as existing LLMs trained on massive heterogeneous datasets often exhibit problematic memorization of proprietary intellectual property (IP), contaminated benchmarks, and unsafe coding patterns. To mitigate these risks, we propose a novel unlearning framework tailored for LLM-based hardware code generation. Our method combines (i) a syntax-preserving unlearning strategy that safeguards the structural integrity of hardware code during forgetting, and (ii) a fine-grained floor-aware selective loss that enables precise and efficient removal of problematic knowledge. This integration achieves effective unlearning without degrading LLM code generation capabilities. Extensive experiments show that our framework supports forget sets up to 3x larger, typically requiring only a single training epoch, while preserving both syntactic correctness and functional integrity of register-transfer level (RTL) codes. Our work paves an avenue towards reliable LLM-assisted hardware design.


【10】Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models
标题:面向大型语言模型、具有稳健护栏的分类法自适应审核模型
链接:https://arxiv.org/abs/2512.05339

作者:Mahesh Kumar Nandwana,Youngwan Lim,Joseph Liu,Alex Yang,Varun Notibala,Nishchaie Khanna
备注:To be presented at AAAI-26 PerFM Workshop
摘要:大型语言模型(LLM)通常在训练后阶段进行安全对齐;然而,它们仍然可能生成对用户构成潜在风险的不当输出。这一挑战凸显了需要同时作用于模型输入和输出的稳健保障措施。在这项工作中,我们介绍了Roblox Guard 1.0,一种最先进的指令微调LLM,旨在通过全面的输入-输出审核来增强LLM系统的安全性,并使用LLM流水线来增强审核能力。我们的模型建立在Llama-3.1-8B-Instruct主干上,经过指令微调后可以泛化到之前未见过的安全分类法,并在域外安全基准测试中表现出强大的性能。指令微调过程使用合成和开源安全数据集的混合,并通过思维链(CoT)理由和输入反转进行增强,以提高上下文理解和决策能力。为了支持系统性评估,我们还发布了RobloxGuard-Eval,一个具有可扩展安全分类法的新基准,用于评估LLM护栏和审核框架的有效性。
摘要:Large Language Models (LLMs) are typically aligned for safety during the post-training phase; however, they may still generate inappropriate outputs that could potentially pose risks to users. This challenge underscores the need for robust safeguards that operate across both model inputs and outputs. In this work, we introduce Roblox Guard 1.0, a state-of-the-art instruction fine-tuned LLM designed to enhance the safety of LLM systems through comprehensive input-output moderation, using a pipeline of LLMs to enhance moderation capability. Built on the Llama-3.1-8B-Instruct backbone, our model is instruction fine-tuned to generalize across previously unseen safety taxonomies and demonstrates strong performance on out-of-domain safety benchmarks. The instruction fine-tuning process uses a mix of synthetic and open-source safety datasets, augmented with chain-of-thought (CoT) rationales and input inversion to enhance contextual understanding and decision making. To support systematic evaluation, we also release RobloxGuard-Eval, a new benchmark featuring an extensible safety taxonomy to assess the effectiveness of LLM guardrails and moderation frameworks.


【11】PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering
标题:PathFinder:基于MCTS和LLM反馈的多跳问答路径选择
链接:https://arxiv.org/abs/2512.05336

作者:Durga Prasad Maram,Kalpa Gunaratna,Vijay Srinivasan,Haris Jeelani,Srinivas Chappidi
备注:5 pages, 3 images
摘要:多跳问答是一项具有挑战性的任务,语言模型必须经过多步推理才能得到正确答案。借助大型语言模型及其推理能力,现有系统能够对输入问题进行多步思考和分解,以进行分析、检索和推理。然而,针对这一问题的基于训练的方法仍然受到LLM幻觉和不正确推理路径的影响,从而阻碍了性能。因此,我们提出了PATHFINDER,一种方法,它:(i)使用蒙特卡洛树搜索生成训练路径轨迹;(ii)通过使用子答案召回率和LLM作为评判(LLM-as-a-judge)的验证来过滤错误和冗长的轨迹,从而提高训练数据质量;(iii)重新表述子查询以处理检索失败的情况。通过遵循这些步骤,我们证明了PATHFINDER在公共基准数据集上提高了多跳问答的性能。
摘要:Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer. With the help of Large Language Models and their reasoning capabilities, existing systems are able to think and decompose an input question over multiple steps to analyze, retrieve, and reason. However, training-based approaches for this problem still suffer from LLM hallucinations and incorrect reasoning paths that hinder performance. Hence, we propose PATHFINDER, an approach that: (i) uses Monte Carlo Tree Search to generate training path traces, (ii) improves training data quality by filtering erroneous and lengthy traces using sub-answer recall and LLM-as-a-judge verification, and (iii) reformulates sub-queries to handle failed retrieval cases. By following these steps, we demonstrate that PATHFINDER improves the performance of multi-hop QA over public benchmark datasets.
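其中的"子答案召回"过滤步骤可以用几行代码示意(以简单子串包含作为匹配方式,召回与长度阈值均为示意值,并非论文设定):

```python
def sub_answer_recall(trace, gold_sub_answers):
    """轨迹中各步输出覆盖了多少比例的标准子答案。"""
    hits = sum(any(g.lower() in step.lower() for step in trace)
               for g in gold_sub_answers)
    return hits / len(gold_sub_answers)

def filter_traces(traces, gold_sub_answers, min_recall=1.0, max_steps=6):
    """丢弃子答案召回不全或过长的MCTS轨迹, 提升训练数据质量。"""
    return [t for t in traces
            if sub_answer_recall(t, gold_sub_answers) >= min_recall
            and len(t) <= max_steps]
```

通过该过滤,只有覆盖全部中间答案且步数合理的轨迹才进入训练集。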


【12】Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats
标题:揭露Pink Slime新闻:语言签名和针对LLM生成的威胁的稳健检测
链接:https://arxiv.org/abs/2512.05331

作者:Sadat Shahriar,Navid Ayoobi,Arjun Mukherjee,Mostafa Musharrat,Sai Vishnu Vamsi
备注:Published in RANLP 2025
摘要:当地新闻是2800万美国人可靠信息的重要来源,面临着来自Pink Slime Journalism的日益增长的威胁,Pink Slime Journalism是一种模仿合法当地报道的低质量自动生成的文章。检测这些欺骗性文章需要对它们的语言、文体和词汇特征进行细粒度的分析。在这项工作中,我们进行了全面的研究,以揭示粉红色粘液内容的区别模式,并提出基于这些见解的检测策略。除了传统的生成方法之外,我们还强调了一种新的对抗向量:通过大型语言模型(LLM)进行修改。我们的研究结果表明,即使是消费者可访问的LLM也会显著破坏现有的检测系统,使其F1得分降低高达40%。为了应对这种威胁,我们引入了一个强大的学习框架,专门用于抵御基于LLM的对抗性攻击,并适应自动化粉红色粘液新闻的不断发展,并显示出高达27%的改进。
摘要:The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism: low-quality, auto-generated articles that mimic legitimate local reporting. Detecting these deceptive articles requires a fine-grained analysis of their linguistic, stylistic, and lexical characteristics. In this work, we conduct a comprehensive study to uncover the distinguishing patterns of Pink Slime content and propose detection strategies based on these insights. Beyond traditional generation methods, we highlight a new adversarial vector: modifications through large language models (LLMs). Our findings reveal that even consumer-accessible LLMs can significantly undermine existing detection systems, reducing their performance by up to 40% in F1-score. To counter this threat, we introduce a robust learning framework specifically designed to resist LLM-based adversarial attacks and adapt to the evolving landscape of automated pink slime journalism, showing an improvement of up to 27%.


【13】The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?
标题:LLM签名的侵蚀:迭代解释后我们还能区分人类和LLM生成的科学思想吗?
链接:https://arxiv.org/abs/2512.05311

作者:Sadat Shahriar,Navid Ayoobi,Arjun Mukherjee
备注:Published in RANLP 2025
摘要:随着对LLM作为研究代理的依赖日益增加,区分LLM与人类产生的想法对于理解LLM研究能力的认知细微差别至关重要。虽然检测LLM生成的文本已被广泛研究,但区分人类与LLM生成的科学想法仍是一个未被探索的领域。在这项工作中,我们系统地评估了最先进(SOTA)机器学习模型区分人类与LLM生成的想法的能力,特别是在连续多轮释义之后。我们的研究结果突显了SOTA模型在来源归因上面临的挑战:经过连续五轮释义后,检测性能平均下降25.4%。此外,我们证明,将研究问题纳入上下文信息可将检测性能提高最多2.97%。值得注意的是,我们的分析表明,当想法被释义为简化的非专家风格时,检测算法会显著受挫,这对可区分LLM签名的侵蚀贡献最大。
摘要:With the increasing reliance on LLMs as research agents, distinguishing between LLM and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research capabilities. While detecting LLM-generated text has been extensively studied, distinguishing human vs LLM-generated scientific idea remains an unexplored area. In this work, we systematically evaluate the ability of state-of-the-art (SOTA) machine learning models to differentiate between human and LLM-generated ideas, particularly after successive paraphrasing stages. Our findings highlight the challenges SOTA models face in source attribution, with detection performance declining by an average of 25.4\% after five consecutive paraphrasing stages. Additionally, we demonstrate that incorporating the research problem as contextual information improves detection performance by up to 2.97%. Notably, our analysis reveals that detection algorithms struggle significantly when ideas are paraphrased into a simplified, non-expert style, contributing the most to the erosion of distinguishable LLM signatures.


【14】Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance
标题:缓解抗原数据瓶颈:使用蛋白质语言模型进行半监督学习用于甲型流感监测
链接:https://arxiv.org/abs/2512.05222

作者:Yanhua Xu
备注:V0: initial draft uploaded
摘要:甲型流感病毒(IAV)的抗原性进化速度要求频繁更新疫苗,但用于定量抗原性的血凝抑制(HI)测定是劳动密集型且不可扩展的。因此,基因组数据的增长远远超过可用的表型标签,限制了传统监督模型的有效性。我们假设,将预训练的蛋白质语言模型(PLM)与半监督学习(SSL)相结合,即使在标注数据稀缺的情况下也能保持较高的预测准确性。我们使用应用于血凝素(HA)序列的四种PLM衍生嵌入(ESM-2、ProtVec、ProtT5、ProtBert),对照完全监督的基线评估了两种SSL策略:自训练和标签扩散。嵌套交叉验证框架在四种IAV亚型(H1N1、H3N2、H5N1、H9N2)上模拟了低标签场景(25%、50%、75%和100%的标签可用性)。在标签稀缺的情况下,SSL持续提升性能。使用ProtVec的自训练产生了最大的相对增益,表明SSL可以弥补较低分辨率的表示。ESM-2仍然非常稳健,仅用25%的标注数据就达到了0.82以上的F1分数,表明其嵌入捕获了关键的抗原决定簇。虽然H1N1和H9N2的预测精度很高,但高变的H3N2亚型仍然具有挑战性,尽管SSL减缓了性能下降。这些发现表明,将PLM与SSL整合可以解决抗原性标注瓶颈,并能更有效地利用未标注的监测序列,支持快速变体优先级排序和及时的疫苗株选择。
摘要:Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-intensive and unscalable. As a result, genomic data vastly outpace available phenotypic labels, limiting the effectiveness of traditional supervised models. We hypothesize that combining pre-trained Protein Language Models (PLMs) with Semi-Supervised Learning (SSL) can retain high predictive accuracy even when labeled data are scarce. We evaluated two SSL strategies, Self-training and Label Spreading, against fully supervised baselines using four PLM-derived embeddings (ESM-2, ProtVec, ProtT5, ProtBert) applied to haemagglutinin (HA) sequences. A nested cross-validation framework simulated low-label regimes (25%, 50%, 75%, and 100% label availability) across four IAV subtypes (H1N1, H3N2, H5N1, H9N2). SSL consistently improved performance under label scarcity. Self-training with ProtVec produced the largest relative gains, showing that SSL can compensate for lower-resolution representations. ESM-2 remained highly robust, achieving F1 scores above 0.82 with only 25% labeled data, indicating that its embeddings capture key antigenic determinants. While H1N1 and H9N2 were predicted with high accuracy, the hypervariable H3N2 subtype remained challenging, although SSL mitigated the performance decline. These findings demonstrate that integrating PLMs with SSL can address the antigenicity labeling bottleneck and enable more effective use of unlabeled surveillance sequences, supporting rapid variant prioritization and timely vaccine strain selection.
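自训练(Self-training)的核心循环可以用一个最小的最近质心分类器示意:每轮把高置信度伪标签样本并入有标签集合。分类器与置信度度量均为示意选择;论文是在PLM嵌入之上应用SSL:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, rounds=5, conf=0.9):
    """最近质心自训练: 置信度超过阈值的伪标签样本进入有标签池。"""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        classes = np.unique(y)
        centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(pool[:, None, :] - centroids[None, :, :], axis=-1)
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # 软置信度
        keep = p.max(axis=1) >= conf
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, classes[p.argmax(axis=1)[keep]]])
        pool = pool[~keep]
    return X, y
```

标签稀缺时,这种循环让大量未标注序列间接参与训练,正是摘要中SSL增益的来源。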


【15】Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations
标题:用于驯服LLM恶魔与管理幻觉的语义忠实度与熵产生度量
链接:https://arxiv.org/abs/2512.05156

作者:Igor Halperin
备注:23 pages, 6 figures
摘要:评估大型语言模型(LLM)对给定任务的忠实度是一项复杂的挑战。我们利用信息论和热力学的见解,提出了两个新的无监督忠实度评估度量。我们的方法将LLM视为一个二分信息引擎,其中隐藏层充当麦克斯韦妖,通过提示$Q$控制上下文$C$向答案$A$的转换。我们将问题-上下文-答案(QCA)三元组建模为共享主题上的概率分布。从$C$到$Q$和$A$的主题转换被建模为转移矩阵${\bf Q}$和${\bf A}$,分别编码查询目标和实际结果。我们的语义忠实度(SF)度量通过这些矩阵之间的Kullback-Leibler(KL)散度来量化任何给定QCA三元组的忠实度。两个矩阵都通过对此KL散度的凸优化同时推断,最终的SF度量通过将最小散度映射到单位区间[0,1]获得,分数越高表示越忠实。此外,我们提出了一个基于热力学的答案生成语义熵产生(SEP)度量,并表明高忠实度通常意味着低熵产生。SF和SEP度量可以联合或单独用于LLM评估和幻觉控制。我们在LLM对企业SEC 10-K文件的摘要任务上展示了我们的框架。
摘要:Evaluating faithfulness of Large Language Models (LLMs) to a given task is a complex challenge. We propose two new unsupervised metrics for faithfulness evaluation using insights from information theory and thermodynamics. Our approach treats an LLM as a bipartite information engine where hidden layers act as a Maxwell demon controlling transformations of context $C $ into answer $A$ via prompt $Q$. We model Question-Context-Answer (QCA) triplets as probability distributions over shared topics. Topic transformations from $C$ to $Q$ and $A$ are modeled as transition matrices ${\bf Q}$ and ${\bf A}$ encoding the query goal and actual result, respectively. Our semantic faithfulness (SF) metric quantifies faithfulness for any given QCA triplet by the Kullback-Leibler (KL) divergence between these matrices. Both matrices are inferred simultaneously via convex optimization of this KL divergence, and the final SF metric is obtained by mapping the minimal divergence onto the unit interval [0,1], where higher scores indicate greater faithfulness. Furthermore, we propose a thermodynamics-based semantic entropy production (SEP) metric in answer generation, and show that high faithfulness generally implies low entropy production. The SF and SEP metrics can be used jointly or separately for LLM evaluation and hallucination control. We demonstrate our framework on LLM summarization of corporate SEC 10-K filings.
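摘要的核心量是两个行随机主题转移矩阵之间的KL散度,以及把最小散度映射到[0,1]。下面的草图演示这两步(矩阵推断本身需凸优化,此处略去;用exp(-KL)映射到单位区间是我们的假设,论文的具体映射可能不同;示例矩阵亦为虚构):

```python
import numpy as np

def kl_rows(P, Q, eps=1e-12):
    """两个行随机矩阵的逐行KL散度的平均值。"""
    P, Q = P + eps, Q + eps
    return float(np.mean(np.sum(P * np.log(P / Q), axis=1)))

def faithfulness_score(P, Q):
    """把散度映射到(0,1]; 1表示完全忠实。"""
    return float(np.exp(-kl_rows(P, Q)))

Qm = np.array([[0.8, 0.2], [0.3, 0.7]])   # 查询目标的主题转移
Am = np.array([[0.6, 0.4], [0.4, 0.6]])   # 实际答案的主题转移
```

两个矩阵越接近,SF式的分数越接近1;答案偏离查询目标则分数下降。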


【16】How to Tame Your LLM: Semantic Collapse in Continuous Systems
标题:如何驯服你的LLM:连续系统中的语义崩溃
链接:https://arxiv.org/abs/2512.05162

作者:C. M. Wyss
备注:35 pages, 1 figure. Exolytica AI Technical Report XTR-2025-01
摘要:我们通过将大型语言模型形式化为连续状态机(CSM),发展了一个通用的语义动力学理论:CSM是潜在流形在概率转移算子作用下演化的光滑动力系统。相关的转移算子$P:L^2(M,μ)\to L^2(M,μ)$编码语义质量的传播。在温和的正则性假设(紧性、遍历性、有界雅可比矩阵)下,$P$是紧算子,具有离散谱。在此设定下,我们证明了语义刻画定理(SCT):$P$的主导特征函数诱导出有限多个不变意义的谱盆,每个谱盆都可以在$\mathbb{R}$上的o-极小结构中定义。因此,谱可集总性与逻辑驯顺性是一致的。这解释了离散符号语义如何从连续计算中涌现:连续激活流形坍缩为一个有限的、逻辑上可解释的本体。我们进一步将SCT扩展到随机和绝热(时间非均匀)设置,表明缓慢漂移的核保持紧性、谱相干性和盆地结构。
摘要:We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,μ) \to L^2(M,μ)$ encodes the propagation of semantic mass. Under mild regularity assumptions (compactness, ergodicity, bounded Jacobian), $P$ is compact with discrete spectrum. Within this setting, we prove the Semantic Characterization Theorem (SCT): the leading eigenfunctions of $P$ induce finitely many spectral basins of invariant meaning, each definable in an o-minimal structure over $\mathbb{R}$. Thus spectral lumpability and logical tameness coincide. This explains how discrete symbolic semantics can emerge from continuous computation: the continuous activation manifold collapses into a finite, logically interpretable ontology. We further extend the SCT to stochastic and adiabatic (time-inhomogeneous) settings, showing that slowly drifting kernels preserve compactness, spectral coherence, and basin structure.
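"谱盆"的直觉可以在一个4状态的玩具转移矩阵上演示:两个弱耦合的块对应两个不变意义盆,主导特征值之后出现谱隙,第二特征向量的符号给出盆的划分(玩具构造为我们的示例,并非论文的算子设定):

```python
import numpy as np

eps = 0.01
block = np.full((2, 2), 0.5)
# 行随机且对称: 两个几乎解耦的"语义盆"
P = np.block([[(1 - eps) * block, eps * block],
              [eps * block, (1 - eps) * block]])

vals, vecs = np.linalg.eigh(P)          # P对称, 特征值为实数
order = np.argsort(vals)[::-1]          # 按特征值从大到小排序
vals, vecs = vals[order], vecs[:, order]
basins = (vecs[:, 1] > 0).astype(int)   # 第二特征向量的符号划分谱盆
```

谱在第二个特征值之后骤降(1, 0.98, 0, 0),这种"谱可集总性"正是有限离散语义涌现的信号。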


Graph相关(图学习|图神经网络|图优化等)(5篇)

【1】Bounded Graph Clustering with Graph Neural Networks
标题:利用图神经网络进行有界图聚类
链接:https://arxiv.org/abs/2512.05623

作者:Kibidi Neocosmos,Diego Baptista,Nicole Ludwig
备注:17 pages, 8 figures
摘要:在社区检测中,许多方法需要用户预先指定聚类的数量,因为在所有可能的取值上进行穷举搜索在计算上不可行。虽然一些经典算法可以直接从数据中推断出这个数量,但对于图神经网络(GNN)通常并非如此:即使指定了期望的聚类数量,由于其设计方式,标准的基于GNN的方法也往往无法返回确切的数量。在这项工作中,我们通过引入一种灵活且有原则的方式来控制GNN发现的社区数量,从而解决这一限制。我们不假设真实的聚类数已知,而是提出一个框架,允许用户指定一个合理的范围,并在训练过程中强制执行这些界限。不过,如果用户想要确切的聚类数,也可以指定并可靠地得到它。
摘要:In community detection, many methods require the user to specify the number of clusters in advance since an exhaustive search over all possible values is computationally infeasible. While some classical algorithms can infer this number directly from the data, this is typically not the case for graph neural networks (GNNs): even when a desired number of clusters is specified, standard GNN-based methods often fail to return the exact number due to the way they are designed. In this work, we address this limitation by introducing a flexible and principled way to control the number of communities discovered by GNNs. Rather than assuming the true number of clusters is known, we propose a framework that allows the user to specify a plausible range and enforce these bounds during training. However, if the user wants an exact number of clusters, it may also be specified and reliably returned.
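在训练目标中对聚类数施加上下界,可以通过对软分配矩阵的"占用聚类数"加铰链惩罚来示意(平滑占用度的具体形式与tau取值为我们的设计选择,并非论文公式):

```python
import numpy as np

def cluster_count_penalty(S, k_min, k_max, tau=0.05):
    """S: n x K 行随机软分配矩阵; 惩罚占用聚类数落在[k_min, k_max]之外。"""
    mass = S.mean(axis=0)                          # 每个聚类的平均分配质量
    occupied = np.sum(1.0 - np.exp(-mass / tau))   # 平滑的"已占用聚类数"
    return max(0.0, k_min - occupied) + max(0.0, occupied - k_max)

# 6个节点硬分配到5个候选聚类中的3个
S = np.eye(5)[[0, 0, 1, 1, 2, 2]]
```

取 k_min == k_max 即可近似强制得到确切的聚类数;把该惩罚加到GNN聚类损失上即可在训练中执行界限。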


【2】GRASP: Graph Reasoning Agents for Systems Pharmacology with Human-in-the-Loop
标题:GRASP:具有人在环的系统药理学图形推理代理
链接:https://arxiv.org/abs/2512.05502

作者:Omid Bazgir,Vineeth Manthapuri,Ilia Rattsev,Mohammad Jafarnejad
摘要:定量系统药理学(QSP)建模对于药物开发必不可少,但需要大量时间投入,限制了领域专家的产出。我们提出了GRASP,一个具有人在环对话界面的多智能体图推理框架,它将QSP模型编码为类型化的生物知识图,并将其编译为可执行的MATLAB/SimBiology代码,同时保留单位、质量平衡和生理约束。一个两阶段工作流,即Understanding(遗留代码的图重建)与Action(带约束检查的、语言驱动的修改),由带迭代验证的状态机编排。GRASP围绕新实体执行宽度优先(BFS)参数对齐,以揭示相关依赖量并提出生物学上合理的默认值,并自动运行执行/诊断直至收敛。在使用LLM-as-judge的头对头评估中,GRASP在生物学合理性、数学正确性、结构保真度和代码质量方面优于SME指导的CoT和ToT基线(约9-10/10对5-7/10)。BFS对齐在依赖发现、单位和取值范围上达到F1 = 0.95。这些结果表明,图结构化的智能体工作流可以使QSP模型开发既便捷又严谨,使领域专家能够在不牺牲生物医学保真度的情况下用自然语言指定机制。
摘要:Quantitative Systems Pharmacology (QSP) modeling is essential for drug development but it requires significant time investment that limits the throughput of domain experts. We present \textbf{GRASP} -- a multi-agent, graph-reasoning framework with a human-in-the-loop conversational interface -- that encodes QSP models as typed biological knowledge graphs and compiles them to executable MATLAB/SimBiology code while preserving units, mass balance, and physiological constraints. A two-phase workflow -- \textsc{Understanding} (graph reconstruction of legacy code) and \textsc{Action} (constraint-checked, language-driven modification) -- is orchestrated by a state machine with iterative validation. GRASP performs breadth-first parameter-alignment around new entities to surface dependent quantities and propose biologically plausible defaults, and it runs automatic execution/diagnostics until convergence. In head-to-head evaluations using LLM-as-judge, GRASP outperforms SME-guided CoT and ToT baselines across biological plausibility, mathematical correctness, structural fidelity, and code quality (\(\approx\)9--10/10 vs.\ 5--7/10). BFS alignment achieves F1 = 0.95 for dependency discovery, units, and range. These results demonstrate that graph-structured, agentic workflows can make QSP model development both accessible and rigorous, enabling domain experts to specify mechanisms in natural language without sacrificing biomedical fidelity.
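摘要所述"围绕新实体的宽度优先参数对齐",其核心是带深度上限的BFS。下面是一个假设性的示意(图中的节点名与结构均为虚构,并非GRASP的真实模型):

```python
from collections import deque

# Hypothetical typed QSP knowledge graph: node -> list of neighbours.
# Node names are illustrative only, not taken from the GRASP paper.
graph = {
    "drug_X":   ["k_absorb", "plasma"],
    "k_absorb": ["drug_X"],
    "plasma":   ["drug_X", "k_clear", "tissue"],
    "k_clear":  ["plasma"],
    "tissue":   ["plasma", "k_bind"],
    "k_bind":   ["tissue"],
}

def bfs_dependencies(graph, start, max_depth):
    """Breadth-first search around a newly added entity, collecting the
    parameters/species reachable within max_depth hops."""
    seen, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == max_depth:
            continue  # do not expand beyond the depth bound
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                frontier.append(nbr)
    return seen  # node -> hop distance from the new entity

deps = bfs_dependencies(graph, "drug_X", max_depth=2)
```

在新增实体 drug_X 周围按跳数展开,即可列出需要与之对齐的参数集合,深度上限控制对齐范围。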


【3】PERM EQ x GRAPH EQ: Equivariant Neural Networks for Quantum Molecular Learning
标题:PERM EQ x GRAPH EQ:量子分子学习的等变神经网络
链接:https://arxiv.org/abs/2512.05475

作者:Saumya Biswas,Jiten Oswal
备注:22 pages, 9 figures, 4 tables
摘要:我们按分子几何的层级顺序比较了几何量子机器学习模型的性能。考虑了两个分子数据集:形状简单的线性LiH分子和三角锥形的NH3分子。同时考察了准确性和泛化性两类指标。以一个经典等变模型作为性能比较的基线。研究了无对称等变性、具有旋转和置换等变性、以及图嵌入置换等变性的量子机器学习模型的相对性能。性能差异与所涉分子几何共同揭示了以泛化性为标准的模型选择依据。特征的图嵌入被证明是提高几何数据集可训练性的有效途径。置换对称嵌入被发现是几何学习中泛化性最好的量子机器学习模型。
摘要:In hierarchical order of molecular geometry, we compare the performances of Geometric Quantum Machine Learning models. Two molecular datasets are considered: the simplistic linear shaped LiH molecule and the trigonal pyramidal molecule NH3. Both accuracy and generalizability metrics are considered. A classical equivariant model is used as a baseline for the performance comparison. The comparative performance of Quantum Machine Learning models with no symmetry equivariance, rotational and permutational equivariance, and graph embedded permutational equivariance is investigated. The performance differentials and the molecular geometry in question reveal the criteria for choice of models for generalizability. Graph embedding of features is shown to be an effective pathway to greater trainability for geometric datasets. Permutational symmetric embedding is found to be the most generalizable Quantum Machine Learning model for geometric learning.


【4】Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets
标题:基于患者-特征-值三元组的图卷积网络脓毒症预测
链接:https://arxiv.org/abs/2512.05416

作者:Bozhi Dan,Di Wu,Ji Xu,Xiang Liu,Yiziting Zhu,Xin Shu,Yujie Li,Bin Yi
摘要:在重症监护场景中,脓毒症仍然是患者患病和死亡的主要原因;然而,电子健康记录(EHR)数据复杂、稀疏且异质,阻碍了其及时检测。我们提出Triplet-GCN,一种单分支图卷积模型,它将每次就诊表示为患者-特征-值三元组,构建二部EHR图,并通过图卷积网络(GCN)加轻量级多层感知机(MLP)学习患者嵌入。该流程采用按类型区分的预处理:数值变量用中位数填补并标准化,二值特征用效应编码,罕见分类属性用众数填补并配以低维嵌入;同时用汇总统计量初始化患者节点,并在边上保留测量值,以保留"谁测量了什么、测量了多少"的信息。在来自三家三级医院的回顾性多中心中国队列(N = 648;70/30训练-测试划分)中,Triplet-GCN在区分度和平衡误差指标上始终优于强表格基线(KNN、SVM、XGBoost、随机森林),产生更有利的灵敏度-特异度权衡,并提升了早期预警的整体效用。这些发现表明,将EHR编码为三元组并在患者-特征图上传播信息,比特征独立的模型产生更有信息量的患者表示,为可部署的脓毒症风险分层提供了简单的端到端蓝图。
摘要:In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous nature of electronic health record (EHR) data. We propose Triplet-GCN, a single-branch graph convolutional model that represents each encounter as patient--feature--value triplets, constructs a bipartite EHR graph, and learns patient embeddings via a Graph Convolutional Network (GCN) followed by a lightweight multilayer perceptron (MLP). The pipeline applies type-specific preprocessing -- median imputation and standardization for numeric variables, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes -- and initializes patient nodes with summary statistics, while retaining measurement values on edges to preserve "who measured what and by how much". In a retrospective, multi-center Chinese cohort (N = 648; 70/30 train--test split) drawn from three tertiary hospitals, Triplet-GCN consistently outperforms strong tabular baselines (KNN, SVM, XGBoost, Random Forest) across discrimination and balanced error metrics, yielding a more favorable sensitivity--specificity trade-off and improved overall utility for early warning. These findings indicate that encoding EHR as triplets and propagating information over a patient--feature graph produce more informative patient representations than feature-independent models, offering a simple, end-to-end blueprint for deployable sepsis risk stratification.
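二部"患者-特征"图上的GCN式传播可以用一个小例子示意(矩阵数值为虚构,仅演示对称归一化传播这一步,并非论文的完整流程):共享特征的患者在一步传播后即获得相似的嵌入。

```python
import numpy as np

# Toy bipartite EHR graph: 3 patients x 4 features; entries are measured
# values (edge weights), 0 means "not measured". Purely illustrative.
B = np.array([
    [1.2, 0.0, 3.0, 0.0],   # patient 0
    [0.0, 2.1, 3.1, 0.0],   # patient 1
    [0.0, 0.0, 0.0, 5.0],   # patient 2
])
n_p, n_f = B.shape

# Full adjacency with self-loops, then one GCN-style propagation step:
# H' = D^{-1/2} (A + I) D^{-1/2} H.
A = np.zeros((n_p + n_f, n_p + n_f))
A[:n_p, n_p:] = B
A[n_p:, :n_p] = B.T
A += np.eye(n_p + n_f)
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

H = np.eye(n_p + n_f)   # trivial one-hot node features
H1 = A_hat @ H          # node embeddings after one propagation step
```

患者0与患者1共享特征2,一步传播后两者的嵌入即出现重叠;与它们没有公共特征的患者2则保持正交。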


【5】DMAGT: Unveiling miRNA-Drug Associations by Integrating SMILES and RNA Sequence Structures through Graph Transformer Models
标题:DMAGT:通过图Transformer模型整合SMILES和RNA序列结构揭示miRNA与药物的关联
链接:https://arxiv.org/abs/2512.05287

作者:Ziqi Zhang
备注:9 pages, 4 figures
摘要:由于miRNA在基因调控中的作用,它们为药理学开辟了一条新路径,即聚焦于靶向miRNA的药物开发。然而,传统的湿实验受限于效率和成本,难以大规模探索已开发药物与靶向miRNA之间的潜在关联。因此,我们设计了一种基于多层Transformer图神经网络的新型机器学习模型DMAGT,专门用于预测药物与miRNA之间的关联。该模型将药物-miRNA关联转换为图,采用Word2Vec嵌入药物分子结构和miRNA碱基结构的特征,并利用图Transformer模型从嵌入特征和关系结构中学习,最终预测药物与miRNA之间的关联。为了评估DMAGT,我们在三个由药物-miRNA关联组成的数据集(ncDR、RNAInter和SM2miR)上测试了其性能,AUC最高达到95.24±0.05。在应对类似挑战的对比实验中,DMAGT表现出更优的性能。为了验证其实际效用,我们特别关注两种药物,即5-氟尿嘧啶和奥沙利铂。在被认为最有可能的20个药物-miRNA关联中,有14个被成功验证。上述实验表明,DMAGT在预测药物-miRNA关联方面具有优异的性能和稳定性,为miRNA药物开发提供了一条新的捷径。
摘要 :MiRNAs, due to their role in gene regulation, have paved a new pathway for pharmacology, focusing on drug development that targets miRNAs. However, traditional wet lab experiments are limited by efficiency and cost constraints, making it difficult to extensively explore potential associations between developed drugs and target miRNAs. Therefore, we have designed a novel machine learning model based on a multi-layer transformer-based graph neural network, DMAGT, specifically for predicting associations between drugs and miRNAs. This model transforms drug-miRNA associations into graphs, employs Word2Vec for embedding features of drug molecular structures and miRNA base structures, and leverages a graph transformer model to learn from embedded features and relational structures, ultimately predicting associations between drugs and miRNAs. To evaluate DMAGT, we tested its performance on three datasets composed of drug-miRNA associations: ncDR, RNAInter, and SM2miR, achieving up to AUC of $95.24\pm0.05$. DMAGT demonstrated superior performance in comparative experiments tackling similar challenges. To validate its practical efficacy, we specifically focused on two drugs, namely 5-Fluorouracil and Oxaliplatin. Of the 20 potential drug-miRNA associations identified as the most likely, 14 were successfully validated. The above experiments demonstrate that DMAGT has an excellent performance and stability in predicting drug-miRNA associations, providing a new shortcut for miRNA drug development.


Transformer(1篇)

【1】NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation
标题:NEAT:面向3D分子生成的邻域引导、高效、自回归集合Transformer
链接:https://arxiv.org/abs/2512.05844

作者:Daniel Rose,Roxane Axel Jacob,Johannes Kirchmair,Thierry Langer
摘要:自回归模型是基于扩散的三维分子结构生成模型的一个有前景的替代方案。然而,其一个关键限制是对词元顺序的假设:虽然文本具有自然的先后顺序,但给定分子图前缀的下一词元预测应当对原子置换保持不变。以往工作通过使用规范顺序或焦点原子来回避这种不匹配。我们认为这并无必要。我们提出NEAT,一种邻域引导、高效、自回归的集合Transformer,它将分子图视为原子集合,并用自回归流模型学习图边界上可接受词元的与顺序无关的分布。NEAT以高计算效率和原子级置换不变性接近三维分子生成的最先进性能,为可扩展的分子设计奠定了实用基础。
摘要:Autoregressive models are a promising alternative to diffusion-based models for 3D molecular structure generation. However, a key limitation is the assumption of a token order: while text has a natural sequential order, the next token prediction given a molecular graph prefix should be invariant to atom permutations. Previous works sidestepped this mismatch by using canonical orders or focus atoms. We argue that this is unnecessary. We introduce NEAT, a Neighborhood-guided, Efficient, Autoregressive, Set Transformer that treats molecular graphs as sets of atoms and learns the order-agnostic distribution over admissible tokens at the graph boundary with an autoregressive flow model. NEAT approaches state-of-the-art performance in 3D molecular generation with high computational efficiency and atom-level permutation invariance, establishing a practical foundation for scalable molecular design.


GAN|对抗|攻击|生成相关(3篇)

【1】Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
标题:利用实体链接增强面向教育平台的检索增强生成
链接:https://arxiv.org/abs/2512.05967

作者:Francesco Granata,Francesco Poggi,Misael Mongiovì
摘要:在大型语言模型(LLM)时代,检索增强生成(RAG)架构因其能够将语言生成建立在可靠知识源之上而受到广泛关注。尽管在许多领域效果显著,但仅基于语义相似度的RAG系统在专业领域往往无法保证事实准确性,因为术语歧义可能影响检索的相关性。本研究提出一种增强的RAG架构,集成来自实体链接的事实信号,以提高意大利语教育问答系统的准确性。该系统包含一个基于Wikidata的实体链接模块,并实现了三种结合语义信息与实体信息的重排序策略:混合得分加权模型、倒数排名融合(reciprocal rank fusion)和交叉编码器重排序。实验在两个基准上进行:一个自建学术数据集和标准的SQuAD-it数据集。结果表明,在特定领域场景下,基于倒数排名融合的混合方案显著优于基线和交叉编码器方法,而交叉编码器在通用领域数据集上取得最佳结果。这些发现证实了领域不匹配效应的存在,并强调了领域适应和混合排序策略对于提升检索增强生成的事实精度与可靠性的重要性。它们还展示了实体感知RAG系统在教育环境中的潜力,有助于打造自适应且可靠的基于AI的辅导工具。
摘要:In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their impressive effectiveness in many areas, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements three re-ranking strategies to combine semantic and entity-based information: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, the hybrid schema based on reciprocal rank fusion significantly outperforms both the baseline and the cross-encoder approach, while the cross-encoder achieves the best results on the general-domain dataset. These findings confirm the presence of an effect of domain mismatch and highlight the importance of domain adaptation and hybrid ranking strategies to enhance factual precision and reliability in retrieval-augmented generation. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
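其中的倒数排名融合(RRF)本身只需几行代码即可实现。下面是一个通用的RRF草图(文档编号与 k=60 的常用默认值均为示意,并非该论文的具体配置):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each input ranking contributes
    1/(k + rank) per document; higher fused score ranks first.
    k = 60 is a common default in the RRF literature."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d1", "d2", "d3"]   # order from dense retrieval
entity   = ["d3", "d1", "d4"]   # order from entity-linking signal
fused = rrf([semantic, entity])
```

d1 在两个列表中都排名靠前,融合后居首;只出现在单个列表中的文档按其名次递补。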


【2】SSDLabeler: Realistic semi-synthetic data generation for multi-label artifact classification in EEG
标题:SSDLabeler:用于脑电中多标签伪影分类的真实半合成数据生成
链接:https://arxiv.org/abs/2512.05500

作者:Taketo Akama,Akima Connelly,Shun Minamikawa,Natalia Polouliakh
摘要:EEG记录固有地受到眼动、肌电和环境噪声等伪影的污染,这些伪影会掩盖神经活动并使预处理复杂化。伪影分类在稳定性和透明度方面具有优势,为基于ICA的方法提供了一种可行的替代方案,可与人工检查配合灵活使用,并适用于各种应用。然而,伪影分类受限于训练数据,因为它需要大量人工标注,无法完全覆盖真实世界EEG的多样性。半合成数据(SSD)方法已被提出以解决这一限制,但现有方法通常使用ICA成分注入单一伪影类型,或需要单独录制伪影信号,从而降低了所生成数据的真实性和方法的适用性。为克服这些问题,我们提出SSDLabeler框架:通过ICA分解真实EEG,使用RMS和PSD准则进行分段(epoch)级伪影验证,并将多种伪影类型重新注入干净数据,从而生成真实且带注释的半合成数据。将其用于训练多标签伪影分类器时,与先前的SSD和原始EEG训练相比,它在不同条件下提高了对原始EEG的分类准确率,为能够捕捉真实EEG中伪影共现性和复杂性的伪影处理建立了可扩展的基础。
摘要:EEG recordings are inherently contaminated by artifacts such as ocular, muscular, and environmental noise, which obscure neural activity and complicate preprocessing. Artifact classification offers advantages in stability and transparency, providing a viable alternative to ICA-based methods that enable flexible use alongside human inspections and across various applications. However, artifact classification is limited by its training data as it requires extensive manual labeling, which cannot fully cover the diversity of real-world EEG. Semi-synthetic data (SSD) methods have been proposed to address this limitation, but prior approaches typically injected single artifact types using ICA components or required separately recorded artifact signals, reducing both the realism of the generated data and the applicability of the method. To overcome these issues, we introduce SSDLabeler, a framework that generates realistic, annotated SSDs by decomposing real EEG with ICA, epoch-level artifact verification using RMS and PSD criteria, and reinjecting multiple artifact types into clean data. When applied to train a multi-label artifact classifier, it improved accuracy on raw EEG across diverse conditions compared to prior SSD and raw EEG training, establishing a scalable foundation for artifact handling that captures the co-occurrence and complexity of real EEG.
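摘要中"使用RMS准则进行分段级伪影验证"的想法可以用如下草图示意(阈值与信号均为虚构;论文还结合了PSD准则,此处未涉及):

```python
import math

def epoch_rms(epoch):
    """Root-mean-square amplitude of one EEG epoch (a list of samples)."""
    return math.sqrt(sum(x * x for x in epoch) / len(epoch))

def is_clean(epoch, rms_limit=50.0):
    """Flag an epoch as clean if its RMS stays under a threshold
    (nominally in microvolts; the limit here is illustrative)."""
    return epoch_rms(epoch) <= rms_limit

# Synthetic epochs: a quiet sinusoid vs. a 20x larger "blink-like" one.
quiet = [10.0 * math.sin(0.1 * t) for t in range(256)]
blink = [200.0 * math.sin(0.1 * t) for t in range(256)]
```

只有通过此类逐段检查的片段才被视为"干净",可作为重新注入伪影的底板。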


【3】Bayesian Optimization and Convolutional Neural Networks for Zernike-Based Wavefront Correction in High Harmonic Generation
标题:高次谐波产生中基于Zernike的波前校正:贝叶斯优化与卷积神经网络
链接:https://arxiv.org/abs/2512.05127

作者:Guilherme Grancho D. Fernandes,Duarte Alexandrino,Eduardo Silva,João Matias,Joaquim Pereira
摘要:高次谐波产生(HHG)是一种非线性过程,其能够在极紫外(EUV)到软X射线范围内在桌面上产生可调谐、高能、相干、超短辐射脉冲。这些脉冲在凝聚态物理学的光电子能谱、高能量密度等离子体的泵浦探测光谱和阿秒科学中有应用。然而,在高功率激光系统所需的HHG的光学像差降低光束质量和降低效率。我们提出了一种机器学习方法来优化像差校正使用空间光调制器。我们实现并比较了贝叶斯优化和卷积神经网络(CNN)方法来预测波前校正的最佳Zernike多项式系数。我们的CNN在测试数据上取得了80.39%的准确率,证明了在HHG系统中自动像差校正的潜力。
摘要:High harmonic generation (HHG) is a nonlinear process that enables table-top generation of tunable, high-energy, coherent, ultrashort radiation pulses in the extreme ultraviolet (EUV) to soft X-ray range. These pulses find applications in photoemission spectroscopy in condensed matter physics, pump-probe spectroscopy for high-energy-density plasmas, and attosecond science. However, optical aberrations in the high-power laser systems required for HHG degrade beam quality and reduce efficiency. We present a machine learning approach to optimize aberration correction using a spatial light modulator. We implemented and compared Bayesian optimization and convolutional neural network (CNN) methods to predict optimal Zernike polynomial coefficients for wavefront correction. Our CNN achieved promising results with 80.39% accuracy on test data, demonstrating the potential for automated aberration correction in HHG systems.


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability
标题:面向监督管道的模块化射流:诊断海市蜃楼与可识别性
链接:https://arxiv.org/abs/2512.05638

作者:Suman Sanyal
摘要:经典的监督学习主要通过留出数据上的预测风险来评估模型。这类评估量化了函数在某一分布上的表现,但没有回答模型的内部分解是否由数据和评估设计唯一确定。本文为回归和分类管道引入了"模块化射流"(Modular Jets)。给定一个任务流形(输入空间)、一个模块分解,以及对模块级表示的访问,我们估计经验射流,即描述每个模块如何对输入的小的结构化扰动作出反应的局部线性响应映射。我们提出了"海市蜃楼"(mirage)机制的经验概念:多个不同的模块分解诱导出不可区分的射流,从而在观测上等价;并将其与"可识别"(identifiable)机制对比,在后者中观察到的射流可在自然对称性意义下唯一确定分解。在两模块线性回归管道的设定下,我们证明了射流可识别性定理:在温和的秩假设和可访问模块级射流的条件下,内部因式分解被唯一确定;而仅基于风险的评估则容许一大族实现相同输入-输出映射的海市蜃楼分解。随后,我们提出了用于经验射流估计和海市蜃楼诊断的算法(MoJet),并用线性与深度回归以及管道分类说明了该框架。
摘要:Classical supervised learning evaluates models primarily via predictive risk on hold-out data. Such evaluations quantify how well a function behaves on a distribution, but they do not address whether the internal decomposition of a model is uniquely determined by the data and evaluation design. In this paper, we introduce \emph{Modular Jets} for regression and classification pipelines. Given a task manifold (input space), a modular decomposition, and access to module-level representations, we estimate empirical jets, which are local linear response maps that describe how each module reacts to small structured perturbations of the input. We propose an empirical notion of \emph{mirage} regimes, where multiple distinct modular decompositions induce indistinguishable jets and thus remain observationally equivalent, and contrast this with an \emph{identifiable} regime, where the observed jets single out a decomposition up to natural symmetries. In the setting of two-module linear regression pipelines we prove a jet-identifiability theorem. Under mild rank assumptions and access to module-level jets, the internal factorisation is uniquely determined, whereas risk-only evaluation admits a large family of mirage decompositions that implement the same input-to-output map. We then present an algorithm (MoJet) for empirical jet estimation and mirage diagnostics, and illustrate the framework using linear and deep regression as well as pipeline classification.
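"经验射流"即模块的局部线性响应映射,可用有限差分估计。下面的草图同时演示摘要中的"海市蜃楼"现象:在两模块线性管道中插入任意可逆矩阵 M 后,端到端射流不变,而模块级映射已经改变(矩阵数值为随机示例,并非论文实验):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))   # module 1: R^2 -> R^3 (random toy weights)
B = rng.normal(size=(2, 3))   # module 2: R^3 -> R^2

def jet(f, x, h=1e-6):
    """Empirical jet: finite-difference estimate of the local linear
    response map (Jacobian) of f at x."""
    fx = f(x)
    cols = [(f(x + h * e) - fx) / h for e in np.eye(len(x))]
    return np.stack(cols, axis=1)

pipeline = lambda x: B @ (A @ x)
x0 = np.array([0.3, -1.1])

# A "mirage" factorisation (B M^-1, M A): different modules, same
# end-to-end map, hence the same end-to-end jet.
M = np.diag([2.0, 1.0, 0.5])
mirage = lambda x: (B @ np.linalg.inv(M)) @ ((M @ A) @ x)
```

仅凭端到端射流(或风险)无法区分这两种分解;只有模块级射流(A 与 M @ A)才把它们区分开,这正是可识别性定理的出发点。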


【2】Wasserstein distance based semi-supervised manifold learning and application to GNSS multi-path detection
标题:基于Wasserstein距离的半监督流形学习及其在GNSS多径检测中的应用
链接:https://arxiv.org/abs/2512.05567

作者:Antoine Blais,Nicolas Couëllan
摘要:本研究的主要目标是提出一种基于最佳传输的半监督方法,使用深度卷积网络从稀缺的标记图像数据中学习。其原理在于基于隐式图的转导半监督学习,其中图像样本之间的相似性度量是Wasserstein距离。该度量在学习期间用于标签传播机制。我们应用并证明了该方法在GNSS实际应用中的有效性。更具体地说,我们解决多径干扰检测的问题。在各种信号条件下进行了实验。结果表明,对于控制半监督量和对度量的灵敏度水平的超参数的特定选择,分类准确率可以比全监督训练方法显着提高。
摘要:The main objective of this study is to propose an optimal transport based semi-supervised approach to learn from scarce labelled image data using deep convolutional networks. The principle lies in implicit graph-based transductive semi-supervised learning where the similarity metric between image samples is the Wasserstein distance. This metric is used in the label propagation mechanism during learning. We apply and demonstrate the effectiveness of the method on a GNSS real life application. More specifically, we address the problem of multi-path interference detection. Experiments are conducted under various signal conditions. The results show that for specific choices of hyperparameters controlling the amount of semi-supervision and the level of sensitivity to the metric, the classification accuracy can be significantly improved over the fully supervised training method.
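对一维经验分布,两组等大小样本间的Wasserstein-1距离就是排序后逐项绝对差的平均值。下面给出这一特例的草图(论文中该度量作用于图像样本之间,此处仅为一维示意):

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size empirical 1-D distributions:
    the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

排序等价于在一维最优传输中把第 i 小的样本配给第 i 小的样本;相同的多重集距离为零,与取样顺序无关。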


【3】Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)
标题:使用稀疏变分高斯过程Kolmogorov-Arnold网络(SVGP KAN)进行科学机器学习的不确定性量化
链接:https://arxiv.org/abs/2512.05306

作者:Y. Sungtaek Ju
备注:20 pages, 3 figures
摘要:Kolmogorov-Arnold网络已经成为传统多层感知器的可解释替代方案。然而,标准实现缺乏许多科学应用所必需的原则性不确定性量化能力。我们提出了一个框架,将稀疏变分高斯过程推理与Kolmogorov-Arnold拓扑结构相结合,使可扩展的贝叶斯推理的计算复杂度在样本大小上是准线性的。通过分析矩匹配,我们通过深层加性结构传播不确定性,同时保持可解释性。我们使用三个示例研究来证明该框架区分任意性和认知不确定性的能力:流体流重建中异方差测量噪声的校准,对流扩散动力学多步预测中预测置信度退化的量化,以及卷积自动编码器中的分布检测。这些结果表明,稀疏变分高斯过程Kolmogorov-Arnold网络(SVGP KAN)是科学机器学习中不确定性感知学习的一种有前途的架构。
摘要 :Kolmogorov-Arnold Networks have emerged as interpretable alternatives to traditional multi-layer perceptrons. However, standard implementations lack principled uncertainty quantification capabilities essential for many scientific applications. We present a framework integrating sparse variational Gaussian process inference with the Kolmogorov-Arnold topology, enabling scalable Bayesian inference with computational complexity quasi-linear in sample size. Through analytic moment matching, we propagate uncertainty through deep additive structures while maintaining interpretability. We use three example studies to demonstrate the framework's ability to distinguish aleatoric from epistemic uncertainty: calibration of heteroscedastic measurement noise in fluid flow reconstruction, quantification of prediction confidence degradation in multi-step forecasting of advection-diffusion dynamics, and out-of-distribution detection in convolutional autoencoders. These results suggest Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KANs) is a promising architecture for uncertainty-aware learning in scientific machine learning.


【4】Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective
标题:不确定性感知的数据高效人工智能:信息论视角
链接:https://arxiv.org/abs/2512.05267

作者:Osvaldo Simeone,Yaniv Romano
摘要:在机器人、电信和医疗保健等特定于上下文的应用中,人工智能系统经常面临训练数据有限的挑战。这种稀缺性引入了认知的不确定性,即,可减少的不确定性源于对基础数据分布的不完整了解,这从根本上限制了预测性能。本文审查了通过两种互补方法解决数据有限制度的正式方法:量化认知不确定性和通过合成数据增强缓解数据稀缺性。我们首先回顾广义贝叶斯学习框架,通过模型参数空间中的广义后验来表征认知的不确定性,以及“后贝叶斯”学习框架。我们继续提出信息理论泛化界限,形式化训练数据量和预测不确定性之间的关系,为广义贝叶斯学习提供理论依据。超越渐近统计有效性的方法,我们调查不确定性量化方法,提供有限样本的统计保证,包括共形预测和共形风险控制。最后,我们通过将有限的标记数据与丰富的模型预测或合成数据相结合来研究数据效率的最新进展。在整个过程中,我们采取信息理论的角度来看,强调信息措施在量化数据稀缺的影响的作用。
摘要:In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This scarcity introduces epistemic uncertainty, i.e., reducible uncertainty stemming from incomplete knowledge of the underlying data distribution, which fundamentally limits predictive performance. This review paper examines formal methodologies that address data-limited regimes through two complementary approaches: quantifying epistemic uncertainty and mitigating data scarcity via synthetic data augmentation. We begin by reviewing generalized Bayesian learning frameworks that characterize epistemic uncertainty through generalized posteriors in the model parameter space, as well as ``post-Bayes'' learning frameworks. We continue by presenting information-theoretic generalization bounds that formalize the relationship between training data quantity and predictive uncertainty, providing a theoretical justification for generalized Bayesian learning. Moving beyond methods with asymptotic statistical validity, we survey uncertainty quantification methods that provide finite-sample statistical guarantees, including conformal prediction and conformal risk control. Finally, we examine recent advances in data efficiency by combining limited labeled data with abundant model predictions or synthetic data. Throughout, we take an information-theoretic perspective, highlighting the role of information measures in quantifying the impact of data scarcity.


【5】Advanced Unsupervised Learning: A Comprehensive Overview of Multi-View Clustering Techniques
标题:高级无监督学习:多视图集群技术的全面概述
链接:https://arxiv.org/abs/2512.05169

作者:Abdelmalik Moujahid,Fadi Dornaika
摘要:机器学习技术面临诸多挑战才能达到最佳性能,包括计算约束、单视图学习算法的局限性,以及处理来自不同领域、来源或视图的大型数据集的复杂性。在此背景下,多视图聚类(MVC)作为无监督多视图学习的一类方法,成为克服这些挑战的有力途径。MVC弥补了单视图方法的不足,为各类无监督学习任务提供了更丰富的数据表示和有效的解决方案。与传统单视图方法相比,多视图数据语义丰富的特性尽管带来固有复杂性,却提升了其实用价值。本综述做出三方面贡献:(1)将多视图聚类方法系统地归类为定义明确的几组,包括协同训练、协同正则化、子空间、深度学习、基于核、基于锚点和基于图的策略;(2)深入分析各自的优势、弱点和实际挑战,如可扩展性和数据不完整问题;(3)对MVC研究的新兴趋势、跨学科应用和未来方向进行前瞻性讨论。这项研究工作量庞大,包括审阅140多篇基础性和近期出版物、形成对早期融合、后期融合和联合学习等集成策略的比较见解,以及对医疗保健、多媒体和社交网络分析领域实际用例的结构化考察。通过整合这些努力,本工作旨在填补MVC研究中的现有空白,并为该领域的发展提供可操作的见解。
摘要:Machine learning techniques face numerous challenges to achieve optimal performance. These include computational constraints, the limitations of single-view learning algorithms and the complexity of processing large datasets from different domains, sources or views. In this context, multi-view clustering (MVC), a class of unsupervised multi-view learning, emerges as a powerful approach to overcome these challenges. MVC compensates for the shortcomings of single-view methods and provides a richer data representation and effective solutions for a variety of unsupervised learning tasks. In contrast to traditional single-view approaches, the semantically rich nature of multi-view data increases its practical utility despite its inherent complexity. This survey makes a threefold contribution: (1) a systematic categorization of multi-view clustering methods into well-defined groups, including co-training, co-regularization, subspace, deep learning, kernel-based, anchor-based, and graph-based strategies; (2) an in-depth analysis of their respective strengths, weaknesses, and practical challenges, such as scalability and incomplete data; and (3) a forward-looking discussion of emerging trends, interdisciplinary applications, and future directions in MVC research. This study represents an extensive workload, encompassing the review of over 140 foundational and recent publications, the development of comparative insights on integration strategies such as early fusion, late fusion, and joint learning, and the structured investigation of practical use cases in the areas of healthcare, multimedia, and social network analysis. By integrating these efforts, this work aims to fill existing gaps in MVC research and provide actionable insights for the advancement of the field.


迁移|Zero/Few/One-Shot|自适应(2篇)

【1】BERTO: an Adaptive BERT-based Network Time Series Predictor with Operator Preferences in Natural Language
标题:BERTO:支持自然语言表达运营商偏好的自适应BERT网络时间序列预测器
链接:https://arxiv.org/abs/2512.05721

作者:Nitin Priyadarshini Shankar,Vaibhav Singh,Sheetal Kalyani,Christian Maciocco
摘要:我们介绍BERTO,一个基于BERT的框架,用于蜂窝网络中的流量预测和能量优化。BERTO基于Transformer架构构建,可提供高预测精度,而其平衡损失函数和基于提示的定制使运营商能够调整节能与性能之间的权衡。自然语言提示引导模型按照运营商的意图管理预测不足和预测过度。在真实世界数据集上的实验表明,BERTO优于现有模型,MSE降低4.13%,同时引入了通过简单自然语言输入来平衡节能与性能这对相互竞争目标的能力,可在1.4 kW的功率灵活范围和高达9倍的服务质量变化范围内运行,非常适合智能RAN部署。
摘要:We introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO delivers high prediction accuracy, while its Balancing Loss Function and prompt-based customization allow operators to adjust the trade-off between power savings and performance. Natural language prompts guide the model to manage underprediction and overprediction in accordance with the operator's intent. Experiments on real-world datasets show that BERTO improves upon existing models with a $4.13$\% reduction in MSE while introducing the feature of balancing competing objectives of power saving and performance through simple natural language inputs, operating over a flexible range of $1.4$ kW in power and up to $9\times$ variation in service quality, making it well suited for intelligent RAN deployments.
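摘要中的"平衡损失函数"可以理解为对预测不足与预测过度赋予不同权重。下面是一个假设性的非对称平方误差草图(函数名与权重数值均为示意,并非BERTO的实际定义):

```python
def balancing_loss(pred, target, w_under=2.0, w_over=1.0):
    """Asymmetric squared error: underprediction (pred < target) can be
    penalised more heavily than overprediction, or vice versa.
    The weights here are illustrative, not BERTO's coefficients."""
    total = 0.0
    for p, t in zip(pred, target):
        w = w_under if p < t else w_over
        total += w * (p - t) ** 2
    return total / len(pred)
```

在流量预测中,预测不足会造成服务质量下降,预测过度则浪费能耗;调节两个权重即在二者之间移动工作点。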


【2】Variance Matters: Improving Domain Adaptation via Stratified Sampling
标题:方差很重要:通过分层采样改善领域自适应
链接:https://arxiv.org/abs/2512.05226

作者:Andrea Napoli,Paul White
摘要:域偏移仍然是将机器学习模型部署到真实世界的关键挑战。无监督域自适应(UDA)旨在通过在训练中最小化域间差异来解决这一问题,但在随机设置下差异估计具有高方差,可能削弱该方法的理论优势。本文提出基于分层采样的方差缩减域自适应(VaRDASS),这是首个专门针对UDA的随机方差缩减技术。我们考虑两种具体的差异度量:相关对齐和最大均值差异(MMD),并为这两项推导了专门的分层目标。随后我们给出期望和最坏情况下的误差界,并证明在某些假设下,我们为MMD提出的目标在理论上是最优的(即方差最小)。最后,介绍并分析了一种实用的k-means型优化算法。在三个域偏移数据集上的实验表明,该方法提升了差异估计精度和目标域性能。
摘要:Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. We consider two specific discrepancy measures -- correlation alignment and the maximum mean discrepancy (MMD) -- and derive ad hoc stratification objectives for these terms. We then present expected and worst-case error bounds, and prove that our proposed objective for the MMD is theoretically optimal (i.e., minimises the variance) under certain assumptions. Finally, a practical k-means style optimisation algorithm is introduced and analysed. Experiments on three domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.
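分层采样降低方差的原理可以用解析公式直接演示:比例分配下的分层估计量只保留层内方差,而简单随机抽样还要额外承担层间方差。下面的草图用虚构的两层总体说明这一点(并非论文针对差异度量的具体分层目标):

```python
def srs_variance(strata, n):
    """Variance of the sample mean under simple random sampling, for a
    population described by strata = [(weight, mean, variance), ...]."""
    mu = sum(w * m for w, m, _ in strata)
    # total population variance = within-strata + between-strata parts
    var = sum(w * (v + (m - mu) ** 2) for w, m, v in strata)
    return var / n

def stratified_variance(strata, n):
    """Variance of the stratified estimator with proportional allocation:
    only the within-stratum variances contribute."""
    return sum(w * v for w, _, v in strata) / n

# Two equally weighted, well-separated strata (illustrative numbers).
strata = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
```

层间分离越大(此例中均值相差10),简单随机抽样的方差惩罚越重,分层带来的增益也越明显。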


强化学习(3篇)

【1】A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning
标题:基于强化学习的快速抗干扰认知雷达部署算法
链接:https://arxiv.org/abs/2512.05753

作者:Wencheng Cai,Xuchao Gao,Congying Han,Mingqiang Li,Tiande Guo
摘要:快速部署认知雷达以对抗干扰仍然是现代战争中的一个关键挑战,部署越高效,发现目标就越快。现有方法主要基于进化算法,耗时且容易陷入局部最优。我们借助神经网络的高效推理来克服这些不足,提出了一个全新框架:快速抗干扰雷达部署算法(FARDA)。我们首先将雷达部署问题建模为端到端任务,并设计深度强化学习算法加以求解,其中我们开发了用于感知热图信息的集成神经模块以及全新的奖励形式。实证结果表明,我们的方法达到了与进化算法相当的覆盖范围,而部署雷达的速度快了约7000倍。进一步的消融实验证实了FARDA各组件的必要性。
摘要:The fast deployment of cognitive radar to counter jamming remains a critical challenge in modern warfare, where more efficient deployment leads to quicker detection of targets. Existing methods are primarily based on evolutionary algorithms, which are time-consuming and prone to falling into local optima. We tackle these drawbacks via the efficient inference of neural networks and propose a brand new framework: Fast Anti-Jamming Radar Deployment Algorithm (FARDA). We first model the radar deployment problem as an end-to-end task and design deep reinforcement learning algorithms to solve it, where we develop integrated neural modules to perceive heatmap information and a brand new reward format. Empirical results demonstrate that our method achieves coverage comparable to evolutionary algorithms while deploying radars approximately 7,000 times faster. Further ablation experiments confirm the necessity of each component of FARDA.


【2】Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
标题:作为稳定强化学习的软全局约束的熵比剪裁
链接:https://arxiv.org/abs/2512.05591

作者:Zhenpeng Su,Leiyu Pan,Minxuan Lv,Tiehua Mei,Zijia Lin,Yuntao Li,Wenping Hu,Ruiming Tang,Kun Gai,Guorui Zhou
摘要:大型语言模型的后训练依赖强化学习来提升模型能力和对齐质量。然而,非同策(off-policy)训练范式引入了分布偏移,这通常会将策略推到信任域之外,导致训练不稳定,表现为策略熵的波动和不稳定的梯度。虽然PPO-Clip通过重要性比裁剪缓解了这一问题,但它仍然忽略了动作的全局分布变化。为了解决这些挑战,我们提出使用当前策略与先前策略之间的熵比作为一个新的全局度量,有效量化整个更新过程中策略探索的相对变化。在此度量的基础上,我们引入了熵比裁剪(Entropy Ratio Clipping, ERC)机制,对熵比施加双向约束。这在全局分布层面上稳定了策略更新,并弥补了PPO-Clip无法调节未采样动作概率变化的不足。我们将ERC集成到DAPO和GPPO强化学习算法中。在多个基准上的实验表明,ERC能够一致地提升性能。
摘要:Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift, which often pushes the policy beyond the trust region, leading to training instabilities manifested as fluctuations in policy entropy and unstable gradients. Although PPO-Clip mitigates this issue through importance clipping, it still overlooks the global distributional shift of actions. To address these challenges, we propose using the entropy ratio between the current and previous policies as a new global metric that effectively quantifies the relative change in policy exploration throughout updates. Building on this metric, we introduce an \textbf{Entropy Ratio Clipping} (ERC) mechanism that imposes bidirectional constraints on the entropy ratio. This stabilizes policy updates at the global distribution level and compensates for the inability of PPO-clip to regulate probability shifts of un-sampled actions. We integrate ERC into both DAPO and GPPO reinforcement learning algorithms. Experiments across multiple benchmarks show that ERC consistently improves performance.
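摘要中的熵比及其双向裁剪可以用几行代码示意(裁剪区间 [1-ε, 1+ε] 与分布数值均为假设,并非论文的完整ERC机制):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete policy distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_ratio_clip(p_new, p_old, eps=0.2):
    """Clip the entropy ratio H(new)/H(old) to [1 - eps, 1 + eps].
    A hedged sketch of the ERC idea: the clipped factor bounds, in both
    directions, how far an update may move global exploration."""
    ratio = entropy(p_new) / entropy(p_old)
    return max(1.0 - eps, min(1.0 + eps, ratio))

old = [0.25, 0.25, 0.25, 0.25]          # maximally exploratory policy
collapsed = [0.97, 0.01, 0.01, 0.01]    # entropy has collapsed
```

熵骤降(探索坍缩)与熵骤升都会被裁剪到区间边界,对应对全局分布变化的双向约束。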


【3】Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem
标题:具有替代方案的动态VNE问题的分层强化学习
链接:https://arxiv.org/abs/2512.05207

作者:Ali Al Housseini,Cristina Rottondi,Omran Ayoub
备注:Submitted to IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN) 2026
摘要:虚拟网络嵌入(VNE)是网络切片的关键使能技术,但大多数建模都假设每个虚拟网络请求(VNR)具有固定的拓扑。最近提出的带替代拓扑的VNE(VNEAP)用于刻画可塑的VNR,其中每个请求可以用若干功能等效、但资源消耗方式不同的拓扑之一来实例化。虽然这种灵活性扩大了可行空间,但也引入了额外的决策层,使动态嵌入更具挑战性。本文提出HRL-VNEAP,一种面向动态到达场景下VNEAP问题的分层强化学习方法。高层策略选择最合适的替代拓扑(或拒绝请求),低层策略将所选拓扑嵌入底层网络。在多种流量负载下对真实底层拓扑的实验表明,朴素的利用策略仅带来有限的收益,而HRL-VNEAP在所有指标上始终取得最佳性能。与最强的对比基线相比,HRL-VNEAP将接受率最多提高20.7%,总收入最多提高36.2%,收入成本比最多提高22.1%。最后,我们在可处理的实例上与一个MILP公式进行基准对比,以量化与最优解之间的剩余差距,并激励未来基于学习和优化的VNEAP解决方案的研究。
摘要:Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to \textbf{20.7\%}, total revenue by up to \textbf{36.2\%}, and revenue-over-cost by up to \textbf{22.1\%}. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.


Meta-Learning (2 papers)

【1】Meta-Learning Multi-armed Bandits for Beam Tracking in 5G and 6G Networks
Link: https://arxiv.org/abs/2512.05680

Authors: Alexander Mattick, George Yammine, Georgios Kontes, Setareh Maghsudi, Christopher Mutschler
Abstract: Beamforming-capable antenna arrays with many elements enable higher data rates in next-generation 5G and 6G networks. In current practice, analog beamforming uses a codebook of pre-configured beams, each radiating towards a specific direction, and a beam management function continuously selects optimal beams for moving user equipments (UEs). However, large codebooks and effects caused by reflections or blockages of beams make optimal beam selection challenging. In contrast to previous work and standardization efforts that opt for supervised learning to train classifiers to predict the next best beam based on previously selected beams, we formulate the problem as a partially observable Markov decision process (POMDP) and model the environment as the codebook itself. At each time step, we select a candidate beam conditioned on the belief state of the unobservable optimal beam and previously probed beams. This frames the beam selection problem as an online search procedure that locates the moving optimal beam. In contrast to previous work, our method handles new or unforeseen trajectories and changes in the physical environment, and outperforms previous work by orders of magnitude.
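The belief-state search idea can be pictured with a toy Bayesian update over a small codebook. Everything concrete here (the triangular gain profile, the Gaussian measurement model, the probe schedule) is an illustrative assumption, not the paper's method.

```python
import math

def gain(h, b):
    """Toy gain profile: received gain falls off linearly with the
    index distance between the optimal beam h and the probed beam b."""
    return 10.0 - 3.0 * abs(h - b)

def update_belief(belief, probed, measured, expected_gain, sigma=1.0):
    """Bayesian update of the belief over which beam is optimal, given
    the measured gain of a probed beam and a per-hypothesis gain model."""
    post = []
    for h, prior in enumerate(belief):
        mu = expected_gain(h, probed)  # expected gain if beam h were optimal
        lik = math.exp(-0.5 * ((measured - mu) / sigma) ** 2)
        post.append(prior * lik)
    z = sum(post)
    return [p / z for p in post]

K = 8                                  # toy codebook size
belief = [1.0 / K] * K
true_best = 5
for probe in [0, 3, 6]:                # online search: probe, observe, update
    belief = update_belief(belief, probe, gain(true_best, probe), gain)
estimate = max(range(K), key=lambda h: belief[h])
```

After three probes, the belief concentrates on the hypothesis consistent with all observed gains.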


【2】To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples
Link: https://arxiv.org/abs/2512.05318

Authors: Vignesh Kothapalli, Ata Fatahibaarzi, Hamed Firooz, Maziar Sanjabi
Note: 26 pages, 45 figures, 3 tables
Abstract: Chain-of-thought (CoT) prompting combined with few-shot in-context learning (ICL) has unlocked significant reasoning capabilities in large language models (LLMs). However, ICL with CoT examples is ineffective on novel tasks when the pre-training knowledge is insufficient. We study this problem in a controlled setting using the CoT-ICL Lab framework, and propose meta-training techniques to learn novel abstract reasoning tasks in-context. Although CoT examples facilitate reasoning, we notice that their excessive inclusion during meta-training degrades performance when CoT supervision is limited. To mitigate such behavior, we propose CoT-Recipe, a formal approach to modulate the mix of CoT and non-CoT examples in meta-training sequences. We demonstrate that careful modulation via CoT-Recipe can increase the accuracy of transformers on novel tasks by up to 300%, even when no CoT examples are available in-context. We confirm the broader effectiveness of these techniques by applying them to pretrained LLMs (the Qwen2.5 series) for symbolic reasoning tasks and observing gains of up to 130% in accuracy.
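CoT-Recipe itself is not specified in the abstract; the sketch below only illustrates the underlying knob it modulates, deterministically thinning CoT supervision to a target fraction when composing a meta-training sequence. The function name and the stride-sampling rule are assumptions.

```python
def build_sequence(examples, cot_fraction):
    """Compose a meta-training sequence in which roughly a `cot_fraction`
    share of examples keeps its chain-of-thought; the rest are reduced
    to (question, answer) pairs. Uses deterministic stride sampling."""
    seq, budget = [], 0.0
    for q, cot, a in examples:
        budget += cot_fraction
        if budget >= 1.0:              # this slot gets full CoT supervision
            budget -= 1.0
            seq.append((q, cot, a))
        else:                          # answer-only example
            seq.append((q, a))
    return seq

examples = [(f"q{i}", f"step{i}", f"a{i}") for i in range(100)]
seq = build_sequence(examples, cot_fraction=0.25)
n_cot = sum(1 for ex in seq if len(ex) == 3)
```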


Medical-Related (3 papers)

【1】NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction
Link: https://arxiv.org/abs/2512.05920

Authors: Jiawen Yang, Yihui Cao, Xuanyu Tian, Yuyao Zhang, Hongjiang Wei
Abstract: Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate postoperative facial appearance prediction remains challenging due to the complex nonlinear interactions between skeletal movements and facial soft tissue. Existing biomechanical, parametric-model, and deep-learning approaches either lack computational efficiency or fail to fully capture these intricate interactions. To address these limitations, we propose the Neural Implicit Craniofacial Model (NICE), which employs implicit neural representations for accurate anatomical reconstruction and surgical outcome prediction. NICE comprises a shape module, which employs region-specific implicit Signed Distance Function (SDF) decoders to reconstruct the facial surface, maxilla, and mandible, and a surgery module, which employs region-specific deformation decoders. These deformation decoders are driven by a shared surgical latent code to effectively model the complex, nonlinear biomechanical response of the facial surface to skeletal movements, incorporating anatomical prior knowledge. The deformation decoders output point-wise displacement fields, enabling precise modeling of surgical outcomes. Extensive experiments demonstrate that NICE outperforms current state-of-the-art methods, notably improving prediction accuracy in critical facial regions such as the lips and chin, while robustly preserving anatomical integrity. This work provides a clinically viable tool for enhanced surgical planning and patient consultation in orthognathic procedures.


【2】Model Gateway: Model Management Platform for Model-Driven Drug Discovery
Link: https://arxiv.org/abs/2512.05462

Authors: Yan-Shiun Wu, Nathan A. Morin
Note: 7 pages, 7 figures
Abstract: This paper presents the Model Gateway, a platform for managing machine learning (ML) and scientific computational models in the drug discovery pipeline. The platform supports Large Language Model (LLM) Agents and Generative AI-based tools that perform ML model management tasks in our Machine Learning Operations (MLOps) pipelines, such as registration and management of the dynamic consensus model (a model that aggregates several scientific computational models), retrieving model information, asynchronous submission/execution of models, and receiving results once the models complete execution. The platform includes a Model Owner Control Panel, Platform Admin Tools, and a Model Gateway API service for interacting with the platform and tracking model execution. The platform achieves a 0% failure rate in scaling tests with more than 10k simultaneous application clients consuming models. The Model Gateway is a fundamental part of our model-driven drug discovery pipeline. It has the potential to significantly accelerate the development of new drugs as our MLOps infrastructure matures and LLM Agents and Generative AI tools are integrated.


【3】Rethinking Tokenization for Clinical Time Series: When Less is More
Link: https://arxiv.org/abs/2512.05217

Authors: Rafi Al Attrach, Rajna Fani, David Restrepo, Yugang Jia, Peter Schüffler
Note: 9 pages, 2 figures, 4 tables. Machine Learning for Health (ML4H) 2025, Findings track
Abstract: Tokenization strategies shape how models process electronic health records, yet fair comparisons of their effectiveness remain limited. We present a systematic evaluation of tokenization approaches for clinical time series modeling using transformer-based architectures, revealing task-dependent and sometimes counterintuitive findings about temporal and value feature importance. Through controlled ablations across four clinical prediction tasks on MIMIC-IV, we demonstrate that explicit time encodings provide no consistent statistically significant benefit for the evaluated downstream tasks. Value features show task-dependent importance, affecting mortality prediction but not readmission, suggesting code sequences alone can carry sufficient predictive signal. We further show that frozen pretrained code encoders dramatically outperform their trainable counterparts while requiring far fewer parameters. Larger clinical encoders provide consistent improvements across tasks, benefiting from frozen embeddings that eliminate computational overhead. Our controlled evaluation enables fairer tokenization comparisons and demonstrates that simpler, parameter-efficient approaches can, in many cases, achieve strong performance, though the optimal tokenization strategy remains task-dependent.


Distillation | Knowledge Extraction (2 papers)

【1】Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws
Link: https://arxiv.org/abs/2512.05817

Authors: Zhengquan Luo, Zhiqiang Xu
Abstract: Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve comparable performance to full-data training while substantially reducing storage and computation. Despite rapid empirical progress, its theoretical foundations remain limited: existing methods (gradient, distribution, and trajectory matching) are built on heterogeneous surrogate objectives and optimization assumptions, which makes it difficult to analyze their common principles or provide general guarantees. Moreover, it is still unclear under what conditions distilled data can retain the effectiveness of full datasets when the training configuration, such as optimizer, architecture, or augmentation, changes. To answer these questions, we propose a unified theoretical framework, termed configuration-dynamics-error analysis, which reformulates major DD approaches under a common generalization-error perspective and provides two main results: (i) a scaling law that provides a single-configuration upper bound, characterizing how the error decreases as the distilled sample size increases and explaining the commonly observed performance saturation effect; and (ii) a coverage law showing that the required distilled sample size scales linearly with configuration diversity, with provably matching upper and lower bounds. In addition, our unified analysis reveals that various matching methods are interchangeable surrogates reducing the same generalization error, clarifying why they can all achieve dataset distillation and providing guidance on how surrogate choices affect sample efficiency and robustness. Experiments across diverse methods and configurations empirically confirm the derived laws, advancing a theoretical foundation for DD and enabling theory-driven design of compact, configuration-robust dataset distillation.
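The two laws can be pictured with a toy functional form. The power-law-plus-floor shape and all constants below are illustrative assumptions for intuition; the paper derives bounds, not these exact curves.

```python
def distillation_error(n, a=1.0, b=0.5, c=0.1):
    """Illustrative single-configuration bound: error decays as a power
    law in the distilled sample size n and saturates at a floor c,
    mimicking the commonly observed performance-saturation effect."""
    return c + a * n ** (-b)

def required_samples(diversity, per_config=50):
    """Illustrative coverage law: the distilled budget grows linearly
    with the number of distinct training configurations to cover."""
    return per_config * diversity

# Error shrinks quickly at first, then saturates near the floor c = 0.1.
errs = [distillation_error(n) for n in (10, 100, 1000, 10000)]
```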


【2】One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow
Link: https://arxiv.org/abs/2512.05251

Authors: Pascal Jutras-Dube, Jiaru Zhang, Ziran Wang, Ruqi Zhang
Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs. We introduce one-step diffusion samplers, which learn a step-conditioned ODE so that one large step reproduces the trajectory of many small ones via a state-space consistency loss. We further show that standard ELBO estimates in diffusion samplers degrade in the few-step regime because common discrete integrators yield mismatched forward/backward transition kernels. Motivated by this analysis, we derive a deterministic-flow (DF) importance weight for ELBO estimation without a backward kernel. To calibrate DF, we introduce a volume-consistency regularization that aligns the accumulated volume change along the flow across step resolutions. Our proposed sampler therefore achieves both sampling and stable evidence estimation in only one or a few steps. Across challenging synthetic and Bayesian benchmarks, it achieves competitive sample quality with orders-of-magnitude fewer network evaluations while maintaining robust ELBO estimates.


Clustering (1 paper)

【1】BalLOT: Balanced $k$-means clustering with optimal transport
Link: https://arxiv.org/abs/2512.05926

Authors: Wenyan Luo, Dustin G. Mixon
Note: 20 pages, 9 figures
Abstract: We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
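The alternating scheme is easy to sketch in one dimension. Below, a greedy capacity-constrained assignment stands in for the optimal-transport solve that BalLOT actually performs, alternated with the usual centroid update; the toy data and the ceil(n/k) capacity rule are illustrative assumptions.

```python
def balanced_assign(points, centers):
    """Capacity-constrained assignment step: each center receives at most
    ceil(n/k) points, filled greedily by ascending distance. This is a
    stand-in for the optimal-transport solve used by BalLOT."""
    n, k = len(points), len(centers)
    cap = -(-n // k)  # ceil(n / k)
    dists = sorted(
        (abs(p - c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    assign, load = [None] * n, [0] * k
    for _, i, j in dists:
        if assign[i] is None and load[j] < cap:
            assign[i], load[j] = j, load[j] + 1
    return assign

def update_centers(points, assign, k):
    """Centroid update step of the alternating minimization."""
    return [
        sum(p for p, a in zip(points, assign) if a == j)
        / max(1, sum(1 for a in assign if a == j))
        for j in range(k)
    ]

# Two well-separated 1-D clusters of three points each.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers = [0.0, 5.0]
for _ in range(3):
    assign = balanced_assign(points, centers)
    centers = update_centers(points, assign, 2)
```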


Point Clouds | SLAM | Radar | LiDAR | Depth/RGB-D (2 papers)

【1】Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth
Link: https://arxiv.org/abs/2512.05783

Authors: Maryam Yousefi, Soodeh Bakhshandeh
Abstract: When depth sensors provide only 5% of needed measurements, reconstructing complete 3D scenes becomes difficult. Autonomous vehicles and robots cannot tolerate the geometric errors that sparse reconstruction introduces. We propose curvature regularization through a discrete Laplacian operator, achieving 18.1% better reconstruction accuracy than standard variational autoencoders. Our contribution challenges an implicit assumption in geometric deep learning: that combining multiple geometric constraints improves performance. A single well-designed regularization term not only matches but exceeds the effectiveness of complex multi-term formulations. The discrete Laplacian offers stable gradients and noise suppression with just 15% training overhead and zero inference cost. Code and models are available at https://github.com/Maryousefi/GeoVAE-3D.
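The regularizer itself is compact. The sketch below is an assumed stand-alone version of such a curvature penalty: it applies the standard 5-point discrete Laplacian to a 2-D depth map and averages the squared response, so planar surfaces incur zero penalty while spikes are punished.

```python
def laplacian_penalty(depth):
    """Curvature regularizer: mean squared 5-point discrete Laplacian
    over the interior of a 2-D depth map given as a list of rows."""
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (depth[i - 1][j] + depth[i + 1][j]
                   + depth[i][j - 1] + depth[i][j + 1]
                   - 4.0 * depth[i][j])
            total += lap * lap
            count += 1
    return total / count

flat = [[float(i + j) for j in range(5)] for i in range(5)]  # planar map
bumpy = [row[:] for row in flat]
bumpy[2][2] += 1.0                                           # add a spike
```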


【2】Moving object detection from multi-depth images with an attention-enhanced CNN
Link: https://arxiv.org/abs/2512.05415

Authors: Masato Shibukawa, Fumi Yoshida, Toshifumi Yanagisawa, Takashi Ito, Hirohisa Kurosaki, Makoto Yoshikawa, Kohki Kamiya, Ji-an Jiang, Wesley Fraser, JJ Kavelaars, Susan Benecchi, Anne Verbiscer, Akira Hatakeyama, Hosei O, Naoya Ozaki
Note: 14 pages, 22 figures, submitted to PASJ
Abstract: One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other source, like noise. Object verification has relied heavily on human eyes, which usually results in significant labor costs. In order to address this limitation and reduce the reliance on manual intervention, we propose a multi-input convolutional neural network integrated with a convolutional block attention module. This method is specifically tailored to enhance the moving object detection system that we have developed and used previously. The current method introduces two innovations. The first is a multi-input architecture that processes multiple stacked images simultaneously. The second is the incorporation of a convolutional block attention module, which enables the model to focus on essential features in both spatial and channel dimensions. These advancements facilitate efficient learning from multiple inputs, leading to more robust detection of moving objects. The performance of the model is evaluated on a dataset consisting of approximately 2,000 observational images. We achieved an accuracy of nearly 99% with an AUC (Area Under the Curve) of >0.99. These metrics indicate that the proposed model achieves excellent classification performance. By adjusting the threshold for object detection, the new model reduces the human workload by more than 99% compared to manual verification.


Federated Learning | Privacy Preservation | Encryption (1 paper)

【1】MAR-FL: A Communication Efficient Peer-to-Peer Federated Learning System
Link: https://arxiv.org/abs/2512.05234

Authors: Felix Mulitze, Herbert Woisetschläger, Hans Arno Jacobsen
Note: Accepted at the peer-reviewed AI4NextG Workshop at NeurIPS 2025
Abstract: The convergence of next-generation wireless systems and distributed Machine Learning (ML) demands Federated Learning (FL) methods that remain efficient and robust with wirelessly connected peers and under network churn. Peer-to-peer (P2P) FL removes the bottleneck of a central coordinator, but existing approaches suffer from excessive communication complexity, limiting their scalability in practice. We introduce MAR-FL, a novel P2P FL system that leverages iterative group-based aggregation to substantially reduce communication overhead while retaining resilience to churn. MAR-FL achieves communication costs that scale as O(N log N), contrasting with the O(N^2) complexity of existing baselines, and thereby maintains effectiveness especially as the number of peers in an aggregation round grows. The system is robust to unreliable FL clients and can integrate private computing.
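The gain from group-based aggregation can be illustrated by counting messages in an iterative reduce. This toy counts only the "members send to a group aggregator" messages and omits redistribution rounds and churn handling, so its count is even lower than MAR-FL's stated O(N log N); the group size and weighting scheme are assumptions.

```python
def group_aggregate(values, group_size=4):
    """Iterative group-based aggregation: peers average within small
    groups, one representative per group advances to the next round,
    until a single value remains. Returns (aggregate, message count)."""
    msgs = 0
    weights = [1] * len(values)
    while len(values) > 1:
        nxt_v, nxt_w = [], []
        for i in range(0, len(values), group_size):
            gv = values[i:i + group_size]
            gw = weights[i:i + group_size]
            msgs += len(gv) - 1  # members send to one group aggregator
            w = sum(gw)
            nxt_v.append(sum(v * x for v, x in zip(gv, gw)) / w)
            nxt_w.append(w)
        values, weights = nxt_v, nxt_w
    return values[0], msgs

vals = [float(i) for i in range(16)]
agg, msgs = group_aggregate(vals)   # far fewer messages than all-to-all
```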


Reasoning | Analysis | Understanding | Interpretation (5 papers)

【1】KANFormer for Predicting Fill Probabilities via Survival Analysis in Limit Order Books
Link: https://arxiv.org/abs/2512.05734

Authors: Jinfeng Zhong, Emmanuel Bacry, Agathe Guilloux, Jean-François Muzy
Abstract: This paper introduces KANFormer, a novel deep-learning-based model for predicting the time-to-fill of limit orders by leveraging both market- and agent-level information. KANFormer combines a Dilated Causal Convolutional network with a Transformer encoder, enhanced by Kolmogorov-Arnold Networks (KANs), which improve nonlinear approximation. Unlike existing models that rely solely on a series of snapshots of the limit order book, KANFormer integrates the actions of agents related to LOB dynamics and the position of the order in the queue to more effectively capture patterns related to execution likelihood. We evaluate the model using CAC 40 index futures data with labeled orders. The results show that KANFormer outperforms existing models in both calibration (Right-Censored Log-Likelihood, Integrated Brier Score) and discrimination (C-index, time-dependent AUC). We further analyze feature importance over time using SHAP (SHapley Additive exPlanations). Our results highlight the benefits of combining rich market signals with expressive neural architectures to achieve accurate and interpretable predictions of fill probabilities.
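The C-index used for discrimination is straightforward to compute from scratch. The sketch below is the standard Harrell-style concordance for right-censored time-to-fill data, not code from the paper; the toy times and risk scores are made up.

```python
def concordance_index(times, events, risks):
    """C-index: among comparable pairs (the earlier time must correspond
    to an observed event, not a censoring), the fraction in which the
    earlier-filling order has the higher risk score; risk ties count 0.5."""
    num, den = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:  # order i fills first
                den += 1
                if risks[i] > risks[j]:
                    num += 1.0
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den

times = [2.0, 4.0, 3.0, 5.0]
events = [1, 1, 0, 1]           # the third order is right-censored
risks = [0.9, 0.2, 0.7, 0.6]    # higher risk = expected to fill sooner
cindex = concordance_index(times, events, risks)
```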


【2】DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
Link: https://arxiv.org/abs/2512.05515

Authors: Yuhua Wen, Qifei Li, Yingying Zhou, Yingming Gao, Zhengqi Wen, Jianhua Tao, Ya Li
Note: Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2025
Abstract: Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is challenged by alignment and fusion issues. Alignment requires synchronizing both temporal and semantic information across modalities, while fusion involves integrating these aligned features into a unified representation. Existing methods often address alignment or fusion in isolation, leading to limitations in performance and efficiency. To tackle these issues, we propose a novel framework called Dual-stream Alignment with Hierarchical Bottleneck Fusion (DashFusion). Firstly, a dual-stream alignment module synchronizes multimodal features through temporal and semantic alignment. Temporal alignment employs cross-modal attention to establish frame-level correspondences among multimodal sequences. Semantic alignment ensures consistency across the feature space through contrastive learning. Secondly, supervised contrastive learning leverages label information to refine the modality features. Finally, hierarchical bottleneck fusion progressively integrates multimodal information through compressed bottleneck tokens, which achieves a balance between performance and computational efficiency. We evaluate DashFusion on three datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. Experimental results demonstrate that DashFusion achieves state-of-the-art performance across various metrics, and ablation studies confirm the effectiveness of our alignment and fusion techniques. The code for our experiments is available at https://github.com/ultramarineX/DashFusion.


【3】LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Link: https://arxiv.org/abs/2512.05325

Authors: Ömer Faruk Akgül, Yusuf Hakan Kalaycı, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna
Abstract: Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., "hmm", "wait") during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters, a single mathematically trained probe per base model yields strong accuracy-efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40-65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35-60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.
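The distribution-free piece is ordinary split conformal prediction. The sketch below shows only the quantile-threshold mechanics under an assumed score design (probe scores collected at forced exits that would have been premature); it is not LYNX's exact calibration pipeline.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: choose a threshold tau so that, under
    exchangeability, a premature exit's probe score exceeds tau with
    probability at most alpha. Uses the ceil((n+1)(1-alpha)) quantile."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

# Hypothetical calibration scores from exits that would have been wrong.
cal = [i / 100.0 for i in range(100)]   # 0.00 .. 0.99
tau = conformal_threshold(cal, alpha=0.1)

def should_exit(probe_score, tau):
    """Stop generating only when the probe's confidence clears tau."""
    return probe_score > tau
```

Lowering `alpha` tightens the guarantee against premature exits at the cost of later stopping.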


【4】Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data
Link: https://arxiv.org/abs/2512.05456

Authors: Stephen Salerno, Kentaro Hoffman, Awan Afiaz, Anna Neufeld, Tyler H. McCormick, Jeffrey T. Leek
Note: 32 pages, 9 figures, 3 tables
Abstract: As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), researchers increasingly use predictions from pre-trained algorithms as substitutes for missing or unobserved data. Though appealing for financial and logistical reasons, using standard tools for inference can misrepresent the association between independent variables and the outcome of interest when the true, unobserved outcome is replaced by a predicted value. In this paper, we characterize the statistical challenges inherent to drawing inference with predicted data (IPD) and show that high predictive accuracy does not guarantee valid downstream inference. We show that all such failures reduce to statistical notions of (i) bias, when predictions systematically shift the estimand or distort relationships among variables, and (ii) variance, when uncertainty from the prediction model and the intrinsic variability of the true data are ignored. We then review recent methods for conducting IPD and discuss how this framework is deeply rooted in classical statistical theory. We then comment on some open questions and interesting avenues for future work in this area, and end with some remarks on how to use predicted data in scientific studies in a way that is both transparent and statistically principled.
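The bias failure mode is easy to reproduce in a few lines: regressing on outcomes from a miscalibrated prediction model recovers the prediction model's slope, not the true association. The simulation below is an illustrative toy, not an example from the paper.

```python
import random

def ols_slope(xs, ys):
    """Simple least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]   # true slope is 2
ys_hat = [1.2 * x for x in xs]                 # a biased prediction model

slope_true = ols_slope(xs, ys)      # close to 2 with the real outcomes
slope_pred = ols_slope(xs, ys_hat)  # recovers the predictor's 1.2 instead
```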


【5】Continuous-Time Homeostatic Dynamics for Reentrant Inference Models
Link: https://arxiv.org/abs/2512.05158

Authors: Byung Gyu Chae
Note: 13 pages, 4 figures
Abstract: We formulate the Fast-Weights Homeostatic Reentry Network (FHRN) as a continuous-time neural-ODE system, revealing its role as a norm-regulated reentrant dynamical process. Starting from the discrete reentry rule $x_t = x_t^{(\mathrm{ex})} + \gamma\, W_r\, g(\|y_{t-1}\|)\, y_{t-1}$, we derive the coupled system $\dot{y} = -y + f(W_r y;\, x,\, A) + g_{\mathrm{h}}(y)$, showing that the network couples fast associative memory with global radial homeostasis. The dynamics admit bounded attractors governed by an energy functional, yielding a ring-like manifold. A Jacobian spectral analysis identifies a "reflective regime" in which reentry induces stable oscillatory trajectories rather than divergence or collapse. Unlike continuous-time recurrent neural networks or liquid neural networks, FHRN achieves stability through population-level gain modulation rather than fixed recurrence or neuron-local time adaptation. These results establish the reentry network as a distinct class of self-referential neural dynamics supporting recursive yet bounded computation.
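A system of this form can be integrated directly with explicit Euler to see the bounded, non-collapsing behavior the analysis describes. The tanh nonlinearity, the rotational coupling matrix, and the specific radial homeostasis below are illustrative assumptions consistent with the stated form, not the paper's exact parameterization.

```python
import math

def step(y, W, x, dt=0.05, target_norm=1.0):
    """One explicit-Euler step of dy/dt = -y + tanh(W y + x) + g_h(y),
    where g_h pulls the norm of y toward target_norm (radial homeostasis)."""
    norm = math.sqrt(sum(v * v for v in y)) or 1e-12
    recur = [
        math.tanh(sum(W[i][j] * y[j] for j in range(len(y))) + x[i])
        for i in range(len(y))
    ]
    g_h = [(target_norm - norm) * v / norm for v in y]
    return [v + dt * (-v + r + g) for v, r, g in zip(y, recur, g_h)]

W = [[0.0, -1.5], [1.5, 0.0]]   # rotational coupling: oscillatory reentry
x = [0.2, 0.0]                  # constant external drive
y = [0.1, 0.0]
norms = []
for _ in range(2000):
    y = step(y, W, x)
    norms.append(math.sqrt(sum(v * v for v in y)))
# Trajectories stay bounded and do not collapse to the origin.
```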


检测相关(3篇)

【1】SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection
标题:SCoNE:用于有效且高效的多视图异常检测的球面一致邻域集成
链接:https://arxiv.org/abs/2512.05540

作者:Yang Xu,Hang Zhang,Yixiao Ma,Ye Zhu,Kai Ming Ting
摘要:多视图异常检测的核心问题是在所有视图中一致地表示正常实例的局部邻域。最近的方法考虑在每个视图中的局部邻域的表示独立,然后通过学习过程捕获跨所有视图的一致的邻居。他们受到两个关键问题的困扰。首先,不能保证它们可以很好地捕获一致的邻居,特别是当相同的邻居在不同视图中的不同密度的区域中时,导致检测精度较差。其次,学习过程具有$\mathcal{O}(N^2)$的高计算成本,使得它们不适用于大型数据集。为了解决这些问题,我们提出了一种新的方法,称为\textbf{S}球形\textbf{C}一致\textbf{N}邻域\textbf{E}邻域(SCoNE)。它有两个独特的特点:(a)一致的邻域直接用多视图实例表示,不需要现有方法中使用的中间表示;以及(b)邻域具有数据依赖的属性,这导致稀疏区域中的大邻域和密集区域中的小邻域。数据相关的属性使不同视图中的局部邻域能够被表示为一致的邻域,而无需学习。这导致了$\mathcal{O}(N)$时间复杂度。经验评估表明,SCoNE具有更高的检测精度,并且在大型数据集中比现有方法快几个数量级。
摘要 :The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches consider a representation of local neighborhood in each view independently, and then capture the consistent neighbors across all views via a learning process. They suffer from two key issues. First, there is no guarantee that they can capture consistent neighbors well, especially when the same neighbors are in regions of varied densities in different views, resulting in inferior detection accuracy. Second, the learning process has a high computational cost of $\mathcal{O}(N^2)$, rendering them inapplicable for large datasets. To address these issues, we propose a novel method termed \textbf{S}pherical \textbf{C}onsistent \textbf{N}eighborhoods \textbf{E}nsemble (SCoNE). It has two unique features: (a) the consistent neighborhoods are represented with multi-view instances directly, requiring no intermediate representations as used in existing approaches; and (b) the neighborhoods have data-dependent properties, which lead to large neighborhoods in sparse regions and small neighborhoods in dense regions. The data-dependent properties enable local neighborhoods in different views to be represented well as consistent neighborhoods, without learning. This leads to $\mathcal{O}(N)$ time complexity. Empirical evaluations show that SCoNE has superior detection accuracy and runs orders-of-magnitude faster in large datasets than existing approaches.


【2】IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection
标题:IDK-S:用于流式异常检测的增量分布核
链接:https://arxiv.org/abs/2512.05531

作者:Yang Xu,Yixiao Ma,Kaifeng Zhang,Zuliang Yang,Kai Ming Ting
摘要:数据流上的异常检测提出了重大挑战:方法需要在不断变化的分布中保持高检测精度,同时确保实时效率。本文介绍 $\mathcal{IDK}$-$\mathcal{S}$,一种用于流式($\mathbf{S}$treaming)异常检测的新型增量($\mathbf{I}$ncremental)分布($\mathbf{D}$istributional)核($\mathbf{K}$ernel),通过在核均值嵌入框架中构造新的动态表示来有效应对这些挑战。$\mathcal{IDK}$-$\mathcal{S}$的优越性归因于两个关键创新。首先,它继承了隔离分布核(Isolation Distributional Kernel)的优势:这是一种离线检测器,由于使用了依赖于数据的核,相比隔离森林和局部离群因子等基础方法具有显著的性能优势。其次,它采用了一个轻量级的增量更新机制,与执行完整模型重训练的朴素基线策略相比显著减少了计算开销。这在不影响检测精度的情况下实现,该论断由其与完全重训练模型的统计等价性支持。我们在13个基准上的大量实验表明,$\mathcal{IDK}$-$\mathcal{S}$实现了卓越的检测精度,同时运行速度大大加快,在许多情况下比现有的最先进方法快一个数量级。
摘要:Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy among evolving distributions while ensuring real-time efficiency. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that effectively addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The superiority of $\mathcal{IDK}$-$\mathcal{S}$ is attributed to two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector that has demonstrated significant performance advantages over foundational methods like Isolation Forest and Local Outlier Factor due to the use of a data-dependent kernel. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared to the naive baseline strategy of performing a full model retraining. This is achieved without compromising detection accuracy, a claim supported by its statistical equivalence to the full retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.
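The incremental-update idea can be illustrated with a running kernel mean embedding that costs O(n_features) per arriving point. As noted in the comments, the feature map here is a random-Fourier-feature RBF approximation, a stand-in assumption for the paper's isolation-based kernel; only the constant-time mean update mirrors the mechanism described.

```python
import numpy as np

class StreamingKME:
    """Incrementally updated kernel mean embedding for streaming anomaly
    scoring. Uses random Fourier features of an RBF kernel as a simple
    stand-in for the paper's isolation-based feature map (an assumption)."""
    def __init__(self, dim, n_features=256, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim, n_features))
        self.b = rng.uniform(0, 2 * np.pi, n_features)
        self.mu = np.zeros(n_features)
        self.n = 0

    def _phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(x @ self.W + self.b)

    def update(self, x):                    # O(n_features) per point
        self.n += 1
        self.mu += (self._phi(x) - self.mu) / self.n

    def score(self, x):                     # low similarity => anomalous
        return float(self._phi(x) @ self.mu)

rng = np.random.default_rng(1)
kme = StreamingKME(dim=2)
for pt in rng.normal(size=(500, 2)):
    kme.update(pt)
normal_score = kme.score(np.zeros(2))
outlier_score = kme.score(np.array([8.0, 8.0]))
```

A point far from the stream's distribution receives a much lower similarity to the running embedding than a typical point, without any retraining pass over the history.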


【3】Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification
标题:超越检测:细粒度Webshell家族分类的综合基准和表示学习研究
链接:https://arxiv.org/abs/2512.05288

作者:Feijiang Han
摘要:恶意WebShell通过损害关键数字基础设施并危及医疗保健和金融等行业的公共服务,构成了重大且不断发展的威胁。虽然研究界在WebShell检测方面取得了重大进展(即,区分恶意样本和良性样本),我们认为现在是从被动检测过渡到深入分析和主动防御的时候了。一个有希望的方向是WebShell家族分类的自动化,这涉及识别特定的恶意软件血统,以了解对手的策略并实现精确,快速的响应。然而,这一关键任务在很大程度上仍然是一个尚未探索的领域,目前依赖于缓慢的人工专家分析。为了解决这一差距,我们提出了第一个系统的研究,自动化的WebShell家庭分类。我们的方法从提取动态函数调用跟踪开始,以捕获抵抗常见加密和混淆的固有行为。为了增强数据集的规模和多样性,以实现更稳定的评估,我们使用大型语言模型合成的新变体来增强这些真实世界的痕迹。然后,这些增强的轨迹被抽象为序列、图和树,为基准测试一套全面的表示方法提供了基础。我们的评估涵盖了经典的基于序列的嵌入(CBOW,GloVe),Transformers(BERT,SimCSE)和一系列结构感知算法,包括Graph Kernels,Graph Edit Distance,Graph2Vec和各种图形神经网络。通过在监督和无监督设置下对四个真实世界的家庭注释数据集进行广泛的实验,我们建立了一个强大的基线,并为数据抽象,表示模型和学习范式的最有效组合提供了实用的见解。
摘要:Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samples from benign ones), we argue that it is time to transition from passive detection to in-depth analysis and proactive defense. One promising direction is the automation of WebShell family classification, which involves identifying the specific malware lineage in order to understand an adversary's tactics and enable a precise, rapid response. This crucial task, however, remains a largely unexplored area that currently relies on slow, manual expert analysis. To address this gap, we present the first systematic study to automate WebShell family classification. Our method begins with extracting dynamic function call traces to capture inherent behaviors that are resistant to common encryption and obfuscation. To enhance the scale and diversity of our dataset for a more stable evaluation, we augment these real-world traces with new variants synthesized by Large Language Models. These augmented traces are then abstracted into sequences, graphs, and trees, providing a foundation to benchmark a comprehensive suite of representation methods. Our evaluation spans classic sequence-based embeddings (CBOW, GloVe), transformers (BERT, SimCSE), and a range of structure-aware algorithms, including Graph Kernels, Graph Edit Distance, Graph2Vec, and various Graph Neural Networks. Through extensive experiments on four real-world, family-annotated datasets under both supervised and unsupervised settings, we establish a robust baseline and provide practical insights into the most effective combinations of data abstractions, representation models, and learning paradigms for this challenge.


表征(1篇)

【1】Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
标题:歌词很重要:利用学习表征的力量进行音乐流行预测
链接:https://arxiv.org/abs/2512.05508

作者:Yash Choudhary,Preeti Rao,Pushpak Bhattacharyya
备注:8 pages
摘要:准确预测音乐受欢迎程度是音乐行业的一个关键挑战,能为艺术家、制作人和流媒体平台带来好处。之前的研究主要集中在音频特征、社交元数据或模型架构上。这项工作探讨了歌词在流行度预测中尚未充分发掘的作用。我们提出了一个自动化的管道,使用LLM提取高维歌词嵌入,捕捉语义、语法和顺序信息。这些特征被集成到HitMusicLyricNet中,这是一种多模态架构,结合音频、歌词和社交元数据,在0-100范围内进行流行度评分预测。我们的方法在包含超过100,000首曲目的SpotGenTrack数据集上优于现有基线,在MAE和MSE上分别实现了9%和20%的改进。消融实验证实,收益来自我们由LLM驱动的歌词特征管道(LyricsAENet),凸显了稠密歌词表示的价值。
摘要:Accurately predicting music popularity is a critical challenge in the music industry, offering benefits to artists, producers, and streaming platforms. Prior research has largely focused on audio features, social metadata, or model architectures. This work addresses the under-explored role of lyrics in predicting popularity. We present an automated pipeline that uses LLM to extract high-dimensional lyric embeddings, capturing semantic, syntactic, and sequential information. These features are integrated into HitMusicLyricNet, a multimodal architecture that combines audio, lyrics, and social metadata for popularity score prediction in the range 0-100. Our method outperforms existing baselines on the SpotGenTrack dataset, which contains over 100,000 tracks, achieving 9% and 20% improvements in MAE and MSE, respectively. Ablation confirms that gains arise from our LLM-driven lyrics feature pipeline (LyricsAENet), underscoring the value of dense lyric representations.


3D|3D重建等相关(1篇)

【1】PoolNet: Deep Learning for 2D to 3D Video Process Validation
标题:PoolNet:用于2D到3D视频流程验证的深度学习
链接:https://arxiv.org/abs/2512.05362

作者:Sanchit Kaul,Joseph Luna,Shray Arora
备注:All code related to this paper can be found at https://github.com/sanchitkaul/PoolNet.git
摘要:从序列和非序列图像数据中提取运动恢复结构(SfM)信息是一项耗时且计算量大的任务。除此之外,大多数公开可用的数据由于相机姿态变化不足、模糊场景元素和噪声数据而不适合处理。为了解决这个问题,我们引入了PoolNet,这是一个多功能的深度学习框架,用于对野外数据进行帧级和场景级验证。我们证明,我们的模型成功地区分SfM准备场景从那些不适合处理,同时显着削弱的时间量的最先进的算法需要从运动中获得结构的数据。
摘要:Lifting Structure-from-Motion (SfM) information from sequential and non-sequential image data is a time-consuming and computationally expensive task. In addition to this, the majority of publicly available data is unfit for processing due to inadequate camera pose variation, obscuring scene elements, and noisy data. To solve this problem, we introduce PoolNet, a versatile deep learning framework for frame-level and scene-level validation of in-the-wild data. We demonstrate that our model successfully differentiates SfM ready scenes from those unfit for processing while significantly undercutting the amount of time state of the art algorithms take to obtain structure-from-motion data.


优化|敛散性(5篇)

【1】Approximation of Box Decomposition Algorithm for Fast Hypervolume-Based Multi-Objective Optimization
标题:面向快速基于超体积的多目标优化的盒分解算法近似
链接:https://arxiv.org/abs/2512.05825

作者:Shuhei Watanabe
摘要:基于超体积(HV)的贝叶斯优化(BO)是多目标决策的标准方法之一。然而,优化采集函数的计算成本仍然是一个显著的瓶颈,主要源于HV改进量计算的开销。虽然HV盒分解(由Lacour等人(2017)提出)提供了一种应对频繁精确改进量计算的有效方法,但其在最坏情况下具有超多项式内存复杂度$O(MN^{\lfloor \frac{M + 1}{2} \rfloor})$。为了解决这个问题,Couckuyt等人(2012)采用了一种近似算法。然而,文献中目前缺乏对该算法的严格描述。本文通过提供这种近似算法的全面数学和算法细节来弥合这一差距。
摘要:Hypervolume (HV)-based Bayesian optimization (BO) is one of the standard approaches for multi-objective decision-making. However, the computational cost of optimizing the acquisition function remains a significant bottleneck, primarily due to the expense of HV improvement calculations. While HV box-decomposition offers an efficient way to cope with the frequent exact improvement calculations, it suffers from super-polynomial memory complexity $O(MN^{\lfloor \frac{M + 1}{2} \rfloor})$ in the worst case as proposed by Lacour et al. (2017). To tackle this problem, Couckuyt et al. (2012) employed an approximation algorithm. However, a rigorous algorithmic description is currently absent from the literature. This paper bridges this gap by providing comprehensive mathematical and algorithmic details of this approximation algorithm.
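For intuition on why exact hypervolume computations become expensive as the number of objectives $M$ grows, here is the easy two-objective case, where a single sweep over the sorted Pareto front suffices. This is a textbook sketch for minimization problems, not the box-decomposition or the approximation algorithm from the paper.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Exact hypervolume (area) dominated by a 2-D point set w.r.t. a
    reference point, assuming minimization. A simple sweep over the
    x-sorted non-dominated front."""
    pts = np.asarray(points, dtype=float)
    pts = pts[np.argsort(pts[:, 0])]
    front, best_y = [], np.inf
    for x, y in pts:                 # keep non-dominated points only
        if y < best_y:
            front.append((x, y))
            best_y = y
    hv, prev_y = 0.0, ref[1]
    for x, y in front:               # stack axis-aligned boxes left to right
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4))
```

For $M = 2$ this runs in $O(N \log N)$; the memory blow-up discussed in the abstract only appears once the dominated region must be decomposed into boxes in higher dimensions.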


【2】Non-Convex Federated Optimization under Cost-Aware Client Selection
标题:具有成本意识的客户选择下的非凸联邦优化
链接:https://arxiv.org/abs/2512.05327

作者:Xiaowen Jiang,Anton Rodomanov,Sebastian U. Stich
摘要:不同的联邦优化算法通常采用不同的客户端选择策略:有些方法在每一轮只与随机抽样的客户端子集通信,而其他方法则需要定期与所有客户端通信,或者使用结合两种策略的混合方案。然而,用于比较优化方法的现有度量通常不区分这些策略,而它们在实践中往往带来不同的通信成本。为了解决这种差异,我们引入了一个简单而自然的联邦优化模型,用以量化通信和本地计算复杂度。这个新模型涵盖几种常用的客户端选择策略,并明确地将每种策略与不同的成本相关联。在这种设定下,我们提出了一个新算法,在现有的非凸联邦优化方法中实现了已知最优的通信和本地复杂度。该算法基于不精确复合梯度法,配以精心构造的梯度估计器以及在每次迭代中求解辅助子问题的特殊程序。该梯度估计器基于SAGA,一种流行的方差缩减梯度估计器。我们首先为其推导出一个新的方差界,表明SAGA可以利用函数相似性。然后,我们引入递归梯度(Recursive-Gradient)技术,作为潜在改进给定条件无偏梯度估计器(包括SAGA和SVRG)误差界的一种通用方法。通过将该技术应用于SAGA,我们得到了一个新的估计器RG-SAGA,其误差界相比原版有所改进。
摘要:Different federated optimization algorithms typically employ distinct client-selection strategies: some methods communicate only with a randomly sampled subset of clients at each round, while others need to periodically communicate with all clients or use a hybrid scheme that combines both strategies. However, existing metrics for comparing optimization methods typically do not distinguish between these strategies, which often incur different communication costs in practice. To address this disparity, we introduce a simple and natural model of federated optimization that quantifies communication and local computation complexities. This new model allows for several commonly used client-selection strategies and explicitly associates each with a distinct cost. Within this setting, we propose a new algorithm that achieves the best-known communication and local complexities among existing federated optimization methods for non-convex optimization. This algorithm is based on the inexact composite gradient method with a carefully constructed gradient estimator and a special procedure for solving the auxiliary subproblem at each iteration. The gradient estimator is based on SAGA, a popular variance-reduced gradient estimator. We first derive a new variance bound for it, showing that SAGA can exploit functional similarity. We then introduce the Recursive-Gradient technique as a general way to potentially improve the error bound of a given conditionally unbiased gradient estimator, including both SAGA and SVRG. By applying this technique to SAGA, we obtain a new estimator, RG-SAGA, which has an improved error bound compared to the original one.
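The SAGA estimator that the proposed method builds on can be sketched on a toy least-squares problem. This is the standard SAGA update (a per-sample gradient table plus a running average), not the paper's RG-SAGA or its federated variant; the problem size and step size are illustrative assumptions.

```python
import numpy as np

def saga(grad_i, x0, n, lr=0.01, iters=10000, seed=0):
    """Minimal SAGA loop: the table of last-seen per-sample gradients and
    its running average give the conditionally unbiased, variance-reduced
    estimator  g = grad_i(x) - table[i] + mean(table)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    table = np.array([grad_i(i, x) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(i, x)
        x -= lr * (g_new - table[i] + avg)
        avg += (g_new - table[i]) / n     # keep the average consistent
        table[i] = g_new
    return x

# Toy problem: minimize mean_i 0.5 * (a_i @ x - b_i)^2 with a known solution.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = saga(grad_i, np.zeros(3), n=50)
```

Because the stored gradients converge along with the iterate, the estimator's variance vanishes near the optimum, which is what the paper's new variance bound quantifies under functional similarity.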


【3】Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces
标题:可解释性与最优化的桥梁:再生核Hilbert空间中可证明的属性加权Actor-Critic
链接:https://arxiv.org/abs/2512.05291

作者:Na Li,Hangguan Shan,Wei Ni,Wenjie Zhang,Xinyu Li
摘要:演员-评论家(AC)方法是强化学习(RL)的基石,但可解释性有限。目前可解释的强化学习方法很少使用状态归因来辅助训练。相反,它们平等地对待所有状态特征,从而忽略了单个状态维度对奖励的异质性影响。我们提出了基于RKHS-SHAP的高级Actor-Critic(RSA2C),一种归因感知、核化、双时间尺度的AC算法,包括Actor、Value Critic和Advantage Critic。Actor在具有Mahalanobis加权算子值核的向量值再生核希尔伯特空间(RKHS)中实例化,而Value Critic和Advantage Critic驻留在标量RKHS中。这些RKHS增强的组件使用稀疏化字典:Value Critic维护自己的字典,而Actor和Advantage Critic共享一个字典。通过RKHS-SHAP(用于流形上期望的核均值嵌入和用于流形外期望的条件均值嵌入)从Value Critic计算的状态归因被转换为Mahalanobis门控权重,用以调节Actor梯度和Advantage Critic目标。理论上,我们导出了状态扰动下的全局非渐近收敛界,通过扰动误差项显示稳定性,通过收敛误差项显示效率。三个标准连续控制环境上的实证结果表明,我们的算法实现了效率、稳定性和可解释性。
摘要 :Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training. Rather, they treat all state features equally, thereby neglecting the heterogeneous impacts of individual state dimensions on the reward. We propose RKHS--SHAP-based Advanced Actor--Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm, including Actor, Value Critic, and Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-enhanced components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS--SHAP (kernel mean embedding for on-manifold expectations and conditional mean embedding for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets. Theoretically, we derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three standard continuous-control environments show that our algorithm achieves efficiency, stability, and interpretability.


【4】Consequences of Kernel Regularity for Bandit Optimization
标题:核正则性对Bandit优化的影响
链接:https://arxiv.org/abs/2512.05957

作者:Madison Lee,Tara Javidi
备注:Feedback welcome!
摘要:在这项工作中,我们研究了RKHS函数的bandit优化中核正则性和算法性能之间的关系。虽然再生核希尔伯特空间(RKHS)方法传统上依赖于全局核回归器,但利用局部近似的基于光滑性的方法也很常见。我们表明,这些观点通过各向同性核的谱性质深深地联系在一起。特别是,我们刻画了Matérn、平方指数、有理二次、$γ$-指数、分段多项式和狄利克雷核的傅立叶谱,并表明衰减率从这两个角度都决定了渐近遗憾。对于核化bandit算法,谱衰减给出最大信息增益的上界,控制最坏情况遗憾;而对于基于光滑性的方法,相同的衰减率建立了Hölder空间嵌入和Besov空间范数等价,使局部连续性分析成为可能。这些联系表明,基于核的算法和局部自适应算法可以在一个统一的框架内进行分析。这使我们能够为每个核族导出明确的遗憾界,在几种情况下获得新结果,并为其他情况提供改进的分析。此外,我们分析了LP-GP-UCB,一种结合两种方法的算法,用局部多项式估计器增强全局高斯过程代理。虽然这种混合方法并不一致地优于专门方法,但它在多个核族上实现了阶意义下的最优。
摘要:In this work we investigate the relationship between kernel regularity and algorithmic performance in the bandit optimization of RKHS functions. While reproducing kernel Hilbert space (RKHS) methods traditionally rely on global kernel regressors, it is also common to use a smoothness-based approach that exploits local approximations. We show that these perspectives are deeply connected through the spectral properties of isotropic kernels. In particular, we characterize the Fourier spectra of the Matérn, square-exponential, rational-quadratic, $γ$-exponential, piecewise-polynomial, and Dirichlet kernels, and show that the decay rate determines asymptotic regret from both viewpoints. For kernelized bandit algorithms, spectral decay yields upper bounds on the maximum information gain, governing worst-case regret, while for smoothness-based methods, the same decay rates establish Hölder space embeddings and Besov space norm-equivalences, enabling local continuity analysis. These connections show that kernel-based and locally adaptive algorithms can be analyzed within a unified framework. This allows us to derive explicit regret bounds for each kernel family, obtaining novel results in several cases and providing improved analysis for others. Furthermore, we analyze LP-GP-UCB, an algorithm that combines both approaches, augmenting global Gaussian process surrogates with local polynomial estimators. While the hybrid approach does not uniformly dominate specialized methods, it achieves order-optimality across multiple kernel families.


【5】Designing an Optimal Sensor Network via Minimizing Information Loss
标题:通过最大限度地减少信息丢失来设计最佳传感器网络
链接:https://arxiv.org/abs/2512.05940

作者:Daniel Waxman,Fernando Llorente,Katia Lamer,Petar M. Djurić
备注:37 pages, 15 figures. Accepted to Bayesian Analysis
摘要:最优实验设计是统计学中的一个经典主题,有许多研究充分的问题、应用和解决方案。我们研究的设计问题是布置传感器以监测时空过程,并在建模和优化中明确考虑时间维度。我们注意到,计算科学的最新进展通常会产生基于物理模拟的大型数据集,而这些数据集很少被用于实验设计。我们引入了一种新的基于模型的传感器布置准则,以及一个高效的优化算法,它集成了基于物理的模拟和贝叶斯实验设计原则,以确定能从模拟数据中"最小化信息损失"的传感器网络。我们的技术依赖于稀疏变分推理和(可分离的)高斯-马尔可夫先验,因此可以借用贝叶斯实验设计的许多技术。我们通过一个在亚利桑那州凤凰城监测气温的案例研究来验证我们的方法,该研究使用了最先进的基于物理的模拟。我们的结果表明,我们的框架优于随机或准随机采样,特别是在传感器数量有限时。最后,我们讨论了我们框架的实际考虑和影响,包括更复杂的建模工具和真实世界的部署。
摘要:Optimal experimental design is a classic topic in statistics, with many well-studied problems, applications, and solutions. The design problem we study is the placement of sensors to monitor spatiotemporal processes, explicitly accounting for the temporal dimension in our modeling and optimization. We observe that recent advancements in computational sciences often yield large datasets based on physics-based simulations, which are rarely leveraged in experimental design. We introduce a novel model-based sensor placement criterion, along with a highly-efficient optimization algorithm, which integrates physics-based simulations and Bayesian experimental design principles to identify sensor networks that "minimize information loss" from simulated data. Our technique relies on sparse variational inference and (separable) Gauss-Markov priors, and thus may adapt many techniques from Bayesian experimental design. We validate our method through a case study monitoring air temperature in Phoenix, Arizona, using state-of-the-art physics-based simulations. Our results show our framework to be superior to random or quasi-random sampling, particularly with a limited number of sensors. We conclude by discussing practical considerations and implications of our framework, including more complex modeling tools and real-world deployments.
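A common building block behind such designs is greedy information-gain sensor selection under a Gaussian-process prior. The sketch below is a generic greedy scheme under an assumed RBF covariance over candidate sites, not the paper's sparse variational procedure: at each step it picks the site with the largest posterior variance given the sites already chosen, which is the greedy rule for maximizing log-determinant information gain.

```python
import numpy as np

def greedy_sensor_placement(K, k):
    """Greedily pick k sensor sites from a candidate set with GP prior
    covariance K: at each step choose the site of maximum posterior
    variance given the already-chosen sites (rank-one conditioning)."""
    n = K.shape[0]
    chosen = []
    var = np.diag(K).astype(float).copy()
    C = np.zeros((0, n))                 # "whitened" rows for chosen sites
    for _ in range(k):
        j = int(np.argmax(var))
        chosen.append(j)
        c = (K[j] - C.T @ C[:, j]) / np.sqrt(var[j])
        var = var - c ** 2               # condition all variances on site j
        C = np.vstack([C, c])
        var[j] = -np.inf                 # never pick the same site twice
    return chosen

sites = np.linspace(0.0, 1.0, 50)        # assumed 1-D candidate locations
K = np.exp(-(sites[:, None] - sites[None, :]) ** 2 / (2 * 0.1 ** 2))
chosen = greedy_sensor_placement(K, k=4)
```

With a stationary kernel the greedy picks spread out across the domain, which is the qualitative behavior one expects to beat random or quasi-random placement when the sensor budget is small.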


预测|估计(10篇)

【1】NeuroMemFPP: A recurrent neural approach for memory-aware parameter estimation in fractional Poisson process
标题:NeuroMemFPP:一种用于分数Poisson过程中记忆感知参数估计的循环神经方法
链接:https://arxiv.org/abs/2512.05893

作者:Neha Gupta,Aditya Maheshwari
备注:12 pages
摘要:在本文中,我们提出了一个基于递归神经网络(RNN)的框架,用于估计分数泊松过程(FPP)的参数,该过程对具有记忆和长程依赖性的事件到达进行建模。长短期记忆(LSTM)网络从到达间隔时间序列中估计关键参数$μ>0$和$β\in(0,1)$,有效地建模它们的时间依赖性。我们对合成数据的实验表明,所提出的方法与传统的矩估计法(MOM)相比将均方误差(MSE)降低了约55.3%,并在不同的训练条件下可靠地运行。我们在两个真实世界的高频数据集上测试了该方法:来自宾夕法尼亚州蒙哥马利县的紧急呼叫记录和AAPL股票交易数据。结果表明,LSTM可以有效地跟踪日常模式和参数变化,表明其对具有复杂时间依赖性的真实数据的有效性。
摘要:In this paper, we propose a recurrent neural network (RNN)-based framework for estimating the parameters of the fractional Poisson process (FPP), which models event arrivals with memory and long-range dependence. The Long Short-Term Memory (LSTM) network estimates the key parameters $μ>0$ and $β\in(0,1)$ from sequences of inter-arrival times, effectively modeling their temporal dependencies. Our experiments on synthetic data show that the proposed approach reduces the mean squared error (MSE) by about 55.3\% compared to the traditional method of moments (MOM) and performs reliably across different training conditions. We tested the method on two real-world high-frequency datasets: emergency call records from Montgomery County, PA, and AAPL stock trading data. The results show that the LSTM can effectively track daily patterns and parameter changes, indicating its effectiveness on real-world data with complex time dependencies.
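For readers who want synthetic FPP data to experiment with, inter-arrival times can be sampled from the Mittag-Leffler distribution via the standard inverse-transform representation from the fractional-Poisson literature. It is an assumption that this matches the authors' data-generation setup; note that for $β = 1$ the recipe reduces to the exponential inter-arrivals of an ordinary Poisson process.

```python
import numpy as np

def ml_interarrivals(mu, beta, size, seed=0):
    """Sample Mittag-Leffler inter-arrival times of a fractional Poisson
    process via the Kozubowski inverse-transform representation:
        T = -ln(U) / mu**(1/beta) * (sin(b*pi)/tan(b*pi*V) - cos(b*pi))**(1/beta)
    with U, V ~ Uniform(0, 1); for beta = 1 this is Exp(mu)."""
    rng = np.random.default_rng(seed)
    u, v = rng.uniform(size=size), rng.uniform(size=size)
    if beta == 1.0:
        return -np.log(u) / mu
    factor = (np.sin(beta * np.pi) / np.tan(beta * np.pi * v)
              - np.cos(beta * np.pi)) ** (1.0 / beta)
    return -np.log(u) / mu ** (1.0 / beta) * factor

samples = ml_interarrivals(mu=2.0, beta=1.0, size=200_000, seed=3)   # sanity case
heavy = ml_interarrivals(mu=1.0, beta=0.7, size=1_000, seed=4)       # memory case
```

The $β < 1$ samples are heavy-tailed (infinite mean), which is exactly the long-range dependence that makes moment-based estimation fragile and motivates the sequence model in the paper.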


【2】Predicting Price Movements in High-Frequency Financial Data with Spiking Neural Networks
标题:利用尖峰神经网络预测高频金融数据中的价格变动
链接:https://arxiv.org/abs/2512.05868

作者:Brian Ezinwoke,Oliver Rhodes
备注:9 pages, 5 figures, 8 tables
摘要:现代高频交易(HFT)环境的特点是突然的价格飙升,同时蕴含风险和机会,但传统的金融模型往往无法捕捉所需的精细时间结构。尖峰神经网络(SNN)因其处理离散事件和保持毫秒级时序的天然能力,提供了一个非常适合这些挑战的受生物启发的框架。本文研究了SNN在高频价格尖峰预测中的应用,通过贝叶斯优化(BO)的鲁棒超参数调整来提高性能。这项工作将高频股票数据转换为脉冲序列(spike trains),并评估三种架构:一个已有的无监督STDP训练的SNN、一种带有显式抑制竞争的新型SNN,以及一个监督反向传播网络。BO由一个新目标驱动,即惩罚性尖峰准确率(PSA),旨在确保网络预测的价格尖峰率与价格事件的经验发生率保持一致。模拟交易表明,使用PSA优化的模型始终优于按尖峰准确率(SA)调整的对应模型和基线。具体而言,采用PSA的扩展SNN模型在简单回测中实现了最高的累积回报率(76.8%),显著超过了监督替代方案(42.54%的回报率)。这些结果验证了尖峰网络在用特定任务目标进行稳健调参时,对高频交易中价格尖峰预测的有效性。
摘要:Modern high-frequency trading (HFT) environments are characterized by sudden price spikes that present both risk and opportunity, but conventional financial models often fail to capture the required fine temporal structure. Spiking Neural Networks (SNNs) offer a biologically inspired framework well-suited to these challenges due to their natural ability to process discrete events and preserve millisecond-scale timing. This work investigates the application of SNNs to high-frequency price-spike forecasting, enhancing performance via robust hyperparameter tuning with Bayesian Optimization (BO). This work converts high-frequency stock data into spike trains and evaluates three architectures: an established unsupervised STDP-trained SNN, a novel SNN with explicit inhibitory competition, and a supervised backpropagation network. BO was driven by a novel objective, Penalized Spike Accuracy (PSA), designed to ensure a network's predicted price spike rate aligns with the empirical rate of price events. Simulated trading demonstrated that models optimized with PSA consistently outperformed their Spike Accuracy (SA)-tuned counterparts and baselines. Specifically, the extended SNN model with PSA achieved the highest cumulative return (76.8%) in simple backtesting, significantly surpassing the supervised alternative (42.54% return). These results validate the potential of spiking networks, when robustly tuned with task-specific objectives, for effective price spike forecasting in HFT.
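The conversion of a continuous signal into a spike train can be sketched with a single leaky integrate-and-fire neuron. The time constant, threshold, and reset rule below are illustrative assumptions, not the paper's encoding; the point is that stronger input drives the membrane potential over threshold more often, producing a higher spike rate.

```python
import numpy as np

def lif_encode(signal, tau=10.0, threshold=1.0, dt=1.0):
    """Encode a non-negative input signal as a spike train with one leaky
    integrate-and-fire neuron: v' = (-v + I)/tau, spike and reset when v
    crosses threshold. A minimal sketch of rate coding for SNN inputs."""
    v = 0.0
    spikes = np.zeros(len(signal), dtype=int)
    for t, current in enumerate(signal):
        v += dt * (-v + current) / tau
        if v >= threshold:
            spikes[t] = 1
            v = 0.0                      # reset after firing
    return spikes

quiet = lif_encode(np.full(200, 0.5))    # sub-threshold drive: no spikes
burst = lif_encode(np.full(200, 5.0))    # strong drive: regular spiking
```

A weak constant input saturates below threshold and never fires, while a strong input fires repeatedly, which is the basic mechanism that lets spike timing carry information about sudden price moves.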


【3】IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models?
标题:IdealTSF:非理想数据能否有助于提高时间序列预测模型的性能?
链接:https://arxiv.org/abs/2512.05442

作者:Hua Wang,Jinghao Lu,Fan Zhang
备注:Accepted at AAAI 2026
摘要:深度学习在时间序列预测任务中表现出强大的性能。然而,序列数据中的缺失值和异常等问题阻碍了其在预测任务中的进一步发展。以前的研究主要集中在从序列数据中提取特征信息或将这些次优数据作为知识转移的正样本。更有效的方法是利用这些非理想的负样本来增强事件预测。作为回应,本研究强调了非理想负样本的优势,并提出了IdealTSF框架,它集成了理想的正样本和负样本的时间序列预测。IdealTSF由三个渐进步骤组成:预训练,训练和优化。该方法首先从负样本数据中提取知识对模型进行预训练,然后在训练过程中将序列数据转化为理想的正样本。此外,应用了具有对抗干扰的负优化机制。大量的实验表明,负样本数据解锁时间序列预测的基本注意力架构内的显着潜力。因此,IdealTSF特别适合具有噪声样本或低质量数据的应用。
摘要:Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in prediction tasks. Previous research has primarily focused on extracting feature information from sequence data or addressing these suboptimal data as positive samples for knowledge transfer. A more effective approach would be to leverage these non-ideal negative samples to enhance event prediction. In response, this study highlights the advantages of non-ideal negative samples and proposes the IdealTSF framework, which integrates both ideal positive and negative samples for time series forecasting. IdealTSF consists of three progressive steps: pretraining, training, and optimization. It first pretrains the model by extracting knowledge from negative sample data, then transforms the sequence data into ideal positive samples during training. Additionally, a negative optimization mechanism with adversarial disturbances is applied. Extensive experiments demonstrate that negative sample data unlocks significant potential within the basic attention architecture for time series forecasting. Therefore, IdealTSF is particularly well-suited for applications with noisy samples or low-quality data.


【4】Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction
标题:智能定时挖矿:比特币硬件ROI预测的深度学习框架
链接:https://arxiv.org/abs/2512.05402

作者:Sithumi Wickramasinghe,Bikramjit Das,Dorien Herremans
摘要:由于市场波动、技术快速过时和协议驱动的收入周期,比特币挖矿硬件的购置需要战略性的时机选择。尽管挖矿已经发展成为一个资本密集型行业,但关于何时购买新的专用集成电路(ASIC)硬件的指导很少,并且没有以前的计算框架解决这个决策问题。我们通过将硬件购置表述为时间序列分类任务来解决这一空白,预测购买ASIC矿机在一年内将产生盈利(投资回报率(ROI)>= 1)、边际(0 < ROI < 1)还是不盈利(ROI <= 0)的回报。我们提出了MineROI-Net,这是一个基于Transformer的开源架构,旨在捕获挖矿盈利能力的多尺度时间模式。根据2015年至2024年期间发布的20款ASIC矿机在不同市场制度下的数据进行评估,MineROI-Net的表现优于基于LSTM和TSLANet的基线,准确率达到83.7%,宏观F1得分达到83.1%。该模型表现出很强的经济相关性,在检测不盈利时段时达到93.6%的精确率,在检测盈利时段时达到98.5%的精确率,同时避免了将盈利情景误分类为不盈利情景,反之亦然。这些结果表明,MineROI-Net提供了一种实用的数据驱动工具,用于确定挖矿硬件的购置时机,从而可能降低资本密集型挖矿业务的财务风险。该模型可通过 https://github.com/AMAAI-Lab/MineROI-Net 获得。
摘要:Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a capital-intensive industry, there is little guidance on when to purchase new Application-Specific Integrated Circuit (ASIC) hardware, and no prior computational frameworks address this decision problem. We address this gap by formulating hardware acquisition as a time series classification task, predicting whether purchasing ASIC machines yields profitable (Return on Investment (ROI) >= 1), marginal (0 < ROI < 1), or unprofitable (ROI <= 0) returns within one year. We propose MineROI-Net, an open source Transformer-based architecture designed to capture multi-scale temporal patterns in mining profitability. Evaluated on data from 20 ASIC miners released between 2015 and 2024 across diverse market regimes, MineROI-Net outperforms LSTM-based and TSLANet baselines, achieving 83.7% accuracy and 83.1% macro F1-score. The model demonstrates strong economic relevance, achieving 93.6% precision in detecting unprofitable periods and 98.5% precision for profitable ones, while avoiding misclassification of profitable scenarios as unprofitable and vice versa. These results indicate that MineROI-Net offers a practical, data-driven tool for timing mining hardware acquisitions, potentially reducing financial risk in capital-intensive mining operations. The model is available through: https://github.com/AMAAI-Lab/MineROI-Net.
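The three-way ROI labeling used as the prediction target can be written down directly. The inputs below (daily revenue and power cost) are illustrative assumptions, not the paper's feature set; they only show how a purchase scenario maps onto the profitable / marginal / unprofitable classes.

```python
def roi_label(revenue_per_day, power_cost_per_day, hardware_price,
              horizon_days=365):
    """Map a hypothetical ASIC purchase onto the paper's three classes:
    ROI >= 1 profitable, 0 < ROI < 1 marginal, ROI <= 0 unprofitable,
    over a one-year horizon."""
    net = (revenue_per_day - power_cost_per_day) * horizon_days
    roi = net / hardware_price
    if roi >= 1:
        return "profitable", roi
    return ("marginal", roi) if roi > 0 else ("unprofitable", roi)

label, roi = roi_label(30.0, 10.0, hardware_price=3650.0)
```

In this made-up scenario the machine nets 20 units/day, recovering twice its price within the year, so it lands in the profitable class.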


【5】Text Rationalization for Robust Causal Effect Estimation
标题:用于稳健因果效应估计的文本合理化
链接:https://arxiv.org/abs/2512.05373

作者:Lijinghua Zhang,Hengrui Cai
摘要:自然语言处理的最新进展使文本数据在因果推理中的使用越来越多,特别是在治疗效应估计中调整混杂因素。虽然高维文本可以编码丰富的上下文信息,但它也对因果识别和估计提出了独特的挑战。特别是,当海量文本在特征空间中表示时,要求在混杂变量取值之间有足够治疗重叠的正性假设通常在观测层面被违反。冗余或虚假的文本特征会抬高维度,产生极端的倾向得分、不稳定的权重和膨胀的效应估计方差。我们用混杂感知令牌合理化(CATR)应对这些挑战:该框架使用残差独立性诊断选择一个稀疏且必要的令牌子集,旨在保留足以满足无混杂性的混杂信息。通过丢弃不相关的文本同时保留关键信号,CATR减轻了观测层面的正性违反,并稳定了下游因果效应估计器。对合成数据的实验和使用MIMIC-III数据库的真实世界研究表明,CATR比现有基线产生更准确、稳定和可解释的因果效应估计。
摘要 :Recent advances in natural language processing have enabled the increasing use of text data in causal inference, particularly for adjusting confounding factors in treatment effect estimation. Although high-dimensional text can encode rich contextual information, it also poses unique challenges for causal identification and estimation. In particular, the positivity assumption, which requires sufficient treatment overlap across confounder values, is often violated at the observational level, when massive text is represented in feature spaces. Redundant or spurious textual features inflate dimensionality, producing extreme propensity scores, unstable weights, and inflated variance in effect estimates. We address these challenges with Confounding-Aware Token Rationalization (CATR), a framework that selects a sparse necessary subset of tokens using a residual-independence diagnostic designed to preserve confounding information sufficient for unconfoundedness. By discarding irrelevant texts while retaining key signals, CATR mitigates observational-level positivity violations and stabilizes downstream causal effect estimators. Experiments on synthetic data and a real-world study using the MIMIC-III database demonstrate that CATR yields more accurate, stable, and interpretable causal effect estimates than existing baselines.


【6】Enhancing Dimensionality Prediction in Hybrid Metal Halides via Feature Engineering and Class-Imbalance Mitigation
标题:通过特征工程和类别不平衡缓解增强混合金属卤化物的维度预测
链接:https://arxiv.org/abs/2512.05367

作者:Mariia Karabin,Isaac Armstrong,Leo Beck,Paulina Apanel,Markus Eisenbach,David B. Mitzi,Hanna Terletska,Hendrik Heinz
摘要:我们提出了一个机器学习框架,用于预测混合金属卤化物(HMH)(包括有机-无机钙钛矿)的结构维度,结合了化学信息特征工程和先进的类别不平衡处理技术。该数据集由494个HMH结构组成,在维度类别(0D、1D、2D、3D)之间高度不平衡,对预测建模提出了重大挑战。随后通过合成少数类过采样技术(SMOTE)将该数据集扩充至1336个,以减轻类别不平衡的影响。我们开发了基于交互的描述符,并将其集成到多阶段工作流中,该工作流结合了特征选择、模型堆叠和性能优化,以提高维度预测的准确性。我们的方法显著提高了代表性不足类别的F1分数,在所有维度上实现了稳健的交叉验证性能。
摘要:We present a machine learning framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, using a combination of chemically-informed feature engineering and advanced class-imbalance handling techniques. The dataset, consisting of 494 HMH structures, is highly imbalanced across dimensionality classes (0D, 1D, 2D, 3D), posing significant challenges to predictive modeling. This dataset was later augmented to 1336 via the Synthetic Minority Oversampling Technique (SMOTE) to mitigate the effects of the class imbalance. We developed interaction-based descriptors and integrated them into a multi-stage workflow that combines feature selection, model stacking, and performance optimization to improve dimensionality prediction accuracy. Our approach significantly improves F1-scores for underrepresented classes, achieving robust cross-validation performance across all dimensionalities.
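The SMOTE augmentation step can be sketched in a few lines: each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority neighbours. This is a minimal version of the technique, not the exact library implementation the authors used.

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic sample lies on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]           # k nearest neighbours per point
    out = np.empty((n_new, X.shape[1]))
    for m in range(n_new):
        i = rng.integers(len(X))
        j = nn[i, rng.integers(k)]
        lam = rng.uniform()
        out[m] = X[i] + lam * (X[j] - X[i])      # convex interpolation
    return out

X = np.random.default_rng(0).normal(size=(20, 3))   # stand-in minority class
new = smote(X, n_new=50)
```

Because every synthetic point is a convex combination of two real minority points, the augmented samples stay inside the bounding box of the original class, densifying it rather than inventing out-of-distribution structures.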


【7】Robustness Test for AI Forecasting of Hurricane Florence Using FourCastNetv2 and Random Perturbations of the Initial Condition
标题:使用FourCastNetv 2和初始条件随机扰动对飓风弗洛伦斯人工智能预测的稳健性测试
链接:https://arxiv.org/abs/2512.05323

作者:Adam Lizerbram,Shane Stevenson,Iman Khadir,Matthew Tu,Samuel S. P. Shen
备注:26 pages, 12 figures
摘要:了解天气预报模型在输入噪声或不同不确定性方面的鲁棒性对于评估其输出可靠性非常重要,特别是对于飓风等极端天气事件。在本文中,我们测试了一个人工智能(AI)天气预报模型:NVIDIA FourCastNetv2(FCNv2)的灵敏度和鲁棒性。我们进行了两个实验,旨在评估模型的输出在不同水平的注入噪声的模型的初始条件。首先,我们用不同数量的高斯噪声扰动欧洲中期天气预报中心(ECMWF)再分析v5(ERA5)数据集(2018年9月13日至16日)的飓风佛罗伦萨的初始条件,并检查对预测轨迹和预测风暴强度的影响。其次,我们以完全随机的初始条件启动FCNv2,并观察模型如何响应无意义的输入。我们的研究结果表明,FCNv2准确地保留低到中等噪声注入下的飓风特征。即使在高水平的噪音下,模型也能保持风暴的总体轨迹和结构,尽管定位精度开始下降。FCNv2始终低估了所有注入噪声水平的风暴强度和持续性。在完全随机的初始条件下,模型在几个时间步后生成平滑和有凝聚力的预测,这意味着模型倾向于稳定,平滑的输出。我们的方法简单,可移植到其他数据驱动的人工智能天气预报模型。
摘要:Understanding the robustness of a weather forecasting model with respect to input noise or different uncertainties is important in assessing its output reliability, particularly for extreme weather events like hurricanes. In this paper, we test sensitivity and robustness of an artificial intelligence (AI) weather forecasting model: NVIDIA's FourCastNetv2 (FCNv2). We conduct two experiments designed to assess model output under different levels of injected noise in the model's initial condition. First, we perturb the initial condition of Hurricane Florence from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5) dataset (September 13-16, 2018) with varying amounts of Gaussian noise and examine the impact on predicted trajectories and forecasted storm intensity. Second, we start FCNv2 with fully random initial conditions and observe how the model responds to nonsensical inputs. Our results indicate that FCNv2 accurately preserves hurricane features under low to moderate noise injection. Even under high levels of noise, the model maintains the general storm trajectory and structure, although positional accuracy begins to degrade. FCNv2 consistently underestimates storm intensity and persistence across all levels of injected noise. With full random initial conditions, the model generates smooth and cohesive forecasts after a few timesteps, implying the model's tendency towards stable, smoothed outputs. Our approach is simple and portable to other data-driven AI weather forecasting models.
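The first experiment's perturbation protocol amounts to adding i.i.d. Gaussian noise of increasing standard deviation to the initial-condition field and measuring how far the perturbed state (and, downstream, the forecast) drifts from the control. A toy sketch of that protocol, with a synthetic 1-D "field" standing in for the ERA5 state; all names and values are illustrative:

```python
import random, math

def perturb(field, sigma, seed=0):
    """Add i.i.d. Gaussian noise with standard deviation sigma to an
    initial-condition field (here a flat list of grid values)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in field]

def rmse(a, b):
    """Root-mean-square difference between two fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# a smooth synthetic field; measure perturbation size at three noise levels
field = [math.sin(0.1 * i) for i in range(100)]
errors = [rmse(field, perturb(field, s)) for s in (0.0, 0.1, 1.0)]
```

In the paper's setting one would feed each perturbed field through FCNv2 and compare trajectories; the sketch only shows the noise-injection and error-measurement scaffolding.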


【8】Robust forecast aggregation via additional queries
标题:通过额外查询进行强大的预测聚合
链接:https://arxiv.org/abs/2512.05271

作者:Rafael Frongillo,Mary Monroe,Eric Neyman,Bo Waggoner
摘要:我们研究稳健预测聚合的问题:与基础信息的最佳可能聚合相比,将专家预测与可证明的准确性保证相结合。先前的工作显示出强烈的不可能性结果,例如,即使在自然假设下,对专家个人预测的任何聚合都无法胜过简单地跟随一个随机专家(Neyman和Roughgarden,2022)。在本文中,我们介绍了一个更一般的框架,允许委托人通过结构化查询从专家那里获得更丰富的信息。我们的框架确保专家将如实报告他们的基本信念,也使我们能够定义关于提出这些查询难度的复杂性概念。在独立但重叠的专家信号的一般模型下,我们证明:在最坏情况下最优聚合是可实现的,且每种复杂性度量都以代理数量 $n$ 为上界。我们进一步建立了准确性和查询复杂性之间的紧密权衡:聚合误差随着查询数量线性下降,并且当"推理顺序"和与查询相关的代理数量为$ω(\sqrt{n})$时,聚合误差消失。这些结果表明,对专家查询空间的适度扩展大大增强了稳健预测聚合的能力。因此,我们希望我们的新查询框架将在这一领域开辟一条富有成效的研究路线。
摘要:We study the problem of robust forecast aggregation: combining expert forecasts with provable accuracy guarantees compared to the best possible aggregation of the underlying information. Prior work shows strong impossibility results, e.g. that even under natural assumptions, no aggregation of the experts' individual forecasts can outperform simply following a random expert (Neyman and Roughgarden, 2022).   In this paper, we introduce a more general framework that allows the principal to elicit richer information from experts through structured queries. Our framework ensures that experts will truthfully report their underlying beliefs, and also enables us to define notions of complexity over the difficulty of asking these queries. Under a general model of independent but overlapping expert signals, we show that optimal aggregation is achievable in the worst case with each complexity measure bounded above by the number of agents $n$. We further establish tight tradeoffs between accuracy and query complexity: aggregation error decreases linearly with the number of queries, and vanishes when the "order of reasoning" and number of agents relevant to a query is $ω(\sqrt{n})$. These results demonstrate that modest extensions to the space of expert queries dramatically strengthen the power of robust forecast aggregation. We therefore expect that our new query framework will open up a fruitful line of research in this area.


【9】Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches
标题:高斯过程预测分布的设计边际校准:Bayesian和保形方法
链接:https://arxiv.org/abs/2512.05611

作者:Aurélien Pion,Emmanuel Vazquez
摘要:我们从设计边际的角度研究插值设置下高斯过程(GP)预测分布的校准。以数据为条件,并在设计测度μ上取平均,我们通过随机化概率积分变换形式化了中心区间的μ-覆盖和μ-概率校准。我们介绍两种方法。cps-gp使用标准化的留一法残差使共形预测系统适应GP插值,产生具有有限样本边缘校准的阶梯式预测分布。bcr-gp保留GP后验均值,并用拟合于交叉验证标准化残差的广义正态模型替换高斯残差。一种贝叶斯选择规则(基于方差的后验上分位数以进行保守预测,或基于交叉后验Kolmogorov-Smirnov准则以实现概率校准)控制离散度和尾部行为,同时产生适合序贯设计的平滑预测分布。基准函数上的数值实验比较了cps-gp、bcr-gp、用于GP的Jackknife+以及完整共形高斯过程,使用校准指标(覆盖率、Kolmogorov-Smirnov、积分绝对误差)以及通过缩放连续排名概率得分衡量的准确性或锐度。
摘要:We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a design measure μ, we formalize μ-coverage for central intervals and μ-probabilistic calibration through randomized probability integral transforms. We introduce two methods. cps-gp adapts conformal predictive systems to GP interpolation using standardized leave-one-out residuals, yielding stepwise predictive distributions with finite-sample marginal calibration. bcr-gp retains the GP posterior mean and replaces the Gaussian residual by a generalized normal model fitted to cross-validated standardized residuals. A Bayesian selection rule, based either on a posterior upper quantile of the variance for conservative prediction or on a cross-posterior Kolmogorov-Smirnov criterion for probabilistic calibration, controls dispersion and tail behavior while producing smooth predictive distributions suitable for sequential design. Numerical experiments on benchmark functions compare cps-gp, bcr-gp, Jackknife+ for GPs, and the full conformal Gaussian process, using calibration metrics (coverage, Kolmogorov-Smirnov, integral absolute error) and accuracy or sharpness through the scaled continuous ranked probability score.
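The finite-sample marginal calibration of cps-gp comes from the generic conformal construction: rank held-out residual scores and read off the appropriate empirical quantile. A stripped-down sketch with plain absolute residuals standing in for the paper's standardized leave-one-out scores (all values are illustrative):

```python
import math

def conformal_interval(calib_residuals, prediction, alpha=0.1):
    """Build a (1 - alpha) conformal prediction interval from held-out
    absolute residuals: a simplified version of the leave-one-out
    construction described above."""
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    # conformal quantile rank: ceil((n + 1) * (1 - alpha)), clipped to n
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1]
    return prediction - q, prediction + q

# calibration residuals from a fitted GP (illustrative numbers)
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 0.25, -0.1, 0.2, 0.12, -0.05]
lo, hi = conformal_interval(residuals, prediction=1.0, alpha=0.2)
```

The `(n + 1)` in the quantile rank is what yields the finite-sample coverage guarantee for an exchangeable test point; the paper's full method additionally standardizes residuals by the GP's predictive standard deviation.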


【10】STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings
标题:STAR-GO:通过学习分层整合本体信息语义嵌入来改善蛋白质功能预测
链接:https://arxiv.org/abs/2512.05245

作者:Mehmet Efe Akça,Gökçe Uludoğan,Arzucan Özgür,İnci M. Baytaş
备注:14 pages, 2 figures, 6 tables
摘要:蛋白质功能的准确预测对于阐明分子机制和推进生物学和治疗发现是必不可少的。然而,实验注释远远落后于蛋白质序列数据的快速增长。计算方法通过将蛋白质与基因本体(GO)术语相关联来解决这一差距,基因本体(GO)术语通过层次关系和文本定义来编码功能知识。然而,现有的模型往往强调一种模态而不是另一种模态,限制了它们的泛化能力,特别是对于随着本体的发展而频繁出现的看不见的或新引入的GO术语,并使先前训练的模型过时。我们提出了STAR-GO,一个基于Transformer的框架,它联合建模GO术语的语义和结构特征,以增强zero-shot蛋白质功能预测。STAR-GO将文本定义与本体图结构相集成,以学习统一的GO表示,这些表示以分层顺序进行处理,以将信息从一般术语传播到特定术语。然后将这些表示与蛋白质序列嵌入进行比对,以捕获序列-功能关系。STAR-GO实现了最先进的性能和卓越的zero-shot泛化,展示了集成语义和结构的实用性,可用于鲁棒和适应性强的蛋白质功能预测。代码可在https://github.com/boun-tabi-lifelu/stargo上获得。
摘要:Accurate prediction of protein function is essential for elucidating molecular mechanisms and advancing biological and therapeutic discovery. Yet experimental annotation lags far behind the rapid growth of protein sequence data. Computational approaches address this gap by associating proteins with Gene Ontology (GO) terms, which encode functional knowledge through hierarchical relations and textual definitions. However, existing models often emphasize one modality over the other, limiting their ability to generalize, particularly to unseen or newly introduced GO terms that frequently arise as the ontology evolves, and making the previously trained models outdated. We present STAR-GO, a Transformer-based framework that jointly models the semantic and structural characteristics of GO terms to enhance zero-shot protein function prediction. STAR-GO integrates textual definitions with ontology graph structure to learn unified GO representations, which are processed in hierarchical order to propagate information from general to specific terms. These representations are then aligned with protein sequence embeddings to capture sequence-function relationships. STAR-GO achieves state-of-the-art performance and superior zero-shot generalization, demonstrating the utility of integrating semantics and structure for robust and adaptable protein function prediction. Code is available at https://github.com/boun-tabi-lifelu/stargo.


其他神经网络|深度学习|模型|建模(20篇)

【1】Impugan: Learning Conditional Generative Models for Robust Data Imputation
标题:Impugan:学习条件生成模型以实现稳健数据插补
链接:https://arxiv.org/abs/2512.05950

作者:Zalish Mahmud,Anantaa Kotal,Aritran Piplai
摘要:不完整的数据在实际应用中很常见。传感器失效,记录不一致,从不同来源收集的数据集在规模、采样率和质量上往往不同。这些差异会造成缺失值,从而难以组合数据并构建可靠的模型。回归模型、期望最大化和多重插补等标准插补方法依赖于关于线性和独立性的强假设。这些假设很少适用于复杂或异质数据,这可能导致有偏或过度平滑的估计。我们提出了Impugan,一个条件生成对抗网络(cGAN),用于估算缺失值和整合异构数据集。该模型在完整的样本上进行训练,以了解缺失变量如何依赖于观察到的变量。在推理过程中,生成器从可用的特征中重建缺失的条目,而判别器通过区分真实数据和估算数据来增强真实性。这种对抗性的过程使Impugan能够捕获传统方法无法表示的非线性和多模态关系。在对基准数据集和多源集成任务的实验中,与领先基线相比,Impugan的地球移动距离(EMD)最多降低82%,互信息偏差(MI)最多降低70%。这些结果表明,对抗性训练的生成模型为估算和合并不完整的异构数据提供了一种可扩展的原则性方法。我们的模型可在github.com/zalishmahmud/impuganBigData2025上获得
摘要:Incomplete data are common in real-world applications. Sensors fail, records are inconsistent, and datasets collected from different sources often differ in scale, sampling rate, and quality. These differences create missing values that make it difficult to combine data and build reliable models. Standard imputation methods such as regression models, expectation-maximization, and multiple imputation rely on strong assumptions about linearity and independence. These assumptions rarely hold for complex or heterogeneous data, which can lead to biased or over-smoothed estimates. We propose Impugan, a conditional Generative Adversarial Network (cGAN) for imputing missing values and integrating heterogeneous datasets. The model is trained on complete samples to learn how missing variables depend on observed ones. During inference, the generator reconstructs missing entries from available features, and the discriminator enforces realism by distinguishing true from imputed data. This adversarial process allows Impugan to capture nonlinear and multimodal relationships that conventional methods cannot represent. In experiments on benchmark datasets and a multi-source integration task, Impugan achieves up to 82\% lower Earth Mover's Distance (EMD) and 70\% lower mutual-information deviation (MI) compared to leading baselines. These results show that adversarially trained generative models provide a scalable and principled approach for imputing and merging incomplete, heterogeneous data. Our model is available at: github.com/zalishmahmud/impuganBigData2025


【2】Developing synthetic microdata through machine learning for firm-level business surveys
标题:通过机器学习开发公司级业务调查的合成微数据
链接:https://arxiv.org/abs/2512.05948

作者:Jorge Cisneros Paz,Timothy Wojan,Matthew Williams,Jennifer Ozawa,Robert Chew,Kimberly Janda,Timothy Navarro,Michael Floyd,Christine Task,Damon Streat
备注:17 pages, 4 figures, 6 tables
摘要:美国人口普查局(US Census Bureau)提供的关于个人的公共使用微观数据样本(Public Use Microdata Samples,简称PUMS)已经存在了几十年。然而,计算能力的大幅提高和大数据的更大可用性大大增加了重新识别匿名数据的可能性,这可能违反了对调查受访者的保密承诺。数据科学工具可用于生成合成数据,这些数据保留经验数据的关键统计矩,但不包含任何现有个人受访者或企业的记录。从调查中开发公共使用的公司数据带来了不同于人口数据的独特挑战,因为缺乏匿名性,并且在每个地理区域可以很容易地识别某些行业。本文简要介绍了一个机器学习模型,用于构建基于年度商业调查(ABS)的合成PUMS,并讨论了各种质量指标。虽然基于ABS的PUMS目前仍在完善中且结果保密,我们展示了为2007年企业主调查开发的两个合成PUMS,其数据类似于ABS的商业数据。对发表在《小企业经济学》上的一项高影响力分析的计量经济学复制证明了合成数据与真实数据的逼真性,并引出了对可能的ABS用例的讨论。
摘要:Public-use microdata samples (PUMS) from the United States (US) Census Bureau on individuals have been available for decades. However, large increases in computing power and the greater availability of Big Data have dramatically increased the probability of re-identifying anonymized data, potentially violating the pledge of confidentiality given to survey respondents. Data science tools can be used to produce synthetic data that preserve critical moments of the empirical data but do not contain the records of any existing individual respondent or business. Developing public-use firm data from surveys presents unique challenges different from demographic data, because there is a lack of anonymity and certain industries can be easily identified in each geographic area. This paper briefly describes a machine learning model used to construct a synthetic PUMS based on the Annual Business Survey (ABS) and discusses various quality metrics. Although the ABS PUMS is currently being refined and results are confidential, we present two synthetic PUMS developed for the 2007 Survey of Business Owners, similar to the ABS business data. Econometric replication of a high impact analysis published in Small Business Economics demonstrates the verisimilitude of the synthetic data to the true data and motivates discussion of possible ABS use cases.


【3】LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction
标题:LDLT $\mathcal{L}$-Lipschitz网络:广义深度端到端Lipschitz网络构建
链接:https://arxiv.org/abs/2512.05915

作者:Marius F. R. Juston,Ramavarapu S. Sreenivas,Dustin Nottage,Ahmet Soylemezoglu
备注:39 pages, 3 figures, 12 tables
摘要:深度残差网络(ResNets)在计算机视觉任务中取得了巨大的成功,这归功于它们在深度架构中保持梯度流的能力。同时,控制神经网络中的Lipschitz常数已成为增强对抗鲁棒性和网络可认证性的重要研究领域。本文提出了一种在线性矩阵不等式(LMI)框架下对$\mathcal{L}$-Lipschitz深度残差网络进行一般设计的严格方法。最初,ResNet架构被重新表述为循环三对角LMI,并推导出对网络参数的封闭形式约束以确保$\mathcal{L}$-Lipschitz连续性;进一步,借助用于证明LMI可行性的新型$LDL^\top$分解方法,我们将$\mathcal{L}$-Lipschitz网络的构建扩展到任何其他非线性架构。我们的贡献包括一个可证明的参数化方法,用于构建Lipschitz约束残差网络和其他层次结构。Cholesky分解也用于有效的参数化。这些发现使鲁棒网络设计适用于对抗鲁棒性、认证训练和控制系统。$LDL^\top$公式被证明是基于SDP的网络的紧松弛,保持充分的表达能力,并在121个UCI数据集上相比SLL层实现了3\%-13\%的准确性增益。
摘要:Deep residual networks (ResNets) have demonstrated outstanding success in computer vision tasks, attributed to their ability to maintain gradient flow through deep architectures. Simultaneously, controlling the Lipschitz constant in neural networks has emerged as an essential area of research to enhance adversarial robustness and network certifiability. This paper presents a rigorous approach to the general design of $\mathcal{L}$-Lipschitz deep residual networks using a Linear Matrix Inequality (LMI) framework. Initially, the ResNet architecture was reformulated as a cyclic tridiagonal LMI, and closed-form constraints on network parameters were derived to ensure $\mathcal{L}$-Lipschitz continuity; however, using a new $LDL^\top$ decomposition approach for certifying LMI feasibility, we extend the construction of $\mathcal{L}$-Lipschitz networks to any other nonlinear architecture. Our contributions include a provable parameterization methodology for constructing Lipschitz-constrained residual networks and other hierarchical architectures. Cholesky decomposition is also used for efficient parameterization. These findings enable robust network designs applicable to adversarial robustness, certified training, and control systems. The $LDL^\top$ formulation is shown to be a tight relaxation of the SDP-based network, maintaining full expressiveness and achieving 3\%-13\% accuracy gains over SLL Layers on 121 UCI data sets.
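The $LDL^\top$ feasibility certificate rests on a standard linear-algebra fact: a symmetric matrix is positive definite exactly when its $LDL^\top$ factorization produces all-positive diagonal entries $d_i$, which can be checked without eigenvalue computations. A small self-contained sketch on a tridiagonal example (the matrix is illustrative, not one of the paper's LMIs):

```python
def ldl_decompose(A):
    """LDL^T factorization of a symmetric matrix A (list of lists).
    Returns (L, d) with A = L diag(d) L^T; all d[i] > 0 iff A is
    positive definite, which is the feasibility test for the LMI."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return L, d

# a small symmetric positive-definite tridiagonal example
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
L, d = ldl_decompose(A)
is_feasible = all(x > 0 for x in d)
```

For tridiagonal matrices like the paper's cyclic LMI reformulation, each column of `L` has only one sub-diagonal entry, so the check runs in linear time.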


【4】DAE-HardNet: A Physics Constrained Neural Network Enforcing Differential-Algebraic Hard Constraints
标题:DAE-HardNet:一个物理约束的微分代数硬约束神经网络
链接:https://arxiv.org/abs/2512.05881

作者:Rahul Golder,Bimol Nath Roy,M. M. Faruque Hasan
摘要:传统的物理信息神经网络(PINN)并不总是满足基于物理的约束,特别是当约束包含微分算子时。相反,它们以软方式最小化约束违反。在数据驱动模型中严格满足微分代数方程(DAE)以嵌入领域知识和第一性原理通常具有挑战性。这是因为数据驱动模型将原始函数视为黑盒,其导数只能在对函数求值后才能获得。我们介绍了DAE-HardNet,这是一种物理约束(而不仅仅是物理信息)神经网络,它可以同时学习函数及其导数,同时强制执行代数和微分约束。这是通过使用可微投影层将模型预测投影到约束流形上来完成的。我们将DAE-HardNet应用于由DAE控制的若干系统和测试问题,包括动态Lotka-Volterra捕食系统和瞬态热传导。我们还展示了DAE-HardNet通过参数估计问题估计未知参数的能力。与多层感知器(MLP)和PINN相比,DAE-HardNet在保持预测准确性的同时实现了物理损失数量级的降低。它还具有学习导数的额外好处,这改善了投影层之前骨干神经网络的约束学习。对于特定的问题,这表明可以绕过投影层以实现更快的推理。当前的实现和代码可在https://github.com/SOULS-TAMU/DAE-HardNet上获得。
摘要:Traditional physics-informed neural networks (PINNs) do not always satisfy physics based constraints, especially when the constraints include differential operators. Rather, they minimize the constraint violations in a soft way. Strict satisfaction of differential-algebraic equations (DAEs) to embed domain knowledge and first-principles in data-driven models is generally challenging. This is because data-driven models consider the original functions to be black-box whose derivatives can only be obtained after evaluating the functions. We introduce DAE-HardNet, a physics-constrained (rather than simply physics-informed) neural network that learns both the functions and their derivatives simultaneously, while enforcing algebraic as well as differential constraints. This is done by projecting model predictions onto the constraint manifold using a differentiable projection layer. We apply DAE-HardNet to several systems and test problems governed by DAEs, including the dynamic Lotka-Volterra predator-prey system and transient heat conduction. We also show the ability of DAE-HardNet to estimate unknown parameters through a parameter estimation problem. Compared to multilayer perceptrons (MLPs) and PINNs, DAE-HardNet achieves orders of magnitude reduction in the physics loss while maintaining the prediction accuracy. It has the added benefits of learning the derivatives which improves the constrained learning of the backbone neural network prior to the projection layer. For specific problems, this suggests that the projection layer can be bypassed for faster inference. The current implementation and codes are available at https://github.com/SOULS-TAMU/DAE-HardNet.
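The projection-layer idea is easiest to see for a single linear algebraic constraint, where the orthogonal projection onto the constraint set has a closed form; the paper's differentiable layer generalizes this to full DAE constraints. A toy sketch (the constraint and values are illustrative, not taken from the paper):

```python
def project_onto_constraint(y, a, b):
    """Orthogonally project y onto the affine set {y : a . y = b}.
    A toy stand-in for the differentiable projection layer; the
    actual method handles general differential-algebraic constraints."""
    dot_ay = sum(ai * yi for ai, yi in zip(a, y))
    dot_aa = sum(ai * ai for ai in a)
    lam = (dot_ay - b) / dot_aa
    return [yi - lam * ai for yi, ai in zip(y, a)]

# enforce a conservation-style constraint: components must sum to 1
y_pred = [0.5, 0.4, 0.3]          # raw network output (violates sum = 1)
y_proj = project_onto_constraint(y_pred, a=[1.0, 1.0, 1.0], b=1.0)
```

Because the projection is an explicit differentiable formula, gradients flow through it during training, which is what lets the backbone network learn under the hard constraint.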


【5】Computational Design of Low-Volatility Lubricants for Space Using Interpretable Machine Learning
标题:使用可解释机器学习计算设计低挥发性太空润滑剂
链接:https://arxiv.org/abs/2512.05870

作者:Daniel Miliate,Ashlie Martini
摘要:空间运动机械组件(MMA)的功能和寿命取决于润滑剂的性能。经历高速或高循环的MMA需要液体基润滑剂,因为它们能够回流到接触点。然而,只有少数液体基润滑剂具有足够低的蒸气压以用于空间的真空条件,其中每一种都具有限制,这增加了MMA设计的约束。这项工作介绍了一种数据驱动的机器学习(ML)方法来预测蒸汽压,从而能够虚拟筛选和发现新的适用于太空的液体润滑剂。ML模型使用来自高通量分子动力学模拟和实验数据库的数据进行训练。这些模型的设计优先考虑可解释性,使化学结构和蒸汽压之间的关系被确定。基于这些见解,提出了几个候选分子,可能有希望在未来的空间润滑剂应用在MMA。
摘要:The function and lifetime of moving mechanical assemblies (MMAs) in space depend on the properties of lubricants. MMAs that experience high speeds or high cycles require liquid based lubricants due to their ability to reflow to the point of contact. However, only a few liquid-based lubricants have vapor pressures low enough for the vacuum conditions of space, each of which has limitations that add constraints to MMA designs. This work introduces a data-driven machine learning (ML) approach to predicting vapor pressure, enabling virtual screening and discovery of new space-suitable liquid lubricants. The ML models are trained with data from both high-throughput molecular dynamics simulations and experimental databases. The models are designed to prioritize interpretability, enabling the relationships between chemical structure and vapor pressure to be identified. Based on these insights, several candidate molecules are proposed that may have promise for future space lubricant applications in MMAs.


【6】Learnability Window in Gated Recurrent Neural Networks
标题:门控回归神经网络的可学习窗口
链接:https://arxiv.org/abs/2512.05790

作者:Lorenzo Livi
摘要:我们开发了一个理论框架,解释门控机制如何决定循环神经网络的可学习性窗口$\mathcal{H}_N$,其定义为梯度信息在统计上仍可恢复的最大时间范围。虽然经典分析强调雅可比积的数值稳定性,但我们表明仅有稳定性是不够的:可学习性实际上由有效学习率 $μ_{t,\ell}$ 决定,这是从时间反向传播中门诱导的雅可比积的一阶展开得到的每滞后、每神经元的量。这些有效学习率充当乘法滤波器,控制梯度传输的幅度和各向异性。在重尾($α$-稳定)梯度噪声下,我们证明了在滞后$\ell$处检测依赖关系所需的最小样本量满足$N(\ell)\propto f(\ell)^{-α}$,其中$f(\ell)=\|μ_{t,\ell}\|_1$是有效学习率包络。这导出了$\mathcal{H}_N$的显式公式以及$f(\ell)$按对数、多项式和指数衰减时的封闭形式标度律。该理论预测,更宽或更异质的门谱产生更慢的$f(\ell)$衰减,因此可学习性窗口更大,而更重尾的噪声通过减慢统计集中来压缩$\mathcal{H}_N$。通过连接门诱导的时间尺度结构、梯度噪声和样本复杂性,该框架将有效学习率确定为控制门控递归网络何时以及在多长时间内可以学习长期时间依赖性的基本量。
摘要:We develop a theoretical framework that explains how gating mechanisms determine the learnability window $\mathcal{H}_N$ of recurrent neural networks, defined as the largest temporal horizon over which gradient information remains statistically recoverable. While classical analyses emphasize numerical stability of Jacobian products, we show that stability alone is insufficient: learnability is governed instead by the effective learning rates $μ_{t,\ell}$, per-lag and per-neuron quantities obtained from first-order expansions of gate-induced Jacobian products in Backpropagation Through Time. These effective learning rates act as multiplicative filters that control both the magnitude and anisotropy of gradient transport. Under heavy-tailed ($α$-stable) gradient noise, we prove that the minimal sample size required to detect a dependency at lag $\ell$ satisfies $N(\ell)\propto f(\ell)^{-α}$, where $f(\ell)=\|μ_{t,\ell}\|_1$ is the effective learning rate envelope. This leads to an explicit formula for $\mathcal{H}_N$ and closed-form scaling laws for logarithmic, polynomial, and exponential decay of $f(\ell)$. The theory predicts that broader or more heterogeneous gate spectra produce slower decay of $f(\ell)$ and hence larger learnability windows, whereas heavier-tailed noise compresses $\mathcal{H}_N$ by slowing statistical concentration. By linking gate-induced time-scale structure, gradient noise, and sample complexity, the framework identifies the effective learning rates as the fundamental quantities that govern when, and for how long, gated recurrent networks can learn long-range temporal dependencies.
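The scaling law $N(\ell)\propto f(\ell)^{-α}$ implies a concrete recipe for the learnability window: $\mathcal{H}_N$ is the largest lag whose sample requirement is still covered by the available $N$. A sketch under an assumed exponential envelope; the constant $C$ and the decay rate are illustrative, since the paper's bound fixes them only up to constants:

```python
import math

def learnability_window(N, f, alpha, C=1.0, max_lag=10_000):
    """Largest lag ell with N >= C * f(ell)**(-alpha), i.e. the horizon
    H_N over which lag-ell dependencies remain statistically
    recoverable under the scaling law N(ell) ~ f(ell)**(-alpha)."""
    H = 0
    for ell in range(1, max_lag + 1):
        if N >= C * f(ell) ** (-alpha):
            H = ell
        else:
            break
    return H

# assumed exponential decay of the effective-learning-rate envelope
f_exp = lambda ell: math.exp(-0.5 * ell)
H_small = learnability_window(N=1_000, f=f_exp, alpha=1.5)
H_large = learnability_window(N=1_000_000, f=f_exp, alpha=1.5)
```

Under exponential envelope decay the window grows only logarithmically in $N$, which matches the qualitative prediction that heavier noise tails (larger required $N$ per lag) compress $\mathcal{H}_N$.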


【7】Towards agent-based-model informed neural networks
标题:迈向基于代理模型的神经网络
链接:https://arxiv.org/abs/2512.05764

作者:Nino Antulov-Fantulin
摘要:在本文中,我们提出了一个框架,用于设计与基于代理的模型的基本原则保持一致的神经网络。我们首先强调了标准神经微分方程在复杂系统建模中的局限性,在复杂系统中,物理不变量(如能量)往往不存在,但其他约束(如质量守恒,网络局部性,有限理性)必须强制执行。为了解决这个问题,我们引入了基于代理的模型通知神经网络(ABM-NN),它利用受限图神经网络和层次分解来学习可解释的,结构保持的动态。我们通过三个复杂性不断增加的案例研究验证了该框架:(i)广义Lotka-Volterra系统,其中我们在存在干预的情况下从短轨迹中恢复地面实况参数;(ii)基于图形的SIR传染模型,其中我们的方法优于最先进的图形学习基线(GCN,GraphSAGE,Graph Transformer)在样本外预测和噪声鲁棒性方面的应用;以及(iii)十大经济体的真实宏观经济模型,在那里,我们从经验数据中学习耦合的GDP动态,并展示基于梯度的政策干预反事实分析。
摘要:In this article, we present a framework for designing neural networks that remain consistent with the underlying principles of agent-based models. We begin by highlighting the limitations of standard neural differential equations in modeling complex systems, where physical invariants (like energy) are often absent but other constraints (like mass conservation, network locality, bounded rationality) must be enforced. To address this, we introduce Agent-Based-Model informed Neural Networks (ABM-NNs), which leverage restricted graph neural networks and hierarchical decomposition to learn interpretable, structure-preserving dynamics. We validate the framework across three case studies of increasing complexity: (i) a generalized Lotka-Volterra system, where we recover ground-truth parameters from short trajectories in presence of interventions; (ii) a graph-based SIR contagion model, where our method outperforms state-of-the-art graph learning baselines (GCN, GraphSAGE, Graph Transformer) in out-of-sample forecasting and noise robustness; and (iii) a real-world macroeconomic model of the ten largest economies, where we learn coupled GDP dynamics from empirical data and demonstrate gradient-based counterfactual analysis for policy interventions.
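The first case study's forward model, the generalized Lotka-Volterra system $\dot x_i = x_i\,(r_i + \sum_j A_{ij} x_j)$, can be simulated in a few lines; parameter recovery then amounts to fitting $r$ and $A$ to such trajectories. A minimal explicit-Euler sketch with illustrative two-species predator-prey parameters (not the paper's setup):

```python
def glv_step(x, r, A, dt):
    """One explicit-Euler step of the generalized Lotka-Volterra system
    dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    return [
        xi + dt * xi * (ri + sum(aij * xj for aij, xj in zip(Ai, x)))
        for xi, ri, Ai in zip(x, r, A)
    ]

# two species: prey grows intrinsically, predator decays; antisymmetric coupling
r = [1.0, -1.0]
A = [[0.0, -0.5],
     [0.5, 0.0]]
x = [1.0, 1.0]
for _ in range(100):           # integrate to t = 1 with dt = 0.01
    x = glv_step(x, r, A, dt=0.01)
```

Starting below the coexistence equilibrium at (2, 2), the prey population initially rises while the predator population falls, the qualitative oscillation the short-trajectory fitting in the paper would exploit.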


【8】Improving Local Fidelity Through Sampling and Modeling Nonlinearity
标题:通过采样和非线性建模提高局部保真度
链接:https://arxiv.org/abs/2512.05556

作者:Sanjeev Shrestha,Rahul Dubey,Hui Liu
摘要:随着黑盒机器学习模型的复杂性越来越高,以及它们在高风险领域的采用,为它们的预测提供解释至关重要。局部可解释模型不可知解释(LIME)是一种广泛使用的技术,它通过在预测实例周围局部学习可解释模型来解释任何分类器的预测。然而,它假设局部决策边界是线性的,无法捕捉非线性关系,导致不正确的解释。在本文中,我们提出了一种新的方法,可以产生高保真的解释。多变量自适应回归样条(MARS)被用来模拟非线性局部边界,有效地捕捉参考模型的基本行为,从而提高解释的局部保真度。此外,我们还利用了N-ball采样技术,该技术直接从所需的分布中采样,而不是像LIME中那样对样本进行重新加权,从而进一步提高了忠实度得分。我们在三个UCI数据集上评估了我们的方法,这些数据集跨越不同的分类器和不同的核宽度。实验结果表明,与基线相比,我们的方法产生更忠实的解释,均方根误差平均降低37%,显著提高了局部保真度。
摘要:With the increasing complexity of black-box machine learning models and their adoption in high-stakes areas, it is critical to provide explanations for their predictions. Local Interpretable Model-agnostic Explanation (LIME) is a widely used technique that explains the prediction of any classifier by learning an interpretable model locally around the predicted instance. However, it assumes that the local decision boundary is linear and fails to capture the non-linear relationships, leading to incorrect explanations. In this paper, we propose a novel method that can generate high-fidelity explanations. Multivariate adaptive regression splines (MARS) is used to model non-linear local boundaries that effectively captures the underlying behavior of the reference model, thereby enhancing the local fidelity of the explanation. Additionally, we utilize the N-ball sampling technique, which samples directly from the desired distribution instead of reweighting samples as done in LIME, further improving the faithfulness score. We evaluate our method on three UCI datasets across different classifiers and varying kernel widths. Experimental results show that our method yields more faithful explanations compared to baselines, achieving an average reduction of 37% in root mean square error, significantly improving local fidelity.
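The N-ball sampling trick referenced above draws perturbations uniformly from a ball around the instance instead of reweighting Gaussian samples: draw a Gaussian direction, normalize it, then scale the radius by $U^{1/d}$ so volume is covered uniformly. A self-contained sketch (the center and radius are illustrative):

```python
import math, random

def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the d-ball around `center`:
    Gaussian direction, normalized, scaled by radius * U**(1/d)."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    scale = radius * rng.random() ** (1.0 / d) / norm
    return [c + scale * v for c, v in zip(center, g)]

rng = random.Random(0)
pts = [sample_in_ball([0.0, 0.0, 0.0], 2.0, rng) for _ in range(500)]
```

The $U^{1/d}$ radial correction matters: without it most samples would cluster near the center, whereas uniform-in-volume sampling concentrates points near the boundary as the dimension grows, which is what makes the local surrogate fit well across the whole neighborhood.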


【9】On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability
标题:论机械解释性稀疏词典学习的理论基础
链接:https://arxiv.org/abs/2512.05534

作者:Yiming Tang,Harshvardhan Saini,Yizhen Liao,Dianbo Liu
摘要:随着人工智能模型在不同领域实现卓越的能力,了解它们学习的表示以及它们如何处理信息对于科学进步和值得信赖的部署变得越来越重要。最近在机械可解释性方面的研究表明,神经网络将有意义的概念表示为其表示空间中的方向,并且经常以叠加的方式编码许多概念。各种稀疏字典学习(SDL)方法,包括稀疏自动编码器、转码器和交叉编码器,通过训练具有稀疏约束的辅助模型来解决这个问题,以将这些叠加的概念分解为可解释的特征。这些方法已经证明了显著的经验成功,但理论理解有限。现有的理论工作仅限于带权重绑定约束的稀疏自编码器,使更广泛的SDL方法家族缺乏形式化基础。在这项工作中,我们开发了第一个统一的理论框架,将SDL视为一个统一的优化问题。我们展示了不同的方法如何实例化该理论框架,并对优化景观进行了严格分析。我们为一些经验观察到的现象提供了首个理论解释,包括特征吸收、死神经元和神经元重采样技术。我们进一步设计了对照实验来验证我们的理论结果。
摘要:As AI models achieve remarkable capabilities across diverse domains, understanding what representations they learn and how they process information has become increasingly important for both scientific progress and trustworthy deployment. Recent works in mechanistic interpretability have shown that neural networks represent meaningful concepts as directions in their representation spaces and often encode many concepts in superposition. Various sparse dictionary learning (SDL) methods, including sparse autoencoders, transcoders, and crosscoders, address this by training auxiliary models with sparsity constraints to disentangle these superposed concepts into interpretable features. These methods have demonstrated remarkable empirical success but have limited theoretical understanding. Existing theoretical work is limited to sparse autoencoders with tied-weight constraints, leaving the broader family of SDL methods without formal grounding. In this work, we develop the first unified theoretical framework considering SDL as one unified optimization problem. We demonstrate how diverse methods instantiate the theoretical framework and provide rigorous analysis on the optimization landscape. We provide the first theoretical explanations for some empirically observed phenomena, including feature absorption, dead neurons, and the neuron resampling technique. We further design controlled experiments to validate our theoretical results.
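The unified optimization problem behind SDL methods is, at its core, sparse coding: minimize reconstruction error plus an L1 penalty over the codes. A minimal proximal-gradient (ISTA) sketch on a tiny overcomplete dictionary illustrates that objective; the dictionary, step size, and penalty here are illustrative, and SDL methods such as sparse autoencoders amortize this per-input optimization with a learned encoder rather than running it directly.

```python
def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return [max(abs(vi) - t, 0.0) * (1 if vi > 0 else -1 if vi < 0 else 0)
            for vi in v]

def ista_step(z, x, D, lam, eta):
    """One proximal-gradient (ISTA) step on
    0.5 * ||x - D z||^2 + lam * ||z||_1, with D a list of dictionary atoms."""
    dim = len(x)
    Dz = [sum(D[j][i] * z[j] for j in range(len(z))) for i in range(dim)]
    r = [Dz[i] - x[i] for i in range(dim)]                  # residual D z - x
    grad = [sum(D[j][i] * r[i] for i in range(dim)) for j in range(len(z))]
    return soft_threshold([zj - eta * g for zj, g in zip(z, grad)], eta * lam)

# 2-D signal, 3 atoms; x equals the first atom, so a 1-sparse code exists
D = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
x = [1.0, 0.0]
z = [0.0, 0.0, 0.0]
for _ in range(50):
    z = ista_step(z, x, D, lam=0.1, eta=0.3)
```

With these values the lasso optimum puts essentially all weight on the first atom, even though the third, correlated atom could partially reconstruct the signal: a toy version of the superposition-disentangling behavior the paper analyzes.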


【10】How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
标题:Ensemble Learning如何平衡准确性和过拟:表格数据的偏差方差视角
链接:https://arxiv.org/abs/2512.05469

作者:Zubair Ahmed Mohammad
备注:11 pages, 9 figures, 3 tables. Code and reproducible experiments are available at: https://github.com/zubair0831/ensemble-generalization-gap
摘要:集成模型通常比单个学习器实现更高的准确性,但它们保持小泛化差距的能力并不总是很好理解。这项研究考察了集成方法在四个表格分类任务(乳腺癌、心脏病、Pima糖尿病和信用卡欺诈)上如何平衡准确性和过拟合。使用重复分层交叉验证与统计显著性检验,我们比较了线性模型、单一决策树和九种集成方法。结果表明,集成方法可以通过平均或受控提升来减少方差,从而在没有大泛化差距的情况下达到高精度。在接近线性且干净的数据上,线性模型已经泛化得很好,集成几乎没有额外的好处。在具有有意义的非线性结构的数据集上,基于树的集成将测试准确度提高了5到7个点,同时将差距保持在3%以下。在噪声或高度不平衡的数据集上,集成仍然具有竞争力,但需要正则化以避免拟合噪声或多数类模式。我们还计算了简单的数据集复杂性指标,如线性得分、Fisher比率和噪声估计,这解释了集成何时可能有效地控制方差。总的来说,该研究清晰地展示了集成方法如何以及何时在保持低过拟合的同时保持高精度,为现实世界表格应用中的模型选择提供了实际指导。
摘要:Ensemble models often achieve higher accuracy than single learners, but their ability to maintain small generalization gaps is not always well understood. This study examines how ensembles balance accuracy and overfitting across four tabular classification tasks: Breast Cancer, Heart Disease, Pima Diabetes, and Credit Card Fraud. Using repeated stratified cross validation with statistical significance testing, we compare linear models, a single decision tree, and nine ensemble methods. The results show that ensembles can reach high accuracy without large gaps by reducing variance through averaging or controlled boosting. On nearly linear and clean data, linear models already generalize well and ensembles offer little additional benefit. On datasets with meaningful nonlinear structure, tree based ensembles increase test accuracy by 5 to 7 points while keeping gaps below 3 percent. On noisy or highly imbalanced datasets, ensembles remain competitive but require regularization to avoid fitting noise or majority class patterns. We also compute simple dataset complexity indicators, such as linearity score, Fisher ratio, and noise estimate, which explain when ensembles are likely to control variance effectively. Overall, the study provides a clear view of how and when ensembles maintain high accuracy while keeping overfitting low, offering practical guidance for model selection in real world tabular applications.
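One of the complexity indicators mentioned, the Fisher ratio, is cheap to compute per feature: the squared gap between class means divided by the summed class variances. A sketch for a single feature in a binary task (the data and the exact normalization are illustrative; definitions vary slightly across the literature):

```python
def fisher_ratio(xs, ys):
    """Fisher discriminant ratio for one feature in a binary task:
    (mean gap)^2 / (sum of class variances). Higher values mean the
    classes are easier to separate along this feature."""
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    m0 = sum(g0) / len(g0)
    m1 = sum(g1) / len(g1)
    v0 = sum((x - m0) ** 2 for x in g0) / len(g0)
    v1 = sum((x - m1) ** 2 for x in g1) / len(g1)
    return (m0 - m1) ** 2 / (v0 + v1)

# a well-separated feature vs. an overlapping one (illustrative values)
easy = fisher_ratio([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], [0, 0, 0, 1, 1, 1])
hard = fisher_ratio([0.0, 1.0, 2.0, 0.5, 1.5, 2.5], [0, 0, 0, 1, 1, 1])
```

A dataset whose best features all score low on this ratio is the regime where, per the abstract, variance-reducing ensembles are most likely to pay off over a linear model.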


【11】China Regional 3km Downscaling Based on Residual Corrective Diffusion Model
标题:基于剩余修正扩散模型的中国区域3公里缩减
链接:https://arxiv.org/abs/2512.05377

作者:Honglu Sun,Hao Jing,Zhixiang Dai,Sa Xiao,Wei Xue,Jian Sun,Qifeng Lu
摘要:数值天气预报的一个基本挑战是有效地产生高分辨率的预报。一个常见的解决方案是对全球模式的输出应用降尺度方法,包括动力降尺度和统计降尺度。这项工作的重点是统计降尺度,它使用统计模型建立低分辨率和高分辨率历史数据之间的统计关系。深度学习已经成为这一任务的强大工具,产生了各种高性能的超分辨率模型,可以直接应用于降尺度,如扩散模型和生成对抗网络。这项工作依赖于一个名为CorrDiff的基于扩散的降尺度框架。与CorrDiff的原始工作相比,这项工作中考虑的区域大了近20倍,我们不仅考虑了原始工作中的地面变量,还将高层变量(六个气压层)作为目标降尺度变量。此外,增加了全局残差连接以提高精度。为了生成中国地区的3公里预报,我们将训练好的模型应用于CMA-GFS(中国气象局(CMA)的一个全球业务模式)以及SFF(一个由球面傅里叶神经算子(SFNO)发展而来的数据驱动深度学习天气模型)的25公里全球网格预报。选择高分辨率区域模式CMA-MESO作为基线模式。实验结果表明,在目标变量的MAE方面,经我们方法降尺度的预报总体优于CMA-MESO的直接预报。我们对雷达组合反射率的预报表明,CorrDiff作为一个生成模型,可以生成细尺度的细节,与相应的确定性回归模型相比,能产生更真实的预报。
摘要:A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downscaling and statistical downscaling, to the outputs of global models. This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historical data using statistical models. Deep learning has emerged as a powerful tool for this task, giving rise to various high-performance super-resolution models, which can be directly applied for downscaling, such as diffusion models and Generative Adversarial Networks. This work relies on a diffusion-based downscaling framework named CorrDiff. In contrast to the original work of CorrDiff, the region considered in this work is nearly 20 times larger, and we not only consider surface variables as in the original work, but also include high-level variables (six pressure levels) as target downscaling variables. In addition, a global residual connection is added to improve accuracy. In order to generate the 3km forecasts for the China region, we apply our trained models to the 25km global grid forecasts of CMA-GFS, an operational global model of the China Meteorological Administration (CMA), and SFF, a data-driven deep learning-based weather model developed from Spherical Fourier Neural Operators (SFNO). CMA-MESO, a high-resolution regional model, is chosen as the baseline model. The experimental results demonstrate that the forecasts downscaled by our method generally outperform the direct forecasts of CMA-MESO in terms of MAE for the target variables. Our forecasts of radar composite reflectivity show that CorrDiff, as a generative model, can generate fine-scale details that lead to more realistic predictions compared to the corresponding deterministic regression models.


【12】CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
标题:首席财务官:通过流匹配神经运算符学习连续时间PDL动力学
链接:https://arxiv.org/abs/2512.05297

作者:Xianglong Hou,Xinquan Huang,Paris Perdikaris
摘要:时间相关偏微分方程(PDE)的神经算子替代模型通常采用自回归预测方案,其在长时间的滚动预报中积累误差,并需要统一的时间离散化。我们引入了连续流算子(CFO),这是一个学习连续时间PDE动力学的框架,而没有标准连续方法(例如神经常微分方程)的计算负担。关键的见解是重新利用流匹配来直接学习PDE的右侧,而无需通过ODE求解器进行反向传播。CFO将时间样条拟合到轨迹数据,使用节点处时间导数的有限差分估计来构建速度接近真实PDE动力学的概率路径。然后,通过流匹配训练神经算子来预测这些解析速度场。这种方法本质上是时间分辨率不变的:训练接受在任意非均匀时间网格上采样的轨迹,而推理则通过ODE积分以任意时间分辨率查询解。在四个基准测试(Lorenz、1D Burgers、2D扩散反应、2D浅水)中,CFO表现出卓越的长期稳定性和显著的数据效率。仅在25%的不规则子采样时间点上训练的CFO优于在完整数据上训练的自回归基线,相对误差降低高达87%。尽管在推理时需要数值积分,CFO仍实现了有竞争力的效率,仅使用自回归基线50%的函数评估次数就优于后者,同时独特地支持逆时推理和任意时间查询。
摘要:Neural operator surrogates for time-dependent partial differential equations (PDEs) conventionally employ autoregressive prediction schemes, which accumulate error over long rollouts and require uniform temporal discretization. We introduce the Continuous Flow Operator (CFO), a framework that learns continuous-time PDE dynamics without the computational burden of standard continuous approaches, e.g., neural ODE. The key insight is repurposing flow matching to directly learn the right-hand side of PDEs without backpropagating through ODE solvers. CFO fits temporal splines to trajectory data, using finite-difference estimates of time derivatives at knots to construct probability paths whose velocities closely approximate the true PDE dynamics. A neural operator is then trained via flow matching to predict these analytic velocity fields. This approach is inherently time-resolution invariant: training accepts trajectories sampled on arbitrary, non-uniform time grids while inference queries solutions at any temporal resolution through ODE integration. Across four benchmarks (Lorenz, 1D Burgers, 2D diffusion-reaction, 2D shallow water), CFO demonstrates superior long-horizon stability and remarkable data efficiency. CFO trained on only 25% of irregularly subsampled time points outperforms autoregressive baselines trained on complete data, with relative error reductions up to 87%. Despite requiring numerical integration at inference, CFO achieves competitive efficiency, outperforming autoregressive baselines using only 50% of their function evaluations, while uniquely enabling reverse-time inference and arbitrary temporal querying.
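The knot-velocity construction rests on finite differences over a non-uniform time grid. The sketch below shows the standard second-order interior stencil, which is exact for polynomials up to degree two; the spline machinery and the flow-matching loss of the paper are omitted, and the grid values are illustrative:

```python
def velocity_at_knots(t, u):
    """Finite-difference estimates of du/dt on a non-uniform time grid:
    second-order stencil in the interior, first-order one-sided at the ends."""
    n = len(t)
    v = [0.0] * n
    for i in range(1, n - 1):
        h0, h1 = t[i] - t[i - 1], t[i + 1] - t[i]
        # second-order accurate non-uniform central difference
        v[i] = (h0 * h0 * u[i + 1] - h1 * h1 * u[i - 1]
                + (h1 * h1 - h0 * h0) * u[i]) / (h0 * h1 * (h0 + h1))
    v[0] = (u[1] - u[0]) / (t[1] - t[0])
    v[-1] = (u[-1] - u[-2]) / (t[-1] - t[-2])
    return v

# quadratic trajectory u = t^2 on an irregular grid: interior stencil is exact
t = [0.0, 0.3, 0.5, 1.1, 1.6]
u = [ti * ti for ti in t]
v = velocity_at_knots(t, u)
```

These knot velocities are what the probability path's target velocity field would be built from; a neural operator trained by flow matching then regresses onto them without ever backpropagating through an ODE solver.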


【13】Bridging quantum and classical computing for partial differential equations through multifidelity machine learning
标题:通过多保真机器学习弥合偏微分方程的量子计算和经典计算
链接:https://arxiv.org/abs/2512.05241

作者:Bruno Jacob,Amanda A. Howard,Panos Stinis
备注:19 pages, 12 figures
摘要:偏微分方程(PDE)的量子算法在近期硬件上面临严重的实际限制:有限的量子位数将空间分辨率限制在粗网格上,而电路深度限制阻碍了精确的长时间积分。尽管量子PDE求解器在理论上具有计算加速的潜力,这些硬件瓶颈仍将其局限于低保真度状态。我们引入了一个多保真学习框架,利用稀疏的经典训练数据将粗糙的量子解校正到高保真精度,为科学计算迈向实用的量子效用铺平道路。该方法先在丰富的量子求解器输出上训练低保真代理模型,再通过一个在线性与非线性变换之间取得平衡的多保真神经架构学习校正映射。通过量子格子Boltzmann方法,在包括粘性Burgers方程和不可压缩Navier-Stokes流在内的基准非线性偏微分方程上的演示表明,该框架成功地校正了粗糙的量子预测,并实现了远超经典训练窗口的时间外推。这一策略说明了如何在降低昂贵的高保真模拟需求的同时,产生与经典精度相竞争的预测。通过弥合硬件受限的量子模拟与应用需求之间的差距,这项工作为在现实世界的科学应用中从当前量子设备中提取计算价值建立了一条途径,同时推进了算法开发和近期量子计算在计算物理中的实际部署。
摘要:Quantum algorithms for partial differential equations (PDEs) face severe practical constraints on near-term hardware: limited qubit counts restrict spatial resolution to coarse grids, while circuit depth limitations prevent accurate long-time integration. These hardware bottlenecks confine quantum PDE solvers to low-fidelity regimes despite their theoretical potential for computational speedup. We introduce a multifidelity learning framework that corrects coarse quantum solutions to high-fidelity accuracy using sparse classical training data, facilitating the path toward practical quantum utility for scientific computing. The approach trains a low-fidelity surrogate on abundant quantum solver outputs, then learns correction mappings through a multifidelity neural architecture that balances linear and nonlinear transformations. Demonstrated on benchmark nonlinear PDEs including viscous Burgers equation and incompressible Navier-Stokes flows via quantum lattice Boltzmann methods, the framework successfully corrects coarse quantum predictions and achieves temporal extrapolation well beyond the classical training window. This strategy illustrates how one can reduce expensive high-fidelity simulation requirements while producing predictions that are competitive with classical accuracy. By bridging the gap between hardware-limited quantum simulations and application requirements, this work establishes a pathway for extracting computational value from current quantum devices in real-world scientific applications, advancing both algorithm development and practical deployment of near-term quantum computing for computational physics.
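多保真校正映射的思想可以用一个极简示意来说明(数据与函数形式纯属假设,并非论文的量子求解器或神经架构):用少量高保真样本,以最小二乘闭式解拟合一个兼具线性与非线性项的校正。

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: abundant low-fidelity (coarse "quantum") outputs
# and only a few sparse high-fidelity (classical) samples.
x = np.linspace(0.0, 1.0, 50)
y_lo = np.sin(2 * np.pi * x)                      # coarse surrogate output
y_hi = 1.1 * y_lo + 0.3 * y_lo ** 3               # "true" fine solution

idx = rng.choice(len(x), size=8, replace=False)   # sparse HF training data

# Correction map balancing a linear and a simple nonlinear term, fit in
# closed form by least squares (a stand-in for the neural correction).
A = np.stack([y_lo[idx], y_lo[idx] ** 3], axis=1)
coef, *_ = np.linalg.lstsq(A, y_hi[idx], rcond=None)

y_corrected = coef[0] * y_lo + coef[1] * y_lo ** 3
print(float(np.max(np.abs(y_corrected - y_hi))))  # small residual
```

因为玩具真值恰好落在特征张成的空间内,8个稀疏样本即可恢复整条高保真曲线;这正是"稀疏经典数据校正丰富量子输出"的直观含义。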


【14】Coefficient of Variation Masking: A Volatility-Aware Strategy for EHR Foundation Models
标题:变异系数掩蔽:EHR基础模型的波动性感知策略
链接:https://arxiv.org/abs/2512.05216

作者:Rajna Fani,Rafi Al Attrach,David Restrepo,Yugang Jia,Leo Anthony Celi,Peter Schüffler
备注:16 pages, 9 figures, 1 table, 1 algorithm. Accepted at Machine Learning for Health (ML4H) 2025, Proceedings of the Machine Learning Research (PMLR)
摘要:掩蔽自动编码器(MAE)越来越多地应用于电子健康记录(EHR),用于学习支持各种临床任务的通用表示。然而,现有的方法通常依赖于均匀随机掩蔽,隐含地假设所有特征都是可预测的。实际上,实验室测试显示出波动性的显著异质性:一些生物标志物(例如,钠)保持稳定,而其它(例如,乳酸盐)波动相当大且更难以建模。在临床上,挥发性生物标志物通常是急性病理生理学的信号,需要更复杂的建模来捕获其复杂的时间模式。我们提出了一种波动性感知的预训练策略,变异系数掩蔽(CV-Masking),根据每个特征的内在变异性自适应地调整掩蔽概率。结合与临床工作流程一致的仅值掩蔽目标,CV-Masking比随机和基于方差的策略产生了系统性改进。在大量实验室测试中的实验表明,CV-Masking增强了重建,提高了下游预测性能,并加速了收敛,产生了更强大和更有临床意义的EHR表示。
摘要:Masked autoencoders (MAEs) are increasingly applied to electronic health records (EHR) for learning general-purpose representations that support diverse clinical tasks. However, existing approaches typically rely on uniform random masking, implicitly assuming all features are equally predictable. In reality, laboratory tests exhibit substantial heterogeneity in volatility: some biomarkers (e.g., sodium) remain stable, while others (e.g., lactate) fluctuate considerably and are more difficult to model. Clinically, volatile biomarkers often signal acute pathophysiology and require more sophisticated modeling to capture their complex temporal patterns. We propose a volatility-aware pretraining strategy, Coefficient of Variation Masking (CV-Masking), that adaptively adjusts masking probabilities according to the intrinsic variability of each feature. Combined with a value-only masking objective aligned with clinical workflows, CV-Masking yields systematic improvements over random and variance-based strategies. Experiments on a large panel of laboratory tests show that CV-Masking enhances reconstruction, improves downstream predictive performance, and accelerates convergence, producing more robust and clinically meaningful EHR representations.
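CV-Masking的核心思想可以用一个假设性的小例子来示意(归一化方式与概率上下限均为此处的假设,并非论文的确切方案):按每个特征的变异系数(CV = 标准差/|均值|)分配掩蔽概率,波动大的生物标志物被更频繁地掩蔽。

```python
def cv_masking_probs(means, stds, floor=0.05, cap=0.9):
    """Masking probabilities scaled by each feature's coefficient of
    variation; an illustrative sketch, not the paper's exact schedule."""
    cvs = [s / (abs(m) + 1e-8) for m, s in zip(means, stds)]
    total = sum(cvs)
    # Normalize toward an average masking rate of 0.5, then clamp.
    return [min(cap, max(floor, 0.5 * len(cvs) * c / total)) for c in cvs]

# Stable sodium-like feature vs. volatile lactate-like feature (toy stats).
probs = cv_masking_probs(means=[140.0, 2.0], stds=[2.0, 1.5])
print(probs)   # the volatile feature receives a much higher masking rate
```

钠的CV约为0.014,乳酸的CV为0.75,因此掩蔽概率分别被压到下限和推到上限,迫使模型把容量用在更难预测的特征上。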


【15】Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models
标题:你的潜在面具是错误的:扩散模型的像素等效潜在合成
链接:https://arxiv.org/abs/2512.05198

作者:Rowan Bradbury,Dazhi Zhong
备注:16 pages, 10 figures
摘要:扩散模型中的潜空间修复(inpainting)仍几乎普遍依赖于在下采样掩码下对VAE潜变量进行线性插值。我们提出了合成图像潜变量的一个关键原则:像素等效潜空间合成(PELC)。一个等效的潜空间合成器应当与在像素空间中进行合成效果相同。即使VAE在空间上将图像压缩8倍,这一原则也能实现全分辨率掩码控制和真正的软边缘alpha合成。现代VAE捕获的全局上下文超出了块对齐的局部结构,因此线性潜变量混合不可能是像素等效的:它会在掩码接缝处产生大的伪影以及全局退化和颜色偏移。我们引入DecFormer,一个7.7M参数的Transformer,它预测每个通道的混合权重和一个离流形残差校正,以实现掩码一致的潜变量融合。DecFormer经过训练,使融合后的解码与像素空间alpha合成匹配;它与现有扩散管线即插即用,不需要骨干微调,只增加FLUX.1-Dev参数的0.07%和3.5%的FLOP开销。在FLUX.1系列上,DecFormer恢复了全局颜色一致性、软掩码支持、清晰边界和高保真掩蔽,与标准掩码插值相比,边缘周围的误差指标最多减少53%。作为修复先验,在FLUX.1-Dev上搭配DecFormer的轻量级LoRA实现了与完全微调的修复模型FLUX.1-Fill相当的保真度。虽然我们专注于修复任务,但PELC是像素等效潜空间编辑的通用方法,正如我们在一个复杂的颜色校正任务中所展示的那样。
摘要:Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents: Pixel-Equivalent Latent Compositing (PELC). An equivalent latent compositor should be the same as compositing in pixel space. This principle enables full-resolution mask control and true soft-edge alpha compositing, even though VAEs compress images 8x spatially. Modern VAEs capture global context beyond patch-aligned local structure, so linear latent blending cannot be pixel-equivalent: it produces large artifacts at mask seams and global degradation and color shifts. We introduce DecFormer, a 7.7M-parameter transformer that predicts per-channel blend weights and an off-manifold residual correction to realize mask-consistent latent fusion. DecFormer is trained so that decoding after fusion matches pixel-space alpha compositing, is plug-compatible with existing diffusion pipelines, requires no backbone finetuning and adds only 0.07% of FLUX.1-Dev's parameters and 3.5% FLOP overhead. On the FLUX.1 family, DecFormer restores global color consistency, soft-mask support, sharp boundaries, and high-fidelity masking, reducing error metrics around edges by up to 53% over standard mask interpolation. Used as an inpainting prior, a lightweight LoRA on FLUX.1-Dev with DecFormer achieves fidelity comparable to FLUX.1-Fill, a fully finetuned inpainting model. While we focus on inpainting, PELC is a general recipe for pixel-equivalent latent editing, as we demonstrate on a complex color-correction task.


【16】Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models
标题:利用迁移编码器和自回归生成模型进行时空卫星图像降尺度
链接:https://arxiv.org/abs/2512.05139

作者:Yang Xiang,Jingwen Zhong,Yige Yan,Petros Koutrakis,Eric Garshick,Meredith Franklin
摘要:我们提出了一个迁移学习生成式降尺度框架,从粗尺度输入重建精细分辨率的卫星图像。我们的方法结合了轻量级的U-Net迁移编码器和基于扩散的生成模型。较简单的U-Net首先在一个长时间序列的粗分辨率数据上进行预训练,以学习时空表示;然后将其编码器冻结并迁移到一个更大的降尺度模型中,作为具有物理意义的潜在特征。我们的应用使用NASA的MERRA-2再分析数据作为低分辨率源域(50公里),并以GEOS-5 Nature Run(G5NR)作为高分辨率目标(7公里)。我们的研究区域覆盖亚洲的一大片区域,通过将其划分为两个子区域和四个季节而使计算变得可行。我们使用Wasserstein距离进行了域相似性分析,证实了MERRA-2和G5NR之间的分布偏移极小,验证了参数冻结迁移的安全性。在各个季节-区域划分中,我们的模型实现了出色的性能(R2 = 0.65至0.94),优于包括确定性U-Net、变分自编码器和先前迁移学习基线在内的比较模型。使用半变异函数、ACF/PACF和基于滞后的RMSE/R2进行的数据外评估表明,预测的降尺度图像保留了物理上一致的空间变异性和时间自相关性,使稳定的自回归重建得以延伸到G5NR记录之外。这些结果表明,迁移增强的扩散模型为在训练时段有限的情况下对长时间序列的粗分辨率图像进行降尺度提供了一个稳健且物理一致的解决方案。这一进展对改善环境暴露评估和长期环境监测具有重要意义。
摘要:We present a transfer-learning generative downscaling framework to reconstruct fine resolution satellite images from coarse scale inputs. Our approach combines a lightweight U-Net transfer encoder with a diffusion-based generative model. The simpler U-Net is first pretrained on a long time series of coarse resolution data to learn spatiotemporal representations; its encoder is then frozen and transferred to a larger downscaling model as physically meaningful latent features. Our application uses NASA's MERRA-2 reanalysis as the low resolution source domain (50 km) and the GEOS-5 Nature Run (G5NR) as the high resolution target (7 km). Our study area included a large area in Asia, which was made computationally tractable by splitting into two subregions and four seasons. A domain similarity analysis using Wasserstein distances confirmed minimal distributional shift between MERRA-2 and G5NR, validating the safety of parameter-frozen transfer. Across seasonal regional splits, our model achieved excellent performance (R2 = 0.65 to 0.94), outperforming comparison models including deterministic U-Nets, variational autoencoders, and prior transfer learning baselines. Out-of-sample evaluations using semivariograms, ACF/PACF, and lag-based RMSE/R2 demonstrated that the predicted downscaled images preserved physically consistent spatial variability and temporal autocorrelation, enabling stable autoregressive reconstruction beyond the G5NR record. These results show that transfer-enhanced diffusion models provide a robust and physically coherent solution for downscaling a long time series of coarse resolution images with limited training periods. This advancement has significant implications for improving environmental exposure assessment and long term environmental monitoring.


【17】InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
标题:InvarDiff:加速扩散模型的跨尺度不变性缓存
链接:https://arxiv.org/abs/2512.05134

作者:Zihao Wu
备注:8 pages main, 8 pages appendix, 16 figures, 5 tables. Code: https://github.com/zihaowu25/InvarDiff
摘要:扩散模型可提供高保真度合成,但由于迭代采样而依然缓慢。我们通过实证观察到确定性采样中存在特征不变性,并提出InvarDiff,一种免训练的加速方法,利用时间步尺度和层尺度上的相对时间不变性。通过少量确定性运行,我们计算出每时间步、每层、每模块的二值缓存计划矩阵,并使用重采样校正来避免连续缓存时的漂移。借助基于分位数的变化度量,该矩阵指定在哪个步骤重用哪个模块而非重新计算。同样的不变性准则也应用于步骤尺度以启用跨时间步缓存,决定整个步骤能否重用缓存结果。在推理期间,InvarDiff执行由该矩阵指导的步骤优先与逐层缓存。当应用于DiT和FLUX时,我们的方法在保持保真度的同时减少了冗余计算。实验表明,InvarDiff实现了$2$-$3\times$的端到端加速,对标准质量指标的影响最小。定性而言,与完整计算相比,我们几乎观察不到视觉质量下降。
摘要:Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present InvarDiff, a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and use a re-sampling correction to avoid drift when consecutive caches occur. Using quantile-based change metrics, this matrix specifies which module at which step is reused rather than recomputed. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first and layer-wise caching guided by this matrix. When applied to DiT and FLUX, our approach reduces redundant compute while preserving fidelity. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speed-ups with minimal impact on standard quality metrics. Qualitatively, we observe almost no degradation in visual quality compared with full computations.
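二值缓存计划矩阵的构建可以用如下假设性示意来说明(分位数阈值与变化度量的具体形式均为猜测,并非论文原始算法):变化小于分位数阈值的(时间步,模块)对被标记为重用缓存。

```python
# Hypothetical sketch of a quantile-based cache plan: module outputs whose
# relative change between adjacent timesteps falls below a quantile
# threshold are marked for reuse (1 = reuse cached output, 0 = recompute).

def cache_plan(change, q=0.5):
    """change[t][m]: relative feature change of module m at timestep t."""
    flat = sorted(x for row in change for x in row)
    thresh = flat[int(q * (len(flat) - 1))]      # crude empirical quantile
    return [[1 if x <= thresh else 0 for x in row] for row in change]

change = [[0.01, 0.40],    # step 1: module 0 is nearly invariant
          [0.02, 0.35],
          [0.30, 0.03]]    # step 3: the roles flip
plan = cache_plan(change, q=0.5)
print(plan)   # [[1, 0], [1, 0], [0, 1]]
```

推理时按该矩阵查表即可决定每一步重算哪些模块;跨时间步缓存相当于对整行应用同一准则。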


【18】Machine-learning-enabled interpretation of tribological deformation patterns in large-scale MD data
标题:大规模MD数据中的摩擦学变形模式的机器学习解释
链接:https://arxiv.org/abs/2512.05818

作者:Hendrik J. Ehrich,Marvin C. May,Stefan J. Eder
备注:19 pages, 11 figures
摘要:分子动力学(MD)模拟已成为在原子尺度上探索摩擦学变形模式不可或缺的手段。然而,将由此产生的高维数据转换为可解释的变形模式图仍然是一个资源密集型且主要靠手工的过程。在这项工作中,我们引入了一个数据驱动的工作流程,使用无监督和监督学习来自动化这一解释步骤。从CuNi合金模拟获得的按晶粒取向着色的计算断层图像首先通过自动编码器压缩为32维全局特征向量。尽管压缩程度很高,重建的图像仍保留了基本的微观结构图案:晶界、堆垛层错、孪晶和部分晶格旋转,仅省略了最细小的缺陷。然后将学习到的表示与模拟元数据(成分、载荷、时间、温度和空间位置)相结合,训练一个CNN-MLP模型来预测主导变形模式。所得模型在验证数据上实现了约96%的预测准确率。一种改进的评估策略(将包含不同晶粒的整个空间区域排除在训练之外)提供了更稳健的泛化度量。该方法表明,利用机器学习可以从结构图像中自动识别和分类基本的摩擦学变形特征。这一概念验证是迈向全自动、数据驱动的摩擦学机制图构建的第一步,并最终迈向可减少大规模MD模拟需求的预测建模框架。
摘要:Molecular dynamics (MD) simulations have become indispensable for exploring tribological deformation patterns at the atomic scale. However, transforming the resulting high-dimensional data into interpretable deformation pattern maps remains a resource-intensive and largely manual process. In this work, we introduce a data-driven workflow that automates this interpretation step using unsupervised and supervised learning. Grain-orientation-colored computational tomograph pictures obtained from CuNi alloy simulations were first compressed through an autoencoder to a 32-dimensional global feature vector. Despite this strong compression, the reconstructed images retained the essential microstructural motifs: grain boundaries, stacking faults, twins, and partial lattice rotations, while omitting only the finest defects. The learned representations were then combined with simulation metadata (composition, load, time, temperature, and spatial position) to train a CNN-MLP model to predict the dominant deformation pattern. The resulting model achieves a prediction accuracy of approximately 96% on validation data. A refined evaluation strategy, in which an entire spatial region containing distinct grains was excluded from training, provides a more robust measure of generalization. The approach demonstrates that essential tribological deformation signatures can be automatically identified and classified from structural images using Machine Learning. This proof of concept constitutes a first step towards fully automated, data-driven construction of tribological mechanism maps and, ultimately, toward predictive modeling frameworks that may reduce the need for large-scale MD simulation campaigns.


【19】Comparing the latent features of universal machine-learning interatomic potentials
标题:比较通用机器学习原子间势的潜在特征
链接:https://arxiv.org/abs/2512.05717

作者:Sofiia Chorna,Davide Tisi,Cesare Malosso,Wei Bin How,Michele Ceriotti,Sanggyu Chong
摘要:在过去几年里,“通用”机器学习原子间势(uMLIPs)得到了发展,它们能够以合理的精度近似各种化学结构和组成的基态势能面。虽然这些模型在架构和所用数据集上有所不同,但它们都能将数量惊人的化学信息压缩成描述性的潜在特征。在这里,我们系统地分析了不同的uMLIP学到了什么:以特征重建误差为度量,定量评估其潜在特征的相对信息含量,并观察训练集和训练协议的选择如何影响这些趋势。我们发现uMLIPs以显著不同的方式编码化学空间,跨模型特征重建误差很大。当考虑同一模型架构的不同变体时,趋势变得依赖于所选择的数据集、目标和训练协议。我们还观察到,对uMLIP进行微调后,潜在特征中仍保留着很强的预训练偏差。最后,我们讨论了如何将MLIP直接输出的原子级特征,通过逐阶累积量的拼接压缩为全局的结构级特征,其中每一阶都为给定系统内原子环境的变化提供显著的新信息。
摘要:The past few years have seen the development of ``universal'' machine-learning interatomic potentials (uMLIPs) capable of approximating the ground-state potential energy surface across a wide range of chemical structures and compositions with reasonable accuracy. While these models differ in the architecture and the dataset used, they share the ability to compress a staggering amount of chemical information into descriptive latent features. Herein, we systematically analyze what the different uMLIPs have learned by quantitatively assessing the relative information content of their latent features with feature reconstruction errors as metrics, and observing how the trends are affected by the choice of training set and training protocol. We find that the uMLIPs encode chemical space in significantly distinct ways, with substantial cross-model feature reconstruction errors. When variants of the same model architecture are considered, trends become dependent on the dataset, target, and training protocol of choice. We also observe that fine-tuning of a uMLIP retains a strong pre-training bias in the latent features. Finally, we discuss how atom-level features, which are directly output by MLIPs, can be compressed into global structure-level features via concatenation of progressive cumulants, each adding significantly new information about the variability across the atomic environments within a given system.


【20】FieldSeer I: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability
标题:FieldSeer I:部分可观测性下长视界电磁动力学的物理引导世界模型
链接:https://arxiv.org/abs/2512.05361

作者:Ziheng Guo,Fang Wu,Maoxiong Zhao,Chaoqun Fang,Yang Bu
摘要:我们介绍FieldSeer I,一个几何感知的世界模型,可从2-D TE波导中的部分观测预测电磁场动力学。该模型同化一段观测场的短前缀,以标量源动作和结构/材料图为条件,并在物理域中生成闭环推演。在对称对数域中进行训练可确保数值稳定性。在可复现的FDTD基准上进行评估(200个独特的模拟,按结构划分),FieldSeer I在三种实际设置中均实现了比GRU和确定性基线更高的后缀保真度:(i)软件在环滤波(64x64,P=80->Q=80),(ii)离线单文件推演(80x140,P=240->Q=40),以及(iii)离线多结构推演(80x140,P=180->Q=100)。至关重要的是,它支持在前缀之后编辑几何结构而无需重新同化。结果表明,几何条件化的世界模型为光子设计的交互式数字孪生提供了一条实用途径。
摘要:We introduce FieldSeer I, a geometry-aware world model that forecasts electromagnetic field dynamics from partial observations in 2-D TE waveguides. The model assimilates a short prefix of observed fields, conditions on a scalar source action and structure/material map, and generates closed-loop rollouts in the physical domain. Training in a symmetric-log domain ensures numerical stability. Evaluated on a reproducible FDTD benchmark (200 unique simulations, structure-wise split), FieldSeer I achieves higher suffix fidelity than GRU and deterministic baselines across three practical settings: (i) software-in-the-loop filtering (64x64, P=80->Q=80), (ii) offline single-file rollouts (80x140, P=240->Q=40), and (iii) offline multi-structure rollouts (80x140, P=180->Q=100). Crucially, it enables edit-after-prefix geometry modifications without re-assimilation. Results demonstrate that geometry-conditioned world models provide a practical path toward interactive digital twins for photonic design.
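摘要提到在对称对数(symmetric-log)域中训练;一个常见的选择(此处为假设,论文未给出具体公式)是 symlog(x) = sign(x)·log(1+|x|),其逆变换为 sign(y)·(exp(|y|)-1),既压缩了大幅值场又在零点附近保持光滑。

```python
import math

# Assumed symmetric-log transform and its inverse; compresses large field
# amplitudes while remaining smooth and sign-preserving through zero.

def symlog(x):
    return math.copysign(math.log1p(abs(x)), x)

def symexp(y):
    return math.copysign(math.expm1(abs(y)), y)

fields = [-1e4, -2.5, 0.0, 3.0, 1e6]
roundtrip = [symexp(symlog(x)) for x in fields]
print(roundtrip)   # matches the inputs up to floating-point error
```

`log1p`/`expm1` 的组合在小幅值处保持数值精度,这正是此类变换常用于场数据归一化的原因。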


其他(22篇)

【1】MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution
标题:MaxShapley:迈向具有公平上下文归因的激励相容生成式搜索
链接:https://arxiv.org/abs/2512.05958

作者:Sara Patel,Mingxun Zhou,Giulia Fanti
摘要:基于大型语言模型(LLM)的生成式搜索引擎正在取代传统搜索,从根本上改变了信息提供者获得补偿的方式。为了维持这一生态系统,我们需要公平的机制,根据内容提供者对生成答案的贡献对其进行归因和补偿。我们介绍MaxShapley,一种用于检索增强生成(RAG)生成式搜索管线中公平归因的高效算法。MaxShapley是著名的Shapley值的一个特例;它利用可分解的max-sum效用函数,以随文档数量线性增长的计算量求得归因,而不是Shapley值的指数成本。我们在三个多跳QA数据集(HotPotQA、MuSiQUE、MS MARCO)上评估了MaxShapley;MaxShapley实现了与精确Shapley计算相当的归因质量,同时仅消耗其一小部分令牌:例如,在相同的归因准确性下,它比现有最先进的方法最多减少8倍的资源消耗。
摘要 :Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem, we need fair mechanisms to attribute and compensate content providers based on their contributions to generated answers. We introduce MaxShapley, an efficient algorithm for fair attribution in generative search pipelines that use retrieval-augmented generation (RAG). MaxShapley is a special case of the celebrated Shapley value; it leverages a decomposable max-sum utility function to compute attributions with linear computation in the number of documents, as opposed to the exponential cost of Shapley values. We evaluate MaxShapley on three multi-hop QA datasets (HotPotQA, MuSiQUE, MS MARCO); MaxShapley achieves comparable attribution quality to exact Shapley computation, while consuming a fraction of its tokens--for instance, it gives up to an 8x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy.
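作为说明,可分解的max型效用允许在排序后以线性时间精确计算Shapley值。下面给出纯max效用 u(S) = max_{i∈S} v_i 情形的标准闭式解(这是一个示意性的特例,并非论文的完整流程;`scores` 为假设的玩具数据)。

```python
def max_utility_shapley(values):
    """Closed-form Shapley values for u(S) = max(v_i for i in S), v_i >= 0.
    Runs in O(n log n) instead of enumerating 2^n coalitions: sort values
    ascending and share each increment among all players ranked at or above."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    phi = [0.0] * len(values)
    n = len(values)
    prev, running = 0.0, 0.0
    for rank, i in enumerate(order):              # rank 0 = smallest value
        running += (values[i] - prev) / (n - rank)
        phi[i] = running
        prev = values[i]
    return phi

scores = [1.0, 3.0]        # toy per-document contribution scores
print(max_utility_shapley(scores))   # [0.5, 2.5]; sums to max = 3.0
```

可以用 n=2 的蛮力计算验证:φ₁ = ½·u({1}) + ½·(u({1,2})−u({2})) = 0.5,φ₂ = 2.5,与闭式解一致,且满足效率性(归因之和等于总效用)。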


【2】On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
标题:论分歧差异替代损失的Bayes不一致性
链接:https://arxiv.org/abs/2512.05931

作者:Neil G. Marchant,Andrew C. Cullen,Feng Liu,Sarah M. Erfani
备注:37 pages, 7 figures
摘要:深度神经网络在部署到现实环境中时,常常因分布偏移而失效,这是构建安全可靠系统的关键障碍。解决该问题的一种新兴方法依赖于“分歧差异”(disagreement discrepancy):一种衡量两个模型之间的分歧在分布偏移下如何变化的度量。最大化该度量的过程已被用于界定偏移下的误差上界、检验有害偏移以及训练更鲁棒的模型。然而,这一优化涉及不可微的0-1损失,因此必须使用实际可行的替代损失。我们证明,现有的分歧差异替代损失不具备Bayes一致性,揭示了一个根本缺陷:最大化这些替代损失可能无法最大化真正的分歧差异。为了解决这个问题,我们引入新的理论结果,为这类替代损失的最优性差距给出上界和下界。在该理论的指导下,我们提出了一种新颖的分歧损失,当其与交叉熵配对时,可产生一个对分歧差异可证明一致的替代损失。在多样化基准上的实证评估表明,我们的方法比现有方法提供了更准确、更鲁棒的分歧差异估计,尤其是在具有挑战性的对抗条件下。
摘要:Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this problem relies on \emph{disagreement discrepancy} -- a measure of how the disagreement between two models changes under a shifting distribution. The process of maximizing this measure has seen applications in bounding error under shifts, testing for harmful shifts, and training more robust models. However, this optimization involves the non-differentiable zero-one loss, necessitating the use of practical surrogate losses. We prove that existing surrogates for disagreement discrepancy are not Bayes consistent, revealing a fundamental flaw: maximizing these surrogates can fail to maximize the true disagreement discrepancy. To address this, we introduce new theoretical results providing both upper and lower bounds on the optimality gap for such surrogates. Guided by this theory, we propose a novel disagreement loss that, when paired with cross-entropy, yields a provably consistent surrogate for disagreement discrepancy. Empirical evaluations across diverse benchmarks demonstrate that our method provides more accurate and robust estimates of disagreement discrepancy than existing approaches, particularly under challenging adversarial conditions.
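零一损失版本的分歧差异可以直接在样本上估计;下面是一个假设性的小示意(模型预测均为玩具数据):目标域上的分歧率减去源域上的分歧率。

```python
# Hypothetical sketch of the zero-one disagreement discrepancy: how much
# more often two models f and g disagree on target-domain samples than on
# source-domain samples. All predictions below are toy data.

def disagreement(preds_f, preds_g):
    return sum(a != b for a, b in zip(preds_f, preds_g)) / len(preds_f)

def disagreement_discrepancy(f_src, g_src, f_tgt, g_tgt):
    return disagreement(f_tgt, g_tgt) - disagreement(f_src, g_src)

# The models agree in-distribution but diverge under shift.
f_src, g_src = [0, 1, 1, 0], [0, 1, 1, 0]
f_tgt, g_tgt = [0, 1, 1, 0], [1, 1, 0, 0]
print(disagreement_discrepancy(f_src, g_src, f_tgt, g_tgt))   # 0.5
```

论文讨论的问题正是:由于上式中的0-1指示函数不可微,训练时必须最大化某个可微替代损失,而这些替代损失可能并不与该量一致。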


【3】KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
标题:KQ-SVD:对注意力保真度具有可证明保证的KV缓存压缩
链接:https://arxiv.org/abs/2512.05916

作者:Damien Lesens,Beheshteh T. Rakhshan,Guillaume Rabusseau
摘要:键值(KV)缓存对于基于transformer的大型语言模型(LLM)的效率至关重要,它存储先前计算的向量以加速推理。然而,随着序列长度和批处理大小的增长,缓存成为主要的内存瓶颈。以前的压缩方法通常只对键进行低秩分解,或者尝试联合嵌入查询和键,但这两种方法都忽略了注意力从根本上取决于它们的内积。在这项工作中,我们证明了这样的策略是次优的近似注意矩阵。我们介绍KQ-SVD,一个简单的和计算效率高的方法,直接执行一个最佳的低秩分解的注意力矩阵通过一个封闭的形式的解决方案。通过瞄准冗余的真实来源,KQ-SVD在压缩下以更高保真度保留注意力输出。对LLaMA和Mistral模型的广泛评估表明,我们的方法始终提供卓越的投影质量。
摘要:The Key-Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length and batch size grow, the cache becomes a major memory bottleneck. Prior compression methods typically apply low-rank decomposition to keys alone or attempt to jointly embed queries and keys, but both approaches neglect that attention fundamentally depends on their inner products. In this work, we prove that such strategies are suboptimal for approximating the attention matrix. We introduce KQ-SVD, a simple and computationally efficient method that directly performs an optimal low-rank decomposition of the attention matrix via a closed-form solution. By targeting the true source of redundancy, KQ-SVD preserves attention outputs with higher fidelity under compression. Extensive evaluations on LLaMA and Mistral models demonstrate that our approach consistently delivers superior projection quality.
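KQ-SVD的出发点可以用Eckart-Young定理来示意:对softmax前的注意力logits S = QKᵀ 做截断SVD,即得到给定秩下Frobenius范数意义上的最优近似(以下为示意性代码与随机玩具矩阵,并非论文的具体算法)。

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy query/key matrices; S plays the role of the pre-softmax attention
# logits whose low-rank structure KQ-SVD exploits.
Q = rng.normal(size=(32, 16))
K = rng.normal(size=(32, 16))
S = Q @ K.T

U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
r = 4
S_r = (U[:, :r] * sigma[:r]) @ Vt[:r]           # best rank-r approximation

err = np.linalg.norm(S - S_r)                   # Frobenius error
opt = np.sqrt(np.sum(sigma[r:] ** 2))           # Eckart-Young optimum
print(err, opt)   # equal up to floating point: truncated SVD is optimal
```

这正是摘要所说"闭式解"的数学依据:只分解K或联合嵌入Q、K都无法保证对内积矩阵本身达到这一最优误差。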


【4】Neural Coherence : Find higher performance to out-of-distribution tasks from few samples
标题:神经一致性:从少量样本中为分布外任务找到更高的性能
链接:https://arxiv.org/abs/2512.05880

作者:Simon Guiroy,Mats Richter,Sarath Chandar,Christopher Pal
摘要:为了在许多下游任务上构建最先进的模型,微调预训练的大型视觉模型已成为常见做法。然而,如何最好地确定在大规模训练运行产生的众多可能的模型检查点中选择哪一个作为起点,仍然是一个悬而未决的问题。当感兴趣的目标任务数据稀缺、未标注且分布外时,这一点变得尤为重要。在这种情况下,依赖分布内验证数据的常用方法变得不可靠或不适用。这项工作提出了一种新颖的模型选择方法,仅需目标任务中的少量未标注示例即可可靠运行。我们的方法基于一个新概念:神经一致性(Neural Coherence),它通过刻画模型在源域和目标域上的激活统计量,使人们能够定义具有高数据效率的模型选择方法。我们在ImageNet1K上预训练模型,并在由Food-101、PlantNet-300K和iNaturalist组成的目标域上进行了实验。我们还在许多元学习设置中对其进行了评估。与既有基线相比,我们的方法显著提高了在这些不同目标域上的泛化能力。我们通过展示其在训练数据选择中的有效性,进一步证明了神经一致性作为一个强大原则的多功能性。
摘要:To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question of how to best determine which of the many possible model checkpoints resulting from a large training run to use as the starting point. This becomes especially important when data for the target task of interest is scarce, unlabeled and out-of-distribution. In such scenarios, common methods relying on in-distribution validation data become unreliable or inapplicable. This work proposes a novel approach for model selection that operates reliably on just a few unlabeled examples from the target task. Our approach is based on a novel concept: Neural Coherence, which entails characterizing a model's activation statistics for source and target domains, allowing one to define model selection methods with high data-efficiency. We provide experiments where models are pre-trained on ImageNet1K and examine target domains consisting of Food-101, PlantNet-300K and iNaturalist. We also evaluate it in many meta-learning settings. Our approach significantly improves generalization across these different target domains compared to established baselines. We further demonstrate the versatility of Neural Coherence as a powerful principle by showing its effectiveness in training data selection.


【5】Sparse Attention Post-Training for Mechanistic Interpretability
标题:面向机理可解释性的稀疏注意力后训练
链接:https://arxiv.org/abs/2512.05865

作者:Florent Draye,Anson Lei,Ingmar Posner,Bernhard Schölkopf
摘要:我们引入了一个简单的后训练方法,使Transformer的注意力稀疏,而不牺牲性能。在约束损失目标下应用灵活的稀疏正则化,我们在高达1B参数的模型上显示,可以保留原始的预训练损失,同时将注意力连接减少到其边缘的$\approx 0.3\%$。与为提高计算效率而设计的稀疏注意方法不同,我们的方法利用稀疏性作为结构先验:它保留了能力,同时暴露了更有组织和可解释的连接模式。我们发现,这种局部稀疏会级联成全局电路简化:特定于任务的电路涉及的组件(注意力头和MLP)少得多,连接它们的边最多少100倍。这些结果表明,可以使Transformer注意力稀疏几个数量级,这表明其大部分计算是冗余的,稀疏性可以作为更具结构化和可解释模型的指导原则。
摘要:We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objective, we show on models up to 1B parameters that it is possible to retain the original pretraining loss while reducing attention connectivity to $\approx 0.3 \%$ of its edges. Unlike sparse-attention methods designed for computational efficiency, our approach leverages sparsity as a structural prior: it preserves capability while exposing a more organized and interpretable connectivity pattern. We find that this local sparsity cascades into global circuit simplification: task-specific circuits involve far fewer components (attention heads and MLPs) with up to 100x fewer edges connecting them. These results demonstrate that transformer attention can be made orders of magnitude sparser, suggesting that much of its computation is redundant and that sparsity may serve as a guiding principle for more structured and interpretable models.


【6】The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics
标题:AGI缺失的层:从模式炼金术到协调物理
链接:https://arxiv.org/abs/2512.05765

作者:Edward Y. Chang
备注:13 pages, 3 figures
摘要:有影响力的批评认为,大型语言模型(LLM)是通往AGI的死胡同:“仅仅是模式匹配器”,在结构上无法推理或规划。我们认为这一结论错误地识别了瓶颈:它混淆了海洋和渔网。模式库是必要的System-1基底;缺失的组件是一个System-2协调层,它负责选择、约束和绑定这些模式。我们通过UCCT将这一层形式化:UCCT是一种语义锚定理论,将推理建模为由有效支持(rho_d)、表征失配(d_r)和自适应锚定预算(gamma log k)所支配的相变。在这一视角下,未锚定的生成只是对基底最大似然先验的无诱饵检索,而当锚将后验转向目标导向的约束时,“推理”便会涌现。我们将UCCT落实为架构MACI:一个实现诱饵(行为调制辩论)、过滤(苏格拉底式评判)和持久性(事务性记忆)的协调栈。通过将常见的反对意见重新表述为可检验的协调失败,我们认为通往AGI的道路贯穿LLM,而非绕开它们。
摘要:Influential critiques argue that Large Language Models (LLMs) are a dead end for AGI: "mere pattern matchers" structurally incapable of reasoning or planning. We argue this conclusion misidentifies the bottleneck: it confuses the ocean with the net. Pattern repositories are the necessary System-1 substrate; the missing component is a System-2 coordination layer that selects, constrains, and binds these patterns. We formalize this layer via UCCT, a theory of semantic anchoring that models reasoning as a phase transition governed by effective support (rho_d), representational mismatch (d_r), and an adaptive anchoring budget (gamma log k). Under this lens, ungrounded generation is simply an unbaited retrieval of the substrate's maximum likelihood prior, while "reasoning" emerges when anchors shift the posterior toward goal-directed constraints. We translate UCCT into architecture with MACI, a coordination stack that implements baiting (behavior-modulated debate), filtering (Socratic judging), and persistence (transactional memory). By reframing common objections as testable coordination failures, we argue that the path to AGI runs through LLMs, not around them.


【7】InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
标题:InverseCrafter:作为潜在领域逆问题的高效视频重新捕获
链接:https://arxiv.org/abs/2512.05672

作者:Yeobin Hong,Suhyeon Lee,Hyungjin Chung,Jong Chul Ye
摘要:最近的可控4D视频生成方法通常依赖于微调预训练的视频扩散模型(VDM)。这一主流范式计算代价高昂,需要大规模数据集和架构修改,并经常灾难性地遗忘模型原有的生成先验。在此,我们提出InverseCrafter,一种高效的修复式逆问题求解器,它将4D生成任务重新表述为在潜空间中求解的修复(inpainting)问题。我们方法的核心是一种有原则的机制,将像素空间退化算子编码为连续的多通道潜在掩码,从而绕过重复VAE操作和反向传播的昂贵瓶颈。InverseCrafter不仅以近乎为零的计算开销在相机控制任务中实现了相当的新视角生成和更优的测量一致性,而且在带编辑的通用视频修复方面也表现出色。代码可在https://github.com/yeobinhong/InverseCrafter上获得。
摘要:Recent approaches to controllable 4D video generation often rely on fine-tuning pre-trained Video Diffusion Models (VDMs). This dominant paradigm is computationally expensive, requiring large-scale datasets and architectural modifications, and frequently suffers from catastrophic forgetting of the model's original generative priors. Here, we propose InverseCrafter, an efficient inpainting inverse solver that reformulates the 4D generation task as an inpainting problem solved in the latent space. The core of our method is a principled mechanism to encode the pixel space degradation operator into a continuous, multi-channel latent mask, thereby bypassing the costly bottleneck of repeated VAE operations and backpropagation. InverseCrafter not only achieves comparable novel view generation and superior measurement consistency in camera control tasks with near-zero computational overhead, but also excels at general-purpose video inpainting with editing. Code is available at https://github.com/yeobinhong/InverseCrafter.


【8】Feasibility of AI-Assisted Programming for End-User Development
标题:人工智能辅助编程用于最终用户开发的可行性
链接:https://arxiv.org/abs/2512.05666

作者:Irene Weber
备注:12 pages, 3 figures
摘要:最终用户开发,即非程序员创建或调整自己的数字化工具,可以在推动组织内部的数字化转型方面发挥关键作用。目前,低代码/无代码平台被广泛用于通过可视化编程实现最终用户开发,从而最大限度地减少手动编码的需求。生成式人工智能的最新进展,特别是基于大型语言模型的助手和“副驾驶”,开辟了新的可能性,因为它们可以使最终用户生成和改进编程代码,并直接从自然语言提示中构建应用程序。这种方法在这里被称为AI辅助的最终用户编码,与现有的可视化LCNC平台相比,它具有更大的灵活性、更广泛的适用性、更快的开发速度、更高的可重用性和更少的供应商锁定。本文研究了人工智能辅助的最终用户编码是否是最终用户开发的可行范例,这可能会在未来补充甚至取代LCNC模型。为了探索这一点,我们进行了一项案例研究,要求非程序员通过与人工智能助手的互动开发一个基本的Web应用程序。大多数研究参与者在合理的时间内成功完成了任务,并表示支持人工智能辅助最终用户编码作为最终用户开发的可行方法。本文介绍了研究设计,分析了结果,并讨论了潜在的影响,为实践,未来的研究和学术教学。
摘要:End-user development, where non-programmers create or adapt their own digital tools, can play a key role in driving digital transformation within organizations. Currently, low-code/no-code platforms are widely used to enable end-user development through visual programming, minimizing the need for manual coding. Recent advancements in generative AI, particularly large language model-based assistants and "copilots", open new possibilities, as they may enable end users to generate and refine programming code and build apps directly from natural language prompts. This approach, here referred to as AI-assisted end-user coding, promises greater flexibility, broader applicability, faster development, improved reusability, and reduced vendor lock-in compared to the established visual LCNC platforms. This paper investigates whether AI-assisted end-user coding is a feasible paradigm for end-user development, which may complement or even replace the LCNC model in the future. To explore this, we conducted a case study in which non-programmers were asked to develop a basic web app through interaction with AI assistants. The majority of study participants successfully completed the task in reasonable time and also expressed support for AI-assisted end-user coding as a viable approach for end-user development. The paper presents the study design, analyzes the outcomes, and discusses potential implications for practice, future research, and academic teaching.


【9】Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales
标题:超参数迁移使矩阵预处理优化器获得一致的跨尺度增益
链接:https://arxiv.org/abs/2512.05620

作者:Shikai Qiu,Zixi Chen,Hoang Phan,Qi Lei,Andrew Gordon Wilson
备注:NeurIPS 2025. Code available at: https://github.com/charliezchen/scaling-matrix-preconditioning
摘要:最近提出的几种利用矩阵级预条件的深度学习优化器,相对于当前占主导地位的优化器AdamW展现出有希望的加速,特别是在相对小规模的实验中。然而,验证和复现其成功的努力得到的结果好坏参半。为了更好地理解这些优化器在大规模下的有效性,本工作在$μ$P等先前工作的基础上,研究了如何通过超参数迁移来扩展预条件优化器。我们研究了最优学习率和权重衰减应如何随模型宽度和深度缩放,涵盖包括Shampoo、SOAP和Muon在内的多种优化器,并考虑了分块(blocking)和嫁接(grafting)等常用技术的影响。我们发现,按照$μ$P缩放学习率可以改善迁移,但仍会受到显著的有限宽度偏差影响,导致最优学习率漂移;我们表明这可以通过分块和显式谱归一化来缓解。对于计算最优缩放,我们发现将独立权重衰减按$1/\mathrm{width}$缩放在各优化器中几乎是最优的。应用这些缩放规则,我们表明在训练参数规模从1.9亿到14亿的Llama架构语言模型时,Muon和Shampoo相对AdamW始终实现$1.4\times$和$1.3\times$的加速,而在错误的缩放下,加速会随规模迅速消失。基于这些结果和进一步的消融实验,我们认为,在现实的调参预算下,研究最优超参数迁移对于在大规模下可靠地比较优化器至关重要。
摘要:Several recently introduced deep learning optimizers utilizing matrix-level preconditioning have shown promising speedups relative to the current dominant optimizer AdamW, particularly in relatively small-scale experiments. However, efforts to validate and replicate their successes have reported mixed results. To better understand the effectiveness of these optimizers at scale, in this work we investigate how to scale preconditioned optimizers via hyperparameter transfer, building on prior works such as $μ$P. We study how the optimal learning rate and weight decay should scale with model width and depth for a wide range of optimizers, including Shampoo, SOAP, and Muon, accounting for the impact of commonly used techniques such as blocking and grafting. We find that scaling the learning rate according to $μ$P improves transfer, but can still suffer from significant finite-width deviations that cause drifting optimal learning rates, which we show can be mitigated by blocking and explicit spectral normalization. For compute-optimal scaling, we find scaling independent weight decay as $1/\mathrm{width}$ is nearly optimal across optimizers. Applying these scaling rules, we show Muon and Shampoo consistently achieve $1.4\times$ and $1.3\times$ speedup over AdamW for training Llama-architecture language models of sizes ranging from $190$M to $1.4$B, whereas the speedup vanishes rapidly with scale under incorrect scaling. Based on these results and further ablations, we argue that studying optimal hyperparameter transfer is essential for reliably comparing optimizers at scale given a realistic tuning budget.
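The scaling rules summarized above lend themselves to a tiny numerical sketch. The snippet below illustrates muP-style hyperparameter transfer — learning rate and, per the abstract, weight decay both scaled as 1/width for matrix-like parameters — with made-up base values. It is a simplified reading of the abstract, not the authors' code.

```python
# Illustrative sketch (not the paper's code): transferring a base learning
# rate and weight decay tuned at a small "proxy" width to larger widths.
# Under muP, hidden-layer learning rates scale as 1/width for matrix-like
# parameters; the abstract additionally suggests scaling the
# width-independent weight decay as 1/width for compute-optimal training.

def transfer_hyperparams(base_lr, base_wd, base_width, target_width):
    """Scale (lr, wd) tuned at base_width to target_width.

    muP-style rule for matrix parameters: lr ~ 1/width, plus the
    abstract's rule wd ~ 1/width. Both are assumptions of this sketch.
    """
    ratio = base_width / target_width
    return base_lr * ratio, base_wd * ratio

# Tune at width 256, then transfer to progressively wider models.
base_lr, base_wd = 3e-3, 0.1
for width in (256, 1024, 4096):
    lr, wd = transfer_hyperparams(base_lr, base_wd, 256, width)
    print(f"width={width:5d}  lr={lr:.2e}  wd={wd:.2e}")
```

Under this rule the small-width sweep is done once and the optimum is reused at every larger width, which is the "realistic tuning budget" setting the abstract argues for.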


【10】Credal and Interval Deep Evidential Classifications
标题:Credal和Interval深度证据分类
链接:https://arxiv.org/abs/2512.05526

作者:Michele Caprio,Shireen K. Manchingal,Fabio Cuzzolin
摘要:不确定性量化(UQ)是人工智能(AI)领域的一个关键挑战,深刻影响着决策、风险评估和模型可靠性。在本文中,我们介绍了Credal和Interval深度证据分类(分别为CDEC和IDEC)作为解决分类任务中UQ的新方法。CDEC和IDEC分别利用一个可信集(闭凸概率集合)和一个证据预测分布区间,使我们能够避免对训练数据的过拟合,并系统地评估认知(可约)和偶然(不可约)不确定性。当这些不确定性超过可接受的阈值时,CDEC和IDEC能够放弃分类,并相应地标记过高的认知或偶然不确定性。相反,在可接受的不确定性范围内,CDEC和IDEC提供一组具有稳健概率保证的标签。CDEC和IDEC使用标准反向传播和源自证据理论的损失函数进行训练。它们克服了先前工作的缺点,并扩展了当前的证据深度学习文献。通过在MNIST、CIFAR-10和CIFAR-100及其自然OoD偏移(F-MNIST/K-MNIST、SVHN/Intel、TinyImageNet)上的大量实验,我们证明了CDEC和IDEC实现了具有竞争力的预测精度、在认知和总体不确定性下最先进的OoD检测,以及在分布偏移下可靠扩展的紧致且校准良好的预测区域。对集成规模的消融实验进一步表明,CDEC仅用一个小集成即可获得稳定的不确定性估计。
摘要:Uncertainty Quantification (UQ) presents a pivotal challenge in the field of Artificial Intelligence (AI), profoundly impacting decision-making, risk assessment and model reliability. In this paper, we introduce Credal and Interval Deep Evidential Classifications (CDEC and IDEC, respectively) as novel approaches to address UQ in classification tasks. CDEC and IDEC leverage a credal set (closed and convex set of probabilities) and an interval of evidential predictive distributions, respectively, allowing us to avoid overfitting to the training data and to systematically assess both epistemic (reducible) and aleatoric (irreducible) uncertainties. When those surpass acceptable thresholds, CDEC and IDEC have the capability to abstain from classification and flag an excess of epistemic or aleatoric uncertainty, as relevant. Conversely, within acceptable uncertainty bounds, CDEC and IDEC provide a collection of labels with robust probabilistic guarantees. CDEC and IDEC are trained using standard backpropagation and a loss function that draws from the theory of evidence. They overcome the shortcomings of previous efforts, and extend the current evidential deep learning literature. Through extensive experiments on MNIST, CIFAR-10 and CIFAR-100, together with their natural OoD shifts (F-MNIST/K-MNIST, SVHN/Intel, TinyImageNet), we show that CDEC and IDEC achieve competitive predictive accuracy, state-of-the-art OoD detection under epistemic and total uncertainty, and tight, well-calibrated prediction regions that expand reliably under distribution shift. An ablation over ensemble size further demonstrates that CDEC attains stable uncertainty estimates with only a small ensemble.
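The abstract's predict-or-abstain behavior can be sketched with a standard Dirichlet ("evidential") parameterization — an assumption of this illustration, not necessarily CDEC/IDEC's exact construction. Low total evidence flags epistemic uncertainty, a flat predictive mean flags aleatoric uncertainty, and either triggers abstention:

```python
# Hedged sketch of the abstain-or-predict logic described in the abstract,
# using a standard Dirichlet evidential parameterization (an assumption of
# this sketch, not necessarily the paper's exact construction).
import math

def evidential_decision(alphas, epi_threshold=0.5, ale_threshold=0.9):
    """alphas: Dirichlet concentration parameters, one per class."""
    K = len(alphas)
    S = sum(alphas)
    probs = [a / S for a in alphas]
    # Vacuity (epistemic proxy): high when total evidence S is low.
    epistemic = K / S
    # Normalized entropy of the mean (aleatoric proxy).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    aleatoric = entropy / math.log(K)
    if epistemic > epi_threshold:
        return ("abstain", "epistemic")
    if aleatoric > ale_threshold:
        return ("abstain", "aleatoric")
    return ("predict", max(range(K), key=lambda k: probs[k]))

print(evidential_decision([1.1, 1.0, 1.0]))   # barely any evidence
print(evidential_decision([50.0, 2.0, 1.0]))  # confident prediction
```

The two thresholds play the role of the abstract's "acceptable uncertainty bounds"; their values here are arbitrary.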


【11】Turbulence Regression
标题:湍流回归
链接:https://arxiv.org/abs/2512.05483

作者:Yingang Fan,Binjie Ding,Baiyi Chen
摘要:空气湍流是指气流在流动过程中,由于速度、压力或方向的剧烈变化而产生的无序、不规则的运动状态。各种复杂的因素导致了复杂的低空湍流后果。在目前的观测条件下,特别是当仅使用风廓线雷达数据时,传统方法很难准确预测湍流状态。因此,本文介绍了一种利用离散数据的NeuTucker分解模型。针对连续稀疏的三维风场数据,构建了基于Tucker神经网络的低秩Tucker分解模型,以捕捉三维风场数据中潜在的相互作用。因此,这里提出了两个核心思想:1)离散化连续输入数据,以适应像NeuTucF这样需要离散数据输入的模型。2)构造一个四维Tucker相互作用张量来表示不同海拔和三维风速之间所有可能的时空相互作用。在估计真实数据集中的缺失观测值时,与各种常见的回归模型相比,这种离散化NeuTucF模型表现出优越的性能。
摘要:Air turbulence refers to the disordered and irregular motion state generated by drastic changes in velocity, pressure, or direction during airflow. Various complex factors lead to intricate low-altitude turbulence outcomes. Under current observational conditions, especially when using only wind profile radar data, traditional methods struggle to accurately predict turbulence states. Therefore, this paper introduces a NeuTucker decomposition model utilizing discretized data. Designed for continuous yet sparse three-dimensional wind field data, it constructs a low-rank Tucker decomposition model based on a Tucker neural network to capture the latent interactions within the three-dimensional wind field data. Therefore, two core ideas are proposed here: 1) Discretizing continuous input data to adapt to models like NeuTucF that require discrete data inputs. 2) Constructing a four-dimensional Tucker interaction tensor to represent all possible spatio-temporal interactions among different elevations and three-dimensional wind speeds. In estimating missing observations in real datasets, this discretized NeuTucF model demonstrates superior performance compared to various common regression models.
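The first core idea — discretizing continuous wind-field inputs so a Tucker-style model expecting discrete indices (like the described NeuTucF) can consume them — amounts to simple binning. The bin edges below are arbitrary illustrative choices:

```python
# Sketch of core idea 1 from the abstract: mapping continuous wind-field
# quantities onto discrete indices so a Tucker-style model that expects
# discrete inputs can consume them. Bin edges are illustrative only.

def discretize(value, edges):
    """Return the index of the half-open bin [edges[i], edges[i+1]) that
    contains value; values above the range clamp to the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2

# e.g. wind speed (m/s) into 4 bins, altitude (m) into 3 bins
speed_edges = [0.0, 2.0, 5.0, 10.0, 30.0]
alt_edges = [0.0, 500.0, 1500.0, 3000.0]

obs = {"speed": 6.3, "altitude": 820.0}
index = (discretize(obs["speed"], speed_edges),
         discretize(obs["altitude"], alt_edges))
print(index)  # discrete (speed_bin, altitude_bin) fed to the tensor model
```

Each discrete tuple then addresses one slot of the four-dimensional interaction tensor the abstract describes.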


【12】EXR: An Interactive Immersive EHR Visualization in Extended Reality
标题:EXR:延展实境中的交互式沉浸式EHR可视化
链接:https://arxiv.org/abs/2512.05438

作者:Benoit Marteau,Shaun Q. Y. Tan,Jieru Li,Andrew Hornback,Yishan Zhong,Shaunna Wang,Christian Lowson,Jason Woloff,Joshua M. Pahys,Steven W. Hwang,Coleman Hilton,May D. Wang
备注:11 pages, 6 figures. Preprint version. This paper has been accepted to IEEE ICIR 2025. This is the author-prepared version and not the final published version. The final version will appear in IEEE Xplore.
摘要:本文介绍了一个延展实境(XR)平台的沉浸式,交互式可视化的电子健康档案(EHR)的设计和实现。该系统通过将结构化和非结构化患者数据可视化到共享的3D环境中,从而扩展了传统的2D界面,实现了直观的探索和实时协作。模块化基础设施将基于FHIR的EHR数据与体积医学成像和AI生成的分割相集成,确保与现代医疗保健系统的互操作性。该平台的功能使用合成EHR数据集和通过AI驱动的分割管道处理的计算机断层扫描(CT)衍生脊柱模型进行演示。这项工作表明,这种集成的XR解决方案可以为下一代临床决策支持工具奠定基础,在交互式和空间丰富的环境中可以直接访问高级数据基础设施。
摘要:This paper presents the design and implementation of an Extended Reality (XR) platform for immersive, interactive visualization of Electronic Health Records (EHRs). The system extends beyond conventional 2D interfaces by visualizing both structured and unstructured patient data into a shared 3D environment, enabling intuitive exploration and real-time collaboration. The modular infrastructure integrates FHIR-based EHR data with volumetric medical imaging and AI-generated segmentation, ensuring interoperability with modern healthcare systems. The platform's capabilities are demonstrated using synthetic EHR datasets and computed tomography (CT)-derived spine models processed through an AI-powered segmentation pipeline. This work suggests that such integrated XR solutions could form the foundation for next-generation clinical decision-support tools, where advanced data infrastructures are directly accessible in an interactive and spatially rich environment.


【13】ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering
标题:ArtistMus:全球多元化、以艺术家为中心的检索增强音乐问题解答基准
链接:https://arxiv.org/abs/2512.05430

作者:Daeyong Kwon,SeungHeon Doh,Juhan Nam
备注:Submitted to LREC 2026. This work is an evolution of our earlier preprint arXiv:2507.23334
摘要:大型语言模型(LLM)的最新进展已经改变了开放域问答,但由于预训练数据中的音乐知识稀疏,它们在音乐相关推理中的有效性仍然有限。虽然音乐信息检索和计算音乐学已经探索了结构化和多模态理解,但很少有资源支持基于艺术家元数据或历史背景的事实性和上下文性音乐问答(MQA)。我们介绍MusWikiDB,一个包含来自14.4万个音乐相关维基百科页面的320万段落的向量数据库,以及ArtistMus,一个关于500位不同艺术家的1,000个问题的基准,附有流派、出道年份和主题等元数据。这些资源使得对检索增强生成(RAG)在MQA上的系统评估成为可能。实验表明,RAG显著提高了事实准确性;开源模型最高提升56.8个百分点(例如,Qwen3 8B从35.0提高到91.8),接近专有模型的性能。RAG风格的微调进一步提升了事实回忆和上下文推理,改善了域内和域外基准测试的结果。与通用维基百科语料库相比,MusWikiDB的准确率还高出约6个百分点,检索速度快40%。我们发布MusWikiDB和ArtistMus,以推进音乐信息检索和特定领域问答的研究,为音乐等文化丰富领域的检索增强推理奠定基础。
摘要:Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. RAG-style fine-tuning further boosts both factual recall and contextual reasoning, improving results on both in-domain and out-of-domain benchmarks. MusWikiDB also yields approximately 6 percentage points higher accuracy and 40% faster retrieval than a general-purpose Wikipedia corpus. We release MusWikiDB and ArtistMus to advance research in music information retrieval and domain-specific question answering, establishing a foundation for retrieval-augmented reasoning in culturally rich domains such as music.
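The retrieve-then-generate loop the abstract evaluates can be sketched end-to-end with a toy corpus. Bag-of-words cosine similarity below stands in for MusWikiDB's dense vector search, and the passages are fabricated; only the overall RAG flow — retrieve top passages, then build a grounded prompt — reflects the abstract.

```python
# Minimal bag-of-words retrieval-augmented QA sketch. The passages and the
# scoring are toy stand-ins for MusWikiDB's dense vector retrieval; only
# the retrieve-then-prompt flow mirrors the abstract.
from collections import Counter
import math

passages = [
    "The artist debuted in 1994 with a jazz-influenced album.",
    "Their 2003 tour covered twelve countries in Europe and Asia.",
    "The band's genre blends folk rock with electronic elements.",
]

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=1):
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(p.lower().split())), p) for p in passages]
    return [p for _, p in sorted(scored, reverse=True)[:k]]

context = retrieve("what year did the artist debut")
prompt = f"Context: {context[0]}\nQuestion: What year did the artist debut?"
print(prompt)  # the prompt an LLM would answer from
```

In the paper's setting the prompt would then be answered by an LLM such as Qwen3 8B; the gains reported above come from grounding that answer in the retrieved context.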


【14】RevoNAD: Reflective Evolutionary Exploration for Neural Architecture Design
标题:RevoNAD:神经架构设计的反思进化探索
链接:https://arxiv.org/abs/2512.05403

作者:Gyusam Chang,Jeongyoon Yoon,Shin han yi,JaeHyeok Lee,Sujin Jang,Sangpil Kim
摘要:利用大型语言模型(LLM)的最新进展使神经架构设计(NAD)系统能够生成不受手工预定义搜索空间限制的新架构。尽管如此,LLM驱动的生成仍然具有挑战性:词元级的设计循环是离散且不可微的,使反馈无法平滑地指导架构改进。反过来,当建设性推理缺乏良好基础时,这些方法通常会模式坍缩到冗余结构,或漂移到不可行的设计。我们介绍RevoNAD,一个反思进化协调器,有效地桥接基于LLM的推理与反馈对齐的架构搜索。首先,RevoNAD提出多轮多专家共识,将孤立的设计规则转化为有意义的架构线索。然后,自适应反思探索利用奖励方差来调整探索程度;当反馈不确定时进行探索,当达到稳定时进行细化。最后,Pareto引导的进化选择有效地促进了联合优化准确率、效率、延迟、置信度和结构多样性的架构。在CIFAR10、CIFAR100、ImageNet16-120、COCO-5K和Cityscape上,RevoNAD实现了最先进的性能。消融和迁移研究进一步验证了RevoNAD在实现实际可靠、可部署的神经架构设计方面的有效性。
摘要:Recent progress in leveraging large language models (LLMs) has enabled Neural Architecture Design (NAD) systems to generate new architecture not limited from manually predefined search space. Nevertheless, LLM-driven generation remains challenging: the token-level design loop is discrete and non-differentiable, preventing feedback from smoothly guiding architectural improvement. These methods, in turn, commonly suffer from mode collapse into redundant structures or drift toward infeasible designs when constructive reasoning is not well grounded. We introduce RevoNAD, a reflective evolutionary orchestrator that effectively bridges LLM-based reasoning with feedback-aligned architectural search. First, RevoNAD presents a Multi-round Multi-expert Consensus to transfer isolated design rules into meaningful architectural clues. Then, Adaptive Reflective Exploration adjusts the degree of exploration leveraging reward variance; it explores when feedback is uncertain and refines when stability is reached. Finally, Pareto-guided Evolutionary Selection effectively promotes architectures that jointly optimize accuracy, efficiency, latency, confidence, and structural diversity. Across CIFAR10, CIFAR100, ImageNet16-120, COCO-5K, and Cityscape, RevoNAD achieves state-of-the-art performance. Ablation and transfer studies further validate the effectiveness of RevoNAD in allowing practically reliable, and deployable neural architecture design.
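The Pareto-guided selection stage can be sketched directly: keep every candidate architecture that is not dominated across the five objectives the abstract lists. Scores below are fabricated, with latency negated so all axes are higher-is-better.

```python
# Sketch of Pareto-guided selection over multiple objectives, as in the
# abstract's final stage. Candidate scores are made up; each tuple is
# (accuracy, efficiency, -latency, confidence, diversity).

def dominates(a, b):
    """a dominates b if >= in every objective and > in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

archs = [
    (0.92, 0.80, -12.0, 0.70, 0.50),
    (0.90, 0.90, -10.0, 0.80, 0.60),
    (0.89, 0.70, -15.0, 0.60, 0.40),  # dominated by both rows above
    (0.95, 0.60, -20.0, 0.90, 0.30),
]
front = pareto_front(archs)
print(len(front), "non-dominated architectures")
```

Evolutionary selection then breeds only from the surviving front, which is how the method promotes architectures that trade off all five objectives rather than maximizing one.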


【15】Generalization Beyond Benchmarks: Evaluating Learnable Protein-Ligand Scoring Functions on Unseen Targets
标题:超越基准的泛化:在未见靶点上评估可学习的蛋白质-配体评分函数
链接:https://arxiv.org/abs/2512.05386

作者:Jakub Kopko,David Graber,Saltuk Mustafa Eyrilmez,Stanislav Mazurenko,David Bednar,Jiri Sedlar,Josef Sivic
备注:15 pages, 6 figures, submitted to NeurIPS 2025 AI4Science Workshop
摘要:随着机器学习日益成为分子设计的核心,确保可学习的蛋白质-配体评分函数在新蛋白质靶点上的可靠性至关重要。虽然许多评分函数在标准基准上表现良好,但它们在训练数据之外的泛化能力仍然是一个重大挑战。在这项工作中,我们在模拟仅有少量已知结构和实验亲和力测量的靶点上进行评估的数据集划分上,评估了最先进评分函数的泛化能力。我们的分析表明,常用基准并没有反映出泛化到新靶点的真正挑战。我们还研究了大规模自监督预训练是否可以弥合这种泛化差距,并提供了其潜力的初步证据。此外,我们探讨了利用有限的测试靶点数据来提高评分函数性能的简单方法的有效性。我们的研究结果强调了更严格评估方案的必要性,并为设计预测能力可扩展到新蛋白质靶点的评分函数提供了实用指导。
摘要:As machine learning becomes increasingly central to molecular design, it is vital to ensure the reliability of learnable protein-ligand scoring functions on novel protein targets. While many scoring functions perform well on standard benchmarks, their ability to generalize beyond training data remains a significant challenge. In this work, we evaluate the generalization capability of state-of-the-art scoring functions on dataset splits that simulate evaluation on targets with a limited number of known structures and experimental affinity measurements. Our analysis reveals that the commonly used benchmarks do not reflect the true challenge of generalizing to novel targets. We also investigate whether large-scale self-supervised pretraining can bridge this generalization gap and we provide preliminary evidence of its potential. Furthermore, we probe the efficacy of simple methods that leverage limited test-target data to improve scoring function performance. Our findings underscore the need for more rigorous evaluation protocols and offer practical guidance for designing scoring functions with predictive power extending to novel protein targets.
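The splits the abstract describes — holding out whole protein targets rather than random complexes — can be sketched as a group-aware split; the records and target names below are fabricated examples.

```python
# Sketch of a target-held-out split: all complexes sharing a protein
# target go to the same side, simulating evaluation on unseen targets
# as the abstract describes. Records are fabricated.

def split_by_target(records, test_targets):
    train = [r for r in records if r["target"] not in test_targets]
    test = [r for r in records if r["target"] in test_targets]
    return train, test

records = [
    {"id": "c1", "target": "kinase_A", "affinity": 7.2},
    {"id": "c2", "target": "kinase_A", "affinity": 6.8},
    {"id": "c3", "target": "protease_B", "affinity": 5.1},
    {"id": "c4", "target": "gpcr_C", "affinity": 8.0},
]
train, test = split_by_target(records, {"gpcr_C"})
# No target leaks across the split:
assert not {r["target"] for r in train} & {r["target"] for r in test}
print(len(train), "train /", len(test), "test complexes")
```

A random per-complex split would instead place `c1` and `c2` on opposite sides, leaking target information — which is the benchmark flaw the paper highlights.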


【16】Interaction Tensor Shap
标题:交互张量SHAP
链接:https://arxiv.org/abs/2512.05338

作者:Hiroki Hasegawa,Yukihiko Okada
备注:30 pages
摘要:机器学习模型变得越来越深、维度越来越高,使得人们很难理解单个特征及特征组合如何影响其预测。虽然基于Shapley值的方法提供了有原则的特征归因,但现有公式无法高效地评估高阶交互:Shapley Taylor交互指数(STII)需要对子集进行指数级枚举,而当前基于张量的方法(如边际SHAP张量(MST))仅限于一阶效应。核心问题是,没有现有框架能同时保持STII的公理化精确性,并避免高阶离散导数固有的指数级计算爆炸。在这里,我们证明高阶Shapley交互可以精确表示为张量网络收缩,从而在张量列车(Tensor Train,TT)结构下实现多项式时间和polylog深度的计算。我们引入交互张量SHAP(IT SHAP),它将STII重新表述为一个值张量与一个权重张量的收缩,并假设权重张量具有多项式TT秩的有限状态TT表示。在TT结构的模型和分布张量下,我们证明IT SHAP将STII的指数复杂度Theta(4^n)降低到NC2并行时间。这些结果表明,IT SHAP为高维模型中的主效应和高阶交互提供了统一、公理化且计算上可处理的表述。该框架为可扩展的交互感知可解释人工智能奠定了基础,并对大型黑盒模型产生影响,这些模型的组合结构此前使交互分析不可行。
摘要:Machine learning models have grown increasingly deep and high dimensional, making it difficult to understand how individual and combined features influence their predictions. While Shapley value based methods provide principled feature attributions, existing formulations cannot tractably evaluate higher order interactions: the Shapley Taylor Interaction Index (STII) requires exponential scale enumeration of subsets, and current tensor based approaches such as the Marginal SHAP Tensor (MST) are restricted to first order effects. The central problem is that no existing framework simultaneously preserves the axiomatic exactness of STII and avoids the exponential computational blow up inherent to high order discrete derivatives. Here we show that high order Shapley interactions can be represented exactly as tensor network contractions, enabling polynomial time and polylog depth computation under Tensor Train (TT) structure. We introduce Interaction Tensor SHAP (IT SHAP), which reformulates STII as the contraction of a Value Tensor and a Weight Tensor, and assume a finite state TT representation of the Weight Tensor with polynomial TT ranks. Under TT structured model and distribution tensors, we show that IT SHAP reduces the exponential complexity Theta(4^n) of STII to NC2 parallel time. These results demonstrate that IT SHAP provides a unified, axiomatic, and computationally tractable formulation of main effects and higher order interactions in high dimensional models. This framework establishes a foundation for scalable interaction aware explainable AI, with implications for large black box models whose combinatorial structure has previously rendered interaction analysis infeasible.
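The source of the claimed tractability — evaluating entries of an exponentially large tensor through a chain of small tensor-train cores — can be sketched in a few lines. The cores below are tiny hand-written examples, not the paper's value or weight tensors.

```python
# Sketch of the tensor-train idea behind IT-SHAP's tractability: an n-way
# tensor stored as a chain of small 3-way cores, so evaluating one entry
# costs a chain of small matrix products instead of touching all 2^n (or
# 4^n) values. Cores here are tiny hand-written examples.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def tt_entry(cores, index):
    """cores[k][i] is an r_k x r_{k+1} matrix slice for symbol i."""
    M = cores[0][index[0]]              # 1 x r_1
    for core, i in zip(cores[1:], index[1:]):
        M = matmul(M, core[i])
    return M[0][0]                      # boundary ranks are 1

# A rank-1 TT representing T[i, j] = x[i] * y[j]:
x, y = [2.0, 3.0], [5.0, 7.0]
cores = [
    [[[xi]] for xi in x],   # each slice is a 1x1 matrix
    [[[yi]] for yi in y],
]
print(tt_entry(cores, (1, 0)))  # -> 15.0, i.e. 3.0 * 5.0
```

With polynomial TT ranks, each entry costs a product of small matrices, which is what makes the contraction parallelizable to polylog depth.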


【17】Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay
标题:通过解耦优先经验回放增强连续控制任务上的深度确定性策略梯度
链接:https://arxiv.org/abs/2512.05320

作者:Mehmet Efe Lorasdagi,Dogan Can Cicek,Furkan Burak Mutlu,Suleyman Serdar Kozat
摘要:背景:基于深度确定性策略梯度的强化学习算法采用Actor-Critic架构,其中两个网络通常使用相同批次的回放转移进行训练。然而,Actor和Critic的学习目标和更新动态不同,这引发了统一使用转移是否最优的疑问。   目的:我们旨在通过解耦用于训练Actor和Critic的转移批次来提高深度确定性策略梯度算法的性能。我们的目标是设计一种经验回放机制,通过使用单独、定制的批次,为每个组件提供合适的学习信号。   方法:我们提出解耦优先经验回放(DPER),一种允许为Actor和Critic独立采样转移批次的新方法。DPER可以集成到任何在连续控制域中运行的离策略深度强化学习算法中。我们将DPER与最先进的双延迟DDPG(TD3)算法相结合,并在标准连续控制基准上评估其性能。   结果:在OpenAI Gym套件的多个MuJoCo任务中,DPER优于普通经验回放和优先经验回放等传统经验回放策略。   结论:我们的研究结果表明,为Actor和Critic网络解耦经验回放可以改善训练动态和最终策略质量。DPER提供了一种可推广的机制,可提升一大类Actor-Critic离策略强化学习算法的性能。
摘要:Background: Deep Deterministic Policy Gradient-based reinforcement learning algorithms utilize Actor-Critic architectures, where both networks are typically trained using identical batches of replayed transitions. However, the learning objectives and update dynamics of the Actor and Critic differ, raising concerns about whether uniform transition usage is optimal.   Objectives: We aim to improve the performance of deep deterministic policy gradient algorithms by decoupling the transition batches used to train the Actor and the Critic. Our goal is to design an experience replay mechanism that provides appropriate learning signals to each component by using separate, tailored batches.   Methods: We introduce Decoupled Prioritized Experience Replay (DPER), a novel approach that allows independent sampling of transition batches for the Actor and the Critic. DPER can be integrated into any off-policy deep reinforcement learning algorithm that operates in continuous control domains. We combine DPER with the state-of-the-art Twin Delayed DDPG algorithm and evaluate its performance across standard continuous control benchmarks.   Results: DPER outperforms conventional experience replay strategies such as vanilla experience replay and prioritized experience replay in multiple MuJoCo tasks from the OpenAI Gym suite.   Conclusions: Our findings show that decoupling experience replay for Actor and Critic networks can enhance training dynamics and final policy quality. DPER offers a generalizable mechanism that enhances performance for a wide class of actor-critic off-policy reinforcement learning algorithms.
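DPER's central mechanism, as read from the abstract, is sampling two independently prioritized minibatches from one shared buffer. The priority signals below (TD error for the Critic, uniform for the Actor) are illustrative placeholders, not the paper's exact scheme.

```python
# Sketch of DPER's central idea as read from the abstract: sample
# separate, independently prioritized minibatches for the Actor and the
# Critic from one shared buffer. Priorities and sizes are illustrative.
import random

random.seed(0)
buffer = [{"id": i, "td_error": abs(random.gauss(0, 1))} for i in range(100)]

def sample_batch(buffer, batch_size, priority_fn):
    weights = [priority_fn(t) for t in buffer]
    return random.choices(buffer, weights=weights, k=batch_size)

# Critic: favor transitions with large TD error (PER-style signal).
critic_batch = sample_batch(buffer, 8, lambda t: t["td_error"] + 1e-3)
# Actor: a different priority signal, here uniform for illustration.
actor_batch = sample_batch(buffer, 8, lambda t: 1.0)

print(len(critic_batch), len(actor_batch))
# The two batches are drawn independently, so they generally differ.
```

In standard PER both networks would consume the `critic_batch`; decoupling simply lets each network draw from its own distribution over the same buffer.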


【18】When unlearning is free: leveraging low influence points to reduce computational costs
标题:当遗忘是免费的:利用低影响力数据点降低计算成本
链接:https://arxiv.org/abs/2512.05254

作者:Anat Kleiman,Robert Fisher,Ben Deaner,Udi Wieder
摘要:随着机器学习中对数据隐私的担忧不断增加,从已训练模型中遗忘(即移除)特定数据点的能力变得越来越重要。虽然最先进的遗忘方法已相应出现,但它们通常平等对待遗忘集中的所有数据点。在这项工作中,我们通过追问对模型学习影响可以忽略不计的点是否需要被移除,来挑战这种做法。通过对语言和视觉任务上影响函数的比较分析,我们识别出对模型输出影响可忽略的训练数据子集。利用这一洞察,我们提出了一个高效的遗忘框架,在遗忘之前缩减数据集规模,从而在真实世界的实证示例中节省大量计算(最高约50%)。
摘要 :As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state of the art unlearning methods have emerged in response, they typically treat all points in the forget set equally. In this work, we challenge this approach by asking whether points that have a negligible impact on the model's learning need to be removed. Through a comparative analysis of influence functions across language and vision tasks, we identify subsets of training data with negligible impact on model outputs. Leveraging this insight, we propose an efficient unlearning framework that reduces the size of datasets before unlearning leading to significant computational savings (up to approximately 50 percent) on real world empirical examples.
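The proposed pipeline — score forget-set points by influence, drop the negligible ones, then run any unlearning method on what remains — reduces to a threshold filter once influences are computed. The scores below are fabricated.

```python
# Sketch of the abstract's idea: drop forget-set points whose
# (precomputed) influence on the model is below a threshold, shrinking
# the workload handed to the actual unlearning procedure. The influence
# scores here are fabricated placeholders.

def prune_forget_set(forget_set, influences, threshold):
    """Keep only points whose |influence| exceeds threshold."""
    kept = [x for x, s in zip(forget_set, influences) if abs(s) > threshold]
    saved = 1 - len(kept) / len(forget_set)
    return kept, saved

forget_set = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e", "doc_f"]
influences = [0.002, 0.9, 0.0005, 1.4, 0.03, 0.001]
kept, saved = prune_forget_set(forget_set, influences, threshold=0.01)
print(kept, f"compute saved ~{saved:.0%}")
```

Since unlearning cost typically scales with forget-set size, halving the set (as in this toy run) roughly matches the "up to approximately 50 percent" savings the abstract reports.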


【19】Edged Weisfeiler-Lehman Algorithm
标题:带边特征的Weisfeiler-Lehman算法
链接:https://arxiv.org/abs/2512.05238

作者:Xiao Yue,Bo Liu,Feng Zhang,Guangzhi Qu
备注:Author's Accepted Manuscript (AAM) of ICANN 2024 paper published in LNCS (Springer). Final version available at: https://link.springer.com/chapter/10.1007/978-3-031-72344-5_7
摘要:作为图学习的经典方法,传播-聚合方法被许多图神经网络(GNN)广泛采用,其中节点的表示通过递归聚合自身及邻居节点的表示来更新。与传播-聚合方法类似,Weisfeiler-Lehman(1-WL)算法根据节点及其邻居节点的颜色表示,通过颜色细化来检验同构。然而,1-WL并不利用任何边特征(标签),这意味着在某些领域中利用边特征存在改进空间。为了解决这一局限,我们提出了一种新的带边WL算法(E-WL),它扩展了原始的1-WL算法以纳入边特征。在E-WL算法的基础上,我们还引入了一个带边图同构网络(EGIN)模型以进一步利用边特征,从而解决了许多GNN不利用图数据任何边特征这一关键缺点。我们使用12个带边特征的基准图数据集评估了所提模型的性能,并与一些最先进的基线模型进行了比较。实验结果表明,我们提出的EGIN模型总体上在图分类任务的图学习中表现出优越的性能。
摘要:As a classical approach on graph learning, the propagation-aggregation methodology is widely exploited by many of Graph Neural Networks (GNNs), wherein the representation of a node is updated by aggregating representations from itself and neighbor nodes recursively. Similar to the propagation-aggregation methodology, the Weisfeiler-Lehman (1-WL) algorithm tests isomorphism through color refinement according to color representations of a node and its neighbor nodes. However, 1-WL does not leverage any edge features (labels), presenting a potential improvement on exploiting edge features in some fields. To address this limitation, we proposed a novel Edged-WL algorithm (E-WL) which extends the original 1-WL algorithm to incorporate edge features. Building upon the E-WL algorithm, we also introduce an Edged Graph Isomorphism Network (EGIN) model for further exploiting edge features, which addresses one key drawback in many GNNs that do not utilize any edge features of graph data. We evaluated the performance of proposed models using 12 edge-featured benchmark graph datasets and compared them with some state-of-the-art baseline models. Experimental results indicate that our proposed EGIN models, in general, demonstrate superior performance in graph learning on graph classification tasks.
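One refinement round of an edge-aware scheme can be sketched as follows: each node's new color combines its old color with the multiset of (edge label, neighbor color) pairs. The exact E-WL update may differ; this only shows where edge features enter, and why edge-labeled graphs that plain 1-WL conflates become distinguishable.

```python
# Sketch of one WL color-refinement round extended with edge labels, in
# the spirit of the described E-WL (the paper's exact update may differ).

def wl_round(colors, edges):
    """colors: {node: color}; edges: {(u, v): label}, undirected."""
    neigh = {u: [] for u in colors}
    for (u, v), lab in edges.items():
        neigh[u].append((lab, colors[v]))
        neigh[v].append((lab, colors[u]))
    # New color = (own color, sorted multiset of (edge label, nbr color)).
    return {u: (colors[u], tuple(sorted(neigh[u]))) for u in colors}

# Two triangles that plain 1-WL cannot tell apart when all nodes share
# one color, but whose edge labels differ (e.g. bond types in molecules).
colors = {n: 0 for n in "abc"}
g1 = {("a", "b"): "single", ("b", "c"): "single", ("a", "c"): "single"}
g2 = {("a", "b"): "double", ("b", "c"): "single", ("a", "c"): "single"}
print(wl_round(colors, g1) == wl_round(colors, g2))  # -> False
```

Plain 1-WL would assign identical colors to every node in both triangles and declare them indistinguishable; folding the edge label into the aggregated pairs separates them after a single round.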


【20】Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces
标题:基于堆叠智能元表面的空中语义对齐
链接:https://arxiv.org/abs/2512.05657

作者:Mario Edoardo Pandolfo,Kyriakos Stylianopoulos,George C. Alexandropoulos,Paolo Di Lorenzo
摘要:语义通信系统的目标是在具有人工智能能力的设备之间传输任务相关的信息,但是当异构的发送器-接收器模型产生不对齐的潜在表示时,它们的性能会降低。现有的语义对齐方法通常依赖于发送器或接收器处的附加数字处理,从而增加了整体设备的复杂性。在这项工作中,我们引入了第一个基于堆叠智能元表面(SIM)的空中语义对齐框架,该框架直接在波域中实现潜在空间对齐,大大减少了设备级别的计算负担。我们将SIM建模为能够仿真监督线性对准器和基于zero-shot Parseval帧的均衡器的可训练线性算子。为了实现这些运营商的物理,我们开发了一个基于梯度的优化过程,定制的元表面传递函数所需的语义映射。异构Vision Transformer(ViT)编码器的实验表明,SIM可以准确地再现监督和zero-shot语义均衡器,在高信噪比(SNR)的情况下实现高达90%的任务准确度,同时即使在低SNR值下也保持强大的鲁棒性。
摘要:Semantic communication systems aim to transmit task-relevant information between devices capable of artificial intelligence, but their performance can degrade when heterogeneous transmitter-receiver models produce misaligned latent representations. Existing semantic alignment methods typically rely on additional digital processing at the transmitter or receiver, increasing overall device complexity. In this work, we introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIM), which enables latent-space alignment directly in the wave domain, reducing substantially the computational burden at the device level. We model SIMs as trainable linear operators capable of emulating both supervised linear aligners and zero-shot Parseval-frame-based equalizers. To realize these operators physically, we develop a gradient-based optimization procedure that tailors the metasurface transfer function to a desired semantic mapping. Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in regimes with high signal-to-noise ratio (SNR), while maintaining strong robustness even at low SNR values.
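The abstract's gradient-based procedure — optimizing a stack of metasurface layers so their composition realizes a desired linear mapping — can be caricatured with scalar per-layer gains. Real SIMs are complex-valued matrix operators trained against a semantic aligner; everything below (scalar gains, target value, step size) is an illustrative assumption.

```python
# Toy gradient-based fit of a stack of "layers" to a target linear gain,
# loosely echoing the abstract's idea of tailoring a metasurface transfer
# function to a desired mapping. Scalars stand in for complex matrices.

def product(xs):
    total = 1.0
    for x in xs:
        total *= x
    return total

def fit_stack(target_gain, n_layers=3, lr=0.01, steps=2000):
    gains = [1.0] * n_layers          # per-layer transmission coefficients
    for _ in range(steps):
        total = product(gains)
        err = total - target_gain     # loss = 0.5 * err**2
        for i in range(n_layers):
            # d(total)/d(gains[i]) = product of the other layers
            gains[i] -= lr * err * (total / gains[i])
    return gains

gains = fit_stack(2.5)
print(f"achieved gain {product(gains):.4f} with layers {gains}")
```

In the paper the analogous optimization is over each layer's phase/transmission profile, with the target mapping given by the supervised or zero-shot semantic aligner.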


【21】Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening
标题:在生态效度的音乐聆听中解码对音乐元素的选择性听觉注意
链接:https://arxiv.org/abs/2512.05528

作者:Taketo Akama,Zhuohao Zhang,Tsukasa Nagashima,Takagi Yutaka,Shun Minamikawa,Natalia Polouliakh
摘要:长期以来,艺术在塑造人类情感、认知和行为方面发挥着深远的作用。虽然绘画和建筑等视觉艺术已经通过眼动追踪进行了研究,揭示了专家和新手之间不同的注视模式,但针对听觉艺术形式的类似方法仍不发达。尽管音乐是现代生活和文化的普遍组成部分,但仍缺乏在自然聆听体验中量化听众注意力和感知焦点的客观工具。据我们所知,这是首次尝试使用自然的、录音室制作的歌曲和仅有四个电极的轻量级消费级脑电图设备来解码对音乐元素的选择性注意。通过分析接近真实世界的音乐聆听过程中的神经反应,我们检验了在最大限度减少参与者负担并保持音乐体验真实性的条件下,解码是否可行。我们的贡献有四个方面:(i)在真正的录音室制作歌曲中解码音乐注意力,(ii)用四通道消费级脑电证明可行性,(iii)为音乐注意力解码提供洞见,以及(iv)相比先前工作展示出改进的模型能力。我们的研究结果表明,在我们的测试条件下,音乐注意力不仅可以在新歌曲上解码,而且可以跨新被试解码,并且与现有方法相比表现有所提升。这些发现表明,消费级设备可以可靠地捕获信号,并且音乐神经解码在现实世界环境中是可行的,为教育、个性化音乐技术和治疗干预的应用铺平了道路。
摘要:Art has long played a profound role in shaping human emotion, cognition, and behavior. While visual arts such as painting and architecture have been studied through eye tracking, revealing distinct gaze patterns between experts and novices, analogous methods for auditory art forms remain underdeveloped. Music, despite being a pervasive component of modern life and culture, still lacks objective tools to quantify listeners' attention and perceptual focus during natural listening experiences. To our knowledge, this is the first attempt to decode selective attention to musical elements using naturalistic, studio-produced songs and a lightweight consumer-grade EEG device with only four electrodes. By analyzing neural responses during real world like music listening, we test whether decoding is feasible under conditions that minimize participant burden and preserve the authenticity of the musical experience. Our contributions are fourfold: (i) decoding music attention in real studio-produced songs, (ii) demonstrating feasibility with a four-channel consumer EEG, (iii) providing insights for music attention decoding, and (iv) demonstrating improved model ability over prior work. Our findings suggest that musical attention can be decoded not only for novel songs but also across new subjects, showing performance improvements compared to existing approaches under our tested conditions. These findings show that consumer-grade devices can reliably capture signals, and that neural decoding in music could be feasible in real-world settings. This paves the way for applications in education, personalized music technologies, and therapeutic interventions.


【22】Symmetric Linear Dynamical Systems are Learnable from Few Observations
标题:对称线性动力系统可以从很少的观察中学习
链接:https://arxiv.org/abs/2512.05337

作者:Minh Vu,Andrey Y. Lokhov,Marc Vuffray
摘要:我们考虑在完全和部分观测下,从长度为$T$的单条轨迹学习$N$维随机线性动力学参数的问题。我们介绍并分析了一种新的估计器,仅使用$T=\mathcal{O}(\log N)$次观测,即可在恢复对称动力学矩阵时实现较小的最大逐元素误差,无论矩阵是稀疏还是稠密。该估计器基于矩量法,不依赖于特定问题的正则化,这对于结构发现等应用尤其重要。
摘要:We consider the problem of learning the parameters of a $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error on the recovery of symmetric dynamic matrices using only $T=\mathcal{O}(\log N)$ observations, irrespective of whether the matrix is sparse or dense. This estimator is based on the method of moments and does not rely on problem-specific regularization. This is especially important for applications such as structure discovery.
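The moment-matching principle behind such an estimator can be shown in the simplest scalar (N = 1) case: for x_{t+1} = a·x_t + noise, the ratio of the empirical lag-1 moment to the lag-0 moment recovers a. The paper's estimator targets N-dimensional symmetric matrices with O(log N) observations; this sketch only illustrates the method-of-moments idea.

```python
# Sketch of a method-of-moments estimator in the scalar (N = 1) case:
# for x_{t+1} = a * x_t + noise, the lag-1 / lag-0 moment ratio recovers
# a. The paper handles the N-dimensional symmetric-matrix case with far
# fewer observations; this only illustrates the moment-matching principle.
import random

random.seed(1)
a_true = 0.8
x, xs = 0.0, []
for _ in range(20000):
    x = a_true * x + random.gauss(0.0, 1.0)
    xs.append(x)

# Empirical moments: E[x_{t+1} x_t] / E[x_t^2] -> a.
num = sum(xs[t + 1] * xs[t] for t in range(len(xs) - 1))
den = sum(xi * xi for xi in xs[:-1])
a_hat = num / den
print(f"a_hat = {a_hat:.3f}")  # close to a_true = 0.8
```

No regularizer or tuning parameter appears anywhere — the estimate is a pure function of empirical moments, which is the property the abstract highlights for structure discovery.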


机器翻译由腾讯交互翻译提供,仅供参考


本文地址:http://www.python88.com/topic/190185