
机器学习学术速递[10.30]




cs.LG 方向,今日共计153篇


大模型相关(13篇)

【1】Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation
标题:Hawk:利用空间上下文实现更快的自回归文本到图像生成
链接:https://arxiv.org/abs/2510.25739

作者:Zhi-Kai Chen, Jun-Peng Jiang, Han-Jia Ye, De-Chuan Zhan
摘要:自回归(AR)图像生成模型能够生成高保真图像,但由于其固有的逐令牌顺序解码过程,推理速度往往较慢。推测解码利用轻量级草稿模型来近似更大AR模型的输出,已在不损失质量的前提下加速文本生成方面展现出潜力;然而,它在图像生成中的应用在很大程度上仍未被探索。挑战主要来自明显更大的采样空间(使草稿模型与目标模型输出之间的对齐变得复杂),以及对图像固有二维空间结构的利用不足(限制了局部依赖关系的建模)。为了克服这些挑战,我们提出了Hawk,一种利用图像空间结构来引导推测模型做出更准确、更高效预测的新方法。在多个文本到图像基准上的实验结果表明,与标准AR模型相比,Hawk实现了1.71倍的加速,同时保持了图像的保真度和多样性。
摘要:Autoregressive (AR) image generation models are capable of producing high-fidelity images but often suffer from slow inference due to their inherently sequential, token-by-token decoding process. Speculative decoding, which employs a lightweight draft model to approximate the output of a larger AR model, has shown promise in accelerating text generation without compromising quality. However, its application to image generation remains largely underexplored. The challenges stem from a significantly larger sampling space, which complicates the alignment between the draft and target model outputs, coupled with the inadequate use of the two-dimensional spatial structure inherent in images, thereby limiting the modeling of local dependencies. To overcome these challenges, we introduce Hawk, a new approach that harnesses the spatial structure of images to guide the speculative model toward more accurate and efficient predictions. Experimental results on multiple text-to-image benchmarks demonstrate a 1.71x speedup over standard AR models, while preserving both image fidelity and diversity.
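下面给出一个通用"草稿-验证"式推测解码循环的最小示意(Python/PyTorch,贪心变体),仅用于说明推测解码的基本加速机制;其中 speculative_decode、target、draft 等名称与接口均为示意性假设,并非Hawk论文基于空间上下文的具体实现。

import torch

def speculative_decode(target, draft, prompt_ids, k=4, max_new=64):
    # 通用的草稿-验证循环(贪心变体):target/draft 为把 (1, T) 的 token id
    # 映射到 (1, T, V) logits 的可调用对象;此处仅为示意,并非Hawk的空间上下文方案。
    ids = prompt_ids.clone()
    while ids.shape[1] < prompt_ids.shape[1] + max_new:
        draft_ids = ids.clone()
        for _ in range(k):                                        # 1) 草稿模型自回归提议 k 个 token
            next_tok = draft(draft_ids)[:, -1].argmax(-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=1)
        proposed = draft_ids[:, ids.shape[1]:]                    # (1, k)
        tgt_logits = target(draft_ids)[:, ids.shape[1] - 1:-1]    # 2) 目标模型一次前向验证所有提议位置
        tgt_tokens = tgt_logits.argmax(-1)
        agree = (proposed == tgt_tokens)[0]
        n_ok = int(agree.cumprod(0).sum())                        # 3) 接受最长一致前缀,再补一个目标模型的 token
        ids = torch.cat([ids, proposed[:, :n_ok], tgt_tokens[:, n_ok:n_ok + 1]], dim=1)
    return ids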


【2】Are Language Models Efficient Reasoners? A Perspective from Logic Programming
标题:语言模型是高效的推理者吗?来自逻辑编程的视角
链接:https://arxiv.org/abs/2510.25626

作者:Andreas Opedal, Yanick Zengaffinen, Haruki Shirakami, Clemente Pasti, Mrinmaya Sachan, Abulhair Saparov, Ryan Cotterell, Bernhard Schölkopf
备注:Accepted to NeurIPS 2025
摘要:现代语言模型(LM)表现出强大的演绎推理能力,但标准评估强调正确性,而忽略了类人推理的一个关键方面:效率。在现实世界的推理场景中,许多可用信息是无关的,有效的演绎推理需要识别并忽略这些干扰。我们提出了一个通过逻辑编程视角评估LM推理效率的框架,并引入了一种简单的方法,将LM生成的自然语言证明与通过执行逻辑程序找到的最短证明进行对齐。效率通过衡量模型在多大程度上避免了不必要的推断来量化。在实证方面,我们构建了一个数学应用题数据集,其中注入了数量不等、与目标定理语义重叠程度不同的无关公理。我们发现,即使干扰很少且与领域一致,当前的LM在这种条件下也表现出显著的准确率下降,并且它们生成的证明经常绕道进行无关的推理。
摘要:Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of human-like reasoning: efficiency. In real-world reasoning scenarios, much of the available information is irrelevant, and effective deductive inference requires identifying and ignoring such distractions. We propose a framework for assessing LM reasoning efficiency through the lens of logic programming, introducing a simple method to align proofs written in natural language -- as generated by an LM -- with shortest proofs found by executing the logic program. Efficiency is quantified by measuring how well a model avoids unnecessary inference. Empirically, we construct a dataset of math word problems injected with various number of irrelevant axioms that vary in semantic overlap with the goal theorem. We find that current LMs show marked accuracy declines under such conditions -- even with minimal, domain-consistent distractions -- and the proofs they generate frequently exhibit detours through irrelevant inferences.


【3】GPTOpt: Towards Efficient LLM-Based Black-Box Optimization
标题:GPTOpt:迈向高效的基于LLM的黑盒优化
链接:https://arxiv.org/abs/2510.25404

作者:Jamison Meindl, Yunsheng Tian, Tony Cui, Veronika Thost, Zhang-Wei Hong, Jie Chen, Wojciech Matusik, Mina Konaković Luković
摘要:对昂贵的、无导数的黑盒函数进行全局优化需要极高的采样效率。贝叶斯优化(BO)等经典方法可能有效,但通常需要针对每个应用领域仔细调整参数。与此同时,大型语言模型(LLM)已经展现出广泛的能力,但最先进的模型在求解连续黑盒优化任务方面仍然有限。我们提出GPTOpt,一种基于LLM的优化方法,为LLM赋予连续黑盒优化能力。通过在源自多种BO参数化的大规模合成数据集上微调大型语言模型,GPTOpt利用LLM预训练在不同优化任务间实现泛化。在多种黑盒优化基准上,GPTOpt超越了传统优化器,凸显了LLM进行高级数值推理的能力,并提供了一个无需参数调整的灵活全局优化框架。
摘要:Global optimization of expensive, derivative-free black-box functions demands extreme sample efficiency. Classical methods such as Bayesian Optimization (BO) can be effective, but they often require careful parameter tuning to each application domain. At the same time, Large Language Models (LLMs) have shown broad capabilities, yet state-of-the-art models remain limited in solving continuous black-box optimization tasks. We introduce GPTOpt, an LLM-based optimization method that equips LLMs with continuous black-box optimization capabilities. By fine-tuning large language models on extensive synthetic datasets derived from diverse BO parameterizations, GPTOpt leverages LLM pre-training to generalize across optimization tasks. On a variety of black-box optimization benchmarks, GPTOpt surpasses traditional optimizers, highlighting the capacity of LLMs for advanced numerical reasoning and introducing a flexible framework for global optimization without parameter tuning.


【4】RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
标题:RAVR:大型语言模型的参考答案引导变分推理
链接:https://arxiv.org/abs/2510.25206

作者:Tianqianjin Lin, Xi Zhao, Xingyao Zhang, Rujiao Long, Yi Xu, Zhuoren Jiang, Wenbo Su, Bo Zheng
备注:17 pages, 11 figures
摘要:强化学习(RL)可以提升大型语言模型(LLM)的推理能力,但这依赖于一个关键前提:LLM已经能够以不可忽略的概率生成高效用的推理路径。对于超出LLM当前能力的任务,这样的推理路径可能很难被采样到,学习过程有可能只是强化熟悉但次优的推理。我们的动机来自认知科学的一个洞见:"为什么这是答案"往往比"答案是什么"更容易回答,因为它避免了开放式探索的沉重认知负荷,转而进行解释性重构,即系统地追溯将问题与答案联系起来的推理。我们表明,LLM同样可以利用答案来获得高质量的推理路径。我们将这一现象形式化,并证明以答案为条件可证明地提高了所采样推理路径的期望效用,从而将难以处理的问题转化为可学习的问题。基于这一认识,我们提出了RAVR(参考答案引导的变分推理),这是一个端到端框架,它将以答案为条件的推理用作仅以问题为条件的推理的变分替代。在通用领域和数学领域的实验表明,RAVR相对强基线取得了一致的改进。我们进一步分析了推理行为,发现RAVR减少了犹豫,加强了结论巩固,并促进了推理中针对具体问题的策略。
摘要:Reinforcement learning (RL) can refine the reasoning abilities of large language models (LLMs), but critically depends on a key prerequisite: the LLM can already generate high-utility reasoning paths with non-negligible probability. For tasks beyond the LLM's current competence, such reasoning path can be hard to sample, and learning risks reinforcing familiar but suboptimal reasoning. We are motivated by the insight from cognitive science that Why is this the answer is often an easier question than What is the answer, as it avoids the heavy cognitive load of open-ended exploration, opting instead for explanatory reconstruction-systematically retracing the reasoning that links a question to its answer. We show that LLMs can similarly leverage answers to derive high-quality reasoning paths. We formalize this phenomenon and prove that conditioning on answer provably increases the expected utility of sampled reasoning paths, thereby transforming intractable problems into learnable ones. Building on this insight, we introduce RAVR (Reference-Answer-guided Variational Reasoning), an end-to-end framework that uses answer-conditioned reasoning as a variational surrogate for question-only reasoning. Experiments in both general and math domains demonstrate consistent improvements over strong baselines. We further analyze the reasoning behavior and find that RAVR reduces hesitation, strengthens conclusion consolidation, and promotes problem-specific strategies in reasoning.


【5】Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
标题:用于基于LLM的生成式推荐系统的持续低秩适配器
链接:https://arxiv.org/abs/2510.25093

作者:Hyunsik Yoo, Ting-Wei Li, SeongKu Kang, Zhining Liu, Charlie Xu, Qilin Qi, Hanghang Tong
摘要:虽然大型语言模型(LLM)在推荐任务上表现出色,但随着用户、物品和用户偏好随时间演化,它们在持续学习方面面临挑战。现有的基于LoRA的持续学习方法主要关注保持先前任务的性能,但这忽视了推荐的独特性质:目标并不是预测过去的偏好,而且当当前兴趣发生显著变化时,过时的偏好甚至会损害性能。为此,我们提出了PESO(Proximally rEgularized Single evolving lOra),一种面向推荐场景的LoRA持续自适应方法。PESO引入了一个近端正则化项,将当前适配器锚定到其最近的冻结状态,使模型能够灵活地在适应与保持之间取得平衡,并更好地捕捉近期的用户行为。在理论上,我们证明这种近端设计在LoRA子空间中提供了数据感知的、按方向的指导。在实验上,PESO持续优于现有的基于LoRA的持续学习方法。
摘要:While large language models (LLMs) achieve strong performance in recommendation, they face challenges in continual learning as users, items, and user preferences evolve over time. Existing LoRA-based continual methods primarily focus on preserving performance on previous tasks, but this overlooks the unique nature of recommendation: the goal is not to predict past preferences, and outdated preferences can even harm performance when current interests shift significantly. To address this, we propose PESO (Proximally rEgularized Single evolving lOra), a continual adaptation method for LoRA in recommendation. PESO introduces a proximal regularizer that anchors the current adapter to its most recent frozen state, enabling the model to flexibly balance adaptation and preservation, and to better capture recent user behaviors. Theoretically, we show that this proximal design provides data-aware, direction-wise guidance in the LoRA subspace. Empirically, PESO consistently outperforms existing LoRA-based continual learning methods.
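下面是"近端正则化项"这一思路的最小示意代码(PyTorch):把当前LoRA参数锚定到最近一次冻结的快照。其中平方Frobenius范数的形式、权重 lam 以及函数名 peso_style_loss 均为示意性假设,论文中的具体形式可能不同。

import torch

def peso_style_loss(task_loss, lora_params, frozen_params, lam=0.1):
    # 任务损失 + 近端项:把当前适配器参数拉向最近一次冻结的快照。
    # 平方Frobenius形式与 lam 取值仅为示意,并非论文的确切设计。
    prox = sum(((p - p0.detach()) ** 2).sum()
               for p, p0 in zip(lora_params, frozen_params))
    return task_loss + lam * prox

用法上,可在每个新时间段开始训练前,把当前适配器权重复制一份作为 frozen_params 快照。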


【6】BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs
标题:BioCoref:使用LLM对生物医学共指消解进行基准测试
链接:https://arxiv.org/abs/2510.25087

作者:Nourah M Salem, Elizabeth White, Michael Bada, Lawrence Hunter
摘要:由于复杂的领域术语、指称形式的高度歧义性以及互指表达之间的长距离依赖,生物医学文本中的共指消解面临独特挑战。在这项工作中,我们对生成式大语言模型(LLM)在生物医学领域共指消解上的表现进行了全面评估。以CRAFT语料库为基准,我们通过四组提示实验评估LLM的性能,这些实验在局部信息、上下文增强以及缩写和实体词典等领域特定线索的使用上各不相同。我们将这些方法与基于跨度的判别式编码器SpanBERT进行对比,以比较生成式方法与判别式方法的效果。结果表明,尽管LLM表现出较强的表层共指能力,尤其是在辅以领域知识提示时,但其性能仍然对长距离上下文和指称歧义敏感。值得注意的是,LLaMA 8B和17B模型在实体增强提示下取得了更高的精确率和F1分数,凸显了轻量级提示工程在提升LLM于生物医学NLP任务中实用性的潜力。
摘要:Coreference resolution in biomedical texts presents unique challenges due to complex domain-specific terminology, high ambiguity in mention forms, and long-distance dependencies between coreferring expressions. In this work, we present a comprehensive evaluation of generative large language models (LLMs) for coreference resolution in the biomedical domain. Using the CRAFT corpus as our benchmark, we assess the LLMs' performance with four prompting experiments that vary in their use of local, contextual enrichment, and domain-specific cues such as abbreviations and entity dictionaries. We benchmark these approaches against a discriminative span-based encoder, SpanBERT, to compare the efficacy of generative versus discriminative methods. Our results demonstrate that while LLMs exhibit strong surface-level coreference capabilities, especially when supplemented with domain-grounding prompts, their performance remains sensitive to long-range context and mentions ambiguity. Notably, the LLaMA 8B and 17B models show superior precision and F1 scores under entity-augmented prompting, highlighting the potential of lightweight prompt engineering for enhancing LLM utility in biomedical NLP tasks.


【7】GAPMAP: Mapping Scientific Knowledge Gaps in Biomedical Literature Using Large Language Models
标题:GAPMAP:使用大型语言模型绘制生物医学文献中的科学知识差距
链接:https://arxiv.org/abs/2510.25055

作者:Nourah M Salem, Elizabeth White, Michael Bada, Lawrence Hunter
摘要:科学进步由对未知之处的明确表述所推动。本研究考察了大型语言模型(LLM)识别生物医学文献中研究知识差距的能力。我们定义了两类知识差距:显式差距,即对缺失知识的明确声明;以及隐式差距,即需从上下文推断的缺失知识。以往工作主要集中于显式差距检测,我们通过处理推断隐式差距这一新任务扩展了该研究方向。我们在四个数据集共近1500篇文档上进行了两组实验,其中包括一个人工标注的生物医学文章语料库,并在段落级和全文两种设置下对封闭权重模型(来自OpenAI)和开放权重模型(Llama和Gemma 2)进行了基准测试。针对隐式差距推断的推理问题,我们提出了TABI,一种Toulmin溯因分桶推理方案,它对推理过程进行结构化,并将推断出的候选结论分桶以供验证。结果突显了LLM在识别显式和隐式知识差距方面的稳健能力;开放权重与封闭权重模型均是如此,且较大的模型变体通常表现更好。这表明LLM具有系统性识别候选知识差距的较强能力,可为早期研究构想、政策制定者和资助决策提供支持。我们还报告了观察到的失败模式,并概述了面向稳健部署的方向,包括领域自适应、人机协同验证以及跨开放与封闭权重模型的基准测试。
摘要 :Scientific progress is driven by the deliberate articulation of what remains unknown. This study investigates the ability of large language models (LLMs) to identify research knowledge gaps in the biomedical literature. We define two categories of knowledge gaps: explicit gaps, clear declarations of missing knowledge; and implicit gaps, context-inferred missing knowledge. While prior work has focused mainly on explicit gap detection, we extend this line of research by addressing the novel task of inferring implicit gaps. We conducted two experiments on almost 1500 documents across four datasets, including a manually annotated corpus of biomedical articles. We benchmarked both closed-weight models (from OpenAI) and open-weight models (Llama and Gemma 2) under paragraph-level and full-paper settings. To address the reasoning of implicit gaps inference, we introduce \textbf{\small TABI}, a Toulmin-Abductive Bucketed Inference scheme that structures reasoning and buckets inferred conclusion candidates for validation. Our results highlight the robust capability of LLMs in identifying both explicit and implicit knowledge gaps. This is true for both open- and closed-weight models, with larger variants often performing better. This suggests a strong ability of LLMs for systematically identifying candidate knowledge gaps, which can support early-stage research formulation, policymakers, and funding decisions. We also report observed failure modes and outline directions for robust deployment, including domain adaptation, human-in-the-loop verification, and benchmarking across open- and closed-weight models.


【8】Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
标题:乳腺癌VLM:临床实用的视觉-语言训练-推理模型
链接:https://arxiv.org/abs/2510.25051

作者:Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba
备注:Accepted to Computer Vision for Automated Medical Diagnosis (CVAMD) Workshop at ICCV 2025
摘要:乳腺癌仍然是发达国家女性中最常被诊断出的恶性肿瘤,通过乳腺X线摄影筛查进行早期检测对降低死亡率起着关键作用。虽然计算机辅助诊断(CAD)系统在辅助放射科医生方面已显示出前景,但现有方法在临床部署中面临严重限制,尤其是在处理多模态数据的细微解读方面,以及因依赖既往临床病史而带来的可行性问题。本研究提出了一个新框架,通过创新的标记化模块,将二维乳腺X线照片的视觉特征与来自易获取的临床元数据和合成放射学报告的结构化文本描述符协同结合。我们的结果表明,将卷积神经网络(ConvNet)与语言表示进行策略性整合,在处理高分辨率图像的同时性能优于基于视觉Transformer的模型,并能够在不同人群中实现实际部署。在多国队列筛查乳腺X线照片上的评估显示,与单模态基线相比,我们的多模态方法在癌症检测和钙化识别方面取得了更优的性能。所提出的方法为开发临床可行的、通过有效融合机制充分利用影像数据和患者上下文信息的基于VLM的CAD系统确立了新范式。
摘要:Breast cancer remains the most commonly diagnosed malignancy among women in the developed world. Early detection through mammography screening plays a pivotal role in reducing mortality rates. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment - particularly in handling the nuanced interpretation of multi-modal data and feasibility due to the requirement of prior clinical history. This study introduces a novel framework that synergistically combines visual features from 2D mammograms with structured textual descriptors derived from easily accessible clinical metadata and synthesized radiological reports through innovative tokenization modules. Our proposed methods in this study demonstrate that strategic integration of convolutional neural networks (ConvNets) with language representations achieves superior performance to vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. By evaluating it on multi-national cohort screening mammograms, our multi-modal approach achieves superior performance in cancer detection and calcification identification compared to unimodal baselines, with particular improvements. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that effectively leverage imaging data and contextual patient information through effective fusion mechanisms.


【9】Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
标题:用大型语言模型驯服CPT E/M编码中的现实世界复杂性
链接:https://arxiv.org/abs/2510.25007

作者:Islam Nassar, Yang Lin, Yuan Jin, Rongxin Zhu, Chang Wei Tan, Zenan Zhai, Nitika Mathur, Thanh Tien Vu, Xu Zhong, Long Duong, Yuan-Fang Li
备注:EMNLP 2025 Industry Track
摘要:评估与管理(E/M)编码属于现行操作术语(CPT)分类体系,用于记录医生向患者提供的医疗服务。它主要用于计费,因此提供准确的CPT E/M代码符合医生的最大利益。尽管重要,它仍是一项增加医生文书负担的辅助工作。将这一编码任务自动化有助于减轻医生的文书负担、提高计费效率,并最终带来更好的患者照护。然而,诸多现实世界的复杂因素使E/M编码自动化成为一项具有挑战性的任务。在本文中,我们阐述了其中一些关键的复杂因素,提出了应对这些问题的基于LLM的框架ProFees,并进行了系统评估。在专家整理的真实世界数据集上,ProFees的编码准确率比一个商用CPT E/M编码系统高出36%以上,比我们最强的单提示基线高出近5%,证明了其在应对现实复杂性方面的有效性。
摘要:Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Used primarily for billing purposes, it is in physicians' best interest to provide accurate CPT E/M codes. While important, it is an auxiliary task that adds to physicians' documentation burden. Automating this coding task will help alleviate physicians' documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities have made E/M coding automation a challenging task. In this paper, we elaborate some of the key complexities and present ProFees, our LLM-based framework that tackles them, followed by a systematic evaluation. On an expert-curated real-world dataset, ProFees achieves an increase in coding accuracy of more than 36% over a commercial CPT E/M coding system and almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing the real-world complexities.


【10】Sequences of Logits Reveal the Low Rank Structure of Language Models
标题:Logit序列揭示语言模型的低秩结构
链接:https://arxiv.org/abs/2510.24966

作者:Noah Golowich, Allen Liu, Abhishek Shetty
摘要:大型语言模型研究中的一个主要问题是理解其内在的低维结构。我们提出了一种在与具体模型无关的层面研究语言模型低维结构的方法:将其视为序列概率模型。我们首先通过实证表明,大量现代语言模型都表现出低秩结构:特别地,由模型在不同提示和响应集合上的logits构成的矩阵具有较低的近似秩。随后我们表明,这种低秩结构可以被用于生成:特别地,我们可以利用模型在不相关甚至无意义提示上的输出的线性组合,来生成针对目标提示的响应。在理论方面,我们观察到,在上述意义下研究语言模型的近似秩可以得到一个简单的通用抽象,其理论预测与我们的实验结果相吻合。我们进一步分析了该抽象的表示能力,并给出了可证明的学习保证。
摘要:A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation -- in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical prompts.   On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
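下面是估计logits矩阵"近似秩"的一个最小示意(NumPy):通过奇异值的累计能量阈值确定秩。阈值式判据 tol 与函数名 approximate_rank 均为示意性假设,论文对近似秩的具体定义可能不同。

import numpy as np

def approximate_rank(logit_matrix, tol=0.99):
    # 取能使前 k 个奇异值捕获 tol 比例谱能量的最小 k;
    # 矩阵的行对应(提示, 响应位置),列对应词表项;能量判据仅为示意。
    s = np.linalg.svd(logit_matrix, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, tol) + 1)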


【11】Finding Culture-Sensitive Neurons in Vision-Language Models
标题:在视觉语言模型中寻找文化敏感神经元
链接:https://arxiv.org/abs/2510.24942

作者:Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov
备注:22 pages, 13 figures
摘要:尽管视觉语言模型(VLM)表现出色,它们在处理具有文化语境的输入时仍然存在困难。为了理解VLM如何处理与文化相关的信息,我们研究了文化敏感神经元的存在,即其激活对与特定文化背景相关的输入表现出偏好性敏感的神经元。我们考察了这些神经元对文化多样的视觉问答是否重要,以及它们位于何处。基于CVQA基准,我们识别出具有文化选择性的神经元,并通过停用由不同识别方法标记出的神经元来进行因果检验。在三个VLM、25个文化群体上的实验表明,确实存在这样的神经元:将其消融会不成比例地损害与相应文化有关的问题的回答表现,而对其他文化的影响很小。此外,我们提出了一种新的基于间隔的选择器——对比激活选择(CAS),并表明它在识别文化敏感神经元方面优于现有的基于概率和基于熵的方法。最后,逐层分析表明,这类神经元倾向于聚集在某些解码器层中。总体而言,我们的发现为多模态表示的内部组织提供了新的认识。
摘要:Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e. neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify neurons of culture selectivity and perform causal tests by deactivating the neurons flagged by different identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures, while having minimal effects on others. Moreover, we propose a new margin-based selector - Contrastive Activation Selection (CAS), and show that it outperforms existing probability- and entropy-based methods in identifying culture-sensitive neurons. Finally, our layer-wise analyses reveals that such neurons tend to cluster in certain decoder layers. Overall, our findings shed new light on the internal organization of multimodal representations.
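下面给出"基于间隔的对比选择"这一思路的最小示意(NumPy):对每个目标文化,用该文化输入上的平均激活减去其他文化中最高的平均激活作为间隔得分。摘要并未给出CAS的具体公式,此处的打分方式与函数名均为假设。

import numpy as np

def contrastive_activation_selection(acts, labels, top_k=50):
    # acts: (样本数, 神经元数) 激活;labels: 每个样本的文化标签(NumPy数组)。
    # 对每个文化,按"本文化平均激活 - 其他文化最高平均激活"的对比间隔排序取前 top_k。
    cultures = np.unique(labels)
    means = {c: acts[labels == c].mean(axis=0) for c in cultures}
    selected = {}
    for c in cultures:
        others = np.stack([means[o] for o in cultures if o != c])
        margin = means[c] - others.max(axis=0)
        selected[c] = np.argsort(-margin)[:top_k]
    return selected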


【12】ProofSketch: Efficient Verified Reasoning for Large Language Models
标题:ProofSketch:大型语言模型的高效验证推理
链接:https://arxiv.org/abs/2510.24811

作者:Disha Sheshanarayana, Tanishka Magar
备注:Accepted at NeurIPS 2025, ER Workshop
摘要:诸如思维链提示和自洽性(self-consistency)等推理方法,在提升大型语言模型在各类推理任务上的准确率方面展现出巨大潜力。然而,这类方法需要生成冗长的推理链,显著增加了token消耗、计算成本和延迟。为解决这种低效问题,我们提出ProofSketch,一个验证引导的推理框架,集成了符号闭包计算、字典序验证和自适应草图生成。实验表明,ProofSketch在提升准确率的同时持续减少token使用量,说明该方法为高效且可信的推理提供了一条有前景的路径。
摘要:Reasoning methods such as chain-of-thought prompting and self-consistency have shown immense potential to improve the accuracy of large language models across various reasoning tasks. However such methods involve generation of lengthy reasoning chains, which substantially increases token consumption, computational cost, and latency. To address this inefficiency, we propose ProofSketch, a verification-guided reasoning framework that integrates symbolic closure computation, lexicographic verification and adaptive sketch generation. Our experiments show that ProofSketch consistently reduces token usage while improving accuracy, demonstrating that this approach offers a promising path for efficient and trustworthy reasoning.


【13】A Survey on Efficient Vision-Language-Action Models
标题:高效视觉-语言-动作模型综述
链接:https://arxiv.org/abs/2510.24795

作者:Zhaoshu Yu, Bo Wang, Pengpeng Zeng, Haonan Zhang, Ji Zhang, Lianli Gao, Jingkuan Song, Nicu Sebe, Heng Tao Shen
备注:26 pages, 8 figures
摘要:视觉-语言-动作模型(VLA)代表了具身智能的重要前沿,旨在将数字知识与物理世界交互衔接起来。虽然这些模型已展现出显著的通用能力,但其底层大规模基础模型固有的庞大计算与数据需求严重阻碍了部署。鉴于解决这些挑战的迫切需要,本综述首次对覆盖数据-模型-训练全流程的高效视觉-语言-动作模型(Efficient VLA)进行了全面回顾。具体而言,我们引入了一个统一的分类体系来系统组织该领域的各类工作,将现有技术归纳为三大核心支柱:(1)高效模型设计,关注高效架构与模型压缩;(2)高效训练,降低模型学习期间的计算负担;(3)高效数据收集,解决机器人数据获取与利用的瓶颈。通过在该框架内对最新方法的批判性梳理,本综述不仅为社区建立了基础参考,还总结了代表性应用、勾勒了关键挑战,并为未来研究绘制了路线图。我们维护一个持续更新的项目页面以跟踪最新进展:https://evla-survey.github.io/
摘要:Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. While these models have demonstrated remarkable generalist capabilities, their deployment is severely hampered by the substantial computational and data requirements inherent to their underlying large-scale foundation models. Motivated by the urgent need to address these challenges, this survey presents the first comprehensive review of Efficient Vision-Language-Action models (Efficient VLAs) across the entire data-model-training process. Specifically, we introduce a unified taxonomy to systematically organize the disparate efforts in this domain, categorizing current techniques into three core pillars: (1) Efficient Model Design, focusing on efficient architectures and model compression; (2) Efficient Training, which reduces computational burdens during model learning; and (3) Efficient Data Collection, which addresses the bottlenecks in acquiring and utilizing robotic data. Through a critical review of state-of-the-art methods within this framework, this survey not only establishes a foundational reference for the community but also summarizes representative applications, delineates key challenges, and charts a roadmap for future research. We maintain a continuously updated project page to track our latest developments: https://evla-survey.github.io/


Graph相关(图学习|图神经网络|图优化等)(11篇)

【1】Graph Network-based Structural Simulator: Graph Neural Networks for Structural Dynamics
标题:基于图网络的结构模拟器:用于结构动力学的图神经网络
链接:https://arxiv.org/abs/2510.25683

作者:Alessandro Lucchetti (1), Francesco Cadini (1), Marco Giglio (1), Luca Lomazzi (1) ((1) Politecnico di Milano, Department of Mechanical Engineering, Milano, Italy)
备注:16 pages, 14 figures
摘要:图神经网络(GNN)最近被探索用作数值模拟的代理模型。虽然其在计算流体动力学中的应用已有研究,但对结构问题、尤其是动力学情形的关注很少。为填补这一空白,我们提出了基于图网络的结构模拟器(GNSS),这是一个用于动态结构问题代理建模的GNN框架。GNSS遵循基于GNN的机器学习模型典型的编码-处理-解码范式,其设计凭借三个关键特性尤其适合动力学仿真:(i)在随节点固定的局部坐标系中表达节点运动学,避免了有限差分速度中的灾难性抵消;(ii)采用符号感知的回归损失,减少了长程滚动预测中的相位误差;(iii)使用由波长决定的连接半径,优化了图的构建。我们在一个由50 kHz汉宁调制脉冲激励的梁的案例研究上评估了GNSS。结果表明,GNSS在数百个时间步上准确重现了该问题的物理行为,并能推广到未见过的加载条件,而现有GNN在这些条件下无法收敛或给出有意义的预测。与显式有限元基线相比,GNSS在保持空间和时间保真度的同时实现了可观的推理加速。这些发现表明,具有物理一致更新规则、保持局部性的GNN是以波动为主导的动态结构仿真的一种有竞争力的替代方案。
摘要 :Graph Neural Networks (GNNs) have recently been explored as surrogate models for numerical simulations. While their applications in computational fluid dynamics have been investigated, little attention has been given to structural problems, especially for dynamic cases. To address this gap, we introduce the Graph Network-based Structural Simulator (GNSS), a GNN framework for surrogate modeling of dynamic structural problems.   GNSS follows the encode-process-decode paradigm typical of GNN-based machine learning models, and its design makes it particularly suited for dynamic simulations thanks to three key features: (i) expressing node kinematics in node-fixed local frames, which avoids catastrophic cancellation in finite-difference velocities; (ii) employing a sign-aware regression loss, which reduces phase errors in long rollouts; and (iii) using a wavelength-informed connectivity radius, which optimizes graph construction.   We evaluate GNSS on a case study involving a beam excited by a 50kHz Hanning-modulated pulse. The results show that GNSS accurately reproduces the physics of the problem over hundreds of timesteps and generalizes to unseen loading conditions, where existing GNNs fail to converge or deliver meaningful predictions.   Compared with explicit finite element baselines, GNSS achieves substantial inference speedups while preserving spatial and temporal fidelity. These findings demonstrate that locality-preserving GNNs with physics-consistent update rules are a competitive alternative for dynamic, wave-dominated structural simulations.
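"符号感知回归损失"的一种可能实现示意(PyTorch):在均方误差基础上,对预测值与目标值符号不一致的样本施加额外惩罚。论文的确切损失形式未在摘要中给出,函数名与 alpha 等均为示意性假设。

import torch

def sign_aware_mse(pred, target, alpha=1.0):
    # MSE + 对符号不一致样本的加权惩罚;仅为对"符号感知损失"的一种合理猜测。
    sq_err = (pred - target) ** 2
    sign_mismatch = (torch.sign(pred) != torch.sign(target)).float()
    return (sq_err * (1.0 + alpha * sign_mismatch)).mean()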


【2】Generalized Sobolev IPM for Graph-Based Measures
标题:基于图的测度的广义Sobolev IPM
链接:https://arxiv.org/abs/2510.25591

作者:Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu
摘要:我们研究支撑在图度量空间上的测度之间的Sobolev IPM问题,其中评判函数被约束在由Sobolev范数定义的单位球内。虽然Le等人(2025)通过将Sobolev范数与加权$L^p$范数联系起来实现了可扩展计算,但所得框架本质上仍然绑定于$L^p$几何结构,难以纳入$L^p$几何范式之外的其他结构先验。为克服这一局限,我们提出从Orlicz几何结构的视角推广Sobolev IPM,利用凸函数刻画更细致的几何关系,并借鉴最优传输理论的最新进展,特别是已被证明对机器学习方法颇有助益的Orlicz-Wasserstein(OW)与广义Sobolev传输。这一推广将经典Sobolev IPM作为特例涵盖在内,同时可容纳超越传统$L^p$结构的多样几何先验,但也带来了在Sobolev IPM固有困难之上进一步叠加的显著计算障碍。为应对这些挑战,我们建立了Orlicz-Sobolev范数与Musielak范数之间的新的理论联系,由此为广义Sobolev IPM(GSI)导出一种新的正则化。通过进一步利用底层图结构,我们证明带Musielak正则化的GSI(GSI-M)可归结为一个简单的单变量优化问题,具有极高的计算效率。实验上,GSI-M的计算速度比流行的OW快若干数量级,并在给定图上比较概率测度的文档分类及若干拓扑数据分析任务中展示了其实际优势。
摘要:We study the Sobolev IPM problem for measures supported on a graph metric space, where critic function is constrained to lie within the unit ball defined by Sobolev norm. While Le et al. (2025) achieved scalable computation by relating Sobolev norm to weighted $L^p$-norm, the resulting framework remains intrinsically bound to $L^p$ geometric structure, limiting its ability to incorporate alternative structural priors beyond the $L^p$ geometry paradigm. To overcome this limitation, we propose to generalize Sobolev IPM through the lens of \emph{Orlicz geometric structure}, which employs convex functions to capture nuanced geometric relationships, building upon recent advances in optimal transport theory -- particularly Orlicz-Wasserstein (OW) and generalized Sobolev transport -- that have proven instrumental in advancing machine learning methodologies. This generalization encompasses classical Sobolev IPM as a special case while accommodating diverse geometric priors beyond traditional $L^p$ structure. It however brings up significant computational hurdles that compound those already inherent in Sobolev IPM. To address these challenges, we establish a novel theoretical connection between Orlicz-Sobolev norm and Musielak norm which facilitates a novel regularization for the generalized Sobolev IPM (GSI). By further exploiting the underlying graph structure, we show that GSI with Musielak regularization (GSI-M) reduces to a simple \emph{univariate optimization} problem, achieving remarkably computational efficiency. Empirically, GSI-M is several-order faster than the popular OW in computation, and demonstrates its practical advantages in comparing probability measures on a given graph for document classification and several tasks in topological data analysis.


【3】Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information
标题:Transformer通过核引导互信息可证明学习有向无环图
链接:https://arxiv.org/abs/2510.25542

作者:Yuan Cheng, Yu Huang, Zhe Xiong, Yingbin Liang, Vincent Y. F. Tan
摘要:揭示真实世界数据背后隐藏的图结构,是在众多科学领域具有广泛应用的关键挑战。最近,利用注意力机制的基于Transformer的模型在捕捉图中复杂依赖关系方面取得了很大的实证成功,但对其训练动力学的理论理解目前仅限于树状图,即每个节点只依赖单个父节点的情形。将可证明的保证扩展到更一般的有向无环图(DAG),其中每个节点可以有多个父节点,仍然具有挑战性,主要难点在于如何设计训练目标,使不同的注意力头能够分别学习多个不同的父子关系。在这项工作中,我们通过引入一种新的信息论度量来解决这一问题:基于$f$-散度的核引导互信息(KG-MI)。我们的目标函数将KG-MI与多头注意力框架结合,其中每个注意力头对应一个不同的边缘转移核,从而有效建模多样的父子依赖关系。我们证明,给定由$K$-父DAG生成的序列,通过梯度上升训练单层多头Transformer可在多项式时间内收敛到全局最优,并刻画了收敛时的注意力得分模式。此外,当把$f$-散度具体化为KL散度时,学到的注意力得分能准确反映真实邻接矩阵,从而可证明地恢复底层图结构。实验结果验证了我们的理论发现。
摘要:Uncovering hidden graph structures underlying real-world data is a critical challenge with broad applications across scientific domains. Recently, transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. However, the theoretical understanding of their training dynamics has been limited to tree-like graphs, where each node depends on a single parent. Extending provable guarantees to more general directed acyclic graphs (DAGs) -- which involve multiple parents per node -- remains challenging, primarily due to the difficulty in designing training objectives that enable different attention heads to separately learn multiple different parent relationships.   In this work, we address this problem by introducing a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. Our objective combines KG-MI with a multi-head attention framework, where each head is associated with a distinct marginal transition kernel to model diverse parent-child dependencies effectively. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum in polynomial time. Furthermore, we characterize the attention score patterns at convergence. In addition, when particularizing the $f$-divergence to the KL divergence, the learned attention scores accurately reflect the ground-truth adjacency matrix, thereby provably recovering the underlying graph structure. Experimental results validate our theoretical findings.


【4】Bridging the Divide: End-to-End Sequence-Graph Learning
标题:弥合鸿沟:端到端序列图学习
链接:https://arxiv.org/abs/2510.25126

作者:Yuen Chen, Yulun Wu, Samuel Sharpe, Igor Melnyk, Nam H. Nguyen, Furong Huang, C. Bayan Bruss, Rizal Fathony
摘要:许多现实世界的数据集都是顺序的和关系的:每个节点都携带一个事件序列,而边则编码交互。现有的序列建模和图建模方法往往忽略了其中一种模态。我们认为,序列和图形不是单独的问题,而是同一数据集的互补方面,应该共同学习。我们引入了BRIDGE,这是一个统一的端到端架构,它将序列编码器与GNN耦合在一个目标下,允许梯度在两个模块之间流动,并学习任务对齐的表示。为了在邻居之间实现细粒度的令牌级消息传递,我们添加了TOKENXATTN,这是一个令牌级的交叉注意层,可以在相邻序列中的事件之间传递消息。在友谊预测(Brightkite)和欺诈检测(Amazon)这两种设置中,BRIDGE在排名和分类指标方面始终优于静态GNN、时间图方法和仅序列基线。
摘要:Many real-world datasets are both sequential and relational: each node carries an event sequence while edges encode interactions. Existing methods in sequence modeling and graph modeling often neglect one modality or the other. We argue that sequences and graphs are not separate problems but complementary facets of the same dataset, and should be learned jointly. We introduce BRIDGE, a unified end-to-end architecture that couples a sequence encoder with a GNN under a single objective, allowing gradients to flow across both modules and learning task-aligned representations. To enable fine-grained token-level message passing among neighbors, we add TOKENXATTN, a token-level cross-attention layer that passes messages between events in neighboring sequences. Across two settings, friendship prediction (Brightkite) and fraud detection (Amazon), BRIDGE consistently outperforms static GNNs, temporal graph methods, and sequence-only baselines on ranking and classification metrics.
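下面是节点事件序列与单个邻居序列之间做token级交叉注意力的最小示意(PyTorch);TOKENXATTN的真实实现可能还需聚合多个邻居并加入门控等,类名与维度均为示意性假设。

import torch
import torch.nn as nn

class TokenCrossAttention(nn.Module):
    # 节点序列的 token 对邻居序列的 token 做交叉注意力,残差相加后做层归一化。
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, node_seq, neighbor_seq):
        msg, _ = self.attn(query=node_seq, key=neighbor_seq, value=neighbor_seq)
        return self.norm(node_seq + msg)

# 用法示例:TokenCrossAttention()(torch.randn(8, 20, 64), torch.randn(8, 35, 64))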


【5】Energy Approach from $\varepsilon$-Graph to Continuum Diffusion Model with Connectivity Functional
链接:https://arxiv.org/abs/2510.25114

作者:Yahong Yang, Sun Lee, Jeff Calder, Wenrui Hao
摘要:我们推导了带有一般连通性泛函的$\varepsilon$-图的基于能量的连续极限。我们证明离散能量与其连续对应物之间的差距至多为$O(\varepsilon)$;当$\varepsilon\to 0$时,前置因子只依赖连通性密度的$W^{1,1}$-范数,因此即使该密度存在强烈的局部波动,误差界依然有效。作为应用,我们提出了一种神经网络流程,从边权数据中重建连通性密度,再将所得连续模型嵌入脑动力学框架。在该情形下,通常的常数扩散系数被由学习到的密度产生的空间变化系数所取代,所得到的动力学与传统常数扩散模型显著不同。
摘要:We derive an energy-based continuum limit for $\varepsilon$-graphs endowed with a general connectivity functional. We prove that the discrete energy and its continuum counterpart differ by at most $O(\varepsilon)$; the prefactor involves only the $W^{1,1}$-norm of the connectivity density as $\varepsilon\to0$, so the error bound remains valid even when that density has strong local fluctuations. As an application, we introduce a neural-network procedure that reconstructs the connectivity density from edge-weight data and then embeds the resulting continuum model into a brain-dynamics framework. In this setting, the usual constant diffusion coefficient is replaced by the spatially varying coefficient produced by the learned density, yielding dynamics that differ significantly from those obtained with conventional constant-diffusion models.


【6】Learning Fair Graph Representations with Multi-view Information Bottleneck
标题:在多视图信息瓶颈下学习公平图表示
链接:https://arxiv.org/abs/2510.25096

作者:Chuxun Liu, Debo Cheng, Qingfeng Chen, Jiangzhang Gan, Jiuyong Li, Lin Liu
摘要:图神经网络(GNN)通过在节点特征和结构上传递消息而擅长处理关系数据,但它们可能放大训练数据中的偏差,将歧视性属性和结构失衡传播为不公平的结果。许多公平性方法将偏差视为单一来源,忽略了属性效应与结构效应的区别,导致公平性与效用的权衡欠佳。为克服这一挑战,我们提出FairMIB,一个多视图信息瓶颈框架,将图分解为特征、结构和扩散三个视图,以缓解GNN中的复杂性偏差。具体而言,FairMIB采用对比学习最大化跨视图互信息,以进行无偏表示学习;并进一步集成多视角条件信息瓶颈目标,通过最小化与敏感属性的互信息来平衡任务效用与公平性。此外,FairMIB在扩散视图中引入逆概率加权(IPW)邻接矩阵校正,减少消息传递过程中偏差的传播。在五个真实世界基准数据集上的实验表明,FairMIB在效用和公平性指标上均达到了最先进的性能。
摘要:Graph neural networks (GNNs) excel on relational data by passing messages over node features and structure, but they can amplify training data biases, propagating discriminatory attributes and structural imbalances into unfair outcomes. Many fairness methods treat bias as a single source, ignoring distinct attribute and structure effects and leading to suboptimal fairness and utility trade-offs. To overcome this challenge, we propose FairMIB, a multi-view information bottleneck framework designed to decompose graphs into feature, structural, and diffusion views for mitigating complexity biases in GNNs. Especially, the proposed FairMIB employs contrastive learning to maximize cross-view mutual information for bias-free representation learning. It further integrates multi-perspective conditional information bottleneck objectives to balance task utility and fairness by minimizing mutual information with sensitive attributes. Additionally, FairMIB introduces an inverse probability-weighted (IPW) adjacency correction in the diffusion view, which reduces the spread of bias propagation during message passing. Experiments on five real-world benchmark datasets demonstrate that FairMIB achieves state-of-the-art performance across both utility and fairness metrics.


【7】Graph Distance Based on Cause-Effect Estimands with Latents
标题:基于含潜变量因果被估量的图距离
链接:https://arxiv.org/abs/2510.25037

作者:Zhufeng Li, Niki Kilbertus
摘要:因果发现旨在从观测数据中恢复表示给定变量间因果关系的图,新方法层出不穷。由于正确评估所发现的图仍然极为困难,尤其是在存在潜在混杂的情况下,社区越来越多地质疑这一领域究竟取得了多大进展。我们提出了一种针对无环有向混合图(ADMG)的图距离度量,它基于未观测混杂下因果效应估计这一下游任务。我们的方法利用基于fixing的识别和符号验证器,量化图之间的差异会在多大程度上扭曲不同处理-结果对的因果被估量。我们分析了该度量在不同图扰动下的行为,并将其与现有距离度量进行了比较。
摘要:Causal discovery aims to recover graphs that represent causal relations among given variables from observations, and new methods are constantly being proposed. Increasingly, the community raises questions about how much progress is made, because properly evaluating discovered graphs remains notoriously difficult, particularly under latent confounding. We propose a graph distance measure for acyclic directed mixed graphs (ADMGs) based on the downstream task of cause-effect estimation under unobserved confounding. Our approach uses identification via fixing and a symbolic verifier to quantify how graph differences distort cause-effect estimands for different treatment-outcome pairs. We analyze the behavior of the measure under different graph perturbations and compare it against existing distance metrics.


【8】KAN-GCN: Combining Kolmogorov-Arnold Network with Graph Convolution Network for an Accurate Ice Sheet Emulator
标题:KAN-GCN:将Kolmogorov-Arnold网络与图卷积网络相结合,构建准确的冰盖模拟器
链接:https://arxiv.org/abs/2510.24926

作者:Zesheng Liu, YoungHyun Koo, Maryam Rahnemoonfar
备注:Accept for NeurIPS 2025 Workshop: New Perspectives in Graph Machine Learning
摘要:我们提出KAN-GCN,一个快速而准确的冰盖建模仿真器,它将Kolmogorov-Arnold网络(KAN)作为图卷积网络(GCN)之前的逐特征校准器。KAN前端施加可学习的一维变形和一个线性混合步骤,在不增加消息传递深度的情况下改善特征条件数和非线性编码。我们利用该架构来提升数值冰盖模型仿真器的性能。我们的仿真器使用南极洲松岛冰川的36组融化率模拟、3种网格尺寸设置进行训练和测试。在2到5层的架构中,KAN-GCN的精度达到或超过纯GCN和MLP-GCN基线。尽管带来少量参数开销,KAN-GCN通过用节点级变换替换一个边级消息传递层,提高了较粗网格上的推理吞吐量;只有最精细的网格表现出轻微的额外开销。总体而言,KAN优先的设计为大规模瞬态场景扫描提供了有利的精度-效率权衡。
摘要:We introduce KAN-GCN, a fast and accurate emulator for ice sheet modeling that places a Kolmogorov-Arnold Network (KAN) as a feature-wise calibrator before graph convolution networks (GCNs). The KAN front end applies learnable one-dimensional warps and a linear mixing step, improving feature conditioning and nonlinear encoding without increasing message-passing depth. We employ this architecture to improve the performance of emulators for numerical ice sheet models. Our emulator is trained and tested using 36 melting-rate simulations with 3 mesh-size settings for Pine Island Glacier, Antarctica. Across 2- to 5-layer architectures, KAN-GCN matches or exceeds the accuracy of pure GCN and MLP-GCN baselines. Despite a small parameter overhead, KAN-GCN improves inference throughput on coarser meshes by replacing one edge-wise message-passing layer with a node-wise transform; only the finest mesh shows a modest cost. Overall, KAN-first designs offer a favorable accuracy vs. efficiency trade-off for large transient scenario sweeps.
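下面是"KAN式逐特征校准器 + 线性混合,再接GCN传播"这一结构的最小示意(PyTorch)。其中用余弦基函数参数化一维变形只是一种假设的实现方式,类名与超参数均为示意,并非论文采用的具体KAN参数化。

import torch
import torch.nn as nn

class KANCalibrator(nn.Module):
    # 逐特征的可学习一维变形(此处用余弦基示意)+ 线性混合,置于图卷积之前。
    def __init__(self, n_features, n_basis=8, d_out=64):
        super().__init__()
        self.freq = nn.Parameter(torch.arange(1, n_basis + 1).float(), requires_grad=False)
        self.coef = nn.Parameter(torch.randn(n_features, n_basis) * 0.1)
        self.mix = nn.Linear(n_features, d_out)

    def forward(self, x):                                    # x: (节点数, 特征数)
        basis = torch.cos(x.unsqueeze(-1) * self.freq)       # (节点数, 特征数, 基函数数)
        warped = (basis * self.coef).sum(-1)                 # 逐特征一维变形
        return self.mix(warped)                              # 线性混合

def gcn_layer(h, adj_norm, weight):
    # 一步标准GCN传播:归一化邻接矩阵 x 节点特征 x 权重。
    return torch.relu(adj_norm @ h @ weight)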


【9】A Re-node Self-training Approach for Deep Graph-based Semi-supervised Classification on Multi-view Image Data
标题:基于深度图的多视图图像数据半监督分类的重节点自训练方法
链接:https://arxiv.org/abs/2510.24791

作者:Jingjun Bi, Fadi Dornaika
摘要:最近,基于图的半监督学习和伪标注因能有效减少对大量数据标注的需求而受到关注。伪标注利用对未标注数据的预测来改进模型训练,而基于图的方法的特点在于处理以图表示的数据。然而,图像缺乏清晰的图结构,加上多视图数据的复杂性,限制了传统与现有技术的效率;此外,如何在多视图数据中整合图结构仍是一个挑战。在本文中,我们提出了面向多视图数据的重节点自学习基于图的半监督学习方法(RSGSLM)。我们的方法通过以下方式应对上述挑战:(i)在图卷积网络(GCN)框架内结合线性特征变换与多视图图融合;(ii)将伪标签动态纳入GCN损失函数,以改进多视图数据的分类;(iii)通过调整类别边界附近已标注样本的权重来纠正拓扑不平衡。此外,(iv)我们引入了适用于所有样本的无监督平滑损失。这一组合在保持计算效率的同时优化了性能。在多视图基准图像数据集上的实验结果表明,RSGSLM优于现有的多视图半监督学习方法。
摘要:Recently, graph-based semi-supervised learning and pseudo-labeling have gained attention due to their effectiveness in reducing the need for extensive data annotations. Pseudo-labeling uses predictions from unlabeled data to improve model training, while graph-based methods are characterized by processing data represented as graphs. However, the lack of clear graph structures in images combined with the complexity of multi-view data limits the efficiency of traditional and existing techniques. Moreover, the integration of graph structures in multi-view data is still a challenge. In this paper, we propose Re-node Self-taught Graph-based Semi-supervised Learning for Multi-view Data (RSGSLM). Our method addresses these challenges by (i) combining linear feature transformation and multi-view graph fusion within a Graph Convolutional Network (GCN) framework, (ii) dynamically incorporating pseudo-labels into the GCN loss function to improve classification in multi-view data, and (iii) correcting topological imbalances by adjusting the weights of labeled samples near class boundaries. Additionally, (iv) we introduce an unsupervised smoothing loss applicable to all samples. This combination optimizes performance while maintaining computational efficiency. Experimental results on multi-view benchmark image datasets demonstrate that RSGSLM surpasses existing semi-supervised learning approaches in multi-view contexts.


【10】The Underappreciated Power of Vision Models for Graph Structural Understanding
标题:视觉模型在图结构理解中的力量被低估
链接:https://arxiv.org/abs/2510.24788

作者:Xinjian Zhao, Wei Pang, Zhongkai Xue, Xiangru Jian, Lei Zhang, Yaoyao Xu, Xiaozhuang Song, Shu Wu, Tianshu Yu
备注:NeurIPS 2025
摘要:图神经网络通过自下而上的消息传递进行运算,这与人类首先直观捕捉全局结构的视觉感知方式有着根本不同。我们研究了视觉模型在图理解方面被低估的潜力,发现它们在既有基准上的性能可与GNN相当,同时表现出截然不同的学习模式。这些不同的行为,加上现有基准将领域特征与拓扑理解混为一谈的局限,促使我们提出GraphAbstract。该基准评估模型是否能像人类一样感知全局图属性:识别组织原型、检测对称性、感知连通强度以及识别关键元素。结果表明,在需要整体结构理解的任务上,视觉模型显著优于GNN,并能在不同图规模间保持泛化能力,而GNN难以进行全局模式抽象,且随图规模增大而性能退化。这项工作表明,视觉模型在图结构理解方面拥有显著但尚未被充分利用的能力,尤其适用于需要全局拓扑意识和尺度不变推理的问题。这些发现为利用这一被低估的潜力、为以整体模式识别为主的任务开发更有效的图基础模型开辟了新途径。
摘要:Graph Neural Networks operate through bottom-up message-passing, fundamentally differing from human visual perception, which intuitively captures global structures first. We investigate the underappreciated potential of vision models for graph understanding, finding they achieve performance comparable to GNNs on established benchmarks while exhibiting distinctly different learning patterns. These divergent behaviors, combined with limitations of existing benchmarks that conflate domain features with topological understanding, motivate our introduction of GraphAbstract. This benchmark evaluates models' ability to perceive global graph properties as humans do: recognizing organizational archetypes, detecting symmetry, sensing connectivity strength, and identifying critical elements. Our results reveal that vision models significantly outperform GNNs on tasks requiring holistic structural understanding and maintain generalizability across varying graph scales, while GNNs struggle with global pattern abstraction and degrade with increasing graph size. This work demonstrates that vision models possess remarkable yet underutilized capabilities for graph structural understanding, particularly for problems requiring global topological awareness and scale-invariant reasoning. These findings open new avenues to leverage this underappreciated potential for developing more effective graph foundation models for tasks dominated by holistic pattern recognition.


【11】Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees
标题:不确定性中的确定性:具有统计保证的不确定知识图谱推理
链接:https://arxiv.org/abs/2510.24754

作者:Yuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab
备注:Accepted as a main conference paper at EMNLP 2025
摘要:不确定知识图谱嵌入(UnKGE)方法学习同时刻画结构信息与不确定性信息的向量表示,用于预测未见三元组的得分。然而,现有方法只给出点估计,而不量化预测的不确定性,这限制了它们在高风险应用中的可靠性,在此类应用中理解预测的置信度至关重要。为了解决这一局限,我们提出UnKGCP,一个能生成预测区间的框架,该区间以用户指定的置信水平保证包含真实得分,区间长度反映了模型的预测不确定性。UnKGCP建立在共形预测框架之上,但引入了一种针对UnKGE方法定制的新的非一致性度量以及一个高效的区间构造流程。我们为这些区间提供了理论保证,并通过实验验证了这些保证。在多种UnKGE方法的标准基准上的大量实验进一步表明,所得区间既紧致又能有效刻画预测不确定性。
摘要:Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existing methods produce only point estimates, without quantifying predictive uncertainty-limiting their reliability in high-stakes applications where understanding confidence in predictions is crucial. To address this limitation, we propose \textsc{UnKGCP}, a framework that generates prediction intervals guaranteed to contain the true score with a user-specified level of confidence. The length of the intervals reflects the model's predictive uncertainty. \textsc{UnKGCP} builds on the conformal prediction framework but introduces a novel nonconformity measure tailored to UnKGE methods and an efficient procedure for interval construction. We provide theoretical guarantees for the intervals and empirically verify these guarantees. Extensive experiments on standard benchmarks across diverse UnKGE methods further demonstrate that the intervals are sharp and effectively capture predictive uncertainty.
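下面给出分裂共形预测(split conformal prediction)构造区间的通用示意(NumPy),用以说明UnKGCP所基于的共形预测框架;其中绝对残差非一致性得分是通用做法,论文针对UnKGE定制的非一致性度量并未在此复现,函数名与参数均为示意。

import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    # 校准集上的绝对残差作为非一致性得分,取其 (1 - alpha) 分位数构造对称预测区间。
    scores = np.abs(cal_true - cal_pred)
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    return test_pred - q, test_pred + q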


Transformer(7篇)

【1】Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
标题:基于原型的提示估计:视觉Transformer的联邦提示调优
链接:https://arxiv.org/abs/2510.25372

作者:M Yashwanth, Sharannya Ghosh, Aditay Tripathi, Anirban Chakraborty
摘要:对预训练视觉Transformer(ViT)进行视觉提示调优(VPT)已被证明是一种非常有效的参数高效微调技术,可使大模型在数据有限的下游任务上完成适配。其参数效率使其特别适合通信和计算预算通常都受限的联邦学习(FL)。然而,全局提示调优难以在异构客户端之间泛化,而个性化提示调优又会过拟合本地数据、缺乏泛化能力。我们提出PEP-FedPT(基于原型的提示估计的联邦提示调优),一个旨在同时实现ViT联邦提示调优的泛化性与个性化的统一框架。在该框架中,我们提出了新颖的类上下文混合提示(CCMP):在全局共享提示之外,同时维护一组类特定提示;对于每个输入,CCMP使用由全局类原型和客户端类先验导出的权重,自适应地组合类特定提示。这种方法无需存储依赖于客户端的可训练参数,即可实现逐样本的提示个性化。这些提示通过传统的联邦平均技术协同优化。在CIFAR-100、TinyImageNet、DomainNet和iNaturalist数据集上的全面评估表明,在多种数据异构场景下,PEP-FedPT始终优于最先进的基线方法,为视觉Transformer高效且可泛化的联邦提示调优奠定了坚实基础。
摘要 :Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks with limited data. Its parameter efficiency makes it particularly suitable for Federated Learning (FL), where both communication and computation budgets are often constrained. However, global prompt tuning struggles to generalize across heterogeneous clients, while personalized tuning overfits to local data and lacks generalization. We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning), a unified framework designed to achieve both generalization and personalization in federated prompt tuning of ViTs. Within this framework, we introduce the novel Class-Contextualized Mixed Prompt (CCMP) - based on class-specific prompts maintained alongside a globally shared prompt. For each input, CCMP adaptively combines class-specific prompts using weights derived from global class prototypes and client class priors. This approach enables per-sample prompt personalization without storing client-dependent trainable parameters. The prompts are collaboratively optimized via traditional federated averaging technique on the same. Comprehensive evaluations on CIFAR-100, TinyImageNet, DomainNet, and iNaturalist datasets demonstrate that PEP-FedPT consistently surpasses the state-of-the-art baselines under diverse data heterogeneity scenarios, establishing a strong foundation for efficient and generalizable federated prompt tuning of Vision Transformers.
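下面是对CCMP"用全局类原型与客户端类先验加权组合类特定提示"这一机制的最小示意(PyTorch);基于原型相似度softmax再乘以先验的加权方式为示意性假设,论文的具体公式可能不同。

import torch

def class_contextualized_prompt(class_prompts, global_prototypes, client_prior, feat):
    # class_prompts: (C, L, D) 每类一个含 L 个 token 的提示
    # global_prototypes: (C, D) 全局类原型;client_prior: (C,) 客户端类先验;feat: (D,) 输入特征
    sim = global_prototypes @ feat                        # 与各类原型的相似度
    w = torch.softmax(sim, dim=0) * client_prior          # 用客户端类先验重新加权
    w = w / w.sum()
    return (w[:, None, None] * class_prompts).sum(0)      # (L, D) 混合后的提示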


【2】A Study on Inference Latency for Vision Transformers on Mobile Devices
标题:移动设备上视觉Transformer推理延迟的研究
链接:https://arxiv.org/abs/2510.25166

作者:Zhuojin Li, Marco Paolieri, Leana Golubchik
备注:To appear in Springer LNICST, volume 663, Proceedings of VALUETOOLS 2024
摘要:鉴于机器学习技术在移动设备上、尤其是计算机视觉领域的重大进展,我们在本工作中定量研究了190个真实视觉Transformer(ViT)在移动设备上的性能特征。通过与102个真实卷积神经网络(CNN)的比较,我们深入分析了影响ViT架构在移动设备上延迟的因素。基于这些洞察,我们构建了一个数据集,其中包含1000个具有代表性构建模块与最先进架构的合成ViT在两个机器学习框架和六个移动平台上的实测延迟。利用该数据集,我们表明新ViT的推理延迟可以被预测到足以满足实际应用需求的精度。
摘要:Given the significant advances in machine learning techniques on mobile devices, particularly in the domain of computer vision, in this work we quantitatively study the performance characteristics of 190 real-world vision transformers (ViTs) on mobile devices. Through a comparison with 102 real-world convolutional neural networks (CNNs), we provide insights into the factors that influence the latency of ViT architectures on mobile devices. Based on these insights, we develop a dataset including measured latencies of 1000 synthetic ViTs with representative building blocks and state-of-the-art architectures from two machine learning frameworks and six mobile platforms. Using this dataset, we show that inference latency of new ViTs can be predicted with sufficient accuracy for real-world applications.


【3】Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
标题:专注Transformer中间接物体识别最小电路的出现
链接:https://arxiv.org/abs/2510.25013

作者:Rabin Adhikari
备注:9 pages, 10 figures
摘要:机制可解释性旨在将大型语言模型(LLM)逆向工程为人类可理解的计算回路。然而,预训练模型的复杂性往往掩盖了特定推理任务所需的最小机制。在这项工作中,我们在间接宾语识别(IOI)任务的符号化版本上从零开始训练小型的仅注意力Transformer;IOI是研究Transformer中类共指推理的一个基准。令人惊讶的是,一个只有两个注意力头的单层模型在缺乏MLP和归一化层的情况下,仍取得了完美的IOI准确率。通过残差流分解、谱分析和嵌入干预,我们发现这两个注意力头分别特化为加性子回路与对比子回路,二者共同完成IOI消解。此外,我们还表明,一个两层、单头的模型可以通过查询-值交互在层间组合信息,达到类似的性能。这些结果表明,针对特定任务的训练会诱导出高度可解释的最小回路,为探究Transformer推理的计算基础提供了一个可控的试验平台。
摘要:Mechanistic interpretability aims to reverse-engineer large language models (LLMs) into human-understandable computational circuits. However, the complexity of pretrained models often obscures the minimal mechanisms required for specific reasoning tasks. In this work, we train small, attention-only transformers from scratch on a symbolic version of the Indirect Object Identification (IOI) task -- a benchmark for studying coreference -- like reasoning in transformers. Surprisingly, a single-layer model with only two attention heads achieves perfect IOI accuracy, despite lacking MLPs and normalization layers. Through residual stream decomposition, spectral analysis, and embedding interventions, we find that the two heads specialize into additive and contrastive subcircuits that jointly implement IOI resolution. Furthermore, we show that a two-layer, one-head model achieves similar performance by composing information across layers through query-value interactions. These results demonstrate that task-specific training induces highly interpretable, minimal circuits, offering a controlled testbed for probing the computational foundations of transformer reasoning.
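下面是一个"单层、仅注意力"玩具语言模型的最小示意(PyTorch):没有MLP、没有归一化层,与论文所训练的最小架构类型相呼应;词表大小、宽度、头数等超参数均为任意示意取值。

import torch
import torch.nn as nn

class AttentionOnlyLM(nn.Module):
    # 单层、仅注意力的语言模型:词嵌入 + 位置嵌入 -> 因果多头注意力(含残差) -> 反嵌入。
    def __init__(self, vocab=64, d_model=32, n_heads=2, max_len=16):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.unembed = nn.Linear(d_model, vocab, bias=False)

    def forward(self, ids):                                   # ids: (批大小, 序列长度)
        T = ids.shape[1]
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=ids.device), 1)
        h, _ = self.attn(x, x, x, attn_mask=causal)           # 因果掩码下的自注意力
        return self.unembed(x + h)                            # (批大小, 序列长度, 词表) logits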


【4】Understanding Multi-View Transformers
标题:了解多视图Transformer
链接:https://arxiv.org/abs/2510.24907

作者:Michal Stary, Julien Gaubil, Ayush Tewari, Vincent Sitzmann
备注:Presented at the ICCV 2025 E2E3D Workshop
摘要:DUSt3R等多视图Transformer以前馈方式求解3D任务,正在变革3D视觉。然而,与以往基于优化的流水线不同,多视图Transformer的内部机制尚不清楚。它们的黑盒特性使得在数据扩展之外的进一步改进变得困难,也使其在安全性和可靠性关键应用中的使用更加复杂。在这里,我们提出了一种方法,从多视图Transformer各层的残差连接中探测并可视化3D表示。借此,我们考察了DUSt3R模型的一个变体,揭示了其潜在状态在各个块中的演化以及各层的作用,并指出它与具有更强显式全局位姿归纳偏置的方法有何不同。最后,我们表明所考察的DUSt3R变体估计的对应关系会随重建几何的细化而得到改进。用于分析的代码可在 https://github.com/JulienGaubil/und3rstand 获取。
摘要:Multi-view transformers such as DUSt3R are revolutionizing 3D vision by solving 3D tasks in a feed-forward manner. However, contrary to previous optimization-based pipelines, the inner mechanisms of multi-view transformers are unclear. Their black-box nature makes further improvements beyond data scaling challenging and complicates usage in safety- and reliability-critical applications. Here, we present an approach for probing and visualizing 3D representations from the residual connections of the multi-view transformers' layers. In this manner, we investigate a variant of the DUSt3R model, shedding light on the development of its latent state across blocks, the role of the individual layers, and suggest how it differs from methods with stronger inductive biases of explicit global pose. Finally, we show that the investigated variant of DUSt3R estimates correspondences that are refined with reconstructed geometry. The code used for the analysis is available at https://github.com/JulienGaubil/und3rstand .


【5】Stiff Circuit System Modeling via Transformer
标题:通过Transformer进行刚性电路系统建模
链接:https://arxiv.org/abs/2510.24727

作者:Weiman Yan, Yi-Chia Chang, Wanyu Zhao
摘要:准确而高效的电路行为建模是现代电子设计自动化的基石。在各类电路中,刚性电路很难用以往的框架建模。在这项工作中,我们提出一种新方法,将目前最先进的时间序列预测Transformer模型Crossformer与Kolmogorov-Arnold网络(KAN)相结合,对刚性电路的瞬态行为进行建模。通过利用Crossformer的时间表示能力和KAN增强的特征提取,我们的方法在预测电路对各种输入条件的响应时实现了更高的保真度。在由模数转换器(ADC)电路的SPICE仿真生成的数据集上的实验评估证明了该方法的有效性,训练时间和误差率均显著降低。
摘要:Accurate and efficient circuit behavior modeling is a cornerstone of modern electronic design automation. Among different types of circuits, stiff circuits are challenging to model using previous frameworks. In this work, we propose a new approach using Crossformer, which is a current state-of-the-art Transformer model for time-series prediction tasks, combined with Kolmogorov-Arnold Networks (KANs), to model stiff circuit transient behavior. By leveraging the Crossformer's temporal representation capabilities and the enhanced feature extraction of KANs, our method achieves improved fidelity in predicting circuit responses to a wide range of input conditions. Experimental evaluations on datasets generated through SPICE simulations of analog-to-digital converter (ADC) circuits demonstrate the effectiveness of our approach, with significant reductions in training time and error rates.


【6】How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
标题:数据混合如何塑造上下文学习:带MLP的Transformer的渐近等价
链接:https://arxiv.org/abs/2510.25753

作者:Samet Demir, Zafer Dogan
备注:NeurIPS 2025, 24 pages, 6 figures
摘要:预训练Transformer展现出出色的上下文学习(ICL)能力,使其无需更新参数即可根据示例适应新任务。然而,理论研究通常依赖简化的架构(例如省略MLP)、数据模型(例如各向同性输入的线性回归)以及单一数据源训练,限制了其与现实场景的相关性。在这项工作中,我们研究带非线性MLP头的预训练Transformer在非线性任务上的ICL,这些任务来自输入、任务和噪声分布各异的多个数据源。我们分析的模型中,MLP包含两层:第一层通过单步梯度训练,第二层完全优化。在高维渐近条件下,我们利用高斯普适性与正交多项式理论的结果,证明此类模型在ICL误差意义上等价于结构化多项式预测器。这一等价性表明,与线性基线相比,非线性MLP能够切实提升ICL性能,尤其是在非线性任务上;它还使我们能够精确分析数据混合效应:我们识别了高质量数据源的关键性质(低噪声、结构化协方差),并表明只有当任务协方差具有足够结构时,特征学习才会出现。这些结果在多种激活函数、模型规模和数据分布上得到了实证验证。最后,我们在一个多语言情感分析的真实场景中进行了实验,其中每种语言被视为一个不同的数据源;该案例的实验结果说明了我们的发现如何推广到现实情形。总体而言,我们的工作推进了Transformer中ICL的理论基础,并为架构与数据在ICL中的作用提供了可操作的洞见。
摘要:Pretrained Transformers demonstrate remarkable in-context learning (ICL) capabilities, enabling them to adapt to new tasks from demonstrations without parameter updates. However, theoretical studies often rely on simplified architectures (e.g., omitting MLPs), data models (e.g., linear regression with isotropic inputs), and single-source training, limiting their relevance to realistic settings. In this work, we study ICL in pretrained Transformers with nonlinear MLP heads on nonlinear tasks drawn from multiple data sources with heterogeneous input, task, and noise distributions. We analyze a model where the MLP comprises two layers, with the first layer trained via a single gradient step and the second layer fully optimized. Under high-dimensional asymptotics, we prove that such models are equivalent in ICL error to structured polynomial predictors, leveraging results from the theory of Gaussian universality and orthogonal polynomials. This equivalence reveals that nonlinear MLPs meaningfully enhance ICL performance, particularly on nonlinear tasks, compared to linear baselines. It also enables a precise analysis of data mixing effects: we identify key properties of high-quality data sources (low noise, structured covariances) and show that feature learning emerges only when the task covariance exhibits sufficient structure. These results are validated empirically across various activation functions, model sizes, and data distributions. Finally, we experiment with a real-world scenario involving multilingual sentiment analysis where each language is treated as a different source. Our experimental results for this case exemplify how our findings extend to real-world cases. Overall, our work advances the theoretical foundations of ICL in Transformers and provides actionable insight into the role of architecture and data in ICL.


【7】Sub-microsecond Transformers for Jet Tagging on FPGAs
标题:用于FPGA上喷注标记的亚微秒级Transformer
链接:https://arxiv.org/abs/2510.24784

作者:Lauri Laatu, Chang Sun, Arianna Cox, Abhijith Gandrakota, Benedikt Maier, Jennifer Ngadiuba, Zhiqiang Que, Wayne Luk, Maria Spiropulu, Alexander Tapper
摘要:我们提出了首个在FPGA上实现的亚微秒级Transformer,在最先进的高能物理基准上取得了有竞争力的性能。Transformer在现代机器学习应用的多项任务中表现卓越,包括CERN大型强子对撞机(LHC)上的喷注标记;然而,其计算复杂度迄今为止阻碍了它们在实时应用(例如对撞机实验的硬件触发系统)中的使用。在这项工作中,我们首次展示了Transformer在FPGA上用于喷注标记的应用,实现了$\mathcal{O}(100)$纳秒量级的延迟,并且性能优于其他基线模型。我们利用高粒度量化和分布式算术优化,将整个Transformer模型装入单个FPGA,满足了所需的吞吐量和延迟。此外,我们为hls4ml添加了多头注意力和线性注意力支持,使我们的工作能够为更广泛的快速机器学习社区所用。这项工作推进了面向高亮度LHC的下一代触发系统,使Transformer能够用于高能物理及其他领域的实时应用。
摘要:We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, their computational complexity prohibits use in real-time applications, such as the hardware trigger system of the collider experiments up until now. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving $\mathcal{O}(100)$ nanosecond latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances the next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.


GAN|对抗|攻击|生成相关(10篇)

【1】Model Inversion Attacks Meet Cryptographic Fuzzy Extractors
标题:模型反演攻击遇到密码学模糊提取器
链接:https://arxiv.org/abs/2510.25687

作者:Mallika Prabhakar, Louise Xu, Prateek Saxena
摘要:模型反演攻击对使用机器学习(ML)模型的隐私敏感应用构成了公开挑战。例如,人脸认证系统使用现代ML模型从注册用户的人脸图像计算嵌入向量并将其存储。一旦泄露,反演攻击可以从泄露的向量中准确重建用户的人脸。尽管已有十年的尽力而为式解决方案,但即使对于易受数据泄露影响的人脸认证系统这一典型应用,针对模型反演的理想防御所需具备哪些属性仍缺乏系统的刻画。   在本文中,我们形式化地刻画了可证明强大的模型反演防御所需的属性,并首次将其与密码学中的模糊提取器概念联系起来。我们进一步表明,现有的模糊提取器用于基于ML的人脸认证并不安全。我们通过一种名为PIPE的新模型反演攻击证明了这一点,该攻击在大多数情况下对先前方案的成功率超过89%。随后,我们提出了L2FE-Hash,这是第一个支持标准欧氏距离比较器的候选模糊提取器,而这种比较器是包括人脸认证在内的许多基于ML的应用所需要的。我们形式化地刻画了其计算安全保证,即使在存储秘密被完全泄露的极端威胁模型下也成立,并在实际人脸分布上实证展示了其在人脸认证中可用的准确性。它提供与攻击无关的安全性,且无需对其所保护的ML模型进行任何重新训练。实证结果表明,它使先前最先进的反演攻击以及我们新的PIPE攻击均告失效。
摘要 :Model inversion attacks pose an open challenge to privacy-sensitive applications that use machine learning (ML) models. For example, face authentication systems use modern ML models to compute embedding vectors from face images of the enrolled users and store them. If leaked, inversion attacks can accurately reconstruct user faces from the leaked vectors. There is no systematic characterization of properties needed in an ideal defense against model inversion, even for the canonical example application of a face authentication system susceptible to data breaches, despite a decade of best-effort solutions.   In this paper, we formalize the desired properties of a provably strong defense against model inversion and connect it, for the first time, to the cryptographic concept of fuzzy extractors. We further show that existing fuzzy extractors are insecure for use in ML-based face authentication. We do so through a new model inversion attack called PIPE, which achieves a success rate of over 89% in most cases against prior schemes. We then propose L2FE-Hash, the first candidate fuzzy extractor which supports standard Euclidean distance comparators as needed in many ML-based applications, including face authentication. We formally characterize its computational security guarantees, even in the extreme threat model of full breach of stored secrets, and empirically show its usable accuracy in face authentication for practical face distributions. It offers attack-agnostic security without requiring any re-training of the ML model it protects. Empirically, it nullifies both prior state-of-the-art inversion attacks as well as our new PIPE attack.


【2】BOLT-GAN: Bayes-Optimal Loss for Stable GAN Training
标题:BOLT-GAN:稳定GAN训练的Bayes最佳损失
链接:https://arxiv.org/abs/2510.25609

作者:Mohammadreza Tavasoli Naeini, Ali Bereyhi, Morteza Noshad, Ben Liang, Alfred O. Hero III
摘要:我们介绍了BOLT-GAN,这是一种受贝叶斯最优学习阈值(BOLT)启发的、对WGAN框架简单而有效的修改。我们表明,在判别器满足Lipschitz连续的条件下,BOLT-GAN隐式最小化的是一种不同于推土机(Wasserstein)距离的度量距离,并实现了更好的训练稳定性。在四个标准图像生成基准(CIFAR-10、CelebA-64、LSUN Bedroom-64和LSUN Church-64)上的实证评估表明,BOLT-GAN始终优于WGAN,使Frechet Inception距离(FID)降低10-60%。我们的结果表明,BOLT是一个可广泛用于改进GAN训练的原则。
摘要:We introduce BOLT-GAN, a simple yet effective modification of the WGAN framework inspired by the Bayes Optimal Learning Threshold (BOLT). We show that with a Lipschitz continuous discriminator, BOLT-GAN implicitly minimizes a different metric distance than the Earth Mover (Wasserstein) distance and achieves better training stability. Empirical evaluations on four standard image generation benchmarks (CIFAR-10, CelebA-64, LSUN Bedroom-64, and LSUN Church-64) show that BOLT-GAN consistently outperforms WGAN, achieving 10-60% lower Frechet Inception Distance (FID). Our results suggest that BOLT is a broadly applicable principle for enhancing GAN training.


【3】An In-Depth Analysis of Cyber Attacks in Secured Platforms
标题:深入分析安全平台中的网络攻击
链接:https://arxiv.org/abs/2510.25470

作者:Parick Ozoh, John K Omoniyi, Bukola Ibitoye
摘要:全球恶意软件威胁正在增加。针对这一问题,Android操作系统上出现了一种加密型勒索软件。与手机使用中的恶意威胁相关的挑战已成为移动通信中的一个紧迫问题,破坏用户体验并构成重大隐私威胁。本研究调查了用于检测手机中恶意威胁的常用机器学习技术,并检验了它们的性能。过去的大多数研究都集中在客户反馈和评论上,担心人们可能会为了个人利益而编造虚假评论来推广或贬低产品和服务。因此,开发使用机器学习检测恶意威胁的技术一直是一个关键焦点。本文对恶意威胁问题及应对这些挑战的方法的现有研究进行了全面的比较研究。然而,这些方法需要大量的信息,这给开发强大的、专门的自动化反恶意软件系统带来了挑战。本研究描述了Android应用程序数据集,并使用本研究所采用指标的准确率水平来衡量各项技术的准确性。
摘要:There is an increase in global malware threats. To address this, an encryption-type ransomware has been introduced on the Android operating system. The challenges associated with malicious threats in phone use have become a pressing issue in mobile communication, disrupting user experiences and posing significant privacy threats. This study surveys commonly used machine learning techniques for detecting malicious threats in phones and examines their performance. The majority of past research focuses on customer feedback and reviews, with concerns that people might create false reviews to promote or devalue products and services for personal gain. Hence, the development of techniques for detecting malicious threats using machine learning has been a key focus. This paper presents a comprehensive comparative study of current research on the issue of malicious threats and methods for tackling these challenges. Nevertheless, a huge amount of information is required by these methods, presenting a challenge for developing robust, specialized automated anti-malware systems. This research describes the Android Applications dataset, and the accuracy of the techniques is measured using the accuracy levels of the metrics employed in this study.


【4】A Unified Bilevel Model for Adversarial Learning and A Case Study
标题:对抗学习的统一二层模型及案例研究
链接:https://arxiv.org/abs/2510.25121

作者:Yutong Zheng, Qingna Li
摘要:随着机器学习和人工智能的快速发展,对抗学习受到越来越多的关注。然而,由于大多数机器学习模型结构复杂,对抗攻击的机制并没有得到很好的解释,如何度量攻击的效果也仍不清楚。在本文中,我们提出了一个统一的对抗学习双层模型。我们进一步研究了聚类模型中的对抗攻击,并从数据扰动的角度对其进行解释。我们发现,当数据扰动相对较小时,聚类模型是鲁棒的;而当扰动相对较大时,聚类结果会发生变化,从而构成一次攻击。为了度量攻击对聚类模型的影响,我们分析了所谓$\delta$-度量的适定性,该度量可用于所提出的聚类模型对抗学习双层模型中。
摘要:Adversarial learning has been attracting more and more attention thanks to the fast development of machine learning and artificial intelligence. However, due to the complicated structure of most machine learning models, the mechanism of adversarial attacks is not well interpreted. How to measure the effect of attack is still not quite clear. In this paper, we propose a unified bilevel model for adversarial learning. We further investigate the adversarial attack in clustering models and interpret it from data perturbation point of view. We reveal that when the data perturbation is relatively small, the clustering model is robust, whereas if it is relatively large, the clustering result changes, which leads to an attack. To measure the effect of attacks for clustering models, we analyse the well-definedness of the so-called $\delta$-measure, which can be used in the proposed bilevel model for adversarial learning of clustering models.


【5】Secure Retrieval-Augmented Generation against Poisoning Attacks
标题:针对中毒攻击的安全检索增强生成
链接:https://arxiv.org/abs/2510.25025

作者:Zirui Cheng, Jikai Sun, Anjun Gao, Yueyang Quan, Zhuqing Liu, Xiaohua Hu, Minghong Fang
备注:To appear in IEEE BigData 2025
摘要:大型语言模型(LLM)已经改变了自然语言处理(NLP),支撑了从内容生成到决策支持的各类应用。检索增强生成(RAG)通过引入外部知识来改进LLM,但也带来了安全风险,特别是数据投毒:攻击者将有毒文本注入知识数据库以操纵系统输出。虽然已经提出了各种防御措施,但它们往往难以抵御高级攻击。为了解决这个问题,我们提出了RAGuard,一种旨在识别有毒文本的检测框架。RAGuard首先扩大检索范围,以提高干净文本的比例,降低检索到有毒内容的可能性。然后,它应用分块困惑度过滤来检测异常变化,并应用文本相似性过滤来标记高度相似的文本。这种非参数方法增强了RAG的安全性,在大规模数据集上的实验表明,它能有效检测和缓解投毒攻击,包括强自适应攻击。
摘要 :Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support. Retrieval-Augmented Generation (RAG) improves LLMs by incorporating external knowledge but also introduces security risks, particularly from data poisoning, where the attacker injects poisoned texts into the knowledge database to manipulate system outputs. While various defenses have been proposed, they often struggle against advanced attacks. To address this, we introduce RAGuard, a detection framework designed to identify poisoned texts. RAGuard first expands the retrieval scope to increase the proportion of clean texts, reducing the likelihood of retrieving poisoned content. It then applies chunk-wise perplexity filtering to detect abnormal variations and text similarity filtering to flag highly similar texts. This non-parametric approach enhances RAG security, and experiments on large-scale datasets demonstrate its effectiveness in detecting and mitigating poisoning attacks, including strong adaptive attacks.
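下面给出一个示意性的小例子,说明"分块困惑度过滤 + 文本相似性过滤"这类思路如何组合。注意这是基于摘要理解的假设性草图,并非论文的实现:困惑度函数 perplexity_fn 假定由外部语言模型提供,相似性部分用 TF-IDF 余弦相似度近似,阈值均为演示用假设。

```python
# 假设性示例:组合困惑度过滤与相似性过滤来标记可疑文本
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_suspicious(texts, perplexity_fn, ppl_z_thresh=2.0, sim_thresh=0.9):
    # 1) 困惑度过滤:困惑度显著偏离整体水平的文本被标记
    ppl = np.array([perplexity_fn(t) for t in texts])
    z = (ppl - ppl.mean()) / (ppl.std() + 1e-8)
    ppl_flags = np.abs(z) > ppl_z_thresh

    # 2) 相似性过滤:与其他文本高度相似(疑似批量注入)的文本被标记
    tfidf = TfidfVectorizer().fit_transform(texts)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)
    sim_flags = sim.max(axis=1) > sim_thresh

    return ppl_flags | sim_flags
```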


【6】The Generation Phases of Flow Matching: a Denoising Perspective
标题:流匹配的生成阶段:去噪的角度
链接:https://arxiv.org/abs/2510.24830

作者:Anne Gagneux, Ségolène Martin, Rémi Gribonval, Mathurin Massias
摘要:流匹配已经取得了显著的成功,但影响其生成过程质量的因素仍然知之甚少。在这项工作中,我们采用去噪的视角,设计了一个用于实证探究生成过程的框架。通过建立流匹配模型与去噪器之间的形式化联系,我们为比较两者在生成与去噪上的表现提供了共同基础。这使得我们能够设计有原则且可控的扰动来影响样本生成:噪声与漂移。由此我们获得了关于生成过程不同动态阶段的新见解,从而能够精确刻画去噪器在生成过程的哪个阶段成功或失败,以及这为何重要。
摘要:Flow matching has achieved remarkable success, yet the factors influencing the quality of its generation process remain poorly understood. In this work, we adopt a denoising perspective and design a framework to empirically probe the generation process. Laying down the formal connections between flow matching models and denoisers, we provide a common ground to compare their performances on generation and denoising. This enables the design of principled and controlled perturbations to influence sample generation: noise and drift. This leads to new insights on the distinct dynamical phases of the generative process, enabling us to precisely characterize at which stage of the generative process denoisers succeed or fail and why this matters.


【7】Towards a Method for Synthetic Generation of PWA Transcripts
标题:迈向一种合成生成PWA转录文本的方法
链接:https://arxiv.org/abs/2510.24817

作者:Jason M. Pittman, Anton Phillips Jr., Yesenia Medina-Santos, Brielle C. Stark
备注:19 pages, 1 figure, 7 tables
摘要:在失语症研究中,言语语言病理学家(SLP)投入大量时间使用正确信息单位(CIU)对语音样本进行手动编码,这是一种衡量单个语音样本信息量的方法。开发识别失语症语言的自动化系统受到数据稀缺的限制。例如,在AphasiaBank中只有大约600个转录本,但数十亿个令牌被用于训练大型语言模型(LLM)。在更广泛的机器学习(ML)领域,研究人员越来越多地转向稀疏的合成数据。因此,本研究构建并验证了两种生成AphasiaBank Cat Rescue图片描述任务合成转录本的方法。一种方法利用程序编程方法,而第二种方法使用Mistral 7b指令和Llama 3.1 8b指令LLM。该方法通过单词删除、填充物插入和错语替换生成跨越四个严重程度水平(轻度、中度、重度、非常重度)的转录本。总体而言,我们发现,与人类引发的成绩单相比,Mistral 7b指令最好地捕捉了失语症中观察到的语言退化的关键方面,在合成生成方法中显示了NDW,单词计数和单词长度的现实方向变化。基于这些结果,未来的工作应该计划创建一个更大的数据集,微调模型以更好地表达失语症,并让SLP评估合成转录本的真实性和有用性。
摘要:In aphasia research, Speech-Language Pathologists (SLPs) devote extensive time to manually coding speech samples using Correct Information Units (CIUs), a measure of how informative an individual sample of speech is. Developing automated systems to recognize aphasic language is limited by data scarcity. For example, only about 600 transcripts are available in AphasiaBank yet billions of tokens are used to train large language models (LLMs). In the broader field of machine learning (ML), researchers increasingly turn to synthetic data when such are sparse. Therefore, this study constructs and validates two methods to generate synthetic transcripts of the AphasiaBank Cat Rescue picture description task. One method leverages a procedural programming approach while the second uses Mistral 7b Instruct and Llama 3.1 8b Instruct LLMs. The methods generate transcripts across four severity levels (Mild, Moderate, Severe, Very Severe) through word dropping, filler insertion, and paraphasia substitution. Overall, we found, compared to human-elicited transcripts, Mistral 7b Instruct best captures key aspects of linguistic degradation observed in aphasia, showing realistic directional changes in NDW, word count, and word length amongst the synthetic generation methods. Based on the results, future work should plan to create a larger dataset, fine-tune models for better aphasic representation, and have SLPs assess the realism and usefulness of the synthetic transcripts.


【8】Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases
标题:学会攻击:揭露连续数据发布中的隐私风险
链接:https://arxiv.org/abs/2510.24807

作者:Ziyao Cui, Minxing Zhang, Jian Pei
摘要:隐私问题在现代人工智能和数据科学应用中变得越来越重要,在这些应用中,敏感信息被收集,分析和共享在医疗保健,金融和移动等不同领域。虽然先前的研究集中在保护单个数据发布中的隐私,但许多现实世界的系统在顺序或连续的数据发布下运行,其中相同或相关的数据随着时间的推移而发布。这种连续披露会引入新的漏洞,因为不同版本之间的时间相关性可能使攻击者能够推断出隐藏在任何单个版本中的敏感信息。在本文中,我们调查攻击者是否可以利用连续出版物之间的依赖关系,即使每个单独的发布满足标准的隐私保证,在连续的数据发布的隐私妥协。为此,我们提出了一种新的攻击模型,通过将隐马尔可夫模型与基于强化学习的双向推理机制相结合来捕获这些顺序依赖关系。这使得攻击者能够利用序列中较早和较晚的观察来推断私人信息。我们在轨迹数据的背景下实例化我们的框架,演示对手如何从连续的移动数据集恢复敏感位置。在Geolife、Porto Taxi和SynMob数据集上进行的大量实验表明,我们的模型始终优于独立处理每个版本的基线方法。结果揭示了一个基本的隐私风险固有的顺序数据发布,单独保护的版本可以集体泄漏敏感信息时,分析时间。这些发现强调了对新的隐私保护框架的需求,这些框架明确地对时间依赖性进行建模,例如时间感知的差分隐私或顺序数据混淆策略。
摘要:Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as healthcare, finance, and mobility. While prior research has focused on protecting privacy in a single data release, many real-world systems operate under sequential or continuous data publishing, where the same or related data are released over time. Such sequential disclosures introduce new vulnerabilities, as temporal correlations across releases may enable adversaries to infer sensitive information that remains hidden in any individual release. In this paper, we investigate whether an attacker can compromise privacy in sequential data releases by exploiting dependencies between consecutive publications, even when each individual release satisfies standard privacy guarantees. To this end, we propose a novel attack model that captures these sequential dependencies by integrating a Hidden Markov Model with a reinforcement learning-based bi-directional inference mechanism. This enables the attacker to leverage both earlier and later observations in the sequence to infer private information. We instantiate our framework in the context of trajectory data, demonstrating how an adversary can recover sensitive locations from sequential mobility datasets. Extensive experiments on Geolife, Porto Taxi, and SynMob datasets show that our model consistently outperforms baseline approaches that treat each release independently. The results reveal a fundamental privacy risk inherent to sequential data publishing, where individually protected releases can collectively leak sensitive information when analyzed temporally. These findings underscore the need for new privacy-preserving frameworks that explicitly model temporal dependencies, such as time-aware differential privacy or sequential data obfuscation strategies.


【9】Re-evaluating sample efficiency in de novo molecule generation
标题:重新评估从头分子生成中的样本效率
链接:https://arxiv.org/abs/2212.01385

作者:Morgan Thomas, Noel M. O'Boyle, Andreas Bender, Chris De Graaf
备注:Submission to ELLIS ML4Molecules Workshop 2022
摘要:从头分子生成可能存在数据效率低的问题:需要大量训练数据,或需要大量采样数据点来进行目标优化。当把深度生成模型与计算机辅助药物设计中常用的、计算代价高昂的分子评分函数(即所谓oracle)结合时,后者尤其成为短板。因此,近期的工作集中于在从头分子药物设计背景下提高样本效率的方法,或对其进行基准测试。在这项工作中,我们讨论并改造了近期的一个样本效率基准,使其更好地反映现实目标,同时兼顾所生成化学结构的质量(这在小分子药物设计中必须始终加以考虑);随后我们重新评估了所有被基准测试的生成模型。我们发现,若相对训练数据考虑分子量和LogP,并考虑所提出化学结构的多样性,生成模型的排名会发生变化。此外,我们对最近提出的一种提高样本效率的方法(Augmented Hill-Climb)进行了基准测试,发现同时考虑样本效率和所生成分子的化学性质时,它排名第一。样本效率和化学可取性的持续改进,使得在更现实的时间尺度上更常规地集成计算代价高昂的评分函数成为可能。
摘要:De novo molecule generation can suffer from data inefficiency; requiring large amounts of training data or many sampled data points to conduct objective optimization. The latter is a particular disadvantage when combining deep generative models with computationally expensive molecule scoring functions (a.k.a. oracles) commonly used in computer-aided drug design. Recent works have therefore focused on methods to improve sample efficiency in the context of de novo molecule drug design, or to benchmark it. In this work, we discuss and adapt a recent sample efficiency benchmark to better reflect realistic goals also with respect to the quality of chemistry generated, which must always be considered in the context of small-molecule drug design; we then re-evaluate all benchmarked generative models. We find that accounting for molecular weight and LogP with respect to the training data, and the diversity of chemistry proposed, re-orders the ranking of generative models. In addition, we benchmark a recently proposed method to improve sample efficiency (Augmented Hill-Climb) and found it ranked top when considering both the sample efficiency and chemistry of molecules generated. Continual improvements in sample efficiency and chemical desirability enable more routine integration of computationally expensive scoring functions on a more realistic timescale.


【10】EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation
标题:EnzyControl:为酶骨架生成添加功能性和底物特异性控制
链接:https://arxiv.org/abs/2510.25132

作者:Chao Song, Zhiyuan Liu, Han Huang, Liang Wang, Qiong Wang, Jianyu Shi, Hui Yu, Yihang Zhou, Yang Zhang
摘要:设计具有底物特异性功能的酶骨架是计算蛋白质工程中的一个关键挑战。当前的生成模型在蛋白质设计方面表现出色,但在结合数据、底物特异性控制以及从头生成酶骨架的灵活性方面存在局限。为了解决这个问题,我们引入了EnzyBind,这是一个专门从PDBbind中整理出来、包含11,100个经实验验证的酶-底物对的数据集。在此基础上,我们提出了EnzyControl,一种在酶骨架生成中实现功能与底物特异性控制的方法。我们的方法以MSA标注的催化位点及其对应底物为条件生成酶骨架,这些信息是从整理好的酶-底物数据中自动提取的。EnzyControl的核心是EnzyAdapter,一个轻量级的模块化组件,集成到预训练的基序支架(motif-scaffolding)模型中,使其具备底物感知能力。两阶段训练范式进一步提升了模型生成准确且具备功能的酶结构的能力。实验表明,EnzyControl在EnzyBind和EnzyBench基准的结构与功能指标上都取得了最佳性能,与基线模型相比,在可设计性和催化效率上分别有13%的显著提升。代码发布于https://github.com/Vecteur-libre/EnzyControl。
摘要:Designing enzyme backbones with substrate-specific functionality is a critical challenge in computational protein engineering. Current generative models excel in protein design but face limitations in binding data, substrate-specific control, and flexibility for de novo enzyme backbone generation. To address this, we introduce EnzyBind, a dataset with 11,100 experimentally validated enzyme-substrate pairs specifically curated from PDBbind. Building on this, we propose EnzyControl, a method that enables functional and substrate-specific control in enzyme backbone generation. Our approach generates enzyme backbones conditioned on MSA-annotated catalytic sites and their corresponding substrates, which are automatically extracted from curated enzyme-substrate data. At the core of EnzyControl is EnzyAdapter, a lightweight, modular component integrated into a pretrained motif-scaffolding model, allowing it to become substrate-aware. A two-stage training paradigm further refines the model's ability to generate accurate and functional enzyme structures. Experiments show that our EnzyControl achieves the best performance across structural and functional metrics on EnzyBind and EnzyBench benchmarks, with particularly notable improvements of 13\% in designability and 13\% in catalytic efficiency compared to the baseline models. The code is released at https://github.com/Vecteur-libre/EnzyControl.


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】Uncertainty Quantification for Regression: A Unified Framework based on kernel scores
标题:回归的不确定性量化:基于核分数的统一框架
链接:https://arxiv.org/abs/2510.25599

作者:Christopher Bülte, Yusuf Sale, Gitta Kutyniok, Eyke Hüllermeier
摘要:回归任务,尤其是在安全关键领域,需要恰当的不确定性量化,但相关文献仍主要集中在分类任务。为此,我们基于适当评分规则引入了一族针对总不确定性、偶然不确定性和认知不确定性的度量,并特别强调核分数。该框架统一了多种著名度量,并为设计新度量提供了一套有原则的方法:其行为(如尾部敏感性、鲁棒性和分布外响应)由核的选择决定。我们证明了核分数特性与下游行为之间的明确对应关系,从而为面向特定任务的度量给出了具体的设计准则。大量实验表明,这些度量在下游任务中是有效的,并揭示了不同实例之间清晰的权衡,包括鲁棒性和分布外检测性能。
摘要:Regression tasks, notably in safety-critical domains, require proper uncertainty quantification, yet the literature remains largely classification-focused. In this light, we introduce a family of measures for total, aleatoric, and epistemic uncertainty based on proper scoring rules, with a particular emphasis on kernel scores. The framework unifies several well-known measures and provides a principled recipe for designing new ones whose behavior, such as tail sensitivity, robustness, and out-of-distribution responsiveness, is governed by the choice of kernel. We prove explicit correspondences between kernel-score characteristics and downstream behavior, yielding concrete design guidelines for task-specific measures. Extensive experiments demonstrate that these measures are effective in downstream tasks and reveal clear trade-offs among instantiations, including robustness and out-of-distribution detection performance.


【2】Analysis of Semi-Supervised Learning on Hypergraphs
标题:超图的半监督学习分析
链接:https://arxiv.org/abs/2510.25354

作者:Adrien Weihs, Andrea Bertozzi, Matthew Thorpe
摘要:超图为高阶交互建模提供了一个自然的框架,但它们在半监督学习中的理论基础仍然有限。我们对随机几何超图上的变分学习给出了渐近一致性分析,精确刻画了保证超图学习适定性的条件,并证明其收敛到一个加权$p$-拉普拉斯方程。受此启发,我们提出了高阶超图学习(HOHL),它通过骨架图拉普拉斯算子的幂进行正则化,以实现多尺度平滑。HOHL收敛到一个高阶Sobolev半范数。实证结果表明,它在标准基线上表现强劲。
摘要 :Hypergraphs provide a natural framework for modeling higher-order interactions, yet their theoretical underpinnings in semi-supervised learning remain limited. We provide an asymptotic consistency analysis of variational learning on random geometric hypergraphs, precisely characterizing the conditions ensuring the well-posedness of hypergraph learning as well as showing convergence to a weighted $p$-Laplacian equation. Motivated by this, we propose Higher-Order Hypergraph Learning (HOHL), which regularizes via powers of Laplacians from skeleton graphs for multiscale smoothness. HOHL converges to a higher-order Sobolev seminorm. Empirically, it performs strongly on standard baselines.
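下面是一个极简的数值草图,演示"用骨架图拉普拉斯算子的幂作多尺度正则项"在半监督标注中的形式。这是基于摘要理解的假设性示例(kNN 图构建方式、权重与正则系数均为假设),并非 HOHL 的原始实现:最小化 ||u_labeled - y||^2 + lam * u^T (sum_k w_k L^k) u,并用闭式解求解。

```python
# 假设性示例:以图拉普拉斯的幂作为多尺度平滑正则项的半监督标注
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_power_labeling(X, y, labeled_idx, weights=(1.0, 0.5), lam=1.0, k=10):
    n = X.shape[0]
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)                  # 对称化的 kNN 骨架图
    L = np.diag(A.sum(axis=1)) - A          # 组合拉普拉斯

    R = np.zeros_like(L)                    # 多尺度正则项 R = sum_k w_k * L^k
    Lp = np.eye(n)
    for w in weights:
        Lp = Lp @ L
        R += w * Lp

    # 最小化 ||u[labeled] - y||^2 + lam * u^T R u 的闭式解
    M = np.zeros((n, n))
    b = np.zeros(n)
    M[labeled_idx, labeled_idx] = 1.0
    b[labeled_idx] = y
    return np.linalg.solve(M + lam * R, b)

X = np.random.randn(200, 2)
labeled = np.array([0, 1, 2, 3])
u = laplacian_power_labeling(X, np.array([0.0, 0.0, 1.0, 1.0]), labeled)
```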


【3】Point-level Uncertainty Evaluation of Mobile Laser Scanning Point Clouds
标题:移动激光扫描点云的点级不确定性评估
链接:https://arxiv.org/abs/2510.24773

作者:Ziyang Xu, Olaf Wysocki, Christoph Holst
摘要:对移动激光扫描(MLS)点云的不确定性进行可靠量化,对于确保3D测绘、建模和变化分析等下游应用的准确性和可信度至关重要。传统的后向不确定性建模严重依赖高精度参考数据,而此类数据通常成本高昂或难以大规模获取。为了解决这个问题,本研究提出了一个基于机器学习的点级不确定性评估框架,学习局部几何特征与点级误差之间的关系。该框架使用两种集成学习模型:随机森林(RF)和XGBoost来实现,并在经过空间划分的真实世界数据集上进行训练和验证以避免数据泄露。实验结果表明,这两种模型都能有效捕捉几何特征与不确定性之间的非线性关系,平均ROC-AUC值高于0.87。分析进一步揭示,描述高程变化、点密度和局部结构复杂性的几何特征在预测不确定性中起主导作用。所提出的框架提供了一个数据驱动的不确定性评估视角,为未来大规模点云的质量控制和误差分析提供了可扩展且适应性强的基础。
摘要:Reliable quantification of uncertainty in Mobile Laser Scanning (MLS) point clouds is essential for ensuring the accuracy and credibility of downstream applications such as 3D mapping, modeling, and change analysis. Traditional backward uncertainty modeling heavily rely on high-precision reference data, which are often costly or infeasible to obtain at large scales. To address this issue, this study proposes a machine learning-based framework for point-level uncertainty evaluation that learns the relationship between local geometric features and point-level errors. The framework is implemented using two ensemble learning models, Random Forest (RF) and XGBoost, which are trained and validated on a spatially partitioned real-world dataset to avoid data leakage. Experimental results demonstrate that both models can effectively capture the nonlinear relationships between geometric characteristics and uncertainty, achieving mean ROC-AUC values above 0.87. The analysis further reveals that geometric features describing elevation variation, point density, and local structural complexity play a dominant role in predicting uncertainty. The proposed framework offers a data-driven perspective of uncertainty evaluation, providing a scalable and adaptable foundation for future quality control and error analysis of large-scale point clouds.
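作为对这一流程的直观说明,下面给出一个假设性草图(特征定义、阈值与模型超参数均为演示用假设,并非论文的确切实现):先用 k 近邻统计局部几何特征(高程变化、点密度代理、局部结构复杂性),再用随机森林预测某点误差是否偏大,并用 ROC-AUC 评估。

```python
# 假设性示例:由局部几何特征预测点级误差是否偏大
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def local_geometric_features(points, k=20):
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    dist, idx = nn.kneighbors(points)
    feats = []
    for i in range(points.shape[0]):
        nb = points[idx[i]]
        elev_var = nb[:, 2].std()                      # 高程变化
        density = k / (dist[i, -1] ** 3 + 1e-9)        # 局部点密度的粗略代理
        eigvals = np.sort(np.linalg.eigvalsh(np.cov(nb.T)))[::-1]
        planarity = (eigvals[1] - eigvals[2]) / (eigvals[0] + 1e-9)  # 局部结构
        feats.append([elev_var, density, planarity])
    return np.array(feats)

# 用法示意:points 为 (N,3) 坐标,errors 为各点的参考误差(此处用随机数据演示)
points = np.random.rand(500, 3)
errors = np.random.rand(500)
labels = (errors > np.median(errors)).astype(int)
feats = local_geometric_features(points)
clf = RandomForestClassifier(n_estimators=200).fit(feats, labels)
print(roc_auc_score(labels, clf.predict_proba(feats)[:, 1]))
```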


【4】Bayesian Neural Networks vs. Mixture Density Networks: Theoretical and Empirical Insights for Uncertainty-Aware Nonlinear Modeling
标题:Bayesian神经网络与混合密度网络:不确定性感知非线性建模的理论和经验见解
链接:https://arxiv.org/abs/2510.25001

作者:Riddhi Pratim Ghosh, Ian Barnett
备注:20 pages, 2 figures
摘要:本文研究了两种著名的概率神经建模范式:用于不确定性感知非线性回归的贝叶斯神经网络(BNN)和混合密度网络(MDN)。BNN通过在网络参数上放置先验分布来引入认知不确定性,而MDN直接对条件输出分布进行建模,从而捕获多模态和异方差的数据生成机制。我们提出了一个统一的理论与实证框架来比较这些方法。在理论方面,我们在Hölder光滑性条件下推导了收敛速度和误差界,表明MDN由于其基于似然的性质实现了更快的Kullback-Leibler(KL)散度收敛,而BNN则表现出由变分推理引入的额外近似偏差。在实证方面,我们在合成非线性数据集和放射影像基准(RSNA儿科骨龄挑战)上评估了这两种架构。定量和定性结果表明,MDN能更有效地捕捉多模态响应和自适应不确定性,而BNN在数据有限时提供更具可解释性的认知不确定性。我们的研究结果阐明了基于后验与基于似然的概率学习的互补优势,为非线性系统中的不确定性感知建模提供了指导。
摘要:This paper investigates two prominent probabilistic neural modeling paradigms: Bayesian Neural Networks (BNNs) and Mixture Density Networks (MDNs) for uncertainty-aware nonlinear regression. While BNNs incorporate epistemic uncertainty by placing prior distributions over network parameters, MDNs directly model the conditional output distribution, thereby capturing multimodal and heteroscedastic data-generating mechanisms. We present a unified theoretical and empirical framework comparing these approaches. On the theoretical side, we derive convergence rates and error bounds under H\"older smoothness conditions, showing that MDNs achieve faster Kullback-Leibler (KL) divergence convergence due to their likelihood-based nature, whereas BNNs exhibit additional approximation bias induced by variational inference. Empirically, we evaluate both architectures on synthetic nonlinear datasets and a radiographic benchmark (RSNA Pediatric Bone Age Challenge). Quantitative and qualitative results demonstrate that MDNs more effectively capture multimodal responses and adaptive uncertainty, whereas BNNs provide more interpretable epistemic uncertainty under limited data. Our findings clarify the complementary strengths of posterior-based and likelihood-based probabilistic learning, offering guidance for uncertainty-aware modeling in nonlinear systems.


迁移|Zero/Few/One-Shot|自适应(3篇)

【1】TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting
标题:TempoPFN:用于零样本时间序列预测的线性RNN的合成数据预训练
链接:https://arxiv.org/abs/2510.25502

作者:Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, Frank Hutter
备注:30 pages, 18 figures, 13 tables
摘要:用于零样本时间序列预测的基础模型在高效长程预测和可复现性方面面临挑战,现有的仅使用合成数据的方法在具有挑战性的基准上表现不佳。本文介绍了TempoPFN,一种基于线性循环神经网络(RNN)、完全在合成数据上预训练的单变量时间序列基础模型。该模型使用带有状态编织(state-weaving)的GatedDeltaProduct架构,可在整个序列长度上完全并行化训练,无需加窗或摘要技术,同时保持稳健的时间状态跟踪。我们全面的合成数据管道统一了多种生成器,包括随机微分方程、高斯过程和音频合成,并辅以新颖的数据增强。在Gift-Eval基准的零样本评估中,TempoPFN取得了顶级的竞争性能,优于所有现有的仅合成数据方法,并超过了绝大多数在真实世界数据上训练的模型,同时借助完全可并行化的训练和推理,比现有基线更高效。我们开源了完整的数据生成管道和训练代码,为未来研究提供可复现的基础。
摘要:Foundation models for zero-shot time series forecasting face challenges in efficient long-horizon prediction and reproducibility, with existing synthetic-only approaches underperforming on challenging benchmarks. This paper presents TempoPFN, a univariate time series foundation model based on linear Recurrent Neural Networks (RNNs) pre-trained exclusively on synthetic data. The model uses a GatedDeltaProduct architecture with state-weaving for fully parallelizable training across sequence lengths, eliminating the need for windowing or summarization techniques while maintaining robust temporal state-tracking. Our comprehensive synthetic data pipeline unifies diverse generators, including stochastic differential equations, Gaussian processes, and audio synthesis, with novel augmentations. In zero-shot evaluations on the Gift-Eval benchmark, TempoPFN achieves top-tier competitive performance, outperforming all existing synthetic-only approaches and surpassing the vast majority of models trained on real-world data, while being more efficient than existing baselines by leveraging fully parallelizable training and inference. We open-source our complete data generation pipeline and training code, providing a reproducible foundation for future research.


【2】Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
标题:具有自适应步长的动态加权动量用于高效的深度网络训练
链接:https://arxiv.org/abs/2510.25042

作者:Zhifeng Wang, Longlong Li, Chunyan Zeng
备注:45 pages, 12 figures
摘要:在当前的深度学习研究领域,尽管随机梯度下降(SGD)和自适应矩估计(Adam)等优化算法得到了广泛应用,但它们在解决学习效率波动、满足复杂模型需求和解决非凸优化问题方面的能力仍然存在明显不足。这些挑战主要来自算法在处理复杂数据结构和模型方面的局限性,例如,难以选择适当的学习率,避免局部最优,以及在高维空间中导航。为了解决这些问题,本文介绍了一种新的优化算法DWMGrad。该算法在传统方法的基础上,引入了一种依赖于历史数据的动态引导机制,以动态更新动量和学习率。这允许优化器灵活地调整其对历史信息的依赖,以适应各种训练场景。这种策略不仅使优化器能够更好地适应不断变化的环境和任务复杂性,而且通过广泛的实验验证,展示了DWMGrad在多种场景下实现更快收敛速度和更高精度的能力。
摘要:Within the current sphere of deep learning research, despite the extensive application of optimization algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), there remains a pronounced inadequacy in their capability to address fluctuations in learning efficiency, meet the demands of complex models, and tackle non-convex optimization issues. These challenges primarily arise from the algorithms' limitations in handling complex data structures and models, for instance, difficulties in selecting an appropriate learning rate, avoiding local optima, and navigating through high-dimensional spaces. To address these issues, this paper introduces a novel optimization algorithm named DWMGrad. This algorithm, building on the foundations of traditional methods, incorporates a dynamic guidance mechanism reliant on historical data to dynamically update momentum and learning rates. This allows the optimizer to flexibly adjust its reliance on historical information, adapting to various training scenarios. This strategy not only enables the optimizer to better adapt to changing environments and task complexities but also, as validated through extensive experimentation, demonstrates DWMGrad's ability to achieve faster convergence rates and higher accuracies under a multitude of scenarios.


【3】Adaptive EEG-based stroke diagnosis with a GRU-TCN classifier and deep Q-learning thresholding
标题:采用GRU-TCN分类器和深度Q学习阈值法的基于脑电图的自适应卒中诊断
链接:https://arxiv.org/abs/2510.24889

作者:Shakeel Abdulkareem (1), Bora Yimenicioglu (2), Andrea Yang (3), Khartik Uppalapati (2), Aneesh Gudipati (1), Zhaoyang Fan (3) ((1) George Mason University, College of Science, Fairfax, VA, USA, (2) Raregen Youth Network, Translational Medical Research Department, Oakton, VA, USA, (3) University of Southern California, Los Angeles, CA, USA)
备注:10 pages, 6 figures. Equal contribution: Shakeel Abdulkareem and Bora Yimenicioglu. Compiled with pdfLaTeX (wlscirep class)
摘要:疑似卒中的快速分诊需要准确的、可在床旁部署的工具;EEG很有前景,但在首次接触时未得到充分利用。我们提出了一种自适应多任务EEG分类器,它将32通道信号转换为功率谱密度特征(Welch方法),使用循环-卷积网络(GRU-TCN)预测卒中类型(健康、缺血性、出血性)、半球偏侧化和严重程度,并应用深度Q网络(DQN)实时调整决策阈值。采用UCLH卒中EIT/EEG数据集(44条记录;约26例急性卒中,10例对照)的按患者划分,主要结局是卒中类型的分类性能;次要结局是严重程度和偏侧化。基线GRU-TCN对卒中类型的准确率达到89.3%(F1 92.8%),对严重程度约为96.9%(F1 95.9%),对偏侧化约为96.7%(F1 97.4%)。采用DQN阈值自适应后,卒中类型准确率提高到约98.0%(F1 97.7%)。我们还在独立的低密度EEG队列(ZJU4H)上测试了鲁棒性,并报告了配对的患者水平统计数据。分析遵循STARD 2015诊断准确性研究指南(指标检验:GRU-TCN+DQN;参考标准:放射学/临床诊断;按患者评估)。自适应阈值将工作点调整到临床偏好的灵敏度-特异度权衡,而集成的头皮图和频谱可视化支持可解释性。
摘要:Rapid triage of suspected stroke needs accurate, bedside-deployable tools; EEG is promising but underused at first contact. We present an adaptive multitask EEG classifier that converts 32-channel signals to power spectral density features (Welch), uses a recurrent-convolutional network (GRU-TCN) to predict stroke type (healthy, ischemic, hemorrhagic), hemispheric lateralization, and severity, and applies a deep Q-network (DQN) to tune decision thresholds in real time. Using a patient-wise split of the UCLH Stroke EIT/EEG data set (44 recordings; about 26 acute stroke, 10 controls), the primary outcome was stroke-type performance; secondary outcomes were severity and lateralization. The baseline GRU-TCN reached 89.3% accuracy (F1 92.8%) for stroke type, about 96.9% (F1 95.9%) for severity, and about 96.7% (F1 97.4%) for lateralization. With DQN threshold adaptation, stroke-type accuracy increased to about 98.0% (F1 97.7%). We also tested robustness on an independent, low-density EEG cohort (ZJU4H) and report paired patient-level statistics. Analyses follow STARD 2015 guidance for diagnostic accuracy studies (index test: GRU-TCN+DQN; reference standard: radiology/clinical diagnosis; patient-wise evaluation). Adaptive thresholding shifts the operating point to clinically preferred sensitivity-specificity trade-offs, while integrated scalp-map and spectral visualizations support interpretability.
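下面是一个示意性的特征提取草图(采样率、通道数与频段划分均为演示用假设),展示如何用 Welch 方法把多通道 EEG 信号转换为分频段功率谱密度特征,作为 GRU-TCN 之类分类器的输入;它只说明"功率谱密度特征(Welch)"这一步的含义,并非论文的完整流程。

```python
# 假设性示例:用 Welch 方法从多通道 EEG 中提取分频段功率特征
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def psd_band_features(eeg, fs=256):
    """eeg: (n_channels, n_samples) -> 长度为 n_channels * n_bands 的特征向量"""
    feats = []
    for ch in eeg:                                          # 逐通道计算 PSD
        freqs, psd = welch(ch, fs=fs, nperseg=fs * 2)
        for lo, hi in BANDS.values():
            mask = (freqs >= lo) & (freqs < hi)
            feats.append(np.trapz(psd[mask], freqs[mask]))  # 频段内功率
    return np.array(feats)

features = psd_band_features(np.random.randn(32, 256 * 10), fs=256)  # 32 通道、10 秒
print(features.shape)   # (128,)
```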


强化学习(2篇)

【1】Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning
标题:多目标强化学习中密集且多样化的目标覆盖
链接:https://arxiv.org/abs/2510.25311

作者:Sagalpreet Singh, Rishi Saket, Aravindan Raghuveer
备注:21 pages, 5 figures
摘要:强化学习算法主要关注学习最大化期望回报的策略,因此学到的策略可能只利用一个或少数几个奖励来源。然而,在许多自然情形中,我们希望学到的策略在最大化期望回报(通常与到达目标状态绑定)的同时,在各个奖励状态上诱导出分散的边际状态分布。这一方面相对而言仍未被探索。基于熵正则化和内在奖励的现有技术使用随机性来鼓励探索以找到最优策略,但这不一定会带来在奖励状态上分散的边际状态分布。其他匹配目标分布的RL算法则假设目标分布是先验可得的,这在无法枚举所有状态、且只有到达某状态才能确定它是否为目标状态的大规模系统中可能不可行。我们将"在均匀访问目标状态的同时最大化期望回报"形式化为多目标RL问题,其中由状态空间上的一个预言分类器判定目标状态。我们提出了一种新算法,学习一个高回报的策略混合,其边际状态分布分散在目标状态集合上。我们的算法基于优化一个自定义RL奖励,该奖励在每次迭代时基于当前策略混合对一组采样轨迹进行计算;随后通过离线RL算法利用这些轨迹来更新策略混合。我们证明了算法的性能保证,给出了优化一个自然目标的有效收敛界,该目标同时刻画了期望回报以及边际状态分布在目标状态上的分散程度。我们在合成MDP和标准RL环境上设计并开展实验,以评估算法的有效性。
摘要:Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states, while maximizing the expected return which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochasticity for encouraging exploration to find an optimal policy which may not necessarily lead to dispersed marginal state distribution over rewarding states. Other RL algorithms which match a target distribution assume the latter to be available apriori. This may be infeasible in large scale systems where enumeration of all states is not possible and a state is determined to be a goal state only upon reaching it. We formalize the problem of maximizing the expected return while uniformly visiting the goal states as Multi Goal RL in which an oracle classifier over the state space determines the goal states. We propose a novel algorithm that learns a high-return policy mixture with marginal state distribution dispersed over the set of goal states. Our algorithm is based on optimizing a custom RL reward which is computed - based on the current policy mixture - at each iteration for a set of sampled trajectories. The latter are used via an offline RL algorithm to update the policy mixture. We prove performance guarantees for our algorithm, showing efficient convergence bounds for optimizing a natural objective which captures the expected return as well as the dispersion of the marginal state distribution over the goal states. We design and perform experiments on synthetic MDPs and standard RL environments to evaluate the effectiveness of our algorithm.


【2】Enhancing Hierarchical Reinforcement Learning through Change Point Detection in Time Series
标题:通过时间序列中的变点检测增强分层强化学习
链接:https://arxiv.org/abs/2510.24988

作者:Hemanath Arumugam, Falong Fan, Bo Liu
摘要:分层强化学习(HRL)通过跨越多个时间步的选项策略引入时间抽象,增强了长时间任务中决策的可扩展性。尽管其理论上的吸引力,HRL的实际实施遭受自主发现语义有意义的子目标和学习最佳选项终止边界的挑战。本文介绍了一种新的架构,它集成了一个自我监督,基于变压器的变化点检测(CPD)模块到Option-Critic框架,使状态轨迹的自适应分割和选项的发现。CPD模块使用来自内在信号的启发式伪标签进行训练,以在没有外部监督的情况下推断环境动态中的潜在变化。这些推断的变点以三种关键方式被利用:(i)作为稳定终止函数梯度的监督信号,(ii)通过分段行为克隆来预训练期权内策略,以及(iii)通过CPD定义的状态分区上的期权间分歧惩罚来实施功能专业化。整体优化目标使用结构感知辅助损失来增强标准演员-评论家损失。在我们的框架中,选项发现自然出现CPD定义的轨迹段映射到不同的选项内的政策,使代理自主划分其行为到可重用的,语义上有意义的技能。四个房间和弹球任务的实验表明,CPD引导代理表现出加速收敛,更高的累积回报,并显着改善选项专业化。这些研究结果证实,通过变点分割整合结构先验,在复杂的环境中,导致更可解释的,样本效率高,鲁棒的分层政策。
摘要:Hierarchical Reinforcement Learning (HRL) enhances the scalability of decision-making in long-horizon tasks by introducing temporal abstraction through options-policies that span multiple timesteps. Despite its theoretical appeal, the practical implementation of HRL suffers from the challenge of autonomously discovering semantically meaningful subgoals and learning optimal option termination boundaries. This paper introduces a novel architecture that integrates a self-supervised, Transformer-based Change Point Detection (CPD) module into the Option-Critic framework, enabling adaptive segmentation of state trajectories and the discovery of options. The CPD module is trained using heuristic pseudo-labels derived from intrinsic signals to infer latent shifts in environment dynamics without external supervision. These inferred change-points are leveraged in three critical ways: (i) to serve as supervisory signals for stabilizing termination function gradients, (ii) to pretrain intra-option policies via segment-wise behavioral cloning, and (iii) to enforce functional specialization through inter-option divergence penalties over CPD-defined state partitions. The overall optimization objective enhances the standard actor-critic loss using structure-aware auxiliary losses. In our framework, option discovery arises naturally as CPD-defined trajectory segments are mapped to distinct intra-option policies, enabling the agent to autonomously partition its behavior into reusable, semantically meaningful skills. Experiments on the Four-Rooms and Pinball tasks demonstrate that CPD-guided agents exhibit accelerated convergence, higher cumulative returns, and significantly improved option specialization. These findings confirm that integrating structural priors via change-point segmentation leads to more interpretable, sample-efficient, and robust hierarchical policies in complex environments.


分层学习(1篇)

【1】Hierarchical Physics-Embedded Learning for Spatiotemporal Dynamical Systems
标题:时空动态系统的分层物理嵌入式学习
链接:https://arxiv.org/abs/2510.25306

作者:Xizhe Wang, Xiaobin Song, Qingshan Jia, Hongbo Zhao, Benben Jiang
摘要:模拟复杂的时空动力学,特别是在远离平衡的系统中,仍然是科学界的一个巨大挑战。这些系统的控制偏微分方程(PDE)往往难以从第一性原理推导,因为其固有的复杂性(以高阶导数和强非线性为特征)再加上不完整的物理知识。这刺激了数据驱动方法的发展,但这些方法面临局限:纯数据驱动的模型通常物理上不一致且需要大量数据,而现有的物理信息方法缺乏表示复杂算子或系统性整合部分物理知识的结构能力。在此,我们提出了一个分层的物理嵌入式学习框架,从根本上推进了从稀疏和含噪数据中进行时空正向预测和物理定律反向发现。关键创新是一个反映科学发现过程的两级架构:第一级学习PDE的基本符号组成部分,第二级学习它们的控制组合。这种分层分解不仅降低了学习复杂性,更重要的是,它能够实现先验知识的结构化整合。已知的物理定律被直接嵌入模型的计算图中,保证了物理一致性并提高了数据效率。通过在自适应傅立叶神经算子之上构建该框架,我们可以有效捕捉动力系统特有的非局部依赖和高阶算子。此外,通过在结构上解耦已知项和未知项,该框架进一步使得通过符号回归对潜在控制方程进行可解释的发现成为可能,而无需预先假设函数形式。
摘要:Modeling complex spatiotemporal dynamics, particularly in far-from-equilibrium systems, remains a grand challenge in science. The governing partial differential equations (PDEs) for these systems are often intractable to derive from first principles, due to their inherent complexity, characterized by high-order derivatives and strong nonlinearities, coupled with incomplete physical knowledge. This has spurred the development of data-driven methods, yet these approaches face limitations: Purely data-driven models are often physically inconsistent and data-intensive, while existing physics-informed methods lack the structural capacity to represent complex operators or systematically integrate partial physical knowledge. Here, we propose a hierarchical physics-embedded learning framework that fundamentally advances both the forward spatiotemporal prediction and inverse discovery of physical laws from sparse and noisy data. The key innovation is a two-level architecture that mirrors the process of scientific discovery: the first level learns fundamental symbolic components of a PDE, while the second learns their governing combinations. This hierarchical decomposition not only reduces learning complexity but, more importantly, enables a structural integration of prior knowledge. Known physical laws are directly embedded into the models computational graph, guaranteeing physical consistency and improving data efficiency. By building the framework upon adaptive Fourier Neural Operators, we can effectively capture the non-local dependencies and high-order operators characteristic of dynamical systems. Additionally, by structurally decoupling known and unknown terms, the framework further enables interpretable discovery of underlying governing equations through symbolic regression, without presupposing functional forms.


医学相关(5篇)

【1】Epileptic Seizure Detection and Prediction from EEG Data: A Machine Learning Approach with Clinical Validation
标题:根据脑电数据检测和预测癫痫发作:一种具有临床验证的机器学习方法
链接:https://arxiv.org/abs/2510.24986

作者:Ria Jayanti, Tanish Jain
备注:9 pages, 3 figures
摘要:近年来,机器学习已成为支持癫痫治疗中发作检测和监测的日益强大的工具。传统方法侧重于在癫痫发作开始之后才进行识别,这限制了早期干预和主动治疗的机会。在本研究中,我们提出了一种将实时发作检测与发作预测相结合的新方法,旨在捕捉EEG数据中可能预示即将发作的细微时间模式。我们的方法使用CHB-MIT头皮EEG数据库进行评估,该数据库包括从23名儿童和年轻成人耐药性癫痫患者中收集的969小时记录和173次发作。为支持发作检测,我们实现了一系列有监督机器学习算法,包括K最近邻、逻辑回归、随机森林和支持向量机。逻辑回归达到了90.9%的检测准确率和89.6%的召回率,展现出适用于临床筛查的平衡性能。随机森林和支持向量机模型取得了更高的准确率(94.0%),但召回率为0%,未能检测到任何一次发作,这说明在存在类别不平衡的医学ML模型评估中,仅凭准确率是不够的。在发作预测方面,我们采用了长短期记忆(LSTM)网络,利用深度学习对EEG数据中的时间依赖性进行建模。LSTM模型实现了89.26%的预测准确率。这些结果突显了开发可及的实时监测工具的潜力:不仅能像传统方法那样检测发作,还能在发作之前进行预测。这种预测发作的能力标志着从被动的发作管理向更主动的方式的重大转变,使患者能够预见发作并采取预防措施,以降低受伤或其他并发症的风险。
摘要 :In recent years, machine learning has become an increasingly powerful tool for supporting seizure detection and monitoring in epilepsy care. Traditional approaches focus on identifying seizures only after they begin, which limits the opportunity for early intervention and proactive treatment. In this study, we propose a novel approach that integrates both real-time seizure detection and prediction, aiming to capture subtle temporal patterns in EEG data that may indicate an upcoming seizure. Our approach was evaluated using the CHB-MIT Scalp EEG Database, which includes 969 hours of recordings and 173 seizures collected from 23 pediatric and young adult patients with drug-resistant epilepsy. To support seizure detection, we implemented a range of supervised machine learning algorithms, including K-Nearest Neighbors, Logistic Regression, Random Forest, and Support Vector Machine. The Logistic Regression achieved 90.9% detection accuracy with 89.6% recall, demonstrating balanced performance suitable for clinical screening. Random Forest and Support Vector Machine models achieved higher accuracy (94.0%) but with 0% recall, failing to detect any seizures, illustrating that accuracy alone is insufficient for evaluating medical ML models with class imbalance. For seizure prediction, we employed Long Short-Term Memory (LSTM) networks, which use deep learning to model temporal dependencies in EEG data. The LSTM model achieved 89.26% prediction accuracy. These results highlight the potential of developing accessible, real-time monitoring tools that not only detect seizures as traditionally done, but also predict them before they occur. This ability to predict seizures marks a significant shift from reactive seizure management to a more proactive approach, allowing patients to anticipate seizures and take precautionary measures to reduce the risk of injury or other complications.
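摘要特别强调了类别不平衡下"只看准确率"的误导性(94.0% 的准确率但召回率为 0)。下面的小例子用纯演示数据说明这一点(患病率等数字均为假设,与论文数据无关):一个永远预测"无发作"的分类器可以得到很高的准确率,却一次发作也检不出来。

```python
# 假设性示例:类别不平衡下准确率与召回率的差异
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.06).astype(int)    # 约 6% 的"发作"样本

y_always_normal = np.zeros_like(y_true)           # 永远预测"无发作"
print(accuracy_score(y_true, y_always_normal))    # 约 0.94,看似很高
print(recall_score(y_true, y_always_normal))      # 0.0,一次发作也没检出
```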


【2】CFL-SparseMed: Communication-Efficient Federated Learning for Medical Imaging with Top-k Sparse Updates
标题:CFL-SparseMed:具有Top-k稀疏更新的医学成像通信高效联邦学习
链接:https://arxiv.org/abs/2510.24776

作者:Gousia Habib, Aniket Bhardwaj, Ritvik Sharma, Shoeib Amin Banday, Ishfaq Ahmad Malik
摘要:安全可靠的医学图像分类对于有效的患者治疗至关重要,但由于数据和隐私问题,集中式模型面临挑战。联邦学习(FL)能够实现隐私保护的协作,但难以应对异构的非IID数据和高昂的通信成本,特别是在大型网络中。我们提出了\textbf{CFL-SparseMed},一种使用Top-k稀疏化的FL方法,通过仅传输前k个梯度来减少通信开销。这一统一的解决方案在保持模型准确性的同时有效应对数据异构性。它提高了FL效率,保护隐私,并在非IID医学影像环境中改善诊断准确性和患者护理。用于复现的源代码可在\href{https://github.com/Aniket2241/APK_contruct}{Github}上获得。
摘要:Secure and reliable medical image classification is crucial for effective patient treatment, but centralized models face challenges due to data and privacy concerns. Federated Learning (FL) enables privacy-preserving collaborations but struggles with heterogeneous, non-IID data and high communication costs, especially in large networks. We propose \textbf{CFL-SparseMed}, an FL approach that uses Top-k Sparsification to reduce communication overhead by transmitting only the top k gradients. This unified solution effectively addresses data heterogeneity while maintaining model accuracy. It enhances FL efficiency, preserves privacy, and improves diagnostic accuracy and patient care in non-IID medical imaging settings. The reproducibility source code is available on \href{https://github.com/Aniket2241/APK_contruct}{Github}.
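下面是 Top-k 梯度稀疏化这一通用思想的最小草图(假设性示例,与 CFL-SparseMed 的具体实现细节无关):客户端每轮只上传绝对值最大的 k 个梯度分量的索引与取值,服务器端再还原为稠密向量。

```python
# 假设性示例:Top-k 梯度稀疏化,仅传输最大的 k 个分量
import numpy as np

def topk_sparsify(grad, k):
    """grad: 一维梯度向量 -> (indices, values),供客户端上传"""
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # 绝对值最大的 k 个位置
    return idx, grad[idx]

def desparsify(idx, values, dim):
    """服务器端还原为稠密向量,未传输的位置视为 0"""
    dense = np.zeros(dim)
    dense[idx] = values
    return dense

g = np.random.randn(10_000)
idx, vals = topk_sparsify(g, k=100)                       # 通信量约为原来的 1%
print(np.count_nonzero(desparsify(idx, vals, g.size)))    # 100
```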


【3】EcoScaleNet: A Lightweight Multi Kernel Network for Long Sequence 12 lead ECG Classification
标题:EcoScaleNet:一种用于长序列12导联心电图分类的轻量级多核网络
链接:https://arxiv.org/abs/2510.24748

作者:Dong-Hyeon Kang, Ju-Hyeon Nam, Sang-Chul Lee
备注:MICCAI Workshop on Efficient Medical AI (EMA)
摘要:12导联心电图(ECG)的准确解读对于心脏异常的早期检测至关重要,然而人工判读容易出错,而现有基于CNN的分类器又难以选择能够泛化到ECG典型长序列的感受野大小。Omni Scale CNN(OS CNN)受哥德巴赫猜想启发,通过枚举素数大小的卷积核来覆盖每个尺度以解决这一问题,但其穷举式设计使计算成本激增,并阻碍了构建更深、更宽的模型。我们提出了高效卷积全尺度网络(EcoScaleNet),这是一种分层变体,在消除冗余的同时保留了完整的感受野覆盖。在每个阶段,最大卷积核长度被限制在下采样后仍然需要的尺度,并且在每个Omni Scale块前后插入瓶颈卷积,以抑制通道增长并融合多尺度特征。在大规模CODE 15% ECG数据集上,与OS CNN相比,EcoScaleNet将参数减少了90%,FLOP减少了99%,同时将宏平均F1分数提高了2.4%。这些结果表明,EcoScaleNet以极低的计算成本为长序列ECG分类提供了SOTA准确率,使其能在商用硬件上进行实时部署。我们的EcoScaleNet代码可在GitHub链接中找到。
摘要:Accurate interpretation of 12 lead electrocardiograms (ECGs) is critical for early detection of cardiac abnormalities, yet manual reading is error prone and existing CNN based classifiers struggle to choose receptive field sizes that generalize to the long sequences typical of ECGs. Omni Scale CNN (OS CNN) addresses this by enumerating prime sized kernels inspired by Goldbach conjecture to cover every scale, but its exhaustive design explodes computational cost and blocks deeper, wider models. We present Efficient Convolutional Omni Scale Network (EcoScale-Net), a hierarchical variant that retains full receptive field coverage while eliminating redundancy. At each stage, the maximum kernel length is capped to the scale still required after down sampling, and bottleneck convolutions inserted before and after every Omni Scale block curtail channel growth and fuse multi scale features. On the large scale CODE 15% ECG dataset, EcoScaleNet reduces parameters by 90% and FLOPs by 99% compared with OS CNN, while raising macro averaged F1 score by 2.4%. These results demonstrate that EcoScaleNet delivers SOTA accuracy for long sequence ECG classification at a fraction of the computational cost, enabling real time deployment on commodity hardware. Our EcoScaleNet code is available in GitHub Link.


【4】Comparative Analysis of Data Augmentation for Clinical ECG Classification with STAR
标题:使用STAR进行临床心电图分类数据增强的比较分析
链接:https://arxiv.org/abs/2510.24740

作者:Nader Nemati
备注:19 pages, 11 figures
摘要:临床12导联心电图分类仍然困难,因为多样的记录条件、相互重叠的病理以及明显的标签不平衡阻碍了泛化,而不加约束的数据增强则有扭曲诊断关键形态的风险。在本研究中,我们引入了正弦时间-幅度重采样(STAR)作为一种逐拍增强方法:它严格在相邻R峰之间操作,对每个R-R段施加受控的时间扭曲和幅度缩放,保持标准的P-QRS-T顺序,并使信号的首尾保持不变。STAR面向实用管道设计,提供:(i)忠于形态的变化性,在不破坏波峰或间期的前提下扩大训练多样性;(ii)对数据来源具有弹性的训练,无需针对特定队列调参即可提高跨设备、跨站点和跨队列的稳定性;(iii)与常见的1D SE-ResNet式ECG编码器骨干的模型无关集成;以及(iv)通过节拍级增强更好地学习稀有类别,通过重采样富含信息的节拍而非复制整条记录来减少过拟合。与全局裁剪、大幅平移或加性噪声相比,STAR避免了会抑制或错位临床标志的变换。我们发布了完整的Python实现和透明的训练工作流,与多机构12导联语料库上的来源感知分层五折协议保持一致,从而便于检查和重用。总之,对于形态可信度、操作简便性和跨来源耐用性至关重要的临床ECG分类,STAR提供了一种简单且可控的增强方法。
摘要:Clinical 12-lead ECG classification remains difficult because of diverse recording conditions, overlapping pathologies, and pronounced label imbalance hinder generalization, while unconstrained augmentations risk distorting diagnostically critical morphology. In this study, Sinusoidal Time--Amplitude Resampling (STAR) is introduced as a beat-wise augmentation that operates strictly between successive R-peaks to apply controlled time warping and amplitude scaling to each R--R segment, preserving the canonical P--QRS--T order and leaving the head and tail of the trace unchanged. STAR is designed for practical pipelines and offers: (i) morphology-faithful variability that broadens training diversity without corrupting peaks or intervals; (ii) source-resilient training, improving stability across devices, sites, and cohorts without dataset-specific tuning; (iii) model-agnostic integration with common 1D SE--ResNet-style ECG encoders backbone; and (iv) better learning on rare classes via beat-level augmentation, reducing overfitting by resampling informative beats instead of duplicating whole records. In contrast to global crops, large shifts, or additive noise, STAR avoids transformations that suppress or misalign clinical landmarks. A complete Python implementation and a transparent training workflow are released, aligned with a source-aware, stratified five-fold protocol over a multi-institutional 12-lead corpus, thereby facilitating inspection and reuse. Taken together, STAR provides a simple and controllable augmentation for clinical ECG classification where trustworthy morphology, operational simplicity, and cross-source durability are essential.
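下面给出逐拍增强的一个示意性草图(假设 R 峰位置已知;"正弦式"扭曲的具体形式与参数均为演示用假设,并非论文给出的公式):对每个 R-R 段施加受控的时间扭曲与幅度缩放,段外样本保持不变。

```python
# 假设性示例:在相邻 R 峰之间做逐拍时间扭曲与幅度缩放
import numpy as np

def beatwise_augment(sig, r_peaks, warp=0.1, scale=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    out = sig.copy()
    for a, b in zip(r_peaks[:-1], r_peaks[1:]):
        seg = sig[a:b]
        t = np.linspace(0.0, 1.0, len(seg))
        # 正弦式时间扭曲:扰动采样位置,但保持段的两个端点不动
        phase = t + warp * rng.uniform(-1, 1) * np.sin(np.pi * t)
        warped = np.interp(np.clip(phase, 0.0, 1.0), t, seg)
        out[a:b] = warped * (1.0 + scale * rng.uniform(-1, 1))   # 幅度缩放
    return out

ecg = np.sin(np.linspace(0, 20 * np.pi, 5000))     # 玩具信号
peaks = np.arange(0, 5000, 500)                    # 假设已知的 R 峰位置
augmented = beatwise_augment(ecg, peaks)
```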


【5】Cardi-GPT: An Expert ECG-Record Processing Chatbot
标题:Cardi-GPT:一个专家级心电记录处理聊天机器人
链接:https://arxiv.org/abs/2510.24737

作者:Koustav Mallick, Neel Singh, Mohammedreza Hajiarbabi
摘要:解读和传达心电图(ECG)结果是心血管诊断中至关重要但具有挑战性的任务,传统上需要深厚的专业知识和精确的临床沟通。本文介绍了Cardi-GPT,一个先进的专家系统,旨在通过深度学习和自然语言交互简化ECG解读并增强临床沟通。Cardi-GPT采用16残差块卷积神经网络(CNN)处理12导联ECG数据,在24种心脏疾病上实现了0.6194的加权准确率。一个新颖的模糊化层将复杂的数值输出转换为具有临床意义的语言类别,而集成的聊天机器人界面便于直观探索诊断见解,并促进医疗服务提供者之间的无缝沟通。   该系统在横跨四个国家六家医院的多样化数据集上进行了评估,表现优于基线模型。此外,Cardi-GPT使用衡量覆盖率、事实依据(grounding)和连贯性的综合评估框架进行评估,取得了令人印象深刻的73%的总体响应质量得分。通过弥合复杂的ECG数据解读与可操作的临床见解之间的差距,Cardi-GPT代表了心血管医疗保健领域的一项变革性创新,有望在不同医疗环境中改善诊断准确性、临床工作流程和患者结局。
摘要:Interpreting and communicating electrocardiogram (ECG) findings are crucial yet challenging tasks in cardiovascular diagnosis, traditionally requiring significant expertise and precise clinical communication. This paper introduces Cardi-GPT, an advanced expert system designed to streamline ECG interpretation and enhance clinical communication through deep learning and natural language interaction. Cardi-GPT employs a 16-residual-block convolutional neural network (CNN) to process 12-lead ECG data, achieving a weighted accuracy of 0.6194 across 24 cardiac conditions. A novel fuzzification layer converts complex numerical outputs into clinically meaningful linguistic categories, while an integrated chatbot interface facilitates intuitive exploration of diagnostic insights and seamless communication between healthcare providers.   The system was evaluated on a diverse dataset spanning six hospitals across four countries, demonstrating superior performance compared to baseline models. Additionally, Cardi-GPT achieved an impressive overall response quality score of 73\%, assessed using a comprehensive evaluation framework that measures coverage, grounding, and coherence. By bridging the gap between intricate ECG data interpretation and actionable clinical insights, Cardi-GPT represents a transformative innovation in cardiovascular healthcare, promising to improve diagnostic accuracy, clinical workflows, and patient outcomes across diverse medical settings.


推荐(1篇)

【1】TV-Rec: Time-Variant Convolutional Filter for Sequential Recommendation
标题:TV-Rec:用于顺序推荐的时变卷积滤波器
链接:https://arxiv.org/abs/2510.25259

作者:Yehjin Shin, Jeongwhan Choi, Seojin Kim, Noseong Park
备注:The 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:近年来,卷积滤波器由于能够捕获局部序列模式而越来越多地被用于序列推荐。然而,这些模型大多需要用自注意力来补充卷积滤波器,因为单独的卷积滤波器(通常是固定滤波器)难以捕捉准确推荐所需的全局交互。我们提出了用于序列推荐的时变卷积滤波器(TV-Rec),这是一个受图信号处理启发的模型,其中时变图滤波器捕获用户序列中依赖位置的时间变化。通过用时变滤波器同时取代固定卷积核和自注意力,TV-Rec获得了更强的表达能力,并能更好地捕捉用户行为中的复杂交互模式。这一设计不仅消除了对自注意力的需求,还在加速推理的同时减少了计算量。在六个公共基准上的大量实验表明,TV-Rec平均比最先进的基线高出7.49%。
摘要:Recently, convolutional filters have been increasingly adopted in sequential recommendation for their ability to capture local sequential patterns. However, most of these models complement convolutional filters with self-attention. This is because convolutional filters alone, generally fixed filters, struggle to capture global interactions necessary for accurate recommendation. We propose Time-Variant Convolutional Filters for Sequential Recommendation (TV-Rec), a model inspired by graph signal processing, where time-variant graph filters capture position-dependent temporal variations in user sequences. By replacing both fixed kernels and self-attention with time-variant filters, TV-Rec achieves higher expressive power and better captures complex interaction patterns in user behavior. This design not only eliminates the need for self-attention but also reduces computation while accelerating inference. Extensive experiments on six public benchmarks show that TV-Rec outperforms state-of-the-art baselines by an average of 7.49%.


自动驾驶|车辆|车道检测等(2篇)

【1】SCOUT: A Lightweight Framework for Scenario Coverage Assessment in Autonomous Driving
标题:SCOT:自动驾驶场景覆盖评估的轻量级框架
链接:https://arxiv.org/abs/2510.24949

作者:Anil Yildiz, Sarah M. Thornton, Carl Hildebrandt, Sreeja Roy-Singh, Mykel J. Kochenderfer
摘要:评估场景覆盖率对于评估自主代理的鲁棒性至关重要,但现有方法依赖昂贵的人工标注或计算密集的大型视觉语言模型(LVLM)。由于成本和效率的限制,这些方法对于大规模部署并不现实。为了解决这些不足,我们提出了SCOUT(场景覆盖监督和理解工具),一个轻量级代理模型,旨在直接从自主代理的潜在传感器表示预测场景覆盖标签。SCOUT通过蒸馏过程进行训练,学习近似LVLM生成的覆盖标签,从而无需持续的LVLM推理或人工标注。通过利用预先计算的感知特征,SCOUT避免了冗余计算,并实现了快速、可扩展的场景覆盖估计。我们在真实自主导航场景的大型数据集上评估了该方法,证明它在显著降低计算成本的同时保持了高准确率。结果表明,SCOUT为大规模覆盖分析提供了一种有效且实用的替代方案。虽然其性能取决于LVLM生成的训练标签的质量,但SCOUT是朝着自主系统中高效场景覆盖监督迈出的重要一步。
摘要:Assessing scenario coverage is crucial for evaluating the robustness of autonomous agents, yet existing methods rely on expensive human annotations or computationally intensive Large Vision-Language Models (LVLMs). These approaches are impractical for large-scale deployment due to cost and efficiency constraints. To address these shortcomings, we propose SCOUT (Scenario Coverage Oversight and Understanding Tool), a lightweight surrogate model designed to predict scenario coverage labels directly from an agent's latent sensor representations. SCOUT is trained through a distillation process, learning to approximate LVLM-generated coverage labels while eliminating the need for continuous LVLM inference or human annotation. By leveraging precomputed perception features, SCOUT avoids redundant computations and enables fast, scalable scenario coverage estimation. We evaluate our method across a large dataset of real-life autonomous navigation scenarios, demonstrating that it maintains high accuracy while significantly reducing computational cost. Our results show that SCOUT provides an effective and practical alternative for large-scale coverage analysis. While its performance depends on the quality of LVLM-generated training labels, SCOUT represents a major step toward efficient scenario coverage oversight in autonomous systems.


【2】DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes
标题:DrivingScene:一种用于动态驾驶场景的多任务在线前向3D高斯飞溅方法
链接:https://arxiv.org/abs/2510.24734

作者:Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui
备注:Autonomous Driving, Novel view Synthesis, Multi task Learning
摘要:动态驾驶场景的实时、高保真重建面临复杂动态和稀疏视角的挑战,现有方法难以在质量和效率之间取得平衡。我们提出了DrivingScene,一个在线、前馈式框架,仅从两帧连续的环视图像重建4D动态场景。我们的关键创新是一个轻量级残差光流网络,它在学习到的静态场景先验之上,为每个相机预测动态物体的非刚性运动,通过场景流显式地建模动态。我们还引入了一种从粗到细的训练范式,以规避端到端方法中常见的不稳定性。在nuScenes数据集上的实验表明,我们这种仅依赖图像的方法能够在线同时生成高质量的深度、场景流和3D高斯点云,在动态重建和新视角合成方面均显著优于最先进的方法。
摘要 :Real-time, high-fidelity reconstruction of dynamic driving scenes is challenged by complex dynamics and sparse views, with prior methods struggling to balance quality and efficiency. We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. Our key innovation is a lightweight residual flow network that predicts the non-rigid motion of dynamic objects per camera on top of a learned static scene prior, explicitly modeling dynamics via scene flow. We also introduce a coarse-to-fine training paradigm that circumvents the instabilities common to end-to-end approaches. Experiments on nuScenes dataset show our image-only method simultaneously generates high-quality depth, scene flow, and 3D Gaussian point clouds online, significantly outperforming state-of-the-art methods in both dynamic reconstruction and novel view synthesis.


联邦学习|隐私保护|加密(1篇)

【1】Subgraph Federated Learning via Spectral Methods
标题:通过谱方法的子图联邦学习
链接:https://arxiv.org/abs/2510.25657

作者:Javad Aliakbari, Johan Östman, Ashkan Panahi, Alexandre Graell i Amat
备注:To be presented at The Annual Conference on Neural Information Processing Systems (NeurIPS) 2025
摘要:我们考虑图结构数据分布在多个客户端上的联邦学习(FL)问题。特别地,我们针对子图相互连接这一普遍场景:客户端之间的互连会显著影响学习过程。现有方法存在严重局限:要么需要交换敏感的节点嵌入,从而带来隐私风险;要么依赖计算密集的步骤,从而妨碍可扩展性。为了应对这些挑战,我们提出了FedLap,一个新颖的框架,它通过谱域中的拉普拉斯平滑来利用全局结构信息,在确保隐私和可扩展性的同时有效捕获节点间依赖关系。我们对FedLap的隐私进行了形式化分析,证明其能够保护隐私。值得注意的是,FedLap是第一个具有强隐私保证的子图FL方案。在基准数据集上的大量实验表明,与现有技术相比,FedLap实现了具有竞争力或更优的效用。
摘要:We consider the problem of federated learning (FL) with graph-structured data distributed across multiple clients. In particular, we address the prevalent scenario of interconnected subgraphs, where interconnections between clients significantly influence the learning process. Existing approaches suffer from critical limitations, either requiring the exchange of sensitive node embeddings, thereby posing privacy risks, or relying on computationally-intensive steps, which hinders scalability. To tackle these challenges, we propose FedLap, a novel framework that leverages global structure information via Laplacian smoothing in the spectral domain to effectively capture inter-node dependencies while ensuring privacy and scalability. We provide a formal analysis of the privacy of FedLap, demonstrating that it preserves privacy. Notably, FedLap is the first subgraph FL scheme with strong privacy guarantees. Extensive experiments on benchmark datasets demonstrate that FedLap achieves competitive or superior utility compared to existing techniques.
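作为理解"谱域拉普拉斯平滑"这一构件的假设性小例子(滤波器形状与参数均为假设,并非 FedLap 的具体算法),下面对图上的节点特征施加低通谱滤波,等价于求解 (I + beta*L)^{-1} X:

```python
# 假设性示例:图拉普拉斯的谱域低通平滑
import numpy as np

def spectral_smooth(A, X, beta=1.0):
    """A: 邻接矩阵 (n,n);X: 节点特征 (n,d);返回平滑后的特征"""
    L = np.diag(A.sum(axis=1)) - A            # 组合拉普拉斯
    lam, U = np.linalg.eigh(L)                # 谱分解 L = U diag(lam) U^T
    h = 1.0 / (1.0 + beta * lam)              # 低通滤波器 h(lambda)
    return U @ np.diag(h) @ U.T @ X           # 谱域滤波后变换回节点域

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
print(spectral_smooth(A, X).shape)   # (4, 3)
```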


推理|分析|理解|解释(10篇)

【1】Neural Stochastic Flows: Solver-Free Modelling and Inference for SDE Solutions
标题:神经随机流:SDE解的免求解器建模与推断
链接:https://arxiv.org/abs/2510.25769

作者:Naoki Kiyohara, Edward Johns, Yingzhen Li
备注:NeurIPS 2025 (poster). Project page: this https URL
摘要:随机微分方程(SDE)非常适合对金融、物理和机器学习中含噪声且不规则采样的时间序列进行建模。传统方法需要昂贵的数值求解器才能在任意时间点之间采样。我们引入神经随机流(NSF)及其潜变量版本,它们利用带结构约束的条件归一化流直接学习(潜在)SDE转移律,这些约束保留了继承自随机流的性质。这使得任意状态之间的一次性采样成为可能,并在大时间间隔下带来高达两个数量级的加速。在合成SDE模拟以及真实世界跟踪与视频数据上的实验表明,NSF在保持与数值方法相当的分布精度的同时,大幅降低了任意时间点采样的计算开销。
摘要:Stochastic differential equations (SDEs) are well suited to modelling noisy and irregularly sampled time series found in finance, physics, and machine learning. Traditional approaches require costly numerical solvers to sample between arbitrary time points. We introduce Neural Stochastic Flows (NSFs) and their latent variants, which directly learn (latent) SDE transition laws using conditional normalising flows with architectural constraints that preserve properties inherited from stochastic flows. This enables one-shot sampling between arbitrary states and yields up to two orders of magnitude speed-ups at large time gaps. Experiments on synthetic SDE simulations and on real-world tracking and video data show that NSFs maintain distributional accuracy comparable to numerical approaches while dramatically reducing computation for arbitrary time-point sampling.


【2】MLPrE -- A tool for preprocessing and exploratory data analysis prior to machine learning model construction
标题:MLPrE --在机器学习模型构建之前进行预处理和探索性数据分析的工具
链接:https://arxiv.org/abs/2510.25755

作者:David S Maxwell, Michael Darkoh, Sidharth R Samudrala, Caroline Chung, Stephanie T Schmidt, Bissan Al-Lazikani
摘要:随着近年面向AI的深度学习的发展,需要相应工具来满足流入这些模型的数据需求。在某些情况下,源数据可能以多种格式存在,因此必须针对机器学习模型或图数据库对源数据进行梳理和恰当的工程化处理。现有工作流的开销和可扩展性不足限制了其在更大的处理管道(如Apache Airflow)中的集成,因此需要一个健壮、可扩展且轻量级的工具来预处理任意数据集,并能随数据类型和规模伸缩。为此,我们提出了机器学习预处理与探索性数据分析工具MLPrE,它使用Spark DataFrame在处理过程中承载数据以确保可扩展性,并以一种通用的JSON输入文件格式描述对DataFrame的逐步修改。我们实现了输入输出、过滤、基础统计、特征工程和探索性数据分析等阶段,总计69个阶段,并使用六个不同的数据集重点演示了其中的关键阶段。我们还借助UniProt词汇表术语数据集展示了MLPrE独立处理平面文件中多个字段并将其重新组合的能力,而这在以往需要额外的管道。在此基础上,我们用公开的葡萄酒质量数据演示了聚类阶段,最后用磷酸化位点激酶数据演示了MLPrE末端阶段中面向图数据库的数据准备。总体而言,MLPrE为预处理和早期数据分析提供了一个可推广、可扩展的工具,满足了机器学习应用不断扩大背景下对此类工具的迫切需求,有助于加速并简化大型工作流中的早期开发。
摘要:With the recent growth of Deep Learning for AI, there is a need for tools to meet the demand of data flowing into those models. In some cases, source data may exist in multiple formats, and therefore the source data must be investigated and properly engineered for a Machine Learning model or graph database. Overhead and lack of scalability with existing workflows limit integration within a larger processing pipeline such as Apache Airflow, driving the need for a robust, extensible, and lightweight tool to preprocess arbitrary datasets that scales with data type and size. To address this, we present Machine Learning Preprocessing and Exploratory Data Analysis, MLPrE, in which SparkDataFrames were utilized to hold data during processing and ensure scalability. A generalizable JSON input file format was utilized to describe stepwise changes to that DataFrame. Stages were implemented for input and output, filtering, basic statistics, feature engineering, and exploratory data analysis. A total of 69 stages were implemented into MLPrE, of which we highlight and demonstrate key stages using six diverse datasets. We further highlight MLPrE's ability to independently process multiple fields in flat files and recombine them, otherwise requiring an additional pipeline, using a UniProt glossary term dataset. Building on this advantage, we demonstrated the clustering stage with available wine quality data. Lastly, we demonstrate the preparation of data for a graph database in the final stages of MLPrE using phosphosite kinase data. Overall, our MLPrE tool offers a generalizable and scalable tool for preprocessing and early data analysis, filling a critical need for such a tool given the ever expanding use of machine learning. This tool serves to accelerate and simplify early stage development in larger workflows.


【3】Right for the Right Reasons: Avoiding Reasoning Shortcuts via Prototypical Neurosymbolic AI
标题:正确的理由:通过原型神经符号人工智能避免推理捷径
链接:https://arxiv.org/abs/2510.25497

作者:Luca Andolfi, Eleonora Giunchiglia
摘要:神经符号AI越来越受欢迎,这得益于它能将神经感知与符号推理结合到端到端可训练的模型中。然而,最近的研究表明,这类模型容易出现推理捷径,即学习到非预期的概念(或神经谓词),借助虚假相关性来满足符号约束。在本文中,我们从根源上解决推理捷径问题,提出原型神经符号架构。这些模型之所以能满足符号约束(做对),是因为它们学到了正确的基本概念(出于正确的原因),而非依赖虚假相关性,即便在数据极少的情形下亦然。借助原型学习理论,我们证明了通过训练模型在满足背景知识的同时考虑输入与少量带标签数据点的相似性,可以有效避免推理捷径。我们在最近提出的rsbench基准套件上,在监督极其稀缺的多种设置与任务中广泛验证了该方法:无论是合成任务(MNIST-EvenOdd和Kand-Logic)还是真实世界的高风险任务(BDD-OIA),在学习正确概念方面均有显著提升。我们的结果为将原型接地作为安全可靠神经符号学习的一种有效且标注高效的策略铺平了道路。
摘要:Neurosymbolic AI is growing in popularity thanks to its ability to combine neural perception and symbolic reasoning in end-to-end trainable models. However, recent findings reveal these are prone to shortcut reasoning, i.e., to learning unintended concepts--or neural predicates--which exploit spurious correlations to satisfy the symbolic constraints. In this paper, we address reasoning shortcuts at their root cause and we introduce prototypical neurosymbolic architectures. These models are able to satisfy the symbolic constraints (be right) because they have learnt the correct basic concepts (for the right reasons) and not because of spurious correlations, even in extremely low data regimes. Leveraging the theory of prototypical learning, we demonstrate that we can effectively avoid reasoning shortcuts by training the models to satisfy the background knowledge while taking into account the similarity of the input with respect to the handful of labelled datapoints. We extensively validate our approach on the recently proposed rsbench benchmark suite in a variety of settings and tasks with very scarce supervision: we show significant improvements in learning the right concepts both in synthetic tasks (MNIST-EvenOdd and Kand-Logic) and real-world, high-stake ones (BDD-OIA). Our findings pave the way to prototype grounding as an effective, annotation-efficient strategy for safe and reliable neurosymbolic learning.


【4】MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding
标题:MMEdge:通过流水线传感和编码加速设备上多模式推理
链接:https://arxiv.org/abs/2510.25327

作者:Runxi Huang, Mingxuan Yu, Mingyu Tsoi, Xiaomin Ouyang
备注:Accepted by SenSys 2026
摘要:在资源受限的边缘设备上进行实时多模态推理对于自动驾驶、人机交互和移动健康等应用至关重要。然而,以前的工作往往忽略了传感动态和模型执行之间的紧密耦合,以及复杂的模态间的依赖关系。在本文中,我们提出了MMEdge,一个新的设备上的多模态推理框架的基础上流水线的传感和编码。MMEdge不是等待完整的传感器输入,而是将整个推理过程分解为一系列细粒度的传感和编码单元,从而允许计算随着数据的到达而递增。MMEdge还引入了一个轻量级但有效的时间聚合模块,可以跨不同的流水线单元捕获丰富的时间动态,以保持准确性性能。这种流水线设计还为细粒度的跨模态优化和推理期间的早期决策提供了机会。为了在资源可变性和输入数据复杂性下进一步增强系统性能,MMEdge结合了自适应多模态配置优化器和跨模态推测性跳过机制,该自适应多模态配置优化器在延迟约束下为每个模态动态选择最佳感测和模型配置,该跨模态推测性跳过机制在早期预测达到足够置信度时绕过较慢模态的未来单元。我们使用两个公共的多模态数据集来评估MMEdge,并将其部署在基于真实世界无人机(UAV)的多模态测试平台上。结果表明,MMEdge显著降低了端到端延迟,同时在各种系统和数据动态中保持了高任务准确性。
摘要:Real-time multimodal inference on resource-constrained edge devices is essential for applications such as autonomous driving, human-computer interaction, and mobile health. However, prior work often overlooks the tight coupling between sensing dynamics and model execution, as well as the complex inter-modality dependencies. In this paper, we propose MMEdge, an new on-device multi-modal inference framework based on pipelined sensing and encoding. Instead of waiting for complete sensor inputs, MMEdge decomposes the entire inference process into a sequence of fine-grained sensing and encoding units, allowing computation to proceed incrementally as data arrive. MMEdge also introduces a lightweight but effective temporal aggregation module that captures rich temporal dynamics across different pipelined units to maintain accuracy performance. Such pipelined design also opens up opportunities for fine-grained cross-modal optimization and early decision-making during inference. To further enhance system performance under resource variability and input data complexity, MMEdge incorporates an adaptive multimodal configuration optimizer that dynamically selects optimal sensing and model configurations for each modality under latency constraints, and a cross-modal speculative skipping mechanism that bypasses future units of slower modalities when early predictions reach sufficient confidence. We evaluate MMEdge using two public multimodal datasets and deploy it on a real-world unmanned aerial vehicle (UAV)-based multimodal testbed. The results show that MMEdge significantly reduces end-to-end latency while maintaining high task accuracy across various system and data dynamics.


【5】An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation
标题:使用结果不变数据增强的因果效应估计分析
链接:https://arxiv.org/abs/2510.25128

作者:Uzair Akbar, Niki Kilbertus, Hao Shen, Krikamol Muandet, Bo Dai
备注:Accepted at NeurIPS 2025
摘要:数据增强(DA)技术在机器学习中常被用于正则化,以便在i.i.d.设置下获得更好的泛化。在这项工作中,我们结合因果推断中的相关主题提出一个统一框架,论证DA的用途不仅限于i.i.d.设置,还可用于跨干预的泛化。具体而言,我们认为,当结果生成机制对我们所选的DA保持不变时,这类增强可以有效地视为对处理生成机制本身的干预,从而有助于减少由隐藏混杂因素引起的因果效应估计偏差。在存在这类未观测混杂的情况下,我们通常借助工具变量(IV)——即与结果条件独立的处理随机化来源。然而在许多应用中,IV并不像DA那样容易获得,这正是本工作的主要动机。通过对基于IV的估计量施加适当的正则化,我们引入类IV(IVL)回归的概念,即使某些IV性质被放宽,也能减轻混杂偏差并提升跨干预的预测性能。最后,我们将参数化DA表述为IVL回归问题,并表明将其组合使用时可以模拟此类DA的最坏情况应用,从而在因果估计与泛化任务上进一步超越简单DA所能提供的性能。我们在总体情形下给出理论证明,并用一个简单的线性例子通过模拟实验验证有限样本情形,同时给出真实数据实验以支持我们的论点。
摘要:The technique of data augmentation (DA) is often used in machine learning for regularization purposes to better generalize under i.i.d. settings. In this work, we present a unifying framework with topics in causal inference to make a case for the use of DA beyond just the i.i.d. setting, but for generalization across interventions as well. Specifically, we argue that when the outcome generating mechanism is invariant to our choice of DA, then such augmentations can effectively be thought of as interventions on the treatment generating mechanism itself. This can potentially help to reduce bias in causal effect estimation arising from hidden confounders. In the presence of such unobserved confounding we typically make use of instrumental variables (IVs) -- sources of treatment randomization that are conditionally independent of the outcome. However, IVs may not be as readily available as DA for many applications, which is the main motivation behind this work. By appropriately regularizing IV based estimators, we introduce the concept of IV-like (IVL) regression for mitigating confounding bias and improving predictive performance across interventions even when certain IV properties are relaxed. Finally, we cast parameterized DA as an IVL regression problem and show that when used in composition can simulate a worst-case application of such DA, further improving performance on causal estimation and generalization tasks beyond what simple DA may offer. This is shown both theoretically for the population case and via simulation experiments for the finite sample case using a simple linear example. We also present real data experiments to support our case.


【6】Machine Learning based Analysis for Radiomics Features Robustness in Real-World Deployment Scenarios
标题:基于机器学习的放射组学特征在真实部署场景中的鲁棒性分析
链接:https://arxiv.org/abs/2510.25026

作者:Sarmad Ahmad Khan, Simon Bernatz, Zahra Moslehi, Florian Buettner
摘要:基于放射组学的机器学习模型在临床决策支持方面展现出前景,但容易受到成像协议、摆位和分割变化引起的分布偏移的影响。本研究系统考察了基于放射组学的机器学习模型在五种MRI序列的分布偏移下的鲁棒性,评估了不同采集协议和分割策略如何影响模型在预测能力与不确定性感知方面的可靠性。我们使用包含16种水果的体模,通过以下方式评估分布偏移:(1)T2-HASTE、T2-TSE、T2-MAP、T1-TSE和T2-FLAIR序列间的协议变化;(2)分割变化(完整、部分、旋转);(3)观察者间变异。我们分别在8个跨序列一致的鲁棒特征与序列特定特征上训练XGBoost分类器,测试域内与域外条件下的模型性能。结果表明,在协议不变特征上训练的模型在分布偏移下保持F1分数>0.85,而使用全部特征的模型在协议变化下性能下降40%。数据集扩充大幅提升了不确定性估计的质量,在不牺牲精度的情况下将期望校准误差(ECE)降低了35%。温度缩放带来的校准收益很小,印证了XGBoost固有的可靠性。我们的结果表明,协议感知的特征选择与受控体模研究能够有效预测模型在分布偏移下的行为,为开发对真实协议变化具有鲁棒性的放射组学模型提供了一个框架。
摘要:Radiomics-based machine learning models show promise for clinical decision support but are vulnerable to distribution shifts caused by variations in imaging protocols, positioning, and segmentation. This study systematically investigates the robustness of radiomics-based machine learning models under distribution shifts across five MRI sequences. We evaluated how different acquisition protocols and segmentation strategies affect model reliability in terms of predictive power and uncertainty-awareness. Using a phantom of 16 fruits, we evaluated distribution shifts through: (1) protocol variations across T2-HASTE, T2-TSE, T2-MAP, T1-TSE, and T2-FLAIR sequences; (2) segmentation variations (full, partial, rotated); and (3) inter-observer variability. We trained XGBoost classifiers on 8 consistent robust features versus sequence-specific features, testing model performance under in-domain and out-of-domain conditions. Results demonstrate that models trained on protocol-invariant features maintain F1-scores >0.85 across distribution shifts, while models using all features showed 40% performance degradation under protocol changes. Dataset augmentation substantially improved the quality of uncertainty estimates and reduced the expected calibration error (ECE) by 35% without sacrificing accuracy. Temperature scaling provided minimal calibration benefits, confirming XGBoost's inherent reliability. Our findings reveal that protocol-aware feature selection and controlled phantom studies effectively predict model behavior under distribution shifts, providing a framework for developing robust radiomics models resilient to real-world protocol variations.


【7】Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms
标题:嵌入式和模拟计算平台上深度和Bayesian神经网络的资源高效和稳健推理
链接:https://arxiv.org/abs/2510.24951

作者:Bernhard Klein
备注:Ph.D. dissertation, Heidelberg University, October 2025
摘要:虽然现代机器学习已经改变了许多应用领域,但其不断增长的计算需求越来越限制可扩展性和效率,尤其是在嵌入式和资源受限的平台上。在实践中,神经网络不仅要高效运行,还必须在分布偏移或未见数据下给出可靠预测。贝叶斯神经网络为量化不确定性提供了一个有原则的框架,但其计算开销进一步加剧了这些挑战。   本工作通过同时追求算法效率与硬件效率,推进传统神经网络和贝叶斯神经网络的资源高效与鲁棒推理:前者通过模型压缩和近似贝叶斯推理减少计算量,后者优化数字加速器上的部署并探索模拟硬件,打通算法设计与物理实现。第一个贡献Galen在灵敏度分析与硬件在环反馈的指导下执行自动的逐层压缩。模拟加速器以噪声为代价换取效率;本工作对器件缺陷进行建模,并将噪声训练扩展到非平稳条件,提升了鲁棒性和稳定性。第二条工作线推进概率推理,开发解析近似与集成近似来取代昂贵的采样,将其集成到编译器栈中,并针对嵌入式推理进行优化。最后,概率光子计算引入了一种新范式:受控的模拟噪声充当固有熵源,从而直接在硬件中实现快速、节能的概率推理。   总之,这些研究展示了如何通过算法-硬件协同设计同时提升效率与可靠性,为下一代值得信赖的节能机器学习系统奠定基础。
摘要:While modern machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency, particularly on embedded and resource-limited platforms. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shifts or unseen data. Bayesian neural networks offer a principled framework for quantifying uncertainty, yet their computational overhead further compounds these challenges.   This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks through the joint pursuit of algorithmic and hardware efficiency. The former reduces computation through model compression and approximate Bayesian inference, while the latter optimizes deployment on digital accelerators and explores analog hardware, bridging algorithmic design and physical realization. The first contribution, Galen, performs automatic layer-specific compression guided by sensitivity analysis and hardware-in-the-loop feedback. Analog accelerators offer efficiency gains at the cost of noise; this work models device imperfections and extends noisy training to nonstationary conditions, improving robustness and stability. A second line of work advances probabilistic inference, developing analytic and ensemble approximations that replace costly sampling, integrate into a compiler stack, and optimize embedded inference. Finally, probabilistic photonic computing introduces a paradigm where controlled analog noise acts as an intrinsic entropy source, enabling fast, energy-efficient probabilistic inference directly in hardware.   Together, these studies demonstrate how efficiency and reliability can be advanced jointly through algorithm-hardware co-design, laying the foundation for the next generation of trustworthy, energy-efficient machine-learning systems.


【8】Topic Analysis with Side Information: A Neural-Augmented LDA Approach
标题:具有辅助信息的主题分析:神经增强LDA方法
链接:https://arxiv.org/abs/2510.24918

作者:Biyi Fang, Kripa Rajshekhar, Truong Vo, Diego Klabjan
摘要:传统的主题模型,如潜在狄利克雷分配(LDA)已被广泛用于揭示文本语料库中的潜在结构,但他们往往难以整合辅助信息,如元数据,用户属性,或文档标签。这些限制限制了它们的表达性、个性化和可解释性。为了解决这个问题,我们提出了nnLDA,一个神经增强的概率主题模型,通过神经先验机制动态地结合边信息。nnLDA将每个文档建模为潜在主题的混合物,其中先验主题比例由以辅助特征为条件的神经网络生成。这种设计使模型能够捕获静态Dirichlet先验无法表示的边信息和主题分布之间的复杂非线性相互作用。我们开发了一个随机变分期望最大化算法,共同优化神经和概率组件。在多个基准数据集上,nnLDA在主题一致性、困惑度和下游分类方面始终优于LDA和Dirichlet-Multinomial Regression。这些结果突出了在辅助信息可用的情况下将神经表征学习与概率主题建模相结合的好处。
摘要:Traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely used to uncover latent structures in text corpora, but they often struggle to integrate auxiliary information such as metadata, user attributes, or document labels. These limitations restrict their expressiveness, personalization, and interpretability. To address this, we propose nnLDA, a neural-augmented probabilistic topic model that dynamically incorporates side information through a neural prior mechanism. nnLDA models each document as a mixture of latent topics, where the prior over topic proportions is generated by a neural network conditioned on auxiliary features. This design allows the model to capture complex nonlinear interactions between side information and topic distributions that static Dirichlet priors cannot represent. We develop a stochastic variational Expectation-Maximization algorithm to jointly optimize the neural and probabilistic components. Across multiple benchmark datasets, nnLDA consistently outperforms LDA and Dirichlet-Multinomial Regression in topic coherence, perplexity, and downstream classification. These results highlight the benefits of combining neural representation learning with probabilistic topic modeling in settings where side information is available.
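下面给出摘要中"神经先验机制"这一核心想法的一个极简示意:用一个小型神经网络把文档的辅助特征映射为该文档主题比例的Dirichlet先验参数。完整的随机变分EM训练流程此处省略;网络结构、维度与变量名均为说明性假设,并非nnLDA的官方实现。

```python
import torch
import torch.nn as nn

class NeuralDirichletPrior(nn.Module):
    """将辅助特征映射为文档级 Dirichlet 先验参数的极简示意。"""

    def __init__(self, side_dim, n_topics, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(side_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_topics), nn.Softplus(),   # 保证 Dirichlet 参数为正
        )

    def forward(self, side_features):
        return self.net(side_features) + 1e-3             # 避免参数恰好为 0

prior = NeuralDirichletPrior(side_dim=8, n_topics=10)
alpha = prior(torch.randn(4, 8))                           # (4, 10) 的文档级先验参数
theta = torch.distributions.Dirichlet(alpha).rsample()     # 采样每篇文档的主题比例
print(theta.shape)
```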


【9】Send Less, Save More: Energy-Efficiency Benchmark of Embedded CNN Inference vs. Data Transmission in IoT
标题:少发送,多节省:嵌入式CNN推理与物联网中数据传输的能效基准
链接:https://arxiv.org/abs/2510.24829

作者:Benjamin Karic, Nina Herrmann, Jan Stenkamp, Paula Scharf, Fabian Gieseke, Angela Schwering
备注:11 Pages, Paper lists the categories for the ACM Computing Classification System
摘要:物联网(IoT)与人工智能的融合为提升我们监测和应对生态变化的能力提供了重要机遇。随着环境挑战日益紧迫,对高效远程监测方案的需求比以往任何时候都更迫切。设计环境监测物联网应用(尤其是涉及图像数据的应用)的一个主要挑战,是打造能够在供电有限的偏远地区长期运行的节能物联网设备。微型机器学习(TinyML)领域的进步使得在资源受限、电池供电的微控制器上运行卷积神经网络(CNN)成为可能。由于数据传输耗能巨大,直接在微控制器上执行推理以减小消息体积可以延长物联网节点的工作寿命。本工作评估了常见低功耗广域网的使用,以及在特定领域数据集上训练的压缩CNN在ESP32-S3上的表现。实验表明,与发送原始图像数据相比,在设备端执行CNN推理并仅传输结果可将总体能耗降低至多五倍。与非量化模型相比,采用训练后量化进行模型压缩仅带来几个百分点的可接受精度下降。这些发现支持开发碳足迹更小、可借助嵌入式机器学习在环境监测场景中自主运行的物联网应用。
摘要:The integration of the Internet of Things (IoT) and Artificial Intelligence offers significant opportunities to enhance our ability to monitor and address ecological changes. As environmental challenges become increasingly pressing, the need for effective remote monitoring solutions is more critical than ever. A major challenge in designing IoT applications for environmental monitoring - particularly those involving image data - is to create energy-efficient IoT devices capable of long-term operation in remote areas with limited power availability. Advancements in the field of Tiny Machine Learning allow the use of Convolutional Neural Networks (CNNs) on resource-constrained, battery-operated microcontrollers. Since data transfer is energy-intensive, performing inference directly on microcontrollers to reduce the message size can extend the operational lifespan of IoT nodes. This work evaluates the use of common Low Power Wide Area Networks and compressed CNNs trained on domain specific datasets on an ESP32-S3. Our experiments demonstrate, among other things, that executing CNN inference on-device and transmitting only the results reduces the overall energy consumption by a factor of up to five compared to sending raw image data. The compression of the model using Post Training Quantization is accompanied by an acceptable reduction in accuracy of only a few percentage points compared to a non-quantized model. These findings advocate the development of IoT applications with reduced carbon footprint and capable of operating autonomously in environmental monitoring scenarios by incorporating Embedded Machine Learning.


【10】Fortytwo: Swarm Inference with Peer-Ranked Consensus
标题:Fortytwo:基于同行排名共识的群体推理
链接:https://arxiv.org/abs/2510.24801

作者:Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov
摘要:随着集中式人工智能触及算力上限、越来越大的训练规模带来的回报不断递减,满足需求需要一个在容量和能力上都能水平扩展的推理层。我们提出Fortytwo,一种利用群体智能原理和分布式成对排名共识来提升AI推理表现的新协议。我们的方法用群体推理重新构想了AI节点之间的协作:在异构模型之间进行同行排名、按声誉加权形成共识,从而筛选出质量最高的回答。借助基于自定义Bradley-Terry风格聚合模型的成对排名,我们证明群体推理显著优于多数投票:在相同模型集合下,GPQA Diamond上达到85.90%,而多数投票为68.69%,提升+17.21个百分点(相对约+25.1%)。该协议引入链上声誉,使节点影响力随其长期表现出的准确性而调整,形成能够过滤低质量或恶意参与者的精英式共识。为抵御Sybil攻击,Fortytwo在共识中采用能力证明:节点必须成功完成校准/测试请求并质押声誉才能进入排名轮,在保持开放性的同时让多身份攻击在经济上无利可图。在包括GPQA Diamond、LiveCodeBench和AIME在内的六个具有挑战性的基准上,我们的评估显示出更高的准确率,以及对对抗性和噪声自由形式提示的强鲁棒性(例如,提示注入仅导致0.12%的性能退化,而单一模型基线为6.20%),同时保持实际可部署性。总之,这些结果为去中心化AI系统奠定了基础——通过集体智能让高质量推理普惠可及,而不牺牲可靠性或安全性。
摘要:As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that scales horizontally in both capacity and capability. We present Fortytwo, a novel protocol that leverages swarm intelligence principles and distributed pairwise ranking consensus to achieve superior performance in AI inference. Our approach reimagines collaboration among AI nodes using swarm inference: a peer-ranked, reputation-weighted consensus across heterogeneous models that surfaces the highest-quality responses. Using pairwise ranking with a custom Bradley-Terry-style aggregation model, we demonstrate that swarm inference substantially outperforms majority voting, achieving 85.90% on GPQA Diamond versus 68.69% for majority voting with the same model set - an improvement of +17.21 percentage points (approximately +25.1% relative). The protocol incorporates on-chain reputation so node influence adapts to demonstrated accuracy over time, yielding a meritocratic consensus that filters low-quality or malicious participants. To resist Sybil attacks, Fortytwo employs proof-of-capability in its consensus: nodes must successfully complete calibration/test requests and stake reputation to enter ranking rounds, making multi-identity attacks economically unattractive while preserving openness. Across six challenging benchmarks, including GPQA Diamond, LiveCodeBench, and AIME, our evaluation indicates higher accuracy and strong resilience to adversarial and noisy free-form prompting (e.g., prompt-injection degradation of only 0.12% versus 6.20% for a monolithic single-model baseline), while retaining practical deployability. Together, these results establish a foundation for decentralized AI systems - democratizing access to high-quality inference through collective intelligence without sacrificing reliability or security.
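摘要中由两两比较聚合出回答质量分数的思路,可以用标准的Bradley-Terry最大似然(MM迭代)来示意;原文使用的是自定义的Bradley-Terry风格聚合模型并叠加声誉加权,细节可能不同,下面的胜负矩阵与参数均为随意构造。

```python
import numpy as np

def bradley_terry(wins, n_iter=200, tol=1e-8):
    """由两两比较的胜负矩阵估计 Bradley-Terry 分数的极简示意。

    wins[i, j] = 候选回答 i 战胜候选回答 j 的次数
    返回归一化分数,分数越高代表回答质量越可能更好
    """
    k = wins.shape[0]
    n = wins + wins.T                       # i 与 j 之间的比较总次数
    w = wins.sum(axis=1)                    # 每个候选的总胜场
    p = np.ones(k)
    for _ in range(n_iter):
        denom = np.zeros(k)
        for i in range(k):
            mask = np.arange(k) != i
            denom[i] = np.sum(n[i, mask] / (p[i] + p[mask]))
        p_new = w / np.maximum(denom, 1e-12)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    return p

# 3 个候选回答之间的两两胜负(示意数据)
W = np.array([[0, 8, 6],
              [2, 0, 5],
              [4, 5, 0]], dtype=float)
print(bradley_terry(W))   # 分数最高者可作为群体共识答案
```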


检测相关(1篇)

【1】Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification
标题:监控概率预测的校准及其在涉及图像分类的概念漂移检测中的应用
链接:https://arxiv.org/abs/2510.25573

作者:Christopher T. Franck, Anne R. Driscoll, Zoe Szajnfarber, William H. Woodall
摘要:用于图像分类的机器学习方法已在该领域取得了令人瞩目的进展。例如,卷积神经网络能够在工业、国防等广泛应用中实现出色的图像分类精度。虽然这些机器学习模型的准确率令人印象深刻,但一个相关的问题是如何评估并维持其预测的校准。如果一个分类模型的预测概率与事件实际发生的频率相符,则称该模型校准良好。评估机器学习校准以及重新校准有偏预测的方法已有很多,但在随时间持续监控预测模型是否出现校准失效方面的工作相对较少。我们提出一种带动态控制限的累积和(CUSUM)方法,可在传统过程监控和概念漂移两类应用中检测误校准,从而尽早发现影响图像分类实际表现的运行环境变化。所提出的控制图可广泛用于任何需要随时间监测概率预测是否出现校准失效的场景。重要的是,我们的方法只依赖概率预测和事件结果,无需访问机器学习模型内部。
摘要:Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classification accuracy across a wide range of applications in industry, defense, and other areas. While these machine learning models boast impressive accuracy, a related concern is how to assess and maintain calibration in the predictions these models make. A classification model is said to be well calibrated if its predicted probabilities correspond with the rates events actually occur. While there are many available methods to assess machine learning calibration and recalibrate faulty predictions, less effort has been spent on developing approaches that continually monitor predictive models for potential loss of calibration as time passes. We propose a cumulative sum-based approach with dynamic limits that enable detection of miscalibration in both traditional process monitoring and concept drift applications. This enables early detection of operational context changes that impact image classification performance in the field. The proposed chart can be used broadly in any situation where the user needs to monitor probability predictions over time for potential lapses in calibration. Importantly, our method operates on probability predictions and event outcomes and does not require under-the-hood access to the machine learning model.
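下面给出对概率预测做CUSUM监控的一个极简示意:对(实际结果 − 预测概率)累积求和,统计量超过阈值即报警。原文采用的是动态控制限,此处简化为固定阈值h;所有参数与数据均为示意性假设。

```python
import numpy as np

def calibration_cusum(probs, outcomes, k=0.0, h=5.0):
    """双侧 CUSUM 监控预测校准的极简示意。

    probs:    模型输出的事件概率序列
    outcomes: 对应的 0/1 实际结果
    k:        允许的漂移容忍量(参考值)
    h:        报警阈值(原文为动态控制限,这里简化为常数)
    返回上/下 CUSUM 统计量序列与首次报警位置(无报警则为 -1)
    """
    s_hi, s_lo, first_alarm = 0.0, 0.0, -1
    hist_hi, hist_lo = [], []
    for t, (p, y) in enumerate(zip(probs, outcomes)):
        e = y - p                        # 校准良好时期望为 0
        s_hi = max(0.0, s_hi + e - k)    # 检测系统性低估
        s_lo = max(0.0, s_lo - e - k)    # 检测系统性高估
        hist_hi.append(s_hi)
        hist_lo.append(s_lo)
        if first_alarm < 0 and (s_hi > h or s_lo > h):
            first_alarm = t
    return np.array(hist_hi), np.array(hist_lo), first_alarm

rng = np.random.default_rng(0)
p = np.full(200, 0.3)
# 后半段发生概念漂移:真实事件率从 0.3 变为 0.5
y = rng.binomial(1, 0.3, 100).tolist() + rng.binomial(1, 0.5, 100).tolist()
print(calibration_cusum(p, y)[2])
```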


分类|识别(3篇)

【1】Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
标题:梯度-权重对齐:分类任务中泛化性能的训练期代理指标
链接:https://arxiv.org/abs/2510.25480

作者:Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis
备注:39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:稳健的验证指标在当代深度学习中仍不可或缺,不仅用于检测过拟合和泛化不佳,也用于监控训练动态。在监督分类设置中,我们研究训练数据与模型权重之间的相互作用能否产生这样一种度量:既能在训练过程中跟踪泛化,又能把性能归因到单个训练样本。我们引入梯度-权重对齐(GWA),用于量化每个样本的梯度与模型权重之间的一致性。我们发现,有效的学习对应于一致的对齐,而错位则预示泛化恶化。GWA可在训练过程中高效计算,并同时反映样本层面的贡献和整个数据集层面的学习动态。大量实验表明,GWA能够准确预测最佳早停时机、支持有原则的模型比较,并识别有影响力的训练样本,为直接基于训练数据进行模型分析提供了一种无需验证集的方法。
摘要:Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether interactions between training data and model weights can yield such a metric that both tracks generalization during training and attributes performance to individual training samples. We introduce Gradient-Weight Alignment (GWA), quantifying the coherence between per-sample gradients and model weights. We show that effective learning corresponds to coherent alignment, while misalignment indicates deteriorating generalization. GWA is efficiently computable during training and reflects both sample-specific contributions and dataset-wide learning dynamics. Extensive experiments show that GWA accurately predicts optimal early stopping, enables principled model comparisons, and identifies influential training samples, providing a validation-set-free approach for model analysis directly from the training data.
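摘要未给出GWA的精确定义,下面仅以"逐样本梯度与全部模型权重拼接向量的余弦相似度"作为一种示意性实现,说明该量如何在训练中逐样本计算;模型、数据与聚合方式(如是否按层计算)均为假设。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_weight_alignment(model, x, y, loss_fn):
    """逐样本计算梯度与权重余弦相似度的极简示意(非官方 GWA 定义)。"""
    weights = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
    scores = []
    for xi, yi in zip(x, y):
        model.zero_grad()
        loss = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        loss.backward()
        grad = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
        scores.append(F.cosine_similarity(grad, weights, dim=0).item())
    return scores

model = nn.Linear(10, 3)                   # 随意构造的小分类器
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))
print(gradient_weight_alignment(model, x, y, nn.CrossEntropyLoss()))
```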


【2】Stable-by-Design Neural Network-Based LPV State-Space Models for System Identification
标题:用于系统识别的基于设计稳定神经网络的LPV状态空间模型
链接:https://arxiv.org/abs/2510.24757

作者:Ahmet Eren Sertbaş, Tufan Kumbasar
备注:In the 12th International Conference of Image Processing, Wavelet and Applications on Real World Problems, 2025
摘要:非线性系统的精确建模对于可靠控制至关重要,然而传统辨识方法往往难以在保持稳定性的同时捕获潜在动态。我们提出一种设计即稳定的基于LPV神经网络的状态空间(NN-SS)模型,可直接从数据中同时学习潜在状态和内部调度变量。状态转移矩阵由神经网络基于学习到的调度变量生成,并通过基于Schur的参数化保证稳定。该架构将用于初始状态估计的编码器与状态空间表示网络相结合,后者构建完整的一组随调度变量变化的系统矩阵。为训练NN-SS,我们开发了一个将多步预测损失与状态一致性正则项相结合的框架,确保对漂移的鲁棒性并提高长时程预测精度。在基准非线性系统上的评估表明,该模型持续达到或超过经典子空间辨识方法和近期基于梯度的方法。这些结果凸显了带稳定性约束的神经LPV辨识作为建模复杂非线性系统的可扩展且可靠框架的潜力。
摘要:Accurate modeling of nonlinear systems is essential for reliable control, yet conventional identification methods often struggle to capture latent dynamics while maintaining stability. We propose a stable-by-design LPV neural network-based state-space (NN-SS) model that simultaneously learns latent states and internal scheduling variables directly from data. The state-transition matrix, generated by a neural network using the learned scheduling variables, is guaranteed to be stable through a Schur-based parameterization. The architecture combines an encoder for initial state estimation with a state-space representer network that constructs the full set of scheduling-dependent system matrices. For training the NN-SS, we develop a framework that integrates multi-step prediction losses with a state-consistency regularization term, ensuring robustness against drift and improving long-horizon prediction accuracy. The proposed NN-SS is evaluated on benchmark nonlinear systems, and the results demonstrate that the model consistently matches or surpasses classical subspace identification methods and recent gradient-based approaches. These findings highlight the potential of stability-constrained neural LPV identification as a scalable and reliable framework for modeling complex nonlinear systems.


【3】StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs
标题:StrikeWatch:基于低功耗FPGA的紧凑时序模型腕式步态识别
链接:https://arxiv.org/abs/2510.24738

作者:Tianheng Ling, Chao Qian, Peter Zdankin, Torben Weis, Gregor Schiele
备注:9 pages, 6 figures, 3 tables, accepted by IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT), 3-5 Dec 2025, Osaka Japan
摘要:跑步对健康有很大的好处,但不当的步态模式可能导致受伤,特别是在缺乏专家反馈的情况下。虽然基于相机、鞋垫或身体佩戴传感器的现有步态分析系统已被证明有效,但它们通常体积庞大,且仅限于离线的跑后分析。腕戴式可穿戴设备提供了更实用且非侵入的替代方案,但由于惯性测量单元(IMU)信号噪声大、计算资源有限以及对云连接的依赖,在此类设备上实现实时步态识别仍然具有挑战性。本文介绍StrikeWatch,一个紧凑的腕戴式系统,完全在设备端利用IMU信号进行实时步态识别。作为案例研究,我们的目标是检测脚跟着地与前脚掌着地,使跑步者能够在跑步过程中通过视觉和听觉反馈自我纠正有害的步态模式。我们提出了四种紧凑的DL架构(1D-CNN、1D-SepCNN、LSTM和Transformer),并对它们进行优化,以便在两款代表性的嵌入式现场可编程门阵列(FPGA)上进行节能推理:AMD Spartan-7 XC7S15和Lattice iCE40UP5K。借助我们定制的硬件原型,我们从户外跑步中收集了带标注的数据集,并通过完全自动化的部署管道评估所有模型。结果揭示了模型复杂度与硬件效率之间明确的权衡。在12名参与者上的评估显示,6位量化的1D-SepCNN取得了最高的平均F1分数0.847,同时在以20 MHz运行的iCE40UP5K上每次推理仅消耗0.350 μJ,延迟为0.140 ms。该配置可支持在320 mAh电池上连续推理长达13.6天。所有数据集和代码均可在GitHub仓库https://github.com/tianheng-ling/StrikeWatch获取。
摘要:Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact DL architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 μJ per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. All datasets and code are available in the GitHub repository https://github.com/tianheng-ling/StrikeWatch.


表征(4篇)

【1】Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
标题:不要蒙蔽您的VLA:对齐视觉表示以实现OOD概括
链接:https://arxiv.org/abs/2510.25616

作者 :Nikita Kachaev, Mikhail Kolosov, Daniil Zelezetsky, Alexey K. Kovalev, Aleksandr I. Panov
备注:13 pages, 6 figures
摘要:视觉-语言-动作(VLA)模型的日益成功源于预训练的视觉-语言模型(VLM)可以赋予智能体可转移的世界知识和视觉-语言(VL)基础,为具有更广泛泛化的动作模型奠定基础。然而,当这些VLM适应动作模态,它仍然不清楚在多大程度上保留其原始的VL表示和知识。在这项工作中,我们进行了系统的研究,在VLA微调表示保留,表明天真的行动微调导致退化的视觉表征。为了表征和测量这些影响,我们探测VLA的隐藏表征并分析注意力地图,进一步地,我们设计了一组有针对性的任务和方法,将VLA模型与其对应的VLM进行对比,隔离动作微调引起的VL能力的变化。我们进一步评估了一系列的战略,调整视觉表示,并介绍了一个简单而有效的方法,减轻退化,并产生改进的泛化到分布(OOD)的情况。总之,我们的分析澄清了行动微调和VL表示的退化之间的权衡,并强调了实用的方法来恢复继承的VL功能。代码公开:https://blind-vla-paper.github.io
摘要:The growing success of Vision-Language-Action (VLA) models stems from the promise that pretrained Vision-Language Models (VLMs) can endow agents with transferable world knowledge and vision-language (VL) grounding, laying a foundation for action models with broader generalization. Yet when these VLMs are adapted to the action modality, it remains unclear to what extent their original VL representations and knowledge are preserved. In this work, we conduct a systematic study of representation retention during VLA fine-tuning, showing that naive action fine-tuning leads to degradation of visual representations. To characterize and measure these effects, we probe VLA's hidden representations and analyze attention maps, further, we design a set of targeted tasks and methods that contrast VLA models with their counterpart VLMs, isolating changes in VL capabilities induced by action fine-tuning. We further evaluate a range of strategies for aligning visual representations and introduce a simple yet effective method that mitigates degradation and yields improved generalization to out-of-distribution (OOD) scenarios. Taken together, our analysis clarifies the trade-off between action fine-tuning and the degradation of VL representations and highlights practical approaches to recover inherited VL capabilities. Code is publicly available: https://blind-vla-paper.github.io


【2】IBNorm: Information-Bottleneck Inspired Normalization for Representation Learning
标题:IBNorm:基于信息瓶颈的表示学习规范化
链接:https://arxiv.org/abs/2510.25262

作者:Xiandong Zou, Pan Zhou
摘要:规范化是深度学习的基础,但现有的方法,如BatchNorm,LayerNorm和RMSNorm是以方差为中心的,通过强制零均值和单位方差来稳定训练,而不控制表示如何捕获任务相关信息。我们提出了IB-Inspired Normalization(IBNorm),这是一个基于信息瓶颈原理的简单而强大的方法家族。IBNorm引入了有界压缩操作,鼓励嵌入保留预测信息,同时抑制干扰变化,产生更多的信息表示,同时保持标准归一化的稳定性和兼容性。理论上,我们证明了IBNorm实现了更高的IB值和更严格的推广界比方差为中心的方法。从经验来看,IBNorm在大规模语言模型(LLaMA、GPT-2)和视觉模型(ResNet、ViT)方面始终优于BatchNorm、LayerNorm和RMSNorm,互信息分析证实了卓越的信息瓶颈行为。代码将公开发布。
摘要:Normalization is fundamental to deep learning, but existing approaches such as BatchNorm, LayerNorm, and RMSNorm are variance-centric by enforcing zero mean and unit variance, stabilizing training without controlling how representations capture task-relevant information. We propose IB-Inspired Normalization (IBNorm), a simple yet powerful family of methods grounded in the Information Bottleneck principle. IBNorm introduces bounded compression operations that encourage embeddings to preserve predictive information while suppressing nuisance variability, yielding more informative representations while retaining the stability and compatibility of standard normalization. Theoretically, we prove that IBNorm achieves a higher IB value and tighter generalization bounds than variance-centric methods. Empirically, IBNorm consistently outperforms BatchNorm, LayerNorm, and RMSNorm across large-scale language models (LLaMA, GPT-2) and vision models (ResNet, ViT), with mutual information analysis confirming superior information bottleneck behavior. Code will be released publicly.


【3】Learning Low Rank Neural Representations of Hyperbolic Wave Dynamics from Data
标题:从数据中学习双曲波动力学的低秩神经表示
链接:https://arxiv.org/abs/2510.25123

作者:Woojin Cho, Kookjin Lee, Noseong Park, Donsub Rim, Gerrit Welper
备注:41 pages, 18 figures
摘要:我们提出了一种数据驱动的降维方法,非常适合表示双曲波传播这类基于物理的数据。该方法在超网络框架内使用一种称为低秩神经表示(LRNR)的专用神经网络架构,其设计动机来自严格证明此类波存在高效表示的理论结果。我们通过典型示例说明,借助多种深度学习技术的组合,可以直接从数据中学习传播波的这种高效低维表示。我们观察到,训练好的LRNR中会自然出现低秩张量表示,由此揭示出波传播的一种新分解,其中每个分解模式都对应可解释的物理特征。此外,我们证明LRNR架构可以通过一种压缩方案实现高效推理,这在对性能要求苛刻的部署场景中可能是一个重要特性。
摘要:We present a data-driven dimensionality reduction method that is well-suited for physics-based data representing hyperbolic wave propagation. The method utilizes a specialized neural network architecture called low rank neural representation (LRNR) inside a hypernetwork framework. The architecture is motivated by theoretical results that rigorously prove the existence of efficient representations for this wave class. We illustrate through archetypal examples that such an efficient low-dimensional representation of propagating waves can be learned directly from data through a combination of deep learning techniques. We observe that a low rank tensor representation arises naturally in the trained LRNRs, and that this reveals a new decomposition of wave propagation where each decomposed mode corresponds to interpretable physical features. Furthermore, we demonstrate that the LRNR architecture enables efficient inference via a compression scheme, which is a potentially important feature when deploying LRNRs in demanding performance regimes.


【4】Using latent representations to link disjoint longitudinal data for mixed-effects regression
标题:使用潜在表示连接不相交的纵向数据以实现混合效应回归
链接:https://arxiv.org/abs/2510.25531

作者:Clemens Schächter, Maren Hackenberg, Michelle Pfaffenlehner, Félix B. Tambe-Ndonfack, Thorsten Schmidt, Astrid Pechmann, Janbernd Kirschner, Jan Hasenauser, Harald Binder
备注:31 pages, 3 figures, 3 tables
摘要:许多罕见疾病提供有限的既定治疗选择,导致患者在出现新药物时转换治疗方法。为了在罕见病试验的低样本量限制下分析这种治疗转换的影响,重要的是要使用所有可用的数据来源。然而,当测量仪器的使用在观察期间发生变化时,例如当仪器适应特定年龄范围时,这是复杂的。由此产生的不相交的纵向数据轨迹,复杂的应用传统的建模方法,如混合效应回归。我们通过将每个仪器的观察结果映射到对齐的低维时间轨迹来解决这个问题,从而实现跨仪器的纵向建模。具体来说,我们采用了一组变分自动编码器架构,将项目值嵌入到每个时间点的共享潜在空间中。时间的疾病动态和治疗开关的影响,然后通过混合效应回归模型应用到潜在的代表。为了使统计推断,我们提出了一种新的统计测试方法,占混合效应回归和变分自动编码器的联合参数估计。该方法用于量化治疗转换对脊髓性肌萎缩症患者的影响。在这里,我们的方法将来自不同测量仪器的运动性能项目与混合效应回归相结合,并将估计的效应映射回观察到的项目水平,以量化治疗转换效应。我们的方法允许模型选择以及评估治疗转换的影响。结果突出了联合潜在表示建模的潜力,以解决小数据的挑战。
摘要 :Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease trials, it is important to use all available data sources. This, however, is complicated when usage of measurement instruments change during the observation period, for example when instruments are adapted to specific age ranges. The resulting disjoint longitudinal data trajectories, complicate the application of traditional modeling approaches like mixed-effects regression. We tackle this by mapping observations of each instrument to a aligned low-dimensional temporal trajectory, enabling longitudinal modeling across instruments. Specifically, we employ a set of variational autoencoder architectures to embed item values into a shared latent space for each time point. Temporal disease dynamics and treatment switch effects are then captured through a mixed-effects regression model applied to latent representations. To enable statistical inference, we present a novel statistical testing approach that accounts for the joint parameter estimation of mixed-effects regression and variational autoencoders. The methodology is applied to quantify the impact of treatment switches for patients with spinal muscular atrophy. Here, our approach aligns motor performance items from different measurement instruments for mixed-effects regression and maps estimated effects back to the observed item level to quantify the treatment switch effect. Our approach allows for model selection as well as for assessing effects of treatment switching. The results highlight the potential of modeling in joint latent representations for addressing small data challenges.


3D|3D重建等相关(1篇)

【1】3D CT-Based Coronary Calcium Assessment: A Feature-Driven Machine Learning Framework
标题:基于3D CT的冠状动脉钙化评估:一个特征驱动的机器学习框架
链接:https://arxiv.org/abs/2510.25347

作者:Ayman Abaid, Gianpiero Guidone, Sara Alsubai, Foziyah Alquahtani, Talha Iqbal, Ruth Sharif, Hesham Elzomor, Emiliano Bianchini, Naeif Almagal, Michael G. Madden, Faisal Sharif, Ihsan Ullah
备注:11 pages, 2 Figures, MICCAI AMAI 2025 workshop, to be published in Volume 16206 of the Lecture Notes in Computer Science series
摘要:冠状动脉钙化(CAC)评分在冠状动脉疾病(CAD)的早期发现和危险分层中起着至关重要的作用。在这项研究中,我们专注于非造影剂冠状动脉计算机断层扫描血管造影(CCTA)扫描,这是通常用于早期钙化检测在临床设置。为了解决有限的注释数据的挑战,我们提出了一个基于放射学的管道,利用伪标签来生成训练标签,从而消除了对专家定义的分割的需要。此外,我们探索使用预训练的基础模型,特别是CT-FM和RadImageNet,来提取图像特征,然后与传统分类器一起使用。我们比较了这些深度学习功能与放射组学功能的性能。对包括182名患者的临床CCTA数据集进行评估,其中个体被分为两组:零与非零钙评分。我们进一步研究了训练对非造影数据集与组合造影和非造影数据集的影响,测试仅在非造影扫描上进行。结果表明,尽管没有专家注释,但基于放射组学的模型显著优于来自基础模型的CNN衍生嵌入(达到84%的准确性和p<0.05)。
摘要:Coronary artery calcium (CAC) scoring plays a crucial role in the early detection and risk stratification of coronary artery disease (CAD). In this study, we focus on non-contrast coronary computed tomography angiography (CCTA) scans, which are commonly used for early calcification detection in clinical settings. To address the challenge of limited annotated data, we propose a radiomics-based pipeline that leverages pseudo-labeling to generate training labels, thereby eliminating the need for expert-defined segmentations. Additionally, we explore the use of pretrained foundation models, specifically CT-FM and RadImageNet, to extract image features, which are then used with traditional classifiers. We compare the performance of these deep learning features with that of radiomics features. Evaluation is conducted on a clinical CCTA dataset comprising 182 patients, where individuals are classified into two groups: zero versus non-zero calcium scores. We further investigate the impact of training on non-contrast datasets versus combined contrast and non-contrast datasets, with testing performed only on non contrast scans. Results show that radiomics-based models significantly outperform CNN-derived embeddings from foundation models (achieving 84% accuracy and p<0.05), despite the unavailability of expert annotations.


优化|敛散性(6篇)

【1】Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
标题:计算中心网络上的机器学习和CPU调度协同优化
链接:https://arxiv.org/abs/2510.25176

作者:Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee
备注:EAAI Journal
摘要:在人工智能(AI)研究快速发展的背景下,近年来对快速、计算高效且可扩展的解决方案的需求不断增加。本文研究分布式机器学习(ML)与优化中的计算资源优化问题。给定分布在计算节点/服务器网络上的一组数据,目标是在每个计算节点利用自身数据份额进行本地训练的同时,最优地分配CPU(中央处理单元)使用量。由此将问题形式化为一个协同优化问题:(i)优化数据处理,(ii)最优分配计算资源。节点间的信息共享网络可以是时变的,但其权重保持平衡,以保证算法的共识型收敛。该算法是全程可行的,即计算资源供需平衡约束在所提解法的每一次迭代中都成立。此外,该方案还能处理信息共享信道上可能存在的对数尺度量化,以交换对数量化的数据。作为示例应用,我们以分布式支持向量机(SVM)和回归作为ML训练模型。我们利用扰动理论、李雅普诺夫稳定性和特征谱分析的结果证明算法收敛到最优解。与现有CPU调度方案相比,所提算法将成本最优性差距改善了50%以上。
摘要:In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine learning (ML) and optimization is considered in this paper. Given a set of data distributed over a network of computing-nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. For some example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove the convergence towards the optimal case. As compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than $50\%$.


【2】Machine Learning Guided Optimal Transmission Switching to Mitigate Wildfire Ignition Risk
标题:机器学习引导的最佳传输切换以减轻野火点燃风险
链接:https://arxiv.org/abs/2510.25147

作者:Weimin Huang, Ryan Piansky, Bistra Dilkina, Daniel K. Molzahn
摘要:为降低突发野火点燃风险,电力公司会在高风险地区对线路停电。最优停电(OPS)问题通过优化线路的通断状态,在以停电方式管控野火点燃风险的同时减少甩负荷。OPS问题是计算上具有挑战性的混合整数线性规划(MILP),在运行环境中必须快速且频繁地求解。对于特定电力系统,各OPS实例共享相同的结构,只是与野火风险、负荷和可再生能源出力相关的参数不同,这促使我们利用实例间的共享模式,用机器学习(ML)来求解OPS问题。本文在扩展现有ML引导的MILP求解方法的基础上,融合关于通电与断电线路数量的领域知识,构建了一个能够快速给出高质量断电决策的ML引导框架。在一个基于加利福尼亚州的大规模真实感合成测试系统上的结果表明,所提ML引导方法能比传统优化方法更快地给出高质量解。
摘要 :To mitigate acute wildfire ignition risks, utilities de-energize power lines in high-risk areas. The Optimal Power Shutoff (OPS) problem optimizes line energization statuses to manage wildfire ignition risks through de-energizations while reducing load shedding. OPS problems are computationally challenging Mixed-Integer Linear Programs (MILPs) that must be solved rapidly and frequently in operational settings. For a particular power system, OPS instances share a common structure with varying parameters related to wildfire risks, loads, and renewable generation. This motivates the use of Machine Learning (ML) for solving OPS problems by exploiting shared patterns across instances. In this paper, we develop an ML-guided framework that quickly produces high-quality de-energization decisions by extending existing ML-guided MILP solution methods while integrating domain knowledge on the number of energized and de-energized lines. Results on a large-scale realistic California-based synthetic test system show that the proposed ML-guided method produces high-quality solutions faster than traditional optimization methods.


【3】Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations
标题:具有因式逼近的掩蔽扩散的误差界和最优调度
链接:https://arxiv.org/abs/2510.25544

作者:Hugo Lavenant, Giacomo Zanella
摘要:最近提出的离散数据生成模型,如掩蔽扩散模型(MDM),利用条件独立近似来降低流行的自回归模型(ARM)的计算成本,代价是采样分布存在一定偏差。我们研究由此产生的计算与精度之间的权衡,给出一般性的误差界(以相对熵度量),该界仅取决于每次迭代平均生成的token数,而与数据维度(即序列长度)无关,从而为MDM的经验成功提供了支撑。随后,我们研究使用非恒定调度大小(即在生成过程中改变每步揭开的token数)所带来的收益,并将最优调度刻画为数据分布的所谓信息轮廓的函数,从而可以有原则地优化调度大小。我们直接将这些方法定义为采样算法,而不借助时间反演扩散过程的经典推导,由此得到简单而透明的证明。
摘要:Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models (ARMs), at the price of some bias in the sampling distribution. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy) that depend only on the average number of tokens generated per iteration and are independent of the data dimensionality (i.e. sequence length), thus supporting the empirical success of MDMs. We then investigate the gain obtained by using non-constant schedule sizes (i.e. varying the number of unmasked tokens during the generation process) and identify the optimal schedule as a function of a so-called information profile of the data distribution, thus allowing for a principled optimization of schedule sizes. We define methods directly as sampling algorithms and do not use classical derivations as time-reversed diffusion processes, leading us to simple and transparent proofs.


【4】Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains
标题:可逆Markov链上带线性函数逼近的异策略TD(0)的收敛性
链接:https://arxiv.org/abs/2510.25514

作者:Maik Overmars, Jasper Goseling, Richard Boucherie
摘要:我们研究带线性函数逼近的异策略TD(0)在近似马尔可夫链期望折扣回报时的收敛性。众所周知,异策略学习与函数逼近相结合可能导致算法发散。针对这一设置的现有结果通常修改算法,例如使用重要性采样对更新重新加权,以额外的复杂度为代价换取收敛。相比之下,我们分析标准算法本身,但将注意力限制在可逆马尔可夫链这一类上。我们证明,在链结构满足这一温和的可逆性条件下(在许多应用中可借助领域知识加以假设),算法收敛。特别地,我们根据同策略过程与异策略过程之间的差异给出折扣因子的上界,并在该上界下建立收敛保证。这改进了文献中仅说明"折扣因子足够小时收敛成立"的已知结果,给出了显式的界。收敛以概率1成立,且投影贝尔曼误差收敛到零。为获得这些结果,我们将Tsitsiklis和Van Roy [1997]用于同策略情形的随机逼近框架推广到异策略情形。我们用不同类型的可逆马尔可夫链(如一维随机游走和加权图上的随机游走)来说明我们的结果。
摘要:We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighing the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference between the on-policy and off-policy process. This improves upon known results in the literature that state that convergence holds for a sufficiently small discount factor by establishing an explicit bound. Convergence is with probability one and achieves projected Bellman error equal to zero. To obtain these results, we adapt the stochastic approximation framework that was used by Tsitsiklis and Van Roy [1997] for the on-policy case, to the off-policy case. We illustrate our results using different types of reversible Markov chains, such as one-dimensional random walks and random walks on a weighted graph.
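线性函数逼近下TD(0)的更新规则本身很标准,下面给出一个极简示意:这里省略了行为策略/目标策略的区分(正是这一区分构成"异策略"设置),也不做重要性采样重加权,对应摘要中所分析的"标准算法"形态;一维随机游走即文中提到的可逆马尔可夫链例子,所有参数均为假设。

```python
import numpy as np

def linear_td0(features, rewards, next_features, gamma=0.9, alpha=0.05, n_epochs=50):
    """线性函数逼近下 TD(0) 更新的极简示意。

    features:      (T, d) 当前状态特征 phi(s_t)
    rewards:       (T,)   即时回报 r_t
    next_features: (T, d) 后继状态特征 phi(s_{t+1})
    """
    theta = np.zeros(features.shape[1])
    for _ in range(n_epochs):
        for phi, r, phi_next in zip(features, rewards, next_features):
            td_error = r + gamma * phi_next @ theta - phi @ theta   # TD 误差
            theta += alpha * td_error * phi                          # 半梯度更新
    return theta

# 一维随机游走(可逆马尔可夫链),one-hot 特征,5 个状态
d = 5
rng = np.random.default_rng(1)
s, traj = 2, [2]
for _ in range(500):
    s = min(d - 1, max(0, s + rng.choice([-1, 1])))
    traj.append(s)
traj = np.array(traj)
phi = np.eye(d)[traj[:-1]]
phi_next = np.eye(d)[traj[1:]]
r = (traj[1:] == d - 1).astype(float)        # 到达最右端获得回报 1
print(linear_td0(phi, r, phi_next))
```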


【5】Generative Bayesian Optimization: Generative Models as Acquisition Functions
标题:生成性Bayesian优化:生成性模型作为获取函数
链接:https://arxiv.org/abs/2510.25240

作者:Rafael Oliveira, Daniel M. Steinberg, Edwin V. Bonilla
备注:Under review
摘要:我们提出一个通用策略,将生成模型转化为批量贝叶斯优化(BO)中候选解的采样器。在BO中使用生成模型,使得大批量扩展可以作为生成式采样来实现,并支持非连续设计空间的优化以及高维与组合设计。受直接偏好优化(DPO)成功的启发,我们证明可以用直接从观测计算得到的带噪简单效用值来训练生成模型,进而得到密度与期望效用(即BO的采集函数值)成比例的提议分布。此外,该方法可以从基于偏好的反馈推广到一般类型的奖励信号和损失函数。这一视角避免了构建替代(回归或分类)模型,而这在以往利用生成模型做黑盒优化的方法中十分常见。理论上,我们证明BO过程中的生成模型近似遵循一列分布,且在一定条件下该分布序列渐近集中于全局最优点。我们还通过涉及高维大批量的具有挑战性的优化问题,在实验上验证了这一效果。
摘要:We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design. Inspired by the success of direct preference optimization (DPO), we show that one can train a generative model with noisy, simple utility values directly computed from observations to then form proposal distributions whose densities are proportional to the expected utility, i.e., BO's acquisition function values. Furthermore, this approach is generalizable beyond preference-based feedback to general types of reward signals and loss functions. This perspective avoids the construction of surrogate (regression or classification) models, common in previous methods that have used generative models for black-box optimization. Theoretically, we show that the generative models within the BO process approximately follow a sequence of distributions which asymptotically concentrate at the global optima under certain conditions. We also demonstrate this effect through experiments on challenging optimization problems involving large batches in high dimensions.


【6】Nonlinear Dynamics In Optimization Landscape of Shallow Neural Networks with Tunable Leaky ReLU
标题:具有可调Leaky ReLU的浅层神经网络优化格局中的非线性动力学
链接:https://arxiv.org/abs/2510.25060

作者:Jingzhou Liu
摘要:在这项工作中,我们研究用均方损失和Leaky ReLU激活训练的浅层神经网络的非线性动力学。在高斯输入和层宽相等为k的条件下:(1)我们基于等变梯度度建立了一个适用于任意神经元数k>=4的理论框架,用于检测当泄漏参数$\alpha$变化时,带有相应对称性的临界点从全局极小点发生的分岔。我们的分析表明,多模式简并总是在临界值0处发生,与k无关。(2)作为副产品,我们进一步证明这类分岔与宽度无关,仅在非负$\alpha$时出现,并且在整个工程常用区间$\alpha\in(0,1)$内,全局极小点不会发生进一步的对称破缺失稳。最后给出k=5的显式例子,用以说明该框架并展示所产生的分岔及其对称性。
摘要 :In this work, we study the nonlinear dynamics of a shallow neural network trained with mean-squared loss and leaky ReLU activation. Under Gaussian inputs and equal layer width k, (1) we establish, based on the equivariant gradient degree, a theoretical framework, applicable to any number of neurons k>= 4, to detect bifurcation of critical points with associated symmetries from global minimum as leaky parameter $\alpha$ varies. Typically, our analysis reveals that a multi-mode degeneracy consistently occurs at the critical number 0, independent of k. (2) As a by-product, we further show that such bifurcations are width-independent, arise only for nonnegative $\alpha$ and that the global minimum undergoes no further symmetry-breaking instability throughout the engineering regime $\alpha$ in range (0,1). An explicit example with k=5 is presented to illustrate the framework and exhibit the resulting bifurcation together with their symmetries.


预测|估计(10篇)

【1】Leveraging an Atmospheric Foundational Model for Subregional Sea Surface Temperature Forecasting
标题:利用大气基础模式进行次区域海表温度预报
链接:https://arxiv.org/abs/2510.25563

作者:Víctor Medina, Giovanny A. Cuervo-Londoño, Javier Sánchez
备注:18 pages, 9 figures
摘要:准确预测海洋变量对于理解气候变化、管理海洋资源和优化海上活动至关重要。传统的海洋预报依赖数值模式,但这类方法在计算成本和可扩展性方面存在限制。在本研究中,我们将最初为大气预报设计的基础深度学习模型Aurora用于预测加那利上升流系统中的海表温度(SST)。通过高分辨率海洋再分析数据对该模型进行微调,我们展示了它在降低计算需求的同时捕捉复杂时空模式的能力。我们的方法包含分阶段的微调流程,结合纬度加权误差指标并优化超参数以实现高效学习。实验结果表明,模型取得了0.119 K的低RMSE,并保持了很高的距平相关系数(ACC约为0.997)。模型成功再现了大尺度SST结构,但在刻画沿岸区域细节方面仍有挑战。这项工作展示了将其他领域预训练的深度学习模型用于海洋应用的可行性,对数据驱动的海洋预报领域有所贡献。未来的改进包括纳入更多海洋学变量、提高空间分辨率,以及探索物理信息神经网络以增强可解释性与可理解性。这些进展有望提升气候建模与海洋预测的准确性,支持环境与经济领域的决策。
摘要:The accurate prediction of oceanographic variables is crucial for understanding climate change, managing marine resources, and optimizing maritime activities. Traditional ocean forecasting relies on numerical models; however, these approaches face limitations in terms of computational cost and scalability. In this study, we adapt Aurora, a foundational deep learning model originally designed for atmospheric forecasting, to predict sea surface temperature (SST) in the Canary Upwelling System. By fine-tuning this model with high-resolution oceanographic reanalysis data, we demonstrate its ability to capture complex spatiotemporal patterns while reducing computational demands. Our methodology involves a staged fine-tuning process, incorporating latitude-weighted error metrics and optimizing hyperparameters for efficient learning. The experimental results show that the model achieves a low RMSE of 0.119K, maintaining high anomaly correlation coefficients (ACC $\approx 0.997$). The model successfully reproduces large-scale SST structures but faces challenges in capturing finer details in coastal regions. This work contributes to the field of data-driven ocean forecasting by demonstrating the feasibility of using deep learning models pre-trained in different domains for oceanic applications. Future improvements include integrating additional oceanographic variables, increasing spatial resolution, and exploring physics-informed neural networks to enhance interpretability and understanding. These advancements can improve climate modeling and ocean prediction accuracy, supporting decision-making in environmental and economic sectors.
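摘要提到的"纬度加权误差指标"在天气/海洋深度学习中通常按cos(纬度)加权,以补偿等经纬网格在高纬度处格点面积变小的影响;Aurora微调中采用的具体加权形式未在摘要中给出,下面只是一个示意实现,纬度范围与数值均为随意构造。

```python
import numpy as np

def latitude_weighted_rmse(pred, target, lats):
    """按 cos(纬度) 加权的 RMSE 极简示意。

    pred, target: (n_lat, n_lon) 预测与真值场(如 SST)
    lats:         (n_lat,) 纬度,单位为度
    """
    w = np.cos(np.deg2rad(lats))
    w = w / w.mean()                              # 归一化,使权重均值为 1
    sq_err = (pred - target) ** 2
    return float(np.sqrt((w[:, None] * sq_err).mean()))

lats = np.linspace(20, 35, 16)                    # 假设的区域纬度范围
pred = np.random.rand(16, 32) + 290               # 以开尔文计的示意 SST 场
target = pred + np.random.normal(0, 0.12, (16, 32))
print(latitude_weighted_rmse(pred, target, lats))
```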


【2】Support Vector Machine-Based Burnout Risk Prediction with an Interactive Interface for Organizational Use
标题:基于支持向量机的倦怠风险预测及供组织使用的交互式界面
链接:https://arxiv.org/abs/2510.25509

作者:Bruno W. G. Teodosio, Mário J. O. T. Lira, Pedro H. M. Araújo, Lucas R. C. Farias
备注:12 pages, including figures and references. Streamlit app available at: this https URL
摘要:职业倦怠是一种以情绪衰竭、去人格化和个人成就感降低为特征的心理综合征,对个体幸福感和组织绩效有重要影响。本研究提出一种机器学习方法,使用HackerEarth员工倦怠挑战数据集预测倦怠风险。我们评估了三种监督算法:K近邻(KNN)、随机森林和支持向量机(SVM),并通过以决定系数(R2)为指标的30折交叉验证评估模型性能。在所测试的模型中,SVM取得了最高的预测性能(R2 = 0.84),且基于配对$t$检验在统计上优于KNN和随机森林。为确保实际可用性,我们使用Streamlit开发了一个交互式界面,允许非技术用户输入数据并获得倦怠风险预测。结果凸显了机器学习在支持职业倦怠早期检测以及推动组织环境中数据驱动的心理健康策略方面的潜力。
摘要:Burnout is a psychological syndrome marked by emotional exhaustion, depersonalization, and reduced personal accomplishment, with a significant impact on individual well-being and organizational performance. This study proposes a machine learning approach to predict burnout risk using the HackerEarth Employee Burnout Challenge dataset. Three supervised algorithms were evaluated: nearest neighbors (KNN), random forest, and support vector machine (SVM), with model performance evaluated through 30-fold cross-validation using the determination coefficient (R2). Among the models tested, SVM achieved the highest predictive performance (R2 = 0.84) and was statistically superior to KNN and Random Forest based on paired $t$-tests. To ensure practical applicability, an interactive interface was developed using Streamlit, allowing non-technical users to input data and receive burnout risk predictions. The results highlight the potential of machine learning to support early detection of burnout and promote data-driven mental health strategies in organizational settings.
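摘要中的评估流程(SVM回归 + 30折交叉验证 + R2评分)可以用scikit-learn几行代码复现。下面用合成数据代替HackerEarth数据集,仅演示流程;特征个数与生成方式均为假设。

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# 合成数据:600 个样本、6 个假设的输入特征,目标为 [0, 1] 区间的倦怠程度
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 6))
y = np.clip(0.4 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * rng.normal(size=600) + 0.5, 0, 1)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=30, scoring="r2")   # 30 折交叉验证,R^2 评分
print(f"mean R2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```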


【3】Parameter Averaging in Link Prediction
标题:链接预测中的参数平均
链接:https://arxiv.org/abs/2510.25361

作者:Rupesh Sapkota, Caglar Demir, Arnab Sharma, Axel-Cyrille Ngonga Ngomo
摘要:在机器学习中,集成方法被广泛用于提升泛化能力。这也促使人们在执行链接预测时对知识图嵌入(KGE)模型采用集成学习。为此,典型做法是训练多个模型作为集成的一部分,然后对不同的预测取平均。然而,这种方法存在一些明显的缺点,例如训练多个模型的计算开销会增加延迟和内存占用。相比之下,模型合并方法提供了一种无需训练多个模型的有前景的替代方案。在这项工作中,我们将模型合并(具体为加权平均)引入KGE模型:从某一训练轮次开始维护模型参数的滑动平均,并将其用于预测。在此基础上,我们还提出了一种仅当验证集上的泛化性能提升时才有选择地更新集成模型参数滑动平均的方法。我们在链接预测任务上评估了这两种不同的加权平均方法,并与最先进的基准集成方法进行了比较。此外,我们还在文字增强的KGE模型和多跳查询回答任务上评估了加权平均方法。结果表明,所提出的加权平均方法在不同的评估设置中始终能提升性能。
摘要 :Ensemble methods are widely employed to improve generalization in machine learning. This has also prompted the adoption of ensemble learning for the knowledge graph embedding (KGE) models in performing link prediction. Typical approaches to this end train multiple models as part of the ensemble, and the diverse predictions are then averaged. However, this approach has some significant drawbacks. For instance, the computational overhead of training multiple models increases latency and memory overhead. In contrast, model merging approaches offer a promising alternative that does not require training multiple models. In this work, we introduce model merging, specifically weighted averaging, in KGE models. Herein, a running average of model parameters from a training epoch onward is maintained and used for predictions. To address this, we additionally propose an approach that selectively updates the running average of the ensemble model parameters only when the generalization performance improves on a validation dataset. We evaluate these two different weighted averaging approaches on link prediction tasks, comparing the state-of-the-art benchmark ensemble approach. Additionally, we evaluate the weighted averaging approach considering literal-augmented KGE models and multi-hop query answering tasks as well. The results demonstrate that the proposed weighted averaging approach consistently improves performance across diverse evaluation settings.
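下面给出一个示意性的PyTorch草图(非论文官方实现),展示“从某一训练轮次起维护参数滑动平均、且仅在验证性能提升时并入新参数”的基本做法;其中训练与验证部分均为假设的占位符,模型用一个线性层代替KGE打分模型。

```python
import copy
import torch

def update_running_average(avg_state, model, n_models):
    """把当前参数折算进由 n_models 个检查点构成的均匀滑动平均中。"""
    with torch.no_grad():
        for name, p in model.state_dict().items():
            avg_state[name].mul_(n_models / (n_models + 1)).add_(p, alpha=1.0 / (n_models + 1))
    return n_models + 1

model = torch.nn.Linear(16, 8)                  # 占位:实际为 KGE 打分模型
avg_state = copy.deepcopy(model.state_dict())   # 从当前轮次开始做参数平均
n_avg, best_val = 1, float("-inf")

for epoch in range(10):
    # train_one_epoch(model)                    # 占位:实际训练一轮
    val = float(torch.rand(()))                 # 占位:实际应为验证集 MRR / Hits@k
    if val > best_val:                          # “选择性”变体:仅在验证性能提升时并入平均
        best_val = val
        n_avg = update_running_average(avg_state, model, n_avg)

model.load_state_dict(avg_state)                # 用平均后的参数做链接预测推理
```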


【4】Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction
标题:超越泄漏和复杂性:走向现实和有效的信息级联预测
链接:https://arxiv.org/abs/2510.25348

作者:Jie Peng, Rui Wang, Qiang Wang, Zhewei Wei, Bin Tong, Guan Wang
摘要:信息级联流行度预测是分析社交网络中内容扩散的关键问题。然而,当前的相关工作遭受三个关键限制:(1)当前评估中的时间泄漏-基于随机级联的分裂允许模型访问未来信息,从而产生不切实际的结果;(2)缺乏下游转换信号的特征贫乏的数据集(例如,喜欢、评论或购买),这限制了更多的实际应用;(3)复杂的基于图的方法的计算效率低下,需要数天的训练才能获得边际收益。我们从三个方面系统地解决这些挑战:任务设置,数据集构建和模型设计。首先,我们提出了一种按时间排序的拆分策略,将数据按时间顺序划分为连续的窗口,确保模型在真正的预测任务上进行评估,而不会导致未来的信息泄漏。其次,我们介绍了Taoke,这是一个大规模的电子商务级联数据集,具有丰富的促销员/产品属性和真实的购买转换-捕获从促销到货币化的完整扩散生命周期。第三,我们开发了CasTemp,一个轻量级的框架,有效地模型级联动态通过时间行走,Jaccard为基础的邻居选择级联间的依赖关系,和基于GRU的编码与时间感知的注意。在无泄漏评估下,CasTemp在四个数据集上实现了最先进的性能,具有数量级的加速。值得注意的是,它擅长预测第二阶段的流行度转换--这是一项对现实世界应用至关重要的实际任务。
摘要:Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation--random cascade-based splits allow models to access future information, yielding unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits more practical applications; (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions--capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions--a practical task critical for real-world applications.
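下面是摘要中“按时间排序、划分为连续窗口”的拆分策略的一个pandas最小示意(非论文官方实现),其中 cascade_id、start_time 等列名均为假设的占位符。

```python
import pandas as pd

def time_ordered_windows(df, time_col, n_windows):
    """按时间排序后切成互不重叠的连续窗口(不做随机打乱),
    使得每个窗口只能由严格更早的窗口训练出的模型来评估,避免未来信息泄漏。"""
    df = df.sort_values(time_col).reset_index(drop=True)
    bounds = [round(i * len(df) / n_windows) for i in range(n_windows + 1)]
    return [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n_windows)]

# 假设的级联表:每条级联一行,记录其起始时间
cascades = pd.DataFrame({
    "cascade_id": range(10),
    "start_time": pd.date_range("2024-01-01", periods=10, freq="D"),
})
windows = time_ordered_windows(cascades, "start_time", n_windows=5)
train, test = pd.concat(windows[:-1]), windows[-1]   # 较早的窗口训练,最后一个窗口测试
print(len(train), len(test))
```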


【5】Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
标题:多类正无标记学习的代价敏感无偏风险估计
链接:https://arxiv.org/abs/2510.25226

作者:Miao Zhang, Junpeng Li, Changchun Hua, Yana Yang
摘要:正例-无标记(PU)学习考虑的是只有正例和未标记数据可用、而负例缺失或未被标注的设置。这种情况在实际应用中很常见,因为标注可靠的负例往往困难或代价高昂。尽管PU学习已取得实质进展,多类情形(MPU)仍然具有挑战性:许多现有方法无法保证无偏风险估计,从而限制了性能和稳定性。我们提出了一种基于自适应损失加权的代价敏感多类PU方法。在经验风险最小化框架内,我们为正例损失分量和(从未标记混合数据中推断的)负例损失分量分配不同的、依赖于数据的权重,使得到的经验目标成为目标风险的无偏估计。我们形式化了MPU数据生成过程,并为所提出的估计量建立了泛化误差界。在八个公共数据集上进行的大量实验涵盖了不同的类先验和类别数,结果显示在准确性和稳定性方面均稳定优于强基线。
摘要:Positive--Unlabeled (PU) learning considers settings in which only positive and unlabeled data are available, while negatives are missing or left unlabeled. This situation is common in real applications where annotating reliable negatives is difficult or costly. Despite substantial progress in PU learning, the multi-class case (MPU) remains challenging: many existing approaches do not ensure \emph{unbiased risk estimation}, which limits performance and stability. We propose a cost-sensitive multi-class PU method based on \emph{adaptive loss weighting}. Within the empirical risk minimization framework, we assign distinct, data-dependent weights to the positive and \emph{inferred-negative} (from the unlabeled mixture) loss components so that the resulting empirical objective is an unbiased estimator of the target risk. We formalize the MPU data-generating process and establish a generalization error bound for the proposed estimator. Extensive experiments on \textbf{eight} public datasets, spanning varying class priors and numbers of classes, show consistent gains over strong baselines in both accuracy and stability.
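作为背景,下面给出经典二分类无偏PU风险估计量(du Plessis等人提出的形式)的一个PyTorch草图;论文处理的是多类、代价敏感的推广形式,此处仅用众所周知的二分类基础形式帮助理解“无偏风险估计”这一概念,批量数据均为随机占位。

```python
import torch
import torch.nn.functional as F

def unbiased_pu_risk(scores_pos, scores_unl, prior, loss=F.softplus):
    """经典二分类无偏 PU 风险:
    pi * E_P[l(f,+1)] + E_U[l(f,-1)] - pi * E_P[l(f,-1)],
    其中对 margin 型损失取 l(f,+1)=loss(-f)、l(f,-1)=loss(f)。"""
    risk_pos = prior * loss(-scores_pos).mean()
    risk_neg = loss(scores_unl).mean() - prior * loss(scores_pos).mean()
    return risk_pos + risk_neg

# 玩具用法:某分类器 f 在正例批与未标记批上的打分
f_pos = torch.randn(32)
f_unl = torch.randn(128)
print(unbiased_pu_risk(f_pos, f_unl, prior=0.3))
```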


【6】GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction
标题:GReF:通过有序多令牌预测高效重新排名的统一生成框架
链接:https://arxiv.org/abs/2510.25220

作者:Zhijie Lin, Zhuofeng Li, Chenglei Dai, Wentian Bao, Shuai Lin, Enyun Yu, Haoxiang Zhang, Liang Zhao
备注:Accepted by CIKM 2025
摘要:在多阶段推荐系统中,重排序在建模列表内项目间的相关性方面起着至关重要的作用。一个关键挑战在于在排列组合空间中探索最优序列。最近的研究遵循两阶段(生成器-评估器)范式,其中生成器产生多个可行序列,评估器从中选出最优序列。在实践中,生成器通常被实现为自回归模型。然而,这类两阶段方法面临两个主要挑战:第一,生成器与评估器的分离阻碍了端到端训练;第二,自回归生成器的推理效率不足。在这项工作中,我们提出了统一的生成式高效重排序框架(GReF)来解决这两个主要挑战。具体来说,我们提出了Gen-Reranker,一种具有双向编码器和动态自回归解码器的自回归生成器,用于生成因果重排序序列。随后,我们在项目曝光顺序上对Gen-Reranker进行预训练,以获得高质量的参数初始化。为了在训练中引入序列级评估以实现端到端优化、从而去除对评估器的依赖,我们提出通过Rerank-DPO对模型进行后训练。此外,为了实现高效的自回归推理,我们引入了有序多令牌预测(OMTP),训练Gen-Reranker在保持项目顺序的同时一次生成多个未来项目,确保其能够在实时推荐系统中实际部署。大量离线实验表明,GReF优于最先进的重排序方法,同时其延迟几乎与非自回归模型相当。此外,GReF已部署于日活跃用户超过3亿的真实视频应用快手中,显著提升了在线推荐质量。
摘要:In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combinatorial space of permutations. Recent research follows a two-stage (generator-evaluator) paradigm, where a generator produces multiple feasible sequences, and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address the two primary challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder and a dynamic autoregressive decoder to generate causal reranking sequences. Subsequently, we pre-train Gen-Reranker on the item exposure order for high-quality parameter initialization. To eliminate the need for the evaluator while integrating sequence-level evaluation during training for end-to-end optimization, we propose post-training the model through Rerank-DPO. Moreover, for efficient autoregressive inference, we introduce ordered multi-token prediction (OMTP), which trains Gen-Reranker to simultaneously generate multiple future items while preserving their order, ensuring practical deployment in real-time recommender systems. Extensive offline experiments demonstrate that GReF outperforms state-of-the-art reranking methods while achieving latency that is nearly comparable to non-autoregressive models. Additionally, GReF has also been deployed in a real-world video app Kuaishou with over 300 million daily active users, significantly improving online recommendation quality.


【7】Selective Learning for Deep Time Series Forecasting
标题:深度时间序列预测的选择性学习
链接:https://arxiv.org/abs/2510.25207

作者:Yisong Fu, Zezhi Shao, Chengqing Yu, Yujie Li, Zhulin An, Qi Wang, Yongjun Xu, Fei Wang
备注:Accepted by NeurIPS 2025
摘要:得益于捕获复杂时间模式的强大能力,深度学习(DL)显著推动了时间序列预测(TSF)的发展。然而,由于时间序列对噪声和异常固有的脆弱性,深度模型往往会出现严重的过拟合。流行的DL范式通过MSE损失对所有时间步进行统一优化,不加区分地学习那些不确定和异常的时间步,最终导致过拟合。为了解决这个问题,我们提出了一种新的深度TSF选择性学习策略。具体来说,选择性学习从全部时间步中筛选出一个子集用于计算优化中的MSE损失,引导模型专注于可泛化的时间步而忽略不可泛化的时间步。我们的框架引入了双掩码机制来筛选时间步:(1)利用残差熵过滤不确定时间步的不确定性掩码;(2)采用残差下界估计排除异常时间步的异常掩码。在八个真实世界数据集上的大量实验表明,选择性学习可以显著提升典型的最先进深度模型的预测性能,包括使Informer的MSE降低37.4%、TimesNet降低8.4%、iTransformer降低6.5%。
摘要:Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss and learns those uncertain and anomalous timesteps without difference, ultimately resulting in overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of the whole timesteps to calculate the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance for typical state-of-the-art deep models, including 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
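下面是“仅在被掩码保留的时间步上计算MSE”的一个PyTorch最小示意(非论文官方实现);掩码的具体构造(残差熵、残差下界估计)见论文,这里用随机布尔掩码作占位。

```python
import torch

def selective_mse(pred, target, uncertainty_mask, anomaly_mask):
    """只在两个掩码同时保留(True)的时间步上计算 MSE。
    掩码假设为形状 (batch, horizon) 的布尔张量。"""
    keep = uncertainty_mask & anomaly_mask
    diff2 = (pred - target) ** 2
    return (diff2 * keep).sum() / keep.sum().clamp(min=1)

# 玩具用法:4 条序列、24 步预测窗口,掩码用随机布尔值占位
pred, target = torch.randn(4, 24), torch.randn(4, 24)
unc_mask = torch.rand(4, 24) < 0.9      # 占位:丢弃约 10% 的“不确定”时间步
anom_mask = torch.rand(4, 24) < 0.95    # 占位:丢弃约 5% 的“异常”时间步
print(float(selective_mse(pred, target, unc_mask, anom_mask)))
```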


【8】Scalable predictive processing framework for multitask caregiving robots
标题:用于多任务看护机器人的可扩展预测处理框架
链接:https://arxiv.org/abs/2510.25053

作者:Hayato Idei, Tamon Miyake, Tetsuya Ogata, Yuichi Yamashita
摘要:社会的快速老龄化加剧了对自主护理机器人的需求;然而,大多数现有系统都是针对特定任务的,依赖于手工预处理,限制了它们在不同场景中的泛化能力。认知神经科学中的一个流行理论提出,人类大脑通过分层预测处理来运作,这是通过整合多模态感觉信号来实现灵活认知和行为的基础。受这一原理的启发,我们引入了一种基于自由能原理下的预测处理的分层多模式递归神经网络,能够直接整合超过30,000维的视觉本体感受输入而无需降维。该模型能够学习两个具有代表性的任务,刚体重新定位和柔性毛巾擦拭,而无需特定于任务的特征工程。我们证明了三个关键属性:(i)自组织的层次潜在的动态调节任务的过渡,捕捉不确定性的变化,并推断闭塞状态;(ii)通过视觉本体感受的整合退化的视觉鲁棒性;以及(iii)多任务学习中的不对称干扰,其中更可变的擦拭任务对重新定位的影响很小,而学习重新定位任务导致擦拭性能的适度降低,而模型保持了整体鲁棒性。虽然评估仅限于模拟,但这些结果将预测处理确立为一种通用的、可扩展的计算原理,指向鲁棒、灵活和自主的机器人,同时为人类大脑在不确定的现实环境中实现灵活适应的能力提供了理论见解。
摘要:The rapid aging of societies is intensifying demand for autonomous care robots; however, most existing systems are task-specific and rely on handcrafted preprocessing, limiting their ability to generalize across diverse scenarios. A prevailing theory in cognitive neuroscience proposes that the human brain operates through hierarchical predictive processing, which underlies flexible cognition and behavior by integrating multimodal sensory signals. Inspired by this principle, we introduce a hierarchical multimodal recurrent neural network grounded in predictive processing under the free-energy principle, capable of directly integrating over 30,000-dimensional visuo-proprioceptive inputs without dimensionality reduction. The model was able to learn two representative caregiving tasks, rigid-body repositioning and flexible-towel wiping, without task-specific feature engineering. We demonstrate three key properties: (i) self-organization of hierarchical latent dynamics that regulate task transitions, capture variability in uncertainty, and infer occluded states; (ii) robustness to degraded vision through visuo-proprioceptive integration; and (iii) asymmetric interference in multitask learning, where the more variable wiping task had little influence on repositioning, whereas learning the repositioning task led to a modest reduction in wiping performance, while the model maintained overall robustness. Although the evaluation was limited to simulation, these results establish predictive processing as a universal and scalable computational principle, pointing toward robust, flexible, and autonomous caregiving robots while offering theoretical insight into the human brain's ability to achieve flexible adaptation in uncertain real-world environments.


【9】WBT-BGRL: A Non-Contrastive Weighted Bipartite Link Prediction Model for Inductive Learning
标题:WBT-BGRL:一种用于归纳学习的非对比加权二部图链接预测模型
链接:https://arxiv.org/abs/2510.24927

作者:Joel Frank Huarayo Quispe, Lilian Berton, Didier Vega-Oliveros
备注:5 pages, submitted to the 12th International Conference on Soft Computing and Machine Intelligence (ISCMI 2025)
摘要:二部图中的链接预测对于推荐系统和故障检测等应用至关重要,但与单部图相比,它的研究较少。对比方法难以应对低效且有偏见的阴性采样,而非对比方法则仅依赖于阳性样本。现有的模型在转换设置中表现良好,但它们在归纳、加权和二分场景中的有效性尚未得到验证。为了解决这个问题,我们提出了加权二分三重引导图潜在(WBT-BGRL),一个非对比框架,增强了引导学习与一个新的加权机制的三重损失。使用具有双GCN编码器的二分架构,WBT-BGRL针对自适应最先进的模型(T-BGRL、BGRL、GBT、CCA-SSG)进行评估。在真实世界数据集(工业和电子商务)上的结果显示出有竞争力的性能,特别是当在预训练期间应用加权时,突出了加权的非对比学习在二分图中的归纳链接预测中的价值。
摘要:Link prediction in bipartite graphs is crucial for applications like recommendation systems and failure detection, yet it is less studied than in monopartite graphs. Contrastive methods struggle with inefficient and biased negative sampling, while non-contrastive approaches rely solely on positive samples. Existing models perform well in transductive settings, but their effectiveness in inductive, weighted, and bipartite scenarios remains untested. To address this, we propose Weighted Bipartite Triplet-Bootstrapped Graph Latents (WBT-BGRL), a non-contrastive framework that enhances bootstrapped learning with a novel weighting mechanism in the triplet loss. Using a bipartite architecture with dual GCN encoders, WBT-BGRL is evaluated against adapted state-of-the-art models (T-BGRL, BGRL, GBT, CCA-SSG). Results on real-world datasets (Industry and E-commerce) show competitive performance, especially when weighting is applied during pretraining-highlighting the value of weighted, non-contrastive learning for inductive link prediction in bipartite graphs.


【10】Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA
标题:利用GraphFLA的景观特征增强生物适应度预测基准
链接:https://arxiv.org/abs/2510.24826

作者:Mingyu Huang, Shasha Zhou, Ke Li
备注:56 pages, 18 figures, 8 tables, accepted as a conference paper at NeurIPS 2025
摘要 :机器学习模型越来越多地映射生物序列适应度景观来预测突变效应。对这些模型的有效评估需要根据经验数据制定基准。尽管他们令人印象深刻的规模,现有的基准缺乏地形信息的基础健身景观,这阻碍了解释和比较模型的性能超出平均分数。在这里,我们介绍了GraphFLA,这是一个Python框架,它可以从不同形式的诱变数据中构建和分析适应度景观(例如,DNA,RNA,蛋白质,以及其他)与多达数百万的突变体。GraphFLA计算了20个生物相关特征,这些特征表征了景观地形的4个基本方面。通过将GraphFLA应用于ProteinGym,RNAGym和CIS-BP的5,300多个景观,我们展示了其在解释和比较数十种适应度预测模型的性能方面的实用性,突出了影响模型准确性的因素以及不同模型的各自优势。此外,我们还发布了155个组合完整的经验适应度景观,涵盖了各种形式的220多万个序列。所有的代码和数据集都可以在https://github.com/COLA-Laboratory/GraphFLA上找到。
摘要:Machine learning models increasingly map biological sequence-fitness landscapes to predict mutational effects. Effective evaluation of these models requires benchmarks curated from empirical data. Despite their impressive scales, existing benchmarks lack topographical information regarding the underlying fitness landscapes, which hampers interpretation and comparison of model performance beyond averaged scores. Here, we introduce GraphFLA, a Python framework that constructs and analyzes fitness landscapes from mutagenesis data in diverse modalities (e.g., DNA, RNA, protein, and beyond) with up to millions of mutants. GraphFLA calculates 20 biologically relevant features that characterize 4 fundamental aspects of landscape topography. By applying GraphFLA to over 5,300 landscapes from ProteinGym, RNAGym, and CIS-BP, we demonstrate its utility in interpreting and comparing the performance of dozens of fitness prediction models, highlighting factors influencing model accuracy and respective advantages of different models. In addition, we release 155 combinatorially complete empirical fitness landscapes, encompassing over 2.2 million sequences across various modalities. All the codes and datasets are available at https://github.com/COLA-Laboratory/GraphFLA.


其他神经网络|深度学习|模型|建模(22篇)

【1】Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning
标题:合成数据揭示相关多实例学习中的泛化差距
链接:https://arxiv.org/abs/2510.25759

作者:Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes
摘要:多实例学习(MIL)常用于医学影像,通过处理图块对高分辨率2D图像进行分类,或通过处理切片对3D体积进行分类。然而,传统的MIL方法单独处理各个实例,忽略了上下文关系,例如邻近图块或切片的外观,而这些在实际应用中可能至关重要。我们设计了一个合成分类任务,其中考虑相邻实例的特征对准确预测至关重要。我们通过将现成MIL方法的性能与该任务的最优贝叶斯估计(其具有闭式解)进行比较来量化其局限性。我们的实验表明,即便在数万个实例上从头训练,较新的相关性MIL方法仍难以达到理想的泛化效果。
摘要:Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships such as the appearance of nearby patches or slices that can be essential in real applications. We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator for this task, which is available in closed-form. We empirically show that newer correlated MIL methods still struggle to generalize as well as possible when trained from scratch on tens of thousands of instances.
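下面给出构造此类合成任务的一种可能方式(示意性质,具体生成过程为我们的假设,并非论文所用设定):只有当相邻两个实例同时携带信号时,包才为正例,因此独立处理各实例的方法无法解决该任务。

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bag(n_instances=20, d=8):
    """当且仅当存在某一对相邻实例都携带信号时,包标签为正。"""
    X = rng.normal(size=(n_instances, d))
    signal = rng.random(n_instances) < 0.15          # 哪些实例携带信号
    X[signal, 0] += 3.0                              # 信号体现在第 0 维特征上
    y = int(np.any(signal[:-1] & signal[1:]))        # 相邻共现决定包标签
    return X, y

bags, labels = zip(*(make_bag() for _ in range(1000)))
print("positive rate:", np.mean(labels))
```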


【2】Mechanistic Interpretability of RNNs emulating Hidden Markov Models
标题:RNN仿真隐马尔可夫模型的机制解释性
链接:https://arxiv.org/abs/2510.25674

作者:Elia Torre, Michele Viscione, Lucas Pompe, Benjamin F Grewe, Valerio Mante
备注:Accepted at NeurIPS 2025
摘要:递归神经网络(RNN)在神经科学中提供了一种强大的方法来推断神经群体中的潜在动力学,并生成关于行为背后的神经计算的假设。然而,过去的工作主要集中在相对简单的,输入驱动的,主要是确定性的行为-很少有人知道的机制,将允许RNN生成更丰富的,自发的,并在自然环境中观察到的潜在随机行为。隐马尔可夫模型(Hidden Markov Models,简称HMM)的建模揭示了自然行为被分割成离散的潜在状态,它们之间存在随机转换,这是一种可能与RNN实现的连续状态空间不一致的动态。在这里,我们首先展示了RNN可以复制HMM发射统计数据,然后对训练好的网络进行反向工程,以揭示它们实现的机制。在没有输入的情况下,经过训练的RNN的活动会向一个固定点崩溃。当由随机输入驱动时,轨迹反而沿着闭合轨道表现出噪声持续动力学。沿这些轨道的旋转调制发射概率,并由快速,确定性的过渡连接的缓慢,噪声驱动的动态区域之间的过渡。经过训练的RNN发展出高度结构化的连接,一小部分“踢神经元”启动了这些区域之间的转换。这种机制在训练过程中出现,因为网络进入随机共振状态,使其能够执行概率计算。对多个HMM架构(全连接、循环和线性链)的分析表明,这种解决方案通过对相同动态基序的模块化重用来推广,这表明了RNN可以模拟复杂离散潜在动态的组成原则。
摘要:Recurrent neural networks (RNNs) provide a powerful approach in neuroscience to infer latent dynamics in neural populations and to generate hypotheses about the neural computations underlying behavior. However, past work has focused on relatively simple, input-driven, and largely deterministic behaviors - little is known about the mechanisms that would allow RNNs to generate the richer, spontaneous, and potentially stochastic behaviors observed in natural settings. Modeling with Hidden Markov Models (HMMs) has revealed a segmentation of natural behaviors into discrete latent states with stochastic transitions between them, a type of dynamics that may appear at odds with the continuous state spaces implemented by RNNs. Here we first show that RNNs can replicate HMM emission statistics and then reverse-engineer the trained networks to uncover the mechanisms they implement. In the absence of inputs, the activity of trained RNNs collapses towards a single fixed point. When driven by stochastic input, trajectories instead exhibit noise-sustained dynamics along closed orbits. Rotation along these orbits modulates the emission probabilities and is governed by transitions between regions of slow, noise-driven dynamics connected by fast, deterministic transitions. The trained RNNs develop highly structured connectivity, with a small set of "kick neurons" initiating transitions between these regions. This mechanism emerges during training as the network shifts into a regime of stochastic resonance, enabling it to perform probabilistic computations. Analyses across multiple HMM architectures - fully connected, cyclic, and linear-chain - reveal that this solution generalizes through the modular reuse of the same dynamical motif, suggesting a compositional principle by which RNNs can emulate complex discrete latent dynamics.


【3】Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning
标题:反馈对齐遇上低秩流形:局部学习的结构化方案
链接:https://arxiv.org/abs/2510.25594

作者:Arani Roy, Marco P. Apolinario, Shristi Das Biswas, Kaushik Roy
摘要:使用反向传播(BP)训练深度神经网络(DNN)可以达到最先进的精度,但需要全局误差传播和完全参数化,导致大量内存和计算开销。直接反馈对齐(DFA)能够以较低的内存需求实现局部的、可并行的更新,但受到非结构化反馈和更深架构(特别是卷积神经网络)中可扩展性差的限制。为了解决这些限制,我们提出了一个结构化的局部学习框架,直接在由权重矩阵的奇异值分解(SVD)定义的低秩流形上操作。每一层都以其分解形式进行训练,并使用集成了交叉熵、子空间对齐和正交正则化的复合损失对SVD分量进行更新。构造反馈矩阵以匹配SVD结构,确保前向和反馈路径之间的一致对齐。相对于原始DFA模型,我们的方法减少了可训练参数的数量,而不依赖于修剪或事后压缩。在CIFAR-10,CIFAR-100和ImageNet上的实验表明,我们的方法达到了与BP相当的准确性。消融研究证实了低秩设置中每个损失项的重要性。这些结果将低秩流形上的局部学习建立为满秩梯度训练的原则性和可扩展的替代方案。
摘要 :Training deep neural networks (DNNs) with backpropagation (BP) achieves state-of-the-art accuracy but requires global error propagation and full parameterization, leading to substantial memory and computational overhead. Direct Feedback Alignment (DFA) enables local, parallelizable updates with lower memory requirements but is limited by unstructured feedback and poor scalability in deeper architectures, specially convolutional neural networks. To address these limitations, we propose a structured local learning framework that operates directly on low-rank manifolds defined by the Singular Value Decomposition (SVD) of weight matrices. Each layer is trained in its decomposed form, with updates applied to the SVD components using a composite loss that integrates cross-entropy, subspace alignment, and orthogonality regularization. Feedback matrices are constructed to match the SVD structure, ensuring consistent alignment between forward and feedback pathways. Our method reduces the number of trainable parameters relative to the original DFA model, without relying on pruning or post hoc compression. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method achieves accuracy comparable to that of BP. Ablation studies confirm the importance of each loss term in the low-rank setting. These results establish local learning on low-rank manifolds as a principled and scalable alternative to full-rank gradient-based training.
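下面是一个将线性层保持在SVD分解形式、并施加正交正则的PyTorch草图(示意性质,仅展示分解参数化与正交惩罚;论文中的复合损失其余项、反馈矩阵与局部更新规则未包含在内)。

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """把线性层保持为 W = U diag(s) V^T 的分解形式,直接训练各因子。"""
    def __init__(self, in_f, out_f, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_f, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_f, rank) / rank ** 0.5)

    def forward(self, x):
        return (x @ self.V) * self.s @ self.U.T

    def orthogonality_penalty(self):
        """鼓励 U^T U ≈ I、V^T V ≈ I,对应正交正则项。"""
        eye = torch.eye(self.s.numel())
        return ((self.U.T @ self.U - eye) ** 2).sum() + ((self.V.T @ self.V - eye) ** 2).sum()

layer = SVDLinear(64, 32, rank=8)
x = torch.randn(16, 64)
out = layer(x)
loss = out.pow(2).mean() + 1e-3 * layer.orthogonality_penalty()   # 占位的任务损失 + 正交惩罚
loss.backward()
```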


【4】Learning-Augmented Online Bidding in Stochastic Settings
标题:随机设置下的学习增强在线竞价
链接:https://arxiv.org/abs/2510.25582

作者:Spyros Angelopoulos, Bertrand Simon
摘要:在线竞价是一个经典的优化问题,在在线决策、可中断系统的设计和近似算法的分析中有着广泛的应用。在这项工作中,我们研究在线投标下学习增强设置,将随机性,无论是预测预言或算法本身。在第一部分中,我们研究分布式预测下的投标,并找到帕累托最优算法,提供最好的一致性和算法的鲁棒性之间的权衡。在第二部分中,我们研究了随机投标算法的能力和局限性,通过提出上界和下界的一致性/鲁棒性权衡。以前的工作主要集中在预言机,不利用随机信息的预测质量,和确定性算法。
摘要:Online bidding is a classic optimization problem, with several applications in online decision-making, the design of interruptible systems, and the analysis of approximation algorithms. In this work, we study online bidding under learning-augmented settings that incorporate stochasticity, in either the prediction oracle or the algorithm itself. In the first part, we study bidding under distributional predictions, and find Pareto-optimal algorithms that offer the best-possible tradeoff between the consistency and the robustness of the algorithm. In the second part, we study the power and limitations of randomized bidding algorithms, by presenting upper and lower bounds on the consistency/robustness tradeoffs. Previous works focused predominantly on oracles that do not leverage stochastic information on the quality of the prediction, and deterministic algorithms.


【5】Hybrid Quantum-Classical Recurrent Neural Networks
标题:混合量子-经典循环神经网络
链接:https://arxiv.org/abs/2510.25557

作者:Wenduan Xu
摘要:We present a hybrid quantum-classical recurrent neural network (QRNN) architecture in which the entire recurrent core is realized as a parametrized quantum circuit (PQC) controlled by a classical feedforward network. The hidden state is the quantum state of an $n$-qubit PQC, residing in an exponentially large Hilbert space $\mathbb{C}^{2^n}$. The PQC is unitary by construction, making the hidden-state evolution norm-preserving without external constraints. At each timestep, mid-circuit readouts are combined with the input embedding and processed by the feedforward network, which provides explicit classical nonlinearity. The outputs parametrize the PQC, which updates the hidden state via unitary dynamics. The QRNN is compact and physically consistent, and it unifies (i) unitary recurrence as a high-capacity memory, (ii) partial observation via mid-circuit measurements, and (iii) nonlinear classical control for input-conditioned parametrization. We evaluate the model in simulation with up to 14 qubits on sentiment analysis, MNIST, permuted MNIST, copying memory, and language modeling, adopting projective measurements as a limiting case to obtain mid-circuit readouts while maintaining a coherent recurrent quantum memory. We further devise a soft attention mechanism over the mid-circuit readouts in a sequence-to-sequence model and show its effectiveness for machine translation. To our knowledge, this is the first model (RNN or otherwise) grounded in quantum operations to achieve competitive performance against strong classical baselines across a broad class of sequence-learning tasks.


【6】FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
标题:FaCT:解释神经网络决策的忠实概念轨迹
链接:https://arxiv.org/abs/2510.25512

作者:Amin Parchami-Araghi, Sukrut Rao, Jonas Fischer, Bernt Schiele
备注:Accepted to NeurIPS 2025; Code is available at this https URL
摘要:Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. Many post-hoc concept-based approaches have been introduced to understand their workings, yet they are not always faithful to the model. Further, they make restrictive assumptions on the concepts a model learns, such as class-specificity, small spatial extent, or alignment to human expectations. In this work, we put emphasis on the faithfulness of such concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced. We also leverage foundation models to propose a new concept-consistency metric, C$^2$-Score, that can be used to evaluate concept-based methods. We show that, compared to prior work, our concepts are quantitatively more consistent and users find our concepts to be more interpretable, all while retaining competitive ImageNet performance.


【7】A Deep Learning Framework for Multi-Operator Learning: Architectures and Approximation Theory
标题:一个用于多算子学习的深度学习框架:架构和近似理论
链接:https://arxiv.org/abs/2510.25379

作者:Adrien Weihs, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer
摘要 :While many problems in machine learning focus on learning mappings between finite-dimensional spaces, scientific applications require approximating mappings between function spaces, i.e., operators. We study the problem of learning collections of operators and provide both theoretical and empirical advances. We distinguish between two regimes: (i) multiple operator learning, where a single network represents a continuum of operators parameterized by a parametric function, and (ii) learning several distinct single operators, where each operator is learned independently. For the multiple operator case, we introduce two new architectures, $\mathrm{MNO}$ and $\mathrm{MONet}$, and establish universal approximation results in three settings: continuous, integrable, or Lipschitz operators. For the latter, we further derive explicit scaling laws that quantify how the network size must grow to achieve a target approximation accuracy. For learning several single operators, we develop a framework for balancing architectural complexity across subnetworks and show how approximation order determines computational efficiency. Empirical experiments on parametric PDE benchmarks confirm the strong expressive power and efficiency of the proposed architectures. Overall, this work establishes a unified theoretical and practical foundation for scalable neural operator learning across multiple operators.


【8】A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks
标题:一种基于凸性的深度神经网络两阶段训练算法
链接:https://arxiv.org/abs/2510.25366

作者:Tomas Hrycej, Bernhard Bermeitinger, Massimo Pavone, Götz-Henrik Wiegand, Siegfried Handschuh
备注:Appeared on KDIR IC3K Conference 2025 (Best Paper Award)
摘要:The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of the loss function. The most decisive among these properties is the convexity or non-convexity of the loss function. The fact that the loss function can have, and frequently has, non-convex regions has led to a widespread commitment to non-convex methods such as Adam. However, a local minimum implies that, in some environment around it, the function is convex. In this environment, second-order minimizing methods such as the Conjugate Gradient (CG) give a guaranteed superlinear convergence. We propose a novel framework grounded in the hypothesis that loss functions in real-world tasks swap from initial non-convexity to convexity towards the optimum. This is a property we leverage to design an innovative two-phase optimization algorithm. The presented algorithm detects the swap point by observing the gradient norm dependence on the loss. In these regions, non-convex (Adam) and convex (CG) algorithms are used, respectively. Computing experiments confirm the hypothesis that this simple convexity structure is frequent enough to be practically exploited to substantially improve convergence and accuracy.


【9】On the Stability of Neural Networks in Deep Learning
标题:深度学习中神经网络的稳定性
链接:https://arxiv.org/abs/2510.25282

作者:Blaise Delattre
摘要:Deep learning has achieved remarkable success across a wide range of tasks, but its models often suffer from instability and vulnerability: small changes to the input may drastically affect predictions, while optimization can be hindered by sharp loss landscapes. This thesis addresses these issues through the unifying perspective of sensitivity analysis, which examines how neural networks respond to perturbations at both the input and parameter levels.   We study Lipschitz networks as a principled way to constrain sensitivity to input perturbations, thereby improving generalization, adversarial robustness, and training stability. To complement this architectural approach, we introduce regularization techniques based on the curvature of the loss function, promoting smoother optimization landscapes and reducing sensitivity to parameter variations. Randomized smoothing is also explored as a probabilistic method for enhancing robustness at decision boundaries.   By combining these perspectives, we develop a unified framework where Lipschitz continuity, randomized smoothing, and curvature regularization interact to address fundamental challenges in stability. The thesis contributes both theoretical analysis and practical methodologies, including efficient spectral norm computation, novel Lipschitz-constrained layers, and improved certification procedures.


【10】BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training
标题:BSFA:利用子空间二分法加速神经网络训练
链接:https://arxiv.org/abs/2510.25244

作者:Wenjie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng
备注:16 pages
摘要:Recent studies \citep{gur2018gradient,song2024does, wen2024understanding} highlight a fundamental dichotomy in deep learning optimization: Although parameter updates along the top eigendirections of the loss Hessian (Dom-space) capture most of the update magnitude, they often contribute minimally to loss reduction. In contrast, updates in the orthogonal component (Bulk-space) have smaller magnitudes but drive most learning progress. In this work, we further advance the understanding of this phenomenon and introduce the \textbf{Bulk-Space-Filtration-Accelerator (BSFA)}, a novel plug-and-play framework. BSFA accelerates training by differentially scaling update components projected onto these distinct subspaces, simultaneously enhancing stability by moderating updates in the dominant subspace and boosting convergence speed by amplifying those in the bulk-space. To ensure BSFA is both practical and scalable for contemporary large models, we introduce two key innovations: an efficient estimator using Principal Component Analysis (PCA) on historical updates for fast subspace estimation, and a block-wise strategy that applies this estimation on a per-parameter-block basis. These designs make BSFA computationally tractable and highly effective. We demonstrate BSFA's acceleration across various tasks, notably achieving approximately 2$\times$ speedup when pre-training LLaMA-72M on WikiText-103 and LLaMA-134M on OpenWebText compared to vanilla AdamW.
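下面用numpy给出核心思想的一个草图(非论文官方实现):对近期更新向量做PCA估计主导子空间,然后对下一次更新的主导(Dom-space)分量与正交(Bulk-space)分量施加不同的缩放;历史长度与缩放系数均为示意取值。

```python
import numpy as np

def split_and_scale_update(update, history, k=2, dom_scale=0.5, bulk_scale=2.0):
    """把 update 投影到近期更新的前 k 个主方向(Dom-space)及其正交补(Bulk-space),
    并对两部分分别缩放。history 为该参数块最近 n_steps 次更新组成的 (n_steps, n_params) 数组。"""
    H = history - history.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(H, full_matrices=False)   # 右奇异向量即主方向
    P = Vt[:k]                                         # (k, n_params),行向量正交归一
    dom = P.T @ (P @ update)                           # 主导子空间中的分量
    bulk = update - dom                                # 正交(bulk)分量
    return dom_scale * dom + bulk_scale * bulk

rng = np.random.default_rng(0)
history = rng.normal(size=(20, 100))                   # 例如某参数块最近 20 次更新
update = rng.normal(size=100)
new_update = split_and_scale_update(update, history)
print(np.linalg.norm(update), np.linalg.norm(new_update))
```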


【11】Training Across Reservoirs: Using Numerical Differentiation To Couple Trainable Networks With Black-Box Reservoirs
标题:跨储备池的训练:使用数值微分将可训练网络与黑箱储备池耦合
链接:https://arxiv.org/abs/2510.25074

作者:Andrew Clark, Jack Moursounidis, Osmaan Rasouli, William Gan, Cooper Doyle, Anna Leontjeva
备注:12 pages main, Appendix 10 pages, 6 figures in main body, 10 overall
摘要:We introduce Bounded Numerical Differentiation (BOND), a perturbative method for estimating partial derivatives across network structures with inaccessible computational graphs. BOND demonstrates improved accuracy and scalability from existing perturbative methods, enabling new explorations of trainable architectures that integrate black-box functions. We observe that these black-box functions, realized in our experiments as fixed, untrained networks, can enhance model performance without increasing the number of trainable parameters. This improvement is achieved without extensive optimization of the architecture or properties of the black-box function itself. Our findings highlight the potential of leveraging fixed, non-trainable modules to expand model capacity, suggesting a path toward combining analogue and digital devices as a mechanism for scaling networks.
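下面是“用数值微分跨越计算图不可访问的黑箱模块传递梯度”这一一般思路的numpy草图;这只是普通的中心差分雅可比估计,并非论文提出的BOND估计器本身,黑箱函数也只是占位替身。

```python
import numpy as np

def black_box(z):
    """占位:一个固定、未训练、计算图不可访问的模块。"""
    W = np.linspace(-0.5, 0.5, z.size * 4).reshape(z.size, 4)
    return np.tanh(z @ W)

def numerical_jacobian(f, z, eps=1e-4):
    """中心差分估计雅可比矩阵 df/dz,每个输入维度做一对扰动。"""
    out_dim = f(z).size
    J = np.zeros((out_dim, z.size))
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = eps
        J[:, i] = (f(z + e) - f(z - e)) / (2 * eps)
    return J

# 链式法则跨越黑箱:已知 dL/d(输出),恢复 dL/d(输入),供上游可训练层继续反传
rng = np.random.default_rng(0)
z = rng.normal(size=8)
grad_out = rng.normal(size=4)                 # 来自黑箱之后可训练层的梯度
grad_in = numerical_jacobian(black_box, z).T @ grad_out
print(grad_in)
```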


【12】Disentangling Shared and Private Neural Dynamics with SPIRE: A Latent Modeling Framework for Deep Brain Stimulation
标题:用SPIRE解缠共享与私有神经动力学:面向脑深部电刺激的潜变量建模框架
链接:https://arxiv.org/abs/2510.25023

作者:Rahil Soroushmojdehi, Sina Javadzadeh, Mehrnaz Asadi, Terence D.Sanger
备注:25 pages total. Main paper (including references): 13 pages with 7 figures. Appendix: 12 pages with 5 figures and 4 tables. Submitted to ICLR 2026
摘要:Disentangling shared network-level dynamics from region-specific activity is a central challenge in modeling multi-region neural data. We introduce SPIRE (Shared-Private Inter-Regional Encoder), a deep multi-encoder autoencoder that factorizes recordings into shared and private latent subspaces with novel alignment and disentanglement losses. Trained solely on baseline data, SPIRE robustly recovers cross-regional structure and reveals how external perturbations reorganize it. On synthetic benchmarks with ground-truth latents, SPIRE outperforms classical probabilistic models under nonlinear distortions and temporal misalignments. Applied to intracranial deep brain stimulation (DBS) recordings, SPIRE shows that shared latents reliably encode stimulation-specific signatures that generalize across sites and frequencies. These results establish SPIRE as a practical, reproducible tool for analyzing multi-region neural dynamics under stimulation.


【13】Conformational Rank Conditioned Committees for Machine Learning-Assisted Directed Evolution
标题:用于机器学习辅助定向进化的构象秩条件委员会
链接:https://arxiv.org/abs/2510.24974

作者:Mia Adler, Carrie Liang, Brian Peng, Oleg Presnyakov, Justin M. Baker, Jannelle Lauffer, Himani Sharma, Barry Merriman
摘要:Machine Learning-assisted directed evolution (MLDE) is a powerful tool for efficiently navigating antibody fitness landscapes. Many structure-aware MLDE pipelines rely on a single conformation or a single committee across all conformations, limiting their ability to separate conformational uncertainty from epistemic uncertainty. Here, we introduce a rank -conditioned committee (RCC) framework that leverages ranked conformations to assign a deep neural network committee per rank. This design enables a principled separation between epistemic uncertainty and conformational uncertainty. We validate our approach on SARS-CoV-2 antibody docking, demonstrating significant improvements over baseline strategies. Our results offer a scalable route for therapeutic antibody discovery while directly addressing the challenge of modeling conformational uncertainty.
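下面用numpy给出“按构象rank分设深度网络委员会”这一思路的一个示意;两类不确定性的具体拆分方式为我们的直观理解,并非论文的正式定义,所有打分均为随机占位。

```python
import numpy as np

# 假设:每个构象 rank 对应一个由 M 个网络组成的委员会,
# predictions[r][m] 为第 r 个 rank 下第 m 个成员对一批候选抗体的打分(这里用随机数占位)。
rng = np.random.default_rng(0)
n_ranks, n_members, n_candidates = 5, 4, 100
predictions = rng.normal(size=(n_ranks, n_members, n_candidates))

# 一种直观的不确定性拆分(示意):
# 认知不确定性 ~ 同一 rank 内委员会成员之间的方差(对各 rank 取平均);
# 构象不确定性 ~ 各 rank 的委员会均值之间的方差。
epistemic = predictions.var(axis=1).mean(axis=0)          # (n_candidates,)
conformational = predictions.mean(axis=1).var(axis=0)     # (n_candidates,)
score = predictions.mean(axis=(0, 1))                     # 委员会平均打分

# 例如:选出平均打分高、且两类不确定性都较低的候选用于下一轮定向进化
ranked = np.argsort(-(score - 0.5 * (epistemic + conformational)))
print(ranked[:10])
```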


【14】Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
标题:模态感知SAM:锐度感知最小化驱动的梯度调制,用于协调的多模态学习
链接:https://arxiv.org/abs/2510.24919

作者:Hossein R. Nowdeh, Jie Ji, Xiaolong Ma, Fatemeh Afghah
摘要:In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that applies to many modalities and supports early and late fusion scenarios. In every iteration, M-SAM in three steps optimizes learning. \textbf{First, it identifies the dominant modality} based on modalities' contribution in the accuracy using Shapley. \textbf{Second, it decomposes the loss landscape}, or in another language, it modulates the loss to prioritize the robustness of the model in favor of the dominant modality, and \textbf{third, M-SAM updates the weights} by backpropagation of modulated gradients. This ensures robust learning for the dominant modality while enhancing contributions from others, allowing the model to explore and exploit complementary features that strengthen overall performance. Extensive experiments on four diverse datasets show that M-SAM outperforms the latest state-of-the-art optimization and gradient manipulation methods and significantly balances and improves multimodal learning.


【15】From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
标题:从线性到非线性:通过特征学习实现可证明的弱到强泛化
链接:https://arxiv.org/abs/2510.24812

作者:Junsoo Oh, Jerry Song, Chulhee Yun
备注:NeurIPS 2025 camera-ready version, 70 pages
摘要:Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this effect, most theoretical insights are limited to abstract frameworks or linear/random feature models. In this paper, we provide a formal analysis of weak-to-strong generalization from a linear CNN (weak) to a two-layer ReLU CNN (strong). We consider structured data composed of label-dependent signals of varying difficulty and label-independent noise, and analyze gradient descent dynamics when the strong model is trained on data labeled by the pretrained weak model. Our analysis identifies two regimes -- data-scarce and data-abundant -- based on the signal-to-noise characteristics of the dataset, and reveals distinct mechanisms of weak-to-strong generalization. In the data-scarce regime, generalization occurs via benign overfitting or fails via harmful overfitting, depending on the amount of data, and we characterize the transition boundary. In the data-abundant regime, generalization emerges in the early phase through label correction, but we observe that overtraining can subsequently degrade performance.


【16】Dual-Domain Deep Learning-Assisted NOMA-CSK Systems for Secure and Efficient Vehicular Communications
标题:双域深度学习辅助NOMA-CSK系统用于安全高效的车辆通信
链接:https://arxiv.org/abs/2510.24763

作者:Tingting Huang, Jundong Chen, Huanqiang Zeng, Guofa Cai, Georges Kaddoum
摘要 :Ensuring secure and efficient multi-user (MU) transmission is critical for vehicular communication systems. Chaos-based modulation schemes have garnered considerable interest due to their benefits in physical layer security. However, most existing MU chaotic communication systems, particularly those based on non-coherent detection, suffer from low spectral efficiency due to reference signal transmission, and limited user connectivity under orthogonal multiple access (OMA). While non-orthogonal schemes, such as sparse code multiple access (SCMA)-based DCSK, have been explored, they face high computational complexity and inflexible scalability due to their fixed codebook designs. This paper proposes a deep learning-assisted power domain non-orthogonal multiple access chaos shift keying (DL-NOMA-CSK) system for vehicular communications. A deep neural network (DNN)-based demodulator is designed to learn intrinsic chaotic signal characteristics during offline training, thereby eliminating the need for chaotic synchronization or reference signal transmission. The demodulator employs a dual-domain feature extraction architecture that jointly processes the time-domain and frequency-domain information of chaotic signals, enhancing feature learning under dynamic channels. The DNN is integrated into the successive interference cancellation (SIC) framework to mitigate error propagation issues. Theoretical analysis and extensive simulations demonstrate that the proposed system achieves superior performance in terms of spectral efficiency (SE), energy efficiency (EE), bit error rate (BER), security, and robustness, while maintaining lower computational complexity compared to traditional MU-DCSK and existing DL-aided schemes. These advantages validate its practical viability for secure vehicular communications.


【17】Constructive Lyapunov Functions via Topology-Preserving Neural Networks
标题:通过保拓扑神经网络构造李雅普诺夫函数
链接:https://arxiv.org/abs/2510.24730

作者:Jaehong Oh
备注:54pages, 14 figures
摘要:We prove that ONN achieves order-optimal performance on convergence rate ($\mu \propto \lambda_2$), edge efficiency ($E = N$ for minimal connectivity $k = 2$), and computational complexity ($O(N d^2)$). Empirical validation on 3M-node semantic networks demonstrates 99.75\% improvement over baseline methods, confirming exponential convergence ($\mu = 3.2 \times 10^{-4}$) and topology preservation. ORTSF integration into transformers achieves 14.7\% perplexity reduction and 2.3 faster convergence on WikiText-103. We establish deep connections to optimal control (Hamilton-Jacobi-Bellman), information geometry (Fisher-efficient natural gradient), topological data analysis (persistent homology computation in $O(KN)$), discrete geometry (Ricci flow), and category theory (adjoint functors). This work transforms Massera's abstract existence theorem into a concrete, scalable algorithm with provable guarantees, opening pathways for constructive stability analysis in neural networks, robotics, and distributed systems.


【18】Large-Scale Network Embedding in Apache Spark
标题:Apache Spark中的大规模网络嵌入
链接:https://arxiv.org/abs/2106.10620

作者:Wenqing Lin
备注:Accepted in KDD 2021
摘要:Network embedding has been widely used in social recommendation and network analysis, such as recommendation systems and anomaly detection with graphs. However, most of previous approaches cannot handle large graphs efficiently, due to that (i) computation on graphs is often costly and (ii) the size of graph or the intermediate results of vectors could be prohibitively large, rendering it difficult to be processed on a single machine. In this paper, we propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark, which recursively partitions a graph into several small-sized subgraphs to capture the internal and external structural information of nodes, and then computes the network embedding for each subgraph in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain the embeddings of nodes in a linear cost. After that, we demonstrate in various experiments that our proposed approach is able to handle graphs with billions of edges within a few hours and is at least 4 times faster than the state-of-the-art approaches. Besides, it achieves up to $4.25\%$ and $4.27\%$ improvements on link prediction and node classification tasks respectively. In the end, we deploy the proposed algorithms in two online games of Tencent with the applications of friend recommendation and item recommendation, which improve the competitors by up to $91.11\%$ in running time and up to $12.80\%$ in the corresponding evaluation metrics.


【19】E-Scores for (In)Correctness Assessment of Generative Model Outputs
标题:用于评估生成模型输出(不)正确性的E分数
链接:https://arxiv.org/abs/2510.25770

作者:Guneet S. Dhillon, Javier González, Teodora Pandeva, Alicia Curth
摘要:While generative models, especially large language models (LLMs), are ubiquitous in today's world, principled mechanisms to assess their (in)correctness are limited. Using the conformal prediction framework, previous works construct sets of LLM responses where the probability of including an incorrect response, or error, is capped at a desired user-defined tolerance level. However, since these methods are based on p-values, they are susceptible to p-hacking, i.e., choosing the tolerance level post-hoc can invalidate the guarantees. We therefore leverage e-values to complement generative model outputs with e-scores as a measure of incorrectness. In addition to achieving the same statistical guarantees as before, e-scores provide users flexibility in adaptively choosing tolerance levels after observing the e-scores themselves, by upper bounding a post-hoc notion of error called size distortion. We experimentally demonstrate their efficacy in assessing LLM outputs for different correctness types: mathematical factuality and property constraints satisfaction.
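作为补充,下面用LaTeX给出e值在固定容忍水平下的基础保证(由Markov不等式得到);论文的贡献在于把它推广到“先观察e分数再选水平”的后验情形,并用size distortion的期望上界来刻画,下式仅是固定水平的基础情形。

```latex
% e 值的基本(固定容忍水平)保证:若在“输出正确”的原假设下 E \ge 0 且 \mathbb{E}[E] \le 1,
% 则对任意固定的 \alpha \in (0,1),由 Markov 不等式:
\Pr\!\left( E \ge \tfrac{1}{\alpha} \right) \;\le\; \alpha \,\mathbb{E}[E] \;\le\; \alpha .
```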


【20】Physics-Guided Conditional Diffusion Networks for Microwave Image Reconstruction
标题:用于微波图像重建的物理引导条件扩散网络
链接:https://arxiv.org/abs/2510.25729

作者:Shirin Chehelgami, Joe LoVetri, Vahab Khoshdel
摘要 :A conditional latent-diffusion based framework for solving the electromagnetic inverse scattering problem associated with microwave imaging is introduced. This generative machine-learning model explicitly mirrors the non-uniqueness of the ill-posed inverse problem. Unlike existing inverse solvers utilizing deterministic machine learning techniques that produce a single reconstruction, the proposed latent-diffusion model generates multiple plausible permittivity maps conditioned on measured scattered-field data, thereby generating several potential instances in the range-space of the non-unique inverse mapping. A forward electromagnetic solver is integrated into the reconstruction pipeline as a physics-based evaluation mechanism. The space of candidate reconstructions form a distribution of possibilities consistent with the conditioning data and the member of this space yielding the lowest scattered-field data discrepancy between the predicted and measured scattered fields is reported as the final solution. Synthetic and experimental labeled datasets are used for training and evaluation of the model. An innovative labeled synthetic dataset is created that exemplifies a varied set of scattering features. Training of the model using this new dataset produces high quality permittivity reconstructions achieving improved generalization with excellent fidelity to shape recognition. The results highlight the potential of hybrid generative physics frameworks as a promising direction for robust, data-driven microwave imaging.
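下面给出“生成多个候选重建、用前向求解器按数据失配挑选最优解”这一选择步骤的Python最小示意;其中候选采样器与前向电磁求解器均为占位的线性替身,仅用于演示选择逻辑。

```python
import numpy as np

def select_best_candidate(sample_candidate, forward_solver, measured_fields, n_candidates=16):
    """多次采样可能的介电常数图,保留模拟散射场与测量值 L2 失配最小的那个。"""
    best, best_err = None, np.inf
    for _ in range(n_candidates):
        permittivity = sample_candidate()                 # 例如:扩散模型的一次采样
        err = np.linalg.norm(forward_solver(permittivity) - measured_fields)
        if err < best_err:
            best, best_err = permittivity, err
    return best, best_err

# 占位替身:随机“采样器”与线性化“求解器”,仅用于跑通流程
rng = np.random.default_rng(0)
A = rng.normal(size=(32, 64))                             # 占位的前向算子
truth = rng.normal(size=64)
measured = A @ truth
best, err = select_best_candidate(lambda: truth + 0.1 * rng.normal(size=64),
                                  lambda eps_map: A @ eps_map, measured)
print("best data misfit:", err)
```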


【21】Continuous subsurface property retrieval from sparse radar observations using physics informed neural networks
标题:使用物理信息神经网络从稀疏雷达观测中连续反演地下属性
链接:https://arxiv.org/abs/2510.25648

作者:Ishfaq Aziz, Mohamad Alipour
备注:22 pages, 9 main text figures + 2 supplementary figures
摘要:Estimating subsurface dielectric properties is essential for applications ranging from environmental surveys of soils to nondestructive evaluation of concrete in infrastructure. Conventional wave inversion methods typically assume few discrete homogeneous layers and require dense measurements or strong prior knowledge of material boundaries, limiting scalability and accuracy in realistic settings where properties vary continuously. We present a physics informed machine learning framework that reconstructs subsurface permittivity as a fully neural, continuous function of depth, trained to satisfy both measurement data and Maxwells equations. We validate the framework with both simulations and custom built radar experiments on multilayered natural materials. Results show close agreement with in-situ permittivity measurements (R^2=0.93), with sensitivity to even subtle variations (Delta eps_r=2). Parametric analysis reveals that accurate profiles can be recovered with as few as three strategically placed sensors in two layer systems. This approach reframes subsurface inversion from boundary-driven to continuous property estimation, enabling accurate characterization of smooth permittivity variations and advancing electromagnetic imaging using low cost radar systems.


【22】Decoding non-invasive brain activity with novel deep-learning approaches
标题:使用新型深度学习方法解码无创大脑活动
链接:https://arxiv.org/abs/2510.24733

作者:Richard Csaky
备注:PhD thesis, 342 pages
摘要:This thesis delves into the world of non-invasive electrophysiological brain signals like electroencephalography (EEG) and magnetoencephalography (MEG), focusing on modelling and decoding such data. The research aims to investigate what happens in the brain when we perceive visual stimuli or engage in covert speech (inner speech) and enhance the decoding performance of such stimuli. The thesis is divided into two main sections, methodological and experimental work. A central concern in both sections is the large variability present in electrophysiological recordings, whether it be within-subject or between-subject variability, and to a certain extent between-dataset variability. In the methodological sections, we explore the potential of deep learning for brain decoding. We present advancements in decoding visual stimuli using linear models at the individual subject level. We then explore how deep learning techniques can be employed for group decoding, introducing new methods to deal with between-subject variability. Finally, we also explores novel forecasting models of MEG data based on convolutional and Transformer-based architectures. In particular, Transformer-based models demonstrate superior capabilities in generating signals that closely match real brain data, thereby enhancing the accuracy and reliability of modelling the brain's electrophysiology. In the experimental section, we present a unique dataset containing high-trial inner speech EEG, MEG, and preliminary optically pumped magnetometer (OPM) data. Our aim is to investigate different types of inner speech and push decoding performance by collecting a high number of trials and sessions from a few participants. However, the decoding results are found to be mostly negative, underscoring the difficulty of decoding inner speech.


其他(36篇)

【1】Meshless solutions of PDE inverse problems on irregular geometries
标题:不规则几何体上偏微分方程反问题的无网格解
链接:https://arxiv.org/abs/2510.25752

作者:James V. Roggeveen, Michael P. Brenner
摘要:Solving inverse and optimization problems over solutions of nonlinear partial differential equations (PDEs) on complex spatial domains is a long-standing challenge. Here we introduce a method that parameterizes the solution using spectral bases on arbitrary spatiotemporal domains, whereby the basis is defined on a hyperrectangle containing the true domain. We find the coefficients of the basis expansion by solving an optimization problem whereby both the equations, the boundary conditions and any optimization targets are enforced by a loss function, building on a key idea from Physics-Informed Neural Networks (PINNs). Since the representation of the function natively has exponential convergence, so does the solution of the optimization problem, as long as it can be solved efficiently. We find empirically that the optimization protocols developed for machine learning find solutions with exponential convergence on a wide range of equations. The method naturally allows for the incorporation of data assimilation by including additional terms in the loss function, and for the efficient solution of optimization problems over the PDE solutions.
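下面是该思路的一个玩具级PyTorch草图(非论文实现):用多项式基在包含区间上参数化解,并以“PDE残差 + 边界项”的PINN式损失直接优化基系数;示例方程 u'' = -pi^2 sin(pi x)、u(±1)=0 为我们自行选取。

```python
import math
import torch

torch.manual_seed(0)
degree = 12
coeffs = torch.zeros(degree + 1, requires_grad=True)     # 基展开系数

def u(x):
    powers = torch.stack([x ** k for k in range(degree + 1)], dim=-1)
    return powers @ coeffs

x_col = torch.linspace(-1, 1, 64, requires_grad=True)    # 配点
x_bc = torch.tensor([-1.0, 1.0])                         # 边界点
opt = torch.optim.Adam([coeffs], lr=0.05)

for step in range(2000):
    opt.zero_grad()
    ux = u(x_col)
    du = torch.autograd.grad(ux.sum(), x_col, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x_col, create_graph=True)[0]
    residual = d2u + math.pi ** 2 * torch.sin(math.pi * x_col)
    loss = residual.pow(2).mean() + 10.0 * u(x_bc).pow(2).mean()   # PDE 残差 + 边界条件
    loss.backward()
    opt.step()

print(float(loss))   # 损失趋近 0 时,u(x) 逼近精确解 sin(pi x)
```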


【2】LieSolver: A PDE-constrained solver for IBVPs using Lie symmetries
标题:LieSolver:一种使用Lie对称性求解IBVP的PDE约束求解器
链接:https://arxiv.org/abs/2510.25731

作者:René P. Klausen, Ivan Timofeev, Johannes Frank, Jonas Naujoks, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
摘要 :We introduce a method for efficiently solving initial-boundary value problems (IBVPs) that uses Lie symmetries to enforce the associated partial differential equation (PDE) exactly by construction. By leveraging symmetry transformations, the model inherently incorporates the physical laws and learns solutions from initial and boundary data. As a result, the loss directly measures the model's accuracy, leading to improved convergence. Moreover, for well-posed IBVPs, our method enables rigorous error estimation. The approach yields compact models, facilitating an efficient optimization. We implement LieSolver and demonstrate its application to linear homogeneous PDEs with a range of initial conditions, showing that it is faster and more accurate than physics-informed neural networks (PINNs). Overall, our method improves both computational efficiency and the reliability of predictions for PDE-constrained problems.


【3】Convolutional Spiking-based GRU Cell for Spatio-temporal Data
标题:基于卷积尖峰的时空数据GRU单元
链接:https://arxiv.org/abs/2510.25696

作者:Yesmine Abdennadher, Eleonora Cicciarella, Michele Rossi
备注:6 pages, 1 figure. Published in 2025 IEEE International Workshop on Machine Learning for Signal Processing, Aug. 31-Sep. 3, 2025, Istanbul, Turkey
摘要:Spike-based temporal messaging enables SNNs to efficiently process both purely temporal and spatio-temporal time-series or event-driven data. Combining SNNs with Gated Recurrent Units (GRUs), a variant of recurrent neural networks, gives rise to a robust framework for sequential data processing; however, traditional RNNs often lose local details when handling long sequences. Previous approaches, such as SpikGRU, fail to capture fine-grained local dependencies in event-based spatio-temporal data. In this paper, we introduce the Convolutional Spiking GRU (CS-GRU) cell, which leverages convolutional operations to preserve local structure and dependencies while integrating the temporal precision of spiking neurons with the efficient gating mechanisms of GRUs. This versatile architecture excels on both temporal datasets (NTIDIGITS, SHD) and spatio-temporal benchmarks (MNIST, DVSGesture, CIFAR10DVS). Our experiments show that CS-GRU outperforms state-of-the-art GRU variants by an average of 4.35%, achieving over 90% accuracy on sequential tasks and up to 99.31% on MNIST. It is worth noting that our solution achieves 69% higher efficiency compared to SpikGRU. The code is available at: https://github.com/YesmineAbdennadher/CS-GRU.


【4】A Configuration-First Framework for Reproducible, Low-Code Localization
标题:一个面向可复现、低代码定位的配置优先框架
链接:https://arxiv.org/abs/2510.25692

作者:Tim Strnad (Jožef Stefan Institute, Slovenia), Blaž Bertalanič (Jožef Stefan Institute, Slovenia), Carolina Fortuna (Jožef Stefan Institute, Slovenia)
备注:20 pages, 7 figures. Preprint submitted to ACM Transactions on Software Engineering and Methodology (TOSEM), 2025
摘要:Machine learning is increasingly permeating radio-based localization services. To keep results credible and comparable, everyday workflows should make rigorous experiment specification and exact repeatability the default, without blocking advanced experimentation. However, in practice, researchers face a three-way gap that could be filled by a framework that offers (i) low coding effort for end-to-end studies, (ii) reproducibility by default including versioned code, data, and configurations, controlled randomness, isolated runs, and recorded artifacts, and (iii) built-in extensibility so new models, metrics, and stages can be added with minimal integration effort. Existing tools rarely deliver all three for machine learning in general and localization workflows in particular. In this paper we introduce LOCALIZE, a low-code, configuration-first framework for radio localization in which experiments are declared in human-readable configuration, a workflow orchestrator runs standardized pipelines from data preparation to reporting, and all artifacts, such as datasets, models, metrics, and reports, are versioned. The preconfigured, versioned datasets reduce initial setup and boilerplate, speeding up model development and evaluation. The design, with clear extension points, allows experts to add components without reworking the infrastructure. In a qualitative comparison and a head-to-head study against a plain Jupyter notebook baseline, we show that the framework reduces authoring effort while maintaining comparable runtime and memory behavior. Furthermore, using a Bluetooth Low Energy dataset, we show that scaling across training data (1x to 10x) keeps orchestration overheads bounded as data grows. Overall, the framework makes reproducible machine-learning-based localization experimentation practical, accessible, and extensible.


【5】Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
标题:低秩近似的谱扰动界及其在隐私中的应用
链接:https://arxiv.org/abs/2510.25670

作者:Phuc Tran, Nisheeth K. Vishnoi, Van H. Vu
备注:NeurIPS 2025
摘要:A central challenge in machine learning is to understand how noise or measurement errors affect low-rank approximations, particularly in the spectral norm. This question is especially important in differentially private low-rank approximation, where one aims to preserve the top-$p$ structure of a data-derived matrix while ensuring privacy. Prior work often analyzes Frobenius norm error or changes in reconstruction quality, but these metrics can over- or under-estimate true subspace distortion. The spectral norm, by contrast, captures worst-case directional error and provides the strongest utility guarantees. We establish new high-probability spectral-norm perturbation bounds for symmetric matrices that refine the classical Eckart--Young--Mirsky theorem and explicitly capture interactions between a matrix $A \in \mathbb{R}^{n \times n}$ and an arbitrary symmetric perturbation $E$. Under mild eigengap and norm conditions, our bounds yield sharp estimates for $\|(A + E)_p - A_p\|$, where $A_p$ is the best rank-$p$ approximation of $A$, with improvements of up to a factor of $\sqrt{n}$. As an application, we derive improved utility guarantees for differentially private PCA, resolving an open problem in the literature. Our analysis relies on a novel contour bootstrapping method from complex analysis and extends it to a broad class of spectral functionals, including polynomials and matrix exponentials. Empirical results on real-world datasets confirm that our bounds closely track the actual spectral error under diverse perturbation regimes.
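To make the central quantity concrete, the sketch below measures the spectral-norm error $\|(A+E)_p - A_p\|$ numerically for a random symmetric matrix and a small symmetric perturbation; the matrix size, noise scale, and rank are illustrative choices, not the paper's experimental setup.

```python
# Minimal numerical sketch of the quantity bounded in the paper: the
# spectral-norm error ||(A + E)_p - A_p|| between best rank-p approximations
# of a symmetric matrix before and after perturbation.
import numpy as np

def best_rank_p_sym(M, p):
    """Best rank-p approximation of a symmetric matrix (top-|eigenvalue| terms)."""
    w, V = np.linalg.eigh(M)
    idx = np.argsort(np.abs(w))[::-1][:p]
    return (V[:, idx] * w[idx]) @ V[:, idx].T

rng = np.random.default_rng(0)
n, p, sigma = 200, 5, 0.01
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
E = rng.standard_normal((n, n)) * sigma; E = (E + E.T) / 2

err = np.linalg.norm(best_rank_p_sym(A + E, p) - best_rank_p_sym(A, p), 2)
print(f"||(A+E)_p - A_p||_2 = {err:.4f}   (||E||_2 = {np.linalg.norm(E, 2):.4f})")
```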


【6】INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
标题:INT vs. FP:细粒度低位量化格式的综合研究
链接:https://arxiv.org/abs/2510.25602

作者:Mengzhao Chen, Meng Wu, Hui Jin, Zhihang Yuan, Jing Liu, Chaoyi Zhang, Yunshui Li, Jie Huang, Jin Ma, Zeyue Xue, Zhiheng Liu, Xingyan Bin, Ping Luo
摘要:Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage, though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.
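For intuition about what "fine-grained (block-wise)" integer quantization means, the following sketch applies symmetric INT8 quantization with a per-block scale over blocks of 32 values; it illustrates the spirit of MXINT8-style formats but does not reproduce the exact MX scale encoding.

```python
# Illustrative block-wise symmetric integer quantization (block size 32).
import numpy as np

def quantize_blockwise_int(x, block=32, bits=8):
    qmax = 2 ** (bits - 1) - 1                    # 127 for INT8
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax   # one scale per block
    scale = np.where(scale == 0, 1.0, scale)      # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return (q * scale).reshape(-1)                # dequantized values

x = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
x_hat = quantize_blockwise_int(x)
print("RMS quantization error:", np.sqrt(np.mean((x - x_hat) ** 2)))
```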


【7】Perturbation Bounds for Low-Rank Inverse Approximations under Noise
标题:噪声下低秩逆逼近的扰动界
链接:https://arxiv.org/abs/2510.25571

作者:Phuc Tran, Nisheeth K. Vishnoi
备注:NeurIPS 2025
摘要:Low-rank pseudoinverses are widely used to approximate matrix inverses in scalable machine learning, optimization, and scientific computing. However, real-world matrices are often observed with noise, arising from sampling, sketching, and quantization. The spectral-norm robustness of low-rank inverse approximations remains poorly understood. We systematically study the spectral-norm error $\| (\tilde{A}^{-1})_p - A_p^{-1} \|$ for an $n\times n$ symmetric matrix $A$, where $A_p^{-1}$ denotes the best rank-\(p\) approximation of $A^{-1}$, and $\tilde{A} = A + E$ is a noisy observation. Under mild assumptions on the noise, we derive sharp non-asymptotic perturbation bounds that reveal how the error scales with the eigengap, spectral decay, and noise alignment with low-curvature directions of $A$. Our analysis introduces a novel application of contour integral techniques to the \emph{non-entire} function $f(z) = 1/z$, yielding bounds that improve over naive adaptations of classical full-inverse bounds by up to a factor of $\sqrt{n}$. Empirically, our bounds closely track the true perturbation error across a variety of real-world and synthetic matrices, while estimates based on classical results tend to significantly overpredict. These findings offer practical, spectrum-aware guarantees for low-rank inverse approximations in noisy computational environments.
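The quantity studied here can likewise be checked numerically: the sketch below compares the best rank-$p$ part of the inverse computed from a noisy symmetric matrix against that of the clean inverse; sizes and noise level are illustrative assumptions.

```python
# Numerical sketch of ||(A~^-1)_p - (A^-1)_p||_2 for a symmetric PD matrix A.
import numpy as np

def rank_p_of_inverse(M, p):
    """Best rank-p approximation of M^{-1}: keep the p smallest-|eigenvalue| modes of M."""
    w, V = np.linalg.eigh(M)
    idx = np.argsort(np.abs(w))[:p]               # smallest |eigenvalues| of M dominate M^{-1}
    return (V[:, idx] * (1.0 / w[idx])) @ V[:, idx].T

rng = np.random.default_rng(1)
n, p = 100, 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                       # well-conditioned symmetric PD matrix
E = rng.standard_normal((n, n)) * 1e-2; E = (E + E.T) / 2

err = np.linalg.norm(rank_p_of_inverse(A + E, p) - rank_p_of_inverse(A, p), 2)
print(f"||(A~^-1)_p - (A^-1)_p||_2 = {err:.2e}")
```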


【8】A Framework for Bounding Deterministic Risk with PAC-Bayes: Applications to Majority Votes
标题:用PAC-Bayes界定确定性风险的框架:多数投票的应用
链接:https://arxiv.org/abs/2510.25569

作者:Benjamin Leblanc, Pascal Germain
摘要:PAC-Bayes is a popular and efficient framework for obtaining generalization guarantees in situations involving uncountable hypothesis spaces. Unfortunately, in its classical formulation, it only provides guarantees on the expected risk of a randomly sampled hypothesis. This requires stochastic predictions at test time, making PAC-Bayes unusable in many practical situations where a single deterministic hypothesis must be deployed. We propose a unified framework to extract guarantees holding for a single hypothesis from stochastic PAC-Bayesian guarantees. We present a general oracle bound and derive from it a numerical bound and a specialization to majority vote. We empirically show that our approach consistently outperforms popular baselines (by up to a factor of 2) when it comes to generalization bounds on deterministic classifiers.


【9】Scalable Utility-Aware Multiclass Calibration
标题:可扩展的效用感知多类校准
链接:https://arxiv.org/abs/2510.25458

作者:Mahmoud Hegazy, Michael I. Jordan, Aymeric Dieuleveut
摘要:Ensuring that classifiers are well-calibrated, i.e., their predictions align with observed frequencies, is a minimal and fundamental requirement for classifiers to be viewed as trustworthy. Existing methods for assessing multiclass calibration often focus on specific aspects associated with prediction (e.g., top-class confidence, class-wise calibration) or utilize computationally challenging variational formulations. In this work, we study scalable \emph{evaluation} of multiclass calibration. To this end, we propose utility calibration, a general framework that measures the calibration error relative to a specific utility function that encapsulates the goals or decision criteria relevant to the end user. We demonstrate how this framework can unify and re-interpret several existing calibration metrics, particularly allowing for more robust versions of the top-class and class-wise calibration metrics, and, going beyond such binarized approaches, toward assessing calibration for richer classes of downstream utilities.
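As a reference point for the existing metrics the framework re-interprets, here is a sketch of the standard binned top-class calibration error (ECE); the utility-weighted generalization proposed in the paper is not reproduced here.

```python
# Standard binned top-class expected calibration error (ECE).
import numpy as np

def top_class_ece(probs, labels, n_bins=15):
    conf = probs.max(axis=1)                       # top-class confidence
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

rng = np.random.default_rng(0)
logits = rng.standard_normal((1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)
print("top-class ECE:", round(top_class_ece(probs, labels), 4))
```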


【10】Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions
标题:代理式人工智能(Agentic AI):架构、应用与未来方向的全面综述
链接:https://arxiv.org/abs/2510.25445

作者:Mohamad Abou Ali, Fadi Dornaika
摘要 :Agentic AI represents a transformative shift in artificial intelligence, but its rapid advancement has led to a fragmented understanding, often conflating modern neural systems with outdated symbolic models -- a practice known as conceptual retrofitting. This survey cuts through this confusion by introducing a novel dual-paradigm framework that categorizes agentic systems into two distinct lineages: the Symbolic/Classical (relying on algorithmic planning and persistent state) and the Neural/Generative (leveraging stochastic generation and prompt-driven orchestration). Through a systematic PRISMA-based review of 90 studies (2018--2025), we provide a comprehensive analysis structured around this framework across three dimensions: (1) the theoretical foundations and architectural principles defining each paradigm; (2) domain-specific implementations in healthcare, finance, and robotics, demonstrating how application constraints dictate paradigm selection; and (3) paradigm-specific ethical and governance challenges, revealing divergent risks and mitigation strategies. Our analysis reveals that the choice of paradigm is strategic: symbolic systems dominate safety-critical domains (e.g., healthcare), while neural systems prevail in adaptive, data-rich environments (e.g., finance). Furthermore, we identify critical research gaps, including a significant deficit in governance models for symbolic systems and a pressing need for hybrid neuro-symbolic architectures. The findings culminate in a strategic roadmap arguing that the future of Agentic AI lies not in the dominance of one paradigm, but in their intentional integration to create systems that are both adaptable and reliable. This work provides the essential conceptual toolkit to guide future research, development, and policy toward robust and trustworthy hybrid intelligent systems.


【11】Position: Biology is the Challenge Physics-Informed ML Needs to Evolve
标题:立场:生物学是物理信息机器学习进化所需的挑战
链接:https://arxiv.org/abs/2510.25368

作者:Julien Martinelli
摘要:Physics-Informed Machine Learning (PIML) has successfully integrated mechanistic understanding into machine learning, particularly in domains governed by well-known physical laws. This success has motivated efforts to apply PIML to biology, a field rich in dynamical systems but shaped by different constraints. Biological modeling, however, presents unique challenges: multi-faceted and uncertain prior knowledge, heterogeneous and noisy data, partial observability, and complex, high-dimensional networks. In this position paper, we argue that these challenges should not be seen as obstacles to PIML, but as catalysts for its evolution. We propose Biology-Informed Machine Learning (BIML): a principled extension of PIML that retains its structural grounding while adapting to the practical realities of biology. Rather than replacing PIML, BIML retools its methods to operate under softer, probabilistic forms of prior knowledge. We outline four foundational pillars as a roadmap for this transition: uncertainty quantification, contextualization, constrained latent structure inference, and scalability. Foundation Models and Large Language Models will be key enablers, bridging human expertise with computational modeling. We conclude with concrete recommendations to build the BIML ecosystem and channel PIML-inspired innovation toward challenges of high scientific and societal relevance.


【12】CDFlow: Building Invertible Layers with Circulant and Diagonal Matrices
标题:CDFlow:使用循环矩阵和对角矩阵构建可逆层
链接:https://arxiv.org/abs/2510.25323

作者:Xuchen Feng, Siyu Liao
备注:Accepted at NeurIPS 2025. Camera-ready version. 10 pages, 12 figures, 2 tables
摘要:Normalizing flows are deep generative models that enable efficient likelihood estimation and sampling through invertible transformations. A key challenge is to design linear layers that enhance expressiveness while maintaining efficient computation of the Jacobian determinant and inverse. We introduce a novel invertible linear layer based on the product of circulant and diagonal matrices. This decomposition reduces parameter complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(mn)$ using $m$ diagonal matrices and $m-1$ circulant matrices while still approximating general linear transformations. By leveraging the Fast Fourier Transform, our approach reduces the time complexity of matrix inversion from $\mathcal{O}(n^3)$ to $\mathcal{O}(mn\log n)$ and that of computing the log-determinant from $\mathcal{O}(n^3)$ to $\mathcal{O}(mn)$, where $n$ is the input dimension. We build upon this layer to develop Circulant-Diagonal Flow (CDFlow), which achieves strong density estimation on natural image datasets and effectively models data with inherent periodic structure. Furthermore, CDFlow significantly accelerates key operations in normalizing flows, providing practical benefits for scalable generative modeling.
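The FFT identities behind the stated $\mathcal{O}(mn\log n)$ and $\mathcal{O}(mn)$ costs can be sketched directly: multiplication by a circulant matrix is a circular convolution, and its eigenvalues (hence log-determinant) come from the FFT of its first column. This is a minimal illustration of the building blocks, not the full CDFlow layer.

```python
# Circulant matvec and log|det| via the FFT, cross-checked against dense algebra.
import numpy as np

def circulant_matvec(c, x):
    """y = C x, where C is the circulant matrix whose first column is c."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_logabsdet(c):
    """log|det C|: the eigenvalues of a circulant matrix are fft(c)."""
    return np.sum(np.log(np.abs(np.fft.fft(c))))

n = 8
rng = np.random.default_rng(0)
c, d, x = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

y = d * circulant_matvec(c, x)                      # one diagonal-circulant product D C x
logdet = np.sum(np.log(np.abs(d))) + circulant_logabsdet(c)

C = np.array([np.roll(c, k) for k in range(n)]).T   # dense circulant for verification
assert np.allclose(y, d * (C @ x))
assert np.isclose(logdet, np.linalg.slogdet(np.diag(d) @ C)[1])
print("FFT and dense results agree; log|det D C| =", round(logdet, 4))
```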


【13】Scaling Up Bayesian DAG Sampling
标题:扩展贝叶斯DAG采样
链接:https://arxiv.org/abs/2510.25254

作者:Daniele Nikzad, Alexander Zhilkin, Juha Harviainen, Jack Kuipers, Giusi Moffa, Mikko Koivisto
摘要:Bayesian inference of Bayesian network structures is often performed by sampling directed acyclic graphs along an appropriately constructed Markov chain. We present two techniques to improve sampling. First, we give an efficient implementation of basic moves, which add, delete, or reverse a single arc. Second, we expedite summing over parent sets, an expensive task required for more sophisticated moves: we devise a preprocessing method to prune possible parent sets so as to approximately preserve the sums. Our empirical study shows that our techniques can yield substantial efficiency gains compared to previous methods.
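For readers unfamiliar with the "basic moves" mentioned above, the sketch below proposes a single add/delete/reverse arc move with an acyclicity check (networkx assumed); it only illustrates the move set, not the paper's efficient implementation or its parent-set pruning.

```python
# Illustrative add / delete / reverse single-arc moves on a DAG.
import random
import networkx as nx

def propose_basic_move(dag: nx.DiGraph):
    """Return a new DAG obtained by adding, deleting, or reversing one arc."""
    g = dag.copy()
    nodes = list(g.nodes)
    move = random.choice(["add", "delete", "reverse"])
    if move == "add":
        u, v = random.sample(nodes, 2)
        if not g.has_edge(u, v) and not nx.has_path(g, v, u):   # keep acyclic
            g.add_edge(u, v)
    elif move == "delete" and g.number_of_edges() > 0:
        g.remove_edge(*random.choice(list(g.edges)))
    elif move == "reverse" and g.number_of_edges() > 0:
        u, v = random.choice(list(g.edges))
        g.remove_edge(u, v)
        if not nx.has_path(g, u, v):                             # reversal stays acyclic
            g.add_edge(v, u)
        else:
            g.add_edge(u, v)                                     # reject: restore the arc
    return g

dag = nx.DiGraph([(0, 1), (1, 2), (0, 2)])
print(sorted(propose_basic_move(dag).edges))
```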


【14】Lipschitz-aware Linearity Grafting for Certified Robustness
标题:面向认证鲁棒性的Lipschitz感知线性移植
链接:https://arxiv.org/abs/2510.25130

作者:Yongjin Han, Suhyun Kim
摘要 :Lipschitz constant is a fundamental property in certified robustness, as smaller values imply robustness to adversarial examples when a model is confident in its prediction. However, identifying the worst-case adversarial examples is known to be an NP-complete problem. Although over-approximation methods have shown success in neural network verification to address this challenge, reducing approximation errors remains a significant obstacle. Furthermore, these approximation errors hinder the ability to obtain tight local Lipschitz constants, which are crucial for certified robustness. Originally, grafting linearity into non-linear activation functions was proposed to reduce the number of unstable neurons, enabling scalable and complete verification. However, no prior theoretical analysis has explained how linearity grafting improves certified robustness. We instead consider linearity grafting primarily as a means of eliminating approximation errors rather than reducing the number of unstable neurons, since linear functions do not require relaxation. In this paper, we provide two theoretical contributions: 1) why linearity grafting improves certified robustness through the lens of the $l_\infty$ local Lipschitz constant, and 2) grafting linearity into non-linear activation functions, the dominant source of approximation errors, yields a tighter local Lipschitz constant. Based on these theoretical contributions, we propose a Lipschitz-aware linearity grafting method that removes dominant approximation errors, which are crucial for tightening the local Lipschitz constant, thereby improving certified robustness, even without certified training. Our extensive experiments demonstrate that grafting linearity into these influential activations tightens the $l_\infty$ local Lipschitz constant and enhances certified robustness.


【15】The Neural Differential Manifold: An Architecture with Explicit Geometric Structure
标题:神经微分流形:具有显式几何结构的架构
链接:https://arxiv.org/abs/2510.25113

作者:Di Zhang
备注:9 pages
摘要:This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design. Departing from conventional Euclidean parameter spaces, the NDM re-conceptualizes a neural network as a differentiable manifold where each layer functions as a local coordinate chart, and the network parameters directly parameterize a Riemannian metric tensor at every point. The architecture is organized into three synergistic layers: a Coordinate Layer implementing smooth chart transitions via invertible transformations inspired by normalizing flows, a Geometric Layer that dynamically generates the manifold's metric through auxiliary sub-networks, and an Evolution Layer that optimizes both task performance and geometric simplicity through a dual-objective loss function. This geometric regularization penalizes excessive curvature and volume distortion, providing intrinsic regularization that enhances generalization and robustness. The framework enables natural gradient descent optimization aligned with the learned manifold geometry and offers unprecedented interpretability by endowing internal representations with clear geometric meaning. We analyze the theoretical advantages of this approach, including its potential for more efficient optimization, enhanced continual learning, and applications in scientific discovery and controllable generative modeling. While significant computational challenges remain, the Neural Differential Manifold represents a fundamental shift towards geometrically structured, interpretable, and efficient deep learning systems.


【16】Shift is Good: Mismatched Data Mixing Improves Test Performance
标题:分布偏移是好事:不匹配的数据混合提升测试性能
链接:https://arxiv.org/abs/2510.25108

作者:Marko Medvedev, Kaifeng Lyu, Zhiyuan Li, Nathan Srebro
摘要:We consider training and testing on mixture distributions with different training and test proportions. We show that in many settings, and in some sense generically, distribution shift can be beneficial, and test performance can improve due to mismatched training proportions, even if the components are unrelated and with no transfer between components. In a variety of scenarios, we identify the optimal training proportions and the extent to which such distribution shift can be beneficial. We show how the same analysis applies also to a compositional setting with differing distribution of component "skills" at training and test.


【17】Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games
标题:Monopoly Deal:有界单边响应博弈的基准环境
链接:https://arxiv.org/abs/2510.25080

作者:Will Wolf
备注:24 pages, 7 figures
摘要:Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. Typically, these games fall into three categories based on the flow of control: strictly-sequential (where players alternate single actions), deterministic-response (where some actions trigger a fixed outcome), and unbounded reciprocal-response (where alternating counterplays are permitted). A less-explored but strategically rich structure exists: the bounded one-sided response. This dynamic occurs when a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more sequential moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs).   We introduce a modified version of Monopoly Deal as a benchmark environment that specifically isolates the BORG dynamic, where a Rent action forces the opponent to sequentially choose payment assets. We demonstrate that the gold-standard algorithm, Counterfactual Regret Minimization (CFR), successfully converges on effective strategies for this domain without requiring novel algorithmic extensions. To support efficient, reproducible experimentation, we present a lightweight, full-stack research platform that unifies the environment, a parallelized CFR runtime, and a human-playable web interface, all runnable on a single workstation. This system provides a practical foundation for exploring state representation and policy learning in bounded one-sided response settings.   The trained CFR agent and source code are available at https://monopolydeal.ai.
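The per-infoset update at the heart of CFR is regret matching; the following toy sketch shows that update on a single information set with hypothetical action utilities, leaving out the full game-tree traversal used for the Monopoly Deal environment.

```python
# Regret matching: the core strategy-update rule inside CFR.
import numpy as np

def regret_matching(cumulative_regret):
    """Current strategy: positive regrets normalized; uniform if none are positive."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full_like(positive, 1.0 / len(positive))

rng = np.random.default_rng(0)
regret = np.zeros(3)          # toy infoset with 3 actions
strategy_sum = np.zeros(3)
for _ in range(1000):
    strategy = regret_matching(regret)
    util = rng.normal(loc=[0.2, 0.5, 0.1], scale=0.1)   # hypothetical action utilities
    regret += util - strategy @ util                     # counterfactual regret update
    strategy_sum += strategy
print("average strategy:", np.round(strategy_sum / strategy_sum.sum(), 3))
```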


【18】Automating Benchmark Design
标题:自动化基准设计
链接:https://arxiv.org/abs/2510.25039

作者:Amanda Dsouza, Harit Vishwakarma, Zhengyang Qi, Justin Bauer, Derek Pham, Thomas Walshe, Armin Parchami, Frederic Sala, Paroma Varma
摘要:The rapid progress and widespread deployment of LLMs and LLM-powered agents has outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing model capabilities, but these quickly become saturated. In contrast, dynamic benchmarks evolve alongside the models they evaluate, but are expensive to create and continuously update. To address these challenges, we develop BeTaL (Benchmark Tuning with an LLM-in-the-loop), a framework that leverages environment design principles to automate the process of dynamic benchmark design. BeTaL works by parameterizing key design choices in base benchmark templates and uses LLMs to reason through the resulting parameter space to obtain target properties (such as difficulty and realism) in a cost-efficient manner. We validate this approach on its ability to create benchmarks with desired difficulty levels. Using BeTaL, we create two new benchmarks and extend a popular agentic benchmark $\tau$-bench. Extensive evaluation on these three tasks and multiple target difficulty levels shows that BeTaL produces benchmarks much closer to the desired difficulty, with average deviations ranging from 5.3% to 13.2% -- a 2-4x improvement over the baselines.


【19】Towards Human-AI Synergy in Requirements Engineering: A Framework and Preliminary Study
标题:需求工程中的人机协同:框架和初步研究
链接:https://arxiv.org/abs/2510.25016

作者:Mateen Ahmed Abbasi, Petri Ihantola, Tommi Mikkonen, Niko Mäkitalo
备注:Accepted at the 2025 Sixth International Conference on Intelligent Data Science Technologies and Applications (IDSTA 2025), 8 pages, 4 figures. Published in IEEE
摘要:The future of Requirements Engineering (RE) is increasingly driven by artificial intelligence (AI), reshaping how we elicit, analyze, and validate requirements. Traditional RE is based on labor-intensive manual processes prone to errors and complexity. AI-powered approaches, specifically large language models (LLMs), natural language processing (NLP), and generative AI, offer transformative solutions and reduce inefficiencies. However, the use of AI in RE also brings challenges like algorithmic bias, lack of explainability, and ethical concerns related to automation. To address these issues, this study introduces the Human-AI RE Synergy Model (HARE-SM), a conceptual framework that integrates AI-driven analysis with human oversight to improve requirements elicitation, analysis, and validation. The model emphasizes ethical AI use through transparency, explainability, and bias mitigation. We outline a multi-phase research methodology focused on preparing RE datasets, fine-tuning AI models, and designing collaborative human-AI workflows. This preliminary study presents the conceptual framework and early-stage prototype implementation, establishing a research agenda and practical design direction for applying intelligent data science techniques to semi-structured and unstructured RE data in collaborative environments.


【20】Cyclic Counterfactuals under Shift-Scale Interventions
标题:移位尺度干预下的循环反事实
链接:https://arxiv.org/abs/2510.25005

作者:Saptarshi Saha, Dhruv Vansraj Rathore, Utpal Garain
备注:Accepted at NeurIPS 2025
摘要:Most counterfactual inference frameworks traditionally assume acyclic structural causal models (SCMs), i.e. directed acyclic graphs (DAGs). However, many real-world systems (e.g. biological systems) contain feedback loops or cyclic dependencies that violate acyclicity. In this work, we study counterfactual inference in cyclic SCMs under shift-scale interventions, i.e., soft, policy-style changes that rescale and/or shift a variable's mechanism.
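One way to write the shift-scale intervention described in the abstract is given below; the notation ($f_i$, $a$, $b$) is assumed here for illustration and is not taken from the paper.

```latex
% A variable's structural mechanism is rescaled by a and shifted by b.
\[
  X_i := f_i\bigl(\mathrm{pa}(X_i),\, U_i\bigr)
  \;\;\longrightarrow\;\;
  X_i := a \, f_i\bigl(\mathrm{pa}(X_i),\, U_i\bigr) + b ,
  \qquad a, b \in \mathbb{R},\; a \neq 0 .
\]
```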


【21】What Really Matters in Matrix-Whitening Optimizers?
标题:矩阵白化优化器中真正重要的是什么?
链接:https://arxiv.org/abs/2510.25000

作者:Kevin Frans, Pieter Abbeel, Sergey Levine
摘要:A range of recent optimizers have emerged that approximate the same "matrix-whitening" transformation in various ways. In this work, we systematically deconstruct such optimizers, aiming to disentangle the key components that explain performance. With hyperparameters tuned across the board, all flavors of matrix-whitening methods reliably outperform elementwise counterparts, such as Adam. Matrix-whitening is often related to spectral descent -- however, experiments reveal that performance gains are *not explained solely by accurate spectral normalization* -- particularly, SOAP displays the largest per-step gain, even though Muon more accurately descends along the steepest spectral descent direction. Instead, we argue that matrix-whitening serves two purposes, and the variance adaptation component of matrix-whitening is the overlooked ingredient explaining this performance gap. Experiments show that variance-adapted versions of optimizers consistently outperform their sign-descent counterparts, including an adaptive version of Muon. We further ablate variance adaptation strategies, finding that while lookahead style approximations are not as effective, low-rank variance estimators can effectively reduce memory costs without a performance loss.
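The two ingredients contrasted above can be sketched for a single gradient matrix: matrix whitening maps $G$ to $G(G^\top G)^{-1/2} = UV^\top$ (all singular values set to one), while variance adaptation rescales elementwise by a running second moment. This is a conceptual illustration, not any specific optimizer's update rule.

```python
# Whitened (spectrally normalized) update vs. Adam-style variance adaptation.
import numpy as np

def whitened_update(G):
    """G (G^T G)^{-1/2} = U V^T from the SVD G = U S V^T."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def variance_adapted_update(G, v, beta2=0.999, eps=1e-8):
    """Elementwise second-moment scaling, as in Adam."""
    v = beta2 * v + (1 - beta2) * G ** 2
    return G / (np.sqrt(v) + eps), v

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))
v = np.zeros_like(G)
W = whitened_update(G)
A, v = variance_adapted_update(G, v)
print("singular values of whitened update ~ 1:", np.round(np.linalg.svd(W)[1][:3], 3))
```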


【22】LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
标题:LRT-Diffusion:面向扩散策略的校准风险感知引导
链接:https://arxiv.org/abs/2510.24983

作者:Ximan Sun, Xiang Cheng
摘要:Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance-especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.


【23】Strategic inputs: feature selection from game-theoretic perspective
标题:策略性输入:博弈论视角下的特征选择
链接:https://arxiv.org/abs/2510.24982

作者:Chi Zhao, Jing Liu, Elena Parilina
摘要 :The exponential growth of data volumes has led to escalating computational costs in machine learning model training. However, many features fail to contribute positively to model performance while consuming substantial computational resources. This paper presents an end-to-end feature selection framework for tabular data based on game theory. We formulate feature selection procedure based on a cooperative game where features are modeled as players, and their importance is determined through the evaluation of synergistic interactions and marginal contributions. The proposed framework comprises four core components: sample selection, game-theoretic feature importance evaluation, redundant feature elimination, and optimized model training. Experimental results demonstrate that the proposed method achieves substantial computation reduction while preserving predictive performance, thereby offering an efficient solution of the computational challenges of large-scale machine learning. The source code is available at https://github.com/vectorsss/strategy_inputs.


【24】Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought
标题:顿悟时刻会是假的吗?识别思维链中的真实与装饰性思考步骤
链接:https://arxiv.org/abs/2510.24941

作者:Jiachen Zhao, Yiyou Sun, Weiyan Shi, Dawn Song
摘要:Recent large language models (LLMs) can generate long Chain-of-Thought (CoT) at test time, enabling them to solve complex tasks. These reasoning steps in CoT are often assumed as a faithful reflection of the model's internal thinking process, and used to monitor unsafe intentions. However, we find many reasoning steps don't truly contribute to LLMs' prediction. We measure the step-wise causal influence of each reasoning step on the model's final prediction with a proposed True Thinking Score (TTS). We reveal that LLMs often interleave between true-thinking steps (which are genuinely used to produce the final output) and decorative-thinking steps (which only give the appearance of reasoning but have minimal causal impact). Notably, only a small subset of the total reasoning steps have a high TTS that causally drive the model's prediction: e.g., for the AIME dataset, only an average of 2.3% of reasoning steps in CoT have a TTS >= 0.7 (range: 0-1) under the Qwen-2.5 model. Furthermore, we identify a TrueThinking direction in the latent space of LLMs. By steering along or against this direction, we can force the model to perform or disregard certain CoT steps when computing the final result. Finally, we highlight that self-verification steps in CoT (i.e., aha moments) can also be decorative, where LLMs do not truly verify their solution. Steering along the TrueThinking direction can force internal reasoning over these steps, resulting in a change in the final results. Overall, our work reveals that LLMs often verbalize reasoning steps without actually performing them internally, which undermines both the efficiency of LLM reasoning and the trustworthiness of CoT.


【25】Idea2Plan: Exploring AI-Powered Research Planning
标题:Idea2Plan:探索人工智能驱动的研究规划
链接:https://arxiv.org/abs/2510.24891

作者:Jin Huang, Silviu Cucerzan, Sujay Kumar Jauhar, Ryen W. White
摘要:Large language models (LLMs) have demonstrated significant potential to accelerate scientific discovery as valuable tools for analyzing data, generating hypotheses, and supporting innovative approaches in various scientific fields. In this work, we investigate how LLMs can handle the transition from conceptual research ideas to well-structured research plans. Effective research planning not only supports scientists in advancing their research but also represents a crucial capability for the development of autonomous research agents. Despite its importance, the field lacks a systematic understanding of LLMs' research planning capability. To rigorously measure this capability, we introduce the Idea2Plan task and Idea2Plan Bench, a benchmark built from 200 ICML 2025 Spotlight and Oral papers released after major LLM training cutoffs. Each benchmark instance includes a research idea and a grading rubric capturing the key components of valid plans. We further propose Idea2Plan JudgeEval, a complementary benchmark to assess the reliability of LLM-based judges against expert annotations. Experimental results show that GPT-5 and GPT-5-mini achieve the strongest performance on the benchmark, though substantial headroom remains for future improvement. Our study provides new insights into LLMs' capability for research planning and lay the groundwork for future progress.


【26】Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations
标题:聚合掩盖了由虚假相关性导致的分布外泛化失败
链接:https://arxiv.org/abs/2510.24884

作者:Olawale Salaudeen, Haoran Zhang, Kumail Alhamoud, Sara Beery, Marzyeh Ghassemi
备注:Accepted as a Spotlight paper at NeurIPS 2025
摘要:Benchmarks for out-of-distribution (OOD) generalization frequently show a strong positive correlation between in-distribution (ID) and OOD accuracy across models, termed "accuracy-on-the-line." This pattern is often taken to imply that spurious correlations - correlations that improve ID but reduce OOD performance - are rare in practice. We find that this positive correlation is often an artifact of aggregating heterogeneous OOD examples. Using a simple gradient-based method, OODSelect, we identify semantically coherent OOD subsets where accuracy on the line does not hold. Across widely used distribution shift benchmarks, the OODSelect uncovers subsets, sometimes over half of the standard OOD set, where higher ID accuracy predicts lower OOD accuracy. Our findings indicate that aggregate metrics can obscure important failure modes of OOD robustness. We release code and the identified subsets to facilitate further research.


【27】AmarDoctor: An AI-Driven, Multilingual, Voice-Interactive Digital Health Application for Primary Care Triage and Patient Management to Bridge the Digital Health Divide for Bengali Speakers
标题:AmarDoctor:一款人工智能驱动、多语言、语音交互式数字健康应用程序,用于初级保健分诊和患者管理,弥合孟加拉语使用者的数字健康鸿沟
链接:https://arxiv.org/abs/2510.24724

作者:Nazmun Nahar, Ritesh Harshad Ruparel, Shariar Kabir, Sumaiya Tasnia Khan, Shyamasree Saha, Mamunur Rashid
摘要 :This study presents AmarDoctor, a multilingual voice-interactive digital health app designed to provide comprehensive patient triage and AI-driven clinical decision support for Bengali speakers, a population largely underserved in access to digital healthcare. AmarDoctor adopts a data-driven approach to strengthen primary care delivery and enable personalized health management. While platforms such as AdaHealth, WebMD, Symptomate, and K-Health have become popular in recent years, they mainly serve European demographics and languages. AmarDoctor addresses this gap with a dual-interface system for both patients and healthcare providers, supporting three major Bengali dialects. At its core, the patient module uses an adaptive questioning algorithm to assess symptoms and guide users toward the appropriate specialist. To overcome digital literacy barriers, it integrates a voice-interactive AI assistant that navigates users through the app services. Complementing this, the clinician-facing interface incorporates AI-powered decision support that enhances workflow efficiency by generating structured provisional diagnoses and treatment recommendations. These outputs inform key services such as e-prescriptions, video consultations, and medical record management. To validate clinical accuracy, the system was evaluated against a gold-standard set of 185 clinical vignettes developed by experienced physicians. Effectiveness was further assessed by comparing AmarDoctor performance with five independent physicians using the same vignette set. Results showed AmarDoctor achieved a top-1 diagnostic precision of 81.08 percent (versus physicians average of 50.27 percent) and a top specialty recommendation precision of 91.35 percent (versus physicians average of 62.6 percent).


【28】Scaling flow-based approaches for topology sampling in $\mathrm{SU}(3)$ gauge theory
标题:在$\mathrm{SU}(3)$规范理论中扩展基于流的拓扑采样方法
链接:https://arxiv.org/abs/2510.25704

作者:Claudio Bonanno, Andrea Bulgarelli, Elia Cellini, Alessandro Nada, Dario Panfalone, Davide Vadacchino, Lorenzo Verzichelli
备注:1+39 pages, 14 figures
摘要:We develop a methodology based on out-of-equilibrium simulations to mitigate topological freezing when approaching the continuum limit of lattice gauge theories. We reduce the autocorrelation of the topological charge employing open boundary conditions, while removing exactly their unphysical effects using a non-equilibrium Monte Carlo approach in which periodic boundary conditions are gradually switched on. We perform a detailed analysis of the computational costs of this strategy in the case of the four-dimensional $\mathrm{SU}(3)$ Yang-Mills theory. After achieving full control of the scaling, we outline a clear strategy to sample topology efficiently in the continuum limit, which we check at lattice spacings as small as $0.045$ fm. We also generalize this approach by designing a customized Stochastic Normalizing Flow for evolutions in the boundary conditions, obtaining superior performances with respect to the purely stochastic non-equilibrium approach, and paving the way for more efficient future flow-based solutions.


【29】PyDPF: A Python Package for Differentiable Particle Filtering
标题:PyDPF:用于可微粒子滤波的Python包
链接:https://arxiv.org/abs/2510.25693

作者:John-Joseph Brady, Benjamin Cox, Víctor Elvira, Yunpeng Li
备注:42 pages, 0 figures, under review at the Journal of Statistical Software, the python package can be found at this https URL , the full documentation at this https URL , and the source code including experiments and replication material at this https URL
摘要:State-space models (SSMs) are a widely used tool in time series analysis. In the complex systems that arise from real-world data, it is common to employ particle filtering (PF), an efficient Monte Carlo method for estimating the hidden state corresponding to a sequence of observations. Applying particle filtering requires specifying both the parametric form and the parameters of the system, which are often unknown and must be estimated. Gradient-based optimisation techniques cannot be applied directly to standard particle filters, as the filters themselves are not differentiable. However, several recently proposed methods modify the resampling step to make particle filtering differentiable. In this paper, we present an implementation of several such differentiable particle filters (DPFs) with a unified API built on the popular PyTorch framework. Our implementation makes these algorithms easily accessible to a broader research community and facilitates straightforward comparison between them. We validate our framework by reproducing experiments from several existing studies and demonstrate how DPFs can be applied to address several common challenges with state space modelling.
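One representative way the resampling step can be made differentiable is "soft resampling": sample indices from a mixture of the particle weights and a uniform distribution, then apply an importance correction. The sketch below is a generic PyTorch illustration of that idea, not the PyDPF API.

```python
# Soft resampling: a differentiable-resampling scheme for particle filters.
import torch

def soft_resample(particles, log_weights, alpha=0.5):
    """Sample from a weight/uniform mixture, then reweight by w_i / q_i.

    The correction keeps the estimator unbiased while the new weights stay
    differentiable with respect to the original log_weights.
    """
    n = particles.shape[0]
    w = torch.softmax(log_weights, dim=0)
    q = alpha * w + (1 - alpha) / n                 # proposal mixture
    idx = torch.multinomial(q, n, replacement=True)
    new_particles = particles[idx]
    new_weights = w[idx] / q[idx]                   # differentiable correction
    return new_particles, torch.log(new_weights / new_weights.sum())

particles = torch.randn(100, 2)
log_w = torch.randn(100, requires_grad=True)
p, lw = soft_resample(particles, log_w)
lw.sum().backward()                                 # gradients flow back to log_w
print(p.shape, log_w.grad is not None)
```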


【30】PitchFlower: A flow-based neural audio codec with pitch controllability
标题:PitchFlower:具有音调可控性的基于流的神经音频编解码器
链接:https://arxiv.org/abs/2510.25566

作者:Diego Torres, Axel Roebel, Nicolas Obin
备注:5 pages, 5 figures
摘要:We present PitchFlower, a flow-based neural audio codec with explicit pitch controllability. Our approach enforces disentanglement through a simple perturbation: during training, F0 contours are flattened and randomly shifted, while the true F0 is provided as conditioning. A vector-quantization bottleneck prevents pitch recovery, and a flow-based decoder generates high quality audio. Experiments show that PitchFlower achieves more accurate pitch control than WORLD at much higher audio quality, and outperforms SiFiGAN in controllability while maintaining comparable quality. Beyond pitch, this framework provides a simple and extensible path toward disentangling other speech attributes.


【31】Robust variable selection for spatial point processes observed with noise
标题:带噪声观测的空间点过程的稳健变量选择
链接:https://arxiv.org/abs/2510.25550

作者:Dominik Sturm, Ivo F. Sbalzarini
摘要 :We propose a method for variable selection in the intensity function of spatial point processes that combines sparsity-promoting estimation with noise-robust model selection. As high-resolution spatial data becomes increasingly available through remote sensing and automated image analysis, identifying spatial covariates that influence the localization of events is crucial to understand the underlying mechanism. However, results from automated acquisition techniques are often noisy, for example due to measurement uncertainties or detection errors, which leads to spurious displacements and missed events. We study the impact of such noise on sparse point-process estimation across different models, including Poisson and Thomas processes. To improve noise robustness, we propose to use stability selection based on point-process subsampling and to incorporate a non-convex best-subset penalty to enhance model-selection performance. In extensive simulations, we demonstrate that such an approach reliably recovers true covariates under diverse noise scenarios and improves both selection accuracy and stability. We then apply the proposed method to a forestry data set, analyzing the distribution of trees in relation to elevation and soil nutrients in a tropical rain forest. This shows the practical utility of the method, which provides a systematic framework for robust variable selection in spatial point-process models under noise, without requiring additional knowledge of the process.


【32】Sustainable NARMA-10 Benchmarking for Quantum Reservoir Computing
标题:面向量子储备池计算的可持续NARMA-10基准测试
链接:https://arxiv.org/abs/2510.25183

作者:Avyay Kodali, Priyanshi Singh, Pranay Pandey, Krishna Bhatia, Shalini Devendrababu, Srinjoy Ganguly
备注:6 pages, 1 table, 2 figures. Work conducted under QIntern 2025 (QWorld) with support from Fractal AI Research
摘要:This study compares Quantum Reservoir Computing (QRC) with classical models such as Echo State Networks (ESNs) and Long Short-Term Memory networks (LSTMs), as well as hybrid quantum-classical architectures (QLSTM), for the nonlinear autoregressive moving average task (NARMA-10). We evaluate forecasting accuracy (NRMSE), computational cost, and evaluation time. Results show that QRC achieves competitive accuracy while offering potential sustainability advantages, particularly in resource-constrained settings, highlighting its promise for sustainable time-series AI applications.
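For reference, the NARMA-10 target series follows a standard nonlinear recurrence driven by a random input $u_t \sim \mathrm{Uniform}[0, 0.5]$; the sketch below generates it with the commonly used coefficients (the paper's exact data-generation settings are not specified in the abstract).

```python
# Standard NARMA-10 benchmark recurrence.
import numpy as np

def narma10(T, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])   # sum over the last 10 outputs
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y

u, y = narma10(2000)
print("last targets:", np.round(y[-5:], 4))
```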


【33】Conditional neural field for spatial dimension reduction of turbulence data: a comparison study
标题:用于湍流数据空间降维的条件神经场:一项比较研究
链接:https://arxiv.org/abs/2510.25135

作者:Junyi Guo, Pan Du, Xiantao Fan, Yahui Li, Jian-Xun Wang
摘要:We investigate conditional neural fields (CNFs), mesh-agnostic, coordinate-based decoders conditioned on a low-dimensional latent, for spatial dimensionality reduction of turbulent flows. CNFs are benchmarked against Proper Orthogonal Decomposition and a convolutional autoencoder within a unified encoding-decoding framework and a common evaluation protocol that explicitly separates in-range (interpolative) from out-of-range (strict extrapolative) testing beyond the training horizon, with identical preprocessing, metrics, and fixed splits across all baselines. We examine three conditioning mechanisms: (i) activation-only modulation (often termed FiLM), (ii) low-rank weight and bias modulation (termed FP), and (iii) last-layer inner-product coupling, and introduce a novel domain-decomposed CNF that localizes complexities. Across representative turbulence datasets (WMLES channel inflow, DNS channel inflow, and wall pressure fluctuations over turbulent boundary layers), CNF-FP achieves the lowest training and in-range testing errors, while CNF-FiLM generalizes best for out-of-range scenarios once moderate latent capacity is available. Domain decomposition significantly improves out-of-range accuracy, especially for the more demanding datasets. The study provides a rigorous, physics-aware basis for selecting conditioning, capacity, and domain decomposition when using CNFs for turbulence compression and reconstruction.
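The activation-only (FiLM-style) conditioning mentioned above can be sketched as a latent code producing per-feature scale and shift for a coordinate-based decoder layer; the layer sizes and the sine activation are illustrative choices, not the paper's architecture.

```python
# FiLM-style conditioning of a coordinate-based decoder layer.
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    def __init__(self, in_dim, hidden_dim, latent_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, hidden_dim)
        self.film = nn.Linear(latent_dim, 2 * hidden_dim)   # produces [gamma, beta]

    def forward(self, x, z):
        gamma, beta = self.film(z).chunk(2, dim=-1)
        return torch.sin(gamma * self.linear(x) + beta)      # modulated activation

layer = FiLMLayer(in_dim=3, hidden_dim=64, latent_dim=16)
coords = torch.rand(1024, 3)                                  # (x, y, t) query points
z = torch.randn(1, 16).expand(1024, 16)                       # one latent per snapshot
print(layer(coords, z).shape)                                  # torch.Size([1024, 64])
```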


【34】scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
标题:scMRDR:用于不配对单细胞多组学数据集成的可扩展、灵活的框架
链接:https://arxiv.org/abs/2510.24987

作者:Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye
备注:Accepted at NeurIPS 2025 (Spotlight)
摘要:Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, while integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on pair information or prior correspondences, or require computing a global pairwise coupling matrix, limiting their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representations into modality-shared and modality-specific components using a well-designed $\beta$-VAE architecture, which are augmented with isometric regularization to preserve intra-omics biological heterogeneity, adversarial objective to encourage cross-modal alignment, and masked reconstruction loss strategy to address the issue of missing features across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large-level datasets and supports integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.


【35】Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm
标题:基于Hoeffding函数分解与TreeHFD算法的树集成可解释性
链接:https://arxiv.org/abs/2510.24815

作者:Clément Bénard
摘要:Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA functional decomposition is a powerful explainability method, as it breaks down black-box models into a unique sum of lower-dimensional functions, provided that input variables are independent. In standard learning settings, input variables are often dependent, and the Hoeffding decomposition is generalized through hierarchical orthogonality constraints. Such generalization leads to unique and sparse decompositions with well-defined main effects and interactions. However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. We show the convergence of TreeHFD, along with the main properties of orthogonality, sparsity, and causal variable selection. The high performance of TreeHFD is demonstrated through experiments on both simulated and real data, using our treehfd Python package (https://github.com/ThalesGroup/treehfd). Besides, we empirically show that the widely used TreeSHAP method, based on Shapley values, is strongly connected to the Hoeffding decomposition.
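For readers unfamiliar with it, the Hoeffding (ANOVA) functional decomposition referred to above writes a model as a sum of a constant, main effects, and interactions over variable subsets; this is the standard textbook form, stated under independence of the inputs.

```latex
\[
  f(\mathbf{x}) \;=\; \sum_{S \subseteq \{1,\dots,d\}} f_S(\mathbf{x}_S)
  \;=\; f_\emptyset \;+\; \sum_{j} f_j(x_j) \;+\; \sum_{j < k} f_{jk}(x_j, x_k) \;+\; \cdots
\]
```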


【36】Spectral functions in Minkowski quantum electrodynamics from neural reconstruction: Benchmarking against dispersive Dyson--Schwinger integral equations
标题:基于神经重建的Minkowski量子电动力学谱函数:与色散Dyson--Schwinger积分方程的基准对比
链接:https://arxiv.org/abs/2510.24728

作者:Rodrigo Carmo Terin
备注:9 pages, 2 figures
摘要:A Minkowskian physics-informed neural network approach (M--PINN) is formulated to solve the Dyson--Schwinger integral equations (DSE) of quantum electrodynamics (QED) directly in Minkowski spacetime. Our novel strategy merges two complementary approaches: (i) a dispersive solver based on Lehmann representations and subtracted dispersion relations, and (ii) a M--PINN that learns the fermion mass function $B(p^2)$, under the same truncation and renormalization configuration (quenched, rainbow, Landau gauge) with the loss integrating the DSE residual with multi--scale regularization, and monotonicity/smoothing penalties in the spacelike branch in the same way as in our previous work in Euclidean space. The benchmarks show quantitative agreement from the infrared (IR) to the ultraviolet (UV) scales in both on-shell and momentum-subtraction schemes. In this controlled setting, our M--PINN reproduces the dispersive solution whilst remaining computationally compact and differentiable, paving the way for extensions with realistic vertices, unquenching effects, and uncertainty-aware variants.


机器翻译由腾讯交互翻译提供,仅供参考

