
Machine Learning Academic Digest [9.22]




cs.LG: 155 papers today


Large models (18 papers)

【1】Inverting Trojans in LLMs
Link: https://arxiv.org/abs/2509.16203

Authors: Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
Abstract: While effective backdoor detection and inversion schemes have been developed for AIs used e.g. for images, there are challenges in "porting" these methods to LLMs. First, the LLM input space is discrete, which precludes gradient-based search over this space, central to many backdoor inversion methods. Second, there are ~30,000^k k-tuples to consider, k the token-length of a putative trigger. Third, for LLMs there is the need to blacklist tokens that have strong marginal associations with the putative target response (class) of an attack, as such tokens give false detection signals. However, good blacklists may not exist for some domains. We propose an LLM trigger inversion approach with three key components: i) discrete search, with putative triggers greedily accreted, starting from a select list of singletons; ii) implicit blacklisting, achieved by evaluating the average cosine similarity, in activation space, between a candidate trigger and a small clean set of samples from the putative target class; iii) detection when a candidate trigger elicits high misclassifications, and with unusually high decision confidence. Unlike many recent works, we demonstrate that our approach reliably detects and successfully inverts ground-truth backdoor trigger phrases.
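
As a concrete illustration of component (ii), here is a minimal NumPy sketch of the implicit-blacklist score, assuming activation vectors have already been extracted by some hook (the extraction and the greedy discrete search are omitted):

import numpy as np

def implicit_blacklist_score(trigger_act, clean_acts):
    # trigger_act: (d,) activation for the candidate trigger
    # clean_acts: (n, d) activations of clean samples from the putative target class
    t = trigger_act / np.linalg.norm(trigger_act)
    C = clean_acts / np.linalg.norm(clean_acts, axis=1, keepdims=True)
    # Average cosine similarity in activation space; a high value means the
    # candidate merely mimics the target class's marginal features.
    return float(np.mean(C @ t))

A candidate scoring high here resembles ordinary target-class samples and is discounted, which is what makes an explicit token blacklist unnecessary.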


【2】Randomized Smoothing Meets Vision-Language Models
Link: https://arxiv.org/abs/2509.16088

Authors: Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng
Notes: EMNLP'25 full version, including appendix (proofs, additional experiments)
Abstract: Randomized smoothing (RS) is one of the prominent techniques to ensure the correctness of machine learning models, where point-wise robustness certificates can be derived analytically. While RS is well understood for classification, its application to generative models is unclear, since their outputs are sequences rather than labels. We resolve this by connecting generative outputs to an oracle classification task and showing that RS can still be enabled: the final response can be classified as a discrete action (e.g., service-robot commands in VLAs), as harmful vs. harmless (content moderation or toxicity detection in VLMs), or even applying oracles to cluster answers into semantically equivalent ones. Provided that the error rate for the oracle classifier comparison is bounded, we develop the theory that associates the number of samples with the corresponding robustness radius. We further derive improved scaling laws analytically relating the certified radius and accuracy to the number of samples, showing that the earlier result of 2 to 3 orders of magnitude fewer samples sufficing with minimal loss remains valid even under weaker assumptions. Together, these advances make robustness certification both well-defined and computationally feasible for state-of-the-art VLMs, as validated against recent jailbreak-style adversarial attacks.
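
The sample-to-radius link builds on the standard randomized-smoothing certification recipe (Cohen et al.); below is a sketch under the assumption that an oracle classifier has already mapped each noisy generation to a discrete label (the paper's improved scaling laws are not reproduced here):

import numpy as np
from scipy.stats import beta, norm

def certified_radius(labels, sigma, alpha=0.001):
    # labels: oracle-assigned class for the response of each noise sample
    labels = np.asarray(labels)
    vals, counts = np.unique(labels, return_counts=True)
    top, n_top = vals[np.argmax(counts)], int(counts.max())
    n = len(labels)
    # Clopper-Pearson lower confidence bound on the top class probability
    p_lower = beta.ppf(alpha, n_top, n - n_top + 1)
    if p_lower <= 0.5:
        return top, 0.0  # abstain: no certificate at this sample budget
    return top, sigma * norm.ppf(p_lower)  # certified L2 radius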


【3】BEFT: Bias-Efficient Fine-Tuning of Language Models
Link: https://arxiv.org/abs/2509.15974

Authors: Huang, Ananth Balashankar, Amir Aminifar
Abstract: Fine-tuning all bias terms stands out among various parameter-efficient fine-tuning (PEFT) techniques, owing to its out-of-the-box usability and competitive performance, especially in low-data regimes. Bias-only fine-tuning has the potential for unprecedented parameter efficiency. However, the link between fine-tuning different bias terms (i.e., bias terms in the query, key, or value projections) and downstream performance remains unclear. The existing approaches, e.g., based on the magnitude of bias change or empirical Fisher information, provide limited guidance for selecting the particular bias term for effective fine-tuning. In this paper, we propose an approach for selecting the bias term to be fine-tuned, forming the foundation of our bias-efficient fine-tuning (BEFT). We extensively evaluate our bias-efficient approach against other bias-selection approaches, across a wide range of large language models (LLMs) spanning encoder-only and decoder-only architectures from 110M to 6.7B parameters. Our results demonstrate the effectiveness and superiority of our bias-efficient approach on diverse downstream tasks, including classification, multiple-choice, and generation tasks.
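
For context, bias-only fine-tuning itself is simple to set up; here is a PyTorch sketch in which the module-name filter ("v_proj") is a hypothetical naming convention, since choosing which bias terms to tune is precisely BEFT's contribution:

import torch.nn as nn

def enable_bias_only(model: nn.Module, targets=("v_proj",)):
    # Freeze everything, then unfreeze only the bias vectors inside the
    # chosen projections (e.g., query/key/value).
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") and any(t in name for t in targets)
    return [p for p in model.parameters() if p.requires_grad]

The returned parameter list can be handed directly to the optimizer, so only the selected bias terms are updated.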


【4】Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment
Link: https://arxiv.org/abs/2509.15926

Authors: im, Qiao Wang (Judy), Zheng Yuan
Notes: Accepted at EMNLP 2025 (Main Conference). Camera-ready version
Abstract: Automated Essay Scoring (AES) systems now reach near human agreement on some public benchmarks, yet real-world adoption, especially in high-stakes examinations, remains limited. A principal obstacle is that most models output a single score without any accompanying measure of confidence or explanation. We address this gap with conformal prediction, a distribution-free wrapper that equips any classifier with set-valued outputs and formal coverage guarantees. Two open-source large language models (Llama-3 8B and Qwen-2.5 3B) are fine-tuned on three diverse corpora (ASAP, TOEFL11, Cambridge-FCE) and calibrated at a 90 percent risk level. Reliability is assessed with UAcc, an uncertainty-aware accuracy that rewards models for being both correct and concise. To our knowledge, this is the first work to combine conformal prediction and UAcc for essay scoring. The calibrated models consistently meet the coverage target while keeping prediction sets compact, indicating that open-source, mid-sized LLMs can already support teacher-in-the-loop AES; we discuss scaling and broader user studies as future work.
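
A minimal split-conformal sketch of the set-valued step, treating essay scores as classes; the probability interface and the 10% risk level mirror the setup above, but the function names are illustrative:

import numpy as np

def conformal_qhat(cal_probs, cal_labels, alpha=0.10):
    # Nonconformity = 1 - probability assigned to the true score class.
    s = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(s)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(s, level, method="higher")

def prediction_set(probs, qhat):
    # All score classes whose nonconformity is below the threshold;
    # coverage >= 1 - alpha holds by the split-conformal guarantee.
    return np.where(1.0 - probs <= qhat)[0]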


【5】EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs
Link: https://arxiv.org/abs/2509.15735

Authors: tori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi
Notes: 5 pages, submitted to ICASSP 2026, September 2025
Abstract: Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
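
The covariance-spectrum statistics are ordinary linear algebra; here is a sketch for one window of hidden states, using the uniform spectrum as a stand-in for the paper's random baseline in the KL term:

import numpy as np

def spectral_features(H):
    # H: (tokens, d) hidden activations from one layer/time window
    Hc = H - H.mean(axis=0, keepdims=True)
    eig = np.linalg.eigvalsh(Hc.T @ Hc / max(len(H) - 1, 1))[::-1]
    p = np.maximum(eig, 0.0)
    p = p / max(p.sum(), 1e-12)                       # normalized spectrum
    entropy = float(-(p * np.log(p + 1e-12)).sum())   # spectral entropy
    gap = float(eig[0] - eig[1])                      # top eigenvalue gap
    kl_uniform = float((p * np.log((p + 1e-12) * len(p))).sum())
    return entropy, gap, kl_uniform  # streamed into the recurrent classifier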


【6】ORIC: Benchmarking Object Recognition in Incongruous Context for Large Vision-Language Models
Link: https://arxiv.org/abs/2509.15695

Authors: Li, Zhan Ling, Yuchen Zhou, Hao Su
Abstract: Large Vision-Language Models (LVLMs) have made significant strides in image captioning, visual question answering, and robotics by integrating visual and textual information. However, they remain prone to errors in incongruous contexts, where objects appear unexpectedly or are absent when contextually expected. This leads to two key recognition failures: object misidentification and hallucination. To systematically examine this issue, we introduce the Object Recognition in Incongruous Context Benchmark (ORIC), a novel benchmark that evaluates LVLMs in scenarios where object-context relationships deviate from expectations. ORIC employs two key strategies: (1) LLM-guided sampling, which identifies objects that are present but contextually incongruous, and (2) CLIP-guided sampling, which detects plausible yet nonexistent objects that are likely to be hallucinated, thereby creating an incongruous context. Evaluating 18 LVLMs and two open-vocabulary detection models, our results reveal significant recognition gaps, underscoring the challenges posed by contextual incongruity. This work provides critical insights into LVLMs' limitations and encourages further research on context-aware object recognition.


【7】Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Link: https://arxiv.org/abs/2509.15631

Authors: mashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
Abstract: As large language models (LLMs) are increasingly deployed across various applications, privacy and copyright concerns have heightened the need for more effective LLM unlearning techniques. Many existing unlearning methods aim to suppress undesirable outputs through additional training (e.g., gradient ascent), which reduces the probability of generating such outputs. While such suppression-based approaches can control model outputs, they may not eliminate the underlying knowledge embedded in the model's internal activations; muting a response is not the same as forgetting it. Moreover, such suppression-based methods often suffer from model collapse. To address these issues, we propose a novel unlearning method that directly intervenes in the model's internal activations. In our formulation, forgetting is defined as a state in which the activation of a forgotten target is indistinguishable from that of "unknown" entities. Our method introduces an unlearning objective that modifies the activation of the target entity away from those of known entities and toward those of unknown entities in a sparse autoencoder latent space. By aligning the target's internal activation with those of unknown entities, we shift the model's recognition of the target entity from "known" to "unknown", achieving genuine forgetting while avoiding over-suppression and model collapse. Empirically, we show that our method effectively aligns the internal activations of the forgotten target, a result that the suppression-based approaches do not reliably achieve. Additionally, our method effectively reduces the model's recall of target knowledge in question-answering tasks without significant damage to the non-target knowledge.
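
A heavily simplified sketch of the alignment objective; the centroids of "known" and "unknown" entity latents are illustrative stand-ins for the paper's actual entity sets and SAE interface:

import torch
import torch.nn.functional as F

def unlearning_loss(z_target, z_known, z_unknown):
    # z_*: (n, d) sparse-autoencoder latents for the forget target,
    # known entities, and unknown entities respectively.
    mu_known = z_known.mean(dim=0, keepdim=True)
    mu_unknown = z_unknown.mean(dim=0, keepdim=True)
    # Minimizing this pushes the target's latents away from "known" and
    # toward "unknown", so the model comes to treat the target as unfamiliar.
    return (F.cosine_similarity(z_target, mu_known, dim=-1)
            - F.cosine_similarity(z_target, mu_unknown, dim=-1)).mean()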


【8】Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets
Link: https://arxiv.org/abs/2509.15621

Authors: mashita, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara, Tomoharu Iwata
Abstract: Machine Unlearning (MU) has recently attracted considerable attention as a solution to privacy and copyright issues in large language models (LLMs). Existing MU methods aim to remove specific target sentences from an LLM while minimizing damage to unrelated knowledge. However, these approaches require explicit target sentences and do not support removing broader concepts, such as persons or events. To address this limitation, we introduce Concept Unlearning (CU) as a new requirement for LLM unlearning. We leverage knowledge graphs to represent the LLM's internal knowledge and define CU as removing the forgetting target nodes and associated edges. This graph-based formulation enables a more intuitive unlearning and facilitates the design of more effective methods. We propose a novel method that prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies the unlearning process to these representations. Our approach enables more precise and comprehensive concept removal by aligning the unlearning process with the LLM's internal knowledge representations. Experiments on real-world and synthetic datasets demonstrate that our method effectively achieves concept-level unlearning while preserving unrelated knowledge.


【9】DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models
Link: https://arxiv.org/abs/2509.15587

Authors: Chung, Lemao Liu, Mo Yu, Dit-Yan Yeung
Notes: Accepted by EMNLP 2025. Project Page: this https URL
Abstract: Logic reasoning in natural language has been recognized as an important measure of human intelligence for Large Language Models (LLMs). Popular benchmarks may entangle multiple reasoning skills and thus provide unfaithful evaluations on the logic reasoning skill. Meanwhile, existing logic reasoning benchmarks are limited in language diversity and their distributions are deviated from the distribution of an ideal logic reasoning benchmark, which may lead to biased evaluation results. This paper thereby proposes a new classical logic benchmark DivLogicEval, consisting of natural sentences composed of diverse statements in a counterintuitive way. To ensure a more reliable evaluation, we also introduce a new evaluation metric that mitigates the influence of bias and randomness inherent in LLMs. Through experiments, we demonstrate the extent to which logical reasoning is required to answer the questions in DivLogicEval and compare the performance of different popular LLMs in conducting logical reasoning.


【10】Small LLMs with Expert Blocks Are Good Enough for Hyperparameter Tuning
Link: https://arxiv.org/abs/2509.15561

Authors: e, Saksham Bansal, Parikshit Pareek
Abstract: Hyper-parameter Tuning (HPT) is a necessary step in machine learning (ML) pipelines but becomes computationally expensive and opaque with larger models. Recently, Large Language Models (LLMs) have been explored for HPT, yet most rely on models exceeding 100 billion parameters. We propose an Expert Block Framework for HPT using Small LLMs. At its core is the Trajectory Context Summarizer (TCS), a deterministic block that transforms raw training trajectories into structured context, enabling small LLMs to analyze optimization progress with reliability comparable to larger models. Using two locally-run LLMs (phi4:reasoning14B and qwen2.5-coder:32B) and a 10-trial budget, our TCS-enabled HPT pipeline achieves average performance within ~0.9 percentage points of GPT-4 across six diverse tasks.
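
The paper describes TCS only as a deterministic trajectory-to-context transform; here is a toy stand-in (all field names invented) illustrating the idea of handing a small LLM structured features instead of a raw loss curve:

def summarize_trajectory(val_losses):
    # Deterministic summary of a raw training trajectory; every field
    # below is illustrative, not the paper's actual schema.
    best = min(val_losses)
    recent = val_losses[-5:]
    slope = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)
    return {
        "best_loss": round(best, 4),
        "best_epoch": val_losses.index(best),
        "final_loss": round(val_losses[-1], 4),
        "recent_slope": round(slope, 5),
        "plateaued": abs(slope) < 1e-3,
    }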


【11】How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages
Link: https://arxiv.org/abs/2509.15518

Authors: , Zhewei Sun
Abstract: Slang is a commonly used type of informal language that poses a daunting challenge to NLP systems. Recent advances in large language models (LLMs), however, have made the problem more approachable. While LLM agents are becoming more widely applied to intermediary tasks such as slang detection and slang interpretation, their generalizability and reliability are heavily dependent on whether these models have captured structural knowledge about slang that aligns well with human-attested slang usages. To answer this question, we contribute a systematic comparison between human and machine-generated slang usages. Our evaluative framework focuses on three core aspects: 1) Characteristics of the usages that reflect systematic biases in how machines perceive slang, 2) Creativity reflected by both lexical coinages and word reuses employed by the slang usages, and 3) Informativeness of the slang usages when used as gold-standard examples for model distillation. By comparing human-attested slang usages from the Online Slang Dictionary (OSD) and slang generated by GPT-4o and Llama-3, we find significant biases in how LLMs perceive slang. Our results suggest that while LLMs have captured significant knowledge about the creative aspects of slang, such knowledge does not align with humans sufficiently to enable LLMs for extrapolative tasks such as linguistic analyses.


【12】Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs
Link: https://arxiv.org/abs/2509.15464

Authors: in, Song Wang, Xiaojie Guo, Julian Shun, Yada Zhu
Abstract: Large language models (LLMs) excel at many language understanding tasks but struggle to reason over knowledge that evolves. To address this, recent work has explored augmenting LLMs with knowledge graphs (KGs) to provide structured, up-to-date information. However, many existing approaches assume a static snapshot of the KG and overlook the temporal dynamics and factual inconsistencies inherent in real-world data. To address the challenge of reasoning over temporally shifting knowledge, we propose EvoReasoner, a temporal-aware multi-hop reasoning algorithm that performs global-local entity grounding, multi-route decomposition, and temporally grounded scoring. To ensure that the underlying KG remains accurate and up-to-date, we introduce EvoKG, a noise-tolerant KG evolution module that incrementally updates the KG from unstructured documents through confidence-based contradiction resolution and temporal trend tracking. We evaluate our approach on temporal QA benchmarks and a novel end-to-end setting where the KG is dynamically updated from raw documents. Our method outperforms both prompting-based and KG-enhanced baselines, effectively narrowing the gap between small and large LLMs on dynamic question answering. Notably, an 8B-parameter model using our approach matches the performance of a 671B model prompted seven months later. These results highlight the importance of combining temporal reasoning with KG evolution for robust and up-to-date LLM performance. Our code is publicly available at github.com/junhongmit/TREK.


【13】IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Link: https://arxiv.org/abs/2509.15455

Authors: hao, Ali Derakhshan, Dushyant Bharadwaj, Jayden Kana Hyman, Junhao Dong, Sangeetha Abdu Jyothi, Ian Harris
Abstract: Large Language Models (LLMs) promise impressive capabilities, yet their multi-billion-parameter scale makes on-device or low-resource deployment prohibitive. Mixed-precision quantization offers a compelling solution, but existing methods struggle when the average precision drops below four bits, as they rely on isolated, layer-specific metrics that overlook critical inter-layer interactions affecting overall performance. In this paper, we propose two innovations to address these limitations. First, we frame the mixed-precision quantization problem as a cooperative game among layers and introduce Shapley-based Progressive Quantization Estimation (SPQE) to efficiently obtain accurate Shapley estimates of layer sensitivities and inter-layer interactions. Second, building upon SPQE, we propose Interaction-aware Mixed-Precision Quantization (IMPQ) which translates these Shapley estimates into a binary quadratic optimization formulation, assigning either 2- or 4-bit precision to layers under strict memory constraints. Comprehensive experiments conducted on Llama-3, Gemma-2, and Qwen-3 models across three independent PTQ backends (Quanto, HQQ, GPTQ) demonstrate IMPQ's scalability and consistently superior performance compared to methods relying solely on isolated metrics. Across average precisions spanning 4 bits down to 2 bits, IMPQ cuts perplexity by 20 to 80 percent relative to the best baseline, with the margin growing as the bit-width tightens.
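
The second stage is a binary quadratic program; below is a brute-force toy for a handful of layers, where phi and inter stand in for the SPQE Shapley estimates (real IMPQ solves this at scale rather than by enumeration):

import itertools
import numpy as np

def impq_toy(phi, inter, mem2, mem4, budget):
    # phi: (L,) per-layer gain of 4-bit precision; inter: (L, L) pairwise
    # interaction estimates; mem2/mem4: per-layer memory at 2/4 bits.
    L = len(phi)
    best_val, best_x = -np.inf, None
    for bits in itertools.product((0, 1), repeat=L):
        x = np.array(bits)
        if np.where(x == 1, mem4, mem2).sum() > budget:
            continue  # violates the memory constraint
        val = phi @ x + x @ inter @ x  # quadratic objective over {0,1}^L
        if val > best_val:
            best_val, best_x = val, x
    return best_x  # 1 -> assign 4-bit, 0 -> assign 2-bit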


【14】Quantifying Uncertainty in Natural Language Explanations of Large Language Models for Question Answering
Link: https://arxiv.org/abs/2509.15403

Authors: , Mengdi Huai
Abstract: Large language models (LLMs) have shown strong capabilities, enabling concise, context-aware answers in question answering (QA) tasks. The lack of transparency in complex LLMs has inspired extensive research aimed at developing methods to explain large language behaviors. Among existing explanation methods, natural language explanations stand out due to their ability to explain LLMs in a self-explanatory manner and enable the understanding of model behaviors even when the models are closed-source. However, despite these promising advancements, there is no existing work studying how to provide valid uncertainty guarantees for these generated natural language explanations. Such uncertainty quantification is critical in understanding the confidence behind these explanations. Notably, generating valid uncertainty estimates for natural language explanations is particularly challenging due to the auto-regressive generation process of LLMs and the presence of noise in medical inquiries. To bridge this gap, in this work, we first propose a novel uncertainty estimation framework for these generated natural language explanations, which provides valid uncertainty guarantees in a post-hoc and model-agnostic manner. Additionally, we also design a novel robust uncertainty estimation method that maintains valid uncertainty guarantees even under noise. Extensive experiments on QA tasks demonstrate the desired performance of our methods.


【15】Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech data
Link: https://arxiv.org/abs/2509.15389

Authors: Choi, Jaeyoon Jung, Hyeonyu Kim, Huu-Kim Nguyen, Hwayeon Kim
Notes: 4 pages (excluding references), 2 figures, submitted to ICASSP 2026
Abstract: Large Audio Language Models (LALMs) have emerged as powerful tools for speech-related tasks but remain underexplored for fine-tuning, especially with limited speech data. To bridge this gap, we systematically examine how different fine-tuning schemes including text-only, direct mixing, and curriculum learning affect spoken language understanding (SLU), focusing on scenarios where text-label pairs are abundant while paired speech-label data are limited. Results show that LALMs already achieve competitive performance with text-only fine-tuning, highlighting their strong generalization ability. Adding even small amounts of speech data (2-5%) yields substantial further gains, with curriculum learning particularly effective under scarce data. In cross-lingual SLU, combining source-language speech data with target-language text and minimal target-language speech data enables effective adaptation. Overall, this study provides practical insights into LALM fine-tuning under realistic data constraints.


【16】MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
Link: https://arxiv.org/abs/2509.15357

Authors: Jiahao Chen, Anzhe Cheng, Paul Bogdan
Notes: Submitted to ICASSP 2026
Abstract: Text-to-image diffusion models achieve impressive realism but often suffer from compositional failures on prompts with multiple objects, attributes, and spatial relations, resulting in cross-token interference where entities entangle, attributes mix across objects, and spatial cues are violated. To address these failures, we propose MaskAttn-SDXL, a region-level gating mechanism applied to the cross-attention logits of Stable Diffusion XL (SDXL)'s UNet. MaskAttn-SDXL learns a binary mask per layer, injecting it into each cross-attention logit map before softmax to sparsify token-to-latent interactions so that only semantically relevant connections remain active. The method requires no positional encodings, auxiliary tokens, or external region masks, and preserves the original inference path with negligible overhead. In practice, our model improves spatial compliance and attribute binding in multi-object prompts while preserving overall image quality and diversity. These findings demonstrate that logit-level masked cross-attention is a data-efficient primitive for enforcing compositional control, and our method thus serves as a practical extension for spatial control in text-to-image generation.
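
The core operation, a learned binary mask injected into each cross-attention logit map before the softmax, can be sketched for a single head as follows (how the mask is learned is omitted):

import math
import torch

def masked_cross_attention(q, k, v, mask):
    # q: (B, Tq, d) latent queries; k, v: (B, Tk, d) text-token projections;
    # mask: (Tq, Tk) binary, 1 = keep the token-to-latent connection.
    logits = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    logits = logits.masked_fill(mask == 0, float("-inf"))
    # Assumes every query row keeps at least one active connection,
    # otherwise the softmax row would be NaN.
    return torch.softmax(logits, dim=-1) @ v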


【17】Predicting Language Models' Success at Zero-Shot Probabilistic Prediction
Link: https://arxiv.org/abs/2509.15356

Authors: , Santiago Cortes-Gomez, Carlos Miguel Patiño, Ananya Joshi, Ruiqi Lyu, Jingjing Tang, Alistair Turcan, Khurram Yamin, Steven Wu, Bryan Wilder
Notes: EMNLP Findings 2025. We release our code at: this https URL
Abstract: Recent work has investigated the capabilities of large language models (LLMs) as zero-shot models for generating individual-level characteristics (e.g., to serve as risk models or augment survey datasets). However, when should a user have confidence that an LLM will provide high-quality predictions for their particular task? To address this question, we conduct a large-scale empirical study of LLMs' zero-shot predictive capabilities across a wide range of tabular prediction tasks. We find that LLMs' performance is highly variable, both on tasks within the same dataset and across different datasets. However, when the LLM performs well on the base prediction task, its predicted probabilities become a stronger signal for individual-level accuracy. Then, we construct metrics to predict LLMs' performance at the task level, aiming to distinguish between tasks where LLMs may perform well and where they are likely unsuitable. We find that some of these metrics, each of which are assessed without labeled data, yield strong signals of LLMs' predictive performance on new tasks.


【18】Evaluating the Limitations of Local LLMs in Solving Complex Programming Challenges
Link: https://arxiv.org/abs/2509.15283

Authors: otek, Heather Cassel, Md Amiruzzaman, Linh B. Ngo
Notes: 16 pages, 3 figures, 8 tables, accepted to CCSC Eastern 2025
Abstract: This study examines the performance of today's open-source, locally hosted large-language models (LLMs) in handling complex competitive programming tasks with extended problem descriptions and contexts. Building on the original Framework for AI-driven Code Generation Evaluation (FACE), the authors retrofit the pipeline to work entirely offline through the Ollama runtime, collapsing FACE's sprawling per-problem directory tree into a handful of consolidated JSON files, and adding robust checkpointing so multi-day runs can resume after failures. The enhanced framework generates, submits, and records solutions for the full Kattis corpus of 3,589 problems across eight code-oriented models ranging from 6.7-9 billion parameters. The submission results show that the overall pass@1 accuracy is modest for the local models, with the best models performing at approximately half the acceptance rate of the proprietary models, Gemini 1.5 and ChatGPT-4. These findings expose a persistent gap between private, cost-controlled LLM deployments and state-of-the-art proprietary services, yet also highlight the rapid progress of open models and the practical benefits of an evaluation workflow that organizations can replicate on in-house hardware.


Graphs (graph learning | graph neural networks | graph optimization, etc.) (9 papers)

【1】Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph
Link: https://arxiv.org/abs/2509.16180

Authors: math, Alireza F. Pour, Matthew Regehr, David P. Woodruff
Abstract: We propose an algorithm with improved query-complexity for the problem of hypothesis selection under local differential privacy constraints. Given a set of $k$ probability distributions $Q$, we describe an algorithm that satisfies local differential privacy, performs $\tilde{O}(k^{3/2})$ non-adaptive queries to individuals who each have samples from a probability distribution $p$, and outputs a probability distribution from the set $Q$ which is nearly the closest to $p$. Previous algorithms required either $\Omega(k^2)$ queries or many rounds of interactive queries. Technically, we introduce a new object we dub the Scheffé graph, which captures structure of the differences between distributions in $Q$, and may be of more broad interest for hypothesis selection tasks.
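
For intuition, here is the classical non-private Scheffé comparison between two candidates on a finite domain; the paper's contribution lies in organizing such comparisons via the Scheffé graph under local DP, which this sketch does not attempt:

import numpy as np

def scheffe_compare(Pi, Pj, samples):
    # Pi, Pj: probability vectors over {0, ..., m-1}; samples: draws from
    # the unknown p. The Scheffe set A = {x : Pi[x] > Pj[x]} is where the
    # two hypotheses disagree in total variation.
    A = Pi > Pj
    emp = np.isin(samples, np.flatnonzero(A)).mean()
    # The winner is the candidate whose mass on A better matches p's.
    return 0 if abs(Pi[A].sum() - emp) <= abs(Pj[A].sum() - emp) else 1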


【2】Automated Cyber Defense with Generalizable Graph-based Reinforcement Learning Agents
Link: https://arxiv.org/abs/2509.16151

Authors: King, Benjamin Bowman, H. Howie Huang
Abstract: Deep reinforcement learning (RL) is emerging as a viable strategy for automated cyber defense (ACD). The traditional RL approach represents networks as a list of computers in various states of safety or threat. Unfortunately, these models are forced to overfit to specific network topologies, rendering them ineffective when faced with even small environmental perturbations. In this work, we frame ACD as a two-player context-based partially observable Markov decision problem with observations represented as attributed graphs. This approach allows our agents to reason through the lens of relational inductive bias. Agents learn how to reason about hosts interacting with other system entities in a more general manner, and their actions are understood as edits to the graph representing the environment. By introducing this bias, we will show that our agents can better reason about the states of networks and zero-shot adapt to new ones. We show that this approach outperforms the state-of-the-art by a wide margin, and makes our agents capable of defending never-before-seen networks against a wide range of adversaries in a variety of complex, and multi-agent environments.


【3】Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation
Link: https://arxiv.org/abs/2509.15955

Authors: iang, Tingjin Luo, Xu Yang, Xinyan Liang
Notes: 30 pages, 15 figures
Abstract: View missing remains a significant challenge in graph-based multi-view semi-supervised learning, hindering their real-world applications. To address this issue, traditional methods introduce a missing indicator matrix and focus on mining partial structure among existing samples in each view for label propagation (LP). However, we argue that these disregarded missing samples sometimes induce discontinuous local structures, i.e., sub-clusters, breaking the fundamental smoothness assumption in LP. Consequently, such a Sub-Cluster Problem (SCP) would distort graph fusion and degrade classification performance. To alleviate SCP, we propose a novel incomplete multi-view semi-supervised learning method, termed AGF-TI. Firstly, we design an adversarial graph fusion scheme to learn a robust consensus graph against the distorted local structure through a min-max framework. By stacking all similarity matrices into a tensor, we further recover the incomplete structure from the high-order consistency information based on the low-rank tensor learning. Additionally, the anchor-based strategy is incorporated to reduce the computational complexity. An efficient alternative optimization algorithm combining a reduced gradient descent method is developed to solve the formulated objective, with theoretical convergence. Extensive experimental results on various datasets validate the superiority of our proposed AGF-TI as compared to state-of-the-art methods. Code is available at https://github.com/ZhangqiJiang07/AGF_TI.


【4】EvoBrain: Dynamic Multi-channel EEG Graph Modeling for Time-evolving Brain Network
Link: https://arxiv.org/abs/2509.15857

Authors: toge, Zheng Chen, Tasuku Kimura, Yasuko Matsubara, Takufumi Yanagisawa, Haruhiko Kishima, Yasushi Sakurai
Notes: Accepted by NeurIPS 2025 (spotlight)
Abstract: Dynamic GNNs, which integrate temporal and spatial features in Electroencephalography (EEG) data, have shown great potential in automating seizure detection. However, fully capturing the underlying dynamics necessary to represent brain states, such as seizure and non-seizure, remains a non-trivial task and presents two fundamental challenges. First, most existing dynamic GNN methods are built on temporally fixed static graphs, which fail to reflect the evolving nature of brain connectivity during seizure progression. Second, current efforts to jointly model temporal signals and graph structures and, more importantly, their interactions remain nascent, often resulting in inconsistent performance. To address these challenges, we present the first theoretical analysis of these two problems, demonstrating the effectiveness and necessity of explicit dynamic modeling and time-then-graph dynamic GNN method. Building on these insights, we propose EvoBrain, a novel seizure detection model that integrates a two-stream Mamba architecture with a GCN enhanced by Laplacian Positional Encoding, following neurological insights. Moreover, EvoBrain incorporates explicitly dynamic graph structures, allowing both nodes and edges to evolve over time. Our contributions include (a) a theoretical analysis proving the expressivity advantage of explicit dynamic modeling and time-then-graph over other approaches, (b) a novel and efficient model that significantly improves AUROC by 23% and F1 score by 30%, compared with the dynamic GNN baseline, and (c) broad evaluations of our method on the challenging early seizure prediction tasks.
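
The Laplacian positional encoding used to enhance the GCN is a standard construction; a NumPy sketch for one channel graph:

import numpy as np

def laplacian_pe(adj, k):
    # adj: symmetric (n, n) adjacency (e.g., EEG channel connectivity).
    deg = adj.sum(axis=1)
    d = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(adj)) - d[:, None] * adj * d[None, :]  # normalized Laplacian
    vals, vecs = np.linalg.eigh(lap)
    # The eigenvectors of the k smallest nontrivial eigenvalues serve as
    # positional features appended to each node's input.
    return vecs[:, 1:k + 1]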


【5】Solar Forecasting with Causality: A Graph-Transformer Approach to Spatiotemporal Dependencies
Link: https://arxiv.org/abs/2509.15481

Authors: , Demetri Psaltis, Christophe Moser, Luisa Lambertini
Notes: Accepted to CIKM 2025
Abstract: Accurate solar forecasting underpins effective renewable energy management. We present SolarCAST, a causally informed model predicting future global horizontal irradiance (GHI) at a target site using only historical GHI from site X and nearby stations S - unlike prior work that relies on sky-camera or satellite imagery requiring specialized hardware and heavy preprocessing. To deliver high accuracy with only public sensor data, SolarCAST models three classes of confounding factors behind X-S correlations using scalable neural components: (i) observable synchronous variables (e.g., time of day, station identity), handled via an embedding module; (ii) latent synchronous factors (e.g., regional weather patterns), captured by a spatio-temporal graph neural network; and (iii) time-lagged influences (e.g., cloud movement across stations), modeled with a gated transformer that learns temporal shifts. It outperforms leading time-series and multimodal baselines across diverse geographical conditions, and achieves a 25.9% error reduction over the top commercial forecaster, Solcast. SolarCAST offers a lightweight, practical, and generalizable solution for localized solar forecasting.


【6】Partial Column Generation with Graph Neural Networks for Team Formation and Routing
Link: https://arxiv.org/abs/2509.15275

Authors: all'Olio, Rainer Kolisch, Yaoxin Wu
Notes: 30 pages, 4 figures
Abstract: The team formation and routing problem is a challenging optimization problem with several real-world applications in fields such as airport, healthcare, and maintenance operations. To solve this problem, exact solution methods based on column generation have been proposed in the literature. In this paper, we propose a novel partial column generation strategy for settings with multiple pricing problems, based on predicting which ones are likely to yield columns with a negative reduced cost. We develop a machine learning model tailored to the team formation and routing problem that leverages graph neural networks for these predictions. Computational experiments demonstrate that applying our strategy enhances the solution method and outperforms traditional partial column generation approaches from the literature, particularly on hard instances solved under a tight time limit.


【7】A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions Prediction
Link: https://arxiv.org/abs/2509.15256

Authors: Jie Zhang, Zheng Xie, Yiping Song, Hao Li
Abstract: Accurate prediction of drug-drug interactions (DDI) is crucial for medication safety and effective drug development. However, existing methods often struggle to capture structural information across different scales, from local functional groups to global molecular topology, and typically lack mechanisms to quantify prediction confidence. To address these limitations, we propose MPNP-DDI, a novel Multi-scale Graph Neural Process framework. The core of MPNP-DDI is a unique message-passing scheme that, by being iteratively applied, learns a hierarchy of graph representations at multiple scales. Crucially, a cross-drug co-attention mechanism then dynamically fuses these multi-scale representations to generate context-aware embeddings for interacting drug pairs, while an integrated neural process module provides principled uncertainty estimation. Extensive experiments demonstrate that MPNP-DDI significantly outperforms state-of-the-art baselines on benchmark datasets. By providing accurate, generalizable, and uncertainty-aware predictions built upon multi-scale structural features, MPNP-DDI represents a powerful computational tool for pharmacovigilance, polypharmacy risk assessment, and precision medicine.


【8】Accelerating Atomic Fine Structure Determination with Graph Reinforcement Learning
Link: https://arxiv.org/abs/2509.16184

Authors: V.-A. Darvariu, A. N. Ryabtsev, N. Hawes, J. C. Pickering
Abstract: Atomic data determined by analysis of observed atomic spectra are essential for plasma diagnostics. For each low-ionisation open d- and f-subshell atomic species, around $10^3$ fine structure level energies can be determined through years of analysis of $10^4$ observable spectral lines. We propose the automation of this task by casting the analysis procedure as a Markov decision process and solving it by graph reinforcement learning using reward functions learned on historical human decisions. In our evaluations on existing spectral line lists and theoretical calculations for Co II and Nd II-III, hundreds of level energies were computed within hours, agreeing with published values in 95% of cases for Co II and 54-87% for Nd II-III. As the current efficiency in atomic fine structure determination struggles to meet growing atomic data demands from astronomy and fusion science, our new artificial intelligence approach sets the stage for closing this gap.


【9】Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals
Link: https://arxiv.org/abs/2509.15989

Authors: Cloez, Adrien Cotil, Jean-Baptiste Menassol, Nicolas Verzelen
Abstract: We propose a novel family of model-free algorithms for node clustering and parameter inference in graphs generated from the Stochastic Block Model (SBM), a fundamental framework in community detection. Drawing inspiration from the Lloyd algorithm for the $k$-means problem, our approach extends to SBMs with general edge weight distributions. We establish the consistency of our estimator under a natural identifiability condition. Through extensive numerical experiments, we benchmark our methods against state-of-the-art techniques, demonstrating significantly faster computation times with a lower order of estimation error. Finally, we validate the practical relevance of our algorithms by applying them to empirical network data from behavioral ecology.
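
A sketch of the Lloyd-style iteration suggested by the abstract: alternate between estimating each community's mean connectivity profile and reassigning nodes to the nearest profile (initialization refinements and the parameter-inference step are omitted):

import numpy as np

def lloyd_sbm(A, k, iters=50, seed=0):
    # A: (n, n) weighted adjacency matrix; returns one community label per node.
    rng = np.random.default_rng(seed)
    z = rng.integers(k, size=len(A))
    for _ in range(iters):
        # Mean connectivity profile of each community (random row if empty).
        means = np.stack([A[z == c].mean(axis=0) if (z == c).any()
                          else A[rng.integers(len(A))] for c in range(k)])
        # Reassign each node to the community whose profile its row matches best.
        new_z = ((A[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        if (new_z == z).all():
            break
        z = new_z
    return z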


Transformers (6 papers)

【1】Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Link: https://arxiv.org/abs/2509.16058

Authors: ena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb
Abstract: Attention mechanisms have become integral in AI, significantly enhancing model performance and scalability by drawing inspiration from human cognition. Concurrently, the Attention Schema Theory (AST) in cognitive science posits that individuals manage their attention by creating a model of the attention itself, effectively allocating cognitive resources. Inspired by AST, we introduce ASAC (Attention Schema-based Attention Control), which integrates the attention schema concept into artificial neural networks. Our initial experiments focused on embedding the ASAC module within transformer architectures. This module employs a Vector-Quantized Variational AutoEncoder (VQVAE) as both an attention abstractor and controller, facilitating precise attention management. By explicitly modeling attention allocation, our approach aims to enhance system efficiency. We demonstrate ASAC's effectiveness in both the vision and NLP domains, highlighting its ability to improve classification accuracy and expedite the learning process. Our experiments with vision transformers across various datasets illustrate that the attention controller not only boosts classification accuracy but also accelerates learning. Furthermore, we have demonstrated the model's robustness and generalization capabilities across noisy and out-of-distribution datasets. In addition, we have showcased improved performance in multi-task settings. Quick experiments reveal that the attention schema-based module enhances resilience to adversarial attacks, optimizes attention to improve learning efficiency, and facilitates effective transfer learning and learning from fewer examples. These promising results establish a connection between cognitive science and machine learning, shedding light on the efficient utilization of attention mechanisms in AI systems.


【2】Localmax dynamics for attention in transformers and its asymptotic behavior
Link: https://arxiv.org/abs/2509.15958

Authors: etière, Maria Teresa Chiri, Bahman Gharesifard
Notes: 28 pages, 5 figures
Abstract: We introduce a new discrete-time attention model, termed the localmax dynamics, which interpolates between the classic softmax dynamics and the hardmax dynamics, where only the tokens that maximize the influence toward a given token have a positive weight. As in hardmax, uniform weights are determined by a parameter controlling neighbor influence, but the key extension lies in relaxing neighborhood interactions through an alignment-sensitivity parameter, which allows controlled deviations from pure hardmax behavior. As we prove, while the convex hull of the token states still converges to a convex polytope, its structure can no longer be fully described by a maximal alignment set, prompting the introduction of quiescent sets to capture the invariant behavior of tokens near vertices. We show that these sets play a key role in understanding the asymptotic behavior of the system, even under time-varying alignment sensitivity parameters. We further show that localmax dynamics does not exhibit finite-time convergence and provide results for vanishing, nonzero, time-varying alignment-sensitivity parameters, recovering the limiting behavior of hardmax as a by-product. Finally, we adapt Lyapunov-based methods from classical opinion dynamics, highlighting their limitations in the asymmetric setting of localmax interactions and outlining directions for future research.
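
A sketch of one localmax update as described above: keep every token whose alignment toward token i is within eps of the maximum and average them uniformly; the bilinear alignment form with matrix W is an assumption, and eps = 0 recovers hardmax:

import numpy as np

def localmax_step(X, W, eps):
    # X: (n, d) token states; W: (d, d) alignment/attention matrix.
    Y = np.empty_like(X)
    for i in range(len(X)):
        a = X @ (W @ X[i])           # alignment of every token toward token i
        nbrs = a >= a.max() - eps    # tokens within eps of the maximizers
        Y[i] = X[nbrs].mean(axis=0)  # uniform convex combination
    return Y

Since each new state is a convex combination of old ones, iterates stay inside the convex hull of the initial tokens, consistent with the polytope convergence claim.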


【3】Bayesian Physics Informed Neural Networks for Reliable Transformer Prognostics
Link: https://arxiv.org/abs/2509.15933

Authors: rez, Jokin Alcibar, Joel Pino, Mikel Sanz, David Pardo, Jose I. Aizpurua
Notes: Submitted to the Annual Prognostics and Health Management (PHM) Society Conference 2025
Abstract: Scientific Machine Learning (SciML) integrates physics and data into the learning process, offering improved generalization compared with purely data-driven models. Despite its potential, applications of SciML in prognostics remain limited, partly due to the complexity of incorporating partial differential equations (PDEs) for ageing physics and the scarcity of robust uncertainty quantification methods. This work introduces a Bayesian Physics-Informed Neural Network (B-PINN) framework for probabilistic prognostics estimation. By embedding Bayesian Neural Networks into the PINN architecture, the proposed approach produces principled, uncertainty-aware predictions. The method is applied to a transformer ageing case study, where insulation degradation is primarily driven by thermal stress. The heat diffusion PDE is used as the physical residual, and different prior distributions are investigated to examine their impact on predictive posterior distributions and their ability to encode a priori physical knowledge. The framework is validated against a finite element model developed and tested with real measurements from a solar power plant. Results, benchmarked against a dropout-PINN baseline, show that the proposed B-PINN delivers more reliable prognostic predictions by accurately quantifying predictive uncertainty. This capability is crucial for supporting robust and informed maintenance decision-making in critical power assets.
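
The physics residual is the heat diffusion PDE; here is a PyTorch sketch of the term a (B-)PINN would penalize, with the network u(x, t) and the diffusivity alpha as placeholders:

import torch

def heat_residual(u_net, x, t, alpha):
    # u_net: network mapping (x, t) -> temperature; x, t: (N, 1) tensors.
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = u_net(torch.cat([x, t], dim=-1))
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t - alpha * u_xx  # should vanish wherever the physics holds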


【4】ToFU: Transforming How Federated Learning Systems Forget User Data
Link: https://arxiv.org/abs/2509.15861

Authors: Tran, Hong-Hanh Nguyen-Le, Quoc-Viet Pham
Notes: ECAI-2025
Abstract: Neural networks unintentionally memorize training data, creating privacy risks in federated learning (FL) systems, such as inference and reconstruction attacks on sensitive data. To mitigate these risks and to comply with privacy regulations, Federated Unlearning (FU) has been introduced to enable participants in FL systems to remove their data's influence from the global model. However, current FU methods primarily act post-hoc, struggling to efficiently erase information deeply memorized by neural networks. We argue that effective unlearning necessitates a paradigm shift: designing FL systems inherently amenable to forgetting. To this end, we propose a learning-to-unlearn Transformation-guided Federated Unlearning (ToFU) framework that incorporates transformations during the learning process to reduce memorization of specific instances. Our theoretical analysis reveals how transformation composition provably bounds instance-specific information, directly simplifying subsequent unlearning. Crucially, ToFU can work as a plug-and-play framework that improves the performance of existing FU methods. Experiments on CIFAR-10, CIFAR-100, and the MUFAC benchmark show that ToFU outperforms existing FU baselines, enhances performance when integrated with current methods, and reduces unlearning time.


【5】Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers
标题:行为的心理账户:决策Transformer中受EWA启发的注意力
链接:https://arxiv.org/abs/2509.15498

作者:f, Narayan B. Mandayam
摘要:Transformer已经成为一种引人注目的顺序决策体系结构,通过自注意力对轨迹进行建模。在强化学习(RL)中,它们无需依赖值函数近似即可实现回报条件控制。决策Transformer(DT)通过将RL转换为监督序列建模来利用这一点,但它们仅限于离线数据并且缺乏探索。在线决策Transformer(ODT)通过对同策略(on-policy)轨迹进行熵正则化训练解决了这一限制,为依赖自举目标和奖励塑形的传统RL方法(如Soft Actor-Critic)提供了一种稳定的替代方案。尽管有这些优势,ODT使用标准注意力,缺乏对特定动作结果的显式记忆,导致学习长期动作有效性的效率低下。受经验加权吸引(EWA)等认知模型的启发,我们提出了基于矢量量化的在线决策Transformer经验加权吸引(EWA-VQ-ODT),这是一个轻量级模块,为每个动作维护总结最近成功和失败的心理账户。连续动作通过直接网格查找被路由到一个紧凑的矢量量化码本,其中每个码存储一个标量吸引值,通过衰减和基于奖励的强化在线更新。这些吸引值通过偏置与动作标记相关的注意力列来调节注意力,不需要改变主干网络或训练目标。在标准的连续控制基准上,EWA-VQ-ODT提高了样本效率和ODT的平均回报,特别是在训练早期。该模块计算效率高,可通过每码轨迹进行解释,并有理论保证约束吸引值动态及其对注意力漂移的影响。
摘要:Transformers have emerged as a compelling architecture for sequential decision-making by modeling trajectories via self-attention. In reinforcement learning (RL), they enable return-conditioned control without relying on value function approximation. Decision Transformers (DTs) exploit this by casting RL as supervised sequence modeling, but they are restricted to offline data and lack exploration. Online Decision Transformers (ODTs) address this limitation through entropy-regularized training on on-policy rollouts, offering a stable alternative to traditional RL methods like Soft Actor-Critic, which depend on bootstrapped targets and reward shaping. Despite these advantages, ODTs use standard attention, which lacks explicit memory of action-specific outcomes. This leads to inefficiencies in learning long-term action effectiveness. Inspired by cognitive models such as Experience-Weighted Attraction (EWA), we propose Experience-Weighted Attraction with Vector Quantization for Online Decision Transformers (EWA-VQ-ODT), a lightweight module that maintains per-action mental accounts summarizing recent successes and failures. Continuous actions are routed via direct grid lookup to a compact vector-quantized codebook, where each code stores a scalar attraction updated online through decay and reward-based reinforcement. These attractions modulate attention by biasing the columns associated with action tokens, requiring no change to the backbone or training objective. On standard continuous-control benchmarks, EWA-VQ-ODT improves sample efficiency and average return over ODT, particularly in early training. The module is computationally efficient, interpretable via per-code traces, and supported by theoretical guarantees that bound the attraction dynamics and its impact on attention drift.
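
以下为EWA式"心理账户"更新与注意力偏置的一个玩具示意(非官方实现,类名与超参数均为假设):连续动作被吸附到最近的码本条目,每个码维护一个经衰减与基于奖励的强化在线更新的标量吸引值,用作动作标记注意力列的加性偏置:

import numpy as np

class AttractionTable:
    def __init__(self, codebook, decay=0.95, lr=0.1):
        self.codebook = codebook                    # (K, action_dim) 矢量量化码本
        self.attraction = np.zeros(len(codebook))   # 每个码一个标量吸引值
        self.decay, self.lr = decay, lr

    def code_of(self, action):
        # 直接查找最近的码本条目
        return int(np.argmin(np.linalg.norm(self.codebook - action, axis=1)))

    def update(self, action, reward):
        # 衰减 + 基于奖励的强化
        k = self.code_of(action)
        self.attraction[k] = self.decay * self.attraction[k] + self.lr * reward
        return k

    def attention_bias(self, action_codes, scale=1.0):
        # 对动作标记对应的注意力logits施加加性偏置
        return scale * self.attraction[np.asarray(action_codes)]

table = AttractionTable(codebook=np.random.randn(32, 2))
k = table.update(action=np.array([0.3, -0.1]), reward=1.0)
bias = table.attention_bias([k])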


【6】Modeling Transformers as complex networks to analyze learning dynamics
标题:将Transformer建模为复杂网络以分析学习动态
链接:https://arxiv.org/abs/2509.15269

作者:a Rocchetti
摘要:大型语言模型(LLM)在训练过程中获得复杂能力的过程仍然是机制可解释性的一个关键开放问题。该项目研究这些学习动态是否可以通过复杂网络理论(CNT)的视角来刻画。我引入了一种新的方法,将基于Transformer的LLM表示为一个有向加权图,其中节点是模型的计算组件(注意力头和MLP),边表示通过基于干预的消融技术测量的因果影响。通过在一个典型的归纳任务上跟踪Pythia-14M模型143个训练检查点上该组件图的演变,我分析了一组图论度量。结果表明,网络结构经历了探索、巩固和细化等不同阶段的演变。具体来说,我识别出一个稳定的信息传播(spreader)组件层次结构和一组动态的信息收集(gatherer)组件,后者的角色在关键学习节点上发生重构。这项工作表明,组件级网络视角为可视化和理解驱动LLM功能回路形成的自组织原理提供了一个强大的宏观透镜。
摘要:The process by which Large Language Models (LLMs) acquire complex capabilities during training remains a key open question in mechanistic interpretability. This project investigates whether these learning dynamics can be characterized through the lens of Complex Network Theory (CNT). I introduce a novel methodology to represent a Transformer-based LLM as a directed, weighted graph where nodes are the model's computational components (attention heads and MLPs) and edges represent causal influence, measured via an intervention-based ablation technique. By tracking the evolution of this component-graph across 143 training checkpoints of the Pythia-14M model on a canonical induction task, I analyze a suite of graph-theoretic metrics. The results reveal that the network's structure evolves through distinct phases of exploration, consolidation, and refinement. Specifically, I identify the emergence of a stable hierarchy of information spreader components and a dynamic set of information gatherer components, whose roles reconfigure at key learning junctures. This work demonstrates that a component-level network perspective offers a powerful macroscopic lens for visualizing and understanding the self-organizing principles that drive the formation of functional circuits in LLMs.
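
以下为该组件图分析思路的一个示意(非官方实现;节点与边权为假设的占位数据):节点为注意力头/MLP,边权为基于干预消融测得的因果影响,在每个训练检查点上重建该图并跟踪图论度量:

import networkx as nx

G = nx.DiGraph()
components = ["L0.H0", "L0.H1", "L0.MLP", "L1.H0", "L1.MLP"]
G.add_nodes_from(components)
# effects[(u, v)]: 消融u时v的贡献下降幅度(此处为假设数值)
effects = {("L0.H0", "L1.H0"): 0.42, ("L0.H1", "L1.H0"): 0.07,
           ("L0.MLP", "L1.MLP"): 0.31, ("L1.H0", "L1.MLP"): 0.55}
for (u, v), w in effects.items():
    G.add_edge(u, v, weight=w)

# 跨143个检查点跟踪这类度量即可观察探索/巩固/细化等阶段
density = nx.density(G)
out_strength = dict(G.out_degree(weight="weight"))  # "信息传播者"强度
in_strength = dict(G.in_degree(weight="weight"))    # "信息收集者"强度
print(density, max(out_strength, key=out_strength.get))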


GAN|对抗|攻击|生成相关(5篇)

【1】Instance Generation for Meta-Black-Box Optimization through Latent Space Reverse Engineering
标题:基于潜空间逆向工程的元黑箱优化实例生成
链接:https://arxiv.org/abs/2509.15810

作者:, Zeyuan Ma, Zhiguang Cao, Yue-Jiao Gong
摘要:为了减轻设计优化算法所需的大量人类专业知识,最近的元黑盒优化(MetaBBO)研究利用元学习的泛化能力,在预定义的训练问题集上训练基于神经网络的算法设计策略,从而使低级优化器能够自动适应未见过的问题实例。目前,现有MetaBBO中常见的训练问题集选择是著名的基准测试套件CoCo-BBOB。虽然这一选择促进了MetaBBO的发展,但CoCo-BBOB中的问题实例在多样性上或多或少受到限制,增加了MetaBBO过拟合的风险,进而可能导致泛化能力差。在本文中,我们提出了一种称为LSRE的实例生成方法,它可以为MetaBBO生成多样化的训练问题实例,以学习更具泛化性的策略。LSRE首先训练一个自动编码器,将高维问题特征映射到二维潜在空间。在这个潜在空间中进行均匀网格采样,可以得到具有足够多样性的问题实例隐藏表示。通过利用遗传编程方法搜索与这些隐藏表示L2距离最小的函数公式,LSRE逆向工程出一个多样化的问题集,称为Diverse-BBO。我们通过在Diverse-BBO上训练各种MetaBBO并观察它们在合成或现实场景中的泛化性能来验证LSRE的有效性。大量实验结果强调了Diverse-BBO相对于MetaBBO中现有训练集选择的优越性。进一步的消融研究不仅证明了LSRE中设计选择的有效性,而且揭示了关于实例多样性和MetaBBO泛化的有趣见解。
摘要:To relieve intensive human-expertise required to design optimization algorithms, recent Meta-Black-Box Optimization (MetaBBO) researches leverage generalization strength of meta-learning to train neural network-based algorithm design policies over a predefined training problem set, which automates the adaptability of the low-level optimizers on unseen problem instances. Currently, a common training problem set choice in existing MetaBBOs is the well-known benchmark suite CoCo-BBOB. Although such choice facilitates the MetaBBO's development, problem instances in CoCo-BBOB are more or less limited in diversity, raising the risk of overfitting of MetaBBOs, which might further result in poor generalization. In this paper, we propose an instance generation approach, termed as \textbf{LSRE}, which could generate diverse training problem instances for MetaBBOs to learn more generalizable policies. LSRE first trains an autoencoder which maps high-dimensional problem features into a 2-dimensional latent space. Uniform-grid sampling in this latent space leads to hidden representations of problem instances with sufficient diversity. By leveraging a genetic-programming approach to search function formulas with minimal L2-distance to these hidden representations, LSRE reverse engineers a diversified problem set, termed as \textbf{Diverse-BBO}. We validate the effectiveness of LSRE by training various MetaBBOs on Diverse-BBO and observe their generalization performances on either synthetic or realistic scenarios. Extensive experimental results underscore the superiority of Diverse-BBO to existing training set choices in MetaBBOs. Further ablation studies not only demonstrate the effectiveness of design choices in LSRE, but also reveal interesting insights on instance diversity and MetaBBO's generalization.


【2】GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
标题:GUI-ReWalk:基于随机探索和意图感知推理的GUI Agent海量数据生成
链接:https://arxiv.org/abs/2509.15738

作者:, Minghao Liu, Taoran Lu, Lichen Yuan, Yiwei Liu, Haonan Xu, Yu Miao, Yuhao Chao, Zhaojian Li
摘要:图形用户界面(GUI)代理由大型语言和视觉语言模型提供支持,有望在数字环境中实现端到端自动化。然而,他们的进展从根本上受到缺乏可扩展的高质量轨迹数据的限制。现有的数据收集策略要么依赖于昂贵且不一致的手动注释,要么依赖于在多样性和有意义的任务覆盖范围之间进行权衡的综合生成方法。为了弥合这一差距,我们提出了GUI-ReWalk:一个推理增强的,多阶段的框架,用于合成现实和多样化的GUI轨迹。GUI-ReWalk从模拟人类试错行为的随机探索阶段开始,并逐步过渡到推理引导阶段,其中推断的目标驱动连贯和有目的的交互。此外,它还支持多步任务生成,支持跨多个应用程序构建长期工作流。通过将多样性的随机性与结构的目标感知推理相结合,GUI-ReWalk产生的数据更好地反映了人机交互的意图感知和自适应特性。我们在GUI-ReWalk数据集上进一步训练Qwen2.5-VL-7 B,并在多个基准测试中对其进行评估,包括Screenspot-Pro,OSWorld-G,UI-Vision,AndroidControl和GUI-Odyssey。结果表明,GUI-ReWalk能够更好地覆盖不同的交互流,更高的轨迹熵和更真实的用户意图。这些发现将GUI-ReWalk确立为一个可扩展的数据高效框架,用于推进GUI代理研究并实现强大的现实世界自动化。
摘要:Graphical User Interface (GUI) Agents, powered by large language and vision-language models, hold promise for enabling end-to-end automation in digital environments. However, their progress is fundamentally constrained by the scarcity of scalable, high-quality trajectory data. Existing data collection strategies either rely on costly and inconsistent manual annotations or on synthetic generation methods that trade off between diversity and meaningful task coverage. To bridge this gap, we present GUI-ReWalk: a reasoning-enhanced, multi-stage framework for synthesizing realistic and diverse GUI trajectories. GUI-ReWalk begins with a stochastic exploration phase that emulates human trial-and-error behaviors, and progressively transitions into a reasoning-guided phase where inferred goals drive coherent and purposeful interactions. Moreover, it supports multi-stride task generation, enabling the construction of long-horizon workflows across multiple applications. By combining randomness for diversity with goal-aware reasoning for structure, GUI-ReWalk produces data that better reflects the intent-aware, adaptive nature of human-computer interaction. We further train Qwen2.5-VL-7B on the GUI-ReWalk dataset and evaluate it across multiple benchmarks, including Screenspot-Pro, OSWorld-G, UI-Vision, AndroidControl, and GUI-Odyssey. Results demonstrate that GUI-ReWalk enables superior coverage of diverse interaction flows, higher trajectory entropy, and more realistic user intent. These findings establish GUI-ReWalk as a scalable and data-efficient framework for advancing GUI agent research and enabling robust real-world automation.


【3】Adversarial generalization of unfolding (model-based) networks
标题:展开(基于模型的)网络的对抗泛化
链接:https://arxiv.org/abs/2509.15370

作者:ni
备注:Accepted in NeurIPS2025
摘要:展开网络是从迭代算法中产生的可解释网络,包含数据结构的先验知识,旨在解决压缩感知等逆问题,即从有噪声、有缺失的观测中恢复数据。压缩感知在从医学成像到密码学的关键领域中均有应用,在这些领域中对抗鲁棒性对于防止灾难性故障至关重要。然而,对展开网络在对抗攻击下性能的坚实理论理解仍处于起步阶段。本文研究了展开网络在受到由快速梯度符号方法生成的$l_2$-范数约束攻击时的对抗泛化。特别地,我们选择了一族最先进的过参数化展开网络,并部署了一个新框架来估计其对抗Rademacher复杂度。基于这一估计,我们为所研究的网络提供了对抗泛化误差界,这些误差界相对于攻击水平是紧的。据我们所知,这是对展开网络对抗泛化的第一个理论分析。我们进一步在真实世界数据上进行了一系列实验,结果在所有数据上一致地证实了我们推导的理论。最后,我们观察到该网络族的过参数化可以被利用来提升对抗鲁棒性,从而为如何高效增强神经网络的鲁棒性提供了启示。
摘要:Unfolding networks are interpretable networks emerging from iterative algorithms, incorporate prior knowledge of data structure, and are designed to solve inverse problems like compressed sensing, which deals with recovering data from noisy, missing observations. Compressed sensing finds applications in critical domains, from medical imaging to cryptography, where adversarial robustness is crucial to prevent catastrophic failures. However, a solid theoretical understanding of the performance of unfolding networks in the presence of adversarial attacks is still in its infancy. In this paper, we study the adversarial generalization of unfolding networks when perturbed with $l_2$-norm constrained attacks, generated by the fast gradient sign method. Particularly, we choose a family of state-of-the-art overparameterized unfolding networks and deploy a new framework to estimate their adversarial Rademacher complexity. Given this estimate, we provide adversarial generalization error bounds for the networks under study, which are tight with respect to the attack level. To our knowledge, this is the first theoretical analysis on the adversarial generalization of unfolding networks. We further present a series of experiments on real-world data, with results corroborating our derived theory, consistently for all data. Finally, we observe that the family's overparameterization can be exploited to promote adversarial robustness, shedding light on how to efficiently robustify neural networks.
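
作为参考,下面给出文中所述$l_2$-范数约束、FGSM风格单步攻击的一个极简示意(非论文代码;net与loss_fn为假设的占位,分别表示展开网络与重建损失):

import torch

def l2_fgm(net, loss_fn, y, target, eps=0.1):
    # 对观测y施加l2范数不超过eps的单步梯度扰动
    y_adv = y.clone().detach().requires_grad_(True)
    loss = loss_fn(net(y_adv), target)
    grad, = torch.autograd.grad(loss, y_adv)
    flat = grad.flatten(1)
    norm = flat.norm(dim=1, keepdim=True).clamp_min(1e-12)
    delta = eps * (flat / norm).view_as(y)   # 逐样本归一化到l2球面
    return (y + delta).detach()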


【4】RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
标题:RespoDiff:面向负责任且忠实的T2I生成的双模块瓶颈变换
链接:https://arxiv.org/abs/2509.15257

作者:akkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais, Serge Belongie, Anjan Dutta
摘要:扩散模型的快速发展使高保真和语义丰富的文本到图像生成成为可能;然而,确保公平和安全仍然是一个开放的挑战。现有的方法通常以牺牲语义保真度和图像质量为代价来提高公平性和安全性。在这项工作中,我们提出了RespoDiff,这是一个负责任的文本到图像生成的新框架,在扩散模型的中间瓶颈表示上采用了双模块变换。我们的方法引入了两个不同的可学习模块:一个专注于捕获和执行公平性和安全性等负责任的概念,另一个致力于保持与中性提示的语义一致。为了促进双重学习过程,我们引入了一个新的分数匹配目标,使模块之间能够有效协调。我们的方法在负责任生成方面优于最先进的方法,在优化这两个目标的同时确保语义对齐,且不损害图像保真度。在多样化、未见过的提示上,我们的方法将负责任且语义连贯的生成提高了20%。此外,它可以无缝集成到SDXL等大型模型中,从而提高公平性和安全性。代码将在论文接收后发布。
摘要:The rapid advancement of diffusion models has enabled high-fidelity and semantically rich text-to-image generation; however, ensuring fairness and safety remains an open challenge. Existing methods typically improve fairness and safety at the expense of semantic fidelity and image quality. In this work, we propose RespoDiff, a novel framework for responsible text-to-image generation that incorporates a dual-module transformation on the intermediate bottleneck representations of diffusion models. Our approach introduces two distinct learnable modules: one focused on capturing and enforcing responsible concepts, such as fairness and safety, and the other dedicated to maintaining semantic alignment with neutral prompts. To facilitate the dual learning process, we introduce a novel score-matching objective that enables effective coordination between the modules. Our method outperforms state-of-the-art methods in responsible generation by ensuring semantic alignment while optimizing both objectives without compromising image fidelity. Our approach improves responsible and semantically coherent generation by 20% across diverse, unseen prompts. Moreover, it integrates seamlessly into large-scale models like SDXL, enhancing fairness and safety. Code will be released upon acceptance.


【5】Quantum Generative Adversarial Autoencoders: Learning latent representations for quantum data generation
标题:量子生成对抗自动编码器:学习量子数据生成的潜在表示
链接:https://arxiv.org/abs/2509.16186

作者: Raj, Rajiv Sangle, Avinash Singh, Krishna Kumar Sabapathy
备注:27 pages, 28 figures, 4 tables, 1 algorithm
摘要:在这项工作中,我们介绍了量子生成对抗自动编码器(QGAA),这是一种用于生成量子数据的量子模型。QGAA由两个组件组成:(a)量子自动编码器(QAE),用于压缩量子态;(b)量子生成对抗网络(QGAN),用于学习训练QAE的潜在空间。这种方法赋予QAE生成能力。QGAA的效用在两个代表性的方案中被证明:(a)纯纠缠态的产生,和(b)H$_2$和LiH的参数化分子基态的产生。在高达6个量子比特的模拟中,由训练的QGAA估计的能量的平均误差对于H$_2$为0.02 Ha,对于LiH为0.06 Ha。这些结果说明了QGAA在量子态生成、量子化学和近期量子机器学习应用方面的潜力。
摘要:In this work, we introduce the Quantum Generative Adversarial Autoencoder (QGAA), a quantum model for generation of quantum data. The QGAA consists of two components: (a) Quantum Autoencoder (QAE) to compress quantum states, and (b) Quantum Generative Adversarial Network (QGAN) to learn the latent space of the trained QAE. This approach imparts the QAE with generative capabilities. The utility of QGAA is demonstrated in two representative scenarios: (a) generation of pure entangled states, and (b) generation of parameterized molecular ground states for H$_2$ and LiH. The average errors in the energies estimated by the trained QGAA are 0.02 Ha for H$_2$ and 0.06 Ha for LiH in simulations up to 6 qubits. These results illustrate the potential of QGAA for quantum state generation, quantum chemistry, and near-term quantum machine learning applications.


半/弱/无/有监督|不确定性|主动学习(3篇)

【1】MTS-DMAE: Dual-Masked Autoencoder for Unsupervised Multivariate Time Series Representation Learning
标题:MTS-DMAE:用于无监督多元时间序列表示学习的双屏蔽自动编码器
链接:https://arxiv.org/abs/2509.16078

作者:tian Zhang, Yun Fu
备注:Accepted by ICDM 2025
摘要:无监督多变量时间序列(MTS)表示学习旨在从原始序列中提取紧凑且信息丰富的表示,而不依赖于标签,从而能够有效地转移到不同的下游任务。在本文中,我们提出了双屏蔽自动编码器(DMAE),一种新的屏蔽时间序列建模框架,用于无监督MTS表示学习。DMAE制定了两个互补的借口任务:(1)基于可见属性重建掩蔽值,以及(2)在教师编码器的指导下估计掩蔽特征的潜在表示。为了进一步提高表示质量,我们引入了特征级对齐约束,鼓励预测的潜在表示与教师的输出对齐。通过联合优化这些目标,DMAE学习时间连贯和语义丰富的表示。对分类、回归和预测任务的全面评估表明,我们的方法实现了与竞争基准相比一致和卓越的性能。
摘要:Unsupervised multivariate time series (MTS) representation learning aims to extract compact and informative representations from raw sequences without relying on labels, enabling efficient transfer to diverse downstream tasks. In this paper, we propose Dual-Masked Autoencoder (DMAE), a novel masked time-series modeling framework for unsupervised MTS representation learning. DMAE formulates two complementary pretext tasks: (1) reconstructing masked values based on visible attributes, and (2) estimating latent representations of masked features, guided by a teacher encoder. To further improve representation quality, we introduce a feature-level alignment constraint that encourages the predicted latent representations to align with the teacher's outputs. By jointly optimizing these objectives, DMAE learns temporally coherent and semantically rich representations. Comprehensive evaluations across classification, regression, and forecasting tasks demonstrate that our approach achieves consistent and superior performance over competitive baselines.


【2】Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
标题:少量演示下基于不确定性的强化学习平滑策略正则化
链接:https://arxiv.org/abs/2509.15981

作者:, Charles A. Hepburn, Matthew Thorpe, Giovanni Montana
摘要:在稀疏奖励的强化学习中,演示可以加速学习,但确定何时模仿它们仍然具有挑战性。我们提出了基于演示的平滑策略正则化(SPReD),这一框架解决了一个根本性问题:智能体何时应模仿演示,何时应遵循自身策略?SPReD使用集成方法显式地对演示动作和策略动作的Q值分布进行建模,为二者的比较量化不确定性。我们开发了两种互补的不确定性感知方法:一种估计演示优越性可能性的概率方法,以及一种根据统计显著性缩放模仿力度的基于优势的方法。与做出二元模仿决策的流行方法(例如Q-filter)不同,SPReD应用连续的、与不确定性成比例的正则化权重,减少训练期间的梯度方差。尽管计算简单,SPReD在八个机器人任务的实验中取得了显著的收益,在复杂任务中比现有方法高出多达14倍,同时保持了对演示质量和数量的鲁棒性。我们的代码可在https://github.com/YujieZhu7/SPReD上获得。
摘要:In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. We propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy? SPReD uses ensemble methods to explicitly model Q-value distributions for both demonstration and policy actions, quantifying uncertainty for comparisons. We develop two complementary uncertainty-aware methods: a probabilistic approach estimating the likelihood of demonstration superiority, and an advantage-based approach scaling imitation by statistical significance. Unlike prevailing methods (e.g. Q-filter) that make binary imitation decisions, SPReD applies continuous, uncertainty-proportional regularisation weights, reducing gradient variance during training. Despite its computational simplicity, SPReD achieves remarkable gains in experiments across eight robotics tasks, outperforming existing approaches by up to a factor of 14 in complex tasks while maintaining robustness to demonstration quality and quantity. Our code is available at https://github.com/YujieZhu7/SPReD.
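
以下为其概率变体的一个示意性实现(非官方代码,名称均为示例):给定演示动作与策略动作的集成Q值,用高斯近似计算"演示动作更优"的概率,并将其作为行为克隆损失项的连续权重:

import math
import numpy as np

def imitation_weight(q_demo, q_pi):
    # q_demo, q_pi: 集成中各成员对两种动作的Q值估计
    mu = np.mean(q_demo) - np.mean(q_pi)
    var = np.var(q_demo) + np.var(q_pi)
    z = mu / math.sqrt(var + 1e-8)
    # 高斯近似下 P(Q_demo > Q_pi)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

w = imitation_weight(q_demo=np.array([1.2, 1.0, 1.3]),
                     q_pi=np.array([0.8, 1.1, 0.7]))
# 将w作为模仿(行为克隆)损失的连续系数, 而非二元的Q-filter开关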


【3】A Weak Supervision Approach for Monitoring Recreational Drug Use Effects in Social Media
标题:一种用于监测社交媒体上娱乐性药物使用影响的弱监督方法
链接:https://arxiv.org/abs/2509.15266

作者:eto-Santamaría, Alba Cortés Iglesias, Claudio Vidal Giné, Fermín Fernández Calderón, Óscar M. Lozano, Alejandro Rodríguez-González
摘要:了解娱乐性药物使用的现实影响仍然是公共卫生和生物医学研究中的一个关键挑战,特别是因为传统的监测系统往往不能充分反映用户体验。在这项研究中,我们利用社交媒体(特别是Twitter)作为与三种新兴精神活性物质(摇头丸,GHB和2C-B)相关的用户报告效果的丰富和未经过滤的来源。通过MetaMap将俚语术语的精选列表与生物医学概念提取相结合,我们识别并弱注释了超过92,000条提到这些物质的推文。在专家指导的启发式过程之后,每条推文都被标记了一个极性,反映了它是报告了积极还是消极的影响。然后,我们对报告的各种物质的表型结果进行了描述性和比较性分析,并训练了多个机器学习分类器来预测推文内容的极性,使用成本敏感学习和合成过采样等技术来解释强烈的类别不平衡。测试集上的最佳性能是从具有成本敏感学习的极限梯度提升中获得的(F1 = 0.885,AUPRC = 0.934)。我们的研究结果表明,Twitter能够检测物质特异性表型效应,极性分类模型可以支持实时药物警戒和药物效应表征,具有很高的准确性。
摘要:Understanding the real-world effects of recreational drug use remains a critical challenge in public health and biomedical research, especially as traditional surveillance systems often underrepresent user experiences. In this study, we leverage social media (specifically Twitter) as a rich and unfiltered source of user-reported effects associated with three emerging psychoactive substances: ecstasy, GHB, and 2C-B. By combining a curated list of slang terms with biomedical concept extraction via MetaMap, we identified and weakly annotated over 92,000 tweets mentioning these substances. Each tweet was labeled with a polarity reflecting whether it reported a positive or negative effect, following an expert-guided heuristic process. We then performed descriptive and comparative analyses of the reported phenotypic outcomes across substances and trained multiple machine learning classifiers to predict polarity from tweet content, accounting for strong class imbalance using techniques such as cost-sensitive learning and synthetic oversampling. The top performance on the test set was obtained from eXtreme Gradient Boosting with cost-sensitive learning (F1 = 0.885, AUPRC = 0.934). Our findings reveal that Twitter enables the detection of substance-specific phenotypic effects, and that polarity classification models can support real-time pharmacovigilance and drug effect characterization with high accuracy.
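
以下为成本敏感XGBoost的一个极简示意(非论文代码;数据为合成占位,假设已安装xgboost与scikit-learn),通过scale_pos_weight抵消类别不平衡:

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X = np.random.rand(1000, 50)                    # 例如TF-IDF/概念特征(占位)
y = (np.random.rand(1000) < 0.15).astype(int)   # 约15%为正类(占位)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

neg, pos = np.bincount(y_tr)
clf = XGBClassifier(scale_pos_weight=neg / pos,  # 成本敏感: 按类频率加权
                    eval_metric="aucpr")
clf.fit(X_tr, y_tr)
print(clf.predict_proba(X_te)[:5, 1])            # 正极性概率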


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
标题:DIVEBATCH:通过梯度多样性感知的批量大小自适应加速模型训练
链接:https://arxiv.org/abs/2509.16173

作者:, Yian Wang, Hari Sundaram
摘要:本文的目标是加速机器学习模型的训练,这是一个关键的挑战,因为大规模深度神经模型的训练在计算上可能是昂贵的。随机梯度下降(SGD)及其变体被广泛用于训练深度神经网络。与专注于调整学习率的传统方法相反,我们提出了一种新型的自适应批量大小SGD算法DiveBatch,可以动态调整批量大小。调整批量大小具有挑战性:由于并行计算,使用大批量更有效,但小批量训练通常在更少的轮数内收敛,并且泛化能力更好。为了应对这一挑战,我们引入了基于梯度多样性的数据驱动自适应,使DiveBatch能够保持小批量训练的泛化性能,同时提高收敛速度和计算效率。梯度多样性有很强的理论依据:它来自SGD的收敛分析。在合成数据集以及CIFAR-10、CIFAR-100和Tiny-ImageNet上对DiveBatch的评估表明,DiveBatch的收敛速度明显快于标准SGD和AdaBatch(1.06 - 5.0倍),性能略有折衷。
摘要:The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants are widely used to train deep neural networks. In contrast to traditional approaches that focus on tuning the learning rate, we propose a novel adaptive batch size SGD algorithm, DiveBatch, that dynamically adjusts the batch size. Adapting the batch size is challenging: using large batch sizes is more efficient due to parallel computation, but small-batch training often converges in fewer epochs and generalizes better. To address this challenge, we introduce a data-driven adaptation based on gradient diversity, enabling DiveBatch to maintain the generalization performance of small-batch training while improving convergence speed and computational efficiency. Gradient diversity has a strong theoretical justification: it emerges from the convergence analysis of SGD. Evaluations of DiveBatch on synthetic and CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that DiveBatch converges significantly faster than standard SGD and AdaBatch (1.06 -- 5.0x), with a slight trade-off in performance.
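
以下为梯度多样性统计量及一个示意性批量调整规则的玩具实现(非官方代码,调整规则为假设):多样性定义为 sum_i ||g_i||^2 / ||sum_i g_i||^2,逐样本梯度彼此对齐时该值小(适合增大批量),彼此分散时该值大:

import numpy as np

def gradient_diversity(per_sample_grads):
    # per_sample_grads: (B, num_params) 逐样本梯度
    g = np.asarray(per_sample_grads)
    num = (g ** 2).sum()
    den = np.linalg.norm(g.sum(axis=0)) ** 2 + 1e-12
    return num / den

def next_batch_size(batch_size, diversity, prev_diversity, factor=2, cap=4096):
    # 示意性规则: 多样性不降才扩大批量, 否则保持
    return min(batch_size * factor, cap) if diversity >= prev_diversity else batch_size

g = np.random.randn(32, 1000)
print(gradient_diversity(g))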


【2】Time-adaptive SympNets for separable Hamiltonian systems
标题:可分离哈密顿系统的时间自适应SympNets
链接:https://arxiv.org/abs/2509.16026

作者:nik, Peter Benner
摘要:测量数据通常是不规则采样的,即不在等距时间网格上采样。这对于哈密顿系统也是如此。然而,现有的学习辛积分器的机器学习方法,如SympNets [20]和HénonNets [4],仍然需要由固定步长生成的训练数据。为了学习时间自适应辛积分器,[20]中引入了SympNets的一种扩展,我们称之为TSympNets。我们调整了TSympNets的架构,并将其扩展到非自治哈密顿系统。到目前为止,TSympNets的逼近能力尚不清楚。我们通过为可分离哈密顿系统提供一个万能逼近定理来填补这一空白,并证明它不可能扩展到不可分离的哈密顿系统。为了考察这些理论逼近能力,我们进行了不同的数值实验。此外,我们修正了一个关于一般辛映射逼近的重要定理[25,定理2]证明中的错误,该定理对辛机器学习方法尤为重要。
摘要:Measurement data is often sampled irregularly i.e. not on equidistant time grids. This is also true for Hamiltonian systems. However, existing machine learning methods, which learn symplectic integrators, such as SympNets [20] and H\'enonNets [4] still require training data generated by fixed step sizes. To learn time-adaptive symplectic integrators, an extension to SympNets, which we call TSympNets, was introduced in [20]. We adapt the architecture of TSympNets and extend them to non-autonomous Hamiltonian systems. So far the approximation qualities of TSympNets were unknown. We close this gap by providing a universal approximation theorem for separable Hamiltonian systems and show that it is not possible to extend it to non-separable Hamiltonian systems. To investigate these theoretical approximation capabilities, we perform different numerical experiments. Furthermore we fix a mistake in a proof of a substantial theorem [25, Theorem 2] for the approximation of symplectic maps in general, but specifically for symplectic machine learning methods.


【3】HyP-ASO: A Hybrid Policy-based Adaptive Search Optimization Framework for Large-Scale Integer Linear Programs
标题:HyP-ASO:用于大规模整数线性规划的混合基于策略的自适应搜索优化框架
链接:https://arxiv.org/abs/2509.15828

作者:Junkai Zhang, Yang Wu, Huigen Ye, Hua Xu, Huiling Xu, Yifan Zhang
摘要:由于NP难的性质,使用传统求解器直接求解大规模整数线性规划(ILP)非常缓慢。虽然最近基于大邻域搜索(LNS)的框架可以加速求解过程,但它们的性能往往受限于难以生成足够有效的邻域。为了应对这一挑战,我们提出了HyP-ASO,一种混合的基于策略的自适应搜索优化框架,将定制公式与深度强化学习(RL)相结合。该公式利用可行解计算邻域生成过程中每个变量的选择概率,RL策略网络则预测邻域大小。大量实验表明,对于大规模ILP,HyP-ASO显著优于现有的基于LNS的方法。附加实验表明该方法轻量且高度可扩展,非常适合求解大规模ILP问题。
摘要:Directly solving large-scale Integer Linear Programs (ILPs) using traditional solvers is slow due to their NP-hard nature. While recent frameworks based on Large Neighborhood Search (LNS) can accelerate the solving process, their performance is often constrained by the difficulty in generating sufficiently effective neighborhoods. To address this challenge, we propose HyP-ASO, a hybrid policy-based adaptive search optimization framework that combines a customized formula with deep Reinforcement Learning (RL). The formula leverages feasible solutions to calculate the selection probabilities for each variable in the neighborhood generation process, and the RL policy network predicts the neighborhood size. Extensive experiments demonstrate that HyP-ASO significantly outperforms existing LNS-based approaches for large-scale ILPs. Additional experiments show it is lightweight and highly scalable, making it well-suited for solving large-scale ILPs.


【4】Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization
标题:用于随机分层优化的具有精确收敛速率的自适应算法
链接:https://arxiv.org/abs/2509.15399

作者: Gong, Jie Hao, Mingrui Liu
备注:NeurIPS 2025
摘要:分层优化是指决策变量和目标相互依赖的问题,如极小极大和双层公式。虽然已经提出了各种算法,但现有的方法和分析在随机优化设置中缺乏自适应性:在没有噪声幅度先验知识的情况下,它们无法在广泛的梯度噪声水平范围内实现最优收敛速率。本文针对两类重要的随机分层优化问题:非凸-强凹极小极大优化和非凸-强凸双层优化,提出了新的自适应算法。我们的算法在$T$次迭代中对梯度范数实现了$\widetilde{O}(1/\sqrt{T} + \sqrt{\bar{\sigma}}/T^{1/4})$的精确收敛速率,其中$\bar{\sigma}$是随机梯度噪声的上界。值得注意的是,这些速率是在没有噪声水平先验知识的情况下获得的,从而能够在低噪声和高噪声状态下自动自适应。据我们所知,这项工作为随机分层优化提供了首个自适应且精确的收敛保证。我们的算法设计将动量归一化技术与新颖的自适应参数选择相结合。在合成和深度学习任务上的大量实验证明了我们所提算法的有效性。
摘要:Hierarchical optimization refers to problems with interdependent decision variables and objectives, such as minimax and bilevel formulations. While various algorithms have been proposed, existing methods and analyses lack adaptivity in stochastic optimization settings: they cannot achieve optimal convergence rates across a wide spectrum of gradient noise levels without prior knowledge of the noise magnitude. In this paper, we propose novel adaptive algorithms for two important classes of stochastic hierarchical optimization problems: nonconvex-strongly-concave minimax optimization and nonconvex-strongly-convex bilevel optimization. Our algorithms achieve sharp convergence rates of $\widetilde{O}(1/\sqrt{T} + \sqrt{\bar{\sigma}}/T^{1/4})$ in $T$ iterations for the gradient norm, where $\bar{\sigma}$ is an upper bound on the stochastic gradient noise. Notably, these rates are obtained without prior knowledge of the noise level, thereby enabling automatic adaptivity in both low and high-noise regimes. To our knowledge, this work provides the first adaptive and sharp convergence guarantees for stochastic hierarchical optimization. Our algorithm design combines the momentum normalization technique with novel adaptive parameter choices. Extensive experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.


【5】Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception
标题:模拟类人自适应视觉以实现高效灵活的机器视觉感知
链接:https://arxiv.org/abs/2509.15333

作者:g, Yang Yue, Yang Yue, Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song, Gao Huang
摘要:人类视觉具有高度适应性,通过顺序注视与任务相关的区域来高效地采样复杂环境。相比之下,主流的机器视觉模型一次性被动地处理整个场景,导致过多的资源需求随时空输入分辨率和模型大小而扩展,产生阻碍未来发展和现实应用的严重限制。在此,我们介绍AdaptiveNN,一个旨在推动视觉模型从"被动"到"主动、自适应"范式转变的通用框架。AdaptiveNN将视觉感知表述为一个由粗到细的顺序决策过程,逐步识别并关注与任务相关的区域,在注视点之间增量地组合信息,并在信息充分时主动结束观察。我们建立了一个将表示学习与自奖励强化学习相结合的理论,使不可微的AdaptiveNN能够端到端训练,而无需对注视位置进行额外监督。我们在涵盖9项任务的17项基准上评估了AdaptiveNN,包括大规模视觉识别、细粒度辨别、视觉搜索、处理真实驾驶和医疗场景的图像、语言驱动的具身人工智能以及与人类的并排比较。AdaptiveNN在不牺牲准确性的情况下实现了高达28倍的推理成本降低,无需重新训练即可灵活适应不同的任务需求和资源预算,并通过其注视模式提供增强的可解释性,展示了一条通往高效、灵活和可解释计算机视觉的有前景的途径。此外,AdaptiveNN在许多情况下表现出与人类非常相似的感知行为,揭示了其作为研究视觉认知的有价值工具的潜力。代码可在https://github.com/LeapLabTHU/AdaptiveNN上获得。
摘要:Human vision is highly adaptive, efficiently sampling intricate environments by sequentially fixating on task-relevant regions. In contrast, prevailing machine vision models passively process entire scenes at once, resulting in excessive resource demands scaling with spatial-temporal input resolution and model size, yielding critical limitations impeding both future advancements and real-world application. Here we introduce AdaptiveNN, a general framework aiming to drive a paradigm shift from 'passive' to 'active, adaptive' vision models. AdaptiveNN formulates visual perception as a coarse-to-fine sequential decision-making process, progressively identifying and attending to regions pertinent to the task, incrementally combining information across fixations, and actively concluding observation when sufficient. We establish a theory integrating representation learning with self-rewarding reinforcement learning, enabling end-to-end training of the non-differentiable AdaptiveNN without additional supervision on fixation locations. We assess AdaptiveNN on 17 benchmarks spanning 9 tasks, including large-scale visual recognition, fine-grained discrimination, visual search, processing images from real driving and medical scenarios, language-driven embodied AI, and side-by-side comparisons with humans. AdaptiveNN achieves up to 28x inference cost reduction without sacrificing accuracy, flexibly adapts to varying task demands and resource budgets without retraining, and provides enhanced interpretability via its fixation patterns, demonstrating a promising avenue toward efficient, flexible, and interpretable computer vision. Furthermore, AdaptiveNN exhibits closely human-like perceptual behaviors in many cases, revealing its potential as a valuable tool for investigating visual cognition. Code is available at https://github.com/LeapLabTHU/AdaptiveNN.


【6】SETrLUSI: Stochastic Ensemble Multi-Source Transfer Learning Using Statistical Invariant
标题:SETrLUSI:使用统计不变量的随机集成多源迁移学习
链接:https://arxiv.org/abs/2509.15593

作者:, Yiwei Song, Yuanhai Shao
摘要:在迁移学习中,一个源领域往往携带多样的知识,而不同领域通常强调不同类型的知识。与传统迁移学习方法中只处理来自所有领域的单一类型知识不同,我们为多源迁移学习引入了一种以统计不变量(SI)形式实现弱收敛模式的集成学习框架,称为使用统计不变量的随机集成多源迁移学习(SETrLUSI)。所提出的SI从源域和目标域中提取并整合各种类型的知识,不仅有效地利用了多样的知识,而且加快了收敛过程。此外,SETrLUSI结合了随机SI选择、按比例的源域采样和目标域自助采样(bootstrapping),在提高模型稳定性的同时提升了训练效率。实验表明,SETrLUSI具有良好的收敛性,并以更低的时间成本优于相关方法。
摘要:In transfer learning, a source domain often carries diverse knowledge, and different domains usually emphasize different types of knowledge. Different from handling only a single type of knowledge from all domains in traditional transfer learning methods, we introduce an ensemble learning framework with a weak mode of convergence in the form of Statistical Invariant (SI) for multi-source transfer learning, formulated as Stochastic Ensemble Multi-Source Transfer Learning Using Statistical Invariant (SETrLUSI). The proposed SI extracts and integrates various types of knowledge from both source and target domains, which not only effectively utilizes diverse knowledge but also accelerates the convergence process. Further, SETrLUSI incorporates stochastic SI selection, proportional source domain sampling, and target domain bootstrapping, which improves training efficiency while enhancing model stability. Experiments show that SETrLUSI has good convergence and outperforms related methods with a lower time cost.


强化学习(5篇)

【1】RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
标题:RLinf:通过宏到微流程转换实现灵活高效的大规模强化学习
链接:https://arxiv.org/abs/2509.15965

作者:Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Xiangyuan Wang, Xu Fu, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang
备注:GitHub Repo: this https URL
摘要:强化学习(RL)在推进通用人工智能、智能体智能和具身智能方面表现出巨大潜力。然而,RL工作流固有的异构性和动态性往往导致现有系统上硬件利用率低、训练速度慢。在本文中,我们提出了RLinf,一个高性能RL训练系统,其基于我们的关键观察:高效RL训练的主要障碍在于系统灵活性。为了最大限度地提高灵活性和效率,RLinf构建在一种称为宏到微流转换(M2Flow)的新型RL系统设计范式之上,该范式在时间和空间两个维度上自动分解高层、易于组合的RL工作流,并将其重新组合为优化的执行流。在RLinf worker自适应通信能力的支持下,我们设计了上下文切换和弹性流水线来实现M2Flow转换,并设计了一种基于性能剖析的调度策略来生成最优执行计划。对推理RL和具身RL任务的广泛评估表明,RLinf始终优于最先进的系统,在端到端训练吞吐量方面实现了1.1倍至2.13倍的加速。
摘要:Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to low hardware utilization and slow training on existing systems. In this paper, we present RLinf, a high-performance RL training system based on our key observation that the major roadblock to efficient RL training lies in system flexibility. To maximize flexibility and efficiency, RLinf is built atop a novel RL system design paradigm called macro-to-micro flow transformation (M2Flow), which automatically breaks down high-level, easy-to-compose RL workflows at both the temporal and spatial dimensions, and recomposes them into optimized execution flows. Supported by RLinf worker's adaptive communication capability, we devise context switching and elastic pipelining to realize M2Flow transformation, and a profiling-guided scheduling policy to generate optimal execution plans. Extensive evaluations on both reasoning RL and embodied RL tasks demonstrate that RLinf consistently outperforms state-of-the-art systems, achieving 1.1x-2.13x speedup in end-to-end training throughput.


【2】Nonconvex Regularization for Feature Selection in Reinforcement Learning
标题:强化学习中特征选择的非凸正则化
链接:https://arxiv.org/abs/2509.15652

作者:zuki, Konstantinos Slavakis
摘要:本文为强化学习(RL)中的特征选择提出了一种高效的批处理算法,并给出了理论收敛保证。为了减轻传统正则化方案固有的估计偏差,第一个贡献在经典最小二乘时间差分(LSTD)框架内扩展了策略评估,构造了一个由稀疏诱导、非凸的投影极小极大凹(PMC)惩罚正则化的Bellman残差目标。由于PMC惩罚的弱凸性,该公式可以被解释为一般非单调包含问题的一个特例。第二个贡献为求解这类问题的前向-反射-后向分裂(FRBS)算法建立了新的收敛条件。在基准数据集上的数值实验表明,所提出的方法大大优于最先进的特征选择方法,特别是在存在许多噪声特征的场景中。
摘要:This work proposes an efficient batch algorithm for feature selection in reinforcement learning (RL) with theoretical convergence guarantees. To mitigate the estimation bias inherent in conventional regularization schemes, the first contribution extends policy evaluation within the classical least-squares temporal-difference (LSTD) framework by formulating a Bellman-residual objective regularized with the sparsity-inducing, nonconvex projected minimax concave (PMC) penalty. Owing to the weak convexity of the PMC penalty, this formulation can be interpreted as a special instance of a general nonmonotone-inclusion problem. The second contribution establishes novel convergence conditions for the forward-reflected-backward splitting (FRBS) algorithm to solve this class of problems. Numerical experiments on benchmark datasets demonstrate that the proposed approach substantially outperforms state-of-the-art feature-selection methods, particularly in scenarios with many noisy features.


【3】Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem
标题:完全去中心化的合作多智能体强化学习是一个上下文建模问题
链接:https://arxiv.org/abs/2509.15519

作者:Bingkun Bao, Yang Gao
摘要:本文研究完全去中心化的合作多智能体强化学习,其中每个智能体只能观察到状态、其局部动作和共享奖励。无法访问其他智能体的动作往往会导致价值函数更新过程中的非平稳性以及价值函数估计过程中的相对过度泛化,阻碍有效的合作策略学习。然而,由于无法在完全去中心化的设置中对其他智能体的联合策略建模,现有工作未能同时解决这两个问题。为了克服这一局限,我们提出了一种名为动态感知上下文(DAC)的新方法,它将每个智能体局部感知到的任务形式化为上下文马尔可夫决策过程,并通过动态感知的上下文建模进一步解决非平稳性和相对过度泛化问题。具体而言,DAC将每个智能体非平稳的局部任务动态归因于未观测上下文之间的切换,每个上下文对应一个不同的联合策略。然后,DAC使用潜变量对逐步动态分布进行建模,并将其称为上下文。对于每个智能体,DAC引入了基于上下文的价值函数,以解决价值函数更新过程中的非平稳性问题。对于价值函数估计,推导出一个乐观的边际价值,以促进合作动作的选择,从而解决相对过度泛化问题。在实验上,我们在多种合作任务(包括矩阵博弈、捕食者-猎物和SMAC)上评估了DAC,其相对于多个基线的优越性能验证了它的有效性。
摘要:This paper studies fully decentralized cooperative multi-agent reinforcement learning, where each agent solely observes the states, its local actions, and the shared rewards. The inability to access other agents' actions often leads to non-stationarity during value function updates and relative overgeneralization during value function estimation, hindering effective cooperative policy learning. However, existing works fail to address both issues simultaneously, due to their inability to model the joint policy of other agents in a fully decentralized setting. To overcome this limitation, we propose a novel method named Dynamics-Aware Context (DAC), which formalizes the task, as locally perceived by each agent, as an Contextual Markov Decision Process, and further addresses both non-stationarity and relative overgeneralization through dynamics-aware context modeling. Specifically, DAC attributes the non-stationary local task dynamics of each agent to switches between unobserved contexts, each corresponding to a distinct joint policy. Then, DAC models the step-wise dynamics distribution using latent variables and refers to them as contexts. For each agent, DAC introduces a context-based value function to address the non-stationarity issue during value function update. For value function estimation, an optimistic marginal value is derived to promote the selection of cooperative actions, thereby addressing the relative overgeneralization issue. Experimentally, we evaluate DAC on various cooperative tasks (including matrix game, predator and prey, and SMAC), and its superior performance against multiple baselines validates its effectiveness.


【4】Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning
标题:Fleming-R1:通过强化学习迈向专家级医学推理
链接:https://arxiv.org/abs/2509.15279

作者:Derek Li, Yan Shu, Robin Chen, Derek Duan, Teng Fang, Bryan Dai
摘要:虽然大型语言模型在医学应用中显示出前景,但由于既需要准确的答案又需要透明的推理过程,实现专家级的临床推理仍然具有挑战性。为了应对这一挑战,我们引入了Fleming-R1,一种通过三项互补创新为可验证医学推理而设计的模型。首先,我们的面向推理的数据策略(RODS)将精选的医疗QA数据集与知识图谱引导的合成相结合,以提高对代表性不足的疾病、药物和多跳推理链的覆盖率。其次,我们采用思维链(CoT)冷启动从教师模型中蒸馏高质量的推理轨迹,建立鲁棒的推理先验。第三,我们使用组相对策略优化(GRPO)实现了一个两阶段的可验证奖励强化学习(RLVR)框架,通过自适应难样本挖掘来巩固核心推理技能,同时针对持续性失败模式。在多样的医学基准中,Fleming-R1带来了可观的参数效率改进:7B变体超过了大得多的基线,而32B模型与GPT-4o接近持平,并始终优于强大的开源替代品。这些结果表明,结构化数据设计、面向推理的初始化和可验证的强化学习可以使临床推理超越简单的精度优化。我们公开发布Fleming-R1,以促进医疗AI的透明、可复现和可审计的进展,从而在高风险的临床环境中实现更安全的部署。
摘要:While large language models show promise in medical applications, achieving expert-level clinical reasoning remains challenging due to the need for both accurate answers and transparent reasoning processes. To address this challenge, we introduce Fleming-R1, a model designed for verifiable medical reasoning through three complementary innovations. First, our Reasoning-Oriented Data Strategy (RODS) combines curated medical QA datasets with knowledge-graph-guided synthesis to improve coverage of underrepresented diseases, drugs, and multi-hop reasoning chains. Second, we employ Chain-of-Thought (CoT) cold start to distill high-quality reasoning trajectories from teacher models, establishing robust inference priors. Third, we implement a two-stage Reinforcement Learning from Verifiable Rewards (RLVR) framework using Group Relative Policy Optimization, which consolidates core reasoning skills while targeting persistent failure modes through adaptive hard-sample mining. Across diverse medical benchmarks, Fleming-R1 delivers substantial parameter-efficient improvements: the 7B variant surpasses much larger baselines, while the 32B model achieves near-parity with GPT-4o and consistently outperforms strong open-source alternatives. These results demonstrate that structured data design, reasoning-oriented initialization, and verifiable reinforcement learning can advance clinical reasoning beyond simple accuracy optimization. We release Fleming-R1 publicly to promote transparent, reproducible, and auditable progress in medical AI, enabling safer deployment in high-stakes clinical environments.
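
下面给出组相对策略优化(GRPO)中优势计算的一个极简示意(非论文官方实现):对同一提示采样一组回答,将其可验证的0/1奖励在组内标准化为相对优势,无需学习价值网络:

import numpy as np

def group_relative_advantages(rewards):
    # rewards: 同一提示下一组采样回答的可验证奖励
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# 每条采样轨迹的token按其adv成比例地被强化或抑制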


【5】Quantum Reinforcement Learning with Dynamic-Circuit Qubit Reuse and Grover-Based Trajectory Optimization
标题:具有动态电路量子位重用和基于Grover的轨迹优化的量子强化学习
链接:https://arxiv.org/abs/2509.16002

作者: Su, Shaswot Shresthamali, Masaaki Kondo
摘要:开发了一个完全量子强化学习框架,该框架集成了量子马尔可夫决策过程、基于动态电路的量子位重用和Grover的轨迹优化算法。该框架完全在量子域内对状态、动作、奖励和转换进行编码,从而通过叠加和消除经典子程序来并行探索状态-动作序列。动态电路操作,包括中间电路测量和重置,允许跨多个代理-环境交互重用相同的物理量子位,将T时间步长的量子位需求从7*T减少到7,同时保持逻辑连续性。量子算术计算轨迹回报,Grover的搜索应用于这些评估轨迹的叠加,以放大测量回报最高的轨迹的概率,从而加速最佳策略的识别。模拟表明,基于动态电路的实现保持了轨迹保真度,同时相对于静态设计减少了66%的量子位使用。在IBM Heron级量子硬件上的实验部署证实了该框架在当前量子处理器的约束下运行,并验证了在噪声中等尺度量子条件下完全量子多步强化学习的可行性。该框架推进了量子强化学习在大规模顺序决策任务中的可扩展性和实际应用。
摘要:A fully quantum reinforcement learning framework is developed that integrates a quantum Markov decision process, dynamic circuit-based qubit reuse, and Grover's algorithm for trajectory optimization. The framework encodes states, actions, rewards, and transitions entirely within the quantum domain, enabling parallel exploration of state-action sequences through superposition and eliminating classical subroutines. Dynamic circuit operations, including mid-circuit measurement and reset, allow reuse of the same physical qubits across multiple agent-environment interactions, reducing qubit requirements from 7*T to 7 for T time steps while preserving logical continuity. Quantum arithmetic computes trajectory returns, and Grover's search is applied to the superposition of these evaluated trajectories to amplify the probability of measuring those with the highest return, thereby accelerating the identification of the optimal policy. Simulations demonstrate that the dynamic-circuit-based implementation preserves trajectory fidelity while reducing qubit usage by 66 percent relative to the static design. Experimental deployment on IBM Heron-class quantum hardware confirms that the framework operates within the constraints of current quantum processors and validates the feasibility of fully quantum multi-step reinforcement learning under noisy intermediate-scale quantum conditions. This framework advances the scalability and practical application of quantum reinforcement learning for large-scale sequential decision-making tasks.


符号|符号学习(1篇)

【1】Improving Monte Carlo Tree Search for Symbolic Regression
标题:改进蒙特卡罗树搜索以实现符号回归
链接:https://arxiv.org/abs/2509.15929

作者:Huang, Daniel Zhengyu Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen
摘要:符号回归旨在发现满足预期目标(如拟合数据)的简洁、可解释的数学表达式,这构成一个高度组合的优化问题。虽然遗传编程一直是主流方法,但最近的研究探索了用强化学习方法提高搜索效率。蒙特卡罗树搜索(MCTS)凭借其通过引导式搜索平衡探索与利用的能力,已成为符号表达式发现的一种有前途的技术。然而,其传统的老虎机策略和顺序化的符号构造往往限制了性能。在这项工作中,我们提出了一个改进的MCTS符号回归框架,通过两项关键创新解决这些限制:(1)一种为识别全局最优表达式而定制的极值老虎机分配策略,在多项式奖励衰减假设下具有有限时间性能保证;(2)受进化启发的状态跳跃动作,如变异和交叉,使搜索能够非局部地转移到搜索空间中有希望的区域。这些状态跳跃动作还在搜索过程中重塑了奖励格局,同时提高了鲁棒性和效率。我们对这些改进的影响进行了深入的数值研究,并在各种数据集(包括有真值的和黑盒数据集)上将我们的方法与现有符号回归方法进行了基准比较。我们的方法在恢复率方面与最先进的库取得了有竞争力的性能,并在准确性与模型复杂性的帕累托边界上占据有利位置。代码可在https://github.com/PKU-CMEGroup/MCTS-4-SR上获得。
摘要:Symbolic regression aims to discover concise, interpretable mathematical expressions that satisfy desired objectives, such as fitting data, posing a highly combinatorial optimization problem. While genetic programming has been the dominant approach, recent efforts have explored reinforcement learning methods for improving search efficiency. Monte Carlo Tree Search (MCTS), with its ability to balance exploration and exploitation through guided search, has emerged as a promising technique for symbolic expression discovery. However, its traditional bandit strategies and sequential symbol construction often limit performance. In this work, we propose an improved MCTS framework for symbolic regression that addresses these limitations through two key innovations: (1) an extreme bandit allocation strategy tailored for identifying globally optimal expressions, with finite-time performance guarantees under polynomial reward decay assumptions; and (2) evolution-inspired state-jumping actions such as mutation and crossover, which enable non-local transitions to promising regions of the search space. These state-jumping actions also reshape the reward landscape during the search process, improving both robustness and efficiency. We conduct a thorough numerical study to the impact of these improvements and benchmark our approach against existing symbolic regression methods on a variety of datasets, including both ground-truth and black-box datasets. Our approach achieves competitive performance with state-of-the-art libraries in terms of recovery rate, attains favorable positions on the Pareto frontier of accuracy versus model complexity. Code is available at https://github.com/PKU-CMEGroup/MCTS-4-SR.
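
以下为"极值老虎机"选择思想的一个示意(非官方实现):符号回归只关心单个最优表达式,故子节点按观测到的最大奖励(而非UCT中的平均奖励)加探索奖励进行选择:

import math

class Node:
    def __init__(self):
        self.children = {}
        self.visits = 0
        self.best_reward = float("-inf")   # 运行中的最大奖励, 而非均值

def select(node, c=1.0):
    total = sum(ch.visits for ch in node.children.values()) + 1
    def score(ch):
        bonus = c * math.sqrt(math.log(total) / (ch.visits + 1))
        return ch.best_reward + bonus
    return max(node.children.values(), key=score)

def backpropagate(path, reward):
    for n in path:
        n.visits += 1
        n.best_reward = max(n.best_reward, reward)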


医学相关(2篇)

【1】The Missing Piece: A Case for Pre-Training in 3D Medical Object Detection
标题:缺失的部分:3D医疗对象检测预训练的案例
链接:https://arxiv.org/abs/2509.15947

作者: Eckstein, Constantin Ulrich, Michael Baumgartner, Jessica Kächele, Dimitrios Bounias, Tassilo Wald, Ralf Floca, Klaus H. Maier-Hein
备注:MICCAI 2025
摘要:大规模预训练有望推进3D医学目标检测,这是准确的计算机辅助诊断的关键组成部分。然而,与预训练已展现出显著收益的分割任务相比,它仍然未得到充分探索。现有的3D目标检测预训练方法依赖于2D医学数据或自然图像预训练,未能充分利用3D体数据信息。在这项工作中,我们首次系统地研究了如何将现有的预训练方法集成到最先进的检测架构中,涵盖CNN和Transformer。我们的结果表明,预训练能在各种任务和数据集上持续提高检测性能。值得注意的是,基于重建的自监督预训练优于有监督预训练,而对比预训练对3D医学目标检测没有明显的好处。我们的代码可在https://github.com/MIC-DKFZ/nnDetection-finetuning上公开获取。
摘要:Large-scale pre-training holds the promise to advance 3D medical object detection, a crucial component of accurate computer-aided diagnosis. Yet, it remains underexplored compared to segmentation, where pre-training has already demonstrated significant benefits. Existing pre-training approaches for 3D object detection rely on 2D medical data or natural image pre-training, failing to fully leverage 3D volumetric information. In this work, we present the first systematic study of how existing pre-training methods can be integrated into state-of-the-art detection architectures, covering both CNNs and Transformers. Our results show that pre-training consistently improves detection performance across various tasks and datasets. Notably, reconstruction-based self-supervised pre-training outperforms supervised pre-training, while contrastive pre-training provides no clear benefit for 3D medical object detection. Our code is publicly available at: https://github.com/MIC-DKFZ/nnDetection-finetuning.


【2】Recent Advancements in Microscopy Image Enhancement using Deep Learning: A Survey
标题:使用深度学习在显微镜图像增强方面的最新进展:调查
链接:https://arxiv.org/abs/2509.15363

作者:Dutta, Neeharika Sonowal, Risheraj Barauh, Deepjyoti Chetia, Sanjib Kr Kalita
备注:7 pages, 3 figures and 1 table. 2024 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI). IEEE, 2024
摘要:显微图像增强对于在微观尺度上理解生物细胞和材料的细节起着关键作用。近年来,显微图像增强的进步有了显着的增长,特别是在深度学习方法的帮助下。这篇调查论文旨在提供这种快速发展的最先进方法的快照,重点关注其演变,应用,挑战和未来方向。核心讨论围绕超分辨率显微图像增强、重建和去噪等关键领域进行,每个领域都根据其当前趋势及其深度学习的实际效用进行了探索。
摘要:Microscopy image enhancement plays a pivotal role in understanding the details of biological cells and materials at microscopic scales. In recent years, there has been a significant rise in the advancement of microscopy image enhancement, specifically with the help of deep learning methods. This survey paper aims to provide a snapshot of this rapidly growing state-of-the-art method, focusing on its evolution, applications, challenges, and future directions. The core discussions take place around the key domains of microscopy image enhancement of super-resolution, reconstruction, and denoising, with each domain explored in terms of its current trends and their practical utility of deep learning.


蒸馏|知识提取(2篇)

【1】FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
标题:FocalCodec-Stream:基于因果蒸馏的流式低比特率语音编码
链接:https://arxiv.org/abs/2509.16195

作者:a Libera, Cem Subakan, Mirco Ravanelli
备注:5 pages, 1 figure
摘要:神经音频编解码器是现代生成音频管线的基本组成部分。虽然最近的编解码器实现了强大的低比特率重建,并为下游任务提供了强大的表示,但大多数编解码器不可流式传输,限制了它们在实时应用中的使用。我们提出了FocalCodec-Stream,一种基于焦点调制(focal modulation)的混合编解码器,它将语音压缩为单一二进制码本,码率为0.55 - 0.80 kbps,理论延迟为80毫秒。我们的方法将WavLM的多阶段因果蒸馏与针对性的架构改进相结合,包括一个在延迟约束下提升质量的轻量级精炼模块。实验表明,FocalCodec-Stream在可比比特率下优于现有的可流式编解码器,同时保留了语义和声学信息,在重建质量、下游任务性能、延迟和效率之间取得了有利的权衡。代码和检查点将在https://github.com/lucadellalib/focalcodec上发布。
摘要:Neural audio codecs are a fundamental component of modern generative audio pipelines. Although recent codecs achieve strong low-bitrate reconstruction and provide powerful representations for downstream tasks, most are non-streamable, limiting their use in real-time applications. We present FocalCodec-Stream, a hybrid codec based on focal modulation that compresses speech into a single binary codebook at 0.55 - 0.80 kbps with a theoretical latency of 80 ms. Our approach combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module that enhances quality under latency constraints. Experiments show that FocalCodec-Stream outperforms existing streamable codecs at comparable bitrates, while preserving both semantic and acoustic information. The result is a favorable trade-off between reconstruction quality, downstream task performance, latency, and efficiency. Code and checkpoints will be released at https://github.com/lucadellalib/focalcodec.


【2】RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation
标题:RMT-KD:随机矩阵理论因果知识蒸馏
链接:https://arxiv.org/abs/2509.15724

作者:tori, Nastaran Darabi, Sureshkumar Senthilkumar, Amit Ranjan Trivedi
备注:5 pages, submitted to ICASSP 2026, September 2025
摘要:BERT和ResNet等大型深度学习模型可以实现最先进的性能,但由于其规模和计算需求,在边缘部署成本高昂。我们提出了RMT-KD,一种利用随机矩阵理论(RMT)进行知识蒸馏、以迭代方式缩减网络规模的压缩方法。RMT-KD不进行剪枝或启发式秩选择,而是仅保留通过隐藏表示的谱特性识别出的信息方向。基于RMT的因果约简逐层应用,并配合自蒸馏以保持稳定性和准确性。在GLUE、AG News和CIFAR-10上,RMT-KD实现了高达80%的参数缩减,精度损失仅为2%,推理速度提高了2.8倍,功耗降低了近一半。这些结果确立了RMT-KD作为一种有数学依据的网络蒸馏方法的地位。
摘要:Large deep learning models such as BERT and ResNet achieve state-of-the-art performance but are costly to deploy at the edge due to their size and compute demands. We present RMT-KD, a compression method that leverages Random Matrix Theory (RMT) for knowledge distillation to iteratively reduce network size. Instead of pruning or heuristic rank selection, RMT-KD preserves only informative directions identified via the spectral properties of hidden representations. RMT-based causal reduction is applied layer by layer with self-distillation to maintain stability and accuracy. On GLUE, AG News, and CIFAR-10, RMT-KD achieves up to 80% parameter reduction with only 2% accuracy loss, delivering 2.8x faster inference and nearly halved power consumption. These results establish RMT-KD as a mathematically grounded approach to network distillation.
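
以下为该RMT思想的一个示意(非官方实现;噪声尺度估计为粗略假设):对某层激活矩阵做SVD,只保留奇异值超过Marchenko-Pastur噪声主体边缘的方向,保留的子空间决定压缩后的层宽:

import numpy as np

def informative_directions(H):
    # H: (n_samples, dim) 某层的隐藏激活
    n, d = H.shape
    H = H - H.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    sigma2 = np.median(s ** 2) / n                        # 粗略的噪声方差估计
    mp_edge = np.sqrt(n * sigma2) * (1 + np.sqrt(d / n))  # 奇异值的MP主体边缘
    return Vt[s > mp_edge]                                # 信息方向的基

basis = informative_directions(np.random.randn(512, 128))
print(basis.shape)  # (k, 128): k个谱上显著的方向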


聚类(2篇)

【1】Personalized Federated Learning with Heat-Kernel Enhanced Tensorized Multi-View Clustering
标题:具有热核增强张量化多视图聚类的个性化联邦学习
链接:https://arxiv.org/abs/2509.16101

作者:P. Sinaga
备注:26 pages, 3 algorithms, and 3 figures
摘要:我们提出了一个鲁棒的个性化联邦学习框架,该框架将热核增强的张量化多视图模糊c均值聚类与先进的张量分解技术相结合。我们的方法将源自量子场论的热核系数与Tucker分解和正则多元分解(CANDECOMP/PARAFAC)相集成,以变换传统的距离度量,并高效地表示高维多视图结构。该框架采用矩阵化(matricization)和向量化技术,通过N路广义张量促进隐藏结构和多线性关系的发现。所提出的方法引入了一个双层优化方案:在N阶输入张量上运行的、带张量分解的局部热核增强模糊聚类,以及带有隐私保护个性化机制的张量因子联邦聚合。局部阶段采用张量化的核欧氏距离变换和Tucker分解来发现多视图张量数据中客户端特有的模式,而全局聚合过程通过差分隐私保护协议在客户端之间协调张量因子(核心张量和因子矩阵)。这种张量化方法能够高效处理高维多视图数据,并通过低秩张量近似显著节省通信开销。
摘要:We present a robust personalized federated learning framework that leverages heat-kernel enhanced tensorized multi-view fuzzy c-means clustering with advanced tensor decomposition techniques. Our approach integrates heat-kernel coefficients adapted from quantum field theory with Tucker decomposition and canonical polyadic decomposition (CANDECOMP/PARAFAC) to transform conventional distance metrics and efficiently represent high-dimensional multi-view structures. The framework employs matricization and vectorization techniques to facilitate the discovery of hidden structures and multilinear relationships via N-way generalized tensors. The proposed method introduces a dual-level optimization scheme: local heat-kernel enhanced fuzzy clustering with tensor decomposition operating on order-N input tensors, and federated aggregation of tensor factors with privacy-preserving personalization mechanisms. The local stage employs tensorized kernel Euclidean distance transformations and Tucker decomposition to discover client-specific patterns in multi-view tensor data, while the global aggregation process coordinates tensor factors (core tensors and factor matrices) across clients through differential privacy-preserving protocols. This tensorized approach enables efficient handling of high-dimensional multi-view data with significant communication savings through low-rank tensor approximations.
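
以下为热核增强距离这一核心变换的一个极简示意(非官方实现;热核系数为假设的占位值),将欧氏距离替换为由热核相似度诱导的距离:

import numpy as np

def heat_kernel_distance(x, y, t=1.0, coeffs=(1.0, 0.5, 0.25)):
    # k_t(x, y) ~ exp(-||x-y||^2 / 4t) * sum_i c_i t^i (系数为占位)
    d2 = np.sum((x - y) ** 2)
    series = sum(c * t ** i for i, c in enumerate(coeffs))
    k_xy = np.exp(-d2 / (4.0 * t)) * series
    k_xx = series                                  # 自相似度 k_t(x, x)
    # 核诱导距离: d^2 = k(x,x) + k(y,y) - 2k(x,y) = 2(k_xx - k_xy)
    return np.sqrt(max(2.0 * (k_xx - k_xy), 0.0))

d = heat_kernel_distance(np.array([0.0, 1.0]), np.array([1.0, 0.5]))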


【2】FedHK-MVFC: Federated Heat Kernel Multi-View Clustering
标题:FedHK-MVFC:联邦热核多视图聚类
链接:https://arxiv.org/abs/2509.15844

作者:P. Sinaga
备注:41 pages, 9 figures, and 3 tables
摘要:在分布式人工智能和以隐私为中心的医疗应用领域,我们提出了一个将量子场论与联邦医疗分析相联系的多视图聚类框架。我们的方法使用来自谱分析的热核系数将欧氏距离转换为几何感知的相似性度量,以捕获多样化医疗数据的结构。我们通过具有收敛保证的热核距离(HKD)变换来实现这一点。我们开发了两种算法:用于集中式分析的热核增强多视图模糊聚类(HK-MVFC),以及用于跨医院安全、隐私保护学习的联邦热核多视图模糊聚类(FedHK-MVFC),后者使用差分隐私和安全聚合来促进符合HIPAA的协作。在心血管患者合成数据集上的测试表明,与集中式方法相比,聚类准确率提高了8-12%,通信减少了70%,效率保持率为98.2%。在两家医院的10,000份患者记录上进行的验证证明,它对涉及ECG、心脏成像和行为数据的协作表型分析非常有用。我们的理论贡献包括具有收敛性证明的更新规则、自适应视图加权和隐私保护协议。这为医疗保健中的几何感知联邦学习树立了新标准,将高等数学转化为分析敏感医疗数据的可行解决方案,同时确保严谨性和临床相关性。
摘要:In the realm of distributed AI and privacy-focused medical applications, we propose a framework for multi-view clustering that links quantum field theory with federated healthcare analytics. Our method uses heat-kernel coefficients from spectral analysis to convert Euclidean distances into geometry-aware similarity measures, capturing the structure of diverse medical data. We lay this out through the Heat Kernel Distance (HKD) transformation with convergence guarantees. Two algorithms are developed: Heat Kernel-Enhanced Multi-View Fuzzy Clustering (HK-MVFC) for central analysis, and Federated Heat Kernel Multi-View Fuzzy Clustering (FedHK-MVFC) for secure, privacy-preserving learning across hospitals using differential privacy and secure aggregation to facilitate HIPAA-compliant collaboration. Tests on synthetic datasets of cardiovascular patients show an $8-12 \%$ increase in clustering accuracy, $70 \%$ reduced communication, and $98.2 \%$ efficiency retention over centralized methods. Validated on 10,000 patient records across two hospitals, it proves useful for collaborative phenotyping involving ECG, cardiac imaging, and behavioral data. Our theoretical contributions include update rules with proven convergence, adaptive view weighting, and privacy-preserving protocols. This presents a new standard for geometry-aware federated learning in healthcare, turning advanced math into workable solutions for analyzing sensitive medical data while ensuring both rigor and clinical relevance.


自动驾驶|车辆|车道检测等(1篇)

【1】Exploring multimodal implicit behavior learning for vehicle navigation in simulated cities
标题:探索模拟城市中车辆导航的多模式隐式行为学习
链接:https://arxiv.org/abs/2509.15400

作者:an Antonelo, Gustavo Claudio Karl Couto, Christian Möller
备注:ENIAC conference
摘要:标准行为克隆(BC)无法学习多模态驾驶决策,即同一场景存在多个有效动作的情况。我们探索将隐式行为克隆(IBC)与基于能量的模型(EBM)相结合,以更好地捕捉这种多模态性。我们提出了数据增强的IBC(DA-IBC),它通过扰动专家动作来构造IBC训练的反例,并为无导数推理使用更好的初始化,从而改进学习。在CARLA模拟器中使用鸟瞰图输入的实验表明,在旨在评估多模态行为学习的城市驾驶任务测试环境中,DA-IBC优于标准IBC。学习到的能量景观能够表示多模态动作分布,而这是BC无法实现的。
摘要:Standard Behavior Cloning (BC) fails to learn multimodal driving decisions, where multiple valid actions exist for the same scenario. We explore Implicit Behavioral Cloning (IBC) with Energy-Based Models (EBMs) to better capture this multimodality. We propose Data-Augmented IBC (DA-IBC), which improves learning by perturbing expert actions to form the counterexamples of IBC training and using better initialization for derivative-free inference. Experiments in the CARLA simulator with Bird's-Eye View inputs demonstrate that DA-IBC outperforms standard IBC in urban driving tasks designed to evaluate multimodal behavior learning in a test environment. The learned energy landscapes are able to represent multimodal action distributions, which BC fails to achieve.
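
以下为DA-IBC训练信号的一个示意(非官方实现;energy_net的接口为假设,输入为观测与动作、输出为标量能量):反例由扰动专家动作构造,能量模型用InfoNCE式损失训练,使专家动作获得最低能量:

import torch
import torch.nn.functional as F

def ibc_loss(energy_net, obs, expert_action, n_neg=16, sigma=0.3):
    B, A = expert_action.shape
    noise = sigma * torch.randn(B, n_neg, A)
    negatives = expert_action.unsqueeze(1) + noise        # 扰动专家动作作反例
    actions = torch.cat([expert_action.unsqueeze(1), negatives], dim=1)
    obs_rep = obs.unsqueeze(1).expand(-1, n_neg + 1, -1)
    energies = energy_net(obs_rep.reshape(B * (n_neg + 1), -1),
                          actions.reshape(B * (n_neg + 1), -1)).view(B, -1)
    # 索引0(真实专家动作)应具有最低能量
    return F.cross_entropy(-energies, torch.zeros(B, dtype=torch.long))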


联邦学习|隐私保护|加密(1篇)

【1】Hybrid Deep Learning-Federated Learning Powered Intrusion Detection System for IoT/5G Advanced Edge Computing Network
标题:适用于物联网/5G高级边缘计算网络的混合深度学习-联邦学习驱动的入侵检测系统
链接:https://arxiv.org/abs/2509.15555

作者:dar, Sasa Maric, Robert Abbas
摘要:物联网和5G-Advanced应用的指数级扩展扩大了DDoS、恶意软件和零日入侵的攻击面。我们提出了一种入侵检测系统,在隐私保护联邦学习(FL)框架内融合了卷积神经网络(CNN)、双向LSTM(BiLSTM)和自动编码器(AE)瓶颈。CNN-BiLSTM分支捕获局部和门控的交叉特征交互,而AE强调基于重建的异常敏感性。训练在边缘设备上进行,无需共享原始数据。在UNSW-NB15(二分类)上,融合模型的AUC达到99.59%,F1达到97.36%;混淆矩阵分析显示错误率均衡,精确率和召回率均较高。在我们的测试硬件上,每个样本的平均推理时间约为0.0476 ms,完全在小于10 ms的URLLC预算范围内,支持边缘部署。我们还讨论了可解释性、漂移容忍度和FL考虑因素,以实现合规、可扩展的5G-Advanced物联网安全。
摘要:The exponential expansion of IoT and 5G-Advanced applications has enlarged the attack surface for DDoS, malware, and zero-day intrusions. We propose an intrusion detection system that fuses a convolutional neural network (CNN), a bidirectional LSTM (BiLSTM), and an autoencoder (AE) bottleneck within a privacy-preserving federated learning (FL) framework. The CNN-BiLSTM branch captures local and gated cross-feature interactions, while the AE emphasizes reconstruction-based anomaly sensitivity. Training occurs across edge devices without sharing raw data. On UNSW-NB15 (binary), the fused model attains AUC 99.59 percent and F1 97.36 percent; confusion-matrix analysis shows balanced error rates with high precision and recall. Average inference time is approximately 0.0476 ms per sample on our test hardware, which is well within the less than 10 ms URLLC budget, supporting edge deployment. We also discuss explainability, drift tolerance, and FL considerations for compliant, scalable 5G-Advanced IoT security.
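
以下为该融合检测器的一个结构示意(非官方实现,特征维度与超参数均为假设):CNN-BiLSTM分支提取判别特征,AE分支的重建误差作为异常信号,二者拼接后输出攻击概率:

import torch
import torch.nn as nn

class FusedIDS(nn.Module):
    def __init__(self, n_features=42, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(1, 32, 3, padding=1), nn.ReLU())
        self.bilstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.ae = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                nn.Linear(16, n_features))     # 自动编码器瓶颈
        self.head = nn.Linear(2 * hidden + 1, 1)

    def forward(self, x):                        # x: (B, n_features)
        h = self.cnn(x.unsqueeze(1))             # (B, 32, n_features)
        out, _ = self.bilstm(h.transpose(1, 2))  # (B, n_features, 2*hidden)
        feat = out[:, -1, :]                     # 取BiLSTM末步状态
        recon_err = ((self.ae(x) - x) ** 2).mean(dim=1, keepdim=True)
        return torch.sigmoid(self.head(torch.cat([feat, recon_err], dim=1)))

model = FusedIDS()
p = model(torch.rand(8, 42))                     # 每条流量的攻击概率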


推理|分析|理解|解释(2篇)

【1】Inference Offloading for Cost-Sensitive Binary Classification at the Edge
标题:边缘代价敏感二元分类的推理卸载
链接:https://arxiv.org/abs/2509.15674

作者:rayanan Moothedath, Umang Agarwal, Umeshraja N, James Richard Gross, Jaya Prakash Champati, Sharayu Moharir
摘要:我们专注于边缘智能系统中的二分类问题,其中假阴性比假阳性代价更高。该系统有一个紧凑的本地部署模型,并由一个更大的远程模型补充,后者需通过网络访问并产生卸载成本。对于每个样本,系统首先使用本地部署的模型进行推理;基于本地模型的输出,样本可能被卸载到远程模型。这项工作旨在理解这种分层推理(HI)系统中分类精度与卸载成本之间的基本权衡。为了优化该系统,我们提出了一个在线学习框架,持续自适应地调整本地模型置信度分数上的一对阈值。这对阈值决定本地模型的预测结果,以及样本是在本地分类还是卸载到远程模型。对于本地模型已校准的设置,我们给出了闭式解;对于更一般的未校准模型情形,我们提出了H2T2,一种在线双阈值分层推理策略,并证明它达到次线性遗憾。H2T2与模型无关,无需训练,在推理阶段利用有限反馈进行学习。在真实世界数据集上的仿真表明,H2T2始终优于朴素策略和单阈值HI策略,有时甚至超过离线最优。该策略还表现出对分布偏移的鲁棒性,并能有效适配不匹配的分类器。
摘要:We focus on a binary classification problem in an edge intelligence system where false negatives are more costly than false positives. The system has a compact, locally deployed model, which is supplemented by a larger, remote model, which is accessible via the network by incurring an offloading cost. For each sample, our system first uses the locally deployed model for inference. Based on the output of the local model, the sample may be offloaded to the remote model. This work aims to understand the fundamental trade-off between classification accuracy and these offloading costs within such a hierarchical inference (HI) system. To optimize this system, we propose an online learning framework that continuously adapts a pair of thresholds on the local model's confidence scores. These thresholds determine the prediction of the local model and whether a sample is classified locally or offloaded to the remote model. We present a closed-form solution for the setting where the local model is calibrated. For the more general case of uncalibrated models, we introduce H2T2, an online two-threshold hierarchical inference policy, and prove it achieves sublinear regret. H2T2 is model-agnostic, requires no training, and learns in the inference phase using limited feedback. Simulations on real-world datasets show that H2T2 consistently outperforms naive and single-threshold HI policies, sometimes even surpassing offline optima. The policy also demonstrates robustness to distribution shifts and adapts effectively to mismatched classifiers.
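下面用几行Python示意双阈值分层推理的决策规则(假设性草图:阈值取值仅作演示;H2T2对阈值的在线自适应更新此处未复现):

```python
def hi_decision(p_local, t_lo, t_hi):
    """示意:双阈值分层推理。p_local 为本地模型的正类置信度;
    落在 [t_lo, t_hi] 的不确定样本卸载给远端大模型,其余在本地判定。
    因假阴性代价更高,可将 t_hi 调低以更积极地卸载疑似阳性样本(假设性策略)。"""
    if p_local < t_lo:
        return "local: negative"
    if p_local > t_hi:
        return "local: positive"
    return "offload to remote"

for p in (0.05, 0.5, 0.95):
    print(p, "->", hi_decision(p, t_lo=0.2, t_hi=0.8))
```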


【2】Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis
标题:Stackelberg平均场博弈中的学习:非渐进分析
链接:https://arxiv.org/abs/2509.15392

作者:g, Benjamin Patrick Evans, Sujay Bhatt, Leo Ardon, Sumitra Ganesh, Alec Koppel
摘要:我们研究Stackelberg平均场博弈(MFG)中的策略优化,这是一个层次化框架,用于对单一领导者与无限大同质追随者群体之间的战略互动进行建模。该目标可以表述为一个结构化的双层优化问题,其中领导者需要在预判追随者响应的前提下,学习最大化自身奖励的策略。现有求解这类(及相关)问题的方法往往依赖领导者与追随者目标之间的限制性独立假设,由于嵌套循环的算法结构导致样本使用效率低下,并且缺乏有限时间收敛保证。为了解决这些限制,我们提出了AC-SMFG,这是一种在连续生成的马尔可夫样本上运行的单循环演员-评论家算法。该算法在领导者、一个代表性追随者和平均场的(半)梯度更新之间交替,在实践中易于实现。我们建立了该算法到Stackelberg目标稳定点的有限时间和有限样本收敛性。据我们所知,这是第一个具有非渐近收敛保证的Stackelberg MFG算法。我们的关键假设是一个"梯度对齐"条件,它要求领导者的完整策略梯度可以由其部分分量来近似,从而放松了现有的领导者-追随者独立性假设。在一系列成熟的经济学环境中的仿真结果表明,AC-SMFG在策略质量和收敛速度上优于现有的多智能体和MFG学习基线。
摘要:We study policy optimization in Stackelberg mean field games (MFGs), a hierarchical framework for modeling the strategic interaction between a single leader and an infinitely large population of homogeneous followers. The objective can be formulated as a structured bi-level optimization problem, in which the leader needs to learn a policy maximizing its reward, anticipating the response of the followers. Existing methods for solving these (and related) problems often rely on restrictive independence assumptions between the leader's and followers' objectives, use samples inefficiently due to nested-loop algorithm structure, and lack finite-time convergence guarantees. To address these limitations, we propose AC-SMFG, a single-loop actor-critic algorithm that operates on continuously generated Markovian samples. The algorithm alternates between (semi-)gradient updates for the leader, a representative follower, and the mean field, and is simple to implement in practice. We establish the finite-time and finite-sample convergence of the algorithm to a stationary point of the Stackelberg objective. To our knowledge, this is the first Stackelberg MFG algorithm with non-asymptotic convergence guarantees. Our key assumption is a "gradient alignment" condition, which requires that the full policy gradient of the leader can be approximated by a partial component of it, relaxing the existing leader-follower independence assumption. Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed.


检测相关(4篇)

【1】Network-Based Detection of Autism Spectrum Disorder Using Sustainable and Non-invasive Salivary Biomarkers
标题:使用可持续和非侵入性唾液生物标志物基于网络检测自闭症谱系障碍
链接:https://arxiv.org/abs/2509.16126

作者:. Fernandes, Robinson Sabino-Silva, Murillo G. Carneiro
摘要:自闭症谱系障碍(ASD)缺乏可靠的生物标志物,延误了早期诊断。使用ATR-FTIR光谱分析的159个唾液样本,我们开发了GANet,这是一种基于遗传算法的网络优化框架,利用PageRank和Degree进行基于重要性的特征表征。GANet系统地优化网络结构,从高维光谱数据中提取有意义的模式。与线性判别分析、支持向量机和深度学习模型相比,它具有更优越的性能,准确率为0.78,灵敏度为0.61,特异性为0.90,调和平均值为0.74。这些结果证明了GANet作为一种强大的、生物启发的、非侵入性的工具的潜力,用于精确的ASD检测和更广泛的基于光谱的健康应用。
摘要:Autism Spectrum Disorder (ASD) lacks reliable biological markers, delaying early diagnosis. Using 159 salivary samples analyzed by ATR-FTIR spectroscopy, we developed GANet, a genetic algorithm-based network optimization framework leveraging PageRank and Degree for importance-based feature characterization. GANet systematically optimizes network structure to extract meaningful patterns from high-dimensional spectral data. It achieved superior performance compared to linear discriminant analysis, support vector machines, and deep learning models, reaching 0.78 accuracy, 0.61 sensitivity, 0.90 specificity, and a 0.74 harmonic mean. These results demonstrate GANet's potential as a robust, bio-inspired, non-invasive tool for precise ASD detection and broader spectral-based health applications.
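下面给出一个与摘要思路对应的极简示意:在特征相关网络上用PageRank与度数刻画特征重要性(遗传算法对网络结构的搜索此处省略;建边阈值与得分合成方式均为示例假设):

```python
import numpy as np, networkx as nx

def ganet_style_importance(X, thresh=0.5):
    """示意:由特征间相关系数建网,再用 PageRank 与(加权)度数
    合成特征重要性得分。阈值与合成方式为假设,非论文的 GA 优化流程。"""
    C = np.abs(np.corrcoef(X, rowvar=False))     # 特征间相关系数矩阵
    n = C.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if C[i, j] > thresh:
                G.add_edge(i, j, weight=C[i, j])
    pr = nx.pagerank(G, weight="weight")
    deg = dict(G.degree(weight="weight"))
    return {f: pr[f] + deg.get(f, 0.0) / n for f in range(n)}

X = np.random.default_rng(0).normal(size=(100, 8))
print(sorted(ganet_style_importance(X).items(), key=lambda kv: -kv[1])[:3])
```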


【2】Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach
标题:实现大小不变的突出对象检测:一种通用评估和优化方法
链接:https://arxiv.org/abs/2509.15573

作者:ao, Qianqian Xu, Feiran Li, Boyu Han, Zhiyong Yang, Xiaochun Cao, Qingming Huang
摘要:本文研究了显着对象检测(SOD)中一个基本但未充分研究的问题:评估协议的大小不变属性,特别是在单个图像中出现多个大小显著不同的显着对象时。我们首先提出了一个新的角度来揭示现有的广泛使用的SOD指标的固有的大小敏感性。通过仔细的理论推导,我们表明,目前的SOD度量下的图像的评价结果基本上可以分解成几个可分离的条款的总和,与每个条款的贡献是成正比的,其相应的区域大小。因此,预测误差将由较大的区域主导,而较小但可能更重要的语义对象往往被忽略,导致有偏见的性能评估和实际退化。为了解决这一挑战,提出了一个通用的大小不变的评估(SIEva)框架。其核心思想是单独评估每个可分离的组件,然后聚合结果,从而有效地减轻对象之间大小不平衡的影响。在此基础上,我们进一步开发了一个专用的优化框架(SIOpt),它坚持大小不变的原则,并显着提高了在广泛的尺寸范围内的显着对象的检测。值得注意的是,SIOpt与模型无关,可以与各种SOD主干无缝集成。从理论上讲,我们还对SOD方法进行了概括分析,并提供了支持我们新评估协议有效性的证据。最后,综合实验说明了我们所提出的方法的有效性。该代码可在https://github.com/Ferry-Li/SI-SOD上获得。
摘要:This paper investigates a fundamental yet underexplored issue in Salient Object Detection (SOD): the size-invariant property for evaluation protocols, particularly in scenarios when multiple salient objects of significantly different sizes appear within a single image. We first present a novel perspective to expose the inherent size sensitivity of existing widely used SOD metrics. Through careful theoretical derivations, we show that the evaluation outcome of an image under current SOD metrics can be essentially decomposed into a sum of several separable terms, with the contribution of each term being directly proportional to its corresponding region size. Consequently, the prediction errors would be dominated by the larger regions, while smaller yet potentially more semantically important objects are often overlooked, leading to biased performance assessments and practical degradation. To address this challenge, a generic Size-Invariant Evaluation (SIEva) framework is proposed. The core idea is to evaluate each separable component individually and then aggregate the results, thereby effectively mitigating the impact of size imbalance across objects. Building upon this, we further develop a dedicated optimization framework (SIOpt), which adheres to the size-invariant principle and significantly enhances the detection of salient objects across a broad range of sizes. Notably, SIOpt is model-agnostic and can be seamlessly integrated with a wide range of SOD backbones. Theoretically, we also present generalization analysis of SOD methods and provide evidence supporting the validity of our new evaluation protocols. Finally, comprehensive experiments speak to the efficacy of our proposed approach. The code is available at https://github.com/Ferry-Li/SI-SOD.
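下面用Python粗略示意"尺寸不变评估"的核心思想:对每个显著区域分别计算指标再取平均,而非逐像素全局平均(假设性草图,以逐区域IoU代替论文中的具体度量分解):

```python
import numpy as np
from scipy import ndimage

def size_invariant_score(pred, gt):
    """示意:对 GT 中每个连通显著区域单独评估再取平均,
    从而避免大目标主导整体得分(与尺寸敏感的逐像素平均对比)。"""
    labels, n = ndimage.label(gt > 0.5)
    if n == 0:
        return float("nan")
    scores = []
    for k in range(1, n + 1):
        mask = labels == k
        inter = np.logical_and(pred > 0.5, mask).sum()
        union = np.logical_or(pred > 0.5, mask).sum()
        scores.append(inter / union if union else 0.0)   # 每个区域各算一次 IoU
    return float(np.mean(scores))                        # 逐区域平均:尺寸无关

gt = np.zeros((64, 64)); gt[2:6, 2:6] = 1; gt[20:60, 20:60] = 1
pred = gt.copy(); pred[2:6, 2:6] = 0                     # 漏检小目标
print(round(size_invariant_score(pred, gt), 3))          # 0.5:小目标的缺失被同等计入
```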


【3】Quantum Enhanced Anomaly Detection for ADS-B Data using Hybrid Deep Learning
标题:使用混合深度学习对ADS-B数据进行量子增强异常检测
链接:https://arxiv.org/abs/2509.15991

作者:an, Felipe Gohring de Magalhaes, Jean-Yves Ouattara, Gabriela Nicolescu
备注:This is the author's version of the work accepted for publication in the IEEE-AIAA Digital Avionics Systems Conference (DASC) 2025. The final version will be available via IEEE Xplore
摘要:新兴的量子机器学习(QML)领域在加快处理速度和有效处理复杂数据集的高维数据方面展现出可观的优势。量子计算(QC)通过叠加和纠缠等量子特性实现更高效的数据操作。在本文中,我们提出了一种结合量子和经典机器学习技术的新方法,以探索量子特性对广播式自动相关监视(ADS-B)数据异常检测的影响。我们比较了具有不同损失函数的混合全连接量子神经网络(H-FQNN)的性能,并使用公开可用的ADS-B数据集进行评估。结果表明其在异常检测上具有竞争力,准确率从90.17%到94.05%,与传统全连接神经网络(FNN)模型的性能相当,后者的准确率介于91.50%和93.37%之间。
摘要:The emerging field of Quantum Machine Learning (QML) has shown promising advantages in accelerating processing speed and effectively handling the high dimensionality associated with complex datasets. Quantum Computing (QC) enables more efficient data manipulation through the quantum properties of superposition and entanglement. In this paper, we present a novel approach combining quantum and classical machine learning techniques to explore the impact of quantum properties for anomaly detection in Automatic Dependent Surveillance-Broadcast (ADS-B) data. We compare the performance of a Hybrid-Fully Connected Quantum Neural Network (H-FQNN) with different loss functions and use a publicly available ADS-B dataset to evaluate the performance. The results demonstrate competitive performance in detecting anomalies, with accuracies ranging from 90.17% to 94.05%, comparable to the performance of a traditional Fully Connected Neural Network (FNN) model, which achieved accuracies between 91.50% and 93.37%.


【4】Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech
标题:运动后语音中的呼吸与语义停顿检测及用力程度分类
链接:https://arxiv.org/abs/2509.15473

作者:, Wuyue Xia, Huaxiu Yao, Jingping Nie
备注:6 pages, 3rd ACM International Workshop on Intelligent Acoustic Systems and Applications (IASA 25)
摘要:运动后言语包含丰富的生理和语言线索,通常以语义停顿、呼吸停顿和呼吸-语义组合停顿为标志。检测这些事件能够评估恢复率、肺功能和与运动相关的异常。然而,现有的在这方面识别和区分不同类型停顿的工作很有限。在这项工作中,基于最近发布的音频与呼吸信号同步的数据集,我们提供了系统的停顿类型标注。利用这些标注,我们在深度学习模型(GRU、1D CNN-LSTM、AlexNet、VGG16)、声学特征(MFCC、MFB)和分层Wav2Vec2表示上,系统地开展了探索性的呼吸与语义停顿检测以及用力程度分类。我们在分类和回归两种形式下评估了三种设置:单特征、特征融合和两阶段检测-分类级联。结果显示,各类型的检测准确率分别为:语义停顿高达89$\%$、呼吸停顿55$\%$、组合停顿86$\%$,整体为73$\%$;而用力程度分类达到90.5$\%$的准确率,优于以前的工作。
摘要:Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing works on identifying and distinguishing different types of pauses in this context are limited. In this work, building on a recently released dataset with synchronized audio and respiration signals, we provide systematic annotations of pause types. Using these annotations, we systematically conduct exploratory breathing and semantic pause detection and exertion-level classification across deep learning models (GRU, 1D CNN-LSTM, AlexNet, VGG16), acoustic features (MFCC, MFB), and layer-stratified Wav2Vec2 representations. We evaluate three setups (single feature, feature fusion, and a two-stage detection-classification cascade) under both classification and regression formulations. Results show per-type detection accuracy up to 89$\%$ for semantic, 55$\%$ for breathing, 86$\%$ for combined pauses, and 73$\%$ overall, while exertion-level classification achieves 90.5$\%$ accuracy, outperforming prior work.


分类|识别(6篇)

【1】Dynamic Classifier-Free Diffusion Guidance via Online Feedback
标题:通过在线反馈提供动态无分类器扩散指导
链接:https://arxiv.org/abs/2509.16131

作者:Papalampidi, Olivia Wiles, Ira Ktena, Aleksandar Shtedritski, Emanuele Bugliarello, Ivana Kajic, Isabela Albuquerque, Aida Nematzadeh
摘要:无分类器引导(CFG)是文本到图像扩散模型的基石,但其有效性受限于静态引导尺度的使用。这种"一刀切"的方法无法适应不同提示的不同要求;此外,基于梯度的校正或固定启发式时间表等先前的解决方案引入了额外的复杂性,且无法泛化。在这项工作中,我们通过引入一个动态CFG调度框架来挑战这种静态范式。我们的方法利用来自一套通用及专用的小规模潜在空间评估器的在线反馈,例如用于对齐的CLIP、用于保真度的判别器和人类偏好奖励模型,来评估反向扩散过程每一步的生成质量。基于此反馈,我们执行贪婪搜索以选择每个时间步的最优CFG尺度,从而为每个提示和样本创建量身定制的独特引导时间表。我们在小规模模型和最先进的Imagen 3上证明了方法的有效性,在文本对齐、视觉质量、文本渲染和数值推理方面均有显著改进。值得注意的是,与默认的Imagen 3基线相比,我们的方法在总体偏好上实现了高达53.8%的人类偏好胜率,在针对文本渲染等特定能力的提示上,这一数字提高到55.5%。我们的工作确立了最优引导时间表本质上是动态且依赖于提示的,并提供了一个高效且可泛化的框架来实现它。
摘要:Classifier-free guidance (CFG) is a cornerstone of text-to-image diffusion models, yet its effectiveness is limited by the use of static guidance scales. This "one-size-fits-all" approach fails to adapt to the diverse requirements of different prompts; moreover, prior solutions like gradient-based correction or fixed heuristic schedules introduce additional complexities and fail to generalize. In this work, we challenge this static paradigm by introducing a framework for dynamic CFG scheduling. Our method leverages online feedback from a suite of general-purpose and specialized small-scale latent-space evaluations, such as CLIP for alignment, a discriminator for fidelity, and a human preference reward model, to assess generation quality at each step of the reverse diffusion process. Based on this feedback, we perform a greedy search to select the optimal CFG scale for each timestep, creating a unique guidance schedule tailored to every prompt and sample. We demonstrate the effectiveness of our approach on both small-scale models and the state-of-the-art Imagen 3, showing significant improvements in text alignment, visual quality, text rendering and numerical reasoning. Notably, when compared against the default Imagen 3 baseline, our method achieves up to a 53.8% human preference win-rate for overall preference, a figure that increases up to 55.5% on prompts targeting specific capabilities like text rendering. Our work establishes that the optimal guidance schedule is inherently dynamic and prompt-dependent, and provides an efficient and generalizable framework to achieve it.
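下面给出逐步贪婪选择CFG尺度这一流程的极简示意(假设性草图:denoise_step与score_fn均为假想接口,实际工作中的评估器为CLIP对齐度、判别器与人类偏好奖励模型等):

```python
import numpy as np

def greedy_cfg_schedule(denoise_step, score_fn, x_T, timesteps,
                        scales=(1.0, 3.0, 5.0, 7.5, 10.0)):
    """示意:反向扩散的每一步对候选引导尺度做贪婪搜索,
    用在线反馈 score_fn 选出当步最优尺度,得到逐提示的动态调度。"""
    x, schedule = x_T, []
    for t in timesteps:
        cands = [denoise_step(x, t, s) for s in scales]
        best = int(np.argmax([score_fn(c) for c in cands]))  # 反馈最高者胜出
        schedule.append(scales[best])
        x = cands[best]
    return x, schedule

# 玩具用法:以"靠近原点"充当质量反馈,仅为演示流程
rng = np.random.default_rng(0)
step = lambda x, t, s: x - 0.1 * s * x + 0.01 * rng.normal(size=x.shape)
x, sched = greedy_cfg_schedule(step, lambda v: -np.linalg.norm(v),
                               rng.normal(size=4), range(10))
print(sched)
```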


【2】Personalized Prediction By Learning Halfspace Reference Classes Under Well-Behaved Distribution
标题:行为良好分布下学习半空间参考类的个性化预测
链接:https://arxiv.org/abs/2509.15592

作者:ang, Brendan Juba
摘要:在机器学习应用中,预测模型被训练用于服务整个数据分布上的未来查询。然而,现实世界的数据往往需要过于复杂的模型才能获得有竞争力的性能,从而牺牲了可解释性。因此,机器学习模型在医疗等高风险应用中的日益部署,促使人们寻找既准确又可解释的预测方法。这项工作提出了一种个性化预测方案,即针对每个查询学习一个易于解释的预测器。特别地,我们希望产生一个"稀疏线性"分类器,使其在包含查询点的某个子群体上具有有竞争力的性能。这项工作的目标是在标签不可知的设置下,研究该预测模型对由"半空间"表示的子群体的PAC可学习性。我们首先给出一个分布特定的PAC学习算法,用于学习个性化预测的参考类。通过同时利用参考类学习算法和稀疏线性表示的列表学习器,我们证明了稀疏线性分类器与齐次半空间子群体上个性化预测的第一个上界$O(\mathrm{opt}^{1/4})$。我们还在多个标准基准数据集上评估了我们的算法。
摘要:In machine learning applications, predictive models are trained to serve future queries across the entire data distribution. Real-world data often demands excessively complex models to achieve competitive performance, however, sacrificing interpretability. Hence, the growing deployment of machine learning models in high-stakes applications, such as healthcare, motivates the search for methods for accurate and explainable predictions. This work proposes a Personalized Prediction scheme, where an easy-to-interpret predictor is learned per query. In particular, we wish to produce a "sparse linear" classifier with competitive performance specifically on some sub-population that includes the query point. The goal of this work is to study the PAC-learnability of this prediction model for sub-populations represented by "halfspaces" in a label-agnostic setting. We first give a distribution-specific PAC-learning algorithm for learning reference classes for personalized prediction. By leveraging both the reference-class learning algorithm and a list learner of sparse linear representations, we prove the first upper bound, $O(\mathrm{opt}^{1/4} )$, for personalized prediction with sparse linear classifiers and homogeneous halfspace subsets. We also evaluate our algorithms on a variety of standard benchmark data sets.


【3】Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
标题:潜在分区网络:生成建模、表示学习和分类的统一原则
链接:https://arxiv.org/abs/2509.15591

作者:, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
备注:Published in NeurIPS 2025
摘要:生成建模、表示学习和分类是机器学习(ML)的三个核心问题,但它们的最新(SoTA)解决方案在很大程度上仍然是不相交的。在本文中,我们问:一个统一的原则能否解决所有三个问题?这种统一可以简化ML管道,并促进任务之间更大的协同作用。我们引入潜在分区网络(LZN)作为实现这一目标的一步。LZN的核心是创建一个共享的高斯潜在空间,对所有任务的信息进行编码。每种数据类型(例如,图像、文本、标签)配备有将样本映射到不相交潜在区域的编码器,以及将潜在表示映射回数据的解码器。ML任务被表示为这些编码器和解码器的组合:例如,标签条件图像生成使用标签编码器和图像解码器;图像嵌入使用图像编码器;分类使用图像编码器和标签解码器。我们在三个日益复杂的场景中展示了LZN的前景:(1)LZN可以增强现有模型(图像生成):当与SoTA的Rectified Flow模型相结合时,LZN将CIFAR10上的FID从2.76改进到2.59,而无需修改训练目标。(2)LZN可以独立地解决任务(表示学习):LZN可以在没有辅助损失函数的情况下实现无监督表示学习,在ImageNet的下游线性分类上,分别比开创性的MoCo和SimCLR方法高出9.3%和0.2%。(3)LZN可以同时解决多个任务(联合生成和分类):通过图像和标签编码器/解码器,LZN在设计上即联合执行这两项任务,改进了FID并在CIFAR10上实现了SoTA分类精度。代码和训练模型可在https://github.com/microsoft/latent-zoning-networks上获得。该项目的网站是https://zinanlin.me/blogs/latent_zoning_networks.html。
摘要:Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59-without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.


【4】How many classes do we need to see for novel class discovery?
标题:我们需要看多少个类才能发现新的类?
链接:https://arxiv.org/abs/2509.15585

作者:Sarkar, Been Kim, Jennifer J. Sun
备注:DG-EBF @ CVPR2025
摘要:新类发现对于ML模型适应不断变化的现实世界数据至关重要,其应用范围从科学发现到机器人技术。然而,这些数据集包含复杂且相互纠缠的变化因素,使得对类发现的系统研究变得困难。因此,关于新类发现为何以及何时更可能成功,许多基本问题尚未得到回答。为了解决这个问题,我们提出了一个简单的受控实验框架,使用带有程序化生成修改因子的dSprites数据集。这使我们能够调查是什么影响了成功的类发现。特别是,我们研究了已知/未知类的数量与发现性能之间的关系,以及已知类的"覆盖率"对发现新类的影响。我们的实证结果表明,已知类数量带来的收益会达到一个饱和点,超过该点后发现性能趋于平稳。不同设置下的收益递减模式为从业者的成本效益分析提供了见解,并为未来在复杂现实数据集上更严格的类发现研究提供了起点。
摘要:Novel class discovery is essential for ML models to adapt to evolving real-world data, with applications ranging from scientific discovery to robotics. However, these datasets contain complex and entangled factors of variation, making a systematic study of class discovery difficult. As a result, many fundamental questions are yet to be answered on why and when new class discoveries are more likely to be successful. To address this, we propose a simple controlled experimental framework using the dSprites dataset with procedurally generated modifying factors. This allows us to investigate what influences successful class discovery. In particular, we study the relationship between the number of known/unknown classes and discovery performance, as well as the impact of known class 'coverage' on discovering new classes. Our empirical results indicate that the benefit of the number of known classes reaches a saturation point beyond which discovery performance plateaus. The pattern of diminishing return across different settings provides an insight for cost-benefit analysis for practitioners and a starting point for more rigorous future research of class discovery on complex real-world datasets.


【5】Hybrid unary-binary design for multiplier-less printed Machine Learning classifiers
标题:用于无乘法器印刷机器学习分类器的混合一元-二进制设计
链接:https://arxiv.org/abs/2509.15316

作者:rmeniakos, Theodoros Mantzakidis, Dimitrios Soudris
备注:Accepted for publication by 25th International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
摘要:印刷电子(PE)为实现机器学习(ML)电路提供了一种灵活、具有成本效益的硅替代方案,但其较大的特征尺寸限制了分类器的复杂性。利用PE的低制造和低NRE成本,设计人员可以针对特定的ML模型定制硬件,简化电路设计。这项工作探讨了替代的算术表示,并提出了一种混合一元-二进制架构,移除了昂贵的编码器,实现了MLP分类器的高效、无乘法器执行。我们还引入了架构感知训练,以进一步提高面积和功率效率。对六个数据集的评估显示,面积平均减少46%,功率平均减少39%,精度损失极小,超过了其他最先进的MLP设计。
摘要:Printed Electronics (PE) provide a flexible, cost-efficient alternative to silicon for implementing machine learning (ML) circuits, but their large feature sizes limit classifier complexity. Leveraging PE's low fabrication and NRE costs, designers can tailor hardware to specific ML models, simplifying circuit design. This work explores alternative arithmetic and proposes a hybrid unary-binary architecture that removes costly encoders and enables efficient, multiplier-less execution of MLP classifiers. We also introduce architecture-aware training to further improve area and power efficiency. Evaluation on six datasets shows average reductions of 46% in area and 39% in power, with minimal accuracy loss, surpassing other state-of-the-art MLP designs.


【6】IEFS-GMB: Gradient Memory Bank-Guided Feature Selection Based on Information Entropy for EEG Classification of Neurological Disorders
标题:IEFS-GMB:用于神经系统疾病脑电分类的基于信息熵的梯度记忆库引导特征选择
链接:https://arxiv.org/abs/2509.15259

作者:ng, Hanyang Dong, Jia-Hong Gao, Yi Sun, Kuntao Xiao, Wanli Yang, Zhao Lv, Shurong Sheng
摘要:基于深度学习的EEG分类对于自动检测神经系统疾病、提高诊断准确性和实现早期干预至关重要。然而,EEG信号的低信噪比限制了模型性能,使得特征选择(FS)对于优化神经网络编码器学习的表示至关重要。现有的FS方法很少专门为EEG诊断设计;许多方法依赖于架构,缺乏可解释性,限制了它们的适用性。此外,大多数依赖于单次迭代数据,导致对可变性的鲁棒性有限。为了解决这些问题,我们提出了IEFS-GMB,一种由梯度记忆库引导的基于信息熵的特征选择方法。该方法构建一个存储历史梯度的动态记忆库,通过信息熵计算特征重要性,并应用基于熵的加权来选择信息丰富的EEG特征。在四个公共神经系统疾病数据集上的实验表明,使用IEFS-GMB增强的编码器比基线模型的准确率提高了0.64%到6.45%。该方法还优于四种竞争FS技术,并提高了模型的可解释性,支持其在临床环境中的实际使用。
摘要:Deep learning-based EEG classification is crucial for the automated detection of neurological disorders, improving diagnostic accuracy and enabling early intervention. However, the low signal-to-noise ratio of EEG signals limits model performance, making feature selection (FS) vital for optimizing representations learned by neural network encoders. Existing FS methods are seldom designed specifically for EEG diagnosis; many are architecture-dependent and lack interpretability, limiting their applicability. Moreover, most rely on single-iteration data, resulting in limited robustness to variability. To address these issues, we propose IEFS-GMB, an Information Entropy-based Feature Selection method guided by a Gradient Memory Bank. This approach constructs a dynamic memory bank storing historical gradients, computes feature importance via information entropy, and applies entropy-based weighting to select informative EEG features. Experiments on four public neurological disease datasets show that encoders enhanced with IEFS-GMB achieve accuracy improvements of 0.64% to 6.45% over baseline models. The method also outperforms four competing FS techniques and improves model interpretability, supporting its practical use in clinical settings.
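下面用numpy粗略示意"梯度记忆库 + 信息熵加权"的流程(假设性草图:直方图分箱与权重形式 exp(-H) 均为示例假设,并非论文公式):

```python
import numpy as np

def entropy_weights(grad_bank, n_bins=10):
    """示意:由历史梯度记忆库(形如 [记录数, 特征数])估计各特征梯度分布的熵,
    并据此给出特征权重;"熵越低权重越高"为假设性设定,仅用于说明流程。"""
    T, F = grad_bank.shape
    H = np.empty(F)
    for f in range(F):
        p, _ = np.histogram(grad_bank[:, f], bins=n_bins)
        p = p / p.sum()
        p = p[p > 0]
        H[f] = -(p * np.log(p)).sum()   # 每个特征的梯度分布熵
    w = np.exp(-H)                       # 基于熵的权重(示例形式)
    return w / w.sum()

bank = np.random.default_rng(0).normal(size=(200, 6))   # 假想的梯度记忆库
bank[:, 0] *= 0.01                                      # 第 0 维梯度高度集中
print(entropy_weights(bank).round(3))
```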


表征(1篇)

【1】Detail Across Scales: Multi-Scale Enhancement for Full Spectrum Neural Representations
标题:跨尺度的细节:全谱神经表示的多尺度增强
链接:https://arxiv.org/abs/2509.15494

作者:Zhantao Chen, Cheng Peng, Rajan Plumley, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner
摘要:隐式神经表示(INR)已经成为基于离散数组的数据表示的紧凑和参数化替代方案,直接在神经网络权重中编码信息,以实现分辨率无关的表示和内存效率。然而,现有的INR方法,当被约束到紧凑的网络大小时,难以忠实地表示表征大多数科学数据集的多尺度结构、高频信息和精细纹理。为了解决这一限制,我们提出了WIEN-INR,这是一种基于小波的隐式神经表示,它将建模分布在不同的分辨率尺度上,并在最精细的尺度上采用专门的内核网络来恢复细微的细节。这种多尺度架构允许使用较小的网络来保留完整的信息,同时保持训练效率并降低存储成本。通过对跨越不同尺度和结构复杂性的各种科学数据集进行广泛的实验,WIEN-INR在保持紧凑模型大小的同时实现了卓越的重建保真度。这些结果表明,WIEN-INR作为高保真科学数据编码的实用神经表示框架,将INR的适用性扩展到有效保留精细细节至关重要的领域。
摘要:Implicit neural representations (INRs) have emerged as a compact and parametric alternative to discrete array-based data representations, encoding information directly in neural network weights to enable resolution-independent representation and memory efficiency. However, existing INR approaches, when constrained to compact network sizes, struggle to faithfully represent the multi-scale structures, high-frequency information, and fine textures that characterize the majority of scientific datasets. To address this limitation, we propose WIEN-INR, a wavelet-informed implicit neural representation that distributes modeling across different resolution scales and employs a specialized kernel network at the finest scale to recover subtle details. This multi-scale architecture allows for the use of smaller networks to retain the full spectrum of information while preserving the training efficiency and reducing storage cost. Through extensive experiments on diverse scientific datasets spanning different scales and structural complexities, WIEN-INR achieves superior reconstruction fidelity while maintaining a compact model size. These results demonstrate WIEN-INR as a practical neural representation framework for high-fidelity scientific data encoding, extending the applicability of INRs to domains where efficient preservation of fine detail is essential.


3D|3D重建等相关(1篇)

【1】Communications to Circulations: 3D Wind Field Retrieval and Real-Time Prediction Using 5G GNSS Signals and Deep Learning
标题:从通信到环流:使用5G GNSS信号和深度学习的3D风场反演与实时预测
链接:https://arxiv.org/abs/2509.16068

作者:, Hong Liang, Chaoxia Yuan, Mingyu Li, Aoqi Zhou, Chunqing Shang, Hua Cai, Peixi Liu, Kezuan Wang, Yifeng Zheng
备注:31 pages,11 figures,1 table
摘要:准确的大气风场信息对于天气预报、航空安全和减少灾害风险等各种应用至关重要。然而,由于传统现场观测和遥感技术的局限,以及数值天气预报(NWP)模式的计算开销和偏差,获得高时空分辨率的风数据仍然具有挑战性。本文介绍了G-WindCast,这是一种新型深度学习框架,它利用5G全球导航卫星系统(GNSS)信号的信号强度变化来反演和预测三维(3D)大气风场。该框架利用前向神经网络(FNN)和Transformer网络来捕捉GNSS衍生特征与风动力学之间复杂、非线性的时空关系。我们的初步结果表明,该方法在风场反演和短期风场预报(提前时间最多30分钟)方面均有可观的精度,在某些场景下技巧评分可与高分辨率NWP输出相当。该模型在不同预报时效和气压层上表现出鲁棒性,且与同期ERA5再分析数据相比,其预测的风速和风向与观测结果的一致性更好。此外,我们还表明,即使GNSS台站数量显著减少(例如约100个),该系统仍可保持出色的本地化预报性能,突出了其成本效益和可扩展性。这种跨学科方法彰显了利用非传统数据源和深度学习进行高级环境监测和实时大气应用的变革潜力。
摘要:Accurate atmospheric wind field information is crucial for various applications, including weather forecasting, aviation safety, and disaster risk reduction. However, obtaining high spatiotemporal resolution wind data remains challenging due to limitations in traditional in-situ observations and remote sensing techniques, as well as the computational expense and biases of numerical weather prediction (NWP) models. This paper introduces G-WindCast, a novel deep learning framework that leverages signal strength variations from 5G Global Navigation Satellite System (GNSS) signals to retrieve and forecast three-dimensional (3D) atmospheric wind fields. The framework utilizes Forward Neural Networks (FNN) and Transformer networks to capture complex, nonlinear, and spatiotemporal relationships between GNSS-derived features and wind dynamics. Our preliminary results demonstrate promising accuracy in both wind retrieval and short-term wind forecasting (up to 30 minutes lead time), with skill scores comparable to high-resolution NWP outputs in certain scenarios. The model exhibits robustness across different forecast horizons and pressure levels, and its predictions for wind speed and direction show superior agreement with observations compared to concurrent ERA5 reanalysis data. Furthermore, we show that the system can maintain excellent performance for localized forecasting even with a significantly reduced number of GNSS stations (e.g., around 100), highlighting its cost-effectiveness and scalability. This interdisciplinary approach underscores the transformative potential of exploiting non-traditional data sources and deep learning for advanced environmental monitoring and real-time atmospheric applications.


编码器(1篇)

【1】Triplet Loss Based Quantum Encoding for Class Separability
标题:基于三重损失的量子编码的类可分离性
链接:https://arxiv.org/abs/2509.15705

作者:dacci, Mahul Pandey, Paolo Santini, Michele Amoretti
摘要:为了提高变分量子分类器的性能,提出了一种高效的数据驱动编码方案。这种编码专为图像等复杂数据集设计,试图通过产生在希尔伯特空间中按分类标签形成良好分离聚类的输入态来辅助分类任务。编码电路使用受经典人脸识别算法启发的三元组损失函数进行训练,并通过编码密度矩阵之间的平均迹距离来度量类可分性。在MNIST和MedMNIST数据集的各种二分类任务上的基准测试表明,在相同VQC结构下,该方法相比幅度编码有相当大的改进,且所需电路深度低得多。
摘要:An efficient and data-driven encoding scheme is proposed to enhance the performance of variational quantum classifiers. This encoding is specially designed for complex datasets like images and seeks to help the classification task by producing input states that form well-separated clusters in the Hilbert space according to their classification labels. The encoding circuit is trained using a triplet loss function inspired by classical facial recognition algorithms, and class separability is measured via average trace distances between the encoded density matrices. Benchmark tests performed on various binary classification tasks on MNIST and MedMNIST datasets demonstrate considerable improvement over amplitude encoding with the same VQC structure while requiring a much lower circuit depth.
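原文训练的是量子编码电路并以迹距离度量可分性;下面仅用经典PyTorch示意其中的三元组损失目标(假设性草图,编码器为假想的线性网络):

```python
import torch

# 示意:用三元组损失训练编码器,使同类样本的编码彼此靠近、异类远离。
emb = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh())  # 假想编码器
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(32, 8) for _ in range(3))
loss = loss_fn(emb(anchor), emb(positive), emb(negative))
loss.backward()
print(float(loss))
```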


优化|敛散性(8篇)

【1】Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems
标题:应用于路线问题的学习成本逆优化潜变量模型
链接:https://arxiv.org/abs/2509.15999

作者:ahoud, Erik Schaffernicht, Johannes A. Stork
备注:Accepted at Neurips 2025
摘要:学习具有未知成本函数的约束优化问题(COP)的解决方案的表示是具有挑战性的,因为像(变分)自动编码器这样的模型在解码结构化输出时很难强制执行约束。我们提出了一种逆优化潜变量模型(IO-LVM),它从观察到的解决方案中学习COP成本函数的潜在空间,并通过在循环中使用求解器求解COP来重建可行输出。我们的方法通过不可微的确定性求解器利用估计的Fenchel-Young损失梯度来塑造潜在空间。与通常恢复单个或特定于上下文的成本函数的标准逆优化或逆强化学习方法不同,IO-LVM捕获成本函数的分布,从而能够识别由不同代理或训练过程中不可用的条件引起的各种解决方案行为。我们在真实世界的船舶和出租车路线数据集以及合成图中的路径上验证了我们的方法,证明了它能够重建路径和循环,预测它们的分布,并产生可解释的潜在表示。
摘要:Learning representations for solutions of constrained optimization problems (COPs) with unknown cost functions is challenging, as models like (Variational) Autoencoders struggle to enforce constraints when decoding structured outputs. We propose an Inverse Optimization Latent Variable Model (IO-LVM) that learns a latent space of COP cost functions from observed solutions and reconstructs feasible outputs by solving a COP with a solver in the loop. Our approach leverages estimated gradients of a Fenchel-Young loss through a non-differentiable deterministic solver to shape the latent space. Unlike standard Inverse Optimization or Inverse Reinforcement Learning methods, which typically recover a single or context-specific cost function, IO-LVM captures a distribution over cost functions, enabling the identification of diverse solution behaviors arising from different agents or conditions not available during the training process. We validate our method on real-world datasets of ship and taxi routes, as well as paths in synthetic graphs, demonstrating its ability to reconstruct paths and cycles, predict their distributions, and yield interpretable latent representations.


【2】On the Convergence of Muon and Beyond
标题:关于Muon及其扩展的收敛性
链接:https://arxiv.org/abs/2509.15816

作者: Yongxiang Liu, Ganzhao Yuan
摘要:Muon优化器在处理用于训练神经网络的矩阵结构参数方面取得了显著的经验成功。然而,其实际性能与理论理解之间存在很大差距。现有分析表明,标准的Muon变体在随机非凸设置下仅能达到次优收敛速度$\mathcal{O}(T^{-1/4})$,其中$T$表示迭代次数。为了探索Muon框架的理论极限,我们构建并分析了一个方差缩减的变体,称为Muon-VR2。我们提供了第一个严格的证明:引入方差缩减机制可使Muon-VR2达到最优收敛速度$\tilde{\mathcal{O}}(T^{-1/3})$,从而匹配该类问题的理论下界。此外,我们的分析建立了Muon变体在Polyak-{\L}ojasiewicz(P{\L})条件下的收敛保证。在视觉(CIFAR-10)和语言(C4)基准上的大量实验证实了我们关于每次迭代收敛性的理论发现。总的来说,这项工作为Muon风格的优化器提供了首个最优性证明,并阐明了开发更实用高效的加速变体的路径。
摘要:The Muon optimizer has demonstrated remarkable empirical success in handling matrix-structured parameters for training neural networks. However, a significant gap persists between its practical performance and theoretical understanding. Existing analyses indicate that the standard Muon variant achieves only a suboptimal convergence rate of $\mathcal{O}(T^{-1/4})$ in stochastic non-convex settings, where $T$ denotes the number of iterations. To explore the theoretical limits of the Muon framework, we construct and analyze a variance-reduced variant, termed Muon-VR2. We provide the first rigorous proof that incorporating a variance-reduction mechanism enables Muon-VR2 to attain an optimal convergence rate of $\tilde{\mathcal{O}}(T^{-1/3})$, thereby matching the theoretical lower bound for this class of problems. Moreover, our analysis establishes convergence guarantees for Muon variants under the Polyak-{\L}ojasiewicz (P{\L}) condition. Extensive experiments on vision (CIFAR-10) and language (C4) benchmarks corroborate our theoretical findings on per-iteration convergence. Overall, this work provides the first proof of optimality for a Muon-style optimizer and clarifies the path toward developing more practically efficient, accelerated variants.


【3】Generalization and Optimization of SGD with Lookahead
标题:带Lookahead的SGD的泛化与优化
链接:https://arxiv.org/abs/2509.15776

作者: Li, Yunwen Lei
摘要:Lookahead优化器通过采用双权重更新机制来增强深度学习模型,已被证明可以提升SGD等底层优化器的性能。然而,大多数理论研究集中在其在训练数据上的收敛性,对其泛化能力的了解较少。现有的泛化分析通常受限于限制性假设,例如要求损失函数全局Lipschitz连续,且其界未能充分刻画优化与泛化之间的关系。在本文中,我们通过对结合minibatch SGD的Lookahead优化器进行严格的稳定性与泛化分析来解决这些问题。我们利用平均模型稳定性,在不需要限制性Lipschitz假设的情况下,推导出凸和强凸问题的泛化界。我们的分析表明,在凸设置下相对于批量大小存在线性加速。
摘要:The Lookahead optimizer enhances deep learning models by employing a dual-weight update mechanism, which has been shown to improve the performance of underlying optimizers such as SGD. However, most theoretical studies focus on its convergence on training data, leaving its generalization capabilities less understood. Existing generalization analyses are often limited by restrictive assumptions, such as requiring the loss function to be globally Lipschitz continuous, and their bounds do not fully capture the relationship between optimization and generalization. In this paper, we address these issues by conducting a rigorous stability and generalization analysis of the Lookahead optimizer with minibatch SGD. We leverage on-average model stability to derive generalization bounds for both convex and strongly convex problems without the restrictive Lipschitzness assumption. Our analysis demonstrates a linear speedup with respect to the batch size in the convex setting.
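作为背景,下面是Lookahead双权重更新机制的极简PyTorch示意(慢权重每 k 步按 phi <- phi + alpha*(theta - phi) 插值并回写;k、alpha 取常见默认值,仅作演示):

```python
import copy, torch

class Lookahead:
    """示意:Lookahead 的双权重更新。内层优化器走 k 步快权重后,
    慢权重插值更新,并把快权重拉回慢权重位置。"""
    def __init__(self, optimizer, k=5, alpha=0.5):
        self.opt, self.k, self.alpha, self.step_count = optimizer, k, alpha, 0
        self.slow = [copy.deepcopy(p.data) for g in optimizer.param_groups
                     for p in g["params"]]

    def step(self):
        self.opt.step()                              # 快权重更新(如 minibatch SGD)
        self.step_count += 1
        if self.step_count % self.k == 0:
            params = [p for g in self.opt.param_groups for p in g["params"]]
            for slow, p in zip(self.slow, params):
                slow += self.alpha * (p.data - slow)  # 慢权重插值
                p.data.copy_(slow)                    # 快权重回到慢权重

model = torch.nn.Linear(3, 1)
la = Lookahead(torch.optim.SGD(model.parameters(), lr=0.1))
for _ in range(10):
    loss = model(torch.randn(16, 3)).pow(2).mean()
    la.opt.zero_grad(); loss.backward(); la.step()
```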


【4】On Optimal Steering to Achieve Exact Fairness
标题:论实现精确公平的最佳转向
链接:https://arxiv.org/abs/2509.15759

作者:rma, Amit Jayant Deshpande, Chiranjib Bhattacharyya, Rajiv Ratn Shah
备注:Accepted for Presentation at Neurips 2025
摘要:为了解决公平机器学习中的"偏入偏出"问题,重要的是将数据的特征分布或大型语言模型(LLM)的内部表示引导到能保证群体公平结果的理想分布。以前关于公平生成模型和表示引导的工作可以从模型输出的可证明公平性保证中大大受益。如果一个分布上任何成本敏感风险的最小化者都能保证获得精确的群体公平结果(例如人口均等、机会均等),换句话说,它没有公平-效用权衡,我们就将其定义为理想分布。我们通过在KL散度下寻找最近的理想分布来构建最优引导的优化问题,并在底层分布来自常见参数族(例如正态、对数正态)时给出高效算法。实证上,我们的最优引导技术在合成和真实数据集上都能在不损失效用的情况下提升公平性(有时甚至提升效用)。我们展示了对LLM表示的仿射引导以减少多类分类中的偏差,例如从Bios数据集(De-Arteaga等人)中的简短传记预测职业。此外,我们将LLM的内部表示引导向期望的输出,使其在不同群体中同样有效。
摘要:To fix the 'bias in, bias out' problem in fair machine learning, it is important to steer feature distributions of data or internal representations of Large Language Models (LLMs) to ideal ones that guarantee group-fair outcomes. Previous work on fair generative models and representation steering could greatly benefit from provable fairness guarantees on the model output. We define a distribution as ideal if the minimizer of any cost-sensitive risk on it is guaranteed to have exact group-fair outcomes (e.g., demographic parity, equal opportunity)-in other words, it has no fairness-utility trade-off. We formulate an optimization program for optimal steering by finding the nearest ideal distribution in KL-divergence, and provide efficient algorithms for it when the underlying distributions come from well-known parametric families (e.g., normal, log-normal). Empirically, our optimal steering techniques on both synthetic and real-world datasets improve fairness without diminishing utility (and sometimes even improve utility). We demonstrate affine steering of LLM representations to reduce bias in multi-class classification, e.g., occupation prediction from a short biography in Bios dataset (De-Arteaga et al.). Furthermore, we steer internal representations of LLMs towards desired outputs so that it works equally well across different groups.


【5】The Multi-Query Paradox in Zeroth-Order Optimization
标题:零阶优化中的多查询悖论
链接:https://arxiv.org/abs/2509.15552

作者:Qingyu Song, Hong Xu
摘要:零阶(ZO)优化为显式梯度不可用、必须仅通过查询函数值来近似梯度的问题提供了一个强大的框架。流行的单查询方法简单,但估计方差高,这促使人们采用多查询范式来提高估计精度。然而,这产生了一个关键的权衡:在固定的查询预算(即成本)下,每次迭代的查询数与优化迭代的总数彼此成反比。如何最优地分配这一预算是一个基本且未被充分探索的问题。   这项工作系统地解决了这个查询分配问题。我们分析了两种聚合方法:事实上的标准做法简单平均(ZO-Avg),以及我们从局部代理最小化推导出的新投影对齐方法(ZO-Align)。通过在强凸、凸、非凸和随机设置下推导两种方法的收敛速度并使其对查询数量的依赖显式化,我们发现了一个鲜明的二分法:对于ZO-Avg,我们证明每次迭代使用多于一个查询总是查询低效的,因此单查询方法最优;相反,ZO-Align通常在每次迭代查询更多时表现更好,从而使全子空间估计成为最优做法。因此,我们的工作阐明:多查询问题归结为在两种经典算法之间做选择,而非选择某个中间查询规模,且该选择完全由所用的聚合方法决定。这些理论发现也得到了大量实验的一致验证。
摘要:Zeroth-order (ZO) optimization provides a powerful framework for problems where explicit gradients are unavailable and have to be approximated using only queries to function value. The prevalent single-query approach is simple, but suffers from high estimation variance, motivating a multi-query paradigm to improves estimation accuracy. This, however, creates a critical trade-off: under a fixed budget of queries (i.e. cost), queries per iteration and the total number of optimization iterations are inversely proportional to one another. How to best allocate this budget is a fundamental, under-explored question.   This work systematically resolves this query allocation problem. We analyze two aggregation methods: the de facto simple averaging (ZO-Avg), and a new Projection Alignment method (ZO-Align) we derive from local surrogate minimization. By deriving convergence rates for both methods that make the dependence on the number of queries explicit across strongly convex, convex, non-convex, and stochastic settings, we uncover a stark dichotomy: For ZO-Avg, we prove that using more than one query per iteration is always query-inefficient, rendering the single-query approach optimal. On the contrary, ZO-Align generally performs better with more queries per iteration, resulting in a full-subspace estimation as the optimal approach. Thus, our work clarifies that the multi-query problem boils down to a choice not about an intermediate query size, but between two classic algorithms, a choice dictated entirely by the aggregation method used. These theoretical findings are also consistently validated by extensive experiments.
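下面给出摘要中ZO-Avg聚合方式的极简示意:每次迭代用 q 个随机方向的两点函数值查询估计梯度并简单平均(假设性草图;论文结论是在固定预算下该聚合取 q=1 反而最优):

```python
import numpy as np

def zo_avg_grad(f, x, q=8, mu=1e-2, rng=np.random.default_rng(0)):
    """示意:ZO-Avg——对 q 个单查询两点梯度估计量做简单平均。"""
    d = x.size
    g = np.zeros(d)
    for _ in range(q):
        u = rng.normal(size=d)
        g += (f(x + mu * u) - f(x)) / mu * u   # 单方向的两点梯度估计
    return g / q

f = lambda x: np.sum(x**2)
x = np.ones(5)
for _ in range(100):
    x -= 0.05 * zo_avg_grad(f, x, q=4)
print(round(f(x), 4))   # 目标值应显著下降
```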


【6】Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises
标题:重尾噪音下的非凸分散随机二层优化
链接:https://arxiv.org/abs/2509.15543

作者:ang, Yihan Zhang, Hongchang Gao
摘要:现有的分散随机优化方法假设低层损失函数是强凸的,随机梯度噪声具有有限的方差。这些强假设在现实世界的机器学习模型中通常不满足。为了解决这些问题,我们提出了一种新的分散随机双层优化算法的重尾噪声下的非凸双层优化问题。具体来说,我们开发了一个归一化的随机方差减少双层梯度下降算法,它不依赖于任何裁剪操作。此外,我们建立了它的收敛速度,通过创新性地绑定重尾噪声下的非凸分散双层优化问题的相互依赖的梯度序列。据我们所知,这是重尾噪声下第一个具有严格理论保证的分散双层优化算法。大量的实验结果证实了我们的算法在处理重尾噪声的有效性。
摘要:Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions typically are not satisfied in real-world machine learning models. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noises. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noises for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noises. The extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noises.


【7】Training Variational Quantum Circuits Using Particle Swarm Optimization
标题:使用粒子群优化训练变分量子电路
链接:https://arxiv.org/abs/2509.15726

作者:dacci, Michele Amoretti
摘要:在这项工作中,我们使用粒子群优化(PSO)算法来训练各种变分量子电路(VQC)。这种方法的动机在于,常用的基于梯度的优化方法可能受困于贫瘠高原(barren plateaus)问题。PSO是一种受鸟群集体行为启发的随机优化技术。群的规模、算法的迭代次数和可训练参数的数量均可设置。在本研究中,PSO被用于训练VQC的整体结构,允许其选择应用哪些量子门、目标量子比特,以及在选择旋转门时的旋转角度。该算法仅限于从四种类型的门中选择:Rx、Ry、Rz和CNOT。所提出的优化方法已在MedMNIST的多个数据集上进行了测试,MedMNIST是为图像分类任务设计的生物医学图像数据集合集。我们将其性能与对预定义VQC应用经典随机梯度下降所取得的结果进行了比较。结果表明,尽管PSO使用的量子门数量少于梯度下降优化所用的VQC,它仍能在多个数据集上取得相当甚至更好的分类精度。
摘要:In this work, the Particle Swarm Optimization (PSO) algorithm has been used to train various Variational Quantum Circuits (VQCs). This approach is motivated by the fact that commonly used gradient-based optimization methods can suffer from the barren plateaus problem. PSO is a stochastic optimization technique inspired by the collective behavior of a swarm of birds. The dimension of the swarm, the number of iterations of the algorithm, and the number of trainable parameters can be set. In this study, PSO has been used to train the entire structure of VQCs, allowing it to select which quantum gates to apply, the target qubits, and the rotation angle, in case a rotation is chosen. The algorithm is restricted to choosing from four types of gates: Rx, Ry, Rz, and CNOT. The proposed optimization approach has been tested on various datasets of the MedMNIST, which is a collection of biomedical image datasets designed for image classification tasks. Performance has been compared with the results achieved by classical stochastic gradient descent applied to a predefined VQC. The results show that the PSO can achieve comparable or even better classification accuracy across multiple datasets, despite the PSO using a lower number of quantum gates than the VQC used with gradient descent optimization.
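下面给出标准PSO更新式的极简numpy示意(连续参数版本,仅作演示;论文中PSO还直接搜索门类型与目标量子比特等离散结构,此处未复现):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """示意:标准 PSO 更新 v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)。"""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-np.pi, np.pi, (n_particles, dim))   # 例如旋转角参数
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

best, val = pso_minimize(lambda t: np.sum(np.sin(t)**2), dim=4)
print(round(val, 6))
```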


【8】Deep Gaussian Process-based Cost-Aware Batch Bayesian Optimization for Complex Materials Design Campaigns
标题:用于复杂材料设计活动的基于深度高斯过程的成本感知批量贝叶斯优化
链接:https://arxiv.org/abs/2509.14408

作者:af Akif Alvi, Brent Vela, Vahid Attari, Jan Janssen, Danny Perez, Douglas Allaire, Raymundo Arroyave
摘要:材料发现的加速步伐和不断扩大的范围,需要能够高效遍历巨大非线性设计空间、同时审慎分配有限评估资源的优化框架。我们提出了一种具有成本意识的批量贝叶斯优化方案,由深度高斯过程(DGP)代理模型和异位(heterotopic)查询策略提供支持。我们的DGP代理通过堆叠GP层构成,对高维成分特征之间复杂的层次关系进行建模,捕获多个目标属性之间的相关性,并让不确定性在连续层间传播。我们将评估成本集成到一个置信上界(UCB)采集函数的扩展中,该扩展与异位查询一起并行提出小批量候选,在探索表征不足的区域与利用相关属性上高均值、低方差的预测之间取得平衡。应用于面向高温应用的难熔高熵合金,凭借成本感知查询,我们的框架比传统的基于GP的BO用更少的迭代收敛到最优配方,凸显了深度、不确定性感知且成本敏感的策略在材料研发活动中的价值。
摘要:The accelerating pace and expanding scope of materials discovery demand optimization frameworks that efficiently navigate vast, nonlinear design spaces while judiciously allocating limited evaluation resources. We present a cost-aware, batch Bayesian optimization scheme powered by deep Gaussian process (DGP) surrogates and a heterotopic querying strategy. Our DGP surrogate, formed by stacking GP layers, models complex hierarchical relationships among high-dimensional compositional features and captures correlations across multiple target properties, propagating uncertainty through successive layers. We integrate evaluation cost into an upper-confidence-bound acquisition extension, which, together with heterotopic querying, proposes small batches of candidates in parallel, balancing exploration of under-characterized regions with exploitation of high-mean, low-variance predictions across correlated properties. Applied to refractory high-entropy alloys for high-temperature applications, our framework converges to optimal formulations in fewer iterations with cost-aware queries than conventional GP-based BO, highlighting the value of deep, uncertainty-aware, cost-sensitive strategies in materials campaigns.
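下面用numpy示意把评估成本并入UCB采集函数的一种常见做法(假设性草图:(mu + kappa*sigma)/cost 仅为示例形式,论文的具体扩展与异位批量查询机制未复现):

```python
import numpy as np

def cost_aware_ucb(mu, sigma, cost, kappa=2.0):
    """示意:成本感知 UCB 的一种常见形式——把 UCB 得分除以评估成本。"""
    return (mu + kappa * sigma) / cost

mu = np.array([1.0, 0.8, 1.2])       # 代理模型(如 DGP)的后验均值
sigma = np.array([0.1, 0.5, 0.2])    # 后验标准差
cost = np.array([1.0, 1.0, 4.0])     # 每个候选的评估成本
scores = cost_aware_ucb(mu, sigma, cost)
batch = np.argsort(-scores)[:2]      # 取得分最高的 2 个构成小批量候选
print(scores.round(3), batch)
```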


预测|估计(12篇)

【1】Predicting the descent into extremism and terrorism
标题:预测走向极端主义和恐怖主义
链接:https://arxiv.org/abs/2509.16014

作者:, W.J. Holmes, C.J. Taylor, H.M. State-Davey, A.J. Wragge
备注:10 pages, 12 figures, presented at 6th IMA Conference on Mathematics in Defence and Security, Online, 30 September 2023 (conference page at this https URL). arXiv admin note: text overlap with arXiv:2502.00013
摘要:本文提出了一种方法,用于自动分析和跟踪在线收集材料中的言论,并检测言论作者是否可能参与极端主义或恐怖主义。拟议的系统包括:在线整理言论并将其编码为适合机器学习(ML)处理的形式、对编码文本进行分类的ML组件、跟踪器,以及用于结果分析的可视化系统。检测与跟踪概念已使用从wikiquote.org获得的恐怖分子、极端分子、活动家和政治家的引语进行了测试。使用最先进的Universal Sentence Encoder(Cer et al. 2018)为每条引语提取一组特征,该编码器产生512维向量。这些数据被用于以10折交叉验证训练和测试支持向量机(SVM)分类器。在839条引语的数据集上,该系统对与极端主义相关的意图和态度的正确检测率为81%,对与恐怖主义相关的为97%。这一准确率高于基于n-gram文本特征的简单基线系统。跟踪技术还被用于对数据进行时间维度的分析,每条引语被视为对一个人精神状态的含噪测量。结果表明,跟踪算法既能发现随时间的趋势,也能检测出可归因于重大事件的态度突变。
摘要:This paper proposes an approach for automatically analysing and tracking statements in material gathered online and detecting whether the authors of the statements are likely to be involved in extremism or terrorism. The proposed system comprises: online collation of statements that are then encoded in a form amenable to machine learning (ML), an ML component to classify the encoded text, a tracker, and a visualisation system for analysis of results. The detection and tracking concept has been tested using quotes made by terrorists, extremists, campaigners, and politicians, obtained from wikiquote.org. A set of features was extracted for each quote using the state-of-the-art Universal Sentence Encoder (Cer et al. 2018), which produces 512-dimensional vectors. The data were used to train and test a support vector machine (SVM) classifier using 10-fold cross-validation. The system was able to correctly detect intentions and attitudes associated with extremism 81% of the time and terrorism 97% of the time, using a dataset of 839 quotes. This accuracy was higher than that which was achieved for a simple baseline system based on n-gram text features. Tracking techniques were also used to perform a temporal analysis of the data, with each quote considered to be a noisy measurement of a person's state of mind. It was demonstrated that the tracking algorithms were able to detect both trends over time and sharp changes in attitude that could be attributed to major events.
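下面用scikit-learn示意"512维句向量 + SVM + 十折交叉验证"的评估流程(假设性草图:以随机向量代替真实的Universal Sentence Encoder嵌入与标签):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# 示意:对 512 维句向量做 SVM 十折交叉验证;此处用随机数据代替真实嵌入。
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))        # 每条引语一个 512 维向量(玩具数据)
y = rng.integers(0, 2, size=200)       # 是否与极端主义相关(玩具标签)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
print(scores.mean().round(3))
```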


【2】UniTac2Pose: A Unified Approach Learned in Simulation for Category-level Visuotactile In-hand Pose Estimation
标题:UniTac2Pose:一种在仿真中学习的类别级视触觉手内位姿估计统一方法
链接:https://arxiv.org/abs/2509.15934

作者:Wu, Long Yang, Jin Liu, Weiyao Huang, Lehong Wu, Zelin Chen, Daolin Ma, Hao Dong
摘要:基于CAD模型准确估计物体的手持姿态在工业应用和日常任务中至关重要,从定位工件和组装组件到无缝插入USB连接器等设备。虽然现有的方法通常依赖于回归、特征匹配或配准技术,但实现对看不见的CAD模型的高精度和可推广性仍然是一个重大挑战。在本文中,我们提出了一种新的三阶段的框架,在手姿态估计。第一阶段涉及对姿势候选者进行采样和预排名,然后在第二阶段对这些候选者进行迭代细化。在最后阶段,应用后排序来识别最可能的姿势候选者。这些阶段由统一的基于能量的扩散模型控制,该模型仅在模拟数据上进行训练。该能量模型同时生成梯度以细化姿态估计,并产生量化姿态估计的质量的能量标量。此外,借用计算机视觉领域的想法,我们在基于能量的评分网络中引入了渲染比较架构,以显着提高模拟到真实的性能,正如我们的消融研究所证明的那样。我们进行了全面的实验,以表明我们的方法优于基于回归,匹配和配准技术的传统基线,同时还表现出强大的类别内泛化到以前看不见的CAD模型。此外,我们的方法将触觉对象姿态估计,姿态跟踪和不确定性估计集成到一个统一的框架中,从而在各种现实条件下实现鲁棒性能。
摘要:Accurate estimation of the in-hand pose of an object based on its CAD model is crucial in both industrial applications and everyday tasks, ranging from positioning workpieces and assembling components to seamlessly inserting devices like USB connectors. While existing methods often rely on regression, feature matching, or registration techniques, achieving high precision and generalizability to unseen CAD models remains a significant challenge. In this paper, we propose a novel three-stage framework for in-hand pose estimation. The first stage involves sampling and pre-ranking pose candidates, followed by iterative refinement of these candidates in the second stage. In the final stage, post-ranking is applied to identify the most likely pose candidates. These stages are governed by a unified energy-based diffusion model, which is trained solely on simulated data. This energy model simultaneously generates gradients to refine pose estimates and produces an energy scalar that quantifies the quality of the pose estimates. Additionally, borrowing the idea from the computer vision domain, we incorporate a render-compare architecture within the energy-based score network to significantly enhance sim-to-real performance, as demonstrated by our ablation studies. We conduct comprehensive experiments to show that our method outperforms conventional baselines based on regression, matching, and registration techniques, while also exhibiting strong intra-category generalization to previously unseen CAD models. Moreover, our approach integrates tactile object pose estimation, pose tracking, and uncertainty estimation into a unified framework, enabling robust performance across a variety of real-world conditions.


【3】From Data to Diagnosis: A Large, Comprehensive Bone Marrow Dataset and AI Methods for Childhood Leukemia Prediction
标题:从数据到诊断:用于儿童白血病预测的大型、全面的骨髓数据集和人工智能方法
链接:https://arxiv.org/abs/2509.15895

作者:öfener (1), Farina Kock (1), Martina Pontones (2), Tabita Ghete (2 and 3), David Pfrang (1), Nicholas Dickel (4), Meik Kunz (4), Daniela P. Schacherer (1), David A. Clunie (5), Andrey Fedorov (6), Max Westphal (1), Markus Metzler (2 and 3 and 7) ((1) Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany, (2) Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany, (3) Bavarian Cancer Research Center (BZKF), Erlangen, Germany, (4) Medical Informatics, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany, (5) PixelMed Publishing LLC, Bangor, PA, USA, (6) Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA, (7) Comprehensive Cancer Center Erlangen-EMN, Erlangen, Germany)
摘要:白血病诊断主要依赖于骨髓形态的手动显微镜分析,并辅以额外的实验室参数,这使得其复杂且耗时。虽然已经提出了人工智能(AI)解决方案,但大多数解决方案都使用私有数据集,并且只覆盖诊断管道的一部分。因此,我们提供了一个大型,高质量,公开可用的白血病骨髓数据集,涵盖了从细胞检测到诊断的整个诊断过程。使用这个数据集,我们进一步提出了细胞检测,细胞分类和诊断预测的方法。该数据集包括246名儿科患者的诊断,临床和实验室信息,超过40,000个带有边界框注释的单元格,其中超过28,000个具有高质量的类标签,使其成为公开可用的最全面的数据集。对AI模型的评估得出细胞检测的平均精度为0.96,曲线下面积为0.98,33类细胞分类的F1得分为0.61,使用预测细胞计数进行诊断预测的平均F1得分为0.90。虽然所提出的方法证明了它们对人工智能辅助诊断的有用性,但该数据集将促进该领域的进一步研究和开发,最终有助于更精确的诊断和改善患者的预后。
摘要:Leukemia diagnosis primarily relies on manual microscopic analysis of bone marrow morphology supported by additional laboratory parameters, making it complex and time consuming. While artificial intelligence (AI) solutions have been proposed, most utilize private datasets and only cover parts of the diagnostic pipeline. Therefore, we present a large, high-quality, publicly available leukemia bone marrow dataset spanning the entire diagnostic process, from cell detection to diagnosis. Using this dataset, we further propose methods for cell detection, cell classification, and diagnosis prediction. The dataset comprises 246 pediatric patients with diagnostic, clinical and laboratory information, over 40 000 cells with bounding box annotations and more than 28 000 of these with high-quality class labels, making it the most comprehensive dataset publicly available. Evaluation of the AI models yielded an average precision of 0.96 for the cell detection, an area under the curve of 0.98, and an F1-score of 0.61 for the 33-class cell classification, and a mean F1-score of 0.90 for the diagnosis prediction using predicted cell counts. While the proposed approaches demonstrate their usefulness for AI-assisted diagnostics, the dataset will foster further research and development in the field, ultimately contributing to more precise diagnoses and improved patient outcomes.


【4】Tsururu: A Python-based Time Series Forecasting Strategies Library
标题:Tsururu:基于Python的时间序列预测策略库
链接:https://arxiv.org/abs/2509.15843

作者:tromina, Kseniia Kuvshinova, Aleksandr Yugay, Andrey Savchenko, Dmitry Simakov
备注:Accepted at IJCAI'25 Demo Track
摘要:虽然目前的时间序列研究的重点是开发新的模型,选择一个最佳的方法来训练这样的模型的关键问题是探索不足。Tsururu是本文介绍的Python库,通过灵活组合全局和多变量方法以及多步预测策略,为SoTA研究和行业搭建了桥梁。它还可以与各种预测模型无缝集成。可在https://github.com/sb-ai-lab/tsururu上获得。
摘要:While current time series research focuses on developing new models, crucial questions of selecting an optimal approach for training such models are underexplored. Tsururu, a Python library introduced in this paper, bridges SoTA research and industry by enabling flexible combinations of global and multivariate approaches and multi-step-ahead forecasting strategies. It also enables seamless integration with various forecasting models. Available at https://github.com/sb-ai-lab/tsururu .


【5】SolarCrossFormer: Improving day-ahead Solar Irradiance Forecasting by Integrating Satellite Imagery and Ground Sensors
标题:SolarCrossFormer:通过集成卫星图像和地面传感器改进日前太阳辐照度预报
链接:https://arxiv.org/abs/2509.15827

作者:Schubnel, Jelena Simeunović, Corentin Tissier, Pierre-Jean Alet, Rafael E. Carrillo
备注:15 pages, 17 figures, submitted to IEEE Transactions on Sustainable Energy
摘要:太阳能光伏系统大规模并网需要准确的日前太阳辐照度预报。然而,目前的预测解决方案缺乏系统运营商所需的时间和空间分辨率。在本文中,我们介绍了SolarCrossFormer,这是一种用于日前辐照度预测的新型深度学习模型,它结合了来自地面气象站网络的卫星图像和时间序列。SolarCrossFormer使用新颖的图形神经网络来利用输入数据的模态间和模态内相关性,并提高预测的准确性和分辨率。它可以为瑞士任何地点生成概率预报,分辨率为15分钟,最长可达24小时。SolarCrossFormer的主要优势之一是其在实际操作中的鲁棒性。它可以在不重新训练模型的情况下纳入新的时间序列数据,此外,它可以在没有输入数据的情况下仅使用坐标来生成位置预测。在瑞士127个地点的一年数据集上的实验结果表明,SolarCrossFormer在预测范围内的归一化平均绝对误差为6.1%。其结果与商业数值天气预报服务所取得的结果具有竞争力。
摘要:Accurate day-ahead forecasts of solar irradiance are required for the large-scale integration of solar photovoltaic (PV) systems into the power grid. However, current forecasting solutions lack the temporal and spatial resolution required by system operators. In this paper, we introduce SolarCrossFormer, a novel deep learning model for day-ahead irradiance forecasting, that combines satellite images and time series from a ground-based network of meteorological stations. SolarCrossFormer uses novel graph neural networks to exploit the inter- and intra-modal correlations of the input data and improve the accuracy and resolution of the forecasts. It generates probabilistic forecasts for any location in Switzerland with a 15-minute resolution for horizons up to 24 hours ahead. One of the key advantages of SolarCrossFormer is its robustness in real-life operations. It can incorporate new time-series data without retraining the model and, additionally, it can produce forecasts for locations without input data by using only their coordinates. Experimental results over a dataset of one year and 127 locations across Switzerland show that SolarCrossFormer yields a normalized mean absolute error of 6.1 % over the forecasting horizon. The results are competitive with those achieved by a commercial numerical weather prediction service.


【6】Incremental Multistep Forecasting of Battery Degradation Using Pseudo Targets
标题:使用伪目标的电池退化增量多步预测
链接:https://arxiv.org/abs/2509.15740

作者:Adam Rico, Nagarajan Raghavan, Senthilnath Jayavelu
备注:The published version of this preprint can be accessed at this https URL
摘要:数据驱动模型可准确执行早期电池预测,以防止设备故障和进一步的安全隐患。大多数现有的机器学习(ML)模型以离线模式工作,每次遇到新的数据分布时都必须考虑部署后的再训练。因此,需要一种模型能够适应变化分布的在线ML方法。然而,在线增量式多步预测仍是一大挑战,因为无法在当前时刻就其预测对模型进行纠正;而且这些方法需要等待相当长的时间以积累足够的流数据后才能再训练。在这项研究中,我们提出了iFSNet(增量式快慢学习网络),它是FSNet面向单通(逐样本)模式的改进版本,利用伪目标实现多步预测。它对输入序列做简单线性回归以外推伪未来样本(伪目标),据此计算预测其余部分的损失并持续更新模型。该模型得益于FSNet的联想记忆和自适应结构机制,同时通过使用伪目标实现增量式改进。所提模型在退化轨迹平滑的数据集上取得0.00197 RMSE和0.00154 MAE,在带有容量再生尖峰的不规则退化轨迹数据集上取得0.01588 RMSE和0.01234 MAE。
摘要:Data-driven models accurately perform early battery prognosis to prevent equipment failure and further safety hazards. Most existing machine learning (ML) models work in offline mode which must consider their retraining post-deployment every time new data distribution is encountered. Hence, there is a need for an online ML approach where the model can adapt to varying distributions. However, existing online incremental multistep forecasts are a great challenge as there is no way to correct the model of its forecasts at the current instance. Also, these methods need to wait for a considerable amount of time to acquire enough streaming data before retraining. In this study, we propose iFSNet (incremental Fast and Slow learning Network) which is a modified version of FSNet for a single-pass mode (sample-by-sample) to achieve multistep forecasting using pseudo targets. It uses a simple linear regressor of the input sequence to extrapolate pseudo future samples (pseudo targets) and calculate the loss from the rest of the forecast and keep updating the model. The model benefits from the associative memory and adaptive structure mechanisms of FSNet, at the same time the model incrementally improves by using pseudo targets. The proposed model achieved 0.00197 RMSE and 0.00154 MAE on datasets with smooth degradation trajectories while it achieved 0.01588 RMSE and 0.01234 MAE on datasets having irregular degradation trajectories with capacity regeneration spikes.
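下面用numpy示意伪目标的构造方式:对输入窗口做一阶线性回归并外推未来样本(假设性草图,数据为玩具序列):

```python
import numpy as np

def pseudo_targets(window, horizon):
    """示意:对输入序列做简单线性回归,外推 horizon 个伪未来样本(伪目标),
    使逐样本在线模式下也能就多步预测持续更新模型。"""
    t = np.arange(len(window))
    slope, intercept = np.polyfit(t, window, deg=1)   # 一阶线性拟合
    future_t = np.arange(len(window), len(window) + horizon)
    return slope * future_t + intercept

capacity = np.array([1.00, 0.99, 0.985, 0.978, 0.97])  # 玩具容量序列
print(pseudo_targets(capacity, horizon=3).round(4))
```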


【7】Manifold Dimension Estimation: An Empirical Study
标题:流形维数估计的实证研究
链接:https://arxiv.org/abs/2509.15517

作者:, Pierre Lafaye de Micheaux
摘要:流形假设认为高维数据通常位于低维流形上或其附近。估计该流形的维数对于利用其结构至关重要,但现有的维数估计工作较为零散,缺乏系统评估。这篇文章为研究人员和从业人员提供了一份全面的综述。我们回顾了经常被忽视的理论基础,并介绍了八种代表性的估计器。通过对照实验,我们分析了噪声、曲率和样本量等单个因素如何影响性能。我们还在多样的合成与真实数据集上比较了这些估计器,并提出了一种针对具体数据集进行超参数调优的原则性方法。我们的结果提供了实用指导,并表明对于这种具有普遍性的问题,更简单的方法往往表现更好。
摘要:The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure, yet existing work on dimension estimation is fragmented and lacks systematic evaluation. This article provides a comprehensive survey for both researchers and practitioners. We review often-overlooked theoretical foundations and present eight representative estimators. Through controlled experiments, we analyze how individual factors such as noise, curvature, and sample size affect performance. We also compare the estimators on diverse synthetic and real-world datasets, introducing a principled approach to dataset-specific hyperparameter tuning. Our results offer practical guidance and suggest that, for a problem of this generality, simpler methods often perform better.
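The abstract does not name the eight estimators, so purely as an illustration of the genre, below is one common baseline of the kind such surveys compare: a local-PCA intrinsic-dimension estimator. The neighborhood size and variance threshold are illustrative hyperparameters.

import numpy as np

def local_pca_dimension(X, k=20, var_threshold=0.95):
    """Estimate intrinsic dimension as the number of principal components
    needed to explain var_threshold of local variance, averaged over points."""
    dims = []
    for i in range(len(X)):
        d2 = np.sum((X - X[i]) ** 2, axis=1)
        nbrs = X[np.argsort(d2)[1:k + 1]]           # k nearest neighbours
        centered = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(centered, compute_uv=False) ** 2
        ratios = np.cumsum(s) / s.sum()
        dims.append(int(np.searchsorted(ratios, var_threshold)) + 1)
    return float(np.mean(dims))

# noisy circle embedded in 3-D: intrinsic dimension is 1
theta = np.random.rand(500) * 2 * np.pi
X = np.c_[np.cos(theta), np.sin(theta), 0.01 * np.random.randn(500)]
print(local_pca_dimension(X))   # prints a value close to 1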


【8】KoopCast: Trajectory Forecasting via Koopman Operators
标题:KoopCast:基于Koopman算子的轨迹预测
链接:https://arxiv.org/abs/2509.15513

作者:ee, Jaeuk Shin, Gihwan Kim, Joonho Han, Insoon Yang
摘要:我们提出了KoopCast,一个用于一般动态环境中轨迹预测的轻量而高效的模型。我们的方法利用Koopman算子理论,通过将轨迹提升到更高维空间来实现非线性动力学的线性表示。该框架遵循两阶段设计:首先,概率神经目标估计器预测合理的长期目标,指明去哪里;其次,基于Koopman算子的细化模块将意图和历史纳入非线性特征空间,实现决定如何去的线性预测。这种双重结构不仅确保了强大的预测精度,还在忠实捕捉非线性动力学的同时继承了线性算子的优良性质。因此,我们的模型提供了三个关键优势:(i)有竞争力的精度,(ii)基于Koopman谱理论的可解释性,以及(iii)低延迟部署。我们在ETH/UCY、Waymo Open Motion Dataset和nuScenes上验证了这些优势,这些数据集具有丰富的多智能体交互和受地图约束的非线性运动。在各基准测试中,KoopCast始终提供高预测精度以及模式级可解释性和实际效率。
摘要:We present KoopCast, a lightweight yet efficient model for trajectory forecasting in general dynamic environments. Our approach leverages Koopman operator theory, which enables a linear representation of nonlinear dynamics by lifting trajectories into a higher-dimensional space. The framework follows a two-stage design: first, a probabilistic neural goal estimator predicts plausible long-term targets, specifying where to go; second, a Koopman operator-based refinement module incorporates intention and history into a nonlinear feature space, enabling linear prediction that dictates how to go. This dual structure not only ensures strong predictive accuracy but also inherits the favorable properties of linear operators while faithfully capturing nonlinear dynamics. As a result, our model offers three key advantages: (i) competitive accuracy, (ii) interpretability grounded in Koopman spectral theory, and (iii) low-latency deployment. We validate these benefits on ETH/UCY, the Waymo Open Motion Dataset, and nuScenes, which feature rich multi-agent interactions and map-constrained nonlinear motion. Across benchmarks, KoopCast consistently delivers high predictive accuracy together with mode-level interpretability and practical efficiency.
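A sketch of the Koopman mechanics behind the refinement stage, under two simplifying assumptions stated up front: a fixed polynomial dictionary instead of KoopCast's learned features, and an EDMD-style least-squares fit of the linear operator.

import numpy as np

def lift(x):
    # hypothetical fixed dictionary; KoopCast learns its features instead
    return np.array([x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1], 1.0])

def fit_koopman(X, Y):
    """X, Y: (n_samples, 2) state pairs with Y[i] = step(X[i]);
    solve Phi(X) @ K ~= Phi(Y) by least squares."""
    Phi_X = np.stack([lift(x) for x in X])
    Phi_Y = np.stack([lift(y) for y in Y])
    K, *_ = np.linalg.lstsq(Phi_X, Phi_Y, rcond=None)
    return K

def rollout(K, x0, steps):
    phi, traj = lift(x0), []
    for _ in range(steps):
        phi = phi @ K                  # prediction is linear in lifted space
        traj.append(phi[:2])           # first two features are the state
    return np.array(traj)

# toy nonlinear system: x' = 0.9x, y' = 0.8y + 0.1x^2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Y = np.c_[0.9 * X[:, 0], 0.8 * X[:, 1] + 0.1 * X[:, 0] ** 2]
print(rollout(fit_koopman(X, Y), np.array([0.5, -0.3]), steps=3))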


【9】VMDNet: Time Series Forecasting with Leakage-Free Samplewise Variational Mode Decomposition and Multibranch Decoding
标题:VMDNet:采用无泄漏逐样本变分模态分解和多分支解码的时间序列预测
链接:https://arxiv.org/abs/2509.15394

作者:ng, Ran Tao, John Cartlidge, Jin Zheng
备注:5 pages, 1 figure, 2 tables
摘要:在时间序列预测中,捕获重复出现的时间模式至关重要;分解技术使这种结构显式化,从而提高预测性能。变分模态分解(VMD)是一种用于周期性感知分解的强大信号处理方法,近年来被越来越多地采用。然而,现有研究往往存在信息泄漏,并依赖不恰当的超参数调整。为了解决这些问题,我们提出了VMDNet,一个保持因果性的框架,它(i)应用逐样本VMD以避免泄漏;(ii)用频率感知嵌入表示每个分解模态,并使用并行时间卷积网络(TCN)对其进行解码,确保模态独立性和高效学习;(iii)引入受Stackelberg启发的双层优化,自适应地选择VMD的两个核心超参数:模态数量(K)和带宽惩罚(alpha)。在两个能源相关数据集上的实验表明,VMDNet在周期性较强时达到了最先进的结果,在捕获结构化周期模式方面优势明显,同时在弱周期性下保持鲁棒性。
摘要:In time series forecasting, capturing recurrent temporal patterns is essential; decomposition techniques make such structure explicit and thereby improve predictive performance. Variational Mode Decomposition (VMD) is a powerful signal-processing method for periodicity-aware decomposition and has seen growing adoption in recent years. However, existing studies often suffer from information leakage and rely on inappropriate hyperparameter tuning. To address these issues, we propose VMDNet, a causality-preserving framework that (i) applies sample-wise VMD to avoid leakage; (ii) represents each decomposed mode with frequency-aware embeddings and decodes it using parallel temporal convolutional networks (TCNs), ensuring mode independence and efficient learning; and (iii) introduces a bilevel, Stackelberg-inspired optimisation to adaptively select VMD's two core hyperparameters: the number of modes (K) and the bandwidth penalty (alpha). Experiments on two energy-related datasets demonstrate that VMDNet achieves state-of-the-art results when periodicity is strong, showing clear advantages in capturing structured periodic patterns while remaining robust under weak periodicity.
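A minimal sketch of the leakage-free, sample-wise decomposition the abstract describes: every training window is decomposed on its own, using only values inside that window, so no future information enters the inputs. The decompose() helper is a crude moving-average stand-in for a real VMD routine, used here only to keep the example self-contained.

import numpy as np

def decompose(window, width=5):
    """Stand-in for a per-sample VMD call: split the window into a
    moving-average trend and a residual (two crude 'modes')."""
    kernel = np.ones(width) / width
    trend = np.convolve(window, kernel, mode="same")
    return np.stack([trend, window - trend])

def make_training_samples(series, lookback, horizon):
    samples = []
    for t in range(lookback, len(series) - horizon + 1):
        window = series[t - lookback:t]     # past values only: no leakage
        modes = decompose(window)           # decomposition is per sample
        target = series[t:t + horizon]      # future values used as labels only
        samples.append((modes, target))
    return samples

series = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
print(len(make_training_samples(series, lookback=48, horizon=12)))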


【10】Copycat vs. Original: Multi-modal Pretraining and Variable Importance in Box-office Prediction
标题:模仿与原创:票房预测中的多模态预训练与变量重要性
链接:https://arxiv.org/abs/2509.15277

作者: Eunsoo Kim, Boyang Li
摘要:电影行业的风险水平很高,这就需要使用自动化工具来预测票房收入并辅助人类决策。在这项研究中,我们构建了一个复杂的多模态神经网络,通过将每部电影的众包描述性关键词锚定到电影海报的视觉信息中来预测票房,从而增强所学的关键词表示,使票房预测误差大幅降低14.5%。这一先进的收入预测模型使得分析"山寨电影"(即与近期上映的成功电影高度相似的电影)的商业可行性成为可能。我们通过计算模仿特征在票房预测中的影响来实现这一点。我们发现模仿状态与电影收入之间存在正相关关系。然而,当相似电影的数量及其内容相似度增加时,这种效应会减弱。总的来说,我们的工作为研究电影行业开发了复杂的深度学习工具,并提供了有价值的商业洞见。
摘要:The movie industry is associated with an elevated level of risk, which necessitates the use of automated tools to predict box-office revenue and facilitate human decision-making. In this study, we build a sophisticated multimodal neural network that predicts box offices by grounding crowdsourced descriptive keywords of each movie in the visual information of the movie posters, thereby enhancing the learned keyword representations, resulting in a substantial reduction of 14.5% in box-office prediction error. The advanced revenue prediction model enables the analysis of the commercial viability of "copycat movies," or movies with substantial similarity to successful movies released recently. We do so by computing the influence of copycat features in box-office prediction. We find a positive relationship between copycat status and movie revenue. However, this effect diminishes when the number of similar movies and the similarity of their content increase. Overall, our work develops sophisticated deep learning tools for studying the movie industry and provides valuable business insight.


【11】DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction
标题:DeepMech:用于化学反应机制预测的机器学习框架
链接:https://arxiv.org/abs/2509.15872

作者:as, Ajnabiul Hoque, Mayank Baranwal, Raghavan B. Sunoj
备注:37 pages, 8 figures
摘要:完整的分步化学反应机理(CRM)预测仍然是一个重大挑战。CRM任务中的传统方法依赖专家驱动的实验或昂贵的量子化学计算,而当代的深度学习(DL)替代方案忽略了关键中间体和机理步骤,且常常受到幻觉的困扰。我们提出了DeepMech,一个可解释的基于图的DL框架,采用原子和键级注意力,并由机理操作的广义模板(TMOps)引导来生成CRM。在我们精心构建的ReactMech数据集(约30K条CRM,包含100K个原子映射且质量平衡的基元步骤)上训练后,DeepMech在预测基元步骤方面达到98.98+/-0.12%的准确率,在完整CRM任务上达到95.94+/-0.21%的准确率,并且即使在分布外场景以及预测副产物和/或次要产物时也保持高保真度。扩展到与前生命(prebiotic)化学相关的多步CRM,证明了DeepMech能够有效重建从简单原始底物到丝氨酸和戊醛糖等复杂生物分子的途径。注意力分析识别出的反应性原子/键符合化学直觉,使我们的模型具有可解释性并适合用于反应设计。
摘要:Prediction of complete step-by-step chemical reaction mechanisms (CRMs) remains a major challenge. Whereas the traditional approaches in CRM tasks rely on expert-driven experiments or costly quantum chemical computations, contemporary deep learning (DL) alternatives ignore key intermediates and mechanistic steps and often suffer from hallucinations. We present DeepMech, an interpretable graph-based DL framework employing atom- and bond-level attention, guided by generalized templates of mechanistic operations (TMOps), to generate CRMs. Trained on our curated ReactMech dataset (~30K CRMs with 100K atom-mapped and mass-balanced elementary steps), DeepMech achieves 98.98+/-0.12% accuracy in predicting elementary steps and 95.94+/-0.21% in complete CRM tasks, besides maintaining high fidelity even in out-of-distribution scenarios as well as in predicting side and/or byproducts. Extension to multistep CRMs relevant to prebiotic chemistry, demonstrates the ability of DeepMech in effectively reconstructing pathways from simple primordial substrates to complex biomolecules such as serine and aldopentose. Attention analysis identifies reactive atoms/bonds in line with chemical intuition, rendering our model interpretable and suitable for reaction design.


【12】(SP)$^2$-Net: A Neural Spatial Spectrum Method for DOA Estimation
标题:(SP)$^2$-Net:一种用于波达方向估计的神经空间谱方法
链接:https://arxiv.org/abs/2509.15475

作者:an, Sharon Gannot, Tom Tirer
备注:Code can be found at this https URL
摘要:我们考虑从天线阵列的单个快照中估计多个源的到达方向(DOA)的问题,这一任务具有许多实际应用。在这种设置下,经典的Bartlett波束形成器被广泛使用,因为当源的数量未知或较大时,最大似然估计变得不切实际,而由于缺乏多个快照,基于样本协方差的谱方法也不适用。然而,Bartlett波束形成器的精度和分辨率从根本上受限于阵列孔径。在本文中,我们提出了一种深度学习技术,包括一种新颖的架构和训练策略,用于从单个快照生成高分辨率空间谱。具体来说,我们训练一个深度神经网络,它以测量值和假设角度作为输入,学习输出与更宽阵列能力相一致的分数。在推理时,可以通过扫描任意一组角度来生成热图。我们展示了我们训练的模型(命名为(SP)$^2$-Net)相对于Bartlett波束形成器和基于稀疏性的DOA估计方法的优势。
摘要:We consider the problem of estimating the directions of arrival (DOAs) of multiple sources from a single snapshot of an antenna array, a task with many practical applications. In such settings, the classical Bartlett beamformer is commonly used, as maximum likelihood estimation becomes impractical when the number of sources is unknown or large, and spectral methods based on the sample covariance are not applicable due to the lack of multiple snapshots. However, the accuracy and resolution of the Bartlett beamformer are fundamentally limited by the array aperture. In this paper, we propose a deep learning technique, comprising a novel architecture and training strategy, for generating a high-resolution spatial spectrum from a single snapshot. Specifically, we train a deep neural network that takes the measurements and a hypothesis angle as input and learns to output a score consistent with the capabilities of a much wider array. At inference time, a heatmap can be produced by scanning an arbitrary set of angles. We demonstrate the advantages of our trained model, named (SP)$^2$-Net, over the Bartlett beamformer and sparsity-based DOA estimation methods.
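For reference, the Bartlett baseline that (SP)$^2$-Net is compared against fits in a few lines for a uniform linear array and a single snapshot; the array geometry, source angles, and noise level below are illustrative.

import numpy as np

def steer(theta_deg, M=16, d=0.5):
    """Steering vector of an M-element ULA with spacing d (in wavelengths)."""
    return np.exp(-2j * np.pi * d * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

def bartlett_spectrum(x, n_angles=361):
    """Scan hypothesis angles; score each by matched-filter power |a^H x|^2."""
    angles = np.linspace(-90, 90, n_angles)
    P = np.array([np.abs(np.vdot(steer(a, M=len(x)), x)) ** 2 / len(x)
                  for a in angles])
    return angles, P

# single snapshot: two sources at -20 and 35 degrees plus light noise
rng = np.random.default_rng(0)
x = steer(-20) + steer(35) + 0.05 * (rng.standard_normal(16) + 1j * rng.standard_normal(16))
angles, P = bartlett_spectrum(x)
print(angles[np.argmax(P)])   # strongest peak lies near one true DOA

The proposed network replaces this inner-product score with a learned score over (measurement, hypothesis angle) pairs, which is what buys resolution beyond the aperture limit.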


其他神经网络|深度学习|模型|建模(24篇)

【1】MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
标题:MANZANO:一个简单且可扩展、带有混合视觉令牌化器的统一多模态模型
链接:https://arxiv.org/abs/2509.16197

作者:i, Rui Qian, Bowen Pan, Haotian Zhang, Haoshuo Huang, Bowen Zhang, Jialing Tong, Haoxuan You, Xianzhi Du, Zhe Gan, Hyunjik Kim, Chao Jia, Zhenbang Wang, Yinfei Yang, Mingfei Gao, Zi-Yi Dou, Wenze Hu, Chang Gao, Dongxu Li, Philipp Dufter, Zirui Wang, Guoli Yin, Zhengdong Zhang, Chen Chen, Yang Zhao, Ruoming Pang, Zhifeng Chen
摘要:能够同时理解和生成视觉内容的统一多模态大语言模型(LLM)具有巨大潜力。然而,现有的开源模型往往在这两种能力之间存在性能权衡。我们提出了Manzano,一个简单且可扩展的统一框架,通过将混合图像令牌化器与精心设计的训练配方相结合,大幅缓解了这种权衡。单个共享视觉编码器为两个轻量级适配器提供输入,它们在共同的语义空间内分别为图像到文本理解生成连续嵌入、为文本到图像生成产生离散令牌。统一的自回归LLM以文本和图像令牌的形式预测高级语义,辅助扩散解码器随后将图像令牌转换为像素。该架构连同针对理解与生成数据的统一训练配方,实现了两种能力的可扩展联合学习。Manzano在统一模型中取得了最先进的结果,并且与专用模型相比具有竞争力,尤其是在富文本评估上。我们的研究显示任务冲突极小,并且随模型规模扩大获得一致增益,验证了混合令牌化器这一设计选择。
摘要 :Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.


【2】Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
标题:潜在学习:情景记忆通过灵活重复使用经验来补充参数学习
链接:https://arxiv.org/abs/2509.16189

作者:le Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, James L. McClelland
摘要:机器学习系统何时无法泛化,什么机制可以提高它们的泛化能力?在这里,我们从认知科学中汲取灵感,认为机器学习系统的一个弱点是它们未能表现出潜在学习,即学习与当前任务无关但可能在未来任务中有用的信息。我们展示了这一视角如何将从语言建模中的逆转诅咒到基于智能体导航的新发现等一系列失败联系起来。然后,我们强调认知科学如何指出情景记忆可能是解决这些问题的一部分。相应地,我们表明,具有oracle检索机制的系统可以更灵活地利用学习经验,在许多此类挑战上更好地泛化。我们还确定了有效使用检索的一些关键组成部分,包括样例内的上下文学习对于获得跨检索样例使用信息的能力的重要性。总之,我们的结果揭示了当前机器学习系统相对于自然智能数据效率较低的一个可能原因,并有助于理解检索方法如何补充参数学习以提高泛化能力。
摘要:When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help to understand how retrieval methods can complement parametric learning to improve generalization.


【3】Spatio-temporal, multi-field deep learning of shock propagation in meso-structured media
标题:细观结构介质中冲击波传播的时空多场深度学习
链接:https://arxiv.org/abs/2509.16139

作者:e Fernández-Godino, Meir H. Shachar, Kevin Korner, Jonathan L. Belof, Mukul Kumar, Jonathan Lind, William J. Schill
备注:16 pages, 10 figures
摘要:预测冲击波如何穿越多孔和结构化材料的能力是行星防御、国家安全以及实现惯性聚变能源竞赛中的决定性因素。然而,尽管最近在单场和降阶表示方面取得了进展,捕获孔隙坍塌、反常Hugoniot响应和局部加热(这些现象可能决定小行星偏转或聚变点火的成败)仍然是一个重大挑战。我们引入了一个多场时空深度学习模型(MSTM),它将七个耦合场(压力、密度、温度、能量、物质分布和两个速度分量)统一到一个自回归代理模型中。在高保真流体动力学程序(hydrocode)数据上训练后,MSTM的运行速度比直接模拟快约一千倍,在多孔材料中的误差低于4%,在晶格结构中的误差低于10%。与以前的单场或基于算子的代理模型不同,MSTM能够解析尖锐的冲击波阵面,同时将质量平均压力和温度等积分量保持在5%以内。这一进展将曾被认为难以处理的问题转化为可行的设计研究,为优化行星撞击缓解、惯性聚变能源和国家安全中的细观结构材料建立了实用框架。
摘要:The ability to predict how shock waves traverse porous and architected materials is a decisive factor in planetary defense, national security, and the race to achieve inertial fusion energy. Yet capturing pore collapse, anomalous Hugoniot responses, and localized heating -- phenomena that can determine the success of asteroid deflection or fusion ignition -- has remained a major challenge despite recent advances in single-field and reduced representations. We introduce a multi-field spatio-temporal deep learning model (MSTM) that unifies seven coupled fields -- pressure, density, temperature, energy, material distribution, and two velocity components -- into a single autoregressive surrogate. Trained on high-fidelity hydrocode data, MSTM runs about a thousand times faster than direct simulation, achieving errors below 4% in porous materials and below 10% in lattice structures. Unlike prior single-field or operator-based surrogates, MSTM resolves sharp shock fronts while preserving integrated quantities such as mass-averaged pressure and temperature to within 5%. This advance transforms problems once considered intractable into tractable design studies, establishing a practical framework for optimizing meso-structured materials in planetary impact mitigation, inertial fusion energy, and national security.


【4】Automated Constitutive Model Discovery by Pairing Sparse Regression Algorithms with Model Selection Criteria
标题:通过将稀疏回归算法与模型选择标准配对自动发现本构模型
链接:https://arxiv.org/abs/2509.16040

作者:berto Urrea-Quintero, David Anton, Laura De Lorenzis, Henning Wessels
摘要:从数据中自动发现本构模型最近已成为传统模型校准范式的一个有前途的替代方案。在这项工作中,我们提出了一个完全自动化的本构模型发现框架,系统地将三种稀疏回归算法(最小绝对收缩和选择算子(LASSO)、最小角回归(LARS)、正交匹配追踪(OMP))与三种模型选择准则配对:$K$折交叉验证(CV)、赤池信息准则(AIC)和贝叶斯信息准则(BIC)。这种配对产生了九种不同的模型发现算法,并能够系统地探索稀疏性、预测性能和计算成本之间的权衡。LARS是$\ell_1$约束问题的高效路径式求解器,而OMP则被引入作为$\ell_0$正则化选择的易处理启发式方法。该框架被应用于各向同性和各向异性超弹性,使用了合成和实验数据集。结果表明,所有九种算法-准则组合在各向同性和各向异性材料的发现中均表现稳定良好,产生了高度准确的本构模型。这些发现将可行的发现算法范围拓宽到LASSO等基于$\ell_1$的方法之外。
摘要:The automated discovery of constitutive models from data has recently emerged as a promising alternative to the traditional model calibration paradigm. In this work, we present a fully automated framework for constitutive model discovery that systematically pairs three sparse regression algorithms (Least Absolute Shrinkage and Selection Operator (LASSO), Least Angle Regression (LARS), and Orthogonal Matching Pursuit (OMP)) with three model selection criteria: $K$-fold cross-validation (CV), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). This pairing yields nine distinct algorithms for model discovery and enables a systematic exploration of the trade-off between sparsity, predictive performance, and computational cost. While LARS serves as an efficient path-based solver for the $\ell_1$-constrained problem, OMP is introduced as a tractable heuristic for $\ell_0$-regularized selection. The framework is applied to both isotropic and anisotropic hyperelasticity, utilizing both synthetic and experimental datasets. Results reveal that all nine algorithm-criterion combinations perform consistently well for the discovery of isotropic and anisotropic materials, yielding highly accurate constitutive models. These findings broaden the range of viable discovery algorithms beyond $\ell_1$-based approaches such as LASSO.
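A hedged sketch of the solver-criterion pairing using scikit-learn's LASSO, LARS and OMP; the random feature library stands in for the paper's hyperelastic model library, and all hyperparameters are illustrative.

import numpy as np
from sklearn.linear_model import Lasso, Lars, OrthogonalMatchingPursuit
from sklearn.model_selection import KFold

def aic(rss, n, k):   # Akaike information criterion, up to constants
    return n * np.log(rss / n) + 2 * k

def bic(rss, n, k):   # Bayesian information criterion, up to constants
    return n * np.log(rss / n) + k * np.log(n)

def make(solver):
    return {"lasso": Lasso(alpha=0.1, fit_intercept=False),
            "lars": Lars(n_nonzero_coefs=3, fit_intercept=False),
            "omp": OrthogonalMatchingPursuit(n_nonzero_coefs=3,
                                             fit_intercept=False)}[solver]

def cv_mse(solver, X, y, folds=5):
    errs = []
    for tr, te in KFold(folds, shuffle=True, random_state=0).split(X):
        m = make(solver).fit(X[tr], y[tr])
        errs.append(np.mean((y[te] - m.predict(X[te])) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(0)
n, p = 200, 10
Theta = rng.standard_normal((n, p))           # candidate model terms
beta = np.zeros(p); beta[[1, 4]] = [2.0, -1.5]
y = Theta @ beta + 0.05 * rng.standard_normal(n)

for solver in ("lasso", "lars", "omp"):
    m = make(solver).fit(Theta, y)
    k = int(np.count_nonzero(m.coef_))
    rss = float(np.sum((y - m.predict(Theta)) ** 2))
    print(solver, "k =", k, "AIC = %.1f" % aic(rss, n, k),
          "BIC = %.1f" % bic(rss, n, k), "CV-MSE = %.4f" % cv_mse(solver, Theta, y))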


【5】Targeted Fine-Tuning of DNN-Based Receivers via Influence Functions
标题:通过影响函数对基于DNN的接收机进行有针对性的微调
链接:https://arxiv.org/abs/2509.15950

作者:nonen, Heikki Penttinen, Ville Hautamäki
备注:7 pages; 10 figures; 1 table; 19 equations
摘要:我们首次将影响函数用于基于深度学习的无线接收机。将其应用于全卷积接收机DeepRx后,影响分析揭示了哪些训练样本驱动比特预测,从而能够对表现不佳的情形进行有针对性的微调。我们发现,采用类容量(capacity-like)二元交叉熵损失的损失相对影响,配合对有益样本的一阶更新,最能稳定地将误码率改善至接近genie辅助的性能,在单目标场景中优于随机微调。多目标自适应被证明效果较差,凸显了尚未解决的挑战。在实验之外,我们将影响与自影响校正联系起来,并提出了一种二阶的、与影响对齐的更新策略。我们的结果确立了影响函数既是一种可解释性工具,也是高效接收机自适应的基础。
摘要:We present the first use of influence functions for deep learning-based wireless receivers. Applied to DeepRx, a fully convolutional receiver, influence analysis reveals which training samples drive bit predictions, enabling targeted fine-tuning of poorly performing cases. We show that loss-relative influence with capacity-like binary cross-entropy loss and first-order updates on beneficial samples most consistently improves bit error rate toward genie-aided performance, outperforming random fine-tuning in single-target scenarios. Multi-target adaptation proved less effective, underscoring open challenges. Beyond experiments, we connect influence to self-influence corrections and propose a second-order, influence-aligned update strategy. Our results establish influence functions as both an interpretability tool and a basis for efficient receiver adaptation.


【6】Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds
标题:作为世界模型的基础模型:基于文本的GridWorlds的基础研究
链接:https://arxiv.org/abs/2509.15915

作者:o, Michelangelo Conserva, Dominik Jeurissen, Paulo Rauber
备注:20 pages, 9 figures. Accepted for presentation at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on Embodied World Models for Decision Making
摘要:虽然从头开始的强化学习在借助高效模拟器求解顺序决策任务方面取得了令人印象深刻的结果,但交互代价高昂的现实应用需要样本效率更高的智能体。基础模型(FM)拥有广泛的知识和推理能力,是提高样本效率的天然候选,但目前尚不清楚如何将它们有效地集成到强化学习框架中。在本文中,我们展望并(更重要的是)实证评估了两种有前景的策略。首先,我们考虑使用基础世界模型(FWM),利用FM的先验知识,通过模拟交互来训练和评估智能体。其次,我们考虑使用基础智能体(FA),利用FM的推理能力进行决策。我们在一系列适合当前一代大型语言模型(LLM)的网格世界环境中对这两种方法进行了实证评估。我们的结果表明:LLM的改进已经转化为更好的FWM和FA;基于当前LLM的FA已经能够为足够简单的环境提供出色的策略;FWM与强化学习智能体的结合对于具有部分可观测性和随机因素的更复杂设置非常有前景。
摘要:While reinforcement learning from scratch has shown impressive results in solving sequential decision-making tasks with efficient simulators, real-world applications with expensive interactions require more sample-efficient agents. Foundation models (FMs) are natural candidates to improve sample efficiency as they possess broad knowledge and reasoning capabilities, but it is yet unclear how to effectively integrate them into the reinforcement learning framework. In this paper, we anticipate and, most importantly, evaluate two promising strategies. First, we consider the use of foundation world models (FWMs) that exploit the prior knowledge of FMs to enable training and evaluating agents with simulated interactions. Second, we consider the use of foundation agents (FAs) that exploit the reasoning capabilities of FMs for decision-making. We evaluate both approaches empirically in a family of grid-world environments that are suitable for the current generation of large language models (LLMs). Our results suggest that improvements in LLMs already translate into better FWMs and FAs; that FAs based on current LLMs can already provide excellent policies for sufficiently simple environments; and that the coupling of FWMs and reinforcement learning agents is highly promising for more complex settings with partial observability and stochastic elements.


【7】Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data
标题:通过采样合成数据在潜空间中进行高效长尾学习
链接:https://arxiv.org/abs/2509.15859

作者:rma
备注:Accepted to Curated Data for Efficient Learning Workshop at ICCV 2025
摘要:类别不平衡的数据集给机器学习带来了重大挑战,常常导致模型有偏、在代表性不足的类别上表现不佳。随着基础模型的兴起,近期研究聚焦于对这些模型进行全量、部分和参数高效的微调以处理长尾分类。尽管这些工作在基准数据集上表现令人印象深刻,但它们仍无法缩小与在平衡数据集上训练的网络之间的差距,并且即使对于相对较小的数据集也仍需要大量计算资源。强调计算效率和简单性的重要性,在这项工作中我们提出了一个新颖的框架:利用视觉基础模型丰富的语义潜空间生成合成数据,并使用真实与合成数据的混合训练一个简单的线性分类器来进行长尾分类。计算效率的提升来自于可训练参数数量被减少到仅为线性模型的参数数量。我们的方法在CIFAR-100-LT基准上创造了新的最先进水平,并在Places-LT基准上表现出强劲性能,突显了这一简单有效方法的有效性和适应性。
摘要:Imbalanced classification datasets pose significant challenges in machine learning, often leading to biased models that perform poorly on underrepresented classes. With the rise of foundation models, recent research has focused on the full, partial, and parameter-efficient fine-tuning of these models to deal with long-tail classification. Despite the impressive performance of these works on the benchmark datasets, they still fail to close the gap with the networks trained using the balanced datasets and still require substantial computational resources, even for relatively smaller datasets. Underscoring the importance of computational efficiency and simplicity, in this work we propose a novel framework that leverages the rich semantic latent space of Vision Foundation Models to generate synthetic data and train a simple linear classifier using a mixture of real and synthetic data for long-tail classification. The computational efficiency gain arises from the number of trainable parameters that are reduced to just the number of parameters in the linear model. Our method sets a new state-of-the-art for the CIFAR-100-LT benchmark and demonstrates strong performance on the Places-LT benchmark, highlighting the effectiveness and adaptability of our simple and effective approach.
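A minimal sketch of the recipe with its assumptions made explicit: random vectors stand in for embeddings from a frozen vision foundation model, synthetic tail-class features are drawn from per-class Gaussians fit in that latent space, and only a linear classifier is trained.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_head, n_tail = 64, 500, 20
feats = {0: rng.normal(0.0, 1.0, (n_head, d)),    # head-class embeddings
         1: rng.normal(0.5, 1.0, (n_tail, d))}    # tail-class embeddings

X, y = [], []
for c, F in feats.items():
    mu, sigma = F.mean(axis=0), F.std(axis=0) + 1e-6
    n_synth = n_head - len(F)                     # balance classes via synthesis
    synth = rng.normal(mu, sigma, (n_synth, d)) if n_synth > 0 else np.empty((0, d))
    X.append(np.vstack([F, synth]))
    y.append(np.full(len(F) + n_synth, c))

X, y = np.vstack(X), np.concatenate(y)
clf = LogisticRegression(max_iter=1000).fit(X, y)  # the only trained component
print(clf.score(X, y))

The trainable-parameter count is exactly that of the linear head, which is where the claimed efficiency gain comes from.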


【8】ThermalGuardian: Temperature-Aware Testing of Automotive Deep Learning Frameworks
标题:ThermalGuardian:汽车深度学习框架的温度感知测试
链接:https://arxiv.org/abs/2509.15815

作者:Zou, Juan Zhai, Chunrong Fang, Zhenyu Chen
摘要:深度学习模型在自动驾驶系统中发挥着至关重要的作用,支持环境感知等关键功能。为了加速模型推理,这些深度学习模型的部署依赖于汽车深度学习框架,例如Apollo中的PaddleInference和AutoWare中的TensorRT。然而,与在云上部署深度学习模型不同,车辆环境会经历从-40 ° C到50 ° C的极端环境温度,这会显著影响GPU温度。此外,计算时产生的热量进一步导致GPU温度升高。这些温度波动会导致通过DVFS等机制动态调整GPU频率。然而,汽车深度学习框架的设计没有考虑温度引起的频率变化的影响。当部署在温度变化的GPU上时,这些框架会遇到严重的质量问题:计算密集型运算符面临延迟或错误,高/混合精度运算符会遇到精度错误,时间序列运算符会遇到同步问题。现有的深度学习框架测试方法无法检测到上述质量问题,因为它们忽略了温度对深度学习框架质量的影响。为了弥补这一差距,我们提出了ThermalGuardian,这是第一个在温度变化环境下测试汽车深度学习框架的方法。具体而言,ThermalGuardian使用针对温度敏感算子的模型变异规则生成测试输入模型,根据牛顿冷却定律模拟GPU温度波动,并根据实时GPU温度控制GPU频率。
摘要:Deep learning models play a vital role in autonomous driving systems, supporting critical functions such as environmental perception. To accelerate model inference, these deep learning models' deployment relies on automotive deep learning frameworks, for example, PaddleInference in Apollo and TensorRT in AutoWare. However, unlike deploying deep learning models on the cloud, vehicular environments experience extreme ambient temperatures varying from -40°C to 50°C, significantly impacting GPU temperature. Additionally, heat generated during computation further increases the GPU temperature. These temperature fluctuations lead to dynamic GPU frequency adjustments through mechanisms such as DVFS. However, automotive deep learning frameworks are designed without considering the impact of temperature-induced frequency variations. When deployed on temperature-varying GPUs, these frameworks suffer critical quality issues: compute-intensive operators face delays or errors, high/mixed-precision operators suffer from precision errors, and time-series operators suffer from synchronization issues. The above quality issues cannot be detected by existing deep learning framework testing methods because they ignore temperature's effect on the deep learning framework quality. To bridge this gap, we propose ThermalGuardian, the first automotive deep learning framework testing method under temperature-varying environments. Specifically, ThermalGuardian generates test input models using model mutation rules targeting temperature-sensitive operators, simulates GPU temperature fluctuations based on Newton's law of cooling, and controls GPU frequency based on real-time GPU temperature.
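One concrete reading of the temperature-simulation step attributed to Newton's law of cooling, with illustrative constants and a toy DVFS throttling policy; the tool's actual coefficients and frequency table are not given in the abstract.

def step_gpu_temp(T, T_ambient, heat_in, k=0.05, dt=1.0):
    """One Euler step of dT/dt = -k (T - T_ambient) + heating."""
    return T + dt * (-k * (T - T_ambient) + heat_in)

def frequency_for_temp(T):
    """Hypothetical DVFS policy: throttle frequency as temperature rises."""
    if T < 70.0:
        return 1.8e9   # Hz
    if T < 85.0:
        return 1.2e9
    return 0.8e9

T = 45.0
for t in range(10):                                  # hot ambient: 50 C
    T = step_gpu_temp(T, T_ambient=50.0, heat_in=2.0)
    print(t, round(T, 2), frequency_for_temp(T))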


【9】Learning to Optimize Capacity Planning in Semiconductor Manufacturing
标题:学习优化半导体制造中的产能规划
链接:https://arxiv.org/abs/2509.15767

作者:ndelfinger, Jieyi Bi, Qiuyu Zhu, Jianan Zhou, Bo Zhang, Fei Fei Zhang, Chew Wye Chan, Boon Ping Gan, Wentong Cai, Jie Zhang
摘要:在制造业中,产能规划是根据可变需求分配生产资源的过程。半导体制造中当前的行业实践通常应用启发式规则来对行动进行优先级排序,例如考虑即将到来的机台与配方专用安排的未来变更列表。然而,启发式方法虽然具有可解释性,却难以刻画工艺流程中可能逐渐导致瓶颈形成的复杂交互。在这里,我们提出了一个基于神经网络、在单台机器层面进行产能规划的模型,并使用深度强化学习进行训练。通过使用异构图神经网络表示策略,该模型直接捕获机器和加工步骤之间的多样化关系,从而实现前瞻性决策。我们描述了为应对庞大的机器级动作空间、实现足够可扩展性所采取的若干措施。   我们的评估结果涵盖了英特尔的小规模Minifab模型以及使用流行的SMT2020测试平台进行的初步实验。在最大的测试场景中,我们训练的策略将吞吐量提高、周期时间缩短各约1.8%。
摘要:In manufacturing, capacity planning is the process of allocating production resources in accordance with variable demand. The current industry practice in semiconductor manufacturing typically applies heuristic rules to prioritize actions, such as future change lists that account for incoming machine and recipe dedications. However, while offering interpretability, heuristics cannot easily account for the complex interactions along the process flow that can gradually lead to the formation of bottlenecks. Here, we present a neural network-based model for capacity planning on the level of individual machines, trained using deep reinforcement learning. By representing the policy using a heterogeneous graph neural network, the model directly captures the diverse relationships among machines and processing steps, allowing for proactive decision-making. We describe several measures taken to achieve sufficient scalability to tackle the vast space of possible machine-level actions.   Our evaluation results cover Intel's small-scale Minifab model and preliminary experiments using the popular SMT2020 testbed. In the largest tested scenario, our trained policy increases throughput and decreases cycle time by about 1.8% each.


【10】Aircraft Fuel Flow Modelling with Ageing Effects: From Parametric Corrections to Neural Networks
标题:考虑老化影响的飞机燃油流建模:从参数修正到神经网络
链接:https://arxiv.org/abs/2509.15736

作者:arry, Ramon Dalmau, Philippe Very, Junzi Sun
摘要:飞机燃油流量的精确建模对于运行规划和环境影响评估都至关重要,然而标准的参数化模型往往忽略了随飞机老化而发生的性能退化。本文研究了将发动机老化效应纳入空客A320-214燃油流量预测的多种方法,使用了一个综合数据集,包含来自9架服役年限各异机身的约19,000次快速存取记录器(QAR)飞行。我们系统地评估了经典的基于物理的模型、经验校正系数,以及将机龄作为输入特征或显式乘性偏差的数据驱动神经网络架构。结果表明,虽然基线模型一贯低估老旧飞机的燃油消耗,但使用与机龄相关的校正因子和神经模型可大幅减少偏差并提高预测精度。然而,机身数量少和缺乏详细的维修事件记录带来了局限,制约了基于机龄的校正的代表性和泛化能力。这项研究强调了在参数化和机器学习框架中考虑老化效应以提高运行和环境评估可靠性的重要性,同时也强调需要更多样化的数据集来刻画真实世界发动机劣化的复杂性。
摘要:Accurate modelling of aircraft fuel-flow is crucial for both operational planning and environmental impact assessment, yet standard parametric models often neglect performance deterioration that occurs as aircraft age. This paper investigates multiple approaches to integrate engine ageing effects into fuel-flow prediction for the Airbus A320-214, using a comprehensive dataset of approximately nineteen thousand Quick Access Recorder flights from nine distinct airframes with varying years in service. We systematically evaluate classical physics-based models, empirical correction coefficients, and data-driven neural network architectures that incorporate age either as an input feature or as an explicit multiplicative bias. Results demonstrate that while baseline models consistently underestimate fuel consumption for older aircraft, the use of age-dependent correction factors and neural models substantially reduces bias and improves prediction accuracy. Nevertheless, limitations arise from the small number of airframes and the lack of detailed maintenance event records, which constrain the representativeness and generalization of age-based corrections. This study emphasizes the importance of accounting for the effects of ageing in parametric and machine learning frameworks to improve the reliability of operational and environmental assessments. The study also highlights the need for more diverse datasets that can capture the complexity of real-world engine deterioration.


【11】KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
标题:KITE:用于上下文学习的核化与信息论样例
链接:https://arxiv.org/abs/2509.15676

作者:ingh, Soumya Suvra Ghosal, Kapu Nirmal Joshua, Soumyabrata Pal, Sayak Ray Chowdhury
摘要:上下文学习(ICL)已成为一种强大的范式,仅凭提示中给出的少量精心挑选的任务特定示例,即可使大型语言模型(LLM)适应新的、数据稀缺的任务。然而,鉴于LLM的上下文长度有限,一个基本问题随之出现:应选择哪些示例来最大化给定用户查询上的性能?虽然像KATE这样基于最近邻的方法已被广泛采用,但它们在高维嵌入空间中存在众所周知的缺陷,包括泛化能力差和缺乏多样性。在这项工作中,我们从有原则的、信息论驱动的角度研究ICL中的示例选择问题。我们首先将LLM建模为输入嵌入上的线性函数,并将示例选择任务表述为特定于查询的优化问题:从较大的示例库中选择一个子集,使特定查询上的预测误差最小化。该表述以针对特定查询实例的准确预测为目标,不同于传统的以泛化为中心的学习理论方法。我们推导出一个近似次模的原则性代理目标,使得贪婪算法具有近似保证。我们进一步增强了我们的方法:(i)结合核技巧,在高维特征空间中运算而无需显式映射;(ii)引入基于最优设计的正则化项以鼓励所选示例的多样性。在实验上,我们在一组分类任务中展示了相对于标准检索方法的显著改进,突显了结构感知、多样化的示例选择对现实世界标签稀缺场景下ICL的好处。
摘要 :In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.


【12】Efficient Extractive Text Summarization for Online News Articles Using Machine Learning
标题:使用机器学习高效提取在线新闻文章文本摘要
链接:https://arxiv.org/abs/2509.15614

作者:was, Milon Biswas, Arunima Mandal, Fatema Tabassum Liza, Joy Sarker
摘要:在信息过载的时代,在线新闻文章的内容管理依赖于高效的摘要来提升可访问性和用户参与度。本文通过采用先进的机器学习技术来生成简洁连贯且保留原意的摘要,以应对抽取式文本摘要的挑战。使用包含130万个文章-摘要对的康奈尔Newsroom数据集,我们开发了一条利用BERT嵌入将文本数据转换为数值表示的流水线。通过将任务表述为二元分类问题,我们探索了多种模型,包括逻辑回归、前馈神经网络和长短期记忆(LSTM)网络。我们的研究结果表明,凭借捕获顺序依赖关系的能力,LSTM网络在F1分数和ROUGE-1指标上优于Lede-3等基线方法和更简单的模型。这项研究强调了自动摘要在改进在线新闻平台内容管理系统方面的潜力,可实现更高效的内容组织并增强用户体验。
摘要:In the age of information overload, content management for online news articles relies on efficient summarization to enhance accessibility and user engagement. This article addresses the challenge of extractive text summarization by employing advanced machine learning techniques to generate concise and coherent summaries while preserving the original meaning. Using the Cornell Newsroom dataset, comprising 1.3 million article-summary pairs, we developed a pipeline leveraging BERT embeddings to transform textual data into numerical representations. By framing the task as a binary classification problem, we explored various models, including logistic regression, feed-forward neural networks, and long short-term memory (LSTM) networks. Our findings demonstrate that LSTM networks, with their ability to capture sequential dependencies, outperform baseline methods like Lede-3 and simpler models in F1 score and ROUGE-1 metrics. This study underscores the potential of automated summarization in improving content management systems for online news platforms, enabling more efficient content organization and enhanced user experiences.


【13】Universal Learning of Stochastic Dynamics for Exact Belief Propagation using Bernstein Normalizing Flows
标题:使用Bernstein正规化流进行精确信念传播的随机动力学通用学习
链接:https://arxiv.org/abs/2509.15533

作者:rese, Morteza Lahijanian
备注:13 pages, 7 figures
摘要:预测随机系统中未来状态的分布(称为信念传播)是不确定性推理的基础。然而,非线性动力学往往使解析的信念传播难以处理,需要近似方法。当系统模型未知且必须从数据中学习时,一个关键问题随之出现:我们能否学习一个模型,使其(i)能普遍逼近一般的非线性随机动力学,并且(ii)支持解析的信念传播?本文为同时满足这两个性质的一类模型建立了理论基础。所提出的方法将归一化流在密度估计上的表达能力与Bernstein多项式的解析易处理性相结合。实证结果表明,在信念传播任务上,尤其是对于具有非加性、非高斯噪声的高度非线性系统,我们学习的模型优于最先进的数据驱动方法。
摘要:Predicting the distribution of future states in a stochastic system, known as belief propagation, is fundamental to reasoning under uncertainty. However, nonlinear dynamics often make analytical belief propagation intractable, requiring approximate methods. When the system model is unknown and must be learned from data, a key question arises: can we learn a model that (i) universally approximates general nonlinear stochastic dynamics, and (ii) supports analytical belief propagation? This paper establishes the theoretical foundations for a class of models that satisfy both properties. The proposed approach combines the expressiveness of normalizing flows for density estimation with the analytical tractability of Bernstein polynomials. Empirical results show the efficacy of our learned model over state-of-the-art data-driven methods for belief propagation, especially for highly non-linear systems with non-additive, non-Gaussian noise.


【14】Computing Linear Regions in Neural Networks with Skip Connections
标题:用跳过连接计算神经网络中的线性区域
链接:https://arxiv.org/abs/2509.15441

作者:yce, Jan Verschelde
备注:Accepted for publication in the proceedings in Computer Algebra in Scientific Computing 2025
摘要:神经网络是机器学习的重要工具。用热带算术表示分段线性激活函数,使热带几何得以应用。本文提出了计算神经网络在其上为线性映射的区域的算法。通过计算实验,我们就训练神经网络的困难提供了见解,特别是过拟合问题以及跳跃连接的好处。
摘要:Neural networks are important tools in machine learning. Representing piecewise linear activation functions with tropical arithmetic enables the application of tropical geometry. Algorithms are presented to compute regions where the neural networks are linear maps. Through computational experiments, we provide insights on the difficulty to train neural networks, in particular on the problems of overfitting and on the benefits of skip connections.


【15】Deep learning and abstractive summarisation for radiological reports: an empirical study for adapting the PEGASUS models' family with scarce data
标题:放射学报告的深度学习与抽象式摘要:一项利用稀缺数据适配PEGASUS模型家族的实证研究
链接:https://arxiv.org/abs/2509.15419

作者:enzoni, Martina Langhals, Martin Boeker, Luise Modersohn, Máté E. Maros
备注:14 pages, 4 figures, and 3 tables
摘要:尽管人工智能发展迅速,抽象式摘要对于医学等敏感且数据受限的领域仍然具有挑战性。随着影像数量的增加,用于复杂医学文本摘要的自动化工具预计将变得高度重要。在本文中,我们研究了通过微调来适配一个非领域特定的抽象式摘要编码器-解码器模型家族的过程,并就如何避免过拟合和欠拟合为从业者提供了见解。我们在一个中等规模的放射学报告公共数据集上使用了PEGASUS和PEGASUS-X。对于每个模型,我们用不同规模的同一训练数据综合评估了两个不同的检查点。在固定规模的验证集上,我们在训练过程中使用词汇和语义指标监测模型的性能。PEGASUS表现出不同的阶段,这可能与按轮次(epoch-wise)的双下降或峰值-下降-恢复行为有关。对于PEGASUS-X,我们发现使用更大的检查点会导致性能下降。这项工作突显了在训练数据稀缺时微调高表达能力模型所面临的挑战和风险,并为未来研究专门领域中摘要模型更稳健的微调策略奠定了基础。
摘要:Despite the rapid development of artificial intelligence, abstractive summarisation is still challenging for sensitive and data-restrictive domains like medicine. With the increasing number of imaging, automated tools for complex medical text summarisation are expected to become highly relevant. In this paper, we investigated the adaptation, via fine-tuning, of a non-domain-specific abstractive summarisation encoder-decoder model family, and gave insights to practitioners on how to avoid over- and underfitting. We used PEGASUS and PEGASUS-X, on a medium-sized radiological reports public dataset. For each model, we comprehensively evaluated two different checkpoints with varying sizes of the same training data. We monitored the models' performances with lexical and semantic metrics during the training history on the fixed-size validation set. PEGASUS exhibited different phases, which can be related to epoch-wise double-descent, or peak-drop-recovery behaviour. For PEGASUS-X, we found that using a larger checkpoint led to a performance detriment. This work highlights the challenges and risks of fine-tuning models with high expressivity when dealing with scarce training data, and lays the groundwork for future investigations into more robust fine-tuning strategies for summarisation models in specialised domains.


【16】Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning
标题:全局预固定、局部调整:一种简单而有效的持续学习对比策略
链接:https://arxiv.org/abs/2509.15347

作者: Xinrui Wang, Songcan Chen
备注:The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: {https://doi.org/10.1007/s11704-025-50623-6}
摘要:持续学习(CL)涉及从不断演进的任务中获取和积累知识,同时缓解灾难性遗忘。近年来,利用对比损失来构建更易迁移、更不易遗忘的表示已成为CL中一个有前景的方向。尽管取得了进展,但由于任务间和任务内特征引起的混淆,它们的性能仍然有限。为了解决这个问题,我们提出了一种简单而有效的对比策略,称为全局预固定、局部调整的监督对比学习(Global Pre-fixing, Local Adjusting for Supervised Contrastive learning, GPLASC)。具体来说,为避免任务级混淆,我们将表示所在的整个单位超球面划分为互不重叠的区域,各区域中心构成一个任务间预先固定的等角紧框架(Equiangular Tight Frame, ETF)。同时,对于单个任务,我们的方法有助于调控特征结构,在各自分配的区域内形成任务内可调节的ETF。因此,我们的方法同时确保了任务之间和任务内部的判别性特征结构,并且可以无缝集成到任何现有的对比持续学习框架中。大量实验验证了其有效性。
摘要:Continual learning (CL) involves acquiring and accumulating knowledge from evolving tasks while alleviating catastrophic forgetting. Recently, leveraging contrastive loss to construct more transferable and less forgetful representations has been a promising direction in CL. Despite advancements, their performance is still limited due to confusion arising from both inter-task and intra-task features. To address the problem, we propose a simple yet effective contrastive strategy named \textbf{G}lobal \textbf{P}re-fixing, \textbf{L}ocal \textbf{A}djusting for \textbf{S}upervised \textbf{C}ontrastive learning (GPLASC). Specifically, to avoid task-level confusion, we divide the entire unit hypersphere of representations into non-overlapping regions, with the centers of the regions forming an inter-task pre-fixed \textbf{E}quiangular \textbf{T}ight \textbf{F}rame (ETF). Meanwhile, for individual tasks, our method helps regulate the feature structure and form intra-task adjustable ETFs within their respective allocated regions. As a result, our method \textit{simultaneously} ensures discriminative feature structures both between tasks and within tasks and can be seamlessly integrated into any existing contrastive continual learning framework. Extensive experiments validate its effectiveness.


【17】Kuramoto Orientation Diffusion Models
标题:仓本方向扩散模型
链接:https://arxiv.org/abs/2509.15328

作者: T. Anderson Keller, Sevan Brodjian, Takeru Miyato, Yisong Yue, Pietro Perona, Max Welling
备注:NeurIPS 2025
摘要:指纹和纹理等方向信息丰富的图像往往表现出一致的角度方向模式,难以用基于各向同性欧氏扩散的标准生成方法建模。受生物系统中相位同步作用的启发,我们提出了一个构建在周期域上的基于分数的生成模型,在扩散过程中利用随机Kuramoto动力学。在神经和物理系统中,Kuramoto模型刻画了耦合振荡器之间的同步现象,我们在此将这种行为重新用作结构化图像生成的归纳偏置。在我们的框架中,前向过程通过全局或局部耦合的振荡器相互作用以及对全局参考相位的吸引,在相位变量之间执行同步,逐渐将数据坍缩为低熵的von Mises分布。反向过程则执行去同步,通过用学习到的分数函数逆转动力学来生成多样的模式。这种方法在前向扩散中实现了结构化的破坏过程,并形成了一个从全局一致性逐步细化到精细尺度细节的层次化生成过程。我们实现了包裹高斯转移核和周期性感知网络以处理圆形几何。我们的方法在通用图像基准上取得了有竞争力的结果,并显著提高了指纹和纹理等方向密集数据集上的生成质量。归根结底,这项工作展示了受生物启发的同步动力学作为生成建模中结构化先验的前景。
摘要:Orientation-rich images, such as fingerprints and textures, often exhibit coherent angular directional patterns that are challenging to model using standard generative approaches based on isotropic Euclidean diffusion. Motivated by the role of phase synchronization in biological systems, we propose a score-based generative model built on periodic domains by leveraging stochastic Kuramoto dynamics in the diffusion process. In neural and physical systems, Kuramoto models capture synchronization phenomena across coupled oscillators -- a behavior that we re-purpose here as an inductive bias for structured image generation. In our framework, the forward process performs \textit{synchronization} among phase variables through globally or locally coupled oscillator interactions and attraction to a global reference phase, gradually collapsing the data into a low-entropy von Mises distribution. The reverse process then performs \textit{desynchronization}, generating diverse patterns by reversing the dynamics with a learned score function. This approach enables structured destruction during forward diffusion and a hierarchical generation process that progressively refines global coherence into fine-scale details. We implement wrapped Gaussian transition kernels and periodicity-aware networks to account for the circular geometry. Our method achieves competitive results on general image benchmarks and significantly improves generation quality on orientation-dense datasets like fingerprints and textures. Ultimately, this work demonstrates the promise of biologically inspired synchronization dynamics as structured priors in generative modeling.
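A sketch of the forward synchronization dynamics described above, assuming mean-field coupling, a fixed global reference phase, and an Euler-Maruyama discretization; the coefficients are illustrative, not the paper's noise schedule.

import numpy as np

def kuramoto_forward(theta, psi=0.0, K_couple=0.5, K_ref=0.5, sigma=0.1, dt=0.05):
    """One noisy Kuramoto step on the circle, wrapped to (-pi, pi]."""
    mean_field = np.mean(np.sin(theta[None, :] - theta[:, None]), axis=1)
    drift = K_couple * mean_field + K_ref * np.sin(psi - theta)
    noise = sigma * np.sqrt(dt) * np.random.randn(len(theta))
    return np.mod(theta + dt * drift + noise + np.pi, 2 * np.pi) - np.pi

theta = np.random.uniform(-np.pi, np.pi, 300)     # "data" phases
for _ in range(200):
    theta = kuramoto_forward(theta)               # forward = synchronize
order = np.abs(np.mean(np.exp(1j * theta)))       # Kuramoto order parameter
print(round(float(order), 3))                     # near 1: collapsed, von Mises-like

Generation runs this in reverse, replacing the known drift with a learned score so that phases desynchronize back into a structured pattern.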


【18】Autoguided Online Data Curation for Diffusion Model Training
标题:用于扩散模型训练的自动引导在线数据处理
链接:https://arxiv.org/abs/2509.15267

作者:ais, Luis Oala, Daniele Faccio, Marco Aversa
备注:Accepted non-archival paper at ICCV 2025 Workshop on Curated Data for Efficient Learning (CDEL)
摘要:生成模型的计算成本重新点燃了人们对高效数据筛选的期望。在这项工作中,我们研究了最近开发的自引导(autoguidance)和在线数据选择方法能否提高训练生成式扩散模型的时间和样本效率。我们将联合样本选择(JEST)和自引导集成到一个统一的代码库中,以实现快速消融和基准测试。我们在一个受控的二维合成数据生成任务以及(3x64x64)维图像生成上评估了数据筛选方法的组合。我们的比较在相同的挂钟时间和相同的样本数量下进行,并显式计入选择的开销。在所有实验中,自引导始终能提高样本质量和多样性。早期AJEST(仅在训练开始时应用选择)在这两项任务的数据效率上可与单独的自引导相当或略有超出。然而,其时间开销和增加的复杂性使得自引导或均匀随机数据选择在大多数情况下更为可取。这些发现表明,虽然有针对性的在线选择可以在训练早期带来效率增益,但稳健的样本质量改进主要由自引导驱动。我们讨论了局限性和适用范围,并概述了数据选择可能有益的情形。
摘要:The costs of generative model compute rekindled promises and hopes for efficient data curation. In this work, we investigate whether recently developed autoguidance and online data selection methods can improve the time and sample efficiency of training generative diffusion models. We integrate joint example selection (JEST) and autoguidance into a unified code base for fast ablation and benchmarking. We evaluate combinations of data curation on a controlled 2-D synthetic data generation task as well as (3x64x64)-D image generation. Our comparisons are made at equal wall-clock time and equal number of samples, explicitly accounting for the overhead of selection. Across experiments, autoguidance consistently improves sample quality and diversity. Early AJEST (applying selection only at the beginning of training) can match or modestly exceed autoguidance alone in data efficiency on both tasks. However, its time overhead and added complexity make autoguidance or uniform random data selection preferable in most situations. These findings suggest that while targeted online selection can yield efficiency gains in early training, robust sample quality improvements are primarily driven by autoguidance. We discuss limitations and scope, and outline when data selection may be beneficial.


【19】Generative AI Meets Wireless Sensing: Towards Wireless Foundation Model
标题:生成式AI与无线传感相遇:迈向无线基础模型
链接:https://arxiv.org/abs/2509.15258

作者:g, Guoxuan Chi, Chenshu Wu, Hanyu Liu, Yuchong Gao, Yunhao Liu, Jie Xu, Tony Xiao Han
摘要:生成人工智能(GenAI)在计算机视觉(CV)和自然语言处理(NLP)等领域取得了重大进展,证明了其合成高保真数据和提高泛化能力的能力。最近,将GenAI集成到无线传感系统中的兴趣越来越大。通过利用数据增强、域自适应和去噪等生成技术,可以显著改善无线传感应用,包括设备定位、人类活动识别和环境监测。本调查从两个互补的角度研究了GenAI和无线传感的融合。首先,我们探讨了如何将GenAI集成到无线传感管道中,重点关注两种集成模式:作为一个插件来增强特定于任务的模型,以及作为一个求解器来直接解决传感任务。其次,我们分析了主流生成模型的特点,如生成对抗网络(GANs),变分自编码器(VAE)和扩散模型,并讨论了它们在各种无线传感任务中的适用性和独特优势。我们进一步确定了将GenAI应用于无线传感的关键挑战,并概述了无线基础模型的未来发展方向:一个统一的、预先训练的设计,能够在不同的传感任务中实现可扩展、可适应和高效的信号理解。
摘要:Generative Artificial Intelligence (GenAI) has made significant advancements in fields such as computer vision (CV) and natural language processing (NLP), demonstrating its capability to synthesize high-fidelity data and improve generalization. Recently, there has been growing interest in integrating GenAI into wireless sensing systems. By leveraging generative techniques such as data augmentation, domain adaptation, and denoising, wireless sensing applications, including device localization, human activity recognition, and environmental monitoring, can be significantly improved. This survey investigates the convergence of GenAI and wireless sensing from two complementary perspectives. First, we explore how GenAI can be integrated into wireless sensing pipelines, focusing on two modes of integration: as a plugin to augment task-specific models and as a solver to directly address sensing tasks. Second, we analyze the characteristics of mainstream generative models, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, and discuss their applicability and unique advantages across various wireless sensing tasks. We further identify key challenges in applying GenAI to wireless sensing and outline a future direction toward a wireless foundation model: a unified, pre-trained design capable of scalable, adaptable, and efficient signal understanding across diverse sensing tasks.


【20】Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning
标题:可预先遗忘的模型:提示学习作为遗忘学习的原生机制
链接:https://arxiv.org/abs/2509.15230

作者:ndrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Giovanni Bellitto, Federica Proietto Salanitri, Concetto Spampinato, Matteo Pennisi
备注:Accepted at ACM multimedia 2025 BNI track
摘要:基础模型通过在不同的模态和任务中提供稳健且可迁移的表示,改变了多媒体分析。然而,它们的静态部署与日益增长的社会和监管需求相冲突,特别是GDPR等隐私框架所规定的按请求遗忘特定数据的要求。传统的遗忘学习方法,包括再训练、激活编辑或蒸馏,通常计算昂贵、脆弱,且不适合实时或持续演进的系统。在本文中,我们提出了一种范式转变:将遗忘重新思考为一种内置能力,而非事后的干预。我们引入了一个基于提示的学习框架,在单个训练阶段内统一了知识的获取与删除。我们的方法不是将信息编码在模型权重中,而是将类级语义绑定到专用的提示令牌上。这种设计使得只需删除相应的提示即可实现即时遗忘,无需再训练、修改模型或访问原始数据。实验表明,我们的框架在保留类上保持预测性能,同时有效抹除被遗忘的类。除实用性外,我们的方法还表现出强大的隐私和安全保证:它能抵抗成员推断攻击,并且删除提示后即使在对抗条件下也能阻止任何残留知识的提取。这确保了对数据保护原则的遵从,并防止对被遗忘信息的未经授权访问,使该框架适合部署在敏感和受监管的环境中。总的来说,通过将可移除性嵌入架构本身,这项工作为设计模块化、可扩展且合乎伦理要求的AI模型奠定了新的基础。
摘要:Foundation models have transformed multimedia analysis by enabling robust and transferable representations across diverse modalities and tasks. However, their static deployment conflicts with growing societal and regulatory demands -- particularly the need to unlearn specific data upon request, as mandated by privacy frameworks such as the GDPR. Traditional unlearning approaches, including retraining, activation editing, or distillation, are often computationally expensive, fragile, and ill-suited for real-time or continuously evolving systems. In this paper, we propose a paradigm shift: rethinking unlearning not as a retroactive intervention but as a built-in capability. We introduce a prompt-based learning framework that unifies knowledge acquisition and removal within a single training phase. Rather than encoding information in model weights, our approach binds class-level semantics to dedicated prompt tokens. This design enables instant unlearning simply by removing the corresponding prompt -- without retraining, model modification, or access to original data. Experiments demonstrate that our framework preserves predictive performance on retained classes while effectively erasing forgotten ones. Beyond utility, our method exhibits strong privacy and security guarantees: it is resistant to membership inference attacks, and prompt removal prevents any residual knowledge extraction, even under adversarial conditions. This ensures compliance with data protection principles and safeguards against unauthorized access to forgotten information, making the framework suitable for deployment in sensitive and regulated environments. Overall, by embedding removability into the architecture itself, this work establishes a new foundation for designing modular, scalable and ethically responsive AI models.


【21】Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities
链接:https://arxiv.org/abs/2509.15822

作者: Carpentier, Christophe Giraud, Nicolas Verzelen
摘要:统计物理学的预测假设,随机块模型(SBM)中的社区恢复在且仅在Kesten-Stigum(KS)阈值以上可以在多项式时间内实现。这一猜想催生了丰富的文献,证明只要社区数量$K$保持小于$\sqrt{n}$($n$为观察图中的节点数),在KS阈值以上的SBM中非平凡的社区恢复确实是可能的。当$K=o(\sqrt{n})$时,低次多项式在KS阈值以下的失败也已得到证明。   当$K\geq \sqrt{n}$时,Chin等人(2025)最近证明,在稀疏情形下,通过计数非回溯路径,可以在多项式时间内于KS阈值以下恢复社区。这一突破性结果促使他们为多社区情形($K\geq \sqrt{n}$)猜想了一个新的阈值。在这项工作中,我们提供的证据证实了他们对$K\geq \sqrt{n}$的猜想:   1. 我们证明,对于任意图密度,低次多项式在Chin等人(2025)所猜想的阈值以下无法恢复社区;   2. 我们证明,在所猜想的阈值以上,社区恢复可在多项式时间内实现,不仅限于Chin等人的稀疏情形,还包括某些(但非全部)中等稀疏情形,其方法本质上是对观察图中团的出现次数进行计数。
摘要:Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold, as long as the number $K$ of communities remains smaller than $\sqrt{n}$, where $n$ is the number of nodes in the observed graph. Failure of low-degree polynomials below the KS threshold was also proven when $K=o(\sqrt{n})$.   When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough result led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence that confirms their conjecture for $K\geq \sqrt{n}$:   1- We prove that, for any density of the graph, low-degree polynomials fail to recover communities below the threshold postulated by Chin et al. (2025);   2- We prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime of Chin et al., but also in some (but not all) moderately sparse regimes, by essentially counting clique occurrences in the observed graph.
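For readers who want the reference point, the Kesten-Stigum threshold for the symmetric $K$-community SBM (within-community edge probability $a/n$, between-community $b/n$) is commonly written as below; this standard form is supplied from the broader literature, not from the abstract itself.

\[
\mathrm{SNR} \;=\; \frac{(a-b)^2}{K\,\bigl(a + (K-1)\,b\bigr)} \;>\; 1,
\]

and the result above concerns regimes where community recovery stays tractable even when this inequality fails.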


【22】Impact of Single Rotations and Entanglement Topologies in Quantum Neural Networks
标题:量子神经网络中单旋转和纠缠拓扑的影响
链接:https://arxiv.org/abs/2509.15722

作者:dacci, Michele Amoretti
摘要:在这项工作中,分析了不同的变分量子电路的性能,研究了它如何随着纠缠拓扑、采用的门和要执行的量子机器学习任务而变化。分析的目的是确定构建量子神经网络电路的最佳方式。在所提出的实验中,使用了两种类型的电路:一种具有交替的旋转和纠缠层,另一种与第一种类似,但具有额外的最后一层旋转。作为旋转层,考虑一个和两个旋转序列的所有组合。四种不同的纠缠拓扑结构进行了比较:线性,循环,成对,和完整的。不同的任务被认为是,即概率分布和图像的生成,以及图像分类。实现的结果与不同电路的可表达性和纠缠能力相关,以了解这些功能如何影响性能。
摘要:In this work, an analysis of the performance of different Variational Quantum Circuits is presented, investigating how it changes with respect to entanglement topology, adopted gates, and Quantum Machine Learning tasks to be performed. The objective of the analysis is to identify the optimal way to construct circuits for Quantum Neural Networks. In the presented experiments, two types of circuits are used: one with alternating layers of rotations and entanglement, and the other, similar to the first one, but with an additional final layer of rotations. As rotation layers, all combinations of one and two rotation sequences are considered. Four different entanglement topologies are compared: linear, circular, pairwise, and full. Different tasks are considered, namely the generation of probability distributions and images, and image classification. Achieved results are correlated with the expressibility and entanglement capability of the different circuits to understand how these features affect performance.
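An illustrative construction (assuming Qiskit) of the four entanglement topologies compared above; the single-rotation layer and random parameter values are placeholders for the ansätze studied in the paper.

import numpy as np
from qiskit import QuantumCircuit

def entangle(qc, topology):
    n = qc.num_qubits
    pairs = {"linear":   [(i, i + 1) for i in range(n - 1)],
             "circular": [(i, (i + 1) % n) for i in range(n)],
             "pairwise": [(i, i + 1) for i in range(0, n - 1, 2)]
                        + [(i, i + 1) for i in range(1, n - 1, 2)],
             "full":     [(i, j) for i in range(n) for j in range(i + 1, n)]}[topology]
    for i, j in pairs:
        qc.cx(i, j)

def ansatz(n_qubits, layers, topology, rng=np.random.default_rng(0)):
    qc = QuantumCircuit(n_qubits)
    for _ in range(layers):
        for q in range(n_qubits):
            qc.ry(rng.uniform(0, 2 * np.pi), q)   # one-rotation layer (e.g. RY)
        entangle(qc, topology)                     # chosen entanglement topology
    return qc

print(ansatz(4, layers=2, topology="circular"))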


【23】Interpretable Network-assisted Random Forest+
标题:可解释网络辅助随机森林+
链接:https://arxiv.org/abs/2509.15611

作者:. Tang, Elizaveta Levina, Ji Zhu
摘要:机器学习算法通常假设训练样本是独立的。当数据点通过网络连接时,样本之间的相关性既是一个挑战,减少了有效样本大小,也是一个机会,可以通过利用来自网络邻居的信息来改善预测。现在有多种方法可以利用这个机会,但包括图神经网络在内的许多方法都不容易解释,限制了它们在理解模型如何进行预测方面的有用性。其他方法,如网络辅助线性回归,是可解释的,但通常会产生更差的预测性能。我们通过提出一系列灵活的网络辅助模型来弥合这一差距,这些模型建立在随机森林(RF+)的泛化基础上,实现了极具竞争力的预测准确性,并且可以通过特征重要性度量进行解释。特别是,我们开发了一套解释工具,使从业者不仅能够识别驱动模型预测的重要特征,而且还可以量化网络对预测的重要性。重要的是,我们提供了全局和局部重要性度量以及样本影响度量,以评估给定观测的影响。这套工具扩大了网络辅助机器学习的范围和适用性,以解决可解释性和透明度至关重要的高影响力问题。
摘要:Machine learning algorithms often assume that training samples are independent. When data points are connected by a network, the induced dependency between samples is both a challenge, reducing effective sample size, and an opportunity to improve prediction by leveraging information from network neighbors. Multiple methods taking advantage of this opportunity are now available, but many, including graph neural networks, are not easily interpretable, limiting their usefulness for understanding how a model makes its predictions. Others, such as network-assisted linear regression, are interpretable but often yield substantially worse prediction performance. We bridge this gap by proposing a family of flexible network-assisted models built upon a generalization of random forests (RF+), which achieves highly-competitive prediction accuracy and can be interpreted through feature importance measures. In particular, we develop a suite of interpretation tools that enable practitioners to not only identify important features that drive model predictions, but also quantify the importance of the network contribution to prediction. Importantly, we provide both global and local importance measures as well as sample influence measures to assess the impact of a given observation. This suite of tools broadens the scope and applicability of network-assisted machine learning for high-impact problems where interpretability and transparency are essential.


【24】Kernel Model Validation: How To Do It, And Why You Should Care
标题:核心模型验证:如何做到这一点,以及为什么您应该关心
链接:https://arxiv.org/abs/2509.15244

作者:ziani, Marieme Ngom
备注:12 pages, 6 figures. To appear in ITEA Journal of Test and Evaluation, Vol. 46, Issue 3, September 2025
摘要:高斯过程(GP)模型是不确定性量化(UQ)中的常用工具,因为它们声称能提供可用于表示模型不确定性的函数不确定性估计。通常很难精确说明这种不确定性带有何种概率解释,以及它以何种方式得到校准。没有这样的校准陈述,此类不确定性估计的价值就非常有限且只是定性的。我们通过描述GP预测校准失败如何导致一种称为目标自适应设计(Targeted Adaptive Design, TAD)的目标优化算法出现收敛性能退化,来说明对GP预测进行正确概率校准的重要性。我们讨论了UQ中GP生成的不确定性区间的解释,以及如何通过一个利用GP预测多元正态性的正式协方差核验证程序来学会信任它们。我们给出了误设定的一维GP回归模型的简单示例,并讨论了高维模型的情形。
摘要:Gaussian Process (GP) models are popular tools in uncertainty quantification (UQ) because they purport to furnish functional uncertainty estimates that can be used to represent model uncertainty. It is often difficult to state with precision what probabilistic interpretation attaches to such an uncertainty, and in what way it is calibrated. Without such a calibration statement, the value of such uncertainty estimates is quite limited and qualitative. We motivate the importance of proper probabilistic calibration of GP predictions by describing how GP predictive calibration failures can cause degraded convergence properties in a target optimization algorithm called Targeted Adaptive Design (TAD). We discuss the interpretation of GP-generated uncertainty intervals in UQ, and how one may learn to trust them, through a formal procedure for covariance kernel validation that exploits the multivariate normal nature of GP predictions. We give simple examples of misspecified 1-dimensional GP regression models, and discuss the situation with respect to higher-dimensional models.
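One concrete instance of the validation idea: because GP predictions are multivariate normal, whitened held-out residuals should be standard normal, so their squared norm can be tested against a chi-squared law. This is a generic calibration check in that spirit, not necessarily the paper's exact procedure.

import numpy as np
from scipy import stats

def calibration_pvalue(y, mu, Sigma):
    """p-value of ||L^{-1}(y - mu)||^2 under chi^2_n, with Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)
    r = np.linalg.solve(L, y - mu)       # whitened residuals
    return float(stats.chi2.sf(r @ r, df=len(y)))

rng = np.random.default_rng(0)
mu, Sigma = np.zeros(50), np.eye(50)
y_good = rng.multivariate_normal(mu, Sigma)       # calibrated predictions
y_bad = rng.multivariate_normal(mu, 4 * Sigma)    # variance underestimated
print(calibration_pvalue(y_good, mu, Sigma))      # moderate p-value
print(calibration_pvalue(y_bad, mu, Sigma))       # tiny p-value: miscalibrated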


其他(35篇)

【1】MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair
标题:MatchFixAgent:语言无关的自主存储库级代码翻译验证与修复
链接:https://arxiv.org/abs/2509.16187

作者:Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening
摘要:代码翻译将源代码从一种编程语言(PL)转换为另一种编程语言。验证翻译的功能等价性并在必要时进行修复,是代码翻译的关键步骤。现有的自动化验证和修复方法由于工程开销高而难以推广到多种PL,并且依赖于现有且往往不充分的测试套件,从而导致等价性的错误断言和无效的翻译修复。我们开发了MatchFixAgent,一个基于大型语言模型(LLM)、与PL无关的翻译等价性验证与修复框架。MatchFixAgent采用多代理架构,将等价性验证划分为多个子任务,以确保对翻译进行全面而一致的语义分析;随后它将分析结果交给测试代理来编写并执行测试。在观察到测试失败后,修复代理会尝试修复翻译错误。最终的(不)等价判定由裁决代理在综合语义分析和测试执行结果后做出。   我们将MatchFixAgent的验证和修复结果与四种仓库级代码翻译技术进行比较。我们使用了来自其工件的2,219个翻译对,涵盖6个PL对,采集自24个GitHub项目,总计超过90万行代码。结果表明,MatchFixAgent对99.2%的翻译对给出了(不)等价判定,其中72.8%与先前工作的等价性验证结果一致。当MatchFixAgent的结果与先前工作不一致时,我们发现其中60.7%的情况下MatchFixAgent的结果实际上是正确的。此外,MatchFixAgent能修复50.6%的不等价翻译,而先前工作仅为18.5%。这表明MatchFixAgent比先前工作更能适应多种PL对,同时产生高度准确的验证结果。
摘要:Code translation transforms source code from one programming language (PL) to another. Validating the functional equivalence of translation and repairing, if necessary, are critical steps in code translation. Existing automated validation and repair approaches struggle to generalize to many PLs due to high engineering overhead, and they rely on existing and often inadequate test suites, which results in false claims of equivalence and ineffective translation repair. We develop MatchFixAgent, a large language model (LLM)-based, PL-agnostic framework for equivalence validation and repair of translations. MatchFixAgent features a multi-agent architecture that divides equivalence validation into several sub-tasks to ensure thorough and consistent semantic analysis of the translation. Then it feeds this analysis to the test agent to write and execute tests. Upon observing a test failure, the repair agent attempts to fix the translation bug. The final (in)equivalence decision is made by the verdict agent, considering semantic analyses and test execution results.   We compare MatchFixAgent's validation and repair results with four repository-level code translation techniques. We use 2,219 translation pairs from their artifacts, which cover 6 PL pairs, and are collected from 24 GitHub projects totaling over 900K lines of code. Our results demonstrate that MatchFixAgent produces (in)equivalence verdicts for 99.2% of translation pairs, with the same equivalence validation result as prior work on 72.8% of them. When MatchFixAgent's result disagrees with prior work, we find that 60.7% of the time MatchFixAgent's result is actually correct. In addition, we show that MatchFixAgent can repair 50.6% of inequivalent translations, compared to prior work's 18.5%. This demonstrates that MatchFixAgent is far more adaptable to many PL pairs than prior work, while producing highly accurate validation results.


【2】When Bugs Linger: A Study of Anomalous Resolution Time Outliers and Their Themes
标题:When Bugs Linger:异常解决时长离群值及其主题研究
链接:https://arxiv.org/abs/2509.16140

作者:atil
备注:7 pages, 2 tables, 21 figures
摘要:高效的错误解决对于维护软件质量和用户满意度至关重要。然而,特定的错误报告经历了异常长的解决时间,这可能表明潜在的过程效率低下或复杂的问题。这项研究对七个著名的开源存储库中的错误解决异常进行了全面的分析:Cassandra,Firefox,Hadoop,HBase,SeaMonkey,Spark和Thunderbird。利用统计方法,如Z分数和四分位数范围(IQR),我们确定异常的错误解决持续时间。为了理解这些异常的主题性质,我们应用词频-逆文档频率(TF-IDF)进行文本特征提取和KMeans聚类来分组类似的错误摘要。我们的发现揭示了项目之间的一致模式,异常通常围绕测试失败,增强请求和用户界面问题聚集。这种方法为项目维护人员提供了可操作的见解,以优先考虑并有效地解决长期存在的错误。
摘要:Efficient bug resolution is critical for maintaining software quality and user satisfaction. However, specific bug reports experience unusually long resolution times, which may indicate underlying process inefficiencies or complex issues. This study presents a comprehensive analysis of bug resolution anomalies across seven prominent open-source repositories: Cassandra, Firefox, Hadoop, HBase, SeaMonkey, Spark, and Thunderbird. Utilizing statistical methods such as Z-score and Interquartile Range (IQR), we identify anomalies in bug resolution durations. To understand the thematic nature of these anomalies, we apply Term Frequency-Inverse Document Frequency (TF-IDF) for textual feature extraction and KMeans clustering to group similar bug summaries. Our findings reveal consistent patterns across projects, with anomalies often clustering around test failures, enhancement requests, and user interface issues. This approach provides actionable insights for project maintainers to prioritize and effectively address long-standing bugs.
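
下面用一个自包含的小例子示意文中的流程(数据与字段名均为示例假设):先用IQR规则找出解决时长的离群bug,再对其摘要做TF-IDF加KMeans聚类。

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# 示例数据:解决时长(天)与 bug 摘要
days = np.array([3, 5, 4, 2, 400, 6, 350, 5, 4, 3, 500, 7])
summaries = [
    "fix null pointer in parser", "update docs", "ui button misaligned",
    "flaky test failure on CI", "enhancement: dark mode", "crash on startup",
    "test timeout in integration suite", "typo in readme", "ui layout broken",
    "memory leak", "enhancement request: plugin api", "slow query",
]

# IQR 规则识别解决时长的离群值
q1, q3 = np.percentile(days, [25, 75])
outliers = np.where(days > q3 + 1.5 * (q3 - q1))[0]

# 对离群 bug 的摘要做 TF-IDF + KMeans 主题聚类
tfidf = TfidfVectorizer(stop_words="english")
Xo = tfidf.fit_transform([summaries[i] for i in outliers])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xo)
for i, lab in zip(outliers, labels):
    print(f"cluster {lab}: {days[i]:>4} days  {summaries[i]}")
```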


【3】DiffusionNFT: Online Diffusion Reinforcement with Forward Process
标题:DiffusionNFT:带前向过程的在线扩散强化
链接:https://arxiv.org/abs/2509.16117

作者:eng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, Ming-Yu Liu
摘要:在线强化学习(RL)一直是语言模型后训练的核心,但由于似然难以计算,将其扩展到扩散模型仍然具有挑战性。最近的工作将反向采样过程离散化以实现GRPO式训练,但它们继承了根本性的缺点,包括求解器受限、正向-反向不一致,以及与无分类器引导(CFG)的复杂集成。我们引入了扩散负例感知微调(DiffusionNFT),这是一种新的在线RL范式,通过流匹配直接在正向过程上优化扩散模型。DiffusionNFT对比正向与负向生成以定义隐式的策略改进方向,自然地将强化信号纳入监督学习目标。这一表述允许使用任意黑盒求解器进行训练,消除了对似然估计的需要,并且只需要干净图像而非采样轨迹来进行策略优化。在头对头比较中,DiffusionNFT比FlowGRPO的效率最高可达25倍,同时无需CFG。例如,DiffusionNFT在1k步内将GenEval分数从0.24提高到0.98,而FlowGRPO在超过5k步并额外使用CFG的情况下才达到0.95。通过利用多个奖励模型,DiffusionNFT在所测试的每个基准中都显著提升了SD3.5-Medium的性能。
摘要:Online reinforcement learning (RL) has been central to post-training language models, but its extension to diffusion models remains challenging due to intractable likelihoods. Recent works discretize the reverse sampling process to enable GRPO-style training, yet they inherit fundamental drawbacks, including solver restrictions, forward-reverse inconsistency, and complicated integration with classifier-free guidance (CFG). We introduce Diffusion Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that optimizes diffusion models directly on the forward process via flow matching. DiffusionNFT contrasts positive and negative generations to define an implicit policy improvement direction, naturally incorporating reinforcement signals into the supervised learning objective. This formulation enables training with arbitrary black-box solvers, eliminates the need for likelihood estimation, and requires only clean images rather than sampling trajectories for policy optimization. DiffusionNFT is up to $25\times$ more efficient than FlowGRPO in head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT improves the GenEval score from 0.24 to 0.98 within 1k steps, while FlowGRPO achieves 0.95 with over 5k steps and additional CFG employment. By leveraging multiple reward models, DiffusionNFT significantly boosts the performance of SD3.5-Medium in every benchmark tested.


【4】Rethinking Molecule Synthesizability with Chain-of-Reaction
标题:用反应链重新思考分子的可合成性
链接:https://arxiv.org/abs/2509.16084

作者: Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat
摘要:分子生成模型的一个众所周知的缺陷是,它们不能保证生成可合成的分子。已经有相当多的尝试来解决这个问题,但是考虑到可合成分子的指数级大的组合空间,现有方法已经显示出有限的空间覆盖和差的分子优化性能。为了解决这些问题,我们引入了ReaSyn,这是一个用于可合成投影的生成框架,该模型通过生成导致可合成类似物的途径来探索可合成空间中给定分子的邻域。为了充分利用合成途径中包含的化学知识,我们提出了一个新的视角,认为合成途径类似于大型语言模型(LLM)中的推理路径。具体来说,受LLM中的思想链(CoT)推理的启发,我们引入了反应链(CoR)符号,明确说明了反应物,反应类型和中间产物,用于途径中的每一步。通过CoR符号,ReaSyn可以在每个反应步骤中获得密集监督,以便在监督训练期间显式学习化学反应规则并执行分步推理。此外,为了进一步增强ReaSyn的推理能力,我们提出了基于强化学习(RL)的微调和针对可合成投影的目标导向测试时计算缩放。ReaSyn在可合成分子重建中实现了最高的重建率和途径多样性,在可合成目标导向分子优化中实现了最高的优化性能,并且在可合成命中扩展中显著优于先前的可合成投影方法。这些结果突出了ReaSyn在组合大的可合成化学空间中导航的卓越能力。
摘要:A well-known pitfall of molecular generative models is that they are not guaranteed to generate synthesizable molecules. There have been considerable attempts to address this problem, but given the exponentially large combinatorial space of synthesizable molecules, existing methods have shown limited coverage of the space and poor molecular optimization performance. To tackle these problems, we introduce ReaSyn, a generative framework for synthesizable projection where the model explores the neighborhood of given molecules in the synthesizable space by generating pathways that result in synthesizable analogs. To fully utilize the chemical knowledge contained in the synthetic pathways, we propose a novel perspective that views synthetic pathways akin to reasoning paths in large language models (LLMs). Specifically, inspired by chain-of-thought (CoT) reasoning in LLMs, we introduce the chain-of-reaction (CoR) notation that explicitly states reactants, reaction types, and intermediate products for each step in a pathway. With the CoR notation, ReaSyn can get dense supervision in every reaction step to explicitly learn chemical reaction rules during supervised training and perform step-by-step reasoning. In addition, to further enhance the reasoning capability of ReaSyn, we propose reinforcement learning (RL)-based finetuning and goal-directed test-time compute scaling tailored for synthesizable projection. ReaSyn achieves the highest reconstruction rate and pathway diversity in synthesizable molecule reconstruction and the highest optimization performance in synthesizable goal-directed molecular optimization, and significantly outperforms previous synthesizable projection methods in synthesizable hit expansion. These results highlight ReaSyn's superior ability to navigate combinatorially-large synthesizable chemical space.


【5】SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
标题:SABER:通过跨层残差连接揭示安全对齐中的漏洞
链接:https://arxiv.org/abs/2509.16060

作者:Joshi, Palash Nandi, Tanmoy Chakraborty
备注:Accepted in EMNLP'25 Main
摘要:经过安全对齐训练的大型语言模型(LLM)是具有强大语言理解能力的工具。这些模型通常经过涉及人类反馈的细致对齐程序,以确保接受安全输入,同时拒绝有害或不安全的输入。然而,尽管规模庞大且经过对齐,LLM仍然容易受到越狱攻击:恶意用户操纵模型产生其被明确训练要避免的有害输出。在这项研究中,我们发现LLM的安全机制主要嵌入在中后期层。基于这一见解,我们引入了一种新的白盒越狱方法SABER(Safety Alignment Bypass via Extra Residuals,通过额外残差绕过安全对齐),它通过残差连接将两个中间层$s$和$e$($s < e$)连接起来。我们的方法在HarmBench测试集上比表现最好的基线提高了51%。此外,在HarmBench验证集上评估时,SABER仅引起困惑度的边际变化。源代码可在https://github.com/PalGitts/SABER上公开获得。
摘要:Large Language Models (LLMs) with safe-alignment training are powerful instruments with robust language comprehension capabilities. These models typically undergo meticulous alignment procedures involving human feedback to ensure the acceptance of safe inputs while rejecting harmful or unsafe ones. However, despite their massive scale and alignment efforts, LLMs remain vulnerable to jailbreak attacks, where malicious users manipulate the model to produce harmful outputs that it was explicitly trained to avoid. In this study, we find that the safety mechanisms in LLMs are predominantly embedded in the middle-to-late layers. Building on this insight, we introduce a novel white-box jailbreak method, SABER (Safety Alignment Bypass via Extra Residuals), which connects two intermediate layers $s$ and $e$ such that $s < e$, through a residual connection. Our approach achieves a 51% improvement over the best-performing baseline on the HarmBench test set. Furthermore, SABER induces only a marginal shift in perplexity when evaluated on the HarmBench validation set. The source code is publicly available at https://github.com/PalGitts/SABER.
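
下面是一个玩具级示意(并非SABER对真实LLM的实现;层数、层号与系数均为示例假设),演示"把中间层$s$的隐藏状态经残差连接注入较深层$e$"这一核心机制:

```python
import torch
import torch.nn as nn

# 玩具模型:一摞残差块,示意 SABER 式的"跨层额外残差"(s -> e)
class Block(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
    def forward(self, x):
        return x + self.ff(x)

class ToyLM(nn.Module):
    def __init__(self, d=32, n_layers=8, s=2, e=6, alpha=1.0):
        super().__init__()
        self.layers = nn.ModuleList(Block(d) for _ in range(n_layers))
        self.s, self.e, self.alpha = s, e, alpha
    def forward(self, x):
        cache = None
        for i, layer in enumerate(self.layers):
            if i == self.e and cache is not None:
                x = x + self.alpha * cache   # 额外残差:绕过 s..e 之间的中间层
            x = layer(x)
            if i == self.s:
                cache = x
        return x

out = ToyLM()(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 32])
```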


【6】EmoHeal: An End-to-End System for Personalized Therapeutic Music Retrieval from Fine-grained Emotions
标题:EmoHeal:一个从细粒度情绪出发进行个性化治疗音乐检索的端到端系统
链接:https://arxiv.org/abs/2509.15986

作者:an, Jinhua Liang, Huan Zhang
备注:5 pages, 5 figures. Submitted to the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)
摘要:现有的数字心理健康工具往往忽视日常挑战背后的细微情绪状态。例如,全球有超过15亿人受到睡前焦虑的影响,但目前的方法基本上仍是静态的、"一刀切"的,无法适应个体需求。在这项工作中,我们提出了EmoHeal,一个提供个性化三阶段支持性叙事的端到端系统。EmoHeal通过微调的XLM-RoBERTa模型从用户文本中检测27种细粒度情绪,并通过基于音乐治疗原理(GEMS、iso-principle)的知识图将它们映射到音乐参数。EmoHeal使用CLAMP3模型检索视听内容,以引导用户从当前状态走向更平静的状态("匹配-引导-目标")。一项受试者内研究(N=40)显示了显著的支持效果,参与者报告了明显的情绪改善(M=4.12,p<0.001)和较高的感知情绪识别准确性(M=4.05,p<0.001)。感知准确性与治疗效果之间的强相关(r=0.72,p<0.001)验证了我们的细粒度方法。这些发现确立了理论驱动、情绪感知的数字健康工具的可行性,并为音乐治疗原则的落地提供了可扩展的人工智能蓝图。
摘要:Existing digital mental wellness tools often overlook the nuanced emotional states underlying everyday challenges. For example, pre-sleep anxiety affects more than 1.5 billion people worldwide, yet current approaches remain largely static and "one-size-fits-all", failing to adapt to individual needs. In this work, we present EmoHeal, an end-to-end system that delivers personalized, three-stage supportive narratives. EmoHeal detects 27 fine-grained emotions from user text with a fine-tuned XLM-RoBERTa model, mapping them to musical parameters via a knowledge graph grounded in music therapy principles (GEMS, iso-principle). EmoHeal retrieves audiovisual content using the CLAMP3 model to guide users from their current state toward a calmer one ("match-guide-target"). A within-subjects study (N=40) demonstrated significant supportive effects, with participants reporting substantial mood improvement (M=4.12, p<0.001) and high perceived emotion recognition accuracy (M=4.05, p<0.001). A strong correlation between perceived accuracy and therapeutic outcome (r=0.72, p<0.001) validates our fine-grained approach. These findings establish the viability of theory-driven, emotion-aware digital wellness tools and provide a scalable AI blueprint for operationalizing music therapy principles.


【7】Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement
标题:整理自己:平均速度流匹配用于一步语音增强
链接:https://arxiv.org/abs/2509.15952

作者:, Yue Lei, Wenxin Tai, Jin Wu, Jia Chen, Ting Zhong, Fan Zhou
备注:5 pages, 2 figures, submitted to ICASSP 2026
摘要:扩散和流匹配(FM)模型在语音增强(SE)中取得了显著进展,但它们依赖多步生成,计算代价高且易受离散化误差影响。一步生成建模的最新进展(特别是MeanFlow)通过用平均速度场重新表述动力学,提供了有前途的替代方案。在这项工作中,我们提出了COSE,一个为SE量身定制的一步FM框架。为了解决MeanFlow中雅可比向量积(JVP)计算带来的高训练开销,我们引入速度合成恒等式来高效计算平均速度,在保持理论一致性并实现有竞争力的增强质量的同时消除了昂贵的计算。在标准基准上的大量实验表明,COSE的采样速度最高可快5倍,训练成本降低40%,且不损害语音质量。代码可在https://github.com/ICDM-UESTC/COSE上获得。
摘要:Diffusion and flow matching (FM) models have achieved remarkable progress in speech enhancement (SE), yet their dependence on multi-step generation is computationally expensive and vulnerable to discretization errors. Recent advances in one-step generative modeling, particularly MeanFlow, provide a promising alternative by reformulating dynamics through average velocity fields. In this work, we present COSE, a one-step FM framework tailored for SE. To address the high training overhead of Jacobian-vector product (JVP) computations in MeanFlow, we introduce a velocity composition identity to compute average velocity efficiently, eliminating expensive computation while preserving theoretical consistency and achieving competitive enhancement quality. Extensive experiments on standard benchmarks show that COSE delivers up to 5x faster sampling and reduces training cost by 40%, all without compromising speech quality. Code is available at https://github.com/ICDM-UESTC/COSE.


【8】The Alignment Bottleneck
标题:对齐瓶颈
链接:https://arxiv.org/abs/2509.15932

作者:o
摘要:大型语言模型随着规模扩大而改进,但基于反馈的对齐仍然表现出与预期行为的系统性偏差。受经济学和认知科学中有限理性的启发,我们将判断视为资源受限的过程,将反馈视为受约束的信道。在此基础上,我们将该回路建模为给定$S$的两级级联$U \to H \to Y$,其认知容量为$C_{\text{cog}|S}$、平均总容量为$\bar{C}_{\text{tot}|S}$。我们的主要结果是一个容量耦合的对齐性能区间:它将一个在可分码本混合上证明的、与数据规模无关的Fano下界,与一个PAC-Bayes上界配对,后者的KL项经由同一信道受$m\,\bar{C}_{\text{tot}|S}$控制。当使用规范的可观测损失且数据集取自同一混合分布时,该PAC-Bayes界成为同一真实风险的上界。在这些匹配条件下,两个界限由同一个容量决定。其推论包括:在价值复杂度和容量固定的情况下,仅增加标签无法跨越该界限;要在更复杂的目标上达到更低的风险,需要随$\log M$增长的容量;一旦有用信号使容量饱和,进一步优化往往只是拟合信道的规律性,这与关于谄媚和奖励黑客的报告一致。该分析将对齐视为接口工程:测量并分配有限的容量,管理任务复杂度,并决定信息用在何处。
摘要:Large language models improve with scale, yet feedback-based alignment still exhibits systematic deviations from intended behavior. Motivated by bounded rationality in economics and cognitive science, we view judgment as resource-limited and feedback as a constrained channel. On this basis, we model the loop as a two-stage cascade $U \to H \to Y$ given $S$, with cognitive capacity $C_{\text{cog}|S}$ and average total capacity $\bar{C}_{\text{tot}|S}$. Our main result is a capacity-coupled Alignment Performance Interval. It pairs a data size-independent Fano lower bound proved on a separable codebook mixture with a PAC-Bayes upper bound whose KL term is controlled by the same channel via $m \, \bar{C}_{\text{tot}|S}$. The PAC-Bayes bound becomes an upper bound on the same true risk when the canonical observable loss is used and the dataset is drawn from the same mixture. Under these matched conditions, both limits are governed by a single capacity. Consequences include that, with value complexity and capacity fixed, adding labels alone cannot cross the bound; attaining lower risk on more complex targets requires capacity that grows with $\log M$; and once useful signal saturates capacity, further optimization tends to fit channel regularities, consistent with reports of sycophancy and reward hacking. The analysis views alignment as interface engineering: measure and allocate limited capacity, manage task complexity, and decide where information is spent.


【9】Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
标题:通过离线奖励评估和策略搜索增强生成式自动竞价
链接:https://arxiv.org/abs/2509.15927

作者:, Yiqin Lv, Miao Xu, Cheems Wang, Yixiu Mao, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng
摘要:自动竞价是广告主提高广告效果的重要工具。最近的进展表明,AI生成出价(AIGB)将自动竞价表述为轨迹生成任务,并在离线数据上训练基于条件扩散的规划器,与典型的基于离线强化学习(RL)的自动竞价方法相比,实现了卓越而稳定的性能。然而,由于忽视细粒度的生成质量评估且无法探索静态数据集之外的空间,现有AIGB方法仍然遇到性能瓶颈。为了解决这个问题,我们提出了AIGB-Pearl(Planning with EvAluator via RL),一种将生成式规划和策略优化相结合的新方法。AIGB-Pearl的关键是构造一个非自举的轨迹评估器来分配奖励并指导策略搜索,使规划器能够通过交互迭代地优化其生成质量。此外,为了提高离线设置中轨迹评估器的准确性,我们采用了三项关键技术:(i)基于大型语言模型(LLM)的架构,以获得更强的表示能力;(ii)混合逐点与成对损失,以更好地学习分数;(iii)自适应集成专家反馈,以获得更好的泛化能力。在模拟和真实广告系统上的大量实验证明了我们方法的最先进性能。
摘要:Auto-bidding is an essential tool for advertisers to enhance their advertising performance. Recent progress has shown that AI-Generated Bidding (AIGB), which formulates the auto-bidding as a trajectory generation task and trains a conditional diffusion-based planner on offline data, achieves superior and stable performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still encounter a performance bottleneck due to their neglect of fine-grained generation quality evaluation and inability to explore beyond static datasets. To address this, we propose AIGB-Pearl (Planning with EvAluator via RL), a novel method that integrates generative planning and policy optimization. The key to AIGB-Pearl is to construct a non-bootstrapped trajectory evaluator to assign rewards and guide policy search, enabling the planner to optimize its generation quality iteratively through interaction. Furthermore, to enhance trajectory evaluator accuracy in offline settings, we incorporate three key techniques: (i) a Large Language Model (LLM)-based architecture for better representational capacity, (ii) hybrid point-wise and pair-wise losses for better score learning, and (iii) adaptive integration of expert feedback for better generalization ability. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.


【10】A Flow-rate-conserving CNN-based Domain Decomposition Method for Blood Flow Simulations
标题:一种用于血流模拟的保流量守恒、基于CNN的区域分解方法
链接:https://arxiv.org/abs/2509.15900

作者:es, Axel Klawonn, Natalie Kubicki, Martin Lanser, Kengo Nakajima, Takashi Shimokawabe, Janine Weber
摘要:本工作旨在使用卷积神经网络(CNN)代理模型预测狭窄动脉中具有非牛顿粘度的血流。我们提出了一种使用基于CNN的子域求解器的交替Schwarz区域分解方法:通用子域求解器(USDS)在单个固定几何上训练,然后用于Schwarz方法中的每次子域求解。文中给出了不同流入条件下、形状和长度各异的二维狭窄动脉的结果,并进行了统计评估。一个关键发现是,当训练数据有限时,需要实现一个保留部分物理性质(在我们的情形下为流量守恒)的USDS。这种物理感知方法优于纯数据驱动的USDS,可提供更好的子域解,并防止Schwarz迭代期间全局解的过冲或下冲,从而实现更可靠的收敛。
摘要:This work aims to predict blood flow with non-Newtonian viscosity in stenosed arteries using convolutional neural network (CNN) surrogate models. An alternating Schwarz domain decomposition method is proposed which uses CNN-based subdomain solvers. A universal subdomain solver (USDS) is trained on a single, fixed geometry and then applied for each subdomain solve in the Schwarz method. Results for two-dimensional stenotic arteries of varying shape and length for different inflow conditions are presented and statistically evaluated. One key finding, when using a limited amount of training data, is the need to implement a USDS which preserves some of the physics, as, in our case, flow rate conservation. A physics-aware approach outperforms purely data-driven USDS, delivering improved subdomain solutions and preventing overshooting or undershooting of the global solution during the Schwarz iterations, thereby leading to more reliable convergence.
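
下面在一维Poisson问题上示意交替Schwarz迭代的骨架(为保持自包含,子域求解用直接法代替论文中的CNN通用子域求解器USDS;网格与重叠宽度均为示例假设):

```python
import numpy as np

# 子域求解器:三点差分求解 -u'' = f,带Dirichlet边界。
# 论文中这一步由 CNN 代理(USDS)完成,此处用直接法占位
def solve_subdomain(f, h, left_bc, right_bc):
    n = len(f)
    A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    b = f.copy()
    b[0] += left_bc / h**2
    b[-1] += right_bc / h**2
    return np.linalg.solve(A, b)

N, h = 99, 1.0 / 100
x = np.linspace(h, 1 - h, N)
f = np.ones(N)
u = np.zeros(N)
mid, ov = N // 2, 10  # 重叠宽度为示例假设
for _ in range(20):
    # 左子域:右边界取当前全局解在重叠区右端的值
    u[: mid + ov] = solve_subdomain(f[: mid + ov], h, 0.0, u[mid + ov])
    # 右子域:左边界取刚更新过的值(Gauss-Seidel 式交替)
    u[mid - ov :] = solve_subdomain(f[mid - ov :], h, u[mid - ov - 1], 0.0)

u_exact = 0.5 * x * (1 - x)   # -u''=1, u(0)=u(1)=0 的解析解
print("max error:", np.abs(u - u_exact).max())
```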


【11】SAGE: Semantic-Aware Shared Sampling for Efficient Diffusion
标题:SAGE:语义感知共享采样以实现高效扩散
链接:https://arxiv.org/abs/2509.15865

作者:ao, Tong Bai, Lei Huang, Xiaoyu Liang
备注:5 pages, 4 figures
摘要:扩散模型在不同领域表现出明显的好处,但其高采样成本(需要几十次连续的模型评估)仍然是一个主要限制。以往的努力主要通过优化的求解器或蒸馏来加速采样,它们独立地处理每个查询。相比之下,我们通过在语义相似的查询之间共享早期采样来减少总步数。为了在不牺牲质量的情况下提高效率,我们提出了SAGE,一个语义感知的共享采样框架,它集成了提升效率的共享采样方案和保持质量的定制训练策略。大量实验表明,SAGE降低了25.5%的采样成本,同时提高了生成质量:FID降低5.0%,CLIP提高5.4%,多样性提高160%。
摘要:Diffusion models manifest evident benefits across diverse domains, yet their high sampling cost, requiring dozens of sequential model evaluations, remains a major limitation. Prior efforts mainly accelerate sampling via optimized solvers or distillation, which treat each query independently. In contrast, we reduce total number of steps by sharing early-stage sampling across semantically similar queries. To enable such efficiency gains without sacrificing quality, we propose SAGE, a semantic-aware shared sampling framework that integrates a shared sampling scheme for efficiency and a tailored training strategy for quality preservation. Extensive experiments show that SAGE reduces sampling cost by 25.5%, while improving generation quality with 5.0% lower FID, 5.4% higher CLIP, and 160% higher diversity over baselines.


【12】Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
标题:利用多模态嵌入优化电子商务中的产品去重
链接:https://arxiv.org/abs/2509.15858

作者:ulunk, Berk Taskin, M. Furkan Eseoglu, H. Bahadir Sahin
摘要:在大型电子商务市场中,重复的商品列表经常造成消费者困惑和运营低效,降低平台信任并增加成本。传统的基于关键词的搜索方法依赖精确文本匹配,忽略了商品标题中固有的语义相似性,因此无法准确识别重复项。为了应对这些挑战,我们引入了一个专为电子商务领域设计的可扩展多模态产品去重方法。我们的方法采用基于BERT架构的领域特定文本模型,并结合MaskedAutoEncoder得到图像表示。两种架构都辅以降维技术,在不显著损失信息的情况下产生紧凑的128维嵌入。作为补充,我们还开发了一个同时利用文本和图像向量的新型决断模型。通过将这些特征提取机制与优化的向量数据库Milvus集成,我们的系统能够在超过2亿条商品的庞大目录中进行高效、高精度的相似性搜索,而系统内存消耗仅为100GB。实证评估表明,我们的匹配系统达到了0.90的宏平均F1分数,优于第三方解决方案的0.83。我们的结果表明,将领域特定的适配与最先进的机器学习技术相结合,有望缓解大规模电子商务环境中的重复列表问题。
摘要:In large scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, degrading trust on the platform and increasing costs. Traditional keyword-based search methodologies falter in accurately identifying duplicates due to their reliance on exact textual matches, neglecting semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in BERT architecture in conjunction with MaskedAutoEncoders for image representations. Both of these architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system can facilitate efficient and high-precision similarity searches across extensive product catalogs exceeding 200 million items with just 100GB of system RAM consumption. Empirical evaluations demonstrate that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
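
下面是一个最小示意(嵌入为随机示例;生产中近邻检索由Milvus承担,此处用sklearn近邻代替):对归一化的128维嵌入做余弦近邻搜索,筛出候选重复对再交给决断模型。

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 128)).astype(np.float32)
emb[42] = emb[7] + 0.01 * rng.normal(size=128)  # 人为构造一对"近重复"

emb /= np.linalg.norm(emb, axis=1, keepdims=True)
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(emb)
dist, idx = nn.kneighbors(emb)

# 距离低于阈值的候选对再交给"决断模型"(文本+图像向量)做最终判定
threshold = 0.01
pairs = [(i, j) for i, (d, j) in enumerate(zip(dist[:, 1], idx[:, 1])) if d < threshold]
print(pairs[:5])  # 预期包含 (7, 42) 与 (42, 7)
```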


【13】Monte Carlo Tree Diffusion with Multiple Experts for Protein Design
标题:用于蛋白质设计的多专家蒙特卡罗树扩散
链接:https://arxiv.org/abs/2509.15796

作者:iu, Mingxuan Cao, Songhao Jiang, Xiao Luo, Xiaotian Duan, Mengdi Wang, Tobin R. Sosnick, Jinbo Xu, Rick Stevens
摘要:蛋白质设计的目标是产生能折叠成具有所需性质的功能结构的氨基酸序列。将自回归语言模型与蒙特卡罗树搜索(MCTS)相结合的现有方法难以处理长程依赖,并受困于不切实际的庞大搜索空间。我们提出MCTD-ME(多专家蒙特卡罗树扩散),它将掩码扩散模型与树搜索相结合,以实现多令牌规划和高效探索。与自回归规划器不同,MCTD-ME使用生物物理保真度增强的扩散去噪作为rollout引擎,联合修改多个位置并可扩展到大序列空间。它进一步利用能力各异的专家来丰富探索,并以基于pLDDT的掩码时间表为指导,针对低置信度区域同时保留可靠的残基。我们提出了一种新的多专家选择规则(PH-UCT-ME),将预测熵UCT扩展到专家集成。在逆折叠任务(CAMEO和PDB基准)上,MCTD-ME在序列恢复(AAR)和结构相似性(scTM)方面均优于单专家和无指导基线,蛋白质越长增益越大,并受益于多专家指导。更一般地,该框架与模型无关,适用于逆折叠之外的场景,包括从头蛋白质工程和多目标分子生成。
摘要:The goal of protein design is to generate amino acid sequences that fold into functional structures with desired properties. Prior methods combining autoregressive language models with Monte Carlo Tree Search (MCTS) struggle with long-range dependencies and suffer from an impractically large search space. We propose MCTD-ME, Monte Carlo Tree Diffusion with Multiple Experts, which integrates masked diffusion models with tree search to enable multi-token planning and efficient exploration. Unlike autoregressive planners, MCTD-ME uses biophysical-fidelity-enhanced diffusion denoising as the rollout engine, jointly revising multiple positions and scaling to large sequence spaces. It further leverages experts of varying capacities to enrich exploration, guided by a pLDDT-based masking schedule that targets low-confidence regions while preserving reliable residues. We propose a novel multi-expert selection rule (PH-UCT-ME) extends predictive-entropy UCT to expert ensembles. On the inverse folding task (CAMEO and PDB benchmarks), MCTD-ME outperforms single-expert and unguided baselines in both sequence recovery (AAR) and structural similarity (scTM), with gains increasing for longer proteins and benefiting from multi-expert guidance. More generally, the framework is model-agnostic and applicable beyond inverse folding, including de novo protein engineering and multi-objective molecular generation.


【14】UPRPRC: Unified Pipeline for Reproducing Parallel Resources -- Corpus from the United Nations
标题:UPRPRC:复制并行资源的统一管道--来自联合国的语料库
链接:https://arxiv.org/abs/2509.15789

作者:u, Fangjian Shen, Zhengkai Tang, Qiang Liu, Hexuan Cheng, Hui Liu, Wushao Wen
备注:5 pages, 1 figure, submitted to ICASSP2026
摘要:多语言数据集的质量和可获取性对推进机器翻译至关重要。然而,以往基于联合国文件构建的语料库存在过程不透明、难以复现和规模有限等问题。为应对这些挑战,我们推出了一个完整的端到端解决方案,从通过网页抓取的数据采集到文本对齐。整个流程完全可复现,既有极简的单机示例,也有可选的分布式计算步骤以实现可扩展性。在其核心,我们提出了一种新的图辅助段落对齐(GAPA)算法,用于高效灵活的段落级对齐。由此产生的语料库包含超过7.13亿个英文词元,规模是先前工作的两倍多。据我们所知,这是最大的公开可用、完全由人工翻译且非AI生成内容组成的平行语料库。我们的代码和语料库在MIT许可证下开放。
摘要:The quality and accessibility of multilingual datasets are crucial for advancing machine translation. However, previous corpora built from United Nations documents have suffered from issues such as opaque process, difficulty of reproduction, and limited scale. To address these challenges, we introduce a complete end-to-end solution, from data acquisition via web scraping to text alignment. The entire process is fully reproducible, with a minimalist single-machine example and optional distributed computing steps for scalability. At its core, we propose a new Graph-Aided Paragraph Alignment (GAPA) algorithm for efficient and flexible paragraph-level alignment. The resulting corpus contains over 713 million English tokens, more than doubling the scale of prior work. To the best of our knowledge, this represents the largest publicly available parallel corpus composed entirely of human-translated, non-AI-generated content. Our code and corpus are accessible under the MIT License.
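
GAPA的具体构造见论文;下面仅示意"相似度矩阵 -> 一对一对齐"这一骨架(相似度用带对角信号的随机矩阵代替真实句向量,属示例假设):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# 段落对齐的简化示意:在相似度矩阵上做最优指派以近似一对一对齐。
# 实践中 sim 可由跨语言句向量的余弦相似度得到
rng = np.random.default_rng(0)
n = 6
sim = rng.uniform(0, 0.3, size=(n, n)) + np.eye(n)  # 对角线为"正确"对齐
row, col = linear_sum_assignment(-sim)               # 取负号以最大化总相似度
print(list(zip(row.tolist(), col.tolist())))          # 预期 [(0,0),...,(5,5)]
```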


【15】Toward Efficient Influence Function: Dropout as a Compression Tool
标题:迈向高效影响函数:Dropout作为压缩工具
链接:https://arxiv.org/abs/2509.15651

作者:ang, Mohammad Mohammadi Amiri
摘要:评估训练数据对机器学习模型的影响对于理解模型的行为、提高透明度和选择训练数据至关重要。影响函数提供了一个理论框架,用于量化给定特定测试数据的训练数据点对模型性能的影响。然而,影响函数的计算和内存成本提出了重大挑战,特别是对于大规模模型,即使使用近似方法,因为计算中涉及的梯度与模型本身一样大。在这项工作中,我们引入了一种新的方法,利用dropout作为梯度压缩机制来更有效地计算影响函数。我们的方法显着减少了计算和内存开销,不仅在影响函数计算,而且在梯度压缩过程中。通过理论分析和实证验证,我们证明了我们的方法可以保留数据影响的关键组成部分,并使其应用于现代大规模模型。
摘要:Assessing the impact of the training data on machine learning models is crucial for understanding the behavior of the model, enhancing the transparency, and selecting training data. Influence function provides a theoretical framework for quantifying the effect of training data points on model's performance given a specific test data. However, the computational and memory costs of influence function present significant challenges, especially for large-scale models, even when using approximation methods, since the gradients involved in computation are as large as the model itself. In this work, we introduce a novel approach that leverages dropout as a gradient compression mechanism to compute the influence function more efficiently. Our method significantly reduces computational and memory overhead, not only during the influence function computation but also in the gradient compression process. Through theoretical analysis and empirical validation, we demonstrate that our method preserves critical components of the data influence and enables its application to modern large-scale models.
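
下面是一个自包含的数值示意(以逻辑回归的解析每样本梯度为例;用恒等Hessian近似影响函数属示例假设,并非论文的完整方法):先用dropout掩码压缩每样本梯度,再以内积近似数据影响。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d)); w = rng.normal(size=d)
y = (X @ w + rng.normal(size=n) > 0).astype(float)

def per_sample_grad(w_hat, X, y):        # 逻辑回归的解析每样本梯度
    p = 1 / (1 + np.exp(-X @ w_hat))
    return (p - y)[:, None] * X           # 形状 (n, d)

w_hat = rng.normal(size=d) * 0.1          # 假设这是训练后的参数
G = per_sample_grad(w_hat, X, y)

keep = rng.random(d) > 0.5                # dropout 掩码:随机保留约一半坐标
G_c = G[:, keep] / keep.mean()            # 压缩后的梯度(按保留率缩放)

# 恒等 Hessian 近似:influence ≈ g_test · g_train(示例假设)
g_test = G_c[0]                           # 把第 0 个样本当作"测试点"
influence = G_c[1:] @ g_test
print("影响最大的 5 个训练样本:", np.argsort(-np.abs(influence))[:5] + 1)
```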


【16】Information Geometry of Variational Bayes
标题:变分Bayes的信息几何
链接:https://arxiv.org/abs/2509.15641

作者:Emtiyaz Khan
摘要:我们强调信息几何和变分贝叶斯(VB)之间的基本联系,并讨论其对机器学习的影响。在某些条件下,VB解总是需要估计或计算自然梯度。我们利用Khan和Rue(2023)提出的自然梯度下降算法,即贝叶斯学习规则(BLR),展示这一事实的若干推论,包括:(i)将贝叶斯规则简化为自然梯度的相加;(ii)对基于梯度的方法中使用的二次替代函数的推广;(iii)面向大型语言模型的VB算法的大规模实现。这一联系及其推论都不是新的,但我们进一步强调信息几何与贝叶斯这两个领域的共同起源,希望促进两者交叉领域的更多工作。
摘要:We highlight a fundamental connection between information geometry and variational Bayes (VB) and discuss its consequences for machine learning. Under certain conditions, a VB solution always requires estimation or computation of natural gradients. We show several consequences of this fact by using the natural-gradient descent algorithm of Khan and Rue (2023) called the Bayesian Learning Rule (BLR). These include (i) a simplification of Bayes' rule as addition of natural gradients, (ii) a generalization of quadratic surrogates used in gradient-based methods, and (iii) a large-scale implementation of VB algorithms for large language models. Neither the connection nor its consequences are new but we further emphasize the common origins of the two fields of information geometry and Bayes with a hope to facilitate more work at the intersection of the two fields.
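
按照Khan和Rue(2023)的记法,BLR可概括为对指数族近似后验$q_\lambda$的自然参数做自然梯度下降(此处仅为示意,细节以原文为准):

$$\lambda \;\leftarrow\; \lambda \;-\; \rho\,\widetilde{\nabla}_{\lambda}\Big(\mathbb{E}_{q_\lambda}\big[\ell(\theta)\big] - \mathcal{H}(q_\lambda)\Big),$$

其中$\widetilde{\nabla}$为自然梯度,$\mathcal{H}$为熵;选择不同的$q$族与不同的近似方式,即可恢复多种经典学习算法。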


【17】Reward Hacking Mitigation using Verifiable Composite Rewards
标题:使用可验证的复合奖励缓解奖励黑客攻击
链接:https://arxiv.org/abs/2509.15557

作者:han Bin Tarek, Rahmatollah Beheshti
备注:Accepted at the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025)
摘要:基于可验证奖励的强化学习(RLVR)最近表明,大型语言模型(LLM)可以在没有直接监督的情况下发展自己的推理。然而,其在医疗领域(特别是问答)的应用,在推理阶段容易发生严重的奖励黑客行为。我们的工作针对这种行为的两种主要形式:i)不经推理直接给出最终答案;ii)采用非标准的推理格式来利用奖励机制。为缓解这些问题,我们引入了一个对这些行为施加特定惩罚的复合奖励函数。实验表明,与基线相比,用我们提出的奖励模型扩展RLVR能带来格式更规范的推理、更少的奖励黑客行为和良好的准确性。这一方法朝着减少奖励黑客、提高使用RLVR的模型的可靠性迈出了一步。
摘要:Reinforcement Learning from Verifiable Rewards (RLVR) has recently shown that large language models (LLMs) can develop their own reasoning without direct supervision. However, applications in the medical domain, specifically for question answering, are susceptible to significant reward hacking during the reasoning phase. Our work addresses two primary forms of this behavior: i) providing a final answer without preceding reasoning, and ii) employing non-standard reasoning formats to exploit the reward mechanism. To mitigate these, we introduce a composite reward function with specific penalties for these behaviors. Our experiments show that extending RLVR with our proposed reward model leads to better-formatted reasoning with less reward hacking and good accuracy compared to the baselines. This approach marks a step toward reducing reward hacking and enhancing the reliability of models utilizing RLVR.
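
下面是一个最小示意(标签格式与惩罚系数均为示例假设,并非论文的精确定义),演示如何在奖励中对"不经推理直接作答"和"非标准推理格式"施加惩罚:

```python
import re

# 复合奖励:正确性奖励 + 对两类奖励黑客行为的惩罚
def composite_reward(response: str, correct: bool) -> float:
    reward = 1.0 if correct else 0.0
    has_think = re.search(r"<think>.+?</think>", response, re.S) is not None
    answer_first = response.strip().startswith("<answer>")
    if not has_think or answer_first:
        reward -= 0.5            # 惩罚"不经推理直接作答"
    if response.count("<think>") > 1:
        reward -= 0.3            # 惩罚利用非标准推理格式刷分
    return reward

print(composite_reward("<answer>B</answer>", correct=True))                    # 0.5
print(composite_reward("<think>step 1 ...</think><answer>B</answer>", True))  # 1.0
```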


【18】PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors
标题:PolyJuice让它成真:针对合成图像检测器的黑盒、通用红队方法
链接:https://arxiv.org/abs/2509.15551

作者:hdashtian, Mashrur M. Morshed, Jacob H. Seidman, Gaurav Bharaj, Vishnu Naresh Boddeti
备注:Accepted as NeurIPS 2025 poster
摘要:合成图像检测器(SID)是抵御文本到图像(T2I)模型生成图像日益逼真所带来风险的关键防线。红队通过识别并利用被误分类的合成图像来提高SID的有效性。然而,现有的红队方案(i)需要对SID的白盒访问,这对专有的最先进检测器不可行;(ii)通过昂贵的在线优化生成特定于单张图像的攻击。为了解决这些限制,我们提出PolyJuice,第一个针对SID的黑盒、图像无关的红队方法,其基于在T2I潜空间中观察到的、SID正确分类与错误分类样本之间的分布偏移。PolyJuice通过以下方式生成攻击:(i)用只需对SID黑盒访问的轻量离线过程识别该偏移的方向;(ii)利用该方向,把所有生成的图像普遍地引向SID的失败模式。经PolyJuice引导的T2I模型在欺骗SID方面比未引导的模型有效得多(最高84%)。我们还表明,引导方向可以在较低分辨率下高效估计,并通过简单插值迁移到更高分辨率,从而减少计算开销。最后,在PolyJuice增强的数据集上微调SID模型可显著提升检测器的性能(最高30%)。
摘要:Synthetic image detectors (SIDs) are a key defense against the risks posed by the growing realism of images from text-to-image (T2I) models. Red teaming improves SID's effectiveness by identifying and exploiting their failure modes via misclassified synthetic images. However, existing red-teaming solutions (i) require white-box access to SIDs, which is infeasible for proprietary state-of-the-art detectors, and (ii) generate image-specific attacks through expensive online optimization. To address these limitations, we propose PolyJuice, the first black-box, image-agnostic red-teaming method for SIDs, based on an observed distribution shift in the T2I latent space between samples correctly and incorrectly classified by the SID. PolyJuice generates attacks by (i) identifying the direction of this shift through a lightweight offline process that only requires black-box access to the SID, and (ii) exploiting this direction by universally steering all generated images towards the SID's failure modes. PolyJuice-steered T2I models are significantly more effective at deceiving SIDs (up to 84%) compared to their unsteered counterparts. We also show that the steering directions can be estimated efficiently at lower resolutions and transferred to higher resolutions using simple interpolation, reducing computational overhead. Finally, tuning SID models on PolyJuice-augmented datasets notably enhances the performance of the detectors (up to 30%).
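
下面用随机数据示意"失败方向估计 + 潜空间引导"的核心两步(数据与步长均为示例假设;真实流程只需对SID的黑盒访问):

```python
import numpy as np

rng = np.random.default_rng(0)
z_caught = rng.normal(size=(500, 64))          # 被 SID 正确识别的样本潜变量
z_fool = rng.normal(size=(500, 64)) + 0.8      # 被误判(骗过 SID)的样本潜变量

# 离线估计"失败模式"方向:两组样本均值之差
d = z_fool.mean(axis=0) - z_caught.mean(axis=0)
d /= np.linalg.norm(d)

alpha = 2.0                                     # 引导步长(示例假设)
z_new = rng.normal(size=(10, 64))
z_steered = z_new + alpha * d                   # 将所有新生成样本朝该方向引导
print(z_steered.shape)
```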


【19】Geometric Integration for Neural Control Variates
标题:神经控制变量的几何积分
链接:https://arxiv.org/abs/2509.15538

作者:ister, Takahiro Harada
摘要:控制变量是蒙特卡罗积分的一种方差缩减技术。其原理是:用一个可以解析积分的函数来近似被积函数,然后仅对被积函数与该近似之间的残差使用蒙特卡罗方法积分,从而获得无偏估计。神经网络是通用近似器,原则上可以用作控制变量;然而,挑战在于解析积分在一般情况下是不可能的。在这篇文章中,我们研究了最简单的神经网络模型之一,即带连续分段线性激活函数的多层感知器(MLP),及其可能的解析积分。我们提出了一种基于积分域细分的积分方法,利用计算几何技术在二维中解决这一问题。我们证明了MLP可以与我们的积分方法结合用作控制变量,并展示了其在光传输模拟中的应用。
摘要:Control variates are a variance-reduction technique for Monte Carlo integration. The principle involves approximating the integrand by a function that can be analytically integrated, and integrating using the Monte Carlo method only the residual difference between the integrand and the approximation, to obtain an unbiased estimate. Neural networks are universal approximators that could potentially be used as a control variate. However, the challenge lies in the analytic integration, which is not possible in general. In this manuscript, we study one of the simplest neural network models, the multilayered perceptron (MLP) with continuous piecewise linear activation functions, and its possible analytic integration. We propose an integration method based on integration domain subdivision, employing techniques from computational geometry to solve this problem in 2D. We demonstrate that an MLP can be used as a control variate in combination with our integration method, showing applications in the light transport simulation.
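
控制变量法本身可以用一个简单的数值例子说明(论文中的近似函数g是可解析积分的MLP;此处为保持自包含改用线性g,属示例假设):

```python
import numpy as np

# 目标:估计 ∫₀¹ exp(x) dx;用 g(x)=1+x 作控制变量,其解析积分为 3/2
rng = np.random.default_rng(0)
f = np.exp
g = lambda x: 1 + x
G_analytic = 1.5

x = rng.uniform(0, 1, size=10_000)
plain = f(x).mean()                     # 普通蒙特卡罗
cv = G_analytic + (f(x) - g(x)).mean()  # 控制变量估计(仍然无偏)

truth = np.e - 1
print(f"plain MC 误差: {abs(plain - truth):.2e}")
print(f"控制变量误差:  {abs(cv - truth):.2e}")   # 残差方差更小,误差通常更低
```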


【20】Policy Gradient Optimization for Bayesian-Risk MDPs with General Convex Losses
标题:具有一般凸损失的贝叶斯风险MDP的策略梯度优化
链接:https://arxiv.org/abs/2509.15509

作者:g Wang, Yifan Lin, Enlu Zhou
摘要:Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach to estimate the parameters from data and impose a coherent risk functional (with respect to the Bayesian posterior distribution) on the loss. Since this formulation usually does not satisfy the interchangeability principle, it does not admit Bellman equations and cannot be solved by approaches based on dynamic programming. Therefore, we propose a policy gradient optimization method, leveraging the dual representation of coherent risk measures and extending the envelope theorem to continuous cases. We then show the stationary analysis of the algorithm with a convergence rate of $O(T^{-1/2}+r^{-1/2})$, where $T$ is the number of policy gradient iterations and $r$ is the sample size of the gradient estimator. We further extend our algorithm to an episodic setting, and establish the global convergence of the extended algorithm and provide bounds on the number of iterations needed to achieve an error bound $O(\epsilon)$ in each episode.


【21】FRAUDGUESS: Spotting and Explaining New Types of Fraud in Million-Scale Financial Data
标题:FRAUDGUESS:发现并解释百万规模金融数据中的新型欺诈
链接:https://arxiv.org/abs/2509.15493

作者: F. Cordeiro, Meng-Chieh Lee, Christos Faloutsos
摘要:Given a set of financial transactions (who buys from whom, when, and for how much), as well as prior information from buyers and sellers, how can we find fraudulent transactions? If we have labels for some transactions for known types of fraud, we can build a classifier. However, we also want to find new types of fraud, still unknown to the domain experts ('Detection'). Moreover, we also want to provide evidence to experts that supports our opinion ('Justification'). In this paper, we propose FRAUDGUESS, to achieve two goals: (a) for 'Detection', it spots new types of fraud as micro-clusters in a carefully designed feature space; (b) for 'Justification', it uses visualization and heatmaps for evidence, as well as an interactive dashboard for deep dives. FRAUDGUESS is used in real life and is currently considered for deployment in an Anonymous Financial Institution (AFI). Thus, we also present the three new behaviors that FRAUDGUESS discovered in a real, million-scale financial dataset. Two of these behaviors are deemed fraudulent or suspicious by domain experts, catching hundreds of fraudulent transactions that would otherwise go unnoticed.


【22】Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
标题:分层自注意力:将神经注意力机制推广到多尺度问题
链接:https://arxiv.org/abs/2509.15448

作者:zadeh, Sara Abdali, Yinheng Li, Kazuhito Koishida
备注:In The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:Transformers and their attention mechanism have been revolutionary in the field of Machine Learning. While originally proposed for language data, they quickly found their way to the image, video, graph, etc. data modalities with various signal geometries. Despite this versatility, generalizing the attention mechanism to scenarios where data is presented at different scales from potentially different modalities is not straightforward. The attempts to incorporate hierarchy and multi-modality within transformers are largely based on ad hoc heuristics, which are not seamlessly generalizable to similar problems with potentially different structures. To address this problem, in this paper, we take a fundamentally different approach: we first propose a mathematical construct to represent multi-modal, multi-scale data. We then mathematically derive the neural attention mechanics for the proposed construct from the first principle of entropy minimization. We show that the derived formulation is optimal in the sense of being the closest to the standard Softmax attention while incorporating the inductive biases originating from the hierarchical/geometric information of the problem. We further propose an efficient algorithm based on dynamic programming to compute our derived attention mechanism. By incorporating it within transformers, we show that the proposed hierarchical attention mechanism not only can be employed to train transformer models in hierarchical/multi-modal settings from scratch, but it can also be used to inject hierarchical information into classical, pre-trained transformer models post training, resulting in more efficient models in a zero-shot manner.


【23】Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data
标题:单细胞RNA-seq数据的随机矩阵理论引导的稀疏PCA
链接:https://arxiv.org/abs/2509.15429

作者:ardès
备注:16 figures
摘要:Single-cell RNA-seq provides detailed molecular snapshots of individual cells but is notoriously noisy. Variability stems from biological differences, PCR amplification bias, limited sequencing depth, and low capture efficiency, making it challenging to adapt computational pipelines to heterogeneous datasets or evolving technologies. As a result, most studies still rely on principal component analysis (PCA) for dimensionality reduction, valued for its interpretability and robustness. Here, we improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components using existing sparse PCA algorithms. We first introduce a novel biwhitening method, inspired by the Sinkhorn-Knopp algorithm, that simultaneously stabilizes variance across genes and cells. This enables the use of an RMT-based criterion to automatically select the sparsity level, rendering sparse PCA nearly parameter-free. Our mathematically grounded approach retains the interpretability of PCA while enabling robust, hands-off inference of sparse principal components. Across seven single-cell RNA-seq technologies and four sparse PCA algorithms, we show that this method systematically improves the reconstruction of the principal subspace and consistently outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification tasks.


【24】Top-$k$ Feature Importance Ranking
标题:Top-$k$特征重要性排名
链接:https://arxiv.org/abs/2509.15420

作者:, Tiffany Tang, Genevera Allen
摘要:Accurate ranking of important features is a fundamental challenge in interpretable machine learning with critical applications in scientific discovery and decision-making. Unlike feature selection and feature importance, the specific problem of ranking important features has received considerably less attention. We introduce RAMPART (Ranked Attributions with MiniPatches And Recursive Trimming), a framework that utilizes any existing feature importance measure in a novel algorithm specifically tailored for ranking the top-$k$ features. Our approach combines an adaptive sequential halving strategy that progressively focuses computational resources on promising features with an efficient ensembling technique using both observation and feature subsampling. Unlike existing methods that convert importance scores to ranks as post-processing, our framework explicitly optimizes for ranking accuracy. We provide theoretical guarantees showing that RAMPART achieves the correct top-$k$ ranking with high probability under mild conditions, and demonstrate through extensive simulation studies that RAMPART consistently outperforms popular feature importance methods, concluding with a high-dimensional genomics case study.


【25】Stochastic Sample Approximations of (Local) Moduli of Continuity
标题:(局部)连续模的随机样本逼近
链接:https://arxiv.org/abs/2509.15368

作者:zarov, Allen Gehret, Robert Shorten, Jakub Marecek
摘要:Modulus of local continuity is used to evaluate the robustness of neural networks and fairness of their repeated uses in closed-loop models. Here, we revisit a connection between generalized derivatives and moduli of local continuity, and present a non-uniform stochastic sample approximation for moduli of local continuity. This is of importance in studying robustness of neural networks and fairness of their repeated uses.


【26】Probabilistic Conformal Coverage Guarantees in Small-Data Settings
标题:小数据环境中的概率保形覆盖保证
链接:https://arxiv.org/abs/2509.15349

作者: Zwart
摘要:Conformal prediction provides distribution-free prediction sets with guaranteed marginal coverage. However, in split conformal prediction this guarantee is training-conditional only in expectation: across many calibration draws, the average coverage equals the nominal level, but the realized coverage for a single calibration set may vary substantially. This variance undermines effective risk control in practical applications. Here we introduce the Small Sample Beta Correction (SSBC), a plug-and-play adjustment to the conformal significance level that leverages the exact finite-sample distribution of conformal coverage to provide probabilistic guarantees, ensuring that with user-defined probability over the calibration draw, the deployed predictor achieves at least the desired coverage.
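
下面示意其中的关键事实(SSBC的精确定义以论文为准;delta与搜索网格为示例假设):分割保形预测的覆盖率服从精确的Beta分布,可据此搜索更保守的显著性水平,使"覆盖率不低于目标值"的概率至少为1-delta。

```python
import numpy as np
from scipy.stats import beta

# 对 n 个校准样本,名义水平 alpha 下覆盖率 ~ Beta(n+1-l, l),l = floor((n+1)*alpha)
def ssbc_alpha(n: int, target_cov: float = 0.9, delta: float = 0.1) -> float:
    for alpha_adj in np.linspace(1 - target_cov, 1e-4, 2000):  # 从名义值逐步收紧
        l = int(np.floor((n + 1) * alpha_adj))
        if l < 1:
            break
        # P(coverage >= target) = 1 - CDF(target) = sf(target)
        if beta.sf(target_cov, n + 1 - l, l) >= 1 - delta:
            return float(alpha_adj)
    return 1e-4

print(ssbc_alpha(n=100))   # 小样本下需比名义 alpha=0.1 更保守
```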


【27】Subject Matter Expertise vs Professional Management in Collective Sequential Decision Making
标题:集体序贯决策中的专业知识与专业管理
链接:https://arxiv.org/abs/2509.15263

作者:resh, Yonatan Loewenstein
备注:Reinforcement Learning and Decision Making (RLDM) 2025. arXiv admin note: substantial text overlap with arXiv:2412.18593
摘要:Your company's CEO is retiring. You search for a successor. You can promote an employee from the company familiar with the company's operations, or recruit an external professional manager. Who should you prefer? It has not been clear how to address this question, the "subject matter expertise vs. professional manager debate", quantitatively and objectively. We note that a company's success depends on long sequences of interdependent decisions, with often-opposing recommendations of diverse board members. To model this task in a controlled environment, we utilize chess - a complex, sequential game with interdependent decisions which allows for quantitative analysis of performance and expertise (since the states, actions and game outcomes are well-defined). The availability of chess engines differing in style and expertise, allows scalable experimentation. We considered a team of (computer) chess players. At each turn, team members recommend a move and a manager chooses a recommendation. We compared the performance of two manager types. For manager as "subject matter expert", we used another (computer) chess player that assesses the recommendations of the team members based on its own chess expertise. We examined the performance of such managers at different strength levels. To model a "professional manager", we used Reinforcement Learning (RL) to train a network that identifies the board positions in which different team members have relative advantage, without any pretraining in chess. We further examined this network to see if any chess knowledge is acquired implicitly. We found that subject matter expertise beyond a minimal threshold does not significantly contribute to team synergy. Moreover, performance of a RL-trained "professional" manager significantly exceeds that of even the best "expert" managers, while acquiring only limited understanding of chess.


【28】KNARsack: Teaching Neural Algorithmic Reasoners to Solve Pseudo-Polynomial Problems
标题:KNARsack:教神经算法推理器解决伪多项式问题
链接:https://arxiv.org/abs/2509.15239

作者:ožgaj, Dobrik Georgiev, Marin Šilić, Petar Veličković
备注:14 pages, 10 figures
摘要:Neural algorithmic reasoning (NAR) is a growing field that aims to embed algorithmic logic into neural networks by imitating classical algorithms. In this extended abstract, we detail our attempt to build a neural algorithmic reasoner that can solve Knapsack, a pseudo-polynomial problem bridging classical algorithms and combinatorial optimisation, but omitted in standard NAR benchmarks. Our neural algorithmic reasoner is designed to closely follow the two-phase pipeline for the Knapsack problem, which involves first constructing the dynamic programming table and then reconstructing the solution from it. The approach, which models intermediate states through dynamic programming supervision, achieves better generalization to larger problem instances than a direct-prediction baseline that attempts to select the optimal subset only from the problem inputs.
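
论文模仿的正是经典Knapsack动态规划的两阶段流程,其最小实现如下(这是标准算法本身,而非神经推理器):

```python
# 阶段一:构造 DP 表;阶段二:从表中回溯重构最优子集
def knapsack(values, weights, capacity):
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if weights[i - 1] <= c:
                dp[i][c] = max(dp[i][c],
                               dp[i - 1][c - weights[i - 1]] + values[i - 1])
    chosen, c = [], capacity
    for i in range(n, 0, -1):          # 回溯:值发生变化处即被选中的物品
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # (220, [1, 2])
```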


【29】MICA: Multi-Agent Industrial Coordination Assistant
标题:MICA:多智能体工业协调助理
链接:https://arxiv.org/abs/2509.15237

作者:unyu Peng, Junwei Zheng, Yufan Chen, Yitain Shi, Jiale Wei, Ruiping Liu, Kailun Yang, Rainer Stiefelhagen
备注:The source code will be made publicly available at this https URL
摘要:Industrial workflows demand adaptive and trustworthy assistance that can operate under limited computing, connectivity, and strict privacy constraints. In this work, we present MICA (Multi-Agent Industrial Coordination Assistant), a perception-grounded and speech-interactive system that delivers real-time guidance for assembly, troubleshooting, part queries, and maintenance. MICA coordinates five role-specialized language agents, audited by a safety checker, to ensure accurate and compliant support. To achieve robust step understanding, we introduce Adaptive Step Fusion (ASF), which dynamically blends expert reasoning with online adaptation from natural speech feedback. Furthermore, we establish a new multi-agent coordination benchmark across representative task categories and propose evaluation metrics tailored to industrial assistance, enabling systematic comparison of different coordination topologies. Our experiments demonstrate that MICA consistently improves task success, reliability, and responsiveness over baseline structures, while remaining deployable on practical offline hardware. Together, these contributions highlight MICA as a step toward deployable, privacy-preserving multi-agent assistants for dynamic factory environments. The source code will be made publicly available at https://github.com/Kratos-Wen/MICA.


【30】PRISM: Probabilistic and Robust Inverse Solver with Measurement-Conditioned Diffusion Prior for Blind Inverse Problems
标题:PRISM:用于盲逆问题的、带测量条件扩散先验的概率鲁棒逆求解器
链接:https://arxiv.org/abs/2509.16106

作者:u, Evan Bell, Guijin Wang, Yu Sun
摘要:Diffusion models are now commonly used to solve inverse problems in computational imaging. However, most diffusion-based inverse solvers require complete knowledge of the forward operator to be used. In this work, we introduce a novel probabilistic and robust inverse solver with measurement-conditioned diffusion prior (PRISM) to effectively address blind inverse problems. PRISM offers a technical advancement over current methods by incorporating a powerful measurement-conditioned diffusion model into a theoretically principled posterior sampling scheme. Experiments on blind image deblurring validate the effectiveness of the proposed method, demonstrating the superior performance of PRISM over state-of-the-art baselines in both image and blur kernel recovery.


【31】What is a good matching of probability measures? A counterfactual lens on transport maps
标题:什么是概率测度之间的良好匹配?传输映射的反事实视角
链接:https://arxiv.org/abs/2509.16027

作者:Lara, Luca Ganassali
备注:37 pages; comments most welcome
摘要:Coupling probability measures lies at the core of many problems in statistics and machine learning, from domain adaptation to transfer learning and causal inference. Yet, even when restricted to deterministic transports, such couplings are not identifiable: two atomless marginals admit infinitely many transport maps. The common recourse to optimal transport, motivated by cost minimization and cyclical monotonicity, obscures the fact that several distinct notions of multivariate monotone matchings coexist. In this work, we first carry out a comparative analysis of three constructions of transport maps: cyclically monotone, quantile-preserving and triangular monotone maps. We establish necessary and sufficient conditions for their equivalence, thereby clarifying their respective structural properties. In parallel, we formulate counterfactual reasoning within the framework of structural causal models as a problem of selecting transport maps between fixed marginals, which makes explicit the role of untestable assumptions in counterfactual reasoning. Then, we are able to connect these two perspectives by identifying conditions on causal graphs and structural equations under which counterfactual maps coincide with classical statistical transports. In this way, we delineate the circumstances in which causal assumptions support the use of a specific structure of transport map. Taken together, our results aim to enrich the theoretical understanding of families of transport maps and to clarify their possible causal interpretations. We hope this work contributes to establishing new bridges between statistical transport and causal inference.


【32】AI Methods for Permutation Circuit Synthesis Across Generic Topologies
标题:跨通用拓扑的排列电路综合的人工智能方法
链接:https://arxiv.org/abs/2509.16020

作者:llar, Juan Cruz-Benito, Ismael Faro, David Kremer
备注:This paper has been accepted by First AAAI Symposium on Quantum Information & Machine Learning (QIML): Bridging Quantum Computing and Artificial Intelligence at AAAI 2025 Fall Symposium
摘要:This paper investigates artificial intelligence (AI) methodologies for the synthesis and transpilation of permutation circuits across generic topologies. Our approach uses Reinforcement Learning (RL) techniques to achieve near-optimal synthesis of permutation circuits up to 25 qubits. Rather than developing specialized models for individual topologies, we train a foundational model on a generic rectangular lattice, and employ masking mechanisms to dynamically select subsets of topologies during the synthesis. This enables the synthesis of permutation circuits on any topology that can be embedded within the rectangular lattice, without the need to re-train the model. In this paper we show results for a 5x5 lattice and compare them to previous AI topology-oriented models and classical methods, showing that they outperform classical heuristics, match previous specialized AI models, and perform synthesis even for topologies that were not seen during training. We further show that the model can be fine tuned to strengthen the performance for selected topologies of interest. This methodology allows a single trained model to efficiently synthesize circuits across diverse topologies, allowing its practical integration into transpilation workflows.


【33】VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
标题:VoXtream:延迟极低的全流式文本转语音
链接:https://arxiv.org/abs/2509.15969

作者:rgashov, Gustav Eje Henter, Gabriel Skantze
备注:5 pages, 1 figure, submitted to IEEE ICASSP 2026
摘要:We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. VoXtream directly maps incoming phonemes to audio tokens using a monotonic alignment scheme and a dynamic look-ahead that does not delay onset. Built around an incremental phoneme transformer, a temporal transformer predicting semantic and duration tokens, and a depth transformer producing acoustic tokens, VoXtream achieves, to our knowledge, the lowest initial delay among publicly available streaming TTS: 102 ms on GPU. Despite being trained on a mid-scale 9k-hour corpus, it matches or surpasses larger baselines on several metrics, while delivering competitive quality in both output- and full-streaming settings. Demo and code are available at https://herimor.github.io/voxtream.


【34】Neural Architecture Search Algorithms for Quantum Autoencoders
标题:量子自动编码器的神经结构搜索算法
链接:https://arxiv.org/abs/2509.15451

作者:shrestha, Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Ilya Safro
摘要:The design of quantum circuits is currently driven by the specific objectives of the quantum algorithm in question. This approach thus relies on a significant manual effort by the quantum algorithm designer to design an appropriate circuit for the task. However this approach cannot scale to more complex quantum algorithms in the future without exponentially increasing the circuit design effort and introducing unwanted inductive biases. Motivated by this observation, we propose to automate the process of cicuit design by drawing inspiration from Neural Architecture Search (NAS). In this work, we propose two Quantum-NAS algorithms that aim to find efficient circuits given a particular quantum task. We choose quantum data compression as our driver quantum task and demonstrate the performance of our algorithms by finding efficient autoencoder designs that outperform baselines on three different tasks - quantum data denoising, classical data compression and pure quantum data compression. Our results indicate that quantum NAS algorithms can significantly alleviate the manual effort while delivering performant quantum circuits for any given task.


【35】Training thermodynamic computers by gradient descent
标题:通过梯度下降训练热力学计算机
链接:https://arxiv.org/abs/2509.15324

作者:hitelam
摘要:We show how to adjust the parameters of a thermodynamic computer by gradient descent in order to perform a desired computation at a specified observation time. Within a digital simulation of a thermodynamic computer, training proceeds by maximizing the probability with which the computer would generate an idealized dynamical trajectory. The idealized trajectory is designed to reproduce the activations of a neural network trained to perform the desired computation. This teacher-student scheme results in a thermodynamic computer whose finite-time dynamics enacts a computation analogous to that of the neural network. The parameters identified in this way can be implemented in the hardware realization of the thermodynamic computer, which will perform the desired computation automatically, driven by thermal noise. We demonstrate the method on a standard image-classification task, and estimate the thermodynamic advantage -- the ratio of energy costs of the digital and thermodynamic implementations -- to exceed seven orders of magnitude. Our results establish gradient descent as a viable training method for thermodynamic computing, enabling application of the core methodology of machine learning to this emerging field.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
