cs.LG: 206 papers today
LLM-related (34 papers)
【1】SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
Link: https://arxiv.org/abs/2602.23353
Authors: Simon Roschmann, Paul Krzakala, Sonia Mazelet, Quentin Bouniot, Zeynep Akata
Comments: Preprint
Abstract: The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the world. Recent work exploits this convergence by aligning frozen pretrained vision and language models with lightweight alignment layers, but typically relies on contrastive losses and millions of paired samples. In this work, we ask whether meaningful alignment can be achieved with substantially less supervision. We introduce a semi-supervised setting in which pretrained unimodal encoders are aligned using a small number of image-text pairs together with large amounts of unpaired data. To address this challenge, we propose SOTAlign, a two-stage framework that first recovers a coarse shared geometry from limited paired data using a linear teacher, then refines the alignment on unpaired samples via an optimal-transport-based divergence that transfers relational structure without overconstraining the target space. Unlike existing semi-supervised methods, SOTAlign effectively leverages unpaired images and text, learning robust joint embeddings across datasets and encoder pairs, and significantly outperforming supervised and semi-supervised baselines.
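The optimal-transport-based divergence is not spelled out in the abstract; as background, a minimal entropic-OT (Sinkhorn) solver of the kind such alignment losses are typically built on can be sketched as follows (illustrative only; the function and parameter names are ours, not SOTAlign's):

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, iters=200):
    """Entropic optimal transport between two uniform distributions.

    cost: (n, m) pairwise cost matrix between embedding sets.
    Returns a transport plan whose marginals match the two distributions;
    the inner product <plan, cost> is the regularized OT divergence.
    """
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    k = np.exp(-cost / reg)            # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):             # alternating marginal projections
        u = a / (k @ v)
        v = b / (k.T @ u)
    return u[:, None] * k * v[None, :]
```

With a cost such as one minus cosine similarity, the plan softly pairs unpaired image and text embeddings, which is what allows relational structure to transfer without explicit pairs.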
【2】Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Link: https://arxiv.org/abs/2602.23312
Authors: Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr
Abstract: Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.
【3】Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Link: https://arxiv.org/abs/2602.23234
Authors: Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad
Abstract: Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.
【4】InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Link: https://arxiv.org/abs/2602.23200
Authors: Sayed Mohammadreza Tayaranian Hosseini, Amir Ardakani, Warren J. Gross
Comments: 16 pages, 4 figures, 4 tables, 2 algorithms
Abstract: Reducing the hardware footprint of large language models (LLMs) during decoding is critical for efficient long-sequence generation. A key bottleneck is the key-value (KV) cache, whose size scales with sequence length and easily dominates the memory footprint of the model. Previous work proposed quantization methods that are focused on compressing the KV cache while maintaining its information. We introduce InnerQ, a hardware-aware KV-cache quantization scheme that lowers decode latency without sacrificing accuracy. InnerQ applies group-wise quantization while grouping the cache matrices over their inner dimension. Unlike previous work that groups over the outer dimension, InnerQ aligns dequantization with the vector-matrix multiplication and enables scale factor reuse across GPU compute units. This reduces memory accesses and accelerates dequantization, yielding up to $22\%$ speedup over previous work and up to $88\%$ over half-precision vector-matrix multiplication. To preserve fidelity under aggressive compression, InnerQ incorporates (i) hybrid quantization, selecting symmetric or asymmetric quantization per group based on local statistics; (ii) high-precision windows for both the most recent tokens and the attention sink tokens to mitigate outlier leakage; and (iii) per-channel normalization of the key cache, computed once during prefill and folded into the query to avoid runtime overhead. Our evaluation experiments on Llama models show that InnerQ maintains few-shot GSM8K performance comparable to that of non-quantized KV caches and surpasses prior KV cache quantization methods.
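The hybrid symmetric/asymmetric choice in point (i) can be illustrated with a small per-group sketch. This is an assumed decision rule for illustration; InnerQ's actual kernel, grouping, and statistics are not given in the abstract, and the 0.25 threshold below is ours:

```python
import numpy as np

def dequantized_group(x, bits=4):
    """Quantize one cache group to `bits` and return the dequantized values.

    A roughly zero-centered group uses symmetric quantization (scale only);
    a skewed group uses asymmetric quantization (scale plus zero point),
    which spends all 2**bits levels on the occupied range.
    """
    lo, hi = float(x.min()), float(x.max())
    if abs(hi + lo) <= 0.25 * (hi - lo):      # range roughly centered on zero
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(lo), abs(hi)) / qmax
        q = np.clip(np.round(x / scale), -qmax, qmax)
        return q * scale
    scale = (hi - lo) / (2 ** bits - 1)       # skewed: shift via zero point
    zero = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero, 0, 2 ** bits - 1)
    return (q - zero) * scale
```

In a real kernel the integer codes and scales would be stored, with dequantization fused into the vector-matrix multiply; returning dequantized values here simply makes the round-trip error easy to inspect.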
【5】Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models
Link: https://arxiv.org/abs/2602.23179
Authors: Gal Kesten-Pomeranz, Yaniv Nikankin, Anja Reusch, Tomer Tsaban, Ora Schueler-Furman, Yonatan Belinkov
Abstract: Protein sequences are abundant in repeating segments, both as exact copies and as approximate segments with mutations. These repeats are important for protein structure and function, motivating decades of algorithmic work on repeat identification. Recent work has shown that protein language models (PLMs) identify repeats, by examining their behavior in masked-token prediction. To elucidate their internal mechanisms, we investigate how PLMs detect both exact and approximate repeats. We find that the mechanism for approximate repeats functionally subsumes that of exact repeats. We then characterize this mechanism, revealing two main stages: PLMs first build feature representations using both general positional attention heads and biologically specialized components, such as neurons that encode amino-acid similarity. Then, induction heads attend to aligned tokens across repeated segments, promoting the correct answer. Our results reveal how PLMs solve this biological task by combining language-based pattern matching with specialized biological knowledge, thereby establishing a basis for studying more complex evolutionary processes in PLMs.
【6】Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
Link: https://arxiv.org/abs/2602.23136
Authors: Jayadev Billa
Comments: 22 pages, 11 tables, 2 figures. Code: https://github.com/jb1999/modality_collapse_paper
Abstract: Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encoding: speaker identity, emotion, and visual attributes survive through every LLM layer (3--55$\times$ above chance in linear probes), yet removing 64--71% of modality-specific variance improves decoder loss. The decoder has no learned use for these directions; their presence is noise. We formalize this as a mismatched decoder problem: a decoder trained on text can only extract information along text-aligned directions. Accessible information is bounded by the Generalized Mutual Information (GMI), with degradation scaling with distributional distance and decoder sensitivity. The bound is a property of the decoder's scoring rule, not of any particular architecture; it applies whether non-text inputs arrive through a learned projection, a discrete codebook, or no explicit adapter at all. We validate this across five models spanning speech and vision. A controlled experiment (two Prismatic VLMs differing only in encoder text-alignment) confirms the bottleneck is the decoder's scoring rule, not the encoder or projection. A LoRA intervention demonstrates the fix: training with an emotion objective improves emotion accessibility ($+$7.5%) without affecting other attributes, confirming that the training objective determines what becomes accessible.
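The linear-probing methodology behind the "above chance" claim can be sketched in a few lines, using synthetic hidden states rather than the paper's actual models (a toy reconstruction; the dimensions, signal model, and least-squares probe are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hidden states: a binary attribute (e.g. speaker emotion) is linearly
# encoded along one direction but buried in isotropic noise.
d, n = 32, 400
labels = rng.integers(0, 2, n)                       # attribute to probe for
direction = rng.standard_normal(d)                   # encoding direction
states = rng.standard_normal((n, d)) + np.outer(2 * labels - 1, direction)

# Linear probe: least-squares fit of +/-1 targets, thresholded at zero.
w, *_ = np.linalg.lstsq(states, 2.0 * labels - 1.0, rcond=None)
accuracy = np.mean((states @ w > 0) == (labels == 1))
```

An accuracy far above the 50% chance level indicates the attribute survives in the representation, even when (as the paper argues) the decoder's scoring rule makes no use of those directions.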
【7】PRAC: Principal-Random Subspace for LLM Activation Compression and Memory-Efficient Training
Link: https://arxiv.org/abs/2602.23111
Authors: Yanyi Li, Yimu Zhang, Cong Fang
Abstract: Activations have become the primary memory bottleneck in large-batch LLM training. However, existing compression methods fail to exploit the spectral structure of activations, resulting in slow convergence or limited compression. To address this, we bridge the relationship between the algorithm's fast convergence and the requirements for subspace projection, and show that an effective compression should yield an unbiased estimate of the original activation with low variance. We propose Principal-Random Subspace for LLM Activation Compression (PRAC), which novelly decomposes activations into two components: a principal subspace captured via SVD to retain dominant information, and a random subspace sampled from the orthogonal complement to approximate the tail. By introducing a precise scaling factor, we prove that PRAC yields an unbiased gradient estimator with minimum variance under certain conditions. Extensive experiments on pre-training and fine-tuning tasks demonstrate that PRAC achieves up to 36% total memory reduction with negligible performance degradation and minimal computational cost.
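The principal-plus-random decomposition can be sketched as follows. This is an assumed form based only on the abstract's description; PRAC's exact sampling, scaling factor, and where the estimator enters the backward pass may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def prac_like_reconstruct(a, r, k):
    """Approximate activations `a` (batch, d) from a rank-r principal
    subspace (kept exactly, via SVD) plus k random directions drawn inside
    the orthogonal complement, scaled by (d - r) / k so that the tail
    estimate is unbiased in expectation."""
    d = a.shape[1]
    _, _, vt = np.linalg.svd(a, full_matrices=True)
    v_top = vt[:r].T                     # (d, r) principal directions
    v_tail = vt[r:].T                    # (d, d - r) complement basis
    g = rng.standard_normal((d - r, k))
    q, _ = np.linalg.qr(v_tail @ g)      # random k-frame in the complement
    scale = (d - r) / k                  # E[scale * q q^T] = P_complement
    return a @ v_top @ v_top.T + scale * (a @ q) @ q.T
```

When `a` is exactly rank `r` the tail term vanishes and the reconstruction is exact; in general only the expectation matches the original, trading variance for activation memory.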
【8】Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
Link: https://arxiv.org/abs/2602.23079
Authors: Boyang Zhang, Yang Zhang
Abstract: The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM agent designed to evaluate and mitigate such risks through a structured, interpretable pipeline. Central to our framework is the proposed $\textit{SALA}$ (Stylometry-Assisted LLM Analysis) method, which integrates quantitative stylometric features with LLM reasoning for robust and transparent authorship attribution. Experiments on large-scale news datasets demonstrate that $\textit{SALA}$, particularly when augmented with a database module, achieves high inference accuracy in various scenarios. Finally, we propose a guided recomposition strategy that leverages the agent's reasoning trace to generate rewriting prompts, effectively reducing authorship identifiability while preserving textual meaning. Our findings highlight both the deanonymization potential of LLM agents and the importance of interpretable, proactive defenses for safeguarding author privacy.
【9】RhythmBERT: A Self-Supervised Language Model Based on Latent Representations of ECG Waveforms for Heart Disease Detection
Link: https://arxiv.org/abs/2602.23060
Authors: Xin Wang, Burcu Ozek, Aruna Mohan, Amirhossein Ravari, Or Zilbershot, Fatemeh Afghah
Abstract: Electrocardiogram (ECG) analysis is crucial for diagnosing heart disease, but most self-supervised learning methods treat ECG as a generic time series, overlooking physiologic semantics and rhythm-level structure. Existing contrastive methods utilize augmentations that distort morphology, whereas generative approaches employ fixed-window segmentation, which misaligns cardiac cycles. To address these limitations, we propose RhythmBERT, a generative ECG language model that considers ECG as a language paradigm by encoding P, QRS, and T segments into symbolic tokens via autoencoder-based latent representations. These discrete tokens capture rhythm semantics, while complementary continuous embeddings retain fine-grained morphology, enabling a unified view of waveform structure and rhythm. RhythmBERT is pretrained on approximately 800,000 unlabeled ECG recordings with a masked prediction objective, allowing it to learn contextual representations in a label-efficient manner. Evaluations show that despite using only a single lead, RhythmBERT achieves comparable or superior performance to strong 12-lead baselines. This generalization extends from prevalent conditions such as atrial fibrillation to clinically challenging cases such as subtle ST-T abnormalities and myocardial infarction. Our results suggest that considering ECG as structured language offers a scalable and physiologically aligned pathway for advancing cardiac analysis.
【10】Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Link: https://arxiv.org/abs/2602.23008
Authors: Zeyuan Liu, Jeonghye Kim, Xufang Luo, Dongsheng Li, Yuqing Yang
Comments: Accepted to ICLR 2026
Abstract: Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EMPO$^2$), a hybrid RL framework that leverages memory for exploration and combines on- and off-policy updates to make LLMs perform well with memory while also ensuring robustness without it. On ScienceWorld and WebShop, EMPO$^2$ achieves 128.6% and 11.3% improvements over GRPO, respectively. Moreover, in out-of-distribution tests, EMPO$^2$ demonstrates superior adaptability to new tasks, requiring only a few trials with memory and no parameter updates. These results highlight EMPO$^2$ as a promising framework for building more exploratory and generalizable LLM-based agents.
【11】Moral Preferences of LLMs Under Directed Contextual Influence
Link: https://arxiv.org/abs/2602.22831
Authors: Phil Blandfort, Tushar Karayil, Urja Pawar, Robert Graham, Alex McKenzie, Dmitrii Krasheninnikov
Abstract: Moral benchmarks for LLMs typically use context-free prompts, implicitly assuming stable preferences. In deployment, however, prompts routinely include contextual signals such as user requests, cues on social norms, etc. that may steer decisions. We study how directed contextual influences reshape decisions in trolley-problem-style moral triage settings. We introduce a pilot evaluation harness for directed contextual influence in trolley-problem-style moral triage: for each demographic factor, we apply matched, direction-flipped contextual influences that differ only in which group they favor, enabling systematic measurement of directional response. We find that: (i) contextual influences often significantly shift decisions, even when only superficially relevant; (ii) baseline preferences are a poor predictor of directional steerability, as models can appear baseline-neutral yet exhibit systematic steerability asymmetry under influence; (iii) influences can backfire: models may explicitly claim neutrality or discount the contextual cue, yet their choices still shift, sometimes in the opposite direction; and (iv) reasoning reduces average sensitivity, but amplifies the effect of biased few-shot examples. Our findings motivate extending moral evaluations with controlled, direction-flipped context manipulations to better characterize model behavior.
【12】TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
Link: https://arxiv.org/abs/2602.22827
Authors: Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian, Ophir Frieder, Nazli Goharian
Comments: 11 pages, 3 figures, Fifteenth biennial Language Resources and Evaluation Conference (LREC) 2026 (to appear)
Abstract: This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture Persian's morphological complexity and semantic nuance. Our framework introduces a Persian-specific short-answer evaluation that combines rule-based morphological normalization with a hybrid syntactic and semantic similarity module, enabling robust soft-match scoring beyond exact string overlap. Through systematic evaluation of 15 state-of-the-art open- and closed-source models, we demonstrate that our hybrid evaluation improves scoring consistency by +10% compared to exact-match baselines by capturing meaning that surface-level methods cannot detect. We publicly release our evaluation framework, providing the first standardized benchmark for measuring cultural understanding in Persian and establishing a reproducible foundation for cross-cultural LLM evaluation research.
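As a toy analogue of the soft-match idea, a scorer can accept near-matches that exact string comparison rejects. Character-level similarity here stands in for the paper's morphological normalization and semantic similarity module, and the 0.8 threshold is our illustrative choice:

```python
import difflib

def soft_match(prediction: str, reference: str, threshold: float = 0.8) -> bool:
    """Accept an answer when its normalized similarity to the reference
    clears a threshold, rather than requiring exact string overlap."""
    ratio = difflib.SequenceMatcher(None, prediction, reference).ratio()
    return ratio >= threshold
```

This is what lets spelling or inflection variants of a correct answer score as correct while unrelated answers still fail.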
【13】Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching
Link: https://arxiv.org/abs/2602.22812
Authors: Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura
Abstract: Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperatively sharing intermediate processing states across multiple low-end edge devices. To fully utilize prompt similarity, our distributed caching mechanism also supports partial matching. As this approach introduces communication overhead associated with state sharing over a wireless network, we introduce a Bloom-filter-based data structure, referred to as a catalog, to determine whether a remote server possesses the desired internal states, thereby suppressing unnecessary communication. Experiments using the Gemma-3 270M model and the MMLU dataset on the Raspberry Pi Zero 2W platform demonstrate that the proposed approach reduces TTFT (Time to First Token) and TTLT (Time to Last Token) by 93.12% and 50.07% on average, respectively.
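The Bloom-filter "catalog" can be sketched as follows (an illustrative data structure; the bitmap size, hash count, and key format are our assumptions, not details from the paper):

```python
import hashlib

class BloomCatalog:
    """Fixed-size Bloom filter over prompt-prefix keys.

    A remote node publishes this small bitmap; peers query it before
    requesting cached KV states, skipping requests that would miss.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent: no wireless round-trip needed.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A negative answer is guaranteed correct, so the device can skip the network round-trip entirely; a positive answer may occasionally be a false positive, costing only one wasted request.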
【14】Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
Link: https://arxiv.org/abs/2602.22681
Authors: Shuchen Zhu, Rizhen Hu, Mingze Wang, Mou Sun, Xue Wang, Kun Yuan, Zaiwen Wen
Abstract: Pre-training Large Language Models requires immense computational resources, making optimizer efficiency essential. The optimization landscape is highly anisotropic, with loss reduction driven predominantly by progress along flat directions. While matrix-based optimizers such as Muon and SOAP leverage fine-grained curvature information to outperform AdamW, their updates tend toward isotropy -- relatively conservative along flat directions yet potentially aggressive along sharp ones. To address this limitation, we first establish a unified Riemannian Ordinary Differential Equation (ODE) framework that elucidates how common adaptive algorithms operate synergistically: the preconditioner induces a Riemannian geometry that mitigates ill-conditioning, while momentum serves as a Riemannian damping term that promotes convergence. Guided by these insights, we propose LITE, a generalized acceleration strategy that enhances training dynamics by applying larger Hessian damping coefficients and learning rates along flat trajectories. Extensive experiments demonstrate that LITE significantly accelerates both Muon and SOAP across diverse architectures (Dense, MoE), parameter scales (130M--1.3B), datasets (C4, Pile), and learning-rate schedules (cosine, warmup-stable-decay). Theoretical analysis confirms that LITE facilitates faster convergence along flat directions in anisotropic landscapes, providing a principled approach to efficient LLM pre-training. The code is available at https://github.com/SHUCHENZHU/LITE.
【15】Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators
Link: https://arxiv.org/abs/2602.22647
Authors: Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt, Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi, Mingyan Gao, Onkar Dalal, Lichan Hong, Ed Chi, Ningren Han
Comments: 14 pages, 4 figures
Abstract: Generative retrieval has emerged as a powerful paradigm for LLM-based recommendation. However, industrial recommender systems often benefit from restricting the output space to a constrained subset of items based on business logic (e.g. enforcing content freshness or product category), which standard autoregressive decoding cannot natively support. Moreover, existing constrained decoding methods that make use of prefix trees (Tries) incur severe latency penalties on hardware accelerators (TPUs/GPUs). In this work, we introduce STATIC (Sparse Transition Matrix-Accelerated Trie Index for Constrained Decoding), an efficient and scalable constrained decoding technique designed specifically for high-throughput LLM-based generative retrieval on TPUs/GPUs. By flattening the prefix tree into a static Compressed Sparse Row (CSR) matrix, we transform irregular tree traversals into fully vectorized sparse matrix operations, unlocking massive efficiency gains on hardware accelerators. We deploy STATIC on a large-scale industrial video recommendation platform serving billions of users. STATIC produces significant product metric impact with minimal latency overhead (0.033 ms per step and 0.25% of inference time), achieving a 948x speedup over a CPU trie implementation and a 47-1033x speedup over a hardware-accelerated binary-search baseline. Furthermore, the runtime overhead of STATIC remains extremely low across a wide range of practical configurations. To the best of our knowledge, STATIC enables the first production-scale deployment of strictly constrained generative retrieval. In addition, evaluation on academic benchmarks demonstrates that STATIC can considerably improve cold-start performance for generative retrieval. Our code is available at https://github.com/youtube/static-constraint-decoding.
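The core flattening idea, a trie whose per-node transitions become contiguous CSR slices, can be sketched as follows. This is our reconstruction from the abstract; STATIC's actual layout and kernels are in the linked repository:

```python
import numpy as np

def build_csr_trie(sequences):
    """Build a trie over token-ID tuples, then flatten it to CSR arrays:
    indptr[node]..indptr[node+1] indexes that node's outgoing (token, dest)
    pairs, sorted by token. Returns (indptr, tokens, dests, root)."""
    children = [{}]                       # node -> {token: child node}
    for seq in sequences:
        node = 0
        for tok in seq:
            if tok not in children[node]:
                children.append({})
                children[node][tok] = len(children) - 1
            node = children[node][tok]
    indptr, tokens, dests = [0], [], []
    for node_children in children:
        for tok, child in sorted(node_children.items()):
            tokens.append(tok)
            dests.append(child)
        indptr.append(len(tokens))
    return np.array(indptr), np.array(tokens), np.array(dests), 0

def allowed_mask(indptr, tokens, node, vocab_size):
    """Vocabulary mask of tokens permitted after `node`: one slice + scatter."""
    mask = np.zeros(vocab_size, dtype=bool)
    mask[tokens[indptr[node]:indptr[node + 1]]] = True
    return mask

def step(indptr, tokens, dests, node, tok):
    """Advance to the child reached by emitting `tok` (tokens sorted per node)."""
    s, e = indptr[node], indptr[node + 1]
    j = np.searchsorted(tokens[s:e], tok)
    return int(dests[s + j])
```

Because the arrays are static and every per-step operation is a slice, scatter, or searchsorted, the whole constraint check vectorizes on a TPU/GPU instead of chasing pointers through an irregular tree.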
【16】Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning
Link: https://arxiv.org/abs/2602.22642
Authors: Qin-Wen Luo, Sheng Ren, Xiang Chen, Rui Liu, Jun Fang, Naiqiang Tan, Sheng-Jun Huang
Abstract: Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks, yet the verbose nature of explicit reasoning steps incurs prohibitive inference latency and computational costs, limiting real-world deployment. While existing compression methods - ranging from self-training to Reinforcement Learning (RL) with length constraints - attempt to mitigate this, they often sacrifice reasoning capability for brevity. We identify a critical failure mode in these approaches: explicitly optimizing for shorter trajectories triggers rapid entropy collapse, which prematurely shrinks the exploration space and stifles the discovery of valid reasoning paths, particularly for challenging questions requiring extensive deduction. To address this issue, we propose Compress responses for Easy questions and Explore Hard ones (CEEH), a difficulty-aware approach to RL-based efficient reasoning. CEEH dynamically assesses instance difficulty to apply selective entropy regularization: it preserves a diverse search space for currently hard questions to ensure robustness, while permitting aggressive compression on easier instances where the reasoning path is well-established. In addition, we introduce a dynamic optimal-length penalty anchored to the historically shortest correct response, which effectively counteracts entropy-induced length inflation and stabilizes the reward signal. Across six reasoning benchmarks, CEEH consistently reduces response length while maintaining accuracy comparable to the base model, and improves Pass@k relative to length-only optimization.
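The selective entropy regularization can be caricatured in a few lines (an assumed functional form; CEEH's actual difficulty estimate, coefficient schedule, and reward shaping are not specified in the abstract):

```python
import numpy as np

def difficulty_aware_entropy_bonus(token_probs, solve_rate, max_coef=0.01):
    """Scale the policy-entropy bonus by estimated difficulty.

    token_probs: (steps, vocab) next-token distributions along a rollout.
    solve_rate:  fraction of recent rollouts answered correctly, used as an
                 inverse-difficulty proxy.
    Hard questions (low solve rate) keep the full exploration bonus;
    well-established easy ones get none, permitting aggressive compression.
    """
    entropy = -np.sum(token_probs * np.log(token_probs + 1e-12), axis=-1)
    coef = max_coef * (1.0 - solve_rate)
    return coef * float(entropy.mean())
```

In an RL loop this term would be added to the per-question reward, so entropy collapse is resisted exactly where exploration is still needed.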
【17】Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA
标题:语义管预测:用JEPA超越LLM的数据效率
链接:https://arxiv.org/abs/2602.22617
作者:Hai Huang,Yann LeCun,Randall Balestriero
备注:21 pages, 13 figures
摘要:大型语言模型(LLM)遵循一致的缩放定律-经验幂律拟合,预测损失如何随着计算、数据和参数而减少。虽然这些规律具有预测性,但它们是描述性的,而不是规定性的:它们表征的是典型的训练,而不是最佳的训练。令人惊讶的是,很少有工作成功地挑战了这些定律所暗示的数据效率界限--这是我们的主要关注点。为此,我们引入了测地线假设,假设令牌序列沿光滑语义流形上的测地线运动,因此是局部线性的。基于这一原则,我们提出了一种新的语义管预测(STP)任务,一种将隐藏状态轨迹限制在测地线管状邻域内的JEPA风格正则化。STP将JEPA推广到语言,而不需要显式的多视图增强。我们表明,这种约束提高了信噪比,并通过防止推理过程中的轨迹碰撞来保持多样性。从经验上讲,STP允许LLM在NL-RX-SYNTH数据集上用16$\times$更少的训练数据达到基线精度,直接违反了Chinchilla风格缩放定律的数据项,并证明了原则性几何先验可以超越蛮力缩放。代码可在https://github.com/galilai-group/llm-jepa#stp上获得。
摘要:Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptive rather than prescriptive: they characterize typical training, not optimal training. Surprisingly few works have successfully challenged the data-efficiency bounds implied by these laws -- which is our primary focus. To that end, we introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. Building on this principle, we propose a novel Semantic Tube Prediction (STP) task, a JEPA-style regularizer that confines hidden-state trajectories to a tubular neighborhood of the geodesic. STP generalizes JEPA to language without requiring explicit multi-view augmentations. We show this constraint improves signal-to-noise ratio, and consequently preserves diversity by preventing trajectory collisions during inference. Empirically, STP allows LLMs to match baseline accuracy with 16$\times$ less training data on the NL-RX-SYNTH dataset, directly violating the data term of Chinchilla-style scaling laws and demonstrating that principled geometric priors can surpass brute-force scaling. Code is available at https://github.com/galilai-group/llm-jepa#stp.
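The tubular-neighborhood constraint can be illustrated with a small geometric sketch: penalize a hidden state only when it strays more than a tube radius from the local chord between its temporal neighbors, a crude stand-in for the geodesic. This is our own toy rendering of the idea, not the paper's actual STP loss.

```python
def tube_penalty(prev, cur, nxt, radius=0.1):
    """Toy penalty keeping a hidden state inside a 'tube' around the local
    geodesic, here approximated by the segment prev -> nxt (illustrative).

    Returns 0 when cur lies within `radius` of its projection onto the
    segment, else the squared excess distance."""
    # Direction of the local chord.
    d = [b - a for a, b in zip(prev, nxt)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        proj = list(prev)
    else:
        t = sum((c - a) * x for a, c, x in zip(prev, cur, d)) / dd
        t = max(0.0, min(1.0, t))  # clamp the projection to the segment
        proj = [a + t * x for a, x in zip(prev, d)]
    dist2 = sum((c - p) ** 2 for c, p in zip(cur, proj))
    excess = max(0.0, dist2 ** 0.5 - radius)
    return excess * excess
```

A state lying on the chord incurs zero penalty; one unit off the chord with a 0.1 radius incurs a 0.81 penalty.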
【18】pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training
标题:pQuant:通过解耦线性量化感知训练迈向有效的低位语言模型
链接:https://arxiv.org/abs/2602.22592
作者:Wenzheng Zhang,Bingzheng Liu,Yang Hu,Xiaoying Bai,Wentao Zhang,Bin Cui
备注:10 pages, 7 figures
摘要:从头开始的量化感知训练已经成为一种很有前途的方法,用于构建具有极低位权重(低于2位)的高效大型语言模型(LLM),这可以为边缘部署提供实质性优势。然而,现有的方法仍然无法达到令人满意的准确性和可扩展性。在这项工作中,我们发现参数民主化效应是一个关键瓶颈:所有参数的敏感性变得同质化,严重限制了表现力。为了解决这个问题,我们提出了pQuant,这是一种通过将线性层分成两个专门的分支来对参数进行解耦的方法:一个主要的1位分支用于高效计算,一个紧凑的高精度分支用于保留最敏感的参数。通过量身定制的特征缩放,我们显式地引导模型将敏感参数分配给高精度分支。此外,我们将此分支扩展为多个稀疏激活的专家,从而实现有效的容量扩展。大量的实验表明,我们的pQuant在极低比特量化中达到最先进的性能。
摘要:Quantization-Aware Training from scratch has emerged as a promising approach for building efficient large language models (LLMs) with extremely low-bit weights (sub 2-bit), which can offer substantial advantages for edge deployment. However, existing methods still fail to achieve satisfactory accuracy and scalability. In this work, we identify a parameter democratization effect as a key bottleneck: the sensitivity of all parameters becomes homogenized, severely limiting expressivity. To address this, we propose pQuant, a method that decouples parameters by splitting linear layers into two specialized branches: a dominant 1-bit branch for efficient computation and a compact high-precision branch dedicated to preserving the most sensitive parameters. Through tailored feature scaling, we explicitly guide the model to allocate sensitive parameters to the high-precision branch. Furthermore, we extend this branch into multiple, sparsely-activated experts, enabling efficient capacity scaling. Extensive experiments indicate our pQuant achieves state-of-the-art performance in extremely low-bit quantization.
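The decoupled-branch idea can be sketched as follows: weight rows flagged as sensitive bypass quantization, while the rest go through a 1-bit sign-and-scale branch. This is a toy illustration under our own assumptions (a BitNet-style mean-absolute-value scale), not pQuant's actual implementation or its routing mechanism.

```python
def sign_quantize(w):
    """1-bit quantization: keep only the sign, with a per-row scale equal to
    the mean absolute value (a common BitNet-style scheme; illustrative)."""
    scale = sum(abs(x) for x in w) / len(w)
    return [scale if x >= 0 else -scale for x in w]

def decoupled_linear(x, w_rows, sensitive_rows):
    """Toy decoupled linear layer in the spirit of pQuant (illustrative):
    rows flagged as sensitive stay full precision, the rest use the 1-bit
    branch. Returns one output per weight row."""
    out = []
    for i, w in enumerate(w_rows):
        w_eff = w if i in sensitive_rows else sign_quantize(w)
        out.append(sum(a * b for a, b in zip(x, w_eff)))
    return out
```

With the sensitive set empty, every row is quantized; flagging a row restores its exact full-precision output.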
【19】CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
标题:CourtGuard:LLM安全中Zero-Shot政策调整的模型不可知框架
链接:https://arxiv.org/abs/2602.22557
作者:Umid Suleymanov,Rufiz Bayramov,Suad Gafarli,Seljan Musayeva,Taghi Mammadov,Aynur Akhundlu,Murat Kantarcioglu
备注:Under Review
摘要:目前大型语言模型(LLM)的安全机制严重依赖于静态的、微调的分类器,这些分类器存在适应刚性,即无法在没有昂贵的再训练的情况下执行新的治理规则。为了解决这个问题,我们引入了CourtGuard,一个将安全评估重新构想为证据辩论的检索增强多代理框架。通过组织基于外部政策文件的对抗性辩论,CourtGuard在7个安全基准上实现了最先进的性能,在没有微调的情况下超越了专用的政策遵循基线。除了标准指标之外,我们还强调了两个关键能力:(1)Zero-Shot适应性,我们的框架通过更换参考政策成功地泛化到域外的维基百科破坏任务(达到90%的准确率);和(2)自动数据管理和审计,我们利用CourtGuard来管理和审计九个由复杂对抗性攻击组成的新数据集。我们的研究结果表明,将安全逻辑与模型权重解耦,为满足当前和未来的人工智能治理监管要求提供了一条稳健、可解释和适应性强的路径。
摘要:Current safety mechanisms for Large Language Models (LLMs) rely heavily on static, fine-tuned classifiers that suffer from adaptation rigidity, the inability to enforce new governance rules without expensive retraining. To address this, we introduce CourtGuard, a retrieval-augmented multi-agent framework that reimagines safety evaluation as Evidentiary Debate. By orchestrating an adversarial debate grounded in external policy documents, CourtGuard achieves state-of-the-art performance across 7 safety benchmarks, outperforming dedicated policy-following baselines without fine-tuning. Beyond standard metrics, we highlight two critical capabilities: (1) Zero-Shot Adaptability, where our framework successfully generalized to an out-of-domain Wikipedia Vandalism task (achieving 90\% accuracy) by swapping the reference policy; and (2) Automated Data Curation and Auditing, where we leveraged CourtGuard to curate and audit nine novel datasets of sophisticated adversarial attacks. Our results demonstrate that decoupling safety logic from model weights offers a robust, interpretable, and adaptable path for meeting current and future regulatory requirements in AI governance.
【20】Reinforcement-aware Knowledge Distillation for LLM Reasoning
标题:LLM推理的强化感知知识蒸馏
链接:https://arxiv.org/abs/2602.22495
作者:Zhaoyang Zhang,Shuli Jiang,Yantao Shen,Yuting Zhang,Dhananjay Ram,Shuo Yang,Zhuowen Tu,Wei Xia,Stefano Soatto
摘要:强化学习(RL)后训练最近在长思维链推理大型语言模型(LLM)方面取得了重大进展,但此类模型的高推理成本促使其向更小的学生模型进行蒸馏。大多数现有的知识蒸馏(KD)方法都是为监督微调(SFT)而设计的,依赖于固定的教师轨迹或基于教师-学生Kullback-Leibler(KL)散度的正则化。当与RL相结合时,这些方法通常会受到分布失配和目标干扰的影响:教师的监督可能与学生不断演化的推出分布不一致,KL正则化项可能与奖励最大化相竞争,并需要小心的损失平衡。为了解决这些问题,我们提出了RL感知蒸馏(RLAD),它在RL过程中执行选择性模仿--只有当它能改善当前的策略更新时,才引导学生向教师学习。我们的核心组件,信赖域比率蒸馏(TRRD),用锚定于教师与旧策略混合的PPO/GRPO风格似然比目标取代教师-学生KL正则化项,从而在学生推出上产生优势感知、信赖域有界的蒸馏,并自然地平衡探索、利用与模仿。在各种逻辑推理和数学基准测试中,RLAD始终优于离线蒸馏、标准GRPO和基于KL的同策略师生知识蒸馏。
摘要:Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combined with RL, these approaches often suffer from distribution mismatch and objective interference: teacher supervision may not align with the student's evolving rollout distribution, and the KL regularizer can compete with reward maximization and require careful loss balancing. To address these issues, we propose RL-aware distillation (RLAD), which performs selective imitation during RL -- guiding the student toward the teacher only when it improves the current policy update. Our core component, Trust Region Ratio Distillation (TRRD), replaces the teacher-student KL regularizer with a PPO/GRPO-style likelihood-ratio objective anchored to a teacher--old-policy mixture, yielding advantage-aware, trust-region-bounded distillation on student rollouts and naturally balancing exploration, exploitation, and imitation. Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.
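The TRRD objective can be sketched per token as a PPO-style clipped surrogate whose ratio is taken against a teacher/old-policy mixture rather than the old policy alone. The function below is our own simplification; the mixture weight and clip range are illustrative, not the paper's values.

```python
def trrd_term(p_student, p_teacher, p_old, advantage, mix=0.5, clip=0.2):
    """Toy per-token TRRD-style objective (illustrative): a PPO-like clipped
    ratio whose anchor is a mixture of teacher and old-policy probabilities,
    so imitation only pulls the student where the advantage says it helps."""
    anchor = mix * p_teacher + (1.0 - mix) * p_old
    ratio = p_student / anchor
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    # Pessimistic PPO-style surrogate: take the worse of the two terms.
    return min(ratio * advantage, clipped * advantage)
```

When the student matches the anchor the term reduces to the raw advantage; large ratios are clipped, bounding the update in a trust region.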
【21】Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns
标题:通过丘脑路由的皮质柱在语言模型中实现高效持续学习
链接:https://arxiv.org/abs/2602.22479
作者:Afshin Khadangi
摘要:持续学习是部署语言模型的核心要求,但标准训练和微调管道在非平稳数据下仍然很脆弱。在线更新通常会导致灾难性遗忘,而提高稳定性的方法通常会增加延迟、内存占用或密集计算,这些方法无法很好地扩展到长上下文。我们引入了TRC$^{2}$(丘脑路由的皮层列),这是一个在架构层面解决持续学习问题的仅解码器主干。TRC$^{2}$将皮质柱上的稀疏丘脑路由与调制、预测、记忆和反馈机制结合在一起,并结合了快速校正通路,该通路支持快速适应,而不会使较慢的参数不稳定。产生的块是稀疏的和块并行的,从而实现有效的训练和推理,同时保持每个子系统的干净消融。我们实现了一个可复现的训练和评估堆栈,以及一个在流式域偏移下测量代理遗忘的持续学习评测框架。在语言建模和持续学习基准测试中,TRC$^{2}$在可比计算量下改进了稳定性-可塑性权衡,实现了快速的流上适应,同时保留了先前获得的行为。
摘要:Continual learning is a core requirement for deployed language models, yet standard training and fine-tuning pipelines remain brittle under non-stationary data. Online updates often induce catastrophic forgetting, while methods that improve stability frequently increase latency, memory footprint, or dense computation in ways that do not scale well to long contexts. We introduce TRC$^{2}$ (Thalamically Routed Cortical Columns), a decoder-only backbone that addresses continual learning at the architectural level. TRC$^{2}$ combines sparse thalamic routing over cortical columns with mechanisms for modulation, prediction, memory, and feedback, together with a fast corrective pathway that supports rapid adaptation without destabilizing slower parameters. The resulting block is sparse and chunk-parallel, enabling efficient training and inference while preserving clean ablations of each subsystem. We instantiate a reproducible training and evaluation stack and a continual-learning harness that measures proxy forgetting under streaming domain shifts. Across language modeling and continual learning benchmarks, TRC$^{2}$ improves the stability-plasticity tradeoff at comparable compute, enabling rapid on-stream adaptation while preserving previously acquired behavior.
【22】Causality $\neq$ Invariance: Function and Concept Vectors in LLMs
标题:因果关系$\neq$不变性:LLM中的功能向量与概念向量
链接:https://arxiv.org/abs/2602.22424
作者:Gustaw Opiełka,Hannes Rosenbusch,Claire E. Stevenson
摘要:大型语言模型(LLM)是否抽象地表示概念,即独立于输入格式?我们重新审视功能向量(FV),即因果驱动任务性能的上下文学习(ICL)任务的紧凑表示。在多个LLM中,我们证明了FV并非完全不变:从不同的输入格式(例如,开放式与多项选择)中提取的FV几乎相互正交,即使两者针对相同的概念。我们识别出概念向量(CV),它们携带更稳定的概念表示。与FV一样,CV由注意力头输出组成;然而,与FV不同的是,其组成头是使用表征相似性分析(RSA)根据它们是否在不同输入格式中一致地编码概念来选择的。虽然这些头出现在与FV相关头相似的层中,但这两组头在很大程度上是不同的,这表明其底层机制不同。转向(steering)实验表明,当提取和应用格式匹配时(例如,两者都是英语开放式),FV在分布内表现出色,而CV在问题类型(开放式与多项选择)和语言两方面都具有更好的分布外泛化能力。我们的结果表明,LLM确实包含抽象概念表示,但这些表示与驱动ICL性能的表示不同。
摘要:Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show that FVs are not fully invariant: FVs are nearly orthogonal when extracted from different input formats (e.g., open-ended vs. multiple-choice), even if both target the same concept. We identify Concept Vectors (CVs), which carry more stable concept representations. Like FVs, CVs are composed of attention head outputs; however, unlike FVs, the constituent heads are selected using Representational Similarity Analysis (RSA) based on whether they encode concepts consistently across input formats. While these heads emerge in similar layers to FV-related heads, the two sets are largely distinct, suggesting different underlying mechanisms. Steering experiments reveal that FVs excel in-distribution, when extraction and application formats match (e.g., both open-ended in English), while CVs generalize better out-of-distribution across both question types (open-ended vs. multiple-choice) and languages. Our results show that LLMs do contain abstract concept representations, but these differ from those that drive ICL performance.
【23】Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
标题:大型语言模型中的结构和冗余:随机矩阵理论的谱研究
链接:https://arxiv.org/abs/2602.22345
作者:Davide Ettori
备注:Executive Summary of Master Thesis in Computer Science Engineering, Politecnico di Milano
摘要:本文通过基于谱几何和随机矩阵理论(RMT)的统一框架,解决了现代深度学习中两个持续存在且密切相关的挑战:可靠性和效率。随着深度网络和大型语言模型的不断扩展,它们的内部行为变得越来越不透明,导致幻觉,分布变化下的脆弱泛化,以及不断增长的计算和能源需求。通过分析各层和输入中隐藏激活的特征值动态,这项工作表明,谱统计为模型行为提供了一个紧凑、稳定和可解释的镜头,能够将结构化的因果表示与噪声主导的变化分离开来。在这个框架内,第一个贡献,EigenTrack,介绍了一个实时的方法来检测幻觉和大型语言和视觉语言模型的分布行为。EigenTrack将流式激活转换为频谱描述符,例如熵,方差和与Marchenko-Pastur基线的偏差,并使用轻量级递归分类器对其时间演变进行建模,从而在模型输出中出现可靠性故障之前对其进行早期检测,同时提供对表示动态的可解释见解。第二个贡献,RMT-KD,提出了一种通过随机矩阵理论知识蒸馏来压缩深度网络的原则性方法。通过将激活谱中的离群特征值解释为任务相关信息的载体,RMT-KD通过迭代自蒸馏逐步将网络投影到低维子空间上,从而产生更加紧凑和节能的模型,同时保持准确性和密集,硬件友好的结构。
摘要:This thesis addresses two persistent and closely related challenges in modern deep learning, reliability and efficiency, through a unified framework grounded in Spectral Geometry and Random Matrix Theory (RMT). As deep networks and large language models continue to scale, their internal behavior becomes increasingly opaque, leading to hallucinations, fragile generalization under distribution shift, and growing computational and energy demands. By analyzing the eigenvalue dynamics of hidden activations across layers and inputs, this work shows that spectral statistics provide a compact, stable, and interpretable lens on model behavior, capable of separating structured, causal representations from noise-dominated variability. Within this framework, the first contribution, EigenTrack, introduces a real-time method for detecting hallucinations and out-of-distribution behavior in large language and vision-language models. EigenTrack transforms streaming activations into spectral descriptors such as entropy, variance, and deviations from the Marchenko-Pastur baseline, and models their temporal evolution using lightweight recurrent classifiers, enabling early detection of reliability failures before they appear in model outputs while offering interpretable insight into representation dynamics. The second contribution, RMT-KD, presents a principled approach to compressing deep networks via random matrix theoretic knowledge distillation. By interpreting outlier eigenvalues in activation spectra as carriers of task-relevant information, RMT-KD progressively projects networks onto lower-dimensional subspaces through iterative self-distillation, yielding significantly more compact and energy-efficient models while preserving accuracy and dense, hardware-friendly structure.
【24】Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads
标题:解码挂钩:一个多模态的LLM框架,用于分析视频广告的挂钩期
链接:https://arxiv.org/abs/2602.22299
作者:Kunpeng Zhang,Poppy Zhang,Shawndra Hill,Amel Awadelkarim
备注:11 pages, 5 figures, 3 tables
摘要:基于视频的广告是品牌吸引消费者的重要媒介,社交媒体平台利用用户数据优化广告投放并提高参与度。一个关键但未被充分开发的方面是"挂钩期",即吸引观众注意力并影响参与度的前三秒。由于视频内容的多模态性质,混合了视觉、听觉和文本元素,因此分析这个简短的窗口具有挑战性。传统方法往往忽略了这些组件之间微妙的相互作用,需要先进的框架进行彻底的评估。 本研究提出一个框架,使用基于transformer的多模态大语言模型(MLLM)来分析视频广告的挂钩期。它测试了两种帧采样策略,均匀随机采样和关键帧选择,以确保平衡和代表性的声学特征提取,捕捉全方位的设计元素。挂钩视频由最先进的MLLM处理,以生成对广告初始影响的描述性分析,这些分析使用BERTopic进行高级抽象,提炼成连贯的主题。该框架还集成了音频属性和聚合广告定位信息等功能,丰富了进一步分析的功能集。 来自社交媒体平台的大规模真实数据的实证验证证明了我们框架的有效性,揭示了挂钩期特征与关键绩效指标(如每笔投资的转化率)之间的相关性。结果突出了该方法的实用性和预测能力,为优化视频广告策略提供了有价值的见解。这项研究通过提供一种可扩展的方法来理解和增强视频广告的初始时刻,从而推进了视频广告分析。
摘要:Video-based ads are a vital medium for brands to engage consumers, with social media platforms leveraging user data to optimize ad delivery and boost engagement. A crucial but under-explored aspect is the 'hooking period', the first three seconds that capture viewer attention and influence engagement metrics. Analyzing this brief window is challenging due to the multimodal nature of video content, which blends visual, auditory, and textual elements. Traditional methods often miss the nuanced interplay of these components, requiring advanced frameworks for thorough evaluation. This study presents a framework using transformer-based multimodal large language models (MLLMs) to analyze the hooking period of video ads. It tests two frame sampling strategies, uniform random sampling and key frame selection, to ensure balanced and representative acoustic feature extraction, capturing the full range of design elements. The hooking video is processed by state-of-the-art MLLMs to generate descriptive analyses of the ad's initial impact, which are distilled into coherent topics using BERTopic for high-level abstraction. The framework also integrates features such as audio attributes and aggregated ad targeting information, enriching the feature set for further analysis. Empirical validation on large-scale real-world data from social media platforms demonstrates the efficacy of our framework, revealing correlations between hooking period features and key performance metrics like conversion per investment. The results highlight the practical applicability and predictive power of the approach, offering valuable insights for optimizing video ad strategies. This study advances video ad analysis by providing a scalable methodology for understanding and enhancing the initial moments of video advertisements.
【25】UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
标题:UpSkill:LLM结构化响应多样性的互信息技能学习
链接:https://arxiv.org/abs/2602.22296
作者:Devan Shah,Owen Yang,Daniel Yang,Chongyi Zheng,Benjamin Eysenbach
备注:First two authors equal contribution. 29 pages total (11 pages main text), 10 figures, 10 tables. Project website: https://dshah.io/upskill/
摘要:带有可验证奖励的强化学习(RLVR)提高了大型语言模型(LLM)在数学和编程任务上的推理能力,但优化单次尝试准确性的标准方法可能会无意中抑制重复尝试间的响应多样性,缩小探索范围并忽视代表性不足的策略。我们引入了UpSkill,这是一种将互信息技能学习(MISL)适配到LLM以优化pass@k正确性的训练时方法。我们提出了一种在组相对策略优化(GRPO)中实现的新奖励:令牌级互信息(MI)奖励,鼓励轨迹对技能变量z的特异性。在GSM8K上使用三种开放权重模型Llama 3.1-8B、Qwen 2.5-7B和R1-Distilled-Qwen2.5-Math-1.5B进行的实验表明,UpSkill改进了更强基础模型上的多次尝试指标,Qwen和Llama在pass@k中的平均增益约为3%,而不会降低pass@1。此外,我们发现的经验和理论证据表明,pass@k的改进与互信息目标密切相关。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large language models (LLMs) on mathematics and programming tasks, but standard approaches that optimize single-attempt accuracy can inadvertently suppress response diversity across repeated attempts, narrowing exploration and overlooking underrepresented strategies. We introduce UpSkill, a training time method that adapts Mutual Information Skill Learning (MISL) to LLMs for optimizing pass@k correctness. We propose a novel reward that we implement within Group Relative Policy Optimization (GRPO): a token-level mutual information (MI) reward that encourages trajectory specificity to z. Experiments on GSM8K with three open-weight models, Llama 3.1-8B, Qwen 2.5-7B, and R1-Distilled-Qwen2.5-Math-1.5B, show that UpSkill improves multi-attempt metrics on the stronger base models, yielding mean gains of ~3% in pass@k for both Qwen and Llama without degrading pass@1. Additionally, we find both empirical and theoretical evidence that improvements in pass@k are closely tied to the mutual information objective.
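A minimal sketch of a mutual-information skill reward, assuming a DIAYN-style variational lower bound with a uniform prior over skills. This is our reading of the MISL family of objectives, not the paper's exact token-level formulation.

```python
import math

def mi_skill_rewards(disc_probs, num_skills):
    """Toy MI-style skill rewards (illustrative, in the spirit of MISL/DIAYN):
    for each trajectory, reward = log q(z | tau) - log p(z) with a uniform
    prior p(z) = 1/num_skills. disc_probs[i] is a discriminator's probability
    of the skill that actually generated trajectory i."""
    log_prior = math.log(1.0 / num_skills)
    return [math.log(p) - log_prior for p in disc_probs]
```

A trajectory whose skill is perfectly identifiable earns log(num_skills); one indistinguishable from chance earns zero, so the reward pushes rollouts under different skills apart.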
【26】Manifold of Failure: Behavioral Attraction Basins in Language Models
标题:失败的多种形式:语言模型中的行为吸引盆地
链接:https://arxiv.org/abs/2602.22291
作者:Sarthak Munshi,Manish Bhatt,Vineeth Sai Narajala,Idan Habler,Ammar Al-Kahfah,Ken Huang,Blake Gatto
摘要:虽然之前的工作主要集中在将对抗性示例投射回自然数据的流形上以恢复安全性,但我们认为,要全面了解人工智能的安全性,需要对不安全区域本身进行表征。本文介绍了一个系统地映射大型语言模型(LLM)中失败流形的框架。我们将漏洞搜索重新表述为质量多样性问题,使用MAP-Elites照亮这些故障区域的连续拓扑结构,我们称之为行为吸引盆地。我们的质量指标对齐偏差(Alignment Deviation)可指导搜索模型行为与预期对齐偏离最大的区域。在三个LLM(Llama-3-8B、GPT-OSS-20B和GPT-5-Mini)上,我们表明MAP-Elites实现了高达63%的行为覆盖率,发现了多达370个不同的漏洞利基,并揭示了截然不同的模型特定拓扑签名:Llama-3-8B表现出近乎普遍的脆弱性平台(平均对齐偏差0.93),GPT-OSS-20B显示了具有空间集中盆地的破碎景观(平均值0.73),GPT-5-Mini显示了强大的鲁棒性,上限为0.50。我们的方法生成了每个模型安全景观的可解释的全局地图,这些地图是现有攻击方法(GCG、PAIR或TAP)无法提供的,将范式从寻找离散故障转移到理解其底层结构。
摘要:While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a comprehensive understanding of AI safety requires characterizing the unsafe regions themselves. This paper introduces a framework for systematically mapping the Manifold of Failure in Large Language Models (LLMs). We reframe the search for vulnerabilities as a quality diversity problem, using MAP-Elites to illuminate the continuous topology of these failure regions, which we term behavioral attraction basins. Our quality metric, Alignment Deviation, guides the search towards areas where the model's behavior diverges most from its intended alignment. Across three LLMs: Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini, we show that MAP-Elites achieves up to 63% behavioral coverage, discovers up to 370 distinct vulnerability niches, and reveals dramatically different model-specific topological signatures: Llama-3-8B exhibits a near-universal vulnerability plateau (mean Alignment Deviation 0.93), GPT-OSS-20B shows a fragmented landscape with spatially concentrated basins (mean 0.73), and GPT-5-Mini demonstrates strong robustness with a ceiling at 0.50. Our approach produces interpretable, global maps of each model's safety landscape that no existing attack method (GCG, PAIR, or TAP) can provide, shifting the paradigm from finding discrete failures to understanding their underlying structure.
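The quality-diversity search can be illustrated with a minimal MAP-Elites loop over a 1-D behavior descriptor. Everything here (the toy search space, the niche binning, the quality function) is our own simplification; in the paper the quality score would be Alignment Deviation and the solutions would be adversarial prompts.

```python
import random

def map_elites(evaluate, descriptor, sample, mutate, bins, iters, seed=0):
    """Minimal MAP-Elites loop (illustrative): keep, per behavioral niche,
    the highest-quality solution found so far.

    evaluate(x)   -> quality score (stand-in for Alignment Deviation)
    descriptor(x) -> behavior descriptor in [0, 1)
    sample()      -> random solution; mutate(x) -> perturbed solution
    """
    rng = random.Random(seed)
    archive = {}  # niche index -> (quality, solution)
    for _ in range(iters):
        # Pick a random elite to mutate, or sample fresh if archive is empty.
        x = mutate(rng.choice([s for _, s in archive.values()])) if archive else sample()
        niche = min(bins - 1, int(descriptor(x) * bins))
        q = evaluate(x)
        if niche not in archive or q > archive[niche][0]:
            archive[niche] = (q, x)
    return archive
```

The returned archive is the "illuminated" map: one elite per behavioral niche, so coverage and per-niche quality can be read off directly.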
【27】BrepCoder: A Unified Multimodal Large Language Model for Multi-task B-rep Reasoning
标题:BrepCoder:用于多任务B-rep推理的统一多模式大型语言模型
链接:https://arxiv.org/abs/2602.22284
作者:Mingi Kim,Yongjun Kim,Jungwoo Kang,Hyungki Kim
摘要:深度学习的最新进展积极地解决了计算机辅助设计(CAD)领域的复杂挑战。然而,大多数现有方法依赖于需要针对新任务进行结构修改的特定任务模型,并且它们主要关注点云或图像,而不是行业标准的边界表示(B-rep)格式。为了解决这些限制,我们提出了BrepCoder,一个从B-rep输入执行多样化CAD任务的统一多模态大型语言模型(MLLM)。通过利用大型语言模型(LLM)的代码生成能力,我们将CAD建模序列转换为类Python代码,并将其与B-rep对齐。然后,我们采用两阶段的训练策略:首先,对逆向工程进行预训练,以学习几何特征和设计逻辑;其次,有效地将模型扩展到各种下游任务,如补全、纠错和CAD-QA。因此,通过将B-rep解释为结构化代码,BrepCoder在不同的任务中实现了卓越的泛化,展示了其作为通用CAD代理的潜力。
摘要:Recent advancements in deep learning have actively addressed complex challenges within the Computer-Aided Design (CAD) domain. However, most existing approaches rely on task-specific models requiring structural modifications for new tasks, and they predominantly focus on point clouds or images rather than the industry-standard Boundary Representation (B-rep) format. To address these limitations, we propose BrepCoder, a unified Multimodal Large Language Model (MLLM) that performs diverse CAD tasks from B-rep inputs. By leveraging the code generation capabilities of Large Language Models (LLMs), we convert CAD modeling sequences into Python-like code and align them with B-rep. We then adopt a two-stage training strategy: First, pre-training on reverse engineering to learn geometric features and design logic. Second, effectively extending the model to various downstream tasks such as completion, error correction, and CAD-QA. Consequently, by interpreting B-rep as structural code, BrepCoder achieves superior generalization across diverse tasks, demonstrating its potential as a general-purpose CAD agent.
【28】Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion
标题:使用投票融合集成机器学习集成和大型语言模型用于心脏病预测
链接:https://arxiv.org/abs/2602.22280
作者:Md. Tahsin Amin,Tanim Ahmmod,Zannatul Ferdus,Talukder Naemul Hasan Naem,Ehsanul Ferdous,Arpita Bhattacharjee,Ishmam Ahmed Solaiman,Nahiyan Bin Noor
备注:7 pages, 8 figures, (Accepted at a peer-reviewed conference)
摘要:心血管疾病是全球死亡的主要原因,需要早期识别、精确的风险分类和可靠的决策支持技术。尽管机器学习(ML)算法,特别是Random Forest、XGBoost、LightGBM和CatBoost等集成方法,擅长建模复杂的非线性患者数据并通常优于逻辑回归,大型语言模型(LLM)的出现仍提供了新的zero-shot和Few-Shot推理能力。该研究使用1,190例患者记录的合并数据集预测心血管疾病,通过OpenRouter API将传统机器学习模型(95.78%准确率,ROC-AUC 0.96)与开源大型语言模型进行比较。最后,在Gemini 2.5 Flash下,ML集成和LLM推理的混合融合实现了最佳结果(96.62%的准确率,0.97 AUC),表明LLM(78.9%的准确率)在与ML模型结合而不是单独使用时效果最佳。结果表明,ML集成实现了最高的性能(95.78%的准确度,ROC-AUC 0.96),而LLM在zero-shot(78.9%)中表现中等,在Few-Shot(72.6%)设置中略好。所提出的混合方法增强了不确定情况下的稳健性,说明集成ML仍是结构化表格预测的最佳选择,但将其与混合ML-LLM系统集成可以带来小幅提升,并为更可靠的临床决策支持工具开辟道路。
摘要:Cardiovascular disease is the primary cause of death globally, necessitating early identification, precise risk classification, and dependable decision-support technologies. The advent of large language models (LLMs) provides new zero-shot and few-shot reasoning capabilities, even though machine learning (ML) algorithms, especially ensemble approaches like Random Forest, XGBoost, LightGBM, and CatBoost, are excellent at modeling complex, non-linear patient data and routinely beat logistic regression. This research predicts cardiovascular disease using a merged dataset of 1,190 patient records, comparing traditional machine learning models (95.78% accuracy, ROC-AUC 0.96) with open-source large language models via OpenRouter APIs. Finally, a hybrid fusion of the ML ensemble and LLM reasoning under Gemini 2.5 Flash achieved the best results (96.62% accuracy, 0.97 AUC), showing that LLMs (78.9 % accuracy) work best when combined with ML models rather than used alone. Results show that ML ensembles achieved the highest performance (95.78% accuracy, ROC-AUC 0.96), while LLMs performed moderately in zero-shot (78.9%) and slightly better in few-shot (72.6%) settings. The proposed hybrid method enhanced the strength in uncertain situations, illustrating that ensemble ML is considered the best structured tabular prediction case, but it can be integrated with hybrid ML-LLM systems to provide a minor increase and open the way to more reliable clinical decision-support tools.
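The hybrid voting fusion can be sketched as a weighted soft vote between the ML ensemble's disease probability and the LLM's binary judgment. The weight below is a hypothetical choice of ours, not the paper's tuned value.

```python
def fuse_predictions(ml_prob, llm_vote, ml_weight=0.7):
    """Toy soft-voting fusion (illustrative): blend an ML ensemble's disease
    probability with a binary LLM vote, weighting the ensemble more heavily
    since it is the stronger standalone predictor in the paper's results.

    Returns (predicted label, fused score)."""
    score = ml_weight * ml_prob + (1.0 - ml_weight) * float(llm_vote)
    return (1 if score >= 0.5 else 0), score
```

Near the decision boundary the LLM vote can tip a borderline ensemble probability either way, which is where the paper reports the hybrid's small gain.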
【29】Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
标题:支持代币、稳定性边际和稳健LLM的新基础
链接:https://arxiv.org/abs/2602.22271
作者:Deepak Agarwal,Dhyey Dharmendrakumar Mavani,Suyash Gupta,Karthik Sethuraman,Tejas Dharamsi
备注:39 pages, 6 figures
摘要:自我注意通常被描述为一种灵活的、内容自适应的方式,将令牌与来自其过去的信息混合。我们在概率框架内重新解释因果自注意力Transformer(现代基础模型的骨干),就像经典PCA被扩展为概率PCA那样。然而,这种重新表述揭示了一个令人惊讶且更深层的结构洞察:由于变量替换现象,自注意力参数上出现了一个障碍约束。这在令牌空间上导致了高度结构化的几何,为LLM解码的动态提供了理论见解。这揭示了注意力变得病态的边界,得出类似于经典支持向量机的间隔解释。就像支持向量一样,这自然产生了支持令牌的概念。 此外,我们表明,LLM可以被解释为令牌空间幂集上的随机过程,为序列建模提供了严格的概率框架。我们提出了一个贝叶斯框架,并推导出一个MAP估计目标,它只需要对标准LLM训练进行最小的修改:在通常的交叉熵损失中增加一个平滑的对数障碍惩罚。我们证明,这提供了更鲁棒的模型,而不牺牲样本外的准确性,并且在实践中易于采用。
摘要:Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much like how classical PCA is extended to probabilistic PCA. However, this re-formulation reveals a surprising and deeper structural insight: due to a change-of-variables phenomenon, a barrier constraint emerges on the self-attention parameters. This induces a highly structured geometry on the token space, providing theoretical insights into the dynamics of LLM decoding. This reveals a boundary where attention becomes ill-conditioned, leading to a margin interpretation similar to classical support vector machines. Just like support vectors, this naturally gives rise to the concept of support tokens. Furthermore, we show that LLMs can be interpreted as a stochastic process over the power set of the token space, providing a rigorous probabilistic framework for sequence modeling. We propose a Bayesian framework and derive a MAP estimation objective that requires only a minimal modification to standard LLM training: the addition of a smooth log-barrier penalty to the usual cross-entropy loss. We demonstrate that this provides more robust models without sacrificing out-of-sample accuracy and that it is straightforward to incorporate in practice.
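The proposed training modification, cross-entropy plus a smooth log-barrier, can be sketched on a toy output distribution. Note the paper places the barrier on self-attention parameters; purely for illustration we apply a log-barrier to class probabilities, and the coefficient and margin are invented.

```python
import math

def barrier_loss(probs, target, barrier_coef=0.01, margin=1e-3):
    """Toy training objective (illustrative only): cross-entropy plus a
    smooth log-barrier that grows as any class probability approaches 0,
    keeping the model away from the ill-conditioned boundary. The paper's
    barrier acts on attention parameters, not on output probabilities."""
    ce = -math.log(probs[target])
    barrier = -barrier_coef * sum(math.log(p + margin) for p in probs)
    return ce + barrier
```

The barrier term is always positive and rises sharply as any probability collapses toward zero, so the total loss exceeds plain cross-entropy and penalizes degenerate distributions.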
【30】AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning
标题:AutoQRA:混合精度量化和低秩适配器的联合优化,用于有效的LLM微调
链接:https://arxiv.org/abs/2602.22268
作者:Changhai Zhou,Shiyang Zhang,Yuhua Zhou,Qian Qiao,Jun Gao,Cheng Jin,Kaizhou Qin,Weizhong Zhang
备注:15 pages, 10 figures
摘要:量化之后的参数高效微调已经成为严格GPU内存限制下下游适应的一种有前途的范式。然而,这种顺序流水线未能利用量化位宽和LoRA秩之间的复杂相互作用。具体地,具有低量化误差的仔细优化的量化分配并不总是转化为强的微调性能,并且不同的位宽和秩配置可以在相同的内存预算下导致显著不同的结果。为了解决这一限制,我们提出了AutoQRA,这是一种在混合量化微调过程中同时优化每层位宽和LoRA秩配置的联合优化框架。为了应对大型离散搜索空间和频繁微调迭代带来的高评估成本所构成的挑战,AutoQRA将优化过程分解为两个阶段。首先,它进行全局多保真度进化搜索,其中初始种群通过注入逐层重要性先验来热启动;该阶段采用特定的算子和性能模型来有效地筛选候选配置。其次,信赖域贝叶斯优化被应用于局部细化搜索空间中有希望的区域,并确定给定内存预算下的最佳配置。这种方法能够在训练期间对特定层中的量化噪声进行主动补偿。实验表明,AutoQRA以与统一4位方法相当的内存占用,实现了接近全精度微调的性能。
摘要:Quantization followed by parameter-efficient fine-tuning has emerged as a promising paradigm for downstream adaptation under tight GPU memory constraints. However, this sequential pipeline fails to leverage the intricate interaction between quantization bit-width and LoRA rank. Specifically, a carefully optimized quantization allocation with low quantization error does not always translate to strong fine-tuning performance, and different bit-width and rank configurations can lead to significantly varying outcomes under the same memory budget. To address this limitation, we propose AutoQRA, a joint optimization framework that simultaneously optimizes the bit-width and LoRA rank configuration for each layer during the mixed quantized fine-tuning process. To tackle the challenges posed by the large discrete search space and the high evaluation cost associated with frequent fine-tuning iterations, AutoQRA decomposes the optimization process into two stages. First, it first conducts a global multi-fidelity evolutionary search, where the initial population is warm-started by injecting layer-wise importance priors. This stage employs specific operators and a performance model to efficiently screen candidate configurations. Second, trust-region Bayesian optimization is applied to locally refine promising regions of the search space and identify optimal configurations under the given memory budget. This approach enables active compensation for quantization noise in specific layers during training. Experiments show that AutoQRA achieves performance close to full-precision fine-tuning with a memory footprint comparable to uniform 4-bit methods.
【31】Sustainable LLM Inference using Context-Aware Model Switching
标题:使用上下文感知模型切换的可持续LLM推理
链接:https://arxiv.org/abs/2602.22261
作者:Yuvarani,Akashdeep Singh,Zahra Fathanah,Salsabila Harlen,Syeikha Syafura Al-Zahra binti Zahari,Hema Subramaniam
备注:15 pages, 6 figures
摘要:大型语言模型已成为许多人工智能应用的核心,但其不断增长的能源消耗引发了严重的可持续性问题。当前人工智能部署的一个关键限制是依赖于一刀切的推理策略:大多数系统将每个请求路由到同一个大型模型,而不管任务的复杂性如何,这导致了大量且不必要的能源浪费。为了解决这个问题,我们提出了一种上下文感知的模型切换方法,基于查询复杂度动态选择合适的语言模型。该系统采用面向节能LLM推理的上下文感知模型切换,结合了重复查询缓存、用于快速且可解释决策的基于规则的复杂性评分、捕捉语义意图的机器学习分类,以及随时间从交互模式中学习的用户自适应组件。使用真实会话工作负载和三种具有不同计算成本的开源语言模型(Gemma3 1B、Gemma3 4B和Qwen3 4B),测量能耗(通过NVML GPU功率遥测)、响应延迟、路由准确性和输出质量(BERTScore F1)来评估所提出的架构,以反映真实世界的使用条件。实验结果表明,与始终使用最大模型相比,模型切换方法可以减少高达67.5%的能耗,同时保持93.6%的响应质量。此外,简单查询的响应时间也显著缩短了约68%。这些结果表明,模型切换推理为实现更节能和可持续的AI系统提供了一条实用且可扩展的道路,表明可以在不显著牺牲响应质量的情况下实现显著的效率提升。
摘要:Large language models have become central to many AI applications, but their growing energy consumption raises serious sustainability concerns. A key limitation in current AI deployments is the reliance on a one-size-fits-all inference strategy where most systems route every request to the same large model, regardless of task complexity, leading to substantial and unnecessary energy waste. To address this issue, we propose a context-aware model switching approach that dynamically selects an appropriate language model based on query complexity. The proposed system uses a Context-Aware Model Switching for Energy-Efficient LLM Inference that combines caching for repeated queries, rulebased complexity scoring for fast and explainable decisions, machine learning classification to capture semantic intent, and a user-adaptive component that learns from interaction patterns over time. The proposed architecture was evaluated using real conversation workloads and three open-source language models (Gemma3 1B, Gemma3 4B and Qwen3 4B) with different computational costs, measuring energy consumption (via NVML GPU power telemetry), response latency, routing accuracy, and output quality (BERTScore F1) to reflect real-world usage conditions. Experimental results show that the model switching approach can reduce energy consumption by up to 67.5% compared to always using the largest model while maintaining a response quality of 93.6%. In addition, the response time for simple queries also improved significantly by approximately 68%. These results show that model switching inference offers a practical and scalable path toward more energy-efficient and sustainable AI systems, demonstrating that significant efficiency gains can be achieved without major sacrifices in response quality.
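The rule-based complexity scoring plus caching can be sketched as a small routing function. The model names follow the paper's setup (Gemma3 1B, Gemma3 4B, Qwen3 4B); the cache lookup and scoring rules are our own simplified guesses, not the system's actual logic.

```python
def route_query(query, cache):
    """Toy rule-based router (illustrative): repeated queries hit the cache,
    short factual queries go to the smallest model, and longer or
    reasoning-style queries escalate to a larger one."""
    key = query.strip().lower()
    if key in cache:
        return "cache", cache[key]
    score = 0
    score += len(key.split()) // 10              # longer query -> harder
    if any(w in key for w in ("why", "prove", "derive", "compare")):
        score += 2                               # reasoning cues
    if "```" in query or "code" in key:
        score += 1                               # code-generation cue
    model = "Gemma3 1B" if score == 0 else ("Gemma3 4B" if score <= 2 else "Qwen3 4B")
    return model, None
```

Cache hits bypass inference entirely, which is where the largest energy savings for repeated queries would come from in this sketch.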
【32】Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models
Link: https://arxiv.org/abs/2602.22246
Authors: Guangnian Wan, Qi Li, Gongfan Fang, Xinyin Ma, Xinchao Wang
Abstract: Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their vulnerability to backdoor attacks remains largely unexplored. In this work, we show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs, enabling attackers to manipulate model behavior via specific triggers while maintaining normal performance on clean inputs. However, defense strategies effective against these models have yet to emerge. To bridge this gap, we introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification). DiSP is driven by a key observation: selectively masking certain vision tokens at inference time can neutralize a backdoored model's trigger-induced behaviors and restore normal functionality. Building on this, we purify the poisoned dataset using the compromised model itself, then fine-tune the model on the purified data to recover a clean model. Given this design, DiSP can remove backdoors without requiring any auxiliary models or clean reference data. Extensive experiments demonstrate that our approach effectively mitigates backdoor effects, reducing the attack success rate (ASR) from over 90% to typically under 5%, while maintaining model performance on benign tasks.
【33】Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Link: https://arxiv.org/abs/2602.21585
Authors: Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu, John Bowlan, Michal Kucer, David Blei
Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete output space. Existing methods use a calibrated scalar evaluator for the target objective to guide search, but for many tasks such scores are unavailable, too sparse, or unreliable. Pairwise comparisons, by contrast, are often easier to elicit, still provide useful signal on improvement directions, and can be obtained from the LLM itself without external supervision. Building on this observation, we introduce Duel-Evolve, an evolutionary optimization algorithm that replaces external scalar rewards with pairwise preferences elicited from the same LLM used to generate candidates. Duel-Evolve aggregates these noisy candidate comparisons via a Bayesian Bradley-Terry model, yielding uncertainty-aware estimates of candidate quality. These quality estimates guide allocation of the comparison budget toward plausible optima using Double Thompson Sampling, as well as selection of high-quality parents to generate improved candidates. We evaluate Duel-Evolve on MathBench, where it achieves 20 percentage points higher accuracy than existing methods and baselines, and on LiveCodeBench, where it improves over comparable iterative methods by more than 12 percentage points. Notably, the method requires no reward model, no ground-truth labels during search, and no hand-crafted scoring function. The results show that pairwise self-preferences provide a strong optimization signal for test-time improvement over large, discrete output spaces.
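The core scoring step above, turning noisy pairwise wins into per-candidate strengths with a Bradley-Terry model, can be sketched with the classic MM (Zermelo) iteration. The paper uses a Bayesian variant; the maximum-likelihood update and the toy win matrix below are illustrative assumptions.

```python
# Sketch of Bradley-Terry aggregation of pairwise comparisons via the
# MM/Zermelo iteration (MLE, not the paper's Bayesian version; toy data).
def bradley_terry(wins, n_items, iters=200):
    """wins[i][j] = number of times candidate i beat candidate j."""
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            w_i = sum(wins[i][j] for j in range(n_items) if j != i)   # total wins of i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])     # games vs j, weighted
                        for j in range(n_items) if j != i)
            new_p.append(w_i / denom if denom > 0 else p[i])
        s = sum(new_p)
        p = [x / s * n_items for x in new_p]   # normalize for identifiability
    return p

# Candidate 0 usually beats 1, which usually beats 2.
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scores = bradley_terry(wins, 3)
print(scores)   # strengths should be ordered 0 > 1 > 2
```

A Bayesian treatment would replace the point estimate with a posterior over `p`, giving the uncertainty estimates that Double Thompson Sampling consumes.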
【34】CrossLLM-Mamba: Multimodal State Space Fusion of LLMs for RNA Interaction Prediction
Link: https://arxiv.org/abs/2602.22236
Authors: Rabeya Tus Sadia, Qiang Ye, Qiang Cheng
Abstract: Accurate prediction of RNA-associated interactions is essential for understanding cellular regulation and advancing drug discovery. While Biological Large Language Models (BioLLMs) such as ESM-2 and RiNALMo provide powerful sequence representations, existing methods rely on static fusion strategies that fail to capture the dynamic, context-dependent nature of molecular binding. We introduce CrossLLM-Mamba, a novel framework that reformulates interaction prediction as a state-space alignment problem. By leveraging bidirectional Mamba encoders, our approach enables deep "crosstalk" between modality-specific embeddings through hidden state propagation, modeling interactions as dynamic sequence transitions rather than static feature overlaps. The framework maintains linear computational complexity, making it scalable to high-dimensional BioLLM embeddings. We further incorporate Gaussian noise injection and Focal Loss to enhance robustness against hard-negative samples. Comprehensive experiments across three interaction categories (RNA-protein, RNA-small molecule, and RNA-RNA) demonstrate that CrossLLM-Mamba achieves state-of-the-art performance. On the RPI1460 benchmark, our model attains an MCC of 0.892, surpassing the previous best by 5.2%. For binding affinity prediction, we achieve Pearson correlations exceeding 0.95 on riboswitch and repeat RNA subtypes. These results establish state-space modeling as a powerful paradigm for multi-modal biological interaction prediction.
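The Focal Loss mentioned above (Lin et al.) down-weights easy examples so training focuses on hard negatives. A minimal binary sketch, with gamma and the toy probabilities chosen only for illustration:

```python
# Illustrative binary focal loss: the (1 - p_t)^gamma factor suppresses
# the loss of well-classified examples; gamma = 2 is a common default.
import math

def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Focal loss for predicted positive-class probability p and label y in {0, 1}."""
    pt = p if y == 1 else 1.0 - p        # probability assigned to the true class
    return -((1.0 - pt) ** gamma) * math.log(pt)

# A hard positive (low confidence) is penalized far more than an easy one.
hard = focal_loss(0.3, 1)
easy = focal_loss(0.95, 1)
print(hard, easy)
```

With gamma = 0 this reduces to ordinary cross-entropy; increasing gamma sharpens the focus on hard examples.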
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)
【1】DyGnROLE: Modeling Asymmetry in Dynamic Graphs with Node-Role-Oriented Latent Encoding
Link: https://arxiv.org/abs/2602.23135
Authors: Tyler Bonnet, Marek Rei
Abstract: Real-world dynamic graphs are often directed, with source and destination nodes exhibiting asymmetrical behavioral patterns and temporal dynamics. However, existing dynamic graph architectures largely rely on shared parameters for processing source and destination nodes, with limited or no systematic role-aware modeling. We propose DyGnROLE (Dynamic Graph Node-Role-Oriented Latent Encoding), a transformer-based architecture that explicitly disentangles source and destination representations. By using separate embedding vocabularies and role-semantic positional encodings, the model captures the distinct structural and temporal contexts unique to each role. Critical to the effectiveness of these specialized embeddings in low-label regimes is a self-supervised pretraining objective we introduce: Temporal Contrastive Link Prediction (TCLP). The pretraining uses the full unlabeled interaction history to encode informative structural biases, enabling the model to learn role-specific representations without requiring annotated data. Evaluation on future edge classification demonstrates that DyGnROLE substantially outperforms a diverse set of state-of-the-art baselines, establishing role-aware modeling as an effective strategy for dynamic graph learning.
【2】Learning Disease-Sensitive Latent Interaction Graphs From Noisy Cardiac Flow Measurements
Link: https://arxiv.org/abs/2602.23035
Authors: Viraj Patel, Marko Grujic, Philipp Aigner, Theodor Abart, Marcus Granegger, Deblina Bhattacharjee, Katharine Fraser
Abstract: Cardiac blood flow patterns contain rich information about disease severity and clinical interventions, yet current imaging and computational methods fail to capture the underlying relational structure of coherent flow features. We propose a physics-informed, latent relational framework to model cardiac vortices as interacting nodes in a graph. Our model combines a neural relational inference architecture with physics-inspired interaction energy and birth-death dynamics, yielding a latent graph sensitive to disease severity and intervention level. We first apply this to computational fluid dynamics simulations of aortic coarctation. Learned latent graphs reveal that as the aortic radius narrows, vortex interactions become stronger and more frequent. This leads to a higher graph entropy, correlating monotonically with coarctation severity ($R^2 = 0.78$, Spearman $|\rho| = 0.96$). We then extend this method to ultrasound datasets of left ventricles under varying levels of left ventricular assist device support. Again the latent graph representation captures the weakening of coherent vortical structures, thereby demonstrating cross-modal generalisation. Results show that latent interaction graphs and their entropy serve as robust and interpretable markers of cardiac disease and intervention.
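The entropy-as-severity-marker idea above can be sketched as Shannon entropy over normalized interaction-edge weights, rank-correlated with severity. The toy graphs and severity values below are illustrative assumptions, not the paper's data.

```python
# Sketch: graph entropy of interaction-edge weights, plus a hand-rolled
# Spearman rank correlation against severity. All values are toy data.
import math

def graph_entropy(edge_weights):
    """Shannon entropy of the normalized edge-weight distribution."""
    total = sum(edge_weights)
    probs = [w / total for w in edge_weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; fine for this toy example)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# More severe narrowing -> more, stronger vortex interactions -> higher entropy.
severity = [0.2, 0.5, 0.8]
graphs = [[5.0], [3.0, 2.0], [2.0, 2.0, 1.0, 1.0]]   # edge weights per case
entropies = [graph_entropy(g) for g in graphs]
print(entropies, spearman(severity, entropies))
```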
【3】LEDA: Latent Semantic Distribution Alignment for Multi-domain Graph Pre-training
Link: https://arxiv.org/abs/2602.22660
Authors: Lianze Shan, Jitao Zhao, Dongxiao He, Siqi Liu, Jiaxu Cui, Weixiong Zhang
Note: Accepted by WWW-26, 12 pages, 2 figures
Abstract: Recent advances in generic large models, such as GPT and DeepSeek, have motivated the introduction of universality to graph pre-training, aiming to learn rich and generalizable knowledge across diverse domains using graph representations to improve performance in various downstream applications. However, most existing methods face challenges in learning effective knowledge from generic graphs, primarily due to simplistic data alignment and limited training guidance. The issue of simplistic data alignment arises from the use of a straightforward unification for highly diverse graph data, which fails to align semantics and misleads pre-training models. The problem of limited training guidance lies in the arbitrary application of in-domain pre-training paradigms to cross-domain scenarios. While such a paradigm is effective in enhancing discriminative representations in one data space, it struggles to capture effective knowledge from many graphs. To address these challenges, we propose a novel Latent sEmantic Distribution Alignment (LEDA) model for universal graph pre-training. Specifically, we first introduce a dimension projection unit to adaptively align diverse domain features into a shared semantic space with minimal information loss. Furthermore, we design a variational semantic inference module to obtain the shared latent distribution. This distribution is then adopted to guide the domain projection, aligning it with shared semantics across domains and ensuring cross-domain semantic learning. LEDA exhibits strong performance across a broad range of graphs and downstream tasks. Remarkably, in few-shot cross-domain settings, it significantly outperforms in-domain baselines and advanced universal pre-training models.
【4】MUG: Meta-path-aware Universal Heterogeneous Graph Pre-Training
Link: https://arxiv.org/abs/2602.22645
Authors: Lianze Shan, Jitao Zhao, Dongxiao He, Yongqi Huang, Zhiyong Feng, Weixiong Zhang
Note: Accepted by AAAI-26, 9 pages, 3 figures
Abstract: Universal graph pre-training has emerged as a key paradigm in graph representation learning, offering a promising way to train encoders to learn transferable representations from unlabeled graphs and to effectively generalize across a wide range of downstream tasks. However, recent explorations in universal graph pre-training primarily focus on homogeneous graphs, and the setting remains unexplored for heterogeneous graphs, which exhibit greater structural and semantic complexity. This heterogeneity makes it highly challenging to train a universal encoder for diverse heterogeneous graphs: (i) the diverse types with dataset-specific semantics hinder the construction of a unified representation space; (ii) the number and semantics of meta-paths vary across datasets, making encoding and aggregation patterns learned from one dataset difficult to apply to others. To address these challenges, we propose a novel Meta-path-aware Universal heterogeneous Graph pre-training (MUG) approach. Specifically, for challenge (i), MUG introduces an input unification module that integrates information from multiple node and relation types within each heterogeneous graph into a unified representation. This representation is then projected into a shared space by a dimension-aware encoder, enabling alignment across graphs with diverse schemas. Furthermore, for challenge (ii), MUG trains a shared encoder to capture consistent structural patterns across diverse meta-path views rather than relying on dataset-specific aggregation strategies, while a global objective encourages discriminability and reduces dataset-specific biases. Extensive experiments demonstrate the effectiveness of MUG on real-world datasets.
【5】Persistent Nonnegative Matrix Factorization via Multi-Scale Graph Regularization
Link: https://arxiv.org/abs/2602.22536
Authors: Jichao Zhang, Ran Miao, Limin Li
Abstract: Matrix factorization techniques, especially Nonnegative Matrix Factorization (NMF), have been widely used for dimensionality reduction and interpretable data representation. However, existing NMF-based methods are inherently single-scale and fail to capture the evolution of connectivity structures across resolutions. In this work, we propose persistent nonnegative matrix factorization (pNMF), a scale-parameterized family of NMF problems that produces a sequence of persistence-aligned embeddings rather than a single one. By leveraging persistent homology, we identify a canonical minimal sufficient scale set at which the underlying connectivity undergoes qualitative changes. These canonical scales induce a sequence of graph Laplacians, leading to a coupled NMF formulation with scale-wise geometric regularization and an explicit cross-scale consistency constraint. We analyze the structural properties of the embeddings along the scale parameter and establish bounds on their increments between consecutive scales. The resulting model defines a nontrivial solution path across scales, rather than a single factorization, which poses new computational challenges. We develop a sequential alternating optimization algorithm with guaranteed convergence. Numerical experiments on synthetic and single-cell RNA sequencing datasets demonstrate the effectiveness of the proposed approach for multi-scale low-rank embeddings.
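The single-scale building block of the formulation above, NMF with a graph-Laplacian penalty, can be sketched with multiplicative updates in the style of graph-regularized NMF (Cai et al.). The matrices, rank, and regularization weight below are toy assumptions, not the paper's coupled multi-scale algorithm.

```python
# Sketch of graph-regularized NMF: minimize ||X - WH||_F^2 + lam * tr(H L H^T),
# where L = D - A is the Laplacian of a sample-affinity graph A. Toy data only.
import numpy as np

def graph_nmf(X, A, k, lam=0.1, iters=300, seed=0):
    """X: (m, n) nonnegative data; A: (n, n) affinity over samples; returns W, H."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = np.diag(A.sum(axis=1))
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        # Multiplicative updates preserve nonnegativity of W and H.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

X = np.abs(np.random.default_rng(1).random((6, 8)))   # toy nonnegative data
A = np.ones((8, 8)) - np.eye(8)                       # toy affinity: all connected
W, H = graph_nmf(X, A, k=2)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(err)   # relative reconstruction error of the rank-2 factorization
```

pNMF would solve a coupled family of such problems, one per persistent-homology scale, with an additional cross-scale consistency term.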
【6】Improving Spatial Allocation for Energy System Coupling with Graph Neural Networks
Link: https://arxiv.org/abs/2602.22249
Authors: Xuanhao Mu, Jakob Geiges, Nan Liu, Thorsten Schlachter, Veit Hagenmeyer
Abstract: In energy system analysis, coupling models with mismatched spatial resolutions is a significant challenge. A common solution is assigning weights to high-resolution geographic units for aggregation, but traditional models are limited by using only a single geospatial attribute. This paper presents an innovative method employing a self-supervised Heterogeneous Graph Neural Network to address this issue. The method models high-resolution geographic units as graph nodes, integrating various geographical features to generate physically meaningful weights for each grid point. These weights enhance the conventional Voronoi-based allocation method, allowing it to go beyond simple geographic proximity by incorporating essential geographic information. In addition, the self-supervised learning paradigm overcomes the lack of accurate ground-truth data. Experimental results demonstrate that applying weights generated by this method to cluster-based Voronoi diagrams significantly enhances scalability, accuracy, and physical plausibility, while increasing precision compared to traditional methods.
【7】Patient-Centered, Graph-Augmented Artificial Intelligence-Enabled Passive Surveillance for Early Stroke Risk Detection in High-Risk Individuals
Link: https://arxiv.org/abs/2602.22228
Authors: Jiyeong Kim, Stephen P. Ma, Nirali Vora, Nicholas W. Larsen, Julia Adler-Milstein, Jonathan H. Chen, Selen Bozkurt, Abeed Sarker, Juhee Cho, Jindeok Joo, Natali Pageler, Fatima Rodriguez, Christopher Sharp, Eleni Linos
Abstract: Stroke affects millions annually, yet poor symptom recognition often delays care-seeking. To address this risk-recognition gap, we developed a passive surveillance system for early stroke risk detection using patient-reported symptoms among individuals with diabetes. Constructing a symptom taxonomy grounded in patients' own language and a dual machine learning pipeline (heterogeneous GNN and EN/LASSO), we identified symptom patterns associated with subsequent stroke. We translated these findings into a hybrid risk screening system integrating symptom relevance and temporal proximity, evaluated across 3-90 day windows through EHR-based simulations. Under conservative thresholds, intentionally designed to minimize false alerts, the screening system achieved high specificity (1.00) and prevalence-adjusted positive predictive value (1.00), with good sensitivity (0.72), an expected trade-off prioritizing precision; sensitivity was highest in the 90-day window. Patient-reported language alone supported high-precision, low-burden early stroke risk detection, which could offer a valuable time window for clinical evaluation and intervention for high-risk individuals.
【8】Deep ensemble graph neural networks for probabilistic cosmic-ray direction and energy reconstruction in autonomous radio arrays
Link: https://arxiv.org/abs/2602.23321
Authors: Arsène Ferrière, Aurélien Benoit-Lévy, Olivier Martineau-Huynh, Matías Tueros
Note: Submitted to Astroparticle Physics Journal
Abstract: Using advanced machine learning techniques, we developed a method for precisely reconstructing the arrival direction and energy of ultra-high-energy cosmic rays from the voltage traces they induce on ground-based radio detector arrays. In our approach, triggered antennas are represented as a graph structure, which serves as input for a graph neural network (GNN). By incorporating physical knowledge into both the GNN architecture and the input data, we improve the precision and reduce the required size of the training set with respect to a fully data-driven approach. This method achieves an angular resolution of 0.092° and an electromagnetic energy reconstruction resolution of 16.4% on simulated data with realistic noise conditions. We also employ uncertainty estimation methods to enhance the reliability of our predictions, quantifying the confidence of the GNN's outputs and providing confidence intervals for both direction and energy reconstruction. Finally, we investigate strategies to verify the model's consistency and robustness under real-life variations, with the goal of identifying scenarios in which predictions remain reliable despite domain shifts between simulation and reality.
Transformer (4 papers)
【1】MetaOthello: A Controlled Study of Multiple World Models in Transformers
Link: https://arxiv.org/abs/2602.23164
Authors: Aviral Chawla, Galen Hall, Juniper Lovato
Abstract: Foundation models must handle multiple generative processes, yet mechanistic interpretability largely studies capabilities in isolation; it remains unclear how a single transformer organizes multiple, potentially conflicting "world models". Previous experiments on Othello-playing neural networks test world-model learning but focus on a single game with a single set of rules. We introduce MetaOthello, a controlled suite of Othello variants with shared syntax but different rules or tokenizations, and train small GPTs on mixed-variant data to study how multiple world models are organized in a shared representation space. We find that transformers trained on mixed-game data do not partition their capacity into isolated sub-models; instead, they converge on a mostly shared board-state representation that transfers causally across variants. Linear probes trained on one variant can intervene on another's internal state with effectiveness approaching that of matched probes. For isomorphic games with token remapping, representations are equivalent up to a single orthogonal rotation that generalizes across layers. When rules partially overlap, early layers maintain game-agnostic representations while a middle layer identifies game identity, and later layers specialize. MetaOthello offers a path toward understanding not just whether transformers learn world models, but how they organize many at once.
【2】Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
Link: https://arxiv.org/abs/2602.22988
Authors: Bum Jun Kim, Shohei Taniguchi, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo
Note: 23 pages, 7 figures
Abstract: Training divergence in transformers wastes compute, yet practitioners discover instability only after expensive runs begin. They therefore need an expected probability of failure for a transformer before training starts. Our study of Residual Koopman Spectral Profiling (RKSP) provides such an estimate. From a single forward pass at initialization, RKSP extracts Koopman spectral features by applying whitened dynamic mode decomposition to layer-wise residual snapshots. Our central diagnostic, the near-unit spectral mass, quantifies the fraction of modes concentrated near the unit circle, which captures instability risk. For predicting divergence across extensive configurations, this estimator achieves an AUROC of 0.995, outperforming the best gradient baseline. We further make this diagnostic actionable through Koopman Spectral Shaping (KSS), which reshapes spectra during training. We empirically validate that our method works in practice: RKSP predicts divergence at initialization, and when RKSP flags high risk, turning on KSS successfully prevents divergence. In the challenging high learning rate regime without normalization layers, KSS reduces the divergence rate from 66.7% to 12.5% and enables learning rates that are 50% to 150% higher. These findings generalize to WikiText-103 language modeling, vision transformers on CIFAR-10, and pretrained language models, including GPT-2 and LLaMA-2 up to 7B, as well as emerging architectures such as MoE, Mamba-style SSMs, and KAN.
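The "near-unit spectral mass" diagnostic above can be sketched by fitting a linear transition operator to consecutive snapshots (a DMD-style step) and measuring the fraction of its eigenvalues near the unit circle. The synthetic dynamics, tolerance, and plain least-squares fit below are illustrative assumptions; the paper uses whitened DMD on real residual streams.

```python
# Sketch: fraction of fitted-operator eigenvalues near the unit circle.
# An operator with |lambda| ~ 1 propagates signals without decay, the
# regime the abstract associates with instability risk. Toy data only.
import numpy as np

def near_unit_spectral_mass(snapshots, tol=0.05):
    """snapshots: (L, d) layer-wise states; fraction of eigenvalues lambda
    of the least-squares transition operator with | |lambda| - 1 | < tol."""
    Y1, Y2 = snapshots[:-1].T, snapshots[1:].T     # d x (L-1) each
    A = Y2 @ np.linalg.pinv(Y1)                    # least-squares fit: Y2 ~ A Y1
    eig = np.linalg.eigvals(A)
    return np.mean(np.abs(np.abs(eig) - 1.0) < tol)

rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((4, 4)))[0]   # orthogonal map: |lambda| = 1
x = rng.standard_normal(4)
stable = np.array([np.linalg.matrix_power(Q, t) @ x for t in range(12)])
decaying = np.array([(0.5 ** t) * x for t in range(12)])
print(near_unit_spectral_mass(stable), near_unit_spectral_mass(decaying))
```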
【3】Transformers converge to invariant algorithmic cores
Link: https://arxiv.org/abs/2602.22600
Authors: Joshua S. Schiffman
Abstract: Large language models exhibit sophisticated capabilities, yet understanding how they work internally remains a central challenge. A fundamental obstacle is that training selects for behavior, not circuitry, so many weight configurations can implement the same function. Which internal structures reflect the computation, and which are accidents of a particular training run? This work extracts algorithmic cores: compact subspaces necessary and sufficient for task performance. Independently trained transformers learn different weights but converge to the same cores. Markov-chain transformers embed 3D cores in nearly orthogonal subspaces yet recover identical transition spectra. Modular-addition transformers discover compact cyclic operators at grokking that later inflate, yielding a predictive model of the memorization-to-generalization transition. GPT-2 language models govern subject-verb agreement through a single axis that, when flipped, inverts grammatical number throughout generation across scales. These results reveal low-dimensional invariants that persist across training runs and scales, suggesting that transformer computations are organized around compact, shared algorithmic structures. Mechanistic interpretability could benefit from targeting such invariants -- the computational essence -- rather than implementation-specific details.
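Claims of this kind, that two runs' representations agree up to a rotation, are commonly tested with the orthogonal Procrustes solution R = UV^T from the SVD of A^T B. A minimal sketch on toy data (the representations below are fabricated assumptions, not the paper's probes):

```python
# Sketch: check whether two representation sets differ only by an
# orthogonal rotation, via the closed-form Procrustes solution. Toy data.
import numpy as np

def procrustes_rotation(A, B):
    """Return the orthogonal R minimizing ||A R - B||_F (A, B: n x d)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))                    # "run 1" representations
R_true = np.linalg.qr(rng.standard_normal((3, 3)))[0]
B = A @ R_true                                      # "run 2": same geometry, rotated
R = procrustes_rotation(A, B)
residual = np.linalg.norm(A @ R - B) / np.linalg.norm(B)
print(residual)   # near zero: the two sets differ only by a rotation
```

A large residual after the best rotation would instead indicate genuinely different geometry, not just a change of basis.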
【4】Multi-Dimensional Spectral Geometry of Biological Knowledge in Single-Cell Transformer Representations
Link: https://arxiv.org/abs/2602.22247
Authors: Ihor Kendiukhov
Abstract: Single-cell foundation models such as scGPT learn high-dimensional gene representations, but what biological knowledge these representations encode remains unclear. We systematically decode the geometric structure of scGPT's internal representations through 63 iterations of automated hypothesis screening (183 hypotheses tested), revealing that the model organizes genes into a structured biological coordinate system rather than an opaque feature space. The dominant spectral axis separates genes by subcellular localization, with secreted proteins at one pole and cytosolic proteins at the other. Intermediate transformer layers transiently encode mitochondrial and ER compartments in a sequence that mirrors the cellular secretory pathway. Orthogonal axes encode protein-protein interaction networks with graded fidelity to experimentally measured interaction strength (Spearman rho = 1.000 across n = 5 STRING confidence quintiles, p = 0.017). In a compact six-dimensional spectral subspace, the model distinguishes transcription factors from their target genes (AUROC = 0.744, all 12 layers significant). Early layers preserve which specific genes regulate which targets, while deeper layers compress this into a coarser regulator-versus-regulated distinction. Repression edges are geometrically more prominent than activation edges, and B-cell master regulators BATF and BACH2 show convergence toward the B-cell identity anchor PAX5 across transformer depth. Cell-type marker genes cluster with high fidelity (AUROC = 0.851). Residual-stream geometry encodes biological structure complementary to attention patterns. These results indicate that biological transformers learn an interpretable internal model of cellular organization, with implications for regulatory network inference, drug target prioritization, and model auditing.
GAN | adversarial | attacks | generation related (8 papers)
【1】Forecasting Antimicrobial Resistance Trends Using Machine Learning on WHO GLASS Surveillance Data: A Retrieval-Augmented Generation Approach for Policy Decision Support
标题:使用机器学习预测WHO GLASS监测数据的抗菌药物耐药性趋势:用于政策决策支持的检索增强生成方法
链接:https://arxiv.org/abs/2602.22673
作者:Md Tanvir Hasan Turja
备注:18 pages, 4 figures, code and data available at https://github.com/TanvirTurja
摘要
:抗生素耐药性(AMR)是一个日益严重的全球危机,预计到2050年每年将导致1000万人死亡。虽然世卫组织全球抗菌素耐药性和使用监测系统(GLASS)提供了44个国家的标准化监测数据,但很少有研究应用机器学习来预测这些数据的人群水平耐药性趋势。本文提出了AMR趋势预测和循证决策支持的两个组成部分的框架。我们对六个模型进行了基准测试- Naive,Linear Regression,Ridge Regression,XGBoost,LightGBM和LSTM -在世界卫生组织六个地区(2021-2023年)的5,909个世界卫生组织GLASS观察结果上。XGBoost实现了最佳性能,测试MAE为7.07%,R-squared为0.854,比原始基线高出83.1%。特征重要性分析确定前一年耐药率为主要预测因素(50.5%重要性),而区域MAE范围为4.16%(欧洲区域)至10.14%(东南亚区域)。此外,我们还实现了检索增强生成(RAG)管道,将WHO政策文档的ChromaDB向量存储与本地部署的Phi-3 Mini语言模型相结合,生成源属性,幻觉约束的政策答案。代码和数据可在https://github.com/TanvirTurja上获得
Abstract: Antimicrobial resistance (AMR) is a growing global crisis projected to cause 10 million deaths per year by 2050. While the WHO Global Antimicrobial Resistance and Use Surveillance System (GLASS) provides standardized surveillance data across 44 countries, few studies have applied machine learning to forecast population-level resistance trends from this data. This paper presents a two-component framework for AMR trend forecasting and evidence-grounded policy decision support. We benchmark six models -- Naive, Linear Regression, Ridge Regression, XGBoost, LightGBM, and LSTM -- on 5,909 WHO GLASS observations across six WHO regions (2021-2023). XGBoost achieved the best performance with a test MAE of 7.07% and R-squared of 0.854, outperforming the naive baseline by 83.1%. Feature importance analysis identified the prior-year resistance rate as the dominant predictor (50.5% importance), while regional MAE ranged from 4.16% (European Region) to 10.14% (South-East Asia Region). We additionally implemented a Retrieval-Augmented Generation (RAG) pipeline combining a ChromaDB vector store of WHO policy documents with a locally deployed Phi-3 Mini language model, producing source-attributed, hallucination-constrained policy answers. Code and data are available at https://github.com/TanvirTurja.
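As a sketch of the lag-feature framing the abstract describes (predicting this year's resistance rate from the prior-year rate, against a naive carry-forward baseline), consider the toy example below. The trend data are synthetic and idealized, not WHO GLASS values, and a plain least-squares line stands in for the paper's XGBoost model:

```python
import numpy as np

# Hypothetical resistance rates (%) for one pathogen-antibiotic pair:
# an idealized linear upward trend, purely for illustration.
years = np.arange(2000, 2024)
rate = 20.0 + 1.5 * (years - 2000)

# Lag-1 framing: predict this year's rate from the prior-year rate,
# the feature the paper found dominant (50.5% importance for XGBoost).
x, y = rate[:-1], rate[1:]
x_train, x_test, y_train, y_test = x[:-5], x[-5:], y[:-5], y[-5:]

# Naive baseline: forecast equals the prior-year value.
mae_naive = np.mean(np.abs(y_test - x_test))

# Simple learned model (least-squares line; the paper's best was XGBoost):
# it recovers the +1.5%/year drift that the naive baseline misses.
slope, intercept = np.polyfit(x_train, y_train, 1)
mae_model = np.mean(np.abs(y_test - (slope * x_test + intercept)))
print(f"naive MAE={mae_naive:.2f}%, model MAE={mae_model:.2f}%")
```

On this idealized trend the naive baseline is off by exactly the yearly drift, which mirrors why learned models beat the carry-forward rule in the paper's benchmark.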
【2】EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
Link: https://arxiv.org/abs/2602.22609
Authors: Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang
Note: 19 pages, 8 figures. Accepted by TACAS 2026
Abstract: Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2 without access to the originating register-transfer-level (RTL) designs, and biased toward extreme difficulty where instances are either trivial or intractable. These limitations hinder rigorous evaluation of new verification techniques and encourage overfitting of solver heuristics to a narrow set of problems. To address this, we introduce EvolveGen, a framework for generating hardware model checking benchmarks by combining reinforcement learning (RL) with high-level synthesis (HLS). Our approach operates at an algorithmic level of abstraction in which an RL agent learns to construct computation graphs. By compiling these graphs under different synthesis directives, we produce pairs of functionally equivalent but structurally distinct hardware designs, inducing challenging model checking instances. Solver runtime is used as the reward signal, enabling the agent to autonomously discover and generate small-but-hard instances that expose solver-specific weaknesses. Experiments show that EvolveGen efficiently creates a diverse benchmark set in standard formats (e.g., AIGER and BTOR2) and effectively reveals performance bottlenecks in state-of-the-art model checkers.
【3】TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion
Link: https://arxiv.org/abs/2602.22586
Authors: Donghong Cai, Jiarui Feng, Yanbo Wang, Da Zheng, Yixin Chen, Muhan Zhang
Note: Preprint
Abstract: Synthetic tabular data generation has attracted growing attention due to its importance for data augmentation, foundation models, and privacy. However, real-world tabular datasets increasingly contain free-form text fields (e.g., reviews or clinical notes) alongside structured numerical and categorical attributes. Generating such heterogeneous tables with joint modeling of different modalities remains challenging. Existing approaches broadly fall into two categories: diffusion-based methods and LLM-based methods. Diffusion models can capture complex dependencies over numerical and categorical features in continuous or discrete spaces, but extending them to open-ended text is nontrivial and often leads to degraded text quality. In contrast, LLM-based generators naturally produce fluent text, yet their discrete tokenization can distort precise or wide-range numerical values, hindering accurate modeling of both numbers and language. In this work, we propose TabDLM, a unified framework for free-form tabular data generation via a joint numerical-language diffusion model built on masked diffusion language models (MDLMs). TabDLM models textual and categorical features through masked diffusion, while modeling numerical features with a continuous diffusion process through learned specialized numeric token embeddings; bidirectional attention then captures cross-modality interactions within a single model. Extensive experiments on diverse benchmarks demonstrate the effectiveness of TabDLM compared to strong diffusion- and LLM-based baselines.
【4】Space Syntax-guided Post-training for Residential Floor Plan Generation
Link: https://arxiv.org/abs/2602.22507
Authors: Zhuoyang Jiang, Dongqing Zhang
Abstract: Pre-trained generative models for residential floor plans are typically optimized to fit large-scale data distributions, which can under-emphasize critical architectural priors such as the configurational dominance and connectivity of domestic public spaces (e.g., living rooms and foyers). This paper proposes Space Syntax-guided Post-training (SSPT), a post-training paradigm that explicitly injects space syntax knowledge into floor plan generation via a non-differentiable oracle. The oracle converts RPLAN-style layouts into rectangle-space graphs through greedy maximal-rectangle decomposition and door-mediated adjacency construction, and then computes integration-based measurements to quantify public space dominance and functional hierarchy. To enable consistent evaluation and diagnosis, we further introduce SSPT-Bench (Eval-8), an out-of-distribution benchmark that post-trains models using conditions capped at $\leq 7$ rooms while evaluating on 8-room programs, together with a unified metric suite for dominance, stability, and profile alignment. SSPT is instantiated with two strategies: (i) iterative retraining via space-syntax filtering and diffusion fine-tuning, and (ii) reinforcement learning via PPO with space-syntax rewards. Experiments show that both strategies improve public-space dominance and restore clearer functional hierarchy compared to distribution-fitted baselines, while PPO achieves stronger gains with substantially higher compute efficiency and reduced variance. SSPT provides a scalable pathway for integrating architectural theory into data-driven plan generation and is compatible with other generative backbones given a post-hoc evaluation oracle.
【5】mmWave Radar Aware Dual-Conditioned GAN for Speech Reconstruction of Signals With Low SNR
Link: https://arxiv.org/abs/2602.22431
Authors: Jash Karani, Adithya Chittem, Deepan Roy, Sandeep Joshi
Note: Under review at Interspeech 2026
Abstract: Millimeter-wave (mmWave) radar captures are band-limited and noisy, making for difficult reconstruction of intelligible full-bandwidth speech. In this work, we propose a two-stage speech reconstruction pipeline for mmWave using a Radar-Aware Dual-conditioned Generative Adversarial Network (RAD-GAN), which is capable of performing bandwidth extension on signals with low signal-to-noise ratios (-5 dB to -1 dB), captured through glass walls. We propose an mmWave-tailored Multi-Mel Discriminator (MMD) and a Residual Fusion Gate (RFG) to enhance the generator input to process multiple conditioning channels. The proposed two-stage pipeline involves pretraining the model on synthetically clipped clean speech and finetuning on fused mel spectrograms generated by the RFG. We empirically show that the proposed method, trained on a limited dataset, with no pre-trained modules, and no data augmentations, outperformed state-of-the-art approaches for this specific task. Audio examples of RAD-GAN are available online at https://rad-gan-demo-site.vercel.app/.
【6】Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection
Link: https://arxiv.org/abs/2602.22297
Authors: Dhiraj Neupane, Richard Dazeley, Mohamed Reda Bouadjenek, Sunil Aryal
Note: This article is accepted to be published in AAMAS 2026. The DOI is listed below, but production is still underway as of now (26/02/2026)
Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However, most existing RL-based MFD approaches do not fully exploit RL's sequential decision-making strengths, often treating MFD as a simple guessing game (Contextual Bandits). To bridge this gap, we formulate MFD as an offline inverse reinforcement learning problem, where the agent learns the reward dynamics directly from healthy operational sequences, thereby bypassing the need for manual reward engineering and fault labels. Our framework employs Adversarial Inverse Reinforcement Learning to train a discriminator that distinguishes between normal (expert) and policy-generated transitions. The discriminator's learned reward serves as an anomaly score, indicating deviations from normal operating behaviour. When evaluated on three run-to-failure benchmark datasets (HUMS2023, IMS, and XJTU-SY), the model consistently assigns low anomaly scores to normal samples and high scores to faulty ones, enabling early and robust fault detection. By aligning RL's sequential reasoning with MFD's temporal structure, this work opens a path toward RL-based diagnostics in data-driven industrial settings.
【7】To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
Link: https://arxiv.org/abs/2602.22227
Authors: Yicheng Bao, Xuhong Wang, Xin Tan
Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce \textbf{AOT-SFT}, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose \textbf{AOT (Adversarial Opponent Training)}, a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces hallucinations, establishing a scalable paradigm for training more reliable MLLMs.
【8】A Fast and Practical Column Generation Approach for Identifying Carcinogenic Multi-Hit Gene Combinations
Link: https://arxiv.org/abs/2602.22551
Authors: Rick S. H. Willemsen, Tenindra Abeywickrama, Ramu Anandakrishnan
Abstract: Cancer is often driven by specific combinations of an estimated two to nine gene mutations, known as multi-hit combinations. Identifying these combinations is critical for understanding carcinogenesis and designing targeted therapies. We formalise this challenge as the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP), a binary classification problem that selects gene combinations to maximise coverage of tumor samples while minimising coverage of normal samples. Existing approaches typically rely on exhaustive search and supercomputing infrastructure. In this paper, we present constraint programming and mixed integer programming formulations of the MHCDSCP. Evaluated on real-world cancer genomics data, our methods achieve performance comparable to state-of-the-art methods while running on a single commodity CPU in under a minute. Furthermore, we introduce a column generation heuristic capable of solving small instances to optimality. These results suggest that solving the MHCDSCP is less computationally intensive than previously believed, thereby opening research directions for exploring modelling assumptions.
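The coverage trade-off being formalised (cover tumor samples, avoid covering normal samples) can be illustrated with a toy greedy heuristic. This sketch is for intuition only: the paper's actual methods are constraint programming, MIP, and column generation, and the genes and samples below are invented:

```python
# Toy instance: which samples carry which mutated genes (all hypothetical).
tumor = {"t1": {"TP53", "KRAS"}, "t2": {"TP53", "PIK3CA"}, "t3": {"KRAS", "EGFR"}}
normal = {"n1": {"EGFR"}, "n2": set()}

# Candidate multi-hit combinations; a sample is "covered" only if it
# carries every gene in the combination.
candidates = [frozenset(c) for c in ({"TP53"}, {"KRAS"}, {"TP53", "KRAS"}, {"EGFR"})]

def gain(combo, covered):
    # Reward newly covered tumors, penalise hits on normal samples.
    new_tumors = {s for s, m in tumor.items() if combo <= m} - covered
    false_hits = sum(combo <= m for m in normal.values())
    return len(new_tumors) - false_hits

chosen, covered = [], set()
for _ in range(2):  # greedily pick two combinations
    best = max(candidates, key=lambda c: gain(c, covered))
    chosen.append(best)
    covered |= {s for s, m in tumor.items() if best <= m}

print(chosen, covered)
```

On this toy instance the heuristic covers all three tumors with single-gene combinations while rejecting the EGFR candidate that would also hit a normal sample.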
Semi/Weakly/Un/Supervised|Uncertainty|Active Learning (7 papers)
【1】Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity
Link: https://arxiv.org/abs/2602.23296
Authors: Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku
Abstract: Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at under-resourced agents, leading to silent local failures despite seemingly satisfactory global performance. Existing federated UQ approaches often address data heterogeneity or model heterogeneity in isolation, overlooking their joint effect on coverage reliability across agents. Conformal prediction is a widely used distribution-free UQ framework, yet its application in heterogeneous FL settings remains underexplored. We provide FedWQ-CP, a simple yet effective approach that balances empirical coverage performance with efficiency at both global and agent levels under the dual heterogeneity. FedWQ-CP performs agent-server calibration in a single communication round. On each agent, conformity scores are computed on calibration data and a local quantile threshold is derived. Each agent then transmits only its quantile threshold and calibration sample size to the server. The server simply aggregates these thresholds through a weighted average to produce a global threshold. Experimental results on seven public datasets for both classification and regression demonstrate that FedWQ-CP empirically maintains agent-wise and global coverage while producing the smallest prediction sets or intervals.
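The single-round calibrate-then-aggregate scheme described above can be sketched roughly as follows; the conformity scores, quantile rule, and agent data are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def local_threshold(scores, alpha=0.1):
    # Split-conformal quantile on one agent's calibration scores |y - yhat|:
    # the k-th smallest score with k = ceil((n + 1) * (1 - alpha)), capped at n.
    s = np.sort(np.asarray(scores))
    n = len(s)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return float(s[k - 1])

# Hypothetical heterogeneous agents: different noise scales and sizes.
agent_scores = [
    np.abs(np.linspace(-2.0, 2.0, 51)),   # low-noise agent, 51 points
    np.abs(np.linspace(-5.0, 5.0, 201)),  # high-noise agent, 201 points
]
thresholds = [local_threshold(s) for s in agent_scores]
sizes = [len(s) for s in agent_scores]

# Server side, single communication round: each agent sends only its
# threshold and sample size; aggregate by size-weighted average.
global_threshold = float(np.average(thresholds, weights=sizes))
print(thresholds, global_threshold)
```

The communication cost is two scalars per agent, which is what makes the single-round protocol attractive under heterogeneity.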
【2】PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
Link: https://arxiv.org/abs/2602.22903
Authors: Yunpeng Hong, Chenyang Bu, Jie Zhang, Yi He, Di Wu, Xindong Wu
Note: Accepted to SIGKDD 2026
Abstract: Multimodal Entity Alignment (MMEA) aims to identify equivalent entities across different data modalities, enabling structural data integration that in turn improves the performance of various large language model applications. To lift the requirement of labeled seed pairs that are difficult to obtain, recent methods shifted to an unsupervised paradigm using pseudo-alignment seeds. However, unsupervised entity alignment in multimodal settings remains underexplored, mainly because the incorporation of multimodal information often results in imbalanced coverage of pseudo-seeds within the knowledge graph. To overcome this, we propose PSQE (Pseudo-Seed Quality Enhancement) to improve the precision and graph coverage balance of pseudo seeds via multimodal information and clustering-resampling. Theoretical analysis reveals the impact of pseudo seeds on existing contrastive learning-based MMEA models. In particular, pseudo seeds can influence the attraction and the repulsion terms in contrastive learning at once, whereas imbalanced graph coverage causes models to prioritize high-density regions, thereby weakening their learning capability for entities in sparse regions. Experimental results validate our theoretical findings and show that PSQE as a plug-and-play module can improve the performance of baselines by considerable margins.
【3】Set-based v.s. Distribution-based Representations of Epistemic Uncertainty: A Comparative Study
Link: https://arxiv.org/abs/2602.22747
Authors: Kaizheng Wang, Yunjia Wang, Fabio Cuzzolin, David Moens, Hans Hallez, Siu Lun Chau
Note: 29 pages
Abstract: Epistemic uncertainty in neural networks is commonly modeled using two second-order paradigms: distribution-based representations, which rely on posterior parameter distributions, and set-based representations based on credal sets (convex sets of probability distributions). These frameworks are often regarded as fundamentally non-comparable due to differing semantics, assumptions, and evaluation practices, leaving their relative merits unclear. Empirical comparisons are further confounded by variations in the underlying predictive models. To clarify this issue, we present a controlled comparative study enabling principled, like-for-like evaluation of the two paradigms. Both representations are constructed from the same finite collection of predictive distributions generated by a shared neural network, isolating representational effects from predictive accuracy. Our study evaluates each representation through the lens of 3 uncertainty measures across 8 benchmarks, including selective prediction and out-of-distribution detection, spanning 6 underlying predictive models and 10 independent runs per configuration. Our results show that meaningful comparison between these seemingly non-comparable frameworks is both feasible and informative, providing insights into how second-order representation choices impact practical uncertainty-aware performance.
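A minimal sketch of the two representations, built from one shared finite collection of predictive distributions, might look like this. The distributions and the specific uncertainty measures are illustrative, and restricting the credal max/min to the finite members is a simplification of the full convex hull:

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# Shared finite collection of predictive distributions for one input
# (e.g., ensemble members or MC-dropout passes); values are made up.
preds = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])

# Distribution-based reading: members are posterior samples; epistemic
# uncertainty as mutual information (total minus expected entropy).
total = entropy(preds.mean(axis=0))
aleatoric = float(np.mean([entropy(p) for p in preds]))
epistemic_mi = total - aleatoric

# Set-based reading: the same members as extreme points of a credal set;
# one common measure is upper minus lower entropy over the set (here
# approximated over the finite members only).
epistemic_credal = max(entropy(p) for p in preds) - min(entropy(p) for p in preds)
print(epistemic_mi, epistemic_credal)
```

Because both quantities are computed from the same members, any ranking difference between them reflects the representation choice rather than the predictor, which is the isolation the study aims for.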
【4】When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering
Link: https://arxiv.org/abs/2602.22474
Authors: Jessie Yuan, Yilin Wu, Andrea Bajcsy
Abstract: Policy steering is an emerging way to adapt robot behaviors at deployment-time: a learned verifier analyzes low-level action samples proposed by a pre-trained policy (e.g., diffusion policy) and selects only those aligned with the task. While Vision-Language Models (VLMs) are promising general-purpose verifiers due to their reasoning capabilities, existing frameworks often assume these models are well-calibrated. In practice, the overconfident judgment from VLM can degrade the steering performance under both high-level semantic uncertainty in task specifications and low-level action uncertainty or incapability of the pre-trained policy. We propose uncertainty-aware policy steering (UPS), a framework that jointly reasons about semantic task uncertainty and low-level action feasibility, and selects an uncertainty resolution strategy: execute a high-confidence action, clarify task ambiguity via natural language queries, or ask for action interventions to correct the low-level policy when it is deemed incapable at the task. We leverage conformal prediction to calibrate the composition of the VLM and the pre-trained base policy, providing statistical assurances that the verifier selects the correct strategy. After collecting interventions during deployment, we employ residual learning to improve the capability of the pre-trained policy, enabling the system to learn continually but with minimal expensive human feedback. We demonstrate our framework through experiments in simulation and on hardware, showing that UPS can disentangle confident, ambiguous, and incapable scenarios and minimizes expensive user interventions compared to uncalibrated baselines and prior human- or robot-gated continual learning approaches. Videos can be found at https://jessie-yuan.github.io/ups/
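A rough sketch of the three-way strategy selection, gated by a conformal-style confidence cutoff, is shown below. The calibration numbers, the threshold rule, and the use of one shared cutoff for both uncertainty types are assumptions for illustration, not the paper's exact calibration procedure:

```python
import numpy as np

# Calibration: verifier confidences on held-out cases where the outcome
# was actually correct (all numbers hypothetical).
cal_conf = np.sort(np.array([0.92, 0.85, 0.97, 0.88, 0.93,
                             0.90, 0.95, 0.86, 0.91, 0.89]))
alpha = 0.2  # tolerated miss rate
n = len(cal_conf)
# Conformal-style lower cutoff: roughly the empirical alpha-quantile of
# confidences among correct cases.
k = max(int(np.floor((n + 1) * alpha)), 1)
tau = float(cal_conf[k - 1])

def resolve(task_conf, action_conf):
    # Pick one of the three uncertainty-resolution strategies.
    if task_conf < tau:
        return "clarify task via language query"
    if action_conf < tau:
        return "ask for action intervention"
    return "execute high-confidence action"

print(tau, resolve(0.9, 0.95), resolve(0.5, 0.95), resolve(0.9, 0.5))
```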
【5】How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
Link: https://arxiv.org/abs/2602.22441
Authors: Yingqian Cui, Zhenwei Dai, Bing He, Zhan Shi, Hui Liu, Rui Sun, Zhiji Liu, Yue Xing, Jiliang Tang, Benoit Dumoulin
Abstract: Latent reasoning has recently been proposed as a reasoning paradigm that performs multi-step reasoning by generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by performing multi-step computation in continuous latent spaces. Although there have been numerous studies focusing on improving the performance of latent reasoning, its internal mechanisms have not been fully investigated. In this work, we conduct a comprehensive analysis of latent reasoning methods to better understand the role and behavior of latent representations in the process. We identify two key issues across latent reasoning methods with different levels of supervision. First, we observe pervasive shortcut behavior, where they achieve high accuracy without relying on latent reasoning. Second, we examine the hypothesis that latent reasoning supports BFS-like exploration in latent space, and find that while latent representations can encode multiple possibilities, the reasoning process does not faithfully implement structured search, but instead exhibits implicit pruning and compression. Finally, our findings reveal a trade-off associated with supervision strength: stronger supervision mitigates shortcut behavior but restricts the ability of latent representations to maintain diverse hypotheses, whereas weaker supervision allows richer latent representations at the cost of increased shortcut behavior.
【6】Data-Driven Supervision of a Thermal-Hydraulic Process Towards a Physics-Based Digital Twin
Link: https://arxiv.org/abs/2602.22267
Authors: Osimone Imhogiemhe, Yoann Jus, Hubert Lejeune, Saïd Moussaoui
Abstract: The real-time supervision of production processes is a common challenge across several industries. It targets process component monitoring and predictive maintenance in order to ensure safety, uninterrupted production, and a high efficiency level. The rise of advanced tools for the simulation of physical systems, in addition to data-driven machine learning models, offers the possibility to design numerical tools dedicated to efficient system monitoring. In that respect, the digital twin concept presents an adequate framework that offers a solution to these challenges. The main purpose of this paper is to develop such a digital twin dedicated to fault detection and diagnosis in the context of thermal-hydraulic process supervision. Based on a numerical simulation of the system, in addition to machine learning methods, we propose different modules dedicated to process parameter change detection and their on-line estimation. The proposed fault detection and diagnosis algorithm is validated on a specific test scenario, with single one-off parameter change occurrences in the system. The numerical results show good accuracy in terms of parameter variation localization and the update of their values.
【7】Unsupervised Continual Learning for Amortized Bayesian Inference
Link: https://arxiv.org/abs/2602.22884
Authors: Aayush Mishra, Šimon Kucharský, Paul-Christian Bürkner
Abstract: Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers from performance degradation under model misspecification. While self-consistency (SC) training on unlabeled empirical data can enhance network robustness, current approaches are limited to static, single-task settings and fail to handle sequentially arriving data or distribution shifts. We propose a continual learning framework for ABI that decouples simulation-based pre-training from unsupervised sequential SC fine-tuning on real-world data. To address the challenge of catastrophic forgetting, we introduce two adaptation strategies: (1) SC with episodic replay, utilizing a memory buffer of past observations, and (2) SC with elastic weight consolidation, which regularizes updates to preserve task-critical parameters. Across three diverse case studies, our methods significantly mitigate forgetting and yield posterior estimates that outperform standard simulation-based training, achieving estimates closer to MCMC reference, providing a viable path for trustworthy ABI across a range of different tasks.
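The elastic-weight-consolidation regularizer used in the second strategy can be sketched in a few lines; the parameter vector and Fisher values below are hypothetical:

```python
import numpy as np

# After adapting to data batch A, freeze a parameter snapshot and a diagonal
# Fisher estimate (how sensitive batch-A performance is to each parameter).
theta_star = np.array([1.0, -0.5, 2.0])
fisher = np.array([5.0, 0.1, 3.0])

def ewc_penalty(theta, lam=1.0):
    # Elastic weight consolidation: quadratic pull toward theta_star,
    # weighted by each parameter's importance to past data.
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))

# During SC fine-tuning on the next batch, the total objective would be
# loss_B(theta) + ewc_penalty(theta), so moving an important parameter
# (index 0) costs far more than moving an unimportant one (index 1).
p_important = ewc_penalty(theta_star + np.array([0.5, 0.0, 0.0]))
p_unimportant = ewc_penalty(theta_star + np.array([0.0, 0.5, 0.0]))
print(p_important, p_unimportant)
```

This asymmetry is what lets sequential fine-tuning absorb new data without overwriting the parameters that earlier batches depended on.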
Transfer|Zero/Few/One-Shot|Adaptation (9 papers)
【1】Efficient Real-Time Adaptation of ROMs for Unsteady Flows Using Data Assimilation
Link: https://arxiv.org/abs/2602.23188
Authors: Ismaël Zighed, Andrea Nóvoa, Luca Magri, Taraneh Sayadi
Abstract: We propose an efficient retraining strategy for a parameterized Reduced Order Model (ROM) that attains accuracy comparable to full retraining while requiring only a fraction of the computational time and relying solely on sparse observations of the full system. The architecture employs an encode-process-decode structure: a Variational Autoencoder (VAE) to perform dimensionality reduction, and a transformer network to evolve the latent states and model the dynamics. The ROM is parameterized by an external control variable, the Reynolds number in the Navier-Stokes setting, with the transformer exploiting attention mechanisms to capture both temporal dependencies and parameter effects. The probabilistic VAE enables stochastic sampling of trajectory ensembles, providing predictive means and uncertainty quantification through the first two moments. After initial training on a limited set of dynamical regimes, the model is adapted to out-of-sample parameter regions using only sparse data. Its probabilistic formulation naturally supports ensemble generation, which we employ within an ensemble Kalman filtering framework to assimilate data and reconstruct full-state trajectories from minimal observations. We further show that, for the dynamical system considered, the dominant source of error in out-of-sample forecasts stems from distortions of the latent manifold rather than changes in the latent dynamics. Consequently, retraining can be limited to the autoencoder, allowing for a lightweight, computationally efficient, real-time adaptation procedure with very sparse fine-tuning data.
【2】SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling
Link: https://arxiv.org/abs/2602.23013
Authors: Camile Lendering, Erkut Akdag, Egor Bondarev
Note: Accepted to CVPR 2026
Abstract: Detecting visual anomalies in industrial inspection often requires training with only a few normal images per category. Recent few-shot methods achieve strong results by employing foundation-model features, but typically rely on memory banks, auxiliary datasets, or multi-modal tuning of vision-language models. We therefore question whether such complexity is necessary given the feature representations of vision foundation models. To answer this question, we introduce SubspaceAD, a training-free method that operates in two simple stages. First, patch-level features are extracted from a small set of normal images by a frozen DINOv2 backbone. Second, a Principal Component Analysis (PCA) model is fit to these features to estimate the low-dimensional subspace of normal variations. At inference, anomalies are detected via the reconstruction residual with respect to this subspace, producing interpretable and statistically grounded anomaly scores. Despite its simplicity, SubspaceAD achieves state-of-the-art performance across one-shot and few-shot settings without training, prompt tuning, or memory banks. In the one-shot anomaly detection setting, SubspaceAD achieves image-level and pixel-level AUROC of 98.0% and 97.6% on the MVTec-AD dataset, and 93.3% and 98.3% on the VisA dataset, respectively, surpassing prior state-of-the-art results. Code and demo are available at https://github.com/CLendering/SubspaceAD.
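The two-stage recipe (fit a PCA subspace to normal features, score by reconstruction residual) is simple enough to sketch end to end. The snippet below uses random vectors as a stand-in for frozen-backbone patch features; the dimensions and the `fit_subspace`/`anomaly_score` helpers are illustrative, not the released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_subspace(features, k):
    """Fit a k-dimensional PCA subspace to normal patch features (n, d)."""
    mean = features.mean(axis=0)
    _, _, Vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, Vt[:k]                       # mean and principal directions (k, d)

def anomaly_score(x, mean, basis):
    """Reconstruction residual of x with respect to the normal subspace."""
    centered = x - mean
    recon = (centered @ basis.T) @ basis      # projection onto the subspace
    return np.linalg.norm(centered - recon, axis=-1)

# Toy stand-in for frozen-backbone patch features: "normal" vectors lie near a
# 3-dimensional subspace of a 64-dimensional feature space.
W = rng.standard_normal((3, 64))
normal = rng.standard_normal((500, 3)) @ W + 0.01 * rng.standard_normal((500, 64))
mean, basis = fit_subspace(normal, k=3)

score_normal = anomaly_score(rng.standard_normal((1, 3)) @ W, mean, basis)[0]
score_anom = anomaly_score(rng.standard_normal((1, 64)), mean, basis)[0]
# On-subspace samples score near zero; off-subspace samples score high.
```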
【3】pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Link: https://arxiv.org/abs/2602.22938
Authors: Shentong Mo, Xufang Luo, Dongsheng Li
Abstract: Parameter-efficient fine-tuning has demonstrated promising results across various visual adaptation tasks, such as classification and segmentation. Typically, prompt tuning techniques have harnessed knowledge from a single pre-trained model, whether from a general or a specialized medical domain. However, this approach typically overlooks the potential synergies that could arise from integrating diverse domain knowledge within the same tuning process. In this work, we propose a novel Mixture-of-Experts prompt tuning method called pMoE, which leverages the strengths of multiple expert domains through expert-specialized prompt tokens and a learnable dispatcher, effectively combining their expertise in a unified model framework. Our pMoE introduces expert-specific prompt tokens and utilizes a dynamic token dispatching mechanism at various prompt layers to optimize the contribution of each domain expert during the adaptation phase. By incorporating domain knowledge from diverse experts, the proposed pMoE significantly enhances the model's versatility and applicability to a broad spectrum of tasks. We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains. The results demonstrate that our pMoE not only achieves superior performance by a large margin but also offers an optimal trade-off between computational efficiency and adaptation effectiveness compared to existing methods.
【4】NoRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion
Link: https://arxiv.org/abs/2602.22911
Authors: Hung-Hsuan Chen
Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a critical "linear ceiling" in complex reasoning tasks: simply increasing the rank yields diminishing returns due to intrinsic linear constraints. We introduce NoRA (Non-linear Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and structural dropout to induce manifold expansion. On the SlimOrca benchmark, NoRA breaks this linear barrier: NoRA at rank 64 (PPL 3.89) remarkably outperforms LoRA at rank 512 (PPL 3.90), demonstrating superior spectral efficiency. This advantage generalizes to mathematical reasoning, where NoRA achieves a perplexity of 1.97 on MathInstruct, significantly surpassing LoRA's saturation point of 2.07. Mechanism analysis via Singular Value Decomposition (SVD) confirms that NoRA activates the dormant tail of the singular value spectrum, effectively preventing the rank collapse observed in linear methods.
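The core idea of placing a SiLU gate between the down- and up-projections of a parallel low-rank adapter can be sketched as follows. This is a hedged illustration of the mechanism only: the class name, the zero-initialization of the up-projection, and the omission of structural dropout (inference mode) are assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    return x / (1.0 + np.exp(-x))

class NonlinearAdapter:
    """A parallel low-rank adapter with a SiLU gate between its projections.

    Sketch only: structural dropout is omitted (inference mode) and the
    up-projection is zero-initialized so the adapter starts as an identity.
    """

    def __init__(self, d_in, d_out, rank, scale=1.0):
        self.A = rng.standard_normal((d_in, rank)) / np.sqrt(d_in)  # down-projection
        self.B = np.zeros((rank, d_out))                            # up-projection
        self.scale = scale

    def __call__(self, x, W0):
        base = x @ W0                           # frozen pretrained path
        update = silu(x @ self.A) @ self.B      # nonlinear low-rank path
        return base + self.scale * update

W0 = rng.standard_normal((16, 16))              # stands in for a frozen weight
adapter = NonlinearAdapter(16, 16, rank=4)
x = rng.standard_normal((2, 16))
out_init = adapter(x, W0)                       # equals x @ W0 at initialization
adapter.B = rng.standard_normal((4, 16))        # pretend training moved B
out_trained = adapter(x, W0)
```

Unlike a plain LoRA update `x @ A @ B`, the gated path is nonlinear in `x`, which is what lets the adapter escape the purely linear low-rank family.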
【5】Doubly Adaptive Channel and Spatial Attention for Semantic Image Communication by IoT Devices
Link: https://arxiv.org/abs/2602.22794
Authors: Soroosh Miri, Sepehr Abolhasani, Shahrokh Farahmand, S. Mohammad Razavizadeh
Note: 6 pages, 7 figures, conference
Abstract: Internet of Things (IoT) networks face significant challenges such as limited communication bandwidth, constrained computational and energy resources, and highly dynamic wireless channel conditions. Utilization of deep neural networks (DNNs) combined with semantic communication has emerged as a promising paradigm to address these limitations. Deep joint source-channel coding (DJSCC) has recently been proposed to enable semantic communication of images. Building upon the original DJSCC formulation, low-complexity attention-style architectures have been added to the DNNs for further performance enhancement. As a main hurdle, training these DNNs separately for various signal-to-noise ratios (SNRs) would amount to excessive storage or communication overhead, which cannot be sustained by small IoT devices. SNR-Adaptive DJSCC (ADJSCC) has been proposed, which trains the DNNs once but feeds the current SNR as part of the data to the channel-wise attention mechanism. We improve upon ADJSCC by simultaneously utilizing doubly adaptive channel-wise and spatial attention modules at both transmitter and receiver. These modules dynamically adjust to varying channel conditions and spatial feature importance, enabling robust and efficient feature extraction and semantic information recovery. Simulation results corroborate that our proposed doubly adaptive DJSCC (DA-DJSCC) significantly improves upon ADJSCC in several performance criteria, while incurring a mild increase in complexity. These facts render DA-DJSCC a desirable choice for semantic communication in performance-demanding but low-complexity IoT networks.
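The ADJSCC-style mechanism of feeding the current SNR into a channel-wise attention module can be illustrated in a few lines of numpy. The layer sizes, the SNR normalization, and the weight initialization below are toy assumptions, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def snr_adaptive_channel_attention(feat, snr_db, W1, W2):
    """Channel-wise attention conditioned on the current SNR.

    feat   : (C, H, W) feature map
    snr_db : scalar channel SNR in dB, appended to the pooled context
    W1, W2 : weights of a small two-layer gating MLP
    """
    pooled = feat.mean(axis=(1, 2))                  # global average pool, (C,)
    ctx = np.concatenate([pooled, [snr_db / 20.0]])  # crude SNR normalization
    hidden = np.maximum(0.0, W1 @ ctx)               # ReLU hidden layer
    scale = sigmoid(W2 @ hidden)                     # per-channel gates in (0, 1)
    return feat * scale[:, None, None]

C = 8
feat = rng.standard_normal((C, 4, 4))
W1 = 0.1 * rng.standard_normal((16, C + 1))
W2 = 0.1 * rng.standard_normal((C, 16))
out_low = snr_adaptive_channel_attention(feat, 0.0, W1, W2)
out_high = snr_adaptive_channel_attention(feat, 20.0, W1, W2)
# The same weights produce different channel gatings at different SNRs.
```

The "doubly adaptive" extension in the paper adds an analogous spatially-varying gate; the channel-wise half above conveys the conditioning idea.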
【6】Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
Link: https://arxiv.org/abs/2602.22556
Authors: Zihang Xu, Haozhi Xie, Ziqi Miao, Wuxuan Gong, Chen Qian, Lijun Li
Note: 15 pages, 7 figures
Abstract: Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but they often exhibit overthinking behavior for low-complexity queries. Existing efforts to mitigate this issue are fundamentally limited by unstable accuracy-efficiency trade-offs and poor robustness to heterogeneous reasoning behaviors. To address these challenges, we propose a two-stage framework for stable adaptive thinking in LRMs. The framework first applies Hybrid Fine-Tuning to expose the model to both thinking and no-thinking behaviors, establishing well-conditioned initialization. It then performs adaptive reinforcement learning with Correctness-Preserving Advantage Shaping (CPAS) to avoid suppressing correct long-chain reasoning, and Length-Aware Gradient Regulation (LAGR) to stabilize optimization under severe reasoning-length heterogeneity. Extensive experiments on Qwen2.5-1.5B and 7B show consistent improvements over strong baselines, achieving up to +3.7/+3.6 accuracy points while reducing generated tokens by 40.6%/43.9%. Further analyses across varying problem difficulties and out-of-distribution tasks confirm the robustness and generalization of our approach.
【7】LUMOS: Democratizing SciML Workflows with L0-Regularized Learning for Unified Feature and Parameter Adaptation
Link: https://arxiv.org/abs/2602.22537
Authors: Shouwei Gao, Xu Zheng, Dongsheng Luo, Sheng Di, Wenqian Dong
Abstract: The rapid growth of scientific machine learning (SciML) has accelerated discovery across diverse domains, yet designing effective SciML models remains a challenging task. In practice, building such models often requires substantial prior knowledge and manual expertise, particularly in determining which input features to use and how large the model should be. We introduce LUMOS, an end-to-end framework based on L0-regularized learning that unifies feature selection and model pruning to democratize SciML model design. By employing semi-stochastic gating and reparameterization techniques, LUMOS dynamically selects informative features and prunes redundant parameters during training, reducing the reliance on manual tuning while maintaining predictive accuracy. We evaluate LUMOS across 13 diverse SciML workloads, including cosmology and molecular sciences, and demonstrate its effectiveness and generalizability. Experiments on 13 SciML models show that LUMOS achieves 71.45% parameter reduction and a 6.4x inference speedup on average. Furthermore, Distributed Data Parallel (DDP) training on up to eight GPUs confirms the scalability of the framework.
【8】An Adaptive Multichain Blockchain: A Multiobjective Optimization Approach
Link: https://arxiv.org/abs/2602.22230
Authors: Nimrod Talmon, Haim Zysberg
Abstract: Blockchains are widely used for secure transaction processing, but their scalability remains limited, and existing multichain designs are typically static even as demand and capacity shift. We cast blockchain configuration as a multiagent resource-allocation problem: applications and operators declare demand, capacity, and price bounds; an optimizer groups them into ephemeral chains each epoch and sets a chain-level clearing price. The objective maximizes a governance-weighted combination of normalized utilities for applications, operators, and the system. The model is modular -- accommodating capability compatibility, application-type diversity, and epoch-to-epoch stability -- and can be solved off-chain with outcomes verifiable on-chain. We analyze fairness and incentive issues and present simulations that highlight trade-offs among throughput, decentralization, operator yield, and service stability.
【9】Flow Matching is Adaptive to Manifold Structures
Link: https://arxiv.org/abs/2602.22486
Authors: Shivam Kumar, Yixin Wang, Lizhen Lin
Abstract: Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source distribution (e.g., a standard normal) and a target data distribution. Flow-based methods often exhibit greater training stability and have achieved strong empirical performance in high-dimensional settings where data concentrate near a low-dimensional manifold, such as text-to-image synthesis, video generation, and molecular structure generation. Despite this success, existing theoretical analyses of flow matching assume target distributions with smooth, full-dimensional densities, leaving its effectiveness in manifold-supported settings largely unexplained. To this end, we theoretically analyze flow matching with linear interpolation when the target distribution is supported on a smooth manifold. We establish a non-asymptotic convergence guarantee for the learned velocity field, and then propagate this estimation error through the ODE to obtain statistical consistency of the implicit density estimator induced by the flow-matching objective. The resulting convergence rate is near minimax-optimal, depends only on the intrinsic dimension, and reflects the smoothness of both the manifold and the target distribution. Together, these results provide a principled explanation for how flow matching adapts to intrinsic data geometry and circumvents the curse of dimensionality.
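The linear-interpolation training objective analyzed here is easy to write down explicitly: sample a source point, interpolate toward a data point, and regress a velocity network onto their difference. The sketch below shows only the target construction and loss (no network), with toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_batch(x1, rng):
    """Build one conditional flow-matching batch with linear interpolation.

    x1 : (n, d) samples from the target distribution. Returns interpolants
    x_t, times t, and the velocity regression target x1 - x0.
    """
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))     # source: standard normal
    t = rng.uniform(size=(n, 1))
    x_t = (1.0 - t) * x0 + t * x1        # linear interpolant between x0 and x1
    target = x1 - x0                     # conditional velocity along the path
    return x_t, t, target

def fm_loss(v_pred, target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean(np.sum((v_pred - target) ** 2, axis=1)))

# Toy target data concentrated near a point (a crude low-dimensional target).
x1 = 0.1 * rng.standard_normal((128, 2)) + np.array([3.0, -1.0])
x_t, t, target = flow_matching_batch(x1, rng)
loss_zero = fm_loss(np.zeros_like(target), target)   # baseline: zero predictor
```

A velocity network `v(x_t, t)` trained against this target defines the ODE `dx/dt = v(x, t)`, which transports source samples to the target at `t = 1`.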
Reinforcement Learning (2 papers)
【1】QSIM: Mitigating Overestimation in Multi-Agent Reinforcement Learning via Action Similarity Weighted Q-Learning
Link: https://arxiv.org/abs/2602.22786
Authors: Yuanjun Li, Bin Zhang, Hao Chen, Zhouyang Jiang, Dapeng Li, Zhiwei Xu
Note: 19 pages, 15 figures, 7 tables. Accepted to the 36th International Conference on Automated Planning and Scheduling (ICAPS 2026)
Abstract: Value decomposition (VD) methods have achieved remarkable success in cooperative multi-agent reinforcement learning (MARL). However, their reliance on the max operator for temporal-difference (TD) target calculation leads to systematic Q-value overestimation. This issue is particularly severe in MARL due to the combinatorial explosion of the joint action space, which often results in unstable learning and suboptimal policies. To address this problem, we propose QSIM, a similarity weighted Q-learning framework that reconstructs the TD target using action similarity. Instead of using the greedy joint action directly, QSIM forms a similarity weighted expectation over a structured near-greedy joint action space. This formulation allows the target to integrate Q-values from diverse yet behaviorally related actions while assigning greater influence to those that are more similar to the greedy choice. By smoothing the target with structurally relevant alternatives, QSIM effectively mitigates overestimation and improves learning stability. Extensive experiments demonstrate that QSIM can be seamlessly integrated with various VD methods, consistently yielding superior performance and stability compared to the original algorithms. Furthermore, empirical analysis confirms that QSIM significantly mitigates the systematic value overestimation in MARL. Code is available at https://github.com/MaoMaoLYJ/pymarl-qsim.
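The idea of replacing the max operator with a similarity-weighted expectation over near-greedy actions can be sketched for a single state. The softmax-over-similarity weighting and the temperature `tau` below are illustrative choices, not the paper's exact weighting rule.

```python
import numpy as np

def similarity_weighted_target(q_values, similarity, tau=0.5):
    """Similarity-weighted expectation over a near-greedy action set.

    q_values   : (A,) Q-values for the next state
    similarity : (A,) similarity of each action to the greedy action, in [0, 1]
    tau        : temperature; smaller values concentrate weight on actions
                 most similar to the greedy choice
    """
    w = np.exp(similarity / tau)
    w = w / w.sum()
    return float(w @ q_values)

q = np.array([1.0, 3.0, 2.5, 0.5])
sim = np.array([0.2, 1.0, 0.9, 0.1])   # action 2 is behaviorally close to greedy
target = similarity_weighted_target(q, sim)
# The smoothed target stays below max(q), which is what mitigates the
# systematic overestimation that the plain max operator induces.
```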
【2】Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
Link: https://arxiv.org/abs/2602.22703
Authors: Hao Yu, Shuning Jia, Guanghao Li, Wenhao Jiang, Chun Yuan
Abstract: Vision-language models (VLMs) often struggle with geometric reasoning due to their limited perception of fundamental diagram elements. To tackle this challenge, we introduce GeoPerceive, a benchmark comprising diagram instances paired with domain-specific language (DSL) representations, along with an efficient automatic data generation pipeline. This design enables the isolated evaluation of geometric perception independently from reasoning. To exploit the data provided by GeoPerceive for enhancing the geometric perception capabilities of VLMs, we propose GeoDPO, a translator-guided reinforcement learning (RL) framework. GeoDPO employs an NL-to-DSL translator, which is trained on synthetic pairs generated by the data engine of GeoPerceive, to bridge natural language and DSL. This translator facilitates the computation of fine-grained, DSL-level scores, which serve as reward signals in reinforcement learning. We assess GeoDPO on both in-domain and out-of-domain datasets, spanning tasks in geometric perception as well as downstream reasoning. Experimental results demonstrate that, while supervised fine-tuning (SFT) offers only marginal improvements and may even impair performance in out-of-domain scenarios, GeoDPO achieves substantial gains: $+26.5\%$ on in-domain data, $+8.0\%$ on out-of-domain data, and $+39.0\%$ on downstream reasoning tasks. These findings underscore the superior performance and generalization ability of GeoDPO over SFT. All code is released at https://github.com/Longin-Yu/GeoPerceive to ensure reproducibility.
Medical (7 papers)
【1】Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction
Link: https://arxiv.org/abs/2602.23214
Authors: Chenhe Du, Xuanyu Tian, Qing Wu, Muyu Liu, Jingyi Yu, Hongjiang Wei, Yuyao Zhang
Abstract: Plug-and-Play diffusion prior (PnPDP) frameworks have emerged as a powerful paradigm for solving imaging inverse problems by treating pretrained generative models as modular priors. However, we identify a critical flaw in prevailing PnP solvers (e.g., based on HQS or Proximal Gradient): they function as memoryless operators, updating estimates solely based on instantaneous gradients. This lack of historical tracking inevitably leads to non-vanishing steady-state bias, where the reconstruction fails to strictly satisfy physical measurements under heavy corruption. To resolve this, we propose Dual-Coupled PnP Diffusion, which restores the classical dual variable to provide integral feedback, theoretically guaranteeing asymptotic convergence to the exact data manifold. However, this rigorous geometric coupling introduces a secondary challenge: the accumulated dual residuals exhibit spectrally colored, structured artifacts that violate the Additive White Gaussian Noise (AWGN) assumption of diffusion priors, causing severe hallucinations. To bridge this gap, we introduce Spectral Homogenization (SH), a frequency-domain adaptation mechanism that modulates these structured residuals into statistically compliant pseudo-AWGN inputs. This effectively aligns the solver's rigorous optimization trajectory with the denoiser's valid statistical manifold. Extensive experiments on CT and MRI reconstruction demonstrate that our approach resolves the bias-hallucination trade-off, achieving state-of-the-art fidelity with significantly accelerated convergence.
【2】FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification
Link: https://arxiv.org/abs/2602.23192
Authors: Thomas Woergaard, Raghavendra Selvan
Note: Source code available at https://github.com/saintslab/FairQuant
Abstract: Compressing neural networks by quantizing model parameters offers a useful trade-off between performance and efficiency. Methods like quantization-aware training and post-training quantization strive to maintain the downstream performance of compressed models compared to the full precision models. However, these techniques do not explicitly consider the impact on algorithmic fairness. In this work, we study fairness-aware mixed-precision quantization schemes for medical image classification under explicit bit budgets. We introduce FairQuant, a framework that combines group-aware importance analysis, budgeted mixed-precision allocation, and a learnable Bit-Aware Quantization (BAQ) mode that jointly optimizes weights and per-unit bit allocations under bitrate and fairness regularization. We evaluate the method on Fitzpatrick17k and ISIC2019 across ResNet18/50, DeiT-Tiny, and TinyViT. Results show that FairQuant configurations with average precision near 4-6 bits recover much of the Uniform 8-bit accuracy while improving worst-group performance relative to Uniform 4- and 8-bit baselines, with comparable fairness metrics under shared budgets.
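Budgeted mixed-precision allocation can be illustrated with a greedy scheme that upgrades whichever layer offers the best marginal error reduction per extra bit until the average-bit budget is spent. The rate-distortion-style error proxy `importance * 2**(-2*b)` and the greedy rule are assumptions for illustration, not FairQuant's allocation procedure.

```python
def allocate_bits(importance, sizes, budget_avg_bits, choices=(2, 4, 6, 8)):
    """Greedy budgeted bit allocation.

    Every layer starts at the narrowest width; we then repeatedly upgrade the
    layer with the best marginal error reduction per extra bit until the
    average-bit budget is exhausted. Layer error is modeled with the
    rate-distortion-style proxy importance * 2**(-2 * b) (an assumption).
    """
    def err(i, b):
        return importance[i] * 2.0 ** (-2 * b)

    bits = [choices[0]] * len(importance)
    total_budget = budget_avg_bits * sum(sizes)
    used = sum(b * s for b, s in zip(bits, sizes))
    while True:
        best, best_gain = None, 0.0
        for i in range(len(importance)):
            idx = choices.index(bits[i])
            if idx + 1 == len(choices):
                continue                      # already at the widest format
            extra = (choices[idx + 1] - bits[i]) * sizes[i]
            if used + extra > total_budget:
                continue                      # upgrade would bust the budget
            gain = (err(i, bits[i]) - err(i, choices[idx + 1])) / extra
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        idx = choices.index(bits[best])
        used += (choices[idx + 1] - bits[best]) * sizes[best]
        bits[best] = choices[idx + 1]
    return bits

importance = [10.0, 1.0, 0.1]   # e.g. sensitivity of worst-group accuracy per layer
sizes = [1.0, 1.0, 1.0]
bits = allocate_bits(importance, sizes, budget_avg_bits=4)
```

With a fairness-aware importance score, the most fairness-critical layer naturally receives the widest format under the shared budget.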
【3】A Data-Driven Approach to Support Clinical Renal Replacement Therapy
Link: https://arxiv.org/abs/2602.22902
Authors: Alice Balboni, Luis Escobar, Andrea Manno, Fabrizio Rossi, Maria Cristina Ruffa, Gianluca Villa, Giordano D'Aloisio, Antonio Consolo
Abstract: This study investigates a data-driven machine learning approach to predict membrane fouling in critically ill patients undergoing Continuous Renal Replacement Therapy (CRRT). Using time-series data from an ICU, 16 clinically selected features were identified to train predictive models. To ensure interpretability and enable reliable counterfactual analysis, the researchers adopted a tabular data approach rather than modeling temporal dependencies directly. Given the imbalance between fouling and non-fouling cases, the ADASYN oversampling technique was applied to improve minority class representation. Random Forest, XGBoost, and LightGBM models were tested, achieving balanced performance with 77.6% sensitivity and 96.3% specificity at a 10% rebalancing rate. Results remained robust across different forecasting horizons. Notably, the tabular approach outperformed LSTM recurrent neural networks, suggesting that explicit temporal modeling was not necessary for strong predictive performance. Feature selection further reduced the model to five key variables, improving simplicity and interpretability with minimal loss of accuracy. A Shapley value-based counterfactual analysis was applied to the best-performing model, successfully identifying minimal input changes capable of reversing fouling predictions. Overall, the findings support the viability of interpretable machine learning models for predicting membrane fouling during CRRT. The integration of prediction and counterfactual analysis offers practical clinical value, potentially guiding therapeutic adjustments to reduce fouling risk and improve patient management.
【4】Predicting Multi-Drug Resistance in Bacterial Isolates Through Performance Comparison and LIME-based Interpretation of Classification Models
Link: https://arxiv.org/abs/2602.22400
Authors: Santanam Wishal, Riad Sahara
Note: 6 pages, 7 figures
Abstract: The rise of Antimicrobial Resistance, particularly Multi-Drug Resistance (MDR), presents a critical challenge for clinical decision-making due to limited treatment options and delays in conventional susceptibility testing. This study proposes an interpretable machine learning framework to predict MDR in bacterial isolates using clinical features and antibiotic susceptibility patterns. Five classification models were evaluated, including Logistic Regression, Random Forest, AdaBoost, XGBoost, and LightGBM. The models were trained on a curated dataset of 9,714 isolates, with resistance encoded at the antibiotic family level to capture cross-class resistance patterns consistent with MDR definitions. Performance assessment included accuracy, F1-score, AUC-ROC, and Matthews Correlation Coefficient. Ensemble models, particularly XGBoost and LightGBM, demonstrated superior predictive capability across all metrics. To address the clinical transparency gap, Local Interpretable Model-agnostic Explanations (LIME) was applied to generate instance-level explanations. LIME identified resistance to quinolones, Co-trimoxazole, Colistin, aminoglycosides, and Furanes as the strongest contributors to MDR predictions, aligning with known biological mechanisms. The results show that combining high-performing models with local interpretability provides both accuracy and actionable insights for antimicrobial stewardship. This framework supports earlier MDR identification and enhances trust in machine learning-assisted clinical decision support.
【5】Learning geometry-dependent lead-field operators for forward ECG modeling
Link: https://arxiv.org/abs/2602.22367
Authors: Arsenii Dokuchaev, Francesca Bonizzoni, Stefano Pagani, Francesco Regazzoni, Simone Pezzuto
Note: 20 pages, 9 figures
Abstract: Modern forward electrocardiogram (ECG) computational models rely on an accurate representation of the torso domain. The lead-field method enables fast ECG simulations while preserving full geometric fidelity. Achieving high anatomical accuracy in torso representation is, however, challenging in clinical practice, as imaging protocols are typically focused on the heart and often do not include the entire torso. In addition, the computational cost of the lead-field method scales linearly with the number of electrodes, limiting its applicability in high-density recording settings. To date, no existing approach simultaneously achieves high anatomical fidelity, low data requirements and computational efficiency. In this work, we propose a shape-informed surrogate model of the lead-field operator that serves as a drop-in replacement for the full-order model in forward ECG simulations. The proposed framework consists of two components: a geometry-encoding module that maps anatomical shapes into a low-dimensional latent space, and a geometry-conditioned neural surrogate that predicts lead-field gradients from spatial coordinates, electrode positions and latent codes. The proposed method achieves high accuracy in approximating lead fields both within the torso (mean angular error 5°) and inside the heart, resulting in highly accurate ECG simulations (relative mean squared error <2.5%). The surrogate consistently outperforms the widely used pseudo lead-field approximation while preserving negligible inference cost. Owing to its compact latent representation, the method does not require a fully detailed torso segmentation and can therefore be deployed in data-limited settings while preserving high-fidelity ECG simulations.
【6】When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals
Link: https://arxiv.org/abs/2602.22294
Authors: Timothy Oladunni, Blessing Ojeme, Kyndal Maclin, Clyde Baidoo
Abstract: Models operating on dynamic physiologic signals must distinguish benign, label-preserving variability from true concept change. Existing concept-drift frameworks are largely distributional and provide no principled guidance on how much a model's internal representation may move when the underlying signal undergoes physiologically plausible fluctuations in energy. As a result, deep models often misinterpret harmless changes in amplitude, rate, or morphology as concept drift, yielding unstable predictions, particularly in multimodal fusion settings. This study introduces Physiologic Energy Conservation Theory (PECT), an energy-based framework for concept stability in dynamic signals. PECT posits that under virtual drift, normalized latent displacement should scale proportionally with normalized signal energy change, while persistent violations of this proportionality indicate real concept drift. We operationalize this principle through Energy-Constrained Representation Learning (ECRL), a lightweight regularizer that penalizes energy-inconsistent latent movement without modifying encoder architectures or adding inference-time cost. Although PECT is formulated for dynamic signals in general, we instantiate and evaluate it on multimodal ECG across seven unimodal and hybrid models. Experiments show that in the strongest trimodal hybrid (1D+2D+Transformer), clean accuracy is largely preserved (96.0% to 94.1%), while perturbed accuracy improves substantially (72.6% to 85.5%) and fused representation drift decreases by over 45%. Similar trends are observed across all architectures, providing empirical evidence that PECT functions as an energy-drift law governing concept stability in continuous physiologic signals.
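The PECT proportionality principle, that normalized latent displacement should track normalized signal-energy change, suggests a one-line penalty. The sketch below is a minimal rendering of that idea; the specific normalizations and the `energy_consistency_penalty` form are assumptions, not the paper's ECRL regularizer.

```python
import numpy as np

def energy_consistency_penalty(z_ref, z_new, x_ref, x_new, eps=1e-8):
    """Penalize latent movement that is disproportionate to energy change.

    z_ref, z_new : latent representations before/after a signal change
    x_ref, x_new : the corresponding raw signal windows
    """
    latent_shift = np.linalg.norm(z_new - z_ref) / (np.linalg.norm(z_ref) + eps)
    e_ref, e_new = np.sum(x_ref ** 2), np.sum(x_new ** 2)
    energy_shift = abs(e_new - e_ref) / (e_ref + eps)
    return float((latent_shift - energy_shift) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)    # toy signal window
z = rng.standard_normal(32)     # toy latent code

# A mild amplitude rescaling with a proportional latent move is nearly free,
# while a large latent jump under unchanged signal energy is penalized.
benign = energy_consistency_penalty(z, 1.01 * z, x, 1.005 * x)
drift = energy_consistency_penalty(z, z + rng.standard_normal(32), x, x)
```

Added to a training loss, such a term discourages the encoder from moving its representation more than the observed energy change warrants, which is the stability behavior the abstract reports.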
【7】Early Risk Stratification of Dosing Errors in Clinical Trials Using Machine Learning
标题:使用机器学习进行临床试验中给药错误的早期风险分层
链接:https://arxiv.org/abs/2602.22285
作者:Félicien Hêche,Sohrab Ferdowsi,Anthony Yazdani,Sara Sansaloni-Pastor,Douglas Teodoro
摘要:目的:本研究的目的是开发一个基于机器学习(ML)的框架,用于根据临床试验(CT)显示高剂量错误率的可能性,使用试验开始前可用的信息进行早期风险分层。材料和方法:我们从ClinicalTrials.gov构建了一个包含42,112个CT的数据集。提取结构化、半结构化试验数据和非结构化方案相关自由文本数据。根据不良事件报告、MedDRA术语和Wilson置信区间,为CT分配二进制标签,表明给药错误率升高。我们评估了在结构化特征上训练的XGBoost模型,使用文本数据的ClinicalModernBERT模型,以及结合两种模式的简单后期融合模型。应用事后概率校准,以实现可解释的试验水平风险分层。结果:晚期融合模型的AUC-ROC最高(0.862)。除了区分之外,校准的输出使CT能够稳健地分层到预定义的风险类别中。标记为给药错误率过高的试验比例在较高预测风险组中单调增加,并与相应的预测概率范围一致。讨论:这些结果表明,使用启动前信息,可以在试验水平上预期给药错误风险。概率校准对于将模型输出转化为可靠和可解释的风险类别至关重要,而简单的多模态集成可以在不需要复杂架构的情况下获得性能增益。结论:本研究引入了一种可重复且可扩展的ML框架,用于对存在高剂量错误率风险的CT进行早期试验级风险分层,支持临床研究中主动的、基于风险的质量管理。
摘要:Objective: The objective of this study is to develop a machine learning (ML)-based framework for early risk stratification of clinical trials (CTs) according to their likelihood of exhibiting a high rate of dosing errors, using information available prior to trial initiation. Materials and Methods: We constructed a dataset from ClinicalTrials.gov comprising 42,112 CTs. Structured, semi-structured trial data, and unstructured protocol-related free-text data were extracted. CTs were assigned binary labels indicating elevated dosing error rate, derived from adverse event reports, MedDRA terminology, and Wilson confidence intervals. We evaluated an XGBoost model trained on structured features, a ClinicalModernBERT model using textual data, and a simple late-fusion model combining both modalities. Post-hoc probability calibration was applied to enable interpretable, trial-level risk stratification. Results: The late-fusion model achieved the highest AUC-ROC (0.862). Beyond discrimination, calibrated outputs enabled robust stratification of CTs into predefined risk categories. The proportion of trials labeled as having an excessively high dosing error rate increased monotonically across higher predicted risk groups and aligned with the corresponding predicted probability ranges. Discussion: These findings indicate that dosing error risk can be anticipated at the trial level using pre-initiation information. Probability calibration was essential for translating model outputs into reliable and interpretable risk categories, while simple multimodal integration yielded performance gains without requiring complex architectures. Conclusion: This study introduces a reproducible and scalable ML framework for early, trial-level risk stratification of CTs at risk of high dosing error rates, supporting proactive, risk-based quality management in clinical research.
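The late-fusion and risk-banding steps described above can be sketched in a few lines, assuming each modality model emits a probability. The fusion weight, bin cut-points, and function names are illustrative, not taken from the study (which additionally applies post-hoc probability calibration before banding).

```python
def late_fusion(p_structured, p_text, w=0.5):
    """Convex combination of the two modality probabilities --
    a simple stand-in for the study's late-fusion model."""
    return w * p_structured + (1 - w) * p_text

def risk_category(p, bins=((0.2, "low"), (0.5, "moderate"), (1.0, "high"))):
    """Map a calibrated probability to a predefined risk band.
    The cut-points here are illustrative, not the study's."""
    for upper, label in bins:
        if p <= upper:
            return label
    return bins[-1][1]
```

A trial whose XGBoost and ClinicalModernBERT scores are 0.8 and 0.4 would thus receive a fused score of 0.6 and land in the "high" band under these toy cut-points.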
蒸馏|知识提取(2篇)
【1】ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation
标题:ManifoldGD:基于扩散的数据集蒸馏的免训练分层流形引导
链接:https://arxiv.org/abs/2602.23295
作者:Ayush Roy,Wei-Yang Alex Lee,Rudrasis Chakraborty,Vishnu Suresh Lokhande
备注:CVPE 2026
摘要:近年来,大型数据集阻碍了高效的模型训练,同时还包含冗余概念。数据集蒸馏旨在合成紧凑的数据集,保留大规模训练集的知识,同时大幅减少存储和计算。扩散模型的最新进展通过利用预训练的生成先验实现了免训练蒸馏;然而,现有的引导策略仍然有限。目前基于分数的方法要么执行无引导的去噪,要么依赖于朝实例原型质心(IPC质心)的简单基于模式的引导,后者往往粗糙且次优。我们提出了流形引导蒸馏(ManifoldGD),这是一个免训练的、基于扩散的框架,在每个去噪时间步集成了流形一致性引导。我们的方法通过对VAE潜在特征进行分层分裂聚类来计算IPC,产生一个多尺度的IPC核心集,同时捕捉粗糙的语义模式和精细的类内变异。利用提取的IPC质心的局部邻域,我们为每个扩散去噪时间步构建潜在流形。在每个去噪步骤中,我们将模式对齐向量投影到估计的潜在流形的局部切空间上,从而约束生成轨迹保持流形忠实,同时保持语义一致性。该公式提高了代表性、多样性和图像保真度,而无需任何模型再训练。实证结果表明,在FID、真实与合成数据集嵌入之间的l2距离以及分类准确性方面,相比现有的免训练和基于训练的基线取得了一致的提升,将ManifoldGD确立为第一个几何感知的免训练数据蒸馏框架。
摘要:In recent times, large datasets hinder efficient model training while also containing redundant concepts. Dataset distillation aims to synthesize compact datasets that preserve the knowledge of large-scale training sets while drastically reducing storage and computation. Recent advances in diffusion models have enabled training-free distillation by leveraging pre-trained generative priors; however, existing guidance strategies remain limited. Current score-based methods either perform unguided denoising or rely on simple mode-based guidance toward instance prototype centroids (IPC centroids), which often are rudimentary and suboptimal. We propose Manifold-Guided Distillation (ManifoldGD), a training-free diffusion-based framework that integrates manifold consistent guidance at every denoising timestep. Our method employs IPCs computed via a hierarchical, divisive clustering of VAE latent features, yielding a multi-scale coreset of IPCs that captures both coarse semantic modes and fine intra-class variability. Using a local neighborhood of the extracted IPC centroids, we create the latent manifold for each diffusion denoising timestep. At each denoising step, we project the mode-alignment vector onto the local tangent space of the estimated latent manifold, thus constraining the generation trajectory to remain manifold-faithful while preserving semantic consistency. This formulation improves representativeness, diversity, and image fidelity without requiring any model retraining. Empirical results demonstrate consistent gains over existing training-free and training-based baselines in terms of FID, l2 distance among real and synthetic dataset embeddings, and classification accuracy, establishing ManifoldGD as the first geometry-aware training-free data distillation framework.
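The guidance step described above, projecting the mode-alignment vector onto the local tangent space of an estimated latent manifold, can be sketched with a Gram-Schmidt basis built from neighbor-difference vectors. ManifoldGD's actual tangent-space estimator and neighborhood construction are not specified here; this is a hedged illustration of the projection itself.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(v, c):
    return [c * a for a in v]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def tangent_project(vec, centroid, neighbors, eps=1e-12):
    """Project `vec` onto the span of (neighbor - centroid) difference
    vectors -- a crude local tangent-space estimate in the spirit of
    ManifoldGD's guidance step. Builds an orthonormal basis with
    Gram-Schmidt, then sums the components of `vec` along each
    basis direction."""
    basis = []
    for nb in neighbors:
        d = sub(nb, centroid)
        for b in basis:
            d = sub(d, scale(b, dot(d, b)))  # remove already-spanned part
        n = dot(d, d) ** 0.5
        if n > eps:
            basis.append(scale(d, 1.0 / n))
    proj = [0.0] * len(vec)
    for b in basis:
        proj = [p + c for p, c in zip(proj, scale(b, dot(vec, b)))]
    return proj
```

Components of the guidance vector orthogonal to the local tangent plane are discarded, which is what keeps the denoising trajectory "manifold-faithful".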
【2】VAE-MS: An Asymmetric Variational Autoencoder for Mutational Signature Extraction
标题:VAE-MS:一种用于突变特征提取的非对称变分自动编码器
链接:https://arxiv.org/abs/2602.22239
作者:Ida Egendal,Rasmus Froberg Brøndum,Dan J Woodcock,Christopher Yau,Martin Bøgsted
备注:Keywords: Variational Autoencoders, Mutational Signatures
摘要:突变特征分析已成为揭示驱动癌症发展的潜在生物学过程的有力方法。然而,通常使用非负矩阵分解(NMF)执行的特征提取过程通常缺乏可靠性和临床适用性。为了解决这些限制,已经引入了几种解决方案,包括使用神经网络来实现更准确的估计,以及使用概率方法来更好地捕捉数据中的自然变化。在这项工作中,我们介绍了一个变异自动编码器突变签名(VAE-MS),一种新的模型,利用非对称架构和概率方法提取突变签名。VAE-MS与三种最先进的突变特征提取模型进行了比较:SigProfilerExtractor,基于NMF的黄金标准; MUSE-XAE,采用不含概率成分的非对称设计的自动编码器;以及SigneR,贝叶斯NMF模型,以说明非线性提取与概率模型相结合的优势。在重建输入数据和推广到未知数据的能力方面,具有概率组件的模型(VAE-MS,SigneR)大大优于没有概率组件的模型(SigProfilerExtractor,MUSE-XAE)。基于NMF的模型(SigneR,SigProfilerExtractor)在模拟数据中具有最准确的重建,而VAE-MS在真实癌症数据上重建得更准确。在评估一致地提取签名的能力时,没有模型表现出明显的优势。VAE-MS软件可在https://github.com/CLINDA-AAU/VAE-MS上获得。
摘要:Mutational signature analysis has emerged as a powerful method for uncovering the underlying biological processes driving cancer development. However, the signature extraction process, typically performed using non-negative matrix factorization (NMF), often lacks reliability and clinical applicability. To address these limitations, several solutions have been introduced, including the use of neural networks to achieve more accurate estimates and probabilistic methods to better capture natural variation in the data. In this work, we introduce a Variational Autoencoder for Mutational Signatures (VAE-MS), a novel model that leverages both an asymmetric architecture and probabilistic methods for the extraction of mutational signatures. VAE-MS is compared with three state-of-the-art models for mutational signature extraction: SigProfilerExtractor, the NMF-based gold standard; MUSE-XAE, an autoencoder that employs an asymmetric design without probabilistic components; and SigneR, a Bayesian NMF model, to illustrate the strength in combining a nonlinear extraction with a probabilistic model. In the ability to reconstruct input data and generalize to unseen data, models with probabilistic components (VAE-MS, SigneR) dramatically outperformed models without (SigProfilerExtractor, MUSE-XAE). The NMF-based models (SigneR, SigProfilerExtractor) had the most accurate reconstructions in simulated data, while VAE-MS reconstructed more accurately on real cancer data. Upon evaluating the ability to extract signatures consistently, no model exhibited a clear advantage over the others. Software for VAE-MS is available at https://github.com/CLINDA-AAU/VAE-MS.
推荐(4篇)
【1】From Agnostic to Specific: Latent Preference Diffusion for Multi-Behavior Sequential Recommendation
标题:从不可知到特定:多行为顺序推荐的潜在偏好扩散
链接:https://arxiv.org/abs/2602.23132
作者:Ruochen Yang,Xiaodong Li,Jiawei Sheng,Jiangxia Cao,Xinkui Lin,Shen Wang,Shuang Yang,Zhaojie Liu,Tingwen Liu
摘要:多行为顺序推荐(MBSR)旨在学习用户多行为序列之间的动态、异构交互,从而捕捉目标行为下的用户偏好,为下一次交互项目预测提供依据。与以前采用单向建模、将辅助行为映射到目标行为的方法不同,最近的关注正从行为固定转向行为特定的推荐。然而,这些方法仍然忽略了用户决策背后的潜在偏好,导致次优的解决方案。同时,由于项目和行为之间的不对称确定性,基于偏好评分的判别范式不适合捕捉从低熵行为到高熵项目的不确定性,无法提供高效且多样的推荐。为了应对这些挑战,我们提出了FatsMB,一个基于扩散模型的框架,在潜在空间中引导偏好生成从行为无关(Behavior-Agnostic)到行为特定(Behavior-Specific),从而实现多样化且准确的多行为顺序推荐。具体来说,我们设计了一个多行为自动编码器(MBAE)来构建统一的用户潜在偏好空间,促进跨行为的交互与协作,并采用行为感知RoPE(BaRoPE)进行多信息融合。随后,我们在潜在空间中进行目标行为特定的偏好转移,并以信息先验加以丰富。此外还引入了多条件引导层归一化(MCGLN)用于去噪。在真实数据集上的大量实验证明了该模型的有效性。
摘要:Multi-behavior sequential recommendation (MBSR) aims to learn the dynamic and heterogeneous interactions in users' multi-behavior sequences, so as to capture user preferences under the target behavior for next-item prediction. Unlike previous methods that adopt unidirectional modeling by mapping auxiliary behaviors to the target behavior, recent work is shifting from behavior-fixed to behavior-specific recommendation. However, these methods still ignore the user's latent preferences underlying decision-making, leading to suboptimal solutions. Meanwhile, due to the asymmetric determinism between items and behaviors, the discriminative paradigm based on preference scoring is unsuitable for capturing the uncertainty from low-entropy behaviors to high-entropy items, failing to provide efficient and diverse recommendation. To address these challenges, we propose \textbf{FatsMB}, a diffusion-based framework that guides preference generation \textit{\textbf{F}rom Behavior-\textbf{A}gnostic \textbf{T}o Behavior-\textbf{S}pecific} in latent spaces, enabling diverse and accurate \textit{\textbf{M}ulti-\textbf{B}ehavior sequential recommendation}. Specifically, we design a Multi-Behavior AutoEncoder (MBAE) to construct a unified user latent preference space, facilitating interaction and collaboration across behaviors, with Behavior-aware RoPE (BaRoPE) employed for multi-source information fusion. Subsequently, we conduct target-behavior-specific preference transfer in the latent space, enriched with informative priors. A Multi-Condition Guided Layer Normalization (MCGLN) is introduced for denoising. Extensive experiments on real-world datasets demonstrate the effectiveness of our model.
【2】SIGMA: A Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress
标题:SIGMA:全球速卖通的基于语义的指令驱动生成式多任务推荐器
链接:https://arxiv.org/abs/2602.22913
作者:Yang Yu,Lei Kou,Huaikuan Yi,Bin Chen,Yayu Cao,Lei Shen,Chao Zhang,Bing Wang,Xiaoyi Zeng
摘要:随着大语言模型的快速发展,生成式推荐正在逐渐重塑推荐系统的范式。然而,大多数现有的方法仍然局限于交互驱动的下一个项目预测范式,无法快速适应不断变化的趋势,也无法解决现实场景中多样的推荐任务及特定业务需求。为此,我们在AliExpress上展示了SIGMA,一个基于语义的指令驱动生成式多任务推荐器。具体来说,我们首先通过一个同时捕捉语义关系和协作关系的统一潜在空间,将项目实体锚定在通用语义中。在此基础上,我们开发了一种混合项目标记化方法,用于精确建模和高效生成。此外,我们构建了一个大规模的多任务SFT数据集,使SIGMA能够通过指令跟随满足各种推荐需求。最后,我们设计了一个三步的项目生成过程,集成了自适应概率融合机制,根据特定任务对推荐精度和多样性的要求来校准输出分布。大量的离线实验和在线A/B测试证明了SIGMA的有效性。
摘要:With the rapid evolution of Large Language Models, generative recommendation is gradually reshaping the paradigm of recommender systems. However, most existing methods are still confined to the interaction-driven next-item prediction paradigm, failing to rapidly adapt to evolving trends or address diverse recommendation tasks along with business-specific requirements in real-world scenarios. To this end, we present SIGMA, a Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress. Specifically, we first ground item entities in general semantics via a unified latent space capturing both semantic and collaborative relations. Building upon this, we develop a hybrid item tokenization method for precise modeling and efficient generation. Moreover, we construct a large-scale multi-task SFT dataset to empower SIGMA to fulfill various recommendation demands via instruction-following. Finally, we design a three-step item generation procedure integrated with an adaptive probabilistic fusion mechanism to calibrate the output distributions based on task-specific requirements for recommendation accuracy and diversity. Extensive offline experiments and online A/B tests demonstrate the effectiveness of SIGMA.
【3】Generative Recommendation for Large-Scale Advertising
标题:针对大规模广告的生成性推荐
链接:https://arxiv.org/abs/2602.22732
作者:Ben Xue,Dan Liu,Lixiang Wang,Mingjie Sun,Peng Wang,Pengfei Zhang,Shaoyun Shi,Tianyu Xu,Yunhao Sha,Zhiqiang Liu,Bo Kong,Bo Wang,Hang Yang,Jieting Xue,Junhao Wang,Shengyu Wang,Shuping Hui,Wencai Ye,Xiao Lin,Yongzhi Li,Yuhang Chen,Zhihui Yin,Quan Chen,Shiyang Wen,Wenjin Wu,Han Li,Guorui Zhou,Changcheng Li,Peng Jiang
备注:13 pages, 6 figures, under review
摘要:生成式推荐由于其潜在的可扩展性和更强的模型容量,近年来引起了业界的广泛关注。然而,在大规模广告中部署实时生成式推荐需要超越大型语言模型(LLM)风格的训练和服务食谱的设计。我们提出了一个面向生产的生成式推荐器,它是跨架构、学习和服务共同设计的,名为GR4AD(Generative Recommendation for ADvertising)。在标记化方面,GR4AD提出了UA-SID(Unified Advertisement Semantic ID,统一广告语义ID)来捕获复杂的业务信息。此外,GR4AD还引入了LazyAR,这是一种惰性自回归解码器,它可以放松逐层依赖性,以进行简短的多候选生成,在降低推理成本的同时保持有效性,这有助于在固定的服务预算下进行扩展。为了使优化与业务价值保持一致,GR4AD采用了VSL(价值感知监督学习)并提出了RSPO(排名引导Softmax偏好优化),这是一种排名感知的列表强化学习算法,可在列表级指标下优化基于价值的奖励,以实现持续在线更新。对于在线推理,我们进一步提出了动态波束服务,它适应跨代水平的波束宽度和在线负载来控制计算。大规模的在线A/B测试显示,与现有的基于DLRM的堆栈相比,广告收入提高了4.2%,并且从模型扩展和推理时间扩展中获得了一致的收益。GR4AD已在快手广告系统全面部署,用户数超过4亿,实现高吞吐量实时服务。
摘要:Generative recommendation has recently attracted widespread attention in industry due to its potential for scaling and stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs beyond large-language-model (LLM)-style training and serving recipes. We present a production-oriented generative recommender co-designed across architecture, learning, and serving, named GR4AD (Generative Recommendation for ADvertising). As for tokenization, GR4AD proposes UA-SID (Unified Advertisement Semantic ID) to capture complicated business information. Furthermore, GR4AD introduces LazyAR, a lazy autoregressive decoder that relaxes layer-wise dependencies for short, multi-candidate generation, preserving effectiveness while reducing inference cost, which facilitates scaling under fixed serving budgets. To align optimization with business value, GR4AD employs VSL (Value-Aware Supervised Learning) and proposes RSPO (Ranking-Guided Softmax Preference Optimization), a ranking-aware, list-wise reinforcement learning algorithm that optimizes value-based rewards under list-level metrics for continual online updates. For online inference, we further propose dynamic beam serving, which adapts beam width across generation levels and online load to control compute. Large-scale online A/B tests show up to 4.2% ad revenue improvement over an existing DLRM-based stack, with consistent gains from both model scaling and inference-time scaling. GR4AD has been fully deployed in the Kuaishou advertising system with over 400 million users and achieves high-throughput real-time serving.
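One plausible shape for the "dynamic beam serving" idea, narrowing the beam at deeper generation levels and backing off under high serving load, is sketched below. The halving schedule, the load back-off factor, and the parameter names are assumptions made here; the production policy is not described at this level of detail.

```python
def beam_width(level, load, base=8, min_width=1):
    """Toy dynamic beam policy: halve the beam at each deeper
    generation level and shrink it further as serving load rises.
    `load` in [0, 1] is the fraction of serving capacity in use."""
    width = base // (2 ** level)                 # halve the beam per level
    width = int(width * (1.0 - 0.5 * min(load, 1.0)))  # back off under load
    return max(min_width, width)                 # never drop below one beam
```

Under this toy policy, an idle server explores widely at the first level and progressively commits, while a saturated server spends half the compute at every level.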
【4】From Bias to Balance: Fairness-Aware Paper Recommendation for Equitable Peer Review
标题:从偏见到平衡:公平同行评审的公平意识论文建议
链接:https://arxiv.org/abs/2602.22438
作者:Uttamasha Anjally Oyshi,Susan Gauch
摘要:尽管频繁的双盲审查,与作者人口统计学相关的系统性偏见仍然不利于代表性不足的群体。我们从一个简单的假设开始:如果一个评论后推荐器是用一个显式的公平正则化器训练的,它应该在不降低质量的情况下增加包含性。为了测试这一点,我们引入了Fair-PaperRec,这是一种多层感知器(MLP),在交叉属性上具有可区分的公平性损失(例如,种族、国家),在双盲审查后对论文进行重新排名。我们首先探讨了跨越高,中等和接近公平的偏见的合成数据集的假设。在多个随机运行中,这些对照研究映射了增加公平性权重增强宏观/微观多样性的地方,同时保持效用近似稳定,证明了在不同差异水平下的鲁棒性和适应性。然后,我们将假设带入原始环境、来自ACM人机交互特别兴趣小组(SIGCHI)、设计交互系统(DIS)和智能用户界面(IUI)的会议数据。在这个真实世界的场景中,适当调整的Fair-PaperRec配置实现了代表性不足的群体参与率增加42.03%,相对于历史选择,总体效用最多变化3.16%。综合来看,从合成到原始的进展表明,公平正则化既可以作为一种公平机制,也可以作为一种温和的质量正则化器,特别是在高度偏见的制度。通过首先分析受控条件下公平性参数的行为,然后在真实投稿中验证它们,Fair-PaperRec为审稿后论文选择提供了一个实用的,以公平为中心的框架,该框架保留了,在某些情况下甚至可以提高,测量的学术质量。
摘要:Despite frequent double-blind review, systemic biases related to author demographics still disadvantage underrepresented groups. We start from a simple hypothesis: if a post-review recommender is trained with an explicit fairness regularizer, it should increase inclusion without degrading quality. To test this, we introduce Fair-PaperRec, a Multi-Layer Perceptron (MLP) with a differentiable fairness loss over intersectional attributes (e.g., race, country) that re-ranks papers after double-blind review. We first probe the hypothesis on synthetic datasets spanning high, moderate, and near-fair biases. Across multiple randomized runs, these controlled studies map where increasing the fairness weight strengthens macro/micro diversity while keeping utility approximately stable, demonstrating robustness and adaptability under varying disparity levels. We then carry the hypothesis into the original setting, conference data from ACM Special Interest Group on Computer-Human Interaction (SIGCHI), Designing Interactive Systems (DIS), and Intelligent User Interfaces (IUI). In this real-world scenario, an appropriately tuned configuration of Fair-PaperRec achieves up to a 42.03% increase in underrepresented-group participation with at most a 3.16% change in overall utility relative to the historical selection. Taken together, the synthetic-to-original progression shows that fairness regularization can act as both an equity mechanism and a mild quality regularizer, especially in highly biased regimes. By first analyzing the behavior of the fairness parameters under controlled conditions and then validating them on real submissions, Fair-PaperRec offers a practical, equity-focused framework for post-review paper selection that preserves, and in some settings can even enhance, measured scholarly quality.
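A toy analogue of the fairness-aware re-ranking objective: greedily select papers by quality plus a diminishing bonus for groups not yet represented. Fair-PaperRec itself trains an MLP with a differentiable fairness loss over intersectional attributes; the greedy scoring rule and the `lam` weight below are illustrative stand-ins, not the paper's method.

```python
def fair_select(papers, k, lam=0.5):
    """Greedy post-review selection trading off utility against group
    representation: at each step, pick the paper maximizing
    quality + lam * bonus, where the bonus shrinks as a group
    accumulates selections. `papers` is a list of (quality, group)."""
    chosen, counts = [], {}
    for _ in range(k):
        best, best_score = None, float("-inf")
        for i, (q, g) in enumerate(papers):
            if i in chosen:
                continue
            bonus = lam / (1 + counts.get(g, 0))  # favor underrepresented groups
            if q + bonus > best_score:
                best, best_score = i, q + bonus
        chosen.append(best)
        g = papers[best][1]
        counts[g] = counts.get(g, 0) + 1
    return chosen
```

With `lam=0` this reduces to pure quality ranking; raising `lam` trades a small amount of measured utility for broader group participation, mirroring the utility/diversity trade-off the paper maps.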
聚类(2篇)
【1】Hypernetwork-based approach for grid-independent functional data clustering
标题:基于超网络的网格无关函数数据聚类方法
链接:https://arxiv.org/abs/2602.22823
作者:Anirudh Thatipelli,Ali Siahkoohi
摘要:函数数据聚类关注的是对共享相似结构的函数进行分组,然而大多数现有方法都隐式地对采样网格进行操作,导致聚类分配取决于分辨率,采样密度或预处理选择,而不是底层函数本身。为了解决这一限制,我们引入了一个框架,通过自动编码架构将离散化函数观测-以任意分辨率和任意网格-映射到固定维向量空间。编码器是一个超网络,它将坐标值对映射到隐式神经表示(INR)的权重空间,后者充当解码器。由于INR表示具有非常少的参数的函数,因此这种设计产生了与采样网格解耦的紧凑表示,而超网络将权重预测分摊到整个数据集。然后使用标准算法在该权重空间中执行聚类,使得该方法对离散化和聚类方法的选择都是不可知的。通过在高维环境中进行合成和真实世界的实验,我们展示了竞争性聚类性能,该性能对采样分辨率的变化具有鲁棒性-包括泛化到训练过程中看不到的分辨率。
摘要:Functional data clustering is concerned with grouping functions that share similar structure, yet most existing methods implicitly operate on sampled grids, causing cluster assignments to depend on resolution, sampling density, or preprocessing choices rather than on the underlying functions themselves. To address this limitation, we introduce a framework that maps discretized function observations -- at arbitrary resolution and on arbitrary grids -- into a fixed-dimensional vector space via an auto-encoding architecture. The encoder is a hypernetwork that maps coordinate-value pairs to the weight space of an implicit neural representation (INR), which serves as the decoder. Because INRs represent functions with very few parameters, this design yields compact representations that are decoupled from the sampling grid, while the hypernetwork amortizes weight prediction across the dataset. Clustering is then performed in this weight space using standard algorithms, making the approach agnostic to both the discretization and the choice of clustering method. By means of synthetic and real-world experiments in high-dimensional settings, we demonstrate competitive clustering performance that is robust to changes in sampling resolution -- including generalization to resolutions not seen during training.
【2】Low-degree Lower bounds for clustering in moderate dimension
标题:低度中等维度聚集的下限
链接:https://arxiv.org/abs/2602.23023
作者:Alexandra Carpentier,Nicolas Verzelen
摘要:我们研究将从$\mathbb{R}^d$中各向同性高斯混合分布中抽取的$n$个点聚类为$K$组这一基本问题。具体来说,我们研究部分恢复底层划分所需的均值向量之间的最小距离$Δ$。虽然$Δ$的极小极大最优阈值已经确定,但该信息论极限与已知多项式时间算法的性能之间存在显著差距。虽然这一差距最近在高维情形($n \leq dK$)中得到了刻画,但在中维情形($n \geq dK$)中仍然在很大程度上未被探索。在本文中,我们研究这一情形,在$d \geq K$时为中维情况建立了一个新的低次多项式下界。我们发现,虽然$n \leq dK$时聚类的困难主要由降维和谱方法驱动,但中维情形涉及更微妙的现象,导致一种"非参数速率"。我们提供了一种匹配该速率的新型非谱算法,为中等维度下聚类问题的计算极限提供了新的视角。
摘要:We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $Δ$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $Δ$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.
超分辨率|去噪|去模糊|去雾(1篇)
【1】HARU-Net: Hybrid Attention Residual U-Net for Edge-Preserving Denoising in Cone-Beam Computed Tomography
标题:HARU-Net:用于锥束计算机断层扫描中边缘保留去噪的混合注意力残差U-Net
链接:https://arxiv.org/abs/2602.22544
作者:Khuram Naveed,Ruben Pauwels
摘要:锥形束计算机断层扫描(CBCT)广泛用于牙科和颌面部成像,但低剂量采集会引入强烈的空间变化噪声,降低软组织的可见性并模糊精细的解剖结构。传统的去噪方法难以在CBCT中抑制噪声的同时保留边缘。尽管基于深度学习的方法提供了高保真恢复,但它们在CBCT去噪中的使用受到监督训练所需高分辨率CBCT数据稀缺的限制。为了解决这一研究空白,我们提出了一种新的混合注意力残差U-网络(HARU-Net),用于CBCT数据的高质量去噪,在使用3D Accuitomo 170(J. Morita,京都,日本)CBCT系统的高分辨率协议获得的人类半下颌骨尸体数据集上进行训练。这种方法的新贡献是集成了三个互补的架构组件:(i)嵌入在每个跳跃连接内的混合注意力Transformer块(HAB),以选择性地强调显著的解剖特征,(ii)在瓶颈处的残差混合注意力Transformer组(RHAG),以加强全局上下文建模和长距离特征交互,以及(iii)残差学习卷积块,以便于在整个网络中进行更深、更稳定的特征提取。HARU-Net始终优于SwinIR和Uformer等最先进(SOTA)方法,实现了最高的PSNR(37.52 dB)、最高的SSIM(0.9557)和最低的GMSD(0.1084)。这种有效且临床上可靠的CBCT去噪以显著低于SOTA方法的计算成本实现,为提高低剂量CBCT成像的诊断质量提供了实际的进步。
摘要:Cone-beam computed tomography (CBCT) is widely used in dental and maxillofacial imaging, but low-dose acquisition introduces strong, spatially varying noise that degrades soft-tissue visibility and obscures fine anatomical structures. Classical denoising methods struggle to suppress noise in CBCT while preserving edges. Although deep learning-based approaches offer high-fidelity restoration, their use in CBCT denoising is limited by the scarcity of high-resolution CBCT data for supervised training. To address this research gap, we propose a novel Hybrid Attention Residual U-Net (HARU-Net) for high-quality denoising of CBCT data, trained on a cadaver dataset of human hemimandibles acquired using a high-resolution protocol of the 3D Accuitomo 170 (J. Morita, Kyoto, Japan) CBCT system. The novel contribution of this approach is the integration of three complementary architectural components: (i) a hybrid attention transformer block (HAB) embedded within each skip connection to selectively emphasize salient anatomical features, (ii) a residual hybrid attention transformer group (RHAG) at the bottleneck to strengthen global contextual modeling and long-range feature interactions, and (iii) residual learning convolutional blocks to facilitate deeper, more stable feature extraction throughout the network. HARU-Net consistently outperforms state-of-the-art (SOTA) methods including SwinIR and Uformer, achieving the highest PSNR (37.52 dB), highest SSIM (0.9557), and lowest GMSD (0.1084). This effective and clinically reliable CBCT denoising is achieved at a computational cost significantly lower than that of the SOTA methods, offering a practical advancement toward improving diagnostic quality in low-dose CBCT imaging.
自动驾驶|车辆|车道检测等(2篇)
【1】Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving
标题:释放端到端自动驾驶扩散模型的潜力
链接:https://arxiv.org/abs/2602.22801
作者:Yinan Zheng,Tianyi Tan,Bin Huang,Enguang Liu,Ruiming Liang,Jianlin Zhang,Jianwei Cui,Guang Chen,Kun Ma,Hangjun Ye,Long Chen,Ya-Qin Zhang,Xianyuan Zhan,Jingjing Liu
摘要:扩散模型已成为机器人决策任务的热门选择,最近也被考虑用于解决自动驾驶任务。然而,它们在自动驾驶中的应用和评估仍然局限于基于模拟或实验室的环境。对于大规模、复杂的现实环境,如端到端自动驾驶(E2E AD),扩散模型的全部力量仍然没有得到充分的探索。在这项研究中,我们进行了系统的和大规模的调查,释放的潜力的扩散模型作为规划者的E2E AD,基于大量的实车数据和道路测试。通过全面和仔细控制的研究,我们确定了扩散损失空间,轨迹表示和数据缩放的关键见解,显着影响E2E规划性能。此外,我们还提供了一种有效的强化学习后训练策略,以进一步提高学习规划器的安全性。由此产生的基于扩散的学习框架Hyper Diffusion Planner(HDP)部署在实车平台上,并在6个城市驾驶场景和200公里的真实世界测试中进行了评估,与基础模型相比,性能显著提高了10倍。我们的工作表明,经过适当设计和训练的扩散模型可以作为复杂的、真实世界的自动驾驶任务的有效和可扩展的E2E AD规划器。
摘要:Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of the diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.
【2】Positional-aware Spatio-Temporal Network for Large-Scale Traffic Prediction
标题:用于大规模交通预测的位置感知时空网络
链接:https://arxiv.org/abs/2602.22274
作者:Runfei Chen
备注:Accepted for the 104th Transportation Research Board (TRB) Annual Meeting in 2025
摘要:交通流预测已成为人们日常生活中不可或缺的一项任务,它需要利用图结构下的时间段内各个位置之间的时空关系来预测未来的流量。然而,更广泛的地理区域和更长的时间跨度的大量出行需求要求模型能够清楚地区分每个节点,并拥有对历史的整体看法,这在以前的工作中很少受到关注。此外,不断增长的数据规模阻碍了大多数模型在实际应用环境中的部署。为此,在本文中,我们提出了一个轻量级的位置感知时空网络(PASTN),以端到端的方式有效地捕捉时间和空间的复杂性。PASTN引入位置感知嵌入来区分每个节点的表示,同时利用时间注意力模块来改善现有模型的长程感知。大量的实验验证了PASTN在不同规模(县、特大城市和州)数据集上的有效性和效率。进一步的分析也证明了新引入模块的功效。
摘要:Traffic flow forecasting has emerged as an indispensable mission for daily life, which is required to utilize the spatiotemporal relationship between each location within a time period under a graph structure to predict future flow. However, the large travel demand for broader geographical areas and longer time spans requires models to distinguish each node clearly and possess a holistic view of the history, which has been paid less attention to in prior works. Furthermore, increasing sizes of data hinder the deployment of most models in real application environments. To this end, in this paper, we propose a lightweight Positional-aware Spatio-Temporal Network (PASTN) to effectively capture both temporal and spatial complexities in an end-to-end manner. PASTN introduces positional-aware embeddings to separate each node's representation, while also utilizing a temporal attention module to improve the long-range perception of current models. Extensive experiments verify the effectiveness and efficiency of PASTN across datasets of various scales (county, megalopolis and state). Further analysis demonstrates the efficacy of the newly introduced modules as well.
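PASTN's positional-aware embeddings give every sensor node a distinguishable representation. As a hedged sketch, a deterministic sinusoidal embedding indexed by node id achieves the same separation property; the paper most likely learns its embeddings, so the sinusoidal form here is an assumption.

```python
import math

def positional_embedding(node_id, dim):
    """Deterministic sinusoidal embedding per node id: even dimensions
    use sine, odd dimensions use cosine, at geometrically spaced
    frequencies (the Transformer-style recipe, used here only as an
    illustration of per-node separation)."""
    return [
        math.sin(node_id / (10000 ** (2 * (i // 2) / dim))) if i % 2 == 0
        else math.cos(node_id / (10000 ** (2 * (i // 2) / dim)))
        for i in range(dim)
    ]
```

Each node then contributes a unique offset to its feature vector before the spatial and temporal attention modules, which is the property the embedding is meant to provide.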
联邦学习|隐私保护|加密(4篇)
【1】SettleFL: Trustless and Scalable Reward Settlement Protocol for Federated Learning on Permissionless Blockchains (Extended version)
标题:SettleFL:无许可区块链上联邦学习的无信任且可扩展的奖励结算协议(扩展版本)
链接:https://arxiv.org/abs/2602.23167
作者:Shuang Liang,Yang Hua,Linshan Jiang,Peishen Yan,Tao Song,Bin Yao,Haibing Guan
摘要:在开放的联邦学习(FL)环境中,没有中央权威机构存在,确保协作公平性依赖于分散的奖励结算,但无许可区块链的高昂成本直接与模型训练的高频迭代性质相冲突。现有的解决方案要么妥协分散化,要么由于线性链上成本而遭受可扩展性瓶颈。为了解决这个问题,我们提出了SettleFL,这是一个无信任且可扩展的奖励结算协议,旨在通过提供两个可互操作的协议来最大限度地减少总的经济摩擦。利用共享的特定于域的电路架构,SettleFL提供了两种可互操作的策略:(1)提交和挑战(Commit-and-Challenge)变体,通过乐观执行和争议驱动仲裁来最大限度地减少链上成本,以及(2)提交证明(Commit-with-Proof)变体,通过每轮有效性证明来保证即时终结。这种设计允许协议灵活地适应不同的延迟和成本约束,同时在没有可信协调的情况下实施理性鲁棒性。我们进行了大量的实验,结合真实FL工作负载和受控模拟。结果表明,SettleFL在扩展到800名参与者时仍然实用,实现了大幅降低的链上Gas成本。
摘要:In open Federated Learning (FL) environments where no central authority exists, ensuring collaboration fairness relies on decentralized reward settlement, yet the prohibitive cost of permissionless blockchains directly clashes with the high-frequency, iterative nature of model training. Existing solutions either compromise decentralization or suffer from scalability bottlenecks due to linear on-chain costs. To address this, we present SettleFL, a trustless and scalable reward settlement protocol designed to minimize total economic friction by offering a family of two interoperable protocols. Leveraging a shared domain-specific circuit architecture, SettleFL offers two interoperable strategies: (1) a Commit-and-Challenge variant that minimizes on-chain costs via optimistic execution and dispute-driven arbitration, and (2) a Commit-with-Proof variant that guarantees instant finality through per-round validity proofs. This design allows the protocol to flexibly adapt to varying latency and cost constraints while enforcing rational robustness without trusted coordination. We conduct extensive experiments combining real FL workloads and controlled simulations. Results show that SettleFL remains practical when scaling to 800 participants, achieving substantially lower gas cost.
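The Commit-and-Challenge flow can be illustrated with a bare hash commitment: the aggregator commits to a per-client reward vector, and any participant can later dispute a reveal that does not match. SettleFL's real protocol uses domain-specific circuits and on-chain arbitration rather than a plain SHA-256 hash; the functions below are a simplified sketch of the commit/dispute mechanics only.

```python
import hashlib

def commit(reward_vector, salt):
    """Commitment to a claimed reward vector: hash of a canonical
    serialization plus a salt. Stands in for SettleFL's on-chain
    commitment (a deliberate simplification)."""
    payload = ",".join(f"{cid}:{amt}" for cid, amt in sorted(reward_vector.items()))
    return hashlib.sha256((payload + "|" + salt).encode()).hexdigest()

def challenge(commitment, revealed_rewards, salt):
    """Dispute check: recompute the commitment from the revealed
    rewards and salt; a mismatch proves the committer lied."""
    return commit(revealed_rewards, salt) == commitment
```

In the optimistic path no one challenges and only the commitment touches the chain; a dispute triggers the (more expensive) verification, which is what keeps the amortized on-chain cost low.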
【2】Tackling Privacy Heterogeneity in Differentially Private Federated Learning
标题:差分私有联邦学习中隐私异质性的解决
链接:https://arxiv.org/abs/2602.22633
作者:Ruichen Xu,Ying-Jun Angela Zhang,Jianwei Huang
摘要:差分私有联邦学习(DP-FL)使客户能够协作训练机器学习模型,同时保护其本地数据的隐私。然而,大多数现有的DP-FL方法假设所有客户端共享统一的隐私预算,这一假设在隐私要求差异很大的现实场景中并不成立。这种隐私异质性带来了一个重大的挑战:传统的客户端选择策略,通常依赖于数据量,不能区分客户端提供高质量的更新和那些引入大量的噪音,由于严格的隐私限制。为了解决这一差距,我们首次在DP-FL中对隐私感知客户选择进行了系统研究。我们通过推导量化隐私异质性对训练错误影响的收敛分析来建立理论基础。在此分析的基础上,我们提出了一个隐私意识的客户端选择策略,制定为一个凸优化问题,自适应地调整选择概率,以尽量减少训练误差。对基准数据集进行的广泛实验表明,在异构隐私预算下,与现有基线相比,我们的方法在CIFAR-10上的测试准确性提高了10%。这些结果强调了将隐私异质性纳入客户端选择的重要性,以实现实用和有效的联邦学习。
摘要:Differentially private federated learning (DP-FL) enables clients to collaboratively train machine learning models while preserving the privacy of their local data. However, most existing DP-FL approaches assume that all clients share a uniform privacy budget, an assumption that does not hold in real-world scenarios where privacy requirements vary widely. This privacy heterogeneity poses a significant challenge: conventional client selection strategies, which typically rely on data quantity, cannot distinguish between clients providing high-quality updates and those introducing substantial noise due to strict privacy constraints. To address this gap, we present the first systematic study of privacy-aware client selection in DP-FL. We establish a theoretical foundation by deriving a convergence analysis that quantifies the impact of privacy heterogeneity on training error. Building on this analysis, we propose a privacy-aware client selection strategy, formulated as a convex optimization problem, that adaptively adjusts selection probabilities to minimize training error. Extensive experiments on benchmark datasets demonstrate that our approach achieves up to a 10% improvement in test accuracy on CIFAR-10 compared to existing baselines under heterogeneous privacy budgets. These results highlight the importance of incorporating privacy heterogeneity into client selection for practical and effective federated learning.
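A closed-form heuristic in the spirit of the privacy-aware selection described above: since Gaussian DP noise standard deviation scales roughly as 1/epsilon, noise variance scales as 1/epsilon^2, so a client's usefulness can be weighted by data size divided by noise variance. The paper instead solves a convex program derived from its convergence bound; the formula below is only an illustrative approximation with assumed names.

```python
def selection_probs(epsilons, data_sizes):
    """Heuristic privacy-aware selection probabilities: weight each
    client by data size times epsilon squared (i.e., data size over
    DP noise variance), then normalize to a distribution."""
    weights = [n * (eps ** 2) for n, eps in zip(data_sizes, epsilons)]
    total = sum(weights)
    return [w / total for w in weights]
```

Two clients with equal data but privacy budgets 1.0 and 2.0 would be sampled with probabilities 0.2 and 0.8: the stricter-privacy client contributes noisier updates and is picked less often.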
【3】Beyond performance-wise Contribution Evaluation in Federated Learning
标题:超越联邦学习中的绩效贡献评估
链接:https://arxiv.org/abs/2602.22470
作者:Balazs Pejo
Abstract: Federated learning offers a privacy-friendly collaborative learning framework, yet its success, like any joint venture, hinges on the contributions of its participants. Existing client evaluation methods predominantly focus on model performance, such as accuracy or loss, which represents only one dimension of a machine learning model's overall utility. In contrast, this work investigates the critical, yet overlooked, issue of client contributions towards a model's trustworthiness -- specifically, its reliability (tolerance to noisy data), resilience (resistance to adversarial examples), and fairness (measured via demographic parity). To quantify these multifaceted contributions, we employ the state-of-the-art approximation of the Shapley value, a principled method for value attribution. Our results reveal that no single client excels across all dimensions, which are largely independent from each other, highlighting a critical flaw in current evaluation schemes: no single metric is adequate for comprehensive evaluation and equitable reward allocation.
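The Shapley machinery referenced above can be sketched with the generic Monte Carlo permutation estimator. The paper uses a state-of-the-art approximation; the plain permutation sampler below is a simplified stand-in that works for any coalition metric, whether accuracy, robustness, or demographic parity.

```python
import random

def shapley_mc(clients, metric, rounds=2000, seed=0):
    """Monte Carlo permutation estimate of each client's Shapley value
    for an arbitrary coalition metric (a function of a client subset)."""
    rng = random.Random(seed)
    value = {c: 0.0 for c in clients}
    for _ in range(rounds):
        perm = clients[:]
        rng.shuffle(perm)
        coalition, prev = [], metric([])
        for c in perm:
            coalition.append(c)
            cur = metric(coalition)
            value[c] += cur - prev  # marginal contribution of c
            prev = cur
    return {c: v / rounds for c, v in value.items()}

# Toy additive metric: true contributions are the weights themselves.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
phi = shapley_mc(list(weights), lambda S: sum(weights[c] for c in S))
```

For the additive toy metric, each client's estimated value recovers its weight, and the values sum to the grand-coalition metric (the efficiency property).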
【4】CQSA: Byzantine-robust Clustered Quantum Secure Aggregation in Federated Learning
Link: https://arxiv.org/abs/2602.22269
Authors: Arnab Nath, Harsh Kasyap
Note: 6 pages, 3 figures
Abstract: Federated Learning (FL) enables collaborative model training without sharing raw data. However, shared local model updates remain vulnerable to inference and poisoning attacks. Secure aggregation schemes have been proposed to mitigate these attacks. In this work, we aim to understand how these techniques are implemented in quantum-assisted FL. Quantum Secure Aggregation (QSA) has been proposed, offering information-theoretic privacy by encoding client updates into the global phase of multipartite entangled states. Existing QSA protocols, however, rely on a single global Greenberger-Horne-Zeilinger (GHZ) state shared among all participating clients. This design poses fundamental challenges: (i) the fidelity of large-scale GHZ states deteriorates rapidly with the increasing number of clients; and (ii) the global aggregation prevents the detection of Byzantine clients. We propose Clustered Quantum Secure Aggregation (CQSA), a modular aggregation framework that reconciles the physical constraints of near-term quantum hardware with the need for Byzantine-robustness in FL. CQSA randomly partitions the clients into small clusters, each performing local quantum aggregation using high-fidelity, low-qubit GHZ states. The server analyzes statistical relationships between cluster-level aggregates, employing common statistical measures such as cosine similarity and Euclidean distance, to identify malicious contributions. Through theoretical analysis and simulations under depolarizing noise, we demonstrate that CQSA ensures stable model convergence and achieves superior state fidelity over global QSA.
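Server-side screening of cluster-level aggregates can be sketched as follows. The coordinate-wise median reference and the zero cosine-similarity threshold are illustrative choices, not the paper's exact statistical rule.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flag_clusters(aggregates, threshold=0.0):
    # Compare each cluster-level aggregate against the coordinate-wise
    # median of the others; low cosine similarity marks a suspect cluster.
    suspects = []
    for i, agg in enumerate(aggregates):
        others = [a for j, a in enumerate(aggregates) if j != i]
        median = [sorted(col)[len(col) // 2] for col in zip(*others)]
        if cosine(agg, median) < threshold:
            suspects.append(i)
    return suspects

benign = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.0], [1.1, 0.9, 1.0]]
flags = flag_clusters(benign + [[-1.0, -1.0, -1.0]])  # last cluster poisoned
```

The sign-flipped aggregate (index 3) is the only cluster flagged.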
Reasoning|Analysis|Understanding|Explanation (10 papers)
【1】Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
Link: https://arxiv.org/abs/2602.23197
Authors: Chungpa Lee, Jy-yong Sohn, Kangwook Lee
Abstract: Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We further show that incorporating an auxiliary few-shot loss enhances in-context learning primarily on the target task, at the expense of degraded in-context learning ability on tasks not seen during fine-tuning. We empirically validate our theoretical results.
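The "value-matrix-only" fine-tuning regime identified by the analysis amounts to masking gradients for the other attention parameter blocks. A toy sketch with 1x1 matrices and a plain SGD step; the parameter names and shapes are illustrative.

```python
def mask_attention_grads(grads, trainable=("value",)):
    # Keep gradients only for the blocks deemed safe to fine-tune
    # (the value matrix); zero out query/key updates.
    return {name: (g if name in trainable else [[0.0] * len(g[0]) for _ in g])
            for name, g in grads.items()}

def sgd_step(params, grads, lr=0.1):
    return {name: [[p - lr * q for p, q in zip(pr, gr)]
                   for pr, gr in zip(params[name], grads[name])]
            for name in params}

params = {"query": [[1.0]], "key": [[1.0]], "value": [[1.0]]}
grads = {"query": [[2.0]], "key": [[2.0]], "value": [[2.0]]}
new_params = sgd_step(params, mask_attention_grads(grads))
```

After the step, only the value matrix has moved; query and key are untouched.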
【2】MoDora: Tree-Based Semi-Structured Document Analysis System
Link: https://arxiv.org/abs/2602.23061
Authors: Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li, Conghui He, Fan Wu
Note: Extension of our SIGMOD 2026 paper. Please refer to source code available at https://github.com/weAIDB/MoDora
Abstract: Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irregular layouts. These documents are widely observed across domains and account for a large portion of real-world data. However, existing methods struggle to support natural language question answering over these documents due to three main technical challenges: (1) The elements extracted by techniques like OCR are often fragmented and stripped of their original semantic context, making them inadequate for analysis. (2) Existing approaches lack effective representations to capture hierarchical structures within documents (e.g., associating tables with nested chapter titles) and to preserve layout-specific distinctions (e.g., differentiating sidebars from main content). (3) Answering questions often requires retrieving and aligning relevant information scattered across multiple regions or pages, such as linking a descriptive paragraph to table cells located elsewhere in the document. To address these issues, we propose MoDora, an LLM-powered system for semi-structured document analysis. First, we adopt a local-alignment aggregation strategy to convert OCR-parsed elements into layout-aware components, and conduct type-specific information extraction for components with hierarchical titles or non-text elements. Second, we design the Component-Correlation Tree (CCTree) to hierarchically organize components, explicitly modeling inter-component relations and layout distinctions through a bottom-up cascade summarization process. Finally, we propose a question-type-aware retrieval strategy that supports (1) layout-based grid partitioning for location-based retrieval and (2) LLM-guided pruning for semantic-based retrieval. Experiments show MoDora outperforms baselines by 5.97%-61.07% in accuracy. The code is at https://github.com/weAIDB/MoDora.
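The bottom-up cascade summarization behind the CCTree can be sketched with a placeholder summarizer, where string concatenation stands in for the LLM and each node is simplified to a title plus content.

```python
class CCNode:
    # Minimal Component-Correlation Tree node: children summaries are
    # cascaded upward so each node carries context for retrieval.
    def __init__(self, title, content="", children=None):
        self.title, self.content = title, content
        self.children = children or []
        self.summary = ""

    def summarize(self, summarizer=lambda texts: " | ".join(t for t in texts if t)):
        child_sums = [c.summarize() for c in self.children]
        self.summary = summarizer([self.title, self.content] + child_sums)
        return self.summary

doc = CCNode("Report", children=[
    CCNode("Chapter 1", "Sales grew 12%.",
           children=[CCNode("Table 1.1", "Quarterly revenue by region.")]),
    CCNode("Sidebar", "Methodology note."),
])
root_summary = doc.summarize()
```

The root summary now mentions every descendant component, including the nested table and the sidebar, which is what lets retrieval find content through its structural context.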
【3】MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction
Link: https://arxiv.org/abs/2602.22850
Authors: Yi He, Yina Cao, Jixiu Zhai, Di Wang, Junxiao Kong, Tianchi Lu
Abstract: Accurate computational identification of DNA methylation is essential for understanding epigenetic regulation. Although deep learning excels in this binary classification task, its "black-box" nature impedes biological insight. We address this by introducing a high-performance model MEDNA-DFM, alongside mechanism-inspired signal purification algorithms. Our investigation demonstrates that MEDNA-DFM effectively captures conserved methylation patterns, achieving robust distinction across diverse species. Validation on external independent datasets confirms that the model's generalization is driven by conserved intrinsic motifs (e.g., GC content) rather than phylogenetic proximity. Furthermore, applying our developed algorithms extracted motifs with significantly higher reliability than prior studies. Finally, empirical evidence from a Drosophila 6mA case study prompted us to propose a "sequence-structure synergy" hypothesis, suggesting that the GAGG core motif and an upstream A-tract element function cooperatively. We further validated this hypothesis via in silico mutagenesis, confirming that the ablation of either or both elements significantly degrades the model's recognition capabilities. This work provides a powerful tool for methylation prediction and demonstrates how explainable deep learning can drive both methodological innovation and the generation of biological hypotheses.
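The in silico mutagenesis step reduces to scoring a sequence before and after ablating a motif. Below, `toy_score` is a hypothetical stand-in for the trained model that hard-codes the hypothesized GAGG/A-tract synergy; only the ablation logic reflects the procedure itself.

```python
def ablate(seq, motif, replacement="N"):
    # Replace every occurrence of a motif with neutral bases.
    return seq.replace(motif, replacement * len(motif))

def importance(score_fn, seq, motif):
    # Motif importance = drop in model score after ablating the motif.
    return score_fn(seq) - score_fn(ablate(seq, motif))

def toy_score(seq):
    # Hypothetical scorer rewarding the GAGG core, the A-tract, and
    # (extra) their co-occurrence, mimicking the synergy hypothesis.
    return (0.5 * ("GAGG" in seq) + 0.2 * ("AAAA" in seq)
            + 0.3 * ("GAGG" in seq and "AAAA" in seq))

seq = "CCAAAATTGAGGTT"
drop_core = importance(toy_score, seq, "GAGG")
drop_tract = importance(toy_score, seq, "AAAA")
```

Ablating either element costs more than its standalone weight because the synergy term also vanishes, which is the signature the case study looks for.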
【4】Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD
Link: https://arxiv.org/abs/2602.22611
Authors: Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen
Abstract: In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
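One way to picture the layer-wise reweighting: turn each shadow adversary's attack error into a risk estimate (error 0.5 means random guessing), then scale down high-risk layers so the fixed noise magnitude masks them relatively more. The specific mapping below is illustrative; the paper derives its weights from a convergence analysis.

```python
def layer_scales(attack_error_rates, eps=1e-3):
    # Attack error 0.5 = random guessing (low risk); lower error = higher risk.
    risks = [max(0.0, 1.0 - 2.0 * err) for err in attack_error_rates]
    # Illustrative rule: scale each layer's clipped-gradient contribution
    # inversely to its risk, so fixed-magnitude noise dominates risky layers.
    inv = [1.0 / (r + eps) for r in risks]
    m = max(inv)
    return [x / m for x in inv]

# Shadow-adversary error rates per layer: layer 0 is nearly safe,
# layer 2 is highly vulnerable.
scales = layer_scales([0.45, 0.30, 0.10])
```

The safest layer keeps its full contribution while the most vulnerable one is attenuated hardest.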
【5】SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
Link: https://arxiv.org/abs/2602.22603
Authors: Sanjay Kariyappa, G. Edward Suh
Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to grow rapidly and limiting decode performance. While several KV cache compression techniques exist for long-context inputs, we find that existing heuristics fail to support multi-step reasoning models effectively. We address this challenge with SideQuest -- a novel approach that leverages the Large Reasoning Model (LRM) itself to perform KV cache compression by reasoning about the usefulness of tokens in its context. To prevent the tokens associated with this management process from polluting the model's memory, we frame KV cache compression as an auxiliary task executed in parallel to the main reasoning task. Our evaluations, using a model trained with just 215 samples, show that SideQuest reduces peak token usage by up to 65% on agentic tasks with minimal degradation in accuracy, outperforming heuristic-based KV cache compression techniques.
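Mechanically, KV cache compression with model-assigned scores is top-k eviction. In SideQuest the usefulness scores come from the reasoning model itself via the auxiliary task; in this sketch they are simply given as inputs.

```python
def compress_kv(cache, scores, budget):
    """Keep only the `budget` highest-scoring entries of a KV cache.
    `cache` maps token position -> (key, value); `scores` holds the
    model-assigned usefulness of each position."""
    keep = sorted(cache, key=lambda pos: scores[pos], reverse=True)[:budget]
    return {pos: cache[pos] for pos in sorted(keep)}

cache = {0: ("k0", "v0"), 1: ("k1", "v1"), 2: ("k2", "v2"), 3: ("k3", "v3")}
scores = {0: 0.9, 1: 0.1, 2: 0.7, 3: 0.2}
pruned = compress_kv(cache, scores, budget=2)
```

Only the two highest-scored positions survive, in original order, which preserves the positional structure the attention mechanism expects.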
【6】RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Link: https://arxiv.org/abs/2602.22538
Authors: Zhehao Huang, Yuhang Liu, Baijiong Lin, Yixin Lou, Zhengbao He, Hanling Tian, Tao Li, Xiaolin Huang
Note: 41 pages, ICLR 2026 Oral
Abstract: Large reasoning models (LRMs) excel at long chains of reasoning but often fail to faithfully follow instructions regarding output format, constraints, or specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting a lightweight merging with minimal interference. However, we also demonstrate that naive merges are fragile because they overlook the output format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answers-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving thinking format and reasoning performance. First, with a small reasoning calibration set, we project the ITM task vector onto the null space of forward features at thinking special tokens, which preserves the LRM's structured reasoning mechanisms. Second, using a small instruction calibration set, we estimate instruction attention to derive module-specific scaling that amplifies instruction-relevant components and suppresses leakage. Across four instruction-following benchmarks and nine reasoning & general capability benchmarks, RAIN-Merging substantially improves instruction adherence while maintaining reasoning quality. The gains are consistent across model scales and architectures, translating to improved performance in agent settings.
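The null-space projection step can be sketched for a single row of a task vector: remove its components along the span of the calibration features so that adding it leaves those forward activations unchanged. Toy 3-D vectors below; in the paper the projected object is the matrix-valued ITM task vector per module and the rows are forward features at thinking tokens.

```python
def project_to_null_space(v, feature_rows):
    # Remove from v its components along the span of the calibration
    # features, so the delta has no effect on those directions.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    basis = []
    for row in feature_rows:                 # Gram-Schmidt on the row space
        w = row[:]
        for b in basis:
            c = dot(w, b)
            w = [x - c * y for x, y in zip(w, b)]
        n = dot(w, w) ** 0.5
        if n > 1e-12:
            basis.append([x / n for x in w])
    out = v[:]
    for b in basis:                          # subtract projections onto span
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out

features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
projected = project_to_null_space([3.0, 4.0, 5.0], features)
```

Only the component orthogonal to the feature span survives, so the merged delta cannot perturb the calibration activations.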
【7】Calibrated Test-Time Guidance for Bayesian Inference
Link: https://arxiv.org/abs/2602.22428
Authors: Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, Stephan Mandt
Note: Preprint. Under review
Abstract: Test-time guidance is a widely used mechanism for steering pretrained diffusion models toward outcomes specified by a reward function. Existing approaches, however, focus on maximizing reward rather than sampling from the true Bayesian posterior, leading to miscalibrated inference. In this work, we show that common test-time guidance methods do not recover the correct posterior distribution and identify the structural approximations responsible for this failure. We then propose consistent alternative estimators that enable calibrated sampling from the Bayesian posterior. We significantly outperform previous methods on a set of Bayesian inference tasks, and match state-of-the-art in black hole image reconstruction.
【8】Reliable XAI Explanations in Sudden Cardiac Death Prediction for Chagas Cardiomyopathy
Link: https://arxiv.org/abs/2602.22288
Authors: Vinícius P. Chagas, Luiz H. T. Viana, Mac M. da S. Carlos, João P. V. Madeiro, Roberto C. Pedrosa, Thiago Alves Rocha, Carlos H. L. Cavalcante
Note: Preprint. For the final published version, see the DOI below
Abstract: Sudden cardiac death (SCD) is unpredictable, and its prediction in Chagas cardiomyopathy (CC) remains a significant challenge, especially in patients not classified as high risk. While AI and machine learning models improve risk stratification, their adoption is hindered by a lack of transparency, as they are often perceived as "black boxes" with unclear decision-making processes. Some approaches apply heuristic explanations without correctness guarantees, leading to mistakes in the decision-making process. To address this, we apply a logic-based explainability method with correctness guarantees to the problem of SCD prediction in CC. This explainability method, applied to an AI classifier with over 95% accuracy and recall, demonstrated strong predictive performance and 100% explanation fidelity. When compared to state-of-the-art heuristic methods, it showed superior consistency and robustness. This approach enhances clinical trust, facilitates the integration of AI-driven tools into practice, and promotes large-scale deployment, particularly in endemic regions where it is most needed.
【9】FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
Link: https://arxiv.org/abs/2602.22273
Authors: Xiyuan Zhang, Huihang Wu, Jiayu Guo, Zhenlin Zhang, Yiwei Zhang, Liangyu Huo, Xiaoxiao Ma, Jiansong Wan, Xuewei Jiao, Yi Jing, Jian Xie
Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios. For theoretical assessment, we curate a diverse set of examination questions drawn from widely recognized financial qualification exams, enabling evaluation of LLMs' deep understanding and application of financial knowledge. In addition, to assess the practical value of LLMs in real-world financial tasks, we propose a systematic evaluation matrix that categorizes complex financial domains and ensures coverage of essential subdomains and business activities. Based on this evaluation matrix, we collect 3,000 financial scenario questions, consisting of closed-form decision questions with reference answers and open-ended questions evaluated by predefined rubrics. We conduct comprehensive evaluations of state-of-the-art LLMs on the FIRE benchmark, including XuanYuan 4.0, our latest financial-domain model, as a strong in-domain baseline. These results enable a systematic analysis of the capability boundaries of current LLMs in financial applications. We publicly release the benchmark questions and evaluation code to facilitate future research.
【10】From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
Link: https://arxiv.org/abs/2602.22492
Authors: Gracielle Antunes de Araújo, Flávio B. Gonçalves
Note: 29 pages, 4 figures, 8 tables. Supplementary material included
Abstract: In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.
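The Nyström approximation replaces the full kernel matrix with K_nm K_mm^{-1} K_mn built from a few anchor points. A degenerate but exact illustration: with a rank-one kernel (a 1-D linear kernel), a single non-zero anchor already recovers the full matrix; for richer kernels the rank and the anchors control the cost-accuracy trade-off, as the abstract notes.

```python
def linear_kernel(x, z):
    return x * z

def nystrom_approx(points, anchors, kernel):
    # Rank-1 Nystrom: K_hat = K_nm K_mm^{-1} K_mn with a single anchor,
    # so K_mm is just the scalar k(a, a).
    a = anchors[0]
    kmm = kernel(a, a)
    knm = [kernel(x, a) for x in points]
    return [[u * v / kmm for v in knm] for u in knm]

pts = [1.0, 2.0, 3.0]
K_hat = nystrom_approx(pts, [2.0], linear_kernel)
K_true = [[linear_kernel(x, z) for z in pts] for x in pts]
```

Because the linear kernel in 1-D has rank one, K_hat matches K_true entry for entry while storing only an n-vector instead of an n x n matrix.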
Detection (2 papers)
【1】A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection
Link: https://arxiv.org/abs/2602.22449
Authors: Mirza Raquib, Asif Pervez Polok, Kedar Nath Biswas, Rahat Uddin Azad, Saydul Akbar Murad, Nick Rahimi
Abstract: Cyberbullying has become a serious and growing concern in today's virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches use single-label classification, assuming that each comment contains only one type of abuse. In reality, a single comment may include overlapping forms such as threats, hate speech, and harassment. Therefore, multilabel detection is both realistic and essential. However, multilabel cyberbullying detection has received limited attention, especially in low-resource languages like Bangla, where robust pre-trained models are scarce. Developing a generalized model with moderate accuracy remains challenging. Transformers offer strong contextual understanding but may miss sequential dependencies, while LSTM models capture temporal flow but lack semantic depth. To address these limitations, we propose a fusion architecture that combines BanglaBERT-Large with a two-layer stacked LSTM. We analyze their behavior to jointly model context and sequence. The model is fine-tuned and evaluated on a publicly available multilabel Bangla cyberbullying dataset covering cyberbully, sexual harassment, threat, and spam. We apply different sampling strategies to address class imbalance. Evaluation uses multiple metrics, including accuracy, precision, recall, F1-score, Hamming loss, Cohen's kappa, and AUC-ROC. We employ 5-fold cross-validation to assess the generalization of the architecture.
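Among the metrics listed, Hamming loss is specific to the multilabel setting: it counts the fraction of individual label slots predicted incorrectly across all samples. The four-label layout below mirrors the dataset's categories.

```python
def hamming_loss(y_true, y_pred):
    # Fraction of label slots predicted incorrectly across all samples.
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for tr, pr in zip(y_true, y_pred)
                for t, p in zip(tr, pr))
    return wrong / total

# Label order: [cyberbully, sexual harassment, threat, spam]
y_true = [[1, 0, 1, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1]]
loss = hamming_loss(y_true, y_pred)
```

Two of the eight label slots are wrong (a missed threat and a spurious harassment flag), so the loss is 0.25; unlike subset accuracy, partially correct predictions are rewarded.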
【2】A Learning-Based Hybrid Decision Framework for Matching Systems with User Departure Detection
Link: https://arxiv.org/abs/2602.22412
Authors: Ruiqi Zhou, Donghao Zhu, Houcai Shen
Note: Accepted at HCII 2026
Abstract: In matching markets such as kidney exchanges and freight exchanges, delayed matching has been shown to improve overall market efficiency. The benefits of delay are highly sensitive to participants' sojourn times and departure behavior, and delaying matches can impose significant costs, including longer waiting times and increased market congestion. These competing effects make fixed matching policies inherently inflexible in dynamic environments. We propose a learning-based Hybrid framework that adaptively combines immediate and delayed matching. The framework continuously collects data on user departures over time, estimates the underlying departure distribution via regression, and determines whether to delay matching in the subsequent period based on a decision threshold that governs the system's tolerance for matching efficiency loss. The proposed framework can substantially reduce waiting times and congestion while sacrificing only a limited amount of matching efficiency. By dynamically adjusting its matching strategy, the Hybrid framework enables system performance to flexibly interpolate between purely greedy and purely patient policies, offering a robust and adaptive alternative to static matching mechanisms.
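A toy version of the decision rule, assuming exponential sojourn times. The paper estimates the departure distribution via regression; the exponential MLE and the simple threshold comparison here are illustrative assumptions.

```python
import math

def estimate_departure_rate(sojourn_times):
    # MLE of an exponential departure rate from observed sojourn times
    # (an assumed parametric form standing in for the paper's regression).
    return len(sojourn_times) / sum(sojourn_times)

def should_delay(rate, period, loss_tolerance):
    # Delay the next matching period only if the expected fraction of
    # users departing within it stays below the loss tolerance.
    p_depart = 1.0 - math.exp(-rate * period)
    return p_depart < loss_tolerance

rate = estimate_departure_rate([2.0, 4.0, 6.0])  # mean sojourn 4 -> rate 0.25
```

With a short period few users depart, so delaying (patient behavior) is acceptable; with a long period the expected loss exceeds the tolerance and the system falls back to greedy matching.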
Representation (1 paper)
【1】Physics Informed Viscous Value Representations
Link: https://arxiv.org/abs/2602.23280
Authors: Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Damon Conover, Ziran Wang, Aniket Bera
Abstract: Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. However, these formulations can often be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iterations. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimation of the objective that avoids numerical instability in higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation and high-dimensional, complex manipulation tasks. Open-source codes are available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
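The Feynman-Kac recasting can be illustrated on the heat equation u_t = u_xx, whose solution is the expectation u(x, t) = E[f(x + sqrt(2t) Z)] with Z standard normal. The paper applies the same expectation trick to the HJB equation; the heat-equation instance below is the classic textbook case.

```python
import math
import random

def feynman_kac_heat(f, x, t, n=100_000, seed=0):
    # Monte Carlo estimate of u(x, t) = E[f(x + sqrt(2 t) * Z)], the
    # Feynman-Kac representation of the heat equation u_t = u_xx.
    rng = random.Random(seed)
    s = math.sqrt(2.0 * t)
    return sum(f(x + s * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# For f(y) = y^2 the exact solution is x^2 + 2t, i.e. 2.0 here.
u = feynman_kac_heat(lambda y: y * y, x=1.0, t=0.5)
```

No spatial derivatives of u are ever formed; the PDE solution is read off from sampled trajectories, which is exactly why the recasting sidesteps instability in higher-order gradients.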
3D|3D Reconstruction and Related (1 paper)
【1】Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
Link: https://arxiv.org/abs/2602.22251
Authors: Alex Morehead, Miruna Cretu, Antonia Panescu, Rishabh Anand, Maurice Weiler, Tynan Perez, Samuel Blau, Steven Farrell, Wahid Bhimji, Anubhav Jain, Hrushikesh Sahasrabuddhe, Pietro Lio, Tommi Jaakkola, Rafael Gomez-Bombarelli, Rex Ying, N. Benjamin Erichson, Michael W. Mahoney
Abstract: General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. However, most existing AI approaches are optimized for a single domain (molecules or materials) and a single task (generation or prediction), which limits representation sharing and transfer. We introduce Zatom-1, the first foundation model that unifies generative and predictive learning of 3D molecules and materials. Zatom-1 is a Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries. This approach supports scalable pretraining with predictable gains as model capacity increases, while enabling fast and stable sampling. We use joint generative pretraining as a universal initialization for downstream multi-task prediction of properties, energies, and forces. Empirically, Zatom-1 matches or outperforms specialized baselines on both generative and predictive benchmarks, while reducing the generative inference time by more than an order of magnitude. Our experiments demonstrate positive predictive transfer between chemical domains from joint generative pretraining: modeling materials during pretraining improves molecular property prediction accuracy.
Encoders (2 papers)
【1】Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting
Link: https://arxiv.org/abs/2602.22685
Authors: Fabian Muşat, Simona Căbuz
Abstract: Intermittent demand, a pattern characterized by long sequences of zero sales punctuated by sporadic, non-zero values, poses a persistent challenge in retail and supply chain forecasting. Both traditional methods, such as ARIMA, exponential smoothing, or Croston variants, and modern neural architectures such as DeepAR and Transformer-based models often underperform on such data, as they treat demand as a single continuous process or become computationally expensive when scaled across many sparse series. To address these limitations, we introduce Switch-Hurdle: a new framework that integrates a Mixture-of-Experts (MoE) encoder with a Hurdle-based probabilistic decoder. The encoder uses sparse Top-1 expert routing during the forward pass yet remains approximately dense in the backward pass via a straight-through estimator (STE). The decoder follows a cross-attention autoregressive design with a shared hurdle head that explicitly separates the forecasting task into two components: a binary classification component estimating the probability of a sale, and a conditional regression component predicting the quantity given a sale. This structured separation enables the model to capture both the occurrence and magnitude processes inherent to intermittent demand. Empirical results on the M5 benchmark and a large proprietary retail dataset show that Switch-Hurdle achieves state-of-the-art prediction performance while maintaining scalability.
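The hurdle decomposition factors demand as a Bernoulli sale gate times a conditional quantity, so E[demand] = P(sale) * E[quantity | sale]. A small Monte Carlo check, with a uniform quantity distribution standing in for the learned regression head.

```python
import random

def sample_hurdle(p_sale, qty_sampler, rng):
    # Two-part draw: Bernoulli gate first; conditional quantity only on a sale.
    return qty_sampler(rng) if rng.random() < p_sale else 0

rng = random.Random(0)
# P(sale) = 0.2; quantity | sale uniform on {1, 2, 3, 4} (mean 2.5).
draws = [sample_hurdle(0.2, lambda r: 1 + r.randrange(4), rng)
         for _ in range(20000)]
zero_rate = draws.count(0) / len(draws)
mean_demand = sum(draws) / len(draws)  # should approach 0.2 * 2.5 = 0.5
```

The zero rate tracks 1 - P(sale) and the mean tracks the product formula, which is why the two heads can be trained and inspected separately.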
【2】ECHO: Encoding Communities via High-order Operators
Link: https://arxiv.org/abs/2602.22446
Authors: Emilio Ferrara
Abstract: Community detection in attributed networks faces a fundamental divide: topological algorithms ignore semantic features, while Graph Neural Networks (GNNs) encounter devastating computational bottlenecks. Specifically, GNNs suffer from a Semantic Wall of feature over-smoothing in dense or heterophilic networks, and a Systems Wall driven by the O(N^2) memory constraints of pairwise clustering. To dismantle these barriers, we introduce ECHO (Encoding Communities via High-order Operators), a scalable, self-supervised architecture that reframes community detection as an adaptive, multi-scale diffusion process. ECHO features a Topology-Aware Router that automatically analyzes structural heuristics (sparsity, density, and assortativity) to route graphs through the optimal inductive bias, preventing heterophilic poisoning while ensuring semantic densification. Coupled with a memory-sharded full-batch contrastive objective and a novel chunked O(N·K) similarity extraction method, ECHO completely bypasses traditional O(N^2) memory bottlenecks without sacrificing the mathematical precision of global gradients. Extensive evaluations demonstrate that this topology-feature synergy consistently overcomes the classical resolution limit. On synthetic LFR benchmarks scaled up to 1 million nodes, ECHO achieves scale-invariant accuracy despite severe topological noise. Furthermore, on massive real-world social networks with over 1.6 million nodes and 30 million edges, it completes clustering in mere minutes with throughputs exceeding 2,800 nodes per second, matching the speed of highly optimized, purely topological baselines. The implementation utilizes a unified framework that automatically engages memory-sharded optimization to support adoption across varying hardware constraints. GitHub Repository: https://github.com/emilioferrara/ECHO-GNN
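The chunked-similarity idea: score one block of rows at a time and keep only each node's top-k neighbors, so the full N x N matrix is never materialized. A minimal sketch with raw dot products; ECHO's actual scoring and sharding details differ.

```python
def top_neighbors_chunked(embeddings, k, chunk=2):
    # O(N*k) output memory: process rows chunk by chunk and retain only
    # the top-k neighbors per node, never holding an N x N matrix.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    result = []
    for start in range(0, len(embeddings), chunk):
        for i in range(start, min(start + chunk, len(embeddings))):
            sims = [(dot(embeddings[i], embeddings[j]), j)
                    for j in range(len(embeddings)) if j != i]
            sims.sort(reverse=True)
            result.append([j for _, j in sims[:k]])
    return result

emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
nbrs = top_neighbors_chunked(emb, k=1)
```

Each node's single nearest neighbor pairs up the two embedding clusters, and peak memory per chunk is one row of similarities rather than the full matrix.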
Optimization|Convergence (8 papers)
【1】MSINO: Curvature-Aware Sobolev Optimization for Manifold Neural Networks
Link: https://arxiv.org/abs/2602.22937
Authors: Suresan Pareth
Note: 32 pages, 6 figures. Submitted for journal consideration
Abstract: We introduce Manifold Sobolev Informed Neural Optimization (MSINO), a curvature-aware training framework for neural networks defined on Riemannian manifolds. The method replaces standard Euclidean derivative supervision with a covariant Sobolev loss that aligns gradients using parallel transport and improves stability via a Laplace-Beltrami smoothness regularization term. Building on classical results in Riemannian optimization and Sobolev theory on manifolds, we derive geometry-dependent constants that yield (i) a Descent Lemma with a manifold Sobolev smoothness constant, (ii) a Sobolev Polyak-Lojasiewicz inequality giving linear convergence guarantees for Riemannian gradient descent and stochastic gradient descent under explicit step-size bounds, and (iii) a two-step Newton-Sobolev method with local quadratic contraction in curvature-controlled neighborhoods. Unlike prior Sobolev training in Euclidean space, MSINO provides training-time guarantees that explicitly track curvature and transported Jacobians. Applications include surface imaging, physics-informed learning settings, and robotics on Lie groups such as SO(3) and SE(3). The framework unifies value- and gradient-based learning with curvature-aware convergence guarantees for neural training on manifolds.
【2】Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Link: https://arxiv.org/abs/2602.22817
Authors: Shuo He,Lang Feng,Qi Wei,Xin Cheng,Lei Feng,Bo An
Note: Accepted at ICLR 2026
Abstract: Group-based reinforcement learning (RL), such as GRPO, has advanced the capabilities of large language models on long-horizon agentic tasks. To enable more fine-grained policy updates, recent research has increasingly shifted toward stepwise group-based policy optimization, which treats each step in a rollout trajectory independently while using a memory module to retain historical context. However, we find a key issue in estimating stepwise relative advantages, namely context inconsistency, where steps within the same group may differ in their historical contexts. Empirically, we reveal that this issue can lead to severely biased advantage estimation, thereby degrading policy optimization significantly. To address the issue, in this paper, we propose Hierarchy-of-Groups Policy Optimization (HGPO) for long-horizon agentic tasks. Specifically, within a group of rollout trajectories, HGPO assigns each step to multiple hierarchical groups according to the consistency of historical contexts. Then, for each step, HGPO computes distinct advantages within each group and aggregates them with an adaptive weighting scheme. In this way, HGPO can achieve a favorable bias-variance trade-off in stepwise advantage estimation, without extra models or rollouts. Evaluations on two challenging agentic tasks, ALFWorld and WebShop with Qwen2.5-1.5B-Instruct and Qwen2.5-7B-Instruct, show that HGPO significantly outperforms existing agentic RL methods under the same computational constraints. Code is available at https://github.com/langfengQ/verl-agent/tree/master/recipe/hgpo.
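HGPO builds on group-relative advantage estimation. A minimal sketch of the base operation, centering each step's reward against the mean of a group that shares the same historical context, is below; HGPO's hierarchical multi-group assignment and adaptive weighting are not reproduced, and the names are illustrative.

```python
import numpy as np

def group_relative_advantages(rewards, context_keys):
    """Center each step's reward within its context group.

    rewards: (n,) scalar returns for n rollout steps.
    context_keys: one hashable key per step identifying its group; a
    step is only compared against steps sharing the same historical
    context, which avoids the context-inconsistency bias described
    in the abstract.
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = np.empty_like(rewards)
    for key in set(context_keys):
        mask = np.array([c == key for c in context_keys])
        adv[mask] = rewards[mask] - rewards[mask].mean()
    return adv
```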
【3】$ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
Link: https://arxiv.org/abs/2602.22601
Authors: Thanh-Dat Truong,Huu-Thien Tran,Jackson Cothren,Bhiksha Raj,Khoa Luu
Note: Accepted to CVPR'26
Abstract: Fairness in Continual Learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress in addressing catastrophic forgetting, the problem of fairness caused by imbalanced data remains largely underexplored. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $φ$-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Then, we identify the limitations of conventional DPO on imbalanced data and present a new $φ$-DPO loss that explicitly addresses distributional biases. We provide a comprehensive theoretical analysis demonstrating that our approach addresses both forgetting and data imbalance. Additionally, to enable $φ$-DPO-based continual learning, we construct pairwise preference annotations for existing benchmarks in the context of continual learning. Extensive experiments and ablation studies show the proposed $φ$-DPO achieves State-of-the-Art performance across multiple benchmarks, outperforming prior continual learning methods of LMMs.
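For context, the standard DPO objective that $φ$-DPO extends can be sketched as follows; the fairness reweighting for imbalanced data, which is the paper's contribution, is not reproduced here.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Vanilla DPO: -log sigmoid(beta * (chosen margin - rejected margin)).

    Inputs are (summed) token log-probabilities of the preferred and
    dispreferred responses under the trained policy and under a frozen
    reference model; beta controls the strength of the KL-style anchor
    to the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))
```

With equal margins the loss sits at log 2, and it decreases as the policy puts relatively more mass on the preferred response.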
【4】Operationalizing Fairness: Post-Hoc Threshold Optimization Under Hard Resource Limits
Link: https://arxiv.org/abs/2602.22560
Authors: Moirangthem Tiken Singh,Amit Kalita,Sapam Jitu Singh
Abstract: The deployment of machine learning in high-stakes domains requires a balance between predictive safety and algorithmic fairness. However, existing fairness interventions often assume unconstrained resources and employ group-specific decision thresholds that violate anti-discrimination regulations. We introduce a post-hoc, model-agnostic threshold optimization framework that jointly balances safety, efficiency, and equity under hard capacity constraints. To ensure legal compliance, the framework enforces a single, global decision threshold. We formulated a parameterized ethical loss function coupled with a bounded decision rule that mathematically prevents intervention volumes from exceeding the available resources. Analytically, we prove the key properties of the deployed threshold, including local monotonicity with respect to ethical weighting and the formal identification of critical capacity regimes. We conducted extensive experimental evaluations on diverse high-stakes datasets. The principal results demonstrate that capacity constraints dominate ethical priorities; the strict resource limit determines the final deployed threshold in over 80% of the tested configurations. Furthermore, under a restrictive 25% capacity limit, the proposed framework successfully maintains high risk identification (recall ranging from 0.409 to 0.702), whereas standard unconstrained fairness heuristics collapse to near-zero utility. We conclude that theoretical fairness objectives must be explicitly subordinated to operational capacity limits to remain deployable. By decoupling predictive scoring from policy evaluation and strictly bounding intervention rates, this framework provides a practical and legally compliant mechanism for stakeholders to navigate unavoidable ethical trade-offs in resource-constrained environments.
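A bounded decision rule of the kind described, a single global threshold that caps intervention volume at capacity, can be sketched as below; the function name and tie handling are my assumptions, not the paper's exact formulation.

```python
import numpy as np

def capacity_capped_threshold(risk_scores, capacity_frac):
    """Single global threshold flagging at most capacity_frac of cases.

    With k = floor(capacity_frac * n) intervention slots, the threshold
    is the k-th largest risk score: everyone scoring at or above it is
    flagged. For distinct scores exactly k cases are flagged; ties at
    the threshold would need a conservative tie-break upstream to keep
    the volume bounded.
    """
    scores = np.asarray(risk_scores, dtype=float)
    k = int(np.floor(capacity_frac * scores.size))
    if k == 0:
        return np.inf                 # no capacity: intervene on no one
    return np.partition(scores, -k)[-k]
```

Because the threshold is computed over the pooled population rather than per group, it is a single global rule, which is the legal-compliance property the abstract emphasizes.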
【5】Sharp Convergence Rates for Masked Diffusion Models
Link: https://arxiv.org/abs/2602.22505
Authors: Yuchen Liang,Zhiheng Tan,Ness Shroff,Yingbin Liang
Abstract: Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging as competitive alternatives to autoregressive models. Among existing samplers, the Euler method remains the standard choice in many applications, and more recently, the First-Hitting Sampler (FHS) has shown considerable promise for masked diffusion models. Despite their practical success, the theoretical understanding of these samplers remains limited. Existing analyses are conducted in Kullback-Leibler (KL) divergence, which often yields loose parameter dependencies and requires strong assumptions on score estimation. Moreover, these guarantees do not cover the recently developed high-performance FHS sampler. In this work, we first develop a direct total-variation (TV) based analysis for the Euler method that overcomes these limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees without requiring any surrogate initialization. Also for this setting, we provide the first convergence lower bound for the Euler sampler, establishing tightness with respect to both the data dimension $d$ and the target accuracy $\varepsilon$. Finally, we analyze the FHS sampler and show that it incurs no sampling error beyond that induced by score estimation, which we show to be tight with a matching lower error bound. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS, which may be of independent interest.
【6】TopoEdit: Fast Post-Optimization Editing of Topology Optimized Structures
Link: https://arxiv.org/abs/2602.22430
Authors: Hongrui Chen,Josephine V. Carstensen,Faez Ahmed
Abstract: Despite topology optimization producing high-performance structures, late-stage localized revisions remain brittle: direct density-space edits (e.g., warping pixels, inserting holes, swapping infill) can sever load paths and sharply degrade compliance, while re-running optimization is slow and may drift toward a qualitatively different design. We present TopoEdit, a fast post-optimization editor that demonstrates how structured latent embeddings from a pre-trained topology foundation model (OAT) can be repurposed as an interface for physics-aware engineering edits. Given an optimized topology, TopoEdit encodes it into OAT's spatial latent, applies partial noising to preserve instance identity while increasing editability, and injects user intent through an edit-then-denoise diffusion pipeline. We instantiate three edit operators: drag-based topology warping with boundary-condition-consistent conditioning updates, shell-infill lattice replacement using a lattice-anchored reference latent with updated volume-fraction conditioning, and late-stage no-design region enforcement via masked latent overwrite followed by diffusion-based recovery. A consistency-preserving guided DDIM procedure localizes changes while allowing global structural adaptation; multiple candidates can be sampled and selected using a compliance-aware criterion, with optional short SIMP refinement for warps. Across diverse case studies and large edit sweeps, TopoEdit produces intention-aligned modifications that better preserve mechanical performance and avoid catastrophic failure modes compared to direct density-space edits, while generating edited candidates in sub-second diffusion time per sample.
【7】Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation
Link: https://arxiv.org/abs/2602.22259
Authors: Guoqing Ma,Shan Yu
Abstract: Recognizing the substantial computational cost of backpropagation (BP), non-BP methods have emerged as attractive alternatives for efficient learning on emerging neuromorphic systems. However, existing non-BP approaches still face critical challenges in efficiency and scalability. Inspired by neural representations and dynamic mechanisms in the brain, we propose a perturbation-based approach called LOw-rank Cluster Orthogonal (LOCO) weight modification. We find that low-rank is an inherent property of perturbation-based algorithms. Under this condition, the orthogonality constraint limits the variance of the node perturbation (NP) gradient estimates and enhances the convergence efficiency. Through extensive evaluations on multiple datasets, LOCO demonstrates the capability to locally train the deepest spiking neural networks to date (more than 10 layers), while exhibiting strong continual learning ability, improved convergence efficiency, and better task performance compared to other brain-inspired non-BP algorithms. Notably, LOCO requires only O(1) parallel time complexity for weight updates, which is significantly lower than that of BP methods. This offers a promising direction for achieving high-performance, real-time, and lifelong learning on neuromorphic systems.
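As a simplified illustration of the perturbation-based gradient estimation underlying such methods (here a weight-perturbation variant rather than true node perturbation, and without LOCO's low-rank cluster-orthogonal constraints):

```python
import numpy as np

def perturbation_gradient(loss_fn, theta, sigma=0.01, n_probes=256, seed=0):
    """Estimate grad L(theta) from loss changes under random probes.

    E[(L(theta + eps) - L(theta)) * eps] / sigma^2 approximates the
    gradient for small sigma; averaging over probes shrinks the
    estimator's variance (which is exactly what constraints like
    orthogonality of the perturbations are meant to reduce further).
    No backward pass is ever computed.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_probes):
        eps = rng.normal(0.0, sigma, size=theta.shape)
        grad += (loss_fn(theta + eps) - base) * eps
    return grad / (n_probes * sigma ** 2)
```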
【8】Causal Direction from Convergence Time: Faster Training in the True Causal Direction
Link: https://arxiv.org/abs/2602.22254
Authors: Abdulrahman Tamim
Abstract: We introduce Causal Computational Asymmetry (CCA), a principle for causal direction identification based on optimization dynamics in which one neural network is trained to predict $Y$ from $X$ and another to predict $X$ from $Y$, and the direction that converges faster is inferred to be causal. Under the additive noise model $Y = f(X) + \varepsilon$ with $\varepsilon \perp X$ and $f$ nonlinear and injective, we establish a formal asymmetry: in the reverse direction, residuals remain statistically dependent on the input regardless of approximation quality, inducing a strictly higher irreducible loss floor and non-separable gradient noise in the optimization dynamics, so that the reverse model requires strictly more gradient steps in expectation to reach any fixed loss threshold; consequently, the forward (causal) direction converges in fewer expected optimization steps. CCA operates in optimization-time space, distinguishing it from methods such as RESIT, IGCI, and SkewScore that rely on statistical independence or distributional asymmetries, and proper z-scoring of both variables is required for valid comparison of convergence rates. On synthetic benchmarks, CCA achieves 26/30 correct causal identifications across six neural architectures, including 30/30 on sine and exponential data-generating processes. We further embed CCA into a broader framework termed Causal Compression Learning (CCL), which integrates graph structure learning, causal information compression, and policy optimization, with all theoretical guarantees formally proved and empirically validated on synthetic datasets.
Prediction | Estimation (15 papers)
【1】Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Link: https://arxiv.org/abs/2602.23341
Authors: Alkis Kalavasis,Anay Mehrotra,Manolis Zampetakis,Felix Zhou,Ziyu Zhu
Note: Abstract truncated to arXiv limits. To appear in ICLR'26
Abstract: Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have ``low'' information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. [...]
【2】Prediction of Diffusion Coefficients in Mixtures with Tensor Completion
Link: https://arxiv.org/abs/2602.23142
Authors: Zeno Romero,Kerstin Münnemann,Hans Hasse,Fabian Jirasek
Abstract: Predicting diffusion coefficients in mixtures is crucial for many applications, as experimental data remain scarce, and machine learning (ML) offers promising alternatives to established semi-empirical models. Among ML models, matrix completion methods (MCMs) have proven effective in predicting thermophysical properties, including diffusion coefficients in binary mixtures. However, MCMs are restricted to single-temperature predictions, and their accuracy depends strongly on the availability of high-quality experimental data for each temperature of interest. In this work, we address this challenge by presenting a hybrid tensor completion method (TCM) for predicting temperature-dependent diffusion coefficients at infinite dilution in binary mixtures. The TCM employs a Tucker decomposition and is jointly trained on experimental data for diffusion coefficients at infinite dilution in binary systems at 298 K, 313 K, and 333 K. Predictions from the semi-empirical SEGWE model serve as prior knowledge within a Bayesian training framework. The TCM then extrapolates linearly to any temperature between 268 K and 378 K, achieving markedly improved prediction accuracy compared to established models across all studied temperatures. To further enhance predictive performance, the experimental database was expanded using active learning (AL) strategies for targeted acquisition of new diffusion data by pulsed-field gradient (PFG) NMR measurements. Diffusion coefficients at infinite dilution in 19 solute + solvent systems were measured at 298 K, 313 K, and 333 K. Incorporating these results yields a substantial improvement in the TCM's predictive accuracy. These findings highlight the potential of combining data-efficient ML methods with adaptive experimentation to advance predictive modeling of transport properties.
【3】Sequential Regression for Continuous Value Prediction using Residual Quantization
Link: https://arxiv.org/abs/2602.23012
Authors: Runpeng Cui,Zhipeng Sun,Chi Lu,Peng Jiang
Abstract: Continuous value prediction plays a crucial role in industrial-scale recommendation systems, including tasks such as predicting users' watch-time and estimating the gross merchandise value (GMV) in e-commerce transactions. However, it remains challenging due to the highly complex and long-tailed nature of the data distributions. Existing generative approaches rely on rigid parametric distribution assumptions, which fundamentally limits their performance when such assumptions misalign with real-world data. Overly simplified forms cannot adequately model real-world complexities, while more intricate assumptions often suffer from poor scalability and generalization. To address these challenges, we propose a residual quantization (RQ)-based sequence learning framework that represents target continuous values as a sum of ordered quantization codes, predicted recursively from coarse to fine granularity with diminishing quantization errors. We introduce a representation learning objective that aligns the RQ code embedding space with the ordinal structure of target values, allowing the model to capture continuous representations for quantization codes and further improving prediction accuracy. We perform extensive evaluations on public benchmarks for lifetime value (LTV) and watch-time prediction, alongside a large-scale online experiment for GMV prediction on an industrial short-video recommendation platform. The results consistently show that our approach outperforms state-of-the-art methods, while demonstrating strong generalization across diverse continuous value prediction tasks in recommendation systems.
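The coarse-to-fine residual-quantization encoding can be sketched for scalar targets as below. The codebooks here are hypothetical fixed grids (in practice they would be learned from data), and the sequence model that predicts the codes is omitted.

```python
import numpy as np

def rq_encode(values, codebooks):
    """Residual quantization of scalars: coarse-to-fine code assignment.

    values: (N,) continuous targets. codebooks: list of 1-D arrays, one
    per level. Each level quantizes what the previous levels left over,
    so that (given codebooks fitted to the residual distribution) the
    reconstruction error shrinks as more levels are decoded.
    """
    residual = np.asarray(values, dtype=float).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        idx = np.abs(residual[:, None] - cb[None, :]).argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]
        recon = recon + cb[idx]
    return np.stack(codes, axis=1), recon
```

The ordered code tuple is exactly the target of the recursive coarse-to-fine prediction the abstract describes.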
【4】Fair feature attribution for multi-output prediction: a Shapley-based perspective
Link: https://arxiv.org/abs/2602.22882
Authors: Umberto Biccari,Alain Ibáñez de Opakua,José María Mato,Óscar Millet,Roberto Morales,Enrique Zuazua
Abstract: In this article, we provide an axiomatic characterization of feature attribution for multi-output predictors within the Shapley framework. While SHAP explanations are routinely computed independently for each output coordinate, the theoretical necessity of this practice has remained unclear. By extending the classical Shapley axioms to vector-valued cooperative games, we establish a rigidity theorem showing that any attribution rule satisfying efficiency, symmetry, dummy player, and additivity must necessarily decompose component-wise across outputs. Consequently, any joint-output attribution rule must relax at least one of the classical Shapley axioms. This result identifies a previously unformalized structural constraint in Shapley-based interpretability, clarifying the precise scope of fairness-consistent explanations in multi-output learning. Numerical experiments on a biomedical benchmark illustrate that multi-output models can yield computational savings in training and deployment, while producing SHAP explanations that remain fully consistent with the component-wise structure imposed by the Shapley axioms.
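The component-wise rule singled out by the rigidity theorem amounts to running the classical Shapley formula independently on each output coordinate's scalar game. A minimal exact implementation for small feature counts (cost is exponential in the number of features, so this is for illustration only; names are mine):

```python
from itertools import combinations
from math import factorial

def shapley(value, n):
    """Exact Shapley values for a cooperative game value(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):                       # coalition sizes 0..n-1
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                S = frozenset(S)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi

def multi_output_shapley(vector_value, n, n_outputs):
    """Component-wise rule: one scalar Shapley game per output coordinate."""
    return [shapley(lambda S, k=k: vector_value(S)[k], n)
            for k in range(n_outputs)]
```

For an additive game each player's Shapley value is its individual contribution, per coordinate, which gives an easy correctness check.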
【5】FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics
Link: https://arxiv.org/abs/2602.22822
Authors: Yunhua Zhong,Yixuan Tang,Yifan Li,Jie Yang,Pan Liu,Jun Xia
Note: 28 pages, preprint version
Abstract: The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and material science, where tandem mass spectrometry provides valuable fragmentation cues in the form of mass-to-charge ratio peaks. However, the scarcity of experimental spectra hinders molecular identification and motivates the development of computational prediction approaches. Deep learning models appear promising for predicting the mass spectra of molecules, but overall assessment remains challenging as a result of the heterogeneity in methods and the lack of well-defined benchmarks. To address this, our contribution is the benchmark framework FlexMS for constructing and evaluating diverse model architectures in mass spectrum prediction. With its easy-to-use flexibility, FlexMS supports the dynamic construction of numerous distinct combinations of model architectures, while assessing their performance on preprocessed public datasets using different metrics. In this paper, we provide insights into factors influencing performance, including the structural diversity of datasets, hyperparameters such as the learning rate, data sparsity, pretraining effects, metadata ablation settings, and cross-domain transfer learning. This provides practical guidance in choosing suitable models. Moreover, retrieval benchmarks simulate practical identification scenarios and score potential matches based on predicted spectra.
【6】HyperKKL: Enabling Non-Autonomous State Estimation through Dynamic Weight Conditioning
Link: https://arxiv.org/abs/2602.22630
Authors: Yahia Salaheldin Shaaban,Salem Lahlou,Abdelrahman Sayed Sayed
Note: 18 pages, 6 figures, Under review in ICLR 2026 AI & PDE Workshop
Abstract: This paper proposes HyperKKL, a novel learning approach for designing Kazantzis-Kravaris/Luenberger (KKL) observers for non-autonomous nonlinear systems. While KKL observers offer a rigorous theoretical framework by immersing nonlinear dynamics into a stable linear latent space, their practical realization relies on solving Partial Differential Equations (PDEs) that are analytically intractable. Existing learning-based approximations of the KKL observer are mostly designed for autonomous systems, failing to generalize to driven dynamics without expensive retraining or online gradient updates. HyperKKL addresses this by employing a hypernetwork architecture that encodes the exogenous input signal to instantaneously generate the parameters of the KKL observer, effectively learning a family of immersion maps parameterized by the external drive. We rigorously evaluate this approach against a curriculum learning strategy that attempts to generalize from autonomous regimes via training heuristics alone. The approach is illustrated in four numerical simulations on benchmark examples, including the Duffing, Van der Pol, Lorenz, and Rössler systems.
【7】Predicting Tennis Serve directions with Machine Learning
Link: https://arxiv.org/abs/2602.22527
Authors: Ying Zhu,Ruthuparna Naikar
Abstract: Serves, especially first serves, are very important in professional tennis. Servers choose their serve directions strategically to maximize their winning chances while trying to be unpredictable. On the other hand, returners try to predict serve directions to make good returns. The mind game between servers and returners is an important part of decision-making in professional tennis matches. To help understand the players' serve decisions, we have developed a machine learning method for predicting professional tennis players' first serve directions. Through feature engineering, our method achieves an average prediction accuracy of around 49% for male players and 44% for female players. Our analysis provides some evidence that top professional players use a mixed-strategy model in serving decisions and that fatigue might be a factor in choosing serve directions. Our analysis also suggests that contextual information is perhaps more important for returners' anticipatory reactions than previously thought.
【8】TEFL: Prediction-Residual-Guided Rolling Forecasting for Multi-Horizon Time Series
Link: https://arxiv.org/abs/2602.22520
Authors: Xiannan Huang,Shen Fang,Shuhan Qiu,Chengcheng Yu,Jiayuan Du,Chao Yang
Abstract: Time series forecasting plays a critical role in domains such as transportation, energy, and meteorology. Despite their success, modern deep forecasting models are typically trained to minimize point-wise prediction loss without leveraging the rich information contained in past prediction residuals from rolling forecasts: residuals that reflect persistent biases, unmodeled patterns, or evolving dynamics. We propose TEFL (Temporal Error Feedback Learning), a unified learning framework that explicitly incorporates these historical residuals into the forecasting pipeline during both training and evaluation. To make this practical in deep multi-step settings, we address three key challenges: (1) selecting observable multi-step residuals under the partial observability of rolling forecasts, (2) integrating them through a lightweight low-rank adapter to preserve efficiency and prevent overfitting, and (3) designing a two-stage training procedure that jointly optimizes the base forecaster and error module. Extensive experiments across 10 real-world datasets and 5 backbone architectures show that TEFL consistently improves accuracy, reducing MAE by 5-10% on average. Moreover, it demonstrates strong robustness under abrupt changes and distribution shifts, with error reductions exceeding 10% (up to 19.5%) in challenging scenarios. By embedding residual-based feedback directly into the learning process, TEFL offers a simple, general, and effective enhancement to modern deep forecasting systems.
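A much-simplified, one-step version of the core idea, feeding observed rolling-forecast residuals back into subsequent predictions, is sketched below. TEFL's low-rank adapter, multi-step residual selection, and two-stage training are not reproduced; the EWMA correction is my stand-in.

```python
import numpy as np

def residual_feedback_forecast(base_preds, actuals, beta=0.7):
    """Adjust each rolling forecast by an EWMA of the base model's
    past residuals.

    base_preds: raw one-step forecasts in time order.
    actuals: realized values, each observed only after forecasting.
    A persistent bias in base_preds is progressively absorbed by the
    correction term, without retraining the base model.
    """
    correction, out = 0.0, []
    for pred, y in zip(base_preds, actuals):
        out.append(pred + correction)           # forecast issued now
        residual = y - pred                     # observable only afterwards
        correction = beta * correction + (1.0 - beta) * residual
    return np.array(out)
```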
【9】MolFM-Lite: Multi-Modal Molecular Property Prediction with Conformer Ensemble Attention and Cross-Modal Fusion
Link: https://arxiv.org/abs/2602.22405
Authors: Syed Omer Shah,Mohammed Maqsood Ahmed,Danish Mohiuddin Mohammed,Shahnawaz Alam,Mohd Vahaj ur Rahman
Abstract: Most machine learning models for molecular property prediction rely on a single molecular representation (either a sequence, a graph, or a 3D structure) and treat molecular geometry as static. We present MolFM-Lite, a multi-modal model that jointly encodes SELFIES sequences (1D), molecular graphs (2D), and conformer ensembles (3D) through cross-attention fusion, while conditioning predictions on experimental context via Feature-wise Linear Modulation (FiLM). Our main methodological contributions are: (1) a conformer ensemble attention mechanism that combines learnable attention with Boltzmann-weighted priors over multiple RDKit-generated conformers, capturing the thermodynamic distribution of molecular shapes; and (2) a cross-modal fusion layer where each modality can attend to others, enabling complementary information sharing. We evaluate on four MoleculeNet scaffold-split benchmarks using our model's own splits, and report all baselines re-evaluated under the same protocol. Comprehensive ablation studies across all four datasets confirm that each architectural component contributes independently, with tri-modal fusion providing 7-11% AUC improvement over single-modality baselines and conformer ensembles adding approximately 2% over single-conformer variants. Pre-training on ZINC250K (~250K molecules) using cross-modal contrastive and masked-atom objectives enables effective weight initialization at modest compute cost. We release all code, trained models, and data splits to support reproducibility.
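The Boltzmann-weighted prior over conformers can be sketched as below; the way it is blended with learned attention (a simple convex combination with weight `alpha`) is an illustrative assumption, not necessarily the paper's exact mechanism.

```python
import numpy as np

def conformer_weights(attn_scores, energies_kcal, alpha=0.5, T=298.15):
    """Blend learnable attention with a Boltzmann prior over conformers.

    energies_kcal: relative conformer energies in kcal/mol. The prior
    downweights high-energy shapes according to exp(-E / kT), capturing
    the thermodynamic distribution of conformations; alpha interpolates
    between the learned attention weights and this physics prior.
    """
    kT = 0.0019872041 * T                 # Boltzmann constant in kcal/(mol*K)
    e = np.asarray(energies_kcal, dtype=float)
    prior = np.exp(-(e - e.min()) / kT)
    prior /= prior.sum()
    attn = np.exp(attn_scores - np.max(attn_scores))   # stable softmax
    attn /= attn.sum()
    return alpha * attn + (1.0 - alpha) * prior
```

At room temperature kT is about 0.59 kcal/mol, so conformers a few kcal/mol above the minimum receive almost no prior weight.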
【10】Global River Forecasting with a Topology-Informed AI Foundation Model
Link: https://arxiv.org/abs/2602.22293
Authors: Hancheng Ren,Gang Zhao,Shuo Wang,Louise Slater,Dai Yamazaki,Shu Liu,Jingfang Fan,Shibo Cui,Ziming Yu,Shengyu Kang,Depeng Zuo,Dingzhi Peng,Zongxue Xu,Bo Pang
Note: 26 pages, 5 figures, 3 extended data tables, 3 extended data figures
Abstract: River systems operate as inherently interconnected continuous networks, meaning river hydrodynamic simulation ought to be a systemic process. However, widespread hydrology data scarcity often restricts data-driven forecasting to isolated predictions. To achieve systemic simulation and reduce reliance on river observations, we present GraphRiverCast (GRC), a topology-informed AI foundation model designed to simulate multivariate river hydrodynamics in global river systems. GRC is capable of operating in a "ColdStart" mode, generating predictions without relying on historical river states for initialization. In 7-day global pseudo-hindcasts, GRC-ColdStart functions as a robust standalone simulator, achieving a Nash-Sutcliffe Efficiency (NSE) of approximately 0.82 without exhibiting the significant error accumulation typical of autoregressive paradigms. Ablation studies reveal that topological encoding serves as indispensable structural information in the absence of historical states, explicitly guiding hydraulic connectivity and network-scale mass redistribution to reconstruct flow dynamics. Furthermore, when adapted locally via a pre-training and fine-tuning strategy, GRC consistently outperforms physics-based and locally-trained AI baselines. Crucially, this superiority extends from gauged reaches to full river networks, underscoring the necessity of topology encoding and physics-based pre-training. Built on a physics-aligned neural operator architecture, GRC enables rapid and cross-scale adaptive simulation, establishing a collaborative paradigm bridging global hydrodynamic knowledge with local hydrological reality.
【11】X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation
Link: https://arxiv.org/abs/2602.22277
Authors: Abdul Karim Gizzini,Yahia Medjahdi
Note: This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology (TVT) as a correspondence paper
Abstract: AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based XAI solutions offer input filtering, they often neglect internal structural optimization. We propose X-REFINE, an XAI-based framework for joint input-filtering and architecture fine-tuning. By utilizing a decomposition-based, sign-stabilized LRP epsilon rule, X-REFINE backpropagates predictions to derive high-resolution relevance scores for both subcarriers and hidden neurons. This enables a holistic optimization that identifies the most faithful model components. Simulation results demonstrate that X-REFINE achieves a superior interpretability-performance-complexity trade-off, significantly reducing computational complexity while maintaining robust bit error rate (BER) performance across different scenarios.
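For a single linear layer, an epsilon-rule LRP backward pass of the kind the abstract refers to looks roughly like this. It is a generic sketch, not X-REFINE's decomposition, and the subcarrier/neuron scoring pipeline built on top of it is omitted.

```python
import numpy as np

def lrp_epsilon(a, W, R_out, eps=1e-6):
    """Epsilon-rule relevance backpropagation through one linear layer.

    a: (d_in,) input activations, W: (d_out, d_in) weights,
    R_out: (d_out,) relevance assigned to the layer outputs.
    The signed epsilon term stabilizes near-zero pre-activations and
    absorbs a vanishing share of relevance; as eps -> 0 the rule is
    conservative for a bias-free layer: sum(R_in) == sum(R_out).
    """
    z = W @ a                                     # pre-activations, no bias
    stabilizer = eps * np.where(z >= 0.0, 1.0, -1.0)
    s = R_out / (z + stabilizer)
    return a * (W.T @ s)
```

Repeating this layer by layer propagates the model's output relevance back to its inputs, which is how per-subcarrier relevance scores would be obtained.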
【12】Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting
Link: https://arxiv.org/abs/2602.22270
Authors: Sijie Ruan,Jinyu Li,Jia Wei,Zenghao Xu,Jie Bao,Junshi Xu,Junyang Qiu,Hanning Yuan,Xiaoxiao Wang,Shuliang Wang
Note: 12 pages, 10 figures
Abstract: Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at one provincial CDC in China to facilitate downstream applications.
【13】A Synergistic Approach: Dynamics-AI Ensemble in Tropical Cyclone Forecasting
Title: A Synergistic Approach: Dynamics-AI Ensemble in Tropical Cyclone Forecasting
Link: https://arxiv.org/abs/2602.22533
Authors: Yonghui Li, Wansuo Duan, Hao Li, Wei Han, Han Zhang, Yinuo Li
Abstract: This study addresses a critical challenge in AI-based weather forecasting by developing an AI-driven optimized ensemble forecast system using Orthogonal Conditional Nonlinear Optimal Perturbations (O-CNOPs). The system bridges the gap between computational efficiency and dynamic consistency in tropical cyclone (TC) forecasting. Unlike conventional ensembles limited by computational costs or AI ensembles constrained by inadequate perturbation methods, O-CNOPs generate dynamically optimized perturbations that capture fast-growing errors of the FuXi model while maintaining plausibility. The key innovation lies in producing orthogonal perturbations that respect the FuXi model's nonlinear dynamics, yielding structures reflecting dominant dynamical controls and physically interpretable probabilistic forecasts. Demonstrating superior deterministic and probabilistic skills over the operational Integrated Forecasting System Ensemble Prediction System, this work establishes a new paradigm combining AI computational advantages with rigorous dynamical constraints. Success in TC track forecasting paves the way for reliable ensemble forecasts of other high-impact weather systems, marking a major step toward operational AI-based ensemble forecasting.
【14】LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Title: LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Link: https://arxiv.org/abs/2602.22432
Authors: Vagner Santos, Victor Coscrato, Luben Cabezas, Rafael Izbicki, Thiago Ramos
Abstract: Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
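The grouping rule is simple enough to sketch directly. Assuming the leaf index of each calibration point in every tree is available (as `model.apply(X)` would provide in most GBDT libraries), a hypothetical prefix-grouping calibrator might look like:

```python
import numpy as np

def prefix_group_quantiles(leaf_ids, residuals, k, alpha=0.1):
    """Group calibration points by the leaf-index prefix over the first k
    trees and store the (1 - alpha) residual quantile of each group.

    leaf_ids  : (n, n_trees) int array of leaves visited per tree
    residuals : (n,) absolute residuals on the calibration set
    """
    groups = {}
    for ids, r in zip(leaf_ids[:, :k], residuals):
        groups.setdefault(tuple(ids), []).append(r)
    return {key: float(np.quantile(rs, 1 - alpha)) for key, rs in groups.items()}

def predict_interval(leaf_prefix, y_hat, group_q, k, q_global):
    """Interval for one test point: use its prefix group's quantile,
    falling back to the global split-conformal quantile if unseen."""
    q = group_q.get(tuple(leaf_prefix[:k]), q_global)
    return y_hat - q, y_hat + q
```

In practice one would sweep the resolution k and only keep groups with enough calibration mass; finite-sample conformal calibration also uses the (1 - alpha)(1 + 1/n) quantile correction, omitted here for brevity.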
【15】FM-RME: Foundation Model Empowered Radio Map Estimation
Title: FM-RME: Foundation Model Empowered Radio Map Estimation
Link: https://arxiv.org/abs/2602.22231
Authors: Dong Yang, Yue Wang, Songyang Zhang, Yingshu Li, Zhipeng Cai, Zhi Tian
Note: 7 pages, 5 figures, conference
Abstract: Traditional radio map estimation (RME) techniques fail to capture multi-dimensional and dynamic characteristics of complex spectrum environments. Recent data-driven methods achieve accurate RME in the spatial domain, but ignore physical prior knowledge of radio propagation, limiting data efficiency especially in multi-dimensional scenarios. To overcome such limitations, we propose FM-RME, a new foundation model characterized by self-supervised pre-training on diverse data for zero-shot generalization, enabling multi-dimensional radio map estimation. Specifically, FM-RME builds an effective synergy of two core components: a geometry-aware feature extraction module that encodes physical propagation symmetries, i.e., translation and rotation invariance, as inductive bias, and an attention-based neural network that learns long-range correlations across the spatial-temporal-spectral domains. A masked self-supervised multi-dimensional pre-training strategy is further developed to learn generalizable spectrum representations across diverse wireless environments. Once pre-trained, FM-RME supports zero-shot inference for multi-dimensional RME, including spatial, temporal, and spectral estimation, without scenario-specific retraining. Simulation results verify that FM-RME exhibits the desired learning performance across diverse datasets and zero-shot generalization capabilities beyond existing RME methods.
Other neural networks | deep learning | models | modeling (29 papers)
【1】Model Agreement via Anchoring
Title: Model Agreement via Anchoring
Link: https://arxiv.org/abs/2602.23360
Authors: Eric Eaton, Surbhi Goel, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
Abstract: Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement in real-valued prediction problems, namely the expected squared difference in predictions between two models trained on independent samples, without any coordination of the training processes. We would like to be able to drive disagreement to zero with some natural parameter(s) of the training procedure using analyses that can be applied to existing training methodologies. We develop a simple general technique for proving bounds on independent model disagreement based on $\textit{anchoring}$ to the average of two models within the analysis. We then apply this technique to prove disagreement bounds for four commonly used machine learning algorithms: (1) stacked aggregation over an arbitrary model class (where disagreement is driven to 0 with the number of models $k$ being stacked); (2) gradient boosting (where disagreement is driven to 0 with the number of iterations $k$); (3) neural network training with architecture search (where disagreement is driven to 0 with the size $n$ of the architecture being optimized over); and (4) regression tree training over all regression trees of fixed depth (where disagreement is driven to 0 with the depth $d$ of the tree architecture). For clarity, we work out our initial bounds in the setting of one-dimensional regression with squared error loss -- but then show that all of our results generalize to multi-dimensional regression with any strongly convex loss.
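The disagreement notion itself is easy to estimate empirically. A minimal sketch using ridge regression as the (arbitrary) learner — our choice for illustration, not the paper's — training two models on independent samples with no coordination, then Monte-Carlo estimating the expected squared prediction difference:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

def sample(n):
    """Draw an i.i.d. regression sample from a fixed linear model."""
    X = rng.normal(size=(n, 3))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

# Two models trained on independent samples, with no coordination.
X1, y1 = sample(500)
X2, y2 = sample(500)
w1, w2 = fit_ridge(X1, y1), fit_ridge(X2, y2)

# Monte-Carlo estimate of the disagreement E[(f1(x) - f2(x))^2].
X_test = rng.normal(size=(10000, 3))
disagreement = float(np.mean((X_test @ w1 - X_test @ w2) ** 2))
```

For a stable learner like ridge on ample data, the estimate is close to zero; the paper's contribution is proving how fast such quantities vanish for algorithms where no closed form exists.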
【2】Inferential Mechanics Part 1: Causal Mechanistic Theories of Machine Learning in Chemical Biology with Implications
Title: Inferential Mechanics Part 1: Causal Mechanistic Theories of Machine Learning in Chemical Biology with Implications
Link: https://arxiv.org/abs/2602.23303
Authors: Ilya Balabin, Thomas M. Kaiser
Abstract: Machine learning techniques are now routinely encountered in research laboratories across the globe. Impressive progress has been made through ML and AI techniques with regard to large data set processing. This progress has increased the ability of the experimenter to digest data and make novel predictions regarding phenomena of interest. However, machine learning predictors generated from data sets taken from the natural sciences are often treated as black boxes which are used broadly and generally without detailed consideration of the causal structure of the data set of interest. Work has been attempted to bring causality into discussions of machine learning models of natural phenomena; however, a firm and unified theoretical treatment is lacking. This series of three papers explores the union of chemical theory, biological theory, probability theory and causality that will correct current causal flaws of machine learning in the natural sciences. This paper, Part 1 of the series, provides the formal framework of the foundational causal structure of phenomena in chemical biology and is extended to machine learning through the novel concept of focus, defined here as the ability of a machine learning algorithm to narrow down to a hidden underpinning mechanism in large data sets. Initial proof of these principles on a family of Akt inhibitors is also provided. The second paper, containing Part 2, will provide a formal exploration of chemical similarity, and Part 3 will present extensive experimental evidence of how hidden causal structures weaken all machine learning in chemical biology. This series serves to establish for chemical biology a new kind of mathematical framework for modeling mechanisms in Nature without the need for the tools of reductionism: inferential mechanics.
【3】Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Title: Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Link: https://arxiv.org/abs/2602.23219
Authors: Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato
Abstract: Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC approximation methods with feasible computational costs and assessed the accuracy trade-off. Our experimental results indicate that the estimated TIC values correlate well with the generalization gap under conditions close to the NTK regime. However, we show both theoretically and empirically that outside the NTK regime such correlation disappears. Finally, we demonstrate that TIC provides better trial pruning ability than existing methods for hyperparameter optimization.
【4】Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Title: Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Link: https://arxiv.org/abs/2602.23201
Authors: Max S. Bennett, Thomas P. Zollo, Richard Zemel
Note: 58 pages, 16 figures, code at https://github.com/maxbennett/Generalized-Neural-Memory
Abstract: Modern machine learning models are deployed in diverse, non-stationary environments where they must continually adapt to new tasks and evolving knowledge. Continual fine-tuning and in-context learning are costly and brittle, whereas neural memory methods promise lightweight updates with minimal forgetting. However, existing neural memory models typically assume a single fixed objective and homogeneous information streams, leaving users with no control over what the model remembers or ignores over time. To address this challenge, we propose a generalized neural memory system that performs flexible updates based on learning instructions specified in natural language. Our approach enables adaptive agents to learn selectively from heterogeneous information sources, supporting settings, such as healthcare and customer service, where fixed-objective memory updates are insufficient.
【5】Learning Physical Operators using Neural Operators
Title: Learning Physical Operators using Neural Operators
Link: https://arxiv.org/abs/2602.23113
Authors: Vignesh Gopakumar, Ander Gray, Dan Giles, Lorenzo Zanisi, Matt J. Kusner, Timo Betcke, Stanislas Pamela, Marc Peter Deisenroth
Abstract: Neural operators have emerged as promising surrogate models for solving partial differential equations (PDEs), but struggle to generalise beyond training distributions and are often constrained to a fixed temporal discretisation. This work introduces a physics-informed training framework that addresses these limitations by decomposing PDEs using operator splitting methods, training separate neural operators to learn individual non-linear physical operators while approximating linear operators with fixed finite-difference convolutions. This modular mixture-of-experts architecture enables generalisation to novel physical regimes by explicitly encoding the underlying operator structure. We formulate the modelling task as a neural ordinary differential equation (ODE) where these learned operators constitute the right-hand side, enabling continuous-in-time predictions through standard ODE solvers and implicitly enforcing PDE constraints. Demonstrated on incompressible and compressible Navier-Stokes equations, our approach achieves better convergence and superior performance when generalising to unseen physics. The method remains parameter-efficient, enabling temporal extrapolation beyond training horizons, and provides interpretable components whose behaviour can be verified against known physics.
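The splitting idea can be sketched without any learned components: alternate a step of the linear (diffusion) operator with a step of the nonlinear (reaction) operator — exactly the slot the paper fills with a neural operator. A toy first-order Lie splitting for a periodic reaction-diffusion equation u_t = u_xx + u(1 - u), with illustrative names and step sizes:

```python
import numpy as np

def heat_step(u, dt, dx):
    """Linear operator: one explicit finite-difference diffusion step
    (stable for dt <= dx**2 / 2) with periodic boundaries."""
    lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    return u + dt * lap

def reaction_step(u, dt):
    """Nonlinear operator: logistic reaction u' = u (1 - u),
    integrated exactly over dt."""
    return u * np.exp(dt) / (1.0 + u * (np.exp(dt) - 1.0))

def lie_split_step(u, dt, dx):
    """First-order Lie splitting: linear step, then nonlinear step.
    The paper replaces steps like `reaction_step` with learned neural
    operators while keeping the linear part as a fixed convolution."""
    return reaction_step(heat_step(u, dt, dx), dt)
```

Because each sub-step is a self-contained operator, the composition can be wrapped as the right-hand side of a neural ODE and integrated with any standard solver, which is what gives the method its continuous-in-time predictions.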
【6】Latent Matters: Learning Deep State-Space Models
Title: Latent Matters: Learning Deep State-Space Models
Link: https://arxiv.org/abs/2602.23050
Authors: Alexej Klushyn, Richard Kurle, Maximilian Soelch, Botond Cseke, Patrick van der Smagt
Note: Published at NeurIPS 2021
Abstract: Deep state-space models (DSSMs) enable temporal predictions by learning the underlying dynamics of observed sequence data. They are often trained by maximising the evidence lower bound. However, as we show, this does not ensure the model actually learns the underlying dynamics. We therefore propose a constrained optimisation framework as a general approach for training DSSMs. Building upon this, we introduce the extended Kalman VAE (EKVAE), which combines amortised variational inference with classic Bayesian filtering/smoothing to model dynamics more accurately than RNN-based DSSMs. Our results show that the constrained optimisation framework significantly improves system identification and prediction accuracy on the example of established state-of-the-art DSSMs. The EKVAE outperforms previous models w.r.t. prediction accuracy, achieves remarkable results in identifying dynamical systems, and can furthermore successfully learn state-space representations where static and dynamic features are disentangled.
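The Bayesian filtering machinery the EKVAE amortises is the classic predict/update recursion; the "extended" variant simply linearises nonlinear transition and emission maps before applying it. A generic linear Kalman step (a textbook sketch, not the paper's code):

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One predict + update step of the Kalman filter.

    m, P : previous posterior mean / covariance of the latent state
    y    : new observation;  A, C : transition / emission matrices
    Q, R : process / observation noise covariances
    """
    # Predict: propagate the belief through the dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction with the new observation.
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new
```

In the EKVAE, the encoder supplies the observation model while these closed-form updates keep the latent posterior consistent with the learned dynamics, which is what RNN-based inference networks do not guarantee.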
【7】Scaling Laws of Global Weather Models
Title: Scaling Laws of Global Weather Models
Link: https://arxiv.org/abs/2602.22962
Authors: Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler
Note: 17 pages, 7 figures
Abstract: Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
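For reference, a power law L(D) = a·D^(-b) is fit by ordinary least squares in log-log space, and the quoted "10x data → up to 3.2x lower loss" corresponds to an exponent b = log10(3.2) ≈ 0.5. A sketch on synthetic numbers (not the paper's measurements):

```python
import numpy as np

# Synthetic (dataset size, validation loss) pairs obeying L = a * D^-b.
D = np.array([1e4, 1e5, 1e6, 1e7])
L = 2.0 * D ** -0.5

# Fit log L = log a - b * log D by least squares.
slope, intercept = np.polyfit(np.log(D), np.log(L), 1)
b_hat = -slope

# A 10x increase in data then cuts the loss by a factor of 10**b_hat.
reduction_per_decade = 10 ** b_hat
```

The same fit applied jointly over (N, D, C) grids is what underlies the compute-optimal allocation analysis in the paper.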
【8】Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
Title: Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
Link: https://arxiv.org/abs/2602.22936
Authors: Wenquan Ma, Yang Sui, Jiaye Teng, Bohan Wang, Jing Xu, Jingqin Yang
Abstract: Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes iterations. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive the generalization bounds under the homogeneous neural network regimes, proving that this regime enables slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results from several aspects, e.g., non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
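The two decay regimes are easy to compare on a toy problem. A sketch of SGD on a one-dimensional quadratic with noisy gradients, contrasting the classical $\eta_t = 1/t$ schedule with the slower $1/\sqrt{t}$ decay the analysis permits (purely illustrative; both converge here, the paper's point is that the slower schedule still admits stability-based generalization bounds):

```python
import numpy as np

def sgd(stepsize, steps=2000, noise=0.1, seed=0):
    """SGD on f(w) = w^2 / 2 with noisy gradient g_t = w_t + noise * xi_t."""
    rng = np.random.default_rng(seed)
    w = 5.0
    for t in range(1, steps + 1):
        g = w + noise * rng.normal()
        w -= stepsize(t) * g
    return w

w_fast = sgd(lambda t: 1.0 / t)            # classical O(1/t) decay
w_slow = sgd(lambda t: 1.0 / np.sqrt(t))   # slower Omega(1/sqrt(t)) decay
```

The slower schedule keeps the stepsize larger for longer, which in practice speeds up optimization early in training while still shrinking enough for stability.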
【9】Multi-agent imitation learning with function approximation: Linear Markov games and beyond
Title: Multi-agent imitation learning with function approximation: Linear Markov games and beyond
Link: https://arxiv.org/abs/2602.22810
Authors: Luca Viano, Till Freihaut, Emanuele Nevali, Volkan Cevher, Matthieu Geist, Giorgia Ramponi
Abstract: In this work, we present the first theoretical analysis of multi-agent imitation learning (MAIL) in linear Markov games where both the transition dynamics and each agent's reward function are linear in some given features. We demonstrate that by leveraging this structure, it is possible to replace the state-action level "all policy deviation concentrability coefficient" (Freihaut et al., arXiv:2510.09325) with a concentrability coefficient defined at the feature level, which can be much smaller than the state-action analog when the features are informative about states' similarity. Furthermore, to circumvent the need for any concentrability coefficient, we turn to the interactive setting. We provide the first computationally efficient, interactive MAIL algorithm for linear Markov games and show that its sample complexity depends only on the dimension of the feature map $d$. Building on these theoretical findings, we propose a deep interactive MAIL algorithm which clearly outperforms BC on games such as Tic-Tac-Toe and Connect4.
【10】KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling
Title: KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling
Link: https://arxiv.org/abs/2602.22777
Authors: Mingming Zhang, Pengfei Shi, Zhiqing Xiao, Feng Zhao, Guandong Sun, Yulin Kang, Ruizhe Gao, Ningtao Wang, Xing Fu, Weiqiang Wang, Junbo Zhao
Note: Accepted by THE ACM WEB CONFERENCE 2026
Abstract: Predictive modeling on web-scale tabular data with billions of instances and hundreds of heterogeneous numerical features faces significant scalability challenges. These features exhibit anisotropy, heavy-tailed distributions, and non-stationarity, creating bottlenecks for models like Gradient Boosting Decision Trees and requiring laborious manual feature engineering. We introduce KMLP, a hybrid deep architecture integrating a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to automatically model complex non-linear transformations for each feature, while the gMLP backbone captures high-order interactions. Experiments on public benchmarks and an industrial dataset with billions of samples show KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs increasing at larger scales, validating KMLP as a scalable deep learning paradigm for large-scale web tabular data.
【11】Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
Title: Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
Link: https://arxiv.org/abs/2602.22719
Authors: Vamshi Sunku Mohan, Kaustubh Gupta, Aneesha Das, Chandan Singh
Abstract: State-space models (SSMs) have emerged as an efficient strategy for building powerful language models, avoiding the quadratic complexity of computing attention in transformers. Despite their promise, the interpretability and steerability of modern SSMs remain relatively underexplored. We take a major step in this direction by identifying activation subspace bottlenecks in the Mamba family of SSM models using tools from mechanistic interpretability. We then introduce a test-time steering intervention that simply multiplies the activations of the identified bottlenecks by a scalar. Across 5 SSMs and 6 diverse benchmarks, this intervention improves performance by an average of 8.27%, without requiring any task-specific tuning. Finally, we validate that the identified bottlenecks are indeed hindering performance by modifying them to yield an architecture we call Stable-Mamba, which achieves long-context performance gains when retrained from scratch.
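The intervention itself is a one-line change at inference time: scale the activations of the identified bottleneck coordinates by a scalar. A toy NumPy version on a two-layer ReLU network (the dimensions being steered would, in the paper, come from the mechanistic analysis; everything here is illustrative):

```python
import numpy as np

def mlp_forward(x, W1, W2, steer_dims=None, scale=1.0):
    """Two-layer ReLU MLP whose hidden activations can be steered at
    test time by multiplying a chosen subspace by a scalar."""
    h = np.maximum(x @ W1, 0.0)          # hidden activations
    if steer_dims is not None:
        h = h.copy()
        h[..., steer_dims] *= scale      # the steering intervention
    return h @ W2

x = np.array([1.0, -1.0])
W1 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
W2 = np.array([[1.0], [1.0], [1.0]])

baseline = mlp_forward(x, W1, W2)
steered = mlp_forward(x, W1, W2, steer_dims=[0], scale=2.0)
```

In a real SSM this would be implemented as a forward hook on the bottleneck layer; the appeal is that no weights change, so the intervention can be toggled per query.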
【12】dLLM: Simple Diffusion Language Modeling
Title: dLLM: Simple Diffusion Language Modeling
Link: https://arxiv.org/abs/2602.22661
Authors: Zhanhui Zhou, Lingjie Chen, Hanghang Tong, Dawn Song
Note: Code available at: https://github.com/ZHZisZZ/dllm
Abstract: Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accelerates, there is a clear need for a unified framework that standardizes these common components while remaining flexible enough to support new methods and architectures. To address this gap, we introduce dLLM, an open-source framework that unifies the core components of diffusion language modeling -- training, inference, and evaluation -- and makes them easy to customize for new designs. With dLLM, users can reproduce, finetune, deploy, and evaluate open-source large DLMs such as LLaDA and Dream through a standardized pipeline. The framework also provides minimal, reproducible recipes for building small DLMs from scratch with accessible compute, including converting any BERT-style encoder or autoregressive LM into a DLM. We also release the checkpoints of these small DLMs to make DLMs more accessible and accelerate future research.
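One of the shared components such frameworks standardize is the forward (corruption) process of masked DLMs like LLaDA and Dream: sample a corruption level t, mask each token independently with probability t, and train the model to recover only the masked positions. A sketch of the corruption step; `MASK_ID` and the shapes are illustrative, not dLLM's actual API:

```python
import numpy as np

MASK_ID = -1  # stand-in for the tokenizer's [MASK] id

def corrupt(tokens, t, rng):
    """Forward masking process of a masked diffusion LM: each token is
    replaced by MASK_ID independently with probability t."""
    mask = rng.random(tokens.shape) < t
    return np.where(mask, MASK_ID, tokens), mask

rng = np.random.default_rng(0)
tokens = np.arange(10)
noisy, mask = corrupt(tokens, t=1.0, rng=rng)   # t = 1 masks every token
```

During training, t is typically drawn uniformly per sequence and the cross-entropy loss is computed only where `mask` is true; at t = 1 the input is fully masked, which is the state generation starts from.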
【13】TorchLean: Formalizing Neural Networks in Lean
Title: TorchLean: Formalizing Neural Networks in Lean
Link: https://arxiv.org/abs/2602.22631
Authors: Robert Joseph George, Jennifer Cruden, Xiangru Zhong, Huan Zhang, Anima Anandkumar
Note: 35 pages, multiple figures and tables
Abstract: Neural networks are increasingly deployed in safety- and mission-critical pipelines, yet many verification and analysis results are produced outside the programming environment that defines and runs the model. This separation creates a semantic gap between the executed network and the analyzed artifact, so guarantees can hinge on implicit conventions such as operator semantics, tensor layouts, preprocessing, and floating-point corner cases. We introduce TorchLean, a framework in the Lean 4 theorem prover that treats learned models as first-class mathematical objects with a single, precise semantics shared by execution and verification. TorchLean unifies (1) a PyTorch-style verified API with eager and compiled modes that lower to a shared op-tagged SSA/DAG computation-graph IR, (2) explicit Float32 semantics via an executable IEEE-754 binary32 kernel and proof-relevant rounding models, and (3) verification via IBP and CROWN/LiRPA-style bound propagation with certificate checking. We validate TorchLean end-to-end on certified robustness, physics-informed residual bounds for PINNs, and Lyapunov-style neural controller verification, alongside mechanized theoretical results including a universal approximation theorem. These results demonstrate a semantics-first infrastructure for fully formal, end-to-end verification of learning-enabled systems.
【14】Relatron: Automating Relational Machine Learning over Relational Databases
Title: Relatron: Automating Relational Machine Learning over Relational Databases
Link: https://arxiv.org/abs/2602.22552
Authors: Zhikai Chen, Han Xie, Jian Zhang, Jiliang Tang, Xiang Song, Huzefa Rangwala
Note: ICLR 2026
Abstract: Predictive modeling over relational databases (RDBs) powers many applications, yet remains challenging because it must capture both cross-table dependencies and complex feature interactions. Relational Deep Learning (RDL) methods automate feature engineering via message passing, while classical approaches like Deep Feature Synthesis (DFS) rely on predefined non-parametric aggregators. Despite performance gains, the comparative advantages of RDL over DFS and the design principles for selecting effective architectures remain poorly understood. We present a comprehensive study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) validation accuracy is an unreliable guide for architecture choice. This search yields a model performance bank that links architecture configurations to their performance; leveraging this bank, we analyze the drivers of the RDL-DFS performance gap and introduce two task signals -- RDB task homophily and an affinity embedding that captures size, path, feature, and temporal structure -- whose correlation with the gap enables principled routing. Guided by these signals, we propose Relatron, a task embedding-based meta-selector that chooses between RDL and DFS and prunes the within-family search. Lightweight loss-landscape metrics further guard against brittle checkpoints by preferring flatter optima. In experiments, Relatron resolves the "more tuning, worse performance" effect and, in joint hyperparameter-architecture optimization, achieves up to 18.5% improvement over strong baselines with 10x lower cost than Fisher information-based alternatives.
【15】Coarse-to-Fine Learning of Dynamic Causal Structures
Title: Coarse-to-Fine Learning of Dynamic Causal Structures
Link: https://arxiv.org/abs/2602.22532
Authors: Dezhi Yang, Qiaoyu Tan, Carlotta Domeniconi, Jun Wang, Lizhen Cui, Guoxian Yu
Note: Accepted by ICLR 2026
Abstract: Learning the dynamic causal structure of time series is a challenging problem. Most existing approaches rely on distributional or structural invariance to uncover underlying causal dynamics, assuming stationary or partially stationary causality. However, these assumptions often conflict with the complex, time-varying causal relationships observed in real-world systems. This motivates the need for methods that address fully dynamic causality, where both instantaneous and lagged dependencies evolve over time. Such a setting poses significant challenges for the efficiency and stability of causal discovery. To address these challenges, we introduce DyCausal, a dynamic causal structure learning framework. DyCausal leverages convolutional networks to capture causal patterns within coarse-grained time windows, and then applies linear interpolation to refine causal structures at each time step, thereby recovering fine-grained and time-varying causal graphs. In addition, we propose an acyclic constraint based on matrix norm scaling, which improves efficiency while effectively constraining loops in evolving causal structures. Comprehensive evaluations on both synthetic and real-world datasets demonstrate that DyCausal achieves superior performance compared to existing methods, offering a stable and efficient approach for identifying fully dynamic causal structures from coarse to fine.
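For context, the standard differentiable acyclicity constraint (NOTEARS-style) penalizes h(A) = tr(e^{A∘A}) − d, which is zero exactly when the weighted graph A has no cycles; DyCausal's matrix-norm-scaled constraint is a more efficient variant of this idea, so the sketch below shows only the baseline, with the matrix exponential evaluated by a truncated Taylor series:

```python
import numpy as np

def acyclicity(A, terms=25):
    """NOTEARS-style penalty h(A) = tr(exp(A * A)) - d, with the matrix
    exponential evaluated via a truncated Taylor series. h(A) = 0 iff the
    weighted adjacency matrix A describes an acyclic graph."""
    d = A.shape[0]
    M = A * A                  # elementwise square: nonnegative entries
    term = np.eye(d)
    total = np.eye(d)
    for k in range(1, terms):
        term = term @ M / k    # next Taylor term M^k / k!
        total = total + term
    return float(np.trace(total) - d)
```

The penalty works because the (i, i) entry of M^k counts weighted length-k cycles through node i; adding h(A) to the training loss therefore drives every learned graph toward a DAG.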
【16】Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression
Title: Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression
Link: https://arxiv.org/abs/2602.22422
Authors: Luciano Gerber, Huw Lloyd
Note: 32 pages, 6 figures, 11 tables. Submitted to Information Sciences
Abstract: Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.
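The anisotropic RBF idea reduces to giving each input dimension its own length-scale in the Gaussian basis. A minimal sketch with fixed centres and widths and a ridge solve for the output weights (the paper additionally learns centre placement and the widths by gradient descent; all names here are illustrative):

```python
import numpy as np

def rbf_features(X, centres, widths):
    """Anisotropic Gaussian features with one length-scale per dimension:
    phi(x)_j = exp(-sum_d (x_d - c_{jd})^2 / widths_d^2)."""
    diff = X[:, None, :] - centres[None, :, :]            # (n, m, d)
    return np.exp(-np.sum((diff / widths) ** 2, axis=2))  # (n, m)

def fit_rbf(X, y, centres, widths, lam=1e-8):
    """Ridge solve for the output-layer weights of the RBF network."""
    Phi = rbf_features(X, centres, widths)
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)

# Toy 1-D fit: sin(x) on [0, 3] with 10 centres of width 0.5.
X = np.linspace(0.0, 3.0, 30)[:, None]
y = np.sin(X[:, 0])
centres, widths = X[::3], np.array([0.5])
w = fit_rbf(X, y, centres, widths)
pred = rbf_features(X, centres, widths) @ w
```

Because the prediction is a sum of Gaussians, it is infinitely differentiable in the inputs — the property the paper argues matters for surrogate optimisation and sensitivity analysis.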
【17】Testable Learning of General Halfspaces under Massart Noise
Title: Testable Learning of General Halfspaces under Massart Noise
Link: https://arxiv.org/abs/2602.22300
Authors: Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Sihan Liu
Abstract: We study the algorithmic task of testably learning general Massart halfspaces under the Gaussian distribution. In the testable learning setting, the aim is the design of a tester-learner pair satisfying the following properties: (1) if the tester accepts, the learner outputs a hypothesis and a certificate that it achieves near-optimal error, and (2) it is highly unlikely that the tester rejects if the data satisfies the underlying assumptions. Our main result is the first testable learning algorithm for general halfspaces with Massart noise and Gaussian marginals. The complexity of our algorithm is $d^{\mathrm{polylog}(\min\{1/\gamma, 1/\epsilon\})}$, where $\epsilon$ is the excess error and $\gamma$ is the bias of the target halfspace, which qualitatively matches the known quasi-polynomial Statistical Query lower bound for the non-testable setting. The analysis of our algorithm hinges on a novel sandwiching polynomial approximation to the sign function with multiplicative error that may be of broader interest.
【18】AviaSafe: A Physics-Informed Data-Driven Model for Aviation Safety-Critical Cloud Forecasts
Title: AviaSafe: A Physics-Informed Data-Driven Model for Aviation Safety-Critical Cloud Forecasts
Link: https://arxiv.org/abs/2602.22298
Authors: Zijian Zhu, Qiusheng Huang, Anboyu Guo, Xiaohui Zhong, Hao Li
Abstract: Current AI weather forecasting models predict conventional atmospheric variables but cannot distinguish between cloud microphysical species critical for aviation safety. We introduce AviaSafe, a hierarchical, physics-informed neural forecaster that produces global, six-hourly predictions of four hydrometeor species for lead times up to 7 days. Our approach addresses the unique challenges of cloud prediction: extreme sparsity, discontinuous distributions, and complex microphysical interactions between species. We integrate the Icing Condition (IC) index from aviation meteorology as a physics-based constraint that identifies regions where supercooled water fuels explosive ice crystal growth. The model employs a hierarchical architecture that first predicts cloud spatial distribution through masked attention, then quantifies species concentrations within identified regions. Training on ERA5 reanalysis data, our model achieves lower RMSE for cloud species compared to baselines and outperforms operational numerical models on certain key variables at 7-day lead times. The ability to forecast individual cloud species enables new applications in aviation route optimization, where distinguishing between ice and liquid water determines engine icing risk.
【19】OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data
Title: OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data
Link: https://arxiv.org/abs/2602.22286
Authors: Yan Zhao, Zhengxue Cheng, Junxuan Zhang, Dajiang Zhou, Qunshan Gu, Qi Wang, Li Song
Note: 8 figures, 10 tables
Abstract: Lossless compression is essential for efficient data storage and transmission. Although learning-based lossless compressors achieve strong results, most of them are designed for a single modality, leading to redundant compressor deployments in multi-modal settings. Designing a unified multi-modal compressor is critical yet challenging, as different data types vary largely in format, dimension, and statistics. Multi-modal large language models offer a promising resolution but remain too complex for practical use. Thus, we propose OmniZip, a unified and lightweight lossless compressor for multi-modal data (such as image, text, speech, tactile, database, and gene-sequence data). Built on a lightweight backbone, OmniZip incorporates three key components to enable efficient multi-modal lossless compression: a modality-unified tokenizer that reversibly transforms diverse data into tokens, a modality-routing context learning mechanism that enables flexible multi-modal context modeling, and a modality-routing feedforward design that further enhances the model's nonlinear representation flexibility. A reparameterization training strategy is used to enhance model capacity. OmniZip outperforms or matches other state-of-the-art compressors on multiple modalities, achieving 42%, 57%, 62%, 42%, and 53% higher compression efficiency than gzip on the CLIC-M, TouchandGo, enwik9, LibriSpeech, and WikiSQL datasets, respectively. It also supports near real-time inference on resource-constrained edge devices, reaching about 1 MB/s on MacBook CPUs and iPhone NPUs. Our code is released at https://github.com/adminasmi/OmniZip-CVPR2026.
【20】WaveSSM: Multiscale State-Space Models for Non-stationary Signal Attention
标题:WaveSSM:非平稳信号注意力的多尺度状态空间模型
链接:https://arxiv.org/abs/2602.22266
作者:Ruben Solozabal,Velibor Bojkovic,Hilal Alquabeh,Klea Ziu,Kentaro Inui,Martin Takac
摘要:State-space models (SSMs) have emerged as a powerful foundation for long-range sequence modeling, with the HiPPO framework showing that continuous-time projection operators can be used to derive stable, memory-efficient dynamical systems that encode the past history of the input signal. However, existing projection-based SSMs often rely on polynomial bases with global temporal support, whose inductive biases are poorly matched to signals exhibiting localized or transient structure. In this work, we introduce \emph{WaveSSM}, a collection of SSMs constructed over wavelet frames. Our key observation is that wavelet frames yield localized support along the temporal dimension, useful for tasks requiring precise localization. Empirically, we show that under equal conditions, \emph{WaveSSM} outperforms orthogonal counterparts such as S4 on real-world datasets with transient dynamics, including physiological signals on the PTB-XL dataset and raw audio on Speech Commands.
【21】Code World Models for Parameter Control in Evolutionary Algorithms
标题:进化算法中参数控制的代码世界模型
链接:https://arxiv.org/abs/2602.22260
作者:Camilo Chacón Sartori,Guillem Rodríguez Corominas
摘要:Can an LLM learn how an optimizer behaves -- and use that knowledge to control it? We extend Code World Models (CWMs), LLM-synthesized Python programs that predict environment dynamics, from deterministic games to stochastic combinatorial optimization. Given suboptimal trajectories of $(1{+}1)$-$\text{RLS}_k$, the LLM synthesizes a simulator of the optimizer's dynamics; greedy planning over this simulator then selects the mutation strength $k$ at each step. On LeadingOnes and OneMax, CWM-greedy performs within 6\% of the theoretically optimal policy -- without ever seeing optimal-policy trajectories. On Jump$_k$, where a deceptive valley causes all adaptive baselines to fail (0\% success rate), CWM-greedy achieves 100\% success rate -- without any collection policy using oracle knowledge of the gap parameter. On the NK-Landscape, where no closed-form model exists, CWM-greedy outperforms all baselines across fifteen independently generated instances ($36.94$ vs. $36.32$; $p<0.001$) when the prompt includes empirical transition statistics. The CWM also outperforms DQN in sample efficiency (200 offline trajectories vs. 500 online episodes), success rate (100\% vs. 58\%), and generalization ($k{=}3$: 78\% vs. 0\%). Robustness experiments confirm stable synthesis across 5 independent runs.
【22】Deep Sequence Modeling with Quantum Dynamics: Language as a Wave Function
标题:利用量子动力学进行深度序列建模:作为波函数的语言
链接:https://arxiv.org/abs/2602.22255
作者:Ahmed Nebli,Hadi Saadatdoorabi,Kevin Yam
摘要:We introduce a sequence modeling framework in which the latent state is a complex-valued wave function evolving on a finite-dimensional Hilbert space under a learned, time-dependent Hamiltonian. Unlike standard recurrent architectures that rely on gating mechanisms to suppress competing hypotheses, our framework utilizes quantum interference: the Hamiltonian steers the phases of complex amplitudes so that conflicting interpretations cancel while compatible ones reinforce. The dynamics are strictly unitary, ensuring that the state norm is preserved exactly at every time step via a Cayley (Crank--Nicolson) discretization. Token probabilities are extracted using the Born rule, a quadratic measurement operator that couples magnitudes and relative phases. Our primary theoretical contribution is a separation theorem characterizing the representational advantage of this readout: we define a family of disambiguation tasks that a complex unitary model of dimension $N$ solves exactly, but which requires a state dimension of $Ω(N^2)$ for any real-valued orthogonal model equipped with a standard affine-softmax readout. This quadratic gap arises because the Born rule implicitly lifts the $N$-dimensional state into the space of rank-one Hermitian matrices, accessing pairwise phase correlations that are inaccessible to linear projections. Finally, we derive a continuity equation for the latent probability mass, yielding conserved pairwise currents that serve as a built-in diagnostic for tracing information flow between dimensions.
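The unitarity claim above is easy to check numerically. A minimal NumPy sketch (not the authors' code) of one Cayley (Crank--Nicolson) step under an arbitrary Hermitian "Hamiltonian", with a Born-rule readout at the end; the dimension and Hamiltonian here are made up for illustration:

```python
import numpy as np

def cayley_step(psi, H, dt):
    """One Crank-Nicolson step: psi <- (I - i*dt/2*H)^{-1} (I + i*dt/2*H) psi.
    For Hermitian H this update matrix is exactly unitary, so ||psi|| is preserved."""
    n = H.shape[0]
    I = np.eye(n, dtype=complex)
    A = I - 0.5j * dt * H
    B = I + 0.5j * dt * H
    return np.linalg.solve(A, B @ psi)

rng = np.random.default_rng(0)
n = 8
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2            # Hermitian stand-in for the learned Hamiltonian
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)          # wave function, unit norm

for _ in range(100):
    psi = cayley_step(psi, H, dt=0.05)

print(abs(np.linalg.norm(psi) - 1.0))  # ~ machine epsilon: norm preserved
probs = np.abs(psi) ** 2               # Born rule: probabilities are squared amplitudes
print(probs.sum())                     # ~ 1.0
```

The implicit solve is what distinguishes this from a naive Euler step, which would not be unitary and would let the norm drift.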
【23】Q-Tag: Watermarking Quantum Circuit Generative Models
标题:Q-Tag:为量子电路生成模型添加水印
链接:https://arxiv.org/abs/2602.23085
作者:Yang Yang,Yuzhu Long,Han Fang,Zhaoyun Chen,Zhonghui Li,Weiming Zhang,Guoping Guo
备注:13 pages, 8 figures
摘要:Quantum cloud platforms have become the most widely adopted and mainstream approach for accessing quantum computing resources, due to the scarcity and operational complexity of quantum hardware. In this service-oriented paradigm, quantum circuits, which constitute high-value intellectual property, are exposed to risks of unauthorized access, reuse, and misuse. Digital watermarking has been explored as a promising mechanism for protecting quantum circuits by embedding ownership information for tracing and verification. However, driven by recent advances in generative artificial intelligence, the paradigm of quantum circuit design is shifting from individually and manually constructed circuits to automated synthesis based on quantum circuit generative models (QCGMs). In such generative settings, protecting only individual output circuits is insufficient, and existing post hoc, circuit-centric watermarking methods are not designed to integrate with the generative process, often failing to simultaneously ensure stealthiness, functional correctness, and robustness at scale. These limitations highlight the need for a new watermarking paradigm that is natively integrated with quantum circuit generative models. In this work, we present the first watermarking framework for QCGMs, which embeds ownership signals into the generation process while preserving circuit fidelity. We introduce a symmetric sampling strategy that aligns watermark encoding with the model's Gaussian prior, and a synchronization mechanism that counteracts adversarial watermark attack through latent drift correction. Empirical results confirm that our method achieves high-fidelity circuit generation and robust watermark detection across a range of perturbations, paving the way for scalable, secure copyright protection in AI-powered quantum design.
【24】Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
标题:超越NNGP:贝叶斯神经网络中的大偏差和特征学习
链接:https://arxiv.org/abs/2602.22925
作者:Katerina Papagiannouli,Dario Trevisan,Giuseppe Pio Zitto
摘要:We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding a notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
【25】SPD Learn: A Geometric Deep Learning Python Library for Neural Decoding Through Trivialization
标题:SPD Learn:一个通过平凡化进行神经解码的几何深度学习Python库
链接:https://arxiv.org/abs/2602.22895
作者:Bruno Aristimunha,Ce Ju,Antoine Collas,Florent Bouchard,Ammar Mian,Bertrand Thirion,Sylvain Chevallier,Reinmar Kobler
备注:9 Pages
摘要:Implementations of symmetric positive definite (SPD) matrix-based neural networks for neural decoding remain fragmented across research codebases and Python packages. Existing implementations often employ ad hoc handling of manifold constraints and non-unified training setups, which hinders reproducibility and integration into modern deep-learning workflows. To address this gap, we introduce SPD Learn, a unified and modular Python package for geometric deep learning with SPD matrices. SPD Learn provides core SPD operators and neural-network layers, including numerically stable spectral operators, and enforces Stiefel/SPD constraints via trivialization-based parameterizations. This design enables standard backpropagation and optimization in unconstrained Euclidean spaces while producing manifold-constrained parameters by construction. The package also offers reference implementations of representative SPDNet-based models and interfaces with widely used brain-computer interface/neuroimaging toolkits and modern machine-learning libraries (e.g., MOABB, Braindecode, Nilearn, and SKADA), facilitating reproducible benchmarking and practical deployment.
【26】Advancing accelerator virtual beam diagnostics through latent evolution modeling: an integrated solution to forward, inverse, tuning, and UQ problems
标题:通过潜在进化建模推进加速器虚拟射束诊断:正向、反向、调谐和UQ问题的集成解决方案
链接:https://arxiv.org/abs/2602.22618
作者:Mahindra Rautela,Alexander Scheinker
摘要:Virtual beam diagnostics relies on computationally intensive beam dynamics simulations where high-dimensional charged particle beams evolve through the accelerator. We propose Latent Evolution Model (LEM), a hybrid machine learning framework with an autoencoder that projects high-dimensional phase spaces into lower-dimensional representations, coupled with transformers to learn temporal dynamics in the latent space. This approach provides a common foundational framework addressing multiple interconnected challenges in beam diagnostics. For \textit{forward modeling}, a Conditional Variational Autoencoder (CVAE) encodes 15 unique projections of the 6D phase space into a latent representation, while a transformer predicts downstream latent states from upstream inputs. For \textit{inverse problems}, we address two distinct challenges: (a) predicting upstream phase spaces from downstream observations by utilizing the same CVAE architecture with transformers trained on reversed temporal sequences along with aleatoric uncertainty quantification, and (b) estimating RF settings from the latent space of the trained LEM using a dedicated dense neural network that maps latent representations to RF parameters. For \textit{tuning problems}, we leverage the trained LEM and RF estimator within a Bayesian optimization framework to determine optimal RF settings that minimize beam loss. This paper summarizes our recent efforts and demonstrates how this unified approach effectively addresses these traditionally separate challenges.
【27】What Topological and Geometric Structure Do Biological Foundation Models Learn? Evidence from 141 Hypotheses
标题:生物基础模型学习什么拓扑和几何结构?来自141个假设的证据
链接:https://arxiv.org/abs/2602.22289
作者:Ihor Kendiukhov
摘要:When biological foundation models such as scGPT and Geneformer process single-cell gene expression, what geometric and topological structure forms in their internal representations? Is that structure biologically meaningful or a training artifact, and how confident should we be in such claims? We address these questions through autonomous large-scale hypothesis screening: an AI-driven executor-brainstormer loop that proposed, tested, and refined 141 geometric and topological hypotheses across 52 iterations, covering persistent homology, manifold distances, cross-model alignment, community structure, and directed topology, all with explicit null controls and disjoint gene-pool splits. Three principal findings emerge. First, the models learn genuine geometric structure. Gene embedding neighborhoods exhibit non-trivial topology, with persistent homology significant in 11 of 12 transformer layers at p < 0.05 in the weakest domain and 12 of 12 in the other two. A multi-level distance hierarchy shows that manifold-aware metrics outperform Euclidean distance for identifying regulatory gene pairs, and graph community partitions track known transcription factor target relationships. Second, this structure is shared across independently trained models. CCA alignment between scGPT and Geneformer yields canonical correlation of 0.80 and gene retrieval accuracy of 72 percent, yet none of 19 tested methods reliably recover gene-level correspondences. The models agree on the global shape of gene space but not on precise gene placement. Third, the structure is more localized than it first appears. Under stringent null controls applied across all null families, robust signal concentrates in immune tissue, while lung and external lung signals weaken substantially.
【28】Stochastic Neural Networks for Quantum Devices
标题:量子设备的随机神经网络
链接:https://arxiv.org/abs/2602.22241
作者:Bodo Rosenhahn,Tobias J. Osborne,Christoph Hirche
备注:15 pages
摘要:This work presents a formulation to express and optimize stochastic neural networks as quantum circuits in gate-based quantum computing. Motivated by a classical perceptron, stochastic neurons are introduced and combined into a quantum neural network. The Kiefer-Wolfowitz algorithm in combination with simulated annealing is used for training the network weights. Several topologies and models are presented, including shallow fully connected networks, Hopfield Networks, Restricted Boltzmann Machines, Autoencoders and convolutional neural networks. We also demonstrate the combination of our optimized neural networks as an oracle for the Grover algorithm to realize a quantum generative AI model.
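The classical building block here is a stochastic neuron that fires with probability given by a squashed pre-activation. A toy NumPy sketch of that classical analogue (weights `w`, `b`, and input `x` are made up; in the paper this sampling behavior is realized by quantum gates and the weights are trained via Kiefer-Wolfowitz with simulated annealing):

```python
import numpy as np

def stochastic_neuron(x, w, b, rng):
    """Classical stochastic perceptron: fire (output 1) with probability
    sigmoid(w.x + b); returns the sampled spike and the firing probability."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return int(rng.random() < p), p

rng = np.random.default_rng(1)
w, b = np.array([2.0, -1.0]), 0.5          # illustrative weights
x = np.array([1.0, 1.0])

samples = [stochastic_neuron(x, w, b, rng)[0] for _ in range(20000)]
_, p = stochastic_neuron(x, w, b, rng)
print(np.mean(samples), p)   # empirical firing rate matches sigmoid(1.5)
```

Networks of such neurons compose by feeding sampled spikes forward, which is what the quantum-circuit formulation encodes in superposition.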
【29】Solving stiff dark matter equations via Jacobian Normalization with Physics-Informed Neural Networks
标题:利用物理信息神经网络通过雅可比归一化求解刚性暗物质方程
链接:https://arxiv.org/abs/2602.21988
作者:M. P. Bento,H. B. Câmara,J. R. Rocha,J. F. Seabra
备注:16 LaTeX pages; 6 figures
摘要:Stiff differential equations pose a major challenge for Physics-Informed Neural Networks (PINNs), often causing poor convergence. We propose a simple, hyperparameter-free method to address stiffness by normalizing loss residuals with the Jacobian. We provide theoretical indications that Jacobian-based normalization can improve gradient descent and validate it on benchmark stiff ordinary differential equations. We then apply it to a realistic system: the stiff Boltzmann equations (BEs) governing weakly interacting massive particle (WIMP) dark matter (DM). Our approach achieves higher accuracy than attention mechanisms previously proposed for handling stiffness, recovering the full solution where prior methods fail. This is further demonstrated in an inverse problem with a single experimental data point - the observed DM relic density - where our inverse PINNs correctly infer the cross section that solves the BEs in both Standard and alternative cosmologies.
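The effect of Jacobian-based normalization can be seen on a scalar stiff ODE without any network in the loop. A minimal sketch, using a plain 1 + |J| scaling as a stand-in for the paper's exact scheme, and a hand-made imperfect solution `y_hat` standing in for a PINN output:

```python
import numpy as np

# Stiff test ODE: y' = -lam * (y - cos(t)), with Jacobian df/dy = -lam.
lam = 1000.0
t = np.linspace(0.0, 1.0, 200)
y_hat = np.cos(t) + 0.05 * np.sin(5 * t)   # imperfect approximate solution
dy_hat = np.gradient(y_hat, t)             # its time derivative

raw_residual = dy_hat + lam * (y_hat - np.cos(t))   # standard PINN loss term
jac = -lam * np.ones_like(t)                        # df/dy along the trajectory
norm_residual = raw_residual / (1.0 + np.abs(jac))  # Jacobian-normalized residual

# The raw residual is O(lam * error) and would dominate the loss gradients;
# normalization brings it back to the scale of the solution error itself.
print(np.abs(raw_residual).max(), np.abs(norm_residual).max())
```

The same rescaling applied inside the loss keeps stiff and non-stiff equations on comparable footing during gradient descent, which is the failure mode the paper targets.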
其他(44篇)
【1】A Dataset is Worth 1 MB
标题:数据集价值1 MB
链接:https://arxiv.org/abs/2602.23358
作者:Elad Kimchi Shoshani,Leeyam Gabay,Yedid Hoshen
备注:23 pages, 9 figures
摘要:A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate on diverse hardware and software frameworks, transmitting a pre-trained model is often infeasible; instead, agents require raw data to train their own task-specific models locally. While dataset distillation attempts to compress training signals, current methods struggle to scale to high-resolution data and rarely achieve sufficiently small files. In this paper, we propose Pseudo-Labels as Data (PLADA), a method that completely eliminates pixel transmission. We assume agents are preloaded with a large, generic, unlabeled reference dataset (e.g., ImageNet-1K, ImageNet-21K) and communicate a new task by transmitting only the class labels for specific images. To address the distribution mismatch between the reference and target datasets, we introduce a pruning mechanism that filters the reference dataset to retain only the labels of the most semantically relevant images for the target task. This selection process simultaneously maximizes training efficiency and minimizes transmission payload. Experiments on 10 diverse datasets demonstrate that our approach can transfer task knowledge with a payload of less than 1 MB while retaining high classification accuracy, offering a promising solution for efficient dataset serving.
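The transmission scheme can be sketched as follows, with made-up embeddings and a simple nearest-prototype pruning rule standing in for PLADA's actual selection mechanism; only (reference index, label) pairs cross the wire:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ref, n_classes, k = 64, 10_000, 5, 200
ref_emb = rng.normal(size=(n_ref, d))        # embeddings of the preloaded reference set
prototypes = rng.normal(size=(n_classes, d)) # one (hypothetical) prototype per target class

payload = []  # (reference index, class label) pairs -- the only thing transmitted
for c in range(n_classes):
    sims = ref_emb @ prototypes[c]
    top = np.argsort(sims)[-k:]              # prune: keep the most relevant reference images
    payload.extend((int(i), c) for i in top)

# Rough payload estimate: 4 bytes for an index + 1 byte for a label per pair.
payload_bytes = len(payload) * 5
print(len(payload), payload_bytes)           # far below the 1 MB budget
```

The client then trains locally on the referenced reference-set images with the transmitted pseudo-labels; no pixels are ever sent.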
【2】FlashOptim: Optimizers for Memory Efficient Training
标题:FlashOptim:内存高效训练的优化器
链接:https://arxiv.org/abs/2602.23349
作者:Jose Javier Gonzalez Ortiz,Abhay Gupta,Chris Renard,Davis Blalock
备注:Source code is available at https://github.com/databricks/flashoptim
摘要:Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parameter itself, but also its gradient and one or more optimizer state variables. With each of these values typically requiring 4 bytes, training even a 7 billion parameter model can be impractical for researchers with less than 100GB of accelerator memory. We introduce FlashOptim, a suite of optimizations that reduces per-parameter memory by over 50% while preserving model quality and API compatibility. Our approach introduces two key techniques. First, we improve master weight splitting by finding and exploiting a tight bound on its quantization error. Second, we design companding functions that greatly reduce the error in 8-bit optimizer state quantization. Together with 16-bit gradients, these techniques reduce AdamW memory from 16 bytes to 7 bytes per parameter, or 5 bytes with gradient release. They also cut model checkpoint sizes by more than half. Experiments with FlashOptim applied to SGD, AdamW, and Lion show no measurable quality degradation on any task from a collection of standard vision and language benchmarks, including Llama-3.1-8B finetuning.
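A generic illustration of why companding helps 8-bit optimizer-state quantization: most optimizer-state values concentrate near zero, so compressing the dynamic range before uniform quantization spends more codes where the mass is. The sign-preserving square-root compander below is a textbook choice, not the function designed in the paper:

```python
import numpy as np

def quantize8(x, compand=True):
    """Round-trip a tensor through 8-bit codes, optionally companding first.
    Compander: y = sign(x) * sqrt(|x|); its inverse is sign(y) * y^2."""
    y = np.sign(x) * np.sqrt(np.abs(x)) if compand else x
    s = np.abs(y).max() / 127.0                # per-tensor scale
    q = np.clip(np.round(y / s), -127, 127)    # int8 codes
    y_hat = q * s
    return np.sign(y_hat) * y_hat**2 if compand else y_hat

rng = np.random.default_rng(0)
v = rng.normal(size=100_000) ** 3              # heavy concentration near zero,
                                               # loosely like second-moment states
err_linear = np.abs(quantize8(v, compand=False) - v).mean()
err_compand = np.abs(quantize8(v, compand=True) - v).mean()
print(err_linear, err_compand)                 # companding cuts the mean error
```

Per-parameter savings then come from storing `q` (1 byte) plus a shared scale instead of a 4-byte float per state entry.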
【3】Differentiable Zero-One Loss via Hypersimplex Projections
标题:通过超单纯形投影的可微零一损失
链接:https://arxiv.org/abs/2602.23336
作者:Camilo Gomez,Pengyang Wang,Liansheng Tang
备注:To appear in PAKDD 2026 (Pacific-Asia Conference on Knowledge Discovery and Data Mining), 12 pages
摘要:Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in that regime.
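A standard, non-smooth building block related to the operator above: Euclidean projection onto the hypersimplex {x : 0 <= x <= 1, sum(x) = k} by bisection on the dual shift. Soft-Binary-Argmax is a smoothed, differentiable variant; this sketch only illustrates the underlying feasible set:

```python
import numpy as np

def project_hypersimplex(z, k, iters=60):
    """Euclidean projection of z onto {x : 0 <= x <= 1, sum(x) = k}.
    The KKT conditions give x = clip(z - tau, 0, 1) for a scalar shift tau;
    sum(x) is monotone decreasing in tau, so bisection finds it."""
    lo, hi = z.min() - 1.0, z.max()      # sum = n at lo, sum = 0 at hi: brackets tau
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        if np.clip(z - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(z - (lo + hi) / 2.0, 0.0, 1.0)

z = np.array([2.0, 1.5, 0.3, -0.2, 0.9])   # toy logits
x = project_hypersimplex(z, k=2)
print(x, x.sum())   # top logits pushed toward 1, coordinates in [0,1], sum = k
```

Replacing the hard clip with a smooth surrogate is what makes a Jacobian available for backpropagation, which is the direction the paper develops.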
【4】ParamMem: Augmenting Language Agents with Parametric Reflective Memory
标题:ParamMem:利用参数反射记忆增强语言代理
链接:https://arxiv.org/abs/2602.23320
作者:Tianjun Yao,Yongqiang Chen,Yujia Zheng,Pan Li,Zhiqiang Shen,Kun Zhang
备注:20 pages
摘要:Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent studies have attempted to address this limitation through various approaches, among which increasing reflective diversity has shown promise. Our empirical analysis reveals a strong positive correlation between reflective diversity and task success, further motivating the need for diverse reflection signals. We introduce ParamMem, a parametric memory module that encodes cross-sample reflection patterns into model parameters, enabling diverse reflection generation through temperature-controlled sampling. Building on this module, we propose ParamAgent, a reflection-based agent framework that integrates parametric memory with episodic and cross-sample memory. Extensive experiments on code generation, mathematical reasoning, and multi-hop question answering demonstrate consistent improvements over state-of-the-art baselines. Further analysis reveals that ParamMem is sample-efficient, enables weak-to-strong transfer across model scales, and supports self-improvement without reliance on a stronger external model, highlighting the potential of ParamMem as an effective component for enhancing language agents.
【5】A Proper Scoring Rule for Virtual Staining
标题:虚拟染色的适当评分规则
链接:https://arxiv.org/abs/2602.23305
作者:Samuel Tonks,Steve Hood,Ryan Musso,Ceridwen Hopely,Steve Titus,Minh Doan,Iain Styles,Alexander Krull
摘要:Generative virtual staining (VS) models for high-throughput screening (HTS) can provide an estimated posterior distribution of possible biological feature values for each input and cell. However, when evaluating a VS model, the true posterior is unavailable. Existing evaluation protocols only check the accuracy of the marginal distribution over the dataset rather than the predicted posteriors. We introduce information gain (IG) as a cell-wise evaluation framework that enables direct assessment of predicted posteriors. IG is a strictly proper scoring rule and comes with a sound theoretical motivation allowing for interpretability, and for comparing results across models and features. We evaluate diffusion- and GAN-based models on an extensive HTS dataset using IG and other metrics and show that IG can reveal substantial performance differences other metrics cannot.
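One common way to instantiate such a cell-wise score is the log-score gain over the dataset marginal, which inherits strict propriety from the log score; the discrete toy posteriors below are made up, and the paper's exact construction of IG may differ in details:

```python
import numpy as np

def information_gain(post_probs, y_true, marginal):
    """Per-sample gain: log-likelihood of the truth under the predicted
    posterior minus under the dataset marginal (nats). Positive mean IG
    means the posteriors carry information beyond the marginal."""
    idx = np.arange(len(y_true))
    return np.log(post_probs[idx, y_true]) - np.log(marginal[y_true])

rng = np.random.default_rng(0)
marginal = np.array([0.7, 0.2, 0.1])          # feature-value frequencies in the data
y = rng.choice(3, size=5000, p=marginal)

sharp = np.full((5000, 3), 0.05)              # informative posteriors: 0.9 on truth
sharp[np.arange(5000), y] = 0.9
flat = np.tile(marginal, (5000, 1))           # a model that only knows the marginal

ig_sharp = information_gain(sharp, y, marginal)
ig_flat = information_gain(flat, y, marginal)
print(ig_sharp.mean(), ig_flat.mean())        # positive vs. exactly zero
```

A model whose predicted posteriors match the marginal scores zero by construction, which is exactly the behavior that marginal-only evaluation protocols cannot detect.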
【6】Zeroth-Order Stackelberg Control in Combinatorial Congestion Games
标题:组合拥塞博弈中的零阶Stackelberg控制
链接:https://arxiv.org/abs/2602.23277
作者:Saeed Masiha,Sepehr Elahi,Negar Kiyavash,Patrick Thiran
摘要:We study Stackelberg (leader--follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish users choose discrete routes (or other combinatorial strategies) and settle at a congestion equilibrium. The leader minimizes a system-level objective (e.g., total travel time) evaluated at equilibrium, but this objective is typically nonsmooth because the set of used strategies can change abruptly. We propose ZO-Stackelberg, which couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, avoiding differentiation through equilibria. We prove convergence to generalized Goldstein stationary points of the true equilibrium objective, with explicit dependence on the equilibrium approximation error, and analyze subsampled oracles: if an exact minimizer is sampled with probability $κ_m$, then the Frank--Wolfe error decays as $\mathcal{O}(1/(κ_m T))$. We also propose stratified sampling as a practical way to avoid a vanishing $κ_m$ when the strategies that matter most for the Wardrop equilibrium concentrate in a few dominant combinatorial classes (e.g., short paths). Experiments on real-world networks demonstrate that our method achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.
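The zeroth-order outer update relies on gradient estimates built from function values only. A generic two-point sphere-sampling estimator on a smooth toy objective (the paper targets nonsmooth equilibrium objectives and Goldstein stationarity; this sketch, with a made-up quadratic leader objective, only shows the mechanics):

```python
import numpy as np

def zo_grad(f, x, delta=1e-3, n_dirs=20, rng=None):
    """Two-point zeroth-order gradient estimate:
    g = (d / n_dirs) * sum_u [(f(x + delta*u) - f(x - delta*u)) / (2*delta)] * u
    over random unit directions u; only function evaluations are needed."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = x.size
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
    return g * d / n_dirs   # factor d makes the estimate unbiased as delta -> 0

f = lambda x: ((x - 1.0) ** 2).sum()   # toy leader objective, minimized at (1,1,1)
rng = np.random.default_rng(1)
x = np.zeros(3)
for _ in range(300):
    x -= 0.05 * zo_grad(f, x, rng=rng)
print(x)   # close to [1, 1, 1]
```

In the Stackelberg setting, each evaluation of `f` would itself require running the Frank-Wolfe inner solver to an approximate follower equilibrium, which is why avoiding differentiation through equilibria matters.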
【7】Closing the gap on tabular data with Fourier and Implicit Categorical Features
标题:利用傅里叶和隐式分类特征缩小表格数据的差距
链接:https://arxiv.org/abs/2602.23182
作者:Marius Dragoi,Florin Gogianu,Elena Burceanu
摘要:While Deep Learning has demonstrated impressive results in applications on various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last "unconquered castle" for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic capability to model and exploit non-linear interactions induced by features with categorical characteristics. In contrast, neural-based methods exhibit biases toward uniform numerical processing of features and smooth solutions, making it challenging for them to effectively leverage such patterns. We address this performance gap by using statistical-based feature processing techniques to identify features that are strongly correlated with the target once discretized. We further mitigate the bias of deep models for overly-smooth solutions, a bias that does not align with the inherent properties of the data, using Learned Fourier features. We show that our proposed feature preprocessing significantly boosts the performance of deep learning models and enables them to achieve a performance that closely matches or surpasses XGBoost on a comprehensive tabular data benchmark.
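The Fourier preprocessing can be sketched with random rather than learned frequencies; `B` below is an illustrative stand-in for the frequency matrix that the paper learns end to end, and the data is synthetic:

```python
import numpy as np

def fourier_features(x, B):
    """Map numeric columns through sin/cos of projected frequencies, letting
    a downstream MLP represent sharp, non-smooth responses to numeric inputs
    instead of its default overly-smooth bias."""
    proj = 2 * np.pi * x @ B                              # (n_samples, n_freqs)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(size=(256, 4))                 # 4 numeric tabular columns
B = rng.normal(scale=4.0, size=(4, 16))        # frequency matrix (learned in practice)
phi = fourier_features(x, B)
print(phi.shape)                               # (256, 32): 16 sin + 16 cos features
```

Higher-variance frequencies in `B` let the model fit sharper, more "tree-like" decision surfaces, which is the inductive-bias correction the abstract describes.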
【8】Benchmarking Temporal Web3 Intelligence: Lessons from the FinSurvival 2025 Challenge
标题:Temporal Web3智能基准:FinSurvival 2025挑战赛的教训
链接:https://arxiv.org/abs/2602.23159
作者:Oshani Seneviratne,Fernando Spadea,Adrien Pavao,Aaron Micah Green,Kristin P. Bennett
摘要:Temporal Web analytics increasingly relies on large-scale, longitudinal data to understand how users, content, and systems evolve over time. A rapidly growing frontier is the \emph{Temporal Web3}: decentralized platforms whose behavior is recorded as immutable, time-stamped event streams. Despite the richness of this data, the field lacks shared, reproducible benchmarks that capture real-world temporal dynamics, specifically censoring and non-stationarity, across extended horizons. This absence slows methodological progress and limits the transfer of techniques between Web3 and broader Web domains. In this paper, we present the \textit{FinSurvival Challenge 2025} as a case study in benchmarking \emph{temporal Web3 intelligence}. Using 21.8 million transaction records from the Aave v3 protocol, the challenge operationalized 16 survival prediction tasks to model user behavior transitions. We detail the benchmark design and the winning solutions, highlighting how domain-aware temporal feature construction significantly outperformed generic modeling approaches. Furthermore, we distill lessons for next-generation temporal benchmarks, arguing that Web3 systems provide a high-fidelity sandbox for studying temporal challenges, such as churn, risk, and evolution, that are fundamental to the wider Web.
【9】Partial recovery of meter-scale surface weather
标题:米级地表天气部分恢复
链接:https://arxiv.org/abs/2602.23146
作者:Jonathan Giezendanner,Qidong Yang,Eric Schmitt,Anirban Chandra,Daniel Salles Civitarese,Johannes Jakubik,Jeremy Vila,Detlef Hohl,Campbell Watson,Sherrie Wang
摘要:Near-surface atmospheric conditions can differ sharply over tens to hundreds of meters due to land cover and topography, yet this variability is absent from current weather analyses and forecasts. It is unclear whether such meter-scale variability reflects irreducibly chaotic dynamics or contains a component predictable from surface characteristics and large-scale atmospheric forcing. Here we show that a substantial, physically coherent component of meter-scale near-surface weather is statistically recoverable from existing observations. By conditioning coarse atmospheric state on sparse surface station measurements and high-resolution Earth observation data, we infer spatially continuous fields of near-surface wind, temperature, and humidity at 10 m resolution across the contiguous United States. Relative to ERA5, the inferred fields reduce wind error by 29% and temperature and dewpoint error by 6%, while explaining substantially more spatial variance at fixed time steps. They also exhibit physically interpretable structure, including urban heat islands, evapotranspiration-driven humidity contrasts, and wind speed differences across land cover types. Our findings expand the frontier of weather modeling by demonstrating a computationally feasible approach to continental-scale meter-resolution inference. More broadly, they illustrate how conditioning coarse dynamical models on static fine-scale features can reveal previously unresolved components of the Earth system.
【10】Bound to Disagree : Generalization Bounds via Certifiable Surrogates
标题:注定不一致:通过可认证代理模型的泛化界
链接:https://arxiv.org/abs/2602.23128
作者:Mathieu Bazinet,Valentina Zantedeschi,Pascal Germain
摘要:Generalization bounds for deep learning models are typically vacuous, not computable, or restricted to specific model classes. In this paper, we tackle these issues by providing new disagreement-based certificates for the gap between the true risks of any two predictors. We then bound the true risk of the predictor of interest via a surrogate model that enjoys tight generalization guarantees, evaluating our disagreement bound on an unlabeled dataset. We empirically demonstrate the tightness of the obtained certificates and showcase the versatility of the approach by training surrogate models leveraging three different frameworks: sample compression, model compression, and PAC-Bayes theory. Importantly, such guarantees are achieved without modifying the target model or adapting the training procedure to the generalization framework.
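The reason disagreement on unlabeled data certifies anything is the pointwise triangle inequality for the zero-one loss: risk(f) <= risk(g) + P(f != g). A toy check with synthetic predictions (the paper's certificates add finite-sample concentration terms on top of this deterministic inequality):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
y = rng.integers(0, 2, size=n)                  # hidden labels (unknown in practice)
g = np.where(rng.random(n) < 0.9, y, 1 - y)     # surrogate with a tight risk certificate
f = np.where(rng.random(n) < 0.85, g, 1 - g)    # target predictor, close to the surrogate

risk_f = (f != y).mean()                        # the quantity we want to certify
risk_g = (g != y).mean()                        # certified via the surrogate's bound
disagreement = (f != g).mean()                  # computable from unlabeled data alone

# Pointwise: f != y implies (g != y) or (f != g), so the bound always holds.
print(risk_f, risk_g + disagreement)
```

Only `risk_g` needs a generalization guarantee; `disagreement` requires no labels, which is what lets the target model and its training procedure stay untouched.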
【11】Regularized Online RLHF with Generalized Bilinear Preferences
标题:具有广义双线性偏好的正则化在线RLHF
链接:https://arxiv.org/abs/2602.23116
作者:Junghyun Lee,Minju Hong,Kwang-Sung Jun,Chulhee Yun,Se-Young Yun
备注:43 pages, 1 table
摘要:We consider the problem of contextual online RLHF with general preferences, where the goal is to identify the Nash equilibrium. We adopt the Generalized Bilinear Preference Model (GBPM) to capture potentially intransitive preferences via low-rank, skew-symmetric matrices. We investigate general preference learning with any strongly convex regularizer (where $η^{-1}$ is the regularization strength), generalizing beyond prior works limited to reverse-KL regularization. Central to our analysis is proving that the dual gap of the greedy policy is bounded by the square of the estimation error, a result derived solely from strong convexity and the skew-symmetry of GBPM. Building on this insight and a feature diversity assumption, we establish two regret bounds via two simple algorithms: (1) Greedy Sampling achieves polylogarithmic, $e^{O(η)}$-free regret $\tilde{O}(ηd^4 (\log T)^2)$. (2) Explore-Then-Commit achieves $\mathrm{poly}(d)$-free regret $\tilde{O}(\sqrt{ηr T})$ by exploiting the low-rank structure; this is the first statistically efficient guarantee for online RLHF in high dimensions.
【12】Physics-informed neural particle flow for the Bayesian update step
标题:贝叶斯更新步骤的物理信息神经粒子流
链接:https://arxiv.org/abs/2602.23089
作者:Domonkos Csuzdi,Tamás Bécsi,Olivér Törő
摘要:The Bayesian update step poses significant computational challenges in high-dimensional nonlinear estimation. While log-homotopy particle flow filters offer an alternative to stochastic sampling, existing formulations usually yield stiff differential equations. Conversely, existing deep learning approximations typically treat the update as a black-box task or rely on asymptotic relaxation, neglecting the exact geometric structure of the finite-horizon probability transport. In this work, we propose a physics-informed neural particle flow, which is an amortized inference framework. To construct the flow, we couple the log-homotopy trajectory from the prior to the posterior density with the continuity equation describing the density evolution. This derivation yields a governing partial differential equation (PDE), referred to as the master PDE. By embedding this PDE as a physical constraint into the loss function, we train a neural network to approximate the transport velocity field. This approach enables purely unsupervised training, eliminating the need for ground-truth posterior samples. We demonstrate that the neural parameterization acts as an implicit regularizer, mitigating the numerical stiffness inherent to analytic flows and reducing online computational complexity. Experimental validation on multimodal benchmarks and a challenging nonlinear scenario confirms better mode coverage and robustness compared to state-of-the-art baselines.
【13】OmniGAIA: Towards Native Omni-Modal AI Agents
标题:OmniGAIA:迈向原生全模态人工智能代理
链接:https://arxiv.org/abs/2602.22897
作者:Xiaoxi Li,Wenxiang Jiao,Jiarui Jin,Shijian Wang,Guanting Dong,Jiajie Jin,Hao Wang,Yinuo Wang,Ji-Rong Wen,Yuan Lu,Zhicheng Dou
摘要:Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g., vision-language), lacking the unified cognitive capabilities required for general AI assistants. To bridge this gap, we introduce OmniGAIA, a comprehensive benchmark designed to evaluate omni-modal agents on tasks necessitating deep reasoning and multi-turn tool execution across video, audio, and image modalities. Constructed via a novel omni-modal event graph approach, OmniGAIA synthesizes complex, multi-hop queries derived from real-world data that require cross-modal reasoning and external tool integration. Furthermore, we propose OmniAtlas, a native omni-modal foundation agent under tool-integrated reasoning paradigm with active omni-modal perception. Trained on trajectories synthesized via a hindsight-guided tree exploration strategy and OmniDPO for fine-grained error correction, OmniAtlas effectively enhances the tool-use capabilities of existing open-source models. This work marks a step towards next-generation native omni-modal AI assistants for real-world scenarios.
【14】Decentralized Ranking Aggregation: Gossip Algorithms for Borda and Copeland Consensus
标题:去中心化排名聚合:Borda和Copeland共识的Gossip算法
链接:https://arxiv.org/abs/2602.22847
作者:Anna Van Elst,Kerrian Le Caillec,Igor Colin,Stephan Clémençon
备注:8 pages, 2 figures
摘要:The concept of ranking aggregation plays a central role in preference analysis, and numerous algorithms for calculating median rankings, often originating in social choice theory, have been documented in the literature, offering theoretical guarantees in a centralized setting, i.e., when all the ranking data to be aggregated can be brought together in a single computing unit. For many technologies (e.g. peer-to-peer networks, IoT, multi-agent systems), extending the ability to calculate consensus rankings with guarantees in a decentralized setting, i.e., when preference data is initially distributed across a communicating network, remains a major methodological challenge. Indeed, in recent years, the literature on decentralized computation has mainly focused on computing or optimizing statistics such as arithmetic means using gossip algorithms. The purpose of this article is precisely to study how to achieve reliable consensus on collective rankings using classical rules (e.g. Borda, Copeland) in a decentralized setting, thereby raising new questions, robustness to corrupted nodes, and scalability through reduced communication costs in particular. The approach proposed and analyzed here relies on random gossip communication, allowing autonomous agents to compute global ranking consensus using only local interactions, without coordination or central authority. We provide rigorous convergence guarantees, including explicit rate bounds, for the Borda and Copeland consensus methods. Beyond these rules, we also provide a decentralized implementation of consensus according to the median rank rule and local Kemenization. Extensive empirical evaluations on various network topologies and real and synthetic ranking datasets demonstrate that our algorithms converge quickly and reliably to the correct ranking aggregation.
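The core mechanism, randomized pairwise averaging of local score vectors, can be sketched in a few lines. The toy complete-graph simulation below (not the authors' implementation) illustrates it for the Borda rule: each agent holds a Borda score vector from its own ranking, and repeated pairwise averaging drives every agent to the global Borda average, from which the consensus ranking follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_items = 20, 5

# Each agent holds a local Borda score vector derived from its own ranking:
# the item placed at rank k receives n_items - 1 - k points.
rankings = np.array([rng.permutation(n_items) for _ in range(n_agents)])
scores = np.zeros((n_agents, n_items))
for a in range(n_agents):
    scores[a, rankings[a]] = np.arange(n_items - 1, -1, -1)

target = scores.mean(axis=0)  # centralized Borda average

# Random pairwise gossip on a complete graph: two agents replace their vectors
# by the average. The network mean is invariant and disagreement contracts.
for _ in range(5000):
    i, j = rng.choice(n_agents, size=2, replace=False)
    avg = (scores[i] + scores[j]) / 2.0
    scores[i] = avg
    scores[j] = avg

assert np.allclose(scores, target, atol=1e-6)  # consensus reached
consensus = np.argsort(-scores[0])             # every agent now holds the Borda order
```

The paper's analysis covers convergence rates, other aggregation rules (Copeland, median rank, local Kemenization), and robustness to corrupted nodes, none of which this sketch addresses.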
【15】AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
标题:AMA-Bench:评估智能体应用的长程记忆
链接:https://arxiv.org/abs/2602.22769
作者:Yujie Zhao,Boqin Yuan,Junbo Huang,Haocheng Yuan,Zhongming Yu,Haozhou Xu,Lanxiang Hu,Abhilash Shankarampeta,Zimeng Huang,Wentao Ni,Yuandong Tian,Jishen Zhao
摘要:Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between practical applications and current evaluation standards for agent memory: existing benchmarks primarily focus on dialogue-centric, human-agent interactions. In reality, agent memory consists of a continuous stream of agent-environment interactions that are primarily composed of machine-generated representations. To bridge this gap, we introduce AMA-Bench (Agent Memory with Any length), which evaluates long-horizon memory for LLMs in real agentic applications. It features two key components: (1) a set of real-world agentic trajectories across representative agentic applications, paired with expert-curated QA, and (2) a set of synthetic agentic trajectories that scale to arbitrary horizons, paired with rule-based QA. Our comprehensive study shows that existing memory systems underperform on AMA-Bench primarily because they lack causality and objective information and are constrained by the lossy nature of similarity-based retrieval employed by many memory systems. To address these limitations, we propose AMA-Agent, an effective memory system featuring a causality graph and tool-augmented retrieval. Our results demonstrate that AMA-Agent achieves 57.22% average accuracy on AMA-Bench, surpassing the strongest memory system baselines by 11.16%.
【16】DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule
标题:DPSQL+:一个具有最小频率规则的差分隐私SQL库
链接:https://arxiv.org/abs/2602.22699
作者:Tomoya Matsumoto,Shokichi Takakura,Shun Takagi,Satoshi Hasegawa
摘要:SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice, DP alone may not satisfy governance requirements such as the \emph{minimum frequency rule}, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present \textbf{DPSQL+}, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,δ)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a \emph{Validator} that statically restricts queries to a DP-safe subset of SQL; (ii) an \emph{Accountant} that consistently tracks cumulative privacy loss across multiple queries; and (iii) a \emph{Backend} that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads -- from basic aggregates to quadratic statistics and join operations -- and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
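The two governance layers, the minimum frequency rule and Laplace noise, compose naturally. The sketch below is a hypothetical toy (function name and interface are made up), under the simplifying assumption that each user contributes to at most one group, so the count sensitivity is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_group_counts(user_ids, groups, k=5, epsilon=1.0):
    """Release per-group counts under a minimum frequency rule plus Laplace noise.

    Hypothetical sketch: (1) suppress any group with fewer than k distinct
    contributing users; (2) add Laplace(1/epsilon) noise to surviving counts
    (sensitivity 1 when each user contributes to at most one group).
    """
    released = {}
    for g in set(groups):
        members = {u for u, grp in zip(user_ids, groups) if grp == g}
        if len(members) >= k:                       # minimum frequency rule
            noise = rng.laplace(scale=1.0 / epsilon)
            released[g] = len(members) + noise
    return released

users = list(range(12))
groups = ["a"] * 8 + ["b"] * 3 + ["c"] * 1          # groups b and c are too small
out = dp_group_counts(users, groups, k=5)
assert set(out) == {"a"}                            # small groups suppressed
```

DPSQL+ additionally handles user-level contributions across many groups, static query validation, and cross-query privacy accounting, which this toy omits.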
【17】ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL
标题:ContextRL:通过上下文增强RL提高MLLM的知识发现效率
链接:https://arxiv.org/abs/2602.22623
作者:Xingyu Lu,Jinpeng Wang,YiFan Zhang,Shijie Ma,Xiao Hu,Tianke Zhang,Haonan Fan,Kaiyu Jiang,Changyi Liu,Kaiyu Tang,Bin Wen,Fan Yang,Tingting Gao,Han Li,Chun Yuan
备注:14 pages, 5 figures
摘要:We propose ContextRL, a novel framework that leverages context augmentation to overcome two bottlenecks of RLVR training: Identifiability and Reachability. Specifically, to enhance Identifiability, we provide the reward model with full reference solutions as context, enabling fine-grained process verification to filter out false positives (samples with the right answer but low-quality reasoning process). To improve Reachability, we introduce a multi-turn sampling strategy where the reward model generates mistake reports for failed attempts, guiding the policy to "recover" correct responses from previously all-negative groups. Experimental results on 11 perception and reasoning benchmarks show that ContextRL significantly improves knowledge discovery efficiency. Notably, ContextRL enables the Qwen3-VL-8B model to achieve performance comparable to the 32B model, outperforming standard RLVR baselines by a large margin while effectively mitigating reward hacking. Our in-depth analysis reveals the significant potential of contextual information for improving reward model accuracy and documents the widespread occurrence of reward hacking, offering valuable insights for future RLVR research.
【18】DP-aware AdaLN-Zero: Taming Conditioning-Induced Heavy-Tailed Gradients in Differentially Private Diffusion
标题:DP感知的AdaLN-Zero:驯服差分隐私扩散中条件诱导的重尾梯度
链接:https://arxiv.org/abs/2602.22610
作者:Tao Huang,Jiayang Meng,Xu Yang,Chen Hou,Hong Chen
摘要:Condition injection enables diffusion models to generate context-aware outputs, which is essential for many time-series tasks. However, heterogeneous conditional contexts (e.g., observed history, missingness patterns or outlier covariates) can induce heavy-tailed per-example gradients. Under Differentially Private Stochastic Gradient Descent (DP-SGD), these rare conditioning-driven heavy-tailed gradients disproportionately trigger global clipping, resulting in outlier-dominated updates, larger clipping bias, and degraded utility under a fixed privacy budget. In this paper, we propose DP-aware AdaLN-Zero, a drop-in sensitivity-aware conditioning mechanism for conditional diffusion transformers that limits conditioning-induced gain without modifying the DP-SGD mechanism. DP-aware AdaLN-Zero jointly constrains conditioning representation magnitude and AdaLN modulation parameters via bounded re-parameterization, suppressing extreme gradient tail events before gradient clipping and noise injection. Empirically, DP-SGD equipped with DP-aware AdaLN-Zero improves interpolation/imputation and forecasting under matched privacy settings. We observe consistent gains on a real-world power dataset and two public ETT benchmarks over vanilla DP-SGD. Moreover, gradient diagnostics attribute these improvements to conditioning-specific tail reshaping and reduced clipping distortion, while preserving expressiveness in non-private training. Overall, these results show that sensitivity-aware conditioning can substantially improve private conditional diffusion training without sacrificing standard performance.
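The "bounded re-parameterization" idea can be illustrated with a generic squashing of the modulation parameters; the abstract does not give the paper's exact parameterization, so the tanh bound and the `bound` constant below are assumptions:

```python
import numpy as np

def bounded_modulation(raw, bound=0.5):
    """Bounded re-parameterization of AdaLN scale/shift parameters.

    Squashes the raw conditioning output into [-bound, bound], capping the
    conditioning-induced gain before DP-SGD clipping and noise injection.
    (Generic tanh bound; the paper's exact parameterization may differ.)
    """
    return bound * np.tanh(raw)

raw = np.array([-10.0, -0.1, 0.0, 0.1, 10.0])
out = bounded_modulation(raw)
assert np.all(np.abs(out) <= 0.5)   # extreme conditioning outputs are capped
```

Because the modulation magnitude is bounded regardless of the conditioning context, rare contexts can no longer blow up per-example gradient norms, which is the tail-reshaping effect the abstract attributes to the method.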
【19】Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
标题:针对评分者效应校正人工智能评估中的人类标签:项目反应理论方法
链接:https://arxiv.org/abs/2602.22585
作者:Jodi M. Casabianca,Maggie Beiting-Parrish
备注:16 pages, 5 figures, 1 table; The 16th Annual Learning Analytics and Knowledge Conference (LAK) Workshop on LLM Psychometrics, April 27, 2026, Bergen, Norway
摘要:Human evaluations play a central role in training and assessing AI models, yet these data are rarely treated as measurements subject to systematic error. This paper integrates psychometric rater models into the AI pipeline to improve the reliability and validity of conclusions drawn from human judgments. The paper reviews common rater effects, severity and centrality, that distort observed ratings, and demonstrates how item response theory rater models, particularly the multi-faceted Rasch model, can separate true output quality from rater behavior. Using the OpenAI summarization dataset as an empirical example, we show how adjusting for rater severity produces corrected estimates of summary quality and provides diagnostic insight into rater performance. Incorporating psychometric modeling into human-in-the-loop evaluation offers more principled and transparent use of human data, enabling developers to make decisions based on adjusted scores rather than raw, error-prone ratings. This perspective highlights a path toward more robust, interpretable, and construct-aligned practices for AI development and evaluation.
【20】IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck
标题:IBCircuit:基于信息瓶颈的整体电路发现
链接:https://arxiv.org/abs/2602.22581
作者:Tian Bian,Yifan Niu,Chaohao Yuan,Chengzhi Piao,Bingzhe Wu,Long-Kai Huang,Yu Rong,Tingyang Xu,Hong Cheng,Jia Li
摘要:Circuit discovery has recently attracted attention as a potential research direction to explain the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for different tasks, which is inaccurate and inefficient. In this work, we propose an end-to-end approach based on the principle of Information Bottleneck, called IBCircuit, to identify informative circuits holistically. IBCircuit is an optimization framework for holistic circuit discovery and can be applied to any given task without tediously corrupted activation design. In both the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and minimal circuits in terms of critical node components and edge components compared to recent related work.
【21】Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training
标题:Search-P1:以路径为中心的奖励塑造,实现稳定高效的智能体RAG训练
链接:https://arxiv.org/abs/2602.22576
作者:Tianle Xia,Ming Xu,Lingxiang Hu,Yiding Sun,Wenwei Li,Linfang Shang,Liqun Liu,Peng Shu,Huan Yu,Jie Jiang
摘要:Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to retrieve, but current RL-based training methods suffer from sparse outcome rewards that discard intermediate signals and low sample efficiency where failed samples contribute nothing. We propose Search-P1, a framework that introduces path-centric reward shaping for agentic RAG training, comprising two key components: (1) Path-Centric Reward, which evaluates the structural quality of reasoning trajectories through order-agnostic step coverage and soft scoring that extracts learning signals even from failed samples, and (2) Dual-Track Path Scoring with offline-generated reference planners that assesses paths from both self-consistency and reference-alignment perspectives. Experiments on multiple QA benchmarks demonstrate that Search-P1 achieves significant improvements over Search-R1 and other strong baselines, with an average accuracy gain of 7.7 points.
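One plausible reading of "order-agnostic step coverage" is set intersection between the steps of a sampled trajectory and a reference plan. The sketch below is a hypothetical hard-matching version (the paper uses soft scoring, and its exact matching criterion is not given in the abstract):

```python
def step_coverage(pred_steps, ref_steps):
    """Order-agnostic step coverage: fraction of reference reasoning steps
    that appear anywhere in the sampled trajectory, regardless of order.
    (Hypothetical hard-matching reading of the abstract; the paper's
    scoring is softer and combines self-consistency with reference alignment.)
    """
    ref = set(ref_steps)
    return len(ref & set(pred_steps)) / len(ref)

# All reference steps covered, in a different order:
assert step_coverage(["lookup A", "compare", "lookup B"],
                     ["lookup B", "lookup A"]) == 1.0
# Only half the reference steps covered:
assert step_coverage(["lookup A"], ["lookup B", "lookup A"]) == 0.5
```

The point of such a coverage score is that a trajectory with the wrong final answer can still earn partial reward for covering correct intermediate retrieval steps, which is how failed samples contribute learning signal.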
【22】S2O: Early Stopping for Sparse Attention via Online Permutation
标题:S2O:通过在线排列实现稀疏注意力的提前停止
链接:https://arxiv.org/abs/2602.22575
作者:Yu Zhang,Songwei Liu,Chenqian Yan,Sheng Lin,Beichen Ning,Fangmin Chen,Xing Wang
摘要:Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can reduce latency, but coarse blocks impose an intrinsic sparsity ceiling, making further improvements difficult even with carefully engineered designs. We present S2O, which performs early stopping for sparse attention via online permutation. Inspired by virtual-to-physical address mapping in memory systems, S2O revisits and factorizes FlashAttention execution, enabling inference to load non-contiguous tokens rather than a contiguous span in the original order. Motivated by fine-grained structures in attention heatmaps, we transform explicit permutation into an online, index-guided, discrete loading policy; with extremely lightweight preprocessing and index-remapping overhead, it concentrates importance on a small set of high-priority blocks. Building on this importance-guided online permutation for loading, S2O further introduces an early-stopping rule: computation proceeds from high to low importance; once the current block score falls below a threshold, S2O terminates early and skips the remaining low-contribution blocks, thereby increasing effective sparsity and reducing computation under a controlled error budget. As a result, S2O substantially raises the practical sparsity ceiling. On Llama-3.1-8B under a 128K context, S2O reduces single-operator MSE by 3.82$\times$ at matched sparsity, and reduces prefill compute density by 3.31$\times$ at matched MSE; meanwhile, it preserves end-to-end accuracy and achieves 7.51$\times$ attention and 3.81$\times$ end-to-end speedups.
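Once block importance scores are available, the early-stopping rule reduces to a short loop: visit blocks in decreasing importance and stop at the first one below threshold. A NumPy sketch of the selection logic alone (real S2O runs this inside a permuted FlashAttention kernel; the scores and threshold here are made up):

```python
import numpy as np

def early_stop_blocks(block_scores, threshold):
    """Process attention blocks from most to least important; terminate at the
    first block whose score falls below the threshold, skipping all remaining
    low-contribution blocks (simplified version of the abstract's rule)."""
    order = np.argsort(-block_scores)   # importance-guided permutation
    kept = []
    for idx in order:
        if block_scores[idx] < threshold:
            break                        # early stop: the rest score even lower
        kept.append(int(idx))
    return kept

scores = np.array([0.02, 0.90, 0.10, 0.55, 0.01, 0.30])
print(early_stop_blocks(scores, threshold=0.05))  # → [1, 3, 5, 2]
```

Because blocks are visited in sorted order, a single below-threshold hit certifies that every unvisited block contributes even less, which is what bounds the error of skipping them.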
【23】Autoregressive Visual Decoding from EEG Signals
标题:脑电信号的自回归视觉解码
链接:https://arxiv.org/abs/2602.22555
作者:Sicheng Dai,Hongwang Xiao,Shan Yu,Qiwei Ye
摘要:Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex adaptation processes involving multiple stages, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This design enables coherent generation while preserving a direct connection between the input EEG signals and the reconstructed images. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks, while using only 10% of the parameters. In addition, visualization of intermediate outputs shows that the generative process of AVDE reflects the hierarchical nature of human visual perception. These results highlight the potential of autoregressive models as efficient and interpretable tools for practical BCI applications.
【24】Multilingual Safety Alignment Via Sparse Weight Editing
标题:通过稀疏权重编辑进行多语言安全对齐
链接:https://arxiv.org/abs/2602.22554
作者:Jiaming Liang,Zhaoxin Wang,Handing Wang
摘要:Large Language Models (LLMs) exhibit significant safety disparities across languages, with low-resource languages (LRLs) often bypassing safety guardrails established for high-resource languages (HRLs) like English. Existing solutions, such as multilingual supervised fine-tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), are computationally expensive and dependent on scarce multilingual safety data. In this work, we propose a novel, training-free alignment framework based on Sparse Weight Editing. Identifying that safety capabilities are localized within a sparse set of safety neurons, we formulate the cross-lingual alignment problem as a constrained linear transformation. We derive a closed-form solution to optimally map the harmful representations of LRLs to the robust safety subspaces of HRLs, while preserving general utility via a null-space projection constraint. Extensive experiments across 8 languages and multiple model families (Llama-3, Qwen-2.5) demonstrate that our method substantially reduces Attack Success Rate (ASR) in LRLs with negligible impact on general reasoning capabilities, all achieved with a single, data-efficient calculation.
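The null-space projection constraint has a standard linear-algebra form: restrict the weight edit so that it annihilates a set of utility directions, leaving their behavior untouched. A minimal sketch with random stand-in matrices (not the paper's closed-form solution, which is derived from identified safety neurons and cross-lingual representations):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Stand-in utility directions whose behavior must be preserved
# (hypothetical data; the paper identifies these from the model itself).
U = rng.normal(size=(4, d))

# Projector onto the null space of U: any vector in the row space of U is
# annihilated, so edits composed with it cannot change utility behavior.
P_null = np.eye(d) - U.T @ np.linalg.solve(U @ U.T, U)

delta = rng.normal(size=(d, d))   # some raw weight edit
delta_safe = delta @ P_null       # edit constrained to act only in null(U)

x_utility = U[0]                  # any preserved direction
assert np.allclose(delta_safe @ x_utility, 0.0, atol=1e-8)
```

The constrained edit `delta_safe` can still move harmful low-resource-language representations (those with components in the null space) while provably leaving the chosen utility directions fixed.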
【25】Towards Dynamic Dense Retrieval with Routing Strategy
标题:基于路由策略的动态密集检索
链接:https://arxiv.org/abs/2602.22547
作者:Zhan Su,Fengran Mo,Jinghan Zhang,Yuchen Hui,Jia Ao Sun,Bingbing Wen,Jian-Yun Nie
摘要:The \textit{de facto} paradigm for applying dense retrieval (DR) to new tasks involves fine-tuning a pre-trained model for a specific task. However, this paradigm has two significant limitations: (1) It is difficult to adapt the DR model to a new domain if the training dataset is limited. (2) Old DR models are simply replaced by newer models trained from scratch when the former are no longer up to date. Especially for scenarios where the model needs to be updated frequently, this paradigm is prohibitively expensive. To address these challenges, we propose a novel dense retrieval approach, termed \textit{dynamic dense retrieval} (DDR). DDR uses \textit{prefix tuning} as a \textit{module} specialized for a specific domain. These modules can then be compositionally combined with a dynamic routing strategy, enabling highly flexible domain adaptation in the retrieval part. Extensive evaluation on six zero-shot downstream tasks demonstrates that this approach can surpass DR while utilizing only 2\% of the training parameters, paving the way to more flexible dense retrieval in IR. We see it as a promising future direction for applying dense retrieval to various tasks.
【26】TFPS: A Temporal Filtration-enhanced Positive Sample Set Construction Method for Implicit Collaborative Filtering
标题:TFPS:一种用于隐式协同过滤的时间过滤增强正样本集构建方法
链接:https://arxiv.org/abs/2602.22521
作者:Jiayi Wu,Zhengyu Wu,Xunkai Li,Rong-Hua Li,Guoren Wang
摘要:The negative sampling strategy can effectively train collaborative filtering (CF) recommendation models based on implicit feedback by constructing positive and negative samples. However, existing methods primarily optimize the negative sampling process while neglecting the exploration of positive samples. Some denoising recommendation methods can be applied to denoise positive samples within negative sampling strategies, but they ignore temporal information. Existing work integrates sequential information during model aggregation but neglects time interval information, hindering accurate capture of users' current preferences. To address this problem, from a data perspective, we propose a novel temporal filtration-enhanced approach to construct a high-quality positive sample set. First, we design a time decay model based on interaction time intervals, transforming the original graph into a weighted user-item bipartite graph. Then, based on predefined filtering operations, the weighted user-item bipartite graph is layered. Finally, we design a layer-enhancement strategy to construct a high-quality positive sample set for the layered subgraphs. We provide theoretical insights into why TFPS can improve Recall@k and NDCG@k, and extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed method. Additionally, TFPS can be integrated with various implicit CF recommenders or negative sampling methods to enhance its performance.
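A generic exponential time-decay weight is one plausible instantiation of the interaction-interval decay model; the abstract does not specify the exact decay function, so the half-life form and `half_life` parameter below are assumptions:

```python
def edge_weight(t_interaction, t_now, half_life=7.0):
    """Time-decay weight for a user-item interaction edge: the weight halves
    every `half_life` time units, so older interactions count less toward the
    weighted bipartite graph. (Generic exponential decay; the paper's exact
    decay model based on interaction time intervals may differ.)
    """
    age = t_now - t_interaction
    return 0.5 ** (age / half_life)

# A fresh interaction has full weight; a 7-day-old one has half.
assert edge_weight(0.0, 0.0) == 1.0
assert abs(edge_weight(0.0, 7.0) - 0.5) < 1e-12
```

Thresholding such weights is one way to realize the layering step: predefined filtering operations split the weighted bipartite graph into layers of progressively fresher interactions, from which the high-quality positive sample set is assembled.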
【27】VeRO: An Evaluation Harness for Agents to Optimize Agents
标题:VeRO:代理优化代理的评估工具包
链接:https://arxiv.org/abs/2602.22480
作者:Varun Ursekar,Apaar Shanker,Veronica Chatrath,Yuan Xue,Sam Denton
摘要:An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this task. Agent optimization differs fundamentally from conventional software engineering: the target agent interleaves deterministic code with stochastic LLM completions, requiring structured capture of both intermediate reasoning and downstream execution outcomes. To address these challenges, we introduce VERO (Versioning, Rewards, and Observations), which provides (1) a reproducible evaluation harness with versioned agent snapshots, budget-controlled evaluation, and structured execution traces, and (2) a benchmark suite of target agents and tasks with reference evaluation procedures. Using VERO, we conduct an empirical study comparing optimizer configurations across tasks and analyzing which modifications reliably improve target agent performance. We release VERO to support research on agent optimization as a core capability for coding agents.
【28】veScale-FSDP: Flexible and High-Performance FSDP at Scale
标题:veScale-FSDP:大规模灵活且高性能的FSDP
链接:https://arxiv.org/abs/2602.22437
作者:Zezhou Wang,Youjie Li,Zhiqi Lin,Jiacheng Yang,Cong Xie,Guanyu Feng,Zheng Zhong,Ziyue Huang,Hongyu Zhu,Zhi Zhang,Yanghua Peng,Xin Liu
摘要:Fully Sharded Data Parallel (FSDP), also known as ZeRO, is widely used for training large-scale models, featuring its flexibility and minimal intrusion on model code. However, current FSDP systems struggle with structure-aware training methods (e.g., block-wise quantized training) and with non-element-wise optimizers (e.g., Shampoo and Muon) used in cutting-edge models (e.g., Gemini, Kimi K2). FSDP's fixed element- or row-wise sharding formats conflict with the block-structured computations. In addition, today's implementations fall short in communication and memory efficiency, limiting scaling to tens of thousands of GPUs. We introduce veScale-FSDP, a redesigned FSDP system that couples a flexible sharding format, RaggedShard, with a structure-aware planning algorithm to deliver both flexibility and performance at scale. veScale-FSDP natively supports efficient data placement required by FSDP, empowering block-wise quantization and non-element-wise optimizers. As a result, veScale-FSDP achieves 5~66% higher throughput and 16~30% lower memory usage than existing FSDP systems, while scaling efficiently to tens of thousands of GPUs.
【29】GetBatch: Distributed Multi-Object Retrieval for ML Data Loading
标题:GetBatch:用于ML数据加载的分布式多对象检索
链接:https://arxiv.org/abs/2602.22434
作者:Alex Aizman,Abhishek Gaikwad,Piotr Żelasko
备注:11 pages, 3 figures, 2 tables. Preprint
摘要:Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster. Issuing thousands of individual GET requests incurs per-request overhead that often dominates data transfer time. To solve this problem, we introduce GetBatch - a new object store API that elevates batch retrieval to a first-class storage operation, replacing independent GET operations with a single deterministic, fault-tolerant streaming execution. GetBatch achieves up to 15x throughput improvement for small objects and, in a production training workload, reduces P95 batch retrieval latency by 2x and P99 per-object tail latency by 3.7x compared to individual GET requests.
【30】SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read
标题:SimpleOCR:渲染可视化问题以教MLLM阅读
链接:https://arxiv.org/abs/2602.22426
作者:Yibo Peng,Peng Xia,Ding Zhong,Kaide Zeng,Siwei Han,Yiyang Zhou,Jiaqi Liu,Ruiyi Zhang,Huaxiu Yao
摘要:Despite the rapid advancements in Multimodal Large Language Models (MLLMs), a critical question regarding their visual grounding mechanism remains unanswered: do these models genuinely ``read'' text embedded in images, or do they merely rely on parametric shortcuts in the text prompt? In this work, we diagnose this issue by introducing the Visualized-Question (VQ) setting, where text queries are rendered directly onto images to structurally mandate visual engagement. Our diagnostic experiments on Qwen2.5-VL reveal a startling capability-utilization gap: despite possessing strong OCR capabilities, models suffer a performance degradation of up to 12.7% in the VQ setting, exposing a deep-seated ``modality laziness.'' To bridge this gap, we propose SimpleOCR, a plug-and-play training strategy that imposes a structural constraint on the learning process. By transforming training samples into the VQ format with randomized styles, SimpleOCR effectively invalidates text-based shortcuts, compelling the model to activate and optimize its visual text extraction pathways. Empirically, SimpleOCR yields robust gains without architectural modifications. On four representative OOD benchmarks, it surpasses the base model by 5.4% and GRPO based on original images by 2.7%, while exhibiting extreme data efficiency, achieving superior performance with 30x fewer samples (8.5K) than recent RL-based methods. Furthermore, its plug-and-play nature allows seamless integration with advanced RL strategies like NoisyRollout to yield complementary improvements. Code is available at https://github.com/aiming-lab/SimpleOCR.
【31】Disentangling Shared and Target-Enriched Topics via Background-Contrastive Non-negative Matrix Factorization
标题:通过背景对比非负矩阵分解解开共享主题与目标富集主题
链接:https://arxiv.org/abs/2602.22387
作者:Yixuan Li,Archer Y. Yang,Yue Li
摘要:Biological signals of interest in high-dimensional data are often masked by dominant variation shared across conditions. This variation, arising from baseline biological structure or technical effects, can prevent standard dimensionality reduction methods from resolving condition-specific structure. The challenge is that these confounding topics are often unknown and mixed with biological signals. Existing background correction methods are either unscalable to high dimensions or not interpretable. We introduce background-contrastive Non-negative Matrix Factorization, which extracts target-enriched latent topics by jointly factorizing a target dataset and a matched background using shared non-negative bases under a contrastive objective that suppresses background-expressed structure. This approach yields non-negative components that are directly interpretable at the feature level, and explicitly isolates target-specific variation. The model is learned by an efficient multiplicative update algorithm based on matrix multiplication, making it highly efficient on GPU hardware and scalable to big data via minibatch training, akin to deep learning approaches. Across simulations and diverse biological datasets, it reveals signals obscured by conventional methods, including disease-associated programs in postmortem depressive brain single-cell RNA-seq, genotype-linked protein expression patterns in mice, treatment-specific transcriptional changes in leukemia, and TP53-dependent drug responses in cancer cell lines.
【32】A 1/R Law for Kurtosis Contrast in Balanced Mixtures
标题:平衡混合中峰度对比的1/R定律
链接:https://arxiv.org/abs/2602.22334
作者:Yuda Bi,Wenjun Xiao,Linhao Bai,Vince D Calhoun
摘要:Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We prove a sharp redundancy law: for a standardized projection with effective width $R_{\mathrm{eff}}$ (participation ratio), the population excess kurtosis obeys $|κ(y)|=O(κ_{\max}/R_{\mathrm{eff}})$, yielding the order-tight $O(c_bκ_{\max}/R)$ under balance (typically $c_b=O(\log R)$). As an impossibility screen, under standard finite-moment conditions for sample kurtosis estimation, surpassing the $O(1/\sqrt{T})$ estimation scale requires $R\lesssim κ_{\max}\sqrt{T}$. We also show that \emph{purification} -- selecting $m\!\ll\!R$ sign-consistent sources -- restores $R$-independent contrast $Ω(1/m)$, with a simple data-driven heuristic. Synthetic experiments validate the predicted decay, the $\sqrt{T}$ crossover, and contrast recovery.
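The $κ_{\max}/R$ scaling can be verified exactly for a balanced i.i.d. mixture, because fourth cumulants and variances are both additive over independent summands: the standardized sum of $R$ copies has excess kurtosis exactly $κ_1/R$. A deterministic check via exact pmf convolution (Bernoulli sources chosen for illustration; unrelated to the paper's experiments):

```python
import numpy as np

def excess_kurtosis(vals, pmf):
    """Population excess kurtosis of a discrete distribution, from exact moments."""
    m = (vals * pmf).sum()
    c = vals - m
    var = (c**2 * pmf).sum()
    return (c**4 * pmf).sum() / var**2 - 3.0

# Base source: Bernoulli(0.1), excess kurtosis (1 - 6pq)/(pq) = 0.46/0.09.
v = np.array([0.0, 1.0])
p = np.array([0.9, 0.1])
k1 = excess_kurtosis(v, p)

# Sum of R iid copies via exact pmf convolution; since the support is {0, 1},
# the sum's support is the consecutive integers {0, ..., count}.
R = 8
pmf = p
for _ in range(R - 1):
    pmf = np.convolve(pmf, p)
vals = np.arange(len(pmf), dtype=float)

# Cumulant additivity predicts excess kurtosis exactly k1 / R for the sum.
assert abs(excess_kurtosis(vals, pmf) - k1 / R) < 1e-10
```

This is the mechanism behind the redundancy law in the abstract: kurtosis contrast dilutes as the effective width grows, which is why wide balanced mixtures defeat kurtosis-based ICA and why purification down to $m \ll R$ sources restores contrast.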
【33】Training Agents to Self-Report Misbehavior
标题:训练代理人自我报告不当行为
链接:https://arxiv.org/abs/2602.22303
作者:Bruce W. Lee,Chen Yueh-Han,Tomek Korbak
摘要:Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by reinforcing the correct goals, but alignment may not always succeed and can lead to unwanted side effects. We propose self-incrimination training, which instead trains agents to produce a visible signal when they covertly misbehave. We train GPT-4.1 and Gemini-2.0 agents to call a report_scheming() tool when behaving deceptively and measure their ability to cause harm undetected in out-of-distribution environments. Self-incrimination significantly reduces the undetected successful attack rate, outperforming matched-capability monitors and alignment baselines while preserving instruction hierarchy and incurring minimal safety tax on general capabilities. Unlike blackbox monitoring, self-incrimination performance is consistent across tasks regardless of how suspicious the misbehavior appears externally. The trained behavior persists under adversarial prompt optimization and generalizes to settings where agents pursue misaligned goals themselves rather than being instructed to misbehave. Our results suggest self-incrimination offers a viable path for reducing frontier misalignment risk, one that neither assumes misbehavior can be prevented nor that it can be reliably classified from the outside.
【34】Multi-Level Causal Embeddings
标题:多层次因果嵌入
链接:https://arxiv.org/abs/2602.22287
作者:Willem Schooltink,Fabio Massimo Zennaro
摘要:Abstractions of causal models allow for the coarsening of models such that relations of cause and effect are preserved. Whereas abstractions focus on the relation between two models, in this paper we study a framework for causal embeddings which enable multiple detailed models to be mapped into sub-systems of a coarser causal model. We define causal embeddings as a generalization of abstraction, and present a generalized notion of consistency. By defining a multi-resolution marginal problem, we showcase the relevance of causal embeddings for both the statistical marginal problem and the causal marginal problem; furthermore, we illustrate its practical use in merging datasets coming from models with different representations.
【35】Differentially Private Truncation of Unbounded Data via Public Second Moments
标题:基于公共二阶矩的无界数据差分私有截断
链接:https://arxiv.org/abs/2602.22282
作者:Zilong Cao,Xuan Bi,Hai Zhang
摘要:Data privacy is important in the AI era, and differential privacy (DP) is one of the golden solutions. However, DP is typically applicable only if data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with a significantly strengthened ability to resist the DP noise. Furthermore, we demonstrate the applicability of PMT by using penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms, ensuring that solutions in the transformed space can be mapped back to the original domain. We have established improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
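A rough illustration of the PMT idea described above: whiten the private rows with the public second-moment matrix, then clip row norms at a radius that depends only on non-private quantities, so that downstream DP mechanisms see bounded, well-conditioned data. The radius formula $\sqrt{d\log n}$ below is a hypothetical stand-in; the abstract does not give the paper's exact choice.

```python
import numpy as np

def pmt_truncate(X_priv, M_pub, radius):
    """Whiten private rows with the PUBLIC second-moment matrix, then clip
    each row to `radius`, which depends only on non-private quantities."""
    L = np.linalg.cholesky(M_pub + 1e-8 * np.eye(M_pub.shape[0]))
    Z = X_priv @ np.linalg.inv(L).T           # public-moment whitening
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

rng = np.random.default_rng(1)
n, d = 2000, 5
scales = np.array([1.0, 2.0, 4.0, 8.0, 16.0])     # badly conditioned raw data
X_pub = rng.standard_normal((500, d)) * scales    # small public sample
X_priv = rng.standard_normal((n, d)) * scales     # private data

M_pub = X_pub.T @ X_pub / len(X_pub)              # public second moments
radius = np.sqrt(d * np.log(n))                   # hypothetical non-private radius
Z = pmt_truncate(X_priv, M_pub, radius)

print("max row norm:", np.linalg.norm(Z, axis=1).max())   # bounded by radius
print("condition number:", np.linalg.cond(Z.T @ Z / n))   # near 1 after whitening
```

Bounded rows give bounded sensitivity for moment estimates, and the conditioning effect is what lets the inverted second-moment matrix resist the added DP noise.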
【36】RETLLM: Training and Data-Free MLLMs for Multimodal Information Retrieval
标题:RETLLM:无需训练和数据的多模态信息检索MLLM
链接:https://arxiv.org/abs/2602.22278
作者:Dawei Su,Dongsheng Wang
备注:5 pages, 2 figures
摘要:Multimodal information retrieval (MMIR) has gained attention for its flexibility in handling text, images, or mixed queries and candidates. Recent breakthroughs in multimodal large language models (MLLMs) boost MMIR performance by incorporating MLLM knowledge under the contrastive finetuning framework. However, they suffer from pre-training inconsistency and require large datasets. In this work, we introduce a novel framework, RetLLM, designed to query MLLMs for MMIR in a training- and data-free manner. Specifically, we formulate MMIR as a similarity score generation task and prompt MLLMs to directly predict retrieval scores in a coarse-then-fine pipeline. At the coarse stage, a top-k filtering strategy builds a small yet high-quality candidate pool for each query, enabling MLLMs to focus on semantically relevant candidates. Subsequently, the retrieval score is predicted by feeding both the query and candidate into MLLMs at the fine stage. Importantly, we propose a visual enhancement module during reasoning to help MLLMs re-pick forgotten visuals, improving retrieval. Extensive experiments on MMIR benchmarks show that RetLLM outperforms fine-tuned models. Ablation studies further verify each component. Our work demonstrates that MLLMs can achieve strong MMIR performance without any training, highlighting their inherent multimodal reasoning ability in a simple, scalable framework. We release our code at: https://github.com/alivecat05/RETLLM
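The coarse-then-fine pipeline in this abstract can be sketched in a few lines. The expensive fine stage (an MLLM prompted for a retrieval score in the paper) is replaced here by a stand-in scorer function; all names are illustrative.

```python
import numpy as np

def coarse_then_fine(query_emb, cand_embs, fine_scorer, k=5):
    """Coarse stage: cheap cosine top-k builds a small candidate pool.
    Fine stage: an expensive scorer (an MLLM prompt in the paper; here a
    stand-in function) rescores only that pool."""
    q = query_emb / np.linalg.norm(query_emb)
    C = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    pool = np.argsort(C @ q)[::-1][:k]            # top-k by cosine similarity
    fine = {int(i): fine_scorer(query_emb, cand_embs[i]) for i in pool}
    return max(fine, key=fine.get), pool

rng = np.random.default_rng(2)
cands = rng.standard_normal((1000, 32))             # candidate embeddings
query = cands[42] + 0.05 * rng.standard_normal(32)  # query near candidate 42

# Stand-in "fine" scorer: negative Euclidean distance.
best, pool = coarse_then_fine(query, cands, lambda q, c: -np.linalg.norm(q - c))
print(best, pool)
```

The coarse filter keeps the expensive scorer off the 99% of candidates that are clearly irrelevant, which is what makes per-pair MLLM scoring affordable at corpus scale.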
【37】Entropy-Controlled Flow Matching
标题:熵控制流匹配
链接:https://arxiv.org/abs/2602.22265
作者:Chika Maduabuchi
摘要:Modern vision generators transport a base distribution to data through time-indexed measures, implemented as deterministic flows (ODEs) or stochastic diffusions (SDEs). Despite strong empirical performance, standard flow-matching objectives do not directly control the information geometry of the trajectory, allowing low-entropy bottlenecks that can transiently deplete semantic modes. We propose Entropy-Controlled Flow Matching (ECFM): a constrained variational principle over continuity-equation paths enforcing a global entropy-rate budget d/dt H(mu_t) >= -lambda. ECFM is a convex optimization in Wasserstein space with a KKT/Pontryagin system, and admits a stochastic-control representation equivalent to a Schrodinger bridge with an explicit entropy multiplier. In the pure transport regime, ECFM recovers entropic OT geodesics and Gamma-converges to classical OT as lambda -> 0. We further obtain certificate-style mode-coverage and density-floor guarantees with Lipschitz stability, and construct near-optimal collapse counterexamples for unconstrained flow matching.
【38】SEGB: Self-Evolved Generative Bidding with Local Autoregressive Diffusion
标题:SEGB:具有局部自回归扩散的自我进化生成竞价
链接:https://arxiv.org/abs/2602.22226
作者:Yulong Gao,Wan Jiang,Mingzhe Cao,Xuepu Wang,Zeyu Pan,Haonan Yang,Ye Liu,Xin Yang
摘要:In the realm of online advertising, automated bidding has become a pivotal tool, enabling advertisers to efficiently capture impression opportunities in real-time. Recently, generative auto-bidding has shown significant promise, offering innovative solutions for effective ad optimization. However, existing offline-trained generative policies lack the near-term foresight required for dynamic markets and usually depend on simulators or external experts for post-training improvement. To overcome these critical limitations, we propose Self-Evolved Generative Bidding (SEGB), a framework that plans proactively and refines itself entirely offline. SEGB first synthesizes plausible short-horizon future states to guide each bid, providing the agent with crucial, dynamic foresight. Crucially, it then performs value-guided policy refinement to iteratively discover superior strategies without any external intervention. This self-contained approach uniquely enables robust policy improvement from static data alone. Experiments on the AuctionNet benchmark and a large-scale A/B test validate our approach, demonstrating that SEGB significantly outperforms state-of-the-art baselines. In a large-scale online deployment, it delivered substantial business value, achieving a +10.19% increase in target cost, proving the effectiveness of our advanced planning and evolution paradigm.
【39】SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
标题:SmartChunk检索:查询感知的Chunk压缩,并规划高效的文档RAG
链接:https://arxiv.org/abs/2602.22225
作者:Xuechen Zhang,Koustava Goswami,Samet Oymak,Jiasi Chen,Nedim Lipka
备注:26 pages, 10 figures
摘要:Retrieval-augmented generation (RAG) has strong potential for producing accurate and factual outputs by combining language models (LMs) with evidence retrieved from large text corpora. However, current pipelines are limited by static chunking and flat retrieval: documents are split into short, predetermined, fixed-size chunks, embeddings are retrieved uniformly, and generation relies on whatever chunks are returned. This design brings challenges, as retrieval quality is highly sensitive to chunk size, often introduces noise from irrelevant or misleading chunks, and scales poorly to large corpora. We present SmartChunk retrieval, a query-adaptive framework for efficient and robust long-document question answering (QA). SmartChunk uses (i) a planner that predicts the optimal chunk abstraction level for each query, and (ii) a lightweight compression module that produces high-level chunk embeddings without repeated summarization. By adapting retrieval granularity on the fly, SmartChunk balances accuracy with efficiency and avoids the drawbacks of fixed strategies. Notably, our planner can reason about chunk abstractions through a novel reinforcement learning scheme, STITCH, which boosts accuracy and generalization. To reflect real-world applications, where users face diverse document types and query styles, we evaluate SmartChunk on five QA benchmarks plus one out-of-domain dataset. Across these evaluations, SmartChunk outperforms state-of-the-art RAG baselines, while reducing cost. Further analysis demonstrates strong scalability with larger corpora and consistent gains on out-of-domain datasets, highlighting its effectiveness as a general framework for adaptive retrieval.
【40】SQaLe: A Large Text-to-SQL Corpus Grounded in Real Schemas
标题:SQaLe:一个基于实模式的大型文本到SQL语料库
链接:https://arxiv.org/abs/2602.22223
作者:Cornelius Wolff,Daniel Gomm,Madelon Hulsebos
备注:Accepted at the AI for Tabular Data workshop at EurIPS 2025
摘要:Advances in large language models have accelerated progress in text-to-SQL, methods for converting natural language queries into valid SQL queries. A key bottleneck for developing generalizable text-to-SQL models is the lack of large-scale datasets with sufficient schema and query complexity, domain coverage, and task diversity. We introduce SQaLe: a large-scale semi-synthetic text-to-SQL dataset built on 135,875 relational database schemas expanded from a collection of real-world schemas, SchemaPile. We establish a principled generation pipeline which combines schema sampling, question synthesis, and SQL construction, and produce 517,676 high-quality (question, schema, query) triples. The SQaLe dataset captures realistic schema size variability, diverse query patterns, and natural language ambiguity while maintaining execution validity. We provide an analysis of its contents and characteristics, and find that SQaLe introduces the most realistic large-scale text-to-SQL dataset to date in comparison with existing benchmarks and datasets. We discuss how SQaLe enables our vision for data scaling and model generalization in text-to-SQL research. The dataset is accessible at: https://huggingface.co/datasets/trl-lab/SQaLe-text-to-SQL-dataset.
【41】Regular Fourier Features for Nonstationary Gaussian Processes
标题:非平稳高斯过程的规则傅里叶特征
链接:https://arxiv.org/abs/2602.23006
作者:Arsalan Jawaid,Abdullah Karatas,Jörg Seewig
备注:8 pages, 5 figures
摘要:Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation, treating the spectral density as a probability distribution for Monte Carlo approximation. Although this probabilistic interpretation works for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes that avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite spectral support assumption, this yields an efficient low-rank approximation that is positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary kernels and on harmonizable mixture kernels with complex-valued spectral densities.
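In the simplest stationary 1-D case, the idea of replacing Monte Carlo frequency sampling with a regular discretization of the spectrum looks like the sketch below. This is an illustration under a finite-spectral-support assumption, not the paper's harmonizable/nonstationary construction.

```python
import numpy as np

def regular_fourier_features(x, omegas, weights):
    """Deterministic Fourier feature map: frequencies on a regular grid,
    weights from discretizing the spectral density (vs. Monte Carlo
    sampling it, as in random Fourier features)."""
    w = np.sqrt(weights)
    P = np.outer(x, omegas)
    return np.concatenate([w * np.cos(P), w * np.sin(P)], axis=1)

# Regular grid over the (effectively finite) spectral support of an RBF kernel.
omegas = np.linspace(-6.0, 6.0, 121)
d_om = omegas[1] - omegas[0]
weights = np.exp(-omegas**2 / 2) / np.sqrt(2 * np.pi) * d_om   # quadrature weights

x = np.linspace(-2.0, 2.0, 50)
Phi = regular_fourier_features(x, omegas, weights)
K_approx = Phi @ Phi.T                            # low-rank, PSD by construction
K_true = np.exp(-(x[:, None] - x[None, :])**2 / 2)

print("max abs error:", np.abs(K_approx - K_true).max())   # tiny quadrature error
```

Because the approximation is a Gram matrix of explicit features, it is positive semi-definite by construction, and sampling the process reduces to drawing independent Gaussian weights for the feature columns.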
【42】Kernel Integrated $R^2$: A Measure of Dependence
标题:核积分$R^2$:一种依赖性度量
链接:https://arxiv.org/abs/2602.22985
作者:Pouya Roudaki,Shakeel Gavioli-Akilagun,Florian Kalinke,Mona Azadkia,Zoltán Szabó
摘要:We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The proposed measure extends integrated $R^2$ from scalar responses to responses taking values on general spaces equipped with a characteristic kernel, allowing to measure dependence of multivariate, functional, and structured data, while remaining sensitive to tail behaviour and oscillatory dependence structures. We establish that (i) this new measure takes values in $[0,1]$, (ii) equals zero if and only if independence holds, and (iii) equals one if and only if the response is almost surely a measurable function of the covariates. Two estimators are proposed: a graph-based method using $K$-nearest neighbours and an RKHS-based method built on conditional mean embeddings. We prove consistency and derive convergence rates for the graph-based estimator, showing its adaptation to intrinsic dimensionality. Numerical experiments on simulated data and a real data experiment in the context of dependency testing for media annotations demonstrate competitive power against state-of-the-art dependence measures, particularly in settings involving non-linear and structured relationships.
【43】Deep Accurate Solver for the Geodesic Problem
标题:测地线问题的深度精确求解器
链接:https://arxiv.org/abs/2602.22275
作者:Saar Huberman,Amit Bracha,Ron Kimmel
备注:Extended version of Deep Accurate Solver for the Geodesic Problem originally published in Scale Space and Variational Methods in Computer Vision (SSVM 2023), Lecture Notes in Computer Science, Springer. This version includes additional experiments and detailed analysis
摘要:A common approach to compute distances on continuous surfaces is by considering a discretized polygonal mesh approximating the surface and estimating distances on the polygon. We show that exact geodesic distances restricted to the polygon are at most second-order accurate with respect to the distances on the corresponding continuous surface. By order of accuracy we refer to the convergence rate as a function of the average distance between sampled points. Next, a higher-order accurate deep learning method for computing geodesic distances on surfaces is introduced. Traditionally, one considers two main components when computing distances on surfaces: a numerical solver that locally approximates the distance function, and an efficient causal ordering scheme by which surface points are updated. Classical minimal path methods often exploit a dynamic programming principle with quasi-linear computational complexity in the number of sampled points. The quality of the distance approximation is determined by the local solver that is revisited in this paper. To improve state of the art accuracy, we consider a neural network-based local solver which implicitly approximates the structure of the continuous surface. We supply numerical evidence that the proposed learned update scheme provides better accuracy compared to the best possible polyhedral approximations and previous learning-based methods. The result is a third-order accurate solver with a bootstrapping-recipe for further improvement.
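The two classic components named in this abstract, a local solver and a causal update ordering, can be illustrated with the standard first-order fast-marching scheme on a flat grid. This is the classical baseline whose local update the paper replaces with a neural network, not the paper's method itself.

```python
import heapq
import numpy as np

def eikonal_update(a, b, h=1.0):
    """Classical first-order local solver for |grad d| = 1: given the
    smallest neighbor values a, b along the two grid axes, solve
    (d-a)^2 + (d-b)^2 = h^2, falling back to a one-sided update."""
    if abs(a - b) >= h:
        return min(a, b) + h
    return 0.5 * (a + b + np.sqrt(2 * h * h - (a - b) ** 2))

def fast_marching(n, src=(0, 0), h=1.0):
    """Dijkstra-like causal ordering driving the local solver above."""
    d = np.full((n, n), np.inf)
    d[src] = 0.0
    heap = [(0.0, src)]
    frozen = np.zeros((n, n), dtype=bool)
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if frozen[i, j]:
            continue
        frozen[i, j] = True
        for x, y in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= x < n and 0 <= y < n and not frozen[x, y]:
                a = min(d[x - 1, y] if x > 0 else np.inf,
                        d[x + 1, y] if x < n - 1 else np.inf)
                b = min(d[x, y - 1] if y > 0 else np.inf,
                        d[x, y + 1] if y < n - 1 else np.inf)
                new = eikonal_update(a, b, h)
                if new < d[x, y]:
                    d[x, y] = new
                    heapq.heappush(heap, (new, (x, y)))
    return d

d = fast_marching(11)
print(d[10, 0])  # exact along grid axes
print(d[1, 1])   # 1 + sqrt(2)/2, a first-order overestimate of sqrt(2)
```

The diagonal overestimate (1.707 vs. the true sqrt(2) at the first diagonal node) is exactly the kind of local-solver error that motivates replacing `eikonal_update` with a learned, higher-order-accurate model while keeping the causal ordering intact.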
【44】Survey on Neural Routing Solvers
标题:神经路由求解器综述
链接:https://arxiv.org/abs/2602.21761
作者:Yunpeng Ba,Xi Lin,Changliang Zhou,Ruihao Zheng,Zhenkun Wang,Xinyan Liang,Zhichao Lu,Jianyong Sun,Yuhua Qian,Qingfu Zhang
摘要:Neural routing solvers (NRSs) that leverage deep learning to tackle vehicle routing problems have demonstrated notable potential for practical applications. By learning implicit heuristic rules from data, NRSs replace the handcrafted counterparts in classic heuristic frameworks, thereby reducing reliance on costly manual design and trial-and-error adjustments. This survey makes two main contributions: (1) The heuristic nature of NRSs is highlighted, and existing NRSs are reviewed from the perspective of heuristics. A hierarchical taxonomy based on heuristic principles is further introduced. (2) A generalization-focused evaluation pipeline is proposed to address limitations of the conventional pipeline. Comparative benchmarking of representative NRSs across both pipelines uncovers a series of previously unreported gaps in current research.
机器翻译由腾讯交互翻译提供,仅供参考