Click "Read the original" to visit arxivdaily.com, covering CS, physics, math, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and more!
cs.LG: 139 papers today
Large language models (12 papers)
【1】Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume
Link: https://arxiv.org/abs/2602.24195
Note: Earlier versions presented at ICLR 2025 QUESTION workshop and ICML 2025 R2-FM workshop
Abstract: Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncertainty metrics could enable escalation of unreliable queries to human experts or larger models for improved performance. However, existing uncertainty metrics have practical constraints, such as being designed only for specific modalities, reliant on external tools, or computationally expensive. We introduce UMPIRE, a training-free uncertainty quantification framework for MLLMs that works efficiently across various input and output modalities without external tools, relying only on the models' own internal modality features. UMPIRE computes the incoherence-adjusted semantic volume of sampled MLLM responses for a given task instance, effectively capturing both the global semantic diversity of samples and the local incoherence of responses based on internal model confidence. We propose uncertainty desiderata for MLLMs and provide theoretical analysis motivating UMPIRE's design. Extensive experiments show that UMPIRE consistently outperforms baseline metrics in error detection and uncertainty calibration across image, audio, and video-text benchmarks, including adversarial and out-of-distribution settings. We also demonstrate UMPIRE's generalization to non-text output tasks, including image and audio generation.
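The abstract states the metric only at a high level. As a rough illustration of what an incoherence-adjusted semantic volume could look like, here is a minimal Python sketch; the `embed` and `confidence` callables and the multiplicative adjustment are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def semantic_volume(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """Gram-determinant volume of sampled response embeddings:
    larger when the responses are more semantically spread out."""
    gram = embeddings @ embeddings.T               # pairwise inner products
    gram += eps * np.eye(len(embeddings))          # numerical stabilizer
    sign, logdet = np.linalg.slogdet(gram)
    return float(np.exp(0.5 * logdet))             # sqrt(det Gram)

def umpire_style_uncertainty(responses, embed, confidence) -> float:
    emb = np.stack([embed(r) for r in responses])  # (n, d), e.g. unit-norm rows
    volume = semantic_volume(emb)                  # global semantic diversity
    incoherence = float(np.mean([1.0 - confidence(r) for r in responses]))
    return volume * (1.0 + incoherence)            # one plausible adjustment
```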
【2】Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving
Link: https://arxiv.org/abs/2602.24044
Note: Journal extension of the workshop paper "A data-driven ml approach for maximizing performance in llm-adapter serving"
Abstract: Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems where hundreds of adapters must be hosted concurrently. While prior work has largely focused on latency minimization, resource efficiency through throughput maximization remains underexplored. This paper presents a data-driven pipeline that, for a given workload, computes an adapter placement that serves the workload with the minimum number of GPUs while avoiding request starvation and GPU memory errors. To that end, the approach identifies the maximum feasible throughput attainable on each GPU by leveraging accurate performance predictions learned from real serving behavior. The proposed pipeline integrates three components: (i) a Digital Twin (DT) tailored to LLM-adapter serving, (ii) a distilled machine learning (ML) model trained on DT-generated data, and (iii) a greedy placement algorithm that exploits ML-based performance estimates to maximize GPU efficiency. The DT emulates real system dynamics with high fidelity, achieving below 5% throughput estimation error while executing up to 90 times faster than full LLM benchmarking across both predictable and unpredictable workloads. The learned ML models further accelerate performance estimation with marginal accuracy degradation, enabling scalable optimization. Experimental results demonstrate that the pipeline substantially improves GPU efficiency by reducing the number of GPUs required to sustain target workloads. Beyond GPU efficiency, the pipeline can be adapted to alternative objectives, such as latency minimization, highlighting its versatility for future large-scale LLM serving infrastructures.
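The third component is a greedy placement algorithm driven by learned throughput estimates. Below is a minimal first-fit-decreasing sketch of that idea; the `Adapter` record and the `fits` predicate (standing in for the ML-based throughput/memory feasibility check) are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Adapter:
    name: str
    demand: float  # requests/s the workload routes to this adapter

def greedy_placement(adapters, fits):
    """Place heavy adapters first; open a new GPU only when no existing
    GPU is predicted to sustain the added load without starvation/OOM."""
    gpus = []  # each GPU is a list of co-located adapters
    for a in sorted(adapters, key=lambda x: x.demand, reverse=True):
        target = next((g for g in gpus if fits(g + [a])), None)
        if target is None:
            gpus.append([a])
        else:
            target.append(a)
    return gpus  # minimizing len(gpus) is the stated objective
```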
【3】Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language
Link: https://arxiv.org/abs/2602.23940
Note: 5 pages, 2 figures. Accepted and presented at the Regional International Conference on Natural Language Processing (RegICON 2025), Gauhati University, Guwahati, India, November 27-29, 2025. To appear in the conference proceedings. Accepted papers list available at: https://www.regicon2025.in/accepted-papers
Abstract: Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in Devanagari script, remains relatively underexplored. This study benchmarks multilingual, Indic, Hindi, and Nepali BERT variants to evaluate their effectiveness in Nepali topic classification. Ten pre-trained models, including mBERT, XLM-R, MuRIL, DevBERT, HindiBERT, IndicBERT, and NepBERTa, were fine-tuned and tested on a balanced Nepali dataset containing 25,006 sentences across five conceptual domains, and performance was evaluated using accuracy, weighted precision, recall, F1-score, and AUROC metrics. The results reveal that Indic models, particularly MuRIL-large, achieved the highest F1-score of 90.60%, outperforming multilingual and monolingual models. NepBERTa also performed competitively with an F1-score of 88.26%. Overall, these findings establish a robust baseline for future document-level classification and broader Nepali NLP applications.
【4】Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning
Link: https://arxiv.org/abs/2602.23834
Note: Accepted for publication in the Proceedings of the 2026 International Conference on Information Systems Security and Privacy (ICISSP)
Abstract: Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits that ignore time and overestimate real-world performance. In practice, detectors are deployed on evolving code bases and must recognise future vulnerabilities under temporal distribution shift. This paper investigates continual fine-tuning of a decoder-style language model (microsoft/phi-2 with LoRA) on a CVE-linked dataset spanning 2018-2024, organised into bi-monthly windows. We evaluate eight continual learning strategies, including window-only and cumulative training, replay-based baselines and regularisation-based variants. We propose Hybrid Class-Aware Selective Replay (Hybrid-CASR), a confidence-aware replay method for binary vulnerability classification that prioritises uncertain samples while maintaining a balanced ratio of VULNERABLE and FIXED functions in the replay buffer. On bi-monthly forward evaluation Hybrid-CASR achieves a Macro-F1 of 0.667, improving on the window-only baseline (0.651) by 0.016 with statistically significant gains ($p = 0.026$) and stronger backward retention (IBR@1 of 0.741). Hybrid-CASR also reduces training time per window by about 17 percent compared to the baseline, whereas cumulative training delivers only a minor F1 increase (0.661) at a 15.9-fold computational cost. Overall, the results show that selective replay with class balancing offers a practical accuracy-efficiency trade-off for LLM-based temporal vulnerability detection under continuous temporal drift.
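The replay rule is concrete enough to sketch: prioritize uncertain samples while keeping VULNERABLE and FIXED functions balanced in the buffer. A simplified selection routine under those two constraints (the paper's exact scoring and ratios may differ):

```python
import numpy as np

def select_replay(samples, probs, labels, budget):
    """probs: model probability of VULNERABLE for each past sample;
    labels: 1 = VULNERABLE, 0 = FIXED."""
    probs = np.asarray(probs)
    uncertainty = 1.0 - np.abs(2.0 * probs - 1.0)   # peaks at p = 0.5
    chosen = []
    for cls in (1, 0):                              # one balanced half per class
        idx = [i for i, y in enumerate(labels) if y == cls]
        idx.sort(key=lambda i: uncertainty[i], reverse=True)
        chosen.extend(idx[: budget // 2])           # most uncertain first
    return [samples[i] for i in chosen]
```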
【5】GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models
Link: https://arxiv.org/abs/2602.23826
Note: 6 pages main body, 9 pages in total; 4 figures
Abstract: We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically, we consider gated activation functions such as SwiGLU. This introduces a new challenge: understanding positive activations is not enough. Instead, both the gate and the "in" activation of a neuron can be positive or negative, leading to four different possible sign combinations that in some cases have quite different functionalities. Accordingly, for any neuron, our tool shows text examples for each of the four sign combinations, and indicates how often each combination occurs. We describe examples of how our tool can lead to novel insights. A demo is available at https://sjgerstner.github.io/gluscope.
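For a gated unit such as SwiGLU, the four combinations come from the signs of the gate pre-activation and the "in" activation it multiplies. A small sketch of how one might tally them for a single neuron; the weight layout follows common LLaMA-style conventions and is an assumption, not GLUScope's actual API.

```python
import torch

def sign_combination_counts(x, w_gate, w_in, neuron):
    """x: (tokens, d_model); w_gate, w_in: (d_model, d_ff)."""
    gate = x @ w_gate[:, neuron]   # gate pre-activation of the chosen neuron
    inp = x @ w_in[:, neuron]      # the "in" activation it multiplies
    counts = {}
    for gs in (1, -1):
        for ins in (1, -1):
            mask = (torch.sign(gate) == gs) & (torch.sign(inp) == ins)
            counts[(gs, ins)] = int(mask.sum())  # frequency of this combination
    return counts
```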
【6】MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models
Link: https://arxiv.org/abs/2602.23798
Abstract: Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating updates with a harmonic denoising procedure to alleviate the impact of perturbation. Experiments with seven unlearning algorithms show that MPU achieves comparable unlearning performance to noise-free baselines, with most algorithms' average degradation well below 1% under 10% noise, and can even outperform the noise-free baseline for some algorithms under 1% noise. Code is available at https://github.com/Tristan-SHU/MPU.
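A toy sketch of the perturb-then-aggregate idea on a single weight tensor, assuming simple additive noise and plain averaging; the paper's reparameterization and harmonic denoising procedure are more elaborate than this illustration.

```python
import torch

def make_copies(w, k, noise_scale=0.1):
    """Server side: k perturbed copies so the client never sees the exact weights."""
    secrets = [torch.randn_like(w) * noise_scale for _ in range(k)]
    return [w + s for s in secrets], secrets

def aggregate(updated_copies, secrets, w):
    """Server side: undo each perturbation, then combine the k unlearning updates."""
    deltas = [(u - s) - w for u, s in zip(updated_copies, secrets)]
    return w + torch.stack(deltas).mean(dim=0)
```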
【7】From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
Link: https://arxiv.org/abs/2602.23729
Note: Accepted to ICLR 2026
Abstract: The evaluation of large language models (LLMs) has predominantly relied on static datasets, which offer limited scalability and fail to capture the evolving reasoning capabilities of recent models. To overcome these limitations, we propose an agent-centric benchmarking paradigm that moves beyond static datasets by introducing a dynamic protocol in which autonomous agents iteratively generate, validate, and solve problems. Within this protocol, a teacher agent generates candidate problems, an orchestrator agent rigorously verifies their validity and guards against adversarial attacks, and a student agent attempts to solve the validated problems. An invalid problem is revised by the teacher agent until it passes validation. If the student correctly solves the problem, the orchestrator prompts the teacher to generate more challenging variants. Consequently, the benchmark scales in difficulty automatically as more capable agents are substituted into any role, enabling progressive evaluation of large language models without manually curated datasets. Adopting text anomaly detection as our primary evaluation format, which demands cross-sentence logical inference and resists pattern-matching shortcuts, we demonstrate that this protocol systematically exposes corner-case reasoning errors that conventional benchmarks fail to reveal. We further advocate evaluating systems along several complementary axes, including cross-model pairwise performance and progress between the initial and orchestrator-finalized problems. By shifting the focus from fixed datasets to dynamic protocols, our approach offers a sustainable direction for evaluating ever-evolving language models and introduces a research agenda centered on the co-evolution of agent-centric benchmarks.
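The control flow of the protocol is easy to state. A pseudocode-style sketch of one round, where the three agent objects and their methods are placeholders for LLM-backed roles rather than the paper's interface:

```python
def protocol_round(teacher, orchestrator, student, difficulty):
    problem = teacher.generate(difficulty)
    while not orchestrator.validate(problem):   # reject invalid/adversarial items
        problem = teacher.revise(problem)
    answer = student.solve(problem)
    if orchestrator.grade(problem, answer):     # solved: escalate the next round
        difficulty += 1
    return problem, answer, difficulty
```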
【8】FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
Link: https://arxiv.org/abs/2602.23636
Abstract: Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a fixed definition of harmfulness. In practice, enforcement strictness (how conservatively harmfulness is defined and enforced) varies across platforms and evolves over time, making binary moderators brittle under shifting requirements. We first introduce FlexBench, a strictness-adaptive LLM moderation benchmark that enables controlled evaluation under multiple strictness regimes. Experiments on FlexBench reveal substantial cross-strictness inconsistency in existing moderators: models that perform well under one regime can degrade substantially under others, limiting their practical usability. To address this, we propose FlexGuard, an LLM-based moderator that outputs a calibrated continuous risk score reflecting risk severity and supports strictness-specific decisions via thresholding. We train FlexGuard via risk-alignment optimization to improve score-severity consistency and provide practical threshold selection strategies to adapt to target strictness at deployment. Experiments on FlexBench and public benchmarks demonstrate that FlexGuard achieves higher moderation accuracy and substantially improved robustness under varying strictness. We release the source code and data to support reproducibility.
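Deployment-time adaptation then reduces to picking a threshold on the continuous risk score for each strictness regime. One plausible selection rule, choosing the lowest threshold whose false-positive rate on benign content stays within a strictness budget (the paper's strategies are not reproduced here):

```python
import numpy as np

def pick_threshold(scores, labels, max_fpr):
    """scores: calibrated risk scores; labels: 1 = harmful, 0 = benign."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    benign = scores[labels == 0]
    for t in np.sort(np.unique(scores)):
        if (benign >= t).mean() <= max_fpr:     # FPR at this threshold
            return float(t)
    return float(scores.max()) + 1e-9           # block nothing if budget unreachable

def moderate(score, threshold):
    return "block" if score >= threshold else "allow"
```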
【9】Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning
Link: https://arxiv.org/abs/2602.23588
Abstract: Large unimodal foundation models for vision and language encode rich semantic structures, yet aligning them typically requires computationally intensive multimodal fine-tuning. Such approaches depend on large-scale parameter updates, are resource intensive, and can perturb pretrained representations. Emerging evidence suggests, however, that independently trained foundation models may already exhibit latent semantic compatibility, reflecting shared structures in the data they model. This raises a fundamental question: can cross-modal alignment be achieved without modifying the models themselves? Here we introduce HDFLIM (HyperDimensional computing with Frozen Language and Image Models), a framework that establishes cross-modal mappings while keeping pretrained vision and language models fully frozen. HDFLIM projects unimodal embeddings into a shared hyperdimensional space and leverages lightweight symbolic operations (binding, bundling, and similarity-based retrieval) to construct associative cross-modal representations in a single pass over the data. Caption generation emerges from high-dimensional memory retrieval rather than iterative gradient-based optimization. We show that HDFLIM achieves performance comparable to end-to-end vision-language training methods and produces captions that are more semantically grounded than zero-shot baselines. By decoupling alignment from parameter tuning, our results suggest that semantic mapping across foundation models can be realized through symbolic operations on hyperdimensional encodings of the respective embeddings. More broadly, this work points toward an alternative paradigm for foundation model alignment in which frozen models are integrated through structured representational mappings rather than through large-scale retraining. The codebase for our implementation can be found at https://github.com/Abhishek-Dalvi410/HDFLIM.
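The symbolic operations named here (binding, bundling, similarity-based retrieval) are standard in hyperdimensional computing. A toy sketch with bipolar hypervectors; the random projections and the memory construction are illustrative, not the paper's configuration.

```python
import numpy as np

def project(embedding, P):
    """Map a frozen unimodal embedding to a bipolar hypervector
    via a fixed random projection P of shape (D, d)."""
    return np.sign(P @ embedding)

def bind(a, b):              # elementwise product associates a pair of vectors
    return a * b

def bundle(vectors):         # majority vote superposes many vectors into one memory
    return np.sign(np.sum(vectors, axis=0))

def build_memory(image_embs, text_embs, P_img, P_txt):
    """Single pass over paired data; unbinding the memory with an image
    hypervector later yields a noisy caption vector that can be cleaned up
    by nearest-neighbor search over known caption hypervectors."""
    pairs = [bind(project(i, P_img), project(t, P_txt))
             for i, t in zip(image_embs, text_embs)]
    return bundle(pairs)
```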
【10】Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents
Link: https://arxiv.org/abs/2602.23556
Note: Accepted to the 40th ACM International Conference on Supercomputing (ICS 2026)
Abstract: Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular communication that stalls forward progress. Moreover, fetched data changes with graph, graph distribution, sample and batch parameters, and caching policies. Consequently, any static prefetching method will miss crucial opportunities to adapt to different dynamic conditions. In this paper, we introduce Rudder, a software module embedded in the state-of-the-art AWS DistDGL framework, to autonomously prefetch remote nodes and minimize communication. Rudder's adaptation contrasts with both standard heuristics and traditional ML classifiers. We observe that the generative AI found in contemporary Large Language Models (LLMs) exhibits emergent properties like In-Context Learning (ICL) for zero-shot tasks, with logical multi-step reasoning. We find this behavior well-suited for adaptive control even with substantial undertraining. Evaluations using standard datasets and unseen configurations on the NERSC Perlmutter supercomputer show up to 91% improvement in end-to-end training performance over baseline DistDGL (no prefetching), and an 82% improvement over static prefetching, reducing communication by over 50%. Our code is available at https://github.com/aishwaryyasarkar/rudder-llm-agent.
【11】Uncertainty-aware Language Guidance for Concept Bottleneck Models
Link: https://arxiv.org/abs/2602.23495
Abstract: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive expert knowledge and labor, constraining the broad adoption of CBMs. On the other hand, there are a few works that leverage the knowledge of large language models (LLMs) to construct concept bottlenecks. Nevertheless, they face two essential limitations: First, they overlook the uncertainty associated with the concepts annotated by LLMs and lack a valid mechanism to quantify uncertainty about the annotated concepts, increasing the risk of errors due to hallucinations from LLMs. Additionally, they fail to incorporate the uncertainty associated with these annotations into the learning process for concept bottleneck models. To address these limitations, we propose a novel uncertainty-aware CBM method, which not only rigorously quantifies the uncertainty of LLM-annotated concept labels with valid and distribution-free guarantees, but also incorporates quantified concept uncertainty into the CBM training procedure to account for varying levels of reliability across LLM-annotated concepts. We also provide a theoretical analysis of our proposed method. Extensive experiments on real-world datasets validate the desired properties of our proposed method.
【12】Detoxifying LLMs via Representation Erasure-Based Preference Optimization
Link: https://arxiv.org/abs/2602.23391
Abstract: Large language models (LLMs) trained on web-scale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on applications of DPO, NPO, and similar algorithms, reduce the likelihood of harmful continuations, but not robustly so: they are vulnerable to adversarial prompting and easily undone by fine-tuning-based relearning attacks. Indeed, research has shown that these edits to the model are superficial: linear probing reveals that harmful "directions" remain present in representations. To address this, we propose Representation Erasure-based Preference Optimization (REPO), reformulating detoxification as a token-level preference problem. Using a novel objective with preference data, we force the representations of toxic continuations to converge toward their benign counterparts. Our mechanistic analysis reveals that this granular approach is critical: unlike baselines, REPO induces deep, localized edits to toxicity-encoding neurons while preserving general model utility. Exhaustive evaluations show that REPO achieves state-of-the-art robustness, stopping sophisticated threats, including relearning attacks and enhanced GCG jailbreaks, where existing representation- and output-based methods fail.
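As a rough illustration of the representation-erasure idea, one simple term pulls the hidden states of toxic continuation tokens toward their benign counterparts; REPO's full objective combines representation alignment with token-level preference optimization and is not reproduced here.

```python
import torch

def erasure_loss(h_toxic: torch.Tensor, h_benign: torch.Tensor) -> torch.Tensor:
    """h_*: (tokens, hidden) hidden states of aligned toxic/benign continuations.
    Detaching the benign side makes it a fixed target for the toxic side."""
    return torch.mean((h_toxic - h_benign.detach()) ** 2)
```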
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)
【1】A Theory of Random Graph Shift in Truncated-Spectrum vRKHS
Link: https://arxiv.org/abs/2602.23880
Abstract: This paper develops a theory of graph classification under domain shift through a random-graph generative lens, where we consider intra-class graphs sharing the same random graph model (RGM) and the domain shift induced by changes in RGM components. While classic domain adaptation (DA) theories have underpinned existing techniques to handle graph distribution shift, the information of graph samples, which are themselves structured objects, is less explored. The non-Euclidean nature of graphs and specialized architectures for graph learning further complicate a fine-grained analysis of graph distribution shifts. In this paper, we propose a theory that assumes an RGM as the data generative process, exploiting its connection to hypothesis complexity from a function-space perspective for such fine-grained analysis. Building on a vector-valued reproducing kernel Hilbert space (vRKHS) formulation, we derive a generalization bound whose shift penalty admits a factorization into (i) a domain discrepancy term, (ii) a spectral-geometry term summarized by the accessible truncated spectrum, and (iii) an amplitude term that aggregates convergence and construction-stability effects. We empirically verify the insights on these terms in both real data and simulations.
【2】Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval
Link: https://arxiv.org/abs/2602.23665
Abstract: We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval that relies on fixed Euclidean distances, GSS learns a low-rank metric tensor $L_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive semi-definite metric $G_i = L_i L_i^\top + \epsilon I$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval proceeds via multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path coherence filtering. On citation prediction benchmarks with 169K papers, GSS achieves a 23% relative improvement in Recall@20 over SPECTER+FAISS baselines while providing interpretable citation paths. Our hierarchical coarse-to-fine search with k-means pooling reduces computational cost by 4x compared to flat geodesic search while maintaining 97% retrieval quality. We provide theoretical analysis of when geodesic distances outperform direct similarity, characterize the approximation quality of low-rank metrics, and validate predictions empirically. Code and trained models are available at https://github.com/YCRG-Labs/geodesic-search.
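The metric construction is explicit in the abstract. A small sketch of the node-local metric and one standard way to turn it into edge lengths for Dijkstra; averaging the two endpoint metrics is a common discretization choice, not necessarily the paper's.

```python
import numpy as np

def local_metric(L, eps=1e-3):
    """G = L @ L.T + eps * I is positive definite by construction."""
    return L @ L.T + eps * np.eye(L.shape[0])

def edge_length(x_i, x_j, L_i, L_j):
    """Riemannian length of edge (i, j) between node embeddings x_i, x_j."""
    diff = x_j - x_i
    G = 0.5 * (local_metric(L_i) + local_metric(L_j))  # endpoint average
    return float(np.sqrt(diff @ G @ diff))
```

Multi-source Dijkstra then runs on a graph weighted with these edge lengths, so geodesic distance replaces cosine similarity at retrieval time.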
【3】Normalisation and Initialisation Strategies for Graph Neural Networks in Blockchain Anomaly Detection
Link: https://arxiv.org/abs/2602.23599
Note: 14 pages, 5 figures
Abstract: Graph neural networks (GNNs) offer a principled approach to financial fraud detection by jointly learning from node features and transaction graph topology. However, their effectiveness on real-world anti-money laundering (AML) benchmarks depends critically on training practices, specifically weight initialisation and normalisation, that remain underexplored. We present a systematic ablation of initialisation and normalisation strategies across three GNN architectures (GCN, GAT, and GraphSAGE) on the Elliptic Bitcoin dataset. Our experiments reveal that initialisation and normalisation are architecture-dependent: GraphSAGE achieves the strongest performance with Xavier initialisation alone, GAT benefits most from combining GraphNorm with Xavier initialisation, while GCN shows limited sensitivity to these modifications. These findings offer practical, architecture-specific guidance for deploying GNNs in AML pipelines for datasets with severe class imbalance. We release a reproducible experimental framework with temporal data splits, seeded runs, and full ablation results.
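Both knobs in the ablation are one-liners in PyTorch Geometric. A minimal sketch combining Xavier initialisation with an optional GraphNorm layer; the layer sizes are placeholders rather than the paper's configuration.

```python
import torch
from torch_geometric.nn import SAGEConv, GraphNorm

class SAGENet(torch.nn.Module):
    def __init__(self, in_dim, hid=64, out_dim=2, use_graphnorm=False):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hid)
        self.conv2 = SAGEConv(hid, out_dim)
        self.norm = GraphNorm(hid) if use_graphnorm else torch.nn.Identity()
        for p in self.parameters():
            if p.dim() > 1:
                torch.nn.init.xavier_uniform_(p)  # Xavier initialisation

    def forward(self, x, edge_index):
        h = torch.relu(self.norm(self.conv1(x, edge_index)))
        return self.conv2(h, edge_index)
```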
【4】Flowette: Flow Matching with Graphette Priors for Graph Generation
Link: https://arxiv.org/abs/2602.23566
Note: 37 pages
Abstract: We study generative modeling of graphs with recurring subgraph motifs. We propose Flowette, a continuous flow matching framework that employs a graph-neural-network-based transformer to learn a velocity field defined over graph representations with node and edge attributes. Our model preserves topology through optimal-transport-based coupling, and long-range structural dependencies through regularisation. To incorporate domain-driven structural priors, we introduce graphettes, a new probabilistic family of graph structure models that generalize graphons via controlled structural edits for motifs like rings, stars and trees. We theoretically analyze the coupling, invariance, and structural properties of the proposed framework, and empirically evaluate it on synthetic and small-molecule graph generation tasks. Flowette demonstrates consistent improvements, highlighting the effectiveness of combining structural priors with flow-based training for modeling complex graph distributions.
【5】V-MORALS: Visual Morse Graph-Aided Estimation of Regions of Attraction in a Learned Latent Space
Link: https://arxiv.org/abs/2602.23524
Abstract: Reachability analysis has become increasingly important in robotics to distinguish safe from unsafe states. Unfortunately, existing reachability and safety analysis methods often fall short, as they typically require known system dynamics or large datasets to estimate accurate system models, are computationally expensive, and assume full state information. A recent method, called MORALS, aims to address these shortcomings by using topological tools to estimate Regions of Attraction (ROA) in a low-dimensional latent space. However, MORALS still relies on full state knowledge and has not been studied when only sensor measurements are available. This paper presents Visual Morse Graph-Aided Estimation of Regions of Attraction in a Learned Latent Space (V-MORALS). V-MORALS takes in a dataset of image-based trajectories of a system under a given controller, and learns a latent space for reachability analysis. Using this learned latent space, our method is able to generate well-defined Morse Graphs, from which we can compute ROAs for various systems and controllers. V-MORALS provides capabilities similar to the original MORALS architecture without relying on state knowledge, and using only high-level sensor data. Our project website is at: https://v-morals.onrender.com.
【6】Pacing Opinion Polarization via Graph Reinforcement Learning
Link: https://arxiv.org/abs/2602.23390
Note: 32 pages, 21 figures
Abstract: Opinion polarization in online social networks poses serious risks to social cohesion and democratic processes. Recent studies formulate polarization moderation as algorithmic intervention problems under opinion dynamics models, especially the Friedkin-Johnsen (FJ) model. However, most existing methods are tailored to specific linear settings and rely on closed-form steady-state analysis, limiting scalability, flexibility, and applicability to cost-aware, nonlinear, or topology-altering interventions. We propose PACIFIER, a graph reinforcement learning framework for sequential polarization moderation via network interventions. PACIFIER reformulates the canonical ModerateInternal (MI) and ModerateExpressed (ME) problems as sequential decision-making tasks, enabling adaptive intervention policies without repeated steady-state recomputation. The framework is objective-agnostic and extends naturally to FJ-consistent settings, including budget-aware interventions, continuous internal opinions, biased-assimilation dynamics, and node removal. Extensive experiments on real-world networks demonstrate strong performance and scalability across diverse moderation scenarios.
【7】Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables
Link: https://arxiv.org/abs/2602.23611
Note: 26 pages, 9 figures
Abstract: Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender and race. Causal notions of fairness align with legal requirements, yet many methods assume access to detailed knowledge of the underlying causal graph, which is a demanding assumption in practice. We propose a learning framework that achieves interventional fairness by leveraging a causal graph over clusters of variables, which is substantially easier to estimate than a variable-level graph. With possible adjustment cluster sets identified from such a cluster causal graph, our framework trains a prediction model by reducing the worst-case discrepancy between interventional distributions across these sets. To this end, we develop a computationally efficient barycenter kernel maximum mean discrepancy (MMD) that scales favorably with the number of sensitive attribute values. Extensive experiments show that our framework strikes a better balance between fairness and accuracy than existing approaches, highlighting its effectiveness under limited causal graph knowledge.
【8】Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data
Link: https://arxiv.org/abs/2602.23602
Note: 17 pages, 6 figures
Abstract: Heteroscedasticity, where the variance of a variable changes with other variables, is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and decision-making. However, standard causal discovery does not reveal which causes act on the mean versus the variance, as it returns a single moment-agnostic graph, limiting interpretability and downstream intervention design. We propose a Bayesian, moment-driven causal discovery framework that infers separate mean and variance causal graphs from observational heteroscedastic data. We first derive the identification results by establishing sufficient conditions under which these two graphs are separately identifiable. Building on this theory, we develop a variational inference method that learns a posterior distribution over both graphs, enabling principled uncertainty quantification of structural features (e.g., edges, paths, and subgraphs). To address the challenges of parameter optimization in heteroscedastic models with two graph structures, we take a curvature-aware optimization approach and develop a prior incorporation technique that leverages domain knowledge on node orderings, improving sample efficiency. Experiments on synthetic, semi-synthetic, and real data show that our approach accurately recovers mean and variance structures and outperforms state-of-the-art baselines.
Transformer (4 papers)
【1】FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System
Link: https://arxiv.org/abs/2602.24254
Abstract: Accurate fault detection and localization in electrical distribution systems is crucial, especially with the increasing integration of distributed energy resources (DERs), which inject greater variability and complexity into grid operations. In this study, FaultXformer is proposed, a Transformer encoder-based architecture developed for automatic fault analysis using real-time current data obtained from phasor measurement units (PMUs). The approach utilizes time-series current data to initially extract rich temporal information in Stage 1, which is crucial for identifying the fault type and precisely determining its location across multiple nodes. In Stage 2, these extracted features are processed to differentiate among distinct fault types and identify the respective fault location within the distribution system. Thus, this dual-stage transformer encoder pipeline enables high-fidelity representation learning, considerably boosting the performance of the work. The model was validated on a dataset generated from the IEEE 13-node test feeder, simulated with 20 separate fault locations and several DER integration scenarios, utilizing current measurements from four strategically located PMUs. To demonstrate robust performance evaluation, stratified 10-fold cross-validation is performed. FaultXformer achieved average accuracies of 98.76% in fault type classification and 98.92% in fault location identification across cross-validation, consistently surpassing conventional deep learning baselines (convolutional neural network (CNN), recurrent neural network (RNN), and long short-term memory (LSTM)) by 1.70%, 34.95%, and 2.04% in classification accuracy and by 10.82%, 40.89%, and 6.27% in location accuracy, respectively. These results demonstrate the efficacy of the proposed model under significant DER penetration.
【2】MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy
Link: https://arxiv.org/abs/2602.24222
Note: Accepted at CVPR 2026
Abstract: Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader tissue organization. Many analysis tasks require combining these scales, yet most vision models operate at a single resolution or derive multi-scale features from one view, limiting their ability to exploit the inherently multi-resolution nature of microscopy data. We introduce MuViT, a transformer architecture built to fuse true multi-resolution observations from the same underlying image. MuViT embeds all patches into a shared world-coordinate system and extends rotary positional embeddings to these coordinates, enabling attention to integrate wide-field context with high-resolution detail within a single encoder. Across synthetic benchmarks, kidney histopathology, and high-resolution mouse-brain microscopy, MuViT delivers consistent improvements over strong ViT and CNN baselines. Multi-resolution MAE pretraining further produces scale-consistent representations that enhance downstream tasks. These results demonstrate that explicit world-coordinate modelling provides a simple yet powerful mechanism for leveraging multi-resolution information in large-scale microscopy analysis.
【3】RAViT: Resolution-Adaptive Vision Transformer
Link: https://arxiv.org/abs/2602.24159
Abstract: Vision transformers have recently made a breakthrough in computer vision, showing excellent performance in terms of precision for numerous applications. However, their computational cost is very high compared to alternative approaches such as Convolutional Neural Networks. To address this problem, we propose a novel framework for image classification called RAViT based on a multi-branch network that operates on several copies of the same image at different resolutions to reduce the computational cost while preserving the overall accuracy. Furthermore, our framework includes an early exit mechanism that makes our model adaptive and allows choosing the appropriate trade-off between accuracy and computational cost at run-time. For example, in a two-branch architecture, the original image is first resized to reduce its resolution, then a prediction is performed on it using a first transformer, and the resulting prediction is reused together with the original-size image to perform a final prediction on a second transformer with less computation than a classical Vision transformer architecture. The early-exit process allows the model to make a final prediction at intermediate branches, saving even more computation. We evaluated our approach on CIFAR-10, Tiny ImageNet, and ImageNet. We obtained accuracy equivalent to the classical Vision transformer model with only around 70% of the FLOPs.
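A hedged sketch of the two-branch, early-exit inference flow described above; the branch callables, the reuse of low-resolution outputs in the second branch, and the softmax-confidence rule are stand-ins for the paper's design.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ravit_infer(image, branch_lo, branch_hi, tau=0.9):
    """image: (1, C, H, W). branch_lo returns (logits, features);
    branch_hi consumes the full-size image plus the low-res features."""
    small = F.interpolate(image, scale_factor=0.5, mode="bilinear",
                          align_corners=False)
    logits_lo, feats_lo = branch_lo(small)          # cheap first pass
    conf = F.softmax(logits_lo, dim=-1).max().item()
    if conf >= tau:                                 # early exit: skip branch 2
        return logits_lo
    return branch_hi(image, feats_lo)               # full-resolution refinement
```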
【4】Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
Link: https://arxiv.org/abs/2602.23696
Note: 18 pages, 4 figures
Abstract: We study the geometry of training trajectories in small transformer models and find that parameter updates organize into a dominant drift direction with transverse residual dynamics. Using uncentered, row-normalized trajectory PCA, we show that a single direction captures a large fraction of cumulative parameter movement early in training, while remaining components encode oscillatory behavior in auxiliary probe performance. Instantaneous gradients exhibit little alignment with this dominant direction, indicating that it arises from accumulated optimizer updates rather than per-batch gradient structure. Comparing AdamW with SGD variants at matched loss levels reveals substantial differences in trajectory geometry: AdamW develops multi-dimensional drift structure, whereas SGD-family optimizers produce nearly colinear parameter evolution and weaker probe dynamics. Reheating selectively perturbs transverse components with minimal effect on the dominant drift coordinate. These findings suggest that optimizer choice shapes the effective dimensionality and structure of learning trajectories beyond what is apparent from loss values alone.
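Uncentered, row-normalized trajectory PCA is straightforward to reproduce. A minimal sketch over stacked parameter snapshots saved during training:

```python
import numpy as np

def trajectory_pca(snapshots: np.ndarray, k: int = 5):
    """snapshots: (num_checkpoints, num_params), one flattened parameter
    vector per checkpoint. Rows are unit-normalized and the SVD is taken
    WITHOUT mean-centering, matching the analysis described above."""
    X = snapshots / np.linalg.norm(snapshots, axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)        # share of movement per direction
    return Vt[:k], explained[:k]           # top-k drift directions
```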
GAN | adversarial | attacks | generation (7 papers)
【1】Mode Seeking meets Mean Seeking for Fast Long Video Generation
Link: https://arxiv.org/abs/2602.24289
Note: Project website: https://primecai.github.io/mmm/
Abstract: Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence based on a unified representation via a Decoupled Diffusion Transformer. Our approach utilizes a global Flow Matching head trained via supervised learning on long videos to capture narrative structure, while simultaneously employing a local Distribution Matching head that aligns sliding windows to a frozen short-video teacher via a mode-seeking reverse-KL divergence. This strategy enables the synthesis of minute-scale videos, learning long-range coherence and motion from limited long videos via supervised flow matching while inheriting local realism by aligning every sliding-window segment of the student to a frozen short-video teacher, resulting in a fast few-step long-video generator. Evaluations show that our method effectively closes the fidelity-horizon gap by jointly improving local sharpness, motion, and long-range consistency. Project website: https://primecai.github.io/mmm/.
【2】CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Link: https://arxiv.org/abs/2602.24286
Abstract: GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as torch.compile for CUDA kernel generation. Existing CUDA code generation approaches either rely on training-free refinement or fine-tune models within fixed multi-turn execution-feedback loops, but both paradigms fail to fundamentally improve the model's intrinsic CUDA optimization ability, resulting in limited performance gains. We present CUDA Agent, a large-scale agentic reinforcement learning system that develops CUDA kernel expertise through three components: a scalable data synthesis pipeline, a skill-augmented CUDA development environment with automated verification and profiling to provide reliable reward signals, and reinforcement learning algorithmic techniques enabling stable training. CUDA Agent achieves state-of-the-art results on KernelBench, delivering 100%, 100%, and 92% faster rates over torch.compile on the KernelBench Level-1, Level-2, and Level-3 splits, outperforming the strongest proprietary models such as Claude Opus 4.5 and Gemini 3 Pro by about 40% on the hardest Level-3 setting.
【3】Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Link: https://arxiv.org/abs/2602.24009
Abstract: Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY (JBF), a system that addresses this gap via a multi-agent workflow to translate jailbreak papers into executable modules for immediate evaluation within a unified harness. JBF features three core components: (i) JBF-LIB for shared contracts and reusable utilities; (ii) JBF-FORGE for the multi-agent paper-to-module translation; and (iii) JBF-EVAL for standardizing evaluations. Across 30 reproduced attacks, JBF achieves high fidelity with a mean (reproduced minus reported) attack success rate (ASR) deviation of +0.26 percentage points. By leveraging shared infrastructure, JBF reduces attack-specific implementation code by nearly half relative to original repositories and achieves an 82.5% mean reused-code ratio. This system enables a standardized AdvBench evaluation of all 30 attacks across 10 victim models using a consistent GPT-4o judge. By automating both attack integration and standardized evaluation, JBF offers a scalable solution for creating living benchmarks that keep pace with the rapidly shifting security landscape.
【4】Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference
Link: https://arxiv.org/abs/2602.23968
Note: 12 pages, 1 figure
Abstract: Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling, offering the ability for parallel token generation and therefore greater efficiency than autoregressive counterparts. However, achieving an optimal balance between parallel generation and sample quality remains an open problem. Current approaches primarily address this issue through fixed, heuristic parallel sampling methods. There exist some recent learning-based approaches to this problem, but its formulation from the perspective of variational inference remains underexplored. In this work, we propose a variational inference framework for learning parallel generation orders for MDMs. As part of our method, we propose a parameterisation for the approximate posterior of generation orders which facilitates parallelism and efficient sampling during training. Using this method, we conduct preliminary experiments on the GSM8K dataset, where our method performs competitively against heuristic sampling strategies in the regime of highly parallel generation. For example, our method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% accuracy achieved by standard competitor methods in the same number of steps. We believe further experiments and analysis of the method will yield valuable insights into the problem of parallel generation with MDMs.
【5】Exploring Robust Intrusion Detection: A Benchmark Study of Feature Transferability in IoT Botnet Attack Detection
Link: https://arxiv.org/abs/2602.23874
Note: Accepted for publication in the Proceedings of the 2026 International Conference on Information Systems Security and Privacy (ICISSP)
Abstract: Cross-domain intrusion detection remains a critical challenge due to significant variability in network traffic characteristics and feature distributions across environments. This study evaluates the transferability of three widely used flow-based feature sets (Argus, Zeek and CICFlowMeter) across four widely used datasets representing heterogeneous IoT and Industrial IoT network conditions. Through extensive experiments, we evaluate in- and cross-domain performance across multiple classification models and analyze feature importance using SHapley Additive exPlanations (SHAP). Our results show that models trained on one domain suffer significant performance degradation when applied to a different target domain, reflecting the sensitivity of IoT intrusion detection systems to distribution shifts. Furthermore, the results show that the choice of classification algorithm and feature representation significantly impacts transferability. Beyond reporting performance differences and a thorough analysis of the transferability of features and feature spaces, we provide practical guidelines for feature engineering to improve robustness under domain variability. Our findings suggest that effective intrusion detection requires both high in-domain performance and resilience to cross-domain variability, achievable through careful feature space design, appropriate algorithm selection and adaptive strategies.
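The core experimental loop (train on one domain, evaluate on all others, attribute with SHAP) can be sketched compactly; dataset loading is elided and the classifier choice is illustrative, not the paper's full model suite.

```python
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def cross_domain_eval(domains):
    """domains: dict name -> (X, y), all sharing one flow-feature schema."""
    attributions = {}
    for src, (Xs, ys) in domains.items():
        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xs, ys)
        for tgt, (Xt, yt) in domains.items():   # src == tgt is the in-domain case
            print(f"{src} -> {tgt}: F1 = {f1_score(yt, model.predict(Xt)):.3f}")
        attributions[src] = shap.TreeExplainer(model).shap_values(Xs)
    return attributions
```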
【6】MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
标题:MAGE:离线强化学习的多尺度自回归生成
链接:https://arxiv.org/abs/2602.23770
备注:ICLR2026
摘要:生成模型凭借对复杂轨迹分布的建模能力,在离线强化学习(RL)中获得了广泛关注。然而,现有的基于生成的方法仍然难以应对以稀疏奖励为特征的长时程任务。已有一些分层生成方法试图缓解这一问题:用一个策略将原问题分解为较短时程的子问题,再用另一个策略生成具体动作。这些方法虽然有效,但往往忽略了轨迹中固有的多尺度时间结构,导致性能次优。为克服这些局限,我们提出MAGE,一种基于多尺度自回归生成(Multi-scale Autoregressive GEneration)的离线强化学习方法。MAGE结合了条件引导的多尺度自编码器来学习分层轨迹表示,以及一个多尺度Transformer,自回归地从粗到细的时间尺度生成轨迹表示。MAGE有效地捕获了多种分辨率下轨迹的时间依赖性。此外,条件引导的解码器对短期行为施加精确控制。在五个离线RL基准上与十五个基线算法的大量实验表明,MAGE成功地将多尺度轨迹建模与条件引导相结合,在长时程稀疏奖励设置中生成连贯且可控的轨迹。
摘要:Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical generation methods have been developed to mitigate this issue by decomposing the original problem into shorter-horizon subproblems using one policy and generating detailed actions with another. While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations, along with a multi-scale transformer that autoregressively generates trajectory representations from coarse to fine temporal scales. MAGE effectively captures temporal dependencies of trajectories at multiple resolutions. Additionally, a condition-guided decoder is employed to exert precise control over short-term behaviors. Extensive experiments on five offline RL benchmarks against fifteen baseline algorithms show that MAGE successfully integrates multi-scale trajectory modeling with conditional guidance, generating coherent and controllable trajectories in long-horizon sparse-reward settings.
【7】Inference-time optimization for experiment-grounded protein ensemble generation
标题:基于实验的蛋白质系综生成的推理时间优化
链接:https://arxiv.org/abs/2602.24007
摘要:蛋白质功能依赖于动态构象系综,但目前的生成模型(如AlphaFold3)往往无法产生与实验数据相匹配的系综。最近的实验引导生成器试图通过引导反向扩散过程来解决这个问题。然而,这些方法受到固定采样范围和对初始化敏感性的限制,经常产生热力学上不合理的结果。我们引入了一个通用的推理时优化框架来应对这些挑战。首先,我们在潜在表示上进行优化以最大化系综对数似然,而不是事后扰动结构。这种方法消除了对扩散长度的依赖,消除了初始化偏差,并能方便地纳入外部约束。其次,我们提出了新的采样方案来抽取玻尔兹曼加权系综。通过将AlphaFold3的结构先验与基于力场的先验相结合,我们在平衡实验似然的同时从它们的乘积分布中采样。我们的结果表明,该框架始终优于最先进的引导方法,提高了多样性、物理能量以及与X射线晶体学和NMR数据的一致性,往往比已沉积的PDB结构更好地拟合实验数据。最后,最大化ipTM分数的推理时优化实验表明,扰动AlphaFold3嵌入可以人为地抬高模型置信度。这暴露了当前设计指标中的漏洞,而缓解该漏洞有望降低结合物工程中的错误发现率。
摘要:Protein function relies on dynamic conformational ensembles, yet current generative models like AlphaFold3 often fail to produce ensembles that match experimental data. Recent experiment-guided generators attempt to address this by steering the reverse diffusion process. However, these methods are limited by fixed sampling horizons and sensitivity to initialization, often yielding thermodynamically implausible results. We introduce a general inference-time optimization framework to solve these challenges. First, we optimize over latent representations to maximize ensemble log-likelihood, rather than perturbing structures post hoc. This approach eliminates dependence on diffusion length, removes initialization bias, and easily incorporates external constraints. Second, we present novel sampling schemes for drawing Boltzmann-weighted ensembles. By combining structural priors from AlphaFold3 with force-field-based priors, we sample from their product distribution while balancing experimental likelihoods. Our results show that this framework consistently outperforms state-of-the-art guidance, improving diversity, physical energy, and agreement with data in X-ray crystallography and NMR, often fitting the experimental data better than deposited PDB structures. Finally, inference-time optimization experiments maximizing ipTM scores reveal that perturbing AlphaFold3 embeddings can artificially inflate model confidence. This exposes a vulnerability in current design metrics, whose mitigation could offer a pathway to reduce false discovery rates in binder engineering.
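摘要核心之一是"在潜在表示上做推理时优化以最大化系综对数似然"。下面用一个玩具示意说明这一思路;decoder 与 log_likelihood 均为假设的占位接口,并非AlphaFold3或论文的真实API。

```python
import torch

def optimize_latent(decoder, log_likelihood, z_init, steps=200, lr=0.05):
    """在潜在表示 z 上做梯度上升以最大化对数似然(示意)。"""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -log_likelihood(decoder(z))   # 最大化似然 = 最小化负对数似然
        loss.backward()
        opt.step()
    return z.detach()

# 玩具用法:把"似然"设为在 target 处取峰值的二次函数
target = torch.randn(16)
z_star = optimize_latent(lambda z: z,
                         lambda x: -((x - target) ** 2).sum(),
                         torch.zeros(16))
```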
半/弱/无/有监督|不确定性|主动学习(6篇)
【1】An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks
标题:一种用于异类物联网网络异常检测的高效无监督联邦学习方法
链接:https://arxiv.org/abs/2602.24209
摘要:联邦学习(FL)是物联网(IoT)等分布式环境的有效范例,其中来自具有不同功能的不同设备的数据保持本地化,同时有助于共享全局模型。通过消除传输原始数据的需要,FL本质上保留了隐私。然而,物联网数据的异构性,源于设备功能、数据格式和通信约束的差异,对维护全局模型性能和隐私提出了重大挑战。在基于物联网的异常检测的背景下,无监督FL提供了一种很有前景的方法来识别异常行为,而无需集中式数据聚合。然而,跨设备的特征异质性使模型训练和优化复杂化,阻碍了有效的实施。在这项研究中,我们提出了一个高效的无监督FL框架,通过利用两个不同物联网数据集的共享特征来增强异常检测:一个侧重于异常检测,另一个侧重于设备识别,同时保留各数据集特有的特征。为了提高透明度和可解释性,我们采用可解释AI技术(如SHAP)来识别影响本地模型决策的关键特征。在真实世界物联网数据集上进行的实验表明,该方法在异常检测准确性方面明显优于传统FL方法。这项工作强调了使用互补数据集的共享特征来优化无监督联邦学习并在去中心化物联网环境中实现卓越异常检测效果的潜力。
摘要:Federated learning (FL) is an effective paradigm for distributed environments such as the Internet of Things (IoT), where data from diverse devices with varying functionalities remains localized while contributing to a shared global model. By eliminating the need to transmit raw data, FL inherently preserves privacy. However, the heterogeneous nature of IoT data, stemming from differences in device capabilities, data formats, and communication constraints, poses significant challenges to maintaining both global model performance and privacy. In the context of IoT-based anomaly detection, unsupervised FL offers a promising means to identify abnormal behavior without centralized data aggregation. Nevertheless, feature heterogeneity across devices complicates model training and optimization, hindering effective implementation. In this study we propose an efficient unsupervised FL framework that enhances anomaly detection by leveraging shared features from two distinct IoT datasets: one focused on anomaly detection and the other on device identification, while preserving dataset-specific features. To improve transparency and interpretability, we employ explainable AI techniques, such as SHAP, to identify key features influencing local model decisions. Experiments conducted on real-world IoT datasets demonstrate that the proposed method significantly outperforms conventional FL approaches in anomaly detection accuracy. This work underscores the potential of using shared features from complementary datasets to optimize unsupervised federated learning and achieve superior anomaly detection results in decentralized IoT environments.
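摘要提到用SHAP识别影响本地模型决策的关键特征。下面是这一解释步骤的最小示意:数据与模型均为合成占位,仅演示用TreeExplainer按平均|SHAP|值对特征重要性排序的常见做法。

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))               # 占位的物联网流量特征
y = (X[:, 0] + X[:, 3] > 1).astype(float)   # 占位的异常分数

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
vals = shap.TreeExplainer(model).shap_values(X[:100])   # (100, 8)
importance = np.abs(vals).mean(axis=0)      # 平均|SHAP|作为特征重要性
print(np.argsort(importance)[::-1])         # 按重要性从高到低的特征索引
```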
【2】Unsupervised Baseline Clustering and Incremental Adaptation for IoT Device Traffic Profiling
标题:用于物联网设备流量分析的无监督基线聚类与增量自适应
链接:https://arxiv.org/abs/2602.24047
备注:6 pages, 2 figures, 4 tables
摘要:物联网设备的增长和异构性带来了安全挑战,其中静态识别模型可能会随着流量的发展而降级。本文提出了一个两阶段的基于流特征的管道,用于无监督的物联网设备流量分析和增量模型更新,并对Deakin物联网数据集中选定的长时间捕获进行了评估。对于基线分析,基于密度的聚类(DBSCAN)隔离了数据的大量离群值部分,并在测试的经典方法(NMI 0.78)中与地面实况设备标签产生最强的对齐,在聚类纯度上优于基于质心的聚类。对于增量自适应,我们评估面向流的聚类方法,发现BIRCH支持有效的更新(每次更新0.13秒),并形成相对连贯的集群,为一个新的设备(纯度0.87),但有限的捕获新的流量(份额0.72)和可衡量的权衡后,适应(0.71)已知设备的准确性。总的来说,结果突出了高纯度静态分析与渐进式集群灵活性之间的实际权衡,以适应不断发展的物联网环境。
摘要:The growth and heterogeneity of IoT devices create security challenges where static identification models can degrade as traffic evolves. This paper presents a two-stage, flow-feature-based pipeline for unsupervised IoT device traffic profiling and incremental model updating, evaluated on selected long-duration captures from the Deakin IoT dataset. For baseline profiling, density-based clustering (DBSCAN) isolates a substantial outlier portion of the data and produces the strongest alignment with ground-truth device labels among tested classical methods (NMI 0.78), outperforming centroid-based clustering on cluster purity. For incremental adaptation, we evaluate stream-oriented clustering approaches and find that BIRCH supports efficient updates (0.13 seconds per update) and forms comparatively coherent clusters for a held-out novel device (purity 0.87), but with limited capture of novel traffic (share 0.72) and a measurable trade-off in known-device accuracy after adaptation (0.71). Overall, the results highlight a practical trade-off between high-purity static profiling and the flexibility of incremental clustering for evolving IoT environments.
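摘要中的增量自适应阶段可借助scikit-learn中BIRCH的partial_fit实现小批量更新。以下为极简示意,流量特征为随机占位数据:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
birch = Birch(threshold=0.8, n_clusters=None)   # n_clusters=None: 保留CF子簇

for _ in range(10):                             # 模拟陆续到达的流量批次
    batch = rng.normal(size=(200, 6))           # 占位的流特征
    birch.partial_fit(batch)                    # 增量吸收新流量,无需全量重训

labels = birch.predict(rng.normal(size=(50, 6)))
print(len(np.unique(labels)), "个子簇被使用")
```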
【3】RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
标题:RewardUQ:不确定性感知奖励模型的统一框架
链接:https://arxiv.org/abs/2602.24040
摘要:奖励模型是将大型语言模型(LLM)与人类偏好对齐的核心。然而,大多数方法依赖于逐点奖励估计,忽略了有限的人类反馈所产生的奖励模型的认知不确定性。最近的研究表明,量化这种不确定性可以通过不确定性引导的主动学习降低人工注释的成本,并减轻LLM后训练中的奖励过度优化。然而,到目前为止,不确定性感知的奖励模型在没有进行彻底比较的情况下被采用,使得人们对它们知之甚少。这项工作引入了一个统一的框架,RewardUQ,系统地评估奖励模型的不确定性量化。我们比较常见的方法沿标准度量测量的准确性和校准,我们提出了一个新的排名策略,将这两个方面的简化比较。我们的实验结果表明,模型大小和初始化对性能的影响是最有意义的,大多数以前的工作都可以从替代设计选择中受益。为了促进新方法的开发和评估,并帮助下游应用程序的部署,我们将我们的开源框架作为Python包发布。我们的代码可在https://github.com/lasgroup/rewarduq上获得。
摘要:Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward models arising from limited human feedback. Recent work suggests that quantifying this uncertainty can reduce the costs of human annotation via uncertainty-guided active learning and mitigate reward overoptimization in LLM post-training. However, uncertainty-aware reward models have so far been adopted without thorough comparison, leaving them poorly understood. This work introduces a unified framework, RewardUQ, to systematically evaluate uncertainty quantification for reward models. We compare common methods along standard metrics measuring accuracy and calibration, and we propose a new ranking strategy incorporating both dimensions for a simplified comparison. Our experimental results suggest that model size and initialization have the most meaningful impact on performance, and most prior work could have benefited from alternative design choices. To foster the development and evaluation of new methods and aid the deployment in downstream applications, we release our open-source framework as a Python package. Our code is available at https://github.com/lasgroup/rewarduq.
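作为摘要所讨论的不确定性感知奖励模型的一个通用示意,下面给出基于深度集成的草图:多个奖励头在共享表示上独立打分,头间离散度作为认知不确定性的估计。注意这只是按论文思路写的通用示例,并非rewarduq包的实际API。

```python
import torch
import torch.nn as nn

class EnsembleRewardHead(nn.Module):
    """K个独立奖励头构成的集成(示意):均值为奖励估计,
    标准差作为认知不确定性的近似。"""
    def __init__(self, dim, k=5):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(k))

    def forward(self, h):                     # h: (B, dim) 冻结的LLM表示
        r = torch.cat([head(h) for head in self.heads], dim=-1)  # (B, K)
        return r.mean(-1), r.std(-1)          # 奖励估计, 不确定性

h = torch.randn(4, 128)
mean, std = EnsembleRewardHead(128)(h)
```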
【4】Active Value Querying to Minimize Additive Error in Subadditive Set Function Learning
标题:通过主动值查询来最小化次加性集函数学习中的加性误差
链接:https://arxiv.org/abs/2602.23529
摘要:次可加集函数在计算经济学(特别是组合拍卖)、组合优化以及可解释机器学习等人工智能应用中起着关键作用。然而,指定一个集函数通常需要为指数数量的子集赋值,这在实践中往往是资源密集型的任务,尤其当这些值来自外部来源(例如机器学习模型的重新训练)时。简单地省略某些值会引入模糊性,而当不完整的集函数还需要被进一步优化时,这种模糊性会变得更加显著。受次可加函数在乘性误差意义下使用确定性值查询不可逼近这一著名结果的启发,我们研究在加性误差意义下逼近未知次可加(或其子类)集函数的问题,即我们的目标是有效地缩小最小补全与最大补全之间的距离。我们的贡献有三个方面:(i)对具有缺失值的不同集函数类的最小与最大补全进行了彻底探索,并分析了它们之间的距离;(ii)开发了在已知先验的集函数类上最小化该距离的方法,通过以离线和在线方式披露额外子集的值来实现;(iii)对算法在实际场景中的性能进行了实证演示。
摘要:Subadditive set functions play a pivotal role in computational economics (especially in combinatorial auctions), combinatorial optimization or artificial intelligence applications such as interpretable machine learning. However, specifying a set function requires assigning values to an exponentially large number of subsets in general, a task that is often resource-intensive in practice, particularly when the values derive from external sources such as retraining of machine learning models. A simple omission of certain values introduces ambiguity that becomes even more significant when the incomplete set function has to be further optimized over. Motivated by the well-known result about inapproximability of subadditive functions using deterministic value queries with respect to a multiplicative error, we study a problem of approximating an unknown subadditive (or a subclass thereof) set function with respect to an additive error -- i.e., we aim to efficiently close the distance between minimal and maximal completions. Our contributions are threefold: (i) a thorough exploration of minimal and maximal completions of different classes of set functions with missing values and an analysis of their resulting distance; (ii) the development of methods to minimize this distance over classes of set functions with a known prior, achieved by disclosing values of additional subsets in both offline and online manner; and (iii) empirical demonstrations of the algorithms' performance in practical scenarios.
【5】Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning
标题:人类监督作为信息瓶颈:人类引导学习中误差下界的统一理论
链接:https://arxiv.org/abs/2602.23446
备注:Proceedings from IEEE CAI 2026, Conference on Artificial Intelligence, 8-10 May, Granada, Spain. 8 Pages, 3 Figures, 7 Tables
摘要:大型语言模型主要基于人类生成的数据和反馈进行训练,但它们会因注释噪声、主观偏好和自然语言有限的表达带宽而出现持续错误。我们认为,这些限制反映了监督渠道的结构特性,而非模型规模或优化问题。我们发展了一个统一的理论,表明当人类监督渠道对潜在评估目标而言信息不充分时,它就充当了一个信息削减信道,为任何受其支配的学习者诱导出严格为正的超额风险下界。我们形式化了这一"人类有界智能"极限,并表明在六个互补框架中(算子理论、PAC-Bayes、信息论、因果推断、范畴论,以及对基于人类反馈的强化学习的博弈论分析),非充分性都会产生严格为正的下界,且该下界源自相同的结构分解:注释噪声、偏好失真和语义压缩。该理论解释了为什么单靠扩展规模无法消除持续的人类对齐错误,并刻画了辅助性非人类信号(例如检索、程序执行、工具)通过恢复关于潜在目标的信息来增加有效监督容量并使该下界塌缩的条件。对真实偏好数据、合成已知目标任务和外部可验证基准的实验证实了理论预测的结构特征:仅依赖人类的监督表现出持久的误差下界,而信息足够充分的辅助信道能严格减少或消除超额误差。
摘要:Large language models are trained primarily on human-generated data and feedback, yet they exhibit persistent errors arising from annotation noise, subjective preferences, and the limited expressive bandwidth of natural language. We argue that these limitations reflect structural properties of the supervision channel rather than model scale or optimization. We develop a unified theory showing that whenever the human supervision channel is not sufficient for a latent evaluation target, it acts as an information-reducing channel that induces a strictly positive excess-risk floor for any learner dominated by it. We formalize this Human-Bounded Intelligence limit and show that across six complementary frameworks (operator theory, PAC-Bayes, information theory, causal inference, category theory, and game-theoretic analyses of reinforcement learning from human feedback), non-sufficiency yields strictly positive lower bounds arising from the same structural decomposition into annotation noise, preference distortion, and semantic compression. The theory explains why scaling alone cannot eliminate persistent human-aligned errors and characterizes conditions under which auxiliary non-human signals (e.g., retrieval, program execution, tools) increase effective supervision capacity and collapse the floor by restoring information about the latent target. Experiments on real preference data, synthetic known-target tasks, and externally verifiable benchmarks confirm the predicted structural signatures: human-only supervision exhibits a persistent floor, while sufficiently informative auxiliary channels strictly reduce or eliminate excess error.
【6】Active Learning for Planet Habitability Classification under Extreme Class Imbalance
标题:极端类别失衡下的行星宜居性分类主动学习
链接:https://arxiv.org/abs/2602.23666
备注:19 pages, 9 figures, 2 tables
摘要:系外行星目录的规模和异质性的增加使得系统的可居住性评估变得具有挑战性,特别是考虑到潜在宜居行星的极度稀缺及其标签的不断变化的性质。在这项研究中,我们探索使用基于池的主动学习,以提高现实的观测约束下的可居住性分类的效率。我们从可居住世界目录和NASA系外行星档案中构建了一个统一的数据集,并将可居住性评估制定为二元分类问题。建立基于梯度提升决策树的监督基线,并优化召回,以优先识别稀有的潜在宜居行星。然后将该模型嵌入到主动学习框架中,在该框架中,将基于不确定性的保证金采样与多个运行和标签预算中的随机查询进行比较。我们发现,主动学习大大减少了接近监督性能所需的标记实例的数量,表明标签效率有明显的提高。为了将这些结果与实际的天文学用例联系起来,我们将独立训练的主动学习模型的预测聚合到一个集合中,并使用由此产生的平均概率和不确定性来对最初标记为不可居住的行星进行排名。这个过程确定了一个强大的候选人进一步研究,说明如何主动学习可以支持保守的,不确定性意识的后续目标的优先级,而不是投机重新分类。我们的研究结果表明,主动学习提供了一个原则性的框架,指导可居住性研究的数据制度,其特征是标签不平衡,不完整的信息,和有限的观测资源。
摘要:The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels. In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem. A supervised baseline based on gradient-boosted decision trees is established and optimized for recall in order to prioritize the identification of rare potentially habitable planets. This model is then embedded within an active learning framework, where uncertainty-based margin sampling is compared against random querying across multiple runs and labeling budgets. We find that active learning substantially reduces the number of labeled instances required to approach supervised performance, demonstrating clear gains in label efficiency. To connect these results to a practical astronomical use case, we aggregate predictions from independently trained active-learning models into an ensemble and use the resulting mean probabilities and uncertainties to rank planets originally labeled as non-habitable. This procedure identifies a single robust candidate for further study, illustrating how active learning can support conservative, uncertainty-aware prioritization of follow-up targets rather than speculative reclassification. Our results indicate that active learning provides a principled framework for guiding habitability studies in data regimes characterized by label imbalance, incomplete information, and limited observational resources.
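摘要中的基于不确定性的边际采样(margin sampling)循环可如下示意:每轮在未标注池中选取最高两类预测概率之差最小的样本送标。数据为合成占位,真实场景中池对应系外行星目录。

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 10))          # 占位的行星特征池
y_pool = (X_pool[:, 0] > 1.0).astype(int)     # 占位标签,正类较稀少

pos = np.where(y_pool == 1)[0][:5].tolist()   # 初始标注集:保证两类都有
neg = np.where(y_pool == 0)[0][:45].tolist()
labeled = pos + neg

for _ in range(20):                           # 20轮查询
    clf = GradientBoostingClassifier().fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]          # 两类概率之差,越小越不确定
    margin[labeled] = np.inf                  # 已标注样本不再查询
    labeled.append(int(np.argmin(margin)))    # 选最不确定的样本送标
```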
迁移|Zero/Few/One-Shot|自适应(7篇)
【1】Adaptive Combinatorial Experimental Design: Pareto Optimality for Decision-Making and Inference
标题:自适应组合实验设计:决策和推理的帕累托最优性
链接:https://arxiv.org/abs/2602.24231
备注:30 pages, 3 figures, AISTATS 2026 accepted paper
摘要:本文首次研究了自适应组合实验设计,聚焦于组合多臂老虎机(CMAB)中遗憾最小化与统计功效之间的权衡。最小化遗憾需要反复利用高回报的臂,而对奖励差距进行准确推断则需要对次优动作进行充分探索。我们通过帕累托最优性的概念形式化这一权衡,并建立了CMAB中帕累托有效学习的等价条件。我们考虑两种不同信息结构下的情形,即全老虎机反馈和半老虎机反馈,并分别针对这两种情形提出了MixCombKL和MixCombUCB两种算法。我们提供的理论保证表明,这两种算法都是帕累托最优的,在遗憾和臂间差距的估计误差上均取得有限时间保证。我们的结果进一步表明,更丰富的反馈会显著收紧可达到的帕累托前沿,在我们提出的方法下,主要收益来自估计精度的提升。总之,这些发现为多目标决策中的自适应组合实验建立了一个有原则的框架。
摘要:In this paper, we provide the first investigation into adaptive combinatorial experimental design, focusing on the trade-off between regret minimization and statistical power in combinatorial multi-armed bandits (CMAB). While minimizing regret requires repeated exploitation of high-reward arms, accurate inference on reward gaps requires sufficient exploration of suboptimal actions. We formalize this trade-off through the concept of Pareto optimality and establish equivalent conditions for Pareto-efficient learning in CMAB. We consider two relevant cases under different information structures, i.e., full-bandit feedback and semi-bandit feedback, and propose two algorithms MixCombKL and MixCombUCB respectively for these two cases. We provide theoretical guarantees showing that both algorithms are Pareto optimal, achieving finite-time guarantees on both regret and estimation error of arm gaps. Our results further reveal that richer feedback significantly tightens the attainable Pareto frontier, with the primary gains arising from improved estimation accuracy under our proposed methods. Taken together, these findings establish a principled framework for adaptive combinatorial experimentation in multi-objective decision-making.
【2】Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning
标题:强化学习的自适应相关加权内在回报
链接:https://arxiv.org/abs/2602.24081
摘要:我们提出了ACWI(Adaptive Correlation Weighted Intrinsic),一个自适应内在奖励缩放框架,旨在动态平衡内在与外在奖励,以改进稀疏奖励强化学习中的探索。与依赖人工调节标量系数的传统方法不同(这类方法常常导致跨任务性能不稳定或次优),ACWI在线学习与状态相关的缩放系数。具体而言,ACWI引入了一个轻量级的Beta网络,通过基于编码器的架构直接从智能体状态预测内在奖励权重。缩放机制使用基于相关性的目标进行优化,该目标鼓励加权内在奖励与折扣未来外在回报之间保持一致。这一表述在保持计算效率和训练稳定性的同时,实现了任务自适应的探索激励。我们在MiniGrid的一组稀疏奖励环境中评估ACWI。实验结果表明,与固定内在奖励基线相比,ACWI始终提升了样本效率和学习稳定性,以极小的计算开销取得了更优的性能。
摘要:We propose ACWI (Adaptive Correlation Weighted Intrinsic), an adaptive intrinsic reward scaling framework designed to dynamically balance intrinsic and extrinsic rewards for improved exploration in sparse reward reinforcement learning. Unlike conventional approaches that rely on manually tuned scalar coefficients, which often result in unstable or suboptimal performance across tasks, ACWI learns a state dependent scaling coefficient online. Specifically, ACWI introduces a lightweight Beta Network that predicts the intrinsic reward weight directly from the agent state through an encoder based architecture. The scaling mechanism is optimized using a correlation based objective that encourages alignment between the weighted intrinsic rewards and discounted future extrinsic returns. This formulation enables task adaptive exploration incentives while preserving computational efficiency and training stability. We evaluate ACWI on a suite of sparse reward environments in MiniGrid. Experimental results demonstrate that ACWI consistently improves sample efficiency and learning stability compared to fixed intrinsic reward baselines, achieving superior performance with minimal computational overhead.
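按摘要描述,下面给出基于相关性目标的示意草图:Beta网络从状态预测内在奖励权重,损失为加权内在奖励与折扣外在回报之间的负皮尔逊相关。网络结构与超参数均为假设:

```python
import torch
import torch.nn as nn

def correlation_loss(weighted_intrinsic, returns):
    """负皮尔逊相关:鼓励加权内在奖励与折扣外在回报对齐(示意)。"""
    x = weighted_intrinsic - weighted_intrinsic.mean()
    y = returns - returns.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + 1e-8)
    return -corr

beta_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                         nn.Linear(32, 1), nn.Softplus())
states = torch.randn(64, 16)            # 一批智能体状态
r_int = torch.rand(64)                  # 内在奖励(如好奇心信号,占位)
G_ext = torch.randn(64)                 # 折扣外在回报(占位)

beta = beta_net(states).squeeze(-1)     # 与状态相关的缩放权重
loss = correlation_loss(beta * r_int, G_ext)
loss.backward()                          # 仅更新Beta网络
```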
【3】MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening
标题:MINT:用于早期阿尔茨海默病筛查的多模态成像到语音知识转移
链接:https://arxiv.org/abs/2602.23994
摘要:阿尔茨海默病是一种进行性神经退行性疾病,其中轻度认知障碍(MCI)标志着衰老和痴呆之间的关键转变。神经成像方式,如结构MRI,提供了这种转变的生物标志物;然而,其高成本和基础设施需求限制了其在人群规模上的部署。语音分析提供了一种非侵入性的替代方案,但仅语音分类器是独立于神经成像开发的,使得决策边界在生物学上没有根据,并限制了CN与MCI之间微妙区别的可靠性。我们提出了MINT(Multimodal Imaging-to-Speech Knowledge Transfer),这是一个三阶段的跨模态框架,可以在训练时将生物标志物结构从MRI转移到语音编码器中。一位接受过1,228名受试者培训的MRI教师定义了一个紧凑的神经成像嵌入空间,用于CN与MCI分类。残余投影头通过组合几何损失将语音表示与此冻结的成像流形对齐,使语音适应学习的生物标志物空间,同时保持成像编码器保真度。冻结的MRI分类器从未暴露于语音,在推理时应用于对齐的嵌入,并且不需要扫描仪。对ADNI-4的评估显示,对齐的语音实现了与仅语音基线相当的性能(AUC 0.720 vs 0.711),同时在推理时不需要成像,这表明MRI衍生的决策边界可以支持语音表示。多模式融合优于单独MRI(0.973 vs 0.958)。消融研究确定辍学正规化和自我监督的预训练作为关键的设计决策。据我们所知,这是第一次证明MRI到语音的知识转移用于早期阿尔茨海默氏症的筛查,建立了一个生物学基础的途径,用于人群水平的认知分类,而无需神经成像进行推断。
摘要:Alzheimer's disease is a progressive neurodegenerative disorder in which mild cognitive impairment (MCI) marks a critical transition between aging and dementia. Neuroimaging modalities, such as structural MRI, provide biomarkers of this transition; however, their high costs and infrastructure needs limit their deployment at a population scale. Speech analysis offers a non-invasive alternative, but speech-only classifiers are developed independently of neuroimaging, leaving decision boundaries biologically ungrounded and limiting reliability on the subtle CN-versus-MCI distinction. We propose MINT (Multimodal Imaging-to-Speech Knowledge Transfer), a three-stage cross-modal framework that transfers biomarker structure from MRI into a speech encoder at training time. An MRI teacher, trained on 1,228 subjects, defines a compact neuroimaging embedding space for CN-versus-MCI classification. A residual projection head aligns speech representations to this frozen imaging manifold via a combined geometric loss, adapting speech to the learned biomarker space while preserving imaging encoder fidelity. The frozen MRI classifier, which is never exposed to speech, is applied to aligned embeddings at inference and requires no scanner. Evaluation on ADNI-4 shows aligned speech achieves performance comparable to speech-only baselines (AUC 0.720 vs 0.711) while requiring no imaging at inference, demonstrating that MRI-derived decision boundaries can ground speech representations. Multimodal fusion improves over MRI alone (0.973 vs 0.958). Ablation studies identify dropout regularization and self-supervised pretraining as critical design decisions. To our knowledge, this is the first demonstration of MRI-to-speech knowledge transfer for early Alzheimer's screening, establishing a biologically grounded pathway for population-level cognitive triage without neuroimaging at inference.
【4】Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals
标题:经验引导的自适应级联智能体用于乳腺癌筛查和诊断,减少活检转诊
链接:https://arxiv.org/abs/2602.23899
摘要:我们提出了一个经验指导的级联多代理框架乳腺超声筛查和诊断,称为BUSD代理,旨在减少诊断升级和不必要的活检转诊。我们的框架模型筛选和诊断为两个阶段,选择性决策过程。一个轻量级的“筛查诊所”代理,仅限于分类模型作为工具,当恶性风险和不确定性估计为低时,从进一步的诊断升级中有选择地过滤出良性和正常病例。风险较高的病例被升级到“诊断诊所”,该机构整合了更丰富的感知和放射学描述工具,以就活检转诊作出二次决定。为了提高代理性能,病理学确认结果的过去记录以及图像嵌入,模型预测和历史代理动作作为结构化决策轨迹存储在记忆库中。对于每一个新的情况下,BUSD-Agent检索相似的过去的情况下,图像,模型响应和置信度的相似性,以条件代理的当前决策策略。这使得检索条件的上下文自适应,动态调整模型的信任和升级阈值从以前的经验,而无需参数更新。对10个乳腺超声数据集的评价表明,与没有轨迹调节的相同架构相比,所提出的经验指导工作流程将BUSD-Agent的诊断升级从84.95%降低到58.72%,将总体活检转诊从59.50%降低到37.08%,同时将平均筛查特异性提高68.48%,诊断特异性提高6.33%。
摘要:We propose an experience-guided cascaded multi-agent framework for Breast Ultrasound Screening and Diagnosis, called BUSD-Agent, that aims to reduce diagnostic escalation and unnecessary biopsy referrals. Our framework models screening and diagnosis as a two-stage, selective decision-making process. A lightweight `screening clinic' agent, restricted to classification models as tools, selectively filters out benign and normal cases from further diagnostic escalation when malignancy risk and uncertainty are estimated as low. Cases that have higher risks are escalated to the `diagnostic clinic' agent, which integrates richer perception and radiological description tools to make a secondary decision on biopsy referral. To improve agent performance, past records of pathology-confirmed outcomes along with image embeddings, model predictions, and historical agent actions are stored in a memory bank as structured decision trajectories. For each new case, BUSD-Agent retrieves similar past cases based on image, model response and confidence similarity to condition the agent's current decision policy. This enables retrieval-conditioned in-context adaptation that dynamically adjusts model trust and escalation thresholds from prior experiences without parameter updates. Evaluation across 10 breast ultrasound datasets shows that the proposed experience-guided workflow reduces diagnostic escalation in BUSD-Agent from 84.95% to 58.72% and overall biopsy referrals from 59.50% to 37.08%, compared to the same architecture without trajectory conditioning, while improving average screening specificity by 68.48% and diagnostic specificity by 6.33%.
【5】Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles
标题:自动驾驶汽车的带宽自适应云辅助360度3D感知
链接:https://arxiv.org/abs/2602.23871
摘要:自动驾驶面临的一个关键挑战在于在严格的延迟约束下保持对周围障碍物的实时态势感知。高处理需求加上有限的车载计算资源可能导致延迟问题,特别是在复杂的城市环境中。为此,我们提出利用车联网(V2X)通信将处理部分卸载到计算资源丰富的云端,从而降低整体延迟。我们的方法利用基于Transformer的模型将多相机传感器数据融合为全面的鸟瞰图(BEV)表示,实现准确的360度3D目标检测。计算根据本地处理的层数和特征的量化等级在车辆与云端之间动态划分。为了进一步降低网络负载,我们在传输之前对特征向量进行裁剪和压缩。在真实世界的实验评估中,与传统的纯车载方案相比,我们的混合策略将端到端延迟降低了72%。为了适应波动的网络条件,我们引入了一种动态优化算法,在满足实时延迟约束的同时,选择分割点和量化等级以最大化检测精度。基于真实带宽变化轨迹的评估表明,在相同延迟性能下,这种自适应方法的精度比静态参数化方法高出多达20%。
摘要:A key challenge for autonomous driving lies in maintaining real-time situational awareness regarding surrounding obstacles under strict latency constraints. The high processing requirements coupled with limited onboard computational resources can cause delay issues, particularly in complex urban settings. To address this, we propose leveraging Vehicle-to-Everything (V2X) communication to partially offload processing to the cloud, where compute resources are abundant, thus reducing overall latency. Our approach utilizes transformer-based models to fuse multi-camera sensor data into a comprehensive Bird's-Eye View (BEV) representation, enabling accurate 360-degree 3D object detection. The computation is dynamically split between the vehicle and the cloud based on the number of layers processed locally and the quantization level of the features. To further reduce network load, we apply feature vector clipping and compression prior to transmission. In a real-world experimental evaluation, our hybrid strategy achieved a 72% reduction in end-to-end latency compared to a traditional onboard solution. To adapt to fluctuating network conditions, we introduce a dynamic optimization algorithm that selects the split point and quantization level to maximize detection accuracy while satisfying real-time latency constraints. Trace-based evaluation under realistic bandwidth variability shows that this adaptive approach improves accuracy by up to 20% over static parameterization with the same latency performance.
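自适应选择分割点与量化等级的决策逻辑可以抽象为一个简单的约束优化:在满足时延预算的所有配置中取预测精度最高者。以下示意中的时延模型与数值均为说明性假设,并非论文测量值。

```python
def choose_config(bandwidth_mbps, budget_ms, configs):
    """在时延预算内挑选预测精度最高的(分割点, 量化)配置(示意)。"""
    feasible = []
    for c in configs:
        tx_ms = c["feature_bits"] / (bandwidth_mbps * 1e3)  # 特征传输时延
        total = c["edge_ms"] + tx_ms + c["cloud_ms"]        # 端到端时延
        if total <= budget_ms:
            feasible.append((c["accuracy"], c))
    return max(feasible, key=lambda t: t[0])[1] if feasible else None

configs = [  # 假设的候选配置:浅分割+高精度量化 vs 深分割+低比特量化
    {"split": 2, "quant": 8, "edge_ms": 12, "cloud_ms": 20,
     "feature_bits": 4e6, "accuracy": 0.61},
    {"split": 4, "quant": 4, "edge_ms": 25, "cloud_ms": 12,
     "feature_bits": 1e6, "accuracy": 0.58},
]
print(choose_config(bandwidth_mbps=50, budget_ms=100, configs=configs))
```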
【6】Cross-Representation Knowledge Transfer for Improved Sequential Recommendations
标题:用于改进顺序推荐的跨表示知识迁移
链接:https://arxiv.org/abs/2602.23471
摘要:Transformer架构能够捕捉用户交互历史中的顺序依赖关系,已成为顺序推荐系统中的主导方法。尽管取得了成功,但这些模型孤立地考虑了序列元素,隐含地考虑了它们之间的复杂关系。相比之下,图神经网络通过更高阶的交互来显式地对这些关系进行建模,但通常无法充分捕捉它们随时间的演变,从而限制了它们在预测下一次交互方面的用途。为了填补这一空白,我们提出了一个新的框架,该框架结合了Transformers和图神经网络,并调整了不同的表示来解决下一项预测任务。我们的解决方案同时编码交互图中的结构依赖关系,并跟踪它们的动态变化。在一些开放数据集上的实验结果表明,该框架在推荐质量方面始终优于纯序列和图方法,以及结合这两种类型信号的最新方法。
摘要:Transformer architectures, capable of capturing sequential dependencies in the history of user interactions, have become the dominant approach in sequential recommender systems. Despite their success, such models consider sequence elements in isolation, implicitly accounting for the complex relationships between them. Graph neural networks, in contrast, explicitly model these relationships through higher order interactions but are often unable to adequately capture their evolution over time, limiting their use for predicting the next interaction. To fill this gap, we present a new framework that combines transformers and graph neural networks and aligns different representations for solving the next-item prediction task. Our solution simultaneously encodes structural dependencies in the interaction graph and tracks their dynamic change. Experimental results on a number of open datasets demonstrate that the proposed framework consistently outperforms both pure sequential and graph approaches in terms of recommendation quality, as well as recent methods that combine both types of signals.
【7】Few-Shot Continual Learning for 3D Brain MRI with Frozen Foundation Models
标题:使用冻结基础模型进行3D脑部MRI的少样本持续学习
链接:https://arxiv.org/abs/2602.23533
摘要:在大规模3D医学成像数据上预训练的基础模型,在标注数据有限的持续学习设置下适应多个下游任务时面临挑战。我们通过将冻结的预训练骨干与任务专属的低秩自适应(LoRA)模块相结合,解决3D脑MRI的少样本持续学习问题。任务顺序到达——肿瘤分割(BraTS)和脑年龄估计(IXI)——且不重放以前任务的数据。每个任务都接收一个专用的LoRA适配器;只有适配器和任务专属的头被训练,而骨干保持冻结,从而通过设计消除灾难性遗忘(BWT=0)。在持续学习中,顺序全量微调遭受严重遗忘(T1 Dice在T2之后从0.80下降到0.16),而顺序线性探测取得了强T1结果(Dice 0.79),但在T2上失败(MAE 1.45)。我们的LoRA方法在两个任务上取得了最佳的均衡性能:T1 Dice 0.62±0.07,T2 MAE 0.16±0.05,且零遗忘。
摘要:Foundation models pretrained on large-scale 3D medical imaging data face challenges when adapted to multiple downstream tasks under continual learning with limited labeled data. We address few-shot continual learning for 3D brain MRI by combining a frozen pretrained backbone with task-specific Low-Rank Adaptation (LoRA) modules. Tasks arrive sequentially -- tumor segmentation (BraTS) and brain age estimation (IXI) -- with no replay of previous task data. Each task receives a dedicated LoRA adapter; only the adapter and task-specific head are trained while the backbone remains frozen, thereby eliminating catastrophic forgetting by design (BWT=0). In continual learning, sequential full fine-tuning suffers severe forgetting (T1 Dice drops from 0.80 to 0.16 after T2), while sequential linear probing achieves strong T1 (Dice 0.79) but fails on T2 (MAE 1.45). Our LoRA approach achieves the best balanced performance across both tasks: T1 Dice 0.62±0.07, T2 MAE 0.16±0.05, with zero forgetting.
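摘要中"冻结骨干+每任务LoRA适配器"的设计可用如下极简PyTorch示意理解:只有低秩矩阵A、B(和任务头)参与训练,骨干参数保持冻结,因此按构造不会遗忘早先任务。

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """冻结的线性层 + 任务专属低秩增量 A @ B(示意)。
    仅A、B可训练,早先任务的权重完全不被改动(BWT=0)。"""
    def __init__(self, base: nn.Linear, rank=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # 冻结骨干
        self.A = nn.Parameter(torch.zeros(base.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ (self.A @ self.B).T

backbone = nn.Linear(256, 256)                           # 占位的骨干层
adapters = {"brats_seg": LoRALinear(backbone),           # 每任务一个适配器
            "ixi_age": LoRALinear(backbone)}
y = adapters["ixi_age"](torch.randn(2, 256))             # 按任务路由
```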
强化学习(4篇)
【1】Multi-Objective Reinforcement Learning for Large-Scale Tote Allocation in Human-Robot Collaborative Fulfillment Centers
标题:人机协作履行中心中大规模料箱分配的多目标强化学习
链接:https://arxiv.org/abs/2602.24182
摘要:在基于集装箱的履行中心中优化整合流程需要权衡处理速度、资源使用和空间利用等竞争目标,同时遵守一系列现实操作限制。这一过程涉及通过人工和机器人工作站的组合在集装箱之间移动物品,以释放入境库存空间并提高集装箱利用率。我们将此问题表述为具有高维状态空间和动态系统行为的大规模多目标强化学习(MORL)任务。我们的方法建立在通过零和游戏中的最佳响应和无遗憾动态解决约束强化学习问题的最新理论进展的基础上,实现了有原则的极大极小策略学习。对现实仓库模拟的政策评估表明,我们的方法有效地权衡了目标,我们根据经验观察到,它学习了一个单一的政策,同时满足所有的约束,即使这在理论上是没有保证的。我们进一步引入了一个理论框架来处理错误消除的问题,时间平均的解决方案显示振荡行为。此方法返回一个拉格朗日值接近博弈的最小最大值的单变量。这些结果证明了MORL在解决大规模工业系统中复杂,高影响决策问题方面的前景。
摘要:Optimizing the consolidation process in container-based fulfillment centers requires trading off competing objectives such as processing speed, resource usage, and space utilization while adhering to a range of real-world operational constraints. This process involves moving items between containers via a combination of human and robotic workstations to free up space for inbound inventory and increase container utilization. We formulate this problem as a large-scale Multi-Objective Reinforcement Learning (MORL) task with high-dimensional state spaces and dynamic system behavior. Our method builds on recent theoretical advances in solving constrained RL problems via best-response and no-regret dynamics in zero-sum games, enabling principled minimax policy learning. Policy evaluation on realistic warehouse simulations shows that our approach effectively trades off objectives, and we empirically observe that it learns a single policy that simultaneously satisfies all constraints, even if this is not theoretically guaranteed. We further introduce a theoretical framework to handle the problem of error cancellation, where time-averaged solutions display oscillatory behavior. This method returns a single iterate whose Lagrangian value is close to the minimax value of the game. These results demonstrate the promise of MORL in solving complex, high-impact decision-making problems in large-scale industrial systems.
【2】Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning
标题:通过扩散Schrödinger桥弥合跨域强化学习中的动力学差距
链接:https://arxiv.org/abs/2602.23737
摘要:跨域强化学习(RL)的目的是在源域和目标域之间的动态变化下学习可转移的策略。一个关键的挑战在于缺乏目标域环境交互和奖励监督,这阻碍了直接的策略学习。为了应对这一挑战,我们提出了跨域强化学习(BDGxRL)的桥接动力学缺口,这是一种新的框架,它利用扩散薛定谔桥(DSB)将源转换与离线演示中编码的目标域动力学对齐。此外,我们引入了一个奖励调制机制,估计奖励的基础上的状态转换,适用于DSB对齐的样本,以确保奖励和目标域动态之间的一致性。BDGxRL完全在源域内执行面向目标的策略学习,而无需访问目标环境或其奖励。在MuJoCo跨域基准测试上的实验表明,BDGxRL的性能优于最先进的基线,并在过渡动态变化下表现出较强的适应性。
摘要:Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. To address this challenge, we propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL), a novel framework that leverages Diffusion Schrödinger Bridge (DSB) to align source transitions with target-domain dynamics encoded in offline demonstrations. Moreover, we introduce a reward modulation mechanism that estimates rewards based on state transitions, applying to DSB-aligned samples to ensure consistency between rewards and target-domain dynamics. BDGxRL performs target-oriented policy learning entirely within the source domain, without access to the target environment or its rewards. Experiments on MuJoCo cross-domain benchmarks demonstrate that BDGxRL outperforms state-of-the-art baselines and shows strong adaptability under transition dynamics shifts.
【3】Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
标题:利用强化学习构建、合并、求解和适应最小-最大多重旅行推销员问题
链接:https://arxiv.org/abs/2602.23579
摘要:多个旅行商问题(mTSP)将旅行商问题扩展到m个旅行,开始和结束于一个共同的仓库,并联合访问所有客户只一次。在最小-最大变式中,目标是最大限度地减少最长行程,反映工作量平衡。我们提出了一种混合方法,构造,合并,求解和适应强化学习(RL-CMSA),对称单仓库最小最大mTSP。该方法迭代地构造不同的解决方案,使用概率聚类学习成对q值的指导下,合并到一个紧凑的池路由,解决了一个限制集覆盖MILP,并通过路由间删除,移位和交换移动细化解决方案。通过在高质量的解决方案中加强城市对同现来更新q值,而通过老化和修剪来调整池。这种精确优化和以重复性为指导的构造的组合平衡了探索和开发。随机和TSPLIB实例的计算结果表明,RL-CMSA在可比的时间限制下始终找到(接近)最佳解,并优于最先进的混合遗传算法,特别是当实例大小和销售人员数量增加时。
摘要:The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
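摘要中"在高质量解中强化城市对共现"的q值更新可用一段纯Python草图示意(学习率与奖励值均为假设):

```python
from collections import defaultdict

def update_q(q, best_tours, lr=0.1, reward=1.0):
    """q 将城市对 frozenset({i, j}) 映射为学习到的分数(示意)。
    高质量巡回中相邻出现的城市对被软性强化。"""
    for tour in best_tours:
        for a, b in zip(tour, tour[1:] + tour[:1]):     # 闭环相邻城市对
            key = frozenset((a, b))
            q[key] += lr * (reward - q[key])            # 向reward软更新
    return q

q = defaultdict(float)
update_q(q, best_tours=[[0, 3, 1, 2], [0, 1, 3, 2]])    # 两条高质量巡回
print(sorted(q.items(), key=lambda kv: -kv[1])[:3])     # 共现最多的城市对
```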
【4】Component Centric Placement Using Deep Reinforcement Learning
标题:基于深度强化学习的组件中心布局
链接:https://arxiv.org/abs/2602.23540
摘要:印刷电路板(PCB)上元器件的自动布局是布局设计的关键阶段。虽然强化学习(RL)已成功应用于片上系统IP块布局和复杂封装中的小芯片布置,但PCB元件布局由于以下几个因素而呈现出独特的挑战:元件尺寸的变化、单面和双面板、线长约束、板级约束以及非重叠布局要求。在这项工作中,我们采用以元件为中心的布局,使用RL自动化PCB元件布局:首先,主元件固定在中心,而无源元件放置在主元件引脚附近。主元件周围的自由空间被离散化,在仍然覆盖所有可行位置的同时大大缩减了搜索空间;其次,我们利用"每个无源元件的位置必须靠近其对应电压源"这一先验知识。这使我们能够设计出避免在不可行或不相关搜索空间上浪费探索的奖励函数。基于以元件为中心的布局,我们实现了包括深度Q网络、Actor-Critic算法和模拟退火在内的不同方法。在九块以上不同复杂度的真实PCB上的评估表明,我们最好的方法在线长和可行性方面接近人类水平的布局。
摘要:Automated placement of components on printed circuit boards (PCBs) is a critical stage in placement layout design. While reinforcement learning (RL) has been successfully applied to system-on-chip IP block placement and chiplet arrangement in complex packages, PCB component placement presents unique challenges due to several factors: variation in component sizes, single- and double-sided boards, wirelength constraints, board constraints, and non-overlapping placement requirements. In this work, we adopt a component-centric layout for automating PCB component placement using RL: first, the main component is fixed at the center, while passive components are placed in proximity to the pins of the main component. Free space around the main component is discretized, drastically reducing the search space while still covering all feasible placement; second, we leverage prior knowledge that each passive's position has to be near to its corresponding voltage source. This allows us to design the reward function which avoids wasted exploration of infeasible or irrelevant search space. Using the component centric layout, we implemented different methods including Deep Q-Network, Actor-Critic algorithm and Simulated Annealing. Evaluation on over nine real-world PCBs of varying complexity shows that our best proposed method approaches near human-like placements in terms of wirelength and feasibility.
元学习(1篇)
【1】EvoX: Meta-Evolution for Automated Discovery
标题:EvoX:自动发现的元进化
链接:https://arxiv.org/abs/2602.23413
摘要:AlphaEvolve等最近的工作表明,将LLM驱动的优化与进化搜索相结合可以有效地改进跨领域的程序、提示和算法。在这种范式中,以前评估的解决方案被重用,以指导模型向新的候选解决方案。至关重要的是,这种进化过程的有效性取决于搜索策略:如何选择和改变先前的解决方案以生成新的候选方案。然而,大多数现有方法依赖于具有预定义旋钮的固定搜索策略(例如,探索-探索比率),其在整个执行过程中保持静态。虽然这些方法在某些情况下是有效的,但它们往往不能适应不同的任务,甚至不能适应同一任务,因为搜索空间会随着时间的推移而变化。我们介绍EvoX,一种自适应进化方法,优化自己的进化过程。EvoX共同发展候选解决方案和用于生成它们的搜索策略,不断更新先前解决方案的选择和变化。这使得系统能够在优化过程中在不同的搜索策略之间动态切换。在近200个现实世界的优化任务中,EvoX在大多数任务上都优于现有的AI驱动的进化方法,包括AlphaEvolve,OpenEvolve,GEPA和ShinkaEvolve。
摘要:Recent work such as AlphaEvolve has shown that combining LLM-driven optimization with evolutionary search can effectively improve programs, prompts, and algorithms across domains. In this paradigm, previously evaluated solutions are reused to guide the model toward new candidate solutions. Crucially, the effectiveness of this evolution process depends on the search strategy: how prior solutions are selected and varied to generate new candidates. However, most existing methods rely on fixed search strategies with predefined knobs (e.g., explore-exploit ratios) that remain static throughout execution. While effective in some settings, these approaches often fail to adapt across tasks, or even within the same task as the search space changes over time. We introduce EvoX, an adaptive evolution method that optimizes its own evolution process. EvoX jointly evolves candidate solutions and the search strategies used to generate them, continuously updating how prior solutions are selected and varied based on progress. This enables the system to dynamically shift between different search strategies during the optimization process. Across nearly 200 real-world optimization tasks, EvoX outperforms existing AI-driven evolutionary methods including AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on the majority of tasks.
符号|符号学习(2篇)
【1】Neuro-Symbolic AI for Analytical Solutions of Differential Equations
标题:用于微分方程解析解的神经符号人工智能
链接:https://arxiv.org/abs/2502.01476
备注:Updates the method and added extra results
摘要:微分方程的解析解提供了精确的、可解释的见解,但很少可用,因为发现它们需要专家的直觉或在组合空间中的穷举搜索。我们介绍了SIGS,一个神经符号框架,自动化这一过程。SIGS使用形式语法来生成语法上有效的构建块,将这些表达式嵌入到连续空间中,然后搜索该空间以通过最小化基于物理的残差来组装、评分和细化候选封闭形式的解决方案。这种设计将符号推理与数值优化相结合;语法通过构造将候选解块约束为适当的,而潜在搜索使探索易于处理且无数据。SIGS是第一个神经符号方法(i)解析求解非线性偏微分方程的耦合系统,(ii)发现语法错误指定下的解决方案,以及(iii)为缺乏已知封闭形式解决方案的偏微分方程产生精确的符号近似。总的来说,SIGS在标准基准测试中的准确性和效率比现有的符号方法提高了几个数量级。
摘要:Analytical solutions to differential equations offer exact, interpretable insight but are rarely available because discovering them requires expert intuition or exhaustive search in combinatorial spaces. We introduce SIGS, a neuro-symbolic framework that automates this process. SIGS uses a formal grammar to generate only syntactically valid building blocks, embeds these expressions into a continuous space, and then searches this space to assemble, score, and refine candidate closed-form solutions by minimizing a physics-based residual. This design unifies symbolic reasoning with numerical optimization; the grammar constrains candidate solution blocks to be proper by construction, while the latent search makes exploration tractable and data-free. SIGS is the first neuro-symbolic method to (i) analytically solve coupled systems of nonlinear PDEs, (ii) discover solutions under grammar misspecification, and (iii) produce accurate symbolic approximations for PDEs lacking known closed-form solutions. Overall, SIGS achieves orders-of-magnitude improvements in accuracy and efficiency over existing symbolic methods on standard benchmarks.
【2】VaSST: Variational Inference for Symbolic Regression using Soft Symbolic Trees
标题:VaSST:使用软符号树进行符号回归的变分推理
链接:https://arxiv.org/abs/2602.23561
备注:38 pages, 5 figures, 35 tables, Submitted
摘要:符号回归最近在人工智能驱动的科学发现中获得了关注,旨在从揭示潜在物理定律的数据中恢复显式封闭形式的表达式。尽管最近的进展,现有的方法仍然占主导地位的启发式搜索算法或数据密集型的方法,假设低噪声制度,缺乏原则性的不确定性量化。完全概率公式是稀缺的,现有的马尔可夫链蒙特卡洛贝叶斯方法往往难以有效地探索高度多模态组合空间的符号表达式。我们介绍VaSST,一个可扩展的概率框架,符号回归变分推理的基础上。VaSST采用连续松弛的符号表达式树,称为软符号树,其中离散算子和特征分配被允许组件上的软分布所取代。这种放松变换的组合搜索在一个天文学大的符号空间成一个有效的基于梯度的优化问题,同时保持连贯的概率解释。学习的软表示在符号结构上诱导后验分布,从而实现原则性的不确定性量化。在模拟实验和SRBench中的Feynman符号回归数据库中,VaSST在结构恢复和预测准确性方面都取得了优于最先进的符号回归方法的性能。
摘要:Symbolic regression has recently gained traction in AI-driven scientific discovery, aiming to recover explicit closed-form expressions from data that reveal underlying physical laws. Despite recent advances, existing methods remain dominated by heuristic search algorithms or data-intensive approaches that assume low-noise regimes and lack principled uncertainty quantification. Fully probabilistic formulations are scarce, and existing Markov chain Monte Carlo-based Bayesian methods often struggle to efficiently explore the highly multimodal combinatorial space of symbolic expressions. We introduce VaSST, a scalable probabilistic framework for symbolic regression based on variational inference. VaSST employs a continuous relaxation of symbolic expression trees, termed soft symbolic trees, where discrete operator and feature assignments are replaced by soft distributions over allowable components. This relaxation transforms the combinatorial search over an astronomically large symbolic space into an efficient gradient-based optimization problem while preserving a coherent probabilistic interpretation. The learned soft representations induce posterior distributions over symbolic structures, enabling principled uncertainty quantification. Across simulated experiments and Feynman Symbolic Regression Database within SRBench, VaSST achieves superior performance in both structural recovery and predictive accuracy compared to state-of-the-art symbolic regression methods.
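软符号树的核心松弛——用可学习的softmax分布替代离散算子选择——可以用单个节点的极简示意说明;这里的算子集合与结构均为假设,并非VaSST的实际实现:

```python
import torch
import torch.nn as nn

class SoftNode(nn.Module):
    """软符号树的一个内部节点(示意):对候选算子的softmax分布
    替代硬性离散选择,使整棵树端到端可微。"""
    OPS = [torch.add, torch.mul, lambda a, b: a - b]

    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(len(self.OPS)))

    def forward(self, left, right):
        w = torch.softmax(self.logits, dim=0)           # 算子上的软分布
        return sum(wi * op(left, right) for wi, op in zip(w, self.OPS))

node = SoftNode()
x = torch.linspace(0, 1, 5)
y = node(x, 2 * x)          # x+2x、x*2x、x-2x 的软混合
```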
医学相关(5篇)
【1】Histopathology Image Normalization via Latent Manifold Compaction
标题:通过潜在流形压缩实现组织病理学图像归一化
链接:https://arxiv.org/abs/2602.24251
备注:11 pages
摘要:组织病理学染色方案、扫描仪和采集管道中的技术变化所产生的批次效应对计算病理学构成了持续的挑战,阻碍了跨批次的推广,并限制了跨临床站点模型的可靠部署。在这项工作中,我们引入了潜在流形压缩(LMC),这是一种无监督表示学习框架,通过显式压缩染色诱导的潜在流形,从单个源数据集学习批次不变嵌入来执行图像协调。这使得LMC能够泛化到训练过程中看不到的目标域数据。在三个具有挑战性的公共和内部基准上进行评估,LMC大大减少了多个数据集之间的批处理引起的分离,并在下游跨批分类和检测任务中始终优于最先进的归一化方法,从而实现卓越的泛化能力。
摘要:Batch effects arising from technical variations in histopathology staining protocols, scanners, and acquisition pipelines pose a persistent challenge for computational pathology, hindering cross-batch generalization and limiting reliable deployment of models across clinical sites. In this work, we introduce Latent Manifold Compaction (LMC), an unsupervised representation learning framework that performs image harmonization by learning batch-invariant embeddings from a single source dataset through explicit compaction of stain-induced latent manifolds. This allows LMC to generalize to target domain data unseen during training. Evaluated on three challenging public and in-house benchmarks, LMC substantially reduces batch-induced separations across multiple datasets and consistently outperforms state-of-the-art normalization methods in downstream cross-batch classification and detection tasks, enabling superior generalization.
【2】A multimodal slice discovery framework for systematic failure detection and explanation in medical image classification
标题:用于医学图像分类中系统故障检测和解释的多模式切片发现框架
链接:https://arxiv.org/abs/2602.24183
摘要:尽管基于机器学习的医学图像分类器取得了进展,但这些系统的安全性和可靠性仍然是实际环境中的主要问题。现有的审计方法主要依赖于单峰特征或基于元数据的子组分析,其可解释性有限,并且通常无法捕获隐藏的系统故障。为了解决这些限制,我们引入了第一个自动审计框架,该框架将切片发现方法扩展到专门针对医疗应用的多模态表示。使用MIMIC-CXR-JPG数据集在常见故障场景下进行了全面的实验,证明了该框架在故障发现和解释生成方面的强大能力。我们的研究结果还表明,多模态信息通常允许更全面和有效的审计分类器,而单峰变量超出图像的输入表现出强大的潜力,在资源有限的情况下。
摘要:Despite advances in machine learning-based medical image classifiers, the safety and reliability of these systems remain major concerns in practical settings. Existing auditing approaches mainly rely on unimodal features or metadata-based subgroup analyses, which are limited in interpretability and often fail to capture hidden systematic failures. To address these limitations, we introduce the first automated auditing framework that extends slice discovery methods to multimodal representations specifically for medical applications. Comprehensive experiments were conducted under common failure scenarios using the MIMIC-CXR-JPG dataset, demonstrating the framework's strong capability in both failure discovery and explanation generation. Our results also show that multimodal information generally allows more comprehensive and effective auditing of classifiers, while unimodal variants beyond image-only inputs exhibit strong potential in scenarios where resources are constrained.
【3】When Does Multimodal Learning Help in Healthcare? A Benchmark on EHR and Chest X-Ray Fusion
标题:多模式学习何时对医疗保健有帮助?EHR和胸部X射线融合的基准
链接:https://arxiv.org/abs/2602.23614
摘要:机器学习有望推进临床决策支持,但尚不清楚多模态学习何时真正在实践中有所帮助,特别是在模态缺失和公平性约束下。在这项工作中,我们对MIMIC-IV和MIMIC-CXR的标准化队列进行了电子健康记录(EHR)和胸部X射线(CXR)之间的多模态融合的系统基准测试,旨在回答四个基本问题:多模态融合何时改善临床预测,不同的融合策略如何比较,现有方法对缺失模态的鲁棒性如何,以及多模态模型是否实现算法公平性。我们的研究揭示了几个关键的见解。多模态融合在模态完整时提高了性能,收益集中在需要EHR和CXR互补信息的疾病中。虽然跨模态学习机制捕获临床上有意义的依赖性,而不是简单的连接,丰富的时间结构的EHR引入了强大的模态不平衡,架构复杂性本身无法克服。在现实的缺失,多式联运的好处迅速退化,除非模型明确设计,以处理不完整的投入。此外,多模态融合并不能从本质上提高公平性,子组差异主要是由于人口统计学群体之间的不平等敏感性。为了支持可重复和可扩展的评估,我们进一步发布了一个灵活的基准测试工具包,可以实现新模型和数据集的即插即用集成。总之,这项工作提供了关于多模态学习何时有用,何时失败以及原因的可操作指导,为开发有效可靠的临床可部署多模态系统奠定了基础。这个开源工具包可以在https://github.com/jakeykj/CareBench上找到。
摘要:Machine learning holds promise for advancing clinical decision support, yet it remains unclear when multimodal learning truly helps in practice, particularly under modality missingness and fairness constraints. In this work, we conduct a systematic benchmark of multimodal fusion between Electronic Health Records (EHR) and chest X-rays (CXR) on standardized cohorts from MIMIC-IV and MIMIC-CXR, aiming to answer four fundamental questions: when multimodal fusion improves clinical prediction, how different fusion strategies compare, how robust existing methods are to missing modalities, and whether multimodal models achieve algorithmic fairness. Our study reveals several key insights. Multimodal fusion improves performance when modalities are complete, with gains concentrating in diseases that require complementary information from both EHR and CXR. While cross-modal learning mechanisms capture clinically meaningful dependencies beyond simple concatenation, the rich temporal structure of EHR introduces strong modality imbalance that architectural complexity alone cannot overcome. Under realistic missingness, multimodal benefits rapidly degrade unless models are explicitly designed to handle incomplete inputs. Moreover, multimodal fusion does not inherently improve fairness, with subgroup disparities mainly arising from unequal sensitivity across demographic groups. To support reproducible and extensible evaluation, we further release a flexible benchmarking toolkit that enables plug-and-play integration of new models and datasets. Together, this work provides actionable guidance on when multimodal learning helps, when it fails, and why, laying the foundation for developing clinically deployable multimodal systems that are both effective and reliable. The open-source toolkit can be found at https://github.com/jakeykj/CareBench.
【4】Sample Size Calculations for Developing Clinical Prediction Models: Overview and pmsims R package
标题:开发临床预测模型的样本量计算:概述和pmsims R包
链接:https://arxiv.org/abs/2602.23507
备注:26 pages, 4 figures, 1 table, preprint
摘要:背景资料:临床预测模型越来越多地用于为医疗决策提供信息,但确定其开发的最小样本量仍然是一个关键且尚未解决的挑战。样本量不足可能导致过度拟合、泛化能力差和预测偏倚。现有的方法,如启发式规则,封闭式公式和基于模拟的方法,在灵活性和准确性方面各不相同,特别是对于复杂的数据结构和机器学习模型。研究方法:我们回顾了目前的预测建模中的样本量估计方法,并介绍了一个概念框架,区分基于均值和基于保证的标准。在此基础上,我们提出了一种新的基于模拟的方法,该方法集成了学习曲线,高斯过程优化和保证原则,以确定以高概率实现目标性能的样本量。这种方法是在pmsims中实现的,pmsims是一个开源的、与模型无关的R包。结果如下:通过案例研究,我们证明了样本量估计在方法,性能指标和建模策略之间存在很大差异。与现有的工具相比,pmsims提供了灵活、高效和可解释的解决方案,可以适应不同的模型和用户定义的指标,同时明确说明模型性能的变化。结论:我们的框架和软件通过将灵活性与计算效率相结合,推进了临床预测建模的样本量方法。未来的工作应该将这些方法扩展到分层和多模式数据,纳入公平性和稳定性指标,并解决缺失数据和复杂依赖结构等挑战。
摘要:Background: Clinical prediction models are increasingly used to inform healthcare decisions, but determining the minimum sample size for their development remains a critical and unresolved challenge. Inadequate sample sizes can lead to overfitting, poor generalisability, and biased predictions. Existing approaches, such as heuristic rules, closed-form formulas, and simulation-based methods, vary in flexibility and accuracy, particularly for complex data structures and machine learning models. Methods: We review current methodologies for sample size estimation in prediction modelling and introduce a conceptual framework that distinguishes between mean-based and assurance-based criteria. Building on this, we propose a novel simulation-based approach that integrates learning curves, Gaussian Process optimisation, and assurance principles to identify sample sizes that achieve target performance with high probability. This approach is implemented in pmsims, an open-source, model-agnostic R package. Results: Through case studies, we demonstrate that sample size estimates vary substantially across methods, performance metrics, and modelling strategies. Compared to existing tools, pmsims provides flexible, efficient, and interpretable solutions that accommodate diverse models and user-defined metrics while explicitly accounting for variability in model performance. Conclusions: Our framework and software advance sample size methodology for clinical prediction modelling by combining flexibility with computational efficiency. Future work should extend these methods to hierarchical and multimodal data, incorporate fairness and stability metrics, and address challenges such as missing data and complex dependency structures.
【5】Complex Networks and the Drug Repositioning Problem
标题:复杂网络与药物再定位问题
链接:https://arxiv.org/abs/2602.23396
摘要:在这篇硕士论文中,研究了多级药物-蛋白质网络的图形特性,以及网络的形状如何在多年来为发现提供信息,主要识别爬行发现和少量跳跃发现。最后,网络结构用于通知网络扩散推荐系统,并优先考虑现有药物,以针对引起被忽视的热带病的生物体中的蛋白质进行再利用。
摘要:In this Master's thesis, the graph properties of a multi-level drug-protein network are studied, as well as how the network's shape has informed discoveries over the years, identifying primarily crawling discoveries and a smaller number of hopping discoveries. Finally, the network structure is used to inform a network diffusion recommendation system and to prioritize existing drugs for repurposing against proteins in organisms that cause Neglected Tropical Diseases.
推荐(1篇)
【1】U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation
标题:U-CAN:用于生成式推荐中高效遗忘的效用感知对比衰减
链接:https://arxiv.org/abs/2602.23400
摘要:生成式推荐(GenRec)通常利用大型语言模型(LLM)将个性化重新定义为一个推理驱动的序列生成任务。然而,对用户日志的微调无意中将敏感属性编码到模型参数中,引起了严重的隐私问题。现有的机器非学习(MU)技术由于多义性困境而难以驾驭这种紧张局势,其中神经元使用一般推理模式来处理敏感数据,导致传统梯度或修剪方法下的灾难性效用损失。为了解决这个问题,我们提出了实用感知对比衰减(U-CAN),一个精确的unlearning框架,在低秩适配器上运行。U-CAN通过对比激活来量化风险,并专注于对遗忘集高度敏感但对保留集受到抑制的非对称反应的神经元。为了保障性能,我们引入了一种效用感知的校准机制,该机制将权重大小与保留集激活规范相结合,为对保留性能有重大贡献的维度分配更高的效用分数。与二进制修剪不同,二进制修剪通常会分割网络结构,U-CAN开发了具有可微分衰减函数的自适应软衰减,以选择性地降低LoRA适配器上的高风险参数,抑制敏感的检索路径并保持推理电路的拓扑连接性。在两个公共数据集上的七个指标的实验表明,U-CAN实现了强大的隐私遗忘,效用保留和计算效率。
摘要:Generative Recommendation (GenRec) typically leverages Large Language Models (LLMs) to redefine personalization as an instruction-driven sequence generation task. However, fine-tuning on user logs inadvertently encodes sensitive attributes into model parameters, raising critical privacy concerns. Existing Machine Unlearning (MU) techniques struggle to navigate this tension due to the Polysemy Dilemma, where neurons superimpose sensitive data with general reasoning patterns, leading to catastrophic utility loss under traditional gradient or pruning methods. To address this, we propose Utility-aware Contrastive AttenuatioN (U-CAN), a precision unlearning framework that operates on low-rank adapters. U-CAN quantifies risk by contrasting activations and focuses on neurons with asymmetric responses that are highly sensitive to the forgetting set but suppressed on the retention set. To safeguard performance, we introduce a utility-aware calibration mechanism that combines weight magnitudes with retention-set activation norms, assigning higher utility scores to dimensions that contribute strongly to retention performance. Unlike binary pruning, which often fragments network structure, U-CAN develop adaptive soft attenuation with a differentiable decay function to selectively down-scale high-risk parameters on LoRA adapters, suppressing sensitive retrieval pathways and preserving the topological connectivity of reasoning circuits. Experiments on two public datasets across seven metrics demonstrate that U-CAN achieves strong privacy forgetting, utility retention, and computational efficiency.
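摘要中"用可微衰减函数对高风险LoRA参数做软缩放(而非二值剪枝)"的思路可用如下示意理解;风险分数的来源(对比激活)此处以随机数占位,衰减函数形式为假设:

```python
import torch

def soft_attenuate(lora_weight, risk, tau=5.0):
    """可微的软衰减(示意):风险分数 risk∈[0,1] 越高,对应LoRA参数
    被缩小的幅度越大;sigmoid形衰减保持连续可微,避免二值剪枝割裂结构。"""
    scale = 1.0 - torch.sigmoid(tau * (risk - 0.5))   # 高风险 → 缩放趋近0
    return lora_weight * scale

W = torch.randn(8, 4)                 # 某LoRA适配器的低秩因子(占位)
risk = torch.rand(8, 1)               # 每行参数的风险分数(占位)
W_att = soft_attenuate(W, risk)       # 高风险行被选择性压低
```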
超分辨率|去噪|去模糊|去雾(1篇)
【1】Selective Denoising Diffusion Model for Time Series Anomaly Detection
标题:时间序列异常检测的选择性去噪扩散模型
链接:https://arxiv.org/abs/2602.23662
摘要:几十年来,时间序列异常检测(TSAD)一直是一个重要的研究领域,基于重建的方法,主要基于生成模型,越来越受欢迎,并取得了成功。扩散模型最近引起了人们的注意,由于其先进的生成能力。现有的基于扩散的TSAD方法依赖于条件策略,该策略借助于条件器从白噪声中重建输入实例。然而,这在准确重建正常部分方面提出了挑战,导致次优检测性能。作为回应,我们提出了一种新型的基于扩散的方法,名为AnomalyFilter,它充当选择性过滤器,仅对实例中的异常部分进行降噪,同时保留正常部分。为了构建这样的滤波器,我们在训练阶段屏蔽高斯噪声,并在不向实例添加噪声的情况下进行去噪过程。这两个简单的组件的协同作用,大大提高了天真的扩散模型的性能。在5个数据集上的实验表明,AnomalyFilter在正常部分上实现了非常低的重建误差,为异常检测的有效性提供了经验支持。AnomalyFilter代表了一种开创性的方法,专注于专门为TSAD定制的扩散模型的噪声设计。
摘要:Time series anomaly detection (TSAD) has been an important area of research for decades, with reconstruction-based methods, mostly based on generative models, gaining popularity and demonstrating success. Diffusion models have recently attracted attention due to their advanced generative capabilities. Existing diffusion-based methods for TSAD rely on a conditional strategy, which reconstructs input instances from white noise with the aid of the conditioner. However, this poses challenges in accurately reconstructing the normal parts, resulting in suboptimal detection performance. In response, we propose a novel diffusion-based method, named AnomalyFilter, which acts as a selective filter that only denoises anomaly parts in the instance while retaining normal parts. To build such a filter, we mask Gaussian noise during the training phase and conduct the denoising process without adding noise to the instances. The synergy of the two simple components greatly enhances the performance of naive diffusion models. Extensive experiments on five datasets demonstrate that AnomalyFilter achieves notably low reconstruction error on normal parts, providing empirical support for its effectiveness in anomaly detection. AnomalyFilter represents a pioneering approach that focuses on the noise design of diffusion models specifically tailored for TSAD.
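摘要中的关键训练组件——仅在被掩码的(异常)位置注入高斯噪声并学习重建——可用如下极简示意说明;模型与掩码均为合成占位,并非论文实现:

```python
import torch

def masked_denoising_step(denoiser, x, anomaly_mask, sigma=0.5):
    """选择性滤波器的单步训练示意:只在 anomaly_mask 处加噪,
    模型学习恢复这些片段,正常片段原样通过。"""
    noise = torch.randn_like(x) * sigma
    x_noisy = torch.where(anomaly_mask, x + noise, x)   # 掩码式加噪
    x_hat = denoiser(x_noisy)
    return ((x_hat - x) ** 2).mean()                    # 重建损失

denoiser = torch.nn.Sequential(torch.nn.Linear(64, 64))  # 占位去噪模型
x = torch.randn(8, 64)                                   # 时间序列窗口
mask = torch.rand(8, 64) < 0.2                           # 合成的异常位置
loss = masked_denoising_step(denoiser, x, mask)
loss.backward()
```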
联邦学习|隐私保护|加密(2篇)
【1】FedNSAM:Consistency of Local and Global Flatness for Federated Learning
标题:FedNSAM:联邦学习中局部与全局平坦性的一致性
链接:https://arxiv.org/abs/2602.23827
摘要:在联邦学习(FL)中,多步局部更新和数据异构性通常会导致全局最小值更尖锐,从而降低全局模型的性能。流行的FL算法将锐度感知最小化(SAM)集成到本地训练中以解决这个问题。然而,在高数据异质性环境中,局部训练中的平坦性并不意味着全局模型的平坦性。因此,最小化客户端数据上的局部损失表面的锐度并不能使FL中SAM的有效性提高全局模型的泛化能力。我们定义了\textbf{flatness distance}来解释这种现象。通过对FL中SAM的重新思考和对平坦距离的理论分析,提出了一种新的加速SAM的算法,该算法通过在局部更新中引入全局Nesterov动量来协调全局平坦性和局部平坦性的一致性. \textbf{FedNSAM}使用全局Nesterov动量作为客户端全局扰动和外推的局部估计方向。理论上,我们证明了一个更严格的收敛界比FedSAM Nesterov外推。从经验上讲,我们对CNN和Transformer模型进行了全面的实验,以验证\textbf{FedNSAM}的卓越性能和效率。该代码可在https://github.com/junkangLiu0/FedNSAM上获得。
摘要:In federated learning (FL), multi-step local updates and data heterogeneity usually lead to sharper global minima, which degrades the performance of the global model. Popular FL algorithms integrate sharpness-aware minimization (SAM) into local training to address this issue. However, in the high data heterogeneity setting, the flatness in local training does not imply the flatness of the global model. Therefore, minimizing the sharpness of the local loss surfaces on the client data does not enable the effectiveness of SAM in FL to improve the generalization ability of the global model. We define the \textbf{flatness distance} to explain this phenomenon. By rethinking the SAM in FL and theoretically analyzing the \textbf{flatness distance}, we propose a novel \textbf{FedNSAM} algorithm that accelerates the SAM algorithm by introducing global Nesterov momentum into the local update to harmonize the consistency of global and local flatness. \textbf{FedNSAM} uses the global Nesterov momentum as the direction of local estimation of client global perturbations and extrapolation. Theoretically, we prove a tighter convergence bound than FedSAM by Nesterov extrapolation. Empirically, we conduct comprehensive experiments on CNN and Transformer models to verify the superior performance and efficiency of \textbf{FedNSAM}. The code is available at https://github.com/junkangLiu0/FedNSAM.
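为直观展示FedNSAM“在局部更新中注入全局Nesterov动量并叠加SAM扰动”的思路，下面给出一个极简的PyTorch示意。这并非论文的官方实现:global_momentum假设为服务器下发、与各参数同形状的全局动量张量列表，rho、lr、gamma等超参数均为示意性假设。

import torch

def fednsam_local_step(model, loss_fn, batch, global_momentum, rho=0.05, lr=0.01, gamma=0.9):
    x, y = batch
    params = list(model.parameters())
    # 1) 沿全局动量方向做Nesterov式前瞻外推
    with torch.no_grad():
        for p, m in zip(params, global_momentum):
            p.add_(m, alpha=-gamma * lr)
    # 2) SAM上升步:在外推点沿归一化梯度方向扰动半径rho
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    eps = [rho * g / norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    # 3) 在扰动点求梯度,撤销扰动后执行下降步
    loss_p = loss_fn(model(x), y)
    grads_p = torch.autograd.grad(loss_p, params)
    with torch.no_grad():
        for p, e, g in zip(params, eps, grads_p):
            p.sub_(e)
            p.add_(g, alpha=-lr)
    return float(loss_p)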
【2】FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
标题:FedDAG:通过全局数据与梯度集成实现面向异构环境的聚类联邦学习
链接:https://arxiv.org/abs/2602.23504
备注:This paper has been accepted in ICLR 2026
摘要:联邦学习(FL)使一组客户端能够在不共享各自数据的情况下协作训练模型，但当客户端数据异构时，其性能会下降。聚类联邦学习(Clustered FL)通过对相似客户端进行分组来解决这个问题。然而，现有的聚类FL方法仅仅依赖于数据相似性或梯度相似性，这导致对客户端相似性的评估不完整。以前的聚类FL方法还将知识和表示共享限制在同一集群内的客户端之间，这使得集群模型无法从跨集群的多样化客户端群体中受益。为了解决这些限制，我们提出了聚类联邦学习框架FedDAG，它采用一种加权的、按类别的相似性度量，集成了数据和梯度信息，在聚类过程中提供更全面的相似性衡量。此外，FedDAG为集群模型采用双编码器架构，包括一个在本集群客户端数据上训练的主编码器和一个利用来自互补集群的梯度加以细化的辅助编码器。这实现了跨集群的特征迁移，同时保留了集群特有的专门化。在多样化基准和数据异构设置上的实验表明，FedDAG在准确率上始终优于最先进的聚类FL基线。
摘要:Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. Clustered FL tackles this by grouping similar clients. However, existing clustered FL approaches rely solely on either data similarity or gradient similarity, which results in an incomplete assessment of client similarities. Prior clustered FL approaches also restrict knowledge and representation sharing to clients within the same cluster. This prevents cluster models from benefiting from the diverse client population across clusters. To address these limitations, we introduce FedDAG, a clustered FL framework that employs a weighted, class-wise similarity metric integrating both data and gradient information, providing a more holistic measure of similarity during clustering. In addition, FedDAG adopts a dual-encoder architecture for cluster models, comprising a primary encoder trained on its own clients' data and a secondary encoder refined using gradients from complementary clusters. This enables cross-cluster feature transfer while preserving cluster-specific specialization. Experiments on diverse benchmarks and data heterogeneity settings show that FedDAG consistently outperforms state-of-the-art clustered FL baselines in accuracy.
推理|分析|理解|解释(4篇)
【1】Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis
标题:时间序列基础模型作为交通预测的强有力基线:大规模基准分析
链接:https://arxiv.org/abs/2602.24238
备注:6 pages
摘要:准确预测交通动态对于城市交通和基础设施规划至关重要。虽然最近的工作已经通过深度学习模型实现了强大的性能，但这些方法通常需要针对特定数据集的训练、架构设计和超参数调整。本文通过在涵盖高速公路交通量与流量、城市交通速度、共享单车需求以及电动汽车充电站数据的10个真实世界数据集上，对最先进模型Chronos-2的zero-shot性能进行基准测试，评估通用时间序列基础模型能否胜任交通预测任务。在一致的评估协议下，我们发现，即使没有任何特定于任务的微调，Chronos-2也能在大多数数据集上提供最先进或具有竞争力的准确性，经常优于经典统计基线和专门的深度学习架构，在较长预测范围上尤为明显。除了点预测之外，我们还使用预测区间覆盖率和锐度来评估其原生概率输出，表明Chronos-2无需针对特定数据集训练也能提供有用的不确定性量化。总的来说，本研究支持将时间序列基础模型作为交通预测研究的关键基线。
摘要:Accurate forecasting of transportation dynamics is essential for urban mobility and infrastructure planning. Although recent work has achieved strong performance with deep learning models, these methods typically require dataset-specific training, architecture design and hyper-parameter tuning. This paper evaluates whether general-purpose time-series foundation models can serve as forecasters for transportation tasks by benchmarking the zero-shot performance of the state-of-the-art model, Chronos-2, across ten real-world datasets covering highway traffic volume and flow, urban traffic speed, bike-sharing demand, and electric vehicle charging station data. Under a consistent evaluation protocol, we find that, even without any task-specific fine-tuning, Chronos-2 delivers state-of-the-art or competitive accuracy across most datasets, frequently outperforming classical statistical baselines and specialized deep learning architectures, particularly at longer horizons. Beyond point forecasting, we evaluate its native probabilistic outputs using prediction-interval coverage and sharpness, demonstrating that Chronos-2 also provides useful uncertainty quantification without dataset-specific training. In general, this study supports the adoption of time-series foundation models as a key baseline for transportation forecasting research.
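摘要中用于评估原生概率输出的两个指标(预测区间覆盖率与锐度)可以用几行NumPy直接计算。下面是一个示意性实现，与Chronos-2本身无关，输入假设为任意模型给出的区间上下分位数:

import numpy as np

def interval_coverage_and_sharpness(y_true, q_low, q_high):
    # y_true: 实际观测值; q_low/q_high: 预测区间的下/上分位数
    y_true, q_low, q_high = map(np.asarray, (y_true, q_low, q_high))
    coverage = np.mean((y_true >= q_low) & (y_true <= q_high))  # 区间覆盖率
    sharpness = np.mean(q_high - q_low)                          # 平均区间宽度(越小越锐)
    return coverage, sharpness

# 用法示例(假设为80%预测区间)
cov, sharp = interval_coverage_and_sharpness(
    y_true=[10.2, 11.5, 9.8], q_low=[9.0, 10.0, 9.0], q_high=[11.0, 12.0, 10.5])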
【2】SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
标题:SenCache:通过敏感性感知缓存加速扩散模型推理
链接:https://arxiv.org/abs/2602.24208
摘要:扩散模型实现了最先进的视频生成质量，但由于需要大量顺序去噪步骤，其推理仍然昂贵。这激发了越来越多关于加速扩散推理的研究。在免训练加速方法中，缓存通过跨时间步重用先前计算的模型输出来减少计算。现有的缓存方法依赖启发式标准来选择缓存/重用的时间步，并需要大量调优。我们用一个有原则的敏感性感知缓存框架来解决这一局限。具体来说，我们通过分析模型输出对去噪输入(即含噪潜变量和时间步)中扰动的敏感性来形式化缓存误差，并表明这种敏感性是缓存误差的关键预测因子。基于这一分析，我们提出了敏感性感知缓存(SenCache)，一种按样本自适应选择缓存时间步的动态缓存策略。我们的框架为自适应缓存提供了理论基础，解释了为什么先前的经验启发式可以部分有效，并将其扩展为一种动态的、样本特定的方法。在Wan 2.1、CogVideoX和LTX-Video上的实验表明，在相似的计算预算下，SenCache比现有缓存方法获得了更好的视觉质量。
摘要:Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps. This has motivated a growing line of research on accelerating diffusion inference. Among training-free acceleration methods, caching reduces computation by reusing previously computed model outputs across timesteps. Existing caching methods rely on heuristic criteria to choose cache/reuse timesteps and require extensive tuning. We address this limitation with a principled sensitivity-aware caching framework. Specifically, we formalize the caching error through an analysis of the model output sensitivity to perturbations in the denoising inputs, i.e., the noisy latent and the timestep, and show that this sensitivity is a key predictor of caching error. Based on this analysis, we propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. Our framework provides a theoretical basis for adaptive caching, explains why prior empirical heuristics can be partially effective, and extends them to a dynamic, sample-specific approach. Experiments on Wan 2.1, CogVideoX, and LTX-Video show that SenCache achieves better visual quality than existing caching methods under similar computational budgets.
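下面用一段与具体模型无关的Python伪实现，示意“敏感度低于阈值则复用缓存输出”这一核心策略。其中sens_fn与阈值tau均为示意性假设，论文中敏感度的具体估计方式以原文为准:

def run_with_sensitivity_cache(model_fn, latents, timesteps, sens_fn, tau=0.1):
    # model_fn(x, t): 昂贵的去噪网络前向; sens_fn(t, x): 对输入扰动的敏感度估计
    cached = None
    outputs = []
    for x, t in zip(latents, timesteps):
        if cached is not None and sens_fn(t, x) < tau:
            out = cached            # 低敏感度: 复用缓存, 跳过昂贵的前向计算
        else:
            out = model_fn(x, t)    # 高敏感度: 重新计算并更新缓存
            cached = out
        outputs.append(out)
    return outputs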
【3】Agentic AI-RAN: Enabling Intent-Driven, Explainable and Self-Evolving Open RAN Intelligence
标题:代理式AI-RAN:实现意图驱动、可解释和自我进化的开放RAN智能
链接:https://arxiv.org/abs/2602.24115
备注:9 pages, 4 figures
摘要:开放式RAN(O-RAN)在非RT RIC、近RT RIC和分布式单元之间暴露了丰富的控制和遥测接口，但也使得以安全和可审计的方式操作多租户、多目标RAN变得更加困难。与此同时，具有明确规划、工具使用、记忆和自我管理的代理人工智能系统提供了一种构建长期控制循环的自然方式。本文综述了如何将这种代理控制器引入O-RAN:我们回顾了O-RAN架构，将代理控制器与传统的ML/RL xApp进行了对比，并围绕三个集群组织任务景观:网络切片生命周期、无线电资源管理(RRM)闭环，以及横切的安全性、隐私和合规性。然后，我们介绍了一小组代理原语(计划-行动-观察-反思、作为工具使用的技能、记忆与证据，以及自我管理门)，并在多小区O-RAN仿真中展示，与传统基线以及去除单个原语的消融设置相比，它们如何提升切片生命周期和RRM性能。安全性、隐私性和合规性作为架构约束和面向标准一致部署的开放性挑战进行了讨论。该框架在三个经典网络切片上实现了平均8.83%的资源使用减少。
摘要:Open RAN (O-RAN) exposes rich control and telemetry interfaces across the Non-RT RIC, Near-RT RIC, and distributed units, but also makes it harder to operate multi-tenant, multi-objective RANs in a safe and auditable manner. In parallel, agentic AI systems with explicit planning, tool use, memory, and self-management offer a natural way to structure long-lived control loops. This article surveys how such agentic controllers can be brought into O-RAN: we review the O-RAN architecture, contrast agentic controllers with conventional ML/RL xApps, and organise the task landscape around three clusters: network slice life-cycle, radio resource management (RRM) closed loops, and cross-cutting security, privacy, and compliance. We then introduce a small set of agentic primitives (Plan-Act-Observe-Reflect, skills as tool use, memory and evidence, and self-management gates) and show, in a multi-cell O-RAN simulation, how they improve slice life-cycle and RRM performance compared to conventional baselines and ablations that remove individual primitives. Security, privacy, and compliance are discussed as architectural constraints and open challenges for standards-aligned deployments. This framework achieves an average 8.83\% reduction in resource usage across three classic network slices.
【4】ReasonX: Declarative Reasoning on Explanations
标题:ReasonX:关于解释的声明性推理
链接:https://arxiv.org/abs/2602.23810
摘要:解释不透明的机器学习(ML)模型已成为一个日益重要的挑战。然而，目前的可解释人工智能(XAI)方法存在若干缺陷，包括抽象不足、用户交互性有限以及符号知识集成不充分。我们提出了ReasonX，一个基于线性约束理论上的算子封闭代数中表达式(或查询)的解释工具。ReasonX为决策树提供声明式和交互式的解释，这些决策树既可以是被分析的ML模型本身，也可以作为任何黑盒预测器的全局或局部代理模型。用户可以将背景知识或常识表示为线性约束。这允许在多个抽象层级上进行推理，从完全指定的示例到欠指定或部分约束的示例。ReasonX利用混合整数线性规划(MILP)对事实实例和对比实例的特征进行推理。我们在此介绍ReasonX的架构，它由更贴近用户的Python层和实现查询代数元解释器的约束逻辑编程(CLP)层组成。ReasonX的能力通过定性示例进行演示，并通过定量实验与其他XAI工具进行比较。
摘要:Explaining opaque Machine Learning (ML) models has become an increasingly important challenge. However, current eXplanation in AI (XAI) methods suffer several shortcomings, including insufficient abstraction, limited user interactivity, and inadequate integration of symbolic knowledge. We propose ReasonX, an explanation tool based on expressions (or, queries) in a closed algebra of operators over theories of linear constraints. ReasonX provides declarative and interactive explanations for decision trees, which may represent the ML models under analysis or serve as global or local surrogate models for any black-box predictor. Users can express background or common sense knowledge as linear constraints. This allows for reasoning at multiple levels of abstraction, ranging from fully specified examples to under-specified or partially constrained ones. ReasonX leverages Mixed-Integer Linear Programming (MILP) to reason over the features of factual and contrastive instances. We present here the architecture of ReasonX, which consists of a Python layer, closer to the user, and a Constraint Logic Programming (CLP) layer, which implements a meta-interpreter of the query algebra. The capabilities of ReasonX are demonstrated through qualitative examples, and compared to other XAI tools through quantitative experiments.
检测相关(3篇)
【1】MI$^2$DAS: A Multi-Layer Intrusion Detection Framework with Incremental Learning for Securing Industrial IoT Networks
标题:MI$^2$DAS:一个具有增量学习的多层入侵检测框架,用于保护工业物联网网络
链接:https://arxiv.org/abs/2602.23846
备注:Accepted for publication in the Proceedings of the 2026 International Conference on Information Systems Security and Privacy (ICISSP)
摘要:工业物联网(IIoT)系统的快速扩张加剧了安全挑战,因为异构设备和动态流量模式增加了对复杂和以前看不到的网络攻击的暴露。传统的入侵检测系统往往在这样的环境中挣扎,因为它们依赖于大量的标记数据和有限的能力来检测新的威胁。为了解决这些挑战,我们提出了MI 2$DAS,一个多层入侵检测框架,集成了基于异常的分层流量池,开集识别,以区分已知和未知的攻击和增量学习,以适应新的攻击类型,最小的标签。在Edge-IIoTset数据集上进行的实验证明了所有层的强大性能。在第一层中,GMM实现了优异的正常攻击区分(准确度= 0.953,TPR = 1.000)。在开集识别中,GMM对已知攻击的召回率为0.813,而LOF对未知攻击的召回率为0.882。对于已知攻击的细粒度分类,随机森林实现了0.941的宏F1。最后,增量学习模块在合并新的攻击类时保持稳健的性能,实现了0.8995的宏F1。这些结果表明,MI$^2$DAS是一种有效、可扩展和自适应的框架,可增强IIoT安全性,以应对不断变化的威胁。
摘要:The rapid expansion of Industrial IoT (IIoT) systems has amplified security challenges, as heterogeneous devices and dynamic traffic patterns increase exposure to sophisticated and previously unseen cyberattacks. Traditional intrusion detection systems often struggle in such environments due to their reliance on extensive labeled data and limited ability to detect new threats. To address these challenges, we propose MI$^2$DAS, a multi-layer intrusion detection framework that integrates anomaly-based hierarchical traffic pooling, open-set recognition to distinguish between known and unknown attacks and incremental learning for adapting to novel attack types with minimal labeling. Experiments conducted on the Edge-IIoTset dataset demonstrate strong performance across all layers. In the first layer, GMM achieves superior normal-attack discrimination (accuracy = 0.953, TPR = 1.000). In open-set recognition, GMM attains a recall of 0.813 for known attacks, while LOF achieves 0.882 recall for unknown attacks. For fine-grained classification of known attacks, Random Forest achieves a macro-F1 of 0.941. Finally, the incremental learning module maintains robust performance when incorporating novel attack classes, achieving a macro-F1 of 0.8995. These results showcase MI$^2$DAS as an effective, scalable and adaptive framework for enhancing IIoT security against evolving threats.
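摘要中第一层的GMM正常/攻击判别可以用scikit-learn简要示意:只在正常流量上拟合GMM，对数似然低于阈值即判为攻击。以下特征维度、分量数与阈值分位数均为示意性假设，并非论文的实验配置:

import numpy as np
from sklearn.mixture import GaussianMixture

X_normal = np.random.randn(1000, 8)        # 占位数据, 实际应为正常流量特征
gmm = GaussianMixture(n_components=4, random_state=0).fit(X_normal)

# 取正常样本对数似然的1%分位作为判别阈值(假设值)
threshold = np.percentile(gmm.score_samples(X_normal), 1)
X_test = np.random.randn(10, 8)
is_attack = gmm.score_samples(X_test) < threshold   # 低似然 -> 判为攻击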
【2】Modelling and Simulation of Neuromorphic Datasets for Anomaly Detection in Computer Vision
标题:用于计算机视觉异常检测的神经形态数据集建模与仿真
链接:https://arxiv.org/abs/2602.23514
备注:draft paper
摘要:动态视觉传感器(DVS)可用性的限制给神经形态计算机视觉应用的研究者带来了根本性挑战。作为回应，研究界创建了若干数据集，但它们通常只包含有限数量的样本或场景。为了解决缺乏全面的神经形态视觉数据集模拟器的问题，我们介绍了ANTShapes(Anomalous Neuromorphic Tool for Shapes)，一个新颖的数据集模拟框架。ANTShapes构建于Unity引擎之中，可模拟抽象的、可配置的3D场景，场景中的对象展示随机生成的行为，这些行为描述运动和旋转等属性。对象行为的采样以及对异常行为对象的标注，是遵循中心极限定理原理的统计过程。通过调整软件中少量参数，即可从ANTShapes创建并导出包含任意数量样本的数据集，以及随附的标签和帧数据。ANTShapes通过支持模拟定制数据集来解决基于事件的计算机视觉研究者面临的数据可用性限制，适用于对象识别与定位以及异常检测等目的。
摘要:Limitations on the availability of Dynamic Vision Sensors (DVS) present a fundamental challenge to researchers of neuromorphic computer vision applications. In response, datasets have been created by the research community, but often contain a limited number of samples or scenarios. To address the lack of a comprehensive simulator of neuromorphic vision datasets, we introduce the Anomalous Neuromorphic Tool for Shapes (ANTShapes), a novel dataset simulation framework. Built in the Unity engine, ANTShapes simulates abstract, configurable 3D scenes populated by objects displaying randomly-generated behaviours describing attributes such as motion and rotation. The sampling of object behaviours, and the labelling of anomalously-acting objects, is a statistical process following central limit theorem principles. Datasets containing an arbitrary number of samples can be created and exported from ANTShapes, along with accompanying label and frame data, through the adjustment of a limited number of parameters within the software. ANTShapes addresses the limitations of data availability to researchers of event-based computer vision by allowing for the simulation of bespoke datasets to suit purposes including object recognition and localisation alongside anomaly detection.
【3】SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection
标题:SALIENT:用于可控长尾CT检测的频率感知配对扩散
链接:https://arxiv.org/abs/2602.23447
备注:5 figures
摘要:全身CT中罕见病变的检测从根本上受到极端类别不平衡和低目标体积比的限制，尽管AUROC很高，但仍会导致精度崩溃。使用扩散模型的合成增强提供了希望，但像素空间扩散在计算上是昂贵的，并且现有的掩码条件方法缺乏可控的属性级调节和用于可问责训练的成对监督。我们介绍了SALIENT，一个掩码条件的小波域扩散框架，在长尾情形下合成成对的病变-掩码体数据用于可控CT增强。SALIENT不是在像素空间中去噪，而是在离散小波系数上执行结构化扩散，明确地将低频亮度与高频结构细节分离。可学习的频率感知目标将目标和背景属性(结构、对比度、边缘保真度)解耦，从而实现可解释且稳定的优化。一个3D VAE生成多样的体积病变掩码，并且一个半监督教师生成成对的切片级伪标签用于下游掩码引导的检测。SALIENT提高了生成真实性，这反映在更高的MS-SSIM(0.63至0.83)和更低的FID(118.4至46.5)上。在一个单独的下游评估中，SALIENT增强训练提高了长尾检测性能，在低流行率和低目标体积比下产生不成比例的AUPRC增益。随着标记种子规模的减小，最佳合成比例从2倍变为4倍，表明在低标签条件下存在依赖种子规模的增强机制。SALIENT证明了频率感知扩散能够在长尾CT检测中实现可控、计算高效的精度挽救。
摘要:Detection of rare lesions in whole-body CT is fundamentally limited by extreme class imbalance and low target-to-volume ratios, producing precision collapse despite high AUROC. Synthetic augmentation with diffusion models offers promise, yet pixel-space diffusion is computationally expensive, and existing mask-conditioned approaches lack controllable attribute-level regulation and paired supervision for accountable training. We introduce SALIENT, a mask-conditioned wavelet-domain diffusion framework that synthesizes paired lesion-masking volumes for controllable CT augmentation under long-tail regimes. Instead of denoising in pixel space, SALIENT performs structured diffusion over discrete wavelet coefficients, explicitly separating low-frequency brightness from high-frequency structural detail. Learnable frequency-aware objectives disentangle target and background attributes (structure, contrast, edge fidelity), enabling interpretable and stable optimization. A 3D VAE generates diverse volumetric lesion masks, and a semi-supervised teacher produces paired slice-level pseudo-labels for downstream mask-guided detection. SALIENT improves generative realism, as reflected by higher MS-SSIM (0.63 to 0.83) and lower FID (118.4 to 46.5). In a separate downstream evaluation, SALIENT-augmented training improves long-tail detection performance, yielding disproportionate AUPRC gains across low prevalences and target-to-volume ratios. Optimal synthetic ratios shift from 2x to 4x as labeled seed size decreases, indicating a seed-dependent augmentation regime under low-label conditions. SALIENT demonstrates that frequency-aware diffusion enables controllable, computationally efficient precision rescue in long-tail CT detection.
分类|识别(5篇)
【1】Comparing Classical and Quantum Variational Classifiers on the XOR Problem
标题:在异或问题上比较经典和量子变分分类器
链接:https://arxiv.org/abs/2602.24220
备注:32 pages, 17 figures. Code and experiment scripts available at https://github.com/mseilkhan/XOR-research-Quantum-ML-vs-Classic
摘要:量子机器学习将叠加和纠缠等原理应用于数据处理和优化。变分量子模型在高维希尔伯特空间中对量子比特进行操作，为模型表达能力提供了一种替代途径。我们在XOR问题上比较了经典模型和变分量子分类器。在具有不同高斯噪声和样本量的合成XOR数据集上，我们使用准确率和二元交叉熵评估了逻辑回归、单隐藏层多层感知器，以及电路深度为1和2的双量子比特变分量子分类器。性能主要由模型表达能力决定。逻辑回归和深度为1的量子电路无法可靠地表示XOR，而多层感知器和深度为2的量子电路在代表性条件下实现了完美的测试精度。跨噪声水平、数据集规模和随机种子的鲁棒性分析证实，电路深度对该任务上的量子性能具有决定性作用。尽管精度相当，多层感知器实现了更低的二元交叉熵和大幅更短的训练时间。硬件执行保留了全局XOR结构，但在决策函数中引入了结构化偏差。总体而言，更深的变分量子分类器在低维XOR基准上可以在准确率上与经典神经网络匹敌，但在所研究的设置中没有观察到鲁棒性或效率方面的明显经验优势。
摘要:Quantum machine learning applies principles such as superposition and entanglement to data processing and optimization. Variational quantum models operate on qubits in high-dimensional Hilbert spaces and provide an alternative approach to model expressivity. We compare classical models and a variational quantum classifier on the XOR problem. Logistic regression, a one-hidden-layer multilayer perceptron, and a two-qubit variational quantum classifier with circuit depths 1 and 2 are evaluated on synthetic XOR datasets with varying Gaussian noise and sample sizes using accuracy and binary cross-entropy. Performance is determined primarily by model expressivity. Logistic regression and the depth-1 quantum circuit fail to represent XOR reliably, whereas the multilayer perceptron and the depth-2 quantum circuit achieve perfect test accuracy under representative conditions. Robustness analyses across noise levels, dataset sizes, and random seeds confirm that circuit depth is decisive for quantum performance on this task. Despite matching accuracy, the multilayer perceptron achieves lower binary cross-entropy and substantially shorter training time. Hardware execution preserves the global XOR structure but introduces structured deviations in the decision function. Overall, deeper variational quantum classifiers can match classical neural networks in accuracy on low-dimensional XOR benchmarks, but no clear empirical advantage in robustness or efficiency is observed in the examined settings.
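摘要中“线性模型无法表示XOR而MLP可以”这一结论，可以用scikit-learn在带高斯噪声的XOR数据上快速复现。以下噪声强度sigma、样本量与网络结构均为示意性设置，并非论文的实验配置:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# 生成带高斯噪声的XOR数据集
rng = np.random.default_rng(0)
sigma, n = 0.2, 800
centers = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
idx = rng.integers(0, 4, n)
X = centers[idx] + sigma * rng.standard_normal((n, 2))
y = np.logical_xor(centers[idx][:, 0], centers[idx][:, 1]).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
print("LR :", LogisticRegression().fit(Xtr, ytr).score(Xte, yte))   # 线性模型无法表示XOR
print("MLP:", MLPClassifier((8,), max_iter=2000, random_state=0).fit(Xtr, ytr).score(Xte, yte))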
【2】What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs
标题:阅读内容即分类内容:强调文本和类文本输入的归因
链接:https://arxiv.org/abs/2602.24149
备注:17 pages, 8 figures
摘要:目前,对于离散的令牌输入(如文本),还没有容易理解的可解释的人工智能(AI)方法。大多数可解释的人工智能技术都不能很好地扩展到令牌序列,在令牌序列中,局部和全局特征都很重要,因为最先进的模型,如Transformers,往往专注于全局连接。因此,现有的可解释的AI算法由于(i)识别不同的重要性令牌,或(ii)为大量令牌分配低重要性值而失败。这种基于标记的分类器的可解释AI方法推广了基于掩码的图像可解释AI算法。它从一个解释器神经网络开始,该网络经过训练,可以创建掩码来隐藏与分类无关的信息。然后,取掩模和分类器的嵌入层的连续值的Hadamard乘积并通过分类器,改变嵌入向量的大小,但保持方向不变。解释器被训练用于核苷酸序列的分类学分类器,并且结果表明,掩蔽片段与分类的相关性低于未掩蔽片段。该方法集中于令牌作为整体的重要性(即,输入序列的片段),从而产生人类可读的解释。
摘要:At present, there are no easily understood explainable artificial intelligence (AI) methods for discrete token inputs, like text. Most explainable AI techniques do not extend well to token sequences, where both local and global features matter, because state-of-the-art models, like transformers, tend to focus on global connections. Therefore, existing explainable AI algorithms fail by (i) identifying disparate tokens of importance, or (ii) assigning a large number of tokens a low value of importance. This method for explainable AI for tokens-based classifiers generalizes a mask-based explainable AI algorithm for images. It starts with an Explainer neural network that is trained to create masks to hide information not relevant for classification. Then, the Hadamard product of the mask and the continuous values of the classifier's embedding layer is taken and passed through the classifier, changing the magnitude of the embedding vector but keeping the orientation unchanged. The Explainer is trained for a taxonomic classifier for nucleotide sequences and it is shown that the masked segments are less relevant to classification than the unmasked ones. This method focused on the importance the token as a whole (i.e., a segment of the input sequence), producing a human-readable explanation.
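该方法的关键一步是将掩码与嵌入层输出做Hadamard乘积:只缩放嵌入向量的模长而保持方向不变，再送入分类器的其余部分。下面是一个极简的PyTorch示意，其中classifier_embed与classifier_head为假设的模块划分:

import torch

def masked_embedding_forward(classifier_embed, classifier_head, tokens, mask):
    # tokens: (batch, seq_len) 的token id; mask: (batch, seq_len), 取值于[0,1]
    emb = classifier_embed(tokens)        # (batch, seq_len, dim)
    masked = emb * mask.unsqueeze(-1)     # 逐token缩放模长, 方向不变
    return classifier_head(masked)        # 送入分类器其余部分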
【3】Provable Subspace Identification of Nonlinear Multi-view CCA
标题:非线性多视图CCA的可证子空间识别
链接:https://arxiv.org/abs/2602.23785
摘要:我们研究多视图设置下非线性典型相关分析(CCA)的可识别性，其中每个视图由一个未知的非线性映射作用于共享潜变量与视图私有噪声的线性混合而生成。由于精确解混已被证明是不适定问题，我们转而将多视图CCA重新表述为基不变的子空间识别问题。我们证明，在适当的潜变量先验和谱分离条件下，多视图CCA可在各视图正交模糊的意义下恢复成对相关的信号子空间。对于$N \geq 3$个视图，该目标可证明地分离出所有视图共享的联合相关子空间，同时消除视图私有的变化。我们进一步通过谱扰动理论，将经验互协方差的集中性转化为显式的子空间误差界，从而建立有限样本一致性保证。在合成和渲染图像数据集上的实验验证了我们的理论结果，并证实了所假设条件的必要性。
摘要:We investigate the identifiability of nonlinear Canonical Correlation Analysis (CCA) in a multi-view setup, where each view is generated by an unknown nonlinear map applied to a linear mixture of shared latents and view-private noise. Rather than attempting exact unmixing, a problem proven to be ill-posed, we instead reframe multi-view CCA as a basis-invariant subspace identification problem. We prove that, under suitable latent priors and spectral separation conditions, multi-view CCA recovers the pairwise correlated signal subspaces up to view-wise orthogonal ambiguity. For $N \geq 3$ views, the objective provably isolates the jointly correlated subspaces shared across all views while eliminating view-private variations. We further establish finite-sample consistency guarantees by translating the concentration of empirical cross-covariances into explicit subspace error bounds via spectral perturbation theory. Experiments on synthetic and rendered image datasets validate our theoretical findings and confirm the necessity of the assumed conditions.
【4】Causal Identification from Counterfactual Data: Completeness and Bounding Results
标题:从反事实数据中识别因果关系:完整性和界限结果
链接:https://arxiv.org/abs/2602.23541
摘要:以前建立$\textit{反事实识别}$的完整性结果的工作已经被限制在输入数据属于观察或干预分布(Pearl因果层次的第1层和第2层)的设置中,因为通常假设不可能从反事实分布中获得数据,这属于第3层。然而,最近的工作(Raghavan & Bareinboim,2025)已经正式描述了一个可以通过实验方法直接估计的反事实分布族-他们称之为$\textit{counterfactual realizabilty}$。这就留下了一个问题,即考虑到这种对(一些)第3层数据的新访问,现在可以识别哪些反事实量。为了回答这个问题,我们开发了CTFIDU+算法,用于从任意一组第3层分布中识别反事实查询,并证明它是完成这项任务的。在此基础上,我们建立了理论极限,其中反事实可以从物理可实现的分布,从而意味着$\textit{在非参数设置的基本限制,以准确的因果推理}$。最后,鉴于不可能识别某些关键类型的反事实,我们使用可实现的反事实数据推导出这些数量的新的分析界限,并使用模拟证实,反事实数据有助于在实践中收紧不可识别数量的界限。
摘要:Previous work establishing completeness results for $\textit{counterfactual identification}$ has been circumscribed to the setting where the input data belongs to observational or interventional distributions (Layers 1 and 2 of Pearl's Causal Hierarchy), since it was generally presumed impossible to obtain data from counterfactual distributions, which belong to Layer 3. However, recent work (Raghavan & Bareinboim, 2025) has formally characterized a family of counterfactual distributions which can be directly estimated via experimental methods - a notion they call $\textit{counterfactual realizabilty}$. This leaves open the question of what $\textit{additional}$ counterfactual quantities now become identifiable, given this new access to (some) Layer 3 data. To answer this question, we develop the CTFIDU+ algorithm for identifying counterfactual queries from an arbitrary set of Layer 3 distributions, and prove that it is complete for this task. Building on this, we establish the theoretical limit of which counterfactuals can be identified from physically realizable distributions, thus implying the $\textit{fundamental limit to exact causal inference in the non-parametric setting}$. Finally, given the impossibility of identifying certain critical types of counterfactuals, we derive novel analytic bounds for such quantities using realizable counterfactual data, and corroborate using simulations that counterfactual data helps tighten the bounds for non-identifiable quantities in practice.
【5】On the Limits of Interpretable Machine Learning in Quintic Root Classification
标题:五次根分类中可解释机器学习的局限性
链接:https://arxiv.org/abs/2602.23467
摘要:机器学习(ML)能否从原始数值数据中自动恢复可解释的数学结构？我们以不超过五次的多项式实根构型分类作为结构化基准来回答这一问题。我们测试了大量的ML模型，包括决策树、逻辑回归、支持向量机、随机森林、梯度提升、XGBoost、符号回归和神经网络。神经网络仅使用原始系数就在五次多项式分类上实现了强大的分布内性能(84.3%±0.9%平衡准确率)，而决策树表现明显更差(59.9%±0.9%)。然而，当提供捕获临界点处符号变化的显式特征时，决策树达到与神经网络相当的性能(84.2%±1.2%)并产生显式分类规则。知识蒸馏表明，这个单一不变量占所提取决策结构的97.5%。分布外、数据效率和噪声鲁棒性分析表明，神经网络学习的是决策边界的连续的、依赖数据的几何近似，而不是恢复尺度不变的符号规则。几何近似与符号不变性之间的这种区别解释了各模型中观察到的预测性能与可解释性之间的差距。虽然可以达到很高的预测精度，但我们没有发现任何证据表明所评估的ML模型能从原始系数中自动恢复离散的、人类可解释的数学规则。这些结果表明，在结构化的数学领域，可解释性可能需要明确的结构归纳偏置，而不是纯粹的数据驱动近似。
摘要:Can Machine Learning (ML) autonomously recover interpretable mathematical structure from raw numerical data? We aim to answer this question using the classification of real-root configurations of polynomials up to degree five as a structured benchmark. We tested an extensive set of ML models, including decision trees, logistic regression, support vector machines, random forest, gradient boosting, XGBoost, symbolic regression, and neural networks. Neural networks achieved strong in-distribution performance on quintic classification using raw coefficients alone (84.3% + or - 0.9% balanced accuracy), whereas decision trees perform substantially worse (59.9% + or - 0.9\%). However, when provided with an explicit feature capturing sign changes at critical points, decision trees match neural performance (84.2% + or - 1.2%) and yield explicit classification rules. Knowledge distillation reveals that this single invariant accounts for 97.5% of the extracted decision structure. Out-of-distribution, data-efficiency, and noise robustness analyses indicate that neural networks learn continuous, data-dependent geometric approximations of the decision boundary rather than recovering scale-invariant symbolic rules. This distinction between geometric approximation and symbolic invariance explains the gap between predictive performance and interpretability observed across models. Although high predictive accuracy is attainable, we find no evidence that the evaluated ML models autonomously recover discrete, human-interpretable mathematical rules from raw coefficients. These results suggest that, in structured mathematical domains, interpretability may require explicit structural inductive bias rather than purely data-driven approximation.
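摘要中使决策树性能大幅提升的显式特征，即多项式在各实临界点处取值的符号变化，可以用NumPy直接构造。以下为示意性实现，该特征的精确定义以原文为准:

import numpy as np

def sign_change_feature(coeffs):
    # coeffs: 按最高次到常数项排列的多项式系数(np.poly1d约定)
    p = np.poly1d(coeffs)
    crit = p.deriv().r                           # 导数的根 = 临界点
    real_crit = np.sort(crit[np.isreal(crit)].real)
    signs = np.sign(p(real_crit))                # 多项式在各实临界点处取值的符号
    return int(np.sum(signs[:-1] != signs[1:]))  # 相邻临界点间的符号变化次数

# 例: x^5 - 5x^3 + 4x 有5个实根, 其4个临界点处的取值符号交替(3次变化)
print(sign_change_feature([1, 0, -5, 0, 4, 0]))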
表征(3篇)
【1】Who Guards the Guardians? The Challenges of Evaluating Identifiability of Learned Representations
标题:谁来守护守护者？评估习得表示可识别性的挑战
链接:https://arxiv.org/abs/2602.24278
摘要:表示学习中的可识别性通常在具有已知真实因子的合成基准上使用标准度量(例如MCC、DCI、R^2)进行评估。这些指标被假定能够反映可识别性理论所保证的、在等价类意义下的恢复。我们表明，这一假设仅在特定的结构条件下成立:每个度量都隐式编码了对数据生成过程(DGP)和编码器的假设。当这些假设被违反时，度量便被错误设定，并可能产生系统性的误报和漏报。这种失败既出现在经典的可识别性情形中，也出现在最需要可识别性的事后设置中。我们引入了一个将DGP假设与编码器几何分离的分类法，用它刻画现有指标的有效性域，并发布了用于可复现压力测试与比较的评估套件。
摘要:Identifiability in representation learning is commonly evaluated using standard metrics (e.g., MCC, DCI, R^2) on synthetic benchmarks with known ground-truth factors. These metrics are assumed to reflect recovery up to the equivalence class guaranteed by identifiability theory. We show that this assumption holds only under specific structural conditions: each metric implicitly encodes assumptions about both the data-generating process (DGP) and the encoder. When these assumptions are violated, metrics become misspecified and can produce systematic false positives and false negatives. Such failures occur both within classical identifiability regimes and in post-hoc settings where identifiability is most needed. We introduce a taxonomy separating DGP assumptions from encoder geometry, use it to characterise the validity domains of existing metrics, and release an evaluation suite for reproducible stress testing and comparison.
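作为对照，文中讨论的MCC这类标准度量本身实现非常简单，其隐含假设(线性相关、维度一一匹配)也正体现在这几行代码里。下面是常见的基于匈牙利匹配的MCC计算示意:

import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_corr_coef(z_true, z_est):
    # z_true/z_est: (n_samples, d) 的真实因子与学到的表示
    d = z_true.shape[1]
    corr = np.corrcoef(z_true.T, z_est.T)[:d, d:]    # (d, d) 交叉相关块
    row, col = linear_sum_assignment(-np.abs(corr))  # 匈牙利算法做维度匹配
    return np.abs(corr[row, col]).mean()             # 匹配后的平均绝对相关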
【2】Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
标题:组合泛化需要视觉嵌入模型中的线性、正交表示
链接:https://arxiv.org/abs/2602.24264
摘要:组合泛化，即在新的上下文中识别熟悉部分的能力，是智能系统的一个定义性属性。尽管现代模型是在大规模数据集上训练的，但它们仍然只覆盖了可能输入的组合空间的一小部分，这就提出了一个问题:表示必须具有什么样的结构才能支持对未见组合的泛化。我们形式化了标准训练下组合泛化的三个必要条件(可分性、可迁移性、稳定性)，并表明它们施加了必要的几何约束:表示必须线性分解为每个概念对应的分量，并且这些分量在不同概念之间必须正交。这为线性表示假说提供了理论基础:在神经表示中广泛观察到的线性结构是组合泛化的必然结果。我们进一步推导出将可组合概念数量与嵌入几何相联系的维度界。在经验上，我们在现代视觉模型(CLIP、SigLIP、DINO)上评估了这些预测，发现表示呈现出部分线性分解，具有低秩、近正交的每概念因子，并且这种结构的程度与在未见组合上的组合泛化相关。随着模型的持续扩展，这些条件预测了它们可能收敛到的表示几何。代码可在https://github.com/oshapio/necessary-compositionality上获得。
摘要:Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the combinatorial space of possible inputs, raising the question of what structure representations must have to support generalization to unseen combinations. We formalize three desiderata for compositional generalization under standard training (divisibility, transferability, stability) and show they impose necessary geometric constraints: representations must decompose linearly into per-concept components, and these components must be orthogonal across concepts. This provides theoretical grounding for the Linear Representation Hypothesis: the linear structure widely observed in neural representations is a necessary consequence of compositional generalization. We further derive dimension bounds linking the number of composable concepts to the embedding geometry. Empirically, we evaluate these predictions across modern vision models (CLIP, SigLIP, DINO) and find that representations exhibit partial linear factorization with low-rank, near-orthogonal per-concept factors, and that the degree of this structure correlates with compositional generalization on unseen combinations. As models continue to scale, these conditions predict the representational geometry they may converge to. Code is available at https://github.com/oshapio/necessary-compositionality.
【3】Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning
标题:通过对比学习解开张量时间序列的模式特定表示
链接:https://arxiv.org/abs/2602.23663
摘要:多模态张量时间序列(TTS)在搜索引擎、环境监测系统等领域都有广泛的应用。学习TTS的表示有利于各种应用,但它也具有挑战性,因为张量中固有的复杂性阻碍了丰富表示的实现。在本文中,我们提出了一种新的表示学习方法,专为TTS,即MoST。具体来说,MoST使用张量切片方法来降低TTS结构的复杂性,并学习可以分解为单个非时间模式的表示。每个表示捕获特定于模式的特征,其是相同模式内的变量之间的关系,以及模式不变特征,其在不同模式的表示中是共同的。我们采用对比学习框架来学习参数;损失函数包括两个部分,旨在以特定模式的方式和模式不变的方式学习表示,有效地利用解纠缠表示作为增强。在真实世界数据集上的大量实验表明,MoST在分类和预测精度方面始终优于最先进的方法。代码可在https://github.com/KoheiObata/MoST上获得。
摘要:Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging since the complexities inherent in the tensor hinder the realization of rich representations. In this paper, we propose a novel representation learning method designed specifically for TTS, namely MoST. Specifically, MoST uses a tensor slicing approach to reduce the complexity of the TTS structure and learns representations that can be disentangled into individual non-temporal modes. Each representation captures mode-specific features, which are the relationship between variables within the same mode, and mode-invariant features, which are in common in representations of different modes. We employ a contrastive learning framework to learn parameters; the loss function comprises two parts intended to learn representation in a mode-specific way and mode-invariant way, effectively exploiting disentangled representations as augmentations. Extensive experiments on real-world datasets show that MoST consistently outperforms the state-of-the-art methods in terms of classification and forecasting accuracy. Code is available at https://github.com/KoheiObata/MoST.
优化|敛散性(6篇)
【1】LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
标题:LK损失:投机解码的直接接受率优化
链接:https://arxiv.org/abs/2602.23881
摘要:推测解码通过使用轻量级草稿模型来提出候选令牌，然后由目标模型并行验证，从而加速自回归大语言模型(LLM)推理。加速比在很大程度上由接受率决定，然而标准训练将Kullback-Leibler(KL)散度最小化作为代理目标。虽然KL散度和接受率共享相同的全局最优解，但容量有限的小草稿模型通常收敛到次优解，此时最小化KL并不能保证最大化接受率。为了解决这个问题，我们提出了LK损失，即直接针对接受率的专门训练目标。在四种草稿架构和六个目标模型(从8B到685B参数)上进行的全面实验表明，与标准的基于KL的训练相比，所有配置的接受指标都有一致的改进。我们在通用、编码和数学领域评估了我们的方法，并报告平均接受长度最高提升8-10%。LK损失易于实现，不引入计算开销，并且可以直接集成到任何现有的投机解码器训练框架中，使其成为现有草稿模型训练目标的引人注目的替代方案。
摘要:Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, special training objectives that directly target acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to the standard KL-based training. We evaluate our approach on general, coding and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead and can be directly integrated into any existing speculator training framework, making them a compelling alternative to the existing draft training objectives.
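作为背景，标准投机解码的逐token接受规则可写成如下示意代码;LK损失直接优化的正是由该规则决定的接受率与接受长度。此处p_target、p_draft假设为各草稿位置上的完整词表分布，均为示意性输入:

import torch

def speculative_accept(p_target, p_draft, tokens):
    # p_target[i]/p_draft[i]: 第i个草稿位置上目标/草稿模型的词表分布(1维张量)
    # tokens: 草稿模型提出的候选token id序列
    accepted = 0
    for i, t in enumerate(tokens):
        ratio = (p_target[i][t] / p_draft[i][t]).item()
        if torch.rand(()).item() < min(1.0, ratio):
            accepted += 1   # 以 min(1, p/q) 的概率接受该token
        else:
            break           # 在首个被拒绝的token处截断
    return accepted         # 本次的接受长度, 其期望即平均接受长度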
【2】Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parameteric Policies
标题:超越逐状态镜像下降:使用参数化策略的离线策略优化
链接:https://arxiv.org/abs/2602.23811
摘要:我们研究一般函数逼近下离线强化学习(RL)的理论问题。虽然先前的工作(例如Xie等，2021)已经建立了通过悲观主义从离线数据中学习良好策略的理论基础，但现有的计算上易于处理的算法(通常是在oracle高效的意义上)，如PSPI，仅适用于有限且较小的动作空间。此外，这些算法依赖于逐状态的镜像下降，并要求actor由critic函数隐式诱导，无法容纳实践中普遍存在的独立策略参数化。在这项工作中，我们解决了这些限制，并将理论保证扩展到大型或连续动作空间上的参数化策略类。在将镜像下降扩展到参数化策略时，我们将上下文耦合识别为核心难点，并展示了将镜像下降与自然策略梯度联系起来如何带来新颖的分析、保证和算法见解，包括离线RL与模仿学习之间令人惊讶的统一。
摘要:We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offline data via pessimism, existing algorithms that are computationally tractable (often in an oracle-efficient sense), such as PSPI, only apply to finite and small action spaces. Moreover, these algorithms rely on state-wise mirror descent and require actors to be implicitly induced from the critic functions, failing to accommodate standalone policy parameterization which is ubiquitous in practice. In this work, we address these limitations and extend the theoretical guarantees to parameterized policy classes over large or continuous action spaces. When extending mirror descent to parameterized policies, we identify contextual coupling as the core difficulty, and show how connecting mirror descent to natural policy gradient leads to novel analyses, guarantees, and algorithmic insights, including a surprising unification between offline RL and imitation learning.
【3】Actor-Critic Pretraining for Proximal Policy Optimization
标题:近端策略优化的演员-评论家预训练
链接:https://arxiv.org/abs/2602.23804
摘要:强化学习(RL)的actor-critic算法能够实现自主学习，但通常需要大量的环境交互，这限制了它们在机器人领域的适用性。利用专家数据可以减少所需的环境交互次数。一种常见的方法是actor预训练，即通过对专家演示进行行为克隆来初始化actor网络，随后用RL进行微调。相比之下，critic网络的初始化很少受到关注，尽管它在策略优化中发挥着核心作用。本文为近端策略优化(PPO)等actor-critic算法提出了一种利用专家演示同时初始化两个网络的预训练方法。actor通过行为克隆进行预训练，而critic则使用预训练策略rollout所得的回报进行预训练。该方法在15个模拟机器人操作和运动任务上进行了评估。实验结果表明，与不进行预训练相比，actor-critic预训练平均提高样本效率86.1%；与仅预训练actor相比，平均提高30.9%。
摘要:Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is evaluated on 15 simulated robotic manipulation and locomotion tasks. Experimental results show that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% to actor-only pretraining.
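论文流程(行为克隆预训练actor，再用预训练策略rollout得到的折扣回报回归预训练critic)可以浓缩为如下PyTorch骨架。demos与rollouts的数据格式、学习率等均为示意性假设:

import torch
import torch.nn as nn

def pretrain(actor, critic, demos, rollouts, gamma=0.99, lr=3e-4):
    # demos: (state, action) 专家张量对的列表
    # rollouts: 预训练策略采样的轨迹列表, 每条为 (states, rewards)
    opt_a = torch.optim.Adam(actor.parameters(), lr=lr)
    for s, a in demos:                       # 1) 行为克隆预训练actor
        opt_a.zero_grad()
        nn.functional.mse_loss(actor(s), a).backward()
        opt_a.step()
    opt_c = torch.optim.Adam(critic.parameters(), lr=lr)
    for states, rewards in rollouts:         # 2) 用折扣回报回归预训练critic
        g, returns = 0.0, []
        for r in reversed(rewards):          # 自后向前累积折扣回报
            g = r + gamma * g
            returns.append(g)
        target = torch.tensor(returns[::-1]).unsqueeze(-1)
        opt_c.zero_grad()
        nn.functional.mse_loss(critic(states), target).backward()
        opt_c.step()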
【4】On the Convergence of Single-Loop Stochastic Bilevel Optimization with Approximate Implicit Differentiation
标题:带近似隐式分化的单圈随机二层优化的收敛性
链接:https://arxiv.org/abs/2602.23633
摘要:随机双层优化已成为元学习和超参数优化的基础框架。尽管同时更新上下层变量的单循环算法在实践中十分普遍，其理论理解(尤其是在随机情形下)相比多循环算法仍明显不足。现有分析往往给出次优的收敛速率，或掩盖了对下层条件数$κ$的关键依赖，常将其埋藏在一般的Lipschitz常数之中。本文通过对单循环随机近似隐式微分(SSAID)算法给出更精细的收敛分析来弥补这一空白。我们证明SSAID以$\mathcal{O}(κ^7 ε^{-2})$的oracle复杂度达到$ε$-驻点。该结果有两点值得注意:(i)它在保持单循环更新计算效率的同时，匹配了最先进多循环方法(如stocBiO)的最优$\mathcal{O}(ε^{-2})$速率;(ii)它首次对基于随机AID的单循环方法给出了显式、细粒度的$κ$依赖刻画。这项工作表明SSAID并非仅是启发式方法，而是具有严格理论基础，其收敛保证可与主流多循环框架相竞争。
摘要:Stochastic Bilevel Optimization has emerged as a fundamental framework for meta-learning and hyperparameter optimization. Despite the practical prevalence of single-loop algorithms--which update lower and upper variables concurrently--their theoretical understanding, particularly in the stochastic regime, remains significantly underdeveloped compared to their multi-loop counterparts. Existing analyses often yield suboptimal convergence rates or obscure the critical dependence on the lower-level condition number $κ$, frequently burying it within generic Lipschitz constants. In this paper, we bridge this gap by providing a refined convergence analysis of the Single-loop Stochastic Approximate Implicit Differentiation (SSAID) algorithm. We prove that SSAID achieves an $ε$-stationary point with an oracle complexity of $\mathcal{O}(κ^7 ε^{-2})$. Our result is noteworthy in two aspects: (i) it matches the optimal $\mathcal{O}(ε^{-2})$ rate of state-of-the-art multi-loop methods (e.g., stocBiO) while maintaining the computational efficiency of a single-loop update; and (ii) it provides the first explicit, fine-grained characterization of the $κ$-dependence for stochastic AID-based single-loop methods. This work demonstrates that SSAID is not merely a heuristic approach, but admits a rigorous theoretical foundation with convergence guarantees competitive with mainstream multi-loop frameworks.
【5】BTTackler: A Diagnosis-based Framework for Efficient Deep Learning Hyperparameter Optimization
标题:BTTackler:基于诊断的高效深度学习超参数优化框架
链接:https://arxiv.org/abs/2602.23630
摘要:超参数优化(HPO)在深度学习中代价高昂，在采用自动化方法时尤甚。大多数现有的自动化HPO方法是基于精度的，即使用准确率指标来指导在特定搜索空间中对不同超参数配置的试验。然而，许多试验可能遭遇严重的训练问题，例如梯度消失和收敛不足，这些问题在训练早期很难通过准确率指标反映出来，并且通常导致性能不佳。这造成了低效的优化轨迹，因为不良试验占用了相当多的计算资源，并降低了在时间限制内找到优秀超参数配置的概率。在本文中，我们提出了\textbf{Bad Trial Tackler(BTTackler)}，一个引入训练诊断以自动识别训练问题、进而处理不良试验的新型HPO框架。BTTackler通过计算一组精心设计的量化指标来诊断每个试验，并在检测到任何训练问题时触发提前终止。评估在由三个经典深度神经网络(DNN)和四种广泛使用的HPO方法组成的代表性HPO任务上进行。为了更好地量化自动化HPO方法的有效性，我们提出了两个分别基于准确率和时间消耗的新度量。结果显示BTTackler具有两方面优势:(1)达到与基线方法相当的精度时，平均减少40.33%的时间消耗;(2)在给定时间预算内，比基线方法平均多完成44.5%的top-10试验。我们还发布了一个开源Python库，用户只需极少的代码改动即可将BTTackler应用于自动化HPO流程。
摘要:Hyperparameter optimization (HPO) is known to be costly in deep learning, especially when leveraging automated approaches. Most of the existing automated HPO methods are accuracy-based, i.e., accuracy metrics are used to guide the trials of different hyperparameter configurations amongst a specific search space. However, many trials may encounter severe training problems, such as vanishing gradients and insufficient convergence, which can hardly be reflected by accuracy metrics in the early stages of the training and often result in poor performance. This leads to an inefficient optimization trajectory because the bad trials occupy considerable computation resources and reduce the probability of finding excellent hyperparameter configurations within a time limitation. In this paper, we propose \textbf{Bad Trial Tackler (BTTackler)}, a novel HPO framework that introduces training diagnosis to identify training problems automatically and hence tackles bad trials. BTTackler diagnoses each trial by calculating a set of carefully designed quantified indicators and triggers early termination if any training problems are detected. Evaluations are performed on representative HPO tasks consisting of three classical deep neural networks (DNN) and four widely used HPO methods. To better quantify the effectiveness of an automated HPO method, we propose two new measurements based on accuracy and time consumption. Results show the advantage of BTTackler is two-fold: (1) it reduces 40.33\% of time consumption to achieve the same accuracy comparable to baseline methods on average and (2) it conducts 44.5\% more top-10 trials than baseline methods on average within a given time budget. We also released an open-source Python library that allows users to easily apply BTTackler to automated HPO processes with minimal code changes.
【6】A distributed semismooth Newton based augmented Lagrangian method for distributed optimization
标题:分布式优化问题的一种基于增广拉格朗日方法的分布式半光滑牛顿法
链接:https://arxiv.org/abs/2602.23854
摘要:提出了一种新的分布式半光滑牛顿增广拉格朗日方法来求解网络上的一类优化问题,其中全局目标定义为局部代价函数之和,通信仅限于相邻代理.具体来说,我们采用增广拉格朗日方法来解决一个等价的重新制定的约束版本的原始问题。每个产生的子问题通过分布式半光滑牛顿法不精确地解决。通过充分利用广义Hessian矩阵的结构,提出了一种分布式加速邻近梯度方法来有效地计算牛顿方向,消除了与完整Hessian矩阵通信的需要。理论结果也得到了保证所提出的算法的收敛性。数值实验表明,我们的算法相比,最先进的分布式算法的效率和优越性。
摘要:This paper proposes a novel distributed semismooth Newton based augmented Lagrangian method for solving a class of optimization problems over networks, where the global objective is defined as the sum of locally held cost functions, and communication is restricted to neighboring agents. Specifically, we employ the augmented Lagrangian method to solve an equivalently reformulated constrained version of the original problem. Each resulting subproblem is solved inexactly via a distributed semismooth Newton method. By fully leveraging the structure of the generalized Hessian, a distributed accelerated proximal gradient method is proposed to compute the Newton direction efficiently, eliminating the need to communicate with full Hessian matrices. Theoretical results are also obtained to guarantee the convergence of the proposed algorithm. Numerical experiments demonstrate the efficiency and superiority of our algorithm compared to state-of-the-art distributed algorithms.
预测|估计(6篇)
【1】The Stability of Online Algorithms in Performative Prediction
标题:表演预测中在线算法的稳定性
链接:https://arxiv.org/abs/2602.24207
摘要:在决策中使用算法预测会导致一个反馈回路,我们部署的模型会积极影响我们看到的数据分布,然后再用于重新训练。Perdomo等人在2020年关于表演预测的工作中正式阐述了这种动态。我们的主要结果是一个无条件的减少表明,任何部署在表演设置的无遗憾算法收敛到一个(混合)表演稳定的平衡:一个解决方案,其中模型积极塑造数据分布的方式,他们自己的预测看起来最佳事后诸葛亮。在我们的工作之前,这一领域的所有积极结果都对模型如何影响分布产生了很大的限制。通过使用鞅参数和允许随机化,我们避免了任何这样的假设和回避最近的硬度结果找到稳定的模型。最后,在一个更概念性的说明中,我们的连接揭示了为什么常见的算法,如梯度下降,自然稳定并防止失控的反馈循环。我们希望我们的工作能够在未来实现在线优化和表演性之间的技术转移。
摘要:The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we see, and later use to retrain on. This dynamic was formalized by Perdomo et al. 2020 in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution in which models actively shape data distributions in ways that their own predictions look optimal in hindsight. Prior to our work, all positive results in this area made strong restrictions on how models influenced distributions. By using a martingale argument and allowing randomization, we avoid any such assumption and sidestep recent hardness results for finding stable models. Lastly, on a more conceptual note, our connection sheds light on why common algorithms, like gradient descent, are naturally stabilizing and prevent runaway feedback loops. We hope our work enables future technical transfer of ideas between online optimization and performativity.
【2】Flow-Based Density Ratio Estimation for Intractable Distributions with Applications in Genomics
标题:难以处理分布的基于流的密度比估计及其在基因组学中的应用
链接:https://arxiv.org/abs/2602.24201
摘要:估计难以处理的数据分布对之间的密度比是概率建模中的核心问题,可以在不同的数据生成过程中跨条件和协变量对样本可能性进行原则性比较。虽然精确似然模型(如归一化流)为密度比估计提供了一种有前途的方法,但基于朴素流的评估在计算上是昂贵的,因为它们需要分别为每个分布模拟昂贵的似然积分。在这项工作中,我们利用条件感知流匹配,以获得一个单一的动态制定跟踪沿生成轨迹的密度比。我们在模拟基准上展示了封闭形式比率估计的竞争性能,并表明我们的方法支持单细胞基因组学数据分析中的多功能任务,其中基于可能性的细胞状态在实验条件下的比较使治疗效果估计和批次校正评估成为可能。
摘要:Estimating density ratios between pairs of intractable data distributions is a core problem in probabilistic modeling, enabling principled comparisons of sample likelihoods under different data-generating processes across conditions and covariates. While exact-likelihood models such as normalizing flows offer a promising approach to density ratio estimation, naive flow-based evaluations are computationally expensive, as they require simulating costly likelihood integrals for each distribution separately. In this work, we leverage condition-aware flow matching to derive a single dynamical formulation for tracking density ratios along generative trajectories. We demonstrate competitive performance on simulated benchmarks for closed-form ratio estimation, and show that our method supports versatile tasks in single-cell genomics data analysis, where likelihood-based comparisons of cellular states across experimental conditions enable treatment effect estimation and batch correction evaluation.
【3】SDMixer: Sparse Dual-Mixer for Time Series Forecasting
标题:SDMixer:用于时间序列预测的稀疏双混合器
链接:https://arxiv.org/abs/2602.23581
备注:12 pages, 2 figures
摘要:多元时间序列预测在交通、能源、金融等领域有着广泛的应用。然而,数据通常遭受多尺度特性,弱相关性和噪声干扰的问题,这些问题限制了现有模型的预测性能。提出了一种双流稀疏Mixer预测框架,该框架分别从频域和时域序列中提取全局趋势和局部动态特征。它采用稀疏机制过滤掉无效信息,从而提高跨变量依赖建模的准确性。实验结果表明,该方法在多个真实场景数据集上取得了领先的性能,验证了其有效性和通用性。该代码可在https://github.com/SDMixer/SDMixer上获得
摘要:Multivariate time series forecasting is widely applied in fields such as transportation, energy, and finance. However, the data commonly suffers from issues of multi-scale characteristics, weak correlations, and noise interference, which limit the predictive performance of existing models. This paper proposes a dual-stream sparse Mixer prediction framework that extracts global trends and local dynamic features from sequences in both the frequency and time domains, respectively. It employs a sparsity mechanism to filter out invalid information, thereby enhancing the accuracy of cross-variable dependency modeling. Experimental results demonstrate that this method achieves leading performance on multiple real-world scenario datasets, validating its effectiveness and generality. The code is available at https://github.com/SDMixer/SDMixer
【4】A Variational Estimator for $L_p$ Calibration Errors
标题:$L_p$校准误差的变分估计
链接:https://arxiv.org/abs/2602.24230
摘要:校准，即确保预测概率与观察到的类别频率一致的问题，是机器学习系统进行可靠预测的基本要求。校准误差传统上通过散度函数来评估，即计算预测与经验频率之间的期望散度。准确估计这一数量颇具挑战，在多分类设置下尤甚。在本文中，我们展示了如何将最近用于估计校准误差的变分框架，从由适当损失诱导的散度扩展到由$L_p$散度诱导的一大类校准误差。我们的方法可以区分过度自信与不足自信，并且与非变分方法不同，能避免高估。我们提供了大量实验，并将代码集成到用于评估校准误差的开源软件包probmetrics(https://github.com/dholzmueller/probmetrics)中。
摘要:Calibration—the problem of ensuring that predicted probabilities align with observed class frequencies—is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code in the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.
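作为被估计对象的参照，下面给出二分类置信度上分箱式$L_p$校准误差的朴素估计器。论文提出的是变分估计器，此处仅示意该量本身;分箱数等为假设参数:

import numpy as np

def binned_lp_calibration_error(probs, labels, p=2, n_bins=15):
    # probs: 预测的正类概率; labels: 0/1标签
    probs, labels = np.asarray(probs), np.asarray(labels)
    conf = np.maximum(probs, 1 - probs)            # 预测置信度
    correct = ((probs >= 0.5).astype(int) == labels).astype(float)
    bins = np.linspace(0, 1, n_bins + 1)
    err = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            gap = abs(conf[m].mean() - correct[m].mean())  # 箱内置信度与正确率之差
            err += m.mean() * gap ** p                      # 按箱权重累积 L_p 项
    return err ** (1 / p)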
【5】Predictive Hotspot Mapping for Data-driven Crime Prediction
标题:用于数据驱动犯罪预测的预测热点映射
链接:https://arxiv.org/abs/2602.23750
备注:50 pages
摘要:预测热点映射是犯罪预测与控制中的一个重要问题。准确的热点地图有助于适当地针对可用资源来管理城市犯罪。为了做出数据驱动的决策并实现警务和巡逻行动的自动化,世界各地的警察部门正在转向依赖历史数据的预测方法。在本文中,我们创建了一个非参数模型,使用时空核密度制定的犯罪预测的历史数据的基础上的目的。所提出的方法还能够通过替代来源纳入来自人类的专家输入。通过与德里警察局合作进行犯罪预测,这将有助于有效分配巡逻车辆以控制街头犯罪,该方法已在现实世界中进行了广泛的评估。本文所得到的结果是有前途的,可以很容易地应用于其他设置。我们发布了我们研究中使用的算法和数据集(掩码),以支持未来的研究,这将有助于实现进一步的改进。
摘要:Predictive hotspot mapping is an important problem in crime prediction and control. An accurate hotspot mapping helps in appropriately targeting the available resources to manage crime in cities. With an aim to make data-driven decisions and automate policing and patrolling operations, police departments across the world are moving towards predictive approaches relying on historical data. In this paper, we create a non-parametric model using a spatio-temporal kernel density formulation for the purpose of crime prediction based on historical data. The proposed approach is also able to incorporate expert inputs coming from humans through alternate sources. The approach has been extensively evaluated in a real-world setting by collaborating with the Delhi police department to make crime predictions that would help in effective assignment of patrol vehicles to control street crime. The results obtained in the paper are promising and can be easily applied in other settings. We release the algorithm and the dataset (masked) used in our study to support future research that will be useful in achieving further improvements.
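摘要中的时空核密度可以概括为“对历史事件求和:空间高斯核乘以时间衰减核”。以下NumPy示意仅展示这一思想，核函数形式与带宽均为假设值，并非论文部署的实际模型:

import numpy as np

def st_kernel_intensity(query_xyt, events_xyt, h_space=0.5, h_time=7.0):
    # query_xyt: 查询点 (x, y, t); events_xyt: 历史事件数组, 每行 (x, y, t)
    q = np.asarray(query_xyt)
    e = np.asarray(events_xyt)
    d2 = ((e[:, :2] - q[:2]) ** 2).sum(axis=1)    # 空间平方距离
    dt = q[2] - e[:, 2]
    valid = dt >= 0                                # 只使用查询时刻之前的事件
    w_space = np.exp(-d2[valid] / (2 * h_space ** 2))  # 高斯空间核
    w_time = np.exp(-dt[valid] / h_time)               # 指数时间衰减核
    return float((w_space * w_time).sum())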
【6】Partition Function Estimation under Bounded f-Divergence
标题:有界f-散度下的配分函数估计
链接:https://arxiv.org/abs/2602.23535
摘要:我们研究在可从建议分布采样并能计算目标分布的非归一化密度比的条件下，估计配分函数的统计复杂度。虽然配分函数估计是一个经典问题，但现有保证通常依赖于关于定义域或模型几何的结构性假设。与之相反，我们给出了一个一般的信息论刻画，它仅依赖于建议分布与目标分布之间的关系。我们的分析引入了综合覆盖率曲线(integrated coverage profile)，这是一个量化有多少目标质量位于密度比较大区域的泛函。我们证明综合覆盖率紧致地刻画了乘法意义下配分函数估计的样本复杂度，并给出了匹配的下界。我们进一步用$f$-散度表达这些界，得到依赖于f增长速率的明显相变，在作为特例恢复经典结果的同时延伸到重尾情形。匹配的下界在所有情形中确立了界的紧性。作为应用，我们得到了重要性采样和自归一化重要性采样的改进有限样本保证，并展示了在相同散度约束下近似采样与计数之间复杂度的严格分离。我们的结果统一并推广了关于重要性采样、拒绝采样和重尾均值估计的先前分析，提供了一个极少假设的配分函数估计理论。在此过程中，我们引入了新的技术工具，包括覆盖率与$f$-散度之间的新联系，以及经典Paley-Zygmund不等式的推广。
摘要:We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio for a target distribution. While partition function estimation is a classical problem, existing guarantees typically rely on structural assumptions about the domain or model geometry. We instead provide a general, information-theoretic characterization that depends only on the relationship between the proposal and target distributions. Our analysis introduces the integrated coverage profile, a functional that quantifies how much target mass lies in regions where the density ratio is large. We show that integrated coverage tightly characterizes the sample complexity of multiplicative partition function estimation and provide matching lower bounds. We further express these bounds in terms of $f$-divergences, yielding sharp phase transitions depending on the growth rate of f and recovering classical results as a special case while extending to heavy-tailed regimes. Matching lower bounds establish tightness in all regimes. As applications, we derive improved finite-sample guarantees for importance sampling and self-normalized importance sampling, and we show a strict separation between the complexity of approximate sampling and counting under the same divergence constraints. Our results unify and generalize prior analyses of importance sampling, rejection sampling, and heavy-tailed mean estimation, providing a minimal-assumption theory of partition function estimation. Along the way we introduce new technical tools including new connections between coverage and $f$-divergences as well as a generalization of the classical Paley-Zygmund inequality.
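文中分析的基本对象，即配分函数的重要性采样估计$Z=\mathbb{E}_q[\tilde p(x)/q(x)]$，形式上非常简单。下面的NumPy示意在一个可手工验证的高斯例子上演示(真值为$\sqrt{π}\approx 1.7725$):

import numpy as np

def is_partition_estimate(proposal_sampler, unnorm_ratio, n=100000):
    # proposal_sampler(n): 返回n个来自建议分布q的样本
    # unnorm_ratio(x): 返回未归一化密度比 tilde_p(x)/q(x)
    x = proposal_sampler(n)
    w = unnorm_ratio(x)
    return float(np.mean(w)), float(np.std(w) / np.sqrt(n))  # 估计值与标准误

# 例: q=N(0,1), 目标未归一化密度 tilde_p(x)=exp(-x^2), 真配分函数为 sqrt(pi)
rng = np.random.default_rng(0)
q_pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
Z, se = is_partition_estimate(lambda n: rng.standard_normal(n),
                              lambda x: np.exp(-x ** 2) / q_pdf(x))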
其他神经网络|深度学习|模型|建模(19篇)
【1】Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
标题:通过度量森林完成更好的学习增强生成树算法
链接:https://arxiv.org/abs/2602.24232
摘要:我们提出了改进的学习增强算法，用于为任意度量空间中的点集寻找近似最小生成树(MST)。我们的工作遵循一个名为度量森林补全(MFC)的最新框架，其中学习到的输入是一个森林，必须为其添加额外的边以构成完整的生成树。Veldt et al.(2025)表明最优地补全森林需要$Ω(n^2)$时间，但为MFC设计了一个次二次复杂度的2.62-近似算法。同一方法对原始MST问题是$(2γ+1)$-近似，其中$γ\geq 1$是初始森林的质量参数。我们引入了一个广义方法，它在上述先前算法与最优的$Ω(n^2)$时间MFC算法之间进行插值。我们的方法只考虑与数量不断增加、策略性选取的“代表性”点相关联的边。我们分析的一个推论是将先前算法的近似因子从MFC的2.62和度量MST的$(2γ+1)$分别改进为2和$2γ$。我们证明这在最坏情况实例上是紧的，但使用我们的广义方法仍能获得更好的实例相关近似。我们以一个彻底的实验评估补充理论结果。
摘要:We present improved learning-augmented algorithms for finding an approximate minimum spanning tree (MST) for points in an arbitrary metric space. Our work follows a recent framework called metric forest completion (MFC), where the learned input is a forest that must be given additional edges to form a full spanning tree. Veldt et al. (2025) showed that optimally completing the forest takes $Ω(n^2)$ time, but designed a 2.62-approximation for MFC with subquadratic complexity. The same method is a $(2γ+ 1)$-approximation for the original MST problem, where $γ\geq 1$ is a quality parameter for the initial forest. We introduce a generalized method that interpolates between this prior algorithm and an optimal $Ω(n^2)$-time MFC algorithm. Our approach considers only edges incident to a growing number of strategically chosen ``representative'' points. One corollary of our analysis is to improve the approximation factor of the previous algorithm from 2.62 for MFC and $(2γ+1)$ for metric MST to 2 and $2γ$ respectively. We prove this is tight for worst-case instances, but we still obtain better instance-specific approximations using our generalized method. We complement our theoretical results with a thorough experimental evaluation.
【2】Learning with a Budget: Identifying the Best Arm with Resource Constraints
标题:预算学习:在资源限制下确定最佳手臂
链接:https://arxiv.org/abs/2602.24146
备注:A preliminary version of this work, titled 'Best Arm Identification with Resource Constraints,' was presented at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). This manuscript extends the original conference paper by providing improved theoretical results and more generalized conclusions, aiming for future journal submission. arXiv admin note: substantial text overlap with arXiv:2402.19090
摘要:在许多应用中，评估不同备选方案的有效性伴随着不同的成本或资源消耗。受这种异质性的启发，我们研究了资源约束下的最佳臂识别(BAIwRC)问题，其中智能体在资源约束存在的情况下寻求识别最佳备选方案(即臂)。每次拉臂都会消耗一种或多种类型的有限资源。我们做出了两项关键贡献。首先，我们提出了带资源配给的逐次减半(SH-RR)算法，它将资源感知的分配集成到经典的最佳臂识别逐次减半框架中。SH-RR算法统一了随机和确定性消耗两种设置下的理论分析，并提出了一种新的有效消耗度量。
摘要:In many applications, evaluating the effectiveness of different alternatives comes with varying costs or resource usage. Motivated by such heterogeneity, we study the Best Arm Identification with Resource Constraints (BAIwRC) problem, where an agent seeks to identify the best alternative (aka arm) in the presence of resource constraints. Each arm pull consumes one or more types of limited resources. We make two key contributions. First, we propose the Successive Halving with Resource Rationing (SH-RR) algorithm, which integrates resource-aware allocation into the classical successive halving framework on best arm identification. The SH-RR algorithm unifies the theoretical analysis for both the stochastic and deterministic consumption settings, with a new \textit{effective consumption measure}.
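作为理解问题设定的参照，经典逐次减半加上最朴素的“按臂成本折算可拉动次数”的资源处理可写成如下骨架。这只是简化示意，SH-RR实际的资源配给规则以原文为准:

import numpy as np

def successive_halving(arms, pull_cost, budget, pull_fn):
    # arms: 臂的标识列表; pull_cost[a]: 拉动臂a一次的资源成本
    # budget: 总资源预算; pull_fn(a): 拉动臂a一次并返回随机奖励
    active = list(arms)
    rounds = int(np.ceil(np.log2(len(arms))))
    means = {a: 0.0 for a in arms}
    for _ in range(rounds):
        per_arm_budget = budget / (rounds * len(active))
        for a in active:
            n = max(1, int(per_arm_budget / pull_cost[a]))  # 该臂可负担的拉动次数
            means[a] = np.mean([pull_fn(a) for _ in range(n)])
        active.sort(key=lambda a: means[a], reverse=True)
        active = active[: max(1, len(active) // 2)]          # 淘汰经验均值靠后的一半
    return active[0]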
【3】Neural Diffusion Intensity Models for Point Process Data
标题:点过程数据的神经扩散强度模型
链接:https://arxiv.org/abs/2602.24083
摘要:Cox过程通过潜在的随机强度对过度分散的点过程数据建模，但强度模型的非参数估计和强度路径上的后验推断通常难以处理，需依赖昂贵的MCMC方法。我们提出了神经扩散强度模型(Neural Diffusion Intensity Models)，这是一个面向由神经SDE驱动的Cox过程的变分框架。我们的关键理论结果基于filtration扩大(enlargement of filtrations)，表明在点过程观测上取条件会保留潜在强度的扩散结构，并带有显式的漂移校正。这保证了变分族包含真实后验，使得在模型容量充足时ELBO最大化与极大似然估计一致。我们设计了一个摊销的编码器架构，通过模拟漂移校正后的SDE，将变长事件序列映射为后验强度路径，用单次前向传播取代反复的MCMC运行。在合成与真实数据上的实验表明，该方法能准确恢复潜在强度动态与后验路径，并比基于MCMC的方法快若干数量级。
摘要:Cox processes model overdispersed point process data via a latent stochastic intensity, but both nonparametric estimation of the intensity model and posterior inference over intensity paths are typically intractable, relying on expensive MCMC methods. We introduce Neural Diffusion Intensity Models, a variational framework for Cox processes driven by neural SDEs. Our key theoretical result, based on enlargement of filtrations, shows that conditioning on point process observations preserves the diffusion structure of the latent intensity with an explicit drift correction. This guarantees the variational family contains the true posterior, so that ELBO maximization coincides with maximum likelihood estimation under sufficient model capacity. We design an amortized encoder architecture that maps variable-length event sequences to posterior intensity paths by simulating the drift-corrected SDE, replacing repeated MCMC runs with a single forward pass. Experiments on synthetic and real-world data demonstrate accurate recovery of latent intensity dynamics and posterior paths, with orders-of-magnitude speedups over MCMC-based methods.
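For readers unfamiliar with the data model, a standard way to simulate a point process given one sampled intensity path is thinning (Lewis/Ogata); the sinusoidal intensity below is an illustrative stand-in for a latent diffusion path, and none of this reproduces the paper's neural-SDE posterior.

# Simulating one Cox-process realization by thinning (Lewis/Ogata), given a
# sampled intensity path. The sinusoidal intensity is an illustrative
# stand-in for a latent diffusion path.
import numpy as np

rng = np.random.default_rng(0)
T = 10.0
lam = lambda t: 5.0 * (1.0 + np.sin(t))   # intensity path lambda(t) >= 0
lam_max = 10.0                            # upper bound of lam on [0, T]

t, events = 0.0, []
while True:
    t += rng.exponential(1.0 / lam_max)   # candidate arrival at rate lam_max
    if t > T:
        break
    if rng.uniform() < lam(t) / lam_max:  # keep with prob lam(t)/lam_max
        events.append(t)
print(f"{len(events)} events on [0, {T}]")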
【4】Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments
标题:基础世界模型:支持智能体在静态环境之外可靠地学习、验证与适应
链接:https://arxiv.org/abs/2602.23997
备注:AAMAS 2026, Blue Sky Idea Track. 4 pages, 1 Figure
摘要:下一代自主智能体不仅要高效地学习,还要可靠地行动,并在开放世界中调整自己的行为。标准方法通常假设固定的任务和环境,很少或没有新颖性,这限制了世界模型支持必须随着条件变化而发展其策略的代理的能力。本文概述了基础世界模型的愿景:持久的组合表示,统一了强化学习,反应/程序合成和抽象机制。我们提出了一个围绕四个组成部分的议程:(i)从规范中学习的奖励模型,以支持具有明确目标的优化;(ii)在整个学习过程中集成的自适应形式验证;(iii)在线抽象校准,以量化模型预测的可靠性;(iv)由验证者指导的测试时间合成和世界模型生成。总之,这些组件使代理合成可验证的程序,从少量的交互中获得新的策略,并在适应新颖性的同时保持正确性。由此产生的框架将基础世界模型定位为学习,推理和适应的基础,为代理奠定基础,不仅表现良好,而且可以解释和证明他们采取的行为。
摘要:The next generation of autonomous agents must not only learn efficiently but also act reliably and adapt their behavior in open worlds. Standard approaches typically assume fixed tasks and environments with little or no novelty, which limits world models' ability to support agents that must evolve their policies as conditions change. This paper outlines a vision for foundation world models: persistent, compositional representations that unify reinforcement learning, reactive/program synthesis, and abstraction mechanisms. We propose an agenda built around four components: (i) learnable reward models from specifications to support optimization with clear objectives; (ii) adaptive formal verification integrated throughout learning; (iii) online abstraction calibration to quantify the reliability of the model's predictions; and (iv) test-time synthesis and world-model generation guided by verifiers. Together, these components enable agents to synthesize verifiable programs, derive new policies from a small number of interactions, and maintain correctness while adapting to novelty. The resulting framework positions foundation world models as a substrate for learning, reasoning, and adaptation, laying the groundwork for agents that not only act well but can explain and justify the behavior they adopt.
【5】Intrinsic Lorentz Neural Network
标题:内禀洛伦兹神经网络
链接:https://arxiv.org/abs/2602.23981
备注:Published in ICLR 2026
摘要:现实世界的数据经常表现出潜在的层次结构,这可以自然地用双曲几何来表示。尽管最近的双曲神经网络已经展示了有希望的结果,但许多现有架构仍然只是部分内蕴的,将欧几里得运算与双曲运算混合,或依赖外在参数化。为了解决这个问题,我们提出了内禀洛伦兹神经网络(ILNN),一种在洛伦兹模型内完成所有计算的完全内蕴的双曲架构。其核心是一个新颖的点到超平面全连接层(FC),用从特征到学习到的洛伦兹超平面的闭式双曲距离取代传统的欧几里得仿射logits,从而确保所得几何决策函数尊重固有曲率。围绕这一基本层,我们设计了内蕴模块:GyroLBN,一种将陀螺中心化与陀螺缩放相结合的洛伦兹批归一化,在减少训练时间的同时始终优于LBN和GyroBN。我们还为FC输出提出了陀螺加性偏置、一个通过基于digamma的尺度在特征块间对齐期望对数半径的洛伦兹patch拼接算子,以及一个洛伦兹dropout层。在CIFAR-10/100和两个基因组基准(TEB和GUE)上的大量实验表明,ILNN在双曲模型中以其性能与计算成本达到最先进水平,并始终超过强欧几里得基线。该代码可在 https://github.com/Longchentong/ILNN 获得。
摘要:Real-world data frequently exhibit latent hierarchical structures, which can be naturally represented by hyperbolic geometry. Although recent hyperbolic neural networks have demonstrated promising results, many existing architectures remain partially intrinsic, mixing Euclidean operations with hyperbolic ones or relying on extrinsic parameterizations. To address it, we propose the \emph{Intrinsic Lorentz Neural Network} (ILNN), a fully intrinsic hyperbolic architecture that conducts all computations within the Lorentz model. At its core, the network introduces a novel \emph{point-to-hyperplane} fully connected layer (FC), replacing traditional Euclidean affine logits with closed-form hyperbolic distances from features to learned Lorentz hyperplanes, thereby ensuring that the resulting geometric decision functions respect the inherent curvature. Around this fundamental layer, we design intrinsic modules: GyroLBN, a Lorentz batch normalization that couples gyro-centering with gyro-scaling, consistently outperforming both LBN and GyroBN while reducing training time. We additionally propose a gyro-additive bias for the FC output, a Lorentz patch-concatenation operator that aligns the expected log-radius across feature blocks via a digamma-based scale, and a Lorentz dropout layer. Extensive experiments conducted on CIFAR-10/100 and two genomic benchmarks (TEB and GUE) illustrate that ILNN achieves state-of-the-art performance and computational cost among hyperbolic models and consistently surpasses strong Euclidean baselines. The code is available at https://github.com/Longchentong/ILNN.
【6】Hierarchical Concept-based Interpretable Models
标题:基于概念的分层可解释模型
链接:https://arxiv.org/abs/2602.23947
备注:Published as a conference paper at ICLR 2026
摘要:现代深度神经网络由于其潜在表示的不透明性,阻碍了模型的理解、调试和去偏,因此仍然难以解释。概念嵌入模型(CEM)通过将输入映射到人类可解释的概念表示来解决这个问题,并据此预测任务。然而,CEM无法表示概念间的关系,并且在训练过程中需要不同粒度的概念注释,限制了它们的适用性。在本文中,我们介绍了分层概念嵌入模型(HiCEM),这是一个新的CEM家族,可以通过分层结构显式地对概念关系进行建模。为了使HiCEM适用于现实世界的设置,我们提出了概念分裂(Concept Splitting),一种无需额外注释即可从预训练CEM的嵌入空间中自动发现更细粒度子概念的方法。这使得HiCEM能够从有限的概念标签中生成细粒度的解释,从而减轻注释负担。我们在多个数据集上的评估,包括用户研究和在新提出的基于概念的3D厨房渲染数据集PseudoKitchens上的实验,表明(1)概念分裂能发现训练中不存在的人类可解释子概念,这些子概念可用于训练高度准确的HiCEM,以及(2)HiCEM支持不同粒度下强大的测试时概念干预,从而提高任务准确性。
摘要:Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM's embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.
【7】Learning to Build: Autonomous Robotic Assembly of Stable Structures Without Predefined Plans
标题:学习建造:无需预定计划的稳定结构的自主机器人组装
链接:https://arxiv.org/abs/2602.23934
摘要:本文提出了一种新的自主机器人装配框架,用于在不依赖预定义建筑蓝图的情况下构建稳定的结构。建造任务通过目标和障碍物来定义,而非遵循固定计划,使系统能够更灵活地适应建造过程中的环境不确定性和变化。以带后继特征的深度Q学习训练的强化学习(RL)策略充当决策组件。作为概念验证,我们在一个包含15个二维离散积木搭建装配任务的基准上评估了该方法。使用真实世界闭环机器人装置的实验证明了该方法的可行性及其处理施工噪声的能力。结果表明,我们的框架为现实环境中更具适应性和鲁棒性的机器人建造提供了一个有前景的方向。
摘要:This paper presents a novel autonomous robotic assembly framework for constructing stable structures without relying on predefined architectural blueprints. Instead of following fixed plans, construction tasks are defined through targets and obstacles, allowing the system to adapt more flexibly to environmental uncertainty and variations during the building process. A reinforcement learning (RL) policy, trained using deep Q-learning with successor features, serves as the decision-making component. As a proof of concept, we evaluate the approach on a benchmark of 15 2D robotic assembly tasks of discrete block construction. Experiments using a real-world closed-loop robotic setup demonstrate the feasibility of the method and its ability to handle construction noise. The results suggest that our framework offers a promising direction for more adaptable and robust robotic construction in real-world environments.
【8】ULW-SleepNet: An Ultra-Lightweight Network for Multimodal Sleep Stage Scoring
标题:ULW-SleepNet:用于多模式睡眠阶段评分的超轻量级网络
链接:https://arxiv.org/abs/2602.23852
备注:Accepted to ICASSP 2026
摘要:自动睡眠阶段评分对于睡眠障碍的诊断和治疗至关重要。虽然深度学习模型已经推进了该领域,但许多现有模型在计算上要求很高,并且是为单通道脑电图(EEG)设计的,限制了它们对多模态多导睡眠图(PSG)数据的实用性。为了克服这一点,我们提出了ULW-SleepNet,这是一个超轻量的多模态睡眠阶段评分框架,可以有效地整合来自多个生理信号的信息。ULW-SleepNet集成了一种新颖的双流可分离卷积(DSSC)块,独立可分离卷积,通道式参数共享和全局平均池化,以减少计算开销,同时保持有竞争力的准确性。在Sleep-EDF-20和Sleep-EDF-78数据集上进行评估,ULW-SleepNet分别实现了86.9%和81.4%的准确率,仅使用13.3K参数和7.89M FLOP。与最先进的方法相比,我们的模型减少了高达98.6%的参数,只有边际性能损失,这表明其在可穿戴和物联网设备上实时睡眠监测的强大潜力。本研究的源代码可在https://github.com/wzw999/ULW-SLEEPNET上公开获得。
摘要:Automatic sleep stage scoring is crucial for the diagnosis and treatment of sleep disorders. Although deep learning models have advanced the field, many existing models are computationally demanding and designed for single-channel electroencephalography (EEG), limiting their practicality for multimodal polysomnography (PSG) data. To overcome this, we propose ULW-SleepNet, an ultra-lightweight multimodal sleep stage scoring framework that efficiently integrates information from multiple physiological signals. ULW-SleepNet incorporates a novel Dual-Stream Separable Convolution (DSSC) Block, depthwise separable convolutions, channel-wise parameter sharing, and global average pooling to reduce computational overhead while maintaining competitive accuracy. Evaluated on the Sleep-EDF-20 and Sleep-EDF-78 datasets, ULW-SleepNet achieves accuracies of 86.9% and 81.4%, respectively, with only 13.3K parameters and 7.89M FLOPs. Compared to state-of-the-art methods, our model reduces parameters by up to 98.6% with only marginal performance loss, demonstrating its strong potential for real-time sleep monitoring on wearable and IoT devices. The source code for this study is publicly available at https://github.com/wzw999/ULW-SLEEPNET.
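The abstract's main efficiency ingredients are standard building blocks; below is a generic PyTorch sketch of a 1-D depthwise-separable convolution followed by global average pooling, not the paper's DSSC block itself.

# Canonical 1-D depthwise-separable convolution + global average pooling,
# the kind of lightweight building block the abstract describes. A generic
# sketch, not the paper's DSSC block.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, c_in, c_out, k=7):
        super().__init__()
        self.depthwise = nn.Conv1d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv1d(c_in, c_out, 1)   # 1x1 channel mixing
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(8, 4, 3000)                 # (batch, channels, samples)
block = DepthwiseSeparableConv1d(4, 16)
pooled = block(x).mean(dim=-1)              # global average pooling -> (8, 16)
print(pooled.shape)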
【9】Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective
标题:在约束未知的环境中通过专家演示学习保持安全:Q学习视角
链接:https://arxiv.org/abs/2602.23816
备注:Accepted for publication at AAMAS 2026
摘要:给定一组在受约束MDP(奖励可观测,但约束未知、成本不可观测)中安全执行任务的演示轨迹,我们的目标是找到一种策略,使演示轨迹的似然最大化,在保守性与显著提高高奖励(但可能包含不安全步骤)轨迹的似然之间取得平衡。基于这些目标,我们学习一种策略,使相对于演示而言最有“前景”的轨迹概率最大化。为此,我们用 $Q$ 值刻画单个状态-动作对的“前景”,它既取决于任务特定的奖励,也取决于对状态安全性的评估,将奖励与安全的期望相结合。这引出了约束下逆学习问题的安全Q学习视角:我们设计的 Safe $Q$ Inverse Constrained Reinforcement Learning(SafeQIL)算法在一组具有挑战性的基准任务上与最先进的逆约束强化学习算法进行了比较,展示了其优势。
摘要:Given a set of trajectories demonstrating the execution of a task safely in a constrained MDP with observable rewards but with unknown constraints and non-observable costs, we aim to find a policy that maximizes the likelihood of demonstrated trajectories, trading off between being conservative and significantly increasing the likelihood of high-rewarding trajectories that may contain potentially unsafe steps. Having these objectives, we aim towards learning a policy that maximizes the probability of the most \textit{promising} trajectories with respect to the demonstrations. In so doing, we formulate the ``promise'' of individual state-action pairs in terms of $Q$ values, which depend on task-specific rewards as well as on the assessment of states' safety, mixing expectations in terms of rewards and safety. This entails a safe Q-learning perspective on the inverse learning problem under constraints: the devised Safe $Q$ Inverse Constrained Reinforcement Learning (SafeQIL) algorithm is compared to state-of-the-art inverse constrained reinforcement learning algorithms on a set of challenging benchmark tasks, showing its merits.
【10】GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks
标题:GRAIL:通过线性重建对压缩网络进行事后补偿
链接:https://arxiv.org/abs/2602.23795
备注:Conference on Parsimony and Learning (CPAL)
摘要:结构化的深度模型压缩方法对硬件友好,能大幅降低内存和推理成本。然而,在激进压缩下,由此产生的精度下降通常需要压缩后微调,而由于缺少标注数据或训练成本高,微调往往不切实际。我们提出称为GRAIL的事后分块补偿方法:一个在模型压缩之后进行、无需微调的简单步骤,利用一个小的校准集恢复每个块的输入输出行为。该方法通过Gram矩阵汇总隐藏激活,并用岭回归从压缩后的表示线性重建原始隐藏表示;得到的重建映射被吸收进下游投影权重,而上游层被压缩。该方法与选择器无关(幅度、Wanda、基于Gram的选择或折叠),是数据感知的(只需少量前向传播,无需梯度或标签),并且当Gram矩阵接近单位矩阵(表明通道间相关性较弱)时退化为经典的剪枝或折叠。在ResNet、ViT和仅解码器LLM上,GRAIL在实际压缩区间内持续改进无数据和数据感知的剪枝或折叠基线的准确率或困惑度,开销可控且无需反向传播。该代码可在https://github.com/TWWinde/GRAIL上获得。
摘要:Structured deep model compression methods are hardware-friendly and substantially reduce memory and inference costs. However, under aggressive compression, the resulting accuracy degradation often necessitates post-compression finetuning, which can be impractical due to missing labeled data or high training cost. We propose post-hoc blockwise compensation, called GRAIL, a simple zero-finetuning step applied after model compression that restores each block's input-output behavior using a small calibration set. The method summarizes hidden activations via a Gram matrix and applies ridge regression to linearly reconstruct the original hidden representation from the reduced one. The resulting reconstruction map is absorbed into the downstream projection weights, while the upstream layer is compressed. The approach is selector-agnostic (Magnitude, Wanda, Gram-based selection, or folding), data-aware (requiring only a few forward passes without gradients or labels), and recovers classic pruning or folding when the Gram matrix is near identity, indicating weak inter-channel correlations. Across ResNets, ViTs, and decoder-only LLMs, GRAIL consistently improves accuracy or perplexity over data-free and data-aware pruning or folding baselines in practical compression regimes, with manageable overhead and no backpropagation. The code is available at https://github.com/TWWinde/GRAIL.
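The compensation step described in the abstract is a ridge regression computable from calibration activations; a numpy sketch under assumed shapes and an assumed x @ W weight convention:

# Sketch of the ridge-regression compensation the abstract describes:
# linearly reconstruct original activations A_orig from compressed ones
# A_red on a small calibration set, then absorb the map into downstream
# weights. Shapes and the absorption convention are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d_red, d_orig = 256, 48, 64
A_red = rng.standard_normal((n, d_red))                 # compressed-block activations
A_orig = A_red @ rng.standard_normal((d_red, d_orig))   # toy "original" activations

lam = 1e-3
G = A_red.T @ A_red                          # Gram-matrix summary of activations
M = np.linalg.solve(G + lam * np.eye(d_red), A_red.T @ A_orig)  # ridge map

W_down = rng.standard_normal((d_orig, 32))   # downstream projection (x @ W_down)
W_fused = M @ W_down                         # absorbed: A_red @ W_fused ~ A_orig @ W_down
print(np.linalg.norm(A_red @ W_fused - A_orig @ W_down))  # small residual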
【11】TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure
标题:TradeFM:交易流和市场微观结构的生成基础模型
链接:https://arxiv.org/abs/2602.23784
备注:29 pages, 17 figures, 6 tables. Preprint
摘要:基础模型通过从大规模的异构数据中学习通用表示,将领域从语言转变为基因组学。我们介绍TradeFM,一个524 M参数生成Transformer,将这种范式引入市场微观结构,直接从超过9 K股票的数十亿交易事件中学习。为了实现跨资产泛化,我们开发了尺度不变的特征和通用的标记化方案,将订单流的异构、多模态事件流映射到统一的离散序列中-消除了特定于资产的校准。与确定性市场模拟器相结合,TradeFM生成的推出再现了金融回报的关键程式化事实,包括重尾,波动性聚集和回报自相关的缺乏。TradeFM的分布误差比Compound Hawkes基线低2- 3倍,并将zero-shot推广到地理分布外的亚太市场,具有适度的困惑度退化。总之,这些结果表明,标度不变的交易表示捕捉市场微观结构中的可转移结构,开辟了一条通往合成数据生成,压力测试和基于学习的交易代理的道路。
摘要:Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions of trade events across >9K equities. To enable cross-asset generalization, we develop scale-invariant features and a universal tokenization scheme that map the heterogeneous, multi-modal event stream of order flow into a unified discrete sequence -- eliminating asset-specific calibration. Integrated with a deterministic market simulator, TradeFM-generated rollouts reproduce key stylized facts of financial returns, including heavy tails, volatility clustering, and absence of return autocorrelation. Quantitatively, TradeFM achieves 2-3x lower distributional error than Compound Hawkes baselines and generalizes zero-shot to geographically out-of-distribution APAC markets with moderate perplexity degradation. Together, these results suggest that scale-invariant trade representations capture transferable structure in market microstructure, opening a path toward synthetic data generation, stress testing, and learning-based trading agents.
【12】Any Model, Any Place, Any Time: Get Remote Sensing Foundation Model Embeddings On Demand
标题:任何模型、任何地点、任何时间:按需获取遥感基础模型嵌入
链接:https://arxiv.org/abs/2602.23678
摘要:遥感界正在见证基础模型的快速增长,这些模型为广泛的下游任务提供了强大的嵌入。然而,由于模型发布格式、平台和接口以及输入数据规范的巨大异质性,实际采用和公平比较仍然具有挑战性。这些不一致性显著增加了获取、使用和基准测试跨模型嵌入的成本。为了解决这个问题,我们提出了rs-embed,这是一个Python库,它提供了一个统一的、以感兴趣区域(ROI)为中心的界面:只需一行代码,用户就可以从任何支持的模型中检索任何位置和任何时间范围的嵌入。该库还提供高效的批处理,以实现大规模嵌入生成和评估。该代码可在以下网址获得:https://github.com/cybergis/rs-embed
摘要:The remote sensing community is witnessing a rapid growth of foundation models, which provide powerful embeddings for a wide range of downstream tasks. However, practical adoption and fair comparison remain challenging due to substantial heterogeneity in model release formats, platforms and interfaces, and input data specifications. These inconsistencies significantly increase the cost of obtaining, using, and benchmarking embeddings across models. To address this issue, we propose rs-embed, a Python library that offers a unified, region-of-interest (ROI) centric interface: with a single line of code, users can retrieve embeddings from any supported model for any location and any time range. The library also provides efficient batch processing to enable large-scale embedding generation and evaluation. The code is available at: https://github.com/cybergis/rs-embed
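The repository holds the authoritative interface; purely as a hypothetical illustration of the one-line, ROI-centric usage the abstract describes (all names below are invented, not taken from rs-embed):

# HYPOTHETICAL usage sketch only -- the function and argument names below are
# invented for illustration; consult https://github.com/cybergis/rs-embed for
# the actual interface.
import rs_embed

emb = rs_embed.get_embeddings(
    model="<any supported model>",
    roi=(-88.3, 40.0, -88.1, 40.2),          # region of interest (lon/lat bbox)
    time_range=("2023-01-01", "2023-12-31"),
)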
【13】Hybrid Quantum Temporal Convolutional Networks
标题:混合量子时态卷积网络
链接:https://arxiv.org/abs/2602.23578
摘要:用于序列数据的量子机器学习模型面临复杂多变量信号的可扩展性挑战。我们介绍了混合量子时间卷积网络(HQTCN),它将经典的时间窗口与量子卷积神经网络核心相结合。通过跨时间窗口应用共享量子电路,HQTCN捕获长程依赖性,同时实现显著的参数减少。在合成NARMA序列和高维EEG时间序列上进行评估,HQTCN在单变量数据上与经典基线具有竞争力,在多变量任务上优于所有基线。该模型在数据有限的条件下表现出特别的优势,保持高性能,比传统方法的参数少得多。这些结果建立HQTCN作为多变量时间序列分析的参数有效的方法。
摘要:Quantum machine learning models for sequential data face scalability challenges with complex multivariate signals. We introduce the Hybrid Quantum Temporal Convolutional Network (HQTCN), which combines classical temporal windowing with a quantum convolutional neural network core. By applying a shared quantum circuit across temporal windows, HQTCN captures long-range dependencies while achieving significant parameter reduction. Evaluated on synthetic NARMA sequences and high-dimensional EEG time-series, HQTCN performs competitively with classical baselines on univariate data and outperforms all baselines on multivariate tasks. The model demonstrates particular strength under data-limited conditions, maintaining high performance with substantially fewer parameters than conventional approaches. These results establish HQTCN as a parameter-efficient approach for multivariate time-series analysis.
【14】Dynamics of Learning under User Choice: Overspecialization and Peer-Model Probing
标题:用户选择下的学习动态:过度专业化和同行模型探索
链接:https://arxiv.org/abs/2602.23565
摘要:在许多部署机器学习的经济相关场景中,多个平台从同一个用户池获取数据,而每个用户会选择最适合自己的平台。此前针对该设定的工作完全关注学习者在其所观测数据分布上的“局部”损失。我们发现,存在这样的情形:使用现有算法的学习者几乎必然收敛到全局性能任意差的模型,即使存在全人群损失很低的模型。这通过一种反馈诱导机制发生,我们称之为过度专业化陷阱:当学习者为已经偏好他们的用户进行优化时,他们对这一群体之外的用户的吸引力下降,这进一步限制了他们所能观测到的数据。受现代机器学习中知识蒸馏应用的启发,我们提出一种允许学习者“探测”同行模型预测的算法,使其能够了解未选择自己的用户。我们的分析刻画了探测何时成功:当探测来源足够有信息量时(例如一个已知的市场领导者,或大多数全局性能良好的同行),该过程几乎必然收敛到全人群风险有界的驻点。我们通过在MovieLens、Census和Amazon Sentiment数据集上的半合成实验验证了我们的发现。
摘要:In many economically relevant contexts where machine learning is deployed, multiple platforms obtain data from the same pool of users, each of whom selects the platform that best serves them. Prior work in this setting focuses exclusively on the "local" losses of learners on the distribution of data that they observe. We find that there exist instances where learners who use existing algorithms almost surely converge to models with arbitrarily poor global performance, even when models with low full-population loss exist. This happens through a feedback-induced mechanism, which we call the overspecialization trap: as learners optimize for users who already prefer them, they become less attractive to users outside this base, which further restricts the data they observe. Inspired by the recent use of knowledge distillation in modern ML, we propose an algorithm that allows learners to "probe" the predictions of peer models, enabling them to learn about users who do not select them. Our analysis characterizes when probing succeeds: this procedure converges almost surely to a stationary point with bounded full-population risk when probing sources are sufficiently informative, e.g., a known market leader or a majority of peers with good global performance. We verify our findings with semi-synthetic experiments on the MovieLens, Census, and Amazon Sentiment datasets.
【15】Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG
标题:Brain-OF:fMRI、EEG和MEG的全功能基础模型
链接:https://arxiv.org/abs/2602.23410
摘要:大脑基础模型在广泛的神经科学任务中取得了显着的进步。然而,大多数现有的模型仅限于一个单一的功能模式,限制了他们的能力,利用互补的时空动态和集体的数据规模在成像技术。为了解决这一限制,我们提出了Brain-OF,这是第一个在fMRI,EEG和MEG上联合预训练的多功能大脑基础模型,能够在统一的框架内处理单峰和多峰输入。为了协调异质时空分辨率,我们引入了任意分辨率神经信号采样器,它将不同的大脑信号投射到共享的语义空间中。为了进一步管理语义转换,Brain-OF骨干将DINT注意力与稀疏混合专家集成在一起,其中共享专家捕获模态不变的表示,路由专家专注于模态特定的语义。此外,我们还提出了掩蔽的时频建模,这是一种双域预训练目标,可以在时域和频域共同重建大脑信号。Brain-OF在包含约40个数据集的大规模语料库上进行了预训练,并在各种下游任务中表现出卓越的性能,突出了联合多模态集成和双域预训练的优势。
摘要:Brain foundation models have achieved remarkable advances across a wide range of neuroscience tasks. However, most existing models are limited to a single functional modality, restricting their ability to exploit complementary spatiotemporal dynamics and the collective data scale across imaging techniques. To address this limitation, we propose Brain-OF, the first omnifunctional brain foundation model jointly pretrained on fMRI, EEG and MEG, capable of handling both unimodal and multimodal inputs within a unified framework. To reconcile heterogeneous spatiotemporal resolutions, we introduce the Any-Resolution Neural Signal Sampler, which projects diverse brain signals into a shared semantic space. To further manage semantic shifts, the Brain-OF backbone integrates DINT attention with a Sparse Mixture of Experts, where shared experts capture modality-invariant representations and routed experts specialize in modality-specific semantics. Furthermore, we propose Masked Temporal-Frequency Modeling, a dual-domain pretraining objective that jointly reconstructs brain signals in both the time and frequency domains. Brain-OF is pretrained on a large-scale corpus comprising around 40 datasets and demonstrates superior performance across diverse downstream tasks, highlighting the benefits of joint multimodal integration and dual-domain pretraining.
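A minimal sketch of what a dual-domain reconstruction objective can look like; the masking scheme and equal time/frequency weighting below are assumptions, not the paper's Masked Temporal-Frequency Modeling:

# Sketch of a dual-domain reconstruction objective: reconstruct masked
# signal segments and penalize error in both time and frequency domains.
# The mask scheme and 1:1 weighting are illustrative assumptions.
import torch

def dual_domain_loss(pred, target, mask):
    t_loss = ((pred - target) ** 2 * mask).sum() / mask.sum()     # time domain
    f_pred, f_tgt = torch.fft.rfft(pred, dim=-1), torch.fft.rfft(target, dim=-1)
    f_loss = (f_pred - f_tgt).abs().pow(2).mean()                 # frequency domain
    return t_loss + f_loss

x = torch.randn(4, 8, 512)                       # (batch, channels, time)
mask = (torch.rand_like(x) < 0.5).float()        # 50% masked positions
print(dual_domain_loss(torch.randn_like(x), x, mask))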
【16】BLISSNet: Deep Operator Learning for Fast and Accurate Flow Reconstruction from Sparse Sensor Measurements
标题:BLISSNet:用于从稀疏传感器测量快速准确重建流场的深度算子学习
链接:https://arxiv.org/abs/2602.24228
摘要:从稀疏传感器测量重构流体流动是科学和工程中的基本挑战。相距很远的测量点和复杂的多尺度动力学使得精确恢复精细尺度结构变得困难。此外,现有方法面临一个持续的权衡:高精度模型通常计算昂贵,而更快的方法通常会牺牲保真度。在这项工作中,我们提出BLISSNet,一个在流场重建和基于nudging的数据同化中、在重建精度与计算效率之间取得良好平衡的模型。该模型采用类似DeepONet的架构,能够对任意大小的域进行zero-shot推理。在给定域上第一次调用模型之后,某些网络组件可以被预先计算,从而降低后续在大型域上评估的推理成本。因此,该模型的推理速度可以超过径向基函数或双三次插值等经典插值方法。这种高精度、低成本和zero-shot泛化的组合使BLISSNet非常适合大规模实时流场重建和数据同化任务。
摘要:Reconstructing fluid flows from sparse sensor measurements is a fundamental challenge in science and engineering. Widely separated measurements and complex, multiscale dynamics make accurate recovery of fine-scale structures difficult. In addition, existing methods face a persistent tradeoff: high-accuracy models are often computationally expensive, whereas faster approaches typically compromise fidelity. In this work, we introduce BLISSNet, a model that strikes a strong balance between reconstruction accuracy and computational efficiency for both flow reconstruction and nudging-based data assimilation. The model follows a DeepONet-like architecture, enabling zero-shot inference on domains of arbitrary size. After the first model call on a given domain, certain network components can be precomputed, leading to low inference cost for subsequent evaluations on large domains. Consequently, the model can achieve faster inference than classical interpolation methods such as radial basis function or bicubic interpolation. This combination of high accuracy, low cost, and zero-shot generalization makes BLISSNet well-suited for large-scale real-time flow reconstruction and data assimilation tasks.
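The precomputation trick follows from the DeepONet factorization u(y) ≈ Σ_k b_k(sensors) t_k(y): the trunk depends only on query locations. A generic sketch (not BLISSNet's architecture):

# Generic DeepONet-style sketch (not BLISSNet itself): the trunk network is
# evaluated once on a fixed query grid and cached; each new set of sensor
# readings then needs only a branch forward pass and one matmul.
import torch
import torch.nn as nn

branch = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
trunk = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 16))

grid = torch.rand(10_000, 2)            # fixed 2-D query locations
with torch.no_grad():
    T = trunk(grid)                     # precomputed once per domain: (10000, 16)

sensors = torch.rand(5, 32)             # 5 sets of sparse sensor measurements
field = branch(sensors) @ T.T           # (5, 10000): reconstructed fields
print(field.shape)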
【17】General Bayesian Policy Learning
标题:一般贝叶斯政策学习
链接:https://arxiv.org/abs/2602.23672
摘要:本研究提出了政策学习的一般贝叶斯(General Bayes)框架。我们考虑决策者从行动集中选择一个行动以最大化其期望福利的决策问题,典型例子包括治疗选择和投资组合选择。在这类问题中,统计目标是一个决策规则,对每个结果 $Y(a)$ 的预测并非主要关注点。我们通过基于损失的贝叶斯更新来表述这一政策学习问题,主要技术工具是福利最大化的平方损失替代。我们证明,在政策类上最大化经验福利等价于最小化结果差异的缩放平方误差,至多相差一个由调节参数 $\zeta>0$ 控制的二次正则项。这一改写得到了决策规则上的一般贝叶斯后验,并可给出高斯伪似然解释。我们阐明了所得广义后验的两种贝叶斯解释:工作高斯视角与基于损失的决策论视角。作为一个实现示例,我们引入输出经双曲正切压缩的神经网络。最后,我们给出PAC-Bayes风格的理论保证。
摘要:This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss surrogate for welfare maximization. We show that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$. This rewriting yields a General Bayes posterior over decision rules that admits a Gaussian pseudo-likelihood interpretation. We clarify two Bayesian interpretations of the resulting generalized posterior, a working Gaussian view and a decision-theoretic loss-based view. As one implementation example, we introduce neural networks with tanh-squashed outputs. Finally, we provide theoretical guarantees in a PAC-Bayes style.
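The surrogate equivalence can be seen from a one-line expansion; the notation below (decision rule $d$, outcome difference $\tau$) is chosen here for illustration, not taken from the paper:

$$\mathbb{E}\big[(\tau/\zeta - d)^2\big] = \frac{\mathbb{E}[\tau^2]}{\zeta^2} - \frac{2}{\zeta}\,\mathbb{E}[d\,\tau] + \mathbb{E}[d^2].$$

Since the first term does not depend on $d$, maximizing the welfare term $\mathbb{E}[d\,\tau]$ coincides with minimizing the scaled squared error up to the quadratic regularizer $\mathbb{E}[d^2]$, whose relative weight is governed by $\zeta$.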
【18】Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models
标题:利用辅助变量引导生成模型揭示暗物质晕结构的物理驱动因素
链接:https://arxiv.org/abs/2602.23518
摘要:深度生成模型(DGMs)压缩高维数据,但通常在其潜在空间中纠缠不同的物理因素。我们提出了一个非线性变量指导的框架解开表示的热Sunyaev-Zel'dovich(tSZ)地图的暗物质晕。我们引入晕质量和浓度作为辅助变量,并应用一个轻量级的对齐惩罚,以鼓励潜在的尺寸,以反映这些物理量。为了生成尖锐和现实的样本,我们扩展了潜在的条件流匹配(LCFM),一个国家的最先进的生成模型,在潜在的空间中执行解纠缠。我们的解开潜在CFM(DL-CFM)模型恢复建立的质量浓度标度关系,并确定潜在的空间异常值,可能对应于不寻常的晕形成的历史。通过将潜在坐标与可解释的天体物理性质联系起来,我们的方法将潜在空间转化为宇宙学结构的诊断工具。这项工作表明,辅助指导保留了生成的灵活性,同时产生物理上有意义的,解开嵌入,提供了一个可推广的途径,揭示复杂的天文数据集中的独立因素。
摘要:Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliary-variable-guided framework for disentangling representations of thermal Sunyaev-Zel'dovich (tSZ) maps of dark matter halos. We introduce halo mass and concentration as auxiliary variables and apply a lightweight alignment penalty to encourage latent dimensions to reflect these physical quantities. To generate sharp and realistic samples, we extend latent conditional flow matching (LCFM), a state-of-the-art generative model, to enforce disentanglement in the latent space. Our Disentangled Latent-CFM (DL-CFM) model recovers the established mass-concentration scaling relation and identifies latent space outliers that may correspond to unusual halo formation histories. By linking latent coordinates to interpretable astrophysical properties, our method transforms the latent space into a diagnostic tool for cosmological structure. This work demonstrates that auxiliary guidance preserves generative flexibility while yielding physically meaningful, disentangled embeddings, providing a generalizable pathway for uncovering independent factors in complex astronomical datasets.
【19】Universality of Shallow and Deep Neural Networks on Non-Euclidean Spaces
标题:非欧几里得空间上浅层和深层神经网络的普遍性
链接:https://arxiv.org/abs/2602.23381
备注:23 pages, 35 references
摘要:我们为输入取值于一般拓扑空间的浅层和深层神经网络建立了一个框架。该模型由一族给定的连续特征映射和一个固定的标量激活函数构成,在欧几里得情形下退化为多层前馈网络。我们关注普适逼近性质,并建立了使此类网络在任意及局部凸拓扑空间上的连续向量值函数空间中稠密的一般条件。在没有宽度限制的情况下,我们得到了将经典逼近定理推广到非欧几里得情形的普适性结果。本文的核心是深窄(deep narrow)框架:每个隐藏层的宽度一致有界,而深度允许增长。我们给出了此类宽度受限的深度网络保持普适逼近能力的条件。作为具体例子,我们利用 Ostrand 对 Kolmogorov 叠加定理的推广,为紧度量空间的乘积导出了显式的普适性结果,其宽度界以拓扑维数表示。
摘要:We develop a framework for shallow and deep neural networks whose inputs range over a general topological space. The model is built from a prescribed family of continuous feature maps and a fixed scalar activation function, and it reduces to multilayer feedforward networks in the Euclidean case. We focus on the universal approximation property and establish general conditions under which such networks are dense in spaces of continuous vector-valued functions on arbitrary and locally convex topological spaces. In the absence of width constraints, we obtain universality results that extend classical approximation theorems to non-Euclidean settings. A central focus of the paper is the deep narrow framework, in which the width of each hidden layer is uniformly bounded while the depth is allowed to grow. We identify conditions under which such width constrained deep networks retain universal approximation power. As a concrete example, we employ Ostrand's extension of the Kolmogorov superposition theorem to derive an explicit universality result for products of compact metric spaces, with width bounds expressed in terms of topological dimension.
其他(33篇)
【1】Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
标题:驯服动量:通过低秩近似重新思考优化器状态
链接:https://arxiv.org/abs/2602.24283
备注:Camera-ready version. Accepted as Oral at ICLR 2026
摘要:像Adam和Muon这样的现代优化器是训练大型语言模型的核心,但它们对一阶和二阶动量的依赖会引入显著的内存开销,限制了可扩展性和计算效率。在这项工作中,我们将这些动量中使用的指数移动平均(EMA)重新表述为通过在线梯度流训练一个线性回归器。基于这一等价性,我们引入LoRA-Pre,一种面向高效预训练的新型低秩优化器。具体而言,LoRA-Pre通过将完整的动量矩阵分解到在线线性学习器内的紧凑低秩子空间来减少优化器的内存占用,在提升内存效率的同时保持优化性能。我们通过预训练Llama架构家族的模型(从60M到1B参数)实证验证了LoRA-Pre的有效性:LoRA-Pre在所有模型规模上都取得了最高性能。值得注意的是,LoRA-Pre表现出显著的秩效率,仅使用基线方法1/8的秩即可获得相当或更优的结果。除预训练外,我们还评估了LoRA-Pre在微调场景中的有效性:在相同的秩下,LoRA-Pre始终优于所有高效微调基线。具体来说,与标准LoRA相比,LoRA-Pre在Llama-3.1-8B上提升3.14分,在Llama-2-7B上提升6.17分,验证了我们的方法在预训练和微调两种范式中的有效性。我们的代码可在https://github.com/mrflogs/LoRA-Pre上公开获取。
摘要:Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces significant memory overhead, which constrains scalability and computational efficiency. In this work, we reframe the exponential moving average (EMA) used in these momenta as the training of a linear regressor via online gradient flow. Building on this equivalence, we introduce LoRA-Pre, a novel low-rank optimizer designed for efficient pre-training. Specifically, LoRA-Pre reduces the optimizer's memory footprint by decomposing the full momentum matrix into a compact low-rank subspace within the online linear learner, thereby maintaining optimization performance while improving memory efficiency. We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family, scaling from 60M to 1B parameters. LoRA-Pre achieves the highest performance across all model sizes. Notably, LoRA-Pre demonstrates remarkable rank efficiency, achieving comparable or superior results using only 1/8 the rank of baseline methods. Beyond pre-training, we evaluate LoRA-Pre's effectiveness in fine-tuning scenarios. With the same rank, LoRA-Pre consistently outperforms all efficient fine-tuning baselines. Specifically, compared to standard LoRA, LoRA-Pre achieves substantial improvements of 3.14 points on Llama-3.1-8B and 6.17 points on Llama-2-7B, validating our approach's effectiveness across both pre-training and fine-tuning paradigms. Our code is publicly available at https://github.com/mrflogs/LoRA-Pre.
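The EMA-as-online-regression view can be made concrete with a one-step calculation (notation assumed here): one gradient step of size $\eta$ on the instantaneous squared loss $\frac{1}{2}\|v - g_t\|^2$ gives

$$v_t = v_{t-1} - \eta\,(v_{t-1} - g_t) = (1-\eta)\,v_{t-1} + \eta\,g_t,$$

which is exactly an exponential moving average of the gradients with decay $\beta = 1-\eta$. Treating momentum as such an online linear learner is what lets LoRA-Pre restrict it to a compact low-rank subspace.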
【2】Memory Caching: RNNs with Growing Memory
标题:内存缓存:内存不断增长的RNN
链接:https://arxiv.org/abs/2602.24281
摘要:Transformer已被确立为序列建模最新进展的事实骨干,这主要得益于其随上下文长度增长的内存容量。这一性质虽然对检索任务有利,却带来了二次复杂度,因此促使近期研究探索可行的次二次循环替代方案。尽管在不同领域展现了有希望的初步结果,这类循环架构在召回密集型任务中仍逊于Transformer,这通常被归因于其固定大小的内存。本文提出内存缓存(MC),一种简单而有效的技术,通过缓存内存状态(即隐藏状态)的检查点来增强循环模型。内存缓存使RNN的有效内存容量能够随序列长度增长,提供了一种灵活的权衡,在RNN的固定内存(即 $O(L)$ 复杂度)与Transformer的不断增长的内存(即 $O(L^2)$ 复杂度)之间进行插值。我们提出了MC的四种变体,包括门控聚合和稀疏选择机制,并讨论了它们对线性和深层记忆模块的影响。在语言建模和长上下文理解任务上的实验结果表明,MC提升了循环模型的性能,支持其有效性。上下文内召回任务的结果表明,虽然Transformer达到最佳准确率,我们的MC变体展现出有竞争力的性能,缩小了与Transformer的差距,并优于最先进的循环模型。
摘要:Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity that scales with the context length. While plausible for retrieval tasks, it causes quadratic complexity and so has motivated recent studies to explore viable subquadratic recurrent alternatives. Despite showing promising preliminary results in diverse domains, such recurrent architectures underperform Transformers in recall-intensive tasks, often attributed to their fixed-size memory. In this paper, we introduce Memory Caching (MC), a simple yet effective technique that enhances recurrent models by caching checkpoints of their memory states (a.k.a. hidden states). Memory Caching allows the effective memory capacity of RNNs to grow with sequence length, offering a flexible trade-off that interpolates between the fixed memory (i.e., $O(L)$ complexity) of RNNs and the growing memory (i.e., $O(L^2)$ complexity) of Transformers. We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules. Our experimental results on language modeling and long-context understanding tasks show that MC enhances the performance of recurrent models, supporting its effectiveness. The results of in-context recall tasks indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and perform better than state-of-the-art recurrent models.
【3】Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
标题:通过神经机制稀疏化有效发现近似因果抽象
链接:https://arxiv.org/abs/2602.24266
摘要:神经网络被假设实现了可解释的因果机制,但验证这一点需要找到一个因果抽象,即一个更简单的、在干预下忠实于该网络的高层结构因果模型(SCM)。发现这样的抽象很困难:通常需要穷举式的交换干预或重新训练。我们将结构化剪枝视为对近似抽象的搜索,从而重新表述该问题。把训练好的网络看作确定性SCM,我们推导出一个干预风险目标,其二阶展开给出了用常数替换单元或将其折叠进邻居的闭式准则。在曲率均匀时,我们的评分退化为激活方差,将基于方差的剪枝作为特例恢复,同时阐明其何时失效。由此得到的流程能高效地从预训练网络中提取稀疏且对干预忠实的抽象,我们通过交换干预加以验证。
摘要:Neural networks are hypothesized to implement interpretable causal mechanisms, yet verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. Discovering such abstractions is hard: it typically demands brute-force interchange interventions or retraining. We reframe the problem by viewing structured pruning as a search over approximate abstractions. Treating a trained network as a deterministic SCM, we derive an Interventional Risk objective whose second-order expansion yields closed-form criteria for replacing units with constants or folding them into neighbors. Under uniform curvature, our score reduces to activation variance, recovering variance-based pruning as a special case while clarifying when it fails. The resulting procedure efficiently extracts sparse, intervention-faithful abstractions from pretrained networks, which we validate via interchange interventions.
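The "replace units with constants" operation has a simple concrete form: fold a near-constant unit's mean activation into the next layer's bias. A numpy sketch under assumed layer shapes (h @ W2 + b2):

# Sketch: replace a low-variance hidden unit with its mean activation and
# fold that constant into the next layer's bias. Layer shapes and the
# h @ W2 + b2 convention are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((1000, 8)) * np.array([1, 1, 1, 0.01, 1, 1, 1, 1])
W2 = rng.standard_normal((8, 4))
b2 = np.zeros(4)
out_full = H @ W2 + b2                  # original block output

i = int(np.argmin(H.var(axis=0)))       # variance score: near-constant unit
b2_new = b2 + H[:, i].mean() * W2[i]    # fold its mean activation into the bias
out_pruned = np.delete(H, i, axis=1) @ np.delete(W2, i, axis=0) + b2_new
print(np.abs(out_full - out_pruned).max())  # small, since the unit is ~constant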
【4】Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web--Knowledge--Web Pipeline
标题:面向特定领域供应商发现的覆盖感知网络爬取:一条Web-知识-Web管道
链接:https://arxiv.org/abs/2602.24262
摘要:查明专门工业部门中小企业的全貌对供应链韧性至关重要,但现有商业数据库存在很大的覆盖缺口,对次级供应商和新兴利基市场的公司尤其如此。我们提出一个Web-知识-Web(W→K→W)管道,迭代地:(1)抓取特定领域的Web源以发现候选供应商实体;(2)抽取结构化知识并合并到异构知识图中;(3)利用知识图的拓扑结构和覆盖信号,引导后续抓取转向供应商空间中代表性不足的区域。为了量化发现的完整性,我们引入一个受生态学物种丰富度估计器(Chao1、ACE)启发的覆盖估计框架。在半导体设备制造行业(NAICS 333242)上的实验表明,在相同的213页抓取预算下,W→K→W流水线在所有方法中取得了最高的精确率(0.138)和F1(0.118),构建了包含765个实体和586个关系的知识图,并仅用112页就在第3轮迭代达到召回率峰值。
摘要:Identifying the full landscape of small and medium-sized enterprises (SMEs) in specialized industry sectors is critical for supply-chain resilience, yet existing business databases suffer from substantial coverage gaps -- particularly for sub-tier suppliers and firms in emerging niche markets. We propose a \textbf{Web--Knowledge--Web (W$\to$K$\to$W)} pipeline that iteratively (1) crawls domain-specific web sources to discover candidate supplier entities, (2) extracts and consolidates structured knowledge into a heterogeneous knowledge graph, and (3) uses the knowledge graph's topology and coverage signals to guide subsequent crawling toward under-represented regions of the supplier space. To quantify discovery completeness, we introduce a \textbf{coverage estimation framework} inspired by ecological species-richness estimators (Chao1, ACE) adapted for web-entity populations. Experiments on the semiconductor equipment manufacturing sector (NAICS 333242) demonstrate that the W$\to$K$\to$W pipeline achieves the highest precision (0.138) and F1 (0.118) among all methods using the same 213-page crawl budget, building a knowledge graph of 765 entities and 586 relations while reaching peak recall by iteration 3 with only 112 pages.
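The classical Chao1 estimator named in the abstract has a closed form from singleton and doubleton counts; a tiny Python rendering (the paper's web-entity adaptation may differ):

# Classical Chao1 species-richness estimator: S_hat = S_obs + f1^2 / (2*f2),
# where f1/f2 count entities discovered exactly once/twice. The paper's
# adaptation to web-entity populations may differ.
from collections import Counter

def chao1(observations):
    counts = Counter(observations)      # entity -> number of times discovered
    f = Counter(counts.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 == 0:                         # bias-corrected variant when f2 = 0
        return len(counts) + f1 * (f1 - 1) / 2
    return len(counts) + f1 * f1 / (2 * f2)

print(chao1(["a", "a", "b", "c", "c", "d", "e"]))  # 5 observed -> 7.25 estimated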
【5】Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
标题:分块注意力Transducer:快速准确的流式语音转文本
链接:https://arxiv.org/abs/2602.24245
备注:Accepted at ICASSP 2026
摘要:我们提出了Chunk-wise Attention Transducer(CHAT),这是RNN-T模型的一种新扩展,它在固定大小的块中处理音频,同时在每个块中使用交叉注意力。这种混合方法保持了RNN-T的流式传输能力,同时为局部对齐建模引入了受控的灵活性。CHAT显著降低了RNN-T必须处理的时间维度,从而大大提高了效率:峰值训练内存减少了46.2%,训练速度提高了1.36倍,推理速度提高了1.69倍。除了这些效率提升之外,CHAT还在多种语言和任务中实现了相对RNN-T的一致准确性提升:语音识别的相对WER降低高达6.3%,语音翻译的BLEU提高高达18.0%。该方法对语音翻译尤为有效,因为RNN-T的严格单调对齐会损害其性能。我们的研究结果表明,CHAT模型为在不牺牲实时约束的前提下部署能力更强的流式语音模型提供了一个实用的解决方案。
摘要:We propose Chunk-wise Attention Transducer (CHAT), a novel extension to RNN-T models that processes audio in fixed-size chunks while employing cross-attention within each chunk. This hybrid approach maintains RNN-T's streaming capability while introducing controlled flexibility for local alignment modeling. CHAT significantly reduces the temporal dimension that RNN-T must handle, yielding substantial efficiency improvements: up to 46.2% reduction in peak training memory, up to 1.36X faster training, and up to 1.69X faster inference. Alongside these efficiency gains, CHAT achieves consistent accuracy improvements over RNN-T across multiple languages and tasks -- up to 6.3% relative WER reduction for speech recognition and up to 18.0% BLEU improvement for speech translation. The method proves particularly effective for speech translation, where RNN-T's strict monotonic alignment hurts performance. Our results demonstrate that the CHAT model offers a practical solution for deploying more capable streaming speech models without sacrificing real-time constraints.
【6】MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
标题:MT-PingEval:用私有信息博弈评估多轮协作
链接:https://arxiv.org/abs/2602.24188
摘要:我们提出了一种可扩展的方法,使用一套需要就私有信息进行有效沟通的合作博弈来评估语言模型的多轮交互能力。这使得交互式规模分析成为可能:固定的token预算被划分为可变数量的轮次。我们发现,在许多情况下,语言模型无法利用交互式协作超越非交互式基线场景(其中一个代理先总结其信息,另一个代理随即行动),尽管仍有很大的提升空间。这表明最先进的模型在规划和执行多轮协作对话方面仍存在明显弱点。我们分析了这些对话的语言特征,评估了谄媚、信息密度和话语连贯性的作用。虽然当代语言模型的协作弱点没有单一的语言学解释,但我们注意到,人类以更高的token效率取得了相当的任务成功,其产生的对话也比大多数语言模型的对话更连贯。对私有信息的主动管理是现实世界沟通的一个决定性特征,我们希望MT-PingEval能推动进一步提升这一能力的工作。
摘要:We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communication about private information. This enables an interactive scaling analysis, in which a fixed token budget is divided over a variable number of turns. We find that in many cases, language models are unable to use interactive collaboration to improve over the non-interactive baseline scenario in which one agent attempts to summarize its information and the other agent immediately acts -- despite substantial headroom. This suggests that state-of-the-art models still suffer from significant weaknesses in planning and executing multi-turn collaborative conversations. We analyze the linguistic features of these dialogues, assessing the roles of sycophancy, information density, and discourse coherence. While there is no single linguistic explanation for the collaborative weaknesses of contemporary language models, we note that humans achieve comparable task success at superior token efficiency by producing dialogues that are more coherent than those produced by most language models. The proactive management of private information is a defining feature of real-world communication, and we hope that MT-PingEval will drive further work towards improving this capability.
【7】Sandwiching Polynomials for Geometric Concepts with Low Intrinsic Dimension
标题:低内在维数几何概念的夹逼多项式
链接:https://arxiv.org/abs/2602.24178
备注:30 pages
摘要:最近的工作表明,在分布偏移学习、可测试学习和污染学习等具有挑战性的学习场景中,低次夹逼多项式逼近器具有惊人的威力。一对夹逼多项式在期望意义下逼近目标函数,同时给出函数值的逐点上界与下界。本文给出了一种构造低次夹逼多项式的新方法,为若干基本函数类和边缘分布带来了大幅改进的次数界。特别地,对于高斯分布下 $k$ 个半空间的函数,我们得到次数为 $\mathrm{poly}(k)$ 的夹逼多项式,相比此前的 $2^{O(k)}$ 界是指数级改进。更一般地,我们的方法适用于低维且边界光滑的函数类。与以往工作相比,我们的证明相对简单:直接利用目标函数边界的光滑性来构造夹逼的Lipschitz函数,而后者适用高维逼近论的已有结果。对于高斯分布下的低维多项式阈值函数(PTF),我们在不使用Kane的FT-mollification方法(此前最佳结果所依赖)的情况下获得了双指数级的改进。
摘要:Recent work has shown the surprising power of low-degree sandwiching polynomial approximators in the context of challenging learning settings such as learning with distribution shift, testable learning, and learning with contamination. A pair of sandwiching polynomials approximate a target function in expectation while also providing pointwise upper and lower bounds on the function's values. In this paper, we give a new method for constructing low-degree sandwiching polynomials that yield greatly improved degree bounds for several fundamental function classes and marginal distributions. In particular, we obtain degree $\mathrm{poly}(k)$ sandwiching polynomials for functions of $k$ halfspaces under the Gaussian distribution, improving exponentially over the prior $2^{O(k)}$ bound. More broadly, our approach applies to function classes that are low-dimensional and have smooth boundary. In contrast to prior work, our proof is relatively simple and directly uses the smoothness of the target function's boundary to construct sandwiching Lipschitz functions, which are amenable to results from high-dimensional approximation theory. For low-dimensional polynomial threshold functions (PTFs) with respect to Gaussians, we obtain doubly exponential improvements without applying the FT-mollification method of Kane used in the best previous result.
【8】Artificial Agency Program: Curiosity, compression, and communication in agents
标题:人工代理计划:代理人中的好奇心、压缩和沟通
链接:https://arxiv.org/abs/2602.24100
备注:This is a working draft. Feedback and criticism is most welcome
摘要:本文介绍了人工智能代理计划(AAP),这是一个将人工智能系统构建为现实嵌入式,资源受限代理的立场和研究议程,其发展由物理和计算约束下的好奇心作为学习进展驱动。核心论点是,当人工智能被视为扩展的人类工具系统的一部分时,它是最有用的,它可以增加感知、理解和驱动能力,同时减少人、工具和环境之间的摩擦。该议程将预测压缩、内在动机、授权和控制、界面质量(统一)和语言/自我沟通统一为选择性信息瓶颈。我们将这些想法表述为一个具有明确成本的可证伪程序,分阶段实验,以及一个具体的多模态标记化测试平台,其中代理人在观察,行动和审议之间分配有限的预算。目的是提供一个概念和实验框架,连接内在动机,信息论,热力学,有限理性和现代推理系统
摘要:This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality embedded, resource-bounded agents whose development is driven by curiosity-as-learning-progress under physical and computational constraints. The central thesis is that AI is most useful when treated as part of an extended human--tool system that increases sensing, understanding, and actuation capability while reducing friction at the interface between people, tools, and environments. The agenda unifies predictive compression, intrinsic motivation, empowerment and control, interface quality (unification), and language/self-communication as selective information bottlenecks. We formulate these ideas as a falsifiable program with explicit costs, staged experiments, and a concrete multimodal tokenized testbed in which an agent allocates limited budget among observation, action, and deliberation. The aim is to provide a conceptual and experimental framework that connects intrinsic motivation, information theory, thermodynamics, bounded rationality, and modern reasoning systems
【9】DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer
标题:扩散协调器:通过在线扩散增强器连接神经重建和真实感模拟
链接:https://arxiv.org/abs/2602.24096
备注:For more details and updates, please visit our project website: https://research.nvidia.com/labs/sil/projects/diffusion-harmonizer
摘要:仿真对于自动驾驶汽车等自主机器人的开发和评估至关重要。神经重建正在成为一种有前途的解决方案,因为它能够以自动化和可扩展的方式从真实世界的数据中模拟各种各样的场景。然而,虽然NeRF和3D高斯溅射等方法可以产生视觉上引人注目的结果,但它们通常会出现伪影,特别是在渲染新视图时,并且无法真实地集成插入的动态对象,特别是当它们从不同场景捕获时。为了克服这些限制,我们引入了DiffusionHarmonizer,这是一个在线生成增强框架,可以将这些不完美场景的渲染转换为时间一致的输出,同时提高其真实感。其核心是一个单步时间调节增强器,它是从预训练的多步图像扩散模型转换而来的,能够在单个GPU上的在线模拟器中运行。有效训练它的关键是一个自定义的数据管理管道,该管道构造合成-真实对,强调外观协调,伪影校正和照明现实主义。其结果是一个可扩展的系统,显着提高仿真保真度在研究和生产环境。
摘要:Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction is emerging as a promising solution as it enables simulating a wide variety of scenarios from real-world data alone in an automated and scalable way. However, while methods such as NeRF and 3D Gaussian Splatting can produce visually compelling results, they often exhibit artifacts particularly when rendering novel views, and fail to realistically integrate inserted dynamic objects, especially when they were captured from different scenes. To overcome these limitations, we introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. At its core is a single-step temporally-conditioned enhancer that is converted from a pretrained multi-step image diffusion model, capable of running in online simulators on a single GPU. The key to training it effectively is a custom data curation pipeline that constructs synthetic-real pairs emphasizing appearance harmonization, artifact correction, and lighting realism. The result is a scalable system that significantly elevates simulation fidelity in both research and production environments.
【10】The Subjectivity of Monoculture
标题:单一文化的主观性
链接:https://arxiv.org/abs/2602.24086
摘要:机器学习模型-包括大型语言模型(LLM)-通常被认为表现出单一文化,其中输出经常惊人地一致。但是,模型过于一致实际上意味着什么呢?我们认为,这个问题本质上是主观的,依赖于两个关键的决定。 首先,分析师必须指定一个基线空模型,说明“独立性”应该是什么样子。这种选择本质上是主观的,正如我们所展示的,不同的空模型会导致对过度一致性的截然不同的推断。其次,我们表明,推理依赖于人口的模型和项目正在考虑。在一个上下文中看起来高度相关的模型,在用不同的问题集或不同的同行集进行评估时,可能会显得独立。两个大规模的基准实验验证了我们的理论研究结果。例如,我们发现当使用具有项目难度的空模型时,与以前的作品相比,我们发现了截然不同的推论。总之,我们的研究结果重构单一文化的评价模型行为的绝对属性,但作为一个上下文相关的推理问题。
摘要:Machine learning models -- including large language models (LLMs) -- are often said to exhibit monoculture, where outputs agree strikingly often. But what does it actually mean for models to agree too much? We argue that this question is inherently subjective, relying on two key decisions. First, the analyst must specify a baseline null model for what "independence" should look like. This choice is inherently subjective, and as we show, different null models result in dramatically different inferences about excess agreement. Second, we show that inferences depend on the population of models and items under consideration. Models that seem highly correlated in one context may appear independent when evaluated on a different set of questions, or against a different set of peers. Experiments on two large-scale benchmarks validate our theoretical findings. For example, we find drastically different inferences when using a null model with item difficulty compared to previous works that do not. Together, our results reframe monoculture evaluation not as an absolute property of model behavior, but as a context-dependent inference problem.
【11】Leveraging Non-linear Dimension Reduction and Random Walk Co-occurrence for Node Embedding
标题:利用非线性降维和随机游走共现进行节点嵌入
链接:https://arxiv.org/abs/2602.24069
备注:13 pages, 6 figures
摘要:利用非线性降维技术,我们从节点嵌入中删除了低维约束,并提出了COVE,这是一种可解释的高维嵌入,当使用UMAP降低到低维时,会略微提高聚类和链接预测任务的性能。嵌入的灵感来自于神经嵌入方法,该方法使用随机游走上的同现作为相似性的指示,并且与扩散过程密切相关。扩展最近的社区检测基准,我们发现,COVE UMAP HDBSCAN管道执行类似于流行的鲁汶算法。
摘要:Leveraging non-linear dimension reduction techniques, we remove the low dimension constraint from node embedding and propose COVE, an explainable high dimensional embedding that, when reduced to low dimension with UMAP, slightly increases performance on clustering and link prediction tasks. The embedding is inspired by neural embedding methods that use co-occurrence on a random walk as an indication of similarity, and is closely related to a diffusion process. Extending on recent community detection benchmarks, we find that a COVE UMAP HDBSCAN pipeline performs similarly to the popular Louvain algorithm.
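With umap-learn and hdbscan installed, the reduction-and-clustering stage of such a pipeline looks like the following sketch; random vectors stand in for the COVE embedding here:

# Sketch of the UMAP -> HDBSCAN stage of the evaluated pipeline; random
# vectors stand in for COVE node embeddings.
# Requires: pip install umap-learn hdbscan
import numpy as np
import umap
import hdbscan

X = np.random.rand(500, 128)                  # placeholder for COVE embeddings
X_low = umap.UMAP(n_components=2).fit_transform(X)
labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(X_low)
print(len(set(labels)) - (1 if -1 in labels else 0), "clusters")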
【12】pathsig: A GPU-Accelerated Library for Truncated and Projected Path Signatures
标题:pathsig:用于截断和投影路径签名的GPU加速库
链接:https://arxiv.org/abs/2602.24066
摘要:路径签名提供了丰富的序列数据表示,具有强有力的理论保证,在各种机器学习任务中具有良好的性能。虽然签名已经从固定的特征提取器发展到机器学习模型的可训练组件,但现有的库通常缺乏大规模基于梯度的学习所需的可扩展性。为了解决这个问题,本文引入了pathsig,这是一个PyTorch原生库,可以直接在单词基础上计算路径签名。通过使用CUDA内核在前缀封闭词集上并行更新签名系数,pathsig实现了高GPU吞吐量和接近最小的峰值内存。与其他库相比,pathsig在截断签名的计算方面实现了10- 30倍的加速,在需要通过签名进行反向传播的训练方面实现了高达4- 10倍的加速。除了常规截断之外,pathsig还支持将(无限维)签名投影到用户指定的单词集合上,以及由非齐次路径规则性激发的各向异性截断,从而实现更紧凑的表示,可以减少维度,冗余和计算成本。
摘要:Path signatures provide a rich representation of sequential data, with strong theoretical guarantees and good performance in a variety of machine-learning tasks. While signatures have progressed from fixed feature extractors to trainable components of machine-learning models, existing libraries often lack the required scalability for large-scale, gradient-based learning. To address this gap, this paper introduces pathsig, a PyTorch-native library that computes path signatures directly in the word basis. By using CUDA kernels to update signature coefficients in parallel over prefix-closed word sets, pathsig achieves high GPU throughput and near-minimal peak memory. Compared with other libraries, pathsig achieves 10-30x speedups for computation of truncated signatures and up to 4-10x speedups in training that require backpropagation through the signature. Beyond regular truncation, pathsig supports projections of the (infinite-dimensional) signature onto user-specified sets of words and anisotropic truncation motivated by inhomogeneous path regularity, enabling more compact representations that can reduce dimensionality, redundancy, and computational cost.
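To make "signature coefficients in the word basis" concrete, a reference numpy sketch of the depth-2 signature of a piecewise-linear path via incremental (Chen-style) updates; pathsig's CUDA kernels and API differ:

# Depth-2 path signature of a piecewise-linear path in the word basis,
# computed by incremental (Chen-style) updates over segments. A reference
# sketch only; pathsig's GPU kernels and API differ.
import numpy as np

def signature_depth2(path):
    # path: (T, d) array of points; returns level-1 (d,) and level-2 (d, d)
    d = path.shape[1]
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        S2 += np.outer(S1, dx) + 0.5 * np.outer(dx, dx)  # words of length 2
        S1 += dx                                         # words of length 1
    return S1, S2

path = np.cumsum(np.random.randn(100, 3), axis=0)
S1, S2 = signature_depth2(path)
print(S1.shape, S2.shape)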
【13】InfoNCE Induces Gaussian Distribution
标题:InfoNCE诱导高斯分布
链接:https://arxiv.org/abs/2602.24012
备注:Accepted to ICLR 2026, Oral
摘要:对比学习已经成为现代表征学习的基石,允许使用大量未标记的数据对特定任务和一般(基础)模型进行训练。对比训练中的一个典型损失是InfoNCE及其变体。在这项工作中,我们证明了InfoNCE目标在对比训练中出现的表示中诱导高斯结构。我们在两个互补的制度建立这一结果。首先,我们证明了在一定的对齐和浓度假设下,高维表示的投影渐近接近多元高斯分布。接下来,在不太严格的假设下,我们证明了增加一个小的渐近消失的正则化项,促进低特征范数和高特征熵导致类似的渐近结果。我们通过在多个编码器架构和大小的合成和CIFAR-10数据集上进行实验来支持我们的分析,证明了一致的高斯行为。这种观点为对比表征中常见的高斯性提供了一个原则性的解释。由此产生的高斯模型,使学习表示的原则性分析处理,预计将支持广泛的应用对比学习。
摘要:Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE and its variants. In this work, we show that the InfoNCE objective induces Gaussian structure in representations that emerge from contrastive training. We establish this result in two complementary regimes. First, we show that under certain alignment and concentration assumptions, projections of the high-dimensional representation asymptotically approach a multivariate Gaussian distribution. Next, under less strict assumptions, we show that adding a small asymptotically vanishing regularization term that promotes low feature norm and high feature entropy leads to similar asymptotic results. We support our analysis with experiments on synthetic and CIFAR-10 datasets across multiple encoder architectures and sizes, demonstrating consistent Gaussian behavior. This perspective provides a principled explanation for commonly observed Gaussianity in contrastive representations. The resulting Gaussian model enables principled analytical treatment of learned representations and is expected to support a wide range of applications in contrastive learning.
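For reference, a canonical InfoNCE implementation: cross-entropy over scaled cosine similarities with positives on the diagonal (the temperature value is an illustrative choice):

# Canonical InfoNCE: cross-entropy over scaled cosine similarities, with the
# i-th entries of the two views forming positive pairs. tau=0.1 is an
# illustrative choice.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                       # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)

print(info_nce(torch.randn(64, 128), torch.randn(64, 128)))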
【14】SegMate: Asymmetric Attention-Based Lightweight Architecture for Efficient Multi-Organ Segmentation
标题:SegMate:基于不对称注意力的轻量级架构,用于高效的多器官分割
链接:https://arxiv.org/abs/2602.23903
摘要:最先进的医学图像分割模型具有出色的准确性,但需要大量的计算资源,限制了在资源受限的临床环境中的部署。我们提出了SegMate,一个高效的2.5D框架,实现了最先进的精度,同时大大降低了计算要求。我们的高效设计是精心整合非对称架构,注意力机制,多尺度特征融合,基于切片的位置调节和多任务优化的结果。我们展示了我们的框架在三个现代骨干(EfficientNetV 2-M,MambaOut-Tiny,FastViT-T12)的效率-准确性权衡。我们在三个数据集上进行实验:TotalSegmentator,SegTHOR和AMOS 22。与普通模型相比,SegMate将计算量(GFLOPs)减少了2.5倍,内存占用量(VRAM)减少了2.1倍,同时通常记录了约1%的性能增益。在TotalSegmentator上,我们仅用295 MB峰值GPU内存就实现了93.51%的Dice分数。在SegTHOR和AMOS 22上进行的Zero-shot交叉数据集评估显示出很强的泛化能力,Dice得分分别高达86.85%和89.35%。我们在https://github.com/andreibunea99/SegMate上发布我们的开源代码。
摘要:State-of-the-art models for medical image segmentation achieve excellent accuracy but require substantial computational resources, limiting deployment in resource-constrained clinical settings. We present SegMate, an efficient 2.5D framework that achieves state-of-the-art accuracy, while considerably reducing computational requirements. Our efficient design is the result of meticulously integrating asymmetric architectures, attention mechanisms, multi-scale feature fusion, slice-based positional conditioning, and multi-task optimization. We demonstrate the efficiency-accuracy trade-off of our framework across three modern backbones (EfficientNetV2-M, MambaOut-Tiny, FastViT-T12). We perform experiments on three datasets: TotalSegmentator, SegTHOR and AMOS22. Compared with the vanilla models, SegMate reduces computation (GFLOPs) by up to 2.5x and memory footprint (VRAM) by up to 2.1x, while generally registering performance gains of around 1%. On TotalSegmentator, we achieve a Dice score of 93.51% with only 295MB peak GPU memory. Zero-shot cross-dataset evaluations on SegTHOR and AMOS22 demonstrate strong generalization, with Dice scores of up to 86.85% and 89.35%, respectively. We release our open-source code at https://github.com/andreibunea99/SegMate.
【15】RF-Agent: Automated Reward Function Design via Language Agent Tree Search
标题:RF-Agent:基于语言Agent树搜索的自动奖励函数设计
链接:https://arxiv.org/abs/2602.23876
备注:39 pages, 9 tables, 11 figures, Project page see https://github.com/deng-ai-lab/RF-Agent
摘要:为低层次控制任务设计有效的奖励函数是一个具有挑战性的问题。最近的研究旨在通过使用带有任务信息的大型语言模型(LLM)来生成密集的奖励函数,从而减少对专家经验的依赖。这些方法通常依赖于训练结果作为反馈,使用贪婪或进化算法迭代生成新的奖励函数。然而,他们遭受的历史反馈和低效的搜索利用率低,导致在复杂的控制任务的改善有限。为了应对这一挑战,我们提出了RF-Agent,这是一个将LLM视为语言代理的框架,并将奖励函数设计视为顺序决策过程,通过更好的上下文推理来增强优化。RF-Agent集成了蒙特卡洛树搜索(MCTS)来管理奖励设计和优化过程,利用LLM的多阶段上下文推理能力。这种方法更好地利用历史信息,提高了搜索效率,以确定有前途的奖励函数。在17个不同的低级别控制任务的出色的实验结果证明了我们的方法的有效性。源代码可在https://github.com/deng-ai-lab/RF-Agent上获得。
摘要:Designing efficient reward functions for low-level control tasks is a challenging problem. Recent research aims to reduce reliance on expert experience by using Large Language Models (LLMs) with task information to generate dense reward functions. These methods typically rely on training results as feedback, iteratively generating new reward functions with greedy or evolutionary algorithms. However, they suffer from poor utilization of historical feedback and inefficient search, resulting in limited improvements in complex control tasks. To address this challenge, we propose RF-Agent, a framework that treats LLMs as language agents and frames reward function design as a sequential decision-making process, enhancing optimization through better contextual reasoning. RF-Agent integrates Monte Carlo Tree Search (MCTS) to manage the reward design and optimization process, leveraging the multi-stage contextual reasoning ability of LLMs. This approach better utilizes historical information and improves search efficiency to identify promising reward functions. Outstanding experimental results in 17 diverse low-level control tasks demonstrate the effectiveness of our method. The source code is available at https://github.com/deng-ai-lab/RF-Agent.
【16】Inferring Chronic Treatment Onset from ePrescription Data: A Renewal Process Approach
标题:从电子处方数据推断慢性治疗开始:更新过程方法
链接:https://arxiv.org/abs/2602.23824
摘要:纵向电子健康记录(EHR)数据通常是左删失的,使得诊断记录不完整且不可靠,难以据此确定疾病起始。相比之下,门诊处方形成了基于更新过程的轨迹,为疾病管理提供了连续信号。我们提出了一个概率框架:将处方动态建模为更新过程以推断慢性治疗的起始,并通过在基线泊松(零星处方)机制与机制特定的威布尔(持续治疗)更新模型之间进行变点检测,来识别从零星用药向持续治疗的转变。基于覆盖全国240万人的ePrescription数据集,我们表明该方法给出的起始时间估计比朴素的基于规则的触发方式在时间上更合理,并大幅减少了强左删失下不可信的过早检出。检测性能因疾病而异,且与处方密度密切相关,凸显了基于治疗的起始推断的优势与局限。
摘要:Longitudinal electronic health record (EHR) data are often left-censored, making diagnosis records incomplete and unreliable for determining disease onset. In contrast, outpatient prescriptions form renewal-based trajectories that provide a continuous signal of disease management. We propose a probabilistic framework to infer chronic treatment onset by modeling prescription dynamics as a renewal process and detecting transitions from sporadic to sustained therapy via change-point detection between a baseline Poisson (sporadic prescribing) regime and a regime-specific Weibull (sustained therapy) renewal model. Using a nationwide ePrescription dataset of 2.4 million individuals, we show that the approach yields more temporally plausible onset estimates than naive rule-based triggering, substantially reducing implausible early detections under strong left censoring. Detection performance varies across diseases and is strongly associated with prescription density, highlighting both the strengths and limits of treatment-based onset inference.
【17】UPath: Universal Planner Across Topological Heterogeneity For Grid-Based Pathfinding
Link: https://arxiv.org/abs/2602.23789
Abstract: The performance of search algorithms for grid-based pathfinding, e.g., A*, critically depends on the heuristic function used to focus the search. Recent studies have shown that informed heuristics that take the positions and shapes of obstacles into account can be approximated with deep neural networks. Unfortunately, existing learning-based approaches mostly rely on the assumption that training and test grid maps are drawn from the same distribution (e.g., city maps, indoor maps) and perform poorly on out-of-distribution tasks. This naturally limits their application in practice, where a universal solver capable of efficiently handling any problem instance is often needed. In this work, we close this gap by designing a universal heuristic predictor: a model trained once, yet capable of generalizing across a full spectrum of unseen tasks. Our extensive empirical evaluation shows that the suggested approach reduces the computational effort of A* by up to a factor of 2.2, while still providing solutions within 3% of the optimal cost on average, on tasks that are completely different from those used for training, a milestone reached for the first time by a learnable solver.
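For orientation, here is plain A* on a 4-connected grid where the heuristic is just a callable; that callable is exactly the slot a learned universal predictor would fill (query the network per cell instead of the Manhattan distance used in this toy).

```python
import heapq

def a_star(grid, start, goal, heuristic):
    """Plain A* on a 4-connected grid. `heuristic` is any callable, which is
    where a learned heuristic predictor would plug in."""
    frontier = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            return g
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + heuristic(nxt, goal), g + 1, nxt))
    return None

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]        # 1 = obstacle
print(a_star(grid, (0, 0), (2, 0), manhattan))  # -> 6
```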
【18】OPTIAGENT: A Physics-Driven Agentic Framework for Automated Optical Design
Link: https://arxiv.org/abs/2602.23761
Abstract: Optical design is the process of configuring optical elements to precisely manipulate light for high-fidelity imaging. It is inherently a highly non-convex optimization problem that relies heavily on human heuristic expertise and domain-specific knowledge. While Large Language Models (LLMs) possess extensive optical knowledge, their ability to leverage that knowledge when designing lens systems remains significantly constrained. This work represents the first attempt to employ LLMs in the field of optical design. We bridge the expertise gap by enabling users without formal optical training to successfully develop functional lens systems. Concretely, we curate a comprehensive dataset, named OptiDesignQA, which encompasses both classical lens systems sourced from standard optical textbooks and novel configurations generated by automated design algorithms, for training and evaluation. Furthermore, we inject domain-specific optical expertise into the LLM through a hybrid objective of full-system synthesis and lens completion. To align the model with optical principles, we employ Group Relative Policy Optimization Done Right (DrGRPO) guided by an Optical Lexicographic Reward for physics-driven policy alignment. This reward system incorporates structural format rewards, physical feasibility rewards, light-manipulation accuracy, and LLM-based heuristics. Finally, our model integrates with specialized optical optimization routines for end-to-end fine-tuning and precision refinement. We benchmark our proposed method against both traditional optimization-based automated design algorithms and LLM counterparts, and experimental results show the superiority of our method.
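One plausible reading of a lexicographic reward is sketched below: lower-priority terms only contribute once higher-priority checks pass. The tiering, weights, and gate names are assumptions for illustration, not the paper's exact formula.

```python
def lexicographic_reward(format_ok, physically_feasible, accuracy):
    """Hypothetical gating: the accuracy term only counts once the structural
    format and physical feasibility checks pass, so the policy cannot trade a
    broken lens prescription for a high accuracy score."""
    reward = 0.0
    if not format_ok:
        return reward
    reward += 1.0
    if not physically_feasible:
        return reward
    reward += 1.0
    return reward + accuracy         # accuracy assumed in [0, 1]

print(lexicographic_reward(True, False, 0.9))  # 1.0: feasibility gate failed
print(lexicographic_reward(True, True, 0.9))   # 2.9: all tiers earned
```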
【19】A Boundary Integral-based Neural Operator for Mesh Deformation
Link: https://arxiv.org/abs/2602.23703
Comments: code will be made available upon request
Abstract: This paper presents an efficient mesh deformation method based on boundary integration and neural operators, formulating the problem as a linear elasticity boundary value problem (BVP). To overcome the high computational cost of traditional finite element methods and the limitations of existing neural operators in handling Dirichlet boundary conditions for vector fields, we introduce a direct boundary integral representation using a Dirichlet-type Green's tensor. This formulation expresses the internal displacement field solely as a function of boundary displacements, eliminating the need to solve for unknown tractions. Building on this, we design a Boundary-Integral-based Neural Operator (BINO) that learns the geometry- and material-aware Green's traction kernel. A key technical advantage of our framework is the mathematical decoupling of the physical integration process from the geometric representation via geometric descriptors. While this study primarily demonstrates robust generalization across diverse boundary conditions, the architecture inherently possesses potential for cross-geometry adaptation. Numerical experiments, including large deformations of flexible beams and rigid-body motions of NACA airfoils, confirm the model's high accuracy and strict adherence to the principles of linearity and superposition. The results demonstrate that the proposed framework ensures mesh quality and computational efficiency, providing a reliable new paradigm for parametric mesh generation and shape optimization in engineering.
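The structural backbone is easy to state: interior displacement is a quadrature-weighted sum of boundary displacements passed through a kernel, u(x) = sum_j w_j K(x, y_j) u_b(y_j). The sketch below uses a hand-made Gaussian placeholder where BINO would use its learned Green's traction kernel; the geometry and weights are illustrative.

```python
import numpy as np

# Discretized boundary-integral map: interior displacement as a weighted sum of
# boundary displacements through a kernel. The scalar Gaussian below is a
# placeholder for the learned (tensor-valued) Green's traction kernel.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)    # unit circle, 64 nodes
interior = np.array([[0.0, 0.0], [0.3, 0.2]])                  # query points
u_b = np.stack([np.cos(theta), np.zeros_like(theta)], axis=1)  # boundary displacement

w = 2 * np.pi / len(theta)                                     # quadrature weight
K = np.exp(-((interior[:, None, :] - boundary[None, :, :]) ** 2).sum(-1))  # (2, 64)
u = w * K @ u_b                                                # (2 points, 2 components)
print(u)
```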
【20】FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA
Link: https://arxiv.org/abs/2602.23638
Comments: preprint
Abstract: Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations: semantically equivalent updates can be represented in different latent subspaces across clients, since $(B_i R_i)(R_i^\top A_i) = B_i A_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
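The invariance identity suggests a concrete alignment step: rotate each client's factors into a shared subspace before averaging. The sketch below uses an orthogonal Procrustes solution, which is one natural choice and not necessarily the paper's exact procedure; it preserves the product B_i A_i exactly while making factor-wise averaging benign.

```python
import numpy as np

def align_factors(B_ref, B_i, A_i):
    """Rotate client i's LoRA factors toward a reference B factor before
    averaging. Because (B_i R)(R^T A_i) = B_i A_i for any orthogonal R, the
    client's update is unchanged while factor-wise averaging stops cancelling
    across clients. The Procrustes choice of R is one natural option."""
    U, _, Vt = np.linalg.svd(B_i.T @ B_ref)
    R = U @ Vt                                   # orthogonal Procrustes solution
    return B_i @ R, R.T @ A_i

rng = np.random.default_rng(0)
d, r = 8, 2
B, A = rng.normal(size=(d, r)), rng.normal(size=(r, d))
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))     # random orthogonal mismatch
B2, A2 = B @ Q, Q.T @ A                          # same update, rotated factors
B2a, A2a = align_factors(B, B2, A2)
print(np.allclose(B2a @ A2a, B @ A))             # product preserved: True
print(np.linalg.norm((B + B2a) / 2 @ ((A + A2a) / 2) - B @ A))  # ~0 after alignment
```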
【21】Evidential Neural Radiance Fields
Link: https://arxiv.org/abs/2602.23574
Abstract: Understanding sources of uncertainty is fundamental to trustworthy three-dimensional scene modeling. While recent advances in neural radiance fields (NeRFs) achieve impressive accuracy in scene reconstruction and novel view synthesis, the lack of uncertainty estimation significantly limits their deployment in safety-critical settings. Existing uncertainty quantification methods for NeRFs generally fail to capture both aleatoric and epistemic uncertainty, and among those that quantify one or the other, many either compromise rendering quality or incur significant computational overhead to obtain uncertainty estimates. To address these issues, we introduce Evidential Neural Radiance Fields, a probabilistic approach that integrates seamlessly with the NeRF rendering process and enables direct quantification of both aleatoric and epistemic uncertainty from a single forward pass. We compare multiple uncertainty quantification methods on three standardized benchmarks, where our approach demonstrates state-of-the-art scene reconstruction fidelity and uncertainty estimation quality.
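For context, deep evidential regression heads emit Normal-Inverse-Gamma parameters from which the aleatoric/epistemic split follows in closed form in one forward pass. The values below are illustrative, and how the paper composes these quantities along rendering rays is not shown here.

```python
# Standard closed forms for a Normal-Inverse-Gamma head with parameters
# (gamma, nu, alpha, beta), as used in the deep evidential regression
# literature; values here are made up for one pixel's radiance.
gamma, nu, alpha, beta = 0.52, 3.0, 2.5, 0.8

prediction = gamma                     # E[mu]
aleatoric = beta / (alpha - 1)         # E[sigma^2]: irreducible data noise
epistemic = beta / (nu * (alpha - 1))  # Var[mu]: model ignorance
print(prediction, round(aleatoric, 3), round(epistemic, 3))
```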
【22】Neural Operators Can Discover Functional Clusters
Link: https://arxiv.org/abs/2602.23528
Abstract: Operator learning is reshaping scientific computing by amortizing inference across infinite families of problems. While neural operators (NOs) are increasingly well understood for regression, far less is known for classification and its unsupervised analogue, clustering. We prove that sample-based neural operators can learn any finite collection of classes in an infinite-dimensional reproducing kernel Hilbert space, even when the classes are neither convex nor connected, under mild kernel sampling assumptions. Our universal clustering theorem shows that any $K$ closed classes can be approximated to arbitrary precision by NO-parameterized classes in the upper Kuratowski topology on closed sets, a notion that can be interpreted as disallowing false-positive misclassifications. Building on this, we develop an NO-powered clustering pipeline for functional data and apply it to unlabeled families of ordinary differential equation (ODE) trajectories. Discretized trajectories are lifted by a fixed pre-trained encoder into a continuous feature map and mapped to soft assignments by a lightweight trainable head. Experiments on diverse synthetic ODE benchmarks show that the resulting practical SNO recovers latent dynamical structure in regimes where classical methods fail, providing evidence consistent with our universal clustering theory.
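A minimal stand-in for the described pipeline, with assumptions flagged: a fixed random-feature lift plays the role of the pre-trained encoder, and a distance-softmax head produces the soft assignments; for the demo the centers are taken as class means rather than trained.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 64)

# Two unlabeled families of discretized ODE trajectories: oscillatory vs. decaying.
osc = np.sin(2 * np.pi * rng.uniform(1, 2, (20, 1)) * t)
dec = rng.uniform(0.5, 2.0, (20, 1)) * np.exp(-3 * t)
traj = np.vstack([osc, dec])

# Fixed random-feature lift standing in for the pre-trained encoder.
W = rng.normal(size=(64, 128)) / 8.0
z = np.tanh(traj @ W)

def soft_assign(z, centers, tau=1.0):
    """Lightweight head: softmax over negative squared distances to centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    e = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / tau)  # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

centers = np.vstack([z[:20].mean(0), z[20:].mean(0)])  # oracle centers for the demo
print(soft_assign(z, centers).argmax(1))               # recovers the two families
```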
【23】Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory
Link: https://arxiv.org/abs/2602.23516
Comments: 16 pages including appendix; arXiv admin note: text overlap with arXiv:2509.06264
Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1-norm clipping. This constraint severely limits its practicality in high-dimensional models, because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at ε = 0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.
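A numpy sketch of the aggregation step the paper builds on: L2 per-sample clipping followed by Laplace noise. The noise scale here is a placeholder; in Lap2 it would come from the majorization-based accountant, so this snippet carries no calibrated privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_dp_mean_grad(per_sample_grads, clip=1.0, noise_scale=2.0):
    """Clip each per-sample gradient to L2 norm <= clip, sum, add Laplace
    noise, and average. Illustrative mechanics only; the noise scale is not
    calibrated to any privacy budget."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    total = clipped.sum(axis=0)
    noisy = total + rng.laplace(scale=noise_scale, size=total.shape)
    return noisy / len(per_sample_grads)

grads = rng.normal(size=(32, 10))   # batch of 32 per-sample gradients, dim 10
print(laplace_dp_mean_grad(grads)[:3])
```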
【24】Spiky Rank and Its Applications to Rigidity and Circuits
Link: https://arxiv.org/abs/2602.23503
Abstract: We introduce spiky rank, a new matrix parameter that enhances blocky rank by combining the combinatorial structure of the latter with linear-algebraic flexibility. A spiky matrix is block-structured with diagonal blocks that are arbitrary rank-one matrices, and the spiky rank of a matrix is the minimum number of such matrices required to express it as a sum. This measure extends blocky rank to real matrices and is more robust for problems with both combinatorial and algebraic character. Our conceptual contribution is as follows: we propose spiky rank as a well-behaved candidate matrix complexity measure and demonstrate its potential through applications. We show that large spiky rank implies high matrix rigidity, and that spiky rank lower bounds yield lower bounds for depth-2 ReLU circuits, the basic building blocks of neural networks. On the technical side, we establish tight bounds for random matrices and develop a framework for explicit lower bounds, applying it to Hamming distance matrices and spectral expanders. Finally, we relate spiky rank to other matrix parameters, including blocky rank, sparsity, and the $\gamma_2$-norm.
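In symbols, one reading of the definition given in the abstract (the row/column partition is suppressed, and the shorthand srk is ours, not the paper's notation):

```latex
% A spiky matrix S vanishes outside disjoint diagonal blocks, each block an
% arbitrary rank-one matrix; spiky rank is the least number of spiky matrices
% summing to M.
\[
  S = \mathrm{diag}\!\left(u_1 v_1^{\top}, \dots, u_k v_k^{\top}\right),
  \qquad
  \operatorname{srk}(M) = \min\left\{\, m : M = \sum_{j=1}^{m} S_j,\ \text{each } S_j \text{ spiky} \right\}.
\]
```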
【25】Global Interpretability via Automated Preprocessing: A Framework Inspired by Psychiatric Questionnaires
Link: https://arxiv.org/abs/2602.23459
Abstract: Psychiatric questionnaires are highly context sensitive and often only weakly predict subsequent symptom severity, which makes the prognostic relationship difficult to learn. Although flexible nonlinear models can improve predictive accuracy, their limited interpretability can erode clinical trust. In fields such as imaging and omics, investigators commonly address visit- and instrument-specific artifacts by extracting stable signal through preprocessing and then fitting an interpretable linear model. We adopt the same strategy for questionnaire data by decoupling preprocessing from prediction: we restrict nonlinear capacity to a baseline preprocessing module that estimates stable item values, and then learn a linear mapping from these stabilized baseline items to future severity. We refer to this two-stage method as REFINE (Redundancy-Exploiting Follow-up-Informed Nonlinear Enhancement), which concentrates nonlinearity in preprocessing while keeping the prognostic relationship transparently linear and therefore globally interpretable through a coefficient matrix, rather than through post hoc local attributions. In experiments, REFINE outperforms other interpretable approaches while preserving clear global attribution of prognostic factors across psychiatric and non-psychiatric longitudinal prediction tasks.
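A toy rendition of the two-stage split, with the nonlinear stage reduced to a simple shrinkage rule purely to show where the capacity lives; REFINE learns this stage, and the item weights and noise levels below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
stable = rng.normal(size=(n, p))                     # latent stable item values
items = stable + rng.normal(scale=1.0, size=(n, p))  # context-noisy responses
beta_true = np.array([0.8, 0.0, 0.5, 0.0, -0.3])
severity = stable @ beta_true + rng.normal(scale=0.1, size=n)

def preprocess(x, shrink=0.6):
    """Stage 1 stand-in: shrink each noisy item toward the subject's mean item
    level. REFINE learns this stabilization with a nonlinear module; this rule
    only illustrates where the nonlinear capacity is confined."""
    return shrink * x + (1 - shrink) * x.mean(axis=1, keepdims=True)

# Stage 2: a plain linear map, so prognosis stays globally interpretable.
coef, *_ = np.linalg.lstsq(preprocess(items), severity, rcond=None)
print(np.round(coef, 2))  # one global coefficient per questionnaire item
```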
【26】Long Range Frequency Tuning for QML
Link: https://arxiv.org/abs/2602.23409
Abstract: Quantum machine learning models using angle encoding naturally represent truncated Fourier series, providing universal function approximation capabilities with sufficient circuit depth. For unary fixed-frequency encodings, circuit depth scales as O(omega_max * (omega_max + epsilon^{-2})) with target frequency magnitude omega_max and precision epsilon. Trainable-frequency approaches theoretically reduce this to match the target spectrum size, requiring only as many encoding gates as frequencies in the target spectrum. Despite this compelling efficiency, their practical effectiveness hinges on a key assumption: that gradient-based optimization can drive prefactors to arbitrary target values. We demonstrate through systematic experiments that frequency prefactors exhibit limited trainability: movement is constrained to approximately +/-1 units with typical learning rates. When target frequencies lie outside this reachable range, optimization frequently fails. To overcome this frequency-reachability limitation, we propose grid-based initialization using ternary encodings, which generate dense integer frequency spectra. While this approach requires O(log_3(omega_max)) encoding gates (more than the theoretical optimum but exponentially fewer than fixed-frequency methods), it ensures that target frequencies lie within the locally reachable range. On synthetic targets with three shifted high frequencies, ternary grid initialization achieves a median R^2 score of 0.9969, compared to 0.1841 for the trainable-frequency baseline. For the real-world Flight Passengers dataset, ternary grid initialization achieves a median R^2 score of 0.9671, a 22.8% improvement over trainable-frequency initialization (median R^2 = 0.7876).
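Why ternary prefactors give a dense spectrum: each data-reuploading gate with prefactor 3^k contributes a coefficient in {-1, 0, +1} to the output frequency, and balanced-ternary sums cover every integer up to (3^L - 1)/2. A small enumeration confirms this.

```python
from itertools import product

def reachable_spectrum(num_gates):
    """Frequencies reachable with encoding prefactors 3^0, ..., 3^(L-1): each
    gate contributes a balanced-ternary digit in {-1, 0, +1}, so L gates cover
    every integer frequency up to (3^L - 1) / 2 with no gaps."""
    prefactors = [3 ** k for k in range(num_gates)]
    sums = {sum(c * w for c, w in zip(cs, prefactors))
            for cs in product((-1, 0, 1), repeat=num_gates)}
    return sorted(sums)

spectrum = reachable_spectrum(3)
print(len(spectrum), spectrum[-1])  # 27 distinct integers, max frequency 13
```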
【27】On De-Individuated Neurons: Continuous Symmetries Enable Dynamic Topologies
Link: https://arxiv.org/abs/2602.23405
Comments: 22 pages, 5 figures; preprint to be submitted for review at Transactions on Machine Learning Research (TMLR)
Abstract: This paper introduces a novel methodology for dynamic networks by leveraging a new symmetry-principled class of primitives, isotropic activation functions. This approach enables real-time neuronal growth and shrinkage of the architectures in response to task demand. This is made possible by network structural changes that are invariant under symmetry reparameterisations, leaving the computation identical under neurogenesis and well approximated under neurodegeneration. This is undertaken by leveraging the isotropic primitives' property of basis independence, resulting in the loss of the individuated neurons implicit in the elementwise functional form. Isotropy thereby allows freedom in the basis in which layers are decomposed and interpreted as individual artificial neurons. This enables a layer-wise diagonalisation procedure, in which typical interconnected layers, such as dense layers, convolutional kernels, and others, can be re-expressed so that neurons have one-to-one, ordered connectivity within alternating layers. This indicates which one-to-one neuron-to-neuron communications are strongly impactful on overall functionality and which are not. Inconsequential neurons can thus be removed (neurodegeneration), and new inactive scaffold neurons added (neurogenesis), whilst remaining analytically invariant in function. A new tunable model parameter, intrinsic length, is also introduced to ensure this analytical invariance. This approach mathematically equates connectivity pruning with neurodegeneration. The diagonalisation also offers new possibilities for mechanistic interpretability of isotropic networks, and it is demonstrated that isotropic dense networks can asymptotically reach a sparsity factor of 50% whilst retaining exact network functionality. Finally, the construction is generalised, demonstrating a nested functional class for this form of isotropic primitive architectures.
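The basis independence at the heart of the construction is easy to demonstrate: a radial (isotropic) activation commutes with any orthogonal change of basis, so no coordinate axis, and hence no individual neuron, is privileged. A sketch, with tanh-of-norm as an assumed example of such a primitive:

```python
import numpy as np

def isotropic_act(x):
    """A radial (isotropic) activation: rescale the whole pre-activation vector
    by a function of its norm instead of acting elementwise. tanh-of-norm is an
    assumed example, not necessarily the paper's choice."""
    r = np.linalg.norm(x)
    return (np.tanh(r) / max(r, 1e-12)) * x

rng = np.random.default_rng(0)
x = rng.normal(size=6)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))    # random orthogonal basis change
print(np.allclose(isotropic_act(Q @ x), Q @ isotropic_act(x)))  # True: equivariant
```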
【28】Active Bipartite Ranking with Smooth Posterior Distributions
Link: https://arxiv.org/abs/2602.24263
Abstract: In this article, bipartite ranking, a statistical learning problem involved in many applications and widely studied in the passive context, is approached in a much more general active setting than the discrete one previously considered in the literature. While the latter assumes that the conditional distribution is piecewise constant, the framework we develop permits, in contrast, dealing with continuous conditional distributions, provided that they fulfill a Hölder smoothness constraint. We first show that a naive approach, based on discretisation at a uniform level fixed a priori followed by the active strategy designed for the discrete setting, generally fails. Instead, we propose a novel algorithm, referred to as smooth-rank and designed for the continuous setting, which aims to minimise the distance between the ROC curve of the estimated ranking rule and the optimal one w.r.t. the $\sup$ norm. We show that, for a fixed confidence level $\varepsilon > 0$ and probability $\delta \in (0,1)$, smooth-rank is PAC$(\varepsilon, \delta)$. In addition, we provide a problem-dependent upper bound on the expected sampling time of smooth-rank and establish a problem-dependent lower bound on the expected sampling time of any PAC$(\varepsilon, \delta)$ algorithm. Beyond the theoretical analysis carried out, numerical results are presented, providing solid empirical evidence of the performance of the proposed algorithm, which compares favorably with alternative approaches.
【29】End-to-end Differentiable Calibration and Reconstruction for Optical Particle Detectors
Link: https://arxiv.org/abs/2602.24129
Abstract: Large-scale homogeneous detectors with optical readouts are widely used in particle detection, with Cherenkov and scintillator neutrino detectors as prominent examples. Analyses in experimental physics rely on high-fidelity simulators to translate sensor-level information into physical quantities of interest. This task critically depends on accurate calibration, which aligns simulation behavior with real detector data, and on tracking, which infers particle properties from optical signals. We present the first end-to-end differentiable optical particle detector simulator, enabling simultaneous calibration and reconstruction through gradient-based optimization. Our approach unifies simulation, calibration, and tracking, which are traditionally treated as separate problems, within a single differentiable framework. We demonstrate that it achieves smooth and physically meaningful gradients across all key stages of light generation, propagation, and detection while maintaining computational efficiency. We show that gradient-based calibration and reconstruction greatly simplify existing analysis pipelines while matching or surpassing the performance of conventional non-differentiable methods in both accuracy and speed. Moreover, the framework's modularity allows straightforward adaptation to diverse detector geometries and target materials, providing a flexible foundation for experiment design and optimization. The results demonstrate the readiness of this technique for adoption in current and future optical detector experiments, establishing a new paradigm for simulation and reconstruction in particle physics.
【30】Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints
Link: https://arxiv.org/abs/2602.23800
Abstract: Causal discovery has achieved substantial theoretical progress, yet its deployment in large-scale longitudinal systems remains limited. A key obstacle is that operational data are generated under institutional workflows whose induced partial orders are rarely formalized, enlarging the admissible graph space in ways inconsistent with the recording process. We characterize a workflow-induced constraint class for longitudinal causal discovery that restricts the admissible directed acyclic graph space through protocol-derived structural masks and timeline-aligned indexing. Rather than introducing a new optimization algorithm, we show that explicitly encoding workflow-consistent partial orders reduces structural ambiguity, especially in mixed discrete-continuous panels where within-time orientation is weakly identified. The framework combines workflow-derived admissible-edge constraints, measurement-aligned time indexing and block structure, bootstrap-based uncertainty quantification for lagged total effects, and a dynamic representation supporting intervention queries. In a nationwide annual health screening cohort in Japan with 107,261 individuals and 429,044 person-years, workflow-constrained longitudinal LiNGAM yields temporally consistent within-time substructures and interpretable lagged total effects with explicit uncertainty. Sensitivity analyses using alternative exposure and body-composition definitions preserve the main qualitative patterns. We argue that formalizing workflow-derived constraint classes improves structural interpretability without relying on domain-specific edge specification, providing a reproducible bridge between operational workflows and longitudinal causal discovery under standard identifiability assumptions.
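A minimal illustration of a workflow-derived admissible-edge mask: the protocol's timeline forbids edges from a later visit back into an earlier one, while within-visit orientation is left to the discovery algorithm. The variable names and two-visit layout are invented for the example.

```python
import numpy as np

variables = ["exam_t", "lab_t", "exam_t1", "lab_t1"]  # *_t1 = next year's visit
visit = np.array([0, 0, 1, 1])                        # timeline index per variable

# Admissible-edge mask: an edge i -> j is allowed only if variable i's visit
# is not later than variable j's; self-loops are excluded.
admissible = visit[:, None] <= visit[None, :]
np.fill_diagonal(admissible, False)
for i, vi in enumerate(variables):
    print(vi, "may cause:", [vj for j, vj in enumerate(variables) if admissible[i, j]])
```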
【31】Multivariate Spatio-Temporal Neural Hawkes Processes
Link: https://arxiv.org/abs/2602.23629
Comments: 16 pages, 20 figures (including supplementary material); submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE)
Abstract: We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial information into latent state evolution through learned temporal and spatial decay dynamics, enabling flexible modeling of excitation and inhibition without predefined triggering kernels. By analyzing fitted intensity functions of deep learning-based temporal Hawkes process models, we identify a modeling gap in how fitted intensity behavior is captured beyond likelihood-based performance, which motivates the proposed spatio-temporal approach. Simulation studies show that the proposed method successfully recovers sensible temporal and spatial intensity structure in multivariate spatio-temporal point patterns, while an existing temporal neural Hawkes process approach fails to do so. An application to terrorism data from Pakistan further demonstrates the proposed model's ability to capture complex spatio-temporal interaction across multiple event types.
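For reference, here is the classical parametric form that the neural latent-state model generalizes: a spatio-temporal Hawkes intensity with exponential temporal decay and a Gaussian spatial kernel. All parameter values below are illustrative.

```python
import numpy as np

def intensity(t, s, events, mu=0.1, a=0.5, beta=1.0, sigma=1.0):
    """lambda(t, s) = mu + sum over past events of a * (exponential temporal
    decay) * (Gaussian spatial kernel). Both kernels are normalized so that
    a is the expected number of offspring per event."""
    lam = mu
    s = np.asarray(s, dtype=float)
    for t_i, s_i in events:
        if t_i < t:
            d2 = np.sum((s - np.asarray(s_i, dtype=float)) ** 2)
            lam += (a * beta * np.exp(-beta * (t - t_i))
                    * np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2))
    return lam

events = [(0.5, (0.0, 0.0)), (1.2, (1.0, 0.5))]
print(intensity(2.0, (0.2, 0.1), events))  # background plus two excitation terms
```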
【32】Tensor Hypercontraction Error Correction Using Regression
Link: https://arxiv.org/abs/2602.23567
Abstract: Wavefunction-based quantum methods are among the most accurate tools for predicting and analyzing the electronic structure of molecules, in particular for accounting for dynamical electron correlation. However, most methods for including dynamical correlation beyond the simple second-order Møller-Plesset perturbation theory (MP2) level are too computationally expensive to apply to large molecules. Approximations that reduce scaling with system size, such as the tensor hypercontraction (THC) technique of Hohenstein et al., are a potential remedy, but they also introduce additional sources of error. In this work, we correct errors in THC-approximated methods using machine learning. Specifically, we apply THC to third-order Møller-Plesset theory (MP3) as a simplified model for coupled cluster with single and double excitations (CCSD), and train several regression models on observed THC errors from the Main Group Chemistry Database (MGCDB84). We compare the performance of multiple linear regression models and non-linear kernel ridge regression models. We also investigate correction procedures using absolute and relative corrections and evaluate the corrections for both molecular and reaction energies. We discuss the potential of regression techniques for correcting THC-MP3 errors by comparing against the "canonical" MP3 reference values and identify the most accurate technique. We find that non-linear regression models reduce root-mean-squared errors between THC and canonical MP3 by a factor of 6-9x for total molecular energies and 2-3x for reaction energies.
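A toy version of the correction scheme: fit kernel ridge regression to the THC-minus-canonical error and subtract the prediction. Everything below (features, error model, in-sample evaluation) is fabricated to show the mechanics; the paper trains on MGCDB84 energies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated setup: 4 descriptors per molecule, a "canonical" MP3 energy, and
# a THC energy carrying a smooth, feature-dependent error plus noise.
X = rng.normal(size=(100, 4))
canonical = X @ np.array([1.0, -0.5, 0.2, 0.0])
thc = canonical + 0.1 * np.tanh(X[:, 0]) + 0.01 * rng.normal(size=100)

def kernel_ridge_fit(X, y, gamma=0.5, lam=1e-3):
    """Kernel ridge regression: solve (K + lam*I) alpha = y with an RBF kernel.
    sklearn's KernelRidge does the same with more care."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return np.linalg.solve(K + lam * np.eye(len(X)), y), K

alpha, K = kernel_ridge_fit(X, thc - canonical)
corrected = thc - K @ alpha        # subtract predicted error (in-sample demo only)
print(np.abs(thc - canonical).mean(), np.abs(corrected - canonical).mean())
```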
【33】Neural ensemble Kalman filter: Data assimilation for compressible flows with shocks
Link: https://arxiv.org/abs/2602.23461
Abstract: Data assimilation (DA) for compressible flows with shocks is challenging because many classical DA methods generate spurious oscillations and nonphysical features near uncertain shocks. We focus here on the ensemble Kalman filter (EnKF). We show that the poor performance of the standard EnKF may be attributed to the bimodal forecast distribution that can arise in the vicinity of an uncertain shock location; this violates the assumptions underpinning the EnKF, which presume a forecast close to Gaussian. To address this issue we introduce the new neural EnKF. The basic idea is to systematically embed neural function approximations within ensemble DA by mapping the forecast ensemble of shocked flows to the parameter space (weights and biases) of a deep neural network (NN) and subsequently performing DA in that space. The nonlinear mapping encodes sharp and smooth flow features in an ensemble of NN parameters. Neural EnKF updates are therefore well-behaved only if the NN parameters vary smoothly within the neural representation of the forecast ensemble. We show that such a smooth variation of network parameters can be enforced via physics-informed transfer learning, and demonstrate that in so doing the neural EnKF avoids the spurious oscillations and nonphysical features that plague the standard EnKF. The applicability of the neural EnKF is demonstrated through a series of systematic numerical experiments with an inviscid Burgers' equation, Sod's shock tube, and a two-dimensional blast wave.
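The analysis step itself is the textbook stochastic EnKF update; the paper's novelty is applying it to an ensemble of network weights rather than grid values. A self-contained sketch of that update follows, with an illustrative observation operator and dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_update(ensemble, H, y, obs_std):
    """Stochastic EnKF analysis step: nudge each member toward a perturbed
    observation using the sample-covariance Kalman gain. In the neural EnKF
    the same algebra is applied to an ensemble of network weights and biases
    rather than to grid values directly."""
    n = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n - 1)            # sample covariance
    S = H @ P @ H.T + obs_std**2 * np.eye(len(y))    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
    perturbed = y + rng.normal(scale=obs_std, size=(n, len(y)))
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

ens = rng.normal(loc=1.0, size=(50, 4))              # 50 members, 4-dim state
H = np.array([[1.0, 0.0, 0.0, 0.0]])                 # observe the first component
updated = enkf_update(ens, H, np.array([2.0]), obs_std=0.1)
print(updated[:, 0].mean())                          # pulled toward the observation 2.0
```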