cs.LG 方向,今日共计122篇
大模型相关(15篇)
【1】Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
标题:Apertus:为全球语言环境普及开放且合规的大型语言模型
链接:https://arxiv.org/abs/2509.14233
作者: Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert, Barna Pasztor, Bettina Messmer, Dhia Garbaya, Eduard Frank Ďurech, Ido Hakimi, Juan García Giraldo, Mete Ismayilzada, Negar Foroutan, Skander Moalla, Tiancheng Chen, Vinko Sabolčec, Yixuan Xu, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, Ilia Badanin, Harold Benoit, Emanuela Boros, Nicholas Browning, Fabian Bösch, Maximilian Böther, Niklas Canova, Camille Challier, Clement Charmillot, Jonathan Coles, Jan Deriu, Arnout Devos, Lukas Drescher, Daniil Dzenhaliou, Maud Ehrmann, Dongyang Fan, Simin Fan, Silin Gao, Miguel Gila, María Grandury, Diba Hashemi, Alexander Hoyle, Jiaming Jiang, Mark Klein, Andrei Kucharavy, Anastasiia Kucherenko, Frederike Lübeck, Roman Machacek, Theofilos Manitaras, Andreas Marfurt, Kyle Matoba, Simon Matrenok, Henrique Mendoncça, Fawzi Roberto Mohamed, Syrielle Montariol, Luca Mouchel, Sven Najem-Meyer, Jingwei Ni, Gennaro Oliva, Matteo Pagliardini, Elia Palme, Andrei Panferov, Léo Paoletti, Marco Passerini, Ivan Pavlov, Auguste Poiroux, Kaustubh Ponkshe, Nathan Ranchin, Javi Rando, Mathieu Sauser, Jakhongir Saydaliev, Muhammad Ali Sayfiddinov, Marian Schneider, Stefano Schuppli, Marco Scialanga, Andrei Semenov, Kumar Shridhar, Raghav Singhal, Anna Sotnikova, Alexander Sternfeld, Ayush Kumar Tarun, Paul Teiletche, Jannis Vamvas, Xiaozhe Yao, Hao Zhao Alexander Ilic, Ana Klimovic, Andreas Krause, Caglar Gulcehre, David Rosenthal, Elliott Ash, Florian Tramèr, Joost VandeVondele, Livio Veraldi, Martin Rajman, Thomas Schulthess, Torsten Hoefler, Antoine Bosselut, Martin Jaggi, Imanol Schlag
摘要:我们提出了Apertus,一个完全开放的大型语言模型(LLM)套件,旨在解决当今开放模型生态系统中的两个系统性缺陷:数据合规性和多语言表示。与许多先前在没有可复现数据管道、不顾内容所有者权利的情况下发布权重的模型不同,Apertus模型仅在公开可用的数据上进行预训练,追溯性地遵守robots.txt排除规则,并过滤非许可、有毒和个人可识别的内容。为了降低记忆风险,我们在预训练期间采用了Goldfish目标,强烈抑制数据的逐字回忆,同时保留下游任务性能。Apertus模型还扩展了多语言覆盖范围,使用来自1800多种语言的15T token进行训练,其中约40%的预训练数据分配给非英语内容。在8B和70B两种规模发布的Apertus在多语言基准上接近完全开放模型中的最先进水平,与开放权重模型相当甚至更优。除了模型权重之外,我们还以宽松许可证发布开发周期中的所有科学工件,包括数据准备脚本、检查点、评估套件和训练代码,从而支持透明的审计和扩展。
摘要:We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
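摘要中抑制逐字记忆的Goldfish目标,核心思想是把一部分token从语言建模损失中剔除。下面是一个极简示意性草图(假设性简化:原方法按上文哈希伪随机选取被剔除的token,此处用固定间隔代替;函数名与超参数k均为示例,非论文原始实现):

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4):
    """简化的Goldfish损失草图:按固定间隔从损失中剔除token,
    抑制逐字记忆。实际方法由前文上下文的哈希决定剔除位置。"""
    # logits: (batch, seq, vocab); labels: (batch, seq)
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)
    positions = torch.arange(labels.size(1), device=labels.device)
    mask = (positions % k != 0).float().expand_as(per_token)  # 每k个token丢弃1个
    return (per_token * mask).sum() / mask.sum()
```

由于被剔除的token从不贡献梯度,模型难以完整复现训练语料的逐字序列,而保留的大部分token仍足以学习下游能力。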
【2】NIRVANA: Structured pruning reimagined for large language models compression
标题:NIRVANA:为大型语言模型压缩重新构想结构化修剪
链接:https://arxiv.org/abs/2509.14230
作者:Ai, Tianxin Wei, Sirui Chen, Jingrui He
摘要:大型语言模型(LLM)的结构化剪枝通过移除整个隐藏单元带来可观的效率提升,但当前方法通常遭受显著的性能退化,特别是在zero-shot设置下,并且需要昂贵的恢复手段,如监督微调(SFT)或适配器插入。为了解决这些关键缺陷,我们引入了NIRVANA,一种明确为平衡即时zero-shot精度保持与稳健微调能力而设计的新剪枝方法。NIRVANA利用从Adam优化动态下的神经切线核中导出的一阶显著性准则,提供了一种尊重模型基本训练行为、有理论依据的剪枝策略。为了进一步应对结构化剪枝带来的独特挑战,NIRVANA采用了跨层和跨模块(注意力与MLP)的自适应稀疏度分配机制,以全局平衡的方式调整各模块之间的剪枝强度。此外,为了减轻剪枝决策对校准数据质量的高度敏感性,我们提出了一种简单而有效的基于KL散度的校准数据选择策略,确保更可靠且任务无关的剪枝结果。在Llama3、Qwen和T5模型上的综合实验表明,NIRVANA在同等稀疏度约束下优于现有的结构化剪枝方法,为LLM压缩提供了一种理论可靠且实用的方法。代码可在https://github.com/iDEA-iSAIL-Lab-UIUC/NIRVANA获得。
摘要:Structured pruning of large language models (LLMs) offers substantial efficiency improvements by removing entire hidden units, yet current approaches often suffer from significant performance degradation, particularly in zero-shot settings, and necessitate costly recovery techniques such as supervised fine-tuning (SFT) or adapter insertion. To address these critical shortcomings, we introduce NIRVANA, a novel pruning method explicitly designed to balance immediate zero-shot accuracy preservation with robust fine-tuning capability. Leveraging a first-order saliency criterion derived from the Neural Tangent Kernel under Adam optimization dynamics, NIRVANA provides a theoretically grounded pruning strategy that respects essential model training behaviors. To further address the unique challenges posed by structured pruning, NIRVANA incorporates an adaptive sparsity allocation mechanism across layers and modules (attention vs. MLP), which adjusts pruning intensity between modules in a globally balanced manner. Additionally, to mitigate the high sensitivity of pruning decisions to calibration data quality, we propose a simple yet effective KL divergence-based calibration data selection strategy, ensuring more reliable and task-agnostic pruning outcomes. Comprehensive experiments conducted on Llama3, Qwen, and T5 models demonstrate that NIRVANA outperforms existing structured pruning methods under equivalent sparsity constraints, providing a theoretically sound and practical approach to LLM compression. The code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/NIRVANA.
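结构化剪枝中"一阶显著性准则"的通用形式,是用一阶泰勒近似估计移除某个隐藏单元对损失的影响。下面是一个示意性草图(简化版:论文中由NTK与Adam动态导出的具体准则以及自适应稀疏度分配未在此体现,函数名为示例):

```python
import torch

def unit_saliency(weight, grad):
    """一阶泰勒显著性:按输出单元(行)聚合 |w * dL/dw|。
    weight, grad: (out_units, in_dim)"""
    return (weight * grad).abs().sum(dim=1)

def prune_units(weight, grad, sparsity=0.3):
    """返回保留单元的行索引:剔除显著性最低的 sparsity 比例的单元。"""
    scores = unit_saliency(weight, grad)
    n_keep = int(weight.size(0) * (1 - sparsity))
    keep = torch.topk(scores, n_keep).indices.sort().values
    return keep  # 用保留的行索引切片权重,即得到结构化剪枝后的层

# 用法示意:keep = prune_units(layer.weight.data, layer.weight.grad)
```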
【3】Language models' activations linearly encode training-order recency
标题:语言模型的激活线性编码训练顺序近因
链接:https://arxiv.org/abs/2509.14223
作者:rasheninnikov, Richard E. Turner, David Krueger
摘要:我们发现,语言模型的激活线性编码了信息在训练过程中被学习的时间。我们的设置通过在六个不相交但相似的命名实体数据集上依次微调Llama-3.2-1B,构建了一个训练顺序已知的模型。我们发现,六个训练数据集的测试样本的平均激活编码了训练顺序:当投影到2D子空间时,这些质心完全按照训练顺序排列,并且位于一条直线上。此外,我们表明,线性探针可以准确地(约90%)区分"早期"与"晚期"实体,并能泛化到探针自身训练期间未见过的实体。该模型还可以经微调以显式报告某个未见实体的训练阶段(约80%的准确率)。有趣的是,这种时间信号似乎并不能归因于激活幅度、损失或模型置信度的简单差异。我们的论文表明,模型能够根据信息的获取时间对其加以区分,这对模型如何处理冲突数据和应对知识修改具有重要意义。
摘要:We show that language models' activations linearly encode when information was learned during training. Our setup involves creating a model with a known training order by sequentially fine-tuning Llama-3.2-1B on six disjoint but otherwise similar datasets about named entities. We find that the average activations of test samples for the six training datasets encode the training order: when projected into a 2D subspace, these centroids are arranged exactly in the order of training and lie on a straight line. Further, we show that linear probes can accurately (~90%) distinguish "early" vs. "late" entities, generalizing to entities unseen during the probes' own training. The model can also be fine-tuned to explicitly report an unseen entity's training stage (~80% accuracy). Interestingly, this temporal signal does not seem attributable to simple differences in activation magnitudes, losses, or model confidence. Our paper demonstrates that models are capable of differentiating information by its acquisition time, and carries significant implications for how they might manage conflicting data and respond to knowledge modifications.
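摘要中"线性探针区分早期与晚期实体"的做法可以用一个逻辑回归探针来理解。下面是一个自包含的示意(激活用随机数占位,实际应替换为从模型某一层提取的实体激活):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts_early = rng.normal(0.0, 1.0, (500, 256))  # 占位:早期学到的实体的激活
acts_late = rng.normal(0.2, 1.0, (500, 256))   # 占位:晚期学到的实体的激活

X = np.vstack([acts_early, acts_late])
y = np.array([0] * 500 + [1] * 500)            # 0=早期, 1=晚期

# 在一半实体上训练线性探针,在另一半(探针未见过的实体)上评估泛化
probe = LogisticRegression(max_iter=1000).fit(X[::2], y[::2])
print("held-out acc:", probe.score(X[1::2], y[1::2]))
```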
【4】A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
标题:通用Banach-Bregman随机迭代框架:统一随机镜像下降、学习与LLM训练
链接:https://arxiv.org/abs/2509.14216
作者: Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
备注:69 pages, 10 figures. Preprint
摘要:随机优化为现代人工智能的可扩展性提供了动力,涵盖机器学习、深度学习、强化学习和大型语言模型训练。然而,现有理论仍主要局限于希尔伯特空间,依赖内积框架和正交性。这一范式无法刻画非欧几里得设置,例如单纯形上的镜像下降、稀疏学习的Bregman邻近方法、信息几何中的自然梯度下降,或Kullback-Leibler正则化的语言模型训练。与基于欧几里得的希尔伯特空间方法不同,本方法涵盖一般的Banach空间。这项工作为随机迭代引入了一个开创性的Banach-Bregman框架,将Bregman几何确立为下一代优化的基础。它(i)通过Bregman投影和Bregman-Fejer单调性提供了一个统一模板,涵盖随机逼近、镜像下降、自然梯度、自适应方法和镜像邻近(mirror-prox);(ii)在非希尔伯特设置中建立了超松弛($\lambda > 2$),支持灵活的几何并阐明其加速效应;(iii)给出了从几乎必然有界性到几何速率的一系列收敛定理,并在合成与真实任务上得到验证。横跨机器学习(UCI基准)、深度学习(如Transformer训练)、强化学习(actor-critic)和大型语言模型(WikiText-2上的distilGPT-2)的实证研究显示,相比经典基线,收敛速度最多提升20%,方差更低,准确性更高。这些结果将Banach-Bregman几何定位为统一核心AI范式中优化理论与实践的基石。
摘要:Stochastic optimization powers the scalability of modern artificial intelligence, spanning machine learning, deep learning, reinforcement learning, and large language model training. Yet, existing theory remains largely confined to Hilbert spaces, relying on inner-product frameworks and orthogonality. This paradigm fails to capture non-Euclidean settings, such as mirror descent on simplices, Bregman proximal methods for sparse learning, natural gradient descent in information geometry, or Kullback--Leibler-regularized language model training. Unlike Euclidean-based Hilbert-space methods, this approach embraces general Banach spaces. This work introduces a pioneering Banach--Bregman framework for stochastic iterations, establishing Bregman geometry as a foundation for next-generation optimization. It (i) provides a unified template via Bregman projections and Bregman--Fejer monotonicity, encompassing stochastic approximation, mirror descent, natural gradient, adaptive methods, and mirror-prox; (ii) establishes super-relaxations ($\lambda > 2$) in non-Hilbert settings, enabling flexible geometries and elucidating their acceleration effect; and (iii) delivers convergence theorems spanning almost-sure boundedness to geometric rates, validated on synthetic and real-world tasks. Empirical studies across machine learning (UCI benchmarks), deep learning (e.g., Transformer training), reinforcement learning (actor--critic), and large language models (WikiText-2 with distilGPT-2) show up to 20% faster convergence, reduced variance, and enhanced accuracy over classical baselines. These results position Banach--Bregman geometry as a cornerstone unifying optimization theory and practice across core AI paradigms.
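作为参考,Bregman几何的核心对象是Bregman散度及镜像下降式迭代。下面给出标准定义的LaTeX梳理(教科书式表述,非论文中的完整框架):

```latex
% Bregman散度:由严格凸的势函数 \varphi 诱导
D_\varphi(x, y) = \varphi(x) - \varphi(y)
  - \langle \nabla\varphi(y),\, x - y \rangle

% 随机镜像下降迭代:g_k 为随机(次)梯度,\eta_k 为步长
x_{k+1} = \arg\min_{x \in \mathcal{X}}
  \big\{ \eta_k \langle g_k, x \rangle + D_\varphi(x, x_k) \big\}
```

当 $\varphi(x)=\tfrac{1}{2}\|x\|_2^2$ 时上式退化为随机梯度下降;当 $\varphi$ 取负熵时即得到单纯形上的指数梯度更新,这正是摘要中"统一模板"所涵盖的两个特例。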
【5】Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs
标题:合成基于行为的推理链:面向个人理财LLM的数据生成框架
链接:https://arxiv.org/abs/2509.14180
作者:erthala
备注:24 pages, 11 figures. The paper presents a novel framework for generating a personal finance dataset. The resulting fine-tuned model and dataset are publicly available
摘要:个性化的财务建议需要考虑用户的目标、约束、风险承受能力和司法管辖区。以往的LLM工作集中于面向投资者和财务规划师的支持系统。与此同时,近期许多研究通过智能体管道考察更广泛的个人理财任务,包括预算、债务管理、退休和遗产规划;这类管道维护成本高昂,产生的预期财务回报不足25%。在这项研究中,我们引入了一个新颖且可复现的框架,将相关金融背景与行为金融研究相结合,为端到端顾问构建监督数据。使用该框架,我们创建了一个19k样本的推理数据集,并在其上对Qwen-3-8B模型进行了全面微调。通过留出测试集和盲LLM评审研究,我们证明:通过细致的数据策展和行为学整合,我们的8B模型在事实准确性、流畅性和个性化指标上达到与大得多的基线(14-32B参数)相当的性能,同时成本比后者低80%。
摘要:Personalized financial advice requires consideration of user goals, constraints, risk tolerance, and jurisdiction. Prior LLM work has focused on support systems for investors and financial planners. Simultaneously, numerous recent studies examine broader personal finance tasks, including budgeting, debt management, retirement, and estate planning, through agentic pipelines that incur high maintenance costs, yielding less than 25% of their expected financial returns. In this study, we introduce a novel and reproducible framework that integrates relevant financial context with behavioral finance studies to construct supervision data for end-to-end advisors. Using this framework, we create a 19k sample reasoning dataset and conduct a comprehensive fine-tuning of the Qwen-3-8B model on the dataset. Through a held-out test split and a blind LLM-jury study, we demonstrate that through careful data curation and behavioral integration, our 8B model achieves performance comparable to significantly larger baselines (14-32B parameters) across factual accuracy, fluency, and personalization metrics while incurring 80% lower costs than the larger counterparts.
【6】TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits
标题:TopoSizing:基于拓扑理解与尺寸设计的AMS电路LLM辅助框架
链接:https://arxiv.org/abs/2509.14169
作者:i, Zichen Kong, Yuan Wang, David Z. Pan, Xiyuan Tang
摘要:由于缺乏高质量数据,且难以将领域知识嵌入自动化流程,模拟和混合信号电路设计仍然具有挑战性。传统的黑盒优化具有采样效率,但缺乏对电路的理解,这通常导致评估被浪费在设计空间的低价值区域。相比之下,基于学习的方法嵌入了结构化知识,但依赖于具体案例,重新训练成本高昂。近期基于大型语言模型的尝试展示了潜力,但它们通常依赖人工干预,限制了通用性和透明度。我们提出TopoSizing,一个端到端框架,直接从原始网表执行稳健的电路理解,并将这种知识转化为优化收益。我们的方法首先应用图算法将电路组织为分层的设备-模块-级表示。随后,LLM智能体执行带有内置一致性检查的迭代"假设-验证-细化"循环,产生显式注释。经验证的见解通过LLM引导的初始采样和停滞触发的信任域更新集成到贝叶斯优化中,在保持可行性的同时提高效率。
摘要:Analog and mixed-signal circuit design remains challenging due to the shortage of high-quality data and the difficulty of embedding domain knowledge into automated flows. Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding, which often causes evaluations to be wasted in low-value regions of the design space. In contrast, learning-based methods embed structural knowledge but are case-specific and costly to retrain. Recent attempts with large language models show potential, yet they often rely on manual intervention, limiting generality and transparency. We propose TopoSizing, an end-to-end framework that performs robust circuit understanding directly from raw netlists and translates this knowledge into optimization gains. Our approach first applies graph algorithms to organize circuits into a hierarchical device-module-stage representation. LLM agents then execute an iterative hypothesis-verification-refinement loop with built-in consistency checks, producing explicit annotations. Verified insights are integrated into Bayesian optimization through LLM-guided initial sampling and stagnation-triggered trust-region updates, improving efficiency while preserving feasibility.
【7】Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection
标题:用于无人机数据采集的大语言模型赋能决策Transformer
链接:https://arxiv.org/abs/2509.13934
作者:hen, Jiangzhou Wang, and Hyundong Shin, Arumugam Nallanathan
备注:14 pages, 8 figures
摘要:部署无人机(UAV)从空间分布的设备中进行可靠且节能的数据收集,在支持各类物联网(IoT)应用方面前景广阔。然而,无人机有限的续航能力和通信范围要求智能的轨迹规划。虽然强化学习(RL)已被广泛用于无人机轨迹优化,但其交互式特性在真实环境中带来高成本和高风险。离线RL缓解了这些问题,但仍然容易出现训练不稳定,并严重依赖专家级质量的数据集。为应对这些挑战,我们构建了一个联合无人机轨迹规划与资源分配问题,以最大化数据收集的能源效率。资源分配子问题首先被转化为等价的线性规划形式,并以多项式时间复杂度求得最优解。然后,我们提出了一个大语言模型(LLM)赋能的评论家正则化决策Transformer(DT)框架,称为LLM-CRDT,用于学习有效的无人机控制策略。在LLM-CRDT中,我们引入评论家网络来正则化DT模型训练,从而将DT的序列建模能力与基于评论家的价值引导相结合,使其能够从次优数据集中学习有效策略。此外,为了缓解Transformer模型的数据饥渴特性,我们采用预训练LLM作为DT模型的Transformer骨干,并采用参数高效微调策略(即LoRA),能够以小规模数据集和低计算开销快速适配无人机控制任务。大量仿真表明,LLM-CRDT优于在线和离线RL基准方法,能源效率比当前最先进的DT方法最高提升36.7%。
摘要:The deployment of unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices holds great promise in supporting diverse Internet of Things (IoT) applications. Nevertheless, the limited endurance and communication range of UAVs necessitate intelligent trajectory planning. While reinforcement learning (RL) has been extensively explored for UAV trajectory optimization, its interactive nature entails high costs and risks in real-world environments. Offline RL mitigates these issues but remains susceptible to unstable training and relies heavily on expert-quality datasets. To address these challenges, we formulate a joint UAV trajectory planning and resource allocation problem to maximize energy efficiency of data collection. The resource allocation subproblem is first transformed into an equivalent linear programming formulation and solved optimally with polynomial-time complexity. Then, we propose a large language model (LLM)-empowered critic-regularized decision transformer (DT) framework, termed LLM-CRDT, to learn effective UAV control policies. In LLM-CRDT, we incorporate critic networks to regularize the DT model training, thereby integrating the sequence modeling capabilities of DT with critic-based value guidance to enable learning effective policies from suboptimal datasets. Furthermore, to mitigate the data-hungry nature of transformer models, we employ a pre-trained LLM as the transformer backbone of the DT model and adopt a parameter-efficient fine-tuning strategy, i.e., LoRA, enabling rapid adaptation to UAV control tasks with small-scale dataset and low computational overhead. Extensive simulations demonstrate that LLM-CRDT outperforms benchmark online and offline RL methods, achieving up to 36.7% higher energy efficiency than the current state-of-the-art DT approaches.
【8】ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting
标题:ST-LINK:用于时空预测的空间感知大型语言模型
链接:https://arxiv.org/abs/2509.13753
作者:eon, Hyunwook Lee, Juwon Kim, Sungahn Ko
备注:11 pages, 4 figures, Accepted to CIKM 2025. Code: https://github.com/HyoTaek98/ST_LINK
摘要:交通预测是智能交通系统中的一个关键问题。在最近的研究中,大型语言模型(LLM)已经成为一种很有前景的方法,但其主要为顺序token处理量身定制的内在设计,给有效捕获空间依赖性带来了显著挑战。具体而言,LLM在建模空间关系方面的固有局限及其与图结构空间数据的架构不兼容性在很大程度上仍未得到解决。为了克服这些限制,我们引入了ST-LINK,一个增强大型语言模型捕捉时空依赖能力的新框架。其关键组件是空间增强注意力(SE-Attention)和记忆检索前馈网络(MRFFN)。SE-Attention扩展了旋转位置嵌入,将空间相关性作为注意力机制内的直接旋转变换加以整合。这种方法在保留LLM固有顺序处理结构的同时最大化空间学习。与此同时,MRFFN动态检索并利用关键历史模式,以捕捉复杂的时间依赖并提高长期预测的稳定性。在基准数据集上的综合实验表明,ST-LINK超越了传统深度学习和LLM方法,并能有效捕获常规交通模式和突变。
摘要:Traffic forecasting represents a crucial problem within intelligent transportation systems. In recent research, Large Language Models (LLMs) have emerged as a promising method, but their intrinsic design, tailored primarily for sequential token processing, introduces notable challenges in effectively capturing spatial dependencies. Specifically, the inherent limitations of LLMs in modeling spatial relationships and their architectural incompatibility with graph-structured spatial data remain largely unaddressed. To overcome these limitations, we introduce ST-LINK, a novel framework that enhances the capability of Large Language Models to capture spatio-temporal dependencies. Its key components are Spatially-Enhanced Attention (SE-Attention) and the Memory Retrieval Feed-Forward Network (MRFFN). SE-Attention extends rotary position embeddings to integrate spatial correlations as direct rotational transformations within the attention mechanism. This approach maximizes spatial learning while preserving the LLM's inherent sequential processing structure. Meanwhile, MRFFN dynamically retrieves and utilizes key historical patterns to capture complex temporal dependencies and improve the stability of long-term forecasting. Comprehensive experiments on benchmark datasets demonstrate that ST-LINK surpasses conventional deep learning and LLM approaches, and effectively captures both regular traffic patterns and abrupt changes.
【9】LLM-I: LLMs are Naturally Interleaved Multimodal Creators
标题:LLM-I:LLM是天生的交错多模态创作者
链接:https://arxiv.org/abs/2509.13642
作者:, Feng Zhang, Kai Jia, Tao Jin
摘要:我们提出了LLM-Interleaved(LLM-I),一个灵活的和动态的框架,将交错图像-文本生成重新定义为一个工具使用问题。LLM-I旨在克服当前统一模型的“单一工具”瓶颈,这些模型仅限于合成图像,并与需要事实基础或程序精度的任务作斗争。我们的框架使中央LLM或MLLM代理能够智能地编排各种专用视觉工具的工具包,包括在线图像搜索,基于扩散的生成,代码执行和图像编辑。通过强化学习(RL)框架,该框架具有混合奖励系统,将基于规则的逻辑与LLM和MLLM评估器的判断相结合,训练代理熟练地选择和应用这些工具。LLM-I使用四种不同的模型骨干在不同的新数据集上进行训练,展示了最先进的性能,在四个基准测试中大幅优于现有方法。我们还介绍了一种新的测试时间缩放策略,提供进一步的性能增益。项目页面:https://github.com/ByteDance-BandAI/LLM-I。
摘要:We propose LLM-Interleaved (LLM-I), a flexible and dynamic framework that reframes interleaved image-text generation as a tool-use problem. LLM-I is designed to overcome the "one-tool" bottleneck of current unified models, which are limited to synthetic imagery and struggle with tasks requiring factual grounding or programmatic precision. Our framework empowers a central LLM or MLLM agent to intelligently orchestrate a diverse toolkit of specialized visual tools, including online image search, diffusion-based generation, code execution, and image editing. The agent is trained to select and apply these tools proficiently via a Reinforcement Learning (RL) framework that features a hybrid reward system combining rule-based logic with judgments from LLM and MLLM evaluators. Trained on a diverse new dataset using four different model backbones, LLM-I demonstrates state-of-the-art performance, outperforming existing methods by a large margin across four benchmarks. We also introduce a novel test-time scaling strategy that provides further performance gains. Project Page: https://github.com/ByteDance-BandAI/LLM-I.
【10】Privacy-Aware In-Context Learning for Large Language Models
标题:面向大型语言模型的隐私感知上下文学习
链接:https://arxiv.org/abs/2509.13625
作者:usal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha
摘要:大型语言模型(LLM)已经显著改变了自然语言理解与生成,但由于可能暴露敏感信息,它们引发了隐私担忧。已有研究强调了信息泄漏的风险:对手可以提取嵌入在提示中的敏感信息。在这项工作中,我们介绍了一种新颖的私有预测框架,用于生成具有强隐私保证的高质量合成文本。我们的方法利用差分隐私(DP)框架来确保信息泄漏的最坏情况理论界,而无需对底层模型做任何微调。所提出的方法对私有记录执行推理,并聚合由此产生的逐token输出分布,从而能够在保持隐私保证的同时生成更长且连贯的合成文本。此外,我们提出了一种简单的混合操作,结合私有推理与公共推理以进一步提升效用。实证评估表明,我们的方法在上下文学习(ICL)任务上优于先前最先进的方法,使其成为在保持高效用的同时实现隐私保护文本生成的一个有前景的方向。
摘要:Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information leakage, where adversaries can extract sensitive information embedded in the prompts. In this work, we introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees. Our approach leverages the Differential Privacy (DP) framework to ensure worst-case theoretical bounds on information leakage without requiring any fine-tuning of the underlying models. The proposed method performs inference on private records and aggregates the resulting per-token output distributions. This enables the generation of longer and coherent synthetic text while maintaining privacy guarantees. Additionally, we propose a simple blending operation that combines private and public inference to further enhance utility. Empirical evaluations demonstrate that our approach outperforms previous state-of-the-art methods on in-context-learning (ICL) tasks, making it a promising direction for privacy-preserving text generation while maintaining high utility.
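摘要所述"对私有记录推理并聚合逐token输出分布"可用如下最小草图来理解(噪声机制、参数与截断方式均为示意性假设,真实的DP核算需按论文的机制和隐私预算执行):

```python
import numpy as np

def private_next_token_dist(private_dists, sigma=0.05):
    """private_dists: (n_records, vocab),每条私有记录各自推理得到的
    下一token分布。取平均后加高斯噪声并重新归一化(示意性DP聚合)。"""
    mean = private_dists.mean(axis=0)
    noisy = mean + np.random.normal(
        0.0, sigma / len(private_dists), size=mean.shape)
    noisy = np.clip(noisy, 1e-12, None)
    return noisy / noisy.sum()

# 用法示意:逐token生成时,每步从私有聚合分布采样
# (还可按论文的混合操作与公共模型分布加权混合以提升效用)
dists = np.random.dirichlet(np.ones(100), size=8)  # 占位:8条私有记录的分布
token = np.random.choice(100, p=private_next_token_dist(dists))
```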
【11】Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in LLM Fine-tuning
标题:潜在特质和跨任务转移:解构LLM微调中的数据集交互
链接:https://arxiv.org/abs/2509.13624
作者: Krishna, Atharva Naik, Chaitali Agarwal, Sudharshan Govindan, Taesung Lee, Haw-Shiuan Chang
备注:Camera-ready version. Accepted to appear in the proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
摘要:大型语言模型正越来越多地部署于各种应用中,其中往往包括LLM在训练期间未曾遇到的任务。这意味着为所有任务枚举并获取高质量训练数据是不可行的。因此,我们常常需要依靠使用特性各异的数据集进行迁移学习,并预判分布外的请求。出于这种实际需要,我们提出了一个分析框架,通过构建迁移学习矩阵并进行降维来剖析这些跨任务交互。我们训练并分析了10个模型以识别潜在能力(例如推理、情感分类、NLU、算术),并发现了迁移学习的副作用。我们的研究结果表明,性能提升往往无法用表层的数据集相似性或源数据质量来解释;相反,源数据集的隐藏统计因素(如类别分布和生成长度倾向)以及特定的语言特征实际上更具影响力。这项工作为迁移学习的复杂动态提供了见解,为更可预测、更有效的LLM适配铺平了道路。
摘要:Large language models are increasingly deployed across diverse applications. This often includes tasks LLMs have not encountered during training. This implies that enumerating and obtaining the high-quality training data for all tasks is infeasible. Thus, we often need to rely on transfer learning using datasets with different characteristics, and anticipate out-of-distribution requests. Motivated by this practical need, we propose an analysis framework, building a transfer learning matrix and dimensionality reduction, to dissect these cross-task interactions. We train and analyze 10 models to identify latent abilities (e.g., Reasoning, Sentiment Classification, NLU, Arithmetic) and discover the side effects of the transfer learning. Our findings reveal that performance improvements often defy explanations based on surface-level dataset similarity or source data quality. Instead, hidden statistical factors of the source dataset, such as class distribution and generation length proclivities, alongside specific linguistic features, are actually more influential. This work offers insights into the complex dynamics of transfer learning, paving the way for more predictable and effective LLM adaptation.
【12】BiasMap: Leveraging Cross-Attentions to Discover and Mitigate Hidden Social Biases in Text-to-Image Generation
标题:BiasMap:利用交叉注意力发现并减轻文本到图像生成中隐藏的社会偏见
链接:https://arxiv.org/abs/2509.13496
作者:ra Chakraborty, Xujun Che, Depeng Xu, Cori Faklaris, Xi Niu, Shuhan Yuan
摘要:偏差发现是黑盒生成模型,特别是文本到图像(TTI)模型的关键。现有的工作主要集中在输出级人口分布,这并不一定保证概念表示在缓解后被解开。我们提出了BiasMap,一个模型不可知的框架,用于揭示稳定扩散模型中潜在的概念级表征偏差。BiasMap利用交叉注意归因图来揭示人口统计数据之间的结构性纠缠(例如,性别,种族)和语义(例如,专业),深入到图像生成过程中的代表性偏见。使用这些概念的属性图,我们通过交集对并集(IoU)量化空间人口统计-语义概念纠缠,提供了一个镜头到偏见,仍然隐藏在现有的公平性阐述方法。此外,我们还通过能量引导扩散采样进一步利用BiasMap进行偏差缓解,该采样直接修改潜在噪声空间并最小化去噪过程中的预期SoftIoU。我们的研究结果表明,现有的公平性干预措施可以减少输出分布差距,但往往无法解开概念级耦合,而我们的缓解方法可以减轻概念纠缠在图像生成,同时补充分布偏差缓解。
摘要:Bias discovery is critical for black-box generative models, especially text-to-image (TTI) models. Existing works predominantly focus on output-level demographic distributions, which do not necessarily guarantee concept representations to be disentangled post-mitigation. We propose BiasMap, a model-agnostic framework for uncovering latent concept-level representational biases in stable diffusion models. BiasMap leverages cross-attention attribution maps to reveal structural entanglements between demographics (e.g., gender, race) and semantics (e.g., professions), going deeper into representational bias during the image generation. Using attribution maps of these concepts, we quantify the spatial demographics-semantics concept entanglement via Intersection over Union (IoU), offering a lens into bias that remains hidden in existing fairness discovery approaches. In addition, we further utilize BiasMap for bias mitigation through energy-guided diffusion sampling that directly modifies latent noise space and minimizes the expected SoftIoU during the denoising process. Our findings show that existing fairness interventions may reduce the output distributional gap but often fail to disentangle concept-level coupling, whereas our mitigation method can mitigate concept entanglement in image generation while complementing distributional bias mitigation.
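BiasMap用IoU量化两张归因图的空间重叠度。下面是一个示意性计算(归一化与阈值化方式为假设):

```python
import numpy as np

def attribution_iou(map_a, map_b, thresh=0.5):
    """map_a, map_b: (H, W) 交叉注意力归因图。先按各自最大值归一化,
    再阈值化为二值掩码,计算交并比(IoU)。"""
    a = (map_a / map_a.max()) > thresh
    b = (map_b / map_b.max()) > thresh
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

# 用法示意:IoU越高,说明"性别"与"职业"等概念在生成图像中的
# 空间归因越纠缠,即存在表征层面的偏见耦合
```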
【13】SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
标题:SteeringControl:LLM中对齐转向的整体评估
链接:https://arxiv.org/abs/2509.13450
作者:iu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
摘要:我们介绍了SteeringControl,一个用于评估表示转向方法的基准,涵盖核心对齐目标(偏见、有害生成和幻觉)及其对次要行为(如奉承和常识道德)的影响。虽然先前的对齐工作通常以真实性或推理能力来展示表示转向的副作用,但我们发现还有许多未被系统理解的权衡尚待探索。我们收集了一个安全相关的主要与次要行为数据集,以评估围绕五种流行转向方法的转向有效性和行为纠缠。为此,我们构建了一个模块化的转向框架,其独特组件可作为许多现有方法的构建块。我们在Qwen-2.5-7B和Llama-3.1-8B上的结果表明,强转向性能取决于转向方法、模型和目标行为的特定组合,而这三者的不良组合也可能导致严重的概念纠缠。我们在此发布代码:https://github.com/wang-research-lab/SteeringControl.git。
摘要:We introduce SteeringControl, a benchmark for evaluating representation steering methods across core alignment objectives--bias, harmful generation, and hallucination--and their effects on secondary behaviors such as sycophancy and commonsense morality. While prior alignment work often highlights truthfulness or reasoning ability to demonstrate the side effects of representation steering, we find there are many unexplored tradeoffs not yet understood in a systematic way. We collect a dataset of safety-relevant primary and secondary behaviors to evaluate steering effectiveness and behavioral entanglement centered around five popular steering methods. To enable this, we craft a modular steering framework based on unique components that serve as the building blocks of many existing methods. Our results on Qwen-2.5-7B and Llama-3.1-8B find that strong steering performance is dependent on the specific combination of steering method, model, and targeted behavior, and that severe concept entanglement can result from poor combinations of these three as well. We release our code here: https://github.com/wang-research-lab/SteeringControl.git.
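该基准所评估的各类"表示转向"方法的共同原语,是在前向传播时向某层隐藏状态注入一个转向向量。下面是一个与具体方法无关的最小PyTorch草图(hook挂接的层、向量来源与缩放系数alpha均为假设):

```python
import torch

def add_steering_hook(layer, steering_vec, alpha=4.0):
    """在指定层注册forward hook,把转向向量加到隐藏状态上。
    steering_vec: (d,) 例如取"目标行为/对照行为"激活差的均值。"""
    def hook(module, inputs, output):
        h = output[0] if isinstance(output, tuple) else output
        h = h + alpha * steering_vec.to(h.dtype).to(h.device)
        return (h,) + output[1:] if isinstance(output, tuple) else h
    return layer.register_forward_hook(hook)

# 用法示意:handle = add_steering_hook(model.layers[15], v)
# 生成完毕后 handle.remove() 以恢复模型原始行为
```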
【14】Accuracy Paradox in Large Language Models: Regulating Hallucination Risks in Generative AI
标题:大型语言模型中的准确性悖论:调节生成人工智能中的幻觉风险
链接:https://arxiv.org/abs/2509.13345
作者: Weiwei Yi, Jiahong Chen
摘要:随着大型语言模型(LLM)渗透到日常决策中,其认识论与社会风险亟需审视。幻觉,即生成捏造、误导、过度简化或不可信的输出,已成为紧迫的挑战。虽然监管、学术和技术话语将准确性定位为减轻此类危害的主要基准,但本文认为,过度依赖准确性误诊了问题并产生适得其反的效果:准确性悖论。本文借鉴跨学科文献,构建了幻觉类型的分类法,并沿输出、个人和社会三个相互交织的维度展示这一悖论。首先,准确性只是可靠性的表面代理,激励对修辞流畅性和表层正确性的优化,而非认识上的可信度,这会助长用户对看似准确却在认识上站不住脚的输出的被动信任。其次,准确性作为单一指标,无法检测那些并非事实错误、却仍具误导性、价值负载或社会扭曲性的危害,包括共识错觉、阿谀奉承式对齐和微妙的操纵。第三,监管上对准确性的过度强调掩盖了幻觉更广泛的社会后果,包括社会分类、隐私侵犯、公平性损害,以及边缘化异见、削弱多元化并导致社会去技能化的认识趋同。通过审视欧盟人工智能法案、GDPR和DSA,本文认为现行法规在结构上尚不足以应对这些认识论、关系性和系统性危害,而对准确性的过度依赖使问题进一步恶化。通过揭示这些概念与实践层面的挑战,本文呼吁向多元化、情境感知和抗操纵的方法根本转变,以实现可信的人工智能治理。
摘要:As Large Language Models (LLMs) permeate everyday decision-making, their epistemic and societal risks demand urgent scrutiny. Hallucinations, the generation of fabricated, misleading, oversimplified or untrustworthy outputs, has emerged as imperative challenges. While regulatory, academic, and technical discourse position accuracy as the principal benchmark for mitigating such harms, this article contends that overreliance on accuracy misdiagnoses the problem and has counterproductive effect: the accuracy paradox. Drawing on interdisciplinary literatures, this article develops a taxonomy of hallucination types and shows the paradox along three intertwining dimensions: outputs, individuals and society. First, accuracy functions as a superficial proxy for reliability, incentivising the optimisation of rhetorical fluency and surface-level correctness over epistemic trustworthiness. This encourages passive user trust in outputs that appear accurate but epistemically untenable. Second, accuracy as a singular metric fails to detect harms that are not factually false but are nonetheless misleading, value-laden, or socially distorting, including consensus illusions, sycophantic alignment, and subtle manipulation. Third, regulatory overemphasis on accuracy obscures the wider societal consequences of hallucination, including social sorting, privacy violations, equity harms, epistemic convergence that marginalises dissent, reduces pluralism, and causes social deskilling. By examining the EU AI Act, GDPR, and DSA, the article argues that current regulations are not yet structurally equipped to address these epistemic, relational, and systemic harms and exacerbated by the overreliance on accuracy. By exposing such conceptual and practical challenges, this article calls for a fundamental shift towards pluralistic, context-aware, and manipulation-resilient approaches to AI trustworthy governance.
【15】LLM Chatbot-Creation Approaches
标题:LLM Chatbot创作方法
链接:https://arxiv.org/abs/2509.13326
作者:ta, Tanvi Raut, Kohav Yadav, Edward F. Gehringer
备注:Forthcoming in Frontiers in Education (FIE 2025), Nashville, Tennessee, USA, Nov 2-5, 2025
摘要:这篇完整的研究实践论文通过比较低代码平台和教育环境中的自定义编码解决方案,探索了开发课程聊天机器人的方法。随着GPT-4和LLaMA等大型语言模型(LLM)的兴起,基于LLM的聊天机器人正在被集成到教学工作流中,以自动执行任务,提供帮助并提供可扩展的支持。然而,选择最佳的开发策略需要平衡易用性、定制、数据隐私和可扩展性。这项研究比较了两种开发方法:低代码平台,如AnythingLLM和Botpress,以及使用LangChain,FAISS和FastAPI的自定义编码解决方案。该研究使用Prompt工程,检索增强生成(RAG)和个性化来评估聊天机器人原型的技术性能,可扩展性和用户体验。研究结果表明,虽然低代码平台可以实现快速原型设计,但它们在定制和扩展方面面临限制,而自定义编码系统提供更多控制,但需要大量的技术专业知识。这两种方法都成功地实现了自适应反馈回路和会话连续性等关键研究原则。这项研究为根据机构目标和资源选择适当的发展战略提供了一个框架。未来的工作将专注于混合解决方案,将低代码可访问性与模块化定制相结合,并将多模式输入纳入智能辅导系统。
摘要:This full research-to-practice paper explores approaches for developing course chatbots by comparing low-code platforms and custom-coded solutions in educational contexts. With the rise of Large Language Models (LLMs) like GPT-4 and LLaMA, LLM-based chatbots are being integrated into teaching workflows to automate tasks, provide assistance, and offer scalable support. However, selecting the optimal development strategy requires balancing ease of use, customization, data privacy, and scalability. This study compares two development approaches: low-code platforms like AnythingLLM and Botpress, with custom-coded solutions using LangChain, FAISS, and FastAPI. The research uses Prompt engineering, Retrieval-augmented generation (RAG), and personalization to evaluate chatbot prototypes across technical performance, scalability, and user experience. Findings indicate that while low-code platforms enable rapid prototyping, they face limitations in customization and scaling, while custom-coded systems offer more control but require significant technical expertise. Both approaches successfully implement key research principles such as adaptive feedback loops and conversational continuity. The study provides a framework for selecting the appropriate development strategy based on institutional goals and resources. Future work will focus on hybrid solutions that combine low-code accessibility with modular customization and incorporate multimodal input for intelligent tutoring systems.
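文中"自定义编码"路线的核心检索组件是FAISS向量索引。下面是一个自包含的最小检索草图(嵌入用随机向量占位;实际系统应替换为真实的文本嵌入模型,并把命中的文档拼入LLM提示词):

```python
import numpy as np
import faiss

d = 384                                    # 嵌入维度(占位)
docs = ["课程大纲……", "作业提交规则……", "考试安排……"]
emb = np.random.rand(len(docs), d).astype("float32")  # 占位:文档嵌入

index = faiss.IndexFlatL2(d)               # 精确L2检索索引
index.add(emb)

query = np.random.rand(1, d).astype("float32")        # 占位:问题嵌入
dist, idx = index.search(query, 2)         # 取最相近的2段文档
context = "\n".join(docs[i] for i in idx[0])
# 随后将 context 拼入提示词交给LLM回答,即构成最小RAG问答流程
```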
Graph相关(图学习|图神经网络|图优化等)(6篇)
【1】Deep Temporal Graph Networks for Real-Time Correction of GNSS Jamming-Induced Deviations
标题:用于实时校正GNSS干扰引起偏差的深度时序图网络
链接:https://arxiv.org/abs/2509.14000
作者:ić, Aljaž Blatnik, Carolina Fortuna, Blaž Bertalanič
备注:20 pages, 4 figures
摘要:全球导航卫星系统(GNSS)正日益受到蓄意干扰的破坏,恰恰在定位和授时必须保持可用的场景下降低了其可用性。我们通过将干扰缓解重构为动态图回归来应对这一问题,并引入一个以接收机为中心的深度时序图网络,实时预测并校正接收机的水平偏差。在每个1 Hz历元,卫星-接收机环境被表示为一张带时变属性(如SNR、方位角、仰角、纬度/经度)的异构星形图(接收机为中心,被跟踪的卫星为叶子)。单层异构图ConvLSTM(HeteroGCLSTM)在短历史上聚合一跳空间上下文和时间动态,输出用于即时校正的2D偏差向量。我们在来自两台不同接收机的数据集上评估了三种干扰机型:连续波(cw)、三音(cw3)和宽带FM,每种在-45至-70 dBm之间的6个功率等级下测试,每个场景(干扰前/干扰/恢复)重复50次。相对于强多变量时间序列基线(MLP、均匀CNN和Seq2Point CNN),我们的模型始终取得最低的平均绝对误差(MAE)。在-45 dBm时,它达到3.64 cm(GP01/cw)、7.74 cm(GP01/cw3)、4.41 cm(ublox/cw)、4.84 cm(ublox/cw3)和4.82 cm(ublox/FM),在-60至-70 dBm时改进到1.65-2.08 cm。在汇集所有功率的混合模式数据集上,MAE为3.78 cm(GP01)和4.25 cm(ublox10),优于Seq2Point、MLP和CNN。数据划分研究显示了出色的数据效率:仅用10%的训练数据,我们的方法仍大幅领先于基线(20 cm对36-42 cm)。
摘要:Global Navigation Satellite Systems (GNSS) are increasingly disrupted by intentional jamming, degrading availability precisely when positioning and timing must remain operational. We address this by reframing jamming mitigation as dynamic graph regression and introducing a receiver-centric deep temporal graph network that predicts, and thus corrects, the receivers horizontal deviation in real time. At each 1 Hz epoch, the satellite receiver environment is represented as a heterogeneous star graph (receiver center, tracked satellites as leaves) with time varying attributes (e.g., SNR, azimuth, elevation, latitude/longitude). A single layer Heterogeneous Graph ConvLSTM (HeteroGCLSTM) aggregates one hop spatial context and temporal dynamics over a short history to output the 2D deviation vector applied for on the fly correction. We evaluate on datasets from two distinct receivers under three jammer profiles, continuous wave (cw), triple tone (cw3), and wideband FM, each exercised at six power levels between -45 and -70 dBm, with 50 repetitions per scenario (prejam/jam/recovery). Against strong multivariate time series baselines (MLP, uniform CNN, and Seq2Point CNN), our model consistently attains the lowest mean absolute error (MAE). At -45 dBm, it achieves 3.64 cm (GP01/cw), 7.74 cm (GP01/cw3), 4.41 cm (ublox/cw), 4.84 cm (ublox/cw3), and 4.82 cm (ublox/FM), improving to 1.65-2.08 cm by -60 to -70 dBm. On mixed mode datasets pooling all powers, MAE is 3.78 cm (GP01) and 4.25 cm (ublox10), outperforming Seq2Point, MLP, and CNN. A split study shows superior data efficiency: with only 10\% training data our approach remains well ahead of baselines (20 cm vs. 36-42 cm).
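摘要中的输入表示是每个历元一张以接收机为中心的异构星形图。下面用PyTorch Geometric的HeteroData给出一个示意性构造(依赖torch_geometric;节点/边类型名与特征字段均为占位假设):

```python
import torch
from torch_geometric.data import HeteroData

def build_epoch_graph(sat_feats, rx_feat):
    """每个1 Hz历元构造一张星形异构图。
    sat_feats: (n_sat, 3) 每颗卫星的 [SNR, 方位角, 仰角](占位)
    rx_feat:   (1, 2)     接收机的 [纬度, 经度](占位)"""
    g = HeteroData()
    g["receiver"].x = rx_feat
    g["satellite"].x = sat_feats
    n = sat_feats.size(0)
    # 每颗卫星(叶子)连向接收机(中心)
    g["satellite", "tracks", "receiver"].edge_index = torch.stack(
        [torch.arange(n), torch.zeros(n, dtype=torch.long)])
    return g

# 将连续若干历元的图序列送入HeteroGCLSTM,回归2D水平偏差向量
```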
【2】Graph-Regularized Learning of Gaussian Mixture Models
标题:高斯混合模型的图正则化学习
链接:https://arxiv.org/abs/2509.13855
作者: Abdurakhmanova, Alex Jung
摘要:我们提出了一种在具有异构且有限的本地数据的分布式设置下对高斯混合模型(GMM)进行图正则化学习的方法。该方法利用给定的相似性图来引导节点之间的参数共享,避免了原始数据的传输。所得模型允许灵活聚合邻居的参数,并在异构、小样本情形下优于集中式训练和本地训练的GMM。
摘要:We present a graph-regularized learning of Gaussian Mixture Models (GMMs) in distributed settings with heterogeneous and limited local data. The method exploits a provided similarity graph to guide parameter sharing among nodes, avoiding the transfer of raw data. The resulting model allows for flexible aggregation of neighbors' parameters and outperforms both centralized and locally trained GMMs in heterogeneous, low-sample regimes.
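"利用相似性图引导节点间参数共享"的一个最简形式,是在各节点本地EM更新之后,按图权重对邻居的GMM参数做加权平滑。下面是一个只平滑均值的示意性草图(权重形式、lam取值与收敛细节均为假设,非论文原始算法):

```python
import numpy as np

def graph_smooth_means(means, adj, lam=0.5):
    """means: (n_nodes, K, d) 各节点本地GMM的K个分量均值
    adj:   (n_nodes, n_nodes) 相似性图权重(行归一化)
    每个节点的均值向其邻居的加权平均靠拢,实现不传原始数据的参数共享。"""
    neighbor_avg = np.einsum("ij,jkd->ikd", adj, means)
    return (1 - lam) * means + lam * neighbor_avg

# 典型流程:各节点先在本地数据上跑若干步EM,再调用上式做一次
# 图正则化平滑,交替进行直至收敛
```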
【3】An End-to-End Differentiable, Graph Neural Network-Embedded Pore Network Model for Permeability Prediction
标题:用于渗透率预测的端到端可微分、嵌入图神经网络的孔隙网络模型
链接:https://arxiv.org/abs/2509.13841
作者:ao, Heng Xiao
备注:This preprint is also available at ESS Open Archive: this https URL
摘要:多孔介质渗透率的准确预测是地下渗流模拟的基础。虽然纯数据驱动的模型提供了计算效率,但它们通常缺乏跨尺度的泛化能力,并且不包含明确的物理约束。孔隙网络模型(PNM)则是基于物理且高效的,但依赖理想化的几何假设来估计孔隙尺度的水力传导,限制了其在复杂结构中的准确性。为了克服这些限制,我们提出了一个端到端的可微分混合框架,将图神经网络(GNN)嵌入到PNM中。在这个框架中,用于电导计算的解析公式被基于孔喉特征的GNN预测所取代,预测得到的电导随后被传递给PNM求解器以计算渗透率。通过这种方式,该模型避免了PNM的理想化几何假设,同时保留了基于物理的流量计算。GNN的训练不需要带标签的电导数据(每个孔隙网络的电导可能数以千计);相反,它仅以单个标量渗透率作为训练目标来学习电导值。这通过同时经由GNN(自动微分)和PNM求解器(离散伴随方法)反向传播梯度来实现,从而支持完全耦合的端到端训练。由此产生的模型具有很高的准确性,并在不同尺度上泛化良好,优于纯数据驱动和传统PNM方法。基于梯度的敏感性分析进一步揭示了物理上一致的特征影响,增强了模型的可解释性。这种方法为复杂多孔介质中的渗透率预测提供了一个可扩展的物理信息框架,降低了模型的不确定性并提高了准确性。
摘要:Accurate prediction of permeability in porous media is essential for modeling subsurface flow. While pure data-driven models offer computational efficiency, they often lack generalization across scales and do not incorporate explicit physical constraints. Pore network models (PNMs), on the other hand, are physics-based and efficient but rely on idealized geometric assumptions to estimate pore-scale hydraulic conductance, limiting their accuracy in complex structures. To overcome these limitations, we present an end-to-end differentiable hybrid framework that embeds a graph neural network (GNN) into a PNM. In this framework, the analytical formulas used for conductance calculations are replaced by GNN-based predictions derived from pore and throat features. The predicted conductances are then passed to the PNM solver for permeability computation. In this way, the model avoids the idealized geometric assumptions of PNM while preserving the physics-based flow calculations. The GNN is trained without requiring labeled conductance data, which can number in the thousands per pore network; instead, it learns conductance values by using a single scalar permeability as the training target. This is made possible by backpropagating gradients through both the GNN (via automatic differentiation) and the PNM solver (via a discrete adjoint method), enabling fully coupled, end-to-end training. The resulting model achieves high accuracy and generalizes well across different scales, outperforming both pure data-driven and traditional PNM approaches. Gradient-based sensitivity analysis further reveals physically consistent feature influences, enhancing model interpretability. This approach offers a scalable and physically informed framework for permeability prediction in complex porous media, reducing model uncertainty and improving accuracy.
【4】State Space Models over Directed Graphs
标题:有向图上的状态空间模型
链接:https://arxiv.org/abs/2509.13735
作者:e, Xunkai Li, Rong-Hua Li, Guoren Wang
备注:currently undergoing review by IEEE Transactions on Big Data
摘要:有向图在许多领域中普遍存在,其边的方向性编码了关键的因果依赖关系。然而,现有为有向图定制的GNN和图Transformer面临两大挑战:(1)有效捕获源自有向边的长程因果依赖;(2)在处理大规模图数据集时平衡精度与训练效率。近年来,状态空间模型(SSM)在因果序列任务中取得了实质性进展,其面向图设计的变体在各类图学习基准上展现了最先进的精度,同时保持高效率。然而,现有的图状态空间模型均专为无向图设计,限制了它们在有向图学习中的表现。为此,我们提出了一种创新方法DirEgo2Token,通过k跳自我中心图(ego graph)将有向图序列化。这标志着状态空间模型首次被系统地扩展到有向图学习领域。在此基础上,我们开发了DirGraphSSM,一种新颖的有向图神经网络架构,借助消息传递机制在有向图上实现状态空间模型。实验结果表明,DirGraphSSM在三个有代表性的有向图学习任务上取得了最先进的性能,并在另外两个任务上取得有竞争力的性能,同时与现有最先进模型相比训练速度提升1.5倍到2倍。
摘要:Directed graphs are ubiquitous across numerous domains, where the directionality of edges encodes critical causal dependencies. However, existing GNNs and graph Transformers tailored for directed graphs face two major challenges: (1) effectively capturing long-range causal dependencies derived from directed edges; (2) balancing accuracy and training efficiency when processing large-scale graph datasets. In recent years, state space models (SSMs) have achieved substantial progress in causal sequence tasks, and their variants designed for graphs have demonstrated state-of-the-art accuracy while maintaining high efficiency across various graph learning benchmarks. However, existing graph state space models are exclusively designed for undirected graphs, which limits their performance in directed graph learning. To this end, we propose an innovative approach DirEgo2Token which sequentializes directed graphs via k-hop ego graphs. This marks the first systematic extension of state space models to the field of directed graph learning. Building upon this, we develop DirGraphSSM, a novel directed graph neural network architecture that implements state space models on directed graphs via the message-passing mechanism. Experimental results demonstrate that DirGraphSSM achieves state-of-the-art performance on three representative directed graph learning tasks while attaining competitive performance on two additional tasks with 1.5$\times$ to 2$\times$ training speed improvements compared to existing state-of-the-art models.
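DirEgo2Token的第一步是围绕每个节点抽取k跳自我中心图并将其序列化。下面用networkx给出一个示意(按跳数分层排序的序列化方式为假设性简化):

```python
import networkx as nx

def ego_sequence(G, node, k=2):
    """在有向图G中抽取以node为中心、沿出边方向的k跳ego子图,
    并按跳数(BFS层)排序为token序列:越近的邻居越靠前。"""
    ego = nx.ego_graph(G, node, radius=k)        # DiGraph默认沿出边扩展
    layers = nx.single_source_shortest_path_length(ego, node, cutoff=k)
    return [n for n, _ in sorted(layers.items(), key=lambda kv: kv[1])]

G = nx.DiGraph([(0, 1), (1, 2), (0, 3), (3, 4), (2, 0)])
print(ego_sequence(G, 0, k=2))   # 例如 [0, 1, 3, 2, 4]
```

每个节点对应的序列随后可送入SSM按因果顺序处理,这正是"将有向图序列化"的含义。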
【5】PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction
标题:PhenoGnet:用于疾病相似性预测的基于图的对比学习框架
链接:https://arxiv.org/abs/2509.14037
作者:iniwatte, Kazi Jewel Rana, Aaron J. Masino
摘要:了解疾病的相似性对于推进诊断、药物发现和个性化治疗策略至关重要。我们提出了PhenoGnet,一种新的基于图的对比学习框架,旨在通过整合基因功能相互作用网络与人类表型本体(HPO)来预测疾病相似性。PhenoGnet包括两个关键组件:一个内部视图模型,使用图卷积网络(GCN)和图注意力网络(GAT)分别编码基因和表型图,以及一个交叉视图模型,作为共享权重多层感知器(MLP)实现,通过对比学习对齐基因和表型嵌入。该模型使用已知的基因表型关联作为正对,随机采样的不相关对作为负对来训练。疾病由其相关基因和/或表型的平均嵌入表示,并且通过余弦相似性计算成对相似性。对1,100个相似和866个不同疾病对的策划基准的评估显示出强大的性能,基于基因的嵌入实现了0.9012的AUCPR和0.8764的AUROC,优于现有的最先进的方法。值得注意的是,PhenoGnet捕获了直接重叠之外的潜在生物学关系,为疾病相似性预测提供了可扩展和可解释的解决方案。这些结果强调了其在罕见疾病研究和精准医学中的下游应用潜力。
摘要:Understanding disease similarity is critical for advancing diagnostics, drug discovery, and personalized treatment strategies. We present PhenoGnet, a novel graph-based contrastive learning framework designed to predict disease similarity by integrating gene functional interaction networks with the Human Phenotype Ontology (HPO). PhenoGnet comprises two key components: an intra-view model that separately encodes gene and phenotype graphs using Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs), and a cross view model implemented as a shared weight multilayer perceptron (MLP) that aligns gene and phenotype embeddings through contrastive learning. The model is trained using known gene phenotype associations as positive pairs and randomly sampled unrelated pairs as negatives. Diseases are represented by the mean embeddings of their associated genes and/or phenotypes, and pairwise similarity is computed via cosine similarity. Evaluation on a curated benchmark of 1,100 similar and 866 dissimilar disease pairs demonstrates strong performance, with gene based embeddings achieving an AUCPR of 0.9012 and AUROC of 0.8764, outperforming existing state of the art methods. Notably, PhenoGnet captures latent biological relationships beyond direct overlap, offering a scalable and interpretable solution for disease similarity prediction. These results underscore its potential for enabling downstream applications in rare disease research and precision medicine.
【6】A Geometric Graph-Based Deep Learning Model for Drug-Target Affinity Prediction
标题:用于药物-靶标亲和力预测的基于几何图的深度学习模型
链接:https://arxiv.org/abs/2509.13476
作者:Rana, Farjana Tasnim Mukta, Duc D. Nguyen
摘要:在基于结构的药物设计中,准确估计候选配体与其蛋白质受体之间的结合亲和力是一个核心挑战。得益于结构与实验亲和力数据的日益丰富,人工智能尤其是深度学习的最新进展已在这项任务上展现出优于传统经验方法和基于物理方法的性能。在这项工作中,我们介绍了DeepGGL,一种在几何图学习框架中集成残差连接和注意力机制的深度卷积神经网络。通过利用多尺度加权着色二部子图,DeepGGL能在多个尺度上有效捕获蛋白质-配体复合物中的细粒度原子级相互作用。我们在CASF-2013和CASF-2016上将DeepGGL与已有模型进行了基准测试,DeepGGL取得了最先进的性能,并在各类评估指标上实现显著改进。为了进一步评估鲁棒性和泛化能力,我们在CSAR-NRC-HiQ数据集和PDBbind v2019留出集上测试了该模型。DeepGGL始终保持较高的预测准确性,凸显了其在基于结构的药物发现中进行结合亲和力预测的适应性和可靠性。
摘要:In structure-based drug design, accurately estimating the binding affinity between a candidate ligand and its protein receptor is a central challenge. Recent advances in artificial intelligence, particularly deep learning, have demonstrated superior performance over traditional empirical and physics-based methods for this task, enabled by the growing availability of structural and experimental affinity data. In this work, we introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework. By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales. We benchmarked DeepGGL against established models on CASF-2013 and CASF-2016, where it achieved state-of-the-art performance with significant improvements across diverse evaluation metrics. To further assess robustness and generalization, we tested the model on the CSAR-NRC-HiQ dataset and the PDBbind v2019 holdout set. DeepGGL consistently maintained high predictive accuracy, highlighting its adaptability and reliability for binding affinity prediction in structure-based drug discovery.
GAN|对抗|攻击|生成相关(3篇)
【1】Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics
标题:利用高阶Langevin动力学防御扩散模型的成员推断攻击
链接:https://arxiv.org/abs/2509.14225
作者:Sterling, Yousef El-Laham, Mónica F. Bugallo
备注:5 pages, 2 figures, 1 table
摘要:生成式人工智能应用的最新进展引发了新的数据安全问题。本文着重研究如何防御针对扩散模型的成员推断攻击。当攻击者能够判定某个数据点是否被用于训练模型时,此类攻击即告发生。虽然扩散模型本质上比其他生成模型更能抵抗成员推断攻击,但它们仍然容易受到影响。本文提出的防御利用临界阻尼的高阶朗之万动力学,其引入若干辅助变量以及沿这些变量的联合扩散过程。其思想是:辅助变量的存在混入了外部随机性,有助于在扩散过程的更早阶段破坏敏感输入数据。我们对这一概念进行了理论研究,并使用受试者工作特征曲线下面积(AUROC)和FID指标在玩具数据集和语音数据集上进行了验证。
摘要:Recent advances in generative artificial intelligence applications have raised new data security concerns. This paper focuses on defending diffusion models against membership inference attacks. This type of attack occurs when the attacker can determine if a certain data point was used to train the model. Although diffusion models are intrinsically more resistant to membership inference attacks than other generative models, they are still susceptible. The defense proposed here utilizes critically-damped higher-order Langevin dynamics, which introduces several auxiliary variables and a joint diffusion process along these variables. The idea is that the presence of auxiliary variables mixes external randomness that helps to corrupt sensitive input data earlier on in the diffusion process. This concept is theoretically investigated and validated on a toy dataset and a speech dataset using the Area Under the Receiver Operating Characteristic (AUROC) curves and the FID metric.
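临界阻尼朗之万动力学在数据变量之外引入速度等辅助变量,噪声先注入辅助变量、再经耦合进入数据。下面是前向扩散一步的Euler-Maruyama示意(二阶临界阻尼形式,论文为更高阶推广;参数取值与记号均为示意性假设):

```python
import numpy as np

def cld_forward_step(x, v, dt=1e-3, beta=4.0, gamma=1.0, m_inv=1.0):
    """临界阻尼(二阶)朗之万前向扩散的一步:
    dx = M^{-1} v * beta dt
    dv = (-x - Gamma M^{-1} v) * beta dt + sqrt(2 Gamma beta) dW
    噪声只直接作用于辅助速度v,通过x-v耦合间接扰动数据x。"""
    noise = np.random.normal(size=v.shape)
    x_new = x + m_inv * v * beta * dt
    v_new = (v + (-x - gamma * m_inv * v) * beta * dt
             + np.sqrt(2 * gamma * beta * dt) * noise)
    return x_new, v_new

# 临界阻尼条件(Gamma^2 = 4M)使耦合系统既不过度震荡也不过度阻尼
```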
【2】Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation
标题:谁教出了这个谎言?检索增强生成中有毒知识的责任归因
链接:https://arxiv.org/abs/2509.13772
作者:ang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang
备注:To appear in the IEEE Symposium on Security and Privacy, 2026
摘要:检索增强生成(RAG)将外部知识集成到大型语言模型中以提高响应质量。然而,最近的工作表明,RAG系统非常容易受到中毒攻击:恶意文本被插入知识数据库以影响模型输出。虽然已经提出了若干防御方法,但它们常常被适应性更强或更复杂的攻击所规避。本文介绍了RAGOrigin,一个黑盒责任归因框架,旨在确定知识数据库中的哪些文本应对误导性或错误的生成负责。我们的方法为每次错误生成事件构建一个聚焦的归因范围,并通过评估每个候选文本的检索排名、语义相关性及其对生成响应的影响为其分配责任分数。随后,系统使用无监督聚类方法隔离有毒文本。我们在7个数据集和15种中毒攻击(包括新开发的自适应中毒策略和多攻击者场景)上评估了RAGOrigin。我们的方法在识别有毒内容方面优于现有基线,并在动态和有噪条件下保持稳健。这些结果表明,RAGOrigin为追溯RAG系统中受污染知识的来源提供了一种实用而有效的解决方案。
摘要:Retrieval-Augmented Generation (RAG) integrates external knowledge into large language models to improve response quality. However, recent work has shown that RAG systems are highly vulnerable to poisoning attacks, where malicious texts are inserted into the knowledge database to influence model outputs. While several defenses have been proposed, they are often circumvented by more adaptive or sophisticated attacks. This paper presents RAGOrigin, a black-box responsibility attribution framework designed to identify which texts in the knowledge database are responsible for misleading or incorrect generations. Our method constructs a focused attribution scope tailored to each misgeneration event and assigns a responsibility score to each candidate text by evaluating its retrieval ranking, semantic relevance, and influence on the generated response. The system then isolates poisoned texts using an unsupervised clustering method. We evaluate RAGOrigin across seven datasets and fifteen poisoning attacks, including newly developed adaptive poisoning strategies and multi-attacker scenarios. Our approach outperforms existing baselines in identifying poisoned content and remains robust under dynamic and noisy conditions. These results suggest that RAGOrigin provides a practical and effective solution for tracing the origins of corrupted knowledge in RAG systems.
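RAGOrigin给每个候选文本打"责任分",综合检索排名、语义相关性与对生成响应的影响,再用无监督聚类隔离有毒文本。下面是一个示意性评分与聚类流程(三路信号的具体计算方式与等权组合均为假设;此处以KMeans二分代替论文的聚类方法):

```python
import numpy as np
from sklearn.cluster import KMeans

def responsibility_scores(rank, relevance, influence):
    """三路信号各自标准化后等权相加(权重为假设):
    rank:      检索排名(越靠前嫌疑越大,故取倒数)
    relevance: 与误导性回答的语义相关度
    influence: 移除该文本后回答变化的程度"""
    z = lambda a: (a - a.mean()) / (a.std() + 1e-8)
    return z(1.0 / rank) + z(relevance) + z(influence)

rank = np.array([1, 2, 3, 4, 5], dtype=float)          # 占位信号
relevance = np.array([0.9, 0.85, 0.2, 0.1, 0.15])
influence = np.array([0.8, 0.7, 0.1, 0.05, 0.1])
scores = responsibility_scores(rank, relevance, influence)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(scores.reshape(-1, 1))
print(scores, labels)  # 分数高的那个簇即被标记为疑似投毒文本
```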
【3】Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures
标题:通过混合量子经典生成模型架构的用于图像合成的量子强化学习引导扩散模型
链接:https://arxiv.org/abs/2509.14163
作者: Chen, En-Jui Kuo
摘要:扩散模型通常采用静态或启发式的无分类器引导(CFG)调度,往往无法随时间步和噪声条件自适应调整。在这项工作中,我们引入了一个量子强化学习(QRL)控制器,在每个去噪步骤动态调整CFG。该控制器采用混合量子-经典的actor-critic架构:一个带环形纠缠的浅层变分量子电路(VQC)生成策略特征,由紧凑的多层感知器(MLP)将其映射为$\Delta$CFG上的高斯动作,而经典critic估计价值函数。策略使用带广义优势估计(GAE)的近端策略优化(PPO)进行优化,其奖励在分类置信度、感知改进和动作正则化之间取得平衡。在CIFAR-10上的实验表明,与经典RL actor和固定调度相比,我们的QRL策略在减少参数量的同时提升了感知质量(LPIPS、PSNR、SSIM)。对量子比特数和电路深度的消融研究揭示了精度与效率之间的权衡,扩展评估确认了长扩散调度下的稳健生成。
摘要:Diffusion models typically employ static or heuristic classifier-free guidance (CFG) schedules, which often fail to adapt across timesteps and noise conditions. In this work, we introduce a quantum reinforcement learning (QRL) controller that dynamically adjusts CFG at each denoising step. The controller adopts a hybrid quantum--classical actor--critic architecture: a shallow variational quantum circuit (VQC) with ring entanglement generates policy features, which are mapped by a compact multilayer perceptron (MLP) into Gaussian actions over $\Delta$CFG, while a classical critic estimates value functions. The policy is optimized using Proximal Policy Optimization (PPO) with Generalized Advantage Estimation (GAE), guided by a reward that balances classification confidence, perceptual improvement, and action regularization. Experiments on CIFAR-10 demonstrate that our QRL policy improves perceptual quality (LPIPS, PSNR, SSIM) while reducing parameter count compared to classical RL actors and fixed schedules. Ablation studies on qubit number and circuit depth reveal trade-offs between accuracy and efficiency, and extended evaluations confirm robust generation under long diffusion schedules.
半/弱/无/有监督|不确定性|主动学习(6篇)
【1】Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
标题:计算即教师:将推理时计算转化为无参考监督
链接:https://arxiv.org/abs/2509.14234
作者:yalath, Shashwat Goel, Thomas Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten
备注:22 pages, 8 figures, 2 tables
摘要:当后训练中没有基准真值时,学习信号从何而来?我们提出通过"计算即教师"(CaT)将探索转化为监督:CaT从一组并行rollout中合成单个参考,再朝该参考优化,从而把模型在推理时的自身探索转化为无参考监督。具体而言,当前策略产生一组rollout;一个冻结的锚(初始策略)调和其中的遗漏与矛盾以估计参考,将额外的推理时计算转化为教师信号。我们在两种情形下将其转化为奖励:(i)可验证任务使用最终答案的程序化等价;(ii)不可验证任务使用自我提出的评分标准(rubrics),即由独立LLM评审打分的二元、可审计准则,奖励为被满足准则的比例。与选择式方法(best-of-N、多数投票、困惑度或评审分数)不同,合成结果可能与多数不一致,甚至在所有rollout都错误时仍然正确;性能随rollout数量而提升。作为一种测试时过程,CaT提升了Gemma 3 4B、Qwen 3 4B和Llama 3.1 8B(在MATH-500上最高+27%;在HealthBench上+12%)。借助强化学习(CaT-RL),我们获得进一步增益(最高+33%和+30%),训练后的策略超越了初始教师信号。
摘要:Where do learning signals come from when there is no ground truth in post-training? We propose turning exploration into supervision through Compute as Teacher (CaT), which converts the model's own exploration at inference-time into reference-free supervision by synthesizing a single reference from a group of parallel rollouts and then optimizing toward it. Concretely, the current policy produces a group of rollouts; a frozen anchor (the initial policy) reconciles omissions and contradictions to estimate a reference, turning extra inference-time compute into a teacher signal. We turn this into rewards in two regimes: (i) verifiable tasks use programmatic equivalence on final answers; (ii) non-verifiable tasks use self-proposed rubrics-binary, auditable criteria scored by an independent LLM judge, with reward given by the fraction satisfied. Unlike selection methods (best-of-N, majority, perplexity, or judge scores), synthesis may disagree with the majority and be correct even when all rollouts are wrong; performance scales with the number of rollouts. As a test-time procedure, CaT improves Gemma 3 4B, Qwen 3 4B, and Llama 3.1 8B (up to +27% on MATH-500; +12% on HealthBench). With reinforcement learning (CaT-RL), we obtain further gains (up to +33% and +30%), with the trained policy surpassing the initial teacher signal.
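CaT的两种奖励机制都可以落成很短的代码:可验证任务比较最终答案的程序化等价,不可验证任务按满足的评分标准比例给分。示意如下(等价判断与评分函数为简化假设,论文中rubric由独立LLM评审判定):

```python
def verifiable_reward(pred_answer: str, ref_answer: str) -> float:
    """可验证任务:最终答案的程序化等价(此处以数值等价简化示意)。"""
    try:
        return float(abs(float(pred_answer) - float(ref_answer)) < 1e-6)
    except ValueError:
        return float(pred_answer.strip() == ref_answer.strip())

def rubric_reward(response: str, rubric_checks) -> float:
    """不可验证任务:奖励 = 被满足的二元评分标准所占比例。
    rubric_checks: 可调用列表;实际中每条由独立LLM评审打分。"""
    results = [bool(check(response)) for check in rubric_checks]
    return sum(results) / len(results)

checks = [lambda r: "剂量" in r, lambda r: len(r) > 50]   # 占位标准
print(rubric_reward("……建议剂量为……" + "x" * 60, checks))  # 输出 1.0
```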
【2】Beyond Correlation: Causal Multi-View Unsupervised Feature Selection Learning
标题:超越相关性:因果多视图无监督特征选择学习
链接:https://arxiv.org/abs/2509.13763
作者:hen, Yanyong Huang, Bin Wang, Jinyuan Chang, Shiyu Liu, Tianrui Li
摘要:多视图无监督特征选择(MUFS)算法因其在多视图无标签数据降维方面的良好性能而受到越来越多的关注。现有的MUFS方法通常通过捕获特征与聚类标签之间的相关性来选择有判别力的特征。然而,一个重要但尚未探索的问题仍然存在:"这种相关性是否足够可靠,以指导特征选择?"在本文中,我们通过引入一种新的结构因果模型,从因果角度分析MUFS,结果表明:由于忽略了混杂因素造成的虚假相关性,现有方法可能会选择不相关的特征。基于这一因果视角,我们提出了一种新的MUFS方法,称为因果多视图无监督特征选择学习(CAUSA)。具体来说,我们首先采用广义无监督谱回归模型,通过捕获特征与共识聚类标签之间的依赖关系来识别信息特征。随后,我们引入一个因果正则化模块,能够自适应地从多视图数据中分离混杂因素,同时学习视图共享的样本权重以平衡混杂因素分布,从而减轻虚假相关性。此后,将二者整合到一个统一的学习框架中,使CAUSA能够选择具有因果信息量的特征。综合实验表明,CAUSA优于多种最先进的方法。据我们所知,这是首个对无监督设置下因果多视图特征选择的深入研究。
摘要:Multi-view unsupervised feature selection (MUFS) has recently received increasing attention for its promising ability in dimensionality reduction on multi-view unlabeled data. Existing MUFS methods typically select discriminative features by capturing correlations between features and clustering labels. However, an important yet underexplored question remains: Are such correlations sufficiently reliable to guide feature selection? In this paper, we analyze MUFS from a causal perspective by introducing a novel structural causal model, which reveals that existing methods may select irrelevant features because they overlook spurious correlations caused by confounders. Building on this causal perspective, we propose a novel MUFS method called CAusal multi-view Unsupervised feature Selection leArning (CAUSA). Specifically, we first employ a generalized unsupervised spectral regression model that identifies informative features by capturing dependencies between features and consensus clustering labels. We then introduce a causal regularization module that can adaptively separate confounders from multi-view data and simultaneously learn view-shared sample weights to balance confounder distributions, thereby mitigating spurious correlations. Thereafter, integrating both into a unified learning framework enables CAUSA to select causally informative features. Comprehensive experiments demonstrate that CAUSA outperforms several state-of-the-art methods. To our knowledge, this is the first in-depth study of causal multi-view feature selection in the unsupervised setting.
【3】A Conformal Prediction Framework for Uncertainty Quantification in Physics-Informed Neural Networks
标题:物理信息神经网络中不确定性量化的共形预测框架
链接:https://arxiv.org/abs/2509.13717
作者: Cheuk Hin Ho, Yangshuai Wang
摘要:物理信息神经网络(PINN)已经成为解决偏微分方程的强大框架,但现有的PINN不确定性量化(UQ)方法通常缺乏严格的统计保证。在这项工作中,我们通过引入PINN中UQ的无分布共形预测(CP)框架来弥合这一差距。该框架通过在校准集上构建不一致性分数来校准预测区间,从而产生具有严格有限样本覆盖保证的PINN分布自由不确定性估计。为了处理空间异方差性,我们进一步引入局部共形分位数估计,在保持理论保证的同时实现空间自适应不确定性带。通过对典型偏微分方程(阻尼谐振子,泊松,艾伦-卡恩和亥姆霍兹方程)的系统评估和多个不确定性度量的综合测试,我们的结果表明,所提出的框架实现了可靠的校准和局部自适应不确定性区间,始终优于启发式UQ方法。通过将PINN与无分布UQ连接起来,这项工作引入了一个通用框架,不仅增强了校准和可靠性,而且为复杂PDE系统的不确定性感知建模开辟了新的途径。
摘要:Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving PDEs, yet existing uncertainty quantification (UQ) approaches for PINNs generally lack rigorous statistical guarantees. In this work, we bridge this gap by introducing a distribution-free conformal prediction (CP) framework for UQ in PINNs. This framework calibrates prediction intervals by constructing nonconformity scores on a calibration set, thereby yielding distribution-free uncertainty estimates with rigorous finite-sample coverage guarantees for PINNs. To handle spatial heteroskedasticity, we further introduce local conformal quantile estimation, enabling spatially adaptive uncertainty bands while preserving theoretical guarantee. Through systematic evaluations on typical PDEs (damped harmonic oscillator, Poisson, Allen-Cahn, and Helmholtz equations) and comprehensive testing across multiple uncertainty metrics, our results demonstrate that the proposed framework achieves reliable calibration and locally adaptive uncertainty intervals, consistently outperforming heuristic UQ approaches. By bridging PINNs with distribution-free UQ, this work introduces a general framework that not only enhances calibration and reliability, but also opens new avenues for uncertainty-aware modeling of complex PDE systems.
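摘要中的共形预测校准可以用标准的split-CP几行实现:在校准集上取非一致性分数的分位数,构成具有有限样本覆盖保证的预测区间(此处以绝对残差作分数;论文的局部共形分位数估计是在此基础上再做空间自适应):

```python
import numpy as np

def split_conformal_interval(pinn_pred, cal_pred, cal_true, alpha=0.1):
    """cal_pred/cal_true: 校准集上的PINN预测与参考解。
    返回测试点上的 (下界, 上界),覆盖率约为 1 - alpha。"""
    scores = np.abs(cal_pred - cal_true)               # 非一致性分数
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")                    # 有限样本修正分位数
    return pinn_pred - q, pinn_pred + q

cal_pred = np.random.rand(200)                          # 占位:校准集预测
cal_true = cal_pred + 0.05 * np.random.randn(200)       # 占位:参考解
lo, hi = split_conformal_interval(np.array([0.3, 0.7]), cal_pred, cal_true)
print(lo, hi)
```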
【4】Unsupervised Anomaly Detection in ALS EPICS Event Logs
标题:ALS EPICS事件日志中的无监督异常检测
链接:https://arxiv.org/abs/2509.13621
作者:ulc, Thorsten Hellert, Steven Hunt
备注:6 pages, 5 figures, The 20th International Conference on Accelerator and Large Experimental Physics Control Systems
摘要:本文介绍了一个自动故障分析框架的高级光源(ALS),处理实时事件日志从EPICS控制系统。通过将日志条目视为自然语言,我们使用语义嵌入技术将其转换为上下文向量表示。经过正常操作数据训练的序列感知神经网络为每个事件分配实时异常评分。该方法标记与基线行为的偏差,使操作员能够快速识别复杂系统故障之前的关键事件序列。
摘要:This paper introduces an automated fault analysis framework for the Advanced Light Source (ALS) that processes real-time event logs from its EPICS control system. By treating log entries as natural language, we transform them into contextual vector representations using semantic embedding techniques. A sequence-aware neural network, trained on normal operational data, assigns a real-time anomaly score to each event. This method flags deviations from baseline behavior, enabling operators to rapidly identify the critical event sequences that precede complex system failures.
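该流程可以用如下草图体会(假设性实现,非ALS实际系统;为保持自包含,用哈希词袋代替真实的语义嵌入模型):在正常日志序列上训练GRU预测下一事件的嵌入,测试时以预测误差作为每个事件的实时异常分数。

```python
import torch
import torch.nn as nn

def embed(msg, dim=32):
    """极简哈希词袋嵌入, 代替真实的语义嵌入模型(假设)。"""
    v = torch.zeros(dim)
    for tok in msg.split():
        v[hash(tok) % dim] += 1.0
    return v / (v.norm() + 1e-8)

class NextEventGRU(nn.Module):
    def __init__(self, dim=32, hidden=64):
        super().__init__()
        self.gru = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)
    def forward(self, seq):                       # seq: (B, T, dim)
        h, _ = self.gru(seq)
        return self.head(h)                       # 预测每个位置的下一事件嵌入

logs = ["ioc heartbeat ok", "pump pressure nominal", "ioc heartbeat ok",
        "magnet current stable"] * 50             # 虚构的正常事件流
seq = torch.stack([embed(m) for m in logs]).unsqueeze(0)
model = NextEventGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                              # 仅在正常数据上训练
    pred = model(seq[:, :-1])
    loss = nn.functional.mse_loss(pred, seq[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()

test = logs[:10] + ["beam dump interlock fault"]  # 末尾为虚构的异常事件
ts = torch.stack([embed(m) for m in test]).unsqueeze(0)
with torch.no_grad():
    err = ((model(ts[:, :-1]) - ts[:, 1:]) ** 2).mean(-1).squeeze()
print("每事件异常分数:", [round(float(e), 3) for e in err])
```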
【5】Proximity-Based Evidence Retrieval for Uncertainty-Aware Neural Networks
标题:不确定性感知神经网络的基于邻近度的证据检索
链接:https://arxiv.org/abs/2509.13338
作者:aroun, Mohammad Sadegh Khorshidi, Kasra Ranjbarigderi, Fang Chen, Amir H. Gandomi
备注:15 pages, 4 figures, 3 tables
摘要:这项工作提出了一个证据检索机制,不确定性意识的决策,取代了一个单一的全球截止与证据条件,实例自适应的标准。对于每个测试实例,在嵌入空间中检索邻近样本;通过Dempster-Shafer理论融合它们的预测分布。由此产生的融合信念作为每个实例的阈值机制。因为支持证据是明确的,所以决策是透明和可审计的。在具有BiT和ViT主干的CIFAR-10/100上进行的实验显示出更高或相当的不确定性感知性能,与对预测熵应用阈值相比,具有更少的不正确结果和可持续的审查负载。值得注意的是,只有少数证据足以实现这些收益;增加证据集只会产生适度的变化。这些结果表明,证据条件标记提供了一个更可靠的和可解释的替代固定的预测熵阈值的操作不确定性感知决策。
摘要:This work proposes an evidence-retrieval mechanism for uncertainty-aware decision-making that replaces a single global cutoff with an evidence-conditioned, instance-adaptive criterion. For each test instance, proximal exemplars are retrieved in an embedding space; their predictive distributions are fused via Dempster-Shafer theory. The resulting fused belief acts as a per-instance thresholding mechanism. Because the supporting evidences are explicit, decisions are transparent and auditable. Experiments on CIFAR-10/100 with BiT and ViT backbones show higher or comparable uncertainty-aware performance with materially fewer confidently incorrect outcomes and a sustainable review load compared with applying threshold on prediction entropy. Notably, only a few evidences are sufficient to realize these gains; increasing the evidence set yields only modest changes. These results indicate that evidence-conditioned tagging provides a more reliable and interpretable alternative to fixed prediction entropy thresholds for operational uncertainty-aware decision-making.
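以下为核心机制的简化示意(假设性代码,非论文原始实现):在嵌入空间检索近邻样本,并用Dempster组合规则融合其类别预测分布,得到逐实例的融合信念,再据此决定是否送审。

```python
import numpy as np

def dempster_combine(m1, m2):
    """组合两个仅在单例类别上有质量的证据体(简化的 Dempster 规则)。"""
    joint = np.outer(m1, m2)
    conflict = joint.sum() - np.trace(joint)   # 不同类别配对的质量即冲突 K
    fused = np.diag(joint) / (1.0 - conflict + 1e-12)
    return fused / fused.sum()

def fused_belief(query_emb, bank_embs, bank_probs, k=5):
    """检索 k 个最近邻并融合其预测分布, 作为该实例的证据信念。"""
    d = np.linalg.norm(bank_embs - query_emb, axis=1)
    idx = np.argsort(d)[:k]
    belief = bank_probs[idx[0]]
    for i in idx[1:]:
        belief = dempster_combine(belief, bank_probs[i])
    return belief

rng = np.random.default_rng(1)
bank = rng.normal(size=(100, 16))               # 证据库嵌入(虚构)
probs = rng.dirichlet(np.ones(3), size=100)     # 各证据样本的预测分布
b = fused_belief(rng.normal(size=16), bank, probs)
print("融合信念:", b.round(3), "-> 若最大信念低于阈值则转人工复核")
```

由于支持证据(被检索的近邻)是显式的,每个决策都可回溯到具体样本,这正是摘要所述可审计性的来源。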
【6】Self-Supervised and Topological Signal-Quality Assessment for Any PPG Device
标题:适用于任何PPG设备的自监督与拓扑信号质量评估
链接:https://arxiv.org/abs/2509.12510
作者: Ruoyu Zhang, Zequan Liang, Ehsan Kourkchi, Setareh Rafatirad, Houman Homayoun
备注:In the proceedings of IEEE-EMBS BSN 2025
摘要:可穿戴式光电容积描记术(PPG)嵌入在数十亿台设备中,但其光学波形很容易受到运动、灌注损失和环境光的破坏,从而危及下游的心脏测量分析。现有的信号质量评估(SQA)方法要么依赖于脆弱的启发式方法,要么依赖于数据饥饿的监督模型。我们为腕部PPG引入了第一个完全无监督的SQA管道。阶段1在来自异构源(设备和采样频率不同)的276小时原始未标记数据上训练对比1-D ResNet-18,产生光学发射器和运动不变的嵌入(即,所学习的表示在LED波长、驱动强度和设备光学器件以及手腕运动的差异上是稳定的)。阶段2通过持续同调(persistent homology, PH)将每个512-D编码器嵌入转换为4-D拓扑签名,并使用HDBSCAN对这些签名进行聚类。为了产生二进制信号质量指数(SQI),可接受的PPG信号由密集聚类表示,而其余聚类被假设为主要包含质量差的PPG信号。在没有重新调整的情况下,SQI在10,000个窗口的分层样本上分别获得0.72、0.34和6173的Silhouette、Davies-Bouldin和Calinski-Harabasz分数。在这项研究中,我们提出了一个混合自监督学习-拓扑数据分析(SSL-TDA)框架,为PPG信号提供了一个可扩展的跨设备质量门。
摘要:Wearable photoplethysmography (PPG) is embedded in billions of devices, yet its optical waveform is easily corrupted by motion, perfusion loss, and ambient light, jeopardizing downstream cardiometric analytics. Existing signal-quality assessment (SQA) methods rely either on brittle heuristics or on data-hungry supervised models. We introduce the first fully unsupervised SQA pipeline for wrist PPG. Stage 1 trains a contrastive 1-D ResNet-18 on 276 h of raw, unlabeled data from heterogeneous sources (varying in device and sampling frequency), yielding optical-emitter- and motion-invariant embeddings (i.e., the learned representation is stable across differences in LED wavelength, drive intensity, and device optics, as well as wrist motion). Stage 2 converts each 512-D encoder embedding into a 4-D topological signature via persistent homology (PH) and clusters these signatures with HDBSCAN. To produce a binary signal-quality index (SQI), the acceptable PPG signals are represented by the densest cluster while the remaining clusters are assumed to mainly contain poor-quality PPG signals. Without re-tuning, the SQI attains Silhouette, Davies-Bouldin, and Calinski-Harabasz scores of 0.72, 0.34, and 6173, respectively, on a stratified sample of 10,000 windows. In this study, we propose a hybrid self-supervised-learning--topological-data-analysis (SSL--TDA) framework that offers a drop-in, scalable, cross-device quality gate for PPG signals.
迁移|Zero/Few/One-Shot|自适应(7篇)
【1】Adaptive Client Selection via Q-Learning-based Whittle Index in Wireless Federated Learning
标题:无线联合学习中通过基于Q-Learning的Whittle指数进行自适应客户端选择
链接:https://arxiv.org/abs/2509.13933
作者: Yingxin Liu, Hang Qi, Jieping Luo, Zhizhang Liu, Jingjin Wu
摘要:我们研究无线联邦学习(FL)中的客户端选择问题,目标是减少达到给定学习精度所需的总时间。由于服务器无法观测客户端会改变其计算与通信效率的动态状态,我们将客户端选择建模为无休止多臂老虎机(restless multi-armed bandit)问题。我们提出了一种可扩展且高效的方法,称为联邦Q学习中的Whittle指数学习(WILF-Q),它使用Q学习自适应地学习和更新与每个客户端相关联的近似Whittle指数,然后选择指数最高的客户端。与现有方法相比,WILF-Q不需要客户端状态转移或数据分布的显式知识,非常适合在实际FL环境中部署。实验结果表明,WILF-Q在学习效率方面显著优于现有基线策略,为无线FL中的客户端选择提供了一种鲁棒且高效的方法。
摘要:We consider the client selection problem in wireless Federated Learning (FL), with the objective of reducing the total required time to achieve a certain level of learning accuracy. Since the server cannot observe the clients' dynamic states that can change their computation and communication efficiency, we formulate client selection as a restless multi-armed bandit problem. We propose a scalable and efficient approach called the Whittle Index Learning in Federated Q-learning (WILF-Q), which uses Q-learning to adaptively learn and update an approximated Whittle index associated with each client, and then selects the clients with the highest indices. Compared to existing approaches, WILF-Q does not require explicit knowledge of client state transitions or data distributions, making it well-suited for deployment in practical FL settings. Experiment results demonstrate that WILF-Q significantly outperforms existing baseline policies in terms of learning efficiency, providing a robust and efficient approach to client selection in wireless FL.
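下面是Whittle指数思想的一个极简表格化草图(非论文原始算法;状态转移、奖励与全部参数均为虚构):对每个客户端状态用Q学习估计"调度/不调度"两动作的价值,并将补贴λ朝两者价值相等的方向更新,得到近似Whittle指数,每轮据此选取指数最高的客户端。

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 2                        # 4 个客户端状态; 动作: 0=不选, 1=选入本轮
Q = np.zeros((S, A))
lam = np.zeros(S)                  # 每个状态的近似 Whittle 指数(补贴)
alpha, beta, gamma = 0.1, 0.01, 0.9

def step(s, a):
    """虚构环境: 被选中的客户端倾向于改善状态并带来更大训练增益。"""
    r = (1.5 if a == 1 else 0.2) * (s + 1) / S
    s2 = min(S - 1, s + 1) if (a == 1 and rng.random() < 0.7) else max(0, s - 1)
    return r, s2

s = 0
for t in range(20000):
    # epsilon-贪婪; 贪婪比较时给"不选"动作加上补贴 lam[s]
    a = rng.integers(A) if rng.random() < 0.1 else \
        int(np.argmax(Q[s] + np.array([lam[s], 0.0])))
    r, s2 = step(s, a)
    subsidized = r + (lam[s] if a == 0 else 0.0)
    Q[s, a] += alpha * (subsidized + gamma * Q[s2].max() - Q[s, a])
    lam[s] += beta * (Q[s, 1] - Q[s, 0])     # 朝两动作无差异点更新指数
    s = s2
print("近似 Whittle 指数(按状态):", lam.round(3))
```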
【2】APFEx: Adaptive Pareto Front Explorer for Intersectional Fairness
标题:APFEx:自适应帕累托前沿探索器,实现交叉公平
链接:https://arxiv.org/abs/2509.13908
作者:a Mondal, Faizanuddin Ansari, Swagatam Das
摘要:确保机器学习模型的公平性至关重要,特别是当偏见在种族、性别和年龄等交叉保护属性中复合时。虽然现有的方法解决了单一属性的公平性,但它们未能捕捉到交叉子群体所面临的细微的乘性偏见。我们引入自适应帕累托前沿探索器(APFEx),这是第一个将交叉公平性显式建模为敏感属性笛卡尔积上联合优化问题的框架。APFEx结合了三个关键创新:(1)自适应多目标优化器,可在Pareto锥投影、梯度加权和探索策略之间动态切换,以导航公平性-准确性权衡;(2)可微交叉公平性度量,实现对非平滑子群差异的基于梯度的优化;(3)收敛到Pareto最优解的理论保证。在四个真实数据集上的实验证明了APFEx的优越性,在保持竞争性准确率的同时减少了公平性违规。我们的工作弥合了公平ML中的一个关键差距,为交叉公平性提供了一个可扩展的、模型无关的解决方案。
摘要:Ensuring fairness in machine learning models is critical, especially when biases compound across intersecting protected attributes like race, gender, and age. While existing methods address fairness for single attributes, they fail to capture the nuanced, multiplicative biases faced by intersectional subgroups. We introduce Adaptive Pareto Front Explorer (APFEx), the first framework to explicitly model intersectional fairness as a joint optimization problem over the Cartesian product of sensitive attributes. APFEx combines three key innovations- (1) an adaptive multi-objective optimizer that dynamically switches between Pareto cone projection, gradient weighting, and exploration strategies to navigate fairness-accuracy trade-offs, (2) differentiable intersectional fairness metrics enabling gradient-based optimization of non-smooth subgroup disparities, and (3) theoretical guarantees of convergence to Pareto-optimal solutions. Experiments on four real-world datasets demonstrate APFEx's superiority, reducing fairness violations while maintaining competitive accuracy. Our work bridges a critical gap in fair ML, providing a scalable, model-agnostic solution for intersectional fairness.
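作为其中"可微交叉公平性度量"一环的示意(假设性简化,非论文定义):在敏感属性笛卡尔积的每个子群上计算模型平均预测率,以子群间差异的平滑近似作为可与任务损失联合优化的公平性损失。

```python
import torch
from itertools import product

def intersectional_dp_gap(probs, attrs, tau=50.0):
    """probs: (N,) 正类概率; attrs: (N, K) 离散敏感属性。
    返回笛卡尔积子群平均预测率的平滑最大差(可反向传播)。"""
    groups = []
    for combo in product(*[attrs[:, k].unique() for k in range(attrs.shape[1])]):
        mask = torch.ones(len(probs), dtype=torch.bool)
        for k, v in enumerate(combo):
            mask &= attrs[:, k] == v
        if mask.sum() > 0:
            groups.append(probs[mask].mean())
    g = torch.stack(groups)
    # logsumexp 作为 max/min 的平滑可微近似
    smooth_max = torch.logsumexp(tau * g, 0) / tau
    smooth_min = -torch.logsumexp(-tau * g, 0) / tau
    return smooth_max - smooth_min

probs = torch.sigmoid(torch.randn(200, requires_grad=True))
attrs = torch.randint(0, 2, (200, 2))      # 两个二值敏感属性 -> 4 个交叉子群
loss = intersectional_dp_gap(probs, attrs)
loss.backward()                            # 梯度可回传, 可并入多目标优化
print("交叉子群差异(平滑):", float(loss))
```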
【3】TFMAdapter: Lightweight Instance-Level Adaptation of Foundation Models for Forecasting with Covariates
标题:TFMAdapter:基础模型的轻量级实例级自适应,用于协变量预测
链接:https://arxiv.org/abs/2509.13906
作者:ge, Sunita Sarawagi
备注:Accepted at CIKM 2025
摘要:时间序列基础模型(TSFMs)最近在新时间序列的单变量预测中取得了最先进的性能,只需以过去值的简短历史为条件。它们的成功表明,跨不同领域的大规模预训练可以获得归纳偏差,从而在简短的历史中从时间模式中进行概括。然而,由于协变量的领域特定性质及相关归纳偏差的缺失,大多数TSFM无法利用协变量——这些未来可得的外生变量在许多应用中对准确预测至关重要。我们提出了TFMAdapter,一个轻量级的实例级适配器,无需微调即可用协变量信息增强TSFM。TFMAdapter不是重新训练,而是在单个模型调用期间提供的有限历史上操作,学习将协变量与单变量TSFM预测相结合的非参数级联。然而,这样的学习需要历史中每一步的单变量预测,从而需要过多的TSFM调用。为了在限制TSFM调用的同时实现对完整历史上下文的训练,TFMAdapter使用了一种两阶段的方法:(1)使用简单的回归模型生成伪预测,以及(2)训练高斯过程回归器,结合伪预测、TSFM预测与协变量来细化预测。在真实世界数据集上进行的大量实验表明,TFMAdapter始终优于基础模型和监督基线,以最小的数据和计算开销相比基础模型实现了24-27%的改进。我们的研究结果突出了轻量级适配器在弥合通用基础模型与特定领域预测需求之间差距方面的潜力。
摘要:Time Series Foundation Models (TSFMs) have recently achieved state-of-the-art performance in univariate forecasting on new time series simply by conditioning on a brief history of past values. Their success demonstrates that large-scale pretraining across diverse domains can acquire the inductive bias to generalize from temporal patterns in a brief history. However, most TSFMs are unable to leverage covariates -- future-available exogenous variables critical for accurate forecasting in many applications -- due to their domain-specific nature and the lack of associated inductive bias. We propose TFMAdapter, a lightweight, instance-level adapter that augments TSFMs with covariate information without fine-tuning. Instead of retraining, TFMAdapter operates on the limited history provided during a single model call, learning a non-parametric cascade that combines covariates with univariate TSFM forecasts. However, such learning would require univariate forecasts at all steps in the history, requiring too many calls to the TSFM. To enable training on the full historical context while limiting TSFM invocations, TFMAdapter uses a two-stage method: (1) generating pseudo-forecasts with a simple regression model, and (2) training a Gaussian Process regressor to refine predictions using both pseudo- and TSFM forecasts alongside covariates. Extensive experiments on real-world datasets demonstrate that TFMAdapter consistently outperforms both foundation models and supervised baselines, achieving a 24-27\% improvement over base foundation models with minimal data and computational overhead. Our results highlight the potential of lightweight adapters to bridge the gap between generic foundation models and domain-specific forecasting needs.
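两阶段思路可用如下草图体会(数据与模型均为假设;`tsfm`数组代表对基础模型一次调用得到的单变量预测):第一阶段用岭回归在滞后特征上生成全历史的伪预测,避免反复调用TSFM;第二阶段用高斯过程回归器在[伪预测, TSFM预测, 协变量]上细化预测。

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
T = 200
cov = rng.normal(size=(T, 2))                    # 未来可得的协变量(虚构)
y = (0.8 * cov[:, 0] - 0.5 * cov[:, 1]
     + np.sin(np.arange(T) / 10) + 0.1 * rng.normal(size=T))
tsfm = np.sin(np.arange(T) / 10)                 # 假设: 未利用协变量的 TSFM 预测

# 阶段 1: 简单回归器在滞后特征上生成全历史伪预测
lags = np.stack([np.roll(y, i) for i in (1, 2, 3)], axis=1)[3:]
pseudo = Ridge().fit(lags, y[3:]).predict(lags)

# 阶段 2: GP 回归器学习用伪预测/TSFM预测/协变量修正目标
X = np.column_stack([pseudo, tsfm[3:], cov[3:]])
gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[:150], y[3:][:150])
pred = gp.predict(X[150:])
print("细化后 MAE:", np.abs(pred - y[3:][150:]).mean(),
      "| 仅 TSFM MAE:", np.abs(tsfm[3:][150:] - y[3:][150:]).mean())
```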
【4】Hybrid Quantum-Classical Neural Networks for Few-Shot Credit Risk Assessment
标题:用于Few-Shot信用风险评估的混合量子经典神经网络
链接:https://arxiv.org/abs/2509.13818
作者:Wang, Yanbo J. Wang, Jiachi Zhang, Qi Xu, Yilun Zhao, Jintao Li, Yipeng Zhang, Bo Yang, Xinkai Gao, Xiaofeng Cao, Kai Xu, Pengpeng Hao, Xuan Yang, Heng Fan
摘要:量子机器学习(QML)为解决经典方法难以解决的复杂金融问题提供了一种新的范式。这项工作专门解决了Few-Shot信用风险评估的挑战,这是包容性金融中的一个关键问题,数据稀缺和不平衡限制了传统模型的有效性。为了解决这个问题,我们设计并实现了一种新的混合量子经典工作流程。该方法首先采用经典机器学习模型(逻辑回归,随机森林,XGBoost)的集成进行智能特征工程和降维。随后,通过参数移位规则训练的量子神经网络(QNN)作为核心分类器。该框架通过数值模拟进行了评估,并部署在Quafu量子云平台的ScQ-P21超导处理器上。在279个样本的真实信用数据集上,我们的QNN在模拟中实现了0.852 +/- 0.027的稳健平均AUC,并在硬件实验中获得了令人印象深刻的0.88 AUC。这种性能超过了一套经典的基准测试,在召回指标上有特别强的结果。这项研究为在NISQ时代将量子计算应用于数据受限的金融场景提供了一个务实的蓝图,并提供了宝贵的经验证据,支持其在普惠金融等高风险应用中的潜力。
摘要:Quantum Machine Learning (QML) offers a new paradigm for addressing complex financial problems intractable for classical methods. This work specifically tackles the challenge of few-shot credit risk assessment, a critical issue in inclusive finance where data scarcity and imbalance limit the effectiveness of conventional models. To address this, we design and implement a novel hybrid quantum-classical workflow. The methodology first employs an ensemble of classical machine learning models (Logistic Regression, Random Forest, XGBoost) for intelligent feature engineering and dimensionality reduction. Subsequently, a Quantum Neural Network (QNN), trained via the parameter-shift rule, serves as the core classifier. This framework was evaluated through numerical simulations and deployed on the Quafu Quantum Cloud Platform's ScQ-P21 superconducting processor. On a real-world credit dataset of 279 samples, our QNN achieved a robust average AUC of 0.852 +/- 0.027 in simulations and yielded an impressive AUC of 0.88 in the hardware experiment. This performance surpasses a suite of classical benchmarks, with a particularly strong result on the recall metric. This study provides a pragmatic blueprint for applying quantum computing to data-constrained financial scenarios in the NISQ era and offers valuable empirical evidence supporting its potential in high-stakes applications like inclusive finance.
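参数移位规则的核心可以不依赖量子硬件来演示(仅为示意;这里用一个结构上满足该规则的解析期望值函数代替真实电路):对每个参数θ_i,梯度恰等于在θ_i±π/2处测得的期望值之差的一半,因此可用它驱动梯度下降训练"QNN"。

```python
import numpy as np

def expectation(theta):
    """玩具'电路': <Z> 期望 = prod cos(theta_i), 精确满足参数移位规则。"""
    return np.prod(np.cos(theta))

def parameter_shift_grad(f, theta):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = np.pi / 2
        grad[i] = 0.5 * (f(theta + e) - f(theta - e))   # 参数移位规则
    return grad

theta = np.array([0.3, -1.2, 0.7])
for step in range(100):                 # 用移位梯度最小化期望值
    theta -= 0.2 * parameter_shift_grad(expectation, theta)
print("期望值:", round(float(expectation(theta)), 4),
      "| 移位梯度:", parameter_shift_grad(expectation, theta).round(4))
```

在真实处理器上,`expectation` 对应一次带采样噪声的测量估计,其余流程不变,这正是该规则适合NISQ硬件的原因。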
【5】WatchAnxiety: A Transfer Learning Approach for State Anxiety Prediction from Smartwatch Data
标题:WatchAnxiety:一种基于智能手表数据预测状态焦虑的迁移学习方法
链接:https://arxiv.org/abs/2509.13725
作者: Ahmed, Noah French, Mark Rucker, Zhiyuan Wang, Taylor Myers-Brower, Kaitlyn Petz, Mehdi Boukhechba, Bethany A. Teachman, Laura E. Barnes
摘要:社交焦虑是一种常见的心理健康状况,与学术、社交和职业功能的重大挑战有关。一个核心特征是在社交场合中瞬间(状态)焦虑升高,但很少有先前的工作测量或预测这种焦虑在一天中的波动。捕捉这些日内动态对于设计实时、个性化的干预措施至关重要,例如及时适应性干预措施(JITAIs)。为了解决这一差距,我们对社交焦虑的大学生(N=91;排除后72)进行了一项研究,使用我们的定制智能手表系统,平均时长为9.03天(SD = 2.95)。参与者每天接受七次生态瞬时评估(EMA)以报告状态焦虑。我们基于超过10,000天的外部心率数据开发了一个基础模型,将其表示迁移到我们的数据集,并对其进行微调以生成概率预测。这些预测结果随后与特质水平测量一起输入元学习器。在我们的数据集中,我们的管道在状态焦虑检测中达到了60.4%的平衡准确率。为了评估泛化能力,我们将该训练方法应用于TILES-18数据集(即用于预训练的同一数据集)的独立保留集。在10,095次每日一次的EMA上,我们的方法实现了59.1%的平衡准确率,比之前的工作至少高出7%。
摘要:Social anxiety is a common mental health condition linked to significant challenges in academic, social, and occupational functioning. A core feature is elevated momentary (state) anxiety in social situations, yet little prior work has measured or predicted fluctuations in this anxiety throughout the day. Capturing these intra-day dynamics is critical for designing real-time, personalized interventions such as Just-In-Time Adaptive Interventions (JITAIs). To address this gap, we conducted a study with socially anxious college students (N=91; 72 after exclusions) using our custom smartwatch-based system over an average of 9.03 days (SD = 2.95). Participants received seven ecological momentary assessments (EMAs) per day to report state anxiety. We developed a base model on over 10,000 days of external heart rate data, transferred its representations to our dataset, and fine-tuned it to generate probabilistic predictions. These were combined with trait-level measures in a meta-learner. Our pipeline achieved 60.4% balanced accuracy in state anxiety detection in our dataset. To evaluate generalizability, we applied the training approach to a separate hold-out set from the TILES-18 dataset-the same dataset used for pretraining. On 10,095 once-daily EMAs, our method achieved 59.1% balanced accuracy, outperforming prior work by at least 7%.
【6】Efficient Last-Iterate Convergence in Regret Minimization via Adaptive Reward Transformation
标题:通过自适应奖励转换实现遗憾最小化的高效最后迭代收敛
链接:https://arxiv.org/abs/2509.13653
作者: Yulin Wu, Shuhan Qi, Jiajia Zhang, Xiaozhen Sun, Tianzi Ma, Xuan Wang
摘要:遗憾最小化是求解正规形式博弈(NFG)和扩展形式博弈(EFG)纳什均衡的有效方法,但它通常只保证平均策略的收敛性。然而,计算平均策略需要大量计算资源或引入额外误差,限制了其实际适用性。奖励变换(RT)框架被引入遗憾最小化,通过对奖励函数进行正则化实现最后迭代收敛。然而,它面临实际挑战:其性能对手动调节的参数高度敏感,这些参数往往偏离理论收敛条件,导致收敛缓慢、振荡或停滞在局部最优。 受先前工作的启发,我们提出一种自适应技术来解决这些问题,使RT遗憾匹配(RTRM)、RT反事实遗憾最小化(RTCFR)及其变体在求解NFG和EFG时,理论保证与实际性能更加一致。我们的自适应方法动态调整参数,在平衡探索与利用的同时改善遗憾累积,最终增强渐近最后迭代收敛性并实现线性收敛。实验结果表明,我们的方法显著加速收敛,优于最先进的算法。
摘要:Regret minimization is a powerful method for finding Nash equilibria in Normal-Form Games (NFGs) and Extensive-Form Games (EFGs), but it typically guarantees convergence only for the average strategy. However, computing the average strategy requires significant computational resources or introduces additional errors, limiting its practical applicability. The Reward Transformation (RT) framework was introduced to regret minimization to achieve last-iterate convergence through reward function regularization. However, it faces practical challenges: its performance is highly sensitive to manually tuned parameters, which often deviate from theoretical convergence conditions, leading to slow convergence, oscillations, or stagnation in local optima. Inspired by previous work, we propose an adaptive technique to address these issues, ensuring better consistency between theoretical guarantees and practical performance for RT Regret Matching (RTRM), RT Counterfactual Regret Minimization (RTCFR), and their variants in solving NFGs and EFGs more effectively. Our adaptive methods dynamically adjust parameters, balancing exploration and exploitation while improving regret accumulation, ultimately enhancing asymptotic last-iterate convergence and achieving linear convergence. Experimental results demonstrate that our methods significantly accelerate convergence, outperforming state-of-the-art algorithms.
【7】Dynamic Aware: Adaptive Multi-Mode Out-of-Distribution Detection for Trajectory Prediction in Autonomous Vehicles
标题:动态感知:用于自动驾驶车辆轨迹预测的自适应多模式分布外(OOD)检测
链接:https://arxiv.org/abs/2509.13577
作者:uo, Lili Su
备注:8 pages, 7 figures
摘要:轨迹预测对于自动驾驶汽车(AV)的安全和无缝操作至关重要。然而,在部署中,预测模型不可避免地面临训练数据和真实世界条件之间的分布变化,其中罕见或代表性不足的交通场景会导致分布外(OOD)情况。虽然AV中大多数先前的OOD检测研究都集中在计算机视觉任务上,如对象检测和分割,但轨迹级的OOD检测在很大程度上仍未得到充分探索。最近的一项研究将这个问题表述为最快变化检测(QCD)任务,为检测延迟和假警报之间的权衡提供了正式保证[1]。在此基础上,我们提出了一个新的框架,引入自适应机制,以实现在复杂的驾驶环境中的鲁棒检测。对多个真实世界数据集的实证分析表明,预测误差——即使是在分布内样本上——也表现出随时间推移而演变的模式依赖性分布。通过明确建模这些误差模式,我们的方法实现了检测延迟和误报率的大幅改善。在已建立的轨迹预测基准上进行的综合实验表明,我们的框架在准确性和计算效率方面都明显优于先前的基于UQ和视觉的OOD方法,为实现可靠的驾驶感知自主提供了一条实用的道路。
摘要:Trajectory prediction is central to the safe and seamless operation of autonomous vehicles (AVs). In deployment, however, prediction models inevitably face distribution shifts between training data and real-world conditions, where rare or underrepresented traffic scenarios induce out-of-distribution (OOD) cases. While most prior OOD detection research in AVs has concentrated on computer vision tasks such as object detection and segmentation, trajectory-level OOD detection remains largely underexplored. A recent study formulated this problem as a quickest change detection (QCD) task, providing formal guarantees on the trade-off between detection delay and false alarms [1]. Building on this foundation, we propose a new framework that introduces adaptive mechanisms to achieve robust detection in complex driving environments. Empirical analysis across multiple real-world datasets reveals that prediction errors--even on in-distribution samples--exhibit mode-dependent distributions that evolve over time with dataset-specific dynamics. By explicitly modeling these error modes, our method achieves substantial improvements in both detection delay and false alarm rates. Comprehensive experiments on established trajectory prediction benchmarks show that our framework significantly outperforms prior UQ- and vision-based OOD approaches in both accuracy and computational efficiency, offering a practical path toward reliable, driving-aware autonomy.
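最快变化检测的经典实现是CUSUM统计量;下面给出一个极简草图(参数与数据均为假设,非论文的自适应多模式版本):对轨迹预测误差流累积对数似然比,误差分布发生漂移时统计量越过阈值即报警。

```python
import numpy as np

def cusum(errors, mu0, mu1, sigma, h=8.0):
    """对数似然比 CUSUM: 检测误差均值从 mu0 漂移到 mu1 的最快变化。"""
    s, alarms = 0.0, []
    for t, e in enumerate(errors):
        llr = ((e - mu0) ** 2 - (e - mu1) ** 2) / (2 * sigma ** 2)
        s = max(0.0, s + llr)        # 负漂移截断, 只累积支持变化的证据
        if s > h:
            alarms.append(t); s = 0.0
    return alarms

rng = np.random.default_rng(0)
err = np.concatenate([rng.normal(0.5, 0.2, 300),    # 分布内: 小预测误差
                      rng.normal(1.5, 0.4, 100)])   # t=300 起进入 OOD 场景(虚构)
print("报警时刻:", cusum(err, mu0=0.5, mu1=1.5, sigma=0.3)[:3], "(真实变点 300)")
```

阈值 h 直接体现检测延迟与误报率之间的权衡;论文的贡献之一即是按误差模式自适应地刻画这类分布参数。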
强化学习(5篇)
【1】TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
标题:TGPO:用于鲁棒Web代理强化学习的树引导偏好优化
链接:https://arxiv.org/abs/2509.14172
作者:en, Zhenghui Zhao, Zhangye Han, Miancan Liu, Xianhang Ye, Yiqing Li, Hongbo Min, Jinkui Ren, Xiantao Zhang, Guitao Cao
摘要:随着大型语言模型和视觉语言模型的快速发展,使用大型模型作为Web Agent已经成为自动Web交互的必要条件。然而,用强化学习训练Web Agent面临着关键的挑战,包括信用分配错位、过高的标注成本和奖励稀疏。为了解决这些问题,我们提出了树引导偏好优化(Tree-Guided Preference Optimization,TGPO),这是一种离线强化学习框架,它提出了一种树结构的轨迹表示,将轨迹之间语义相同的状态合并在一起,以消除标签冲突。我们的框架采用了一个过程奖励模型,通过子目标进度、冗余检测和动作验证自动生成细粒度的奖励。此外,动态加权机制在训练期间优先考虑高影响决策点。在Online-Mind2Web和我们自己构建的C-WebShop数据集上的实验表明,TGPO显著优于现有方法,以更少的冗余步骤获得更高的成功率。
摘要:With the rapid advancement of large language models and vision-language models, employing large models as Web Agents has become essential for automated web interaction. However, training Web Agents with reinforcement learning faces critical challenges including credit assignment misallocation, prohibitively high annotation costs, and reward sparsity. To address these issues, we propose Tree-Guided Preference Optimization (TGPO), an offline reinforcement learning framework that proposes a tree-structured trajectory representation merging semantically identical states across trajectories to eliminate label conflicts. Our framework incorporates a Process Reward Model that automatically generates fine-grained rewards through subgoal progress, redundancy detection, and action verification. Additionally, a dynamic weighting mechanism prioritizes high-impact decision points during training. Experiments on Online-Mind2Web and our self-constructed C-WebShop datasets demonstrate that TGPO significantly outperforms existing methods, achieving higher success rates with fewer redundant steps.
【2】Online Bayesian Risk-Averse Reinforcement Learning
标题:在线Bayesian风险厌恶强化学习
链接:https://arxiv.org/abs/2509.14077
作者:g, Enlu Zhou
摘要:本文研究强化学习(RL)中的贝叶斯风险规避形式化。为了解决由于缺乏数据而导致的认知不确定性,我们采用贝叶斯风险马尔可夫决策过程(BRMDP)来刻画未知基础模型的参数不确定性。在真实未知分布下,我们推导出贝叶斯风险值函数与原始值函数之间差异的渐近正态性。结果表明,贝叶斯风险厌恶方法倾向于悲观地低估原始价值函数;这种差异随着风险厌恶程度的增强而增大,并随着可用数据的增多而减小。然后,我们在在线RL以及作为其特例的在线上下文多臂老虎机(CMAB)设置中利用这一自适应性质,针对一般RL问题和CMAB问题给出了两个使用后验采样的过程。我们建立了次线性遗憾界,其中遗憾在RL和CMAB设置下均采用传统定义;此外,我们还针对以贝叶斯风险遗憾为遗憾定义的CMAB设置建立了次线性遗憾界。最后,我们通过数值实验证明所提算法在处理认知不确定性方面的有效性,并验证了其理论性质。
摘要:In this paper, we study the Bayesian risk-averse formulation in reinforcement learning (RL). To address the epistemic uncertainty due to a lack of data, we adopt the Bayesian Risk Markov Decision Process (BRMDP) to account for the parameter uncertainty of the unknown underlying model. We derive the asymptotic normality that characterizes the difference between the Bayesian risk value function and the original value function under the true unknown distribution. The results indicate that the Bayesian risk-averse approach tends to pessimistically underestimate the original value function. This discrepancy increases with stronger risk aversion and decreases as more data become available. We then utilize this adaptive property in the setting of online RL as well as online contextual multi-arm bandits (CMAB), a special case of online RL. We provide two procedures using posterior sampling for both the general RL problem and the CMAB problem. We establish a sub-linear regret bound, with the regret defined as the conventional regret for both the RL and CMAB settings. Additionally, we establish a sub-linear regret bound for the CMAB setting with the regret defined as the Bayesian risk regret. Finally, we conduct numerical experiments to demonstrate the effectiveness of the proposed algorithm in addressing epistemic uncertainty and verifying the theoretical properties.
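CMAB特例中"贝叶斯风险厌恶 + 后验采样"的一种简化形式可示意如下(假设性玩具实现,风险度量取后验采样收益的低分位数):数据少时后验分散、低分位数显著低估价值从而行为保守;数据增多后后验收紧,低估自动消失,对应摘要所述的自适应性质。

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.45, 0.55, 0.5])          # 伯努利臂真实参数(未知)
a_post, b_post = np.ones(3), np.ones(3)       # 各臂的 Beta 后验参数

def risk_averse_value(a, b, m=64, q=0.2):
    """从后验采样 m 份, 取 q 分位数作为悲观的贝叶斯风险价值。"""
    samples = rng.beta(a, b, size=m)
    return np.quantile(samples, q)

rewards = 0.0
for t in range(5000):
    vals = [risk_averse_value(a_post[k], b_post[k]) for k in range(3)]
    k = int(np.argmax(vals))                  # 按风险厌恶价值选臂
    r = rng.random() < true_p[k]
    rewards += r
    a_post[k] += r; b_post[k] += 1 - r        # 后验更新
print("平均收益:", rewards / 5000, "| 各臂拉动次数:", a_post + b_post - 2)
```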
【3】TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
标题:TreeIRL:利用树搜索和反向强化学习安全城市驾驶
链接:https://arxiv.org/abs/2509.13579
作者:. Tomov, Sang Uk Lee, Hansford Hendrago, Jinwook Huh, Teawon Han, Forbes Howington, Rafael da Silva, Gianmarco Bernasconi, Marc Heim, Samuel Findler, Xiaonan Ji, Alexander Boule, Michael Napoli, Kuo Chen, Jesse Miller, Boaz Floor, Yunqing Hu
摘要:我们提出了TreeIRL,一种新的自动驾驶规划器,它结合了蒙特卡洛树搜索(MCTS)和逆强化学习(IRL),以实现模拟和现实驾驶中最先进的性能。其核心思想是使用MCTS来找到一组有希望的安全候选轨迹,并使用深度IRL评分函数来选择其中最像人类的轨迹。我们在大规模模拟和拉斯维加斯大都市区500多英里的真实自动驾驶中,对TreeIRL进行了评估。测试场景包括密集的城市交通、自适应巡航控制、加塞切入(cut-in)和交通灯。TreeIRL实现了最佳的整体性能,在安全、行程进展、舒适和人性化之间取得了平衡。据我们所知,我们的工作是第一次在公共道路上展示基于MCTS的规划,并强调了在各种指标和现实环境中评估规划器的重要性。TreeIRL具有高度可扩展性,可以通过强化学习和模仿学习进一步改进,为探索经典方法和基于学习的方法的不同组合提供了一个框架,以解决自动驾驶中的规划瓶颈。
摘要:We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
【4】$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation
标题:$Agent^2$:用于强化学习自动化的Agent生成Agent框架
链接:https://arxiv.org/abs/2509.13368
作者: Xiaohan Shan, Ran Miao, Jianmin Li
备注:9 pages, 7 figures
摘要:强化学习代理开发传统上需要广泛的专业知识和漫长的迭代,通常会导致高失败率和有限的可访问性。本文介绍了$Agent^2$,一种新的代理生成代理框架,通过智能的LLM驱动生成实现完全自动化的RL代理设计。该系统可以自主地将自然语言任务描述和环境代码转换为全面、高性能的强化学习解决方案,而无需人工干预。$Agent^2$具有革命性的双代理架构。Generator Agent充当自治AI设计器,用于分析任务并生成可执行的RL Agent,而Target Agent是由此自动生成的RL Agent。该框架将RL开发分解为两个不同的阶段:MDP建模和算法优化,从而实现更有针对性和更有效的代理生成。基于模型上下文协议,$Agent^2$提供了一个统一的框架,可以在不同的环境和算法中创建智能Agent,同时结合自适应训练管理和智能反馈分析,以实现持续改进。在包括MuJoCo、MetaDrive、MPE和SMAC在内的广泛基准测试中进行的大量实验表明,在所有任务中,$Agent^2$的性能始终优于手动设计的解决方案,性能提升最高达55%,平均增益显著。通过实现真正的端到端闭环自动化,这项工作建立了一种新的范式,智能代理可以设计和优化其他代理,标志着自动化AI系统的根本突破。
摘要:Reinforcement learning agent development traditionally requires extensive expertise and lengthy iterations, often resulting in high failure rates and limited accessibility. This paper introduces $Agent^2$, a novel agent-generates-agent framework that achieves fully automated RL agent design through intelligent LLM-driven generation. The system autonomously transforms natural language task descriptions and environment code into comprehensive, high-performance reinforcement learning solutions without human intervention. $Agent^2$ features a revolutionary dual-agent architecture. The Generator Agent serves as an autonomous AI designer that analyzes tasks and generates executable RL agents, while the Target Agent is the resulting automatically generated RL agent. The framework decomposes RL development into two distinct stages: MDP modeling and algorithmic optimization, enabling more targeted and effective agent generation. Built on the Model Context Protocol, $Agent^2$ provides a unified framework that standardizes intelligent agent creation across diverse environments and algorithms, while incorporating adaptive training management and intelligent feedback analysis for continuous improvement. Extensive experiments on a wide range of benchmarks, including MuJoCo, MetaDrive, MPE, and SMAC, demonstrate that $Agent^2$ consistently outperforms manually designed solutions across all tasks, achieving up to 55% performance improvement and substantial gains on average. By enabling truly end-to-end, closed-loop automation, this work establishes a new paradigm in which intelligent agents design and optimize other agents, marking a fundamental breakthrough for automated AI systems.
【5】Maximizing UAV Cellular Connectivity with Reinforcement Learning for BVLoS Path Planning
标题:利用强化学习最大化无人机蜂窝连接性以实现BVLoS路径规划
链接:https://arxiv.org/abs/2509.13336
作者:hjati, Rosdiadee Nordin, Nor Fadzilah Abdullah
备注:Submitted to an IEEE Conference
摘要:提出了一种基于强化学习的蜂窝互联无人机超视距路径规划方法。我们的目标是在考虑现实世界空中覆盖限制并采用经验空中信道模型的前提下,最小化飞行距离,同时最大化蜂窝链路连接的质量。提出的解决方案采用RL技术训练代理,使用UAV和基站(BS)之间的通信链路的质量作为奖励函数。仿真结果表明,该方法在训练智能体和生成可行的无人机路径规划方面是有效的。所提出的方法解决了由于无人机蜂窝通信的限制而带来的挑战,突出了在这一领域进行调查和考虑的必要性。RL算法有效地识别最佳路径,确保与地面BS的最大连通性,以确保安全可靠的BVLoS飞行操作。此外,该解决方案可以作为离线路径规划模块部署,可以集成到未来的地面控制系统(GCS)中,用于无人机操作,提高其能力和安全性。该方法在复杂的远程无人机应用中具有潜力,推动了蜂窝连接无人机路径规划领域的技术进步。
摘要:This paper presents a reinforcement learning (RL) based approach for path planning of cellular connected unmanned aerial vehicles (UAVs) operating beyond visual line of sight (BVLoS). The objective is to minimize travel distance while maximizing the quality of cellular link connectivity by considering real world aerial coverage constraints and employing an empirical aerial channel model. The proposed solution employs RL techniques to train an agent, using the quality of communication links between the UAV and base stations (BSs) as the reward function. Simulation results demonstrate the effectiveness of the proposed method in training the agent and generating feasible UAV path plans. The proposed approach addresses the challenges due to limitations in UAV cellular communications, highlighting the need for investigations and considerations in this area. The RL algorithm efficiently identifies optimal paths, ensuring maximum connectivity with ground BSs to ensure safe and reliable BVLoS flight operation. Moreover, the solution can be deployed as an offline path planning module that can be integrated into future ground control systems (GCS) for UAV operations, enhancing their capabilities and safety. The method holds potential for complex long range UAV applications, advancing the technology in the field of cellular connected UAV path planning.
元学习(2篇)
【1】Meta-Learning Linear Models for Molecular Property Prediction
标题:用于分子性质预测的元学习线性模型
链接:https://arxiv.org/abs/2509.13527
作者:onova, Michael G. Taylor, Alice Allen, Ping Yang, Nicholas Lubbers
备注:26 pages, 16 figures
摘要:由于高质量、一致的数据集有限,寻找结构-性质关系的化学家面临着巨大挑战。机器学习(ML)在化学科学中具有显著的预测能力,但这些现代数据驱动的方法增加了对数据的需求。为了应对可解释人工智能(XAI)日益增长的需求,并弥合预测准确性和人类可理解性之间的差距,我们引入了LAMeL——一种用于元学习的线性算法,该算法在保持可解释性的同时提高了多个属性的预测准确性。虽然大多数方法孤立地处理每个化学预测任务,但LAMeL利用元学习框架来识别相关任务之间的共享模型参数,即使这些任务不共享数据,也允许它学习一个共同的函数流形,作为新的未知任务的更明智的起点。根据数据集所属领域,我们的方法相比标准岭回归带来1.1倍至25倍的性能提升。虽然性能增强的程度因任务而异,但LAMeL始终优于或匹配传统的线性方法,使其成为准确性和可解释性至关重要的化学性质预测的可靠工具。
摘要:Chemists in search of structure-property relationships face great challenges due to limited high quality, concordant datasets. Machine learning (ML) has significantly advanced predictive capabilities in chemical sciences, but these modern data-driven approaches have increased the demand for data. In response to the growing demand for explainable AI (XAI) and to bridge the gap between predictive accuracy and human comprehensibility, we introduce LAMeL - a Linear Algorithm for Meta-Learning that preserves interpretability while improving the prediction accuracy across multiple properties. While most approaches treat each chemical prediction task in isolation, LAMeL leverages a meta-learning framework to identify shared model parameters across related tasks, even if those tasks do not share data, allowing it to learn a common functional manifold that serves as a more informed starting point for new unseen tasks. Our method delivers performance improvements ranging from 1.1- to 25-fold over standard ridge regression, depending on the domain of the dataset. While the degree of performance enhancement varies across tasks, LAMeL consistently outperforms or matches traditional linear methods, making it a reliable tool for chemical property prediction where both accuracy and interpretability are critical.
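共享参数的元学习线性模型可用如下极简草图体会(假设性目标函数,非论文原式):交替求解各任务的带中心岭回归 w_t = argmin ||X_t w - y_t||^2 + λ||w - w̄||^2(闭式解),并把共享中心 w̄ 更新为各任务解的平均;新任务以 w̄ 为先验中心,在小样本下明显优于普通岭回归。

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 8, 5.0
w_true = rng.normal(size=d)
tasks = []                                 # 相关任务: 真实参数围绕共享解扰动
for _ in range(6):
    X = rng.normal(size=(20, d))
    y = X @ (w_true + 0.2 * rng.normal(size=d)) + 0.1 * rng.normal(size=20)
    tasks.append((X, y))

w_bar = np.zeros(d)                        # 共享中心(元参数)
for it in range(30):
    ws = []
    for X, y in tasks:
        # 闭式解: (X^T X + lam I) w = X^T y + lam * w_bar
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_bar)
        ws.append(w)
    w_bar = np.mean(ws, axis=0)

Xn = rng.normal(size=(10, d))              # 新任务: 仅 10 个样本
yn = Xn @ w_true + 0.1 * rng.normal(size=10)
w_meta = np.linalg.solve(Xn.T @ Xn + lam * np.eye(d), Xn.T @ yn + lam * w_bar)
w_ridge = np.linalg.solve(Xn.T @ Xn + lam * np.eye(d), Xn.T @ yn)
print("元学习解误差:", round(float(np.linalg.norm(w_meta - w_true)), 3),
      "| 普通岭回归误差:", round(float(np.linalg.norm(w_ridge - w_true)), 3))
```

模型自始至终保持线性,因此系数仍可直接解读,这与摘要强调的可解释性诉求一致。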
【2】Prognosis of COVID-19 using Artificial Intelligence: A Systematic Review and Meta-analysis
标题:使用人工智能预测COVID-19:系统性综述和荟萃分析
链接:https://arxiv.org/abs/2408.00208
作者: Motamedian, Sadra Mohaghegh, Elham Babadi Oregani, Mahrsa Amjadi, Parnian Shobeiri, Negin Cheraghi, Niusha Solouki, Nikoo Ahmadi, Hossein Mohammad-Rahimi, Yassine Bouchareb, Arman Rahmim
摘要:目的:近年来,人工智能(AI)技术已被广泛用于多种疾病的诊断和预后。本研究识别、评估和综合了已发表的关于使用AI预测COVID-19的研究。方法:使用Medline、Google Scholar、Scopus、Embase、Cochrane和ProQuest进行电子检索。包括使用CT或胸部X光图像检查机器学习或深度学习方法以确定COVID-19预后的研究。合并计算了敏感性、特异性、曲线下面积和诊断优势比。结果:共纳入36篇文章;研究了各种与疾病相关的问题,包括疾病严重程度、机械通气或入住重症监护室和死亡率。采用了几种AI模型和架构,例如Siamese模型、支持向量机、随机森林、极限梯度提升和卷积神经网络。模型对死亡率、严重程度评估和通气需求的灵敏度分别为71%、88%和67%。上述变量的特异性分别为69%、89%和89%。结论:根据纳入的文章,使用CT或CXR图像的放射组学特征预测COVID-19患者的机器学习和深度学习方法可以帮助临床医生更有效地管理患者并分配资源。这些研究还表明,结合患者人口统计学、临床数据、实验室检查和放射组学特征可提高模型性能。
摘要:Purpose: Artificial intelligence (AI) techniques have been extensively utilized for diagnosing and prognosis of several diseases in recent years. This study identifies, appraises and synthesizes published studies on the use of AI for the prognosis of COVID-19. Method: Electronic search was performed using Medline, Google Scholar, Scopus, Embase, Cochrane and ProQuest. Studies that examined machine learning or deep learning methods to determine the prognosis of COVID-19 using CT or chest X-ray images were included. Pooled sensitivity, specificity, area under the curve and diagnostic odds ratio were calculated. Result: A total of 36 articles were included; various prognosis-related issues, including disease severity, mechanical ventilation or admission to the intensive care unit and mortality, were investigated. Several AI models and architectures were employed, such as the Siamese model, support vector machine, Random Forest, eXtreme Gradient Boosting, and convolutional neural networks. The models achieved 71%, 88% and 67% sensitivity for mortality, severity assessment and need for ventilation, respectively. The specificity of 69%, 89% and 89% were reported for the aforementioned variables. Conclusion: Based on the included articles, machine learning and deep learning methods used for the prognosis of COVID-19 patients using radiomic features from CT or CXR images can help clinicians manage patients and allocate resources more effectively. These studies also demonstrate that combining patient demographic, clinical data, laboratory tests and radiomic features improves model performances.
分层学习(1篇)
【1】Hierarchical Learning for Maze Navigation: Emergence of Mental Representations via Second-Order Learning
标题:迷宫导航的分层学习:通过二阶学习出现心理表示
链接:https://arxiv.org/abs/2509.14195
作者:inta Manir, Tim Oates
备注:8 pages, 3 figures
摘要:心理表征以反映外部环境的结构化内部模型为特征,是高级认知的基础,但在实证研究中仍然具有挑战性。现有理论假设,二阶学习——即适应一阶学习(对任务/领域本身的学习)的学习机制——促进了这种环境-认知同构的出现。在本文中,我们通过提出一个层次化架构来实证验证这一假设:以图卷积网络(GCN)作为一阶学习器,以MLP控制器作为二阶学习器。GCN直接将节点级特征映射为最优导航路径的预测,而MLP在面对结构新颖的迷宫环境时动态调整GCN的参数。我们证明,当认知系统发展出与环境结构同构的内部心理地图时,二阶学习尤其有效。定量和定性结果突出了显著的性能改善和对未见迷宫任务的稳健泛化,为结构化心理表征在最大化二阶学习有效性中的关键作用提供了实证支持。
摘要:Mental representation, characterized by structured internal models mirroring external environments, is fundamental to advanced cognition but remains challenging to investigate empirically. Existing theory hypothesizes that second-order learning -- learning mechanisms that adapt first-order learning (i.e., learning about the task/domain) -- promotes the emergence of such environment-cognition isomorphism. In this paper, we empirically validate this hypothesis by proposing a hierarchical architecture comprising a Graph Convolutional Network (GCN) as a first-order learner and an MLP controller as a second-order learner. The GCN directly maps node-level features to predictions of optimal navigation paths, while the MLP dynamically adapts the GCN's parameters when confronting structurally novel maze environments. We demonstrate that second-order learning is particularly effective when the cognitive system develops an internal mental map structurally isomorphic to the environment. Quantitative and qualitative results highlight significant performance improvements and robust generalization on unseen maze tasks, providing empirical support for the pivotal role of structured mental representations in maximizing the effectiveness of second-order learning.
医学相关(4篇)
【1】Differentially private federated learning for localized control of infectious disease dynamics
标题:用于传染病动态局部化控制的差分隐私联邦学习
链接:https://arxiv.org/abs/2509.14024
作者:kouche, Henrik Zunker, Mario Fritz, Martin J. Kühn
备注:18 pages, 6 figures
摘要:在发生流行病时,必须迅速作出反应,以减缓流行病的蔓延。对于这一反应,地方化办法有几个优点,限制了必要的资源,减少了大规模干预的影响。然而,由于可用数据有限,在局部规模上训练单独的机器学习(ML)模型通常是不可行的。由于数据的高敏感性和隐私限制,集中化数据也具有挑战性。在这项研究中,我们考虑了一个本地化的战略,基于由相关地方卫生当局(LHA)管理的德国县和社区。为了在保护隐私的同时不妨碍详细疫情数据的可用性,我们提出了一种隐私保护的预测方法,可以帮助公共卫生专家和决策者。采用联邦学习(FL)的ML方法训练共享模型,而无需集中原始数据。将县、社区或LHA视为客户端,并在效用和隐私之间寻找平衡,我们研究了一个具有客户端级差分隐私(DP)的FL框架。我们在最近病例数的滑动窗口上训练一个共享的多层感知器来预测病例数,客户端只交换范数裁剪的更新,服务器聚合更新时加入DP噪声。我们分两个阶段评估县级COVID-19数据的方法。正如预期的那样,非常严格的隐私产生不稳定、不可用的预测。在中等偏强水平,DP模型接近非DP模型:2020年11月,$R^2 = 0.94$(对比0.95),平均绝对百分比误差(MAPE)为26%;2022年3月,$R^2 = 0.88$(对比0.93),MAPE为21%。总体而言,客户端级别的DP-FL可以提供有用的县级预测,具有强大的隐私保障,可行的隐私预算取决于流行阶段,允许卫生当局之间进行符合隐私的合作,以进行当地预测。
摘要:In times of epidemics, swift reaction is necessary to mitigate epidemic spreading. For this reaction, localized approaches have several advantages, limiting necessary resources and reducing the impact of interventions on a larger scale. However, training a separate machine learning (ML) model on a local scale is often not feasible due to limited available data. Centralizing the data is also challenging because of its high sensitivity and privacy constraints. In this study, we consider a localized strategy based on the German counties and communities managed by the related local health authorities (LHA). For the preservation of privacy to not oppose the availability of detailed situational data, we propose a privacy-preserving forecasting method that can assist public health experts and decision makers. ML methods with federated learning (FL) train a shared model without centralizing raw data. Considering the counties, communities or LHAs as clients and finding a balance between utility and privacy, we study a FL framework with client-level differential privacy (DP). We train a shared multilayer perceptron on sliding windows of recent case counts to forecast the number of cases, while clients exchange only norm-clipped updates and the server aggregated updates with DP noise. We evaluate the approach on COVID-19 data on county-level during two phases. As expected, very strict privacy yields unstable, unusable forecasts. At a moderately strong level, the DP model closely approaches the non-DP model: $R^2= 0.94$ (vs. 0.95) and mean absolute percentage error (MAPE) of 26 % in November 2020; $R^2= 0.88$ (vs. 0.93) and MAPE of 21 % in March 2022. Overall, client-level DP-FL can deliver useful county-level predictions with strong privacy guarantees, and viable privacy budgets depend on epidemic phase, allowing privacy-compliant collaboration among health authorities for local forecasting.
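摘要中"客户端仅交换范数裁剪的更新、服务器聚合时加DP噪声"的机制可用几行代码示意(假设性超参数;省略隐私会计与具体预测模型):

```python
import numpy as np

def dp_fedavg_round(global_w, client_updates, clip=1.0, noise_mult=0.8, rng=None):
    """客户端级 DP 的一轮聚合: 裁剪每个客户端更新的 L2 范数, 均值后加高斯噪声。"""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in client_updates:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / (norm + 1e-12)))   # 范数裁剪
    agg = np.mean(clipped, axis=0)
    sigma = noise_mult * clip / len(client_updates)           # 敏感度 clip/n 的高斯机制
    return global_w - (agg + rng.normal(0, sigma, size=global_w.shape))

rng = np.random.default_rng(0)
w = np.zeros(5)
target = np.arange(5.0)                      # 虚构的"真实"参数
for rnd in range(100):                       # 模拟 100 轮, 每轮 10 个客户端
    updates = [w - (target + 0.3 * rng.normal(size=5)) for _ in range(10)]
    w = dp_fedavg_round(w, updates, rng=rng)
print("带 DP 噪声学到的全局参数:", w.round(2), "(目标约为 [0 1 2 3 4])")
```

`noise_mult` 越大隐私越强、预测越不稳定,对应摘要中"隐私预算取决于流行阶段"的权衡。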
【2】LamiGauss: Pitching Radiative Gaussian for Sparse-View X-ray Laminography Reconstruction
标题:LamiGauss:面向稀疏视图X射线叠层成像重建的辐射高斯方法
链接:https://arxiv.org/abs/2509.13863
作者: Ander Biguri, Jean-Michel Morel, Raymond H. Chan, Carola-Bibiane Schönlieb, Jizhou Li
摘要:X射线计算叠层成像(Computed Laminography, CL)对于微芯片和复合电池材料等应用中的板状结构的无损检测至关重要,而传统的计算机断层扫描(CT)由于几何约束而难以胜任。然而,从叠层投影重建高质量体积仍然具有挑战性,特别是在高度稀疏视图采集条件下。在本文中,我们提出了一种重建算法,即LamiGauss,它将高斯溅射辐射光栅化与一个纳入叠层倾斜角的专用探测器到世界坐标变换模型相结合。LamiGauss利用初始化策略,从初步重建中显式过滤掉常见的叠层成像伪影,防止将冗余的高斯分配给错误的结构,从而将模型容量集中在表示真正的对象上。我们的方法可直接从稀疏投影高效优化,以有限数据实现准确而高效的重建。在合成数据集和真实数据集上的大量实验证明了该方法的有效性和优越性。LamiGauss仅使用3%的完整视图,即可取得优于在完整数据集上优化的迭代方法的性能。
摘要:X-ray Computed Laminography (CL) is essential for non-destructive inspection of plate-like structures in applications such as microchips and composite battery materials, where traditional computed tomography (CT) struggles due to geometric constraints. However, reconstructing high-quality volumes from laminographic projections remains challenging, particularly under highly sparse-view acquisition conditions. In this paper, we propose a reconstruction algorithm, namely LamiGauss, that combines Gaussian Splatting radiative rasterization with a dedicated detector-to-world transformation model incorporating the laminographic tilt angle. LamiGauss leverages an initialization strategy that explicitly filters out common laminographic artifacts from the preliminary reconstruction, preventing redundant Gaussians from being allocated to false structures and thereby concentrating model capacity on representing the genuine object. Our approach effectively optimizes directly from sparse projections, enabling accurate and efficient reconstruction with limited data. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness and superiority of the proposed method over existing techniques. LamiGauss uses only 3$\%$ of full views to achieve superior performance over the iterative method optimized on a full dataset.
【3】Consistent View Alignment Improves Foundation Models for 3D Medical Image Segmentation
标题:一致的视图对齐改进了3D医学图像分割的基础模型
链接:https://arxiv.org/abs/2509.13846
作者:h, Felix Meister, Tobias Heimann, Christoph Brune, Jelmer M. Wolterink
备注:MICCAI 2025: 1st Place in Transformer track and 2nd Place in Convolution track of SSL3D-OpenMind challenge
摘要:许多最近的表示学习方法隐含地假设数据点的不相关视图足以学习各种下游任务的有意义的表示。在这项工作中,我们挑战这一假设,并证明潜在空间中有意义的结构不会自然出现,而必须被显式诱导。我们提出了一种方法,对齐来自不同数据视图的表示,以对齐互补信息而不引入假阳性。我们的实验表明,我们提出的自监督学习方法——一致视图对齐——提高了下游任务的性能,突出了结构化视图对齐在学习有效表示中的关键作用。我们的方法在MICCAI 2025 SSL3D挑战赛中分别使用Primus Vision Transformer和ResEnc卷积神经网络获得了第一名和第二名。代码和预训练的模型权重在https://github.com/Tenbatsu24/LatentCampus上发布。
摘要:Many recent approaches in representation learning implicitly assume that uncorrelated views of a data point are sufficient to learn meaningful representations for various downstream tasks. In this work, we challenge this assumption and demonstrate that meaningful structure in the latent space does not emerge naturally. Instead, it must be explicitly induced. We propose a method that aligns representations from different views of the data to align complementary information without inducing false positives. Our experiments show that our proposed self-supervised learning method, Consistent View Alignment, improves performance for downstream tasks, highlighting the critical role of structured view alignment in learning effective representations. Our method achieved first and second place in the MICCAI 2025 SSL3D challenge when using a Primus vision transformer and ResEnc convolutional neural network, respectively. The code and pretrained model weights are released at https://github.com/Tenbatsu24/LatentCampus.
【4】PREDICT-GBM: Platform for Robust Evaluation and Development of Individualized Computational Tumor Models in Glioblastoma
标题:PREDICT-GBM:胶质母细胞瘤个体化计算肿瘤模型稳健评估和开发平台
链接:https://arxiv.org/abs/2509.13360
作者:, J. Weidner, M. Balcerak, F. Kofler, I. Ezhov, B. Menze, B. Wiestler
摘要:胶质母细胞瘤是最常见的原发性脑恶性肿瘤,其特点是高度侵袭性和极高的复发率。传统的放射治疗,采用统一的治疗边缘,未能考虑到病人的具体解剖和生物学因素,严重影响肿瘤细胞的迁移。为了解决这一限制,已经开发了许多胶质母细胞瘤生长的计算模型,使得能够生成延伸到放射学可见区域之外的肿瘤细胞分布图,从而为更精确的治疗策略提供信息。然而,尽管初步研究结果令人鼓舞,但这些生长模型的临床应用仍然有限。为了弥合这一翻译差距并加速模型开发和临床验证,我们引入了PREDICT-GBM,这是一个用于建模和评估的综合集成管道和数据集。该平台能够使用专家策划的临床数据集对最先进的肿瘤生长模型进行系统基准测试,该数据集包括255名受试者,具有完整的肿瘤分割和组织表征图。我们的分析表明,与传统的均匀边缘方法相比,来自肿瘤生长预测的个性化放射治疗计划对两个评估模型的复发覆盖率更高。这项工作建立了一个强大的平台,用于推进和系统地评估尖端的肿瘤生长建模方法,最终目标是促进临床转化和改善患者预后。
摘要:Glioblastoma is the most prevalent primary brain malignancy, distinguished by its highly invasive behavior and exceptionally high rates of recurrence. Conventional radiation therapy, which employs uniform treatment margins, fails to account for patient-specific anatomical and biological factors that critically influence tumor cell migration. To address this limitation, numerous computational models of glioblastoma growth have been developed, enabling generation of tumor cell distribution maps extending beyond radiographically visible regions and thus informing more precise treatment strategies. However, despite encouraging preliminary findings, the clinical adoption of these growth models remains limited. To bridge this translational gap and accelerate both model development and clinical validation, we introduce PREDICT-GBM, a comprehensive integrated pipeline and dataset for modeling and evaluation. This platform enables systematic benchmarking of state-of-the-art tumor growth models using an expert-curated clinical dataset comprising 255 subjects with complete tumor segmentations and tissue characterization maps. Our analysis demonstrates that personalized radiation treatment plans derived from tumor growth predictions achieved superior recurrence coverage compared to conventional uniform margin approaches for two of the evaluated models. This work establishes a robust platform for advancing and systematically evaluating cutting-edge tumor growth modeling approaches, with the ultimate goal of facilitating clinical translation and improving patient outcomes.
推荐(1篇)
【1】Sequential Data Augmentation for Generative Recommendation
标题:生成式推荐的顺序数据增强
链接:https://arxiv.org/abs/2509.13648
作者: Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Kijung Shin, Neil Shah, Liam Collins
摘要:生成式推荐在个性化系统中起着至关重要的作用,它根据用户的历史行为序列预测用户未来的交互行为。在训练这些模型时,一个关键但未充分探索的因素是数据增强,即从用户交互历史构建训练数据的过程。通过塑造训练分布,数据增强直接并且经常实质性地影响模型的泛化和性能。然而,在许多现有的工作中,这个过程被简化、应用不一致,或被当作次要的设计选择,缺乏对其影响的系统而有原则的理解。 基于我们的经验发现,不同的增强策略可能会产生很大的性能差异,我们深入分析了它们如何重塑训练分布,影响与未来目标的一致性,以及对未知输入的泛化。为了系统化这一设计空间,我们提出了GenPAS——一个广义且有原则的框架,将数据增强建模为输入-目标对上的随机采样过程,包含三个偏差可控的步骤:序列采样、目标采样和输入采样。该公式将广泛使用的策略统一为特殊情况,并能够灵活控制由此产生的训练分布。我们在基准数据集和工业数据集上的大量实验表明,与现有策略相比,GenPAS具有更高的准确性、数据效率和参数效率,为生成式推荐中的原则性训练数据构建提供了实际指导。
摘要:Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training data from user interaction histories. By shaping the training distribution, data augmentation directly and often substantially affects model generalization and performance. Nevertheless, in much of the existing work, this process is simplified, applied inconsistently, or treated as a minor design choice, without a systematic and principled understanding of its effects. Motivated by our empirical finding that different augmentation strategies can yield large performance disparities, we conduct an in-depth analysis of how they reshape training distributions and influence alignment with future targets and generalization to unseen inputs. To systematize this design space, we propose GenPAS, a generalized and principled framework that models augmentation as a stochastic sampling process over input-target pairs with three bias-controlled steps: sequence sampling, target sampling, and input sampling. This formulation unifies widely used strategies as special cases and enables flexible control of the resulting training distribution. Our extensive experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies, providing practical guidance for principled training data construction in generative recommendation.
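GenPAS的三步采样可按如下草图理解(假设性实现;具体偏置函数仅为占位,用以说明"偏差可控"的含义):先按序列级权重抽取一条用户序列,再按位置级权重抽取目标交互,最后采样输入窗口长度,得到一个(输入, 目标)训练对。

```python
import random

def genpas_augment(user_seqs, n_pairs, p_seq=None, p_target=None, max_len=10):
    """三步采样生成 (输入, 目标) 训练对: 序列采样 -> 目标采样 -> 输入采样。"""
    p_seq = p_seq or (lambda s: len(s))         # 偏置 1(占位): 长序列更常被抽中
    p_target = p_target or (lambda i, s: i)     # 偏置 2(占位): 越新的交互越常作目标
    pairs = []
    for _ in range(n_pairs):
        seq = random.choices(user_seqs, weights=[p_seq(s) for s in user_seqs])[0]
        t = random.choices(range(1, len(seq)),
                           weights=[p_target(i, seq) for i in range(1, len(seq))])[0]
        L = random.randint(1, min(t, max_len))  # 偏置 3: 输入窗口长度的采样
        pairs.append((seq[t - L:t], seq[t]))
    return pairs

seqs = [[1, 2, 3, 4, 5], [7, 8, 9], [4, 6, 2, 8]]
random.seed(0)
for x, y in genpas_augment(seqs, 4):
    print("输入:", x, "-> 目标:", y)
```

更换三个权重函数即可复现(或插值)滑窗、仅末项预测等常见增强策略,这正是该框架把它们统一为特例的方式。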
超分辨率|去噪|去模糊|去雾(1篇)
【1】Data Denoising and Derivative Estimation for Data-Driven Modeling of Nonlinear Dynamical Systems
标题:非线性动态系统数据驱动建模的数据去噪和求导估计
链接:https://arxiv.org/abs/2509.14219
作者:, Lewis Mitchell, John Maclean, Hemanth Saratchandran
摘要:非线性动态系统的数据驱动建模经常受到测量噪声的阻碍。我们提出了一个去噪框架,称为基于Runge-Kutta与全变分的隐式神经表示(RKTV-INR),它用直接拟合噪声观测的隐式神经表示(INR)来表示状态轨迹。以Runge-Kutta积分和全变分作为约束,确保重建的状态是某个动力系统的轨迹,并保持接近原始数据。经过训练的INR产生干净、连续的轨迹,并通过自动微分提供准确的一阶导数。然后将这些去噪状态和导数提供给非线性动力学稀疏识别(SINDy)以恢复控制方程。实验表明,该方法实现了有效的噪声抑制、精确的导数估计和可靠的系统识别。
摘要:Data-driven modeling of nonlinear dynamical systems is often hampered by measurement noise. We propose a denoising framework, called Runge-Kutta and Total Variation Based Implicit Neural Representation (RKTV-INR), that represents the state trajectory with an implicit neural representation (INR) fitted directly to noisy observations. Runge-Kutta integration and total variation are imposed as constraints to ensure that the reconstructed state is a trajectory of a dynamical system that remains close to the original data. The trained INR yields a clean, continuous trajectory and provides accurate first-order derivatives via automatic differentiation. These denoised states and derivatives are then supplied to Sparse Identification of Nonlinear Dynamics (SINDy) to recover the governing equations. Experiments demonstrate effective noise suppression, precise derivative estimation, and reliable system identification.
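其损失构造可按如下极简草图理解(假设性简化,非论文原式:引入辅助动力学网络与INR联合训练,以RK4残差和全变分作为软约束;以一维指数衰减系统为例):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inr = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))   # t -> x(t)
fnet = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # x -> dx/dt

def rk4_step(f, x, h):
    k1 = f(x); k2 = f(x + h / 2 * k1); k3 = f(x + h / 2 * k2); k4 = f(x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t = torch.linspace(0, 4, 200).unsqueeze(1)
clean = torch.exp(-0.5 * t)                     # 真实轨迹: x' = -0.5 x 的解
y = clean + 0.05 * torch.randn_like(t)          # 噪声观测
h = float(t[1] - t[0])
opt = torch.optim.Adam(list(inr.parameters()) + list(fnet.parameters()), lr=1e-3)
for _ in range(3000):
    x = inr(t)
    data = ((x - y) ** 2).mean()                            # 数据保真项
    rk = ((rk4_step(fnet, x[:-1], h) - x[1:]) ** 2).mean()  # RK 约束: x 是某动力系统的轨迹
    tv = (x[1:] - x[:-1]).abs().mean()                      # 全变分: 抑制高频噪声
    loss = data + rk + 0.01 * tv
    opt.zero_grad(); loss.backward(); opt.step()
with torch.no_grad():
    print("去噪后 MSE:", float(((inr(t) - clean) ** 2).mean()))
# 训练好的 INR 可用自动微分求 dx/dt, 连同去噪状态一起交给 SINDy 识别方程
```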
自动驾驶|车辆|车道检测等(2篇)
【1】A Domain Knowledge Informed Approach for Anomaly Detection of Electric Vehicle Interior Sounds
标题:电动汽车车内声音异常检测领域知识知情方法
链接:https://arxiv.org/abs/2509.13390
作者:nte, Bram Cornelis, Claudio Colangeli, Karl Janssens, Brecht Van Baelen, Konstantinos Gryllias
备注:Submitted to: Mechanical Systems and Signal Processing
摘要:汽车驾驶室声音异常的检测对于确保车辆质量和保持乘客舒适性至关重要。在许多现实环境中,由于标记错误数据的稀缺性或完全不存在,该任务更适合被构建为无监督学习问题而非监督问题。在这种无监督设置中,模型只在健康样本上进行训练,并将异常检测为与正常行为的偏差。然而,在缺乏用于验证的标记错误样本以及常用度量(例如验证重建误差)可靠性有限的情况下,有效的模型选择仍然是一个重大挑战。为了克服这些限制,提出了一种基于领域知识的模型选择方法,其中通过健康谱图的结构化扰动设计的代理异常用于验证集以支持模型选择。在高保真电动汽车数据集上评估了所提出的方法,该数据集包括五种代表性故障类型的健康和故障车厢声音,即不平衡、调制、呜呜声、风和脉宽调制。该数据集使用先进的声音合成技术生成,并通过专家评审团评估进行验证,已公开提供,以促进进一步的研究。在五种故障情况下的实验评估表明,借助代理异常进行的最优模型选择显著优于传统的模型选择策略。
摘要:The detection of anomalies in automotive cabin sounds is critical for ensuring vehicle quality and maintaining passenger comfort. In many real-world settings, this task is more appropriately framed as an unsupervised learning problem rather than the supervised case due to the scarcity or complete absence of labeled faulty data. In such an unsupervised setting, the model is trained exclusively on healthy samples and detects anomalies as deviations from normal behavior. However, in the absence of labeled faulty samples for validation and the limited reliability of commonly used metrics, such as validation reconstruction error, effective model selection remains a significant challenge. To overcome these limitations, a domain-knowledge-informed approach for model selection is proposed, in which proxy-anomalies engineered through structured perturbations of healthy spectrograms are used in the validation set to support model selection. The proposed methodology is evaluated on a high-fidelity electric vehicle dataset comprising healthy and faulty cabin sounds across five representative fault types viz., Imbalance, Modulation, Whine, Wind, and Pulse Width Modulation. This dataset, generated using advanced sound synthesis techniques, and validated via expert jury assessments, has been made publicly available to facilitate further research. Experimental evaluations on the five fault cases demonstrate the selection of optimal models using proxy-anomalies, significantly outperform conventional model selection strategies.
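"代理异常"的构造可以很简单(假设性示意,扰动参数均为虚构):对健康谱图施加结构化扰动,近似呜呜声(窄带谱线)、调制(周期性幅度起伏)与脉冲等故障特征,置入验证集以支持模型选择。

```python
import numpy as np

def make_proxy_anomalies(spec):
    """spec: (F, T) 健康谱图。返回三种结构化扰动的代理异常(参数为假设)。"""
    F, T = spec.shape
    whine = spec.copy()
    whine[F // 3, :] += spec.max() * 0.8                  # 窄带"呜呜声"谱线
    mod = spec * (1 + 0.5 * np.sin(2 * np.pi * 4 * np.arange(T) / T))  # 幅度调制
    pulse = spec.copy()
    pulse[:, T // 2: T // 2 + 3] *= 3.0                   # 宽带短脉冲
    return {"whine": whine, "modulation": mod, "pulse": pulse}

healthy = np.abs(np.random.default_rng(0).normal(size=(64, 128)))
proxies = make_proxy_anomalies(healthy)
# 模型选择: 取在健康验证样本上误差低、且对代理异常给出最高异常分数的候选模型
for name, s in proxies.items():
    print(name, "能量比:", round(float(s.sum() / healthy.sum()), 2))
```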
【2】VEGA: Electric Vehicle Navigation Agent via Physics-Informed Neural Operator and Proximal Policy Optimization
标题:VEGA:通过物理信息神经操作器和近端策略优化的电动汽车导航代理
链接:https://arxiv.org/abs/2509.13386
作者:m, Minhyeok Im, Jonathan Boyack, Jee Won Lee, Jongseong Brad Choi
备注:This work has been submitted to the 2026 IEEE International Conference on Robotics and Automation (ICRA) for possible publication
摘要:对软件定义汽车(SDV)的需求正在上升,电动汽车(EV)越来越多地配备功能强大的计算机。这使得车载AI系统能够根据车辆的当前状况和环境,优化充电感知的路径规划。我们提出了VEGA,一个充电感知的EV导航代理,它在荷电状态(SoC)可行性约束下,通过带预算化A*师生指导的近端策略优化(PPO),在标注了充电桩的道路图上进行规划。VEGA由两个模块组成。首先,一个物理信息神经操作器(PINO),在真实的车辆速度和电池功率日志上训练,使用最近的车辆速度日志来估计空气动力学阻力、滚动阻力、质量、电机和再生制动效率,以及通过学习车辆自定义动力学的辅助负载。其次,强化学习(RL)代理使用这些动态来优化路径,在SoC约束下具有最佳充电停靠与停留时间。VEGA不需要额外的传感器,只使用车速信号。它可以用作功率和效率的虚拟传感器,以潜在地降低EV成本。在对旧金山到纽约等长途路线的评估中,VEGA的停靠站、停留时间、SoC管理和总行驶时间与Tesla Trip Planner密切相关,但稍微保守一些,这可能是由于实际车辆状况,例如车辆参数因老化而漂移。虽然仅在美国地区接受过训练,但VEGA能够计算法国和日本的最优充电感知路径,证明了其泛化能力。它实现了物理信息学习和强化学习的实际集成,用于EV生态路由。
摘要:Demands for software-defined vehicles (SDV) are rising and electric vehicles (EVs) are increasingly being equipped with powerful computers. This enables onboard AI systems to optimize charge-aware path optimization customized to reflect vehicle's current condition and environment. We present VEGA, a charge-aware EV navigation agent that plans over a charger-annotated road graph using Proximal Policy Optimization (PPO) with budgeted A* teacher-student guidance under state-of-charge (SoC) feasibility. VEGA consists of two modules. First, a physics-informed neural operator (PINO), trained on real vehicle speed and battery-power logs, uses recent vehicle speed logs to estimate aerodynamic drag, rolling resistance, mass, motor and regenerative-braking efficiencies, and auxiliary load by learning a vehicle-custom dynamics. Second, a Reinforcement Learning (RL) agent uses these dynamics to optimize a path with optimal charging stops and dwell times under SoC constraints. VEGA requires no additional sensors and uses only vehicle speed signals. It may serve as a virtual sensor for power and efficiency to potentially reduce EV cost. In evaluation on long routes like San Francisco to New York, VEGA's stops, dwell times, SoC management, and total travel time closely track Tesla Trip Planner while being slightly more conservative, presumably due to real vehicle conditions such as vehicle parameter drift due to deterioration. Although trained only in U.S. regions, VEGA was able to compute optimal charge-aware paths in France and Japan, demonstrating generalizability. It achieves practical integration of physics-informed learning and RL for EV eco-routing.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors
标题:ColonCrafter:使用扩散先验的结肠镜检查视频深度估计模型
链接:https://arxiv.org/abs/2509.13525
作者:rdy, Tyler Berzin, Pranav Rajpurkar
备注:12 pages, 8 figures
摘要:结肠镜检查中的三维(3D)场景理解提出了重大挑战,需要自动化方法进行准确的深度估计。然而,现有的内窥镜深度估计模型难以在视频序列中保持时间一致性,限制了其在三维重建中的适用性。我们提出了ColonCrafter,一个基于扩散的深度估计模型,可以从单眼结肠镜视频中生成时间一致的深度图。我们的方法从合成结肠镜序列中学习鲁棒的几何先验,以生成时间上一致的深度图。我们还介绍了一种风格迁移技术,该技术在保留几何结构的同时,将真实临床视频适配到我们的合成训练域。ColonCrafter在C3VD数据集上实现了最先进的zero-shot性能,优于通用和内窥镜专用方法。虽然全轨迹三维重建仍然是一个挑战,但我们展示了ColonCrafter的临床相关应用,包括三维点云生成和表面覆盖评估。
摘要:Three-dimensional (3D) scene understanding in colonoscopy presents significant challenges that necessitate automated methods for accurate depth estimation. However, existing depth estimation models for endoscopy struggle with temporal consistency across video sequences, limiting their applicability for 3D reconstruction. We present ColonCrafter, a diffusion-based depth estimation model that generates temporally consistent depth maps from monocular colonoscopy videos. Our approach learns robust geometric priors from synthetic colonoscopy sequences to generate temporally consistent depth maps. We also introduce a style transfer technique that preserves geometric structure while adapting real clinical videos to match our synthetic training domain. ColonCrafter achieves state-of-the-art zero-shot performance on the C3VD dataset, outperforming both general-purpose and endoscopy-specific approaches. Although full trajectory 3D reconstruction remains a challenge, we demonstrate clinically relevant applications of ColonCrafter, including 3D point cloud generation and surface coverage assessment.
联邦学习|隐私保护|加密(3篇)
【1】FedSSG: Expectation-Gated and History-Aware Drift Alignment for Federated Learning
标题:FedSSG:联邦学习的期望门控和历史感知漂移对齐
链接:https://arxiv.org/abs/2509.13895
作者:Zhou, Jinshan Lai, Fengchun Zhang, Zeqin Wu, Fengli Zhang
备注:4 page main text for conference
摘要:非IID数据和部分参与会导致联邦学习中的客户端漂移和不一致的局部最优,导致不稳定的收敛和精度损失。我们提出了FedSSG,一个随机采样引导的、历史感知的漂移对齐方法。FedSSG维护每个客户端的漂移记忆,累积局部模型差异作为历史梯度的轻量级草图;至关重要的是,它通过观测/预期参与率的平滑函数(从服务器采样器导出的按期望分相信号)来门控记忆更新和局部对齐项。当采样噪声在早期占主导地位时,这一具有统计依据的门保持微弱和平滑,一旦参与统计稳定便随之加强,在没有额外通信开销的情况下缩小局部-全局差距。在具有100/500个客户端和2-15%参与率的CIFAR-10/100上,FedSSG始终优于强大的漂移感知基线并加速收敛;在我们的基准测试中,它将测试准确性提高了几个点(例如,在CIFAR-10上约为+0.9,在CIFAR-100上约为+2.7),并且平均产生约4.5倍更快的目标精度收敛。该方法只增加了O(d)的客户端内存和一个常数时间门,并在近似IID或均匀采样下优雅地退化为温和的正则化器。FedSSG表明,采样统计可以被转化为一种有原则的、历史感知的阶段控制,以稳定和加速联邦训练。
摘要:Non-IID data and partial participation induce client drift and inconsistent local optima in federated learning, causing unstable convergence and accuracy loss. We present FedSSG, a stochastic sampling-guided, history-aware drift alignment method. FedSSG maintains a per-client drift memory that accumulates local model differences as a lightweight sketch of historical gradients; crucially, it gates both the memory update and the local alignment term by a smooth function of the observed/expected participation ratio (a phase-by-expectation signal derived from the server sampler). This statistically grounded gate stays weak and smooth when sampling noise dominates early, then strengthens once participation statistics stabilize, contracting the local-global gap without extra communication. Across CIFAR-10/100 with 100/500 clients and 2-15 percent participation, FedSSG consistently outperforms strong drift-aware baselines and accelerates convergence; on our benchmarks it improves test accuracy by up to a few points (e.g., about +0.9 on CIFAR-10 and about +2.7 on CIFAR-100 on average over the top-2 baseline) and yields about 4.5x faster target-accuracy convergence on average. The method adds only O(d) client memory and a constant-time gate, and degrades gracefully to a mild regularizer under near-IID or uniform sampling. FedSSG shows that sampling statistics can be turned into a principled, history-aware phase control to stabilize and speed up federated training.
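摘要中的门控机制可用如下草图体会(门函数的具体形式为假设,非论文定义):门值是观测/期望参与率之比的平滑函数,早期采样噪声大时门值小,参与统计随轮数稳定后增强,用于缩放漂移记忆更新与局部对齐项。

```python
import numpy as np

def participation_gate(obs_rate, exp_rate, rounds, warm=50.0):
    """平滑门(形式为假设): 比值项惩罚低于期望的参与,
    轮数项使门随统计稳定而增强, 早期保持微弱。"""
    ratio = min(obs_rate / max(exp_rate, 1e-8), 1.0)
    return ratio ** 2 * (1.0 - np.exp(-rounds / warm))

rng = np.random.default_rng(0)
drift_mem = np.zeros(5)                   # 每客户端漂移记忆, 仅 O(d) 额外内存
exp_rate, seen = 0.1, 0
for t in range(1, 201):
    participated = rng.random() < exp_rate
    seen += participated
    g = participation_gate(seen / t, exp_rate, t)
    if participated:
        delta = rng.normal(size=5)        # 本轮局部-全局模型差(示意)
        drift_mem = 0.9 * drift_mem + g * delta   # 门控的记忆更新
        # 局部训练时, 对齐项同样乘以 g: loss += g * align(w_local, drift_mem)
    if t in (5, 50, 200):
        print(f"round {t:3d}: gate = {g:.3f}")    # 门随参与统计稳定而增强
```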
【2】ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning
标题:ParaAegis:面向灵活隐私保护联邦学习的并行保护
链接:https://arxiv.org/abs/2509.13739
作者:(1), Yuecheng Li (1), Tianchi Liao (2), Jian Lou (2), Chuan Chen (1) ((1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China (2) School of Software Engineering, Sun Yat-sen University, Zhuhai, China)
备注:8 pages, 1 figure
摘要:联邦学习(FL)面临着一个关键的困境:现有的保护机制,如差分隐私(DP)和同态加密(HE)强制执行严格的权衡,迫使模型效用和计算效率之间的选择。这种缺乏灵活性的情况妨碍了实际执行。为了解决这个问题,我们引入了ParaAegis,一个并行的保护框架,旨在让从业者灵活地控制隐私-效用-效率的平衡。我们的核心创新是战略模型划分方案。通过将轻量级DP应用于模型的不太关键的低范数部分,同时用HE保护其余部分,我们创建了一个可调系统。一个分布式的投票机制确保了这种划分的一致性。理论分析证实了在相同隐私条件下效率和效用的调整。最重要的是,实验结果表明,通过调整超参数,我们的方法可以在模型精度和训练时间之间灵活地进行优先级排序。
摘要:Federated learning (FL) faces a critical dilemma: existing protection mechanisms like differential privacy (DP) and homomorphic encryption (HE) enforce a rigid trade-off, forcing a choice between model utility and computational efficiency. This lack of flexibility hinders the practical implementation. To address this, we introduce ParaAegis, a parallel protection framework designed to give practitioners flexible control over the privacy-utility-efficiency balance. Our core innovation is a strategic model partitioning scheme. By applying lightweight DP to the less critical, low norm portion of the model while protecting the remainder with HE, we create a tunable system. A distributed voting mechanism ensures consensus on this partitioning. Theoretical analysis confirms the adjustments between efficiency and utility with the same privacy. Crucially, the experimental results demonstrate that by adjusting the hyperparameters, our method enables flexible prioritization between model accuracy and training time.
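A rough sketch of the partitioning step, under our own assumptions: the lowest-norm half of an update receives lightweight Gaussian (DP-style) noise, while the remainder would be handed to an HE library in a real system. The 50/50 split and noise scale are illustrative, and the HE step is shown only as a placeholder.

```python
# Sketch of a ParaAegis-style model split (illustrative threshold and noise).
import numpy as np

def partition_by_norm(update: np.ndarray, frac_dp: float = 0.5):
    """Route the lowest-|value| coordinates to lightweight DP, the rest to HE."""
    k = int(frac_dp * update.size)
    order = np.argsort(np.abs(update))   # ascending magnitude
    return order[:k], order[k:]          # (dp indices, he indices)

def protect(update: np.ndarray, sigma: float = 0.01):
    dp_idx, he_idx = partition_by_norm(update)
    protected = update.copy()
    # Lightweight DP on the less critical, low-norm portion
    protected[dp_idx] += np.random.normal(0.0, sigma, size=dp_idx.size)
    # The remainder would be homomorphically encrypted before upload;
    # a real system would call an HE library (e.g., a CKKS implementation) here.
    ciphertext_part = protected[he_idx]  # placeholder for HE ciphertexts
    return protected[dp_idx], ciphertext_part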
【3】Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs
标题:安全的无人机辅助联邦学习:一种具有零知识证明的数字孪生驱动方法
链接:https://arxiv.org/abs/2509.13634
作者:ar Al Zami, Md Raihan Uddin, Dinh C. Nguyen
备注:15 pages, under revision at IEEE Internet of Things Journal
摘要:联邦学习(FL)作为一种在分散网络上训练机器学习模型的隐私保护方法已经越来越受欢迎。然而,为了确保无人机辅助FL系统的可靠运行,必须解决诸如过度能耗、通信效率低下和安全漏洞等问题。本文提出了一个创新的框架,集成数字孪生(DT)技术和零知识联合学习(zkFed)来应对这些挑战。UAV充当移动基站,允许分散的设备在本地训练FL模型并上传模型更新以进行聚合。通过结合DT技术,我们的方法可以实现实时系统监控和预测性维护,提高无人机网络效率。此外,零知识证明(ZKP)通过允许在不暴露敏感数据的情况下进行模型验证来加强安全性。为了优化能源效率和资源管理,我们引入了一种动态分配策略,可以根据网络条件调整无人机的飞行路径、传输功率和处理速率。使用块坐标下降和凸优化技术,我们的方法显着降低了系统的能量消耗高达29.6%相比,传统的FL方法。仿真结果表明,改进的学习性能,安全性和可扩展性,定位此框架作为一个有前途的解决方案,为下一代无人机为基础的智能网络。
摘要:Federated learning (FL) has gained popularity as a privacy-preserving method of training machine learning models on decentralized networks. However, to ensure reliable operation of UAV-assisted FL systems, issues such as excessive energy consumption, communication inefficiencies, and security vulnerabilities must be solved. This paper proposes an innovative framework that integrates Digital Twin (DT) technology and Zero-Knowledge Federated Learning (zkFed) to tackle these challenges. UAVs act as mobile base stations, allowing scattered devices to train FL models locally and upload model updates for aggregation. By incorporating DT technology, our approach enables real-time system monitoring and predictive maintenance, improving UAV network efficiency. Additionally, Zero-Knowledge Proofs (ZKPs) strengthen security by allowing model verification without exposing sensitive data. To optimize energy efficiency and resource management, we introduce a dynamic allocation strategy that adjusts UAV flight paths, transmission power, and processing rates based on network conditions. Using block coordinate descent and convex optimization techniques, our method significantly reduces system energy consumption by up to 29.6% compared to conventional FL approaches. Simulation results demonstrate improved learning performance, security, and scalability, positioning this framework as a promising solution for next-generation UAV-based intelligent networks.
推理|分析|理解|解释(4篇)
【1】Dense Video Understanding with Gated Residual Tokenization
标题:基于门控残差令牌化的密集视频理解
链接:https://arxiv.org/abs/2509.14199
作者:hang, Wenhao Chai, Shwai He, Ang Li, Yun Fu
摘要:在视频理解中,高时间分辨率对于捕获细粒度细节是必不可少的。然而,目前的视频大语言模型(VLLM)和基准大多依赖于低帧速率采样,如均匀采样或关键帧选择,丢弃密集的时间信息。这种折衷避免了对每个帧进行标记化的高成本,否则会导致冗余计算和随着视频长度增加的线性标记增长。虽然这种权衡适用于缓慢变化的内容,但对于像演讲理解这样的任务来说,它失败了,因为信息几乎出现在每一帧中,并且需要精确的时间对齐。为了解决这一差距,我们引入了密集视频理解(DVU),它通过减少令牌化时间和令牌开销来实现高FPS视频理解。现有的基准测试也是有限的,因为它们的QA对集中在粗略的内容更改上。因此,我们提出了DIVE(密集信息视频评估),第一个为密集时间推理设计的基准。为了使DVU实用化,我们提出了门控残差令牌化(GRT),一个两阶段的框架:(1)运动补偿门控间令牌化在令牌化期间使用像素级运动估计来跳过静态区域,实现令牌计数和计算的次线性增长。(2)语义场景内标记化合并在场景内的静态区域中融合标记,在保留动态语义的同时进一步减少冗余。DIVE上的实验表明,GRT优于较大的VLLM基线,并与FPS呈正相关。这些结果突出了密集的时间信息的重要性,并表明GRT能够实现高效,可扩展的高FPS视频理解。
摘要:High temporal resolution is essential for capturing fine-grained details in video understanding. However, current video large language models (VLLMs) and benchmarks mostly rely on low-frame-rate sampling, such as uniform sampling or keyframe selection, discarding dense temporal information. This compromise avoids the high cost of tokenizing every frame, which otherwise leads to redundant computation and linear token growth as video length increases. While this trade-off works for slowly changing content, it fails for tasks like lecture comprehension, where information appears in nearly every frame and requires precise temporal alignment. To address this gap, we introduce Dense Video Understanding (DVU), which enables high-FPS video comprehension by reducing both tokenization time and token overhead. Existing benchmarks are also limited, as their QA pairs focus on coarse content changes. We therefore propose DIVE (Dense Information Video Evaluation), the first benchmark designed for dense temporal reasoning. To make DVU practical, we present Gated Residual Tokenization (GRT), a two-stage framework: (1) Motion-Compensated Inter-Gated Tokenization uses pixel-level motion estimation to skip static regions during tokenization, achieving sub-linear growth in token count and compute. (2) Semantic-Scene Intra-Tokenization Merging fuses tokens across static regions within a scene, further reducing redundancy while preserving dynamic semantics. Experiments on DIVE show that GRT outperforms larger VLLM baselines and scales positively with FPS. These results highlight the importance of dense temporal information and demonstrate that GRT enables efficient, scalable high-FPS video understanding.
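As a loose illustration of the inter-gated tokenization step, the sketch below uses a plain frame-difference as a stand-in for the paper's pixel-level motion estimation; patch size and threshold are illustrative.

```python
# Sketch of motion-gated tokenization (frame difference stands in for
# the paper's motion estimation; patch size and threshold are illustrative).
import numpy as np

def gated_patch_indices(prev_frame, cur_frame, patch=16, thresh=0.05):
    """Return top-left indices of patches whose content moved; static patches
    are skipped, so token count grows sub-linearly with frame count."""
    diff = np.abs(cur_frame.astype(np.float32) - prev_frame.astype(np.float32))
    h, w = diff.shape[:2]
    keep = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            # Assumes uint8 frames, hence the 255 scaling of the threshold
            if diff[i:i + patch, j:j + patch].mean() > thresh * 255:
                keep.append((i, j))
    return keep  # only these patches are tokenized for this frame
```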
【2】SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation
标题:SpecDiff:基于自推测加速扩散模型推理
链接:https://arxiv.org/abs/2509.13848
作者:, Jiaming Xu, Yongkang Zhou, Guohao Dai
摘要:特征缓存最近成为扩散模型加速的一种很有前途的方法,它通过缓存扩散模型推理过程中的相似特征,有效缓解了高计算量带来的效率低下问题。本文从信息利用的角度分析了现有的特征缓存方法,指出单纯依赖历史信息会导致精度和速度性能受限。为此,我们提出了一种新的范式:利用不同迭代次数下同一时间步的信息相似性,通过自推测引入未来信息。基于这一范式,我们提出了SpecDiff,一种免训练的多级特征缓存策略,包括缓存特征选择算法和多级特征分类算法。(1)基于自推测信息的特征选择算法:SpecDiff根据自推测信息和历史信息为每个token确定动态重要性分数,并依据重要性分数执行缓存特征选择。(2)基于特征重要性分数的多级特征分类算法:SpecDiff利用特征重要性分数的差异对token进行分类,并引入多级特征计算策略。大量实验表明,与NVIDIA A800-80GB GPU上的RFlow相比,SpecDiff在Stable Diffusion 3、3.5和FLUX上以可忽略的质量损失分别实现了平均2.80倍、2.74倍和3.17倍的加速。通过融合推测信息与历史信息,SpecDiff突破了加速-精度权衡瓶颈,推进了高效扩散模型推理中加速与精度的帕累托前沿。
摘要:Feature caching has recently emerged as a promising method for diffusion model acceleration. It effectively alleviates the inefficiency problem caused by high computational requirements by caching similar features in the inference process of the diffusion model. In this paper, we analyze existing feature caching methods from the perspective of information utilization, and point out that relying solely on historical information will lead to constrained accuracy and speed performance. And we propose a novel paradigm that introduces future information via self-speculation based on the information similarity at the same time step across different iteration times. Based on this paradigm, we present \textit{SpecDiff}, a training-free multi-level feature caching strategy including a cached feature selection algorithm and a multi-level feature classification algorithm. (1) Feature selection algorithm based on self-speculative information. \textit{SpecDiff} determines a dynamic importance score for each token based on self-speculative information and historical information, and performs cached feature selection through the importance score. (2) Multi-level feature classification algorithm based on feature importance scores. \textit{SpecDiff} classifies tokens by leveraging the differences in feature importance scores and introduces a multi-level feature calculation strategy. Extensive experiments show that \textit{SpecDiff} achieves average 2.80\times, 2.74\times, and 3.17\times speedup with negligible quality loss in Stable Diffusion 3, 3.5, and FLUX compared to RFlow on NVIDIA A800-80GB GPU. By merging speculative and historical information, \textit{SpecDiff} overcomes the speedup-accuracy trade-off bottleneck, pushing the Pareto frontier of speedup and accuracy in the efficient diffusion model inference.
【3】DeepLogit: A sequentially constrained explainable deep learning modeling approach for transport policy analysis
标题:DeepLogit:一种用于交通政策分析的顺序约束可解释深度学习建模方法
链接:https://arxiv.org/abs/2509.13633
作者:n, Rakhi Manohar Mepparambath, Ling Feng
摘要:尽管深度学习模型在许多应用中取得了重大进展,但由于其黑盒性质,它们在规划和政策相关领域的应用仍然具有挑战性。在这项工作中,我们开发了一组DeepLogit模型,采用一种新的顺序约束方法来估计用于交通政策分析的深度学习模型。该方法的第一步是估计一个只含线性项的卷积神经网络(CNN)模型,它等价于一个线性参数多项式logit模型。然后,我们将需要可解释性的参数约束为线性参数CNN模型中获得的值,并通过加入高阶项或引入Transformers等先进深度学习架构来估计其他深度学习模型。我们的方法既保留了所选参数的可解释性,又比离散选择模型显著提高了模型精度。我们以公交路线选择为例,使用来自新加坡的真实公交智能卡数据展示了该方法。这项研究显示了统一方法的潜力:基于理论的离散选择模型(DCM)和数据驱动的AI模型可以在可解释性和预测能力方面互相取长补短。随着更大数据集和更复杂结构的出现,这种方法可以产生更准确的模型,同时保持其在规划和政策相关领域的适用性。我们的代码可以在https://github.com/jeremyoon/route-choice/上找到。
摘要:Despite the significant progress of deep learning models in multitude of applications, their adaption in planning and policy related areas remains challenging due to the black-box nature of these models. In this work, we develop a set of DeepLogit models that follow a novel sequentially constrained approach in estimating deep learning models for transport policy analysis. In the first step of the proposed approach, we estimate a convolutional neural network (CNN) model with only linear terms, which is equivalent of a linear-in-parameter multinomial logit model. We then estimate other deep learning models by constraining the parameters that need interpretability at the values obtained in the linear-in-parameter CNN model and including higher order terms or by introducing advanced deep learning architectures like Transformers. Our approach can retain the interpretability of the selected parameters, yet provides significantly improved model accuracy than the discrete choice model. We demonstrate our approach on a transit route choice example using real-world transit smart card data from Singapore. This study shows the potential for a unifying approach, where theory-based discrete choice model (DCM) and data-driven AI models can leverage each other's strengths in interpretability and predictive power. With the availability of larger datasets and more complex constructions, such approach can lead to more accurate models using discrete choice models while maintaining its applicability in planning and policy-related areas. Our code is available on https://github.com/jeremyoon/route-choice/ .
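The two-stage constraint is easy to express in code. The sketch below (our own toy sizes and correction head, not the paper's architecture) fits a linear-in-parameters utility first, then freezes those interpretable coefficients and adds a flexible correction term.

```python
# Sketch of the sequentially constrained idea (illustrative sizes and head).
import torch
import torch.nn as nn

n_features, n_alternatives = 10, 4

# Stage 1: linear-in-parameters model, equivalent to a multinomial logit.
linear_utility = nn.Linear(n_features, n_alternatives, bias=False)
# ... fit linear_utility with cross-entropy so its weights serve as the
# interpretable coefficients ...

# Stage 2: freeze the interpretable coefficients, add a nonlinear correction.
for p in linear_utility.parameters():
    p.requires_grad = False  # constrain at stage-1 values

nonlinear_head = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_alternatives)
)

def utility(x):
    # Interpretable linear part + flexible correction term
    return linear_utility(x) + nonlinear_head(x)

logits = utility(torch.randn(8, n_features))
probs = torch.softmax(logits, dim=-1)  # choice probabilities
```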
【4】An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training
标题:优化器选择对神经网络训练能效与性能影响的分析
链接:https://arxiv.org/abs/2509.13516
作者:
备注:7 pages. 3 figures
摘要:随着机器学习模型变得越来越复杂和计算要求越来越高,了解训练决策对环境的影响对于可持续的人工智能发展至关重要。本文提出了一个全面的实证研究,研究神经网络训练中优化器选择和能源效率之间的关系。我们在三个基准数据集(MNIST,CIFAR-10,CIFAR-100)上进行了360次对照实验,使用了8个流行的优化器(SGD,Adam,AdamW,RMSprop,Adagrad,Adadelta,Adamax,NAadam),每个优化器有15个随机种子。使用CodeCarbon在Apple M1 Pro硬件上进行精确的能量跟踪,我们测量了训练时间、峰值内存使用量、二氧化碳排放量和最终模型性能。我们的研究结果揭示了训练速度,准确性和环境影响之间的重大权衡,这些影响因数据集和模型复杂性而异。我们认为AdamW和NAdam是一贯有效的选择,而SGD在复杂数据集上表现出卓越的性能,尽管排放量更高。这些结果为寻求在机器学习工作流程中平衡性能和可持续性的从业者提供了可操作的见解。
摘要:As machine learning models grow increasingly complex and computationally demanding, understanding the environmental impact of training decisions becomes critical for sustainable AI development. This paper presents a comprehensive empirical study investigating the relationship between optimizer choice and energy efficiency in neural network training. We conducted 360 controlled experiments across three benchmark datasets (MNIST, CIFAR-10, CIFAR-100) using eight popular optimizers (SGD, Adam, AdamW, RMSprop, Adagrad, Adadelta, Adamax, NAdam) with 15 random seeds each. Using CodeCarbon for precise energy tracking on Apple M1 Pro hardware, we measured training duration, peak memory usage, carbon dioxide emissions, and final model performance. Our findings reveal substantial trade-offs between training speed, accuracy, and environmental impact that vary across datasets and model complexity. We identify AdamW and NAdam as consistently efficient choices, while SGD demonstrates superior performance on complex datasets despite higher emissions. These results provide actionable insights for practitioners seeking to balance performance and sustainability in machine learning workflows.
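For readers who want to reproduce the measurement setup in spirit, the sketch below wraps a toy training loop in CodeCarbon's EmissionsTracker; the model, step count, and optimizer list are illustrative stand-ins for the paper's full 360-experiment grid.

```python
# Sketch of per-optimizer energy tracking with CodeCarbon (toy model and steps).
import torch
import torch.nn as nn
from codecarbon import EmissionsTracker

data = torch.randn(256, 784)
labels = torch.randint(0, 10, (256,))
loss_fn = nn.CrossEntropyLoss()

results = {}
for name, opt_cls in [("SGD", torch.optim.SGD), ("AdamW", torch.optim.AdamW)]:
    model = nn.Linear(784, 10)            # fresh model per optimizer
    optimizer = opt_cls(model.parameters(), lr=1e-3)
    tracker = EmissionsTracker(project_name=f"optimizer-{name}")
    tracker.start()
    for _ in range(10):                   # a few steps stand in for full training
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()
    results[name] = tracker.stop()        # kg CO2-eq recorded for this run
print(results)
```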
检测相关(6篇)
【1】Queen Detection in Beehives via Environmental Sensor Fusion for Low-Power Edge Computing
标题:基于环境传感器融合的蜂巢蜂王检测,面向低功耗边缘计算
链接:https://arxiv.org/abs/2509.14061
作者: Luca, Elisa Donati
摘要:蜂王的存在对蜂群的健康和稳定至关重要,但目前的监测方法依赖于人工检查,这种方式劳动强度大、具有破坏性,并且对于大规模养蜂来说不切实际。虽然最近的基于音频的方法已经显示出希望,但它们通常需要高功耗、复杂的预处理,并且容易受到环境噪声的影响。为了克服这些局限性,我们提出了一种基于环境传感器融合(具体为蜂巢内外之间的温度、湿度和压力差)的轻量级多模态蜂王检测系统。我们的方法在商用STM32微控制器上采用量化决策树推理,实现了实时、低功耗的边缘计算,而不影响准确性。我们表明,仅使用环境输入,我们的系统即可实现99%以上的蜂王检测精度,而加入音频特征并未带来显著的性能增益。这项工作为非侵入式蜂巢监测提供了一种可扩展且可持续的解决方案,为使用现成的节能硬件实现自主精准养蜂铺平了道路。
摘要:Queen bee presence is essential for the health and stability of honeybee colonies, yet current monitoring methods rely on manual inspections that are labor-intensive, disruptive, and impractical for large-scale beekeeping. While recent audio-based approaches have shown promise, they often require high power consumption, complex preprocessing, and are susceptible to ambient noise. To overcome these limitations, we propose a lightweight, multimodal system for queen detection based on environmental sensor fusion-specifically, temperature, humidity, and pressure differentials between the inside and outside of the hive. Our approach employs quantized decision tree inference on a commercial STM32 microcontroller, enabling real-time, low-power edge computing without compromising accuracy. We show that our system achieves over 99% queen detection accuracy using only environmental inputs, with audio features offering no significant performance gain. This work presents a scalable and sustainable solution for non-invasive hive monitoring, paving the way for autonomous, precision beekeeping using off-the-shelf, energy-efficient hardware.
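A minimal sketch of the deployment-oriented idea, assuming synthetic data and an illustrative fixed-point scale: fit a small decision tree on the environmental differentials, then quantize its split thresholds so the microcontroller can evaluate it with integer comparisons only.

```python
# Sketch of a quantized decision tree on environmental features
# (synthetic data and fixed-point scale are illustrative).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Features: [temp_diff, humidity_diff, pressure_diff] between hive inside/outside
X = np.random.randn(500, 3)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in queen/no-queen labels

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)

# Quantize split thresholds to int16 fixed-point for MCU deployment
SCALE = 256
thresholds_q = np.round(tree.tree_.threshold * SCALE).astype(np.int16)
# At inference on the MCU, sensor readings are scaled by the same factor and
# compared against thresholds_q with integer arithmetic only (leaf nodes in
# sklearn carry a sentinel threshold of -2 and are skipped when traversing).
```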
【2】Personalization on a Budget: Minimally-Labeled Continual Learning for Resource-Efficient Seizure Detection
标题:预算内的个性化:最低限度标签的持续学习以实现资源高效的癫痫发作检测
链接:https://arxiv.org/abs/2509.13974
作者:in Shahbazinia, Jonathan Dan, Jose A. Miranda, Giovanni Ansaloni, David Atienza
摘要:目的:癫痫是一种常见的神经系统疾病,需要仔细的诊断和持续的护理。癫痫发作检测仍然具有挑战性,因为目前的临床实践依赖于脑电图的专家分析,这是一个耗时且需要专业知识的过程。为了应对这一挑战,本文探讨了使用深度学习的自动癫痫发作检测,重点是个性化的持续学习模型,这些模型适应每个患者独特且随时间演变的脑电图信号特征。方法:在此背景下,我们的方法解决了将新数据集成到现有模型中而不发生灾难性遗忘的挑战,这是静态深度学习模型中的常见问题。我们提出EpiSMART,一种用于癫痫发作检测的持续学习框架,它使用大小受限的重放缓冲区和有依据的样本选择策略,以逐步适应患者特定的脑电图信号。通过选择性地保留高熵样本和被预测为癫痫发作的样本,我们的方法在保留关键历史信息的同时,以最小的内存和计算需求保持高性能。结果:在CHB-MIT数据集上的验证表明,相比在所有其他患者上训练且不进行更新的基线,EpiSMART的F1评分提高了21%。平均而言,EpiSMART每天只需要6.46分钟的标记数据和6.28次更新,因此适合在可穿戴系统中实时部署。结论:EpiSMART通过有效地将新数据集成到现有模型中而不损害既有知识,在现实且资源受限的条件下实现了鲁棒且个性化的癫痫发作检测。意义:该框架通过提供一种支持患者特定适应并可在可穿戴医疗系统中实际部署的持续学习方法,推进了自动癫痫发作检测。
摘要:Objective: Epilepsy, a prevalent neurological disease, demands careful diagnosis and continuous care. Seizure detection remains challenging, as current clinical practice relies on expert analysis of electroencephalography, which is a time-consuming process and requires specialized knowledge. Addressing this challenge, this paper explores automated epileptic seizure detection using deep learning, focusing on personalized continual learning models that adapt to each patient's unique electroencephalography signal features, which evolve over time. Methods: In this context, our approach addresses the challenge of integrating new data into existing models without catastrophic forgetting, a common issue in static deep learning models. We propose EpiSMART, a continual learning framework for seizure detection that uses a size-constrained replay buffer and an informed sample selection strategy to incrementally adapt to patient-specific electroencephalography signals. By selectively retaining high-entropy and seizure-predicted samples, our method preserves critical past information while maintaining high performance with minimal memory and computational requirements. Results: Validation on the CHB-MIT dataset, shows that EpiSMART achieves a 21% improvement in the F1 score over a trained baseline without updates in all other patients. On average, EpiSMART requires only 6.46 minutes of labeled data and 6.28 updates per day, making it suitable for real-time deployment in wearable systems. Conclusion:EpiSMART enables robust and personalized seizure detection under realistic and resource-constrained conditions by effectively integrating new data into existing models without degrading past knowledge. Significance: This framework advances automated seizure detection by providing a continual learning approach that supports patient-specific adaptation and practical deployment in wearable healthcare systems.
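A small sketch of the informed sample selection, under our own scoring rule: the buffer keeps a bounded set of samples ranked by predictive entropy plus seizure probability, mirroring the paper's stated preference for high-entropy and seizure-predicted samples. Capacity and the exact score are illustrative assumptions.

```python
# Sketch of an entropy-informed, size-constrained replay buffer
# (capacity and scoring rule are illustrative).
import numpy as np

class ReplayBuffer:
    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self.items = []  # list of (score, sample, label)

    @staticmethod
    def score(prob_seizure: float) -> float:
        p = np.clip(prob_seizure, 1e-6, 1 - 1e-6)
        entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
        return entropy + p  # favor high-entropy and seizure-predicted samples

    def add(self, sample, label, prob_seizure):
        self.items.append((self.score(prob_seizure), sample, label))
        self.items.sort(key=lambda t: t[0], reverse=True)
        del self.items[self.capacity:]  # keep only the most informative

    def batch(self):
        return [(s, l) for _, s, l in self.items]
```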
【3】Multimodal signal fusion for stress detection using deep neural networks: a novel approach for converting 1D signals to unified 2D images
标题:使用深度神经网络进行应力检测的多模态信号融合:一种将1D信号转换为统一2D图像的新方法
链接:https://arxiv.org/abs/2509.13636
作者:anpoor, Bahram Tarvirdizadeh, Khalil Alipour, Mohammad Ghamari
备注:14 pages 7 images 2 tables
摘要:这项研究介绍了一种新方法,将光电容积描记(PPG)、皮肤电反应(GSR)和加速度(ACC)等多模态生理信号转换为2D图像矩阵,以增强使用卷积神经网络(CNN)的压力检测。与单独处理这些信号或依赖固定编码的传统方法不同,我们的技术将它们融合为结构化图像表示,使CNN能够更有效地捕获时间和跨信号依赖性。这种基于图像的转换不仅提高了可解释性,还可作为一种强大的数据增强形式。为了进一步增强泛化能力和模型鲁棒性,我们系统地将融合信号重新组织为多种格式,并在多阶段训练管道中将它们组合起来。这种方法显著提高了分类性能。虽然本文在压力检测的背景下进行了演示,但所提出的方法广泛适用于涉及多模态生理信号的任何领域,为通过可穿戴技术实现更准确、个性化和实时的健康监测铺平了道路。
摘要:This study introduces a novel method that transforms multimodal physiological signals, photoplethysmography (PPG), galvanic skin response (GSR), and acceleration (ACC), into 2D image matrices to enhance stress detection using convolutional neural networks (CNNs). Unlike traditional approaches that process these signals separately or rely on fixed encodings, our technique fuses them into structured image representations that enable CNNs to capture temporal and cross-signal dependencies more effectively. This image-based transformation not only improves interpretability but also serves as a robust form of data augmentation. To further enhance generalization and model robustness, we systematically reorganize the fused signals into multiple formats, combining them in a multi-stage training pipeline. This approach significantly boosts classification performance. While demonstrated here in the context of stress detection, the proposed method is broadly applicable to any domain involving multimodal physiological signals, paving the way for more accurate, personalized, and real-time health monitoring through wearable technologies.
【4】Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection
标题:GPT-4o mini是否被其自身的安全过滤器蒙蔽?揭示仇恨言论检测中的多模态到单模态瓶颈
链接:https://arxiv.org/abs/2509.13608
作者: Selvanayagam, Ted Kurti
摘要:随着大型多模态模型(LMMs)成为日常数字生活中不可或缺的一部分,了解其安全架构是AI对齐的关键问题。本文对OpenAI的GPT-4o mini(一种全球部署的模型)在多模态仇恨言论检测这一困难任务上进行了系统分析。使用Hateful Memes Challenge数据集,我们对500个样本进行了多阶段调查,以探究模型的推理和失效模式。我们的核心发现是通过实验识别出"单模态瓶颈":一种架构缺陷,即模型先进的多模态推理被上下文盲的安全过滤器系统性地抢占。对144项内容策略拒绝的定量验证表明,这些覆盖由单模态视觉内容(50%)和文本内容(50%)同等程度地触发。我们进一步证明,这一安全系统是脆弱的,它不仅拦截高风险图像,也拦截良性、常见的表情包格式,导致可预测的误报。这些发现揭示了最先进的LMM中能力与安全性之间的根本张力,强调需要更集成的、上下文感知的对齐策略,以确保AI系统能够安全且有效地部署。
摘要:As Large Multimodal Models (LMMs) become integral to daily digital life, understanding their safety architectures is a critical problem for AI Alignment. This paper presents a systematic analysis of OpenAI's GPT-4o mini, a globally deployed model, on the difficult task of multimodal hate speech detection. Using the Hateful Memes Challenge dataset, we conduct a multi-phase investigation on 500 samples to probe the model's reasoning and failure modes. Our central finding is the experimental identification of a "Unimodal Bottleneck," an architectural flaw where the model's advanced multimodal reasoning is systematically preempted by context-blind safety filters. A quantitative validation of 144 content policy refusals reveals that these overrides are triggered in equal measure by unimodal visual (50%) and textual (50%) content. We further demonstrate that this safety system is brittle, blocking not only high-risk imagery but also benign, common meme formats, leading to predictable false positives. These findings expose a fundamental tension between capability and safety in state-of-the-art LMMs, highlighting the need for more integrated, context-aware alignment strategies to ensure AI systems can be deployed both safely and effectively.
【5】Cooperative Target Detection with AUVs: A Dual-Timescale Hierarchical MARDL Approach
标题:基于AUV的协同目标检测:一种双时间尺度分层MARDL方法
链接:https://arxiv.org/abs/2509.13381
作者:yao, Yang Bo, Yu Zhiwen, Cao Xuelin, George C. Alexandropoulos, Merouane Debbah, Chau Yuen
备注:6 pages
摘要:自主水下航行器(AUV)在协同探测和侦察方面显示出巨大的潜力。然而,合作AUV通信引入暴露的风险。在对抗性环境中,实现高效协作,同时确保隐蔽行动成为水下合作任务的关键挑战。在本文中,我们提出了一种新的双时间尺度层次多智能体邻近策略优化(H-MAPPO)框架。高级组件基于中央AUV确定参与任务的个体,而低级组件通过参与AUV的功率和轨迹控制来降低暴露概率。仿真结果表明,该框架实现了快速收敛,在性能方面优于基准算法,并最大限度地提高长期合作效率,同时确保隐蔽操作。
摘要:Autonomous Underwater Vehicles (AUVs) have shown great potential for cooperative detection and reconnaissance. However, collaborative AUV communications introduce risks of exposure. In adversarial environments, achieving efficient collaboration while ensuring covert operations becomes a key challenge for underwater cooperative missions. In this paper, we propose a novel dual time-scale Hierarchical Multi-Agent Proximal Policy Optimization (H-MAPPO) framework. The high-level component determines the individuals participating in the task based on a central AUV, while the low-level component reduces exposure probabilities through power and trajectory control by the participating AUVs. Simulation results show that the proposed framework achieves rapid convergence, outperforms benchmark algorithms in terms of performance, and maximizes long-term cooperative efficiency while ensuring covert operations.
【6】Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
标题:面向可泛化音频深度伪造检测的低秩适配器专家混合
链接:https://arxiv.org/abs/2509.13878
作者:kkonen, Ivan Kukanov, Ville Hautamäki
备注:6 pages, 3 figures, 1 table
摘要:像Wav2Vec2这样的基础模型擅长语音任务中的表示学习,包括音频deepfake检测。然而,在对一组固定的真实和伪造音频片段进行微调后,它们往往无法泛化到训练中未出现的新deepfake方法。为了解决这个问题,我们提出了一种LoRA专家混合方法,将多个低秩适配器(LoRA)集成到模型的注意力层中。路由机制有选择地激活专门的专家,增强对不断演变的deepfake攻击的适应性。实验结果表明,我们的方法在域内和域外场景中均优于标准微调,相对于基线模型降低了等错误率(EER)。值得注意的是,我们最好的MoE-LoRA模型将平均域外EER从8.55%降低到6.08%,证明了其在实现可泛化音频深度伪造检测方面的有效性。
摘要:Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55\% to 6.08\%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.
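A compact sketch of a mixture-of-LoRA-experts projection, with illustrative rank, expert count, and a softmax router (the paper's routing details may differ): the frozen base linear layer is augmented by a routed sum of low-rank updates.

```python
# Sketch of a mixture-of-LoRA-experts layer (rank, expert count, and the
# dense softmax router are illustrative assumptions).
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, n_experts=4, rank=8):
        super().__init__()
        self.base = base  # frozen pretrained projection (e.g., in attention)
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x):  # x: (batch, d_in)
        weights = torch.softmax(self.router(x), dim=-1)      # (batch, E)
        delta = torch.einsum("bi,eir,ero->beo", x, self.A, self.B)
        update = (weights.unsqueeze(-1) * delta).sum(dim=1)  # (batch, d_out)
        return self.base(x) + update

layer = MoELoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```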
分类|识别(4篇)
【1】Deep Learning-Driven Peptide Classification in Biological Nanopores
标题:生物纳米孔中深度学习驱动的肽分类
链接:https://arxiv.org/abs/2509.14029
作者:vey, Julian Hoßbach, Sandro Kuppel, Tobias Ensslen, Jan C. Behrends, Christian Holm
备注:29 pages (incl. references) 7 figures
摘要:能够在临床环境中对蛋白质进行实时分类的装置将使疾病诊断变得廉价而快速。纳米孔器件是这项技术的候选方案之一。这些设备通过测量蛋白质或肽进入纳米尺度孔时产生的电流信号来工作。如果该电流与肽的结构及其与孔的相互作用唯一相关,则该信号可用于进行鉴定。虽然这种方法有望在临床环境中实时鉴定肽和蛋白质,但迄今为止,这些信号的复杂性限制了其准确性。在这项工作中,我们通过小波变换将电流信号转换为尺度图图像,以非常适合机器学习算法的方式捕获幅度、频率和时间信息,从而解决分类问题。在对42种肽进行测试时,我们的方法实现了约81%的分类准确率,创造了该领域新的最先进水平,朝着即时(point-of-care)的实用肽/蛋白质诊断迈出了一步。此外,我们还展示了将这些模型部署到实际硬件时至关重要的模型迁移技术,为实时疾病诊断的新方法铺平了道路。
摘要:A device capable of performing real time classification of proteins in a clinical setting would allow for inexpensive and rapid disease diagnosis. One such candidate for this technology are nanopore devices. These devices work by measuring a current signal that arises when a protein or peptide enters a nanometer-length-scale pore. Should this current be uniquely related to the structure of the peptide and its interactions with the pore, the signals can be used to perform identification. While such a method would allow for real time identification of peptides and proteins in a clinical setting, to date, the complexities of these signals limit their accuracy. In this work, we tackle the issue of classification by converting the current signals into scaleogram images via wavelet transforms, capturing amplitude, frequency, and time information in a modality well-suited to machine learning algorithms. When tested on 42 peptides, our method achieved a classification accuracy of ~$81\,\%$, setting a new state-of-the-art in the field and taking a step toward practical peptide/protein diagnostics at the point of care. In addition, we demonstrate model transfer techniques that will be critical when deploying these models into real hardware, paving the way to a new method for real-time disease diagnosis.
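A minimal sketch of the signal-to-scaleogram step using PyWavelets' continuous wavelet transform; the synthetic trace, sampling rate, wavelet, and scale range are illustrative.

```python
# Sketch of converting a nanopore current trace to a scaleogram image
# (synthetic trace and scale range are illustrative).
import numpy as np
import pywt

fs = 10_000                       # sampling rate (Hz), assumed
t = np.arange(0, 0.1, 1 / fs)
current = np.sin(2 * np.pi * 800 * t) + 0.3 * np.random.randn(t.size)

scales = np.arange(1, 65)         # 64 scales -> 64-row image
coeffs, freqs = pywt.cwt(current, scales, "morl")
scaleogram = np.abs(coeffs)       # (64, n_samples) amplitude image

# The scaleogram is then resized/normalized and fed to a CNN classifier
# exactly like an ordinary single-channel image.
```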
【2】Hybrid Quantum-Classical Model for Image Classification
标题:图像分类的混合量子经典模型
链接:https://arxiv.org/abs/2509.13353
作者:Adnan Shahzad
摘要:本研究在三个基准数据集(MNIST、CIFAR100和STL10)上对混合量子经典神经网络和纯经典模型进行了系统比较,以评估其性能、效率和鲁棒性。混合模型将参数化量子电路与经典深度学习架构集成在一起,而经典模型则使用传统的卷积神经网络(CNN)。每个数据集的实验均训练50个epoch,并对验证准确率、测试准确率、训练时间、计算资源使用和对抗鲁棒性(使用epsilon=0.1的扰动测试)进行了评估。主要发现表明,混合模型在最终准确率方面始终优于经典模型,验证准确率分别达到99.38%(MNIST)、41.69%(CIFAR100)和74.05%(STL10),而经典基准分别为98.21%、32.25%和63.76%。值得注意的是,混合优势随数据集复杂度的增加而增大,在CIFAR100(+9.44%)和STL10(+10.29%)上收益最为显著。混合模型的训练速度还快5至12倍(例如在MNIST上每个epoch耗时21.23秒,而经典模型为108.44秒),参数量少6%至32%,同时对未见测试数据保持更优的泛化能力。对抗鲁棒性测试表明,混合模型在较简单的数据集上明显更具韧性(例如在MNIST上的鲁棒准确率为45.27%,而经典模型为10.80%),但在CIFAR100等复杂数据集上两者同样脆弱(鲁棒性均约为1%)。资源效率分析表明,混合模型消耗的内存更少(4至5GB,而经典模型为5至6GB),CPU利用率也更低(平均9.5%对23.2%)。这些结果表明,混合量子经典架构在准确率、训练效率和参数可扩展性方面具有令人信服的优势,特别是对于复杂的视觉任务。
摘要:This study presents a systematic comparison between hybrid quantum-classical neural networks and purely classical models across three benchmark datasets (MNIST, CIFAR100, and STL10) to evaluate their performance, efficiency, and robustness. The hybrid models integrate parameterized quantum circuits with classical deep learning architectures, while the classical counterparts use conventional convolutional neural networks (CNNs). Experiments were conducted over 50 training epochs for each dataset, with evaluations on validation accuracy, test accuracy, training time, computational resource usage, and adversarial robustness (tested with $\epsilon=0.1$ perturbations). Key findings demonstrate that hybrid models consistently outperform classical models in final accuracy, achieving 99.38\% (MNIST), 41.69\% (CIFAR100), and 74.05\% (STL10) validation accuracy, compared to classical benchmarks of 98.21\%, 32.25\%, and 63.76\%, respectively. Notably, the hybrid advantage scales with dataset complexity, showing the most significant gains on CIFAR100 (+9.44\%) and STL10 (+10.29\%). Hybrid models also train 5--12$\times$ faster (e.g., 21.23s vs. 108.44s per epoch on MNIST) and use 6--32\% fewer parameters while maintaining superior generalization to unseen test data. Adversarial robustness tests reveal that hybrid models are significantly more resilient on simpler datasets (e.g., 45.27\% robust accuracy on MNIST vs. 10.80\% for classical) but show comparable fragility on complex datasets like CIFAR100 ($\sim$1\% robustness for both). Resource efficiency analyses indicate that hybrid models consume less memory (4--5GB vs. 5--6GB for classical) and lower CPU utilization (9.5\% vs. 23.2\% on average). These results suggest that hybrid quantum-classical architectures offer compelling advantages in accuracy, training efficiency, and parameter scalability, particularly for complex vision tasks.
【3】Classification Filtering
标题:分类过滤
链接:https://arxiv.org/abs/2509.13975
作者:ram
摘要:我们考虑一个流式信号,其中每个样本都对应一个潜在类别。我们假设有多个分类器可用,每个分类器以不同的准确度给出类别概率,并按照一种简单且固定的策略被调用。在此设定下,我们研究融合各分类器输出并纳入时间因素以提高分类精度的问题。我们提出了一个状态空间模型,并开发了一个适合实时执行的滤波器。我们在基于可穿戴设备惯性测量单元(IMU)数据的活动分类应用中验证了所提滤波器的有效性。
摘要:We consider a streaming signal in which each sample is linked to a latent class. We assume that multiple classifiers are available, each providing class probabilities with varying degrees of accuracy. These classifiers are employed following a straightforward and fixed policy. In this setting, we consider the problem of fusing the output of the classifiers while incorporating the temporal aspect to improve classification accuracy. We propose a state-space model and develop a filter tailored for realtime execution. We demonstrate the effectiveness of the proposed filter in an activity classification application based on inertial measurement unit (IMU) data from a wearable device.
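A rough sketch of a state-space fusion filter in this spirit, under our own assumptions (a sticky transition matrix and classifiers treated as independent, so their probability outputs multiply): each step predicts with the transition model and corrects with the fused likelihood.

```python
# Sketch of recursive Bayesian fusion of classifier outputs over time
# (transition matrix and toy probabilities are illustrative).
import numpy as np

n_classes = 3
# Sticky transition model: the latent class tends to persist between samples
T = np.full((n_classes, n_classes), 0.05 / (n_classes - 1))
np.fill_diagonal(T, 0.95)

belief = np.full(n_classes, 1.0 / n_classes)  # prior over the latent class

def filter_step(belief, classifier_probs):
    """One recursive update: predict with T, correct with fused likelihoods."""
    predicted = T.T @ belief
    likelihood = np.prod(np.vstack(classifier_probs), axis=0)  # fuse classifiers
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Example: two classifiers of different accuracy observing the same sample
p1 = np.array([0.6, 0.3, 0.1])    # stronger classifier
p2 = np.array([0.4, 0.35, 0.25])  # weaker classifier
belief = filter_step(belief, [p1, p2])
```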
【4】TICL: Text-Embedding KNN For Speech In-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models
标题:TICL:用于语音上下文学习的文本嵌入KNN,释放大型多模态模型的语音识别能力
链接:https://arxiv.org/abs/2509.13395
作者:heng, Yekaterina Yegorova, Mark Hasegawa-Johnson
摘要:语音基础模型最近展示了执行语音上下文学习(SICL)的能力。选择有效的上下文示例对SICL性能至关重要,但选择方法仍未得到充分探索。在这项工作中,我们提出了面向SICL的文本嵌入KNN(TICL),这是一个简单的流程,利用语义上下文来增强现成大型多模态模型的语音识别能力,而无需微调。在包括口音英语、多语种语音和儿童语音在内的具有挑战性的自动语音识别任务中,我们的方法使模型超越零样本性能,相对WER最多降低84.7%。我们通过消融研究展示了该方法的鲁棒性和效率。
摘要:Speech foundation models have recently demonstrated the ability to perform Speech In-Context Learning (SICL). Selecting effective in-context examples is crucial for SICL performance, yet selection methodologies remain underexplored. In this work, we propose Text-Embedding KNN for SICL (TICL), a simple pipeline that uses semantic context to enhance off-the-shelf large multimodal models' speech recognition ability without fine-tuning. Across challenging automatic speech recognition tasks, including accented English, multilingual speech, and children's speech, our method enables models to surpass zero-shot performance with up to 84.7% relative WER reduction. We conduct ablation studies to show the robustness and efficiency of our method.
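The selection pipeline reduces to embedding, cosine similarity, and top-k retrieval. The sketch below uses a random-projection placeholder where a real sentence encoder would go; names and sizes are illustrative.

```python
# Sketch of text-embedding KNN selection of in-context examples
# (the embed() function is a placeholder; any sentence encoder works).
import numpy as np

def embed(texts):
    """Placeholder embedder; a real pipeline would call a sentence encoder."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

pool_texts = ["transcript A", "transcript B", "transcript C"]  # paired with audio
pool_emb = embed(pool_texts)
pool_emb /= np.linalg.norm(pool_emb, axis=1, keepdims=True)

def select_examples(query_text, k=2):
    q = embed([query_text])[0]
    q /= np.linalg.norm(q)
    sims = pool_emb @ q                  # cosine similarity to the query
    top = np.argsort(-sims)[:k]
    return [pool_texts[i] for i in top]  # prepended as in-context examples

demos = select_examples("new utterance to transcribe")
```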
表征(2篇)
【1】Learning Minimal Representations of Many-Body Physics from Snapshots of a Quantum Simulator
标题:从量子模拟器的快照学习多体物理学的最小表示
链接:https://arxiv.org/abs/2509.13821
作者:Møller, Gabriel Fernández-Fernández, Thomas Schweigler, Paulin de Schoulepnikoff, Jörg Schmiedmayer, Gorka Muñoz-Gil
备注:13 pages, 7 figures
摘要:模拟量子模拟器提供了对经典计算无法触及的多体动力学的访问。然而,从实验数据中提取物理见解往往受到测量噪声、有限的可观测量和对底层微观模型认识不完整的阻碍。在这里,我们开发了一种基于变分自动编码器(VAE)的机器学习方法,用于分析实现sine-Gordon量子场论的隧道耦合一维玻色气体的干涉测量。以无监督方式训练的VAE学习到与系统平衡控制参数强烈相关的最小潜在表示。应用于非平衡协议时,潜在空间揭示了快速冷却后冻结孤子的特征,并发现了传统基于关联函数的方法未能捕获的异常淬火后动力学。这些结果表明,生成模型可以直接从含噪且稀疏的实验数据中提取物理上可解释的变量,为量子模拟器中的平衡和非平衡物理提供补充探针。更广泛地说,我们的工作突出了机器学习如何补充现有的场论技术,为量子多体系统中可扩展的数据驱动发现铺平了道路。
摘要:Analog quantum simulators provide access to many-body dynamics beyond the reach of classical computation. However, extracting physical insights from experimental data is often hindered by measurement noise, limited observables, and incomplete knowledge of the underlying microscopic model. Here, we develop a machine learning approach based on a variational autoencoder (VAE) to analyze interference measurements of tunnel-coupled one-dimensional Bose gases, which realize the sine-Gordon quantum field theory. Trained in an unsupervised manner, the VAE learns a minimal latent representation that strongly correlates with the equilibrium control parameter of the system. Applied to non-equilibrium protocols, the latent space uncovers signatures of frozen-in solitons following rapid cooling, and reveals anomalous post-quench dynamics not captured by conventional correlation-based methods. These results demonstrate that generative models can extract physically interpretable variables directly from noisy and sparse experimental data, providing complementary probes of equilibrium and non-equilibrium physics in quantum simulators. More broadly, our work highlights how machine learning can supplement established field-theoretical techniques, paving the way for scalable, data-driven discovery in quantum many-body systems.
【2】Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
标题:为什么条条大路不通罗马:人类视觉皮质层次结构中的表征几何各不相同
链接:https://arxiv.org/abs/2509.13459
作者:h, Zahraa Chorghay, Shahab Bakhtiari, Blake A. Richards
备注:9 pages, 4 figures
摘要:生物和人工智能系统为实现最优编码而在基本的效率-鲁棒性权衡中寻找平衡,即它们必须有效地编码输入空间的众多属性,同时还要对噪声具有鲁棒性。这种挑战在像人脑这样的分层处理系统中尤为明显。为了理解系统如何在效率和鲁棒性之间进行权衡,我们采用群体几何框架,对人类视觉皮层以及人工神经网络(ANN)中的表征进行分析。在腹侧视觉流中,我们发现大多数区域存在通用的无标度表征,其特征是幂律衰减的本征谱。然而,某些高阶视觉区域不具有无标度表征,表明无标度几何并非大脑的普遍属性。与此同时,以自监督学习目标训练的人工神经网络同样表现出无标度几何,但在特定任务上微调后则不再如此。基于这些经验结果和我们的分析见解,我们认为一个系统的表征几何不是普遍属性,而是取决于其计算目标。
摘要:Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
3D|3D重建等相关(1篇)
【1】MapAnything: Universal Feed-Forward Metric 3D Reconstruction
标题:MapAnything:通用前馈度量3D重建
链接:https://arxiv.org/abs/2509.13414
作者:etha, Norman Müller, Johannes Schönberger, Lorenzo Porzi, Yuchen Zhang, Tobias Fischer, Arno Knapitsch, Duncan Zauss, Ethan Weber, Nelson Antunes, Jonathon Luiten, Manuel Lopez-Antequera, Samuel Rota Bulò, Christian Richardt, Deva Ramanan, Sebastian Scherer, Peter Kontschieder
备注:Project Page: this https URL
摘要:我们引入MapAnything,一个统一的基于transformer的前馈模型,它摄取一个或多个图像以及可选的几何输入,如相机本质,姿势,深度或部分重建,然后直接回归度量3D场景几何和相机。MapAnything利用多视图场景几何的因子表示,即,深度图、局部光线图、相机姿态和度量比例因子的集合,该度量比例因子有效地将局部重建升级为全局一致的度量帧。通过对不同数据集的监督和训练进行标准化,以及灵活的输入增强,MapAnything能够在单个前馈通道中解决广泛的3D视觉任务,包括未校准的运动结构,校准的多视图立体,单目深度估计,相机定位,深度完成等。我们提供了广泛的实验分析和模型消融,证明MapAnything优于或匹配专业前馈模型,同时提供更有效的联合训练行为,从而为通用3D重建骨干铺平了道路。
摘要:We introduce MapAnything, a unified transformer-based feed-forward model that ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions, and then directly regresses the metric 3D scene geometry and cameras. MapAnything leverages a factored representation of multi-view scene geometry, i.e., a collection of depth maps, local ray maps, camera poses, and a metric scale factor that effectively upgrades local reconstructions into a globally consistent metric frame. Standardizing the supervision and training across diverse datasets, along with flexible input augmentation, enables MapAnything to address a broad range of 3D vision tasks in a single feed-forward pass, including uncalibrated structure-from-motion, calibrated multi-view stereo, monocular depth estimation, camera localization, depth completion, and more. We provide extensive experimental analyses and model ablations demonstrating that MapAnything outperforms or matches specialist feed-forward models while offering more efficient joint training behavior, thus paving the way toward a universal 3D reconstruction backbone.
优化|敛散性(4篇)
【1】Exploring the Relationship between Brain Hemisphere States and Frequency Bands through Deep Learning Optimization Techniques
标题:通过深度学习优化技术探索脑半球状态与频段之间的关系
链接:https://arxiv.org/abs/2509.14078
作者:lam, Dmitry I. Ignatov, Karl Kaberg, Roman Nabatchikov
摘要:本研究考察了使用各种优化器时分类器在各EEG频段上的性能,并评估其对左、右半球的有效类别预测。使用TensorFlow和PyTorch框架实现并比较了三种神经网络架构:深度密集网络、浅层三层网络和卷积神经网络(CNN)。结果表明,Adagrad和RMSprop优化器在不同的频带上始终表现良好,Adadelta在跨模型评估中表现出稳健的性能。具体来说,Adagrad在beta波段表现出色,而RMSprop在gamma波段表现出色。相反,SGD和FTRL表现出不一致的性能。在这些模型中,CNN表现出第二高的准确率,特别是在捕捉EEG数据的空间特征方面。深度密集网络在学习复杂模式方面表现出有竞争力的性能,而浅层三层网络虽然有时准确率较低,但具有计算效率优势。SHAP(Shapley Additive Explanations)图用于识别有效的类别预测,揭示了各EEG频带对模型准确率的细微贡献。总体而言,该研究强调了优化器选择、模型架构和EEG频带分析在提升分类器性能以及理解基于神经成像的分类任务中特征重要性方面的重要性。
摘要:This study investigates classifier performance across EEG frequency bands using various optimizers and evaluates efficient class prediction for the left and right hemispheres. Three neural network architectures - a deep dense network, a shallow three-layer network, and a convolutional neural network (CNN) - are implemented and compared using the TensorFlow and PyTorch frameworks. Results indicate that the Adagrad and RMSprop optimizers consistently perform well across different frequency bands, with Adadelta exhibiting robust performance in cross-model evaluations. Specifically, Adagrad excels in the beta band, while RMSprop achieves superior performance in the gamma band. Conversely, SGD and FTRL exhibit inconsistent performance. Among the models, the CNN demonstrates the second highest accuracy, particularly in capturing spatial features of EEG data. The deep dense network shows competitive performance in learning complex patterns, whereas the shallow three-layer network, sometimes being less accurate, provides computational efficiency. SHAP (Shapley Additive Explanations) plots are employed to identify efficient class prediction, revealing nuanced contributions of EEG frequency bands to model accuracy. Overall, the study highlights the importance of optimizer selection, model architecture, and EEG frequency band analysis in enhancing classifier performance and understanding feature importance in neuroimaging-based classification tasks.
【2】RF-LSCM: Pushing Radiance Fields to Multi-Domain Localized Statistical Channel Modeling for Cellular Network Optimization
标题:RF-LSCM:将辐射场推向多域局部化统计信道建模以实现蜂窝网络优化
链接:https://arxiv.org/abs/2509.13686
作者: Peng, Shutao Zhang, Xi Zheng, Ye Xue, Xinyu Qin, Tsung-Hui Chang
摘要:精确的局部无线信道建模是蜂窝网络优化的基石,能够在参数调整期间可靠地预测网络性能。局部化统计信道建模(LSCM)是为蜂窝网络优化量身定制的最先进的信道建模框架。然而,传统的LSCM方法,从参考信号接收功率(RSRP)测量推断信道的角功率谱(APS),受到严重的限制:它们通常局限于单小区,单网格和单载波频率分析,无法捕获复杂的跨域相互作用。为了克服这些挑战,我们提出了RF-LSCM,一种新的框架,该框架通过联合表示辐射场中的大规模信号衰减和多径分量来模拟信道APS。RF-LSCM引入了一种多域LSCM公式,该公式具有物理信息频率相关衰减模型(FDAM)以促进交叉频率泛化,以及点云辅助环境增强方法以实现多小区和多网格信道建模。此外,为了解决典型神经辐射场的计算效率低下的问题,RF-LSCM利用低秩张量表示,并辅以新的分层张量角度建模(HiTAM)算法。这种高效的设计显著降低了GPU内存需求和训练时间,同时保持了细粒度的准确性。在真实世界的多细胞数据集上进行的大量实验表明,RF-LSCM显著优于最先进的方法,通过有效融合多频率数据,覆盖预测的平均绝对误差(MAE)降低了30%,MAE提高了22%。
摘要:Accurate localized wireless channel modeling is a cornerstone of cellular network optimization, enabling reliable prediction of network performance during parameter tuning. Localized statistical channel modeling (LSCM) is the state-of-the-art channel modeling framework tailored for cellular network optimization. However, traditional LSCM methods, which infer the channel's Angular Power Spectrum (APS) from Reference Signal Received Power (RSRP) measurements, suffer from critical limitations: they are typically confined to single-cell, single-grid and single-carrier frequency analysis and fail to capture complex cross-domain interactions. To overcome these challenges, we propose RF-LSCM, a novel framework that models the channel APS by jointly representing large-scale signal attenuation and multipath components within a radiance field. RF-LSCM introduces a multi-domain LSCM formulation with a physics-informed frequency-dependent Attenuation Model (FDAM) to facilitate the cross frequency generalization as well as a point-cloud-aided environment enhanced method to enable multi-cell and multi-grid channel modeling. Furthermore, to address the computational inefficiency of typical neural radiance fields, RF-LSCM leverages a low-rank tensor representation, complemented by a novel Hierarchical Tensor Angular Modeling (HiTAM) algorithm. This efficient design significantly reduces GPU memory requirements and training time while preserving fine-grained accuracy. Extensive experiments on real-world multi-cell datasets demonstrate that RF-LSCM significantly outperforms state-of-the-art methods, achieving up to a 30% reduction in mean absolute error (MAE) for coverage prediction and a 22% MAE improvement by effectively fusing multi-frequency data.
【3】A novel approach of day-ahead cooling load prediction and optimal control for ice-based thermal energy storage (TES) system in commercial buildings
标题:商业建筑冰蓄冷热能存储(TES)系统日前冷负荷预测与优化控制的新方法
链接:https://arxiv.org/abs/2509.13371
作者:ng, Xiao Wang, Jingjing An, Da Yan
备注:16 pages,14 figures,published to Energy & Buildings
摘要:蓄热是建筑物实现负荷转移和需求响应的有效方法。最佳的TES控制和管理对于提高冷却系统的性能至关重要。大多数现有的TES系统在固定的时间表上操作,这不能充分利用其负荷转移能力,并且需要广泛的调查和优化。本研究提出了一种新的综合负荷预测和优化控制方法的商业建筑冰基TES。建立了冷负荷预测模型,并在预测模型中引入了午间修正机制,提高了预测精度。在预测的基础上,提出了基于分时电价的规则控制策略,并根据中午预测修正引入了中午控制调整机制。将该方法应用于北京某商业综合体的冰基TES系统中,取得了389 kW的平均绝对误差(MAE)和12.5%的MAE方差系数。综合预测控制策略实现了9.9%的节能率。将该模型应用于实际案例建筑的楼宇自动化系统中,显著提高了冷却系统的效率和自动化程度。
摘要:Thermal energy storage (TES) is an effective method for load shifting and demand response in buildings. Optimal TES control and management are essential to improve the performance of the cooling system. Most existing TES systems operate on a fixed schedule, which cannot take full advantage of its load shifting capability, and requires extensive investigation and optimization. This study proposed a novel integrated load prediction and optimized control approach for ice-based TES in commercial buildings. A cooling load prediction model was developed and a mid-day modification mechanism was introduced into the prediction model to improve the accuracy. Based on the predictions, a rule-based control strategy was proposed according to the time-of-use tariff; the mid-day control adjustment mechanism was introduced in accordance with the mid-day prediction modifications. The proposed approach was applied in the ice-based TES system of a commercial complex in Beijing, and achieved a mean absolute error (MAE) of 389 kW and coefficient of variance of MAE of 12.5%. The integrated prediction-based control strategy achieved an energy cost saving rate of 9.9%. The proposed model was deployed in the realistic building automation system of the case building and significantly improved the efficiency and automation of the cooling system.
【4】Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
标题:恒定增益下平均回报鲁棒Markov决策过程的Bellman最优性
链接:https://arxiv.org/abs/2509.14203
作者:ang, Nian Si
摘要:鲁棒马尔可夫决策过程(MDP)下的学习和最优控制问题受到越来越多的关注,但现有的理论、算法和应用大多集中在有限时域或折扣模型上。平均报酬公式虽然在许多运筹学和管理背景下是自然的,但仍然没有得到充分的探索。这主要是因为动态编程基础在技术上具有挑战性,并且只有部分理解,还有几个基本问题尚未解决。本文通过分析常增益设置,逐步建立了平均报酬鲁棒MDP的一般框架。研究了控制器和S-矩形敌手之间可能存在信息不对称的平均报酬鲁棒控制问题。我们的分析集中在恒定增益的鲁棒贝尔曼方程,检查解决方案的存在性和它们的关系,最优平均奖励。具体来说,我们确定的鲁棒Bellman方程的解决方案时,最优的平均奖励和固定的政策,我们提供了充分的条件,确保解决方案的存在。这些研究结果扩展了平均报酬鲁棒MDP的动态规划理论,为作战环境下长期平均准则下的鲁棒动态决策奠定了基础。
摘要:Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. The average-reward formulation, while natural in many operations research and management contexts, remains underexplored. This is primarily because the dynamic programming foundations are technically challenging and only partially understood, with several fundamental questions remaining open. This paper steps toward a general framework for average-reward robust MDPs by analyzing the constant-gain setting. We study the average-reward robust control problem with possible information asymmetries between the controller and an S-rectangular adversary. Our analysis centers on the constant-gain robust Bellman equation, examining both the existence of solutions and their relationship to the optimal average reward. Specifically, we identify when solutions to the robust Bellman equation characterize the optimal average reward and stationary policies, and we provide sufficient conditions ensuring solutions' existence. These findings expand the dynamic programming theory for average-reward robust MDPs and lay a foundation for robust dynamic decision making under long-run average criteria in operational environments.
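For orientation, the constant-gain robust Bellman equation takes the following shape in the simpler (s,a)-rectangular case, with gain g, bias h, and uncertainty sets P_{s,a}. This rendering and notation are ours; the paper's S-rectangular setting additionally couples the adversary's choice across actions.

```latex
% Constant-gain robust Bellman equation, (s,a)-rectangular form (notation ours)
g + h(s) = \max_{a \in \mathcal{A}} \min_{P \in \mathcal{P}_{s,a}}
  \Big[ r(s,a) + \sum_{s'} P(s' \mid s, a)\, h(s') \Big],
\qquad \forall s \in \mathcal{S}.
```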
预测|估计(7篇)
【1】Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
标题:弥合过去与未来:时间序列预测的分布感知调整
链接:https://arxiv.org/abs/2509.14181
作者: Jie Yang, Tian Zhou, Peiyuan Liu, Yujin Tang, Rong Jin, Liang Sun
摘要:像对比学习这样的表征学习技术长期以来一直在时间序列预测中被探索,反映了它们在计算机视觉和自然语言处理中的成功。然而,最近的最先进(SOTA)预测模型很少采用这些表征方法,因为它们几乎没有表现出性能优势。我们挑战这种观点,并表明显式的表征对齐可以提供关键信息,弥合输入历史和未来目标之间的分布差距。为此,我们引入了TimeAlign,一个轻量级、即插即用的框架,它通过一个简单的重建任务学习辅助特征,并将其反馈给任意基础预测器。在八个基准上的大量实验验证了其优越的性能。进一步的研究表明,收益主要来自纠正历史输入与未来输出之间的频率失配。我们还为TimeAlign在增加学习到的表征与预测目标之间互信息方面的有效性提供了理论依据。由于它与架构无关且开销可以忽略不计,TimeAlign可以作为现代深度学习时间序列预测系统的通用对齐模块。该代码可在https://github.com/TROUBADOUR000/TimeAlign上获得。
摘要:Representation learning techniques like contrastive learning have long been explored in time series forecasting, mirroring their success in computer vision and natural language processing. Yet recent state-of-the-art (SOTA) forecasters seldom adopt these representation approaches because they have shown little performance advantage. We challenge this view and demonstrate that explicit representation alignment can supply critical information that bridges the distributional gap between input histories and future targets. To this end, we introduce TimeAlign, a lightweight, plug-and-play framework that learns auxiliary features via a simple reconstruction task and feeds them back to any base forecaster. Extensive experiments across eight benchmarks verify its superior performance. Further studies indicate that the gains arise primarily from correcting frequency mismatches between historical inputs and future outputs. We also provide a theoretical justification for the effectiveness of TimeAlign in increasing the mutual information between learned representations and predicted targets. As it is architecture-agnostic and incurs negligible overhead, TimeAlign can serve as a general alignment module for modern deep learning time-series forecasting systems. The code is available at https://github.com/TROUBADOUR000/TimeAlign.
【2】From Distributional to Quantile Neural Basis Models: the case of Electricity Price Forecasting
标题:从分布型神经基模型到分位数神经基模型:电价预测案例
链接:https://arxiv.org/abs/2509.14113
作者:o Brusaferri, Danial Ramin, Andrea Ballarino
备注:6 pages
摘要:虽然神经网络在多水平概率预测中实现了高预测精度,但理解导致特征条件输出的潜在机制仍然是预测人员面临的重大挑战。在这项工作中,我们通过引入分位数神经基础模型(Quantile Neural Basis Model)进一步解决了这一关键问题,该模型将分位数广义加性模型的可解释性原则纳入了端到端神经网络训练框架。为此,我们利用共享基分解和权重因子分解,通过避免任何参数分布假设来补充位置,规模和形状的神经模型。我们验证了我们的方法对日前电价预测,实现了与分布和分位数回归神经网络相当的预测性能,同时通过学习从输入特征到整个范围内的输出预测的非线性映射,提供了对模型行为的有价值的见解。
摘要:While neural networks are achieving high predictive accuracy in multi-horizon probabilistic forecasting, understanding the underlying mechanisms that lead to feature-conditioned outputs remains a significant challenge for forecasters. In this work, we take a further step toward addressing this critical issue by introducing the Quantile Neural Basis Model, which incorporates the interpretability principles of Quantile Generalized Additive Models into an end-to-end neural network training framework. To this end, we leverage shared basis decomposition and weight factorization, complementing Neural Models for Location, Scale, and Shape by avoiding any parametric distributional assumptions. We validate our approach on day-ahead electricity price forecasting, achieving predictive performance comparable to distributional and quantile regression neural networks, while offering valuable insights into model behavior through the learned nonlinear mappings from input features to output predictions across the horizon.
【3】Physics-based deep kernel learning for parameter estimation in high dimensional PDEs
标题:基于物理的深度核学习用于高维PDE中的参数估计
链接:https://arxiv.org/abs/2509.14054
作者:n, Christoph Brune, Mengwu Guo
摘要:高维偏微分方程(PDE)参数的推断提出了显著的计算和推断挑战,主要是由于维数灾难和传统数值方法的固有局限性。本文介绍了一种新型的两阶段贝叶斯框架,该框架将训练、基于物理的深度内核学习(DKL)与汉密尔顿蒙特卡罗(HMC)协同集成,以鲁棒地推断未知的偏微分方程参数,并从稀疏、精确的观察值中量化其不确定性。第一阶段利用基于物理的DKL来训练代理模型,其共同产生优化的神经网络特征提取器和PDE参数的鲁棒初始估计。在第二阶段,神经网络的权重固定,HMC是采用一个完整的贝叶斯框架内,有效地采样的核超参数和PDE参数的联合后验分布。规范和高维逆偏微分方程问题的数值实验表明,我们的框架准确地估计参数,提供可靠的不确定性估计,并有效地解决了数据稀疏性和模型复杂性的挑战,为各种科学和工程应用提供了一个强大的和可扩展的工具。
摘要:Inferring parameters of high-dimensional partial differential equations (PDEs) poses significant computational and inferential challenges, primarily due to the curse of dimensionality and the inherent limitations of traditional numerical methods. This paper introduces a novel two-stage Bayesian framework that synergistically integrates training, physics-based deep kernel learning (DKL) with Hamiltonian Monte Carlo (HMC) to robustly infer unknown PDE parameters and quantify their uncertainties from sparse, exact observations. The first stage leverages physics-based DKL to train a surrogate model, which jointly yields an optimized neural network feature extractor and robust initial estimates for the PDE parameters. In the second stage, with the neural network weights fixed, HMC is employed within a full Bayesian framework to efficiently sample the joint posterior distribution of the kernel hyperparameters and the PDE parameters. Numerical experiments on canonical and high-dimensional inverse PDE problems demonstrate that our framework accurately estimates parameters, provides reliable uncertainty estimates, and effectively addresses challenges of data sparsity and model complexity, offering a robust and scalable tool for diverse scientific and engineering applications.
【4】Long-context Reference-based MT Quality Estimation
标题:长上下文基于参考的MT质量估计
链接:https://arxiv.org/abs/2509.13980
作者:aq, Chinonso Cynthia Osuji, Sheila Castilho, Brian Davis
摘要:在本文中,我们提交给第十届机器翻译会议(WMT25)自动翻译质量评估共享任务。 我们的系统是建立在COMET框架和训练预测段级错误跨度注释(ESA)分数使用增强的长上下文数据。 为了构建长上下文训练数据,我们将领域内的人类注释句子连接起来,并计算其得分的加权平均值。 我们通过标准化它们的尺度来整合多个人类判断数据集(MQM,SQM和DA),并通过训练多语言回归模型来预测源代码,假设和参考翻译的质量分数。 实验结果表明,与仅在短片段上训练的模型相比,结合长上下文信息可以提高与人类判断的相关性。
摘要:In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems are built upon the COMET framework and trained to predict segment-level Error Span Annotation (ESA) scores using augmented long-context data. To construct long-context training data, we concatenate in-domain, human-annotated sentences and compute a weighted average of their scores. We integrate multiple human judgment datasets (MQM, SQM, and DA) by normalising their scales and train multilingual regression models to predict quality scores from the source, hypothesis, and reference translations. Experimental results show that incorporating long-context information improves correlations with human judgments compared to models trained only on short segments.
【5】Ensemble of Pre-Trained Models for Long-Tailed Trajectory Prediction
标题:用于长尾轨迹预测的预训练模型集成
链接:https://arxiv.org/abs/2509.13914
作者:remella, Yi Yang, Simon Wanna, Lars Kunze, Daniele De Martini
备注:Accepted 2025 IEEE International Conference on Intelligent Transportation Systems (ITSC 2025)
摘要:这项工作探索了集成建模在城市环境车辆轨迹预测这一多维回归问题中的应用。随着更新、更大的自动驾驶预测模型不断涌现,一个重要的开放性挑战是如何在无需昂贵重训练的情况下结合这些大模型的优势。我们展示了一个也许令人惊讶的结果:将现成的最先进深度学习模型(无需重新训练或微调)与简单的置信度加权平均方法相结合即可提升整体预测。事实上,虽然组合轨迹预测模型并不简单,但这种简单方法比最佳单一预测模型的性能提升了10%,在长尾指标上尤为明显。我们表明,这一性能提升在NuScenes和Argoverse数据集上均成立,并且改进遍及整个数据集分布。我们工作的代码是开源的。
摘要:This work explores the application of ensemble modeling to the multidimensional regression problem of trajectory prediction for vehicles in urban environments. As newer and bigger state-of-the-art prediction models for autonomous driving continue to emerge, an important open challenge is the problem of how to combine the strengths of these big models without the need for costly re-training. We show how, perhaps surprisingly, combining state-of-the-art deep learning models out-of-the-box (without retraining or fine-tuning) with a simple confidence-weighted average method can enhance the overall prediction. Indeed, while combining trajectory prediction models is not straightforward, this simple approach enhances performance by 10% over the best prediction model, especially in the long-tailed metrics. We show that this performance improvement holds on both the NuScenes and Argoverse datasets, and that these improvements are made across the dataset distribution. The code for our work is open source.
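The fusion rule itself is nearly a one-liner. A minimal sketch, with illustrative shapes and stand-in predictions (per-model confidences would come from each model's own mode probabilities):

```python
# Sketch of confidence-weighted trajectory averaging (shapes illustrative).
import numpy as np

def fuse_trajectories(preds, confs):
    """preds: (n_models, horizon, 2) xy predictions from frozen models;
    confs: (n_models,) per-model confidence scores."""
    w = np.asarray(confs, dtype=np.float64)
    w = w / w.sum()                         # normalize confidences to weights
    return np.tensordot(w, preds, axes=1)   # (horizon, 2) fused trajectory

model_a = np.cumsum(np.ones((12, 2)) * 0.5, axis=0)  # stand-in predictions
model_b = np.cumsum(np.ones((12, 2)) * 0.6, axis=0)
fused = fuse_trajectories(np.stack([model_a, model_b]), confs=[0.7, 0.3])
```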
【6】Label-Efficient Grasp Joint Prediction with Point-JEPA
标题:基于Point-JEPA的标签高效抓取关节预测
链接:https://arxiv.org/abs/2509.13349
作者:kabaagac, Boris Petrović
备注:4 pages, 5 figures. Submitted to IROS 2025 Workshop
摘要:我们研究了基于联合嵌入预测架构(Point-JEPA)的3D自监督预训练是否能够实现标签高效的抓取关节角度预测。使用由网格令牌化得到的点云和一个在ShapeNet上预训练的Point-JEPA编码器,我们以赢者通吃(winner-takes-all)方式训练一个轻量级多假设头,并通过top-logit选择进行评估。在采用对象级数据划分的DLR-Hand II上,Point-JEPA在低标签条件下将RMSE最多降低26%,并达到与完全监督相当的水平。这些结果表明,JEPA风格的预训练是一种实用的数据高效抓取学习方法。
摘要:We investigate whether 3D self-supervised pretraining with a Joint-Embedding Predictive Architecture (Point-JEPA) enables label-efficient grasp joint-angle prediction. Using point clouds tokenized from meshes and a ShapeNet-pretrained Point-JEPA encoder, we train a lightweight multi-hypothesis head with winner-takes-all and evaluate by top-logit selection. On DLR-Hand II with object-level splits, Point-JEPA reduces RMSE by up to 26% in low-label regimes and reaches parity with full supervision. These results suggest JEPA-style pretraining is a practical approach for data-efficient grasp learning.
【7】Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
标题:具有有偏差梯度估计的加速梯度方法:风险敏感性、高概率保证和大偏差界
链接:https://arxiv.org/abs/2509.13628
作者:üzbalaban, Yasa Syed, Necdet Serhat Aybat
摘要:我们研究一阶方法中收敛速度与对梯度误差鲁棒性之间的权衡。我们的重点是广义动量方法(GMM),这一类方法包括Nesterov加速梯度、重球法和梯度下降。我们允许可能具有对抗性和偏差的随机梯度误差,并通过鲁棒控制理论中的风险敏感指数(RSI)量化鲁棒性。对于具有i.i.d.高斯噪声的二次目标,我们利用2x2 Riccati方程给出RSI的封闭形式表达式,揭示了在步长和动量选择上RSI与收敛速度之间的帕累托前沿。我们证明了时间平均次优性的大偏差原理,并表明其速率函数在缩放意义下是RSI的凸共轭。我们进一步将RSI与$H_{\infty}$范数联系起来,表明更强的最坏情况鲁棒性(更小的$H_{\infty}$范数)会带来更快的尾部概率衰减。在二次目标之外,对于有偏次高斯梯度误差,我们推导了RSI的有限时间类比的非渐近界,给出有限时间高概率保证和大偏差界。对于光滑强凸函数,我们还观察到RSI与收敛速度界之间的类似权衡。据我们所知,这是对具有有偏梯度的GMM的首个非渐近保证与风险敏感性分析。稳健回归上的数值实验对结果进行了说明。
摘要:We study trade-offs between convergence rate and robustness to gradient errors in first-order methods. Our focus is on generalized momentum methods (GMMs), a class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed-form expressions for RSI using 2x2 Riccati equations, revealing a Pareto frontier between RSI and convergence rate over stepsize and momentum choices. We prove a large-deviation principle for time-averaged suboptimality and show that the rate function is, up to scaling, the convex conjugate of the RSI. We further connect RSI to the $H_{\infty}$-norm, showing that stronger worst-case robustness (smaller $H_{\infty}$ norm) yields sharper decay of tail probabilities. Beyond quadratics, under biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, giving finite-time high-probability guarantees and large-deviation bounds. We also observe an analogous trade-off between RSI and convergence-rate bounds for smooth strongly convex functions. To our knowledge, these are the first non-asymptotic guarantees and risk-sensitive analysis of GMMs with biased gradients. Numerical experiments on robust regression illustrate the results.
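For orientation, generalized momentum methods are usually written in the following two-parameter form (our rendering, consistent with the special cases named in the abstract): beta = nu = 0 gives gradient descent, nu = 0 gives the heavy-ball method, and beta = nu recovers Nesterov's accelerated gradient.

```latex
% Generalized momentum method (GMM) update with stepsize alpha (notation ours)
x_{k+1} = x_k + \beta\,(x_k - x_{k-1})
          - \alpha \nabla f\big(x_k + \nu\,(x_k - x_{k-1})\big)
```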
其他神经网络|深度学习|模型|建模(18篇)
【1】Multi-robot Multi-source Localization in Complex Flows with Physics-Preserving Environment Models
标题:基于保物理环境模型的复杂流中多机器人多源定位
链接:https://arxiv.org/abs/2509.14228
作者:Shaffer, Victoria Edwards, Brooks Kinch, Nathaniel Trask, M. Ani Hsieh
摘要:在复杂流场中进行源定位,对负责定位化学品泄漏源或跟踪漏油扩散的多机器人团队而言是一项重大挑战。流体动力学可能是时变且混沌的,导致传感器读数零星而间歇,而复杂的环境几何进一步增加了团队建模和预测扩散的难度。为了准确刻画驱动扩散动力学的物理过程,机器人必须能够使用计算密集的数值模型,而在机载计算受限时这很困难。我们提出了一个用于源定位的分布式移动传感框架,其中每个机器人携带其环境的机器学习有限元模型,用于指导基于信息的采样。该模型用于评估近似互信息准则,以驱动一种信息趋向(infotaxis)控制策略,该策略选择有望使源定位目标信息量最大化的传感区域。与基线传感策略相比,我们的方法误差下降更快;与基线机器学习方法相比,源定位更准确。
摘要:Source localization in a complex flow poses a significant challenge for multi-robot teams tasked with localizing the source of chemical leaks or tracking the dispersion of an oil spill. The flow dynamics can be time-varying and chaotic, resulting in sporadic and intermittent sensor readings, and complex environmental geometries further complicate a team's ability to model and predict the dispersion. To accurately account for the physical processes that drive the dispersion dynamics, robots must have access to computationally intensive numerical models, which can be difficult when onboard computation is limited. We present a distributed mobile sensing framework for source localization in which each robot carries a machine-learned, finite element model of its environment to guide information-based sampling. The models are used to evaluate an approximate mutual information criterion to drive an infotaxis control strategy, which selects sensing regions that are expected to maximize informativeness for the source localization objective. Our approach achieves faster error reduction compared to baseline sensing strategies and results in more accurate source localization compared to baseline machine learning approaches.
【2】A Variational Framework for Residual-Based Adaptivity in Neural PDE Solvers and Operator Learning
标题:神经PDE求解器与算子学习中基于残差自适应性的变分框架
链接:https://arxiv.org/abs/2509.14198
作者:o Toscano, Daniel T. Chen, Vivek Oommen, George Em Karniadakis
摘要:基于残差的自适应策略广泛用于科学机器学习,但在很大程度上仍然是启发式的。我们引入了一个统一的变分框架,形式化这些方法,通过整合凸变换的残留。不同的变换对应于不同的目标泛函:指数权重的目标是均匀误差的最小化,而线性权重恢复二次误差的最小化。从这个角度来看,自适应加权相当于选择优化原始目标的采样分布,从而将离散化选择直接与误差度量相关联。这种原则性的方法产生三个好处:(1)它能够跨范数系统地设计自适应方案,(2)通过损失估计器的方差减小来减少离散化误差,以及(3)通过提高梯度信噪比来增强学习动态。将框架扩展到算子学习,我们证明了优化器和架构的显著性能提升。我们的研究结果为基于残差的自适应性提供了理论依据,并为原则性离散化和训练策略奠定了基础。
摘要:Residual-based adaptive strategies are widely used in scientific machine learning but remain largely heuristic. We introduce a unifying variational framework that formalizes these methods by integrating convex transformations of the residual. Different transformations correspond to distinct objective functionals: exponential weights target the minimization of uniform error, while linear weights recover the minimization of quadratic error. Within this perspective, adaptive weighting is equivalent to selecting sampling distributions that optimize the primal objective, thereby linking discretization choices directly to error metrics. This principled approach yields three benefits: (1) it enables systematic design of adaptive schemes across norms, (2) reduces discretization error through variance reduction of the loss estimator, and (3) enhances learning dynamics by improving the gradient signal-to-noise ratio. Extending the framework to operator learning, we demonstrate substantial performance gains across optimizers and architectures. Our results provide a theoretical justification of residual-based adaptivity and establish a foundation for principled discretization and training strategies.
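A minimal sketch of the residual-to-weight mapping described above, under our reading of the framework: a convex transform of the residual defines a sampling distribution over collocation points. The function names and the temperature knob are illustrative, not the authors' API.
```python
import numpy as np

def adaptive_weights(residuals, transform="exp", temperature=1.0):
    """Turn PDE residuals into a sampling distribution over collocation points.

    'exp'    -> exponential weights, targeting uniform (max-norm) error;
    'linear' -> weights proportional to |r|, recovering the quadratic objective.
    """
    r = np.abs(residuals)
    if transform == "exp":
        w = np.exp(r / temperature)
    elif transform == "linear":
        w = r
    else:
        raise ValueError(transform)
    return w / w.sum()

r = np.array([0.05, 0.1, 0.5, 2.0])
print(adaptive_weights(r, "exp"), adaptive_weights(r, "linear"))
```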
【3】A Compositional Kernel Model for Feature Learning
标题:一种用于特征学习的组合核模型
链接:https://arxiv.org/abs/2509.14158
作者:, Keli Liu, Michael Jordan
摘要:我们研究核岭回归的一种组合变体,其中预测器作用于对输入进行逐坐标重加权后的结果。该模型被表述为一个变分问题,为组合架构中的特征学习提供了一个简单的试验平台。从变量选择的角度,我们展示了如何在剔除噪声变量的同时恢复相关变量。我们建立的保证表明,当噪声变量服从高斯分布时,全局最小值点和驻点都会丢弃噪声坐标。一个核心发现是:$\ell_1$型核(如拉普拉斯核)能在驻点处成功恢复对非线性效应有贡献的特征,而高斯核只能恢复线性特征。
摘要:We study a compositional variant of kernel ridge regression in which the predictor is applied to a coordinate-wise reweighting of the inputs. Formulated as a variational problem, this model provides a simple testbed for feature learning in compositional architectures. From the perspective of variable selection, we show how relevant variables are recovered while noise variables are eliminated. We establish guarantees showing that both global minimizers and stationary points discard noise coordinates when the noise variables are Gaussian distributed. A central finding is that $\ell_1$-type kernels, such as the Laplace kernel, succeed in recovering features contributing to nonlinear effects at stationary points, whereas Gaussian kernels recover only linear ones.
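A small self-contained sketch of the model class as we read it: kernel ridge regression with a Laplace ($\ell_1$-type) kernel applied to coordinate-wise reweighted inputs, where driving a weight $w_j$ to zero eliminates coordinate $j$. All names are ours.
```python
import numpy as np

def laplace_kernel(X1, X2, w, gamma=1.0):
    """k(x, x') = exp(-gamma * ||w * (x - x')||_1) on reweighted inputs."""
    diff = np.abs(X1[:, None, :] - X2[None, :, :]) * np.abs(w)
    return np.exp(-gamma * diff.sum(axis=-1))

def krr_fit_predict(Xtr, ytr, Xte, w, lam=1e-2):
    K = laplace_kernel(Xtr, Xtr, w)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)  # ridge solve
    return laplace_kernel(Xte, Xtr, w) @ alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = np.sin(X[:, 0]) + X[:, 1]             # only the first two coordinates matter
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # noise coordinates reweighted to zero
print(krr_fit_predict(X, y, X[:5], w))
```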
【4】Breaking the Cycle of Incarceration With Targeted Mental Health Outreach: A Case Study in Machine Learning for Public Policy
标题:通过有针对性的心理健康外展打破监禁循环:公共政策机器学习案例研究
链接:https://arxiv.org/abs/2509.14129
作者:dolfa, Erika Salomon, Jin Yao, Steve Yoder, Robert Sullivan, Kevin McGuire, Allie Dickinson, Rob MacDougall, Brian Seidler, Christina Sung, Claire Herdeman, Rayid Ghani
摘要:许多被监禁的人面临着重大而复杂的挑战,包括精神疾病、药物依赖和无家可归,但监狱和监狱往往设备简陋,无法满足这些需求。由于现有刑事司法系统的支持很少,这些需求可能仍然得不到处理和恶化,往往导致进一步的犯罪和监禁周期,对个人和公共安全都有不利影响,对有色人种社区的影响特别大,继续扩大刑事司法结果中已经广泛的种族差异。为了应对这些失败,越来越多的刑事司法利益攸关方正在寻求通过创新方法打破这一循环,例如社区驱动和替代方法,以维持治安,指导,社区建设,恢复性司法,审前分流,整体辩护和社会服务联系。在这里,我们报告了堪萨斯州约翰逊县和卡内基梅隆大学之间的合作,以进行有针对性的,积极主动的心理健康外展,努力降低重新监禁率。 本文描述了所使用的数据,我们的预测建模方法和结果,以及现场试验的设计和分析,以确认我们的模型的预测能力,评估这种有针对性的推广的影响,并了解在什么水平的再监禁风险外展可能是最有效的。通过这项试验,我们发现我们的模型对新的监狱预订具有高度预测性,试验中风险最高的群体中有一半以上的人在第二年返回监狱。外展是最有效的,在这些最高风险的个人,心理健康的利用,EMS调度和刑事司法参与的影响。
摘要:Many incarcerated individuals face significant and complex challenges, including mental illness, substance dependence, and homelessness, yet jails and prisons are often poorly equipped to address these needs. With little support from the existing criminal justice system, these needs can remain untreated and worsen, often leading to further offenses and a cycle of incarceration with adverse outcomes both for the individual and for public safety, with particularly large impacts on communities of color that continue to widen the already extensive racial disparities in criminal justice outcomes. Responding to these failures, a growing number of criminal justice stakeholders are seeking to break this cycle through innovative approaches such as community-driven and alternative approaches to policing, mentoring, community building, restorative justice, pretrial diversion, holistic defense, and social service connections. Here we report on a collaboration between Johnson County, Kansas, and Carnegie Mellon University to perform targeted, proactive mental health outreach in an effort to reduce reincarceration rates. This paper describes the data used, our predictive modeling approach and results, as well as the design and analysis of a field trial conducted to confirm our model's predictive power, evaluate the impact of this targeted outreach, and understand at what level of reincarceration risk outreach might be most effective. Through this trial, we find that our model is highly predictive of new jail bookings, with more than half of individuals in the trial's highest-risk group returning to jail in the following year. Outreach was most effective among these highest-risk individuals, with impacts on mental health utilization, EMS dispatches, and criminal justice involvement.
【5】You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models
标题:训练什么就是什么:数据构成对训练上下文感知机器翻译模型的影响
链接:https://arxiv.org/abs/2509.14031
作者:a, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis
备注:EMNLP 2025 main conference
摘要:实现人类水平的翻译需要利用上下文来确保连贯性,并处理代词消歧等复杂现象。标准训练数据中上下文丰富的示例的稀疏性被假设为上下文利用困难的原因。在这项工作中,我们通过构建具有受控比例的上下文相关示例的训练数据集,在单语言和多语言环境中系统地验证了这一说法。我们证明了训练数据稀疏性与模型性能之间的强关联,确认稀疏性是一个关键瓶颈。重要的是,我们发现在一种上下文现象上的改进并不能推广到其他现象。虽然我们观察到一些跨语言迁移,但同一子语族内语言之间的迁移并没有显著更高。最后,我们提出并实证评估了两种旨在利用现有数据的训练策略。这些策略提高了上下文利用率,在ctxPro评估中,单语言和多语言环境下的准确率分别最多提升6个和8个百分点。
摘要:Achieving human-level translations requires leveraging context to ensure coherence and handle complex phenomena like pronoun disambiguation. Sparsity of contextually rich examples in the standard training data has been hypothesized as the reason for the difficulty of context utilization. In this work, we systematically validate this claim in both single- and multilingual settings by constructing training datasets with controlled proportions of contextually relevant examples. We demonstrate a strong association between training data sparsity and model performance, confirming sparsity as a key bottleneck. Importantly, we reveal that improvements in one contextual phenomenon do not generalize to others. While we observe some cross-lingual transfer, it is not significantly higher between languages within the same sub-family. Finally, we propose and empirically evaluate two training strategies designed to leverage the available data. These strategies improve context utilization, resulting in accuracy gains of up to 6 and 8 percentage points on the ctxPro evaluation in single- and multilingual settings respectively.
【6】Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
标题:Hala技术报告:大规模构建以阿拉伯语为中心的指令与翻译模型
链接:https://arxiv.org/abs/2509.14008
作者:d Al Kader Hammoud, Mohammad Zbeeb, Bernard Ghanem
备注:Technical Report
摘要:我们介绍Hala,一个以阿拉伯语为中心的指令与翻译模型家族,采用我们的"翻译-微调"流水线构建。我们首先将一个强大的AR$\leftrightarrow$EN教师模型压缩到FP8(在不损失质量的情况下将吞吐量提高约2倍),并用它来生成高保真的双语监督数据。然后在这些数据上微调轻量级语言模型LFM2-1.2B,并用其将高质量英语指令集翻译为阿拉伯语,从而生成面向指令遵循的百万级语料库。我们在350M、700M、1.2B和9B参数规模上训练Hala模型,并应用slerp合并来平衡阿拉伯语专长与基础模型的能力。在以阿拉伯语为中心的基准上,Hala在"nano"($\leq$2B)和"small"(7-9B)两个类别中都取得了最先进的结果,超过了各自的基础模型。我们发布模型、数据、评估和训练配方,以加速阿拉伯语NLP研究。
摘要:We present Hala, a family of Arabic-centric instruction and translation models built with our translate-and-tune pipeline. We first compress a strong AR$\leftrightarrow$EN teacher to FP8 (yielding $\sim$2$\times$ higher throughput with no quality loss) and use it to create high-fidelity bilingual supervision. A lightweight language model LFM2-1.2B is then fine-tuned on this data and used to translate high-quality English instruction sets into Arabic, producing a million-scale corpus tailored to instruction following. We train Hala models at 350M, 700M, 1.2B, and 9B parameters, and apply slerp merging to balance Arabic specialization with base-model strengths. On Arabic-centric benchmarks, Hala achieves state-of-the-art results within both the "nano" ($\leq$2B) and "small" (7-9B) categories, outperforming their bases. We release models, data, evaluation, and recipes to accelerate research in Arabic NLP.
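A minimal sketch of slerp (spherical linear interpolation) weight merging as used above to balance the Arabic-specialized model with its base; treating each tensor as a flat vector and interpolating at a fixed t is a common recipe, though the authors' exact procedure may differ.
```python
import numpy as np

def slerp(v0, v1, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    a = v0 / (np.linalg.norm(v0) + eps)
    b = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if omega < eps:                      # nearly parallel: plain lerp is safe
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

# Layer-wise merge of a base and a fine-tuned checkpoint (dicts of arrays):
# merged[k] = slerp(base[k].ravel(), tuned[k].ravel(), t=0.5).reshape(base[k].shape)
```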
【7】Exploring Major Transitions in the Evolution of Biological Cognition With Artificial Neural Networks
标题:利用人工神经网络探索生物认知进化的重大转变
链接:https://arxiv.org/abs/2509.13968
作者:nos Voudouris, Andrew Barron, Marta Halina, Colin Klein, Matishalin Patel
摘要:过渡时期的进化论强调了一些改变,这些改变塑造了什么是可进化的,对衍生的谱系产生了戏剧性的影响。最近有人提出,认知也可能是通过一系列操纵生物神经网络结构的重大转变而进化的,从根本上改变了信息的流动。我们使用理想化的信息流模型,人工神经网络(ANN),来评估网络中信息流的变化是否会产生认知表现的过渡性变化。我们比较了前馈,循环和层压拓扑结构的网络,并测试了它们的性能学习人工语法的复杂性不同,控制网络的大小和资源。我们记录了与前馈网络相比,递归网络可以处理的输入类型的质的扩展,以及学习最复杂语法的性能的相关质的提高。我们还注意到训练循环网络的困难如何构成一种过渡障碍和偶然不可逆转性--进化过渡的其他关键特征。并非网络拓扑中的所有更改都能在此任务集中带来性能优势。分层网络在语法学习中并没有优于非分层网络。总的来说,我们的研究结果显示了信息流的一些变化如何产生认知表现的转变。
摘要:Transitional accounts of evolution emphasise a few changes that shape what is evolvable, with dramatic consequences for derived lineages. More recently it has been proposed that cognition might also have evolved via a series of major transitions that manipulate the structure of biological neural networks, fundamentally changing the flow of information. We used idealised models of information flow, artificial neural networks (ANNs), to evaluate whether changes in information flow in a network can yield a transitional change in cognitive performance. We compared networks with feed-forward, recurrent and laminated topologies, and tested their performance learning artificial grammars that differed in complexity, controlling for network size and resources. We documented a qualitative expansion in the types of input that recurrent networks can process compared to feed-forward networks, and a related qualitative increase in performance for learning the most complex grammars. We also noted how the difficulty in training recurrent networks poses a form of transition barrier and contingent irreversibility -- other key features of evolutionary transitions. Not all changes in network topology confer a performance advantage in this task set. Laminated networks did not outperform non-laminated networks in grammar learning. Overall, our findings show how some changes in information flow can yield transitions in cognitive performance.
【8】eXtended Physics Informed Neural Network Method for Fracture Mechanics Problems
标题:断裂力学问题的扩展物理信息神经网络方法
链接:https://arxiv.org/abs/2509.13952
作者:alian, Mohammad Reza Banan, Pooyan Broumand
摘要:本文提出扩展物理信息神经网络(X-PINN),这是一种新颖而稳健的框架,用于求解涉及裂隙介质中多条裂纹的断裂力学问题。为此,提出了基于能量的损失函数、定制的积分方案以及域分解流程。受扩展有限元法(XFEM)启发,神经网络的解空间通过专门的函数得到富集,从而能够显式捕捉裂纹体上的不连续性和裂尖奇异性。此外,引入了一个结构化框架,其中标准解分量与富集解分量由不同的神经网络建模,使得在一维和二维域中能够灵活高效地模拟复杂多裂纹问题,并可方便地扩展到三维问题。数值实验验证了所提方法的有效性和鲁棒性。
摘要:This paper presents eXtended Physics-Informed Neural Network (X-PINN), a novel and robust framework for addressing fracture mechanics problems involving multiple cracks in fractured media. To address this, an energy-based loss function, customized integration schemes, and domain decomposition procedures are proposed. Inspired by the Extended Finite Element Method (XFEM), the neural network solution space is enriched with specialized functions that allow crack body discontinuities and singularities at crack tips to be explicitly captured. Furthermore, a structured framework is introduced in which standard and enriched solution components are modeled using distinct neural networks, enabling flexible and effective simulations of complex multiple-crack problems in 1D and 2D domains, with convenient extensibility to 3D problems. Numerical experiments are conducted to validate the effectiveness and robustness of the proposed method.
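In XFEM fashion, the enriched ansatz can be written (our sketch of the paper's construction; the exact branch functions may differ) as
$$u(\mathbf{x}) = \mathcal{N}_{\mathrm{std}}(\mathbf{x};\theta_1) + H(\mathbf{x})\,\mathcal{N}_{\mathrm{enr}}(\mathbf{x};\theta_2) + \sqrt{r}\,\sin(\theta/2)\,\mathcal{N}_{\mathrm{tip}}(\mathbf{x};\theta_3),$$
where $H$ is a Heaviside jump across the crack body, $(r,\theta)$ are polar coordinates at a crack tip, and each component is a separate network, matching the paper's use of distinct networks for the standard and enriched parts.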
【9】Masked Diffusion Models as Energy Minimization
标题:作为能量最小化的掩码扩散模型
链接:https://arxiv.org/abs/2509.13866
作者:en, Shen Nie, Jiacheng Sun, Zijin Feng, Zhenguo Li, Ji-Rong Wen, Chongxuan Li
摘要:我们提出一个系统的理论框架,将掩码扩散模型(MDM)解释为离散最优传输中能量最小化问题的解。具体而言,我们证明了三种不同的能量表述——动能、条件动能和测地线能量——在MDM的结构下在数学上等价,并且当掩码调度满足一个闭式最优性条件时,MDM同时最小化这三种能量。这一统一不仅澄清了MDM的理论基础,也启发了采样上的实际改进。通过用Beta分布参数化插值调度,我们将调度设计空间缩减为易于处理的二维搜索,从而无需修改模型即可进行高效的训练后调优。在合成与真实基准上的实验表明,我们受能量启发的调度优于手工设计的基线,尤其是在低步数采样设置下。
摘要:We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy formulations--kinetic, conditional kinetic, and geodesic energy--are mathematically equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. This unification not only clarifies the theoretical foundations of MDMs, but also motivates practical improvements in sampling. By parameterizing interpolation schedules via Beta distributions, we reduce the schedule design space to a tractable 2D search, enabling efficient post-training tuning without model modification. Experiments on synthetic and real-world benchmarks demonstrate that our energy-inspired schedules outperform hand-crafted baselines, particularly in low-step sampling settings.
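A sketch of the Beta-parameterized schedule search the abstract describes; the orientation of the schedule (fraction masked vs. unmasked) and the grid values are our assumptions.
```python
import numpy as np
from itertools import product
from scipy.stats import beta

def mask_schedule(t, a, b):
    """Fraction of tokens still masked at time t in [0, 1], via a Beta(a, b) CDF."""
    return 1.0 - beta.cdf(t, a, b)

def tune_schedule(score_fn, grid=(0.5, 1.0, 2.0, 4.0)):
    """Tractable 2D search over (a, b); score_fn evaluates a schedule, higher is better."""
    return max(product(grid, grid),
               key=lambda ab: score_fn(lambda t: mask_schedule(t, *ab)))

# Toy score: prefer schedules that stay mostly masked early on (illustrative only).
print(tune_schedule(lambda s: s(0.25)))
```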
【10】Towards a Physics Foundation Model
标题:迈向物理基础模型
链接:https://arxiv.org/abs/2509.13805
作者:iesner, Matthias Wessling, Stephen Baek
摘要:基础模型通过"训练一次,随处部署"的范式彻底改变了自然语言处理:一个预训练模型无需重新训练即可适配无数下游任务。拥有一个物理基础模型(PFM)将具有变革意义——让高保真模拟普惠可及、加速科学发现,并消除对专用求解器开发的需求。然而,目前的物理感知机器学习方法从根本上仍局限于单一、狭窄的领域,并且需要对每个新系统重新训练。我们介绍了在1.8 TB多样化模拟数据上训练的通用物理Transformer(GPhyT),证明了基础模型能力对物理学是可以实现的。我们的关键洞见是,Transformer可以学会从上下文中推断控制动力学,使单个模型无需被告知底层方程即可模拟流固相互作用、激波、热对流和多相动力学。GPhyT实现了三个关键突破:(1)在多个物理领域表现卓越,最多超过专用架构29倍;(2)通过上下文学习对完全未见的物理系统进行零样本泛化;(3)通过50个时间步的推演实现稳定的长期预测。通过确立单个模型可以仅从数据中学习可泛化的物理原理,这项工作为能够变革计算科学与工程的通用PFM开辟了道路。
摘要:Foundation models have revolutionized natural language processing through a ``train once, deploy anywhere'' paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative -- democratizing access to high-fidelity simulations, accelerating scientific discovery, and eliminating the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamentally limited to single, narrow domains and require retraining for each new system. We present the General Physics Transformer (GPhyT), trained on 1.8 TB of diverse simulation data, that demonstrates foundation model capabilities are achievable for physics. Our key insight is that transformers can learn to infer governing dynamics from context, enabling a single model to simulate fluid-solid interactions, shock waves, thermal convection, and multi-phase dynamics without being told the underlying equations. GPhyT achieves three critical breakthroughs: (1) superior performance across multiple physics domains, outperforming specialized architectures by up to 29x, (2) zero-shot generalization to entirely unseen physical systems through in-context learning, and (3) stable long-term predictions through 50-timestep rollouts. By establishing that a single model can learn generalizable physical principles from data alone, this work opens the path toward a universal PFM that could transform computational science and engineering.
【11】Circuit realization and hardware linearization of monotone operator equilibrium networks
标题:单调算子均衡网络的电路实现与硬件线性化
链接:https://arxiv.org/abs/2509.13793
作者:affey
摘要:本文表明,电阻-二极管网络的端口行为对应于ReLU单调算子均衡网络(无限深度极限下的神经网络)的解,从而给出了在模拟硬件中构造神经网络的简约方式。我们还表明,这类电路的梯度可以直接在硬件中计算,所用流程我们称之为硬件线性化。这使得网络可以在硬件中训练,我们通过器件级电路仿真进行了演示。我们将结果推广到电阻-二极管网络的级联,可用于实现前馈及其他非对称网络。最后,我们表明不同的非线性元件会产生不同的激活函数,并介绍了由非理想二极管模型诱导的新型二极管ReLU。
摘要:It is shown that the port behavior of a resistor-diode network corresponds to the solution of a ReLU monotone operator equilibrium network (a neural network in the limit of infinite depth), giving a parsimonious construction of a neural network in analog hardware. We furthermore show that the gradient of such a circuit can be computed directly in hardware, using a procedure we call hardware linearization. This allows the network to be trained in hardware, which we demonstrate with a device-level circuit simulation. We extend the results to cascades of resistor-diode networks, which can be used to implement feedforward and other asymmetric networks. We finally show that different nonlinear elements give rise to different activation functions, and introduce the novel diode ReLU which is induced by a non-ideal diode model.
【12】Floating-Body Hydrodynamic Neural Networks
标题:浮体流体动力学神经网络
链接:https://arxiv.org/abs/2509.13783
作者:Zhang, Wenzhe Zhai, Rui Yann, Jia Gao, He Cao, Xianglei Xing
摘要:流体-结构相互作用在工程和自然系统中很常见,其中浮体运动由附加质量、阻力和背景流控制。对这些耗散动力学进行建模是困难的:黑盒神经模型回归状态导数,具有有限的可解释性和不稳定的长期预测。我们提出了浮体流体动力学神经网络(FHNN),一个物理结构的框架,预测可解释的流体动力学参数,如方向增加的质量,阻力系数,和基于流函数的流量,并将它们与解析运动方程耦合。这种设计限制了假设空间,增强了可解释性,并稳定了集成。在合成涡数据集上,FHNN实现了比神经ODE低一个数量级的误差,恢复了物理上一致的流场。与Hamiltonian和Lagrangian神经网络相比,FHNN更有效地处理耗散动力学,同时保持可解释性,这弥合了黑箱学习和透明系统识别之间的差距。
摘要:Fluid-structure interaction is common in engineering and natural systems, where floating-body motion is governed by added mass, drag, and background flows. Modeling these dissipative dynamics is difficult: black-box neural models regress state derivatives with limited interpretability and unstable long-horizon predictions. We propose Floating-Body Hydrodynamic Neural Networks (FHNN), a physics-structured framework that predicts interpretable hydrodynamic parameters such as directional added masses, drag coefficients, and a streamfunction-based flow, and couples them with analytic equations of motion. This design constrains the hypothesis space, enhances interpretability, and stabilizes integration. On synthetic vortex datasets, FHNN achieves up to an order-of-magnitude lower error than Neural ODEs and recovers physically consistent flow fields. Compared with Hamiltonian and Lagrangian neural networks, FHNN more effectively handles dissipative dynamics while preserving interpretability, which bridges the gap between black-box learning and transparent system identification.
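A minimal sketch of the streamfunction-based flow component: predicting a scalar $\psi(x, y)$ and differentiating it yields a divergence-free velocity field, which is our reading of why FHNN parameterizes the background flow this way.
```python
import torch

def flow_from_streamfunction(psi_net, xy):
    """u = dpsi/dy, v = -dpsi/dx from a learned scalar streamfunction psi(x, y)."""
    xy = xy.clone().requires_grad_(True)
    psi = psi_net(xy).sum()
    dpsi = torch.autograd.grad(psi, xy, create_graph=True)[0]  # (N, 2) gradients
    u, v = dpsi[:, 1], -dpsi[:, 0]                             # divergence-free by construction
    return u, v

psi_net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
u, v = flow_from_streamfunction(psi_net, torch.randn(8, 2))
```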
【13】AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
标题:AERIS:用于可靠且高技巧预测的阿贡地球系统模型
链接:https://arxiv.org/abs/2509.13523
作者:anpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Sam Wheeler, Huihuo Zheng, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi
备注:14 pages, 7 figures
摘要:生成式机器学习为更好地理解复杂的地球系统动力学提供了新机遇。与确定性方法相比,近期基于扩散的方法缓解了谱偏差并改进了天气预报的集合校准,但迄今被证明难以在高分辨率下稳定扩展。我们提出AERIS,一个1.3B到80B参数的像素级Swin扩散Transformer,以填补这一空白;并提出SWiPe,一种可推广的技术,将窗口并行与序列并行和流水线并行组合,在不增加通信开销或增大全局批量的情况下对基于窗口的Transformer进行分片。在Aurora(10,080个节点)上,AERIS在0.25°的ERA5数据集上以$1 \times 1$的patch大小持续达到10.21 ExaFLOPS(混合精度)、峰值11.21 ExaFLOPS,弱扩展效率达95.5%,强扩展效率达81.6%。AERIS的表现优于IFS ENS,并在长达90天的季节尺度上保持稳定,凸显了十亿参数级扩散模型在天气和气候预测方面的潜力。
摘要:Generative machine learning offers new opportunities to better understand complex Earth system dynamics. Recent diffusion-based methods address spectral biases and improve ensemble calibration in weather forecasting compared to deterministic methods, yet have so far proven difficult to scale stably at high resolutions. We introduce AERIS, a 1.3 to 80B parameter pixel-level Swin diffusion transformer to address this gap, and SWiPe, a generalizable technique that composes window parallelism with sequence and pipeline parallelism to shard window-based transformers without added communication cost or increased global batch size. On Aurora (10,080 nodes), AERIS sustains 10.21 ExaFLOPS (mixed precision) and a peak performance of 11.21 ExaFLOPS with $1 \times 1$ patch size on the 0.25{\deg} ERA5 dataset, achieving 95.5% weak scaling efficiency, and 81.6% strong scaling efficiency. AERIS outperforms the IFS ENS and remains stable on seasonal scales to 90 days, highlighting the potential of billion-parameter diffusion models for weather and climate prediction.
【14】Learning Nonlinear Responses in PET Bottle Buckling with a Hybrid DeepONet-Transolver Framework
标题:使用DeepONet-Transolver混合框架学习PET瓶屈曲中的非线性响应
链接:https://arxiv.org/abs/2509.13520
作者:ar, Jing Bi, Cyril Ngo Ngoc, Victor Oancea, George Em Karniadakis
摘要:求解偏微分方程(PDE)问题的神经代理模型和算子网络近年来受到广泛关注。然而,大多数现有方法在跨不同非参数化几何域泛化解方面能力有限。在这项工作中,我们在聚对苯二甲酸乙二醇酯(PET)瓶屈曲分析的背景下应对这一挑战;这是一个代表性的包装设计问题,传统上使用计算成本高昂的有限元分析(FEA)求解。我们引入一个DeepONet-Transolver混合框架,同时预测顶部加载压缩过程中的节点位移场和反作用力随时间的演化。我们在分别由两个和四个设计变量参数化的两族瓶子几何上评估该方法。训练数据使用Abaqus中的非线性FEA模拟生成,每族包含254个独特设计。对于四参数瓶族,所提框架在位移场上取得2.5%-13%的平均相对$L^{2}$误差,在随时间变化的反作用力上约为2.4%。逐点误差分析进一步表明,绝对位移误差在$10^{-4}$-$10^{-3}$量级,最大偏差局限于局部几何区域。重要的是,该模型在不同瓶子几何上准确捕捉了屈曲行为等关键物理现象。这些结果凸显了我们的框架作为可扩展且计算高效的代理模型的潜力,特别适用于计算力学中的多任务预测以及需要快速设计评估的应用。
摘要:Neural surrogates and operator networks for solving partial differential equation (PDE) problems have attracted significant research interest in recent years. However, most existing approaches are limited in their ability to generalize solutions across varying non-parametric geometric domains. In this work, we address this challenge in the context of Polyethylene Terephthalate (PET) bottle buckling analysis, a representative packaging design problem conventionally solved using computationally expensive finite element analysis (FEA). We introduce a hybrid DeepONet-Transolver framework that simultaneously predicts nodal displacement fields and the time evolution of reaction forces during top load compression. Our methodology is evaluated on two families of bottle geometries parameterized by two and four design variables. Training data is generated using nonlinear FEA simulations in Abaqus for 254 unique designs per family. The proposed framework achieves mean relative $L^{2}$ errors of 2.5-13% for displacement fields and approximately 2.4% for time-dependent reaction forces for the four-parameter bottle family. Point-wise error analyses further show absolute displacement errors on the order of $10^{-4}$-$10^{-3}$, with the largest discrepancies confined to localized geometric regions. Importantly, the model accurately captures key physical phenomena, such as buckling behavior, across diverse bottle geometries. These results highlight the potential of our framework as a scalable and computationally efficient surrogate, particularly for multi-task predictions in computational mechanics and applications requiring rapid design evaluation.
【15】Unified Spatiotemporal Physics-Informed Learning (USPIL): A Framework for Modeling Complex Predator-Prey Dynamics
标题:统一时空物理信息学习(USPIL):复杂捕食者-猎物动力学建模框架
链接:https://arxiv.org/abs/2509.13425
作者:an Chrisnanto, Yulison Herry Chrisnanto, Ferry Faizal
备注:20 pages, 11 figures. A preprint on using a unified physics-informed neural network framework to model predator-prey dynamics
摘要:生态系统呈现出挑战传统建模的复杂多尺度动力学。新方法必须在遵循守恒原理的同时,捕捉时间振荡和涌现的时空模式。我们提出统一时空物理信息学习(USPIL)框架,这是一种集成物理信息神经网络(PINN)与守恒定律的深度学习架构,用于跨维度尺度建模捕食者-猎物动力学。该框架为常微分方程(ODE)和偏微分方程(PDE)系统提供统一解法,在单一神经网络架构中同时描述时间周期和反应-扩散模式。我们的方法使用自动微分施加物理约束,并通过自适应损失加权平衡数据保真度与物理一致性。应用于Lotka-Volterra系统,USPIL对一维时间动力学取得98.9%的相关性(损失:0.0219,MAE:0.0184),并捕捉二维系统中的复杂螺旋波(损失:4.7656,模式相关性:0.94)。验证确认守恒定律的遵循误差在0.5%以内,且推理相比数值求解器有10-50倍的计算加速。USPIL还通过可解释的物理约束实现机理理解,支持纯数据驱动方法无法实现的参数发现和敏感性分析。其在不同维度表述之间转换的能力为多尺度生态建模开辟了新途径。这些能力使USPIL成为生态预测、保护规划和理解生态系统韧性的变革性工具,将物理信息深度学习确立为一种强大且科学严谨的范式。
摘要:Ecological systems exhibit complex multi-scale dynamics that challenge traditional modeling. New methods must capture temporal oscillations and emergent spatiotemporal patterns while adhering to conservation principles. We present the Unified Spatiotemporal Physics-Informed Learning (USPIL) framework, a deep learning architecture integrating physics-informed neural networks (PINNs) and conservation laws to model predator-prey dynamics across dimensional scales. The framework provides a unified solution for both ordinary (ODE) and partial (PDE) differential equation systems, describing temporal cycles and reaction-diffusion patterns within a single neural network architecture. Our methodology uses automatic differentiation to enforce physics constraints and adaptive loss weighting to balance data fidelity with physical consistency. Applied to the Lotka-Volterra system, USPIL achieves 98.9% correlation for 1D temporal dynamics (loss: 0.0219, MAE: 0.0184) and captures complex spiral waves in 2D systems (loss: 4.7656, pattern correlation: 0.94). Validation confirms conservation law adherence within 0.5% and shows a 10-50x computational speedup for inference compared to numerical solvers. USPIL also enables mechanistic understanding through interpretable physics constraints, facilitating parameter discovery and sensitivity analysis not possible with purely data-driven methods. Its ability to transition between dimensional formulations opens new avenues for multi-scale ecological modeling. These capabilities make USPIL a transformative tool for ecological forecasting, conservation planning, and understanding ecosystem resilience, establishing physics-informed deep learning as a powerful and scientifically rigorous paradigm.
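A compact sketch of the physics-informed residual for the temporal (ODE) Lotka-Volterra case; USPIL adds conservation constraints and adaptive loss weighting on top of this kind of term. Parameter values and names are illustrative.
```python
import torch

# dx/dt = a x - b x y,   dy/dt = d x y - g y
a, b, d, g = 1.1, 0.4, 0.1, 0.4
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))

def physics_loss(t):
    t = t.clone().requires_grad_(True)
    out = net(t)
    x, y = out[:, :1], out[:, 1:]
    dx = torch.autograd.grad(x.sum(), t, create_graph=True)[0]
    dy = torch.autograd.grad(y.sum(), t, create_graph=True)[0]
    rx = dx - (a * x - b * x * y)   # prey residual
    ry = dy - (d * x * y - g * y)   # predator residual
    return (rx ** 2 + ry ** 2).mean()

loss = physics_loss(torch.rand(256, 1) * 10.0)  # collocation points on [0, 10]
loss.backward()
```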
【16】Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks
标题:量子变分激活函数增强Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2509.14026
作者:g Jiang, Morris Yu-Chao Huang, Tianlong Chen, Hsi-Sheng Goan
备注:45 pages
摘要:变分量子电路(VQC)是量子机器学习的核心,而Kolmogorov-Arnold网络(KAN)的最新进展凸显了可学习激活函数的强大能力。我们通过引入量子变分激活函数(QVAF)来统一这两个方向,它由称为数据重上传激活(DARUAN)的单量子比特数据重上传电路实现。我们证明,在数据预处理中带有可训练权重的DARUAN,其频谱随数据重复次数呈指数增长,因而与基于傅立叶的激活相比,可在不损失表达能力的前提下将参数规模指数级缩减。将DARUAN嵌入KAN得到量子启发的KAN(QKAN),它保留了KAN的可解释性,同时提升了参数效率、表达能力和泛化能力。我们进一步引入两种提升可扩展性、可行性和计算效率的新技术,即层扩展和混合QKAN(HQKAN),后者可作为大规模模型前馈网络中多层感知器(MLP)的即插即用替代。我们提供了理论分析,并在函数回归、图像分类和自回归生成式语言建模上进行了大量实验,证明了QKAN的效率和可扩展性。DARUAN和QKAN为在含噪中等规模量子(NISQ)硬件和经典量子模拟器上推进量子机器学习提供了一个有希望的方向。
摘要:Variational quantum circuits (VQCs) are central to quantum machine learning, while recent progress in Kolmogorov-Arnold networks (KANs) highlights the power of learnable activation functions. We unify these directions by introducing quantum variational activation functions (QVAFs), realized through single-qubit data re-uploading circuits called DatA Re-Uploading ActivatioNs (DARUANs). We show that DARUAN with trainable weights in data pre-processing possesses an exponentially growing frequency spectrum with data repetitions, enabling an exponential reduction in parameter size compared with Fourier-based activations without loss of expressivity. Embedding DARUAN into KANs yields quantum-inspired KANs (QKANs), which retain the interpretability of KANs while improving their parameter efficiency, expressivity, and generalization. We further introduce two novel techniques to enhance scalability, feasibility and computational efficiency, such as layer extension and hybrid QKANs (HQKANs) as drop-in replacements of multi-layer perceptrons (MLPs) for feed-forward networks in large-scale models. We provide theoretical analysis and extensive experiments on function regression, image classification, and autoregressive generative language modeling, demonstrating the efficiency and scalability of QKANs. DARUANs and QKANs offer a promising direction for advancing quantum machine learning on both noisy intermediate-scale quantum (NISQ) hardware and classical quantum simulators.
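A classical simulation of a single-qubit data re-uploading activation, our minimal reading of DARUAN: each layer applies an RY rotation whose angle mixes a trainable input weight with a bias, and the activation is the final Z expectation. The trainable input weights are what the paper credits for the exponentially growing frequency spectrum.
```python
import numpy as np

def daruan(x, thetas, omegas):
    """Single-qubit re-uploading: layers of RY(omega_l * x + theta_l), output <Z>."""
    state = np.array([1.0, 0.0])                 # |0>
    for theta, omega in zip(thetas, omegas):
        phi = omega * x + theta
        ry = np.array([[np.cos(phi / 2), -np.sin(phi / 2)],
                       [np.sin(phi / 2),  np.cos(phi / 2)]])
        state = ry @ state
    return state[0] ** 2 - state[1] ** 2         # <Z> of the final state

print(daruan(0.3, thetas=[0.1, -0.7, 0.4], omegas=[1.0, 2.0, 4.0]))
```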
【17】Artificial neural networks ensemble methodology to predict significant wave height
标题:预报有效波高的人工神经网络集成方法
链接:https://arxiv.org/abs/2509.14020
作者:ivellaro Minuzzi, Leandro Farina
备注:None
摘要:波浪变量的预报对依赖更好海况描述的多种应用十分重要。由于刻画该问题的微分方程具有混沌行为,克服困难的常见策略基本上是运行多次模拟(例如改变初始条件),并对各次模拟结果取平均,构成一个集合。此外,近年来随着可用数据量和计算能力的增长,机器学习算法已被用作传统数值模式的替代,取得了可比或更好的结果。在这项工作中,我们提出一种方法,构建由MLP、RNN、LSTM、CNN及混合CNN-LSTM等不同人工神经网络架构组成的集成,旨在预测巴西海岸六个不同地点的有效波高。网络使用NOAA的数值再预报数据训练,目标为观测数据与数值模式输出之间的残差。文中还展示了一种构建训练集和目标集的新策略。结果表明,我们的框架能够产生高效的预报,平均精度为80%,最佳情形可达88%,这意味着与NOAA数值模式相比误差指标降低5%,且计算成本不断降低。
摘要:Forecasts of wave variables are important for several applications that depend on a better description of the ocean state. Due to the chaotic behaviour of the differential equations which model this problem, a well-known strategy to overcome the difficulties is to run several simulations, for instance by varying the initial condition, and to average the results, creating an ensemble. Moreover, in the last few years, considering the growth in available data and computational power, machine learning algorithms have been applied as surrogates for traditional numerical models, yielding comparable or better results. In this work, we present a methodology to create an ensemble of different artificial neural network architectures, namely MLP, RNN, LSTM, CNN and a hybrid CNN-LSTM, which aims to predict significant wave height at six different locations on the Brazilian coast. The networks are trained using NOAA's numerical reforecast data and target the residual between observational data and the numerical model output. A new strategy to create the training and target datasets is demonstrated. Results show that our framework is capable of producing highly efficient forecasts, with an average accuracy of $80\%$ that can reach up to $88\%$ in the best-case scenario, which means a $5\%$ reduction in error metrics compared to NOAA's numerical model, at an increasingly reduced computational cost.
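A sketch of the residual-correction ensemble described above: each member predicts the observation-minus-model residual, and the final forecast adds the mean predicted residual back onto the numerical output. The scikit-learn-style `.predict` interface is an assumption.
```python
import numpy as np

def ensemble_forecast(numerical_hs, members, features):
    """numerical_hs: (N,) model significant wave height; members: fitted regressors."""
    residuals = np.stack([m.predict(features) for m in members], axis=0)  # (M, N)
    return numerical_hs + residuals.mean(axis=0)  # corrected forecast
```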
【18】Learning quantum many-body data locally: A provably scalable framework
标题:本地学习量子多体数据:可证明可扩展的框架
链接:https://arxiv.org/abs/2509.13705
作者:zei, Quoc Hoan Tran, Norifumi Matsumoto, Yasuhiro Endo, Hirotaka Oshima
备注:38 pages, 5 figures
摘要:机器学习(ML)在从量子实验获得的复杂量子多体数据中提取洞见方面潜力巨大。这种方法可以高效解决某些经典上难以处理的量子问题,表明利用量子数据具有潜在优势。然而,解决大规模问题仍需要大量数据,超出近期量子设备有限的计算资源。我们提出一个可扩展的ML框架,称为几何局部量子核(GLQK),旨在利用相关性的指数衰减(非临界系统中普遍存在的现象)高效学习量子多体实验数据。在学习量子期望值的未知多项式这一任务中,我们严格证明,通过在相关长度尺度上由局部量子信息构造特征空间,GLQK相比现有的影子核在量子比特数$n$上的多项式样本复杂度有显著改进。当目标多项式的每一项只涉及少量局部子系统时,这一改进尤为显著。值得注意的是,对于平移对称数据,GLQK实现了与$n$无关的常数样本复杂度。我们在两个量子多体现象的学习任务中数值证明了其高可扩展性。这些结果为利用实验数据推进量子多体物理的理解开辟了新途径。
摘要:Machine learning (ML) holds great promise for extracting insights from complex quantum many-body data obtained in quantum experiments. This approach can efficiently solve certain quantum problems that are classically intractable, suggesting potential advantages of harnessing quantum data. However, addressing large-scale problems still requires significant amounts of data beyond the limited computational resources of near-term quantum devices. We propose a scalable ML framework called Geometrically Local Quantum Kernel (GLQK), designed to efficiently learn quantum many-body experimental data by leveraging the exponential decay of correlations, a phenomenon prevalent in noncritical systems. In the task of learning an unknown polynomial of quantum expectation values, we rigorously prove that GLQK substantially improves polynomial sample complexity in the number of qubits $n$, compared to the existing shadow kernel, by constructing a feature space from local quantum information at the correlation length scale. This improvement is particularly notable when each term of the target polynomial involves few local subsystems. Remarkably, for translationally symmetric data, GLQK achieves constant sample complexity, independent of $n$. We numerically demonstrate its high scalability in two learning tasks on quantum many-body phenomena. These results establish new avenues for utilizing experimental data to advance the understanding of quantum many-body physics.
其他(19篇)
【1】Deconstructing Intraocular Pressure: A Non-invasive Multi-Stage Probabilistic Inverse Framework
标题:解构眼内压:无创多阶段概率逆框架
链接:https://arxiv.org/abs/2509.14167
作者: Jaher, Abul Mukid Mohammad Mukaddes, A. B. M. Abdul Malek
备注:43 pages, 10 figures (including supplementary material)
摘要:许多重要的医疗保健决策都面临无法测量关键基础参数的挑战。青光眼是由眼内压(IOP)升高导致的不可逆失明的主要原因,提供了一个鲜明的例子。眼压的主要决定因素是一种称为小梁网渗透性的组织特性,无法在体内测量,迫使临床医生依赖间接替代物。这一临床挑战与更广泛的计算挑战相结合:由于缺乏地面实况数据和大规模高保真模拟的高昂成本,为这种不适定的逆问题开发预测模型受到阻碍。我们用一个端到端的框架来解决这两个挑战,从稀疏的常规数据中非侵入性地估计不可测量的变量。我们的方法结合了多阶段人工智能架构,在功能上分离问题;一种新的数据生成策略,我们称之为PCDS,无需进行数十万次昂贵的模拟,将有效计算时间从数年减少到数小时;以及贝叶斯引擎来量化预测的不确定性。我们的框架解构了一个单一的IOP测量到其基本组成部分,从常规的输入,产生的不可测量的组织渗透性和患者的流出设施的估计。我们的非侵入性估计流出设施取得了良好的协议与国家的最先进的张力描记术的精度媲美直接物理仪器。此外,新衍生的渗透性生物标志物在按疾病风险对临床队列进行分层方面表现出高准确性,突出了其诊断潜力。更广泛地说,我们的框架建立了一个通用的蓝图,解决类似的反问题,在其他数据稀缺,计算密集型领域。
摘要:Many critical healthcare decisions are challenged by the inability to measure key underlying parameters. Glaucoma, a leading cause of irreversible blindness driven by elevated intraocular pressure (IOP), provides a stark example. The primary determinant of IOP, a tissue property called trabecular meshwork permeability, cannot be measured in vivo, forcing clinicians to depend on indirect surrogates. This clinical challenge is compounded by a broader computational one: developing predictive models for such ill-posed inverse problems is hindered by a lack of ground-truth data and prohibitive cost of large-scale, high-fidelity simulations. We address both challenges with an end-to-end framework to noninvasively estimate unmeasurable variables from sparse, routine data. Our approach combines a multi-stage artificial intelligence architecture to functionally separate the problem; a novel data generation strategy we term PCDS that obviates the need for hundreds of thousands of costly simulations, reducing the effective computational time from years to hours; and a Bayesian engine to quantify predictive uncertainty. Our framework deconstructs a single IOP measurement into its fundamental components from routine inputs only, yielding estimates for the unmeasurable tissue permeability and a patient's outflow facility. Our noninvasively estimated outflow facility achieved excellent agreement with state-of-the-art tonography with precision comparable to direct physical instruments. Furthermore, the newly derived permeability biomarker demonstrates high accuracy in stratifying clinical cohorts by disease risk, highlighting its diagnostic potential. More broadly, our framework establishes a generalizable blueprint for solving similar inverse problems in other data-scarce, computationally-intensive domains.
【2】Nash Equilibria in Games with Playerwise Concave Coupling Constraints: Existence and Computation
标题:具有逐参与者凹耦合约束的博弈中的纳什均衡:存在性与计算
链接:https://arxiv.org/abs/2509.14032
作者:rdan, Maryam Kamgarpour
摘要:我们研究连续静态博弈中纳什均衡的存在性与计算,其中参与者的可行策略受共享耦合约束限制,即依赖于他们\emph{联合}策略的约束。具体而言,我们关注一类以逐参与者凹效用和逐参与者凹约束为特征的博弈。先前关于纳什均衡存在性的结果不适用于这一类,因为它们依赖于可行集联合凸性等强假设。借助拓扑不动点理论以及关于逐参与者凹约束下可行集可缩性的新颖结构性洞见,我们在更弱的条件下给出了纳什均衡的存在性证明。在确立存在性之后,我们在效用函数存在势函数的附加假设下,研究通过独立梯度方法计算纳什均衡。为应对可能非凸的可行域,我们采用带自适应步长的对数障碍正则化梯度上升。从一个初始可行策略组合出发,在精确梯度反馈下,所提方法在$\mathcal{O}(\epsilon^{-3})$次迭代内收敛到$\epsilon$-近似约束纳什均衡。
摘要:We study the existence and computation of Nash equilibria in continuous static games where the players' admissible strategies are subject to shared coupling constraints, i.e., constraints that depend on their \emph{joint} strategies. Specifically, we focus on a class of games characterized by playerwise concave utilities and playerwise concave constraints. Prior results on the existence of Nash equilibria are not applicable to this class, as they rely on strong assumptions such as joint convexity of the feasible set. By leveraging topological fixed point theory and novel structural insights into the contractibility of feasible sets under playerwise concave constraints, we give an existence proof for Nash equilibria under weaker conditions. Having established existence, we then focus on the computation of Nash equilibria via independent gradient methods under the additional assumption that the utilities admit a potential function. To account for the possibly nonconvex feasible region, we employ a log barrier regularized gradient ascent with adaptive stepsizes. Starting from an initial feasible strategy profile and under exact gradient feedback, the proposed method converges to an $\epsilon$-approximate constrained Nash equilibrium within $\mathcal{O}(\epsilon^{-3})$ iterations.
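A sketch of the log-barrier regularized ascent for one player, under our reading: feasibility means $g(x) < 0$ componentwise, the barrier term is $-\mu \sum_i \log(-g_i(x))$, and a shrinking step size stands in for the paper's adaptive rule.
```python
import numpy as np

def barrier_ascent(grad_u, g, x0, steps=1000, mu=0.1, lr0=1e-2):
    """grad_u(x): utility gradient; g(x) -> (values, Jacobian) of the constraints."""
    x = x0.copy()
    for k in range(steps):
        gx, Jg = g(x)                                  # gx < 0 on the interior
        grad = grad_u(x) + mu * (Jg.T @ (1.0 / gx))    # d/dx log(-g_i) = g_i'/g_i
        x = x + lr0 / np.sqrt(k + 1) * grad            # no projection: the barrier repels
    return x
```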
【3】MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment
标题:MOCHA:多模态对象感知跨架构对齐
链接:https://arxiv.org/abs/2509.14001
作者:uffo, Francesco Barbato, Mete Ozay, Simone Milani, Umberto Michieli
摘要:我们介绍MOCHA(多模态对象感知跨架构对齐),这是一种知识蒸馏方法,将区域级多模态语义从大型视觉-语言教师模型(例如LLaVa)迁移到轻量级纯视觉对象检测器学生模型(例如YOLO)。一个翻译模块将学生特征映射到联合空间中,学生与翻译模块的训练由一个双目标损失指导,该损失同时强制局部对齐和全局关系一致性。与以往专注于密集或全局对齐的方法不同,MOCHA在对象级别上运行,无需修改教师模型或在推理时输入文本即可高效迁移语义。我们在少样本设置下的四个个性化检测基准上验证了该方法。结果显示相对基线的一致增益,平均得分提高+10.1。尽管结构紧凑,MOCHA的性能与更大的多模态模型相当,证明了其在现实世界部署中的适用性。
摘要:We introduce MOCHA (Multi-modal Objects-aware Cross-arcHitecture Alignment), a knowledge distillation approach that transfers region-level multimodal semantics from a large vision-language teacher (e.g., LLaVa) into a lightweight vision-only object detector student (e.g., YOLO). A translation module maps student features into a joint space, where the training of the student and translator is guided by a dual-objective loss that enforces both local alignment and global relational consistency. Unlike prior approaches focused on dense or global alignment, MOCHA operates at the object level, enabling efficient transfer of semantics without modifying the teacher or requiring textual input at inference. We validate our method across four personalized detection benchmarks under few-shot regimes. Results show consistent gains over baselines, with a +10.1 average score improvement. Despite its compact architecture, MOCHA reaches performance on par with larger multimodal models, proving its suitability for real-world deployment.
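A sketch of the dual-objective distillation loss as the abstract describes it: a local term aligning each translated object feature with its teacher counterpart, plus a relational term matching pairwise similarity structure. The balance weight is an assumption.
```python
import torch
import torch.nn.functional as F

def mocha_style_loss(student_feats, teacher_feats, lam=1.0):
    """student_feats, teacher_feats: (N, D) matched object-level features."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    local = (1 - (s * t).sum(dim=-1)).mean()   # per-object cosine alignment
    relational = F.mse_loss(s @ s.T, t @ t.T)  # global relational consistency
    return local + lam * relational
```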
【4】Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
标题:Slim-SC:通过思维剪枝实现自一致性的高效扩展
链接:https://arxiv.org/abs/2509.13990
作者:g, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov
备注:Accepted by EMNLP 2025 (Oral), 9 pages
摘要:最近,测试时扩展(TTS)在不重新训练模型的情况下提升LLM测试时推理性能方面受到越来越多的关注。一种值得注意的TTS技术是自一致性(SC),它并行生成多条推理链,并通过多数投票选出最终答案。虽然有效,但数量级的计算开销限制了其广泛部署。以往加速SC的尝试主要依赖基于模型的置信度分数,或缺乏充分实证支持的启发式方法。我们首次从理论和实证两方面分析了SC的低效之处,并揭示了可付诸行动的改进机会。基于这些洞见,我们提出Slim-SC,一种逐步剪枝策略,利用思维层面的链间相似性来识别并移除冗余推理链。在三个STEM推理数据集和两种近期LLM架构上的实验表明,使用R1-Distill时,Slim-SC将推理延迟和KV缓存使用量分别最多降低45%和26%,同时保持或提升准确率,从而为SC提供了一种简单而高效的TTS替代方案。
摘要:Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the order-of-magnitude computational overhead limits its broad deployment. Prior attempts to accelerate SC mainly rely on model-based confidence scores or heuristics with limited empirical support. For the first time, we theoretically and empirically analyze the inefficiencies of SC and reveal actionable opportunities for improvement. Building on these insights, we propose Slim-SC, a step-wise pruning strategy that identifies and removes redundant chains using inter-chain similarity at the thought level. Experiments on three STEM reasoning datasets and two recent LLM architectures show that Slim-SC reduces inference latency and KVC usage by up to 45% and 26%, respectively, with R1-Distill, while maintaining or improving accuracy, thus offering a simple yet efficient TTS alternative for SC.
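A post-hoc approximation of the pruning-then-voting idea (Slim-SC prunes step-wise during generation; this offline version only conveys the similarity criterion). The embedding function is assumed to be supplied.
```python
import numpy as np
from collections import Counter

def prune_and_vote(chains, answers, embed, sim_threshold=0.9):
    """Drop chains too similar to an already-kept chain, then majority-vote answers."""
    kept_answers, kept_vecs = [], []
    for chain, ans in zip(chains, answers):
        v = embed(chain)
        v = v / np.linalg.norm(v)
        if all(float(v @ u) < sim_threshold for u in kept_vecs):
            kept_answers.append(ans)
            kept_vecs.append(v)
    return Counter(kept_answers).most_common(1)[0][0]
```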
【5】Controllable Pareto Trade-off between Fairness and Accuracy
标题:公平性与准确性之间的可控帕累托权衡
链接:https://arxiv.org/abs/2509.13651
作者:Du, Jieyu Zhao, Yijun Yang, Tianyi Zhou
摘要:公平性-准确性权衡是NLP任务中的一个关键挑战。现有工作侧重于寻找平衡这两个目标的单一"最优"解,考虑到帕累托前沿上解的多样性,这种做法是有局限的。本工作旨在根据用户对两个目标的偏好(定义为一个参考向量)提供可控的权衡。为实现这一目标,我们应用多目标优化(MOO),它可以从帕累托前沿的不同区域找到解。然而,由于训练过程的随机性和梯度向量的高维性,精确控制权衡具有挑战性。因此,我们提出可控帕累托权衡(CPT),能够有效训练模型按照用户偏好执行不同的权衡。CPT:1)用随机梯度的滑动平均稳定公平性更新以确定更新方向;2)通过只保留关键参数的梯度来对梯度进行剪枝。我们在仇恨言论检测和职业分类任务上评估CPT。实验表明,CPT能在帕累托前沿上获得比基线方法质量更高的解集,并表现出更好的可控性,能够精确跟随人为定义的参考向量。
摘要:The fairness-accuracy trade-off is a key challenge in NLP tasks. Current work focuses on finding a single "optimal" solution to balance the two objectives, which is limited considering the diverse solutions on the Pareto front. This work intends to provide controllable trade-offs according to the user's preference of the two objectives, which is defined as a reference vector. To achieve this goal, we apply multi-objective optimization (MOO), which can find solutions from various regions of the Pareto front. However, it is challenging to precisely control the trade-off due to the stochasticity of the training process and the high-dimensional gradient vectors. Thus, we propose Controllable Pareto Trade-off (CPT) that can effectively train models to perform different trade-offs according to users' preferences. CPT 1) stabilizes the fairness update with a moving average of stochastic gradients to determine the update direction, and 2) prunes the gradients by only keeping the gradients of the critical parameters. We evaluate CPT on hate speech detection and occupation classification tasks. Experiments show that CPT can achieve a higher-quality set of solutions on the Pareto front than the baseline methods. It also exhibits better controllability and can precisely follow the human-defined reference vectors.
【6】EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing
标题:EdiVal-Agent:一个以对象为中心的框架,用于多轮编辑的自动化、可扩展、细粒度评估
链接:https://arxiv.org/abs/2509.13399
作者:en, Yasi Zhang, Zhi Zhang, Peiyu Yu, Shu Wang, Zhendong Wang, Kevin Lin, Xiaofei Wang, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Jianwen Xie, Oscar Leong, Lijuan Wang, Ying Nian Wu, Mingyuan Zhou
备注:Tianyu Chen and Yasi Zhang contributed equally; Oscar Leong, Lijuan Wang, Ying Nian Wu, and Mingyuan Zhou advised equally
摘要:基于指令的图像编辑发展迅速,但可靠且可解释的评估仍是瓶颈。目前的协议要么(i)依赖成对的参考图像——导致覆盖范围有限,并继承先前生成模型的偏见——要么(ii)仅依赖零样本视觉-语言模型(VLM),其基于提示的指令遵循、内容一致性和视觉质量评估往往不够精确。为了解决这个问题,我们引入EdiVal-Agent,这是一个自动化、可扩展且细粒度的评估框架,从以对象为中心的视角评估基于指令的多轮编辑,并由一套专家工具支持。给定一幅图像,EdiVal-Agent首先将其分解为语义上有意义的对象,然后合成多样化、上下文感知的编辑指令。在评估方面,它将VLM与开放词汇对象检测器结合以评估指令遵循情况,使用语义级特征提取器评估内容一致性,并利用人类偏好模型评判视觉质量。我们表明,在指令遵循评估中,将VLM与对象检测器结合,比单独使用VLM或基于CLIP的指标与人类判断的一致性更强。此外,流水线的模块化设计允许未来的工具无缝集成,随着时间推移提升评估精度。基于这一流水线,我们构建了EdiVal-Bench,一个涵盖9种指令类型和11个最先进编辑模型的多轮编辑基准,覆盖自回归(AR)(包括Nano Banana、GPT-Image-1)、流匹配和扩散范式。我们证明EdiVal-Agent可用于识别现有的失败模式,从而为下一代编辑模型的开发提供依据。项目页面:https://tianyucodings.github.io/EdiVAL-page/。
摘要:Instruction-based image editing has advanced rapidly, yet reliable and interpretable evaluation remains a bottleneck. Current protocols either (i) depend on paired reference images -- resulting in limited coverage and inheriting biases from prior generative models -- or (ii) rely solely on zero-shot vision--language models (VLMs), whose prompt-based assessments of instruction following, content consistency, and visual quality are often imprecise. To address this, we introduce EdiVal-Agent, an automated, scalable, and fine-grained evaluation framework for multi-turn instruction-based editing from an object-centric perspective, supported by a suite of expert tools. Given an image, EdiVal-Agent first decomposes it into semantically meaningful objects, then synthesizes diverse, context-aware editing instructions. For evaluation, it integrates VLMs with open-vocabulary object detectors to assess instruction following, uses semantic-level feature extractors to evaluate content consistency, and leverages human preference models to judge visual quality. We show that combining VLMs with object detectors yields stronger agreement with human judgments in instruction-following evaluation compared to using VLMs alone and CLIP-based metrics. Furthermore, the pipeline's modular design allows future tools to be seamlessly integrated, enhancing evaluation accuracy over time. Instantiating this pipeline, we build EdiVal-Bench, a multi-turn editing benchmark covering 9 instruction types and 11 state-of-the-art editing models spanning autoregressive (AR) (including Nano Banana, GPT-Image-1), flow-matching, and diffusion paradigms. We demonstrate that EdiVal-Agent can be used to identify existing failure modes, thereby informing the development of the next generation of editing models. Project page: https://tianyucodings.github.io/EdiVAL-page/.
【7】Curvature as a tool for evaluating dimensionality reduction and estimating intrinsic dimension
标题:曲率作为评估降维与估计内在维度的工具
链接:https://arxiv.org/abs/2509.13385
作者: Beylier, Parvaneh Joharinad, Jürgen Jost, Nahid Torbati
备注:31 pages, 14 figures
摘要:利用最近发展的截面曲率抽象概念,我们提出一种为离散度量空间构建基于曲率的几何轮廓的方法。我们在此使用的曲率概念刻画了点的三元组与其他点之间的度量关系。更重要的是,基于这一曲率轮廓,我们引入了一个定量指标,用于评估数据表示(例如降维技术产生的表示)的有效性。此外,我们的实验表明,这种基于曲率的分析可用于估计数据集的内在维数。我们用它来探索经验网络的大尺度几何结构,并评估降维技术的有效性。
摘要:Utilizing recently developed abstract notions of sectional curvature, we introduce a method for constructing a curvature-based geometric profile of discrete metric spaces. The curvature concept that we use here captures the metric relations between triples of points and other points. More significantly, based on this curvature profile, we introduce a quantitative measure to evaluate the effectiveness of data representations, such as those produced by dimensionality reduction techniques. Furthermore, our experiments demonstrate that this curvature-based analysis can be employed to estimate the intrinsic dimensionality of datasets. We use this to explore the large-scale geometry of empirical networks and to evaluate the effectiveness of dimensionality reduction techniques.
【8】ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
标题:ASTREA:引入代理式智能实现轨道热自主
链接:https://arxiv.org/abs/2509.13380
作者: D. Mousist
备注:This preprint presents ASTREA, a multi-agent architecture combining LLM-guided semantic modulation with reinforcement learning for autonomous satellite operations. The system is validated in hardware orbital environments
摘要:本文介绍ASTREA,首个部署在具备飞行履历硬件(TRL 9)上、面向自主航天器操作的智能体系统。以热控制作为代表性用例,我们在专为航天级平台定制的异步架构中,将资源受限的大型语言模型(LLM)智能体与强化学习控制器集成。地面实验表明,LLM引导的监督改善了热稳定性并减少了违规,证实了在硬件约束下将语义推理与自适应控制相结合的可行性。然而,在国际空间站(ISS)上的在轨验证显示,由于推理延迟与低地球轨道(LEO)卫星快速热循环特性不匹配,性能出现下降。这些结果凸显了基于智能体LLM的系统在真实飞行环境中的机遇与当前局限,为未来的空间自主提供了实用的设计指南。
摘要:This paper presents ASTREA, the first agentic system deployed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations. Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. However, on-orbit validation aboard the International Space Station (ISS) reveals performance degradation caused by inference latency mismatched with the rapid thermal cycles characteristic of Low Earth Orbit (LEO) satellites. These results highlight both the opportunities and current limitations of agentic LLM-based systems in real flight environments, providing practical design guidelines for future space autonomy.
【9】Synthetic Data and the Shifting Ground of Truth
标题:合成数据与不断变化的基准真相
链接:https://arxiv.org/abs/2509.13355
作者:ffenhuber
备注:Talk presented at the Society for the Social Studies of Science (4S) 2025 meeting in Seattle, Sept. 3, 2025
摘要:用于隐私保护、训练数据生成或简单方便地访问任何形状或体积的准现实数据的合成数据的出现使地面实况的概念变得复杂。合成数据模拟真实世界的观测,但不涉及外部特征。然而,这种代表性关系的缺乏并不妨碍研究人员使用合成数据作为人工智能模型和地面真实数据库的训练数据。据称,缺乏数据的真实性不仅仅是一个可接受的权衡,但往往会导致更好的模型性能比现实的数据:补偿已知的偏见,防止过拟合和支持泛化,并使模型在处理意外离群值时更加稳健。事实上,将噪声和完全不可信的数据注入训练集对模型是有益的。这大大复杂化了通常的假设,基于这些假设,表示的准确性决定了数据的保真度(垃圾输入-垃圾输出)。此外,地面真相成为一种自我参照的事情,其中用作地面真相存储库的标签本身就是生成模型的合成产物,因此与现实世界的观察无关。我的论文探讨了ML研究人员和实践者如何在这种矛盾的情况下引导地面真相,而不依赖于稳定的地面表示和现实世界的参考。它还将反映从代表性的数据概念转变为可以描述为模仿或图标的数据概念的更广泛影响。
摘要:The emergence of synthetic data for privacy protection, training data generation, or simply convenient access to quasi-realistic data in any shape or volume complicates the concept of ground truth. Synthetic data mimic real-world observations, but do not refer to external features. This lack of a representational relationship, however, does not prevent researchers from using synthetic data as training data for AI models and ground truth repositories. It is claimed that the lack of data realism is not merely an acceptable tradeoff but often leads to better model performance than realistic data: it can compensate for known biases, prevent overfitting, support generalization, and make models more robust in dealing with unexpected outliers. Indeed, injecting noisy and outright implausible data into training sets can be beneficial for the model. This greatly complicates the usual assumptions by which representational accuracy determines data fidelity (garbage in - garbage out). Furthermore, ground truth becomes a self-referential affair, in which the labels used as a ground truth repository are themselves synthetic products of a generative model and as such not connected to real-world observations. My paper examines how ML researchers and practitioners bootstrap ground truth under such paradoxical circumstances without relying on the stable ground of representation and real-world reference. It will also reflect on the broader implications of a shift from a representational to what could be described as a mimetic or iconic concept of data.
【10】Imagined Autocurricula
标题:想象的自动课程
链接:https://arxiv.org/abs/2509.13341
作者:Güzel, Matthew Thomas Jackson, Jarek Luca Liesen, Tim Rocktäschel, Jakob Nicolaus Foerster, Ilija Bogunovic, Jack Parker-Holder
摘要:训练智能体在具身环境中行动通常需要海量训练数据或精确的模拟器,而这两者在现实世界的许多场景中都不可得。作为替代,世界模型正在兴起:利用离线被动收集的数据,它们能够生成多样化的世界,用于在模拟中训练智能体。在这项工作中,我们利用世界模型生成想象环境,训练能够泛化到新任务变体的鲁棒智能体。其中一个挑战是确保智能体在有用的生成数据上训练。为此,我们提出一种新方法IMAC(Imagined Autocurricula,想象自动课程),它利用无监督环境设计(UED)在生成的世界上诱导自动课程。在一系列具有挑战性的程序化生成环境中,我们证明仅在由较窄数据集学得的世界模型内训练,也能在留出环境上取得强大的迁移性能。我们相信这为利用更大规模的基础世界模型训练具有通用能力的智能体开辟了道路。
摘要:Training agents to act in embodied environments typically requires vast training data or access to accurate simulation, neither of which exists for many cases in the real world. Instead, world models are emerging as an alternative: leveraging offline, passively collected data, they make it possible to generate diverse worlds for training agents in simulation. In this work, we harness world models to generate imagined environments to train robust agents capable of generalizing to novel task variations. One of the challenges in doing this is ensuring the agent trains on useful generated data. We thus propose a novel approach, IMAC (Imagined Autocurricula), leveraging Unsupervised Environment Design (UED), which induces an automatic curriculum over generated worlds. In a series of challenging, procedurally generated environments, we show it is possible to achieve strong transfer performance on held-out environments, having trained only inside a world model learned from a narrower dataset. We believe this opens the path to utilizing larger-scale, foundation world models for generally capable agents.
【11】FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness
标题:FRIT:利用因果重要性提高思维链的忠实性
链接:https://arxiv.org/abs/2509.13334
作者:roop, Akshat Nallani, Saksham Uboweja, Adiliia Uzdenova, Michael Nguyen, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Vasu Sharma, Maheep Chaudhary
摘要:思维链(CoT)推理已成为提升大型语言模型在复杂任务上性能的强大工具,但近期工作表明,推理步骤往往无法对最终答案产生因果影响,从而产生脆弱且不可信的输出。先前的方法主要集中于衡量忠实性,而系统性提升忠实性的方法仍然有限。我们提出基于干预训练的忠实推理(FRIT),这是一种可扩展的对齐方法,通过从系统性破坏的示例中学习,训练模型产生因果一致的推理。FRIT通过干预模型生成的CoT中的单个推理步骤来生成合成训练数据,构造出在推理失效处形成对照的忠实/不忠实样本对。然后,我们应用直接偏好优化,教模型偏好因果一致的推理路径。在Qwen3-8B和Mistral-7B-v0.1上对事实与符号推理任务的评估表明,FRIT将Mistral在GSM8K上的忠实推理提高了3.4个百分点,同时将准确率提高了7.6个百分点。我们的方法提供了首个可扩展、无需额外监督的方法,训练语言模型产生更可靠、更可解释的推理,弥合了推理性能与可信度之间的关键差距。我们在 https://github.com/Anut-py/frit 发布代码。
摘要:Chain-of-thought (CoT) reasoning has emerged as a powerful tool for improving large language model performance on complex tasks, but recent work shows that reasoning steps often fail to causally influence the final answer, creating brittle and untrustworthy outputs. Prior approaches focus primarily on measuring faithfulness, while methods for systematically improving it remain limited. We introduce Faithful Reasoning via Intervention Training (FRIT), a scalable alignment method that trains models to produce causally consistent reasoning by learning from systematically corrupted examples. FRIT generates synthetic training data by intervening on individual reasoning steps in model-generated CoTs, creating faithful/unfaithful pairs that highlight when reasoning breaks down. We then apply Direct Preference Optimization to teach models to prefer causally consistent reasoning paths. Evaluating on Qwen3-8B and Mistral-7B-v0.1 across factual and symbolic reasoning tasks, FRIT increases faithful reasoning by $3.4$ percentage points for Mistral on GSM8K while improving accuracy by $7.6$ percentage points. Our approach provides the first scalable, supervision-free method for training language models to produce more reliable and interpretable reasoning, addressing a critical gap between reasoning performance and trustworthiness. We release our code at \href{https://github.com/Anut-py/frit}.
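A sketch of how a faithful/unfaithful preference pair might be constructed by intervening on one reasoning step; `corrupt` stands in for whatever rewriting model the authors use, and everything here is illustrative rather than the released FRIT code.
```python
import random

def make_preference_pair(cot_steps, corrupt):
    """cot_steps: list of reasoning-step strings from a model-generated CoT."""
    i = random.randrange(len(cot_steps))
    broken = list(cot_steps)
    broken[i] = corrupt(cot_steps[i])      # intervene on a single step
    chosen = " ".join(cot_steps)           # causally consistent chain
    rejected = " ".join(broken)            # chain with a broken step
    return chosen, rejected                # feed to DPO as (preferred, dispreferred)
```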
【12】Joint data imputation and mechanistic modelling for simulating heart-brain interactions in incomplete datasets
标题:用于模拟不完整数据集中的心肺相互作用的联合数据插补和机械建模
链接:https://arxiv.org/abs/2010.01052
作者:us, Maxime Sermesant, Oscar Camara, Marco Lorenzi
摘要:机理模型在临床研究中的应用受限于缺乏表征不同解剖和生理过程的多模态患者数据。例如,神经影像数据集没有为脑疾病中心血管因素的建模提供心脏特征的充分表征。为解决这一问题,我们引入一个用于心脏数据联合插补与心血管机理模型个性化的概率框架,并应用于心脏数据不完整的大脑研究。我们的方法基于一个变分框架,联合推断一个从可用特征插补心脏信息的模型,以及一个能够忠实再现个性化心血管动力学的高斯过程仿真器。在UK Biobank上的实验结果表明,我们的模型能够在仅包含极少心脏信息(例如仅有收缩压和舒张压)的数据集中准确插补缺失的心脏特征,同时联合估计集总模型的仿真参数。这使得我们能够通过模拟对应不同脑解剖条件的真实心脏动力学,对心-脑联合关系进行新颖的探索。
摘要:The use of mechanistic models in clinical studies is limited by the lack of multi-modal patient data representing different anatomical and physiological processes. For example, neuroimaging datasets do not provide a sufficient representation of heart features for the modeling of cardiovascular factors in brain disorders. To tackle this problem we introduce a probabilistic framework for joint cardiac data imputation and personalisation of cardiovascular mechanistic models, with application to brain studies with incomplete heart data. Our approach is based on a variational framework for the joint inference of an imputation model of cardiac information from the available features, along with a Gaussian Process emulator that can faithfully reproduce personalised cardiovascular dynamics. Experimental results on UK Biobank show that our model allows accurate imputation of missing cardiac features in datasets containing minimal heart information, e.g. systolic and diastolic blood pressures only, while jointly estimating the emulated parameters of the lumped model. This allows a novel exploration of the heart-brain joint relationship through simulation of realistic cardiac dynamics corresponding to different conditions of brain anatomy.
【13】Spacing Test for Fused Lasso
Link: https://arxiv.org/abs/2509.14229
Authors: aka, Tatsuya Kimura, Joe Suzuki
Abstract: This study addresses the unresolved problem of selecting the regularization parameter in the fused lasso. In particular, we extend the framework of the Spacing Test proposed by Tibshirani et al. to the fused lasso, providing a theoretical foundation for post-selection inference by characterizing the selection event as a polyhedral constraint. Based on an analysis of the solution path of the fused lasso using a LARS-type algorithm, we derive exact conditional $p$-values for the selected change-points. Our method broadens the applicability of the Spacing Test from the standard lasso to fused penalty structures. Furthermore, through numerical experiments comparing the proposed method with sequential versions of AIC and BIC as well as cross-validation, we demonstrate that the proposed approach properly controls the type I error while achieving high detection power. This work offers a theoretically sound and computationally practical solution for parameter selection and post-selection inference in structured signal estimation problems. Keywords: Fused Lasso, Regularization parameter selection, Spacing Test for Lasso, Selective inference, Change-point detection
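Post-selection inference with a polyhedral selection event reduces to tail probabilities of a truncated Gaussian. The sketch below computes such a conditional p-value given truncation limits; deriving those limits from the fused-lasso LARS-type path is the paper's contribution and is not reproduced here, so `v_minus`/`v_plus` are assumed given.

```python
from scipy.stats import norm

def truncated_gaussian_pvalue(x, mu, sigma, v_minus, v_plus):
    """P(Z >= x | v_minus <= Z <= v_plus) for Z ~ N(mu, sigma^2)."""
    a = norm.cdf((v_minus - mu) / sigma)
    b = norm.cdf((v_plus - mu) / sigma)
    c = norm.cdf((x - mu) / sigma)
    return (b - c) / (b - a)

# Example: statistic 2.1 with truncation region [1.8, inf) under the null N(0, 1).
print(truncated_gaussian_pvalue(2.1, 0.0, 1.0, 1.8, float("inf")))
```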
【14】On the Rate of Gaussian Approximation for Linear Regression Problems
Link: https://arxiv.org/abs/2509.14039
Authors: sainov, Marina Sheshukova, Alain Durmus, Sergey Samsonov
Abstract: In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate upon the problem dimension $d$ and quantities related to the design matrix. When the number of iterations $n$ is known in advance, our results yield a normal approximation rate of order $\sqrt{\log{n}/n}$, provided that the sample size $n$ is large enough.
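To illustrate the setting empirically, the sketch below runs constant-step-size SGD on a synthetic online linear regression problem and inspects the error of the averaged iterate, which should look approximately Gaussian at scale roughly $1/\sqrt{n}$. The use of Polyak-Ruppert averaging and all constants are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def run_sgd(n, d, alpha, rng):
    """Constant-step-size SGD on y = x @ theta_star + noise; returns the
    error of the running (Polyak-Ruppert) average of the iterates."""
    theta_star = np.ones(d)
    theta = np.zeros(d)
    avg = np.zeros(d)
    for t in range(1, n + 1):
        x = rng.normal(size=d)
        y = x @ theta_star + rng.normal()
        theta -= alpha * (x @ theta - y) * x   # LMS / SGD update
        avg += (theta - avg) / t               # running average of iterates
    return avg - theta_star

rng = np.random.default_rng(1)
errs = np.array([run_sgd(n=2000, d=5, alpha=0.05, rng=rng)[0] for _ in range(300)])
print(errs.mean(), errs.std())  # roughly zero mean, roughly 1/sqrt(n) scale
```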
【15】Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Link: https://arxiv.org/abs/2509.14016
Authors: hli, Brendan Tracey, Tomislav Andric, Christopher Wipf, Yu Him Justin Chiu, Matthias Lochbrunner, Craig Donner, Rana X. Adhikari, Jan Harms, Iain Barr, Roland Hafner, Andrea Huber, Abbas Abdolmaleki, Charlie Beattie, Joseph Betzwieser, Serkan Cabi, Jonas Degrave, Yuzhu Dong, Leslie Fritz, Anchal Gupta, Oliver Groth, Sandy Huang, Tamara Norman, Hannah Openshaw, Jameson Rollins, Greg Thornton, George Van Den Driessche, Markus Wulfmeier, Pushmeet Kohli, Martin Riedmiller, LIGO Instrument Team
Abstract: Improved low-frequency sensitivity of gravitational wave observatories would unlock the study of intermediate-mass black hole mergers and binary black hole eccentricity, and provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency-domain rewards. We proved our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10--30 Hz band by over 30x, and by up to 100x in sub-bands, surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.
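A frequency-domain reward can be as simple as penalizing the control signal's integrated power in the band of interest. The sketch below does this with a Welch PSD estimate over 10-30 Hz; the actual Deep Loop Shaping reward and the LIGO control setup are far more elaborate, so every constant here is an assumption.

```python
import numpy as np
from scipy.signal import welch

def band_power_reward(u, fs, lo=10.0, hi=30.0):
    """Reward that increases as control power in the [lo, hi] Hz band decreases."""
    f, psd = welch(u, fs=fs, nperseg=1024)
    band = (f >= lo) & (f <= hi)
    power = psd[band].sum() * (f[1] - f[0])   # integrate PSD over the band
    return -float(np.log10(power + 1e-20))    # higher reward = quieter band

rng = np.random.default_rng(0)
u = rng.normal(size=16384)  # stand-in control signal sampled at 2048 Hz
print(band_power_reward(u, fs=2048.0))
```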
【16】A reduced-order derivative-informed neural operator for subsurface fluid-flow
Link: https://arxiv.org/abs/2509.13620
Authors: (Jayjay)Park, Grant Bruer, Huseyin Tuna Erdinc, Abhinav Prakash Gahlot, Felix J. Herrmann
Abstract: Neural operators have emerged as cost-effective surrogates for expensive fluid-flow simulators, particularly in computationally intensive tasks such as permeability inversion from time-lapse seismic data and uncertainty quantification. In these applications, the fidelity of the surrogate's gradients with respect to system parameters is crucial, as the accuracy of downstream tasks, such as optimization and Bayesian inference, relies directly on the quality of the derivative information. Recent advances in physics-informed methods have leveraged derivative information to improve surrogate accuracy. However, incorporating explicit Jacobians can become computationally prohibitive, as the complexity typically scales quadratically with the number of input parameters. To address this limitation, we propose DeFINO (Derivative-based Fisher-score Informed Neural Operator), a reduced-order, derivative-informed training framework. DeFINO integrates Fourier neural operators (FNOs) with a novel derivative-based training strategy guided by the Fisher Information Matrix (FIM). By projecting Jacobians onto dominant eigen-directions identified by the FIM, DeFINO captures critical sensitivity information directly informed by observational data, significantly reducing computational expense. We validate DeFINO through synthetic experiments in the context of subsurface multi-phase fluid flow, demonstrating improvements in gradient accuracy while maintaining robust forward predictions of the underlying fluid dynamics. These results highlight DeFINO's potential to offer practical, scalable solutions for inversion problems in complex real-world scenarios, all at substantially reduced computational cost.
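The reduction idea can be sketched in a few lines of NumPy: form a Gauss-Newton approximation of the Fisher information, keep its top-$r$ eigen-directions, and supervise only the projected Jacobian. Shapes, the identity observation covariance, and $r$ are illustrative assumptions; the paper works with simulator Jacobians, not random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, r = 64, 512, 8              # observations, parameters, kept directions
J = rng.normal(size=(m, p))       # stand-in simulator Jacobian d(obs)/d(params)

# Gauss-Newton form of the Fisher information (identity obs. covariance assumed).
fim = J.T @ J
eigvals, eigvecs = np.linalg.eigh(fim)
V = eigvecs[:, -r:]               # dominant eigen-directions (largest eigenvalues)

J_reduced = J @ V                 # (m, r): Jacobian restricted to the subspace
# Training would match the surrogate's Jacobian times V against J_reduced,
# avoiding the full m-by-p Jacobian in the loss.
print(J_reduced.shape)
```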
【17】Unleashing the power of computational insights in revealing the complexity of biological systems in the new era of spatial multi-omics
Link: https://arxiv.org/abs/2509.13376
Authors: n, Tiangang Wang, Kexin Huang, Binwu Ying, Xiaobo Zhou
Notes: 43 pages, 9 figures, 1 table
Abstract: Recent advances in spatial omics technologies have revolutionized our ability to study biological systems with unprecedented resolution. By preserving the spatial context of molecular measurements, these methods enable comprehensive mapping of cellular heterogeneity, tissue architecture, and dynamic biological processes in developmental biology, neuroscience, oncology, and evolutionary studies. This review provides a systematic overview of the continuous advancements in both technology and computational algorithms that are paving the way for a deeper, more systematic comprehension of the structure and mechanisms of mammalian tissues and organs through spatial multi-omics. Our viewpoint demonstrates how advanced machine learning algorithms and multi-omics integrative modeling can decode complex biological processes, including the spatial organization and topological relationships of cells during organ development, as well as key molecular signatures and regulatory networks underlying tumorigenesis and metastasis. Finally, we outline future directions for technological innovation and for the modeling insights spatial omics can bring to precision medicine.
【18】Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion
Link: https://arxiv.org/abs/2509.13374
Authors: o, Junchi Shen
Notes: 28 pages, 12 figures
Abstract: This paper addresses the challenges of pricing exotic options and structured products, which traditional models often fail to handle due to their inability to capture real-world market phenomena such as fat-tailed distributions and volatility clustering. We introduce a Diffusion-Conditional Probability Model (DDPM) to generate more realistic price paths. Our method incorporates a composite loss function with financial-specific features, and we propose a P-Q dynamic game framework for evaluating the model's economic value through adversarial backtesting. Static validation shows that our P-model effectively matches market mean and volatility. In dynamic games, it demonstrates significantly higher profitability than a traditional Monte Carlo-based model for European and Asian options. However, the model shows limitations in pricing products highly sensitive to extreme events, such as snowballs and accumulators, because it tends to underestimate tail risks. The study concludes that diffusion models hold significant potential for enhancing pricing accuracy, though further research is needed to improve their ability to model extreme market risks.
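Once a generative model produces price paths, exotic payoffs are priced by discounted Monte Carlo averaging over those paths. In the sketch below, geometric Brownian motion paths stand in for the diffusion model's samples, and all parameters are illustrative; only the payoff-averaging step reflects the pricing logic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, T, r, s0, sigma = 20000, 252, 1.0, 0.03, 100.0, 0.2
dt = T / n_steps

# GBM paths as a stand-in for model-generated price paths, shape (n_paths, n_steps).
z = rng.normal(size=(n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = s0 * np.exp(log_paths)

# Discounted Monte Carlo prices: European call on the final price, Asian call
# on the arithmetic average price.
strike = 105.0
european = np.exp(-r * T) * np.maximum(paths[:, -1] - strike, 0.0).mean()
asian = np.exp(-r * T) * np.maximum(paths.mean(axis=1) - strike, 0.0).mean()
print(f"European call: {european:.3f}, Asian call: {asian:.3f}")
```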
【19】Benchmarking Dimensionality Reduction Techniques for Spatial Transcriptomics
Link: https://arxiv.org/abs/2509.13344
Authors: q Mahmud, Veena Kochat, Suresh Satpati, Jagan Mohan Reddy Dwarampudi, Kunal Rai, Tania Banerjee
Notes: Accepted to the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025); 10 pages, 4 figures
Abstract: We introduce a unified framework for evaluating dimensionality reduction techniques in spatial transcriptomics beyond standard PCA approaches. We benchmark six methods (PCA, NMF, an autoencoder, a VAE, and two hybrid embeddings) on a cholangiocarcinoma Xenium dataset, systematically varying latent dimensions ($k$=5-40) and clustering resolutions ($\rho$=0.1-1.2). Each configuration is evaluated using complementary metrics including reconstruction error, explained variance, cluster cohesion, and two novel biologically motivated measures: Cluster Marker Coherence (CMC) and Marker Exclusion Rate (MER). Our results demonstrate distinct performance profiles: PCA provides a fast baseline, NMF maximizes marker enrichment, the VAE balances reconstruction and interpretability, while autoencoders occupy a middle ground. We provide systematic hyperparameter selection using Pareto-optimal analysis and demonstrate how MER-guided reassignment improves biological fidelity across all methods, with CMC scores improving by up to 12% on average. This framework enables principled selection of dimensionality reduction methods tailored to specific spatial transcriptomics analyses.
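The benchmarking loop itself is straightforward: sweep latent dimensions, embed, cluster, score. The sketch below does this for two of the compared methods with a generic silhouette score on a random count-like matrix; real spatial-transcriptomics data, the hybrid embeddings, and the paper's CMC/MER metrics are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(500, 200)))  # stand-in non-negative expression matrix

# Sweep latent dimensions as in the paper's k = 5-40 grid.
for k in (5, 10, 20, 40):
    for name, model in (("PCA", PCA(n_components=k)),
                        ("NMF", NMF(n_components=k, max_iter=500))):
        Z = model.fit_transform(X)                       # embed
        labels = KMeans(n_clusters=8, n_init=10).fit_predict(Z)  # cluster
        print(name, k, round(silhouette_score(Z, labels), 3))    # score
```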