Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!
cs.LG: 124 papers today
Large models (16 papers)
【1】RescueLens: LLM-Powered Triage and Action on Volunteer Feedback for Food Rescue
Link: https://arxiv.org/abs/2511.15698
Authors: Naveen Raman, Jingwu Tang, Zhiyu Chen, Zheyuan Ryan Shi, Sean Hudson, Ameesh Kapoor, Fei Fang
Notes: Accepted at IAAI'26
Abstract: Food rescue organizations simultaneously tackle food insecurity and waste by working with volunteers to redistribute food from donors who have excess to recipients who need it. Volunteer feedback allows food rescue organizations to identify issues early and ensure volunteer satisfaction. However, food rescue organizations monitor feedback manually, which can be cumbersome and labor-intensive, making it difficult to prioritize which issues are most important. In this work, we investigate how large language models (LLMs) assist food rescue organizers in understanding and taking action based on volunteer experiences. We work with 412 Food Rescue, a large food rescue organization based in Pittsburgh, Pennsylvania, to design RescueLens, an LLM-powered tool that automatically categorizes volunteer feedback, suggests donors and recipients to follow up with, and updates volunteer directions based on feedback. We evaluate the performance of RescueLens on an annotated dataset, and show that it can recover 96% of volunteer issues at 71% precision. Moreover, by ranking donors and recipients according to their rates of volunteer issues, RescueLens allows organizers to focus on 0.5% of donors responsible for more than 30% of volunteer issues. RescueLens is now deployed at 412 Food Rescue and through semi-structured interviews with organizers, we find that RescueLens streamlines the feedback process so organizers better allocate their time.
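As a concrete illustration of the ranking analysis above, here is a minimal sketch (hypothetical column names and toy data; plain pandas, not the authors' code) that ranks donors by volunteer-issue rate and measures how much of the total issue volume the top-ranked donors account for:

```python
import pandas as pd

# Hypothetical feedback log: one row per completed rescue, with a flag
# set whenever the volunteer reported an issue at that donor.
feedback = pd.DataFrame({
    "donor_id": ["d1", "d1", "d2", "d3", "d3", "d3", "d4"],
    "has_issue": [1, 0, 0, 1, 1, 1, 0],
})

# Issue rate and issue count per donor.
per_donor = (feedback.groupby("donor_id")["has_issue"]
             .agg(issue_rate="mean", issue_count="sum"))
ranked = per_donor.sort_values("issue_rate", ascending=False)

# Cumulative share of all issues covered by the top-k donors, mirroring
# the "0.5% of donors account for >30% of issues" style of analysis.
ranked["cum_issue_share"] = ranked["issue_count"].cumsum() / ranked["issue_count"].sum()
print(ranked)
```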
【2】DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models
Link: https://arxiv.org/abs/2511.15669
Authors: Cheng Yin, Yankai Lin, Wang Xu, Sikyuen Tam, Xiangrui Zeng, Zhiyuan Liu, Zhouping Yin
Notes: 16 pages, 6 figures, conference
Abstract: Enabling Vision-Language-Action (VLA) models to "think before acting" via Chain-of-Thought (CoT) is a promising path to overcoming the data-hungry nature of end-to-end robot policies. However, progress is stalled by a fundamental conflict: existing models use a single autoregressive decoder for both sequential CoT reasoning and high-dimensional, parallelizable robot actions. This architectural mismatch degrades motor control and fails to forge a strong causal link between thought and action. We introduce DeepThinkVLA, which resolves this conflict through a tightly integrated architecture and training strategy. Architecturally, our hybrid-attention decoder generates sequential CoT with causal attention and then switches to bidirectional attention for fast, parallel decoding of action vectors. This design is complemented by a two-stage training pipeline: we first use Supervised Fine-Tuning (SFT) to teach the model foundational reasoning, then apply Reinforcement Learning (RL) with task-success rewards to causally align the full reasoning-action sequence with desired outcomes. This synergy leads to state-of-the-art performance, achieving a 97.0% success rate on the LIBERO benchmark. Our ablations confirm the design's effectiveness: the hybrid architecture alone outperforms standard decoders by 15.5%, and the final RL stage provides a crucial 2% boost to secure top performance.
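The hybrid-attention idea can be pictured as a block mask: causal within the CoT prefix, full bidirectional attention within the action chunk, with actions also seeing the whole prefix. A minimal sketch, assuming a flat [CoT | action] token layout (our simplification, not the paper's exact implementation):

```python
import torch

def hybrid_attention_mask(n_cot: int, n_act: int) -> torch.Tensor:
    """Boolean mask (True = may attend). Causal over the first n_cot
    reasoning tokens; the n_act action tokens attend to the full CoT
    prefix and to each other, enabling parallel action decoding."""
    n = n_cot + n_act
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Causal (lower-triangular) attention within the CoT prefix.
    mask[:n_cot, :n_cot] = torch.tril(torch.ones(n_cot, n_cot)).bool()
    # Action tokens see the entire CoT prefix...
    mask[n_cot:, :n_cot] = True
    # ...and attend bidirectionally among themselves.
    mask[n_cot:, n_cot:] = True
    return mask

print(hybrid_attention_mask(n_cot=4, n_act=3).int())
```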
【3】VisPlay: Self-Evolving Vision-Language Models from Images
Link: https://arxiv.org/abs/2511.15661
Authors: Yicheng He, Chengsong Huang, Zongxia Li, Jiaxin Huang, Yonghui Yang
Abstract: Reinforcement learning (RL) provides a principled framework for improving Vision-Language Models (VLMs) on complex reasoning tasks. However, existing RL approaches often rely on human-annotated labels or task-specific heuristics to define verifiable rewards, both of which are costly and difficult to scale. We introduce VisPlay, a self-evolving RL framework that enables VLMs to autonomously improve their reasoning abilities using large amounts of unlabeled image data. Starting from a single base VLM, VisPlay assigns the model into two interacting roles: an Image-Conditioned Questioner that formulates challenging yet answerable visual questions, and a Multimodal Reasoner that generates silver responses. These roles are jointly trained with Group Relative Policy Optimization (GRPO), which incorporates diversity and difficulty rewards to balance the complexity of generated questions with the quality of the silver answers. VisPlay scales efficiently across two model families. When trained on Qwen2.5-VL and MiMo-VL, VisPlay achieves consistent improvements in visual reasoning, compositional generalization, and hallucination reduction across eight benchmarks, including MM-Vet and MMMU, demonstrating a scalable path toward self-evolving multimodal intelligence. The project page is available at https://bruno686.github.io/VisPlay/
【4】D4C: Data-free Quantization for Contrastive Language-Image Pre-training Models
Link: https://arxiv.org/abs/2511.15411
Authors: Wenlun Zhang, Yunshan Zhong, Zihao Ding, Xinyu Li, Kentaro Yoshioka
Abstract: Data-Free Quantization (DFQ) offers a practical solution for model compression without requiring access to real data, making it particularly attractive in privacy-sensitive scenarios. While DFQ has shown promise for unimodal models, its extension to Vision-Language Models such as Contrastive Language-Image Pre-training (CLIP) models remains underexplored. In this work, we reveal that directly applying existing DFQ techniques to CLIP results in substantial performance degradation due to two key limitations: insufficient semantic content and low intra-image diversity in synthesized samples. To tackle these challenges, we propose D4C, the first DFQ framework tailored for CLIP. D4C synthesizes semantically rich and structurally diverse pseudo images through three key components: (1) Prompt-Guided Semantic Injection aligns generated images with real-world semantics using text prompts; (2) Structural Contrastive Generation reproduces compositional structures of natural images by leveraging foreground-background contrastive synthesis; and (3) Perturbation-Aware Enhancement applies controlled perturbations to improve sample diversity and robustness. These components jointly empower D4C to synthesize images that are both semantically informative and structurally diverse, effectively bridging the performance gap of DFQ on CLIP. Extensive experiments validate the effectiveness of D4C, showing significant performance improvements on various bit-widths and models. For example, under the W4A8 setting with CLIP ResNet-50 and ViT-B/32, D4C achieves Top-1 accuracy improvement of 12.4% and 18.9% on CIFAR-10, 6.8% and 19.7% on CIFAR-100, and 1.4% and 5.7% on ImageNet-1K in zero-shot classification, respectively.
【5】Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction
Link: https://arxiv.org/abs/2511.15357
Authors: Yinan Yu, Falk Dippel, Christina E. Lundberg, Martin Lindgren, Annika Rosengren, Martin Adiels, Helen Sjöland
Abstract: Objective: Machine learning (ML) predictive models are often developed without considering downstream value trade-offs and clinical interpretability. This paper introduces a cost-aware prediction (CAP) framework that combines cost-benefit analysis assisted by large language model (LLM) agents to communicate the trade-offs involved in applying ML predictions. Materials and Methods: We developed an ML model predicting 1-year mortality in patients with heart failure (N = 30,021, 22% mortality) to identify those eligible for home care. We then introduced clinical impact projection (CIP) curves to visualize important cost dimensions - quality of life and healthcare provider expenses, further divided into treatment and error costs, to assess the clinical consequences of predictions. Finally, we used four LLM agents to generate patient-specific descriptions. The system was evaluated by clinicians for its decision support value. Results: The eXtreme gradient boosting (XGB) model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.804 (95% confidence interval (CI) 0.792-0.816), area under the precision-recall curve (AUPRC) of 0.529 (95% CI 0.502-0.558) and a Brier score of 0.135 (95% CI 0.130-0.140). Discussion: The CIP cost curves provided a population-level overview of cost composition across decision thresholds, whereas the LLM agents generated cost-benefit analyses at the individual patient level. The system was well received according to the evaluation by clinicians. However, feedback emphasized the need to strengthen the technical accuracy for speculative tasks. Conclusion: CAP utilizes LLM agents to integrate ML classifier outcomes and cost-benefit analysis for more transparent and interpretable decision support.
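To make the CIP curves concrete, a hedged sketch: sweep the decision threshold and split the expected per-patient cost into a treatment component (everyone flagged) and an error component (missed deaths and false alarms). The unit costs below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cip_curve(y_true, p_pred, c_treat=1.0, c_fn=10.0, c_fp=2.0):
    """Expected per-patient cost at each threshold, decomposed into
    treatment cost and error cost (false negatives + false positives).
    Unit costs are made-up placeholders for illustration."""
    rows = []
    for t in np.linspace(0.0, 1.0, 101):
        flag = p_pred >= t
        treat = c_treat * flag.mean()
        err = (c_fn * ((y_true == 1) & ~flag).mean()
               + c_fp * ((y_true == 0) & flag).mean())
        rows.append((t, treat, err, treat + err))
    return np.array(rows)  # columns: threshold, treatment, error, total

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
p = np.clip(0.4 * y + 0.6 * rng.random(500), 0.0, 1.0)
curve = cip_curve(y, p)
t_best = curve[curve[:, 3].argmin(), 0]
print(f"cost-minimizing threshold: {t_best:.2f}")
```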
【6】EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Link: https://arxiv.org/abs/2511.15248
Authors: Kai Yang, Xin Xu, Yangkun Chen, Weijie Liu, Jiafei Lyu, Zichuan Lin, Deheng Ye, Saiyong Yang
Abstract: Long-term training of large language models (LLMs) requires maintaining stable exploration to prevent the model from collapsing into sub-optimal behaviors. Entropy is crucial in this context, as it controls exploration and helps avoid premature convergence to sub-optimal solutions. However, existing reinforcement learning methods struggle to maintain an appropriate level of entropy, as the training process involves a mix of positive and negative samples, each affecting entropy in different ways across steps. To address this, we propose Entropy stabilization via Proportional-Integral Control (EntroPIC), a novel method that adaptively adjusts the influence of positive and negative samples by dynamically tuning their loss coefficients. This approach stabilizes entropy throughout training, ensuring efficient exploration and steady progress. We provide a comprehensive theoretical analysis for both on-policy and off-policy learning settings, demonstrating that EntroPIC is effective at controlling entropy in large-scale LLM training. Experimental results show that our method successfully maintains desired entropy levels, enabling stable and optimal RL training for LLMs.
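At its core the method is a proportional-integral controller on the entropy error, with the controller output used as a loss coefficient. A minimal sketch, assuming a single scalar coefficient on an entropy-related loss term (gains and symbols are illustrative, not the paper's hyperparameters):

```python
class EntropyPIController:
    """Adjust a loss coefficient so that measured policy entropy tracks
    a target. Positive error (entropy below target) raises the
    coefficient; the integral term removes steady-state offset."""

    def __init__(self, target_entropy: float, kp: float = 0.1, ki: float = 0.01):
        self.target = target_entropy
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def update(self, measured_entropy: float) -> float:
        error = self.target - measured_entropy
        self.integral += error
        return self.kp * error + self.ki * self.integral

ctrl = EntropyPIController(target_entropy=2.0)
for h in [2.5, 2.2, 1.9, 1.7]:                # simulated entropy readings
    coef = ctrl.update(h)
    # total_loss = policy_loss + coef * entropy_term   (schematic)
    print(f"entropy={h:.1f} -> coefficient={coef:+.3f}")
```

The paper applies the controller to the loss coefficients of positive and negative samples separately; the sketch shows only the control law itself.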
【7】Reasoning in Diffusion Large Language Models is Concentrated in Dynamic Confusion Zones
Link: https://arxiv.org/abs/2511.15208
Authors: Ranfei Chen, Ming Chen, Kaifei Wang
Abstract: Diffusion Large Language Models (dLLMs) are rapidly emerging alongside autoregressive models as a powerful paradigm for complex reasoning, with reinforcement learning increasingly used for downstream alignment. Existing trajectory-based RL methods uniformly allocate policy gradients across denoising steps, implicitly treating all steps as equally important. We challenge this assumption by analyzing trajectories with several step-level metrics: entropy-based uncertainty, Confidence-Margin (CM) uncertainty, and Rate of Entropy Change (RoEC). These reveal structured "zones of confusion": transient spikes in uncertainty and instability that strongly predict final success or failure, while most steps remain stable. We propose Adaptive Trajectory Policy Optimization (ATPO), a lightweight step-selection strategy that dynamically reallocates gradient updates to these high-leverage steps without changing the RL objective, rewards, or compute budget. Using a hybrid RoEC+CM rule, ATPO delivers substantial gains in reasoning accuracy and training stability across benchmarks, showing that exploiting trajectory dynamics is key to advancing dLLM RL.
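A sketch of the three step-level diagnostics, assuming per-step next-token logits are available; the flagging thresholds at the end are illustrative stand-ins for the paper's hybrid RoEC+CM rule:

```python
import torch

def step_metrics(logits_per_step):
    """logits_per_step: list of [vocab]-shaped tensors, one per denoising
    step. Returns per-step entropy, confidence margin (top-1 minus top-2
    probability), and rate of entropy change (RoEC)."""
    entropy, margin = [], []
    for logits in logits_per_step:
        p = torch.softmax(logits, dim=-1)
        entropy.append(-(p * p.clamp_min(1e-12).log()).sum())
        top2 = p.topk(2).values
        margin.append(top2[0] - top2[1])   # small margin = high uncertainty
    entropy, margin = torch.stack(entropy), torch.stack(margin)
    roec = torch.zeros_like(entropy)
    roec[1:] = entropy[1:] - entropy[:-1]
    return entropy, margin, roec

H, cm, roec = step_metrics([torch.randn(50) for _ in range(8)])
# Flag "confusion zone" steps: unstable entropy plus a weak margin.
flagged = (roec.abs() > roec.abs().mean()) & (cm < cm.median())
print(flagged)
```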
【8】Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs
Link: https://arxiv.org/abs/2511.15163
Authors: Yang Wu, Rujing Yao, Tong Zhang, Yufei Shi, Zhuoren Jiang, Zhushan Li, Xiaozhong Liu
Notes: AAAI 2026 Workshop
Abstract: Large Language Models (LLMs) are increasingly integrated into intelligent tutoring systems to provide human-like and adaptive instruction. However, most existing approaches fail to capture how students' knowledge evolves dynamically across their proficiencies, conceptual gaps, and forgetting patterns. This challenge is particularly acute in mathematics tutoring, where effective instruction requires fine-grained scaffolding precisely calibrated to each student's mastery level and cognitive retention. To address this issue, we propose TASA (Teaching According to Students' Aptitude), a student-aware tutoring framework that integrates persona, memory, and forgetting dynamics for personalized mathematics learning. Specifically, TASA maintains a structured student persona capturing proficiency profiles and an event memory recording prior learning interactions. By incorporating a continuous forgetting curve with knowledge tracing, TASA dynamically updates each student's mastery state and generates contextually appropriate, difficulty-calibrated questions and explanations. Empirical results demonstrate that TASA achieves superior learning outcomes and more adaptive tutoring behavior compared to representative baselines, underscoring the importance of modeling temporal forgetting and learner profiles in LLM-based tutoring systems.
【9】From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Link: https://arxiv.org/abs/2511.15137
Authors: Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He
Abstract: The reasoning capabilities of large language models (LLMs) have been significantly improved through reinforcement learning (RL). Nevertheless, LLMs still struggle to consistently verify their own reasoning traces. This raises the research question of how to enhance the self-verification ability of LLMs and whether such an ability can further improve reasoning performance. In this work, we propose GRPO-Verif, an algorithm that jointly optimizes solution generation and self-verification within a unified loss function, with an adjustable hyperparameter controlling the weight of the verification signal. Experimental results demonstrate that our method enhances self-verification capability while maintaining comparable performance in reasoning.
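The unified objective reduces to a weighted sum of the two losses; a schematic sketch (the function name and the lam weight are our notation, not the paper's):

```python
import torch

def grpo_verif_loss(gen_loss: torch.Tensor,
                    verif_loss: torch.Tensor,
                    lam: float = 0.5) -> torch.Tensor:
    """Joint objective: solution-generation loss plus a weighted
    self-verification loss; lam controls the verification signal."""
    return gen_loss + lam * verif_loss

# Schematic usage with placeholder scalar losses.
loss = grpo_verif_loss(torch.tensor(1.3), torch.tensor(0.7), lam=0.5)
print(loss.item())  # 1.65
```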
【10】Beyond GeneGPT: A Multi-Agent Architecture with Open-Source LLMs for Enhanced Genomic Question Answering
Link: https://arxiv.org/abs/2511.15061
Authors: Haodong Chen, Guido Zuccon, Teerapong Leelanupab
Notes: This paper has been accepted to SIGIR-AP 2025
Abstract: Genomic question answering often requires complex reasoning and integration across diverse biomedical sources. GeneGPT addressed this challenge by combining domain-specific APIs with OpenAI's code-davinci-002 large language model to enable natural language interaction with genomic databases. However, its reliance on a proprietary model limits scalability, increases operational costs, and raises concerns about data privacy and generalization. In this work, we revisit and reproduce GeneGPT in a pilot study using open source models, including Llama 3.1, Qwen2.5, and Qwen2.5 Coder, within a monolithic architecture; this allows us to identify the limitations of this approach. Building on this foundation, we then develop OpenBioLLM, a modular multi-agent framework that extends GeneGPT by introducing agent specialization for tool routing, query generation, and response validation. This enables coordinated reasoning and role-based task execution. OpenBioLLM matches or outperforms GeneGPT on over 90% of the benchmark tasks, achieving average scores of 0.849 on GeneTuring and 0.830 on GeneHop, while using smaller open-source models without additional fine-tuning or tool-specific pretraining. OpenBioLLM's modular multi-agent design reduces latency by 40-50% across benchmark tasks, significantly improving efficiency without compromising model capability. The results of our comprehensive evaluation highlight the potential of open-source multi-agent systems for genomic question answering. Code and resources are available at https://github.com/ielab/OpenBioLLM.
【11】MermaidSeqBench: An Evaluation Benchmark for LLM-to-Mermaid Sequence Diagram Generation
Link: https://arxiv.org/abs/2511.14967
Authors: Basel Shbita, Farhan Ahmed, Chad DeLuca
Abstract: Large language models (LLMs) have demonstrated excellent capabilities in generating structured diagrams from natural language descriptions. In particular, they have shown great promise in generating sequence diagrams for software engineering, typically represented in a text-based syntax such as Mermaid. However, systematic evaluations in this space remain underdeveloped as there is a lack of existing benchmarks to assess the LLM's correctness in this task. To address this shortcoming, we introduce MermaidSeqBench, a human-verified and LLM-synthetically-extended benchmark for assessing an LLM's capabilities in generating Mermaid sequence diagrams from textual prompts. The benchmark consists of a core set of 132 samples, starting from a small set of manually crafted and verified flows. These were expanded via a hybrid methodology combining human annotation, in-context LLM prompting, and rule-based variation generation. Our benchmark uses an LLM-as-a-judge model to assess Mermaid sequence diagram generation across fine-grained metrics, including syntax correctness, activation handling, error handling, and practical usability. We perform initial evaluations on numerous state-of-the-art LLMs and utilize multiple LLM judge models to demonstrate the effectiveness and flexibility of our benchmark. Our results reveal significant capability gaps across models and evaluation modes. Our proposed benchmark provides a foundation for advancing research in structured diagram generation and for developing more rigorous, fine-grained evaluation methodologies.
【12】How to Train Private Clinical Language Models: A Comparative Study of Privacy-Preserving Pipelines for ICD-9 Coding
Link: https://arxiv.org/abs/2511.14936
Authors: Mathieu Dufour, Andrew Duncan
Notes: 10 pages, 5 figures. Accepted to the Privacy-Preserving Machine Learning Workshop at EurIPS 2025
Abstract: Large language models trained on clinical text risk exposing sensitive patient information, yet differential privacy (DP) methods often severely degrade the diagnostic accuracy needed for deployment. Despite rapid progress in DP optimisation and text generation, it remains unclear which privacy-preserving strategy actually works best for clinical language tasks. We present the first systematic head-to-head comparison of four training pipelines for automated diagnostic coding from hospital discharge summaries. All pipelines use identical 1B-parameter models and matched privacy budgets to predict ICD-9 codes. At moderate and relaxed privacy budgets ($\varepsilon \in \{4, 6\}$), knowledge distillation from DP-trained teachers outperforms both direct DP-SGD and DP-synthetic data training, recovering up to 63% of the non-private performance whilst maintaining strong empirical privacy (membership-inference AUC $\approx$ 0.5). These findings expose large differences in the privacy-utility trade-off across architectures and identify knowledge distillation as the most practical route to privacy-preserving clinical NLP.
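The distillation stage of the winning pipeline can be sketched with standard temperature-scaled distillation; this assumes a single-label setup for brevity (ICD coding is typically multi-label, which would swap in BCE-style terms), with the teacher presumed DP-trained upstream:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.7):
    """Soft-target KL against the (DP-trained) teacher plus hard-label
    cross-entropy; T is the temperature, alpha the soft/hard mix."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y).item())
```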
【13】On-Premise SLMs vs. Commercial LLMs: Prompt Engineering and Incident Classification in SOCs and CSIRTs
Link: https://arxiv.org/abs/2511.14908
Authors: Gefté Almeida, Marcio Pohlmann, Alex Severo, Diego Kreutz, Tiago Heinrich, Lourenço Pereira
Notes: 5 pages, 3 figures, 3 tables, submitted to ERRC/WRSeg 2025
Abstract: In this study, we evaluate open-source models for security incident classification, comparing them with proprietary models. We utilize a dataset of anonymized real incidents, categorized according to the NIST SP 800-61r3 taxonomy and processed using five prompt-engineering techniques (PHP, SHP, HTP, PRP, and ZSL). The results indicate that, although proprietary models still exhibit higher accuracy, locally deployed open-source models provide advantages in privacy, cost-effectiveness, and data sovereignty.
【14】It's LIT! Reliability-Optimized LLMs with Inspectable Tools
Link: https://arxiv.org/abs/2511.14903
Authors: Ruixin Zhang, Jon Donnelly, Zhicheng Guo, Ghazal Khalighinejad, Haiyang Huang, Alina Jade Barnett, Cynthia Rudin
Notes: Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on Multi-Turn Interactions in Large Language Models
Abstract: Large language models (LLMs) have exhibited remarkable capabilities across various domains. The ability to call external tools further expands their capability to handle real-world tasks. However, LLMs often follow an opaque reasoning process, which limits their usefulness in high-stakes domains where solutions need to be trustworthy to end users. LLMs can choose solutions that are unreliable and difficult to troubleshoot, even if better options are available. We address this issue by forcing LLMs to use external -- more reliable -- tools to solve problems when possible. We present a framework built on the tool-calling capabilities of existing LLMs to enable them to select the most reliable and easy-to-troubleshoot solution path, which may involve multiple sequential tool calls. We refer to this framework as LIT (LLMs with Inspectable Tools). In order to support LIT, we introduce a new and challenging benchmark dataset of 1,300 questions and a customizable set of reliability cost functions associated with a collection of specialized tools. These cost functions summarize how reliable each tool is and how easy it is to troubleshoot. For instance, a calculator is reliable across domains, whereas a linear prediction model is not reliable if there is distribution shift, but it is easy to troubleshoot. A tool that constructs a random forest is neither reliable nor easy to troubleshoot. These tools interact with the Harvard USPTO Patent Dataset and a new dataset of NeurIPS 2023 papers to solve mathematical, coding, and modeling problems of varying difficulty levels. We demonstrate that LLMs can achieve more reliable and informed problem-solving while maintaining task performance using our framework.
【15】Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings
Link: https://arxiv.org/abs/2511.14868
Authors: Xueying Ding, Xingyue Huang, Mingxuan Ju, Liam Collins, Yozen Liu, Leman Akoglu, Neil Shah, Tong Zhao
Abstract: Large language models produce powerful text embeddings, but their causal attention mechanism restricts the flow of information from later to earlier tokens, degrading representation quality. While recent methods attempt to solve this by prepending a single summary token, they over-compress information, hence harming performance on long documents. We propose Hierarchical Token Prepending (HTP), a method that resolves two critical bottlenecks. To mitigate attention-level compression, HTP partitions the input into blocks and prepends block-level summary tokens to subsequent blocks, creating multiple pathways for backward information flow. To address readout-level over-squashing, we replace last-token pooling with mean-pooling, a choice supported by theoretical analysis. HTP achieves consistent performance gains across 11 retrieval datasets and 30 general embedding benchmarks, especially in long-context settings. As a simple, architecture-agnostic method, HTP enhances both zero-shot and finetuned models, offering a scalable route to superior long-document embeddings.
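A rough sketch of the two fixes under simplifying assumptions: block summaries are plain mean vectors inserted into the flattened sequence so that later blocks can attend to them under causal attention, and readout is mean pooling. The real method prepends summary tokens at the input; this shows only the shape of the idea:

```python
import torch

def htp_augment(token_embs: torch.Tensor, block_size: int) -> torch.Tensor:
    """Split a [seq, dim] embedding sequence into blocks and insert a
    summary vector (here: the block mean) after each block, giving
    subsequent blocks a short path to earlier information."""
    blocks = token_embs.split(block_size, dim=0)
    out = []
    for block in blocks:
        out.append(block)
        out.append(block.mean(dim=0, keepdim=True))  # block-level summary
    return torch.cat(out[:-1], dim=0)  # the last block needs no summary

def mean_pool_readout(hidden: torch.Tensor) -> torch.Tensor:
    """Replace last-token pooling with mean pooling over all positions."""
    return hidden.mean(dim=0)

x = torch.randn(12, 8)                # 12 tokens, hidden dim 8
aug = htp_augment(x, block_size=4)    # 12 tokens + 2 summaries = 14 rows
print(aug.shape, mean_pool_readout(aug).shape)
```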
【16】DEVAL: A Framework for Evaluating and Improving the Derivation Capability of Large Language Models
Link: https://arxiv.org/abs/2511.14813
Authors: Yifan Li, Qin Li, Min Zhang, Min Zhang, Peixin Wang
Abstract: Assessing the reasoning ability of Large Language Models (LLMs) over data remains an open and pressing research question. Compared with LLMs, human reasoning can derive corresponding modifications to the output based on certain kinds of changes to the input. This reasoning pattern, which relies on abstract rules that govern relationships between changes of data, has not been comprehensively described or evaluated in LLMs. In this paper, we formally define this reasoning pattern as the Derivation Relation (DR) and introduce the concept of Derivation Capability (DC), i.e. applying DR by making the corresponding modification to the output whenever the input takes certain changes. To assess DC, a systematically constructed evaluation framework named DEVAL is proposed and used to evaluate five popular LLMs and one Large Reasoning Model in seven mainstream tasks. The evaluation results show that mainstream LLMs, such as GPT-4o and Claude3.5, exhibit moderate DR recognition capabilities but reveal significant drop-offs on applying DR effectively in problem-solving scenarios. To improve this, we propose a novel prompt engineering approach called Derivation Prompting (DP). It achieves an average improvement of 15.2% in DC for all tested LLMs, outperforming commonly used prompt engineering techniques.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)
【1】SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome
Link: https://arxiv.org/abs/2511.15464
Authors: Dabin Jeong, Amirhossein Vahidi, Ciro Ramírez-Suástegui, Marie Moullet, Kevin Ly, Mohammad Vali Sanian, Sebastian Birk, Yinshui Chang, Adam Boxall, Daniyal Jafree, Lloyd Steele, Vijaya Baskar MS, Muzlifah Haniffa, Mohammad Lotfollahi
Abstract: Recent advances in computational pathology have leveraged vision-language models to learn joint representations of Hematoxylin and Eosin (HE) images with spatial transcriptomic (ST) profiles. However, existing approaches typically align HE tiles with their corresponding ST profiles at a single scale, overlooking fine-grained cellular structures and their spatial organization. To address this, we propose Sigmma, a multi-modal contrastive alignment framework for learning hierarchical representations of HE images and spatial transcriptome profiles across multiple scales. Sigmma introduces multi-scale contrastive alignment, ensuring that representations learned at different scales remain coherent across modalities. Furthermore, by representing cell interactions as a graph and integrating inter- and intra-subgraph relationships, our approach effectively captures cell-cell interactions, ranging from fine to coarse, within the tissue microenvironment. We demonstrate that Sigmma learns representations that better capture cross-modal correspondences, leading to an improvement of avg. 9.78% in the gene-expression prediction task and avg. 26.93% in the cross-modal retrieval task across datasets. We further show that it learns meaningful multi-tissue organization in downstream analyses.
【2】Graph Query Networks for Object Detection with Automotive Radar
Link: https://arxiv.org/abs/2511.15271
Authors: Loveneet Saini, Hasan Tercan, Tobias Meisen
Notes: Accepted in WACV 2026 Main Conference
Abstract: Object detection with 3D radar is essential for 360-degree automotive perception, but radar's long wavelengths produce sparse and irregular reflections that challenge traditional grid and sequence-based convolutional and transformer detectors. This paper introduces Graph Query Networks (GQN), an attention-based framework that models objects sensed by radar as graphs, to extract individualized relational and contextual features. GQN employs a novel concept of graph queries to dynamically attend over the bird's-eye view (BEV) space, constructing object-specific graphs processed by two novel modules: EdgeFocus for relational reasoning and DeepContext Pooling for contextual aggregation. On the NuScenes dataset, GQN improves relative mAP by up to +53%, including a +8.2% gain over the strongest prior radar method, while reducing peak graph construction overhead by 80% with moderate FLOPs cost.
【3】D2D Power Allocation via Quantum Graph Neural Network
Link: https://arxiv.org/abs/2511.15246
Authors: Tung Giang Le, Xuan Tung Nguyen, Won-Joo Hwang
Abstract: Increasing wireless network complexity demands scalable resource management. Classical GNNs excel at graph learning but incur high computational costs in large-scale settings. We present a fully quantum Graph Neural Network (QGNN) that implements message passing via Parameterized Quantum Circuits (PQCs). Our Quantum Graph Convolutional Layers (QGCLs) encode features into quantum states, process graphs with NISQ-compatible unitaries, and retrieve embeddings through measurement. Applied to D2D power control for SINR maximization, our QGNN matches classical performance with fewer parameters and inherent parallelism. This end-to-end PQC-based GNN marks a step toward quantum-accelerated wireless optimization.
【4】Vehicle Routing Problems via Quantum Graph Attention Network Deep Reinforcement Learning
Link: https://arxiv.org/abs/2511.15175
Authors: Le Tung Giang, Vu Hoang Viet, Nguyen Xuan Tung, Trinh Van Chien, Won-Joo Hwang
Notes: 11 pages, 3 figures, 2 tables. Accepted by SOICT 2025
Abstract: The vehicle routing problem (VRP) is a fundamental NP-hard task in intelligent transportation systems with broad applications in logistics and distribution. Deep reinforcement learning (DRL) with Graph Neural Networks (GNNs) has shown promise, yet classical models rely on large multi-layer perceptrons (MLPs) that are parameter-heavy and memory-bound. We propose a Quantum Graph Attention Network (Q-GAT) within a DRL framework, where parameterized quantum circuits (PQCs) replace conventional MLPs at critical readout stages. The hybrid model maintains the expressive capacity of graph attention encoders while reducing trainable parameters by more than 50%. Using proximal policy optimization (PPO) with greedy and stochastic decoding, experiments on VRP benchmarks show that Q-GAT achieves faster convergence and reduces routing cost by about 5% compared with classical GAT baselines. These results demonstrate the potential of PQC-enhanced GNNs as compact and effective solvers for large-scale routing and logistics optimization.
【5】Knowledge Graphs as Structured Memory for Embedding Spaces: From Training Clusters to Explainable Inference
Link: https://arxiv.org/abs/2511.14961
Authors: Artur A. Oliveira, Mateus Espadoto, Roberto M. Cesar, Roberto Hirata
Notes: Submitted to GRIVAPP 2026 (21st International Conference on Computer Graphics, Interaction, Visualization Theory and Applications), Marbella, Spain, March 9-11 2026
Abstract: We introduce Graph Memory (GM), a structured non-parametric framework that augments embedding-based inference with a compact, relational memory over region-level prototypes. Rather than treating each training instance in isolation, GM summarizes the embedding space into prototype nodes annotated with reliability indicators and connected by edges that encode geometric and contextual relations. This design unifies instance retrieval, prototype-based reasoning, and graph-based label propagation within a single inductive model that supports both efficient inference and faithful explanation. Experiments on synthetic and real datasets including breast histopathology (IDC) show that GM achieves accuracy competitive with $k$NN and Label Spreading while offering substantially better calibration and smoother decision boundaries, all with an order of magnitude fewer samples. By explicitly modeling reliability and relational structure, GM provides a principled bridge between local evidence and global consistency in non-parametric learning.
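The overall shape of the pipeline can be approximated with off-the-shelf pieces: k-means prototypes, a crude purity score standing in for GM's reliability indicators, and label spreading over the prototypes. This is a loose analogue for orientation, not the paper's model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(3, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

# Summarize the embedding space into prototype nodes.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
protos = km.cluster_centers_

# Majority label per prototype; low-purity prototypes stay unlabeled (-1),
# a rough stand-in for per-prototype reliability.
proto_labels = np.full(10, -1)
for k in range(10):
    members = y[km.labels_ == k]
    if len(members) and np.bincount(members).max() / len(members) > 0.8:
        proto_labels[k] = np.bincount(members).argmax()

# Propagate labels over the prototype graph and classify a query by its
# nearest prototype's propagated label.
ls = LabelSpreading(kernel="rbf", gamma=0.5).fit(protos, proto_labels)
query = rng.normal(3, 1, (1, 16))
nearest = np.linalg.norm(protos - query, axis=1).argmin()
print(ls.transduction_[nearest])
```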
【6】Integrating Causal Inference with Graph Neural Networks for Alzheimer's Disease Analysis
Link: https://arxiv.org/abs/2511.14922
Authors: Pranay Kumar Peddi, Dhrubajyoti Ghosh
Abstract: Deep graph learning has advanced Alzheimer's disease (AD) classification from MRI, but most models remain correlational, confounding demographic and genetic factors with disease-specific features. We present Causal-GCN, an interventional graph convolutional framework that integrates do-calculus-based back-door adjustment to identify brain regions exerting stable causal influence on AD progression. Each subject's MRI is represented as a structural connectome where nodes denote cortical and subcortical regions and edges encode anatomical connectivity. Confounders such as age, sex, and APOE4 genotype are summarized via principal components and included in the causal adjustment set. After training, interventions on individual regions are simulated by severing their incoming edges and altering node features to estimate average causal effects on disease probability. Applied to 484 subjects from the ADNI cohort, Causal-GCN achieves performance comparable to baseline GNNs while providing interpretable causal effect rankings that highlight posterior, cingulate, and insular hubs consistent with established AD neuropathology.
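The intervention step amounts to graph surgery: sever a region's incoming edges, ablate its features, and compare predicted disease probabilities before and after. A schematic sketch in which `model` stands in for any trained classifier (the toy callable below is ours):

```python
import numpy as np

def causal_effect(model, adj: np.ndarray, feats: np.ndarray, region: int) -> float:
    """Simulate do(region): zero the region's incoming edges and its node
    features, then measure the change in predicted disease probability."""
    p_obs = model(adj, feats)
    adj_do, feats_do = adj.copy(), feats.copy()
    adj_do[region, :] = 0.0      # sever incoming edges to `region`
    feats_do[region] = 0.0       # ablate the region's features
    return float(p_obs - model(adj_do, feats_do))

# Toy stand-in model: P(AD) rises with overall weighted connectivity.
toy = lambda A, X: 1.0 / (1.0 + np.exp(-(0.01 * A.sum() + X.mean())))
A = np.random.default_rng(1).random((5, 5))
X = np.random.default_rng(2).random((5, 3))
effects = {r: causal_effect(toy, A, X, r) for r in range(5)}
print(max(effects, key=effects.get))  # region with the largest estimated effect
```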
【7】Exact Learning of Weighted Graphs Using Composite Queries
Link: https://arxiv.org/abs/2511.14882
Authors: Michael T. Goodrich, Songyu Liu, Ioannis Panageas
Notes: Full version of the paper published at IWOCA 2025
Abstract: In this paper, we study the exact learning problem for weighted graphs, where we are given the vertex set, $V$, of a weighted graph, $G=(V,E,w)$, but we are not given $E$. The problem, which is also known as graph reconstruction, is to determine all the edges of $E$, including their weights, by asking queries about $G$ from an oracle. As we observe, using simple shortest-path length queries is not sufficient, in general, to learn a weighted graph. So we study a number of scenarios where it is possible to learn $G$ using a subquadratic number of composite queries, which combine two or three simple queries.
【8】Resource-Based Time and Cost Prediction in Project Networks: From Statistical Modeling to Graph Neural Networks
Link: https://arxiv.org/abs/2511.15003
Authors: Reza Mirjalili, Behrad Braghi, Shahram Shadrokh Sikari
Notes: 52 pages, 12 figures
Abstract: Accurate prediction of project duration and cost remains one of the most challenging aspects of project management, particularly in resource-constrained and interdependent task networks. Traditional analytical techniques such as the Critical Path Method (CPM) and Program Evaluation and Review Technique (PERT) rely on simplified and often static assumptions regarding task interdependencies and resource performance. This study proposes a novel resource-based predictive framework that integrates network representations of project activities with graph neural networks (GNNs) to capture structural and contextual relationships among tasks, resources, and time-cost dynamics. The model represents the project as a heterogeneous activity-resource graph in which nodes denote activities and resources, and edges encode temporal and resource dependencies. We evaluate multiple learning paradigms, including GraphSAGE and Temporal Graph Networks, on both synthetic and benchmark project datasets. Experimental results show that the proposed GNN framework achieves an average 23 to 31 percent reduction in mean absolute error compared to traditional regression and tree-based methods, while improving the coefficient of determination R2 from approximately 0.78 to 0.91 for large and complex project networks. Furthermore, the learned embeddings provide interpretable insights into resource bottlenecks and critical dependencies, enabling more explainable and adaptive scheduling decisions.
Transformers (5 papers)
【1】RS-CA-HSICT: A Residual and Spatial Channel Augmented CNN Transformer Framework for Monkeypox Detection
Link: https://arxiv.org/abs/2511.15476
Authors: Rashid Iqbal, Saddam Hussain Khan
Notes: 33 pages, 12 figures, 4 tables
Abstract: This work proposes a hybrid deep learning approach, namely Residual and Spatial Learning based Channel Augmented Integrated CNN-Transformer architecture, that leverages the strengths of CNN and Transformer towards enhanced Mpox detection. The proposed RS-CA-HSICT framework is composed of an HSICT block, a residual CNN module, a spatial CNN block, and a CA, which enhances the diverse feature space, detailed lesion information, and long-range dependencies. The new HSICT module first integrates an abstract representation of the stem CNN and customized ICT blocks for efficient multihead attention and structured CNN layers with homogeneous (H) and structural (S) operations. The customized ICT blocks learn global contextual interactions and local texture extraction. Additionally, H and S layers learn spatial homogeneity and fine structural details by reducing noise and modeling complex morphological variations. Moreover, inverse residual learning enhances vanishing gradient, and stage-wise resolution reduction ensures scale invariance. Furthermore, the RS-CA-HSICT framework augments the learned HSICT channels with the TL-driven Residual and Spatial CNN maps for enhanced multiscale feature space capturing global and localized structural cues, subtle texture, and contrast variations. These channels, preceding augmentation, are refined through the Channel-Fusion-and-Attention block, which preserves discriminative channels while suppressing redundant ones, thereby enabling efficient computation. Finally, the spatial attention mechanism refines pixel selection to detect subtle patterns and intra-class contrast variations in Mpox. Experimental results on both the Kaggle benchmark and a diverse Mpox dataset reported classification accuracy as high as 98.30% and an F1-score of 98.13%, which outperforms the existing CNNs and ViTs.
【2】BrainRotViT: Transformer-ResNet Hybrid for Explainable Modeling of Brain Aging from 3D sMRI
Link: https://arxiv.org/abs/2511.15188
Authors: Wasif Jalal, Md Nafiu Rahman, M. Sohel Rahman
Abstract: Accurate brain age estimation from structural MRI is a valuable biomarker for studying aging and neurodegeneration. Traditional regression and CNN-based methods face limitations such as manual feature engineering, limited receptive fields, and overfitting on heterogeneous data. Pure transformer models, while effective, require large datasets and high computational cost. We propose Brain ResNet over trained Vision Transformer (BrainRotViT), a hybrid architecture that combines the global context modeling of vision transformers (ViT) with the local refinement of residual CNNs. A ViT encoder is first trained on an auxiliary age and sex classification task to learn slice-level features. The frozen encoder is then applied to all sagittal slices to generate a 2D matrix of embedding vectors, which is fed into a residual CNN regressor that incorporates subject sex at the final fully-connected layer to estimate continuous brain age. Our method achieves an MAE of 3.34 years (Pearson $r=0.98$, Spearman $\rho=0.97$, $R^2=0.95$) on validation across 11 MRI datasets encompassing more than 130 acquisition sites, outperforming baseline and state-of-the-art models. It also generalizes well across 4 independent cohorts with MAEs between 3.77 and 5.04 years. Analyses on the brain age gap (the difference between the predicted age and actual age) show that aging patterns are associated with Alzheimer's disease, cognitive impairment, and autism spectrum disorder. Model attention maps highlight aging-associated regions of the brain, notably the cerebellar vermis, precentral and postcentral gyri, temporal lobes, and medial superior frontal gyrus. Our results demonstrate that this method provides an efficient, interpretable, and generalizable framework for brain-age prediction, bridging the gap between CNN- and transformer-based approaches while opening new avenues for aging and neurodegeneration research.
【3】Transformer-Guided Deep Reinforcement Learning for Optimal Takeoff Trajectory Design of an eVTOL Drone
Link: https://arxiv.org/abs/2511.14887
Authors: Nathan M. Roberts, Xiaosong Du
Notes: Conference version with 12 pages and 2 figures
Abstract: The rapid advancement of electric vertical take-off and landing (eVTOL) aircraft offers a promising opportunity to alleviate urban traffic congestion. Thus, developing optimal takeoff trajectories for minimum energy consumption becomes essential for broader eVTOL aircraft applications. Conventional optimal control methods (such as dynamic programming and linear quadratic regulator) provide highly efficient and well-established solutions but are limited by problem dimensionality and complexity. Deep reinforcement learning (DRL) emerges as a special type of artificial intelligence tackling complex, nonlinear systems; however, the training difficulty is a key bottleneck that limits DRL applications. To address these challenges, we propose the transformer-guided DRL to alleviate the training difficulty by exploring a realistic state space at each time step using a transformer. The proposed transformer-guided DRL was demonstrated on an optimal takeoff trajectory design of an eVTOL drone for minimal energy consumption while meeting takeoff conditions (i.e., minimum vertical displacement and minimum horizontal velocity) by varying control variables (i.e., power and wing angle to the vertical). Results showed that the transformer-guided DRL agent learned to take off with $4.57\times10^6$ time steps, representing 25% of the $19.79\times10^6$ time steps needed by a vanilla DRL agent. In addition, the transformer-guided DRL achieved 97.2% accuracy on the optimal energy consumption compared against the simulation-based optimal reference while the vanilla DRL achieved 96.3% accuracy. Therefore, the proposed transformer-guided DRL outperformed vanilla DRL in terms of both training efficiency as well as optimal design verification.
【4】FinTRec: Transformer Based Unified Contextual Ads Targeting and Personalization for Financial Applications
Link: https://arxiv.org/abs/2511.14865
Authors: Dwipam Katariya, Snehita Varma, Akshat Shreemali, Benjamin Wu, Kalanand Mishra, Pranab Mohanty
Notes: 10 pages, 7 figures, Accepted at CARS @ RecSys 2025
Abstract: Transformer-based architectures are widely adopted in sequential recommendation systems, yet their application in Financial Services (FS) presents distinct practical and modeling challenges for real-time recommendation. These include: a) long-range user interactions (implicit and explicit) spanning both digital and physical channels generating temporally heterogeneous context, b) the presence of multiple interrelated products requires coordinated models to support varied ad placements and personalized feeds, while balancing competing business goals. We propose FinTRec, a transformer-based framework that addresses these challenges and its operational objectives in FS. While tree-based models have traditionally been preferred in FS due to their explainability and alignment with regulatory requirements, our study demonstrates that FinTRec offers a viable and effective shift toward transformer-based architectures. Through historic simulation and live A/B test correlations, we show FinTRec consistently outperforms the production-grade tree-based baseline. The unified architecture, when fine-tuned for product adaptation, enables cross-product signal sharing, reduces training cost and technical debt, while improving offline performance across all products. To our knowledge, this is the first comprehensive study of unified sequential recommendation modeling in FS that addresses both technical and business considerations.
【5】Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States
Link: https://arxiv.org/abs/2511.14808
Authors: Mikael von Strauss
Notes: 22 pages, 5 figures
Abstract: Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture: for each layer $\ell$ we define a collision discriminant $\Delta^\ell \subset \Theta$ and injective stratum $U^\ell = \Theta \setminus \Delta^\ell$, and prove a dichotomy -- either the model is nowhere injective on the set, or $U^\ell$ is open and dense and every $F^\ell_\theta$ is injective. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $\Theta/G$, so injectivity is naturally a property of functional equivalence classes. We complement these results with an empirical study of layerwise geometric diagnostics. We define a separation margin and a co-Lipschitz (lower Lipschitz) constant between prompt space and last-token representation space, estimated via nearest-neighbor statistics on large prompt sets. Applying these diagnostics to pretrained LLaMA-3 and Qwen models, we study behavior across layers, sequence lengths, model scales, and 8- and 4-bit activation quantization. On our sampled prompts we see no collisions in full precision or at 8 bits, while 4-bit quantization induces a small number of collisions and markedly shrinks co-Lipschitz estimates. For a small GPT-2 trained from scratch, normalized metrics remain stable over training. Overall, the results suggest that Transformer representations are generically and persistently injective in the continuous-parameter idealization, while their practical invertibility can be probed using simple geometric diagnostics.
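Both diagnostics are cheap to estimate from sampled representations: the separation margin is the minimum nearest-neighbor distance among distinct prompts, and a co-Lipschitz constant can be lower-bounded empirically by the worst-case ratio of representation distance to prompt distance. A sketch with generic distance choices (the paper's exact estimators may differ):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def separation_margin(reps: np.ndarray) -> float:
    """Smallest pairwise distance between last-token representations."""
    dists, _ = NearestNeighbors(n_neighbors=2).fit(reps).kneighbors(reps)
    return float(dists[:, 1].min())   # column 0 is each point itself

def co_lipschitz(prompt_d: np.ndarray, rep_d: np.ndarray) -> float:
    """Empirical lower-Lipschitz constant: min over pairs of
    ||f(x) - f(y)|| / d(x, y), restricted to pairs with d > 0."""
    mask = prompt_d > 0
    return float((rep_d[mask] / prompt_d[mask]).min())

reps = np.random.default_rng(0).normal(size=(1000, 64))
print(separation_margin(reps))
d_prompt = np.random.default_rng(1).random(200) + 0.1   # placeholder distances
d_rep = np.random.default_rng(2).random(200)
print(co_lipschitz(d_prompt, d_rep))
```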
GAN|对抗|攻击|生成相关(4篇)
【1】FaultDiffusion: Few-Shot Fault Time Series Generation with Diffusion Model
标题:故障扩散:使用扩散模型生成少次故障时间序列
链接:https://arxiv.org/abs/2511.15174
作者:Yi Xu,Zhigang Chen,Rui Wang,Yangfan Li,Fengxiao Tang,Ming Zhao,Jiaqi Liu
备注:4 figures, 5 tables ,8 pages
摘要:在工业设备监控中,故障诊断对于确保系统可靠性和实现预测性维护至关重要。然而,由于故障事件罕见且数据标注成本高昂,故障数据的稀缺极大地阻碍了数据驱动的方法。现有的时间序列生成模型针对充足的正常数据进行优化,在Few-Shot场景下难以捕捉故障分布;由于正常域与故障域之间的巨大差距以及故障的高类内变异性,生成的样本缺乏真实性和多样性。为了解决这个问题,我们提出了一种基于扩散模型的新型Few-Shot故障时间序列生成框架。我们的方法采用正负差异适配器,利用预训练的正常数据分布来建模正常域与故障域之间的差异,以实现准确的故障合成。此外,引入了多样性损失,通过样本间差异正则化鼓励生成多样化的故障样本,从而防止模式崩溃。实验结果表明,该模型在真实性和多样性方面明显优于传统方法,在关键基准测试中达到了最先进的性能。
摘要:In industrial equipment monitoring, fault diagnosis is critical for ensuring system reliability and enabling predictive maintenance. However, the scarcity of fault data, due to the rarity of fault events and the high cost of data annotation, significantly hinders data-driven approaches. Existing time-series generation models, optimized for abundant normal data, struggle to capture fault distributions in few-shot scenarios, producing samples that lack authenticity and diversity due to the large domain gap and high intra-class variability of faults. To address this, we propose a novel few-shot fault time-series generation framework based on diffusion models. Our approach employs a positive-negative difference adapter, leveraging pre-trained normal data distributions to model the discrepancies between normal and fault domains for accurate fault synthesis. Additionally, a diversity loss is introduced to prevent mode collapse, encouraging the generation of diverse fault samples through inter-sample difference regularization. Experimental results demonstrate that our model significantly outperforms traditional methods in authenticity and diversity, achieving state-of-the-art performance on key benchmarks.
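As an illustration of inter-sample difference regularization, here is a minimal sketch of a diversity loss that penalizes pairwise cosine similarity within a generated batch; the exact form used in FaultDiffusion is an assumption:
```python
import torch

def diversity_loss(x):
    """x: (B, T, C) batch of generated fault series. Pushes pairwise similarity to 0."""
    b = x.size(0)
    flat = torch.nn.functional.normalize(x.reshape(b, -1), dim=1)
    sim = flat @ flat.t()                          # cosine similarity matrix
    off_diag = sim - torch.eye(b, device=x.device) # drop the self-similarity diagonal
    return off_diag.pow(2).sum() / (b * (b - 1))
```
Added to the diffusion training objective with a small weight, a term of this kind discourages the sampler from collapsing onto a few fault prototypes.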
【2】Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation
标题:生成自然语言手术反馈:从结构化表示到基于领域的评估
链接:https://arxiv.org/abs/2511.15159
作者:Firdavs Nasriddinov,Rafal Kocielnik,Anima Anandkumar,Andrew J. Hung
备注:Accepted as proceedings paper for ML4H 2025
摘要:来自手术培训师的高质量术中反馈对于提高受训者的表现和长期技能获取至关重要。自动化自然的、培训师风格的反馈有望大规模提供及时、可及且一致的指导,但需要能够理解临床相关表示的模型。我们提出了一个结构感知的管道,从真实的培训师对学员的转录文本(33例手术)中学习手术动作本体,并用它来条件化反馈生成。我们的贡献包括:(1)从真实世界的反馈文本中挖掘器械-动作-目标(IAT)三元组,并将表面形式聚类为规范化类别,(2)微调一个视频到IAT模型,该模型利用手术流程和任务上下文以及细粒度的时间器械运动,以及(3)演示如何有效地使用IAT三元组表示来指导GPT-4o生成临床接地、培训师风格的反馈。我们表明,在任务1(视频到IAT识别)中,我们的上下文注入和时间跟踪带来一致的AUC增益(器械:0.67至0.74;动作:0.60至0.63;组织:0.74至0.79)。对于任务2(反馈文本生成,按1-5保真度标准评级,其中1=相反/不安全,3=可接受,5=与人类培训师完全匹配),仅凭视频的GPT-4o得分为2.17,而IAT条件化达到2.44(+12.4%),使得分>=3的可接受生成结果的份额从21%翻倍至42%。传统的文本相似性度量也得到了改进:词错误率降低了15-31%,ROUGE(短语/子串重叠)增加了9-64%。将生成接地于显式IAT结构提高了保真度,并产生了临床医生可验证的理由,支持在外科培训中的可审计使用。
摘要:High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We contribute by (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages the surgical procedure and task contexts as well as fine-grained temporal instrument motion, and (3) demonstrating how to effectively use IAT triplet representations to guide GPT-4o in generating clinically grounded, trainer-style feedback. We show that, on Task 1: Video-to-IAT recognition, our context injection and temporal tracking deliver consistent AUC gains (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79). For Task 2: feedback text generation (rated on a 1-5 fidelity rubric where 1 = opposite/unsafe, 3 = admissible, and 5 = perfect match to a human trainer), GPT-4o from video alone scores 2.17, while IAT conditioning reaches 2.44 (+12.4%), doubling the share of admissible generations with score >= 3 from 21% to 42%. Traditional text-similarity metrics also improve: word error rate decreases by 15-31% and ROUGE (phrase/substring overlap) increases by 9-64%. Grounding generation in explicit IAT structure improves fidelity and yields clinician-verifiable rationales, supporting auditable use in surgical training.
【3】Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
标题:Kandinsky 5.0:图像和视频生成的一系列基础模型
链接:https://arxiv.org/abs/2511.14993
作者:Vladimir Arkhipkin,Vladimir Korviakov,Nikolai Gerasimenko,Denis Parkhomenko,Viacheslav Vasilev,Alexey Letunovskiy,Maria Kovaleva,Nikolai Vaulin,Ivan Kirillov,Lev Novitskiy,Denis Koposov,Nikita Kiselev,Alexander Varlamov,Dmitrii Mikhailov,Vladimir Polovnikov,Andrey Shutkin,Ilya Vasiliev,Julia Agafonova,Anastasiia Kargapoltseva,Anna Dmitrienko,Anastasia Maltseva,Anna Averchenkova,Olga Kim,Tatiana Nikulina,Denis Dimitrov
备注:Website: https://kandinskylab.ai/
摘要:本报告介绍Kandinsky 5.0,这是一系列用于高分辨率图像和10秒视频合成的最先进的基础模型。该框架包括三个核心模型:Kandinsky 5.0 Image Lite -一系列6 B参数图像生成模型,Kandinsky 5.0 Video Lite -一个快速轻量级的2B参数文本到视频和图像到视频模型,以及Kandinsky 5.0 Video Pro -19 B参数模型,可实现卓越的视频生成质量。我们为多阶段训练管道提供了对数据策展生命周期的全面回顾,包括收集、处理、过滤和聚类,该管道涉及广泛的预训练,并结合了质量增强技术,如基于自我监督微调(SFT)和强化学习(RL)的后训练。我们还提出了新的架构,训练和推理优化,使Kandinsky 5.0能够在各种任务中实现高生成速度和最先进的性能,如人工评估所示。作为一个大规模的、公开可用的生成框架,Kandinsky 5.0充分利用了其预训练和后续阶段的潜力,以适应广泛的生成应用。我们希望这份报告,连同我们的开源代码和训练检查点的发布,将大大推动研究社区高质量生成模型的开发和可访问性。
摘要:This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core line-up of models: Kandinsky 5.0 Image Lite - a line-up of 6B parameter image generation models, Kandinsky 5.0 Video Lite - a fast and lightweight 2B parameter text-to-video and image-to-video models, and Kandinsky 5.0 Video Pro - 19B parameter models that achieves superior video generation quality. We provide a comprehensive review of the data curation lifecycle - including collection, processing, filtering and clustering - for the multi-stage training pipeline that involves extensive pre-training and incorporates quality-enhancement techniques such as self-supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speeds and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages to be adapted for a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.
【4】Attacking Autonomous Driving Agents with Adversarial Machine Learning: A Holistic Evaluation with the CARLA Leaderboard
标题:利用对抗性机器学习攻击自动驾驶代理:CARLA排行榜的整体评估
链接:https://arxiv.org/abs/2511.14876
作者:Henry Wong,Clement Fung,Weiran Lin,Karen Li,Stanley Chen,Lujo Bauer
备注:12 pages
摘要:为了自动控制车辆,驾驶代理使用机器学习(ML)模型,控制器逻辑和自定义模块的组合输出。尽管许多先前的工作已经表明,对抗性示例可能会误导自动驾驶环境中使用的ML模型,但目前尚不清楚这些攻击是否能有效地为各种代理、环境和场景产生有害的驾驶行为。 为了评估对抗性示例对自动驾驶的风险,我们评估了对各种驾驶代理的攻击,而不是孤立地对ML模型的攻击。为了支持这种评估,我们利用城市驾驶模拟器CARLA来创建和评估对抗性示例。我们创建了旨在阻止或引导驾驶代理的对抗性补丁,在运行时将它们传输到CARLA模拟器中,并将它们与CARLA排行榜中的代理进行评估,这是一个年度研究竞赛中表现最佳的自动驾驶代理的公共存储库。与以前的工作不同,我们评估了对自动驾驶系统的攻击,而无需创建或修改任何驱动代理代码,并针对ML模型中包含的代理的所有部分。 我们对来自CARLA排行榜的三个开源驾驶代理的两种攻击策略进行了案例研究调查,涉及多个驾驶场景,照明条件和位置。有趣的是,我们发现,尽管一些攻击可以成功地误导ML模型预测错误的停止或转向命令,但一些驾驶代理使用的模块,如PID控制或基于GPS的规则,可以推翻攻击者操纵的ML模型预测。
摘要:To autonomously control vehicles, driving agents use outputs from a combination of machine-learning (ML) models, controller logic, and custom modules. Although numerous prior works have shown that adversarial examples can mislead ML models used in autonomous driving contexts, it remains unclear if these attacks are effective at producing harmful driving actions for various agents, environments, and scenarios. To assess the risk of adversarial examples to autonomous driving, we evaluate attacks against a variety of driving agents, rather than against ML models in isolation. To support this evaluation, we leverage CARLA, an urban driving simulator, to create and evaluate adversarial examples. We create adversarial patches designed to stop or steer driving agents, stream them into the CARLA simulator at runtime, and evaluate them against agents from the CARLA Leaderboard, a public repository of best-performing autonomous driving agents from an annual research competition. Unlike prior work, we evaluate attacks against autonomous driving systems without creating or modifying any driving-agent code and against all parts of the agent included with the ML model. We perform a case-study investigation of two attack strategies against three open-source driving agents from the CARLA Leaderboard across multiple driving scenarios, lighting conditions, and locations. Interestingly, we show that, although some attacks can successfully mislead ML models into predicting erroneous stopping or steering commands, some driving agents use modules, such as PID control or GPS-based rules, that can overrule attacker-manipulated predictions from ML models.
半/弱/无/有监督|不确定性|主动学习(5篇)
【1】Cross-Modal Consistency-Guided Active Learning for Affective BCI Systems
标题:情感BCI系统的跨模式一致性引导主动学习
链接:https://arxiv.org/abs/2511.15138
作者:Hyo-Jeong Jang,Hye-Bin Shin,Kang Yin
摘要:深度学习模型在具有丰富、高质量标签的情况下表现最佳,但在基于EEG的情感识别中,这种条件很少能实现。脑电图(EEG)信号很容易被人为因素和个体差异所破坏,而情感标签通常来自主观且不一致的报告,这使得鲁棒的情感解码特别困难。我们提出了一个不确定性感知的主动学习框架,通过联合利用模型不确定性和跨模态一致性来增强对标签噪声的鲁棒性。该方法不是仅仅依赖于基于EEG的不确定性估计,而是评估跨模态对齐以确定不确定性源于认知模糊还是传感器噪声。一个表示对齐模块将EEG和面部特征嵌入到共享潜在空间,强制模态之间的语义一致性。剩余的差异被视为噪声引起的不一致,这些样本在主动学习过程中被有选择地查询Oracle反馈。这个反馈驱动的过程引导网络获得可靠、信息丰富的样本,并减少噪声标签的影响。在ASCERTAIN数据集上的实验验证了我们方法的效率和鲁棒性,突出了它作为脑机接口系统中基于EEG的情感解码的数据高效且噪声容忍方法的潜力。
摘要:Deep learning models perform best with abundant, high-quality labels, yet such conditions are rarely achievable in EEG-based emotion recognition. Electroencephalogram (EEG) signals are easily corrupted by artifacts and individual variability, while emotional labels often stem from subjective and inconsistent reports-making robust affective decoding particularly difficult. We propose an uncertainty-aware active learning framework that enhances robustness to label noise by jointly leveraging model uncertainty and cross-modal consistency. Instead of relying solely on EEG-based uncertainty estimates, the method evaluates cross-modal alignment to determine whether uncertainty originates from cognitive ambiguity or sensor noise. A representation alignment module embeds EEG and face features into a shared latent space, enforcing semantic coherence between modalities. Residual discrepancies are treated as noise-induced inconsistencies, and these samples are selectively queried for oracle feedback during active learning. Experiments on the ASCERTAIN dataset examine the efficiency and robustness of our approach, highlighting its potential as a data-efficient and noise-tolerant method for EEG-based affective decoding in brain-computer interface systems.
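A hedged sketch of the selection rule suggested by the abstract: query samples whose EEG uncertainty is high and whose EEG and face embeddings disagree in the shared space. The multiplicative combination and the top-k budget are illustrative assumptions:
```python
import torch

def select_queries(probs, z_eeg, z_face, k=32):
    """probs: (N, C) EEG softmax outputs; z_eeg, z_face: (N, D) aligned embeddings."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    inconsistency = 1 - torch.nn.functional.cosine_similarity(z_eeg, z_face, dim=1)
    score = entropy * inconsistency  # high only when both sources flag the sample
    return score.topk(k).indices     # indices to send to the oracle
```
Samples with high entropy but consistent modalities (plausibly genuine cognitive ambiguity) score lower than samples where uncertainty coincides with cross-modal disagreement.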
【2】WaveFuse-AL: Cyclical and Performance-Adaptive Multi-Strategy Active Learning for Medical Images
标题:WaveFuse-AL:医学图像的周期性和性能自适应多策略主动学习
链接:https://arxiv.org/abs/2511.15132
作者:Nishchala Thakur,Swati Kochhar,Deepti R. Bathula,Sukrit Gupta
摘要:主动学习通过策略性地选择信息量最大的样本进行标记来降低医学成像中的注释成本。然而,个体习得策略在主动学习周期的不同阶段往往表现出不一致的行为。我们提出了周期性和性能自适应多策略主动学习(WaveFuse-AL),一种新的框架,自适应融合多个既定的收购策略-BALD,BADGE,熵,和CoreSet在整个学习过程中。WaveFuse-AL将周期性(正弦)时间先验与性能驱动的自适应集成,以随时间动态调整策略重要性。我们在三个医学成像基准上评估了WaveFuse-AL:APTOS-2019(多类分类),RSNA肺炎检测(二元分类)和ISIC-2018(皮肤病变分割)。实验结果表明,WaveFuse-AL始终优于单一策略和交替策略基线,实现了统计上显着的性能改进(12个度量测量中的10个),同时最大限度地提高了有限注释预算的效用。
摘要:Active learning reduces annotation costs in medical imaging by strategically selecting the most informative samples for labeling. However, individual acquisition strategies often exhibit inconsistent behavior across different stages of the active learning cycle. We propose Cyclical and Performance-Adaptive Multi-Strategy Active Learning (WaveFuse-AL), a novel framework that adaptively fuses multiple established acquisition strategies-BALD, BADGE, Entropy, and CoreSet throughout the learning process. WaveFuse-AL integrates cyclical (sinusoidal) temporal priors with performance-driven adaptation to dynamically adjust strategy importance over time. We evaluate WaveFuse-AL on three medical imaging benchmarks: APTOS-2019 (multi-class classification), RSNA Pneumonia Detection (binary classification), and ISIC-2018 (skin lesion segmentation). Experimental results demonstrate that WaveFuse-AL consistently outperforms both single-strategy and alternating-strategy baselines, achieving statistically significant performance improvements (on ten out of twelve metric measurements) while maximizing the utility of limited annotation budgets.
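A minimal sketch of how sinusoidal temporal priors could be blended with performance-driven weights to fuse the four acquisition strategies; the phase offsets, softmax adaptation, and blend factor are illustrative assumptions, not WaveFuse-AL's published formula:
```python
import numpy as np

def fusion_weights(t, T, perf, beta=0.5):
    """t: current AL round; T: total rounds; perf: recent validation gain per strategy."""
    k = len(perf)
    phases = np.arange(k) * 2 * np.pi / k
    prior = 0.5 * (1 + np.sin(2 * np.pi * t / T + phases))  # cyclical prior in [0, 1]
    adapt = np.exp(perf) / np.exp(np.asarray(perf, float)).sum()  # softmax over gains
    w = beta * prior / prior.sum() + (1 - beta) * adapt
    return w / w.sum()

# Usage: blend normalized BALD/BADGE/Entropy/CoreSet scores with these weights
# before ranking the unlabeled pool each round.
```
The cyclical prior rotates emphasis across strategies over rounds, while the performance term rewards whichever strategy recently improved validation metrics.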
【3】Efficient RF Passive Components Modeling with Bayesian Online Learning and Uncertainty Aware Sampling
标题:利用Bayesian在线学习和不确定性感知抽样的高效RF无源元件建模
链接:https://arxiv.org/abs/2511.15125
作者:Huifan Zhang,Pingqiang Zhou
摘要:基于机器学习的传统射频(RF)无源元件建模需要大量的电磁(EM)仿真来覆盖几何和频率设计空间,从而产生计算瓶颈。在本文中,我们介绍了一个不确定性感知贝叶斯在线学习框架的RF无源元件的有效参数建模,其中包括:1)贝叶斯神经网络与可重构头的联合几何-频域建模,同时量化不确定性; 2)自适应采样策略,同时优化训练数据采样的几何参数和频域使用不确定性的指导。在三个RF无源器件上进行验证,该框架实现了准确的建模,与传统的基于ML的流程相比,仅使用2.86%的EM仿真时间,实现了35倍的加速比。
摘要:Conventional radio frequency (RF) passive components modeling based on machine learning requires extensive electromagnetic (EM) simulations to cover geometric and frequency design spaces, creating computational bottlenecks. In this paper, we introduce an uncertainty-aware Bayesian online learning framework for efficient parametric modeling of RF passive components, which includes: 1) a Bayesian neural network with reconfigurable heads for joint geometric-frequency domain modeling while quantifying uncertainty; 2) an adaptive sampling strategy that simultaneously optimizes training data sampling across geometric parameters and frequency domain using uncertainty guidance. Validated on three RF passive components, the framework achieves accurate modeling while using only 2.86% EM simulation time compared to traditional ML-based flow, achieving a 35 times speedup.
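A minimal sketch of uncertainty-guided sampling in this setting: score candidate (geometry, frequency) points by the surrogate's posterior predictive spread and simulate the most uncertain one. The `posterior_samples` callable stands in for the paper's Bayesian neural network and is an assumption:
```python
import numpy as np

def next_simulation_point(candidates, posterior_samples, n_draws=64):
    """candidates: (M, d) grid of geometry+frequency points;
    posterior_samples(X, n) -> (n, M) S-parameter predictions per posterior draw."""
    draws = posterior_samples(candidates, n_draws)
    std = draws.std(axis=0)                # epistemic uncertainty per candidate
    return candidates[np.argmax(std)]      # most informative EM simulation to run
```
Looping this acquisition step and retraining the surrogate on each new EM result is the basic online-learning cycle the abstract describes.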
【4】Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence
标题:动态嵌套分层结构:机器学习架构中开创性的自我进化,以实现终身智能
链接:https://arxiv.org/abs/2511.14823
作者:Akbar Anbar Jafari,Cagri Ozcinar,Gholamreza Anbarjafari
备注:12 pages, 1 figure
摘要:当代机器学习模型,包括大型语言模型,在静态任务中表现出卓越的能力,但在非静态环境中却表现不佳,这是由于僵化的架构阻碍了持续适应和终身学习。基于嵌套学习范式,将模型分解为具有固定更新频率的多级优化问题,这项工作提出了动态嵌套层次结构作为推进人工智能和机器学习的下一个进化步骤。动态嵌套层次结构使模型能够在训练或推理过程中自主调整优化级别的数量、嵌套结构和更新频率,这受到神经可塑性的启发,可以在没有预定义约束的情况下实现自我进化。这项创新解决了现有模型中的顺行性遗忘症,通过动态压缩上下文流和适应分布变化来促进真正的终身学习。通过严格的数学公式,收敛的理论证明,表现力界限,以及在不同制度下的次线性遗憾,以及在语言建模,持续学习和长上下文推理方面表现优异的经验证明,动态嵌套层次结构建立了自适应通用智能的基础性进步。
摘要:Contemporary machine learning models, including large language models, exhibit remarkable capabilities in static tasks yet falter in non-stationary environments due to rigid architectures that hinder continual adaptation and lifelong learning. Building upon the nested learning paradigm, which decomposes models into multi-level optimization problems with fixed update frequencies, this work proposes dynamic nested hierarchies as the next evolutionary step in advancing artificial intelligence and machine learning. Dynamic nested hierarchies empower models to autonomously adjust the number of optimization levels, their nesting structures, and update frequencies during training or inference, inspired by neuroplasticity to enable self-evolution without predefined constraints. This innovation addresses the anterograde amnesia in existing models, facilitating true lifelong learning by dynamically compressing context flows and adapting to distribution shifts. Through rigorous mathematical formulations, theoretical proofs of convergence, expressivity bounds, and sublinear regret in varying regimes, alongside empirical demonstrations of superior performance in language modeling, continual learning, and long-context reasoning, dynamic nested hierarchies establish a foundational advancement toward adaptive, general-purpose intelligence.
【5】Beyond Uncertainty Sets: Leveraging Optimal Transport to Extend Conformal Predictive Distribution to Multivariate Settings
标题:超越不确定性集:利用最佳传输将保形预测分布扩展到多元环境
链接:https://arxiv.org/abs/2511.15146
作者:Eugene Ndiaye
摘要:共形预测(CP)为模型输出构造具有有限样本覆盖保证的不确定性集。如果候选输出的非一致性得分相对于一组校准样例上观察到的得分不算极端,则该候选输出被纳入预测集。然而,这一过程只有在得分为标量时才是直接的,这将CP限制于实值得分或特定的一维约简。向量排序问题已通过最优传输(OT)得到研究,它为定义向量秩和多元分位数区域提供了一种原则性方法,尽管通常只有渐近覆盖保证。我们通过对向量值OT分位数区域进行共形化,恢复了有限样本、分布无关的覆盖。这里,候选者的秩由针对以该候选者得分扩充的校准得分计算的传输映射来定义。这定义了一族连续的OT问题,我们证明由此产生的最优分配在得分空间的一个固定多面体划分上是分段常数的。这使我们能够对整个预测集进行可处理的刻画,并提供了解决预测集更深层局限的机制:它们只表明哪些结果是合理的,而不表明其相对可能性。在一维情形下,共形预测分布(CPD)通过产生具有有限样本校准的预测分布来填补这一空白。将CPD扩展到一维以外一直是一个悬而未决的问题。据我们所知,我们构造了第一个具有有限样本校准的多元CPD,即它们定义了有效的多元分布,其中任何导出的不确定性区域都自动具有保证的覆盖。我们给出了保守版本和精确随机化版本,后者导出了经典Dempster-Hill程序的一个多元推广。
摘要:Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the scores observed on a set of calibration examples. However, this procedure is only straightforward when scores are scalar-valued, which has limited CP to real-valued scores or ad-hoc reductions to one dimension. The problem of ordering vectors has been studied via optimal transport (OT), which provides a principled method for defining vector-ranks and multivariate quantile regions, though typically with only asymptotic coverage guarantees. We restore finite-sample, distribution-free coverage by conformalizing the vector-valued OT quantile region. Here, a candidate's rank is defined via a transport map computed for the calibration scores augmented with that candidate's score. This defines a continuum of OT problems for which we prove that the resulting optimal assignment is piecewise-constant across a fixed polyhedral partition of the score space. This allows us to characterize the entire prediction set tractably, and provides the machinery to address a deeper limitation of prediction sets: that they only indicate which outcomes are plausible, but not their relative likelihood. In one dimension, conformal predictive distributions (CPDs) fill this gap by producing a predictive distribution with finite-sample calibration. Extending CPDs beyond one dimension remained an open problem. We construct, to our knowledge, the first multivariate CPDs with finite-sample calibration, i.e., they define a valid multivariate distribution where any derived uncertainty region automatically has guaranteed coverage. We present both conservative and exact randomized versions, the latter resulting in a multivariate generalization of the classical Dempster-Hill procedure.
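The one-dimensional baseline that the paper generalizes is easy to state in code. Below is a minimal sketch of the classical randomized Dempster-Hill conformal predictive distribution; the multivariate OT construction itself is not reproduced here:
```python
import numpy as np

def dempster_hill_cpd(calibration, y, rng=np.random.default_rng(0)):
    """Q(y) = (#{y_i < y} + U * (#{y_i = y} + 1)) / (n + 1), U ~ Uniform(0, 1)."""
    calibration = np.asarray(calibration)
    n = len(calibration)
    below = (calibration < y).sum()
    ties = (calibration == y).sum()
    u = rng.uniform()
    return (below + u * (ties + 1)) / (n + 1)  # randomized, exactly calibrated
```
Evaluating this on a grid of y values traces out the full predictive CDF; the paper's contribution is a multivariate analogue of this object with the same finite-sample calibration.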
迁移|Zero/Few/One-Shot|自适应(6篇)
【1】LaguerreNet: Advancing a Unified Solution for Heterophily and Over-smoothing with Adaptive Continuous Polynomials
标题:LaguerreNet:利用自适应连续多项提出异相和过度平滑的统一解决方案
链接:https://arxiv.org/abs/2511.15328
作者:Huseyin Goksu
摘要:谱图神经网络(GNN)有两个关键的限制:在"异嗜性"图上的性能差,以及在高多项式次数(K)下的性能崩溃,即过度平滑。这两个问题都源于标准滤波器(例如ChebyNet)的静态低通特性。虽然自适应多项式滤波器(如离散MeixnerNet)已成为一种潜在的统一解决方案,但其向连续域的扩展以及在系数无界时的稳定性仍是悬而未决的问题。在这项工作中,我们提出了LaguerreNet,一种基于连续Laguerre多项式的新型GNN滤波器。LaguerreNet通过使其核心alpha参数可训练来学习滤波器的频谱形状,从而推进了自适应多项式方法。我们使用基于LayerNorm的稳定技术,解决了这些无界多项式严重的O(k^2)数值不稳定性。我们通过实验证明了这种方法是非常有效的:1)LaguerreNet在具有挑战性的异嗜性基准上取得了最先进的结果。2)它对过度平滑非常鲁棒,性能在K=10时达到峰值,比ChebyNet崩溃处高出一个数量级。
摘要:Spectral Graph Neural Networks (GNNs) suffer from two critical limitations: poor performance on "heterophilic" graphs and performance collapse at high polynomial degrees (K), known as over-smoothing. Both issues stem from the static, low-pass nature of standard filters (e.g., ChebyNet). While adaptive polynomial filters, such as the discrete MeixnerNet, have emerged as a potential unified solution, their extension to the continuous domain and stability with unbounded coefficients remain open questions. In this work, we propose `LaguerreNet`, a novel GNN filter based on continuous Laguerre polynomials. `LaguerreNet` learns the filter's spectral shape by making its core alpha parameter trainable, thereby advancing the adaptive polynomial approach. We solve the severe O(k^2) numerical instability of these unbounded polynomials using a `LayerNorm`-based stabilization technique. We demonstrate experimentally that this approach is highly effective: 1) `LaguerreNet` achieves state-of-the-art results on challenging heterophilic benchmarks. 2) It is exceptionally robust to over-smoothing, with performance peaking at K=10, an order of magnitude beyond where ChebyNet collapses.
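As a rough illustration of the filter design described above, here is a sketch of propagating features with the generalized Laguerre three-term recurrence, applying LayerNorm after every step to contain the coefficient growth. The learnable combination weights `theta` and the dense operator `L` are illustrative assumptions, not the paper's exact implementation:
```python
import torch
import torch.nn as nn

class LaguerreProp(nn.Module):
    def __init__(self, dim, K=10, alpha_init=1.0):
        super().__init__()
        self.K = K
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # learnable spectral shape
        self.norm = nn.LayerNorm(dim)                        # stabilizes O(k^2) growth
        self.theta = nn.Parameter(torch.ones(K + 1) / (K + 1))

    def forward(self, x, L):
        """x: (N, dim) node features; L: (N, N) graph operator (e.g., scaled Laplacian).
        Recurrence: (k+1) P_{k+1} = ((2k+1+alpha) I - L) P_k - (k+alpha) P_{k-1}."""
        p_prev, p = x, self.norm((1 + self.alpha) * x - L @ x)  # P_0, P_1
        out = self.theta[0] * p_prev + self.theta[1] * p
        for k in range(1, self.K):
            p_next = ((2 * k + 1 + self.alpha) * p - L @ p
                      - (k + self.alpha) * p_prev) / (k + 1)
            p_prev, p = p, self.norm(p_next)
            out = out + self.theta[k + 1] * p
        return out
```
Normalizing each recurrence term trades exact polynomial semantics for numerical stability, which is precisely the compromise the abstract attributes to the LayerNorm-based technique. (The companion KrawtchoukNet entry below follows the same pattern with the discrete Krawtchouk recurrence, whose coefficients are bounded by construction.)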
【2】KrawtchoukNet: A Unified GNN Solution for Heterophily and Over-smoothing with Adaptive Bounded Polynomials
标题:KrawtchoukNet:具有自适应有界多项的异方差和过平滑统一GNN解决方案
链接:https://arxiv.org/abs/2511.15327
作者:Huseyin Goksu
摘要:基于多项式滤波器的谱图神经网络(GNN),如ChebyNet,受到两个关键限制:1)在"异嗜性"图上的性能崩溃,2)在高多项式次数(K)下的性能崩溃,即过度平滑。这两个问题都源于标准滤波器的静态低通特性。在这项工作中,我们提出了KrawtchoukNet,一种基于离散Krawtchouk多项式的GNN滤波器。我们证明,KrawtchoukNet通过两个关键的设计选择为这两个问题提供了统一的解决方案。首先,通过将多项式的域N固定为小常数(例如N=20),我们创建了第一个递归系数固有有界的GNN滤波器,使其对过度平滑非常鲁棒(在K=10时取得SOTA结果)。其次,通过使滤波器的形状参数p可学习,滤波器使其频谱响应适应图数据。我们证明这种自适应性质使KrawtchoukNet在具有挑战性的异嗜性基准(Texas、Cornell)上实现SOTA性能,决定性地优于GAT和APPNP等标准GNN。
摘要:Spectral Graph Neural Networks (GNNs) based on polynomial filters, such as ChebyNet, suffer from two critical limitations: 1) performance collapse on "heterophilic" graphs and 2) performance collapse at high polynomial degrees (K), known as over-smoothing. Both issues stem from the static, low-pass nature of standard filters. In this work, we propose `KrawtchoukNet`, a GNN filter based on the discrete Krawtchouk polynomials. We demonstrate that `KrawtchoukNet` provides a unified solution to both problems through two key design choices. First, by fixing the polynomial's domain N to a small constant (e.g., N=20), we create the first GNN filter whose recurrence coefficients are inherently bounded, making it exceptionally robust to over-smoothing (achieving SOTA results at K=10). Second, by making the filter's shape parameter p learnable, the filter adapts its spectral response to the graph data. We show this adaptive nature allows `KrawtchoukNet` to achieve SOTA performance on challenging heterophilic benchmarks (Texas, Cornell), decisively outperforming standard GNNs like GAT and APPNP.
【3】SNAP: Low-Latency Test-Time Adaptation with Sparse Updates
标题:SNAP:具有稀疏更新的低延迟测试时适应
链接:https://arxiv.org/abs/2511.15276
作者:Hyeongheon Cha,Dong Min Kim,Hye Won Chung,Taesik Gong,Sung-Ju Lee
摘要:测试时自适应(TTA)使用未标记的测试数据来调整模型,以应对动态分布偏移。然而,现有方法依赖于频繁的自适应和高计算成本,使其不适合资源受限的边缘环境。为了解决这个问题,我们提出了SNAP,一个稀疏TTA框架,在保持准确性的同时减少自适应频率和数据使用。即使仅基于1%的传入数据流进行自适应,SNAP也能保持有竞争力的准确性,这证明了它在不频繁更新下的鲁棒性。我们的方法引入了两个关键组件:(i)类和域代表性存储器(CnDRM),它识别并存储一小组同时代表类和域特征的样本,以支持有限数据下的高效自适应;(ii)仅推理的批感知存储器归一化(IoBMN),它通过利用这些代表性样本在推理时动态调整归一化统计量,从而能够有效对齐不断变化的目标域。SNAP与五种最先进的TTA算法集成,可将延迟降低高达93.12%,同时即使自适应率从1%到50%变化,也将准确率下降保持在3.3%以下。这证明了它在服务延迟敏感应用的边缘设备上的强大实用潜力。源代码可在https://github.com/chahh9808/SNAP上获得。
摘要:Test-Time Adaptation (TTA) adjusts models using unlabeled test data to handle dynamic distribution shifts. However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. SNAP maintains competitive accuracy even when adapting based on only 1% of the incoming data stream, demonstrating its robustness under infrequent updates. Our method introduces two key components: (i) Class and Domain Representative Memory (CnDRM), which identifies and stores a small set of samples that are representative of both class and domain characteristics to support efficient adaptation with limited data; and (ii) Inference-only Batch-aware Memory Normalization (IoBMN), which dynamically adjusts normalization statistics at inference time by leveraging these representative samples, enabling efficient alignment to shifting target domains. Integrated with five state-of-the-art TTA algorithms, SNAP reduces latency by up to 93.12%, while keeping the accuracy drop below 3.3%, even across adaptation rates ranging from 1% to 50%. This demonstrates its strong potential for practical use on edge devices serving latency-sensitive applications. The source code is available at https://github.com/chahh9808/SNAP.
【4】Learning Where, What and How to Transfer: A Multi-Role Reinforcement Learning Approach for Evolutionary Multitasking
标题:学习转移地点、转移内容以及如何转移:用于进化多任务处理的多角色强化学习方法
链接:https://arxiv.org/abs/2511.15199
作者:Jiajun Zhan,Zeyuan Ma,Yue-Jiao Gong,Kay Chen Tan
摘要:进化多任务(EMT)算法通常需要量身定制的设计,以确保多任务优化的收敛性和最优性。在本文中,我们探索通过强化学习设计一个系统的和可推广的知识转移政策。我们首先确定三个主要挑战:确定要转移的任务(在哪里),要转移的知识(什么)和转移的机制(如何)。为了解决这些挑战,我们制定了一个多角色RL系统,其中三个策略网络(组)充当专门的代理:任务路由代理结合基于注意力的相似性识别模块以通过注意力分数确定源-目标转移对;知识控制代理确定要转移的精英解决方案的比例;一组策略适配代理通过动态控制底层EMT框架中的超参数来控制传输强度。通过在一个扩展的多任务问题分布上对所有网络模块进行端到端的预训练,得到一个可推广的元策略。全面的验证实验显示了我们的方法对代表性基线的最先进的性能。进一步的深入分析不仅揭示了我们的建议背后的理由,而且还对系统所学到的东西提供了深刻的解释。
摘要:Evolutionary multitasking (EMT) algorithms typically require tailored designs for knowledge transfer, in order to assure convergence and optimality in multitask optimization. In this paper, we explore designing a systematic and generalizable knowledge transfer policy through Reinforcement Learning. We first identify three major challenges: determining the task to transfer (where), the knowledge to be transferred (what) and the mechanism for the transfer (how). To address these challenges, we formulate a multi-role RL system where three (groups of) policy networks act as specialized agents: a task routing agent incorporates an attention-based similarity recognition module to determine source-target transfer pairs via attention scores; a knowledge control agent determines the proportion of elite solutions to transfer; and a group of strategy adaptation agents control transfer strength by dynamically controlling hyper-parameters in the underlying EMT framework. Through pre-training all network modules end-to-end over an augmented multitask problem distribution, a generalizable meta-policy is obtained. Comprehensive validation experiments show state-of-the-art performance of our method against representative baselines. Further in-depth analysis not only reveals the rationale behind our proposal but also provide insightful interpretations on what the system have learned.
【5】Cluster-based Adaptive Retrieval: Dynamic Context Selection for RAG Applications
标题:基于聚类的自适应检索:RAG应用中的动态上下文选择
链接:https://arxiv.org/abs/2511.14769
作者:Yifan Xu,Vipul Gupta,Rohit Aggarwal,Varsha Mahadevan,Bhaskar Krishnamachari
摘要:检索增强生成(RAG)通过从庞大且不断增长的语料库中提取外部材料,文档,代码,手册来增强大型语言模型(LLM),以有效地回答用户查询。RAG的有效性在很大程度上取决于检索到的文档数量与查询特征的一致性:狭义的查询通常需要更少的高度相关的文档,而更广泛或模糊的查询则受益于检索更广泛的支持信息。然而,常见的静态top-k检索方法无法适应这种变化,导致要么从太少的文档或冗余信息太多的不足的上下文。出于这些挑战,我们介绍了基于搜索引擎的自适应检索(CAR),一种算法,通过分析有序查询文档相似性距离的聚类模式,动态地确定最佳的文档数量。CAR检测相似距离内的转换点,其中紧密聚集的高度相关的文档向不太相关的候选者转移,建立一个自适应的截止点,随查询复杂性而扩展。在Coinbase的CDP语料库和公共MultiHop-RAG基准测试中,CAR始终选择最佳检索深度,并获得最高的TES分数,优于每一个固定的top-k基线。在下游RAG评估中,CAR将LLM令牌使用量减少了60%,将端到端延迟减少了22%,并将幻觉减少了10%,同时完全保留了答案的相关性。自从将CAR集成到Coinbase的虚拟助手中以来,我们看到用户参与度上升了200%。
摘要:Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by pulling in external material (documents, code, manuals) from vast and ever-growing corpora, to effectively answer user queries. The effectiveness of RAG depends significantly on aligning the number of retrieved documents with query characteristics: narrowly focused queries typically require fewer, highly relevant documents, whereas broader or ambiguous queries benefit from retrieving more extensive supporting information. However, the common static top-k retrieval approach fails to adapt to this variability, resulting in either insufficient context from too few documents or redundant information from too many. Motivated by these challenges, we introduce Cluster-based Adaptive Retrieval (CAR), an algorithm that dynamically determines the optimal number of documents by analyzing the clustering patterns of ordered query-document similarity distances. CAR detects the transition point within similarity distances, where tightly clustered, highly relevant documents shift toward less pertinent candidates, establishing an adaptive cut-off that scales with query complexity. On Coinbase's CDP corpus and the public MultiHop-RAG benchmark, CAR consistently picks the optimal retrieval depth and achieves the highest TES score, outperforming every fixed top-k baseline. In downstream RAG evaluations, CAR cuts LLM token usage by 60%, trims end-to-end latency by 22%, and reduces hallucinations by 10% while fully preserving answer relevance. Since integrating CAR into Coinbase's virtual assistant, we've seen user engagement jump by 200%.
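A minimal sketch of the adaptive cut-off idea, assuming the transition point is taken as the largest gap between consecutive sorted query-document distances; CAR's actual clustering criterion may differ:
```python
import numpy as np

def adaptive_k(distances, k_min=1, k_max=20):
    """distances: similarity distances of retrieved docs (smaller = more relevant)."""
    d = np.sort(np.asarray(distances))[:k_max]
    if len(d) < 2:
        return k_min
    gaps = np.diff(d)                            # jumps between consecutive neighbors
    return max(int(np.argmax(gaps)) + 1, k_min)  # cut just before the biggest jump

# Usage: docs = retrieve(query, k_max); keep docs[:adaptive_k(doc_distances)].
```
A tight query yields a sharp early gap (few documents kept), while a broad query produces a flatter distance profile and a deeper cut-off, which is the scaling behavior the abstract describes.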
【6】Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion
标题:基于身份的迁移学习和MAMBA融合在对话中进行质量控制的多模式情感识别
链接:https://arxiv.org/abs/2511.14969
作者:Zanxu Wang,Homayoon Beigi
备注:8 pages, 14 images, 3 tables, Recognition Technologies, Inc. Technical Report RTI-20251118-01
摘要:本文通过系统的质量控制和多阶段迁移学习来解决多模态会话情感识别(MERC)中的数据质量问题。我们为MELD和IEMOCAP数据集实现了一个质量控制管道,用于验证说话人身份、音频-文本对齐和人脸检测。我们利用来自说话人识别和人脸识别的迁移学习,假设身份判别性嵌入不仅捕获稳定的声学和面部特征,而且捕获特定于个人的情感表达模式。我们使用RecoMadeEasy(R)引擎来提取512维说话人和人脸嵌入,微调MPNet-v2以实现情感感知的文本表示,并通过在单模态数据集上训练的情感特定MLP来适配这些特征。基于MAMBA的三模态融合在MELD上达到64.8%的准确率,在IEMOCAP上达到74.3%。这些结果表明,在质量控制的数据子集上,将基于身份的音频和视觉嵌入与情感调整的文本表示相结合,可以在会话多模态情感识别中产生一致的有竞争力的性能,并为进一步改进具有挑战性的低频情感类别奠定基础。
摘要:This paper addresses data quality issues in multimodal emotion recognition in conversation (MERC) through systematic quality control and multi-stage transfer learning. We implement a quality control pipeline for MELD and IEMOCAP datasets that validates speaker identity, audio-text alignment, and face detection. We leverage transfer learning from speaker and face recognition, assuming that identity-discriminative embeddings capture not only stable acoustic and Facial traits but also person-specific patterns of emotional expression. We employ RecoMadeEasy(R) engines for extracting 512-dimensional speaker and face embeddings, fine-tune MPNet-v2 for emotion-aware text representations, and adapt these features through emotion-specific MLPs trained on unimodal datasets. MAMBA-based trimodal fusion achieves 64.8% accuracy on MELD and 74.3% on IEMOCAP. These results show that combining identity-based audio and visual embeddings with emotion-tuned text representations on a quality-controlled subset of data yields consistent competitive performance for multimodal emotion recognition in conversation and provides a basis for further improvement on challenging, low-frequency emotion classes.
强化学习(6篇)
【1】The Impact of Quantization on Large Reasoning Model Reinforcement Learning
标题:量化对大型推理模型强化学习的影响
链接:https://arxiv.org/abs/2511.15694
作者:Medha Kumar,Zifei Xu,Xin Wang,Tristan Webb
备注:Accepted to the NeurIPS 2025 Efficient Reasoning Workshop
摘要:强大的推理能力现在可以通过大规模强化学习(RL)来实现,而无需任何监督微调。虽然后训练量化(PTQ)和量化感知训练(QAT)在微调的背景下得到了很好的研究,但量化如何影响大型推理模型(LRM)中的RL仍然是一个悬而未决的问题。为了回答这个问题,我们进行了系统的实验,并发现后RL量化模型与其量化感知RL优化模型之间在数学基准上的推理性能存在显着差距。我们的研究结果表明,量化意识RL训练对学习过程产生了负面影响,而PTQ和QLoRA则带来了更好的表现。
摘要:Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their quantization-aware RL optimized counterparts. Our findings suggest that quantization-aware RL training negatively impacted the learning process, whereas PTQ and QLoRA led to greater performance.
【2】Continual Reinforcement Learning for Cyber-Physical Systems: Lessons Learned and Open Challenges
标题:网络物理系统的持续强化学习:经验教训和开放挑战
链接:https://arxiv.org/abs/2511.15652
作者:Kim N. Nolle,Ivana Dusparic,Rhodri Cusack,Vinny Cahill
备注:5 pages, 5 figures, Accepted to RLDM 2025
摘要:持续学习(CL)是机器学习的一个分支,旨在使智能体能够适应和概括以前学习的能力,以便这些能力可以重新应用于新的任务或环境。这在多任务设置或动态可能随时间变化的非静止环境中特别有用。这在自动驾驶等网络物理系统中尤其重要。然而,尽管CL最近取得了进展,但成功地将其应用于强化学习(RL)仍然是一个悬而未决的问题。 本文重点介绍了开放的挑战,在自动驾驶环境中的实验的基础上,在持续强化学习(CRL)。在这种环境中,智能体必须学会在四种不同的场景中成功停车,这些场景对应于不同角度的停车位。使用邻近策略优化(PPO),代理在这四个场景中一个接一个地接受训练,代表CL环境。这些实验暴露了CRL中的一些开放性挑战:找到合适的环境抽象、对超参数的过度敏感、灾难性遗忘以及有效利用神经网络容量。 基于这些确定的挑战,我们提出了开放的研究问题,重要的是要解决创建强大的CRL系统。此外,所确定的挑战对神经网络用于CL的适用性提出了质疑。我们还确定了跨学科研究的必要性,特别是计算机科学和神经科学之间。
摘要:Continual learning (CL) is a branch of machine learning that aims to enable agents to adapt and generalise previously learned abilities so that these can be reapplied to new tasks or environments. This is particularly useful in multi-task settings or in non-stationary environments, where the dynamics can change over time. This is particularly relevant in cyber-physical systems such as autonomous driving. However, despite recent advances in CL, successfully applying it to reinforcement learning (RL) is still an open problem. This paper highlights open challenges in continual RL (CRL) based on experiments in an autonomous driving environment. In this environment, the agent must learn to successfully park in four different scenarios corresponding to parking spaces oriented at varying angles. The agent is successively trained in these four scenarios one after another, representing a CL environment, using Proximal Policy Optimisation (PPO). These experiments exposed a number of open challenges in CRL: finding suitable abstractions of the environment, oversensitivity to hyperparameters, catastrophic forgetting, and efficient use of neural network capacity. Based on these identified challenges, we present open research questions that are important to be addressed for creating robust CRL systems. In addition, the identified challenges call into question the suitability of neural networks for CL. We also identify the need for interdisciplinary research, in particular between computer science and neuroscience.
【3】GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
标题:GRPO-RM:通过GRPO驱动的强化学习微调表示模型
链接:https://arxiv.org/abs/2511.15256
作者:Yanchen Xu,Ziheng Jiao,Hongyuan Zhang,Xuelong Li
摘要:组相对策略优化(GRPO)是一种用于微调大型语言模型(LLM)的强化学习方法,已在DeepSeek-R1等实际应用中证明了其有效性。它提出了一个问题,是否GRPO可以推广到表示学习模型。在本文中,我们提出了表示模型的组相对策略优化(GRPO-RM),并研究了GRPO-like策略在后训练表示模型中的性能。具体来说,我们的方法建立了一个预定义的输出集,在功能上取代令牌序列采样LLM,从而生成一个输出组,这是必不可少的概率驱动的优化GRPO。此外,一个专门的奖励函数的设计,以适应表征模型的属性。在各种真实数据集上进行了大量的实验,以验证我们所提出的方法的有效性。
摘要:The Group Relative Policy Optimization (GRPO), a reinforcement learning method used to fine-tune large language models (LLMs), has proved its effectiveness in practical applications such as DeepSeek-R1. It raises the question of whether GRPO can be generalized to representation learning models. In this paper, we propose Group Relative Policy Optimization for Representation Model (GRPO-RM), and investigate the performance of GRPO-like policy in post-training representation models. Specifically, our method establishes a predefined output set to functionally replace token sequence sampling in LLMs, thereby generating an output group, which is essential for the probability-driven optimization of GRPO. In addition, a specialized reward function is designed to accommodate the properties of representation models. Extensive experiments are conducted on various real-world datasets to validate the effectiveness of our proposed method.
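For reference, the group-relative advantage that makes the output group essential in GRPO-style optimization can be sketched in a few lines; GRPO-RM's predefined output set and specialized reward function are abstracted away here:
```python
import torch

def group_relative_advantages(rewards):
    """rewards: (G,) rewards for the G outputs sampled for one input.
    Each output's advantage is its reward standardized within the group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```
Because the baseline is the group mean rather than a learned value function, the method needs a group of outputs per input, which is exactly what the predefined output set supplies in place of token-sequence sampling.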
【4】Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
标题:掩蔽自回归变分加速:快速推理使强化学习变得实用
链接:https://arxiv.org/abs/2511.15190
作者:Yuxuan Gu,Weimin Bai,Yifei Wang,Weijian Luo,He Sun
摘要:掩蔽自回归扩散模型(MAR)兼得扩散模型的表达建模能力和掩蔽自回归排序的灵活性。然而,原始MAR由于其分层推理机制而推理缓慢:外层的AR去掩蔽循环和内层的扩散去噪链。这种解耦结构不仅损害生成效率,而且阻碍MAR在强化学习(RL)中的实际应用,而RL是生成模型后训练中日益重要的范式。为了解决这一根本问题,我们引入了MARVAL(Masked Auto-Regressive Variational Acceleration),一个基于蒸馏的框架,它将扩散链压缩为单个AR生成步骤,同时保留灵活的自回归去掩蔽顺序。这种基于MARVAL的蒸馏不仅带来了显著的推理加速,更重要的是,使得带可验证奖励的RL后训练变得切实可行,从而产生可扩展且符合人类偏好的快速生成模型。我们的贡献是双重的:(1)一种新的基于得分的变分目标,用于在不牺牲样本质量的情况下将掩蔽自回归扩散模型蒸馏为单个生成步骤;(2)通过MARVAL-RL为掩蔽自回归模型提供一个高效的RL框架。在ImageNet 256*256上,MARVAL-Huge实现了2.00的FID,与MAR-diffusion相比加速超过30倍,并且MARVAL-RL在带实体名称的ImageNet数据集上的CLIP和图像奖励分数方面取得了一致的改进。总之,MARVAL展示了掩蔽自回归扩散模型蒸馏与RL的第一条实用路径,实现了快速采样和更好的偏好对齐。
摘要:Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models and the flexibility of masked auto-regressive ordering. However, vanilla MAR suffers from slow inference due to its hierarchical inference mechanism: an outer AR unmasking loop and an inner diffusion denoising chain. Such decoupled structure not only harms the generation efficiency but also hinders the practical use of MAR for reinforcement learning (RL), an increasingly critical paradigm for generative model post-training. To address this fundamental issue, we introduce MARVAL (Masked Auto-regressive Variational Acceleration), a distillation-based framework that compresses the diffusion chain into a single AR generation step while preserving the flexible auto-regressive unmasking order. Such a distillation with MARVAL not only yields substantial inference acceleration but, crucially, makes RL post-training with verifiable rewards practical, resulting in scalable yet human-preferred fast generative models. Our contributions are twofold: (1) a novel score-based variational objective for distilling masked auto-regressive diffusion models into a single generation step without sacrificing sample quality; and (2) an efficient RL framework for masked auto-regressive models via MARVAL-RL. On ImageNet 256*256, MARVAL-Huge achieves an FID of 2.00 with more than 30 times speedup compared with MAR-diffusion, and MARVAL-RL yields consistent improvements in CLIP and image-reward scores on ImageNet datasets with entity names. In conclusion, MARVAL demonstrates the first practical path to distillation and RL of masked auto-regressive diffusion models, enabling fast sampling and better preference alignments.
【5】Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning
标题:使用多代理强化学习的任务特定共享感知O-RAN资源管理
链接:https://arxiv.org/abs/2511.15002
作者:Fatemeh Lotfi,Hossein Rajoli,Fatemeh Afghah
备注:Accepted to be published in IEEE Transaction on Machine Learning in Communication and Networking (TMLCN)
摘要:下一代网络利用开放无线电接入网络(O-RAN)架构来实现动态资源管理,由RAN智能控制器(RIC)促进。虽然深度强化学习(DRL)模型在优化网络资源方面表现出了希望,但它们在动态环境中往往难以实现鲁棒性和泛化能力。本文介绍了一种新的资源管理方法,在分布式多智能体RL(MARL)框架中用锐度感知最小化(SAM)增强软演员-评论家(Soft Actor-Critic, SAC)算法。我们的方法引入了一种自适应且有选择性的SAM机制,其正则化由时间差分(TD)误差方差显式驱动,确保只有面临高环境复杂性的智能体才被正则化。这种有针对性的策略减少了不必要的开销,提高了训练稳定性,并在不牺牲学习效率的情况下增强了泛化能力。我们进一步纳入了一个动态的$ρ$调度方案,以细化各智能体的探索-利用权衡。实验结果表明,我们的方法显著优于传统的DRL方法,资源分配效率提高高达$22\%$,并确保在不同的O-RAN切片上获得更优的QoS满意度。
摘要:Next-generation networks utilize the Open Radio Access Network (O-RAN) architecture to enable dynamic resource management, facilitated by the RAN Intelligent Controller (RIC). While deep reinforcement learning (DRL) models show promise in optimizing network resources, they often struggle with robustness and generalizability in dynamic environments. This paper introduces a novel resource management approach that enhances the Soft Actor Critic (SAC) algorithm with Sharpness-Aware Minimization (SAM) in a distributed Multi-Agent RL (MARL) framework. Our method introduces an adaptive and selective SAM mechanism, where regularization is explicitly driven by temporal-difference (TD)-error variance, ensuring that only agents facing high environmental complexity are regularized. This targeted strategy reduces unnecessary overhead, improves training stability, and enhances generalization without sacrificing learning efficiency. We further incorporate a dynamic $ρ$ scheduling scheme to refine the exploration-exploitation trade-off across agents. Experimental results show our method significantly outperforms conventional DRL approaches, yielding up to a $22\%$ improvement in resource allocation efficiency and ensuring superior QoS satisfaction across diverse O-RAN slices.
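A hedged sketch of a selective, TD-error-gated SAM update of the kind the abstract describes; the gating threshold, the fixed rho, and the assumption that every parameter receives a gradient are illustrative simplifications:
```python
import torch

def selective_sam_step(loss_fn, params, optimizer, td_errors, rho=0.05, var_thresh=1.0):
    """loss_fn: closure returning the scalar SAC loss; params: list of parameters."""
    loss_fn().backward()
    if torch.var(td_errors) > var_thresh:          # regularize only "hard" agents
        with torch.no_grad():
            grads = [p.grad for p in params]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            eps = [rho * g / (norm + 1e-12) for g in grads]
            for p, e in zip(params, eps):
                p.add_(e)                          # step to the sharp neighborhood
        optimizer.zero_grad()
        loss_fn().backward()                       # gradient at perturbed weights
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                          # restore original weights
    optimizer.step()
    optimizer.zero_grad()
```
Agents whose recent TD-error variance stays below the threshold take a plain SGD/Adam step, which is how the selective mechanism avoids paying the SAM double-backward cost everywhere.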
【6】Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution
标题:队列反应模型中的强化学习:在最优执行中的应用
链接:https://arxiv.org/abs/2511.15262
作者:Tomas Espana,Yadh Hafsi,Fabrizio Lillo,Edoardo Vittori
摘要:我们研究了强化学习在元订单最优执行中的应用,其目标是在较长时间内增量执行大额订单,同时最大限度地减少执行差额和市场冲击。与对价格动态和冲击建模的传统参数化方法不同,我们采用无模型、数据驱动的框架。由于策略优化需要历史数据无法提供的反事实反馈,我们采用队列反应模型(Queue-Reactive Model)来生成真实且易于处理的限价订单簿模拟,其中包含瞬时价格冲击以及非线性和动态的订单流响应。在方法上,我们在由时间、库存、价格和深度变量组成的状态空间上训练一个双深度Q网络智能体,并对照既定基准评估其性能。数值模拟结果表明,该智能体学到的策略兼具战略性和战术性,能有效适应订单簿状况,并在多个训练配置下优于标准方法。这些发现提供了强有力的证据,证明无模型强化学习可以为最优执行问题提供自适应且鲁棒的解决方案。
摘要:We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to execute incrementally large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that encompass transient price impact, and nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
分层学习(1篇)
【1】Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
标题:基于CLIP的类增量学习的分层语义树锚定
链接:https://arxiv.org/abs/2511.15633
作者:Tao Hu,Lan Li,Zhen-Hao Xie,Da-Wei Zhou
摘要:类增量学习(CIL)使模型能够在保留过去知识的同时不断学习新的类。最近,像CLIP这样的视觉语言模型通过多模态预训练提供了可迁移的特征,使它们非常适合CIL。然而,现实世界的视觉和语言概念本质上是分层的:像"狗"这样的文本概念包含了细粒度的类别,如"拉布拉多"和"金毛猎犬",每个类别又包含其图像。但现有的基于CLIP的CIL方法无法显式捕获这种固有的层次结构,导致细粒度类特征在增量更新期间漂移,并最终导致灾难性遗忘。为了应对这一挑战,我们提出了HASTEN(层次语义树锚定),将层次信息锚定到CIL中,以减少灾难性遗忘。首先,我们采用外部知识图作为监督,将视觉和文本特征嵌入双曲空间,随着数据的演化有效地保持层次结构。其次,为了减轻灾难性遗忘,我们将梯度投影到共享双曲映射器的零空间上,防止干扰先前的任务。这两个步骤协同工作,使模型能够通过维护层次关系来抵抗遗忘。大量实验表明,HASTEN始终优于现有方法,同时提供统一的结构化表示。
摘要:Class-Incremental Learning (CIL) enables models to learn new classes continually while preserving past knowledge. Recently, vision-language models like CLIP offer transferable features via multi-modal pre-training, making them well-suited for CIL. However, real-world visual and linguistic concepts are inherently hierarchical: a textual concept like "dog" subsumes fine-grained categories such as "Labrador" and "Golden Retriever," and each category entails its images. But existing CLIP-based CIL methods fail to explicitly capture this inherent hierarchy, leading to fine-grained class features drift during incremental updates and ultimately to catastrophic forgetting. To address this challenge, we propose HASTEN (Hierarchical Semantic Tree Anchoring) that anchors hierarchical information into CIL to reduce catastrophic forgetting. First, we employ an external knowledge graph as supervision to embed visual and textual features in hyperbolic space, effectively preserving hierarchical structure as data evolves. Second, to mitigate catastrophic forgetting, we project gradients onto the null space of the shared hyperbolic mapper, preventing interference with prior tasks. These two steps work synergistically to enable the model to resist forgetting by maintaining hierarchical relationships. Extensive experiments show that HASTEN consistently outperforms existing methods while providing a unified structured representation.
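The null-space projection step can be sketched with the common SVD-based construction from continual learning; treating the projection as acting on a layer's input-feature directions is an assumption for illustration, not necessarily HASTEN's exact formulation:
```python
import torch

def project_to_null_space(grad, feats_prev, eps=1e-3):
    """grad: (m, d) gradient of a layer with d input features;
    feats_prev: (N, d) inputs to that layer collected from previous tasks.
    Returns the gradient component that leaves old-task outputs unchanged."""
    _, s, vh = torch.linalg.svd(feats_prev, full_matrices=True)
    rank = int((s > eps * s[0]).sum())
    nb = vh[rank:]                # (d - rank, d) rows spanning the null space
    return grad @ nb.t() @ nb

# Usage inside the update loop (illustrative):
#   p.grad.copy_(project_to_null_space(p.grad, feats_prev))
```
For any old-task input x in the row space of `feats_prev`, the projected gradient satisfies (grad_proj @ x) = 0, so the mapper's responses on previous tasks are (approximately) preserved while new-task learning proceeds in the remaining directions.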
医学相关(7篇)
【1】Novel sparse matrix algorithm expands the feasible size of a self-organizing map of the knowledge indexed by a database of peer-reviewed medical literature
标题:新型稀疏矩阵算法扩展了同行评审医学文献数据库索引的知识自组织地图的可行大小
链接:https://arxiv.org/abs/2511.15136
作者:Andrew Amos,Joanne Lee,Tarun Sen Gupta,Bunmi S. Malau-Aduli
摘要:由于现有算法的内存和处理需求呈指数级增长,过去绘制Medline数据库的努力仅限于可用数据的小子集。我们设计了一种新的稀疏矩阵乘法算法,使我们能够将自组织映射应用于整个Medline数据集,从而为现有医学知识提供更完整的映射。该算法还增加了改进自组织映射的可行性,以考虑数据集随时间的变化。
摘要:Past efforts to map the Medline database have been limited to small subsets of the available data because of the exponentially increasing memory and processing demands of existing algorithms. We designed a novel algorithm for sparse matrix multiplication that allowed us to apply a self-organizing map to the entire Medline dataset, allowing for a more complete map of existing medical knowledge. The algorithm also increases the feasibility of refining the self-organizing map to account for changes in the dataset over time.
【2】Deep Pathomic Learning Defines Prognostic Subtypes and Molecular Drivers in Colorectal Cancer
标题:深度病理学习定义结直肠癌的预后亚型和分子驱动因素
链接:https://arxiv.org/abs/2511.15067
作者:Zisong Wang,Xuanyu Wang,Hang Chen,Haizhou Wang,Yuxin Chen,Yihang Xu,Yunhe Yuan,Lihuan Luo,Xitong Ling,Xiaoping Liu
摘要:由于结直肠癌(CRC)的高度异质性,其精确的预后分层仍然是一个主要的临床挑战。传统的TNM分期系统不能满足个性化医疗的需要。我们的目的是开发和验证一种新的多实例学习模型TDAM-CRC,利用组织病理学全切片图像进行准确的预后预测,并揭示其潜在的分子机制。我们在TCGA发现队列(n=581)上训练了模型,在独立的外部队列(n=1031)中进行了验证,并进一步整合多组学数据,以提高模型的可解释性并识别新的预后生物标志物。结果表明,TDAM-CRC在两个队列中均实现了稳健的风险分层。其预测性能显著优于传统的临床分期系统和多种最先进的模型。TDAM-CRC风险评分在多变量分析中被证实为独立的预后因素。多组学分析显示,高危亚型与代谢重编程和免疫抑制性肿瘤微环境密切相关。通过相互作用网络分析,我们确定并验证了线粒体核糖体蛋白L37(MRPL37)作为连接深层病理特征与临床预后的关键枢纽基因。我们发现,由启动子低甲基化驱动的MRPL37高表达可作为良好预后的独立生物标志物。最后,我们构建了一个结合TDAM-CRC风险评分和临床因素的列线图,为CRC患者提供一个精确且可解释的临床决策工具。我们的AI驱动病理模型TDAM-CRC为改善CRC风险分层提供了强大的工具,揭示了新的分子靶点,并促进了个性化的临床决策。
摘要:Precise prognostic stratification of colorectal cancer (CRC) remains a major clinical challenge due to its high heterogeneity. The conventional TNM staging system is inadequate for personalized medicine. We aimed to develop and validate a novel multiple instance learning model TDAM-CRC using histopathological whole-slide images for accurate prognostic prediction and to uncover its underlying molecular mechanisms. We trained the model on the TCGA discovery cohort (n=581), validated it in an independent external cohort (n=1031), and further we integrated multi-omics data to improve model interpretability and identify novel prognostic biomarkers. The results demonstrated that the TDAM-CRC achieved robust risk stratification in both cohorts. Its predictive performance significantly outperformed the conventional clinical staging system and multiple state-of-the-art models. The TDAM-CRC risk score was confirmed as an independent prognostic factor in multivariable analysis. Multi-omics analysis revealed that the high-risk subtype is closely associated with metabolic reprogramming and an immunosuppressive tumor microenvironment. Through interaction network analysis, we identified and validated Mitochondrial Ribosomal Protein L37 (MRPL37) as a key hub gene linking deep pathomic features to clinical prognosis. We found that high expression of MRPL37, driven by promoter hypomethylation, serves as an independent biomarker of favorable prognosis. Finally, we constructed a nomogram incorporating the TDAM-CRC risk score and clinical factors to provide a precise and interpretable clinical decision-making tool for CRC patients. Our AI-driven pathological model TDAM-CRC provides a robust tool for improved CRC risk stratification, reveals new molecular targets, and facilitates personalized clinical decision-making.
【3】Oversampling techniques for predicting COVID-19 patient length of stay
标题:预测COVID-19患者住院时间的过度抽样技术
链接:https://arxiv.org/abs/2511.15048
作者:Zachariah Farahany,Jiawei Wu,K M Sajjadul Islam,Praveen Madiraju
备注:10 pages, 2022 IEEE International Conference on Big Data (Big Data)
摘要:COVID-19是一种呼吸道疾病,于二零一九年引发全球大流行。它具有高度传染性,并具有以下症状:发烧或发冷,咳嗽,呼吸急促,疲劳,肌肉或身体疼痛,头痛,新的味觉或嗅觉丧失,喉咙痛,充血或流鼻涕,恶心或呕吐以及腹泻。这些症状的严重程度各不相同;已知一些具有许多风险因素的人住院时间较长或死于这种疾病。在本文中,我们分析了患者的电子健康记录(EHR),以使用住院时间(LOS)作为我们对严重程度的测量来预测其COVID-19感染的严重程度。这是一个不平衡的分类问题,因为许多人的LOS较短而不是较长。为了解决这个问题,我们综合创建交替的过采样训练数据集。一旦我们有了这个过采样数据,我们就通过人工神经网络(ANN)运行它,在训练过程中,它的超参数使用贝叶斯优化进行调整。我们选择具有最佳F1得分的模型,然后对其进行评估和讨论。
摘要:COVID-19 is a respiratory disease that caused a global pandemic in 2019. It is highly infectious and has the following symptoms: fever or chills, cough, shortness of breath, fatigue, muscle or body aches, headache, the new loss of taste or smell, sore throat, congestion or runny nose, nausea or vomiting, and diarrhea. These symptoms vary in severity; some people with many risk factors have been known to have lengthy hospital stays or die from the disease. In this paper, we analyze patients' electronic health records (EHR) to predict the severity of their COVID-19 infection using the length of stay (LOS) as our measurement of severity. This is an imbalanced classification problem, as many people have a shorter LOS rather than a longer one. To combat this problem, we synthetically create alternate oversampled training data sets. Once we have this oversampled data, we run it through an Artificial Neural Network (ANN), which during training has its hyperparameters tuned using Bayesian optimization. We select the model with the best F1 score and then evaluate it and discuss it.
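As a concrete example of the kind of synthetic oversampling applied here, below is a minimal SMOTE-style interpolation sketch over numeric minority-class rows; the paper's exact oversamplers are not specified, so this is an assumption (and it requires more than k minority samples):
```python
import numpy as np

def oversample_minority(X_min, n_new, k=5, rng=np.random.default_rng(0)):
    """X_min: (N, d) minority-class rows; returns (n_new, d) synthetic rows
    interpolated between each picked sample and one of its k nearest same-class
    neighbors."""
    idx = rng.integers(0, len(X_min), size=n_new)
    d2 = ((X_min[idx, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    nbr = np.argsort(d2, axis=1)[:, 1:k + 1]   # skip self (distance 0)
    pick = nbr[np.arange(n_new), rng.integers(0, k, size=n_new)]
    lam = rng.uniform(size=(n_new, 1))
    return X_min[idx] + lam * (X_min[pick] - X_min[idx])
```
Training the ANN on the union of the original data and such synthetic long-LOS rows is one way to realize the "alternate oversampled training data sets" the abstract mentions.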
【4】Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
标题:微调用于COVID-19检测的预训练音频模型:技术报告
链接:https://arxiv.org/abs/2511.14939
作者:Daniel Oliveira de Brito,Letícia Gabriella de Souza,Marcelo Matheus Gauy,Marcelo Finger,Arnaldo Candido Junior
备注:11 pages
摘要:本技术报告使用已建立的基准数据集调查了预训练音频模型在COVID-19检测任务中的性能。我们在Coswara和COUGHVID数据集上微调了Audio-MAE和三种PANN架构(CNN 6,CNN 10,CNN 14),评估了数据集内和跨数据集的泛化。我们按年龄和性别实施了严格的人口分层,以防止模型利用人口特征与COVID-19状态之间的虚假相关性。数据集内结果显示中等性能,Audio-MAE在Coswara上获得最强结果(0.82 AUC,0.76 F1评分),而所有模型在Coughvid上表现出有限性能(AUC 0.58-0.63)。交叉数据集评估显示所有模型均存在严重的泛化失败(AUC 0.43-0.68),Audio-MAE显示出强烈的性能下降(F1评分0.00-0.08)。我们的实验表明,人口统计平衡在降低表观模型性能的同时,通过消除人口统计泄漏--一个夸大性能指标的混杂因素--提供了对COVID-19检测能力的更现实的评估。此外,平衡后有限的数据集大小(1,219 - 2,160个样本)被证明不足以用于通常需要更大训练集的深度学习模型。这些发现突出了开发可推广的基于音频的COVID-19检测系统的根本挑战,并强调了严格的人口统计控制对临床稳健模型评估的重要性。
摘要:This technical report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We fine-tuned Audio-MAE and three PANN architectures (CNN6, CNN10, CNN14) on the Coswara and COUGHVID datasets, evaluating both intra-dataset and cross-dataset generalization. We implemented a strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status. Intra-dataset results showed moderate performance, with Audio-MAE achieving the strongest result on Coswara (0.82 AUC, 0.76 F1-score), while all models demonstrated limited performance on Coughvid (AUC 0.58-0.63). Cross-dataset evaluation revealed severe generalization failure across all models (AUC 0.43-0.68), with Audio-MAE showing strong performance degradation (F1-score 0.00-0.08). Our experiments demonstrate that demographic balancing, while reducing apparent model performance, provides a more realistic assessment of COVID-19 detection capabilities by eliminating demographic leakage - a confounding factor that inflates performance metrics. Additionally, the limited dataset sizes after balancing (1,219-2,160 samples) proved insufficient for deep learning models that typically require substantially larger training sets. These findings highlight fundamental challenges in developing generalizable audio-based COVID-19 detection systems and underscore the importance of rigorous demographic controls for clinically robust model evaluation.
【5】HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation
标题:HULFSynth:通过对比因子估计的基于IRR的超分辨率和超低场MRI合成
链接:https://arxiv.org/abs/2511.14897
作者:Pranav Indrakanti,Ivor Simpson
备注:Submitted to ISBI 2026
摘要:我们提出了一种无监督的单图像双向磁共振图像(MRI)合成器,它可以从高场(HF)幅度图像合成类超低场(ULF)图像,反之亦然。与现有的MRI合成模型不同,我们的方法受到驱动HF与ULF MRI之间对比度变化的物理机制的启发。我们的前向模型通过基于目标对比度值估计组织类型信噪比(SNR)值来模拟HF到ULF的变换。对于超分辨率任务,我们使用隐式神经表示(INR)网络合成HF图像,同时预测组织类型分割和图像强度,而无需观测到的HF数据。所提出的方法使用从标准3T T$_1$加权图像生成的合成类ULF数据进行定性评估,并使用配对的3T-64mT T$_1$加权图像进行验证实验。WM-GM对比度在合成类ULF图像中提高了52%,在64mT图像中提高了37%。灵敏度实验表明,我们的前向模型对目标对比度、噪声和初始种子的变化具有鲁棒性。
摘要:We present an unsupervised single image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF) like image from a High-Field (HF) magnitude image and vice-versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates a HF to ULF transformation by estimating the tissue-type Signal-to-Noise ratio (SNR) values based on target contrast values. For the Super-Resolution task, we used an Implicit Neural Representation (INR) network to synthesize HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise and initial seeding.
【6】CODE-II: A large-scale dataset for artificial intelligence in ECG analysis
标题:CODE-II:用于心电图分析的人工智能大规模数据集
链接:https://arxiv.org/abs/2511.15632
作者:Petrus E. O. G. B. Abreu,Gabriela M. M. Paixão,Jiawei Li,Paulo R. Gomes,Peter W. Macfarlane,Ana C. S. Oliveira,Vinicius T. Carvalho,Thomas B. Schön,Antonio Luiz P. Ribeiro,Antônio H. Ribeiro
摘要:用于心电图(ECG)解释的数据驱动方法正在迅速发展。大型数据集使基于人工智能(AI)的ECG分析取得了进展,但注释质量、大小和范围的限制仍然是主要挑战。在这里,我们介绍了CODE-II,这是一个由巴西米纳斯吉拉斯州远程医疗网络(TNMG)收集的2,093,807名成年患者的2,735,269个12导联ECG的大规模真实数据集。每项检查都使用标准化诊断标准进行注释,并由心脏病专家进行审查。CODE-II的一个定义特征是一组66个临床上有意义的诊断类别,由心脏病专家输入开发,并在远程医疗实践中常规使用。我们还提供了一个开放的可用子集:CODE-II-open,一个包含15,000名患者的公共子集,以及CODE-II-test,一个由多名心脏病专家审查的8,475项检查的非重叠集,用于盲态评价。在CODE-II上预训练的神经网络在外部基准测试(PTB-XL和CPSC 2018)上实现了卓越的传输性能,并优于在更大数据集上训练的替代方案。
摘要:Data-driven methods for electrocardiogram (ECG) interpretation are rapidly progressing. Large datasets have enabled advances in artificial intelligence (AI) based ECG analysis, yet limitations in annotation quality, size, and scope remain major challenges. Here we present CODE-II, a large-scale real-world dataset of 2,735,269 12-lead ECGs from 2,093,807 adult patients collected by the Telehealth Network of Minas Gerais (TNMG), Brazil. Each exam was annotated using standardized diagnostic criteria and reviewed by cardiologists. A defining feature of CODE-II is a set of 66 clinically meaningful diagnostic classes, developed with cardiologist input and routinely used in telehealth practice. We additionally provide openly available subsets: CODE-II-open, a public subset of 15,000 patients, and the CODE-II-test, a non-overlapping set of 8,475 exams reviewed by multiple cardiologists for blinded evaluation. A neural network pre-trained on CODE-II achieved superior transfer performance on external benchmarks (PTB-XL and CPSC 2018) and outperformed alternatives trained on larger datasets.
【7】Reconstruction of three-dimensional shapes of normal and disease-related erythrocytes from partial observations using multi-fidelity neural networks
标题:使用多保真神经网络根据部分观察重建正常和疾病相关红细胞的三维形状
链接:https://arxiv.org/abs/2511.14962
作者:Haizhou Wen,He Li,Zhen Li
备注:29 pages, 10 figures, 3 appendices
摘要:从局部观测(例如显微镜图像)重建3D红细胞(RBC)形态,对于理解RBC老化的生理学和各种RBC疾病的病理学至关重要。在这项研究中,我们提出了一种多保真度神经网络(MFNN)方法,将红细胞的高保真横截面与形态相似的低保真参考3D红细胞形状融合,以恢复其完整的3D表面。MFNN预测器将在低保真参考RBC数据上训练的卷积神经网络与捕获非线性形态相关性的前馈神经网络相结合,并在低保真分支中使用表面积和体积约束增强训练以实现正则化。这种方法的理论基础是球面与3D RBC表面之间的拓扑同胚,训练数据由口形红细胞-盘状红细胞-棘形红细胞转化的耗散粒子动力学模拟产生。在正常和老化人群中观察到的不同RBC形状上进行基准测试,我们的结果表明,当提供至少两个正交横截面时,MFNN预测器能够以超过95%的坐标精度重建复杂的RBC形态。我们观察到,与棘形红细胞棘尖相交的信息性斜横截面同时改善了局部和全局特征重建,突出了特征感知采样的价值。我们的研究进一步评估了采样策略、形状差异性和噪声的影响,表明在物理约束训练下鲁棒性增强。总之,这些结果证明了MFNN能够从常规显微镜图像中观察到的部分横截面重建正常和老化RBC的3D形状,这可以促进正常和疾病相关RBC样品中RBC形态参数的定量分析。
摘要:Reconstruction of 3D erythrocyte or red blood cell (RBC) morphology from partial observations, such as microscope images, is essential for understanding the physiology of RBC aging and the pathology of various RBC disorders. In this study, we propose a multi-fidelity neural network (MFNN) approach to fuse high-fidelity cross-sections of an RBC, with a morphologically similar low-fidelity reference 3D RBC shape to recover its full 3D surface. The MFNN predictor combines a convolutional neural network trained on low-fidelity reference RBC data with a feedforward neural network that captures nonlinear morphological correlations, and augments training with surface area and volume constraints for regularization in the low-fidelity branch. This approach is theoretically grounded by a topological homeomorphism between a sphere and 3D RBC surfaces, with training data generated by dissipative particle dynamics simulations of stomatocyte-discocyte-echinocyte transformation. Benchmarking across diverse RBC shapes observed in normal and aged populations, our results show that the MFNN predictor can reconstruct complex RBC morphologies with over 95% coordinate accuracy when provided with at least two orthogonal cross-sections. It is observed that informative oblique cross-sections intersecting spicule tips of echinocytes improve both local and global feature reconstruction, highlighting the value of feature-aware sampling. Our study further evaluates the influence of sampling strategies, shape dissimilarity, and noise, showing enhanced robustness under physically constrained training. Altogether, these results demonstrate the capability of MFNN to reconstruct the 3D shape of normal and aged RBCs from partial cross-sections as observed in conventional microscope images, which could facilitate the quantitative analysis of RBC morphological parameters in normal and disease-related RBC samples.
蒸馏|知识提取(1篇)
【1】Logit-Based Losses Limit the Effectiveness of Feature Knowledge Distillation
标题:基于Logit的损失限制了特征知识蒸馏的有效性
链接:https://arxiv.org/abs/2511.14981
作者:Nicholas Cooper,Lijun Chen,Sailesh Dwivedy,Danna Gurari
备注:NeurIPS Workshop on Symmetry and Geometry in Neural Representations (NeurReps), December 2025
摘要:知识蒸馏(KD)方法可以将参数较多的教师模型的知识转移到较轻的学生模型。特征KD方法的现状是利用基于logits的损失函数(即,前softmax类分数)和中间层特征(即,潜在表征)。与以前的方法不同,我们提出了一个特征KD框架,用于专门使用基于特征的损失来训练学生的骨干(即,没有基于logit的损失,例如交叉熵)。利用最近发现的潜在表示的几何形状,我们引入了一个知识质量指标,以确定哪些教师层提供最有效的知识蒸馏。在三个图像分类数据集上进行的实验,包括四个不同的学生-教师对,跨越卷积神经网络和Vision Transformers,证明了我们的KD方法具有最先进的性能,与标准方法相比,前1级的准确率提高了15%。我们在https://github.com/Thegolfingocto/KD_wo_CE上分享我们的代码,以促进未来的工作。
摘要:Knowledge distillation (KD) methods can transfer knowledge of a parameter-heavy teacher model to a light-weight student model. The status quo for feature KD methods is to utilize loss functions based on logits (i.e., pre-softmax class scores) and intermediate layer features (i.e., latent representations). Unlike previous approaches, we propose a feature KD framework for training the student's backbone using feature-based losses exclusively (i.e., without logit-based losses such as cross entropy). Leveraging recent discoveries about the geometry of latent representations, we introduce a knowledge quality metric for identifying which teacher layers provide the most effective knowledge for distillation. Experiments on three image classification datasets with four diverse student-teacher pairs, spanning convolutional neural networks and vision transformers, demonstrate our KD method achieves state-of-the-art performance, delivering top-1 accuracy boosts of up to 15% over standard approaches. We publically share our code to facilitate future work at https://github.com/Thegolfingocto/KD_wo_CE.
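下面给出一个极简的PyTorch示意(网络结构、层选择与维度均为演示用假设,并非原论文配置),说明"仅用特征损失、不用交叉熵等基于logit的损失"来训练学生骨干的基本形态:

```python
import torch
import torch.nn as nn

# 演示用的教师/学生骨干(结构与维度均为假设)
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 256))
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
proj = nn.Linear(64, 256)  # 将学生特征投影到教师特征空间以便对齐

opt = torch.optim.Adam(list(student.parameters()) + list(proj.parameters()), lr=1e-3)
x = torch.randn(32, 1, 28, 28)  # 伪造的一个batch

with torch.no_grad():
    t_feat = teacher(x)                 # 教师某一中间层表征(此处假设选定该层)
s_feat = proj(student(x))               # 学生表征对齐到教师维度
loss = nn.functional.mse_loss(s_feat, t_feat)  # 纯特征损失,不含任何logit项
opt.zero_grad(); loss.backward(); opt.step()
```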
聚类(1篇)
【1】Convex Clustering Redefined: Robust Learning with the Median of Means Estimator
标题:重新定义凸聚类:使用均值中位数估计量的鲁棒学习
链接:https://arxiv.org/abs/2511.14784
作者:Sourav De,Koustav Chowdhury,Bibhabasu Mandal,Sagar Ghosh,Swagatam Das,Debolina Paul,Saptarshi Chakraborty
备注:Accepted in AAAI 2026
摘要:利用凸损失函数形成紧凑数据簇的聚类方法最近引起了越来越多的关注。虽然像k-means及其众多变体这样的经典方法仍被广泛使用,但它们都需要将聚类数k作为输入,而且许多方法对初始化非常敏感。凸聚类通过将聚类任务表述为凸优化问题提供了更稳定的替代方案,确保了唯一的全局解。然而,它在处理高维数据时面临挑战,特别是在存在噪声和离群值的情况下。此外,由调节参数控制的强融合正则化可能会阻碍凸聚类框架内有效簇的形成。为了克服这些挑战,我们引入了一种将凸聚类与均值中位数(MoM)估计量相结合的鲁棒方法,从而开发了一个抗离群值且高效的聚类框架,无需事先知道聚类数。通过结合MoM的鲁棒性与凸聚类的稳定性,我们的方法提升了性能和效率,尤其是在大规模数据集上。理论分析证明了在特定条件下的弱一致性,而在合成和真实数据集上的实验验证了该方法相比现有方法的优越性能。
摘要:Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all of them require the number of clusters k to be supplied as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches.
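作为背景,下面用几行NumPy示意均值中位数(MoM)估计量本身的抗离群值特性(与原文的凸聚类融合目标无关,仅演示MoM思想;数据为合成占位):

```python
import numpy as np

def median_of_means(X, n_blocks=5, rng=None):
    """均值中位数(MoM)估计:将样本随机划分为 n_blocks 块,
    对每块取均值,再按坐标取中位数,从而对离群点稳健。"""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    blocks = np.array_split(idx, n_blocks)
    block_means = np.stack([X[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((95, 2)),          # 主体分布在原点附近
               50 + 10 * rng.standard_normal((5, 2))])  # 5个远端离群点
print(median_of_means(X, n_blocks=7))  # 接近0;而 X.mean(0) 会被离群点明显拉偏
print(X.mean(axis=0))
```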
自动驾驶|车辆|车道检测等(1篇)
【1】STREAM-VAE: Dual-Path Routing for Slow and Fast Dynamics in Vehicle Telemetry Anomaly Detection
标题:STREAM-VAE:车辆遥测异常检测中慢速与快速动态的双路径路由
链接:https://arxiv.org/abs/2511.15339
作者:Kadir-Kaan Özer,René Ebeling,Markus Enzweiler
备注:8 Pages, 4 Figures, 4 Tables
摘要:汽车遥测数据常在同一序列内同时表现出缓慢漂移和快速尖峰,使得可靠的异常检测具有挑战性。标准的基于重建的方法,包括序列变分自编码器(VAE),使用单一潜在过程,因而混合了异构的时间尺度,可能平滑掉尖峰或放大方差,削弱异常分离能力。在本文中,我们提出了STREAM-VAE,一种用于汽车遥测时间序列数据异常检测的变分自编码器。我们的模型使用双路径编码器分离慢漂移与快尖峰的信号动态,并用解码器将瞬态偏差与正常运行模式分开表示。STREAM-VAE面向部署设计,可在不同运行模式下为车载监控器和后端车队分析生成稳定的异常分数。在汽车遥测数据集和公共SMD基准上的实验表明,与强大的预测、注意力、图和VAE基线相比,显式分离漂移与尖峰动态可提高鲁棒性。
摘要:Automotive telemetry data exhibits slow drifts and fast spikes, often within the same sequence, making reliable anomaly detection challenging. Standard reconstruction-based methods, including sequence variational autoencoders (VAEs), use a single latent process and therefore mix heterogeneous time scales, which can smooth out spikes or inflate variances and weaken anomaly separation. In this paper, we present STREAM-VAE, a variational autoencoder for anomaly detection in automotive telemetry time-series data. Our model uses a dual-path encoder to separate slow drift and fast spike signal dynamics, and a decoder that represents transient deviations separately from the normal operating pattern. STREAM-VAE is designed for deployment, producing stable anomaly scores across operating modes for both in-vehicle monitors and backend fleet analytics. Experiments on an automotive telemetry dataset and the public SMD benchmark show that explicitly separating drift and spike dynamics improves robustness compared to strong forecasting, attention, graph, and VAE baselines.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Artificial intelligence approaches for energy-efficient laser cutting machines
标题:节能激光切割机的人工智能方法
链接:https://arxiv.org/abs/2511.14952
作者:Mohamed Abdallah Salem,Hamdy Ahmed Ashour,Ahmed Elshenawy
摘要:该研究通过提出新颖的深度学习(DL)方法来实现节能,从而应对激光切割中能源消耗和环境影响的重大挑战。鉴于目前CO2激光排烟泵缺乏自适应控制且为开环运行,本研究采用闭环配置,根据被切割的材料和产生的烟雾水平动态调整泵功率。为了实现这种自适应系统,引入了多种材料分类方法,包括利用定制卷积神经网络(CNN)的无透镜散斑感测技术,以及使用USB相机、通过预训练的VGG16 CNN模型进行迁移学习的方法。此外,还采用了一个单独的用于烟雾水平检测的DL模型,以同时细化泵的功率输出。这种集成使排烟泵在非工作时间自动停止,并在运行期间动态调整功率,带来了实验证明的显著节能效果,结果显示排烟泵的能耗降低了20%至50%,从而为制造业的可持续发展做出了重大贡献。
摘要:This research addresses the significant challenges of energy consumption and environmental impact in laser cutting by proposing novel deep learning (DL) methodologies to achieve energy reduction. Recognizing the current lack of adaptive control and the open-loop nature of CO2 laser suction pumps, this study utilizes closed-loop configurations that dynamically adjust pump power based on both the material being cut and the smoke level generated. To implement this adaptive system, diverse material classification methods are introduced, including techniques leveraging lens-less speckle sensing with a customized Convolutional Neural Network (CNN) and an approach using a USB camera with transfer learning via the pre-trained VGG16 CNN model. Furthermore, a separate DL model for smoke level detection is employed to simultaneously refine the pump's power output. This integration prompts the exhaust suction pump to automatically halt during inactive times and dynamically adjust power during operation, leading to experimentally proven and remarkable energy savings, with results showing a 20% to 50% reduction in the smoke suction pump's energy consumption, thereby contributing substantially to sustainable development in the manufacturing sector.
联邦学习|隐私保护|加密(2篇)
【1】FairEnergy: Contribution-Based Fairness meets Energy Efficiency in Federated Learning
标题:公平能源:基于贡献的公平与联邦学习中的能源效率相结合
链接:https://arxiv.org/abs/2511.15454
作者:Ouiame Marnissi,Hajar EL Hammouti,El Houcine Bergou
摘要:联合学习(FL)支持跨分布式设备的协作模型训练,同时保护数据隐私。然而,在无线边缘系统中,由于异构资源、不平等的客户端贡献和有限的通信容量,在确保高模型准确性的同时平衡能量效率和公平参与仍然具有挑战性。为了解决这些挑战,我们提出了FairEnergy,一个公平意识的能量最小化框架,它集成了一个贡献分数,捕获更新的幅度和它们的压缩比到设备选择,带宽分配和压缩级别的联合优化。由此产生的混合整数非凸问题的解决,放松二进制选择变量和应用拉格朗日分解来处理全局带宽耦合,其次是每个设备的子问题优化。在非IID数据上的实验表明,与基线策略相比,FairEnergy实现了更高的准确性,同时降低了高达79%的能耗。
摘要:Federated learning (FL) enables collaborative model training across distributed devices while preserving data privacy. However, balancing energy efficiency and fair participation while ensuring high model accuracy remains challenging in wireless edge systems due to heterogeneous resources, unequal client contributions, and limited communication capacity. To address these challenges, we propose FairEnergy, a fairness-aware energy minimization framework that integrates a contribution score capturing both the magnitude of updates and their compression ratio into the joint optimization of device selection, bandwidth allocation, and compression level. The resulting mixed-integer non-convex problem is solved by relaxing binary selection variables and applying Lagrangian decomposition to handle global bandwidth coupling, followed by per-device subproblem optimization. Experiments on non-IID data show that FairEnergy achieves higher accuracy while reducing energy consumption by up to 79\% compared to baseline strategies.
【2】Bringing Federated Learning to Space
标题:将联邦学习带入太空
链接:https://arxiv.org/abs/2511.14889
作者:Grace Kim,Filip Svoboda,Nicholas Lane
备注:15 pages, 9 figures, 3 tables accepted to IEEE Aeroconf 2026
摘要:随着低地球轨道(LEO)卫星星座迅速扩展到数百和数千个航天器,对分布式机载机器学习的需求对于解决下行链路带宽限制变得至关重要。联邦学习(FL)提供了一个很有前途的框架,进行跨卫星网络的协同模型训练。要实现其在空间的惠益,自然需要解决空间特有的制约因素,从断断续续的连通性到轨道运动造成的动态。这项工作提出了第一个系统的可行性分析,适应现成的FL算法的卫星星座部署。我们引入了一个全面的“空间化”框架,适应地面算法(FedAvg,FedProx,FedBuff)运行轨道的限制下,产生一个轨道就绪的FL算法套件。然后,我们通过对768个星座配置进行广泛的参数扫描来评估这些空间化方法,这些星座配置包括不同的集群大小(1- 10),每个集群的卫星(1-10)和地面站网络(1-13)。我们的分析表明,空间自适应FL算法可以有效地扩展到多达100颗卫星的星座,实现接近集中式理想的性能。数月的训练周期可以减少到几天,相当于通过轨道调度和卫星群内的本地协调实现9倍的加速。这些结果为未来的任务设计者提供了可操作的见解,使分布式机载学习能够实现更自主,更有弹性和数据驱动的卫星操作。
摘要:As Low Earth Orbit (LEO) satellite constellations rapidly expand to hundreds and thousands of spacecraft, the need for distributed on-board machine learning becomes critical to address downlink bandwidth limitations. Federated learning (FL) offers a promising framework to conduct collaborative model training across satellite networks. Realizing its benefits in space naturally requires addressing space-specific constraints, from intermittent connectivity to dynamics imposed by orbital motion. This work presents the first systematic feasibility analysis of adapting off-the-shelf FL algorithms for satellite constellation deployment. We introduce a comprehensive "space-ification" framework that adapts terrestrial algorithms (FedAvg, FedProx, FedBuff) to operate under orbital constraints, producing an orbital-ready suite of FL algorithms. We then evaluate these space-ified methods through extensive parameter sweeps across 768 constellation configurations that vary cluster sizes (1-10), satellites per cluster (1-10), and ground station networks (1-13). Our analysis demonstrates that space-adapted FL algorithms efficiently scale to constellations of up to 100 satellites, achieving performance close to the centralized ideal. Multi-month training cycles can be reduced to days, corresponding to a 9x speedup through orbital scheduling and local coordination within satellite clusters. These results provide actionable insights for future mission designers, enabling distributed on-board learning for more autonomous, resilient, and data-driven satellite operations.
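下面是一个玩具级Python示意(连通概率、客户端数与更新噪声均为虚构),说明把标准FedAvg聚合放进"间歇可见"的卫星场景时的基本循环结构;原文的轨道调度与簇内协调远比这复杂:

```python
import numpy as np

def fedavg(client_ws, client_sizes):
    """标准FedAvg聚合:按本地样本量加权平均客户端权重。"""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_ws, client_sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(4)
for rnd in range(5):
    # 间歇连通性(假设):每轮只有部分卫星与地面站可见
    visible = [i for i in range(10) if rng.random() < 0.4]
    if not visible:
        continue  # 本轮无可见卫星,跳过聚合
    # 模拟可见卫星的本地更新(真实场景中为本地SGD的结果)
    client_ws = [global_w + 0.1 * rng.standard_normal(4) for _ in visible]
    global_w = fedavg(client_ws, [100] * len(visible))
print(global_w)
```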
推理|分析|理解|解释(5篇)
【1】Towards Understanding Layer Contributions in Tabular In-Context Learning Models
标题:了解表格式上下文学习模型中的层贡献
链接:https://arxiv.org/abs/2511.15432
作者:Amir Rezaei Balef,Mykhailo Koshil,Katharina Eggensperger
备注:Accepted at the EurIPS 2025 Workshop on AI for Tabular Data
摘要:尽管表格上下文学习(ICL)模型和大型语言模型(LLM)在架构上相似,但人们对各个层如何贡献于表格预测知之甚少。在本文中,我们研究了表格ICL模型中潜在空间如何跨层演变,识别潜在的冗余层,并将这些动态与LLM中观察到的动态进行比较。我们从"层作为画家"的视角分析TabPFN和TabICL,发现只有部分层共享共同的表征语言,这表明存在结构冗余,并为模型压缩和提高可解释性提供了机会。
摘要:Despite the architectural similarities between tabular in-context learning (ICL) models and large language models (LLMs), little is known about how individual layers contribute to tabular prediction. In this paper, we investigate how the latent spaces evolve across layers in tabular ICL models, identify potential redundant layers, and compare these dynamics with those observed in LLMs. We analyze TabPFN and TabICL through the "layers as painters" perspective, finding that only subsets of layers share a common representational language, suggesting structural redundancy and offering opportunities for model compression and improved interpretability.
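衡量"哪些层说同一种表征语言"的一个常用工具是线性CKA;下面给出其NumPy示意(各层激活以随机数占位,层数与维度均为假设):

```python
import numpy as np

def linear_cka(X, Y):
    """线性CKA:衡量两组(样本×维度)表征的相似度,取值在[0,1]之间。"""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    num = np.linalg.norm(X.T @ Y, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den

# 若相邻层的CKA接近1,说明它们共享表征语言,提示结构冗余
acts = [np.random.randn(200, 64) for _ in range(4)]  # 假设:4个层的激活
for i in range(3):
    print(f"layer {i} vs {i + 1}: CKA = {linear_cka(acts[i], acts[i + 1]):.3f}")
```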
【2】Proximal Approximate Inference in State-Space Models
标题:状态空间模型中的邻近近似推理
链接:https://arxiv.org/abs/2511.15409
作者:Hany Abdulsamad,Ángel F. García-Fernández,Simo Särkkä
摘要:我们提出了一类用于非线性、非高斯状态空间模型状态估计的算法。我们的方法基于变分拉格朗日公式,将贝叶斯推断表述为受动态约束的一系列熵信赖域更新。该框架产生了一族前向-后向算法,其结构由所选的变分后验分解决定。通过聚焦于高斯-马尔可夫近似,我们推导出具有良好计算复杂度的递归方案。对于一般的非线性、非高斯模型,我们使用广义统计线性回归和傅里叶-埃尔米特矩匹配来封闭递归。
摘要:We present a class of algorithms for state estimation in nonlinear, non-Gaussian state-space models. Our approach is based on a variational Lagrangian formulation that casts Bayesian inference as a sequence of entropic trust-region updates subject to dynamic constraints. This framework gives rise to a family of forward-backward algorithms, whose structure is determined by the chosen factorization of the variational posterior. By focusing on Gauss--Markov approximations, we derive recursive schemes with favorable computational complexity. For general nonlinear, non-Gaussian models we close the recursions using generalized statistical linear regression and Fourier--Hermite moment matching.
【3】Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
标题:可扩展混合专家推理的动态专家量化
链接:https://arxiv.org/abs/2511.15015
作者:Kexin Chu,Dawei Xiang,Zixu Shen,Yiwei Yang,Zecheng Liu,Wei Zhang
备注:7 pages
摘要:混合专家(MoE)模型可以高效地扩展LLM容量,但在消费级GPU上的部署受限于非活动专家的大量内存占用。静态训练后量化降低了存储成本,但不能适应变化的激活模式,导致在激进压缩下的准确性损失。因此,我们提出DynaExq,一个将专家精度视为一流的、动态管理的资源的运行时系统。DynaExq结合了(1)热度感知精度控制器,持续将专家位宽与长期激活统计数据对齐,(2)完全异步的精度切换管道,将精度提升和降级与MoE计算重叠,以及(3)无碎片内存池机制,支持混合精度专家的确定性分配。这些组件共同实现了在严格的HBM预算下稳定、无阻塞的精度转换。在Qwen3-30B和Qwen3-80B MoE模型以及六个代表性基准测试中,DynaExq将大型LLM部署在单个RTX 5090和A6000 GPU上,并将精度比静态低精度基线提高多达4.03个百分点。结果表明,自适应、工作负载感知的量化是内存受限MoE服务的有效策略。
摘要:Mixture-of-Experts (MoE) models scale LLM capacity efficiently, but deployment on consumer GPUs is limited by the large memory footprint of inactive experts. Static post-training quantization reduces storage costs but cannot adapt to shifting activation patterns, causing accuracy loss under aggressive compression. So we present DynaExq, a runtime system that treats expert precision as a first-class, dynamically managed resource. DynaExq combines (1) a hotness-aware precision controller that continuously aligns expert bit-widths with long-term activation statistics, (2) a fully asynchronous precision-switching pipeline that overlaps promotion and demotion with MoE computation, and (3) a fragmentation-free memory pooling mechanism that supports hybrid-precision experts with deterministic allocation. Together, these components enable stable, non-blocking precision transitions under strict HBM budgets. Across Qwen3-30B and Qwen3-80B MoE models and six representative benchmarks, DynaExq deploys large LLMs on single RTX 5090 and A6000 GPUs and improves accuracy by up to 4.03 points over static low-precision baselines. The results show that adaptive, workload-aware quantization is an effective strategy for memory-constrained MoE serving.
【4】Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization
标题:通过组轮次策略优化增强多轮工具集成推理能力
链接:https://arxiv.org/abs/2511.14846
作者:Yifeng Ding,Hung Le,Songyang Han,Kangrui Ruan,Zhenghui Jin,Varun Kumar,Zijian Wang,Anoop Deoras
摘要:为多轮工具集成推理(TIR)训练大型语言模型(LLM)——模型迭代推理、生成代码并通过执行进行验证——对于现有的强化学习(RL)方法仍然具有挑战性。当前的RL方法,例如组相对策略优化(GRPO),受困于粗粒度的轨迹级奖励,为复杂的多轮交互提供的学习信号不足,导致训练停滞。为了解决这个问题,我们提出了组轮次策略优化(GTPO),这是一种专门为在多轮TIR任务上训练LLM而设计的新型RL算法。GTPO引入了三个关键创新:(1)轮次级奖励分配,为每一轮提供细粒度的反馈,(2)基于回报的优势估计,将标准化的折扣回报作为优势,以及(3)自监督奖励塑形,利用来自生成代码的自监督信号,使稀疏的二进制结果奖励变得稠密。我们的综合评估表明,GTPO在不同的推理基准上平均比GRPO高出3.0%,证明了它在推进现实世界复杂数学推理方面的有效性。
摘要:Training Large Language Models (LLMs) for multi-turn Tool-Integrated Reasoning (TIR) - where models iteratively reason, generate code, and verify through execution - remains challenging for existing reinforcement learning (RL) approaches. Current RL methods, exemplified by Group Relative Policy Optimization (GRPO), suffer from coarse-grained, trajectory-level rewards that provide insufficient learning signals for complex multi-turn interactions, leading to training stagnation. To address this issue, we propose Group Turn Policy Optimization (GTPO), a novel RL algorithm specifically designed for training LLMs on multi-turn TIR tasks. GTPO introduces three key innovations: (1) turn-level reward assignment that provides fine-grained feedback for individual turns, (2) return-based advantage estimation where normalized discounted returns are calculated as advantages, and (3) self-supervised reward shaping that exploits self-supervision signals from generated code to densify sparse binary outcome-based rewards. Our comprehensive evaluation demonstrates that GTPO outperforms GRPO by 3.0% on average across diverse reasoning benchmarks, establishing its effectiveness for advancing complex mathematical reasoning in the real world.
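下面用几行NumPy示意其中第(2)点——把每轮的折扣回报标准化后用作优势(奖励数值与折扣因子均为假设;GTPO中的标准化应在采样组内进行,这里以单条轨迹代替以便演示):

```python
import numpy as np

def turn_advantages(turn_rewards, gamma=0.99):
    """对一条多轮轨迹:先从后往前累积每轮的折扣回报,
    再做标准化作为各轮的优势信号(示意轮次级反馈)。"""
    T = len(turn_rewards)
    returns = np.zeros(T)
    g = 0.0
    for t in reversed(range(T)):
        g = turn_rewards[t] + gamma * g
        returns[t] = g
    return (returns - returns.mean()) / (returns.std() + 1e-8)

# 假设的每轮奖励:代码执行通过记+0.1(自监督塑形),最终答案正确记+1
print(turn_advantages([0.1, 0.0, 0.1, 1.0]))
```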
【5】Latent space analysis and generalization to out-of-distribution data
标题:潜在空间分析与对分布外数据的泛化
链接:https://arxiv.org/abs/2511.15010
作者:Katie Rainey,Erin Hausmann,Donald Waagen,David Gray,Donald Hulsey
摘要:理解深度学习系统导出的潜在决策空间中数据点之间的关系,对于评估和解释系统在真实世界数据上的性能至关重要。为深度学习系统检测分布外(OOD)数据仍然是一个活跃的研究课题。我们研究潜在空间OOD检测与模型分类精度之间的联系。使用开源的模拟和实测合成孔径雷达(SAR)数据集,我们通过实验证明,OOD检测不能用作模型性能的代理度量。我们希望激发对潜在空间几何性质的更多研究,这些研究可能会为深度学习的鲁棒性和可推广性带来未来的见解。
摘要:Understanding the relationships between data points in the latent decision space derived by the deep learning system is critical to evaluating and interpreting the performance of the system on real world data. Detecting \textit{out-of-distribution} (OOD) data for deep learning systems continues to be an active research topic. We investigate the connection between latent space OOD detection and classification accuracy of the model. Using open source simulated and measured Synthetic Aperture RADAR (SAR) datasets, we empirically demonstrate that the OOD detection cannot be used as a proxy measure for model performance. We hope to inspire additional research into the geometric properties of the latent space that may yield future insights into deep learning robustness and generalizability.
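下面给出文中所考察的这类"潜在空间OOD检测"的一个常见实例——对ID特征拟合高斯分布、用马氏距离打分(特征为随机占位数据,维度为假设);注意原文的结论恰恰是此类分数不能代理模型精度:

```python
import numpy as np

def fit_gaussian(feats):
    """对ID训练特征拟合均值与(正则化的)协方差逆矩阵。"""
    mu = feats.mean(0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def ood_score(z, mu, prec):
    """潜在空间马氏距离:离ID分布越远分数越大,越可能是OOD。"""
    d = z - mu
    return float(d @ prec @ d)

train_feats = np.random.randn(500, 8)        # 假设:ID样本的潜在表征
mu, prec = fit_gaussian(train_feats)
print(ood_score(np.zeros(8), mu, prec))      # 分布中心附近:分数小
print(ood_score(6 * np.ones(8), mu, prec))   # 远离分布:分数大
```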
检测相关(3篇)
【1】Fast Post-Hoc Confidence Fusion for 3-Class Open-Set Aerial Object Detection
标题:用于3类开放集航空目标检测的快速事后置信融合
链接:https://arxiv.org/abs/2511.15343
作者:Spyridon Loukovitis,Vasileios Karampinis,Athanasios Voulodimos
摘要:开发可靠的无人机导航系统需要强大的空对空目标检测器,能够区分训练期间见过的目标和以前未见过的目标。虽然许多方法解决了闭集检测并实现了对域内(ID)目标的高置信度识别,但它们通常不解决开集检测,后者需要同时处理ID和分布外(OOD)目标。现有的开集方法通常依赖单一不确定性分数加阈值,灵活性有限,并且经常将OOD目标与背景杂波混淆。相比之下,我们提出了一个轻量级、与模型无关的后处理框架,在保持基础检测器性能的同时,显式地将背景与未知目标分开。我们的方法将开集检测从二元ID/OOD分类扩展为ID目标、OOD目标和背景之间的实时三路分类。为此,我们采用一种融合方案,使用紧凑的多层感知器(MLP)聚合多个置信度估计和每个检测的特征。将不同的logit变体加入MLP,可在不影响吞吐量的情况下持续提升二分类和三分类性能。大量消融和对比实验证实,我们的方法在二分类上平均AUROC超过基于阈值的基线2.7%,同时保持或改善开集mAP。此外,我们的研究独特地实现了鲁棒的三类分类,这是安全无人机导航的关键能力,其中必须主动避开OOD目标并安全地忽略背景区域。对比分析表明,我们的方法在各数据集上的AUROC优于竞争技术,同时将闭集mAP提高多达9个点,相对增益达18%。
摘要:Developing reliable UAV navigation systems requires robust air-to-air object detectors capable of distinguishing between objects seen during training and previously unseen objects. While many methods address closed-set detection and achieve high-confidence recognition of in-domain (ID) targets, they generally do not tackle open-set detection, which requires simultaneous handling of both ID and out-of-distribution (OOD) objects. Existing open-set approaches typically rely on a single uncertainty score with thresholding, limiting flexibility and often conflating OOD objects with background clutter. In contrast, we propose a lightweight, model-agnostic post-processing framework that explicitly separates background from unknown objects while preserving the base detector's performance. Our approach extends open-set detection beyond binary ID/OOD classification to real-time three-way classification among ID targets, OOD objects, and background. To this end, we employ a fusion scheme that aggregates multiple confidence estimates and per-detection features using a compact multilayer perceptron (MLP). Incorporating different logit variants into the MLP consistently enhances performance across both binary and three-class classification without compromising throughput. Extensive ablation and comparative experiments confirm that our method surpasses threshold-based baselines in two-class classification by an average of 2.7% AUROC, while retaining or improving open-set mAP. Furthermore, our study uniquely enables robust three-class classification, a critical capability for safe UAV navigation, where OOD objects must be actively avoided and background regions safely ignored. Comparative analysis highlights that our method surpasses competitive techniques in AUROC across datasets, while improving closed-set mAP by up to 9 points, an 18% relative gain.
【2】Fourier-KAN-Mamba: A Novel State-Space Equation Approach for Time-Series Anomaly Detection
标题:Fourier-KAN-Mamba:一种用于时间序列异常检测的新型状态空间方程方法
链接:https://arxiv.org/abs/2511.15083
作者:Xiancheng Wang,Lin Wang,Rui Wang,Zhibo Zhang,Minghang Zhao
摘要:时间序列异常检测在许多实际应用中起着关键作用,包括工业监测和故障诊断。最近,基于Mamba的状态空间模型在长序列建模中表现出了显著的效率。然而,将Mamba直接应用于异常检测任务,在捕获复杂时间模式和非线性动力学方面仍面临挑战。在本文中,我们提出了Fourier-KAN-Mamba,一种集成傅里叶层、Kolmogorov-Arnold网络(KAN)和Mamba选择性状态空间模型的新型混合架构。傅里叶层提取多尺度频率特征,KAN增强非线性表示能力,时间门控机制进一步提高了模型区分正常和异常模式的能力。在MSL、SMAP和SWaT数据集上的大量实验表明,我们的方法显著优于现有的最先进方法。关键词:时间序列异常检测、状态空间模型、Mamba、傅里叶变换、Kolmogorov-Arnold网络
摘要:Time-series anomaly detection plays a critical role in numerous real-world applications, including industrial monitoring and fault diagnosis. Recently, Mamba-based state-space models have shown remarkable efficiency in long-sequence modeling. However, directly applying Mamba to anomaly detection tasks still faces challenges in capturing complex temporal patterns and nonlinear dynamics. In this paper, we propose Fourier-KAN-Mamba, a novel hybrid architecture that integrates Fourier layer, Kolmogorov-Arnold Networks (KAN), and Mamba selective state-space model. The Fourier layer extracts multi-scale frequency features, KAN enhances nonlinear representation capability, and a temporal gating control mechanism further improves the model's ability to distinguish normal and anomalous patterns. Extensive experiments on MSL, SMAP, and SWaT datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. Keywords: time-series anomaly detection, state-space model, Mamba, Fourier transform, Kolmogorov-Arnold Network
【3】How to pick the best anomaly detector?
标题:如何选择最好的异常检测器?
链接:https://arxiv.org/abs/2511.14832
作者:Marie Hein,Gregor Kasieczka,Michael Krämer,Louis Moureaux,Alexander Mück,David Shih
备注:12 pages, 7 figures
摘要:异常检测有潜力在数据的未探索区域发现新物理。然而,以模型无关的方式为给定数据集选择最佳异常检测器是一个重要挑战,迄今在很大程度上被忽视。在本文中,我们介绍了数据驱动的ARGOS度量,它具有良好的理论基础,并且经验表明能在给定数据下稳健地选出最敏感的异常检测模型。聚焦于弱监督、基于分类器的异常检测方法,我们表明ARGOS度量优于文献中先前使用的其他模型选择度量,特别是二元交叉熵损失。我们探索了几个现实的应用,包括超参数调整以及架构和特征选择,并且在所有情况下我们都表明,ARGOS对异常检测的噪声条件是鲁棒的。
摘要:Anomaly detection has the potential to discover new physics in unexplored regions of the data. However, choosing the best anomaly detector for a given data set in a model-agnostic way is an important challenge which has hitherto largely been neglected. In this paper, we introduce the data-driven ARGOS metric, which has a sound theoretical foundation and is empirically shown to robustly select the most sensitive anomaly detection model given the data. Focusing on weakly-supervised, classifier-based anomaly detection methods, we show that the ARGOS metric outperforms other model selection metrics previously used in the literature, in particular the binary cross-entropy loss. We explore several realistic applications, including hyperparameter tuning as well as architecture and feature selection, and in all cases we demonstrate that ARGOS is robust to the noisy conditions of anomaly detection.
分类|识别(3篇)
【1】Decentralized Gaussian Process Classification and an Application in Subsea Robotics
标题:分散高斯过程分类及其在水下机器人中的应用
链接:https://arxiv.org/abs/2511.15529
作者:Yifei Gao,Hans J. He,Daniel J. Stilwell,James McMahon
备注:8 pages, 8 figures, IROS 2025 conference
摘要:协作的自主水下航行器(AUV)团队依靠声学通信进行协调,但这种通信介质受到有限距离、多径效应和低带宽的限制。应对声学通信不确定性的一种方式是实时学习通信环境。我们解决的挑战是:让一组机器人实时构建"从一个位置到另一个位置通信成功概率"的地图。这是一个分散式分类问题——通信事件要么成功,要么失败——AUV共享其通信测量的一个子集来构建地图。这项工作的主要贡献是一个经过严格推导的数据共享策略,用于选择在AUV之间共享哪些测量。我们使用由弗吉尼亚理工690型AUV团队收集的真实声学通信数据对所提出的共享策略进行了实验验证,证明了其在水下环境中的有效性。
摘要:Teams of cooperating autonomous underwater vehicles (AUVs) rely on acoustic communication for coordination, yet this communication medium is constrained by limited range, multi-path effects, and low bandwidth. One way to address the uncertainty associated with acoustic communication is to learn the communication environment in real-time. We address the challenge of a team of robots building a map of the probability of communication success from one location to another in real-time. This is a decentralized classification problem -- communication events are either successful or unsuccessful -- where AUVs share a subset of their communication measurements to build the map. The main contribution of this work is a rigorously derived data sharing policy that selects measurements to be shared among AUVs. We experimentally validate our proposed sharing policy using real acoustic communication data collected from teams of Virginia Tech 690 AUVs, demonstrating its effectiveness in underwater environments.
【2】TSFM in-context learning for time-series classification of bearing-health status
标题:用于轴承健康状态时间序列分类的TSFM上下文学习
链接:https://arxiv.org/abs/2511.15447
作者:Michel Tokic,Slobodan Djukanović,Anja von Beuningen,Cheng Feng
备注:Preprint submitted to ESANN 2026
摘要:本文介绍了一种基于上下文学习的时间序列基础模型(TSFM)分类方法。我们展示了不属于TSFM训练数据语料库的数据如何在不需要微调模型的情况下进行分类。在模型的提示中,示例以目标(类ID)和协变量(数据矩阵)的形式表示,这使得能够通过上下文学习沿着预测轴对未知协变量数据模式进行分类。我们将这种方法应用于振动数据,用于评估伺服压力机电机内的轴承的健康状态。该方法将频域参考信号转换为伪时间序列模式,生成对齐的协变量和目标信号,并使用TSFM来预测分类数据与预定义标签对应的概率。利用预训练模型的可扩展性,该方法在不同的操作条件下证明了有效性。这标志着从定制的狭窄AI解决方案向更广泛的AI驱动的维护系统的重大进展。
摘要:This paper introduces a classification method using in-context learning in time-series foundation models (TSFM). We show how data, which was not part of the TSFM training data corpus, can be classified without the need of finetuning the model. Examples are represented in the form of targets (class id) and covariates (data matrix) within the prompt of the model, which enables to classify an unknown covariate data pattern alongside the forecast axis through in-context learning. We apply this method to vibration data for assessing the health state of a bearing within a servo-press motor. The method transforms frequency domain reference signals into pseudo time-series patterns, generates aligned covariate and target signals, and uses the TSFM to predict probabilities how classified data corresponds to predefined labels. Leveraging the scalability of pre-trained models this method demonstrates efficacy across varied operational conditions. This marks significant progress beyond custom narrow AI solutions towards broader, AI-driven maintenance systems.
【3】Interpretable temporal fusion network of multi- and multi-class arrhythmia classification
标题:多类和多类心律失常分类的可解释时间融合网络
链接:https://arxiv.org/abs/2511.15062
作者:Yun Kwan Kim
备注:[Doctoral dissertation, Korea University, 2025]
摘要:临床决策支持系统(CDSS)已被广泛用于辅助心脏病专家从心电图中检测和分类心律失常。然而,由于心律失常发作的长度各不相同,构建用于心律失常分类任务的CDSS具有挑战性;尽管心律失常的发作时间各异,以前开发的方法并未考虑这些情况。因此,我们提出了一个框架,包括(i)局部和全局特征提取,以及(ii)带注意力的局部-全局信息融合,以在受约束的输入长度内实现心律失常检测和分类。我们在10类和4类心律失常检测任务上评估了该框架的性能,重点是使用MIT-BIH心律失常数据库(MITDB)和MIT-BIH房颤数据库(AFDB)识别心律失常发作的起止点及其持续时间。在持续时间、发作和Dice分数方面,MITDB上的总体F1分数分别为96.45%、82.05%和96.31%,AFDB上分别为97.57%、98.31%和97.45%。结果表明,与基准模型相比,该模型在统计意义上具有更优的性能。为了评估所提方法的泛化能力,将MITDB训练的模型和MIT-BIH恶性室性心律失常数据库训练的模型分别在AFDB和MITDB上进行了测试,获得了优于最先进模型的性能。所提出的方法有效地捕获了局部和全局信息与动态,且没有显著的信息损失。因此,可以更准确地检测心律失常并精确确定其发生时间,使临床领域能够基于所提出的方法制定更准确的治疗计划。
摘要:Clinical decision support systems (CDSSs) have been widely utilized to support the decisions made by cardiologists when detecting and classifying arrhythmia from electrocardiograms. However, forming a CDSS for the arrhythmia classification task is challenging due to the varying lengths of arrhythmias. Although the onset time of arrhythmia varies, previously developed methods have not considered such conditions. Thus, we propose a framework that consists of (i) local and global extraction and (ii) local-global information fusion with attention to enable arrhythmia detection and classification within a constrained input length. The framework's performance was evaluated in terms of 10-class and 4-class arrhythmia detection, focusing on identifying the onset and ending point of arrhythmia episodes and their duration using the MIT-BIH arrhythmia database (MITDB) and the MIT-BIH atrial fibrillation database (AFDB). Duration, episode, and Dice score performances resulted in overall F1-scores of 96.45%, 82.05%, and 96.31% on the MITDB and 97.57%, 98.31%, and 97.45% on the AFDB, respectively. The results demonstrated statistically superior performance compared to those of the benchmark models. To assess the generalization capability of the proposed method, an MITDB-trained model and MIT-BIH malignant ventricular arrhythmia database-trained model were tested AFDB and MITDB, respectively. Superior performance was attained compared with that of a state-of-the-art model. The proposed method effectively captures both local and global information and dynamics without significant information loss. Consequently, arrhythmias can be detected with greater accuracy, and their occurrence times can be precisely determined, enabling the clinical field to develop more accurate treatment plans based on the proposed method.
表征(3篇)
【1】PLATONT: Learning a Platonic Representation for Unified Network Tomography
标题:PLATONT:学习用于统一网络断层扫描的柏拉图表示
链接:https://arxiv.org/abs/2511.15251
作者:Chengze Du,Heng Xu,Zhiwei Yu,Bo Liu,Jialong Li
摘要:网络断层扫描的目的是从外部观测推断隐藏的网络状态,如链路性能、流量负载和拓扑结构。大多数现有方法单独解决这些问题,并依赖于有限的任务特定信号,这限制了泛化和可解释性。我们提出了PLATONT,一个统一的框架,将不同的网络指标(例如,延迟、丢包、带宽)建模为共享潜在网络状态的投影。在柏拉图表征假说的指导下,PLATONT通过多模态对齐和对比学习来学习这种潜在状态。通过在共享的潜在空间内训练多个断层扫描任务,它构建了紧凑且结构化的表示,提高了跨任务泛化能力。在合成和真实数据集上的实验表明,PLATONT在链路估计、拓扑推断和流量预测方面均优于现有方法,在不同网络条件下具有更高的准确性和更强的鲁棒性。
摘要:Network tomography aims to infer hidden network states, such as link performance, traffic load, and topology, from external observations. Most existing methods solve these problems separately and depend on limited task-specific signals, which limits generalization and interpretability. We present PLATONT, a unified framework that models different network indicators (e.g., delay, loss, bandwidth) as projections of a shared latent network state. Guided by the Platonic Representation Hypothesis, PLATONT learns this latent state through multimodal alignment and contrastive learning. By training multiple tomography tasks within a shared latent space, it builds compact and structured representations that improve cross-task generalization. Experiments on synthetic and real-world datasets show that PLATONT consistently outperforms existing methods in link estimation, topology inference, and traffic prediction, achieving higher accuracy and stronger robustness under varying network conditions.
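其中的多模态对齐可以用对称InfoNCE损失来示意:把同一网络状态下两种指标的表征拉近、不同状态的推远。下面是一个最小PyTorch草图(编码器输出以随机张量代替,维度与温度参数均为假设):

```python
import torch
import torch.nn.functional as F

def info_nce(za, zb, tau=0.07):
    """对称InfoNCE:批内同一状态的两种模态表征为正样本对,
    其余为负样本;最小化该损失即把两种观测对齐到共享潜在空间。"""
    za, zb = F.normalize(za, dim=1), F.normalize(zb, dim=1)
    logits = za @ zb.t() / tau              # (B,B) 相似度矩阵
    labels = torch.arange(za.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

za = torch.randn(16, 32)  # 假设:时延编码器的输出
zb = torch.randn(16, 32)  # 假设:带宽编码器的输出(与za按网络状态配对)
print(info_nce(za, zb).item())
```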
【2】Complex-Valued 2D Gaussian Representation for Computer-Generated Holography
标题:计算机生成全息术的复值2D高斯表示
链接:https://arxiv.org/abs/2511.15022
作者:Yicheng Zhan,Xiangjun Gao,Long Quan,Kaan Akşit
备注:8 pages, 11 figures
摘要:我们提出了一种基于结构化复值二维高斯基元的新型全息图表示,它取代了逐像素的信息存储,并将参数搜索空间减少多达10:1。为了实现端到端训练,我们为该表示开发了可微光栅化器,并集成了GPU优化的自由空间光传播内核。大量实验表明,我们的方法在产生比现有方法更高保真度重建的同时,将VRAM使用量降低多达2.5倍、优化速度提高50%。我们进一步介绍了一个转换过程,使我们的表示适配实际的全息图格式,包括平滑和随机仅相位全息图。实验表明,该过程可以有效抑制先前方法中观察到的噪声伪影。通过减小全息图参数搜索空间,我们的表示使下一代计算机生成全息系统中更具可扩展性的全息图估计成为可能。
摘要:We propose a new hologram representation based on structured complex-valued 2D Gaussian primitives, which replaces per-pixel information storage and reduces the parameter search space by up to 10:1. To enable end-to-end training, we develop a differentiable rasterizer for our representation, integrated with a GPU-optimized light propagation kernel in free space. Our extensive experiments show that our method achieves up to 2.5x lower VRAM usage and 50% faster optimization while producing higher-fidelity reconstructions than existing methods. We further introduce a conversion procedure that adapts our representation to practical hologram formats, including smooth and random phase-only holograms. Our experiments show that this procedure can effectively suppress noise artifacts observed in previous methods. By reducing the hologram parameter search space, our representation enables a more scalable hologram estimation in the next-generation computer-generated holography systems.
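下面用NumPy示意"复值二维高斯基元叠加成复振幅场"这一表示的核心(各向同性方差、少量基元等均为简化假设;原文的可微光栅化器与光传播内核未包含在内):

```python
import numpy as np

def rasterize_complex_gaussians(H, W, mus, sigmas, amps, phases):
    """把若干复值2D高斯基元栅格化为复振幅场(全息图的紧凑表示示意)。
    每个基元由中心mu、各向同性sigma、复幅值 amp*exp(i*phase) 描述。"""
    yy, xx = np.mgrid[0:H, 0:W]
    field = np.zeros((H, W), dtype=np.complex128)
    for (cy, cx), s, a, p in zip(mus, sigmas, amps, phases):
        g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * s ** 2))
        field += a * np.exp(1j * p) * g
    return field

field = rasterize_complex_gaussians(
    64, 64, mus=[(20, 20), (40, 45)], sigmas=[5, 8],
    amps=[1.0, 0.7], phases=[0.0, 1.2])
phase_only = np.angle(field)   # 转为仅相位全息图的一种朴素方式(示意)
print(field.shape, phase_only.min(), phase_only.max())
```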
【3】Structured Contrastive Learning for Interpretable Latent Representations
标题:可解释潜在表示的结构化对比学习
链接:https://arxiv.org/abs/2511.14920
作者:Zhengyang Shen,Hua Tu,Mayue Shi
备注:Comments: 10 pages, 6 figures. Applications to medical signal retrieval and activity recognition. Correspondence: m.shi16@imperial.ac.uk
摘要:神经网络对语义无关的转换表现出严重的脆弱性。仅仅75 ms的心电图(ECG)相移将潜在余弦相似性从1.0降低到0.2,而传感器旋转会使惯性测量单元(IMU)的活动识别性能崩溃。我们确定的根本原因是“自由放任”的表示学习,潜在的空间发展不受约束的任务性能是满意的。我们提出了结构化对比学习(SCL),这是一个将潜在空间表示划分为三个语义组的框架:在给定变换下保持一致的不变特征(例如,相移或旋转)、通过新颖的变体机制主动区分变换的变体特征、以及保持任务灵活性的自由特征。这创造了可控的推拉动态,其中不同的潜在维度服务于不同的,可解释的目的。变体机制通过鼓励变体特征在正对中区分来增强对比学习,从而实现同时的鲁棒性和可解释性。我们的方法不需要架构修改,并无缝集成到现有的培训管道。ECG相位不变性和IMU旋转鲁棒性的实验证明了优越的性能:ECG相似性从0.25提高到0.91相移下,而WISDM活动识别达到86.65%的准确率与95.38%的旋转一致性,始终优于传统的数据增强。这项工作代表了从反应式数据增强到主动式结构学习的范式转变,从而实现了神经网络中可解释的潜在表示。
摘要:Neural networks exhibit severe brittleness to semantically irrelevant transformations. A mere 75ms electrocardiogram (ECG) phase shift degrades latent cosine similarity from 1.0 to 0.2, while sensor rotations collapse activity recognition performance with inertial measurement units (IMUs). We identify the root cause as "laissez-faire" representation learning, where latent spaces evolve unconstrained provided task performance is satisfied. We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups: invariant features that remain consistent under given transformations (e.g., phase shifts or rotations), variant features that actively differentiate transformations via a novel variant mechanism, and free features that preserve task flexibility. This creates controllable push-pull dynamics where different latent dimensions serve distinct, interpretable purposes. The variant mechanism enhances contrastive learning by encouraging variant features to differentiate within positive pairs, enabling simultaneous robustness and interpretability. Our approach requires no architectural modifications and integrates seamlessly into existing training pipelines. Experiments on ECG phase invariance and IMU rotation robustness demonstrate superior performance: ECG similarity improves from 0.25 to 0.91 under phase shifts, while WISDM activity recognition achieves 86.65% accuracy with 95.38% rotation consistency, consistently outperforming traditional data augmentation. This work represents a paradigm shift from reactive data augmentation to proactive structural learning, enabling interpretable latent representations in neural networks.
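下面是一个最小PyTorch草图,示意把潜在向量划分为[不变|可变|自由]三段并施加推拉损失的思路(维度划分、损失形式与权重均为演示用假设,并非原文的精确公式):

```python
import torch
import torch.nn.functional as F

def scl_loss(z, z_aug, n_inv=16, n_var=8, w_var=0.5):
    """z与z_aug为同一样本在变换(如ECG相移)前后的潜在表征。
    不变段:变换前后拉近;可变段:在正样本对内主动区分变换;
    其余维度为自由段,不加约束以保留任务灵活性。"""
    inv, inv_a = z[:, :n_inv], z_aug[:, :n_inv]
    var, var_a = z[:, n_inv:n_inv + n_var], z_aug[:, n_inv:n_inv + n_var]
    l_inv = 1 - F.cosine_similarity(inv, inv_a).mean()   # 拉:不变特征保持一致
    l_var = F.cosine_similarity(var, var_a).mean()       # 推:可变特征区分变换
    return l_inv + w_var * l_var

z = torch.randn(32, 32)      # 假设:编码器对原始信号的输出
z_aug = torch.randn(32, 32)  # 假设:编码器对相移信号的输出
print(scl_loss(z, z_aug).item())
```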
3D|3D重建等相关(1篇)
【1】US-X Complete: A Multi-Modal Approach to Anatomical 3D Shape Recovery
标题:US-X Complete:用于解剖结构三维形状恢复的多模态方法
链接:https://arxiv.org/abs/2511.15600
作者:Miruna-Alexandra Gafencu,Yordanka Velikova,Nassir Navab,Mohammad Farid Azampour
备注:Accepted at the Workshop on Shape in Medical Imaging at MICCAI 2025
摘要:超声为脊柱标志、椎旁软组织和神经血管结构的实时可视化提供了一种无辐射、具有成本效益的解决方案,使其在脊柱手术期间的术中引导中具有价值。然而,由于骨骼引起的声影效应,超声在可视化完整椎骨解剖结构(特别是椎体)方面存在固有局限。在这项工作中,我们提出了一种新的多模态深度学习方法,通过利用来自单张X射线图像的互补信息来补全3D超声中被遮挡的解剖结构。为了实现训练,我们生成成对的训练数据,包括:(1)模拟X射线扫描的二维侧位椎骨视图,以及(2)模拟超声脊柱成像期间有限可见性和遮挡的三维部分椎骨表示。我们的方法整合了来自两种成像模态的形态信息,与3D超声椎骨补全的最新技术相比,椎骨重建有显著改善(p < 0.001)。作为未来临床转化的第一步,我们进行了体模研究,实现了更准确、更完整的腰椎体积可视化并叠加在超声扫描上,而无需与计算机断层扫描等术前模态进行配准。这表明,整合单张X射线投影缓解了超声的关键局限,同时保留了其作为主要成像模态的优势。代码和数据见 https://github.com/miruna20/US-X-Complete
摘要:Ultrasound offers a radiation-free, cost-effective solution for real-time visualization of spinal landmarks, paraspinal soft tissues and neurovascular structures, making it valuable for intraoperative guidance during spinal procedures. However, ultrasound suffers from inherent limitations in visualizing complete vertebral anatomy, in particular vertebral bodies, due to acoustic shadowing effects caused by bone. In this work, we present a novel multi-modal deep learning method for completing occluded anatomical structures in 3D ultrasound by leveraging complementary information from a single X-ray image. To enable training, we generate paired training data consisting of: (1) 2D lateral vertebral views that simulate X-ray scans, and (2) 3D partial vertebrae representations that mimic the limited visibility and occlusions encountered during ultrasound spine imaging. Our method integrates morphological information from both imaging modalities and demonstrates significant improvements in vertebral reconstruction (p < 0.001) compared to state of art in 3D ultrasound vertebral completion. We perform phantom studies as an initial step to future clinical translation, and achieve a more accurate, complete volumetric lumbar spine visualization overlayed on the ultrasound scan without the need for registration with preoperative modalities such as computed tomography. This demonstrates that integrating a single X-ray projection mitigates ultrasound's key limitation while preserving its strengths as the primary imaging modality. Code and data can be found at https://github.com/miruna20/US-X-Complete
编码器(1篇)
【1】DCL-SE: Dynamic Curriculum Learning for Spatiotemporal Encoding of Brain Imaging
标题:DCL-SE:脑成像时空编码的动态课程学习
链接:https://arxiv.org/abs/2511.15151
作者:Meihua Zhou,Xinyu Tong,Jiarui Zhao,Min Cheng,Li Yang,Lei Tian,Nan Wan
摘要:用于临床诊断的高维神经成像分析通常受到时空保真度的妥协以及大规模通用模型的有限适应性的限制。为了应对这些挑战,我们引入了时空编码的动态课程学习(DCL-SE),一个以数据驱动的时空编码(DaSE)为中心的端到端框架。我们利用近似秩池(ARP)有效地编码三维体积的大脑数据到信息丰富的,二维的动态表示,然后采用动态课程学习策略,由动态组机制(DGM)的指导下,逐步训练解码器,细化特征提取从全球解剖结构到精细的病理细节。在六个公开可用的数据集上进行评估,包括阿尔茨海默病和脑肿瘤分类,脑动脉分割和脑年龄预测,DCL-SE在准确性,鲁棒性和可解释性方面始终优于现有方法。这些发现强调了在大规模预训练网络时代,紧凑的、特定于任务的架构的至关重要性。
摘要:High-dimensional neuroimaging analyses for clinical diagnosis are often constrained by compromises in spatiotemporal fidelity and by the limited adaptability of large-scale, general-purpose models. To address these challenges, we introduce Dynamic Curriculum Learning for Spatiotemporal Encoding (DCL-SE), an end-to-end framework centered on data-driven spatiotemporal encoding (DaSE). We leverage Approximate Rank Pooling (ARP) to efficiently encode three-dimensional volumetric brain data into information-rich, two-dimensional dynamic representations, and then employ a dynamic curriculum learning strategy, guided by a Dynamic Group Mechanism (DGM), to progressively train the decoder, refining feature extraction from global anatomical structures to fine pathological details. Evaluated across six publicly available datasets, including Alzheimer's disease and brain tumor classification, cerebral artery segmentation, and brain age prediction, DCL-SE consistently outperforms existing methods in accuracy, robustness, and interpretability. These findings underscore the critical importance of compact, task-specific architectures in the era of large-scale pretrained networks.
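其中的近似秩池化(ARP)可用动态图像文献中常见的闭式系数实现:沿序列维以 a_t = 2t - T - 1 加权求和,把3D体数据压成一张信息富集的2D"动态图"。下面是NumPy示意(体数据为随机占位;把切片轴当作序列维是演示用假设):

```python
import numpy as np

def approximate_rank_pooling(volume):
    """近似秩池化(ARP):沿第一维(此处视为"序列"维)
    用闭式系数 a_t = 2t - T - 1(t=1..T)加权求和,
    将3D体数据编码为一张2D动态表示。"""
    T = volume.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1
    return np.tensordot(alpha, volume, axes=(0, 0))

vol = np.random.rand(96, 128, 128)        # 假设:96层切片的脑部体数据
dyn2d = approximate_rank_pooling(vol)     # 得到 (128,128) 的2D动态图
print(dyn2d.shape)
```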
优化|敛散性(5篇)
【1】Convergence and Sketching-Based Efficient Computation of Neural Tangent Kernel Weights in Physics-Based Loss
标题:基于物理的损失中神经正切核权重的收敛性与基于草图的高效计算
链接:https://arxiv.org/abs/2511.15530
作者:Max Hirsch,Federico Pichi
摘要:在多目标优化中,多个损失项被加权并相加以形成单一目标。选择这些权重是为了根据某种元目标恰当地平衡相互竞争的损失。例如,在物理信息神经网络(PINN)中,这些权重通常被自适应地选择,以改善网络的泛化误差。自适应权重的一个流行选择基于PINN的神经正切核(NTK),它描述了训练期间网络在预测器空间中的演变。这种自适应加权算法的收敛性先验上并不清楚。此外,这些基于NTK的权重需要在训练期间频繁更新,进一步增加了学习过程的计算负担。在本文中,我们证明了在适当条件下,采用基于NTK的自适应权重增强的梯度下降在适当意义下是收敛的。然后,我们通过开发一种受预测-校正方法和矩阵草图启发的随机算法来解决计算效率问题,该算法可产生NTK的无偏估计,离散化误差可任意小。最后,我们提供数值实验来支持我们的理论发现,并展示我们随机算法的有效性。代码:https://github.com/maxhirsch/Efficient-NTK
摘要:In multi-objective optimization, multiple loss terms are weighted and added together to form a single objective. These weights are chosen to properly balance the competing losses according to some meta-goal. For example, in physics-informed neural networks (PINNs), these weights are often adaptively chosen to improve the network's generalization error. A popular choice of adaptive weights is based on the neural tangent kernel (NTK) of the PINN, which describes the evolution of the network in predictor space during training. The convergence of such an adaptive weighting algorithm is not clear a priori. Moreover, these NTK-based weights would be updated frequently during training, further increasing the computational burden of the learning process. In this paper, we prove that under appropriate conditions, gradient descent enhanced with adaptive NTK-based weights is convergent in a suitable sense. We then address the problem of computational efficiency by developing a randomized algorithm inspired by a predictor-corrector approach and matrix sketching, which produces unbiased estimates of the NTK up to an arbitrarily small discretization error. Finally, we provide numerical experiments to support our theoretical findings and to show the efficacy of our randomized algorithm. Code Availability: https://github.com/maxhirsch/Efficient-NTK
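下面用一个微型PyTorch例子示意基于NTK的自适应权重的朴素算法:各损失项NTK块的迹可由逐点残差对参数的梯度范数累加得到,再按迹的比例设定权重。原文的贡献在于该过程的收敛性证明,以及用矩阵草图把迹估计做成无偏且高效的版本(此处未实现草图;网络与残差均为演示假设):

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
params = [p for p in net.parameters() if p.requires_grad]

def ntk_trace(residuals):
    """NTK块迹的朴素估计:tr(K_i) = sum_j ||grad_theta r_j||^2。
    大规模场景应改用随机草图做近似,这里小规模直接逐点计算。"""
    tr = 0.0
    for r in residuals:
        grads = torch.autograd.grad(r, params, retain_graph=True)
        tr += float(sum(g.pow(2).sum() for g in grads))
    return tr

x_pde, x_bc = torch.rand(8, 1), torch.zeros(2, 1)
r_pde = net(x_pde).squeeze(1)        # 演示用:真实PINN中应为PDE残差
r_bc = net(x_bc).squeeze(1) - 1.0    # 演示用:边界条件残差
tr1, tr2 = ntk_trace(r_pde), ntk_trace(r_bc)
w1, w2 = (tr1 + tr2) / tr1, (tr1 + tr2) / tr2   # 经典的NTK自适应权重
print(w1, w2)  # 总损失可写作 w1*loss_pde + w2*loss_bc
```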
【2】Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling
标题:考虑风能消耗和削峰填谷的电热协同系统优化调度
链接:https://arxiv.org/abs/2511.15250
作者:Jin Ye,Lingmei Wang,Shujian Zhang,Haihang WU
摘要:随着全球能源转型和可再生能源的快速发展,新能源接入和多重不确定性下的热电联产系统调度优化问题日益突出。针对这一问题,本研究提出了一种基于改进的双延迟深度确定性策略梯度(PVTD3)算法的智能调度方法,通过引入电网购电变化惩罚项实现系统优化。仿真结果表明,在10%、20%和30%的可再生能源渗透率下,PVTD3算法比传统TD3算法分别降低了6.93%、12.68%和13.59%的系统综合成本,同时将电网购电的平均波动幅度降低了12.8%。在储能管理方面,PVTD3算法将低温储热罐的末时刻状态值降低了7.67-17.67个单位,同时将高温储热罐保持在3.59-4.25的安全运行范围内。多场景对比验证表明,该算法不仅在经济效率和电网稳定性方面表现出色,而且在储能设备管理方面表现出卓越的可持续调度能力。
摘要:With the global energy transition and rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. Addressing this challenge, this study proposes an intelligent scheduling method based on the improved Dual-Delay Deep Deterministic Policy Gradient (PVTD3) algorithm. System optimization is achieved by introducing a penalty term for grid power purchase variations. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59% respectively compared to the traditional TD3 algorithm. Concurrently, it reduces the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safety operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capabilities in energy storage device management.
【3】Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
标题:通过动作量化的轨迹优化学习类人RL代理
链接:https://arxiv.org/abs/2511.15055
作者:Jian-Ting Guo,Yu-Cheng Chen,Ping-Chun Hsieh,Kuo-Hao Ho,Po-Wei Huang,Ti-Rong Wu,I-Chen Wu
备注:Accepted by the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:类人智能体一直是人工智能研究的目标之一。虽然强化学习(RL)在许多领域都取得了超人的性能,但相对较少的注意力集中在设计类人的RL代理上。因此,与人类相比,许多奖励驱动的RL代理通常表现出不自然的行为,引起了对可解释性和可信度的关注。为了在强化学习中实现类人行为,本文首先将类人行为定义为轨迹优化,其目标是找到一个与人类行为密切一致的动作序列,同时也使奖励最大化,并将经典的滚动时域控制作为一种易于处理和有效的实现方式来适应类人学习。为了实现这一点,我们引入了宏动作量化(MAQ),这是一个类似于人类的RL框架,它通过矢量量化VAE将人类演示提炼成宏动作。在D4RL Adroit基准测试上的实验表明,MAQ显着提高了人类相似性,增加了轨迹相似性得分,并在人类评估研究中的所有RL代理中实现了最高的人类相似性排名。我们的研究结果还表明,MAQ可以很容易地集成到各种现成的RL算法中,为学习类人RL代理开辟了一个有前途的方向。我们的代码可在https://rlg.iis.sinica.edu.tw/papers/MAQ上获得。
摘要:Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been focused on designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts the classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via Vector-Quantized VAE. Experiments on D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores, and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.
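MAQ的量化环节可以用标准的VQ-VAE码本来示意:把连续动作片段的编码量化到最近的码字,码字索引即离散的"宏动作"。下面是一个最小PyTorch草图(码本大小、维度与损失系数均为假设):

```python
import torch
import torch.nn.functional as F

class VectorQuantizer(torch.nn.Module):
    """VQ-VAE式码本:将连续编码量化到最近码字,
    并用承诺损失与直通估计器支持端到端训练。"""
    def __init__(self, n_codes=64, dim=16, beta=0.25):
        super().__init__()
        self.codebook = torch.nn.Embedding(n_codes, dim)
        self.beta = beta

    def forward(self, z):
        d = torch.cdist(z, self.codebook.weight)   # 到各码字的距离
        idx = d.argmin(dim=1)                      # 最近码字索引 = 宏动作id
        q = self.codebook(idx)
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                   # 直通估计器传梯度
        return q, idx, loss

vq = VectorQuantizer()
z = torch.randn(32, 16)                 # 假设:动作片段编码器的输出
q, macro_ids, vq_loss = vq(z)
print(macro_ids[:8], vq_loss.item())
```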
【4】Near-optimal delta-convex estimation of Lipschitz functions
标题:Lipschitz函数的近似最优δ-凸估计
链接:https://arxiv.org/abs/2511.15615
作者:Gábor Balázs
备注:41 pages, 7 figures
摘要:本文提出了一种从含噪观测中估计未知Lipschitz函数的易处理算法,并建立了其收敛速度的上界。该方法将最大仿射方法从凸形状约束回归扩展到更一般的Lipschitz设定。一个关键组件是一种非线性特征扩展,它将最大仿射函数映射到δ-凸函数的一个子类中;δ-凸函数可作为Lipschitz函数的通用逼近器,同时保持其Lipschitz常数。利用这一性质,在平方损失和亚高斯分布的随机设计设定下,该估计量相对于数据的内在维度达到极小极大收敛速度(至多差对数因子)。该算法集成了捕获内在维度的自适应划分、无需知道真实Lipschitz常数的基于惩罚的正则化机制,以及将凸初始化与局部细化相结合的两阶段优化过程。该框架也可以直接适配凸形状约束回归。实验表明,相对于包括最近邻和基于核的回归在内的其他有理论依据的方法,该方法具有竞争力。
摘要:This paper presents a tractable algorithm for estimating an unknown Lipschitz function from noisy observations and establishes an upper bound on its convergence rate. The approach extends max-affine methods from convex shape-restricted regression to the more general Lipschitz setting. A key component is a nonlinear feature expansion that maps max-affine functions into a subclass of delta-convex functions, which act as universal approximators of Lipschitz functions while preserving their Lipschitz constants. Leveraging this property, the estimator attains the minimax convergence rate (up to logarithmic factors) with respect to the intrinsic dimension of the data under squared loss and subgaussian distributions in the random design setting. The algorithm integrates adaptive partitioning to capture intrinsic dimension, a penalty-based regularization mechanism that removes the need to know the true Lipschitz constant, and a two-stage optimization procedure combining a convex initialization with local refinement. The framework is also straightforward to adapt to convex shape-restricted regression. Experiments demonstrate competitive performance relative to other theoretically justified methods, including nearest-neighbor and kernel-based regressors.
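δ-凸函数即两个凸的max-affine函数之差;下面用NumPy示意这种表示的求值方式(参数随机生成,仅演示该函数类本身,不含原文的自适应划分与两阶段拟合):

```python
import numpy as np

def delta_convex(X, A1, b1, A2, b2):
    """δ-凸函数 = 两个max-affine(均为凸)函数之差:
    f(x) = max_i(a1_i·x + b1_i) - max_j(a2_j·x + b2_j)。
    该函数类可逼近Lipschitz函数,且Lipschitz常数受参数控制。"""
    g1 = (X @ A1.T + b1).max(axis=1)
    g2 = (X @ A2.T + b2).max(axis=1)
    return g1 - g2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 2))            # 5个二维查询点
A1, b1 = rng.standard_normal((8, 2)), rng.standard_normal(8)  # 8个仿射片
A2, b2 = rng.standard_normal((8, 2)), rng.standard_normal(8)
print(delta_convex(X, A1, b1, A2, b2))
```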
【5】A Physics Informed Machine Learning Framework for Optimal Sensor Placement and Parameter Estimation
标题:用于最优传感器布置和参数估计的物理信息机器学习框架
链接:https://arxiv.org/abs/2511.15543
作者:Georgios Venianakis,Constantinos Theodoropoulos,Michail Kavousanakis
摘要:参数估计在许多工程领域仍然是一项具有挑战性的任务。由于数据采集往往成本高、数量有限或容易不准确(噪声、不确定性),确定能为未知参数提供最大信息量的传感器配置至关重要,特别是在空间变化很重要的分布参数系统中。物理信息神经网络(PINN)最近已成为参数估计的强大机器学习(ML)工具,特别是在测量稀疏或含噪的情况下,克服了传统的基于优化和贝叶斯方法的一些局限。尽管PINN被广泛用于求解逆问题,但其性能如何依赖于传感器位置却相对较少受到关注。本研究通过引入一个同时处理最优传感器布置和参数估计的综合PINN框架来填补这一空白。我们的方法是训练一个将感兴趣的参数作为额外输入的PINN模型,这使得能够通过自动微分高效计算灵敏度函数,进而利用D-最优性准则确定最优传感器位置。该框架在两个复杂性递增的示例性分布参数反应-扩散-平流问题上得到了验证。结果表明,与从直觉或随机选择的传感器位置估计的参数值相比,我们基于PINN的方法始终达到更高的精度。
摘要:Parameter estimation remains a challenging task across many areas of engineering. Because data acquisition can often be costly, limited, or prone to inaccuracies (noise, uncertainty) it is crucial to identify sensor configurations that provide the maximum amount of information about the unknown parameters, in particular for the case of distributed-parameter systems, where spatial variations are important. Physics-Informed Neural Networks (PINNs) have recently emerged as a powerful machine-learning (ML) tool for parameter estimation, particularly in cases with sparse or noisy measurements, overcoming some of the limitations of traditional optimization-based and Bayesian approaches. Despite the widespread use of PINNs for solving inverse problems, relatively little attention has been given to how their performance depends on sensor placement. This study addresses this gap by introducing a comprehensive PINN-based framework that simultaneously tackles optimal sensor placement and parameter estimation. Our approach involves training a PINN model in which the parameters of interest are included as additional inputs. This enables the efficient computation of sensitivity functions through automatic differentiation, which are then used to determine optimal sensor locations exploiting the D-optimality criterion. The framework is validated on two illustrative distributed-parameter reaction-diffusion-advection problems of increasing complexity. The results demonstrate that our PINNs-based methodology consistently achieves higher accuracy compared to parameter values estimated from intuitively or randomly selected sensor positions.
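基于灵敏度矩阵的D-最优选点可以用贪心法示意:每次挑选使 log det(SᵀS) 增益最大的候选位置。下面是NumPy草图(灵敏度矩阵以随机数占位,实际应由PINN的自动微分得到;贪心法也只是该准则的一种常见近似求解方式):

```python
import numpy as np

def d_optimal_greedy(S, k):
    """贪心D-最优:S为(候选位置数 × 参数数)的灵敏度矩阵,
    逐个选入使 log det(M + s sᵀ) 最大的行,返回k个传感器位置索引。"""
    n, p = S.shape
    chosen = []
    M = 1e-8 * np.eye(p)                 # 微小正则化保证行列式有定义
    for _ in range(k):
        gains = []
        for i in range(n):
            if i in chosen:
                gains.append(-np.inf)
                continue
            s = S[i:i + 1].T
            _, logdet = np.linalg.slogdet(M + s @ s.T)
            gains.append(logdet)
        best = int(np.argmax(gains))
        chosen.append(best)
        M += S[best:best + 1].T @ S[best:best + 1]
    return chosen

S = np.random.rand(50, 3)   # 假设:50个候选位置、3个未知参数的灵敏度
print(d_optimal_greedy(S, k=4))
```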
预测|估计(8篇)
【1】Controlling False Positives in Image Segmentation via Conformal Prediction
标题:通过保形预测控制图像分割中的误报
链接:https://arxiv.org/abs/2511.15406
作者:Luca Mossina,Corentin Friedrich
摘要:可靠的语义分割对临床决策至关重要,但深度模型很少对其错误提供明确的统计保证。我们引入了一个简单的事后框架,构建对假阳性预测具有无分布假设的图像级控制的置信度掩模。给定任何预训练的分割模型,我们通过提高分数阈值或应用形态学腐蚀来定义一族嵌套的收缩掩模。利用带标注的校准集,通过保形预测选择单个收缩参数,确保对于与校准数据可交换的新图像,置信度掩模中保留的假阳性比例以高概率保持在用户指定的容差以下。该方法与模型无关,不需要再训练,并且无论底层预测器如何都能提供有限样本保证。在息肉分割基准上的实验证明了目标水平的经验有效性。我们的框架使得在过度分割可能产生临床后果的场景中实现实用的、风险感知的分割成为可能。代码见 https://github.com/deel-ai-papers/conseco。
摘要:Reliable semantic segmentation is essential for clinical decision making, yet deep models rarely provide explicit statistical guarantees on their errors. We introduce a simple post-hoc framework that constructs confidence masks with distribution-free, image-level control of false-positive predictions. Given any pretrained segmentation model, we define a nested family of shrunken masks obtained either by increasing the score threshold or by applying morphological erosion. A labeled calibration set is used to select a single shrink parameter via conformal prediction, ensuring that, for new images that are exchangeable with the calibration data, the proportion of false positives retained in the confidence mask stays below a user-specified tolerance with high probability. The method is model-agnostic, requires no retraining, and provides finite-sample guarantees regardless of the underlying predictor. Experiments on a polyp-segmentation benchmark demonstrate target-level empirical validity. Our framework enables practical, risk-aware segmentation in settings where over-segmentation can have clinical consequences. Code at https://github.com/deel-ai-papers/conseco.
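下面给出一个简化的保形校准草图(NumPy):在候选阈值中选最小者,使经有限样本修正的校准集风险不超过容差α。注意这里把每张图的损失简化为"预测掩模中假阳性像素占比",与原文"保留的假阳性比例"的精确定义略有出入,数据也是合成占位:

```python
import numpy as np

def calibrate_threshold(scores, gt_masks, alpha=0.1,
                        taus=np.linspace(0.05, 0.95, 91)):
    """保形风险控制式校准(简化示意):损失随阈值tau增大而近似单调下降;
    返回满足 (n*平均损失 + 1)/(n + 1) <= alpha 的最小tau。"""
    n = len(scores)
    for tau in taus:
        losses = []
        for s, g in zip(scores, gt_masks):
            pred = s >= tau                     # tau越大,掩模越"收缩"
            fp = np.logical_and(pred, ~g).sum()
            losses.append(fp / max(pred.sum(), 1))
        if (n * np.mean(losses) + 1) / (n + 1) <= alpha:
            return float(tau)
    return float(taus[-1])

rng = np.random.default_rng(1)
gt = [rng.random((32, 32)) > 0.5 for _ in range(50)]          # 假设:标注掩模
sc = [np.where(g, 0.5 + 0.5 * rng.random(g.shape),            # 前景得分偏高
               0.5 * rng.random(g.shape)) for g in gt]        # 背景得分偏低
print(calibrate_threshold(sc, gt, alpha=0.1))
```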
【2】EVA-Net: Interpretable Brain Age Prediction via Continuous Aging Prototypes from EEG
标题:EVA-Net:通过来自脑电的连续衰老原型可解释的大脑年龄预测
链接:https://arxiv.org/abs/2511.15393
作者:Kunyu Zhang,Mingxuan Wang,Xiangjie Shi,Haoxing Xu,Chao Zhang
摘要:大脑年龄是大脑健康的一个重要指标。虽然脑电图(EEG)是完成这一任务的实用工具,但现有的模型难以应对不完善的医疗数据的共同挑战,例如从监督不力、仅健康的队列中学习"正常“基线。这是识别疾病的关键异常检测任务,但标准模型通常是缺乏可解释结构的黑匣子。我们提出了EVA-Net,这是一个新的框架,它将大脑年龄重新定义为一个可解释的异常检测问题。EVA-Net使用高效的稀疏注意力Transformer来对长EEG序列进行建模。为了处理不完美数据中的噪声和可变性,它采用变分信息瓶颈来学习鲁棒的压缩表示。为了可解释性,这种表示与一个连续的原型网络保持一致,该网络明确学习规范的健康老龄化流形。在对1297名健康受试者进行训练后,EVA-Net达到了最先进的准确性。我们验证了其异常检测能力的一个看不见的队列27 MCI和AD患者。这个病理组显示出明显更高的脑年龄差距和一个新的原型对齐错误,证实他们偏离健康的流形。EVA-Net为使用不完善的医疗数据的医疗保健智能提供了一个可解释的框架。
摘要:The brain age is a key indicator of brain health. While electroencephalography (EEG) is a practical tool for this task, existing models struggle with the common challenge of imperfect medical data, such as learning a ``normal'' baseline from weakly supervised, healthy-only cohorts. This is a critical anomaly detection task for identifying disease, but standard models are often black boxes lacking an interpretable structure. We propose EVA-Net, a novel framework that recasts brain age as an interpretable anomaly detection problem. EVA-Net uses an efficient, sparsified-attention Transformer to model long EEG sequences. To handle noise and variability in imperfect data, it employs a Variational Information Bottleneck to learn a robust, compressed representation. For interpretability, this representation is aligned to a continuous prototype network that explicitly learns the normative healthy aging manifold. Trained on 1297 healthy subjects, EVA-Net achieves state-of-the-art accuracy. We validated its anomaly detection capabilities on an unseen cohort of 27 MCI and AD patients. This pathological group showed significantly higher brain-age gaps and a novel Prototype Alignment Error, confirming their deviation from the healthy manifold. EVA-Net provides an interpretable framework for healthcare intelligence using imperfect medical data.
【3】Multi-layer Stack Ensembles for Time Series Forecasting
标题:用于时间序列预测的多层堆栈集成
链接:https://arxiv.org/abs/2511.15350
作者:Nathanael Bosch,Oleksandr Shchur,Nick Erickson,Michael Bohlke-Schneider,Caner Türkmen
备注:Published at AutoML Conference 2025 Methods Track
摘要:集成是一种强大的技术,可以提高机器学习模型的准确性,其中堆叠等方法可以在表格任务中实现强大的结果。然而,在时间序列预测中,集成方法仍然没有得到充分利用,简单的线性组合仍然被认为是最先进的。在本文中,我们系统地探讨了时间序列预测的集成策略。我们评估了33个集成模型-现有的和新的-在50个真实世界的数据集。我们的研究结果表明,堆叠一致提高准确性,但没有一个单一的堆垛机在所有任务中表现最好。为了解决这个问题,我们提出了一个多层堆叠框架的时间序列预测,一种方法,结合了不同的堆栈模型的优势。我们证明了这种方法在不同的预测场景中始终提供卓越的准确性。我们的研究结果强调了基于堆栈的方法在改进AutoML系统进行时间序列预测方面的潜力。
摘要:Ensembling is a powerful technique for improving the accuracy of machine learning models, with methods like stacking achieving strong results in tabular tasks. In time series forecasting, however, ensemble methods remain underutilized, with simple linear combinations still considered state-of-the-art. In this paper, we systematically explore ensembling strategies for time series forecasting. We evaluate 33 ensemble models -- both existing and novel -- across 50 real-world datasets. Our results show that stacking consistently improves accuracy, though no single stacker performs best across all tasks. To address this, we propose a multi-layer stacking framework for time series forecasting, an approach that combines the strengths of different stacker models. We demonstrate that this method consistently provides superior accuracy across diverse forecasting scenarios. Our findings highlight the potential of stacking-based methods to improve AutoML systems for time series forecasting.
【4】Semiconductor Industry Trend Prediction with Event Intervention Based on LSTM Model in Sentiment-Enhanced Time Series Data
标题:情绪增强时间序列数据中基于LSTM模型的事件干预半导体行业趋势预测
链接:https://arxiv.org/abs/2511.15112
作者:Wei-hsiang Yen,Lyn Chao-ling Chen
备注:Accepted in Taiwan Academic Network Conference (TANET 2025)
摘要:本研究的创新之处在于将深度学习方法与情绪分析整合于传统商业模式分析与预测中,并以台积电为研究对象,针对台湾半导体产业进行产业趋势预测。由于半导体产业市场变化快速、晶圆技术发展迅速,传统的数据分析方法在高多样性和时间序列数据上表现不佳。本研究采用文字资料及时间序列资料,搜集自台积电的季度报告,包括财务资料。对文本数据进行情绪分析时,同时考虑公司内部事件和外部全球事件的事件干预。利用情绪增强的时间序列数据,采用LSTM模型预测台积电的产业趋势。预测结果揭示了台积电晶圆技术的重大发展和全球市场的潜在威胁,并与台积电产品发布消息和国际新闻相吻合。本研究的贡献在于,通过同时考虑内部和外部事件干预,实现了对半导体产业趋势的准确预测,预测结果为半导体产业的研究与商业应用提供了有价值的信息。
摘要:The innovation of the study is that the deep learning method and sentiment analysis are integrated in traditional business model analysis and forecasting, and the research subject is TSMC for industry trend prediction of semiconductor industry in Taiwan. For the rapid market changes and development of wafer technologies of semiconductor industry, traditional data analysis methods not perform well in the high variety and time series data. Textual data and time series data were collected from seasonal reports of TSMC including financial information. Textual data through sentiment analysis by considering the event intervention both from internal events of the company and the external global events. Using the sentiment-enhanced time series data, the LSTM model was adopted for predicting industry trend of TSMC. The prediction results reveal significant development of wafer technology of TSMC and the potential threatens in the global market, and matches the product released news of TSMC and the international news. The contribution of the work performed accurately in industry trend prediction of the semiconductor industry by considering both the internal and external event intervention, and the prediction results provide valuable information of semiconductor industry both in research and business aspects.
【5】IonCast: A Deep Learning Framework for Forecasting Ionospheric Dynamics
标题:IonCast:预测电离层动态的深度学习框架
链接:https://arxiv.org/abs/2511.15004
作者:Halil S. Kelebek,Linnea M. Wolniewicz,Michael D. Vergalla,Simone Mestici,Giacomo Acciarini,Bala Poduval,Olga Verkhoglyadova,Madhulika Guhathakurta,Thomas E. Berger,Frank Soboczenski,Atılım Güneş Baydin
备注:11 pages, 7 figures, 3 tables. Accepted as a poster presentation at the Machine Learning for the Physical Sciences Workshop at NeurIPS 2025
摘要:电离层是近地空间的重要组成部分,影响着GNSS的精度、高频通信和航空业务。由于这些原因,对电离层变异性进行准确的预测和建模变得越来越重要。为了解决这一差距,我们提出了IonCast,这是一套深度学习模型,其中包括为电离层动力学量身定制的GraphCast启发模型。IonCast利用时空学习来预测全球总电子含量(TEC),整合各种物理驱动因素和观测数据集。与持久性相比,在风暴时间和安静条件下进行验证突出了技能的提高。通过将异构数据与可扩展的基于图形的时空学习相统一,IonCast展示了机器学习如何增强对电离层变化的物理理解,并提高操作空间天气复原力。
摘要:The ionosphere is a critical component of near-Earth space, shaping GNSS accuracy, high-frequency communications, and aviation operations. For these reasons, accurate forecasting and modeling of ionospheric variability has become increasingly relevant. To address this gap, we present IonCast, a suite of deep learning models that include a GraphCast-inspired model tailored for ionospheric dynamics. IonCast leverages spatiotemporal learning to forecast global Total Electron Content (TEC), integrating diverse physical drivers and observational datasets. Validating on held-out storm-time and quiet conditions highlights improved skill compared to persistence. By unifying heterogeneous data with scalable graph-based spatiotemporal learning, IonCast demonstrates how machine learning can augment physical understanding of ionospheric variability and advance operational space weather resilience.
【6】Reservoir Computing via Multi-Scale Random Fourier Features for Forecasting Fast-Slow Dynamical Systems
标题:通过多尺度随机傅里叶特征进行储层计算以预测快-慢动态系统
链接:https://arxiv.org/abs/2511.14775
作者:S. K. Laha
备注:23 pages, 18 figures
摘要:预测具有多尺度时间结构的非线性时间序列仍然是复杂系统建模的核心挑战。我们提出了一种新的水库计算框架,结合延迟嵌入与随机傅立叶特征(RFF)映射来捕捉这样的动态。研究了两种制剂:采用固定内核带宽的单尺度RFF库,以及集成多个带宽以表示快速和慢速时间依赖性的多尺度RFF库。该框架被应用到一组不同的规范系统:神经元模型,如Rulkov地图,Izhikevich模型,Hindmarsh-Rose模型和Morris-Lecar模型,这些模型表现出尖峰,爆裂和混沌行为所产生的快慢相互作用;和生态模型,包括捕食者-猎物动态和Ricker地图与季节性强迫,显示多尺度振荡和不稳定性。在所有情况下,多尺度RFF水库始终优于其单尺度对应,实现较低的归一化均方根误差(NRMSE)和更强大的长期预测。这些结果突出了显式地将多尺度特征映射纳入水库计算体系结构的有效性,用于对具有内在快慢相互作用的复杂动态系统进行建模。
摘要:Forecasting nonlinear time series with multi-scale temporal structures remains a central challenge in complex systems modeling. We present a novel reservoir computing framework that combines delay embedding with random Fourier feature (RFF) mappings to capture such dynamics. Two formulations are investigated: a single-scale RFF reservoir, which employs a fixed kernel bandwidth, and a multi-scale RFF reservoir, which integrates multiple bandwidths to represent both fast and slow temporal dependencies. The framework is applied to a diverse set of canonical systems: neuronal models such as the Rulkov map, Izhikevich model, Hindmarsh-Rose model, and Morris-Lecar model, which exhibit spiking, bursting, and chaotic behaviors arising from fast-slow interactions; and ecological models including the predator-prey dynamics and Ricker map with seasonal forcing, which display multi-scale oscillations and intermittency. Across all cases, the multi-scale RFF reservoir consistently outperforms its single-scale counterpart, achieving lower normalized root mean square error (NRMSE) and more robust long-horizon predictions. These results highlight the effectiveness of explicitly incorporating multi-scale feature mappings into reservoir computing architectures for modeling complex dynamical systems with intrinsic fast-slow interactions.
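A minimal sketch of the two ingredients the abstract names, delay embedding plus multi-bandwidth random Fourier features with a linear readout. The toy fast-slow signal, feature counts, and bandwidths are assumptions made for illustration.

```python
import numpy as np
from numpy.linalg import lstsq

def delay_embed(x, d):
    """Stack d lagged copies of the series: row t is [x_{t+d-1}, ..., x_t]."""
    return np.column_stack([x[d - 1 - k: len(x) - k] for k in range(d)])

def rff(Z, n_feat, bandwidth, seed=0):
    """Random Fourier features approximating an RBF kernel of given bandwidth."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Z.shape[1], n_feat)) / bandwidth
    b = rng.uniform(0, 2 * np.pi, n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(Z @ W + b)

# Toy fast-slow signal: a slow drift plus a fast oscillation.
t = np.arange(3000) * 0.01
x = np.sin(0.3 * t) + 0.3 * np.sin(11.0 * t)

d = 20
Z = delay_embed(x, d)[:-1]      # inputs: embedded state at time t
y = x[d:]                       # targets: next value x_{t+1}

# Multi-scale reservoir: concatenate RFF blocks with several bandwidths.
Phi = np.hstack([rff(Z, 200, s, seed=i) for i, s in enumerate((0.5, 2.0, 8.0))])
w, *_ = lstsq(Phi[:2500], y[:2500], rcond=None)   # linear readout
pred = Phi[2500:] @ w
print("NRMSE:", np.sqrt(np.mean((pred - y[2500:])**2)) / y[2500:].std())
```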
【7】Why Physics Still Matters: Improving Machine Learning Prediction of Material Properties with Phonon-Informed Datasets
标题:为什么物理学仍然很重要:利用声子信息数据集改进材料性质的机器学习预测
链接:https://arxiv.org/abs/2511.15222
作者:Pol Benítez,Cibrán López,Edgardo Saucedo,Teruyasu Mizoguchi,Claudio Cazorla
备注:12 pages; 5 figures
摘要:机器学习(ML)方法已经成为预测材料性能的强大工具,具有接近第一原理的准确性和大大降低的计算成本。然而,ML模型的性能主要取决于训练数据集的质量、大小和多样性。在材料科学中,这种依赖性对于从低对称性原子结构中学习特别重要,这些结构可以捕获热激发,结构缺陷和化学无序,这些特征在真实材料中普遍存在,但在大多数数据集中表现不足。因此,缺乏生成代表性训练数据的系统策略可能会限制ML模型在能源转换和光子学等技术关键领域的预测能力。在这项工作中,我们评估了在两种根本不同类型的数据集上训练的图神经网络(GNN)模型的有效性:一种由随机生成的原子配置组成,另一种使用基于晶格振动的物理信息采样构建。作为一个案例研究,我们解决了具有挑战性的任务,预测电子和机械性能的原型家庭的光电材料在现实的有限温度条件下。我们发现,声子知情的模型始终优于随机训练的对手,尽管依赖于更少的数据点。可解释性分析进一步揭示,高性能模型赋予控制属性变化的化学意义键更大的权重,强调了物理指导数据生成的重要性。总的来说,这项工作表明,较大的数据集不一定会产生更好的GNN预测模型,并介绍了一种简单而通用的策略,用于有效地构建材料信息学中的高质量训练数据。
摘要:Machine learning (ML) methods have become powerful tools for predicting material properties with near first-principles accuracy and vastly reduced computational cost. However, the performance of ML models critically depends on the quality, size, and diversity of the training dataset. In materials science, this dependence is particularly important for learning from low-symmetry atomistic configurations that capture thermal excitations, structural defects, and chemical disorder, features that are ubiquitous in real materials but underrepresented in most datasets. The absence of systematic strategies for generating representative training data may therefore limit the predictive power of ML models in technologically critical fields such as energy conversion and photonics. In this work, we assess the effectiveness of graph neural network (GNN) models trained on two fundamentally different types of datasets: one composed of randomly generated atomic configurations and another constructed using physically informed sampling based on lattice vibrations. As a case study, we address the challenging task of predicting electronic and mechanical properties of a prototypical family of optoelectronic materials under realistic finite-temperature conditions. We find that the phonons-informed model consistently outperforms the randomly trained counterpart, despite relying on fewer data points. Explainability analyses further reveal that high-performing models assign greater weight to chemically meaningful bonds that control property variations, underscoring the importance of physically guided data generation. Overall, this work demonstrates that larger datasets do not necessarily yield better GNN predictive models and introduces a simple and general strategy for efficiently constructing high-quality training data in materials informatics.
【8】Data-driven Prediction of Species-Specific Plant Responses to Spectral-Shifting Films from Leaf Phenotypic and Photosynthetic Traits
标题:基于叶片表型与光合性状的数据驱动预测:植物物种对光谱转换膜的特异性响应
链接:https://arxiv.org/abs/2511.15173
作者:Jun Hyeun Kang,Jung Eek Son,Tae In Ahn
摘要:在温室中应用光谱转换膜将绿光转换为红光,已经显示出不同作物物种的不同生长反应。然而,作物在光质改变下的产量提高与每个物种的特定生物物理特性的集体效应有关。仅考虑作物的一个属性在理解阳光质量调节与作物生长性能之间的关系方面具有局限性。因此,本研究旨在利用人工智能综合考虑作物在SF下对其生长结果的生理反应,将多个植物表型性状和日光照积分联系起来。在2021年至2024年期间,在覆盖有PEF或SF的温室中种植各种叶、果和根作物,并测量在每种条件下栽培的植物的叶片反射率、单位面积叶片质量、叶绿素含量、日光积分和光饱和点。收集了210个数据点,但没有足够的数据来训练深度学习模型,因此使用了变分自动编码器来增强数据。大多数作物产量在SF下平均增加22.5%。这些数据用于训练几个模型,包括逻辑回归、决策树、随机森林、XGBoost和前馈神经网络(FFNN),旨在对施用SF是否对产量有显著影响进行二元分类。FFNN在未用于训练的测试数据集上实现了91.4%的高分类准确率。这项研究提供了深入了解叶片表型和光合特性,环境条件和太阳光谱成分之间的复杂的相互作用,通过提高预测太阳光谱变化的影响,使用SF的能力。
摘要:The application of spectral-shifting films in greenhouses to shift green light to red light has shown variable growth responses across crop species. However, the yield enhancement of crops under altered light quality is related to the collective effects of the specific biophysical characteristics of each species. Considering only one attribute of a crop has limitations in understanding the relationship between sunlight quality adjustments and crop growth performance. Therefore, this study aims to comprehensively link multiple plant phenotypic traits and daily light integral considering the physiological responses of crops to their growth outcomes under SF using artificial intelligence. Between 2021 and 2024, various leafy, fruiting, and root crops were grown in greenhouses covered with either PEF or SF, and leaf reflectance, leaf mass per area, chlorophyll content, daily light integral, and light saturation point were measured from the plants cultivated in each condition. 210 data points were collected, but there was insufficient data to train deep learning models, so a variational autoencoder was used for data augmentation. Most crop yields showed an average increase of 22.5% under SF. These data were used to train several models, including logistic regression, decision tree, random forest, XGBoost, and feedforward neural network (FFNN), aiming to binary classify whether there was a significant effect on yield with SF application. The FFNN achieved a high classification accuracy of 91.4% on a test dataset that was not used for training. This study provides insight into the complex interactions between leaf phenotypic and photosynthetic traits, environmental conditions, and solar spectral components by improving the ability to predict solar spectral shift effects using SF.
其他神经网络|深度学习|模型|建模(9篇)
【1】Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
标题:海象:连续体动力学的跨领域基础模型
链接:https://arxiv.org/abs/2511.15684
作者:Michael McCabe,Payel Mukhopadhyay,Tanya Marwah,Bruno Regaldo-Saint Blancard,Francois Rozet,Cristiana Diaconu,Lucas Meyer,Kaze W. K. Wong,Hadi Sotoudeh,Alberto Bietti,Irina Espejo,Rio Fear,Siavash Golkar,Tom Hehir,Keiya Hirashima,Geraud Krawezik,Francois Lanusse,Rudy Morel,Ruben Ohana,Liam Parker,Mariel Pettee,Jeff Shen,Kyunghyun Cho,Miles Cranmer,Shirley Ho
摘要:基础模型已经改变了语言和视觉的机器学习,但在物理模拟中实现类似的影响仍然是一个挑战。数据异质性和不稳定的长期动态抑制了从足够多样化的动态中学习,而不同的分辨率和维度则对现代硬件的有效训练提出了挑战。通过实证和理论分析,我们采用了新的方法来缓解这些障碍,包括基于谐波分析的稳定方法,负载平衡的分布式2D和3D训练策略以及计算自适应标记化。使用这些工具,我们开发海象,基于变压器的基础模型,主要是为流体连续动力学。海象是预先训练的19个不同的方案,涵盖天体物理学,地球科学,流变学,等离子体物理学,声学和经典流体。实验表明,Walrus在下游任务的短期和长期预测范围以及预训练数据的广度上都优于先前的基础模型,而消融研究证实了我们对预测稳定性,训练吞吐量和传输性能的贡献超过传统方法的价值。代码和权重已发布供社区使用。
摘要:Foundation models have transformed machine learning for language and vision, but achieving comparable impact in physical simulation remains a challenge. Data heterogeneity and unstable long-term dynamics inhibit learning from sufficiently diverse dynamics, while varying resolutions and dimensionalities challenge efficient training on modern hardware. Through empirical and theoretical analysis, we incorporate new approaches to mitigate these obstacles, including a harmonic-analysis-based stabilization method, load-balanced distributed 2D and 3D training strategies, and compute-adaptive tokenization. Using these tools, we develop Walrus, a transformer-based foundation model developed primarily for fluid-like continuum dynamics. Walrus is pretrained on nineteen diverse scenarios spanning astrophysics, geoscience, rheology, plasma physics, acoustics, and classical fluids. Experiments show that Walrus outperforms prior foundation models on both short and long term prediction horizons on downstream tasks and across the breadth of pretraining data, while ablation studies confirm the value of our contributions to forecast stability, training throughput, and transfer performance over conventional approaches. Code and weights are released for community use.
【2】CODE: A global approach to ODE dynamics learning
标题:CODE:ODE动力学学习的全局方法
链接:https://arxiv.org/abs/2511.15619
作者:Nils Wildt,Daniel M. Tartakovsky,Sergey Oladyshkin,Wolfgang Nowak
摘要:常微分方程(ODE)是描述物理系统观测到的动力学的传统方法。科学家通常会对动力学行为进行假设,提出一个数学模型,并将其预测与数据进行比较。然而,现代计算和算法的进步现在使纯粹的数据驱动的学习直接从观察动态。在数据驱动的设置中,人们学习ODE的右手边(RHS)。通常假设密集测量,然而高时间分辨率通常既麻烦又昂贵。因此,通常只有稀疏的采样数据。在这项工作中,我们介绍ChaosODE(代码),多项式混沌ODE扩展,其中我们使用任意多项式混沌扩展(aPCE)的ODE的右手边,从而在全球正交多项式表示的动态。我们在Lotka-Volterra系统的几个实验中评估了CODE的性能,包括不同的噪声水平,初始条件和对未来的预测,即使是以前看不见的初始条件。CODE表现出显着的外推能力,即使在新的初始条件下进行评估,并显示出的优势相比,使用神经网络(NeuralODE)或核逼近器(KernelODE)作为RHS代表的方法进行了充分的检查。我们观察到,NeuralODE和KernelODE的高灵活性降低了稀缺的数据和测量噪声下的外推能力。最后,我们为动态学习问题的鲁棒优化提供了实用指南,并在随附的代码中进行了说明。
摘要:Ordinary differential equations (ODEs) are a conventional way to describe the observed dynamics of physical systems. Scientists typically hypothesize about dynamical behavior, propose a mathematical model, and compare its predictions to data. However, modern computing and algorithmic advances now enable purely data-driven learning of governing dynamics directly from observations. In data-driven settings, one learns the ODE's right-hand side (RHS). Dense measurements are often assumed, yet high temporal resolution is typically both cumbersome and expensive. Consequently, one usually has only sparsely sampled data. In this work we introduce ChaosODE (CODE), a Polynomial Chaos ODE Expansion in which we use an arbitrary Polynomial Chaos Expansion (aPCE) for the ODE's right-hand side, resulting in a global orthonormal polynomial representation of dynamics. We evaluate the performance of CODE in several experiments on the Lotka-Volterra system, across varying noise levels, initial conditions, and predictions far into the future, even on previously unseen initial conditions. CODE exhibits remarkable extrapolation capabilities even when evaluated under novel initial conditions and shows advantages compared to well-examined methods using neural networks (NeuralODE) or kernel approximators (KernelODE) as the RHS representer. We observe that the high flexibility of NeuralODE and KernelODE degrades extrapolation capabilities under scarce data and measurement noise. Finally, we provide practical guidelines for robust optimization of dynamics-learning problems and illustrate them in the accompanying code.
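The overall recipe (estimate derivatives from sparse samples, fit a global polynomial RHS by least squares, then roll the learned model forward from an unseen initial condition) can be sketched as below. Note the paper's aPCE basis is data-adapted and orthonormal; this sketch substitutes plain degree-2 monomials as a simplifying assumption.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Ground-truth Lotka-Volterra system, used only to generate sparse training data.
def lv(t, z, a=1.0, b=0.4, c=0.4, d=0.1):
    x, y = z
    return [a * x - b * x * y, -c * y + d * x * y]

ts = np.linspace(0, 30, 120)                      # sparse samples
sol = solve_ivp(lv, (0, 30), [10.0, 5.0], t_eval=ts, rtol=1e-8)
Z = sol.y.T

def features(Z):
    """Polynomial basis up to degree 2 (plain monomials stand in for aPCE)."""
    x, y = Z[:, 0], Z[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

# Estimate derivatives by finite differences and fit the RHS globally.
dZ = np.gradient(Z, ts, axis=0)
coef, *_ = np.linalg.lstsq(features(Z), dZ, rcond=None)

def learned_rhs(t, z):
    return (features(np.array([z])) @ coef)[0]

# Roll the learned model forward from an unseen initial condition.
test = solve_ivp(learned_rhs, (0, 30), [8.0, 3.0], t_eval=ts)
print("final state:", test.y[:, -1])
```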
【3】Neural network-driven domain decomposition for efficient solutions to the Helmholtz equation
标题:神经网络驱动的区域分解以有效解决赫尔姆霍尔茨方程
链接:https://arxiv.org/abs/2511.15445
作者:Victorita Dolean,Daria Hrebenshchykova,Stéphane Lanteri,Victor Michel-Dansac
摘要:精确模拟波的传播在声学、电磁学和地震分析等领域至关重要。传统的数值方法,如有限差分和有限元方法,被广泛用于求解偏微分方程(PDE),如亥姆霍兹方程。然而,这些方法面临着重大的计算挑战时,应用于复杂的二维域中的高频波问题。这项工作研究了有限基物理信息神经网络(FBPINNs)及其多级扩展作为一个有前途的替代方案。这些方法利用域分解,将计算域划分为重叠的子域,每个子域由局部神经网络管理。我们评估他们的准确性和计算效率在求解亥姆霍兹方程的均匀情况下,展示了他们的潜力,以减轻传统方法的局限性。
摘要:Accurately simulating wave propagation is crucial in fields such as acoustics, electromagnetism, and seismic analysis. Traditional numerical methods, like finite difference and finite element approaches, are widely used to solve governing partial differential equations (PDEs) such as the Helmholtz equation. However, these methods face significant computational challenges when applied to high-frequency wave problems in complex two-dimensional domains. This work investigates Finite Basis Physics-Informed Neural Networks (FBPINNs) and their multilevel extensions as a promising alternative. These methods leverage domain decomposition, partitioning the computational domain into overlapping sub-domains, each governed by a local neural network. We assess their accuracy and computational efficiency in solving the Helmholtz equation for the homogeneous case, demonstrating their potential to mitigate the limitations of traditional approaches.
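A minimal 1-D illustration of the FBPINN idea: two overlapping subdomain networks blended by a smooth partition of unity and trained on the Helmholtz residual. The window shapes, network sizes, boundary conditions, and wavenumber are all assumptions made for this sketch, not the paper's setup.

```python
import math
import torch

k = 8.0   # wavenumber; u'' + k^2 u = 0 on [0,1], u(0)=0, u(1)=sin(k)

def mlp():
    return torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 1))

nets = [mlp(), mlp()]                     # one local network per subdomain

def u(x):
    # Gaussian windows centered in each subdomain, normalized to sum to one.
    w = torch.cat([torch.exp(-((x - c) / 0.35) ** 2) for c in (0.25, 0.75)], dim=1)
    w = w / w.sum(dim=1, keepdim=True)
    outs = torch.cat([n(x) for n in nets], dim=1)
    return (w * outs).sum(dim=1, keepdim=True)

opt = torch.optim.Adam([p for n in nets for p in n.parameters()], lr=1e-3)
for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)
    ux = u(x)
    du = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde = ((d2u + k ** 2 * ux) ** 2).mean()
    bc = u(torch.zeros(1, 1)).pow(2).mean() \
       + (u(torch.ones(1, 1)) - math.sin(k)).pow(2).mean()
    loss = pde + 100.0 * bc
    opt.zero_grad(); loss.backward(); opt.step()

print("u(0.5):", u(torch.tensor([[0.5]])).item(), "exact:", math.sin(k * 0.5))
```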
【4】Parameter Importance-Driven Continual Learning for Foundation Models
标题:参数重要性驱动的基础模型连续学习
链接:https://arxiv.org/abs/2511.15375
作者:Lingxiang Wang,Hainan Zhang,Zhiming Zheng
摘要:特定领域的后训练往往会导致灾难性的遗忘,使基础模型失去一般的推理能力,并限制了它们对动态现实环境的适应性。在获取下游领域知识的同时保持一般能力是大型语言和多模态模型的核心挑战。传统的持续学习方法,如正则化,重放和架构隔离,受到下游性能差,依赖于不可访问的历史数据,或额外的参数开销。虽然最近的参数有效的调整(PET)方法可以减轻遗忘,其有效性在很大程度上取决于参数和更新策略的选择。在本文中,我们介绍了PIECE,一种基于参数重要性估计的连续增强方法,它在有效学习领域知识的同时保留了一般能力,而无需访问先前的训练数据或增加模型参数。PIECE选择性地仅更新与新任务最相关的0.1%的核心参数,由两个重要性估计器指导:基于Fisher信息的PIECE-F和基于结合梯度和曲率信息的二阶归一化的PIECE-S。在三种语言模型和两种多模态模型上的实验表明,PIECE保持了一般的能力,并在不同的下游任务中实现了最先进的持续学习性能。我们的研究结果突出了一个实用的路径,可扩展的,域自适应的基础模型没有灾难性的遗忘。
摘要:Domain-specific post-training often causes catastrophic forgetting, making foundation models lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness strongly depends on the choice of parameters and update strategies. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge without accessing prior training data or increasing model parameters. PIECE selectively updates only 0.1% of core parameters most relevant to new tasks, guided by two importance estimators: PIECE-F based on Fisher Information, and PIECE-S based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
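The core mechanic, a diagonal-Fisher importance score followed by masking of all gradients outside the top fraction, can be sketched as follows. The squared-gradient score and the small keep fraction follow the abstract's description of PIECE-F; the model, data, and training loop are placeholders.

```python
import torch
import torch.nn.functional as F

def fisher_scores(model, data):
    """Diagonal Fisher proxy: accumulate squared loss gradients over batches."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data:
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        for n, p in model.named_parameters():
            scores[n] += p.grad.detach() ** 2
    return scores

def core_masks(scores, keep_frac=0.001):
    """Globally select the top keep_frac of parameters as 'core' parameters."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_frac * flat.numel()))
    thresh = flat.topk(k).values.min()
    return {n: (s >= thresh).float() for n, s in scores.items()}

model = torch.nn.Linear(16, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(4)]

masks = core_masks(fisher_scores(model, data), keep_frac=0.01)
for x, y in data:                       # continual-phase updates, masked
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    for n, p in model.named_parameters():
        p.grad.mul_(masks[n])           # freeze all non-core parameters
    opt.step()
```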
【5】On the Internal Semantics of Time-Series Foundation Models
标题:时间序列基础模型的内部语义
链接:https://arxiv.org/abs/2511.15324
作者:Atharva Pandey,Abhilash Neog,Gautam Jajoo
摘要:时间序列基础模型(TSFMs)最近已经成为跨不同时间域学习的通用范式。然而,尽管他们的经验成功,这些模型代表基本的时间序列概念的内部机制仍然知之甚少。在这项工作中,我们进行了系统的调查TSFMs的概念可解释性。具体而言,我们研究:(i)哪些层编码哪些概念,(ii)概念参数是否是线性可恢复的,(iii)表示如何在概念解纠缠和抽象方面跨越模型深度演变,以及(iv)模型如何处理概念的组合。我们系统地探讨这些问题,使用逐层分析,线性可恢复性测试,并表示相似性的措施,TSFM语义提供了一个结构化的帐户。由此产生的见解表明,早期层主要捕获局部时域模式(例如,AR(1),电平偏移,趋势),而更深层编码色散和变化时间信号,频谱和扭曲因子仍然最难线性恢复。然而,在组合设置中,探测性能下降,揭示了概念之间的干扰。这突出表明,虽然原子概念是可靠的本地化,组成仍然是一个挑战,强调了当前TSFMs的能力,以表示相互作用的时间现象的关键限制。
摘要:Time-series Foundation Models (TSFMs) have recently emerged as a universal paradigm for learning across diverse temporal domains. However, despite their empirical success, the internal mechanisms by which these models represent fundamental time-series concepts remain poorly understood. In this work, we undertake a systematic investigation of concept interpretability in TSFMs. Specifically, we examine: (i) which layers encode which concepts, (ii) whether concept parameters are linearly recoverable, (iii) how representations evolve in terms of concept disentanglement and abstraction across model depth, and (iv) how models process compositions of concepts. We systematically probe these questions using layer-wise analyses, linear recoverability tests, and representation similarity measures, providing a structured account of TSFM semantics. The resulting insights show that early layers mainly capture local, time-domain patterns (e.g., AR(1), level shifts, trends), while deeper layers encode dispersion and change-time signals, with spectral and warping factors remaining the hardest to recover linearly. In compositional settings, however, probe performance degrades, revealing interference between concepts. This highlights that while atomic concepts are reliably localized, composition remains a challenge, underscoring a key limitation in current TSFMs' ability to represent interacting temporal phenomena.
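The "linear recoverability" test in (ii) amounts to fitting a linear probe from hidden activations to the concept parameter that generated the series. Below is a self-contained version in which a hand-crafted pooled embedding stands in for a real TSFM layer; in the paper's setting one would replace `fake_layer_embedding` with actual per-layer hidden states.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def ar1(phi, n=256):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

# Stand-in for layer-l activations of a TSFM: simple pooled statistics.
def fake_layer_embedding(x):
    return np.array([x.mean(), x.std(), np.corrcoef(x[:-1], x[1:])[0, 1],
                     np.abs(np.diff(x)).mean()])

phis = rng.uniform(-0.9, 0.9, 400)                 # ground-truth concept parameter
H = np.stack([fake_layer_embedding(ar1(p)) for p in phis])

probe = Ridge(alpha=1.0).fit(H[:300], phis[:300])  # linear recoverability test
print("probe R^2:", r2_score(phis[300:], probe.predict(H[300:])))
```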
【6】Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment
标题:动态、部分观察、时间序列环境中的模拟人类学习
链接:https://arxiv.org/abs/2511.15032
作者:Jeffrey Jiang,Kevin Hong,Emily Kuczynski,Gregory Pottie
备注:Manuscript in preparation for IEEE Transactions on Education, 20 pages, 6 figures, 5 tables
摘要:虽然智能辅导系统(ITS)可以使用过去学生的信息来个性化教学,但每个新学生都是独一无二的。此外,教育问题本身就很困难,因为学习过程只能部分地观察到。因此,我们开发了一个动态的,时间序列的环境来模拟课堂环境,学生和教师的干预-包括辅导课程,讲座和考试。特别是,我们设计的模拟环境,允许不同程度的探测干预,可以收集更多的信息。然后,我们开发了强化学习ITS,它结合了学习学生的个人状态,同时通过使用探测干预从人口信息中提取。这些干预措施可以降低学生估计的难度,但也引入了一个成本效益决定,以找到足够的探测之间的平衡,以获得准确的估计和探测如此频繁,它成为破坏性的学生。我们比较了标准RL算法与几种基于贪婪规则的启发式方法的有效性,发现它们提供了不同的解决方案,但结果相似。我们还强调了随着隐藏信息水平的增加,问题的难度,以及如果我们允许探测干预,我们会得到的提升。我们展示了启发式和强化学习政策在改变学生人口分布方面的灵活性,发现两者都是灵活的,但强化学习政策很难帮助更难的课程。最后,我们测试了不同的课程结构与非探测的政策,我们发现,我们的政策能够提高测验和中期结构的性能比我们可以在一个最终的结构,突出了额外的信息的好处。
摘要:While intelligent tutoring systems (ITSs) can use information from past students to personalize instruction, each new student is unique. Moreover, the education problem is inherently difficult because the learning process is only partially observable. We therefore develop a dynamic, time-series environment to simulate a classroom setting, with student-teacher interventions - including tutoring sessions, lectures, and exams. In particular, we design the simulated environment to allow for varying levels of probing interventions that can gather more information. Then, we develop reinforcement learning ITSs that combine learning the individual state of students while pulling from population information through the use of probing interventions. These interventions can reduce the difficulty of student estimation, but also introduce a cost-benefit decision to find a balance between probing enough to get accurate estimates and probing so often that it becomes disruptive to the student. We compare the efficacy of standard RL algorithms with several greedy rules-based heuristic approaches to find that they provide different solutions, but with similar results. We also highlight the difficulty of the problem with increasing levels of hidden information, and the boost that we get if we allow for probing interventions. We show the flexibility of both heuristic and RL policies with regards to changing student population distributions, finding that both are flexible, but RL policies struggle to help harder classes. Finally, we test different course structures with non-probing policies and we find that our policies are able to boost the performance of quiz and midterm structures more than we can in a finals-only structure, highlighting the benefit of having additional information.
【7】Multimodal Wireless Foundation Models
标题:多模式无线基础模型
链接:https://arxiv.org/abs/2511.15162
作者:Ahmed Aboulfotouh,Hatem Abou-Zeid
摘要:无线基础模型(WFM)最近表现出有前途的能力,联合执行多种无线功能,并有效地适应新的环境。然而,虽然当前WFM仅处理一种模态,但取决于任务和操作条件,信息量最大的模态会发生变化,并且没有一种模态最适合所有任务。因此,工作机制的设计应接受多种模式,以实现更广泛和更多样化的任务和情景。在这项工作中,我们提出并建立了第一个多模式无线基础模型,能够处理原始IQ流和图像样无线模态(例如,频谱图和CSI),并在两者之间执行多个任务。我们引入了多模态设置的屏蔽无线建模,这是一种自监督的目标和预训练配方,可以从IQ流和图像式无线模态中学习联合表示。我们在两个模态系列的五个任务上评估了该模型:基于图像(人体活动感知,RF信号分类,5G NR定位)和基于IQ(RF设备指纹识别,干扰检测/分类)。多模态WFM与单模态WFM相比具有竞争力,并且在某些情况下超越了它们的性能。我们的研究结果表明,开发多模态WFM,支持不同的无线任务,在不同的模式的强大潜力。我们相信,这为AI原生6 G和联合感知、通信和定位的愿景迈出了具体的一步。
摘要:Wireless foundation models (WFMs) have recently demonstrated promising capabilities, jointly performing multiple wireless functions and adapting effectively to new environments. However, while current WFMs process only one modality, depending on the task and operating conditions, the most informative modality changes and no single modality is best for all tasks. WFMs should therefore be designed to accept multiple modalities to enable a broader and more diverse range of tasks and scenarios. In this work, we propose and build the first multimodal wireless foundation model capable of processing both raw IQ streams and image-like wireless modalities (e.g., spectrograms and CSI) and performing multiple tasks across both. We introduce masked wireless modeling for the multimodal setting, a self-supervised objective and pretraining recipe that learns a joint representation from IQ streams and image-like wireless modalities. We evaluate the model on five tasks across both modality families: image-based (human activity sensing, RF signal classification, 5G NR positioning) and IQ-based (RF device fingerprinting, interference detection/classification). The multimodal WFM is competitive with single-modality WFMs, and in several cases surpasses their performance. Our results demonstrates the strong potential of developing multimodal WFMs that support diverse wireless tasks across different modalities. We believe this provides a concrete step toward both AI-native 6G and the vision of joint sensing, communication, and localization.
【8】Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
标题:神经网络学习接近信息理论极限的通用多指标模型
链接:https://arxiv.org/abs/2511.15120
作者:Bohan Zhang,Zihao Wang,Hengyu Fu,Jason D. Lee
备注:86 pages, 2 figures. The order of the first two authors was determined by a coin flip
摘要:In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$, which is the canonical setup to study representation learning. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error using $\widetilde{\mathcal{O}}(d)$ samples and $\widetilde{\mathcal{O}}(d^2)$ time. The sample and time complexity both align with the information-theoretic limit up to leading order and are therefore optimal. During the first stage of gradient descent learning, the proof proceeds via showing that the inner weights can perform a power-iteration process. This process implicitly mimics a spectral start for the whole span of the hidden subspace and eventually eliminates finite-sample noise and recovers this span. It surprisingly indicates that optimal results can only be achieved if the first layer is trained for more than $\mathcal{O}(1)$ steps. This work demonstrates the ability of neural networks to effectively learn hierarchical functions with respect to both sample and time efficiency.
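The power-iteration mechanism in the proof can be mimicked in closed form. For Gaussian inputs the centered label-weighted second moment, E[y^2 x x^T] - E[y^2] I, has its range inside span(U), so block power iteration on its empirical version recovers the hidden subspace; this specific moment matrix is our illustrative stand-in for what the early gradient steps estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 200, 3, 5000
U = np.linalg.qr(rng.standard_normal((d, r)))[0]      # hidden subspace (d x r)

X = rng.standard_normal((n, d))
y = np.tanh(X @ U).sum(axis=1)                        # a generic multi-index target

# Empirical version of E[y^2 x x^T] - E[y^2] I, whose range lies in span(U)
# for Gaussian inputs; symmetric by construction.
M = (X * (y**2)[:, None]).T @ X / n - np.mean(y**2) * np.eye(d)

V = rng.standard_normal((d, r))                       # random spectral start
for _ in range(50):                                   # block power iteration
    V, _ = np.linalg.qr(M @ V)

# Cosines of the principal angles between span(U) and span(V); ~1 means recovery.
print("cosines:", np.linalg.svd(U.T @ V, compute_uv=False))
```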
【9】MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging
标题:MergeDNA:通过令牌合并进行动态令牌化的上下文感知基因组建模
链接:https://arxiv.org/abs/2511.14806
作者:Siyuan Li,Kai Yu,Anna Wang,Zicheng Liu,Chang Yu,Jingbo Zhou,Qirong Yang,Yucheng Guo,Xiaoming Zhang,Stan Z. Li
备注:AAAI 2026 (Oral Presentation) Preprint
摘要:基因组序列建模面临着两个尚未解决的挑战:信息密度在不同区域差异很大,同时没有明确定义的最小词汇单元。依赖于四个原始碱基或独立设计的DNA标记器,具有朴素掩蔽语言建模预训练的现有方法通常无法适应基因组序列的不同复杂性。利用令牌合并技术,本文介绍了一种分层架构,联合优化动态基因组标记器和潜在的Transformers与上下文感知的预训练任务。对于网络结构,标记化模块通过叠加多层局部窗口约束的可微标记合并块,自动将相邻的碱基组块成词,Latent Encoder通过全注意力块捕获合并词的全局上下文。MergeDNA对称地采用Latent Decoder和Local Decoder,通过两个预训练任务进行学习:Merged Token Reconstruction同时训练动态令牌化模块并自适应过滤重要令牌,而Adaptive Masked Token Modeling学习预测这些过滤后的令牌以捕获信息内容。大量的实验表明,MergeDNA在三个流行的DNA基准测试和几个多组学任务上实现了优异的性能,并进行了微调或zero-shot评估,优于典型的标记化方法和大规模DNA基础模型。
摘要:Modeling genomic sequences faces two unsolved challenges: the information density varies widely across different regions, while there is no clearly defined minimum vocabulary unit. Relying on either four primitive bases or independently designed DNA tokenizers, existing approaches with naive masked language modeling pre-training often fail to adapt to the varying complexities of genomic sequences. Leveraging Token Merging techniques, this paper introduces a hierarchical architecture that jointly optimizes a dynamic genomic tokenizer and latent Transformers with context-aware pre-training tasks. As for network structures, the tokenization module automatically chunks adjacent bases into words by stacking multiple layers of the differentiable token merging blocks with local-window constraints, then a Latent Encoder captures the global context of these merged words by full-attention blocks. Symmetrically employing a Latent Decoder and a Local Decoder, MergeDNA learns with two pre-training tasks: Merged Token Reconstruction simultaneously trains the dynamic tokenization module and adaptively filters important tokens, while Adaptive Masked Token Modeling learns to predict these filtered tokens to capture informative contents. Extensive experiments show that MergeDNA achieves superior performance on three popular DNA benchmarks and several multi-omics tasks with fine-tuning or zero-shot evaluation, outperforming typical tokenization methods and large-scale DNA foundation models.
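A deliberately simplified token-merging step, to convey the shape of the operation only: score adjacent base pairs by similarity, average them, and keep the highest-scoring merged "words". The real tokenizer stacks differentiable merging blocks with local-window constraints and keeps unmerged tokens; this toy drops both refinements.

```python
import torch

def merge_adjacent(h, keep_ratio=0.75):
    """Toy merging step: pool adjacent token pairs, keep the most coherent ones.
    A simplified stand-in for the paper's differentiable merging blocks."""
    B, T, D = h.shape
    pairs = h.view(B, T // 2, 2, D)
    left, right = pairs[:, :, 0], pairs[:, :, 1]
    score = torch.cosine_similarity(left, right, dim=-1)      # (B, T/2)
    merged = (left + right) / 2
    k = int(keep_ratio * score.shape[1])
    idx = score.topk(k, dim=1).indices.sort(dim=1).values     # preserve order
    return torch.gather(merged, 1, idx.unsqueeze(-1).expand(-1, -1, D))

h = torch.randn(2, 16, 32)           # (batch, bases, embedding)
print(merge_adjacent(h).shape)       # torch.Size([2, 6, 32])
```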
其他(22篇)
【1】Tokenisation over Bounded Alphabets is Hard
标题:有界字母表上的标记化是困难的
链接:https://arxiv.org/abs/2511.15709
作者:Violeta Kastreva,Philip Whittington,Dennis Komm,Tiago Pimentel
摘要:最近的研究表明,令牌化是NP完全的。然而,这些作品假设标记化应用于具有无限大字母的输入-这是一个不切实际的假设,因为在实践中标记器在固定大小的字母(如字节或Unicode字符)上操作。我们通过分析有界$n$-ary字母表的标记化来缩小这一差距,考虑两种自然变体:自下而上的标记化和直接标记化,在这种情况下,我们必须分别选择一系列合并操作或其应用程序优化压缩数据集的词汇表。首先,我们注意到,证明一个$n$-ary字母表的硬度结果证明了任何更大尺寸的字母表的相同结果。然后,我们证明,即使与二进制字母表,这两个变种不仅是NP-完全的,但承认没有多项式时间近似计划(除非P=NP)。我们进一步表明,直接标记仍然NP完全,即使适用于一元字母。虽然一元字母表可能实际上没有用,但这一结果表明,标记化的计算困难性不是大字母表或复杂结构的人工制品,而是一个根本性的障碍。总的来说,我们的研究结果解释了为什么BPE和UnigramLM等实用算法是启发式的,并指出近似算法是令牌化研究的重要前进道路。
摘要:Recent works have shown that tokenisation is NP-complete. However, these works assume tokenisation is applied to inputs with unboundedly large alphabets -- an unrealistic assumption, given that in practice tokenisers operate over fixed-size alphabets, such as bytes or Unicode characters. We close this gap by analysing tokenisation over bounded $n$-ary alphabets, considering two natural variants: bottom-up tokenisation and direct tokenisation, where we must, respectively, select a sequence of merge operations or a vocabulary whose application optimally compresses a dataset. First, we note that proving hardness results for an $n$-ary alphabet proves the same results for alphabets of any larger size. We then prove that even with binary alphabets, both variants are not only NP-complete, but admit no polynomial-time approximation scheme (unless P=NP). We further show that direct tokenisation remains NP-complete even when applied to unary alphabets. While unary alphabets may not be practically useful, this result establishes that the computational intractability of tokenisation is not an artifact of large alphabets or complex constructions, but a fundamental barrier. Overall, our results explain why practical algorithms such as BPE and UnigramLM are heuristic, and point toward approximation algorithms as an important path forward for tokenisation research.
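The hardness result explains why deployed tokenisers settle for greedy heuristics. For concreteness, here is the classic BPE merge loop, which repeatedly takes the locally most frequent adjacent pair rather than searching for a globally optimal vocabulary:

```python
from collections import Counter

def bpe_merges(text, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair.
    A heuristic, since exact optimal tokenisation is NP-complete."""
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(a + b); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
    return merges, seq

merges, seq = bpe_merges("abababababcabab", 3)
print(merges)   # learned merge operations, e.g. [('a','b'), ('ab','ab'), ...]
print(seq)      # the compressed token sequence
```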
【2】PCARNN-DCBF: Minimal-Intervention Geofence Enforcement for Ground Vehicles
标题:PCARNN-DCBF:地面车辆的最小干预地理围栏执法
链接:https://arxiv.org/abs/2511.15522
作者:Yinan Yu,Samuel Scheidegger
摘要:地面车辆的运行时地理围栏正迅速成为执行运行设计域(ODD)的关键技术。然而,现有的解决方案很难调和高保真学习与可验证控制的结构要求。我们通过引入PCARNN-DCBF来解决这个问题,PCARNN-DCBF是一种新的管道,它集成了物理编码的控制仿射残差神经网络和基于预览的离散控制障碍函数。与一般的学习模型不同,PCARNN明确保留了车辆动力学的控制仿射结构,确保了可靠优化所需的线性度。这使得DCBF能够通过实时二次规划(QP)来强制执行多边形保持约束,该规划可处理高相对度并减轻致动器饱和。在CARLA中对电动和内燃平台进行的实验表明,这种结构保留方法显著优于分析基线和非结构化神经基线。
摘要:Runtime geofencing for ground vehicles is rapidly emerging as a critical technology for enforcing Operational Design Domains (ODDs). However, existing solutions struggle to reconcile high-fidelity learning with the structural requirements of verifiable control. We address this by introducing PCARNN-DCBF, a novel pipeline integrating a Physics-encoded Control-Affine Residual Neural Network with a preview-based Discrete Control Barrier Function. Unlike generic learned models, PCARNN explicitly preserves the control-affine structure of vehicle dynamics, ensuring the linearity required for reliable optimization. This enables the DCBF to enforce polygonal keep-in constraints via a real-time Quadratic Program (QP) that handles high relative degree and mitigates actuator saturation. Experiments in CARLA across electric and combustion platforms demonstrate that this structure-preserving approach significantly outperforms analytical and unstructured neural baselines.
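In the simplest setting the DCBF quadratic program collapses to a closed-form clamp. The 1-D integrator below illustrates only the minimal-intervention idea (project the nominal input onto the set satisfying h(x+) >= (1 - alpha) h(x)); the full method additionally handles polygonal sets, high relative degree, and actuator saturation, none of which appear in this toy.

```python
import numpy as np

def dcbf_filter(x, v_nom, x_max=10.0, alpha=0.2, dt=0.1):
    """Minimal-intervention safety filter for a 1-D integrator x+ = x + v*dt
    with keep-in set h(x) = x_max - x >= 0. The discrete CBF condition
    h(x+) >= (1 - alpha) * h(x) reduces the QP to a closed-form clamp here."""
    v_max = alpha * (x_max - x) / dt
    return min(v_nom, v_max)            # project the nominal input into the safe set

x, traj = 0.0, []
for _ in range(100):
    v = dcbf_filter(x, v_nom=3.0)       # nominal controller speeds toward the fence
    x += v * 0.1
    traj.append(x)
print("final position (stays <= 10):", traj[-1])
```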
【3】Sample-Adaptivity Tradeoff in On-Demand Sampling
标题:按需抽样中的样本适应性权衡
链接:https://arxiv.org/abs/2511.15507
作者:Nika Haghtalab,Omar Montasser,Mingda Qiao
备注:50 pages, to appear at NeurIPS 2025
摘要:我们研究了按需采样的样本复杂度和轮复杂度之间的权衡,其中学习算法自适应地从有限数量的轮中的$k$分布采样。在多分布学习(MDL)的可实现设置中,我们证明了$r$轮算法的最佳样本复杂度近似为$dk^{Θ(1/r)} / ε$。对于一般不可知的情况下,我们提出了一个算法,实现了接近最佳的样本复杂度为$\widetilde O((d + k)/ ε^2)$内$\widetilde O(\sqrt{k})$轮。独立的兴趣,我们引入了一个新的框架,优化通过按需采样(OODS),抽象的样本自适应权衡,并捕获大多数现有的MDL算法。我们在OODS环境中建立了轮复杂性的近紧界。上界直接产生不可知MDL的$\widetilde O(\sqrt{k})$轮算法,而下界意味着实现子多项式轮复杂度将需要从根本上绕过OODS固有硬度的新技术。
摘要:We study the tradeoff between sample complexity and round complexity in on-demand sampling, where the learning algorithm adaptively samples from $k$ distributions over a limited number of rounds. In the realizable setting of Multi-Distribution Learning (MDL), we show that the optimal sample complexity of an $r$-round algorithm scales approximately as $dk^{Θ(1/r)} / ε$. For the general agnostic case, we present an algorithm that achieves near-optimal sample complexity of $\widetilde O((d + k) / ε^2)$ within $\widetilde O(\sqrt{k})$ rounds. Of independent interest, we introduce a new framework, Optimization via On-Demand Sampling (OODS), which abstracts the sample-adaptivity tradeoff and captures most existing MDL algorithms. We establish nearly tight bounds on the round complexity in the OODS setting. The upper bounds directly yield the $\widetilde O(\sqrt{k})$-round algorithm for agnostic MDL, while the lower bounds imply that achieving sub-polynomial round complexity would require fundamentally new techniques that bypass the inherent hardness of OODS.
【4】A Tensor Compiler for Processing-In-Memory Architectures
标题:内存处理架构的张量编译器
链接:https://arxiv.org/abs/2511.15503
作者:Peiming Yang,Sankeerth Durvasula,Ivan Fernandez,Mohammad Sadrosadati,Onur Mutlu,Gennady Pekhimenko,Christina Giannoula
摘要:与高性能主机处理器集成的内存处理(PIM)设备(例如,GPU)可以通过在PIM核心处利用高内存带宽来加速机器学习(ML)模型(包括大型语言模型(LLM))中的内存密集型内核。然而,主机处理器和PIM核心需要不同的数据布局:DRAM需要跨DRAM组分布的连续元素,而PIM核心需要它们在本地组内。这就需要在ML内核执行中重新安排数据,这带来了显著的性能和可编程性挑战,进一步加剧了支持不同PIM后端的需求。当前的编译方法缺乏对跨多个PIM后端的不同ML内核的系统优化,并且在计算代码优化期间可能在很大程度上忽略数据重排。我们证明了数据重排和计算代码优化是相互依赖的,并且需要在调优过程中进行联合优化。为了解决这个问题,我们设计了DCC,这是PIM系统的第一个以数据为中心的ML编译器,它在统一的调优过程中共同优化数据重排和计算代码。DCC集成了一个多层PIM抽象,可以在不同的PIM后端上实现各种数据分发和处理策略。DCC通过将数据分区策略映射到计算循环分区,应用特定于PIM的代码优化以及利用快速准确的性能预测模型来选择最佳配置,从而实现有效的协同优化。我们在各种单独的ML内核中的评估表明,DCC在HBM-PIM上实现了高达7.68倍的加速(平均2.7倍),在AttAcc PIM后端上实现了高达13.17倍的加速(平均5.75倍)。在端到端LLM推理中,AttAcc上的DCC将GPT-3和LLaMA-2的GPU速度提高了7.71倍(平均4.88倍)。
摘要:Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Models (LLMs), by leveraging high memory bandwidth at PIM cores. However, Host processors and PIM cores require different data layouts: Hosts need consecutive elements distributed across DRAM banks, while PIM cores need them within local banks. This necessitates data rearrangements in ML kernel execution that pose significant performance and programmability challenges, further exacerbated by the need to support diverse PIM backends. Current compilation approaches lack systematic optimization for diverse ML kernels across multiple PIM backends and may largely ignore data rearrangements during compute code optimization. We demonstrate that data rearrangements and compute code optimization are interdependent, and need to be jointly optimized during the tuning process. To address this, we design DCC, the first data-centric ML compiler for PIM systems that jointly co-optimizes data rearrangements and compute code in a unified tuning process. DCC integrates a multi-layer PIM abstraction that enables various data distribution and processing strategies on different PIM backends. DCC enables effective co-optimization by mapping data partitioning strategies to compute loop partitions, applying PIM-specific code optimizations and leveraging a fast and accurate performance prediction model to select optimal configurations. Our evaluations in various individual ML kernels demonstrate that DCC achieves up to 7.68x speedup (2.7x average) on HBM-PIM and up to 13.17x speedup (5.75x average) on AttAcc PIM backend over GPU-only execution. In end-to-end LLM inference, DCC on AttAcc accelerates GPT-3 and LLaMA-2 by up to 7.71x (4.88x average) over GPU.
【5】NTK-Guided Implicit Neural Teaching
标题:NTK引导的隐式神经教学
链接:https://arxiv.org/abs/2511.15487
作者:Chen Zhang,Wei Zuo,Bingyang Cheng,Yikun Wang,Wei-Bin Kou,Yik Chung WU,Ngai Wong
备注:Preprint
摘要:隐式神经表示(INR)通过多层感知器(MLP)对连续信号进行参数化,为图像,音频和3D重建等任务实现紧凑,分辨率无关的建模。然而,拟合高分辨率信号需要优化数百万个坐标,从而导致高昂的计算成本。为了解决这个问题,我们提出了NTK引导的隐式神经教学(NINT),它通过动态选择最大化全局功能更新的坐标来加速训练。利用神经正切核(NTK),NINT通过NTK增强的损失梯度的范数对示例进行评分,捕获拟合误差和异构杠杆(自我影响和交叉坐标耦合)。与现有方法相比,这种双重考虑可以实现更快的收敛。通过大量的实验,我们证明了NINT在保持或提高表示质量的同时,将训练时间显著减少了近一半,在最近的基于采样的策略中建立了最先进的加速。
摘要:Implicit Neural Representations (INRs) parameterize continuous signals via multilayer perceptrons (MLPs), enabling compact, resolution-independent modeling for tasks like image, audio, and 3D reconstruction. However, fitting high-resolution signals demands optimizing over millions of coordinates, incurring prohibitive computational costs. To address it, we propose NTK-Guided Implicit Neural Teaching (NINT), which accelerates training by dynamically selecting coordinates that maximize global functional updates. Leveraging the Neural Tangent Kernel (NTK), NINT scores examples by the norm of their NTK-augmented loss gradients, capturing both fitting errors and heterogeneous leverage (self-influence and cross-coordinate coupling). This dual consideration enables faster convergence compared to existing methods. Through extensive experiments, we demonstrate that NINT significantly reduces training time by nearly half while maintaining or improving representation quality, establishing state-of-the-art acceleration among recent sampling-based strategies.
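The selection rule (score coordinates by the norm of their NTK-augmented loss gradients and train on the top scorers) can be approximated cheaply by combining the residual with the last-layer feature norm, as in the sketch below. This last-layer shortcut is our simplification for illustration, not the paper's exact score.

```python
import torch

# A small coordinate-MLP INR fitting a 2-D signal.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

coords = torch.rand(4096, 2)                        # pixel coordinates in [0,1]^2
target = torch.sin(8 * coords[:, :1]) * torch.cos(8 * coords[:, 1:])

for step in range(100):
    with torch.no_grad():
        resid = (net(coords) - target).squeeze(-1)
        feats = net[:-1](coords)                    # penultimate features
    # For MSE, the per-coordinate parameter-gradient norm is |resid| times
    # the network-gradient norm; we approximate the latter by the output
    # layer's contribution, ||phi(x)||^2 + 1 (the bias term).
    scores = resid**2 * (1.0 + feats.pow(2).sum(-1))
    idx = torch.topk(scores, k=512).indices         # train on top-scoring subset
    opt.zero_grad()
    loss = ((net(coords[idx]) - target[idx]) ** 2).mean()
    loss.backward()
    opt.step()
print("final subset loss:", loss.item())
```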
【6】CID: Measuring Feature Importance Through Counterfactual Distributions
标题:CID:通过反事实分布衡量特征重要性
链接:https://arxiv.org/abs/2511.15371
作者:Eddie Conti,Álvaro Parafita,Axel Brando
备注:Accepted at Northern Lights Deep Learning (NLDL) 2026 Conference
摘要:评估机器学习中个体特征的重要性对于理解模型的决策过程至关重要。虽然存在许多方法,但缺乏明确的基础事实进行比较,突出了对替代性、有充分依据的措施的需要。本文介绍了一种新的事后局部特征重要性方法,称为反事实重要性分布(CID)。我们生成两组正面和负面反事实,使用核密度估计对其分布进行建模,并基于分布相异性度量对特征进行排名。这一衡量标准建立在严格的数学框架之上,满足了作为有效指标所需的关键属性。我们展示了我们的方法的有效性,通过比较完善的本地功能的重要性解释。我们的方法不仅为现有方法提供了补充视角,而且还提高了忠诚度指标(全面性和充分性)的性能,从而对系统进行了更忠实的解释。这些结果突出了它作为模型分析的有价值的工具的潜力。
摘要:Assessing the importance of individual features in Machine Learning is critical to understand the model's decision-making process. While numerous methods exist, the lack of a definitive ground truth for comparison highlights the need for alternative, well-founded measures. This paper introduces a novel post-hoc local feature importance method called Counterfactual Importance Distribution (CID). We generate two sets of positive and negative counterfactuals, model their distributions using Kernel Density Estimation, and rank features based on a distributional dissimilarity measure. This measure, grounded in a rigorous mathematical framework, satisfies key properties required to function as a valid metric. We showcase the effectiveness of our method by comparing with well-established local feature importance explainers. Our method not only offers complementary perspectives to existing approaches, but also improves performance on faithfulness metrics (both for comprehensiveness and sufficiency), resulting in more faithful explanations of the system. These results highlight its potential as a valuable tool for model analysis.
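The pipeline (two counterfactual sets, a KDE per feature, a distributional dissimilarity score) can be sketched as follows. The counterfactual generator is assumed given, the synthetic sets below merely stand in for its output, and the total-variation-style proxy is our illustrative choice rather than the paper's exact metric.

```python
import numpy as np
from scipy.stats import gaussian_kde

def cid_scores(pos_cf, neg_cf, grid_size=200):
    """Rank features by dissimilarity between KDEs of their values in the
    positive vs. negative counterfactual sets (total-variation-style proxy)."""
    scores = []
    for j in range(pos_cf.shape[1]):
        a, b = pos_cf[:, j], neg_cf[:, j]
        lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
        grid = np.linspace(lo, hi, grid_size)
        pa, pb = gaussian_kde(a)(grid), gaussian_kde(b)(grid)
        scores.append(0.5 * np.mean(np.abs(pa - pb)) * (hi - lo))
    return np.array(scores)

rng = np.random.default_rng(0)
# Stand-ins for counterfactuals from any generator: feature 0 shifts between
# the two sets (important), feature 1 does not (unimportant).
pos = np.column_stack([rng.normal(1.0, 1, 500), rng.normal(0, 1, 500)])
neg = np.column_stack([rng.normal(-1.0, 1, 500), rng.normal(0, 1, 500)])
print("CID scores:", cid_scores(pos, neg))   # feature 0 should rank first
```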
【7】Quant-Trim in Practice: Improved Cross-Platform Low-Bit Deployment on Edge NPUs
标题:量化修剪实践:改进边缘NPU上的跨平台低位部署
链接:https://arxiv.org/abs/2511.15300
作者:Rayen Dhahri,Steffen Urban
备注:Accepted to a Eurips 2025 workshop, work in progress
摘要:专门的边缘加速器依赖于低位量化,但供应商编译器在缩放,裁剪和内核支持方面有所不同,通常是黑盒。因此,相同的浮点(FP)检查点可能会在后端之间产生不一致的准确性,迫使从业者调整标志或重构模型以提供商友好的操作符子集。我们介绍Quant-Trim,一种训练阶段方法,它产生一个对后端和精度选择具有鲁棒性的硬件中立检查点。它结合了渐进式假量化,以使训练与部署的整数网格保持一致,并反向修剪以驯服离群值驱动的规模膨胀,同时保持可学习性。Quant-Trim与量化方案(对称/非对称,每张量/每通道,INT 8/INT 4)无关,并且不需要特定于供应商的图形更改。跨模型和任务,它缩小了FP,低位间隙,减少了对编译器编译/校准的依赖,并避免了每个后端的重新训练。我们报告了静态/动态激活缩放和不同运营商覆盖范围下的准确性和边缘指标延迟,吞吐量,能量/推理和成本。
摘要:Specialized edge accelerators rely on low-bit quantization, but vendor compilers differ in scaling, clipping, and kernel support, often as black boxes. The same floating-point (FP) checkpoint can therefore yield inconsistent accuracy across backends, forcing practitioners to tweak flags or refactor models to vendor-friendly operator subsets. We introduce Quant-Trim, a training-phase method that produces a hardware-neutral checkpoint robust to backend and precision choices. It combines progressive fake quantization to align training with the deployed integer grid and reverse pruning to tame outlier-driven scale inflation while preserving learnability. Quant-Trim is agnostic to quantization schemes (symmetric/asymmetric, per-tensor/per-channel, INT8/INT4) and requires no vendor-specific graph changes. Across models and tasks, it narrows the FP-to-low-bit gap, reduces dependence on compiler heuristics/calibration, and avoids per-backend retraining. We report accuracy and edge metrics (latency, throughput, energy/inference, and cost) under static/dynamic activation scaling and varying operator coverage.
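The "progressive fake quantization" ingredient can be sketched as a straight-through fake-quant op whose bit-width is annealed during training. The linear schedule and the omission of reverse pruning are simplifying assumptions of this sketch, not the paper's exact recipe.

```python
import torch

def fake_quant(x, num_bits):
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    xq = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (xq - x).detach()        # quantized forward, identity gradient

def bits_at(step, total, start_bits=16, target_bits=4):
    """Assumed progressive schedule: anneal from near-FP toward the target grid."""
    frac = min(1.0, step / (0.5 * total))
    return int(round(start_bits + frac * (target_bits - start_bits)))

w = torch.randn(64, 64, requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-2)
for step in range(200):
    wq = fake_quant(w, bits_at(step, 200))
    loss = ((torch.randn(8, 64) @ wq) ** 2).mean()   # stand-in task loss
    opt.zero_grad(); loss.backward(); opt.step()
```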
【8】Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
标题:揭开文本的内在维度:从学术摘要到创意故事
链接:https://arxiv.org/abs/2511.15210
作者:Vladislav Pedashenko,Laida Kushnareva,Yana Khassan Nibal,Eduard Tulchinskii,Kristian Kuznetsov,Vladislav Zharchinskii,Yury Maximov,Irina Piontkovskaya
摘要:内在维度(ID)是现代LLM分析中的一个重要工具,为训练动态,缩放行为和数据集结构的研究提供了信息,但其文本决定因素仍然未得到充分探索。我们通过交叉编码器分析,语言特征和稀疏自动编码器(SAE)在可解释的文本属性中提供了第一个全面的研究基础ID。在这项工作中,我们建立了三个关键的发现。首先,ID与基于熵的度量是互补的:在控制长度之后,两者是不相关的,ID捕获与预测质量正交的几何复杂度。其次,ID表现出强大的体裁分层:科学散文显示低ID(~8),散文内容中等ID(~9),创意/观点写作高ID(~10.5)在所有测试的模型。这表明,当代法学硕士发现科学文本“代表性简单”,而小说需要额外的自由度。第三,使用SAE,我们确定了因果特征:科学信号(正式语气,报告模板,统计数据)减少了ID;人性化信号(个性化,情感,叙事)增加了它。因此,对于当代模式,科学写作似乎相对“容易”,而小说,意见和影响增加了代表性的自由度。我们多方面的分析为正确使用ID和基于ID的结果的合理解释提供了实际指导。
摘要:Intrinsic dimension (ID) is an important tool in modern LLM analysis, informing studies of training dynamics, scaling behavior, and dataset structure, yet its textual determinants remain underexplored. We provide the first comprehensive study grounding ID in interpretable text properties through cross-encoder analysis, linguistic features, and sparse autoencoders (SAEs). In this work, we establish three key findings. First, ID is complementary to entropy-based metrics: after controlling for length, the two are uncorrelated, with ID capturing geometric complexity orthogonal to prediction quality. Second, ID exhibits robust genre stratification: scientific prose shows low ID (~8), encyclopedic content medium ID (~9), and creative/opinion writing high ID (~10.5) across all models tested. This reveals that contemporary LLMs find scientific text "representationally simple" while fiction requires additional degrees of freedom. Third, using SAEs, we identify causal features: scientific signals (formal tone, report templates, statistics) reduce ID; humanized signals (personalization, emotion, narrative) increase it. Steering experiments confirm these effects are causal. Thus, for contemporary models, scientific writing appears comparatively "easy", whereas fiction, opinion, and affect add representational degrees of freedom. Our multi-faceted analysis provides practical guidance for the proper use of ID and the sound interpretation of ID-based results.
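For readers unfamiliar with how ID is measured in practice, below is the TwoNN estimator (Facco et al., 2017), one common choice built from the ratio of second- to first-nearest-neighbour distances; the paper does not commit to this particular estimator, so it is shown only as a representative example.

```python
import numpy as np
from scipy.spatial import cKDTree

def two_nn_id(X):
    """TwoNN intrinsic-dimension estimate from neighbour-distance ratios."""
    d, _ = cKDTree(X).query(X, k=3)          # self, 1st and 2nd neighbours
    mu = np.sort(d[:, 2] / np.maximum(d[:, 1], 1e-12))
    n = len(mu)
    F = np.arange(1, n + 1) / n
    keep = F < 0.9                           # drop the unstable upper tail
    # Under the TwoNN model P(mu <= x) = 1 - x^{-ID}, so -log(1-F) = ID * log(mu);
    # fit the slope through the origin.
    x, y = np.log(mu[keep]), -np.log(1 - F[keep])
    return float((x @ y) / (x @ x))

rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 5))                     # 5-dim Gaussian cloud...
X = np.hstack([Z, Z @ rng.standard_normal((5, 20))])   # ...embedded in 25 dims
print("estimated ID:", two_nn_id(X))                   # should be close to 5
```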
【9】HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples
标题:HinTel-AlignBench:带有英语对齐样本的印地语-泰卢固语框架和基准
链接:https://arxiv.org/abs/2511.15183
作者:Rishikant Chigrupaatii,Ponnada Sai Tulasi Kanishka,Lalit Chandra Routhu,Martin Patel Sama Supratheek Reddy,Divyam Gupta,Dasari Srikar,Krishna Teja Kuchimanchi,Rajiv Misra,Rohun Tripathi
摘要:印度拥有近15亿人口和120多种主要语言,是世界上最多样化的地区之一。随着多语言视觉语言模型(VLM)的日益突出,强大的评估方法对于推动低资源语言实现公平的AI至关重要。目前的多语言VLM评估存在四个主要局限性:依赖未经验证的自动翻译,任务/领域覆盖范围狭窄,样本量有限,以及缺乏文化和本地来源的翻译(QA)。为了解决这些差距,我们提出了一个可扩展的框架来评估VLM在印度的语言,并将其与英语的性能进行比较。使用该框架,我们生成HinTel-AlignBench,这是一个基准测试,它从印地语和泰卢固语的不同来源中提取英语对齐的样本。我们的贡献有三个方面:(1)结合了反向翻译、过滤和人工验证的半自动化数据集创建框架;(2)最全面的印地语和泰卢固语视觉语言基准,包括改编的英语数据集(VQAv 2,RealWorldQA,CLEVR-Math)和本地新颖的印度数据集(JEE用于STEM,VAANI用于文化基础),每种语言约有4,000对QA;以及(3)各种最新技术水平(SOTA)的开放权重和闭源VLM的详细性能分析。我们发现,在所有模型中,5个任务中有4个任务的英语任务与印度语言任务的性能出现回归,印地语平均回归8.3分,泰卢固语平均回归5.5分。我们对常见的故障模式进行分类,以突出多语言多模态理解的具体改进领域。
摘要:With nearly 1.5 billion people and more than 120 major languages, India represents one of the most diverse regions in the world. As multilingual Vision-Language Models (VLMs) gain prominence, robust evaluation methodologies are essential to drive progress toward equitable AI for low-resource languages. Current multilingual VLM evaluations suffer from four major limitations: reliance on unverified auto-translations, narrow task/domain coverage, limited sample sizes, and lack of cultural and natively sourced Question-Answering (QA). To address these gaps, we present a scalable framework to evaluate VLMs in Indian languages and compare it with performance in English. Using the framework, we generate HinTel-AlignBench, a benchmark that draws from diverse sources in Hindi and Telugu with English-aligned samples. Our contributions are threefold: (1) a semi-automated dataset creation framework combining back-translation, filtering, and human verification; (2) the most comprehensive vision-language benchmark for Hindi and and Telugu, including adapted English datasets (VQAv2, RealWorldQA, CLEVR-Math) and native novel Indic datasets (JEE for STEM, VAANI for cultural grounding) with approximately 4,000 QA pairs per language; and (3) a detailed performance analysis of various State-of-the-Art (SOTA) open-weight and closed-source VLMs. We find a regression in performance for tasks in English versus in Indian languages for 4 out of 5 tasks across all the models, with an average regression of 8.3 points in Hindi and 5.5 points for Telugu. We categorize common failure modes to highlight concrete areas of improvement in multilingual multimodal understanding.
【10】Complex variational autoencoders admit Kähler structure
标题:复值变分自动编码器容许Kähler结构
链接:https://arxiv.org/abs/2511.15172
作者:Andrew Gracyk
备注:First version
摘要:已经发现,潜在的欧几里德变分自编码器(VAE)承认,在各种能力,黎曼结构。我们适应这些参数,但复杂的VAE与复杂的潜伏期。我们发现,复杂的VAE揭示了一定程度的凯勒几何结构。我们的方法将针对解码器的几何形状进行定制。我们推导了Fisher信息度量在复杂情况下的潜在复高斯正则化与平凡的关系矩阵。从统计信息论中可以看出,Fisher信息与Kullback-Leibler(KL)散度的Hessian函数是一致的。因此,度规Kähler势关系在相对熵下精确地实现。我们提出了一个Kähler潜在的复杂高斯混合物的衍生物,具有粗略的等价性的Fisher信息度量,同时仍然忠实于底层的Kähler几何。通过这个潜在的度量的计算是有效的,并通过我们的潜在的,有效的作为一个plurisubharmonic(PSH)功能,大规模的自动微分的计算负担被转移到小规模。我们表明,我们可以正则化的潜在空间与解码器的几何形状,我们可以按照一个加权的复杂的体积元素进行采样。我们证明了这些策略,在交换样本的变化,产生一致的更平滑的表示和更少的语义离群值。
摘要:It has been discovered that latent-Euclidean variational autoencoders (VAEs) admit, in various capacities, Riemannian structure. We adapt these arguments but for complex VAEs with a complex latent stage. We show that complex VAEs reveal to some level Kähler geometric structure. Our methods will be tailored for decoder geometry. We derive the Fisher information metric in the complex case under a latent complex Gaussian regularization with trivial relation matrix. It is well known from statistical information theory that the Fisher information coincides with the Hessian of the Kullback-Leibler (KL) divergence. Thus, the metric Kähler potential relation is exactly achieved under relative entropy. We propose a Kähler potential derivative of complex Gaussian mixtures that has rough equivalence to the Fisher information metric while still being faithful to the underlying Kähler geometry. Computation of the metric via this potential is efficient, and through our potential, valid as a plurisubharmonic (PSH) function, large scale computational burden of automatic differentiation is displaced to small scale. We show that we can regularize the latent space with decoder geometry, and that we can sample in accordance with a weighted complex volume element. We demonstrate these strategies, at the exchange of sample variation, yield consistently smoother representations and fewer semantic outliers.
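For reference, the geometric identity the abstract relies on: the Fisher information metric is the Hessian of the KL divergence at zero displacement, and a Kähler metric is the complex Hessian of a potential, which is why the potential relation "is exactly achieved under relative entropy".

```latex
% Fisher information as the Hessian of relative entropy:
\[
  D_{\mathrm{KL}}\left(p_\theta \,\|\, p_{\theta+\delta}\right)
  = \tfrac{1}{2}\,\delta^{\top} F(\theta)\,\delta + O(\|\delta\|^{3}),
  \qquad
  F_{ij}(\theta) = \mathbb{E}_{p_\theta}\!\left[\partial_i \log p_\theta\,\partial_j \log p_\theta\right].
\]
% Kaehler condition: the metric is the complex Hessian of a potential K,
\[
  g_{i\bar{\jmath}} = \frac{\partial^{2} K}{\partial z^{i}\,\partial \bar{z}^{\jmath}}.
\]
```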
【11】GPU-Initiated Networking for NCCL
标题:GPU发起的NCCL网络
链接:https://arxiv.org/abs/2511.15076
作者:Khaled Hamidouche,John Bachan,Pak Markthub,Peter-Jan Gootzen,Elena Agostini,Sylvain Jeaugey,Aamir Shafi,Georgios Theodorakis,Manjunath Gorentla Venkata
备注:13 pages, 9 figures, 3 tables
摘要:现代人工智能工作负载,尤其是混合专家(MoE)架构,越来越需要低延迟、细粒度的GPU到GPU通信以及设备端控制。传统的GPU通信遵循主机启动的模型,其中CPU协调所有通信操作-这是CUDA运行时的特征。虽然对于集体操作来说是健壮的,但是需要紧密集成计算和通信的应用可以从消除CPU协调开销的设备发起的通信中受益。 NCCL 2.28引入了具有三种操作模式的设备API:用于NVLink/PCIe的加载/存储可扩展(LSA),用于NVLink SHARP的多线程,以及用于网络RDMA的GPU发起的网络(GIN)。本文介绍了GIN的体系结构,设计,语义,并强调其对MoE通信的影响。GIN建立在三层架构上:i)NCCL Core主机端API,用于设备通信器设置和集体内存窗口注册; ii)设备端API,用于可从CUDA内核调用的远程内存操作;以及iii)具有双重语义(GPUDirect异步内核启动和代理)的网络插件架构,用于广泛的硬件支持。GPUDirect Async Kernel-Initiated后端利用DOCA GPUNetIO进行直接GPU到NIC通信,而Proxy后端通过标准RDMA网络上的无锁GPU到CPU队列提供等效功能。我们通过与DeepEP(一个MoE通信库)的集成展示了GIN的实用性。全面的基准测试表明,GIN在NCCL的统一运行时内提供设备发起的通信,将低延迟操作与NCCL的集体算法和生产基础设施相结合。
摘要:Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. Traditional GPU communication follows a host-initiated model, where the CPU orchestrates all communication operations - a characteristic of the CUDA runtime. Although robust for collective operations, applications requiring tight integration of computation and communication can benefit from device-initiated communication that eliminates CPU coordination overhead. NCCL 2.28 introduces the Device API with three operation modes: Load/Store Accessible (LSA) for NVLink/PCIe, Multimem for NVLink SHARP, and GPU-Initiated Networking (GIN) for network RDMA. This paper presents the GIN architecture, design, semantics, and highlights its impact on MoE communication. GIN builds on a three-layer architecture: i) NCCL Core host-side APIs for device communicator setup and collective memory window registration; ii) Device-side APIs for remote memory operations callable from CUDA kernels; and iii) A network plugin architecture with dual semantics (GPUDirect Async Kernel-Initiated and Proxy) for broad hardware support. The GPUDirect Async Kernel-Initiated backend leverages DOCA GPUNetIO for direct GPU-to-NIC communication, while the Proxy backend provides equivalent functionality via lock-free GPU-to-CPU queues over standard RDMA networks. We demonstrate GIN's practicality through integration with DeepEP, an MoE communication library. Comprehensive benchmarking shows that GIN provides device-initiated communication within NCCL's unified runtime, combining low-latency operations with NCCL's collective algorithms and production infrastructure.
【12】Compiling to recurrent neurons
标题:编译为递归神经元
链接:https://arxiv.org/abs/2511.14953
作者:Joey Velez-Ginorio,Nada Amin,Konrad Kording,Steve Zdancewic
摘要:离散结构目前是可微规划中的第二类。由于离散结构上的函数缺乏明显的导数,可微程序不能区分它们,并限制它们的使用范围。例如,在对神经网络进行编程时,条件和迭代不能在任何地方使用;它们可能会破坏基于梯度的学习所必需的导数。这限制了我们可以直接表达的可微算法的类别,对我们如何构建神经网络和更一般的可微程序施加了限制。然而,这些限制并不是根本性的。最近的工作表明,条件可以是一流的,通过编译成线性神经元的可微形式。同样,这项工作表明迭代可以是一流的-通过编译线性递归神经元。我们提出了一个最小类型的,高阶和线性编程语言的迭代称为$\textsf{Cajal}\scriptstyle(\mathbb{\multimap},\mathbb{2},\mathbb{N})$。我们证明了它的程序可以正确编译为递归神经元,允许离散算法以与基于梯度的学习兼容的可微形式表示。在我们的实现中,我们进行了两个实验,将这些递归神经元与解决迭代图像变换任务的神经网络联系起来。这决定了它在学习之前的部分功能。因此,相对于没有第一类迭代编程的神经网络,该网络学习速度更快,数据效率更高。一个关键的教训是,递归神经元在学习和普通编程的离散结构之间实现了丰富的相互作用。
摘要:Discrete structures are currently second-class in differentiable programming. Since functions over discrete structures lack overt derivatives, differentiable programs do not differentiate through them and limit where they can be used. For example, when programming a neural network, conditionals and iteration cannot be used everywhere; they can break the derivatives necessary for gradient-based learning to work. This limits the class of differentiable algorithms we can directly express, imposing restraints on how we build neural networks and differentiable programs more generally. However, these restraints are not fundamental. Recent work shows conditionals can be first-class, by compiling them into differentiable form as linear neurons. Similarly, this work shows iteration can be first-class -- by compiling to linear recurrent neurons. We present a minimal typed, higher-order and linear programming language with iteration called $\textsf{Cajal}\scriptstyle(\mathbb{\multimap}, \mathbb{2}, \mathbb{N})$. We prove its programs compile correctly to recurrent neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning. With our implementation, we conduct two experiments where we link these recurrent neurons against a neural network solving an iterative image transformation task. This determines part of its function prior to learning. As a result, the network learns faster and with greater data-efficiency relative to a neural network programmed without first-class iteration. A key lesson is that recurrent neurons enable a rich interplay between learning and the discrete structures of ordinary programming.
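The flavor of the compilation target can be seen in a hand-compiled example: a bounded loop ("add x to an accumulator N times") becomes a linear recurrent neuron, so the whole iteration is a smooth linear function of its inputs. The state layout here is our illustrative choice, not Cajal's actual code generation.

```python
import numpy as np

# Bounded iteration as a linear recurrence h_{t+1} = A h_t, with state
# h = [accumulator, loop-carried value]; each step adds x to the accumulator.
A = np.array([[1.0, 1.0],     # acc <- acc + x
              [0.0, 1.0]])    # x   <- x

def run(x, n_steps):
    h = np.array([0.0, x])
    for _ in range(n_steps):  # the "clock"; each update is purely linear
        h = A @ h
    return h[0]

print(run(3.0, 5))            # 15.0, i.e. 3 added 5 times; d(output)/dx = 5
```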
【13】Front-door Reducibility: Reducing ADMGs to the Standard Front-door Setting via a Graphical Criterion
标题:前门可约性:通过图形准则将ADMG归约为标准前门设置
链接:https://arxiv.org/abs/2511.15679
作者:Jianqiao Mao,Max A. Little
备注:16 pages, 3 figures
摘要:前门平差在经典前门准则下提供了一个简单的闭合形式识别公式,但其适用性往往被认为是狭窄和严格的。虽然ID算法是非常有用的,并被证明是有效的因果关系识别在一般的因果图(如果它是可识别的),执行ID算法并不能保证获得一个实用的,易于估计的介入分布表达式。我们认为,前门标准的适用性并不像它似乎是有限的:许多更复杂的因果图可以减少到前门标准。在本文中,我们介绍了前门归约(FDR),一个图形条件的无环有向混合图(ADMG),扩展了经典的前门标准的适用性,以减少一个大家庭的复杂的因果图的前门设置通过聚合变量到超级节点(FDR三元组)$\left(\boldsymbol{X}^{*},\boldsymbol{Y}^{*},\boldsymbol{M}^{*}\right)$。通过对FDR准则的描述,证明了FDR准则的满足与FDR平差的适用性之间的图级等价性。同时,我们提出了一个检测可容许FDR三元组的精确算法FDR-TID,并证明了该算法的正确性、完备性和有限终止性。经验驱动的例子表明,教科书前门设置之外的许多图形是FDR,产生简单的,可估计的调整,一般的ID表达式将是繁琐的。因此,FDR通过优先考虑可解释性和计算简单性而不牺牲混合图的通用性来补充现有的识别方法。
摘要:Front-door adjustment provides a simple closed-form identification formula under the classical front-door criterion, but its applicability is often viewed as narrow and strict. Although the ID algorithm is very useful and provably effective for causal relation identification in general causal graphs (when the target is identifiable), running the ID algorithm does not guarantee obtaining a practical, easy-to-estimate expression for the interventional distribution. We argue that the applicability of the front-door criterion is not as limited as it seems: many more complicated causal graphs can be reduced to the front-door criterion. In this paper, we introduce front-door reducibility (FDR), a graphical condition on acyclic directed mixed graphs (ADMGs) that extends the applicability of the classic front-door criterion to reduce a large family of complicated causal graphs to a front-door setting by aggregating variables into super-nodes (FDR triple) $\left(\boldsymbol{X}^{*},\boldsymbol{Y}^{*},\boldsymbol{M}^{*}\right)$. After characterizing the FDR criterion, we prove a graph-level equivalence between the satisfaction of the FDR criterion and the applicability of FDR adjustment. We then present FDR-TID, an exact algorithm that detects an admissible FDR triple, and establish the algorithm's correctness, completeness, and finite termination. Empirically-motivated examples illustrate that many graphs outside the textbook front-door setting are FDR, yielding simple, estimable adjustments where general ID expressions would be cumbersome. FDR thus complements existing identification methods by prioritizing interpretability and computational simplicity without sacrificing generality across mixed graphs.
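For reference, the classical front-door adjustment that an FDR reduction makes applicable, stated for the aggregated super-nodes:

```latex
\[
  P\!\left(y^{*} \mid \mathrm{do}(x^{*})\right)
  = \sum_{m^{*}} P\!\left(m^{*} \mid x^{*}\right)
    \sum_{x'} P\!\left(y^{*} \mid m^{*}, x'\right) P\!\left(x'\right).
\]
% Once FDR-TID finds an admissible triple (X*, Y*, M*), identification needs
% only these three observational conditionals rather than a general ID expression.
```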
【14】Rényi Differential Privacy for Heavy-Tailed SDEs via Fractional Poincaré Inequalities
标题:通过分数Poincaré不等式实现重尾SDE的Rényi差分隐私
链接:https://arxiv.org/abs/2511.15634
作者:Benjamin Dupuis,Mert Gürbüzbalaban,Umut Şimşekli,Jian Wang,Sinan Yildirim,Lingjiong Zhu
摘要:近年来,描述学习算法的差分隐私(DP)已成为一个主要的挑战。同时,许多研究建议研究具有重尾噪声的随机梯度下降(SGD)的行为,既可以作为现代深度学习模型的模型,也可以提高其性能。然而,大多数DP边界集中在轻尾噪声,其中已经获得了令人满意的保证,但所提出的技术不直接扩展到重尾设置。最近,获得了重尾SGD的第一个DP保证。这些结果提供了$(0,δ)$-DP保证,而不需要梯度裁剪。尽管对DP和重尾算法之间的联系有了新的认识,但这些结果对参数的数量有很强的依赖性,并且不能扩展到其他DP概念,如成熟的Rényi差分隐私(RDP)。在这项工作中,我们建议通过推导重尾SDEs的第一个RDP保证以及它们的离散化对应物来解决这些限制。我们的框架是基于新的Rényi流计算和使用良好的分数庞加莱不等式。在这样的不等式被满足的假设下,我们获得了DP保证,与现有技术相比,DP保证对维度的依赖性要弱得多。
摘要:Characterizing the differential privacy (DP) of learning algorithms has become a major challenge in recent years. In parallel, many studies suggested investigating the behavior of stochastic gradient descent (SGD) with heavy-tailed noise, both as a model for modern deep learning models and to improve their performance. However, most DP bounds focus on light-tailed noise, where satisfactory guarantees have been obtained but the proposed techniques do not directly extend to the heavy-tailed setting. Recently, the first DP guarantees for heavy-tailed SGD were obtained. These results provide $(0,δ)$-DP guarantees without requiring gradient clipping. Despite casting new light on the link between DP and heavy-tailed algorithms, these results have a strong dependence on the number of parameters and cannot be extended to other DP notions like the well-established Rényi differential privacy (RDP). In this work, we propose to address these limitations by deriving the first RDP guarantees for heavy-tailed SDEs, as well as their discretized counterparts. Our framework is based on new Rényi flow computations and the use of well-established fractional Poincaré inequalities. Under the assumption that such inequalities are satisfied, we obtain DP guarantees that have a much weaker dependence on the dimension compared to prior art.
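For reference, the privacy notion involved: a mechanism M satisfies (α, ε)-RDP if the order-α Rényi divergence between its output distributions on any two neighbouring datasets is at most ε.

```latex
\[
  R_{\alpha}\!\left( M(D) \,\|\, M(D') \right)
  = \frac{1}{\alpha - 1}
    \log \mathbb{E}_{x \sim M(D')}\!
    \left[ \left( \frac{\mathrm{d}\,M(D)}{\mathrm{d}\,M(D')}(x) \right)^{\!\alpha} \right]
  \le \varepsilon
  \quad \text{for all neighbouring } D, D'.
\]
```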
【15】Gini Score under Ties and Case Weights
标题:并列(ties)与案例权重下的基尼分数
链接:https://arxiv.org/abs/2511.15446
作者:Alexej Brauer,Mario V. Wüthrich
摘要:基尼分数是统计建模和机器学习中用于模型验证和模型选择的常用工具。它是一个纯粹基于排名的分数,可用于评估风险排名。统计建模中的基尼分数主要用于二元场景,其中它有许多等价的重新表述,如受试者工作特征(ROC)或曲线下面积(AUC)。在精算文献中,这种基于二元响应的排名分数已通过洛伦兹曲线和浓度曲线扩展到一般的实值随机变量。虽然这些最初的概念假设风险排名由连续分布函数生成,但我们在本文中讨论当风险排名存在并列(ties)时基尼分数该如何使用。此外,我们将基尼分数调整到精算中常见的案例权重情形。
摘要:The Gini score is a popular tool in statistical modeling and machine learning for model validation and model selection. It is a purely rank based score that allows one to assess risk rankings. The Gini score for statistical modeling has mainly been used in a binary context, in which it has many equivalent reformulations such as the receiver operating characteristic (ROC) or the area under the curve (AUC). In the actuarial literature, this rank based score for binary responses has been extended to general real-valued random variables using Lorenz curves and concentration curves. While these initial concepts assume that the risk ranking is generated by a continuous distribution function, we discuss in this paper how the Gini score can be used in the case of ties in the risk ranking. Moreover, we adapt the Gini score to the common actuarial situation of having case weights.
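A minimal sketch of the kind of rank-based score discussed here, assuming a concentration-curve construction in which tied scores are grouped into single curve segments and case weights enter through weighted cumulative sums; the paper's exact definitions and normalization may differ:

```python
import numpy as np

def weighted_gini(y, score, w=None):
    """Rank-based Gini with ties grouped and optional case weights."""
    y, score = np.asarray(y, float), np.asarray(score, float)
    w = np.ones_like(y) if w is None else np.asarray(w, float)
    order = np.argsort(-score, kind="stable")       # best risks first
    y, score, w = y[order], score[order], w[order]
    # Group tied scores: each tie group becomes one curve segment.
    _, idx = np.unique(-score, return_index=True)
    groups = np.sort(idx)
    wy = np.add.reduceat(w * y, groups)
    wg = np.add.reduceat(w, groups)
    cum_w = np.cumsum(wg) / wg.sum()                # x-axis: weight share
    cum_wy = np.cumsum(wy) / wy.sum()               # y-axis: response share
    x = np.concatenate([[0.0], cum_w])
    c = np.concatenate([[0.0], cum_wy])
    auc = np.sum((x[1:] - x[:-1]) * (c[1:] + c[:-1]) / 2.0)  # trapezoid
    return 2.0 * auc - 1.0                          # 0 = uninformative ranking

y = np.array([0, 1, 1, 0, 1])
s = np.array([0.2, 0.8, 0.8, 0.1, 0.5])             # note the tie at 0.8
print(weighted_gini(y, s, w=np.array([1, 2, 1, 1, 1])))
```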
【16】Exponential Lasso: robust sparse penalization under heavy-tailed noise and outliers with exponential-type loss
标题:指数Lasso:基于指数型损失的、重尾噪声与异常值下的稳健稀疏惩罚
链接:https://arxiv.org/abs/2511.15332
作者:The Tien Mai
摘要:在高维统计中,Lasso是同时进行变量选择和参数估计的基础方法。然而,它对平方损失函数的依赖使其对离群值和重尾噪声高度敏感,可能导致不可靠的模型选择和有偏估计。为了解决这个问题,我们引入了指数Lasso,这是一种新颖的鲁棒方法,它在Lasso框架内集成了指数型损失函数。该损失函数旨在实现高斯噪声下的统计效率和对数据污染的鲁棒性之间的平滑权衡。与限制大残差影响的其他方法不同,指数损失平滑地重新下降,在保留小误差的近二次行为的同时,有效降低极端离群值的影响。我们建立的理论保证表明,指数Lasso实现了强统计收敛速度,在理想条件下匹配经典Lasso,同时在重尾污染存在时保持鲁棒性。在计算上,该估计量通过优化最小化(MM)算法高效求解,迭代地解决一系列加权Lasso子问题。数值实验表明,该方法极具竞争力:在受污染的设置中优于经典Lasso,即使在高斯噪声下也保持强劲性能。我们的方法在GitHub上的R包heavylasso中实现:https://github.com/tienmt/heavylasso
摘要:In high-dimensional statistics, the Lasso is a cornerstone method for simultaneous variable selection and parameter estimation. However, its reliance on the squared loss function renders it highly sensitive to outliers and heavy-tailed noise, potentially leading to unreliable model selection and biased estimates. To address this limitation, we introduce the Exponential Lasso, a novel robust method that integrates an exponential-type loss function within the Lasso framework. This loss function is designed to achieve a smooth trade-off between statistical efficiency under Gaussian noise and robustness against data contamination. Unlike other methods that cap the influence of large residuals, the exponential loss smoothly redescends, effectively downweighting the impact of extreme outliers while preserving near-quadratic behavior for small errors. We establish theoretical guarantees showing that the Exponential Lasso achieves strong statistical convergence rates, matching the classical Lasso under ideal conditions while maintaining its robustness in the presence of heavy-tailed contamination. Computationally, the estimator is optimized efficiently via a Majorization-Minimization (MM) algorithm that iteratively solves a series of weighted Lasso subproblems. Numerical experiments demonstrate that the proposed method is highly competitive, outperforming the classical Lasso in contaminated settings and maintaining strong performance even under Gaussian noise. Our method is implemented in the \texttt{R} package \texttt{heavylasso} available on Github: https://github.com/tienmt/heavylasso
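A minimal sketch of the MM strategy described above, assuming the exponential-type loss rho(r) = gamma * (1 - exp(-r^2 / gamma)) so that each majorization step reduces to a Lasso with observation weights exp(-r_i^2 / gamma); the authors' heavylasso package is the authoritative implementation:

```python
import numpy as np
from sklearn.linear_model import Lasso

# MM loop: reweight observations by exp(-r_i^2 / gamma), which downweights
# large residuals, then solve a weighted Lasso subproblem; repeat.
def exponential_lasso(X, y, lam=0.1, gamma=1.0, iters=20):
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        r = y - X @ beta
        w = np.exp(-r ** 2 / gamma)          # small weight = likely outlier
        model = Lasso(alpha=lam, fit_intercept=False)
        model.fit(X, y, sample_weight=w)
        beta = model.coef_
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_t(df=2, size=200)   # heavy-tailed noise
print(exponential_lasso(X, y)[:5])
```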
【17】Robust Bayesian Optimisation with Unbounded Corruptions
标题:无界污染下的鲁棒贝叶斯优化
链接:https://arxiv.org/abs/2511.15315
作者:Abdelhamid Ezzerg,Ilija Bogunovic,Jeremias Knoblauch
摘要:贝叶斯优化非常容易受到极端离群值的影响。现有的可证明鲁棒的方法通常假设累积污染预算有界,这使得它们对哪怕单次幅度足够大的污染也毫无防御能力。为了解决这个问题,我们引入了一个新的对手,其预算仅限制污染发生的频率,而不限制其幅度。随后我们推导出RCGP-UCB,一个将著名的置信上界(UCB)方法与鲁棒共轭高斯过程(RCGP)相结合的算法。我们提出了稳定版和自适应版的RCGP-UCB,并证明它们在存在多达$O(T^{1/2})$和$O(T^{1/3})$次、幅度可能无限大的污染时仍能实现次线性遗憾。这种鲁棒性几乎零成本:在没有离群值时,RCGP-UCB的遗憾界与标准GP-UCB算法相匹配。
摘要:Bayesian Optimization is critically vulnerable to extreme outliers. Existing provably robust methods typically assume a bounded cumulative corruption budget, which makes them defenseless against even a single corruption of sufficient magnitude. To address this, we introduce a new adversary whose budget is only bounded in the frequency of corruptions, not in their magnitude. We then derive RCGP-UCB, an algorithm coupling the famous upper confidence bound (UCB) approach with a Robust Conjugate Gaussian Process (RCGP). We present stable and adaptive versions of RCGP-UCB, and prove that they achieve sublinear regret in the presence of up to $O(T^{1/2})$ and $O(T^{1/3})$ corruptions with possibly infinite magnitude. This robustness comes at near zero cost: without outliers, RCGP-UCB's regret bounds match those of the standard GP-UCB algorithm.
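To make the algorithmic shape concrete, here is a schematic UCB loop with an outlier-downweighting GP surrogate. The robust reweighting below is a simplified stand-in, not the exact RCGP posterior, and the kernel, weight function, and confidence width are illustrative assumptions:

```python
import numpy as np

def rbf(A, B, ls=0.3):
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def robust_gp_posterior(Xo, yo, Xq, noise=0.01, c=1.0):
    # IMQ-style weights: extreme observations get tiny weight, hence
    # inflated per-point noise, hence little influence on the posterior.
    w = 1.0 / np.sqrt(1.0 + ((yo - np.median(yo)) / c) ** 2)
    K = rbf(Xo, Xo) + np.diag(noise / w ** 2)
    Ks = rbf(Xq, Xo)
    mu = Ks @ np.linalg.solve(K, yo)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

def f(x):                                    # toy objective
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(2)
grid = np.linspace(0, 2, 200)
X = list(rng.uniform(0, 2, 3))
y = [f(x) for x in X]
y[0] += 50.0                                 # one unbounded corruption
for t in range(20):
    mu, var = robust_gp_posterior(np.array(X), np.array(y), grid)
    beta = 2.0 + np.log(t + 2)               # illustrative UCB width
    x_next = grid[np.argmax(mu + np.sqrt(beta * var))]
    X.append(x_next)
    y.append(f(x_next))
print("best x found:", X[int(np.argmax([f(x) for x in X]))])
```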
【18】Particle Monte Carlo methods for Lattice Field Theory
标题:格场理论的粒子蒙特卡罗方法
链接:https://arxiv.org/abs/2511.15196
作者:David Yallup
备注:To appear in the NeurIPS 2025 workshop, Frontiers in Probabilistic Inference: Sampling Meets Learning
摘要:格场理论(LFT)中的高维多峰采样问题已经成为机器学习辅助采样方法的重要基准。我们表明,GPU加速的粒子方法(序贯蒙特卡罗(SMC)和嵌套采样)提供了一个强大的经典基线:在标准标量场理论基准上,其样本质量和挂钟时间均匹配或优于最先进的神经采样器,同时还能估计配分函数。这些方法仅用单个数据驱动的协方差进行调参,在不利用特定问题结构的情况下取得了有竞争力的性能,从而抬高了学习型提议分布证明其训练成本合理的门槛。
摘要:High-dimensional multimodal sampling problems from lattice field theory (LFT) have become important benchmarks for machine learning assisted sampling methods. We show that GPU-accelerated particle methods, Sequential Monte Carlo (SMC) and nested sampling, provide a strong classical baseline that matches or outperforms state-of-the-art neural samplers in sample quality and wall-clock time on standard scalar field theory benchmarks, while also estimating the partition function. Using only a single data-driven covariance for tuning, these methods achieve competitive performance without problem-specific structure, raising the bar for when learned proposals justify their training cost.
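As a cartoon of the SMC baseline, the following sketch anneals from a Gaussian prior to a single-site double-well (phi^4-type) action with multinomial resampling and random-walk Metropolis moves, accumulating a log partition function estimate along the way; real LFT benchmarks operate on full lattices, and all tuning constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prior(x):
    return -0.5 * x ** 2                     # standard Gaussian

def log_target(x):
    return -(x ** 2 - 2.0) ** 2              # double-well action

n, betas = 2000, np.linspace(0, 1, 51)       # tempering path pi_b
x = rng.normal(size=n)
log_z = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    # Incremental weights for pi_b = prior^(1-b) * target^b.
    logw = (b1 - b0) * (log_target(x) - log_prior(x))
    log_z += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
    p = np.exp(logw - logw.max())
    p /= p.sum()
    x = x[rng.choice(n, size=n, p=p)]        # multinomial resampling
    for _ in range(5):                       # MH moves at temperature b1
        prop = x + 0.5 * rng.normal(size=n)
        logr = (1 - b1) * (log_prior(prop) - log_prior(x)) \
             + b1 * (log_target(prop) - log_target(x))
        acc = np.log(rng.uniform(size=n)) < logr
        x = np.where(acc, prop, x)
print("log Z estimate:", log_z)
```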
【19】CASPER: Cross-modal Alignment of Spatial and single-cell Profiles for Expression Recovery
标题:CASPER:用于表达恢复的空间与单细胞谱的跨模态对齐
链接:https://arxiv.org/abs/2511.15139
作者:Amit Kumar,Maninder Kaur,Raghvendra Mall,Sukrit Gupta
摘要:空间转录组学能够在其原生组织背景下定位基因表达,但由于实验限制和过高的成本,目前的平台只能测量有限的一组基因。为了克服这一点,计算模型将单细胞RNA测序数据与空间转录组学相结合,以预测未测量的基因。我们提出了CASPER,一个基于交叉注意力的框架,通过利用单细胞RNA测序的质心级表示来预测空间转录组学中未测量的基因表达。我们在四个最先进的空间转录组学/单细胞RNA测序数据集对上,对照四个现有基线模型进行了严格测试。在我们实验的十二个指标中,CASPER在九个指标上有显著改善。这项工作为空间转录组学到单细胞RNA测序的模态转换的后续工作铺平了道路。CASPER的代码可在https://github.com/AI4Med-Lab/CASPER上获得。
摘要:Spatial Transcriptomics enables mapping of gene expression within its native tissue context, but current platforms measure only a limited set of genes due to experimental constraints and excessive costs. To overcome this, computational models integrate Single-Cell RNA Sequencing data with Spatial Transcriptomics to predict unmeasured genes. We propose CASPER, a cross-attention based framework that predicts unmeasured gene expression in Spatial Transcriptomics by leveraging centroid-level representations from Single-Cell RNA Sequencing. We performed rigorous testing over four state-of-the-art Spatial Transcriptomics/Single-Cell RNA Sequencing dataset pairs across four existing baseline models. CASPER shows significant improvement in nine out of the twelve metrics for our experiments. This work paves the way for further work in Spatial Transcriptomics to Single-Cell RNA Sequencing modality translation. The code for CASPER is available at https://github.com/AI4Med-Lab/CASPER.
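A schematic of the cross-attention pattern the abstract describes, in PyTorch: spatial spots as queries, scRNA-seq centroids as keys/values, and a linear head regressing unmeasured genes. Embedding dimensions, the prediction head, and the class name CrossModalImputer are illustrative assumptions, not CASPER's actual architecture:

```python
import torch
import torch.nn as nn

class CrossModalImputer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_unmeasured=500):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_unmeasured)

    def forward(self, spot_emb, centroid_emb):
        # spot_emb: (B, n_spots, d); centroid_emb: (B, n_centroids, d).
        # Spots query the centroid dictionary, then a head predicts
        # the genes that the ST platform did not measure.
        fused, _ = self.attn(query=spot_emb, key=centroid_emb,
                             value=centroid_emb)
        return self.head(fused)              # (B, n_spots, n_unmeasured)

model = CrossModalImputer()
spots = torch.randn(1, 64, 128)              # embedded ST spots
centroids = torch.randn(1, 20, 128)          # embedded scRNA-seq centroids
print(model(spots, centroids).shape)         # torch.Size([1, 64, 500])
```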
【20】Selective Forgetting in Option Calibration: An Operator-Theoretic Gauss-Newton Framework
标题:期权校准中的选择性遗忘:一个算子理论的高斯-牛顿框架
链接:https://arxiv.org/abs/2511.14980
作者:Ahmet Umur Özsoy
摘要:随着市场的演化,期权定价模型的校准通常会反复进行,但现代系统缺乏一个无需完全重新训练即可从已校准模型中删除数据的算子。当报价变得陈旧、损坏或受删除要求约束时,现有的校准管道必须重建整个非线性最小二乘问题,即使只有一小部分数据需要被排除。在这项工作中,我们为参数化期权校准中的选择性遗忘(机器遗忘)引入了一个原则性框架。我们给出了稳定性保证和扰动界,并证明所提出的算子在标准正则性假设下满足局部精确性。
摘要:Calibration of option pricing models is routinely repeated as markets evolve, yet modern systems lack an operator for removing data from a calibrated model without full retraining. When quotes become stale, corrupted, or subject to deletion requirements, existing calibration pipelines must rebuild the entire nonlinear least-squares problem, even if only a small subset of data must be excluded. In this work, we introduce a principled framework for selective forgetting (machine unlearning) in parametric option calibration. We provide stability guarantees, perturbation bounds, and show that the proposed operators satisfy local exactness under standard regularity assumptions.
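One plausible instance of such a forgetting operator, under assumptions that are ours rather than the paper's: if theta was calibrated by nonlinear least squares, a single Gauss-Newton correction theta' = theta + (J^T J - J_S^T J_S)^{-1} J_S^T r_S approximately removes a subset S of quotes, and is exact for linear models:

```python
import numpy as np

def forget_subset(theta, J, r, S):
    """One Gauss-Newton step that unlearns rows S of a least-squares fit.

    J stacks residual Jacobians at theta; r are residuals f(theta) - y.
    """
    J_S, r_S = J[S], r[S]
    H_retained = J.T @ J - J_S.T @ J_S       # GN Hessian without subset S
    return theta + np.linalg.solve(H_retained, J_S.T @ r_S)

# Toy check on a linear model (where GN is exact): fit, forget, compare.
rng = np.random.default_rng(3)
A = rng.normal(size=(100, 4))
theta_true = rng.normal(size=4)
y = A @ theta_true + 0.1 * rng.normal(size=100)
theta_full, *_ = np.linalg.lstsq(A, y, rcond=None)
S = np.arange(10)                            # quotes to delete
theta_unlearned = forget_subset(theta_full, A, A @ theta_full - y, S)
keep = np.arange(10, 100)
theta_retrain, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
print(np.allclose(theta_unlearned, theta_retrain))  # True in the linear case
```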
【21】Implicit Bias of the JKO Scheme
标题:JKO格式的隐式偏置
链接:https://arxiv.org/abs/2511.14827
作者:Peter Halmos,Boris Hanin
摘要:Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $η>0$ a sequence of probability distributions $ρ_k^η$ that approximate to first order in $η$ Wasserstein gradient flow on $J$. But the JKO scheme also has many other remarkable properties not shared by other first order integrators, e.g. it preserves energy dissipation and exhibits unconditional stability for $λ$-geodesically convex functionals $J$. To better understand the JKO scheme we characterize its implicit bias at second order in $η$. We show that $ρ_k^η$ are approximated to order $η^2$ by Wasserstein gradient flow on a \emph{modified} energy \[ J^η(ρ) = J(ρ) - \frac{η}{4}\int_M \Big\lVert \nabla_g \frac{δJ}{δρ} (ρ) \Big\rVert_{2}^{2} \,ρ(dx), \] obtained by subtracting from $J$ the squared metric curvature of $J$ times $η/4$. The JKO scheme therefore adds at second order in $η$ a \textit{deceleration} in directions where the metric curvature of $J$ is rapidly changing. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL-divergence it is the Fisher-Hyvärinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^η$ we study \emph{JKO-Flow}, Wasserstein gradient flow on $J^η$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.
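A tiny numerical illustration of the modified energy in the simplest case J(rho) = integral of V d(rho) (potential energy only, no entropy), where deltaJ/deltarho = V and the correction subtracts (eta/4)|V'|^2, so particles following the modified flow obey dx/dt = -V'(x) + (eta/2) V'(x) V''(x); the 1D quartic potential echoes the abstract's example and the step sizes are illustrative:

```python
import numpy as np

# Gradient flow on the modified energy for V(x) = x^4 / 4:
#   d/dx (eta/4)|V'|^2 = (eta/2) V'(x) V''(x),
# so JKO-Flow adds a deceleration where |V'| changes fast.
V_prime = lambda x: x ** 3
V_second = lambda x: 3 * x ** 2

def flow(x0, eta, dt=1e-3, steps=5000):
    x = np.array(x0, float)
    for _ in range(steps):
        drift = -V_prime(x) + 0.5 * eta * V_prime(x) * V_second(x)
        x = x + dt * drift
    return x

x0 = np.linspace(-2, 2, 9)
print("plain flow:   ", np.round(flow(x0, eta=0.0), 3))
print("JKO-Flow bias:", np.round(flow(x0, eta=0.1), 3))
```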
【22】Fully Differentiable dMRI Streamline Propagation in PyTorch
标题:PyTorch中的完全可区分dMRI流线传播
链接:https://arxiv.org/abs/2511.14807
作者:Jongyeon Yoon,Elyssa M. McMaster,Michael E. Kim,Gaurav Rudravaram,Kurt G. Schilling,Bennett A. Landman,Daniel Moyer
备注:9 pages, 4 figures. Accepted to SPIE Medical Imaging 2026: Image Processing
摘要:扩散MRI(dMRI)提供了一种独特的手段来探测活组织的微观结构架构,促进了诸如脑连接分析、跨多种条件建模和宏观结构特征估计等应用。纤维束成像出现于20世纪末,并在21世纪初加速发展,是一种使用dMRI可视化大脑中白质通路的技术。大多数扩散纤维束成像方法依赖于程序化的流线传播器或全局能量最小化方法。尽管深度学习的最新进展已经使以前具有挑战性的任务成为可能,但现有的纤维束成像方法通常是不可微的,限制了它们在端到端学习框架中的集成。虽然在可微框架中表示流线方面已经取得了进展,但没有现有的方法提供完全可微的传播。在这项工作中,我们提出了一个完全可微的解决方案,保持与领先流线算法的数值保真度。关键在于我们用PyTorch设计的流线传播器没有任何阻断梯度流的组件,因此是完全可微的。我们表明,我们的方法与标准传播器相匹配,同时保持可微。通过将流线传播转换为可微的PyTorch框架,我们能够将纤维束成像更深入地集成到深度学习工作流中,为一类新的宏观结构推理奠定基础,这种推理不仅在计算上稳健,而且在科学上严谨。
摘要:Diffusion MRI (dMRI) provides a distinctive means to probe the microstructural architecture of living tissue, facilitating applications such as brain connectivity analysis, modeling across multiple conditions, and the estimation of macrostructural features. Tractography, which emerged in the final years of the 20th century and accelerated in the early 21st century, is a technique for visualizing white matter pathways in the brain using dMRI. Most diffusion tractography methods rely on procedural streamline propagators or global energy minimization methods. Although recent advancements in deep learning have enabled tasks that were previously challenging, existing tractography approaches are often non-differentiable, limiting their integration in end-to-end learning frameworks. While progress has been made in representing streamlines in differentiable frameworks, no existing method offers fully differentiable propagation. In this work, we propose a fully differentiable solution that retains numerical fidelity with a leading streamline algorithm. The key is that our PyTorch-engineered streamline propagator has no components that block gradient flow, making it fully differentiable. We show that our method matches standard propagators while remaining differentiable. By translating streamline propagation into a differentiable PyTorch framework, we enable deeper integration of tractography into deep learning workflows, laying the foundation for a new category of macrostructural reasoning that is not only computationally robust but also scientifically rigorous.
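A minimal sketch of what a fully differentiable propagator looks like, assuming plain Euler steps through a peak-direction volume with trilinear interpolation via grid_sample; this is not the paper's exact propagator, but it shows that gradients can flow from streamline coordinates back to the vector field:

```python
import torch
import torch.nn.functional as F

def propagate(field, seeds, step=0.5, n_steps=10):
    """field: (1, 3, D, H, W) direction field; seeds: (N, 3) voxel (x, y, z)."""
    D, H, W = field.shape[2:]
    size = torch.tensor([W - 1, H - 1, D - 1], dtype=seeds.dtype)
    pts, path = seeds, [seeds]
    for _ in range(n_steps):
        # grid_sample expects (x, y, z) coordinates normalized to [-1, 1].
        grid = (2 * pts / size - 1).view(1, -1, 1, 1, 3)
        d = F.grid_sample(field, grid, align_corners=True)  # (1, 3, N, 1, 1)
        d = d.view(3, -1).T
        d = d / (d.norm(dim=1, keepdim=True) + 1e-8)        # unit direction
        pts = pts + step * d                                # Euler step
        path.append(pts)
    return torch.stack(path, dim=1)          # (N, n_steps + 1, 3)

field = torch.zeros(1, 3, 16, 16, 16, requires_grad=True)
with torch.no_grad():
    field[0, 0] = 1.0                        # constant +x direction field
seeds = torch.tensor([[2.0, 8.0, 8.0]])
tracks = propagate(field, seeds)
tracks.sum().backward()                      # gradients reach the vector field
print(tracks.shape, field.grad.abs().sum().item() > 0)
```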
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递