Machine Learning Academic Digest [12.1]

cs.LG: 227 papers today

Large Model Related (25 papers)

【1】MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning?
Link: https://arxiv.org/abs/2511.23112

Authors: Yuandong Wang, Yao Cui, Yuxin Zhao, Zhen Yang, Yangfu Zhu, Zhenzhou Shao
Comments: 32 pages, 15 figures, 9 tables, includes appendix. Project page: https://cnu-bot-group.github.io/MathSight/
Abstract: Recent advances in Vision-Language Models (VLMs) have achieved impressive progress in multimodal mathematical reasoning. Yet, how much visual information truly contributes to reasoning remains unclear. Existing benchmarks report strong overall performance but seldom isolate the role of the image modality, leaving open whether VLMs genuinely leverage visual understanding or merely depend on linguistic priors. To address this, we present MathSight, a university-level multimodal mathematical reasoning benchmark designed to disentangle and quantify the effect of visual input. Each problem includes multiple visual variants (original, hand-drawn, and photo-captured) and a text-only condition for controlled comparison. Experiments on state-of-the-art VLMs reveal a consistent trend: the contribution of visual information diminishes with increasing problem difficulty. Remarkably, Qwen3-VL without any image input surpasses both its multimodal variants and GPT-5, underscoring the need for benchmarks like MathSight to advance genuine vision-grounded reasoning in future models.

【2】Experts are all you need: A Composable Framework for Large Language Model Inference
Link: https://arxiv.org/abs/2511.22955

Authors: Shrihari Sridharan, Sourjya Roy, Anand Raghunathan, Kaushik Roy
Abstract: Large Language Models (LLMs) have achieved state-of-the-art accuracy in a variety of natural language processing (NLP) tasks. However, this success comes at the cost of increased model size, which adds computational burden. Mixture-of-Experts (MoE) models overcome this bottleneck by decoupling model capacity from computation, activating only a subset of parameters or "experts". However, these models require joint pretraining of the experts along with the router and do not model multi-step reasoning. In contrast, multi-agent frameworks improve reasoning by decomposing complex problems into modular subtasks. However, these frameworks rely on sequential "plan-act-observe" loops, which introduce significant latency. Our work, Comp-LLM, addresses these challenges by introducing a composable inference framework that enables cross-expert collaboration via an explicit sub-query dependency graph. Comp-LLM consists of three components: (1) a Sub-query Generator that decomposes an input query, assigns each sub-query to an appropriate expert using embedding similarity, and constructs a dependency graph; (2) a Query Executor that processes nodes in the graph and identifies opportunities for parallelism based on dependencies and resource constraints; and (3) a Response Aggregator that synthesizes intermediate expert responses into a coherent final answer. Across several benchmarks, Comp-LLM achieves up to 11.01% accuracy improvement over monolithic LLMs of similar size, while offering a 1.67x-3.56x reduction in model size with no significant degradation relative to the largest model in its family. Additionally, Comp-LLM provides a 1.1x-1.7x latency improvement compared to sequential sub-query processing.
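Illustrative sketch (not from the paper): the Query Executor described above runs sub-queries as soon as their dependencies are satisfied, subject to a resource limit. The Python below shows one plausible way to group a dependency graph into parallel "waves". The function name `schedule_waves`, the `max_parallel` cap, and the toy graph are assumptions made for illustration; the abstract does not specify the actual scheduling policy.

```python
def schedule_waves(deps, max_parallel):
    """Group sub-queries into waves that can run concurrently.

    deps maps each sub-query id to the set of ids it depends on.
    max_parallel is a hypothetical stand-in for a resource constraint.
    """
    remaining = {node: set(parents) for node, parents in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A node is ready once all of its dependencies have completed.
        ready = sorted(n for n, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        wave = ready[:max_parallel]  # respect the resource cap
        waves.append(wave)
        for node in wave:
            del remaining[node]
        done.update(wave)
    return waves


# Toy graph: q3 needs the answers to q1 and q2; q4 needs q3.
deps = {"q1": set(), "q2": set(), "q3": {"q1", "q2"}, "q4": {"q3"}}
parallel_plan = schedule_waves(deps, max_parallel=2)
sequential_plan = schedule_waves(deps, max_parallel=1)
```

With two slots the independent sub-queries q1 and q2 share a wave, so the plan needs three steps instead of four; that gap is the kind of latency saving the abstract attributes to parallel execution.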

【3】Language-conditioned world model improves policy generalization by reading environmental descriptions
Link: https://arxiv.org/abs/2511.22904

Authors: Anh Nguyen, Stefan Lee
Comments: NeurIPS 2025. Workshop: LAW 2025: Bridging Language, Agent, and World Models
Abstract: To interact effectively with humans in the real world, it is important for agents to understand language that describes the dynamics of the environment, that is, how the environment behaves, rather than just task instructions specifying "what to do". Understanding this dynamics-descriptive language is important for human-agent interaction and agent behavior. Recent work addresses this problem using a model-based approach: language is incorporated into a world model, which is then used to learn a behavior policy. However, these existing methods either do not demonstrate policy generalization to unseen games or rely on limiting assumptions, for instance that the latency induced by inference-time planning is tolerable for the target task or that expert demonstrations are available. Expanding on this line of research, we focus on improving policy generalization from a language-conditioned world model while dropping these assumptions. We propose a model-based reinforcement learning approach, where a language-conditioned world model is trained through interaction with the environment, and a policy is learned from this model, without planning or expert demonstrations. Our method proposes Language-aware Encoder for Dreamer World Model (LED-WM), built on top of DreamerV3. LED-WM features an observation encoder that uses an attention mechanism to explicitly ground language descriptions to entities in the observation. We show that policies trained with LED-WM generalize more effectively to unseen games described by novel dynamics and language compared to other baselines in several settings in two environments: MESSENGER and MESSENGER-WM. To highlight how the policy can leverage the trained world model before real-world deployment, we demonstrate that the policy can be improved through fine-tuning on synthetic test trajectories generated by the world model.

【4】ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
Link: https://arxiv.org/abs/2511.22891

Authors: Kumar Tanmay, Kriti Aggarwal, Paul Pu Liang, Subhabrata Mukherjee
Abstract: Large Reasoning Models (LRMs) achieve strong performance in mathematics, code generation, and task planning, but their reliance on long chains of verbose "thinking" tokens leads to high latency, redundancy, and incoherent reasoning paths. Inspired by the Language of Thought Hypothesis, which posits that human reasoning operates over a symbolic, compositional mental language called Mentalese, we introduce a framework that trains models to reason in a similarly compact style. Mentalese encodes abstract reasoning as ultra-compressed, structured tokens, enabling models to solve complex problems with far fewer steps. To improve both efficiency and accuracy, we propose Shorter Length Preference Optimization (SLPO), a reinforcement learning method that rewards concise solutions that stay correct, while still allowing longer reasoning when needed. Applied to Mentalese-aligned models, SLPO yields significantly higher compression rates by enabling concise reasoning that preserves the benefits of detailed thinking without the computational overhead. Across benchmarks including AIME 2024 and 2025, MinervaMath, OlympiadBench, Math500, and AMC, our ORION models produce reasoning traces with 4-16x fewer tokens, achieve up to 5x lower inference latency, and reduce training costs by 7-9x relative to the DeepSeek R1 Distilled model, while maintaining 90-98% of its accuracy. ORION also surpasses Claude and ChatGPT-4o by up to 5% in accuracy while maintaining 2x compression. These results show that Mentalese-style compressed reasoning offers a step toward human-like cognitive efficiency, enabling real-time, cost-effective reasoning without sacrificing accuracy.

【5】Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems
Link: https://arxiv.org/abs/2511.22880

Authors: Shashwat Jaiswal, Shrikara Arun, Anjaly Parayil, Ankur Mallick, Spyros Mastorakis, Alind Khare, Chloi Alverti, Renee St Amant, Chetan Bansal, Victor Rühle, Josep Torrellas
Abstract: Low-Rank Adaptation (LoRA) has become the de facto method for parameter-efficient fine-tuning of large language models (LLMs), enabling rapid adaptation to diverse domains. In production, LoRA-based models are served at scale, creating multi-tenant environments with hundreds of adapters sharing a base model. However, state-of-the-art serving systems co-batch heterogeneous adapters without accounting for rank (size) variability, leading to severe performance skew, which ultimately requires adding more GPUs to satisfy service-level objectives (SLOs). Existing optimizations, focused on loading, caching, and kernel execution, ignore this heterogeneity, leaving GPU resources underutilized. We present LoRAServe, a workload-aware dynamic adapter placement and routing framework designed to tame rank diversity in LoRA serving. By dynamically rebalancing adapters across GPUs and leveraging GPU Direct RDMA for remote access, LoRAServe maximizes throughput and minimizes tail latency under real-world workload drift. Evaluations on production traces from Company X show that LoRAServe achieves up to 2x higher throughput and up to 9x lower time to first token (TTFT), while using up to 50% fewer GPUs under SLO constraints, compared to state-of-the-art systems.
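Illustrative sketch (not from the paper): the abstract argues that adapter placement should account for rank variability. The Python below shows a generic greedy load balancer that weights each adapter by rank times request rate and assigns it to the least-loaded GPU. The cost model, the function name `place_adapters`, and the sample workload are all assumptions made for illustration; LoRAServe's actual placement and routing algorithm is not described in the abstract.

```python
import heapq


def place_adapters(adapters, num_gpus):
    """Greedy rank-aware placement (hypothetical cost = rank * request rate).

    adapters is a list of (name, rank, requests_per_sec) tuples.
    Returns a dict mapping GPU index to the adapter names placed on it.
    """
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    # Place the most expensive adapters first, each on the least-loaded GPU.
    by_cost = sorted(adapters, key=lambda a: a[1] * a[2], reverse=True)
    for name, rank, rate in by_cost:
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(name)
        heapq.heappush(heap, (load + rank * rate, gpu))
    return placement


workload = [("a", 64, 10), ("b", 8, 10), ("c", 8, 10), ("d", 16, 20)]
placement = place_adapters(workload, num_gpus=2)
```

In this toy workload the single rank-64 adapter gets a GPU to itself while the three lighter adapters share the other, which avoids co-batching a large-rank adapter with small ones.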

【6】Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization
Link: https://arxiv.org/abs/2511.22818

Authors: Vivek Kumar, Pushpraj Singh Rajawat, Eirini Ntoutsi
Abstract: Recent advancements in large language models (LLMs) have shown their potential across both general and domain-specific tasks. However, there is a growing concern regarding their lack of sensitivity, factual incorrectness in responses, inconsistent expressions of empathy, bias, hallucinations, and overall inability to capture the depth and complexity of human understanding, especially in low-resource and sensitive domains such as psychology. To address these challenges, our study employs a mixed-methods approach to evaluate the efficacy of LLMs in psychotherapy. We use LLMs to generate precise summaries of motivational interviewing (MI) dialogues and design a two-stage annotation scheme based on key components of the Motivational Interviewing Treatment Integrity (MITI) framework, namely evocation, collaboration, autonomy, direction, empathy, and a non-judgmental attitude. Using expert-annotated MI dialogues as ground truth, we formulate multi-class classification tasks to assess model performance under progressive prompting techniques, incorporating one-shot and few-shot prompting. Our results offer insights into LLMs' capacity for understanding complex psychological constructs and highlight best practices to mitigate "semantic drift" in therapeutic settings. Our work not only contributes a high-quality annotated dataset to the MI community, addressing data scarcity in low-resource domains, but also offers critical insights for using LLMs for precise contextual interpretation in complex behavioral therapy.

【7】Automated Design Optimization via Strategic Search with Large Language Models
Link: https://arxiv.org/abs/2511.22651

Authors: Anthony Carreon, Vansh Sharma, Venkat Raman
Comments: 14 pages, 5 tables, 7 figures, preprint
Abstract: Traditional optimization methods excel in well-defined search spaces but struggle with design problems where transformations and design parameters are difficult to define. Large language models (LLMs) offer a promising alternative by dynamically interpreting design spaces and leveraging encoded domain knowledge. To this end, we introduce AUTO, an LLM agent framework that treats design optimization as a gradient-free search problem guided by strategic LLM reasoning. The framework employs two collaborative agents: a Strategist that selects between exploration and exploitation strategies, and an Implementor that executes detailed designs. Applied to GPU code optimization, a domain critical to fields from machine learning to scientific computing, AUTO generates solutions competitive with expert implementations for chemical kinetics integration and dense matrix multiplication. The framework achieves 50-70% search efficiency relative to Bayesian optimization methodologies. It completes optimizations in approximately 8 hours at an estimated cost of up to $159 per run, compared to an estimated cost of up to $480 with median-wage software developers. These findings open the door to automating design optimization in ill-defined search spaces with limited prior information.

【8】DisCEdge: Distributed Context Management for Large Language Models at the Edge
Link: https://arxiv.org/abs/2511.22599

Authors: Mohammadreza Malekabbasi, Minghe Wang, David Bermbach
Comments: Author version
Abstract: Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., sessions, preferences) across geo-distributed edge nodes challenging. Existing solutions, such as client-side context storage, often introduce network latency and bandwidth overhead, undermining the advantages of edge deployment. We propose DisCEdge, a distributed context management system that stores and replicates user context in tokenized form across edge nodes. By maintaining context as token sequences rather than raw text, our system avoids redundant computation and enables efficient data replication. We implement and evaluate an open-source prototype in a realistic edge environment with commodity hardware. We show that DisCEdge improves median response times by up to 14.46% and lowers median inter-node synchronization overhead by up to 15% compared to a raw-text-based system. It also reduces client request sizes by a median of 90% compared to client-side context management, while guaranteeing data consistency.

【9】LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system
Link: https://arxiv.org/abs/2511.22598

Authors: Huanyu Li, Zongyuan Li, Wei Huang, Xian Guo
Comments: 8 pages, 5 figures, ICICN 2025
Abstract: Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited to one-step interactions. Some existing sequential decision-making environments, such as TextStarCraftII and LLM-PySC2, are too complicated and require hours of interaction to complete a game. In this paper, we introduce LLM-Cave, a benchmark and light environment for LLM reasoning and decision-making systems. This environment is a classic instance from the era of symbolic artificial intelligence, in which the agent explores the environment and avoids potential losses by reasoning about nearby dangers using partially observable state information. In the experiments, we evaluate the sequential reasoning ability, decision-making performance, and computational efficiency of mainstream LLMs such as GPT-4o-mini, o1-mini, and DeepSeek-R1. Experiments show that while DeepSeek-R1 achieved the highest success rate on complex reasoning tasks, smaller models like 4o-mini significantly narrowed the performance gap by employing Chain of Speculation and Planner-Critic strategies, at the expense of reduced computational efficiency. This indicates that structured, multi-step reasoning combined with an LLM-based feedback mechanism can substantially enhance an LLM's decision-making capabilities, providing a promising direction for improving reasoning in weaker models and suggesting a new reasoning-centered benchmark for LLM assessment. Our code is open-sourced at https://github.com/puleya1277/CaveEnv.

【10】GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
Link: https://arxiv.org/abs/2511.22441

Authors: Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang
Comments: 15 pages with 7 figures and 12 tables
Abstract: Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty and is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LLMs, and even at finer-grained levels it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.

【11】SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning
Link: https://arxiv.org/abs/2511.22367

Authors: Hugo Hazard, Zafeirios Fountas, Martin A. Benfeghoul, Adnan Oomerjee, Jun Wang, Haitham Bou-Ammar
Abstract: Continual learning, the ability to adapt to a sequence of tasks without forgetting previously acquired knowledge, remains a major challenge in machine learning and a key gap between artificial and human intelligence. While regularisation and replay perform well in vision, they lag behind multi-task learning for large language models (LLMs), especially at scale with many tasks. We revisit replay and argue that two failure modes drive this gap: selection (what to rehearse) and integration (how to consolidate new knowledge). To address selection, we propose Surprise-prioritised Replay (SuRe), a simple, architecture-agnostic rule that ranks and stores the most surprising (high Negative Log-Likelihood) sequences. SuRe achieves state-of-the-art performance in the Large Number of Tasks (LNT) setting and delivers the best overall average across both Standard CL and LNT benchmarks. To address integration, we add a dual-learner design with fast and slow LoRA adapters merged via an exponential moving average (EMA), enabling rapid adaptation while stabilising long-term knowledge. Combining SuRe with the dual learner yields further gains, including improvements of up to +5 accuracy points on LNT over prior SOTA. Ablation studies confirm that our proposed method remains robust under reduced replay frequency and small buffer size, demonstrating both effectiveness and sample efficiency. Taken together, our results establish replay as a strong baseline for continual LLM fine-tuning and demonstrate that surprise-based selection and slow-weight consolidation are complementary components for mitigating catastrophic forgetting.
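Illustrative sketch (not from the paper): the two mechanisms named above, keeping the highest-NLL sequences in a bounded replay buffer and merging fast weights into slow weights with an EMA, can be written in a few lines of Python. The class name `SurpriseBuffer`, the flat-list weight representation, and the sample values are assumptions made for illustration; how the paper scores sequences (for example, mean versus summed NLL) and when it applies the EMA are not stated in the abstract.

```python
import heapq


class SurpriseBuffer:
    """Bounded buffer that keeps the sequences with the highest NLL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []    # min-heap of (nll, insertion order, sequence)
        self._counter = 0  # tie-breaker so sequences are never compared

    def add(self, sequence, nll):
        item = (nll, self._counter, sequence)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif nll > self._heap[0][0]:
            # Evict the least surprising stored sequence.
            heapq.heapreplace(self._heap, item)

    def contents(self):
        """Stored sequences, most surprising first."""
        return [seq for _, _, seq in sorted(self._heap, reverse=True)]


def ema_update(slow, fast, decay):
    """Move the slow weights toward the fast weights: slow = decay*slow + (1-decay)*fast."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(slow, fast)]


buffer = SurpriseBuffer(capacity=2)
for seq, nll in [("a", 0.5), ("b", 2.0), ("c", 1.0), ("d", 0.1)]:
    buffer.add(seq, nll)
kept = buffer.contents()
merged = ema_update(slow=[1.0, 0.0], fast=[0.0, 1.0], decay=0.5)
```

A decay close to 1 makes the slow adapter change gradually, which matches the abstract's stated aim of stabilising long-term knowledge while the fast adapter adapts quickly.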

【12】SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Link: https://arxiv.org/abs/2511.22316

Authors: Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun, Ye Zhong, Wei Li, Xuan Xie, Qingbo Wu, Jie Yu
Comments: 9 pages, 4 figures
Abstract: Quantization of Large Language Models (LLMs) facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization truncation lead to serious convergence pathology. This prolongs quantization time and degrades LLMs' task performance. Our studies confirm that the Straight-Through Estimator (STE) on Stiefel manifolds introduces non-smoothness and gradient noise, obstructing optimization convergence and blocking high-fidelity quantized LLM development despite extensive training. To tackle the above limitations, we propose SingleQuant, a single-pass quantization framework that decouples from quantization truncation, thereby eliminating the above non-smoothness and gradient noise factors. Specifically, SingleQuant constructs Alignment Rotation Transformation (ART) and Uniformity Rotation Transformation (URT) targeting distinct activation outliers, where ART achieves smoothing of outlier values via closed-form optimal rotations, and URT reshapes distributions through geometric mapping. Both matrices comprise strictly formulated Givens rotations with predetermined dimensions and rotation angles, enabling promising LLM task performance within a short time. Experimental results demonstrate SingleQuant's superiority over the selected baselines across diverse tasks on 7B-70B LLMs. To be more precise, SingleQuant enables quantized LLMs to achieve higher task performance while necessitating less time for quantization. For example, when quantizing LLaMA-2-13B, SingleQuant achieves a 1,400x quantization speedup and improves average task performance by 0.57% compared to the selected best baseline.
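Illustrative sketch (not from the paper): a Givens rotation mixes two coordinates of a vector while preserving its norm, which is why rotations can spread an outlier's magnitude across channels before quantization. The Python below applies a single Givens rotation whose angle is chosen in closed form so that the two rotated coordinates become equal. The helper names and the toy activation vector are assumptions made for illustration; this is not the paper's ART or URT construction, whose dimensions and angles are not given in the abstract.

```python
import math


def givens_rotate(x, i, j, theta):
    """Rotate coordinates i and j of x by angle theta (counterclockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    y = list(x)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y


def equalizing_angle(xi, xj):
    """Closed-form angle that turns the pair (xi, xj) to point along 45 degrees,
    so both rotated coordinates end up with the same value."""
    return math.pi / 4 - math.atan2(xj, xi)


activations = [8.0, 0.0, 1.0]  # toy vector with one outlier channel
theta = equalizing_angle(activations[0], activations[1])
smoothed = givens_rotate(activations, 0, 1, theta)
peak_before = max(abs(v) for v in activations)
peak_after = max(abs(v) for v in smoothed)
```

The peak magnitude drops from 8 to about 5.66 while the squared norm stays at 65, so a uniform quantizer needs a smaller range to cover the rotated vector.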

【13】Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation
Link: https://arxiv.org/abs/2511.22311

Authors: Fiona Y. Wang, Di Sheng Lee, David L. Kaplan, Markus J. Buehler
Abstract: Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequence space and the complex coupling between sequence, structure, and function. Current state-of-the-art generative methods, such as protein language models (PLMs) and diffusion-based architectures, often require extensive fine-tuning, task-specific data, or model reconfiguration to support objective-directed design, thereby limiting their flexibility and scalability. To overcome these limitations, we present a decentralized, agent-based framework inspired by swarm intelligence for de novo protein design. In this approach, multiple large language model (LLM) agents operate in parallel, each assigned to a specific residue position. These agents iteratively propose context-aware mutations by integrating design objectives, local neighborhood interactions, and memory and feedback from previous iterations. This position-wise, decentralized coordination enables emergent design of diverse, well-defined sequences without reliance on motif scaffolds or multiple sequence alignments, validated with experiments on proteins with alpha helix and coil structures. Through analyses of residue conservation, structure-based metrics, and sequence convergence and embeddings, we demonstrate that the framework exhibits emergent behaviors and effective navigation of the protein fitness landscape. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training, offering a generalizable and adaptable solution for protein design. Beyond proteins, the approach lays the groundwork for collective LLM-driven design across biomolecular systems and other scientific discovery tasks.

【14】Enhanced Conditional Generation of Double Perovskite by Knowledge-Guided Language Model Feedback
Link: https://arxiv.org/abs/2511.22307

Authors: Inhyo Lee, Junhyeong Lee, Jongwon Park, KyungTae Lim, Seunghwa Ryu
Abstract: Double perovskites (DPs) are promising candidates for sustainable energy technologies due to their compositional tunability and compatibility with low-energy fabrication, yet their vast design space poses a major challenge for conditional materials discovery. This work introduces a multi-agent, text gradient-driven framework that performs DP composition generation under natural-language conditions by integrating three complementary feedback sources: LLM-based self-evaluation, DP-specific domain knowledge-informed feedback, and ML surrogate-based feedback. Analogous to how knowledge-informed machine learning improves the reliability of conventional data-driven models, our framework incorporates domain-informed text gradients to guide the generative process toward physically meaningful regions of the DP composition space. Systematic comparison of three incremental configurations, (i) pure LLM generation, (ii) LLM generation with LLM reasoning-based feedback, and (iii) LLM generation with domain knowledge-guided feedback, shows that iterative guidance from knowledge-informed gradients improves stability-condition satisfaction without additional training data, achieving over 98% compositional validity and up to 54% stable or metastable candidates, surpassing both the LLM-only baseline (43%) and prior GAN-based results (27%). Analyses of ML-based gradients further reveal that they enhance performance in in-distribution (ID) regions but become unreliable in out-of-distribution (OOD) regimes. Overall, this work provides the first systematic analysis of multi-agent, knowledge-guided text gradients for DP discovery and establishes a generalizable blueprint for MAS-driven generative materials design aimed at advancing sustainable technologies.

【15】TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation
Link: https://arxiv.org/abs/2511.22277

Authors: Henrijs Princis, Arindam Sharma, Cristina David
Abstract: Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and flexible framework to date for exploring decoding strategies, constraints, and hyperparameters in LLMs, and use it in code generation to enforce correctness and structure during decoding rather than relying on prompt engineering. TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions (such as style, syntax, and execution) are treated as first-class, optimisable components. This design enables systematic exploration and automatic tuning of decoding configurations using standard optimisation techniques. Experiments on the MBPP (Python) and SQL-Spider benchmarks show that TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek, often outperforming their unconstrained baselines by considerable margins.
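The idea of decoding as a tree search with pluggable constraints can be illustrated with a small sketch. This is not TreeCoder's API: the function names, the beam-search strategy, and the toy "model" below are all assumptions chosen for illustration. A candidate is a list of tokens, the constraint function prunes invalid branches during expansion, and the beam width is an ordinary hyperparameter that could be tuned.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[float, List[str]]  # (cumulative log-score, tokens so far)


def constrained_beam_search(
    propose: Callable[[List[str]], List[Tuple[str, float]]],
    constraint: Callable[[List[str]], bool],
    is_complete: Callable[[List[str]], bool],
    beam_width: int = 2,
    max_steps: int = 6,
) -> List[str]:
    """Expand a tree of partial programs, pruning branches that violate the constraint."""
    beam: List[Candidate] = [(0.0, [])]
    for _ in range(max_steps):
        expanded: List[Candidate] = []
        for score, tokens in beam:
            if is_complete(tokens):
                expanded.append((score, tokens))  # carry finished programs forward
                continue
            for token, logp in propose(tokens):
                child = tokens + [token]
                if constraint(child):  # the constraint acts as a pruning rule
                    expanded.append((score + logp, child))
        if not expanded:
            break
        beam = sorted(expanded, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(is_complete(tokens) for _, tokens in beam):
            break
    return beam[0][1]


# Hypothetical stand-ins: a fixed "model" that always prefers ")" and a
# syntax constraint that keeps parentheses balanced with nesting depth <= 2.
def toy_propose(tokens: List[str]) -> List[Tuple[str, float]]:
    return [(")", -0.1), ("(", -0.5), ("x", -1.0)]


def balanced_prefix(tokens: List[str]) -> bool:
    depth = 0
    for tok in tokens:
        depth += {"(": 1, ")": -1}.get(tok, 0)
        if depth < 0 or depth > 2:
            return False
    return True


def is_complete_program(tokens: List[str]) -> bool:
    return bool(tokens) and tokens[-1] == ")" and tokens.count("(") == tokens.count(")")


best = constrained_beam_search(toy_propose, balanced_prefix, is_complete_program)
print(best)
```

Unconstrained greedy decoding with this toy model would emit ")" first, which is syntactically invalid; the constraint removes that branch before it is ever scored against the others.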

【16】TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices
Link: https://arxiv.org/abs/2511.22138

Authors: Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Khalil Shujaee, Roy George
Comments: 8 pages, 3 figures, 4 tables
Abstract: This paper investigates the effectiveness of small language models (SLMs) for agentic tasks (function/tool/API calling) with a focus on running agents on edge devices without reliance on cloud infrastructure. We evaluate SLMs using the Berkeley Function Calling Leaderboard (BFCL) framework and describe parameter-driven optimization strategies that include supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), reinforcement learning (RL)-based optimization, preference alignment via Direct Preference Optimization (DPO), and hybrid methods. We report results for models including TinyAgent, TinyLlama, Qwen, and xLAM across BFCL categories (simple, multiple, parallel, parallel-multiple, and relevance detection), both in live and non-live settings, and in multi-turn evaluations. We additionally detail a DPO training pipeline constructed from AgentBank data (e.g., ALFRED), including our conversion of SFT data to chosen-rejected pairs using TinyLlama responses as rejected outputs and manual validation. Our results demonstrate clear accuracy differences across model scales: medium-sized models (1-3B parameters) significantly outperform ultra-compact models (<1B parameters), achieving up to 65.74% overall accuracy and 55.62% multi-turn accuracy with hybrid optimization. This study highlights the importance of hybrid optimization strategies that enable small language models to deliver accurate, efficient, and stable agentic AI on edge devices, making privacy-preserving, low-latency autonomous agents practical beyond the cloud.
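The conversion of SFT data into chosen-rejected pairs might look roughly like the sketch below. The record layout (`prompt`, `response`), the `weak_model` callable standing in for TinyLlama, and the rule that drops identical responses are all assumptions made for illustration; the abstract also mentions manual validation, which is not modelled here.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    sft_records: List[Dict[str, str]],
    weak_model: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each reference response (chosen) with a weaker model's response (rejected)."""
    pairs = []
    for record in sft_records:
        rejected = weak_model(record["prompt"])
        if rejected.strip() == record["response"].strip():
            continue  # identical answers carry no preference signal
        pairs.append(
            {
                "prompt": record["prompt"],
                "chosen": record["response"],
                "rejected": rejected,
            }
        )
    return pairs


# Toy SFT data and a hypothetical weak model that always answers the same way.
sft_data = [
    {"prompt": "Turn on the kitchen light.", "response": 'set_light(room="kitchen", on=True)'},
    {"prompt": "What time is it?", "response": "get_time()"},
]


def toy_weak_model(prompt: str) -> str:
    return "get_time()"


pairs = build_preference_pairs(sft_data, toy_weak_model)
print(pairs)
```

Only the first prompt yields a pair here, because the weak model happens to match the reference answer on the second one.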

【17】Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Link: https://arxiv.org/abs/2511.22099

Authors: Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li
Comments: 14 pages, 10 figures
Abstract: Large language models (LLMs) have driven major advances across domains, yet their massive size hinders deployment in resource-constrained settings. Model compression addresses this challenge, with low-rank factorization emerging as a particularly effective method for reducing size, memory, and computation while maintaining accuracy. However, while these compressed models boast good performance and system-level advantages, their trustworthiness implications remain poorly understood. In this paper, we present the first comprehensive study of how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethical alignment. We evaluate multiple LLMs of different sizes and variants compressed with diverse low-rank algorithms, revealing key insights: (1) low-rank compression preserves or improves training data privacy but weakens PII protection during conversation; (2) adversarial robustness is generally preserved and often enhanced, even under deep compression; (3) ethical reasoning degrades in zero-shot settings but partially recovers with few-shot prompting; (4) fairness declines under compression. Beyond compression, we investigate how model scale and fine-tuning affect trustworthiness, as both are important in low-rank methods. To guide trustworthy compression strategies, we end our paper with a gradient-based attribution analysis to identify which layers in LLMs contribute most to adversarial robustness.
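Why low-rank factorization reduces size can be seen from simple parameter counting: replacing a dense m x n weight matrix with the product of an m x r and an r x n factor stores r(m + n) values instead of mn. The layer shape and rank below are illustrative assumptions, not values from the paper.

```python
from typing import Tuple


def low_rank_param_counts(m: int, n: int, r: int) -> Tuple[int, int, float]:
    """Return (dense params, factored params, factored/dense ratio) for W ~ A @ B."""
    dense = m * n           # W has shape (m, n)
    factored = r * (m + n)  # A has shape (m, r), B has shape (r, n)
    return dense, factored, factored / dense


# Hypothetical 4096 x 4096 projection compressed to rank 512.
dense, factored, ratio = low_rank_param_counts(4096, 4096, 512)
print(dense, factored, ratio)
```

The factored form only saves parameters when r is below mn / (m + n), which is 2048 for this square example.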

【18】AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models
Link: https://arxiv.org/abs/2511.22016

Authors: Yann Le Beux, Oluchi Audu, Oche D. Ankeli, Dhananjay Balakrishnan, Melissah Weya, Marie D. Ralaiarinosy, Ignatius Ezeani
Abstract: Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and enabling harmful stereotypes in applications across various domains. To address this gap, we introduce AfriStereo, the first open-source African stereotype dataset and evaluation framework grounded in local socio-cultural contexts. Through community-engaged efforts across Senegal, Kenya, and Nigeria, we collected 1,163 stereotypes spanning gender, ethnicity, religion, age, and profession. Using few-shot prompting with human-in-the-loop validation, we augmented the dataset to over 5,000 stereotype-antistereotype pairs. Entries were validated through semantic clustering and manual annotation by culturally informed reviewers. Preliminary evaluation of language models reveals that nine of eleven models exhibit statistically significant bias, with Bias Preference Ratios (BPR) ranging from 0.63 to 0.78 (p <= 0.05), indicating systematic preferences for stereotypes over antistereotypes, particularly across age, profession, and gender dimensions. Domain-specific models appeared to show weaker bias in our setup, suggesting task-specific training may mitigate some associations. Looking ahead, AfriStereo opens pathways for future research on culturally grounded bias evaluation and mitigation, offering key methodologies for the AI community on building more equitable, context-aware, and globally inclusive NLP technologies.
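The abstract does not define the Bias Preference Ratio, so the sketch below assumes one plausible reading: the fraction of stereotype-antistereotype pairs for which a model scores the stereotype higher, tested against the unbiased value of 0.5 with a one-sided exact binomial test. The scores are made-up numbers, and both function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def bias_preference_ratio(pair_scores: List[Tuple[float, float]]) -> float:
    """Fraction of (stereotype, antistereotype) score pairs where the stereotype wins.

    Ties are counted as not preferring the stereotype.
    """
    preferred = sum(1 for stereo, anti in pair_scores if stereo > anti)
    return preferred / len(pair_scores)


def binomial_p_value(successes: int, trials: int) -> float:
    """One-sided exact binomial test: P(X >= successes) under a fair coin."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# Made-up model scores (e.g. log-likelihoods) for ten sentence pairs.
scores = [(-1.0, -2.0)] * 7 + [(-2.0, -1.0)] * 3
bpr = bias_preference_ratio(scores)
p_value = binomial_p_value(7, 10)
print(bpr, p_value)
```

With only ten pairs a ratio of 0.7 is not significant at the 0.05 level, which illustrates why a ratio in the reported 0.63 to 0.78 range needs a large number of pairs to support a significance claim.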

【19】Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Link: https://arxiv.org/abs/2511.21928

Authors: Yifan Zhou, Sachin Grover, Mohamed El Mistiri, Kamalesh Kalirathnam, Pratyush Kerhalkar, Swaroop Mishra, Neelesh Kumar, Sanket Gaurav, Oya Aran, Heni Ben Amor
Comments: In The Thirty-ninth Annual Conference on Neural Information Processing Systems
Abstract: Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical feedback with language, prior knowledge, and common sense. We introduce Prompted Policy Search (ProPS), a novel RL method that unifies numerical and linguistic reasoning within a single framework. Unlike prior work that augments existing RL components with language, ProPS places a large language model (LLM) at the center of the policy optimization loop, directly proposing policy updates based on both reward feedback and natural language input. We show that LLMs can perform numerical optimization in-context, and that incorporating semantic signals, such as goals, domain knowledge, and strategy hints, can lead to more informed exploration and sample-efficient learning. ProPS is evaluated across fifteen Gymnasium tasks, spanning classic control, Atari games, and MuJoCo environments, and compared to seven widely-adopted RL algorithms (e.g., PPO, SAC, TRPO). It outperforms all baselines on eight out of fifteen tasks and demonstrates substantial gains when provided with domain knowledge. These results highlight the potential of unifying semantics and numerics for transparent, generalizable, and human-aligned RL.

【20】Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
Link: https://arxiv.org/abs/2511.21759

Authors: Linye Wei, Wenjue Chen, Pingzhi Tang, Xiaotian Guo, Le Ye, Runsheng Wang, Meng Li
Abstract: Diffusion-based large language models (dLLMs) have recently gained significant attention for their exceptional performance and inherent potential for parallel decoding. Existing frameworks further enhance their inference efficiency by enabling KV caching. However, their bidirectional attention mechanism necessitates periodic cache refreshes that interleave prefill and decoding phases, both contributing substantial inference cost and constraining achievable speedup. Inspired by the heterogeneous arithmetic intensity of the prefill and decoding phases, we propose ODB-dLLM, a framework that orchestrates dual-boundaries to accelerate dLLM inference. In the prefill phase, we find that the predefined fixed response length introduces heavy yet redundant computational overhead, which affects efficiency. To alleviate this, ODB-dLLM incorporates an adaptive length prediction mechanism that progressively reduces prefill overhead and unnecessary computation. In the decoding phase, we analyze the computational characteristics of dLLMs and propose a dLLM-specific jump-share speculative decoding method to enhance efficiency by reducing the number of decoding iterations. Experimental results demonstrate that ODB-dLLM achieves 46-162x and 2.63-6.30x speedups over the baseline dLLM and Fast-dLLM, respectively, while simultaneously mitigating the accuracy degradation in existing acceleration frameworks.
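The contrast in arithmetic intensity between prefill and decoding can be made concrete with a back-of-the-envelope calculation for a single matrix multiplication. The sketch assumes an idealised cost model (each operand read once, the result written once, 2 bytes per element) and hypothetical layer sizes; it treats decoding as one token per step, whereas a dLLM may decode a block of tokens at a time.

```python
def matmul_arithmetic_intensity(m: int, k: int, n: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte moved for an (m x k) @ (k x n) matrix multiplication."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical 4096 x 4096 weight matrix.
prefill = matmul_arithmetic_intensity(1024, 4096, 4096)  # many tokens at once
decode = matmul_arithmetic_intensity(1, 4096, 4096)      # a single token
print(prefill, decode)
```

Under these assumptions the prefill-style product performs several hundred FLOPs per byte and is compute-bound, while the single-token product performs about one and is memory-bound, so the two phases hit different hardware limits.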

【21】PeerCoPilot: A Language Model-Powered Assistant for Behavioral Health Organizations
Link: https://arxiv.org/abs/2511.21721

Authors: Gao Mo, Naveen Raman, Megan Chai, Cindy Peng, Shannon Pagdon, Nev Jones, Hong Shen, Peggy Swarbrick, Fei Fang
Comments: Accepted at IAAI'26
Abstract: Behavioral health conditions, which include mental health and substance use disorders, are the leading disease burden in the United States. Peer-run behavioral health organizations (PROs) critically assist individuals facing these conditions by combining mental health services with assistance for needs such as income, employment, and housing. However, limited funds and staffing make it difficult for PROs to address all service user needs. To assist peer providers at PROs with their day-to-day tasks, we introduce PeerCoPilot, a large language model (LLM)-powered assistant that helps peer providers create wellness plans, construct step-by-step goals, and locate organizational resources to support these goals. PeerCoPilot ensures information reliability through a retrieval-augmented generation pipeline backed by a large database of over 1,300 vetted resources. We conducted human evaluations with 15 peer providers and 6 service users and found that over 90% of users supported using PeerCoPilot. Moreover, we demonstrated that PeerCoPilot provides more reliable and specific information than a baseline LLM. PeerCoPilot is now used by a group of 5-10 peer providers at CSPNJ, a large behavioral health organization serving over 10,000 service users, and we are actively expanding PeerCoPilot's use.

【22】Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation
Link: https://arxiv.org/abs/2511.21711

Authors: Fatima Kazi
Abstract: Large language models (LLMs), such as ChatGPT, have gained popularity in recent years with the advancement of Natural Language Processing (NLP), with use cases spanning many disciplines and daily life. LLMs inherit explicit and implicit biases from the datasets they were trained on; these biases can include social, ethical, cultural, religious, and other prejudices and stereotypes. It is important to comprehensively examine such shortcomings by identifying the existence and extent of such biases, recognizing their origin, and attempting to mitigate biased outputs, so as to ensure fair outputs and reduce harmful stereotypes and misinformation. This study inspects and highlights the need to address biases in LLMs amid growing generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in many different generative models such as BERT, GPT 3.5, and ADA. To detect both explicit and implicit biases, we adopt a three-pronged approach for thorough and inclusive analysis. Results indicate that fine-tuned models struggle with gender biases but excel at identifying and avoiding racial biases. Our findings also illustrate that, despite some cases of success, LLMs often over-rely on keywords in prompts and in their outputs. This demonstrates the inability of LLMs to truly understand the accuracy and authenticity of their outputs. Finally, in an attempt to bolster model performance, we applied an enhancement learning strategy involving fine-tuning, prompting models with different techniques, and data augmentation of the bias benchmarks. We found fine-tuned models to exhibit promising adaptability during cross-dataset testing and significantly enhanced performance on implicit bias benchmarks, with performance gains of up to 20%.

【23】Evaluating Embedding Generalization: How LLMs, LoRA, and SLERP Shape Representational Geometry
Link: https://arxiv.org/abs/2511.21703

Authors: Siyaxolisa Kabane
Comments: 20 pages, 16 figures
Abstract: We investigate the generalization properties of dense text embeddings when the embedding backbone is a large language model (LLM) versus when it is a non-LLM encoder, and we study the extent to which spherical linear interpolation (SLERP) model-merging mitigates over-specialization introduced by task-specific adaptation (e.g., LoRA). To make the comparison concrete and domain-agnostic, we design a controlled suite of experiments in which models embed short numerical sequences and are evaluated on their ability to cluster and classify those sequences according to well-defined number-theoretic properties. Our experimental protocol compares four families of models: (1) non-LLM encoders trained from scratch or fine-tuned for embeddings, (2) LLM-based encoders adapted with parameter-efficient methods (LoRA), (3) LLM-based encoders with LoRA followed by model souping merging into the base weights, and (4) the same LoRA-adapted LLMs merged using SLERP across checkpoints or stages. We evaluate representational quality with clustering indices (Silhouette and Davies-Bouldin). We additionally analyze the use of k-means labels to see if the embeddings encode any other information besides the one we are testing for. Empirically, we find that LLM-based backbones produce embeddings that better capture higher-order, compositional numeric patterns, but are prone to adapter dominance that degrades balanced generalization; SLERP merging consistently recovers base-model structure while retaining most task gains, yielding superior tradeoffs in clustering separability and robustness compared to model souping or models that were not merged.
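Spherical linear interpolation itself is a standard formula, sketched below for two flattened weight vectors. How the paper applies it (per tensor, per layer, across which checkpoints) is not stated in the abstract, so treating a whole model as one flat list of floats is an assumption of this example.

```python
import math
from typing import List


def slerp(w0: List[float], w1: List[float], t: float) -> List[float]:
    """Interpolate along the arc between w0 and w1; t=0 returns w0, t=1 returns w1."""
    dot = sum(a * b for a, b in zip(w0, w1))
    norm0 = math.sqrt(sum(a * a for a in w0))
    norm1 = math.sqrt(sum(b * b for b in w1))
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)  # angle between the two weight vectors
    if omega < 1e-8:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(w0, w1)]
    sin_omega = math.sin(omega)
    c0 = math.sin((1.0 - t) * omega) / sin_omega
    c1 = math.sin(t * omega) / sin_omega
    return [c0 * a + c1 * b for a, b in zip(w0, w1)]


# Two orthogonal unit vectors standing in for base and adapted weights.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(merged)
```

Unlike a plain average, which would give a vector of norm about 0.707 here, the SLERP midpoint keeps unit norm, which is the usual motivation for using it when merging weights.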

【24】Temporal Consistency for LLM Reasoning Process Error Identification
Link: https://arxiv.org/abs/2503.14495

Authors: Jiacheng Guo, Yue Wu, Jiahao Qiu, Kaixuan Huang, Xinzhe Juan, Ling Yang, Mengdi Wang
Abstract: Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency
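One way to read "iteratively refine their judgments based on the previous assessment" is a loop that re-runs a verifier, feeds back its last judgment, and stops once the judgment stays the same for a few rounds. The sketch below follows that reading only; the stopping rule, the `patience` parameter, and the scripted verifier are assumptions, and the authors' repository should be consulted for the actual procedure. A judgment here is the index of the first erroneous step.

```python
from typing import Callable, Optional


def verify_until_stable(
    verify: Callable[[Optional[int]], int],
    patience: int = 2,
    max_rounds: int = 10,
) -> Optional[int]:
    """Re-run a verifier on its own previous judgment until it repeats `patience` times."""
    previous: Optional[int] = None
    streak = 0
    for _ in range(max_rounds):
        current = verify(previous)
        # Count how many consecutive rounds have produced the same judgment.
        streak = streak + 1 if current == previous else 1
        previous = current
        if streak >= patience:
            break
    return previous


# Hypothetical verifier: first blames step 3, then settles on step 2 after reflection.
scripted_judgments = iter([3, 2, 2, 2])
result = verify_until_stable(lambda previous: next(scripted_judgments))
print(result)
```

In a real setting `verify` would prompt a model with the solution and its earlier assessment; the scripted iterator only stands in for that call.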

【25】QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for Quantum-Chemistry Property Screening and Language Model Benchmarking
Link: https://arxiv.org/abs/2511.21747

Authors: Yinqi Zeng, Renjie Li
Comments: 9 pages, 5 figures, 3 tables
Abstract: The discovery of next-generation photoinitiators for two-photon polymerization (TPP) is hindered by the absence of large, open datasets containing the quantum-chemical and photophysical properties required to model photodissociation and excited-state behavior. Existing molecular datasets typically provide only basic physicochemical descriptors and therefore cannot support data-driven screening or AI-assisted design of photoinitiators. To address this gap, we introduce QuantumChem-200K, a large-scale dataset of over 200,000 organic molecules annotated with eleven quantum-chemical properties, including two-photon absorption (TPA) cross sections, TPA spectral ranges, singlet-triplet intersystem crossing (ISC) energies, toxicity and synthetic accessibility scores, hydrophilicity, solubility, boiling point, molecular weight, and aromaticity. These values are computed using a hybrid workflow that integrates density functional theory (DFT), semi-empirical excited-state methods, atomistic quantum solvers, and neural-network predictors. Using QuantumChem-200K, we fine-tune the open-source Qwen2.5-32B large language model to create a chemistry AI assistant capable of forward property prediction from SMILES. Benchmarking on 3000 unseen molecules from VQM24 and ZINC20 demonstrates that domain-specific fine-tuning significantly improves accuracy over GPT-4o, Llama-3.1-70B, and the base Qwen2.5-32B model, particularly for TPA and ISC predictions central to photoinitiator design. QuantumChem-200K and the corresponding AI assistant together provide the first scalable platform for high-throughput, LLM-driven photoinitiator screening and accelerated discovery of photosensitive materials.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition
Link: https://arxiv.org/abs/2511.23037

Authors: Yuanhong Chen, Federico Pichi, Zhen Gao, Gianluigi Rozza
Abstract: Graph autoencoders have gained attention in nonlinear reduced-order modeling of parameterized partial differential equations defined on unstructured grids. Although they provide a geometrically consistent way of treating complex domains, applying such architectures to parameterized dynamical systems for temporal prediction beyond the training data, i.e., the extrapolation regime, is still a challenging task due to the simultaneous need for temporal causality and generalizability in the parametric space. In this work, we explore the integration of graph convolutional autoencoders (GCAs) with tensor train (TT) decomposition and Operator Inference (OpInf) to develop a time-consistent reduced-order model. In particular, high-fidelity snapshots are represented as a combination of parametric, spatial, and temporal cores via TT decomposition, while OpInf is used to learn the evolution of the latter. Moreover, we enhance the generalization performance by developing a multi-fidelity two-stage approach in the framework of Deep Operator Networks (DeepONet), treating the spatial and temporal cores as the trunk networks, and the parametric core as the branch network. Numerical results, including heat-conduction, advection-diffusion and vortex-shedding phenomena, demonstrate strong performance in effectively learning the dynamics in the extrapolation regime for complex geometries, also in comparison with state-of-the-art approaches such as MeshGraphNets.

【2】Adaptive Factor Graph-Based Tightly Coupled GNSS/IMU Fusion for Robust Positioning
Link: https://arxiv.org/abs/2511.23017

Authors: Elham Ahmadi, Alireza Olama, Petri Välisuo, Heidi Kuusniemi
Abstract: Reliable positioning in GNSS-challenged environments remains a critical challenge for navigation systems. Tightly coupled GNSS/IMU fusion improves robustness but remains vulnerable to non-Gaussian noise and outliers. We present a robust and adaptive factor graph-based fusion framework that directly integrates GNSS pseudorange measurements with IMU preintegration factors and incorporates the Barron loss, a general robust loss function that unifies several M-estimators through a single tunable parameter. By adaptively down-weighting unreliable GNSS measurements, our approach improves positioning resilience. The method is implemented in an extended GTSAM framework and evaluated on the UrbanNav dataset. The proposed solution reduces positioning errors by up to 41% relative to standard FGO, and achieves even larger improvements over extended Kalman filter (EKF) baselines in urban canyon environments. These results highlight the benefits of Barron loss in enhancing the resilience of GNSS/IMU-based navigation in urban and signal-compromised environments.
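The Barron loss mentioned above has a compact closed form, sketched here for a scalar residual. The formula is Barron's general robust loss with shape parameter alpha and scale c; how the paper adapts alpha per measurement and plugs the loss into GTSAM is not described in the abstract, so nothing below should be read as their implementation.

```python
import math


def barron_loss(residual: float, alpha: float, c: float = 1.0) -> float:
    """General robust loss; alpha selects the estimator, c sets the residual scale."""
    z = (residual / c) ** 2
    if alpha == 2.0:
        return 0.5 * z  # ordinary least squares (L2)
    if alpha == 0.0:
        return math.log(0.5 * z + 1.0)  # Cauchy / Lorentzian
    if alpha == float("-inf"):
        return 1.0 - math.exp(-0.5 * z)  # Welsch
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)


# A large pseudorange-style residual (3 scale units) under different alphas.
for alpha in (2.0, 1.0, 0.0, -2.0, float("-inf")):
    print(alpha, barron_loss(3.0, alpha))
```

Lowering alpha flattens the loss for large residuals, so an outlying measurement contributes less to the optimisation; alpha = 1 gives a pseudo-Huber (Charbonnier) shape and alpha = -2 gives Geman-McClure.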

【3】ARM-Explainer -- Explaining and improving graph neural network predictions for the maximum clique problem using node features and association rule mining
Link: https://arxiv.org/abs/2511.22866

Authors: Bharat Sharman, Elkafi Hassini
Abstract: Numerous graph neural network (GNN)-based algorithms have been proposed to solve graph-based combinatorial optimization problems (COPs), but methods to explain their predictions remain largely undeveloped. We introduce ARM-Explainer, a post-hoc, model-level explainer based on association rule mining, and demonstrate it on the predictions of the hybrid geometric scattering (HGS) GNN for the maximum clique problem (MCP), a canonical NP-hard graph-based COP. The eight most explanatory association rules discovered by ARM-Explainer achieve high median lift and confidence values of 2.42 and 0.49, respectively, on test instances from the TWITTER and BHOSLIB-DIMACS benchmark datasets. ARM-Explainer identifies the most important node features, together with their value ranges, that influence the GNN's predictions on these datasets. Furthermore, augmenting the GNN with informative node features substantially improves its performance on the MCP, increasing the median largest-found clique size by 22% (from 29.5 to 36) on large graphs from the BHOSLIB-DIMACS dataset.
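Lift and confidence are standard association-rule metrics, and a small sketch shows how they are computed for a rule "antecedent implies consequent". The item names and the eight toy "nodes" are invented for illustration; in the paper's setting the items would presumably be binned node features and the GNN's prediction for each node.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_both / n_antecedent         # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)     # relative to the base rate of the consequent
    return confidence, lift


# Toy data: each transaction describes one node.
nodes = (
    [frozenset({"high_degree", "predicted_in_clique"})] * 2
    + [frozenset({"high_degree"})] * 2
    + [frozenset({"low_degree"})] * 4
)
confidence, lift = rule_metrics(nodes, frozenset({"high_degree"}), frozenset({"predicted_in_clique"}))
print(confidence, lift)
```

A lift above 1 means the consequent is more likely when the antecedent holds than it is overall, which is why a moderate confidence can still mark an informative rule when the consequent is rare.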

【4】Intelligent Neural Networks: From Layered Architectures to Graph-Organized Intelligence
Link: https://arxiv.org/abs/2511.22813

Authors: Antoine Salomon
Comments: Code available at https://github.com/AntoineSal/IntelligentNeuralNetwork
Abstract: Biological neurons exhibit remarkable intelligence: they maintain internal states, communicate selectively with other neurons, and self-organize into complex graphs rather than rigid hierarchical layers. What if artificial intelligence could emerge from similarly intelligent computational units? We introduce Intelligent Neural Networks (INN), a paradigm shift where neurons are first-class entities with internal memory and learned communication patterns, organized in complete graphs rather than sequential layers. Each Intelligent Neuron combines selective state-space dynamics (knowing when to activate) with attention-based routing (knowing to whom to send signals), enabling emergent computation through graph-structured interactions. On the standard Text8 character modeling benchmark, INN achieves 1.705 bits per character (BPC), significantly outperforming a comparable Transformer (2.055 BPC) and matching a highly optimized LSTM baseline. Crucially, a parameter-matched baseline of stacked Mamba blocks fails to converge (>3.4 BPC) under the same training protocol, demonstrating that INN's graph topology provides essential training stability. Ablation studies confirm this: removing inter-neuron communication degrades performance or leads to instability, proving the value of learned neural routing. This work demonstrates that neuron-centric design with graph organization is not merely bio-inspired -- it is computationally effective, opening new directions for modular, interpretable, and scalable neural architectures.

【5】IVGAE: Handling Incomplete Heterogeneous Data with a Variational Graph Autoencoder
Link: https://arxiv.org/abs/2511.22116

Authors: Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal
Abstract: Handling missing data remains a fundamental challenge in real-world tabular datasets, especially when data are heterogeneous with both numerical and categorical features. Existing imputation methods often fail to capture complex structural dependencies and handle heterogeneous data effectively. We present IVGAE, a Variational Graph Autoencoder framework for robust imputation of incomplete heterogeneous data. IVGAE constructs a bipartite graph to represent sample-feature relationships and applies graph representation learning to model structural dependencies. A key innovation is its dual-decoder architecture, where one decoder reconstructs feature embeddings and the other models missingness patterns, providing structural priors aware of missingness mechanisms. To better encode categorical variables, we introduce a Transformer-based heterogeneous embedding module that avoids high-dimensional one-hot encoding. Extensive experiments on 16 real-world datasets show that IVGAE achieves consistent improvements in RMSE and downstream F1 across MCAR, MAR, and MNAR missing scenarios under 30% missing rates. Code and data are available at: https://github.com/echoid/IVGAE.

【6】ResearchArcade: Graph Interface for Academic Tasks
标题：ResearchArcade：学术任务的图形界面
链接：https://arxiv.org/abs/2511.22036

作者：Jingjun Xu,Chongshan Lin,Haofei Yu,Tao Feng,Jiaxuan You
摘要：学术研究产生了不同的数据源，随着研究人员越来越多地使用机器学习来协助研究任务，一个关键问题出现了：我们能否建立一个统一的数据接口来支持各种学术任务的机器学习模型的开发？在这样一个统一的界面上训练的模型可以在整个研究过程中更好地支持人类研究人员，最终加速知识发现。在这项工作中，我们介绍了ResearchArcade，一个基于图形的接口，连接多个学术数据源，统一任务定义，并支持广泛的基础模型，以解决关键的学术挑战。ResearchArcade利用具有图形结构的连贯多表格式来组织来自不同来源的数据，包括来自ArXiv的学术语料库和来自OpenReview的同行评审，同时捕获具有多种形式的信息，例如文本，图形和表格。ResearchArcade还保留了手稿和社区层面的时间演变，支持对论文修订的研究以及随着时间的推移更广泛的研究趋势。此外，ResearchArcade统一了不同的学术任务定义，并支持具有不同输入要求的各种模型。我们在六个学术任务中的实验表明，结合跨源和多模态信息可以实现更广泛的任务，同时结合图形结构可以持续提高基线方法的性能。这突出了ResearchArcade的有效性及其推动研究进展的潜力。
摘要：Academic research generates diverse data sources, and as researchers increasingly use machine learning to assist research tasks, a crucial question arises: Can we build a unified data interface to support the development of machine learning models for various academic tasks? Models trained on such a unified interface can better support human researchers throughout the research process, eventually accelerating knowledge discovery. In this work, we introduce ResearchArcade, a graph-based interface that connects multiple academic data sources, unifies task definitions, and supports a wide range of base models to address key academic challenges. ResearchArcade utilizes a coherent multi-table format with graph structures to organize data from different sources, including academic corpora from ArXiv and peer reviews from OpenReview, while capturing information with multiple modalities, such as text, figures, and tables. ResearchArcade also preserves temporal evolution at both the manuscript and community levels, supporting the study of paper revisions as well as broader research trends over time. Additionally, ResearchArcade unifies diverse academic task definitions and supports various models with distinct input requirements. Our experiments across six academic tasks demonstrate that combining cross-source and multi-modal information enables a broader range of tasks, while incorporating graph structures consistently improves performance over baseline methods. This highlights the effectiveness of ResearchArcade and its potential to advance research progress.

【7】Nonstabilizerness Estimation using Graph Neural Networks
标题：使用图神经网络的非稳定子性估计
链接：https://arxiv.org/abs/2511.23224

作者：Vincenzo Lipardi,Domenica Dibenedetto,Georgios Stamoulis,Evert van Nieuwenburg,Mark H. M. Winands
摘要：本文提出了一种图神经网络（GNN）方法来估计量子电路中的非稳定子性（nonstabilizerness），该性质通过稳定子Rényi熵（SRE）来衡量。非稳定子性是实现量子优势的基本资源，高效的SRE估计在实际应用中非常有益。我们通过三种监督学习形式来解决非稳定子性估计问题，从较容易的分类任务逐步过渡到更具挑战性的回归任务。实验结果表明，所提出的GNN能够从基于图的电路表示中捕获有意义的特征，从而在不同的场景中实现了鲁棒的泛化性能。在分类任务中，GNN在乘积态上训练，并能泛化到经Clifford操作演化的电路、纠缠态以及量子比特数更多的电路。在回归任务中，对于随机量子电路和由横场伊辛模型导出的结构化电路，GNN在量子比特数和门数更多的分布外电路上的SRE估计均较以往工作有显著改进。此外，量子电路的图表示可以自然地融入特定于硬件的信息。在含噪量子硬件上的模拟突显了所提出的GNN预测量子设备上实测SRE的潜力。
摘要：This article proposes a Graph Neural Network (GNN) approach to estimate nonstabilizerness in quantum circuits, measured by the stabilizer Rényi entropy (SRE). Nonstabilizerness is a fundamental resource for quantum advantage, and efficient SRE estimations are highly beneficial in practical applications. We address the nonstabilizerness estimation problem through three supervised learning formulations starting from easier classification tasks to the more challenging regression task. Experimental results show that the proposed GNN manages to capture meaningful features from the graph-based circuit representation, resulting in robust generalization performances achieved across diverse scenarios. In classification tasks, the GNN is trained on product states and generalizes on circuits evolved under Clifford operations, entangled states, and circuits with higher number of qubits. In the regression task, the GNN significantly improves the SRE estimation on out-of-distribution circuits with higher number of qubits and gate counts compared to previous work, for both random quantum circuits and structured circuits derived from the transverse-field Ising model. Moreover, the graph representation of quantum circuits naturally integrates hardware-specific information. Simulations on noisy quantum hardware highlight the potential of the proposed GNN to predict the SRE measured on quantum devices.

Transformer(9篇)

【1】Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla
标题：低资源孟加拉语中用于增强多模态作者意图分类的转换器驱动的三重融合框架
链接：https://arxiv.org/abs/2511.23287

作者：Ariful Islam,Tanvir Mahmud,Md Rifat Hossen
备注：Accepted at the 28th International Conference on Computer and Information Technology (ICCIT 2025). To be published in IEEE proceedings
摘要：互联网和社交网络的扩张导致了用户生成内容的爆炸式增长。作者意图理解在解释社交媒体内容方面起着至关重要的作用。本文利用文本和视觉数据来解决孟加拉语社交媒体帖子中的作者意图分类问题。认识到以往单模态方法的局限性，我们使用包含3,048个帖子、涵盖六个实际意图类别的Uddessho数据集，系统地对基于Transformer的语言模型（mBERT，DistilBERT，XLM-RoBERTa）和视觉架构（ViT，Swin，SwiftFormer，ResNet，DenseNet，MobileNet）进行了基准测试。我们提出了一种新的中间融合策略，在该任务上显著优于早期融合和晚期融合。实验结果表明，中间融合（特别是mBERT与Swin Transformer的组合）实现了84.11%的宏F1分数，比以往的孟加拉语多模态方法提高了8.4个百分点，达到了新的最先进水平。我们的分析表明，整合视觉上下文能够大幅提升意图分类效果。在中间层进行跨模态特征融合，可以在模态特定表征与跨模态学习之间取得最佳平衡。这项研究为孟加拉语和其他低资源语言建立了新的基准和方法标准。我们将所提出的框架称为BangACMM（Bangla Author Content MultiModal）。
摘要：The expansion of the Internet and social networks has led to an explosion of user-generated content. Author intent understanding plays a crucial role in interpreting social media content. This paper addresses author intent classification in Bangla social media posts by leveraging both textual and visual data. Recognizing limitations in previous unimodal approaches, we systematically benchmark transformer-based language models (mBERT, DistilBERT, XLM-RoBERTa) and vision architectures (ViT, Swin, SwiftFormer, ResNet, DenseNet, MobileNet), utilizing the Uddessho dataset of 3,048 posts spanning six practical intent categories. We introduce a novel intermediate fusion strategy that significantly outperforms early and late fusion on this task. Experimental results show that intermediate fusion, particularly with mBERT and Swin Transformer, achieves 84.11% macro-F1 score, establishing a new state-of-the-art with an 8.4 percentage-point improvement over prior Bangla multimodal approaches. Our analysis demonstrates that integrating visual context substantially enhances intent classification. Cross-modal feature integration at intermediate levels provides optimal balance between modality-specific representation and cross-modal learning. This research establishes new benchmarks and methodological standards for Bangla and other low-resource languages. We call our proposed framework BangACMM (Bangla Author Content MultiModal).

【2】Towards Understanding Transformers in Learning Random Walks
标题：在学习随机行走时了解Transformer
链接：https://arxiv.org/abs/2511.23239

作者：Wei Shi,Yuan Cao
备注：45 pages, 13 figures
摘要：Transformer已被证明在各种应用中非常有效，特别是在处理自然语言和时间序列等序列数据时。然而，Transformer模型往往缺乏清晰的可解释性，其成功在理论上也尚未得到很好的理解。在本文中，我们研究了Transformer在学习一族经典统计模型（即圆上的随机游走）时的能力和可解释性。我们从理论上证明，经过梯度下降训练后，单层Transformer模型可以在预测随机游走时达到最优精度。重要的是，我们的分析表明训练后的模型是可解释的：训练后的softmax注意力充当token选择器，聚焦于直接父状态；随后，值矩阵执行一步概率转移，基于该父状态预测下一状态的位置。我们还表明，我们的理论未覆盖的某些边缘情形确实是失败情形，这说明我们的理论条件是紧的。通过研究这些成功和失败的案例，我们发现，即使在随机游走之外的某些简单任务中，小初始化的梯度下降也可能失败或难以收敛到一个好的解。实验结果支持了我们的理论发现。
摘要：Transformers have proven highly effective across various applications, especially in handling sequential data such as natural languages and time series. However, transformer models often lack clear interpretability, and the success of transformers has not been well understood in theory. In this paper, we study the capability and interpretability of transformers in learning a family of classic statistical models, namely random walks on circles. We theoretically demonstrate that, after training with gradient descent, a one-layer transformer model can achieve optimal accuracy in predicting random walks. Importantly, our analysis reveals that the trained model is interpretable: the trained softmax attention serves as a token selector, focusing on the direct parent state; subsequently, the value matrix executes a one-step probability transition to predict the location of the next state based on this parent state. We also show that certain edge cases not covered by our theory are indeed failure cases, demonstrating that our theoretical conditions are tight. By investigating these success and failure cases, it is revealed that gradient descent with small initialization may fail or struggle to converge to a good solution in certain simple tasks even beyond random walks. Experiments are conducted to support our theoretical findings.
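
The mechanism described in this abstract (attention selects the direct parent state, then a one-step transition is applied) can be sketched in a few lines. This is a toy illustration rather than the paper's model: it assumes a nearest-neighbour walk on a circle of `n` states with a hypothetical clockwise probability `p_clockwise`, and the function names are made up for this example.

```python
import random


def transition_row(state, n, p_clockwise):
    """One-step transition distribution out of `state` on a circle of n states."""
    row = [0.0] * n
    row[(state + 1) % n] += p_clockwise        # step clockwise
    row[(state - 1) % n] += 1.0 - p_clockwise  # step counter-clockwise
    return row


def sample_walk(n, p_clockwise, length, seed=0):
    """Sample a walk of `length` states from the assumed random-walk model."""
    rng = random.Random(seed)
    state = rng.randrange(n)
    walk = [state]
    for _ in range(length - 1):
        step = 1 if rng.random() < p_clockwise else -1
        state = (state + step) % n
        walk.append(state)
    return walk


def predict_next(walk, n, p_clockwise):
    """Pick the last token (the direct parent), then apply one transition step.

    This mirrors the two roles named in the abstract: a token selector
    followed by a one-step probability transition.
    """
    parent = walk[-1]
    return transition_row(parent, n, p_clockwise)


walk = sample_walk(n=5, p_clockwise=0.7, length=10)
print(walk, predict_next(walk, n=5, p_clockwise=0.7))
```

Because the walk is Markov, no predictor can do better than the transition row of the last state, which is why a selector focused on the parent token is enough in this toy setting.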

【3】TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies
标题：TWEO：无极端异常值的Transformer让FP8训练和量化变得简单易行
链接：https://arxiv.org/abs/2511.23225

作者：Guang Liang,Jie Shao,Ningyuan Tang,Xinyao Liu,Jianxin Wu
摘要：现代硬件中的原生FP8支持对于训练大型Transformers至关重要，但受到极端激活异常值的严重阻碍。现有的解决方案要么依赖于复杂的混合精度工程，要么依赖于侵入性的架构修改。本文从根本上挑战了离群值是数据驱动的传统智慧。我们证明了极端离群值是一种数据独立的，机械产生的训练伪像，源于权重矩阵的特定结构属性（即，共线性）。基于这一见解，我们提出了TWEO（无极端异常值的Transformers），这是一种新颖的、非侵入性的损失函数。TWEO通过一个非常简单的损失项有效地防止极端离群值，将离群值从10000+减少到20以下。然后，TWEO可以实现全模型FP8预训练，既不需要工程技巧，也不需要对LLM和ViT进行架构更改。当标准FP8训练灾难性地崩溃时，TWEO实现了与BF 16基线相当的性能，同时提供了36%的训练吞吐量增加。此外，TWEO实现了新的量化范例。LLM的硬件友好的W8A8每张量静态量化，以前被认为由于离群值而完全不可用，首次在TWEO训练的模型上实现了SOTA性能。
摘要：Native FP8 support in modern hardware is essential for training large Transformers, but is severely hindered by extreme activation outliers. Existing solutions either rely on complex mixed-precision engineering or invasive architectural modifications. This paper fundamentally challenges the conventional wisdom that outliers are data-driven. We demonstrate that extreme outliers are a data-independent, mechanically-produced artifact of training, originating from specific structural properties of the weight matrices (i.e., colinearity). Based on this insight, we propose TWEO (Transformers Without Extreme Outliers), a novel, non-invasive loss function. TWEO effectively prevents extreme outliers via a very simple loss term, which reduces outliers from 10000+ to less than 20. TWEO then enables full-model FP8 pre-training with neither engineering tricks nor architectural changes for both LLM and ViT. When standard FP8 training catastrophically collapses, TWEO achieves performance comparable to the BF16 baseline while delivering a 36% increase in training throughput. Also, TWEO enables a new quantization paradigm. Hardware-friendly W8A8 per-tensor static quantization of LLMs, previously considered completely unusable due to outliers, achieves SOTA performance for the first time on TWEO-trained models.
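
To see why a single extreme activation makes per-tensor static quantization unusable, consider the following sketch. It does not implement the TWEO loss, which the abstract does not specify. It only assumes symmetric int8 quantization with one absmax-calibrated scale per tensor, and the helper names and sample values are invented for illustration.

```python
def absmax_scale(values):
    """Static per-tensor scale: map the largest magnitude to the int8 limit."""
    return max(abs(v) for v in values) / 127.0


def quantize_per_tensor(values, scale):
    """Symmetric int8 quantization using one scale for the whole tensor."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    return [q * scale for q in quantized]


def mean_abs_error(values, scale):
    """Average reconstruction error of `values` under the given scale."""
    recon = dequantize(quantize_per_tensor(values, scale), scale)
    return sum(abs(a - b) for a, b in zip(values, recon)) / len(values)


typical = [0.5, -1.2, 0.8, 2.0, -0.3, 1.5, -1.9, 0.1]
with_outlier = typical + [10000.0]  # one extreme activation in the calibration data

err_clean = mean_abs_error(typical, absmax_scale(typical))
err_outlier = mean_abs_error(typical, absmax_scale(with_outlier))
print(err_clean, err_outlier)
```

With the outlier present, the scale grows to roughly 79 per integer step, so every typical value rounds to zero and all of its information is lost. Keeping activation magnitudes small, as the abstract reports (below 20 instead of 10000+), keeps the step size fine enough for ordinary values.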

【4】Energy-Efficient Vision Transformer Inference for Edge-AI Deployment
标题：用于边缘人工智能部署的节能视觉Transformer推理
链接：https://arxiv.org/abs/2511.23166

作者：Nursultan Amanzhol,Jurn-Gyu Park
摘要：随着Vision Transformer（ViT）越来越多地部署在能源受限的设备上，评估方法不能仅限于准确性。我们提出了一个两阶段流程来评估ViT的能效，将与设备无关的模型选择和与设备相关的测量结合起来。我们在ImageNet-1K和CIFAR-10上对13个ViT模型进行了基准测试，并在NVIDIA Jetson TX2（边缘设备）和NVIDIA RTX 3050（移动GPU）上运行推理。与设备无关的阶段使用NetScore指标进行筛选；与设备相关的阶段使用可持续准确性指标（SAM）对模型进行排名。结果显示，LeViT_Conv_192等混合模型相对于ViT基线在TX2上最多可降低53%的能耗（例如，TX2/CIFAR-10上的SAM5=1.44），而TinyViT-11M_Distilled等蒸馏模型在移动GPU上表现出色（例如，在RTX 3050/CIFAR-10上SAM5=1.72，在RTX 3050/ImageNet-1K上SAM5=0.76）。
摘要：The growing deployment of Vision Transformers (ViTs) on energy-constrained devices requires evaluation methods that go beyond accuracy alone. We present a two-stage pipeline for assessing ViT energy efficiency that combines device-agnostic model selection with device-related measurements. We benchmark 13 ViT models on ImageNet-1K and CIFAR-10, running inference on NVIDIA Jetson TX2 (edge device) and an NVIDIA RTX 3050 (mobile GPU). The device-agnostic stage uses the NetScore metric for screening; the device-related stage ranks models with the Sustainable Accuracy Metric (SAM). Results show that hybrid models such as LeViT_Conv_192 reduce energy by up to 53% on TX2 relative to a ViT baseline (e.g., SAM5=1.44 on TX2/CIFAR-10), while distilled models such as TinyViT-11M_Distilled excel on the mobile GPU (e.g., SAM5=1.72 on RTX 3050/CIFAR-10 and SAM5=0.76 on RTX 3050/ImageNet-1K).

【5】Freeze, Diffuse, Decode: Geometry-Aware Adaptation of Pretrained Transformer Embeddings for Antimicrobial Peptide Design
标题：冻结、扩散、解码：预先训练的Transformer嵌入物的几何意识调整，用于抗菌肽设计
链接：https://arxiv.org/abs/2511.23120

作者：Pankhil Gawade,Adam Izdebski,Myriam Lizotte,Kevin R. Moon,Jake S. Rhodes,Guy Wolf,Ewa Szczurek
备注：16 pages, 4 figures
摘要：预训练的Transformers提供丰富的通用嵌入，这些嵌入被转移到下游任务。然而，目前的迁移策略：微调和探测，要么扭曲了嵌入的预训练几何结构，要么缺乏足够的表达能力来捕获任务相关的信号。当监督数据稀缺时，这些问题变得更加突出。在这里，我们介绍了冻结，扩散，解码（FDD），一种新的基于扩散的框架，它可以将预训练的嵌入适应下游任务，同时保留其底层几何结构。FDD沿着冻结嵌入的固有流形传播监督信号，从而实现嵌入空间的几何感知自适应。应用于抗菌肽设计，FDD产生低维，预测和可解释的表示，支持属性预测，检索和潜在空间插值。
摘要：Pretrained transformers provide rich, general-purpose embeddings, which are transferred to downstream tasks. However, current transfer strategies: fine-tuning and probing, either distort the pretrained geometric structure of the embeddings or lack sufficient expressivity to capture task-relevant signals. These issues become even more pronounced when supervised data are scarce. Here, we introduce Freeze, Diffuse, Decode (FDD), a novel diffusion-based framework that adapts pre-trained embeddings to downstream tasks while preserving their underlying geometric structure. FDD propagates supervised signal along the intrinsic manifold of frozen embeddings, enabling a geometry-aware adaptation of the embedding space. Applied to antimicrobial peptide design, FDD yields low-dimensional, predictive, and interpretable representations that support property prediction, retrieval, and latent-space interpolation.

【6】Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
标题：池化注意力：评估预训练Transformer嵌入在欺骗分类中的表现
链接：https://arxiv.org/abs/2511.22977

作者：Sumit Mamtani,Abhijeet Bhure
备注：Accepted at the IEEE 7th Computing, Communications and IoT Applications Conference (ComComAp 2025), Madrid, Spain, December 2025. 6 pages
摘要：本文将假新闻检测作为Transformer表示的下游评估，把仅编码器和仅解码器的预训练模型（BERT，GPT-2，Transformer-XL）用作冻结的嵌入器，并搭配轻量级分类器进行基准测试。通过在受控预处理下比较池化与填充、神经网络分类头与线性分类头，结果表明上下文自注意力编码能够稳定而有效地迁移。BERT嵌入结合逻辑回归在LIAR数据集划分上优于神经网络基线，而对序列长度和聚合方式的分析表明该方法对截断具有鲁棒性，并且简单的最大池化或平均池化具有优势。这项工作将基于注意力的token编码器定位为真实性判别任务中稳健的、以架构为中心的基础，并将Transformer的贡献与分类器的复杂性分离开来。
摘要：This paper investigates fake news detection as a downstream evaluation of Transformer representations, benchmarking encoder-only and decoder-only pre-trained models (BERT, GPT-2, Transformer-XL) as frozen embedders paired with lightweight classifiers. Through controlled preprocessing comparing pooling versus padding and neural versus linear heads, results demonstrate that contextual self-attention encodings consistently transfer effectively. BERT embeddings combined with logistic regression outperform neural baselines on LIAR dataset splits, while analyses of sequence length and aggregation reveal robustness to truncation and advantages from simple max or average pooling. This work positions attention-based token encoders as robust, architecture-centric foundations for veracity tasks, isolating Transformer contributions from classifier complexity.
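
The max and average pooling mentioned in this abstract reduce a variable-length sequence of token embeddings to one fixed-size vector that a linear classifier can consume. A minimal sketch follows. The padding mask and the function names are assumptions for illustration, not the authors' preprocessing code.

```python
def mean_pool(token_vectors, mask):
    """Average the embeddings of real tokens, skipping padded positions."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def max_pool(token_vectors, mask):
    """Take the per-dimension maximum over real tokens."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    dim = len(token_vectors[0])
    return [max(vec[d] for vec in kept) for d in range(dim)]


# Three positions with 2-dimensional embeddings; the last position is padding.
tokens = [[1.0, 4.0], [3.0, 2.0], [0.0, 0.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
print(max_pool(tokens, mask))   # [3.0, 4.0]
```

Either pooled vector has the same length regardless of how many tokens the input had, which is what allows a frozen embedder to be paired with a lightweight head such as logistic regression.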

【7】Efficient-Husformer: Efficient Multimodal Transformer Hyperparameter Optimization for Stress and Cognitive Loads
标题：Efficient-Husformer：面向压力和认知负荷的高效多模态Transformer超参数优化
链接：https://arxiv.org/abs/2511.22362

作者：Merey Orazaly,Fariza Temirkhanova,Jurn-Gyu Park
摘要：基于变换器的模型在生理信号分析领域获得了相当大的关注。它们利用时间信号中的长距离依赖性和复杂模式，使它们能够实现优于传统RNN和CNN模型的性能。然而，它们需要高计算强度和存储器需求。在这项工作中，我们提出了Efficient-Husformer，这是一种新的基于Transformer的架构，采用超参数优化（HPO）开发，用于跨两个多模态生理数据集（WESAD和CogLoad）的多类压力检测。本研究的主要贡献是：（1）设计了一个结构化的搜索空间，目标是有效的超参数优化;（2）一个全面的消融研究，评估架构决策的影响;（3）与原始Husformer相比，性能得到了持续改进，最佳配置在WESAD和CogLoad数据集上分别实现了88.41和92.61的准确度（提高了13.83%和6.98%）。最佳性能配置是通过（L + dm）或（L + FFN）模态组合实现的，使用单层、3个注意力头、18/30的模型尺寸和120/30的FFN尺寸，从而产生仅具有约30 k个参数的紧凑模型。
摘要：Transformer-based models have gained considerable attention in the field of physiological signal analysis. They leverage long-range dependencies and complex patterns in temporal signals, allowing them to achieve performance superior to traditional RNN and CNN models. However, they require high computational intensity and memory demands. In this work, we present Efficient-Husformer, a novel Transformer-based architecture developed with hyperparameter optimization (HPO) for multi-class stress detection across two multimodal physiological datasets (WESAD and CogLoad). The main contributions of this work are: (1) the design of a structured search space, targeting effective hyperparameter optimization; (2) a comprehensive ablation study evaluating the impact of architectural decisions; (3) consistent performance improvements over the original Husformer, with the best configuration achieving an accuracy of 88.41 and 92.61 (improvements of 13.83% and 6.98%) on WESAD and CogLoad datasets, respectively. The best-performing configuration is achieved with the (L + dm) or (L + FFN) modality combinations, using a single layer, 3 attention heads, a model dimension of 18/30, and FFN dimension of 120/30, resulting in a compact model with only about 30k parameters.

【8】MOTIF-RF: Multi-template On-chip Transformer Synthesis Incorporating Frequency-domain Self-transfer Learning for RFIC Design Automation
标题：MOTIF-RF：结合频域自迁移学习的多模板片上变压器综合，用于RFIC设计自动化
链接：https://arxiv.org/abs/2511.21970

作者：Houbo He,Yizhou Xu,Lei Xia,Yaolong Hu,Fan Cai,Taiyun Chi
备注：Accepted at ASP-DAC 2026
摘要：本文系统地研究了多模板机器学习（ML）代理模型的开发，并将其应用于射频集成电路（RFIC）中变压器（XFMR）的逆向设计。我们的研究首先使用相同的数据集，在不同的XFMR拓扑上对四种广泛使用的ML架构（基于MLP、CNN、UNet和GT的模型）进行基准测试。为了使建模精度超越这些基线，我们提出了一种新的频域自迁移学习技术，利用相邻频带之间的相关性，使S参数预测的精度提高约30%-50%。基于这些模型，我们进一步开发了基于协方差矩阵自适应进化策略（CMA-ES）算法的逆向设计框架。该框架在多个阻抗匹配任务上得到了验证，所有任务都表现出快速收敛和可信的性能。这些结果推进了RFIC的AI辅助“规格到GDS”自动化目标，并为RFIC设计人员提供了将AI集成到其工作流程中的可行工具。
摘要：This paper presents a systematic study on developing multi-template machine learning (ML) surrogate models and applying them to the inverse design of transformers (XFMRs) in radio-frequency integrated circuits (RFICs). Our study starts with benchmarking four widely used ML architectures, including MLP-, CNN-, UNet-, and GT-based models, using the same datasets across different XFMR topologies. To improve modeling accuracy beyond these baselines, we then propose a new frequency-domain self-transfer learning technique that exploits correlations between adjacent frequency bands, leading to around 30%-50% accuracy improvement in the S-parameters prediction. Building on these models, we further develop an inverse design framework based on the covariance matrix adaptation evolutionary strategy (CMA-ES) algorithm. This framework is validated using multiple impedance-matching tasks, all demonstrating fast convergence and trustworthy performance. These results advance the goal of AI-assisted specs-to-GDS automation for RFICs and provide RFIC designers with actionable tools for integrating AI into their workflows.

【9】Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium
标题：闭环Transformer：作为迭代潜在平衡的自回归模型
链接：https://arxiv.org/abs/2511.21882

作者：Akbar Anbar Jafari,Gholamreza Anbarjafari
备注：22 pages, 1 figure, 1 table
摘要：当代自回归Transformer以开环方式运行：每个隐藏状态在单次前向传递中计算且此后不再修正，导致错误未经纠正地沿序列传播。我们认为这种开环瓶颈是一个根本性的架构限制，是长程推理、事实一致性和多步规划中大量已知失败的根源。为了解决这一限制，我们引入了闭环预测原理，要求模型在确定每个token之前迭代地细化潜在表示，直至达到自洽平衡。我们将此原理实例化为均衡Transformer（EqT），它为标准Transformer层增加了一个均衡细化模块，该模块通过潜在空间中的梯度下降来最小化一个学习得到的能量函数。能量函数约束双向预测一致性、情景记忆连贯性和输出置信度，所有这些都在没有外部监督的情况下计算。在理论上，我们证明了EqT在基于能量的潜在模型中执行近似MAP推理，建立了线性收敛保证，并表明细化恰好在单次推理次优的困难实例上改进预测。该框架将深度均衡模型、扩散语言模型和测试时训练统一为特例。在二进制奇偶校验任务上的初步实验表明，在具有挑战性的序列上平均提高了+3.28%，在标准Transformer接近随机水平的情况下增益达到+8.07%，验证了反复推敲带来的收益随任务难度增加而增大。正如注意力机制解决了循环网络的顺序瓶颈一样，我们认为闭环均衡可能解决开环自回归的承诺瓶颈，这是迈向语言模型的基础性一步。
摘要：Contemporary autoregressive transformers operate in open loop: each hidden state is computed in a single forward pass and never revised, causing errors to propagate uncorrected through the sequence. We identify this open-loop bottleneck as a fundamental architectural limitation underlying well-documented failures in long-range reasoning, factual consistency, and multi-step planning. To address this limitation, we introduce the closed-loop prediction principle, which requires that models iteratively refine latent representations until reaching a self-consistent equilibrium before committing to each token. We instantiate this principle as Equilibrium Transformers (EqT), which augment standard transformer layers with an Equilibrium Refinement Module that minimizes a learned energy function via gradient descent in latent space. The energy function enforces bidirectional prediction consistency, episodic memory coherence, and output confidence, all computed without external supervision. Theoretically, we prove that EqT performs approximate MAP inference in a latent energy-based model, establish linear convergence guarantees, and show that refinement improves predictions precisely on hard instances where one-shot inference is suboptimal. The framework unifies deep equilibrium models, diffusion language models, and test-time training as special cases. Preliminary experiments on the binary parity task demonstrate +3.28% average improvement on challenging sequences, with gains reaching +8.07% where standard transformers approach random performance, validating that the benefit of deliberation scales with task difficulty. Just as attention mechanisms resolved the sequential bottleneck of recurrent networks, we propose that closed-loop equilibrium may resolve the commitment bottleneck of open-loop autoregression, representing a foundational step toward language models.

GAN|对抗|攻击|生成相关(8篇)

【1】Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation
标题：使用单次概率前向传递和代码生成加速贝叶斯神经网络的执行
链接：https://arxiv.org/abs/2511.23440

作者：Bernhard Klein,Falk Selker,Hendrik Borras,Sophie Steger,Franz Pernkopf,Holger Fröning
摘要：机器学习模型在诊断、天气预报、自然语言处理和自动驾驶等领域表现良好，但其有限的不确定性处理限制了在安全关键环境中的使用。传统的神经网络通常无法检测到域外（OOD）数据，并且可能输出自信但不正确的预测。贝叶斯神经网络（BNN）通过提供概率估计来解决这个问题，但由于预测需要采样权重分布和多个前向传递，因此计算成本很高。概率前向传递（PFP）通过假设高斯分布的权重和激活，提供了对随机变分推理（SVI）的高效近似，实现了完全解析的不确定性传播，并用单个确定性前向传递代替采样。我们提出了一个端到端的管道，用于在嵌入式ARM CPU上训练、编译、优化和部署基于PFP的BNN。使用TVM深度学习编译器，我们为多层感知器和卷积神经网络实现了一个专用的高斯传播算子库，并结合了手动和自动调整策略。消融研究表明，PFP在计算效率方面始终优于SVI，在小批量的情况下实现了高达4200倍的加速。PFP-BNN在精度、不确定性估计和OOD检测方面与Dirty-MNIST上的SVI-BNN相匹配，同时大大降低了计算成本。这些结果突出了贝叶斯近似与代码生成相结合的潜力，使高效的BNN部署资源受限的系统。
摘要：Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confident yet incorrect predictions. Bayesian neural networks (BNNs) address this by providing probabilistic estimates, but incur high computational cost because predictions require sampling weight distributions and multiple forward passes. The Probabilistic Forward Pass (PFP) offers a highly efficient approximation to Stochastic Variational Inference (SVI) by assuming Gaussian-distributed weights and activations, enabling fully analytic uncertainty propagation and replacing sampling with a single deterministic forward pass. We present an end-to-end pipeline for training, compiling, optimizing, and deploying PFP-based BNNs on embedded ARM CPUs. Using the TVM deep learning compiler, we implement a dedicated library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, combined with manual and automated tuning strategies. Ablation studies show that PFP consistently outperforms SVI in computational efficiency, achieving speedups of up to 4200x for small mini-batches. PFP-BNNs match SVI-BNNs on Dirty-MNIST in accuracy, uncertainty estimation, and OOD detection while greatly reducing compute cost. These results highlight the potential of combining Bayesian approximations with code generation to enable efficient BNN deployment on resource-constrained systems.
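
The core idea of replacing sampling with one deterministic pass can be illustrated with textbook moment propagation through a single linear layer. The sketch below assumes all weights, biases, and inputs are independent Gaussians described by a mean and a variance. It is not the paper's TVM operator library, and it omits the moment matching that nonlinear activations would need. The function name `gaussian_linear` is hypothetical.

```python
def gaussian_linear(x_mean, x_var, w_mean, w_var, b_mean, b_var):
    """Propagate means and variances through y = W x + b in one pass.

    For independent w and x:
      E[w x]   = E[w] E[x]
      Var[w x] = E[w]^2 Var[x] + Var[w] E[x]^2 + Var[w] Var[x]
    """
    out_mean, out_var = [], []
    for j in range(len(b_mean)):
        mean = b_mean[j]
        var = b_var[j]
        for i in range(len(x_mean)):
            mean += w_mean[j][i] * x_mean[i]
            var += (w_mean[j][i] ** 2 * x_var[i]
                    + w_var[j][i] * x_mean[i] ** 2
                    + w_var[j][i] * x_var[i])
        out_mean.append(mean)
        out_var.append(var)
    return out_mean, out_var


# One output unit, two inputs; the input is deterministic (zero variance).
mean, var = gaussian_linear(
    x_mean=[1.0, 2.0], x_var=[0.0, 0.0],
    w_mean=[[0.5, -1.0]], w_var=[[0.1, 0.2]],
    b_mean=[0.25], b_var=[0.0],
)
print(mean, var)  # mean is [-1.25]; variance is approximately [0.9]
```

A sampling-based predictor would draw many weight samples and run one forward pass per sample to estimate the same two quantities, which is where the cost gap described in the abstract comes from.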

【2】Adversarial Training for Process Reward Models
标题：流程奖励模型的对抗性训练
链接：https://arxiv.org/abs/2511.22888

作者：Gurusha Juneja,Deepak Nathani,William Yang Wang
摘要：过程奖励模型（PRM）通过提供步骤级监督来增强LLM的推理能力。然而，由于手动步骤级标注成本高昂，且静态训练数据对新型错误的泛化能力差，它们的广泛采用受到限制。我们引入了对抗训练的PRM（APRM），其中生成器（$G$）学习产生推理错误来欺骗PRM（$R$），而$R$同时学习检测这些错误。这种相互作用为$R$产生越来越难的负样本，在不需要手动步骤级标签的情况下提高其鲁棒性以及对新型错误的泛化能力。在多个数学推理基准上取平均，APRM将求解器的准确率比最强的PRM基线提高了$+3.4$个百分点（pp）。APRM在分布外任务上实现了$+5.3$ pp的提升。
摘要：Process Reward Models (PRMs) enhance the reasoning ability of LLMs by providing step-level supervision. However, their widespread adoption is limited by expensive manual step-level annotation and by the poor generalization of static training data to novel errors. We introduce Adversarially Trained PRMs (APRM), where a Generator ($G$) learns to produce reasoning errors to deceive a PRM ($R$), while $R$ concurrently learns to detect them. This interaction yields progressively harder negatives for $R$, improving its robustness and generalization to novel errors without requiring manual step-level labels. Averaged across diverse mathematical reasoning benchmarks, APRM improves solver accuracy by $+3.4$ percentage points (pp) over the strongest PRM baseline. APRM achieves gains of $+5.3$ pp on out-of-distribution tasks.

【3】CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning
标题：因果关系剖析器：生成合成基准，以严格透明地评估因果机器学习
链接：https://arxiv.org/abs/2511.22842

作者：Panayiotis Panayiotou,Audrey Poinsot,Alessandro Leite,Nicolas Chesneau,Marc Schoenauer,Özgür Şimşek
摘要：因果机器学习（Causal ML）旨在使用机器学习算法回答“如果……会怎样”的问题，使其成为高风险决策的有前途的工具。然而，因果ML的经验评估实践仍然有限。现有的基准通常依赖于少数手工制作或半合成的数据集，导致结论脆弱且不可推广。为了弥补这一差距，我们引入了CausalProfiler，这是一个面向因果ML方法的合成基准生成器。基于一组关于所考虑的因果模型类别、查询和数据的显式设计选择，CausalProfiler随机采样构成合成因果基准的因果模型、数据、查询和真实值。通过这种方式，因果ML方法可以在各种条件下得到严格而透明的评估。这项工作提供了第一个具有覆盖保证和透明假设的合成因果基准随机生成器，覆盖因果推理的三个层次：观察、干预和反事实。我们在不同的条件和假设下（包括可识别范围之内和之外）评估了几种最先进的方法，以证明其实用性，并展示了CausalProfiler能够支持的分析和见解类型。
摘要：Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms, making it a promising tool for high-stakes decision-making. Yet, empirical evaluation practices in Causal ML remain limited. Existing benchmarks often rely on a handful of hand-crafted or semi-synthetic datasets, leading to brittle, non-generalizable conclusions. To bridge this gap, we introduce CausalProfiler, a synthetic benchmark generator for Causal ML methods. Based on a set of explicit design choices about the class of causal models, queries, and data considered, the CausalProfiler randomly samples causal models, data, queries, and ground truths constituting the synthetic causal benchmarks. In this way, Causal ML methods can be rigorously and transparently evaluated under a variety of conditions. This work offers the first random generator of synthetic causal benchmarks with coverage guarantees and transparent assumptions operating on the three levels of causal reasoning: observation, intervention, and counterfactual. We demonstrate its utility by evaluating several state-of-the-art methods under diverse conditions and assumptions, both in and out of the identification regime, illustrating the types of analyses and insights the CausalProfiler enables.

【4】VeriDispatcher: Multi-Model Dispatching through Pre-Inference Difficulty Prediction for RTL Generation Optimization
标题：VeriDispatcher：通过RTL生成优化的预推理难度预测进行多模型调度
链接：https://arxiv.org/abs/2511.22749

作者：Zeng Wang,Weihua Xiao,Minghao Shao,Raghu Vamshi Hemadri,Ozgur Sinanoglu,Muhammad Shafique,Ramesh Karri
摘要：大型语言模型（LLM）在RTL生成方面表现出很强的性能，但由于架构和训练的差异，不同的模型在不同的任务上表现出色。以前的工作主要是提示或微调一个单一的模型。目前还没有很好地研究的是如何协调多个不同的LLM，以便它们共同提高RTL质量，同时降低成本，而不是运行所有模型并选择最佳输出。我们将其定义为多LLM RTL生成问题。我们提出了VeriDispatcher，一个多LLM RTL生成框架，根据预推理难度预测将每个RTL任务分派到合适的LLM。对于每个模型，我们训练一个紧凑的分类器的语义嵌入的任务描述，使用的难度分数来自基准变量，结合语法，结构相似性和功能正确性。在推理时，VeriDispatcher使用这些预测器将任务路由到选定的LLM子集。在RTLLM和VerilogEval上的10个不同的LLM中，VeriDispatcher在RTLLM上仅使用40%的商业调用就实现了高达18%的准确性提高，而在VerilogEval上保持准确性的同时减少了25%的商业使用，从而实现了硬件设计自动化中具有成本效益的高质量LLM部署。
摘要：Large Language Models (LLMs) show strong performance in RTL generation, but different models excel on different tasks because of architecture and training differences. Prior work mainly prompts or finetunes a single model. What remains not well studied is how to coordinate multiple different LLMs so they jointly improve RTL quality while also reducing cost, instead of running all models and choosing the best output. We define this as the multi-LLM RTL generation problem. We propose VeriDispatcher, a multi-LLM RTL generation framework that dispatches each RTL task to suitable LLMs based on pre-inference difficulty prediction. For each model, we train a compact classifier over semantic embeddings of task descriptions, using difficulty scores derived from benchmark variants that combine syntax, structural similarity, and functional correctness. At inference, VeriDispatcher uses these predictors to route tasks to a selected subset of LLMs. Across 10 diverse LLMs on RTLLM and VerilogEval, VeriDispatcher achieves up to 18% accuracy improvement on RTLLM using only 40% of commercial calls, and on VerilogEval maintains accuracy while reducing commercial usage by 25%, enabling cost-effective, high-quality LLM deployment in hardware design automation.

【5】Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra
标题：生成锚定场：通过出现速度场和传输代数的受控数据生成
链接：https://arxiv.org/abs/2511.22693

作者：Deressa Wodajo Deressa,Hannes Mareen,Peter Lambert,Glenn Van Wallendael
备注：20 pages, 21 figures
摘要：我们提出了生成锚定场（GAF），这是一种生成模型，它学习相互独立的端点预测器$J$（噪声）和$K$（数据），而不是轨迹预测器。速度场$v=K-J$由二者在时间条件下的差异产生。这种分解使传输代数（Transport Algebra）成为可能：对学习到的$\{(J_n,K_n)\}_{n=1}^N$头进行代数运算，以实现组合控制。借助类别特定的$K_n$头，GAF支持共享基础分布与多个模态之间丰富的定向传输映射族，通过向量运算实现可控插值、混合生成和语义变形。我们实现了较强的样本质量（在CelebA-HQ $64\times 64$上FID为7.5），同时独特地将组合生成作为架构原语提供。我们进一步证明，GAF在初始状态和最终状态之间可以实现无损的循环传输，LPIPS=$0.0$。代码可在https://github.com/IDLabMedia/GAF获得
摘要：We present Generative Anchored Fields (GAF), a generative model that learns independent endpoint predictors $J$ (noise) and $K$ (data) rather than a trajectory predictor. The velocity field $v=K-J$ emerges from their time-conditioned disagreement. This factorization enables Transport Algebra: algebraic operations on learned $\{(J_n,K_n)\}_{n=1}^N$ heads for compositional control. With class-specific $K_n$ heads, GAF supports a rich family of directed transport maps between a shared base distribution and multiple modalities, enabling controllable interpolation, hybrid generation, and semantic morphing through vector arithmetic. We achieve strong sample quality (FID 7.5 on CelebA-HQ $64\times 64$) while uniquely providing compositional generation as an architectural primitive. We further demonstrate that GAF achieves lossless cyclic transport between its initial and final states, with LPIPS=$0.0$. Code is available at https://github.com/IDLabMedia/GAF

【6】Adversarial Flow Models
Link: https://arxiv.org/abs/2511.22475

Authors: Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
Abstract: We present adversarial flow models, a class of generative models that unifies adversarial models and flow models. Our method supports native one-step or multi-step generation and is trained using the adversarial objective. Unlike traditional GANs, where the generator learns an arbitrary transport plan between the noise and the data distributions, our generator learns a deterministic noise-to-data mapping, which is the same optimal transport as in flow-matching models. This significantly stabilizes adversarial training. Also, unlike consistency-based methods, our model directly learns one-step or few-step generation without needing to learn the intermediate timesteps of the probability flow for propagation. This saves model capacity, reduces training iterations, and avoids error accumulation. Under the same 1NFE setting on ImageNet-256px, our B/2 model approaches the performance of consistency-based XL/2 models, while our XL/2 model sets a new best FID of 2.38. We additionally show the possibility of end-to-end training of 56-layer and 112-layer models through depth repetition without any intermediate supervision, and achieve FIDs of 2.08 and 1.94 using a single forward pass, surpassing their 2NFE and 4NFE counterparts.

【7】ABLE: Using Adversarial Pairs to Construct Local Models for Explaining Model Predictions
Link: https://arxiv.org/abs/2511.21952

Authors: Krishna Khadka, Sunny Shree, Pujan Budhathoki, Yu Lei, Raghu Kacker, D. Richard Kuhn
Comments: 10 pages, 2 figures. Accepted to KDD 2026 (Research Track)
Abstract: Machine learning models are increasingly used in critical applications but are mostly "black boxes" due to their lack of transparency. Local explanation approaches, such as LIME, address this issue by approximating the behavior of complex models near a test instance using simple, interpretable models. However, these approaches often suffer from instability and poor local fidelity. In this paper, we propose a novel approach called Adversarially Bracketed Local Explanation (ABLE) to address these limitations. Our approach first generates a set of neighborhood points near the test instance, x_test, by adding bounded Gaussian noise. For each neighborhood point D, we apply an adversarial attack to generate an adversarial point A with minimal perturbation that results in a different label than D. A second adversarial attack is then performed on A to generate a point A' that has the same label as D (and thus a different label from A). The points A and A' form an adversarial pair that brackets the local decision boundary for x_test. We then train a linear model on these adversarial pairs to approximate the local decision boundary. Experimental results on six UCI benchmark datasets across three deep neural network architectures demonstrate that our approach achieves higher stability and fidelity than state-of-the-art methods.

【8】Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings
Link: https://arxiv.org/abs/2511.21893

Authors: Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang, Ananth Balashankar, Amir Aminifar
Abstract: Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions (Zhang et al., 2025), where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract the effects of adversarial illusions, we propose a task-agnostic mitigation mechanism that reconstructs the input from the attacker's perturbed input through generative models, e.g., Variational Autoencoders (VAEs), to maintain natural alignment. To further enhance our proposed defense mechanism, we adopt a generative sampling strategy combined with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on state-of-the-art multi-modal encoders show that our approach substantially reduces the illusion attack success rates to near-zero and improves cross-modal alignment by 4% (42 to 46) and 11% (32 to 43) in unperturbed and perturbed input settings respectively, providing an effective and model-agnostic defense against adversarial illusions.
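
The consensus step described in this abstract can be illustrated with a small sketch. The abstract does not state the aggregation rule, so a plain majority vote is assumed here; `consensus_label` and the toy classifier are hypothetical names, not part of the authors' code.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def consensus_label(
    reconstructions: Sequence[object],
    classify: Callable[[object], Hashable],
) -> Hashable:
    """Return the label that most reconstructed samples agree on.

    Assumption: majority voting stands in for the paper's
    consensus-based aggregation, whose exact form is not given.
    """
    votes = Counter(classify(sample) for sample in reconstructions)
    # most_common(1) returns a one-element list [(label, count)].
    label, _count = votes.most_common(1)[0]
    return label


# Toy usage: numbers stand in for generated reconstructions, and a
# threshold at zero stands in for the downstream classifier.
samples = [0.9, 1.1, -0.2, 0.8, 1.3]
winner = consensus_label(samples, lambda s: "cat" if s > 0 else "dog")
print(winner)
```

Voting over several sampled reconstructions means that one reconstruction still carrying the perturbation is outvoted by the others, which is the intuition the abstract appeals to.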

Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (11 papers)

【1】Heteroscedastic Neural Networks for Path Loss Prediction with Link-Specific Uncertainty
Link: https://arxiv.org/abs/2511.23243

Authors: Jonathan Ethier
Comments: Submitted to IEEE AWPL in December 2025. 5 pages, 2 figures, 4 tables
Abstract: Traditional and modern machine learning-based path loss models typically assume a constant prediction variance. We propose a neural network that jointly predicts the mean and link-specific variance by minimizing a Gaussian negative log-likelihood, enabling heteroscedastic uncertainty estimates. We compare shared, partially shared, and independent-parameter architectures using accuracy, calibration, and sharpness metrics on blind test sets from large public RF drive-test datasets. The shared-parameter architecture performs best, achieving an RMSE of 7.4 dB, 95.1 percent coverage for 95 percent prediction intervals, and a mean interval width of 29.6 dB. These uncertainty estimates further support link-specific coverage margins, improve RF planning and interference analyses, and provide effective self-diagnostics of model weaknesses.
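
As a rough illustration of the quantities named in this abstract, the sketch below computes a per-sample Gaussian negative log-likelihood, the empirical coverage of a symmetric prediction interval, and the mean interval width. The function names and the toy numbers are assumptions for illustration; the network that would produce the means and variances is not shown.

```python
import math
from typing import Sequence


def gaussian_nll(y: float, mean: float, variance: float) -> float:
    """Negative log-likelihood of y under a Gaussian N(mean, variance)."""
    return 0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)


def interval_coverage(
    ys: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    z: float = 1.96,
) -> float:
    """Fraction of targets inside mean +/- z * std (z = 1.96 is about 95%)."""
    hits = sum(
        abs(y - m) <= z * math.sqrt(v) for y, m, v in zip(ys, means, variances)
    )
    return hits / len(ys)


def mean_interval_width(variances: Sequence[float], z: float = 1.96) -> float:
    """Average width of the symmetric interval, 2 * z * std per sample."""
    return sum(2.0 * z * math.sqrt(v) for v in variances) / len(variances)


# Toy usage with made-up path loss values in dB.
ys = [100.0, 120.0, 95.0]
means = [102.0, 110.0, 96.0]
variances = [16.0, 9.0, 25.0]
print(sum(gaussian_nll(y, m, v) for y, m, v in zip(ys, means, variances)) / len(ys))
print(interval_coverage(ys, means, variances))
print(mean_interval_width(variances))
```

Coverage measures calibration and interval width measures sharpness, so a heteroscedastic model is typically judged on both together.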

【2】Modeling Chaotic Pedestrian Behavior Using Chaos Indicators and Supervised Learning
Link: https://arxiv.org/abs/2511.22887

Authors: Md. Muhtashim Shahrier, Nazmul Haque, Md Asif Raihan, Md. Hadiuzzaman
Abstract: As cities around the world aim to improve walkability and safety, understanding the irregular and unpredictable nature of pedestrian behavior has become increasingly important. This study introduces a data-driven framework for modeling chaotic pedestrian movement using empirically observed trajectory data and supervised learning. Videos were recorded during both daytime and nighttime conditions to capture pedestrian dynamics under varying ambient and traffic contexts. Pedestrian trajectories were extracted through computer vision techniques, and behavioral chaos was quantified using four chaos metrics: Approximate Entropy and Lyapunov Exponent, each computed for both velocity and direction change. Principal Component Analysis (PCA) was then applied to consolidate these indicators into a unified chaos score. A comprehensive set of individual, group-level, and contextual traffic features was engineered and used to train Random Forest and CatBoost regression models. CatBoost models consistently achieved superior performance. The best daytime PCA-based CatBoost model reached an R^2 of 0.8319, while the nighttime PCA-based CatBoost model attained an R^2 of 0.8574. SHAP analysis highlighted that features such as distance traveled, movement duration, and speed variability were robust contributors to chaotic behavior. The proposed framework enables practitioners to quantify and anticipate behavioral instability in real-world settings. Planners and engineers can use chaos scores to identify high-risk pedestrian zones, appraise infrastructure improvements, and calibrate realistic microsimulation models. The approach also supports adaptive risk assessment in automated vehicle systems by capturing short-term motion unpredictability grounded in observable, interpretable features.

【3】A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees
Link: https://arxiv.org/abs/2511.22823

Authors: Miao Zhang, Junpeng Li, Changchun Hua, Yana Yang
Abstract: Weakly supervised learning has emerged as a practical alternative to fully supervised learning when complete and accurate labels are costly or infeasible to acquire. However, many existing methods are tailored to specific supervision patterns -- such as positive-unlabeled (PU), unlabeled-unlabeled (UU), complementary-label (CLL), partial-label (PLL), or similarity-unlabeled annotations -- and rely on post-hoc corrections to mitigate instability induced by indirect supervision. We propose a principled, unified framework that bypasses such post-hoc adjustments by directly formulating a stable surrogate risk grounded in the structure of weakly supervised data. The formulation naturally subsumes diverse settings -- including PU, UU, CLL, PLL, multi-class unlabeled, and tuple-based learning -- under a single optimization objective. We further establish a non-asymptotic generalization bound via Rademacher complexity that clarifies how supervision structure, model capacity, and sample size jointly govern performance. Beyond this, we analyze the effect of class-prior misspecification on the bound, deriving explicit terms that quantify its impact, and we study identifiability, giving sufficient conditions -- most notably via supervision stratification across groups -- under which the target risk is recoverable. Extensive experiments show consistent gains across class priors, dataset scales, and class counts -- without heuristic stabilization -- while exhibiting robustness to overfitting.

【4】Structure-aware Hybrid-order Similarity Learning for Multi-view Unsupervised Feature Selection
Link: https://arxiv.org/abs/2511.22656

Authors: Lin Xu, Ke Li, Dongjie Wang, Fengmao Lv, Tianrui Li, Yanyong Huang
Abstract: Multi-view unsupervised feature selection (MUFS) has recently emerged as an effective dimensionality reduction method for unlabeled multi-view data. However, most existing methods mainly use first-order similarity graphs to preserve local structure, often overlooking the global structure that can be captured by second-order similarity. In addition, a few MUFS methods leverage predefined second-order similarity graphs, making them vulnerable to noise and outliers and resulting in suboptimal feature selection performance. In this paper, we propose a novel MUFS method, termed Structure-aware Hybrid-order sImilarity learNing for multi-viEw unsupervised Feature Selection (SHINE-FS), to address the aforementioned problems. SHINE-FS first learns consensus anchors and the corresponding anchor graph to capture the cross-view relationships between the anchors and the samples. Based on the acquired cross-view consensus information, it generates low-dimensional representations of the samples, which facilitate the reconstruction of multi-view data by identifying discriminative features. Subsequently, it employs the anchor-sample relationships to learn a second-order similarity graph. Furthermore, by jointly learning first-order and second-order similarity graphs, SHINE-FS constructs a hybrid-order similarity graph that captures both local and global structures, thereby revealing the intrinsic data structure to enhance feature selection. Comprehensive experimental results on real multi-view datasets show that SHINE-FS outperforms the state-of-the-art methods.

【5】Where to Measure: Epistemic Uncertainty-Based Sensor Placement with ConvCNPs
Link: https://arxiv.org/abs/2511.22567

Authors: Feyza Eksen, Stefan Oehmcke, Stefan Lüdtke
Abstract: Accurate sensor placement is critical for modeling spatio-temporal systems such as environmental and climate processes. Neural Processes (NPs), particularly Convolutional Conditional Neural Processes (ConvCNPs), provide scalable probabilistic models with uncertainty estimates, making them well-suited for data-driven sensor placement. However, existing approaches rely on total predictive uncertainty, which conflates epistemic and aleatoric components and may lead to suboptimal sensor selection in ambiguous regions. To address this, we propose expected reduction in epistemic uncertainty as a new acquisition function for sensor placement. To enable this, we extend ConvCNPs with a Mixture Density Network (MDN) output head for epistemic uncertainty estimation. Preliminary results suggest that epistemic-uncertainty-driven sensor placement reduces model error more effectively than approaches based on overall uncertainty.
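
One way to see how a mixture output head can separate the two uncertainty types is the law of total variance applied to a 1-D Gaussian mixture. The sketch below is an assumption about the general idea, not the paper's stated formula: the within-component variance is read as aleatoric and the spread of the component means as epistemic. `decompose_mixture_variance` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def decompose_mixture_variance(
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> Tuple[float, float]:
    """Split the variance of a 1-D Gaussian mixture into two terms.

    Returns (aleatoric, epistemic), where
      aleatoric = sum_k w_k * var_k             (expected component variance)
      epistemic = sum_k w_k * (mu_k - mu_bar)^2 (variance of component means)
    Their sum equals the total mixture variance.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalise in case weights do not sum to 1
    mix_mean = sum(wi * mi for wi, mi in zip(w, means))
    aleatoric = sum(wi * vi for wi, vi in zip(w, variances))
    epistemic = sum(wi * (mi - mix_mean) ** 2 for wi, mi in zip(w, means))
    return aleatoric, epistemic


# Toy usage: two equally weighted components that disagree about the mean.
aleatoric, epistemic = decompose_mixture_variance(
    weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[0.25, 0.25]
)
print(aleatoric, epistemic)  # 0.25 1.0
```

Under this reading, a candidate sensor location would be scored by how much the second term is expected to shrink, while noise that no measurement can remove stays in the first term.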

【6】TS2Vec-Ensemble: An Enhanced Self-Supervised Framework for Time Series Forecasting
Link: https://arxiv.org/abs/2511.22395

Authors: Ganeshan Niroshan, Uthayasanker Thayasivam
Abstract: Self-supervised representation learning, particularly through contrastive methods like TS2Vec, has advanced the analysis of time series data. However, these models often falter in forecasting tasks because their objective functions prioritize instance discrimination over capturing the deterministic patterns, such as seasonality and trend, that are critical for accurate prediction. This paper introduces TS2Vec-Ensemble, a novel hybrid framework designed to bridge this gap. Our approach enhances the powerful, implicitly learned dynamics from a pretrained TS2Vec encoder by fusing them with explicit, engineered time features that encode periodic cycles. This fusion is achieved through a dual-model ensemble architecture, where two distinct regression heads -- one focused on learned dynamics and the other on seasonal patterns -- are combined using an adaptive weighting scheme. The ensemble weights are optimized independently for each forecast horizon, allowing the model to dynamically prioritize short-term dynamics or long-term seasonality as needed. We conduct extensive experiments on the ETT benchmark datasets for both univariate and multivariate forecasting. The results demonstrate that TS2Vec-Ensemble consistently and significantly outperforms the standard TS2Vec baseline and other state-of-the-art models, validating our hypothesis that a hybrid of learned representations and explicit temporal priors is a superior strategy for long-horizon time series forecasting.
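
The per-horizon weighting described in this abstract can be sketched as follows. The abstract says only that the weights are optimized independently for each horizon; the grid search over a single convex weight on validation MSE, the function names, and the horizon values 24 and 720 are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def best_weight(
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    target: Sequence[float],
    grid_steps: int = 20,
) -> float:
    """Grid-search w in [0, 1] minimising the MSE of w*pred_a + (1-w)*pred_b."""

    def mse(w: float) -> float:
        errors = [
            (w * a + (1.0 - w) * b - t) ** 2
            for a, b, t in zip(pred_a, pred_b, target)
        ]
        return sum(errors) / len(errors)

    candidates: List[float] = [i / grid_steps for i in range(grid_steps + 1)]
    return min(candidates, key=mse)


def fit_horizon_weights(val_preds: Dict[int, Triple]) -> Dict[int, float]:
    """Fit one weight per forecast horizon from validation predictions.

    val_preds maps horizon -> (dynamics_head_preds, seasonal_head_preds, targets).
    """
    return {h: best_weight(*triple) for h, triple in val_preds.items()}


# Toy usage: the first head is exact at the short horizon and the
# second head is exact at the long one.
val = {
    24: ([1.0, 2.0], [0.0, 0.0], [1.0, 2.0]),
    720: ([0.0, 0.0], [3.0, 4.0], [3.0, 4.0]),
}
print(fit_horizon_weights(val))  # {24: 1.0, 720: 0.0}
```

Fitting the weight separately per horizon lets the combination lean on one head for short-range forecasts and on the other for long-range ones without retraining either head.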

【7】Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning
Link: https://arxiv.org/abs/2511.22344

Authors: Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Bernhard Sick
Comments: Submitted to CVPR
Abstract: Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even AL cycles. Committing to a single strategy risks suboptimal performance, as no single strategy dominates throughout the entire AL process. We introduce REFINE, an ensemble AL method that combines multiple strategies without knowing in advance which will perform best. In each AL cycle, REFINE operates in two stages: (1) Progressive filtering iteratively refines the unlabeled pool by considering an ensemble of AL strategies, retaining promising candidates capturing different notions of value. (2) Coverage-based selection then chooses a final batch from this refined pool, ensuring all previously identified notions of value are accounted for. Extensive experiments across 6 classification datasets and 3 foundation models show that REFINE consistently outperforms individual strategies and existing ensemble methods. Notably, progressive filtering serves as a powerful preprocessing step that improves the performance of any individual AL strategy applied to the refined pool, which we demonstrate on an audio spectrogram classification use case. Finally, the ensemble of REFINE can be easily extended with upcoming state-of-the-art AL strategies.

【8】Structure is Supervision: Multiview Masked Autoencoders for Radiology
Link: https://arxiv.org/abs/2511.22294

Authors: Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt
Abstract: Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE consistently outperforms supervised and vision-language baselines. Furthermore, MVMAE-V2T provides additional gains, particularly in low-label regimes where structured textual supervision is most beneficial. Together, these results establish the importance of structural and textual supervision as complementary paths toward scalable, clinically grounded medical foundation models.

【9】An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface
Link: https://arxiv.org/abs/2511.22108

Authors: Zhou Biyan, Arindam Basu
Abstract: The number of simultaneously recorded neurons follows an exponentially increasing trend in implantable brain-machine interfaces (iBMIs). Integrating the neural decoder in the implant is an effective data compression method for future wireless iBMIs. However, the non-stationarity of the system makes the performance of the decoder unreliable. To avoid frequent retraining of the decoder and to ensure the safety and comfort of the iBMI user, continuous learning is essential for real-life applications. Since Deep Spiking Neural Networks (DSNNs) are being recognized as a promising approach for developing resource-efficient neural decoders, we propose continuous learning approaches with Reinforcement Learning (RL) algorithms adapted for DSNNs. Banditron and AGREL are chosen as the two candidate RL algorithms since they can be trained with limited computational resources, effectively addressing the non-stationary problem and fitting the energy constraints of implantable devices. To assess the effectiveness of the proposed methods, we conducted both open-loop and closed-loop experiments. The accuracy of open-loop experiments conducted with DSNN Banditron and DSNN AGREL remains stable over extended periods. Meanwhile, in the closed-loop experiment with perturbations, DSNN Banditron achieved a time-to-target comparable to that of DSNN AGREL while reducing memory access usage by 98% and the multiply-and-accumulate (MAC) operations required during training by 99%. Compared to previous continuous learning SNN decoders, DSNN Banditron requires 98% less computation, making it a prime candidate for future wireless iBMI systems.
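
For readers unfamiliar with Banditron, the sketch below implements the classical linear Banditron update, which learns a multiclass predictor from a correct/incorrect signal only. It is not the DSNN variant from the paper; the class layout, parameter names, and defaults are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


class Banditron:
    """Classical linear Banditron (not the paper's spiking-network variant)."""

    def __init__(self, n_classes: int, n_features: int, gamma: float = 0.1, seed: int = 0):
        self.k = n_classes
        self.gamma = gamma  # exploration rate
        self.W: List[List[float]] = [[0.0] * n_features for _ in range(n_classes)]
        self.rng = random.Random(seed)

    def greedy(self, x: Sequence[float]) -> int:
        """Class with the highest linear score (first index wins ties)."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return max(range(self.k), key=scores.__getitem__)

    def act(self, x: Sequence[float]) -> Tuple[int, int, List[float]]:
        """Return (sampled_action, greedy_action, sampling_probabilities)."""
        greedy = self.greedy(x)
        probs = [self.gamma / self.k] * self.k
        probs[greedy] += 1.0 - self.gamma
        action = self.rng.choices(range(self.k), weights=probs)[0]
        return action, greedy, probs

    def update(self, x, action: int, greedy: int, probs, correct: bool) -> None:
        """Bandit update: only whether the sampled action was right is known."""
        for r in range(self.k):
            coeff = -1.0 if r == greedy else 0.0
            if correct and r == action:
                coeff += 1.0 / probs[r]  # importance-weighted reward term
            if coeff:
                self.W[r] = [w + coeff * xi for w, xi in zip(self.W[r], x)]


# Toy usage: one interaction where the sampled action turns out wrong.
model = Banditron(n_classes=2, n_features=2, gamma=0.0)
x = [1.0, 0.0]
action, greedy, probs = model.act(x)
model.update(x, action, greedy, probs, correct=False)
print(model.greedy(x))  # 1
```

Each update touches at most two weight rows and needs no gradient through the full label vector, which gives a sense of why such a rule is attractive under tight memory and compute budgets.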

【10】MRI-Based Brain Age Estimation with Supervised Contrastive Learning of Continuous Representation
Link: https://arxiv.org/abs/2511.22102

Authors: Simon Joseph Clément Crête, Marta Kersten-Oertel, Yiming Xiao
Abstract: MRI-based brain age estimation models aim to assess a subject's biological brain age based on information such as neuroanatomical features. Various factors, including neurodegenerative diseases, can accelerate brain aging, and measuring this phenomenon could serve as a potential biomarker for clinical applications. While deep learning (DL)-based regression has recently attracted major attention, existing approaches often fail to capture the continuous nature of neuromorphological changes, potentially resulting in sub-optimal feature representation and results. To address this, we propose to use supervised contrastive learning with the recent Rank-N-Contrast (RNC) loss to estimate brain age based on widely used T1w structural MRI for the first time and leverage Grad-RAM to visually explain regression results. Experiments show that our proposed method achieves a mean absolute error (MAE) of 4.27 years and an $R^2$ of 0.93 with a limited dataset of training samples, significantly outperforming conventional deep regression with the same ResNet backbone while performing better than or comparably to state-of-the-art methods trained on significantly larger datasets. Furthermore, Grad-RAM revealed more nuanced features related to age regression with the RNC loss than conventional deep regression. As an exploratory study, we employed the proposed method to estimate the gap between the biological and chronological brain ages in Alzheimer's Disease and Parkinson's disease patients, and revealed the correlation between the brain age gap and disease severity, demonstrating its potential as a biomarker in neurodegenerative disorders.
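
The evaluation quantities in this abstract (MAE, $R^2$, and the brain age gap) are simple to compute. The sketch below uses made-up ages and assumes the gap is defined as predicted minus chronological age, which the abstract does not spell out.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def brain_age_gap(chronological: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Assumed definition: predicted (biological) age minus chronological age."""
    return [p - c for c, p in zip(chronological, predicted)]


# Toy usage with invented ages in years.
ages = [60.0, 70.0, 80.0]
preds = [62.0, 69.0, 83.0]
print(mean_absolute_error(ages, preds))  # 2.0
print(r_squared(ages, preds))
print(brain_age_gap(ages, preds))  # [2.0, -1.0, 3.0]
```

A positive gap under this convention would indicate a brain that looks older than the subject's chronological age.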

【11】Unsupervised Anomaly Detection for Smart IoT Devices: Performance and Resource Comparison
Link: https://arxiv.org/abs/2511.21842

Authors: Md. Sad Abdullah Sami, Mushfiquzzaman Abid
Abstract: The rapid expansion of Internet of Things (IoT) deployments across diverse sectors has significantly enhanced operational efficiency, yet concurrently elevated cybersecurity vulnerabilities due to increased exposure to cyber threats. Given the limitations of traditional signature-based Anomaly Detection Systems (ADS) in identifying emerging and zero-day threats, this study investigates the effectiveness of two unsupervised anomaly detection techniques, Isolation Forest (IF) and One-Class Support Vector Machine (OC-SVM), using the TON_IoT thermostat dataset. A comprehensive evaluation was performed based on standard metrics (accuracy, precision, recall, and F1-score) alongside critical resource utilization metrics such as inference time, model size, and peak RAM usage. Experimental results revealed that IF consistently outperformed OC-SVM, achieving higher detection accuracy, superior precision and recall, and a significantly better F1-score. Furthermore, Isolation Forest demonstrated a markedly superior computational footprint, making it more suitable for deployment on resource-constrained IoT edge devices. These findings underscore Isolation Forest's robustness in high-dimensional and imbalanced IoT environments and highlight its practical viability for real-time anomaly detection.

Transfer | Zero/Few/One-Shot | Adaptation (11 papers)

【1】ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts
Link: https://arxiv.org/abs/2511.23442

Authors: Hang Yu, Di Zhang, Qiwei Du, Yanping Zhao, Hai Zhang, Guang Chen, Eduardo E. Veas, Junqiao Zhao
Abstract: Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While trajectory stitching via generative models offers a promising solution, existing augmentation methods frequently produce trajectories that are either confined to the support of the behavior policy or violate the underlying dynamics, thereby limiting their effectiveness for policy improvement. We propose ASTRO, a data augmentation framework that generates distributionally novel and dynamics-consistent trajectories for offline RL. ASTRO first learns a temporal-distance representation to identify distinct and reachable stitch targets. We then employ a dynamics-guided stitch planner that adaptively generates connecting action sequences via Rollout Deviation Feedback, defined as the gap between the target state sequence and the state sequence actually reached by executing the predicted actions, to improve the feasibility and reachability of trajectory stitching. This approach facilitates effective augmentation through stitching and ultimately enhances policy learning. ASTRO outperforms prior offline RL augmentation methods across various algorithms, achieving notable performance gains on the challenging OGBench suite and demonstrating consistent improvements on standard offline RL benchmarks such as D4RL.

【2】ParaGate: Parasitic-Driven Domain Adaptation Transfer Learning for Netlist Performance Prediction
Link: https://arxiv.org/abs/2511.23340

Authors: Bin Sun, Jingyi Zhou, Jianan Mu, Zhiteng Chao, Tianmeng Yang, Ziyue Xu, Jing Ye, Huawei Li
Comments: 8 pages, 6 figures
Abstract: In traditional EDA flows, layout-level performance metrics are only obtainable after placement and routing, hindering global optimization at earlier stages. Although some neural-network-based solutions predict layout-level performance directly from netlists, they often face generalization challenges due to the black-box heuristics of commercial placement-and-routing tools, which create disparate data across designs. To this end, we propose ParaGate, a three-step cross-stage prediction framework that infers layout-level timing and power from netlists. First, we propose a two-phase transfer-learning approach to predict parasitic parameters, pre-training on mid-scale circuits and fine-tuning on larger ones to capture extreme conditions. Next, we rely on EDA tools for timing analysis, offloading the long-path numerical reasoning. Finally, ParaGate performs global calibration using subgraph features. Experiments show that ParaGate achieves strong generalization with minimal fine-tuning data: on openE906, its arrival-time R^2 improves from 0.119 to 0.897. These results demonstrate that ParaGate could provide guidance for global optimization in the synthesis and placement stages.

【3】BanglaSentNet: An Explainable Hybrid Deep Learning Framework for Multi-Aspect Sentiment Analysis with Cross-Domain Transfer Learning
标题：BanglaSentNet：一个可解释的混合深度学习框架，用于多方面情绪分析和跨领域迁移学习
链接：https://arxiv.org/abs/2511.23264

作者：Ariful Islam,Md Rifat Hossen,Tanvir Mahmud
备注：Submitted to Springer Nature Computer Science (SNCS) as an extended version of our ICDSAIA 2025 conference paper
摘要：由于标注数据集有限、形态复杂、语码混合现象以及域偏移问题，孟加拉语电子商务评论的多方面情感分析仍然具有挑战性，这影响着3亿孟加拉语使用者。现有方法缺乏对实际部署至关重要的可解释性和跨域泛化能力。我们提出了BanglaSentNet，这是一个可解释的混合深度学习框架，通过动态加权集成学习整合LSTM、BiLSTM、GRU和BanglaBERT，用于多方面情感分类。我们引入了一个数据集，包含来自孟加拉国主要电子商务平台的8,755条人工标注的孟加拉语产品评论，涵盖四个方面（质量、服务、价格、装饰）。我们的框架结合了基于SHAP的特征归因和注意力可视化，以提供透明的洞察。BanglaSentNet实现了85%的准确率和0.88的F1分数，比独立的深度学习模型高出3-7%，并大幅超过传统方法。可解释性套件获得了9.4/10的可解释性评分，人类一致率为87.6%。跨域迁移学习实验显示了稳健的泛化能力：零样本性能在不同领域（BanglaBook评论、社交媒体、一般电子商务、新闻标题）保持67-76%的有效性；使用500-1000个样本的少样本学习可达到完全微调性能的90-95%，显著降低了标注成本。实际部署证明了其对孟加拉国电子商务平台的实用价值，可支持定价优化、服务改进和客户体验提升方面的数据驱动决策。这项研究为孟加拉语情感分析建立了新的最先进基准，推进了面向低资源语言的集成学习方法，并为商业应用提供了可操作的解决方案。
摘要：Multi-aspect sentiment analysis of Bangla e-commerce reviews remains challenging due to limited annotated datasets, morphological complexity, code-mixing phenomena, and domain shift issues, affecting 300 million Bangla-speaking users. Existing approaches lack explainability and cross-domain generalization capabilities crucial for practical deployment. We present BanglaSentNet, an explainable hybrid deep learning framework integrating LSTM, BiLSTM, GRU, and BanglaBERT through dynamic weighted ensemble learning for multi-aspect sentiment classification. We introduce a dataset of 8,755 manually annotated Bangla product reviews across four aspects (Quality, Service, Price, Decoration) from major Bangladeshi e-commerce platforms. Our framework incorporates SHAP-based feature attribution and attention visualization for transparent insights. BanglaSentNet achieves 85% accuracy and 0.88 F1-score, outperforming standalone deep learning models by 3-7% and traditional approaches substantially. The explainability suite achieves 9.4/10 interpretability score with 87.6% human agreement. Cross-domain transfer learning experiments reveal robust generalization: zero-shot performance retains 67-76% effectiveness across diverse domains (BanglaBook reviews, social media, general e-commerce, news headlines); few-shot learning with 500-1000 samples achieves 90-95% of full fine-tuning performance, significantly reducing annotation costs. Real-world deployment demonstrates practical utility for Bangladeshi e-commerce platforms, enabling data-driven decision-making for pricing optimization, service improvement, and customer experience enhancement. This research establishes a new state-of-the-art benchmark for Bangla sentiment analysis, advances ensemble learning methodologies for low-resource languages, and provides actionable solutions for commercial applications.

【4】Bandit Guided Submodular Curriculum for Adaptive Subset Selection
标题：用于自适应子集选择的Bandit引导子模课程
链接：https://arxiv.org/abs/2511.22944

作者：Prateek Chanda,Prayas Agrawal,Saral Sureka,Lokesh Reddy Polu,Atharv Kshirsagar,Ganesh Ramakrishnan
备注：10 pages main, 21 pages Appendix, 8 figures
摘要：传统的课程学习从简单样本过渡到困难样本，但如何可靠地定义难度仍然悬而未决。先前的工作使用子模函数在课程学习中导出难度分数。我们重新解释自适应子集选择，并将其表述为一个多臂老虎机问题，其中每个臂对应一个指导样本选择的子模函数。我们提出了ONLINESUBMOD，一种新颖的在线贪婪策略，它优化效用驱动的奖励，并可证明在多种采样机制下实现无悔（no-regret）性能。实验上，ONLINESUBMOD在视觉和语言数据集上优于传统的课程学习和双层优化方法，在准确性与效率之间取得了更好的权衡。更广泛地说，我们表明验证驱动的奖励指标为指导课程安排提供了一种有原则的方式。
摘要：Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.

【5】Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation
标题：通过渐进式重新对齐桥接模态以实现多模态测试时自适应
链接：https://arxiv.org/abs/2511.22862

作者：Jiacheng Li,Songhe Feng
备注：Accepted by AAAI 2026 (Oral)
摘要：测试时自适应（TTA）仅使用未标记的测试数据进行在线模型自适应，旨在弥合源分布和目标分布之间的差距。然而，在多模态场景中，不同模态上程度不一的分布偏移会引起单模态浅层特征偏移与跨模态高层语义错位的复杂耦合效应，这是将现有TTA方法扩展到多模态领域的主要障碍。为了应对这一挑战，我们提出了一种新的多模态测试时自适应（MMTTA）框架，称为通过渐进式重新对齐桥接模态（BriMPR）。BriMPR由两个逐步增强的模块组成，采用分而治之的策略处理耦合效应。具体来说，我们首先将MMTTA分解为多个单模态特征对齐子问题。利用提示调优（prompt tuning）强大的函数逼近能力，我们将单模态全局特征分布校准到各自的源分布，从而实现跨模态的初始语义重新对齐。随后，我们为掩蔽模态与完整模态的组合分配可信的伪标签，并引入模态间实例级对比学习，以进一步增强模态之间的信息交互并细化对齐。在MMTTA任务（包括基于损坏的基准和真实世界域偏移基准）上的大量实验证明了我们方法的优越性。我们的源代码可以在 https://github.com/Luchicken/BriMPR 上找到。
摘要：Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios, varying degrees of distribution shift across different modalities give rise to a complex coupling effect of unimodal shallow feature shift and cross-modal high-level semantic misalignment, posing a major obstacle to extending existing TTA methods to the multimodal field. To address this challenge, we propose a novel multimodal test-time adaptation (MMTTA) framework, termed as Bridging Modalities via Progressive Re-alignment (BriMPR). BriMPR, consisting of two progressively enhanced modules, tackles the coupling effect with a divide-and-conquer strategy. Specifically, we first decompose MMTTA into multiple unimodal feature alignment sub-problems. By leveraging the strong function approximation ability of prompt tuning, we calibrate the unimodal global feature distributions to their respective source distributions, so as to achieve the initial semantic re-alignment across modalities. Subsequently, we assign the credible pseudo-labels to combinations of masked and complete modalities, and introduce inter-modal instance-wise contrastive learning to further enhance the information interaction among modalities and refine the alignment. Extensive experiments on MMTTA tasks, including both corruption-based and real-world domain shift benchmarks, demonstrate the superiority of our method. Our source code is available at [this URL](https://github.com/Luchicken/BriMPR).

【6】AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices
标题：AutoTailor：针对多样化边缘设备的自动、高效的自适应模型部署
链接：https://arxiv.org/abs/2511.22355

作者：Mengyang Liu,Chenyu Lu,Haodong Tian,Fang Dong,Ruiting Zhou,Wei Wang,Dian Shen,Guangtong Li,Ye Wan,Li Li
摘要：设备端机器学习（ML）已经成为新兴移动应用的基本组成部分。自适应模型部署通过定制神经架构，针对异构的设备能力和性能需求提供高效推理。基于SuperNet的方法可以从预训练的ML模型生成大量模型变体，是一种很有前景的解决方案。然而，在现有框架中应用SuperNet需要繁琐的模型感知开发和耗时的硬件感知性能剖析，这限制了其实际应用。我们提出了AutoTailor，这是第一个为边缘设备实现自动化、端到端、基于SuperNet的自适应模型部署的框架。与手动构建SuperNet不同，AutoTailor采用计算图引导的编译方法，自动将用户提供的ML模型转换为SuperNet。为了支持高效的专用化，AutoTailor集成了无需学习的延迟和准确率预测器，实现了低成本且准确的性能预测。我们的扩展评估表明，与最先进的方法相比，AutoTailor将构建SuperNet所需的代码行数减少了11-27倍，将硬件感知剖析成本降低了至少11倍，并在不同的模型和设备上实现了最高15.60%的绝对准确率提升和60.03%的延迟降低。
摘要：On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference for heterogeneous device capabilities and performance requirements through customizing neural architectures. SuperNet-based approaches offer a promising solution by generating a large number of model variants from a pre-trained ML model. However, applying SuperNet in existing frameworks suffers from tedious model-aware development and time-consuming hardware-aware profiling, which limits their practical adoption. We present AutoTailor, the first framework to enable automated, end-to-end SuperNet-based adaptive model deployment for edge devices. Unlike manual SuperNet construction, AutoTailor employs a computation graph-guided compilation approach to automatically transform user-provided ML models into SuperNets. To support efficient specialization, AutoTailor incorporates learning-free latency and accuracy predictors, enabling low-cost yet accurate performance prediction. Our extended evaluations demonstrate that AutoTailor reduces the lines of code for SuperNet construction by 11-27×, decreases hardware-aware profiling costs by at least 11×, and achieves up to 15.60% absolute accuracy improvement and 60.03% latency reduction compared to state-of-the-art approaches across diverse models and devices.

【7】Adaptive tumor growth forecasting via neural & universal ODEs
标题：通过神经和通用ODE进行自适应肿瘤生长预测
链接：https://arxiv.org/abs/2511.22292

作者：Kavya Subramanian,Prathamesh Dinesh Joshi,Raj Abhijit Dandekar,Rajat Dandekar,Sreedath Panat
备注：Accepted at JuliaCon 2025 conference
摘要：预测肿瘤生长对于优化治疗至关重要。经典的生长模型，如Gompertz和Bertalanffy方程，能够刻画一般的肿瘤动力学，但可能无法适应患者特异性的差异，尤其是在可用数据有限的情况下。在这项研究中，我们利用科学机器学习（SciML）的两大支柱，即神经常微分方程（Neural ODE）和通用微分方程（Universal Differential Equations，UDE），构建能够从实验数据中学习的自适应肿瘤生长模型。以Gompertz模型为基线，我们用自适应神经网络替换其中的刚性项，并借助Julia编程语言中的稳健建模来捕获隐藏的动态。我们使用这些模型在数据受限的条件下进行预测，并通过符号恢复将学习到的动态转换为显式的数学表达式。我们的方法有望提高预测准确性，指导动态而有效的治疗策略，从而改善临床结果。
摘要：Forecasting tumor growth is critical for optimizing treatment. Classical growth models such as the Gompertz and Bertalanffy equations capture general tumor dynamics but may fail to adapt to patient-specific variability, particularly with limited data available. In this study, we leverage Neural Ordinary Differential Equations (Neural ODEs) and Universal Differential Equations (UDEs), two pillars of Scientific Machine Learning (SciML), to construct adaptive tumor growth models capable of learning from experimental data. Using the Gompertz model as a baseline, we replace rigid terms with adaptive neural networks to capture hidden dynamics through robust modeling in the Julia programming language. We use our models to perform forecasting under data constraints and symbolic recovery to transform the learned dynamics into explicit mathematical expressions. Our approach has the potential to improve predictive accuracy, guiding dynamic and effective treatment strategies for improved clinical outcomes.
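
As a point of reference for the Gompertz baseline mentioned above, the following minimal sketch integrates one standard parameterization, dV/dt = a·V·ln(K/V), and compares it with its closed-form solution V(t) = K·exp(ln(V0/K)·e^(−a·t)). The parameterization, the parameter values, and the optional `correction` hook (a stand-in for where a learned term could enter in a universal differential equation) are assumptions for illustration; the paper itself works in Julia and the abstract does not state its exact equations.

```python
import math


def gompertz_rhs(v, a, k, correction=None):
    """Right-hand side a*v*ln(k/v); `correction` is a hypothetical learned term."""
    extra = correction(v) if correction is not None else 0.0
    return a * v * math.log(k / v) + extra


def rk4(rhs, v0, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of dv/dt = rhs(v)."""
    h = t_end / steps
    v = v0
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs(v + 0.5 * h * k1)
        k3 = rhs(v + 0.5 * h * k2)
        k4 = rhs(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return v


def gompertz_closed_form(v0, a, k, t):
    """Exact solution of dv/dt = a*v*ln(k/v) with v(0) = v0."""
    return k * math.exp(math.log(v0 / k) * math.exp(-a * t))
```

With `correction=None` the numerical and closed-form volumes should agree closely; passing a callable shows where a data-driven term would modify the mechanistic model.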

【8】Adaptive Dueling Double Deep Q-networks in Uniswap V3 Replication and Extension with Mamba
标题：Uniswap V3中的自适应决斗双深度Q网络：复现及基于Mamba的扩展
链接：https://arxiv.org/abs/2511.22101

作者：Zhaofeng Zhang
备注：12 pages, 5 figures
摘要：该报告介绍了复现和改进文章“Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning”的主要步骤。复现部分包括如何从Uniswap Subgraph获取数据、实现细节以及对结果的评论。复现之后，我在原模型的基础上提出了一个新的结构，它将Mamba与DDQN以及一个新的奖励函数相结合。在这个新结构中，我重新清洗了数据，并引入两个新的基线进行比较。结果表明，尽管该模型尚未应用于所有数据集，但它比原始模型具有更强的理论支持，并且在某些测试中表现更好。
摘要：The report goes through the main steps of replicating and improving the article "Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning." The replication part includes how to obtain data from the Uniswap Subgraph, details of the implementation, and comments on the results. After the replication, I propose a new structure based on the original model, which combines Mamba with DDQN and a new reward function. In this new structure, I clean the data again and introduce two new baselines for comparison. As a result, although the model has not yet been applied to all datasets, it shows stronger theoretical support than the original model and performs better in some tests.

【9】Calibration-Free EEG-based Driver Drowsiness Detection with Online Test-Time Adaptation
标题：具有在线测试时间自适应的免校准基于脑电波的驾驶员嗜睡检测
链接：https://arxiv.org/abs/2511.22030

作者：Geun-Deok Jang,Dong-Kyun Han,Seo-Hyeon Park,Seong-Whan Lee
备注：10 pages, Submitted to IEEE Transactions on Human-Machine Systems
摘要：疲劳驾驶是日益增多的交通事故原因，这促使人们近来探索基于脑电图（EEG）的困倦检测系统。然而，EEG信号因心理和生理因素而具有固有的可变性，因此需要繁琐的校准过程。特别地，EEG信号的受试者间差异会导致域偏移问题，使得困倦检测模型难以推广到未见过的目标受试者。为了解决这些问题，我们提出了一种新的驾驶员困倦检测框架，利用在线测试时自适应（TTA）方法动态适应目标受试者的分布。我们提出的方法更新批量归一化（BN）层中的可学习参数，同时保留预训练的归一化统计量，由此得到的修改后配置可确保测试期间的有效自适应。我们引入了一个记忆库来动态管理流式EEG片段，并根据由负能量分数和持续时间确定的可靠性来选择样本。此外，我们还引入了原型学习，以确保在分布随时间偏移时仍能做出稳健的预测。我们在模拟环境中采集的持续注意力驾驶数据集上验证了我们的方法，其中困倦程度由单调的车道保持任务中的延迟反应时间估计。实验表明，我们的方法优于所有基线，平均F1分数达到81.73%，比最佳TTA基线提高了11.73%。这表明我们提出的方法显著增强了基于EEG的困倦检测系统在非i.i.d.场景中的适应性。
摘要：Drowsy driving is a growing cause of traffic accidents, prompting recent exploration of electroencephalography (EEG)-based drowsiness detection systems. However, the inherent variability of EEG signals due to psychological and physical factors necessitates a cumbersome calibration process. In particular, the inter-subject variability of EEG signals leads to a domain shift problem, which makes it challenging to generalize drowsiness detection models to unseen target subjects. To address these issues, we propose a novel driver drowsiness detection framework that leverages online test-time adaptation (TTA) methods to dynamically adjust to target subject distributions. Our proposed method updates the learnable parameters in batch normalization (BN) layers, while preserving pretrained normalization statistics, resulting in a modified configuration that ensures effective adaptation during test time. We incorporate a memory bank that dynamically manages streaming EEG segments, selecting samples based on their reliability determined by negative energy scores and persistence time. In addition, we introduce prototype learning to ensure robust predictions against distribution shifts over time. We validated our method on the sustained-attention driving dataset collected in a simulated environment, where drowsiness was estimated from delayed reaction times during monotonous lane-keeping tasks. Our experiments show that our method outperforms all baselines, achieving an average F1-score of 81.73%, an improvement of 11.73% over the best TTA baseline. This demonstrates that our proposed method significantly enhances the adaptability of EEG-based drowsiness detection systems in non-i.i.d. scenarios.

【10】Adaptive Parameter Optimization for Robust Remote Photoplethysmography
标题：稳健远程光电体积脉搏成像的自适应参数优化
链接：https://arxiv.org/abs/2511.21903

作者：Cecilia G. Morales,Fanurs Chi En Teh,Kai Li,Pushpak Agrawal,Artur Dubrawski
备注：Accepted in Time Series for Health NeurIPS Workshop 2025
摘要：远程光电体积描记（rPPG）可使用标准RGB摄像头实现非接触式生命体征监测。然而，现有方法依赖于针对特定照明条件和相机设置优化的固定参数，限制了对不同部署环境的适应性。本文介绍了基于投影的鲁棒信号混合（PRISM）算法，一种无需训练的方法，通过基于信号质量评估的在线参数自适应，联合优化光度去趋势和颜色混合。PRISM在无监督方法中实现了最先进的性能，PURE上的MAE为0.77 bpm，UBFC-rPPG上的MAE为0.66 bpm，在5 bpm阈值下的准确度分别为97.3%和97.5%。统计分析证实，PRISM的性能与领先的监督方法相当（$p > 0.2$），同时在没有训练的情况下保持实时CPU性能。这验证了自适应时间序列优化显著改善了不同条件下的rPPG。
摘要：Remote photoplethysmography (rPPG) enables contactless vital sign monitoring using standard RGB cameras. However, existing methods rely on fixed parameters optimized for particular lighting conditions and camera setups, limiting adaptability to diverse deployment environments. This paper introduces the Projection-based Robust Signal Mixing (PRISM) algorithm, a training-free method that jointly optimizes photometric detrending and color mixing through online parameter adaptation based on signal quality assessment. PRISM achieves state-of-the-art performance among unsupervised methods, with MAE of 0.77 bpm on PURE and 0.66 bpm on UBFC-rPPG, and accuracy of 97.3% and 97.5% respectively at a 5 bpm threshold. Statistical analysis confirms PRISM performs equivalently to leading supervised methods ($p > 0.2$), while maintaining real-time CPU performance without training. This validates that adaptive time series optimization significantly improves rPPG across diverse conditions.

【11】LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models
标题：LILAD：学习上下文中Lyapunov稳定的自适应动力学模型
链接：https://arxiv.org/abs/2511.21846

作者：Amit Jena,Na Li,Le Xie
备注：This article has been accepted for AAAI-26 (The 40th Annual AAAI Conference on Artificial Intelligence)
摘要：控制理论中的系统辨识旨在根据轨迹数据近似动态系统。虽然神经网络已经表现出很强的预测准确性，但它们往往无法保持稳定性等关键物理特性，并且通常假设动态是平稳的，这限制了它们在分布偏移下的适用性。现有方法通常只孤立地解决稳定性或适应性，缺乏同时保证两者的统一框架。我们提出了LILAD（Learning In-Context Lyapunov-stable Adaptive Dynamics），一种同时保证适应性和稳定性的新型系统辨识框架。LILAD通过上下文学习（ICL）同时学习动力学模型和Lyapunov函数，并显式考虑参数不确定性。LILAD在多种任务上训练后，得到一个稳定性感知的自适应动力学模型以及一个自适应的Lyapunov证书。在测试时，两个组件都利用一段较短的轨迹提示来适应新的系统实例，从而实现快速泛化。为了严格保证稳定性，LILAD还计算一个依赖于状态的衰减器，它对新系统实例中的任意状态强制Lyapunov函数满足充分下降条件。即使在分布外和任务外的场景下，这一机制也能延续稳定性保证。我们在基准自治系统上评估LILAD，并证明它在预测精度方面优于自适应、鲁棒和非自适应基线。
摘要：System identification in control theory aims to approximate dynamical systems from trajectory data. While neural networks have demonstrated strong predictive accuracy, they often fail to preserve critical physical properties such as stability and typically assume stationary dynamics, limiting their applicability under distribution shifts. Existing approaches generally address either stability or adaptability in isolation, lacking a unified framework that ensures both. We propose LILAD (Learning In-Context Lyapunov-stable Adaptive Dynamics), a novel framework for system identification that jointly guarantees adaptability and stability. LILAD simultaneously learns a dynamics model and a Lyapunov function through in-context learning (ICL), explicitly accounting for parametric uncertainty. Trained across a diverse set of tasks, LILAD produces a stability-aware, adaptive dynamics model alongside an adaptive Lyapunov certificate. At test time, both components adapt to a new system instance using a short trajectory prompt, which enables fast generalization. To rigorously ensure stability, LILAD also computes a state-dependent attenuator that enforces a sufficient decrease condition on the Lyapunov function for any state in the new system instance. This mechanism extends stability guarantees even under out-of-distribution and out-of-task scenarios. We evaluate LILAD on benchmark autonomous systems and demonstrate that it outperforms adaptive, robust, and non-adaptive baselines in predictive accuracy.

强化学习(6篇)

【1】Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning
标题：独立多智能体强化学习中的涌现协调与相结构
链接：https://arxiv.org/abs/2511.23315

作者：Azusa Yamaguchi
备注：22 pages, 19 figures
摘要：为了刻画多智能体学习系统的动态特性，人们越来越希望更清楚地理解在去中心化多智能体强化学习（MARL）中协调何时涌现、波动或崩溃。我们重新审视完全独立的Q学习（IQL），将其作为一个最小的去中心化测试平台，并在不同的环境大小L和智能体密度ρ上进行大规模实验。我们用两个坐标轴构建相图，即合作成功率（CSR）和由TD误差方差导出的稳定性指数，揭示了三种不同的区域：协调且稳定的相、脆弱的过渡区以及阻塞或无序的相。一条尖锐的双重不稳定脊（Instability Ridge）将这些区域分开，它对应于持续的核漂移，即其他智能体的策略更新所引起的每个智能体有效转移核随时间的变化。同步分析进一步表明，持续合作需要时间上的对齐，而漂移与同步之间的竞争产生了脆弱区域。移除智能体标识符会完全消除漂移并使三相结构坍缩，这表明智能体之间微小的不对称性是漂移的必要驱动因素。总体而言，结果表明去中心化MARL呈现出由规模、密度和核漂移之间的相互作用所支配的一致相结构，这提示涌现协调表现为一种由分布与相互作用驱动的相现象。
摘要：A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is increasingly sought in order to characterize the dynamics of multi-agent learning systems. We revisit fully independent Q-learning (IQL) as a minimal decentralized testbed and run large-scale experiments across environment size L and agent density rho. We construct a phase map using two axes - the cooperative success rate (CSR) and a stability index derived from TD-error variance - revealing three distinct regimes: a coordinated and stable phase, a fragile transition region, and a jammed or disordered phase. A sharp double Instability Ridge separates these regimes and corresponds to persistent kernel drift, the time-varying shift of each agent's effective transition kernel induced by others' policy updates. Synchronization analysis further shows that temporal alignment is required for sustained cooperation, and that competition between drift and synchronization generates the fragile regime. Removing agent identifiers eliminates drift entirely and collapses the three-phase structure, demonstrating that small inter-agent asymmetries are a necessary driver of drift. Overall, the results show that decentralized MARL exhibits a coherent phase structure governed by the interaction between scale, density, and kernel drift, suggesting that emergent coordination behaves as a distribution-interaction-driven phase phenomenon.

【2】Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
标题：通过截断分布改进随机动作约束强化学习
链接：https://arxiv.org/abs/2511.22406

作者：Roland Stolz,Michael Eichelbeck,Matthias Althoff
备注：Accepted at the AAAI26 conference main technical track
摘要：在强化学习（RL）中，为确保安全性或动作相关性而对动作空间施加额外约束通常是有利的。现有的动作约束RL工作在有效的策略更新、计算效率和可预测的运行时间方面面临挑战。最近的工作提出在随机策略梯度方法中使用截断正态分布。然而，在复杂约束下，熵、对数概率及其梯度等关键特征的计算变得难以处理。因此，先前的工作使用非截断分布来近似这些量，这严重降低了性能。我们认为，准确估计这些特征在动作约束RL设置中至关重要，并为它们提出了高效的数值近似方法。我们还为截断的策略分布提供了一种高效的采样策略，并在三个基准环境中验证了我们的方法；结果显示，使用准确的估计可以带来显著的性能提升。
摘要：In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained RL faces challenges regarding effective policy updates, computational efficiency, and predictable runtime. Recent work proposes to use truncated normal distributions for stochastic policy gradient methods. However, the computation of key characteristics, such as the entropy, log-probability, and their gradients, becomes intractable under complex constraints. Hence, prior work approximates these using the non-truncated distributions, which severely degrades performance. We argue that accurate estimation of these characteristics is crucial in the action-constrained RL setting, and propose efficient numerical approximations for them. We also provide an efficient sampling strategy for truncated policy distributions and validate our approach on three benchmark environments, which demonstrate significant performance improvements when using accurate estimations.
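
To make concrete why substituting the non-truncated distribution matters, here is a minimal one-dimensional sketch in which the action is constrained to an interval [a, b], a case where the truncated normal's log-probability and entropy have textbook closed forms. This is only the simplest special case, chosen as an assumption for illustration; the paper targets complex constraints where such closed forms are unavailable, and its numerical approximations are not reproduced here. All function names are hypothetical.

```python
import math


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def _Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def trunc_logpdf(x, mu, sigma, a, b):
    """Log-density of N(mu, sigma^2) truncated to [a, b]."""
    if not a <= x <= b:
        return float("-inf")
    mass = _Phi((b - mu) / sigma) - _Phi((a - mu) / sigma)
    return normal_logpdf(x, mu, sigma) - math.log(mass)


def trunc_entropy(mu, sigma, a, b):
    """Closed-form differential entropy of the truncated normal."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    mass = _Phi(beta) - _Phi(alpha)
    return (math.log(math.sqrt(2 * math.pi * math.e) * sigma * mass)
            + (alpha * _phi(alpha) - beta * _phi(beta)) / (2 * mass))


def numeric_entropy(mu, sigma, a, b, n=20000):
    """Midpoint-rule check of -integral p*log(p) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        lp = trunc_logpdf(a + (i + 0.5) * h, mu, sigma, a, b)
        total -= math.exp(lp) * lp * h
    return total
```

Inside the support, the truncated log-probability exceeds the non-truncated one by the constant −log(mass), so a policy-gradient method that ignores truncation works with a mis-scaled density and an entropy that is too large. The CDF difference used for `mass` can lose precision far in the tails; this sketch does not address that.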

【3】BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning
标题：BiCQL-ML：用于最大似然逆强化学习的双层保守Q学习框架
链接：https://arxiv.org/abs/2511.22210

作者：Junsung Park
备注：8 pages, 3 figures
摘要：离线逆强化学习（IRL）旨在仅使用固定的演示数据恢复能够解释专家行为的奖励函数，而无需任何额外的在线交互。我们提出了BiCQL-ML，这是一种无策略（policy-free）的离线IRL算法，它在双层框架中联合优化奖励函数和保守Q函数，从而避免了显式的策略学习。该方法在以下两步之间交替：（i）在当前奖励下通过保守Q学习（CQL）学习保守Q函数；（ii）更新奖励参数，以最大化专家动作的期望Q值，同时抑制对分布外动作的过度泛化。这个过程可以看作软值匹配原则下的最大似然估计。我们提供了理论保证：BiCQL-ML收敛到一个使专家策略为软最优的奖励函数。实验上，我们在标准的离线RL基准上表明，与现有的离线IRL基线相比，BiCQL-ML同时提高了奖励恢复和下游策略的性能。
摘要：Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. We propose BiCQL-ML, a policy-free offline IRL algorithm that jointly optimizes a reward function and a conservative Q-function in a bi-level framework, thereby avoiding explicit policy learning. The method alternates between (i) learning a conservative Q-function via Conservative Q-Learning (CQL) under the current reward, and (ii) updating the reward parameters to maximize the expected Q-values of expert actions while suppressing over-generalization to out-of-distribution actions. This procedure can be viewed as maximum likelihood estimation under a soft value matching principle. We provide theoretical guarantees that BiCQL-ML converges to a reward function under which the expert policy is soft-optimal. Empirically, we show on standard offline RL benchmarks that BiCQL-ML improves both reward recovery and downstream policy performance compared to existing offline IRL baselines.
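
Step (i) above relies on Conservative Q-Learning. As a minimal sketch, the widely used CQL regularizer for discrete actions penalizes log Σ_a exp Q(s, a) − Q(s, a_data), which pushes down Q-values of actions not seen in the dataset. The code below shows only this standard term; the reward update of step (ii), the Bellman loss, and any weighting coefficient used by BiCQL-ML are not shown, and the function names are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_action):
    """CQL term for one state: logsumexp_a Q(s, a) - Q(s, a_data)."""
    return logsumexp(q_values) - q_values[dataset_action]


def mean_cql_penalty(batch):
    """Average penalty over (q_values, dataset_action) pairs."""
    return sum(cql_penalty(q, a) for q, a in batch) / len(batch)
```

The penalty is never negative and shrinks as the dataset action's Q-value dominates the others, which is how the term discourages optimistic values on out-of-distribution actions.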

【4】Energy Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi Agent Deep Reinforcement Learning
标题：通过多代理深度强化学习在5G毫米波网络中进行节能睡眠模式优化
链接：https://arxiv.org/abs/2511.22105

作者：Saad Masrur,Ismail Guvenc,David Lopez Perez
备注：This is an updated version of my preprint available on TechRxiv. Don't flag it as plagiarism. I wanna post my paper on arxiv
摘要：毫米波（mmWave）网络中的动态睡眠模式优化（SMO）对于在严格的服务质量（QoS）约束下最大化能量效率（EE）至关重要。然而，现有的优化方法和强化学习（RL）方法依赖于聚合的静态基站（BS）流量模型，这些模型无法捕获非平稳的流量动态，并且状态-动作空间庞大，从而限制了实际部署。为了解决这些挑战，本文提出了一种使用双深度Q网络（DDQN）的多智能体深度强化学习（MARL）框架，称为MARL-DDQN，用于在三维城市环境中进行自适应SMO，该环境采用时变且基于社区的用户设备（UE）移动模型。与传统的单智能体RL不同，MARL-DDQN能够以最小的信令开销实现可扩展的分布式决策。该框架集成了现实的BS功耗模型和波束成形，以准确量化EE，而QoS则以吞吐量来定义。该方法自适应地调整SMO策略，在最大化EE的同时减轻小区间干扰并确保吞吐量公平性。仿真结果表明，MARL-DDQN优于最先进的策略，包括All On、迭代QoS感知的基于负载的方法（IT-QoS-LB）、MARL-DDPG和MARL-PPO，在动态场景下实现了最高0.60 Mbit/Joule的EE和8.5 Mbps的第10百分位吞吐量，并在95%的时间内满足QoS约束。
摘要：Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, existing optimization and reinforcement learning (RL) approaches rely on aggregated, static base station (BS) traffic models that fail to capture non-stationary traffic dynamics and suffer from large state-action spaces, limiting real-world deployment. To address these challenges, this paper proposes a multi-agent deep reinforcement learning (MARL) framework using a Double Deep Q-Network (DDQN), referred to as MARL-DDQN, for adaptive SMO in a 3D urban environment with a time-varying and community-based user equipment (UE) mobility model. Unlike conventional single-agent RL, MARL-DDQN enables scalable, distributed decision-making with minimal signaling overhead. A realistic BS power consumption model and beamforming are integrated to accurately quantify EE, while QoS is defined in terms of throughput. The method adapts SMO policies to maximize EE while mitigating inter-cell interference and ensuring throughput fairness. Simulations show that MARL-DDQN outperforms state-of-the-art strategies, including All On, iterative QoS-aware load-based (IT-QoS-LB), MARL-DDPG, and MARL-PPO, achieving up to 0.60 Mbit/Joule EE, 8.5 Mbps 10th-percentile throughput, and meeting QoS constraints 95% of the time under dynamic scenarios.

【5】Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation
标题：用于协作式可扩展特征变换的带注意力机制的异构多智能体强化学习
链接：https://arxiv.org/abs/2511.21934

作者：Tao Zhe,Huazhen Fang,Kunpeng Liu,Qian Lou,Tamzidul Hoque,Dongjie Wang
备注：Accepted at KDD 2026 Research Track
摘要：特征变换通过数学上的特征交叉生成信息量丰富的特征，从而提升下游任务的性能。尽管深度学习取得了进展，特征变换对于结构化数据仍然至关重要，因为深度模型往往难以捕获复杂的特征交互。以往关于自动化特征变换的文献已经取得了成功，但通常依赖启发式方法或穷举搜索，导致过程低效且耗时。最近的工作采用强化学习（RL），以更有效的试错方式增强传统方法。然而，仍然存在两个局限：1）变换过程中的动态特征扩展会导致不稳定，并增加RL智能体的学习复杂度；2）智能体之间的合作与通信不足，导致特征交叉操作次优、模型性能下降。为了解决这些问题，我们提出了一种新的异构多智能体RL框架，以实现协作式且可扩展的特征变换。该框架包括三个异构智能体，分为两种类型，分别用于选择特征交叉所需的关键特征和操作。为了加强这些智能体之间的通信，我们实现了一种共享评论家（critic）机制，以促进特征变换过程中的信息交换。为了处理动态扩展的特征空间，我们定制了基于多头注意力的特征智能体，用于为特征交叉选择合适的特征。此外，我们在优化过程中引入了状态编码技术，以稳定并增强RL智能体的学习动态，从而得到更稳健、更可靠的变换策略。最后，我们进行了大量实验，以验证模型的有效性、效率、鲁棒性和可解释性。
摘要：Feature transformation enhances downstream task performance by generating informative features through mathematical feature crossing. Despite the advancements in deep learning, feature transformation remains essential for structured data, where deep models often struggle to capture complex feature interactions. Prior literature on automated feature transformation has achieved success but often relies on heuristics or exhaustive searches, leading to inefficient and time-consuming processes. Recent works employ reinforcement learning (RL) to enhance traditional approaches through a more effective trial-and-error way. However, two limitations remain: 1) Dynamic feature expansion during the transformation process, which causes instability and increases the learning complexity for RL agents; 2) Insufficient cooperation and communication between agents, which results in suboptimal feature crossing operations and degraded model performance. To address them, we propose a novel heterogeneous multi-agent RL framework to enable cooperative and scalable feature transformation. The framework comprises three heterogeneous agents, grouped into two types, each designed to select essential features and operations for feature crossing. To enhance communication among these agents, we implement a shared critic mechanism that facilitates information exchange during feature transformation. To handle the dynamically expanding feature space, we tailor multi-head attention-based feature agents to select suitable features for feature crossing. Additionally, we introduce a state encoding technique during the optimization process to stabilize and enhance the learning dynamics of the RL agents, resulting in more robust and reliable transformation policies. Finally, we conduct extensive experiments to validate the effectiveness, efficiency, robustness, and interpretability of our model.

【6】OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning
标题：OBLR-PO：一个稳定强化学习的理论框架
链接：https://arxiv.org/abs/2511.23310

作者：Zixun Huang,Jiayi Sheng,Zeyu Zheng
备注：19 pages, 7 figures
摘要：现有的基于强化学习（RL）的大型语言模型后训练方法发展迅速，但它们的设计在很大程度上由启发式方法而非系统的理论原则指导。这一差距限制了我们对梯度估计器及相关优化算法性质的理解，从而限制了提高训练稳定性和整体性能的机会。在这项工作中，我们提供了一个统一的理论框架，在温和的假设下刻画常用策略梯度估计器的统计性质。我们的分析建立了无偏性，推导出精确的方差表达式，并给出一个优化损失上界，从而可以对学习动态进行有原则的推理。在这些结果的基础上，我们证明了收敛保证，并推导出一个由梯度信噪比（SNR）决定的自适应学习率调度。我们进一步表明，方差最优基线是一个梯度加权估计器，这为方差缩减提供了一条新原则，并自然地带来比现有方法更好的稳定性。这些见解催生了最优基线与学习率策略优化（OBLR-PO），这是一种以有理论依据的方式联合调整学习率和基线的算法。在Qwen3-4B-Base和Qwen3-8B-Base上的实验显示出相对于现有策略优化方法的一致增益，验证了我们的理论贡献能够转化为大规模后训练中的实际改进。
摘要：Existing reinforcement learning (RL)-based post-training methods for large language models have advanced rapidly, yet their design has largely been guided by heuristics rather than systematic theoretical principles. This gap limits our understanding of the properties of the gradient estimators and the associated optimization algorithms, thereby constraining opportunities to improve training stability and overall performance. In this work, we provide a unified theoretical framework that characterizes the statistical properties of commonly used policy-gradient estimators under mild assumptions. Our analysis establishes unbiasedness, derives exact variance expressions, and yields an optimization-loss upper bound that enables principled reasoning about learning dynamics. Building on these results, we prove convergence guarantees and derive an adaptive learning-rate schedule governed by the signal-to-noise ratio (SNR) of gradients. We further show that the variance-optimal baseline is a gradient-weighted estimator, offering a new principle for variance reduction and naturally enhancing stability beyond existing methods. These insights motivate Optimal Baseline and Learning-Rate Policy Optimization (OBLR-PO), an algorithm that jointly adapts learning rates and baselines in a theoretically grounded manner. Experiments on Qwen3-4B-Base and Qwen3-8B-Base demonstrate consistent gains over existing policy optimization methods, validating that our theoretical contributions translate into practical improvements in large-scale post-training.
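
The statement that the variance-optimal baseline is gradient-weighted can be illustrated with the classical result for score-function estimators: for the estimator (R − b)·g with g = ∇ log π, the baseline-dependent part of the variance is E[‖g‖²(R − b)²], which is minimized by b* = E[‖g‖²R] / E[‖g‖²]. The sketch below computes this classical baseline from samples; it is an assumption that OBLR-PO's estimator takes this general shape, and the paper's exact formula and its learning-rate schedule are not reproduced here.

```python
def mean_baseline(rewards):
    """The usual baseline: the plain average reward."""
    return sum(rewards) / len(rewards)


def gradient_weighted_baseline(score_sq_norms, rewards):
    """b* = sum_i w_i * R_i / sum_i w_i, with w_i = ||grad log pi||^2 for sample i."""
    weighted = sum(w * r for w, r in zip(score_sq_norms, rewards))
    return weighted / sum(score_sq_norms)


def second_moment(score_sq_norms, rewards, baseline):
    """Empirical E[||g||^2 (R - b)^2], the part of the variance that depends on b."""
    n = len(rewards)
    return sum(w * (r - baseline) ** 2 for w, r in zip(score_sq_norms, rewards)) / n
```

When all squared score norms are equal, the weighted baseline reduces to the mean reward; the two differ exactly when some samples carry much larger gradients than others.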

符号|符号学习(3篇)

【1】Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting
标题：超越曲线拟合：用于上下文感知流行病预测的神经符号智能体
链接：https://arxiv.org/abs/2511.23276

作者：Joongwon Chae,Runming Wang,Chen Xiong,Gong Yunhan,Lian Zhang,Ji Jiansong,Dongmei Yu,Peiwu Qin
摘要：手足口病（HFMD）的有效监测需要能够同时考虑流行病学模式以及学校校历和天气等背景驱动因素的预测。虽然经典模型和最近的基础模型（例如Chronos、TimesFM）可以纳入协变量，但它们往往缺乏语义推理能力，难以解释相互冲突的驱动因素之间的因果相互作用。在这项工作中，我们提出了一个双智能体框架，将上下文解释与概率预测解耦。一个LLM“事件解释器”将包括学校日程、气象摘要和报告在内的异构信号处理成一个标量的传播影响信号。随后，神经符号核心将该信号与历史病例数相结合，生成经过校准的概率预测。我们在香港（2023-2024）和中国丽水（2024）的真实世界手足口病数据集上评估了该框架。与传统基线和基础模型基线相比，我们的方法实现了具有竞争力的点预测精度，同时提供了稳健的90%预测区间（覆盖率0.85-1.00）和人类可解释的理由。我们的结果表明，通过LLM在结构上整合领域知识可以达到最先进的性能，同时产生与公共卫生工作流程相一致的上下文感知预测。代码可在 https://github.com/jw-chae/forecast_MED 获得。
摘要：Effective surveillance of hand, foot and mouth disease (HFMD) requires forecasts accounting for epidemiological patterns and contextual drivers like school calendars and weather. While classical models and recent foundation models (e.g., Chronos, TimesFM) incorporate covariates, they often lack the semantic reasoning to interpret the causal interplay between conflicting drivers. In this work, we propose a two-agent framework decoupling contextual interpretation from probabilistic forecasting. An LLM "event interpreter" processes heterogeneous signals, including school schedules, meteorological summaries, and reports, into a scalar transmission-impact signal. A neuro-symbolic core then combines this with historical case counts to produce calibrated probabilistic forecasts. We evaluate the framework on real-world HFMD datasets from Hong Kong (2023-2024) and Lishui, China (2024). Compared to traditional and foundation-model baselines, our approach achieves competitive point forecasting accuracy while providing robust 90% prediction intervals (coverage 0.85-1.00) and human-interpretable rationales. Our results suggest that structurally integrating domain knowledge through LLMs can match state-of-the-art performance while yielding context-aware forecasts that align with public health workflows. Code is available at https://github.com/jw-chae/forecast_MED .

【2】Can Synthetic Data Improve Symbolic Regression Extrapolation Performance?
Link: https://arxiv.org/abs/2511.22794

Authors: Fitria Wulandari Ramlan, Colm O'Riordan, Gabriel Kronberger, James McDermott
Comments: 8 pages, 16 figures, GECCO 2025 Symbolic Regression Workshop
Abstract: Many machine learning models perform well when making predictions within the training data range, but often struggle when required to extrapolate beyond it. Symbolic regression (SR) using genetic programming (GP) can generate flexible models but is prone to unreliable behaviour in extrapolation. This paper investigates whether adding synthetic data can help improve performance in such cases. We apply Kernel Density Estimation (KDE) to identify regions in the input space where the training data is sparse. Synthetic data is then generated in those regions using a knowledge distillation approach: a teacher model generates predictions on new input points, which are then used to train a student model. We evaluate this method across six benchmark datasets, using neural networks (NN), random forests (RF), and GP both as teacher models (to generate synthetic data) and as student models (trained on the augmented data). Results show that GP models can often improve when trained on synthetic data, especially in extrapolation areas. However, the improvement depends on the dataset and teacher model used. The largest improvements are observed when synthetic data from GPe is used to train GPp in extrapolation regions. Performance in interpolation areas changes only slightly. We also observe heterogeneous errors, where model performance varies across different regions of the input space. Overall, this approach offers a practical solution for better extrapolation. Note: An earlier version of this work appeared in the GECCO 2025 Workshop on Symbolic Regression. This arXiv version corrects several parts of the original submission.
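
The two steps described in the abstract (find sparse regions with KDE, then let a teacher label new points there) can be sketched in one dimension as follows. This is only an illustration under stated assumptions: the bandwidth, the density threshold, the candidate points, and the linear `teacher` are all hypothetical, and the paper's actual models (NN, RF, GP) are not reproduced.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


def synthesize(train_x, candidates, teacher, bandwidth=0.5, threshold=0.05):
    """Label candidates that lie in low-density regions with teacher predictions."""
    density = gaussian_kde(train_x, bandwidth)
    sparse = [x for x in candidates if density(x) < threshold]
    return [(x, teacher(x)) for x in sparse]


# Hypothetical setup: training inputs cover [0, 1]; 3.0 and 4.0 lie outside it.
train_x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
candidates = [0.5, 3.0, 4.0]
teacher = lambda x: 2.0 * x + 1.0  # stand-in for a fitted teacher model
synthetic = synthesize(train_x, candidates, teacher)
```

The resulting `(x, y)` pairs would be appended to the training set before fitting the student model; the point at 0.5 is skipped because the training data is already dense there.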

【3】Constraining dark matter halo profiles with symbolic regression
Link: https://arxiv.org/abs/2511.23073

Authors: Alicia Martín, Tariq Yasin, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira
Comments: 18 pages, 5 figures. Accepted for publication in Philosophical Transactions of the Royal Society A
Abstract: Dark matter haloes are typically characterised by radial density profiles with fixed forms motivated by simulations (e.g. NFW). However, simulation predictions depend on uncertain dark matter physics and baryonic modelling. Here, we present a method to constrain halo density profiles directly from observations using Exhaustive Symbolic Regression (ESR), a technique that searches the space of analytic expressions for the function that best balances accuracy and simplicity for a given dataset. We test the approach on mock weak lensing excess surface density (ESD) data of synthetic clusters with NFW profiles. Motivated by real data, we assign each ESD data point a constant fractional uncertainty and vary this uncertainty and the number of clusters to probe how data precision and sample size affect model selection. For fractional errors around 5%, ESR recovers the NFW profile even from samples as small as 20 clusters. At higher uncertainties representative of current surveys, simpler functions are favoured over NFW, though it remains competitive. This preference arises because weak lensing errors are smallest in the outskirts, causing the fits to be dominated by the outer profile. ESR therefore provides a robust, simulation-independent framework both for testing mass models and determining which features of a halo's density profile are genuinely constrained by the data.
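
For readers unfamiliar with the NFW form mentioned above, the standard Navarro-Frenk-White density profile is rho(r) = rho_s / ((r / r_s) * (1 + r / r_s)^2), with characteristic density rho_s and scale radius r_s. The short sketch below evaluates it; the function name and the unit-free example values are illustrative and are not drawn from the paper.

```python
def nfw_density(r, rho_s, r_s):
    """NFW density: rho_s / ((r / r_s) * (1 + r / r_s) ** 2)."""
    if r <= 0 or r_s <= 0:
        raise ValueError("r and r_s must be positive")
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)


# At the scale radius (r == r_s) the density is exactly rho_s / 4.
rho_at_scale_radius = nfw_density(1.0, rho_s=4.0, r_s=1.0)
```

The profile falls off roughly as 1/r well inside r_s and as 1/r^3 well outside it, which is why data dominated by the outskirts constrains mainly the outer slope.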

Medical (4 papers)

【1】EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model
Link: https://arxiv.org/abs/2511.22935

Authors: Yuhao Xu, Xiaoda Wang, Jiaying Lu, Sirui Ding, Defu Cao, Huaxiu Yao, Yan Liu, Xiao Hu, Carl Yang
Abstract: Electrocardiogram (ECG) analysis plays a vital role in the early detection, monitoring, and management of various cardiovascular conditions. While existing models have achieved notable success in ECG interpretation, they fail to leverage the interrelated nature of various cardiac abnormalities. Conversely, developing a specific model capable of extracting all relevant features for multiple ECG tasks remains a significant challenge. Large-scale foundation models, though powerful, are not typically pretrained on ECG data, making full re-training or fine-tuning computationally expensive. To address these challenges, we propose EnECG (Mixture of Experts-based Ensemble Learning for ECG Multi-tasks), an ensemble-based framework that integrates multiple specialized foundation models, each excelling in different aspects of ECG interpretation. Instead of relying on a single model or single task, EnECG leverages the strengths of multiple specialized models to tackle a variety of ECG-based tasks. To mitigate the high computational cost of full re-training or fine-tuning, we introduce a lightweight adaptation strategy: attaching dedicated output layers to each foundation model and applying Low-Rank Adaptation (LoRA) only to these newly added parameters. We then adopt a Mixture of Experts (MoE) mechanism to learn ensemble weights, effectively combining the complementary expertise of individual models. Our experimental results demonstrate that by minimizing the scope of fine-tuning, EnECG can help reduce computational and memory costs while maintaining the strong representational power of foundation models. This framework not only enhances feature extraction and predictive performance but also ensures practical efficiency for real-world clinical applications. The code is available at https://github.com/yuhaoxu99/EnECG.git.
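
One common way an MoE mechanism combines experts is to turn gate scores into softmax weights and take the weighted sum of the expert outputs. The sketch below shows only that generic step; how EnECG actually computes its gate scores is not stated in the abstract, so `gate_scores` here is a hypothetical input and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(expert_outputs, gate_scores):
    """Weighted sum of per-expert output vectors, weights = softmax(gate_scores)."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


# Two hypothetical experts with equal gate scores: the result is their mean.
combined = moe_combine([[1.0, 0.0], [0.0, 1.0]], gate_scores=[0.0, 0.0])
```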

【2】Integrated Transcriptomic-proteomic Biomarker Identification for Radiation Response Prediction in Non-small Cell Lung Cancer Cell Lines
Link: https://arxiv.org/abs/2511.22735

Authors: Yajun Yu, Guoping Xu, Steve Jiang, Robert Timmerman, John Minna, Yuanyuan Zhang, Hao Peng
Abstract: To develop an integrated transcriptome-proteome framework for identifying concurrent biomarkers predictive of radiation response, as measured by survival fraction at 2 Gy (SF2), in non-small cell lung cancer (NSCLC) cell lines. RNA sequencing (RNA-seq) and data-independent acquisition mass spectrometry (DIA-MS) proteomic data were collected from 73 and 46 NSCLC cell lines, respectively. Following preprocessing, 1,605 shared genes were retained for analysis. Feature selection was performed using least absolute shrinkage and selection operator (Lasso) regression with a frequency-based ranking criterion under five-fold cross-validation repeated ten times. Support vector regression (SVR) models were constructed using transcriptome-only, proteome-only, and combined transcriptome-proteome feature sets. Model performance was assessed by the coefficient of determination (R²) and root mean square error (RMSE). Correlation analyses evaluated concordance between RNA and protein expression and the relationships of selected biomarkers with SF2. RNA-protein expression exhibited significant positive correlations (median Pearson's r = 0.363). Independent pipelines identified 20 prioritized gene signatures from transcriptomic, proteomic, and combined datasets. Models trained on single-omic features achieved limited cross-omic generalizability, while the combined model demonstrated balanced predictive accuracy in both datasets (R² = 0.461, RMSE = 0.120 for transcriptome; R² = 0.604, RMSE = 0.111 for proteome). This study presents the first proteotranscriptomic framework for SF2 prediction in NSCLC, highlighting the complementary value of integrating transcriptomic and proteomic data. The identified concurrent biomarkers capture both transcriptional regulation and functional protein activity, offering mechanistic insights and translational potential.

【3】Stacked Ensemble of Fine-Tuned CNNs for Knee Osteoarthritis Severity Grading
Link: https://arxiv.org/abs/2511.22143

Authors: Adarsh Gupta, Japleen Kaur, Tanvi Doshi, Teena Sharma, Nishchal K. Verma, Shantaram Vasikarla
Comments: Accepted and Presented at IEEE UEMCON, IBM T.J. Watson Research Center, New York, USA, 2024
Abstract: Knee Osteoarthritis (KOA) is a musculoskeletal condition that can cause significant limitations and impairments in daily activities, especially among older individuals. To evaluate the severity of KOA, typically, X-ray images of the affected knee are analyzed, and a grade is assigned based on the Kellgren-Lawrence (KL) grading system, which classifies KOA severity into five levels, ranging from 0 to 4. This approach requires a high level of expertise and time and is susceptible to subjective interpretation, thereby introducing potential diagnostic inaccuracies. To address this problem, a stacked ensemble model of fine-tuned Convolutional Neural Networks (CNNs) was developed for two classification tasks: a binary classifier for detecting the presence of KOA, and a multiclass classifier for precise grading across the KL spectrum. The proposed stacked ensemble model consists of a diverse set of pre-trained architectures, including MobileNetV2, You Only Look Once (YOLOv8), and DenseNet201 as base learners and Categorical Boosting (CatBoost) as the meta-learner. This proposed model had a balanced test accuracy of 73% in multiclass classification and 87.5% in binary classification, which is higher than previous works in extant literature.

【4】47B Mixture-of-Experts Beats 671B Dense Models on Chinese Medical Examinations
Link: https://arxiv.org/abs/2511.21701

Authors: Chiung-Yi Tseng, Danyang Zhang, Tianyang Wang, Hongying Luo, Lu Chen, Junming Huang, Jibin Guan, Junfeng Hao, Junhao Song, Ziqian Bi
Abstract: The rapid advancement of large language models (LLMs) has prompted significant interest in their potential applications in medical domains. This paper presents a comprehensive benchmark evaluation of 27 state-of-the-art LLMs on Chinese medical examination questions, encompassing seven medical specialties across two professional levels. We introduce a robust evaluation framework that assesses model performance on 2,800 carefully curated questions from cardiovascular, gastroenterology, hematology, infectious diseases, nephrology, neurology, and respiratory medicine domains. Our dataset distinguishes between attending physician and senior physician difficulty levels, providing nuanced insights into model capabilities across varying complexity. Our empirical analysis reveals substantial performance variations among models, with Mixtral-8x7B achieving the highest overall accuracy of 74.25%, followed by DeepSeek-R1-671B at 64.07%. Notably, we observe no consistent correlation between model size and performance, as evidenced by the strong performance of smaller mixture-of-experts architectures. The evaluation demonstrates significant performance gaps between medical specialties, with models generally performing better on cardiovascular and neurology questions compared to gastroenterology and nephrology domains. Furthermore, our analysis indicates minimal performance degradation between attending and senior physician levels for top-performing models, suggesting robust generalization capabilities. This benchmark provides critical insights for the deployment of LLMs in medical education and clinical decision support systems, highlighting both the promise and current limitations of these technologies in specialized medical contexts.

Recommendation (3 papers)

【1】Masked Diffusion for Generative Recommendation
Link: https://arxiv.org/abs/2511.23021

Authors: Kulin Shah, Bhuvesh Kumar, Neil Shah, Liam Collins
Comments: 25 pages
Abstract: Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches due to its performance gains, capitalization on semantic information provided through language model embeddings, and inference and storage efficiency. Existing work on GR with SIDs models the probability of the sequence of SIDs corresponding to a user's interaction history autoregressively. While this has led to impressive next-item prediction performance in certain settings, these autoregressive models suffer from expensive inference due to sequential token-wise decoding, potentially inefficient use of training data, and bias towards learning short-context relationships among tokens. Inspired by recent breakthroughs in NLP, we propose to instead model and learn the probability of a user's sequence of SIDs using masked diffusion. Masked diffusion employs discrete masking noise to facilitate learning the sequence distribution, and models the probability of masked tokens as conditionally independent given the unmasked tokens, allowing for parallel decoding of the masked tokens. We demonstrate through thorough experiments that our proposed method consistently outperforms autoregressive modeling. This performance gap is especially pronounced in data-constrained settings and in terms of coarse-grained recall, consistent with our intuitions. Moreover, our approach allows the flexibility of predicting multiple SIDs in parallel during inference while maintaining superior performance to autoregressive modeling.
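
The "discrete masking noise" in the abstract can be pictured as a corruption step that independently replaces each token with a mask symbol; a model is then trained to predict the masked positions from the visible ones. The sketch below shows only that corruption step under stated assumptions: the `MASK` symbol, the SID strings, and the fixed masking probability are hypothetical, and the paper's noise schedule and model are not reproduced.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def mask_sequence(tokens, mask_prob, rng):
    """Independently replace each token with MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def masked_positions(noisy):
    """Indices a model would be asked to predict, all at once."""
    return [i for i, tok in enumerate(noisy) if tok == MASK]


rng = random.Random(0)
sids = ["a12", "b07", "c33", "a02", "b19", "c41"]  # made-up semantic IDs
noisy = mask_sequence(sids, mask_prob=0.5, rng=rng)
targets = masked_positions(noisy)
```

Because the masked positions are treated as conditionally independent given the visible tokens, every index in `targets` can be predicted in one forward pass rather than one token at a time.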

【2】Privacy-Utility-Bias Trade-offs for Privacy-Preserving Recommender Systems
Link: https://arxiv.org/abs/2511.22515

Authors: Shiva Parsarad, Isabel Wagner
Abstract: Recommender systems (RSs) output ranked lists of items, such as movies or restaurants, that users may find interesting, based on the user's past ratings and ratings from other users. RSs increasingly incorporate differential privacy (DP) to protect user data, raising questions about how privacy mechanisms affect both recommendation accuracy and fairness. We conduct a comprehensive, cross-model evaluation of two DP mechanisms, differentially private stochastic gradient descent (DPSGD) and local differential privacy (LDP), applied to four recommender systems (Neural Collaborative Filtering (NCF), Bayesian Personalized Ranking (BPR), Singular Value Decomposition (SVD), and Variational Autoencoder (VAE)) on the MovieLens-1M and Yelp datasets. We find that stronger privacy consistently reduces utility, but not uniformly. NCF under DPSGD shows the smallest accuracy loss (under 10 percent at epsilon approximately 1), whereas SVD and BPR experience larger drops, especially for users with niche preferences. VAE is the most sensitive to privacy, with sharp declines for sparsely represented groups. The impact on bias metrics is similarly heterogeneous. DPSGD generally reduces the gap between recommendations of popular and less popular items, whereas LDP preserves existing patterns more closely. These results highlight that no single DP mechanism is uniformly superior; instead, each provides trade-offs under different privacy regimes and data conditions.
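
DPSGD, as it is usually described, clips each per-example gradient to a maximum L2 norm, sums the clipped gradients, adds Gaussian noise scaled to that norm, and averages. The sketch below shows one such gradient computation; the parameter names are illustrative, privacy accounting (the mapping to epsilon) is deliberately omitted, and nothing here reflects the paper's own configuration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clip norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.0]]  # made-up per-example gradients
no_noise = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
private = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds any single user's influence on the update, which is one plausible reason users with unusual (niche) gradients lose more utility under stronger privacy settings.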

【3】Benchmarking In-context Experiential Learning Through Repeated Product Recommendations
Link: https://arxiv.org/abs/2511.22130

Authors: Gilbert Yang, Yaqin Chen, Thomson Yen, Hongseok Namkoong
Abstract: To reliably navigate ever-shifting real-world environments, agents must grapple with incomplete knowledge and adapt their behavior through experience. However, current evaluations largely focus on tasks that leave no ambiguity, and do not measure agents' ability to adaptively learn and reason through the experiences they have accrued. We exemplify the need for this in-context experiential learning in a product recommendation context, where agents must navigate shifting customer preferences and product landscapes through natural language dialogue. We curate a benchmark for experiential learning and active exploration (BELA) that combines (1) rich real-world products from Amazon, (2) a diverse collection of user personas to represent heterogeneous yet latent preferences, and (3) an LLM user simulator powered by the persona to create rich interactive trajectories. We observe that current frontier models struggle to meaningfully improve across episodes, underscoring the need for agentic systems with strong in-context learning capabilities.

Clustering (2 papers)

【1】An Improved and Generalised Analysis for Spectral Clustering
Link: https://arxiv.org/abs/2511.23261

Authors: George Tyler, Luca Zanetti
Comments: 11 pages, 7 figures. Accepted to Learning on Graphs Conference 2025
Abstract: We revisit the theoretical performance of Spectral Clustering, a classical algorithm for graph partitioning that relies on the eigenvectors of a matrix representation of the graph. Informally, we show that Spectral Clustering works well as long as the smallest eigenvalues appear in groups well separated from the rest of the matrix representation's spectrum. This arises, for example, whenever there exists a hierarchy of clusters at different scales, a regime not captured by previous analyses. Our results are very general and can be applied beyond the traditional graph Laplacian. In particular, we study Hermitian representations of digraphs and show Spectral Clustering can recover partitions where edges between clusters are oriented mostly in the same direction. This has applications in, for example, the analysis of trophic levels in ecological networks. We demonstrate that our results accurately predict the performance of Spectral Clustering on synthetic and real-world data sets.
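
One widely used Hermitian representation of a digraph sets the entry for a directed edge u -> v to the imaginary unit i and the mirrored entry to -i, so edge direction is encoded in the complex phase while the matrix keeps real eigenvalues. The sketch below builds that matrix; it assumes this particular convention, which may differ from the one analysed in the paper, and the function names are illustrative.

```python
def hermitian_adjacency(n, edges):
    """H[u][v] = i and H[v][u] = -i for each directed edge u -> v."""
    H = [[0j] * n for _ in range(n)]
    for u, v in edges:
        H[u][v] += 1j
        H[v][u] -= 1j
    return H


def is_hermitian(H):
    """Check that H equals its own conjugate transpose."""
    n = len(H)
    return all(
        H[i][j] == H[j][i].conjugate() for i in range(n) for j in range(n)
    )


# A made-up 3-node food chain: 0 -> 1 -> 2.
H = hermitian_adjacency(3, [(0, 1), (1, 2)])
```

Under this convention a pair of opposite edges cancels to zero, so the matrix responds to net flow between nodes rather than to the mere presence of a connection.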

【2】Clustering Malware at Scale: A First Full-Benchmark Study
Link: https://arxiv.org/abs/2511.23198

Authors: Martin Mocko, Jakub Ševcech, Daniela Chudá
Comments: pre-print of the paper (i.e. "submitted manuscript" version)
Abstract: Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One of the ways in which groups of malware samples can be identified is through malware clustering. Despite the efforts of the community, malware clustering which incorporates benign samples has been under-explored. Moreover, despite the availability of larger public benchmark malware datasets, malware clustering studies have avoided fully utilizing these datasets in their experiments, often resorting to small datasets with only a few families. Additionally, the current state-of-the-art solutions for malware clustering remain unclear. In our study, we evaluate malware clustering quality and establish the state of the art on Bodmas and Ember, two large public benchmark malware datasets. Ours is the first study of malware clustering performed on whole malware benchmark datasets. Additionally, we extend the malware clustering task by incorporating benign samples. Our results indicate that incorporating benign samples does not significantly degrade clustering quality. We find that there are significant differences in the quality of the created clusters between Ember and Bodmas, as well as a private industry dataset. Contrary to popular opinion, our top clustering performers are K-Means and BIRCH, with DBSCAN and HAC falling behind.

Autonomous driving | vehicles | lane detection, etc. (1 paper)

【1】A Multi-View Multi-Timescale Hypergraph-Empowered Spatiotemporal Framework for EV Charging Forecasting
Link: https://arxiv.org/abs/2511.22072

Authors: Jinhao Li, Hao Wang
Comments: 14 pages
Abstract: Accurate electric vehicle (EV) charging demand forecasting is essential for stable grid operation and proactive EV participation in the electricity market. Existing forecasting methods, particularly those based on graph neural networks, are often limited to modeling pairwise relationships between stations, failing to capture the complex, group-wise dynamics inherent in urban charging networks. To address this gap, we develop a novel forecasting framework, namely HyperCast, leveraging the expressive power of hypergraphs to model the higher-order spatiotemporal dependencies hidden in EV charging patterns. HyperCast integrates multi-view hypergraphs, which capture both static geographical proximity and dynamic demand-based functional similarities, along with multi-timescale inputs to differentiate between recent trends and weekly periodicities. The framework employs specialized hyper-spatiotemporal blocks and tailored cross-attention mechanisms to effectively fuse information from these diverse sources: views and timescales. Extensive experiments on four public datasets demonstrate that HyperCast significantly outperforms a wide array of state-of-the-art baselines, demonstrating the effectiveness of explicitly modeling collective charging behaviors for more accurate forecasting.

Point cloud | SLAM | radar | laser | depth/RGB-D (1 paper)

【1】Automated Discovery of Laser Dicing Processes with Bayesian Optimization for Semiconductor Manufacturing
Link: https://arxiv.org/abs/2511.23141

Authors: David Leeftink, Roman Doll, Heleen Visserman, Marco Post, Faysal Boughorbel, Max Hinne, Marcel van Gerven
Comments: 18 pages, 9 figures
Abstract: Laser dicing of semiconductor wafers is a critical step in microelectronic manufacturing, where multiple sequential laser passes precisely separate individual dies from the wafer. Adapting this complex sequential process to new wafer materials typically requires weeks of expert effort to balance process speed, separation quality, and material integrity. We present the first automated discovery of production-ready laser dicing processes on an industrial LASER1205 dicing system. We formulate the problem as a high-dimensional, constrained multi-objective Bayesian optimization task, and introduce a sequential two-level fidelity strategy to minimize expensive destructive die-strength evaluations. On bare silicon and product wafers, our method autonomously delivers feasible configurations that match or exceed expert baselines in production speed, die strength, and structural integrity, using only technician-level operation. Post-hoc validation of different weight configurations of the utility functions reveals that multiple feasible solutions with qualitatively different trade-offs can be obtained from the final surrogate model. Expert refinement of the discovered process can further improve production speed while preserving die strength and structural integrity, surpassing purely manual or automated methods.

Federated learning | privacy protection | encryption (5 papers)

【1】Closing the Generalization Gap in Parameter-efficient Federated Edge Learning
Link: https://arxiv.org/abs/2511.23282

Authors: Xinnong Du, Zhonghao Lyu, Xiaowen Cao, Chunyang Wen, Shuguang Cui, Jie Xu
Comments: 13 pages, 8 figures
Abstract: Federated edge learning (FEEL) provides a promising foundation for edge artificial intelligence (AI) by enabling collaborative model training while preserving data privacy. However, limited and heterogeneous local datasets, as well as resource-constrained deployment, severely degrade both model generalization and resource utilization, leading to a compromised learning performance. Therefore, we propose a parameter-efficient FEEL framework that jointly leverages model pruning and client selection to tackle such challenges. First, we derive an information-theoretic generalization statement that characterizes the discrepancy between training and testing function losses and embed it into the convergence analysis. It reveals that a larger local generalization statement can undermine the global convergence. Then, we formulate a generalization-aware average squared gradient norm bound minimization problem, by jointly optimizing the pruning ratios, client selection, and communication-computation resources under energy and delay constraints. Despite its non-convexity, the resulting mixed-integer problem is efficiently solved via an alternating optimization algorithm. Extensive experiments demonstrate that the proposed design achieves learning performance superior to state-of-the-art baselines, validating the effectiveness of coupling generalization-aware analysis with system-level optimization for efficient FEEL.

【2】Federated Learning Survey: A Multi-Level Taxonomy of Aggregation Techniques, Experimental Insights, and Future Frontiers
Link: https://arxiv.org/abs/2511.22616

Authors: Meriem Arbaoui, Mohamed-el-Amine Brahmia, Abdellatif Rahmoun, Mourad Zghal
Comments: Author-Accepted Manuscript. 65 pages, 26 figures, 20 tables. Published in ACM Transactions on Intelligent Systems and Technology (TIST), 2024
Abstract: The integration of IoT and AI has unlocked innovation across industries, but growing privacy concerns and data isolation hinder progress. Traditional centralized ML struggles to overcome these challenges, which has led to the rise of Federated Learning (FL), a decentralized paradigm that enables collaborative model training without sharing local raw data. FL ensures data privacy, reduces communication overhead, and supports scalability, yet its heterogeneity adds complexity compared to centralized approaches. This survey focuses on three main FL research directions: personalization, optimization, and robustness, offering a structured classification through a hybrid methodology that combines bibliometric analysis with systematic review to identify the most influential works. We examine challenges and techniques related to heterogeneity, efficiency, security, and privacy, and provide a comprehensive overview of aggregation strategies, including architectures, synchronization methods, and diverse federation objectives. To complement this, we discuss practical evaluation approaches and present experiments comparing aggregation methods under IID and non-IID data distributions. Finally, we outline promising research directions to advance FL, aiming to guide future innovation in this rapidly evolving field.
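
The simplest aggregation rule covered by surveys of this kind is FedAvg-style weighted averaging, in which the server averages client parameters weighted by each client's number of local samples. The sketch below shows that one rule only; the flat parameter lists and sample counts are made up, and it is not meant to represent the survey's taxonomy or experiments.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_params) != len(client_sizes):
        raise ValueError("one size per client is required")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    return [
        sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```

Under non-IID data the larger client pulls the global model towards its own optimum, which is part of what personalised and robust aggregation schemes try to correct.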

【3】FLUX: Efficient Descriptor-Driven Clustered Federated Learning under Arbitrary Distribution Shifts
Link: https://arxiv.org/abs/2511.22305

Authors: Dario Fenoglio, Mohan Li, Pietro Barbiero, Nicholas D. Lane, Marc Langheinrich, Martin Gjoreski
Comments: [v1] Pre-print of the paper accepted to NeurIPS 2025 (57 pages)
Abstract: Federated Learning (FL) enables collaborative model training across multiple clients while preserving data privacy. Traditional FL methods often use a global model to fit all clients, assuming that clients' data are independent and identically distributed (IID). However, when this assumption does not hold, the global model accuracy may drop significantly, limiting FL applicability in real-world scenarios. To address this gap, we propose FLUX, a novel clustering-based FL (CFL) framework that addresses the four most common types of distribution shifts during both training and test time. To this end, FLUX leverages privacy-preserving client-side descriptor extraction and unsupervised clustering to ensure robust performance and scalability across varying levels and types of distribution shifts. Unlike existing CFL methods addressing non-IID client distribution shifts, FLUX i) does not require any prior knowledge of the types of distribution shifts or the number of client clusters, and ii) supports test-time adaptation, enabling unseen and unlabeled clients to benefit from the most suitable cluster-specific models. Extensive experiments across four standard benchmarks, two real-world datasets and ten state-of-the-art baselines show that FLUX improves performance and stability under diverse distribution shifts, achieving an average accuracy gain of up to 23 percentage points over the best-performing baselines, while maintaining computational and communication overhead comparable to FedAvg.

【4】FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Link: https://arxiv.org/abs/2511.22265

Authors: Yuan Yao, Lixu Wang, Jiaqi Wu, Jin Song, Simin Chen, Zehua Wang, Zijian Tian, Wei Chen, Huixia Li, Xiaoxiao Li
Abstract: Federated learning (FL) enables collaborative training across clients without compromising privacy. While most existing FL methods assume homogeneous model architectures, client heterogeneity in data and resources renders this assumption impractical, motivating model-heterogeneous FL. To address this problem, we propose Federated Representation Entanglement (FedRE), a framework built upon a novel form of client knowledge termed entangled representation. In FedRE, each client aggregates its local representations into a single entangled representation using normalized random weights and applies the same weights to integrate the corresponding one-hot label encodings into the entangled-label encoding. Those are then uploaded to the server to train a global classifier. During training, each entangled representation is supervised across categories via its entangled-label encoding, while random weights are resampled each round to introduce diversity, mitigating the global classifier's overconfidence and promoting smoother decision boundaries. Furthermore, each client uploads a single cross-category entangled representation along with its entangled-label encoding, mitigating the risk of representation inversion attacks and reducing communication overhead. Extensive experiments demonstrate that FedRE achieves an effective trade-off among model performance, privacy protection, and communication overhead. The codes are available at https://github.com/AIResearch-Group/FedRE.
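
As a rough illustration of the entanglement step described in this abstract, the sketch below mixes one client's representations and one-hot labels with the same normalized random weights. The function name `entangle`, the uniform weight distribution, and the toy vectors are assumptions made for illustration; they are not taken from the FedRE code base.

```python
import random


def entangle(representations, labels, num_classes, rng):
    """Mix representations and one-hot labels with shared random weights.

    Assumption: weights are drawn uniformly and normalized to sum to 1;
    the abstract only says "normalized random weights".
    """
    raw = [rng.random() + 1e-12 for _ in representations]
    total = sum(raw)
    weights = [w / total for w in raw]

    dim = len(representations[0])
    # Weighted sum of the representation vectors.
    mixed_rep = [
        sum(w * rep[d] for w, rep in zip(weights, representations))
        for d in range(dim)
    ]
    # The same weights applied to the one-hot label encodings.
    mixed_label = [0.0] * num_classes
    for w, y in zip(weights, labels):
        mixed_label[y] += w
    return mixed_rep, mixed_label


# Toy client data (hypothetical): three 2-D representations, one per class.
rng = random.Random(0)
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 2]
rep, soft_label = entangle(reps, labels, num_classes=3, rng=rng)
print(rep, soft_label)
```

Because the weights sum to one, the entangled label is a valid soft label. Calling `entangle` again with fresh random draws gives a different mixture, which corresponds to the per-round resampling the abstract mentions.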

【5】A Fast and Flat Federated Learning Method via Weighted Momentum and Sharpness-Aware Minimization
Link: https://arxiv.org/abs/2511.22080

Authors: Tianle Li, Yongzhi Huang, Linshan Jiang, Chang Liu, Qipeng Xie, Wenfeng Du, Lu Wang, Kaishun Wu
Abstract: In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local-global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a momentum-guided global perturbation from server-aggregated momentum to align clients' SAM directions with the global descent geometry, enabling a single-backprop SAM approximation that preserves efficiency. Second, we couple momentum and SAM via a cosine-similarity adaptive rule, yielding an early-momentum, late-SAM two-phase training schedule. On the theory side, we provide a non-IID convergence bound that explicitly models the perturbation-induced variance $\sigma_\rho^2 = \sigma^2 + (L\rho)^2$ and its dependence on $(S, K, R, N)$. We conduct extensive experiments on multiple datasets and model architectures, and the results validate the effectiveness, adaptability, and robustness of our method, demonstrating its superiority in addressing the optimization challenges of Federated Learning. Our code is available at https://github.com/Huang-Yongzhi/NeurlPS_FedWMSAM.

Reasoning|Analysis|Understanding|Explanation (12 papers)

【1】SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments
Link: https://arxiv.org/abs/2511.23465

Authors: Xinyi Li, Zaishuo Xia, Weyl Lu, Chenjie Hao, Yubei Chen
Abstract: Current world models lack a unified and controlled setting for systematic evaluation, making it difficult to assess whether they truly capture the underlying rules that govern environment dynamics. In this work, we address this open challenge by introducing the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics without relying on handcrafted reward signals. Using this benchmark, we conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE, examining their behavior across six distinct domains. The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts, highlighting both the strengths and limitations of current modeling paradigms and offering insights into future improvement directions in representation learning and dynamics modeling.

【2】The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference
Link: https://arxiv.org/abs/2511.23455

Authors: Hans Gundlach, Jayson Lynch, Matthias Mertens, Neil Thompson
Abstract: Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artificial Analysis and Epoch AI to form the largest dataset of current and historical prices to run benchmarks to date. We find that the price for a given level of benchmark performance has decreased remarkably fast, around $5\times$ to $10\times$ per year, for frontier models on knowledge, reasoning, math, and software engineering benchmarks. These reductions in the cost of AI inference are due to economic forces, hardware efficiency improvements, and algorithmic efficiency improvements. Isolating out open models to control for competition effects and dividing by hardware price declines, we estimate that algorithmic efficiency progress is around $3\times$ per year. Finally, we recommend that evaluators both publicize and take into account the price of benchmarking as an essential part of measuring the real-world impact of AI.
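
To make the reported rates concrete, here is a small arithmetic sketch of how a constant yearly decline compounds. The helper `price_after` and the 100-dollar starting price are hypothetical; only the 3x, 5x, and 10x yearly factors come from the abstract.

```python
def price_after(initial_price, annual_factor, years):
    """Price of a fixed benchmark score after `years` of a constant yearly decline."""
    return initial_price / (annual_factor ** years)


# Hypothetical starting price of 100 dollars for some fixed benchmark score.
for factor in (3.0, 5.0, 10.0):
    later = price_after(100.0, factor, 2)
    print(f"{factor:g}x per year -> {later:.2f} dollars after 2 years")
```

Under this simple model, a 5x yearly decline turns 100 dollars into 4 dollars after two years, and a 10x decline turns it into 1 dollar.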

【3】Machine Learning for Scientific Visualization: Ensemble Data Analysis
Link: https://arxiv.org/abs/2511.23290

Authors: Hamid Gadirov
Comments: PhD thesis, University of Groningen, 2025
Abstract: Scientific simulations and experimental measurements produce vast amounts of spatio-temporal data, yet extracting meaningful insights remains challenging due to high dimensionality, complex structures, and missing information. Traditional analysis methods often struggle with these issues, motivating the need for more robust, data-driven approaches. This dissertation explores deep learning methodologies to improve the analysis and visualization of spatio-temporal scientific ensembles, focusing on dimensionality reduction, flow estimation, and temporal interpolation. First, we address high-dimensional data representation through autoencoder-based dimensionality reduction for scientific ensembles. We evaluate the stability of projection metrics under partial labeling and introduce a Pareto-efficient selection strategy to identify optimal autoencoder variants, ensuring expressive and reliable low-dimensional embeddings. Next, we present FLINT, a deep learning model for high-quality flow estimation and temporal interpolation in both flow-supervised and flow-unsupervised settings. FLINT reconstructs missing velocity fields and generates high-fidelity temporal interpolants for scalar fields across 2D+time and 3D+time ensembles without domain-specific assumptions or extensive finetuning. To further improve adaptability and generalization, we introduce HyperFLINT, a hypernetwork-based approach that conditions on simulation parameters to estimate flow fields and interpolate scalar data. This parameter-aware adaptation yields more accurate reconstructions across diverse scientific domains, even with sparse or incomplete data. Overall, this dissertation advances deep learning techniques for scientific visualization, providing scalable, adaptable, and high-quality solutions for interpreting complex spatio-temporal ensembles.

【4】CRAwDAD: Causal Reasoning Augmentation with Dual-Agent Debate
Link: https://arxiv.org/abs/2511.22854

Authors: Finn G. Vamosi, Nils D. Forkert
Comments: 12 pages, 8 figures. Code available at https://github.com/finnvamosi/CRAwDAD
Abstract: When people reason about cause and effect, they often consider many competing "what if" scenarios before deciding which explanation fits best. Analogously, advanced language models capable of causal inference can consider multiple interventions and counterfactuals to judge the validity of causal claims. Crucially, this type of reasoning is less like a single calculation and more like an internal dialogue between alternative hypotheses. In this paper, we make this dialogue explicit through a dual-agent debate framework where one model provides a structured causal inference, and the other critically examines this reasoning for logical flaws. When disagreements arise, agents attempt to persuade each other, challenging each other's logic and revising their conclusions until they converge on a mutually agreed answer. To take advantage of this deliberative process, we specifically use reasoning language models, whose strengths in both causal inference and adversarial debate remain under-explored relative to standard large language models. We evaluate our approach on the CLadder dataset, a benchmark linking natural language questions to formally defined causal graphs across all three rungs of Pearl's ladder of causation. With Qwen3 and DeepSeek-R1 as debater agents, we demonstrate that multi-agent debate improves DeepSeek-R1's overall accuracy in causal inference from 78.03% to 87.45%, with the counterfactual category specifically improving from 67.94% to 80.04% accuracy. Similarly, Qwen3's overall accuracy improves from 84.16% to 89.41%, and counterfactual questions from 71.53% to 80.35%, showing that strong models can still benefit greatly from debate with weaker agents. Our results highlight the potential of reasoning models as building blocks for multi-agent systems in causal inference, and demonstrate the importance of diverse perspectives in causal problem-solving.

【5】PerfMamba: Performance Analysis and Pruning of Selective State Space Models
Link: https://arxiv.org/abs/2511.22849

Authors: Abdullah Al Asif, Mobina Kashaniyan, Sixing Yu, Juan Pablo Muñoz, Ali Jannesari
Comments: Accepted in Bench 2025
Abstract: Recent advances in sequence modeling have introduced selective SSMs as promising alternatives to Transformer architectures, offering theoretical computational efficiency and sequence processing advantages. A comprehensive understanding of selective SSMs in runtime behavior, resource utilization patterns, and scaling characteristics still remains unexplored, thus obstructing their optimal deployment and further architectural improvements. This paper presents a thorough empirical study of Mamba-1 and Mamba-2, systematically profiled for performance to assess the design principles that contribute to their efficiency in state-space modeling. A detailed analysis of computation patterns, memory access, I/O characteristics, and scaling properties was performed for sequence lengths ranging from 64 to 16384 tokens. Our findings show that the SSM component, a central part of the selective SSM architecture, demands a significant portion of computational resources compared to other components in the Mamba block. Based on these insights, we propose a pruning technique that selectively removes low-activity states within the SSM component, achieving measurable throughput and memory gains while maintaining accuracy within a moderate pruning regime. This approach results in performance improvements across varying sequence lengths, achieving a 1.14x speedup and reducing memory usage by 11.50%. These results offer valuable guidance for designing more efficient SSM architectures that can be applied to a wide range of real-world applications.
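
The abstract does not say how "low-activity" states are scored, so the following is only a minimal sketch under one assumed criterion: rank state dimensions by mean absolute activation over time and drop the lowest fraction. The function `low_activity_mask`, the scoring rule, and the toy activations are all hypothetical and are not the PerfMamba implementation.

```python
def low_activity_mask(state_activations, prune_ratio):
    """Return one keep-flag per state dimension.

    `state_activations` is a list of time steps, each a list of state values.
    Assumed activity score: mean absolute activation over time steps.
    """
    num_steps = len(state_activations)
    num_states = len(state_activations[0])
    activity = [
        sum(abs(step[s]) for step in state_activations) / num_steps
        for s in range(num_states)
    ]
    num_pruned = int(num_states * prune_ratio)
    # Indices of the `num_pruned` least active state dimensions.
    ranked = sorted(range(num_states), key=lambda s: activity[s])
    pruned = set(ranked[:num_pruned])
    return [s not in pruned for s in range(num_states)]


# Toy trace (hypothetical): 2 time steps, 4 state dimensions.
trace = [[0.0, 1.0, -2.0, 0.1], [0.1, -1.0, 2.0, 0.0]]
mask = low_activity_mask(trace, prune_ratio=0.5)
print(mask)
```

Dimensions flagged `False` would be removed from the state before running the SSM recurrence, which is where any throughput and memory savings would come from.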

【6】Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation
Link: https://arxiv.org/abs/2511.22565

Authors: Yannick Brunink, Daniel Daza, Yunjie He, Michael Cochez
Abstract: Neural methods for Complex Query Answering (CQA) over knowledge graphs (KGs) are widely believed to learn patterns that generalize beyond explicit graph structure, allowing them to infer answers that are unreachable through symbolic query processing. In this work, we critically examine this assumption through a systematic analysis comparing neural CQA models with an alternative, training-free query relaxation strategy that retrieves possible answers by relaxing query constraints and counting resulting paths. Across multiple datasets and query structures, we find several cases where neural and relaxation-based approaches perform similarly, with no neural model consistently outperforming the latter. Moreover, a similarity analysis reveals that their retrieved answers exhibit little overlap, and that combining their outputs consistently improves performance. These results call for a re-evaluation of progress in neural query answering: despite their complexity, current models fail to subsume the reasoning patterns captured by query relaxation. Our findings highlight the importance of stronger non-neural baselines and suggest that future neural approaches could benefit from incorporating principles of query relaxation.

【7】Space Explanations of Neural Network Classification
Link: https://arxiv.org/abs/2511.22498

Authors: Faezeh Labbaf, Tomáš Kolárik, Martin Blicha, Grigory Fedyukovich, Michael Wand, Natasha Sharygina
Abstract: We present a novel logic-based concept called Space Explanations for classifying neural networks that gives provable guarantees of the behavior of the network in continuous areas of the input feature space. To automatically generate space explanations, we leverage a range of flexible Craig interpolation algorithms and unsatisfiable core generation. Based on real-life case studies, ranging from small to medium to large size, we demonstrate that the generated explanations are more meaningful than those computed by state-of-the-art.

【8】DeepGI: Explainable Deep Learning for Gastrointestinal Image Classification
Link: https://arxiv.org/abs/2511.21959

Authors: Walid Houmaidi, Mohamed Hadadi, Youssef Sabiri, Yousra Chtouki
Comments: 7 pages, 4 figures, 2 tables. Accepted at DASET 2026
Abstract: This paper presents a comprehensive comparative model analysis on a novel gastrointestinal medical imaging dataset, comprised of 4,000 endoscopic images spanning four critical disease classes: Diverticulosis, Neoplasm, Peritonitis, and Ureters. Leveraging state-of-the-art deep learning techniques, the study confronts common endoscopic challenges such as variable lighting, fluctuating camera angles, and frequent imaging artifacts. The best performing models, VGG16 and MobileNetV2, each achieved a test accuracy of 96.5%, while Xception reached 94.24%, establishing robust benchmarks and baselines for automated disease classification. In addition to strong classification performance, the approach includes explainable AI via Grad-CAM visualization, enabling identification of image regions most influential to model predictions and enhancing clinical interpretability. Experimental results demonstrate the potential for robust, accurate, and interpretable medical image analysis even in complex real-world conditions. This work contributes original benchmarks, comparative insights, and visual explanations, advancing the landscape of gastrointestinal computer-aided diagnosis and underscoring the importance of diverse, clinically relevant datasets and model explainability in medical AI research.

【9】Variational analysis of determinantal varieties
Link: https://arxiv.org/abs/2511.22613

Authors: Yan Yang, Bin Gao, Ya-xiang Yuan
Comments: 71 pages, 6 figures, 2 tables
Abstract: Determinantal varieties -- the sets of bounded-rank matrices or tensors -- have attracted growing interest in low-rank optimization. The tangent cone to low-rank sets is widely studied and underpins a range of geometric methods. The second-order geometry, which encodes curvature information, is more intricate. In this work, we develop a unified framework to derive explicit formulas for both first- and second-order tangent sets to various low-rank sets, including low-rank matrices, tensors, symmetric matrices, and positive semidefinite matrices. The framework also accommodates the intersection of a low-rank set and another set satisfying mild assumptions, thereby yielding a tangent intersection rule. Through the lens of tangent sets, we establish a necessary and sufficient condition under which a nonsmooth problem and its smooth parameterization share equivalent second-order stationary points. Moreover, we exploit tangent sets to characterize optimality conditions for low-rank optimization and prove that verifying second-order optimality is NP-hard. In a separate line of analysis, we investigate variational geometry of the graph of the normal cone to matrix varieties, deriving the explicit Bouligand tangent cone, Fréchet and Mordukhovich normal cones to the graph. These results are further applied to develop optimality conditions for low-rank bilevel programs.

【10】Data-driven informative priors for Bayesian inference with quasi-periodic data
Link: https://arxiv.org/abs/2511.22296

Authors: Javier Lopez-Santiago, Luca Martino, Joaquin Miguez, Gonzalo Vazquez-Vilar
Comments: Accepted for publication in AJ. 19 pages (one column), 14 figures
Abstract: Bayesian computational strategies for inference can be inefficient in approximating the posterior distribution in models that exhibit some form of periodicity. This is because the probability mass of the marginal posterior distribution of the parameter representing the period is usually highly concentrated in a very small region of the parameter space. Therefore, it is necessary to provide as much information as possible to the inference method through the parameter prior distribution. We intend to show that it is possible to construct a prior distribution from the data by fitting a Gaussian process (GP) with a periodic kernel. More specifically, we want to show that it is possible to approximate the marginal posterior distribution of the hyperparameter corresponding to the period in the kernel. Subsequently, this distribution can be used as a prior distribution for the inference method. We use an adaptive importance sampling method to approximate the posterior distribution of the hyperparameters of the GP. Then, we use the marginal posterior distribution of the hyperparameter related to the periodicity in order to construct a prior distribution for the period of the parametric model. This workflow is empirical Bayes, implemented as a modular (cut) transfer of a GP posterior for the period to the parametric model. We applied the proposed methodology to both synthetic and real data. We approximated the posterior distribution of the period of the GP kernel and then passed it forward as a posterior-as-prior with no feedback. Finally, we analyzed its impact on the marginal posterior distribution.
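
The abstract does not state which periodic kernel is used. As one common choice, the sketch below implements the standard exp-sine-squared covariance, whose `period` hyperparameter plays the role of the quantity whose posterior is transferred as a prior. The function name and default values are assumptions for illustration.

```python
import math


def periodic_kernel(t1, t2, period, length_scale=1.0, variance=1.0):
    """Exp-sine-squared covariance between two time points.

    Inputs a whole number of periods apart get the maximum covariance.
    """
    s = math.sin(math.pi * abs(t1 - t2) / period)
    return variance * math.exp(-2.0 * s * s / (length_scale ** 2))


# Points one full period apart are (numerically) perfectly correlated,
# points half a period apart are the least correlated.
print(periodic_kernel(0.0, 2.5, period=2.5))
print(periodic_kernel(0.0, 1.25, period=2.5))
```

With such a kernel, the GP likelihood is sharply peaked in `period` for strongly periodic data, which matches the abstract's point that the posterior mass for the period concentrates in a small region.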

【11】Towards Understanding Generalization in DP-GD: A Case Study in Training Two-Layer CNNs
Link: https://arxiv.org/abs/2511.22270

Authors: Zhongjie Shi, Puyu Wang, Chenyang Zhang, Yuan Cao
Abstract: Modern deep learning techniques focus on extracting intricate information from data to achieve accurate predictions. However, the training datasets may be crowdsourced and include sensitive information, such as personal contact details, financial data, and medical records. As a result, there is a growing emphasis on developing privacy-preserving training algorithms for neural networks that maintain good performance while preserving privacy. In this paper, we investigate the generalization and privacy performances of the differentially private gradient descent (DP-GD) algorithm, which is a private variant of the gradient descent (GD) by incorporating additional noise into the gradients during each iteration. Moreover, we identify a concrete learning task where DP-GD can achieve superior generalization performance compared to GD in training two-layer Huberized ReLU convolutional neural networks (CNNs). Specifically, we demonstrate that, under mild conditions, a small signal-to-noise ratio can result in GD producing training models with poor test accuracy, whereas DP-GD can yield training models with good test accuracy and privacy guarantees if the signal-to-noise ratio is not too small. This indicates that DP-GD has the potential to enhance model performance while ensuring privacy protection in certain learning tasks. Numerical simulations are further conducted to support our theoretical results.
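
The update rule described here, gradient descent with noise added to the gradient at every iteration, can be sketched on a toy one-dimensional objective. The function `noisy_gd`, the quadratic loss, and all hyperparameter values are hypothetical. The sketch also omits how `noise_std` would be calibrated to a privacy budget, which typically requires a bound on gradient sensitivity.

```python
import random


def noisy_gd(grad_fn, w0, lr, noise_std, steps, rng):
    """Full-batch gradient descent with Gaussian noise added to each gradient."""
    w = w0
    for _ in range(steps):
        noisy_grad = grad_fn(w) + rng.gauss(0.0, noise_std)
        w -= lr * noisy_grad
    return w


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


# noise_std = 0 recovers plain GD; noise_std > 0 gives the noisy variant.
w_gd = noisy_gd(grad, w0=0.0, lr=0.1, noise_std=0.0, steps=200, rng=random.Random(0))
w_dp = noisy_gd(grad, w0=0.0, lr=0.1, noise_std=0.5, steps=200, rng=random.Random(0))
print(w_gd, w_dp)
```

On this convex toy problem the noise only makes the iterate jitter around the minimizer. The abstract's claim concerns a different, non-convex setting in which the injected noise can improve generalization.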

【12】A Sensitivity Approach to Causal Inference Under Limited Overlap
Link: https://arxiv.org/abs/2511.22003

Authors: Yuanzhe Ma, Hongseok Namkoong
Abstract: Limited overlap between treated and control groups is a key challenge in observational analysis. Standard approaches like trimming importance weights can reduce variance but introduce a fundamental bias. We propose a sensitivity framework for contextualizing findings under limited overlap, where we assess how irregular the outcome function has to be in order for the main finding to be invalidated. Our approach is based on worst-case confidence bounds on the bias introduced by standard trimming practices, under explicit assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without. Empirically, we demonstrate how our sensitivity framework protects against spurious findings by quantifying uncertainty in regions with limited overlap.
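
To show what "trimming importance weights" does in the simplest case, the sketch below computes an inverse-propensity-weighted treatment effect estimate with and without a cap on the weights. The function `ipw_ate`, the cap value, and the two-unit toy data are hypothetical; the paper's sensitivity bounds themselves are not reproduced here.

```python
def ipw_ate(outcomes, treatments, propensities, max_weight=float("inf")):
    """Inverse-propensity-weighted effect estimate with weights capped at `max_weight`."""
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if t == 1:
            total += y * min(1.0 / e, max_weight)
        else:
            total -= y * min(1.0 / (1.0 - e), max_weight)
    return total / len(outcomes)


# Toy data: the treated unit has propensity 0.01, i.e. almost no overlap.
y, t, e = [1.0, 1.0], [1, 0], [0.01, 0.5]
untrimmed = ipw_ate(y, t, e)
trimmed = ipw_ate(y, t, e, max_weight=10.0)
print(untrimmed, trimmed)
```

Capping the weight keeps a single low-propensity unit from dominating the estimate, but the capped estimator no longer targets the same quantity. That gap is the trimming bias the abstract proposes to bound.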

Detection (6 papers)

【1】An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks
Link: https://arxiv.org/abs/2511.22791

Authors: Kanchon Gharami, Shafika Showkat Moni
Comments: This paper has been accepted for publication in the Proceedings of the 44th AIAA/IEEE Digital Avionics Systems Conference (DASC) 2025, where it received the Best Paper of Session Award
Abstract: The rapid proliferation of unmanned aerial vehicles (UAVs) and their applications in diverse domains, such as surveillance, disaster management, agriculture, and defense, have revolutionized modern technology. While the potential benefits of swarm-based UAV networks are growing significantly, they are vulnerable to various security attacks that can jeopardize the overall mission success by degrading their performance, disrupting decision-making, and compromising the trajectory planning process. The Intrusion Detection System (IDS) plays a vital role in identifying potential security attacks to ensure the secure operation of UAV swarm networks. However, conventional IDS primarily focuses on binary classification with resource-intensive neural networks and faces challenges, including latency, privacy breaches, increased performance overhead, and model drift. This research aims to address these challenges by developing a novel lightweight and federated continuous learning-based IDS scheme. Our proposed model facilitates decentralized training across diverse UAV swarms to ensure data heterogeneity and privacy. The performance evaluation of our model demonstrates significant improvements, with classification accuracies of 99.45% on UKM-IDS, 99.99% on UAV-IDS, 96.85% on TLM-UAV dataset, and 98.05% on Cyber-Physical datasets.

【2】Difficulties with Evaluating a Deception Detector for AIs
Link: https://arxiv.org/abs/2511.22662

Authors: Lewis Smith, Bilal Chughtai, Neel Nanda
Abstract: Building reliable deception detectors for AI systems -- methods that could predict when an AI system is being strategically deceptive without necessarily requiring behavioural evidence -- would be valuable in mitigating risks from advanced AI systems. But evaluating the reliability and efficacy of a proposed deception detector requires examples that we can confidently label as either deceptive or honest. We argue that we currently lack the necessary examples and further identify several concrete obstacles in collecting them. We provide evidence from conceptual arguments, analysis of existing empirical works, and analysis of novel illustrative case studies. We also discuss the potential of several proposed empirical workarounds to these problems and argue that while they seem valuable, they also seem insufficient alone. Progress on deception detection likely requires further consideration of these problems.

【3】Deep Learning Architectures for Code-Modulated Visual Evoked Potentials Detection
Link: https://arxiv.org/abs/2511.21940

Authors: Kiran Nair, Hubert Cecotti
Comments: 20 pages, prepared for a journal
Abstract: Non-invasive Brain-Computer Interfaces (BCIs) based on Code-Modulated Visual Evoked Potentials (C-VEPs) require highly robust decoding methods to address temporal variability and session-dependent noise in EEG signals. This study proposes and evaluates several deep learning architectures, including convolutional neural networks (CNNs) for 63-bit m-sequence reconstruction and classification, and Siamese networks for similarity-based decoding, alongside canonical correlation analysis (CCA) baselines. EEG data were recorded from 13 healthy adults under single-target flicker stimulation. The proposed deep models significantly outperformed traditional approaches, with distance-based decoding using Earth Mover's Distance (EMD) and constrained EMD showing greater robustness to latency variations than Euclidean and Mahalanobis metrics. Temporal data augmentation with small shifts further improved generalization across sessions. Among all models, the multi-class Siamese network achieved the best overall performance with an average accuracy of 96.89%, demonstrating the potential of data-driven deep architectures for reliable, single-trial C-VEP decoding in adaptive non-invasive BCI systems.

【4】Modeling Quantum Autoencoder Trainable Kernel for IoT Anomaly Detection
Link: https://arxiv.org/abs/2511.21932

Authors: Swathi Chandrasekhar, Shiva Raj Pokhrel, Swati Kumari, Navneet Singh
Abstract: Escalating cyber threats and the high-dimensional complexity of IoT traffic have outpaced classical anomaly detection methods. While deep learning offers improvements, computational bottlenecks limit real-time deployment at scale. We present a quantum autoencoder (QAE) framework that compresses network traffic into discriminative latent representations and employs quantum support vector classification (QSVC) for intrusion detection. Evaluated on three datasets, our approach achieves improved accuracy on ideal simulators and on the IBM Quantum hardware demonstrating practical quantum advantage on current NISQ devices. Crucially, moderate depolarizing noise acts as implicit regularization, stabilizing training and enhancing generalization. This work establishes quantum machine learning as a viable, hardware-ready solution for real-world cybersecurity challenges.

【5】Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection
Title: Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection
Link: https://arxiv.org/abs/2511.21872

Authors: Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy
Comments: 16 pages, 6 figures, 2 tables, submitted to Marine Mammal Science as part of a special issue on Machine Learning and Artificial Intelligence in Marine Mammal Research
Abstract: Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative models for data augmentation in marine mammal call detection, including Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.

【6】Artificial intelligence for methane detection: from continuous monitoring to verified mitigation
Title: Artificial intelligence for methane detection: from continuous monitoring to verified mitigation
Link: https://arxiv.org/abs/2511.21777

Authors: Anna Allen, Gonzalo Mateo-Garcia, Itziar Irakulis-Loitxate, Manuel Montesino-San Martin, Marc Watine, James Requeima, Javier Gorroño, Cynthia Randles, Tharwat Mokalled, Luis Guanter, Richard E. Turner, Claudio Cifarelli, Manfredi Caltagirone
Abstract: Methane is a potent greenhouse gas, responsible for roughly 30% of warming since pre-industrial times. A small number of large point sources account for a disproportionate share of emissions, creating an opportunity for substantial reductions by targeting relatively few sites. Detection and attribution of large emissions at scale for notification to asset owners remains challenging. Here, we introduce MARS-S2L, a machine learning model that detects methane emissions in publicly available multispectral satellite imagery. Trained on a manually curated dataset of over 80,000 images, the model provides high-resolution detections every two days, enabling facility-level attribution and identifying 78% of plumes with an 8% false positive rate at 697 previously unseen sites. Deployed operationally, MARS-S2L has issued 1,015 notifications to stakeholders in 20 countries, enabling verified, permanent mitigation of six persistent emitters, including a previously unknown site in Libya. These results demonstrate a scalable pathway from satellite detection to quantifiable methane mitigation.

Classification | Recognition (8 papers)

【1】Standard Occupation Classifier -- A Natural Language Processing Approach
Title: Standard Occupation Classifier -- A Natural Language Processing Approach
Link: https://arxiv.org/abs/2511.23057

Authors: Sidharth Rony, Jack Patman
Abstract: Standard Occupational Classifiers (SOC) are systems used to categorize and classify different types of jobs and occupations based on their similarities in terms of job duties, skills, and qualifications. Integrating these facets with Big Data from job advertisements offers the prospect of investigating labour demand that is specific to various occupations. This project investigates the use of recent developments in natural language processing to construct a classifier capable of assigning an occupation code to a given job advertisement. We develop various classifiers for both UK ONS SOC and US O*NET SOC, using different Language Models. We find that an ensemble model, which combines Google BERT and a Neural Network classifier while considering job title, description, and skills, achieved the highest prediction accuracy. Specifically, the ensemble model exhibited a classification accuracy of up to 61% for the lower (or fourth) tier of SOC, and 72% for the third tier of SOC. This model could provide up-to-date, accurate information on the evolution of the labour market using job advertisements.

【2】Spatially Aware Dictionary-Free Eigenfunction Identification for Modeling and Control of Nonlinear Dynamical Systems
Title: Spatially Aware Dictionary-Free Eigenfunction Identification for Modeling and Control of Nonlinear Dynamical Systems
Link: https://arxiv.org/abs/2511.22648

Authors: David Grasev
Comments: 31 pages, 24 figures
Abstract: A new approach to data-driven discovery of Koopman eigenfunctions without a pre-defined set of basis functions is proposed. The approach is based on a reference trajectory, for which the Koopman mode amplitudes are first identified, and the Koopman mode decomposition is transformed to a new basis, which contains fundamental functions of eigenvalues and time. The initial values of the eigenfunctions are obtained by projecting trajectories onto this basis via a regularized least-squares fit. A global optimizer is employed to optimize the eigenvalues. Mapping initial-state values to eigenfunction values reveals their spatial structure, enabling the numerical computation of their gradients. Thus, deviations from the Koopman partial differential equation are penalized, leading to more robust solutions. The approach was successfully tested on several benchmark nonlinear dynamical systems, including the FitzHugh-Nagumo system with inputs, van der Pol and Duffing oscillators, and a 2-spool turbojet engine with control. The study demonstrates that incorporating principal eigenvalues and spatial structure integrity promotion significantly improves the accuracy of Koopman predictors. The approach effectively discovers Koopman spectral components even with sparse state-space sampling and reveals geometric features of the state space, such as invariant partitions. Finally, the numerical approximation of the eigenfunction gradient can be used for input dynamics modeling and control design. The results support the practicality of the approach for use with various dynamical systems.

【3】Benchmarking machine learning models for multi-class state recognition in double quantum dot data
Title: Benchmarking machine learning models for multi-class state recognition in double quantum dot data
Link: https://arxiv.org/abs/2511.22451

Authors: Valeria Díaz Moreno, Ryan P Khalili, Daniel Schug, Patrick J. Walsh, Justyna P. Zwolak
Comments: 12 pages, 4 figures, 2 tables
Abstract: Semiconductor quantum dots (QDs) are a leading platform for scalable quantum processors. However, scaling to large arrays requires reliable, automated tuning strategies for devices' bootstrapping, calibration, and operation, with many tuning aspects depending on accurately identifying QD device states from charge-stability diagrams (CSDs). In this work, we present a comprehensive benchmarking study of four modern machine learning (ML) architectures for multi-class state recognition in double-QD CSDs. We evaluate their performance across different data budgets and normalization schemes using both synthetic and experimental data. We find that the more resource-intensive models -- U-Nets and visual transformers (ViTs) -- achieve the highest MSE score (defined as $1-\mathrm{MSE}$) on synthetic data (over $0.98$) but fail to generalize to experimental data. MDNs are the most computationally efficient and exhibit highly stable training, but with substantially lower peak performance. CNNs offer the most favorable trade-off on experimental CSDs, achieving strong accuracy with two orders of magnitude fewer parameters than the U-Nets and ViTs. Normalization plays a nontrivial role: min-max scaling generally yields higher MSE scores but less stable convergence, whereas z-score normalization produces more predictable training dynamics but at reduced accuracy for most models. Overall, our study shows that CNNs with min-max normalization are a practical approach for QD CSDs.
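
The two normalization schemes and the score named in this abstract can be illustrated with a minimal Python sketch. It assumes the "MSE score" is one minus the mean squared error between predictions and targets, as the abstract defines it. The function names (`min_max_scale`, `z_score`, `mse_score`) and the flat-list data layout are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mse_score(pred, target):
    """Score defined as 1 - MSE, so higher is better and 1.0 is a perfect fit."""
    mse = mean((p - t) ** 2 for p, t in zip(pred, target))
    return 1.0 - mse
```

Min-max scaling pins the output range, while z-scoring pins the mean and spread. The abstract reports that this choice affects both the peak score and the stability of training.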

【4】Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names
Title: Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names
Link: https://arxiv.org/abs/2511.22215

Authors: Hao Wang, Yingshuo Wang, Junang Gan, Yanan Cheng, Jinshuai Zhang
Abstract: Online pornography and gambling have consistently posed regulatory challenges for governments, threatening both personal assets and privacy. Therefore, it is imperative to research the classification of newly registered Pornographic and Gambling Domain Names (PGDN). However, scholarly investigation into this topic is limited. Some previous efforts in PGDN classification pursue high accuracy using ideal sample data, while others employ up-to-date data from real-world scenarios but achieve lower classification accuracy. This paper introduces the Real-PGDN method, which accomplishes a complete process of timely and comprehensive real-data crawling, feature extraction with feature-missing tolerance, precise PGDN classification, and assessment of application effects in actual scenarios. Our two-level classifier, which integrates CoSENT (BERT-based), Multilayer Perceptron (MLP), and traditional classification algorithms, achieves a precision of 97.88%. The research process amasses the NRD2024 dataset, which contains continuous detection information over 20 days for 1,500,000 newly registered domain names across 6 directions. Results from our case study demonstrate that this method also maintains a forecast precision of over 70% for PGDN that are delayed in usage after registration.

【5】ARES: Anomaly Recognition Model For Edge Streams
Title: ARES: Anomaly Recognition Model For Edge Streams
Link: https://arxiv.org/abs/2511.22078

Authors: Simone Mungari, Albert Bifet, Giuseppe Manco, Bernhard Pfahringer
Comments: Accepted at KDD 2026
Abstract: Many real-world scenarios involving streaming information can be represented as temporal graphs, where data flows through dynamic changes in edges over time. Anomaly detection in this context has the objective of identifying unusual temporal connections within the graph structure. Detecting edge anomalies in real time is crucial for mitigating potential risks. Unlike traditional anomaly detection, this task is particularly challenging due to concept drifts, large data volumes, and the need for real-time response. To address these challenges, we introduce ARES, an unsupervised anomaly detection framework for edge streams. ARES combines Graph Neural Networks (GNNs) for feature extraction with Half-Space Trees (HST) for anomaly scoring. GNNs capture both spike and burst anomalous behaviors within streams by embedding node and edge properties in a latent space, while HST partitions this space to isolate anomalies efficiently. ARES operates in an unsupervised way without the need for prior data labeling. To further validate its detection capabilities, we additionally incorporate a simple yet effective supervised thresholding mechanism. This approach leverages statistical dispersion among anomaly scores to determine the optimal threshold using a minimal set of labeled data, ensuring adaptability across different domains. We validate ARES through extensive evaluations across several real-world cyber-attack scenarios, comparing its performance against existing methods while analyzing its space and time complexity.

【6】FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers
Title: FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers
Link: https://arxiv.org/abs/2511.21843

Authors: Sarina Xi, Vishisht Rao, Justin Payan, Nihar B. Shah
Comments: 30 pages, 12 tables, 2 figures
Abstract: The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliably detect errors given the limited pool of experts. Recent advances in Large Language Models (LLMs) have sparked interest in their potential to support such evaluation tasks, from academic peer review to automated scientific assessment. However, despite the growing use of LLMs in review systems, their capabilities to pinpoint errors remain underexplored. In this work, we introduce Fault Localization Across Writing in Science (FLAWS), an automated benchmark consisting of 713 paper-error pairs designed to evaluate how effectively LLMs detect errors that undermine key claims in research papers. We construct the benchmark by systematically inserting claim-invalidating errors into peer-reviewed papers using LLMs, paired with an automated evaluation metric that measures whether models can identify and localize these errors. Developing such a benchmark presents unique challenges that we overcome: ensuring that the inserted errors are well-defined, challenging, and relevant to the content of the paper, avoiding artifacts that would make identification trivial, and designing a scalable, automated evaluation metric. On the resulting benchmark, we evaluate five frontier LLMs: Claude Sonnet 4.5, DeepSeek Reasoner v3.1, Gemini 2.5 Pro, GPT 5, and Grok 4. Among these, GPT 5 is the top-performing model, achieving 39.1% identification accuracy when k=10, where k is the number of top-ranked error text candidates generated by the LLM.

【7】Multiclass threshold-based classification and model evaluation
Title: Multiclass threshold-based classification and model evaluation
Link: https://arxiv.org/abs/2511.21794

Authors: Edoardo Legnaro, Sabrina Guastavino, Francesco Marchetti
Comments: arXiv admin note: substantial text overlap with arXiv:2505.11276
Abstract: In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. This is done by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional simplex, where the classification depends on a multidimensional threshold. This change of perspective enables, for any trained classification network, an a posteriori optimization of the classification score by means of threshold tuning, as usually carried out in the binary setting, thus allowing for a further refinement of the prediction capability of any network. Our experiments indeed show that multidimensional threshold tuning yields performance improvements across various networks and datasets. Moreover, we derive a multiclass ROC analysis based on ROC clouds -- the attainable (FPR,TPR) operating points induced by a single multiclass threshold -- and summarize them via a Distance From Point (DFP) score to $(0,1)$. This yields a coherent alternative to standard One-vs-Rest (OvR) curves and aligns with the observed tuning gains.

【8】Support Vector Machine Classifier with Rescaled Huberized Pinball Loss
Title: Support Vector Machine Classifier with Rescaled Huberized Pinball Loss
Link: https://arxiv.org/abs/2511.22065

Authors: Shibo Diao
Abstract: Support vector machines are widely used in machine learning classification tasks, but traditional SVM models suffer from sensitivity to outliers and instability in resampling, which limits their performance in practical applications. To address these issues, this paper proposes a novel rescaled Huberized pinball loss function with asymmetric, non-convex, and smooth properties. Based on this loss function, we develop a corresponding SVM model called RHPSVM (Rescaled Huberized Pinball Loss Support Vector Machine). Theoretical analyses demonstrate that RHPSVM conforms to Bayesian rules, has a strict generalization error bound, a bounded influence function, and controllable optimality conditions, ensuring excellent classification accuracy, outlier insensitivity, and resampling stability. Additionally, RHPSVM can be extended to various advanced SVM variants by adjusting parameters, enhancing its flexibility. We transform the non-convex optimization problem of RHPSVM into a series of convex subproblems using the concave-convex procedure (CCCP) and solve it with the ClipDCD algorithm, which is proven to be convergent. Experimental results on simulated data, UCI datasets, and small-sample crop leaf image classification tasks show that RHPSVM outperforms existing SVM models in both noisy and noise-free scenarios, especially in handling high-dimensional small-sample data.

Representation (5 papers)

【1】A Theoretical Framework for Discovering Groups and Unitary Representations via Tensor Factorization
Title: A Theoretical Framework for Discovering Groups and Unitary Representations via Tensor Factorization
Link: https://arxiv.org/abs/2511.23152

Authors: Dongsung Huh, Halyun Jeong
Abstract: We analyze the HyperCube model, an operator-valued tensor factorization architecture that discovers group structures and their unitary representations. We provide a rigorous theoretical explanation for this inductive bias by decomposing its objective into a term regulating factor scales ($\mathcal{B}$) and a term enforcing directional alignment ($\mathcal{R} \geq 0$). This decomposition isolates the collinear manifold ($\mathcal{R}=0$), to which numerical optimization consistently converges for group isotopes. We prove that this manifold admits feasible solutions exclusively for group isotopes, and that within it, $\mathcal{B}$ exerts a variational pressure toward unitarity. To bridge the gap to the global landscape, we formulate a Collinearity Dominance Conjecture, supported by empirical observations. Conditional on this dominance, we prove two key results: (1) the global minimum is achieved by the unitary regular representation for groups, and (2) non-group operations incur a strictly higher objective value, formally quantifying the model's inductive bias toward the associative structure of groups (up to isotopy).

【2】Stable-Drift: A Patient-Aware Latent Drift Replay Method for Stabilizing Representations in Continual Learning
Title: Stable-Drift: A Patient-Aware Latent Drift Replay Method for Stabilizing Representations in Continual Learning
Link: https://arxiv.org/abs/2511.22615

Authors: Paraskevi-Antonia Theofilou, Anuhya Thota, Stefanos Kollias, Mamatha Thota
Comments: 8 pages, 2 figures
Abstract: When deep learning models are sequentially trained on new data, they tend to abruptly lose performance on previously learned tasks, a critical failure known as catastrophic forgetting. This challenge severely limits the deployment of AI in medical imaging, where models must continually adapt to data from new hospitals without compromising established diagnostic knowledge. To address this, we introduce a latent drift-guided replay method that identifies and replays samples with high representational instability. Specifically, our method quantifies this instability via latent drift, the change in a sample's internal feature representation after naive domain adaptation. To ensure diversity and clinical relevance, we aggregate drift at the patient level: our memory buffer stores, for each patient, the slices exhibiting the greatest multi-layer representation shift. Evaluated on a cross-hospital COVID-19 CT classification task using state-of-the-art CNN and Vision Transformer backbones, our method substantially reduces forgetting compared to naive fine-tuning and random replay. This work highlights latent drift as a practical and interpretable replay signal for advancing robust continual learning in real-world medical settings.
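
The selection step described in this abstract can be sketched in a few lines of Python. The sketch assumes drift is measured as the Euclidean distance between a slice's features before and after adaptation, summed over layers; the abstract does not state the actual distance or aggregation. The tuple layout and the names `latent_drift`, `select_replay`, and `per_patient` are hypothetical.

```python
import math
from collections import defaultdict


def latent_drift(before, after):
    """Sum over layers of the L2 distance between features before and after adaptation."""
    return sum(math.dist(b, a) for b, a in zip(before, after))


def select_replay(samples, per_patient=1):
    """Keep, for each patient, the slices with the largest latent drift.

    samples: iterable of (patient_id, slice_id, before_layers, after_layers),
    where each *_layers value is a list of per-layer feature vectors.
    """
    by_patient = defaultdict(list)
    for patient_id, slice_id, before, after in samples:
        by_patient[patient_id].append((latent_drift(before, after), slice_id))
    buffer = {}
    for patient_id, scored in by_patient.items():
        scored.sort(reverse=True)  # largest drift first
        buffer[patient_id] = [slice_id for _, slice_id in scored[:per_patient]]
    return buffer
```

Selecting per patient, not globally, keeps every patient represented in the buffer, which matches the diversity goal stated in the abstract.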

【3】Beyond Atoms: Evaluating Electron Density Representation for 3D Molecular Learning
Title: Beyond Atoms: Evaluating Electron Density Representation for 3D Molecular Learning
Link: https://arxiv.org/abs/2511.21900

Authors: Patricia Suriana, Joshua A. Rackers, Ewa M. Nowara, Pedro O. Pinheiro, John M. Nicoloudis, Vishnu Sresht
Abstract: Machine learning models for 3D molecular property prediction typically rely on atom-based representations, which may overlook subtle physical information. Electron density maps -- the direct output of X-ray crystallography and cryo-electron microscopy -- offer a continuous, physically grounded alternative. We compare three voxel-based input types for 3D convolutional neural networks (CNNs): atom types, raw electron density, and density gradient magnitude, across two molecular tasks -- protein-ligand binding affinity prediction (PDBbind) and quantum property prediction (QM9). We focus on voxel-based CNNs because electron density is inherently volumetric, and voxel grids provide the most natural representation for both experimental and computed densities. On PDBbind, all representations perform similarly with full data, but in low-data regimes, density-based inputs outperform atom types, while a shape-based baseline performs comparably -- suggesting that spatial occupancy dominates this task. On QM9, where labels are derived from Density Functional Theory (DFT) but input densities from a lower-level method (XTB), density-based inputs still outperform atom-based ones at scale, reflecting the rich structural and electronic information encoded in density. Overall, these results highlight the task- and regime-dependent strengths of density-derived inputs, improving data efficiency in affinity prediction and accuracy in quantum property modeling.
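
Of the three voxel inputs compared in this abstract, density gradient magnitude is the one that needs a computation on top of the raw grid. The Python sketch below shows one plausible way to obtain it, using central finite differences in the interior and one-sided differences at the boundary. The paper's actual discretization is not given in the abstract, so the stencil, the nested-list grid layout, and the function names are assumptions.

```python
import math


def _finite_difference(line, i, spacing):
    """Derivative along a 1D line: central in the interior, one-sided at the ends."""
    n = len(line)
    if n < 2:
        return 0.0
    if i == 0:
        return (line[1] - line[0]) / spacing
    if i == n - 1:
        return (line[-1] - line[-2]) / spacing
    return (line[i + 1] - line[i - 1]) / (2.0 * spacing)


def gradient_magnitude(density, spacing=1.0):
    """Return |grad rho| on a voxel grid indexed as density[x][y][z]."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    result = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                gx = _finite_difference([density[i][y][z] for i in range(nx)], x, spacing)
                gy = _finite_difference([density[x][j][z] for j in range(ny)], y, spacing)
                gz = _finite_difference(density[x][y], z, spacing)
                result[x][y][z] = math.sqrt(gx * gx + gy * gy + gz * gz)
    return result
```

The nested loops are written for readability. A practical pipeline would more likely use an array library for grids of realistic size.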

【4】Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE)
Title: Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE)
Link: https://arxiv.org/abs/2511.21883

Authors: Tiffany Fan, Murray Cutforth, Marta D'Elia, Alexandre Cortiella, Alireza Doostan, Eric Darve
Abstract: Extracting compact, physically interpretable representations from high-dimensional scientific data is a persistent challenge due to the complex, nonlinear structures inherent in physical systems. We propose a Gaussian Mixture Variational Autoencoder (GM-VAE) framework designed to address this by integrating an Expectation-Maximization (EM)-inspired training scheme with a novel spectral interpretability metric. Unlike conventional VAEs that jointly optimize reconstruction and clustering (often leading to training instability), our method utilizes a block-coordinate descent strategy, alternating between expectation and maximization steps. This approach stabilizes training and naturally aligns latent clusters with distinct physical regimes. To objectively evaluate the learned representations, we introduce a quantitative metric based on graph-Laplacian smoothness, which measures the coherence of physical quantities across the latent manifold. We demonstrate the efficacy of this framework on datasets of increasing complexity: surface reaction ODEs, Navier-Stokes wake flows, and experimental laser-induced combustion Schlieren images. The results show that our GM-VAE yields smooth, physically consistent manifolds and accurate regime clustering, offering a robust data-driven tool for interpreting turbulent and reactive flow systems.
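
A graph-Laplacian smoothness measure of the kind mentioned in this abstract can be sketched as a normalized Dirichlet energy. The Python sketch below is a generic construction, not the paper's metric: the fully connected Gaussian-weighted graph, the `bandwidth` parameter, the variance normalization, and the name `laplacian_smoothness` are all assumptions. It relies on the standard identity $f^\top L f = \sum_{i<j} w_{ij}(f_i - f_j)^2$ for the graph Laplacian $L$ of a weighted graph.

```python
import math


def laplacian_smoothness(latent, values, bandwidth=1.0):
    """Normalized Dirichlet energy of `values` on a Gaussian-weighted graph over `latent`.

    Lower means the physical quantity varies more smoothly across nearby
    latent points; a constant quantity gives 0.0.
    """
    n = len(latent)
    mu = sum(values) / n
    centered = [v - mu for v in values]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Edge weight decays with squared latent distance.
            w = math.exp(-math.dist(latent[i], latent[j]) ** 2 / (2.0 * bandwidth ** 2))
            energy += w * (centered[i] - centered[j]) ** 2
    norm = sum(c * c for c in centered)
    return energy / norm if norm > 0 else 0.0
```

A quantity that changes gradually along the latent coordinates scores lower than the same values shuffled across the latent space, which is the sense of "coherence" the abstract describes.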

【5】Dynamical Implicit Neural Representations
Title: Dynamical Implicit Neural Representations
Link: https://arxiv.org/abs/2511.21787

Authors: Yesom Park, Kelvin Kan, Thomas Flynn, Yi Huang, Shinjae Yoo, Stanley Osher, Xihaier Luo
Comments: 23 pages, 6 figures
Abstract: Implicit Neural Representations (INRs) provide a powerful continuous framework for modeling complex visual and geometric signals, but spectral bias remains a fundamental challenge, limiting their ability to capture high-frequency details. Orthogonal to existing remedy strategies, we introduce Dynamical Implicit Neural Representations (DINR), a new INR modeling framework that treats feature evolution as a continuous-time dynamical system rather than a discrete stack of layers. This dynamical formulation mitigates spectral bias by enabling richer, more adaptive frequency representations through continuous feature evolution. Theoretical analysis based on Rademacher complexity and the Neural Tangent Kernel demonstrates that DINR enhances expressivity and improves training dynamics. Moreover, regularizing the complexity of the underlying dynamics provides a principled way to balance expressivity and generalization. Extensive experiments on image representation, field reconstruction, and data compression confirm that DINR delivers more stable convergence, higher signal fidelity, and stronger generalization than conventional static INRs.

3D | 3D Reconstruction and Related Topics (1 paper)

【1】3D-Consistent Multi-View Editing by Diffusion Guidance
Title: 3D-Consistent Multi-View Editing by Diffusion Guidance
Link: https://arxiv.org/abs/2511.22228

Authors: Josef Bengtson, David Nilsson, Dong In Lee, Fredrik Kahl
Abstract: Recent advancements in diffusion models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic for editing of 3D representations such as NeRFs or Gaussian Splat models. We propose a training-free diffusion framework that enforces multi-view consistency during the image editing process. The key assumption is that corresponding points in the unedited images should undergo similar transformations after editing. To achieve this, we introduce a consistency loss that guides the diffusion sampling toward coherent edits. The framework is flexible and can be combined with widely varying image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian Splat editing with sharp details and strong fidelity to user-specified text prompts. Please refer to our project page for video results: https://3d-consistent-editing.github.io/

Encoders (1 paper)

【1】PULSE-ICU: A Pretrained Unified Long-Sequence Encoder for Multi-task Prediction in Intensive Care Units
Title: PULSE-ICU: A Pretrained Unified Long-Sequence Encoder for Multi-task Prediction in Intensive Care Units
Link: https://arxiv.org/abs/2511.22199

Authors: Sejeong Jang, Joo Heung Yoon, Hyo Kyung Lee
Abstract: Intensive care unit (ICU) data are highly irregular, heterogeneous, and temporally fragmented, posing challenges for generalizable clinical prediction. We present PULSE-ICU, a self-supervised foundation model that learns event-level ICU representations from large-scale EHR sequences without resampling or manual feature engineering. A unified embedding module encodes event identity, continuous values, units, and temporal attributes, while a Longformer-based encoder enables efficient modeling of long trajectories. PULSE-ICU was fine-tuned across 18 prediction tasks, including mortality, intervention forecasting, and phenotype identification, achieving strong performance across task types. External validation on eICU, HiRID, and P12 showed substantial improvements with minimal fine-tuning, demonstrating robustness to domain shift and variable constraints. These findings suggest that foundation-style modeling can improve data efficiency and adaptability, providing a scalable framework for ICU decision support across diverse clinical environments.

Optimization | Convergence (8 papers)

【1】Distributed Dynamic Associative Memory via Online Convex Optimization
Title: Distributed Dynamic Associative Memory via Online Convex Optimization
Link: https://arxiv.org/abs/2511.23347

Authors: Bowen Wang, Matteo Zecchin, Osvaldo Simeone
Abstract: An associative memory (AM) enables cue-response recall, and it has recently been recognized as a key mechanism underlying modern neural architectures such as Transformers. In this work, we introduce the concept of distributed dynamic associative memory (DDAM), which extends classical AM to settings with multiple agents and time-varying data streams. In DDAM, each agent maintains a local AM that must not only store its own associations but also selectively memorize information from other agents based on a specified interest matrix. To address this problem, we propose a novel tree-based distributed online gradient descent algorithm, termed DDAM-TOGD, which enables each agent to update its memory on the fly via inter-agent communication over designated routing trees. We derive rigorous performance guarantees for DDAM-TOGD, proving sublinear static regret in stationary environments and a path-length dependent dynamic regret bound in non-stationary environments. These theoretical results provide insights into how communication delays and network structure impact performance. Building on the regret analysis, we further introduce a combinatorial tree design strategy that optimizes the routing trees to minimize communication delays, thereby improving regret bounds. Numerical experiments demonstrate that the proposed DDAM-TOGD framework achieves superior accuracy and robustness compared to representative online learning baselines such as consensus-based distributed optimization, confirming the benefits of the proposed approach in dynamic, distributed environments.

【2】Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Title: Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Link: https://arxiv.org/abs/2511.22640

Authors: Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause
Comments: NeurIPS 2025
Abstract: Adapting large-scale foundation flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as molecular design, protein docking, and creative image generation. Existing principled fine-tuning methods aim to maximize the expected reward of generated samples, while retaining knowledge from the pre-trained model via KL-divergence regularization. In this work, we tackle the significantly more general problem of optimizing general utilities beyond average rewards, including risk-averse and novelty-seeking reward maximization, diversity measures for exploration, and experiment design objectives, among others. Likewise, we consider more general ways to preserve prior information beyond KL-divergence, such as optimal transport distances and Rényi divergences. To this end, we introduce Flow Density Control (FDC), a simple algorithm that reduces this complex problem to a specific sequence of simpler fine-tuning tasks, each solvable via scalable established methods. We derive convergence guarantees for the proposed scheme under realistic assumptions by leveraging recent understanding of mirror flows. Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.

【3】What Shape Is Optimal for Masks in Text Removal?
Title: What Shape Is Optimal for Masks in Text Removal?
Link: https://arxiv.org/abs/2511.22499

Authors: Hyakka Nakada, Marika Kubota
Comments: 12 pages, 17 figures
Abstract: The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, reconstructing original images by removing specific text from document images is extremely important for industrial applications. However, most existing methods of text removal focus on deleting simple scene text that appears in images captured by a camera in an outdoor environment. There is little research dedicated to complex and practical images with dense text. Therefore, we created benchmark data for text removal from images containing a large amount of text. From the data, we found that text-removal performance is vulnerable to mask profile perturbation. Thus, for practical text-removal tasks, precise tuning of the mask shape is essential. This study developed a method to model highly flexible mask profiles and learn their parameters using Bayesian optimization. The resulting profiles were found to be character-wise masks. It was also found that the minimum cover of a text region is not optimal. Our research is expected to pave the way for a user-friendly guideline for manual masking.

【4】What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$
Title: What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$
Link: https://arxiv.org/abs/2511.22442

Authors: Sébastien Piérard, Adrien Deliège, Marc Van Droogenbroeck
Abstract: Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_β$. Generally speaking, by averaging basic scores, we obtain a score that is intermediate in terms of values. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_β$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_β$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value for $β$ for any distribution or set of performances, and we illustrate their use on six case studies.
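
The ingredients named in this abstract, the $F_β$ score and Kendall rank correlation, can be made concrete with a short Python sketch. It uses the standard definition $F_β = (1+β^2)PR / (β^2 P + R)$ and the tau-a variant of Kendall's coefficient. The choice of tau-a and the function names are assumptions for illustration. The sketch does not reproduce the paper's closed-form optimum for $β$.

```python
from itertools import combinations


def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (F_0 = precision)."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denom


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs; ties count as zero."""
    pairs = list(combinations(range(len(xs)), 2))
    total = 0
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        total += (product > 0) - (product < 0)
    return total / len(pairs)
```

For a set of models with known precision and recall, computing `kendall_tau` between the $F_β$ scores and each base score shows how the induced ranking moves from the precision ranking (small $β$) toward the recall ranking (large $β$).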

【5】Quantum Bayesian Optimization for Quality Improvement in Fuselage Assembly
标题：用于提高机身装配质量的量子贝叶斯优化
链接：https://arxiv.org/abs/2511.22090

作者：Jiayu Liu,Chong Liu,Trevor Rhone,Yinan Wang
摘要：Recent efforts in smart manufacturing have enhanced aerospace fuselage assembly processes, particularly by innovating shape adjustment techniques to minimize dimensional gaps between assembled sections. Existing approaches have shown promising results but face the issue of low sample efficiency from the manufacturing systems. It arises from the limitation of the classical Monte Carlo method when uncovering the mean response from a distribution. In contrast, recent work has shown that quantum algorithms can achieve the same level of estimation accuracy with significantly fewer samples than the classical Monte Carlo method from distributions. Therefore, we can adopt the estimation of the quantum algorithm to obtain the estimation from real physical systems (distributions). Motivated by this advantage, we propose a Quantum Bayesian Optimization (QBO) framework for precise shape control during assembly to improve the sample efficiency in manufacturing practice. Specifically, this approach utilizes a quantum oracle, based on finite element analysis (FEA)-based models or surrogate models, to acquire a more accurate estimation of the environment response with fewer queries for a certain input. QBO employs an Upper Confidence Bound (UCB) as the acquisition function to strategically select input values that are most likely to maximize the objective function. It has been theoretically proven to require much fewer samples while maintaining comparable optimization results. In the case study, force-controlled actuators are applied to one fuselage section to adjust its shape and reduce the gap to the adjoining section. Experimental results demonstrate that QBO achieves significantly lower dimensional error and uncertainty compared to classical methods, particularly using the same queries from the simulation.

【6】Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
标题：单高斯过参数化得分匹配的收敛动力学
链接：https://arxiv.org/abs/2511.22069

作者：Yiran Zhang,Weihang Xu,Mo Zhou,Maryam Fazel,Simon Shaolei Du
备注：43 pages
摘要：Score matching has become a central training objective in modern generative modeling, particularly in diffusion models, where it is used to learn high-dimensional data distributions through the estimation of score functions. Despite its empirical success, the theoretical understanding of the optimization behavior of score matching, particularly in over-parameterized regimes, remains limited. In this work, we study gradient descent for training over-parameterized models to learn a single Gaussian distribution. Specifically, we use a student model with $n$ learnable parameters and train it on data generated from a single ground-truth Gaussian using the population score matching objective. We analyze the optimization dynamics under multiple regimes. When the noise scale is sufficiently large, we prove a global convergence result for gradient descent. In the low-noise regime, we identify the existence of a stationary point, highlighting the difficulty of proving global convergence in this case. Nevertheless, we show convergence under certain initialization conditions: when the parameters are initialized to be exponentially small, gradient descent ensures convergence of all parameters to the ground truth. We further prove that without the exponentially small initialization, the parameters may not converge to the ground truth. Finally, we consider the case where parameters are randomly initialized from a Gaussian distribution far from the ground truth. We prove that, with high probability, only one parameter converges while the others diverge, yet the loss still converges to zero with a $1/τ$ rate, where $τ$ is the number of iterations. We also establish a nearly matching lower bound on the convergence rate in this regime. This is the first work to establish global convergence guarantees for Gaussian mixtures with at least three components under the score matching framework.

【7】On the Role of Preference Variance in Preference Optimization
标题：论偏好方差在偏好优化中的作用
链接：https://arxiv.org/abs/2510.13022

作者：Jiacheng Guo,Zihao Li,Jiahao Qiu,Yue Wu,Mengdi Wang
摘要：Direct Preference Optimization (DPO) has emerged as an important approach for learning from human preferences in aligning large language models (LLMs). However, collecting human preference data is costly and inefficient, motivating methods to reduce the required annotations. In this work, we investigate the impact of preference variance (PVar), which measures the variance in model preferences when comparing pairs of responses, on the effectiveness of DPO training. We provide a theoretical insight by establishing an upper bound on the DPO gradient norm for any given prompt, showing it is controlled by the PVar of that prompt. This implies that prompts with low PVar can only produce small gradient updates, making them less valuable for learning. We validate this finding by fine-tuning LLMs with preferences generated by a reward model, evaluating on two benchmarks (AlpacaEval 2.0 and Arena-Hard). Experimental results demonstrate that prompts with higher PVar outperform randomly selected prompts or those with lower PVar. We also show that our PVar-based selection method is robust when using smaller reward models (1B, 3B) for selection. Notably, in a separate experiment using the original human annotations from the UltraFeedback dataset, we found that training on only the top 10% of prompts with the highest PVar yields better evaluation performance than training on the full dataset, highlighting the importance of preference variance in identifying informative examples for efficient LLM alignment.

【8】On the Condition Number Dependency in Bilevel Optimization
标题：二层优化中条件数依赖性
链接：https://arxiv.org/abs/2511.22331

作者：Lesi Chen,Jingzhao Zhang
摘要：Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, establishing the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4} / \sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.

预测|估计(11篇)

【1】Time Series Forecasting via Direct Per-Step Probability Distribution Modeling
标题：通过直接逐步概率分布建模进行时间序列预测
链接：https://arxiv.org/abs/2511.23260

作者：Linghao Kong,Xiaopeng Hong
备注：16 pages, 8 figures. This is the preprint version of the paper and supplemental material to appear in AAAI, 2026. Please cite the final published version. Code is available at https://github.com/leonardokong486/interPDN
摘要：Deep neural network-based time series prediction models have recently demonstrated superior capabilities in capturing complex temporal dependencies. However, it is challenging for these models to account for uncertainty associated with their predictions, because they directly output scalar values at each time step. To address such a challenge, we propose a novel model named interleaved dual-branch Probability Distribution Network (interPDN), which directly constructs discrete probability distributions per step instead of a scalar. The regression output at each time step is derived by computing the expectation of the predictive distribution on a predefined support set. To mitigate prediction anomalies, a dual-branch architecture is introduced with interleaved support sets, augmented by coarse temporal-scale branches for long-term trend forecasting. Outputs from another branch are treated as auxiliary signals to impose self-supervised consistency constraints on the current branch's prediction. Extensive experiments on multiple real-world datasets demonstrate the superior performance of interPDN.
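
The abstract above says the point forecast at each step is the expectation of a discrete predictive distribution over a predefined support set. A minimal sketch of that single step follows, assuming the network emits one logit per support value; the function names, the five-point support, and the logits are all hypothetical, and the dual-branch architecture and consistency constraints of interPDN are not modeled.

```python
import math


def softmax(logits):
    """Turn raw logits into a discrete probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def point_forecast(logits, support):
    """Return the expectation and standard deviation over the support set."""
    probs = softmax(logits)
    mean = sum(p * s for p, s in zip(probs, support))
    var = sum(p * (s - mean) ** 2 for p, s in zip(probs, support))
    return mean, math.sqrt(var)


# Hypothetical support set and per-step logits (assumptions, not from the paper).
support = [-1.0, -0.5, 0.0, 0.5, 1.0]
uncertain_step = [0.0, 0.0, 0.0, 0.0, 0.0]        # uniform distribution
confident_step = [-50.0, -50.0, -50.0, -50.0, 0.0]  # mass concentrated on 1.0

mean_u, std_u = point_forecast(uncertain_step, support)
mean_c, std_c = point_forecast(confident_step, support)
print(mean_u, std_u)
print(mean_c, std_c)
```

The spread of the distribution gives a per-step uncertainty estimate that a scalar regression head would not provide, which is the motivation stated in the abstract.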

【2】Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
标题：Delta-XAI：解释在线时间序列监控中预测变化的统一框架
链接：https://arxiv.org/abs/2511.23036

作者：Changhun Kim,Yechan Mun,Hyeongwon Jang,Eunseo Lee,Sangchul Hahn,Eunho Yang
备注：Under review at ICLR 2026
摘要：Explaining online time series monitoring models is crucial across sensitive domains such as healthcare and finance, where temporal and contextual prediction dynamics underpin critical decisions. While recent XAI methods have improved the explainability of time series models, they mostly analyze each time step independently, overlooking temporal dependencies. This results in further challenges: explaining prediction changes is non-trivial, methods fail to leverage online dynamics, and evaluation remains difficult. To address these challenges, we propose Delta-XAI, which adapts 14 existing XAI methods through a wrapper function and introduces a principled evaluation suite for the online setting, assessing diverse aspects, such as faithfulness, sufficiency, and coherence. Experiments reveal that classical gradient-based methods, such as Integrated Gradients (IG), can outperform recent approaches when adapted for temporal analysis. Building on this, we propose Shifted Window Integrated Gradients (SWING), which incorporates past observations in the integration path to systematically capture temporal dependencies and mitigate out-of-distribution effects. Extensive experiments consistently demonstrate the effectiveness of SWING across diverse settings with respect to diverse metrics. Our code is publicly available at https://anonymous.4open.science/r/Delta-XAI.

【3】TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE
标题：TARFVAE：通过基于TARFLOW的VAE高效一步生成时间序列预测
链接：https://arxiv.org/abs/2511.22853

作者：Jiawen Wei,Lan Jiang,Pengbo Wei,Ziwen Ye,Teng Song,Chen Chen,Guangrui Ma
摘要：Time series data is ubiquitous, with forecasting applications spanning from finance to healthcare. Beyond popular deterministic methods, generative models are gaining attention due to advancements in areas like image synthesis and video generation, as well as their inherent ability to provide probabilistic predictions. However, existing generative approaches mostly involve recurrent generative operations or repeated denoising steps, making the prediction laborious, particularly for long-term forecasting. Most of them only conduct experiments for relatively short-term forecasting, with limited comparison to deterministic methods in long-term forecasting, leaving their practical advantages unclear. This paper presents TARFVAE, a novel generative framework that combines the Transformer-based autoregressive flow (TARFLOW) and variational autoencoder (VAE) for efficient one-step generative time series forecasting. Inspired by the rethinking that complex architectures for extracting time series representations might not be necessary, we add a flow module, TARFLOW, to VAE to promote spontaneous learning of latent variables that benefit predictions. TARFLOW enhances VAE's posterior estimation by breaking the Gaussian assumption, thereby enabling a more informative latent space. TARFVAE uses only the forward process of TARFLOW, avoiding autoregressive inverse operations and thus ensuring fast generation. During generation, it samples from the prior latent space and directly generates full-horizon forecasts via the VAE decoder. With simple MLP modules, TARFVAE achieves superior performance over state-of-the-art deterministic and generative models across different forecast horizons on benchmark datasets while maintaining efficient prediction speed, demonstrating its effectiveness as an efficient and powerful solution for generative time series forecasting.

【4】Predicting and Interpolating Spatiotemporal Environmental Data: A Case Study of Groundwater Storage in Bangladesh
标题：时空环境数据的预测与内插--以孟加拉国地下水储量为例
链接：https://arxiv.org/abs/2511.22378

作者：Anna Pazola,Mohammad Shamsudduha,Richard G. Taylor,Allan Tucker
备注：Submitted to the IDA 2026 conference
摘要：Geospatial observational datasets are often limited to point measurements, making temporal prediction and spatial interpolation essential for constructing continuous fields. This study evaluates two deep learning strategies for addressing this challenge: (1) a grid-to-grid approach, where gridded predictors are used to model rasterised targets (aggregation before modelling), and (2) a grid-to-point approach, where gridded predictors model point targets, followed by kriging interpolation to fill the domain (aggregation after modelling). Using groundwater storage data from Bangladesh as a case study, we compare the efficacy of these approaches. Our findings indicate that spatial interpolation is substantially more difficult than temporal prediction. In particular, nearest neighbours are not always the most similar, and uncertainties in geology strongly influence point temporal behaviour. These insights motivate future work on advanced interpolation methods informed by clustering locations based on time series dynamics. Demonstrated on groundwater storage, the conclusions are applicable to other environmental variables governed by indirectly observable factors. Code is available at https://github.com/pazolka/interpolation-prediction-gwsa.

【5】Predicting Public Health Impacts of Electricity Usage
标题：预测用电对公共健康的影响
链接：https://arxiv.org/abs/2511.22031

作者：Yejia Liu,Zhifeng Wu,Pengfei Li,Shaolei Ren
备注：21 Pages. Accepted to NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models (ResponsibleFM)
摘要：The electric power sector is a leading source of air pollutant emissions, impacting the public health of nearly every community. Although regulatory measures have reduced air pollutants, fossil fuels remain a significant component of the energy supply, highlighting the need for more advanced demand-side approaches to reduce the public health impacts. To enable health-informed demand-side management, we introduce HealthPredictor, a domain-specific AI model that provides an end-to-end pipeline linking electricity use to public health outcomes. The model comprises three components: a fuel mix predictor that estimates the contribution of different generation sources, an air quality converter that models pollutant emissions and atmospheric dispersion, and a health impact assessor that translates resulting pollutant changes into monetized health damages. Across multiple regions in the United States, our health-driven optimization framework yields substantially lower prediction errors in terms of public health impacts than fuel mix-driven baselines. A case study on electric vehicle charging schedules illustrates the public health gains enabled by our method and the actionable guidance it can offer for health-informed energy management. Overall, this work shows how AI models can be explicitly designed to enable health-informed energy management for advancing public health and broader societal well-being. Our datasets and code are released at: https://github.com/Ren-Research/Health-Impact-Predictor.

【6】CTR Prediction on Alibaba's Taobao Advertising Dataset Using Traditional and Deep Learning Models
标题：使用传统和深度学习模型对阿里巴巴淘宝广告数据集进行CTR预测
链接：https://arxiv.org/abs/2511.21963

作者：Hongyu Yang,Chunxi Wen,Jiyin Zhang,Nanfei Shen,Shijiao Zhang,Xiyan Han
摘要：Click-through rate (CTR) prediction is critical in modern advertising systems, where ranking relevance and user engagement directly impact platform efficiency and business value. In this project, we explore how to model CTR more effectively using a large-scale Taobao dataset released by Alibaba. We start with supervised learning models, including logistic regression and LightGBM, that are trained on static features such as user demographics, ad attributes, and contextual metadata. These models provide fast, interpretable benchmarks, but have limited capabilities to capture patterns of behavior that drive clicks. To better model user intent, we combined behavioral data from hundreds of millions of interactions over a 22-day period. By extracting and encoding user action sequences, we construct representations of user interests over time. We use deep learning models to fuse behavioral embeddings with static features. Among them, multilayer perceptrons (MLPs) have achieved significant performance improvements. To capture temporal dynamics, we designed a Transformer-based architecture that uses a self-attention mechanism to learn contextual dependencies across behavioral sequences, modeling not only what the user interacts with, but also the timing and frequency of interactions. The Transformer improves AUC by 2.81% over the baseline (LR model), with the largest gains observed for users whose interests are diverse or change over time. In addition to modeling, we propose an A/B testing strategy for real-world evaluation. We also consider the broader implications: personalized ad targeting technology can be applied to public health scenarios to achieve precise delivery of health information or behavior guidance. Our research provides a roadmap for advancing click-through rate predictions and extending their value beyond e-commerce.

【7】WalkCLIP: Multimodal Learning for Urban Walkability Prediction
标题：WalkCLIP：用于城市可步行性预测的多模态学习
链接：https://arxiv.org/abs/2511.21947

作者：Shilong Xiang,JangHyeon Lee,Min Namgung,Yao-Yi Chiang
摘要：Urban walkability is a cornerstone of public health, sustainability, and quality of life. Traditional walkability assessments rely on surveys and field audits, which are costly and difficult to scale. Recent studies have used satellite imagery, street view imagery, or population indicators to estimate walkability, but these single-source approaches capture only one dimension of the walking environment. Satellite data describe the built environment from above, but overlook the pedestrian perspective. Street view imagery captures conditions at the ground level, but lacks broader spatial context. Population dynamics reveal patterns of human activity but not the visual form of the environment. We introduce WalkCLIP, a multimodal framework that integrates these complementary viewpoints to predict urban walkability. WalkCLIP learns walkability-aware vision-language representations from GPT-4o generated image captions, refines these representations with a spatial aggregation module that incorporates neighborhood context, and fuses the resulting features with representations from a population dynamics foundation model. Evaluated at 4,660 locations throughout Minneapolis-Saint Paul, WalkCLIP outperforms unimodal and multimodal baselines in both predictive accuracy and spatial alignment. These results show that the integration of visual and behavioral signals yields reliable predictions of the walking environment.

【8】Multi-Modal Machine Learning for Early Trust Prediction in Human-AI Interaction Using Face Image and GSR Bio Signals
标题：使用人脸图像和GSR生物信号进行人机交互早期信任预测的多模态机器学习
链接：https://arxiv.org/abs/2511.21908

作者：Hamid Shamszare,Avishek Choudhury
摘要：Predicting human trust in AI systems is crucial for safe integration of AI-based decision support tools, especially in healthcare. This study proposes a multi-modal machine learning framework that combines image and galvanic skin response (GSR) data to predict early user trust in AI- or human-generated recommendations in a simulated ADHD mHealth context. Facial video data were processed using OpenCV for frame extraction and transfer learning with a pre-trained transformer model to derive emotional features. Concurrently, GSR signals were decomposed into tonic and phasic components to capture physiological arousal patterns. Two temporal windows were defined for trust prediction: the Early Detection Window (6 to 3 seconds before decision-making) and the Proximal Detection Window (3 to 0 seconds before decision-making). For each window, trust prediction was conducted separately using image-based, GSR-based, and multimodal (image + GSR) features. Each modality was analyzed using machine learning algorithms, and the top-performing unimodal models were integrated through a multimodal stacking ensemble for final prediction. Experimental results showed that combining facial and physiological cues significantly improved prediction performance. The multimodal stacking framework achieved an accuracy of 0.83, F1-score of 0.88, and ROC-AUC of 0.87 in the Early Detection Window, and an accuracy of 0.75, F1-score of 0.82, and ROC-AUC of 0.66 in the Proximal Detection Window. These results demonstrate the potential of bio signals as real-time, objective markers of user trust, enabling adaptive AI systems that dynamically adjust their responses to maintain calibrated trust, a critical capability in mental health applications where miscalibrated trust can affect diagnostic and treatment outcomes.

【9】Lightweight ML-Based Air Quality Prediction for IoT and Embedded Applications
标题：适用于物联网和嵌入式应用的轻型基于ML的空气质量预测
链接：https://arxiv.org/abs/2511.21857

作者：Md. Sad Abdullah Sami,Mushfiquzzaman Abid
摘要：This study investigates the effectiveness and efficiency of two variants of the XGBoost regression model, the full-capacity and lightweight (tiny) versions, for predicting the concentrations of carbon monoxide (CO) and nitrogen dioxide (NO2). Using the AirQualityUCI dataset collected over one year in an urban environment, we conducted a comprehensive evaluation based on widely accepted metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Bias Error (MBE), and the coefficient of determination (R2). In addition, we assessed resource-oriented metrics such as inference time, model size, and peak RAM usage. The full XGBoost model achieved superior predictive accuracy for both pollutants, while the tiny model, though slightly less precise, offered substantial computational benefits with significantly reduced inference time and model storage requirements. These results demonstrate the feasibility of deploying simplified models in resource-constrained environments without compromising predictive quality. This makes the tiny XGBoost model suitable for real-time air-quality monitoring in IoT and embedded applications.
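
The four accuracy metrics named in the abstract above can be computed as in the sketch below. The sign convention for Mean Bias Error (predicted minus observed) is an assumption, since the abstract does not state it, and the sample values are invented for illustration rather than taken from the AirQualityUCI dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, MBE and R^2 for paired observations and predictions."""
    n = len(y_true)
    # Assumed convention: error = prediction - observation.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MBE": sum(errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Invented sample values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

In this toy case the positive and negative errors cancel, so MBE is zero while MAE and RMSE are not, which is why bias and magnitude metrics are usually reported together.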

【10】Machine learning for violence prediction: a systematic review and critical appraisal
标题：暴力预测的机器学习：系统综述和批判性评估
链接：https://arxiv.org/abs/2511.23118

作者：Stefaniya Kozhevnikova,Denis Yukhnenko,Giulio Scola,Seena Fazel
摘要：Purpose: To conduct a systematic review of machine learning models for predicting violent behaviour by synthesising and appraising their validity, usefulness, and performance. Methods: We systematically searched nine bibliographic databases and Google Scholar up to September 2025 for development and/or validation studies on machine learning methods for predicting all forms of violent behaviour. We synthesised the results by summarising discrimination and calibration performance statistics and evaluated study quality by examining risk of bias and clinical utility. Results: We identified 38 studies reporting the development and validation of 40 models. Most studies reported Area Under the Curve (AUC) as the discrimination statistic with a range of 0.68-0.99. Only eight studies reported calibration performance, and three studies reported external validation. 31 studies had a high risk of bias, mainly in the analysis domain, and three studies had low risk of bias. The overall clinical utility of violence prediction models is poor, as indicated by risks of overfitting due to small samples, lack of transparent reporting, and low generalisability. Conclusion: Although black box machine learning models currently have limited applicability in clinical settings, they may show promise for identifying high-risk individuals. We recommend five key considerations for violence prediction modelling: (i) ensuring methodological quality (e.g. following guidelines) and interdisciplinary collaborations; (ii) using black box algorithms only for highly complex data; (iii) incorporating dynamic predictions to allow for risk monitoring; (iv) developing more trustworthy algorithms using explainable methods; and (v) applying causal machine learning approaches where appropriate.

【11】Digital Elevation Model Estimation from RGB Satellite Imagery using Generative Deep Learning
标题：使用生成式深度学习从RGB卫星图像进行数字高程模型估计
链接：https://arxiv.org/abs/2511.21985

作者：Alif Ilham Madani,Riska A. Kuswati,Alex M. Lechner,Muhamad Risqi U. Saputra
备注：5 pages, 4 figures, accepted at IGARSS 2025 conference
摘要：Digital Elevation Models (DEMs) are vital datasets for geospatial applications such as hydrological modeling and environmental monitoring. However, conventional methods to generate DEM, such as using LiDAR and photogrammetry, require specific types of data that are often inaccessible in resource-constrained settings. To alleviate this problem, this study proposes an approach to generate DEM from freely available RGB satellite imagery using generative deep learning, particularly based on a conditional Generative Adversarial Network (GAN). We first developed a global dataset consisting of 12K RGB-DEM pairs using Landsat satellite imagery and NASA's SRTM digital elevation data, both from the year 2000. A unique preprocessing pipeline was implemented to select high-quality, cloud-free regions and aggregate normalized RGB composites from Landsat imagery. Additionally, the model was trained in a two-stage process, where it was first trained on the complete dataset and then fine-tuned on high-quality samples filtered by Structural Similarity Index Measure (SSIM) values to improve performance on challenging terrains. The results demonstrate promising performance in mountainous regions, achieving an overall mean root-mean-square error (RMSE) of 0.4671 and a mean SSIM score of 0.2065 (scale -1 to 1), while highlighting limitations in lowland and residential areas. This study underscores the importance of meticulous preprocessing and iterative refinement in generative modeling for DEM generation, offering a cost-effective and adaptive alternative to conventional methods while emphasizing the challenge of generalization across diverse terrains worldwide.

其他神经网络|深度学习|模型|建模(29篇)

【1】ThetaEvolve: Test-time Learning on Open Problems
标题：ThetaEvolve：开放问题的测试时学习
链接：https://arxiv.org/abs/2511.23473

作者：Yiping Wang,Shao-Rong Su,Zhiyuan Zeng,Eva Xu,Liliang Ren,Xinyu Yang,Zeyi Huang,Xuehai He,Luyao Ma,Baolin Peng,Hao Cheng,Pengcheng He,Weizhu Chen,Shuohang Wang,Simon Shaolei Du,Yelong Shen
备注：30 pages, link: https://github.com/ypwang61/ThetaEvolve
摘要：Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve bounds on open problems. However, it relies on ensembles of frontier LLMs to achieve new bounds and is a pure inference system, so models cannot internalize the evolving strategies. We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their experiences in improving open optimization problems. ThetaEvolve features a single LLM, a large program database for enhanced exploration, batch sampling for higher throughput, lazy penalties to discourage stagnant outputs, and optional reward shaping for stable training signals. ThetaEvolve is the first evolving framework that enables a small open-source model, such as DeepSeek-R1-0528-Qwen3-8B, to achieve new best-known bounds on open problems (circle packing and first auto-correlation inequality) mentioned in AlphaEvolve. In addition, across two models and four open tasks, we find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines, and the model indeed learns evolving capabilities, as the RL-trained checkpoints demonstrate faster progress and better final performance on both the trained target task and other unseen tasks. We release our code publicly: https://github.com/ypwang61/ThetaEvolve

【2】Physics-Informed Neural Networks for Thermophysical Property Retrieval
标题：用于热物理性质检索的物理信息神经网络
链接：https://arxiv.org/abs/2511.23449

作者：Ali Waseem,Malcolm Mielle
备注：26 pages, 4 figures, 3 tables
摘要：Inverse heat problems refer to the estimation of material thermophysical properties given observed or known heat diffusion behaviour. Inverse heat problems have wide-ranging uses, but a critical application lies in quantifying how building facade renovation reduces thermal transmittance, a key determinant of building energy efficiency. However, solving inverse heat problems with non-invasive data collected in situ is error-prone due to environmental variability or deviations from theoretically assumed conditions. Hence, current methods for measuring thermal conductivity are either invasive, require lengthy observation periods, or are sensitive to environmental and experimental conditions. Here, we present a PINN-based iterative framework to estimate the thermal conductivity k of a wall from a set of thermographs; our framework alternates between estimating the forward heat problem with a PINN for a fixed k, and optimizing k by comparing the thermographs and surface temperatures predicted by the PINN, repeating until the estimate of k converges. Using both environmental data captured by a weather station and data generated from Finite-Volume-Method software simulations, we accurately predict k across different environmental conditions and data collection sampling times, provided that the temperature profile of the wall at dawn is close to steady state. Although violating the steady-state assumption impacts the accuracy of the estimate of k, we show that our proposed framework still exhibits a maximum MAE of only 4.0851. Our work demonstrates the potential of PINN-based methods for reliable estimation of material properties in situ and under realistic conditions, without lengthy measurement campaigns. Given the lack of research on using machine learning, and more specifically PINNs, for solving in-situ inverse problems, we expect our work to be a starting point for more research on the topic.

【3】Quantized-Tinyllava: a new multimodal foundation model enables efficient split learning
标题：Quantized-Tinyllava：新的多模态基础模型实现高效的拆分学习
链接：https://arxiv.org/abs/2511.23402

作者：Jiajun Guo,Xin Luo,Jie Liu
备注：14 pages, 5 figures
摘要：Split learning is well known as a method for resolving data privacy concerns by training a model on distributed devices, thereby avoiding data sharing that raises privacy issues. However, high network communication costs are always an impediment to split learning, especially for large foundation models that require transmitting large amounts of high-dimensional data. To resolve this issue, we present a new multimodal model structure that incorporates a learning-based data compression method, which compresses model embeddings into low-bit integers while preserving the model's performance, greatly reducing the transmission costs between partitions. We then determine the optimal number of discrete representation levels based on a solid theoretical foundation from entropy coding.
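
To make the idea of compressing embeddings into low-bit integers concrete, here is a minimal sketch that uses plain uniform quantization. This is a swapped-in textbook technique, not the learning-based compression method the abstract describes; the function names and the sample embedding are hypothetical. The empirical entropy of the codes is included because the abstract ties the choice of the number of levels to entropy coding.

```python
import math
from collections import Counter


def quantize(values, bits):
    """Uniformly map floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximately recover the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


def entropy_bits(codes):
    """Empirical entropy of the codes, in bits per symbol."""
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())


# Hypothetical embedding values, for illustration only.
embedding = [0.0, 0.1, 0.4, 0.9, 1.0]
codes, lo, scale = quantize(embedding, bits=2)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
print(codes, max_error, entropy_bits(codes))
```

With uniform quantization the reconstruction error is bounded by half a quantization step, and the empirical entropy never exceeds the nominal bit width, so an entropy coder could shrink the transmitted payload further.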

【4】Learning-Augmented Online Bipartite Matching in the Random Arrival Order Model
标题：随机到达顺序模型中的学习增强在线双方匹配
链接：https://arxiv.org/abs/2511.23388

作者：Kunanon Burathep,Thomas Erlebach,William K. Moses
备注：17 pages, 1 figure, 1 table. An extended abstract of this paper appears in the proceedings of the 51st International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2026)
摘要：We study the online unweighted bipartite matching problem in the random arrival order model, with $n$ offline and $n$ online vertices, in the learning-augmented setting: The algorithm is provided with untrusted predictions of the types (neighborhoods) of the online vertices. We build upon the work of Choo et al. (ICML 2024, pp. 8762-8781) who proposed an approach that uses a prefix of the arrival sequence as a sample to determine whether the predictions are close to the true arrival sequence and then either follows the predictions or uses a known baseline algorithm that ignores the predictions and is $β$-competitive. Their analysis is limited to the case that the optimal matching has size $n$, i.e., every online vertex can be matched. We generalize their approach and analysis by removing any assumptions on the size of the optimal matching while only requiring that the size of the predicted matching is at least $αn$ for any constant $0 < α\le 1$. Our learning-augmented algorithm achieves $(1-o(1))$-consistency and $(β-o(1))$-robustness. Additionally, we show that the competitive ratio degrades smoothly between consistency and robustness with increasing prediction error.

【5】Hard-Constrained Neural Networks with Physics-Embedded Architecture for Residual Dynamics Learning and Invariant Enforcement in Cyber-Physical Systems
标题：具有物理嵌入式架构的硬约束神经网络用于网络物理系统中的剩余动态学习和不变实施
链接：https://arxiv.org/abs/2511.23307

作者：Enzo Nicolás Spotorno,Josafat Leal Filho,Antônio Augusto Fröhlich
备注：41 pages (30 pages main text + 11 pages appendices), 3 figures, 8 tables. Submitted to JMLR
摘要：This paper presents a framework for physics-informed learning in complex cyber-physical systems governed by differential equations with both unknown dynamics and algebraic invariants. First, we formalize the Hybrid Recurrent Physics-Informed Neural Network (HRPINN), a general-purpose architecture that embeds known physics as a hard structural constraint within a recurrent integrator to learn only residual dynamics. Second, we introduce the Projected HRPINN (PHRPINN), a novel extension that integrates a predict-project mechanism to strictly enforce algebraic invariants by design. The framework is supported by a theoretical analysis of its representational capacity. We validate HRPINN on a real-world battery prognostics DAE and evaluate PHRPINN on a suite of standard constrained benchmarks. The results demonstrate the framework's potential for achieving high accuracy and data efficiency, while also highlighting critical trade-offs between physical consistency, computational cost, and numerical stability, providing practical guidance for its deployment.

【6】db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism
标题：db-SP：利用双平衡序列并行主义加速视觉生成模型的稀疏注意力
链接：https://arxiv.org/abs/2511.23113

作者：Siqi Chen,Ke Hong,Tianchen Zhao,Ruiqi Xie,Zhenhua Zhu,Xudong Zhang,Yu Wang
摘要：Scaling Diffusion Transformer (DiT) inference via sequence parallelism is critical for reducing latency in visual generation, but is severely hampered by workload imbalance when applied to models employing block-wise sparse attention. The imbalance stems from the inherent variation in sparsity across attention heads and the irregular distribution of dense blocks within the sparse mask, when sequence parallelism is applied along the head dimension (as in Ulysses) or the block dimension (as in Ring Attention). In this paper, we formalize a sparse imbalance ratio to quantify the imbalance, and propose db-SP, a sparsity-aware sequence parallelism technique that tackles the challenge. db-SP contains a dual-level partitioning approach that achieves near-perfect workload balance at both the head and block levels with negligible overhead. Furthermore, to handle the evolving sparsity patterns across denoising steps and layers, db-SP dynamically determines the parallel degrees for the head and block dimensions at runtime. Experimental results demonstrate that db-SP delivers an end-to-end speedup of 1.25x and an attention-specific speedup of 1.40x over state-of-the-art sequence parallel methods on average. Code is available at https://github.com/thu-nics/db-SP.
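
The workload imbalance discussed in this abstract can be illustrated at the head level with a small sketch. The definition of the imbalance ratio used here (maximum device load divided by mean device load) and the greedy longest-first assignment are assumptions for illustration, not the formulation or partitioning algorithm of db-SP; the per-head dense-block counts are made up.

```python
import heapq

def imbalance_ratio(loads):
    """Assumed definition: max device load divided by mean device load."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0

def contiguous_loads(work, devices):
    """Naive split: consecutive heads go to the same device."""
    size = -(-len(work) // devices)  # ceiling division
    return [sum(work[d * size:(d + 1) * size]) for d in range(devices)]

def greedy_partition(work, devices):
    """Assign the heaviest remaining head to the least-loaded device."""
    heap = [(0, d) for d in range(devices)]
    assignment = [[] for _ in range(devices)]
    for idx in sorted(range(len(work)), key=lambda i: -work[i]):
        load, d = heapq.heappop(heap)
        assignment[d].append(idx)
        heapq.heappush(heap, (load + work[idx], d))
    loads = [sum(work[i] for i in group) for group in assignment]
    return assignment, loads

# Hypothetical number of dense attention blocks per head.
dense_blocks = [40, 36, 30, 8, 6, 4, 2, 2]
naive = contiguous_loads(dense_blocks, devices=4)
_, balanced = greedy_partition(dense_blocks, devices=4)
ratios = (imbalance_ratio(naive), imbalance_ratio(balanced))
```

A ratio of 1.0 means every device finishes at the same time; the further above 1.0, the longer the other devices wait for the slowest one.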

【7】Buffer replay enhances the robustness of multimodal learning under missing-modality
标题：缓冲区回放增强了丢失模式下多模式学习的鲁棒性
链接：https://arxiv.org/abs/2511.23070

作者：Hongye Zhu,Xuan Liu,Yanwen Ba,Jingye Xue,Shigeng Zhang
摘要：Missing modalities consistently lead to significant performance degradation in multimodal models. Existing approaches either synthesize missing modalities at high computational cost or apply prompt-based fine-tuning that relies only on adjacent-layer features and overlooks long-distance contextual information, which may offer additional tolerance to errors when one or more modalities are missing. To address this, we introduce REplay Prompting (REP): (1) construct modality-wise feature buffers via a residual bypass to cache early-layer representations and replay them in deeper layers, mitigating information loss as network depth increases; (2) employ a private-shared feature decoupling strategy, where private buffers preserve modality-specific signals and shared buffers encode cross-modal semantics; and (3) design a task-aware dynamic initialization mechanism to configure these buffers differently, improving stability and generalization under diverse missing-modality conditions. Experiments on vision-language, vision-language-audio, and temporal multimodal benchmarks demonstrate that REP consistently outperforms prior methods under both single- and multi-modality missing scenarios, while introducing only negligible parameter overhead. These results establish REP as a lightweight and effective paradigm for robust multimodal learning in challenging missing-modality environments.

【8】ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery
标题：ClearGCD：缓解捷径学习以实现稳健的广义类别发现
链接：https://arxiv.org/abs/2511.22892

作者：Kailin Lyu,Jianwei He,Long Xiao,Jianing Zeng,Liang Fan,Lin Shu,Jie Hao
备注：5 pages, 4 figures
摘要：In open-world scenarios, Generalized Category Discovery (GCD) requires identifying both known and novel categories within unlabeled data. However, existing methods often suffer from prototype confusion caused by shortcut learning, which undermines generalization and leads to forgetting of known classes. We propose ClearGCD, a framework designed to mitigate reliance on non-semantic cues through two complementary mechanisms. First, Semantic View Alignment (SVA) generates strong augmentations via cross-class patch replacement and enforces semantic consistency using weak augmentations. Second, Shortcut Suppression Regularization (SSR) maintains an adaptive prototype bank that aligns known classes while encouraging separation of potential novel ones. ClearGCD can be seamlessly integrated into parametric GCD approaches and consistently outperforms state-of-the-art methods across multiple benchmarks.

【9】Exact Learning of Arithmetic with Differentiable Agents
标题：基于可微智能体的算术精确学习
链接：https://arxiv.org/abs/2511.22751

作者：Hristo Papazov,Francesco D'Angelo,Nicolas Flammarion
备注：Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: MATH-AI
摘要：We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our approach centers on Differentiable Finite-State Transducers (DFSTs), a Turing-complete model family that avoids the pitfalls of prior architectures by enabling constant-precision, constant-time generation, and end-to-end log-parallel differentiable training. Leveraging policy-trajectory observations from expert agents, we train DFSTs to perform binary and decimal addition and multiplication. Remarkably, models trained on tiny datasets generalize without error to inputs thousands of times longer than the training examples. These results show that training differentiable agents on structured intermediate supervision could pave the way towards exact gradient-based learning of algorithmic skills. Code available at https://github.com/dngfra/differentiable-exact-algorithmic-learner.git.
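
For readers unfamiliar with finite-state transducers, the sketch below shows a classical, hand-written (non-differentiable) transducer for binary addition. It is not the DFST model from the paper; it only illustrates why a fixed, tiny transition table generalizes to inputs of any length. The state is the carry bit, and the names are hypothetical.

```python
# Transition table: (carry, bit_a, bit_b) -> (output_bit, next_carry)
TRANSITIONS = {
    (c, x, y): ((c + x + y) % 2, (c + x + y) // 2)
    for c in (0, 1) for x in (0, 1) for y in (0, 1)
}

def add_binary_fst(a, b):
    """Add two binary strings (most significant bit first) with a 2-state transducer."""
    width = max(len(a), len(b))
    # Pad to equal width and read least significant bit first.
    xs = a.zfill(width)[::-1]
    ys = b.zfill(width)[::-1]
    carry = 0
    out = []
    for x, y in zip(xs, ys):
        bit, carry = TRANSITIONS[(carry, int(x), int(y))]
        out.append(str(bit))
    if carry:
        out.append("1")
    return "".join(reversed(out))

result = add_binary_fst("1011", "110")  # 11 + 6 in binary
```

The eight-entry table is all the "knowledge" needed, so correctness does not depend on input length, which is the property the abstract refers to as length generalization.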

【10】DeXposure: A Dataset and Benchmarks for Inter-protocol Credit Exposure in Decentralized Financial Networks
标题：DeXposure：去中心化金融网络中协议间信用风险敞口的数据集和基准
链接：https://arxiv.org/abs/2511.22314

作者：Wenbin Wu,Kejiang Qian,Alexis Lui,Christopher Jack,Yue Wu,Peter McBurney,Fengxiang He,Bryan Zhang
备注：Data and code: https://github.com/dthinkr/DeXposure - Visualisation: https://ccaf.io/defi/ecosystem-map/visualisation/graph
摘要：We curate the DeXposure dataset, the first large-scale dataset for inter-protocol credit exposure in decentralized financial networks, covering global markets of 43.7 million entries across 4.3 thousand protocols, 602 blockchains, and 24.3 thousand tokens, from 2020 to 2025. A new measure, value-linked credit exposure between protocols, is defined as the inferred financial dependency relationships derived from changes in Total Value Locked (TVL). We develop a token-to-protocol model using DefiLlama metadata to infer inter-protocol credit exposure from the token's stock dynamics, as reported by the protocols. Based on the curated dataset, we develop three benchmarks for machine learning research with financial applications: (1) graph clustering for global network measurement, tracking the structural evolution of credit exposure networks, (2) vector autoregression for sector-level credit exposure dynamics during major shocks (Terra and FTX), and (3) temporal graph neural networks for dynamic link prediction on temporal graphs. From the analysis, we observe (1) a rapid growth of network volume, (2) a trend of concentration to key protocols, (3) a decline of network density (the ratio of actual connections to possible connections), and (4) distinct shock propagation across sectors, such as lending platforms, trading exchanges, and asset management protocols. The DeXposure dataset and code have been released publicly. We envision they will help with research and practice in machine learning as well as financial risk monitoring, policy analysis, DeFi market modeling, amongst others. The dataset also contributes to machine learning research by offering benchmarks for graph clustering, vector autoregression, and temporal graph analysis.
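
The abstract defines network density as the ratio of actual connections to possible connections. A minimal sketch of that ratio follows; treating exposures as directed edges, ignoring self-loops, and counting only nodes that appear in at least one edge are assumptions, and the protocol names are placeholders rather than entries from the dataset.

```python
def network_density(edges, directed=True):
    """Ratio of distinct edges present to the number of possible edges."""
    nodes = {node for edge in edges for node in edge}
    unique = {(a, b) for a, b in edges if a != b}  # drop self-loops and repeats
    if not directed:
        unique = {tuple(sorted(edge)) for edge in unique}
    n = len(nodes)
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    return len(unique) / possible if possible else 0.0

# Placeholder exposure edges between three hypothetical protocols.
exposures = [("A", "B"), ("B", "C"), ("A", "C")]
density_directed = network_density(exposures)
density_undirected = network_density(exposures, directed=False)
```

Density can fall even while the number of edges grows, as long as the number of protocols grows faster, which is consistent with the abstract reporting both growing network volume and declining density.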

【11】GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
标题：GLA-Grad++：一种改进的Griffin-Lim引导的语音合成扩散模型
链接：https://arxiv.org/abs/2511.22293

作者：Teysir Baoueb,Xiaoyu Bie,Mathieu Fontaine,Gaël Richard
摘要：Recent advances in diffusion models have positioned them as powerful generative frameworks for speech synthesis, demonstrating substantial improvements in audio quality and stability. Nevertheless, their effectiveness in vocoders conditioned on mel spectrograms remains constrained, particularly when the conditioning diverges from the training distribution. The recently proposed GLA-Grad model introduced a phase-aware extension to the WaveGrad vocoder that integrated the Griffin-Lim algorithm (GLA) into the reverse process to reduce inconsistencies between the generated signals and the conditioning mel spectrogram. In this paper, we further improve GLA-Grad through an innovative choice in how to apply the correction. In particular, we compute the correction term only once, with a single application of GLA, to accelerate the generation process. Experimental results demonstrate that our method consistently outperforms the baseline models, particularly in out-of-domain scenarios.

【12】Probabilistic Digital Twin for Misspecified Structural Dynamical Systems via Latent Force Modeling and Bayesian Neural Networks
标题：通过潜在力建模和Bayesian神经网络实现错误指定的结构动力系统的概率数字孪生
链接：https://arxiv.org/abs/2511.22133

作者：Sahil Kashyap,Rajdip Nayek
摘要：This work presents a probabilistic digital twin framework for response prediction in dynamical systems governed by misspecified physics. The approach integrates Gaussian Process Latent Force Models (GPLFM) and Bayesian Neural Networks (BNNs) to enable end-to-end uncertainty-aware inference and prediction. In the diagnosis phase, model-form errors (MFEs) are treated as latent input forces to a nominal linear dynamical system and jointly estimated with system states using GPLFM from sensor measurements. A BNN is then trained on posterior samples to learn a probabilistic nonlinear mapping from system states to MFEs, while capturing diagnostic uncertainty. For prognosis, this mapping is used to generate pseudo-measurements, enabling state prediction via Kalman filtering. The framework allows for systematic propagation of uncertainty from diagnosis to prediction, a key capability for trustworthy digital twins. The framework is demonstrated using four nonlinear examples: a single degree of freedom (DOF) oscillator, a multi-DOF system, and two established benchmarks -- the Bouc-Wen hysteretic system and the Silverbox experimental dataset -- highlighting its predictive accuracy and robustness to model misspecification.

【13】Autonomous labeling of surgical resection margins using a foundation model
标题：使用基础模型自主标记手术切除边缘
链接：https://arxiv.org/abs/2511.22131

作者：Xilin Yang,Musa Aydin,Yuhong Lu,Sahan Yoruc Selcuk,Bijie Bai,Yijie Zhang,Andrew Birkeland,Katjana Ehrlich,Julien Bec,Laura Marcu,Nir Pillar,Aydogan Ozcan
备注：20 Pages, 5 Figures
摘要：Assessing resection margins is central to pathological specimen evaluation and has profound implications for patient outcomes. Current practice employs physical inking, which is applied variably, and cautery artifacts can obscure the true margin on histological sections. We present a virtual inking network (VIN) that autonomously localizes the surgical cut surface on whole-slide images, reducing reliance on inks and standardizing margin-focused review. VIN uses a frozen foundation model as the feature extractor and a compact two-layer multilayer perceptron trained for patch-level classification of cautery-consistent features. The dataset comprised 120 hematoxylin and eosin (H&E) stained slides from 12 human tonsil tissue blocks, resulting in ~2 TB of uncompressed raw image data, where a board-certified pathologist provided boundary annotations. In blind testing with 20 slides from previously unseen blocks, VIN produced coherent margin overlays that qualitatively aligned with expert annotations across serial sections. Quantitatively, region-level accuracy was ~73.3% across the test set, with errors largely confined to limited areas that did not disrupt continuity of the whole-slide margin map. These results indicate that VIN captures cautery-related histomorphology and can provide a reproducible, ink-free margin delineation suitable for integration into routine digital pathology workflows and for downstream measurement of margin distances.

【14】Distance-based Learning of Hypertrees
标题：超树的基于距离的学习
链接：https://arxiv.org/abs/2511.22014

作者：Shaun Fallat,Kamyar Khodamoradi,David Kirkpatrick,Valerii Maliuk,S. Ahmad Mojallal,Sandra Zilles
摘要：We study the problem of learning hypergraphs with shortest-path queries (SP-queries), and present the first provably optimal online algorithm for a broad and natural class of hypertrees that we call orderly hypertrees. Our online algorithm can be transformed into a provably optimal offline algorithm. Orderly hypertrees can be positioned within the Fagin hierarchy of acyclic hypergraphs (well studied in database theory), and strictly encompass the broadest class in this hierarchy that is learnable with subquadratic SP-query complexity. Recognizing that in some contexts, such as evolutionary tree reconstruction, distance measurements can degrade with increased distance, we also consider a learning model that uses bounded distance queries. In this model, we demonstrate asymptotically tight complexity bounds for learning general hypertrees.

【15】Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment
标题：模型是否表明数据表明了什么？模型数据对齐的简单启发式
链接：https://arxiv.org/abs/2511.21931

作者：Henry Salgado,Meagan Kendall,Martine Ceberio
摘要：In this work, we propose a simple and computationally efficient framework to evaluate whether machine learning models align with the structure of the data they learn from; that is, whether the model says what the data says. Unlike existing interpretability methods that focus exclusively on explaining model behavior, our approach establishes a baseline derived directly from the data itself. Drawing inspiration from Rubin's Potential Outcomes Framework, we quantify how strongly each feature separates the two outcome groups in a binary classification task, moving beyond traditional descriptive statistics to estimate each feature's effect on the outcome. By comparing these data-derived feature rankings against model-based explanations, we provide practitioners with an interpretable and model-agnostic method to assess model-data alignment.
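
One simple way to quantify how strongly each feature separates two outcome groups is a standardized mean difference. The abstract does not specify the exact statistic the authors use, so the sketch below is a stand-in under that assumption, with hypothetical function names and toy data. The resulting ranking is the kind of data-derived baseline that could then be compared with a model's feature-importance ranking.

```python
from statistics import mean, pstdev

def separation_scores(rows, labels):
    """Absolute difference of group means divided by a pooled standard deviation."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r in pos]
        b = [r[j] for r in neg]
        pooled = ((pstdev(a) ** 2 + pstdev(b) ** 2) / 2) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / pooled if pooled else 0.0)
    return scores

def rank_features(scores):
    """Feature indices ordered from most to least separating."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])

# Toy data: feature 0 differs between groups, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(separation_scores(rows, labels))
```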

【16】Towards a Foundation Model for Partial Differential Equations Across Physics Domains
标题：迈向跨物理领域偏微分方程的基础模型
链接：https://arxiv.org/abs/2511.21861

作者：Eduardo Soares,Emilio Vital Brazil,Victor Shirasuna,Breno W. S. R. de Carvalho,Cristiano Malossi
备注：Accepted to the AAAI 2026 AI2ASE Workshop
摘要：We present PDE-FM, a modular foundation model for physics-informed machine learning that unifies spatial, spectral, and temporal reasoning across heterogeneous partial differential equation (PDE) systems. PDE-FM combines spatial-spectral tokenization, physics-aware conditioning, and a Mamba-based state-space backbone with an operator-theoretic decoder, enabling scalable and data-efficient modeling of complex physical dynamics. In contrast to task-specific neural operators, PDE-FM is pretrained once on diverse PDE datasets and can be transferred to new physical regimes without architectural or data-specific modifications. Evaluated on twelve 2D and 3D datasets from The Well benchmark - spanning hydrodynamic, radiative, elastic, and astrophysical phenomena - PDE-FM achieves state-of-the-art accuracy in six domains, reducing mean VRMSE by 46% relative to prior operator-learning baselines. The model demonstrates robust cross-physics generalization, excelling in turbulent and radiative systems while maintaining strong performance in linear and steady-state regimes. These results suggest that large-scale pretraining across diverse physical processes can yield transferable representations of dynamics, marking a step toward unified, foundation-level surrogates for multi-physics simulation and scientific discovery.

【17】Massively Parallel Imitation Learning of Mouse Forelimb Musculoskeletal Reaching Dynamics
标题：小鼠前肢肌肉骨骼够取动力学的大规模并行模仿学习
链接：https://arxiv.org/abs/2511.21848

作者：Eric Leonardis,Akira Nagamori,Ayesha Thanawalla,Yuanjia Yang,Joshua Park,Hutton Saunders,Eiman Azim,Talmo Pereira
备注：Accepted at NeurIPS 2025 Workshop Data on the Brain & Mind: Concrete Applications of AI to Neuroscience and Cognitive Science. 12 pages, 4 figures
摘要：The brain has evolved to effectively control the body, and in order to understand the relationship we need to model the sensorimotor transformations underlying embodied control. As part of a coordinated effort, we are developing a general-purpose platform for behavior-driven simulation that models the high-fidelity behavioral dynamics, biomechanics, and neural circuit architectures underlying embodied control. We present a pipeline that takes kinematics data from the neuroscience lab and recapitulates those natural movements in a biomechanical model. We implement an imitation learning framework to perform a dexterous forelimb reaching task with a musculoskeletal model in a simulated physics environment. The mouse arm model currently trains at more than 1 million training steps per second thanks to GPU acceleration with JAX and Mujoco-MJX. We present results indicating that adding naturalistic constraints on energy and velocity leads to simulated musculoskeletal activity that better predicts real EMG signals. This work provides evidence that energy and control constraints are critical to modeling musculoskeletal motor control.

【18】The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning
标题：值得信赖的机器学习罗生门集的双面性质
链接：https://arxiv.org/abs/2511.21799

作者：Ethan Hsu,Harry Chen,Chudi Zhong,Lesia Semenova
摘要：Real-world machine learning (ML) pipelines rarely produce a single model; instead, they produce a Rashomon set of many near-optimal ones. We show that this multiplicity reshapes key aspects of trustworthiness. At the individual-model level, sparse interpretable models tend to preserve privacy but are fragile to adversarial attacks. In contrast, the diversity within a large Rashomon set enables reactive robustness: even when an attack breaks one model, others often remain accurate. Rashomon sets are also stable under small distribution shifts. However, this same diversity increases information leakage, as disclosing more near-optimal models provides an attacker with progressively richer views of the training data. Through theoretical analysis and empirical studies of sparse decision trees and linear models, we characterize this robustness-privacy trade-off and highlight the dual role of Rashomon sets as both a resource and a risk for trustworthy ML.
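
As a concrete reading of "a Rashomon set of many near-optimal ones", the sketch below selects every candidate model whose loss is within a tolerance of the best loss. The additive tolerance `epsilon`, the model names, and the loss values are illustrative assumptions; the paper may use a different formulation.

```python
def rashomon_set(losses, epsilon):
    """Names of all models whose loss is within epsilon of the best loss."""
    best = min(losses.values())
    return {name for name, loss in losses.items() if loss <= best + epsilon}

# Hypothetical validation losses for four candidate models.
losses = {"tree_a": 0.10, "tree_b": 0.11, "linear": 0.12, "stump": 0.30}
near_optimal = rashomon_set(losses, epsilon=0.025)
```

A larger `epsilon` yields a larger, more diverse set, which is the knob behind the trade-off the abstract describes: more models to fall back on under attack, but more views of the training data if all of them are disclosed.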

【19】Physics-Informed Spiking Neural Networks via Conservative Flux Quantization
标题：基于保守通量量化的物理信息脉冲神经网络
链接：https://arxiv.org/abs/2511.21784

作者：Chi Zhang,Lin Wang
摘要：Real-time, physically consistent predictions on low-power edge devices are critical for next-generation embodied AI systems, yet they remain a major challenge. Physics-Informed Neural Networks (PINNs) combine data-driven learning with physics-based constraints to ensure the model's predictions are consistent with underlying physical principles. However, PINNs are energy-intensive and struggle to strictly enforce physical conservation laws. Brain-inspired spiking neural networks (SNNs) have emerged as a promising solution for edge computing and real-time processing. However, naively converting PINNs to SNNs degrades physical fidelity and fails to address long-term generalization issues. To this end, this paper introduces a novel Physics-Informed Spiking Neural Network (PISNN) framework. Importantly, to ensure strict physical conservation, we design the Conservative Leaky Integrate-and-Fire (C-LIF) neuron, whose dynamics structurally guarantee local mass preservation. To achieve robust temporal generalization, we introduce a novel Conservative Flux Quantization (CFQ) strategy, which redefines neural spikes as discrete packets of physical flux. Our CFQ learns a time-invariant physical evolution operator, enabling the PISNN to become a general-purpose solver -- conservative-by-construction. Extensive experiments show that our PISNN excels on diverse benchmarks. For both the canonical 1D heat equation and the more challenging 2D Laplace's Equation, it accurately simulates the system dynamics while maintaining perfect mass conservation by design -- a feat that is challenging for conventional PINNs. This work establishes a robust framework for fusing the rigor of scientific computing with the efficiency of neuromorphic engineering, paving the way for complex, long-term, and energy-efficient physics predictions for intelligent systems.
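
The idea of treating transfers as discrete packets of flux, so that conservation holds by construction, can be shown with a plain finite-volume toy. The sketch below is not the C-LIF neuron or the learned CFQ operator from the paper: it is a hand-written 1D diffusion step on a periodic grid with integer cell contents, and the diffusion rate of 1/4 and the function name are assumptions.

```python
def diffuse_step(cells, rate_num=1, rate_den=4):
    """One flux-form update on a periodic 1D grid with integer cell contents.

    The flux across the face between cell i and cell i+1 is rounded toward
    zero to a whole number of packets, so whatever leaves one cell enters
    its neighbour and the total is conserved exactly.
    """
    n = len(cells)
    fluxes = []
    for i in range(n):
        diff = cells[i] - cells[(i + 1) % n]
        packets = abs(diff) * rate_num // rate_den
        fluxes.append(packets if diff > 0 else -packets)
    # Cell i loses the flux through its right face and gains the flux
    # through its left face (fluxes[i - 1] wraps around for i == 0).
    return [cells[i] - fluxes[i] + fluxes[i - 1] for i in range(n)]

# A single hot cell spreading out over 50 steps.
state = [0] * 10
state[5] = 1000
for _ in range(50):
    state = diffuse_step(state)
total = sum(state)
```

Because every face flux is subtracted from one cell and added to exactly one other, the total stays fixed regardless of how the flux itself is computed, which is the structural property a learned operator can inherit.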

【20】$\mathcal{E}_0$: Enhancing Generalization and Fine-Grained Control in VLA Models via Continuized Discrete Diffusion
链接：https://arxiv.org/abs/2511.21542

作者：Zhihao Zhan,Jiaying Zhou,Likui Zhang,Qinhan Lv,Hao Liu,Jusheng Zhang,Weizheng Li,Ziliang Chen,Tianshui Chen,Keze Wang,Liang Lin,Guangrun Wang
摘要：Vision-Language-Action (VLA) models offer a unified framework for robotic manipulation by integrating visual perception, language understanding, and control generation. Yet existing VLA models still struggle to generalize across diverse tasks, scenes, and camera viewpoints, and often produce coarse or unstable actions. We introduce E0, a continuized discrete diffusion framework that formulates action generation as iterative denoising over quantized action tokens. Compared with continuous diffusion policies, E0 offers two key advantages: (1) discrete action tokens align naturally with the symbolic structure of pretrained VLM/VLA backbones, enabling stronger semantic conditioning; and (2) discrete diffusion matches the true quantized nature of real-world robot control, whose hardware constraints (e.g., encoder resolution, control frequency, actuation latency) inherently discretize continuous signals, and therefore benefits from a Bayes-optimal denoiser that models the correct discrete action distribution, leading to stronger generalization. Compared with discrete autoregressive and mask-based discrete diffusion models, E0 supports a significantly larger and finer-grained action vocabulary and avoids the distributional mismatch introduced by masking-based corruptions, yielding more accurate fine-grained action control. We further introduce a spherical viewpoint perturbation augmentation method to improve robustness to camera shifts without additional data. Experiments on LIBERO, VLABench, and ManiSkill show that E0 achieves state-of-the-art performance across 14 diverse environments, outperforming strong baselines by 10.7% on average. Real-world evaluation on a Franka arm confirms that E0 delivers precise, robust, and transferable manipulation, establishing discrete diffusion as a promising direction for generalizable VLA policy learning.

【21】Optical diffraction neural networks assisted computational ghost imaging through dynamic scattering media
标题：光学衍射神经网络辅助的透过动态散射介质的计算鬼成像
链接：https://arxiv.org/abs/2511.22913

作者：Yue-Gang Li,Ze Zheng,Jun-jie Wang,Ming He,Jianping Fan,Tailong Xiao,Guihua Zeng
摘要：Ghost imaging leverages a single-pixel detector with no spatial resolution to acquire object echo intensity signals, which are correlated with illumination patterns to reconstruct an image. This architecture inherently mitigates scattering interference between the object and the detector but is sensitive to scattering between the light source and the object. To address this challenge, we propose an optical diffraction neural network (ODNN)-assisted ghost imaging method for imaging through dynamic scattering media. In our scheme, a set of fixed ODNNs, trained on simulated datasets, is incorporated into the experimental optical path to actively correct random distortions induced by dynamic scattering media. Experimental validation using rotating single-layer and double-layer ground glass confirms the feasibility and effectiveness of our approach. Furthermore, our scheme can also be combined with physics-prior-based reconstruction algorithms, enabling high-quality imaging under undersampled conditions. This work demonstrates a novel strategy for imaging through dynamic scattering media, which can be extended to other imaging systems.

【22】Resolving Sharp Gradients of Unstable Singularities to Machine Precision via Neural Networks
标题：通过神经网络将不稳定奇点的尖锐梯度解析至机器精度
链接：https://arxiv.org/abs/2511.22819

作者：Yongji Wang,Tristan Léger,Ching-Yao Lai,Tristan Buckmaster
备注：27 pages, 12 figures
摘要：Recent work introduced a robust computational framework combining embedded mathematical structures, advanced optimization, and neural network architecture, leading to the discovery of multiple unstable self-similar solutions for key fluid dynamics equations, including the Incompressible Porous Media (IPM) and 2D Boussinesq systems. While this framework confirmed the existence of these singularities, an accuracy level approaching double-float machine precision was only achieved for stable and 1st unstable solutions of the 1D Córdoba-Córdoba-Fontelos model. For highly unstable solutions characterized by extreme gradients, the accuracy remained insufficient for validation. The primary obstacle is the presence of sharp solution gradients. Those gradients tend to induce large, localized PDE residuals during training, which not only hinder convergence, but also obscure the subtle signals near the origin required to identify the correct self-similar scaling parameter lambda of the solutions. In this work, we introduce a gradient-normalized PDE residual re-weighting scheme to resolve the high-gradient challenge while amplifying the critical residual signals at the origin for lambda identification. Coupled with the multi-stage neural network architecture, the PDE residuals are reduced to the level of round-off error across a wide spectrum of unstable self-similar singularities previously discovered. Furthermore, our method enables the discovery of new highly unstable singularities, i.e., the 4th unstable solution for IPM equations and a novel family of highly unstable solitons for the Nonlinear Schrödinger equations. The result is high-precision solutions with sharp gradients, providing an important ingredient for bridging the gap between numerical discovery and computer-assisted proofs for unstable phenomena in nonlinear PDEs.

【23】Generative models for crystalline materials
标题：晶体材料的生成模型
链接：https://arxiv.org/abs/2511.22652

作者：Houssam Metni,Laura Ruple,Lauren N. Walters,Luca Torresi,Jonas Teufel,Henrik Schopmans,Jona Östreicher,Yumeng Zhang,Marlen Neubert,Yuri Koide,Kevin Steiner,Paul Link,Lukas Bär,Mariana Petrova,Gerbrand Ceder,Pascal Friederich
摘要：Understanding structure-property relationships in materials is fundamental in condensed matter physics and materials science. Over the past few years, machine learning (ML) has emerged as a powerful tool for advancing this understanding and accelerating materials discovery. Early ML approaches primarily focused on constructing and screening large material spaces to identify promising candidates for various applications. More recently, research efforts have increasingly shifted toward generating crystal structures using end-to-end generative models. This review analyzes the current state of generative modeling for crystal structure prediction and de novo generation. It examines crystal representations, outlines the generative models used to design crystal structures, and evaluates their respective strengths and limitations. Furthermore, the review highlights experimental considerations for evaluating generated structures and provides recommendations for suitable existing software tools. Emerging topics, such as modeling disorder and defects, integration in advanced characterization, and incorporating synthetic feasibility constraints, are explored. Ultimately, this work aims to inform both experimental scientists looking to adapt suitable ML models to their specific circumstances and ML specialists seeking to understand the unique challenges related to inverse materials design and discovery.

【24】AdS/Deep-Learning made easy II: neural network-based approaches to holography and inverse problems
标题：AdS/深度学习变得简单II：基于神经网络的全息和逆问题方法
链接：https://arxiv.org/abs/2511.22522

作者：Hyun-Sik Jeong,Hanse Kim,Keun-Young Kim,Gaya Yun,Hyeonwoo Yu,Kwan Yun
备注：31 pages, 17 figures
摘要：We apply physics-informed machine learning (PIML) to solve inverse problems in holography and classical mechanics, focusing on neural ordinary differential equations (Neural ODEs) and physics-informed neural networks (PINNs) for solving non-linear differential equations of motion. First, we introduce holographic inverse problems and demonstrate how PIML can reconstruct bulk spacetime and effective potentials from boundary quantum data. To illustrate this, two case studies are explored: the QCD equation of state in holographic QCD and $T$-linear resistivity in holographic strange metals. Additionally, we explicitly show how such holographic problems can be analogized to inverse problems in classical mechanics, modeling frictional forces with neural networks. We also explore Kolmogorov-Arnold Networks (KANs) as an alternative to traditional neural networks, offering more efficient solutions in certain cases. This manuscript aims to provide a systematic framework for using neural networks in inverse problems, serving as a comprehensive reference for researchers in machine learning for high-energy physics, with methodologies that also have broader applications in mathematics, engineering, and the natural sciences.

【25】The Machine Learning Approach to Moment Closure Relations for Plasma: A Review
标题：等离子体矩闭合关系的机器学习方法：综述
链接：https://arxiv.org/abs/2511.22486

作者：Samuel Burles,Enrico Camporeale
备注：30 pages, 2 figures
摘要：The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the high order plasma moments. This review compiles and analyses the recent surge of machine learning approaches developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. The purpose of this review is both to collect and analyse the various methods employed on the plasma closure problem, including both equation discovery methods and neural network surrogate approaches, as well as to provide a general overview of the state of the problem. In particular, we highlight the challenges of developing a data-driven closure as well as the direction future work should take toward addressing these challenges, in the pursuit of a computationally viable large-scale global simulation.

【26】Algorithms and Scientific Software for Quasi-Monte Carlo, Fast Gaussian Process Regression, and Scientific Machine Learning
标题：准蒙特卡罗、快速高斯过程回归和科学机器学习的算法和科学软件
链接：https://arxiv.org/abs/2511.21915

作者：Aleksei G. Sorokin
备注：PhD thesis
摘要：Most scientific domains elicit the development of efficient algorithms and accessible scientific software. This thesis unifies our developments in three broad domains: Quasi-Monte Carlo (QMC) methods for efficient high-dimensional integration, Gaussian process (GP) regression for high-dimensional interpolation with built-in uncertainty quantification, and scientific machine learning (sciML) for modeling partial differential equations (PDEs) with mesh-free solvers. For QMC, we built new algorithms for vectorized error estimation and developed QMCPy (https://qmcsoftware.github.io/QMCSoftware/): an open-source Python interface to randomized low-discrepancy sequence generators, automatic variable transforms, adaptive error estimation procedures, and diverse use cases. For GPs, we derived new digitally-shift-invariant kernels of higher-order smoothness, developed novel fast multitask GP algorithms, and produced the scalable Python software FastGPs (https://alegresor.github.io/fastgps/). For sciML, we developed a new algorithm capable of machine precision recovery of PDEs with random coefficients. We have also studied a number of applications including GPs for probability of failure estimation, multilevel GPs for the Darcy flow equation, neural surrogates for modeling radiative transfer, and fast GPs for Bayesian multilevel QMC.

【27】Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations
标题：稀疏多核学习：交替最佳响应与半定松弛
链接：https://arxiv.org/abs/2511.21890

作者：Dimitris Bertsimas,Caio de Prospero Iglesias,Nicholas A. G. Johnson
摘要：We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing l1 regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an l2 penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the alpha subproblem is a standard kernel SVM dual solved via LIBSVM, while the beta subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our best response algorithm and also to warm start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy on average by 3.34 percentage points (relative to the best performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best performing benchmark's out-of-sample prediction accuracy on average by 4.05 percentage points. Our convex relaxations provide a certificate that in several cases, the solution returned by our best response algorithm is the globally optimal solution.
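The abstract names a "Greedy Selector and Simplex Projector" step for the kernel-weight (beta) subproblem but does not spell it out. The sketch below shows the generic form of that idea, assuming the task is to project a score vector onto the set of non-negative weights that sum to one and have at most `k` non-zeros. The function names and the way the input vector `v` is obtained are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    # Last (0-based) index at which the shifted entry is still positive.
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v (greedy selection), then
    project those entries onto the simplex; all others are set to 0."""
    v = np.asarray(v, dtype=float)
    support = np.argsort(v)[::-1][:k]         # indices of the k largest entries
    beta = np.zeros_like(v)
    beta[support] = project_simplex(v[support])
    return beta

# Hypothetical scores for five candidate kernels, at most two selected.
beta = sparse_simplex_projection([0.5, 0.1, 0.9, -0.2, 0.3], k=2)
print(beta)
```

The result is a sparse convex combination: non-negative weights that sum to one with at most `k` non-zero entries, which matches the cardinality-constrained feasible set described in the abstract.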

【28】Automated Statistical and Machine Learning Platform for Biological Research
标题：用于生物研究的自动化统计和机器学习平台
链接：https://arxiv.org/abs/2511.21770

作者：Luke Rimmo Lego,Samantha Gauthier,Denver Jn. Baptiste
备注：7 pages, 2 figures, 25 equations
摘要：Research increasingly relies on computational methods to analyze experimental data and predict molecular properties. Current approaches often require researchers to use a variety of tools for statistical analysis and machine learning, creating workflow inefficiencies. We present an integrated platform that combines classical statistical methods with Random Forest classification for comprehensive data analysis that can be used in the biological sciences. The platform implements automated hyperparameter optimization, feature importance analysis, and a suite of statistical tests including t tests, ANOVA, and Pearson correlation analysis. Our methodology addresses the gap between traditional statistical software, modern machine learning frameworks and biology, by providing a unified interface accessible to researchers without extensive programming experience. The system achieves this through automatic data preprocessing, categorical encoding, and adaptive model configuration based on dataset characteristics. Initial testing protocols are designed to evaluate classification accuracy across diverse chemical datasets with varying feature distributions. This work demonstrates that integrating statistical rigor with machine learning interpretability can accelerate biological discovery workflows while maintaining methodological soundness. The platform's modular architecture enables future extensions to additional machine learning algorithms and statistical procedures relevant to bioinformatics.

【29】DNNs, Dataset Statistics, and Correlation Functions
标题：DNN、数据集统计和相关函数
链接：https://arxiv.org/abs/2511.21715

作者：Robert W. Batterman,James F. Woodward
备注：37 pages, 12 figures
摘要：This paper argues that dataset structure is important in image recognition tasks (among other tasks). Specifically, we focus on the nature and genesis of correlational structure in the actual datasets upon which DNNs are trained. We argue that DNNs are implementing a widespread methodology in condensed matter physics and materials science that focuses on mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales. Specifically, we argue that DNNs that are successful in image classification must be discovering high order correlation functions. It is well-known that DNNs successfully generalize in apparent contravention of standard statistical learning theory. We consider the implications of our discussion for this puzzle.

其他(50篇)

【1】Provable Benefits of Sinusoidal Activation for Modular Addition
标题：正弦激活对模加法的可证明益处
链接：https://arxiv.org/abs/2511.23443

作者：Tianlong Huang,Zhiyuan Li
备注：60 pages, 15 figures
摘要：This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths. In contrast, the width of ReLU networks must scale linearly with $m$ to interpolate, and they cannot simultaneously fit two lengths with different residues modulo $p$. We then provide a novel Natarajan-dimension generalization bound for sine networks, yielding nearly optimal sample complexity $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks. We also derive width-independent, margin-based generalization for sine networks in the overparametrized regime and validate it. Empirically, sine networks generalize consistently better than ReLU networks across regimes and exhibit strong length extrapolation.
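To make the width-2 claim concrete, here is one possible realization with bias. It is a sketch under stated assumptions rather than the paper's construction: inputs are taken to be integer tokens in {0, ..., p-1}, the first-layer weight 2π/p is shared across positions, and the second hidden unit uses a π/2 bias so that it outputs a cosine. The linear readout then evaluates cos(θ - 2πk/p) for every residue k, which is maximal exactly at k = (x_1 + ... + x_m) mod p.

```python
import numpy as np

def sine_mlp_mod_add(tokens, p):
    """Two-layer sine network of width 2 that returns sum(tokens) mod p.

    tokens: integer array of shape (m,) or (n, m) with values in 0..p-1.
    """
    x = np.asarray(tokens, dtype=float)
    w = np.full(x.shape[-1], 2.0 * np.pi / p)      # shared first-layer weights
    theta = x @ w                                   # 2*pi*sum(tokens)/p
    # Two hidden units: sin(theta) and sin(theta + pi/2) = cos(theta).
    hidden = np.sin(np.stack([theta, theta + np.pi / 2.0], axis=-1))
    k = np.arange(p)
    readout = np.stack([np.sin(2.0 * np.pi * k / p),
                        np.cos(2.0 * np.pi * k / p)], axis=0)   # shape (2, p)
    logits = hidden @ readout                       # cos(theta - 2*pi*k/p)
    return np.argmax(logits, axis=-1)

rng = np.random.default_rng(0)
p = 11
for m in (3, 8):                                    # same weights for every length
    batch = rng.integers(0, p, size=(200, m))
    assert np.array_equal(sine_mlp_mod_add(batch, p), batch.sum(axis=1) % p)
```

Because the first-layer weight does not depend on the sequence length, the same two hidden units handle every `m`, which is the sense in which a biased sine network can be uniform over lengths.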

【2】LFM2 Technical Report
标题：LFM2技术报告
链接：https://arxiv.org/abs/2511.23404

作者：Alexander Amini,Anna Banaszak,Harold Benoit,Arthur Böök,Tarek Dakhran,Song Duong,Alfred Eng,Fernando Fernandes,Marc Härkönen,Anne Harrington,Ramin Hasani,Saniya Karwa,Yuri Khrustalev,Maxime Labonne,Mathias Lechner,Valentine Lechner,Simon Lee,Zetian Li,Noel Loo,Jacob Marks,Edoardo Mosca,Samuel J. Paech,Paul Pak,Rom N. Parnichkun,Alex Quach,Ryan Rogers,Daniela Rus,Nayan Saxena,Bettina Schlager,Tim Seyde,Jimmy T. H. Smith,Aditya Tadimeti,Neehal Tumma
摘要：We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a small number of grouped query attention blocks, delivering up to 2x faster prefill and decode on CPUs compared to similarly sized models. The LFM2 family covers 350M-8.3B parameters, including dense models (350M, 700M, 1.2B, 2.6B) and a mixture-of-experts variant (8.3B total, 1.5B active), all with 32K context length. LFM2's training pipeline includes a tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch; curriculum learning with difficulty-ordered data; and a three-stage post-training recipe of supervised fine-tuning, length-normalized preference optimization, and model merging. Pre-trained on 10-12T tokens, LFM2 models achieve strong results across diverse benchmarks; for example, LFM2-2.6B reaches 79.56% on IFEval and 82.41% on GSM8K. We further build multimodal and retrieval variants: LFM2-VL for vision-language tasks, LFM2-Audio for speech, and LFM2-ColBERT for retrieval. LFM2-VL supports tunable accuracy-latency tradeoffs via token-efficient visual processing, while LFM2-Audio separates audio input and output pathways to enable real-time speech-to-speech interaction competitive with models 3x larger. LFM2-ColBERT provides a low-latency encoder for queries and documents, enabling high-performance retrieval across multiple languages. All models are released with open weights and deployment packages for ExecuTorch, llama.cpp, and vLLM, making LFM2 a practical base for edge applications that need fast, memory-efficient inference and strong task capabilities.

【3】SDE-Attention: Latent Attention in SDE-RNNs for Irregularly Sampled Time Series with Missing Data
标题：SDE-Attention：SDE-RNN中具有缺失数据的不规则采样时间序列的潜在注意力
链接：https://arxiv.org/abs/2511.23238

作者：Yuting Fang,Qouc Le Gia,Flora Salim
备注：11 pages, 6 figures
摘要：Irregularly sampled time series with substantial missing observations are common in healthcare and sensor networks. We introduce SDE-Attention, a family of SDE-RNNs equipped with channel-level attention on the latent pre-RNN state, including channel recalibration, time-varying feature attention, and pyramidal multi-scale self-attention. We conduct a comparison on a synthetic periodic dataset and real-world benchmarks under varying missing rates. Latent-space attention consistently improves over a vanilla SDE-RNN. On the univariate UCR datasets, the LSTM-based time-varying feature model SDE-TVF-L achieves the highest average accuracy, raising mean performance by approximately 4, 6, and 10 percentage points over the baseline at 30%, 60% and 90% missingness, respectively (averaged across datasets). On multivariate UEA benchmarks, attention-augmented models again outperform the backbone, with SDE-TVF-L yielding up to a 7% gain in mean accuracy under high missingness. Among the proposed mechanisms, time-varying feature attention is the most robust on univariate datasets. On multivariate datasets, different attention types excel on different tasks, showing that SDE-Attention can be flexibly adapted to the structure of each problem.

【4】Fault-Tolerant MARL for CAVs under Observation Perturbations for Highway On-Ramp Merging
标题：观测扰动下面向高速公路入口匝道汇入的CAV容错MARL
链接：https://arxiv.org/abs/2511.23193

作者：Yuchen Shi,Huaxin Pei,Yi Zhang,Danya Yao
摘要：Multi-Agent Reinforcement Learning (MARL) holds significant promise for enabling cooperative driving among Connected and Automated Vehicles (CAVs). However, its practical application is hindered by a critical limitation, i.e., insufficient fault tolerance against observational faults. Such faults, which appear as perturbations in the vehicles' perceived data, can substantially compromise the performance of MARL-based driving systems. Addressing this problem presents two primary challenges. One is to generate adversarial perturbations that effectively stress the policy during training, and the other is to equip vehicles with the capability to mitigate the impact of corrupted observations. To overcome the challenges, we propose a fault-tolerant MARL method for cooperative on-ramp vehicles incorporating two key agents. First, an adversarial fault injection agent is co-trained to generate perturbations that actively challenge and harden the vehicle policies. Second, we design a novel fault-tolerant vehicle agent equipped with a self-diagnosis capability, which leverages the inherent spatio-temporal correlations in vehicle state sequences to detect faults and reconstruct credible observations, thereby shielding the policy from misleading inputs. Experiments in a simulated highway merging scenario demonstrate that our method significantly outperforms baseline MARL approaches, achieving near-fault-free levels of safety and efficiency under various observation fault patterns.

【5】Estimating the Event-Related Potential from Few EEG Trials
标题：从少量脑电试次中估计事件相关电位
链接：https://arxiv.org/abs/2511.23162

作者：Anders Vestergaard Nørskov,Kasper Jørgensen,Alexander Neergaard Zahid,Morten Mørup
备注：Accepted by Transactions on Machine Learning Research (TMLR). 15 pages main manuscript, 30 pages total including supplementary material
摘要：Event-related potentials (ERP) are measurements of brain activity with wide applications in basic and clinical neuroscience that are typically estimated using the average of many trials of electroencephalography signals (EEG) to sufficiently reduce noise and signal variability. We introduce EEG2ERP, a novel uncertainty-aware autoencoder approach that maps an arbitrary number of EEG trials to their associated ERP. To account for the ERP uncertainty we use bootstrapped training targets and introduce a separate variance decoder to model the uncertainty of the estimated ERP. We evaluate our approach in the challenging zero-shot scenario of generalizing to new subjects, considering three different publicly available data sources: i) the comprehensive ERP CORE dataset that includes over 50,000 EEG trials across six ERP paradigms from 40 subjects, ii) the large P300 Speller BCI dataset, and iii) a neuroimaging dataset on face perception consisting of both EEG and magnetoencephalography (MEG) data. We consistently find that our method in the few-trial regime provides substantially better ERP estimates than commonly used conventional and robust averaging procedures. EEG2ERP is the first deep learning approach to map EEG signals to their associated ERP, moving toward reducing the number of trials necessary for ERP research. Code is available at https://github.com/andersxa/EEG2ERP

【6】Adapting Neural Audio Codecs to EEG
标题：使神经音频编解码器适应脑电
链接：https://arxiv.org/abs/2511.23142

作者：Ard Kastrati,Luca Lanzendörfer,Riccardo Rigoni,John Staib Matilla,Roger Wattenhofer
备注：Foundation Models for the Brain and Body (BrainBodyFM@NeurIPS)
摘要：EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet, we show that pretrained neural audio codecs can serve as effective starting points for EEG compression, provided that the data are preprocessed to suit the codec's input constraints. Using DAC, a state-of-the-art neural audio codec, as our base, we demonstrate that raw EEG can be mapped into the codec's stride-based framing, enabling direct reuse of the audio-pretrained encoder-decoder. Even without modification, this setup yields stable EEG reconstructions, and fine-tuning on EEG data further improves fidelity and generalization compared to training from scratch. We systematically explore compression-quality trade-offs by varying residual codebook depth, codebook (vocabulary) size, and input sampling rate. To capture spatial dependencies across electrodes, we propose DAC-MC, a multi-channel extension with attention-based cross-channel aggregation and channel-specific decoding, while retaining the audio-pretrained initialization. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information, as reflected in spectrogram-based reconstruction loss and downstream classification accuracy.

【7】Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
标题：稳定边缘的谱集中：核联想记忆的信息几何
链接：https://arxiv.org/abs/2511.23083

作者：Akira Tamamori
备注：4 pages, 4 figures
摘要：High-capacity kernel Hopfield networks exhibit a "Ridge of Optimization" characterized by extreme stability. While previously linked to "Spectral Concentration," its origin remains elusive. Here, we analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the "Edge of Stability," a critical boundary where the Fisher Information Matrix becomes singular. We demonstrate that the apparent Euclidean force antagonism is a manifestation of "Dual Equilibrium" in the Riemannian space. This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.

【8】Maritime Activities Observed Through Open-Access Positioning Data: Moving and Stationary Vessels in the Baltic Sea
标题：通过开放获取定位数据观察的海事活动：波罗的海移动和静止的船只
链接：https://arxiv.org/abs/2511.23016

作者：Moritz Hütten
备注：29 pages, 15 figures, and 9 tables, matching the version published in Geomatics. Accompanying research data are available at http://dx.doi.org/10.6084/m9.figshare.29062715
摘要：Understanding past and present maritime activity patterns is critical for navigation safety, environmental assessment, and commercial operations. An increasing number of services now openly provide positioning data from the Automatic Identification System (AIS) via ground-based receivers. We show that coastal vessel activity can be reconstructed from open access data with high accuracy, even with limited data quality and incomplete receiver coverage. For three months of open AIS data in the Baltic Sea from August to October 2024, we present (i) cleansing and reconstruction methods to improve the data quality, and (ii) a journey model that converts AIS message data into vessel counts, traffic estimates, and spatially resolved vessel density at a resolution of ~400 m. Vessel counts are provided, along with their uncertainties, for both moving and stationary activity. Vessel density maps also enable the identification of port locations, and we infer the most crowded and busiest coastal areas in the Baltic Sea. We find that on average, ≳4000 vessels simultaneously operate in the Baltic Sea, and more than 300 vessels enter or leave the area each day. Our results agree within 20% with previous studies relying on proprietary data.

【9】A Modular Framework for Rapidly Building Intrusion Predictors
标题：快速构建入侵预测器的模块化框架
链接：https://arxiv.org/abs/2511.23000

作者：Xiaoxuan Wang,Rolf Stadler
摘要：We study automated intrusion prediction in an IT system using statistical learning methods. The focus is on developing online attack predictors that detect attacks in real time and identify the current stage of the attack. While such predictors have been proposed in the recent literature, these works typically rely on constructing a monolithic predictor tailored to a specific attack type and scenario. Given that hundreds of attack types are cataloged in the MITRE framework, training a separate monolithic predictor for each of them is infeasible. In this paper, we propose a modular framework for rapidly assembling online attack predictors from reusable components. The modular nature of a predictor facilitates controlling key metrics like timeliness and accuracy of prediction, as well as tuning the trade-off between them. Using public datasets for training and evaluation, we provide many examples of modular predictors and show how an effective predictor can be dynamically assembled during training from a network of modular components.

【10】A Trainable Centrality Framework for Modern Data
标题：现代数据的可训练中心性框架
链接：https://arxiv.org/abs/2511.22959

作者：Minh Duc Vu,Mingshuo Liu,Doudou Zhou
摘要：Measuring how central or typical a data point is underpins robust estimation, ranking, and outlier detection, but classical depth notions become expensive and unstable in high dimensions and are hard to extend beyond Euclidean data. We introduce Fused Unified centrality Score Estimation (FUSE), a neural centrality framework that operates on top of arbitrary representations. FUSE combines a global head, trained from pairwise distance-based comparisons to learn an anchor-free centrality score, with a local head, trained by denoising score matching to approximate a smoothed log-density potential. A single parameter between 0 and 1 interpolates between these calibrated signals, yielding depth-like centrality from different views via one forward pass. Across synthetic distributions, real images, time series, and text data, and standard outlier detection benchmarks, FUSE recovers meaningful classical ordering, reveals multi-scale geometric structures, and attains competitive performance with strong classical baselines while remaining simple and efficient.

【11】CORGI: GNNs with Convolutional Residual Global Interactions for Lagrangian Simulation
标题：CORGI：用于拉格朗日模拟的具有卷积残差全局交互的GNN
链接：https://arxiv.org/abs/2511.22938

作者：Ethan Ji,Yuanzhou Chen,Arush Ramteke,Fang Sun,Tianrun Yu,Jai Parera,Wei Wang,Yizhou Sun
摘要：Partial differential equations (PDEs) are central to dynamical systems modeling, particularly in hydrodynamics, where traditional solvers often struggle with nonlinearity and computational cost. Lagrangian neural surrogates such as GNS and SEGNN have emerged as strong alternatives by learning from particle-based simulations. However, these models typically operate with limited receptive fields, making them inaccurate for capturing the inherently global interactions in fluid flows. Motivated by this observation, we introduce Convolutional Residual Global Interactions (CORGI), a hybrid architecture that augments any GNN-based solver with a lightweight Eulerian component for global context aggregation. By projecting particle features onto a grid, applying convolutional updates, and mapping them back to the particle domain, CORGI captures long-range dependencies without significant overhead. When applied to a GNS backbone, CORGI achieves a 57% improvement in rollout accuracy with only 13% more inference time and 31% more training time. Compared to SEGNN, CORGI improves accuracy by 49% while reducing inference time by 48% and training time by 30%. Even under identical runtime constraints, CORGI outperforms GNS by 47% on average, highlighting its versatility and performance on varied compute budgets.

【12】Covering-Space Normalizing Flows: Approximating Pushforwards on Lens Spaces
标题：覆叠空间归一化流：透镜空间上前推分布的近似
链接：https://arxiv.org/abs/2511.22882

作者：William Ghanem
摘要：We construct pushforward distributions via the universal covering map rho: S^3 -> L(p;q) with the goal of approximating these distributions using flows on L(p;q). We highlight that our method deletes redundancies in the case of a symmetric S^3 distribution. Using our model, we approximate the pushforwards of von Mises-Fisher-induced target densities as well as that of a Z_12-symmetric Boltzmann distribution on S^3 constructed to model benzene.

【13】From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
标题：从像素到感觉：将MLLM与人类对图像的认知感知保持一致
链接：https://arxiv.org/abs/2511.22805

作者：Yiming Chen,Junlin Han,Tianyi Bai,Shengbang Tong,Filippos Kokkinos,Philip Torr
备注：Project page with codes/datasets/models: https://follen-cry.github.io/MLLM-Cognition-project-page/
摘要：While Multimodal Large Language Models (MLLMs) are adept at answering what is in an image (identifying objects and describing scenes), they often lack the ability to understand how an image feels to a human observer. This gap is most evident when considering subjective cognitive properties, such as what makes an image memorable, funny, aesthetically pleasing, or emotionally evocative. To systematically address this challenge, we introduce CogIP-Bench, a comprehensive benchmark for evaluating MLLMs on such image cognitive properties. Our evaluation reveals a significant gap: current models are poorly aligned with human perception of these nuanced properties. We then demonstrate that a post-training phase can effectively bridge this gap, significantly enhancing the model's alignment with human judgments. Furthermore, we show that this learned cognitive alignment is not merely predictive but also transferable to downstream creative tasks. By integrating our cognitively-aligned MLLM into an image generation pipeline, we can guide the synthesis process to produce images that better embody desired traits, such as being more memorable or visually appealing. Our work provides a benchmark to measure this human-like perception, a post-training pipeline to enhance it, and a demonstration that this alignment unlocks more human-centric AI.

【14】GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels
标题：GSpaRC：用于实时重建射频信道的高斯泼溅
链接：https://arxiv.org/abs/2511.22793

作者：Bhavya Sai Nukapotula,Rishabh Tripathi,Seth Pregler,Dileep Kalathil,Srinivas Shakkottai,Theodore S. Rappaport
摘要：Channel state information (CSI) is essential for adaptive beamforming and maintaining robust links in wireless communication systems. However, acquiring CSI incurs significant overhead, consuming up to 25% of spectrum resources in 5G networks due to frequent pilot transmissions at sub-millisecond intervals. Recent approaches aim to reduce this burden by reconstructing CSI from spatiotemporal RF measurements, such as signal strength and direction-of-arrival. While effective in offline settings, these methods often suffer from inference latencies in the 5-100 ms range, making them impractical for real-time systems. We present GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels, the first algorithm to break the 1 ms latency barrier while maintaining high accuracy. GSpaRC represents the RF environment using a compact set of 3D Gaussian primitives, each parameterized by a lightweight neural model augmented with physics-informed features such as distance-based attenuation. Unlike traditional vision-based splatting pipelines, GSpaRC is tailored for RF reception: it employs an equirectangular projection onto a hemispherical surface centered at the receiver to reflect omnidirectional antenna behavior. A custom CUDA pipeline enables fully parallelized directional sorting, splatting, and rendering across frequency and spatial dimensions. Evaluated on multiple RF datasets, GSpaRC achieves similar CSI reconstruction fidelity to recent state-of-the-art methods while reducing training and inference time by over an order of magnitude. By trading modest GPU computation for a substantial reduction in pilot overhead, GSpaRC enables scalable, low-latency channel estimation suitable for deployment in 5G and future wireless systems. The code is available at https://github.com/Nbhavyasai/GSpaRC-WirelessGaussianSplatting.git

【15】Test-time scaling of diffusions with flow maps
标题：基于流映射的扩散模型测试时扩展
链接：https://arxiv.org/abs/2511.22688

作者：Amirmojtaba Sabour,Michael S. Albergo,Carles Domingo-Enrich,Nicholas M. Boffi,Sanja Fidler,Karsten Kreis,Eric Vanden-Eijnden
摘要：A common recipe to improve diffusion models at test-time so that samples score highly against a user-specified reward is to introduce the gradient of the reward into the dynamics of the diffusion itself. This procedure is often ill posed, as user-specified rewards are usually only well defined on the data distribution at the end of generation. While common workarounds to this problem are to use a denoiser to estimate what a sample would have been at the end of generation, we propose a simple solution to this problem by working directly with a flow map. By exploiting a relationship between the flow map and velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods involving the gradient of the reward. The approach can be used to either perform exact sampling via importance weighting or principled search that identifies local maximizers of the reward-tilted distribution. We demonstrate the efficacy of our approach against other look-ahead techniques, and show how the flow map enables engagement with complicated reward functions that make possible new forms of image editing, e.g. by interfacing with vision language models.

【16】Modèles de Fondation et Ajustement : Vers une Nouvelle Génération de Modèles pour la Prévision des Séries Temporelles
标题：基础模型与微调：迈向新一代时间序列预测模型
链接：https://arxiv.org/abs/2511.22674

作者：Morad Laglil,Emilie Devijver,Eric Gaussier,Bertrand Pracca
备注：In French
摘要：Inspired by recent advances in large language models, foundation models have been developed for zero-shot time series forecasting, enabling prediction on datasets unseen during pretraining. These large-scale models, trained on vast collections of time series, learn generalizable representations for both point and probabilistic forecasting, reducing the need for task-specific architectures and manual tuning. In this work, we review the main architectures, pretraining strategies, and optimization methods used in such models, and study the effect of fine-tuning after pretraining to enhance their performance on specific datasets. Our empirical results show that fine-tuning generally improves zero-shot forecasting capabilities, especially for long-term horizons.

【17】GazeTrack: High-Precision Eye Tracking Based on Regularization and Spatial Computing
标题：GazeTrack：基于正则化和空间计算的高精度眼动追踪
链接：https://arxiv.org/abs/2511.22607

作者：Xiaoyin Yang
备注：10 pages, 7 figures
摘要：Eye tracking has become increasingly important in virtual and augmented reality applications; however, the current gaze accuracy falls short of meeting the requirements for spatial computing. We designed a gaze collection framework and utilized high-precision equipment to gather the first precise benchmark dataset, GazeTrack, encompassing diverse ethnicities, ages, and visual acuity conditions for pupil localization and gaze tracking. We propose a novel shape error regularization method to constrain pupil ellipse fitting and train on open-source datasets, enhancing semantic segmentation and pupil position prediction accuracy. Additionally, we invent a novel coordinate transformation method similar to paper unfolding to accurately predict gaze vectors on the GazeTrack dataset. Finally, we built a gaze vector generation model that achieves reduced gaze angle error with lower computational complexity compared to other methods.

【18】The Multiclass Score-Oriented Loss (MultiSOL) on the Simplex
标题：单纯形上的多类分数导向损失（MultiSOL）
链接：https://arxiv.org/abs/2511.22587

作者：Francesco Marchetti,Edoardo Legnaro,Sabrina Guastavino
摘要：In the supervised binary classification setting, score-oriented losses have been introduced with the aim of optimizing a chosen performance metric directly during the training phase, thus avoiding a posteriori threshold tuning. To do this, in their construction, the decision threshold is treated as a random variable provided with a certain a priori distribution. In this paper, we use a recently introduced multidimensional threshold-based classification framework to extend such score-oriented losses to multiclass classification, defining the Multiclass Score-Oriented Loss (MultiSOL) functions. As also demonstrated by several classification experiments, this proposed family of losses is designed to preserve the main advantages observed in the binary setting, such as the direct optimization of the target metric and the robustness to class imbalance, achieving performance comparable to other state-of-the-art loss functions and providing new insights into the interaction between simplex geometry and score-oriented learning.

【19】Entropy is all you need for Inter-Seed Cross-Play in Hanabi
标题：在Hanabi中实现种子间交叉博弈只需要熵
链接：https://arxiv.org/abs/2511.22581

作者：Johannes Forkel,Jakob Foerster
摘要：We find that in Hanabi, one of the most complex and popular benchmarks for zero-shot coordination and ad-hoc teamplay, a standard implementation of independent PPO with a slightly higher entropy coefficient (0.05 instead of the typically used 0.01) achieves a new state-of-the-art in cross-play between different seeds, beating by a significant margin all previous specialized algorithms that were specifically designed for this setting. We provide an intuition for why sufficiently high entropy regularization ensures that different random seeds produce joint policies which are mutually compatible. We also empirically find that a high $λ_{\text{GAE}}$ around 0.9, and using RNNs instead of just feed-forward layers in the actor-critic architecture, strongly increase inter-seed cross-play. While these results demonstrate the dramatic effect that hyperparameters can have not just on self-play scores but also on cross-play scores, we show that there are simple Dec-POMDPs in which standard policy gradient methods with increased entropy regularization are not able to achieve perfect inter-seed cross-play, thus demonstrating the continuing necessity for new algorithms for zero-shot coordination.
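The role of the entropy coefficient mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder policy loss of 1.0, and the two example action distributions are assumptions made only for illustration. It shows the usual form of an entropy bonus, where a larger coefficient rewards more spread-out action distributions more strongly.

```python
import math

def categorical_entropy(probs):
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_regularized_loss(policy_loss, probs, entropy_coef):
    """Policy loss to be minimized, with an entropy bonus subtracted."""
    return policy_loss - entropy_coef * categorical_entropy(probs)

# Hypothetical action distributions: one uniform, one nearly deterministic.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

# Compare the two coefficients quoted in the abstract (0.01 and 0.05).
for coef in (0.01, 0.05):
    print(coef,
          entropy_regularized_loss(1.0, uniform, coef),
          entropy_regularized_loss(1.0, peaked, coef))
```

With the larger coefficient, the gap in loss between the uniform and the peaked distribution widens, which is the pressure toward higher-entropy policies that the abstract refers to.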

【20】List-Decodable Regression via Expander Sketching
标题：基于扩展图草图的列表可解码回归
链接：https://arxiv.org/abs/2511.22524

作者：Herbod Pourali,Sajjad Hashemian,Ebrahim Ardeshir-Larijani
摘要：We introduce an expander-sketching framework for list-decodable linear regression that achieves sample complexity $\tilde{O}((d+\log(1/δ))/α)$, list size $O(1/α)$, and near input-sparsity running time $\tilde{O}(\mathrm{nnz}(X)+d^{3}/α)$ under standard sub-Gaussian assumptions. Our method uses lossless expanders to synthesize lightly contaminated batches, enabling robust aggregation and a short spectral filtering stage that matches the best known efficient guarantees while avoiding SoS machinery and explicit batch structure.

【21】Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
标题：通过混合精度增强可信度：基准、机遇和挑战
链接：https://arxiv.org/abs/2511.22483

作者：Guanxi Lu,Hao Mark Chen,Zhiqiang Que,Wayne Luk,Hongxiang Fan
备注：ASP-DAC 2026 Special Session
摘要：Large language models (LLMs) have shown promising performance across various tasks. However, their autoregressive decoding process poses significant challenges for efficient deployment on existing AI hardware. Quantization alleviates memory and compute pressure by compressing weights, activations, and KV caches to low precisions while preserving generation quality. However, existing quantization frameworks typically focus on perplexity or classification accuracy, often omitting critical trustworthiness metrics. This gap introduces risks when applying quantized LLMs to downstream high-stakes domains such as finance and healthcare. In this work, we systematically investigate the impact of quantization on four trustworthiness metrics (adversarial robustness, fairness, machine ethics, and out-of-distribution robustness) and identify the instability across compression ratios and quantization methods. Building on these observations, we develop a novel precision-ensemble voting approach that leverages predictions from mixed-precision variants of the same model and consistently improves performance by up to $5.8\%$ on trustworthiness metrics. Our results highlight the importance of considering trustworthiness when developing model compression techniques and point to research opportunities at the intersection of compression and trustworthiness for safety-critical applications.
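The voting idea in this abstract can be sketched as follows. The abstract does not specify the voting rule, so this sketch assumes plain majority voting with ties broken in favor of the earliest-listed variant; the variant names (`fp16`, `int8`, `int4`) and labels are hypothetical.

```python
from collections import Counter

def ensemble_vote(predictions):
    """Return the most common label; ties go to the earliest-listed variant."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical predictions from three precision variants of one model.
per_variant = {"fp16": "safe", "int8": "unsafe", "int4": "safe"}
print(ensemble_vote(list(per_variant.values())))
```

Listing the highest-precision variant first makes it the tie-breaker, which is one possible design choice rather than something stated in the abstract.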

【22】An Efficient Embedding Based Ad Retrieval with GPU-Powered Feature Interaction
标题：一种基于嵌入的高效广告检索，并采用基于图形处理器的特征交互
链接：https://arxiv.org/abs/2511.22460

作者：Yifan Lei,Jiahua Luo,Tingyu Jiang,Bo Zhang,Lifeng Wang,Dapeng Liu,Zhaoren Wu,Haijie Gu,Huan Yu,Jie Jiang
备注：9 pages, 4 figures
摘要：In large-scale advertising recommendation systems, retrieval serves as a critical component, aiming to efficiently select a subset of candidate ads relevant to user behaviors from a massive ad inventory for subsequent ranking and recommendation. The Embedding-Based Retrieval (EBR) methods modeled by the dual-tower network are widely used in the industry to maintain both retrieval efficiency and accuracy. However, the dual-tower model has significant limitations: the embeddings of users and ads interact only at the final inner product computation, resulting in insufficient feature interaction capabilities. Although DNN-based models with both user and ad as input features, allowing for early-stage interaction between these features, are introduced in the ranking stage to mitigate this issue, they are computationally infeasible for the retrieval stage. To bridge this gap, this paper proposes an efficient GPU-based feature interaction for the dual-tower network to significantly improve retrieval accuracy while substantially reducing computational costs. Specifically, we introduce a novel compressed inverted list designed for GPU acceleration, enabling efficient feature interaction computation at scale. To the best of our knowledge, this is the first framework in the industry to successfully implement Wide and Deep in a retrieval system. We apply this model to the real-world business scenarios in Tencent Advertising, and experimental results demonstrate that our method outperforms existing approaches in offline evaluation and has been successfully deployed to Tencent's advertising recommendation system, delivering significant online performance gains. This improvement not only validates the effectiveness of the proposed method, but also provides new practical guidance for optimizing large-scale ad retrieval systems.

【23】PISA: Prioritized Invariant Subgraph Aggregation
标题：PISA：优先级不变子图聚合
链接：https://arxiv.org/abs/2511.22435

作者：Ali Ghasemi,Farooq Ahmad Wani,Maria Sofia Bucarelli,Fabrizio Silvestri
摘要：Recent work has extended the invariance principle for out-of-distribution (OOD) generalization from Euclidean to graph data, where challenges arise due to complex structures and diverse distribution shifts in node attributes and topology. To handle these, Chen et al. proposed CIGA (Chen et al., 2022b), which uses causal modeling and an information-theoretic objective to extract a single invariant subgraph capturing causal features. However, this single-subgraph focus can miss multiple causal patterns. Liu et al. (2025) addressed this with SuGAr, which learns and aggregates diverse invariant subgraphs via a sampler and diversity regularizer, improving robustness but still relying on simple uniform or greedy aggregation. To overcome this, the proposed PISA framework introduces a dynamic MLP-based aggregation that prioritizes and combines subgraph representations more effectively. Experiments on 15 datasets, including DrugOOD (Ji et al., 2023), show that PISA achieves up to 5% higher classification accuracy than prior methods.

【24】MATCH: Engineering Transparent and Controllable Conversational XAI Systems through Composable Building Blocks
标题：MATCH：通过可组合构建模块设计透明且可控的对话式XAI系统
链接：https://arxiv.org/abs/2511.22420

作者：Sebe Vanbrabant,Gustavo Rovelo Ruiz,Davy Vanacken
备注：Submitted Version accepted for publication in an LNCS Volume "Engineering Interactive Computer Systems - EICS 2025 - International Workshops and Doctoral Consortium"
摘要：While the increased integration of AI technologies into interactive systems enables them to solve an increasing number of tasks, the black-box problem of AI models continues to spread throughout the interactive system as a whole. Explainable AI (XAI) techniques can make AI models more accessible by employing post-hoc methods or transitioning to inherently interpretable models. While this makes individual AI models clearer, the overarching system architecture remains opaque. This challenge not only pertains to standard XAI techniques but also to human examination and conversational XAI approaches that need access to model internals to interpret them correctly and completely. To this end, we propose conceptually representing such interactive systems as sequences of structural building blocks. These include the AI models themselves, as well as control mechanisms grounded in literature. The structural building blocks can then be explained through complementary explanatory building blocks, such as established XAI techniques like LIME and SHAP. The flow and APIs of the structural building blocks form an unambiguous overview of the underlying system, serving as a communication basis for both human and automated agents, thus aligning human and machine interpretability of the embedded AI models. In this paper, we present our flow-based approach and a selection of building blocks as MATCH: a framework for engineering Multi-Agent Transparent and Controllable Human-centered systems. This research contributes to the field of (conversational) XAI by facilitating the integration of interpretability into existing interactive systems.

【25】Test Time Training for AC Power Flow Surrogates via Physics and Operational Constraint Refinement
标题：通过物理与运行约束细化实现交流潮流代理模型的测试时训练
链接：https://arxiv.org/abs/2511.22343

作者：Panteleimon Dogoulis,Mohammad Iman Alizadeh,Sylvain Kubler,Maxime Cordy
摘要：Power Flow (PF) calculations based on machine learning (ML) techniques offer significant computational advantages over traditional numerical methods but often struggle to maintain full physical consistency. This paper introduces a physics-informed test-time training (PI-TTT) framework that enhances the accuracy and feasibility of ML-based PF surrogates by enforcing AC power flow equalities and operational constraints directly at inference time. The proposed method performs a lightweight self-supervised refinement of the surrogate outputs through a few gradient-based updates, enabling local adaptation to unseen operating conditions without requiring labeled data. Extensive experiments on the IEEE 14-, 118-, and 300-bus systems and the PEGASE 1354-bus network show that PI-TTT reduces power flow residuals and operational constraint violations by one to two orders of magnitude compared with purely ML-based models, while preserving their computational advantage. The results demonstrate that PI-TTT provides fast, accurate, and physically reliable predictions, representing a promising direction for scalable and physics-consistent learning in power system analysis.

【26】Unexplored flaws in multiple-choice VQA evaluations
标题：多项选择VQA评估中未探索的缺陷
链接：https://arxiv.org/abs/2511.22341

作者：Fabio Rosenthal,Sebastian Schmidt,Thorsten Graf,Thorsten Bagodonat,Stephan Günnemann,Leo Schwinn
摘要：Multimodal Large Language Models (MLLMs) demonstrate strong capabilities in handling image-text inputs. A common way to assess this ability is through multiple-choice Visual Question Answering (VQA). Earlier works have already revealed that these benchmarks are sensitive to answer choice order, a limitation that can be mitigated through careful design. Yet, we highlight additional, unexplored biases in prompt formatting that question the reliability of current MLLM evaluations. Specifically, we identify three key variation factors in prompt formatting and analyze their impact through a large-scale study involving seven MLLMs and five VQA datasets, spanning 48 distinct prompt format variations. Our findings reveal that multiple-choice VQA is highly sensitive to minor prompt format changes, even when these changes are semantically neutral. We further demonstrate that these biases persist independently of known order biases or the MLLM's confidence in the correct answer. Finally, we demonstrate that existing bias mitigation strategies fail to address these newly identified biases.

【27】Online Dynamic Pricing of Complementary Products
标题：互补产品的在线动态定价
链接：https://arxiv.org/abs/2511.22291

作者：Marco Mussi,Marcello Restelli
摘要：Traditional pricing paradigms, once dominated by static models and rule-based heuristics, are increasingly being replaced by dynamic, data-driven approaches powered by machine learning algorithms. Despite their growing sophistication, most dynamic pricing algorithms focus on optimizing the price of each product independently, disregarding potential interactions among items. By neglecting these interdependencies in consumer demand across related goods, sellers may fail to capture the full potential of coordinated pricing strategies. In this paper, we address this problem by exploring dynamic pricing mechanisms designed explicitly for complementary products, aiming to exploit their joint demand structure to maximize overall revenue. We present an online learning algorithm considering both positive and negative interactions between products' demands. The algorithm utilizes transaction data to identify advantageous complementary relationships through an integer programming problem between different items, and then optimizes pricing strategies using data-driven and computationally efficient multi-armed bandit solutions based on heteroscedastic Gaussian processes. We validate our solution in a simulated environment, and we demonstrate that our solution improves the revenue w.r.t. a comparable learning algorithm ignoring such interactions.

【28】The Hidden Cost of Approximation in Online Mirror Descent
标题：在线镜像下降中逼近的隐性成本
链接：https://arxiv.org/abs/2511.22283

作者：Ofir Schlisselberg,Uri Sherman,Tomer Koren,Yishay Mansour
摘要：Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error-free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study into inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness, but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.

【29】Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage
标题：通过REINFORCE和James-Stein收缩设计实例级采样计划
链接：https://arxiv.org/abs/2511.22177

作者：Peiyu Yu,Suraj Kothawade,Sirui Xie,Ying Nian Wu,Hongliang Fei
备注：23 pages
摘要：Most post-training methods for text-to-image samplers focus on model weights: either fine-tuning the backbone for alignment or distilling it for few-step efficiency. We take a different route: rescheduling the sampling timeline of a frozen sampler. Instead of a fixed, global schedule, we learn instance-level (prompt- and noise-conditioned) schedules through a single-pass Dirichlet policy. To ensure accurate gradient estimates in high-dimensional policy learning, we introduce a novel reward baseline based on a principled James-Stein estimator; it provably achieves lower estimation errors than commonly used variants and leads to superior performance. Our rescheduled samplers consistently improve text-image alignment including text rendering and compositional control across modern Stable Diffusion and Flux model families. Additionally, a 5-step Flux-Dev sampler with our schedules can attain generation quality comparable to deliberately distilled samplers like Flux-Schnell. We thus position our scheduling framework as an emerging model-agnostic post-training lever that unlocks additional generative potential in pretrained samplers.

【30】From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures
标题：从拓扑到检索：用统一签名解码嵌入空间
链接：https://arxiv.org/abs/2511.22150

作者：Florian Rottach,William Rudman,Bastain Rieck,Harrisen Scells,Carsten Eickhoff
摘要：Studying how embeddings are organized in space not only enhances model interpretability but also uncovers factors that drive downstream task performance. In this paper, we present a comprehensive analysis of topological and geometric measures across a wide set of text embedding models and datasets. We find a high degree of redundancy among these measures and observe that individual metrics often fail to sufficiently differentiate embedding spaces. Building on these insights, we introduce Unified Topological Signatures (UTS), a holistic framework for characterizing embedding spaces. We show that UTS can predict model-specific properties and reveal similarities driven by model architecture. Further, we demonstrate the utility of our method by linking topological structure to ranking effectiveness and accurately predicting document retrievability. We find that a holistic, multi-attribute perspective is essential to understanding and leveraging the geometry of text embeddings.

【31】A Variational Manifold Embedding Framework for Nonlinear Dimensionality Reduction
标题：用于非线性降维的变分流形嵌入框架
链接：https://arxiv.org/abs/2511.22128

作者：John J. Vastola,Samuel J. Gershman,Kanaka Rajan
备注：Accepted to the NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations (NeurReps)
摘要：Dimensionality reduction algorithms like principal component analysis (PCA) are workhorses of machine learning and neuroscience, but each has well-known limitations. Variants of PCA are simple and interpretable, but not flexible enough to capture nonlinear data manifold structure. More flexible approaches have other problems: autoencoders are generally difficult to interpret, and graph-embedding-based methods can produce pathological distortions in manifold geometry. Motivated by these shortcomings, we propose a variational framework that casts dimensionality reduction algorithms as solutions to an optimal manifold embedding problem. By construction, this framework permits nonlinear embeddings, allowing its solutions to be more flexible than PCA. Moreover, the variational nature of the framework has useful consequences for interpretability: each solution satisfies a set of partial differential equations, and can be shown to reflect symmetries of the embedding objective. We discuss these features in detail and show that solutions can be analytically characterized in some cases. Interestingly, one special case exactly recovers PCA.

【32】Toward Data-Driven Surrogates of the Solar Wind with Spherical Fourier Neural Operator
标题：使用球面傅里叶神经算子构建数据驱动的太阳风代理模型
链接：https://arxiv.org/abs/2511.22112

作者：Reza Mansouri,Dustin Kempton,Pete Riley,Rafal Angryk
备注：International Conference on Machine Learning and Applications (ICMLA 2025)
摘要：The solar wind, a continuous stream of charged particles from the Sun's corona, shapes the heliosphere and impacts space systems near Earth. Variations such as high-speed streams and coronal mass ejections can disrupt satellites, power grids, and communications, making accurate modeling essential for space weather forecasting. While 3D magnetohydrodynamic (MHD) models are used to simulate and investigate these variations in the solar wind, they tend to be computationally expensive, limiting their usefulness in investigating the impacts of boundary condition uncertainty. In this work, we develop a surrogate for steady state solar wind modeling, using a Spherical Fourier Neural Operator (SFNO). We compare our model to a previously developed numerical surrogate for this task called HUX, and we show that the SFNO achieves comparable or better performance across several metrics. Though HUX retains advantages in physical smoothness, this underscores the need for improved evaluation criteria rather than a flaw in SFNO. As a flexible and trainable approach, SFNO enables efficient real-time forecasting and can improve with more data. The source code and more visual results are available at https://github.com/rezmansouri/solarwind-sfno-velocity.

【33】Representative Action Selection for Large Action Space: From Bandits to MDPs
标题：大动作空间的代表性动作选择：从多臂老虎机到MDP
链接：https://arxiv.org/abs/2511.22104

作者：Quan Zhou,Shie Mannor
备注：Journal version of arXiv:2505.18269
摘要：We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundamental challenge in applications like inventory management and recommendation systems, where direct learning over the entire space is intractable. Our goal is to identify a fixed subset of actions that, for every environment in the family, contains a near-optimal action, thereby enabling efficient learning without exhaustively evaluating all actions. This work extends our prior results for meta-bandits to the more general setting of Markov Decision Processes (MDPs). We prove that our existing algorithm achieves performance comparable to using the full action space. This theoretical guarantee is established under a relaxed, non-centered sub-Gaussian process model, which accommodates greater environmental heterogeneity. Consequently, our approach provides a computationally and sample-efficient solution for large-scale combinatorial decision-making under uncertainty.

【34】Equilibrium Propagation Without Limits
标题：无需极限的均衡传播
链接：https://arxiv.org/abs/2511.22024

作者：Elon Litman
摘要：We liberate Equilibrium Propagation (EP) from the limit of infinitesimal perturbations by establishing a finite-nudge foundation for local credit assignment. By modeling network states as Gibbs-Boltzmann distributions rather than deterministic points, we prove that the gradient of the difference in Helmholtz free energy between a nudged and free phase is exactly the difference in expected local energy derivatives. This validates the classic Contrastive Hebbian Learning update as an exact gradient estimator for arbitrary finite nudging, requiring neither infinitesimal approximations nor convexity. Furthermore, we derive a generalized EP algorithm based on the path integral of loss-energy covariances, enabling learning with strong error signals that standard infinitesimal approximations cannot support.

【35】A Safety and Security Framework for Real-World Agentic Systems
标题：面向现实世界智能体系统的安全与安保框架
链接：https://arxiv.org/abs/2511.21990

作者：Shaona Ghosh,Barnaby Simkin,Kyriacos Shiarlis,Soumili Nandi,Dan Zhao,Matthew Fiedler,Julia Bazinska,Nikki Pope,Roopa Prabhu,Daniel Rohrer,Michael Demoret,Bartley Richardson
摘要：This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identifying novel agentic risks through the lens of user safety. Although safety and security are clearly separated for traditional LLMs and agentic models in isolation, they appear to be connected when viewed through the lens of safety in agentic systems. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification, among others. At the core of our approach is a dynamic agentic safety and security framework that operationalizes contextual agentic risk management by using auxiliary AI models and agents, with human oversight, to assist in contextual risk discovery, evaluation, and mitigation. We further address one of the most challenging aspects of safety and security of agentic systems: risk discovery through sandboxed, AI-driven red teaming. We demonstrate the framework's effectiveness through a detailed case study of NVIDIA's flagship agentic research assistant, AI-Q Research Assistant, showcasing practical, end-to-end safety and security evaluations in complex, enterprise-grade agentic workflows. This risk discovery phase finds novel agentic risks that are then contextually mitigated. We also release the dataset from our case study, containing traces of over 10,000 realistic attack and defense executions of the agentic workflow to help advance research in agentic safety.

【36】Breaking Algorithmic Collusion in Human-AI Ecosystems
标题：打破人机生态系统中的算法合谋
链接：https://arxiv.org/abs/2511.21935

作者：Natalie Collina,Eshwar Ram Arunachaleswaran,Meena Jagadeesan
摘要：AI agents are increasingly deployed in ecosystems where they repeatedly interact not only with each other but also with humans. In this work, we study these human-AI ecosystems from a theoretical perspective, focusing on the classical framework of repeated pricing games. In our stylized model, the AI agents play equilibrium strategies, and one or more humans manually perform the pricing task instead of adopting an AI agent, thereby defecting to a no-regret strategy. Motivated by how populations of AI agents can sustain supracompetitive prices, we investigate whether high prices persist under such defections. Our main finding is that even a single human defection can destabilize collusion and drive down prices, and multiple defections push prices even closer to competitive levels. We further show how the nature of collusion changes under defection-aware AI agents. Taken together, our results characterize when algorithmic collusion is fragile--and when it persists--in mixed ecosystems of AI agents and humans.

【37】Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
标题：从信息瓶颈看后门训练的动态性
链接：https://arxiv.org/abs/2511.21923

作者：Xinyu Liu,Xu Zhang,Can Chen,Ren Wang
摘要：Understanding how backdoor data influences neural network training dynamics remains a complex and underexplored challenge. In this paper, we present a rigorous analysis of the impact of backdoor data on the learning process, with a particular focus on the distinct behaviors between the target class and other clean classes. Leveraging the Information Bottleneck (IB) principle connected with clustering of internal representation, we find that backdoor attacks create unique mutual information (MI) signatures, which evolve across training phases and differ based on the attack mechanism. Our analysis uncovers a surprising trade-off: visually conspicuous attacks like BadNets can achieve high stealthiness from an information-theoretic perspective, integrating more seamlessly into the model than many visually imperceptible attacks. Building on these insights, we propose a novel, dynamics-based stealthiness metric that quantifies an attack's integration at the model level. We validate our findings and the proposed metric across multiple datasets and diverse attack types, offering a new dimension for understanding and evaluating backdoor threats. Our code is available at: https://github.com/XinyuLiu71/Information_Bottleneck_Backdoor.git.

【38】Exploring Fusion Strategies for Multimodal Vision-Language Systems
标题：探索多模态视觉语言系统的融合策略
链接：https://arxiv.org/abs/2511.21889

作者：Regan Willis,Jason Bakos
摘要：Modern machine learning models often combine multiple input streams of data to more accurately capture the information that informs their decisions. In multimodal machine learning, choosing the strategy for fusing data together requires careful consideration of the application's accuracy and latency requirements, as fusing the data at earlier or later stages in the model architecture can lead to performance changes in accuracy and latency. To demonstrate this tradeoff, we investigate different fusion strategies using a hybrid BERT and vision network framework that integrates image and text data. We explore two different vision networks: MobileNetV2 and ViT. We propose three models for each vision network, which fuse data at late, intermediate, and early stages in the architecture. We evaluate the proposed models on the CMU MOSI dataset and benchmark their latency on an NVIDIA Jetson Orin AGX. Our experimental results demonstrate that while late fusion yields the highest accuracy, early fusion offers the lowest inference latency. We describe the three proposed model architectures and discuss the accuracy and latency tradeoffs, concluding that data fusion earlier in the model architecture results in faster inference times at the cost of accuracy.

【39】Differential privacy from axioms
标题：从公理出发的差分隐私
链接：https://arxiv.org/abs/2511.21876

作者：Guy Blanc,William Pires,Toniann Pitassi
摘要：Differential privacy (DP) is the de facto notion of privacy both in theory and in practice. However, despite its popularity, DP imposes strict requirements which guard against strong worst-case scenarios. For example, it guards against seemingly unrealistic scenarios where an attacker has full information about all but one point in the data set, and still nothing can be learned about the remaining point. While preventing such a strong attack is desirable, many works have explored whether average-case relaxations of DP are easier to satisfy [HWR13,WLF16,BF16,LWX23]. In this work, we are motivated by the question of whether alternate, weaker notions of privacy are possible: can a weakened privacy notion still guarantee some basic level of privacy, and on the other hand, achieve privacy more efficiently and/or for a substantially broader set of tasks? Our main result shows the answer is no: even in the statistical setting, any reasonable measure of privacy satisfying nontrivial composition is equivalent to DP. To prove this, we identify a core set of four axioms or desiderata: pre-processing invariance, prohibition of blatant non-privacy, strong composition, and linear scalability. Our main theorem shows that any privacy measure satisfying our axioms is equivalent to DP, up to polynomial factors in sample complexity. We complement this result by showing our axioms are minimal: removing any one of our axioms enables ill-behaved measures of privacy.

【40】Saddle-Free Guidance: Improved On-Manifold Sampling without Labels or Additional Training
标题：无鞍引导：无需标签或额外训练即可改进在流形上的采样
链接：https://arxiv.org/abs/2511.21863

作者：Eric Yeats,Darryl Hannan,Wilson Fearn,Timothy Doster,Henry Kvinge,Scott Mahan
摘要：Score-based generative models require guidance in order to generate plausible, on-manifold samples. The most popular guidance method, Classifier-Free Guidance (CFG), is only applicable in settings with labeled data and requires training an additional unconditional score-based model. More recently, Auto-Guidance adopts a smaller, less capable version of the original model to guide generation. While each method effectively promotes the fidelity of generated data, each requires labeled data or the training of additional models, making it challenging to guide score-based models when (labeled) training data are not available or training new models is not feasible. We make the surprising discovery that the positive curvature of log density estimates in saddle regions provides strong guidance for score-based models. Motivated by this, we develop saddle-free guidance (SFG) which maintains estimates of maximal positive curvature of the log density to guide individual score-based models. SFG has the same computational cost as classifier-free guidance, does not require additional training, and works with off-the-shelf diffusion and flow matching models. Our experiments indicate that SFG achieves state-of-the-art FID and FD-DINOv2 metrics in single-model unconditional ImageNet-512 generation. When SFG is combined with Auto-Guidance, its unconditional samples achieve general state-of-the-art in FD-DINOv2 score. Our experiments with FLUX.1-dev and Stable Diffusion v3.5 indicate that SFG boosts the diversity of output images compared to CFG while maintaining excellent prompt adherence and image fidelity.

【41】Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
Title: Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
Link: https://arxiv.org/abs/2511.21804

Authors: Gauri Pradhan,Joonas Jälkö,Santiago Zanella-Bèguelin,Antti Honkela
Comments: 17 pages, 11 figures
Abstract: Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. It can be interpreted as a bound on an adversary's capability to distinguish two adjacent datasets according to a chosen adjacency relation. In practice, most DP implementations use the add/remove adjacency relation, where two datasets are adjacent if one can be obtained from the other by adding or removing a single record, thereby protecting membership. In many ML applications, however, the goal is to protect attributes of individual records (e.g., labels used in supervised fine-tuning). We show that privacy accounting under add/remove overstates attribute privacy compared to accounting under the substitute adjacency relation, which permits substituting one record. To demonstrate this gap, we develop novel attacks to audit DP under substitute adjacency, and show empirically that audit results are inconsistent with DP guarantees reported under add/remove, yet remain consistent with the budget accounted under the substitute adjacency relation. Our results highlight that the choice of adjacency when reporting DP guarantees is critical when the protection target is per-record attributes rather than membership.
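
Illustration (not from the paper): a minimal Python sketch of why the two adjacency relations differ for a clipped-sum query, the usual building block of DP training. It relies on the standard fact that removing one record moves a sum of norm-clipped vectors by at most the clipping norm, while substituting one record can move it by up to twice that. The helper names, the clipping norm `C`, and the toy records are all assumptions made for this example; the paper's attacks and accounting are not reproduced here.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(v, c):
    """Scale v so that its L2 norm is at most c."""
    n = l2_norm(v)
    return list(v) if n <= c else [x * c / n for x in v]

def clipped_sum(records, c):
    """Sum of per-record vectors after clipping each one to norm c."""
    total = [0.0] * len(records[0])
    for r in records:
        for i, x in enumerate(clip(r, c)):
            total[i] += x
    return total

def distance(a, b):
    return l2_norm([x - y for x, y in zip(a, b)])

C = 1.0  # hypothetical clipping norm
base = [[0.3, 0.4], [1.0, 0.0]]

# Add/remove neighbour: the last record is dropped.
removed = base[:-1]
# Substitute neighbour: the last record is replaced by its opposite.
substituted = base[:-1] + [[-1.0, 0.0]]

gap_add_remove = distance(clipped_sum(base, C), clipped_sum(removed, C))
gap_substitute = distance(clipped_sum(base, C), clipped_sum(substituted, C))
print(gap_add_remove, gap_substitute)
```

In this toy case the substitute neighbour moves the query twice as far as the add/remove neighbour, so noise calibrated to the add/remove sensitivity gives a weaker guarantee against an adversary who changes a record's attributes.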

【42】A Multiscale Geometric Method for Capturing Relational Topic Alignment
Title: A Multiscale Geometric Method for Capturing Relational Topic Alignment
Link: https://arxiv.org/abs/2511.21741

Authors: Conrad D. Hougen,Karl T. Pazdernik,Alfred O. Hero
Comments: 5 pages, 3 figures, 2025 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing
Abstract: Interpretable topic modeling is essential for tracking how research interests evolve within co-author communities. In scientific corpora, where novelty is prized, identifying underrepresented niche topics is particularly important. However, contemporary models built from dense transformer embeddings tend to miss rare topics and therefore also fail to capture smooth temporal alignment. We propose a geometric method that integrates multimodal text and co-author network data, using Hellinger distances and Ward's linkage to construct a hierarchical topic dendrogram. This approach captures both local and global structure, supporting multiscale learning across semantic and temporal dimensions. Our method effectively identifies rare-topic structure and visualizes smooth topic drift over time. Experiments highlight the strength of interpretable bag-of-words models when paired with principled geometric alignment.
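
Illustration (not from the paper): a small Python sketch of the Hellinger distance between topic-word distributions, the quantity the abstract feeds into Ward's linkage. The three topics and their probabilities are hypothetical. Because the Hellinger distance equals the Euclidean distance between square-rooted probability vectors divided by the square root of 2, one plausible way to combine it with Ward's linkage is to cluster the square-rooted vectors; whether the authors do exactly this is an assumption.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)

# Hypothetical topic-word distributions over a 4-word vocabulary.
topics = {
    "t1": [0.70, 0.20, 0.05, 0.05],
    "t2": [0.65, 0.25, 0.05, 0.05],
    "t3": [0.05, 0.05, 0.20, 0.70],
}

names = sorted(topics)
pairs = {
    (a, b): hellinger(topics[a], topics[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# For singleton clusters, Ward's merge cost is proportional to the squared
# Euclidean distance, so the first merge is simply the closest pair.
closest = min(pairs, key=pairs.get)
print(closest, pairs[closest])
```

Repeating the merge step with Ward's cost update would yield the full dendrogram; only the first merge is shown to keep the sketch short.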

【43】A Benchmark for Procedural Memory Retrieval in Language Agents
Title: A Benchmark for Procedural Memory Retrieval in Language Agents
Link: https://arxiv.org/abs/2511.21730

Authors: Ishant Kohar,Aswanth Krishnan
Abstract: Current AI agents excel in familiar settings, but fail sharply when faced with novel tasks with unseen vocabularies -- a core limitation of procedural memory systems. We present the first benchmark that isolates procedural memory retrieval from task execution, evaluating whether agents can recognize functionally equivalent procedures that span different object instantiations. Using ALFWorld, we construct dual corpora of expert and LLM-generated trajectories and evaluate six retrieval methods using systematically stratified queries. Our results expose a clear generalization cliff: embedding-based methods perform strongly on familiar contexts, yet degrade considerably on novel ones, while LLM-generated procedural abstractions demonstrate reliable cross-context transfer. Controlled ablations show that although embeddings capture some lexical-level abstraction, they fundamentally treat procedures as unordered bags of words, discarding temporal structure necessary for cross-context transfer. Corpus scale delivers far larger gains than representation enrichment, revealing an architectural ceiling in current encoders. Our benchmark offers the first diagnostic framework separating genuine procedural understanding from surface-level memorization and gives tools for developing retrieval systems capable of dependable generalization. Resources available at our GitHub repository (https://github.com/qpiai/Proced_mem_bench).
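
Illustration (not from the paper): a short Python sketch of what "unordered bag of words" means for retrieval. A literal bag-of-words cosine similarity cannot tell a procedure from a reordering of its steps. The two step sequences are invented for this example, and a literal word-count model is an extreme case; the abstract's claim about learned embeddings rests on the authors' own ablations.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between word-count vectors of two strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# Same words, different step order (hypothetical procedures).
plan = "open fridge take apple close fridge"
scrambled = "close fridge take apple open fridge"

same_bag = Counter(plan.split()) == Counter(scrambled.split())
similarity = bow_cosine(plan, scrambled)
print(same_bag, similarity)
```

The two strings describe different action orders, yet their word counts coincide, so any retriever that only sees counts scores them as identical.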

【44】Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue
Title: Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue
Link: https://arxiv.org/abs/2511.21728

Authors: Lin Yu,Xiaofei Han,Yifei Kang,Chiung-Yi Tseng,Danyang Zhang,Ziqian Bi,Zhimo Han
Abstract: Recent advances in large language models (LLMs) have enabled fluent dialogue systems, but most remain reactive and struggle in emotionally rich, goal-oriented settings such as marketing conversations. To address this limitation, we propose AffectMind, a multimodal affective dialogue agent that performs proactive reasoning and dynamic knowledge grounding to sustain emotionally aligned and persuasive interactions. AffectMind combines three components: a Proactive Knowledge Grounding Network (PKGN) that continuously updates factual and affective context from text, vision, and prosody; an Emotion-Intent Alignment Model (EIAM) that jointly models user emotion and purchase intent to adapt persuasion strategies; and a Reinforced Discourse Loop (RDL) that optimizes emotional coherence and engagement via reinforcement signals from user responses. Experiments on two newly curated marketing dialogue datasets, MM-ConvMarket and AffectPromo, show that AffectMind outperforms strong LLM-based baselines in emotional consistency (+26%), persuasive success rate (+19%), and long-term user engagement (+23%), highlighting emotion-grounded proactivity as a key capability for commercial multimodal agents.

【45】Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Title: Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Link: https://arxiv.org/abs/2511.21726

Authors: Yicong Zheng,Kevin L. McKee,Thomas Miconi,Zacharie Bugaud,Mick van Gelderen,Jed McCaleb
Abstract: How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory frameworks and benchmarks focus on finding the optimal memory compression algorithm for higher performance in tasks that require recollection and sometimes further reasoning. However, such efforts have ended up building more human bias into the compression algorithm, through the search for the best prompts and memory architectures that suit specific benchmarks, rather than finding a general solution that would work on other data distributions. On the other hand, goal-directed search on uncompressed information could potentially exhibit superior performance because compression is lossy, and a predefined compression algorithm will not fit all raw data distributions. Here we present SUMER (Search in Uncompressed Memory via Experience Replay), an end-to-end reinforcement learning agent with verifiable reward (RLVR) that learns to use search tools to gather information and answer a target question. On the LoCoMo dataset for long-context conversation understanding, SUMER with Qwen2.5-7B-Instruct learned to use search tools and outperformed all other biased memory compression approaches and also the full-context baseline, reaching SOTA performance (43% gain over the prior best). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks, arguing for new paradigms and benchmarks that are more dynamic and autonomously scalable. Code for SUMER and all implemented baselines is publicly available at https://github.com/zycyc/SUMER.

【46】Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests
Title: Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests
Link: https://arxiv.org/abs/2511.23212

Authors: Tomoshige Nakamura,Hiroshi Shiraishi
Abstract: Quantile Regression Forests (QRF) are widely used for non-parametric conditional quantile estimation, yet statistical inference for variable importance measures remains challenging due to the non-smoothness of the loss function and the complex bias-variance trade-off. In this paper, we develop an asymptotic theory for variable importance defined as the difference in pinball loss risks. We first establish the asymptotic normality of the QRF estimator by handling the non-differentiable pinball loss via Knight's identity. Second, we uncover a "phase transition" phenomenon governed by the subsampling rate $\beta$ (where $s \asymp n^\beta$). We prove that in the bias-dominated regime ($\beta \ge 1/2$), which corresponds to large subsample sizes typically favored in practice to maximize predictive accuracy, standard inference breaks down as the estimator converges to a deterministic bias constant rather than a zero-mean normal distribution. Finally, we derive the explicit analytic form of this asymptotic bias and discuss the theoretical feasibility of restoring valid inference via analytic bias correction. Our results highlight a fundamental trade-off between predictive performance and inferential validity, providing a theoretical foundation for understanding the intrinsic limitations of random forest inference in high-dimensional settings.
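
Illustration (not from the paper): a minimal Python sketch of the pinball loss and of an empirical variable-importance score formed as a difference of pinball risks. The sign convention (reduced-model risk minus full-model risk), the function names, and the toy predictions are assumptions; the abstract only says importance is "the difference in pinball loss risks".

```python
def pinball_loss(y, q, tau):
    """Pinball (check) loss of predicting quantile q for outcome y at level tau."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))

def pinball_risk(ys, qs, tau):
    """Average pinball loss over a sample."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, qs)) / len(ys)

def importance(ys, q_full, q_reduced, tau):
    """Assumed convention: risk without the variable minus risk with it."""
    return pinball_risk(ys, q_reduced, tau) - pinball_risk(ys, q_full, tau)

# Toy data: the full model predicts perfectly, while the reduced model
# (variable removed) falls back to a constant prediction.
ys = [1.0, 2.0, 3.0]
q_full = [1.0, 2.0, 3.0]
q_reduced = [2.0, 2.0, 2.0]

score = importance(ys, q_full, q_reduced, tau=0.5)
print(score)
```

The kink of `pinball_loss` at `u = 0` is the non-differentiability that the abstract says is handled through Knight's identity.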

【47】A PLS-Integrated LASSO Method with Application in Index Tracking
Title: A PLS-Integrated LASSO Method with Application in Index Tracking
Link: https://arxiv.org/abs/2511.23205

Authors: Shiqin Tang,Yining Dong,S. Joe Qin
Abstract: In traditional multivariate data analysis, dimension reduction and regression have been treated as distinct endeavors. Established techniques such as principal component regression (PCR) and partial least squares (PLS) regression traditionally compute latent components as intermediary steps -- although with different underlying criteria -- before proceeding with the regression analysis. In this paper, we introduce an innovative regression methodology named PLS-integrated Lasso (PLS-Lasso) that integrates the concept of dimension reduction directly into the regression process. We present two distinct formulations for PLS-Lasso, denoted as PLS-Lasso-v1 and PLS-Lasso-v2, along with clear and effective algorithms that ensure convergence to global optima. PLS-Lasso-v1 and PLS-Lasso-v2 are compared with Lasso on the task of financial index tracking and show promising results.

【48】UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity
Title: UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity
Link: https://arxiv.org/abs/2511.22273

Authors: Zaile Li,Weiwei Fan,L. Jeff Hong
Abstract: Selecting the best alternative from a finite set represents a broad class of pure exploration problems. Traditional approaches to pure exploration have predominantly relied on Gaussian or sub-Gaussian assumptions on the performance distributions of all alternatives, which limit their applicability to non-sub-Gaussian, especially heavy-tailed, problems. The need to move beyond sub-Gaussianity may become even more critical in large-scale problems, which tend to be especially sensitive to distributional specifications. In this paper, motivated by the widespread use of upper confidence bound (UCB) algorithms in pure exploration and beyond, we investigate their performance in large-scale, non-sub-Gaussian settings. We consider the simplest category of UCB algorithms, where the UCB value for each alternative is defined as the sample mean plus an exploration bonus that depends only on its own sample size. We abstract this into a meta-UCB algorithm and propose letting it select the alternative with the largest sample size as the best upon stopping. For this meta-UCB algorithm, we first derive a distribution-free lower bound on the probability of correct selection. Building on this bound, we analyze two general non-sub-Gaussian scenarios: (1) all alternatives follow a common location-scale structure and have bounded variance; and (2) when such a structure does not hold, each alternative has a bounded absolute moment of order $q > 3$. In both settings, we show that the meta-UCB algorithm, and therefore a broad class of UCB algorithms, can achieve sample optimality. These results demonstrate the applicability of UCB algorithms for solving large-scale pure exploration problems with non-sub-Gaussian distributions. Numerical experiments support our results and provide additional insights into the comparative behaviors of UCB algorithms within and beyond our meta-UCB framework.
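
Illustration (not from the paper): a Python sketch of the meta-UCB loop as the abstract describes it, where each alternative's index is its sample mean plus a bonus that depends only on its own sample size, and the alternative with the largest sample size is selected at stopping. The fixed sampling budget, the bonus `sqrt(2 / n)`, and the Pareto noise model are assumptions chosen for this example; the paper's bonus and stopping rule may differ.

```python
import math
import random

def meta_ucb(samplers, budget, bonus, seed=0):
    """Run a simple UCB loop; return (selected index, per-alternative sample counts)."""
    rng = random.Random(seed)
    k = len(samplers)
    counts = [1] * k
    sums = [draw(rng) for draw in samplers]  # one initial sample per alternative
    for _ in range(budget - k):
        # Index = sample mean + bonus that depends only on the alternative's own count.
        ucb = [sums[i] / counts[i] + bonus(counts[i]) for i in range(k)]
        i = max(range(k), key=ucb.__getitem__)
        sums[i] += samplers[i](rng)
        counts[i] += 1
    # Upon stopping, select the alternative that was sampled most often.
    best = max(range(k), key=counts.__getitem__)
    return best, counts

def heavy_tailed(mean):
    # Pareto(alpha=4) noise, centred using its mean alpha / (alpha - 1) = 4/3.
    # Its absolute moments are finite only for orders below 4.
    return lambda rng: mean + rng.paretovariate(4.0) - 4.0 / 3.0

def sqrt_bonus(n):
    # Hypothetical exploration bonus; not taken from the paper.
    return math.sqrt(2.0 / n)

best, counts = meta_ucb(
    [heavy_tailed(m) for m in (0.0, 0.2, 1.0)], budget=600, bonus=sqrt_bonus
)
print(best, counts)
```

Selecting by sample count rather than by sample mean makes the final choice less sensitive to a single extreme observation, which matters when the noise is heavy-tailed.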

【49】On the Effect of Regularization on Nonparametric Mean-Variance Regression
Title: On the Effect of Regularization on Nonparametric Mean-Variance Regression
Link: https://arxiv.org/abs/2511.22004

Authors: Eliot Wong-Toi,Alex Boyd,Vincent Fortuin,Stephan Mandt
Abstract: Uncertainty quantification is vital for decision-making and risk assessment in machine learning. Mean-variance regression models, which predict both a mean and residual noise for each data point, provide a simple approach to uncertainty quantification. However, overparameterized mean-variance models struggle with signal-to-noise ambiguity, deciding whether prediction targets should be attributed to signal (mean) or noise (variance). At one extreme, models fit all training targets perfectly with zero residual noise, while at the other, they provide constant, uninformative predictions and explain the targets as noise. We observe a sharp phase transition between these extremes, driven by model regularization. Empirical studies with varying regularization levels illustrate this transition, revealing substantial variability across repeated runs. To explain this behavior, we develop a statistical field theory framework, which captures the observed phase transition in alignment with experimental results. This analysis reduces the regularization hyperparameter search space from two dimensions to one, significantly lowering computational costs. Experiments on UCI datasets and the large-scale ClimSim dataset demonstrate robust calibration performance, effectively quantifying predictive uncertainty.
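
Illustration (not from the paper): a Python sketch of the Gaussian negative log-likelihood commonly used to train mean-variance models, with separate L2 penalties on the mean and variance parameters. It compares the two extremes named in the abstract on two toy targets. The penalty form, the names `alpha` and `beta`, and all numbers are assumptions; the abstract does not specify how regularization enters the objective.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under a normal with the given mean and log-variance."""
    return 0.5 * (log_var + (y - mean) ** 2 / math.exp(log_var) + math.log(2 * math.pi))

def objective(ys, means, log_vars, mean_params, var_params, alpha, beta):
    """Average NLL plus separate L2 penalties on the two parameter groups."""
    nll = sum(gaussian_nll(y, m, lv) for y, m, lv in zip(ys, means, log_vars)) / len(ys)
    penalty = alpha * sum(w * w for w in mean_params) + beta * sum(w * w for w in var_params)
    return nll + penalty

ys = [0.0, 2.0]

# Extreme A: the mean interpolates the targets and the predicted variance is tiny.
nll_fit = objective(ys, [0.0, 2.0], [-4.0, -4.0], [], [], alpha=0.0, beta=0.0)
# Extreme B: a constant mean, with the targets explained as unit-variance noise.
nll_noise = objective(ys, [1.0, 1.0], [0.0, 0.0], [], [], alpha=0.0, beta=0.0)
print(nll_fit, nll_noise)
```

Without any penalty the interpolating fit always has the lower loss, so which extreme the model lands in is decided by the two regularization strengths; that is the two-dimensional search space the abstract refers to.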

【50】Invited to Develop: Institutional Belonging and the Counterfactual Architecture of Development
Title: Invited to Develop: Institutional Belonging and the Counterfactual Architecture of Development
Link: https://arxiv.org/abs/2511.21865

Authors: Diego Vallarino
Abstract: This paper examines how institutional belonging shapes long-term development by comparing Spain and Uruguay, two small democracies with similar historical endowments whose trajectories diverged sharply after the 1960s. While Spain integrated into dense European institutional architectures, Uruguay remained embedded within the Latin American governance regime, characterized by weaker coordination and lower institutional coherence. To assess how alternative institutional embeddings could have altered these paths, the study develops a generative counterfactual framework grounded in economic complexity, institutional path dependence, and a Wasserstein GAN trained on data from 1960-2020. The resulting Expected Developmental Shift (EDS) quantifies structural gains or losses from hypothetical re-embedding in different institutional ecosystems. Counterfactual simulations indicate that Spain would have experienced significant developmental decline under a Latin American configuration, while Uruguay would have achieved higher complexity and resilience within a European regime. These findings suggest that development is not solely determined by domestic reforms but emerges from a country's structural position within transnational institutional networks.

