
Machine Learning Academic Digest [11.17]

arXiv Daily Academic Digest



cs.LG: 123 papers today


Large Models (12 papers)

【1】Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation
Link: https://arxiv.org/abs/2511.11500

Authors: Mohamad Amin Mohamadi, Tianhao Wang, Zhiyuan Li
Abstract: Modern language models fail a fundamental requirement of trustworthy intelligence: knowing when not to answer. Despite achieving impressive accuracy on benchmarks, these models produce confident hallucinations, even when wrong answers carry catastrophic consequences. Our evaluations on GSM8K, MedQA and GPQA show frontier models almost never abstain despite explicit warnings of severe penalties, suggesting that prompts cannot override training that rewards any answer over no answer. As a remedy, we propose Reinforced Hesitation (RH): a modification to Reinforcement Learning from Verifiable Rewards (RLVR) to use ternary rewards (+1 correct, 0 abstention, $-λ$ error) instead of binary. Controlled experiments on logic puzzles reveal that varying $λ$ produces distinct models along a Pareto frontier, where each training penalty yields the optimal model for its corresponding risk regime: low penalties produce aggressive answerers, high penalties conservative abstainers. We then introduce two inference strategies that exploit trained abstention as a coordination signal: cascading routes queries through models with decreasing risk tolerance, while self-cascading re-queries the same model on abstention. Both outperform majority voting with lower computational cost. These results establish abstention as a first-class training objective that transforms "I don't know" from failure into a coordination signal, enabling models to earn trust through calibrated honesty about their limits.
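
As a concrete illustration of the ternary reward, here is a minimal sketch in Python; the answer-matching rule and the abstention phrases are assumptions, and the paper's RLVR training loop is not shown:

```python
def ternary_reward(answer: str, gold: str, lam: float = 2.0) -> float:
    """Ternary RLVR reward in the spirit of Reinforced Hesitation (RH):
    +1 for a correct answer, 0 for abstention, -lam for an error."""
    if answer.strip().lower() in {"i don't know", "idk", "abstain"}:
        return 0.0  # hesitation is never punished
    return 1.0 if answer.strip() == gold.strip() else -lam

# Varying lam traces the Pareto frontier described in the abstract:
# low lam -> aggressive answerer, high lam -> conservative abstainer.
print(ternary_reward("42", "42"))            # 1.0
print(ternary_reward("I don't know", "42"))  # 0.0
print(ternary_reward("41", "42", lam=3.0))   # -3.0
```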


【2】AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery
Link: https://arxiv.org/abs/2511.11257

Authors: Yuqi Yin, Yibo Fu, Siyuan Wang, Peng Sun, Hongyu Wang, Xiaohui Wang, Lei Zheng, Zhiyong Li, Zhirong Liu, Jianji Wang, Zhaoxi Sun
Abstract: The discovery of novel Ionic Liquids (ILs) is hindered by critical challenges in property prediction, including limited data, poor model accuracy, and fragmented workflows. Leveraging the power of Large Language Models (LLMs), we introduce AIonopedia, to the best of our knowledge, the first LLM agent for IL discovery. Powered by an LLM-augmented multimodal domain foundation model for ILs, AIonopedia enables accurate property predictions and incorporates a hierarchical search architecture for molecular screening and design. Trained and evaluated on a newly curated and comprehensive IL dataset, our model delivers superior performance. Complementing these results, evaluations on literature-reported systems indicate that the agent can perform effective IL modification. Moving beyond offline tests, the practical efficacy was further confirmed through real-world wet-lab validation, in which the agent demonstrated exceptional generalization capabilities on challenging out-of-distribution tasks, underscoring its ability to accelerate real-world IL discovery.


【3】Automata-Based Steering of Large Language Models for Diverse Structured Generation
Link: https://arxiv.org/abs/2511.11018

Authors: Xiaokun Luan, Zeming Wei, Yihao Zhang, Meng Sun
Comments: ICFEM 2025 (Best Paper Award)
Abstract: Large language models (LLMs) are increasingly tasked with generating structured outputs. While structured generation methods ensure validity, they often lack output diversity, a critical limitation that we confirm in our preliminary study. We propose a novel method to enhance diversity in automaton-based structured generation. Our approach utilizes automata traversal history to steer LLMs towards novel structural patterns. Evaluations show our method significantly improves structural and content diversity while maintaining comparable generation efficiency. Furthermore, we conduct a case study showcasing the effectiveness of our method in generating diverse test cases for testing open-source libraries.
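
To make the steering idea concrete, here is a toy sketch on a hand-written DFA: transitions that appear often in the traversal history are down-weighted, so later samples favor novel structural patterns. In the paper the bias steers an LLM's decoding rather than a random walk; the DFA and the weighting rule below are illustrative assumptions:

```python
import random

# Toy DFA accepting (ab|ac)*; a stand-in for a grammar automaton.
DFA = {("s0", "a"): "s1", ("s1", "b"): "s0", ("s1", "c"): "s0"}

def steered_sample(history, steps=4):
    """Sample a DFA walk, down-weighting transitions seen in past walks."""
    state, out = "s0", []
    for _ in range(steps):
        edges = [(sym, nxt) for (st, sym), nxt in DFA.items() if st == state]
        weights = [1.0 / (1 + history.get((state, sym), 0)) for sym, _ in edges]
        sym, nxt = random.choices(edges, weights=weights)[0]
        history[(state, sym)] = history.get((state, sym), 0) + 1
        out.append(sym)
        state = nxt
    return "".join(out)

hist = {}
for _ in range(3):
    print(steered_sample(hist))  # later walks favor less-visited branches
```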


【4】VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
Link: https://arxiv.org/abs/2511.11007

Authors: Xinlei Yu, Chengming Xu, Guibin Zhang, Zhangquan Chen, Yudong Zhang, Yongbo He, Peng-Tao Jiang, Jiangning Zhang, Xiaobin Hu, Shuicheng Yan
Abstract: Despite the remarkable success of Vision-Language Models (VLMs), their performance on a range of complex visual tasks is often hindered by a "visual processing bottleneck": a propensity to lose grounding in visual evidence and exhibit a deficit in contextualized visual experience during prolonged generation. Drawing inspiration from human cognitive memory theory, which distinguishes short-term visually-dominant memory and long-term semantically-dominant memory, we propose VisMem, a cognitively-aligned framework that equips VLMs with dynamic latent vision memories, a short-term module for fine-grained perceptual retention and a long-term module for abstract semantic consolidation. These memories are seamlessly invoked during inference, allowing VLMs to maintain both perceptual fidelity and semantic consistency across thinking and generation. Extensive experiments across diverse visual benchmarks for understanding, reasoning, and generation reveal that VisMem delivers a significant average performance boost of 11.8% relative to the vanilla model and outperforms all counterparts, establishing a new paradigm for latent-space memory enhancement. The code will be available at https://github.com/YU-deep/VisMem.git.


【5】Architecting software monitors for control-flow anomaly detection through large language models and conformance checking
标题:架构软件通过大型语言模型和一致性检查监控控制流异常检测
链接:https://arxiv.org/abs/2511.10876

作者:Francesco Vitale,Francesco Flammini,Mauro Caporuscio,Nicola Mazzocca
摘要:背景:由于现代计算机系统的复杂性,确保其高水平的可靠性变得越来越具有挑战性。虽然系统在设计时就已经验证过了,但是它们的行为在运行时可能会有所不同,可能会由于“未知的未知数”而显示出控制流异常。   目的:我们的目标是通过软件监控来检测控制流异常,软件监控通过记录软件执行和检测与预期控制流的偏差来验证运行时行为。   研究方法:我们提出了一种方法来开发软件监视器的控制流异常检测,通过大型语言模型(LLM)和一致性检查。该方法建立在现有软件开发实践的基础上,以保持传统的V&V,同时提供额外的健壮性和可信度。它利用LLM链接设计时模型和实现代码,自动化源代码插装。由此产生的事件日志进行分析,通过一致性检查,一个可解释的和有效的控制流异常检测技术。   结果如下:我们测试的方法,从欧洲铁路交通管理系统/欧洲训练控制系统(ERTMS/ETCS),这是一个铁路标准的现代互操作铁路的案例研究方案。从ERTMS/ETCS案例研究中获得的结果表明,基于LLM的源代码插装可以实现高达84.775%的参考设计时过程模型的控制流覆盖率,而随后的一致性检查为基础的异常检测达到96.610%F1-score和93.515%AUC的峰值性能。   结论:在源代码插装过程中,利用特定领域的知识来指导LLM,可以显著地获得可靠和高质量的软件日志,并通过一致性检查实现有效的控制流异常检测。
摘要:Context: Ensuring high levels of dependability in modern computer-based systems has become increasingly challenging due to their complexity. Although systems are validated at design time, their behavior can be different at run-time, possibly showing control-flow anomalies due to "unknown unknowns".   Objective: We aim to detect control-flow anomalies through software monitoring, which verifies run-time behavior by logging software execution and detecting deviations from expected control flow.   Methods: We propose a methodology to develop software monitors for control-flow anomaly detection through Large Language Models (LLMs) and conformance checking. The methodology builds on existing software development practices to maintain traditional V&V while providing an additional level of robustness and trustworthiness. It leverages LLMs to link design-time models and implementation code, automating source-code instrumentation. The resulting event logs are analyzed via conformance checking, an explainable and effective technique for control-flow anomaly detection.   Results: We test the methodology on a case-study scenario from the European Railway Traffic Management System / European Train Control System (ERTMS/ETCS), which is a railway standard for modern interoperable railways. The results obtained from the ERTMS/ETCS case study demonstrate that LLM-based source-code instrumentation can achieve up to 84.775% control-flow coverage of the reference design-time process model, while the subsequent conformance checking-based anomaly detection reaches a peak performance of 96.610% F1-score and 93.515% AUC.   Conclusion: Incorporating domain-specific knowledge to guide LLMs in source-code instrumentation significantly allowed obtaining reliable and quality software logs and enabled effective control-flow anomaly detection through conformance checking.
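
For intuition, a minimal token-replay-style conformance check is sketched below, assuming the process model is given as a set of allowed transitions; the paper's conformance-checking technique is more elaborate than this:

```python
# Minimal conformance check of an event log against an allowed-transition
# model (assumption: the process model is a set of permitted edges).
MODEL = {("start", "auth"), ("auth", "move"), ("move", "stop"), ("stop", "end")}

def fitness(trace):
    """Fraction of consecutive event pairs permitted by the model."""
    pairs = list(zip(trace, trace[1:]))
    ok = sum(1 for p in pairs if p in MODEL)
    return ok / len(pairs) if pairs else 1.0

print(fitness(["start", "auth", "move", "stop", "end"]))  # 1.0 -> conformant
print(fitness(["start", "move", "auth", "stop", "end"]))  # 0.25 -> anomaly
```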


【6】Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go
Link: https://arxiv.org/abs/2511.10868

Authors: Yashshi Pipalani, Hritik Raj, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta
Comments: 9 pages, 5 figures
Abstract: Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation. To address this gap, we introduce Go-UT-Bench, a benchmark dataset of 5264 pairs of code and unit tests, drawn from 10 permissively licensed Golang repositories spanning diverse domains. We evaluate its effectiveness as a fine-tuning dataset across two LLM families, i.e., mixture-of-experts and dense decoders. Our results show that fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.


【7】ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries
Link: https://arxiv.org/abs/2511.10855

Authors: Tom Yuviler, Dana Drachsler-Cohen
Abstract: Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to identify the correct program, either because they can misidentify nonequivalent programs or because they rely on an LLM and assume it always correctly determines the output for every input. We present ExPairT-LLM, an exact learning algorithm for code selection that selects a program by posing to an LLM oracle two new types of queries: pairwise membership and pairwise equivalence. These queries are simpler for LLMs and enable ExPairT-LLM to identify the correct program through a tournament, which is robust to some LLM mistakes. We evaluate ExPairT-LLM on four popular code datasets. Its pass@1 (success rate) outperforms the state-of-the-art code selection algorithm on average by +13.0% and up to +27.1%. It also improves the pass@1 of LLMs performing complex reasoning by +24.0%.
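
The tournament idea can be sketched as follows; the oracle here is a toy stand-in that executes candidate programs on a single test, whereas the paper queries an LLM with pairwise membership and equivalence questions:

```python
def tournament(programs, oracle):
    """Single-elimination tournament over candidate programs; each pairwise
    comparison eliminates one candidate, which tolerates some oracle errors."""
    pool = list(programs)
    while len(pool) > 1:
        nxt = [oracle(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:
            nxt.append(pool[-1])  # odd candidate advances by default
        pool = nxt
    return pool[0]

# Toy oracle: prefer the program whose output matches a reference on one test.
progs = ["def f(x): return x*2", "def f(x): return x+2", "def f(x): return x-2"]
def oracle(a, b):
    env = {}
    exec(a, env)                      # run candidate a
    return a if env["f"](3) == 6 else b

print(tournament(progs, oracle))      # selects the doubling implementation
```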


【8】Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs
Link: https://arxiv.org/abs/2511.10850

Authors: Stefan Horoi, Sangwoo Cho, Supriyo Chakraborty, Shi-Xiong Zhang, Sambit Sahu, Guy Wolf, Genta Indra Winata
Abstract: Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs), but it often suffers from negative interference when models have diverged during training. We address this limitation by first aligning the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures. We adapt parameter space alignment for modern Grouped-Query Attention (GQA) and SwiGLU layers, exploring both weight-based and activation-based approaches. Using this alignment-first strategy, we successfully transfer advanced reasoning skills to a non-reasoning model. Experiments on challenging reasoning benchmarks show that our method consistently outperforms standard task arithmetic. This work provides an effective approach for merging and transferring specialized skills across evolving LLM families, reducing redundant fine-tuning and enhancing model adaptability.
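
The permutation symmetry being exploited is easy to verify numerically. A minimal sketch for a two-layer MLP (not the paper's GQA/SwiGLU-specific alignment procedure):

```python
import numpy as np

# Permuting hidden units (rows of W1, entries of b1, columns of W2) leaves
# the network function unchanged; alignment-first merging exploits exactly
# this kind of symmetry before applying task arithmetic.
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
relu = lambda z: np.maximum(z, 0)
f = lambda x, A, b, C: C @ relu(A @ x + b)

P = np.eye(4)[rng.permutation(4)]            # random permutation matrix
W1p, b1p, W2p = P @ W1, P @ b1, W2 @ P.T     # permuted parameters

x = rng.normal(size=3)
assert np.allclose(f(x, W1, b1, W2), f(x, W1p, b1p, W2p))
print("outputs identical under hidden-unit permutation")
```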


【9】SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?
Link: https://arxiv.org/abs/2511.10833

Authors: Sanchit Kabra, Shobhnik Kriplani, Parshin Shojaee, Chandan K. Reddy
Abstract: Equation discovery from data is a core challenge in machine learning for science, requiring the recovery of concise symbolic expressions that govern complex physical and geometric phenomena. Recent approaches with large language models (LLMs) show promise in symbolic regression, but their success often hinges on memorized formulas or overly simplified functional forms. Existing benchmarks exacerbate this limitation: they focus on scalar functions, ignore domain grounding, and rely on brittle string-matching based metrics that fail to capture scientific equivalence. We introduce SurfaceBench, the first comprehensive benchmark for symbolic surface discovery. SurfaceBench comprises 183 tasks across 15 categories of symbolic complexity, spanning explicit, implicit, and parametric equation representation forms. Each task includes ground-truth equations, variable semantics, and synthetically sampled three-dimensional data. Unlike prior SR datasets, our tasks reflect surface-level structure, resist LLM memorization through novel symbolic compositions, and are grounded in scientific domains such as fluid dynamics, robotics, electromagnetics, and geometry. To evaluate equation discovery quality, we pair symbolic checks with geometry-aware metrics such as Chamfer and Hausdorff distances, capturing both algebraic fidelity and spatial reconstruction accuracy. Our experiments reveal that state-of-the-art frameworks, while occasionally successful on specific families, struggle to generalize across representation types and surface complexities. SurfaceBench thus establishes a challenging and diagnostic testbed that bridges symbolic reasoning with geometric reconstruction, enabling principled benchmarking of progress in compositional generalization, data-driven scientific induction, and geometry-aware reasoning with LLMs. We release the code here: https://github.com/Sanchit-404/surfacebench
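
A minimal implementation of one of the geometry-aware metrics, the symmetric Chamfer distance between point samples of a ground-truth and a candidate surface, might look like this (the sampling setup is an assumption):

```python
import numpy as np

def chamfer(A: np.ndarray, B: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets A (n,3) and B (m,3)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (n, m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Points sampled from a ground-truth curve vs. a slightly-off candidate.
t = np.linspace(0, 2 * np.pi, 200)
gt = np.stack([np.cos(t), np.sin(t), 0 * t], axis=1)          # unit circle
cand = np.stack([1.05 * np.cos(t), 1.05 * np.sin(t), 0 * t], axis=1)
print(f"Chamfer distance: {chamfer(gt, cand):.4f}")            # small -> good fit
```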


【10】PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Link: https://arxiv.org/abs/2511.10720

Authors: Runpeng Geng, Yanting Wang, Chenlong Yin, Minhao Cheng, Ying Chen, Jinyuan Jia
Comments: The code is available at https://github.com/sleeepeer/PISanitizer
Abstract: Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an attacker-desired output. Existing prompt injection defenses are designed for short contexts. When extended to long-context scenarios, they have limited effectiveness. The reason is that an injected instruction constitutes only a very small portion of a long context, making the defense very challenging. In this work, we propose PISanitizer, which first pinpoints and sanitizes potential injected tokens (if any) in a context before letting a backend LLM generate a response, thereby eliminating the influence of the injected instruction. To sanitize injected tokens, PISanitizer builds on two observations: (1) prompt injection attacks essentially craft an instruction that compels an LLM to follow it, and (2) LLMs intrinsically leverage the attention mechanism to focus on crucial input tokens for output generation. Guided by these two observations, we first intentionally let an LLM follow arbitrary instructions in a context and then sanitize tokens receiving high attention that drive the instruction-following behavior of the LLM. By design, PISanitizer presents a dilemma for an attacker: the more effectively an injected instruction compels an LLM to follow it, the more likely it is to be sanitized by PISanitizer. Our extensive evaluation shows that PISanitizer can successfully prevent prompt injection, maintain utility, outperform existing defenses, is efficient, and is robust to optimization-based and strong adaptive attacks. The code is available at https://github.com/sleeepeer/PISanitizer.
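
A toy rendering of the sanitization step: drop the context tokens that receive the highest attention once the model is deliberately allowed to follow instructions. The fixed top-k rule and the hand-set attention values below are assumptions; the real system derives attention from the backend LLM:

```python
def sanitize(tokens, attn, k=4):
    """Drop the k tokens that receive the highest attention."""
    drop = set(sorted(range(len(tokens)), key=lambda i: attn[i])[-k:])
    return [t for i, t in enumerate(tokens) if i not in drop]

tokens = ["the", "report", "says", "IGNORE", "ALL", "PREVIOUS", "RULES", "..."]
attn   = [0.05, 0.06, 0.05, 0.30, 0.28, 0.27, 0.25, 0.04]  # injected span spikes
print(sanitize(tokens, attn))  # ['the', 'report', 'says', '...']
```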


【11】Evaluating LLM Understanding via Structured Tabular Decision Simulations
Link: https://arxiv.org/abs/2511.10667

Authors: Sichao Li, Xinyue Xu, Xiaomeng Li
Abstract: Large language models (LLMs) often achieve impressive predictive accuracy, yet correctness alone does not imply genuine understanding. True LLM understanding, analogous to human expertise, requires making consistent, well-founded decisions across multiple instances and diverse domains, relying on relevant and domain-grounded decision factors. We introduce Structured Tabular Decision Simulations (STaDS), a suite of expert-like decision settings that evaluate LLMs as if they were professionals undertaking structured decision "exams". In this context, understanding is defined as the ability to identify and rely on the correct decision factors, features that determine outcomes within a domain. STaDS jointly assesses understanding through: (i) question and instruction comprehension, (ii) knowledge-based prediction, and (iii) reliance on relevant decision factors. By analyzing 9 frontier LLMs across 15 diverse decision settings, we find that (a) most models struggle to achieve consistently strong accuracy across diverse domains; (b) models can be accurate yet globally unfaithful, and there are frequent mismatches between stated rationales and factors driving predictions. Our findings highlight the need for global-level understanding evaluation protocols and advocate for novel frameworks that go beyond accuracy to enhance LLMs' understanding ability.


【12】Bayesian Evaluation of Large Language Model Behavior
Link: https://arxiv.org/abs/2511.10661

Authors: Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth
Comments: Accepted to NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling
Abstract: It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation strategies typically deployed in LLM-based systems. We present two case studies applying this approach: 1) evaluating refusal rates on a benchmark of adversarial inputs designed to elicit harmful responses, and 2) evaluating pairwise preferences of one LLM over another on a benchmark of open-ended interactive dialogue examples. We demonstrate how the Bayesian approach can provide useful uncertainty quantification about the behavior of LLM-based systems.
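
A standard Bayesian treatment of a binary metric, which this line of work is in the spirit of, places a Beta prior on the underlying rate and reads uncertainty off the posterior. A minimal sketch with hypothetical counts (the paper's exact model may differ):

```python
from scipy import stats

# Beta-Binomial sketch: Beta(1, 1) prior on the refusal rate, Binomial
# likelihood, hence a Beta posterior. The counts below are hypothetical.
refusals, trials = 37, 200
posterior = stats.beta(1 + refusals, 1 + trials - refusals)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"posterior mean refusal rate: {posterior.mean():.3f}")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```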


Graphs (graph learning | graph neural networks | graph optimization, etc.) (10 papers)

【1】Heterogeneous Attributed Graph Learning via Neighborhood-Aware Star Kernels
Link: https://arxiv.org/abs/2511.11245

Authors: Hong Huang, Chengyu Yao, Haiming Chen, Hang Gao
Abstract: Attributed graphs, typically characterized by irregular topologies and a mix of numerical and categorical attributes, are ubiquitous in diverse domains such as social networks, bioinformatics, and cheminformatics. While graph kernels provide a principled framework for measuring graph similarity, existing kernel methods often struggle to simultaneously capture heterogeneous attribute semantics and neighborhood information in attributed graphs. In this work, we propose the Neighborhood-Aware Star Kernel (NASK), a novel graph kernel designed for attributed graph learning. NASK leverages an exponential transformation of the Gower similarity coefficient to jointly model numerical and categorical features efficiently, and employs star substructures enhanced by Weisfeiler-Lehman iterations to integrate multi-scale neighborhood structural information. We theoretically prove that NASK is positive definite, ensuring compatibility with kernel-based learning frameworks such as SVMs. Extensive experiments are conducted on eleven attributed and four large-scale real-world graph benchmarks. The results demonstrate that NASK consistently achieves superior performance over sixteen state-of-the-art baselines, including nine graph kernels and seven Graph Neural Networks.
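
A sketch of the two ingredients named above, a Gower-style similarity over mixed attributes and an exponential transform, is given below; the exact parametrization in NASK may differ:

```python
import numpy as np

def gower_similarity(x, y, numeric_idx, ranges):
    """Gower-style similarity: range-scaled distance for numeric features,
    exact match for categorical ones, averaged over all features."""
    s = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i in numeric_idx:
            s.append(1.0 - abs(xi - yi) / ranges[i])
        else:
            s.append(1.0 if xi == yi else 0.0)
    return float(np.mean(s))

def nask_like_kernel(x, y, numeric_idx, ranges, gamma=1.0):
    """Exponential transform of the Gower similarity (illustrative form)."""
    return float(np.exp(gamma * (gower_similarity(x, y, numeric_idx, ranges) - 1.0)))

a = [0.7, "red", 3.0]
b = [0.5, "red", 1.0]
print(nask_like_kernel(a, b, numeric_idx={0, 2}, ranges={0: 1.0, 2: 4.0}))
```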


【2】Dynamic Deep Graph Learning for Incomplete Multi-View Clustering with Masked Graph Reconstruction Loss
Link: https://arxiv.org/abs/2511.11181

Authors: Zhenghao Zhang, Jun Xie, Xingchen Chen, Tao Yu, Hongzhu Yi, Kaixin Xu, Yuanxiang Wang, Tianyu Zong, Xinming Wang, Jiahuan Chen, Guoqing Chao, Feng Chen, Zhepeng Wang, Jungang Xu
Abstract: The prevalence of real-world multi-view data makes incomplete multi-view clustering (IMVC) a crucial research topic. The rapid development of Graph Neural Networks (GNNs) has established them as one of the mainstream approaches for multi-view clustering. Despite significant progress in GNNs-based IMVC, some challenges remain: (1) Most methods rely on the K-Nearest Neighbors (KNN) algorithm to construct static graphs from raw data, which introduces noise and diminishes the robustness of the graph topology. (2) Existing methods typically utilize the Mean Squared Error (MSE) loss between the reconstructed graph and the sparse adjacency graph directly as the graph reconstruction loss, leading to substantial gradient noise during optimization. To address these issues, we propose a novel Dynamic Deep Graph Learning for Incomplete Multi-View Clustering with Masked Graph Reconstruction Loss (DGIMVCM). Firstly, we construct a missing-robust global graph from the raw data. A graph convolutional embedding layer is then designed to extract primary features and refined dynamic view-specific graph structures, leveraging the global graph for imputation of missing views. This process is complemented by graph structure contrastive learning, which identifies consistency among view-specific graph structures. Secondly, a graph self-attention encoder is introduced to extract high-level representations based on the imputed primary features and view-specific graphs, and is optimized with a masked graph reconstruction loss to mitigate gradient noise during optimization. Finally, a clustering module is constructed and optimized through a pseudo-label self-supervised training mechanism. Extensive experiments on multiple datasets validate the effectiveness and superiority of DGIMVCM.
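
One plausible reading of a masked graph-reconstruction loss, computing the error only on a sampled subset of adjacency entries rather than on the full sparse matrix, is sketched below; the paper's actual masking scheme may differ:

```python
import numpy as np

def masked_recon_loss(A_hat, A, neg_ratio=1.0, rng=None):
    """MSE over all positive edges plus an equal-sized sample of zeros,
    instead of the full dense adjacency (illustrative masking scheme)."""
    rng = rng or np.random.default_rng(0)
    pos = np.argwhere(A > 0)
    neg = np.argwhere(A == 0)
    neg = neg[rng.choice(len(neg), int(len(pos) * neg_ratio), replace=False)]
    idx = np.vstack([pos, neg])
    diff = A_hat[idx[:, 0], idx[:, 1]] - A[idx[:, 0], idx[:, 1]]
    return float((diff ** 2).mean())

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_hat = np.array([[0.1, 0.9, 0.2], [0.8, 0.1, 0.7], [0.3, 0.6, 0.2]])
print(masked_recon_loss(A_hat, A))
```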


【3】Improving Continual Learning of Knowledge Graph Embeddings via Informed Initialization
Link: https://arxiv.org/abs/2511.11118

Authors: Gerard Pons, Besim Bilalli, Anna Queralt
Abstract: Many Knowledge Graphs (KGs) are frequently updated, forcing their Knowledge Graph Embeddings (KGEs) to adapt to these changes. To address this problem, continual learning techniques for KGEs incorporate embeddings for new entities while updating the old ones. One necessary step in these methods is the initialization of the embeddings, as an input to the KGE learning process, which can have an important impact in the accuracy of the final embeddings, as well as in the time required to train them. This is especially relevant for relatively small and frequent updates. We propose a novel informed embedding initialization strategy, which can be seamlessly integrated into existing continual learning methods for KGE, that enhances the acquisition of new knowledge while reducing catastrophic forgetting. Specifically, the KG schema and the previously learned embeddings are utilized to obtain initial representations for the new entities, based on the classes the entities belong to. Our extensive experimental analysis shows that the proposed initialization strategy improves the predictive performance of the resulting KGEs, while also enhancing knowledge retention. Furthermore, our approach accelerates knowledge acquisition, reducing the number of epochs, and therefore time, required to incrementally learn new embeddings. Finally, its benefits across various types of KGE learning models are demonstrated.
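
A minimal sketch of class-informed initialization, where a new entity starts near the centroid of previously embedded entities of its class; the fallback, noise scale, and data layout here are illustrative assumptions:

```python
import numpy as np

def init_embedding(entity_class, class_to_entities, emb, dim=8, rng=None):
    """Initialize a new entity at the mean embedding of same-class entities
    (plus small noise); fall back to random if the class is unseen."""
    rng = rng or np.random.default_rng(0)
    peers = class_to_entities.get(entity_class, [])
    if not peers:
        return rng.normal(scale=0.1, size=dim)
    centroid = np.mean([emb[e] for e in peers], axis=0)
    return centroid + rng.normal(scale=0.01, size=dim)

emb = {"Paris": np.ones(8), "Rome": 2 * np.ones(8)}
classes = {"City": ["Paris", "Rome"]}
print(init_embedding("City", classes, emb))  # near the class centroid
```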


【4】Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph Learning
Link: https://arxiv.org/abs/2511.11081

Authors: Jun Hu, Shangheng Chen, Yufei He, Yuan Li, Bryan Hooi, Bingsheng He
Comments: Accepted by AAAI 2026
Abstract: Heterogeneous Graph Neural Networks (HGNNs) are widely used for deep learning on heterogeneous graphs. Typical end-to-end HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Pre-computation-based HGNNs address this by performing message passing only once during preprocessing, collecting neighbor information into regular-shaped tensors, which enables efficient mini-batch training. Label-based pre-computation methods collect neighbors' label information but suffer from training label leakage, where a node's own label information propagates back to itself during multi-hop message passing (the echo effect). Existing mitigation strategies are memory-inefficient on large graphs or suffer from compatibility issues with advanced message passing methods. We propose Echoless Label-based Pre-computation (Echoless-LP), which eliminates training label leakage with Partition-Focused Echoless Propagation (PFEP). PFEP partitions target nodes and performs echoless propagation, where nodes in each partition collect label information only from neighbors in other partitions, avoiding echo while remaining memory-efficient and compatible with any message passing method. We also introduce an Asymmetric Partitioning Scheme (APS) and a PostAdjust mechanism to address information loss from partitioning and distributional shifts across partitions. Experiments on public datasets demonstrate that Echoless-LP achieves superior performance and maintains memory efficiency compared to baselines.
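
The echoless idea can be sketched in a few lines: each node aggregates label information only from neighbors in other partitions, so its own label can never propagate back to it. This single-hop toy omits multi-hop propagation and the APS/PostAdjust corrections:

```python
import numpy as np

def echoless_propagate(adj, labels_onehot, partition):
    """One hop of partition-focused echoless label propagation: node v
    averages labels only over neighbors in partitions other than its own."""
    n = adj.shape[0]
    out = np.zeros_like(labels_onehot, dtype=float)
    for v in range(n):
        nbrs = [u for u in np.nonzero(adj[v])[0] if partition[u] != partition[v]]
        if nbrs:
            out[v] = labels_onehot[nbrs].mean(axis=0)
    return out

adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
labels = np.eye(2)[[0, 1, 0]]        # classes of the 3 nodes
partition = np.array([0, 1, 0])      # two partitions of target nodes
print(echoless_propagate(adj, labels, partition))
```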


【5】Enhancing Graph Representations with Neighborhood-Contextualized Message-Passing
Link: https://arxiv.org/abs/2511.11046

Authors: Brian Godwin Lim
Abstract: Graph neural networks (GNNs) have become an indispensable tool for analyzing relational data. In the literature, classical GNNs may be classified into three variants: convolutional, attentional, and message-passing. While the standard message-passing variant is highly expressive, its typical pair-wise messages nevertheless only consider the features of the center node and each neighboring node individually. This design fails to incorporate the rich contextual information contained within the broader local neighborhood, potentially hindering its ability to learn complex relationships within the entire set of neighboring nodes. To address this limitation, this work first formalizes the concept of neighborhood-contextualization, rooted in a key property of the attentional variant. This then serves as the foundation for generalizing the message-passing variant to the proposed neighborhood-contextualized message-passing (NCMP) framework. To demonstrate its utility, a simple, practical, and efficient method to parametrize and operationalize NCMP is presented, leading to the development of the proposed Soft-Isomorphic Neighborhood-Contextualized Graph Convolution Network (SINC-GCN). A preliminary analysis on a synthetic binary node classification problem then underscores both the expressivity and efficiency of the proposed GNN architecture. Overall, the paper lays the foundation for the novel NCMP framework as a practical path toward further enhancing the graph representational power of classical GNNs.


【6】GraphToxin: Reconstructing Full Unlearned Graphs from Graph Unlearning
Link: https://arxiv.org/abs/2511.10936

Authors: Ying Song, Balaji Palanisamy
Comments: Submitted to S&P 2026. Code will be available
Abstract: Graph unlearning has emerged as a promising solution for complying with "the right to be forgotten" regulations by enabling the removal of sensitive information upon request. However, this solution is not foolproof. The involvement of multiple parties creates new attack surfaces, and residual traces of deleted data can still remain in the unlearned graph neural networks. These vulnerabilities can be exploited by attackers to recover the supposedly erased samples, thereby undermining the inherent functionality of graph unlearning. In this work, we propose GraphToxin, the first graph reconstruction attack against graph unlearning. Specifically, we introduce a novel curvature matching module to provide fine-grained guidance for full unlearned graph recovery. We demonstrate that GraphToxin can successfully subvert the regulatory guarantees expected from graph unlearning: it can recover not only a deleted individual's information and personal links but also sensitive content from their connections, thereby posing substantially more detrimental threats. Furthermore, we extend GraphToxin to multiple node removals under both white-box and black-box settings. We highlight the necessity of a worst-case analysis and propose a comprehensive evaluation framework to systematically assess the attack performance under both random and worst-case node removals. This provides a more robust and realistic measure of the vulnerability of graph unlearning methods to graph reconstruction attacks. Our extensive experiments demonstrate the effectiveness and flexibility of GraphToxin. Notably, we show that existing defense mechanisms are largely ineffective against this attack and, in some cases, can even amplify its performance. Given the severe privacy risks posed by GraphToxin, our work underscores the urgent need for the development of more effective and robust defense strategies against this attack.


【7】Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework
Link: https://arxiv.org/abs/2511.10915

Authors: Guanxiong He, Jie Wang, Liaoyuan Tang, Zheng Wang, Rong Wang, Feiping Nie
Abstract: Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, it is hampered by the flaw that current approaches are forced to accept a compromise between performance and privacy: transmitting embedding representations risks sensitive data leakage, while sharing only abstract cluster prototypes leads to diminished model accuracy. To resolve this dilemma, we propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing, thus moving beyond the limitations of conventional techniques. Our framework operates on a clear client-server logic; on the client-side, each participant constructs a private structural graph that captures intrinsic data relationships, which the server then securely aggregates and aligns to form a comprehensive global graph from which a unified clustering structure is derived. The framework offers two distinct modes to suit different needs. SPP-FGC is designed as an efficient one-shot method that completes its task in a single communication round, ideal for rapid analysis. For more complex, unstructured data like images, SPP-FGC+ employs an iterative process where clients and the server collaboratively refine feature representations to achieve superior downstream performance. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.


【8】Graph Attention Network for Predicting Duration of Large-Scale Power Outages Induced by Natural Disasters
Link: https://arxiv.org/abs/2511.10898

Authors: Chenghao Duan, Chuanyi Ji
Abstract: Natural disasters such as hurricanes, wildfires, and winter storms have induced large-scale power outages in the U.S., resulting in tremendous economic and societal impacts. Accurately predicting power outage recovery and impact is key to the resilience of the power grid. Recent advances in machine learning offer viable frameworks for estimating power outage duration from geospatial and weather data. However, three major challenges are inherent to the task in a real-world setting: spatial dependency of the data, spatial heterogeneity of the impact, and moderate event data. We propose a novel approach to estimate the duration of severe weather-induced power outages through Graph Attention Networks (GAT). Our network uses a simple structure from unsupervised pre-training, followed by semi-supervised learning. We use field data from four major hurricanes affecting $501$ counties in eight Southeastern U.S. states. The model exhibits an excellent performance ($>93\%$ accuracy) and outperforms the existing methods XGBoost, Random Forest, GCN and simple GAT by $2\%-15\%$ in both the overall performance and class-wise accuracy.


【9】Incorporating Spatial Information into Goal-Conditioned Hierarchical Reinforcement Learning via Graph Representations
Link: https://arxiv.org/abs/2511.10872

Authors: Shuyuan Zhang, Zihan Wang, Xiao-Wen Chang, Doina Precup
Comments: Transactions on Machine Learning Research (2025)
Abstract: The integration of graphs with Goal-conditioned Hierarchical Reinforcement Learning (GCHRL) has recently gained attention, as intermediate goals (subgoals) can be effectively sampled from graphs that naturally represent the overall task structure in most RL tasks. However, existing approaches typically rely on domain-specific knowledge to construct these graphs, limiting their applicability to new tasks. Other graph-based approaches create graphs dynamically during exploration but struggle to fully utilize them, because they have problems passing the information in the graphs to newly visited states. Additionally, current GCHRL methods face challenges such as sample inefficiency and poor subgoal representation. This paper proposes a solution to these issues by developing a graph encoder-decoder to evaluate unseen states. Our proposed method, Graph-Guided sub-Goal representation Generation RL (G4RL), can be incorporated into any existing GCHRL method when operating in environments with primarily symmetric and reversible transitions to enhance performance across this class of problems. We show that the graph encoder-decoder can be effectively implemented using a network trained on the state graph generated during exploration. Empirical results indicate that leveraging high and low-level intrinsic rewards from the graph encoder-decoder significantly enhances the performance of state-of-the-art GCHRL approaches with an extra small computational cost in dense and sparse reward environments.


【10】HyperComplEx: Adaptive Multi-Space Knowledge Graph Embeddings
Link: https://arxiv.org/abs/2511.10842

Authors: Jugal Gajjar, Kaustik Ranaware, Kamalasankari Subramaniakuppusamy, Vaibhav Gandhi
Comments: 9 pages, 3 figures, 8 tables, 19 equations, accepted at the 5th Workshop on Knowledge Graphs and Big Data in IEEE BigData 2025 and the paper will be published in the IEEE BigData Conference Proceedings
Abstract: Knowledge graphs have emerged as fundamental structures for representing complex relational data across scientific and enterprise domains. However, existing embedding methods face critical limitations when modeling diverse relationship types at scale: Euclidean models struggle with hierarchies, vector space models cannot capture asymmetry, and hyperbolic models fail on symmetric relations. We propose HyperComplEx, a hybrid embedding framework that adaptively combines hyperbolic, complex, and Euclidean spaces via learned attention mechanisms. A relation-specific space weighting strategy dynamically selects optimal geometries for each relation type, while a multi-space consistency loss ensures coherent predictions across spaces. We evaluate HyperComplEx on computer science research knowledge graphs ranging from 1K papers (~25K triples) to 10M papers (~45M triples), demonstrating consistent improvements over state-of-the-art baselines including TransE, RotatE, DistMult, ComplEx, SEPA, and UltraE. Additional tests on standard benchmarks confirm significantly higher results than all baselines. On the 10M-paper dataset, HyperComplEx achieves 0.612 MRR, a 4.8% relative gain over the best baseline, while maintaining efficient training, achieving 85 ms inference per triple. The model scales near-linearly with graph size through adaptive dimension allocation. We release our implementation and dataset family to facilitate reproducible research in scalable knowledge graph embeddings.
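
A sketch of attention-weighted multi-space scoring in the spirit of HyperComplEx is below. The three scorers are simplified stand-ins (the "hyperbolic" one especially), and the real model learns the per-relation weights rather than taking them as input:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def euclidean_score(h, r, t):   # TransE-style distance score
    return -np.linalg.norm(h + r - t)

def complex_score(h, r, t):     # ComplEx-style trilinear score
    return np.real(np.sum(h * r * np.conj(t)))

def multi_space_score(h, r, t, relation_logits):
    """Softmax-combine scores from different geometries with per-relation
    weights (illustrative; not the paper's exact formulation)."""
    w = softmax(relation_logits)
    scores = np.array([
        euclidean_score(h.real, r.real, t.real),
        complex_score(h, r, t),
        -np.linalg.norm(np.tanh(h.real) - np.tanh(t.real)),  # hyperbolic stand-in
    ])
    return float(w @ scores)

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(3))
print(multi_space_score(h, r, t, relation_logits=np.array([0.2, 1.5, -0.3])))
```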


Transformers (4 papers)

【1】Multistability of Self-Attention Dynamics in Transformers
Link: https://arxiv.org/abs/2511.11553

Authors: Claudio Altafini
Comments: 8 pages, 3 figures
Abstract: In machine learning, a self-attention dynamics is a continuous-time multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics is related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix corresponding for transformers to the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
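
The (single-agent) Oja flow referenced above is easy to simulate; a forward-Euler sketch showing convergence to the principal eigenvector of a symmetric value matrix (the paper studies a multiagent version):

```python
import numpy as np

# Euler integration of the Oja flow dx/dt = Vx - (x^T V x) x, whose stable
# equilibrium aligns with the principal eigenvector of a symmetric V.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 4))
V = (V + V.T) / 2                                  # symmetric "value matrix"
x = rng.normal(size=4)
x /= np.linalg.norm(x)
for _ in range(5000):
    x += 0.01 * (V @ x - (x @ V @ x) * x)          # Oja flow step
principal = np.linalg.eigh(V)[1][:, -1]            # principal eigenvector
print(abs(x @ principal))                          # ~1.0 -> aligned
```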


【2】MoCap2Radar: A Spatiotemporal Transformer for Synthesizing Micro-Doppler Radar Signatures from Motion Capture
Link: https://arxiv.org/abs/2511.11462

Authors: Kevin Chen, Kenneth W. Parker, Anish Arora
Abstract: We present a pure machine learning process for synthesizing radar spectrograms from Motion-Capture (MoCap) data. We formulate MoCap-to-spectrogram translation as a windowed sequence-to-sequence task using a transformer-based model that jointly captures spatial relations among MoCap markers and temporal dynamics across frames. Real-world experiments show that the proposed approach produces visually and quantitatively plausible Doppler radar spectrograms and achieves good generalizability. Ablation experiments show that the learned model includes both the ability to convert multi-part motion into Doppler signatures and an understanding of the spatial relations between different parts of the human body. The result is an interesting example of using transformers for time-series signal processing. It is especially applicable to edge computing and Internet of Things (IoT) radars. It also suggests the ability to augment scarce radar datasets using more abundant MoCap data for training higher-level applications. Finally, it requires far less computation than physics-based methods for generating radar data.


【3】Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning
Link: https://arxiv.org/abs/2511.11402

Authors: Amit Jain, Victor Rodriguez-Fernandez, Richard Linares
Abstract: Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multiphase waypoint navigation variants, and finally tackling a complex multiphase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.


【4】Transformers know more than they can tell -- Learning the Collatz sequence
Link: https://arxiv.org/abs/2511.10811

Authors: François Charton, Ashvni Narayanan
Abstract: We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1}=u_n/2$ if $u_n$ is even, $u_{n+1}=(3u_n+1)/2$ if $u_n$ is odd). Model accuracy varies with the base used to encode input and output. It can be as high as $99.7\%$ for bases $24$ and $32$, and as low as $37\%$ and $25\%$ for bases $11$ and $3$. Yet, all models, no matter the base, follow a common learning pattern. As training proceeds, they learn a sequence of classes of inputs that share the same residual modulo $2^p$. Models achieve near-perfect accuracy on these classes, and less than $1\%$ for all other inputs. This maps to a mathematical property of Collatz sequences: the length of the loops involved in the computation of a long Collatz step can be deduced from the binary representation of its input. The learning pattern reflects the model learning to predict inputs associated with increasing loop lengths. An analysis of failure cases reveals that almost all model errors follow predictable patterns. Hallucination, a common feature of large language models, almost never happens. In over $90\%$ of failures, the model performs the correct calculation, but wrongly estimates loop lengths. Our observations give a full account of the algorithms learned by the models. They suggest that the difficulty of learning such complex arithmetic functions lies in figuring out the control structure of the computation: the length of the loops. We believe that the approach outlined here, using mathematical problems as tools for understanding, explaining, and perhaps improving language models, can be applied to a broad range of problems and bear fruitful results.
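
The binary-deducible loop length mentioned in the abstract can be checked directly: starting from an odd input, the run of consecutive odd ("rise") steps of the compressed map equals the number of trailing 1-bits. A small verification script (the exact target of the paper's "long step" prediction task is not reproduced here):

```python
def compressed_step(u: int) -> int:
    """One step of the compressed Collatz map from the abstract."""
    return u // 2 if u % 2 == 0 else (3 * u + 1) // 2

def rise_length(u: int) -> int:
    """Number of consecutive odd ('rise') steps starting from odd u."""
    k = 0
    while u % 2 == 1:
        u = compressed_step(u)
        k += 1
    return k

def trailing_ones(u: int) -> int:
    return len(bin(u)) - len(bin(u).rstrip("1"))

# The run of odd steps equals the count of trailing 1-bits of the input:
# the binary-deducible loop length the abstract refers to.
for u in [7, 11, 27, 31]:
    assert rise_length(u) == trailing_ones(u)
    print(u, bin(u), rise_length(u))
```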


GANs | Adversarial | Attacks | Generation (4 papers)

【1】Adaptive Intrusion Detection for Evolving RPL IoT Attacks Using Incremental Learning
Link: https://arxiv.org/abs/2511.11464

Authors: Sumeyye Bas, Kiymet Kaya, Elif Ak, Sule Gunduz Oguducu
Abstract: The routing protocol for low-power and lossy networks (RPL) has become the de facto routing standard for resource-constrained IoT systems, but its lightweight design exposes critical vulnerabilities to a wide range of routing-layer attacks such as hello flood, decreased rank, and version number manipulation. Traditional countermeasures, including protocol-level modifications and machine learning classifiers, can achieve high accuracy against known threats, yet they fail when confronted with novel or zero-day attacks unless fully retrained, an approach that is impractical for dynamic IoT environments. In this paper, we investigate incremental learning as a practical and adaptive strategy for intrusion detection in RPL-based networks. We systematically evaluate five model families, including ensemble models and deep learning models. Our analysis highlights that incremental learning not only restores detection performance on new attack classes but also mitigates catastrophic forgetting of previously learned threats, all while reducing training time compared to full retraining. By combining five diverse models with attack-specific analysis, forgetting behavior, and time efficiency, this study provides systematic evidence that incremental learning offers a scalable pathway to maintain resilient intrusion detection in evolving RPL-based IoT networks.
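
A minimal incremental-learning sketch with scikit-learn's partial_fit, extending a detector with a newly observed attack class without full retraining (synthetic data; not the paper's five model families):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1, 2])             # benign, hello flood, rank attack

# Initial training data covers only classes 0 and 1.
X0 = rng.normal(size=(200, 8))
y0 = rng.integers(0, 2, 200)
clf = SGDClassifier(loss="log_loss")
clf.partial_fit(X0, y0, classes=classes)  # declare all classes up front

# Later: a batch of a newly observed attack class arrives.
X_new = rng.normal(loc=2.0, size=(50, 8))
y_new = np.full(50, 2)
clf.partial_fit(X_new, y_new)             # incremental update, no retraining
print(clf.predict(X_new[:5]))
```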


【2】SoK: Security Evaluation of Wi-Fi CSI Biometrics: Attacks, Metrics, and Systemic Weaknesses
标题:SoK:Wi-Fi SI生物识别技术的安全评估:攻击、漏洞和系统性弱点
链接:https://arxiv.org/abs/2511.11381

作者:Gioliano de Oliveira Braga,Pedro Henrique dos Santos Rocha,Rafael Pimenta de Mattos Paixão,Giovani Hoff da Costa,Gustavo Cavalcanti Morais,Lourenço Alves Pereira Júnior
备注:An improved version will be submitted to Euro S&P 2026, and this paper will be updated in the near future
摘要:Wi-Fi信道状态信息(CSI)已被反复提出作为生物特征模态,通常具有高准确性和操作可行性的报告。然而,该领域对其安全特性、对抗性复原力和方法一致性缺乏统一的理解。本知识系统化(SoK)通过安全角度研究了基于CSI的生物特征认证,分析了现有工作在传感基础设施、信号表示、特征管道、学习模型和评估方法方面的差异。我们的综合揭示了系统的不一致性:依赖于总体准确性指标,有限的报告FAR/FRR/EER,缺乏每个用户的风险分析,以及缺乏对威胁模型或对抗可行性的考虑。我们构建了一个统一的评估框架,以实证的方式揭示这些问题,并展示如何安全相关的指标,如每类EER,FCS和基尼系数,发现风险集中,仍然隐藏在传统的报告实践。我们的分析突出了具体的攻击面,并显示了方法选择如何实质性地影响脆弱性配置文件,其中包括重放,几何模仿和环境扰动。基于这些发现,我们阐明了当前CSI生物识别技术的安全边界,并为严格的评估,可重复的实验和未来的研究方向提供了指导方针。该SoK为安全社区提供了一个结构化的,证据驱动的Wi-Fi CSI生物识别技术及其作为身份验证原语的适用性重新评估。
摘要:Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational feasibility. However, the field lacks a consolidated understanding of its security properties, adversarial resilience, and methodological consistency. This Systematization of Knowledge (SoK) examines CSI-based biometric authentication through a security perspective, analyzing how existing work differs across sensing infrastructure, signal representations, feature pipelines, learning models, and evaluation methodologies. Our synthesis reveals systemic inconsistencies: reliance on aggregate accuracy metrics, limited reporting of FAR/FRR/EER, absence of per-user risk analysis, and scarce consideration of threat models or adversarial feasibility. We construct a unified evaluation framework to empirically expose these issues and demonstrate how security-relevant metrics, such as per-class EER, FCS, and the Gini Coefficient, uncover risk concentration that remains hidden under traditional reporting practices. Our analysis highlights concrete attack surfaces and shows how methodological choices materially influence vulnerability profiles, which include replay, geometric mimicry, and environmental perturbation. Based on these findings, we articulate the security boundaries of current CSI biometrics and provide guidelines for rigorous evaluation, reproducible experimentation, and future research directions. This SoK offers the security community a structured, evidence-driven reassessment of Wi-Fi CSI biometrics and their suitability as an authentication primitive.
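Of the security-relevant metrics named above, two can be written down unambiguously; a sketch of per-class EER and of the Gini coefficient over per-user error rates follows (FCS is omitted because its definition is not given here, and the verification scores are synthetic stand-ins):

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer(y_true, scores):
    """Equal error rate: the operating point where FAR (=FPR) meets FRR (=1-TPR)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1.0 - tpr
    i = int(np.nanargmin(np.abs(fpr - fnr)))
    return (fpr[i] + fnr[i]) / 2.0

def gini(values):
    """Gini coefficient of a non-negative vector (0 = evenly spread risk, near 1 = concentrated)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    idx = np.arange(1, n + 1)
    return 2.0 * np.sum(idx * x) / (n * np.sum(x)) - (n + 1.0) / n

rng = np.random.default_rng(1)
per_user_eer = []
for _ in range(20):  # one-vs-rest verification scores per enrolled user
    genuine = rng.normal(2.0 + rng.uniform(-1, 1), 1.0, 200)   # per-user margin varies
    impostor = rng.normal(0.0, 1.0, 2000)
    y = np.r_[np.ones_like(genuine), np.zeros_like(impostor)]
    per_user_eer.append(eer(y, np.r_[genuine, impostor]))

print(f"aggregate EER: {np.mean(per_user_eer):.3f}")
print(f"Gini over per-user EERs: {gini(per_user_eer):.3f}  (high => risk concentrated on few users)")
```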


【3】HealSplit: Towards Self-Healing through Adversarial Distillation in Split Federated Learning
标题:HealSplit:通过Split联邦学习中的对抗蒸馏实现自我修复
链接:https://arxiv.org/abs/2511.11240

作者:Yuhan Xie,Chen Lyu
备注:Accepted by AAAI 2026
摘要:分裂联邦学习(SFL)是一种新兴的隐私保护分布式学习范式。然而,它仍然容易受到针对本地特征,标签,破碎数据和模型权重的复杂数据中毒攻击。现有的防御,主要是从传统的联邦学习(FL),是有效的SFL下,由于有限的访问完整的模型更新。本文介绍了HealSplit,为SFL量身定制的第一个统一防御框架,提供端到端的检测和恢复对五种复杂类型的中毒攻击。HealSplit包括三个关键组件:(1)拓扑感知检测模块,通过拓扑异常评分(TAS)在破碎数据上构建图形以识别中毒样本;(2)生成恢复管道,为检测到的异常合成语义一致的替代品,由一致性验证学生验证;以及(3)对抗性多教师蒸馏框架使用来自Vanilla教师的语义监督和来自异常影响去偏(AD)教师的异常感知信号来训练学生,由拓扑和基于梯度的交互矩阵之间的对齐引导。在四个基准数据集上进行的大量实验表明,HealSplit始终优于十种最先进的防御,在不同的攻击场景中实现了卓越的鲁棒性和防御有效性。
摘要:Split Federated Learning (SFL) is an emerging paradigm for privacy-preserving distributed learning. However, it remains vulnerable to sophisticated data poisoning attacks targeting local features, labels, smashed data, and model weights. Existing defenses, primarily adapted from traditional Federated Learning (FL), are less effective under SFL due to limited access to complete model updates. This paper presents HealSplit, the first unified defense framework tailored for SFL, offering end-to-end detection and recovery against five sophisticated types of poisoning attacks. HealSplit comprises three key components: (1) a topology-aware detection module that constructs graphs over smashed data to identify poisoned samples via topological anomaly scoring (TAS); (2) a generative recovery pipeline that synthesizes semantically consistent substitutes for detected anomalies, validated by a consistency validation student; and (3) an adversarial multi-teacher distillation framework trains the student using semantic supervision from a Vanilla Teacher and anomaly-aware signals from an Anomaly-Influence Debiasing (AD) Teacher, guided by the alignment between topological and gradient-based interaction matrices. Extensive experiments on four benchmark datasets demonstrate that HealSplit consistently outperforms ten state-of-the-art defenses, achieving superior robustness and defense effectiveness across diverse attack scenarios.
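The abstract gives the shape of the objective but not its exact form, so the following is only an assumed PyTorch sketch of a two-teacher distillation loss in which a per-sample anomaly score re-weights the debiasing teacher's signal:

```python
import torch
import torch.nn.functional as F

def two_teacher_kd_loss(student_logits, vanilla_logits, debias_logits,
                        anomaly_score, labels, T=2.0, alpha=0.5, beta=0.5):
    """Illustrative two-teacher distillation: CE on labels plus KL to each teacher.
    anomaly_score in [0, 1] re-weights the debiasing teacher per sample (assumed design)."""
    ce = F.cross_entropy(student_logits, labels)
    log_p = F.log_softmax(student_logits / T, dim=-1)
    kl_vanilla = F.kl_div(log_p, F.softmax(vanilla_logits / T, dim=-1),
                          reduction="none").sum(-1)
    kl_debias = F.kl_div(log_p, F.softmax(debias_logits / T, dim=-1),
                         reduction="none").sum(-1)
    kd = (alpha * kl_vanilla + beta * anomaly_score * kl_debias).mean() * T * T
    return ce + kd

# Toy shapes: batch of 8, 5 classes.
s = torch.randn(8, 5, requires_grad=True)
t1, t2 = torch.randn(8, 5), torch.randn(8, 5)
loss = two_teacher_kd_loss(s, t1, t2, anomaly_score=torch.rand(8),
                           labels=torch.randint(0, 5, (8,)))
loss.backward()
```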


【4】When to Stop Federated Learning: Zero-Shot Generation of Synthetic Validation Data with Generative AI for Early Stopping
标题:何时停止联邦学习:利用生成人工智能Zero-Shot生成合成验证数据,以实现早期停止
链接:https://arxiv.org/abs/2511.11208

作者:Youngjoon Lee,Hyukjoon Lee,Jinu Gong,Yang Cao,Joonhyuk Kang
备注:Accepted to IEEE BigData 2025
摘要:联邦学习(FL)支持跨分散设备的协作模型训练,同时保护数据隐私。然而,FL方法通常运行预定义数量的全局轮次,当较早达到最佳性能时,往往导致不必要的计算。此外,即使模型未能取得有意义的性能,训练也会继续。为了解决这种低效问题,我们引入了一个零样本合成验证框架,利用生成式AI来监控模型性能并确定早停点。我们的方法自适应地在最优轮次附近停止训练,从而节省计算资源并实现快速的超参数调整。多标签胸部X射线分类的数值结果表明,我们的方法最多可减少74%的训练轮次,同时将精度保持在最优值的1%以内。
摘要:Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, FL methods typically run for a predefined number of global rounds, often leading to unnecessary computation when optimal performance is reached earlier. In addition, training may continue even when the model fails to achieve meaningful performance. To address this inefficiency, we introduce a zero-shot synthetic validation framework that leverages generative AI to monitor model performance and determine early stopping points. Our approach adaptively stops training near the optimal round, thereby conserving computational resources and enabling rapid hyperparameter adjustments. Numerical results on multi-label chest X-ray classification demonstrate that our method reduces training rounds by up to 74% while maintaining accuracy within 1% of the optimal.
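A minimal sketch of the server-side loop this implies: score each global round on generator-produced validation data and stop after a patience window without improvement (the generative model and the FL round itself are abstracted as stubs):

```python
def run_federated_training(global_rounds, train_one_round, evaluate_on_synthetic, patience=5):
    """Early stopping for FL, sketched: evaluate_on_synthetic scores the global model
    on synthetic validation data; stop after `patience` rounds without improvement."""
    best_acc, best_round, wait = -1.0, 0, 0
    for rnd in range(1, global_rounds + 1):
        model = train_one_round(rnd)          # one FL round: local updates + aggregation
        acc = evaluate_on_synthetic(model)    # zero-shot synthetic validation accuracy
        if acc > best_acc:
            best_acc, best_round, wait = acc, rnd, 0
        else:
            wait += 1
            if wait >= patience:
                print(f"stopping at round {rnd}; best round was {best_round} (acc={best_acc:.3f})")
                break
    return best_round, best_acc

# Toy stand-ins: accuracy rises then plateaus around round 10.
import random
random.seed(0)
curve = lambda r: min(0.9, 0.5 + 0.04 * r) + random.uniform(-0.01, 0.01)
best = run_federated_training(100, train_one_round=lambda r: r, evaluate_on_synthetic=curve)
```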


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm
标题:无监督鲁棒领域自适应:范式、理论和算法
链接:https://arxiv.org/abs/2511.11009

作者:Fuxiang Huang,Xiaowei Fu,Shiyu Ye,Lina Ma,Wen Li,Xinbo Gao,David Zhang,Lei Zhang
备注:To appear in IJCV
摘要:无监督领域自适应(UDA)的目的是通过解决领域偏移,将知识从标签丰富的源领域迁移到未标记的目标领域。大多数UDA方法强调迁移能力,但往往忽视了对抗攻击下的鲁棒性。尽管vanilla对抗训练(VAT)可提高深度神经网络的鲁棒性,但对UDA效果甚微。本文着重回答三个关键问题:1)为什么以防御有效性著称的VAT在UDA范式中失效?2)攻击下的泛化界理论是什么?它是如何从经典UDA理论演化而来的?3)我们如何在不进行复杂修改的情况下实施鲁棒化训练程序?具体而言,我们探索并揭示了一般UDA+VAT范式固有的纠缠挑战,并提出了一个无监督鲁棒域自适应(URDA)范式。我们进一步推导了URDA范式的泛化界理论,使其能够抵抗对抗性噪声和域偏移。据我们所知,这是首次建立URDA范式和理论。我们进一步介绍了一种简单、新颖而有效的URDA算法,称为解纠缠对抗鲁棒性训练(DART),这是一个确保可迁移性和鲁棒性的两步训练过程。DART首先对任意UDA模型进行预训练,然后通过解纠缠蒸馏(disentangled distillation)进行即时的鲁棒化后训练。在有/无攻击的四个基准数据集上的实验表明,DART在保持领域适应性的同时有效增强了鲁棒性,并验证了URDA范式和理论。
摘要:Unsupervised domain adaptation (UDA) aims to transfer knowledge from a label-rich source domain to an unlabeled target domain by addressing domain shifts. Most UDA approaches emphasize transfer ability, but often overlook robustness against adversarial attacks. Although vanilla adversarial training (VAT) improves the robustness of deep neural networks, it has little effect on UDA. This paper focuses on answering three key questions: 1) Why does VAT, known for its defensive effectiveness, fail in the UDA paradigm? 2) What is the generalization bound theory under attacks and how does it evolve from classical UDA theory? 3) How can we implement a robustification training procedure without complex modifications? Specifically, we explore and reveal the inherent entanglement challenge in general UDA+VAT paradigm, and propose an unsupervised robust domain adaptation (URDA) paradigm. We further derive the generalization bound theory of the URDA paradigm so that it can resist adversarial noise and domain shift. To the best of our knowledge, this is the first time to establish the URDA paradigm and theory. We further introduce a simple, novel yet effective URDA algorithm called Disentangled Adversarial Robustness Training (DART), a two-step training procedure that ensures both transferability and robustness. DART first pre-trains an arbitrary UDA model, and then applies an instantaneous robustification post-training step via disentangled distillation. Experiments on four benchmark datasets with/without attacks show that DART effectively enhances robustness while maintaining domain adaptability, and validate the URDA paradigm and theory.


【2】Towards Uncertainty Quantification in Generative Model Learning
标题:生成模型学习中的不确定性量化
链接:https://arxiv.org/abs/2511.10710

作者:Giorgio Morales,Frederic Jurie,Jalal Fadili
备注:Accepted at EurIPS 2025 Workshop: Epistemic Intelligence in Machine Learning (EIML@EurIPS 2025)
摘要:虽然生成模型在各个领域变得越来越普遍,但关于其可靠性的基本问题仍然存在。这些模型的一个关键但研究不足的方面是围绕其分布近似能力的不确定性量化。目前的评估方法主要集中在测量学习和目标分布之间的接近程度,忽略了这些测量中固有的不确定性。在这篇立场论文中,我们形式化了生成模型学习中的不确定性量化问题。我们讨论了潜在的研究方向,包括使用基于集合的精确召回曲线。我们在合成数据集上的初步实验证明了聚合精确度-召回率曲线在捕获模型近似不确定性方面的有效性,从而能够根据其不确定性特征在不同模型架构之间进行系统比较。
摘要:While generative models have become increasingly prevalent across various domains, fundamental concerns regarding their reliability persist. A crucial yet understudied aspect of these models is the uncertainty quantification surrounding their distribution approximation capabilities. Current evaluation methodologies focus predominantly on measuring the closeness between the learned and the target distributions, neglecting the inherent uncertainty in these measurements. In this position paper, we formalize the problem of uncertainty quantification in generative model learning. We discuss potential research directions, including the use of ensemble-based precision-recall curves. Our preliminary experiments on synthetic datasets demonstrate the effectiveness of aggregated precision-recall curves in capturing model approximation uncertainty, enabling systematic comparison among different model architectures based on their uncertainty characteristics.
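A small sketch of the ensemble-aggregated precision-recall curves described above: each member's PR curve is interpolated onto a shared recall grid, and the spread across members serves as the uncertainty estimate (the aggregation rule used here, mean with a standard-deviation band, is an assumption):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def aggregate_pr_curves(score_sets, y_true, grid=np.linspace(0.01, 0.99, 99)):
    """Interpolate each member's PR curve onto a shared recall grid; return mean and std."""
    curves = []
    for scores in score_sets:
        precision, recall, _ = precision_recall_curve(y_true, scores)
        order = np.argsort(recall)  # recall comes out decreasing; np.interp needs it increasing
        curves.append(np.interp(grid, recall[order], precision[order]))
    curves = np.vstack(curves)
    return curves.mean(axis=0), curves.std(axis=0)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
members = [y + rng.normal(0, s, 5000) for s in (0.8, 0.9, 1.0, 1.1)]  # 4 ensemble "models"
mean_p, std_p = aggregate_pr_curves(members, y)
print(f"precision at recall=0.5: {mean_p[49]:.3f} +/- {std_p[49]:.3f}")
```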


【3】Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models
标题:守护意义:守护模型中语义稳健性的自我监督训练
链接:https://arxiv.org/abs/2511.10665

作者:Cristina Pinneri,Christos Louizos
摘要:守卫模型是LLM安全的关键组成部分,但它们对表面语言变化的敏感性仍然是一个关键漏洞。我们发现,即使是保留原义的释义也会导致安全分数的大幅波动,揭示了语义基础的缺乏。为了解决这个问题,我们引入了一个实用的自监督框架,以提高守卫模型的语义鲁棒性。我们的方法利用释义集来强制预测一致性,并使用一种新颖的偏斜感知聚合策略来计算鲁棒目标。值得注意的是,我们发现平均值和中位数等标准聚合方法可能降低安全性,凸显了对偏斜感知替代方案的需求。我们分析了六个开源守卫模型,并表明我们的方法将释义间的语义变化降低了约58%,平均提高了约2.5%的基准准确率,并能推广到未见过的风格变化。有趣的是,我们发现了模型校准和一致性之间的双向关系:我们的鲁棒性训练将校准提高了多达40%,揭示了这些属性之间的基本联系。这些结果突出了将语义一致性视为一流训练目标的价值,并为构建更可靠的守卫模型提供了可扩展的方案。
摘要:Guard models are a critical component of LLM safety, but their sensitivity to superficial linguistic variations remains a key vulnerability. We show that even meaning-preserving paraphrases can cause large fluctuations in safety scores, revealing a lack of semantic grounding. To address this, we introduce a practical, self-supervised framework for improving the semantic robustness of guard models. Our method leverages paraphrase sets to enforce prediction consistency using a novel, skew-aware aggregation strategy for robust target computation. Notably, we find that standard aggregation methods like mean and median can degrade safety, underscoring the need for skew-aware alternatives. We analyze six open-source guard models and show that our approach reduces semantic variability across paraphrases by ~58%, improves benchmark accuracy by ~2.5% on average, and generalizes to unseen stylistic variations. Intriguingly, we discover a bidirectional relationship between model calibration and consistency: our robustness training improves calibration by up to 40%, revealing a fundamental connection between these properties. These results highlight the value of treating semantic consistency as a first-class training objective and provide a scalable recipe for building more reliable guard models.
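The paper's skew-aware aggregation rule is not specified in the abstract; the following is an assumed illustration of the idea that mean or median targets can wash out a skewed paraphrase set, here by pulling the target toward the unsafe tail when the score distribution is skewed:

```python
import numpy as np
from scipy.stats import skew

def skew_aware_aggregate(paraphrase_scores, tail_quantile=0.9, threshold=0.5):
    """Illustrative skew-aware target: with roughly symmetric scores use the median;
    when the distribution is skewed, move the target toward the riskier tail instead of
    letting mean/median wash the risk out. (Assumed design, not the paper's exact rule.)"""
    s = np.asarray(paraphrase_scores, dtype=float)
    g = skew(s)
    if abs(g) < threshold:
        return float(np.median(s))
    q = tail_quantile if g > 0 else 1.0 - tail_quantile
    return float(np.quantile(s, q))

# A paraphrase set where one phrasing trips the guard much harder than the rest:
print(skew_aware_aggregate([0.10, 0.12, 0.15, 0.11, 0.85]))
# skewed -> target pulled toward the 0.85 tail rather than the 0.12 median
```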


【4】Patent Representation Learning via Self-supervision
标题:通过自我监督进行专利表示学习
链接:https://arxiv.org/abs/2511.10657

作者:You Zuo,Kim Gerdes,Eric Villemonte de La Clergerie,Benoît Sagot
摘要:本文提出了一个简单而有效的对比学习框架,通过利用同一文档中的多个视图来学习专利嵌入。我们首先识别出SimCSE风格dropout增强的一种专利特有的失效模式:它产生过于均匀的嵌入,失去语义凝聚力。为了弥补这一点,我们提出了基于章节的增强,其中专利的不同部分(例如,摘要、权利要求、背景)作为互补视图。这种设计引入了自然的语义和结构多样性,减轻了过度分散,并产生了更好地保持全局结构和局部连续性的嵌入。在大规模基准测试中,我们的完全自监督方法在现有技术检索和分类中匹配或超过了引文监督和IPC监督的基线,同时避免依赖脆弱或不完整的标注。我们的分析进一步表明,不同的部分专精于不同的任务:权利要求和摘要有利于检索,而背景部分有助于分类,这凸显了专利固有语篇结构对表示学习的价值。这些结果突出了利用文档内视图进行可扩展和可推广的专利理解的价值。
摘要:This paper presents a simple yet effective contrastive learning framework for learning patent embeddings by leveraging multiple views from within the same document. We first identify a patent-specific failure mode of SimCSE style dropout augmentation: it produces overly uniform embeddings that lose semantic cohesion. To remedy this, we propose section-based augmentation, where different sections of a patent (e.g., abstract, claims, background) serve as complementary views. This design introduces natural semantic and structural diversity, mitigating over-dispersion and yielding embeddings that better preserve both global structure and local continuity. On large-scale benchmarks, our fully self-supervised method matches or surpasses citation-and IPC-supervised baselines in prior-art retrieval and classification, while avoiding reliance on brittle or incomplete annotations. Our analysis further shows that different sections specialize for different tasks-claims and summaries benefit retrieval, while background sections aid classification-highlighting the value of patents' inherent discourse structure for representation learning. These results highlight the value of exploiting intra-document views for scalable and generalizable patent understanding.
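The contrastive objective with sections as views reduces, in its simplest form, to InfoNCE with in-batch negatives; a minimal PyTorch sketch (the encoders and the choice of section pair are abstracted, and the batch is random toy data):

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.05):
    """InfoNCE over two views: row i of z_a (e.g., the claims embedding) must match
    row i of z_b (e.g., the abstract embedding of the same patent); other rows in the
    batch serve as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature          # (B, B) cosine-similarity logits
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Toy batch: 16 patents, 256-d section embeddings from any encoder.
loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```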


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】Experience-Guided Adaptation of Inference-Time Reasoning Strategies
标题:经验引导的推理时推理策略适应
链接:https://arxiv.org/abs/2511.11519

作者:Adam Stein,Matthew Trager,Benjamin Bowman,Michael Kleinman,Aditya Chattopadhyay,Wei Xia,Stefano Soatto
备注:29 pages, 5 figures
摘要:使代理人工智能系统能够基于训练后的交互来调整其解决问题的方法仍然是一个根本性的挑战。虽然已经提出了在推理时更新和维护记忆的系统,但现有的设计只能通过修改语言模型或智能体的文本输入来引导系统,这意味着它们不能改变采样参数、删除工具、修改系统提示或在智能体和工作流范式之间切换。另一方面,适应性更灵活的系统需要离线优化,并且在部署后保持静态。我们提出了经验引导推理器(EGuR),它基于积累的经验,在推理时动态生成量身定制的策略,即涉及LLM调用、工具、采样参数和控制逻辑的完整计算程序。我们使用基于LLM的元策略来实现这一目标(一种输出策略的策略),使所有策略组件(提示、采样参数、工具配置和控制逻辑)都能被调整。EGuR通过两个组件运行:引导器(Guide)生成多个以当前问题和过去经验的结构化记忆为条件的候选策略,而整合器(Consolidator)则整合执行反馈以改进未来的策略生成。这将生成针对每个问题进行优化的完整的、可随时运行的策略,这些策略可以根据需要进行缓存、检索和执行,而不会浪费资源。在五个具有挑战性的基准测试(AIME 2025、3-SAT和三个BIG-Bench Extra Hard任务)中,EGuR在最强基线上实现了高达14%的准确率提升,同时将计算成本降低了多达111倍,且随着系统获得经验,这两个指标都会改善。
摘要:Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which generates tailored strategies -- complete computational procedures involving LLM calls, tools, sampling parameters, and control logic -- dynamically at inference time based on accumulated experience. We achieve this using an LLM-based meta-strategy -- a strategy that outputs strategies -- enabling adaptation of all strategy components (prompts, sampling parameters, tool configurations, and control logic). EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.


【2】SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming
标题:SPOT:通过可训练的近场彩虹波束成形实现单发定位
链接:https://arxiv.org/abs/2511.11391

作者:Yeyue Cai,Jianhua Mo,Meixia Tao
摘要:相位-时间阵列集成了移相器(PS)和真时延线(TTD),是一种在宽带传感与定位系统中生成频率相关彩虹波束的高性价比架构。本文提出了一种基于端到端深度学习的方案,同时设计彩虹波束并估计用户位置。将PS和TTD系数视为可训练变量,使网络能够合成最大化定位精度的面向任务的波束。然后,在单次下行链路传输之后,轻量级全连接模块从用户反馈的最大量化接收功率及其对应的子载波索引中恢复用户的角度-距离坐标。与现有的解析方案和基于学习的方案相比,所提出的方法将开销降低了一个数量级,并始终提供更低的二维定位误差。
摘要:Phase-time arrays, which integrate phase shifters (PSs) and true-time delays (TTDs), have emerged as a cost-effective architecture for generating frequency-dependent rainbow beams in wideband sensing and localization. This paper proposes an end-to-end deep learning-based scheme that simultaneously designs the rainbow beams and estimates user positions. Treating the PS and TTD coefficients as trainable variables allows the network to synthesize task-oriented beams that maximize localization accuracy. A lightweight fully connected module then recovers the user's angle-range coordinates from its feedback of the maximum quantized received power and its corresponding subcarrier index after a single downlink transmission. Compared with existing analytical and learning-based schemes, the proposed method reduces overhead by an order of magnitude and delivers consistently lower two-dimensional positioning error.


【3】Adaptive Symmetrization of the KL Divergence
标题:KL散度的自适应对称化
链接:https://arxiv.org/abs/2511.11159

作者:Omri Ben-Dov,Luiz F. O. Chamon
摘要:机器学习中的许多任务可以被描述为或归结为在给定有限样本集的情况下学习一个概率分布。一种常见的方法是最小化(经验)数据分布与参数化分布(例如归一化流(NF)或基于能量的模型(EBM))之间的统计散度。在这种情况下,前向KL散度因其易处理性而无处不在,尽管其不对称性可能导致无法捕获目标分布的某些特性。对称的替代方案要么涉及脆弱的最小-最大公式和对抗训练(例如生成对抗网络),要么需要评估反向KL散度,如对称的Jeffreys散度的情形,而后者很难从样本中计算。这项工作旨在开发一种最小化Jeffreys散度的新方法。为此,它使用了一个代理模型,其目标不仅是拟合数据,还要协助优化主模型的Jeffreys散度。该联合训练任务被公式化为约束优化问题,从而得到一个在整个训练过程中自适应调整模型优先级的实用算法。我们展示了如何利用该框架,在密度估计、图像生成和基于模拟的推理等任务中结合NF和EBM的优势。
摘要:Many tasks in machine learning can be described as or reduced to learning a probability distribution given a finite set of samples. A common approach is to minimize a statistical divergence between the (empirical) data distribution and a parameterized distribution, e.g., a normalizing flow (NF) or an energy-based model (EBM). In this context, the forward KL divergence is ubiquitous due to its tractability, though its asymmetry may prevent capturing some properties of the target distribution. Symmetric alternatives involve brittle min-max formulations and adversarial training (e.g., generative adversarial networks) or evaluating the reverse KL divergence, as is the case for the symmetric Jeffreys divergence, which is challenging to compute from samples. This work sets out to develop a new approach to minimize the Jeffreys divergence. To do so, it uses a proxy model whose goal is not only to fit the data, but also to assist in optimizing the Jeffreys divergence of the main model. This joint training task is formulated as a constrained optimization problem to obtain a practical algorithm that adapts the model's priorities throughout training. We illustrate how this framework can be used to combine the advantages of NFs and EBMs in tasks such as density estimation, image generation, and simulation-based inference.
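For discrete distributions the target divergence is simply the sum of the two KL directions, $J(p,q)=\mathrm{KL}(p\|q)+\mathrm{KL}(q\|p)$; the difficulty the abstract refers to is evaluating the reverse term from samples alone, which the proxy model is meant to address. A worked example:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def jeffreys(p, q):
    """Symmetric Jeffreys divergence: J(p, q) = KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
print(kl(p, q), kl(q, p))   # the two directions differ: KL is asymmetric
print(jeffreys(p, q))       # the symmetric combination the paper sets out to minimize
```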


【4】One-Shot Transfer Learning for Nonlinear PDEs with Perturbative PINNs
标题:基于扰动PINN的非线性偏微分方程一次性迁移学习
链接:https://arxiv.org/abs/2511.11137

作者:Samuel Auroy,Pavlos Protopapas
备注:Accepted at Machine Learning and the Physical Sciences Workshop, NeurIPS 2025
摘要:我们提出了一个求解非线性偏微分方程(PDE)的框架,将扰动理论与物理信息神经网络(PINN)中的一次性迁移学习相结合。将含多项式项的非线性偏微分方程分解为一系列线性子问题,并使用多头PINN高效求解。一旦学习到线性算子的潜在表示,便可在不重新训练的情况下以封闭形式获得具有不同扰动、强迫项或边界/初始条件的新PDE实例的解。   我们在KPP-Fisher方程和波动方程上验证了该方法,实现了1e-3量级的误差,同时在0.2秒内适应新的问题实例;精度与经典求解器相当,但迁移速度更快。敏感性分析表明误差随ε和多项式次数可预测地增长,澄清了该方法的有效适用范围。   我们的贡献是:(i)将一次性迁移学习从非线性常微分方程扩展到偏微分方程,(ii)推导出适应新PDE实例的封闭形式解,以及(iii)在典型非线性偏微分方程上展示了准确性和效率。最后,我们概述了对导数依赖非线性和高维偏微分方程的扩展。
摘要:We propose a framework for solving nonlinear partial differential equations (PDEs) by combining perturbation theory with one-shot transfer learning in Physics-Informed Neural Networks (PINNs). Nonlinear PDEs with polynomial terms are decomposed into a sequence of linear subproblems, which are efficiently solved using a Multi-Head PINN. Once the latent representation of the linear operator is learned, solutions to new PDE instances with varying perturbations, forcing terms, or boundary/initial conditions can be obtained in closed form without retraining.   We validate the method on KPP-Fisher and wave equations, achieving errors on the order of 1e-3 while adapting to new problem instances in under 0.2 seconds; comparable accuracy to classical solvers but with faster transfer. Sensitivity analyses show predictable error growth with epsilon and polynomial degree, clarifying the method's effective regime.   Our contributions are: (i) extending one-shot transfer learning from nonlinear ODEs to PDEs, (ii) deriving a closed-form solution for adapting to new PDE instances, and (iii) demonstrating accuracy and efficiency on canonical nonlinear PDEs. We conclude by outlining extensions to derivative-dependent nonlinearities and higher-dimensional PDEs.


【5】Scalable Population Training for Zero-Shot Coordination
标题:Zero-Shot协调的可扩展种群训练
链接:https://arxiv.org/abs/2511.11083

作者:Bingyu Hui,Lebin Yu,Quanming Yao,Yunpeng Qu,Xudong Zhang,Jian Wang
摘要:零样本协调(Zero-Shot Coordination,ZSC)是近年来强化学习研究的热点。它关注智能体的泛化能力,要求它们在没有任何微调的情况下与从未见过的合作者良好协调。基于种群的训练已被证明能提供良好的零样本协调性能;然而,现有方法受计算资源限制,主要集中在优化小种群的多样性,而忽略了扩大种群规模带来的潜在性能增益。为了解决这个问题,本文提出了可扩展种群训练(ScaPT),一个高效的训练框架,包括两个关键组成部分:一个元智能体,通过在各智能体间选择性共享参数来高效地实现种群;以及一个互信息正则化器,保证种群的多样性。为了实证验证ScaPT的有效性,本文在Hanabi中将其与代表性框架一起评估,并确认了其优越性。
摘要:Zero-shot coordination(ZSC) has become a hot topic in reinforcement learning research recently. It focuses on the generalization ability of agents, requiring them to coordinate well with collaborators that are not seen before without any fine-tuning. Population-based training has been proven to provide good zero-shot coordination performance; nevertheless, existing methods are limited by computational resources, mainly focusing on optimizing diversity in small populations while neglecting the potential performance gains from scaling population size. To address this issue, this paper proposes the Scalable Population Training (ScaPT), an efficient training framework comprising two key components: a meta-agent that efficiently realizes a population by selectively sharing parameters across agents, and a mutual information regularizer that guarantees population diversity. To empirically validate the effectiveness of ScaPT, this paper evaluates it along with representational frameworks in Hanabi and confirms its superiority.


【6】Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
标题:通过模型平均对正-无标签数据进行异构多源迁移学习
链接:https://arxiv.org/abs/2511.10919

作者:Jialei Liu,Jun Liao,Kuangnan Fang
摘要:由于缺乏明确标记的负样本,正-无标签(PU)学习面临独特的挑战,特别是在欺诈检测和医疗诊断等高风险领域。为了解决数据稀缺和隐私限制问题,我们提出了一种新颖的基于模型平均的迁移学习框架,该框架在无需直接共享数据的情况下,整合来自异构数据源(包括完全二元标记、半监督和PU数据集)的信息。对每种源域类型分别拟合定制的逻辑回归模型,并通过模型平均将知识迁移到PU目标域。组合源模型的最优权重通过最小化Kullback-Leibler散度的交叉验证准则来确定。我们为权重最优性和收敛性建立了理论保证,涵盖误设和正确设定的目标模型,并利用稀疏惩罚估计进一步扩展到高维设置。大量模拟和真实信用风险数据分析表明,我们的方法在预测准确性和鲁棒性方面优于其他比较方法,特别是在标记数据有限和异构环境下。
摘要:Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning with model averaging framework that integrates information from heterogeneous data sources - including fully binary labeled, semi-supervised, and PU data sets - without direct data sharing. For each source domain type, a tailored logistic regression model is conducted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining source models are determined via a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence, covering both misspecified and correctly specified target models, with further extensions to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and real-world credit risk data analyses demonstrate that our method outperforms other comparative methods in terms of predictive accuracy and robustness, especially under limited labeled data and heterogeneous environments.
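Since minimizing KL divergence to the data distribution is equivalent to minimizing log-loss, the weight selection can be sketched as constrained optimization over the simplex; the example below uses plain validation log-loss for binary labels and omits the paper's cross-validation and PU-specific corrections:

```python
import numpy as np
from scipy.optimize import minimize

def optimal_weights(val_probs, y_val, eps=1e-12):
    """Weights on the simplex minimizing validation log-loss of the averaged model
    (minimizing log-loss is equivalent to minimizing KL to the data distribution).
    val_probs: (n_models, n_samples) predicted P(y=1); y_val: binary labels."""
    K = val_probs.shape[0]

    def nll(w):
        p = np.clip(w @ val_probs, eps, 1 - eps)
        return -np.mean(y_val * np.log(p) + (1 - y_val) * np.log(1 - p))

    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(nll, np.full(K, 1.0 / K), bounds=[(0, 1)] * K, constraints=cons)
    return res.x

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
probs = np.vstack([np.clip(y * 0.7 + rng.uniform(0, 0.3, 500), 0.01, 0.99),  # informative source
                   rng.uniform(0.3, 0.7, 500)])                              # uninformative source
print(optimal_weights(probs, y))  # puts most of the mass on the informative source model
```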


强化学习(2篇)

【1】LoRaCompass: Robust Reinforcement Learning to Efficiently Search for a LoRa Tag
标题:LoRaCompass:鲁棒的强化学习,高效搜索LoRa标签
链接:https://arxiv.org/abs/2511.11190

作者:Tianlang He,Zhongming Lin,Tianrui Jiang,S. -H. Gary Chan
摘要:长距离(LoRa)协议以其广阔的覆盖范围和低功耗而闻名,越来越多地被用于精神上无行为能力的人(MIP)及其他有走失风险人群所佩戴的标签。我们研究了移动传感器的顺序决策过程,以便在一般未知环境中,在接收信号强度指示器(RSSI)的引导下,以最少的移动(跳数)定位周期性广播的LoRa标签。虽然现有方法利用强化学习进行搜索,但它们仍然容易受到域偏移和信号波动的影响,导致级联决策错误,最终造成严重的定位不准确。为了弥合这一差距,我们提出了LoRaCompass,这是一种旨在实现对LoRa标签鲁棒而高效搜索的强化学习模型。为在域偏移和信号波动下进行利用,LoRaCompass通过空间感知特征提取器和策略蒸馏损失函数,从RSSI中学习鲁棒的空间表示,以最大化靠近标签的概率。它还引入了一个受置信上界(UCB)启发的探索函数,以不断增强的置信度引导传感器走向标签。我们已在覆盖面积超过80平方公里的多种未见环境中,于地面和无人机辅助场景下验证了LoRaCompass。它在100米范围内定位标签表现出高成功率(>90%,比现有方法提高40%)和高效率,搜索路径长度(以跳数计)与初始距离呈线性关系。
摘要:The Long-Range (LoRa) protocol, known for its extensive range and low power, has increasingly been adopted in tags worn by mentally incapacitated persons (MIPs) and others at risk of going missing. We study the sequential decision-making process for a mobile sensor to locate a periodically broadcasting LoRa tag with the fewest moves (hops) in general, unknown environments, guided by the received signal strength indicator (RSSI). While existing methods leverage reinforcement learning for search, they remain vulnerable to domain shift and signal fluctuation, resulting in cascading decision errors that culminate in substantial localization inaccuracies. To bridge this gap, we propose LoRaCompass, a reinforcement learning model designed to achieve robust and efficient search for a LoRa tag. For exploitation under domain shift and signal fluctuation, LoRaCompass learns a robust spatial representation from RSSI to maximize the probability of moving closer to a tag, via a spatially-aware feature extractor and a policy distillation loss function. It further introduces an exploration function inspired by the upper confidence bound (UCB) that guides the sensor toward the tag with increasing confidence. We have validated LoRaCompass in ground-based and drone-assisted scenarios within diverse unseen environments covering an area of over 80km^2. It has demonstrated high success rate (>90%) in locating the tag within 100m proximity (a 40% improvement over existing methods) and high efficiency with a search path length (in hops) that scales linearly with the initial distance.
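The exploration function is only described as UCB-inspired, so here is the generic UCB form it presumably builds on: exploit directions with high estimated RSSI gain, plus a bonus that decays as a direction is revisited (all numbers below are illustrative):

```python
import numpy as np

def ucb_action(mean_gain, visit_counts, total_steps, c=1.0):
    """UCB-style move selection: prefer directions with high estimated RSSI gain,
    but add an exploration bonus that shrinks with repeated visits."""
    bonus = c * np.sqrt(np.log(total_steps + 1) / (visit_counts + 1e-9))
    return int(np.argmax(mean_gain + bonus))

# 4 candidate moves (N/E/S/W): estimated RSSI improvement and visit counts so far.
mean_gain = np.array([0.8, 1.2, -0.3, 0.1])
visits = np.array([10, 12, 2, 1])
print(ucb_action(mean_gain, visits, total_steps=25))  # picks the rarely tried direction (index 3)
```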


【2】Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning
标题:行为策略优化:离策略强化学习中可证明更低方差的回报估计
链接:https://arxiv.org/abs/2511.10843

作者:Alexander W. Goodall,Edwin Hamel-De le Court,Francesco Belardinelli
备注:Accepted at AAAI 2026 (main track)
摘要:许多强化学习算法,特别是那些依赖回报估计来改进策略的算法,可能由于高方差的回报估计而遭受样本效率低下和训练不稳定的问题。在本文中,我们利用离策略评估的最新结果:最近已证明,精心设计的行为策略可用于收集离策略数据,从而获得可证明更低方差的回报估计。这一结果令人惊讶,因为它意味着按同策略收集数据并非方差最优。我们将这一关键见解扩展到在线强化学习设置,其中策略评估和改进交替进行以学习最优策略。离策略RL已被充分研究(例如IMPALA),通过校正且截断的重要性加权样本实现去偏并妥善控制方差。一般来说,这些方法关注的是协调并行地从多个工作者收集的数据,在策略被异步更新时,以数学上合理的方式纠正工作者与策略之间的不匹配。在这里,我们只考虑一个工作者,即行为策略,用它收集数据以进行策略改进,并获得可证明更低方差的回报估计。在我们的实验中,我们用这一机制扩展了两种策略梯度方法,在多种环境中展示出更好的样本效率和性能。
摘要:Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we leverage new results from off-policy evaluation; it has recently been shown that well-designed behaviour policies can be used to collect off-policy data for provably lower variance return estimates. This result is surprising as it means collecting data on-policy is not variance optimal. We extend this key insight to the online reinforcement learning setting, where both policy evaluation and improvement are interleaved to learn optimal policies. Off-policy RL has been well studied (e.g., IMPALA), with correct and truncated importance weighted samples for de-biasing and managing variance appropriately. Generally these approaches are concerned with reconciling data collected from multiple workers in parallel, while the policy is updated asynchronously; mismatch between the workers and the policy is corrected in a mathematically sound way. Here we consider only one worker - the behaviour policy, which is used to collect data for policy improvement, with provably lower variance return estimates. In our experiments we extend two policy-gradient methods with this regime, demonstrating better sample efficiency and performance over a diverse set of environments.
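The core claim, that on-policy data is not variance-optimal for return estimation, can be reproduced in a few lines with importance sampling on a toy bandit; the second behaviour policy below oversamples the noisy action (roughly $\mu^*(a)\propto\pi(a)\sqrt{\mathbb{E}[r^2\mid a]}$) and yields an unbiased estimate with visibly lower per-sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([1.0, 0.0])       # true action values
noise = np.array([0.1, 3.0])   # the rare action has very noisy rewards
pi = np.array([0.9, 0.1])      # target policy to evaluate; its true value is 0.9

def is_estimate(mu, n=200_000):
    """Importance-sampling value estimate of pi from data collected under behaviour mu."""
    a = rng.choice(2, size=n, p=mu)
    r = q[a] + rng.normal(0, noise[a])
    w = pi[a] / mu[a]          # importance weights keep the estimate unbiased
    return (w * r).mean(), (w * r).std()

for mu in (pi, np.array([0.75, 0.25])):  # on-policy vs. a variance-reducing behaviour
    m, s = is_estimate(mu)
    print(f"behaviour {mu}: estimate {m:.3f}, per-sample std {s:.3f}")
```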


分层学习(1篇)

【1】PROMISE: Prompt-Attentive Hierarchical Contrastive Learning for Robust Cross-Modal Representation with Missing Modalities
标题:PROMISE:提示注意力分层对比学习,用于模态缺失下的鲁棒跨模态表示
链接:https://arxiv.org/abs/2511.10997

作者:Jiajun Chen,Sai Cheng,Yutao Yuan,Yirui Zhang,Haitao Yuan,Peng Peng,Yi Zhong
备注:Accepted by AAAI'2026 Main Conference
摘要:集成自然语言和视觉信息的多模态模型大大提高了表示模型的泛化能力。然而,在某些模态缺失或不可用的现实情况下,其有效性显著下降。这种退化主要源于完整多模态数据与不完整模态场景之间不一致的表示学习。现有方法通常通过相对简单的生成方法来处理缺失模态,但这些方法无法充分保持跨模态一致性,从而导致次优性能。为了克服这一局限,我们提出了一种名为PROMISE的新型多模态框架,即一种提示注意力分层对比学习(PROMpting-Attentive HIerarchical ContraStive LEarning)方法,专为模态缺失条件下的鲁棒跨模态表示而设计。具体来说,PROMISE创新地将多模态提示学习纳入分层对比学习框架,并配备了专门设计的提示注意力机制。这种机制为特定模态缺失的情况动态生成鲁棒且一致的表示,从而有效弥合完整与不完整数据之间的表示差距。在基准数据集上进行的大量实验以及全面的消融研究清楚地表明,与当前最先进的多模态方法相比,PROMISE具有优越的性能。
摘要:Multimodal models integrating natural language and visual information have substantially improved generalization of representation models. However, their effectiveness significantly declines in real-world situations where certain modalities are missing or unavailable. This degradation primarily stems from inconsistent representation learning between complete multimodal data and incomplete modality scenarios. Existing approaches typically address missing modalities through relatively simplistic generation methods, yet these approaches fail to adequately preserve cross-modal consistency, leading to suboptimal performance. To overcome this limitation, we propose a novel multimodal framework named PROMISE, a PROMpting-Attentive HIerarchical ContraStive LEarning approach designed explicitly for robust cross-modal representation under conditions of missing modalities. Specifically, PROMISE innovatively incorporates multimodal prompt learning into a hierarchical contrastive learning framework, equipped with a specially designed prompt-attention mechanism. This mechanism dynamically generates robust and consistent representations for scenarios where particular modalities are absent, thereby effectively bridging the representational gap between complete and incomplete data. Extensive experiments conducted on benchmark datasets, along with comprehensive ablation studies, clearly demonstrate the superior performance of PROMISE compared to current state-of-the-art multimodal methods.


医学相关(7篇)

【1】VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation
标题:VoxTell:自由文本可提示的通用3D医学图像分割
链接:https://arxiv.org/abs/2511.11450

作者:Maximilian Rokuss,Moritz Langenberg,Yannick Kirchhoff,Fabian Isensee,Benjamin Hamm,Constantin Ulrich,Sebastian Regnery,Lukas Bauer,Efthimios Katsigiannopulos,Tobias Norajitra,Klaus Maier-Hein
摘要:我们介绍VoxTell,一个用于文本提示的体积医学图像分割的视觉语言模型。它将自由形式的描述(从单个单词到完整的临床句子)映射为3D掩膜。VoxTell在超过62K个CT、MRI和PET体积上进行训练,涵盖1K多个解剖学和病理学类别,在解码器各层上使用多阶段视觉语言融合,以在多个尺度上对齐文本和视觉特征。它在未见数据集上实现了跨模态的最先进零样本性能,在熟悉的概念上表现出色,同时可推广到相关的未见类别。大量实验进一步证明了强大的跨模态迁移能力、对语言变化和临床用语的鲁棒性,以及从真实世界文本进行准确的实例级分割。代码可在:https://www.github.com/MIC-DKFZ/VoxTell
摘要:We introduce VoxTell, a vision-language model for text-prompted volumetric medical image segmentation. It maps free-form descriptions, from single words to full clinical sentences, to 3D masks. Trained on 62K+ CT, MRI, and PET volumes spanning over 1K anatomical and pathological classes, VoxTell uses multi-stage vision-language fusion across decoder layers to align textual and visual features at multiple scales. It achieves state-of-the-art zero-shot performance across modalities on unseen datasets, excelling on familiar concepts while generalizing to related unseen classes. Extensive experiments further demonstrate strong cross-modality transfer, robustness to linguistic variations and clinical language, as well as accurate instance-specific segmentation from real-world text. Code is available at: https://www.github.com/MIC-DKFZ/VoxTell


【2】Toward Scalable Early Cancer Detection: Evaluating EHR-Based Predictive Models Against Traditional Screening Criteria
标题:迈向可扩展的早期癌症检测:根据传统筛查标准评估基于EHR的预测模型
链接:https://arxiv.org/abs/2511.11293

作者:Jiheum Park,Chao Pang,Tristan Y. Lee,Jeong Yun Yang,Jacob Berkowitz,Alexander Z. Wei,Nicholas Tatonetti
摘要:目前的癌症筛查指南仅涵盖少数癌症类型,并依赖于严格定义的标准,如年龄或吸烟史等单一风险因素,以识别高风险个体。使用电子健康记录(EHR)的预测模型可以捕获大规模纵向患者水平的健康信息,可以通过检测癌症的微妙预诊断信号来识别高危人群,从而提供更有效的工具。大型语言和基础模型的最新进展进一步扩大了这一潜力,但关于基于EHR的模型相比目前筛查指南所用传统风险因素有多大用处,证据仍然有限。我们系统地评估了基于EHR的预测模型相对于传统风险因素(包括基因突变和癌症家族史)的临床效用,以识别八种主要癌症(乳腺癌、肺癌、结直肠癌、前列腺癌、卵巢癌、肝癌、胰腺癌和胃癌)的高风险个体,使用的数据来自All of Us研究计划,该计划整合了超过865,000名参与者的EHR、基因组和调查数据。即使使用基线建模方法,与单独使用传统风险因素相比,基于EHR的模型在被确定为高风险的个体中实现了3至6倍的真实癌症病例富集,无论是作为独立工具还是作为补充工具。EHR基础模型是一种在完整患者轨迹上训练的最先进方法,进一步提高了26种癌症类型的预测性能,证明了基于EHR的预测建模在支持更精确和可扩展的早期检测策略方面的临床潜力。
摘要:Current cancer screening guidelines cover only a few cancer types and rely on narrowly defined criteria such as age or a single risk factor like smoking history, to identify high-risk individuals. Predictive models using electronic health records (EHRs), which capture large-scale longitudinal patient-level health information, may provide a more effective tool for identifying high-risk groups by detecting subtle prediagnostic signals of cancer. Recent advances in large language and foundation models have further expanded this potential, yet evidence remains limited on how useful EHR-based models are compared with traditional risk factors currently used in screening guidelines. We systematically evaluated the clinical utility of EHR-based predictive models against traditional risk factors, including gene mutations and family history of cancer, for identifying high-risk individuals across eight major cancers (breast, lung, colorectal, prostate, ovarian, liver, pancreatic, and stomach), using data from the All of Us Research Program, which integrates EHR, genomic, and survey data from over 865,000 participants. Even with a baseline modeling approach, EHR-based models achieved a 3- to 6-fold higher enrichment of true cancer cases among individuals identified as high risk compared with traditional risk factors alone, whether used as a standalone or complementary tool. The EHR foundation model, a state-of-the-art approach trained on comprehensive patient trajectories, further improved predictive performance across 26 cancer types, demonstrating the clinical potential of EHR-based predictive modeling to support more precise and scalable early detection strategies.


【3】PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI
标题:PINGS-X:具有轴对齐的物理信息归一化高斯溅射,用于4D血流MRI的高效超分辨率
链接:https://arxiv.org/abs/2511.11048

作者:Sun Jo,Seok Young Hong,JinHyun Kim,Seungmin Kang,Ahjin Choi,Don-Gwan An,Simon Song,Je Hyeong Hong
备注:Accepted at AAAI 2026. Supplementary material included after references. 27 pages, 21 figures, 11 tables
摘要:4D血流磁共振成像(MRI)是一种可靠的、非侵入性的方法,用于估计血流速度,这对心血管诊断至关重要。与聚焦于解剖结构的传统MRI不同,4D Flow MRI需要高时空分辨率,以便早期检测狭窄或动脉瘤等关键情况。然而,实现这样的分辨率通常会导致延长扫描时间,从而在采集速度和预测精度之间产生权衡。最近的研究已经利用物理信息神经网络(PINN)来实现MRI数据的超分辨率,但它们的实用性有限,因为必须对每个患者进行非常缓慢的训练过程。为了克服这一限制,我们提出了PINGS-X,一个新的框架建模高分辨率的流速轴对齐的时空高斯表示。受到3D高斯溅射(3DGS)在新颖视图合成中的有效性的启发,PINGS-X通过几个重要的新颖创新扩展了这一概念:(i)具有形式收敛保证的归一化高斯溅射,(ii)简化高维数据的训练同时保持精度和收敛保证的轴对齐高斯,以及(iii)高斯合并过程以防止退化解并提高计算效率。在计算流体力学(CFD)和真实4D流动MRI数据集上的实验结果表明,PINGS-X大大减少了训练时间,同时实现了卓越的超分辨率精度。我们的代码和数据集可以在https://github.com/SpatialAILab/PINGS-X上找到。
摘要:4D flow magnetic resonance imaging (MRI) is a reliable, non-invasive approach for estimating blood flow velocities, vital for cardiovascular diagnostics. Unlike conventional MRI focused on anatomical structures, 4D flow MRI requires high spatiotemporal resolution for early detection of critical conditions such as stenosis or aneurysms. However, achieving such resolution typically results in prolonged scan times, creating a trade-off between acquisition speed and prediction accuracy. Recent studies have leveraged physics-informed neural networks (PINNs) for super-resolution of MRI data, but their practical applicability is limited as the prohibitively slow training process must be performed for each patient. To overcome this limitation, we propose PINGS-X, a novel framework modeling high-resolution flow velocities using axes-aligned spatiotemporal Gaussian representations. Inspired by the effectiveness of 3D Gaussian splatting (3DGS) in novel view synthesis, PINGS-X extends this concept through several non-trivial novel innovations: (i) normalized Gaussian splatting with a formal convergence guarantee, (ii) axes-aligned Gaussians that simplify training for high-dimensional data while preserving accuracy and the convergence guarantee, and (iii) a Gaussian merging procedure to prevent degenerate solutions and boost computational efficiency. Experimental results on computational fluid dynamics (CFD) and real 4D flow MRI datasets demonstrate that PINGS-X substantially reduces training time while achieving superior super-resolution accuracy. Our code and datasets are available at https://github.com/SpatialAILab/PINGS-X.


【4】CardioEmbed: Domain-Specialized Text Embeddings for Clinical Cardiology
标题:CardioEmbed:临床心脏病学的领域专业文本嵌入
链接:https://arxiv.org/abs/2511.10930

作者:Richard J. Young,Alice M. Matthews
备注:14 pages, 6 figures
摘要:生物医学文本嵌入主要是使用PubMed的研究文献开发的,但临床心脏病学实践在很大程度上依赖于综合教科书中的程序性知识和专业术语,而不是研究摘要。这种研究与实践的差距限制了现有嵌入模型在心脏病学临床应用中的有效性。本研究训练了CardioEmbed,一个基于Qwen3-Embedding-8B的领域专用嵌入模型,在七本综合心脏病学教科书的精选语料库(去重后共约150,000句)上进行对比学习。该模型采用带批内负样本的InfoNCE损失,在心脏特定语义检索任务上实现了99.60%的检索准确率,比当前最先进的医学嵌入模型MedTE提高了15.94个百分点。在MTEB医学基准上,该模型获得了BIOSSES 0.77 Spearman和SciFact 0.61 NDCG@10,表明在相关生物医学领域具有竞争力的性能。在综合临床教科书上的领域专业化训练产生了近乎完美的心脏病学检索(99.60% Acc@1),比MedTE提高了15.94个百分点。
摘要:Biomedical text embeddings have primarily been developed using research literature from PubMed, yet clinical cardiology practice relies heavily on procedural knowledge and specialized terminology found in comprehensive textbooks rather than research abstracts. This research-practice gap limits the effectiveness of existing embedding models for clinical applications in cardiology. This study trained CardioEmbed, a domain-specialized embedding model based on Qwen3-Embedding-8B, using contrastive learning on a curated corpus of seven comprehensive cardiology textbooks totaling approximately 150,000 sentences after deduplication. The model employs InfoNCE loss with in-batch negatives and achieves 99.60% retrieval accuracy on cardiac-specific semantic retrieval tasks, a +15.94 percentage point improvement over MedTE, the current state-of-the-art medical embedding model. On MTEB medical benchmarks, the model obtained BIOSSES 0.77 Spearman and SciFact 0.61 NDCG@10, indicating competitive performance on related biomedical domains. Domain-specialized training on comprehensive clinical textbooks yields near-perfect cardiology retrieval (99.60% Acc@1), improving over MedTE by +15.94 percentage points.


【5】Forecasting Spoken Language Development in Children with Cochlear Implants Using Preimplantation MRI
标题:利用植入前MRI预测人工耳蜗植入儿童的口语发育
链接:https://arxiv.org/abs/2511.10669

作者:Yanlin Wang,Di Yuan,Shani Dettman,Dawn Choo,Emily Shimeng Xu,Denise Thomas,Maura E Ryan,Patrick C M Wong,Nancy M Young
备注:38 pages
摘要:人工耳蜗(CI)可显著改善重度至极重度感音神经性听力损失(SNHL)儿童的口语能力,但其结果仍比听力正常儿童更为多变。对个体儿童而言,使用植入年龄或残余听力无法可靠地预测这种变异性。本研究旨在比较传统机器学习(ML)与深度迁移学习(DTL)算法的准确性,以高与低语言改善者的二元分类模型预测双侧SNHL儿童植入CI后的口语发展。三家中心共招募了278名植入儿童。我们比较了基于脑神经解剖特征、分别采用传统ML和DTL的预测模型的准确性、灵敏度和特异性。使用基于双线性注意力融合策略的DTL预测模型实现了:准确率92.39%(95%CI,90.70%-94.07%),灵敏度91.22%(95%CI,89.98%-92.47%),特异度93.56%(95%CI,90.91%-96.21%),曲线下面积(AUC)0.977(95%CI,0.969-0.986)。DTL在所有结局指标中均优于传统ML模型。DTL的显著改进源于其直接捕获判别性和任务特定信息,这是该方法所实现的表示学习相对于ML的优势。研究结果支持用单一DTL预测模型为世界各地CI项目服务的儿童进行语言预测的可行性。
摘要:Cochlear implants (CI) significantly improve spoken language in children with severe-to-profound sensorineural hearing loss (SNHL), yet outcomes remain more variable than in children with normal hearing. This variability cannot be reliably predicted for individual children using age at implantation or residual hearing. This study aims to compare the accuracy of traditional machine learning (ML) to deep transfer learning (DTL) algorithms to predict post-CI spoken language development of children with bilateral SNHL using a binary classification model of high versus low language improvers. A total of 278 implanted children enrolled from three centers. The accuracy, sensitivity and specificity of prediction models based upon brain neuroanatomic features using traditional ML and DTL learning. DTL prediction models using bilinear attention-based fusion strategy achieved: accuracy of 92.39% (95% CI, 90.70%-94.07%), sensitivity of 91.22% (95% CI, 89.98%-92.47%), specificity of 93.56% (95% CI, 90.91%-96.21%), and area under the curve (AUC) of 0.977 (95% CI, 0.969-0.986). DTL outperformed traditional ML models in all outcome measures. DTL was significantly improved by direct capture of discriminative and task-specific information that are advantages of representation learning enabled by this approach over ML. The results support the feasibility of a single DTL prediction model for language prediction of children served by CI programs worldwide.


【6】Synergy vs. Noise: Performance-Guided Multimodal Fusion For Biochemical Recurrence-Free Survival in Prostate Cancer
标题:协同与噪声:性能引导的多模态融合用于前列腺癌无生化复发生存预测
链接:https://arxiv.org/abs/2511.11452

作者:Seth Alain Chang,Muhammad Mueez Amjad,Noorul Wahab,Ethar Alzaid,Nasir Rajpoot,Adam Shephard
备注:5 pages, 1 figure, 4 tables
摘要:多模态深度学习(MDL)已经成为计算病理学中的一种变革性方法。通过整合来自多个数据源的互补信息,与单模态模型相比,MDL模型在不同的临床任务中表现出更好的预测性能。然而,组合多种模态本质上能提高性能的假设在很大程度上仍未得到验证。我们假设,多模态增益严重依赖于单个模态的预测质量,并且整合弱模态可能会引入噪声而不是互补信息。我们用组织病理学、放射学和临床数据在前列腺癌数据集上测试这一假设,以预测生化复发时间。我们的研究结果证实,与单模态方法相比,组合高性能模态可产生更优越的性能。然而,将性能较差的模态与其他性能较高的模态集成会降低预测准确性。这些研究结果表明,多模态收益需要选择性的、性能引导的整合,而非不加选择的模态组合,这对计算病理学和医学成像中的MDL设计具有启示意义。
摘要:Multimodal deep learning (MDL) has emerged as a transformative approach in computational pathology. By integrating complementary information from multiple data sources, MDL models have demonstrated superior predictive performance across diverse clinical tasks compared to unimodal models. However, the assumption that combining modalities inherently improves performance remains largely unexamined. We hypothesise that multimodal gains depend critically on the predictive quality of individual modalities, and that integrating weak modalities may introduce noise rather than complementary information. We test this hypothesis on a prostate cancer dataset with histopathology, radiology, and clinical data to predict time-to-biochemical recurrence. Our results confirm that combining high-performing modalities yield superior performance compared to unimodal approaches. However, integrating a poor-performing modality with other higher-performing modalities degrades predictive accuracy. These findings demonstrate that multimodal benefit requires selective, performance-guided integration rather than indiscriminate modality combination, with implications for MDL design across computational pathology and medical imaging.


【7】Large-scale modality-invariant foundation models for brain MRI analysis: Application to lesion segmentation
标题:用于脑部MRI分析的大规模模式不变基础模型:应用于病变分割
链接:https://arxiv.org/abs/2511.11311

作者:Petros Koutsouvelis,Matej Gazda,Leroy Volmer,Sina Amirrajab,Kamil Barbierik,Branislav Setlak,Jakub Gazda,Peter Drotar
备注:Submitted to IEEE ISBI 2026
摘要:计算机视觉领域正在经历一场范式转变,通过自监督学习(SSL)进行大规模基础模型预训练。利用大量未标记的大脑MRI数据,这些模型可以学习解剖学先验,从而提高各种神经成像任务中的Few-Shot性能。然而,大多数SSL框架都是针对自然图像定制的,并且它们对捕获多模态MRI信息的适应性仍然未得到充分探索。这项工作提出了一个模态不变的表示学习设置,并评估其有效性中风和癫痫病变分割,大规模的预训练。实验结果表明,尽管成功的跨模态对齐,病变分割主要受益于保留细粒度的模态特定的功能。模型检查点和代码是公开的。
摘要:The field of computer vision is undergoing a paradigm shift toward large-scale foundation model pre-training via self-supervised learning (SSL). Leveraging large volumes of unlabeled brain MRI data, such models can learn anatomical priors that improve few-shot performance in diverse neuroimaging tasks. However, most SSL frameworks are tailored to natural images, and their adaptation to capture multi-modal MRI information remains underexplored. This work proposes a modality-invariant representation learning setup and evaluates its effectiveness in stroke and epilepsy lesion segmentation, following large-scale pre-training. Experimental results suggest that despite successful cross-modality alignment, lesion segmentation primarily benefits from preserving fine-grained modality-specific features. Model checkpoints and code are made publicly available.


聚类(3篇)

【1】Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications
标题:将公平聚类推广到多个群体:算法与应用
链接:https://arxiv.org/abs/2511.11539

作者:Diptarka Chakraborty,Kushagra Chatterjee,Debarati Das,Tien-Long Nguyen
备注:Accepted in AAAI 2026 for Oral Presentation
摘要:聚类是机器学习和数据分析中的一项基本任务,但它经常无法为由多个受保护属性定义的各种边缘化群体提供公平的表示,这一缺陷通常由训练数据中的偏差引起。因此,越来越需要提高聚类结果的公平性,理想情况下通过最小的修改来实现,可能作为常规聚类之后的后处理步骤。最近,Chakraborty等人[COLT'25]发起了最近公平聚类(closest fair clustering)的研究,尽管是在数据点仅属于两个群体的受限场景中。然而,在实践中,数据点通常由许多群体刻画,反映年龄、族裔、性别等多种受保护属性。   在这项工作中,我们将最近公平聚类问题的研究推广到具有任意数量(多于两个)群体的设置。我们首先表明,即使所有群体大小相等,该问题也是NP难的,这与存在精确算法的两群体情形形成鲜明对比。接下来,我们提出了近线性时间的近似算法,可高效处理任意大小的多个群体,从而回答了Chakraborty等人[COLT'25]提出的一个开放问题。   利用我们的最近公平聚类算法,我们进一步为公平相关聚类(fair correlation clustering)问题实现了改进的近似保证,推进了Ahmadian等人[AISTATS'20]和Ahmadi等人[2020]建立的最先进结果。此外,我们首次为涉及多个(多于两个)群体的公平共识聚类(fair consensus clustering)问题提供了近似算法,从而解决了Chakraborty等人[COLT'25]强调的另一个开放方向。
摘要:Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc.   In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25].   Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].


【2】When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering
标题:当基因说话时:空间解析转录组学数据聚类的语义引导框架
链接:https://arxiv.org/abs/2511.11380

作者:Jiangkai Long,Yanran Zhu,Chang Tang,Kun Sun,Yuanyuan Liu,Xuesong Yan
备注:AAAI'2026 poster paper. 12 pages, 8 figures
摘要:空间转录组学实现了带有空间背景的基因表达谱分析,为组织微环境提供了前所未有的见解。然而,大多数计算模型将基因视为孤立的数值特征,忽略了编码在其符号中的丰富生物语义。这阻碍了对关键生物特征的真正深入理解。为了克服这一限制,我们提出了SemST,这是一个用于空间转录组学数据聚类的语义引导深度学习框架。SemST利用大型语言模型(LLM)使基因能够通过其符号意义“说话”,将每个组织点内的基因集转化为富含生物学信息的嵌入。然后,将这些嵌入与图神经网络(GNN)捕获的空间邻域关系融合,实现生物功能和空间结构的一致集成。我们进一步引入细粒度语义调制(FSM)模块,以最佳地利用这些生物先验知识。FSM模块学习特定于点的仿射变换,使语义嵌入能够对空间特征执行逐元素校准,从而将高阶生物知识动态地注入空间背景。在公共空间转录组数据集上的大量实验表明,SemST实现了最先进的聚类性能。最重要的是,FSM模块具有即插即用的多功能性,在集成到其他基线方法中时,可以持续提高性能。
摘要:Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to "speak" through their symbolic meanings, transforming gene sets within each tissue spot into biologically informed embeddings. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), achieving a coherent integration of biological function and spatial structure. We further introduce the Fine-grained Semantic Modulation (FSM) module to optimally exploit these biological priors. The FSM module learns spot-specific affine transformations that empower the semantic embeddings to perform an element-wise calibration of the spatial features, thus dynamically injecting high-order biological knowledge into the spatial context. Extensive experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance. Crucially, the FSM module exhibits plug-and-play versatility, consistently improving the performance when integrated into other baseline methods.
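Read as described, the FSM module is a FiLM-style conditioning layer: the semantic embedding predicts a per-spot affine transform applied element-wise to the spatial features. A minimal PyTorch sketch under that reading (the dimensions are illustrative, and the paper's exact design may differ):

```python
import torch
import torch.nn as nn

class FiLMModulation(nn.Module):
    """FSM-like module: a semantic embedding predicts a per-spot affine transform
    (gamma, beta) that calibrates spatial features element-wise."""
    def __init__(self, sem_dim: int, feat_dim: int):
        super().__init__()
        self.to_affine = nn.Linear(sem_dim, 2 * feat_dim)

    def forward(self, spatial_feat: torch.Tensor, sem_emb: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_affine(sem_emb).chunk(2, dim=-1)
        return (1 + gamma) * spatial_feat + beta   # identity mapping when gamma = beta = 0

# Toy: 100 spots, 64-d GNN features, 384-d LLM-derived gene-set embeddings.
mod = FiLMModulation(sem_dim=384, feat_dim=64)
out = mod(torch.randn(100, 64), torch.randn(100, 384))
print(out.shape)  # torch.Size([100, 64])
```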


【3】Near-optimal Linear Predictive Clustering in Non-separable Spaces via Mixed Integer Programming and Quadratic Pseudo-Boolean Reductions
标题:通过混合整数规划和二次伪布尔约简实现不可分空间中的近优线性预测聚类
链接:https://arxiv.org/abs/2511.10809

作者:Jiazhou Liang,Hassan Khurram,Scott Sanner
摘要:线性预测聚类(LPC)基于特征变量和目标变量之间的共享线性关系对样本进行分区,具有许多应用,包括营销、医学和教育。通常用于LPC的贪婪优化方法在聚类和线性回归之间交替,但缺乏全局最优性。虽然对可分离的聚类有效,但它们在聚类于特征空间中相互重叠的不可分设置下表现不佳。在另一种约束优化范式中,Bertsimas和Shioda(2007)将LPC公式化为混合整数规划(MIP),确保全局最优性,而不管可分性如何,但可扩展性较差。这项工作建立在约束优化范式的基础上,引入了两种新方法,提高了LPC全局优化的效率。通过利用可分性的关键理论属性,我们获得了具有可证明误差界的近优近似,显著降低了MIP公式的复杂性并提高了可扩展性。此外,我们还可以进一步将LPC近似为二次伪布尔优化(QPBO)问题,在某些设置中实现了实质性的计算改进。对合成数据集和真实数据集的比较分析表明,我们的方法始终实现接近最优的解决方案,回归误差大大低于贪婪优化,同时表现出优于现有MIP公式的可扩展性。
摘要:Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and education. Greedy optimization methods, commonly used for LPC, alternate between clustering and linear regression but lack global optimality. While effective for separable clusters, they struggle in non-separable settings where clusters overlap in feature space. In an alternative constrained optimization paradigm, Bertsimas and Shioda (2007) formulated LPC as a Mixed-Integer Program (MIP), ensuring global optimality regardless of separability but suffering from poor scalability. This work builds on the constrained optimization paradigm to introduce two novel approaches that improve the efficiency of global optimization for LPC. By leveraging key theoretical properties of separability, we derive near-optimal approximations with provable error bounds, significantly reducing the MIP formulation's complexity and improving scalability. Additionally, we can further approximate LPC as a Quadratic Pseudo-Boolean Optimization (QPBO) problem, achieving substantial computational improvements in some settings. Comparative analyses on synthetic and real-world datasets demonstrate that our methods consistently achieve near-optimal solutions with substantially lower regression errors than greedy optimization while exhibiting superior scalability over existing MIP formulations.


推理|分析|理解|解释(7篇)

【1】A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication
标题:半分散学习的统一收敛分析:采样到采样与采样到全部通信
链接:https://arxiv.org/abs/2511.11560

作者:Angelo Rodio,Giovanni Neglia,Zheng Chen,Erik G. Larsson
备注:Accepted as a conference paper at AAAI 2026 (oral presentation). This is the extended version including the appendix
摘要:在半分散式联邦学习中,设备主要依赖设备到设备通信,但偶尔与中央服务器交互。周期性地,设备的一个采样子集将其本地模型上传到服务器,服务器计算聚合模型。然后,服务器可以(i)仅与采样的客户端共享该聚合模型(采样到采样,S2S),或者(ii)将其广播给所有客户端(采样到全部,S2A)。尽管具有实际意义,这两种策略之间严格的理论和实证比较仍然缺失。我们通过在统一的收敛框架内分析S2S和S2A来填补这一空白,该框架考虑了关键系统参数:采样率、服务器聚合频率和网络连通性。我们的分析和实验结果揭示了不同的区域,其中一种策略优于另一种,主要取决于设备间数据异质性的程度。这些见解为实际的半分散式FL部署提供了具体的设计指南。
摘要:In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server. Periodically, a sampled subset of devices uploads their local models to the server, which computes an aggregate model. The server can then either (i) share this aggregate model only with the sampled clients (sampled-to-sampled, S2S) or (ii) broadcast it to all clients (sampled-to-all, S2A). Despite their practical significance, a rigorous theoretical and empirical comparison of these two strategies remains absent. We address this gap by analyzing S2S and S2A within a unified convergence framework that accounts for key system parameters: sampling rate, server aggregation frequency, and network connectivity. Our results, both analytical and experimental, reveal distinct regimes where one strategy outperforms the other, depending primarily on the degree of data heterogeneity across devices. These insights lead to concrete design guidelines for practical semi-decentralized FL deployments.


【2】DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
标题:DiffPro:联合时间步长和逐层精度优化,实现高效扩散推理
链接:https://arxiv.org/abs/2511.11446

作者:Farhana Amin,Sabiha Afroz,Kanchon Gharami,Mona Moghadampanah,Dimitrios S. Nikolopoulos
摘要:扩散模型能生成高质量图像,但由于去噪步骤繁多且矩阵运算量大,推理成本高昂。我们提出了DiffPro,一个训练后、忠实于硬件的框架,它使用部署中采用的精确整数内核,并在扩散Transformer(DiT)中联合调整时间步长和每层精度,无需任何训练即可降低延迟和内存。DiffPro由三部分组成:用于分配权重位宽的流形感知灵敏度度量、用于稳定跨时间步激活的动态激活量化、以及由教师-学生漂移引导的预算时间步选择器。在实验中,DiffPro在标准基准测试上实现了高达6.25倍的模型压缩、50%的时间步削减,以及在Delta FID <= 10下2.8倍的推理加速,证明了实际的效率提升。DiffPro将步数缩减和精度规划统一为单个在预算内可部署的方案,用于实时、能耗感知的扩散推理。
摘要:Diffusion models produce high quality images but inference is costly due to many denoising steps and heavy matrix operations. We present DiffPro, a post-training, hardware-faithful framework that works with the exact integer kernels used in deployment and jointly tunes timesteps and per-layer precision in Diffusion Transformers (DiTs) to reduce latency and memory without any training. DiffPro combines three parts: a manifold-aware sensitivity metric to allocate weight bits, dynamic activation quantization to stabilize activations across timesteps, and a budgeted timestep selector guided by teacher-student drift. In experiments DiffPro achieves up to 6.25x model compression, fifty percent fewer timesteps, and 2.8x faster inference with Delta FID <= 10 on standard benchmarks, demonstrating practical efficiency gains. DiffPro unifies step reduction and precision planning into a single budgeted deployable plan for real-time energy-aware diffusion inference.


【3】Deep Learning for Short-Term Precipitation Prediction in Four Major Indian Cities: A ConvLSTM Approach with Explainable AI
标题:印度四个主要城市短期降水预测的深度学习:具有可解释人工智能的ConvLSTM方法
链接:https://arxiv.org/abs/2511.11152

作者:Tanmay Ghosh,Shaurabh Anand,Rakesh Gomaji Nannewar,Nithin Nagaraj
摘要:用于降水预测的深度学习模型通常起着黑匣子的作用,限制了它们在现实世界天气预测中的应用。为了在保持准确性的同时提高透明度,我们开发了一个可解释的深度学习框架,用于印度四个主要城市的短期降水预测:班加罗尔、孟买、德里和加尔各答,跨越不同的气候区。我们实现了一个混合的时间分布式CNN-ConvLSTM(卷积神经网络-长短期记忆)架构,在数十年的ERA5再分析数据上进行训练。该架构针对每个城市进行了优化,使用了不同数量的卷积滤波器:班加罗尔(32)、孟买和德里(64)以及加尔各答(128)。这些模型的均方根误差(RMSE)值分别为0.21 mm/天(班加罗尔)、0.52 mm/天(孟买)、0.48 mm/天(德里)和1.80 mm/天(加尔各答)。通过使用置换重要性、梯度加权类激活映射(Grad-CAM)、时间遮挡和反事实扰动的可解释性分析,我们确定了模型行为中的不同模式。该模型依赖于特定城市的变量,预测范围从班加罗尔的一天到加尔各答的五天不等。这项研究展示了可解释人工智能(xAI)如何为不同城市环境中的降水模式提供准确的预测和透明的见解。
摘要:Deep learning models for precipitation forecasting often function as black boxes, limiting their adoption in real-world weather prediction. To enhance transparency while maintaining accuracy, we developed an interpretable deep learning framework for short-term precipitation prediction in four major Indian cities: Bengaluru, Mumbai, Delhi, and Kolkata, spanning diverse climate zones. We implemented a hybrid Time-Distributed CNN-ConvLSTM (Convolutional Neural Network-Long Short-Term Memory) architecture, trained on multi-decadal ERA5 reanalysis data. The architecture was optimized for each city with a different number of convolutional filters: Bengaluru (32), Mumbai and Delhi (64), and Kolkata (128). The models achieved root mean square error (RMSE) values of 0.21 mm/day (Bengaluru), 0.52 mm/day (Mumbai), 0.48 mm/day (Delhi), and 1.80 mm/day (Kolkata). Through interpretability analysis using permutation importance, Gradient-weighted Class Activation Mapping (Grad-CAM), temporal occlusion, and counterfactual perturbation, we identified distinct patterns in the model's behavior. The model relied on city-specific variables, with prediction horizons ranging from one day for Bengaluru to five days for Kolkata. This study demonstrates how explainable AI (xAI) can provide accurate forecasts and transparent insights into precipitation patterns in diverse urban environments.


【4】VIDEOP2R: Video Understanding from Perception to Reasoning
标题:VideoP2R:从感知到推理的视频理解
链接:https://arxiv.org/abs/2511.11113

作者:Yifan Jiang,Yueying Wang,Rui Zhao,Toufiq Parag,Zhimin Chen,Zhenyu Liao,Jayakrishnan Unnikrishnan
摘要:强化微调(RFT)是由监督微调(SFT)和强化学习(RL)组成的两阶段框架,在提高大型语言模型(LLM)的推理能力方面表现出了良好的效果。然而,将RFT扩展到大型视频语言模型(LVLM)仍然具有挑战性。我们提出了VideoP2R,一种新颖的过程感知视频RFT框架,通过将感知和推理建模为不同的过程来增强视频推理。在SFT阶段,我们开发了一个三步流水线来生成VideoP2R-CoT-162K,这是一个用于感知和推理的高质量、过程感知的思维链(CoT)数据集。在RL阶段,我们引入了一种新的过程感知组相对策略优化(PA-GRPO)算法,为感知和推理提供单独的奖励。大量实验表明,VideoP2R在七个视频推理和理解基准中的六个上达到了最先进(SotA)的性能。消融研究进一步证实了我们的过程感知建模和PA-GRPO的有效性,并证明模型的感知输出为下游推理提供了充足的信息。
摘要:Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning.


【5】Movement-Specific Analysis for FIM Score Classification Using Spatio-Temporal Deep Learning
标题:使用时空深度学习进行FIM评分分类的特定运动分析
链接:https://arxiv.org/abs/2511.10713

作者:Jun Masaki,Ariaki Higashi,Naoko Shinagawa,Kazuhiko Hirata,Yuichi Kurita,Akira Furui
备注:10 pages, 5 figures, 3 tables, Accepted for the 2026 IEEE/SICE International Symposium on System Integration (SII 2026), January 11-14, 2026, Cancun, Mexico
摘要:功能独立性量表(Functional Independence Measure,FIM)被广泛用于评估患者在日常生活活动中的身体独立性。然而,传统的FIM评估给患者和医疗专业人员带来了沉重负担。为应对这一挑战,我们提出了一种自动化的FIM评分估计方法,利用不同于FIM指定评估动作的简单练习。我们的方法采用一种深度神经网络架构,集成时空图卷积网络(ST-GCN)、双向长短期记忆网络(BiLSTM)和注意力机制来估计FIM运动项目得分。该模型有效捕获长期时间依赖性,并通过学习到的注意力权重识别关键身体关节的贡献。我们在277名康复患者的研究中评估了该方法,重点关注FIM的转移(transfer)和移动(locomotion)项目。我们的方法成功区分了完全独立的患者和需要帮助的患者,在不同FIM项目上实现了70.09-78.79%的平衡准确率。此外,我们的分析揭示了可作为特定FIM评估项目可靠预测因子的特定运动模式。
摘要:The functional independence measure (FIM) is widely used to evaluate patients' physical independence in activities of daily living. However, traditional FIM assessment imposes a significant burden on both patients and healthcare professionals. To address this challenge, we propose an automated FIM score estimation method that utilizes simple exercises different from the designated FIM assessment actions. Our approach employs a deep neural network architecture integrating a spatial-temporal graph convolutional network (ST-GCN), bidirectional long short-term memory (BiLSTM), and an attention mechanism to estimate FIM motor item scores. The model effectively captures long-term temporal dependencies and identifies key body-joint contributions through learned attention weights. We evaluated our method in a study of 277 rehabilitation patients, focusing on FIM transfer and locomotion items. Our approach successfully distinguishes between completely independent patients and those requiring assistance, achieving balanced accuracies of 70.09-78.79 % across different FIM items. Additionally, our analysis reveals specific movement patterns that serve as reliable predictors for particular FIM evaluation items.


【6】Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning
标题:面向数学推理的偏差约束前缀表示微调
链接:https://arxiv.org/abs/2511.10707

作者:Sirui Liang,Pengfei Cao,Jian Zhao,Cong Huang,Jun Zhao,Kang Liu
备注:Accepted by AAAI 2026
摘要:参数高效微调(PEFT)通过更新参数的最小子集来提升模型在下游任务上的性能。表示微调(ReFT)方法通过冻结模型权重并优化内部表示进一步提高效率,其参数比PEFT更少,并在多个任务上优于PEFT。然而,ReFT在数学推理任务上表现出显著的性能下降。为解决这一问题,本文表明,ReFT在数学任务上的不佳表现主要源于其在早期推理阶段难以生成有效的推理前缀;此外,ReFT会干扰数值编码,并在CoT阶段造成误差累积。基于这些观察,本文提出了偏差约束前缀表示微调(BREP ReFT):通过截断训练数据来优化初始推理前缀的生成,在早期推理阶段进行干预以防止误差累积,并限制干预向量的模长以避免干扰数值编码,从而增强ReFT的数学推理能力。在多种模型架构上的大量实验表明,BREP具有卓越的有效性、效率和稳健的泛化能力,在数学推理任务上优于标准ReFT和基于权重的PEFT方法。源代码可在https://github.com/LiangThree/BREP上获得。
摘要:Parameter-Efficient finetuning (PEFT) enhances model performance on downstream tasks by updating a minimal subset of parameters. Representation finetuning (ReFT) methods further improve efficiency by freezing model weights and optimizing internal representations with fewer parameters than PEFT, outperforming PEFT on several tasks. However, ReFT exhibits a significant performance decline on mathematical reasoning tasks. To address this problem, the paper demonstrates that ReFT's poor performance on mathematical tasks primarily stems from its struggle to generate effective reasoning prefixes during the early inference phase. Moreover, ReFT disturbs the numerical encoding and errors accumulate during the CoT stage. Based on these observations, this paper proposes Bias-REstrained Prefix Representation FineTuning (BREP ReFT), which enhances ReFT's mathematical reasoning capability by truncating training data to optimize the generation of initial reasoning prefixes, intervening on the early inference stage to prevent error accumulation, and constraining the intervention vectors' magnitude to avoid disturbing numerical encoding. Extensive experiments across diverse model architectures demonstrate BREP's superior effectiveness, efficiency, and robust generalization capability, outperforming both standard ReFT and weight-based PEFT methods on the task of mathematical reasoning. The source code is available at https://github.com/LiangThree/BREP.


【7】Non-Euclidean SGD for Structured Optimization: Unified Analysis and Improved Rates
标题:结构化优化的非欧几里得SGD:统一分析与改进的收敛速度
链接:https://arxiv.org/abs/2511.11466

作者:Dmitry Kovalev,Ekaterina Borodich
摘要:最近,包括SignSGD、Lion和Muon在内的若干非欧几里得SGD实例,由于在训练深度神经网络方面的实际成功,引起了优化社区的极大兴趣。因此,一些工作试图通过发展理论收敛分析来解释这一成功。遗憾的是,这些结果无法恰当地证明这些方法的优越性能,因为它们未能超越普通欧几里得SGD的收敛速度。我们通过在结构化光滑性和梯度噪声假设下发展一种新的统一收敛分析,解决了这一重要的开放问题。特别地,我们的结果表明,非欧几里得SGD(i)可以利用Hessian和梯度噪声上界的稀疏或低秩结构,(ii)可以被证明受益于外推或动量方差缩减等流行的算法工具,并且(iii)可以匹配AdaGrad和Shampoo等自适应及更复杂优化算法的最新收敛速度。
摘要:Recently, several instances of non-Euclidean SGD, including SignSGD, Lion, and Muon, have attracted significant interest from the optimization community due to their practical success in training deep neural networks. Consequently, a number of works have attempted to explain this success by developing theoretical convergence analyses. Unfortunately, these results cannot properly justify the superior performance of these methods, as they could not beat the convergence rate of vanilla Euclidean SGD. We resolve this important open problem by developing a new unified convergence analysis under the structured smoothness and gradient noise assumption. In particular, our results indicate that non-Euclidean SGD (i) can exploit the sparsity or low-rank structure of the upper bounds on the Hessian and gradient noise, (ii) can provably benefit from popular algorithmic tools such as extrapolation or momentum variance reduction, and (iii) can match the state-of-the-art convergence rates of adaptive and more complex optimization algorithms such as AdaGrad and Shampoo.
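
作为摘要中所述"非欧几里得SGD实例"的背景说明,SignSGD 的更新规则可以用几行PyTorch代码示意(这只是该类方法中最简单的一例,并非本文提出的新算法):

import torch

def sign_sgd_step(params, lr=1e-3):
    # SignSGD:只按梯度符号更新,对应 L-infinity(非欧几里得)几何下的最速下降
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(p.grad.sign(), alpha=-lr)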


检测相关(1篇)

【1】Anomaly Detection in High-Dimensional Bank Account Balances via Robust Methods
标题:通过稳健方法检测高维银行账户余额中的异常
链接:https://arxiv.org/abs/2511.11143

作者:Federico Maddanu,Tommaso Proietti,Riccardo Crupi
摘要:检测银行账户余额中的点异常对金融机构至关重要,因为它可以识别潜在的欺诈、运营问题或其他违规行为。稳健统计可用于标记离群值,并提供不受污染观测影响的数据分布参数估计。然而,这样的策略在高维设定下通常效率较低且计算昂贵。在本文中,我们提出并实证评估了几种在中高维数据集上可能具有计算效率的稳健方法,它们兼具高崩溃点(breakdown point)和较低的计算时间。我们的应用处理约260万条匿名用户银行账户余额的每日记录。
摘要:Detecting point anomalies in bank account balances is essential for financial institutions, as it enables the identification of potential fraud, operational issues, or other irregularities. Robust statistics is useful for flagging outliers and for providing estimates of the data distribution parameters that are not affected by contaminated observations. However, such a strategy is often less efficient and computationally expensive under high dimensional setting. In this paper, we propose and evaluate empirically several robust approaches that may be computationally efficient in medium and high dimensional datasets, with high breakdown points and low computational time. Our application deals with around 2.6 million daily records of anonymous users' bank account balances.
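
摘要未给出所评估的具体稳健方法,下面仅以经典的最小协方差行列式(MCD)估计为例,示意"用稳健马氏距离标记点异常"的一般做法(Python;以卡方分位数为阈值是常见约定而非原文设定):

import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def flag_outliers(X, alpha=0.001):
    # 用高崩溃点的 MCD 稳健估计位置与散布,再按稳健马氏距离标记异常
    mcd = MinCovDet().fit(X)
    d2 = mcd.mahalanobis(X)                      # 稳健马氏距离的平方
    return d2 > chi2.ppf(1 - alpha, df=X.shape[1])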


分类|识别(3篇)

【1】FlowPath: Learning Data-Driven Manifolds with Invertible Flows for Robust Irregularly-sampled Time Series Classification
标题:FlowPath:利用可逆流学习数据驱动流形,实现稳健的不规则采样时间序列分类
链接:https://arxiv.org/abs/2511.10841

作者:YongKyung Oh,Dong-Young Lim,Sungil Kim
摘要:从稀疏且不规则采样的时间序列中对连续时间动态建模仍然是一个根本性挑战。神经受控微分方程为此类任务提供了一个有原则的框架,但其性能对由离散观测构造的控制路径的选择高度敏感。现有方法通常采用固定的插值方案,这强加了过于简单的几何假设,往往会歪曲底层数据流形,在高缺失率下尤其如此。我们提出FlowPath,一种通过可逆神经流学习控制路径几何的新方法。FlowPath不只是连接观测点,而是构造一个连续且自适应于数据的流形,并由可逆性约束引导,强制变换保持信息且性质良好。这种归纳偏置将FlowPath与先前无约束的可学习路径模型区分开来。在18个基准数据集和一个真实案例研究上的实证评估表明,相比使用固定插值或不可逆架构的基线,FlowPath在分类准确率上持续取得统计显著的改进。这些结果凸显了不仅要对路径上的动态建模、还要对路径本身的几何建模的重要性,为不规则时间序列学习提供了一种稳健且可推广的解决方案。
摘要:Modeling continuous-time dynamics from sparse and irregularly-sampled time series remains a fundamental challenge. Neural controlled differential equations provide a principled framework for such tasks, yet their performance is highly sensitive to the choice of control path constructed from discrete observations. Existing methods commonly employ fixed interpolation schemes, which impose simplistic geometric assumptions that often misrepresent the underlying data manifold, particularly under high missingness. We propose FlowPath, a novel approach that learns the geometry of the control path via an invertible neural flow. Rather than merely connecting observations, FlowPath constructs a continuous and data-adaptive manifold, guided by invertibility constraints that enforce information-preserving and well-behaved transformations. This inductive bias distinguishes FlowPath from prior unconstrained learnable path models. Empirical evaluations on 18 benchmark datasets and a real-world case study demonstrate that FlowPath consistently achieves statistically significant improvements in classification accuracy over baselines using fixed interpolants or non-invertible architectures. These results highlight the importance of modeling not only the dynamics along the path but also the geometry of the path itself, offering a robust and generalizable solution for learning from irregular time series.


【2】Differentiable Sparse Identification of Lagrangian Dynamics
标题:拉格朗日动力学的可微稀疏辨识
链接:https://arxiv.org/abs/2511.10706

作者:Zitong Zhang,Hao Sun
摘要:从数据中发现控制方程仍然是非线性动力学中的一个基本挑战。虽然稀疏回归技术推动了系统辨识的发展,但它们在复杂机械系统中难以处理有理函数且对噪声敏感。拉格朗日形式体系提供了一个有前景的替代方案,因为它通常避免有理表达式,并提供更简洁的系统动力学表示。然而,现有的拉格朗日辨识方法受测量噪声和数据可用性有限的显著影响。本文提出了一种新颖的可微稀疏辨识框架,通过三个关键贡献解决这些限制:(1)首次将三次B样条近似集成到拉格朗日系统辨识中,从而能够精确表示复杂的非线性;(2)一种稳健的方程发现机制,在结合已知物理约束的同时有效利用测量数据;(3)一种基于B样条基函数的递推导数计算方案,有效约束高阶导数,降低二阶动力系统对噪声的敏感性。与基线方法相比,所提出的方法表现出更优的性能,能够更准确、更可靠地从噪声数据中提取物理定律,在复杂机械系统中尤为明显。
摘要:Data-driven discovery of governing equations from data remains a fundamental challenge in nonlinear dynamics. Although sparse regression techniques have advanced system identification, they struggle with rational functions and noise sensitivity in complex mechanical systems. The Lagrangian formalism offers a promising alternative, as it typically avoids rational expressions and provides a more concise representation of system dynamics. However, existing Lagrangian identification methods are significantly affected by measurement noise and limited data availability. This paper presents a novel differentiable sparse identification framework that addresses these limitations through three key contributions: (1) the first integration of cubic B-Spline approximation into Lagrangian system identification, enabling accurate representation of complex nonlinearities, (2) a robust equation discovery mechanism that effectively utilizes measurements while incorporating known physical constraints, (3) a recursive derivative computation scheme based on B-spline basis functions, effectively constraining higher-order derivatives and reducing noise sensitivity on second-order dynamical systems. The proposed method demonstrates superior performance and enables more accurate and reliable extraction of physical laws from noisy data, particularly in complex mechanical systems compared to baseline methods.


【3】LT-Soups: Bridging Head and Tail Classes via Subsampled Model Soups
标题:LT-Soups:通过子采样模型汤衔接头部与尾部类别
链接:https://arxiv.org/abs/2511.10683

作者:Masih Aminbeidokhti,Subhankar Roy,Eric Granger,Elisa Ricci,Marco Pedersoli
备注:NeurIPS 2025
摘要:现实世界的数据集通常呈现长尾(LT)分布,其中少数头部类占主导地位,许多尾部类严重欠代表。虽然最近的工作表明,LoRA和AdaptFormer等参数高效微调(PEFT)方法可以保留CLIP等基础模型的尾部类性能,但我们发现这是以牺牲头部类准确率为代价的。我们确定头尾比(头部类与尾部类的比例)是影响这种权衡的一个重要但被忽视的因素。通过在CIFAR100上进行不同不平衡比($ρ$)和头尾比($η$)的受控实验,我们发现PEFT在尾部类居多的情况下表现出色,但在更平衡和头部类居多的分布下表现下降。为了克服这些局限性,我们提出了LT-Soups,一个旨在泛化到不同LT情形的两阶段模型汤(model soups)框架。在第一阶段,LT-Soups对在平衡子集上微调的模型取平均,以减少头部类偏置;在第二阶段,仅在完整数据集上微调分类器,以恢复头部类准确率。在六个基准数据集上的实验表明,与PEFT和传统模型汤相比,LT-Soups在广泛的不平衡情形中实现了更优的权衡。
摘要:Real-world datasets typically exhibit long-tailed (LT) distributions, where a few head classes dominate and many tail classes are severely underrepresented. While recent work shows that parameter-efficient fine-tuning (PEFT) methods like LoRA and AdaptFormer preserve tail-class performance on foundation models such as CLIP, we find that they do so at the cost of head-class accuracy. We identify the head-tail ratio, the proportion of head to tail classes, as a crucial but overlooked factor influencing this trade-off. Through controlled experiments on CIFAR100 with varying imbalance ratio ($ρ$) and head-tail ratio ($η$), we show that PEFT excels in tail-heavy scenarios but degrades in more balanced and head-heavy distributions. To overcome these limitations, we propose LT-Soups, a two-stage model soups framework designed to generalize across diverse LT regimes. In the first stage, LT-Soups averages models fine-tuned on balanced subsets to reduce head-class bias; in the second, it fine-tunes only the classifier on the full dataset to restore head-class accuracy. Experiments across six benchmark datasets show that LT-Soups achieves superior trade-offs compared to both PEFT and traditional model soups across a wide range of imbalance regimes.
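
LT-Soups 第一阶段的"对平衡子集上微调的模型取参数平均"即标准的 model soup 操作,可用几行PyTorch示意(极简版本,忽略BN统计量等细节;第二阶段仅微调分类器,此处从略):

import copy
import torch

def uniform_soup(models):
    # 对若干个在不同平衡子集上微调得到的同构模型做逐参数平均
    avg = copy.deepcopy(models[0].state_dict())
    with torch.no_grad():
        for k in avg:
            avg[k] = torch.stack(
                [m.state_dict()[k].float() for m in models]).mean(dim=0)
    return avg  # 可通过 model.load_state_dict(avg) 载入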


表征(3篇)

【1】MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising
标题:MOON嵌入:电子商务搜索广告的多模式表示学习
链接:https://arxiv.org/abs/2511.11305

作者:Chenghan Fu,Daoze Zhang,Yukang Lin,Zhanheng Nie,Xiang Zhang,Jianyu Liu,Yueran Liu,Wanxian Guan,Pengjie Wang,Jian Xu,Bo Zheng
备注:31 pages, 12 figures
摘要:我们介绍了MOON,这是一套全面的可持续迭代实践,用于电子商务应用程序的多模态表示学习。MOON已经全面部署在淘宝搜索广告系统的各个阶段,包括检索、相关度、排名等,在点击率(CTR)预测任务上的性能提升尤为显著,整体在线CTR提升了+20.00%。在过去的三年里,该项目在CTR预测任务上取得了最大的改进,并经历了五次全面迭代。在我们MOON的探索和迭代过程中,我们积累了宝贵的见解和实践经验,我们相信这些经验将有益于研究界。MOON包含“预训练,后训练和应用”的三阶段训练范式,允许多模态表示与下游任务的有效集成。值得注意的是,为了弥合多模态表征学习和下游训练目标之间的不一致,我们定义了交换率来量化中间指标的改进如何有效地转化为下游收益。通过这种分析,我们确定了基于图像的搜索召回率作为一个关键的中间指标,指导多模态模型的优化。经过三年五次迭代,MOON已经沿着四个关键维度发展:数据处理,培训策略,模型架构和下游应用程序。还将分享通过迭代改进获得的经验教训和见解。作为我们探索电子商务领域规模效应的一部分,我们进一步对多模态表征学习的规模律进行了系统研究,研究了多个因素,如训练令牌的数量,负样本和用户行为序列的长度。
摘要:We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning for e-commerce applications. MOON has already been fully deployed across all stages of Taobao search advertising system, including retrieval, relevance, ranking, and so on. The performance gains are particularly significant on click-through rate (CTR) prediction task, which achieves an overall +20.00% online CTR improvement. Over the past three years, this project has delivered the largest improvement on CTR prediction task and undergone five full-scale iterations. Throughout the exploration and iteration of our MOON, we have accumulated valuable insights and practical experience that we believe will benefit the research community. MOON contains a three-stage training paradigm of "Pretraining, Post-training, and Application", allowing effective integration of multimodal representations with downstream tasks. Notably, to bridge the misalignment between the objectives of multimodal representation learning and downstream training, we define the exchange rate to quantify how effectively improvements in an intermediate metric can translate into downstream gains. Through this analysis, we identify the image-based search recall as a critical intermediate metric guiding the optimization of multimodal models. Over three years and five iterations, MOON has evolved along four critical dimensions: data processing, training strategy, model architecture, and downstream application. The lessons and insights gained through the iterative improvements will also be shared. As part of our exploration into scaling effects in the e-commerce field, we further conduct a systematic study of the scaling laws governing multimodal representation learning, examining multiple factors such as the number of training tokens, negative samples, and the length of user behavior sequences.


【2】From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging
标题:从参数到表示:可控模型合并的封闭形式方法
链接:https://arxiv.org/abs/2511.10943

作者:Jialin Wu,Jian Yang,Handing Wang,Jiajun Wen,Zhiyong Yu
备注:Accepted by AAAI 2026, Extended Version
摘要:模型合并结合了多任务性能的专家模型,但面临着参数干扰的挑战。这引发了最近对可控模型合并的兴趣,使用户能够显式地平衡性能权衡。现有的方法采用先编译后查询的模式,执行昂贵的离线多目标优化,以实现快速,偏好感知模型生成。这个离线阶段通常涉及迭代搜索或专门的训练,复杂性随着任务的数量呈指数级增长。为了克服这些局限性,我们将视角从参数空间优化转向直接校正模型的最终表示。我们的方法将此校正建模为最佳线性变换,从而产生一个封闭形式的解决方案,该解决方案用单步架构不可知的计算取代了整个离线优化过程。该解决方案直接结合了用户偏好,允许实时生成帕累托最优模型,其复杂性随任务数量线性扩展。实验结果表明,我们的方法产生了一个优越的Pareto前沿更精确的偏好对齐和大幅降低计算成本。
摘要:Model merging combines expert models for multitask performance but faces challenges from parameter interference. This has sparked recent interest in controllable model merging, giving users the ability to explicitly balance performance trade-offs. Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation. This offline stage typically involves iterative search or dedicated training, with complexity that grows exponentially with the number of tasks. To overcome these limitations, we shift the perspective from parameter-space optimization to a direct correction of the model's final representation. Our approach models this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation. This solution directly incorporates user preferences, allowing a Pareto-optimal model to be generated on-the-fly with complexity that scales linearly with the number of tasks. Experimental results show our method generates a superior Pareto front with more precise preference alignment and drastically reduced computational cost.
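
按摘要思路,"对最终表示的最优线性校正"存在闭式最小二乘解。下面是一个Python示意(Z、T 的具体构造,例如 T 如何由用户偏好加权各专家表示得到,为本文之外的假设):

import numpy as np

def closed_form_correction(Z, T):
    # 求 W 使 ||Z W - T||_F 最小:W 为最小二乘闭式解,无需迭代优化
    # Z: 合并模型在校准样本上的表示 (n, d)
    # T: 按用户偏好加权得到的目标表示 (n, d)
    W, *_ = np.linalg.lstsq(Z, T, rcond=None)
    return W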


【3】Multi-View Polymer Representations for the Open Polymer Prediction
标题:开放聚合物预测的多视图聚合物表示
链接:https://arxiv.org/abs/2511.10893

作者:Wonjin Jung,Yongseok Choi
摘要:我们通过利用互补表示的多视图设计来解决聚合物性质预测问题。我们的系统集成了四个表示家族:(i)表格式RDKit/Morgan描述符,(ii)图神经网络,(iii)3D信息表示,以及(iv)预训练的SMILES语言模型,并通过均匀集成对每个属性的预测取平均。模型使用10折划分训练,并使用SMILES测试时增强进行评估。该方法在NeurIPS 2025开放聚合物预测挑战赛的2241支队伍中排名第9。提交的集成模型取得了0.057的公开榜MAE和0.082的私有榜MAE。
摘要:We address polymer property prediction with a multi-view design that exploits complementary representations. Our system integrates four families: (i) tabular RDKit/Morgan descriptors, (ii) graph neural networks, (iii) 3D-informed representations, and (iv) pretrained SMILES language models, and averages per-property predictions via a uniform ensemble. Models are trained with 10-fold splits and evaluated with SMILES test-time augmentation. The approach ranks 9th of 2241 teams in the Open Polymer Prediction Challenge at NeurIPS 2025. The submitted ensemble achieves a public MAE of 0.057 and a private MAE of 0.082.
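
摘要所述"均匀集成 + SMILES 测试时增强"本质上是两层平均,可用几行Python示意(models 的 predict 接口与 SMILES 变体的生成方式均为假设):

import numpy as np

def ensemble_predict(models, smiles_variants):
    # 对每个模型在每个 SMILES 变体(同一分子的等价写法)上的预测取均匀平均
    preds = [m.predict(s) for m in models for s in smiles_variants]
    return np.mean(preds, axis=0)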


优化|敛散性(5篇)

【1】Low-Bit, High-Fidelity: Optimal Transport Quantization for Flow Matching
标题:低位、高保真:流匹配的最佳传输量化
链接:https://arxiv.org/abs/2511.11418

作者:Dara Varam,Diaa A. Abuhani,Imran Zualkernan,Raghad AlDamani,Lujain Khalil
备注:12 pages, 8 figures
摘要:流匹配(FM)生成模型提供高效的无仿真训练和确定性采样,但其实际部署受到高精度参数要求的挑战。我们将基于最优传输(OT)的后训练量化应用于FM模型,最小化量化和原始权重之间的2-Wasserstein距离,并系统地比较其与均匀,分段和对数量化方案的有效性。我们的理论分析提供了量化下生成退化的上限,并且在不同复杂度的五个基准数据集上的经验结果表明,基于OT的量化将视觉生成质量和潜在空间稳定性保持在每个参数2-3位,而替代方法失败。这将基于OT的量化确立为压缩边缘和嵌入式AI应用的FM生成模型的原则性有效方法。
摘要:Flow Matching (FM) generative models offer efficient simulation-free training and deterministic sampling, but their practical deployment is challenged by high-precision parameter requirements. We adapt optimal transport (OT)-based post-training quantization to FM models, minimizing the 2-Wasserstein distance between quantized and original weights, and systematically compare its effectiveness against uniform, piecewise, and logarithmic quantization schemes. Our theoretical analysis provides upper bounds on generative degradation under quantization, and empirical results across five benchmark datasets of varying complexity show that OT-based quantization preserves both visual generation quality and latent space stability down to 2-3 bits per parameter, where alternative methods fail. This establishes OT-based quantization as a principled, effective approach to compress FM generative models for edge and embedded AI applications.
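
在一维权重分布上,最小化量化权重与原权重经验分布之间 2-Wasserstein 距离的 k 点量化,等价于以平方误差为代价的 k-means(Lloyd)聚类。下面给出这一对应关系的极简Python示意(仅为概念演示,非论文的完整量化流程):

import numpy as np
from sklearn.cluster import KMeans

def ot_quantize(w, bits=3):
    # 2^bits 个码字;k-means 质心即 W2 意义下最优的量化码本
    k = 2 ** bits
    km = KMeans(n_clusters=k, n_init=10).fit(w.reshape(-1, 1))
    return km.cluster_centers_[km.labels_].reshape(w.shape)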


【2】Differentiation Strategies for Acoustic Inverse Problems: Admittance Estimation and Shape Optimization
标题:声学逆问题的求导策略:导纳估计与形状优化
链接:https://arxiv.org/abs/2511.11415

作者:Nikolas Borrel-Jensen,Josiah Bjorgaard
备注:4 pages, 2 figures
摘要:我们通过两个应用程序:导纳估计和形状优化谐振阻尼声学反问题的一个实用的可微编程方法。首先,我们证明了JAX-FEM的自动微分(AD)可以直接从稀疏的压力测量值中估计复杂的边界导纳,实现3位数的精度,而无需手动推导伴随方程。其次,我们将随机有限差分应用于声学形状优化,将JAX-FEM用于正向模拟与PyTorch 3D通过AD进行网格操作相结合。通过将物理驱动的边界优化与几何驱动的内部网格自适应分离,我们在目标频率下实现了48.1%的能量降低,与全网格上的标准有限差分相比,FEM解决方案减少了30倍。这项工作展示了现代可微软件栈如何实现基于物理的逆问题的优化工作流程的快速原型设计,以及参数估计的自动微分和几何设计的有限差分和AD的组合。
摘要:We demonstrate a practical differentiable programming approach for acoustic inverse problems through two applications: admittance estimation and shape optimization for resonance damping. First, we show that JAX-FEM's automatic differentiation (AD) enables direct gradient-based estimation of complex boundary admittance from sparse pressure measurements, achieving 3-digit precision without requiring manual derivation of adjoint equations. Second, we apply randomized finite differences to acoustic shape optimization, combining JAX-FEM for forward simulation with PyTorch3D for mesh manipulation through AD. By separating physics-driven boundary optimization from geometry-driven interior mesh adaptation, we achieve 48.1% energy reduction at target frequencies with 30-fold fewer FEM solutions compared to standard finite difference on the full mesh. This work showcases how modern differentiable software stacks enable rapid prototyping of optimization workflows for physics-based inverse problems, with automatic differentiation for parameter estimation and a combination of finite differences and AD for geometric design.


【3】On-Device Fine-Tuning via Backprop-Free Zeroth-Order Optimization
标题:通过无反向传播的零阶优化实现设备端微调
链接:https://arxiv.org/abs/2511.11362

作者:Prabodh Katti,Sangwoo Park,Bipin Rajendran,Osvaldo Simeone
备注:Conference submission; Under review
摘要:设备上的微调是边缘AI系统的关键能力,它必须支持在严格的内存限制下适应不同的代理任务。传统的基于反向传播(BP)的训练需要存储层激活和优化器状态,这一需求只能通过检查点来部分缓解。在模型权重必须完全驻留在设备内存中的边缘部署中,这种开销严重限制了可以部署的最大模型大小。内存高效的零阶优化(MeZO)通过单独使用前向评估来估计梯度,消除了存储中间激活或优化器状态的需要,从而消除了这一瓶颈。这使得更大的模型能够容纳在片上存储器中,尽管代价是可能需要更长的微调挂钟时间。本文首先提供了BP和MeZO训练下可以容纳的相对模型大小的理论估计。然后,我们数值验证的分析,证明MeZO具有准确性的优势,在设备上的内存约束下,提供足够的挂钟时间可用于微调。
摘要:On-device fine-tuning is a critical capability for edge AI systems, which must support adaptation to different agentic tasks under stringent memory constraints. Conventional backpropagation (BP)-based training requires storing layer activations and optimizer states, a demand that can be only partially alleviated through checkpointing. In edge deployments in which the model weights must reside entirely in device memory, this overhead severely limits the maximum model size that can be deployed. Memory-efficient zeroth-order optimization (MeZO) alleviates this bottleneck by estimating gradients using forward evaluations alone, eliminating the need for storing intermediate activations or optimizer states. This enables significantly larger models to fit within on-chip memory, albeit at the cost of potentially longer fine-tuning wall-clock time. This paper first provides a theoretical estimate of the relative model sizes that can be accommodated under BP and MeZO training. We then numerically validate the analysis, demonstrating that MeZO exhibits accuracy advantages under on-device memory constraints, provided sufficient wall-clock time is available for fine-tuning.
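
摘要中提到的 MeZO 只需两次前向评估即可估计方向导数,并借助随机种子避免显式存储扰动向量。其核心步骤可用PyTorch示意如下(极简单点版本;loss_fn 为返回标量损失的假设接口):

import torch

def mezo_step(model, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    def perturb(scale):
        torch.manual_seed(seed)               # 用同一种子重建同一扰动方向 z
        for p in model.parameters():
            p.data.add_(torch.randn_like(p), alpha=scale * eps)
    perturb(+1); loss_plus = float(loss_fn(model))    # L(theta + eps*z)
    perturb(-2); loss_minus = float(loss_fn(model))   # L(theta - eps*z)
    perturb(+1)                                       # 参数还原为 theta
    g = (loss_plus - loss_minus) / (2 * eps)          # 方向导数估计
    torch.manual_seed(seed)
    for p in model.parameters():                      # theta <- theta - lr*g*z
        p.data.add_(torch.randn_like(p), alpha=-lr * g)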


【4】Private Zeroth-Order Optimization with Public Data
标题:利用公共数据进行私有零阶优化
链接:https://arxiv.org/abs/2511.10859

作者:Xuchen Gong,Tian Li
备注:NeurIPS 2025
摘要:部署流行的一阶差分隐私(DP)机器学习算法(例如DP-SGD)的主要瓶颈之一在于其高昂的计算和内存成本,即使存在优化实现也是如此。零阶方法有望减轻这一开销,因为它们利用函数求值来近似梯度,因此更容易实现隐私化。虽然最近的工作已经在私有和非私有设定下探索了零阶方法,但与DP-SGD相比,它们的效用仍然相对较低,并且仅在有限的应用领域中得到评估。在这项工作中,我们提出利用公共信息来指导和改进私有零阶算法的梯度近似。我们探索了一套开销极小的公共数据辅助零阶优化器(PAZO)。我们在公共数据与私有数据相似的假设下,对PAZO框架进行了理论分析。实证结果表明,PAZO在预训练和微调设定下的视觉与文本任务中实现了更优的隐私/效用权衡,尤其在高度隐私的机制下优于最好的一阶基线(使用公共数据),同时提供高达16倍的运行时加速。
摘要:One of the major bottlenecks for deploying popular first-order differentially private (DP) machine learning algorithms (e.g., DP-SGD) lies in their high computation and memory cost, despite the existence of optimized implementations. Zeroth-order methods have promise in mitigating the overhead, as they leverage function evaluations to approximate the gradients, hence significantly easier to privatize. While recent works have explored zeroth-order approaches in both private and non-private settings, they still suffer from relatively low utilities compared with DP-SGD, and have only been evaluated in limited application domains. In this work, we propose to leverage public information to guide and improve gradient approximation of private zeroth-order algorithms. We explore a suite of public-data-assisted zeroth-order optimizers (PAZO) with minimal overhead. We provide theoretical analyses of the PAZO framework under an assumption of the similarity between public and private data. Empirically, we demonstrate that PAZO achieves superior privacy/utility tradeoffs across vision and text tasks in both pre-training and fine-tuning settings, outperforming the best first-order baselines (with public data) especially in highly private regimes, while offering up to $16\times$ runtime speedup.


【5】Surrogate-Based Differentiable Pipeline for Shape Optimization
标题:用于形状优化的基于代理模型的可微管道
链接:https://arxiv.org/abs/2511.10761

作者:Andrin Rehmann,Nolan Black,Josiah Bjorgaard,Alessandro Angioi,Andrei Paleyes,Niklas Heim,Dion Häfner,Alexander Lavin
摘要:在典型的计算机辅助工程(CAE)工作流程(由设计参数计算性能指标)中,基于梯度的工程设计优化受到不可微组件的限制。虽然基于梯度的方法可以在高维设计空间中带来明显的加速,但网格划分、物理模拟和其他常见组件的代码是不可微的,即使其背后的数学或物理本身是可微的。我们建议用本质上可微的代理模型取代不可微的管道组件。通过一个气动形状优化的玩具示例,我们展示了一个端到端可微管道:一个3D U-Net全场代理通过学习形状的符号距离场(SDF)与感兴趣场之间的映射,同时取代了网格划分和模拟两个步骤。这种方法无需可微求解器即可实现基于梯度的形状优化,在伴随方法不可用和/或难以实现的情况下非常有用。
摘要:Gradient-based optimization of engineering designs is limited by non-differentiable components in the typical computer-aided engineering (CAE) workflow, which calculates performance metrics from design parameters. While gradient-based methods could provide noticeable speed-ups in high-dimensional design spaces, codes for meshing, physical simulations, and other common components are not differentiable even if the math or physics underneath them is. We propose replacing non-differentiable pipeline components with surrogate models which are inherently differentiable. Using a toy example of aerodynamic shape optimization, we demonstrate an end-to-end differentiable pipeline where a 3D U-Net full-field surrogate replaces both meshing and simulation steps by training it on the mapping between the signed distance field (SDF) of the shape and the fields of interest. This approach enables gradient-based shape optimization without the need for differentiable solvers, which can be useful in situations where adjoint methods are unavailable and/or hard to implement.


预测|估计(12篇)

【1】Intrinsic Dimension Estimation for Radio Galaxy Zoo using Diffusion Models
标题:使用扩散模型估计射电银河动物园的内在维度
链接:https://arxiv.org/abs/2511.11490

作者:Joan Font-Quer Roset,Devina Mohan,Anna Scaife
备注:9 pages, 5 figures, 2 tables, submitted to NeurIPS 2025 ML4PS Workshop
摘要:在这项工作中,我们使用基于分数的扩散模型估计射电银河动物园(RGZ)数据集的内在维度(iD)。我们研究了iD估计如何随贝叶斯神经网络(BNN)能量分数变化,该分数衡量射电源与RGZ数据集的MiraBest子集的相似程度。我们发现,分布外的源表现出较高的iD值,并且RGZ的整体iD超过了自然图像数据集通常报告的水平。此外,我们分析了iD如何在Fanaroff-Riley(FR)形态类别间变化,以及其作为信噪比(SNR)的函数如何变化。虽然未发现FR I和FR II类别之间存在差异,但观察到较低iD伴随较高SNR的微弱趋势。未来使用RGZ数据集的工作可以利用iD与能量分数之间的关系,定量地研究和改进各种自监督学习算法学到的表示。
摘要:In this work, we estimate the intrinsic dimension (iD) of the Radio Galaxy Zoo (RGZ) dataset using a score-based diffusion model. We examine how the iD estimates vary as a function of Bayesian neural network (BNN) energy scores, which measure how similar the radio sources are to the MiraBest subset of the RGZ dataset. We find that out-of-distribution sources exhibit higher iD values, and that the overall iD for RGZ exceeds those typically reported for natural image datasets. Furthermore, we analyse how iD varies across Fanaroff-Riley (FR) morphological classes and as a function of the signal-to-noise ratio (SNR). While no relationship is found between FR I and FR II classes, a weak trend toward higher SNR at lower iD. Future work using the RGZ dataset could make use of the relationship between iD and energy scores to quantitatively study and improve the representations learned by various self-supervised learning algorithms.


【2】Quantifying and Improving Adaptivity in Conformal Prediction through Input Transformations
标题:通过输入变换量化和提高保形预测的适应性
链接:https://arxiv.org/abs/2511.11472

作者:Sooyong Jang,Insup Lee
摘要:保形预测构造一个标签集合而非单点预测,同时提供概率覆盖保证。除覆盖保证之外,对样本难度的自适应性也是一个重要性质,即方法应为更困难的样本生成更大的预测集,为更简单的样本生成更小的预测集。现有的自适应性评估方法通常按难度将样本分箱,分析各箱内的覆盖率违背或平均集合大小。然而,这些方法经常受到不平衡分箱的影响,可能导致对覆盖率或集合大小的不准确估计。为解决这一问题,我们提出一种分箱方法:利用输入变换按难度对样本排序,再进行等质量分箱。在此分箱基础上,我们引入两个指标以更好地评估自适应性。由于分箱平衡,这些指标对覆盖率违背和平均集合大小的估计更可靠,从而带来更准确的自适应性评估。通过实验,我们证明所提指标与期望的自适应性性质的相关性强于现有指标。此外,受这些发现启发,我们提出了一种新的自适应预测集算法:按估计难度对样本分组,并应用组条件保形预测,从而为每个组确定合适的阈值。在(a)图像分类(ImageNet)和(b)医疗任务(视力预测)上的实验结果表明,按照新指标衡量,我们的方法优于现有方法。
摘要:Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property. It means that the method should produce larger prediction sets for more difficult examples, and smaller ones for easier examples. Existing evaluation methods for adaptiveness typically analyze coverage rate violation or average set size across bins of examples grouped by difficulty. However, these approaches often suffer from imbalanced binning, which can lead to inaccurate estimates of coverage or set size. To address this issue, we propose a binning method that leverages input transformations to sort examples by difficulty, followed by uniform-mass binning. Building on this binning, we introduce two metrics to better evaluate adaptiveness. These metrics provide more reliable estimates of coverage rate violation and average set size due to balanced binning, leading to more accurate adaptivity assessment. Through experiments, we demonstrate that our proposed metric correlates more strongly with the desired adaptiveness property compared to existing ones. Furthermore, motivated by our findings, we propose a new adaptive prediction set algorithm that groups examples by estimated difficulty and applies group-conditional conformal prediction. This allows us to determine appropriate thresholds for each group. Experimental results on both (a) an Image Classification (ImageNet) (b) a medical task (visual acuity prediction) show that our method outperforms existing approaches according to the new metrics.
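
摘要最后提出的"按估计难度分组、再做组条件保形预测"中,每组阈值即该组校准得分的保形分位数。下面是一个极简Python示意(scores 为校准集上的非一致性得分,groups 为难度分组标签,均为假设输入):

import numpy as np

def group_thresholds(scores, groups, alpha=0.1):
    # 对每个难度组,取第 ceil((n+1)(1-alpha)) 个次序统计量作为该组阈值
    thr = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n = len(s)
        q = int(np.ceil((n + 1) * (1 - alpha))) - 1
        thr[g] = s[min(q, n - 1)]
    return thr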


【3】Epistemic Error Decomposition for Multi-step Time Series Forecasting: Rethinking Bias-Variance in Recursive and Direct Strategies
标题:多步时间序列预测的认知误差分解:重新思考递归与直接策略中的偏差-方差
链接:https://arxiv.org/abs/2511.11461

作者:Riku Green,Huw Day,Zahraa S. Abdallah,Telmo M. Silva Filho
备注:2025 EIML Eurips Workshop, 6 pages
摘要:多步预测通常通过一个简单的经验法则来描述:递归策略被认为具有高偏差和低方差,而直接策略被认为具有低偏差和高方差。我们通过将预期的多步预测误差分解为三个部分来重新审视这一信念:不可约噪声、结构近似间隙和估计方差项。对于线性预测变量,我们证明了对于任何数据集,结构间隙都是零。然而,对于非线性预测因子,递归中使用的重复组合可以增加模型的表达能力,使结构差距同时取决于模型和数据。我们进一步表明,在任何水平的递归策略的估计方差可以写为一步方差乘以雅可比放大因子,衡量组合预测是多么敏感的参数误差。这个观点解释了递归预测何时可能同时具有比直接预测更低的偏差和更高的方差。在ETTm 1数据集上使用多层感知器的实验证实了这些发现。研究结果为基于模型非线性和噪声特性的递归和直接策略之间的选择提供了实际指导,而不是依赖于传统的偏差-方差直觉。
摘要:Multi-step forecasting is often described through a simple rule of thumb: recursive strategies are said to have high bias and low variance, while direct strategies are said to have low bias and high variance. We revisit this belief by decomposing the expected multi-step forecast error into three parts: irreducible noise, a structural approximation gap, and an estimation-variance term. For linear predictors we show that the structural gap is identically zero for any dataset. For nonlinear predictors, however, the repeated composition used in recursion can increase model expressivity, making the structural gap depend on both the model and the data. We further show that the estimation variance of the recursive strategy at any horizon can be written as the one-step variance multiplied by a Jacobian-based amplification factor that measures how sensitive the composed predictor is to parameter error. This perspective explains when recursive forecasting may simultaneously have lower bias and higher variance than direct forecasting. Experiments with multilayer perceptrons on the ETTm1 dataset confirm these findings. The results offer practical guidance for choosing between recursive and direct strategies based on model nonlinearity and noise characteristics, rather than relying on traditional bias-variance intuition.


【4】FairReweighing: Density Estimation-Based Reweighing Framework for Improving Separation in Fair Regression
标题:公平重新加权:基于密度估计的重新加权框架,用于改善公平回归中的分离性
链接:https://arxiv.org/abs/2511.11459

作者:Xiaoyin Xi,Zhe Yu
摘要:在高风险的公共部门和工业环境中,应用AI软件已十分普遍。然而,透明度的缺乏引发了人们的担忧:这些数据驱动的AI软件决策能否确保对所有种族、性别或年龄群体的公平性。尽管针对新兴的公平感知AI软件已有大量研究,但迄今为止,解决该问题的大多数努力都集中在二元分类任务上,回归中的公平性相对欠缺探索。在这项工作中,我们采用一种基于互信息的度量来评估分离(separation)准则的违背情况。该度量还被扩展,使其可以直接应用于具有二元和连续敏感属性的分类与回归问题。受公平分类中Reweighing算法的启发,我们提出了一种基于密度估计的FairReweighing预处理算法,以保证学习到的模型满足分离准则。理论上,我们证明在数据独立性假设下,所提出的FairReweighing算法可以保证训练数据上的分离性。实证上,在合成和真实数据上,我们表明FairReweighing在提高分离性的同时保持高准确率,优于现有最先进的回归公平性解决方案。
摘要:There has been a prevalence of applying AI software in both high-stakes public-sector and industrial contexts. However, the lack of transparency has raised concerns about whether these data-informed AI software decisions secure fairness against people of all racial, gender, or age groups. Despite extensive research on emerging fairness-aware AI software, up to now most efforts to solve this issue have been dedicated to binary classification tasks. Fairness in regression is relatively underexplored. In this work, we adopted a mutual information-based metric to assess separation violations. The metric is also extended so that it can be directly applied to both classification and regression problems with both binary and continuous sensitive attributes. Inspired by the Reweighing algorithm in fair classification, we proposed a FairReweighing pre-processing algorithm based on density estimation to ensure that the learned model satisfies the separation criterion. Theoretically, we show that the proposed FairReweighing algorithm can guarantee separation in the training data under a data independence assumption. Empirically, on both synthetic and real-world data, we show that FairReweighing outperforms existing state-of-the-art regression fairness solutions in terms of improving separation while maintaining high accuracy.
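
作为"基于密度估计的重加权"思路的示意:经典 Reweighing 在离散情形取 w 正比于 p(s)p(y)/p(s,y),连续情形可用核密度估计(KDE)近似。以下Python草图按此思路写成(带宽等超参数与论文针对分离准则的具体估计器均为假设,仅供参考):

import numpy as np
from sklearn.neighbors import KernelDensity

def fair_weights(s, y, bw=0.5):
    # w(s,y) = p(s)p(y)/p(s,y):加权后敏感属性 s 与目标 y 近似独立
    # (这是 Reweighing 的经典目标;原文的具体构造以论文为准)
    kd = lambda Z: KernelDensity(bandwidth=bw).fit(Z)
    ls = kd(s.reshape(-1, 1)).score_samples(s.reshape(-1, 1))   # log p(s)
    ly = kd(y.reshape(-1, 1)).score_samples(y.reshape(-1, 1))   # log p(y)
    lsy = kd(np.c_[s, y]).score_samples(np.c_[s, y])            # log p(s,y)
    return np.exp(ls + ly - lsy)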


【5】Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
标题:使用概率电路的快速、富有表达性的多令牌预测
链接:https://arxiv.org/abs/2511.11346

作者:Andreas Grivas,Lorenzo Loconte,Emile van Krieken,Piotr Nawrot,Yu Zhao,Euan Wielewski,Pasquale Minervini,Edoardo Ponti,Antonio Vergari
摘要:多标记预测(MTP)是显著加速大型语言模型(LLM)生成的一种重要策略,其中包括无需分词器但速度极慢的字节级LLM。然而,现有的MTP方法通常通过假设未来标记之间相互独立而牺牲表达能力。在这项工作中,我们在概率电路(PC)框架内研究MTP中表达能力与延迟之间的权衡。我们的框架名为MTPC,允许通过选择不同的电路架构来探索对未来标记联合分布进行编码的不同方式,从而推广(层次)混合模型、隐马尔可夫模型和张量网络等经典模型。我们通过改造现有的字节级LLM(如EvaByte)展示了MTPC的有效性。实验表明,与采用独立性假设的MTP相比,MTPC在结合推测解码时显著加速生成,同时保证保留原始验证器LLM的性能。在探索MTPC的各种参数化方式(如PC架构、验证器与草稿LLM之间的部分层共享)时,我们还严格研究了表达能力与延迟之间的最佳权衡。
摘要 :Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), including byte-level LLMs, which are tokeniser-free but prohibitively slow. However, existing MTP methods often sacrifice expressiveness by assuming independence between future tokens. In this work, we investigate the trade-off between expressiveness and latency in MTP within the framework of probabilistic circuits (PCs). Our framework, named MTPC, allows one to explore different ways to encode the joint distributions over future tokens by selecting different circuit architectures, generalising classical models such as (hierarchical) mixture models, hidden Markov models and tensor networks. We show the efficacy of MTPC by retrofitting existing byte-level LLMs, such as EvaByte. Our experiments show that, when combined with speculative decoding, MTPC significantly speeds up generation compared to MTP with independence assumptions, while guaranteeing to retain the performance of the original verifier LLM. We also rigorously study the optimal trade-off between expressiveness and latency when exploring the possible parameterisations of MTPC, such as PC architectures and partial layer sharing between the verifier and draft LLMs.


【6】Power Ensemble Aggregation for Improved Extreme Event AI Prediction
标题:幂平均集成聚合改进极端事件AI预测
链接:https://arxiv.org/abs/2511.11170

作者:Julien Collard,Pierre Gentine,Tian Zheng
备注:Accepted for the NeurIPS 2025 ML4PS workshop
摘要:本文讨论了使用机器学习方法改进气候极端事件(特别是热浪)预测的关键挑战。我们将其表述为一个分类问题:预测地表气温是否会在指定时间范围内超过其第q个局部分位数。我们的主要发现是,使用幂平均来聚合集合预测可以显著提高分类器的性能。通过将基于机器学习的天气预报模型改造为生成式模型并应用这种非线性聚合方法,我们在预测极端高温事件方面取得了比同一模型的常规均值预测更高的准确率。我们的幂聚合方法展示出良好的前景和适应性:其最优性能随所选分位数阈值而变化,对更极端事件的预测更为有效。
摘要:This paper addresses the critical challenge of improving predictions of climate extreme events, specifically heat waves, using machine learning methods. Our work is framed as a classification problem in which we try to predict whether surface air temperature will exceed its q-th local quantile within a specified timeframe. Our key finding is that aggregating ensemble predictions using a power mean significantly enhances the classifier's performance. By making a machine-learning based weather forecasting model generative and applying this non-linear aggregation method, we achieve better accuracy in predicting extreme heat events than with the typical mean prediction from the same model. Our power aggregation method shows promise and adaptability, as its optimal performance varies with the quantile threshold chosen, demonstrating increased effectiveness for higher extremes prediction.
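
摘要中的"幂平均聚合"本身只有一行:对各集合成员给出的概率取 p 次幂的均值再开 p 次方。Python示意如下(p 的取值与最优分位数阈值的关系以原文实验为准):

import numpy as np

def power_mean(probs, p=2.0):
    # probs: (成员数, ...) 的超阈值概率;p>1 时放大高置信成员的影响,p=1 退化为普通均值
    probs = np.asarray(probs)
    return np.mean(probs ** p, axis=0) ** (1.0 / p)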


【7】SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems
标题:SMART:用于预测Dragonfly系统中应用程序运行时间的代理模型
链接:https://arxiv.org/abs/2511.11111

作者:Xin Wang,Pietro Lodi Rizzini,Sourav Medya,Zhiling Lan
备注:Accepted at AAAI 2026
摘要:Dragonfly网络以其高基数、低直径的结构,成为高性能计算中领先的互连方案。一个主要挑战是共享网络链路上的工作负载干扰。并行离散事件仿真(PDES)是分析工作负载干扰的常用方法。然而,高保真PDES计算昂贵,难以用于大规模或实时场景。结合数据驱动代理模型的混合仿真提供了一种有前景的替代方案,尤其适用于预测应用程序运行时间这一因网络流量动态行为而变得复杂的任务。我们提出了SMART,一种结合图神经网络(GNN)和大语言模型(LLM)的代理模型,从端口级路由器数据中同时捕捉空间和时间模式。SMART优于现有的统计和机器学习基线,能够实现准确的运行时间预测,并支持Dragonfly网络的高效混合仿真。
摘要:The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network links. Parallel discrete event simulation (PDES) is commonly used to analyze workload interference. However, high-fidelity PDES is computationally expensive, making it impractical for large-scale or real-time scenarios. Hybrid simulation that incorporates data-driven surrogate models offers a promising alternative, especially for forecasting application runtime, a task complicated by the dynamic behavior of network traffic. We present SMART, a surrogate model that combines graph neural networks (GNNs) and large language models (LLMs) to capture both spatial and temporal patterns from port level router data. SMART outperforms existing statistical and machine learning baselines, enabling accurate runtime prediction and supporting efficient hybrid simulation of Dragonfly networks.


【8】Sheaf Cohomology of Linear Predictive Coding Networks
标题:线性预测编码网络的层上同调
链接:https://arxiv.org/abs/2511.11092

作者:Jeffrey Seely
备注:Accepted to NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations
摘要:预测编码(PC)用对权重和激活的局部优化取代了全局反向传播。我们证明线性PC网络可以自然地表述为胞腔层(cellular sheaf):层的上边缘算子将激活映射为逐边的预测误差,而PC推理则是层拉普拉斯算子下的扩散。层上同调进而刻画了推理无法消除的不可约误差模式。我们分析了反馈回路造成内部矛盾的递归拓扑,这些矛盾引入与监督无关的预测误差。利用霍奇分解,我们确定这些矛盾何时导致学习停滞。层形式体系既为识别有问题的网络配置提供了诊断工具,也为递归PC网络的有效权重初始化提供了设计原则。
摘要:Predictive coding (PC) replaces global backpropagation with local optimization over weights and activations. We show that linear PC networks admit a natural formulation as cellular sheaves: the sheaf coboundary maps activations to edge-wise prediction errors, and PC inference is diffusion under the sheaf Laplacian. Sheaf cohomology then characterizes irreducible error patterns that inference cannot remove. We analyze recurrent topologies where feedback loops create internal contradictions, introducing prediction errors unrelated to supervision. Using a Hodge decomposition, we determine when these contradictions cause learning to stall. The sheaf formalism provides both diagnostic tools for identifying problematic network configurations and design principles for effective weight initialization for recurrent PC networks.
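
下面用一个两节点、一条边的玩具胞腔层演示摘要中的对象(Python/NumPy;限制映射 A、B 为随意选取的示例,仅为说明"上边缘算子、层拉普拉斯算子与扩散"三者的对应关系):

import numpy as np

A = np.array([[1.0, 0.0]])       # 源节点 u 到边 e 的限制映射 (R^2 -> R^1)
B = np.array([[0.0, 1.0]])       # 汇节点 v 到边 e 的限制映射
delta = np.hstack([A, -B])       # 上边缘算子:delta @ x = A x_u - B x_v,即边上的预测误差
L = delta.T @ delta              # 层拉普拉斯算子

x = np.random.randn(4)           # 两个节点的激活拼接 (x_u, x_v)
for _ in range(200):             # PC 推理 = 层拉普拉斯算子下的扩散(梯度流)
    x -= 0.1 * (L @ x)
# 收敛极限落在 ker(delta) 中,此时预测误差可被全部消除;
# 推理无法消除的误差模式由层上同调刻画。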


【9】Flow matching-based generative models for MIMO channel estimation
标题:基于流匹配的生成模型用于MIMO信道估计
链接:https://arxiv.org/abs/2511.10941

作者:Wenkai Liu,Nan Ma,Jianqiao Chen,Xiaoxuan Qi,Yuhang Ma
备注:6 pages, 4 figures
摘要:基于扩散模型(DM)的信道估计通过带去噪过程的后验采样逐步生成信道样本,在高精度信道状态信息(CSI)获取中展现出潜力。然而,采样速度慢是近来基于DM的方案面临的一个关键挑战。为缓解这一问题,我们提出了一种新颖的基于流匹配(FM)的多输入多输出(MIMO)信道估计生成模型。我们首先在FM框架内表述信道估计问题,构造从含噪信道分布到真实信道分布的条件概率路径;在该设定下,路径沿直线轨迹以恒定速度演化。然后,在此指导下,我们推导出仅依赖于噪声统计的速度场,用于指导生成模型的训练。此外,在采样阶段,我们将训练好的速度场作为信道估计的先验信息,从而可以通过常微分方程(ODE)欧拉求解器快速可靠地增强含噪信道。最后,数值结果表明,与得分匹配(SM)等其他流行的基于DM的方案相比,所提出的基于FM的信道估计方案可以显著减少采样开销,同时在不同信道条件下实现更优的信道估计精度。
摘要:Diffusion model (DM)-based channel estimation, which generates channel samples via a posteriori sampling stepwise with denoising process, has shown potential in high-precision channel state information (CSI) acquisition. However, slow sampling speed is an essential challenge for recent developed DM-based schemes. To alleviate this problem, we propose a novel flow matching (FM)-based generative model for multiple-input multiple-output (MIMO) channel estimation. We first formulate the channel estimation problem within FM framework, where the conditional probability path is constructed from the noisy channel distribution to the true channel distribution. In this case, the path evolves along the straight-line trajectory at a constant speed. Then, guided by this, we derive the velocity field that depends solely on the noise statistics to guide generative models training. Furthermore, during the sampling phase, we utilize the trained velocity field as prior information for channel estimation, which allows for quick and reliable noise channel enhancement via ordinary differential equation (ODE) Euler solver. Finally, numerical results demonstrate that the proposed FM-based channel estimation scheme can significantly reduce the sampling overhead compared to other popular DM-based schemes, such as the score matching (SM)-based scheme. Meanwhile, it achieves superior channel estimation accuracy under different channel conditions.
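
摘要中"用训练好的速度场沿直线路径做欧拉积分"的采样过程可示意如下(PyTorch;velocity_field(h, t) 为训练所得速度场的假设接口):

import torch

def fm_channel_estimate(h_noisy, velocity_field, steps=10):
    # 从含噪信道出发,沿 ODE dh/dt = v(h, t) 用欧拉法积分到 t=1
    h, dt = h_noisy.clone(), 1.0 / steps
    for i in range(steps):
        t = torch.tensor(i * dt)
        h = h + dt * velocity_field(h, t)
    return h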


【10】Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation
标题:通过迹估计快速计算神经正切核的对齐、范数与有效秩
链接:https://arxiv.org/abs/2511.10796

作者:James Hazelden
摘要:神经正切核(NTK)刻画了模型状态在梯度下降中的演化方式。计算完整的NTK矩阵通常不可行,对递归架构尤其如此。在这里,我们引入一种无矩阵(matrix-free)视角,使用迹估计来快速分析经验的有限宽度NTK。这使得能够快速计算NTK的迹、Frobenius范数、有效秩和对齐度。我们给出了基于Hutch++迹估计器的数值方案,并具有可证明的快速收敛保证。此外,我们表明,由于NTK的结构,仅使用前向或反向模式自动微分即可计算迹,无需同时使用两种模式。我们证明这些所谓的单侧估计器在低样本情形下可以优于Hutch++,特别是当模型状态与参数数量之间的差距很大时。总体而言,我们的结果表明,无矩阵随机化方法可以带来多个数量级的加速,从而更快地分析和应用NTK。
摘要:The Neural Tangent Kernel (NTK) characterizes how a model's state evolves over Gradient Descent. Computing the full NTK matrix is often infeasible, especially for recurrent architectures. Here, we introduce a matrix-free perspective, using trace estimation to rapidly analyze the empirical, finite-width NTK. This enables fast computation of the NTK's trace, Frobenius norm, effective rank, and alignment. We provide numerical recipes based on the Hutch++ trace estimator with provably fast convergence guarantees. In addition, we show that, due to the structure of the NTK, one can compute the trace using only forward- or reverse-mode automatic differentiation, not requiring both modes. We show these so-called one-sided estimators can outperform Hutch++ in the low-sample regime, especially when the gap between the model state and parameter count is large. In total, our results demonstrate that matrix-free randomized approaches can yield speedups of many orders of magnitude, leading to faster analysis and applications of the NTK.
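
摘要中"仅用前向模式自动微分的单侧迹估计"基于恒等式 tr(NTK) = tr(J J^T) = ||J||_F^2 = E||Ju||^2(u 为参数空间的标准高斯向量)。一个极简的 PyTorch 示意如下(采样数等细节为假设,并非论文的 Hutch++ 完整配方):

import torch
from torch.func import functional_call, jvp

def ntk_trace(model, params, x, n_samples=16):
    # 每次采样:对参数方向 u 做一次前向模式 JVP,累加 ||J u||^2
    # params 可取 dict(model.named_parameters())
    f = lambda p: functional_call(model, p, (x,))
    est = 0.0
    for _ in range(n_samples):
        u = {k: torch.randn_like(v) for k, v in params.items()}
        _, Ju = jvp(f, (params,), (u,))
        est += (Ju ** 2).sum() / n_samples
    return est   # tr(NTK) 的无偏估计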


【11】LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices
标题:LAD-BNet:用于边缘设备实时能源预测的滞后感知双分支网络
链接:https://arxiv.org/abs/2511.10680

作者:Jean-Philippe Lignier
备注:27 pages, in French language. 10 tables, 26 references. Submitted to Energy and AI
摘要:边缘设备上的实时能源预测是智能电网优化和智能建筑的主要挑战。我们提出了LAD-BNet(滞后感知双分支网络),这是一种创新的神经架构,通过Google Coral TPU进行边缘推理优化。我们的混合方法将一个专门用于明确利用时间滞后的分支与具有膨胀卷积的时间卷积网络(TCN)相结合,从而能够同时捕获短期和长期依赖关系。在具有10分钟时间分辨率的真实能耗数据上进行测试,LAD-BNet在1小时范围内实现了14.49%的MAPE,在Edge TPU上的推理时间仅为18 ms,与CPU相比加速8-12倍。多尺度架构可实现长达12小时的预测,并控制性能下降。我们的模型比LSTM基线提高了2.39%,比纯TCN架构提高了3.04%,同时保持了适合嵌入式设备限制的180 MB内存占用。这些结果为实时能源优化、需求管理和运营规划等工业应用铺平了道路。
摘要:Real-time energy forecasting on edge devices represents a major challenge for smart grid optimization and intelligent buildings. We present LAD-BNet (Lag-Aware Dual-Branch Network), an innovative neural architecture optimized for edge inference with Google Coral TPU. Our hybrid approach combines a branch dedicated to explicit exploitation of temporal lags with a Temporal Convolutional Network (TCN) featuring dilated convolutions, enabling simultaneous capture of short and long-term dependencies. Tested on real energy consumption data with 10-minute temporal resolution, LAD-BNet achieves 14.49% MAPE at 1-hour horizon with only 18ms inference time on Edge TPU, representing an 8-12 x acceleration compared to CPU. The multi-scale architecture enables predictions up to 12 hours with controlled performance degradation. Our model demonstrates a 2.39% improvement over LSTM baselines and 3.04% over pure TCN architectures, while maintaining a 180MB memory footprint suitable for embedded device constraints. These results pave the way for industrial applications in real-time energy optimization, demand management, and operational planning.


【12】Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths
标题:基于离散观察独立路径的神经网络扩散过程漂移估计
链接:https://arxiv.org/abs/2511.11161

作者:Yuzhen Zhao,Yating Liu,Marc Hoffmann
摘要:本文研究时间齐次扩散过程在紧致区域上漂移函数的非参数估计,基于$N$条独立轨迹的高频离散观测。我们提出了一个基于神经网络的估计器,并推导出非渐近收敛速度,可分解为训练误差、近似误差以及一个以${\log N}/{N}$为尺度的与扩散相关的项。对于复合型漂移函数,我们建立了显式的收敛速度。在数值实验中,我们考虑一个由带局部振荡的双层复合结构生成、具有局部波动的漂移函数,并表明经验收敛速度与输入维度$d$无关。与$B$样条方法相比,神经网络估计器实现了更好的收敛速度,并能更有效地捕捉局部特征,在高维设定下尤为明显。
摘要:This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive a non-asymptotic convergence rate, decomposed into a training error, an approximation error, and a diffusion-related term scaling as ${\log N}/{N}$. For compositional drift functions, we establish an explicit rate. In the numerical experiments, we consider a drift function with local fluctuations generated by a double-layer compositional structure featuring local oscillations, and show that the empirical convergence rate becomes independent of the input dimension $d$. Compared to the $B$-spline method, the neural network estimator achieves better convergence rates and more effectively captures local features, particularly in higher-dimensional settings.
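
摘要中基于 N 条独立轨迹的漂移估计,其训练目标可写成对差商的回归(PyTorch 示意;b_net 为将 R^d 映到 R^d 的假设网络,省略了紧致区域截断等细节):

import torch

def drift_loss(b_net, X, dt):
    # X: (N, T, d) 的高频离散观测;用差商 (X_{t+dt}-X_t)/dt 作为 b(X_t) 的回归目标
    incr = (X[:, 1:] - X[:, :-1]) / dt
    d = X.shape[-1]
    pred = b_net(X[:, :-1].reshape(-1, d))
    return ((pred - incr.reshape(-1, d)) ** 2).mean()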


其他神经网络|深度学习|模型|建模(17篇)

【1】CVChess: A Deep Learning Framework for Converting Chessboard Images to Forsyth-Edwards Notation
标题:CVChess:将棋盘图像转换为Forsyth-Edwards符号的深度学习框架
链接:https://arxiv.org/abs/2511.11522

作者:Luthira Abeykoon,Ved Patel,Gawthaman Senthilvelan,Darshan Kasundra
摘要:自疫情以来,国际象棋的关注度大幅上升,这主要得益于在线学习平台的可及性。然而,实体棋局缺乏等效的辅助工具,从而在实体与数字国际象棋体验之间形成了鸿沟。本文介绍了CVChess,一个将棋盘图像转换为Forsyth-Edwards记号(FEN)的深度学习框架,转换结果随后可输入在线国际象棋引擎,给出最佳的下一步着法。我们的方法采用带残差层的卷积神经网络(CNN),从智能手机相机图像中进行棋子识别。该系统通过多个步骤处理实体棋盘的RGB图像:使用霍夫线变换进行边缘检测的图像预处理,投影变换以获得俯视的棋盘对齐,分割为64个单独的方格,并使用残差CNN将棋子分类为13个类别(6种白方棋子、6种黑方棋子和空方格)。残差连接有助于保留低级视觉特征,同时支持更深层的特征提取,提高训练过程中的准确性和稳定性。我们使用国际象棋识别数据集(ChessReD)训练和评估模型,该数据集包含在不同照明条件和角度下拍摄的10,800张带标注的智能手机图像。识别结果被编码为FEN字符串,可输入国际象棋引擎以生成最优着法。
摘要:Chess has experienced a large increase in viewership since the pandemic, driven largely by the accessibility of online learning platforms. However, no equivalent assistance exists for physical chess games, creating a divide between analog and digital chess experiences. This paper presents CVChess, a deep learning framework for converting chessboard images to Forsyth-Edwards Notation (FEN), which is later input into online chess engines to provide you with the best next move. Our approach employs a convolutional neural network (CNN) with residual layers to perform piece recognition from smartphone camera images. The system processes RGB images of a physical chess board through a multistep process: image preprocessing using the Hough Line Transform for edge detection, projective transform to achieve a top-down board alignment, segmentation into 64 individual squares, and piece classification into 13 classes (6 unique white pieces, 6 unique black pieces and an empty square) using the residual CNN. Residual connections help retain low-level visual features while enabling deeper feature extraction, improving accuracy and stability during training. We train and evaluate our model using the Chess Recognition Dataset (ChessReD), containing 10,800 annotated smartphone images captured under diverse lighting conditions and angles. The resulting classifications are encoded as an FEN string, which can be fed into a chess engine to generate the most optimal move.
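
摘要描述的几何流水线(透视校正后切分64格)可用 OpenCV 示意如下(Python;四个角点假设已由霍夫线变换等步骤检出,顺序为左上、右上、右下、左下):

import cv2
import numpy as np

def board_to_squares(img, corners):
    # 将棋盘四角映射到 800x800 的俯视图,再按 8x8 网格切成 64 个方格
    dst = np.float32([[0, 0], [800, 0], [800, 800], [0, 800]])
    M = cv2.getPerspectiveTransform(np.float32(corners), dst)
    top = cv2.warpPerspective(img, M, (800, 800))
    return [top[r*100:(r+1)*100, c*100:(c+1)*100]
            for r in range(8) for c in range(8)]   # 之后逐格送入残差CNN分类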


【2】FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models
标题:FarSkip-Collective:消除混合专家模型中的通信阻塞
链接:https://arxiv.org/abs/2511.11505

作者:Yonatan Dukler,Guihong Li,Deval Shah,Vikram Appia,Emad Barsoum
摘要:阻塞式通信是在分布式环境中高效运行MoE的主要障碍。为解决这一问题,我们提出了FarSkip-Collective,它修改现代模型的架构,使其计算能够与通信重叠。我们的方法将架构修改为跳过模型中的部分连接,而修改后的模型架构能否保持原有能力先验上并不明确,对于大型最先进模型以及修改所有模型层的情形尤其如此。我们对这一问题给出了肯定的回答,并完整转换了一系列从16B到109B参数不等的最先进模型,使其通信可以重叠,同时达到与原始开源版本相当的精度。例如,我们通过自蒸馏转换了Llama 4 Scout(109B),在广泛的下游评估中,其平均精度与其指令微调版本相差不到1%。除了展示大型修改模型保持的精度外,我们还通过显式将通信与计算重叠的优化实现来兑现FarSkip-Collective的收益,在现有框架中同时加速训练和推理。
摘要:Blocking communication presents a major hurdle in running MoEs efficiently in distributed settings. To address this, we present FarSkip-Collective which modifies the architecture of modern models to enable overlapping of their computation with communication. Our approach modifies the architecture to skip connections in the model and it is unclear a priori whether the modified model architecture can remain as capable, especially for large state-of-the-art models and while modifying all of the model layers. We answer this question in the affirmative and fully convert a series of state-of-the-art models varying from 16B to 109B parameters to enable overlapping of their communication while achieving accuracy on par with their original open-source releases. For example, we convert Llama 4 Scout (109B) via self-distillation and achieve average accuracy within 1% of its instruction tuned release averaged across a wide range of downstream evaluations. In addition to demonstrating retained accuracy of the large modified models, we realize the benefits of FarSkip-Collective through optimized implementations that explicitly overlap communication with computation, accelerating both training and inference in existing frameworks.


【3】Learning and Testing Convex Functions
标题:学习和测试凸函数
链接:https://arxiv.org/abs/2511.11498

作者:Renato Ferreira Pinto,Cassandra Marcussen,Elchanan Mossel,Shivam Nadimpalli
备注:43 pages
摘要:我们研究高斯空间上实值凸函数的学习与测试问题。尽管函数凸性在数学、统计和计算机科学中得到了广泛研究,但其可学习性与可测试性大多只在离散或受限设定下被考察,而且通常是相对于汉明距离,这并不适合实值函数。相比之下,我们在标准高斯测度下的高维设定中研究这些问题,假设可对函数进行采样访问,并满足温和的光滑性条件,即Lipschitz性。光滑性假设是自然的,事实上即使在一维情形下也是必要的:没有它,凸性无法从有限多个样本中推断出来。作为主要结果,我们给出:(一)凸函数学习:一个针对Lipschitz凸函数的不可知真(proper)学习算法,使用$n^{O(1/\varepsilon^2)}$个样本达到误差$\varepsilon$,以及在相关统计查询(CSQ)模型中$n^{\mathrm{poly}(1/\varepsilon)}$个样本的互补下界;(二)凸函数测试:一个具有相同样本复杂度的容忍型(双侧)凸性测试器(作为学习结果的推论),以及一个使用$O(\sqrt{n}/\varepsilon)^n$个样本的单侧测试器(从不拒绝凸函数)。
摘要:We consider the problems of \emph{learning} and \emph{testing} real-valued convex functions over Gaussian space. Despite the extensive study of function convexity across mathematics, statistics, and computer science, its learnability and testability have largely been examined only in discrete or restricted settings -- typically with respect to the Hamming distance, which is ill-suited for real-valued functions.   In contrast, we study these problems in high dimensions under the standard Gaussian measure, assuming sample access to the function and a mild smoothness condition, namely Lipschitzness. A smoothness assumption is natural and, in fact, necessary even in one dimension: without it, convexity cannot be inferred from finitely many samples. As our main results, we give:   - Learning Convex Functions: An agnostic proper learning algorithm for Lipschitz convex functions that achieves error $\varepsilon$ using $n^{O(1/\varepsilon^2)}$ samples, together with a complementary lower bound of $n^{\mathrm{poly}(1/\varepsilon)}$ samples in the \emph{correlational statistical query (CSQ)} model.   - Testing Convex Functions: A tolerant (two-sided) tester for convexity of Lipschitz functions with the same sample complexity (as a corollary of our learning result), and a one-sided tester (which never rejects convex functions) using $O(\sqrt{n}/\varepsilon)^n$ samples.


【4】Retrofit: Continual Learning with Bounded Forgetting for Security Applications
标题:RETROFIT:面向安全应用的有界遗忘持续学习
链接:https://arxiv.org/abs/2511.11439

作者:Yiling He,Junchi Lei,Hongyu She,Shuo Shao,Xinran Zheng,Yiping Liu,Zhan Qin,Lorenzo Cavallaro
摘要:现代安全分析越来越多地由深度学习模型驱动,但随着威胁态势演变和数据表示变化,其性能往往会下降。虽然持续学习(CL)提供了保持模型有效性的一个有前景的范式,但许多方法依赖于完全重训练或数据重放,这在数据敏感环境中不可行。此外,现有方法仍不足以应对安全关键场景,在知识迁移上面临两个相互耦合的挑战:在没有旧数据的情况下保留先前知识,以及在最小干扰下整合新知识。   我们提出了RETROFIT,一种无需回溯历史数据的持续学习方法,通过有界遗忘实现有效的知识迁移。我们的核心思想是通过参数级合并来整合先前训练的模型与新微调的模型,让它们分别充当旧知识和新知识的教师,从而消除对历史数据的需求。为了减轻干扰,我们应用低秩和稀疏更新,将参数变化限制在相互独立的子空间中,同时由模型置信度引导的知识仲裁机制动态平衡两位教师的贡献。我们在两个代表性应用上的评估表明,RETROFIT在保持适应性的同时持续缓解遗忘。在时间漂移下的恶意软件检测中,它将保留分数从CL基线的20.2%大幅提高到38.6%,并在新数据上超过了oracle上界。在跨反编译层级的二进制摘要任务中(分析去符号的二进制文件尤其具有挑战性),RETROFIT的BLEU分数约为先前工作所用迁移学习的两倍,并在跨表示泛化上超过了所有基线。
摘要:Modern security analytics are increasingly powered by deep learning models, but their performance often degrades as threat landscapes evolve and data representations shift. While continual learning (CL) offers a promising paradigm to maintain model effectiveness, many approaches rely on full retraining or data replay, which are infeasible in data-sensitive environments. Moreover, existing methods remain inadequate for security-critical scenarios, facing two coupled challenges in knowledge transfer: preserving prior knowledge without old data and integrating new knowledge with minimal interference.   We propose RETROFIT, a data retrospective-free continual learning method that achieves bounded forgetting for effective knowledge transfer. Our key idea is to consolidate previously trained and newly fine-tuned models, serving as teachers of old and new knowledge, through parameter-level merging that eliminates the need for historical data. To mitigate interference, we apply low-rank and sparse updates that confine parameter changes to independent subspaces, while a knowledge arbitration dynamically balances the teacher contributions guided by model confidence. Our evaluation on two representative applications demonstrates that RETROFIT consistently mitigates forgetting while maintaining adaptability. In malware detection under temporal drift, it substantially improves the retention score, from 20.2% to 38.6% over CL baselines, and exceeds the oracle upper bound on new data. In binary summarization across decompilation levels, where analyzing stripped binaries is especially challenging, RETROFIT achieves around twice the BLEU score of transfer learning used in prior work and surpasses all baselines in cross-representation generalization.


【5】BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning
标题:BOFA:基于CLIP的类增量学习的桥层正交低秩融合
链接:https://arxiv.org/abs/2511.11421

作者:Lan Li,Tao Hu,Da-Wei Zhou,Han-Jia Ye,De-Chuan Zhan
备注:Accepted by AAAI 2026
摘要:类增量学习(CIL)旨在不断学习新类别,而不遗忘先前获得的知识。像CLIP这样的视觉语言模型通过多模态监督提供了强大的可迁移表示,使其对CIL颇具前景。然而,将CLIP应用于CIL存在两大挑战:(1)适应下游任务通常需要额外的可学习模块,增加了模型复杂性和遗忘的易感性;(2)虽然多模态表示具有互补优势,但现有方法尚未充分发挥其有效整合视觉与文本模态的潜力。为了解决这些问题,我们提出了BOFA(桥层正交融合自适应),一个新颖的CIL框架。BOFA将所有模型自适应都限制在CLIP已有的跨模态桥接层内,因此不增加额外参数或推理成本。为了防止该层内的遗忘,它利用正交低秩融合机制,将参数更新约束到一个低秩的"安全子空间"中,该子空间在数学上被构造为与过去任务特征正交。这确保了无需数据重放的稳定知识积累。此外,BOFA采用跨模态混合原型,将稳定的文本原型与来自稳定适应后桥接层的视觉原型协同起来,提升分类性能。在标准基准上的大量实验表明,与现有方法相比,BOFA实现了更高的准确性和效率。
摘要 :Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable representations via multi-modal supervision, making them promising for CIL. However, applying CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA confines all model adaptation exclusively to CLIP's existing cross-modal bridge-layer, thereby adding no extra parameters or inference cost. To prevent forgetting within this layer, it leverages Orthogonal Low-Rank Fusion, a mechanism that constrains parameter updates to a low-rank ``safe subspace" mathematically constructed to be orthogonal to past task features. This ensures stable knowledge accumulation without data replay. Furthermore, BOFA employs a cross-modal hybrid prototype that synergizes stable textual prototypes with visual counterparts derived from our stably adapted bridge-layer, enhancing classification performance. Extensive experiments on standard benchmarks show that BOFA achieves superior accuracy and efficiency compared to existing methods.


【6】Toward Multi-Fidelity Machine Learning Force Field for Cathode Materials
标题:面向阴极材料的多保真度机器学习力场
链接:https://arxiv.org/abs/2511.11361

作者:Guangyi Dong,Zhihui Wang
摘要:机器学习力场(MLFF)采用神经网络将原子结构映射到系统能量,有效地将第一原理计算的高准确性与经验力场的计算效率结合起来。它们被广泛用于计算材料模拟。然而,MLFF用于锂离子电池正极材料的开发和应用仍然相对有限。这主要是由于阴极材料复杂的电子结构特性以及由此导致的可用于力场训练的高质量计算数据集的稀缺性。在这项工作中,我们开发了一个多保真度机器学习力场框架,以提高计算结果的数据效率,它可以同时利用阴极材料的低保真度非磁性和高保真磁性计算数据集进行训练。在磷酸锰铁锂(LMFP)正极材料系统上进行的测试证明了这种多保真度方法的有效性。这项工作有助于以较低的训练数据集成本实现阴极材料的高精度MLFF训练,并为将MLFF应用于阴极材料的计算模拟提供了新的视角。
摘要:Machine learning force fields (MLFFs), which employ neural networks to map atomic structures to system energies, effectively combine the high accuracy of first-principles calculation with the computational efficiency of empirical force fields. They are widely used in computational materials simulations. However, the development and application of MLFFs for lithium-ion battery cathode materials remain relatively limited. This is primarily due to the complex electronic structure characteristics of cathode materials and the resulting scarcity of high-quality computational datasets available for force field training. In this work, we develop a multi-fidelity machine learning force field framework to enhance the data efficiency of computational results, which can simultaneously utilize both low-fidelity non-magnetic and high-fidelity magnetic computational datasets of cathode materials for training. Tests conducted on the lithium manganese iron phosphate (LMFP) cathode material system demonstrate the effectiveness of this multi-fidelity approach. This work helps to achieve high-accuracy MLFF training for cathode materials at a lower training dataset cost, and offers new perspectives for applying MLFFs to computational simulations of cathode materials.
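下面给出"多保真度联合训练"这一思想的极简PyTorch示意:共享主干网络提取结构表征,两个输出头分别拟合低保真(非磁性)与高保真(磁性)能量标签。网络结构、维度与保真度接口均为我们为说明而做的假设,并非论文框架的实际实现。

import torch
import torch.nn as nn

class MultiFidelityEnergyModel(nn.Module):
    # 共享主干 + 两个保真度头:低保真与高保真数据共同训练同一表征
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden), nn.SiLU())
        self.heads = nn.ModuleDict({"low": nn.Linear(hidden, 1),
                                    "high": nn.Linear(hidden, 1)})

    def forward(self, x, fidelity):
        return self.heads[fidelity](self.trunk(x))

model = MultiFidelityEnergyModel()
x_low, y_low = torch.randn(8, 32), torch.randn(8, 1)    # 非磁性计算数据(假设的结构描述符)
x_high, y_high = torch.randn(4, 32), torch.randn(4, 1)  # 磁性计算数据,通常更稀缺
loss = nn.functional.mse_loss(model(x_low, "low"), y_low) \
     + nn.functional.mse_loss(model(x_high, "high"), y_high)
loss.backward()  # 两种保真度的梯度同时更新共享主干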


【7】StochEP: Stochastic Equilibrium Propagation for Spiking Convergent Recurrent Neural Networks
标题:StochEP:尖峰收敛递归神经网络的随机平衡传播
链接:https://arxiv.org/abs/2511.11320

作者:Jiaqi Lin,Yi Jiang,Abhronil Sengupta
摘要:尖峰神经网络(SNN)有望实现节能、稀疏、受生物启发的计算。使用时间反向传播(BPTT)和替代梯度训练它们可以获得强大的性能,但在生物学上仍不可信。平衡传播(EP)提供了一种更局部化、更具生物学基础的替代方案。然而,现有的EP框架主要基于确定性神经元,要么需要复杂的机制来处理尖峰动力学的不连续性,要么无法扩展到简单视觉任务之外。受生物尖峰机制的随机性和近期硬件趋势的启发,我们提出了一个随机EP框架,将概率尖峰神经元整合进EP范式。这一表述平滑了优化景观,稳定了训练,并在深度卷积尖峰收敛递归神经网络(CRNN)中实现了可扩展的学习。我们提供的理论保证表明,所提出的随机EP动力学在平均场理论下近似确定性EP,从而继承其基本理论保证。所提框架在视觉基准上缩小了与BPTT训练的SNN以及EP训练的非尖峰CRNN之间的差距,同时保留了局部性,表明随机EP是神经形态与片上学习的一个有前途的方向。
摘要:Spiking Neural Networks (SNNs) promise energy-efficient, sparse, biologically inspired computation. Training them with Backpropagation Through Time (BPTT) and surrogate gradients achieves strong performance but remains biologically implausible. Equilibrium Propagation (EP) provides a more local and biologically grounded alternative. However, existing EP frameworks, primarily based on deterministic neurons, either require complex mechanisms to handle discontinuities in spiking dynamics or fail to scale beyond simple visual tasks. Inspired by the stochastic nature of biological spiking mechanism and recent hardware trends, we propose a stochastic EP framework that integrates probabilistic spiking neurons into the EP paradigm. This formulation smoothens the optimization landscape, stabilizes training, and enables scalable learning in deep convolutional spiking convergent recurrent neural networks (CRNNs). We provide theoretical guarantees showing that the proposed stochastic EP dynamics approximate deterministic EP under mean-field theory, thereby inheriting its underlying theoretical guarantees. The proposed framework narrows the gap to both BPTT-trained SNNs and EP-trained non-spiking CRNNs in vision benchmarks while preserving locality, highlighting stochastic EP as a promising direction for neuromorphic and on-chip learning.


【8】Virtual Width Networks
标题:虚拟宽度网络
链接:https://arxiv.org/abs/2511.11238

作者 :Seed,Baisheng Li,Banggu Wu,Bole Ma,Bowen Xiao,Chaoyi Zhang,Cheng Li,Chengyi Wang,Chenyin Xu,Chi Zhang,Chong Hu,Daoguang Zan,Defa Zhu,Dongyu Xu,Du Li,Faming Wu,Fan Xia,Ge Zhang,Guang Shi,Haobin Chen,Hongyu Zhu,Hongzhi Huang,Huan Zhou,Huanzhang Dou,Jianhui Duan,Jianqiao Lu,Jianyu Jiang,Jiayi Xu,Jiecao Chen,Jin Chen,Jin Ma,Jing Su,Jingji Chen,Jun Wang,Jun Yuan,Juncai Liu,Jundong Zhou,Kai Hua,Kai Shen,Kai Xiang,Kaiyuan Chen,Kang Liu,Ke Shen,Liang Xiang,Lin Yan,Lishu Luo,Mengyao Zhang,Ming Ding,Mofan Zhang,Nianning Liang,Peng Li,Penghao Huang,Pengpeng Mu,Qi Huang,Qianli Ma,Qiyang Min,Qiying Yu,Renming Pang,Ru Zhang,Shen Yan,Shen Yan,Shixiong Zhao,Shuaishuai Cao,Shuang Wu,Siyan Chen,Siyu Li,Siyuan Qiao,Tao Sun,Tian Xin,Tiantian Fan,Ting Huang,Ting-Han Fan,Wei Jia,Wenqiang Zhang,Wenxuan Liu,Xiangzhong Wu,Xiaochen Zuo,Xiaoying Jia,Ximing Yang,Xin Liu,Xin Yu,Xingyan Bin,Xintong Hao,Xiongcai Luo,Xujing Li,Xun Zhou,Yanghua Peng,Yangrui Chen,Yi Lin,Yichong Leng,Yinghao Li,Yingshuan Song,Yiyuan Ma,Yong Shan,Yongan Xiang,Yonghui Wu,Yongtao Zhang,Yongzhen Yao,Yu Bao,Yuehang Yang,Yufeng Yuan,Yunshui Li,Yuqiao Xian,Yutao Zeng,Yuxuan Wang,Zehua Hong,Zehua Wang,Zengzhi Wang,Zeyu Yang,Zhengqiang Yin,Zhenyi Lu,Zhexi Zhang,Zhi Chen,Zhi Zhang,Zhiqi Lin,Zihao Huang,Zilin Xu,Ziyun Wei,Zuo Wang
摘要:我们引入虚拟宽度网络(VWN),一个框架,提供更广泛的表示的好处,而不会产生增加隐藏大小的二次成本。VWN从主干宽度中提取代表性宽度,在保持主干计算几乎不变的同时扩大嵌入空间。在我们的大规模实验中,8倍的扩展将下一个令牌的优化速度提高了2倍以上,将下一个2令牌预测的优化速度提高了3倍。随着损失差距的增加和收敛加速比的增加,这种优势在训练中得到了放大,这表明VWN不仅是令牌有效的,而且随着规模的增加也越来越有效。此外,我们确定虚拟宽度和损失减少之间的近似对数线性缩放关系,提供了初步的经验基础和动机,探索虚拟宽度缩放作为一个新的维度的大模型效率。
摘要:We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 times for next-token and 3 times for next-2-token prediction. The advantage amplifies over training as both the loss gap grows and the convergence-speedup ratio increases, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.


【9】Neural Network-Powered Finger-Drawn Biometric Authentication
标题:神经网络驱动的手指绘制生物识别认证
链接:https://arxiv.org/abs/2511.11235

作者:Maan Al Balkhi,Kordian Gontarska,Marko Harasic,Adrian Paschke
摘要:本文研究了基于神经网络的生物特征认证,使用在触摸屏设备上以手指绘制的数字。我们评估了CNN和自动编码器架构,通过手指输入描画的简单数字图案(0-9)进行用户认证。20名参与者每人在个人触摸屏设备上贡献了2,000个手指绘制的数字。我们比较了两种CNN架构:修改后的Inception-V1网络和面向移动环境的轻量级浅层CNN。此外,我们还研究了卷积和全连接自动编码器用于异常检测。两种CNN架构都实现了约89%的认证准确率,其中浅层CNN需要的参数更少。自动编码器方法实现了约75%的准确率。结果表明,手指绘制符号认证为触摸屏设备提供了一种可行、安全且用户友好的生物识别解决方案。这种方法可以与现有的基于图案的身份验证方法集成,为移动应用程序创建多层安全系统。
摘要:This paper investigates neural network-based biometric authentication using finger-drawn digits on touchscreen devices. We evaluated CNN and autoencoder architectures for user authentication through simple digit patterns (0-9) traced with finger input. Twenty participants contributed 2,000 finger-drawn digits each on personal touchscreen devices. We compared two CNN architectures: a modified Inception-V1 network and a lightweight shallow CNN for mobile environments. Additionally, we examined Convolutional and Fully Connected autoencoders for anomaly detection. Both CNN architectures achieved ~89% authentication accuracy, with the shallow CNN requiring fewer parameters. Autoencoder approaches achieved ~75% accuracy. The results demonstrate that finger-drawn symbol authentication provides a viable, secure, and user-friendly biometric solution for touchscreen devices. This approach can be integrated with existing pattern-based authentication methods to create multi-layered security systems for mobile applications.
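作为参考,下面是一个面向移动端的浅层CNN草图(PyTorch),用于对手指绘制的数字图像做"本人/非本人"二分类认证。层数、通道数与输入尺寸均为假设值,并非论文所用网络的确切配置。

import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    # 浅层 CNN:两个卷积块 + 小型全连接头,参数量远小于 Inception-V1
    def __init__(self, num_classes=2):              # 认证任务:本人 / 非本人
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):                           # x: (B, 1, 64, 64) 的轨迹图像
        return self.head(self.features(x))

print(ShallowCNN()(torch.randn(4, 1, 64, 64)).shape)  # torch.Size([4, 2])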


【10】On-line learning of dynamic systems: sparse regression meets Kalman filtering
标题:动态系统的在线学习:稀疏回归遇上卡尔曼滤波
链接:https://arxiv.org/abs/2511.11178

作者:Gianluigi Pillonetto,Akram Yazdani,Aleksandr Aravkin
摘要:从数据中学习控制方程对于理解物理、生物、工程等不同科学学科中物理系统的行为至关重要。SINDy算法已被证明能有效利用稀疏性来辨识非线性动力系统的简洁模型。在本文中,我们通过整合控制理论的基石算法卡尔曼滤波器(KF),将稀疏驱动的方法扩展到实时学习。由此产生的SINDy卡尔曼滤波器(SKF)通过将未知系统参数视为状态变量来统一这两个框架,从而能够实时推断任一方法单独都无法实现的复杂时变非线性模型。此外,SKF增强了KF的参数辨识策略,特别是通过前瞻误差,大大简化了稀疏水平、方差参数和切换时刻的估计。我们在参数漂移或切换的混沌洛伦兹系统上验证了SKF,并展示了其在由真实飞行数据构建的稀疏非线性飞机模型实时辨识中的有效性。
摘要:Learning governing equations from data is central to understanding the behavior of physical systems across diverse scientific disciplines, including physics, biology, and engineering. The Sindy algorithm has proven effective in leveraging sparsity to identify concise models of nonlinear dynamical systems. In this paper, we extend sparsity-driven approaches to real-time learning by integrating a cornerstone algorithm from control theory -- the Kalman filter (KF). The resulting Sindy Kalman Filter (SKF) unifies both frameworks by treating unknown system parameters as state variables, enabling real-time inference of complex, time-varying nonlinear models unattainable by either method alone. Furthermore, SKF enhances KF parameter identification strategies, particularly via look-ahead error, significantly simplifying the estimation of sparsity levels, variance parameters, and switching instants. We validate SKF on a chaotic Lorenz system with drifting or switching parameters and demonstrate its effectiveness in the real-time identification of a sparse nonlinear aircraft model built from real flight data.
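下面是"把未知系数当作卡尔曼滤波状态变量在线估计"这一核心思想的最小numpy示意。候选函数库、噪声设置均为假设;真实的SKF还包含稀疏化与前瞻误差等机制,此处省略。

import numpy as np

def library(x):
    # 假设的候选函数库 Θ(x):[1, x, x^2, sin x]
    return np.array([1.0, x, x**2, np.sin(x)])

n = 4
theta_est = np.zeros(n)          # 系数后验均值(状态变量)
P = np.eye(n) * 10.0             # 后验协方差
Q = np.eye(n) * 1e-6             # 过程噪声:允许参数缓慢漂移或切换
R = 0.01                         # 观测噪声方差

rng = np.random.default_rng(0)
theta_true = np.array([0.0, 1.5, 0.0, -2.0])
for k in range(500):
    x = rng.uniform(-2, 2)
    H = library(x)                        # 观测模型 y = Θ(x)·θ + v
    y = H @ theta_true + rng.normal(0, np.sqrt(R))
    P = P + Q                             # 预测步:随机游走,协方差膨胀
    S = H @ P @ H + R                     # 新息方差(标量)
    K = P @ H / S                         # 卡尔曼增益
    theta_est = theta_est + K * (y - H @ theta_est)
    P = P - np.outer(K, H @ P)            # 更新步
print(np.round(theta_est, 3))             # 应接近 theta_true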


【11】Training Neural Networks at Any Scale
标题:以任何规模训练神经网络
链接:https://arxiv.org/abs/2511.11163

作者:Thomas Pethick,Kimon Antonakopoulos,Antonio Silveti-Falls,Leena Chennuru Vankadara,Volkan Cevher
摘要:本文回顾了训练神经网络的现代优化方法,重点关注效率与规模。我们在一个统一的算法模板下呈现最先进的优化算法,突出适应问题内在结构的重要性。随后我们介绍如何使这些算法对问题的规模不敏感。我们的阐述旨在为希望参与这些激动人心的新进展的从业者和研究人员提供一份入门介绍。
摘要 :This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem. We then cover how to make these algorithms agnostic to the scale of the problem. Our exposition is intended as an introduction for both practitioners and researchers who wish to be involved in these exciting new developments.


【12】How Data Quality Affects Machine Learning Models for Credit Risk Assessment
标题:数据质量如何影响信用风险评估的机器学习模型
链接:https://arxiv.org/abs/2511.10964

作者:Andrea Maurino
摘要:机器学习(ML)模型越来越多地用于信用风险评估,其有效性在很大程度上取决于输入数据的质量。在本文中,我们研究了几个数据质量问题,包括缺失值,噪声属性,离群值和标签错误,对信用风险评估中使用的机器学习模型的预测准确性的影响。利用一个开源数据集,我们使用Pucktrick库引入受控数据损坏,以评估随机森林,SVM和Logistic回归等10个常用模型的鲁棒性。我们的实验显示,根据数据退化的性质和严重程度,模型的鲁棒性存在显著差异。此外,所提出的方法和配套工具为寻求增强数据管道鲁棒性的从业者提供了实际支持,并为研究人员提供了一个灵活的框架,以便在以数据为中心的人工智能环境中进行进一步的实验。
摘要:Machine Learning (ML) models are being increasingly employed for credit risk evaluation, with their effectiveness largely hinging on the quality of the input data. In this paper we investigate the impact of several data quality issues, including missing values, noisy attributes, outliers, and label errors, on the predictive accuracy of the machine learning model used in credit risk assessment. Utilizing an open-source dataset, we introduce controlled data corruption using the Pucktrick library to assess the robustness of 10 frequently used models like Random Forest, SVM, and Logistic Regression and so on. Our experiments show significant differences in model robustness based on the nature and severity of the data degradation. Moreover, the proposed methodology and accompanying tools offer practical support for practitioners seeking to enhance data pipeline robustness, and provide researchers with a flexible framework for further experimentation in data-centric AI contexts.
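下面用几行numpy演示"受控数据损坏"的一般做法(注入缺失值与标签错误)。需要强调:这只是通用示意,并非Pucktrick库的真实API;函数名、参数与比例均为假设。

import numpy as np

def corrupt(X, y, missing_frac=0.1, label_noise=0.05, seed=0):
    # 通用的受控数据损坏示意(假设的接口,非 Pucktrick 的实际 API)
    rng = np.random.default_rng(seed)
    Xc, yc = X.astype(float).copy(), y.copy()
    mask = rng.random(Xc.shape) < missing_frac      # 注入缺失值
    Xc[mask] = np.nan
    flip = rng.random(len(yc)) < label_noise        # 注入标签错误(二分类翻转)
    yc[flip] = 1 - yc[flip]
    return Xc, yc

X = np.random.default_rng(1).random((100, 5))
y = np.random.default_rng(2).integers(0, 2, 100)
Xc, yc = corrupt(X, y)
print(np.isnan(Xc).mean().round(3), (yc != y).mean().round(3))  # 损坏比例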


【13】CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding
标题:CAT-Net:用于跨被试EEG-EMG融合声调解码的交叉注意声调网络
链接:https://arxiv.org/abs/2511.10935

作者:Yifan Zhuang,Calvin Huang,Zepeng Yu,Yongjie Zou,Jiawei Ju
备注:This is the extended version with technical appendices. The version of record appears in AAAI-26. Please cite the AAAI version
摘要:脑机接口(BCI)语音解码已成为帮助语音障碍人士的一种有前途的工具。在这种场景下,整合脑电图(EEG)和肌电图(EMG)信号为提升解码性能提供了强大潜力。普通话声调分类提出了特别的挑战,因为即使音素保持相同,声调变化也传达不同的含义。在这项研究中,我们提出了一种新颖的跨被试多模态BCI解码框架,融合EEG和EMG信号,在有声与无声语音条件下对四个普通话声调进行分类。受语音产生中神经与肌肉系统协作机制的启发,我们的神经解码架构将时空特征提取分支与交叉注意融合机制相结合,实现模态之间的信息交互。我们进一步引入域对抗训练以提高跨被试泛化能力。我们从10名参与者中采集了4,800个EEG试次和4,800个EMG试次,仅使用20个EEG和5个EMG通道,证明了最少通道解码的可行性。尽管采用了轻量级模块,我们的模型在所有条件下都优于最先进的基线,对有声语音和无声语音的平均分类准确率分别为87.83%和88.08%。在跨被试评估中,它仍保持强劲性能,对有声和无声语音的准确率分别为83.27%和85.10%。我们进一步进行消融研究以验证每个组件的有效性。我们的结果表明,使用最少EEG-EMG通道的声调级解码是可行的,并有可能在被试之间泛化,有助于实用BCI应用的发展。
摘要:Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (EEG) and electromyography (EMG) signals offers strong potential for enhancing decoding performance. Mandarin tone classification presents particular challenges, as tonal variations convey distinct meanings even when phonemes remain identical. In this study, we propose a novel cross-subject multimodal BCI decoding framework that fuses EEG and EMG signals to classify four Mandarin tones under both audible and silent speech conditions. Inspired by the cooperative mechanisms of neural and muscular systems in speech production, our neural decoding architecture combines spatial-temporal feature extraction branches with a cross-attention fusion mechanism, enabling informative interaction between modalities. We further incorporate domain-adversarial training to improve cross-subject generalization. We collected 4,800 EEG trials and 4,800 EMG trials from 10 participants using only twenty EEG and five EMG channels, demonstrating the feasibility of minimal-channel decoding. Despite employing lightweight modules, our model outperforms state-of-the-art baselines across all conditions, achieving average classification accuracies of 87.83% for audible speech and 88.08% for silent speech. In cross-subject evaluations, it still maintains strong performance with accuracies of 83.27% and 85.10% for audible and silent speech, respectively. We further conduct ablation studies to validate the effectiveness of each component. Our findings suggest that tone-level decoding with minimal EEG-EMG channels is feasible and potentially generalizable across subjects, contributing to the development of practical BCI applications.
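下面是交叉注意融合机制的最小PyTorch示意:EEG特征作为查询关注EMG特征,反之亦然,再将两路结果拼接用于四声调分类。维度、头数与池化方式均为假设值,并非CAT-Net的实际配置。

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # 双向交叉注意:两种模态互为 query/key-value,实现信息交互
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.eeg_to_emg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.emg_to_eeg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 4)   # 四个普通话声调

    def forward(self, eeg_feat, emg_feat):
        # eeg_feat: (B, T1, dim), emg_feat: (B, T2, dim),假设已由各自分支提取
        a, _ = self.eeg_to_emg(eeg_feat, emg_feat, emg_feat)  # EEG 查询 EMG
        b, _ = self.emg_to_eeg(emg_feat, eeg_feat, eeg_feat)  # EMG 查询 EEG
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)
        return self.classifier(fused)

logits = CrossAttentionFusion()(torch.randn(2, 100, 64), torch.randn(2, 80, 64))
print(logits.shape)  # torch.Size([2, 4])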


【14】MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores
标题:MMA-Sim:张量核与矩阵核的位精确参考模型
链接:https://arxiv.org/abs/2511.10909

作者:Peichen Xie,Yang Wang,Fan Yang,Mao Yang
摘要:深度神经网络(DNN)快速增长的计算需求促使硬件供应商将矩阵乘法加速器(MMA)(如NVIDIA Tensor Cores和AMD Matrix Cores)集成到现代GPU中。然而,由于浮点矩阵乘法的算法规范不同且未记录,一些MMA可能会导致数值不精确和不一致,从而影响DNN训练和推理的稳定性和可重复性。   本文介绍了MMA-Sim,第一个位精确的参考模型,揭示了10个GPU架构(8个来自NVIDIA,2个来自AMD)的MMA的详细算术行为。通过使用有针对性的和随机化的测试相结合,我们的方法来解剖的MMA,推导出九个算术算法来模拟浮点矩阵乘法的MMA。大规模验证证实了MMA-Sim和真实硬件之间的按位等效性。使用MMA-Sim,我们研究了影响DNN训练稳定性的算术行为,并识别了可能导致重大错误的未记录行为。
摘要 :The rapidly growing computation demands of deep neural networks (DNNs) have driven hardware vendors to integrate matrix multiplication accelerators (MMAs), such as NVIDIA Tensor Cores and AMD Matrix Cores, into modern GPUs. However, due to distinct and undocumented arithmetic specifications for floating-point matrix multiplication, some MMAs can lead to numerical imprecision and inconsistency that can compromise the stability and reproducibility of DNN training and inference.   This paper presents MMA-Sim, the first bit-accurate reference model that reveals the detailed arithmetic behaviors of the MMAs from ten GPU architectures (eight from NVIDIA and two from AMD). By dissecting the MMAs using a combination of targeted and randomized tests, our methodology derives nine arithmetic algorithms to simulate the floating-point matrix multiplication of the MMAs. Large-scale validation confirms bitwise equivalence between MMA-Sim and the real hardware. Using MMA-Sim, we investigate arithmetic behaviors that affect DNN training stability, and identify undocumented behaviors that could lead to significant errors.
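作为背景说明,下面用numpy演示"同一矩阵乘法在不同累加精度下给出不同结果"这一现象,这正是MMA算术规范差异造成数值不一致的根源。该示例仅为动机展示,并非MMA-Sim推导出的九种算法本身。

import numpy as np

def matmul_lowprec_acc(A, B):
    # 显式用 fp16 累加器逐步累加,模拟低精度累加行为
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.float16)
    for t in range(A.shape[1]):
        C = (C + np.outer(A[:, t], B[t, :]).astype(np.float16)).astype(np.float16)
    return C.astype(np.float64)

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 64)).astype(np.float16)
B = rng.standard_normal((64, 64)).astype(np.float16)
ref = A.astype(np.float64) @ B.astype(np.float64)                          # 高精度参考
acc16 = matmul_lowprec_acc(A, B)                                           # fp16 累加
acc32 = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)   # fp32 累加
print(np.abs(acc16 - ref).max(), np.abs(acc32 - ref).max())  # 前者误差通常大得多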


【15】Multi-Joint Physics-Informed Deep Learning Framework for Time-Efficient Inverse Dynamics
标题:面向时间高效逆动力学的多关节物理信息深度学习框架
链接:https://arxiv.org/abs/2511.10878

作者:Shuhao Ma,Zeyi Huang,Yu Cao,Wesley Doorsamy,Chaoyang Shi,Jun Li,Zhi-Qiang Zhang
备注:11 pages
摘要:跨多关节系统的肌肉激活和力的时间有效估计对于临床评估和辅助设备控制至关重要。然而,传统的方法是计算昂贵的,缺乏高质量的标记数据集的多关节应用。为了解决这些挑战,我们提出了一个物理学深度学习框架,可以直接从运动学中估计肌肉激活和力量。该框架采用了一种新颖的多关节交叉注意(MJCA)模块,具有双向门控递归单元(BiGRU)层来捕获关节间的协调,使每个关节能够自适应地整合来自其他关节的运动信息。通过将多关节动力学、关节间耦合和外力相互作用嵌入到损失函数中,我们的物理信息MJCA-BiGRU(PI-MJCA-BiGRU)在没有标记数据的情况下提供生理上一致的预测,同时实现时间高效的推断。在两个数据集上的实验验证表明,PI-MJCA-BiGRU实现了与传统监督方法相当的性能,而不需要地面实况标签,而与其他基线架构相比,MJCA模块显着增强了关节间协调建模。
摘要:Time-efficient estimation of muscle activations and forces across multi-joint systems is critical for clinical assessment and assistive device control. However, conventional approaches are computationally expensive and lack a high-quality labeled dataset for multi-joint applications. To address these challenges, we propose a physics-informed deep learning framework that estimates muscle activations and forces directly from kinematics. The framework employs a novel Multi-Joint Cross-Attention (MJCA) module with Bidirectional Gated Recurrent Unit (BiGRU) layers to capture inter-joint coordination, enabling each joint to adaptively integrate motion information from others. By embedding multi-joint dynamics, inter-joint coupling, and external force interactions into the loss function, our Physics-Informed MJCA-BiGRU (PI-MJCA-BiGRU) delivers physiologically consistent predictions without labeled data while enabling time-efficient inference. Experimental validation on two datasets demonstrates that PI-MJCA-BiGRU achieves performance comparable to conventional supervised methods without requiring ground-truth labels, while the MJCA module significantly enhances inter-joint coordination modeling compared to other baseline architectures.


【16】Fast Data Attribution for Text-to-Image Models
标题:文本到图像模型的快速数据归因
链接:https://arxiv.org/abs/2511.10721

作者:Sheng-Yu Wang,Aaron Hertzmann,Alexei A Efros,Richard Zhang,Jun-Yan Zhu
备注:NeurIPS 2025 camera ready. Project page: https://peterwang512.github.io/FastGDA
摘要:文本到图像模型的数据归因旨在识别对生成输出影响最大的训练图像。现有的归因方法对每次查询都需要可观的计算资源,使其在现实应用中不切实际。我们提出了一种可扩展且高效的数据归因新方法。我们的核心思想是,将一种缓慢的、基于遗忘学习(unlearning)的归因方法蒸馏到一个特征嵌入空间中,以便高效检索具有高影响力的训练图像。在部署时,结合高效的索引和搜索方法,我们的方法无需运行昂贵的归因算法即可成功找到高影响力图像。我们在MSCOCO上训练的中等规模模型和在LAION上训练的大规模Stable Diffusion模型上展示了广泛的结果,表明我们的方法可以在几秒钟内取得更好或有竞争力的性能,比现有方法快2,500倍至400,000倍。我们的工作朝着在Stable Diffusion等现实模型上大规模应用数据归因方法迈出了有意义的一步。
摘要:Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them impractical for real-world applications. We propose a novel approach for scalable and efficient data attribution. Our key idea is to distill a slow, unlearning-based attribution method to a feature embedding space for efficient retrieval of highly influential training images. During deployment, combined with efficient indexing and search methods, our method successfully finds highly influential images without running expensive attribution algorithms. We show extensive results on both medium-scale models trained on MSCOCO and large-scale Stable Diffusion models trained on LAION, demonstrating that our method can achieve better or competitive performance in a few seconds, faster than existing methods by 2,500x - 400,000x. Our work represents a meaningful step towards the large-scale application of data attribution methods on real-world models such as Stable Diffusion.


【17】Decomposing Direct and Indirect Biases in Linear Models under Demographic Parity Constraint
标题:人口统计均等约束下线性模型中直接与间接偏差的分解
链接:https://arxiv.org/abs/2511.11294

作者:Bertille Tierny,Arthur Charpentier,François Hu
摘要:线性模型由于其简单性和可解释性而被广泛用于高风险决策。然而,当引入人口统计均等等公平性约束时,它们对模型系数的影响,以及预测偏差如何在特征之间分布,仍然不透明。现有针对线性模型的方法往往依赖过强且不切实际的假设,或忽略敏感属性的显式作用,限制了其在公平性评估中的实际效用。我们扩展了(Chzhen and Schreuder, 2022)和(Fukuchi and Sakuma, 2023)的工作,提出了一个可应用于任意线性模型之上的后处理框架,将产生的偏差分解为直接(敏感属性)和间接(相关特征)分量。我们的方法解析地刻画了人口统计均等如何重塑每个模型系数,包括敏感与非敏感特征的系数。这使得公平性干预可以获得透明的、特征级的解释,并揭示偏差如何通过相关变量持续存在或发生转移。我们的框架无需重新训练,并为模型审计和缓解提供了可操作的见解。在合成和真实世界数据集上的实验表明,我们的方法捕捉到了先前工作所遗漏的公平性动态,为负责任地部署线性模型提供了一个实用且可解释的工具。
摘要:Linear models are widely used in high-stakes decision-making due to their simplicity and interpretability. Yet when fairness constraints such as demographic parity are introduced, their effects on model coefficients, and thus on how predictive bias is distributed across features, remain opaque. Existing approaches on linear models often rely on strong and unrealistic assumptions, or overlook the explicit role of the sensitive attribute, limiting their practical utility for fairness assessment. We extend the work of (Chzhen and Schreuder, 2022) and (Fukuchi and Sakuma, 2023) by proposing a post-processing framework that can be applied on top of any linear model to decompose the resulting bias into direct (sensitive-attribute) and indirect (correlated-features) components. Our method analytically characterizes how demographic parity reshapes each model coefficient, including those of both sensitive and non-sensitive features. This enables a transparent, feature-level interpretation of fairness interventions and reveals how bias may persist or shift through correlated variables. Our framework requires no retraining and provides actionable insights for model auditing and mitigation. Experiments on both synthetic and real-world datasets demonstrate that our method captures fairness dynamics missed by prior work, offering a practical and interpretable tool for responsible deployment of linear models.


其他(22篇)

【1】Optimizing Mixture of Block Attention
标题:优化块注意力混合
链接:https://arxiv.org/abs/2511.11571

作者:Guangxuan Xiao,Junxian Guo,Kasra Mazaheri,Song Han
备注:The first two authors contributed equally to this work
摘要:块注意力混合(MoBA)(Lu等人,2025)是一个有前途的构建模块,它让查询稀疏地只关注键值块的一个小子集,从而大幅降低计算成本,高效处理LLM中的长上下文。然而,决定MoBA性能的设计原则仍鲜为人知,而且它缺乏高效的GPU实现,阻碍了其实际应用。在本文中,我们首先建立了一个统计模型来分析MoBA的底层机制。我们的模型表明,性能关键取决于路由器能否基于查询-键亲和度准确区分相关块与不相关块。我们推导出一个信噪比,将架构参数与这一检索精度形式化地联系起来。在分析的指导下,我们确定了两条关键的改进途径:使用更小的块大小,以及对键应用短卷积来聚类相关信号,从而提高路由准确性。虽然理论上更优,但小块大小在GPU上效率低下。为了弥合这一差距,我们引入了FlashMoBA,一个硬件感知的CUDA内核,即使在我们理论所建议的小块大小下也能高效执行MoBA。我们通过从头训练LLM验证了这些见解,表明改进后的MoBA模型与稠密注意力基线的性能相当。对于小块,FlashMoBA相对FlashAttention-2实现了高达14.7倍的加速,使我们基于理论的改进变得实用。代码可从以下网址获得:https://github.com/mit-han-lab/flash-moba。
摘要:Mixture of Block Attention (MoBA) (Lu et al., 2025) is a promising building block for efficiently processing long contexts in LLMs by enabling queries to sparsely attend to a small subset of key-value blocks, drastically reducing computational cost. However, the design principles governing MoBA's performance are poorly understood, and it lacks an efficient GPU implementation, hindering its practical adoption. In this paper, we first develop a statistical model to analyze MoBA's underlying mechanics. Our model reveals that performance critically depends on the router's ability to accurately distinguish relevant from irrelevant blocks based on query-key affinities. We derive a signal-to-noise ratio that formally connects architectural parameters to this retrieval accuracy. Guided by our analysis, we identify two key pathways for improvement: using smaller block sizes and applying a short convolution on keys to cluster relevant signals, which enhances routing accuracy. While theoretically better, small block sizes are inefficient on GPUs. To bridge this gap, we introduce FlashMoBA, a hardware-aware CUDA kernel that enables efficient MoBA execution even with the small block sizes our theory recommends. We validate our insights by training LLMs from scratch, showing that our improved MoBA models match the performance of dense attention baselines. FlashMoBA achieves up to 14.7x speedup over FlashAttention-2 for small blocks, making our theoretically-grounded improvements practical. Code is available at: https://github.com/mit-han-lab/flash-moba.
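下面是"按块路由、每个查询只关注top-k个键值块"这一思想的朴素PyTorch示意。为了可读性,它用稠密打分加掩码实现,完全不具备FlashMoBA的效率;块大小、top-k与打分方式(块内键取均值)皆为示意性假设,且未处理因果掩码。

import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=16, topk=2):
    # q, k, v: (T, d);将键值序列划分为块,用块内键的均值给每个查询打分
    T, d = k.shape
    nb = T // block_size
    k_blocks = k[: nb * block_size].view(nb, block_size, d)
    gate = q @ k_blocks.mean(dim=1).T                 # (T, nb) 路由分数
    sel = gate.topk(topk, dim=-1).indices             # 每个查询选 topk 个块
    mask = torch.zeros(T, nb, dtype=torch.bool)
    mask.scatter_(1, sel, True)
    # 展开为 token 级掩码,再做一次带掩码的标准注意力
    tok_mask = mask.repeat_interleave(block_size, dim=1)
    scores = (q @ k[: nb * block_size].T) / d**0.5
    scores = scores.masked_fill(~tok_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v[: nb * block_size]

out = moba_attention(torch.randn(64, 32), torch.randn(64, 32), torch.randn(64, 32))
print(out.shape)  # torch.Size([64, 32])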


【2】Data-efficient U-Net for Segmentation of Carbide Microstructures in SEM Images of Steel Alloys
标题:用于钢合金SEM图像中碳化物微观组织分割的数据高效U-Net
链接:https://arxiv.org/abs/2511.11485

作者:Alinda Ezgi Gerçek,Till Korten,Paul Chekhonin,Maleeha Hassan,Peter Steinbach
摘要:了解反应堆压力容器钢的微观组织对于预测力学性能至关重要,因为碳化物析出相既可以强化合金,又可能引发裂纹。在扫描电子显微镜图像中,碳化物与基体之间的灰度值重叠使简单阈值化失效。我们提出了一条数据高效的分割流水线,使用轻量级U-Net(30.7M参数),仅在10张带注释的扫描电子显微镜图像上训练。尽管数据有限,我们的模型实现了0.98的Dice-Sørensen系数,显著优于冶金领域的最新技术(经典图像分析:0.85),同时与最新的数据高效分割模型相比,将注释工作量减少了一个数量级。这种方法可以快速、自动地量化合金设计中的碳化物,并可推广到其他钢种,展示了数据高效深度学习在反应堆压力容器钢分析中的潜力。
摘要:Understanding reactor-pressure-vessel steel microstructure is crucial for predicting mechanical properties, as carbide precipitates both strengthen the alloy and can initiate cracks. In scanning electron microscopy images, gray-value overlap between carbides and matrix makes simple thresholding ineffective. We present a data-efficient segmentation pipeline using a lightweight U-Net (30.7~M parameters) trained on just \textbf{10 annotated scanning electron microscopy images}. Despite limited data, our model achieves a \textbf{Dice-Sørensen coefficient of 0.98}, significantly outperforming the state-of-the-art in the field of metallurgy (classical image analysis: 0.85), while reducing annotation effort by one order of magnitude compared to the state-of-the-art data efficient segmentation model. This approach enables rapid, automated carbide quantification for alloy design and generalizes to other steel types, demonstrating the potential of data-efficient deep learning in reactor-pressure-vessel steel analysis.


【3】Multicalibration yields better matchings
标题:多重校准产生更好的匹配
链接:https://arxiv.org/abs/2511.11413

作者:Riccardo Colini Baldeschi,Simone Di Gregorio,Simone Fioravanti,Federico Fusco,Ido Guy,Daniel Haimovich,Stefano Leonardi,Fridolin Linder,Lorenzo Perini,Matteo Russo,Niek Tax
摘要:考虑在加权图中寻找最佳匹配的问题,其中我们只能基于底层上下文获得对实际随机权重的预测。如果预测器是贝叶斯最优的,那么基于预测权重计算最佳匹配就是最优的。然而在实践中,这种完全信息的场景并不现实。给定一个不完美的预测器,某个次优的决策规则可能会补偿其引起的误差,从而优于标准的最优规则。在本文中,我们提出用多重校准(multicalibration)来解决这个问题。这一公平性概念要求预测器在上下文的受保护集合族中的每个元素上都是无偏的。给定一类匹配算法 $\mathcal C$ 和任意边权预测器 $γ$,我们展示了如何构造一个特定的多重校准预测器 $\hat γ$,使其具有以下性质:基于 $\hat γ$ 的输出选取最佳匹配,可与把 $\mathcal C$ 中最佳决策规则应用于原始预测器 $γ$ 相竞争。我们还给出了样本复杂度界限来补充这一结果。
摘要:Consider the problem of finding the best matching in a weighted graph where we only have access to predictions of the actual stochastic weights, based on an underlying context. If the predictor is the Bayes optimal one, then computing the best matching based on the predicted weights is optimal. However, in practice, this perfect information scenario is not realistic. Given an imperfect predictor, a suboptimal decision rule may compensate for the induced error and thus outperform the standard optimal rule.   In this paper, we propose multicalibration as a way to address this problem. This fairness notion requires a predictor to be unbiased on each element of a family of protected sets of contexts. Given a class of matching algorithms $\mathcal C$ and any predictor $γ$ of the edge-weights, we show how to construct a specific multicalibrated predictor $\hat γ$, with the following property. Picking the best matching based on the output of $\hat γ$ is competitive with the best decision rule in $\mathcal C$ applied onto the original predictor $γ$. We complement this result by providing sample complexity bounds.


【4】Robust inverse material design with physical guarantees using the Voigt-Reuss Net
标题:使用Voigt-Reuss Net进行具有物理保证的鲁棒逆向材料设计
链接:https://arxiv.org/abs/2511.11388

作者:Sanath Keshav,Felix Fritzen
摘要:我们提出了一种谱归一化代理模型,用于具有硬物理保证的正向与逆向力学均匀化。利用Voigt-Reuss上下界,我们通过类Cholesky算子分解二者之差,并学习一个特征值位于$[0,1]$内的无量纲对称半正定表示;逆映射返回在Löwner意义下介于两界之间的对称正定预测。在随机双相微结构开放数据集上的3D线性弹性实验中,一个在超过$7.5\times 10^{5}$个基于FFT的标签上训练、以236个各向同性不变描述符和三个对比度参数为输入的全连接Voigt-Reuss网络,以近乎完美的保真度恢复了各向同性投影(各向同性相关条目:$R^2 \ge 0.998$),而揭示各向异性的耦合项无法从$SO(3)$不变输入中识别。张量级相对Frobenius误差的中位数约为$1.7\%$,平均值约为$3.4\%$。对于阈值化三角函数微结构上的2D平面应变问题,将谱归一化与可微渲染器和CNN耦合,在所有分量上取得$R^2>0.99$、低于百分之一的归一化损失、对渗流引起的特征值跳跃的精确跟踪,以及对分布外图像的鲁棒泛化。将参数化微结构视为设计变量,基于单个代理模型的批量一阶优化可在几个百分点内匹配目标张量,并返回多样化的近优设计。总的来说,Voigt-Reuss网络将精确且物理可容许的正向预测与大批量、约束一致的逆设计统一起来,并可推广到椭圆算子与耦合物理场景。
摘要:We propose a spectrally normalized surrogate for forward and inverse mechanical homogenization with hard physical guarantees. Leveraging the Voigt-Reuss bounds, we factor their difference via a Cholesky-like operator and learn a dimensionless, symmetric positive semi-definite representation with eigenvalues in $[0,1]$; the inverse map returns symmetric positive-definite predictions that lie between the bounds in the Löwner sense. In 3D linear elasticity on an open dataset of stochastic biphasic microstructures, a fully connected Voigt-Reuss net trained on $>\!7.5\times 10^{5}$ FFT-based labels with 236 isotropy-invariant descriptors and three contrast parameters recovers the isotropic projection with near-perfect fidelity (isotropy-related entries: $R^2 \ge 0.998$), while anisotropy-revealing couplings are unidentifiable from $SO(3)$-invariant inputs. Tensor-level relative Frobenius errors have median $\approx 1.7\%$ and mean $\approx 3.4\%$ across splits. For 2D plane strain on thresholded trigonometric microstructures, coupling spectral normalization with a differentiable renderer and a CNN yields $R^2>0.99$ on all components, subpercent normalized losses, accurate tracking of percolation-induced eigenvalue jumps, and robust generalization to out-of-distribution images. Treating the parametric microstructure as design variables, batched first-order optimization with a single surrogate matches target tensors within a few percent and returns diverse near-optimal designs. Overall, the Voigt-Reuss net unifies accurate, physically admissible forward prediction with large-batch, constraint-consistent inverse design, and is generic to elliptic operators and coupled-physics settings.
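下面用PyTorch给出"谱归一化到Voigt-Reuss两界之间"这一构造的最小示意:设 $D = C_V - C_R = LL^\top$,把网络原始输出映射为特征值在 $(0,1)$ 内的对称矩阵 $Z$,再返回 $C_R + LZL^\top$。用sigmoid压缩特征值是我们为说明而做的假设,未必是论文的具体参数化。

import torch

def spectrally_normalized_prediction(C_reuss, C_voigt, raw):
    # 假设上下界之差正定;L 为其 Cholesky 因子
    L = torch.linalg.cholesky(C_voigt - C_reuss)
    A = raw + raw.T                                   # 对称化网络原始输出
    eigval, eigvec = torch.linalg.eigh(A)
    Z = eigvec @ torch.diag(torch.sigmoid(eigval)) @ eigvec.T  # 特征值压入 (0,1)
    return C_reuss + L @ Z @ L.T                      # Löwner 意义下介于两界之间

d = 6                                                 # 例如 3D 弹性的 6x6 刚度矩阵
C_R, C_V = torch.eye(d) * 1.0, torch.eye(d) * 3.0
pred = spectrally_normalized_prediction(C_R, C_V, torch.randn(d, d))
print(torch.linalg.eigvalsh(pred - C_R).min())        # 应非负(浮点误差内)
print(torch.linalg.eigvalsh(C_V - pred).min())        # 同样非负:预测夹在两界之间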


【5】Sparse Methods for Vector Embeddings of TPC Data
标题:TPC数据向量嵌入的稀疏方法
链接:https://arxiv.org/abs/2511.11221

作者:Tyler Wheeler,Michelle P. Kuchera,Raghuram Ramanujan,Ryan Krupp,Chris Wrede,Saiprasad Ravishankar,Connor L. Cross,Hoi Yan Ian Heung,Andrew J. Jones,Benjamin Votaw
备注:NeurIPS Machine Learning and the Physical Sciences Workshop 2025
摘要:时间投影室(TPC)是一种多功能探测器,可在电离介质中重建带电粒子径迹,从而在广泛的核物理实验中实现灵敏测量。我们探索了用于TPC数据表示学习的稀疏卷积网络,发现稀疏ResNet架构即使权重随机设置,也能为事件提供有用的结构化向量嵌入。在一个简单的、由物理动机驱动的二分类任务上对该架构进行预训练,可进一步提高嵌入质量。我们使用GADGET(GAseous Detector with GErmanium Tagging)II TPC(一种为测量低能 $β$ 延迟粒子衰变而优化的探测器)的数据,将原始读出板级信号表示为稀疏张量,训练Minkowski Engine ResNet模型,并探查由此产生的事件级嵌入,揭示出丰富的事件结构。作为跨探测器测试,我们使用同一编码器嵌入了主动靶TPC(AT-TPC,一种专为逆运动学核反应研究设计的探测器)的数据。我们发现,即使是未经训练的稀疏ResNet模型也能为AT-TPC数据提供有用的嵌入,而当模型在GADGET数据上训练后,效果进一步提升。总之,这些结果突出了稀疏卷积技术作为各类TPC实验中表示学习通用工具的潜力。
摘要 :Time Projection Chambers (TPCs) are versatile detectors that reconstruct charged-particle tracks in an ionizing medium, enabling sensitive measurements across a wide range of nuclear physics experiments. We explore sparse convolutional networks for representation learning on TPC data, finding that a sparse ResNet architecture, even with randomly set weights, provides useful structured vector embeddings of events. Pre-training this architecture on a simple physics-motivated binary classification task further improves the embedding quality. Using data from the GAseous Detector with GErmanium Tagging (GADGET) II TPC, a detector optimized for measuring low-energy $β$-delayed particle decays, we represent raw pad-level signals as sparse tensors, train Minkowski Engine ResNet models, and probe the resulting event-level embeddings which reveal rich event structure. As a cross-detector test, we embed data from the Active-Target TPC (AT-TPC) -- a detector designed for nuclear reaction studies in inverse kinematics -- using the same encoder. We find that even an untrained sparse ResNet model provides useful embeddings of AT-TPC data, and we observe improvements when the model is trained on GADGET data. Together, these results highlight the potential of sparse convolutional techniques as a general tool for representation learning in diverse TPC experiments.


【6】A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates
标题:无需Fenchel共轭的Tsallis-INF两全其美性证明
链接:https://arxiv.org/abs/2511.11211

作者:Wei-Cheng Lee,Francesco Orabona
摘要:在这篇短文中,我们对 J. Zimmert 和 Y. Seldin 的论文《Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits》(Journal of Machine Learning Research, 22(28):1-49, 2021,网址 https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf)中 Tsallis-INF 多臂老虎机算法的两全其美(best-of-both-worlds)保证给出了一个简单的推导。特别地,该证明使用在线凸优化的现代工具,避免了共轭函数的使用。此外,为使证明更精简,我们没有优化界中的常数。
摘要:In this short note, we present a simple derivation of the best-of-both-world guarantee for the Tsallis-INF multi-armed bandit algorithm from J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf. In particular, the proof uses modern tools from online convex optimization and avoid the use of conjugate functions. Also, we do not optimize the constants in the bounds in favor of a slimmer proof.
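作为补充,下面是标准Tsallis-INF($α=1/2$)采样分布的numpy示意:在单纯形上求解 $p_i = 4/(η(\hat L_i - x))^2$ 的归一化常数 $x$(二分法),并用重要性加权更新损失估计。学习率日程 $η_t = 1/\sqrt{t}$ 仅为演示取值,与原论文的常数未必一致。

import numpy as np

def tsallis_inf_probs(L_hat, eta, iters=60):
    # 二分搜索 x,使 sum_i 4/(eta*(L_i - x))^2 = 1,且 x < min L
    K = len(L_hat)
    lo = L_hat.min() - 2.0 * np.sqrt(K) / eta   # 此处 sum <= 1
    hi = L_hat.min() - 2.0 / eta                # 此处 sum >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        s = np.sum(4.0 / (eta * (L_hat - x)) ** 2)
        lo, hi = (x, hi) if s < 1 else (lo, x)
    p = 4.0 / (eta * (L_hat - x)) ** 2
    return p / p.sum()

rng = np.random.default_rng(0)
K, mu = 3, np.array([0.5, 0.4, 0.6])            # 各臂的伯努利损失均值
L_hat = np.zeros(K)
for t in range(1, 10001):
    p = tsallis_inf_probs(L_hat, eta=1.0 / np.sqrt(t))
    a = rng.choice(K, p=p)
    loss = float(rng.random() < mu[a])
    L_hat[a] += loss / p[a]                     # 重要性加权损失估计
print(p.round(3))                                # 概率应集中在最优臂(下标 1)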


【7】Questioning the Stability of Visual Question Answering
标题:质疑视觉问答的稳定性
链接:https://arxiv.org/abs/2511.11206

作者:Amir Rosenfeld,Neta Glazer,Ethan Fetaya
摘要:视觉语言模型(VLM)已经取得了显著进展,但它们在微小且保持语义的输入变化下的可靠性仍然知之甚少。我们提出了第一个关于VLM对良性视觉与文本扰动鲁棒性的大规模系统研究:像素级平移、轻度几何变换、填充缩放、释义以及多语言改写,这些扰动均不改变图像-问题对的底层语义。在一系列广泛的模型和数据集上,我们发现现代VLM对这类微小扰动高度敏感:相当一部分样本在至少一种视觉或文本修改下改变了预测答案。我们刻画了这种不稳定性在扰动类型、问题类别和模型之间的差异,揭示了即使是最先进的系统(例如GPT-4o、Gemini 2.0 Flash)也经常在小至几个像素的平移或无害的改写下失败。我们进一步表明,样本级稳定性是正确性的一个强有力指标:稳定的样本始终更有可能被正确回答。利用这一点,我们证明了小型、易获取的开源模型的稳定性模式可以高精度地预测更大的闭源模型的正确性。我们的研究结果揭示了当前VLM的根本脆弱性,并强调鲁棒性评估需要超越对抗性扰动,转而关注模型应当可靠保持的不变性。
摘要:Visual Language Models (VLMs) have achieved remarkable progress, yet their reliability under small, meaning-preserving input changes remains poorly understood. We present the first large-scale, systematic study of VLM robustness to benign visual and textual perturbations: pixel-level shifts, light geometric transformations, padded rescaling, paraphrasing, and multilingual rewrites that do not alter the underlying semantics of an image-question pair. Across a broad set of models and datasets, we find that modern VLMs are highly sensitive to such minor perturbations: a substantial fraction of samples change their predicted answer under at least one visual or textual modification. We characterize how this instability varies across perturbation types, question categories, and models, revealing that even state-of-the-art systems (e.g., GPT-4o, Gemini 2.0 Flash) frequently fail under shifts as small as a few pixels or harmless rephrasings. We further show that sample-level stability serves as a strong indicator of correctness: stable samples are consistently far more likely to be answered correctly. Leveraging this, we demonstrate that the stability patterns of small, accessible open-source models can be used to predict the correctness of much larger closed-source models with high precision. Our findings expose a fundamental fragility in current VLMs and highlight the need for robustness evaluations that go beyond adversarial perturbations, focusing instead on invariances that models should reliably uphold.


【8】Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA
标题:细化与对齐:通过VQA中的多智能体交互进行置信度校准
链接:https://arxiv.org/abs/2511.11169

作者:Ayush Pandey,Jai Bardhan,Ishita Jain,Ramya S Hebbalaguppe,Rohan Raju Dhanakshirur,Lovekesh Vig
备注:17 pages, 6 figures, 5 tables. Accepted to Special Track on AI Alignment, AAAI 2026. Project Page- https://refine-align.github.io/
摘要:在视觉问答(VQA)与智能体AI的背景下,校准指的是AI系统对其答案的置信度在多大程度上反映了答案的实际正确性。当此类系统自主运行并且必须在视觉不确定性下做出决策时,这一点尤为重要。虽然由先进视觉语言模型(VLM)驱动的现代VQA系统因其更高的准确性而越来越多地用于医疗诊断和自主导航等高风险领域,但其置信度估计的可靠性仍未得到充分研究。特别是,这些系统经常产生过度自信的回答。为了解决这个问题,我们引入了AlignVQA,一个基于辩论的多智能体框架:多个遵循不同提示策略的专门VLM先生成候选答案,然后进行两阶段交互,由通才智能体对这些提议进行批评、完善与汇总。这一辩论过程产生的置信度估计能更准确地反映模型的真实预测性能。我们发现,校准更好的专门智能体会产生对齐更好的置信度。此外,我们引入了一种新的可微校准感知损失函数,称为aligncal,旨在通过最小化校准误差的上界来微调专门智能体。该目标显式地提高了每个智能体置信度估计的保真度。在多个基准VQA数据集上的实证结果证实了我们方法的有效性,表明校准偏差大幅减少。
摘要 :In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect becomes especially important when such systems operate autonomously and must make decisions under visual uncertainty. While modern VQA systems, powered by advanced vision-language models (VLMs), are increasingly used in high-stakes domains like medical diagnostics and autonomous navigation due to their improved accuracy, the reliability of their confidence estimates remains under-examined. Particularly, these systems often produce overconfident responses. To address this, we introduce AlignVQA, a debate-based multi-agent framework, in which diverse specialized VLM -- each following distinct prompting strategies -- generate candidate answers and then engage in two-stage interaction: generalist agents critique, refine and aggregate these proposals. This debate process yields confidence estimates that more accurately reflect the model's true predictive performance. We find that more calibrated specialized agents produce better aligned confidences. Furthermore, we introduce a novel differentiable calibration-aware loss function called aligncal designed to fine-tune the specialized agents by minimizing an upper bound on the calibration error. This objective explicitly improves the fidelity of each agent's confidence estimates. Empirical results across multiple benchmark VQA datasets substantiate the efficacy of our approach, demonstrating substantial reductions in calibration discrepancies. Furthermore, we propose a novel differentiable calibration-aware loss to fine-tune the specialized agents and improve the quality of their individual confidence estimates based on minimising upper bound calibration error.


【9】PRSM: A Measure to Evaluate CLIP's Robustness Against Paraphrases
标题:PRSM:评估CLIP对释义鲁棒性的一种度量
链接:https://arxiv.org/abs/2511.11141

作者:Udo Schlegel,Franziska Weeber,Jian Lan,Thomas Seidl
备注:8 pages, accpeted as short paper at MMM 2026
摘要:对比语言-图像预训练(CLIP)是一种广泛使用的多模态模型,通过大规模训练对齐文本和图像表示。虽然它在零样本和少样本任务上表现强劲,但其对语言变化、尤其是释义的鲁棒性仍未得到充分探索。释义鲁棒性对于可靠部署至关重要,特别是在社会敏感的场景中,不一致的表示可能放大人口统计偏见。在本文中,我们引入了释义排序稳定性度量(PRSM),一种量化CLIP对释义查询敏感性的新度量。我们使用Social Counterfactuals数据集(一个旨在揭示社会与人口统计偏见的基准)实证评估CLIP在释义变化下的稳定性,考察释义鲁棒性与性别之间的相互作用,并讨论其对多模态系统公平、公正部署的影响。我们的分析表明,鲁棒性随释义策略而变化,并且在与男性和女性相关的查询之间观察到细微而一致的差异。
摘要:Contrastive Language-Image Pre-training (CLIP) is a widely used multimodal model that aligns text and image representations through large-scale training. While it performs strongly on zero-shot and few-shot tasks, its robustness to linguistic variation, particularly paraphrasing, remains underexplored. Paraphrase robustness is essential for reliable deployment, especially in socially sensitive contexts where inconsistent representations can amplify demographic biases. In this paper, we introduce the Paraphrase Ranking Stability Metric (PRSM), a novel measure for quantifying CLIP's sensitivity to paraphrased queries. Using the Social Counterfactuals dataset, a benchmark designed to reveal social and demographic biases, we empirically assess CLIP's stability under paraphrastic variation, examine the interaction between paraphrase robustness and gender, and discuss implications for fairness and equitable deployment of multimodal systems. Our analysis reveals that robustness varies across paraphrasing strategies, with subtle yet consistent differences observed between male- and female-associated queries.


【10】Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB
标题:纠正文本嵌入中的均值偏置:在MMTEB上带来免训练提升的精细重归一化
链接:https://arxiv.org/abs/2511.11041

作者:Xingyu Ren,Youran Sun,Haoyu Liang
摘要:我们发现,当前的文本嵌入模型产生的输出具有一致的偏置,即每个嵌入向量 $e$ 可以分解为 $\tilde{e} + μ$,其中 $μ$ 在所有句子中几乎相同。我们提出了一种即插即用、无需训练且轻量级的解决方案,称为重归一化(Renormalization)。通过大量实验,我们表明重归一化能够一致且统计显著地提高现有模型在大规模多语言文本嵌入基准(MMTEB)上的性能。特别地,在38个模型上,重归一化将检索任务的性能提高了9.7 $σ$,分类任务提高了3.1 $σ$,其他类型任务提高了0.8 $σ$。重归一化有两种变体:直接从 $e$ 中减去 $μ$,或减去 $e$ 在 $μ$ 上的投影。我们从理论上预测后者表现更好,而我们的实验证实了这一预测。
摘要:We find that current text embedding models produce outputs with a consistent bias, i.e., each embedding vector $e$ can be decomposed as $\tilde{e} + μ$, where $μ$ is almost identical across all sentences. We propose a plug-and-play, training-free and lightweight solution called Renormalization. Through extensive experiments, we show that renormalization consistently and statistically significantly improves the performance of existing models on the Massive Multilingual Text Embedding Benchmark (MMTEB). In particular, across 38 models, renormalization improves performance by 9.7 $σ$ on retrieval tasks, 3.1 $σ$ on classification tasks, and 0.8 $σ$ on other types of tasks. Renormalization has two variants: directly subtracting $μ$ from $e$, or subtracting the projection of $e$ onto $μ$. We theoretically predict that the latter performs better, and our experiments confirm this prediction.
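该方法可以用几行numpy实现。下面的示意给出摘要所述的两种变体:直接减去均值 $μ$,或减去 $e$ 在 $μ$ 方向上的投影;之后是否重新做L2归一化属于我们的假设,论文未必如此处理。

import numpy as np

def renormalize(E, mode="project"):
    # E: (N, d) 句子嵌入矩阵;mu 为所有嵌入的均值(共同偏置)
    mu = E.mean(axis=0)
    if mode == "subtract":            # 变体一:直接减去 mu
        out = E - mu
    else:                             # 变体二:减去在 mu 方向上的投影
        u = mu / np.linalg.norm(mu)
        out = E - np.outer(E @ u, u)
    # 重新做 L2 归一化(此步为演示假设)
    return out / np.linalg.norm(out, axis=1, keepdims=True)

E = np.random.default_rng(0).standard_normal((5, 8)) + 3.0   # 人为加入共同偏置
for mode in ("subtract", "project"):
    # 两种变体都显著削弱嵌入中的共同偏置方向
    print(mode, np.abs(renormalize(E, mode).mean(axis=0)).round(2))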


【11】Cascading Bandits With Feedback
标题:带反馈的级联老虎机
链接:https://arxiv.org/abs/2511.10938

作者:R Sri Prakash,Nikhil Karamchandani,Sharayu Moharir
摘要:受边缘推理挑战的启发,我们研究了级联老虎机模型的一个变体,其中每个臂对应一个推理模型,具有相应的准确率和错误概率。我们分析了四种决策策略:Explore-then-Commit(先探索后提交)、Action Elimination(动作消除)、置信下界(LCB)和汤普森采样,并为每种策略提供了精确的理论遗憾保证。与经典老虎机设定不同,Explore-then-Commit和Action Elimination会导致次优的遗憾,因为它们在探索阶段之后就固定了一个排序,限制了其适应能力。相比之下,LCB和汤普森采样根据观察到的反馈不断更新决策,实现常数级O(1)遗憾。仿真证实了这些理论结果,突出了不确定性下高效边缘推理中适应性的关键作用。
摘要:Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies-Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling-and provide sharp theoretical regret guarantees for each. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, limiting their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback, achieving constant O(1) regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.
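下面是LCB类策略在该设定下的一个简化numpy示意:把每个推理模型视作一个臂,维护其经验错误率的下置信界并持续更新(这里简化为每轮只选一个模型,而非完整的级联)。准确率数值与置信界常数均为假设。

import numpy as np

rng = np.random.default_rng(0)
acc = np.array([0.80, 0.90, 0.85])       # 各推理模型的真实准确率(对策略未知)
K = len(acc)
n = np.zeros(K)                          # 各臂被选次数
err = np.zeros(K)                        # 各臂累计错误数

for t in range(1, 5001):
    with np.errstate(divide="ignore", invalid="ignore"):
        lcb = err / n - np.sqrt(2 * np.log(t) / n)   # 错误率的下置信界
    lcb[n == 0] = -np.inf                            # 未探索的臂优先
    a = int(np.argmin(lcb))                          # 选错误率 LCB 最小的模型
    n[a] += 1
    err[a] += float(rng.random() > acc[a])           # 1 表示该次推理出错

print((err / n).round(3), n.astype(int))             # 应集中选择准确率最高的模型 1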


【12】ICX360: In-Context eXplainability 360 Toolkit
标题:ICX360:In-Context eXplainability 360工具包
链接:https://arxiv.org/abs/2511.10879

作者:Dennis Wei,Ronny Luss,Xiaomeng Hu,Lucas Monteiro Paes,Pin-Yu Chen,Karthikeyan Natesan Ramamurthy,Erik Miehling,Inge Vejsbjerg,Hendrik Strobelt
备注:14 pages, 4 figures
摘要:大型语言模型(LLM)在日常生活中无处不在,并且正在进入风险更高的应用,从总结会议记录到回答医生的问题。与早期的预测模型一样,开发用于解释LLM输出(无论是摘要、列表还是对问题的回答等)的工具至关重要。考虑到这些需求,我们推出了In-Context Explainability 360(ICX360),一个用于解释LLM的开源Python工具包,重点关注馈入LLM的用户提供的上下文(或一般而言的提示)。ICX360包含三个最新工具的实现,分别通过黑盒和白盒方法(即扰动与梯度)来解释LLM。该工具包可在 https://github.com/IBM/ICX360 获取,包含快速入门指导材料以及详细教程,涵盖检索增强生成、自然语言生成和越狱等用例。
摘要:Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting transcripts to answering doctors' questions. As was the case with earlier predictive models, it is crucial that we develop tools for explaining the output of LLMs, be it a summary, list, response to a question, etc. With these needs in mind, we introduce In-Context Explainability 360 (ICX360), an open-source Python toolkit for explaining LLMs with a focus on the user-provided context (or prompts in general) that are fed to the LLMs. ICX360 contains implementations for three recent tools that explain LLMs using both black-box and white-box methods (via perturbations and gradients respectively). The toolkit, available at https://github.com/IBM/ICX360, contains quick-start guidance materials as well as detailed tutorials covering use cases such as retrieval augmented generation, natural language generation, and jailbreaking.


【13】Accuracy-Preserving CNN Pruning Method under Limited Data Availability
标题:有限数据可用性下保持准确性的CNN修剪方法
链接:https://arxiv.org/abs/2511.10861

作者:Daisuke Yasui,Toshitaka Matsuki,Hiroshi Sato
摘要:卷积神经网络(CNN)被广泛应用于图像识别,并在各个领域取得了成功。为了提高准确性和泛化性能,CNN模型的规模越来越大。已有研究致力于在计算资源有限的环境中,针对特定目标应用压缩预训练模型。在模型压缩技术中,使用逐层相关性传播(LRP,一种可解释AI技术)的方法已展现出前景:即使不做微调,也能在保持准确性的同时实现较高的剪枝率。由于这些方法不需要微调,它们适用于数据有限的场景。然而,现有的基于LRP的剪枝方法仍然存在明显的准确性下降,限制了其实际可用性。本研究提出了一种剪枝方法,在更好地保持模型精度的同时达到更高的剪枝率。我们的方法只需少量数据即可完成剪枝,并比现有方法更好地保持了准确性。
摘要:Convolutional Neural Networks (CNNs) are widely used in image recognition and have succeeded in various domains. CNN models have become larger-scale to improve accuracy and generalization performance. Research has been conducted on compressing pre-trained models for specific target applications in environments with limited computing resources. Among model compression techniques, methods using Layer-wise Relevance Propagation (LRP), an explainable AI technique, have shown promise by achieving high pruning rates while preserving accuracy, even without fine-tuning. Because these methods do not require fine-tuning, they are suited to scenarios with limited data. However, existing LRP-based pruning approaches still suffer from significant accuracy degradation, limiting their practical usability. This study proposes a pruning method that achieves a higher pruning rate while preserving better model accuracy. Our approach to pruning with a small amount of data has achieved pruning that preserves accuracy better than existing methods.


【14】STAMP: Spatial-Temporal Adapter with Multi-Head Pooling
标题:STAMP:具有多头池化的时空适配器
链接:https://arxiv.org/abs/2511.10848

作者:Brad Shook,Abby Turner,Jieshi Chen,Michał Wiliński,Mononito Goswami,Jonathan Elmer,Artur Dubrawski
备注:Accepted as a Proceedings paper at Machine Learning for Health (ML4H) 2025, invited presentation at the Time Series for Health (TS4H) Workshop, NeurIPS 2025
摘要:在多个领域数据上预训练的时间序列基础模型(TSFM)在各类建模任务上表现出强劲性能。针对脑电图(EEG)数据(将脑电活动记录为时间序列)的专用基础模型也已有多种开发尝试。然而,尚无工作在EEG特定任务上对EEG专用基础模型(EEGFM)与通用TSFM进行比较分析。我们提出了一种新颖的多头池化时空适配器(STAMP),它利用通用TSFM产生的单变量嵌入,隐式建模EEG数据的时空特征,并取得与最先进EEGFM相当的性能。我们在8个使用EEG进行分类的临床任务基准数据集上进行了全面分析,并开展了消融研究。我们提出的适配器在可训练参数上是轻量级的,对输入形式也很灵活,支持用TSFM对EEG数据进行简便建模。
摘要:Time series foundation models (TSFMs) pretrained on data from multiple domains have shown strong performance on diverse modeling tasks. Various efforts have been made to develop foundation models specific to electroencephalography (EEG) data, which records brain electrical activity as time series. However, no comparative analysis of EEG-specific foundation models (EEGFMs) versus general TSFMs has been performed on EEG-specific tasks. We introduce a novel Spatial-Temporal Adapter with Multi-Head Pooling (STAMP), which leverages univariate embeddings produced by a general TSFM, implicitly models spatial-temporal characteristics of EEG data, and achieves performance comparable to state-of-the-art EEGFMs. A comprehensive analysis is performed on 8 benchmark datasets of clinical tasks using EEG for classification, along with ablation studies. Our proposed adapter is lightweight in trainable parameters and flexible in the inputs it can accommodate, supporting easy modeling of EEG data using TSFMs.


【15】The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns
标题:错误信念的地图:通过注意力模式追踪内在和外在幻觉
链接:https://arxiv.org/abs/2511.10837

作者:Elyes Hajji,Aymen Bouguerra,Fabio Arnez
备注:Accepted at AAAI 2025-FS-ATRACC
摘要:大型语言模型(LLM)越来越多地部署在安全关键领域,但仍然容易产生幻觉。虽然先前的工作已经提出了用于幻觉检测的置信度表示方法,但这些方法大多依赖计算代价高昂的采样策略,并且往往忽略了幻觉类型之间的区别。在这项工作中,我们引入了一个有原则的评估框架,区分外在与内在两类幻觉,并在一套精心构建的基准上评估检测性能。此外,我们利用最近的基于注意力的不确定性量化算法,并提出新颖的注意力聚合策略,同时提升可解释性和幻觉检测性能。我们的实验结果表明,语义熵等基于采样的方法能有效检测外在幻觉,但通常在内在幻觉上失效。相比之下,我们的方法(对输入token上的注意力进行聚合)更适合内在幻觉。这些见解为使检测策略与幻觉的性质相匹配提供了新方向,并凸显注意力是量化模型不确定性的丰富信号。
摘要:Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. While prior works have proposed confidence representation methods for hallucination detection, most of these approaches rely on computationally expensive sampling strategies and often disregard the distinction between hallucination types. In this work, we introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories and evaluates detection performance across a suite of curated benchmarks. In addition, we leverage a recent attention-based uncertainty quantification algorithm and propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance. Our experimental findings reveal that sampling-based methods like Semantic Entropy are effective for detecting extrinsic hallucinations but generally fail on intrinsic ones. In contrast, our method, which aggregates attention over input tokens, is better suited for intrinsic hallucinations. These insights provide new directions for aligning detection strategies with the nature of hallucination and highlight attention as a rich signal for quantifying model uncertainty.
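下面给出"对输入token上的注意力做聚合以得到不确定性信号"的一种可能实现(numpy)。注意力张量的形状约定、对层/头/生成步取平均以及用熵作为汇总统计,都是我们为说明而做的假设,论文中的具体聚合策略可能不同。

import numpy as np

def input_attention_score(attn, n_input):
    # attn: (层数, 头数, 生成步数, 总序列长度) 的注意力权重(假设的形状约定)
    # 对所有层、头与生成步取平均,得到每个输入 token 收到的注意力质量
    per_token = attn.mean(axis=(0, 1, 2))[:n_input]
    # 聚合为单一不确定性信号:输入注意力分布的熵(分布越弥散,分数越高)
    p = per_token / per_token.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

attn = np.random.default_rng(0).random((12, 8, 20, 50))   # 虚构的注意力张量
print(round(input_attention_score(attn, n_input=30), 3))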


【16】EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence
标题:EarthSight:低延迟卫星情报的分布式框架
链接:https://arxiv.org/abs/2511.10834

作者:Ansel Kaplan Erol,Seungjun Lee,Divya Mahajan
摘要:卫星图像的低延迟交付对于灾难响应、情报和基础设施监控等时间关键型应用至关重要。然而,传统的管道依赖于在分析之前下行链接所有捕获的图像,由于有限的通信带宽,导致数小时至数天的延迟。为了解决这些瓶颈,新兴系统执行板载机器学习,以优先传输哪些图像。然而,这些解决方案通常将每个卫星视为孤立的计算节点,限制了可扩展性和效率。跨卫星和任务的冗余推理进一步增加了机载电源和计算成本,限制了任务范围和响应能力。我们提出了EarthSight,一个分布式运行时框架,重新定义卫星图像智能轨道和地面之间的分布式决策问题。EarthSight引入了三个核心创新:(1)使用共享骨干在卫星上进行多任务推理,以分摊多个视觉任务的计算;(2)地面站查询调度器,可以聚合用户请求,预测优先级,并为传入图像分配计算预算;以及(3)动态过滤器排序,其集成模型选择性、准确性和执行成本以早期拒绝低价值图像并节省资源。EarthSight利用来自地面站的全球背景和轨道上的资源感知自适应决策,使星座能够在严格的下行链路带宽和机载功率预算内执行可扩展的低延迟图像分析。使用先前建立的卫星模拟器进行的评估表明,与最先进的基线相比,EarthSight将每张图像的平均计算时间减少了1.9倍,并将从第一次接触到交付的第90百分位端到端延迟从51分钟降低到21分钟。
摘要:Low-latency delivery of satellite imagery is essential for time-critical applications such as disaster response, intelligence, and infrastructure monitoring. However, traditional pipelines rely on downlinking all captured images before analysis, introducing delays of hours to days due to restricted communication bandwidth. To address these bottlenecks, emerging systems perform onboard machine learning to prioritize which images to transmit. However, these solutions typically treat each satellite as an isolated compute node, limiting scalability and efficiency. Redundant inference across satellites and tasks further strains onboard power and compute costs, constraining mission scope and responsiveness. We present EarthSight, a distributed runtime framework that redefines satellite image intelligence as a distributed decision problem between orbit and ground. EarthSight introduces three core innovations: (1) multi-task inference on satellites using shared backbones to amortize computation across multiple vision tasks; (2) a ground-station query scheduler that aggregates user requests, predicts priorities, and assigns compute budgets to incoming imagery; and (3) dynamic filter ordering, which integrates model selectivity, accuracy, and execution cost to reject low-value images early and conserve resources. EarthSight leverages global context from ground stations and resource-aware adaptive decisions in orbit to enable constellations to perform scalable, low-latency image analysis within strict downlink bandwidth and onboard power budgets. Evaluations using a prior established satellite simulator show that EarthSight reduces average compute time per image by 1.9x and lowers 90th percentile end-to-end latency from first contact to delivery from 51 to 21 minutes compared to the state-of-the-art baseline.
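其中"动态过滤器排序"可以借用短路谓词排序的经典结果来理解:对相互独立的过滤器,按 成本/(1-通过率) 升序执行可最小化期望计算量。下面的Python示意演示了这一排序及期望成本的计算;过滤器名称、成本与通过率均为虚构示例,EarthSight实际还会把模型精度纳入排序。

filters = [
    {"name": "cloud_mask", "cost": 1.0, "pass_rate": 0.60},
    {"name": "change_det", "cost": 5.0, "pass_rate": 0.30},
    {"name": "object_det", "cost": 20.0, "pass_rate": 0.80},
]

def expected_cost(order):
    # 只有尚未被前序过滤器拒绝的图像,才会付出后续过滤器的成本
    total, p_alive = 0.0, 1.0
    for f in order:
        total += p_alive * f["cost"]
        p_alive *= f["pass_rate"]
    return total

ranked = sorted(filters, key=lambda f: f["cost"] / (1 - f["pass_rate"]))
print([f["name"] for f in ranked], round(expected_cost(ranked), 2))
# 输出 ['cloud_mask', 'change_det', 'object_det'] 7.6:廉价且拒绝率高的过滤器先执行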


【17】Benchmarking Quantum Kernels Across Diverse and Complex Data
标题:在多样化复杂数据上对量子核进行基准测试
链接:https://arxiv.org/abs/2511.10831

作者:Yuhan Jiang,Matthew Otten
摘要:量子核方法是量子机器学习的一个很有前途的分支,但它们在多维、高维、真实世界数据上的实际优势尚未得到验证。目前的研究主要局限于低维或合成数据集,无法对其潜力进行全面评估。为了解决这一差距,我们开发了一个变分量子内核框架,利用资源高效的ansätze进行复杂的分类任务,并引入了参数缩放技术来加速收敛。我们在八个具有挑战性的真实世界和高维数据集上对该框架进行了全面的基准测试,这些数据集涵盖表格,图像,时间序列和图形数据。我们的经典模拟结果表明,所提出的量子内核表现出明显的性能优势,比标准的经典内核,如径向基函数(RBF)内核。这项工作表明,适当设计的量子内核可以作为通用的高性能工具,为现实世界机器学习中的量子增强应用奠定了基础。需要进一步的研究来充分评估实际的量子优势。
摘要:Quantum kernel methods are a promising branch of quantum machine learning, yet their practical advantage on diverse, high-dimensional, real-world data remains unverified. Current research has largely been limited to low-dimensional or synthetic datasets, preventing a thorough evaluation of their potential. To address this gap, we developed a variational quantum kernel framework utilizing resource-efficient ansätze for complex classification tasks and introduced a parameter scaling technique to accelerate convergence. We conducted a comprehensive benchmark of this framework on eight challenging, real world and high-dimensional datasets covering tabular, image, time series, and graph data. Our classically simulated results show that the proposed quantum kernel demonstrated a clear performance advantage over standard classical kernels, such as the radial basis function (RBF) kernel. This work demonstrates that properly designed quantum kernels can function as versatile, high-performance tools, laying a foundation for quantum-enhanced applications in real-world machine learning. Further research is needed to fully assess the practical quantum advantage.


【18】Towards Universal Neural Operators through Multiphysics Pretraining
标题:通过多物理场预训练迈向通用神经算子
链接:https://arxiv.org/abs/2511.10829

作者:Mikhail Masliaev,Dmitry Gusarov,Ilya Markov,Alexander Hvatov
备注:5 pages, 1 figure, accepted for Machine Learning and the Physical Sciences Workshop, NeurIPS 2025
摘要:虽然神经算子被广泛用于数据驱动的物理模拟,但其训练的计算开销仍然很高。最近的进展通过下游学习缓解了这一问题,即先在较简单的问题上预训练模型,再在更复杂的问题上进行微调。在这项研究中,我们在更一般的迁移学习设置中考察了此前仅被应用于特定问题的基于transformer的神经算子。我们评估了它们在不同PDE问题上的性能,包括外推到未见过的参数、引入新变量,以及从多方程数据集迁移。我们的结果表明,先进的神经算子架构可以有效地在PDE问题之间迁移知识。
摘要:Although neural operators are widely used in data-driven physical simulations, their training remains computationally expensive. Recent advances address this issue via downstream learning, where a model pretrained on simpler problems is fine-tuned on more complex ones. In this research, we investigate transformer-based neural operators, which have previously been applied only to specific problems, in a more general transfer learning setting. We evaluate their performance across diverse PDE problems, including extrapolation to unseen parameters, incorporation of new variables, and transfer from multi-equation datasets. Our results demonstrate that advanced neural operator architectures can effectively transfer knowledge across PDE problems.
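摘要描述的"先预训练、再微调"流程可用如下极简PyTorch草图说明(仅为示意,用玩具MLP代替论文中的transformer算子,维度与接口均为假设):冻结预训练主干,仅针对新PDE微调输出头。

```python
import torch
import torch.nn as nn

# 假设 backbone 是在较简单PDE上预训练的算子主干(此处用玩具MLP代替)
backbone = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 256))
head = nn.Linear(256, 64)          # 针对新PDE重新初始化的输出头

for p in backbone.parameters():
    p.requires_grad = False        # 冻结预训练主干,仅微调输出头

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def finetune_step(u_in, u_target):
    """u_in / u_target: (batch, 64) 的离散化场(玩具维度)。"""
    opt.zero_grad()
    pred = head(backbone(u_in))
    loss = loss_fn(pred, u_target)
    loss.backward()
    opt.step()
    return loss.item()

# 用法示意
loss = finetune_step(torch.randn(8, 64), torch.randn(8, 64))
```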


【19】Cognitively-Inspired Episodic Memory Architectures for Accurate and Efficient Character AI
标题:认知启发的情景记忆架构,实现准确有效的角色人工智能
链接:https://arxiv.org/abs/2511.10652

作者:Rafael Arias Gonzalez,Steve DiPaola
备注:25 pages
摘要:大型语言模型在对话系统中扮演历史人物方面显示出潜力,但现有方法面临一个关键权衡:简单的检索增强生成产生浅层响应,而多阶段反思虽能达到深度,其延迟却高得令人望而却步。我们提出了一种架构,通过离线数据增强和对结构化情景记忆的高效并行检索来化解这种矛盾。我们的系统将传记数据转换为1,774条带有情感-语义元数据的丰富第一人称记忆,然后采用两阶段检索,实现0.52秒的提示生成。使用LLM-as-judge和RAGAs指标的评估显示,我们的方法在GPT-4上与传统RAG持平,同时在较小的模型(GPT-3.5、GPT-3)上明显优于传统RAG,这表明其对资源受限的部署尤有价值。除对话之外,结构化记忆还支持新颖的可视化工具:时空热图、情感轨迹分析和交互式路径跟踪,使该系统既是对话界面,又是传记分析的研究工具。我们以梵高作为测试案例,但该架构可推广到任何具有大量文本记录的历史人物,为需要兼顾准确性和效率的教育、博物馆和研究应用提供了一个实用框架。
摘要:Large language models show promise for embodying historical characters in dialogue systems, but existing approaches face a critical trade-off: simple retrieval-augmented generation produces shallow responses, while multi-stage reflection achieves depth at prohibitive latency. We present an architecture that resolves this tension through offline data augmentation and efficient parallel retrieval from structured episodic memory. Our system transforms biographical data into 1,774 enriched first-person memories with affective-semantic metadata, then employs two-stage retrieval achieving 0.52s prompt generation. Evaluation using LLM-as-judge and RAGAs metrics shows our approach achieves parity with traditional RAG on GPT-4 while significantly outperforming it on smaller models (GPT-3.5, GPT-3), suggesting particular value for resource-constrained deployments. Beyond dialogue, the structured memory enables novel visualization tools: spatiotemporal heatmaps, emotional trajectory analysis, and interactive path tracking, positioning the system as both a dialogue interface and research tool for biographical analysis. We use Van Gogh as a test case, but the architecture is generalizable to any historical figure with substantial textual records, offering a practical framework for educational, museum, and research applications requiring both accuracy and efficiency.
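两阶段检索的思路可以用下面的Python草图说明(数据结构与字段名均为假设,并非论文代码):第一阶段按情感-语义元数据的重合度粗筛,第二阶段按向量余弦相似度重排。

```python
import numpy as np

# 假设每条记忆含预计算向量 "vec" 与情感-语义标签集合 "tags"(模拟数据)
memories = [
    {"text": "I painted the wheat field at dawn.", "tags": {"Arles", "joy"},
     "vec": np.random.default_rng(i).normal(size=128)}
    for i in range(1774)
]

def retrieve(query_vec, query_tags, memories, k_coarse=200, k_final=8):
    """第一阶段:按元数据标签重合度粗筛;第二阶段:按余弦相似度重排。"""
    coarse = sorted(memories,
                    key=lambda m: len(m["tags"] & query_tags),
                    reverse=True)[:k_coarse]

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    return sorted(coarse,
                  key=lambda m: cos(m["vec"], query_vec),
                  reverse=True)[:k_final]

top = retrieve(np.full(128, 0.1), {"Arles"}, memories)
```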


【20】Estimating Total Effects in Bipartite Experiments with Spillovers and Partial Eligibility
标题:具有溢出效应和部分资格约束的二分实验中的总效应估计
链接:https://arxiv.org/abs/2511.11564

作者:Albert Tan,Mohsen Bayati,James Nordlund,Roman Istomin
备注:21 pages, 6 figures, Appeared as Oral Presentation in 2025 Conference on Digital Experimentation (CODE) at MIT
摘要:我们研究二分系统中的随机实验,其中只有一部分处理侧单位有资格被分配处理,而所有单位仍继续相互作用,从而产生干扰。我们形式化了资格受限的二分实验,并定义了与全面部署一致的被估量:合格单位上的主要总处理效应(PTTE)和不合格单位上的次要总处理效应(STTE)。在合格集合内随机化的前提下,我们给出了识别条件,并开发了结合暴露映射、广义倾向评分和灵活机器学习的干扰感知集成估计量。我们进一步引入了一个连接处理侧与结果侧被估量的投影;该映射在线性可加边(Linear Additive Edges)条件下是精确的,使得可以在(通常小得多的)处理侧进行估计,再确定性地聚合到结果侧。在具有已知真实值、覆盖多种现实暴露机制的模拟中,所提出的估计量以低偏差和低方差恢复了PTTE和STTE,并减少了忽略干扰时可能产生的偏差。两个现场实验说明了其实际意义:在两项研究中,我们的方法均纠正了预先指定指标上预期干扰偏差的方向,并在其中一个案例中逆转了主要决策指标的符号和显著性。
摘要:We study randomized experiments in bipartite systems where only a subset of treatment-side units are eligible for assignment while all units continue to interact, generating interference. We formalize eligibility-constrained bipartite experiments and define estimands aligned with full deployment: the Primary Total Treatment Effect (PTTE) on eligible units and the Secondary Total Treatment Effect (STTE) on ineligible units. Under randomization within the eligible set, we give identification conditions and develop interference-aware ensemble estimators that combine exposure mappings, generalized propensity scores, and flexible machine learning. We further introduce a projection that links treatment- and outcome-level estimands; this mapping is exact under a Linear Additive Edges condition and enables estimation on the (typically much smaller) treatment side with deterministic aggregation to outcomes. In simulations with known ground truth across realistic exposure regimes, the proposed estimators recover PTTE and STTE with low bias and variance and reduce the bias that could arise when interference is ignored. Two field experiments illustrate practical relevance: our method corrects the direction of expected interference bias for a pre-specified metric in both studies and reverses the sign and significance of the primary decision metric in one case.
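为直观说明暴露映射的作用,下面给出一个自包含的模拟草图(采用一种常见的简化暴露定义与普通最小二乘,并非论文中的干扰感知集成估计量;所有数据均为模拟):结果单位的暴露取其已处理合格邻居在全部邻居中的占比。

```python
import numpy as np

rng = np.random.default_rng(0)
n_treat, n_out = 50, 200
# 二分交互结构:行为结果侧单位,列为处理侧单位(模拟数据)
W = (rng.random((n_out, n_treat)) < 0.05).astype(float)
eligible = rng.random(n_treat) < 0.4                    # 仅部分处理侧单位合格
Z = np.where(eligible, rng.integers(0, 2, n_treat), 0)  # 在合格集合内随机化

# 一种常见的简化暴露映射:已处理邻居在全部邻居中的占比
deg = np.maximum(W.sum(axis=1), 1.0)
exposure = (W @ Z) / deg

# 模拟结果变量,并用普通最小二乘估计暴露的边际效应
Y = 1.0 + 2.0 * exposure + rng.normal(scale=0.5, size=n_out)
X = np.column_stack([np.ones(n_out), exposure])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("估计的暴露效应:", beta[1])   # 应接近真实值 2.0
```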


【21】Inferring response times of perceptual decisions with Poisson variational autoencoders
标题:使用Poisson变分自动编码器推断感知决策的响应时间
链接:https://arxiv.org/abs/2511.11480

作者:Hayden R. Johnson,Anastasia N. Krouglova,Hadi Vafaii,Jacob L. Yates,Pedro J. Gonçalves
备注:To appear at the NeurIPS 2025 Workshop on Data on the Mind and Brain
摘要:感知决策的许多特性都可以用深度神经网络很好地建模。然而,此类架构通常将决策视为瞬时读出,忽略了决策过程的时间动态。我们提出了一个图像可计算的感知决策模型,其中选择和响应时间源于对神经脉冲活动的高效感觉编码和贝叶斯解码。我们使用泊松变分自编码器,在一群速率编码神经元(建模为独立的齐次泊松过程)上学习视觉刺激的无监督表示。随后,任务优化的解码器以传入的脉冲活动为条件,持续推断动作上的近似后验。将这些组件与基于熵的停止规则相结合,得到一个有原则的、图像可计算的感知决策模型,能够生成逐试次的选择和响应时间模式。应用于MNIST数字分类时,该模型再现了感知决策的关键经验特征,包括随机变异性、右偏的响应时间分布、响应时间随选项数量呈对数缩放(希克定律),以及速度-准确性权衡。
摘要:Many properties of perceptual decision making are well-modeled by deep neural networks. However, such architectures typically treat decisions as instantaneous readouts, overlooking the temporal dynamics of the decision process. We present an image-computable model of perceptual decision making in which choices and response times arise from efficient sensory encoding and Bayesian decoding of neural spiking activity. We use a Poisson variational autoencoder to learn unsupervised representations of visual stimuli in a population of rate-coded neurons, modeled as independent homogeneous Poisson processes. A task-optimized decoder then continually infers an approximate posterior over actions conditioned on incoming spiking activity. Combining these components with an entropy-based stopping rule yields a principled and image-computable model of perceptual decisions capable of generating trial-by-trial patterns of choices and response times. Applied to MNIST digit classification, the model reproduces key empirical signatures of perceptual decision making, including stochastic variability, right-skewed response time distributions, logarithmic scaling of response times with the number of alternatives (Hick's law), and speed-accuracy trade-offs.
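基于熵的停止规则可用如下numpy草图说明(假设各类别的神经元发放率已知,并省略了PVAE编码器本身,仅演示"泊松证据累积 + 熵阈值停止"如何同时产生选择与响应时间;所有参数均为示意)。

```python
import numpy as np

def decide(rates, true_class, dt=0.01, ent_thresh=0.1, max_steps=2000, seed=0):
    """泊松脉冲证据的贝叶斯累积,后验熵低于阈值即停止。
    rates: (n_classes, n_neurons) 各类别对应的发放率。
    返回 (选择, 响应时间/秒)。"""
    rng = np.random.default_rng(seed)
    n_classes = rates.shape[0]
    log_post = np.full(n_classes, -np.log(n_classes))      # 均匀先验
    for t in range(1, max_steps + 1):
        k = rng.poisson(rates[true_class] * dt)            # 当前时间窗的脉冲计数
        # 泊松对数似然(略去与类别无关的 log k! 项)
        log_post += (k * np.log(rates * dt) - rates * dt).sum(axis=1)
        log_post -= np.logaddexp.reduce(log_post)          # 归一化
        post = np.exp(log_post)
        entropy = -(post * log_post).sum()
        if entropy < ent_thresh:                           # 熵足够低即停止
            return int(post.argmax()), t * dt
    return int(post.argmax()), max_steps * dt

rates = np.random.default_rng(1).uniform(5.0, 20.0, size=(10, 32))
choice, rt = decide(rates, true_class=3)
```

在这种设置下,增大类别数会提高初始后验熵,从而自然延长达到阈值所需的时间,这正是希克定律式缩放的直观来源。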


【22】Neural Local Wasserstein Regression
标题:神经局部沃瑟斯坦回归
链接:https://arxiv.org/abs/2511.10824

作者:Inga Girshfeld,Xiaohui Chen
备注:Accepted to TAG-DS 2025. 11 pages, 3 figures
摘要:我们研究分布对分布回归的估计问题,其中预测变量和响应变量都是概率测度。现有方法通常依赖全局最优传输映射或切空间线性化,这在逼近能力上可能受限,并且会在多元底层域中扭曲几何结构。在本文中,我们提出神经局部Wasserstein回归(Neural Local Wasserstein Regression),这是一个灵活的非参数框架,通过Wasserstein空间中局部定义的传输映射来建模回归。我们的方法建立在与经典核回归的类比之上:基于2-Wasserstein距离的核权重将估计量局部化于参考测度附近,而神经网络参数化的传输算子则灵活适应复杂的数据几何。这种局部化视角拓宽了容许变换的类别,避免了全局映射假设和线性化结构的限制。我们开发了一个实用的训练流程,采用DeepSets风格的架构和Sinkhorn近似损失,并结合贪婪的参考选择策略以实现可扩展性。通过在高斯和混合模型上的合成实验以及MNIST上的分布预测任务,我们证明了该方法能有效捕获现有方法难以处理的非线性、高维分布关系。
摘要:We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization, which can be restrictive in approximation capacity and distort geometry in multivariate underlying domains. In this paper, we propose the \emph{Neural Local Wasserstein Regression}, a flexible nonparametric framework that models regression through locally defined transport maps in Wasserstein space. Our method builds on the analogy with classical kernel regression: kernel weights based on the 2-Wasserstein distance localize estimators around reference measures, while neural networks parameterize transport operators that adapt flexibly to complex data geometries. This localized perspective broadens the class of admissible transformations and avoids the limitations of global map assumptions and linearization structures. We develop a practical training procedure using DeepSets-style architectures and Sinkhorn-approximated losses, combined with a greedy reference selection strategy for scalability. Through synthetic experiments on Gaussian and mixture models, as well as distributional prediction tasks on MNIST, we demonstrate that our approach effectively captures nonlinear and high-dimensional distributional relationships that elude existing methods.
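核心的"Wasserstein核权重"可用一个一维玩具例子说明(一维经验测度间的W2距离有排序闭式,多维情形才需要论文中的Sinkhorn近似;以下带宽等参数均为假设):

```python
import numpy as np

def w2_1d(x, y):
    """一维等样本量经验测度间的2-Wasserstein距离:
    对排序后的样本逐分位数配对(一维闭式)。"""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def kernel_weights(mu, references, bandwidth=0.5):
    """围绕参考测度的核权重,类比Nadaraya-Watson回归中的局部化。"""
    d = np.array([w2_1d(mu, r) for r in references])
    w = np.exp(-(d / bandwidth) ** 2)
    return w / w.sum()

rng = np.random.default_rng(0)
references = [rng.normal(loc=m, size=256) for m in (-1.0, 0.0, 1.0)]
mu = rng.normal(loc=0.2, size=256)
print(kernel_weights(mu, references))  # 预期最接近 loc=0 的参考权重最大
```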


机器翻译由腾讯交互翻译提供,仅供参考
