
Machine Learning Academic Digest [11.6]

arXiv Daily Academic Digest



cs.LG: 127 papers today


Large language models (11 papers)

【1】AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing
Link: https://arxiv.org/abs/2511.03697

Authors: Mohsen Ahmadzadeh, Kaichang Chen, Georges Gielen
Comments: Accepted at the 2025 International Conference on Computer-Aided Design (ICCAD 2025); presented in Munich, October 2025
Abstract: Analog/mixed-signal circuits are key for interfacing electronics with the physical world. Their design, however, remains a largely handcrafted process, resulting in long and error-prone design cycles. While the recent rise of AI-based reinforcement learning and generative AI has created new techniques to automate this task, the need for many time-consuming simulations is a critical bottleneck hindering overall efficiency. Furthermore, the lack of explainability of the resulting design solutions hampers widespread adoption of the tools. To address these issues, a novel agentic AI framework for sample-efficient and explainable analog circuit sizing is presented. It employs a multi-agent workflow where specialized Large Language Model (LLM)-based agents collaborate to interpret the circuit topology, understand the design goals, and iteratively refine the circuit's design parameters towards the target goals with human-interpretable reasoning. An adaptive simulation strategy creates an intelligent control that yields high sample efficiency. The AnaFlow framework is demonstrated on two circuits of varying complexity and is able to complete the sizing task fully automatically, unlike pure Bayesian optimization and reinforcement learning approaches. The system learns from its optimization history to avoid past mistakes and accelerate convergence. The inherent explainability makes this a powerful tool for analog design space exploration and a new paradigm in analog EDA, where AI agents serve as transparent design assistants.
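The iterative sizing loop described above (agents propose parameters, an expensive simulator scores them, and the optimization history steers later proposals) can be sketched in a few lines. The one-parameter toy "simulator" and the rule-based proposer below are illustrative stand-ins for SPICE and the LLM agents, not the paper's implementation.

```python
# Toy sketch of a history-aware, sample-budgeted sizing loop.
# Assumptions: a single width parameter, a made-up gain landscape that
# peaks at 8 um, and a naive proposer standing in for the LLM agents.

def simulate(width_um: float) -> float:
    """Toy stand-in for a circuit simulator: gain peaks at width 8 um."""
    return -(width_um - 8.0) ** 2 + 100.0

def propose(history):
    """Stand-in for the LLM proposer: step beyond the best point seen so far."""
    if not history:
        return 1.0
    best_w, _ = max(history, key=lambda entry: entry[1])
    return best_w + 1.0  # naive exploratory step

history = []
for _ in range(12):  # small budget: each simulate() call is "expensive"
    w = propose(history)
    history.append((w, simulate(w)))

best_w, best_gain = max(history, key=lambda entry: entry[1])
print(best_w, best_gain)  # converges to width 8.0, gain 100.0
```

The key point mirrored from the abstract is that the history is fed back into every proposal, so the loop does not revisit known-bad regions.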


【2】TabGemma: Text-Based Tabular ICL via LLM using Continued Pretraining and Retrieval
Link: https://arxiv.org/abs/2511.03570

Authors: Günther Schindler, Maximilian Schambach, Michael Medek, Sam Thelin
Abstract: We study LLMs for tabular prediction with mixed text, numeric, and categorical fields. We introduce TabGemma, a schema-agnostic in-context learner that treats rows as sequences and tackles two practical hurdles when adapting pretrained LLMs for tabular prediction: unstable numeric tokenization and limited context size. We propose to canonicalize numbers via signed scientific notation and continue pretraining of a 12B Gemma 3 model with a target-imputation objective on a large-scale real-world dataset. For inference, we use compact n-gram-based retrieval to select informative exemplars that fit within a 128k-token window. On semantically rich benchmarks, TabGemma establishes a new state of the art in classification across low- and high-data regimes and improves monotonically with more context rows. For regression, it is competitive at small sample sizes but trails conventional approaches as data grows. Our results show that LLMs can be effective tabular in-context learners on highly semantic tasks when paired with dedicated numeric handling and context retrieval, while motivating further advances in numeric modeling and long-context scaling.
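The numeric canonicalization idea is concrete enough to sketch: render every numeric field in signed scientific notation so the tokenizer always sees the same digit pattern. The exact format (explicit sign, four mantissa digits) and the row serialization are assumptions for illustration, not TabGemma's specification.

```python
# Sketch of number canonicalization for tabular-to-text serialization.
# Format details are illustrative assumptions, not the paper's spec.

def canonicalize_number(x: float, mantissa_digits: int = 4) -> str:
    """Format a number in explicit-sign scientific notation, e.g. +4.2000e+01."""
    return f"{x:+.{mantissa_digits}e}"

def canonicalize_row(row: dict) -> str:
    """Serialize a mixed-type table row as one text sequence."""
    parts = []
    for key, value in row.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            parts.append(f"{key}: {canonicalize_number(value)}")
        else:
            parts.append(f"{key}: {value}")
    return " | ".join(parts)

row = {"age": 42, "income": 51234.56, "city": "Oslo"}
print(canonicalize_row(row))
```

Because every number maps to a fixed-width digit pattern, the model never sees the erratic sub-word splits that raw numerals produce.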


【3】Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
Link: https://arxiv.org/abs/2511.03367

Authors: Gahyeon Kim, Sohee Kim, Seokju Lee
Comments: Accepted in Pattern Recognition
Abstract: Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learning techniques benefit from various data augmentation strategies, prompt learning has primarily focused on text-based modifications, leaving the potential of image-based augmentation largely unexplored. In this work, we explore how image-level augmentations, particularly those that introduce attribute-specific variations, can support and enhance prompt learning. Our analysis examines the interaction between these augmentations and soft prompt frameworks, revealing their potential to improve generalization. We also identify a limitation in existing methods, such as CoCoOp, which do not provide explicit guidance for learning prompts that focus on semantically meaningful visual features. To address this, we propose Adding Attributes to Prompt Learning (AAPL), a novel method that introduces adversarial token embeddings to decouple superficial visual variations introduced by augmentation from class-relevant semantic representations. This decoupling enables the learned prompts to concentrate on visually discriminative features that align with the target categories. We conduct comprehensive experiments on eleven benchmark datasets, and AAPL consistently outperforms existing methods across few-shot, zero-shot, cross-dataset, and domain generalization settings. Our source code is publicly available at: https://github.com/Gahyeonkim09/AAPL


【4】Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
Link: https://arxiv.org/abs/2511.03328

Authors: Jindong Hong, Tianjie Chen, Lingjie Luo, Chuanyang Zheng, Ting Xu, Haibao Yu, Jianing Qiu, Qianzhong Chen, Suning Huang, Yan Xu, Yong Gui, Yijun He, Jiankai Sun
Abstract: A recent advancement in Multimodal Large Language Model (MLLM) research is the emergence of "reasoning MLLMs" that offer explicit control over their internal thinking processes (normally referred to as the "thinking mode") alongside the standard "non-thinking mode". This capability allows these models to engage in a step-by-step process of internal deliberation before generating a final response. With the rapid transition to and adoption of these "dual-state" MLLMs, this work rigorously evaluates how the enhanced reasoning processes of these MLLMs impact model performance and reliability in clinical tasks. This paper evaluates the active "thinking mode" capabilities of two leading MLLMs, Seed1.5-VL and Gemini-2.5-Flash, for medical applications. We assessed their performance on four visual medical tasks using the VQA-RAD and ROCOv2 datasets. Our findings reveal that the improvement from activating the thinking mode remains marginal compared to the standard non-thinking mode for the majority of tasks. Their performance on complex medical tasks such as open-ended VQA and medical image interpretation remains suboptimal, highlighting the need for domain-specific medical data and more advanced methods for medical knowledge integration.


【5】Diffusion Language Models are Super Data Learners
Link: https://arxiv.org/abs/2511.03276

Authors: Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh
Abstract: Under strictly controlled pre-training settings, we observe a crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) built-in Monte Carlo augmentation; input or parameter noise improves AR under data constraint but cannot close the gap. At scale, a 1.7B DLM trained with a ~1.5T-token compute budget on 10B unique Python tokens overtakes an AR coder trained with strictly matched settings. In addition, a 1B-parameter DLM achieves >56% accuracy on HellaSwag and >33% on MMLU using only 1B tokens, without any special tricks, just by repeating standard pre-training data. We also show that rising validation cross-entropy does not imply degraded downstream performance in this regime.


【6】Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Link: https://arxiv.org/abs/2511.03182

Authors: Vinaik Chhetri, A.B. Siddique, Umar Farooq
Comments: 26 pages, 2 figures, 15 tables
Abstract: Large language models (LLMs) are increasingly used in software development. However, while LLMs remain static after pretraining, programming languages and APIs continue to evolve, leading to the generation of deprecated or incompatible code that undermines reliability. Retraining LLMs from scratch to reflect such changes is computationally expensive, making model editing a promising lightweight alternative that updates only a small subset of parameters. Despite its potential, it remains unclear whether model editing yields genuine syntactic and semantic adaptations or merely superficial fixes. In this work, we present a systematic study of five state-of-the-art model editing methods: Constrained Fine-Tuning (FT), GRACE, MEMIT, PMET, and ROME. We apply these methods to three leading open-source code LLMs (CodeLlama, CodeQwen1.5, and DeepSeek-Coder) under controlled API deprecation scenarios. Our evaluation covers both instant and sequential editing settings, using three disjoint evaluation sets designed to assess reliability, generalization, and specificity. We measure model correctness at three levels: successful compilation, partial test case pass, and full test pass. Our findings show that instant edits consistently degrade model performance, with syntactic validity dropping by up to 86 percentage points and functional correctness declining by 45 points even in the best-performing setting. Sequential edits further amplify this degradation, and in some cases model performance collapses entirely. Across all models, most passing generations relied on workarounds rather than correctly adopting the intended changes, while faulty adoptions that result in test failures or compilation errors were significantly more frequent. Correct adoptions, where the model correctly integrates the intended change, occurred in only about 6% of cases.
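The three-level correctness measurement named above (successful compilation, partial test pass, full test pass) can be sketched as a small grader for Python generations. The toy `add` function and its checks are illustrative assumptions, not the paper's evaluation harness.

```python
# Minimal three-level grader for model-generated Python code:
# "no-compile" < "compiled-only" < "partial-pass" < "full-pass".

def grade(generated_code: str, checks) -> str:
    """Grade one generation at the three levels used in the study."""
    try:
        compile(generated_code, "<gen>", "exec")   # level 1: syntax
    except SyntaxError:
        return "no-compile"
    namespace = {}
    passed = 0
    try:
        exec(generated_code, namespace)            # define the functions
        passed = sum(bool(check(namespace)) for check in checks)
    except Exception:
        passed = 0
    if passed == len(checks):
        return "full-pass"                         # level 3: all tests
    return "partial-pass" if passed else "compiled-only"  # level 2

# Illustrative checks for a hypothetical `add` function.
checks = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](0, 0) == 0,
]
print(grade("def add(a, b): return a + b", checks))  # full-pass
print(grade("def add(a, b): return a - b", checks))  # partial-pass
print(grade("def add(a, b) return a + b", checks))   # no-compile
```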


【7】From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Link: https://arxiv.org/abs/2511.03128

Authors: Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
Comments: Findings of the Association for Computational Linguistics: EMNLP 2025 (camera-ready)
Abstract: LLMs can provide substantial zero-shot performance on diverse tasks using a simple task prompt, eliminating the need for training or fine-tuning. However, when applying these models to sensitive tasks, it is crucial to thoroughly assess their robustness against adversarial inputs. In this work, we introduce Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two innovative attack frameworks designed to systematically generate dynamic and adaptive adversarial examples by leveraging an understanding of the LLMs. We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text while effectively deceiving the target LLM. By utilizing an automated, LLM-driven pipeline, we eliminate the dependence on external heuristics. Our attacks evolve with the advancements in LLMs and demonstrate strong transferability across models unknown to the attacker. Overall, this work provides a systematic approach for the self-assessment of an LLM's robustness. We release our code and data at https://github.com/Shukti042/AdversarialExample.


【8】PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
Link: https://arxiv.org/abs/2511.03080

Authors: Michel Wong, Ali Alshehri, Sophia Kao, Haotian He
Comments: 9 pages including appendix. EMNLP 2025 Industry Track
Abstract: Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but they involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to reduce reliance on manually crafted rules and enable broader linguistic applicability with minimal human intervention. Additionally, we present a language-agnostic pipeline for automatic data curation and evaluation, designed to facilitate scalable experimentation across diverse languages. Experiments across eight languages show consistent reductions in word error rate (WER) compared to a production-grade system. To support further research, we release PolyNorm-Benchmark, a multilingual dataset covering a diverse range of text normalization phenomena.
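The evaluation metric named above, word error rate, is the word-level Levenshtein distance normalized by the reference length. A minimal sketch, with an illustrative normalization pair:

```python
# Standard WER via word-level edit distance (dynamic programming).

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)

ref = "three dollars and fifty cents"   # reference spoken form
hyp = "three dollar and fifty cents"    # a system's output
print(wer(ref, hyp))  # 0.2: one substitution over five reference words
```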


【9】Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Link: https://arxiv.org/abs/2511.03070

Authors: Drago Plecko, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim
Abstract: Artificial intelligence (AI) systems hold great promise for advancing various scientific disciplines, and are increasingly used in real-world applications. Despite their remarkable progress, further capabilities are expected in order to achieve more general types of intelligence. A critical distinction in this context is between factual knowledge, which can be evaluated against true or false answers (e.g., "what is the capital of England?"), and probabilistic knowledge, reflecting probabilistic properties of the real world (e.g., "what is the sex of a computer science graduate in the US?"). In this paper, our goal is to build a benchmark for understanding the capabilities of LLMs in terms of knowledge of probability distributions describing the real world. Given that LLMs are trained on vast amounts of text, it may be plausible that they internalize aspects of these distributions. Indeed, LLMs are touted as powerful universal approximators of real-world distributions. At the same time, classical results in statistics, known as the curse of dimensionality, highlight fundamental challenges in learning distributions in high dimensions, challenging the notion of universal distributional learning. In this work, we develop the first benchmark to directly test this hypothesis, evaluating whether LLMs have access to empirical distributions describing real-world populations across domains such as economics, health, education, and social behavior. Our results demonstrate that LLMs perform poorly overall, and do not seem to internalize real-world statistics naturally. When interpreted in the context of Pearl's Causal Hierarchy (PCH), our benchmark demonstrates that language models do not contain knowledge on observational distributions (Layer 1 of PCH), and thus the Causal Hierarchy Theorem implies that interventional (Layer 2) and counterfactual (Layer 3) knowledge of these models is also limited.
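One plausible way to score the gap such a benchmark measures is the total variation distance between an LLM's claimed distribution and the empirical one. The metric choice and the numbers below are illustrative assumptions, not the paper's protocol.

```python
# Total variation distance between two discrete distributions:
# TV(p, q) = 0.5 * sum_k |p(k) - q(k)|, in [0, 1].

def total_variation(p: dict, q: dict) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

empirical = {"female": 0.22, "male": 0.78}   # made-up marginal, not real data
llm_claim = {"female": 0.50, "male": 0.50}   # an uncalibrated model answer
print(total_variation(empirical, llm_claim))  # ~0.28
```

A score of 0 would mean the model's reported probabilities match the real-world marginal exactly; values near 1 mean the distributions barely overlap.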


【10】Zero-shot data citation function classification using transformer-based large language models (LLMs)
Link: https://arxiv.org/abs/2511.02936

Authors: Neil Byers, Ali Zaidi, Valerie Skye, Chris Beecroft, Kjiersten Fagnan
Abstract: Efforts have increased in recent years to identify associations between specific datasets and the scientific literature that incorporates them. Knowing that a given publication cites a given dataset, the next logical step is to explore how or why that data was used. Advances in recent years with pretrained, transformer-based large language models (LLMs) offer potential means for scaling the description of data use cases in the published literature. This avoids expensive manual labeling and the development of training datasets for classical machine-learning (ML) systems. In this work we apply an open-source LLM, Llama 3.1-405B, to generate structured data use case labels for publications known to incorporate specific genomic datasets. We also introduce a novel evaluation framework for determining the efficacy of our methods. Our results demonstrate that the stock model can achieve an F1 score of 0.674 on a zero-shot data citation classification task with no previously defined categories. While promising, our results are qualified by barriers related to data availability, prompt overfitting, computational infrastructure, and the expense required to conduct responsible performance evaluation.


【11】Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
Link: https://arxiv.org/abs/2511.02894

Authors: W.K.M. Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
Abstract: The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates "role play" prompting, whereby the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies, and "think step-by-step" reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on curation of extensive datasets and enable robust, adaptable defense mechanisms in real time. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
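The two prompting strategies named in the abstract, role-play and think-step-by-step, amount to prompt construction. The sketch below is hedged: the wording, the reading format, and the `build_poison_audit_prompt` helper are all hypothetical, and a real system would send the result to an LLM API rather than just print it.

```python
# Hypothetical prompt builder combining a role-play system message,
# optional k-shot exemplars, and step-by-step audit instructions.

def build_poison_audit_prompt(readings, k_shot_examples=()):
    system = ("You are an expert in wearable-sensor data quality for "
              "human activity recognition.")              # role play
    shots = "\n\n".join(k_shot_examples)                  # zero/one/few-shot
    task = ("Think step by step. For each accelerometer reading below, "
            "say whether it looks poisoned and, if so, propose a "
            "plausible clean value.\n"
            + "\n".join(f"t={i}: {r}" for i, r in enumerate(readings)))
    return f"{system}\n\n{shots}\n\n{task}".strip()

# One obviously implausible spike (47.5 g) among normal ~1 g readings.
prompt = build_poison_audit_prompt([0.98, 1.02, 47.5, 1.01],
                                   k_shot_examples=("t=0: 0.99 -> clean",))
print(prompt)
```

Swapping the number of entries in `k_shot_examples` switches between the zero-, one-, and few-shot regimes the abstract compares.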


Graphs (graph learning | graph neural networks | graph optimization, etc.) (5 papers)

【1】Graph Neural AI with Temporal Dynamics for Comprehensive Anomaly Detection in Microservices
Link: https://arxiv.org/abs/2511.03285

Authors: Qingyuan Zhang, Ning Lyu, Le Liu, Yuxi Wang, Ziyu Cheng, Cancan Hua
Abstract: This study addresses the problem of anomaly detection and root cause tracing in microservice architectures and proposes a unified framework that combines graph neural networks with temporal modeling. The microservice call chain is abstracted as a directed graph, where multidimensional features of nodes and edges are used to construct a service topology representation, and graph convolution is applied to aggregate features across nodes and model dependencies, capturing complex structural relationships among services. On this basis, gated recurrent units are introduced to model the temporal evolution of call chains, and multi-layer stacking and concatenation operations are used to jointly obtain structural and temporal representations, improving the ability to identify anomaly patterns. Furthermore, anomaly scoring functions at both the node and path levels are defined to achieve unified modeling from local anomaly detection to global call chain tracing, which enables the identification of abnormal service nodes and the reconstruction of potential anomaly propagation paths. Sensitivity experiments are then designed along multiple dimensions, including hyperparameters, environmental disturbances, and data distribution, to evaluate the framework, and results show that it outperforms baseline methods on key metrics such as AUC, ACC, Recall, and F1-Score, maintaining high accuracy and stability under dynamic topologies and complex environments. This research not only provides a new technical path for anomaly detection in microservices but also lays a methodological foundation for intelligent operations in distributed systems.
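The structural half of the pipeline (graph convolution over the call graph, a GRU step for temporal evolution, and a node-level anomaly score) can be sketched with NumPy. The tiny four-service chain, the identity weight, and the norm-based score are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def graph_conv(adj, feats, weight):
    """One GCN layer: symmetric-normalized aggregation over the call graph."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(a_hat.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weight

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(h, x, wz, wr, wh):
    """Standard GRU cell update, modeling temporal evolution of node states."""
    hx = np.concatenate([h, x], axis=-1)
    z = sigmoid(hx @ wz)                            # update gate
    r = sigmoid(hx @ wr)                            # reset gate
    h_tilde = np.tanh(np.concatenate([r * h, x], axis=-1) @ wh)
    return (1.0 - z) * h + z * h_tilde

# A 4-service call chain 0-1-2-3; node 3 emits anomalous feature values.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.ones((4, 3))
feats[3] *= 6.0                                     # injected anomaly
weight = np.eye(3)                                  # identity for clarity

conv_out = graph_conv(adj, feats, weight)
scores = np.linalg.norm(conv_out, axis=1)           # node-level anomaly score
print(int(scores.argmax()))                         # 3: the anomalous node

rng = np.random.default_rng(0)
wz, wr, wh = (rng.normal(scale=0.1, size=(6, 3)) for _ in range(3))
h = gru_step(np.zeros((4, 3)), conv_out, wz, wr, wh)  # one temporal step
```

In the full framework these two blocks are stacked per time step, and the scores feed both node-level detection and path-level propagation tracing.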


【2】GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models
Link: https://arxiv.org/abs/2511.03251

Authors: Zhibin Wang, Zhixing Zhang, Shuqi Wang, Xuanting Xie, Zhao Kang
Abstract: Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. Existing approaches often struggle with negative transfer, scalability issues, and high adaptation costs. To address these challenges, we propose GMoPE (Graph Mixture of Prompt-Experts), a novel framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. GMoPE leverages expert-specific prompt vectors and structure-aware MoE routing to enable each expert to specialize in distinct subdomains and dynamically contribute to predictions. To promote diversity and prevent expert collapse, we introduce a soft orthogonality constraint across prompt vectors, encouraging expert specialization and facilitating more balanced expert utilization. Additionally, we adopt a prompt-only fine-tuning strategy that significantly reduces spatiotemporal complexity during transfer. We validate GMoPE through extensive experiments under various pretraining strategies and multiple downstream tasks. Results show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full-parameter fine-tuning while requiring only a fraction of the adaptation overhead. Our work provides a principled and scalable framework for advancing generalizable and efficient graph foundation models.
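The soft orthogonality constraint across prompt vectors can be written down directly: penalize the off-diagonal entries of the prompts' Gram matrix so experts stay decorrelated. The squared-Frobenius form below is a common choice and an assumption here, not necessarily GMoPE's exact loss term.

```python
import numpy as np

def orthogonality_penalty(prompts: np.ndarray) -> float:
    """Soft orthogonality loss for a (num_experts, dim) prompt matrix:
    squared Frobenius norm of the off-diagonal cosine-similarity entries."""
    normed = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    gram = normed @ normed.T                      # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))      # zero the diagonal
    return float(np.sum(off_diag ** 2))

orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])   # specialized experts
collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])    # both experts identical
print(orthogonality_penalty(orthogonal))  # 0.0
print(orthogonality_penalty(collapsed))   # 2.0 (two off-diagonal entries of 1)
```

Added to the task loss with a small weight, the penalty pushes the prompt vectors apart, which is exactly the "prevent expert collapse" behavior the abstract describes.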


【3】Discrete Bayesian Sample Inference for Graph Generation
Link: https://arxiv.org/abs/2511.03015

Authors: Ole Petersen, Marcel Kollovieh, Marten Lienen, Stephan Günnemann
Abstract: Generating graph-structured data is crucial in applications such as molecular generation, knowledge graphs, and network analysis. However, the discrete, unordered nature of graphs makes them difficult for traditional generative models, leading to the rise of discrete diffusion and flow matching models. In this work, we introduce GraphBSI, a novel one-shot graph generative model based on Bayesian Sample Inference (BSI). Instead of evolving samples directly, GraphBSI iteratively refines a belief over graphs in the continuous space of distribution parameters, naturally handling discrete structures. Further, we state BSI as a stochastic differential equation (SDE) and derive a noise-controlled family of SDEs that preserves the marginal distributions via an approximation of the score function. Our theoretical analysis further reveals the connection to Bayesian Flow Networks and diffusion models. Finally, in our empirical evaluation, we demonstrate state-of-the-art performance on molecular and synthetic graph generation, outperforming existing one-shot graph generative models on the standard benchmarks Moses and GuacaMol.


【4】Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks
Link: https://arxiv.org/abs/2511.02957

Authors: Mohsin Mahmud Topu, Mahfuz Ahmed Anik, Azmine Toushik Wasi, Md Manjurul Ahsan
Abstract: Pavement infrastructure monitoring is challenged by complex spatial dependencies, changing environmental conditions, and non-linear deterioration across road networks. Traditional Pavement Management Systems (PMS) remain largely reactive, lacking real-time intelligence for failure prevention and optimal maintenance planning. To address this, we propose a unified Digital Twin (DT) and Graph Neural Network (GNN) framework for scalable, data-driven pavement health monitoring and predictive maintenance. Pavement segments and spatial relations are modeled as graph nodes and edges, while real-time UAV, sensor, and LiDAR data stream into the DT. The inductive GNN learns deterioration patterns from graph-structured inputs to forecast distress and enable proactive interventions. Trained on a real-world-inspired dataset with segment attributes and dynamic connectivity, our model achieves an R^2 of 0.3798, outperforming baseline regressors and effectively capturing non-linear degradation. We also develop an interactive dashboard and a reinforcement learning module for simulation, visualization, and adaptive maintenance planning. This DT-GNN integration enhances forecasting precision and establishes a closed feedback loop for continuous improvement, positioning the approach as a foundation for proactive, intelligent, and sustainable pavement management, with future extensions toward real-world deployment, multi-agent coordination, and smart-city integration.


【5】Stochastic Deep Graph Clustering for Practical Group Formation
Link: https://arxiv.org/abs/2511.02879

Authors: Junhyung Park, Hyungjin Kim, Seokho Ahn, Young-Duk Seo
Abstract: While prior work on group recommender systems (GRSs) has primarily focused on improving recommendation accuracy, most approaches assume static or predefined groups, making them unsuitable for dynamic, real-world scenarios. We reframe group formation as a core challenge in GRSs and propose DeepForm (Stochastic Deep Graph Clustering for Practical Group Formation), a framework designed to meet three key operational requirements: (1) the incorporation of high-order user information, (2) real-time group formation, and (3) dynamic adjustment of the number of groups. DeepForm employs a lightweight GCN architecture that effectively captures high-order structural signals. Stochastic cluster learning enables adaptive group reconfiguration without retraining, while contrastive learning refines groups under dynamic conditions. Experiments on multiple datasets demonstrate that DeepForm achieves superior group formation quality, efficiency, and recommendation accuracy compared with various baselines.


Transformer(5篇)

【1】The Curved Spacetime of Transformer Architectures
标题:Transformer建筑的弯曲时空
链接:https://arxiv.org/abs/2511.03060

作者:Riccardo Di Sipio, Jairo Diaz-Rodriguez, Luis Serrano
摘要:我们提出了一个理解基于Transformer的语言模型的几何框架,并与广义相对论进行了明确的类比。查询和键在表示空间上诱导出一个有效度量,注意力作为一个离散联络,实现值向量在令牌之间的平行输运。堆叠层提供了离散的时间片,令牌表示通过这些时间片在这个弯曲流形上演化,而反向传播则扮演着最小作用量原理的角色,在参数空间中塑造损失最小化的轨迹。如果这个类比是正确的,那么令牌嵌入不应该在特征空间中沿直线路径行进;相反,它们的逐层步进应在嵌入空间曲率所介导的相互作用下弯曲和重新定向。为了检验这一预测,我们设计了一些实验来揭示曲率的存在及其后果:(i)我们可视化了一个完整段落的曲率景观,揭示局部转角如何随令牌和层而变化;(ii)我们通过模拟表明,尖锐/平坦转角的超额计数以及较大的路径长度与弦长之比无法用维度或偶然性来解释;(iii)受爱因斯坦日食实验的启发,我们在受控的上下文编辑下探测偏转,展示了嵌入轨迹中可测量的、与语义一致的弯曲,证实了注意力诱导的曲率。
摘要:We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a curvature landscape for a full paragraph, revealing how local turning angles vary across tokens and layers; (ii) we show through simulations that excess counts of sharp/flat angles and longer length-to-chord ratios are not explainable by dimensionality or chance; and (iii) inspired by Einstein's eclipse experiment, we probe deflection under controlled context edits, demonstrating measurable, meaning-consistent bends in embedding trajectories that confirm attention-induced curvature.
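摘要中的两个量,逐层转角与"路径长度/弦长"之比,可以直接从某个词元在各层的表示序列算出。下面是一个自包含的 numpy 草图(轨迹数据为随手构造,仅用于说明度量本身):

```python
import numpy as np

def trajectory_stats(path):
    """path: (L, d),某一词元在各层的表示。返回逐层转角(度)与长弦比。"""
    steps = np.diff(path, axis=0)                       # 相邻层之间的位移
    norms = np.linalg.norm(steps, axis=1)
    cos = np.sum(steps[:-1] * steps[1:], axis=1) / (norms[:-1] * norms[1:])
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))   # 逐层转角
    length_to_chord = norms.sum() / np.linalg.norm(path[-1] - path[0])
    return angles, length_to_chord

straight = np.array([[0.0, 0], [1, 0], [2, 0], [3, 0]])   # 直线轨迹
bent = np.array([[0.0, 0], [1, 0], [1, 1], [2, 1]])       # 弯曲轨迹
_, r1 = trajectory_stats(straight)
angles2, r2 = trajectory_stats(bent)
```

直线轨迹的转角全为零、长弦比恰为1;任何弯曲都使长弦比超过1,这正是论文用来排除"偶然弯曲"解释的统计量之一。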


【2】Data-Efficient Realized Volatility Forecasting with Vision Transformers
标题:使用Vision Transformers进行数据高效的已实现波动率预测
链接:https://arxiv.org/abs/2511.03046

作者:Emi Soroka, Artem Arzyn
备注:NeurIPS Generative AI in Finance
摘要:金融机器学习的最新研究显示了"复杂性的优点":即能够学习高度非线性关系的深度学习方法在金融预测中优于简单方法的现象。虽然像Informer这样的Transformer架构已经在金融时间序列预测中显示出前景,但Transformer模型在期权数据上的应用在很大程度上仍未得到探索。我们对面向期权数据的Transformer模型的开发进行了初步研究:训练通常用于现代图像识别和分类系统的Vision Transformer(ViT)架构,从资产单日的隐含波动率曲面(附加日期信息)预测其未来30天的已实现波动率。我们表明,ViT可以从IV曲面中学习季节性模式和非线性特征,为模型的发展指出了一个有前景的方向。
摘要:Recent work in financial machine learning has shown the virtue of complexity: the phenomenon by which deep learning methods capable of learning highly nonlinear relationships outperform simpler approaches in financial forecasting. While transformer architectures like Informer have shown promise for financial time series forecasting, the application of transformer models for options data remains largely unexplored. We conduct preliminary studies towards the development of a transformer model for options data by training the Vision Transformer (ViT) architecture, typically used in modern image recognition and classification systems, to predict the realized volatility of an asset over the next 30 days from its implied volatility surface (augmented with date information) for a single day. We show that the ViT can learn seasonal patterns and nonlinear features from the IV surface, suggesting a promising direction for model development.


【3】Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification
标题:用于TinyML图像分类的混合卷积和Vision Transformer NAS搜索空间
链接:https://arxiv.org/abs/2511.02992

作者:Mikhael Djajapermana, Moritz Reiber, Daniel Mueller-Gritschneder, Ulf Schlichtmann
备注:Presented at ITEM workshop co-located with ECML PKDD 2024, Vilnius LT
摘要:卷积神经网络(CNN)和Vision Transformer(ViT)的混合已经超过了纯CNN或ViT架构。然而,由于这些架构需要大的参数并产生大的计算成本,因此它们不适合tinyML部署。本文介绍了一种新的混合CNN-ViT搜索空间,用于神经结构搜索(NAS),以找到有效的图像分类混合结构。搜索空间覆盖了混合CNN和ViT块来学习局部和全局信息,以及可搜索池化层的新颖池化块,以实现高效的特征图缩减。在CIFAR 10数据集上的实验结果表明,我们提出的搜索空间可以产生混合CNN-ViT架构,在严格的模型大小约束下,具有优于基于ResNet的tinyML模型的准确性和推理速度。
摘要:Hybrids of Convolutional Neural Network (CNN) and Vision Transformer (ViT) have outperformed pure CNN or ViT architecture. However, since these architectures require large parameters and incur large computational costs, they are unsuitable for tinyML deployment. This paper introduces a new hybrid CNN-ViT search space for Neural Architecture Search (NAS) to find efficient hybrid architectures for image classification. The search space covers hybrid CNN and ViT blocks to learn local and global information, as well as the novel Pooling block of searchable pooling layers for efficient feature map reduction. Experimental results on the CIFAR10 dataset show that our proposed search space can produce hybrid CNN-ViT architectures with superior accuracy and inference speed to ResNet-based tinyML models under tight model size constraints.


【4】EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture
标题:EGMOF:使用混合扩散-Transformer架构高效生成金属有机框架
链接:https://arxiv.org/abs/2511.03122

作者:Seunghee Han, Yeonghun Kang, Taeun Bae, Varinia Bernales, Alan Aspuru-Guzik, Jihan Kim
摘要:由于化学空间的广阔和属性标记数据的稀缺,设计具有目标属性的材料仍然具有挑战性。虽然生成模型的最新进展为逆向设计提供了一种很有前途的方法,但大多数方法都需要大型数据集,并且必须针对每个新的目标属性进行重新训练。在这里,我们介绍了EGMOF(MOF的高效生成),一个混合扩散-Transformer框架,它通过模块化、描述符介导的工作流程克服了这些限制。EGMOF将逆向设计分解为两个步骤:(1)一维扩散模型(Prop2Desc),将所需属性映射到化学上有意义的描述符;(2)Transformer模型(Desc2MOF),从这些描述符生成结构。这种模块化混合设计可以实现最少的重新训练,即使在小数据条件下也能保持高准确性。在氢气吸附数据集上,EGMOF实现了超过95%的有效性和84%的命中率,与现有方法相比,有效性提高了多达57%,命中率提高了14%,同时仅用1,000个训练样本即可保持有效。此外,我们的模型成功地在29个不同的属性数据集上进行了条件生成,包括CoREMOF、QMOF和文本挖掘的实验数据集,而以前的模型则没有做到。这项工作提出了一种数据高效、可推广的方法来逆向设计多样的MOF,并强调了模块化逆向设计工作流程在更广泛的材料发现中的潜力。
摘要:Designing materials with targeted properties remains challenging due to the vastness of chemical space and the scarcity of property-labeled data. While recent advances in generative models offer a promising way for inverse design, most approaches require large datasets and must be retrained for every new target property. Here, we introduce the EGMOF (Efficient Generation of MOFs), a hybrid diffusion-transformer framework that overcomes these limitations through a modular, descriptor-mediated workflow. EGMOF decomposes inverse design into two steps: (1) a one-dimensional diffusion model (Prop2Desc) that maps desired properties to chemically meaningful descriptors followed by (2) a transformer model (Desc2MOF) that generates structures from these descriptors. This modular hybrid design enables minimal retraining and maintains high accuracy even under small-data conditions. On a hydrogen uptake dataset, EGMOF achieved over 95% validity and 84% hit rate, representing significant improvements of up to 57% in validity and 14% in hit rate compared to existing methods, while remaining effective with only 1,000 training samples. Moreover, our model successfully performed conditional generation across 29 diverse property datasets, including CoREMOF, QMOF, and text-mined experimental datasets, whereas previous models have not. This work presents a data-efficient, generalizable approach to the inverse design of diverse MOFs and highlights the potential of modular inverse design workflows for broader materials discovery.


【5】Consciousness-ECG Transformer for Conscious State Estimation System with Real-Time Monitoring
标题:用于实时监控的意识状态估计系统的意识-心电图Transformer
链接:https://arxiv.org/abs/2511.02853

作者:Young-Seok Kweon, Gi-Hwan Shin, Ji-Yong Kim, Bokyeong Ryu, Seong-Whan Lee
备注:30 pages, 8 figures
摘要:意识状态估计在各种医疗环境(包括睡眠分期和麻醉管理)中非常重要,可以确保患者安全并优化健康结果。传统的方法主要利用脑电图(EEG),它面临的挑战,如对噪声的高灵敏度和对受控环境的要求。在这项研究中,我们提出了意识心电图Transformer,利用心电图(ECG)信号的非侵入性和可靠的意识状态估计。我们的方法采用了一个Transformer与解耦查询注意有效地捕捉心率变异性功能,区分有意识和无意识的状态。我们实现了具有实时监测的意识状态估计系统,并在手术期间涉及睡眠分期和麻醉水平监测的数据集上验证了我们的系统。实验结果表明,我们的模型优于基线模型,实现睡眠分期的准确率为0.877,麻醉水平监测的准确率为0.880。此外,我们的模型在睡眠分期和麻醉水平监测上分别达到了0.786和0.895的最高曲线下面积值。所提出的系统提供了一个实用和强大的替代EEG为基础的方法,特别适合于动态的临床环境。我们的研究结果强调了基于ECG的意识监测在提高患者安全性和促进我们对意识状态的理解方面的潜力。
摘要:Conscious state estimation is important in various medical settings, including sleep staging and anesthesia management, to ensure patient safety and optimize health outcomes. Traditional methods predominantly utilize electroencephalography (EEG), which faces challenges such as high sensitivity to noise and the requirement for controlled environments. In this study, we propose the consciousness-ECG transformer that leverages electrocardiography (ECG) signals for non-invasive and reliable conscious state estimation. Our approach employs a transformer with decoupled query attention to effectively capture heart rate variability features that distinguish between conscious and unconscious states. We implemented the conscious state estimation system with real-time monitoring and validated our system on datasets involving sleep staging and anesthesia level monitoring during surgeries. Experimental results demonstrate that our model outperforms baseline models, achieving accuracies of 0.877 on sleep staging and 0.880 on anesthesia level monitoring. Moreover, our model achieves the highest area under curve values of 0.786 and 0.895 on sleep staging and anesthesia level monitoring, respectively. The proposed system offers a practical and robust alternative to EEG-based methods, particularly suited for dynamic clinical environments. Our results highlight the potential of ECG-based consciousness monitoring to enhance patient safety and advance our understanding of conscious states.


GAN|对抗|攻击|生成相关(2篇)

【1】RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
标题:RAGBoost:具有保持准确性的上下文重用的高效检索增强生成
链接:https://arxiv.org/abs/2511.03475

作者:Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
摘要:检索增强生成(RAG)用检索到的上下文增强大型语言模型(LLM),但由于现代应用需要更长、更复杂的输入,其预填充性能往往会下降。现有的缓存技术要么保持准确性但缓存重用率低,要么以降低推理质量为代价提高重用率。我们提出了RAGBoost,一个通过保持准确性的上下文重用,在不牺牲准确性的前提下实现高缓存重用的高效RAG系统。RAGBoost检测跨并发会话和多轮交互的重叠检索项,使用高效的上下文索引、排序和去重来最大化重用,而轻量级上下文提示保持推理保真度。它与现有的LLM推理引擎无缝集成,并将其预填充性能较最先进方法提高了1.5-3倍,同时在不同的RAG和代理AI工作负载中保持甚至增强推理准确性。我们的代码发布于:https://github.com/Edinburgh-AgenticAI/RAGBoost。
摘要:Retrieval-augmented generation (RAG) enhances large language models (LLMs) with retrieved context but often suffers from downgraded prefill performance as modern applications demand longer and more complex inputs. Existing caching techniques either preserve accuracy with low cache reuse or improve reuse at the cost of degraded reasoning quality. We present RAGBoost, an efficient RAG system that achieves high cache reuse without sacrificing accuracy through accuracy-preserving context reuse. RAGBoost detects overlapping retrieved items across concurrent sessions and multi-turn interactions, using efficient context indexing, ordering, and de-duplication to maximize reuse, while lightweight contextual hints maintain reasoning fidelity. It integrates seamlessly with existing LLM inference engines and improves their prefill performance by 1.5-3X over state-of-the-art methods, while preserving or even enhancing reasoning accuracy across diverse RAG and agentic AI workloads. Our code is released at: https://github.com/Edinburgh-AgenticAI/RAGBoost.
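作为概念示意,下面用标准库勾勒"跨请求检测重叠检索项、去重并把高复用块排在前面以形成可共享前缀"的思路(类名与排序策略均为本文假设,并非 RAGBoost 的实际实现):

```python
import hashlib

class ContextCache:
    """玩具示例:按内容哈希识别跨请求重复的检索块,
    并将历史复用次数高的块排在上下文前部,以延长可共享的缓存前缀。"""

    def __init__(self):
        self.seen = {}                      # 块哈希 -> 历史出现次数

    def _key(self, chunk):
        return hashlib.sha256(chunk.encode()).hexdigest()

    def build_context(self, chunks):
        unique, keys = [], set()
        for c in chunks:                    # 先去掉本次请求内的重复块
            k = self._key(c)
            if k not in keys:
                keys.add(k)
                unique.append((self.seen.get(k, 0), c))
                self.seen[k] = self.seen.get(k, 0) + 1
        unique.sort(key=lambda t: -t[0])    # 复用最多的块排在最前
        return [c for _, c in unique]

cache = ContextCache()
r1 = cache.build_context(["a", "b", "a"])   # 请求内去重
r2 = cache.build_context(["c", "b"])        # 曾出现过的 "b" 被排到最前
```

第二次请求中曾被检索过的块 "b" 被排到最前,从而与先前会话共享更长的前缀;真实系统还需如摘要所述用上下文提示来保持推理保真度。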


【2】Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
标题:基于潜在扩散模型的可扩展单细胞基因表达生成
链接:https://arxiv.org/abs/2511.02986

作者:Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak
备注:Github: this https URL
摘要:单细胞基因表达的计算建模对于理解细胞过程至关重要,但生成真实的表达谱仍然是一个重大挑战。这个困难来自基因表达数据的计数性质和基因之间复杂的潜在依赖性。现有的生成模型通常强加人工基因排序或依赖于浅层神经网络架构。我们引入了一个可扩展的潜在扩散模型的单细胞基因表达数据,我们称之为scLDM,尊重基本的数据交换属性。我们的VAE使用固定大小的潜在变量,利用统一的多头交叉注意块(MCAB)架构,它具有双重作用:编码器中的置换不变池和解码器中的置换等变解池。我们通过使用扩散Transformers和线性插值的潜在扩散模型替换高斯先验来增强此框架,从而实现具有多条件无分类器指导的高质量生成。我们在各种实验中显示了其优越的性能,包括观测和微扰单细胞数据,以及下游任务,如细胞级分类。
摘要:Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】From Propagation to Prediction: Point-level Uncertainty Evaluation of MLS Point Clouds under Limited Ground Truth
标题:从传播到预测:有限地面真实值条件下MLS点云的点级不确定性评估
链接:https://arxiv.org/abs/2511.03053

作者:Ziyang Xu, Olaf Wysocki, Christoph Holst
摘要:评估不确定性对于在许多高精度应用中可靠使用移动激光扫描(MLS)点云至关重要,例如Scan-to-BIM、变形分析和3D建模。然而,获得用于评估的地面实况(GT)往往成本高昂,在许多现实应用中并不可行。为了减少不确定性评估研究对GT的长期依赖,本研究提出了一个基于学习的MLS点云框架,集成了最优邻域估计与几何特征提取。在真实世界数据集上的实验表明,所提出的框架是可行的,XGBoost模型提供了与随机森林完全相当的准确性,同时实现了更高的效率(大约快3倍),提供了初步证据表明几何特征可用于预测由C2C距离量化的点级不确定性。总之,本研究表明,MLS点云的不确定性是可学习的,为不确定性评估研究提供了一种新的基于学习的视角。
摘要:Evaluating uncertainty is critical for reliable use of Mobile Laser Scanning (MLS) point clouds in many high-precision applications such as Scan-to-BIM, deformation analysis, and 3D modeling. However, obtaining the ground truth (GT) for evaluation is often costly and infeasible in many real-world applications. To reduce this long-standing reliance on GT in uncertainty evaluation research, this study presents a learning-based framework for MLS point clouds that integrates optimal neighborhood estimation with geometric feature extraction. Experiments on a real-world dataset show that the proposed framework is feasible and the XGBoost model delivers fully comparable accuracy to Random Forest while achieving substantially higher efficiency (about 3 times faster), providing initial evidence that geometric features can be used to predict point-level uncertainty quantified by the C2C distance. In summary, this study shows that MLS point clouds' uncertainty is learnable, offering a novel learning-based viewpoint towards uncertainty evaluation research.
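摘要所说的"几何特征"在点云文献中通常指由邻域协方差特征值导出的线性度、平面度、球度等量。下面给出这些常见定义的 numpy 草图(仅为示意,论文采用的具体特征集可能不同):

```python
import numpy as np

def geometric_features(neighborhood):
    """由邻域协方差的特征值计算常见几何特征。
    neighborhood: (k, 3) 某点的近邻坐标。"""
    w = np.sort(np.linalg.eigvalsh(np.cov(neighborhood.T)))[::-1]
    l1, l2, l3 = w / w.sum()              # 归一化特征值 l1 >= l2 >= l3
    return {"linearity":  (l1 - l2) / l1,  # 线性度:近似直线结构
            "planarity":  (l2 - l3) / l1,  # 平面度:近似平面结构
            "sphericity": l3 / l1}         # 球度:各向同性散布

line_pts  = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
plane_pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
f_line, f_plane = geometric_features(line_pts), geometric_features(plane_pts)
```

这类特征随后即可作为 XGBoost 或随机森林等回归器的输入,用于预测以 C2C 距离量化的点级不确定性。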


【2】Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions
标题:多轮目标驱动交互的无监督评估
链接:https://arxiv.org/abs/2511.03047

作者:Emi Soroka, Tanmay Chopra, Krish Desai, Sanjay Lall
备注:Under review at ICLR 2026
摘要:大型语言模型(LLM)在企业应用中越来越受欢迎,其中AI代理和人类参与目标驱动的交互。然而,这些系统很难评估:数据可能很复杂且未标注;人工标注通常在规模上不切实际;自定义度量可以监控特定错误,但无法监控以前未检测到的错误;LLM评判可能产生不可靠的结果。我们引入了第一组用于目标驱动交互的无监督度量,利用未标注交互数据的统计特性,并使用微调的LLM来适应分布偏移。我们开发了用于标注用户目标、衡量目标完成度和量化LLM不确定性的度量,而无需将评估建立在人工撰写的理想回复之上。我们的方法在开放域和特定任务的交互数据上得到了验证。
摘要:Large language models (LLMs) have seen increasing popularity in enterprise applications where AI agents and humans engage in objective-driven interactions. However, these systems are difficult to evaluate: data may be complex and unlabeled; human annotation is often impractical at scale; custom metrics can monitor for specific errors, but not previously-undetected ones; and LLM judges can produce unreliable results. We introduce the first set of unsupervised metrics for objective-driven interactions, leveraging statistical properties of unlabeled interaction data and using fine-tuned LLMs to adapt to distributional shifts. We develop metrics for labeling user goals, measuring goal completion, and quantifying LLM uncertainty without grounding evaluations in human-generated ideal responses. Our approach is validated on open-domain and task-specific interaction data.


【3】RKUM: An R Package for Robust Kernel Unsupervised Methods
标题:RKUM:鲁棒核无监督方法的R包
链接:https://arxiv.org/abs/2511.03216

作者:Md Ashad Alam
备注:26, 2 figures
摘要:RKUM是一个用于实现鲁棒核无监督方法的R包。它提供了使用广义损失函数(而非传统二次损失)估计鲁棒核协方差算子(CO)和鲁棒核互协方差算子(CCO)的函数。这些算子构成了鲁棒核学习的基础,使得在受污染或含噪数据条件下仍能进行可靠分析。该包包括鲁棒核典型相关分析(Kernel CCA)的实现,以及标准和多核CCA框架的影响函数(IF)。影响函数量化敏感性,并帮助检测双视图和多视图数据集中有影响力或离群的观测。使用合成的双视图和多视图数据的实验表明,标准核CCA的IF能有效识别离群值,而RKUM中实现的鲁棒核方法对污染表现出更低的敏感性。总体而言,RKUM为高维数据应用中基于核的鲁棒分析提供了一个高效且可扩展的平台。
摘要:RKUM is an R package developed for implementing robust kernel-based unsupervised methods. It provides functions for estimating the robust kernel covariance operator (CO) and the robust kernel cross-covariance operator (CCO) using generalized loss functions instead of the conventional quadratic loss. These operators form the foundation of robust kernel learning and enable reliable analysis under contaminated or noisy data conditions. The package includes implementations of robust kernel canonical correlation analysis (Kernel CCA), as well as the influence function (IF) for both standard and multiple kernel CCA frameworks. The influence function quantifies sensitivity and helps detect influential or outlying observations across two-view and multi-view datasets. Experiments using synthesized two-view and multi-view data demonstrate that the IF of the standard kernel CCA effectively identifies outliers, while the robust kernel methods implemented in RKUM exhibit reduced sensitivity to contamination. Overall, RKUM provides an efficient and extensible platform for robust kernel-based analysis in high-dimensional data applications.


【4】Approaching Low-Cost Cardiac Intelligence with Semi-Supervised Knowledge Distillation
标题:通过半监督知识蒸馏实现低成本心脏智能
链接:https://arxiv.org/abs/2511.02851

作者:Rushuang Zhou, Yuan-Ting Zhang, M.Jamal Deen, Yining Dong
摘要:部署先进的心脏人工智能用于日常心脏监测受到其对大量医疗数据和高计算资源的依赖的阻碍。低成本心脏智能(LCCI)通过使用可穿戴设备数据(如单导联心电图(ECG))提供了一种有前途的替代方案,但与高成本心脏智能(HCCI)相比,它存在显著的诊断性能差距。为了弥合这一差距,我们提出了LiteHeart,一个半监督的知识蒸馏框架。LiteHeart引入了一个区域感知的蒸馏模块来模拟心脏病专家如何专注于诊断相关的ECG区域,并引入了一个跨层互信息模块来对齐LCCI和HCCI系统的决策过程。使用半监督训练策略,LiteHeart进一步提高了有限监督下的模型鲁棒性。在涵盖超过38种心血管疾病的五个数据集上进行评估,LiteHeart大大缩小了LCCI和HCCI之间的性能差距,在宏观F1评分中优于现有方法4.27%至7.10%。这些结果表明,LiteHeart显著增强了低成本心脏智能系统的诊断能力,为使用可穿戴技术实现可扩展、经济实惠和准确的日常心脏医疗保健铺平了道路。
摘要:Deploying advanced cardiac artificial intelligence for daily cardiac monitoring is hindered by its reliance on extensive medical data and high computational resources. Low-cost cardiac intelligence (LCCI) offers a promising alternative by using wearable device data, such as 1-lead electrocardiogram (ECG), but it suffers from a significant diagnostic performance gap compared to high-cost cardiac intelligence (HCCI). To bridge this gap, we propose LiteHeart, a semi-supervised knowledge distillation framework. LiteHeart introduces a region-aware distillation module to mimic how cardiologists focus on diagnostically relevant ECG regions and a cross-layer mutual information module to align the decision processes of LCCI and HCCI systems. Using a semi-supervised training strategy, LiteHeart further improves model robustness under limited supervision. Evaluated on five datasets covering over 38 cardiovascular diseases, LiteHeart substantially reduces the performance gap between LCCI and HCCI, outperforming existing methods by 4.27% to 7.10% in macro F1 score. These results demonstrate that LiteHeart significantly enhances the diagnostic capabilities of low-cost cardiac intelligence systems, paving the way for scalable, affordable, and accurate daily cardiac healthcare using wearable technologies.


迁移|Zero/Few/One-Shot|自适应(12篇)

【1】Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
标题:行为自适应Q学习:一个统一的离线到在线强化学习框架
链接:https://arxiv.org/abs/2511.03695

作者:Lipeng Zu, Hansong Zhou, Xiaonan Zhang
摘要:离线强化学习(RL)可以在没有在线交互的情况下从固定数据进行训练,但是由于分布偏移和对不可见状态-动作对的不可靠值估计,离线学习的策略在动态环境中部署时往往会遇到困难。我们引入了行为自适应Q学习(BAQ),这是一个旨在实现从离线到在线RL的平稳和可靠过渡的框架。其关键思想是利用从离线数据导出的隐式行为模型,在在线微调期间提供行为一致性信号。BAQ包含了一个双目标损失,即(i)当不确定性很高时,使在线策略与离线行为保持一致,以及(ii)随着积累更自信的在线体验,逐渐放松这种约束。这种自适应机制减少了来自分布外估计的错误传播,稳定了早期在线更新,并加速了对新场景的适应。在标准基准测试中,BAQ始终优于之前的离线到在线RL方法,实现了更快的恢复,更好的鲁棒性和更高的整体性能。我们的研究结果表明,隐式行为适应是一个原则性和实用的解决方案,可靠的现实世界的政策部署。
摘要:Offline reinforcement learning (RL) enables training from fixed data without online interaction, but policies learned offline often struggle when deployed in dynamic environments due to distributional shift and unreliable value estimates on unseen state-action pairs. We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth and reliable transition from offline to online RL. The key idea is to leverage an implicit behavioral model derived from offline data to provide a behavior-consistency signal during online fine-tuning. BAQ incorporates a dual-objective loss that (i) aligns the online policy toward the offline behavior when uncertainty is high, and (ii) gradually relaxes this constraint as more confident online experience is accumulated. This adaptive mechanism reduces error propagation from out-of-distribution estimates, stabilizes early online updates, and accelerates adaptation to new scenarios. Across standard benchmarks, BAQ consistently outperforms prior offline-to-online RL approaches, achieving faster recovery, improved robustness, and higher overall performance. Our results demonstrate that implicit behavior adaptation is a principled and practical solution for reliable real-world policy deployment.
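BAQ 的双目标损失可以用几行 numpy 勾勒:TD 误差加上一个由不确定性加权的行为一致性项(此处的 KL 形式与线性权重仅为示意性假设,并非论文的精确公式):

```python
import numpy as np

def baq_loss(q_pred, q_target, pi_online, pi_behavior, uncertainty, lam_max=1.0):
    """示意性的BAQ式双目标损失:不确定性高时加强与离线行为的一致性约束,
    随在线经验置信度提高(uncertainty 趋于 0)逐渐放松该约束。"""
    td_loss = np.mean((q_pred - q_target) ** 2)               # 常规TD/回归项
    kl = np.sum(pi_online * np.log(pi_online / pi_behavior))  # 行为一致性信号
    return td_loss + lam_max * uncertainty * kl               # uncertainty ∈ [0, 1]

pi_b = np.array([0.5, 0.5])          # 由离线数据导出的隐式行为策略
pi_o = np.array([0.9, 0.1])          # 当前在线策略
low_u  = baq_loss(np.array([1.0]), np.array([0.0]), pi_o, pi_b, uncertainty=0.0)
high_u = baq_loss(np.array([1.0]), np.array([0.0]), pi_o, pi_b, uncertainty=1.0)
```

当不确定性为零时损失退化为纯TD项;不确定性越高,偏离离线行为的在线策略受到的惩罚越大,这正是摘要中"高不确定性时向离线行为对齐"的机制。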


【2】Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability
标题:迈向透明立场检测:使用隐式和显式可解释性的Zero-Shot方法
链接:https://arxiv.org/abs/2511.03635

作者:Apoorva Upadhyaya, Wolfgang Nejdl, Marco Fisichella
备注:Accepted in AAAI CONFERENCE ON WEB AND SOCIAL MEDIA (ICWSM 2026)
摘要:Zero-Shot立场检测(ZSSD)识别帖子对未见目标的立场。现有研究使用对比学习、元学习或数据增强,存在泛化性问题或文本与目标之间缺乏连贯性。最近利用大型语言模型(LLM)进行ZSSD的工作,要么专注于改进未见目标的特定知识,要么为立场分析生成解释。然而,这些工作大多受限于对显式推理的过度依赖,提供缺乏细微差别的粗略解释,并且没有显式地对推理过程建模,因此难以解释模型的预测。为了解决这些问题,我们在本研究中开发了一个新的可解释ZSSD框架IRIS。我们基于文本内的序列(隐式理由)隐式地、基于语言学度量(显式理由)显式地,对输入相对目标的立场提供可解释的理解。IRIS将立场检测视为信息检索排序任务,理解不同立场的隐式理由的相关性,以引导模型做出正确预测,而无需理由的真值标注,从而提供固有的可解释性。此外,基于交际特征的显式理由有助于解码立场的情感和认知维度,为作者对给定目标的态度提供可解释的理解。在VAST、EZ-STANCE、P-Stance和RFD的基准数据集上使用50%、30%甚至10%的训练数据进行的大量实验证明了我们模型的泛化能力,这得益于所提出的架构和可解释的设计。
摘要:Zero-Shot Stance Detection (ZSSD) identifies the attitude of the post toward unseen targets. Existing research using contrastive, meta-learning, or data augmentation suffers from generalizability issues or lack of coherence between text and target. Recent works leveraging large language models (LLMs) for ZSSD focus either on improving unseen target-specific knowledge or generating explanations for stance analysis. However, most of these works are limited by their over-reliance on explicit reasoning, provide coarse explanations that lack nuance, and do not explicitly model the reasoning process, making it difficult to interpret the model's predictions. To address these issues, in our study, we develop a novel interpretable ZSSD framework, IRIS. We provide an interpretable understanding of the attitude of the input towards the target implicitly based on sequences within the text (implicit rationales) and explicitly based on linguistic measures (explicit rationales). IRIS considers stance detection as an information retrieval ranking task, understanding the relevance of implicit rationales for different stances to guide the model towards correct predictions without requiring the ground-truth of rationales, thus providing inherent interpretability. In addition, explicit rationales based on communicative features help decode the emotional and cognitive dimensions of stance, offering an interpretable understanding of the author's attitude towards the given target. Extensive experiments on the benchmark datasets of VAST, EZ-STANCE, P-Stance, and RFD using 50%, 30%, and even 10% training data prove the generalizability of our model, benefiting from the proposed architecture and interpretable design.


【3】Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning
标题:使用深度强化学习在微服务中进行多目标自适应速率限制
链接:https://arxiv.org/abs/2511.03279

作者:Ning Lyu, Yuxi Wang, Ziyu Cheng, Qingyuan Zhang, Feng Chen
摘要:随着云计算和微服务架构变得越来越流行,API速率限制已成为确保系统稳定性和服务质量的关键机制。传统的速率限制算法,如令牌桶和滑动窗口,虽然被广泛采用,但难以适应动态流量模式和变化的系统负载。本文提出了一种基于深度强化学习的自适应速率限制策略,动态平衡系统吞吐量和服务延迟。我们设计了一种结合Deep Q-Network(DQN)和Asynchronous Advantage Actor-Critic(A3C)算法的混合架构,将速率限制决策过程建模为Markov决策过程。该系统持续监控微服务状态,并通过环境交互学习最佳速率限制策略。在Kubernetes集群环境中进行的大量实验表明,与传统的固定阈值策略相比,我们的方法在高负载场景下实现了23.7%的吞吐量提高和31.4%的P99延迟减少。处理每日5亿请求的90天生产部署结果验证了所提方法的实际有效性,服务降级事件减少了82%,人工干预减少了68%。
摘要:As cloud computing and microservice architectures become increasingly prevalent, API rate limiting has emerged as a critical mechanism for ensuring system stability and service quality. Traditional rate limiting algorithms, such as token bucket and sliding window, while widely adopted, struggle to adapt to dynamic traffic patterns and varying system loads. This paper proposes an adaptive rate limiting strategy based on deep reinforcement learning that dynamically balances system throughput and service latency. We design a hybrid architecture combining Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms, modeling the rate limiting decision process as a Markov Decision Process. The system continuously monitors microservice states and learns optimal rate limiting policies through environmental interaction. Extensive experiments conducted in a Kubernetes cluster environment demonstrate that our approach achieves 23.7% throughput improvement and 31.4% P99 latency reduction compared to traditional fixed-threshold strategies under high-load scenarios. Results from a 90-day production deployment handling 500 million daily requests validate the practical effectiveness of the proposed method, with 82% reduction in service degradation incidents and 68% decrease in manual interventions.
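作为对照,摘要中提到的传统固定阈值限流器之一是令牌桶,可用几行 Python 实现(参数为任意示例值):

```python
class TokenBucket:
    """经典令牌桶限流器:按固定速率补充令牌,容量上限决定允许的突发量。
    这是论文所对比的固定阈值基线,而非其RL方法。"""

    def __init__(self, rate, capacity):
        self.rate = rate            # 每秒补充的令牌数
        self.capacity = capacity    # 桶容量(最大突发量)
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # 按经过的时间成比例补充令牌,上限为桶容量
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]  # 突发2次后被限流,1秒后恢复
```

速率与容量一旦设定就是静态的,这正是论文指出的局限:固定阈值无法随流量模式与系统负载的变化而调整,而RL策略可以把这些量作为状态的一部分在线学习。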


【4】Climate Adaptation with Reinforcement Learning: Economic vs. Quality of Life Adaptation Pathways
标题:通过强化学习适应气候:经济与生活质量适应途径
链接:https://arxiv.org/abs/2511.03243

作者:Miguel Costa, Arthur Vandervoort, Martin Drews, Karyn Morrissey, Francisco C. Pereira
备注:Accepted for presentation at AI for Climate and Conservation Workshop at EurIPS 2025
摘要 :气候变化将导致洪水事件的频率和严重程度增加,这就需要制定协调一致的适应政策。然而,设计有效的适应政策取决于管理长期气候影响的不确定性。与此同时,这些政策可能具有重要的规范性选择,但并不总是明确的。我们建议强化学习(RL)可以成为一种有用的工具,既可以在不确定条件下识别适应途径,又可以对不同的适应优先级(例如经济与福利)进行明确的建模(和随后的比较)。我们使用综合评估模型(IAM)将降雨和洪水模型联系在一起,并计算洪水对生活质量(QoL),交通和基础设施损坏的影响。我们的研究结果表明,模型优先考虑生活质量的经济影响,在更多的适应支出,以及更均匀的分布在研究领域的支出,突出了这种规范性的假设可以改变适应政策的程度。我们的框架是公开的:https://github.com/MLSM-at-DTU/maat_qol_framework。
摘要:Climate change will cause an increase in the frequency and severity of flood events, prompting the need for cohesive adaptation policymaking. Designing effective adaptation policies, however, depends on managing the uncertainty of long-term climate impacts. Meanwhile, such policies can feature important normative choices that are not always made explicit. We propose that Reinforcement Learning (RL) can be a useful tool to both identify adaptation pathways under uncertain conditions while it also allows for the explicit modelling (and consequent comparison) of different adaptation priorities (e.g. economic vs. wellbeing). We use an Integrated Assessment Model (IAM) to link together a rainfall and flood model, and compute the impacts of flooding in terms of quality of life (QoL), transportation, and infrastructure damage. Our results show that models prioritising QoL over economic impacts results in more adaptation spending as well as a more even distribution of spending over the study area, highlighting the extent to which such normative assumptions can alter adaptation policy. Our framework is publicly available: https://github.com/MLSM-at-DTU/maat_qol_framework.


【5】Incorporating Quality of Life in Climate Adaptation Planning via Reinforcement Learning
标题:通过强化学习将生活质量纳入气候适应规划
链接:https://arxiv.org/abs/2511.03238

作者:Miguel Costa, Arthur Vandervoort, Martin Drews, Karyn Morrissey, Francisco C. Pereira
备注:Accepted for presentation at AI in Science (AIS) 2025
摘要:由于气候变化,预计城市洪水的频率和严重程度将增加,造成广泛的影响,包括城市生活质量(QoL)的下降。与此同时,决策者必须制定适应战略,以应对气候变化的不确定性和城市洪水的复杂性和动态性。强化学习(RL)在解决这些复杂、动态和不确定的问题方面具有重要的前景。正因为如此,我们使用RL来确定哪些气候适应途径导致长期较高的生活质量。我们这样做,使用综合评估模型(IAM),它结合了降雨预测模型,洪水模型,交通可达性模型,和生活质量指数。我们的初步结果表明,这种方法可以用来学习最佳的适应措施,它优于其他现实和现实世界的规划策略。我们的框架是公开的:https://github.com/MLSM-at-DTU/maat_qol_framework。
摘要:Urban flooding is expected to increase in frequency and severity as a consequence of climate change, causing wide-ranging impacts that include a decrease in urban Quality of Life (QoL). Meanwhile, policymakers must devise adaptation strategies that can cope with the uncertain nature of climate change and the complex and dynamic nature of urban flooding. Reinforcement Learning (RL) holds significant promise in tackling such complex, dynamic, and uncertain problems. Because of this, we use RL to identify which climate adaptation pathways lead to a higher QoL in the long term. We do this using an Integrated Assessment Model (IAM) which combines a rainfall projection model, a flood model, a transport accessibility model, and a quality of life index. Our preliminary results suggest that this approach can be used to learn optimal adaptation measures and it outperforms other realistic and real-world planning strategies. Our framework is publicly available: https://github.com/MLSM-at-DTU/maat_qol_framework.


【6】Test Time Adaptation Using Adaptive Quantile Recalibration
标题:使用自适应分位数重新校准的测试时间自适应
链接:https://arxiv.org/abs/2511.03148

作者:Paria Mehrbod, Pedro Vianna, Geraldin Nanfack, Guy Wolf, Eugene Belilovsky
摘要:领域自适应是增强深度学习模型在现实场景中泛化能力的关键策略,在现实场景中,测试分布通常与训练域显著不同。然而,传统的方法通常依赖于目标域的先验知识或需要重新训练模型,限制了它们在动态或资源受限环境中的实用性。最近基于批归一化统计量更新的测试时自适应方法允许无监督自适应,但它们往往无法捕捉复杂的激活分布,并且局限于特定的归一化层。我们提出了自适应分位数重新校准(AQR),一种通过逐通道对齐分位数来修改预激活分布的测试时自适应技术。AQR捕获激活分布的完整形状,并可泛化到采用BatchNorm、GroupNorm或LayerNorm的各种架构。为了解决在不同批量大小下估计分布尾部的挑战,AQR采用了一种鲁棒的尾部校准策略,可提高稳定性和精度。我们的方法利用在训练时计算的源域统计量,实现无监督自适应而无需重新训练模型。在CIFAR-10-C、CIFAR-100-C和ImageNet-C上跨多个架构的实验表明,AQR在不同的设置中实现了鲁棒的自适应,优于现有的测试时自适应基线。这些结果突出了AQR在具有动态和不可预测数据分布的真实场景中的部署潜力。
摘要:Domain adaptation is a key strategy for enhancing the generalizability of deep learning models in real-world scenarios, where test distributions often diverge significantly from the training domain. However, conventional approaches typically rely on prior knowledge of the target domain or require model retraining, limiting their practicality in dynamic or resource-constrained environments. Recent test-time adaptation methods based on batch normalization statistic updates allow for unsupervised adaptation, but they often fail to capture complex activation distributions and are constrained to specific normalization layers. We propose Adaptive Quantile Recalibration (AQR), a test-time adaptation technique that modifies pre-activation distributions by aligning quantiles on a channel-wise basis. AQR captures the full shape of activation distributions and generalizes across architectures employing BatchNorm, GroupNorm, or LayerNorm. To address the challenge of estimating distribution tails under varying batch sizes, AQR incorporates a robust tail calibration strategy that improves stability and precision. Our method leverages source-domain statistics computed at training time, enabling unsupervised adaptation without retraining models. Experiments on CIFAR-10-C, CIFAR-100-C, and ImageNet-C across multiple architectures demonstrate that AQR achieves robust adaptation across diverse settings, outperforming existing test-time adaptation baselines. These results highlight AQR's potential for deployment in real-world scenarios with dynamic and unpredictable data distributions.
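摘要的核心操作(逐通道分位数对齐)可以用 numpy 简要示意:把每个通道的测试激活按其在通道内的分位水平,映射到训练时记录的源域分位数上(此处未包含 AQR 的鲁棒尾部校准,函数与数据均为本文构造):

```python
import numpy as np

def quantile_recalibrate(test_acts, source_quantiles, qs):
    """逐通道分位数对齐的最简草图(假设每通道样本数 >= 2)。
    test_acts: (n, c) 测试时的预激活
    source_quantiles: (len(qs), c) 训练时记录的各通道分位数
    qs: [0, 1] 内的分位水平"""
    out = np.empty_like(test_acts, dtype=float)
    for ch in range(test_acts.shape[1]):
        ranks = np.argsort(np.argsort(test_acts[:, ch]))   # 通道内名次
        levels = ranks / (len(ranks) - 1)                  # 名次 -> 分位水平
        out[:, ch] = np.interp(levels, qs, source_quantiles[:, ch])
    return out

qs = np.array([0.0, 0.5, 1.0])
src_q = np.array([[0.0], [1.0], [2.0]])          # 单通道的源域分位数
shifted = np.array([[30.0], [10.0], [20.0]])     # 测试时发生偏移的激活
out = quantile_recalibrate(shifted, src_q, qs)
```

无论测试激活整体偏移或缩放到何处,对齐后的值都落回源域的分位数剖面,这就是比仅匹配均值/方差的批归一化统计量更新更完整地"捕获分布形状"的含义。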


【7】Adaptive Detection of Software Aging under Workload Shift
标题:工作负载切换下软件老化的自适应检测
链接:https://arxiv.org/abs/2511.03103

作者:Rafael José Moura, Maria Gizele Nascimento, Fumio Machida, Ermeson Andrade
备注:SIMP\'OSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD)
摘要:软件老化是一种影响长期运行系统的现象,会导致性能逐渐下降并增加故障风险。为了缓解这个问题,这项工作提出了一种基于机器学习的自适应方法,用于在动态工作负载条件下进行软件老化检测。我们评估并比较了静态模型与引入自适应检测器的自适应模型,具体采用漂移检测方法(DDM)和自适应窗口(ADWIN);二者最初为概念漂移场景而开发,在本工作中被用于处理工作负载切换。模拟突变、渐变和重复出现的工作负载切换的实验表明,静态模型在应用于未见过的工作负载模式时性能显著下降,而采用ADWIN的自适应模型保持了较高的准确性,在所有分析场景中F1分数均超过0.93。
摘要:Software aging is a phenomenon that affects long-running systems, leading to progressive performance degradation and increasing the risk of failures. To mitigate this problem, this work proposes an adaptive approach based on machine learning for software aging detection in environments subject to dynamic workload conditions. We evaluate and compare a static model with adaptive models that incorporate adaptive detectors, specifically the Drift Detection Method (DDM) and Adaptive Windowing (ADWIN), originally developed for concept drift scenarios and applied in this work to handle workload shifts. Experiments with simulated sudden, gradual, and recurring workload transitions show that static models suffer a notable performance drop when applied to unseen workload profiles, whereas the adaptive model with ADWIN maintains high accuracy, achieving an F1-Score above 0.93 in all analyzed scenarios.
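ADWIN 式漂移检测的基本思想是把滑动窗口切成两段,用 Hoeffding 界检验两段均值是否显著不同。下面是一个极简示意(固定一分为二、窗口长度与 delta 均为假设值,非论文或 ADWIN 的原始实现,后者会搜索所有切分点并自适应收缩窗口):

```python
import math
from collections import deque

class SimpleDriftDetector:
    """ADWIN 思想的极简示意: 窗口一分为二, 用 Hoeffding 界检验均值突变。"""

    def __init__(self, max_window=200, delta=0.002):
        self.window = deque(maxlen=max_window)
        self.delta = delta

    def update(self, x):
        self.window.append(x)
        n = len(self.window)
        if n < 20:
            return False
        half = n // 2
        data = list(self.window)
        m0 = sum(data[:half]) / half
        m1 = sum(data[half:]) / (n - half)
        # Hoeffding 式阈值 (假设取值已归一化到 [0, 1])
        eps = math.sqrt(math.log(2 / self.delta) / (2 * half))
        if abs(m0 - m1) > eps:
            self.window.clear()  # 检测到漂移, 丢弃旧窗口
            return True
        return False
```

在论文的设定中,这类检测器监控的是模型误差或资源消耗指标流,一旦报警就触发模型更新。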


【8】Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis
标题:基于方面的情感分析的数据高效自适应和新型评估方法
链接:https://arxiv.org/abs/2511.03034

作者:Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taškova
摘要:基于方面的情感分析(ABSA)是一种细粒度的观点挖掘方法,用于识别和分类与句子中特定实体(方面)或其类别相关的观点。尽管ABSA发展迅速、潜力巨大,其研究和资源仍然集中在商业领域,使得教育和医疗等高需求、低资源领域的分析需求得不到满足。领域自适应的挑战以及大多数现有方法对资源密集型训练内知识注入的依赖,进一步阻碍了这些领域的进展。此外,基于精确匹配的传统评估方法对ABSA任务过于严格,会惩罚任何边界变化,从而可能错误反映生成模型的性能。这项工作通过三个贡献来解决这些差距:1)我们提出了一种新的评估方法:灵活文本相似性匹配与最优二分配对(FTS-OBP),它容许现实的抽取边界变化,同时保持与传统度量的强相关性,并提供细粒度的诊断。2)我们开展了首个针对小型仅解码器生成式语言模型(SLM;<7B参数)的ABSA研究,通过教育评论ABSA的案例研究考察资源下限。我们系统地探索了无数据(上下文学习和权重合并)和轻数据微调方法,并提出了一种多任务微调策略,可显著增强SLM性能,使1.5-3.8B模型能够超越专有大模型,并在单个GPU上仅用200-1,000个示例就接近基准结果。3)我们发布了首套公开的教育评论ABSA资源,以支持未来在低资源领域的研究。
摘要:Aspect-based Sentiment Analysis (ABSA) is a fine-grained opinion mining approach that identifies and classifies opinions associated with specific entities (aspects) or their categories within a sentence. Despite its rapid growth and broad potential, ABSA research and resources remain concentrated in commercial domains, leaving analytical needs unmet in high-demand yet low-resource areas such as education and healthcare. Domain adaptation challenges and most existing methods' reliance on resource-intensive in-training knowledge injection further hinder progress in these areas. Moreover, traditional evaluation methods based on exact matches are overly rigid for ABSA tasks, penalising any boundary variations which may misrepresent the performance of generative models. This work addresses these gaps through three contributions: 1) We propose a novel evaluation method, Flexible Text Similarity Matching and Optimal Bipartite Pairing (FTS-OBP), which accommodates realistic extraction boundary variations while maintaining strong correlation with traditional metrics and offering fine-grained diagnostics. 2) We present the first ABSA study of small decoder-only generative language models (SLMs; <7B parameters), examining resource lower bounds via a case study in education review ABSA. We systematically explore data-free (in-context learning and weight merging) and data-light fine-tuning methods, and propose a multitask fine-tuning strategy that significantly enhances SLM performance, enabling 1.5-3.8 B models to surpass proprietary large models and approach benchmark results with only 200-1,000 examples on a single GPU. 3) We release the first public set of education review ABSA resources to support future research in low-resource domains.


【9】Adaptive-Sensorless Monitoring of Shipping Containers
标题:集装箱的自适应无传感器监控
链接:https://arxiv.org/abs/2511.03022

作者:Lingqing Shen, Chi Heem Wong, Misaki Mito, Arnab Chakrabarti
备注:Published in 2025 IEEE Big Data
摘要:监测集装箱内部温度和湿度对于防止货物运输过程中的质量下降至关重要。无传感器监控,即使用外生因素预测集装箱内部状况的机器学习模型,显示出作为传感器监控替代方案的前景。然而,它没有纳入遥测信息,也无法纠正系统性误差,导致预测与实时数据差异很大并使用户感到困惑。在本文中,我们介绍了残差校正方法,这是一个在观测到实时遥测数据后校正无传感器模型系统性偏差的通用框架。我们称这类模型为"自适应无传感器"监控。我们在348万个数据点(学术研究中使用过的最大集装箱传感器读数数据集)上训练和评估自适应无传感器模型,并表明它们相对基线无传感器模型产生了一致的改进。在模拟数据的保留集上评估时,温度的平均绝对误差(MAE)为2.24 $\sim$ 2.31$^\circ$C(无传感器为2.43$^\circ$C),相对湿度为5.72 $\sim$ 7.09%(无传感器为7.99%);温度的平均均方根误差(RMSE)为3.19 $\sim$ 3.26$^\circ$C(无传感器为3.38$^\circ$C),相对湿度为7.70 $\sim$ 9.12%(无传感器为10.0%)。自适应无传感器模型可以实现更准确的货物监控、更早的风险检测,并减少对全球航运全面连接的依赖。
摘要:Monitoring the internal temperature and humidity of shipping containers is essential to preventing quality degradation during cargo transportation. Sensorless monitoring -- machine learning models that predict the internal conditions of the containers using exogenous factors -- shows promise as an alternative to monitoring using sensors. However, it does not incorporate telemetry information and correct for systematic errors, causing the predictions to differ significantly from the live data and confusing the users. In this paper, we introduce the residual correction method, a general framework for correcting for systematic biases in sensorless models after observing live telemetry data. We call this class of models ``adaptive-sensorless'' monitoring. We train and evaluate adaptive-sensorless models on the 3.48 million data points -- the largest dataset of container sensor readings ever used in academic research -- and show that they produce consistent improvements over the baseline sensorless models. When evaluated on the holdout set of the simulated data, they achieve average mean absolute errors (MAEs) of 2.24 $\sim$ 2.31$^\circ$C (vs 2.43$^\circ$C by sensorless) for temperature and 5.72 $\sim$ 7.09% for relative humidity (vs 7.99% by sensorless) and average root mean-squared errors (RMSEs) of 3.19 $\sim$ 3.26$^\circ$C for temperature (vs 3.38$^\circ$C by sensorless) and 7.70 $\sim$ 9.12% for relative humidity (vs 10.0% by sensorless). Adaptive-sensorless models enable more accurate cargo monitoring, early risk detection, and less dependence on full connectivity in global shipping.
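"残差校正"的最小形式是:在收到实时遥测时估计系统性偏差,并把它加回到无传感器预测上。下面用指数加权平均跟踪偏差给出极简示意(平滑系数等均为假设值,论文的通用框架比这更一般):

```python
class ResidualCorrector:
    """残差校正的极简示意: 用指数加权平均跟踪系统性偏差。"""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # 偏差估计的平滑系数 (假设值)
        self.bias = 0.0

    def correct(self, prediction):
        """对无传感器模型的预测加上当前偏差估计。"""
        return prediction + self.bias

    def observe(self, prediction, telemetry):
        """收到实时遥测时, 用残差更新偏差估计。"""
        residual = telemetry - prediction
        self.bias = (1 - self.alpha) * self.bias + self.alpha * residual
```

当遥测可用时偏差估计不断收紧,遥测断连时模型退化为带最近一次偏差修正的无传感器预测,这正是"自适应无传感器"的含义。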


【10】Test-time Adaptation of Tiny Recursive Models
标题:微型递归模型的测试时自适应
链接:https://arxiv.org/abs/2511.02886

作者:Ronan Killian McGovern
摘要:在2025年ARC Prize竞赛结束之前,领先的开源方法(称为TRM,即Tiny Recursive Models)是在ARC任务的增强变体上训练一个7M参数的递归神经网络。这种方法在公开的ARC AGI II评估集上获得了大约7.8%的分数,但所需的计算量远远超过比赛期间允许的水平。本文表明,从一个已经在公开ARC任务上预训练过的微型递归模型出发,可以在允许的计算限制内高效地对竞赛任务进行微调。具体来说,一个模型在4xH100 SXM GPU上对1,280个公开任务进行了48小时、700k+优化器步骤的预训练,在公开评估集上获得约10%的分数。该模型随后在比赛期间仅用12,500个梯度步骤进行后训练,在半私有评估任务上达到6.67%的分数。值得注意的是,这一后训练性能是通过对微型模型进行全量微调实现的,而不是仅做LoRA微调或仅微调任务嵌入。
摘要:Prior to the close of the 2025 ARC Prize competition, the leading open source approach - known as TRM, or Tiny Recursive Models - involved training a 7M parameter recursive neural network on augmented variants of ARC tasks. That approach scored approximately 7.8% on the public ARC AGI II evaluation set, but required a level of compute far in excess of what is allowed during the competition. This paper shows that, by starting from a tiny recursive model that has been pre-trained on public ARC tasks, one can efficiently fine-tune on competition tasks within the allowed compute limits. Specifically, a model was pre-trained on 1,280 public tasks for 700k+ optimizer steps over 48 hours on 4xH100 SXM GPUs to obtain a ~10% score on the public evaluation set. That model was then post-trained in just 12,500 gradient steps during the competition to reach a score of 6.67% on semi-private evaluation tasks. Notably, such post-training performance is achieved by full-fine tuning of the tiny model, not LoRA fine-tuning or fine-tuning of task embeddings alone.


【11】Provable Accelerated Bayesian Optimization with Knowledge Transfer
标题:具有知识转移的可证明加速Bayesian优化
链接:https://arxiv.org/abs/2511.03125

作者:Haitao Lin, Boxin Zhao, Mladen Kolar, Chong Liu
摘要:我们研究如何利用从相关源任务迁移来的历史知识,加速目标任务上的贝叶斯优化(BO)。现有的带知识迁移的BO工作要么没有理论保证,要么只能达到与非迁移设置下的BO相同的遗憾界$\tilde{\mathcal{O}}(\sqrt{T \gamma_f})$,其中$T$是目标函数的评估次数,$\gamma_f$表示其信息增益。在本文中,我们提出了DeltaBO算法,其中一种新的不确定性量化方法建立在源函数和目标函数之间的差函数$\delta$之上,并允许二者属于不同的再生核希尔伯特空间(RKHS)。在温和的假设下,我们证明了DeltaBO的遗憾为$\tilde{\mathcal{O}}(\sqrt{T(T/N + \gamma_\delta)})$阶,其中$N$表示来自源任务的评估次数,且通常$N \gg T$。在许多应用中,源任务和目标任务是相似的,这意味着$\gamma_\delta$可以比$\gamma_f$小得多。对现实世界超参数调优任务和合成函数的实证研究表明,DeltaBO优于其他基线方法,并支持我们的理论主张。
摘要:We study how Bayesian optimization (BO) can be accelerated on a target task with historical knowledge transferred from related source tasks. Existing works on BO with knowledge transfer either do not have theoretical guarantees or achieve the same regret as BO in the non-transfer setting, $\tilde{\mathcal{O}}(\sqrt{T \gamma_f})$, where $T$ is the number of evaluations of the target function and $\gamma_f$ denotes its information gain. In this paper, we propose the DeltaBO algorithm, in which a novel uncertainty-quantification approach is built on the difference function $\delta$ between the source and target functions, which are allowed to belong to different reproducing kernel Hilbert spaces (RKHSs). Under mild assumptions, we prove that the regret of DeltaBO is of order $\tilde{\mathcal{O}}(\sqrt{T (T/N + \gamma_\delta)})$, where $N$ denotes the number of evaluations from source tasks and typically $N \gg T$. In many applications, source and target tasks are similar, which implies that $\gamma_\delta$ can be much smaller than $\gamma_f$. Empirical studies on both real-world hyperparameter tuning tasks and synthetic functions show that DeltaBO outperforms other baseline methods and support our theoretical claims.
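DeltaBO 的关键结构是"大量源数据拟合源函数 + 少量目标评估拟合差函数 $\delta$,目标预测 = 源预测 + $\delta$ 预测"。下面用核岭回归代替论文中的 GP/RKHS 置信界,仅示意这一两段式结构(核、长度尺度、正则系数均为假设):

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """一维 RBF 核矩阵 (长度尺度为假设值)。"""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def delta_model(x_src, y_src, x_tgt, y_tgt, x_query, lam=1e-6):
    """DeltaBO 思想示意: 源拟合 + 差函数拟合。"""
    # 用大量源数据做核岭回归拟合源函数
    K = rbf(x_src, x_src) + lam * np.eye(len(x_src))
    alpha_s = np.linalg.solve(K, y_src)
    f_src = lambda x: rbf(x, x_src) @ alpha_s
    # 差函数 delta 只需少量目标评估即可拟合
    d = y_tgt - f_src(x_tgt)
    Kd = rbf(x_tgt, x_tgt) + lam * np.eye(len(x_tgt))
    alpha_d = np.linalg.solve(Kd, d)
    return f_src(x_query) + rbf(x_query, x_tgt) @ alpha_d
```

当源任务与目标任务相似时,差函数比目标函数本身平滑得多,因此只需很少的目标点;这对应理论中 $\gamma_\delta \ll \gamma_f$ 的情形。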


【12】EEGReXferNet: A Lightweight Gen-AI Framework for EEG Subspace Reconstruction via Cross-Subject Transfer Learning and Channel-Aware Embedding
标题:EEGReXferNet:通过跨被试迁移学习和通道感知嵌入进行脑电图子空间重建的轻量级生成式AI框架
链接:https://arxiv.org/abs/2511.02848

作者:Shantanu Sarkar, Piotr Nabrzyski, Saurabh Prasad, Jose Luis Contreras-Vidal
备注:Accepted for presentation at the NeurIPS 2025 Workshop on Foundation Models for the Brain and Body
摘要:脑电图(EEG)是一种广泛使用的非侵入性脑活动监测技术,但各种伪影导致的低信噪比(SNR)往往会影响其实用性。传统的伪影去除方法需要人工干预,或者有在滤波/重建过程中抑制关键神经特征的风险。生成模型的最新进展,包括变分自编码器(VAE)和生成对抗网络(GAN),已经显示出EEG重建的前景;然而,这些方法通常缺乏一体化的时间-频谱-空间敏感性,并且计算密集,限制了它们对脑机接口(BCI)等实时应用的适用性。为了克服这些挑战,我们引入了EEGReXferNet,一个通过跨被试迁移学习进行EEG子空间重建的轻量级生成式AI框架(使用Keras TensorFlow v2.15.1开发)。EEGReXferNet采用模块化架构,利用相邻通道间的容积传导、特定频带的卷积编码以及基于滑动窗口的动态潜在特征提取。通过整合基于参考的缩放,该框架保证了相邻窗口间的连续性,并能有效地跨被试泛化。该设计提高了空间-时间-频谱分辨率(平均PSD相关性>= 0.95;平均频谱图RV系数>= 0.85),将总权重减少约45%以缓解过拟合,并保持计算效率,可用于神经生理学和BCI应用中鲁棒的实时EEG预处理。
摘要:Electroencephalography (EEG) is a widely used non-invasive technique for monitoring brain activity, but low signal-to-noise ratios (SNR) due to various artifacts often compromise its utility. Conventional artifact removal methods require manual intervention or risk suppressing critical neural features during filtering/reconstruction. Recent advances in generative models, including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have shown promise for EEG reconstruction; however, these approaches often lack integrated temporal-spectral-spatial sensitivity and are computationally intensive, limiting their suitability for real-time applications like brain-computer interfaces (BCIs). To overcome these challenges, we introduce EEGReXferNet, a lightweight Gen-AI framework for EEG subspace reconstruction via cross-subject transfer learning - developed using Keras TensorFlow (v2.15.1). EEGReXferNet employs a modular architecture that leverages volume conduction across neighboring channels, band-specific convolution encoding, and dynamic latent feature extraction through sliding windows. By integrating reference-based scaling, the framework ensures continuity across successive windows and generalizes effectively across subjects. This design improves spatial-temporal-spectral resolution (mean PSD correlation >= 0.95; mean spectrogram RV-Coefficient >= 0.85), reduces total weights by ~45% to mitigate overfitting, and maintains computational efficiency for robust, real-time EEG preprocessing in neurophysiological and BCI applications.


强化学习(5篇)

【1】Shrinking the Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards
标题:缩小方差:具有可验证奖励的强化学习的收缩基线
链接:https://arxiv.org/abs/2511.03710

作者:Guanning Zeng, Zhaoyi Zhou, Daman Arora, Andrea Zanette
备注:Preprint. Under Review
摘要:带有可验证奖励的强化学习(RLVR)已经成为使用GRPO等策略梯度方法对大型推理模型(LRM)进行后训练的强大范式。为了稳定训练,这些方法通常通过减去每个提示的经验平均奖励来对轨迹奖励做中心化。从统计学上讲,这种中心化起到控制变量(或基线)的作用,降低了策略梯度估计器的方差。通常,平均奖励是用批次中每个提示各自的经验均值来估计的。受斯坦悖论的启发,我们建议使用收缩估计器,结合每提示均值和跨提示均值,以提高每提示均值的整体估计精度,特别是在RLVR中典型的低生成数场景下。理论上,我们构造了一个基于收缩的基线,可证明对各类算法产生更低方差的策略梯度估计器。该基线可直接替换现有的每提示均值基线,不需要额外的超参数或计算。实验上,收缩基线始终优于标准经验均值基线,带来更低方差的梯度更新并提高训练稳定性。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for post-training large reasoning models (LRMs) using policy-gradient methods such as GRPO. To stabilize training, these methods typically center trajectory rewards by subtracting the empirical mean for each prompt. Statistically, this centering acts as a control variate (or baseline), reducing the variance of the policy-gradient estimator.   Typically, the mean reward is estimated using per-prompt empirical averages for each prompt in a batch. Drawing inspiration from Stein's paradox, we propose using shrinkage estimators that combine per-prompt and across-prompt means to improve the overall per-prompt mean estimation accuracy -- particularly in the low-generation regime typical of RLVR. Theoretically, we construct a shrinkage-based baseline that provably yields lower-variance policy-gradient estimators across algorithms. Our proposed baseline serves as a drop-in replacement for existing per-prompt mean baselines, requiring no additional hyper-parameters or computation. Empirically, shrinkage baselines consistently outperform standard empirical-mean baselines, leading to lower-variance gradient updates and improved training stability.
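"结合每提示与跨提示均值"的收缩基线可以示意如下。收缩系数参照 James-Stein 思路由组内噪声与组间差异的比值决定;具体公式为示意性假设,非论文原式:

```python
import numpy as np

def shrinkage_baselines(rewards):
    """收缩基线示意: 将每个 prompt 的经验均值向全局均值收缩。

    rewards: (P, G) 数组, P 个 prompt, 每个 prompt G 次生成的奖励。
    """
    P, G = rewards.shape
    per_prompt = rewards.mean(axis=1)           # 每个 prompt 的经验均值
    grand = per_prompt.mean()                   # 跨 prompt 的总体均值
    # 每提示均值估计的噪声方差 (组内方差 / G)
    within_var = rewards.var(axis=1, ddof=1).mean() / G
    between_var = per_prompt.var(ddof=1)        # prompt 间差异
    # 收缩权重: 生成数 G 越小、噪声越大, 越依赖全局均值
    shrink = within_var / (within_var + max(between_var, 1e-12))
    return (1 - shrink) * per_prompt + shrink * grand
```

在低生成数场景(例如每个 prompt 只有 4 次生成)下,收缩后的均值对真实每提示均值的均方误差通常低于纯经验均值,这正是该基线降低优势估计方差的来源。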


【2】Towards Formalizing Reinforcement Learning Theory
标题:迈向强化学习理论的形式化
链接:https://arxiv.org/abs/2511.03618

作者:Shangtong Zhang
摘要:在本文中,我们基于Mathlib库,使用Lean 4定理证明器形式化了马尔可夫样本下$Q$-学习和线性时序差分(TD)学习的几乎必然收敛性。$Q$-学习和线性TD是最早和最有影响力的强化学习(RL)算法之一。它们的收敛性研究不仅是RL领域发展初期的一个重要研究课题,如今也受到越来越多的关注。本文在基于Robbins-Siegmund定理的统一框架下形式化地验证了它们的几乎必然收敛性。本工作开发的框架可以很容易地扩展到收敛速度和其他收敛模式。因此,这项工作朝着完全形式化收敛性RL结果迈出了重要一步。代码可在https://github.com/ShangtongZhang/rl-theory-in-lean获取。
摘要:In this paper, we formalize the almost sure convergence of $Q$-learning and linear temporal difference (TD) learning with Markovian samples using the Lean 4 theorem prover based on the Mathlib library. $Q$-learning and linear TD are among the earliest and most influential reinforcement learning (RL) algorithms. The investigation of their convergence properties is not only a major research topic during the early development of the RL field but also receives increasing attention nowadays. This paper formally verifies their almost sure convergence in a unified framework based on the Robbins-Siegmund theorem. The framework developed in this work can be easily extended to convergence rates and other modes of convergence. This work thus makes an important step towards fully formalizing convergent RL results. The code is available at https://github.com/ShangtongZhang/rl-theory-in-lean.
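论文在 Lean 4 中形式化的对象是经典的表格型 $Q$-学习迭代 $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$。为便于对照,下面用 Python 给出这一被证明几乎必然收敛的更新规则本身(示意;形式化证明的内容远不止于此):

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha, gamma):
    """表格型 Q-learning 的单步更新 (论文形式化其收敛性的迭代)。"""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

在一个两状态的玩具 MDP(状态 1 为零奖励吸收态,状态 0 执行动作得到奖励 1 并转移到状态 1,$\gamma=0.9$)中,$Q(0,\cdot)$ 收敛到不动点 $1$。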


【3】Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning
标题:通过深度隐式模仿强化学习超越专家表现
链接:https://arxiv.org/abs/2511.03616

作者:Iason Chrysomallis, Georgios Chalkiadakis
摘要:模仿学习传统上需要来自最优或接近最优专家的完整状态-动作演示。这些要求严重限制了实用性,因为许多真实世界场景只提供状态观测而没有相应的动作,而且专家的表现往往是次优的。在本文中,我们引入了一个深度隐式模仿强化学习框架,通过将深度强化学习与仅含观测的数据集上的隐式模仿学习相结合来解决这两个限制。我们的主要算法,深度隐式模仿Q网络(DIIQN),采用通过在线探索重建专家动作的动作推断机制,并集成了一个自适应平衡专家引导学习与自主学习的动态置信机制。这使得智能体能够利用专家指导来加速训练,同时保留超越次优专家表现的能力。我们进一步用异构动作DIIQN(HA-DIIQN)算法扩展了该框架,以解决专家和智能体拥有不同动作集的情形,这是隐式模仿学习文献中此前尚未解决的挑战。HA-DIIQN引入了不可行性检测机制和一个桥接程序,当无法直接复制专家动作时,该程序识别连接智能体能力与专家指导的替代途径。实验结果表明,DIIQN的回合回报比标准DQN最高提升130%,同时始终优于无法超越专家表现的现有隐式模仿方法。在异构动作设置中,HA-DIIQN的学习速度比基线最多快64%,并能利用传统方法无法使用的专家数据集。广泛的参数敏感性分析表明,该框架在不同数据集规模和超参数配置下具有鲁棒性。
摘要:Imitation learning traditionally requires complete state-action demonstrations from optimal or near-optimal experts. These requirements severely limit practical applicability, as many real-world scenarios provide only state observations without corresponding actions and expert performance is often suboptimal. In this paper we introduce a deep implicit imitation reinforcement learning framework that addresses both limitations by combining deep reinforcement learning with implicit imitation learning from observation-only datasets. Our main algorithm, Deep Implicit Imitation Q-Network (DIIQN), employs an action inference mechanism that reconstructs expert actions through online exploration and integrates a dynamic confidence mechanism that adaptively balances expert-guided and self-directed learning. This enables the agent to leverage expert guidance for accelerated training while maintaining capacity to surpass suboptimal expert performance. We further extend our framework with a Heterogeneous Actions DIIQN (HA-DIIQN) algorithm to tackle scenarios where expert and agent possess different action sets, a challenge previously unaddressed in the implicit imitation learning literature. HA-DIIQN introduces an infeasibility detection mechanism and a bridging procedure identifying alternative pathways connecting agent capabilities to expert guidance when direct action replication is impossible. Our experimental results demonstrate that DIIQN achieves up to 130% higher episodic returns compared to standard DQN, while consistently outperforming existing implicit imitation methods that cannot exceed expert performance. In heterogeneous action settings, HA-DIIQN learns up to 64% faster than baselines, leveraging expert datasets unusable by conventional approaches. Extensive parameter sensitivity analysis reveals the framework's robustness across varying dataset sizes and hyperparameter configurations.


【4】Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments
标题:无需critic的学习?重新审视经典强化学习环境中的GRPO
链接:https://arxiv.org/abs/2511.03527

作者:Bryan L. M. de Oliveira, Felipe V. Frujeri, Marcos P. C. M. Queiroz, Luana G. B. Martins, Telma W. de L. Soares, Luckeciano C. Melo
摘要:组相对策略优化(GRPO)通过去掉学习得到的critic、转而用轨迹的组内相对比较来估计优势,已经成为近端策略优化(PPO)的可扩展替代方案。这种简化提出了关于策略梯度方法中学习基线必要性的基本问题。我们对经典单任务强化学习环境中的GRPO进行了首个系统研究,涵盖离散和连续控制任务。通过对基线、折扣和组采样进行受控消融,我们揭示了三个关键发现:(1)学习得到的critic对长时程任务仍然至关重要:除了在CartPole这类回合回报有效的短时程环境中,所有无critic基线的表现都低于PPO;(2)GRPO受益于高折扣因子(gamma = 0.99),但HalfCheetah除外,该环境缺乏提前终止,因而更适合适中的折扣(gamma = 0.9);(3)较小的组规模优于较大的组规模,这表明将不相关回合混在一起的基于批次的分组策略存在局限性。这些结果既揭示了无critic方法在经典控制中的局限性,也指出了它们仍可作为学习值函数可行替代方案的具体条件。
摘要:Group Relative Policy Optimization (GRPO) has emerged as a scalable alternative to Proximal Policy Optimization (PPO) by eliminating the learned critic and instead estimating advantages through group-relative comparisons of trajectories. This simplification raises fundamental questions about the necessity of learned baselines in policy-gradient methods. We present the first systematic study of GRPO in classical single-task reinforcement learning environments, spanning discrete and continuous control tasks. Through controlled ablations isolating baselines, discounting, and group sampling, we reveal three key findings: (1) learned critics remain essential for long-horizon tasks: all critic-free baselines underperform PPO except in short-horizon environments like CartPole where episodic returns can be effective; (2) GRPO benefits from high discount factors (gamma = 0.99) except in HalfCheetah, where lack of early termination favors moderate discounting (gamma = 0.9); (3) smaller group sizes outperform larger ones, suggesting limitations in batch-based grouping strategies that mix unrelated episodes. These results reveal both the limitations of critic-free methods in classical control and the specific conditions where they remain viable alternatives to learned value functions.
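GRPO 无需 critic 的优势估计就是"组内奖励减去组均值",常见做法还会除以组内标准差做归一化。极简示意如下(是否做标准差归一化属于常见实现选择,非该论文的特定设定):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """GRPO 式组相对优势估计示意。

    rewards: (B, G) 数组, B 个组, 每组 G 条轨迹的回报。
    返回同形状的优势: 组内减均值、除以标准差。
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)
```

论文的发现(3)正是针对这里的分组方式:当一个组混入彼此不相关的回合时,组均值不再是好的基线,因此较小、较同质的组表现更好。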


【5】Reinforcement Learning Using known Invariances
标题:使用已知不变性的强化学习
链接:https://arxiv.org/abs/2511.03473

作者:Alexandru Cioba, Aya Kayal, Laura Toni, Sattar Vakili, Alberto Bernacchia
摘要:在许多现实世界的强化学习(RL)问题中,环境表现出固有的对称性,可以利用这些对称性来提高学习效率。本文开发了一个将已知群对称性纳入基于核的RL的理论和算法框架。我们提出了一种乐观最小二乘值迭代(LSVI)的对称感知变体,它利用不变核来编码奖励和转移动态中的不变性。我们的分析为不变RKHS的最大信息增益和覆盖数建立了新的界,明确量化了对称性带来的样本效率增益。在定制的Frozen Lake环境和一个2D布局设计问题上的实证结果证实了理论上的改进,表明对称感知的RL比标准核方法取得了显著更好的性能。这些发现突出了结构先验在设计更具样本效率的强化学习算法中的价值。
摘要:In many real-world reinforcement learning (RL) problems, the environment exhibits inherent symmetries that can be exploited to improve learning efficiency. This paper develops a theoretical and algorithmic framework for incorporating known group symmetries into kernel-based RL. We propose a symmetry-aware variant of optimistic least-squares value iteration (LSVI), which leverages invariant kernels to encode invariance in both rewards and transition dynamics. Our analysis establishes new bounds on the maximum information gain and covering numbers for invariant RKHSs, explicitly quantifying the sample efficiency gains from symmetry. Empirical results on a customized Frozen Lake environment and a 2D placement design problem confirm the theoretical improvements, demonstrating that symmetry-aware RL achieves significantly better performance than their standard kernel counterparts. These findings highlight the value of structural priors in designing more sample-efficient reinforcement learning algorithms.
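不变核的一种标准构造是对已知群作用取平均:$k_{\mathrm{inv}}(x, y) = \frac{1}{|G|}\sum_{g \in G} k(g \cdot x,\, y)$;对各向同性的平稳核(如 RBF),对单侧作用求平均即可同时获得两个自变量上的不变性。下面以 RBF 核与二维取反群 $\{I, -I\}$ 为例给出示意(群与长度尺度均为假设,非论文采用的具体环境):

```python
import numpy as np

def invariant_rbf(x, y, group_actions, ls=1.0):
    """对已知对称群做平均得到的不变核示意。"""
    def rbf(a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * ls ** 2))
    # 对群中每个作用 g 计算 k(g·x, y), 再取平均
    return np.mean([rbf(g(x), y) for g in group_actions])
```

这样得到的核对应的 RKHS 只含不变函数,其信息增益与覆盖数都小于原核,这正是论文量化的样本效率来源。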


符号|符号学习(1篇)

【1】SyMuPe: Affective and Controllable Symbolic Music Performance
标题:SyMuPe:情感化且可控的符号音乐演奏
链接:https://arxiv.org/abs/2511.03425

作者:Ilya Borovik, Dmitrii Gavrilev, Vladimir Viro
备注:ACM Multimedia 2025. Extended version with supplementary material
摘要:情感是音乐演奏创作和感知的基础。然而,通过机器学习模型实现类似人类的表现力和情感以进行演奏渲染,仍然是一项具有挑战性的任务。在这项工作中,我们提出了SyMuPe,一个用于开发和训练情感化、可控的符号钢琴演奏模型的新框架。我们的旗舰模型PianoFlow使用条件流匹配,训练其求解各种多掩码演奏修补任务。按照设计,它同时支持无条件生成和音乐演奏特征的填充。在训练中,我们使用了一个精心策划、经过清洗的数据集,包含2,968小时对齐的乐谱和富有表现力的MIDI演奏。对于文本和情感控制,我们集成了一个钢琴演奏情感分类器,并用作为条件输入的情感加权Flan-T5文本嵌入来调整PianoFlow。针对基于transformer的基线和现有模型的客观与主观评估表明,PianoFlow不仅优于其他方法,而且达到了与人类录制并转录的MIDI样本相当的演奏质量。对于情感控制,我们展示并分析了在不同文本条件场景下生成的样本。所开发的模型可以集成到交互式应用程序中,有助于创建更易于使用、更具吸引力的音乐演奏系统。
摘要:Emotions are fundamental to the creation and perception of music performances. However, achieving human-like expression and emotion through machine learning models for performance rendering remains a challenging task. In this work, we present SyMuPe, a novel framework for developing and training affective and controllable symbolic piano performance models. Our flagship model, PianoFlow, uses conditional flow matching trained to solve diverse multi-mask performance inpainting tasks. By design, it supports both unconditional generation and infilling of music performance features. For training, we use a curated, cleaned dataset of 2,968 hours of aligned musical scores and expressive MIDI performances. For text and emotion control, we integrate a piano performance emotion classifier and tune PianoFlow with the emotion-weighted Flan-T5 text embeddings provided as conditional inputs. Objective and subjective evaluations against transformer-based baselines and existing models show that PianoFlow not only outperforms other approaches, but also achieves performance quality comparable to that of human-recorded and transcribed MIDI samples. For emotion control, we present and analyze samples generated under different text conditioning scenarios. The developed model can be integrated into interactive applications, contributing to the creation of more accessible and engaging music performance systems.


医学相关(2篇)

【1】Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning
标题:使用多尺度联邦学习进行结直肠癌组织病理学分级
链接:https://arxiv.org/abs/2511.03693

作者:Md Ahasanul Arafath, Abhijit Kumar Ghosh, Md Rony Ahmed, Sabrin Afroz, Minhazul Hosen, Md Hasan Moon, Md Tanzim Reza, Md Ashad Alam
备注:15 pages and 7 figures
摘要:结直肠癌(CRC)分级是一个关键的预后因素,但仍然受到观察者间差异和多机构数据共享隐私限制的阻碍。虽然深度学习提供了一条自动化路径,但集中式训练模型与数据治理法规相冲突,并且忽视了多尺度分析的诊断重要性。在这项工作中,我们提出了一个可扩展的、保护隐私的联邦学习(FL)框架,用于CRC组织病理学分级,在分布式训练范式内集成了多尺度特征学习。我们的方法采用双流ResNetRS50骨干网络,同时捕获细粒度的细胞核细节和更广泛的组织级上下文。该架构被集成到一个稳健的FL系统中,并使用FedProx来稳定训练,以缓解来自多家医院的异构数据分布之间的客户端漂移。对CRC-HGD数据集的广泛评估表明,我们的框架实现了83.5%的总体准确率,优于可比的集中式模型(81.6%)。至关重要的是,该系统在识别最具侵袭性的III级肿瘤方面表现出色,召回率高达87.5%,这是防止危险假阴性的关键临床优先事项。性能随着放大倍率的提高而进一步提升,在40倍放大倍率下达到88.0%的准确率。这些结果验证了我们的联邦多尺度方法不仅保护了患者隐私,而且提高了模型性能和泛化能力。所提出的模块化流水线内置预处理、检查点和错误处理,为数字病理学中可部署、具备隐私意识的临床AI奠定了基础。
摘要:Colorectal cancer (CRC) grading is a critical prognostic factor but remains hampered by inter-observer variability and the privacy constraints of multi-institutional data sharing. While deep learning offers a path to automation, centralized training models conflict with data governance regulations and neglect the diagnostic importance of multi-scale analysis. In this work, we propose a scalable, privacy-preserving federated learning (FL) framework for CRC histopathological grading that integrates multi-scale feature learning within a distributed training paradigm. Our approach employs a dual-stream ResNetRS50 backbone to concurrently capture fine-grained nuclear detail and broader tissue-level context. This architecture is integrated into a robust FL system stabilized using FedProx to mitigate client drift across heterogeneous data distributions from multiple hospitals. Extensive evaluation on the CRC-HGD dataset demonstrates that our framework achieves an overall accuracy of 83.5%, outperforming a comparable centralized model (81.6%). Crucially, the system excels in identifying the most aggressive Grade III tumors with a high recall of 87.5%, a key clinical priority to prevent dangerous false negatives. Performance further improves with higher magnification, reaching 88.0% accuracy at 40x. These results validate that our federated multi-scale approach not only preserves patient privacy but also enhances model performance and generalization. The proposed modular pipeline, with built-in preprocessing, checkpointing, and error handling, establishes a foundational step toward deployable, privacy-aware clinical AI for digital pathology.


【2】ECGXtract: Deep Learning-based ECG Feature Extraction for Automated CVD Diagnosis
标题:ECGXtract:基于深度学习的心电图特征提取,用于自动心血管疾病诊断
链接:https://arxiv.org/abs/2511.02850

作者:Youssif Abuzied, Hassan AbdEltawab, Abdelrhman Gaber, Tamer ElBatt
摘要:本文介绍了ECGXtract,一种基于深度学习的可解释心电图特征提取方法,解决了传统信号处理和黑箱机器学习方法的局限性。特别是,我们开发了能够提取时间和形态特征的卷积神经网络模型,这些特征与经临床验证的基准真值具有很强的相关性。最初,每个模型都被训练来提取单个特征,确保输出精确且可解释。随后进行了一系列实验,在多种设置下评估所提出的方法,包括全局特征与导联特定特征的对比、不同的采样频率,以及与ECGdeli等其他方法的比较。我们的研究结果表明,ECGXtract在大多数特征上都表现稳健,全局特征与基准真值的平均相关性得分为0.80,其中导联II始终给出最佳结果。对于导联特定特征,ECGXtract的平均相关性得分为0.822。此外,ECGXtract取得了优于最先进开源工具ECGdeli的结果,在90%的特征上获得了与基准真值更高的相关性得分。此外,我们探索了利用单个模型同时提取多个特征的可行性。语义分组被证明对全局特征有效,而大规模分组和导联特定的多输出模型则出现明显的性能下降。这些结果突出了结构化分组策略在平衡计算效率与模型准确性方面的潜力,为资源受限环境中更具可扩展性和临床可解释性的ECG特征提取系统铺平了道路。
摘要:This paper presents ECGXtract, a deep learning-based approach for interpretable ECG feature extraction, addressing the limitations of traditional signal processing and black-box machine learning methods. In particular, we develop convolutional neural network models capable of extracting both temporal and morphological features with strong correlations to a clinically validated ground truth. Initially, each model is trained to extract a single feature, ensuring precise and interpretable outputs. A series of experiments is then carried out to evaluate the proposed method across multiple setups, including global versus lead-specific features, different sampling frequencies, and comparisons with other approaches such as ECGdeli. Our findings show that ECGXtract achieves robust performance across most features with a mean correlation score of 0.80 with the ground truth for global features, with lead II consistently providing the best results. For lead-specific features, ECGXtract achieves a mean correlation score of 0.822. Moreover, ECGXtract achieves superior results to the state-of-the-art open source ECGdeli as it got a higher correlation score with the ground truth in 90% of the features. Furthermore, we explore the feasibility of extracting multiple features simultaneously utilizing a single model. Semantic grouping is proved to be effective for global features, while large-scale grouping and lead-specific multi-output models show notable performance drops. These results highlight the potential of structured grouping strategies to balance the computational efficiency vs. model accuracy, paving the way for more scalable and clinically interpretable ECG feature extraction systems in limited resource settings.


聚类(1篇)

【1】Unifying Information-Theoretic and Pair-Counting Clustering Similarity
标题:统一信息论和配对计数集群相似性
链接:https://arxiv.org/abs/2511.03000

作者:Alexander J. Gates
备注:28 pages, 2 figures
摘要:比较聚类结果是评估无监督模型的核心,然而现有的众多相似性度量可能给出差异巨大、有时甚至相互矛盾的评估。聚类相似性度量通常被归为两大主要家族:配对计数与信息论,其区别在于是通过元素对来量化一致性,还是在完整的簇列联表上聚合信息。以前的工作已经发现了这两个家族之间的相似之处,并应用了经验归一化或机会校正方案,但它们更深层的分析联系仍然只有部分被理解。在这里,我们开发了一个通过两个互补视角统一这两个家族的分析框架。首先,两个家族都可表示为"观测与期望共现"的加权展开:配对计数对应二次的低阶近似,信息论度量对应更高阶的、按频率加权的展开。其次,我们将配对计数推广到$k$元组一致性,并表明信息论度量可以被视为在成对水平之外系统地累积更高阶的共同分配结构。我们以Rand指数和互信息为例对该方法进行了解析说明,并展示了每个家族中的其他指数如何作为自然扩展出现。总之,这些视角澄清了这两类度量何时以及为何出现分歧,将它们的敏感性直接与加权方式和近似阶数联系起来,并为在各种应用中选择、解释和扩展聚类相似性度量提供了原则性基础。
摘要:Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order approximation and information-theoretic measures as higher-order, frequency-weighted extensions. Second, we generalize pair-counting to $k$-tuple agreement and show that information-theoretic measures can be viewed as systematically accumulating higher-order co-assignment structure beyond the pairwise level. We illustrate the approaches analytically for the Rand index and Mutual Information, and show how other indices in each family emerge as natural extensions. Together, these views clarify when and why the two regimes diverge, relating their sensitivities directly to weighting and approximation order, and provide a principled basis for selecting, interpreting, and extending clustering similarity measures across applications.
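文中强调两大家族都可由同一张簇列联表导出。下面的示意代码从同一张列联表同时计算 Rand 指数(成对计数家族)与互信息(信息论家族),便于对照两者的输入完全相同、仅聚合方式不同:

```python
import numpy as np
from math import comb, log

def pair_and_info_indices(labels_a, labels_b):
    """从同一张簇列联表计算 Rand 指数与互信息 (nats)。"""
    a_ids = {v: i for i, v in enumerate(sorted(set(labels_a)))}
    b_ids = {v: i for i, v in enumerate(sorted(set(labels_b)))}
    n = len(labels_a)
    C = np.zeros((len(a_ids), len(b_ids)), dtype=int)
    for x, y in zip(labels_a, labels_b):
        C[a_ids[x], b_ids[y]] += 1
    # 成对计数家族: Rand 指数 = 一致元素对的比例
    sum_ij = sum(comb(int(v), 2) for v in C.ravel())
    sum_a = sum(comb(int(v), 2) for v in C.sum(axis=1))
    sum_b = sum(comb(int(v), 2) for v in C.sum(axis=0))
    total = comb(n, 2)
    rand = (total + 2 * sum_ij - sum_a - sum_b) / total
    # 信息论家族: 互信息
    P = C / n
    pa, pb = P.sum(axis=1), P.sum(axis=0)
    mi = sum(P[i, j] * log(P[i, j] / (pa[i] * pb[j]))
             for i in range(P.shape[0]) for j in range(P.shape[1])
             if P[i, j] > 0)
    return rand, mi
```

两个完全一致的二簇划分给出 Rand 指数 1 和互信息 $\ln 2$;论文的展开视角解释了二者在非一致情形下为何可能给出不同排序。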


自动驾驶|车辆|车道检测等(2篇)

【1】A Feedback-Control Framework for Efficient Dataset Collection from In-Vehicle Data Streams
标题:从车载数据流中高效收集数据集的反馈控制框架
链接:https://arxiv.org/abs/2511.03239

作者:Philipp Reis, Philipp Rigoll, Christian Steinhauser, Jacob Langner, Eric Sax
摘要:现代人工智能系统越来越不受模型容量的限制,而是受其数据质量和多样性的限制。尽管越来越重视以数据为中心的人工智能,但大多数数据集仍然是以开环方式收集的,这种方式积累冗余样本,却没有来自当前覆盖状况的反馈。这导致存储效率低、标注成本高和泛化能力有限。为了解决这个问题,本文介绍了FCDC,一种将数据收集表述为闭环控制问题的范式。FCDC使用在线概率模型连续近似已收集数据的分布状态,并基于似然和马氏距离等反馈信号自适应调节样本保留。通过这种反馈机制,系统动态地平衡探索和利用,保持数据集的多样性,并防止冗余随时间积累。除了在合成数据集上展示FCDC的可控性外,在真实数据流上的实验表明,FCDC产生的数据集平衡性提高了25.9%,同时将数据存储减少了39.8%。这些结果表明,数据收集本身可以被主动控制,从而将收集从被动的流水线阶段转变为以数据为中心的人工智能核心的自我调节、反馈驱动的过程。
摘要:Modern AI systems are increasingly constrained not by model capacity but by the quality and diversity of their data. Despite growing emphasis on data-centric AI, most datasets are still gathered in an open-loop manner which accumulates redundant samples without feedback from the current coverage. This results in inefficient storage, costly labeling, and limited generalization. To address this, this paper introduces FCDC, a paradigm that formulates data collection as a closed-loop control problem. FCDC continuously approximates the state of the collected data distribution using an online probabilistic model and adaptively regulates sample retention based on feedback signals such as likelihood and Mahalanobis distance. Through this feedback mechanism, the system dynamically balances exploration and exploitation, maintains dataset diversity, and prevents redundancy from accumulating over time. Besides showcasing the controllability of FCDC on a synthetic dataset, experiments on a real data stream show that FCDC produces more balanced datasets by 25.9% while reducing data storage by 39.8%. These results demonstrate that data collection itself can be actively controlled, transforming collection from a passive pipeline stage into a self-regulating, feedback-driven process at the core of data-centric AI.
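FCDC 的"在线概率模型 + 马氏距离反馈"闭环可以示意如下。这里用对角高斯与 Welford 在线更新近似数据分布,马氏距离大(覆盖不足)的样本保留、冗余样本丢弃;阈值、热身样本数均为假设值,非论文原实现:

```python
import numpy as np

class FeedbackCollector:
    """FCDC 思想的极简示意: 在线分布估计驱动的样本保留控制。"""

    def __init__(self, dim, threshold=2.0):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.ones(dim)      # 方差累积量的初始先验 (假设值)
        self.threshold = threshold  # 马氏距离保留阈值 (假设值)
        self.kept = []

    def offer(self, x):
        x = np.asarray(x, dtype=float)
        var = self.m2 / max(self.n - 1, 1)
        # 对角马氏距离: 样本相对当前分布估计的"新颖度"
        d = np.sqrt(np.sum((x - self.mean) ** 2 / (var + 1e-8)))
        keep = self.n < 10 or d > self.threshold  # 热身期全收
        # 无论是否保留, 都用 Welford 公式更新在线均值/方差
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if keep:
            self.kept.append(x)
        return keep
```

随着分布估计收紧,落在高密度区域的冗余样本被拒收,而分布尾部的新颖样本被保留,这就是摘要中"平衡探索与利用、抑制冗余积累"的机制。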


【2】Modeling Headway in Heterogeneous and Mixed Traffic Flow: A Statistical Distribution Based on a General Exponential Function
标题:异质和混合交通流中的车头时距建模:基于广义指数函数的统计分布
链接:https://arxiv.org/abs/2511.03154

作者:Natchaphon Leungbootnak, Zihao Li, Zihang Wei, Dominique Lord, Yunlong Zhang
摘要:现有车头时距分布准确反映异质交通(不同类型的车辆)和混合交通(人类驾驶车辆与自动驾驶车辆)中的不同行为和特征的能力有限,导致拟合优度不令人满意。为了解决这些问题,我们修改了指数函数,以获得一个新的车头时距分布。我们没有采用欧拉数(e)作为指数函数的底数,而是利用实数底数来为观测到的车头时距建模提供更大的灵活性。然而,所提出的函数本身并不是概率函数,我们将其归一化以计算概率并导出封闭形式的方程。在这项研究中,我们利用五个开放数据集(highD、exiD、NGSIM、Waymo和Lyft)进行了综合实验,以评估所提出分布的性能,并在混合和异质交通流下将其与六个现有分布进行了比较。结果表明,该分布不仅捕捉车头时距分布的基本特征,而且提供了描述所观测车头时距分布形状的具有物理意义的参数。在高速公路上的异质流(即不间断交通流)下,所提出的分布优于其他候选分布。在城市道路条件(即中断交通流,包括异质和混合交通)下,所提出的分布仍然取得了不错的结果。
摘要:The ability of existing headway distributions to accurately reflect the diverse behaviors and characteristics in heterogeneous traffic (different types of vehicles) and mixed traffic (human-driven vehicles with autonomous vehicles) is limited, leading to unsatisfactory goodness of fit. To address these issues, we modified the exponential function to obtain a novel headway distribution. Rather than employing Euler's number (e) as the base of the exponential function, we utilized a real number base to provide greater flexibility in modeling the observed headway. However, the proposed function is not a probability function. We normalize it to calculate the probability and derive the closed-form equation. In this study, we utilized a comprehensive experiment with five open datasets: highD, exiD, NGSIM, Waymo, and Lyft to evaluate the performance of the proposed distribution and compared its performance with six existing distributions under mixed and heterogeneous traffic flow. The results revealed that the proposed distribution not only captures the fundamental characteristics of headway distribution but also provides physically meaningful parameters that describe the distribution shape of observed headways. Under heterogeneous flow on highways (i.e., uninterrupted traffic flow), the proposed distribution outperforms other candidate distributions. Under urban road conditions (i.e., interrupted traffic flow), including heterogeneous and mixed traffic, the proposed distribution still achieves decent results.
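Since the abstract does not give the paper's exact functional form, the base-change idea (a real base $b$ in place of Euler's number, rescaled into a valid density) can be illustrated with a hypothetical form $g(h)=b^{-h/s}$, whose normalizing constant over $[0,\infty)$ is $\ln(b)/s$ in closed form; both the form and the parameter values below are assumptions.

```python
import numpy as np

def headway_pdf(h, b, s):
    # hypothetical generalized-exponential headway density with real base b > 1;
    # the integral of b**(-h/s) over [0, inf) is s/ln(b), giving this normalization
    return (np.log(b) / s) * np.power(b, -np.asarray(h) / s)

# numerical sanity check that the density integrates to ~1 (trapezoid rule)
h = np.linspace(0.0, 200.0, 200001)
f = headway_pdf(h, b=1.7, s=2.3)
mass = float(np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(h)))
mean_headway = float(np.sum(0.5 * (h[:-1] * f[:-1] + h[1:] * f[1:]) * np.diff(h)))
```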


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】EvtSlowTV - A Large and Diverse Dataset for Event-Based Depth Estimation
标题:EvtSlowTV -用于基于事件的深度估计的大型且多样化的数据集
链接:https://arxiv.org/abs/2511.02953

作者:Sadiq Layi Macaulay, Nimet Kaygusuz, Simon Hadfield
摘要:事件摄像机具有高动态范围(HDR)和低延迟的特点,为在具有挑战性的环境中进行稳健的深度估计提供了一种有前途的替代方案。然而,许多基于事件的深度估计方法受到小规模注释数据集的约束,限制了它们对真实世界场景的推广。为了弥补这一差距,我们引入了EvtSlowTV,这是一个从公开的YouTube视频中策划的大型事件相机数据集,其中包含超过13B个不同环境条件和运动的事件,包括季节性徒步旅行,飞行,风景驾驶和水下探索。EvtSlowTV比现有的事件数据集大一个数量级,为基于事件的深度学习提供了一个不受约束的自然环境。这项工作显示了EvtSlowTV对于自我监督学习框架的适用性,以利用原始事件流的HDR潜力。我们进一步证明,EvtSlowTV训练增强了模型概括复杂场景和运动的能力。我们的方法消除了对基于帧的注释的需要,并保留了事件数据的异步性质。
摘要:Event cameras, with their high dynamic range (HDR) and low latency, offer a promising alternative for robust depth estimation in challenging environments. However, many event-based depth estimation approaches are constrained by small-scale annotated datasets, limiting their generalizability to real-world scenarios. To bridge this gap, we introduce EvtSlowTV, a large-scale event camera dataset curated from publicly available YouTube footage, which contains more than 13B events across various environmental conditions and motions, including seasonal hiking, flying, scenic driving, and underwater exploration. EvtSlowTV is an order of magnitude larger than existing event datasets, providing an unconstrained, naturalistic setting for event-based depth learning. This work shows the suitability of EvtSlowTV for a self-supervised learning framework to capitalise on the HDR potential of raw event streams. We further demonstrate that training with EvtSlowTV enhances the model's ability to generalise to complex scenes and motions. Our approach removes the need for frame-based annotations and preserves the asynchronous nature of event data.


联邦学习|隐私保护|加密(1篇)

【1】Byzantine-Robust Federated Learning with Learnable Aggregation Weights
标题:具有可学习聚合权重的拜占庭稳健联邦学习
链接:https://arxiv.org/abs/2511.03529

作者:Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson
摘要:联邦学习(FL)使客户端能够在不共享其私有数据的情况下协作训练全局模型。然而,恶意(拜占庭)客户端的存在对FL的鲁棒性提出了重大挑战,特别是当客户端之间的数据分布异构时。在本文中,我们提出了一种新的拜占庭鲁棒FL优化问题,将自适应加权纳入聚合过程。与传统方法不同,我们的公式将聚合权重视为可学习的参数,并与全局模型参数一起对其进行联合优化。为了求解这个优化问题,我们开发了一种在对抗攻击下具有强收敛保证的交替最小化算法。我们分析了所提出目标的拜占庭弹性。我们在各种数据集和攻击场景下,将我们算法的性能与最先进的拜占庭鲁棒FL方法进行了比较。实验结果表明,我们的方法始终优于现有方法,特别是在数据高度异构且恶意客户端比例很大的设置下。
摘要:Federated Learning (FL) enables clients to collaboratively train a global model without sharing their private data. However, the presence of malicious (Byzantine) clients poses significant challenges to the robustness of FL, particularly when data distributions across clients are heterogeneous. In this paper, we propose a novel Byzantine-robust FL optimization problem that incorporates adaptive weighting into the aggregation process. Unlike conventional approaches, our formulation treats aggregation weights as learnable parameters, jointly optimizing them alongside the global model parameters. To solve this optimization problem, we develop an alternating minimization algorithm with strong convergence guarantees under adversarial attack. We analyze the Byzantine resilience of the proposed objective. We evaluate the performance of our algorithm against state-of-the-art Byzantine-robust FL approaches across various datasets and attack scenarios. Experimental results demonstrate that our method consistently outperforms existing approaches, particularly in settings with highly heterogeneous data and a large proportion of malicious clients.
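A toy version of the learnable-aggregation-weights idea: the weights are softmax-parameterized and optimized alternately with the aggregated (here linear) model. The trusted validation set used as the weight-learning signal is an assumption for illustration, not the paper's actual objective.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_learnable(client_models, val_X, val_y, steps=200, lr=0.5):
    M = np.stack(client_models)               # (num_clients, dim)
    z = np.zeros(len(client_models))          # logits of the aggregation weights
    for _ in range(steps):
        w = softmax(z)
        theta = w @ M                         # aggregation step (fixed weights)
        resid = val_X @ theta - val_y
        grad_theta = val_X.T @ resid / len(val_y)
        grad_w = M @ grad_theta               # dL/dw_i
        z -= lr * w * (grad_w - w @ grad_w)   # weight step via softmax chain rule
    w = softmax(z)
    return w, w @ M

rng = np.random.default_rng(1)
true_theta = np.array([1.0, -2.0, 0.5])
honest = [true_theta + 0.05 * rng.normal(size=3) for _ in range(4)]
byzantine = [np.full(3, 10.0)]                # poisoned model update
X = rng.normal(size=(64, 3))
y = X @ true_theta
weights, theta_hat = aggregate_learnable(honest + byzantine, X, y)
```

The learned weight of the poisoned client collapses toward zero, while the aggregated model stays close to the ground truth.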


推理|分析|理解|解释(5篇)

【1】Death by a Thousand Prompts: Open Model Vulnerability Analysis
标题:千条提示致死:开放模型漏洞分析
链接:https://arxiv.org/abs/2511.03247

作者:Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
摘要:开放权重模型为研究人员和开发人员提供了各种下游应用程序的可访问基础。我们测试了八个开放权重大型语言模型(LLM)的安全性与安保状况,以识别可能影响后续微调和部署的漏洞。使用自动对抗测试,我们测量了每个模型对单回合和多回合提示注入与越狱攻击的弹性。我们的研究结果揭示了所有测试模型中普遍存在的漏洞,多回合攻击的成功率在25.86%到92.78%之间,比单回合基线高出2到10倍。这些结果强调了当前开放权重模型在扩展交互中维持安全护栏方面的系统性不足。我们评估认为,对齐策略和实验室优先级显著影响弹性:以能力为中心的模型(如Llama 3.3和Qwen 3)表现出更高的多回合敏感性,而以安全为导向的设计(如Google Gemma 3)则表现出更平衡的性能。分析得出的结论是,开放权重模型虽然对创新至关重要,但在没有分层安全控制的情况下部署时会带来切实的运营和道德风险。这些发现旨在告知从业者和开发人员潜在的风险,以及专业人工智能安全解决方案在降低风险方面的价值。解决多回合漏洞对于确保在企业和公共领域安全、可靠和负责任地部署开放权重LLM至关重要。我们建议采用安全第一的设计理念和分层保护,以确保开放权重模型的弹性部署。
摘要:Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications. We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment. Using automated adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86\% and 92.78\% -- representing a $2\times$ to $10\times$ increase over single-turn baselines. These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions. We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance.   The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls. These findings are intended to inform practitioners and developers of the potential risks and the value of professional AI security solutions to mitigate exposure. Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable, and responsible deployment of open-weight LLMs in enterprise and public domains. We recommend adopting a security-first design philosophy and layered protections to ensure resilient deployments of open-weight models.


【2】Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods
标题:使用统计方法探索性分析电子商务平台上的网络攻击模式
链接:https://arxiv.org/abs/2511.03020

作者:Fatimo Adenike Adeniya (York St John University, London Campus, London, United Kingdom)
备注:32 pages, 9 figures, 6 tables; MSc Research Dissertation, York St John University, London Campus
摘要 :针对电子商务平台的网络攻击越来越复杂,威胁到消费者的信任和运营的连续性。该研究提出了一种混合分析框架,该框架集成了统计建模和机器学习,用于检测和预测电子商务领域的网络攻击模式。使用Verizon社区数据泄露(VCDB)数据集,该研究应用Auto ARIMA进行时间预测和显著性检验,包括Mann-Whitney U检验(U = 2579981.5,p = 0.0121),证实假日购物活动比非假日期间经历了更严重的网络攻击。ANOVA还用于检查威胁严重程度的季节性变化,而集成机器学习模型(XGBoost,LightGBM和CatBoost)用于预测分类。结果显示,在黑色星期五和假日季节等高风险时期,经常发生攻击高峰,涉及个人身份信息(PII)的违规行为显示出威胁指标升高。在所有模型中,CatBoost的性能最高(准确率= 85.29%,F1评分= 0.2254,ROC AUC = 0.8247)。该框架独特地将季节性预测与可解释的集成学习相结合,实现了时间风险预测和违约类型分类。纳入了伦理考虑因素,包括负责任地使用敏感数据和偏倚评估。尽管存在类别不平衡和对历史数据的依赖,但该研究为主动网络安全资源分配提供了见解,并概述了未来实时威胁检测研究的方向。
摘要:Cyberattacks on e-commerce platforms have grown in sophistication, threatening consumer trust and operational continuity. This research presents a hybrid analytical framework that integrates statistical modelling and machine learning for detecting and forecasting cyberattack patterns in the e-commerce domain. Using the Verizon Community Data Breach (VCDB) dataset, the study applies Auto ARIMA for temporal forecasting and significance testing, including a Mann-Whitney U test (U = 2579981.5, p = 0.0121), which confirmed that holiday shopping events experienced significantly more severe cyberattacks than non-holiday periods. ANOVA was also used to examine seasonal variation in threat severity, while ensemble machine learning models (XGBoost, LightGBM, and CatBoost) were employed for predictive classification. Results reveal recurrent attack spikes during high-risk periods such as Black Friday and holiday seasons, with breaches involving Personally Identifiable Information (PII) exhibiting elevated threat indicators. Among the models, CatBoost achieved the highest performance (accuracy = 85.29%, F1 score = 0.2254, ROC AUC = 0.8247). The framework uniquely combines seasonal forecasting with interpretable ensemble learning, enabling temporal risk anticipation and breach-type classification. Ethical considerations, including responsible use of sensitive data and bias assessment, were incorporated. Despite class imbalance and reliance on historical data, the study provides insights for proactive cybersecurity resource allocation and outlines directions for future real-time threat detection research.
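The Mann-Whitney U test cited in the abstract (U = 2579981.5, p = 0.0121) can be reproduced in miniature on toy severity scores; the sketch below uses midranks for ties and a normal-approximation two-sided p-value without tie correction in the variance.

```python
import math
import numpy as np

def mann_whitney_u(x, y):
    combined = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    order = combined.argsort()
    ranks = np.empty(len(combined))
    ranks[order] = np.arange(1, len(combined) + 1)
    for v in np.unique(combined):          # assign midranks to tied values
        mask = combined == v
        ranks[mask] = ranks[mask].mean()
    n1, n2 = len(x), len(y)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal approximation
    return u1, p

holiday = np.arange(10) + 10.0   # hypothetical severity scores, holiday periods
baseline = np.arange(10.0)       # hypothetical non-holiday severity scores
u_stat, p_value = mann_whitney_u(holiday, baseline)
```

With the holiday group uniformly more severe, U equals its maximum n1*n2 = 100 and the test rejects at any conventional level.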


【3】Inference-Time Personalized Alignment with a Few User Preference Queries
标题:基于少量用户偏好查询的推理时个性化对齐
链接:https://arxiv.org/abs/2511.02966

作者:Victor-Alexandru Pădurean, Parameswaran Kamalaruban, Nachiket Kotalwar, Alkis Gotovos, Adish Singla
备注:NeurIPS'25 paper
摘要:我们研究了使生成模型的响应与用户偏好对齐的问题。最近的工作提出了几种不同的个性化对齐表述,但它们要么需要大量的用户偏好查询,要么要求偏好被明确指定为文本输入。在本文中,我们提出了一种新的推理时个性化对齐方法UserAlign,它通过少量成对响应比较查询来获取用户的偏好。特别地,UserAlign建立在logistic bandits中最佳臂识别的理论框架之上,并从模型生成响应的固定池中选择个性化响应。关键思想是将用户的反馈视为一致且无噪声的,并将其纳入理论框架以快速确定最佳响应。在包括个性化文本和图像生成在内的多个任务上的实验结果展示了UserAlign在实现个性化对齐方面的有效性。
摘要:We study the problem of aligning a generative model's response with a user's preferences. Recent works have proposed several different formulations for personalized alignment; however, they either require a large amount of user preference queries or require that the preference be explicitly specified as a text input. In this paper, we propose a novel inference-time personalized alignment method, UserAlign, that elicits the user's preferences with a few queries as pairwise response comparisons. In particular, UserAlign builds on the theoretical framework of best-arm identification in logistic bandits and selects a personalized response from a fixed pool of the model's generated responses. The key idea is to consider the user's feedback consistent and noise-free, and incorporate it into the theoretical framework to identify the best response quickly. Experimental results across several tasks, involving personalized text and image generation, showcase the effectiveness of UserAlign in achieving personalized alignment.
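Under the paper's key assumption of consistent, noise-free pairwise feedback, the best of n candidate responses is identifiable with at most n-1 comparison queries. The sketch below simulates the preference oracle with a hypothetical length-based utility and omits the logistic-bandit machinery entirely.

```python
def select_best_response(responses, prefers):
    """Single elimination pass: at most len(responses) - 1 oracle queries."""
    best = responses[0]
    queries = 0
    for cand in responses[1:]:
        queries += 1
        if prefers(cand, best):   # noise-free, consistent user feedback
            best = cand
    return best, queries

# hypothetical simulated user who prefers longer responses
responses = ["ok", "sure thing", "a detailed, personalized answer", "hmm"]
best, n_queries = select_best_response(responses, lambda a, b: len(a) > len(b))
```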


【4】Quantifying Weighted Morphological Content of Large-Scale Structures via Simulation-Based Inference
标题:通过基于模拟的推理量化大规模结构的加权形态内容
链接:https://arxiv.org/abs/2511.03636

作者:M. H. Jalali Kanafi, S. M. S. Movahed
备注:19 pages, 9 figures and 3 tables. Comments are welcome
摘要:在这项工作中,我们进行了基于模拟的预测分析,以比较大尺度结构(LSS)的两个高阶汇总统计量,即闵可夫斯基泛函(MFs)和导数条件矩(CMD)的约束能力,特别关注它们对红移空间中非线性和各向异性特征的敏感性。我们的分析依赖于红移$z=0.5$处Big Sobol Sequence (BSQ)模拟的晕目录,采用通过神经后验估计实现的免似然推理框架。在Quijote模拟的基准宇宙学$(\Omega_{m}=0.3175,\,\sigma_{8}=0.834)$下,对于平滑尺度$R=15\,h^{-1}$ Mpc,我们发现CMD对$(\Omega_{m},\,\sigma_{8})$的预测比零阶到三阶MFs分量更紧,约束精度分别提高了${\sim}(44\%,\,52\%)$、${\sim}(30\%,\,45\%)$、${\sim}(27\%,\,17\%)$和${\sim}(26\%,\,17\%)$。结合MFs和CMD的联合配置与单独的标准MFs相比进一步将精度提高约${\sim}27\%$,突显了CMD捕获的互补的各向异性敏感信息,这与MFs封装的标量形态内容形成对比。我们进一步将预测分析扩展到宇宙学参数值的连续范围和多个平滑尺度。我们的研究结果表明,虽然每个汇总统计量分量的绝对预测不确定性取决于基础参数值和所采用的平滑尺度,但各汇总统计量之间的相对约束力几乎保持不变。
摘要:In this work, we perform a simulation-based forecasting analysis to compare the constraining power of two higher-order summary statistics of the large-scale structure (LSS), the Minkowski Functionals (MFs) and the Conditional Moments of Derivative (CMD), with a particular focus on their sensitivity to nonlinear and anisotropic features in redshift-space. Our analysis relies on halo catalogs from the Big Sobol Sequence (BSQ) simulations at redshift $z=0.5$, employing a likelihood-free inference framework implemented via neural posterior estimation. At the fiducial cosmology of the Quijote simulations $(\Omega_{m}=0.3175,\,\sigma_{8}=0.834)$, and for the smoothing scale $R=15\,h^{-1}$Mpc, we find that the CMD yields tighter forecasts for $(\Omega_{m},\,\sigma_{8})$ than the zeroth- to third-order MFs components, improving the constraint precision by ${\sim}(44\%,\,52\%)$, ${\sim}(30\%,\,45\%)$, ${\sim}(27\%,\,17\%)$, and ${\sim}(26\%,\,17\%)$, respectively. A joint configuration combining the MFs and CMD further enhances the precision by approximately ${\sim}27\%$ compared to the standard MFs alone, highlighting the complementary anisotropy-sensitive information captured by the CMD in contrast to the scalar morphological content encapsulated by the MFs. We further extend the forecasting analysis to a continuous range of cosmological parameter values and multiple smoothing scales. Our results show that, although the absolute forecast uncertainty for each component of summary statistics depends on the underlying parameter values and the adopted smoothing scale, the relative constraining power among the summary statistics remains nearly constant throughout.


【5】Precise asymptotic analysis of Sobolev training for random feature models
标题:随机特征模型Sobolev训练的精确渐近分析
链接:https://arxiv.org/abs/2511.03050

作者:Katharine E Fisher, Matthew TC Li, Youssef Marzouk, Timo Schorlepp
备注:23(+49) pages, 7(+16) figures main text(+appendix)
摘要:梯度信息在应用中是广泛有用和可用的,因此很自然地包括在神经网络的训练中。然而,理论上对Sobolev训练的影响知之甚少-函数和梯度数据的回归-对高维高度过度参数化预测模型的泛化误差。在本文中,我们获得了随机特征(RF)模型的这种训练模式的精确表征,其中可训练参数,输入维度和训练数据的数量趋于无穷大。我们的Sobolev训练模型通过将梯度数据绘制到有限维子空间上来反映实际实现。通过将统计物理学中的副本方法与算子值自由概率理论中的线性化相结合,我们得到了训练RF模型的泛化误差的封闭形式描述。对于由单指数模型描述的目标函数,我们证明了用额外的梯度数据补充函数数据并不能普遍提高预测性能。相反,过度参数化的程度应该告知训练方法的选择。更广泛地说,我们的研究结果通过插值噪声函数和梯度数据来确定模型的最佳性能设置。
摘要 :Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.


检测相关(3篇)

【1】SHIELD: Securing Healthcare IoT with Efficient Machine Learning Techniques for Anomaly Detection
标题:SHIELD:利用高效的机器学习技术来保护医疗保健物联网的安全,以进行异常检测
链接:https://arxiv.org/abs/2511.03661

作者:Mahek Desai, Apoorva Rumale, Marjan Asadinia
备注:None
摘要:物联网设备在医疗保健领域的集成带来了重大的安全性和可靠性挑战,增加了对网络威胁和操作异常的敏感性。这项研究提出了一个机器学习驱动的框架,用于(1)检测恶意网络攻击和(2)识别故障设备异常,利用20万条记录的数据集。跨三种学习方法评估八种机器学习模型:监督学习(XGBoost、K最近邻(K-NN))、半监督学习(生成对抗网络(GAN)、变分自动编码器(VAE))和无监督学习(单类支持向量机(SVM)、隔离森林、图神经网络(GNN)和长短期记忆(LSTM)自动编码器)。综合评价是在多个指标,如F1分数,精度,召回率,准确率,ROC-AUC,计算效率。XGBoost在异常检测的计算开销最小(0.04s)的情况下达到了99%的准确率,而隔离森林有效地平衡了查准率和查全率。LSTM自动编码器表现不佳,准确性较低,延迟较高。对于攻击检测,KNN以最低的计算成本(0.05s)实现了近乎完美的精确度,召回率和F1分数,其次是VAE,准确率为97%。GAN显示出最高的计算成本,最低的准确性和ROC-AUC。这些发现通过有效的异常检测策略增强了物联网医疗安全。通过改进网络威胁和设备故障的早期检测,该框架有可能防止数据泄露,最大限度地减少系统停机时间,并确保医疗设备的持续安全运行,最终保护患者健康和对物联网驱动的医疗保健解决方案的信任。
摘要:The integration of IoT devices in healthcare introduces significant security and reliability challenges, increasing susceptibility to cyber threats and operational anomalies. This study proposes a machine learning-driven framework for (1) detecting malicious cyberattacks and (2) identifying faulty device anomalies, leveraging a dataset of 200,000 records. Eight machine learning models are evaluated across three learning approaches: supervised learning (XGBoost, K-Nearest Neighbors (K-NN)), semi-supervised learning (Generative Adversarial Networks (GAN), Variational Autoencoders (VAE)), and unsupervised learning (One-Class Support Vector Machine (SVM), Isolation Forest, Graph Neural Networks (GNN), and Long Short-Term Memory (LSTM) Autoencoders). The comprehensive evaluation was conducted across multiple metrics such as F1-score, precision, recall, accuracy, ROC-AUC, and computational efficiency. XGBoost achieved 99% accuracy with minimal computational overhead (0.04s) for anomaly detection, while Isolation Forest balanced precision and recall effectively. LSTM Autoencoders underperformed with lower accuracy and higher latency. For attack detection, KNN achieved near-perfect precision, recall, and F1-score with the lowest computational cost (0.05s), followed by VAE at 97% accuracy. GAN showed the highest computational cost with lowest accuracy and ROC-AUC. These findings enhance IoT-enabled healthcare security through effective anomaly detection strategies. By improving early detection of cyber threats and device failures, this framework has the potential to prevent data breaches, minimize system downtime, and ensure the continuous and safe operation of medical devices, ultimately safeguarding patient health and trust in IoT-driven healthcare solutions.


【2】A Quantized VAE-MLP Botnet Detection Model: A Systematic Evaluation of Quantization-Aware Training and Post-Training Quantization Strategies
标题:量化VAE-MLP僵尸网络检测模型:量化感知训练和训练后量化策略的系统评估
链接:https://arxiv.org/abs/2511.03201

作者:Hassan Wasswa, Hussein Abbass, Timothy Lynar
摘要:为了应对越来越多的基于物联网僵尸网络的攻击,人们提出了最先进的深度学习方法,并取得了令人印象深刻的检测精度。然而,它们的计算强度限制了在资源受限的物联网设备上的部署,从而迫切需要轻量级检测模型。应对这一挑战的常见方案是通过量化进行模型压缩。本研究提出了一个VAE-MLP模型框架,其中基于MLP的分类器在8维潜在向量上训练,这些向量由预训练变分自动编码器(VAE)的编码器组件从高维训练数据中导出。然后,使用两个基准物联网僵尸网络数据集(N-BaIoT和CICIoT2022),系统地评估了两种广泛使用的量化策略,即量化感知训练(QAT)和训练后量化(PTQ),对检测性能、存储效率和推理延迟的影响。结果表明,与原始未量化模型相比,QAT策略的检测准确率下降更明显,而PTQ仅有边际下降。此外,PTQ实现了6倍的加速和21倍的大小减少,而QAT实现了3倍的加速和24倍的压缩,证明了量化对设备级物联网僵尸网络检测的实用性。
摘要:In an effort to counter the increasing IoT botnet-based attacks, state-of-the-art deep learning methods have been proposed and have achieved impressive detection accuracy. However, their computational intensity restricts deployment on resource-constrained IoT devices, creating a critical need for lightweight detection models. A common solution to this challenge is model compression via quantization. This study proposes a VAE-MLP model framework where an MLP-based classifier is trained on 8-dimensional latent vectors derived from the high-dimensional train data using the encoder component of a pretrained variational autoencoder (VAE). Two widely used quantization strategies--Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ)--are then systematically evaluated in terms of their impact on detection performance, storage efficiency, and inference latency using two benchmark IoT botnet datasets--N-BaIoT and CICIoT2022. The results revealed that, with respect to detection accuracy, the QAT strategy experienced a more noticeable decline,whereas PTQ incurred only a marginal reduction compared to the original unquantized model. Furthermore, PTQ yielded a 6x speedup and 21x reduction in size, while QAT achieved a 3x speedup and 24x compression, demonstrating the practicality of quantization for device-level IoT botnet detection.
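A minimal post-training quantization sketch: symmetric per-tensor int8 quantization of a weight matrix, giving the 4x storage reduction over float32 that motivates PTQ. The paper's exact quantization scheme, and its reported 21x end-to-end size reduction, involve more than this single-layer illustration.

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor scheme: scale chosen from the max absolute weight
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 8)).astype(np.float32)  # e.g. an MLP layer on 8-dim latents
q, scale = quantize_int8(w)
max_err = float(np.abs(w - dequantize(q, scale)).max())
size_ratio = w.nbytes / q.nbytes                           # float32 -> int8 storage
```

Rounding error is bounded by half the quantization step, while storage shrinks by exactly 4x.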


【3】Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
标题:使用代理多语言翻译模型的自动机器翻译检测
链接:https://arxiv.org/abs/2511.02958

作者:Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez
备注:Pre-MIT Press publication version
摘要:现代机器翻译(MT)系统依赖于大型并行语料库,通常从互联网上收集。然而,最近的证据表明,(i)这些文本中有很大一部分是机器生成的翻译,(ii)过度依赖训练数据中的这种合成内容会显著降低翻译质量。因此,过滤掉非人工翻译正成为构建高质量机器翻译系统的重要预处理步骤。在这项工作中,我们提出了一种新的方法,直接利用代理多语言MT模型的内部表示来区分人类和机器翻译的句子。实验结果表明,我们的方法优于当前最先进的技术,特别是对于非英语语言对,实现了至少5个百分点的准确性增益。
摘要:Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.


分类|识别(3篇)

【1】System Identification of a Moored ASV with Recessed Moon Pool via Deterministic and Bayesian Hankel-DMDc
标题:基于确定性和贝叶斯Hankel-DMDc的带凹入月池系泊ASV系统辨识
链接:https://arxiv.org/abs/2511.03482

作者:Giorgio Palma, Ivan Santic, Andrea Serani, Lorenzo Minno, Matteo Diez
备注:26 pages, 11 figures, 2 tables, 1 box
摘要:本研究针对系泊条件下的小型自主水面航行器(ASV),采用Hankel动态模式分解控制(HDMDc)及其贝叶斯扩展(BHDMDc)进行系统辨识。在CNR-INM的拖曳水池中,在不规则和规则的头浪条件下,在Codevintec CK-14 e ASV上进行了实验。正在研究的ASV具有一个凹入的月池,由于晃动引起非线性响应,从而增加了建模挑战。根据船舶运动和系泊载荷的测量结果建立了数据驱动的降阶模型。HDMDc框架提供了对血管动力学的准确确定性预测,而贝叶斯公式通过考虑超参数选择的可变性来实现模型响应的不确定性感知表征。对实验数据的验证表明,HDMDc和BHDMDc可以预测船舶的反应看不见的规则和不规则波激励。总之,该研究表明,基于HDMDC的ROM是系统识别的一种可行的数据驱动替代方案,首次展示了其对不同于训练集的海况的泛化能力,在再现船舶动力学方面实现了高精度。
摘要:This study addresses the system identification of a small autonomous surface vehicle (ASV) under moored conditions using Hankel dynamic mode decomposition with control (HDMDc) and its Bayesian extension (BHDMDc). Experiments were carried out on a Codevintec CK-14e ASV in the towing tank of CNR-INM, under both irregular and regular head-sea wave conditions. The ASV under investigation features a recessed moon pool, which induces nonlinear responses due to sloshing, thereby increasing the modelling challenge. Data-driven reduced-order models were built from measurements of vessel motions and mooring loads. The HDMDc framework provided accurate deterministic predictions of vessel dynamics, while the Bayesian formulation enabled uncertainty-aware characterization of the model response by accounting for variability in hyperparameter selection. Validation against experimental data demonstrated that both HDMDc and BHDMDc can predict the vessel's response to unseen regular and irregular wave excitations. In conclusion, the study shows that HDMDc-based ROMs are a viable data-driven alternative for system identification, demonstrating for the first time their generalization capability for a sea condition different from the training set, achieving high accuracy in reproducing vessel dynamics.
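The HDMDc identification step can be sketched as delay-embedding a measured output into Hankel states and fitting z_{k+1} = A z_k + B u_k by least squares. A scalar second-order toy system stands in for the moored-vessel data; single output channel and unit delay spacing are simplifying assumptions.

```python
import numpy as np

def hdmdc_fit(y, u, delays=2):
    # Hankel (delay-embedded) states z_k = [y_k, ..., y_{k+delays-1}]
    H = np.stack([y[i:i + delays] for i in range(len(y) - delays + 1)]).T
    Z0, Z1 = H[:, :-1], H[:, 1:]
    # control input aligned with the transition of the newest Hankel entry
    U = u[delays - 1: delays - 1 + Z0.shape[1]][None, :]
    G = Z1 @ np.linalg.pinv(np.vstack([Z0, U]))   # least-squares [A | B]
    return G[:, :delays], G[:, delays:]

# toy second-order system standing in for the measured vessel response
rng = np.random.default_rng(0)
u = rng.normal(size=300)
y = np.zeros(300)
for k in range(1, 299):
    y[k + 1] = 1.5 * y[k] - 0.7 * y[k - 1] + 0.2 * u[k]

A, B = hdmdc_fit(y, u, delays=2)
z = np.array([y[10], y[11]])              # one-step prediction check
pred = A @ z + (B * u[11]).ravel()
```

Because the toy data are noise-free and linear, the identified model reproduces the next Hankel state essentially exactly.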


【2】A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications
标题:用于交通运输代理人工智能应用中多标签意图识别的模块化、无数据管道
链接:https://arxiv.org/abs/2511.03363

作者:Xiaocai Zhang, Hur Lim, Ke Wang, Zhe Xiao, Jing Wang, Kelvin Lee, Xiuju Fu, Zheng Qin
备注:Present in the Transportation Research Board (TRB) Annual Meeting 2026
摘要:在这项研究中,提出了一种用于多标签意图识别的模块化、无数据管道,用于交通领域的代理人工智能应用。与传统意图识别系统依赖大型标注语料库且经常难以进行细粒度多标签判别不同,我们的方法消除了对昂贵数据收集的需要,同时提高了多标签意图理解的准确性。具体来说,整个管道(命名为DMTC)包括三个步骤:1)使用提示工程来引导大型语言模型(LLM)在不同的交通场景中生成多样的合成查询; 2)使用Sentence-T5模型对每个文本查询进行编码,以获得紧凑的语义嵌入; 3)使用强调硬样本并最大化类间可分性的新型在线焦点对比(OFC)损失来训练轻量级分类器。所提管道的适用性在海事运输背景下的代理人工智能应用中得到了证明。大量实验表明,DMTC实现了5.35%的汉明损失和95.92%的AUC,优于最先进的多标签分类器和最近基于端到端SOTA LLM的基线。进一步的分析表明,Sentence-T5嵌入比替代编码器至少提高了3.29%的子集准确度,并且与标准对比目标相比,整合OFC损失产生了额外0.98%的增益。总之,我们的系统无缝地将用户查询路由到特定于任务的模块(例如,ETA信息、交通风险评估和交通领域的其他典型场景),为完全自主、意图感知的代理奠定基础,而无需昂贵的人工标注。
摘要:In this study, a modular, data-free pipeline for multi-label intention recognition is proposed for agentic AI applications in transportation. Unlike traditional intent recognition systems that depend on large, annotated corpora and often struggle with fine-grained, multi-label discrimination, our approach eliminates the need for costly data collection while enhancing the accuracy of multi-label intention understanding. Specifically, the overall pipeline, named DMTC, consists of three steps: 1) using prompt engineering to guide large language models (LLMs) to generate diverse synthetic queries in different transport scenarios; 2) encoding each textual query with a Sentence-T5 model to obtain compact semantic embeddings; 3) training a lightweight classifier using a novel online focal-contrastive (OFC) loss that emphasizes hard samples and maximizes inter-class separability. The applicability of the proposed pipeline is demonstrated in an agentic AI application in the maritime transportation context. Extensive experiments show that DMTC achieves a Hamming loss of 5.35% and an AUC of 95.92%, outperforming state-of-the-art multi-label classifiers and recent end-to-end SOTA LLM-based baselines. Further analysis reveals that Sentence-T5 embeddings improve subset accuracy by at least 3.29% over alternative encoders, and integrating the OFC loss yields an additional 0.98% gain compared to standard contrastive objectives. In conclusion, our system seamlessly routes user queries to task-specific modules (e.g., ETA information, traffic risk evaluation, and other typical scenarios in the transportation domain), laying the groundwork for fully autonomous, intention-aware agents without costly manual labelling.
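One plausible reading of a focal-contrastive objective, shown as a sketch: a pairwise contrastive loss whose terms are re-weighted by (1 - p)^gamma so that hard pairs dominate, where p measures how well a pair already satisfies its constraint. This illustrates the idea only; it is not the paper's exact OFC formulation.

```python
import numpy as np

def focal_contrastive_loss(emb, labels, margin=1.0, gamma=2.0):
    total, count = 0.0, 0
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            d = float(np.linalg.norm(emb[i] - emb[j]))
            if labels[i] == labels[j]:
                p = np.exp(-d)                  # close same-class pair = easy
                base = d ** 2
            else:
                p = min(d / margin, 1.0)        # far different-class pair = easy
                base = max(margin - d, 0.0) ** 2
            total += (1.0 - p) ** gamma * base  # focal re-weighting of hard pairs
            count += 1
    return total / count

labels = [0, 0, 1, 1]
good = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]])  # separated classes
bad = np.array([[0.0, 0.0], [1.5, 1.5], [0.1, 0.1], [1.4, 1.4]])   # entangled classes
loss_good = focal_contrastive_loss(good, labels)
loss_bad = focal_contrastive_loss(bad, labels)
```

Well-separated embeddings incur a much smaller loss than entangled ones, which is the inter-class-separability pressure the OFC loss is meant to apply.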


【3】An Efficient Classification Model for Cyber Text
标题:一种高效的网络文本分类模型
链接:https://arxiv.org/abs/2511.03107

作者:Md Sakhawat Hossen, Md. Zashid Iqbal Borshon, A. S. M. Badrudduza
摘要:近年来,深度学习方法和实践的兴起带来了严重的后果,即由于对计算资源和算力的无止境需求而增加了碳足迹。文本分析领域也在这种方法论垄断的趋势中经历了巨大的转变。本文对原始TF-IDF算法进行了改进,提出了Clement词频-逆文档频率(CTF-IDF)算法用于数据预处理。本文主要讨论了结合CTF-IDF与更快的IRLBA降维算法时,经典机器学习技术在文本分析中的有效性。在传统文本分析管道中引入这两种技术,与深度学习方法相比,在碳足迹方面确保了更高效、更快且计算密集度更低的应用,而在准确性上仅有轻微妥协。实验结果还表明,本文进一步讨论的经典机器学习方法在时间复杂度上有多倍降低,并且模型精度有所提高。
摘要:The uprising of deep learning methodology and practice in recent years has brought about a severe consequence of increasing carbon footprint due to the insatiable demand for computational resources and power. The field of text analytics also experienced a massive transformation in this trend of monopolizing methodology. In this paper, the original TF-IDF algorithm has been modified, and Clement Term Frequency-Inverse Document Frequency (CTF-IDF) has been proposed for data preprocessing. This paper primarily discusses the effectiveness of classical machine learning techniques in text analytics with CTF-IDF and a faster IRLBA algorithm for dimensionality reduction. The introduction of both of these techniques in the conventional text analytics pipeline ensures a more efficient, faster, and less computationally intensive application when compared with deep learning methodology regarding carbon footprint, with minor compromise in accuracy. The experimental results also exhibit a manifold of reduction in time complexity and improvement of model accuracy for the classical machine learning methods discussed further in this paper.
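The CTF-IDF modification itself is not specified in the abstract, so the sketch below shows the classical TF-IDF pipeline it departs from, with sublinear (log-scaled) term frequency as one common tweak; the toy corpus is hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs, sublinear=True):
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}       # smoothed IDF
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = []
        for t in vocab:
            f = float(tf[t])
            if sublinear and f > 0:
                f = 1.0 + math.log(f)                          # log-scaled TF
            row.append(f * idf[t])
        rows.append(row)
    return vocab, rows

docs = ["phishing email detected", "spam email filter", "network intrusion detected"]
vocab, X = tfidf_matrix(docs)
```

Rare terms ("phishing") receive larger weights than common ones ("email"), which is the signal the classifier downstream exploits.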


表征(1篇)

【1】Heterogeneous Metamaterials Design via Multiscale Neural Implicit Representation
标题:基于多尺度神经隐式表示的异类超材料设计
链接:https://arxiv.org/abs/2511.03012

作者:Hongrui Chen, Liwei Wang, Levent Burak Kara
摘要:超材料是由特殊设计的单元组成的工程材料,其表现出超出天然材料的非凡特性。复杂的工程任务通常需要异构的单位单元,以适应空间变化的性能要求。然而,由于巨大的设计空间和相邻单元之间严格的兼容性要求,设计异质超材料带来了重大挑战。传统的并行多尺度设计方法需要为每个单位单元求解一个昂贵的优化问题,并经常在单元边界处出现不连续。另一方面,从固定的微结构库组装结构的数据驱动方法受到数据集的限制,需要额外的后处理以确保无缝连接。在这项工作中,我们提出了一个基于神经网络的超材料设计框架,该框架可以学习结构的连续双尺度表示,从而共同应对这些挑战。我们框架的核心是一个多尺度神经表示,其中神经网络将全局(宏观尺度)和局部(微观尺度)坐标作为输入,输出一个隐式场,该场表示跨域具有兼容单位单元几何形状的多尺度结构,而不需要预定义的数据集。我们在训练过程中使用兼容性损失项来加强相邻单位单元之间的连接。一旦经过训练,该网络就可以以任意高的分辨率生成超材料设计,从而实现制造或仿真的无限上采样。我们在机械超材料设计、负泊松比和机械隐身问题上证明了所提出方法的有效性,这些问题在机器人、生物工程和航空航天中具有潜在应用。
摘要:Metamaterials are engineered materials composed of specially designed unit cells that exhibit extraordinary properties beyond those of natural materials. Complex engineering tasks often require heterogeneous unit cells to accommodate spatially varying property requirements. However, designing heterogeneous metamaterials poses significant challenges due to the enormous design space and strict compatibility requirements between neighboring cells. Traditional concurrent multiscale design methods require solving an expensive optimization problem for each unit cell and often suffer from discontinuities at cell boundaries. On the other hand, data-driven approaches that assemble structures from a fixed library of microstructures are limited by the dataset and require additional post-processing to ensure seamless connections. In this work, we propose a neural network-based metamaterial design framework that learns a continuous two-scale representation of the structure, thereby jointly addressing these challenges. Central to our framework is a multiscale neural representation in which the neural network takes both global (macroscale) and local (microscale) coordinates as inputs, outputting an implicit field that represents multiscale structures with compatible unit cell geometries across the domain, without the need for a predefined dataset. We use a compatibility loss term during training to enforce connectivity between adjacent unit cells. Once trained, the network can produce metamaterial designs at arbitrarily high resolution, hence enabling infinite upsampling for fabrication or simulation. We demonstrate the effectiveness of the proposed approach on mechanical metamaterial design, negative Poisson's ratio, and mechanical cloaking problems with potential applications in robotics, bioengineering, and aerospace.


优化|敛散性(3篇)

【1】Flat Minima and Generalization: Insights from Stochastic Convex Optimization
标题:平坦极小值和推广:随机凸优化的见解
链接:https://arxiv.org/abs/2511.03548

作者:Matan Schliserman, Shira Vansover-Hager, Tomer Koren
摘要:理解学习算法的泛化行为是学习理论的核心目标。最近出现的一种解释是,学习算法在实践中之所以成功,是因为它们收敛到平坦的最小值,而平坦最小值一直与更好的泛化性能相关联。在这项工作中,我们在具有非负、$\beta$光滑目标的随机凸优化这一经典设定下研究平坦最小值与泛化之间的联系。我们的第一个发现是,即使在这个基本且被充分研究的设定中,平坦的经验最小值也可能导致平凡的$\Omega(1)$总体风险,而尖锐的最小值却能最优地泛化。然后,我们证明这种糟糕的泛化行为延伸到最初由Foret等人(2021)提出的两种旨在将优化偏向平坦解的自然"锐度感知"算法:锐度感知梯度下降(SA-GD)和锐度感知最小化(SAM)。对于在预定义邻域内的最大损失上执行梯度步骤的SA-GD,我们证明虽然它以快速的速率成功收敛到平坦最小值,但解的总体风险仍然可以大到$\Omega(1)$,这表明即使是用锐度感知梯度方法通过算法找到的平坦最小值也可能泛化不良。对于SAM,即基于归一化上升步骤的SA-GD的计算高效近似,我们表明虽然它最小化了经验损失,但它可能收敛到尖锐的最小值,并同样导致$\Omega(1)$的总体风险。最后,我们使用算法稳定性技术为SA-GD和SAM建立了总体风险上界。
摘要:Understanding the generalization behavior of learning algorithms is a central goal of learning theory. A recently emerging explanation is that learning algorithms are successful in practice because they converge to flat minima, which have been consistently associated with improved generalization performance. In this work, we study the link between flat minima and generalization in the canonical setting of stochastic convex optimization with a non-negative, $\beta$-smooth objective. Our first finding is that, even in this fundamental and well-studied setting, flat empirical minima may incur trivial $\Omega(1)$ population risk while sharp minima generalizes optimally. Then, we show that this poor generalization behavior extends to two natural ''sharpness-aware'' algorithms originally proposed by Foret et al. (2021), designed to bias optimization toward flat solutions: Sharpness-Aware Gradient Descent (SA-GD) and Sharpness-Aware Minimization (SAM). For SA-GD, which performs gradient steps on the maximal loss in a predefined neighborhood, we prove that while it successfully converges to a flat minimum at a fast rate, the population risk of the solution can still be as large as $\Omega(1)$, indicating that even flat minima found algorithmically using a sharpness-aware gradient method might generalize poorly. For SAM, a computationally efficient approximation of SA-GD based on normalized ascent steps, we show that although it minimizes the empirical loss, it may converge to a sharp minimum and also incur population risk $\Omega(1)$. Finally, we establish population risk upper bounds for both SA-GD and SAM using algorithmic stability techniques.
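上述 SA-GD/SAM 的核心更新(先沿归一化的梯度方向做上升扰动,再在扰动点处取梯度做下降)可以用如下最小草图表示。这只是按 Foret 等人(2021)的公式写出的玩具实现,目标函数与步长均为假设,并非本文分析所用的代码。

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step (sketch).

    grad_fn: gradient of the empirical loss at a point.
    rho: radius of the neighborhood used for the ascent perturbation.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent step
    g_sam = grad_fn(w + eps)                     # gradient at the perturbed point
    return w - lr * g_sam

# toy smooth convex loss: L(w) = 0.5 * ||w||^2, gradient = w
grad = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad)
print(np.linalg.norm(w))   # shrinks toward the (flat) minimum at 0
```

本文的要点恰是:即便这样的更新确实收敛到了平坦的经验极小值,其总体风险仍可能是 $\Omega(1)$。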


【2】A Support-Set Algorithm for Optimization Problems with Nonnegative and Orthogonal Constraints
标题:具有非负和正交约束的优化问题的支撑集算法
链接:https://arxiv.org/abs/2511.03443

作者:Lei Wang, Xin Liu, Xiaojun Chen
摘要:本文研究带非负与正交约束的优化问题,其中任意大小为 $n \times p$ 的可行矩阵都呈现每行至多一个非零元的稀疏模式。我们的分析表明:固定支撑集后,目标函数近端线性化所对应的最小化子问题的全局解可以闭式计算,且至多含 $n$ 个非零元。利用这一结构性质可显著提升计算效率。据此,我们提出一种严格保持迭代点可行性的支撑集算法,其核心是一个经过精心设计、用于调整非零元位置的支撑集更新方案。我们建立了支撑集算法到一阶驻点的全局收敛性,并证明其达到 $\epsilon$-近似一阶驻点所需的迭代复杂度为 $O(\epsilon^{-2})$。数值结果有力支持了我们的算法在非负 PCA、聚类和社区检测等实际应用中的表现。
摘要:In this paper, we investigate optimization problems with nonnegative and orthogonal constraints, where any feasible matrix of size $n \times p$ exhibits a sparsity pattern such that each row accommodates at most one nonzero entry. Our analysis demonstrates that, by fixing the support set, the global solution of the minimization subproblem for the proximal linearization of the objective function can be computed in closed form with at most $n$ nonzero entries. Exploiting this structural property offers a powerful avenue for dramatically enhancing computational efficiency. Guided by this insight, we propose a support-set algorithm preserving strictly the feasibility of iterates. A central ingredient is a strategically devised update scheme for support sets that adjusts the placement of nonzero entries. We establish the global convergence of the support-set algorithm to a first-order stationary point, and show that its iteration complexity required to reach an $\epsilon$-approximate first-order stationary point is $O (\epsilon^{-2})$. Numerical results are strongly in favor of our algorithm in real-world applications, including nonnegative PCA, clustering, and community detection.
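"每行至多一个非零元"的可行结构可以用一个假设性的小例子直观说明:固定支撑集(这里简单取每行最大元所在列),截断负值以保证非负性,再对列归一化使 $X^\top X$ 成为对角阵。这仅用于演示该稀疏结构,并非论文中的闭式子问题解或支撑集更新方案。

```python
import numpy as np

def project_support(C):
    """Keep at most one nonnegative entry per row (the largest one),
    then normalize nonzero columns so X^T X is diagonal with unit entries.
    Illustrative sketch only, not the paper's exact update."""
    n, p = C.shape
    X = np.zeros_like(C)
    rows = np.arange(n)
    cols = C.argmax(axis=1)                   # support: column of the largest entry per row
    X[rows, cols] = np.maximum(C[rows, cols], 0.0)   # clip to enforce nonnegativity
    norms = np.linalg.norm(X, axis=0)
    X[:, norms > 0] /= norms[norms > 0]       # unit-norm columns with disjoint supports
    return X

rng = np.random.default_rng(0)
X = project_support(rng.normal(size=(6, 3)))
print((np.count_nonzero(X, axis=1) <= 1).all())   # each row has at most one nonzero
```

由于各列的支撑互不相交,$X^\top X$ 自动为对角阵,这正是非负与正交约束共同蕴含的结构。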


【3】Min-Max Optimization Is Strictly Easier Than Variational Inequalities
标题:最小-最大优化严格比变分不等式更容易
链接:https://arxiv.org/abs/2511.03052

作者:Henry Shugart, Jason M. Altschuler
摘要:经典做法中,求解凸-凹极小-极大问题的主流途径是转而求解由其一阶最优性条件导出的变分不等式问题。绕过这一归约,能否更快地求解极小-极大问题?本文开启了这一研究。我们证明,在无约束二次目标这一教科书式设定中答案是肯定的:一阶算法在极小-极大问题上的最优收敛速率严格优于相应的变分不等式。极小-极大算法之所以能更快,关键在于它们可以利用 min 变量与 max 变量之间的不对称性,而这一性质在归约为变分不等式时被丢失。我们分析的核心是用极值多项式对最优收敛速率的精确刻画,并借助格林函数与保形映射对其进行计算。
摘要:Classically, a mainstream approach for solving a convex-concave min-max problem is to instead solve the variational inequality problem arising from its first-order optimality conditions. Is it possible to solve min-max problems faster by bypassing this reduction? This paper initiates this investigation. We show that the answer is yes in the textbook setting of unconstrained quadratic objectives: the optimal convergence rate for first-order algorithms is strictly better for min-max problems than for the corresponding variational inequalities. The key reason that min-max algorithms can be faster is that they can exploit the asymmetry of the min and max variables--a property that is lost in the reduction to variational inequalities. Central to our analyses are sharp characterizations of optimal convergence rates in terms of extremal polynomials which we compute using Green's functions and conformal mappings.
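对于无约束的强凸-强凹二次目标,直接的梯度下降-上升即可收敛到鞍点,无需先化为变分不等式;下面是一个玩具示例(目标与步长均为假设,并非本文所刻画的最优算法)。

```python
import numpy as np

# toy strongly-convex-strongly-concave objective:
# f(x, y) = 0.5*x^2 - 0.5*y^2 + x*y, saddle point at (0, 0)
def grads(x, y):
    return x + y, x - y   # (df/dx, df/dy)

x, y, eta = 1.0, -1.0, 0.2
for _ in range(200):
    gx, gy = grads(x, y)
    x, y = x - eta * gx, y + eta * gy   # descend in x, ascend in y
print(abs(x) + abs(y))   # converges to the saddle point
```

本文的结论正是关于此类二次问题:对 min 与 max 变量区别对待(这里体现在一降一升)的一阶方法,可以达到严格快于变分不等式归约的最优速率。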


预测|估计(6篇)

【1】Financial Management System for SMEs: Real-World Deployment of Accounts Receivable and Cash Flow Prediction
标题:中小企业财务管理系统:应收账款和现金流预测的现实部署
链接:https://arxiv.org/abs/2511.03631

作者:Bartłomiej Małkus, Szymon Bobek, Grzegorz J. Nalepa
备注:11 pages, 1 figure
摘要:中小型企业(SME),特别是自由职业者和早期企业,由于资源有限、客户群小和数据可用性受限,面临着独特的财务管理挑战。本文介绍了一个综合财务预测系统的开发与部署,该系统结合应收账款预测与现金流预测,专门针对中小企业的经营约束而设计。我们的系统弥合了以大型企业为中心的金融工具与自由职业者及小企业实际需求之间的差距。该解决方案集成了两个关键组件:用于预测发票支付延迟的二元分类模型,以及用于处理不完整且有限历史数据的多模块现金流预测模型。原型系统已实现并部署为网络应用程序,并集成到 Cluee 的平台上(Cluee 是一家为自由职业者提供财务管理工具的初创公司),展示了其在现实世界中小企业财务管理中的实际可行性。
摘要:Small and Medium Enterprises (SMEs), particularly freelancers and early-stage businesses, face unique financial management challenges due to limited resources, small customer bases, and constrained data availability. This paper presents the development and deployment of an integrated financial prediction system that combines accounts receivable prediction and cash flow forecasting specifically designed for SME operational constraints. Our system addresses the gap between enterprise-focused financial tools and the practical needs of freelancers and small businesses. The solution integrates two key components: a binary classification model for predicting invoice payment delays, and a multi-module cash flow forecasting model that handles incomplete and limited historical data. A prototype system has been implemented and deployed as a web application with integration into Cluee's platform, a startup providing financial management tools for freelancers, demonstrating practical feasibility for real-world SME financial management.


【2】Forecast2Anomaly (F2A): Adapting Multivariate Time Series Foundation Models for Anomaly Prediction
标题:Forecast2Anomaly(F2A):使多元时间序列基础模型适配异常预测
链接:https://arxiv.org/abs/2511.03149

作者:Atif Hassan, Tarun Kumar, Ashish Mishra, Sergey Serebryakov, Satish Kumar Mopur, Phanidhar Koganti, Murthy Chelankuri, Ramanagopal Vogety, Suparna Bhattacharya, Martin Foltin
摘要:预测来自不同现实世界、动态和复杂系统的多变量时间序列中的异常(异常预测)对于抢占关键故障至关重要,从而大幅降低运营成本和人力。然而,现有的方法仅限于特定的系统,而不能概括为随着时间的推移不断变化的异常模式。相比之下,预训练的时间序列基础模型(TSFM)最近表现出强大的泛化和zero-shot预测能力。然而,他们的潜力仍然未开发的异常预测,从根本上不同于预测正常行为的任务。因此,我们提出了Forecast2Anomaly(F2A),这是一个新的框架,通过两个关键的创新,使TSFM具有异常预测能力。首先,我们提出了一个联合预测异常损失,微调TSFM,以准确地预测未来的信号,即使在异常的时间点。其次,我们引入了一个检索增强生成(RAG)模块,检索历史上相关的视野和条件对他们的预测。该组件在推理时动态适应分布变化,使F2A能够跟踪不断变化的异常,而无需更新模型。通过将目标微调与动态检索相结合,F2A弥合了稳健TSFM zero-shot预测和zero-shot异常预测之间的差距。在16个不同的数据集和多个TSFM主干上进行的广泛实验表明,F2A始终优于最先进的方法,为现实世界的应用提供了可扩展的zero-shot异常预测解决方案。
摘要:Forecasting anomalies (anomaly prediction) in multivariate time series from different real-world, dynamic, and complex systems is vital for preempting critical failures, leading to a substantial minimization in operational costs and human labor. Yet, existing methods are limited to specific systems while failing to generalize to evolving anomaly patterns over time. In contrast, pretrained Time Series Foundation Models (TSFMs) have recently demonstrated strong generalization and zero-shot forecasting capabilities. However, their potential remains untapped for anomaly prediction, a task fundamentally different from forecasting normal behavior. Thus, we present Forecast2Anomaly (F2A), a novel framework that empowers TSFMs with anomaly prediction abilities through two key innovations. First, we propose a joint forecast-anomaly loss that fine-tunes TSFMs to accurately forecast future signals even at anomalous time points. Second, we introduce a Retrieval-Augmented Generation (RAG) module that retrieves historically relevant horizons and conditions predictions on them. This component dynamically adapts to distributional shifts at inference time, enabling F2A to track evolving anomalies without requiring model updates. By combining targeted fine-tuning with dynamic retrieval, F2A bridges the gap between robust TSFM zero-shot forecasting and zero-shot anomaly prediction. Extensive experiments across 16 diverse datasets and multiple TSFM backbones show that F2A consistently outperforms state-of-the-art methods, offering a scalable, zero-shot anomaly prediction solution for real-world applications.
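F2A 中"检索历史相关视野并以其为条件"的思路,可以用一个最近邻检索的最小草图说明:以滑动窗口构建历史键值库,按与当前窗口的欧氏距离取回最相似片段之后的未来视野。函数名与参数均为示意性假设,并非论文的 RAG 模块实现。

```python
import numpy as np

def retrieve_horizons(series, window, horizon, query, k=2):
    """Return the k future horizons whose preceding windows are closest to `query`."""
    keys, values = [], []
    for t in range(len(series) - window - horizon + 1):
        keys.append(series[t:t + window])                    # historical window (key)
        values.append(series[t + window:t + window + horizon])  # its future horizon (value)
    keys, values = np.array(keys), np.array(values)
    dists = np.linalg.norm(keys - query, axis=1)
    return values[np.argsort(dists)[:k]]

t = np.arange(200)
series = np.sin(0.3 * t)                 # toy periodic signal
query = series[-5:]                      # the most recent window
hor = retrieve_horizons(series[:-5], window=5, horizon=3, query=query)
print(hor.shape)   # (2, 3): two retrieved candidate horizons
```

推理时将取回的视野作为条件输入,即可在分布漂移时动态适应,而无需更新模型参数。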


【3】Towards Scalable Backpropagation-Free Gradient Estimation
标题:迈向可扩展的无反向传播梯度估计
链接:https://arxiv.org/abs/2511.03110

作者:Daniel Wang, Evan Markou, Dylan Campbell
备注:12 pages, 2 figures, Accepted to AJCAI 2025
摘要:虽然反向传播-反向模式自动微分-在深度学习中非常成功,但它需要通过神经网络的两次传递(向前和向后)以及中间激活的存储。现有的梯度估计方法,而不是使用前向模式自动微分的斗争,以扩大规模超过小型网络,由于高方差的估计。迄今为止,为减轻这一问题所作的努力给估计数带来了重大偏差,降低了其效用。我们引入了一种梯度估计方法,通过在计算猜测方向时操纵上游雅可比矩阵来降低偏差和方差。它显示出有希望的结果,并有可能扩展到更大的网络,确实随着网络宽度的增加而表现得更好。通过分析偏差和方差,以及它们与神经网络梯度的低维结构的联系,有助于我们理解这种方法。
摘要:While backpropagation--reverse-mode automatic differentiation--has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations. Existing gradient estimation methods that instead use forward-mode automatic differentiation struggle to scale beyond small networks due to the high variance of the estimates. Efforts to mitigate this have so far introduced significant bias to the estimates, reducing their utility. We introduce a gradient estimation approach that reduces both bias and variance by manipulating upstream Jacobian matrices when computing guess directions. It shows promising results and has the potential to scale to larger networks, indeed performing better as the network width is increased. Our understanding of this method is facilitated by analyses of bias and variance, and their connection to the low-dimensional structure of neural network gradients.
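前向模式梯度估计的基本形式是:取随机方向 $v$,用方向导数构造估计 $\hat g=(\nabla f\cdot v)\,v$,其期望等于真实梯度但方差较高,而高方差正是本文试图降低的量。下面用一个二次函数在数值上验证其无偏性(仅为背景性示意,并非本文提出的降偏差、降方差方法)。

```python
import numpy as np

def forward_gradient(grad_f, w, rng):
    """One-sample forward-mode gradient estimate: (grad . v) * v."""
    v = rng.standard_normal(w.shape)   # random guess direction, E[v v^T] = I
    return (grad_f(w) @ v) * v         # directional derivative times the direction

grad_f = lambda w: w                   # gradient of f(w) = 0.5 * ||w||^2
w = np.array([1.0, 2.0, -3.0])
rng = np.random.default_rng(0)
est = np.mean([forward_gradient(grad_f, w, rng) for _ in range(20000)], axis=0)
print(np.linalg.norm(est - grad_f(w)))  # small: the estimator is unbiased
```

实际中方向导数 $(\nabla f\cdot v)$ 由一次前向传播(JVP)得到,无需反向传播或存储中间激活;代价是估计方差随维度增长,这正是此类方法难以扩展的原因。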


【4】Predicting Weekly Fishing Concentration Zones through Deep Learning Integration of Heterogeneous Environmental Spatial Datasets
标题 :通过深度学习集成异类环境空间数据预测每周捕捞集中区
链接:https://arxiv.org/abs/2511.02887

作者:Chaitanya Rele, Aditya Rathod, Kaustubh Natu, Saurabh Kulkarni, Ajay Koli, Swapnali Makdey
摘要:北印度洋,包括阿拉伯海和孟加拉湾,是沿海社区的重要生计来源,但渔民在寻找生产性渔场方面往往面临不确定性。为了应对这一挑战,我们提出了一个人工智能辅助框架,用于使用海洋学参数(如海面温度和叶绿素浓度)预测潜在捕鱼区(PFZ)。该方法旨在提高PFZ识别的准确性,并为可持续捕捞方法提供针对具体区域的见解。初步结果表明,该框架可以支持渔民减少搜索时间,降低燃料消耗,并促进有效的资源利用。
摘要:The North Indian Ocean, including the Arabian Sea and the Bay of Bengal, represents a vital source of livelihood for coastal communities, yet fishermen often face uncertainty in locating productive fishing grounds. To address this challenge, we present an AI-assisted framework for predicting Potential Fishing Zones (PFZs) using oceanographic parameters such as sea surface temperature and chlorophyll concentration. The approach is designed to enhance the accuracy of PFZ identification and provide region-specific insights for sustainable fishing practices. Preliminary results indicate that the framework can support fishermen by reducing search time, lowering fuel consumption, and promoting efficient resource utilization.


【5】A Novel Reservoir Computing Framework for Chaotic Time Series Prediction Using Time Delay Embedding and Random Fourier Features
标题:利用时间延迟嵌入和随机傅里叶特征进行混沌时间序列预测的新型储备池计算框架
链接:https://arxiv.org/abs/2511.02877

作者:S. K. Laha
摘要:预测混沌时间序列需要既能捕获潜在吸引子的内在几何结构、又保持计算高效的模型。我们提出一种新颖的储备池计算(RC)框架,将时间延迟嵌入与随机傅里叶特征(RFF)映射相结合来构造动态储备池,从而无需传统的递归结构。与依赖高维递归连接的标准 RC 不同,所提出的 RFF-RC 显式地近似非线性核变换,以揭示重构相空间中的潜在动力学关系。这种混合表述有两个关键优点:(i)它为近似延迟坐标之间的复杂非线性相互作用提供了一种有原则的方法,从而丰富了储备池的有效动态表示;(ii)它减少了对谱半径、泄漏率等人工储备池超参数的依赖。我们在典型混沌系统(Mackey-Glass 方程、Lorenz 系统和 Kuramoto-Sivashinsky 方程)上评估了该框架。结果表明,RFF-RC 不仅取得更优的预测精度,还能得到稳健的吸引子重构与长程预测。这些结果说明,延迟嵌入与基于 RFF 的储备池相结合,通过把系统嵌入一个更丰富的特征空间揭示了新的动力学结构,为混沌动力学建模提供了一种计算高效且可解释的方法。
摘要:Forecasting chaotic time series requires models that can capture the intrinsic geometry of the underlying attractor while remaining computationally efficient. We introduce a novel reservoir computing (RC) framework that integrates time-delay embedding with Random Fourier Feature (RFF) mappings to construct a dynamical reservoir without the need for traditional recurrent architectures. Unlike standard RC, which relies on high-dimensional recurrent connectivity, the proposed RFF-RC explicitly approximates nonlinear kernel transformations that uncover latent dynamical relations in the reconstructed phase space. This hybrid formulation offers two key advantages: (i) it provides a principled way to approximate complex nonlinear interactions among delayed coordinates, thereby enriching the effective dynamical representation of the reservoir, and (ii) it reduces reliance on manual reservoir hyperparameters such as spectral radius and leaking rate. We evaluate the framework on canonical chaotic systems-the Mackey-Glass equation, the Lorenz system, and the Kuramoto-Sivashinsky equation. This novel formulation demonstrates that RFF-RC not only achieves superior prediction accuracy but also yields robust attractor reconstructions and long-horizon forecasts. These results show that the combination of delay embedding and RFF-based reservoirs reveals new dynamical structure by embedding the system in an enriched feature space, providing a computationally efficient and interpretable approach to modeling chaotic dynamics.
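"时间延迟嵌入 + 随机傅里叶特征 + 线性读出"的流程可以用如下最小草图复现(以 logistic 混沌映射为玩具序列;嵌入维数、特征数与带宽均为假设,并非论文设置)。

```python
import numpy as np

rng = np.random.default_rng(0)

# toy chaotic series: logistic map x_{t+1} = 3.9 * x_t * (1 - x_t)
x = np.empty(600)
x[0] = 0.3
for t in range(599):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

d, D = 3, 200                                           # delay dimension, number of RFFs
X = np.stack([x[t:t + d] for t in range(len(x) - d)])   # time-delay embedding
y = x[d:]                                               # one-step-ahead targets

W = rng.normal(scale=2.0, size=(d, D))                  # random frequencies
b = rng.uniform(0.0, 2.0 * np.pi, D)                    # random phases
Phi = np.cos(X @ W + b)                                 # random Fourier feature "reservoir"

# linear ridge-regression readout (closed form)
lam = 1e-6
w_out = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
mse = np.mean((Phi @ w_out - y) ** 2)
print(mse)   # small one-step training error
```

整个"储备池"只需一次随机初始化和一次线性最小二乘,无需训练递归权重,这正是该类方法计算高效的来源。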


【6】Spatio-Temporal Attention Network for Epileptic Seizure Prediction
标题:癫痫发作预测的时空注意力网络
链接:https://arxiv.org/abs/2511.02846

作者:Zan Li, Kyongmin Yeo, Wesley Gifford, Lara Marcuse, Madeline Fields, Bülent Yener
摘要:在这项研究中,我们提出了一个深度学习框架,通过时空注意力网络(STAN)学习EEG信号的复杂时空相关结构,以准确预测癫痫患者的癫痫发作。与现有的方法不同,这些方法依赖于特征工程和/或假设固定的发作前持续时间,我们的方法同时通过STAN对时空相关性进行建模,并采用对抗性神经网络来区分发作前和发作间期的注意力模式,从而实现患者特定的学习。在CHB-MIT和MSSM数据集上的评估表明,CHB-MIT的灵敏度为96.6%,错误检测率为0.011/h,MSSM的灵敏度为94.2%,FDR为0.063/h,显著优于最先进的方法。该框架可在发作前至少15分钟可靠地检测到发作前状态,患者特异性窗口可延长至45分钟,为临床应用提供足够的干预时间。
摘要:In this study, we present a deep learning framework that learns complex spatio-temporal correlation structures of EEG signals through a Spatio-Temporal Attention Network (STAN) for accurate predictions of onset of seizures for Epilepsy patients. Unlike existing methods, which rely on feature engineering and/or assume fixed preictal durations, our approach simultaneously models spatio-temporal correlations through STAN and employs an adversarial discriminator to distinguish preictal from interictal attention patterns, enabling patient-specific learning. Evaluation on CHB-MIT and MSSM datasets demonstrates 96.6\% sensitivity with 0.011/h false detection rate on CHB-MIT, and 94.2% sensitivity with 0.063/h FDR on MSSM, significantly outperforming state-of-the-art methods. The framework reliably detects preictal states at least 15 minutes before an onset, with patient-specific windows extending to 45 minutes, providing sufficient intervention time for clinical applications.


其他神经网络|深度学习|模型|建模(17篇)

【1】CLAX: Fast and Flexible Neural Click Models in JAX
标题:CLAX:JAX中快速灵活的神经点击模型
链接:https://arxiv.org/abs/2511.03620

作者:Philipp Hager, Onno Zoeter, Maarten de Rijke
摘要:CLAX 是一个基于 JAX 的库,它使用现代的基于梯度的优化来实现经典点击模型。虽然神经点击模型在过去十年中不断涌现,但基于概率图模型(PGM)的复杂点击模型尚未系统地采用基于梯度的优化,这使从业者无法在保留经典模型可解释性的同时利用现代深度学习框架。CLAX 通过以数值稳定的方式将基于 EM 的优化替换为直接的基于梯度的优化来弥补这一差距。该框架的模块化设计允许将任意组件(从嵌入、深度网络到自定义模块)集成到经典点击模型中进行端到端优化。我们通过在完整的 Baidu-ULTR 数据集(包含超过 10 亿条用户会话)上运行实验来证明 CLAX 的效率:在单个 GPU 上约 2 小时即可完成,比传统 EM 方法快几个数量级。CLAX 实现了十种经典点击模型,既服务于希望理解用户行为并大规模提升排序性能的工业界从业者,也服务于开发新点击模型的研究人员。CLAX 可从以下网址获得:https://github.com/philipphager/clax
摘要 :CLAX is a JAX-based library that implements classic click models using modern gradient-based optimization. While neural click models have emerged over the past decade, complex click models based on probabilistic graphical models (PGMs) have not systematically adopted gradient-based optimization, preventing practitioners from leveraging modern deep learning frameworks while preserving the interpretability of classic models. CLAX addresses this gap by replacing EM-based optimization with direct gradient-based optimization in a numerically stable manner. The framework's modular design enables the integration of any component, from embeddings and deep networks to custom modules, into classic click models for end-to-end optimization. We demonstrate CLAX's efficiency by running experiments on the full Baidu-ULTR dataset comprising over a billion user sessions in $\approx$ 2 hours on a single GPU, orders of magnitude faster than traditional EM approaches. CLAX implements ten classic click models, serving both industry practitioners seeking to understand user behavior and improve ranking performance at scale and researchers developing new click models. CLAX is available at: https://github.com/philipphager/clax
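"用梯度法而非 EM 训练经典点击模型"可以用位置偏置模型(PBM)的一个玩具实现说明:P(点击)=P(检验|位置)·P(相关|文档),对伯努利对数似然做梯度上升。这里为简化假设检验概率已知、只学习相关性参数;这是 numpy 草图,并非 CLAX 的 JAX 代码。

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# simulate clicks from a ground-truth position-based model (PBM)
exam = np.array([0.9, 0.6, 0.3])            # P(examination | rank), assumed known here
true_rel = np.array([0.8, 0.2, 0.5, 0.1])   # P(relevance | doc), to be learned
n = 20000
docs = np.stack([rng.permutation(4)[:3] for _ in range(n)])  # ranking of 3 docs per session
clicks = (rng.random((n, 3)) < exam * true_rel[docs]).astype(float)

# gradient ascent on the Bernoulli log-likelihood of P(click) = exam * sigmoid(z_doc)
z = np.zeros(4)
for _ in range(1000):
    b = sigmoid(z[docs])                    # relevance probability per shown doc
    p = exam * b                            # model click probability
    w = (clicks - p) / (p * (1.0 - p))      # dLL/dp
    g = np.bincount(docs.ravel(), weights=(w * exam * b * (1.0 - b)).ravel(), minlength=4)
    z += g / n                              # ascent step (lr = 1)
print(np.abs(sigmoid(z) - true_rel).max()) # learned relevance close to ground truth
```

将此处的手写梯度换成自动微分与优化器,并把检验概率也参数化为可学习量,即可得到 CLAX 式的端到端训练流程。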


【2】Tensor-Efficient High-Dimensional Q-learning
标题:张量高效的高维Q学习
链接:https://arxiv.org/abs/2511.03595

作者:Junyi Wu, Dan Li
摘要:高维强化学习面临着计算复杂和大型状态-动作空间中样本效率低的挑战。Q-学习算法特别难以克服维数灾难,其中状态-动作对的数量随着问题的大小呈指数级增长。虽然基于神经网络的方法(如Deep Q-Networks)已经取得了成功,但最近使用低秩分解的基于张量的方法提供了更有效的参数选择。在现有的基于张量的方法的基础上,我们提出了张量高效Q学习(TEQL),它通过改进离散状态-动作空间上的块坐标下降来增强低秩张量分解,并结合了新的探索和正则化机制。关键的创新是一种探索策略,该策略将近似误差与基于访问计数的置信上限相结合,以优先考虑具有高度不确定性的动作,避免浪费的随机探索。此外,我们在目标函数中加入了一个基于频率的惩罚项,以鼓励探索访问较少的状态-动作对,并减少对频繁访问区域的过度拟合。经典控制任务的实证结果表明,TEQL在样本效率和总回报方面优于传统的基于矩阵的方法和深度RL方法,使其适用于资源受限的应用,如空间和医疗保健,其中采样成本很高。
摘要:High-dimensional reinforcement learning faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem size. While neural network-based approaches like Deep Q-Networks have shown success, recent tensor-based methods using low-rank decomposition offer more parameter-efficient alternatives. Building upon existing tensor-based methods, we propose Tensor-Efficient Q-Learning (TEQL), which enhances low-rank tensor decomposition via improved block coordinate descent on discretized state-action spaces, incorporating novel exploration and regularization mechanisms. The key innovation is an exploration strategy that combines approximation error with visit count-based upper confidence bound to prioritize actions with high uncertainty, avoiding wasteful random exploration. Additionally, we incorporate a frequency-based penalty term in the objective function to encourage exploration of less-visited state-action pairs and reduce overfitting to frequently visited regions. Empirical results on classic control tasks demonstrate that TEQL outperforms conventional matrix-based methods and deep RL approaches in both sample efficiency and total rewards, making it suitable for resource-constrained applications, such as space and healthcare where sampling costs are high.
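TEQL 的动作打分把价值估计、低秩近似误差与基于访问计数的置信上界相加;其形式可用如下玩具函数示意(权重与具体形式均为假设,并非论文公式)。

```python
import numpy as np

def exploration_scores(q_hat, approx_err, visits, t, beta=1.0, c=1.0):
    """Score = value estimate + approximation-error bonus + visit-count UCB."""
    return q_hat + beta * approx_err + c * np.sqrt(np.log(t + 1) / (visits + 1))

q_hat = np.array([1.0, 1.0, 1.0])
err = np.array([0.0, 0.5, 0.0])          # action 1 is poorly approximated
visits = np.array([100, 100, 0])         # action 2 was never tried
print(exploration_scores(q_hat, err, visits, t=100).argmax())   # 2: the unvisited action wins
```

近似误差项将探索引向低秩分解尚未拟合好的区域,访问计数项则避免对高频区域的过拟合,对应摘要中的两个机制。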


【3】Learning Under Laws: A Constraint-Projected Neural PDE Solver that Eliminates Hallucinations
标题:定律之下的学习:消除幻觉的约束投影神经PDE求解器
链接:https://arxiv.org/abs/2511.03578

作者:Mainak Singha
备注:25 pages, 2 figures. This work introduces Constraint-Projected Learning (CPL)- a framework for neural PDE solvers that enforces physical conservation laws during training to eliminate hallucinated, non-physical solutions. Feedback is welcome. Not under review elsewhere
摘要:神经网络可以近似求解偏微分方程,但它们经常违背自己本应建模的定律:凭空产生质量、激波发生漂移,或违反守恒律与熵条件。我们的解决办法是让训练在物理定律之内进行,而不是在其旁侧。我们的框架称为约束投影学习(CPL),它通过将网络输出投影到由守恒律、Rankine-Hugoniot 跳跃条件、熵条件与正性所定义的约束集合的交集上,使每次更新都保持物理可容许。该投影可微,仅增加约 10% 的计算开销,与反向传播完全兼容。我们进一步用全变差阻尼(TVD)抑制小振荡以稳定训练,并采用在长预测范围内保持一致性的 rollout 课程。这些机制共同消除了硬违规与软违规:守恒律在机器精度下成立,总变差增长消失,熵与误差保持有界。在 Burgers 与 Euler 系统上,CPL 在不损失精度的情况下产生稳定且符合物理定律的解。CPL 不是寄希望于神经求解器会尊重物理,而是让这种行为成为学习过程的内在属性。
摘要:Neural networks can approximate solutions to partial differential equations, but they often break the very laws they are meant to model-creating mass from nowhere, drifting shocks, or violating conservation and entropy. We address this by training within the laws of physics rather than beside them. Our framework, called Constraint-Projected Learning (CPL), keeps every update physically admissible by projecting network outputs onto the intersection of constraint sets defined by conservation, Rankine-Hugoniot balance, entropy, and positivity. The projection is differentiable and adds only about 10% computational overhead, making it fully compatible with back-propagation. We further stabilize training with total-variation damping (TVD) to suppress small oscillations and a rollout curriculum that enforces consistency over long prediction horizons. Together, these mechanisms eliminate both hard and soft violations: conservation holds at machine precision, total-variation growth vanishes, and entropy and error remain bounded. On Burgers and Euler systems, CPL produces stable, physically lawful solutions without loss of accuracy. Instead of hoping neural solvers will respect physics, CPL makes that behavior an intrinsic property of the learning process.
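CPL 的核心操作是把网络输出投影到物理约束集合上;以"质量守恒 + 正性"这一对约束为例,其交集上的精确欧氏投影就是平移单纯形投影,可按下式实现。这只覆盖摘要所列约束中的两条,仅作草图。

```python
import numpy as np

def project_conserve(u, mass):
    """Euclidean projection of u onto {v >= 0, sum(v) = mass} (shifted-simplex projection)."""
    n = len(u)
    s = np.sort(u)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, n + 1)
    cond = s + (mass - css) / k > 0       # largest k keeping all active entries positive
    k_star = k[cond][-1]
    tau = (mass - css[k_star - 1]) / k_star
    return np.maximum(u + tau, 0.0)

u = np.array([0.8, -0.1, 0.5])     # raw network output: negative mass in one cell
v = project_conserve(u, mass=1.0)
print(v.sum(), (v >= 0).all())     # mass conserved (up to float precision), positivity enforced
```

该投影是分段线性、几乎处处可微的,因此可以像 CPL 要求的那样直接嵌入反向传播之中。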


【4】Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances
标题:深度学习时代的模仿学习:一种新的分类法与最新进展
链接:https://arxiv.org/abs/2511.03565

作者:Iason Chrysomallis, Georgios Chalkiadakis
摘要:模仿学习(IL)使智能体能够通过观察并复现一个或多个专家的行为来获得技能。近年来,深度学习的进展显著扩展了模仿学习在一系列领域中的能力和可扩展性,其中专家数据可以是完整的状态-动作轨迹,也可以是部分观察或未标注序列。伴随这一增长,新的方法不断涌现,以应对泛化、协变量偏移和示范质量等长期存在的挑战。在本综述中,我们回顾了模仿学习研究的最新进展,突出最近的趋势、方法创新和实际应用。我们提出了一种不同于现有分类的新分类法,以更好地反映模仿学习研究领域的现状及其趋势。在整个综述中,我们批判性地考察了代表性工作的优势、局限与评估实践,并概述了未来研究的关键挑战与开放方向。
摘要:Imitation learning (IL) enables agents to acquire skills by observing and replicating the behavior of one or multiple experts. In recent years, advances in deep learning have significantly expanded the capabilities and scalability of imitation learning across a range of domains, where expert data can range from full state-action trajectories to partial observations or unlabeled sequences. Alongside this growth, novel approaches have emerged, with new methodologies being developed to address longstanding challenges such as generalization, covariate shift, and demonstration quality. In this survey, we review the latest advances in imitation learning research, highlighting recent trends, methodological innovations, and practical applications. We propose a novel taxonomy that is distinct from existing categorizations to better reflect the current state of the IL research stratum and its trends. Throughout the survey, we critically examine the strengths, limitations, and evaluation practices of representative works, and we outline key challenges and open directions for future research.


【5】Efficient Neural Networks with Discrete Cosine Transform Activations
标题:离散余弦变换激活的高效神经网络
链接:https://arxiv.org/abs/2511.03531

作者:Marc Martinez-Gost, Sara Pepe, Ana Pérez-Neira, Miguel Ángel Lagunas
备注:Paper submitted to WSEAS Signal Processing Journal
摘要:在本文中,我们扩展了先前关于表达性神经网络(ENN,一种使用离散余弦变换(DCT)参数化自适应激活函数的多层感知机)的工作。在先前工作已证明紧凑架构的 ENN 具有强表达能力的基础上,我们现在着重展示其效率、可解释性与剪枝能力。基于 DCT 的参数化提供了一种结构化且去相关的表示,揭示了每个神经元的功能角色,并允许直接识别冗余组件。利用这一性质,我们提出一种高效的剪枝策略,在性能几乎或完全不受损的情况下移除不必要的 DCT 系数。分类与隐式神经表示任务上的实验结果证实,ENN 在保持少量参数的同时达到最先进的精度。此外,得益于 DCT 基的正交性与有界性,多达 40% 的激活系数可以被安全剪除。总体而言,这些发现表明,ENN 框架将信号处理概念有原则地融入神经网络设计,在表达性、紧凑性与可解释性之间取得了平衡的权衡。
摘要:In this paper, we extend our previous work on the Expressive Neural Network (ENN), a multilayer perceptron with adaptive activation functions parametrized using the Discrete Cosine Transform (DCT). Building upon previous work that demonstrated the strong expressiveness of ENNs with compact architectures, we now emphasize their efficiency, interpretability and pruning capabilities. The DCT-based parameterization provides a structured and decorrelated representation that reveals the functional role of each neuron and allows direct identification of redundant components. Leveraging this property, we propose an efficient pruning strategy that removes unnecessary DCT coefficients with negligible or no loss in performance. Experimental results across classification and implicit neural representation tasks confirm that ENNs achieve state-of-the-art accuracy while maintaining a low number of parameters. Furthermore, up to 40% of the activation coefficients can be safely pruned, thanks to the orthogonality and bounded nature of the DCT basis. Overall, these findings demonstrate that the ENN framework offers a principled integration of signal processing concepts into neural network design, achieving a balanced trade-off between expressiveness, compactness, and interpretability.
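"用 DCT 系数参数化激活函数、再按系数幅值剪枝"可用如下草图说明:激活是余弦基的线性组合,由于基函数有界,剪掉小系数对输出的影响有显式上界。基函数的具体归一化方式为假设,并非论文的确切定义。

```python
import numpy as np

def dct_activation(x, coeffs):
    """Activation parameterized by DCT-like cosine coefficients, for x in [-1, 1]."""
    k = np.arange(len(coeffs))
    return np.cos(np.pi * np.outer((x + 1) / 2, k)) @ coeffs

coeffs = np.array([0.0, 1.0, 0.3, 0.0, 0.01, 0.005, 0.002, 0.001])
x = np.linspace(-1, 1, 101)
full = dct_activation(x, coeffs)

pruned = coeffs * (np.abs(coeffs) > 0.05)   # drop small coefficients
approx = dct_activation(x, pruned)
print(np.abs(full - approx).max())          # bounded by the sum of the dropped magnitudes
```

由于 |cos| ≤ 1,剪枝后的最大偏差不超过被丢弃系数的幅值之和,这正是"可安全剪除大量系数"背后的有界性论据。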


【6】Adaptable Hindsight Experience Replay for Search-Based Learning
标题:基于搜索学习的适应性后见之明经验回放
链接:https://arxiv.org/abs/2511.03405

作者:Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, Kristian Kersting, Mira Mezini
备注:8 pages, 2 figures, Presented at the 9th International Workshop on Interactive Adaptive Learning
摘要:类似AlphaZero的蒙特卡洛树搜索系统最初是为双人游戏引入的,它使用神经网络指导来动态平衡探索和利用。这种组合使它们也适用于经典的搜索问题。然而,用模拟结果训练网络的原始方法在稀疏奖励设置中受到限制,特别是在网络还不能给出指导的早期阶段。后见之明经验重放(HER)通过将搜索树中不成功的轨迹重新标记为监督学习信号来解决这个问题。我们引入了Adaptable HER(\ours{}),这是一个灵活的框架,将HER与AlphaZero集成在一起,可以轻松调整HER属性,例如重新标记的目标,政策目标和轨迹选择。我们的实验,包括方程发现,表明修改HER的可能性是有益的,并超过了纯监督或强化学习的性能。
摘要:AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination makes them also suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, where the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER (\ours{}), a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
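HER 的核心操作是把搜索中失败的轨迹按"事后达到的状态"重新标注为监督信号,几行代码即可示意(数据结构与字段名均为假设):

```python
def hindsight_relabel(trajectory, achieved_goal):
    """Relabel a failed goal-conditioned trajectory as if `achieved_goal`
    (e.g. the final state actually reached) had been the intended goal."""
    return [
        {"state": s, "goal": achieved_goal, "action": a, "success": s_next == achieved_goal}
        for (s, a, s_next) in trajectory
    ]

# failed attempt to reach goal "G": the search only got as far as "C"
traj = [("A", "right", "B"), ("B", "down", "C")]
relabeled = hindsight_relabel(traj, achieved_goal="C")
print(relabeled[-1]["success"])   # the last transition now counts as a success
```

本文的 Adaptable HER 使这类重标注的各个维度(重标目标、策略目标、轨迹选择)都可配置,并将产生的监督信号喂给 AlphaZero 式的训练。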


【7】TripleWin: Fixed-Point Equilibrium Pricing for Data-Model Coupled Markets
标题:TripleWin:数据模型耦合市场的定点均衡定价
链接:https://arxiv.org/abs/2511.03368

作者:Hongrun Ren, Yun Xiong, Lei You, Yingying Wang, Haixu Xiong, Yangyong Zhu
摘要:机器学习(ML)模型经济的兴起使训练数据集和预训练模型的市场交织在一起。然而,大多数定价方法仍然将数据和模型交易分开,或者依赖于以经纪人为中心的管道。最近的研究与外部性的数据市场捕捉买方的互动,但没有产生一个同步和对称的机制,在数据卖方,模型生产者和模型买方。我们提出了一个统一的数据模型耦合市场,将数据集和模型交易作为一个单一的系统。供应方映射将数据集支付转换为买方可见的模型报价,而需求方映射通过基于Shapley的分配将买方价格传播回数据集。它们共同形成了一个闭环,连接了四种相互作用:供需双向传播以及买方和卖方之间的相互耦合。我们证明了联合算子是一个标准的干扰函数(SIF),保证存在性,唯一性和全局收敛的均衡价格。实验表明,有效的收敛性和改善公平性相比,经纪人为中心的和片面的基线。该代码可在https://github.com/HongrunRen1109/Triple-Win-Pricing上获得。
摘要:The rise of the machine learning (ML) model economy has intertwined markets for training datasets and pre-trained models. However, most pricing approaches still separate data and model transactions or rely on broker-centric pipelines that favor one side. Recent studies of data markets with externalities capture buyer interactions but do not yield a simultaneous and symmetric mechanism across data sellers, model producers, and model buyers. We propose a unified data-model coupled market that treats dataset and model trading as a single system. A supply-side mapping transforms dataset payments into buyer-visible model quotations, while a demand-side mapping propagates buyer prices back to datasets through Shapley-based allocation. Together, they form a closed loop that links four interactions: supply-demand propagation in both directions and mutual coupling among buyers and among sellers. We prove that the joint operator is a standard interference function (SIF), guaranteeing existence, uniqueness, and global convergence of equilibrium prices. Experiments demonstrate efficient convergence and improved fairness compared with broker-centric and one-sided baselines. The code is available on https://github.com/HongrunRen1109/Triple-Win-Pricing.
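标准干扰函数(SIF)的经典结论是:若更新算子满足正性、单调性与可缩放性,则定点迭代从任意正初值全局收敛到唯一不动点。下面用一个满足这三条性质的玩具算子演示(算子形式为假设,并非论文中的数据-模型耦合映射)。

```python
import numpy as np

def I(p):
    """Toy standard interference function: positive, monotone, and scalable."""
    return 1.0 + 0.5 * np.sqrt(p)   # elementwise on a price vector

results = []
for p0 in (np.array([0.01, 10.0]), np.array([100.0, 0.5])):
    p = p0.astype(float)
    for _ in range(100):
        p = I(p)                    # fixed-point iteration p <- I(p)
    results.append(p)
print(results[0], results[1])       # both starting points reach the same equilibrium
```

论文证明其供给侧与需求侧映射的联合算子是 SIF,因此上述迭代式的均衡价格计算对其市场同样具有存在性、唯一性与全局收敛性。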


【8】QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
标题:QG-CoC:面向大型多模态模型的问题引导字幕链
链接:https://arxiv.org/abs/2511.03206

作者:Kuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong, Ruochen Wang, Cho-Jui Hsieh
备注:16 pages
摘要:最近,多模态大型语言模型(MLLM)在多图像场景中面临两个关键问题:(1)缺乏对不同图像的细粒度感知;(2)对来自多个视觉输入的信息进行有效推理和综合的能力不足。然而,尽管已有多种提示方法旨在描述视觉内容,许多现有研究主要集中于单图像设置或特定的受限场景。这在理解和解决 MLLM 如何处理更一般、更复杂的多图像推理任务方面留下了关键空白。因此,我们首先广泛调查现有提示方法在处理多幅图像时如何感知细粒度视觉细节并处理视觉信息。我们的研究结果表明,现有提示方法未能关注到所需的线索,也无法将感知与推理无缝衔接。受这些发现的启发,我们提出了一种新的零样本提示方法:问题引导字幕链(QG-CoC),这是一种可有效处理任意数量图像的通用提示方法。我们在多个开源与闭源 MLLM 的多图像与单图像基准上评估了该方法。实验结果表明,QG-CoC 在各项任务上表现出有竞争力的性能,并在现有提示方法失效的挑战性场景中展现出稳健的改进。
摘要 :Recently, Multimodal Large Language Models (MLLMs) encounter two key issues in multi-image contexts: (1) a lack of fine-grained perception across disparate images, and (2) a diminished capability to effectively reason over and synthesize information from multiple visual inputs. However, while various prompting methods aim to describe visual content, many existing studies focus primarily on single-image settings or specific, constrained scenarios. This leaves a critical gap in understanding and addressing how MLLMs tackle more general and complex multi-image reasoning tasks. Thus, we first extensively investigate how current prompting methods perceive fine-grained visual details and process visual information when dealing with multiple images. Our findings reveal that existing prompting methods fall short in attending to needed clues and seamlessly integrating perception and reasoning. Inspired by the findings, we propose a new zero-shot prompting method, Question-Guided Chain-of-Captions (QG-CoC), a generalized prompting approach that effectively handles problems with an arbitrary number of images. We evaluate our method on various open-source and closed-source MLLMs for multi-image and single-image benchmarks. Experimental results indicate that QG-CoC demonstrates competitive performance across tasks and exhibits robust improvements in the challenging scenarios where existing prompting methods fail.


【9】Efficient Linear Attention for Multivariate Time Series Modeling via Entropy Equality
标题:通过熵相等实现多元时间序列建模的高效线性注意力
链接:https://arxiv.org/abs/2511.03190

作者:Mingtao Zhang, Guoli Yang, Zhanxing Zhu, Mengzhu Wang, Xiaoying Bai
摘要:注意力机制由于其捕捉复杂依赖关系的能力而被广泛应用于各种应用中,包括时间序列建模;然而,它们的效用通常受到二次计算复杂度的限制,这阻碍了长序列的可扩展性。在这项工作中,我们提出了一种新的线性注意力机制,旨在克服这些限制。我们的方法是基于一个理论证明,熵,作为一个严格的凹函数的概率单纯形,意味着对齐的概率排名和相似的熵值的分布表现出结构相似性。基于这一认识,我们开发了一种高效的近似算法,该算法仅以线性复杂度计算点积衍生分布的熵,从而实现基于熵相等的线性注意力机制。通过严格的分析,我们发现,注意力在时空时间序列建模中的有效性可能主要不是源于softmax的非线性,而是来自于实现适度和平衡的权重分布。在四个时空数据集上进行的大量实验验证了我们的方法,证明了具有竞争力或优越的预测性能,同时实现了内存使用和计算时间的大幅减少。
摘要:Attention mechanisms have been extensively employed in various applications, including time series modeling, owing to their capacity to capture intricate dependencies; however, their utility is often constrained by quadratic computational complexity, which impedes scalability for long sequences. In this work, we propose a novel linear attention mechanism designed to overcome these limitations. Our approach is grounded in a theoretical demonstration that entropy, as a strictly concave function on the probability simplex, implies that distributions with aligned probability rankings and similar entropy values exhibit structural resemblance. Building on this insight, we develop an efficient approximation algorithm that computes the entropy of dot-product-derived distributions with only linear complexity, enabling the implementation of a linear attention mechanism based on entropy equality. Through rigorous analysis, we reveal that the effectiveness of attention in spatio-temporal time series modeling may not primarily stem from the non-linearity of softmax but rather from the attainment of a moderate and well-balanced weight distribution. Extensive experiments on four spatio-temporal datasets validate our method, demonstrating competitive or superior forecasting performance while achieving substantial reductions in both memory usage and computational time.
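线性注意力的通用骨架是用正的特征映射 φ 替换 softmax 核,把注意力写成 φ(Q)(φ(K)ᵀV) 的"先对序列求和、再归一化"形式,使复杂度从 O(n²) 降为 O(n)。本文基于熵相等的具体近似机制这里不复现,以下只是通用骨架的草图(φ 取 elu+1 是一种常见假设)。

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention in O(n d^2) instead of the O(n^2 d) softmax form."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, strictly positive
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                     # (d, d_v): summed once over the whole sequence
    z = Kf.sum(axis=0)                # normalizer terms
    return (Qf @ kv) / (Qf @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 512, 16
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)   # (512, 16), computed without any n x n attention matrix
```

由于权重非负且按行归一化,每个输出都是 V 各行的凸组合,这保留了摘要所强调的"适度而均衡的权重分布"这一性质。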


【10】Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control
标题:基于学习的协作机器人纸张包装:具有残差力控制的统一控制策略
链接:https://arxiv.org/abs/2511.03181

作者:Rewida Ali, Cristian C. Beltran-Hernandez, Weiwei Wan, Kensuke Harada
摘要:在仓库和零售店等环境中,人机合作至关重要,工人经常需要处理纸张、袋子和织物等可变形物体。由于可变形材料的动力学难以预测,且需要自适应力控制,将机器人动作与人类协助相协调仍然十分困难。为了探索这一挑战,我们聚焦于礼品包装任务,它体现了一个涉及精确折叠、受控压痕和纸张可靠固定的长时程操作问题。当机器人完成整个序列、产出折痕整齐且无撕裂的包装时,即视为成功。   我们提出了一个基于学习的框架,它将由大语言模型(LLM)驱动的高层任务规划器与低层的模仿学习(IL)和强化学习(RL)混合策略相结合。其核心是一个子任务感知机器人Transformer(START),它从人类演示中学习统一策略。关键创新在于用单个模型捕获整个包装序列的长程时间依赖。与通常只用于短任务的原始动作分块Transformer(ACT)不同,我们的方法引入了提供显式时间定位的子任务ID。由于策略学习的是子目标而非简单复制运动序列,这使其在整个包装过程中表现稳健并支持灵活执行。   我们的框架在真实世界包装任务上取得了97%的成功率。我们表明,统一的基于Transformer的策略减少了对专门模型的需求,支持可控的人类监督,并有效弥合了高层意图与可变形物体操作所需的细粒度力控制之间的差距。
摘要:Human-robot cooperation is essential in environments such as warehouses and retail stores, where workers frequently handle deformable objects like paper, bags, and fabrics. Coordinating robotic actions with human assistance remains difficult due to the unpredictable dynamics of deformable materials and the need for adaptive force control. To explore this challenge, we focus on the task of gift wrapping, which exemplifies a long-horizon manipulation problem involving precise folding, controlled creasing, and secure fixation of paper. Success is achieved when the robot completes the sequence to produce a neatly wrapped package with clean folds and no tears.   We propose a learning-based framework that integrates a high-level task planner powered by a large language model (LLM) with a low-level hybrid imitation learning (IL) and reinforcement learning (RL) policy. At its core is a Sub-task Aware Robotic Transformer (START) that learns a unified policy from human demonstrations. The key novelty lies in capturing long-range temporal dependencies across the full wrapping sequence within a single model. Unlike vanilla Action Chunking with Transformer (ACT), typically applied to short tasks, our method introduces sub-task IDs that provide explicit temporal grounding. This enables robust performance across the entire wrapping process and supports flexible execution, as the policy learns sub-goals rather than merely replicating motion sequences.   Our framework achieves a 97% success rate on real-world wrapping tasks. We show that the unified transformer-based policy reduces the need for specialized models, allows controlled human supervision, and effectively bridges high-level intent with the fine-grained force control required for deformable object manipulation.
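START 的网络结构未在摘要中给出。下面用纯 Python 给出"子任务 ID 提供显式时间定位"这一思想的最小假设性草图:把离散子任务 ID 以 one-hot 形式拼接到观测特征上,使同一策略网络能区分包装流程的不同阶段。子任务数量与名称均为假设,并非 START 的真实实现。

```python
# 假设的包装子任务:折叠、压痕、固定、收尾
NUM_SUBTASKS = 4

def one_hot(i, n):
    # 长度为 n 的 one-hot 向量,第 i 位为 1
    v = [0.0] * n
    v[i] = 1.0
    return v

def policy_input(obs_features, subtask_id):
    # 策略网络的输入 = 观测特征 + 子任务 one-hot,
    # one-hot 部分为策略提供显式的阶段(时间定位)信息
    return list(obs_features) + one_hot(subtask_id, NUM_SUBTASKS)

x = policy_input([0.2, -0.5, 1.0], subtask_id=2)
```

实际系统中这一条件化通常作用在 Transformer 的嵌入层,这里用拼接只是为了直观展示信息流。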


【11】Scheduling the Off-Diagonal Weingarten Loss of Neural SDFs for CAD Models
标题:为CAD模型的神经SDF调度非对角Weingarten损失
链接:https://arxiv.org/abs/2511.03147

作者:Haotian Yin, Przemyslaw Musialski
备注:Lecture Notes in Computer Science (LNCS), 20th International Symposium on Visual Computing 2025, 12 pages, 4 figures, preprint
摘要:神经符号距离函数(SDF)已成为从点云进行几何重建的强大表示,但它们通常需要基于梯度和曲率的正则化来抑制虚假翘曲并保持结构保真度。FlatCAD引入了非对角Weingarten(ODW)损失,作为CAD曲面的一种高效二阶先验,以大约一半的计算代价近似完整的Hessian正则化。然而,FlatCAD在整个训练过程中使用固定的ODW权重,这是次优的:强正则化能稳定早期优化,却会抑制后期的细节恢复。我们提出了ODW损失的调度策略:赋予较高的初始权重以稳定优化,并逐步衰减以允许精细尺度的细化。我们研究了常数、线性、五次和阶梯插值调度,以及一种权重递增的热身(warm-up)变体。在ABC CAD数据集上的实验表明,随时间变化的调度始终优于固定权重。我们的方法在FlatCAD基线之上取得了最高35%的倒角距离(Chamfer Distance)改进,确立了调度这一简单而有效的曲率正则化扩展,可用于鲁棒的CAD重建。
摘要:Neural signed distance functions (SDFs) have become a powerful representation for geometric reconstruction from point clouds, yet they often require both gradient- and curvature-based regularization to suppress spurious warp and preserve structural fidelity. FlatCAD introduced the Off-Diagonal Weingarten (ODW) loss as an efficient second-order prior for CAD surfaces, approximating full-Hessian regularization at roughly half the computational cost. However, FlatCAD applies a fixed ODW weight throughout training, which is suboptimal: strong regularization stabilizes early optimization but suppresses detail recovery in later stages. We present scheduling strategies for the ODW loss that assign a high initial weight to stabilize optimization and progressively decay it to permit fine-scale refinement. We investigate constant, linear, quintic, and step interpolation schedules, as well as an increasing warm-up variant. Experiments on the ABC CAD dataset demonstrate that time-varying schedules consistently outperform fixed weights. Our method achieves up to a 35% improvement in Chamfer Distance over the FlatCAD baseline, establishing scheduling as a simple yet effective extension of curvature regularization for robust CAD reconstruction.
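摘要中列举的权重调度形式(常数/线性/五次/阶梯)可以写成一个很小的调度函数。下面是一个示意实现,从初始权重 w0 衰减到终值 w1;具体数值 w0、w1 为假设,并非论文的原始超参数。

```python
def odw_weight(step, total, w0=1.0, w1=0.01, schedule="linear"):
    # ODW 损失权重调度的示意:t 为归一化训练进度,权重从 w0 衰减到 w1。
    # 调度形式取自摘要(常数/线性/五次/阶梯);w0、w1 的取值为假设。
    t = min(max(step / total, 0.0), 1.0)
    if schedule == "constant":
        return w0
    if schedule == "linear":
        return w0 + (w1 - w0) * t
    if schedule == "quintic":
        # 五次插值:早期权重下降缓慢,后期快速衰减
        return w0 + (w1 - w0) * t ** 5
    if schedule == "step":
        # 阶梯调度:训练过半后一次性切换到小权重
        return w0 if t < 0.5 else w1
    raise ValueError(schedule)
```

训练循环中每步用 `loss = data_loss + odw_weight(step, total) * odw_loss` 即可套用。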


【12】An Augmentation Overlap Theory of Contrastive Learning
标题:对比学习的增强重叠理论
链接:https://arxiv.org/abs/2511.03114

作者:Qi Zhang, Yifei Wang, Yisen Wang
摘要:近年来,自监督对比学习在各种任务上取得了巨大成功,然而其底层工作机制尚不清楚。在本文中,我们首先在被广泛采用的条件独立性假设下给出了最紧的界。进一步地,我们将条件独立性假设放宽为更符合实际的增强重叠假设,并推导出下游性能的渐近闭式的界。我们提出的增强重叠理论基于这样的洞见:在激进的数据增强下,同类不同样本的支撑集会变得更加重叠,因此仅仅对齐正样本(同一样本的增强视图)就能让对比学习将同类样本聚到一起。此外,从新推导出的增强重叠视角出发,我们开发了一个用于对比学习表示评估的无监督指标,它几乎不依赖额外模块就能与下游性能良好对齐。代码可在https://github.com/PKU-ML/GARC上获得。
摘要:Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying working mechanism is yet unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive the asymptotically closed bounds for the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations, thus simply aligning the positive samples (augmented views of the same sample) could make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for the representation evaluation of contrastive learning, which aligns well with the downstream performance almost without relying on additional modules. Code is available at https://github.com/PKU-ML/GARC.


【13】Scaling Multi-Agent Environment Co-Design with Diffusion Models
标题:利用扩散模型扩展多智能体环境协同设计
链接:https://arxiv.org/abs/2511.03100

作者:Hao Xiang Li, Michael Amir, Amanda Prorok
摘要:智能体-环境协同设计范式联合优化智能体策略和环境配置,以提升系统性能。从仓库物流到风电场管理的各个应用领域,协同设计有望从根本上改变我们部署多智能体系统的方式。然而,当前的协同设计方法难以扩展:它们在高维环境设计空间下失效,并且在应对联合优化固有的移动目标时样本效率低下。我们通过开发扩散协同设计(DiCoDe)来应对这些挑战,这是一个可扩展且样本高效的协同设计框架,将协同设计推向实际相关的设置。DiCoDe包含两项核心创新。首先,我们引入投影通用引导(PUG),这一采样技术使DiCoDe能够在满足障碍物间空间间隔等硬约束的同时,探索奖励最大化的环境分布。其次,我们设计了一种评论家蒸馏机制来共享强化学习评论家的知识,确保被引导的扩散模型利用稠密且最新的学习信号来适应不断演化的智能体策略。在仓库自动化、多智能体寻路和风电场优化等具有挑战性的多智能体环境协同设计基准上的验证表明,这些改进共同产生了更优的环境-策略对。我们的方法始终超越现有最先进水平,例如在仓库设置中以减少66%的仿真样本取得了高出39%的奖励。这为智能体-环境协同设计设定了新标准,也是在现实世界领域收获协同设计红利的垫脚石。
摘要:The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising environments while satisfying hard constraints such as spatial separation between obstacles. Second, we devise a critic distillation mechanism to share knowledge from the reinforcement learning critic, ensuring that the guided diffusion model adapts to evolving agent policies using a dense and up-to-date learning signal. Together, these improvements lead to superior environment-policy pairs when validated on challenging multi-agent environment co-design benchmarks including warehouse automation, multi-agent pathfinding and wind farm optimisation. Our method consistently exceeds the state-of-the-art, achieving, for example, 39% higher rewards in the warehouse setting with 66% fewer simulation samples. This sets a new standard in agent-environment co-design, and is a stepping stone towards reaping the rewards of co-design in real world domains.


【14】Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
标题:腐败环境下的在线排序学习:一种鲁棒的级联老虎机方法
链接:https://arxiv.org/abs/2511.03074

作者:Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, Mohammad Hajiesmaili
摘要:在线排序学习(OLTR)研究如何从大候选池中推荐一个简短的排序列表,并根据用户点击改进未来的排序。这种设置通常被建模为级联老虎机(cascading bandits),其目标是在尽可能多的时间步上最大化用户点击所呈现列表中至少一个条目的概率。然而,这样的系统容易受到点击欺诈和其他操纵(即腐败)的影响:机器人或付费点击农场注入被污染的反馈,误导学习过程并降低用户体验。在本文中,我们提出了MSUCB,一种鲁棒算法,它采用一种新颖的中位数均值(mean-of-medians)估计器;据我们所知,这是该估计器首次被应用于带腐败的老虎机设置。该估计器在没有腐败时表现得与标准均值一样,因此鲁棒性不带来任何代价;在有腐败时,取中位数的步骤会过滤掉离群点和被污染的样本,使估计值保持接近真值。在每一轮更新该估计还能进一步加速实验中的经验收敛。因此,MSUCB在无腐败时取得最优的对数遗憾,在腐败下性能平缓退化,遗憾仅增加一个与总腐败量相关的附加项。在真实数据集上全面而广泛的实验进一步表明,我们的方法在保持强鲁棒性的同时始终优于已有方法;特别地,相比两种最先进的方法,它分别取得了\(97.35\%\)和\(91.60\%\)的遗憾改善。
摘要:Online learning to rank (OLTR) studies how to recommend a short ranked list of items from a large pool and improves future rankings based on user clicks. This setting is commonly modeled as cascading bandits, where the objective is to maximize the likelihood that the user clicks on at least one of the presented items across as many timesteps as possible. However, such systems are vulnerable to click fraud and other manipulations (i.e., corruption), where bots or paid click farms inject corrupted feedback that misleads the learning process and degrades user experience. In this paper, we propose MSUCB, a robust algorithm that incorporates a novel mean-of-medians estimator, which to our knowledge is applied to bandits with corruption setting for the first time. This estimator behaves like a standard mean in the absence of corruption, so no cost is paid for robustness. Under corruption, the median step filters out outliers and corrupted samples, keeping the estimate close to its true value. Updating this estimate at every round further accelerates empirical convergence in experiments. Hence, MSUCB achieves optimal logarithmic regret in the absence of corruption and degrades gracefully under corruptions, with regret increasing only by an additive term tied to the total corruption. Comprehensive and extensive experiments on real-world datasets further demonstrate that our approach consistently outperforms prior methods while maintaining strong robustness. In particular, it achieves a \(97.35\%\) and a \(91.60\%\) regret improvement over two state-of-the-art methods.
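中位数均值(mean-of-medians)估计器的一种常见形式是:把样本分组,先取每组中位数,再对各组中位数求平均。下面是一个纯 Python 的示意(分组方式、组数和污染比例均为假设,论文中的具体构造可能不同),用 10% 的恶意反馈演示其相对朴素均值的鲁棒性:

```python
import random
import statistics

def mean_of_medians(samples, num_groups=5):
    # 把样本按步长分成 num_groups 组,先取每组中位数再求平均。
    # 无污染时接近普通均值;中位数步骤可过滤少量被污染的离群样本。
    groups = [samples[i::num_groups] for i in range(num_groups)]
    return statistics.mean(statistics.median(g) for g in groups)

random.seed(0)
clean = [random.gauss(0.5, 0.1) for _ in range(500)]
# 模拟点击欺诈:10% 的反馈被替换为极端值
corrupted = clean[:450] + [100.0] * 50
robust_est = mean_of_medians(corrupted)
naive_est = statistics.mean(corrupted)
```

朴素均值被离群值拉到 10 以上,而中位数均值仍接近真实点击率 0.5。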


【15】Influence of Data Dimensionality Reduction Methods on the Effectiveness of Quantum Machine Learning Models
标题:数据降维方法对量子机器学习模型有效性的影响
链接:https://arxiv.org/abs/2511.03320

作者:Aakash Ravindra Shinde, Jukka K. Nurminen
备注:12 pages, IEEE International Conference on Quantum Computing & Engineering (QCE25)
摘要:数据降维技术常被用于量子机器学习模型的实现,以应对两个重要问题:其一是NISQ量子设备的限制,其特点是噪声和量子比特数量有限;其二是在经典设备上模拟大量量子比特的挑战。这也引发了对这类方法可扩展性的担忧,因为降维方法难以快速适应大规模数据集。在本文中,我们分析了数据降维方法如何影响不同的QML模型。我们在多个生成数据集、量子机器学习算法、量子数据编码方法和数据降维方法上进行了实验,并用准确率、精确率、召回率和F1分数等性能指标评估了所有模型。我们的研究结果表明,使用数据降维方法会导致性能指标值发生偏斜,从而错误地估计量子机器学习模型的实际性能。除数据降维方法外,还有若干因素会加剧这一问题,例如数据集的特性、经典到量子的信息嵌入方法、特征削减的比例、量子模型所关联的经典组件,以及量子机器学习模型的结构。我们持续观察到,使用与不使用数据降维时,这些模型的准确率相差14%到48%。此外,我们还观察到某些数据降维方法往往在特定的数据嵌入方法和拟设(ansatz)结构下表现更好。
摘要:Data dimensionality reduction techniques are often utilized in the implementation of Quantum Machine Learning models to address two significant issues: the constraints of NISQ quantum devices, which are characterized by noise and a limited number of qubits, and the challenge of simulating a large number of qubits on classical devices. It also raises concerns over the scalability of these approaches, as dimensionality reduction methods are slow to adapt to large datasets. In this article, we analyze how data reduction methods affect different QML models. We conduct this experiment over several generated datasets, quantum machine algorithms, quantum data encoding methods, and data reduction methods. All these models were evaluated on the performance metrics like accuracy, precision, recall, and F1 score. Our findings have led us to conclude that the usage of data dimensionality reduction methods results in skewed performance metric values, which results in wrongly estimating the actual performance of quantum machine learning models. There are several factors, along with data dimensionality reduction methods, that worsen this problem, such as characteristics of the datasets, classical to quantum information embedding methods, percentage of feature reduction, classical components associated with quantum models, and structure of quantum machine learning models. We consistently observed the difference in the accuracy range of 14% to 48% amongst these models, using data reduction and not using it. Apart from this, our observations have shown that some data reduction methods tend to perform better for some specific data embedding methodologies and ansatz constructions.
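作为背景说明,下面给出把特征压缩到少量维度(以适配有限量子比特数)的一种典型做法:基于 SVD 的最小 PCA 降维草图。这只是常见流水线的示意,数据与维度均为假设;论文的结论恰恰是这类压缩可能使 QML 的性能指标被错误估计,因此保留方差比例值得一并检查。

```python
import numpy as np

def pca_reduce(X, k):
    # 最小 PCA 降维示意:把特征数压到 k,例如为 k 个量子比特的角度编码做准备。
    # 返回降维表示与前 k 个主成分解释的方差比例。
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return Z, explained

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))   # 假设:200 个样本,16 维经典特征
Z, ratio = pca_reduce(X, k=4)    # 压到 4 维,对应 4 个量子比特的编码
```

对各向同性的随机数据,4/16 个主成分只保留约四分之一的方差——这正是降维可能扭曲下游评估的直观来源。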


【16】Provable Separations between Memorization and Generalization in Diffusion Models
标题:扩散模型中记忆与泛化之间可证明的分离
链接:https://arxiv.org/abs/2511.03202

作者:Zeqi Ye, Qijie Zhu, Molei Tao, Minshuo Chen
备注:51 pages, 4 figures
摘要:扩散模型在各个领域都取得了显著成功,但它们仍然容易出现记忆现象,即复制训练数据而不是生成新的输出。这不仅限制了它们的创造潜力,也引发了对隐私和安全的担忧。虽然已有实证研究探索了缓解策略,但对记忆现象的理论理解仍然有限。我们通过统计估计和网络近似这两个互补视角,发展出一个双重分离结果来填补这一空白。在估计方面,我们证明真实得分函数并不是经验去噪损失的最小化者,这一分离驱动了记忆现象。在近似方面,我们证明实现经验得分函数需要网络规模随样本量增长,相比真实得分函数更紧凑的网络表示,这构成另一个分离。在这些洞见的指导下,我们开发了一种基于剪枝的方法,在保持扩散Transformer生成质量的同时减少记忆。
摘要:Diffusion models have achieved remarkable success across diverse domains, but they remain vulnerable to memorization -- reproducing training data rather than generating novel outputs. This not only limits their creative potential but also raises concerns about privacy and safety. While empirical studies have explored mitigation strategies, theoretical understanding of memorization remains limited. We address this gap through developing a dual-separation result via two complementary perspectives: statistical estimation and network approximation. From the estimation side, we show that the ground-truth score function does not minimize the empirical denoising loss, creating a separation that drives memorization. From the approximation side, we prove that implementing the empirical score function requires network size to scale with sample size, spelling a separation compared to the more compact network representation of the ground-truth score function. Guided by these insights, we develop a pruning-based method that reduces memorization while maintaining generation quality in diffusion transformers.


【17】Association-sensory spatiotemporal hierarchy and functional gradient-regularised recurrent neural network with implications for schizophrenia
标题:联想-感觉时空层次结构与功能梯度正则化循环神经网络及其对精神分裂症的启示
链接:https://arxiv.org/abs/2511.02722

作者:Subati Abulikemu, Puria Radmard, Michail Mamalakis, John Suckling
备注:34 pages, 9 figures
摘要:人类新皮层在最高层级上沿着连续的感觉-联想(AS)层次进行功能组织。本研究对比对照组刻画了精神分裂症患者的AS层次。利用一个大型fMRI数据集(N=355),我们通过脑连接的谱分析提取个体AS梯度,用梯度展布量化层次化的特化程度,并将该展布与连接的几何结构联系起来。我们发现精神分裂症压缩了AS层次结构,表明功能分化降低。通过用Ornstein-Uhlenbeck过程对神经时间尺度建模,我们观察到,位于梯度两端、特化程度最高且局部内聚的区域表现出时间常数更长的动力学,而这种效应在精神分裂症中减弱。为了研究计算层面,我们使用这些梯度来正则化在工作记忆任务上训练的被试特异性循环神经网络(RNN)。梯度展布更大的网络学习效率更高,在更低的任务损失处收敛,并与给定的AS层次几何结构保持更强的一致性。不动点线性化表明,梯度范围大的网络在记忆延迟期间收敛到更稳定的神经状态,表现为更低的能量和更小的最大雅可比特征值。因此,这个梯度正则化的RNN框架将大尺度皮层结构与不动点稳定性联系起来,为梯度去分化如何使精神分裂症中的神经计算失稳提供了机制解释,并得到经验时间尺度扁平化和基于模型的不动点稳定性降低证据的一致支持。
摘要:The human neocortex is functionally organised at its highest level along a continuous sensory-to-association (AS) hierarchy. This study characterises the AS hierarchy of patients with schizophrenia in a comparison with controls. Using a large fMRI dataset (N=355), we extracted individual AS gradients via spectral analysis of brain connectivity, quantified hierarchical specialisation by gradient spread, and related this spread with connectivity geometry. We found that schizophrenia compresses the AS hierarchy indicating reduced functional differentiation. By modelling neural timescale with the Ornstein-Uhlenbeck process, we observed that the most specialised, locally cohesive regions at the gradient extremes exhibit dynamics with a longer time constant, an effect that is attenuated in schizophrenia. To study computation, we used the gradients to regularise subject-specific recurrent neural networks (RNNs) trained on working memory tasks. Networks endowed with greater gradient spread learned more efficiently, plateaued at lower task loss, and maintained stronger alignment to the prescribed AS hierarchical geometry. Fixed point linearisation showed that high-range networks settled into more stable neural states during memory delay, evidenced by lower energy and smaller maximal Jacobian eigenvalues. This gradient-regularised RNN framework therefore links large-scale cortical architecture with fixed point stability, providing a mechanistic account of how gradient de-differentiation could destabilise neural computations in schizophrenia, convergently supported by empirical timescale flattening and model-based evidence of less stable fixed points.
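论文用 Ornstein-Uhlenbeck(OU)过程刻画神经时间尺度。下面是一个纯 Python 的示意:按精确离散化模拟 OU 过程 dx = -(x/τ)dt + σdW,并从滞后一阶自相关 ρ₁ = exp(-dt/τ) 反推时间常数 τ。参数(τ=2、dt=0.1 等)为假设,仅演示"从序列估计时间常数"这一步,并非论文的拟合流程。

```python
import math
import random

def simulate_ou(tau, sigma=1.0, dt=0.1, n=20000, seed=0):
    # OU 过程的精确离散化:x' = a x + s * N(0,1),
    # 其中 a = exp(-dt/tau),s 使平稳方差等于 sigma^2 * tau / 2
    rng = random.Random(seed)
    a = math.exp(-dt / tau)
    s = sigma * math.sqrt((1 - a * a) * tau / 2)
    x, xs = 0.0, []
    for _ in range(n):
        x = a * x + s * rng.gauss(0, 1)
        xs.append(x)
    return xs

def estimate_tau(xs, dt=0.1):
    # 用滞后一阶自相关估计时间常数:rho1 = exp(-dt/tau)
    m = sum(xs) / len(xs)
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    den = sum((v - m) ** 2 for v in xs)
    return -dt / math.log(num / den)

xs = simulate_ou(tau=2.0)
tau_hat = estimate_tau(xs)
```

时间常数越长(如论文中梯度两端的联想区),序列自相关衰减越慢,ρ₁ 越接近 1。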


其他(37篇)

【1】Structured Matrix Scaling for Multi-Class Calibration
标题:用于多类校准的结构化矩阵缩放
链接:https://arxiv.org/abs/2511.03685

作者:Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach
摘要:事后重新校准方法被广泛用于确保分类器给出可靠的概率估计。我们论证,基于逻辑回归的参数化重新校准函数可以从一个同时适用于二分类和多分类的简单理论设定中得到论证。这一洞见促使我们采用比标准温度缩放更具表达力的校准方法。然而,对于多类校准,一个关键挑战在于更复杂的模型引入了越来越多的参数,而校准数据往往有限,这可能导致过拟合。通过大量实验,我们证明由此产生的偏差-方差权衡可以通过结构化正则化、鲁棒的预处理和高效的优化来有效管理。所得方法相比现有的基于逻辑回归的校准技术带来了显著收益。我们为这些方法提供了高效且易用的开源实现,使其成为常见的温度缩放、向量缩放和矩阵缩放实现的有吸引力的替代方案。
摘要 :Post-hoc recalibration methods are widely used to ensure that classifiers provide faithful probability estimates. We argue that parametric recalibration functions based on logistic regression can be motivated from a simple theoretical setting for both binary and multiclass classification. This insight motivates the use of more expressive calibration methods beyond standard temperature scaling. For multi-class calibration however, a key challenge lies in the increasing number of parameters introduced by more complex models, often coupled with limited calibration data, which can lead to overfitting. Through extensive experiments, we demonstrate that the resulting bias-variance tradeoff can be effectively managed by structured regularization, robust preprocessing and efficient optimization. The resulting methods lead to substantial gains over existing logistic-based calibration techniques. We provide efficient and easy-to-use open-source implementations of our methods, making them an attractive alternative to common temperature, vector, and matrix scaling implementations.
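摘要中对比的三种基于逻辑回归的校准族,其函数形式可以写成对 logits 的简单变换,参数量从 1 个(温度)增长到 K²+K 个(矩阵缩放),这正是需要结构化正则化的原因。下面只给出函数形式的示意;实际参数需在校准集上用 NLL 拟合(拟合代码略,数值均为假设)。

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scaling(logits, T):
    # 温度缩放:1 个参数
    return softmax(logits / T)

def vector_scaling(logits, w, b):
    # 向量缩放:逐类缩放加偏置,2K 个参数
    return softmax(logits * w + b)

def matrix_scaling(logits, W, b):
    # 矩阵缩放:完整线性变换,K^2 + K 个参数,最具表达力也最易过拟合
    return softmax(logits @ W.T + b)

K = 3
logits = np.array([[2.0, 0.5, -1.0]])
p_T = temperature_scaling(logits, T=1.5)
p_M = matrix_scaling(logits, W=np.eye(K), b=np.zeros(K))  # 恒等变换 = 不校准
```

当 W 取单位阵、b 取零向量时,矩阵缩放退化为原始 softmax,可作为实现的完整性检查。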


【2】DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay
标题:采用ε-贪婪策略与优先经验回放的DQN性能
链接:https://arxiv.org/abs/2511.03670

作者:Daniel Perkins, Oscar J. Escobar, Luke Green
备注:10 pages, 8 figures
摘要:我们详细研究了有限环境中的深度Q网络,重点关注ε-贪婪探索调度和优先经验回放的影响。通过系统的实验,我们评估了ε衰减调度的变化如何影响学习效率、收敛行为和奖励优化。我们研究了优先经验回放如何带来更快的收敛和更高的回报,并给出了在多次模拟中比较均匀回放、无回放与优先回放策略的经验结果。我们的发现阐明了DQN训练中探索策略与记忆管理之间的权衡和相互作用,为资源受限环境下的鲁棒强化学习提供了实用建议。
摘要:We present a detailed study of Deep Q-Networks in finite environments, emphasizing the impact of epsilon-greedy exploration schedules and prioritized experience replay. Through systematic experimentation, we evaluate how variations in epsilon decay schedules affect learning efficiency, convergence behavior, and reward optimization. We investigate how prioritized experience replay leads to faster convergence and higher returns and show empirical results comparing uniform, no replay, and prioritized strategies across multiple simulations. Our findings illuminate the trade-offs and interactions between exploration strategies and memory management in DQN training, offering practical recommendations for robust reinforcement learning in resource-constrained settings.
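ε-贪婪探索与衰减调度是该研究的核心变量,可以用几行纯 Python 表达。下面的调度形式(线性、指数)与数值(起止 ε、衰减步数、指数速率)均为常见默认值的假设,并非论文的原始设置:

```python
import math
import random

def epsilon(step, schedule="exp", eps_start=1.0, eps_end=0.05, decay_steps=10000):
    # ε-贪婪探索率的两种常见衰减调度(数值为假设)
    t = min(step / decay_steps, 1.0)
    if schedule == "linear":
        return eps_start + (eps_end - eps_start) * t
    if schedule == "exp":
        return eps_end + (eps_start - eps_end) * math.exp(-5.0 * t)
    raise ValueError(schedule)

def select_action(q_values, eps, rng=random):
    # 以概率 eps 随机探索,否则贪婪选取 Q 值最大的动作
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

a = select_action([0.1, 0.9, 0.3], eps=0.0)  # eps=0 时必定贪婪
```

训练早期 ε 接近 1(充分探索),随调度衰减到 eps_end 后以利用为主;不同调度正对应论文中比较的实验条件。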


【3】Efficient Testing Implies Structured Symmetry
标题:高效测试意味着结构对称性
链接:https://arxiv.org/abs/2511.03653

作者:Cynthia Dwork, Pranay Tankala
摘要:给定被一个未知布尔函数标注的$n$位字符串的一小批随机样本,该函数的哪些性质可以被计算高效地测试?我们证明了可从少量样本高效测试的性质与具有结构化对称性的性质之间的等价性,后者只依赖于该函数在某个低复杂度域划分的各部分上的平均值。在没有效率约束时,Blais和Yoshida(2019)曾用非结构化对称性得到过类似的刻画。我们的主要技术工具是超级模拟(supersimulation),它基于算法公平性文献中的方法,用能够欺骗大得多的区分器的小电路模拟器来近似任意复杂的函数。   我们还沿着其他方向扩展了这一刻画。我们证明,允许各部分重叠可使所需部分的数量呈指数级减少,从而将该构造的适用范围从可用$O(\log n)$个样本测试的性质扩大到可用$O(n)$个样本测试的性质。对于更大的样本量,我们证明任何高效的测试器本质上都是在检查与一个有界的小电路集合的不可区分性,这与可测试图性质的刻画一脉相承。最后,我们证明布尔函数测试的这些结果可以推广到任意定义域上的高熵分布测试。
摘要:Given a small random sample of $n$-bit strings labeled by an unknown Boolean function, which properties of this function can be tested computationally efficiently? We show an equivalence between properties that are efficiently testable from few samples and properties with structured symmetry, which depend only on the function's average values on parts of a low-complexity partition of the domain. Without the efficiency constraint, a similar characterization in terms of unstructured symmetry was obtained by Blais and Yoshida (2019). Our main technical tool is supersimulation, which builds on methods from the algorithmic fairness literature to approximate arbitrarily complex functions by small-circuit simulators that fool significantly larger distinguishers.   We extend the characterization along other axes as well. We show that allowing parts to overlap exponentially reduces their required number, broadening the scope of the construction from properties testable with $O(\log n)$ samples to properties testable with $O(n)$ samples. For larger sample sizes, we show that any efficient tester is essentially checking for indistinguishability from a bounded collection of small circuits, in the spirit of a characterization of testable graph properties. Finally, we show that our results for Boolean function testing generalize to high-entropy distribution testing on arbitrary domains.


【4】nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN
标题:nanoTabPFN:TabPFN的轻量级教育性重新实现
链接:https://arxiv.org/abs/2511.03634

作者:Alexander Pfefferle, Johannes Hog, Lennart Purucker, Frank Hutter
摘要:TabPFN等表格基础模型彻底改变了表格数据的预测式机器学习。与此同时,这场变革背后的驱动因素却难以理解。现有的开源表格基础模型实现在复杂的流水线中,代码超过一万行,且缺乏架构文档和代码质量保障。简而言之,这些实现难以理解,对初学者不友好,也很难为新实验进行改造。我们介绍nanoTabPFN,它是TabPFN v2架构的简化轻量级实现,并附带一个使用预生成训练数据的训练循环。nanoTabPFN使学生和研究人员都更容易接触表格基础模型。例如,在小数据设置下,它在单个GPU上预训练一分钟即可达到与传统机器学习基线相当的性能(比TabPFN v2预训练快160,000倍)。由于不再需要大量计算资源,预训练表格基础模型也可以用于教学目的。我们的代码可在https://github.com/automl/nanoTabPFN上获得。
摘要:Tabular foundation models such as TabPFN have revolutionized predictive machine learning for tabular data. At the same time, the driving factors of this revolution are hard to understand. Existing open-source tabular foundation models are implemented in complicated pipelines boasting over 10,000 lines of code, lack architecture documentation or code quality. In short, the implementations are hard to understand, not beginner-friendly, and complicated to adapt for new experiments. We introduce nanoTabPFN, a simplified and lightweight implementation of the TabPFN v2 architecture and a corresponding training loop that uses pre-generated training data. nanoTabPFN makes tabular foundation models more accessible to students and researchers alike. For example, restricted to a small data setting it achieves a performance comparable to traditional machine learning baselines within one minute of pre-training on a single GPU (160,000x faster than TabPFN v2 pretraining). This eliminated requirement of large computational resources makes pre-training tabular foundation models accessible for educational purposes. Our code is available at https://github.com/automl/nanoTabPFN.


【5】Neural Beamforming with Doppler-Aware Sparse Attention for High Mobility Environments
标题:面向高移动性环境的多普勒感知稀疏注意力神经波束成形
链接:https://arxiv.org/abs/2511.03632

作者:Cemil Vahapoglu, Timothy J. O'Shea, Wan Liu, Sennur Ulukus
摘要:波束成形对于提升多天线无线系统的频谱效率和抑制干扰具有重要意义,有助于在密集和高移动性场景中实现空间复用与分集。迫零波束成形(ZFBF)和最小均方误差(MMSE)波束成形等传统技术在不利信道条件下会出现性能恶化。基于深度学习的波束成形提供了一种替代方案,通过从信道状态信息(CSI)到波束成形权重的非线性映射,提升对动态信道环境的鲁棒性。基于Transformer的模型尤其有效,因为它们能够对跨时间和频率的长程依赖进行建模;然而,其二次方的注意力复杂度限制了在大型OFDM网格上的可扩展性。最近的研究通过稀疏注意力机制来解决这一问题,在保持表达力的同时降低复杂度,但这些机制通常采用忽略信道动态的固定模式,因为它们并非专为无线通信场景设计。在这项工作中,我们提出了一种多普勒感知稀疏神经网络波束成形(Doppler-aware Sparse NNBF)模型,在多用户单输入多输出(MU-SIMO)设置中引入了信道自适应的稀疏注意力机制。所提出的稀疏结构可以依据信道动态沿二维时频轴进行配置,并在理论上被证明可确保在p跳内完全连通,其中p为注意力头的数量。城市宏小区(UMa)信道条件下的仿真结果表明,在高移动性场景中,多普勒感知稀疏NNBF显著优于固定模式的基线(称为标准稀疏NNBF)以及传统的ZFBF和MMSE波束成形,同时保持结构化稀疏性,并控制每个查询所关注的键(key)的数量。
摘要 :Beamforming has significance for enhancing spectral efficiency and mitigating interference in multi-antenna wireless systems, facilitating spatial multiplexing and diversity in dense and high mobility scenarios. Traditional beamforming techniques such as zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) beamforming experience performance deterioration under adverse channel conditions. Deep learning-based beamforming offers an alternative with nonlinear mappings from channel state information (CSI) to beamforming weights by improving robustness against dynamic channel environments. Transformer-based models are particularly effective due to their ability to model long-range dependencies across time and frequency. However, their quadratic attention complexity limits scalability in large OFDM grids. Recent studies address this issue through sparse attention mechanisms that reduce complexity while maintaining expressiveness, yet often employ patterns that disregard channel dynamics, as they are not specifically designed for wireless communication scenarios. In this work, we propose a Doppler-aware Sparse Neural Network Beamforming (Doppler-aware Sparse NNBF) model that incorporates a channel-adaptive sparse attention mechanism in a multi-user single-input multiple-output (MU-SIMO) setting. The proposed sparsity structure is configurable along 2D time-frequency axes based on channel dynamics and is theoretically proven to ensure full connectivity within p hops, where p is the number of attention heads. Simulation results under urban macro (UMa) channel conditions show that Doppler-aware Sparse NNBF significantly outperforms both a fixed-pattern baseline, referred to as Standard Sparse NNBF, and conventional beamforming techniques ZFBF and MMSE beamforming in high mobility scenarios, while maintaining structured sparsity with a controlled number of attended keys per query.


【6】BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation
标题:BanglaSTEM:面向技术领域孟加拉语-英语翻译的平行语料库
链接:https://arxiv.org/abs/2511.03498

作者:Kazi Reyazul Hasan, Mubasshira Musarrat, A. B. M. Alim Al Islam, Muhammad Abdullah Adnan
摘要:大型语言模型在解决英语技术问题时表现良好,但在孟加拉语中提出相同问题时表现不佳。一个简单的解决方案是首先将孟加拉语问题翻译成英语,然后使用这些模型。然而,现有的孟加拉语-英语翻译系统难以处理技术术语。他们经常误译专业词汇,这改变了问题的含义,导致错误的答案。我们介绍了BanglaSTEM,这是一个包含5,000个精心挑选的孟加拉语-英语句子对的数据集,来自STEM领域,包括计算机科学,数学,物理,化学和生物学。我们使用语言模型生成了超过12,000个翻译,然后使用人工评估器来选择最高质量的对,以正确保留技术术语。我们在BanglaSTEM上训练了一个基于T5的翻译模型,并在两个任务上对其进行了测试:生成代码和解决数学问题。我们的研究结果显示,技术内容的翻译准确性有了显着提高,使孟加拉语使用者更容易有效地使用以英语为中心的语言模型。BanglaSTEM数据集和经过训练的翻译模型都在https://huggingface.co/reyazul/BanglaSTEM-T5上公开发布。
摘要:Large language models work well for technical problem solving in English but perform poorly when the same questions are asked in Bangla. A simple solution would be to translate Bangla questions into English first and then use these models. However, existing Bangla-English translation systems struggle with technical terms. They often mistranslate specialized vocabulary, which changes the meaning of the problem and leads to wrong answers. We present BanglaSTEM, a dataset of 5,000 carefully selected Bangla-English sentence pairs from STEM fields including computer science, mathematics, physics, chemistry, and biology. We generated over 12,000 translations using language models and then used human evaluators to select the highest quality pairs that preserve technical terminology correctly. We train a T5-based translation model on BanglaSTEM and test it on two tasks: generating code and solving math problems. Our results show significant improvements in translation accuracy for technical content, making it easier for Bangla speakers to use English-focused language models effectively. Both the BanglaSTEM dataset and the trained translation model are publicly released at https://huggingface.co/reyazul/BanglaSTEM-T5.


【7】Why Less is More (Sometimes): A Theory of Data Curation
标题:为什么少即是多(有时):数据策展理论
链接:https://arxiv.org/abs/2511.03492

作者:Elvis Dohmatob, Mohammad Pezeshki, Reyhane Askari-Hemmat
摘要:本文提出一个理论框架,以解决现代机器学习中的一个核心悖论:什么时候使用更少的数据反而更好?随着主张"多多益善"的经典缩放定律(Sun等人,2025)受到LIMO("少即是多")和s1(Ye等人,2025;Muenighoff等人,2025)等方法的挑战——这些方法用小规模、经过激进策展的数据集取得了更优性能——这一问题变得至关重要。在这里,我们研究由不完美的预言机按照训练样本的难度和正确性进行选择的数据策展策略。我们的结果给出了在标签无关和标签感知两类策展规则下测试误差的精确缩放律曲线,揭示了只保留数据子集何时以及为何能提升泛化。与经典缩放定律相反,我们证明在某些条件下小规模策展数据集可以优于完整数据集,并通过推导与数据规模和质量相关的精确相变曲线给出了相应的解析条件。我们用ImageNet上的实证结果验证了这些理论论断,证实了我们关于策展何时提升准确率、甚至能缓解模型坍塌的预测。此外,我们的框架为最近在大语言模型(LLM)数学推理中观察到的看似矛盾的策展策略提供了原则性的解释。
摘要:This paper introduces a theoretical framework to resolve a central paradox in modern machine learning: When is it better to use less data? This question has become critical as classical scaling laws suggesting ``more is more'' (Sun et al., 2025) are challenged by methods like LIMO (``less is more'') and s1 (Ye et al., 2025; Muenighoff et al., 2025), which achieve superior performance with small, aggressively curated datasets. Here, we study data curation strategies where an imperfect oracle selects the training examples according to their difficulty and correctness. Our results provide exact scaling law curves for test error under both label-agnostic and label-aware curation rules, revealing when and why keeping only a subset of data can improve generalization. In contrast to classical scaling laws, we show that under certain conditions, small curated datasets can outperform full datasets, and we provide analytical conditions for this by deriving precise phase transition curves tied to data size and quality. We validate these theoretical claims with empirical results on ImageNet, confirming our predictions about when curation improves accuracy and can even mitigate model collapse. Furthermore, our framework provides a principled explanation for the contradictory curation strategies recently observed in LLM mathematical reasoning.


【8】NAP: Attention-Based Late Fusion for Automatic Sleep Staging
标题:NAP:用于自动睡眠分期的基于注意力的后期融合
链接:https://arxiv.org/abs/2511.03488

作者:Alvise Dei Rossi, Julia van der Meer, Markus H. Schmidt, Claudio L.A. Bassetti, Luigi Fiorillo, Francesca Faraci
摘要:多导睡眠图信号高度异质,其模态组成(例如EEG、EOG、ECG)、通道可用性(例如额叶、枕叶EEG)以及采集协议在不同数据集和临床站点之间各不相同。大多数处理多导睡眠图数据的现有模型依赖于固定的模态或通道子集,因此未能充分利用其固有的多模态特性。我们通过引入NAP(Neural Aggregator of Predictions,神经预测聚合器)来解决这一局限。NAP是一个基于注意力的模型,它通过捕获时间、空间和预测器层面依赖关系的三轴注意力机制,学习如何组合多个预测流。NAP经过训练可适应不同的输入维度。通过聚合来自冻结的、预训练单通道模型的输出,NAP始终优于单个预测器和简单集成,在多个数据集上实现了最先进的零样本泛化。虽然本文是在多导睡眠图自动睡眠分期的背景下进行展示,所提出的方法也可以扩展到其他多模态生理应用。
摘要 :Polysomnography signals are highly heterogeneous, varying in modality composition (e.g., EEG, EOG, ECG), channel availability (e.g., frontal, occipital EEG), and acquisition protocols across datasets and clinical sites. Most existing models that process polysomnography data rely on a fixed subset of modalities or channels and therefore neglect to fully exploit its inherently multimodal nature. We address this limitation by introducing NAP (Neural Aggregator of Predictions), an attention-based model which learns to combine multiple prediction streams using a tri-axial attention mechanism that captures temporal, spatial, and predictor-level dependencies. NAP is trained to adapt to different input dimensions. By aggregating outputs from frozen, pretrained single-channel models, NAP consistently outperforms individual predictors and simple ensembles, achieving state-of-the-art zero-shot generalization across multiple datasets. While demonstrated in the context of automated sleep staging from polysomnography, the proposed approach could be extended to other multimodal physiological applications.


【9】POEMS: Product of Experts for Interpretable Multi-omic Integration using Sparse Decoding
标题：POEMS：使用稀疏解码实现可解释多组学整合的专家乘积
链接:https://arxiv.org/abs/2511.03464

作者:Mihriban Kocak Balik, Pekka Marttinen, Negar Safinianaini
备注:None
摘要:整合不同的分子层(即多组学数据)对于揭示疾病的复杂性至关重要；然而，大多数深度生成模型要么以牺牲可解释性为代价优先考虑预测性能，要么通过线性化解码器来强行获得可解释性，从而削弱网络的非线性表达能力。为了克服这种权衡，我们引入了POEMS(使用稀疏解码的可解释多组学整合专家乘积)，这是一种无监督的概率框架，在提供可解释性的同时保留了预测性能。POEMS无需线性化网络的任何部分即可提供可解释性：1)使用稀疏连接将特征映射到潜在因子，这直接转化为生物标志物发现；2)使用专家乘积(product of experts)模型，通过共享潜在空间实现跨组学关联；3)通过一个自适应计算各组学在表示学习中影响力的门控网络，报告每个组学的贡献。此外，我们提出了一个高效的稀疏解码器。在一个癌症亚型分型的案例研究中，POEMS实现了有竞争力的聚类和分类性能，同时提供了一套新颖的解释，证明基于生物标志物的洞察与预测准确性可以在多组学表示学习中共存。
摘要:Integrating different molecular layers, i.e., multiomics data, is crucial for unraveling the complexity of diseases; yet, most deep generative models either prioritize predictive performance at the expense of interpretability or enforce interpretability by linearizing the decoder, thereby weakening the network's nonlinear expressiveness. To overcome this tradeoff, we introduce POEMS: Product Of Experts for Interpretable Multiomics Integration using Sparse Decoding, an unsupervised probabilistic framework that preserves predictive performance while providing interpretability. POEMS provides interpretability without linearizing any part of the network by 1) mapping features to latent factors using sparse connections, which directly translates to biomarker discovery, 2) allowing for cross-omic associations through a shared latent space using product of experts model, and 3) reporting contributions of each omic by a gating network that adaptively computes their influence in the representation learning. Additionally, we present an efficient sparse decoder. In a cancer subtyping case study, POEMS achieves competitive clustering and classification performance while offering our novel set of interpretations, demonstrating that biomarker based insight and predictive accuracy can coexist in multiomics representation learning.
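摘要中的"专家乘积"在高斯专家情形下有封闭形式：各专家的精度相加、均值按精度加权。下面是一个一维示意(假设性代码，非POEMS实现)：

```python
import numpy as np

def gaussian_product_of_experts(mus, sigmas):
    """融合多个一维高斯专家 N(mu_i, sigma_i^2)。

    专家乘积的结果仍是高斯：精度相加，均值按精度加权。
    mus, sigmas: 形状 (n_experts,) 的数组
    返回: (融合均值, 融合标准差)
    """
    precisions = 1.0 / np.square(sigmas)
    prec = precisions.sum()
    mu = (precisions * mus).sum() / prec
    return mu, 1.0 / np.sqrt(prec)

# 两个组学"专家"对同一潜在因子的判断
mu, sigma = gaussian_product_of_experts(np.array([0.0, 2.0]),
                                        np.array([1.0, 1.0]))
```

等方差时融合均值即简单平均，且融合后的不确定性低于任一单个专家。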


【10】SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration
标题：稀疏决策树的SORTeD罗生门集：随时枚举
链接:https://arxiv.org/abs/2511.03344

作者:Elif Arslan, Jacobus G. M. van der Linden, Serge Hoogendoorn, Marco Rinaldi, Emir Demirović
备注:32 pages, 10 figures, to be published in the proceedings of The Thirty-Ninth Annual Conference on Neural Information Processing Systems
摘要:稀疏决策树学习通过在(软)大小限制内寻找单棵最准确的树，提供准确且可解释的预测模型，非常适合高风险应用。罗生门集(性能相近但结构不同的树的集合)可用于增强变量重要性分析、丰富解释，并使用户能够选择更简单的树或满足利益相关者偏好(例如公平性)的树，而无需将这类标准硬编码进目标函数。然而，由于寻找最优树是NP难的，枚举罗生门集本质上具有挑战性。为此，我们引入SORTD，一个提升可扩展性的新框架，它按目标值的顺序枚举罗生门集中的树，从而具备随时(anytime)行为。我们的实验表明，与最先进方法相比，SORTD可将运行时间减少多达两个数量级。此外，SORTD可针对任何可分离且全序的目标计算罗生门集，并支持用其他可分离(及偏序)的目标对该集合进行事后评估。总之，这些进展使得在现实应用中探索罗生门集更加实用。
摘要:Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets-trees with similar performance but varying structures-can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.
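罗生门集的一个最小示例：在玩具数据上穷举深度为1的决策树桩，按目标值升序收集误差不超过最优值+eps的所有树(仅为概念演示；SORTD本身针对一般稀疏决策树并带有可扩展的枚举算法)：

```python
import numpy as np

def stump_rashomon_set(X, y, eps):
    """穷举所有深度为1的决策树桩，按误分类率升序返回
    误差不超过 最优误差+eps 的罗生门集(概念演示，非SORTD)。"""
    candidates = []
    for j in range(X.shape[1]):                 # 遍历特征
        for t in np.unique(X[:, j]):            # 遍历阈值
            for sign in (1, -1):                # 遍历叶子标签方向
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = float(np.mean(pred != y))
                candidates.append((err, j, float(t), sign))
    candidates.sort()                           # 按目标值顺序 -> anytime行为
    best = candidates[0][0]
    return [c for c in candidates if c[0] <= best + eps]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, -1, -1])
rset = stump_rashomon_set(X, y, eps=0.0)   # 仅含完美树桩 x<=1
```

把eps调大即可得到更大的罗生门集，供按公平性等额外标准做事后筛选。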


【11】Extending Fair Null-Space Projections for Continuous Attributes to Kernel Methods
标题:将连续属性的公平零空间投影扩展到核方法
链接:https://arxiv.org/abs/2511.03304

作者:Felix Störck, Fabian Hinder, Barbara Hammer
摘要:随着机器学习系统不断融入数百万人的日常社会生活，公平的概念在其开发中越来越重要。公平性概念通常依赖受保护属性来评估潜在偏见。这里，大多数文献集中于目标变量和保护属性均为离散的设定，而关于连续属性(特别是与回归相结合，我们称之为"连续公平性")的文献则十分稀少。一种常见的策略是迭代零空间投影，但到目前为止，它仅针对线性模型或嵌入(例如由非线性编码器获得的嵌入)进行了探索。我们通过将其推广到核方法来改进这一点，显著扩展了适用范围。这产生了一种与模型和公平性评分无关、适用于连续保护属性核嵌入的方法。我们证明，与其他当代方法相比，我们的新方法结合支持向量回归(SVR)在多个数据集上提供了有竞争力或更优的性能。
摘要:With the on-going integration of machine learning systems into the everyday social life of millions the notion of fairness becomes an ever increasing priority in their development. Fairness notions commonly rely on protected attributes to assess potential biases. Here, the majority of literature focuses on discrete setups regarding both target and protected attributes. The literature on continuous attributes especially in conjunction with regression -- we refer to this as \emph{continuous fairness} -- is scarce. A common strategy is iterative null-space projection which as of now has only been explored for linear models or embeddings such as obtained by a non-linear encoder. We improve on this by generalizing to kernel methods, significantly extending the scope. This yields a model and fairness-score agnostic method for kernel embeddings applicable to continuous protected attributes. We demonstrate that our novel approach in conjunction with Support Vector Regression (SVR) provides competitive or improved performance across multiple datasets in comparisons to other contemporary methods.
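迭代零空间投影在线性情形下的一个最小示意(假设性代码，非论文的核方法推广)：反复用最小二乘求预测保护属性的最优方向，并把特征投影到其正交补，削弱属性的线性可预测性：

```python
import numpy as np

def nullspace_projection(X, a, n_iters=3):
    """线性情形的迭代零空间投影(示意，非论文的核方法版本)。
    每轮用最小二乘求预测保护属性 a 的最优方向 w，
    再把特征投影到 w 的正交补。"""
    X = X - X.mean(0)
    a = a - a.mean()
    for _ in range(n_iters):
        w, *_ = np.linalg.lstsq(X, a, rcond=None)
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        u = w / norm
        X = X - np.outer(X @ u, u)   # 去除沿 u 的分量
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
a = X[:, 0] + 0.1 * rng.normal(size=200)   # 连续保护属性与第一维强相关
Xp = nullspace_projection(X, a)

# 投影后 a 的线性可预测性(R^2)应接近0
ac = a - a.mean()
fit = Xp @ np.linalg.lstsq(Xp, ac, rcond=None)[0]
r2_after = 1.0 - np.sum((ac - fit) ** 2) / np.sum(ac ** 2)
```

论文的贡献在于把这一投影思想推广到核嵌入，从而覆盖非线性模型。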


【12】When Generative Artificial Intelligence meets Extended Reality: A Systematic Review
标题:当生成人工智能遇到延展实境:系统性评论
链接:https://arxiv.org/abs/2511.03282

作者:Xinyu Ning, Yan Zhuo, Xian Wang, Chan-In Devin Sio, Lik-Hang Lee
摘要:随着技术的不断进步,生成式人工智能(AI)在各个领域的应用正逐渐展现出巨大的潜力,特别是与延展实境(XR)相结合,创造出前所未有的可能性。这篇调查文章系统地回顾了生成式AI在XR中的应用,涵盖了2023年至2025年期间尽可能多的相关文献。通过对最终26篇文章的PRISMA筛选和分析,总结了XR中生成式AI的应用领域及其关键技术实现。该调查重点介绍了过去三年中与XR如何利用生成式AI相关的现有文章,为当前趋势和研究差距提供了见解。我们还探索未来研究的潜在机会,通过生成式AI进一步增强XR,为未来的生成式XR研究提供指导和信息。
摘要 :With the continuous advancement of technology, the application of generative artificial intelligence (AI) in various fields is gradually demonstrating great potential, particularly when combined with Extended Reality (XR), creating unprecedented possibilities. This survey article systematically reviews the applications of generative AI in XR, covering as much relevant literature as possible from 2023 to 2025. The application areas of generative AI in XR and its key technology implementations are summarised through PRISMA screening and analysis of the final 26 articles. The survey highlights existing articles from the last three years related to how XR utilises generative AI, providing insights into current trends and research gaps. We also explore potential opportunities for future research to further empower XR through generative AI, providing guidance and information for future generative XR research.


【13】A Probabilistic Approach to Pose Synchronization for Multi-Reference Alignment with Applications to MIMO Wireless Communication Systems
标题:多参考对准位姿同步的概率方法及其在多输入多输出无线通信系统中的应用
链接:https://arxiv.org/abs/2511.03280

作者:Rob Romijnders, Gabriele Cesa, Christos Louizos, Kumar Pratik, Arash Behboodi
备注:To appear in NeurIPS workshop: AI and ML for Next-Generation Wireless Communications (AI4NextG)
摘要:从分子成像到无线通信，从多个未对准的观测中对准并重建信号的能力对系统性能至关重要。我们研究多参考对齐(MRA)问题，它出现在冷冻电镜、计算机视觉，尤其是无线通信系统等许多现实问题中。通过用概率方法对MRA建模，我们得到了一种新算法：将相对位姿作为冗余(nuisance)变量进行边缘化，从而消除问题的全局对称性，带来更直接的求解和更好的收敛性。这种方法的去中心化借助循环一致性避免了中心化方法的立方级计算开销，从而显著节省计算。两种所提算法在各实验设置中均取得了更低的重建误差。
摘要:From molecular imaging to wireless communications, the ability to align and reconstruct signals from multiple misaligned observations is crucial for system performance. We study the problem of multi-reference alignment (MRA), which arises in many real-world problems, such as cryo-EM, computer vision, and, in particular, wireless communication systems. Using a probabilistic approach to model MRA, we find a new algorithm that uses relative poses as nuisance variables to marginalize out -- thereby removing the global symmetries of the problem and allowing for more direct solutions and improved convergence. The decentralization of this approach enables significant computational savings by avoiding the cubic scaling of centralized methods through cycle consistency. Both proposed algorithms achieve lower reconstruction error across experimental settings.
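一维循环移位下MRA的朴素基线可以帮助理解问题设定(示意代码，非论文提出的边缘化算法)：用循环互相关估计每条观测相对参照的移位，对齐后求平均：

```python
import numpy as np

def align_and_average(observations):
    """一维循环移位MRA的朴素基线(示意)：以第一条观测为参照，
    用FFT计算循环互相关、取峰值估计移位，对齐后取平均重建信号。"""
    ref = observations[0]
    aligned = []
    for obs in observations:
        # 循环互相关，相关峰位置即移位估计
        corr = np.fft.ifft(np.fft.fft(obs) * np.conj(np.fft.fft(ref))).real
        shift = int(np.argmax(corr))
        aligned.append(np.roll(obs, -shift))
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * np.arange(32) / 32)
obs = [np.roll(signal, rng.integers(32)) + 0.05 * rng.normal(size=32)
       for _ in range(50)]
est = align_and_average(obs)
# 重建只能恢复到一个全局循环移位，故对所有移位取最小误差
err = min(np.linalg.norm(est - np.roll(signal, s)) for s in range(32))
```

注意重建本身只确定到一个全局移位，这正是论文希望通过边缘化消除的全局对称性。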


【14】Decoupled Entropy Minimization
标题:去耦合熵最小化
链接:https://arxiv.org/abs/2511.03256

作者:Jing Ma, Hanlin Li, Xiang Xiang
备注:To appear at NeurIPS 2025 (main conference), San Diego, CA, USA. Codes available at this https URL
摘要:熵最小化(Entropy Minimization, EM)有益于机器学习中多种任务的类重叠减少、领域差距弥合与不确定性约束，但其潜力有限。为了研究EM的内部机制，我们将经典EM重新表述并解耦为两个效果相反的部分：簇聚合驱动因子(CADF)奖励占主导地位的类并促成尖峰化的输出分布，而梯度缓解校准器(GMC)则依据预测概率惩罚高置信度的类。我们进一步揭示了经典EM因其耦合形式而产生的局限：1)奖励崩溃阻碍了高确定性样本在学习过程中的贡献；2)易类偏差(easy-class bias)导致输出分布与标签分布之间的不一致。为了解决这些问题，我们提出自适应解耦熵最小化(AdaDEM)，它对CADF带来的奖励进行归一化，并用边际熵校准器(MEC)取代GMC。AdaDEM优于经典EM的上界变体DEM*，在噪声和动态环境下的各种不完美监督学习任务中取得了更优的性能。
摘要:Entropy Minimization (EM) is beneficial to reducing class overlap, bridging domain gap, and restricting uncertainty for various tasks in machine learning, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple the classical EM into two parts with opposite effects: cluster aggregation driving factor (CADF) rewards dominant classes and prompts a peaked output distribution, while gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal the limitations of classical EM caused by its coupled formulation: 1) reward collapse impedes the contribution of high-certainty samples in the learning process, and 2) easy-class bias induces misalignment between output distribution and label distribution. To address these issues, we propose Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the reward brought from CADF and employs a marginal entropy calibrator (MEC) to replace GMC. AdaDEM outperforms DEM*, an upper-bound variant of classical EM, and achieves superior performance across various imperfectly supervised learning tasks in noisy and dynamic environments.
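经典EM的梯度结构可以直接推导：对softmax概率 p 的熵 H(p)，有 dH/dz_k = -p_k(log p_k + H)，可拆成两个方向相反的项。下面的示意代码用数值梯度验证该恒等式(这只是帮助直观理解的标准推导，并非论文中CADF/GMC的精确定义)：

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy_grad_terms(z):
    """softmax熵 H(p) 对 logits 的梯度解析式：
        dH/dz_k = -p_k * (log p_k + H)
    可拆成两个方向相反的项：-p_k*log p_k (非负，抬升各类logit)
    与 -p_k*H (非正，按置信度压低logit)。"""
    p = softmax(z)
    H = -(p * np.log(p)).sum()
    term_up = -p * np.log(p)
    term_down = -p * H
    return term_up + term_down, p, H

z = np.array([2.0, 0.5, -1.0])
grad, p, H = entropy_grad_terms(z)

# 与中心差分数值梯度对照验证
eps = 1e-6
num_grad = np.zeros_like(z)
for k in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[k] += eps
    zm[k] -= eps
    pp, pm = softmax(zp), softmax(zm)
    num_grad[k] = (-(pp * np.log(pp)).sum()
                   + (pm * np.log(pm)).sum()) / (2 * eps)
```

两项的拉锯说明了为何耦合形式会同时产生"奖励主导类"和"压制高置信类"两种效应。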


【15】A unified physics-informed generative operator framework for general inverse problems
标题：针对一般反问题的统一物理信息生成式算子框架
链接:https://arxiv.org/abs/2511.03241

作者:Gang Bao, Yaohua Zang
摘要:求解由偏微分方程(PDE)控制的反问题是科学和工程的核心，但当测量稀疏、有噪声，或底层系数高维或不连续时，这一任务仍然具有挑战性。现有的深度学习方法要么需要大量的标记数据集，要么仅限于特定的测量类型，往往在上述情形下失效并限制了实际适用性。本文提出一个新的生成式神经算子框架IGNO来克服这些限制。IGNO统一了基于点测量与算子值数据的反问题求解，且无需标记训练对。该框架将高维的、可能不连续的系数场编码到低维潜在空间中，并以此驱动神经算子解码器同时重建系数和PDE解。训练完全依赖于PDE残差形式的物理约束，反演则通过潜在空间中高效的基于梯度的优化进行，并由先验归一化流模型加速。在一系列具有挑战性的反问题中，包括从基于解的测量中恢复不连续系数，以及基于算子测量的EIT问题，IGNO即使在严重噪声下也能实现准确、稳定且可扩展的反演。在不同噪声水平下，它始终优于最先进的方法，并对分布外目标表现出很强的泛化能力。这些结果确立了IGNO作为解决计算科学各领域中具有挑战性反问题的统一而强大的框架。
摘要:Solving inverse problems governed by partial differential equations (PDEs) is central to science and engineering, yet remains challenging when measurements are sparse, noisy, or when the underlying coefficients are high-dimensional or discontinuous. Existing deep learning approaches either require extensive labeled datasets or are limited to specific measurement types, often leading to failure in such regimes and restricting their practical applicability. Here, a novel generative neural operator framework, IGNO, is introduced to overcome these limitations. IGNO unifies the solution of inverse problems from both point measurements and operator-valued data without labeled training pairs. This framework encodes high-dimensional, potentially discontinuous coefficient fields into a low-dimensional latent space, which drives neural operator decoders to reconstruct both coefficients and PDE solutions. Training relies purely on physics constraints through PDE residuals, while inversion proceeds via efficient gradient-based optimization in latent space, accelerated by an a priori normalizing flow model. Across a diverse set of challenging inverse problems, including recovery of discontinuous coefficients from solution-based measurements and the EIT problem with operator-based measurements, IGNO consistently achieves accurate, stable, and scalable inversion even under severe noise. It consistently outperforms the state-of-the-art method under varying noise levels and demonstrates strong generalization to out-of-distribution targets. These results establish IGNO as a unified and powerful framework for tackling challenging inverse problems across computational science domains.


【16】A Probabilistic U-Net Approach to Downscaling Climate Simulations
标题:缩减气候模拟规模的概率U-Net方法
链接:https://arxiv.org/abs/2511.03197

作者:Maryam Alipourhajiagha, Pierre-Louis Lemaire, Youssef Diouane, Julie Carreau
备注:NeurIPS 2025 AI4Science
摘要:气候模型受限于高昂的计算成本，往往只能以较粗的空间分辨率产生输出，而许多气候变化影响研究需要更精细的尺度。统计降尺度可以弥合这一差距。我们将概率U-Net改造用于该任务，把确定性的U-Net骨干与变分潜在空间相结合，以捕捉偶然(aleatoric)不确定性。我们评估了四种训练目标(afCRPS以及带三种设置的WMSE-MS-SSIM)，用于把降水和温度从$16\times$的粗分辨率进行降尺度。我们的主要发现是：在某些设置下WMSE-MS-SSIM对极端情况表现良好，而afCRPS能更好地捕捉跨尺度的空间变异性。
摘要:Climate models are limited by heavy computational costs, often producing outputs at coarse spatial resolutions, while many climate change impact studies require finer scales. Statistical downscaling bridges this gap, and we adapt the probabilistic U-Net for this task, combining a deterministic U-Net backbone with a variational latent space to capture aleatoric uncertainty. We evaluate four training objectives, afCRPS and WMSE-MS-SSIM with three settings for downscaling precipitation and temperature from $16\times$ coarser resolution. Our main finding is that WMSE-MS-SSIM performs well for extremes under certain settings, whereas afCRPS better captures spatial variability across scales.
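作为参考，集合预报的标准经验CRPS可按 CRPS = E|X-y| - 0.5·E|X-X'| 计算(示意代码；afCRPS是论文中的变体，这里只演示标准形式)：

```python
import numpy as np

def ensemble_crps(samples, y):
    """集合预报的经验CRPS: E|X - y| - 0.5 * E|X - X'|
    (标准形式的示意实现；afCRPS为论文中的变体)。"""
    s = np.asarray(samples, dtype=float)
    term1 = np.abs(s - y).mean()                       # 与观测的平均偏差
    term2 = np.abs(s[:, None] - s[None, :]).mean()     # 成员间的平均散布
    return term1 - 0.5 * term2

crps_point = ensemble_crps([2.0], 1.0)             # 单成员：退化为绝对误差
crps_exact = ensemble_crps([1.0, 1.0, 1.0], 1.0)   # 集中在真值：得0
crps_spread = ensemble_crps([0.0, 2.0], 1.0)       # 围绕真值对称展开
```

CRPS同时奖励准确性与合适的散布，因而适合评价概率式降尺度模型。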


【17】Cross-Modal Alignment via Variational Copula Modelling
标题:通过变分Copula模型实现跨模式对齐
链接:https://arxiv.org/abs/2511.03196

作者:Feng Wu, Tsai Hor Chan, Fuying Wang, Guosheng Yin, Lequan Yu
备注:None
摘要:各种数据模态在现实世界的应用中十分常见(例如医疗保健中的电子健康记录、医学图像和临床笔记)。开发能聚合多模态信息的多模态学习方法至关重要。主要挑战在于如何将不同模态的表示恰当地对齐并融合为一个联合分布。现有方法主要依赖拼接(concatenation)或Kronecker积，过度简化了模态间的交互结构，说明需要对更复杂的交互进行建模。此外，带高阶交互的潜在表示联合分布尚未得到充分研究。Copula是建模变量间交互的强大统计结构，因为它天然地桥接了多个变量的联合分布与边缘分布。我们提出了一种新的copula驱动的多模态学习框架，专注于学习各模态的联合分布，以捕捉它们之间的复杂交互。其核心思想是把copula模型视为高效对齐各模态边缘分布的工具。通过为每个模态假设高斯混合分布、并在联合分布上假设copula模型，我们的模型能够为缺失模态生成准确的表示。在公共MIMIC数据集上的大量实验表明，我们的模型优于其他竞争方法。代码可在https://github.com/HKU-MedAI/CMCM获取。
摘要:Various data modalities are common in real-world applications (e.g., electronic health records, medical images and clinical notes in healthcare). It is essential to develop multimodal learning methods to aggregate various information from multiple modalities. The main challenge is how to appropriately align and fuse the representations of different modalities into a joint distribution. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities and indicating a need to model more complex interactions. Additionally, the joint distribution of latent representations with higher-order interactions is underexplored. Copula is a powerful statistical structure for modelling the interactions among variables, as it naturally bridges the joint distribution and marginal distributions of multiple variables. We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interactions among them. The key idea is to interpret the copula model as a tool to align the marginal distributions of the modalities efficiently. By assuming a Gaussian mixture distribution for each modality and a copula model on the joint distribution, our model can generate accurate representations for missing modalities. Extensive experiments on public MIMIC datasets demonstrate the superior performance of our model over other competitors. The code is available at https://github.com/HKU-MedAI/CMCM.
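高斯copula"拼接边缘分布与相关结构"的思路可以用几行代码演示(示意，非论文模型；这里的指数与均匀边缘只是假设的例子)：

```python
import math
import numpy as np

def gaussian_copula_sample(rho, n, rng):
    """用高斯copula把相关结构与任意边缘分布拼接(示意)。
    1) 采样相关二元正态; 2) 经正态CDF映射为均匀边缘;
    3) 经各自的逆CDF得到目标边缘(此处假设：指数与均匀)。"""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    norm_cdf = np.vectorize(
        lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    u = norm_cdf(z)                    # 两列均匀边缘，保留相关结构
    x = -np.log(1.0 - u[:, 0])         # Exponential(1) 边缘
    y = u[:, 1]                        # Uniform(0,1) 边缘
    return x, y

rng = np.random.default_rng(0)
x, y = gaussian_copula_sample(0.9, 5000, rng)
```

边缘分布可以各不相同，而两变量间的秩相关由copula的相关参数决定。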


【18】Periodic Skill Discovery
标题:定期技能发现
链接:https://arxiv.org/abs/2511.03187

作者:Jonghae Park, Daesol Cho, Jusuk Lee, Dongseok Shim, Inkyu Jang, H. Jin Kim
备注:NeurIPS 2025
摘要:强化学习(RL)中的无监督技能发现旨在不依赖外部奖励地学习多样化的行为。然而，目前的方法往往忽视所学技能的周期性，而专注于增加状态与技能之间的相互依赖，或最大化在潜在空间中行进的距离。考虑到许多机器人任务(特别是涉及运动的任务)需要跨不同时间尺度的周期性行为，发现多样化周期性技能的能力至关重要。基于此，我们提出了周期性技能发现(PSD)，一个以无监督方式发现周期性行为的框架。PSD的关键思想是训练一个把状态映射到圆形潜在空间的编码器，从而在潜在表示中自然地编码周期性。通过捕获时间距离，PSD能在复杂机器人任务中有效学习具有不同周期的技能，即使在基于像素的观测下也是如此。我们进一步表明，这些学到的技能在跨栏等下游任务中取得了高性能。此外，将PSD与现有技能发现方法结合可以提供更多样的行为，从而扩充智能体的行为库。我们的代码和演示可在https://jonghaepark.github.io/psd/获取
摘要:Unsupervised skill discovery in reinforcement learning (RL) aims to learn diverse behaviors without relying on external rewards. However, current methods often overlook the periodic nature of learned skills, focusing instead on increasing the mutual dependence between states and skills or maximizing the distance traveled in latent space. Considering that many robotic tasks -- particularly those involving locomotion -- require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. The key idea of PSD is to train an encoder that maps states to a circular latent space, thereby naturally encoding periodicity in the latent representation. By capturing temporal distance, PSD can effectively learn skills with diverse periods in complex robotic tasks, even with pixel-based observations. We further show that these learned skills achieve high performance on downstream tasks such as hurdling. Moreover, integrating PSD with an existing skill discovery method offers more diverse behaviors, thus broadening the agent's repertoire. Our code and demos are available at https://jonghaepark.github.io/psd/
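把状态编码到圆形潜在空间后，相位距离即单位圆上的夹角。下面是这一几何的最小示意(假设性代码，并非PSD的编码器训练流程)：

```python
import numpy as np

def to_circle(h):
    """把编码器的二维输出归一化到单位圆上，用相位角表示状态(示意)。"""
    h = np.asarray(h, dtype=float)
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def circular_distance(a, b):
    """单位圆上两点的相位距离(弧度，取值[0, pi])。"""
    cos = np.clip((a * b).sum(-1), -1.0, 1.0)
    return np.arccos(cos)

# 同一周期中相位相差四分之一圈的两个状态
p = to_circle([1.0, 0.0])
q = to_circle([0.0, 2.0])
d = circular_distance(p, q)   # 应为 pi/2
```

圆形拓扑使"走完一圈回到起点"成为潜在空间的内在结构，这正是周期性行为所需要的。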


【19】Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework
标题:迈向自主工程设计:知识引导的多主体框架
链接:https://arxiv.org/abs/2511.03179

作者:Varun Kumar, George Em Karniadakis
摘要:工程设计过程通常需要来自多个领域的专业知识，导致复杂的协作和迭代改进。传统方法可能资源密集且效率低下。为此，我们通过一个集成了结构化设计与评审循环的多智能体AI框架，将工程设计过程形式化。该框架引入专门的知识驱动代理，协作生成并完善候选设计。作为示例，我们展示了它在四位数NACA翼型气动优化中的应用。该框架由三个关键AI代理组成：图本体论者(Graph Ontologist)、设计工程师(Design Engineer)和系统工程师(Systems Engineer)。图本体论者利用大语言模型(LLM)从翼型设计文献中构建两个特定领域的知识图谱。系统工程师在人类管理者的指导下，制定指导设计生成与评估的技术要求。设计工程师利用设计知识图谱和计算工具，提出满足这些要求的候选翼型。系统工程师利用自己的知识图谱进行评审并给出定性与定量反馈，形成迭代反馈循环，直到设计得到管理者的确认。随后对最终设计进行优化，以最大化升阻比等性能指标。总体而言，这项工作展示了配备结构化知识表示的协作式AI代理如何提升工程设计过程的效率、一致性和质量。
摘要 :The engineering design process often demands expertise from multiple domains, leading to complex collaborations and iterative refinements. Traditional methods can be resource-intensive and prone to inefficiencies. To address this, we formalize the engineering design process through a multi-agent AI framework that integrates structured design and review loops. The framework introduces specialized knowledge-driven agents that collaborate to generate and refine design candidates. As an exemplar, we demonstrate its application to the aerodynamic optimization of 4-digit NACA airfoils. The framework consists of three key AI agents: a Graph Ontologist, a Design Engineer, and a Systems Engineer. The Graph Ontologist employs a Large Language Model (LLM) to construct two domain-specific knowledge graphs from airfoil design literature. The Systems Engineer, informed by a human manager, formulates technical requirements that guide design generation and evaluation. The Design Engineer leverages the design knowledge graph and computational tools to propose candidate airfoils meeting these requirements. The Systems Engineer reviews and provides feedback both qualitative and quantitative using its own knowledge graph, forming an iterative feedback loop until a design is validated by the manager. The final design is then optimized to maximize performance metrics such as the lift-to-drag ratio. Overall, this work demonstrates how collaborative AI agents equipped with structured knowledge representations can enhance efficiency, consistency, and quality in the engineering design process.


【20】UnCLe: Towards Scalable Dynamic Causal Discovery in Non-linear Temporal Systems
标题:UnCLe:非线性时间系统中的可扩展动态因果发现
链接:https://arxiv.org/abs/2511.03168

作者:Tingzhu Bi, Yicheng Pan, Xinrui Jiang, Huize Sun, Meng Ma, Ping Wang
备注:12 pages main content, 18 pages appendix, NeurIPS 2025. Code: this https URL
摘要:从观测时间序列中揭示因果关系是理解复杂系统的基础。虽然许多方法推断静态因果图,现实世界的系统往往表现出动态的因果关系,随着时间的推移而演变。准确地捕捉这些时间动态需要时间分辨的因果图。我们提出了UnCLe,这是一种用于可扩展动态因果发现的新型深度学习方法。UnCLe采用一对Uncoupler和Recoupler网络将输入时间序列分解为语义表示,并通过自回归依赖矩阵学习变量间的依赖关系。它通过分析时间扰动引起的数据点预测误差来估计动态因果影响。大量的实验表明,UnCLe不仅在静态因果发现基准上优于最先进的基线,而且更重要的是,它表现出独特的能力,可以准确地捕捉和表示合成和真实世界动态系统中不断变化的时间因果关系(例如,人体运动)。UnCLe为揭示复杂现象的潜在时变机制提供了一种有前途的方法。
摘要:Uncovering cause-effect relationships from observational time series is fundamental to understanding complex systems. While many methods infer static causal graphs, real-world systems often exhibit dynamic causality-where relationships evolve over time. Accurately capturing these temporal dynamics requires time-resolved causal graphs. We propose UnCLe, a novel deep learning method for scalable dynamic causal discovery. UnCLe employs a pair of Uncoupler and Recoupler networks to disentangle input time series into semantic representations and learns inter-variable dependencies via auto-regressive Dependency Matrices. It estimates dynamic causal influences by analyzing datapoint-wise prediction errors induced by temporal perturbations. Extensive experiments demonstrate that UnCLe not only outperforms state-of-the-art baselines on static causal discovery benchmarks but, more importantly, exhibits a unique capability to accurately capture and represent evolving temporal causality in both synthetic and real-world dynamic systems (e.g., human motion). UnCLe offers a promising approach for revealing the underlying, time-varying mechanisms of complex phenomena.
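"通过时间扰动引起的预测误差来估计动态因果影响"这一思路可以用如下示意代码说明(假设性实现，非UnCLe原方法)：对第 j 个输入变量加噪，若预测误差明显上升，则该变量影响较大：

```python
import numpy as np

def perturbation_influence(model, x, sigma=0.1, n=100, rng=None):
    """用扰动引起的预测变化估计各输入变量的影响(示意，非UnCLe)。
    对第 j 个变量加噪，度量输出的均方变化；变化越大，影响越大。"""
    if rng is None:
        rng = np.random.default_rng(0)
    base = model(x)
    scores = []
    for j in range(x.shape[-1]):
        errs = []
        for _ in range(n):
            xp = x.copy()
            xp[..., j] += sigma * rng.normal()
            errs.append(np.mean((model(xp) - base) ** 2))
        scores.append(float(np.mean(errs)))
    return np.array(scores)

# 玩具"系统"：输出只依赖变量0
model = lambda x: 3.0 * x[..., 0]
x = np.ones(4)
scores = perturbation_influence(model, x)
```

在时间序列场景中，对每个时间窗重复这一诊断即可得到随时间演变的影响图，这正是时间分辨因果图的直观雏形。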


【21】From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents
标题:从测量到专业知识:对话式人工智能代理中基于上下文的同理心的同理心专家适配器
链接:https://arxiv.org/abs/2511.03143

作者:Erfan Shayegani, Jina Suh, Andy Wilson, Nagu Rangan, Javier Hernandez
摘要:同理心是对话式AI中营造积极用户体验的关键因素。虽然模型可以表现出同理心，但这种同理心往往是泛化的，而非针对具体任务和情境定制的。在这项工作中，我们提出了一个用于开发和评估情境特定共情大语言模型(LLM)的新框架。我们首先分析了一个真实世界的对话数据集，包含8个任务下的672段多轮对话，发现对话前的预期同理心与对话后的实际体验同理心之间存在显著差异。为了缩小这一差距，我们开发了一条合成多轮对话生成管道，依据更贴近用户期望的情境，将回复引导至我们定义的共情模式。随后，我们为情境特定共情训练共情专家适配器(empathetic expert adapters)，它们根据识别出的任务专门调节不同的共情水平。我们的实证结果显示，按我们的指标和奖励模型衡量，感知同理心与期望同理心之间的差距显著缩小了72.66%，得分平均提高了2.43倍。此外，经过训练的共情专家适配器在整段对话中保持共情模式方面表现出更优的效果，胜过系统提示；后者的影响力往往随对话变长而大幅衰减。
摘要:Empathy is a critical factor in fostering positive user experiences in conversational AI. While models can display empathy, it is often generic rather than tailored to specific tasks and contexts. In this work, we introduce a novel framework for developing and evaluating context-specific empathetic large language models (LLMs). We first analyze a real-world conversational dataset consisting of 672 multi-turn conversations across 8 tasks, revealing significant differences in terms of expected and experienced empathy before and after the conversations, respectively. To help minimize this gap, we develop a synthetic multi-turn conversational generation pipeline and steer responses toward our defined empathy patterns based on the context that more closely matches users' expectations. We then train empathetic expert adapters for context-specific empathy that specialize in varying empathy levels based on the recognized task. Our empirical results demonstrate a significant gap reduction of 72.66% between perceived and desired empathy with scores increasing by an average factor of 2.43 as measured by our metrics and reward models. Additionally, our trained empathetic expert adapters demonstrate superior effectiveness in preserving empathy patterns throughout conversation turns, outperforming system prompts, which tend to dramatically diminish in impact as conversations lengthen.


【22】FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation
标题:FP-AbDiff:通过潜在的福克-普朗克方程捕捉非平衡动力学来改进基于分数的抗体设计
链接:https://arxiv.org/abs/2511.03113

作者:Jiameng Chen, Yida Xiong, Kun Li, Hongzhi Zhang, Xiantao Cai, Wenbin Hu, Jia Wu
备注:9 pages, 3 figures
摘要:计算抗体设计为治疗发现带来了巨大的希望,但现有的生成模型从根本上受到两个核心挑战的限制:(i)缺乏动态一致性,这会产生物理上不可信的结构,以及(ii)由于数据稀缺和结构偏差而导致的泛化能力差。我们引入FP-AbDiff,第一个抗体生成器,以沿着整个生成轨迹执行Fokker-Planck方程(FPE)物理。我们的方法最小化了CDR几何形状(R^3 x SO(3))的混合流形上的新FPE残差损失,迫使本地学习的去噪分数组装成全局相干概率流。这种物理学信息的正则化因子与最先进的SE(3)-等变扩散框架内的深层生物先验知识协同整合。对RAbD基准的严格评估证实了FP-AbDiff建立了一个新的最先进的设计。在从头CDR-H3设计中,当叠加在可变区上时,它实现了0.99 {\AA}的平均均方根偏差,比之前的最先进的模型AbX提高了25%,并且报告了最高的接触氨基酸回收率39.91%。这种优越性在更具挑战性的六CDR协同设计任务中得到了强调,在该任务中,我们的模型始终提供卓越的几何精度,将平均全链均方根偏差降低约15%,并且至关重要的是,在功能主导的CDR-H3环上实现了最高的全链氨基酸回收率(45.67%)。通过将生成动力学与物理定律相结合,FP-AbDiff增强了鲁棒性和通用性,为物理上忠实和功能上可行的抗体设计建立了原则性方法。
摘要 :Computational antibody design holds immense promise for therapeutic discovery, yet existing generative models are fundamentally limited by two core challenges: (i) a lack of dynamical consistency, which yields physically implausible structures, and (ii) poor generalization due to data scarcity and structural bias. We introduce FP-AbDiff, the first antibody generator to enforce Fokker-Planck Equation (FPE) physics along the entire generative trajectory. Our method minimizes a novel FPE residual loss over the mixed manifold of CDR geometries (R^3 x SO(3)), compelling locally-learned denoising scores to assemble into a globally coherent probability flow. This physics-informed regularizer is synergistically integrated with deep biological priors within a state-of-the-art SE(3)-equivariant diffusion framework. Rigorous evaluation on the RAbD benchmark confirms that FP-AbDiff establishes a new state-of-the-art. In de novo CDR-H3 design, it achieves a mean Root Mean Square Deviation of 0.99 {\AA} when superposing on the variable region, a 25% improvement over the previous state-of-the-art model, AbX, and the highest reported Contact Amino Acid Recovery of 39.91%. This superiority is underscored in the more challenging six-CDR co-design task, where our model delivers consistently superior geometric precision, cutting the average full-chain Root Mean Square Deviation by ~15%, and crucially, achieves the highest full-chain Amino Acid Recovery on the functionally dominant CDR-H3 loop (45.67%). By aligning generative dynamics with physical laws, FP-AbDiff enhances robustness and generalizability, establishing a principled approach for physically faithful and functionally viable antibody design.
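背景补充：对Itô扩散 dx = f(x,t)dt + g(t)dW，其边缘密度满足Fokker-Planck方程；论文的FPE残差损失即惩罚生成轨迹对该方程的偏离。下式给出欧氏情形的标准形式以供参照(论文将其推广到 R^3 x SO(3) 混合流形；具体损失定义以原文为准)：

```latex
\frac{\partial p_t(x)}{\partial t}
  = -\nabla\cdot\big(f(x,t)\,p_t(x)\big)
  + \frac{g(t)^2}{2}\,\Delta p_t(x),
\qquad
\mathcal{L}_{\mathrm{FPE}}
  = \mathbb{E}\!\left[\Big(\partial_t p_t + \nabla\cdot(f\,p_t)
  - \tfrac{g(t)^2}{2}\,\Delta p_t\Big)^{2}\right].
```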


【23】Sparse, self-organizing ensembles of local kernels detect rare statistical anomalies
标题:局部核的稀疏、自组织集合检测罕见的统计异常
链接:https://arxiv.org/abs/2511.03095

作者:Gaia Grosso, Sai Sumedh R. Hindupur, Thomas Fel, Samuel Bright-Thonney, Philip Harris, Demba Ba
摘要:现代人工智能已经彻底改变了我们在各科学学科中提取丰富而通用的数据表示的能力。然而，这些表示的统计特性仍然缺乏良好控制，导致误设定的异常检测(AD)方法失效。微弱或罕见的信号可能隐藏在正常数据表面的规律性之中，使我们检测和解释异常的能力出现缺口。我们研究这一缺口，并为在先验信息极少的情况下运行的检测方法确定了一组结构性要求：稀疏性，以强制简约；局部性，以保持几何敏感性；以及竞争性，以促进模型容量的有效分配。这些原则定义了一类自组织的局部核，它们围绕统计失衡区域自适应地划分表示空间。作为这些原则的一个实例，我们提出SparKer：一个在半监督Neyman-Pearson框架内训练的稀疏高斯核集成，用于局部建模可能含有异常的样本与名义的、无异常的参考样本之间的似然比。我们从理论上剖析了所提模型中驱动检测与自组织的机制，并在科学发现、开放世界新奇检测、入侵检测和生成模型验证等现实的高维问题上展示了该方法的有效性。我们的应用横跨自然科学与计算机科学领域。我们证明，仅包含少量核的集成就能在数千维的表示空间中识别出统计显著的异常位置，凸显了所提方法的可解释性、效率与可扩展性。
摘要:Modern artificial intelligence has revolutionized our ability to extract rich and versatile data representations across scientific disciplines. Yet, the statistical properties of these representations remain poorly controlled, causing misspecified anomaly detection (AD) methods to falter. Weak or rare signals can remain hidden within the apparent regularity of normal data, creating a gap in our ability to detect and interpret anomalies. We examine this gap and identify a set of structural desiderata for detection methods operating under minimal prior information: sparsity, to enforce parsimony; locality, to preserve geometric sensitivity; and competition, to promote efficient allocation of model capacity. These principles define a class of self-organizing local kernels that adaptively partition the representation space around regions of statistical imbalance. As an instantiation of these principles, we introduce SparKer, a sparse ensemble of Gaussian kernels trained within a semi-supervised Neyman--Pearson framework to locally model the likelihood ratio between a sample that may contain anomalies and a nominal, anomaly-free reference. We provide theoretical insights into the mechanisms that drive detection and self-organization in the proposed model, and demonstrate the effectiveness of this approach on realistic high-dimensional problems of scientific discovery, open-world novelty detection, intrusion detection, and generative-model validation. Our applications span both the natural- and computer-science domains. We demonstrate that ensembles containing only a handful of kernels can identify statistically significant anomalous locations within representation spaces of thousands of dimensions, underscoring both the interpretability, efficiency and scalability of the proposed approach.
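核密度估计下"样本/参考"的对数似然比是这类方法的共同起点(示意代码，非SparKer的稀疏自组织核集成)：比值显著偏离0的区域提示统计失衡，即可能的异常富集区：

```python
import numpy as np

def kde_log_ratio(data, reference, grid, h=0.3):
    """高斯核密度估计下 数据/参考 的对数似然比(示意，非SparKer)。
    对数比显著大于0的区域提示数据相对参考的统计过剩。"""
    def kde(pts, xs):
        d = xs[:, None] - pts[None, :]
        return np.exp(-0.5 * (d / h) ** 2).mean(1) / (h * np.sqrt(2 * np.pi))
    eps = 1e-12
    return np.log(kde(data, grid) + eps) - np.log(kde(reference, grid) + eps)

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 2000)              # 名义无异常参考
data = np.concatenate([rng.normal(0.0, 1.0, 1900),
                       rng.normal(3.0, 0.2, 100)])  # 混入5%的异常成分
grid = np.linspace(-4.0, 4.0, 81)
score = kde_log_ratio(data, reference, grid)        # 在 x≈3 处应显著为正
```

SparKer的不同之处在于只用少量稀疏、相互竞争的核来参数化这一比值，使其在数千维空间中仍可解释、可扩展。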


【24】Homomorphism distortion: A metric to distinguish them all and in the latent space bind them
标题:同形失真:区分它们并在潜在空间中将它们绑定的指标
链接:https://arxiv.org/abs/2511.03068

作者:Martin Carrasco, Olga Zaghen, Erik Bekkers, Bastian Rieck
摘要:长期以来，图神经网络的表达能力仅以组合性质来衡量。在这项工作中，我们偏离这一传统，给出一种有原则的方法来衡量带顶点属性的图之间的相似性。我们将这一量称为图同态失真(graph homomorphism distortion)。我们证明它可以完全刻画图，因此也是一种完全图嵌入。然而，在此过程中我们会遇到图规范化(graph canonization)问题。为绕过这一障碍，我们设计了通过采样高效计算该量的方法，其在期望意义上保证完备性。此外，我们还发现可以由这一相似性量(measure)导出一个度量(metric)。我们对上述论断进行了实证验证，发现图同态失真：(1)完全区分了包含至多$4$-WL不可区分图的BREC数据集；(2)在ZINC-12k数据集上优于以往受同态启发的方法。这些理论结果(及其实证验证)为未来的图刻画铺平了道路，把图论传统拓展到新的前沿。
摘要:For far too long, expressivity of graph neural networks has been measured \emph{only} in terms of combinatorial properties. In this work we stray away from this tradition and provide a principled way to measure similarity between vertex attributed graphs. We denote this measure as the \emph{graph homomorphism distortion}. We show it can \emph{completely characterize} graphs and thus is also a \emph{complete graph embedding}. However, somewhere along the road, we run into the graph canonization problem. To circumvent this obstacle, we devise to efficiently compute this measure via sampling, which in expectation ensures \emph{completeness}. Additionally, we also discovered that we can obtain a metric from this measure. We validate our claims empirically and find that the \emph{graph homomorphism distortion}: (1.) fully distinguishes the \texttt{BREC} dataset with up to $4$-WL non-distinguishable graphs, and (2.) \emph{outperforms} previous methods inspired in homomorphisms under the \texttt{ZINC-12k} dataset.   These theoretical results, (and their empirical validation), pave the way for future characterization of graphs, extending the graph theoretic tradition to new frontiers.


【25】Reading Between the Lines: The One-Sided Conversation Problem
标题:字里行间的解读:单边对话问题
链接:https://arxiv.org/abs/2511.03056

作者:Victoria Ebert, Rishabh Singh, Tuochao Chen, Noah A. Smith, Shyamnath Gollakota
备注:8 pages, 6 figures, 4 tables
摘要:在远程医疗、呼叫中心和智能眼镜等许多只能记录对话一方的现实场景中，对话式人工智能受到限制。我们将其形式化为单边对话问题(1SC)：从对话的一方进行推断和学习。我们研究了两个任务：(1)为实时用例重建缺失说话者的话轮；(2)从单边转录生成摘要。通过在MultiWOZ、DailyDialog和Candor上评估提示与微调模型，并结合人工A/B测试和LLM-as-a-judge指标，我们发现：获取一个未来话轮以及话语长度信息可以改善重建；占位符提示有助于减轻幻觉；大模型通过提示即可生成有希望的重建，而小模型则需要微调。此外，无需重建缺失话轮也能生成高质量摘要。我们提出1SC这一新挑战并报告了有希望的结果，标志着向隐私感知对话AI迈出了一步。
摘要:Conversational AI is constrained in many real-world settings where only one side of a dialogue can be recorded, such as telemedicine, call centers, and smart glasses. We formalize this as the one-sided conversation problem (1SC): inferring and learning from one side of a conversation. We study two tasks: (1) reconstructing the missing speaker's turns for real-time use cases, and (2) generating summaries from one-sided transcripts. Evaluating prompting and finetuned models on MultiWOZ, DailyDialog, and Candor with both human A/B testing and LLM-as-a-judge metrics, we find that access to one future turn and information about utterance length improves reconstruction, placeholder prompting helps to mitigate hallucination, and while large models generate promising reconstructions with prompting, smaller models require finetuning. Further, high-quality summaries can be generated without reconstructing missing turns. We present 1SC as a novel challenge and report promising results that mark a step toward privacy-aware conversational AI.


【26】Leveraging Discrete Function Decomposability for Scientific Design
标题:利用离散函数的可分解性进行科学设计
链接:https://arxiv.org/abs/2511.03032

作者:James C. Bowden, Sergey Levine, Jennifer Listgarten
摘要:在人工智能驱动的科学和工程时代,我们经常希望根据用户指定的属性通过计算机模拟设计离散对象。例如,我们可能希望设计一种蛋白质来结合其目标,在电路中排列组件以最小化延迟,或者寻找具有某些特性的材料。给定一个属性预测模型,计算机设计通常涉及在设计空间(例如蛋白质序列空间)上训练生成模型,以集中于具有所需性质的设计。分布优化(可形式化为分布估计算法或强化学习策略优化)寻找在期望意义下最大化目标函数的生成模型。由于设计空间的组合性质,在离散值设计上优化分布通常具有挑战性。然而,科学应用中的许多属性预测器是可分解的,即它们可以在设计变量上进行因子分解,原则上能够实现更有效的优化。例如,蛋白质催化位点处的氨基酸可能仅与蛋白质其余部分的氨基酸松散相互作用以实现最大催化活性。目前的分布优化算法无法利用这种可分解结构。在此,我们提出并演示了一种新的分布优化算法:分解感知分布优化(DADO),它可以利用设计变量上由连接树(junction tree)定义的任何可分解性,使优化更高效。在其核心,DADO采用软因子分解的"搜索分布"(一个学习得到的生成模型)来高效地导航搜索空间,并调用图消息传递来协调相连因子之间的优化。
摘要:In the era of AI-driven science and engineering, we often want to design discrete objects in silico according to user-specified properties. For example, we may wish to design a protein to bind its target, arrange components within a circuit to minimize latency, or find materials with certain properties. Given a property predictive model, in silico design typically involves training a generative model over the design space (e.g., protein sequence space) to concentrate on designs with the desired properties. Distributional optimization -- which can be formalized as an estimation of distribution algorithm or as reinforcement learning policy optimization -- finds the generative model that maximizes an objective function in expectation. Optimizing a distribution over discrete-valued designs is in general challenging because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable in the sense that they can be factorized over design variables in a way that could in principle enable more effective optimization. For example, amino acids at a catalytic site of a protein may only loosely interact with amino acids of the rest of the protein to achieve maximal catalytic activity. Current distributional optimization algorithms are unable to make use of such decomposability structure. Herein, we propose and demonstrate use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables, to make optimization more efficient. At its core, DADO employs a soft-factorized "search distribution" -- a learned generative model -- for efficient navigation of the search space, invoking graph message-passing to coordinate optimization across linked factors.
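摘要的核心观察是:可分解的属性函数允许按因子搜索。下面用一个极简的玩具例子示意这一点(目标函数与字母表均为假设,仅演示"按因子优化把 $|A|^4$ 的枚举降为 $2|A|^2$",与 DADO 的软因子分解搜索分布和消息传递无关):

```python
import itertools

# 假设属性函数按因子分解:f(x0,x1,x2,x3) = f1(x0,x1) + f2(x2,x3)
ALPHABET = [0, 1, 2]

def f1(x0, x1):
    return -(x0 - 1) ** 2 - (x1 - 2) ** 2

def f2(x2, x3):
    return -(x2 - 0) ** 2 - (x3 - 1) ** 2

# 按因子独立枚举:成本 2*|A|^2,而非对 4 个变量联合枚举的 |A|^4
best1 = max(itertools.product(ALPHABET, repeat=2), key=lambda p: f1(*p))
best2 = max(itertools.product(ALPHABET, repeat=2), key=lambda p: f2(*p))
design = best1 + best2   # 各因子最优拼成整体最优设计
```

当因子之间通过连接树共享变量时,就需要论文所述的消息传递来协调,而不能像这里一样完全独立求解。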


【27】Value of Information-Enhanced Exploration in Bootstrapped DQN
标题:Bootstrapped DQN中由信息价值增强的探索
链接:https://arxiv.org/abs/2511.02969

作者:Stergios Plataniotis, Charilaos Akasiadis, Georgios Chalkiadakis
摘要:深度强化学习中的有效探索仍然是一个根本性的挑战,特别是在以高维状态和稀疏奖励为特征的环境中。依赖于随机局部策略噪声的传统探索策略,例如$\epsilon$-greedy和Boltzmann探索方法,通常难以有效地平衡探索和利用。在本文中,我们将(期望)信息价值(EVOI)的概念集成到著名的Bootstrapped DQN算法框架内,以增强该算法的深度探索能力。具体来说,我们开发了两种新算法,将学习信息价值所带来的期望收益纳入Bootstrapped DQN。我们的方法使用信息价值估计来衡量不同网络头之间的意见分歧,并将探索推向最具潜力的区域。我们评估了算法的性能及其利用随机网络初始化所产生的固有不确定性的能力。我们在复杂、稀疏奖励的Atari游戏中的实验表明,性能得到了提升,同时更好地利用了不确定性,并且重要的是,没有引入额外的超参数。
摘要:Efficient exploration in deep reinforcement learning remains a fundamental challenge, especially in environments characterized by high-dimensional states and sparse rewards. Traditional exploration strategies that rely on random local policy noise, such as $\epsilon$-greedy and Boltzmann exploration methods, often struggle to efficiently balance exploration and exploitation. In this paper, we integrate the notion of (expected) value of information (EVOI) within the well-known Bootstrapped DQN algorithmic framework, to enhance the algorithm's deep exploration ability. Specifically, we develop two novel algorithms that incorporate the expected gain from learning the value of information into Bootstrapped DQN. Our methods use value of information estimates to measure the discrepancies of opinions among distinct network heads, and drive exploration towards areas with the most potential. We evaluate our algorithms with respect to performance and their ability to exploit inherent uncertainty arising from random network initialization. Our experiments in complex, sparse-reward Atari games demonstrate increased performance, all the while making better use of uncertainty, and, importantly, without introducing extra hyperparameters.
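摘要中"用各网络头之间的意见分歧引导探索"的思路,可以用如下草图示意:以各 bootstrap 头 Q 值的方差作为信息价值的粗略代理。权衡系数与具体公式均为假设,并非论文的 EVOI 估计:

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, n_actions = 10, 4
# 各 bootstrap 头对当前状态各动作的 Q 估计(此处用随机数代替真实网络输出)
q_heads = rng.normal(size=(n_heads, n_actions))

q_mean = q_heads.mean(axis=0)        # 利用项:头间平均 Q 值
disagreement = q_heads.var(axis=0)   # 探索项:头间分歧(信息价值的粗略代理)
beta = 0.5                           # 假设的探索权衡系数
action = int(np.argmax(q_mean + beta * disagreement))
```

分歧大的动作被优先探索;分歧随学习收敛而消退,探索自然退火。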


【28】Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics
标题:具有习惯化与恢复动力学的功效约束非平稳bandit
链接:https://arxiv.org/abs/2511.02944

作者:Fengxu Li, Stephanie M. Carpenter, Matthew P. Buman, Yonatan Mintz
摘要:决策者面临的一个共同挑战是选择回报未知且会基于先前策略随时间演变的行动。例如,重复使用可能会降低某一行动的有效性(习惯化),而不使用则可能使其恢复(恢复)。这些非平稳性由"降低或获得未知功效"(ROGUE)bandit框架所刻画,该框架对行为健康干预等真实世界场景进行建模。虽然现有算法可以为这些场景计算次线性遗憾策略,但由于过度偏重利用,它们可能无法提供足够的探索,从而限制了估计人群水平效应的能力。这一挑战在微随机试验(MRT)中尤为突出:MRT帮助研究人员开发具有人群水平效应的即时自适应干预,同时仍向个体提供个性化建议。在本文中,我们首先开发了为ROGUE框架量身定制的Thompson采样算法ROGUE-TS,并给出次线性遗憾的理论保证。然后,我们引入了一个概率截断程序来平衡个性化与人群水平学习,并量化了遗憾与最小探索概率之间的权衡。在涉及体力活动促进和双相情感障碍治疗的两个MRT数据集上的验证表明,我们的方法比现有方法实现了更低的遗憾,并通过截断程序保持了较高的统计功效而未显著增加遗憾。这使得在考虑个体行为动态的同时可靠地检测治疗效果成为可能。对于设计MRT的研究人员,我们的框架为平衡个性化与统计有效性提供了实用指导。
摘要:A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore it (recovery). These nonstationarities are captured by the Reducing or Gaining Unknown Efficacy (ROGUE) bandit framework, which models real-world settings such as behavioral health interventions. While existing algorithms can compute sublinear regret policies to optimize these settings, they may not provide sufficient exploration due to overemphasis on exploitation, limiting the ability to estimate population-level effects. This is a challenge of particular interest in micro-randomized trials (MRTs) that aid researchers in developing just-in-time adaptive interventions that have population-level effects while still providing personalized recommendations to individuals. In this paper, we first develop ROGUE-TS, a Thompson Sampling algorithm tailored to the ROGUE framework, and provide theoretical guarantees of sublinear regret. We then introduce a probability clipping procedure to balance personalization and population-level learning, with quantified trade-off that balances regret and minimum exploration probability. Validation on two MRT datasets concerning physical activity promotion and bipolar disorder treatment shows that our methods both achieve lower regret than existing approaches and maintain high statistical power through the clipping procedure without significantly increasing regret. This enables reliable detection of treatment effects while accounting for individual behavioral dynamics. For researchers designing MRTs, our framework offers practical guidance on balancing personalization with statistical validity.
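摘要中的"概率截断程序"可以用如下最小草图示意(两臂情形;阈值 p_min 的取值与采样方式均为假设,非论文原实现):

```python
import numpy as np

def clip_probability(p_select, p_min=0.1):
    """把 Thompson 采样给出的选臂概率截断到 [p_min, 1-p_min],保证最小探索概率。"""
    return float(np.clip(p_select, p_min, 1.0 - p_min))

rng = np.random.default_rng(1)
p_raw = 0.99                          # 假设 TS 后验几乎总是选臂 1
p_clipped = clip_probability(p_raw)   # 截断后为 0.9
arm = int(rng.random() < p_clipped)   # 以截断后的概率选臂 1,否则选臂 0
```

截断牺牲少量遗憾换取每个臂的最小采样概率,从而保住检验处理效应所需的统计功效。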


【29】FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
标题:FATE:面向多个难度级别的前沿代数形式化基准系列
链接:https://arxiv.org/abs/2511.02872

作者:Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailing Guan, Peihao Wu, Chunbo Dai, Liang Xiao, Bin Dong
摘要:大型语言模型(LLM)的最新进展在形式定理证明方面表现出了令人印象深刻的能力,特别是在IMO等基于竞赛的数学基准上。然而,这些竞赛并不能反映现代数学研究的深度、广度和抽象性。为了弥合这一差距,我们引入了FATE(Formal Algebra Theorem Evaluation,形式代数定理评估),这是一个新的形式代数基准系列,旨在为高级数学推理指明路线。我们提出了两个新的组成部分FATE-H和FATE-X,各含100道抽象代数与交换代数问题。FATE系列的难度范围从本科练习一直延伸到超过博士资格考试的问题。值得注意的是,FATE-X是第一个在难度上超过博士水平考试、在覆盖范围上超过Mathlib库的形式化基准。我们在这一新基准上对最先进的LLM证明器的评估显示出与竞赛数学相比的明显性能差距:最好的模型在FATE-H上仅达到3%(pass@64)的准确率,在FATE-X上为0%。我们的两阶段评估表明,模型的自然语言推理明显比其将该推理形式化的能力更准确。我们系统地分类了在形式化过程中出现的常见错误。此外,一项比较研究表明,专用证明器的反思(reflection)可能不如通用模型有效,从而降低其在自然语言阶段的准确率。我们相信FATE提供了一个强大而具有挑战性的基准,在通往研究级形式化数学推理的道路上建立了必要的检查点。
摘要:Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models' natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.
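摘要中的 pass@64 这类指标,通常按 Chen et al. (2021) 提出的无偏估计公式计算:从每题 $n$ 次采样、$c$ 次通过中估计 $\mathrm{pass@}k = 1 - \binom{n-c}{k}/\binom{n}{k}$(这是通用公式;FATE 论文是否采用完全相同的实现属于假设):

```python
from math import comb

def pass_at_k(n, c, k):
    """pass@k 的无偏估计:n 次采样中 c 次通过。"""
    if n - c < k:          # 失败样本不足 k 个时,任取 k 个必含通过样本
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

p = pass_at_k(64, 0, 64)   # 64 次采样全部失败时 pass@64 = 0,对应摘要中的 0%
```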


【30】Supersimulators
标题:超级模拟器
链接:https://arxiv.org/abs/2509.17994

作者:Cynthia Dwork, Pranay Tankala
摘要:我们证明,每一个随机布尔函数都承认一个超级模拟器:一个随机多项式大小的电路,其在随机输入上的输出无法被有效地以常数优势与真实情况区分开,即使区分器比模拟器大多项式倍。我们的结果建立在Trevisan、Tulsiani和Vadhan(2009)的里程碑式复杂性理论正则性引理之上;相比之下,该引理提供的模拟器只能欺骗更小的区分器。我们通过让区分器大小的界限随目标函数变化(同时保持低于一个与目标函数无关的绝对上界)来绕过对模拟器大小的下界。这种对目标函数的依赖性自然来自我们使用的源于图正则性文献的迭代技术。   根据Hebert-Johnson等人(2018)的说法,由正则性引理及其最近的改进所提供的模拟器(分别被称为多精度(multiaccurate)和多校准(multicalibrated)预测器)此前已被证明在复杂性理论、密码学、学习理论等领域有无数应用。我们首先表明,最近基于多校准的关于乘积分布计算不可区分性的刻画实际上只需要(校准的)多精度。然后我们表明,超级模拟器在这一应用领域产生更紧的结果,弥合了该刻画先前版本中存在的复杂性差距。
摘要:We prove that every randomized Boolean function admits a supersimulator: a randomized polynomial-size circuit whose output on random inputs cannot be efficiently distinguished from reality with constant advantage, even by polynomially larger distinguishers. Our result builds on the landmark complexity-theoretic regularity lemma of Trevisan, Tulsiani and Vadhan (2009), which, in contrast, provides a simulator that fools smaller distinguishers. We circumvent lower bounds for the simulator size by letting the distinguisher size bound vary with the target function, while remaining below an absolute upper bound independent of the target function. This dependence on the target function arises naturally from our use of an iteration technique originating in the graph regularity literature.   The simulators provided by the regularity lemma and recent refinements thereof, known as multiaccurate and multicalibrated predictors, respectively, as per Hebert-Johnson et al. (2018), have previously been shown to have myriad applications in complexity theory, cryptography, learning theory, and beyond. We first show that a recent multicalibration-based characterization of the computational indistinguishability of product distributions actually requires only (calibrated) multiaccuracy. We then show that supersimulators yield an even tighter result in this application domain, closing a complexity gap present in prior versions of the characterization.


【31】The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
标题:批量非参数bandit中的适应性障碍:未知边际参数代价的精确刻画
链接:https://arxiv.org/abs/2511.03708

作者:Rong Jiang, Cong Ma
摘要:我们研究边际条件(margin condition)下的批量非参数上下文bandit问题,其中边际参数$\alpha$未知。为了刻画这种无知的统计代价,我们引入遗憾膨胀(regret inflation)准则,定义为自适应算法的遗憾与已知$\alpha$的预言机的遗憾之比。我们证明最优遗憾膨胀随时间范围$T$呈多项式增长,其指数恰好由一个涉及维度、光滑度和批次预算的凸优化问题的值给出。此外,该优化问题的极小值点直接规定了速率最优算法的批次分配和探索策略。基于这一原理,我们开发了RoBIN(RObust batched algorithm with adaptive BINning),它在对数因子范围内实现了最优遗憾膨胀。这些结果揭示了一个新的适应性障碍:在批量设置下,对未知边际参数的适应不可避免地招致多项式代价,并由一个变分问题精确刻画。值得注意的是,当批次数超过$\log\log T$时,这一障碍消失;只需双对数次数的更新,就能在多对数因子范围内恢复预言机的遗憾率。
摘要:We study batched nonparametric contextual bandits under a margin condition when the margin parameter $\alpha$ is unknown. To capture the statistical price of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing $\alpha$. We show that the optimal regret inflation grows polynomial with the horizon $T$, with exponent precisely given by the value of a convex optimization problem involving the dimension, smoothness, and batch budget. Moreover, the minimizers of this optimization problem directly prescribe the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to logarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes when the number of batches exceeds $\log \log T$; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.
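按摘要的定义,"遗憾膨胀"准则可示意性地写成下式(记号为本文之外的示意;指数 $\gamma$ 的具体值由摘要所述涉及维度、光滑度与批次预算的凸优化问题给出):

```latex
\mathrm{RI}(T) \;=\; \sup_{\mathcal{D}}\;
\frac{\mathbb{E}\!\left[\mathrm{Regret}_T(\mathrm{adaptive})\right]}
     {\mathbb{E}\!\left[\mathrm{Regret}_T(\mathrm{oracle}(\alpha))\right]},
\qquad
\mathrm{RI}^{\star}(T) \;=\; T^{\,\gamma + o(1)}.
```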


【32】Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity
标题:超越次高斯性的向量值自归一化浓度不等式
链接:https://arxiv.org/abs/2511.03606

作者:Diego Martinez-Taboada, Tomas Gonzalez, Aaditya Ramdas
摘要:自归一化过程的研究在从序贯决策到计量经济学的广泛应用中起着至关重要的作用。对于标量值过程,自归一化浓度的行为已被广泛研究,但向量值过程仍相对欠缺探索,尤其是在次高斯框架之外。在本文中,我们为轻尾但超出次高斯范围的自归一化过程提供了浓度界(如Bennett型或Bernstein型界)。我们在在线线性回归的背景下说明了这些结果的意义,并将其应用于(核化)线性bandit。
摘要:The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued processes, vector-valued processes remain comparatively underexplored, especially outside of the sub-Gaussian framework. In this contribution, we provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.
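作为摘要所述"次高斯之外的轻尾界"(如 Bernstein 型界)的背景,可回顾经典的标量 Bernstein 不等式;这只是标准结果,并非论文中的向量值自归一化界:设 $X_1,\dots,X_n$ 独立、零均值、$|X_i|\le b$ 且 $\mathrm{Var}(X_i)\le\sigma^2$,则对任意 $t>0$,

```latex
\mathbb{P}\!\left(\Big|\sum_{i=1}^{n} X_i\Big| \ge t\right)
\;\le\; 2\exp\!\left(-\,\frac{t^{2}}{2\left(n\sigma^{2} + b t/3\right)}\right).
```

与次高斯界相比,方差项 $n\sigma^2$ 较小时该界更紧,这正是 Bennett/Bernstein 型结果的价值所在。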


【33】The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits
标题:交叉验证误差的结构:稳定性、协方差和极小极大极限
链接:https://arxiv.org/abs/2511.03554

作者:Ido Nachum, Rüdiger Urbanke, Thomas Weinberger
备注:59 pages
摘要:Despite ongoing theoretical research on cross-validation (CV), many theoretical questions about CV remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold cross-validation.   Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the typically required hypothesis stability in other comparable works.   Furthermore, we prove:   1. For every learning algorithm that minimizes empirical error, a minimax lower bound on the mean-squared error of $k$-fold CV estimating the population risk $L_\mathcal{D}$: \[ \min_{k \mid n}\; \max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^{2}\right] \;=\; \Omega\!\big(\sqrt{k}/n\big), \] where $n$ is the sample size and $k$ the number of folds. This shows that even under idealized conditions, for large values of $k$, CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds.   2. Complementing this, we exhibit learning rules for which \[   \max_{\mathcal{D}}\; \mathbb{E}\!\left[\big(\widehat{L}_{\mathrm{CV}}^{(k)} - L_{\mathcal{D}}\big)^{2}\right] \;=\; \Omega(k/n), \] matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$.   Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.
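摘要研究的对象 $\widehat{L}_{\mathrm{CV}}^{(k)}$ 即标准的 $k$ 折交叉验证风险估计量,其朴素实现如下(这里用"训练集均值预测 + 平方损失"作为假设的学习规则,仅为演示估计量本身,与论文的理论分析无关):

```python
import numpy as np

def kfold_cv_risk(y, k):
    """k 折交叉验证风险估计:各折留出损失的平均。"""
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    losses = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        pred = y[train].mean()                          # 在 n - n/k 个样本上"训练"
        losses.append(np.mean((y[fold] - pred) ** 2))   # 在留出折上评估
    return float(np.mean(losses))

rng = np.random.default_rng(0)
risk = kfold_cv_risk(rng.normal(size=100), k=5)
```

摘要中折间相关性的来源一目了然:不同折的训练集大量重叠,各折损失并非独立。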


【34】Topography, climate, land cover, and biodiversity: Explaining endemic richness and management implications on a Mediterranean island
标题:地形、气候、土地覆盖和生物多样性:解释地中海岛屿的地方性丰富性和管理影响
链接:https://arxiv.org/abs/2511.03242

作者:Aristides Moustakas, Ioannis N Vogiatzakis
摘要:岛屿特有性由环境、生态和进化因素之间的复杂相互作用所塑造,但地形、气候和土地覆盖的相对贡献仍未被完全量化。我们利用物种分布、地形复杂性、气候变异性、土地覆盖和土壤特性的空间显式数据,研究了地中海生物多样性热点克里特岛上特有植物丰富度的驱动因素。我们采用人工神经网络模型(一种机器学习工具)来评估这些预测因子的相对重要性,并识别特有性热点。我们发现,总物种丰富度、海拔范围和气候变异性是特有丰富度最强的预测因子,反映了生物多样性、地形异质性和气候梯度在产生多样化栖息地和微避难所、促进物种形成并缓冲灭绝风险方面的作用。特有性热点与高总物种丰富度地区仅部分重叠,表明在所考察的变量中,总物种丰富度是最佳的、但并不完美的替代指标。这些环境异质性地区还提供关键的生态系统服务,包括土壤稳定、授粉和文化价值,而这些服务正日益受到旅游业、可再生能源开发、土地利用变化和气候影响的威胁。我们的研究结果强调了在保护规划中优先考虑山区和气候多变地区、整合生态系统服务考量并顾及岛屿内部空间异质性的重要性。通过将特有性的环境驱动因素与生物多样性格局和生态系统功能明确联系起来,本研究为克里特岛以及其他具有相似地质和生物地理背景的地中海岛屿的循证保护规划提供了一个框架。
摘要:Island endemism is shaped by complex interactions among environmental, ecological, and evolutionary factors, yet the relative contributions of topography, climate, and land cover remain incompletely quantified. We investigated the drivers of endemic plant richness across Crete, a Mediterranean biodiversity hotspot, using spatially explicit data on species distributions, topographic complexity, climatic variability, land cover, and soil characteristics. Artificial Neural Network models, a machine learning tool, were employed to assess the relative importance of these predictors and to identify hotspots of endemism. We found that total species richness, elevation range, and climatic variability were the strongest predictors of endemic richness, reflecting the role of biodiversity, topographic heterogeneity, and climatic gradients in generating diverse habitats and micro-refugia that promote speciation and buffer extinction risk. Endemic hotspots only partially overlapped with areas of high total species richness, indicating that total species richness was the optimal from the ones examined, yet an imperfect surrogate. These environmentally heterogeneous areas also provide critical ecosystem services, including soil stabilization, pollination, and cultural value, which are increasingly threatened by tourism, renewable energy development, land-use change, and climate impacts. Our findings underscore the importance of prioritizing mountainous and climatically variable regions in conservation planning, integrating ecosystem service considerations, and accounting for within-island spatial heterogeneity. By explicitly linking the environmental drivers of endemism to both biodiversity patterns and ecosystem function, this study provides a framework for evidence-based conservation planning in Crete and other Mediterranean islands with similar geological and biogeographic contexts.
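摘要中"评估预测因子相对重要性"的常用做法之一是置换重要性(permutation importance):打乱某个变量后观察误差上升多少。下面用闭式线性模型代替论文中的人工神经网络给出草图(数据、变量与模型均为假设,仅示意该评估思路):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                       # 假设 3 个预测变量
y = 3.0 * X[:, 0] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=n)

# 用最小二乘线性模型代替 ANN,便于演示
coef, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], y, rcond=None)
predict = lambda M: np.c_[np.ones(len(M)), M] @ coef
base_mse = np.mean((y - predict(X)) ** 2)

importance = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])          # 打乱第 j 个变量
    importance.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
```

打乱后误差上升越多,该变量越重要;这一做法与具体模型(线性模型或 ANN)无关。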


【35】Statistical Properties of Rectified Flow
标题:整流流的统计性质
链接:https://arxiv.org/abs/2511.03193

作者:Gonzalo Mena, Arun Kumar Kuchibhotla, Larry Wasserman
备注:159 pages, 7 figures
摘要:整流流(Liu等人,2022; Liu,2022; Wu等人,2023)是一种定义两个分布之间的传输映射的方法,在机器学习中很受欢迎,尽管支持这些方法有效性的理论结果很少。整流流可以被认为是最优传输的近似,但与其他需要在函数空间上优化的传输方法相比,计算整流流只需要标准的统计工具,如回归或密度估计。正因为如此,人们可以利用标准的数据分析工具进行回归和密度估计,以开发经验版本的交通地图。我们研究整流流的一些结构性质,包括存在性,唯一性和正则性,以及相关的统计性质,如收敛速度和中心极限定理,为一些选定的估计。要做到这一点,我们分别分析了有界和无界的情况下,因为每一个提出了独特的挑战。在这两种情况下,我们能够建立收敛速度比通常的非参数回归和密度估计。
摘要:Rectified flow (Liu et al., 2022; Liu, 2022; Wu et al., 2023) is a method for defining a transport map between two distributions, and enjoys popularity in machine learning, although theoretical results supporting the validity of these methods are scant. The rectified flow can be regarded as an approximation to optimal transport, but in contrast to other transport methods that require optimization over a function space, computing the rectified flow only requires standard statistical tools such as regression or density estimation. Because of this, one can leverage standard data analysis tools for regression and density estimation to develop empirical versions of transport maps. We study some structural properties of the rectified flow, including existence, uniqueness, and regularity, as well as the related statistical properties, such as rates of convergence and central limit theorems, for some selected estimators. To do so, we analyze separately the bounded and unbounded cases as each presents unique challenges. In both cases, we are able to establish convergence at faster rates than the ones for the usual nonparametric regression and density estimation.
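摘要强调整流流只需标准回归工具:在线性插值 $X_t=(1-t)X_0+tX_1$ 上回归速度场的目标 $X_1-X_0$。下面是一维玩具草图(线性速度场参数化、独立耦合与欧拉步数均为假设,仅示意这一"传输即回归"的视角):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(0.0, 1.0, n)          # 源分布 N(0,1) 的样本
x1 = rng.normal(5.0, 1.0, n)          # 目标分布 N(5,1) 的样本(独立配对)
t = rng.uniform(0.0, 1.0, n)
xt = (1 - t) * x0 + t * x1            # 线性插值点 X_t
target = x1 - x0                      # 回归目标:沿插值路径的恒定速度

# 用最小二乘拟合一个(假设的)线性速度场 v(x,t) ≈ a + b*x + c*t
A = np.stack([np.ones(n), xt, t], axis=1)
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

# 用学到的速度场做欧拉积分,把新的源样本传输到目标分布附近
x = rng.normal(0.0, 1.0, n)
steps = 100
for k in range(steps):
    s = k / steps
    x = x + (coef[0] + coef[1] * x + coef[2] * s) / steps
```

实际方法用神经网络参数化速度场;这里的线性拟合只为展示回归与传输之间的对应关系。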


【36】Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies
标题:优化地月转移与地月空间导航:集成低能量轨迹、人工智能技术和GNSS-R技术
链接:https://arxiv.org/abs/2511.03173

作者:Arsalan Muhammad, Wasiu Akande Ahmed, Omada Friday Ojonugwa, Paul Puspendu Biswas
备注:None
摘要:地月活动的快速增长,包括月球着陆,月球门户和太空加油站,需要在具有成本效益的轨道设计和可靠的导航和遥感集成方面取得进展。传统的地月转移受制于严格的发射窗口和高推进剂需求,而地球上的全球导航卫星系统在地球静止轨道以外几乎没有覆盖。这限制了地月空间的自主性和环境意识。本文通过评估速度要求、飞行持续时间和燃料效率,并确定它们对载人和机器人任务的适用性,比较了四种主要的转移策略。人工智能和机器学习的新兴作用得到了强调:卷积神经网络支持自动火山口识别和数字地形模型生成,而深度强化学习可以在下降和着陆过程中实现自适应轨迹优化,以降低风险和决策延迟。该研究还探讨了GNSS反射测量和先进的定位、导航和定时架构如何将导航能力扩展到当前的极限之外。GNSS-R可以作为一个双基地雷达,用于测绘月球冰、土壤特性和表面地形,而PNT系统支持自主交会、拉格朗日点位置保持和协调的卫星群操作。结合这些发展,为可持续的地月探测和长期的人类和机器人存在建立了一个可扩展的框架。
摘要:The rapid growth of cislunar activities, including lunar landings, the Lunar Gateway, and in-space refueling stations, requires advances in cost-efficient trajectory design and reliable integration of navigation and remote sensing. Traditional Earth-Moon transfers suffer from rigid launch windows and high propellant demands, while Earth-based GNSS systems provide little to no coverage beyond geostationary orbit. This limits autonomy and environmental awareness in cislunar space. This review compares four major transfer strategies by evaluating velocity requirements, flight durations, and fuel efficiency, and by identifying their suitability for both crewed and robotic missions. The emerging role of artificial intelligence and machine learning is highlighted: convolutional neural networks support automated crater recognition and digital terrain model generation, while deep reinforcement learning enables adaptive trajectory refinement during descent and landing to reduce risk and decision latency. The study also examines how GNSS-Reflectometry and advanced Positioning, Navigation, and Timing architectures can extend navigation capabilities beyond current limits. GNSS-R can act as a bistatic radar for mapping lunar ice, soil properties, and surface topography, while PNT systems support autonomous rendezvous, Lagrange point station-keeping, and coordinated satellite swarm operations. Combining these developments establishes a scalable framework for sustainable cislunar exploration and long-term human and robotic presence.


【37】Quantifying Articulatory Coordination as a Biomarker for Schizophrenia
标题:量化关节协调作为精神分裂症的生物标志物
链接:https://arxiv.org/abs/2511.03084

作者:Gowtham Premananth, Carol Espy-Wilson
备注:Submitted to ICASSP 2026
摘要:人工智能(AI)和深度学习的进步提高了医疗保健的诊断能力,但有限的可解释性继续阻碍临床应用。精神分裂症是一种症状多样的复杂疾病,包括言语紊乱和社交退缩,需要能够捕捉症状严重程度并提供超越二元诊断的临床有意义见解的工具。在这里,我们提出了一个可解释的框架,利用发音语音特征,通过特征谱差异图和指数衰减加权和(WSED)来量化声道协调。特征谱图有效地区分了复杂与较简单的协调模式,WSED分数可靠地分离了这些群体,模糊性仅限于接近零的一个狭窄范围。重要的是,WSED评分不仅与总体BPRS严重程度相关,而且与阳性和阴性症状之间的平衡相关:阳性症状明显的受试者表现出更复杂的协调性,而阴性症状较强者呈相反趋势。这种方法为精神分裂症提供了一个透明的、对严重程度敏感的生物标志物,推进了临床可解释的基于语音的评估工具的潜力。
摘要:Advances in artificial intelligence (AI) and deep learning have improved diagnostic capabilities in healthcare, yet limited interpretability continues to hinder clinical adoption. Schizophrenia, a complex disorder with diverse symptoms including disorganized speech and social withdrawal, demands tools that capture symptom severity and provide clinically meaningful insights beyond binary diagnosis. Here, we present an interpretable framework that leverages articulatory speech features through eigenspectra difference plots and a weighted sum with exponential decay (WSED) to quantify vocal tract coordination. Eigenspectra plots effectively distinguished complex from simpler coordination patterns, and WSED scores reliably separated these groups, with ambiguity confined to a narrow range near zero. Importantly, WSED scores correlated not only with overall BPRS severity but also with the balance between positive and negative symptoms, reflecting more complex coordination in subjects with pronounced positive symptoms and the opposite trend for stronger negative symptoms. This approach offers a transparent, severity-sensitive biomarker for schizophrenia, advancing the potential for clinically interpretable speech-based assessment tools.
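摘要中的 WSED(指数衰减加权和)思路可以草图如下:对两组信号相关矩阵的特征谱取差,再做指数衰减加权求和。权重形式、衰减率与参考谱的选取均为假设,并非论文原式:

```python
import numpy as np

def eigenspectrum(signals):
    """通道间相关矩阵的特征谱(降序)。signals 形状为 (通道数, 时间点数)。"""
    corr = np.corrcoef(signals)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def wsed(spec_a, spec_b, decay=0.5):
    """特征谱差的指数衰减加权和:低序(大特征值)分量权重更高。"""
    diff = spec_a - spec_b
    weights = np.exp(-decay * np.arange(len(diff)))
    return float(np.sum(weights * diff))

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 200))   # 假设 6 个发音特征通道的两组信号
b = rng.normal(size=(6, 200))
score = wsed(eigenspectrum(a), eigenspectrum(b))
```

直觉是:通道间协调越复杂,相关矩阵的谱越"平坦",谱差经加权求和便得到一个标量严重度分数。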


机器翻译由腾讯交互翻译提供,仅供参考


本文地址:http://www.python88.com/topic/188745