
Machine Learning Academic Digest [1.5]

arXiv Daily Academic Digest



cs.LG: 104 papers in total today


Large model related (8 papers)

【1】Memory Bank Compression for Continual Adaptation of Large Language Models
Link: https://arxiv.org/abs/2601.00756

Authors: Thomas Katraouras, Dimitrios Rafailidis
Comments: Accepted to the 41st ACM/SIGAPP Symposium on Applied Computing (SAC '26)
Abstract: Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve, their knowledge quickly becomes outdated. Continual learning aims to update LLMs with new information without erasing previously acquired knowledge. Although methods such as full fine-tuning can incorporate new data, they are computationally expensive and prone to catastrophic forgetting, where prior knowledge is overwritten. Memory-augmented approaches address this by equipping LLMs with a memory bank, that is, an external memory module which stores information for future use. However, these methods face a critical limitation: the memory bank grows constantly in real-world scenarios where large-scale data streams arrive. In this paper, we propose MBC, a model that compresses the memory bank through a codebook optimization strategy during online adaptation learning. To ensure stable learning, we also introduce an online resetting mechanism that prevents codebook collapse. In addition, we employ Key-Value Low-Rank Adaptation in the attention layers of the LLM, enabling efficient utilization of the compressed memory representations. Experiments with benchmark question-answering datasets demonstrate that MBC reduces the memory bank size to 0.3% when compared against the most competitive baseline, while maintaining high retention accuracy during online adaptation learning. Our code is publicly available at https://github.com/Thomkat/MBC.
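To make the codebook idea above concrete, here is a minimal NumPy sketch of compressing a memory bank with a small codebook plus a reset step for unused codewords. It is illustrative only (not the authors' released code), and every size, name, and value is an assumption:

```python
import numpy as np

def compress_memory(memory, num_codes=16, iters=10, rng=np.random.default_rng(0)):
    """Vector-quantize a memory bank (N x D) into a small codebook.

    Returns the codebook (num_codes x D) and per-entry code assignments,
    so the bank can be stored as integer codes instead of raw vectors.
    """
    codebook = memory[rng.choice(len(memory), num_codes, replace=False)].copy()
    for _ in range(iters):
        # Assign each memory vector to its nearest codeword (squared L2).
        d = ((memory[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for k in range(num_codes):
            members = memory[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
            else:
                # Reset step: revive an unused codeword with a random entry,
                # a simple stand-in for preventing codebook collapse.
                codebook[k] = memory[rng.integers(len(memory))]
    return codebook, assign

memory_bank = np.random.randn(2048, 64)   # 2048 stored vectors of dimension 64
codebook, codes = compress_memory(memory_bank)
print(codebook.shape, codes.shape)         # (16, 64) (2048,)
```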


【2】QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models
Link: https://arxiv.org/abs/2601.00679

Authors: Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique
Comments: Accepted at the Design, Automation and Test in Europe Conference (DATE) 2025 on April 20th-22nd, 2025 in Verona, Italy
Abstract: Large Language Models (LLMs) have been emerging as prominent AI models for solving many natural language tasks due to their high performance (e.g., accuracy) and capabilities in generating high-quality responses to the given inputs. However, their large computational cost, huge memory footprints, and high processing power/energy make their embedded deployment challenging. Amid several tinyLLMs, recent works have proposed spike-driven language models (SLMs) for significantly reducing the processing power/energy of LLMs. However, their memory footprints still remain too large for low-cost and resource-constrained embedded devices. A manual quantization approach may effectively compress SLM memory footprints, but it requires a huge design time and compute power to find the quantization setting for each network, hence making this approach not scalable for handling different networks, performance requirements, and memory budgets. To bridge this gap, we propose QSLM, a novel framework that performs automated quantization for compressing pre-trained SLMs, while meeting the performance and memory constraints. To achieve this, QSLM first identifies the hierarchy of the given network architecture and the sensitivity of network layers under quantization, then employs a tiered quantization strategy (e.g., global-, block-, and module-level quantization) while leveraging a multi-objective performance-and-memory trade-off function to select the final quantization setting. Experimental results indicate that our QSLM reduces memory footprint by up to 86.5%, reduces power consumption by up to 20%, and maintains high performance across different tasks (i.e., up to 84.4% accuracy for sentiment classification on the SST-2 dataset and a perplexity score of 23.2 for text generation on the WikiText-2 dataset), close to the original non-quantized model, while meeting the performance and memory constraints.
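As a rough illustration of a tiered, trade-off-driven quantization search (not the QSLM implementation; the bit widths, weighting constants, and error proxy below are assumptions), one can score candidate bit widths per block with a joint performance/memory objective and keep the best setting:

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of a weight tensor to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax if np.abs(w).max() > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

def tradeoff_score(error, memory_bits, alpha=1.0, beta=1e-6):
    """Lower is better: weighted sum of an accuracy proxy (error) and memory cost."""
    return alpha * error + beta * memory_bits

# Hypothetical block-level tier: try a few bit widths per block and keep the
# setting with the best joint score (a stand-in for the tiered search).
layers = {"block1": np.random.randn(256, 256), "block2": np.random.randn(256, 256)}
for name, w in layers.items():
    candidates = []
    for bits in (8, 6, 4):
        err = np.mean((w - quantize(w, bits)) ** 2)     # sensitivity proxy
        candidates.append((tradeoff_score(err, bits * w.size), bits, err))
    score, best_bits, best_err = min(candidates)
    print(name, "-> bits:", best_bits, "mse:", round(float(best_err), 5))
```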


【3】Do Chatbot LLMs Talk Too Much? The YapBench Benchmark
Link: https://arxiv.org/abs/2601.00624

Authors: Vadim Borisov, Michael Gröger, Mina Mikhael, Richard H. Schreiber
Abstract: Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini increasingly act as general-purpose copilots, yet they often respond with unnecessary length on simple requests, adding redundant explanations, hedging, or boilerplate that increases cognitive load and inflates token-based inference cost. Prior work suggests that preference-based post-training and LLM-judged evaluations can induce systematic length bias, where longer answers are rewarded even at comparable quality. We introduce YapBench, a lightweight benchmark for quantifying user-visible over-generation on brevity-ideal prompts. Each item consists of a single-turn prompt, a curated minimal-sufficient baseline answer, and a category label. Our primary metric, YapScore, measures excess response length beyond the baseline in characters, enabling comparisons across models without relying on any specific tokenizer. We summarize model performance via the YapIndex, a uniformly weighted average of category-level median YapScores. YapBench contains over three hundred English prompts spanning three common brevity-ideal settings: (A) minimal or ambiguous inputs where the ideal behavior is a short clarification, (B) closed-form factual questions with short stable answers, and (C) one-line coding tasks where a single command or snippet suffices. Evaluating 76 assistant LLMs, we observe an order-of-magnitude spread in median excess length and distinct category-specific failure modes, including vacuum-filling on ambiguous inputs and explanation or formatting overhead on one-line technical requests. We release the benchmark and maintain a live leaderboard for tracking verbosity behavior over time.
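The YapScore/YapIndex definitions above translate directly into a few lines of Python. The sketch below follows the abstract's description (excess characters over a minimal baseline, then a uniform average of category-level medians); the example prompts and responses are invented:

```python
from statistics import median

def yap_score(response: str, baseline: str) -> int:
    """Excess response length (in characters) beyond a minimal-sufficient baseline."""
    return max(0, len(response) - len(baseline))

def yap_index(items):
    """Uniformly weighted average of category-level median YapScores.

    `items` is a list of (category, response, baseline) triples.
    """
    by_cat = {}
    for cat, resp, base in items:
        by_cat.setdefault(cat, []).append(yap_score(resp, base))
    return sum(median(v) for v in by_cat.values()) / len(by_cat)

items = [
    ("A", "Could you tell me a bit more about what you need?", "Can you clarify?"),
    ("B", "The capital of France is Paris. Paris has been the capital since...", "Paris."),
    ("C", "You can list all files with `ls -la`, which also shows hidden ones.", "ls -la"),
]
print(round(yap_index(items), 1))
```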


【4】Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback
Link: https://arxiv.org/abs/2601.00509

Authors: Vidyut Sriram, Sawan Pandita, Achintya Lakshmanan, Aneesh Shamraj, Suman Saha
Abstract: Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substantially from structured feedback, static analysis, retrieval augmentation, and execution-based refinement. We propose a retrieval-augmented, multi-tool repair workflow in which a single code-generating LLM iteratively refines its outputs using compiler diagnostics, CodeQL security scanning, and KLEE symbolic execution. A lightweight embedding model is used for semantic retrieval of previously successful repairs, providing security-focused examples that guide generation. Evaluated on a combined dataset of 3,242 programs generated by DeepSeek-Coder-1.3B and CodeLlama-7B, the system demonstrates significant improvements in robustness. For DeepSeek, security vulnerabilities were reduced by 96%. For the larger CodeLlama model, the critical security defect rate was decreased from 58.55% to 22.19%, highlighting the efficacy of tool-assisted self-repair even on "stubborn" models.
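A hedged sketch of the retrieval component described above: prior (diagnostic, fix) pairs are embedded and the most similar ones are retrieved to guide the next repair attempt. The hashed bag-of-words embedding and the example memory are stand-ins, not the paper's embedding model or data:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a lightweight embedding model: hashed bag-of-words vector."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve_examples(diagnostic: str, repair_memory, k: int = 2):
    """Return the k prior (diagnostic, fix) pairs most similar to the new diagnostic."""
    q = embed(diagnostic)
    return sorted(repair_memory, key=lambda ex: -float(q @ embed(ex[0])))[:k]

repair_memory = [
    ("strcpy into fixed buffer may overflow", "use strncpy with an explicit bound"),
    ("SQL query built by string concatenation", "use parameterized queries"),
    ("uninitialized variable read", "initialize the variable before use"),
]
for diag, fix in retrieve_examples("possible buffer overflow from strcpy", repair_memory):
    print(diag, "->", fix)
```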


【5】Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving
Link: https://arxiv.org/abs/2601.00397

Authors: Amey Agrawal, Mayank Yadav, Sukrit Kumar, Anirudha Agrawal, Garv Ghai, Souradeep Bera, Elton Pinto, Sirish Gambhira, Mohammad Adain, Kasra Sohrab, Chus Antonanzas, Alexey Tumanov
Abstract: Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.
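The time-jump idea can be pictured with a toy virtual clock that advances by predicted kernel durations instead of running kernels. This is only an analogy to the mechanism described in the abstract (the real system intercepts CUDA calls and coordinates jumps across distributed processes); all names and timings below are invented:

```python
import heapq

class VirtualClock:
    """Toy virtual clock: instead of running kernels, fast-forward time by
    their predicted durations while completing events in causal order."""
    def __init__(self):
        self.now = 0.0
        self.events = []   # heap of (completion_time, kernel_name)

    def launch_kernel(self, name, predicted_ms):
        heapq.heappush(self.events, (self.now + predicted_ms, name))

    def synchronize(self):
        # Jump straight to the completion time of each outstanding kernel.
        while self.events:
            t, name = heapq.heappop(self.events)
            self.now = max(self.now, t)
            print(f"[{self.now:8.2f} ms] {name} completed")

clock = VirtualClock()
clock.launch_kernel("attention", predicted_ms=3.2)
clock.launch_kernel("mlp", predicted_ms=5.0)
clock.synchronize()
print("virtual elapsed:", clock.now, "ms")
```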


【6】Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations
Link: https://arxiv.org/abs/2601.00282

Authors: Qianli Wang, Nils Feldhus, Pepa Atanasova, Fedor Splitt, Simon Ostermann, Sebastian Möller, Vera Schmitt
Comments: In submission
Abstract: Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4%) and faithfulness (up to 2.38%). The user study further demonstrates that quantization diminishes both the coherence and trustworthiness of SEs (up to 8.5%). Compared to smaller models, larger models show limited resilience to quantization in terms of SE quality but better maintain faithfulness. Moreover, no quantization technique consistently excels across task accuracy, SE quality, and faithfulness. Given that quantization's impact varies by context, we recommend validating SE quality for specific use cases, especially for NLEs, which show greater sensitivity. Nonetheless, the relatively minor deterioration in SE quality and faithfulness does not undermine quantization's effectiveness as a model compression technique.


【7】The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
Link: https://arxiv.org/abs/2601.00065

Authors: Xiaoze Liu, Weichen Yu, Matt Fredrikson, Xiaoqian Wang, Jing Gao
Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these methods across different model families is tokenizer transplant, which aligns incompatible vocabularies to a shared embedding space. We demonstrate that this essential interoperability step introduces a supply-chain vulnerability: we engineer a single "breaker token" that is functionally inert in a donor model yet reliably reconstructs into a high-salience malicious feature after transplant into a base model. By exploiting the geometry of coefficient reuse, our attack creates an asymmetric realizability gap that sabotages the base model's generation while leaving the donor's utility statistically indistinguishable from nominal behavior. We formalize this as a dual-objective optimization problem and instantiate the attack using a sparse solver. Empirically, the attack is training-free and achieves spectral mimicry to evade outlier detection, while demonstrating structural persistence against fine-tuning and weight merging, highlighting a hidden risk in the pipeline of modular AI composition. Code is available at https://github.com/xz-liu/tokenforge


【8】Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study
Link: https://arxiv.org/abs/2601.00004

Authors: Isaac Iyinoluwa Olufadewa, Miracle Ayomikun Adesina, Ezekiel Ayodeji Oladejo, Uthman Babatunde Usman, Owen Kolade Adeniyi, Matthew Tolulope Olawoyin
Comments: 9 pages, 1 figure, 4 tables
Abstract: Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. Traditional tools like the Patient Health Questionnaire-9 (PHQ-9) were validated in high-income countries but may be linguistically or culturally inaccessible for low- and middle-income countries and communities such as Nigeria, where people communicate in Nigerian Pidgin and more than 520 local languages. This study presents a novel approach to automated depression screening using fine-tuned large language models (LLMs) adapted for conversational Nigerian Pidgin. We collected a dataset of 432 Pidgin-language audio responses from Nigerian young adults aged 18-40 to prompts assessing psychological experiences aligned with PHQ-9 items, and performed transcription, rigorous preprocessing, and annotation, including semantic labeling, slang and idiom interpretation, and PHQ-9 severity scoring. Three LLMs - Phi-3-mini-4k-instruct, Gemma-3-4B-it, and GPT-4.1 - were fine-tuned on this annotated dataset, and their performance was evaluated quantitatively (accuracy, precision, and semantic alignment) and qualitatively (clarity, relevance, and cultural appropriateness). GPT-4.1 achieved the highest quantitative performance, with 94.5% accuracy in PHQ-9 severity scoring prediction, outperforming Gemma-3-4B-it and Phi-3-mini-4k-instruct. Qualitatively, GPT-4.1 also produced the most culturally appropriate, clear, and contextually relevant responses, supporting the feasibility of AI-mediated depression screening for underserved Nigerian communities. This work provides a foundation for deploying conversational mental-health tools in linguistically diverse, resource-constrained environments.


Graph related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)

【1】Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning
Link: https://arxiv.org/abs/2601.00607

Authors: Sonia Khetarpaul, P Y Sharan
Abstract: In the context of smart city transportation, efficient matching of taxi supply with passenger demand requires real-time integration of urban traffic network data and mobility patterns. Conventional taxi hotspot prediction models often rely solely on historical demand, overlooking dynamic influences such as traffic congestion, road incidents, and public events. This paper presents a traffic-aware, graph-based reinforcement learning (RL) framework for optimal taxi placement in metropolitan environments. The urban road network is modeled as a graph where intersections represent nodes, road segments serve as edges, and node attributes capture historical demand, event proximity, and real-time congestion scores obtained from live traffic APIs. Graph Neural Network (GNN) embeddings are employed to encode spatial-temporal dependencies within the traffic network, which are then used by a Q-learning agent to recommend optimal taxi hotspots. The reward mechanism jointly optimizes passenger waiting time, driver travel distance, and congestion avoidance. Experiments on a simulated Delhi taxi dataset, generated using real geospatial boundaries and historic ride-hailing request patterns, demonstrate that the proposed model reduced passenger waiting time by about 56% and reduced travel distance by 38% compared to baseline stochastic selection. The proposed approach is adaptable to multi-modal transport systems and can be integrated into smart city platforms for real-time urban mobility optimization.
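A minimal tabular Q-learning loop with a composite reward (demand minus congestion and distance penalties) illustrates the learning signal described above. The paper uses GNN embeddings of the road graph as state; this sketch uses raw node indices, toy random rewards, and made-up weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 6                        # intersections in a toy road graph
Q = np.zeros((n_nodes, n_nodes))   # Q[current node, recommended hotspot]

def reward(node):
    """Composite reward: reward demand, penalize congestion and travel distance."""
    demand = rng.uniform(0, 1)
    congestion = rng.uniform(0, 1)
    distance = rng.uniform(0, 1)
    return 1.0 * demand - 0.5 * congestion - 0.3 * distance

alpha, gamma, eps = 0.1, 0.9, 0.2
state = 0
for step in range(5000):
    # Epsilon-greedy choice of the next hotspot recommendation.
    action = rng.integers(n_nodes) if rng.random() < eps else int(Q[state].argmax())
    r = reward(action)
    next_state = action            # the taxi relocates to the recommended hotspot
    Q[state, action] += alpha * (r + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("best hotspot from node 0:", int(Q[0].argmax()))
```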


【2】Robust Graph Fine-Tuning with Adversarial Graph Prompting
Link: https://arxiv.org/abs/2601.00229

Authors: Ziyan Zhang, Bo Jiang, Jin Tang
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has emerged as a dominant paradigm for adapting pre-trained GNN models to downstream tasks. However, existing PEFT methods usually exhibit significant vulnerability to various noise and attacks on graph topology and node attributes/features. To address this issue, for the first time, we propose integrating adversarial learning into graph prompting and develop a novel Adversarial Graph Prompting (AGP) framework to achieve robust graph fine-tuning. Our AGP has two key aspects. First, we propose the general problem formulation of AGP as a min-max optimization problem and develop an alternating optimization scheme to solve it. For inner maximization, we propose the Joint Projected Gradient Descent (JointPGD) algorithm to generate strong adversarial noise. For outer minimization, we employ a simple yet effective module to learn the optimal node prompts to counteract the adversarial noise. Second, we demonstrate that the proposed AGP can theoretically address both graph topology and node noise. This confirms the versatility and robustness of our AGP fine-tuning method across various graph noise. Note that the proposed AGP is a general method that can be integrated with various pre-trained GNN models to enhance their robustness on downstream tasks. Extensive experiments on multiple benchmark tasks validate the robustness and effectiveness of the AGP method compared to state-of-the-art methods.
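The inner maximization can be illustrated with a plain projected-gradient-ascent step that perturbs input features of a fixed linear model within an L-infinity ball. This is a generic PGD sketch, not the paper's JointPGD over graph topology and node features; all dimensions and step sizes are assumptions:

```python
import numpy as np

def pgd_perturb(x, y, w, steps=10, eps=0.3, step_size=0.1):
    """Projected gradient ascent on the logistic loss w.r.t. inputs x,
    with the perturbation projected back into an L-infinity ball of radius eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        logits = (x + delta) @ w
        p = 1.0 / (1.0 + np.exp(-logits))
        grad = np.outer(p - y, w)          # d(BCE)/d(input), one row per sample
        delta = np.clip(delta + step_size * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))                # e.g. node features
w = rng.normal(size=5)                     # a fixed (frozen) linear "model"
y = (x @ w > 0).astype(float)

delta = pgd_perturb(x, y, w)
logistic = lambda z: np.mean(np.log1p(np.exp(-(2 * y - 1) * z)))
print("clean loss   :", round(float(logistic(x @ w)), 4))
print("attacked loss:", round(float(logistic((x + delta) @ w)), 4))
```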


【3】IMBWatch -- a Spatio-Temporal Graph Neural Network approach to detect Illicit Massage Business
Link: https://arxiv.org/abs/2601.00075

Authors: Swetha Varadarajan, Abhishek Ray, Lumina Albert
Comments: Submitted to AAAI AISI 2026
Abstract: Illicit Massage Businesses (IMBs) are a covert and persistent form of organized exploitation that operate under the facade of legitimate wellness services while facilitating human trafficking, sexual exploitation, and coerced labor. Detecting IMBs is difficult due to encoded digital advertisements, frequent changes in personnel and locations, and the reuse of shared infrastructure such as phone numbers and addresses. Traditional approaches, including community tips and regulatory inspections, are largely reactive and ineffective at revealing the broader operational networks traffickers rely on. To address these challenges, we introduce IMBWatch, a spatio-temporal graph neural network (ST-GNN) framework for large-scale IMB detection. IMBWatch constructs dynamic graphs from open-source intelligence, including scraped online advertisements, business license records, and crowdsourced reviews. Nodes represent heterogeneous entities such as businesses, aliases, phone numbers, and locations, while edges capture spatio-temporal and relational patterns, including co-location, repeated phone usage, and synchronized advertising. The framework combines graph convolutional operations with temporal attention mechanisms to model the evolution of IMB networks over time and space, capturing patterns such as intercity worker movement, burner phone rotation, and coordinated advertising surges. Experiments on real-world datasets from multiple U.S. cities show that IMBWatch outperforms baseline models, achieving higher accuracy and F1 scores. Beyond performance gains, IMBWatch offers improved interpretability, providing actionable insights to support proactive and targeted interventions. The framework is scalable, adaptable to other illicit domains, and released with anonymized data and open-source code to support reproducible research.


Transformer (3 papers)

【1】RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
Link: https://arxiv.org/abs/2601.00426

Authors: Md Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta
Abstract: The quadratic complexity of the self-attention mechanism presents a significant impediment to applying Transformer models to long sequences. This work explores computational principles derived from astrocytes (glial cells critical for biological memory and synaptic modulation) as a complementary approach to conventional architectural modifications for efficient self-attention. We introduce the Recurrent Memory Augmented Astromorphic Transformer (RMAAT), an architecture integrating abstracted astrocyte functionalities. RMAAT employs a recurrent, segment-based processing strategy where persistent memory tokens propagate contextual information. An adaptive compression mechanism, governed by a novel retention factor derived from simulated astrocyte long-term plasticity (LTP), modulates these tokens. Attention within segments utilizes an efficient, linear-complexity mechanism inspired by astrocyte short-term plasticity (STP). Training is performed using Astrocytic Memory Replay Backpropagation (AMRB), a novel algorithm designed for memory efficiency in recurrent networks. Evaluations on the Long Range Arena (LRA) benchmark demonstrate RMAAT's competitive accuracy and substantial improvements in computational and memory efficiency, indicating the potential of incorporating astrocyte-inspired dynamics into scalable sequence models.


【2】StockBot 2.0: Vanilla LSTMs Outperform Transformer-based Forecasting for Stock Prices
Link: https://arxiv.org/abs/2601.00197

Authors: Shaswat Mohanty
Comments: 14 pages, 5 figures
Abstract: Accurate forecasting of financial markets remains a long-standing challenge due to complex temporal and often latent dependencies, non-linear dynamics, and high volatility. Building on our earlier recurrent neural network framework, we present an enhanced StockBot architecture that systematically evaluates modern attention-based, convolutional, and recurrent time-series forecasting models within a unified experimental setting. While attention-based and transformer-inspired models offer increased modeling flexibility, extensive empirical evaluation reveals that a carefully constructed vanilla LSTM consistently achieves superior predictive accuracy and more stable buy/sell decision-making when trained under a common set of default hyperparameters. These results highlight the robustness and data efficiency of recurrent sequence models for financial time-series forecasting, particularly in the absence of extensive hyperparameter tuning or the availability of sufficient data when discretized to single-day intervals. Additionally, these results underscore the importance of architectural inductive bias in data-limited market prediction tasks.


【3】Online Finetuning Decision Transformers with Pure RL Gradients
Link: https://arxiv.org/abs/2601.00167

Authors: Junkai Luo, Yinglun Zhu
Abstract: Decision Transformers (DTs) have emerged as a powerful framework for sequential decision making by formulating offline reinforcement learning (RL) as a sequence modeling problem. However, extending DTs to online settings with pure RL gradients remains largely unexplored, as existing approaches continue to rely heavily on supervised sequence-modeling objectives during online finetuning. We identify hindsight return relabeling -- a standard component in online DTs -- as a critical obstacle to RL-based finetuning: while beneficial for supervised learning, it is fundamentally incompatible with importance sampling-based RL algorithms such as GRPO, leading to unstable training. Building on this insight, we propose new algorithms that enable online finetuning of Decision Transformers using pure reinforcement learning gradients. We adapt GRPO to DTs and introduce several key modifications, including sub-trajectory optimization for improved credit assignment, sequence-level likelihood objectives for enhanced stability and efficiency, and active sampling to encourage exploration in uncertain regions. Through extensive experiments, we demonstrate that our methods outperform existing online DT baselines and achieve new state-of-the-art performance across multiple benchmarks, highlighting the effectiveness of pure-RL-based online finetuning for Decision Transformers.
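The group-relative advantages and sequence-level clipped objective mentioned above can be sketched as follows. The trajectory rewards and log-likelihoods are made-up numbers, and the actual method applies this at the sub-trajectory level of a Decision Transformer:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: standardize rewards within a group of rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def clipped_objective(logp_new, logp_old, advantages, clip=0.2):
    """Sequence-level clipped surrogate: one importance ratio per sampled trajectory."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip, 1 + clip) * advantages
    return np.minimum(unclipped, clipped).mean()

# Toy group of 4 trajectories sampled from the current policy.
rewards  = [12.0, 3.0, 7.5, 9.0]                              # environment returns
logp_old = np.array([-35.2, -40.1, -37.8, -36.5])             # log-likelihood at sampling time
logp_new = logp_old + np.array([0.05, -0.02, 0.10, 0.01])     # after a gradient step

adv = grpo_advantages(rewards)
print("advantages:", np.round(adv, 3))
print("surrogate objective:", round(float(clipped_objective(logp_new, logp_old, adv)), 4))
```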


GAN | adversarial | attack | generation related (6 papers)

【1】Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Link: https://arxiv.org/abs/2601.00664

Authors: Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang
Comments: Project page: https://taekyungki.github.io/AvatarForcing/
Abstract: Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving a 6.8X speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over the baseline more than 80% of the time.


【2】Adversarial Samples Are Not Created Equal
Link: https://arxiv.org/abs/2601.00577

Authors: Jennifer Crawford, Amol Khanna, Fred Lu, Amy R. Wagoner, Stella Biderman, Andre T. Nguyen, Edward Raff
Abstract: Over the past decade, numerous theories have been proposed to explain the widespread vulnerability of deep neural networks to adversarial evasion attacks. Among these, the theory of non-robust features proposed by Ilyas et al. has been widely accepted, showing that brittle but predictive features of the data distribution can be directly exploited by attackers. However, this theory overlooks adversarial samples that do not directly utilize these features. In this work, we advocate that these two kinds of samples - those which use brittle but predictive features and those that do not - comprise two types of adversarial weaknesses and should be differentiated when evaluating adversarial robustness. For this purpose, we propose an ensemble-based metric to measure the manipulation of non-robust features by adversarial perturbations and use this metric to analyze the makeup of adversarial samples generated by attackers. This new perspective also allows us to re-examine multiple phenomena, including the impact of sharpness-aware minimization on adversarial robustness and the robustness gap observed between adversarial training and standard training on robust datasets.


【3】Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing
Link: https://arxiv.org/abs/2601.00384

Authors: Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy
Comments: This paper has been accepted to EAI SmartSP 2025. This is the preprint version
Abstract: Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical convergence introduces new attack surfaces, especially at the interface between computer-aided design (CAD) and machine execution layers. In this work, we investigate targeted cyberattacks on two widely used fused deposition modeling (FDM) systems, Creality's flagship model K1 Max, and Ender 3. Our threat model is a multi-layered Man-in-the-Middle (MitM) intrusion, where the adversary intercepts and manipulates G-code files during upload from the user interface to the printer firmware. The MitM intrusion chain enables several stealthy sabotage scenarios. These attacks remain undetectable by conventional slicer software or runtime interfaces, resulting in structurally defective yet externally plausible printed parts. To counter these stealthy threats, we propose an unsupervised Intrusion Detection System (IDS) that analyzes structured machine logs generated during live printing. Our defense mechanism uses a frozen Transformer-based encoder (a BERT variant) to extract semantic representations of system behavior, followed by a contrastively trained projection head that learns anomaly-sensitive embeddings. Later, a clustering-based approach and a self-attention autoencoder are used for classification. Experimental results demonstrate that our approach effectively distinguishes between benign and compromised executions.


【4】Rectifying Adversarial Examples Using Their Vulnerabilities
Link: https://arxiv.org/abs/2601.00270

Authors: Fumiya Morimoto, Ryuto Morita, Satoshi Ono
Abstract: Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data undetectable to humans, posing significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanisms that mitigate their threats. Most existing methods primarily focus on discriminating AEs based on the input sample features, emphasizing AE detection without addressing the correct sample categorization before an attack. While some tasks may only require mere rejection of detected AEs, others necessitate identifying the correct original input category, such as traffic sign recognition in autonomous driving. The objective of this study is to propose a method for rectifying AEs to estimate the correct labels of their original inputs. Our method is based on re-attacking AEs to move them beyond the decision boundary for accurate label prediction, effectively addressing the issue of rectifying minimally perceptible AEs created using white-box attack methods. However, a challenge remains with respect to effectively rectifying AEs produced by black-box attacks at a distance from the boundary, or those misclassified into low-confidence categories by targeted attacks. By adopting a straightforward approach of only considering AEs as inputs, the proposed method can address diverse attacks while avoiding the requirement of parameter adjustments or preliminary training. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods, including targeted and black-box attacks. Moreover, it outperforms conventional rectification and input transformation methods in terms of stability against various attacks.


【5】SSI-GAN: Semi-Supervised Swin-Inspired Generative Adversarial Networks for Neuronal Spike Classification
Link: https://arxiv.org/abs/2601.00189

Authors: Danial Sharifrazi, Nouman Javed, Mojtaba Mohammadi, Seyede Sana Salehi, Roohallah Alizadehsani, Prasad N. Paradkar, U. Rajendra Acharya, Asim Bhatti
Abstract: Mosquitos are the main transmissive agents of arboviral diseases. Manual classification of their neuronal spike patterns is very labor-intensive and expensive. Most available deep learning solutions require fully labeled spike datasets and highly preprocessed neuronal signals. This reduces the feasibility of mass adoption in actual field scenarios. To address the scarcity of labeled data problems, we propose a new Generative Adversarial Network (GAN) architecture that we call the Semi-supervised Swin-Inspired GAN (SSI-GAN). The Swin-inspired, shifted-window discriminator, together with a transformer-based generator, is used to classify neuronal spike trains and, consequently, detect viral neurotropism. We use a multi-head self-attention model in a flat, window-based transformer discriminator that learns to capture sparser high-frequency spike features. Using just 1 to 3% labeled data, SSI-GAN was trained with more than 15 million spike samples collected at five time points post-infection, with recordings classified into Zika-infected, dengue-infected, or uninfected categories. Hyperparameters were optimized using the Bayesian Optuna framework, and performance for robustness was validated under fivefold Monte Carlo cross-validation. SSI-GAN reached 99.93% classification accuracy on the third day post-infection with only 3% labeled data. It maintained high accuracy across all stages of infection with just 1% supervision. This shows a 97-99% reduction in manual labeling effort relative to standard supervised approaches at the same performance level. The shifted-window transformer design proposed here beat all baselines by a wide margin and set new best marks in spike-based neuronal infection classification.


【6】Neural Brain Fields: A NeRF-Inspired Approach for Generating Nonexistent EEG Electrodes
Link: https://arxiv.org/abs/2601.00012

Authors: Shahar Ain Kedem, Itamar Zimerman, Eliya Nachmani
Abstract: Electroencephalography (EEG) data present unique modeling challenges because recordings vary in length, exhibit very low signal to noise ratios, differ significantly across participants, drift over time within sessions, and are rarely available in large and clean datasets. Consequently, developing deep learning methods that can effectively process EEG signals remains an open and important research problem. To tackle this problem, this work presents a new method inspired by Neural Radiance Fields (NeRF). In computer vision, NeRF techniques train a neural network to memorize the appearance of a 3D scene and then use its learned parameters to render and edit the scene from any viewpoint. We draw an analogy between the discrete images captured from different viewpoints used to learn a continuous 3D scene in NeRF, and EEG electrodes positioned at different locations on the scalp, which are used to infer the underlying representation of continuous neural activity. Building on this connection, we show that a neural network can be trained on a single EEG sample in a NeRF style manner to produce a fixed size and informative weight vector that encodes the entire signal. Moreover, via this representation we can render the EEG signal at previously unseen time steps and spatial electrode positions. We demonstrate that this approach enables continuous visualization of brain activity at any desired resolution, including ultra high resolution, and reconstruction of raw EEG signals. Finally, our empirical analysis shows that this method can effectively simulate nonexistent electrode data in EEG recordings, allowing the reconstructed signal to be fed into standard EEG processing networks to improve performance.


Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (2 papers)

【1】Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty
Link: https://arxiv.org/abs/2601.00737

Authors: Uğurcan Özalp
Comments: 19 pages
Abstract: Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy methods. However, critic networks tend to overestimate value estimates systematically. This is often addressed by introducing a pessimistic bias based on uncertainty estimates. Current methods employ ensembling to quantify the critic's epistemic uncertainty (uncertainty due to limited data and model ambiguity) to scale pessimistic updates. In this work, we propose a new algorithm called Stochastic Actor-Critic (STAC) that incorporates temporal (one-step) aleatoric uncertainty (uncertainty arising from stochastic transitions, rewards, and policy-induced variability in Bellman targets) to scale pessimistic bias in temporal-difference updates, rather than relying on epistemic uncertainty. STAC uses a single distributional critic network to model the temporal return uncertainty, and applies dropout to both the critic and actor networks for regularization. Our results show that pessimism based on a distributional critic alone suffices to mitigate overestimation, and naturally leads to risk-averse behavior in stochastic environments. Introducing dropout further improves training stability and performance by means of regularization. With this design, STAC achieves improved computational efficiency using a single distributional critic network.


【2】Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference
Link: https://arxiv.org/abs/2601.00038

Authors: Shane A. McQuarrie, Mengwu Guo, Anirban Chaudhuri
Abstract: This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual assets in a digital twin. Data-driven ROMs are explainable, computationally efficient scientific machine learning models that aim to preserve the underlying physics of complex dynamical simulations. Since the quality of data-driven ROMs is sensitive to the quality of the limited training data, we seek to identify training parameters for which using the associated training data results in the best possible parametric ROM. Our approach uses the operator inference methodology, a regression-based strategy which can be tailored to particular parametric structure for a large class of problems. We establish a probabilistic version of parametric operator inference, casting the learning problem as a Bayesian linear regression. Prediction uncertainties stemming from the resulting probabilistic ROM solutions are used to design a sequential adaptive sampling scheme to select new training parameter vectors that promote ROM stability and accuracy globally in the parameter domain. We conduct numerical experiments for several nonlinear parametric systems of partial differential equations and compare the results to ROMs trained on random parameter samples. The results demonstrate that the proposed adaptive sampling strategy consistently yields more stable and accurate ROMs than random sampling does under the same computational budget.
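A small sketch of the core loop described above: Bayesian linear regression for the reduced operator coefficients, then proposing the candidate parameter with the largest predictive variance. The two-coefficient toy model and all numbers are assumptions, not the paper's PDE examples:

```python
import numpy as np

def posterior(Phi, y, noise_var=0.01, prior_var=10.0):
    """Posterior mean and covariance for Bayesian linear regression y = Phi @ theta + noise."""
    A = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
    cov = np.linalg.inv(A)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

def predictive_variance(phi_new, cov, noise_var=0.01):
    return float(phi_new @ cov @ phi_new) + noise_var

# Toy reduced model: dx/dt ~ a*x + b*mu, with data at two training parameters mu.
x_snapshots = np.array([1.0, 0.8])
train_mu = np.array([0.1, 0.9])
Phi = np.column_stack([x_snapshots, train_mu])
y = Phi @ np.array([-1.0, 0.5]) + 0.001 * np.random.default_rng(0).normal(size=2)
mean, cov = posterior(Phi, y)

# Active learning step: propose the parameter with the largest predictive variance.
candidates = np.linspace(0.0, 1.0, 11)
variances = [predictive_variance(np.array([1.0, mu]), cov) for mu in candidates]
print("estimated (a, b):", np.round(mean, 3))
print("next training parameter:", candidates[int(np.argmax(variances))])
```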


Transfer | Zero/Few/One-Shot | adaptation (6 papers)

【1】BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting
Link: https://arxiv.org/abs/2601.00698

Authors: Maximilian Reinwardt, Michael Eichelbeck, Matthias Althoff
Comments: 20 pages, 7 figures
Abstract: Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching, which may be misaligned with the data's semantic structure. In this paper, we introduce the B-Spline Adaptive Tokenizer (BSAT), a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines. BSAT algorithmically places tokens in high-curvature regions and represents each variable-length basis function as a fixed-size token, composed of its coefficient and position. Further, we propose a hybrid positional encoding that combines an additive learnable positional encoding with Rotary Positional Embedding featuring a layer-wise learnable base: L-RoPE. This allows each layer to attend to different temporal dependencies. Our experiments on several public benchmarks show that our model is competitive with strong performance at high compression rates. This makes it particularly well-suited for use cases with strong memory constraints.
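As a loose illustration of curvature-driven segmentation (a stand-in for the paper's B-spline fitting, which this sketch does not implement), one can place segment boundaries so that each segment covers an equal share of cumulative absolute curvature, putting more tokens where the series bends most:

```python
import numpy as np

def adaptive_breakpoints(series, n_tokens=8):
    """Place segment boundaries where absolute curvature (2nd difference) accumulates,
    so more tokens land in high-curvature regions and fewer in smooth ones."""
    curvature = np.abs(np.diff(series, n=2))
    # Cumulative curvature defines a budget; split it into equal shares.
    cum = np.concatenate([[0.0], np.cumsum(curvature + 1e-8)])
    targets = np.linspace(0, cum[-1], n_tokens + 1)
    breaks = np.searchsorted(cum, targets[1:-1]) + 1
    return np.concatenate([[0], breaks, [len(series) - 1]])

t = np.linspace(0, 4 * np.pi, 400)
series = np.sin(t) + 0.3 * np.sin(7 * t) * (t > 2 * np.pi)   # smooth half, then wiggly half
print(adaptive_breakpoints(series))   # boundaries cluster in the wiggly half
```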


【2】ARISE: Adaptive Reinforcement Integrated with Swarm Exploration
Link: https://arxiv.org/abs/2601.00693

Authors: Rajiv Chaitanya M, D R Ramesh Babu
Comments: 12 pages. Accepted for presentation at WCSC 2026
Abstract: Effective exploration remains a key challenge in RL, especially with non-stationary rewards or high-dimensional policies. We introduce ARISE, a lightweight framework that enhances reinforcement learning by augmenting standard policy-gradient methods with a compact swarm-based exploration layer. ARISE blends policy actions with particle-driven proposals, where each particle represents a candidate policy trajectory sampled in the action space, and modulates exploration adaptively using reward-variance cues. While easy benchmarks exhibit only slight improvements (e.g., +0.7% on CartPole-v1), ARISE yields substantial gains on more challenging tasks, including +46% on LunarLander-v3 and +22% on Hopper-v4, while preserving stability on Walker2d and Ant. Under non-stationary reward shifts, ARISE provides marked robustness advantages, outperforming PPO by +75 points on CartPole and improving LunarLander accordingly. Ablation studies confirm that both the swarm component and the adaptive mechanism contribute to the performance. Overall, ARISE offers a simple, architecture-agnostic route to more exploratory and resilient RL agents without altering core algorithmic structures.


【3】A Comparative Study of Adaptation Strategies for Time Series Foundation Models in Anomaly Detection
Link: https://arxiv.org/abs/2601.00446

Authors: Miseon Park, Kijung Yoon
Abstract: Time series anomaly detection is essential for the reliable operation of complex systems, but most existing methods require extensive task-specific training. We explore whether time series foundation models (TSFMs), pretrained on large heterogeneous data, can serve as universal backbones for anomaly detection. Through systematic experiments across multiple benchmarks, we compare zero-shot inference, full model adaptation, and parameter-efficient fine-tuning (PEFT) strategies. Our results demonstrate that TSFMs outperform task-specific baselines, achieving notable gains in AUC-PR and VUS-PR, particularly under severe class imbalance. Moreover, PEFT methods such as LoRA, OFT, and HRA not only reduce computational cost but also match or surpass full fine-tuning in most cases, indicating that TSFMs can be efficiently adapted for anomaly detection, even when pretrained for forecasting. These findings position TSFMs as promising general-purpose models for scalable and efficient time series anomaly detection.
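Of the PEFT methods mentioned, LoRA is the simplest to sketch: a frozen pretrained weight plus a trainable low-rank update. The sizes, rank, and scaling below are illustrative defaults, not the settings used in the paper:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r << d)."""
    def __init__(self, W, r=4, alpha=8, rng=np.random.default_rng(0)):
        d_out, d_in = W.shape
        self.W = W                                          # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))     # trainable
        self.B = np.zeros((d_out, r))                       # trainable, initialized to zero
        self.scale = alpha / r

    def __call__(self, x):
        # Output = x @ W^T + scaled low-rank correction; only A and B would be trained.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.randn(32, 16)          # a pretrained layer of a foundation model
layer = LoRALinear(W, r=4)
x = np.random.randn(5, 16)
print(layer(x).shape)                # (5, 32)
```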


【4】GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation
Link: https://arxiv.org/abs/2601.00231

Authors: Pritish Saha, Chandrav Rajbangshi, Rudra Goyal, Mohit Goyal, Anurag Deo, Biswajit Roy, Ningthoujam Dhanachandra Singh, Raxit Goswami, Amitava Das
Abstract: Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25-80% across tasks), without practical quality loss across prompt styles and data mixes. To model forgetting, we fit a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).


【5】Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing
Link: https://arxiv.org/abs/2601.00042

Authors: Manish Bhatt, Adrian Wood, Idan Habler, Ammar Al-Kahfah
Abstract: Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate GPT-4o-mini across 28 experimental runs spanning six research questions. We find that random-seed variance dominates algorithmic parameters, yielding an 8x spread in outcomes; single-seed comparisons are unreliable, while multi-seed averaging materially reduces variance in our setup. Reward shaping consistently harms performance, causing exploration collapse in 94% of runs or producing 18 false positives with zero verified attacks. In our environment, simple state signatures outperform complex ones. For comprehensive security testing, ensembles provide attack-type diversity, whereas single agents optimize coverage within a given attack type. Overall, these results suggest that seed variance and targeted domain knowledge can outweigh algorithmic sophistication when testing safety-trained models.


【6】Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction
标题:使用低秩自适应结合具有不同真值的数据集,以泛化基于图像的CNN模型进行测光红移预测
链接:https://arxiv.org/abs/2601.00146

作者:Vikram Seenivasan,Srinath Saikrishnan,Andrew Lizarraga,Jonathan Soriano,Bernie Boscoe,Tuan Do
备注:11 pages, 7 figures, 3 tables, Accepted to the Conference on Neural Information Processing Systems (NeurIPS), Machine Learning and the Physical Sciences (ML4PS) Workshop 2025
摘要:在这项工作中,我们展示了如何使用低秩自适应(LoRA)结合不同的星系成像数据集,以提升宇宙学中CNN模型的红移估计。LoRA是一种源自大型语言模型的成熟技术,它通过添加适配器网络来调整模型权重和偏置,从而无需重新训练即可高效微调大型基础模型。我们先使用测光红移真值数据集训练基础模型,该数据集覆盖广泛的星系类型但精度较低;随后使用LoRA在光谱红移真值数据集上微调。光谱红移更精确,但仅限于明亮星系,且获取时间高出数个数量级,因此不太适用于大规模巡天。理想情况下,两个数据集的结合会产生更准确且泛化良好的模型。LoRA模型优于传统迁移学习方法,偏差约低2.5倍,散布约低2.2倍。在组合数据集上重新训练模型的泛化效果优于LoRA,但代价是更长的计算时间。我们的工作表明,LoRA为天体物理学中的回归模型微调提供了介于完全重训与不重训之间的折中方案。LoRA展示了利用现有预训练天体物理模型的潜力,尤其适用于数据稀疏的任务。
摘要:In this work, we demonstrate how Low-Rank Adaptation (LoRA) can be used to combine different galaxy imaging datasets to improve redshift estimation with CNN models for cosmology. LoRA is an established technique for large language models that adds adapter networks to adjust model weights and biases to efficiently fine-tune large base models without retraining. We train a base model using a photometric redshift ground truth dataset, which contains broad galaxy types but is less accurate. We then fine-tune using LoRA on a spectroscopic redshift ground truth dataset. These redshifts are more accurate but limited to bright galaxies and take orders of magnitude more time to obtain, so are less available for large surveys. Ideally, the combination of the two datasets would yield more accurate models that generalize well. The LoRA model performs better than a traditional transfer learning method, with $\sim2.5\times$ less bias and $\sim$2.2$\times$ less scatter. Retraining the model on a combined dataset yields a model that generalizes better than LoRA but at a cost of greater computation time. Our work shows that LoRA is useful for fine-tuning regression models in astrophysics by providing a middle ground between full retraining and no retraining. LoRA shows potential in allowing us to leverage existing pretrained astrophysical models, especially for data sparse tasks.
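
A minimal PyTorch sketch of the LoRA idea described above: a frozen base linear layer plus a trainable low-rank update. The rank, scaling, and layer shapes are illustrative assumptions, not the authors' astrophysics pipeline.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W0 x + (alpha/r) * B(A x), with W0 frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pretrained weights
        self.A = nn.Linear(base.in_features, r, bias=False)   # down-projection
        self.B = nn.Linear(r, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.B.weight)        # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))

# Usage: wrap a layer of a pretrained model head with the adapter and fine-tune only A, B.
layer = nn.Linear(512, 256)
adapted = LoRALinear(layer, r=8)
out = adapted(torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 256])
```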


强化学习(7篇)

【1】IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning
标题:IRPO:通过强化学习扩展布拉德利-特里模型
链接:https://arxiv.org/abs/2601.00677

作者:Haonan Song,Qingchen Xie,Huan Zhu,Feng Xiao,Luxi Xing,Fuzhen Li,Liu Kang,Feng Jiang,Zhiyong Zheng,Fan Yang
备注:14 pages, 4 figures
摘要:生成奖励模型(GRM)由于其可解释性,推理时间可扩展性以及通过强化学习(RL)进行改进的潜力,在奖励建模方面引起了相当大的研究兴趣。然而,广泛使用的成对GRM在与RL算法(如组相对策略优化(GRPO))集成时会产生计算瓶颈。这个瓶颈来自两个因素:(i)获得相对分数所需的成对比较的时间复杂度为O(n^2),以及(ii)重复采样或额外的思想链(CoT)推理以提高性能的计算开销。为了解决第一个因素,我们提出了组间相对偏好优化(IRPO),这是一种新的RL框架,将成熟的Bradley-Terry模型纳入GRPO。通过为每个响应生成逐点得分,IRPO可以在RL训练期间有效评估任意多个候选者,同时保留可解释性和细粒度的奖励信号。实验结果表明,IRPO实现国家的最先进的(SOTA)性能之间的逐点GRM在多个基准测试,与当前领先的成对GRM的性能相当。此外,我们表明,IRPO在训练后评估中显着优于成对GRM。
摘要 :Generative Reward Models (GRMs) have attracted considerable research interest in reward modeling due to their interpretability, inference-time scalability, and potential for refinement through reinforcement learning (RL). However, widely used pairwise GRMs create a computational bottleneck when integrated with RL algorithms such as Group Relative Policy Optimization (GRPO). This bottleneck arises from two factors: (i) the O(n^2) time complexity of pairwise comparisons required to obtain relative scores, and (ii) the computational overhead of repeated sampling or additional chain-of-thought (CoT) reasoning to improve performance. To address the first factor, we propose Intergroup Relative Preference Optimization (IRPO), a novel RL framework that incorporates the well-established Bradley-Terry model into GRPO. By generating a pointwise score for each response, IRPO enables efficient evaluation of arbitrarily many candidates during RL training while preserving interpretability and fine-grained reward signals. Experimental results demonstrate that IRPO achieves state-of-the-art (SOTA) performance among pointwise GRMs across multiple benchmarks, with performance comparable to that of current leading pairwise GRMs. Furthermore, we show that IRPO significantly outperforms pairwise GRMs in post-training evaluations.
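
As a rough illustration of how pointwise scores avoid O(n^2) pairwise comparisons while still supporting Bradley-Terry preferences and group-relative advantages, here is a small NumPy sketch; the scoring scale and normalization are assumptions for illustration, not the IRPO implementation.

```python
import numpy as np

def bradley_terry_prob(s_i, s_j):
    """P(response i preferred over response j) from pointwise scores, Bradley-Terry form."""
    return 1.0 / (1.0 + np.exp(-(s_i - s_j)))

def group_relative_advantages(scores):
    """GRPO-style advantages: standardize pointwise scores within a sampled group."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / (scores.std() + 1e-8)

# Example: one pointwise score per candidate response (O(n) scoring, no O(n^2) pairs).
scores = [1.2, -0.3, 0.8, 2.1]
print(bradley_terry_prob(scores[3], scores[1]))  # pairwise preference if needed
print(group_relative_advantages(scores))         # advantages used for the policy update
```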


【2】E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
标题:E-GRPO:高熵步骤推动流模型的有效强化学习
链接:https://arxiv.org/abs/2601.00423

作者:Shengjun Zhang,Zhang Zhang,Chensheng Dai,Yueqi Duan
备注:Code: https://github.com/shengjun-zhang/VisualGRPO
摘要:近期的强化学习提升了流匹配模型在人类偏好对齐上的表现。虽然随机采样可以探索去噪方向,但现有方法在多个去噪步骤上进行优化,因而受到稀疏且模糊的奖励信号的困扰。我们观察到,高熵步骤能实现更高效、更有效的探索,而低熵步骤会导致难以区分的rollout。为此,我们提出E-GRPO,一种熵感知的组相对策略优化,用于提高SDE采样步骤的熵。由于多步随机性使随机微分方程的积分受到模糊奖励信号的影响,我们将连续的低熵步骤合并为一个用于SDE采样的高熵步骤,而在其他步骤上应用ODE采样。在此基础上,我们引入多步组归一化优势,在共享同一合并后SDE去噪步骤的样本内计算组相对优势。不同奖励设置下的实验结果证明了该方法的有效性。
摘要:Recent reinforcement learning has enhanced the flow matching models on human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods which optimize over multiple denoising steps suffer from sparse and ambiguous reward signals. We observe that the high entropy steps enable more efficient and effective exploration while the low entropy steps result in undistinguished roll-outs. To this end, we propose E-GRPO, an entropy aware Group Relative Policy Optimization to increase the entropy of SDE sampling steps. Since the integration of stochastic differential equations suffer from ambiguous reward signals due to stochasticity from multiple steps, we specifically merge consecutive low entropy steps to formulate one high entropy step for SDE sampling, while applying ODE sampling on other steps. Building upon this, we introduce multi-step group normalized advantage, which computes group-relative advantages within samples sharing the same consolidated SDE denoising step. Experimental results on different reward settings have demonstrated the effectiveness of our methods.


【3】Can Optimal Transport Improve Federated Inverse Reinforcement Learning?
标题:最优传输能否改善联邦逆强化学习?
链接:https://arxiv.org/abs/2601.00309

作者:David Millard,Ali Baheri
摘要:在机器人和多智能体系统中,自治智能体的舰队通常在不同的环境中运行,同时追求共同的高级目标。直接汇集他们的数据来学习共享奖励函数通常是不切实际的,因为动态,隐私约束和有限的通信带宽的差异。本文介绍了一种基于最优传输的联邦逆强化学习(IRL)方法。每个客户端首先在本地执行轻量级的最大熵IRL,遵守其计算和隐私限制。然后通过Wasserstein重心融合所得到的奖励函数,该重心考虑其潜在的几何结构。我们进一步证明,这种重心融合产生一个更忠实的全球奖励估计比传统的参数平均方法在联邦学习。总的来说,这项工作提供了一个原则性和通信效率的框架,以获得一个共享的奖励,概括了异构代理和环境。
摘要:In robotics and multi-agent systems, fleets of autonomous agents often operate in subtly different environments while pursuing a common high-level objective. Directly pooling their data to learn a shared reward function is typically impractical due to differences in dynamics, privacy constraints, and limited communication bandwidth. This paper introduces an optimal transport-based approach to federated inverse reinforcement learning (IRL). Each client first performs lightweight Maximum Entropy IRL locally, adhering to its computational and privacy limitations. The resulting reward functions are then fused via a Wasserstein barycenter, which considers their underlying geometric structure. We further prove that this barycentric fusion yields a more faithful global reward estimate than conventional parameter averaging methods in federated learning. Overall, this work provides a principled and communication-efficient framework for deriving a shared reward that generalizes across heterogeneous agents and environments.
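
To make the barycentric fusion step concrete in the simplest case, the sketch below averages quantile functions, which gives the exact Wasserstein-2 barycenter of one-dimensional empirical distributions; treating each client's reward values as 1D samples is an illustrative simplification of the paper's setting.

```python
import numpy as np

def wasserstein_barycenter_1d(samples_per_client, n_quantiles=100, weights=None):
    """W2 barycenter of 1D empirical distributions = weighted average of quantile functions."""
    k = len(samples_per_client)
    weights = np.full(k, 1.0 / k) if weights is None else np.asarray(weights)
    qs = np.linspace(0.0, 1.0, n_quantiles)
    quantiles = np.stack([np.quantile(np.asarray(s), qs) for s in samples_per_client])
    return weights @ quantiles  # quantile values representing the barycenter distribution

# Example: three clients with shifted reward distributions.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=mu, scale=1.0, size=500) for mu in (-1.0, 0.0, 2.0)]
fused = wasserstein_barycenter_1d(clients)
print(round(float(fused.mean()), 3))  # the barycenter interpolates the client distributions
```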


【4】Reinforcement Learning with Function Approximation for Non-Markov Processes
标题:非马尔科夫过程的函数逼近强化学习
链接:https://arxiv.org/abs/2601.00151

作者:Ali Devran Kara
摘要:本文研究了非马尔可夫状态过程和代价过程下带线性函数逼近的强化学习方法。我们首先考虑策略评估方法,并证明该算法在底层非马尔可夫过程满足适当遍历性条件时收敛。我们进一步证明,该极限对应于一个联合算子的不动点,该算子由正交投影与一个辅助马尔可夫决策过程的Bellman算子复合而成。对于带线性函数逼近的Q学习,与马尔可夫设置一样,一般不能保证收敛;然而我们证明,在基函数由量化映射选取的特殊情形下,可以在类似的遍历性条件下证明收敛。最后,我们将结果应用于以有限记忆变量作为状态表示的部分可观测马尔可夫决策过程,并为所得学习算法的极限推导出显式误差界。
摘要:We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions on the underlying non-Markov processes. Furthermore, we show that the limit corresponds to the fixed point of a joint operator composed of an orthogonal projection and the Bellman operator of an auxiliary \emph{Markov} decision process.   For Q-learning with linear function approximation, as in the Markov setting, convergence is not guaranteed in general. We show, however, that for the special case where the basis functions are chosen based on quantization maps, the convergence can be shown under similar ergodicity conditions. Finally, we apply our results to partially observed Markov decision processes, where finite-memory variables are used as state representations, and we derive explicit error bounds for the limits of the resulting learning algorithms.
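
A minimal NumPy sketch of semi-gradient TD(0) policy evaluation with linear function approximation, the building block analyzed above; the feature construction and learning rate are illustrative assumptions.

```python
import numpy as np

def td0_linear(features, costs, next_features, gamma=0.95, alpha=0.01, epochs=50):
    """TD(0) policy evaluation with linear function approximation V(x) ~ theta^T phi(x)."""
    theta = np.zeros(features.shape[1])
    for _ in range(epochs):
        for phi, c, phi_next in zip(features, costs, next_features):
            td_error = c + gamma * phi_next @ theta - phi @ theta
            theta += alpha * td_error * phi   # semi-gradient update
    return theta

# Toy trajectory: 200 transitions with 4-dimensional features (e.g., quantization indicators).
rng = np.random.default_rng(1)
phi_t = rng.random((200, 4))
phi_tp1 = rng.random((200, 4))
cost = rng.random(200)
print(td0_linear(phi_t, cost, phi_tp1))
```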


【5】GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments
标题:GRL-SNAM:基于路径微分哈密顿量的几何强化学习,用于未知环境中的同时导航与建图
链接:https://arxiv.org/abs/2601.00116

作者:Aditya Sai Ellendula,Yi Wang,Minh Nguyen,Chandrajit Bajaj
摘要:我们提出了GRL-SNAM,一个几何强化学习框架,用于未知环境中的同时导航和映射(SNAM)。SNAM问题是具有挑战性的,因为它需要设计多个代理的分层或联合策略,这些代理控制现实生活中的机器人在无地图环境中向目标移动,即环境地图不可用的环境,并且需要通过传感器获取。传感器从路径学习器(即导航器)调用,通过对感官代理的主动查询响应,并沿着运动路径。GRL-SNAM与抢先式导航算法和其他强化学习方法不同,它只依赖于局部感官观察,而不构建全局地图。我们的方法制定的路径导航和映射作为一个动态的最短路径搜索和发现过程中使用受控哈密顿优化:传感器输入被转换成本地能源景观编码可达性,障碍,和变形约束,而政策的传感,规划和重新配置逐步通过更新哈密顿。一个简化的哈密顿函数作为一个自适应的评分函数,更新动力学/势项,嵌入障碍约束,并不断完善新的本地信息到达的轨迹。我们评估GRL-SNAM两个不同的二维导航任务。在相同的阶段感测约束下,与局部反应性基线和全局策略学习参考进行比较,它保留了间隙,推广到看不见的布局,并证明了通过更新Hamilton算子的几何RL学习通过局部能量细化而不是广泛的全局映射进行最小探索来实现高质量的导航。该代码可在\href{https://github.com/CVC-Lab/GRL-SNAM}{Github}上公开获取。
摘要 :We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping(SNAM) in unknown environments. A SNAM problem is challenging as it needs to design hierarchical or joint policies of multiple agents that control the movement of a real-life robot towards the goal in mapless environment, i.e. an environment where the map of the environment is not available apriori, and needs to be acquired through sensors. The sensors are invoked from the path learner, i.e. navigator, through active query responses to sensory agents, and along the motion path. GRL-SNAM differs from preemptive navigation algorithms and other reinforcement learning methods by relying exclusively on local sensory observations without constructing a global map. Our approach formulates path navigation and mapping as a dynamic shortest path search and discovery process using controlled Hamiltonian optimization: sensory inputs are translated into local energy landscapes that encode reachability, obstacle barriers, and deformation constraints, while policies for sensing, planning, and reconfiguration evolve stagewise via updating Hamiltonians. A reduced Hamiltonian serves as an adaptive score function, updating kinetic/potential terms, embedding barrier constraints, and continuously refining trajectories as new local information arrives. We evaluate GRL-SNAM on two different 2D navigation tasks. Comparing against local reactive baselines and global policy learning references under identical stagewise sensing constraints, it preserves clearance, generalizes to unseen layouts, and demonstrates that Geometric RL learning via updating Hamiltonians enables high-quality navigation through minimal exploration via local energy refinement rather than extensive global mapping. The code is publicly available on \href{https://github.com/CVC-Lab/GRL-SNAM}{Github}.


【6】Reinforcement learning with timed constraints for robotics motion planning
标题:机器人运动规划的具有时间约束的强化学习
链接:https://arxiv.org/abs/2601.00087

作者:Zhaoan Wang,Junchao Li,Mahdi Mohammad,Shaoping Xiao
摘要:在动态和不确定环境中运行的机器人系统,越来越需要既能满足复杂任务序列、又遵守严格时间约束的规划器。度量区间时序逻辑(MITL)为指定这种时间受限的需求提供了形式化且富有表达力的框架;然而,由于随机动态和部分可观测性,将MITL与强化学习(RL)结合仍具挑战性。本文提出一个统一的基于自动机的强化学习框架,用于在MITL规范下为马尔可夫决策过程(MDP)和部分可观测马尔可夫决策过程(POMDP)综合策略。MITL公式被转换为时间化极限确定性广义Büchi自动机(Timed-LDGBA),并与底层决策过程同步,从而构建适用于Q学习的乘积时间模型。一个简单但富有表达力的奖励结构在允许附加性能目标的同时强制保证时间正确性。该方法在三个仿真研究中得到验证:一个建模为MDP的$5 \times 5$网格世界、一个建模为POMDP的$10 \times 10$网格世界,以及一个类办公室的服务机器人场景。结果表明,所提出的框架在随机转移下能稳定地学习满足严格时间受限要求的策略,可扩展到更大的状态空间,并在部分可观测环境中保持有效,突显了其在时间关键且不确定环境下进行可靠机器人规划的潜力。
摘要:Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interval Temporal Logic (MITL) offers a formal and expressive framework for specifying such time-bounded requirements; however, integrating MITL with reinforcement learning (RL) remains challenging due to stochastic dynamics and partial observability. This paper presents a unified automata-based RL framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) under MITL specifications. MITL formulas are translated into Timed Limit-Deterministic Generalized Büchi Automata (Timed-LDGBA) and synchronized with the underlying decision process to construct product timed models suitable for Q-learning. A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. The approach is validated in three simulation studies: a $5 \times 5$ grid-world formulated as an MDP, a $10 \times 10$ grid-world formulated as a POMDP, and an office-like service-robot scenario. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments, highlighting its potential for reliable robotic planning in time-critical and uncertain settings.


【7】Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games
标题:Yahtzee:随机组合游戏的强化学习技术
链接:https://arxiv.org/abs/2601.00007

作者:Nicholas A. Pape
备注:20 pages, 19 figures
摘要:Yahtzee是一个经典骰子游戏,具有随机的组合结构和延迟奖励,是一个有趣的中等规模RL基准。虽然单人Yahtzee的最优策略可以用动态规划计算,但多人游戏是难解的,这促使我们采用近似方法。我们将Yahtzee形式化为马尔可夫决策过程(MDP),并使用多种策略梯度方法训练自博弈代理:REINFORCE、Advantage Actor-Critic(A2C)和Proximal Policy Optimization(PPO),均使用共享主干的多头网络。我们对特征与动作编码、网络架构、回报估计器和熵正则化进行消融,以了解它们对学习的影响。在固定训练预算下,REINFORCE和PPO对超参数敏感,无法达到接近最优的性能,而A2C在一系列设置下都能稳健训练。在10万局评估游戏中,我们的代理取得241.78分的中位分数,与最优DP分数254.59相差5.0%以内,获得上半区奖金和Yahtzee的比率分别为24.9%和34.1%。所有模型都难以学会上半区奖金策略,过度偏向四条,凸显了持续存在的长时程信用分配和探索挑战。
摘要:Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. While an optimal policy for solitaire Yahtzee can be computed using dynamic programming methods, multiplayer is intractable, motivating approximation methods. We formulate Yahtzee as a Markov Decision Process (MDP), and train self-play agents using various policy gradient methods: REINFORCE, Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO), all using a multi-headed network with a shared trunk. We ablate feature and action encodings, architecture, return estimators, and entropy regularization to understand their impact on learning. Under a fixed training budget, REINFORCE and PPO prove sensitive to hyperparameters and fail to reach near-optimal performance, whereas A2C trains robustly across a range of settings. Our agent attains a median score of 241.78 points over 100,000 evaluation games, within 5.0\% of the optimal DP score of 254.59, achieving the upper section bonus and Yahtzee at rates of 24.9\% and 34.1\%, respectively. All models struggle to learn the upper bonus strategy, overindexing on four-of-a-kind's, highlighting persistent long-horizon credit-assignment and exploration challenges.


医学相关(5篇)

【1】Two Deep Learning Approaches for Automated Segmentation of Left Ventricle in Cine Cardiac MRI
标题:两种深度学习方法在电影心脏MRI中自动分割左心室
链接:https://arxiv.org/abs/2601.00794

作者:Wenhui Chu,Nikolaos V. Tsekos
备注:7 pages, 5 figures, published in ICBBB 2022
摘要:左心室(LV)分割对于心脏图像的临床量化和诊断至关重要。在这项工作中,我们提出了两种新的深度学习架构LNU-Net和IBU-Net,用于从短轴电影MRI图像中分割左心室。LNU-Net源自采用层归一化(LN)的U-Net架构,而IBU-Net源自用于医学图像分割的实例-批归一化(IB)U-Net。LNU-Net和IBU-Net的架构都包含用于特征提取的下采样路径和用于精确定位的上采样路径。我们以原始U-Net作为基准分割方法,并与所提出的架构进行比较。两种网络的左心室分割方式有所不同:LNU-Net在每个卷积块中应用层归一化,而IBU-Net在第一个卷积块中同时使用实例归一化和批归一化,并将其结果传递到下一层。我们的方法在图像数据处理中结合了仿射变换和弹性形变。评估所用的数据集包含来自45名患者的805张左心室MRI图像。实验结果表明,所提出的方法在Dice系数和平均垂直距离上优于其他最先进的方法。
摘要 :Left ventricle (LV) segmentation is critical for clinical quantification and diagnosis of cardiac images. In this work, we propose two novel deep learning architectures called LNU-Net and IBU-Net for left ventricle segmentation from short-axis cine MRI images. LNU-Net is derived from layer normalization (LN) U-Net architecture, while IBU-Net is derived from the instance-batch normalized (IB) U-Net for medical image segmentation. The architectures of LNU-Net and IBU-Net have a down-sampling path for feature extraction and an up-sampling path for precise localization. We use the original U-Net as the basic segmentation approach and compared it with our proposed architectures. Both LNU-Net and IBU-Net have left ventricle segmentation methods: LNU-Net applies layer normalization in each convolutional block, while IBU-Net incorporates instance and batch normalization together in the first convolutional block and passes its result to the next layer. Our method incorporates affine transformations and elastic deformations for image data processing. Our dataset that contains 805 MRI images regarding the left ventricle from 45 patients is used for evaluation. We experimentally evaluate the results of the proposed approaches outperforming the dice coefficient and the average perpendicular distance than other state-of-the-art approaches.
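
For reference, the Dice coefficient used to score the segmentations can be computed from binary masks as in the following NumPy sketch (an illustrative metric implementation, not the authors' code).

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example on two toy 4x4 masks.
pred = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
gt   = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(dice_coefficient(pred, gt))  # ~0.857
```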


【2】A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity Profiling
标题:一种融合异构多模态特征的稀疏注意深度学习模型用于帕金森病严重程度分析
链接:https://arxiv.org/abs/2601.00519

作者:Dristi Datta,Tanmoy Debnath,Minh Chau,Manoranjan Paul,Gourab Adhikary,Md Geaur Rahman
摘要:表征帕金森病(PD)的异质性表现需要在统一的预测框架内整合生物学和临床标志物。虽然多模态数据提供了补充信息,但许多现有的计算模型难以解释,类别不平衡或高维成像和表格临床特征的有效融合。为了解决这些限制,我们提出了类加权稀疏注意融合网络(SAFN),这是一种可解释的深度学习框架,用于强大的多模态分析。SAFN使用模态特定编码器和对称交叉注意机制(捕获成像和临床表示之间的非线性相互作用)集成MRI皮质厚度、MRI体积测量、临床评估和人口统计学变量。稀疏约束的注意力门控融合层动态地优先考虑信息模态,而类平衡的焦点损失(beta = 0.999,gamma = 1.5)在没有合成过采样的情况下减轻数据集的不平衡。使用受试者五重交叉验证对帕金森病进展标志物计划的703名参与者(570名PD,133名健康对照)进行了评估,SAFN实现了0.98 ± 0.02的准确度和1.00 ± 0.00的PR-AUC,优于已建立的机器学习和深度学习基线。可解释性分析显示了一个临床一致的决策过程,约60%的预测权重分配给临床评估,符合运动障碍协会的诊断原则。SAFN为神经退行性疾病的计算分析提供了一个可重复和透明的多模态建模范例。
摘要:Characterising the heterogeneous presentation of Parkinson's disease (PD) requires integrating biological and clinical markers within a unified predictive framework. While multimodal data provide complementary information, many existing computational models struggle with interpretability, class imbalance, or effective fusion of high-dimensional imaging and tabular clinical features. To address these limitations, we propose the Class-Weighted Sparse-Attention Fusion Network (SAFN), an interpretable deep learning framework for robust multimodal profiling. SAFN integrates MRI cortical thickness, MRI volumetric measures, clinical assessments, and demographic variables using modality-specific encoders and a symmetric cross-attention mechanism that captures nonlinear interactions between imaging and clinical representations. A sparsity-constrained attention-gating fusion layer dynamically prioritises informative modalities, while a class-balanced focal loss (beta = 0.999, gamma = 1.5) mitigates dataset imbalance without synthetic oversampling. Evaluated on 703 participants (570 PD, 133 healthy controls) from the Parkinson's Progression Markers Initiative using subject-wise five-fold cross-validation, SAFN achieves an accuracy of 0.98 plus or minus 0.02 and a PR-AUC of 1.00 plus or minus 0.00, outperforming established machine learning and deep learning baselines. Interpretability analysis shows a clinically coherent decision process, with approximately 60 percent of predictive weight assigned to clinical assessments, consistent with Movement Disorder Society diagnostic principles. SAFN provides a reproducible and transparent multimodal modelling paradigm for computational profiling of neurodegenerative disease.
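
A hedged PyTorch sketch of a class-balanced focal loss with beta = 0.999 and gamma = 1.5, following the standard effective-number weighting; the normalization and class counts below are illustrative assumptions rather than the exact SAFN loss.

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, targets, samples_per_class, beta=0.999, gamma=1.5):
    """Focal loss with class-balanced weights w_c = (1 - beta) / (1 - beta**n_c)."""
    n_c = torch.as_tensor(samples_per_class, dtype=torch.float, device=logits.device)
    weights = (1.0 - beta) / (1.0 - beta ** n_c)
    weights = weights / weights.sum() * len(samples_per_class)  # keep the loss scale stable
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                                          # probability of the true class
    loss = weights[targets] * (1.0 - pt) ** gamma * ce
    return loss.mean()

# Example: 570 PD vs 133 control samples, a batch of 8 with 2 classes.
logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
print(class_balanced_focal_loss(logits, targets, samples_per_class=[133, 570]))
```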


【3】Optimized Hybrid Feature Engineering for Resource-Efficient Arrhythmia Detection in ECG Signals: An Optimization Framework
标题:优化混合特征工程,用于资源高效的心电信号心律失常检测:优化框架
链接:https://arxiv.org/abs/2601.00192

作者:Moirangthem Tiken Singh,Manibhushan Yaikhom
摘要:心血管疾病,特别是心律失常,仍然是全球主要的死亡原因,需要通过医疗物联网(IoMT)进行持续监测。然而,最先进的深度学习方法通常带来令人望而却步的计算开销,不适合资源受限的边缘设备。本研究提出一个资源高效、以数据为中心的框架,强调特征工程而非模型复杂度。我们优化的流水线使复杂的高维心律失常数据线性可分,这是通过将时频小波分解与图论结构描述符(如PageRank中心性)相结合实现的。这种结合小波分解与图论描述符的混合特征空间,随后通过互信息和递归消除加以精炼,从而支持可解释的超轻量级线性分类器。在MIT-BIH和INCART数据集上的验证取得98.44%的诊断准确率,模型占用仅8.54 KB。该系统在52 ms每心拍的流水线内实现0.46 μs的分类推理延迟,保证了实时运行。相比KD-Light等压缩模型(25 KB,准确率96.32%),这些成果带来一个数量级的效率提升,推动了无电池心脏传感器的发展。
摘要:Cardiovascular diseases, particularly arrhythmias, remain a leading global cause of mortality, necessitating continuous monitoring via the Internet of Medical Things (IoMT). However, state-of-the-art deep learning approaches often impose prohibitive computational overheads, rendering them unsuitable for resource-constrained edge devices. This study proposes a resource-efficient, data-centric framework that prioritizes feature engineering over complexity. Our optimized pipeline makes the complex, high-dimensional arrhythmia data linearly separable. This is achieved by integrating time-frequency wavelet decompositions with graph-theoretic structural descriptors, such as PageRank centrality. This hybrid feature space, combining wavelet decompositions and graph-theoretic descriptors, is then refined using mutual information and recursive elimination, enabling interpretable, ultra-lightweight linear classifiers. Validation on the MIT-BIH and INCART datasets yields 98.44% diagnostic accuracy with an 8.54 KB model footprint. The system achieves 0.46 $μ$s classification inference latency within a 52 ms per-beat pipeline, ensuring real-time operation. These outcomes provide an order-of-magnitude efficiency gain over compressed models, such as KD-Light (25 KB, 96.32% accuracy), advancing battery-less cardiac sensors.
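
The hybrid feature idea, wavelet sub-band energies combined with a graph-theoretic centrality summary, could be sketched as below; the horizontal-visibility graph construction and the specific statistics are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
import pywt
import networkx as nx

def horizontal_visibility_graph(x):
    """Simple O(n^2) horizontal visibility graph over a short beat segment."""
    g = nx.Graph()
    g.add_nodes_from(range(len(x)))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if all(x[k] < min(x[i], x[j]) for k in range(i + 1, j)):
                g.add_edge(i, j)
    return g

def hybrid_features(beat, wavelet="db4", level=3):
    """Wavelet sub-band energies plus a PageRank-centrality summary of the beat graph."""
    coeffs = pywt.wavedec(beat, wavelet, level=level)
    energies = [float(np.sum(c ** 2)) for c in coeffs]
    pr = nx.pagerank(horizontal_visibility_graph(beat))
    pr_vals = np.array(list(pr.values()))
    return energies + [pr_vals.mean(), pr_vals.max(), pr_vals.std()]

# Example on a synthetic 128-sample beat.
rng = np.random.default_rng(0)
beat = np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * rng.standard_normal(128)
print(hybrid_features(beat))
```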


【4】Deep Learning Approach for the Diagnosis of Pediatric Pneumonia Using Chest X-ray Imaging
标题:使用胸部X射线成像诊断儿科肺炎的深度学习方法
链接:https://arxiv.org/abs/2601.00041

作者:Fatemeh Hosseinabadi,Mohammad Mojtaba Rohani
备注:9 pages, 3 figures
摘要:小儿肺炎仍然是全世界儿童发病和死亡的主要原因。及时和准确的诊断是至关重要的,但往往受到有限的放射学专业知识和儿科成像的生理和程序复杂性的挑战。这项研究调查了最先进的卷积神经网络(CNN)架构ResNetRS,RegNet和EfficientNetV2的性能,使用迁移学习将儿科胸部X射线图像自动分类为肺炎或正常。从最初包含5,856张儿科图像的公开数据集中提取了1,000张胸部X射线图像的精选子集。所有图像进行预处理和标记的二进制分类。每个模型都使用预训练的ImageNet权重进行了微调,并根据准确性和灵敏度进行了评估。RegNet取得了最高的分类性能,准确度为92.4,灵敏度为90.1,其次是ResNetRS(准确度:91.9,灵敏度:89.3)和EfficientNetV2(准确度:88.5,灵敏度:88.1)。
摘要:Pediatric pneumonia remains a leading cause of morbidity and mortality in children worldwide. Timely and accurate diagnosis is critical but often challenged by limited radiological expertise and the physiological and procedural complexity of pediatric imaging. This study investigates the performance of state-of-the-art convolutional neural network (CNN) architectures ResNetRS, RegNet, and EfficientNetV2 using transfer learning for the automated classification of pediatric chest X-ray images as either pneumonia or normal. A curated subset of 1,000 chest X-ray images was extracted from a publicly available dataset originally comprising 5,856 pediatric images. All images were preprocessed and labeled for binary classification. Each model was fine-tuned using pretrained ImageNet weights and evaluated based on accuracy and sensitivity. RegNet achieved the highest classification performance with an accuracy of 92.4 and a sensitivity of 90.1, followed by ResNetRS (accuracy: 91.9, sensitivity: 89.3) and EfficientNetV2 (accuracy: 88.5, sensitivity: 88.1).


【5】Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI
标题:利用可解释的人工智能对全天心电信号进行建模以预测心力衰竭风险
链接:https://arxiv.org/abs/2601.00014

作者:Eran Zvuloni,Ronit Almog,Michael Glikson,Shany Brimer Biton,Ilan Green,Izhar Laufer,Offer Amir,Joachim A. Behar
摘要:心力衰竭(HF)影响11.8%的65岁及以上成年人,降低了生活质量和寿命。预防HF可以降低发病率和死亡率。我们假设,应用于24小时单导联心电图(ECG)数据的人工智能(AI)可以预测五年内HF的风险。为了研究这一点,使用了Technion-Leumit Holter ECG(TLHE)数据集,包括20年来收集的47,729名患者的69,663个记录。我们的深度学习模型DeepHHF在24小时ECG记录上进行训练,获得了0.80的受试者工作特征曲线下面积,优于使用30秒片段和临床评分的模型。DeepHHF确定的高风险个体有两倍的住院或死亡机会。可解释性分析显示,DeepHHF专注于心律失常和心脏异常,重点关注上午8点到下午3点之间。这项研究强调了深度学习对24小时连续ECG数据进行建模的可行性,捕获对可靠的风险预测至关重要的阵发性事件和昼夜节律变化。应用于单导联动态心电图的人工智能是无创的,廉价的,广泛使用的,使其成为一个有前途的工具,用于HF风险预测。
摘要:Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To research this, the Technion-Leumit Holter ECG (TLHE) dataset, including 69,663 recordings from 47,729 patients, collected over 20 years was used. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80 that outperformed a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold chance of hospitalization or death incidents. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities, with key attention between 8 AM and 3 PM. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events and circadian variations essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.


超分辨率|去噪|去模糊|去雾(1篇)

【1】Categorical Reparameterization with Denoising Diffusion models
标题:使用去噪扩散模型的类别重新参数化
链接:https://arxiv.org/abs/2601.00781

作者:Samson Gourevitch,Alain Durmus,Eric Moulines,Jimmy Olsson,Yazid Janati
备注:working paper
摘要:带有类别变量的基于梯度的优化通常依赖无偏但有噪声的评分函数估计器,或依赖连续松弛:后者用允许路径式(重参数化)梯度的平滑代理替换离散分布,代价是优化一个有偏的、依赖温度的目标。本文通过为类别分布引入一种基于扩散的软重参数化,扩展了这一类松弛方法。对于这些分布,高斯加噪过程下的去噪器具有可高效计算的封闭形式,从而得到一个可反向传播的免训练扩散采样器。实验表明,所提出的重参数化技巧在多种基准上取得有竞争力或更优的优化性能。
摘要:Gradient-based optimization with categorical variables typically relies on score-function estimators, which are unbiased but noisy, or on continuous relaxations that replace the discrete distribution with a smooth surrogate admitting a pathwise (reparameterized) gradient, at the cost of optimizing a biased, temperature-dependent objective. In this paper, we extend this family of relaxations by introducing a diffusion-based soft reparameterization for categorical distributions. For these distributions, the denoiser under a Gaussian noising process admits a closed form and can be computed efficiently, yielding a training-free diffusion sampler through which we can backpropagate. Our experiments show that the proposed reparameterization trick yields competitive or improved optimization performance on various benchmarks.


自动驾驶|车辆|车道检测等(1篇)

【1】The Weather Paradox: Why Precipitation Fails to Predict Traffic Accident Severity in Large-Scale US Data
标题:天气悖论:为什么降水无法在美国大规模数据中预测交通事故严重程度
链接:https://arxiv.org/abs/2601.00152

作者:Yann Bellec,Rohan Kaman,Siwen Cui,Aarav Agrawal,Calvin Chen
备注:11 pages, 8 figures, 0 tables. Preprint, machine learning analysis of 500,000 US traffic accidents
摘要:本研究探讨了环境,时间和空间因素对美国交通事故严重程度的预测能力。使用2016-2023年美国50万起交通事故的数据集,我们训练了一个XGBoost分类器,该分类器通过随机搜索交叉验证进行优化,并通过类加权调整类不平衡。最终的模型达到了78%的整体准确率,在大多数类(严重程度2)上表现出色,达到了87%的精确度和召回率。特征重要性分析表明,一天中的时间,地理位置和天气相关的变量,包括能见度,温度和风速,是事故严重程度的最强预测因素。然而,与最初的假设相反,降水和能见度表现出有限的预测能力,可能反映驾驶员在明显危险的条件下的行为适应。该数据集在中等严重事故中的优势限制了模型在极端情况下学习有意义模式的能力,突出了对替代采样策略、增强特征工程和外部数据集集成的需求。这些研究结果有助于以证据为基础的交通管理,并建议未来的严重程度预测研究的方向。
摘要:This study investigates the predictive capacity of environmental, temporal, and spatial factors on traffic accident severity in the United States. Using a dataset of 500,000 U.S. traffic accidents spanning 2016-2023, we trained an XGBoost classifier optimized through randomized search cross-validation and adjusted for class imbalance via class weighting. The final model achieves an overall accuracy of 78%, with strong performance on the majority class (Severity 2), attaining 87% precision and recall. Feature importance analysis reveals that time of day, geographic location, and weather-related variables, including visibility, temperature, and wind speed, rank among the strongest predictors of accident severity. However, contrary to initial hypotheses, precipitation and visibility demonstrate limited predictive power, potentially reflecting behavioral adaptation by drivers under overtly hazardous conditions. The dataset's predominance of mid-level severity accidents constrains the model's capacity to learn meaningful patterns for extreme cases, highlighting the need for alternative sampling strategies, enhanced feature engineering, and integration of external datasets. These findings contribute to evidence-based traffic management and suggest future directions for severity prediction research.
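
A small sketch of the modeling recipe described above (gradient boosting with class weighting instead of resampling), using synthetic stand-in features; the columns, hyperparameters, and toy data are assumptions, not the study's configuration.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.model_selection import train_test_split

# Toy stand-in for the tabular accident features (hour, lat/lon, visibility, temperature, wind, ...).
rng = np.random.default_rng(0)
X = rng.random((5000, 7))
y = rng.choice([1, 2, 3, 4], size=5000, p=[0.05, 0.75, 0.15, 0.05])  # severity, class-imbalanced

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
weights = compute_sample_weight("balanced", y_tr)   # class weighting instead of oversampling

model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1, eval_metric="mlogloss")
model.fit(X_tr, y_tr - 1, sample_weight=weights)     # xgboost expects labels starting at 0
print(model.score(X_te, y_te - 1))
print(model.feature_importances_)                     # per-feature importance, as analyzed in the study
```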


联邦学习|隐私保护|加密(2篇)

【1】FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing
标题:FedHypeVAE:用于差异私有嵌入共享的超网络生成条件VAE的联邦学习
链接:https://arxiv.org/abs/2601.00785

作者:Sunny Gupta,Amit Sethi
备注:10 pages, 1 figures, Accepted at AAI'26
摘要:联邦数据共享承诺在不集中原始数据的情况下实现实用性,但现有的嵌入级生成器在非IID客户端异构性下挣扎,并提供有限的正式保护以防止梯度泄漏。我们提出了FedHypeVAE,一个差异化的私人,超网络驱动的框架,用于在分散的客户端之间合成嵌入级数据。在有条件的VAE骨干的基础上,我们将单个全局解码器和固定的潜在先验替换为客户端感知的解码器和由共享超网络从私有的可训练客户端代码生成的类条件先验。这种双层设计使生成层而不是下游模型个性化,同时将本地数据与通信参数解耦。共享超网络在差分隐私下进行了优化,确保只有噪声干扰,修剪梯度在客户端之间聚合。真实嵌入和合成嵌入之间的局部MMD对齐以及超网络输出上的Lipschitz正则化子进一步增强了非IID条件下的稳定性和分布一致性。在训练之后,中性元代码实现域不可知的合成,而元代码的混合物提供可控的多域覆盖。FedHypeVAE在生成器级别统一了个性化、隐私和分布对齐,为联邦设置中的隐私保护数据合成建立了原则性基础。代码:github.com/sunnyinAI/FedHypeVAE
摘要:Federated data sharing promises utility without centralizing raw data, yet existing embedding-level generators struggle under non-IID client heterogeneity and provide limited formal protection against gradient leakage. We propose FedHypeVAE, a differentially private, hypernetwork-driven framework for synthesizing embedding-level data across decentralized clients. Building on a conditional VAE backbone, we replace the single global decoder and fixed latent prior with client-aware decoders and class-conditional priors generated by a shared hypernetwork from private, trainable client codes. This bi-level design personalizes the generative layer, rather than the downstream model, while decoupling local data from communicated parameters. The shared hypernetwork is optimized under differential privacy, ensuring that only noise-perturbed, clipped gradients are aggregated across clients. A local MMD alignment between real and synthetic embeddings and a Lipschitz regularizer on hypernetwork outputs further enhance stability and distributional coherence under non-IID conditions. After training, a neutral meta-code enables domain agnostic synthesis, while mixtures of meta-codes provide controllable multi-domain coverage. FedHypeVAE unifies personalization, privacy, and distribution alignment at the generator level, establishing a principled foundation for privacy-preserving data synthesis in federated settings. Code: github.com/sunnyinAI/FedHypeVAE


【2】HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
标题:HFedMoE:基于混合专家的资源感知异构联邦学习
链接:https://arxiv.org/abs/2601.00583

作者:Zihan Fang,Zheng Lin,Senkang Hu,Yanan Ma,Yihang Tao,Yiqin Deng,Xianhao Chen,Yuguang Fang
备注:14 pages, 16 figures
摘要:虽然联邦学习(FL)可以在不影响数据隐私的情况下对大型语言模型(LLM)进行微调,但LLM的巨大规模使得设备上的训练对于资源受限的客户端(如移动设备)来说是不切实际的。因此,混合专家(MoE)模型已经成为一种计算效率高的解决方案,它在模型训练期间只激活稀疏的专家子集,以减少计算负担而不牺牲性能。尽管将MoE集成到FL微调中具有巨大的潜力,但它仍然面临三个关键挑战:i)由于缺乏可靠的度量来衡量每个专家对本地微调性能的影响,为客户选择合适的专家仍然具有挑战性,ii)跨客户端的异构计算资源严重阻碍了基于MoE的LLM微调,因为跨不同输入样本的动态专家激活可能压倒资源受限的设备,以及iii)客户端特定的专家子集和路由偏好破坏全局聚合,其中未对准的专家更新和不一致的选通网络导致破坏性干扰。为了解决这些挑战,我们提出了HFedMoE,一个异构的基于MoE的FL微调框架,为每个客户端定制一个专家子集,以进行计算效率高的LLM微调。具体而言,HFedMoE基于其对微调性能的贡献来识别专家重要性,然后从信息瓶颈的角度自适应地选择专家子集,以与每个客户端的计算预算保持一致。一个稀疏感知的模型聚合策略也被设计来聚合积极微调的专家和门控参数的重要性加权的贡献。大量的实验表明,HFedMoE在训练精度和收敛速度方面优于最先进的基准。
摘要:While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for resource-constrained clients, such as mobile devices. Thus, Mixture-of-Experts (MoE) models have emerged as a computation-efficient solution, which activates only a sparse subset of experts during model training to reduce computing burden without sacrificing performance. Though integrating MoE into FL fine-tuning holds significant potential, it still encounters three key challenges: i) selecting appropriate experts for clients remains challenging due to the lack of a reliable metric to measure each expert's impact on local fine-tuning performance, ii) the heterogeneous computing resources across clients severely hinder MoE-based LLM fine-tuning, as dynamic expert activations across diverse input samples can overwhelm resource-constrained devices, and iii) client-specific expert subsets and routing preference undermine global aggregation, where misaligned expert updates and inconsistent gating networks introduce destructive interference. To address these challenges, we propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client for computation-efficient LLM fine-tuning. Specifically, HFedMoE identifies the expert importance based on its contributions to fine-tuning performance, and then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget. A sparsity-aware model aggregation strategy is also designed to aggregate the actively fine-tuned experts and gating parameters with importance weighted contributions. Extensive experiments demonstrate that HFedMoE outperforms state-of-the-art benchmarks in training accuracy and convergence speed.


推理|分析|理解|解释(4篇)

【1】Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning
标题:理性几何:有效数学推理的谱签名
链接:https://arxiv.org/abs/2601.00791

作者:Valentin Noël
备注:58 pages, 19 figures, Under Review
摘要:我们提出了一种无需训练的方法,通过对注意力模式的谱分析来检测大型语言模型中的有效数学推理。通过将注意力矩阵视为以词元为节点的动态图的邻接矩阵,我们提取了四个可解释的谱诊断量:Fiedler值(代数连通性)、高频能量比(HFER)、图信号平滑度和谱熵,它们在有效与无效数学证明之间表现出统计上显著的差异。在来自四个独立架构系列(Meta Llama、Alibaba Qwen、Microsoft Phi和Mistral AI)的七个Transformer模型上的实验证明,这种谱特征的效应量高达Cohen's $d = 3.30$($p < 10^{-116}$),在严格评估下分类准确率为85.0-95.6%,校准阈值在全数据集上达到93-95%。该方法不需要训练数据、微调或学习分类器:在单个谱指标上设置一个阈值即可获得高准确率。通过系统的标签校正,我们发现谱方法检测的是逻辑连贯性而非编译器接受度,能够识别出因技术性失败而被形式验证器拒绝但数学上有效的证明。我们进一步发现了架构依赖性:Mistral-7B的滑动窗口注意力将区分信号从HFER转移到后期层的平滑度($d = 2.09$,$p_{\text{MW}} = 1.16 \times 10^{-48}$),揭示了注意力机制设计会影响哪些谱特征能够捕捉推理有效性。这些发现将谱图分析确立为推理验证的原则性框架,并可直接应用于幻觉检测和AI安全监控。
摘要:We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics, the Fiedler value (algebraic connectivity), high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy, that exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0--95.6\% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93--95\% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.
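
A NumPy sketch of how such spectral diagnostics can be read off a single attention matrix treated as a weighted graph; the symmetrization, choice of graph signal, and frequency split are illustrative assumptions, since the paper's exact definitions may differ.

```python
import numpy as np

def spectral_diagnostics(attn):
    """Fiedler value, spectral entropy, and a high-frequency energy ratio from one attention matrix."""
    A = 0.5 * (attn + attn.T)                      # symmetrize to get an undirected weighted graph
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L = np.diag(d) - A                             # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    fiedler = evals[1]                             # algebraic connectivity
    p = evals / (evals.sum() + 1e-12)
    spectral_entropy = -np.sum(p * np.log(p + 1e-12))
    # Energy ratio of a graph signal (here: token degree) in the upper half of the Laplacian basis.
    coeffs = evecs.T @ d
    half = len(evals) // 2
    hfer = np.sum(coeffs[half:] ** 2) / (np.sum(coeffs ** 2) + 1e-12)
    return fiedler, spectral_entropy, hfer

# Example with a random row-stochastic "attention" matrix over 16 tokens.
rng = np.random.default_rng(0)
attn = rng.random((16, 16))
attn = attn / attn.sum(axis=1, keepdims=True)
print(spectral_diagnostics(attn))
```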


【2】The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
标题:推理与创造力的权衡:走向创造力驱动的问题解决
链接:https://arxiv.org/abs/2601.00747

作者:Max Ruiz Luyten,Mihaela van der Schaar
备注:56 pages, 9 figures, submitted to Twenty-Ninth Annual Conference on Artificial Intelligence and Statistics
摘要:最先进的大型语言模型(LLM)流水线依赖自举推理循环:采样多样的思维链并强化得分最高者,主要优化正确性。我们分析了这种设计选择为何对模型在推理路径上分布的坍缩十分敏感,会削减语义熵并破坏创造性的问题求解。为分析这一失效,我们引入分布式创造性推理(DCR),这是一个统一的变分目标,将训练视为解轨迹概率测度上的梯度流。STaR、GRPO和DPO,以及熵奖励等其他方法,都是同一损失的特例。该框架给出三项核心成果:(一)多样性衰减定理,刻画基于正确性的目标如何导致STaR、GRPO和DPO各自不同的多样性衰减模式;(二)确保收敛到稳定且多样的策略、有效防止坍缩的设计;(三)在实践中实现这一点的简单、可操作的方案。因此,DCR为保持既正确又有创造性的LLM提供了首个原则性方案。
摘要:State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, mainly optimizing correctness. We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths, slashing semantic entropy and undermining creative problem-solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces. STaR, GRPO, and DPO, as well as entropy bonuses, and other methods, all constitute special cases of the same loss. The framework delivers three core results: (i) the diversity decay theorem, describing how correctness-based objectives lead to distinct modes of diversity decay for STaR, GRPO, and DPO; (ii) designs that ensure convergence to a stable and diverse policy, effectively preventing collapse; and (iii) simple, actionable recipes to achieve this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.


【3】A Comparative Analysis of Interpretable Machine Learning Methods
标题:可解释机器学习方法的比较分析
链接:https://arxiv.org/abs/2601.00428

作者:Mattia Billa,Giovanni Orlandi,Veronica Guidetti,Federica Mandreoli
摘要:近年来,机器学习(ML)在广泛的领域得到了广泛的应用,包括医疗保健、金融和法律等高风险领域。这种日益增长的依赖性引起了人们对模型可解释性和问责制的日益关注,特别是随着法律和监管框架对在关键应用中使用黑箱模型施加更严格的限制。虽然可解释的ML已经引起了大量的关注,但对固有的可解释模型的系统评估,特别是对于表格数据,仍然相对较少,并且通常主要集中在聚合的性能结果上。   为了解决这一差距,我们提出了一个大规模的比较评估16固有的可解释的方法,从经典的线性模型和决策树,以更近的方法,如可解释的提升机(EBM),符号回归(SR),和广义最优稀疏决策树(GOSDT)。我们的研究涵盖了216个真实世界的表格数据集,并根据结构数据集的特征(包括维度,样本量,线性和类别不平衡)对性能进行分层,从而超越了综合排名。此外,我们还评估了受控分布变化下的训练时间和鲁棒性。我们的研究结果揭示了清晰的性能层次结构,特别是对于回归任务,EBM始终实现强大的预测准确性。与此同时,我们表明,性能是高度依赖于上下文的:SR和可解释的广义加性神经网络(IGANN)在非线性制度中表现得特别好,而GOSDT模型表现出显着的敏感性类不平衡。总体而言,这些研究结果为寻求可解释性和预测性能之间平衡的从业者提供了实用指导,并有助于对表格数据的可解释性建模进行更深入的实证理解。
摘要:In recent years, Machine Learning (ML) has seen widespread adoption across a broad range of sectors, including high-stakes domains such as healthcare, finance, and law. This growing reliance has raised increasing concerns regarding model interpretability and accountability, particularly as legal and regulatory frameworks place tighter constraints on using black-box models in critical applications. Although interpretable ML has attracted substantial attention, systematic evaluations of inherently interpretable models, especially for tabular data, remain relatively scarce and often focus primarily on aggregated performance outcomes.   To address this gap, we present a large-scale comparative evaluation of 16 inherently interpretable methods, ranging from classical linear models and decision trees to more recent approaches such as Explainable Boosting Machines (EBMs), Symbolic Regression (SR), and Generalized Optimal Sparse Decision Trees (GOSDT). Our study spans 216 real-world tabular datasets and goes beyond aggregate rankings by stratifying performance according to structural dataset characteristics, including dimensionality, sample size, linearity, and class imbalance. In addition, we assess training time and robustness under controlled distributional shifts. Our results reveal clear performance hierarchies, especially for regression tasks, where EBMs consistently achieve strong predictive accuracy. At the same time, we show that performance is highly context-dependent: SR and Interpretable Generalized Additive Neural Networks (IGANNs) perform particularly well in non-linear regimes, while GOSDT models exhibit pronounced sensitivity to class imbalance. Overall, these findings provide practical guidance for practitioners seeking a balance between interpretability and predictive performance, and contribute to a deeper empirical understanding of interpretable modeling for tabular data.


【4】Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis
标题:财经新闻情绪分析中嵌入表示的比较评价
链接:https://arxiv.org/abs/2512.13749

作者:Joyjit Roy,Samaresh Kumar Singh
备注:6 pages, 2 figures. Submitted to IEEE IATMSI-2026 (Track: AI, IoT and Computer Vision Enabled Technologies)
摘要:金融情绪分析有助于理解市场;然而,标准的自然语言处理方法在应用于小型数据集时面临重大挑战。本研究对资源受限环境下基于嵌入的金融新闻情绪分类方法进行了比较评估。Word2Vec、GloVe和句子Transformer表示与梯度提升相结合,在人工标注的新闻标题上进行评估。实验结果表明,验证与测试性能之间存在很大差距:尽管验证指标很强,模型的表现却不如平凡基线。分析表明,低于关键数据充足性阈值时,预训练嵌入的收益递减,而过小的验证集会导致模型选择阶段的过拟合。通过每周情绪汇总和面向市场监测流程的叙述性总结展示了实际应用。研究结果提供了经验证据,表明仅靠嵌入质量无法解决情绪分类中的根本性数据稀缺问题。对于资源有限的从业者,结果表明当标注样本稀缺时,需要考虑小样本学习、数据增强或词典增强的混合方法等替代方案。
摘要:Financial sentiment analysis enhances market understanding; however, standard natural language processing approaches encounter significant challenges when applied to small datasets. This study provides a comparative evaluation of embedding-based methods for financial news sentiment classification in resource-constrained environments. Word2Vec, GloVe, and sentence transformer representations are evaluated in combination with gradient boosting on manually labeled headlines. Experimental results identify a substantial gap between validation and test performance, with models performing worse than trivial baselines despite strong validation metrics. The analysis demonstrates that pretrained embeddings yield diminishing returns below a critical data sufficiency threshold, and that small validation sets contribute to overfitting during model selection. Practical application is illustrated through weekly sentiment aggregation and narrative summarization for market monitoring workflows. The findings offer empirical evidence that embedding quality alone cannot address fundamental data scarcity in sentiment classification. For practitioners operating with limited resources, the results indicate the need to consider alternative approaches such as few-shot learning, data augmentation, or lexicon-enhanced hybrid methods when labeled samples are scarce.


检测相关(6篇)

【1】Trajectory Guard -- A Lightweight, Sequence-Aware Model for Real-Time Anomaly Detection in Agentic AI
标题:Trajectory Guard -- 一种用于智能体AI实时异常检测的轻量级、序列感知模型
链接:https://arxiv.org/abs/2601.00516

作者:Laksh Advani
备注:Accepted to AAAI Trustagent 2026
摘要:自主LLM代理生成的多步行动计划可能因上下文不匹配或结构不连贯而失败。现有的异常检测方法并不适合这一挑战:均值池化嵌入会稀释异常步骤,而仅依赖对比学习的方法忽略了序列结构。在预训练嵌入上使用标准无监督方法的F1分数不超过0.69。我们提出Trajectory Guard,一种带混合损失函数的孪生递归自编码器,它通过对比学习联合学习任务-轨迹对齐,并通过重建学习序列有效性。这种双重目标使其能够统一检测"此任务的错误计划"与"畸形的计划结构"两类异常。在涵盖合成扰动以及来自安全审计(RAS-Eval)和多代理系统(Who\&When)的真实故障的基准上,我们在平衡集上获得0.88-0.94的F1分数,在不平衡的外部基准上获得0.86-0.92的召回率。在32 ms的推理延迟下,我们的方法比LLM Judge基线快17-27倍,可在生产部署中实现实时安全验证。
摘要 :Autonomous LLM agents generate multi-step action plans that can fail due to contextual misalignment or structural incoherence. Existing anomaly detection methods are ill-suited for this challenge: mean-pooling embeddings dilutes anomalous steps, while contrastive-only approaches ignore sequential structure. Standard unsupervised methods on pre-trained embeddings achieve F1-scores no higher than 0.69. We introduce Trajectory Guard, a Siamese Recurrent Autoencoder with a hybrid loss function that jointly learns task-trajectory alignment via contrastive learning and sequential validity via reconstruction. This dual objective enables unified detection of both "wrong plan for this task" and "malformed plan structure." On benchmarks spanning synthetic perturbations and real-world failures from security audits (RAS-Eval) and multi-agent systems (Who\&When), we achieve F1-scores of 0.88-0.94 on balanced sets and recall of 0.86-0.92 on imbalanced external benchmarks. At 32 ms inference latency, our approach runs 17-27$\times$ faster than LLM Judge baselines, enabling real-time safety verification in production deployments.
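
A hedged PyTorch sketch of a hybrid objective in the spirit described above, combining a contrastive task-trajectory term with a reconstruction term; the margin, weighting, and tensor shapes are illustrative assumptions, not the Trajectory Guard implementation.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(task_emb, traj_emb, recon, target_steps, is_match, margin=1.0, alpha=0.5):
    """Contrastive task-trajectory alignment + sequence reconstruction (illustrative weighting)."""
    dist = F.pairwise_distance(task_emb, traj_emb)
    contrastive = torch.where(is_match.bool(),
                              dist.pow(2),                       # pull matched pairs together
                              F.relu(margin - dist).pow(2)       # push mismatched pairs apart
                              ).mean()
    reconstruction = F.mse_loss(recon, target_steps)              # sequential validity term
    return alpha * contrastive + (1 - alpha) * reconstruction

# Shapes: batch of 8 pairs, 64-d embeddings, trajectories of 10 steps x 32 features.
task_emb = torch.randn(8, 64)
traj_emb = torch.randn(8, 64)
recon = torch.randn(8, 10, 32)
target = torch.randn(8, 10, 32)
is_match = torch.randint(0, 2, (8,))
print(hybrid_loss(task_emb, traj_emb, recon, target, is_match))
```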


【2】Detecting Spike Wave Discharges (SWD) using 1-dimensional Residual UNet
标题:使用一维残差UNet检测棘波放电(SWD)
链接:https://arxiv.org/abs/2601.00459

作者:Saurav Sengupta,Scott Kilianski,Suchetha Sharma,Sakina Lashkeri,Ashley McHugh,Mark Beenhakker,Donald E. Brown
摘要:在脑电图(EEG)记录中手动标记事件是耗时的。当连续数周到数月进行EEG记录时,尤其如此。因此,自动标记相关EEG事件的方法减少了手动工作量。棘波放电(SWD)是失神发作的电图标志,是通常手动标记的EEG事件。虽然以前的一些研究已经利用机器学习来自动分割和分类EEG信号,如SWD,但它们可以改进。在这里,我们比较了14个机器学习分类器在我们自己的手动注释数据集上的性能,这些数据集是来自C3 H/HeJ小鼠的961小时EEG记录,包括22,637个标记的SWD。我们发现,在这个数据集中,一维UNet最适合标记SWD。我们还通过增强我们的训练数据来改进1D UNet,并确定缩放显示出所有应用的增强程序的最大益处。然后,我们比较1D UNet与数据增强,AugUNet 1D,对最近发表的时间和频率为基础的算法方法称为“双峰”。AugUNet 1D显示出更优的性能,并检测到与手动标记的SWD具有更相似特征的事件。AugUNet 1D,在我们手动注释的数据上预先训练或未经训练,对其他用户公开。
摘要:The manual labeling of events in electroencephalography (EEG) records is time-consuming. This is especially true when EEG recordings are taken continuously over weeks to months. Therefore, a method to automatically label pertinent EEG events reduces the manual workload. Spike wave discharges (SWD), which are the electrographic hallmark of absence seizures, are EEG events that are often labeled manually. While some previous studies have utilized machine learning to automatically segment and classify EEG signals like SWDs, they can be improved. Here we compare the performance of 14 machine learning classifiers on our own manually annotated dataset of 961 hours of EEG recordings from C3H/HeJ mice, including 22,637 labeled SWDs. We find that a 1D UNet performs best for labeling SWDs in this dataset. We also improve the 1D UNet by augmenting our training data and determine that scaling showed the greatest benefit of all augmentation procedures applied. We then compare the 1D UNet with data augmentation, AugUNet1D, against a recently published time- and frequency-based algorithmic approach called "Twin Peaks". AugUNet1D showed superior performance and detected events with more similar features to the SWDs labeled manually. AugUNet1D, pretrained on our manually annotated data or untrained, is made public for others users.


【3】Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models
标题:通过深度模型对空中捕获视频序列进行实时人体检测
链接:https://arxiv.org/abs/2601.00391

作者:Nouar AlDahoul,Aznul Qalid Md Sabri,Ali Mohammed Mansoor
摘要:视频中的人体检测在各种实际应用中起着重要的作用。大多数传统方法依赖于利用手工制作的功能,这些功能依赖于问题,并且对于特定任务是最佳的。此外,它们非常容易受到动态事件的影响,例如照明变化,相机抖动和对象大小的变化。另一方面,所提出的特征学习方法更便宜、更容易,因为可以自动生成高度抽象和区分性的特征,而不需要专家知识。在本文中,我们利用自动特征学习方法,该方法将光流和三种不同的深度模型(即,监督卷积神经网络(S-CNN)、预训练的CNN特征提取器和分层极端学习机),用于在具有不同高度的空中平台上使用非静态相机捕获的视频中的人体检测。这些模型是在公开的、极具挑战性的UCF-ARG航空数据集上训练和测试的。分析了这些模型在训练、测试精度和学习速度方面的比较。性能评估考虑了五个人类动作(挖掘、挥手、投掷、行走和跑步)。实验结果表明,所提出的方法是成功的人体检测任务。预训练的CNN产生的平均准确率为98.09%。S-CNN使用softmax的平均准确率为95.6%,使用支持向量机(SVM)的平均准确率为91.7%。H-ELM的平均准确率为95.9%。使用普通的中央处理器(CPU),H-ELM的训练时间需要445秒。在S-CNN中学习需要770秒,使用高性能图形处理单元(GPU)。
摘要:Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on utilizing handcrafted features, which are problem-dependent and optimal for specific tasks. Moreover, they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object sizes. On the other hand, the proposed feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need of expert knowledge. In this paper, we utilize automatic feature learning methods, which combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The comparison between these models in terms of training, testing accuracy, and learning speed is analyzed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with softmax and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training time takes 445 seconds. Learning in S-CNN takes 770 seconds with a high-performance Graphical Processing Unit (GPU).


【4】Smart Fault Detection in Nanosatellite Electrical Power System
标题:纳米卫星电力系统的智能故障检测
链接:https://arxiv.org/abs/2601.00335

作者:Alireza Rezaee,Niloofar Nobahari,Amin Asgarifar,Farshid Hajati
摘要:提出了一种在低轨无姿态确定控制子系统(ADCS)的情况下探测纳卫星电源故障的新方法。由于压力公差、发射器压力和环境条件,该系统的每个部分都有发生故障的风险。常见的故障是光伏子系统的线间故障和开路、DC-DC转换器处的IGBT短路和开路以及接地电池的调节器故障。以太阳辐射和太阳能电池板表面温度为输入,电流和负载为输出,采用神经网络对系统进行无故障仿真。最后,利用神经网络分类器,根据故障的模式和类型对不同的故障进行诊断。对于故障分类,还使用其他机器学习方法,例如PCA分类,决策树和KNN。
摘要:This paper presents a new detection method of faults at Nanosatellites' electrical power without an Attitude Determination Control Subsystem (ADCS) at the LEO orbit. Each part of this system is at risk of fault due to pressure tolerance, launcher pressure, and environmental circumstances. Common faults are line to line fault and open circuit for the photovoltaic subsystem, short circuit and open circuit IGBT at DC to DC converter, and regulator fault of the ground battery. The system is simulated without fault based on a neural network using solar radiation and solar panel's surface temperature as input data and current and load as outputs. Finally, using the neural network classifier, different faults are diagnosed by pattern and type of fault. For fault classification, other machine learning methods are also used, such as PCA classification, decision tree, and KNN.


【5】Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection
标题:集成CycleGAN和YOLO的深度学习模型在PCB红外缺陷检测中的应用研究
链接:https://arxiv.org/abs/2601.00237

作者:Chao Yang,Haoyuan Zheng,Yue Ma
备注:8 pages,8 figures
摘要:针对印刷电路板(PCB)缺陷检测中红外(IR)数据不足的关键瓶颈,提出了一种集成CycleGAN和YOLOv8的跨模态数据增强框架。与依赖于配对监督的传统方法不同,我们利用CycleGAN来执行非配对图像到图像的转换,将丰富的可见光PCB图像映射到红外域。该生成过程合成高保真伪IR样本,其保留缺陷的结构语义,同时准确地模拟热分布模式。随后,我们构建了一种异构训练策略,将生成的伪红外数据与有限的真实红外样本融合,以训练轻量级的YOLOv 8检测器。实验结果表明,该方法有效地提高了低数据条件下的特征学习。增强的检测器显著优于仅在有限的真实数据上训练的模型,并接近完全监督训练的性能基准,证明了伪IR合成作为工业检测的鲁棒增强策略的有效性。
摘要:This paper addresses the critical bottleneck of infrared (IR) data scarcity in Printed Circuit Board (PCB) defect detection by proposing a cross-modal data augmentation framework integrating CycleGAN and YOLOv8. Unlike conventional methods relying on paired supervision, we leverage CycleGAN to perform unpaired image-to-image translation, mapping abundant visible-light PCB images into the infrared domain. This generative process synthesizes high-fidelity pseudo-IR samples that preserve the structural semantics of defects while accurately simulating thermal distribution patterns. Subsequently, we construct a heterogeneous training strategy that fuses generated pseudo-IR data with limited real IR samples to train a lightweight YOLOv8 detector. Experimental results demonstrate that this method effectively enhances feature learning under low-data conditions. The augmented detector significantly outperforms models trained on limited real data alone and approaches the performance benchmarks of fully supervised training, proving the efficacy of pseudo-IR synthesis as a robust augmentation strategy for industrial inspection.


【6】Detecting Unobserved Confounders: A Kernelized Regression Approach
标题:检测未观察的混杂因素:核化回归方法
链接:https://arxiv.org/abs/2601.00200

作者:Yikai Chen,Yunxin Mao,Chunyuan Zheng,Hao Zou,Shanzhi Gu,Shixuan Liu,Yang Shi,Wenjing Yang,Kun Kuang,Haotian Wang
摘要:在观察性研究中,检测未观测到的混杂因素对于可靠的因果推断至关重要。现有方法要么依赖线性假设,要么需要多个异构环境,限制了其在非线性单环境设置下的适用性。为弥补这一差距,我们提出核回归混杂检测(KRCD),一种在单环境条件下检测非线性观测数据中未观测混杂的新方法。KRCD利用再生核Hilbert空间来建模复杂的依赖关系。通过比较标准核回归与高阶核回归,我们推导出一个检验统计量,其显著偏离零即表明存在未观测的混杂。理论上我们证明了两个关键结果:第一,在无限样本情形下,当且仅当不存在未观测混杂因素时,两种回归系数才会重合;第二,有限样本下的差值收敛到零均值高斯分布,且方差可计算。在合成基准和Twins数据集上的大量实验表明,KRCD不仅优于现有基线,还具有更高的计算效率。
摘要:Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higherorder kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: First, in infinite samples, regression coefficients coincide if and only if no unobserved confounders exist. Second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.


分类|识别(2篇)

【1】Noise-Aware Named Entity Recognition for Historical VET Documents
标题:历史VET文档的噪音感知命名实体识别
链接:https://arxiv.org/abs/2601.00488

作者:Alexander M. Esser,Jens Dörpinghaus
备注:This is an extended, non-peer-reviewed version of the paper presented at VISAPP 2026
摘要:本文研究职业教育与培训(VET)领域中的命名实体识别(NER),重点关注受OCR噪声影响的历史数字化文档。我们提出一种鲁棒的NER方法,结合合成注入OCR错误的噪声感知训练(NAT)、迁移学习和多阶段微调。我们系统比较了三种互补策略:分别在含噪数据、干净数据和人工数据上训练。我们的方法是最早能识别VET文档中多种实体类型的方法之一。它被应用于德语文档,但可迁移到任意语言。实验结果表明,特定领域与噪声感知的微调显著提升了噪声条件下的鲁棒性和准确性。我们公开了代码,以便在特定领域场景下复现噪声感知NER。
摘要:This paper addresses Named Entity Recognition (NER) in the domain of Vocational Education and Training (VET), focusing on historical, digitized documents that suffer from OCR-induced noise. We propose a robust NER approach leveraging Noise-Aware Training (NAT) with synthetically injected OCR errors, transfer learning, and multi-stage fine-tuning. Three complementary strategies, training on noisy, clean, and artificial data, are systematically compared. Our method is one of the first to recognize multiple entity types in VET documents. It is applied to German documents but transferable to arbitrary languages. Experimental results demonstrate that domain-specific and noise-aware fine-tuning substantially increases robustness and accuracy under noisy conditions. We provide publicly available code for reproducible noise-aware NER in domain-specific contexts.
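
A minimal Python sketch of synthetic OCR-error injection of the kind used for noise-aware training; the confusion table and error rates are illustrative assumptions, not the paper's noise model.

```python
import random

# Common OCR confusions used to synthesize noisy training text (illustrative table).
OCR_CONFUSIONS = {"e": "c", "l": "1", "o": "0", "rn": "m", "ü": "u", "ß": "B"}

def inject_ocr_noise(text, error_rate=0.1, seed=0):
    """Randomly apply substitution and deletion errors to mimic OCR noise in historical scans."""
    rng = random.Random(seed)
    out = text
    for src, tgt in OCR_CONFUSIONS.items():
        if src in out and rng.random() < error_rate:
            out = out.replace(src, tgt, 1)        # substitute one occurrence
    chars = list(out)
    for i in range(len(chars)):
        if rng.random() < error_rate / 5:
            chars[i] = ""                          # occasional character drop
    return "".join(chars)

print(inject_ocr_noise("Ausbildung zum Elektroniker für Betriebstechnik", error_rate=0.3))
```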


【2】Evaluating Anomaly Detectors for Simulated Highly Imbalanced Industrial Classification Problems
标题:评估模拟高度不平衡工业分类问题的异常检测器
链接:https://arxiv.org/abs/2601.00005

作者:Lesley Wheat,Martin v. Mohrenschildt,Saeid Habibi
备注:21 pages, 14 figures, 11 tables
摘要:机器学习为质量控制和预测性维护等工业系统问题提供了潜在解决方案,但在工业应用中也面临独特障碍。一个持续的挑战是极端的类别不平衡,主要源于训练期间故障数据的可得性有限。本文使用反映真实工程约束、与具体问题无关的模拟数据集,对异常检测算法进行了全面评估。我们使用2D和10D中基于超球面的异常分布合成数据集,在异常率介于0.05%至20%之间、训练集规模介于1000至10000之间(测试集规模为40000)的训练数据上对14个检测器进行基准测试,以评估性能和泛化误差。结果表明,最佳检测器高度依赖于训练集中故障样本的总数,而在大多数情况下,增加正常样本带来的收益微不足道。在故障样本少于20个时,无监督方法(kNN/LOF)占优;但在大约30-50个故障样本时,半监督(XGBOD)和监督(SVM/CatBoost)检测器的性能大幅提升。半监督方法在只有两个特征时没有显著优势,但在十个特征时改进明显。该研究强调了异常检测方法在较小数据集上泛化时的性能下降,并为在工业环境中部署异常检测提供了实用见解。
摘要 :Machine learning offers potential solutions to current issues in industrial systems in areas such as quality control and predictive maintenance, but also faces unique barriers in industrial applications. An ongoing challenge is extreme class imbalance, primarily due to the limited availability of faulty data during training. This paper presents a comprehensive evaluation of anomaly detection algorithms using a problem-agnostic simulated dataset that reflects real-world engineering constraints. Using a synthetic dataset with a hyper-spherical based anomaly distribution in 2D and 10D, we benchmark 14 detectors across training datasets with anomaly rates between 0.05% and 20% and training sizes between 1 000 and 10 000 (with a testing dataset size of 40 000) to assess performance and generalization error. Our findings reveal that the best detector is highly dependant on the total number of faulty examples in the training dataset, with additional healthy examples offering insignificant benefits in most cases. With less than 20 faulty examples, unsupervised methods (kNN/LOF) dominate; but around 30-50 faulty examples, semi-supervised (XGBOD) and supervised (SVM/CatBoost) detectors, we see large performance increases. While semi-supervised methods do not show significant benefits with only two features, the improvements are evident at ten features. The study highlights the performance drop on generalization of anomaly detection methods on smaller datasets, and provides practical insights for deploying anomaly detection in industrial environments.
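
A small scikit-learn sketch of the low-fault regime discussed above, scoring an extremely imbalanced dataset with LOF and evaluating with average precision; the toy data and neighbor count are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import average_precision_score

# Toy imbalanced setup: 10 000 healthy points, 20 faults shifted away from the healthy cluster.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(10_000, 10))
faults = rng.normal(3.0, 1.0, size=(20, 10))
X = np.vstack([healthy, faults])
y = np.concatenate([np.zeros(len(healthy)), np.ones(len(faults))])

# Unsupervised scoring: LOF fits and scores the same data; larger score = more anomalous.
lof = LocalOutlierFactor(n_neighbors=35)
lof.fit_predict(X)
scores = -lof.negative_outlier_factor_
print("LOF average precision:", round(average_precision_score(y, scores), 3))
```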


优化|敛散性(5篇)

【1】Cost Optimization in Production Line Using Genetic Algorithm
标题:基于遗传算法的生产线成本优化
链接:https://arxiv.org/abs/2601.00689

作者:Alireza Rezaee
摘要:本文提出了一种用遗传算法(GA)求解生产线成本最优任务调度的方法。系统由一组串行处理任务组成,每个任务具有给定的持续时间、单位执行成本和优先约束,并且必须在每站持续时间上限的约束下分配到数量不限的工作站。目标是在严格满足所有先决条件和容量约束的前提下,最小化按站计算的、由任务成本和持续时间上限构成的总生产成本。研究了两种染色体编码策略:一种是使用JGAP库实现、带有SuperGene有效性检查的基于站的表示;另一种是基因直接编码站分配的基于任务的表示。对于每种编码,标准的GA算子(交叉、变异、选择和替换)都经过调整以保持可行性,并将种群推向更低成本的调度方案。在紧耦合、松耦合和解耦三类优先结构上的实验结果表明,基于任务的编码比基于站的编码收敛更平滑、成本最小化更可靠,尤其是在有效调度方案数量很大时。该研究突出了GA相对于基于梯度的方法和解析方法在组合调度问题上的优势,特别是在存在复杂约束和不可微成本景观的情况下。
摘要:This paper presents a genetic algorithm (GA) approach to cost-optimal task scheduling in a production line. The system consists of a set of serial processing tasks, each with a given duration, unit execution cost, and precedence constraints, which must be assigned to an unlimited number of stations subject to a per-station duration bound. The objective is to minimize the total production cost, modeled as a station-wise function of task costs and the duration bound, while strictly satisfying all prerequisite and capacity constraints. Two chromosome encoding strategies are investigated: a station-based representation implemented using the JGAP library with SuperGene validity checks, and a task-based representation in which genes encode station assignments directly. For each encoding, standard GA operators (crossover, mutation, selection, and replacement) are adapted to preserve feasibility and drive the population toward lower-cost schedules. Experimental results on three classes of precedence structures-tightly coupled, loosely coupled, and uncoupled-demonstrate that the task-based encoding yields smoother convergence and more reliable cost minimization than the station-based encoding, particularly when the number of valid schedules is large. The study highlights the advantages of GA over gradient-based and analytical methods for combinatorial scheduling problems, especially in the presence of complex constraints and non-differentiable cost landscapes.
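下面是基于任务编码思路的一个最小GA草图(Python):每个基因直接表示任务被分配到的工作站,适应度函数对违反优先约束和单站时长上限的方案施加罚项;任务时长、成本模型与罚项系数均为示意性假设,并非论文的原始设置。

import random

durations = [4, 3, 5, 2, 6, 3]            # 各任务加工时长
unit_cost = [2, 1, 3, 2, 1, 2]            # 各任务单位成本
prereqs   = {2: [0], 3: [1], 4: [2, 3], 5: [4]}   # 任务 -> 其先决任务
BOUND, N_STATIONS, POP, GENS = 8, 6, 60, 200      # 站时长上限、可用站数、种群、代数

def fitness(chrom):
    # 假设各工作站按索引顺序加工;成本模型与罚项系数仅为示意
    load = [0] * N_STATIONS
    cost = 0.0
    for t, s in enumerate(chrom):
        load[s] += durations[t]
        cost += unit_cost[t] * durations[t] * (s + 1)
    penalty = sum(max(0, l - BOUND) for l in load) * 100          # 超出站时长上限
    for t, pres in prereqs.items():
        penalty += sum(50 for p in pres if chrom[p] > chrom[t])   # 先决任务排在更晚的站
    return cost + penalty

def mutate(chrom, p=0.2):
    return [random.randrange(N_STATIONS) if random.random() < p else g for g in chrom]

pop = [[random.randrange(N_STATIONS) for _ in durations] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness)                 # 适应度越小越好
    elite = pop[: POP // 4]
    children = []
    while len(elite) + len(children) < POP:
        a, b = random.sample(elite, 2)
        cut = random.randrange(1, len(durations))
        children.append(mutate(a[:cut] + b[cut:]))    # 单点交叉 + 变异
    pop = elite + children

best = min(pop, key=fitness)
print("best schedule:", best, "cost:", fitness(best))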


【2】Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability
标题:可解释性引导的双目标优化:调整准确性和可解释性
链接:https://arxiv.org/abs/2601.00655

作者:Kasra Fouladi,Hamta Rahmani
备注:10 pages
摘要:本文介绍了可解释性引导的双目标优化(IGBO),这是一个通过双目标形式化引入结构化领域知识、训练可解释模型的框架。IGBO将特征重要性层次结构编码为有向无环图(DAG),并使用时间积分梯度(TIG)来度量特征重要性。为了解决TIG计算中的分布外(OOD)问题,我们提出了一个学习数据流形感知积分路径的最优路径Oracle。理论分析证明了收敛特性和对小批量噪声的鲁棒性,而时间序列数据上的实证结果表明,IGBO能在精度损失极小的情况下有效施加DAG约束,优于标准正则化基线。
摘要:This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective formulation. IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG) and uses Temporal Integrated Gradients (TIG) to measure feature importance. To address the Out-of-Distribution (OOD) problem in TIG computation, we propose an Optimal Path Oracle that learns data-manifold-aware integration paths. Theoretical analysis proves convergence properties and robustness to mini-batch noise, while empirical results on time-series data demonstrate IGBO's effectiveness in enforcing DAG constraints with minimal accuracy loss, outperforming standard regularization baselines.
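作为参考,下面给出标准积分梯度(沿直线路径)的一个最小PyTorch草图;论文中的时间积分梯度及流形感知积分路径是在此基础上的扩展,此处仅为示意,并非论文实现。

import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """对单个样本 x 计算直线路径上的积分梯度归因。"""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *[1] * x.dim())
    path = baseline + alphas * (x - baseline)          # 从基线到 x 的直线路径
    path.requires_grad_(True)
    out = model(path).sum()
    grads = torch.autograd.grad(out, path)[0]
    return (x - baseline) * grads.mean(dim=0)          # 平均梯度 × 输入差值

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
x = torch.randn(4)
print(integrated_gradients(model, x))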


【3】Cloud-Native Generative AI for Automated Planogram Synthesis: A Diffusion Model Approach for Multi-Store Retail Optimization
标题:用于自动规划图合成的云原生生成人工智能:多商店零售优化的扩散模型方法
链接:https://arxiv.org/abs/2601.00527

作者:Ravi Teja Pagidoju,Shriya Agarwal
备注:International Conference on Software Engineering and Data Engineering : Springer Nature
摘要:平面图的创建对零售业来说是一个巨大的挑战,每个复杂的布局平均需要30个小时。本文介绍了一种云原生架构,使用扩散模型自动生成商店特定的货架图。与传统的优化方法,重新组织现有的布局,我们的系统学习跨多个零售位置的成功货架布置,以创建新的货架图配置。该架构通过AWS将基于云的模型训练与边缘部署相结合,以实现实时推理。扩散模型通过修改后的损失函数集成了零售特定的约束。基于仿真的分析表明,该系统将货架图设计时间缩短了98.3%(从30小时缩短到0.5小时),同时实现了94.4%的约束满足率。经济分析显示,在4.4个月的盈亏平衡期内,创作费用减少了97.5%。云原生架构可线性扩展,支持多达10,000个并发存储请求。这项工作证明了生成式AI在自动零售空间优化方面的可行性。
摘要:Planogram creation is a significant challenge for retail, requiring an average of 30 hours per complex layout. This paper introduces a cloud-native architecture using diffusion models to automatically generate store-specific planograms. Unlike conventional optimization methods that reorganize existing layouts, our system learns from successful shelf arrangements across multiple retail locations to create new planogram configurations. The architecture combines cloud-based model training via AWS with edge deployment for real-time inference. The diffusion model integrates retail-specific constraints through a modified loss function. Simulation-based analysis demonstrates the system reduces planogram design time by 98.3% (from 30 to 0.5 hours) while achieving 94.4% constraint satisfaction. Economic analysis reveals a 97.5% reduction in creation expenses with a 4.4-month break-even period. The cloud-native architecture scales linearly, supporting up to 10,000 concurrent store requests. This work demonstrates the viability of generative AI for automated retail space optimization.


【4】It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
标题:永远不会太晚:经过训练的扩散模型中崩溃恢复的噪音优化
链接:https://arxiv.org/abs/2601.00090

作者:Anne Harrington,A. Sophia Koepke,Shyamgopal Karthik,Trevor Darrell,Alexei A. Efros
摘要:当代的文本到图像模型表现出令人惊讶的模式崩溃程度,在给定相同文本提示采样多张图像时即可观察到。以前的工作尝试通过引导机制来操控模型,或者生成大量候选图像再加以筛选优化来解决这个问题;而在这项工作中,我们采取了不同的方向,通过噪声优化来实现生成结果的多样性。具体来说,我们表明一个简单的噪声优化目标可以减轻模式崩溃,同时保持基础模型的保真度。我们还分析了噪声的频率特性,并表明采用不同频率分布的替代噪声初始化可以同时改进优化和搜索。实验表明,噪声优化在生成质量和多样性方面都取得了更优的结果。
摘要:Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.


【5】Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery
标题:用于偏微方程发现中指令调整的动态Bayesian优化框架
链接:https://arxiv.org/abs/2601.00088

作者:Junqi Qu,Yan Zhang,Shangqian Gao,Shibo Li
摘要:大型语言模型(LLM)在方程发现上展现出潜力,但其输出对提示措辞高度敏感,我们将这种现象称为指令脆性。静态提示无法适应多步生成过程中不断演变的状态,导致模型停滞在次优解。为了解决这个问题,我们提出了NeuroSymBO,将提示工程重新表述为一个顺序决策问题。该方法维护一个离散的推理策略库,并基于数值反馈使用贝叶斯优化在每一步选择最优指令。在PDE发现基准上的实验表明,自适应指令选择显著优于固定提示,以更简约的解实现了更高的恢复率。
摘要:Large Language Models (LLMs) show promise for equation discovery, yet their outputs are highly sensitive to prompt phrasing, a phenomenon we term instruction brittleness. Static prompts cannot adapt to the evolving state of a multi-step generation process, causing models to plateau at suboptimal solutions. To address this, we propose NeuroSymBO, which reframes prompt engineering as a sequential decision problem. Our method maintains a discrete library of reasoning strategies and uses Bayesian Optimization to select the optimal instruction at each step based on numerical feedback. Experiments on PDE discovery benchmarks show that adaptive instruction selection significantly outperforms fixed prompts, achieving higher recovery rates with more parsimonious solutions.


预测|估计(4篇)

【1】Cycling Race Time Prediction: A Personalized Machine Learning Approach Using Route Topology and Training Load
标题:自行车比赛时间预测:一种使用路线布局和训练负荷的个性化机器学习方法
链接:https://arxiv.org/abs/2601.00604

作者:Francisco Aguilera Moreno
备注:14 pages, 22 figures
摘要:预测给定路线的骑行时间对于训练计划和赛事准备至关重要。现有的解决方案依赖于基于物理的模型,需要大量参数化,包括空气动力学阻力系数和实时风力预报,这些参数对大多数业余自行车运动员来说并不现实。这项工作提出了一种机器学习方法,利用路线拓扑特征并结合由训练负荷指标得到的运动员当前体能状态来预测骑行时间。该模型从历史数据中学习运动员特有的表现模式,用历史表现代理量代替复杂的物理测量。我们在单被试(N-of-1)研究设计下,使用单一运动员数据集(N=96次骑行)对该方法进行评估。经过严格的特征工程以消除数据泄漏后,我们发现使用拓扑+体能特征的Lasso回归达到MAE=6.60分钟、R2=0.922。值得注意的是,与仅使用拓扑特征(MAE=7.66分钟)相比,引入体能指标(CTL、ATL)将误差降低了14%,表明即使在自定节奏的骑行中,生理状态也对表现有实质性约束。渐进式检查点预测可以在路线难度逐渐显现时支持动态的比赛规划。
摘要:Predicting cycling duration for a given route is essential for training planning and event preparation. Existing solutions rely on physics-based models that require extensive parameterization, including aerodynamic drag coefficients and real-time wind forecasts, parameters impractical for most amateur cyclists. This work presents a machine learning approach that predicts ride duration using route topology features combined with the athlete's current fitness state derived from training load metrics. The model learns athlete-specific performance patterns from historical data, substituting complex physical measurements with historical performance proxies. We evaluate the approach using a single-athlete dataset (N=96 rides) in an N-of-1 study design. After rigorous feature engineering to eliminate data leakage, we find that Lasso regression with Topology + Fitness features achieves MAE=6.60 minutes and R2=0.922. Notably, integrating fitness metrics (CTL, ATL) reduces error by 14% compared to topology alone (MAE=7.66 min), demonstrating that physiological state meaningfully constrains performance even in self-paced efforts. Progressive checkpoint predictions enable dynamic race planning as route difficulty becomes apparent.
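下面是一个示意性的Python草图,展示"拓扑+体能特征 + Lasso回归"这一建模流程;特征名称与合成数据均为假设,仅用于说明流程,并非论文数据或结果。

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 96
X = np.column_stack([
    rng.uniform(20, 120, n),     # distance_km(路线拓扑)
    rng.uniform(100, 2500, n),   # elevation_gain_m(路线拓扑)
    rng.uniform(40, 110, n),     # CTL,慢性训练负荷(体能状态)
    rng.uniform(30, 130, n),     # ATL,急性训练负荷(体能状态)
])
# 合成的骑行时长(分钟):主要由距离与爬升决定,并受体能调节,仅为玩具数据
y = 2.2 * X[:, 0] + 0.05 * X[:, 1] - 0.4 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 6, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X_tr, y_tr)
print(f"MAE = {mean_absolute_error(y_te, model.predict(X_te)):.2f} min")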


【2】Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting: A Model Compression Study
标题:优化LSTM神经网络以实现资源限制零售销售预测:模型压缩研究
链接:https://arxiv.org/abs/2601.00525

作者:Ravi Teja Pagidoju
备注:Accepted to IEEE ICUIS 2025 (International Conference on Ubiquitous and Intelligent Systems). 5 pages, 3 figures, 1 table
摘要:标准的LSTM(长短期记忆)神经网络为零售行业的销售数据提供准确的预测,但需要大量的计算能力。这可能是具有挑战性的,特别是对于中小型零售行业。本文通过将隐藏单元的数量从128逐渐减少到16来研究LSTM模型压缩。我们使用Kaggle Store Item Demand Forecasting数据集,其中包含来自10家商店和50种商品的913,000条每日销售记录,以查看模型大小和预测准确性之间的权衡。实验表明,将隐藏LSTM单元的数量降低到64,保持了相同的精度水平,同时也提高了精度。平均绝对百分比误差(MAPE)范围从完整的128单元模型的23.6%到64单元模型的12.4%。优化后的模型体积小了73%(从280KB到76KB),准确率提高了47%。这些结果表明,较大的模型并不总是能获得更好的结果。
摘要:Standard LSTM(Long Short-Term Memory) neural networks provide accurate predictions for sales data in the retail industry, but require a lot of computing power. It can be challenging especially for mid to small retail industries. This paper examines LSTM model compression by gradually reducing the number of hidden units from 128 to 16. We used the Kaggle Store Item Demand Forecasting dataset, which has 913,000 daily sales records from 10 stores and 50 items, to look at the trade-off between model size and how accurate the predictions are. Experiments show that lowering the number of hidden LSTM units to 64 maintains the same level of accuracy while also improving it. The mean absolute percentage error (MAPE) ranges from 23.6% for the full 128-unit model to 12.4% for the 64-unit model. The optimized model is 73% smaller (from 280KB to 76KB) and 47% more accurate. These results show that larger models do not always achieve better results.
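下面的PyTorch草图展示了隐藏单元数从128降到64、16时模型参数量(及近似fp32存储体积)的变化;网络结构为示意性假设,未必与论文所用结构一致。

import torch
import torch.nn as nn

class SalesLSTM(nn.Module):
    def __init__(self, hidden, n_features=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # 用最后时刻的隐状态预测下一步销量

for hidden in (128, 64, 16):
    model = SalesLSTM(hidden)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"hidden={hidden:3d}  params={n_params:,}  ~{n_params * 4 / 1024:.0f} KB (fp32)")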


【3】Early Prediction of Liver Cirrhosis Up to Three Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 Score
标题:提前三年早期预测肝硬化:一项以FIB-4评分为基准的机器学习研究
链接:https://arxiv.org/abs/2601.00175

作者:Zhuqi Miao,Sujan Ravi,Abdulaziz Ahmed
摘要:目的:开发和评估机器学习(ML)模型,用于使用常规收集的电子健康记录(EHR)数据在诊断前一年、两年和三年预测肝硬化事件,并将其性能与FIB-4评分进行基准比较。方法:我们使用来自大型学术卫生系统的去标识化EHR数据进行了一项回顾性队列研究。根据ICD-9/10编码,识别脂肪肝患者并将其分为肝硬化和非肝硬化队列。使用观察窗和预测窗构建预测场景,以模拟真实世界的临床使用。从观察窗汇总人口统计学、诊断、实验室结果、生命体征和合并症指数。XGBoost模型针对1年、2年和3年的预测范围进行训练,并在留出测试集上评估。使用受试者工作特征曲线下面积(AUC)将模型性能与FIB-4进行比较。结果:最终队列包括3,043例1年预测患者、1,981例2年预测患者和1,470例3年预测患者。在所有预测窗口中,ML模型的表现始终优于FIB-4。XGBoost模型在1年、2年和3年预测中的AUC分别为0.81、0.73和0.69,而FIB-4的AUC分别为0.71、0.63和0.57。在更长的预测范围内性能优势依然保持,表明早期风险判别能力有所改善。结论:利用常规EHR数据的机器学习模型在肝硬化的早期预测方面大大优于传统的FIB-4评分。这些模型可以更早、更准确地进行风险分层,并可以作为自动化决策支持工具集成到临床工作流程中,以支持积极的肝硬化预防和管理。
摘要:Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis one, two, and three years prior to diagnosis using routinely collected electronic health record (EHR) data, and to benchmark their performance against the FIB-4 score. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. Patients with fatty liver disease were identified and categorized into cirrhosis and non-cirrhosis cohorts based on ICD-9/10 codes. Prediction scenarios were constructed using observation and prediction windows to emulate real-world clinical use. Demographics, diagnoses, laboratory results, vital signs, and comorbidity indices were aggregated from the observation window. XGBoost models were trained for 1-, 2-, and 3-year prediction horizons and evaluated on held-out test sets. Model performance was compared with FIB-4 using area under the receiver operating characteristic curve (AUC). Results: Final cohorts included 3,043 patients for the 1-year prediction, 1,981 for the 2-year prediction, and 1,470 for the 3-year prediction. Across all prediction windows, ML models consistently outperformed FIB-4. The XGBoost models achieved AUCs of 0.81, 0.73, and 0.69 for 1-, 2-, and 3-year predictions, respectively, compared with 0.71, 0.63, and 0.57 for FIB-4. Performance gains persisted with longer prediction horizons, indicating improved early risk discrimination. Conclusions: Machine learning models leveraging routine EHR data substantially outperform the traditional FIB-4 score for early prediction of liver cirrhosis. These models enable earlier and more accurate risk stratification and can be integrated into clinical workflows as automated decision-support tools to support proactive cirrhosis prevention and management.
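下面的Python草图给出FIB-4评分的标准计算公式(年龄×AST / (血小板×√ALT)),并在合成玩具数据上与一个梯度提升分类器比较AUC;此处用sklearn的GradientBoostingClassifier代替论文中的XGBoost,数据与数值均为示意,并非论文结果。

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT))"""
    return (age_years * ast_u_l) / (platelets_10e9_l * np.sqrt(alt_u_l))

rng = np.random.default_rng(0)
n = 2000
y = rng.binomial(1, 0.15, n)                        # 1 = 发生肝硬化(玩具标签)
age = rng.uniform(30, 75, n) + 5 * y
ast = rng.lognormal(3.4, 0.4, n) * (1 + 0.3 * y)
alt = rng.lognormal(3.4, 0.4, n)
plt_ = np.clip(rng.normal(250, 60, n) - 60 * y, 50, None)   # 患病者血小板偏低

score = fib4(age, ast, alt, plt_)
X = np.column_stack([age, ast, alt, plt_])
clf = GradientBoostingClassifier(random_state=0).fit(X[: n // 2], y[: n // 2])

print("FIB-4 AUC:", round(roc_auc_score(y[n // 2:], score[n // 2:]), 3))
print("ML    AUC:", round(roc_auc_score(y[n // 2:], clf.predict_proba(X[n // 2:])[:, 1]), 3))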


【4】Sequential Reservoir Computing for Efficient High-Dimensional Spatiotemporal Forecasting
标题:用于高效多维时空预测的序列储层计算
链接:https://arxiv.org/abs/2601.00172

作者:Ata Akbari Asanjan,Filip Wudarski,Daniel O'Connor,Shaun Geaney,Elena Strbac,P. Aaron Lott,Davide Venturelli
摘要:由于基于梯度的训练和内存瓶颈,预测高维时空系统对于递归神经网络(RNN)和长短期记忆(LSTM)模型仍然具有计算挑战性。储层计算(RC)通过用固定的递归层和凸的读出优化代替反向传播来缓解这些挑战,但传统的RC架构随输入维度的扩展性仍然很差。我们提出了一种顺序储层计算(Sequential RC)架构,将一个大型储层分解为一系列较小的、相互连接的储层。这种设计降低了内存和计算成本,同时保持了长期的时间依赖性。在低维混沌系统(Lorenz 63)和高维物理模拟(2D涡量和浅水方程)上,与LSTM和标准RNN基线相比,顺序RC的有效预测范围延长了15-25%,误差指标(SSIM、RMSE)降低了20-30%,训练成本最多降低三个数量级。结果表明,顺序RC保持了传统RC的简单性和效率,同时在高维动力系统上实现了更优的可扩展性。这种方法为科学和工程应用中的实时、节能预测提供了一条实用的途径。
摘要:Forecasting high-dimensional spatiotemporal systems remains computationally challenging for recurrent neural networks (RNNs) and long short-term memory (LSTM) models due to gradient-based training and memory bottlenecks. Reservoir Computing (RC) mitigates these challenges by replacing backpropagation with fixed recurrent layers and a convex readout optimization, yet conventional RC architectures still scale poorly with input dimensionality. We introduce a Sequential Reservoir Computing (Sequential RC) architecture that decomposes a large reservoir into a series of smaller, interconnected reservoirs. This design reduces memory and computational costs while preserving long-term temporal dependencies. Using both low-dimensional chaotic systems (Lorenz63) and high-dimensional physical simulations (2D vorticity and shallow-water equations), Sequential RC achieves 15-25% longer valid forecast horizons, 20-30% lower error metrics (SSIM, RMSE), and up to three orders of magnitude lower training cost compared to LSTM and standard RNN baselines. The results demonstrate that Sequential RC maintains the simplicity and efficiency of conventional RC while achieving superior scalability for high-dimensional dynamical systems. This approach provides a practical path toward real-time, energy-efficient forecasting in scientific and engineering applications.
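下面是"把一个大储层拆成串联的小储层 + 岭回归读出"这一思路的最小NumPy草图;储层规模、谱半径与玩具信号均为示意性假设,并非论文配置。

import numpy as np
from numpy.linalg import eigvals

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, rho=0.9):
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.abs(eigvals(W)).max()        # 重新缩放谱半径
    return W_in, W

def run(W_in, W, inputs):
    states = np.zeros((len(inputs), W.shape[0]))
    x = np.zeros(W.shape[0])
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states[t] = x
    return states

T = 2000
u = np.sin(0.2 * np.arange(T)) + 0.1 * rng.normal(size=T)   # 玩具驱动信号
target = np.roll(u, -5)                                      # 预测5步之后的值

W_in1, W1 = make_reservoir(1, 100)
W_in2, W2 = make_reservoir(100, 100)
s1 = run(W_in1, W1, u)          # 第一个小储层由原始输入驱动
s2 = run(W_in2, W2, s1)         # 第二个小储层由前一个储层的状态驱动
features = np.hstack([s1, s2])

# 闭式岭回归读出,只训练读出权重
lam, ntr = 1e-3, 1500
A = features[:ntr]
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ target[:ntr])
pred = features[ntr:-5] @ w
print("test RMSE:", np.sqrt(np.mean((pred - target[ntr:-5]) ** 2)))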


其他神经网络|深度学习|模型|建模(17篇)

【1】A Machine Learning Framework for Off Ball Defensive Role and Performance Evaluation in Football
标题:足球场外防守角色和表现评估的机器学习框架
链接:https://arxiv.org/abs/2601.00748

作者:Sean Groom,Shuo Wang,Francisco Belo,Axl Rice,Liam Anderson
备注:40 pages, 16 figures
摘要:评估足球中的无球防守表现具有挑战性,因为传统指标无法捕捉限制对手动作选择和成功概率的细微协同跑动。虽然广泛使用的控球价值模型擅长评估持球动作,但它们在防守端的应用仍然有限。现有的反事实方法(如幻影模型)有助于扩展这类分析,但往往依赖于模拟缺乏战术背景的"平均"行为。为了解决这个问题,我们引入了一个针对角球(足球比赛中高度结构化的环节)量身定制的协变量依赖隐马尔可夫模型(CDHMM)。我们的无标签模型直接从球员跟踪数据中推断出随时间变化的人盯人与区域防守分配。我们利用这些分配结果,提出了一个新的防守贡献归因框架,以及一种用于无球防守表现反事实分析的角色条件幻影方法。我们展示了这些贡献如何在具备上下文感知的基线之上,对防守贡献给出可解释的评估。
摘要:Evaluating off-ball defensive performance in football is challenging, as traditional metrics do not capture the nuanced coordinated movements that limit opponent action selection and success probabilities. Although widely used possession value models excel at appraising on-ball actions, their application to defense remains limited. Existing counterfactual methods, such as ghosting models, help extend these analyses but often rely on simulating "average" behavior that lacks tactical context. To address this, we introduce a covariate-dependent Hidden Markov Model (CDHMM) tailored to corner kicks, a highly structured aspect of football games. Our label-free model infers time-resolved man-marking and zonal assignments directly from player tracking data. We leverage these assignments to propose a novel framework for defensive credit attribution and a role-conditioned ghosting method for counterfactual analysis of off-ball defensive performance. We show how these contributions provide an interpretable evaluation of defensive contributions against context-aware baselines.


【2】Sparse FEONet: A Low-Cost, Memory-Efficient Operator Network via Finite-Element Local Sparsity for Parametric PDEs
标题:Sparse FEONet:一种面向参数化PDE、利用有限元局部稀疏性的低成本、内存高效算子网络
链接:https://arxiv.org/abs/2601.00672

作者:Seungchan Ko,Jiyeon Kim,Dongwook Shin
摘要:本文研究有限元算子网络(FEONet),这是一种面向参数化问题的算子学习方法,最早由 J. Y. Lee、S. Ko 和 Y. Hong 在 "Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs"(SIAM J. Sci. Comput., 47(2), C501-C528, 2025)中提出。FEONet在有限元空间上实现参数到解的映射,其训练过程不需要训练数据,同时在广泛的问题类别中表现出高精度和鲁棒性。然而,随着单元数量的增加,其计算成本上升且精度可能下降,这对大规模问题构成显著挑战。为此,本文受有限元结构启发,提出了一种新的稀疏网络架构。大量数值实验表明,所提出的稀疏网络在计算成本和效率方面取得显著改进,同时保持了相当的精度。我们还建立了理论结果,证明稀疏架构可以有效逼近目标算子,并提供了确保训练和预测可靠的稳定性分析。
摘要 :In this paper, we study the finite element operator network (FEONet), an operator-learning method for parametric problems, originally introduced in J. Y. Lee, S. Ko, and Y. Hong, Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs, SIAM J. Sci. Comput., 47(2), C501-C528, 2025. FEONet realizes the parameter-to-solution map on a finite element space and admits a training procedure that does not require training data, while exhibiting high accuracy and robustness across a broad class of problems. However, its computational cost increases and accuracy may deteriorate as the number of elements grows, posing notable challenges for large-scale problems. In this paper, we propose a new sparse network architecture motivated by the structure of the finite elements to address this issue. Throughout extensive numerical experiments, we show that the proposed sparse network achieves substantial improvements in computational cost and efficiency while maintaining comparable accuracy. We also establish theoretical results demonstrating that the sparse architecture can approximate the target operator effectively and provide a stability analysis ensuring reliable training and prediction.


【3】Three factor delay learning rules for spiking neural networks
标题:尖峰神经网络的三因素延迟学习规则
链接:https://arxiv.org/abs/2601.00668

作者:Luke Vassallo,Nima Taherinejad
备注:7 pages, 5 figures
摘要:尖峰神经网络(SNN)是对时空数据进行操作的动态系统,但它们的可学习参数通常限于突触权重,对时间模式识别贡献不大。延迟尖峰时间的可学习参数可以提高时态任务的分类性能,但现有方法依赖于大型网络和离线学习,使其不适合资源受限环境中的实时操作。在本文中,我们引入突触和轴突延迟泄漏集成和消防(LIF)为基础的前馈和经常性SNN,并提出了三个因素的学习规则,同时学习延迟参数在线。我们采用了一个光滑的高斯代理近似尖峰导数专门用于资格跟踪计算,并与自上而下的误差信号一起确定参数更新。我们的实验表明,与仅使用权重的基线相比,引入延迟可将准确度提高20%,对于具有类似参数计数的网络,联合学习权重和延迟可将准确度提高14%。在SHD语音识别数据集上,我们的方法达到了与离线反向传播方法相似的精度。与最先进的方法相比,它将模型大小减少了6.6倍,推理延迟减少了67%,分类准确率仅下降了2.4%。我们的研究结果有利于设计功率和面积受限的神经形态处理器,使设备上的学习和降低内存要求。
摘要:Spiking Neural Networks (SNNs) are dynamical systems that operate on spatiotemporal data, yet their learnable parameters are often limited to synaptic weights, contributing little to temporal pattern recognition. Learnable parameters that delay spike times can improve classification performance in temporal tasks, but existing methods rely on large networks and offline learning, making them unsuitable for real-time operation in resource-constrained environments. In this paper, we introduce synaptic and axonal delays to leaky integrate and fire (LIF)-based feedforward and recurrent SNNs, and propose three-factor learning rules to simultaneously learn delay parameters online. We employ a smooth Gaussian surrogate to approximate spike derivatives exclusively for the eligibility trace calculation, and together with a top-down error signal determine parameter updates. Our experiments show that incorporating delays improves accuracy by up to 20% over a weights-only baseline, and for networks with similar parameter counts, jointly learning weights and delays yields up to 14% higher accuracy. On the SHD speech recognition dataset, our method achieves similar accuracy to offline backpropagation-based approaches. Compared to state-of-the-art methods, it reduces model size by 6.6x and inference latency by 67%, with only a 2.4% drop in classification accuracy. Our findings benefit the design of power and area-constrained neuromorphic processors by enabling on-device learning and lowering memory requirements.


【4】HyperPriv-EPN: Hypergraph Learning with Privileged Knowledge for Ependymoma Prognosis
标题:HyperPriv-EPN:利用特权知识的超图学习用于室管膜瘤预后预测
链接:https://arxiv.org/abs/2601.00626

作者:Shuren Gabriel Yu,Sikang Ren,Yongji Tian
备注:6 pages, 2 figures, 2 tables
摘要:室管膜瘤的术前预后评估对治疗计划至关重要,但与术后手术报告相比,MRI缺乏语义层面的信息,使其颇具挑战。现有的多模态方法无法利用这类在推理阶段不可用的特权文本数据。为弥合这一差距,我们提出了HyperPriv-EPN,一个基于超图的"利用特权信息学习"(LUPI)框架。我们引入了切断图策略(Severed Graph Strategy),利用共享编码器分别处理教师图(富含特权的术后信息)和学生图(仅限术前数据)。通过双流蒸馏,学生模型学会仅凭视觉特征"幻化"出语义社区结构。在311例患者的多中心队列上验证,HyperPriv-EPN实现了最先进的诊断准确性和生存分层。这有效地将专家知识迁移到术前场景,释放了历史术后数据的价值,用于指导新患者的诊断,且推理时不需要文本。
摘要:Preoperative prognosis of Ependymoma is critical for treatment planning but challenging due to the lack of semantic insights in MRI compared to post-operative surgical reports. Existing multimodal methods fail to leverage this privileged text data when it is unavailable during inference. To bridge this gap, we propose HyperPriv-EPN, a hypergraph-based Learning Using Privileged Information (LUPI) framework. We introduce a Severed Graph Strategy, utilizing a shared encoder to process both a Teacher graph (enriched with privileged post-surgery information) and a Student graph (restricted to pre-operation data). Through dual-stream distillation, the Student learns to hallucinate semantic community structures from visual features alone. Validated on a multi-center cohort of 311 patients, HyperPriv-EPN achieves state-of-the-art diagnostic accuracy and survival stratification. This effectively transfers expert knowledge to the preoperative setting, unlocking the value of historical post-operative data to guide the diagnosis of new patients without requiring text at inference.


【5】Learning to be Reproducible: Custom Loss Design for Robust Neural Networks
标题:学习可重复性:鲁棒神经网络的定制损失设计
链接:https://arxiv.org/abs/2601.00578

作者:Waqas Ahmed,Sheeba Samuel,Kevin Coakley,Birgitta Koenig-Ries,Odd Erik Gundersen
摘要:为了提高深度学习模型的可重复性和可靠性,我们解决了当前训练方法中的一个关键差距:缺乏确保跨运行一致和强大性能的机制。我们的实证分析表明,即使在受控的初始化和训练条件下,模型的准确性也会表现出显着的可变性。为了解决这个问题,我们提出了一个自定义损失函数(CLF),它降低了训练结果对随机因素(如权重初始化和数据洗牌)的敏感性。通过微调其参数,CLF显式地平衡了预测准确性与训练稳定性,从而使模型性能更加一致和可靠。针对图像分类和时间序列预测的不同架构的广泛实验表明,我们的方法在不牺牲预测性能的情况下显着提高了训练鲁棒性。这些结果使CLF成为开发更稳定,可靠和值得信赖的神经网络的有效和高效的策略。
摘要:To enhance the reproducibility and reliability of deep learning models, we address a critical gap in current training methodologies: the lack of mechanisms that ensure consistent and robust performance across runs. Our empirical analysis reveals that even under controlled initialization and training conditions, the accuracy of the model can exhibit significant variability. To address this issue, we propose a Custom Loss Function (CLF) that reduces the sensitivity of training outcomes to stochastic factors such as weight initialization and data shuffling. By fine-tuning its parameters, CLF explicitly balances predictive accuracy with training stability, leading to more consistent and reliable model performance. Extensive experiments across diverse architectures for both image classification and time series forecasting demonstrate that our approach significantly improves training robustness without sacrificing predictive performance. These results establish CLF as an effective and efficient strategy for developing more stable, reliable and trustworthy neural networks.


【6】Entropy Production in Machine Learning Under Fokker-Planck Probability Flow
标题:福克-普朗克概率流下机器学习中的熵产生
链接:https://arxiv.org/abs/2601.00554

作者:Lennon Shikhman
备注:10 pages, 3 figures. Submitted for journal review
摘要:部署在非平稳环境中的机器学习模型会因数据漂移而导致性能下降。虽然存在许多漂移检测算法,但大多数都缺乏原则性的动态解释,并且对如何平衡再训练频率与运营成本提供了有限的指导。在这项工作中,我们提出了一个基于熵的再培训框架接地非平衡随机动力学。建模部署-时间数据漂移的概率流由福克-普朗克方程,我们量化模型-数据不匹配使用时间-不断变化的Kullback-Leibler分歧。我们发现,这种不匹配的时间导数承认一个熵-平衡分解具有非负熵产生项驱动的概率流。这种解释促使熵触发再培训作为一种无标签的干预策略,对累积的不匹配而不是延迟的性能崩溃作出反应。在一个受控的非平稳分类实验中,熵触发的再训练实现了与高频再训练相当的预测性能,同时减少了相对于日常和基于标签的政策的再训练事件的数量级。
摘要:Machine learning models deployed in nonstationary environments experience performance degradation due to data drift. While many drift detection heuristics exist, most lack a principled dynamical interpretation and provide limited guidance on how retraining frequency should be balanced against operational cost. In this work, we propose an entropy-based retraining framework grounded in nonequilibrium stochastic dynamics. Modeling deployment-time data drift as probability flow governed by a Fokker-Planck equation, we quantify model-data mismatch using a time-evolving Kullback-Leibler divergence. We show that the time derivative of this mismatch admits an entropy-balance decomposition featuring a nonnegative entropy production term driven by probability currents. This interpretation motivates entropy-triggered retraining as a label-free intervention strategy that responds to accumulated mismatch rather than delayed performance collapse. In a controlled nonstationary classification experiment, entropy-triggered retraining achieves predictive performance comparable to high-frequency retraining while reducing retraining events by an order of magnitude relative to daily and label-based policies.
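下面的Python草图演示"基于分布失配触发再训练"的一般思路:用直方图估计当前批次与参考分布之间的KL散度,超过阈值即触发再训练;漂移模型与阈值均为示意性假设,论文采用的是基于福克-普朗克熵产生的更严格形式。

import numpy as np

rng = np.random.default_rng(0)
bins = np.linspace(-6, 6, 41)
ref = rng.normal(0, 1, 10000)                       # 参考(训练时)特征分布
p_ref, _ = np.histogram(ref, bins=bins, density=True)
p_ref = np.clip(p_ref, 1e-9, None)

def kl_to_reference(batch):
    """直方图近似的 KL(当前 || 参考)。"""
    q, _ = np.histogram(batch, bins=bins, density=True)
    q = np.clip(q, 1e-9, None)
    w = np.diff(bins)
    return float(np.sum(w * q * np.log(q / p_ref)))

THRESHOLD = 0.05
for day in range(30):
    batch = rng.normal(0.05 * day, 1.0, 2000)        # 缓慢的均值漂移(示意)
    kl = kl_to_reference(batch)
    if kl > THRESHOLD:
        print(f"day {day}: KL={kl:.3f} -> 触发再训练并重置参考分布")
        p_ref, _ = np.histogram(batch, bins=bins, density=True)
        p_ref = np.clip(p_ref, 1e-9, None)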


【7】Federated Customization of Large Models: Approaches, Experiments, and Insights
标题:大型模型的联合定制:方法,实验和见解
链接:https://arxiv.org/abs/2601.00526

作者:Yuchuan Ye,Ming Ding,Youjia Chen,Peng Cheng,Dusit Niyato
备注:8 pages, 1 figure
摘要:在本文中,我们探讨了大型模型的联邦定制,并强调了它在联邦学习框架中带来的关键挑战。我们回顾了几种流行的大型模型定制技术,包括完全微调、高效微调、提示工程、前缀调优、知识蒸馏和检索增强生成。然后,我们讨论了如何在联邦学习框架内实现这些技术。此外,我们进行了联邦前缀调优的实验,据我们所知,这是首次尝试在联邦学习环境中应用前缀调优。实验验证了该方法的可行性,其性能接近集中式方法。与其他三种联邦定制方法的进一步比较表明,该方法具有有竞争力的性能、令人满意的效率和一致的鲁棒性。
摘要:In this article, we explore federated customization of large models and highlight the key challenges it poses within the federated learning framework. We review several popular large model customization techniques, including full fine-tuning, efficient fine-tuning, prompt engineering, prefix-tuning, knowledge distillation, and retrieval-augmented generation. Then, we discuss how these techniques can be implemented within the federated learning framework. Moreover, we conduct experiments on federated prefix-tuning, which, to the best of our knowledge, is the first trial to apply prefix-tuning in the federated learning setting. The conducted experiments validate its feasibility with performance close to centralized approaches. Further comparison with three other federated customization methods demonstrated its competitive performance, satisfactory efficiency, and consistent robustness.


【8】When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents
标题:当小模型因错误原因而正确时:面向可信智能体的过程验证
链接:https://arxiv.org/abs/2601.00513

作者:Laksh Advani
备注:Accepted to Trustagent workshop AAAI 2026
摘要:将小语言模型(7-9B参数)部署为自主智能体,需要信任它们的推理过程,而不仅仅是输出。我们揭示了一个关键的可靠性危机:这些模型50-69%的正确答案包含根本性缺陷的推理,这种"因错误原因而正确"的现象在标准准确率指标下不可见。通过分析三个模型和多种任务上的10,734条推理轨迹,我们提出了推理完整性评分(RIS),这是一种基于过程的度量,并经充分的评分者间一致性(κ=0.657)验证。我们的发现挑战了一些惯常做法:虽然检索增强生成(RAG)显著提高推理完整性(Cohen's d=0.23-0.93),但在所评估的任务上,自我批评等元认知干预往往会损害小模型的性能(d=-0.14到-0.33)。机制分析表明,RAG的成功在于将计算建立在外部证据之上,使错误减少7.6%;而在模型能力不足时,元认知只会放大混乱。为便于部署,我们将验证能力蒸馏到一个神经分类器中,达到0.86的F1分数并实现100倍加速。这些结果强调了对可信智能体进行基于过程的验证的必要性:当模型可能因完全错误的原因而给出正确答案时,仅看准确率是危险且不够的。
摘要:Deploying small language models (7-9B parameters) as autonomous agents requires trust in their reasoning, not just their outputs. We reveal a critical reliability crisis: 50-69% of correct answers from these models contain fundamentally flawed reasoning, a "Right-for-Wrong-Reasons" phenomenon invisible to standard accuracy metrics. Through analysis of 10,734 reasoning traces across three models and diverse tasks, we introduce the Reasoning Integrity Score (RIS), a process-based metric validated with substantial inter-rater agreement ($κ=0.657$). Conventional practices are challenged by our findings: while retrieval-augmented generation (RAG) significantly improves reasoning integrity (Cohen's $d=0.23$-$0.93$), meta-cognitive interventions like self-critique often harm performance ($d=-0.14$ to $-0.33$) in small models on the evaluated tasks. Mechanistic analysis reveals RAG succeeds by grounding calculations in external evidence, reducing errors by 7.6%, while meta-cognition amplifies confusion without sufficient model capacity. To enable deployment, verification capabilities are distilled into a neural classifier achieving 0.86 F1-score with 100$\times$ speedup. These results underscore the necessity of process-based verification for trustworthy agents: accuracy alone is dangerously insufficient when models can be right for entirely wrong reasons.


【9】Deep Networks Learn Deep Hierarchical Models
标题:深度网络学习深度分层模型
链接:https://arxiv.org/abs/2601.00455

作者:Amit Daniely
摘要:我们认为监督学习与$n$标签,并表明分层SGD残差网络可以有效地学习一类层次模型。这个模型类假设存在一个(未知的)标签层次结构$L_1 \subseteq L_2 \subseteq \dots \subseteq L_r = [n]$,其中$L_1$中的标签是输入的简单函数,而对于$i > 1$,$L_i$中的标签是更简单标签的简单函数。   我们的类超越了以前被证明可通过深度学习算法学习的模型,从某种意义上说,它达到了有效学习的深度极限。也就是说,这类模型需要多项式深度来表达,而以前的模型可以通过对数深度电路来计算。   此外,我们认为这种分层模型的可学习性最终可能成为理解深度学习的基础。除了它们自然适合深度学习擅长的领域之外,我们认为人类“教师”的存在支持了分层结构天生可用的假设。通过提供颗粒标签,教师有效地揭示了大脑使用的内部算法的“提示”或“片段”。我们形式化这种直觉,表明在一个简化的模型中,教师部分意识到他们的内部逻辑,出现了一个层次结构,促进有效的学习。
摘要:We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence of an (unknown) label hierarchy $L_1 \subseteq L_2 \subseteq \dots \subseteq L_r = [n]$, where labels in $L_1$ are simple functions of the input, while for $i > 1$, labels in $L_i$ are simple functions of simpler labels.   Our class surpasses models that were previously shown to be learnable by deep learning algorithms, in the sense that it reaches the depth limit of efficient learnability. That is, there are models in this class that require polynomial depth to express, whereas previous models can be computed by log-depth circuits.   Furthermore, we suggest that learnability of such hierarchical models might eventually form a basis for understanding deep learning. Beyond their natural fit for domains where deep learning excels, we argue that the mere existence of human "teachers" supports the hypothesis that hierarchical structures are inherently available. By providing granular labels, teachers effectively reveal "hints" or "snippets" of the internal algorithms used by the brain. We formalize this intuition, showing that in a simplified model where a teacher is partially aware of their internal logic, a hierarchical structure emerges that facilitates efficient learnability.


【10】Controllable Concept Bottleneck Models
标题:可控概念瓶颈模型
链接:https://arxiv.org/abs/2601.00451

作者:Hongbin Lin,Chenyang Ren,Juangui Xu,Zhengyu Hu,Cheng-Long Wang,Yao Shu,Hui Xiong,Jingfeng Zhang,Di Wang,Lijie Hu
备注:arXiv admin note: substantial text overlap with arXiv:2405.15476
摘要:概念瓶颈模型(CBMs)因其能够通过人类可理解的概念层阐明预测过程而备受关注。然而,大多数以前的研究集中在静态场景中,假定数据和概念是固定且干净的。在实际应用中,部署的模型需要持续维护:我们经常需要删除错误或敏感的数据(遗忘学习)、纠正错误标注的概念,或纳入新获取的样本(增量学习)以适应不断变化的环境。因此,在不从头重新训练的情况下得到高效、可编辑的CBM仍然是一个重大挑战,特别是在大规模应用中。为应对这些挑战,我们提出了可控概念瓶颈模型(CCBMs)。具体来说,CCBM支持三种粒度的模型编辑:概念标签级、概念级和数据级,其中数据级同时涵盖数据删除和数据添加。CCBM基于影响函数得到数学上严格的闭式近似,从而无需重新训练。实验结果表明了CCBM的效率和适应性,肯定了其在实现动态、可信CBM方面的实用价值。
摘要:Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on static scenarios where the data and concepts are assumed to be fixed and clean. In real-world applications, deployed models require continuous maintenance: we often need to remove erroneous or sensitive data (unlearning), correct mislabeled concepts, or incorporate newly acquired samples (incremental learning) to adapt to evolving environments. Thus, deriving efficient editable CBMs without retraining from scratch remains a significant challenge, particularly in large-scale applications. To address these challenges, we propose Controllable Concept Bottleneck Models (CCBMs). Specifically, CCBMs support three granularities of model editing: concept-label-level, concept-level, and data-level, the latter of which encompasses both data removal and data addition. CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining. Experimental results demonstrate the efficiency and adaptability of our CCBMs, affirming their practical value in enabling dynamic and trustworthy CBMs.


【11】Deep Delta Learning
标题:深度三角洲学习
链接:https://arxiv.org/abs/2601.00417

作者:Yifan Zhang,Yifeng Liu,Mengdi Wang,Quanquan Gu
备注:Project Page: https://github.com/yifanzhang-pro/deep-delta-learning
摘要:深度残差网络的有效性从根本上依赖于恒等捷径连接。虽然这种机制有效地缓解了梯度消失问题,但它对特征变换施加了严格的加性归纳偏差,从而限制了网络对复杂状态转移建模的能力。在本文中,我们介绍了深度Delta学习(DDL),这是一种新的架构,它通过可学习的、数据相关的几何变换来调制恒等捷径,从而推广标准残差连接。这种变换称为Delta算子,构成单位矩阵的秩1扰动,由反射方向向量$\mathbf{k}(\mathbf{X})$和门控标量$β(\mathbf{X})$参数化。我们对该算子进行了谱分析,证明门控$β(\mathbf{X})$能够在恒等映射、正交投影和几何反射之间动态插值。此外,我们将残差更新重构为一次同步的秩1注入,其中门控作为动态步长,同时控制旧信息的擦除与新特征的写入。这种统一使网络能够显式控制其逐层转移算子的谱,从而能够对复杂的非单调动力学进行建模,同时保留门控残差架构的稳定训练特性。
摘要:The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $β(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $β(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
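根据摘要的描述,下面给出对"门控秩1扰动恒等捷径"的一种可能理解的PyTorch草图:算子 (I - βkk^T) 在 β=0/1/2 时分别对应恒等、投影与反射,并沿 k 方向做秩1写入;具体层结构是摘要未给出细节情况下的假设性示意,并非论文的确切实现。

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeltaShortcut(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)      # 数据相关的反射方向 k(X)
        self.to_beta = nn.Linear(dim, 1)     # 数据相关的门控 beta(X)
        self.to_v = nn.Linear(dim, 1)        # 沿 k 方向写入的新信息(标量,属假设)

    def forward(self, x):                    # x: (batch, dim)
        k = F.normalize(self.to_k(x), dim=-1)
        beta = 2.0 * torch.sigmoid(self.to_beta(x))       # 门控取值在 (0, 2)
        proj = (x * k).sum(dim=-1, keepdim=True)          # k^T x
        shortcut = x - beta * proj * k                    # (I - beta k k^T) x
        return shortcut + beta * self.to_v(x) * k         # 同步的秩1写入(假设形式)

x = torch.randn(4, 64)
print(DeltaShortcut(64)(x).shape)   # torch.Size([4, 64])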


【12】Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing
标题:基于铁电突触的个性化Spiking神经网络脑电信号处理
链接:https://arxiv.org/abs/2601.00020

作者:Nikhil Garg,Anxiong Song,Niklas Plessnig,Nathan Savoia,Laura Bégon-Lours
摘要:基于脑电图(EEG)的脑机接口(BCI)受到跨会话和个体变化的非平稳神经信号的强烈影响,限制了主体不可知模型的泛化,并激励资源受限平台上的自适应和个性化学习。可编程忆阻硬件为这种部署后适应提供了有前途的衬底;然而,实际实现受到有限的重量分辨率、设备可变性、非线性编程动态和有限的设备耐久性的挑战。在这项工作中,我们表明,尖峰神经网络(SNN)可以部署在铁电忆阻突触设备的自适应EEG为基础的运动图像解码现实设备的限制。我们制造,表征和模型铁电突触。我们在两种互补的部署策略下评估卷积递归SNN架构:(i)使用铁电突触模型的设备感知训练,以及(ii)转移软件训练的权重,然后进行低开销的设备上重新调整。为了实现有效的适应,我们引入了一种设备感知的权重更新策略,其中基于梯度的更新以数字方式累积,并仅在超过阈值时转换为离散编程事件,从而在降低编程频率的同时模拟非线性、状态相关的编程动态。这两种部署策略都实现了与最先进的基于软件的SNN相当的分类性能。此外,通过仅重新训练最终网络层实现的特定于主题的迁移学习提高了分类精度。这些结果表明,可编程铁电硬件可以支持尖峰神经网络中的鲁棒性,低开销自适应,为神经信号的个性化神经形态处理开辟了一条实用的道路。
摘要:Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by limited weight resolution, device variability, nonlinear programming dynamics, and finite device endurance. In this work, we show that spiking neural networks (SNNs) can be deployed on ferroelectric memristive synaptic devices for adaptive EEG-based motor imagery decoding under realistic device constraints. We fabricate, characterize, and model ferroelectric synapses. We evaluate a convolutional-recurrent SNN architecture under two complementary deployment strategies: (i) device-aware training using a ferroelectric synapse model, and (ii) transfer of software-trained weights followed by low-overhead on-device re-tuning. To enable efficient adaptation, we introduce a device-aware weight-update strategy in which gradient-based updates are accumulated digitally and converted into discrete programming events only when a threshold is exceeded, emulating nonlinear, state-dependent programming dynamics while reducing programming frequency. Both deployment strategies achieve classification performance comparable to state-of-the-art software-based SNNs. Furthermore, subject-specific transfer learning achieved by retraining only the final network layers improves classification accuracy. These results demonstrate that programmable ferroelectric hardware can support robust, low-overhead adaptation in spiking neural networks, opening a practical path toward personalized neuromorphic processing of neural signals.


【13】AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules
标题:AceFF:一种最先进的小分子机器学习势
链接:https://arxiv.org/abs/2601.00581

作者:Stephen E. Farr,Stefan Doerr,Antonio Mirarchi,Francesc Sabanes Zariquiey,Gianni De Fabritiis
摘要:我们介绍AceFF,这是一种针对小分子药物发现优化的预训练机器学习原子间势(MLIP)。虽然MLIP已经成为密度泛函理论(DFT)的高效替代品,但在不同化学空间上的泛化仍然困难。AceFF通过在药物样化合物的综合数据集上训练的改进TensorNet2架构来解决这个问题。这种方法得到一个在高吞吐推理速度与DFT级精度之间取得平衡的力场。AceFF完全支持基本的药物化学元素(H、B、C、N、O、F、Si、P、S、Cl、Br、I),并经过显式训练以处理带电状态。在严格基准上的验证,包括复杂的扭转能量扫描、分子动力学轨迹、批量能量最小化以及力和能量精度,表明AceFF为有机分子建立了新的最先进水平。AceFF-2模型权重和推理代码可在https://huggingface.co/Acellera/AceFF-2.0获取。
摘要:We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Functional Theory (DFT), generalizability across diverse chemical spaces remains difficult. AceFF addresses this via a refined TensorNet2 architecture trained on a comprehensive dataset of drug-like compounds. This approach yields a force field that balances high-throughput inference speed with DFT-level accuracy. AceFF fully supports the essential medicinal chemistry elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I) and is explicitly trained to handle charged states. Validation against rigorous benchmarks, including complex torsional energy scans, molecular dynamics trajectories, batched minimizations, and force and energy accuracy, demonstrates that AceFF establishes a new state-of-the-art for organic molecules. The AceFF-2 model weights and inference code are available at https://huggingface.co/Acellera/AceFF-2.0.


【14】Generative Conditional Missing Imputation Networks
标题:生成性条件缺失插补网络
链接:https://arxiv.org/abs/2601.00517

作者:George Sun,Yi-Hui Zhou
摘要:在这项研究中,我们引入了一种精巧的生成式条件策略,旨在填补数据集中的缺失值,这是统计分析中相当重要的一个问题。具体来说,我们首先阐明了生成条件缺失插补网络(GCMI)的理论基础,在完全随机缺失(MCAR)和随机缺失(MAR)机制下展示了其稳健的性质。随后,我们通过基于链式方程的多重插补框架来增强GCMI的鲁棒性和准确性。这一改进有助于增强模型稳定性并显著提升插补性能。最后,通过一系列细致的模拟和基于基准数据集的实证评估,我们证明了所提方法相较于现有其他领先插补技术的优越效果。这一全面评估不仅强调了GCMI的实用性,也肯定了它作为统计数据分析领域前沿工具的潜力。
摘要:In this study, we introduce a sophisticated generative conditional strategy designed to impute missing values within datasets, an area of considerable importance in statistical analysis. Specifically, we initially elucidate the theoretical underpinnings of the Generative Conditional Missing Imputation Networks (GCMI), demonstrating its robust properties in the context of the Missing Completely at Random (MCAR) and the Missing at Random (MAR) mechanisms. Subsequently, we enhance the robustness and accuracy of GCMI by integrating a multiple imputation framework using a chained equations approach. This innovation serves to bolster model stability and improve imputation performance significantly. Finally, through a series of meticulous simulations and empirical assessments utilizing benchmark datasets, we establish the superior efficacy of our proposed methods when juxtaposed with other leading imputation techniques currently available. This comprehensive evaluation not only underscores the practicality of GCMI but also affirms its potential as a leading-edge tool in the field of statistical data analysis.


【15】Interpretable Machine Learning for Quantum-Informed Property Predictions in Artificial Sensing Materials
标题:人工传感材料量子信息性质预测的可解释机器学习
链接:https://arxiv.org/abs/2601.00503

作者:Li Chen,Leonardo Medrano Sandonas,Shirong Huang,Alexander Croy,Gianaurelio Cuniberti
备注:18 pages, 6 figures, 1 table
摘要:数字传感在开发可持续的方法以将定制的电子鼻的适用性扩展到复杂的身体气味挥发组(BOV)方面面临挑战。为了应对这一挑战,我们开发了MORE-ML,这是一种计算框架,它将电子鼻分子构建块的量子力学(QM)特性数据与机器学习(ML)方法相结合,以预测传感相关特性。在此框架内,我们通过对BOV分子与粘蛋白衍生受体之间的相互作用的更大构象空间进行采样,将我们先前的数据集MORE-Q扩展到MORE-QX。该数据集提供了基于BOV吸附计算的广泛的电子结合特征(BF)。MORE-QX属性空间的分析揭示了构建块的QM属性与所产生的BF之间的弱相关性。利用这一观察结果,我们将构建块的电子描述符定义为基于树的ML模型的输入,以预测BF。基准测试表明,CatBoost模型的性能优于替代品,特别是在对未知化合物的转移性方面。可解释的人工智能方法进一步强调了哪些QM属性最影响BF预测。总的来说,MORE-ML将QM见解与ML相结合,为BOV传感中的分子受体提供机制理解和合理设计原则。这种方法为推进能够分析复杂气味混合物的人工传感材料奠定了基础,弥合了分子水平计算和实际电子鼻应用之间的差距。
摘要:Digital sensing faces challenges in developing sustainable methods to extend the applicability of customized e-noses to complex body odor volatilome (BOV). To address this challenge, we developed MORE-ML, a computational framework that integrates quantum-mechanical (QM) property data of e-nose molecular building blocks with machine learning (ML) methods to predict sensing-relevant properties. Within this framework, we expanded our previous dataset, MORE-Q, to MORE-QX by sampling a larger conformational space of interactions between BOV molecules and mucin-derived receptors. This dataset provides extensive electronic binding features (BFs) computed upon BOV adsorption. Analysis of MORE-QX property space revealed weak correlations between QM properties of building blocks and resulting BFs. Leveraging this observation, we defined electronic descriptors of building blocks as inputs for tree-based ML models to predict BFs. Benchmarking showed CatBoost models outperform alternatives, especially in transferability to unseen compounds. Explainable AI methods further highlighted which QM properties most influence BF predictions. Collectively, MORE-ML combines QM insights with ML to provide mechanistic understanding and rational design principles for molecular receptors in BOV sensing. This approach establishes a foundation for advancing artificial sensing materials capable of analyzing complex odor mixtures, bridging the gap between molecular-level computations and practical e-nose applications.


【16】Solving nonlinear subsonic compressible flow in infinite domain via multi-stage neural networks
标题:利用多级神经网络求解无限区域非线性亚音速可压缩流动
链接:https://arxiv.org/abs/2601.00342

作者:Xuehui Qian,Hongkai Tao,Yongji Wang
备注:24 pages, 9 figures
摘要:在空气动力学中,精确地模拟翼型上的亚音速可压缩流动对于飞机设计是至关重要的。然而,求解控制非线性扰动速度势方程提出了计算挑战。传统的方法通常依赖于线性化方程或有限的截断域,这会引入不可忽略的误差,并限制了在现实世界中的适用性。在这项研究中,我们提出了一个新的框架,利用物理信息神经网络(PINNs)解决完全非线性可压缩势方程在一个无界(无限)域。我们解决了无界域和收敛的挑战,固有的标准PINN通过将坐标变换和嵌入物理渐近约束直接到网络架构。此外,我们采用多级PINN(MS-PINN)的方法来迭代最小化残差,实现解决方案的精度接近机器精度。我们通过模拟圆形和椭圆形几何形状的流动来验证这个框架,将我们的结果与传统的有限域和线性化解决方案进行比较。我们的研究结果量化了域截断和线性化引入的明显差异,特别是在较高的马赫数下,并表明这种新的框架是一种强大的,高保真的计算流体动力学工具。
摘要 :In aerodynamics, accurately modeling subsonic compressible flow over airfoils is critical for aircraft design. However, solving the governing nonlinear perturbation velocity potential equation presents computational challenges. Traditional approaches often rely on linearized equations or finite, truncated domains, which introduce non-negligible errors and limit applicability in real-world scenarios. In this study, we propose a novel framework utilizing Physics-Informed Neural Networks (PINNs) to solve the full nonlinear compressible potential equation in an unbounded (infinite) domain. We address the unbounded-domain and convergence challenges inherent in standard PINNs by incorporating a coordinate transformation and embedding physical asymptotic constraints directly into the network architecture. Furthermore, we employ a Multi-Stage PINN (MS-PINN) approach to iteratively minimize residuals, achieving solution accuracy approaching machine precision. We validate this framework by simulating flow over circular and elliptical geometries, comparing our results against traditional finite-domain and linearized solutions. Our findings quantify the noticeable discrepancies introduced by domain truncation and linearization, particularly at higher Mach numbers, and demonstrate that this new framework is a robust, high-fidelity tool for computational fluid dynamics.


【17】Cuffless, calibration-free hemodynamic monitoring with physics-informed machine learning models
标题:基于物理信息机器学习模型的无袖带、免校准血流动力学监测
链接:https://arxiv.org/abs/2601.00081

作者:Henry Crandall,Tyler Schuessler,Filip Bělík,Albert Fabregas,Barry M. Stults,Alexandra Boyadzhiev,Huanan Zhang,Jim S. Wu,Aylin R. Rodan,Stephen P. Juraschek,Ramakrishna Mukkamala,Alfred K. Cheung,Stavros G. Drakos,Christel Hohenegger,Braxton Osting,Benjamin Sanchez
备注:225 pages, Number of Main Figures 4, Number of Extended Data Tables 4, Number of Extended Data Figures 5, Number of Supplementary Figures 34, Number of Supplementary Tables 11, Number of Supplementary Videos 11, Supplementary Statistical Table 1 (Supplementary Table 12)
摘要:可穿戴技术通过提供心血管健康指标的持续评估和指导临床管理,有可能改变动态和家庭血流动力学监测。然而,现有的用于血压(BP)监测的无袖带可穿戴设备通常依赖于缺乏理论基础的方法,例如脉搏波分析或脉搏到达时间,使得它们容易受到生理和实验混杂因素的影响,从而破坏其准确性和临床实用性。在这里,我们开发了一种具有实时生物电阻抗(BioZ)传感的智能手表设备,用于无袖带血流动力学监测。我们通过多尺度分析和计算建模框架阐明了BioZ和BP之间的生物物理关系,并确定了影响手腕处脉动BioZ信号的生理,解剖和实验参数。结合流体动力学原理的信号标记物理信息神经网络能够实现BP和径向和轴向血流速度的免校准估计。我们成功地测试了我们的方法与健康的个人在休息和体力活动后,包括身体和自主的挑战,并与高血压和心血管疾病的患者在门诊和重症监护环境。我们的研究结果证明了BioZ技术用于无袖带BP和血流速度监测的可行性,解决了现有无袖带技术的关键局限性。
摘要:Wearable technologies have the potential to transform ambulatory and at-home hemodynamic monitoring by providing continuous assessments of cardiovascular health metrics and guiding clinical management. However, existing cuffless wearable devices for blood pressure (BP) monitoring often rely on methods lacking theoretical foundations, such as pulse wave analysis or pulse arrival time, making them vulnerable to physiological and experimental confounders that undermine their accuracy and clinical utility. Here, we developed a smartwatch device with real-time electrical bioimpedance (BioZ) sensing for cuffless hemodynamic monitoring. We elucidate the biophysical relationship between BioZ and BP via a multiscale analytical and computational modeling framework, and identify physiological, anatomical, and experimental parameters that influence the pulsatile BioZ signal at the wrist. A signal-tagged physics-informed neural network incorporating fluid dynamics principles enables calibration-free estimation of BP and radial and axial blood velocity. We successfully tested our approach with healthy individuals at rest and after physical activity including physical and autonomic challenges, and with patients with hypertension and cardiovascular disease in outpatient and intensive care settings. Our findings demonstrate the feasibility of BioZ technology for cuffless BP and blood velocity monitoring, addressing critical limitations of existing cuffless technologies.


其他(22篇)

【1】Precision Autotuning for Linear Solvers via Contextual Bandit-Based RL
标题:基于上下文Bandit强化学习的线性求解器精度自动调优
链接:https://arxiv.org/abs/2601.00728

作者:Erin Carson,Xinye Chen
摘要:我们提出了一个用于线性求解器自适应精度调优的强化学习(RL)框架,该框架也可以扩展到一般算法。我们将问题表述为上下文强盗问题,并在离散化状态空间上使用增量动作价值估计来求解,为各计算步骤选择最优精度配置,在精度与计算效率之间取得平衡。为了验证其有效性,我们将该框架应用于求解线性方程组 $Ax = b$ 的迭代精化。在这一应用中,我们的方法基于从系统中计算得到的特征动态选择精度。具体而言,一个Q表将离散化特征(例如近似条件数和矩阵范数)映射到动作(为特定步骤选择的精度配置),并通过ε-贪婪策略进行优化,以最大化兼顾精度与计算成本的多目标回报。实证结果表明,该方法能有效选择精度,在保持与双精度基线相当的精度的同时降低计算成本。该框架可推广到不同的样本外数据,并为将基于RL的精度选择用于其他数值算法提供了见解,从而推进科学计算中的混合精度数值方法。据我们所知,这是第一个使用RL进行精度自动调优并在未见数据集上验证的工作。
摘要:We propose a reinforcement learning (RL) framework for adaptive precision tuning of linear solvers, and can be extended to general algorithms. The framework is formulated as a contextual bandit problem and solved using incremental action-value estimation with a discretized state space to select optimal precision configurations for computational steps, balancing precision and computational efficiency. To verify its effectiveness, we apply the framework to iterative refinement for solving linear systems $Ax = b$. In this application, our approach dynamically chooses precisions based on calculated features from the system. In detail, a Q-table maps discretized features (e.g., approximate condition number and matrix norm)to actions (chosen precision configurations for specific steps), optimized via an epsilon-greedy strategy to maximize a multi-objective reward balancing accuracy and computational cost. Empirical results demonstrate effective precision selection, reducing computational cost while maintaining accuracy comparable to double-precision baselines. The framework generalizes to diverse out-of-sample data and offers insight into utilizing RL precision selection for other numerical algorithms, advancing mixed-precision numerical methods in scientific computing. To the best of our knowledge, this is the first work on precision autotuning with RL and verified on unseen datasets.
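下面是一个最小的Python草图,演示"离散化特征 + Q表 + ε-贪婪"的上下文强盗式精度选择;其中的奖励函数与精度收敛模型均为示意性假设,并非论文的精确目标。

import numpy as np

rng = np.random.default_rng(0)
ACTIONS = ["fp16", "fp32", "fp64"]                 # 候选工作精度
COST = {"fp16": 1.0, "fp32": 2.0, "fp64": 4.0}     # 示意性的相对计算成本
N_BUCKETS, EPS, ALPHA = 5, 0.1, 0.1
Q = np.zeros((N_BUCKETS, len(ACTIONS)))            # 状态 = 条件数所在的离散桶

def state_of(cond):
    return min(int(np.log10(cond) // 2), N_BUCKETS - 1)   # 按 log10(kappa) 分桶

def reward(cond, action):
    # 玩具模型:低精度在病态系统上失败,否则更省成本(仅为示意)
    eps_mach = {"fp16": 1e-3, "fp32": 1e-7, "fp64": 1e-16}[action]
    converged = cond * eps_mach < 1e-2
    return (1.0 if converged else -1.0) - 0.1 * COST[action]

for episode in range(5000):
    cond = 10 ** rng.uniform(0, 10)                # 随机生成的测试系统条件数
    s = state_of(cond)
    a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
    Q[s, a] += ALPHA * (reward(cond, ACTIONS[a]) - Q[s, a])   # 增量动作价值更新

for s in range(N_BUCKETS):
    print(f"kappa bucket {s}: choose {ACTIONS[int(Q[s].argmax())]}")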


【2】Bayesian Inverse Games with High-Dimensional Multi-Modal Observations
标题:具有高维多模态观测的Bayesian逆博弈
链接:https://arxiv.org/abs/2601.00696

作者:Yash Jain,Xinjie Liu,Lasse Peters,David Fridovich-Keil,Ufuk Topcu
摘要:许多多智能体交互场景可以自然地建模为非合作博弈,其中每个智能体的决策取决于其他智能体的未来行动。然而,部署自主决策的博弈论规划者需要一个规范的所有代理的目标。为了规避这一实际困难,最近的工作开发了最大似然技术,用于解决反向游戏,可以识别未知的代理目标的交互数据。不幸的是,这些方法只能推断点估计,并没有量化估计的不确定性,相应地,下游规划决策可能过于自信地承诺不安全的行动。我们提出了一种近似贝叶斯推理方法来解决逆博弈问题,它可以将来自多个模态的观测数据结合起来,并用于实时生成有限传感器观测的隐藏代理目标的贝叶斯后验样本。具体地说,所提出的贝叶斯逆博弈框架训练一个结构化的变分自动编码器与嵌入式可微纳什博弈求解器的交互数据集上,不需要标签的代理的真实目标。大量的实验表明,我们的框架成功地学习了先验和后验分布,提高了基于最大似然估计的逆博弈方法的推理质量,并在不牺牲效率的情况下实现了更安全的下游决策。当轨迹信息是无信息的或不可用的,多模态推理进一步减少不确定性,通过利用额外的观察模态。
摘要 :Many multi-agent interaction scenarios can be naturally modeled as noncooperative games, where each agent's decisions depend on others' future actions. However, deploying game-theoretic planners for autonomous decision-making requires a specification of all agents' objectives. To circumvent this practical difficulty, recent work develops maximum likelihood techniques for solving inverse games that can identify unknown agent objectives from interaction data. Unfortunately, these methods only infer point estimates and do not quantify estimator uncertainty; correspondingly, downstream planning decisions can overconfidently commit to unsafe actions. We present an approximate Bayesian inference approach for solving the inverse game problem, which can incorporate observation data from multiple modalities and be used to generate samples from the Bayesian posterior over the hidden agent objectives given limited sensor observations in real time. Concretely, the proposed Bayesian inverse game framework trains a structured variational autoencoder with an embedded differentiable Nash game solver on interaction datasets and does not require labels of agents' true objectives. Extensive experiments show that our framework successfully learns prior and posterior distributions, improves inference quality over maximum likelihood estimation-based inverse game approaches, and enables safer downstream decision-making without sacrificing efficiency. When trajectory information is uninformative or unavailable, multimodal inference further reduces uncertainty by exploiting additional observation modalities.


【3】TeleDoCTR: Domain-Specific and Contextual Troubleshooting for Telecommunications
标题:TeleDoCTR:电信领域特定和上下文故障排除
链接:https://arxiv.org/abs/2601.00691

作者:Mohamed Trabelsi,Huseyin Uzunalioglu
摘要:票务故障排除是指分析和解决通过票务系统报告的问题的过程。在提供广泛服务的大型组织中,由于提交的票证的多样性和对专业领域知识的需求,此任务非常复杂。特别是,电信(电信)中的故障排除是一项非常耗时的任务,因为它需要专家解释票据内容,查阅文档并搜索历史记录以确定适当的解决方案。这种人力密集的方法不仅拖延问题的解决,而且还阻碍了整体运营效率。为了提高电信票证故障排除的有效性和效率,我们提出了TeleDoCTR,一种新的电信相关的,特定于域的,和上下文的故障排除系统,专为电信端到端票证解决方案。TeleDoCTR集成了特定领域的排名和生成模型,以自动化故障排除工作流程的关键步骤,这些步骤是:将故障单发送给负责解决故障单的适当专家团队(分类任务),检索上下文和语义相似的历史故障单(检索任务),并生成详细的故障分析报告,概述问题,根本原因和潜在的解决方案(生成任务)。我们在电信基础设施的真实数据集上评估了TeleDoCTR,并证明它比现有的最先进的方法具有更好的性能,显著提高了故障排除过程的准确性和效率。
摘要:Ticket troubleshooting refers to the process of analyzing and resolving problems that are reported through a ticketing system. In large organizations offering a wide range of services, this task is highly complex due to the diversity of submitted tickets and the need for specialized domain knowledge. In particular, troubleshooting in telecommunications (telecom) is a very time-consuming task as it requires experts to interpret ticket content, consult documentation, and search historical records to identify appropriate resolutions. This human-intensive approach not only delays issue resolution but also hinders overall operational efficiency. To enhance the effectiveness and efficiency of ticket troubleshooting in telecom, we propose TeleDoCTR, a novel telecom-related, domain-specific, and contextual troubleshooting system tailored for end-to-end ticket resolution in telecom. TeleDoCTR integrates both domain-specific ranking and generative models to automate key steps of the troubleshooting workflow which are: routing tickets to the appropriate expert team responsible for resolving the ticket (classification task), retrieving contextually and semantically similar historical tickets (retrieval task), and generating a detailed fault analysis report outlining the issue, root cause, and potential solutions (generation task). We evaluate TeleDoCTR on a real-world dataset from a telecom infrastructure and demonstrate that it achieves superior performance over existing state-of-the-art methods, significantly enhancing the accuracy and efficiency of the troubleshooting process.


【4】Stronger Approximation Guarantees for Non-Monotone γ-Weakly DR-Submodular Maximization
标题:非单调γ-弱DR-次模最大化的更强逼近保证
链接:https://arxiv.org/abs/2601.00611

作者:Hareshkumar Jadav,Ranveer Singh,Vaneet Aggarwal
备注:Extended version of paper accepted in AAMAS 2026
摘要:在约束条件下最大化子模块目标是机器学习和优化中的一个基本问题。研究了下闭凸体上非负非单调$γ$-弱DR-次模函数的极大化问题。我们的主要结果是一个近似算法,其保证光滑地依赖于$γ$,特别是,当$γ=1$(DR-子模的情况下),我们的界恢复了$0.401$的近似因子,而对于$γ<1$的保证优雅地下降,它改善了以前报道的边界为$γ$-弱DR-子模最大化在相同的约束。我们的方法结合了Frank-Wolfe引导的连续贪婪框架和$γ$感知的双重贪婪步骤,产生了一个简单而有效的处理非单调性的过程。这导致下闭凸体上的非单调$γ$-弱DR-子模最大化的最新保证。
摘要:Maximizing submodular objectives under constraints is a fundamental problem in machine learning and optimization. We study the maximization of a nonnegative, non-monotone $γ$-weakly DR-submodular function over a down-closed convex body. Our main result is an approximation algorithm whose guarantee depends smoothly on $γ$; in particular, when $γ=1$ (the DR-submodular case) our bound recovers the $0.401$ approximation factor, while for $γ<1$ the guarantee degrades gracefully and, it improves upon previously reported bounds for $γ$-weakly DR-submodular maximization under the same constraints. Our approach combines a Frank-Wolfe-guided continuous-greedy framework with a $γ$-aware double-greedy step, yielding a simple yet effective procedure for handling non-monotonicity. This results in state-of-the-art guarantees for non-monotone $γ$-weakly DR-submodular maximization over down-closed convex bodies.


【5】Neural Chains and Discrete Dynamical Systems
标题:神经链和离散动态系统
链接:https://arxiv.org/abs/2601.00473

作者:Sauro Succi,Abhisek Ganguly,Santosh Ansumali
摘要:我们考察了基于无自注意力Transformer架构的机器学习(ML)应用(下文称"神经链")与神经积分方程和偏微分方程(NIE、PDE)离散化版本所对应的离散动力系统之间的类比。我们对(有黏性与无黏性)Burgers方程和Eikonal方程分别通过标准数值离散化(同样可表述为神经链)和PINN学习得到的数值解进行了比较分析与讨论。结果发现,标准数值离散化与PINN学习提供了两条不同的途径,来获得关于系统动力学的本质上相同的知识。PINN学习通过随机矩阵进行,这些随机矩阵与有限差分(FD)方法所对应的高度结构化矩阵没有直接关系。能给出可接受解的随机矩阵在矩阵空间中远多于唯一的三对角形式,这解释了为什么PINN搜索通常落在随机集合上。代价是参数数量庞大,导致缺乏物理透明度(可解释性),以及在FD方法中没有对应物的高昂训练成本。不过,我们的结果仅针对一维动力学问题,因此并不排除PINN乃至一般的ML方法可能为高维问题提供更优策略的可能性。
摘要:We inspect the analogy between machine-learning (ML) applications based on the transformer architecture without self-attention, {\it neural chains} hereafter, and discrete dynamical systems associated with discretised versions of neural integral and partial differential equations (NIE, PDE). A comparative analysis of the numerical solution of the (viscid and inviscid) Burgers and Eikonal equations via standard numerical discretization (also cast in terms of neural chains) and via PINN's learning is presented and commented on. It is found that standard numerical discretization and PINN learning provide two different paths to acquire essentially the same knowledge about the dynamics of the system. PINN learning proceeds through random matrices which bear no direct relation to the highly structured matrices associated with finite-difference (FD) procedures. Random matrices leading to acceptable solutions are far more numerous than the unique tridiagonal form in matrix space, which explains why the PINN search typically lands on the random ensemble. The price is a much larger number of parameters, causing lack of physical transparency (explainability) as well as large training costs with no counterpart in the FD procedure. However, our results refer to one-dimensional dynamic problems, hence they don't rule out the possibility that PINNs and ML in general, may offer better strategies for high-dimensional problems.
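
作为对摘要中"标准数值离散化也可表述为神经链"这一说法的直观补充,下面给出粘性 Burgers 方程显式有限差分推进的最小示意:每个时间步都是对整个解向量施加同一个固定参数的非线性更新,可类比为"神经链"中的一层。网格大小、时间步长与粘性系数均为满足稳定性条件的说明性取值。

```python
# 示意性草图:粘性 Burgers 方程的显式有限差分推进(周期边界)。
import numpy as np

nx, nt = 200, 400
dx, dt, nu = 2 * np.pi / nx, 0.002, 0.05
x = np.linspace(0, 2 * np.pi, nx, endpoint=False)
u = np.sin(x)                            # 初始条件

for _ in range(nt):                      # 一个时间步 = "神经链"中的一层
    um, up = np.roll(u, 1), np.roll(u, -1)
    conv = u * (up - um) / (2 * dx)          # 对流项(中心差分)
    diff = nu * (up - 2 * u + um) / dx**2    # 扩散项(三对角模板,对应高度结构化的 FD 矩阵)
    u = u + dt * (-conv + diff)

print("max|u| after evolution:", np.abs(u).max())
```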


【6】Laplacian Kernelized Bandit
标题:拉普拉斯核化强盗
链接:https://arxiv.org/abs/2601.00461

作者:Shuang Wu,Arash A. Amini
摘要:我们研究多用户上下文强盗问题,其中用户由一个图相关联,且其奖励函数同时表现出非线性行为与图同质性。我们为用户奖励函数集合$\{f_u\}$引入了一个有原则的联合惩罚项,它将基于RKHS距离的图平滑项与个体粗糙度惩罚相结合。我们的核心贡献是证明:该惩罚等价于某个单一、统一的"多用户RKHS"中的平方范数。我们显式推导了它的再生核,该核将图拉普拉斯算子与基臂核优雅地融合在一起。这一统一使我们能够把问题重新表述为学习单个"提升"后的函数,从而设计出有原则的算法LK-GP-UCB和LK-GP-TS,它们利用此新核上的高斯过程后验进行探索。我们给出了高概率的遗憾界,其规模随多用户核的"有效维数"而增长,取代了对用户数量或环境维数的依赖。实验上,我们的方法在非线性设置中优于强线性基线和不感知图结构的基线,即使真实奖励是线性的也保持竞争力。我们的工作提供了一个统一的、有理论基础的、实用的框架,将拉普拉斯正则化与核化强盗的结构化探索联系起来。
摘要:We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint penalty for the collection of user reward functions $\{f_u\}$, combining a graph smoothness term based on RKHS distances with an individual roughness penalty. Our central contribution is proving that this penalty is equivalent to the squared norm within a single, unified \emph{multi-user RKHS}. We explicitly derive its reproducing kernel, which elegantly fuses the graph Laplacian with the base arm kernel. This unification allows us to reframe the problem as learning a single ''lifted'' function, enabling the design of principled algorithms, \texttt{LK-GP-UCB} and \texttt{LK-GP-TS}, that leverage Gaussian Process posteriors over this new kernel for exploration. We provide high-probability regret bounds that scale with an \emph{effective dimension} of the multi-user kernel, replacing dependencies on user count or ambient dimension. Empirically, our methods outperform strong linear and non-graph-aware baselines in non-linear settings and remain competitive even when the true rewards are linear. Our work delivers a unified, theoretically grounded, and practical framework that bridges Laplacian regularization with kernelized bandits for structured exploration.
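
下面给出一个极简的示意性草图,说明"图拉普拉斯与基臂核融合成多用户核,再用 GP 后验做 UCB"的流程。这里采用常见的拉普拉斯正则化形式 K((u,x),(v,x')) = [(I + a·L)^{-1}]_{uv} · k(x,x') 作为近似示意,具体的核形式、正则化系数与数据均为假设,应以论文推导为准。

```python
# 示意性草图:拉普拉斯融合的多用户核 + GP-UCB 打分。
import numpy as np

def rbf(X, Y, ell=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

n_users, d = 3, 2
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # 用户关系图(假设)
L = np.diag(A.sum(1)) - A
M = np.linalg.inv(np.eye(n_users) + 0.5 * L)             # 用户间平滑度 -> 用户核

rng = np.random.default_rng(0)
X = rng.normal(size=(20, d)); users = rng.integers(0, n_users, 20)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

def K(users1, X1, users2, X2):
    return M[np.ix_(users1, users2)] * rbf(X1, X2)

Ktt = K(users, X, users, X) + 1e-3 * np.eye(len(X))
alpha = np.linalg.solve(Ktt, y)

def ucb(u, x, beta=2.0):
    k_star = K(np.array([u]), x[None, :], users, X)       # 1 x n
    mu = k_star @ alpha
    var = M[u, u] * rbf(x[None, :], x[None, :]) - k_star @ np.linalg.solve(Ktt, k_star.T)
    return (mu + beta * np.sqrt(np.maximum(var, 0))).item()

print(ucb(0, np.array([0.3, -0.5])))
```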


【7】Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations
标题:专家混合中的几何规则化:权重和激活之间的脱节
链接:https://arxiv.org/abs/2601.00457

作者:Hyunjun Kim
摘要:混合专家(MoE)模型通过稀疏激活实现效率,但几何正则化在专家专业化中的作用仍不清楚。我们应用正交性损失来强制专家多样性,发现它在多个方面都失败了:它不能减少权重空间重叠(MSO实际上最多增加了114%);无论正则化强度如何,激活空间重叠仍然很高(~0.6);对性能的影响也不一致:WikiText-103上有边际改进(-0.9%),TinyStories上轻微退化(+0.9%),PTB上的结果高度可变(标准差>1.0)。我们对7种正则化强度的分析显示,权重正交性与激活正交性之间没有显著相关性(r=-0.293,p=0.523)。这些发现表明,权重空间正则化既未实现其几何目标,也不能可靠地提升性能,因此不适合用于MoE多样性。
摘要:Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply orthogonality loss to enforce expert diversity and find it fails on multiple fronts: it does not reduce weight-space overlap (MSO actually increases by up to 114%), activation-space overlap remains high (~0.6) regardless of regularization, and effects on performance are inconsistent -- marginal improvement on WikiText-103 (-0.9%), slight degradation on TinyStories (+0.9%), and highly variable results on PTB (std > 1.0). Our analysis across 7 regularization strengths reveals no significant correlation (r = -0.293, p = 0.523) between weight and activation orthogonality. These findings demonstrate that weight-space regularization neither achieves its geometric goal nor reliably improves performance, making it unsuitable for MoE diversity.
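
下面是一个极简的示意性草图,演示摘要中所述"对专家权重施加正交性损失"的常见做法,以及一个权重空间重叠的代理指标。注意论文中 MSO 的精确定义未在摘要中给出,这里以展平后专家权重两两余弦相似度的均方值作为假设性代理;专家数量与维度也只是示例。

```python
# 示意性草图:专家权重正交性惩罚 + 权重空间重叠代理指标。
import numpy as np

rng = np.random.default_rng(0)
experts = [rng.normal(size=(64, 128)) for _ in range(8)]   # 8 个专家的权重矩阵(示例)

def flat_unit(W):
    v = W.ravel()
    return v / np.linalg.norm(v)

def weight_overlap(experts):
    V = np.stack([flat_unit(W) for W in experts])
    G = V @ V.T                                   # 两两余弦相似度
    off = G[~np.eye(len(experts), dtype=bool)]
    return np.mean(off ** 2)                      # 假设的重叠代理(MSO 风格)

def orthogonality_loss(experts):
    V = np.stack([flat_unit(W) for W in experts])
    G = V @ V.T
    return np.sum((G - np.eye(len(experts))) ** 2)  # 偏离单位阵的 Frobenius 惩罚

print("overlap:", weight_overlap(experts), "ortho loss:", orthogonality_loss(experts))
```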


【8】Imitation from Observations with Trajectory-Level Generative Embeddings
标题:基于轨迹级生成嵌入的观察模仿学习
链接:https://arxiv.org/abs/2601.00452

作者:Yongtao Qu,Shangzhe Li,Weitong Zhang
备注:24 pages, 6 figures, 7 tables
摘要:我们考虑离线的基于观察的模仿学习(LfO),其中专家演示稀缺,且可用的离线次优数据与专家行为相距甚远。许多现有的分布匹配方法在这种情形下表现不佳,因为它们施加了严格的支撑约束,并依赖脆弱的一步模型,很难从不完美的数据中提取有用信号。为应对这一挑战,我们提出了TGE,一种用于离线LfO的轨迹级生成嵌入,它通过在离线轨迹数据上训练的时序扩散模型的潜空间中估计专家状态密度,来构建稠密、平滑的代理奖励。通过利用所学扩散嵌入的平滑几何结构,TGE捕获长时程的时间动态,并有效弥合不相交支撑之间的差距,即使离线数据在分布上与专家不同,也能保证鲁棒的学习信号。实验上,在一系列D4RL运动与操纵基准上,所提出的方法始终与之前的离线LfO方法持平或更优。
摘要:We consider the offline imitation learning from observations (LfO) where the expert demonstrations are scarce and the available offline suboptimal data are far from the expert behavior. Many existing distribution-matching approaches struggle in this regime because they impose strict support constraints and rely on brittle one-step models, making it hard to extract useful signal from imperfect data. To tackle this challenge, we propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating expert state density in the latent space of a temporal diffusion model trained on offline trajectory data. By leveraging the smooth geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap between disjoint supports, ensuring a robust learning signal even when offline data is distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
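
下面的极简示意性草图演示"在潜空间中估计专家状态密度并将其 log 密度作为代理奖励"这一流程。其中的编码器用一个随机线性映射占位(论文中是训练好的时序扩散模型的嵌入),带宽、维度与数据均为说明性假设。

```python
# 示意性草图:潜空间核密度估计 -> 平滑代理奖励。
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
encode = lambda s, W=rng.normal(size=(6, 3)): s @ W      # 假设的轨迹级编码器(占位)

expert_states = rng.normal(size=(200, 6))                # 少量专家观测
offline_states = rng.normal(loc=0.5, size=(1000, 6))     # 离线次优数据

kde = KernelDensity(bandwidth=0.5).fit(encode(expert_states))

def surrogate_reward(states):
    return kde.score_samples(encode(states))             # log p_expert(z),越大越像专家

print(surrogate_reward(offline_states[:5]))
```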


【9】Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution
标题:通过基于共识的隐私保护数据分发实现安全、可验证和可扩展的多客户端数据共享
链接:https://arxiv.org/abs/2601.00418

作者:Prajwal Panth,Sahaj Raj Malla
备注:Preprint. Under review
摘要:我们提出基于共识的隐私保护数据分发(CPPDD)框架,这是一个用于安全多客户端数据聚合的轻量级、设置完成后自治的协议。该框架通过双层保护机制来强制实现"全体同意方可释放"的机密性,该机制将每客户端的仿射掩码与优先级驱动的顺序共识锁定相结合。去中心化的完整性通过步骤校验和(sigma_S)与数据校验和(sigma_D)进行验证,从而支持自主的恶意偏差检测与原子中止,而无需持久的协调。该设计支持标量、向量和矩阵载荷,计算与通信复杂度为O(N*D),可选边缘服务器卸载,并能在N-1个客户端被腐化的情况下抵抗合谋。形式化分析证明了协议的正确性、依赖于共识的完整性与公平性(CDIF,出现偏差时以压倒性概率中止),以及在伪随机函数族假设下的IND-CPA安全性。在源自MNIST的向量上的经验评估表明,该框架可线性扩展到N=500,每客户端计算时间为亚毫秒级。与MPC和HE基线相比,该框架实现了100%的恶意偏差检测、精确的数据恢复,以及低三到四个数量级的FLOPs。CPPDD支持安全投票、联盟联邦学习、区块链托管和地理信息能力建设中的原子化协作,弥补了在受监管和资源受限环境下可扩展性、信任最小化和可验证多方计算方面的关键空白。
摘要 :We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framework enforces unanimous-release confidentiality through a dual-layer protection mechanism that combines per-client affine masking with priority-driven sequential consensus locking. Decentralized integrity is verified via step (sigma_S) and data (sigma_D) checksums, facilitating autonomous malicious deviation detection and atomic abort without requiring persistent coordination. The design supports scalar, vector, and matrix payloads with O(N*D) computation and communication complexity, optional edge-server offloading, and resistance to collusion under N-1 corruptions. Formal analysis proves correctness, Consensus-Dependent Integrity and Fairness (CDIF) with overwhelming-probability abort on deviation, and IND-CPA security assuming a pseudorandom function family. Empirical evaluations on MNIST-derived vectors demonstrate linear scalability up to N = 500 with sub-millisecond per-client computation times. The framework achieves 100% malicious deviation detection, exact data recovery, and three-to-four orders of magnitude lower FLOPs compared to MPC and HE baselines. CPPDD enables atomic collaboration in secure voting, consortium federated learning, blockchain escrows, and geo-information capacity building, addressing critical gaps in scalability, trust minimization, and verifiable multi-party computation for regulated and resource-constrained environments.
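
下面给出一个高度简化的示意性草图,仅演示"每客户端仿射掩码上传、共识达成后解掩并校验"的骨架;优先级驱动的顺序共识锁定、步骤校验和 sigma_S 以及原子中止的完整逻辑并未实现,掩码生成方式与参数均为说明性假设,并非论文协议本身。

```python
# 示意性草图:仿射掩码 + 数据校验和 + 共识后解掩聚合(极简版本)。
import hashlib
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 8
data = [rng.normal(size=D) for _ in range(N)]

# 每个客户端持有仿射掩码 (a_i, b_i),只有全体同意释放后才公开
masks = [(float(rng.uniform(1, 2)), rng.normal(size=D)) for _ in range(N)]
masked = [a * x + b for x, (a, b) in zip(data, masks)]

def checksum(v):
    return hashlib.sha256(np.asarray(v, dtype=np.float64).tobytes()).hexdigest()

sigma_D = [checksum(m) for m in masked]          # 数据校验和(示意)

# 共识达成:各客户端释放 (a_i, b_i),聚合方先验证校验和再解掩
assert all(checksum(m) == s for m, s in zip(masked, sigma_D)), "deviation detected -> abort"
recovered = [(m - b) / a for m, (a, b) in zip(masked, masks)]
aggregate = np.sum(recovered, axis=0)

print(np.allclose(aggregate, np.sum(data, axis=0)))   # True:精确恢复聚合结果
```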


【10】NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion
标题:NOS-Gate:时间控制规避下面向消费者网关的队列感知流式IDS
链接:https://arxiv.org/abs/2601.00389

作者:Muhammad Bilal,Omer Tariq,Hasan Ahmed
备注:9 pages, 3 figures, 4 tables
摘要:时序与突发模式会透过加密泄露,自适应的对手可以加以利用。这削弱了独立消费者网关中仅依赖元数据的检测。因此,消费者网关需要在紧张的CPU与延迟预算下,仅利用元数据对加密流量进行流式入侵检测。我们提出了一个面向独立网关的流式IDS,名为NOS-Gate,它为每条流实例化一个源自网络优化脉冲(NOS)动力学的轻量级双状态单元。NOS-Gate对元数据特征的定长窗口进行打分,并在$K$-of-$M$持续性规则下触发一种可逆的缓解措施,即临时降低该流在加权公平排队(WFQ)下的权重。我们在时间受控的规避场景下评估NOS-Gate,使用一个可执行的"世界"基准,该基准指定了良性设备进程、可审计的攻击者预算、争用结构,以及用于量化队列影响的分组级WFQ重放。所有方法均通过burn-in分位数阈值进行无标注校准。在多个可复现的世界和恶意事件中,在实际达到的0.1%假阳性工作点上,NOS-Gate取得0.952的事件召回率,而这些运行中最好的基线为0.857。在门控条件下,它降低了p99.9排队延迟和p99.9附带延迟,在CPU上每个流窗口的平均打分成本约为2.09 μs。
摘要:Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named NOS-Gate. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable 'worlds' benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of ~ 2.09 μs per flow-window on CPU.
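
下面是一个极简的示意性草图,演示摘要中的三个机制如何组合:burn-in 分位数确定阈值、K-of-M 持续性规则触发、以及可逆的 WFQ 降权。NOS 双状态动力学本身未实现,窗口得分序列与阈值分位数均为说明性假设。

```python
# 示意性草图:burn-in 分位数阈值 + K-of-M 持续性规则 + 可逆的流权重降低。
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
burn_in_scores = rng.normal(size=500)
threshold = np.quantile(burn_in_scores, 0.999)     # 无标注校准:burn-in 分位数

K, M = 3, 5
recent = deque(maxlen=M)
flow_weight = 1.0                                  # 该流在 WFQ 下的权重

def step(score):
    global flow_weight
    recent.append(score > threshold)
    if sum(recent) >= K:                           # K-of-M:最近 M 个窗口中至少 K 个超阈值
        flow_weight = 0.1                          # 触发可逆缓解:临时降权
    elif not any(recent):
        flow_weight = 1.0                          # 窗口恢复正常后撤销缓解
    return flow_weight

stream = np.concatenate([rng.normal(size=50), rng.normal(loc=5, size=10), rng.normal(size=50)])
weights = [step(s) for s in stream]
print("min weight during attack:", min(weights))
```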


【11】BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
标题:BERT-JEPA:为语言不变语义重组CLS嵌入
链接:https://arxiv.org/abs/2601.00366

作者:Taj Gillin,Adam Lalani,Kenneth Zhang,Marcel Mateos Salles
备注:16 pages, 10 figures, 10 tables
摘要:联合嵌入预测架构(JEPA)是一种新的自监督训练技术,最近在各个领域都显示出了希望。我们引入BERT-JEPA(BEPA),这是一种训练范式,它将JEPA训练目标添加到BERT风格的模型中,致力于对抗崩溃的[CLS]嵌入空间,并将其转变为语言不可知的空间。这种新结构提高了多语言基准测试的性能。
摘要:Joint Embedding Predictive Architectures (JEPA) are a novel self supervised training technique that have shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA training objective to BERT-style models, working to combat a collapsed [CLS] embedding space and turning it into a language-agnostic space. This new structure leads to increased performance across multilingual benchmarks.
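
下面是一个极简的示意性草图,说明"在 [CLS] 嵌入上附加 JEPA 式预测目标"大致是什么形式:用上下文视图的 [CLS] 经过一个 predictor 去预测目标视图的 [CLS]。这里用随机向量代替两种语言平行句的表示,predictor 取为线性映射,目标侧的 stop-gradient/EMA 细节也被省略,均为说明性假设,并非论文中 BEPA 的具体损失。

```python
# 示意性草图:JEPA 式的 [CLS] 预测损失(归一化后的 MSE)。
import numpy as np

rng = np.random.default_rng(0)
d = 32
cls_context = rng.normal(size=(16, d))     # 上下文视图(如源语言句子)的 [CLS]
cls_target = rng.normal(size=(16, d))      # 目标视图(如平行译文)的 [CLS],不回传梯度

P = rng.normal(size=(d, d)) / np.sqrt(d)   # predictor(此处假设为线性)

def jepa_loss(ctx, tgt, P):
    pred = ctx @ P
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.mean(np.sum((pred - tgt) ** 2, axis=1))   # 单位化后等价于 2 - 2*cos

print(jepa_loss(cls_context, cls_target, P))
```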


【12】Deterministic Coreset for Lp Subspace
标题:Lp子空间的确定性核心集
链接:https://arxiv.org/abs/2601.00361

作者:Rachit Chhaya,Anirban Dasgupta,Dan Feldman,Supratim Shit
摘要:We introduce the first iterative algorithm for constructing a $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n \gg d$, $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$, if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. Specifically, in this paper, $\mathbf{X}'$ is a weighted subset of rows of $\mathbf{X}$ which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset with appropriate scalings. So, unlike typical coreset guarantees, due to bounded loss, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset, for $\ell_p$ subspace embedding whose size is $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. Here, we remove the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal as they are tight with the lower bound. As an application, our coreset can also be used for approximately solving the $\ell_p$ regression problem in a deterministic manner.
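
下面用一个极简的示意性草图说明摘要中 $(\varepsilon,\ell_p)$-子空间嵌入这一性质的含义:对随机方向 $q$ 经验性地检查 $(1-\varepsilon)\|Xq\|_p^p \le \|X'q\|_p^p \le (1+\varepsilon)\|Xq\|_p^p$。这里为了便于演示,用均匀抽样加权构造行子集,并非论文中的确定性迭代构造;样本量与 $p$ 的取值均为示例。

```python
# 示意性草图:经验性地检查加权行子集的 l_p 子空间嵌入误差。
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 2000, 5, 1.5
X = rng.normal(size=(n, d))

m = 400
idx = rng.choice(n, size=m, replace=False)
w = (n / m) ** (1.0 / p)                 # 均匀抽样下使 ||X'q||_p^p 无偏的权重
Xs = w * X[idx]

def max_relative_error(X, Xs, trials=200):
    errs = []
    for _ in range(trials):
        q = rng.normal(size=d)
        full = np.sum(np.abs(X @ q) ** p)
        sub = np.sum(np.abs(Xs @ q) ** p)
        errs.append(abs(sub - full) / full)
    return max(errs)

print("empirical eps:", max_relative_error(X, Xs))
```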


【13】Quantum King-Ring Domination in Chess: A QAOA Approach
标题:国际象棋中的量子王-环统治:一种QAOA方法
链接:https://arxiv.org/abs/2601.00318

作者:Gerhard Stenzel,Michael Kölle,Tobias Rohe,Julian Hager,Leo Sünkel,Maximilian Zorn,Claudia Linnhoff-Popien
摘要:量子近似优化算法(QAOA)被广泛地在合成随机实例(如MaxCut、TSP和SAT问题)上做基准测试,但这些问题缺乏语义结构和人类可解释性,对于在具有实际意义约束的现实问题上的性能只能提供有限的洞见。我们提出量子王-环统治(QKRD),这是一个源自国际象棋战术局面的NISQ规模基准,提供5,000个带有one-hot约束、空间局部性以及10-40量子比特规模的结构化实例。该基准将人类可解释的覆盖率指标与针对经典启发式算法的内在验证相结合,使得无需外部预言机即可得出算法层面的结论。利用QKRD,我们系统地评估了QAOA的设计选择,并发现:保持约束的混合器(XY、畴壁)比标准混合器收敛快约13步(p<10^{-7},d≈0.5),同时免去了惩罚项调参;热启动策略将收敛步数减少45步(p<10^{-127},d=3.35),能量改进超过d=8;条件风险价值(CVaR)优化给出了一个有信息量的负面结果,其能量更差(p<10^{-40},d=1.21)且没有覆盖率收益。内在验证表明,QAOA比贪婪启发式算法好12.6%,比随机选择好80.1%。我们的结果表明,结构化基准能够揭示问题知情的QAOA技术在随机实例中被掩盖的优势。我们发布了全部代码、数据和实验工件,以支持可复现的NISQ算法研究。
摘要:The Quantum Approximate Optimization Algorithm (QAOA) is extensively benchmarked on synthetic random instances such as MaxCut, TSP, and SAT problems, but these lack semantic structure and human interpretability, offering limited insight into performance on real-world problems with meaningful constraints. We introduce Quantum King-Ring Domination (QKRD), a NISQ-scale benchmark derived from chess tactical positions that provides 5,000 structured instances with one-hot constraints, spatial locality, and 10--40 qubit scale. The benchmark pairs human-interpretable coverage metrics with intrinsic validation against classical heuristics, enabling algorithmic conclusions without external oracles. Using QKRD, we systematically evaluate QAOA design choices and find that constraint-preserving mixers (XY, domain-wall) converge approximately 13 steps faster than standard mixers (p<10^{-7}, d\approx0.5) while eliminating penalty tuning, warm-start strategies reduce convergence by 45 steps (p<10^{-127}, d=3.35) with energy improvements exceeding d=8, and Conditional Value-at-Risk (CVaR) optimization yields an informative negative result with worse energy (p<10^{-40}, d=1.21) and no coverage benefit. Intrinsic validation shows QAOA outperforms greedy heuristics by 12.6\% and random selection by 80.1\%. Our results demonstrate that structured benchmarks reveal advantages of problem-informed QAOA techniques obscured in random instances. We release all code, data, and experimental artifacts for reproducible NISQ algorithm research.
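
作为摘要中 CVaR 优化目标的补充说明,下面的极简草图展示 CVaR_α 目标的计算方式:只对采样能量中最低的 α 比例取平均,而非取全部样本的期望。样本能量在此用随机数模拟,仅为说明目标函数本身,并不涉及 QAOA 电路。

```python
# 示意性草图:QAOA 采样能量的 CVaR_alpha 目标。
import numpy as np

def cvar(energies, alpha=0.25):
    e = np.sort(np.asarray(energies))
    k = max(1, int(np.ceil(alpha * len(e))))
    return e[:k].mean()                  # 只平均能量最低的 alpha 比例样本

rng = np.random.default_rng(0)
samples = rng.normal(loc=-2.0, scale=1.0, size=1024)   # 假想的每个比特串能量
print("expectation:", samples.mean(), "CVaR_0.25:", cvar(samples, 0.25))
```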


【14】Task-Driven Kernel Flows: Label Rank Compression and Laplacian Spectral Filtering
标题:任务驱动的核流:标签秩压缩与拉普拉斯谱滤波
链接:https://arxiv.org/abs/2601.00276

作者:Hongxi Li,Chunlin Huang
备注:47 pages;3 figures
摘要:我们提出了宽的L2正则化网络中的特征学习理论,表明监督学习本质上是压缩性的。我们推导出一个核的常微分方程,它预测了一种"注水式"的谱演化,并证明对于任何稳定的稳态,核的秩都以类别数($C$)为界。我们进一步证明SGD噪声同样是低秩的($O(C)$),将动力学限制在与任务相关的子空间内。该框架统一了关于对齐的确定性与随机性两种观点,并将监督学习的低秩本质与自监督学习的高秩、扩张性表示进行了对比。
摘要:We present a theory of feature learning in wide L2-regularized networks showing that supervised learning is inherently compressive. We derive a kernel ODE that predicts a "water-filling" spectral evolution and prove that for any stable steady state, the kernel rank is bounded by the number of classes ($C$). We further demonstrate that SGD noise is similarly low-rank ($O(C)$), confining dynamics to the task-relevant subspace. This framework unifies the deterministic and stochastic views of alignment and contrasts the low-rank nature of supervised learning with the high-rank, expansive representations of self-supervision.


【15】Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing
标题:现代神经形态人工智能:从令牌内处理到令牌间处理
链接:https://arxiv.org/abs/2601.00245

作者:Osvaldo Simeone
摘要:人工智能(AI)的快速发展带来了新的数据处理和生成能力,但也带来了不断升级的能源需求。这一挑战激发了人们对神经形态计算原理的新兴趣,这些原理通过离散和稀疏的激活、循环动力学和非线性反馈来保证类似大脑的效率。事实上,现代人工智能架构越来越多地通过大量量化的激活、状态空间动态和稀疏注意力机制来体现神经形态原则。本文详细阐述了神经形态模型,状态空间模型,和Transformer架构之间的联系,通过镜头的令牌内处理和令牌间处理之间的区别。神经形态AI的大多数早期工作都是基于尖峰神经网络(SNN)进行令牌内处理,即,用于涉及同一向量输入的多个通道或特征的变换,例如图像的像素。相比之下,最近的研究已经探索了如何利用神经形态学原理来设计有效的令牌间处理方法,该方法根据上下文相关性选择性地组合不同的信息元素。这些方法通过实现联想记忆机制,利用状态空间动力学或稀疏自我注意力。除了通过令牌内和令牌间处理的镜头系统地介绍现代神经形态AI模型外,还回顾了神经形态AI模型的训练方法。这些范围从利用并行卷积处理的代理梯度到基于强化学习机制的本地学习规则。
摘要:The rapid growth of artificial intelligence (AI) has brought novel data processing and generative capabilities but also escalating energy requirements. This challenge motivates renewed interest in neuromorphic computing principles, which promise brain-like efficiency through discrete and sparse activations, recurrent dynamics, and non-linear feedback. In fact, modern AI architectures increasingly embody neuromorphic principles through heavily quantized activations, state-space dynamics, and sparse attention mechanisms. This paper elaborates on the connections between neuromorphic models, state-space models, and transformer architectures through the lens of the distinction between intra-token processing and inter-token processing. Most early work on neuromorphic AI was based on spiking neural networks (SNNs) for intra-token processing, i.e., for transformations involving multiple channels, or features, of the same vector input, such as the pixels of an image. In contrast, more recent research has explored how neuromorphic principles can be leveraged to design efficient inter-token processing methods, which selectively combine different information elements depending on their contextual relevance. Implementing associative memorization mechanisms, these approaches leverage state-space dynamics or sparse self-attention. Along with a systematic presentation of modern neuromorphic AI models through the lens of intra-token and inter-token processing, training methodologies for neuromorphic AI models are also reviewed. These range from surrogate gradients leveraging parallel convolutional processing to local learning rules based on reinforcement learning mechanisms.


【16】Unknown Aware AI-Generated Content Attribution
标题:未知感知人工智能生成的内容归因
链接:https://arxiv.org/abs/2601.00218

作者:Ellie Thieu,Jifan Zhang,Haoyue Bai
摘要:真实感生成模型的快速发展使得追溯合成内容的来源变得越来越重要,这要求超越二元的真假检测,转向识别产生给定图像的具体模型。我们研究将目标生成模型(例如OpenAI Dalle 3)的输出与其他来源区分开来的问题,其他来源包括真实图像以及由各种替代模型生成的图像。使用CLIP特征和一个简单的线性分类器(在先前工作中已被证明有效),我们建立了一个强基线,仅利用来自目标模型的有限标注数据和少量已知生成器即可完成目标生成器归因。然而,这个基线很难泛化到更难、未见过的以及新发布的生成器。为了解决这一限制,我们提出了一种约束优化方法,它利用未标注的"野外"数据,即从互联网收集的图像,其中可能包含真实图像、来自未知生成器的输出,甚至来自目标模型本身的样本。所提出的方法鼓励野外样本被分类为非目标,同时显式约束在标注数据上的性能保持较高。实验结果表明,引入野外数据显著提升了在具有挑战性的未见生成器上的归因性能,表明可以有效利用野外的未标注数据来增强开放世界环境下的AI生成内容归因。
摘要 :The rapid advancement of photorealistic generative models has made it increasingly important to attribute the origin of synthetic content, moving beyond binary real or fake detection toward identifying the specific model that produced a given image. We study the problem of distinguishing outputs from a target generative model (e.g., OpenAI Dalle 3) from other sources, including real images and images generated by a wide range of alternative models. Using CLIP features and a simple linear classifier, shown to be effective in prior work, we establish a strong baseline for target generator attribution using only limited labeled data from the target model and a small number of known generators. However, this baseline struggles to generalize to harder, unseen, and newly released generators. To address this limitation, we propose a constrained optimization approach that leverages unlabeled wild data, consisting of images collected from the Internet that may include real images, outputs from unknown generators, or even samples from the target model itself. The proposed method encourages wild samples to be classified as non target while explicitly constraining performance on labeled data to remain high. Experimental results show that incorporating wild data substantially improves attribution performance on challenging unseen generators, demonstrating that unlabeled data from the wild can be effectively exploited to enhance AI generated content attribution in open world settings.
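
下面给出一个极简的示意性草图,说明"带标签损失约束下把野外样本推向非目标类"这一目标如何以惩罚形式组合:这里用线性分类器和手写梯度下降演示,约束通过简单的铰链式惩罚近似处理。特征、标签、阈值 tau 与权重 lam 均为说明性假设,并非论文中约束优化问题的精确求解器。

```python
# 示意性草图:带标签交叉熵 + 野外样本判为非目标 + 带标签损失不超过阈值的惩罚。
import numpy as np

rng = np.random.default_rng(0)
d = 16
X_lab = rng.normal(size=(200, d)); y_lab = (rng.random(200) < 0.5).astype(float)  # 1 = 目标生成器
X_wild = rng.normal(loc=0.3, size=(500, d))                                        # 未标注野外数据

def sigmoid(z): return 1 / (1 + np.exp(-z))
def bce(p, y): return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

w = np.zeros(d); lr, lam, tau = 0.1, 0.5, 0.3
for _ in range(300):
    p_lab, p_wild = sigmoid(X_lab @ w), sigmoid(X_wild @ w)
    loss_lab = bce(p_lab, y_lab)
    penalty_active = loss_lab > tau                        # 约束:带标签损失不超过 tau
    grad_lab = X_lab.T @ (p_lab - y_lab) / len(X_lab)      # 带标签交叉熵的梯度
    grad_wild = X_wild.T @ p_wild / len(X_wild)            # 把野外样本推向非目标(y=0)的梯度
    g = grad_lab + lam * grad_wild + (1.0 if penalty_active else 0.0) * grad_lab
    w -= lr * g

print("labeled BCE:", bce(sigmoid(X_lab @ w), y_lab))
```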


【17】Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings
标题:量化语义嵌入的强化学习不等错误保护
链接:https://arxiv.org/abs/2601.00186

作者:Moirangthem Tiken Singh,Adnan Arif
摘要:本文解决了在带宽受限的通信系统中保持语义的紧迫挑战。我们提出了一种新的强化学习框架,通过自适应重复编码实现逐维的不等错误保护。我们方法的核心是一个复合语义失真度量,它在全局嵌入相似性与实体级保留之间取得平衡,使强化学习代理能够以上下文感知的方式分配保护。实验表明,相比均匀保护取得了统计上显著的增益:在1 dB SNR下,chrF分数提高6.8%,实体保留提升9.3%。我们框架的关键创新在于证明:简单但智能分配的重复编码能够实现细粒度的语义保护,这是LDPC或里德-所罗门等传统编码无法实现的优势。我们的发现挑战了传统的信道编码范式,确立了编码结构必须与语义粒度相匹配。这种方法特别适合带宽稀缺但语义保真度至关重要的边缘计算和物联网场景,为下一代语义感知网络提供了一条实用路径。
摘要:This paper tackles the pressing challenge of preserving semantic meaning in communication systems constrained by limited bandwidth. We introduce a novel reinforcement learning framework that achieves per-dimension unequal error protection via adaptive repetition coding. Central to our approach is a composite semantic distortion metric that balances global embedding similarity with entity-level preservation, empowering the reinforcement learning agent to allocate protection in a context-aware manner. Experiments show statistically significant gains over uniform protection, achieving 6.8% higher chrF scores and 9.3% better entity preservation at 1 dB SNR. The key innovation of our framework is the demonstration that simple, intelligently allocated repetition coding enables fine-grained semantic protection -- an advantage unattainable with conventional codes such as LDPC or Reed-Solomon. Our findings challenge traditional channel coding paradigms by establishing that code structure must align with semantic granularity. This approach is particularly suited to edge computing and IoT scenarios, where bandwidth is scarce, but semantic fidelity is critical, providing a practical pathway for next-generation semantic-aware networks.
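
下面是一个极简的示意性草图,演示"按维度重要性分配不等的重复次数,再经二进制对称信道多数表决解码"的基本机制。重要性分数、量化比特数、翻转概率和总预算均为说明性假设;论文中的分配由强化学习代理基于复合语义失真度量给出,这里仅用按比例分配近似。

```python
# 示意性草图:逐维不等重复编码 + 多数表决解码(BSC 信道)。
import numpy as np

rng = np.random.default_rng(0)
d, bits, p_flip, budget = 8, 8, 0.05, 40
importance = rng.random(d)                    # 假设的逐维重要性分数

# 按重要性成比例分配重复次数,并保证为奇数以便多数表决
reps = np.maximum(1, np.round(budget * importance / importance.sum())).astype(int)
reps += (reps % 2 == 0)

def transmit_bit(b, r):
    noisy = (np.full(r, b) + (rng.random(r) < p_flip)) % 2   # r 次重复,各自独立翻转
    return int(noisy.sum() * 2 > r)                          # 多数表决

x = rng.integers(0, 2, size=(d, bits))        # 量化后的嵌入(每维 bits 个比特)
x_hat = np.array([[transmit_bit(b, reps[i]) for b in row] for i, row in enumerate(x)])
print("per-dimension bit error rate:", (x_hat != x).mean(axis=1))
```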


【18】Exploration in the Limit
标题:极限探索
链接:https://arxiv.org/abs/2601.00084

作者:Brian M. Cho,Nathan Kallus
摘要:在固定置信度的最佳臂识别(BAI)中,目标是在将错误概率控制在期望阈值以下的同时,尽快识别出最优选项。尽管已有大量BAI算法,但现有方法在实际场景中往往不够用,因为严格的精确误差控制需要使用宽松的尾部不等式和/或参数化限制。为克服这些限制,我们引入了一种放宽的表述,只要求相对于某个最小样本量渐近地保证有效的误差控制。这与许多现实场景相符:这些场景通常涉及弱信号、很高的期望显著性水平以及实验后推断的要求,而这些都需要长的时间视野。这使我们能够在更好地处理灵活的非参数结果分布、并充分利用个体层面上下文的同时,达到更紧的最优性。我们构造了一类新的、关于臂索引的渐近任意时刻有效的置信序列,并利用它为我们的渐近框架设计了一种新的BAI算法。我们的方法可以灵活地引入协变量以降低方差,并在完全非参数的设置下保证近似的误差控制。在温和的收敛假设下,我们给出了样本复杂度的渐近界,并证明我们方法的最坏情况样本复杂度与具有精确误差保证和已知方差的高斯BAI的最佳情况样本复杂度相匹配。实验表明,我们的方法在保持误差控制的同时降低了平均样本复杂度。
摘要:In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the plethora of BAI algorithms, existing methods typically fall short in practical settings, as stringent exact error control requires using loose tail inequalities and/or parametric restrictions. To overcome these limitations, we introduce a relaxed formulation that requires valid error control asymptotically with respect to a minimum sample size. This aligns with many real-world settings that often involve weak signals, high desired significance, and post-experiment inference requirements, all of which necessitate long horizons. This allows us to achieve tighter optimality, while better handling flexible nonparametric outcome distributions and fully leveraging individual-level contexts. We develop a novel asymptotic anytime-valid confidence sequences over arm indices, and we use it to design a new BAI algorithm for our asymptotic framework. Our method flexibly incorporates covariates for variance reduction and ensures approximate error control in fully nonparametric settings. Under mild convergence assumptions, we provide asymptotic bounds on the sample complexity and show the worst-case sample complexity of our approach matches the best-case sample complexity of Gaussian BAI under exact error guarantees and known variances. Experiments suggest our approach reduces average sample complexities while maintaining error control.


【19】A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
标题:在最后一英里城市交付系统中平衡运营人力资源工作量的多算法方法
链接:https://arxiv.org/abs/2601.00023

作者:Luis M. Moreno-Saavedra,Silvia Jimenez-Fernandez,Antonio Portilla-Figueras,David Casillas-Perez,Sancho Salcedo-Sanz
摘要:在最后一英里包裹递送系统中,有效地将工作量分配给劳动力至关重要。在这种情况下,基于地理接近度将包裹递送分配给工作人员的传统方法可能效率低下,并且肯定会导致递送工作人员之间的工作量分配不平衡。在本文中,我们着眼于最后一英里城市包裹递送系统的运营人力资源工作量平衡的问题。这个想法是考虑工作量来优化系统,即,优化过程现在侧重于改善交付时间,以便在所有工作人员之间完成工作量平衡。此过程应纠正给定区域中交付工作人员的工作量严重失调。具体来说,我们提出了一个多算法的方法来解决这个问题。所提出的方法将一组交付点和定义数量的工人作为输入,然后将包裹分配给工人,以确保每个工人每天完成类似的工作量。所提出的算法使用的距离和工作量的考虑相结合,以优化分配的包的工人。从这个意义上说,还考虑了交付点与每个工人的位置之间的距离。建议的多算法方法包括不同版本的k-means,进化方法,递归分配的基础上,k-means初始化与不同的问题编码,和一个混合进化集成算法。我们已经说明了所提出的方法在现实世界中的问题,在城市最后一英里的包裹递送劳动力在西班牙阿祖奎卡德埃纳雷斯经营的性能。
摘要:Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geographical proximity can be inefficient and surely guide to an unbalanced workload distribution among delivery workers. In this paper, we look at the problem of operational human resources workload balancing in last-mile urban package delivery systems. The idea is to consider the effort workload to optimize the system, i.e., the optimization process is now focused on improving the delivery time, so that the workload balancing is complete among all the staff. This process should correct significant decompensations in workload among delivery workers in a given zone. Specifically, we propose a multi-algorithm approach to tackle this problem. The proposed approach takes as input a set of delivery points and a defined number of workers, and then assigns packages to workers, in such a way that it ensures that each worker completes a similar amount of work per day. The proposed algorithms use a combination of distance and workload considerations to optimize the allocation of packages to workers. In this sense, the distance between the delivery points and the location of each worker is also taken into account. The proposed multi-algorithm methodology includes different versions of k-means, evolutionary approaches, recursive assignments based on k-means initialization with different problem encodings, and a hybrid evolutionary ensemble algorithm. We have illustrated the performance of the proposed approach in a real-world problem in an urban last-mile package delivery workforce operating at Azuqueca de Henares, Spain.
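
下面是一个极简的示意性草图,演示"在距离与工作量均衡之间折中"的最基础分配思想:每个交付点分配给"距离 + 负载惩罚"最小的工人。坐标、工人数与惩罚系数均为说明性假设;论文中的多算法框架(k-means 变体、进化方法、递归分配与混合集成)并未在此实现,工作量也被简化为包裹数量。

```python
# 示意性草图:距离 + 负载惩罚的贪心分配,用于均衡各工人的包裹数。
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(60, 2))     # 交付点坐标(假设)
workers = rng.uniform(0, 10, size=(5, 2))     # 工人起始位置(假设)
load = np.zeros(len(workers))
assign = np.full(len(points), -1)

for j in rng.permutation(len(points)):
    dist = np.linalg.norm(workers - points[j], axis=1)
    cost = dist + 2.0 * load                  # 距离 + 负载惩罚(系数为假设)
    k = int(np.argmin(cost))
    assign[j] = k
    load[k] += 1

print("packages per worker:", np.bincount(assign, minlength=len(workers)))
```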


【20】Neural Minimum Weight Perfect Matching for Quantum Error Codes
标题:量子纠错码的神经最小权重完美匹配
链接:https://arxiv.org/abs/2601.00242

作者:Yotam Peled,David Zenati,Eliya Nachmani
摘要:要发挥量子计算的全部潜力,需要量子纠错(QEC)。QEC通过将逻辑信息编码到冗余的物理量子比特上来降低错误率,从而使错误能够被检测和纠正。用于此任务的常见解码器是最小权重完美匹配(MWPM),这是一种基于图的算法,它依赖边权重来识别最可能的错误链。在这项工作中,我们提出了一种数据驱动的解码器,名为神经最小权重完美匹配(NMWPM)。我们的解码器采用混合架构,集成图神经网络(GNN)提取局部综合征特征,并用Transformer捕获长程全局依赖关系,然后利用二者为MWPM解码器预测动态边权重。为了便于通过不可微的MWPM算法进行训练,我们构造了一种新的代理损失函数,实现端到端优化。我们的结果表明,相对于标准基线,逻辑错误率(LER)显著降低,凸显了将神经网络的预测能力与经典匹配的算法结构相结合的混合解码器的优势。
摘要:Realizing the full potential of quantum computation requires Quantum Error Correction (QEC). QEC reduces error rates by encoding logical information across redundant physical qubits, enabling errors to be detected and corrected. A common decoder used for this task is Minimum Weight Perfect Matching (MWPM) a graph-based algorithm that relies on edge weights to identify the most likely error chains. In this work, we propose a data-driven decoder named Neural Minimum Weight Perfect Matching (NMWPM). Our decoder utilizes a hybrid architecture that integrates Graph Neural Networks (GNNs) to extract local syndrome features and Transformers to capture long-range global dependencies, which are then used to predict dynamic edge weights for the MWPM decoder. To facilitate training through the non-differentiable MWPM algorithm, we formulate a novel proxy loss function that enables end-to-end optimization. Our findings demonstrate significant performance reduction in the Logical Error Rate (LER) over standard baselines, highlighting the advantage of hybrid decoders that combine the predictive capabilities of neural networks with the algorithmic structure of classical matching.
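
下面的极简示意性草图演示"动态边权 -> 最小权完美匹配"这一步:给定(假设由神经网络预测的)边权,用 networkx 在综合征缺陷节点上求最小权完美匹配(通过对负权重取最大权匹配实现)。边权在此用随机数代替 GNN+Transformer 的输出,节点数也只是示例。

```python
# 示意性草图:学习到的边权 -> MWPM 配对综合征缺陷。
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
defects = list(range(6))                            # 偶数个综合征缺陷节点(示例)
G = nx.Graph()
for i in defects:
    for j in defects:
        if i < j:
            w = float(rng.uniform(0.1, 1.0))        # 假设:神经网络输出的动态边权
            G.add_edge(i, j, weight=-w)             # 权重取负,用最大权匹配等价求最小权完美匹配

matching = nx.max_weight_matching(G, maxcardinality=True)
print("matched pairs:", sorted(tuple(sorted(p)) for p in matching))
```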


【21】Automated electrostatic characterization of quantum dot devices in single- and bilayer heterostructures
标题:单层和双层异质结构中量子点器件的自动化静电表征
链接:https://arxiv.org/abs/2601.00067

作者:Merritt P. R. Losert,Dario Denora,Barnaby van Straaten,Michael Chan,Stefan D. Oosterhout,Lucas Stehouwer,Giordano Scappucci,Menno Veldhorst,Justyna P. Zwolak
备注:18 pages, 12 figures
摘要:随着基于量子点(QD)的自旋量子比特向更大、更复杂的器件架构发展,快速、自动化的器件表征与数据分析工具变得至关重要。电荷稳定性图(CSD)中跃迁线的取向和间距携带了QD器件电容环境的指纹,使这类测量成为器件表征的有用工具。然而,手动解读这些特征耗时、易错,且在规模化时不切实际。在这里,我们提出了一个从CSD中提取底层电容特性的自动化流程。我们的方法整合了机器学习、图像处理和目标检测,可以在无需人工标注的情况下,在大型数据集上识别并跟踪电荷跃迁。我们使用来自应变锗单量子阱(平面)和应变锗双量子阱(双层)量子点器件的实验测量数据演示了该方法。与平面QD器件不同,双层锗异质结构中的CSD表现出更丰富的跃迁集合,包括层间隧穿以及垂直堆叠QD各自不同的装载线,使其成为自动化方法的有力试验平台。通过分析大量CSD的特性,我们可以统计地估计物理相关的量,例如相对杠杆臂和电容耦合。因此,我们的流程能够快速提取关于QD器件的有用且非平凡的信息。
摘要:As quantum dot (QD)-based spin qubits advance toward larger, more complex device architectures, rapid, automated device characterization and data analysis tools become critical. The orientation and spacing of transition lines in a charge stability diagram (CSD) contain a fingerprint of a QD device's capacitive environment, making these measurements useful tools for device characterization. However, manually interpreting these features is time-consuming, error-prone, and impractical at scale. Here, we present an automated protocol for extracting underlying capacitive properties from CSDs. Our method integrates machine learning, image processing, and object detection to identify and track charge transitions across large datasets without manual labeling. We demonstrate this method using experimentally measured data from a strained-germanium single-quantum-well (planar) and a strained-germanium double-quantum-well (bilayer) QD device. Unlike for planar QD devices, CSDs in bilayer germanium heterostructure exhibit a larger set of transitions, including interlayer tunneling and distinct loading lines for the vertically stacked QDs, making them a powerful testbed for automation methods. By analyzing the properties of many CSDs, we can statistically estimate physically relevant quantities, like relative lever arms and capacitive couplings. Thus, our protocol enables rapid extraction of useful, nontrivial information about QD devices.


【22】Group Cross-Correlations with Faintly Constrained Filters
标题:具有弱约束滤波器的群互相关
链接:https://arxiv.org/abs/2601.00045

作者:Benedikt Fluhr
备注:25 pages + 9 pages appendices, 1 figure, comments welcome
摘要:我们给出了群互相关的一个概念,其中相应的滤波器不像以往文献中那样受到严格约束。这解决了以往约束在具有非紧稳定子的群作用下存在的不相容性。此外,我们将以往的结果推广到不一定传递的群作用,并弱化了常见的幺模性假设。
摘要:We provide a notion of group cross-correlations, where the associated filter is not as tightly constrained as in the previous literature. This resolves an incompatibility previous constraints have for group actions with non-compact stabilizers. Moreover, we generalize previous results to group actions that are not necessarily transitive, and we weaken the common assumption of unimodularity.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
