Click "Read the original" to visit arxivdaily.com, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and more!
cs.LG: 181 papers today
LLM-related (25 papers)
【1】TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories
Link: https://arxiv.org/abs/2604.07223
Authors: Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen
Abstract: As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to intermediate execution traces. While safety guardrails are well-benchmarked for natural language responses, their efficacy remains largely unexplored within multi-step tool-use trajectories. To address this gap, we introduce TraceSafe-Bench, the first comprehensive benchmark specifically designed to assess mid-trajectory safety. It encompasses 12 risk categories, ranging from security threats (e.g., prompt injection, privacy leaks) to operational failures (e.g., hallucinations, interface inconsistencies), featuring over 1,000 unique execution instances. Our evaluation of 13 LLM-as-a-guard models and 7 specialized guardrails yields three critical findings: 1) Structural Bottleneck: Guardrail efficacy is driven more by structural data competence (e.g., JSON parsing) than semantic safety alignment. Performance correlates strongly with structured-to-text benchmarks ($ρ=0.79$) but shows near-zero correlation with standard jailbreak robustness. 2) Architecture over Scale: Model architecture influences risk detection performance more significantly than model size, with general-purpose LLMs consistently outperforming specialized safety guardrails in trajectory analysis. 3) Temporal Stability: Accuracy remains resilient across extended trajectories. Increased execution steps allow models to pivot from static tool definitions to dynamic execution behaviors, actually improving risk detection performance in later stages. Our findings suggest that securing agentic workflows requires jointly optimizing for structural reasoning and safety alignment to effectively mitigate mid-trajectory risks.
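The reported $ρ=0.79$ is a rank correlation. As a minimal sketch (with made-up per-model scores, not the paper's data), Spearman's ρ for tie-free data is just the Pearson correlation of the ranks:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data: Pearson correlation of ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical per-model scores: guardrail trajectory-safety accuracy vs. a
# structured-to-text benchmark. These two lists happen to rank the five
# models identically, so rho = 1.0.
guard_scores = np.array([0.62, 0.71, 0.55, 0.80, 0.68])
struct_bench = np.array([0.58, 0.74, 0.50, 0.85, 0.65])
rho = spearman_rho(guard_scores, struct_bench)
```

A perfectly reversed ranking would give ρ = -1; the paper's 0.79 sits between these extremes.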
【2】The ATOM Report: Measuring the Open Language Model Ecosystem
Link: https://arxiv.org/abs/2604.07190
Authors: Nathan Lambert, Florian Brand
Comments: 23 pages, 17 figures
Abstract: We present a comprehensive adoption snapshot of the leading open language models and who is building them, focusing on the ~1.5K mainline open models from the likes of Alibaba's Qwen, DeepSeek, and Meta's Llama, which form the foundation of an ecosystem crucial to researchers, entrepreneurs, and policy advisors. We document a clear trend in which Chinese models overtook their U.S.-built counterparts in the summer of 2025 and subsequently widened the gap over their Western counterparts. We study a mix of Hugging Face downloads and model derivatives, inference market share, performance metrics, and more to paint a comprehensive picture of the ecosystem.
【3】Improving Semantic Uncertainty Quantification in Language Model Question-Answering via Token-Level Temperature Scaling
Link: https://arxiv.org/abs/2604.07172
Authors: Tom A. Lamb, Desi R. Ivanova, Philip H. S. Torr, Tim G. J. Rudner
Abstract: Calibration is central to reliable semantic uncertainty quantification, yet prior work has largely focused on discrimination, neglecting calibration. As calibration and discrimination capture distinct aspects of uncertainty, focusing on discrimination alone yields an incomplete picture. We address this gap by systematically evaluating both aspects across a broad set of confidence measures. We show that current approaches, particularly fixed-temperature heuristics, produce systematically miscalibrated and poorly discriminative semantic confidence distributions. We demonstrate that optimising a single scalar temperature, which, we argue, provides a suitable inductive bias, is a surprisingly simple yet effective solution. Our exhaustive evaluation confirms that temperature scaling consistently improves semantic calibration, discrimination, and downstream entropy, outperforming both heuristic baselines and more expressive token-level recalibration methods on question-answering tasks.
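Temperature scaling itself is simple to sketch. The following is a minimal numpy illustration (not the authors' code): a single scalar $T$ is fit on held-out logits and labels by minimizing negative log-likelihood, approximately undoing the factor by which toy "overconfident" logits were inflated:

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood under temperature-scaled softmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)                      # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Fit the single scalar T on held-out data by 1-D grid search."""
    grid = np.linspace(0.05, 5.0, 400)
    return min(grid, key=lambda T: nll(logits, labels, T))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup: labels are drawn from softmax(true_logits); the "model" then
# reports logits inflated 3x, i.e. it is overconfident. The fitted T should
# land near 3, undoing the inflation.
rng = np.random.default_rng(0)
true_logits = rng.normal(size=(500, 10))
labels = np.array([rng.choice(10, p=softmax(z)) for z in true_logits])
T_star = fit_temperature(3.0 * true_logits, labels)
```

In practice the same idea is applied to a model's validation-set logits, and $T$ is then reused unchanged at test time.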
【4】Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing
Link: https://arxiv.org/abs/2604.07148
Authors: Ning Yang, Chuangxin Cheng, Haijun Zhang
Abstract: Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
【5】EVGeoQA: Benchmarking LLMs on Dynamic, Multi-Objective Geo-Spatial Exploration
Link: https://arxiv.org/abs/2604.07070
Authors: Jianfei Wu, Zhichun Wang, Zhensheng Wang, Zhiyu He
Abstract: While Large Language Models (LLMs) demonstrate remarkable reasoning capabilities, their potential for purpose-driven exploration in dynamic geo-spatial environments remains under-investigated. Existing Geo-Spatial Question Answering (GSQA) benchmarks predominantly focus on static retrieval, failing to capture the complexity of real-world planning that involves dynamic user locations and compound constraints. To bridge this gap, we introduce EVGeoQA, a novel benchmark built upon Electric Vehicle (EV) charging scenarios that features a distinct location-anchored and dual-objective design. Specifically, each query in EVGeoQA is explicitly bound to a user's real-time coordinate and integrates the dual objectives of a charging necessity and a co-located activity preference. To systematically assess models in such complex settings, we further propose GeoRover, a general evaluation framework based on a tool-augmented agent architecture to evaluate the LLMs' capacity for dynamic, multi-objective exploration. Our experiments reveal that while LLMs successfully utilize tools to address sub-tasks, they struggle with long-range spatial exploration. Notably, we observe an emergent capability: LLMs can summarize historical exploration trajectories to enhance exploration efficiency. These findings establish EVGeoQA as a challenging testbed for future geo-spatial intelligence. The dataset and prompts are available at https://github.com/Hapluckyy/EVGeoQA/.
【6】ReDAct: Uncertainty-Aware Deferral for LLM Agents
Link: https://arxiv.org/abs/2604.07036
Authors: Dzianis Piatrashyn, Nikita Kotelevskii, Kirill Grishchenkov, Nikita Glazkov, Ivan Nasonov, Ilya Makarov, Timothy Baldwin, Preslav Nakov, Roman Vashurin, Maxim Panov
Abstract: Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
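The deferral rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: predictive entropy stands in for the uncertainty measure, and the threshold is calibrated on held-out small-model action distributions so that roughly a target fraction (here 15%) of decisions is deferred:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a categorical distribution (or batch of them)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def calibrate_threshold(held_out_probs, target_rate=0.15):
    """Entropy threshold that defers ~target_rate of held-out decisions."""
    return np.quantile(entropy(held_out_probs), 1.0 - target_rate)

def act(small_probs, large_action_fn, threshold):
    """Use the small model's argmax action unless its entropy exceeds the threshold."""
    if entropy(small_probs) > threshold:
        return large_action_fn()          # defer to the large model
    return int(np.argmax(small_probs))

# Stand-in for held-out small-model action distributions over 5 actions.
rng = np.random.default_rng(1)
held_out = rng.dirichlet(np.ones(5) * 0.5, size=1000)
tau = calibrate_threshold(held_out, target_rate=0.15)
deferred = float(np.mean(entropy(held_out) > tau))
```

By construction, about 15% of held-out decisions land above `tau`; at deployment time, only those high-entropy states pay the large model's cost.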
【7】Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models
Link: https://arxiv.org/abs/2604.06767
Authors: Marshall Brett
Comments: 20 pages
Abstract: Language models operate on discrete tokens but compute in continuous vector spaces, inducing a Voronoi tessellation over the representation manifold. We study this tessellation empirically on Qwen3.5-4B-Base, making two contributions. First, using float32 margin recomputation to resolve bfloat16 quantization artifacts, we validate Mabrok's (2026) linear scaling law of the expressibility gap with $R^2$ = 0.9997 - the strongest confirmation to date - and identify a mid-layer geometric ambiguity regime where margin geometry is anti-correlated with cross-entropy (layers 24-28, $ρ$ = -0.29) before crystallizing into alignment at the final layer ($ρ$ = 0.836). Second, we show that the Voronoi tessellation of a converged model is reshapable through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. We compare direct margin maximization against Fisher information distance maximization across a dose-response sweep. Both methods find the same ceiling of ~16,300 correctable positions per 256K evaluated, but differ critically in collateral damage. Margin maximization damage escalates with intervention strength until corrections are overwhelmed. Fisher damage remains constant at ~5,300 positions across the validated range ($λ$ = 0.15-0.6), achieving +28% median margin improvement at $λ$ = 0.6 with invariant downstream benchmarks - a geometric reorganization that compresses the expressibility gap while preserving its scaling law. However, frequency and token-class audits reveal that gains concentrate in high-frequency structural tokens (84% of net corrections at $λ$ = 0.6), with content and entity-like contributions shrinking at higher $λ$. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.
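The token-decision margin studied here (top logit minus runner-up, recomputed in float32) can be illustrated with a small numpy sketch; the random `unembed` and `hidden` arrays below are stand-ins, not the paper's model:

```python
import numpy as np

def token_margins(hidden, unembed):
    """Per-position token-decision margin: top logit minus runner-up logit,
    computed in float32. Small margins flag positions near a boundary of the
    Voronoi tessellation induced by the unembedding rows."""
    logits = hidden.astype(np.float32) @ unembed.T.astype(np.float32)
    top2 = np.sort(logits, axis=-1)[:, -2:]       # two largest logits per row
    return top2[:, 1] - top2[:, 0]

# Deterministic check: with a 3-token one-hot unembedding, logits equal the
# hidden coordinates, so the margin is the gap between the two largest entries.
margin = token_margins(np.array([[2.0, 1.0, 0.0]]), np.eye(3))

# Random stand-ins for a model's hidden states (8 positions, d=16) and an
# unembedding matrix (vocab=100).
rng = np.random.default_rng(2)
margins = token_margins(rng.normal(size=(8, 16)), rng.normal(size=(100, 16)))
```

Margins are nonnegative by construction; positions with margins near the bfloat16 quantization step are exactly those where float32 recomputation matters.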
【8】Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start
Link: https://arxiv.org/abs/2604.06664
Authors: Xueshen Liu, Yongji Wu, Yuncheng Yao, Danyang Zhuo, Ion Stoica, Z. Morley Mao
Abstract: Modern LLM service providers increasingly rely on autoscaling and parallelism reconfiguration to respond to rapidly changing workloads, but cold-start latency remains a major bottleneck. While recent systems have reduced model weight loading to seconds, CUDA graph capture still takes tens of seconds to minutes and often dominates startup. Unfortunately, CUDA graphs cannot be naively serialized: beyond graph topology, they are tightly coupled to execution context, including device addresses embedded in kernel arguments and kernel code lazily loaded during warmup. Existing approaches either rely on brittle kernel-specific patching or heavyweight process-level checkpoint/restore that are inflexible to dynamic parallelism switching. We present Foundry, a template-based CUDA graph context materialization system that persists both graph topology and execution context during an offline processing stage, and reconstructs executable graphs online with negligible overhead. Foundry enforces deterministic memory layouts, automatically extracts and reloads kernel binaries required by captured graphs, and reduces online reconstruction costs through topology-based templating. For distributed serving, Foundry further enables a single-GPU offline capture to generate templates for multi-GPU deployments by patching only rank-dependent communication state. Across dense and MoE models up to 235B parameters, Foundry reduces cold-start latency by up to 99%, cutting the initialization time of Qwen3-235B-A22B from 10 minutes to 3.9 seconds while preserving the throughput gains of CUDA graphs.
【9】SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
Link: https://arxiv.org/abs/2604.06636
Authors: Zhengyang Ai, Zikang Shan, Xiaodong Ai, Jingxian Tang, Hangkai Hu, Pinyan Lu
Comments: ACL 2026 Main
Abstract: Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, leading to limited reasoning capabilities and unresolved token inefficiency. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as a trajectory through a state space of empirical solvability. SHAPE introduces a hierarchical credit assignment mechanism: at the segment level, it employs a stage-aware advantage function to prioritize efficient breakthroughs in low-potential states; at the token level, it utilizes entropy-driven redistribution to sharpen execution signals. Extensive experiments in math reasoning across three base models and five benchmarks demonstrate that SHAPE achieves an average accuracy gain of 3% with 30% reduced token consumption.
【10】LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources
Link: https://arxiv.org/abs/2604.06571
Authors: Joshua Castillo, Ravi Mukkamala
Comments: 9 pages, 6 figures. Accepted at International Conference on Intelligent Digitization of Systems and Services (IDSS 2026)
Abstract: Missing-person and child-safety investigations rely on heterogeneous case documents, including structured forms, bulletin-style posters, and narrative web profiles. Variations in layout, terminology, and data quality impede rapid triage, large-scale analysis, and search-planning workflows. This paper introduces the Guardian Parser Pack, an AI-driven parsing and normalization pipeline that transforms multi-source investigative documents into a unified, schema-compliant representation suitable for operational review and downstream spatial modeling. The proposed system integrates (i) multi-engine PDF text extraction with Optical Character Recognition (OCR) fallback, (ii) rule-based source identification with source-specific parsers, (iii) schema-first harmonization and validation, and (iv) an optional Large Language Model (LLM)-assisted extraction pathway incorporating validator-guided repair and shared geocoding services. We present the system architecture, key implementation decisions, and output design, and evaluate performance using both gold-aligned extraction metrics and corpus-level operational indicators. On a manually aligned subset of 75 cases, the LLM-assisted pathway achieved substantially higher extraction quality than the deterministic comparator (F1 = 0.8664 vs. 0.2578), while across 517 parsed records per pathway it also improved aggregate key-field completeness (96.97% vs. 93.23%). The deterministic pathway remained much faster (mean runtime 0.03 s/record vs. 3.95 s/record for the LLM pathway). In the evaluated run, all LLM outputs passed initial schema validation, so validator-guided repair functioned as a built-in safeguard rather than a contributor to the observed gains. These results support controlled use of probabilistic AI within a schema-first, auditable pipeline for high-stakes investigative settings.
【11】The Illusion of Stochasticity in LLMs
Link: https://arxiv.org/abs/2604.06543
Authors: Xiangming Gu, Soham De, Michalis Titsias, Larisa Markeeva, Petar Veličković, Razvan Pascanu
Comments: Under review
Abstract: In this work, we demonstrate that reliable stochastic sampling is a fundamental yet unfulfilled requirement for Large Language Models (LLMs) operating as agents. Agentic systems are frequently required to sample from distributions, often inferred from observed data, a process which needs to be emulated by the LLM. This leads to a distinct failure point: while standard RL agents rely on external sampling mechanisms, LLMs fail to map their internal probability estimates to their stochastic outputs. Through rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions, we demonstrate the extent of this failure. Crucially, we show that while powerful frontier models can convert provided random seeds to target distributions, their ability to sample directly from specific distributions is fundamentally flawed.
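One way to quantify such sampling failures, sketched here under assumptions (this is an illustration of the evaluation idea, not the paper's exact protocol), is the total variation distance between the requested distribution and the empirical frequencies of a sampler's outputs:

```python
import numpy as np

def total_variation(target, counts):
    """TV distance between a target categorical distribution and the
    empirical frequencies implied by outcome counts."""
    emp = counts / counts.sum()
    return 0.5 * np.abs(target - emp).sum()

target = np.array([0.5, 0.3, 0.2])     # the distribution the agent is asked to sample

# A faithful external sampler: draws actually follow the target distribution.
rng = np.random.default_rng(3)
draws = rng.choice(3, size=10_000, p=target)
tv_good = total_variation(target, np.bincount(draws, minlength=3))

# A mode-collapsed "sampler" that mostly emits the most likely outcome,
# mimicking the failure mode the abstract describes for direct LLM sampling.
tv_bad = total_variation(target, np.array([9000, 700, 300]))
```

A faithful sampler's TV distance shrinks toward 0 as the sample grows, while a mode-collapsed sampler stays far from the target no matter how many draws are taken.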
【12】VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts
Link: https://arxiv.org/abs/2604.06502
Authors: Peigui Qi, Kunsheng Tang, Yanpu Yu, Jialin Wu, Yide Song, Wenbo Zhou, Zhicong Huang, Cheng Hong, Weiming Zhang, Nenghai Yu
Abstract: Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks due to weakened alignment during visual integration. Existing defenses suffer from limited efficiency and robustness. To address these challenges, we first propose the Multimodal Aggregated Feature Extraction (MAFE) framework that enables CLIP to handle long text and fuse multimodal information into unified representations. Through empirical analysis of MAFE-extracted features, we discover distinct distributional patterns between benign and malicious prompts. Building upon this finding, we develop VLMShield, a lightweight safety detector that efficiently identifies multimodal malicious attacks as a plug-and-play solution. Extensive experiments demonstrate superior performance across multiple dimensions, including robustness, efficiency, and utility. Through our work, we hope to pave the way for more secure multimodal AI deployment. Code is available at https://github.com/pgqihere/VLMShield.
【13】Distributed Interpretability and Control for Large Language Models
Link: https://arxiv.org/abs/2604.06483
Authors: Dev Arpan Desai, Shaoyi Huang, Zining Zhu
Abstract: Large language models that require multiple GPU cards to host are usually the most capable models. It is necessary to understand and steer these models, but the current technologies do not support the interpretability and steering of these models in the multi-GPU setting as well as the single-GPU setting. We present a practical implementation of activation-level interpretability (logit lens) and steering (steering vector) that scales up to multi-GPU language models. Our system implements design choices that reduce the activation memory by up to 7x and increase the throughput by up to 41x compared to a baseline on identical hardware. We demonstrate the method across LLaMA-3.1 (8B, 70B) and Qwen-3 (4B, 14B, 32B), sustaining 20-100 tokens/s while collecting full layer-wise activation trajectories for sequences of 1,500 tokens. Using label-position steering vectors injected post-LayerNorm, we show controllable, monotonic shifts in model outputs with a mean steerability slope of 0.702 across evaluated datasets, without fine-tuning or additional forward passes. We release detailed benchmarks, ablations, and a reproducible instrumentation recipe to enable practical interpretability and real-time behavioral control for frontier LLMs at https://github.com/Devdesai1901/LogitLense.
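The logit-lens readout itself is straightforward to sketch in numpy. This is a simplified stand-in (parameter-free LayerNorm, random unembedding), not the paper's multi-GPU system: each layer's hidden state is normalized and projected through the unembedding to obtain a per-layer next-token distribution:

```python
import numpy as np

def logit_lens(hidden_states, unembed, eps=1e-5):
    """Project each layer's hidden state through a parameter-free LayerNorm
    and the unembedding matrix, yielding one next-token distribution per layer."""
    readouts = []
    for h in hidden_states:                                   # one vector per layer
        h = (h - h.mean()) / np.sqrt(h.var() + eps)           # LayerNorm, no learned scale/shift
        logits = unembed @ h
        z = logits - logits.max()                             # numerically stable softmax
        p = np.exp(z)
        readouts.append(p / p.sum())
    return np.stack(readouts)                                 # shape (n_layers, vocab)

# Random stand-ins for an unembedding matrix (vocab=50, d=32) and the
# hidden states of 6 layers at a single token position.
rng = np.random.default_rng(4)
unembed = rng.normal(size=(50, 32))
layer_states = [rng.normal(size=32) for _ in range(6)]
dists = logit_lens(layer_states, unembed)
```

Tracking how `dists` sharpens across layers is the basis of the activation trajectories the paper collects; in a real model, `unembed` and the final norm's learned parameters come from the checkpoint.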
【14】The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
Link: https://arxiv.org/abs/2604.06427
Authors: Yi Xu, Philipp Jettkant, Laura Ruis
Comments: 10 pages, 3 figures, 1 table (30 pages, 9 figures, 10 tables including references and appendices)
Abstract: The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is known about the limits of such latent reasoning in LLMs. We test these limits by studying whether models can discover multi-step planning strategies without supervision on intermediate steps and execute them latently, within a single forward pass. Using graph path-finding tasks that precisely control the number of required latent planning steps, we uncover a striking limitation unresolved by massive scaling: tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under few-shot prompting. Although the maximum latent planning depth models can learn during training is five, the discovered strategy generalizes up to eight latent steps at test-time. This reveals a dissociation between the ability to discover a latent strategy under final-answer supervision alone and the ability to execute it once discovered. If similar limits hold more broadly, strategies requiring multiple coordinated latent planning steps may need to be explicitly taught or externalized, lending credence to CoT monitoring.
【15】Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries
Link: https://arxiv.org/abs/2604.06416
Authors: Rebecca M. M. Hicke, Sil Hamilton, David Mimno, Ross Deans Kristensen-McLachlan
Abstract: Although LLM context lengths have grown, there is evidence that their ability to integrate information across long-form texts has not kept pace. We evaluate one such understanding task: generating summaries of novels. When human authors of summaries compress a story, they reveal what they consider narratively important. Therefore, by comparing human and LLM-authored summaries, we can assess whether models mirror human patterns of conceptual engagement with texts. To measure conceptual engagement, we align sentences from 150 human-written novel summaries with the specific chapters they reference. We demonstrate the difficulty of this alignment task, which indicates the complexity of summarization as a task. We then generate and align additional summaries by nine state-of-the-art LLMs for each of the 150 reference texts. Comparing the human and model-authored summaries, we find both stylistic differences between the texts and differences in how humans and LLMs distribute their focus throughout a narrative, with models emphasizing the ends of texts. Comparing human narrative engagement with model attention mechanisms suggests explanations for degraded narrative comprehension and targets for future development. We release our dataset to support future research.
【16】ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
Link: https://arxiv.org/abs/2604.06401
Authors: Kranthi Kommuru, Kunal Khanvilkar, Gaurav Parekh
Abstract: Large language models (LLMs) can produce persuasive arguments in mathematical and logical domains, but such arguments often contain subtle missteps: omitted side conditions, invalid inference patterns, or appeals to lemmas that do not follow logically from the context at hand. These slips are notoriously hard to notice from the text alone, since even a misconstrued construction can still appear mostly correct. Conversely, interactive theorem provers like Lean and Coq offer rigorous reliability: a statement is accepted only if it passes every syntactic and semantic check performed by a small trusted kernel. Although this technique provides strong guarantees, it comes at quite a heavy price: the proof must be completely formalized, and the user or an auxiliary search program must supply an avalanche of low-level detail. This paper presents a hybrid pipeline where an LLM generates a typed proof sketch in a compact DSL and a lightweight trusted kernel expands the sketch into explicit proof obligations.
【17】The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models
Link: https://arxiv.org/abs/2604.06374
Authors: Michael Rizvi-Martel, Guillaume Rabusseau, Marius Mosbach
Comments: 9 pages
Abstract: Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: i) pretraining on natural language data biases models to commit to a token in the last layers, and ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
【18】FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs
Link: https://arxiv.org/abs/2604.06297
Authors: Syed Irfan Ali Meerza,Feiyi Wang,Jian Liu
Abstract: Given the growing reliance on private data in training Large Language Models (LLMs), Federated Learning (FL) combined with Parameter-Efficient Fine-Tuning (PEFT) has garnered significant attention for enhancing privacy and efficiency. Despite FL's privacy benefits, prior studies have shown that private data can still be extracted from shared gradients. However, these studies, mainly on full-parameter model training, are limited to reconstructing small batches, short input sequences, and specific model architectures, such as encoder-based or decoder-based models. The reconstruction quality becomes even worse when dealing with gradients from PEFT methods. To fully understand the practical attack surface of federated LLMs, this paper proposes FedSpy-LLM, a scalable and generalizable data reconstruction attack designed to reconstruct training data with larger batch sizes and longer sequences while generalizing across diverse model architectures, even when PEFT methods are deployed for training. At the core of FedSpy-LLM is a novel gradient decomposition strategy that exploits the rank deficiency and subspace structure of gradients, enabling efficient token extraction while preserving key signal components at scale. This approach further mitigates the reconstruction challenges introduced by PEFT's substantial null space, ensuring robustness across encoder-based, decoder-based, and encoder-decoder model architectures. Additionally, by iteratively aligning each token's partial-sequence gradient with the full-sequence gradient, FedSpy-LLM ensures accurate token ordering in reconstructed sequences.
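The leakage principle such attacks build on can be illustrated on an embedding lookup, where the gradient of the loss with respect to the embedding table has nonzero rows only at the indices of tokens that actually occurred in the batch. This is a minimal sketch of that general structure, not FedSpy-LLM's decomposition:

```python
# Accumulate dL/dE for an embedding lookup E[token_ids]: each used token id
# receives its upstream gradient; unused rows stay exactly zero.
def embedding_gradient(vocab_size, dim, token_ids, upstream_grads):
    grad = [[0.0] * dim for _ in range(vocab_size)]
    for tid, g in zip(token_ids, upstream_grads):
        for d in range(dim):
            grad[tid][d] += g[d]
    return grad

def extract_tokens(grad, tol=1e-12):
    """Recover the set of token ids whose gradient rows are nonzero."""
    return sorted(i for i, row in enumerate(grad) if any(abs(x) > tol for x in row))

# Hypothetical batch: tokens 4 and 7 appear; an eavesdropper on the shared
# gradient recovers exactly that set.
g = embedding_gradient(vocab_size=10, dim=3,
                       token_ids=[4, 7, 4],
                       upstream_grads=[[0.1, 0.0, -0.2]] * 3)
leaked = extract_tokens(g)  # -> [4, 7]
```

The rank deficiency the paper exploits is a subtler, subspace-level version of this: the gradient of a full layer is a sum of a small number of outer products, one per token position.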
【19】AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
Link: https://arxiv.org/abs/2604.06296
Authors: Wenyue Hua,Sripad Karne,Qian Xie,Armaan Agrawal,Nikos Pagonas,Kostis Kaffes,Tianyi Peng
Comments: 21 pages, 1 figure
Abstract: AI agents are increasingly deployed in real-world applications, including systems such as Manus, OpenClaw, and coding agents. Existing research has primarily focused on \emph{server-side} efficiency, proposing methods such as caching, speculative execution, traffic scheduling, and load balancing to reduce the cost of serving agentic workloads. However, as users increasingly construct agents by composing local tools, remote APIs, and diverse models, an equally important optimization problem arises on the client side. Client-side optimization asks how developers should allocate the resources available to them, including model choice, local tools, and API budget across pipeline stages, subject to application-specific quality, cost, and latency constraints. Because these objectives depend on the task and deployment setting, they cannot be determined by server-side systems alone. We introduce AgentOpt, the first framework-agnostic Python package for client-side agent optimization. We first study model selection, a high-impact optimization lever in multi-step agent pipelines. Given a pipeline and a small evaluation set, the goal is to find the most cost-effective assignment of models to pipeline roles. This problem is consequential in practice: at matched accuracy, the cost gap between the best and worst model combinations can reach 13--32$\times$ in our experiments. To efficiently explore the exponentially growing combination space, AgentOpt implements eight search algorithms, including Arm Elimination, Epsilon-LUCB, Threshold Successive Elimination, and Bayesian Optimization. Across four benchmarks, Arm Elimination recovers near-optimal accuracy while reducing evaluation budget by 24--67\% relative to brute-force search on three of four tasks. Code and benchmark results are available at https://agentoptimizer.github.io/agentopt/.
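The arm-elimination idea can be sketched as a bandit over model-to-role combinations: repeatedly evaluate each surviving combination and discard those whose empirical mean accuracy falls well below the current best. The combination names and accuracies below are invented for illustration:

```python
import random

def arm_elimination(true_acc, rounds=5, pulls_per_round=200, gap=0.15, seed=0):
    """Toy arm elimination: each 'pull' is one noisy Bernoulli evaluation."""
    rng = random.Random(seed)
    samples = {a: [] for a in true_acc}
    for _ in range(rounds):
        for a in list(samples):
            samples[a].extend(rng.random() < true_acc[a]
                              for _ in range(pulls_per_round))
        means = {a: sum(v) / len(v) for a, v in samples.items()}
        best = max(means.values())
        for a in list(samples):          # drop arms far below the leader
            if len(samples) > 1 and means[a] < best - gap:
                del samples[a]
    means = {a: sum(v) / len(v) for a, v in samples.items()}
    return max(means, key=means.get)

# Hypothetical model-to-role combinations and their (unknown) task accuracies.
combos = {"small-model-everywhere": 0.55, "mixed-a": 0.70, "mixed-b": 0.85}
winner = arm_elimination(combos)
```

The budget savings reported in the paper come from the same mechanism: clearly inferior combinations stop consuming evaluations long before a brute-force sweep would finish.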
【20】TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
Link: https://arxiv.org/abs/2604.06291
Authors: Lin Mu,Haiyang Wang,Li Ni,Lei Sang,Zhize Wu,Peiquan Jin,Yiwen Zhang
Abstract: Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often leading to unstable routing and expert dominance. In this paper, we propose \textbf{TalkLoRA}, a communication-aware MoELoRA framework that relaxes this independence assumption by introducing expert-level communication prior to routing. TalkLoRA equips low-rank experts with a lightweight Talking Module that enables controlled information exchange across expert subspaces, producing a more robust global signal for routing. Theoretically, we show that expert communication smooths routing dynamics by mitigating perturbation amplification while strictly generalizing existing MoELoRA architectures. Empirically, TalkLoRA consistently outperforms vanilla LoRA and MoELoRA across diverse language understanding and generation tasks, achieving higher parameter efficiency and more balanced expert routing under comparable parameter budgets. These results highlight structured expert communication as a principled and effective enhancement for MoE-based parameter-efficient adaptation. Code is available at https://github.com/why0129/TalkLoRA.
【21】Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models
Link: https://arxiv.org/abs/2604.06263
Authors: Jiayuan Liu,Barry Wang,Jiarui Gan,Tonghan Wang,Leon Xie,Mingyu Guo,Vincent Conitzer
Abstract: Generative advertising in large language model (LLM) responses requires optimizing sponsorship configurations under two strict constraints: the strategic behavior of advertisers and the high cost of stochastic generations. To address this, we propose the Incentive-Aware Multi-Fidelity Mechanism (IAMFM), a unified framework coupling Vickrey-Clarke-Groves (VCG) incentives with Multi-Fidelity Optimization to maximize expected social welfare. We compare two algorithmic instantiations (elimination-based and model-based), revealing their budget-dependent performance trade-offs. Crucially, to make VCG computationally feasible, we introduce Active Counterfactual Optimization, a "warm-start" approach that reuses optimization data for efficient payment calculation. We provide formal guarantees for approximate strategy-proofness and individual rationality, establishing a general approach for incentive-aligned, budget-constrained generative processes. Experiments demonstrate that IAMFM outperforms single-fidelity baselines across diverse budgets.
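The VCG layer can be illustrated on a tiny slot-assignment instance: each bidder pays the externality it imposes on the others, i.e., the others' best welfare without the bidder minus the others' welfare under the chosen allocation. Bidders, slots, and values below are hypothetical, and the welfare maximization is brute force:

```python
from itertools import permutations

def vcg(values, slots):
    """values[bidder][slot] = reported value; each bidder gets at most one slot."""
    bidders = list(values)

    def best(cands):
        """Brute-force welfare-maximizing assignment over a tiny instance."""
        k = min(len(cands), len(slots))
        best_w, best_assign = 0, {}
        for who in permutations(cands, k):
            for which in permutations(slots, k):
                w = sum(values[b][s] for b, s in zip(who, which))
                if w > best_w:
                    best_w, best_assign = w, dict(zip(who, which))
        return best_w, best_assign

    _, assign = best(bidders)
    payments = {}
    for i in bidders:
        w_without_i, _ = best([b for b in bidders if b != i])
        w_others = sum(values[b][s] for b, s in assign.items() if b != i)
        payments[i] = w_without_i - w_others  # externality imposed by bidder i
    return assign, payments

values = {"a": {"top": 10, "inline": 4},
          "b": {"top": 8, "inline": 6},
          "c": {"top": 5, "inline": 2}}
assign, payments = vcg(values, ["top", "inline"])
```

The `best(...)` calls are exactly the counterfactual welfare computations that the paper's Active Counterfactual Optimization warm-starts, since each removes one bidder and re-optimizes.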
【22】$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models
Link: https://arxiv.org/abs/2604.06260
Authors: Ahsan Bilal,Muhammad Ahmed Mohsin,Muhammad Umer,Asad Aali,Muhammad Usman Khanzada,Muhammad Usman Rafique,Zihao He,Emily Fox,Dean F. Hougen
Comments: Submitted to COLM 2026
Abstract: Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without additional training. However, naive best-of-$K$ sampling is fundamentally limited because it repeatedly draws from the same base diffusion distribution, whose high-probability regions are often misaligned with high-quality outputs. We propose $S^3$ (Stratified Scaling Search), a classical verifier-guided search method that improves generation by reallocating compute during the denoising process rather than only at the final output stage. At each denoising step, $S^3$ expands multiple candidate trajectories, evaluates them with a lightweight reference-free verifier, and selectively resamples promising candidates while preserving diversity within the search frontier. This procedure effectively approximates a reward-tilted sampling distribution that favors higher-quality outputs while remaining anchored to the model prior. Experiments with LLaDA-8B-Instruct on MATH-500, GSM8K, ARC-Challenge, and TruthfulQA demonstrate that $S^3$ consistently improves performance across benchmarks, achieving the largest gains on mathematical reasoning tasks while leaving the underlying model and decoding schedule unchanged. These results show that classical search over denoising trajectories provides a practical mechanism for test-time scaling in DLMs.
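The expand-score-resample loop can be sketched on a toy sequence task with a trivial scorer (here, counting a target character) standing in for the reference-free verifier; the denoiser is replaced by random character choices, so this is a sketch of the search structure only:

```python
import random

def stratified_search(steps=6, expand=4, keep=3, seed=0):
    rng = random.Random(seed)

    def verifier(seq):              # trivial stand-in scorer: count "a"s
        return seq.count("a")

    frontier = [""]                 # search frontier of partial trajectories
    for _ in range(steps):
        # expand every surviving trajectory, score, and selectively resample
        candidates = [seq + rng.choice("ab")
                      for seq in frontier for _ in range(expand)]
        candidates.sort(key=verifier, reverse=True)
        frontier = candidates[:keep]
    return max(frontier, key=verifier)

best = stratified_search()
```

Because compute is reallocated at every step rather than only over finished samples, weak trajectories are pruned early, which is the advantage over best-of-$K$ sampling the abstract describes.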
【23】Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook
Link: https://arxiv.org/abs/2604.06210
Authors: Jaehyeok Lee,Xiaoyuan Yi,Jing Yao,Hyunjin Hwang,Roy Ka-Wei Lee,Xing Xie,JinYeong Bak
Abstract: As LLMs are globally deployed, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: relying on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and mismatch with real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. DOVE utilizes a rate-distortion variational optimization objective to construct a compact value-codebook from 10K documents, mapping text into a structured value space to filter semantic noise. Alignment is measured using unbalanced optimal transport, capturing intra-cultural distributional structures and sub-group diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while maintaining high reliability with as few as 500 samples per culture.
【24】The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?
Link: https://arxiv.org/abs/2604.06192
Authors: Mar Gonzàlez I Català,Haitz Sáez de Ocáriz Borde,George D. Montañez,Pietro Liò
Comments: 21 pages, 5 figures, 3 tables
Abstract: Recent work uses entropy-based signals at multiple representation levels to study reasoning in large language models, but the field remains largely empirical. A central unresolved puzzle is why internal entropy dynamics, defined under the predictive distribution of a model, correlate so robustly with external correctness given by the ground-truth answer. In this paper, we argue that this correlation arises because autoregressive models reason correctly when they accumulate information about the true answer via answer-informative prefixes. We formalize this intuition via the Stepwise Informativeness Assumption (SIA), which states that reasoning prefixes accumulate answer-relevant information in expectation as generation progresses. We show that SIA naturally emerges from maximum-likelihood optimization on human reasoning traces and is reinforced by standard fine-tuning and reinforcement-learning pipelines. We then derive observable signatures of SIA linking conditional answer entropy dynamics to correctness. We empirically test SIA across multiple reasoning benchmarks (GSM8K, ARC, SVAMP) and a diverse set of open-weight LLMs (Gemma-2, LLaMA-3.2, Qwen-2.5, DeepSeek and Olmo variants), showing that training induces it and that correct traces exhibit characteristic conditional answer entropy patterns.
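The predicted entropy signature can be illustrated with a toy Bayesian-updating "model": when each prefix step is answer-informative, the conditional distribution over the final answer sharpens and its entropy falls monotonically. The likelihood numbers are made up for illustration:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def answer_entropy_trace(steps=5, favor=0.6, n_answers=4):
    p = [1.0 / n_answers] * n_answers   # uniform prior over candidate answers
    trace = [entropy(p)]
    for _ in range(steps):
        # each reasoning step is answer-informative: evidence favors answer 0
        like = [favor] + [(1 - favor) / (n_answers - 1)] * (n_answers - 1)
        p = [pi * li for pi, li in zip(p, like)]
        z = sum(p)
        p = [pi / z for pi in p]        # Bayesian update + renormalize
        trace.append(entropy(p))
    return trace

trace = answer_entropy_trace()          # strictly decreasing entropy over steps
```

A decreasing conditional answer entropy of exactly this shape is the observable correlate of correctness that the paper derives from SIA.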
【25】Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations
Link: https://arxiv.org/abs/2604.02360
Authors: Yonas Kassa,James Bonacci,Ping Wang
Comments: accepted at ITNG 2026
Abstract: The transformative potential of large language models (LLMs) in education, such as improving accessibility and personalized learning, is being eclipsed by significant challenges. These challenges stem from concerns that LLMs undermine academic assessment by enabling the bypassing of critical thinking, leading to increased cognitive offloading. This emerging trend stresses the dual imperative of harnessing AI's educational benefits while safeguarding critical thinking and academic rigor in the evolving AI ecosystem. To this end, we introduce AI-Sinkhole, an AI-agent augmented DNS-based framework that dynamically discovers, semantically classifies, and temporarily blocks, network-wide, emerging LLM chatbot services during proctored exams. AI-Sinkhole offers explainable classification via quantized LLMs (LLama 3, DeepSeek-R1, Qwen-3) and dynamic DNS blocking with Pi-Hole. We also share our observations on using LLMs as explainable classifiers, which achieved robust cross-lingual performance (F1-score > 0.83). To support future research and development in this domain, initial code with a readily deployable 'AI-Sinkhole' blocklist is available at https://github.com/AIMLEdu/ai-sinkhole.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】Graph Neural ODE Digital Twins for Control-Oriented Reactor Thermal-Hydraulic Forecasting Under Partial Observability
Link: https://arxiv.org/abs/2604.07292
Authors: Akzhol Almukhametov,Doyeong Lim,Rui Hu,Yang Liu
Abstract: Real-time supervisory control of advanced reactors requires accurate forecasting of plant-wide thermal-hydraulic states, including locations where physical sensors are unavailable. Meeting this need calls for surrogate models that combine predictive fidelity, millisecond-scale inference, and robustness to partial observability. In this work, we present a physics-informed message-passing Graph Neural Network coupled with a Neural Ordinary Differential Equation (GNN-ODE) to address all three requirements simultaneously. We represent the whole system as a directed sensor graph whose edges encode hydraulic connectivity through flow/heat transfer-aware message passing, and we advance the latent dynamics in continuous time via a controlled Neural ODE. A topology-guided missing-node initializer reconstructs uninstrumented states at rollout start; prediction then proceeds fully autoregressively. The GNN-ODE surrogate achieves satisfactory results for system dynamics prediction. On held-out simulation transients, the surrogate achieves an average MAE of 0.91 K at 60 s and 2.18 K at 300 s for uninstrumented nodes, with $R^2$ up to 0.995 for missing-node state reconstruction. Inference runs at approximately 105 times faster than simulated time on a single GPU, enabling 64-member ensemble rollouts for uncertainty quantification. To assess sim-to-real transfer, we adapt the pretrained surrogate to experimental facility data using layerwise discriminative fine-tuning with only 30 training sequences. The learned flow-dependent heat-transfer scaling recovers a Reynolds-number exponent consistent with established correlations, indicating constitutive learning beyond trajectory fitting. The model tracks a steep power change transient and produces accurate trajectories at uninstrumented locations.
【2】CSA-Graphs: A Privacy-Preserving Structural Dataset for Child Sexual Abuse Research
Link: https://arxiv.org/abs/2604.07132
Authors: Carlos Caetano,Camila Laranjeira,Clara Ernesto,Artur Barros,João Macedo,Leo S. F. Ribeiro,Jefersson A. dos Santos,Sandra Avila
Comments: Conference on Computer Vision and Pattern Recognition (CVPR 2026), in the Workshop on Computer Vision for Children (CV4CHL)
Abstract: Child Sexual Abuse Imagery (CSAI) classification is an important yet challenging problem for computer vision research due to the strict legal and ethical restrictions that prevent the public sharing of CSAI datasets. This limitation hinders reproducibility and slows progress in developing automated methods. In this work, we introduce CSA-Graphs, a privacy-preserving structural dataset. Instead of releasing the original images, we provide structural representations that remove explicit visual content while preserving contextual information. CSA-Graphs includes two complementary graph-based modalities: scene graphs describing object relationships and skeleton graphs encoding human pose. Experiments show that both representations retain useful information for classifying CSAI, and that combining them further improves performance. This dataset enables broader research on computer vision methods for child safety while respecting legal and ethical constraints.
【3】GraphWalker: Graph-Guided In-Context Learning for Clinical Reasoning on Electronic Health Records
Link: https://arxiv.org/abs/2604.06684
Authors: Yue Fang,Weibin Liao,Yuxin Guo,Jiaran Gao,Hongxin Ding,Jinyang Zhang,Xinke Jiang,Zhibang Yang,Junfeng Zhao,Yasha Wang,Liantao Ma
Abstract: Clinical Reasoning on Electronic Health Records (EHRs) is a fundamental yet challenging task in modern healthcare. While in-context learning (ICL) offers a promising inference-time adaptation paradigm for large language models (LLMs) in EHR reasoning, existing methods face three fundamental challenges: (1) Perspective Limitation, where data-driven similarity fails to align with LLM reasoning needs and model-driven signals are constrained by limited clinical competence; (2) Cohort Awareness, as demonstrations are selected independently without modeling population-level structure; and (3) Information Aggregation, where redundancy and interaction effects among demonstrations are ignored, leading to diminishing marginal gains. To address these challenges, we propose GraphWalker, a principled demonstration selection framework for EHR-oriented ICL. GraphWalker (i) jointly models patient clinical information and LLM-estimated information gain by integrating data-driven and model-driven perspectives, (ii) incorporates Cohort Discovery to avoid noisy local optima, and (iii) employs a Lazy Greedy Search with Frontier Expansion algorithm to mitigate diminishing marginal returns in information aggregation. Extensive experiments on multiple real-world EHR benchmarks demonstrate that GraphWalker consistently outperforms state-of-the-art ICL baselines, yielding substantial improvements in clinical reasoning performance. Our code is open-sourced at https://github.com/PuppyKnightUniversity/GraphWalker.
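The lazy greedy idea can be sketched with a hypothetical submodular coverage utility over demonstrations: cached marginal gains in a max-heap are upper bounds, so most candidates never need re-evaluation as the selection grows. This illustrates the generic technique, not GraphWalker's EHR-specific utility or frontier expansion:

```python
import heapq

def lazy_greedy(candidates, k):
    """candidates: dict name -> set of covered 'information units'; pick k greedily."""
    covered, chosen = set(), []
    heap = [(-len(units), name) for name, units in candidates.items()]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, name = heapq.heappop(heap)
        gain = len(candidates[name] - covered)   # fresh marginal gain
        if -neg_gain == gain or not heap:        # cached bound still tight: take it
            if gain > 0:
                chosen.append(name)
                covered |= candidates[name]
        else:
            heapq.heappush(heap, (-gain, name))  # stale bound: reinsert, retry
    return chosen, covered

# Hypothetical demonstrations and the units of information each one covers.
cands = {"d1": {1, 2, 3}, "d2": {3, 4}, "d3": {1, 2}, "d4": {5}}
chosen, covered = lazy_greedy(cands, k=3)
```

Submodularity guarantees the cached gains only shrink, which is what makes reusing stale heap entries safe and addresses the diminishing-marginal-returns issue cheaply.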
【4】From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures
Link: https://arxiv.org/abs/2604.06448
Authors: Srinidhi Madabhushi,Pranesh Vyas,Swathi Vaidyanathan,Mayur Kurup,Elliott Nash,Yegor Silyutin
Comments: Accepted at FSE 2026 - Industrial Track
Abstract: Prime Video regularly conducts load tests to simulate the viewer traffic spikes seen during live events such as Thursday Night Football as well as video-on-demand (VOD) events such as Rings of Power. While these stress tests validate system capacity, they can sometimes miss service behaviors unique to real event traffic. We present a graph-based anomaly detection system that identifies under-represented services using unsupervised node-level graph embeddings. Built on a GCN-GAE, our approach learns structural representations from directed, weighted service graphs at minute-level resolution and flags anomalies based on cosine similarity between load test and event embeddings. The system identifies documented incident-related services and demonstrates early-detection capability. We also introduce a preliminary synthetic anomaly injection framework for controlled evaluation that shows promising precision (96%) and a low false positive rate (0.08%), though recall (58%) remains limited under conservative propagation assumptions. This framework demonstrates practical utility within Prime Video while also surfacing methodological lessons and directions, providing a foundation for broader application across microservice ecosystems.
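The flagging rule reduces to a cosine-similarity threshold between each service's paired embeddings. The vectors below are invented for illustration; in the system they come from the GCN-GAE over the service call graph:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flag_anomalies(load_test, live, threshold=0.9):
    """Flag services whose live-event embedding diverges from the load-test one."""
    return sorted(s for s in load_test
                  if cosine(load_test[s], live[s]) < threshold)

# Hypothetical per-service embeddings under load test vs. a live event.
load_test = {"auth": [1.0, 0.1], "playback": [0.2, 1.0], "ads": [1.0, 1.0]}
live      = {"auth": [0.9, 0.1], "playback": [1.0, 0.1], "ads": [1.0, 0.9]}
flagged = flag_anomalies(load_test, live)
```

Only the service whose structural role changed between the two graphs falls below the threshold, matching the abstract's notion of an "under-represented" service.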
【5】Toward a universal foundation model for graph-structured data
Link: https://arxiv.org/abs/2604.06391
Authors: Sakib Mostafa,Lei Xing,Md. Tauhidul Islam
Comments: 19 pages, 5 figures, 12 supplementary figures
Abstract: Graphs are a central representation in biomedical research, capturing molecular interaction networks, gene regulatory circuits, cell--cell communication maps, and knowledge graphs. Despite their importance, no broadly reusable foundation model is currently available for graph analysis comparable to those that have transformed language and vision. Existing graph neural networks are typically trained on a single dataset and learn representations specific only to that graph's node features, topology, and label space, limiting their ability to transfer across domains. This lack of generalization is particularly problematic in biology and medicine, where networks vary substantially across cohorts, assays, and institutions. Here we introduce a graph foundation model designed to learn transferable structural representations that are not tied to specific node identities or feature schemes. Our approach leverages feature-agnostic graph properties, including degree statistics, centrality measures, community structure indicators, and diffusion-based signatures, and encodes them as structural prompts. These prompts are integrated with a message-passing backbone to embed diverse graphs into a shared representation space. The model is pretrained once on heterogeneous graphs and subsequently reused on unseen datasets with minimal adaptation. Across multiple benchmarks, our pretrained model matches or exceeds strong supervised baselines while demonstrating superior zero-shot and few-shot generalization on held-out graphs. On the SagePPI benchmark, supervised fine-tuning of the pretrained backbone achieves a mean ROC-AUC of 95.5%, a gain of 21.8% over the best supervised message-passing baseline. The proposed technique thus provides a unique approach toward reusable, foundation-scale models for graph-structured data in biomedical and network science applications.
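A feature-agnostic structural descriptor can be sketched with degree statistics alone, a stand-in for the paper's richer prompts (centrality, community indicators, diffusion signatures). Because it depends only on topology, the same encoder could consume graphs with arbitrary node features:

```python
def degree_stats(edges, n):
    """Topology-only descriptor of an undirected graph with n nodes."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    mean = sum(deg) / n
    var = sum((d - mean) ** 2 for d in deg) / n
    return {"n": n, "mean_deg": mean, "var_deg": var, "max_deg": max(deg)}

# A star and a cycle on 4 nodes: similar edge counts, very different prompts.
star = degree_stats([(0, 1), (0, 2), (0, 3)], n=4)
cycle = degree_stats([(0, 1), (1, 2), (2, 3), (3, 0)], n=4)
```

The hub-dominated star and the regular cycle yield clearly distinct descriptors, which is what lets such prompts transfer across datasets with incompatible feature spaces.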
【6】BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning
Link: https://arxiv.org/abs/2604.06336
Authors: Yi Yang,Ovidiu Daescu
Abstract: Graph Transformers have recently attracted attention for molecular property prediction by combining the inductive biases of graph neural networks (GNNs) with the global receptive field of Transformers. However, many existing hybrid architectures remain GNN-dominated, causing the resulting representations to remain heavily shaped by local message passing. Moreover, most existing methods operate at only a single structural granularity, limiting their ability to capture molecular patterns that span multiple molecular scales. We introduce BiScale-GTR, a unified framework for self-supervised molecular representation learning that combines chemically grounded fragment tokenization with adaptive multi-scale reasoning. Our method improves graph Byte Pair Encoding (BPE) tokenization to produce consistent, chemically valid, and high-coverage fragment tokens, which are used as fragment-level inputs to a parallel GNN-Transformer architecture. Architecturally, atom-level representations learned by a GNN are pooled into fragment-level embeddings and fused with fragment token embeddings before Transformer reasoning, enabling the model to jointly capture local chemical environments, substructure-level motifs, and long-range molecular dependencies. Experiments on MoleculeNet, PharmaBench, and the Long Range Graph Benchmark (LRGB) demonstrate state-of-the-art performance across both classification and regression tasks. Attribution analysis further shows that BiScale-GTR highlights chemically meaningful functional motifs, providing interpretable links between molecular structure and predicted properties. Code will be released upon acceptance.
Transformer (3 papers)
【1】Self-Discovered Intention-aware Transformer for Multi-modal Vehicle Trajectory Prediction
Link: https://arxiv.org/abs/2604.07126
Authors: Diyi Liu,Zihan Niu,Tu Xu,Lishan Sun
Comments: 5 pages, 2 figures
Abstract: Predicting vehicle trajectories plays an important role in autonomous driving and ITS applications. Although multiple deep learning algorithms have been devised to predict vehicle trajectories, their reliance on specific graph structures (e.g., Graph Neural Networks) or explicit intention labeling limits their flexibility. In this study, we propose a purely Transformer-based network with multiple modes that considers neighboring vehicles. Two separate tracks are employed: one focuses on predicting the trajectories, while the other predicts the likelihood of each intention given the neighboring vehicles. The study finds that the two-track design can increase performance by separating the spatial module from the trajectory-generating module. We also find that the model can learn an ordered group of trajectories by predicting residual offsets among the K trajectories.
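Residual-offset decoding of K ordered trajectories can be sketched as cumulative sums over a base mode: the model outputs one base trajectory plus K-1 residual offset trajectories, and each mode is the previous mode plus its residual. The trajectories below are illustrative numbers only:

```python
def decode_modes(base, residuals):
    """base: list of (x, y) waypoints; residuals: list of per-mode offset trajectories."""
    modes = [list(base)]
    for res in residuals:                      # mode k = mode k-1 + residual k
        prev = modes[-1]
        modes.append([(x + dx, y + dy) for (x, y), (dx, dy) in zip(prev, res)])
    return modes

base = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]          # keep-lane mode
residuals = [[(0.0, 0.5), (0.0, 1.0), (0.0, 1.5)],   # drift left
             [(0.0, 0.5), (0.0, 1.0), (0.0, 1.5)]]   # drift further left
modes = decode_modes(base, residuals)                 # 3 ordered modes
```

Because each mode is anchored to its predecessor, the decoded set is ordered by construction, which is the property the abstract attributes to residual-offset prediction.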
【2】Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning
标题:Transformer See,Transformer Do:Inbox作为学习类比推理的中间步骤
链接:https://arxiv.org/abs/2604.06501
作者:Philipp Hellwig,Willem Zuidema,Claire E. Stevenson,Martha Lewis
摘要:类比推理是人类智能的标志,它使我们能够通过将知识从一种情况转移到另一种情况来解决新问题。然而,开发能够强大的类人类比推理的人工智能系统已经被证明是困难的。在这项工作中,我们训练Transformers使用元学习组合(MLC)的类比推理任务(字母串类比),并评估其泛化能力。我们发现,当引导模型处理由训练数据中的复制任务引起的信息量最大的问题元素时,字母串类比变得可学习。此外,当模型使用更异构的数据集进行训练时,对新字母表的泛化变得更好,其中我们的3层编码器-解码器模型优于大多数前沿模型。MLC方法还使得能够对训练变换的组合进行一些推广,但不能推广到完全新颖的变换。为了理解模型是如何运作的,我们确定了一个近似模型计算的算法。我们使用可解释性分析来验证这一点,并表明该模型可以根据来自算法的期望来精确地操纵。最后,我们讨论了我们的研究结果的影响,更大的模型和人类类比推理的泛化能力。
摘要:Analogical reasoning is a hallmark of human intelligence, enabling us to solve new problems by transferring knowledge from one situation to another. Yet, developing artificial intelligence systems capable of robust human-like analogical reasoning has proven difficult. In this work, we train transformers using Meta-Learning for Compositionality (MLC) on an analogical reasoning task (letter-string analogies) and assess their generalization capabilities. We find that letter-string analogies become learnable when guiding the models to attend to the most informative problem elements induced by including copying tasks in the training data. Furthermore, generalization to new alphabets becomes better when models are trained with more heterogeneous datasets, where our 3-layer encoder-decoder model outperforms most frontier models. The MLC approach also enables some generalization to compositions of trained transformations, but not to completely novel transformations. To understand how the model operates, we identify an algorithm that approximates the model's computations. We verify this using interpretability analyses and show that the model can be steered precisely according to expectations derived from the algorithm. Finally, we discuss implications of our findings for generalization capabilities of larger models and parallels to human analogical reasoning.
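For readers unfamiliar with the task, a minimal letter-string analogy looks like "abc -> abd; ijk -> ?". The toy solver below handles only the successor-of-last-letter rule and is a stand-in for illustration, not the paper's MLC-trained transformer.

```python
# Toy letter-string analogy with a single hard-coded transformation.

def successor_last(s):
    """Increment the final letter: 'abc' -> 'abd'."""
    return s[:-1] + chr(ord(s[-1]) + 1)

def analogy(source_pair, target):
    """Infer the rule from source_pair = (a, b) and apply it to target.
    Only the successor rule is recognized in this sketch."""
    a, b = source_pair
    if successor_last(a) == b:
        return successor_last(target)
    raise ValueError("transformation not recognized")

answer = analogy(("abc", "abd"), "ijk")  # 'ijl'
```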
【3】Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
标题:弱监督地将幻觉信号蒸馏为Transformer表示
链接:https://arxiv.org/abs/2604.06277
作者:Shoaib Sadiq Salehmohamed,Jinal Prashant Thakkar,Hansika Aredla,Shaik Mohammed Omar,Shalmali Ayachit
备注:20 pages, 6 figures, 6 tables. Introduces a 15k-sample representation-level hallucination dataset with full transformer hidden states and multi-signal weak supervision. Evaluates 5 probing architectures and demonstrates internal hallucination detection without external inference-time signals. Includes held-out test evaluation and deployment benchmarks
摘要:现有的大型语言模型(LLM)的幻觉检测方法依赖于推理时的外部验证,需要黄金答案,检索系统或辅助判断模型。我们想知道,这种外部监督是否可以在训练过程中被提取到模型自己的表征中,从而在推理时仅从内部激活中进行幻觉检测。 我们引入了一个弱监督框架,它结合了三个互补的接地信号:子串匹配,句子嵌入相似性和LLM作为判断判决,将生成的响应标记为接地或幻觉,而无需人工注释。使用这个框架,我们从SQuAD v2构建了一个15000个样本的数据集(10500个训练/开发样本和一个单独的5000个样本的测试集),其中每个样本将LLaMA-2-7B生成的答案与其完整的每层隐藏状态和结构化的幻觉标签配对。 然后,我们训练五个探测分类器:ProbeMLP(M0),LayerWiseMLP(M1),CrossLayerTransformer(M2),HierarchicalTransformer(M3)和CrossLayerAttentionTransformerV2(M4),直接在这些隐藏状态上,将外部接地信号仅视为训练时监督。我们的中心假设是,幻觉检测信号可以被提取到Transformer表示中,使推理时无需任何外部验证即可进行内部检测。 结果支持这一假设。基于Transformer的探针实现了最强的区分,M2在5倍平均AUC/F1上表现最好,M3在单倍验证和保留测试评价上表现最好。我们还对推理效率进行了基准测试:探测延迟范围从0.15到5.62 ms(批处理)和1.55到6.66 ms(单个样本),而端到端生成加上探测吞吐量保持在每秒约0.231次查询,这表明实际开销可以忽略不计。
摘要:Existing hallucination detection methods for large language models (LLMs) rely on external verification at inference time, requiring gold answers, retrieval systems, or auxiliary judge models. We ask whether this external supervision can instead be distilled into the model's own representations during training, enabling hallucination detection from internal activations alone at inference time. We introduce a weak supervision framework that combines three complementary grounding signals: substring matching, sentence embedding similarity, and an LLM as a judge verdict to label generated responses as grounded or hallucinated without human annotation. Using this framework, we construct a 15000-sample dataset from SQuAD v2 (10500 train/development samples and a separate 5000-sample test set), where each example pairs a LLaMA-2-7B generated answer with its full per-layer hidden states and structured hallucination labels. We then train five probing classifiers: ProbeMLP (M0), LayerWiseMLP (M1), CrossLayerTransformer (M2), HierarchicalTransformer (M3), and CrossLayerAttentionTransformerV2 (M4), directly on these hidden states, treating external grounding signals as training-time supervision only. Our central hypothesis is that hallucination detection signals can be distilled into transformer representations, enabling internal detection without any external verification at inference time. Results support this hypothesis. Transformer-based probes achieve the strongest discrimination, with M2 performing best on 5-fold average AUC/F1, and M3 performing best on both single-fold validation and held-out test evaluation. We also benchmark inference efficiency: probe latency ranges from 0.15 to 5.62 ms (batched) and 1.55 to 6.66 ms (single sample), while end-to-end generation plus probe throughput remains approximately 0.231 queries per second, indicating negligible practical overhead.
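The probing setup can be pictured as a small classifier trained with BCE on hidden-state vectors. The sketch below uses a plain logistic probe on synthetic two-dimensional "hidden states", a deliberate simplification; the paper's probes such as ProbeMLP operate on real LLaMA-2-7B activations.

```python
# Minimal logistic probe trained with binary cross-entropy on toy
# "hidden states": grounded examples cluster near +1, hallucinated
# examples near -1. Synthetic data; illustrative only.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(X, y, lr=0.3, epochs=200):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi                    # d(BCE)/d(logit)
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

random.seed(0)
X = [[random.gauss(1.0, 0.3), random.gauss(1.0, 0.3)] for _ in range(20)]
X += [[random.gauss(-1.0, 0.3), random.gauss(-1.0, 0.3)] for _ in range(20)]
y = [1] * 20 + [0] * 20
w, b = train_probe(X, y)
acc = sum((sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5) == yi
          for xi, yi in zip(X, y)) / len(y)
```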
GAN|对抗|攻击|生成相关(6篇)
【1】On the Price of Privacy for Language Identification and Generation
标题:论语言识别和生成的隐私代价
链接:https://arxiv.org/abs/2604.07238
作者:Xiaoyu Li,Andi Han,Jiaojiao Jiang,Junbin Gao
摘要:随着大型语言模型(LLM)越来越多地使用敏感的用户数据进行训练,了解语言学习中隐私的基本成本变得至关重要。我们开始研究不可知统计设置下的差分隐私(DP)语言识别和生成,建立算法和匹配的下限,精确地量化隐私的成本。对于这两个任务,近似$(\varepsilon, δ)$-DP与常数$\varepsilon > 0$恢复非私有错误率:$\exp(-r(n))$用于识别(对于任何$r(n) = o(n)$)和$\exp(-Ω(n))$用于生成。在纯$\varepsilon$-DP下,指数下降了$\min\{1,\varepsilon\}$的乘法因子,我们表明它紧到常数。值得注意的是,对于在温和假设下的纯DP生成,上限$\exp(-\min\{1,\varepsilon\} \cdot Ω(n))$与下限匹配到一些常数,建立最优速率。我们的研究结果表明,语言学习中的隐私成本是令人惊讶的温和:在近似DP下完全不存在,在纯DP下指数中正好有$\min\{1,\varepsilon\}$因子。
摘要:As large language models (LLMs) are increasingly trained on sensitive user data, understanding the fundamental cost of privacy in language learning becomes essential. We initiate the study of differentially private (DP) language identification and generation in the agnostic statistical setting, establishing algorithms and matching lower bounds that precisely quantify the cost of privacy. For both tasks, approximate $(\varepsilon, δ)$-DP with constant $\varepsilon > 0$ recovers the non-private error rates: $\exp(-r(n))$ for identification (for any $r(n) = o(n)$) and $\exp(-Ω(n))$ for generation. Under pure $\varepsilon$-DP, the exponents degrade by a multiplicative factor of $\min\{1, \varepsilon\}$, which we show is tight up to constants. Notably, for generation under pure DP with mild assumptions, the upper bound $\exp(-\min\{1,\varepsilon\} \cdot Ω(n))$ matches the lower bound up to some constants, establishing an optimal rate. Our results show that the cost of privacy in language learning is surprisingly mild: absent entirely under approximate DP, and exactly a $\min\{1,\varepsilon\}$ factor in the exponent under pure DP.
【2】Dynamic Context Evolution for Scalable Synthetic Data Generation
标题:可扩展合成数据生成的动态上下文进化
链接:https://arxiv.org/abs/2604.07147
作者:Ryan Lingo,Rajeev Chhajer
摘要:当在多个批次中独立提示时,大型语言模型会产生重复的输出,这种现象我们称之为跨批次模式崩溃:当一个语言模型在没有访问其前几代的情况下重复提示时,输出多样性的逐渐丧失。从业者长期以来一直通过临时重复数据删除和种子轮换来缓解这一问题,但没有原则性的框架存在。本文介绍了动态上下文演化(Dynamic Context Evolution,DCE),它包括三种机制:(1)动词化尾部采样(模型用关于它有多明显的猜测来标记每个想法,并且明显的想法被丢弃),其经由模型自我评估过滤高概率候选者;(2)语义记忆,其维持持久嵌入索引以拒绝跨批次的近似重复;(3)自适应提示进化,利用记忆状态和旋转多样性策略,对每批生成提示进行重构。在三个领域(可持续包装概念,教育考试问题和创意写作提示)和两个模型家族(gpt-5-mini和claude-haiku-4-5)的实验中,每种方法2-3个随机种子的成分消融显示,DCE实现了0.0 +/- 0.0%的崩溃,而朴素提示为5.6 +/- 2.0%,同时每个种子产生17-18个HDBSCAN簇,而朴素方法则在2-17个簇之间波动,表明其概念结构可靠且更丰富。这些结果通过独立嵌入模型(all-MiniLM-L6-v2)进行验证,并在VTS阈值tau和去重阈值delta的灵敏度扫描中保持不变。去重和提示进化单独不足以奏效,但联合起来是有效的;仅使用标准API调用,每1,000个候选者的成本约为0.50美元,不需要微调或自定义架构。
摘要:Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated this with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling (the model labels each idea with a guess about how obvious it is, and obvious ideas are discarded), which filters high-probability candidates via model self-assessment; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating diversity strategies. In experiments across three domains (sustainable packaging concepts, educational exam questions, and creative writing prompts) and two model families (gpt-5-mini and claude-haiku-4-5), a component ablation across 2-3 random seeds per method shows that DCE achieves 0.0 +/- 0.0% collapse versus 5.6 +/- 2.0% for naive prompting, while producing 17-18 HDBSCAN clusters per seed versus naive's volatile 2-17, indicating reliably richer conceptual structure. These results are validated with an independent embedding model (all-MiniLM-L6-v2) and hold across sensitivity sweeps of the VTS threshold tau and dedup threshold delta. Deduplication and prompt evolution are individually insufficient but jointly effective, at approximately $0.50 per 1,000 candidates using only standard API calls, with no fine-tuning or custom architectures required.
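The "semantic memory" mechanism can be sketched as an embedding index with a cosine-similarity rejection rule. The vectors and threshold below are toy stand-ins; DCE uses a real embedding model and its own tuned delta.

```python
# Sketch of semantic-memory deduplication: store embeddings of accepted
# candidates and reject any new candidate whose cosine similarity to a
# stored one reaches the threshold delta. Toy 2-D embeddings only.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticMemory:
    def __init__(self, delta=0.9):
        self.delta = delta
        self.index = []          # embeddings of accepted candidates

    def accept(self, emb):
        """Return True (and store emb) if it is not a near-duplicate."""
        if any(cosine(emb, stored) >= self.delta for stored in self.index):
            return False
        self.index.append(emb)
        return True

mem = SemanticMemory(delta=0.9)
accepted = [mem.accept(e) for e in ([1.0, 0.0], [0.99, 0.1], [0.0, 1.0])]
```

The second vector is nearly parallel to the first and is rejected; the orthogonal third vector is kept.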
【3】Towards Robust Content Watermarking Against Removal and Forgery Attacks
标题:迈向针对删除和伪造攻击的稳健内容水印
链接:https://arxiv.org/abs/2604.06662
作者:Yifan Zhu,Yihan Wang,Xiao-Shan Gao
备注:14 pages, 5 figures, CVPR 2026 Findings
摘要:生成的内容引起了对版权保护、图像出处和信用归属的严重关注。这些问题的潜在解决方案是水印。近年来,基于文本到图像扩散模型的内容水印因其有效的检测效果和鲁棒性而得到了广泛的研究。然而,这些水印技术容易受到潜在的对抗性攻击,如删除攻击和伪造攻击。在本文中,我们建立了一个新的水印范例称为实例特定水印与双边检测(ISTS)抵抗删除和伪造攻击。具体来说,我们介绍了一种策略,动态控制注入时间和水印模式的基础上,语义的用户的提示。此外,我们提出了一种新的双边检测方法,以提高水印检测的鲁棒性。实验证明了我们的水印的优越性,对删除和伪造攻击。
摘要:Generated contents have raised serious concerns about copyright protection, image provenance, and credit attribution. A potential solution for these problems is watermarking. Recently, content watermarking for text-to-image diffusion models has been studied extensively for its effective detection utility and robustness. However, these watermarking techniques are vulnerable to potential adversarial attacks, such as removal attacks and forgery attacks. In this paper, we build a novel watermarking paradigm called Instance-Specific watermarking with Two-Sided detection (ISTS) to resist removal and forgery attacks. Specifically, we introduce a strategy that dynamically controls the injection time and watermarking patterns based on the semantics of users' prompts. Furthermore, we propose a new two-sided detection approach to enhance robustness in watermark detection. Experiments have demonstrated the superiority of our watermarking against removal and forgery attacks.
【4】ExplainFuzz: Explainable and Constraint-Conditioned Test Generation with Probabilistic Circuits
标题:ExplainFuzz:使用概率电路的可解释和约束条件测试生成
链接:https://arxiv.org/abs/2604.06559
作者:Annaëlle Baiget,Jaron Maene,Seongmin Lee,Benjie Wang,Guy Van den Broeck,Miryung Kim
备注:19 pages
摘要:理解和解释生成的测试输入的结构对于有效的软件测试和调试是必不可少的。现有的方法-包括基于语法的模糊器,概率上下文无关语法(pCFGs)和大型语言模型(LLM)-受到严重的限制。它们经常产生无法反映现实数据分布的病态输入,难以捕获上下文敏感的概率依赖关系,并且缺乏可解释性。我们介绍ExplainFuzz,一种测试生成框架,利用概率电路(PC)以可解释和可控的方式学习和查询基于语法的测试输入上的结构化分布。从上下文无关语法(CFG)开始,ExplainFuzz编译一个语法感知的PC,并在现有输入上训练它。然后通过采样生成新的输入。ExplainFuzz利用PC的条件化能力来结合测试特定的约束(例如,查询必须具有GROUP BY),从而使得受约束的概率采样能够生成满足语法和用户提供的约束的输入。 我们的研究结果表明,ExplainFuzz提高了生成的输入的一致性和真实性,实现了显著的困惑度降低,相比于pCFGs,语法无关的PC,和LLM。通过利用其原生条件化功能,ExplainFuzz显著增强了满足用户提供的约束的输入多样性。与语法感知的变异模糊相比,ExplainFuzz将SQL中的错误触发率从35%提高到63%,将XML中的错误触发率从10%提高到100%。这些结果证明了学习的输入分布优于变异模糊的能力,变异模糊通常限于探索种子输入的局部邻域。这些功能突出了PC作为语法感知、可控测试生成的基础的潜力,这些测试生成捕获上下文敏感的概率依赖性。
摘要:Understanding and explaining the structure of generated test inputs is essential for effective software testing and debugging. Existing approaches--including grammar-based fuzzers, probabilistic Context-Free Grammars (pCFGs), and Large Language Models (LLMs)--suffer from critical limitations. They frequently produce ill-formed inputs that fail to reflect realistic data distributions, struggle to capture context-sensitive probabilistic dependencies, and lack explainability. We introduce ExplainFuzz, a test generation framework that leverages Probabilistic Circuits (PCs) to learn and query structured distributions over grammar-based test inputs interpretably and controllably. Starting from a Context-Free Grammar (CFG), ExplainFuzz compiles a grammar-aware PC and trains it on existing inputs. New inputs are then generated via sampling. ExplainFuzz utilizes the conditioning capability of PCs to incorporate test-specific constraints (e.g., a query must have GROUP BY), enabling constrained probabilistic sampling to generate inputs satisfying grammar and user-provided constraints. Our results show that ExplainFuzz improves the coherence and realism of generated inputs, achieving significant perplexity reduction compared to pCFGs, grammar-unaware PCs, and LLMs. By leveraging its native conditioning capability, ExplainFuzz significantly enhances the diversity of inputs that satisfy a user-provided constraint. Compared to grammar-aware mutational fuzzing, ExplainFuzz increases bug-triggering rates from 35% to 63% in SQL and from 10% to 100% in XML. These results demonstrate the power of a learned input distribution over mutational fuzzing, which is often limited to exploring the local neighborhood of seed inputs. These capabilities highlight the potential of PCs to serve as a foundation for grammar-aware, controllable test generation that captures context-sensitive, probabilistic dependencies.
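To make the constrained-generation idea concrete, the sketch below samples SQL-like strings from a hand-written toy pCFG and enforces a "must contain GROUP BY" constraint by rejection sampling. This only approximates what the paper does exactly via PC conditioning; the grammar and probabilities are invented for illustration.

```python
import random

# Hand-written toy probabilistic grammar (an assumption for illustration).
PCFG = {
    "query": [(0.5, ["SELECT", "cols", "FROM", "t"]),
              (0.5, ["SELECT", "cols", "FROM", "t", "GROUP BY", "cols"])],
    "cols":  [(0.6, ["a"]), (0.4, ["a", ",", "b"])],
}

def expand(symbol, rng):
    """Recursively expand a symbol by sampling one production rule."""
    if symbol not in PCFG:
        return [symbol]                  # terminal token
    r, acc = rng.random(), 0.0
    for p, rhs in PCFG[symbol]:
        acc += p
        if r <= acc:
            break                        # falls back to last rule if not hit
    return [tok for s in rhs for tok in expand(s, rng)]

def sample_with_constraint(rng, must_contain="GROUP BY", tries=1000):
    """Rejection sampling as a stand-in for the PC's exact conditioning."""
    for _ in range(tries):
        s = " ".join(expand("query", rng))
        if must_contain in s:
            return s
    raise RuntimeError("constraint never satisfied")

q = sample_with_constraint(random.Random(0))
```

Exact PC conditioning avoids the wasted samples of rejection, which matters when the constraint is rarely satisfied by chance.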
【5】Adversarial Robustness of Time-Series Classification for Crystal Collimator Alignment
标题:晶体瞄准器对准时间序列分类的对抗鲁棒性
链接:https://arxiv.org/abs/2604.06289
作者:Xaver Fink,Borja Fernandez Adiego,Daniele Mirarchi,Eloise Matheson,Alvaro Garcia Gonzales,Gianmarco Ricci,Joost-Pieter Katoen
摘要:在本文中,我们分析并改进了一个卷积神经网络(CNN)的对抗鲁棒性,该网络通过对晶体旋转过程中的束流损失监测器(BLM)时间序列进行分类,协助欧洲核子研究中心大型强子对撞机(LHC)的晶体准直器对准。我们基于现实世界合理性的对抗威胁模型,形式化了该分类器的局部鲁棒性属性。基于用于变换和语义扰动鲁棒性的已建立的参数化输入变换模式,我们为已部署的时间序列管道实例化了一个感知预处理的包装器:我们将时间序列归一化、填充约束和结构化扰动编码为CNN前面的轻量级可微包装器,以便现有的基于梯度的鲁棒性框架可以在已部署的管道上运行。对于形式验证,数据相关的预处理(如每窗口z归一化)引入了需要验证器特定抽象的非线性算子。因此,我们通过使用Foolbox和ART框架对鲁棒性进行基准测试,专注于基于攻击的鲁棒性估计和经管道检查的有效性。对所得CNN进行对抗性微调可将鲁棒准确率提高多达18.6%,而不会降低干净准确率。最后,我们将时间序列数据的鲁棒性从单窗口扩展到滑动窗口分类的序列级鲁棒性,引入对抗序列作为完整扫描的时间鲁棒性要求的反例,并观察到攻击引起的错误分类在相邻窗口中持续存在。
摘要:In this paper, we analyze and improve the adversarial robustness of a convolutional neural network (CNN) that assists crystal-collimator alignment at CERN's Large Hadron Collider (LHC) by classifying a beam-loss monitor (BLM) time series during crystal rotation. We formalize a local robustness property for this classifier under an adversarial threat model based on real-world plausibility. Building on established parameterized input-transformation patterns used for transformation- and semantic-perturbation robustness, we instantiate a preprocessing-aware wrapper for our deployed time-series pipeline: we encode time-series normalization, padding constraints, and structured perturbations as a lightweight differentiable wrapper in front of the CNN, so that existing gradient-based robustness frameworks can operate on the deployed pipeline. For formal verification, data-dependent preprocessing such as per-window z-normalization introduces nonlinear operators that require verifier-specific abstractions. We therefore focus on attack-based robustness estimates and pipeline-checked validity by benchmarking robustness with the frameworks Foolbox and ART. Adversarial fine-tuning of the resulting CNN improves robust accuracy by up to 18.6 % without degrading clean accuracy. Finally, we extend robustness on time-series data beyond single windows to sequence-level robustness for sliding-window classification, introduce adversarial sequences as counterexamples to a temporal robustness requirement over full scans, and observe attack-induced misclassifications that persist across adjacent windows.
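The data-dependent preprocessing the abstract flags as hard to verify formally is per-window z-normalization. Its numerical core is small; the deployed version wraps it as a differentiable layer so attacks can propagate gradients through it, which is omitted in this plain-Python sketch.

```python
import math

def z_normalize(window, eps=1e-8):
    """Normalize one time-series window to zero mean, unit variance.
    eps guards against division by zero on constant windows."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return [(x - mean) / math.sqrt(var + eps) for x in window]

w = z_normalize([1.0, 2.0, 3.0, 4.0])
```

Because the mean and variance depend on the input itself, this step is nonlinear in the window, which is exactly why it complicates formal verification.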
【6】Tight Convergence Rates for Online Distributed Linear Estimation with Adversarial Measurements
标题:对抗观测下在线分布线性估计的紧收敛速度
链接:https://arxiv.org/abs/2604.06282
作者:Nibedita Roy,Vishal Halder,Gugan Thoppe,Alexandre Reiffers-Masson,Mihir Dhanakshirur,Naman,Alexandre Azor
备注:Preprint
摘要:我们研究了分布式参数服务器-工作节点设置中随机向量$X$的均值估计。工作节点$i$观察$a_i^\top X$的样本,其中$a_i^\top$是已知传感矩阵$A$的第$i$行。关键的挑战是对抗性测量和异步性:固定的工作节点子集可能会传输损坏的测量结果,并且工作节点被异步激活,任何时候只有一个工作节点处于活动状态。在我们以前的工作中,我们提出了一个两时间尺度的$\ell_1$-最小化算法,并在$A$的类零空间性质条件下建立了渐近恢复。在本文中,我们在相同的类零空间性质条件下建立了紧的非渐近收敛速度。我们还确定了$A$上的放松条件,在这些条件下,精确恢复可能失败,但恢复$\mathbb{E}[X]$的投影分量仍然是可能的。总的来说,我们的研究结果为具有对抗工作节点的分布式线性估计中的鲁棒性、可识别性和统计效率提供了统一的有限时间刻画,并对网络断层扫描和相关的分布式传感问题具有影响。
摘要:We study mean estimation of a random vector $X$ in a distributed parameter-server-worker setup. Worker $i$ observes samples of $a_i^\top X$, where $a_i^\top$ is the $i$th row of a known sensing matrix $A$. The key challenges are adversarial measurements and asynchrony: a fixed subset of workers may transmit corrupted measurements, and workers are activated asynchronously--only one is active at any time. In our previous work, we proposed a two-timescale $\ell_1$-minimization algorithm and established asymptotic recovery under a null-space-property-like condition on $A$. In this work, we establish tight non-asymptotic convergence rates under the same null-space-property-like condition. We also identify relaxed conditions on $A$ under which exact recovery may fail but recovery of a projected component of $\mathbb{E}[X]$ remains possible. Overall, our results provide a unified finite-time characterization of robustness, identifiability, and statistical efficiency in distributed linear estimation with adversarial workers, with implications for network tomography and related distributed sensing problems.
半/弱/无/有监督|不确定性|主动学习(4篇)
【1】Mixture Proportion Estimation and Weakly-supervised Kernel Test for Conditional Independence
标题:条件独立性的混合比例估计和弱监督核测试
链接:https://arxiv.org/abs/2604.07191
作者:Yushi Hirose,Akito Narahara,Takafumi Kanamori
备注:AISTATS 2026
摘要:混合比例估计(MPE)的目的是从未标记的数据中估计类先验。该任务是弱监督学习的关键组成部分,例如PU学习,标签噪声学习和域自适应。现有的MPE方法依赖于不可约假设或其变体的可识别性。在本文中,我们提出了新的假设的基础上的条件独立性(CI)给定的类标签,确保可识别性,即使不可约不成立。在这些假设下,我们发展了矩估计的方法,并分析了它们的渐近性质。此外,我们提出了弱监督核测试,以验证CI假设,这是独立的兴趣,如因果发现和公平性评估的应用。从经验上讲,我们证明了与现有方法相比,我们的估计器的性能有所改善,并且我们的测试成功地控制了I型和II型错误。
摘要:Mixture proportion estimation (MPE) aims to estimate class priors from unlabeled data. This task is a critical component in weakly supervised learning, such as PU learning, learning with label noise, and domain adaptation. Existing MPE methods rely on the \textit{irreducibility} assumption or its variant for identifiability. In this paper, we propose novel assumptions based on conditional independence (CI) given the class label, which ensure identifiability even when irreducibility does not hold. We develop method of moments estimators under these assumptions and analyze their asymptotic properties. Furthermore, we present weakly-supervised kernel tests to validate the CI assumptions, which are of independent interest in applications such as causal discovery and fairness evaluation. Empirically, we demonstrate the improved performance of our estimators compared with existing methods and that our tests successfully control both type I and type II errors.
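As background, the simplest moment-based MPE argument works in one dimension: the unlabeled mean is a convex combination of class-conditional means, E[X] = pi*mu_pos + (1-pi)*mu_neg, so pi is identified from three sample means. The sketch below implements only this textbook special case, not the paper's CI-based estimators; all data are made up.

```python
# One-dimensional method-of-moments class-prior estimate. Illustrative
# only: assumes the class-conditional means are estimable from labeled
# samples and differ from each other.

def mpe_moments(unlabeled, pos, neg):
    def mean(xs):
        return sum(xs) / len(xs)
    mu, mu_p, mu_n = mean(unlabeled), mean(pos), mean(neg)
    pi = (mu - mu_n) / (mu_p - mu_n)
    return min(1.0, max(0.0, pi))       # clip to a valid proportion

pos = [2.0, 2.2, 1.8]                    # labeled positives, mean 2.0
neg = [0.0, 0.2, -0.2]                   # labeled negatives, mean 0.0
unlabeled = [2.0, 0.0, 2.0, 0.0, 2.0]    # 60% positives by construction
pi_hat = mpe_moments(unlabeled, pos, neg)
```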
【2】Accuracy Improvement of Semi-Supervised Segmentation Using Supervised ClassMix and Sup-Unsup Feature Discriminator
标题:使用监督ClassMix和Sup-Unsup特征鉴别器提高半监督分割的准确性
链接:https://arxiv.org/abs/2604.07122
作者:Takahiro Mano,Reiji Saito,Kazuhiro Hotta
摘要:在语义分割中,为训练数据创建像素级标签会产生大量成本。为了解决这个问题,半监督学习,它利用少量的标记图像和未标记图像来提高性能,已经得到了关注。传统的半监督学习方法ClassMix将从未标记图像预测的类别标签粘贴到其他图像上。然而,由于ClassMix使用从未标记的图像中获得的伪标签执行操作,因此存在处理不准确标签的风险。此外,标记图像和未标记图像之间的数据质量存在差距,这可能会影响特征图。本研究针对这两个问题。首先,我们提出了一种方法,将来自标记图像的类标签以及相应的图像区域粘贴到未标记图像及其伪标记图像上。其次,我们引入了一种方法,该方法训练模型,使未标记图像的预测与标记图像的预测更相似。在Chase和COVID-19数据集上的实验表明,与传统的半监督学习方法相比,mIoU平均提高了2.07%。
摘要:In semantic segmentation, the creation of pixel-level labels for training data incurs significant costs. To address this problem, semi-supervised learning, which utilizes a small number of labeled images alongside unlabeled images to enhance the performance, has gained attention. A conventional semi-supervised learning method, ClassMix, pastes class labels predicted from unlabeled images onto other images. However, since ClassMix performs operations using pseudo-labels obtained from unlabeled images, there is a risk of handling inaccurate labels. Additionally, there is a gap in data quality between labeled and unlabeled images, which can impact the feature maps. This study addresses these two issues. First, we propose a method where class labels from labeled images, along with the corresponding image regions, are pasted onto unlabeled images and their pseudo-labeled images. Second, we introduce a method that trains the model to make predictions on unlabeled images more similar to those on labeled images. Experiments on the Chase and COVID-19 datasets demonstrated an average improvement of 2.07% in mIoU compared to conventional semi-supervised learning methods.
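The proposed supervised variant of ClassMix can be pictured as a mask-and-paste on tiny label grids. The sketch below copies pixels of selected classes, with their ground-truth labels, from a labeled image onto an unlabeled image and its pseudo-label map; the arrays and class ids are invented for illustration.

```python
def supervised_classmix(img_l, lab_l, img_u, lab_u, classes):
    """Paste pixels of `classes` from (img_l, lab_l) onto (img_u, lab_u).
    Images are 2-D lists of scalars; returns new mixed image and label."""
    h, w = len(img_l), len(img_l[0])
    out_img = [row[:] for row in img_u]   # copy so inputs stay unchanged
    out_lab = [row[:] for row in lab_u]
    for i in range(h):
        for j in range(w):
            if lab_l[i][j] in classes:    # ground-truth mask, not a pseudo-label
                out_img[i][j] = img_l[i][j]
                out_lab[i][j] = lab_l[i][j]
    return out_img, out_lab

img_l = [[9, 9], [9, 9]]
lab_l = [[1, 0], [0, 1]]                  # class-1 pixels on the diagonal
img_u = [[0, 0], [0, 0]]
lab_u = [[0, 0], [0, 0]]                  # pseudo-labels of the unlabeled image
mix_img, mix_lab = supervised_classmix(img_l, lab_l, img_u, lab_u, {1})
```

Because the pasted mask comes from ground-truth labels rather than pseudo-labels, the mixed sample avoids the inaccurate-label risk the abstract attributes to conventional ClassMix.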
【3】DynLP: Parallel Dynamic Batch Update for Label Propagation in Semi-Supervised Learning
标题:DynLP:半监督学习中标签传播的并行动态批量更新
链接:https://arxiv.org/abs/2604.06596
作者:S M Shovan,Arindam Khanda,S M Ferdous,Sajal K. Das,Mahantesh Halappanavar
备注:To be published in the ACM International Conference on Supercomputing (ICS 2026)
摘要:半监督学习的目标是仅使用一小部分标记数据来推断类别标签。在基于图的半监督学习中,这通常通过标签传播来预测未标记节点的标签来实现。然而,在现实世界的应用程序中,数据通常是分批递增到达的。每次出现新的批次时,重新应用传统的标签传播算法来重新计算所有标签是冗余的、计算密集的并且效率低下。为了解决缺乏有效的标签传播更新方法的问题,我们提出了DynLP,一种新的以GPU为中心的动态批量并行标签传播算法,它只执行必要的更新,将更改传播到相关子图,而不需要完全重新计算。通过利用GPU架构优化,与最先进的方法相比,我们的算法在大规模数据集上实现了平均13倍和高达102倍的加速。
摘要:Semi-supervised learning aims to infer class labels using only a small fraction of labeled data. In graph-based semi-supervised learning, this is typically achieved through label propagation to predict labels of unlabeled nodes. However, in real-world applications, data often arrive incrementally in batches. Each time a new batch appears, reapplying the traditional label propagation algorithm to recompute all labels is redundant, computationally intensive, and inefficient. To address the absence of an efficient label propagation update method, we propose DynLP, a novel GPU-centric Dynamic Batched Parallel Label Propagation algorithm that performs only the necessary updates, propagating changes to the relevant subgraph without requiring full recalculation. By exploiting GPU architectural optimizations, our algorithm achieves on average 13x and up to 102x speedup on large-scale datasets compared to state-of-the-art approaches.
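The propagation step DynLP accelerates can be sketched sequentially: each unlabeled node repeatedly adopts the majority label of its neighbors until a fixed point. DynLP's contribution, not shown here, is restricting such updates to the subgraph affected by each new batch and running them in parallel on GPU.

```python
from collections import Counter

def propagate(adj, seed_labels, max_iters=10):
    """Toy label propagation on an adjacency-list graph.
    Seed labels stay fixed; other nodes take their neighbors' majority label."""
    labels = dict(seed_labels)
    for _ in range(max_iters):
        changed = False
        for node, nbrs in adj.items():
            if node in seed_labels:
                continue                               # seeds stay fixed
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                best = votes.most_common(1)[0][0]
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break                                      # fixed point reached
    return labels

# A path graph 0-1-2-3-4 with labeled endpoints.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = propagate(adj, {0: "A", 4: "B"})
```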
【4】ELC: Evidential Lifelong Classifier for Uncertainty Aware Radar Pulse Classification
标题:ELC:不确定性感知雷达脉冲分类的证据终身分类器
链接:https://arxiv.org/abs/2604.06958
作者:Mohamed Rabie,Chinthana Panagamuwa,Konstantinos G. Kyriakopoulos
备注:IEEE RadarConf'26 Submission. 6 pages; 3 figures; 1 table
摘要:可靠的雷达脉冲分类是电磁战态势感知和决策支持的基础。深度神经网络在雷达脉冲和射频发射器识别方面表现出了强大的性能;然而,它们本身很难有效地学习新脉冲,并且缺乏表达预测信心的机制。本文将不确定性量化与终身学习相结合,以应对这两个挑战。所提出的方法是一个证据终身分类器(ELC),使用证据理论对认知不确定性进行建模。ELC是针对贝叶斯终身分类器(BLC)进行评估的,该分类器通过香农熵量化不确定性。两者都集成了Learn-Prune-Share,以实现对新脉冲的持续学习和基于不确定性的选择性预测,以拒绝不可靠的预测。在2个合成雷达和3个RF指纹数据集上评估ELC和BLC。在合成雷达脉冲数据集上,基于证据不确定性的选择性预测在-20 dB SNR下将召回率提高了多达46%,与BLC相比,突出了其在低SNR条件下识别不可靠预测的有效性。这些研究结果表明,证据不确定性在信心与正确性之间提供了强相关性,通过允许ELC表达无知来提高其可信度。
摘要:Reliable radar pulse classification is essential in Electromagnetic Warfare for situational awareness and decision support. Deep Neural Networks have shown strong performance in radar pulse and RF emitter recognition; however, on their own they struggle to efficiently learn new pulses and lack mechanisms for expressing predictive confidence. This paper integrates Uncertainty Quantification with Lifelong Learning to address both challenges. The proposed approach is an Evidential Lifelong Classifier (ELC), which models epistemic uncertainty using evidence theory. ELC is evaluated against a Bayesian Lifelong Classifier (BLC), which quantifies uncertainty through Shannon entropy. Both integrate Learn-Prune-Share to enable continual learning of new pulses and uncertainty-based selective prediction to reject unreliable predictions. ELC and BLC are evaluated on 2 synthetic radar and 3 RF fingerprinting datasets. Selective prediction based on evidential uncertainty improves recall by up to 46% at -20 dB SNR on synthetic radar pulse datasets, highlighting its effectiveness at identifying unreliable predictions in low-SNR conditions compared to BLC. These findings demonstrate that evidential uncertainty offers a strong correlation between confidence and correctness, improving the trustworthiness of ELC by allowing it to express ignorance.
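The evidential mechanism behind an ELC-style classifier follows the standard evidential deep learning formulation: non-negative per-class evidence parameterizes a Dirichlet distribution, and the unassigned belief mass u = K/S (with S the total Dirichlet strength) expresses ignorance. A minimal sketch, with made-up evidence vectors standing in for network outputs:

```python
def evidential_prediction(evidence):
    """evidence: non-negative per-class evidence (e.g., from a softplus head).
    Returns per-class beliefs and the unassigned uncertainty mass."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]   # Dirichlet parameters
    s = sum(alpha)                        # Dirichlet strength
    beliefs = [e / s for e in evidence]
    uncertainty = k / s                   # belief mass left unassigned
    return beliefs, uncertainty

b_conf, u_conf = evidential_prediction([18.0, 1.0, 1.0])  # confident sample
b_ign, u_ign = evidential_prediction([0.1, 0.1, 0.1])     # near-ignorant sample
```

Selective prediction then amounts to rejecting samples whose uncertainty exceeds a threshold, which is how low-SNR pulses with little evidence get flagged as unreliable.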
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】CADENCE: Context-Adaptive Depth Estimation for Navigation and Computational Efficiency
标题:CADEENCE:用于导航和计算效率的上下文自适应深度估计
链接:https://arxiv.org/abs/2604.07286
作者:Timothy K Johnsen,Marco Levorato
备注:7 pages, 7 figures, Accepted for publication at IEEE World AI IoT Congress (AIIoT) 2026
摘要:部署在偏远环境中的自动驾驶汽车通常依赖于嵌入式处理器、紧凑型电池和轻型传感器。这些硬件限制与获得环境的鲁棒表示的需求相冲突,这通常需要执行计算密集型深度神经网络来进行感知。为了应对这一挑战,我们提出了CADENCE,一个自适应系统,动态缩放的计算复杂性的可瘦身的单目深度估计网络,以响应导航需求和环境背景。通过闭合感知保真度和驱动要求之间的回路,CADENCE确保高精度计算仅在关键任务时使用。我们对已发布的开源测试平台进行了评估,该平台将Microsoft AirSim与NVIDIA Jetson Orin Nano集成在一起。与最先进的静态方法相比,CADENCE将传感器采集、功耗和推理延迟分别降低了9.67%、16.1%和74.8%。结果表明,能源消耗总体减少了75.0%,导航精度提高了7.43%。
摘要:Autonomous vehicles deployed in remote environments typically rely on embedded processors, compact batteries, and lightweight sensors. These hardware limitations conflict with the need to derive robust representations of the environment, which often requires executing computationally intensive deep neural networks for perception. To address this challenge, we present CADENCE, an adaptive system that dynamically scales the computational complexity of a slimmable monocular depth estimation network in response to navigation needs and environmental context. By closing the loop between perception fidelity and actuation requirements, CADENCE ensures high-precision computing is only used when mission-critical. We conduct evaluations on our released open-source testbed that integrates Microsoft AirSim with an NVIDIA Jetson Orin Nano. As compared to a state-of-the-art static approach, CADENCE decreases sensor acquisitions, power consumption, and inference latency by 9.67%, 16.1%, and 74.8%, respectively. The results demonstrate an overall reduction in energy expenditure by 75.0%, along with an increase in navigation accuracy by 7.43%.
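A context-adaptive policy of the kind the abstract describes can be caricatured as a width selector keyed on navigation context; the thresholds and width multipliers below are invented for this digest and are not CADENCE's actual policy.

```python
def select_width(obstacle_distance_m):
    """Pick a slimmable-network width multiplier from a toy navigation
    context: run the full network only when obstacles are close."""
    if obstacle_distance_m < 2.0:
        return 1.0     # mission-critical: full-precision perception
    if obstacle_distance_m < 10.0:
        return 0.5     # moderate clutter: half-width network
    return 0.25        # open space: cheapest configuration

widths = [select_width(d) for d in (1.0, 5.0, 50.0)]
```

The point of such a policy is that energy is spent proportionally to risk, which is how the paper reports large energy savings without losing navigation accuracy.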
【2】Tracking Adaptation Time: Metrics for Temporal Distribution Shift
标题:跟踪适应时间:时间分布转移的预设
链接:https://arxiv.org/abs/2604.07266
作者:Lorenzo Iovine,Giacomo Ziffer,Emanuele Della Valle
备注:Accepted at CEUR-WS Vol. 4183 (Streaming Continual Learning Bridge at AAAI 2026)
摘要:在时间分布偏移下评估鲁棒性仍然是一个开放的挑战。现有指标量化了性能的平均下降,但未能捕捉模型如何适应不断变化的数据。因此,时间退化经常被误解:当准确性下降时,不清楚是模型无法适应,还是数据本身变得更难学习。在这项工作中,我们提出三个互补的指标,以区分模型的适应能力与数据本身的内在难度。总之,这些指标为时间分布偏移下的模型行为提供了一个动态且可解释的视图。结果表明,我们的指标发现了被现有分析隐藏的适应模式,为不断变化环境中的时间鲁棒性提供了更丰富的理解。
摘要:Evaluating robustness under temporal distribution shift remains an open challenge. Existing metrics quantify the average decline in performance, but fail to capture how models adapt to evolving data. As a result, temporal degradation is often misinterpreted: when accuracy declines, it is unclear whether the model is failing to adapt or whether the data itself has become inherently more challenging to learn. In this work, we propose three complementary metrics to distinguish adaptation from intrinsic difficulty in the data. Together, these metrics provide a dynamic and interpretable view of model behavior under temporal distribution shift. Results show that our metrics uncover adaptation patterns hidden by existing analysis, offering a richer understanding of temporal robustness in evolving environments.
【3】Predictive Representations for Skill Transfer in Reinforcement Learning
标题:强化学习中技能转移的预测表示
链接:https://arxiv.org/abs/2604.07016
作者:Ruben Vereecken,Luke Dickens,Alessandra Russo
备注:Research conducted: September 2018 to June 2021. This manuscript represents the work as of June 2021
摘要:扩展强化学习的一个关键挑战是泛化学到的行为。如果没有继承已获得知识的能力,智能体注定要从头开始学习每一项任务。在本文中,我们开发了一种凭借状态抽象实现迁移的新形式化方法。基于对环境的任务无关的紧凑观察(结果),我们引入结果预测状态表示(OPSRs),这是一种以智能体为中心、与任务无关、由对结果的预测构成的抽象。我们从形式化和实证两方面表明,它们具有实现最优但有限迁移的潜力,随后通过引入基于OPSR的技能(即基于option的抽象动作,可凭借状态抽象在任务之间重用)来克服这种权衡。在一系列实证研究中,我们从演示中学习基于OPSR的技能,并展示了它们如何在没有任何预处理的情况下,在全新的和未见过的任务中大大加快学习速度。我们相信,本工作引入的框架是朝向RL中一般迁移、特别是通过结合状态与动作抽象实现迁移的有前途的一步。
摘要:A key challenge in scaling up Reinforcement Learning is generalizing learned behaviour. Without the ability to carry forward acquired knowledge an agent is doomed to learn each task from scratch. In this paper we develop a new formalism for transfer by virtue of state abstraction. Based on task-independent, compact observations (outcomes) of the environment, we introduce Outcome-Predictive State Representations (OPSRs), agent-centered and task-independent abstractions that are made up of predictions of outcomes. We show formally and empirically that they have the potential for optimal but limited transfer, then overcome this trade-off by introducing OPSR-based skills, i.e. abstract actions (based on options) that can be reused between tasks as a result of state abstraction. In a series of empirical studies, we learn OPSR-based skills from demonstrations and show how they speed up learning considerably in entirely new and unseen tasks without any pre-processing. We believe that the framework introduced in this work is a promising step towards transfer in RL in general, and towards transfer through combining state and action abstraction specifically.
【4】Evaluating PQC KEMs, Combiners, and Cascade Encryption via Adaptive IND-CPA Testing Using Deep Learning
标题:使用深度学习通过自适应IND-CPA测试评估PQC KEM、组合器和级联加密
链接:https://arxiv.org/abs/2604.06942
作者:Simon Calderon,Niklas Johansson,Onur Günlü
摘要:确保密文不可区分性是密码安全的基础,但在实际实现和混合设置中凭经验验证此属性带来了实际挑战。向后量子密码学(PQC)的过渡,及其结合经典和抗量子原语的混合结构,使得经验验证方法越来越有价值。通过将IND-CPA游戏建模为二进制分类任务,并在具有BCE损失的标记密文数据上进行训练,我们研究了用于密文不可区分性的深度神经网络(DNN)区分器。我们将此方法应用于PQC KEM。我们专门测试了用于构建ML-KEM、BIKE和HQC等示例的公钥加密(PKE)方案。此外,提出了该DNN建模的一种新扩展,用于混合KEM的经验可区分性测试。我们在PQC KEM与普通RSA、RSA-OAEP和明文的组合上实现并测试了这一点。最后,通过将DNN IND-CPA分类框架应用于级联对称加密来说明方法的一般性,其中我们测试了AES-CTR,AES-CBC,AES-ECB,ChaCha20和DES-ECB的组合。在我们对PQC算法、KEM组合器和级联加密的实验中,没有算法或算法组合表现出显著的优势(双侧二项式检验,显著性水平$α= 0.01$),这与理论保证相一致,即包括至少一个IND-CPA安全组件的混合体保持不可区分性,并且在所考虑的DNN攻击者模型下不存在可利用的模式。这些说明了使用深度学习作为更一般的IND-CPA设置中自适应、实用且通用的不可区分性经验估计器的潜力,允许对实现和组合进行数据驱动的验证,并补充分析性安全分析。
摘要:Ensuring ciphertext indistinguishability is fundamental to cryptographic security, but empirically validating this property in real implementations and hybrid settings presents practical challenges. The transition to post-quantum cryptography (PQC), with its hybrid constructions combining classical and quantum-resistant primitives, makes empirical validation approaches increasingly valuable. By modeling IND-CPA games as binary classification tasks and training on labeled ciphertext data with BCE loss, we study deep neural network (DNN) distinguishers for ciphertext indistinguishability. We apply this methodology to PQC KEMs. We specifically test the public-key encryption (PKE) schemes used to construct examples such as ML-KEM, BIKE, and HQC. Moreover, a novel extension of this DNN modeling for empirical distinguishability testing of hybrid KEMs is presented. We implement and test this on combinations of PQC KEMs with plain RSA, RSA-OAEP, and plaintext. Finally, methodological generality is illustrated by applying the DNN IND-CPA classification framework to cascade symmetric encryption, where we test combinations of AES-CTR, AES-CBC, AES-ECB, ChaCha20, and DES-ECB. In our experiments on PQC algorithms, KEM combiners, and cascade encryption, no algorithm or combination of algorithms demonstrates a significant advantage (two-sided binomial test, significance level $α= 0.01$), consistent with theoretical guarantees that hybrids including at least one IND-CPA-secure component preserve indistinguishability, and with the absence of exploitable patterns under the considered DNN adversary model. These illustrate the potential of using deep learning as an adaptive, practical, and versatile empirical estimator for indistinguishability in more general IND-CPA settings, allowing data-driven validation of implementations and compositions and complementing the analytical security analysis.
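The significance check quoted in the abstract (a two-sided binomial test at alpha = 0.01 against the chance rate 1/2) can be reproduced exactly with an elementary minimum-likelihood p-value; the trial counts below are invented examples, not the paper's data.

```python
from math import comb

def binom_two_sided_p(successes, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes no more
    likely than the observed count under Binomial(n, p)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    obs = pmf(successes)
    # Small relative tolerance so the symmetric counterpart is included.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= obs * (1 + 1e-9))

def significant_advantage(successes, n, alpha=0.01):
    """Does a distinguisher's accuracy differ significantly from chance?"""
    return binom_two_sided_p(successes, n) < alpha

# 520/1000 correct is consistent with guessing; 600/1000 is not.
```

A distinguisher at 52% accuracy over 1000 trials fails this test, matching the abstract's "no significant advantage" finding for secure constructions.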
【5】STQuant: Spatio-Temporal Adaptive Framework for Optimizer Quantization in Large Multimodal Model Training
标题:STQuant:大型多模态模型训练中优化器量化的时空自适应框架
链接:https://arxiv.org/abs/2604.06836
作者:Minglu Liu,Cunchen Hu,Liangliang Xu,Fengming Tang,Ruijia Wang,Fu Yu
摘要:量化是减少大规模模型训练内存开销的有效方法。然而,大多数现有的方法采用固定精度的策略,忽略了优化器状态分布在层和训练步骤之间变化很大的事实。这样的均匀设计通常引入显著的精度降低。为了超越固定量化,我们提出了STQuant,这是一个分布式训练框架,通过跨层、状态变量和训练步骤的动态精度分配来减少优化器状态的内存占用,同时保持模型质量。出于两个原因,在训练期间天真地应用动态量化具有挑战性。首先,优化器状态在数值上是敏感的,并且量化噪声会使质量不稳定。第二,联合考虑多个状态和层导致大的组合搜索空间。STQuant通过两个关键技术解决了这些挑战:1)可证明接近最优的因子选择策略,可以准确识别对精度自适应影响最大的因子。2)一种动态转换决策算法,将搜索成本从指数复杂度降低到线性复杂度。在GPT-2和ViT上的实验表明,与现有解决方案相比,STQuant将优化器状态内存减少了84.4%,平均位宽低至5.1位。此外,STQuant只会产生O(N/K)的计算开销,并需要O(1)的额外空间。
摘要:Quantization is an effective way to reduce the memory cost of large-scale model training. However, most existing methods adopt fixed-precision policies, which ignore the fact that optimizer-state distributions vary significantly across layers and training steps. Such uniform designs often introduce noticeable accuracy degradation. To move beyond fixed quantization, we propose STQuant, a distributed training framework that reduces the memory footprint of optimizer states via dynamic precision allocation across layers, state variables, and training steps, while maintaining model quality. Naively applying dynamic quantization during training is challenging for two reasons. First, optimizer states are numerically sensitive, and quantization noise can destabilize quality. Second, jointly considering multiple states and layers induces a large combinatorial search space. STQuant addresses these challenges with two key techniques: 1) a provably near-optimal factor selection strategy that accurately identifies the most influential factors for precision adaptation, and 2) a dynamic transition decision algorithm that reduces the search cost from exponential to linear complexity. Experiments on GPT-2 and ViT show that STQuant reduces optimizer-state memory by 84.4%, achieving an average bit-width of as low as 5.1 bits, compared with existing solutions. Moreover, STQuant incurs only O(N/K) computational overhead and requires O(1) extra space.
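As a toy illustration of why per-tensor dynamic precision helps (this is not STQuant's algorithm; its factor selection and transition decisions are far more involved), one can pick, per optimizer-state tensor, the smallest bit-width whose quantization error stays under a sensitivity tolerance:

```python
def quantize(xs, bits):
    """Uniform quantization of a list of floats to 2**bits levels."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return list(xs)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((x - lo) / step) * step for x in xs]

def pick_bitwidth(xs, tol, candidates=(4, 5, 6, 8)):
    """Smallest candidate bit-width whose mean squared quantization error
    is below `tol` -- a stand-in for per-layer/per-step sensitivity."""
    for b in candidates:
        q = quantize(xs, b)
        if sum((x - y) ** 2 for x, y in zip(xs, q)) / len(xs) < tol:
            return b
    return candidates[-1]
```

Tensors with benign distributions (or loose tolerances) settle at 4 bits while sensitive ones escalate, which is how a sub-8-bit average such as the reported 5.1 bits can arise.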
【6】Instance-Adaptive Parametrization for Amortized Variational Inference
标题:摊销变分推理的实例自适应参数化
链接:https://arxiv.org/abs/2604.06796
作者:Andrea Pollastro,Andrea Apicella,Francesco Isgrò,Roberto Prevete
摘要:潜变量模型,包括变分自编码器(VAE),由于其可扩展性和良好的概率公式,仍然是现代深度生成建模的核心工具。这些模型依赖于摊销变分推理来实现有效的后验近似,但这种效率是以共享参数化为代价的,从而产生摊销缺口。我们提出了实例自适应变分自动编码器(IA-VAE),一个摊销变分推理框架,其中超网络生成共享编码器的输入相关调制。这使得推理模型能够进行特定于输入的自适应,同时保持单个前向传递的效率。通过利用实例特定的参数调制,所提出的方法可以实现与具有少得多的参数的标准编码器相当的性能,这表明模型容量的更有效的使用。在真实后验概率已知的情况下,对合成数据进行的实验表明,IA-VAE方法可以得到更准确的后验概率近似,并减小了摊销差距。类似地,在标准图像基准上,IA-VAE一致地改善了与基线VAE相比的保持ELBO,在多次运行中具有统计学显著性增益。这些结果表明,通过实例自适应调制增加推理参数化的灵活性是减轻深度生成模型中摊销引起的次优性的关键因素。
摘要:Latent variable models, including variational autoencoders (VAE), remain a central tool in modern deep generative modeling due to their scalability and a well-founded probabilistic formulation. These models rely on amortized variational inference to enable efficient posterior approximation, but this efficiency comes at the cost of a shared parametrization, giving rise to the amortization gap. We propose the instance-adaptive variational autoencoder (IA-VAE), an amortized variational inference framework in which a hypernetwork generates input-dependent modulations of a shared encoder. This enables input-specific adaptation of the inference model while preserving the efficiency of a single forward pass. By leveraging instance-specific parameter modulations, the proposed approach can achieve performance comparable to standard encoders with substantially fewer parameters, indicating a more efficient use of model capacity. Experiments on synthetic data, where the true posterior is known, show that IA-VAE yields more accurate posterior approximations and reduces the amortization gap. Similarly, on standard image benchmarks, IA-VAE consistently improves held-out ELBO over baseline VAEs, with statistically significant gains across multiple runs. These results suggest that increasing the flexibility of the inference parametrization through instance-adaptive modulation is a key factor in mitigating amortization-induced suboptimality in deep generative models.
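The core mechanism, a hypernetwork emitting input-dependent modulations of a shared encoder, can be sketched with FiLM-style per-feature scales and shifts. All function bodies below are illustrative assumptions (in IA-VAE both networks are learned):

```python
import math

def shared_encoder(x, w):
    """Shared encoder: a single tanh layer with per-feature weights w."""
    return [math.tanh(wi * xi) for wi, xi in zip(w, x)]

def hypernet(x):
    """Toy hypernetwork: maps the input to per-feature (scale, shift)
    modulations, so the encoder adapts per instance."""
    m = sum(x) / len(x)
    scale = [1.0 + 0.1 * m for _ in x]
    shift = [0.1 * (xi - m) for xi in x]
    return scale, shift

def ia_encode(x, w):
    """Instance-adaptive encoding: modulated shared features,
    still computed in a single forward pass."""
    h = shared_encoder(x, w)
    scale, shift = hypernet(x)
    return [s * hi + b for s, hi, b in zip(scale, h, shift)]
```

The shared weights `w` never change per instance; only the cheap modulations do, which is the source of the parameter efficiency claimed in the abstract.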
【7】Adaptive Prompt Structure Factorization: A Framework for Self-Discovering and Optimizing Compositional Prompt Programs
标题:自适应提示结构分解:自我发现和优化合成提示程序的框架
链接:https://arxiv.org/abs/2604.06699
作者:Haoyue Liu,Zhichao Wang,Yongxin Guo,Haoran Shou,Xiaoying Tang
摘要:自动提示优化对于从大型语言模型(LLM)中引出可靠的推理至关重要,但大多数仅API提示优化器迭代地编辑整体提示,耦合组件并模糊信用分配,限制可控性并浪费令牌。我们提出了自适应提示结构因子化(aPSF),一个仅API的框架(提示输入/文本输出;不访问模型内部),使用架构师模型来发现特定于任务的提示结构作为语义因素。aPSF然后执行干预性单因素更新:干预性因素级别评分通过验证性能变化估计每个因素的边际贡献,错误引导因素选择将更新路由到当前主要故障源,以实现样本效率更高的优化。在多个高级推理基准测试中,aPSF的性能优于包括原理感知优化器在内的强大基线,平均将准确性最多提高2.16个百分点,并将MultiArith上的优化令牌成本降低了45-87%,同时仅需1步即可达到峰值验证性能。
摘要:Automated prompt optimization is crucial for eliciting reliable reasoning from large language models (LLMs), yet most API-only prompt optimizers iteratively edit monolithic prompts, coupling components and obscuring credit assignment, limiting controllability, and wasting tokens. We propose Adaptive Prompt Structure Factorization (aPSF), an API-only framework (prompt-in/text-out; no access to model internals) that uses an Architect model to discover task-specific prompt structures as semantic factors. aPSF then performs interventional, single-factor updates: interventional factor-level scoring estimates each factor's marginal contribution via validation-performance changes, and error-guided factor selection routes updates to the current dominant failure source for more sample-efficient optimization. Across multiple advanced reasoning benchmarks, aPSF outperforms strong baselines including principle-aware optimizers, improving accuracy by up to +2.16 percentage points on average, and reduces optimization cost by 45--87% tokens on MultiArith while reaching peak validation in 1 step.
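The interventional single-factor update admits a compact sketch: swap one semantic factor at a time, score each swap by the change in a validation metric, and apply only the best-scoring edit. The factor names and the `evaluate` callback below are placeholders, not aPSF's actual components:

```python
def factor_scores(prompt, candidates, evaluate):
    """Interventional factor-level scoring: marginal validation gain of
    swapping each factor in isolation (all other factors held fixed)."""
    base = evaluate(prompt)
    return {name: evaluate(dict(prompt, **{name: alt})) - base
            for name, alt in candidates.items()}

def single_factor_update(prompt, candidates, evaluate):
    """Apply only the highest-scoring single-factor edit, if it helps."""
    scores = factor_scores(prompt, candidates, evaluate)
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return dict(prompt, **{best: candidates[best]}), best
    return prompt, None
```

Error-guided selection in aPSF would bias `best` toward the factor implicated in the dominant failure mode rather than a pure argmax; the sketch keeps the argmax for brevity.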
【8】Accelerating 4D Hyperspectral Imaging through Physics-Informed Neural Representation and Adaptive Sampling
标题:通过基于物理的神经表示和自适应采样加速4D高光谱成像
链接:https://arxiv.org/abs/2604.06561
作者:Chi-Jui Ho,Harsh Bhakta,Wei Xiong,Nicholas Antipa
备注:18 pages, 14 figures
摘要:高维高光谱成像(HSI)使超快分子动力学和复杂、异构光谱的可视化成为可能。然而,将这种能力用于解析二维红外(2DIR)光谱(一种相干多维光谱(CMDS))中空间变化的振动耦合,需要过长的数据采集时间,这源于密集的奈奎斯特采样要求和大量信号累积的需要。为了解决这一挑战,我们引入了一种物理信息神经表示方法,可以有效地从稀疏的实验测量中重建密集的空间分辨2DIR高光谱图像。特别地,我们使用多层感知器(MLP)来建模子采样的4D坐标与其对应的光谱强度之间的关系,并从有限的观测中恢复密集采样的4D光谱。重建结果表明,我们的方法仅使用一小部分样本,即可忠实地恢复实验测量中振荡与非振荡的光谱动力学。此外,我们开发了一种损失感知自适应采样方法,在实验进行的同时,为迭代数据收集逐步引入潜在的高信息量样本。实验结果表明,相比穷举采样,所提出的方法仅使用$1/32$的采样预算即实现了高保真光谱恢复,有效地将总实验时间减少了多达32倍。该框架为加速任何使用超立方体数据的实验(包括多维光谱和高光谱成像)提供了可扩展的解决方案,为瞬态生物和材料系统的快速化学成像铺平了道路。
摘要:High-dimensional hyperspectral imaging (HSI) enables the visualization of ultrafast molecular dynamics and complex, heterogeneous spectra. However, applying this capability to resolve spatially varying vibrational couplings in two-dimensional infrared (2DIR) spectroscopy, a type of coherent multidimensional spectroscopy (CMDS), necessitates prohibitively long data acquisition, driven by dense Nyquist sampling requirements and the need for extensive signal accumulation. To address this challenge, we introduce a physics-informed neural representation approach that efficiently reconstructs dense spatially-resolved 2DIR hyperspectral images from sparse experimental measurements. In particular, we used a multilayer perceptron (MLP) to model the relationship between the sub-sampled 4D coordinates and their corresponding spectral intensities, and recover densely sampled 4D spectra from limited observations. The reconstruction results demonstrate that our method, using a fraction of the samples, faithfully recovers both oscillatory and non-oscillatory spectral dynamics in experimental measurement. Moreover, we develop a loss-aware adaptive sampling method to progressively introduce potentially informative samples for iterative data collection while conducting experiments. Experimental results show that the proposed approach achieves high-fidelity spectral recovery using only $1/32$ of the sampling budget, as opposed to exhaustive sampling, effectively reducing total experiment time by up to 32-fold. This framework offers a scalable solution for accelerating any experiments with hypercube data, including multidimensional spectroscopy and hyperspectral imaging, paving the way for rapid chemical imaging of transient biological and material systems.
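The loss-aware adaptive sampling loop can be illustrated in 1-D, with nearest-neighbour interpolation standing in for the coordinate MLP: reconstruct from the current samples, then acquire the coordinate where the reconstruction disagrees most with a probe measurement. Everything below is a didactic simplification of the 4D setting:

```python
def reconstruct(samples, x):
    """Nearest-neighbour stand-in for the coordinate network: predict the
    value of the closest measured coordinate."""
    i = min(range(len(samples)), key=lambda j: abs(samples[j][0] - x))
    return samples[i][1]

def loss_aware_step(samples, pool, measure):
    """Acquire the unmeasured coordinate with the largest current
    reconstruction error, and fold it into the sample set."""
    worst = max(pool, key=lambda x: abs(reconstruct(samples, x) - measure(x)))
    samples.append((worst, measure(worst)))
    pool.remove(worst)
    return worst
```

Repeating this step concentrates the sampling budget where the current model fits worst, which is the intuition behind reaching full fidelity at a fraction of exhaustive sampling.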
强化学习(7篇)
【1】Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization
标题:Smart Commander:用于车队级PHM决策优化的分层强化学习框架
链接:https://arxiv.org/abs/2604.07171
作者:Yong Si,Mingfei Lu,Jing Li,Yang Hu,Guijiang Li,Yueheng Song,Zhaokui Wang
备注:21 pages, 6 figures, 4 tables
摘要:由于大规模机群作战中的“维数灾难”以及稀疏反馈和随机任务剖面,军用航空预测与健康管理(PHM)决策面临着重大挑战。为了解决这些问题,本文提出了智能指挥官,一种新的分层强化学习(HRL)框架,旨在优化顺序维护和物流决策。该框架将复杂的控制问题分解为两层层次结构:战略总指挥官管理舰队级可用性和成本目标,而战术作战指挥官则执行特定的行动,以进行出动、维护调度和资源分配。该方法在一个定制的、高保真的离散事件仿真环境中进行了验证,该环境捕捉了飞机配置和保障后勤的动态特性。通过将分层奖励成形与规划增强神经网络相结合,该方法有效地解决了稀疏和延迟奖励的困难。经验评估表明,Smart Commander的性能明显优于传统的单片深度强化学习(DRL)和基于规则的基线。值得注意的是,它实现了训练时间的大幅减少,同时在易发生故障的环境中展示了卓越的可扩展性和鲁棒性。这些结果突出了HRL作为下一代智能车队管理的可靠范例的潜力。
摘要:Decision-making in military aviation Prognostics and Health Management (PHM) faces significant challenges due to the "curse of dimensionality" in large-scale fleet operations, combined with sparse feedback and stochastic mission profiles. To address these issues, this paper proposes Smart Commander, a novel Hierarchical Reinforcement Learning (HRL) framework designed to optimize sequential maintenance and logistics decisions. The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation. The proposed approach is validated within a custom-built, high-fidelity discrete-event simulation environment that captures the dynamics of aircraft configuration and support logistics. By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards. Empirical evaluations demonstrate that Smart Commander significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines. Notably, it achieves a substantial reduction in training time while demonstrating superior scalability and robustness in failure-prone environments. These results highlight the potential of HRL as a reliable paradigm for next-generation intelligent fleet management.
【2】Energy Saving for Cell-Free Massive MIMO Networks: A Multi-Agent Deep Reinforcement Learning Approach
标题:无细胞大规模CDMA网络的节能:多代理深度强化学习方法
链接:https://arxiv.org/abs/2604.07133
作者:Qichen Wang,Keyu Li,Ozan Alp Topal,Özlem Tugfe Demir,Mustafa Ozger,Cicek Cavdar
摘要:本文主要研究动态业务条件下无小区大规模MIMO(CF mMIMO)网络下行链路的节能问题。我们提出了一种多智能体深度强化学习(MADRL)算法,使每个接入点(AP)能够自主控制天线重新配置和高级睡眠模式(ASM)选择。在训练过程之后,所提出的框架以完全分布式的方式运行,消除了对集中控制的需要,并允许每个AP根据实时流量波动进行动态调整。仿真结果表明,与没有任何节能方案的系统相比,该算法将功耗(PC)降低了56.23%;与仅利用最轻睡眠模式的非学习机制相比,降低了30.12%,而丢弃率仅略有增加。此外,与广泛使用的深度Q网络(DQN)算法相比,它实现了类似的PC水平,但丢弃率显著更低。
摘要:This paper focuses on energy savings in downlink operation of cell-free massive MIMO (CF mMIMO) networks under dynamic traffic conditions. We propose a multi-agent deep reinforcement learning (MADRL) algorithm that enables each access point (AP) to autonomously control antenna re-configuration and advanced sleep mode (ASM) selection. After the training process, the proposed framework operates in a fully distributed manner, eliminating the need for centralized control and allowing each AP to dynamically adjust to real-time traffic fluctuations. Simulation results show that the proposed algorithm reduces power consumption (PC) by 56.23% compared to systems without any energy-saving scheme and by 30.12% relative to a non-learning mechanism that only utilizes the lightest sleep mode, with only a slight increase in drop ratio. Moreover, compared to the widely used deep Q-network (DQN) algorithm, it achieves a similar PC level but with a significantly lower drop ratio.
【3】Epistemic Robust Offline Reinforcement Learning
标题:认知稳健离线强化学习
链接:https://arxiv.org/abs/2604.07072
作者:Abhilash Reddy Chenreddy,Erick Delage
摘要:离线强化学习从固定数据集学习策略,无需进一步的环境交互。在这种情况下,一个关键的挑战是认识上的不确定性,由有限的或有偏见的数据覆盖面,特别是当行为政策系统地避免某些行动。这可能导致不准确的值估计和不可靠的泛化。像SAC-N这样的基于集成的方法通过使用集成最小值保守地估计Q值来减轻这一点,但它们需要大的集成,并且经常将认识不确定性与任意不确定性混为一谈。为了解决这些局限性,我们提出了一个统一且可推广的框架,该框架用Q值上的紧凑不确定集取代离散系综。我们进一步引入了一个基于Epinet的模型,该模型直接对不确定性集进行整形,以优化鲁棒Bellman目标下的累积奖励,而不依赖于集成。我们还介绍了一个基准评估离线RL算法下的风险敏感的行为政策,并证明了我们的方法实现了改进的鲁棒性和泛化基于集合的基线在表格和连续状态域。
摘要:Offline reinforcement learning learns policies from fixed datasets without further environment interaction. A key challenge in this setting is epistemic uncertainty, arising from limited or biased data coverage, particularly when the behavior policy systematically avoids certain actions. This can lead to inaccurate value estimates and unreliable generalization. Ensemble-based methods like SAC-N mitigate this by conservatively estimating Q-values using the ensemble minimum, but they require large ensembles and often conflate epistemic with aleatoric uncertainty. To address these limitations, we propose a unified and generalizable framework that replaces discrete ensembles with compact uncertainty sets over Q-values. We further introduce an Epinet based model that directly shapes the uncertainty sets to optimize the cumulative reward under the robust Bellman objective without relying on ensembles. We also introduce a benchmark for evaluating offline RL algorithms under risk-sensitive behavior policies, and demonstrate that our method achieves improved robustness and generalization over ensemble-based baselines across both tabular and continuous state domains.
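The effect of a compact uncertainty set over Q-values is easiest to see in a one-step robust Bellman backup: act greedily with respect to the set's lower bound, so that actions with poor data coverage (large epistemic radius) are avoided even when their mean estimate looks attractive. The interval parameterization (mean ± radius) below is an illustrative assumption, not the paper's Epinet construction:

```python
def robust_backup(reward, gamma, next_q):
    """One-step robust Bellman backup. next_q maps action -> (mean, radius),
    where radius encodes epistemic uncertainty from limited coverage.
    The backup is greedy w.r.t. the lower bound mean - radius."""
    lower = {a: m - r for a, (m, r) in next_q.items()}
    best = max(lower, key=lower.get)
    return reward + gamma * lower[best], best
```

With an N-member ensemble, `radius` would implicitly be the spread of the members; the compact-set view makes it an explicit, tunable quantity.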
【4】Production-Ready Automated ECU Calibration using Residual Reinforcement Learning
标题:使用残差强化学习的生产就绪自动化电子控制单元校准
链接:https://arxiv.org/abs/2604.07059
作者:Andreas Kampmeier,Kevin Badalian,Lucas Koch,Sung-Yong Lee,Jakob Andert
备注:This manuscript has been submitted to SAE as a conference paper for the 2026 Stuttgart International Symposium on Automotive and Powertrain Technology
摘要:电子控制单元(ECU)在将昔日的汽车转变为我们今天在道路上看到的现代汽车方面发挥了关键作用。它们主动调节单个组件的驱动,从而确定整个系统的特性。在这种情况下,控制功能的行为在很大程度上取决于工程师传统上手工设计的校准参数。这是在客户期望不断提高和产品开发周期不断缩短的环境下发生的。与此同时,立法要求越来越高,排放标准越来越严格。最重要的是,考虑到车辆变体的数量,传统方法正在失去其实际和财务可行性。先前的工作已经证明,最佳控制功能可以通过强化学习(RL)自动开发;但由于所产生的功能由人工神经网络表示,它们缺乏可解释性,这使它们难以用于量产车辆。在这篇文章中,我们提出了一种可解释的方法,使用残差RL自动化校准过程,该方法遵循既定的汽车开发原则。其适用性通过量产控制单元中一个基于脉谱图的空气路径控制器,在硬件在环(HiL)平台上得到了验证。从次优脉谱图开始,所提出的方法快速收敛到与量产ECU中的参考非常相似的校准。结果证明,该方法适用于行业应用:它能在显著更短的时间内实现更好的校准,且几乎不需要人为干预。
摘要:Electronic Control Units (ECUs) have played a pivotal role in transforming motorcars of yore into the modern vehicles we see on our roads today. They actively regulate the actuation of individual components and thus determine the characteristics of the whole system. In this, the behavior of the control functions heavily depends on their calibration parameters which engineers traditionally design by hand. This is taking place in an environment of rising customer expectations and steadily shorter product development cycles. At the same time, legislative requirements are increasing while emission standards are getting stricter. Considering the number of vehicle variants on top of all that, the conventional method is losing its practical and financial viability. Prior work has already demonstrated that optimal control functions can be automatically developed with reinforcement learning (RL); since the resulting functions are represented by artificial neural networks, they lack explainability, a circumstance which renders them challenging to employ in production vehicles. In this article, we present an explainable approach to automating the calibration process using residual RL which follows established automotive development principles. Its applicability is demonstrated by means of a map-based air path controller in a series control unit using a hardware-in-the-loop (HiL) platform. Starting with a sub-optimal map, the proposed methodology quickly converges to a calibration which closely resembles the reference in the series ECU. The results prove that the approach is suitable for the industry where it leads to better calibrations in significantly less time and requires virtually no human intervention.
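A minimal sketch of residual RL for map-based calibration: the final command is the interpretable base map's output plus a bounded learned correction, so the deployed artefact remains an inspectable map with a small, clamped delta. The map layout, interpolation scheme, and clamp limit below are assumptions for illustration:

```python
def lookup(base_map, x):
    """Piecewise-linear interpolation in a 1-D calibration map given as
    [(breakpoint, value), ...]; clamps outside the breakpoint range."""
    pts = sorted(base_map)
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

def residual_action(base_map, x, residual, limit=0.2):
    """Residual RL actuation: engineered baseline + clamped learned delta,
    keeping the policy close to (and as explainable as) the hand-designed map."""
    return lookup(base_map, x) + max(-limit, min(limit, residual))
```

Starting from a sub-optimal map, only the residual needs to be learned, which is why convergence toward the reference calibration can be fast.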
【5】FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
标题:FP4探索、BF16训练:通过高效rollout扩展进行扩散强化学习
链接:https://arxiv.org/abs/2604.06916
作者:Yitong Li,Junsong Chen,Shuchen Xue,Pengcuo Zeren,Siyuan Fu,Dinghao Yang,Yangyang Tang,Junjie Bai,Ping Luo,Song Han,Enze Xie
摘要:基于强化学习的后训练最近已经成为一种很有前途的范例,用于将文本到图像的扩散模型与人类偏好对齐。在最近的研究中,增大rollout组的规模带来显著的性能改善,表明对齐增益仍有巨大空间。然而,在大规模基础扩散模型(例如FLUX.1-12B)上扩展rollout会带来沉重的计算负担。为了缓解这一瓶颈,我们探索将FP4量化集成到扩散RL的rollout中。然而,我们发现,朴素的量化管道固有地引入性能下降的风险。为了克服效率与训练完整性之间的这种困境,我们提出了Sol-RL(光速RL),一种新的由FP4赋能的两阶段强化学习框架。首先,我们利用高吞吐量的NVFP4 rollout生成大量候选池并提取高度对比的子集。其次,我们以BF16精度重新生成这些选定的样本,并仅在这些样本上优化策略。通过将候选探索与策略优化解耦,Sol-RL将rollout扩展的算法机制与NVFP4的系统级吞吐量增益集成在一起。这种协同的算法-硬件设计有效地加速了rollout阶段,同时保留高保真样本用于优化。实验表明,我们的框架保持了BF16精度流水线的训练完整性,同时充分利用了FP4运算带来的吞吐量增益。在SANA、FLUX.1和SD3.5-L上进行的大量实验证实,我们的方法在多个指标上提供了卓越的对齐性能,同时将训练收敛速度加快了多达4.64倍,以一小部分成本释放了大规模rollout扩展的能力。
摘要:Reinforcement-Learning-based post-training has recently emerged as a promising paradigm for aligning text-to-image diffusion models with human preferences. In recent studies, increasing the rollout group size yields pronounced performance improvements, indicating substantial room for further alignment gains. However, scaling rollouts on large-scale foundational diffusion models (e.g., FLUX.1-12B) imposes a heavy computational burden. To alleviate this bottleneck, we explore the integration of FP4 quantization into Diffusion RL rollouts. Yet, we identify that naive quantized pipelines inherently introduce risks of performance degradation. To overcome this dilemma between efficiency and training integrity, we propose Sol-RL (Speed-of-light RL), a novel FP4-empowered Two-stage Reinforcement Learning framework. First, we utilize high-throughput NVFP4 rollouts to generate a massive candidate pool and extract a highly contrastive subset. Second, we regenerate these selected samples in BF16 precision and optimize the policy exclusively on them. By decoupling candidate exploration from policy optimization, Sol-RL integrates the algorithmic mechanisms of rollout scaling with the system-level throughput gains of NVFP4. This synergistic algorithm-hardware design effectively accelerates the rollout phase while reserving high-fidelity samples for optimization. We empirically demonstrate that our framework maintains the training integrity of BF16 precision pipeline while fully exploiting the throughput gains enabled by FP4 arithmetic. Extensive experiments across SANA, FLUX.1, and SD3.5-L substantiate that our approach delivers superior alignment performance across multiple metrics while accelerating training convergence by up to $4.64\times$, unlocking the power of massive rollout scaling at a fraction of the cost.
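In its simplest reading, the hand-off between the two stages, keeping only a "highly contrastive" subset of the cheap FP4 rollouts for BF16 regeneration, amounts to retaining the best- and worst-scoring candidates from the pool. This is a sketch of that reading, not Sol-RL's actual selection criterion:

```python
def contrastive_subset(scored, k):
    """From a large pool of (rollout_id, reward) pairs produced at low
    precision, keep the k lowest- and k highest-reward candidates; only
    these would be re-generated at full precision for the policy update."""
    ranked = sorted(scored, key=lambda item: item[1])
    return ranked[:k] + ranked[-k:]
```

The expensive BF16 pass then touches only `2k` samples instead of the whole pool, which is where the decoupling of exploration from optimization pays off.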
【6】Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems
标题:多模式车辆到基础设施系统的等变多智能体强化学习
链接:https://arxiv.org/abs/2604.06914
作者:Charbel Bou Chaaya,Mehdi Bennis
摘要
:在本文中,我们研究了车辆到基础设施(V2I)系统,分布式基站(BS)作为路边单元(RSU)收集移动车辆的多模态(无线和视觉)数据。我们考虑一个分散的速率最大化问题,其中每个RSU依赖于其本地观测来优化其资源,而所有RSU必须合作以保证良好的网络性能。我们将这个问题转化为一个分布式多智能体强化学习(MARL)问题,通过将车辆位置的旋转对称性。为了利用这些对称性,我们提出了一种新的自监督学习框架,其中每个BS代理对齐其多模态观测的潜在特征,以提取车辆在其局部区域中的位置。配备了这个传感数据在每个RSU,我们训练一个等变的政策网络,使用一个图形神经网络(GNN)的消息传递层,使每个代理本地计算其政策,而所有的代理协调他们的政策,通过一个信令方案,克服部分可观性,并保证等变的全球政策。我们提出了在模拟环境中进行的数值计算结果,其中光线跟踪和计算机图形学被用来收集无线和视觉数据。结果表明,我们的自我监督和多模式感知方法具有普遍性,比基线实现了两倍以上的准确性增益,以及我们的等变MARL训练的效率,比标准方法实现了50%以上的性能增益。
摘要:In this paper, we study a vehicle-to-infrastructure (V2I) system where distributed base stations (BSs) acting as road-side units (RSUs) collect multimodal (wireless and visual) data from moving vehicles. We consider a decentralized rate maximization problem, where each RSU relies on its local observations to optimize its resources, while all RSUs must collaborate to guarantee favorable network performance. We recast this problem as a distributed multi-agent reinforcement learning (MARL) problem, by incorporating rotation symmetries in terms of vehicles' locations. To exploit these symmetries, we propose a novel self-supervised learning framework where each BS agent aligns the latent features of its multimodal observation to extract the positions of the vehicles in its local region. Equipped with this sensing data at each RSU, we train an equivariant policy network using a graph neural network (GNN) with message passing layers, such that each agent computes its policy locally, while all agents coordinate their policies via a signaling scheme that overcomes partial observability and guarantees the equivariance of the global policy. We present numerical results carried out in a simulation environment, where ray-tracing and computer graphics are used to collect wireless and visual data. Results show the generalizability of our self-supervised and multimodal sensing approach, achieving more than two-fold accuracy gains over baselines, and the efficiency of our equivariant MARL training, attaining more than 50% performance gains over standard approaches.
【7】TwinLoop: Simulation-in-the-Loop Digital Twins for Online Multi-Agent Reinforcement Learning
标题:TwinLoop:用于在线多智能体强化学习的环仿真数字双胞胎
链接:https://arxiv.org/abs/2604.06610
作者:Nan Zhang,Zishuo Wang,Shuyu Huang,Georgios Diamantopoulos,Nikos Tziritas,Panagiotis Oikonomou,Georgios Theodoropoulos
备注:6 pages, 6 figures
摘要:分散式在线学习使网络物理多智能体系统的运行时适应成为可能,但当运行条件发生变化时,学习到的策略通常需要大量的试错交互才能恢复性能。为了解决这个问题,我们提出了TwinLoop,这是一个用于在线多智能体强化学习的模拟在环数字孪生框架。当发生上下文转移时,数字孪生被触发以重建当前系统状态,从最新的代理策略初始化,并在将更新的参数同步回物理系统中的代理之前通过模拟假设分析执行加速策略改进。我们在工作负载和基础设施条件不断变化的车载边缘计算任务卸载场景中评估了TwinLoop。结果表明,数字孪生可以提高轮班后的适应效率,减少对昂贵的在线试错的依赖。
摘要:Decentralised online learning enables runtime adaptation in cyber-physical multi-agent systems, but when operating conditions change, learned policies often require substantial trial-and-error interaction before recovering performance. To address this, we propose TwinLoop, a simulation-in-the-loop digital twin framework for online multi-agent reinforcement learning. When a context shift occurs, the digital twin is triggered to reconstruct the current system state, initialise from the latest agent policies, and perform accelerated policy improvement with simulation what-if analysis before synchronising updated parameters back to the agents in the physical system. We evaluate TwinLoop in a vehicular edge computing task-offloading scenario with changing workload and infrastructure conditions. The results suggest that digital twins can improve post-shift adaptation efficiency and reduce reliance on costly online trial-and-error.
符号|符号学习(1篇)
【1】Inference-Time Code Selection via Symbolic Equivalence Partitioning
标题:通过符号等效划分的推理时代码选择
链接:https://arxiv.org/abs/2604.06485
作者:David Cho,Yifan Wang,Fanping Sui,Ananth Grama
摘要:“Best-of-N”选择是一种流行的推理时间缩放方法,用于使用大型语言模型(LLM)生成代码。然而,为了可靠地识别正确的解决方案,现有的方法往往依赖于昂贵的或随机的外部验证。在本文中,我们提出了符号等价分区,选择框架,使用符号执行组候选程序的语义行为,并选择一个代表占主导地位的功能分区。为了改进分组和选择,我们在符号执行期间将特定于域的约束编码为可满足性模理论(SMT)假设,以减少路径爆炸并防止问题域之外的无效输入搜索。在N=10时,我们的方法在HumanEval+上将Pass@1的平均准确度从0.728提高到0.803,在LiveCodeBench上从0.516提高到0.604,而不需要在初始N个候选代之外进行任何额外的LLM推断。
摘要:"Best-of-N" selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate programs by semantic behavior and select a representative from the dominant functional partition. To improve grouping and selection, we encode domain-specific constraints as Satisfiability Modulo Theories (SMT) assumptions during symbolic execution to reduce path explosion and prevent invalid input searches outside the problem domain. At N=10, our method improves average accuracy over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without requiring any additional LLM inference beyond the initial N candidate generations.
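The selection rule, grouping candidates by semantic behavior and returning a representative of the dominant partition, can be sketched with exhaustive enumeration over a constrained input domain standing in for symbolic execution (the actual method proves equivalence over symbolic paths under SMT assumptions rather than testing a finite domain):

```python
def select_by_partition(candidates, domain):
    """Partition candidate programs by their input/output behavior on a
    constrained domain, then return a representative of the largest
    (dominant) equivalence class."""
    groups = {}
    for f in candidates:
        signature = tuple(f(x) for x in domain)
        groups.setdefault(signature, []).append(f)
    return max(groups.values(), key=len)[0]
```

If, say, two of three LLM candidates implement `abs` correctly and one drops the negative branch, the majority partition selects a correct program without any extra LLM inference.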
医学相关(3篇)
【1】A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering
标题:检索增强医学问题解答检索管道设计的系统研究
链接:https://arxiv.org/abs/2604.07274
作者:Nusrat Sultana,Abdullah Muhammad Moosa,Kazi Afzalur Rahman,Sajal Chandra Banik
摘要:大型语言模型(LLM)在医学问题回答中表现出强大的能力;然而,纯参数模型通常存在知识缺口和有限的事实基础。检索增强生成(RAG)通过将外部知识检索集成到推理过程中来解决这一限制。尽管对基于RAG的医疗系统的兴趣越来越大,但单个检索组件对性能的影响仍然没有得到充分的理解。本研究使用MedQA USMLE基准和结构化的基于教科书的知识语料库,对检索增强的医学问答进行了系统评估。我们在一个包含四十种配置的统一实验框架内,分析了语言模型、嵌入模型、检索策略、查询重构和交叉编码器重排序之间的相互作用。结果表明,检索增强显著提高了zero-shot医学问答的性能。性能最好的配置是结合查询重构和重排序的密集检索,达到了60.49%的准确率。领域专用语言模型也被发现比通用模型更好地利用检索到的医学证据。分析进一步揭示了检索效果和计算成本之间的明确权衡:更简单的密集检索配置在保持更高吞吐量的同时提供了强大的性能。所有的实验都是在一个消费级GPU上进行的,这表明可以在适度的计算资源下对检索增强的医疗QA系统进行系统评估。
摘要:Large language models (LLMs) have demonstrated strong capabilities in medical question answering; however, purely parametric models often suffer from knowledge gaps and limited factual grounding. Retrieval-augmented generation (RAG) addresses this limitation by integrating external knowledge retrieval into the reasoning process. Despite increasing interest in RAG-based medical systems, the impact of individual retrieval components on performance remains insufficiently understood. This study presents a systematic evaluation of retrieval-augmented medical question answering using the MedQA USMLE benchmark and a structured textbook-based knowledge corpus. We analyze the interaction between language models, embedding models, retrieval strategies, query reformulation, and cross-encoder reranking within a unified experimental framework comprising forty configurations. Results show that retrieval augmentation significantly improves zero-shot medical question answering performance. The best-performing configuration was dense retrieval with query reformulation and reranking achieved 60.49% accuracy. Domain-specialized language models were also found to better utilize retrieved medical evidence than general-purpose models. The analysis further reveals a clear tradeoff between retrieval effectiveness and computational cost, with simpler dense retrieval configurations providing strong performance while maintaining higher throughput. All experiments were conducted on a single consumer-grade GPU, demonstrating that systematic evaluation of retrieval-augmented medical QA systems can be performed under modest computational resources.
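The best-performing pipeline (dense retrieval, then cross-encoder reranking) has a simple two-stage shape, sketched below with fabricated embeddings and scores; `dot` similarity stands in for a real embedding model and `rerank_score` for a cross-encoder:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_then_rerank(query_vec, corpus, rerank_score, k=4, top=2):
    """Stage 1: cheap dense similarity narrows the corpus to k passages.
    Stage 2: a more expensive (stand-in) cross-encoder score reorders
    only that shortlist -- the effectiveness/throughput trade-off the
    paper measures."""
    shortlist = sorted(corpus, key=lambda p: -dot(query_vec, p["vec"]))[:k]
    return sorted(shortlist, key=rerank_score, reverse=True)[:top]
```

Note that a passage the dense stage misses can never be recovered by the reranker, so retrieval recall bounds the whole pipeline.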
【2】TeaLeafVision: An Explainable and Robust Deep Learning Framework for Tea Leaf Disease Classification
标题:TeaLeafVision:用于茶叶病分类的可解释且稳健的深度学习框架
链接:https://arxiv.org/abs/2604.07182
作者:Rafi Ahamed,Sidratul Moon Nafsin,Md Abir Rahman,Tasnia Tarannum Roza,Munaia Jannat Easha,Abu Raihan
摘要:作为世界上消费量仅次于水的第二大饮料,茶不仅是一种文化主食,而且是一种具有深远规模和影响力的全球经济力量。它不仅仅是一种饮料,它代表了自然,文化和人类反思片刻的愿望之间的安静谈判。因此,对茶叶病害的精确识别和检测至关重要。为了实现这一目标,我们评估了几种卷积神经网络(CNN)模型,其中三种模型在teaLeafBD数据集上表现出了显著的性能,包括DenseNet201,MobileNetV2,InceptionV3。teaLeafBD数据集包含七个类别,六个疾病类别和一个健康类别,在反映现实世界挑战的各种现场条件下收集。在CNN模型中,DenseNet201的测试准确率最高,达到99%。为了提高模型的可靠性和可解释性,我们实现了梯度加权类激活映射(Grad CAM),遮挡敏感性分析和对抗性训练技术,以提高模型的抗噪性。最后,我们开发了一个原型,以利用模型的能力,在现实生活中的农业。本文阐述了深度学习模型在现实生活中茶叶病害检测和管理中对病害进行分类的能力。
摘要:As the world's second most consumed beverage after water, tea is not just a cultural staple but a global economic force of profound scale and influence. More than a mere drink, it represents a quiet negotiation between nature, culture, and the human desire for a moment of reflection. The precise identification and detection of tea leaf diseases is therefore crucial. With this goal, we evaluated several Convolutional Neural Network (CNN) models on the teaLeafBD dataset, among which three show noticeable performance: DenseNet201, MobileNetV2, and InceptionV3. The teaLeafBD dataset contains seven classes, six disease classes and one healthy class, collected under various field conditions reflecting real-world challenges. Among the CNN models, DenseNet201 achieved the highest test accuracy of 99%. In order to enhance model reliability and interpretability, we implemented Gradient-weighted Class Activation Mapping (Grad-CAM), occlusion sensitivity analysis, and adversarial training techniques to increase the noise resistance of the model. Finally, we developed a prototype in order to leverage the model's capabilities in real-life agriculture. This paper illustrates deep learning models' capability to classify diseases in real-life tea leaf disease detection and management.
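Of the interpretability tools mentioned, occlusion sensitivity is the easiest to sketch: slide a masking patch over the input and record how much the class score drops; large drops mark the regions the classifier relies on. The patch size and zero-fill value below are assumptions:

```python
def occlusion_map(image, predict, patch=2, fill=0.0):
    """Occlusion sensitivity: for each patch position, mask the patch and
    store the resulting score drop over the occluded pixels."""
    h, w = len(image), len(image[0])
    base = predict(image)
    heat = [[0.0] * w for _ in range(h)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = [row[:] for row in image]   # copy, then occlude
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    masked[di][dj] = fill
            drop = base - predict(masked)
            for di in range(i, min(i + patch, h)):
                for dj in range(j, min(j + patch, w)):
                    heat[di][dj] = drop
    return heat
```

In the paper's setting, `predict` would be the DenseNet201 class probability for the diagnosed disease, and the heat map highlights the lesion regions driving the prediction.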
【3】MedRoute: RL-Based Dynamic Specialist Routing in Multi-Agent Medical Diagnosis
标题:MedRoute:多Agent医疗诊断中基于RL的动态专家路由
链接:https://arxiv.org/abs/2604.06180
作者:Ashmal Vayani,Parth Parag Kulkarni,Joseph Fioresi,Song Wang,Mubarak Shah
摘要:由于其在提供精确诊断方面的能力,使用大型多模态模型(LMMs)进行医学诊断已经得到越来越多的关注。这些模型通常将医学问题与视觉输入相结合,以生成诊断或治疗。然而,它们通常过于笼统,不适合现实世界医疗保健中广泛的医疗条件。在临床实践中,诊断由多位专科医生共同完成,每位专家贡献领域特定的专业知识。为了模拟这个过程,一个潜在的解决方案是部署一个动态的多代理LMM框架,其中每个代理都充当医疗专家。目前在这一新兴领域的做法,通常依赖于对各类专家的静态或预定义选择,不能适应不断变化的实际情况。在本文中,我们提出了MedRoute,一个灵活而动态的多代理框架,包括专家LMM代理的协作系统。此外,我们添加了一个配备RL训练路由器、用于动态专家选择的全科医生,以及一个产生最终决定的主持人。通过这种方式,我们的框架密切反映了真实的临床工作流程。对基于文本和图像的医疗数据集的广泛评估表明,诊断准确性有所提高,优于最先进的基线。我们的工作为未来的研究奠定了坚实的基础。代码和模型可在https://github.com/UCF-CRCV/MedRoute/上获得。
摘要:Medical diagnosis using Large Multimodal Models (LMMs) has gained increasing attention due to capability of these models in providing precise diagnoses. These models generally combine medical questions with visual inputs to generate diagnoses or treatments. However, they are often overly general and unsuitable under the wide range of medical conditions in real-world healthcare. In clinical practice, diagnosis is performed by multiple specialists, each contributing domain-specific expertise. To emulate this process, a potential solution is to deploy a dynamic multi-agent LMM framework, where each agent functions as a medical specialist. Current approaches in this emerging area, typically relying on static or predefined selection of various specialists, cannot be adapted to the changing practical scenario. In this paper, we propose MedRoute, a flexible and dynamic multi-agent framework that comprises of a collaborative system of specialist LMM agents. Furthermore, we add a General Practitioner with an RL-trained router for dynamic specialist selection, and a Moderator that produces the final decision. In this way, our framework closely mirrors real clinical workflows. Extensive evaluations on text and image-based medical datasets demonstrate improved diagnostic accuracy, outperforming the state-of-the-art baselines. Our work lays a strong foundation for future research. Code and models are available at https://github.com/UCF-CRCV/MedRoute/.
蒸馏|知识提取(3篇)
【1】Extraction of linearized models from pre-trained networks via knowledge distillation
标题:通过知识蒸馏从预训练网络中提取线性化模型
链接:https://arxiv.org/abs/2604.06732
作者:Fumito Kimura,Jun Ohkubo
备注:9 pages, 5 figures
摘要:硬件的最新发展,如光子集成电路和光学器件,正在推动对构建为线性运算定制的机器学习架构的研究需求。因此,探索在简单的非线性预处理之后仅使用线性运算来构造学习机的方法是有价值的。在这项研究中,我们提出了一个框架,通过将Koopman算子理论与知识蒸馏相结合,从预训练的神经网络中提取线性化模型进行分类任务。MNIST和Fashion-MNIST数据集上的数值演示表明,该模型在分类精度和数值稳定性方面始终优于传统的基于最小二乘的Koopman近似。
摘要:Recent developments in hardware, such as photonic integrated circuits and optical devices, are driving demand for research on constructing machine learning architectures tailored for linear operations. Hence, it is valuable to explore methods for constructing learning machines with only linear operations after simple nonlinear preprocessing. In this study, we propose a framework to extract a linearized model from a pre-trained neural network for classification tasks by integrating Koopman operator theory with knowledge distillation. Numerical demonstrations on the MNIST and the Fashion-MNIST datasets reveal that the proposed model consistently outperforms the conventional least-squares-based Koopman approximation in both classification accuracy and numerical stability.
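The overall pipeline, simple nonlinear preprocessing (a dictionary of observables) followed by a purely linear read-out fitted to mimic a teacher, can be sketched with gradient-descent distillation on a toy teacher. The quadratic lifting and SGD fitting below are our assumptions; the paper combines Koopman operator theory with the distillation step rather than plain SGD:

```python
def lift(x):
    """Nonlinear preprocessing: a small dictionary of observables."""
    return [1.0, x, x * x]

def distill_linear(teacher, xs, lr=0.05, steps=2000):
    """Fit a purely linear map on lifted features to match the teacher's
    outputs (MSE distillation loss, plain SGD)."""
    w = [0.0] * len(lift(xs[0]))
    for _ in range(steps):
        for x in xs:
            z = lift(x)
            err = sum(wi * zi for wi, zi in zip(w, z)) - teacher(x)
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
    return w
```

After preprocessing, inference reduces to a single matrix-vector product, the operation profile targeted by photonic and other linear-compute hardware.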
【2】SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport
标题:SubFLOT:通过最优传输实现高效和个性化联邦学习的子模型提取
链接:https://arxiv.org/abs/2604.06631
作者:Zheng Jiang,Nan He,Yiming Chen,Lifeng Sun
备注:Accepted by CVPR 2026
摘要:联邦学习(FL)支持协作模型训练,同时保护数据隐私,但其实际部署受到系统和统计异质性的阻碍。虽然联合网络修剪提供了一种缓解这些问题的途径,但现有方法面临着一个关键的困境:服务器端修剪缺乏个性化,而客户端修剪对于资源受限的设备来说在计算上是禁止的。此外,修剪过程本身会导致异构子模型之间出现显着的参数差异,从而破坏训练的稳定并阻碍全局收敛。为了解决这些挑战,我们提出了SubFLOT,一个新的框架服务器端个性化的联邦修剪。SubFLOT引入了一个最优传输增强修剪(OTP)模块,将历史客户端模型视为本地数据分布的代理,将修剪任务制定为Wasserstein距离最小化问题,以生成自定义的子模型,而无需访问原始数据。同时,为了抵消参数发散,我们的基于缩放的自适应正则化(SAR)模块自适应地惩罚子模型与全局模型的偏差,惩罚的强度由客户端的修剪率缩放。全面的实验表明,SubFLOT始终大大优于最先进的方法,强调了其在资源受限的边缘设备上部署高效和个性化模型的潜力。
摘要:Federated Learning (FL) enables collaborative model training while preserving data privacy, but its practical deployment is hampered by system and statistical heterogeneity. While federated network pruning offers a path to mitigate these issues, existing methods face a critical dilemma: server-side pruning lacks personalization, whereas client-side pruning is computationally prohibitive for resource-constrained devices. Furthermore, the pruning process itself induces significant parametric divergence among heterogeneous submodels, destabilizing training and hindering global convergence. To address these challenges, we propose SubFLOT, a novel framework for server-side personalized federated pruning. SubFLOT introduces an Optimal Transport-enhanced Pruning (OTP) module that treats historical client models as proxies for local data distributions, formulating the pruning task as a Wasserstein distance minimization problem to generate customized submodels without accessing raw data. Concurrently, to counteract parametric divergence, our Scaling-based Adaptive Regularization (SAR) module adaptively penalizes a submodel's deviation from the global model, with the penalty's strength scaled by the client's pruning rate. Comprehensive experiments demonstrate that SubFLOT consistently and substantially outperforms state-of-the-art methods, underscoring its potential for deploying efficient and personalized models on resource-constrained edge devices.
【3】The Detection--Extraction Gap: Models Know the Answer Before They Can Say It
标题:检测--提取差距:模型在说出答案之前就知道答案
链接:https://arxiv.org/abs/2604.06613
作者:Hanyang Wang,Mingxuan Zhu
摘要:现代推理模型在答案已经确定之后仍会继续生成很久。在五个模型配置、两个模型家族和三个基准测试中,我们发现\textbf{52--88\%的思维链令牌是在答案已可从部分前缀恢复之后产生的}。这种承诺后的生成揭示了一种结构性现象:\textbf{检测--提取差距}。即使仅给出10\%的轨迹前缀,自由延续也能恢复正确答案,而强制提取在其中42\%的情况下失败。答案可从模型状态中恢复,但提示条件解码未能将其提取出来。我们通过自由延续与强制延续分布之间的总变差界形式化了这种失配,给出了后缀引起的分布偏移的定量估计。利用这种不对称性,我们提出了黑盒自适应提前退出(\BAEE{}),它同时使用自由延续进行检测和提取,截断\textbf{70--78\%的串行生成},同时\textbf{在所有模型中将准确率提高1--5\,pp}。对于思维模式模型,提前退出可以防止承诺后的改写,带来高达5.8\,pp的收益;成本优化的变体在中位数9次API调用下实现68--73\%的缩减。代码可在https://github.com/EdWangLoDaSc/know2say上获得。
摘要:Modern reasoning models continue generating long after the answer is already determined. Across five model configurations, two families, and three benchmarks, we find that \textbf{52--88\% of chain-of-thought tokens are produced after the answer is recoverable} from a partial prefix. This post-commitment generation reveals a structural phenomenon: the \textbf{detection--extraction gap}. Free continuations from early prefixes recover the correct answer even at 10\% of the trace, while forced extraction fails on 42\% of these cases. The answer is recoverable from the model state, yet prompt-conditioned decoding fails to extract it. We formalize this mismatch via a total-variation bound between free and forced continuation distributions, yielding quantitative estimates of suffix-induced shift. Exploiting this asymmetry, we propose Black-box Adaptive Early Exit (\BAEE{}), which uses free continuations for both detection and extraction, truncating \textbf{70--78\% of serial generation} while \textbf{improving accuracy by 1--5\,pp} across all models. For thinking-mode models, early exit prevents post-commitment overwriting, yielding gains of up to 5.8\,pp; a cost-optimized variant achieves 68--73\% reduction at a median of 9 API calls. Code is available at https://github.com/EdWangLoDaSc/know2say.
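The total-variation bound between free and forced continuation distributions can be estimated empirically from sampled answers. A minimal sketch (the answer samples below are invented for illustration):

```python
from collections import Counter

def empirical_tv(samples_p, samples_q):
    """Total-variation distance between two empirical answer distributions:
    0.5 * sum over the joint support of |p(a) - q(a)|."""
    p, q = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[a] / n_p - q[a] / n_q) for a in support)

# Toy data: answers recovered from free continuations of an early prefix,
# versus answers obtained by forced extraction on the same prefix.
free   = ["B"] * 9 + ["A"] * 1
forced = ["B"] * 5 + ["A"] * 5
shift = empirical_tv(free, forced)  # suffix-induced shift estimate
```

A large value of `shift` indicates that forcing an answer prompt changes the continuation distribution, which is the asymmetry the paper exploits.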
推荐(3篇)
【1】NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining
标题:NestPipe:通过嵌套管道对1,500多个加速器进行大规模推荐训练
链接:https://arxiv.org/abs/2604.06956
作者:Zhida Jiang,Zhaolong Xing,Huichao Chai,Tianxing Sun,Qiang Peng,Baopeng Yuan,Jiaxing Wang,Hua Du,Zhixin Wu,Xuemiao Li,Yikui Cao,Xinyu Liu,Yongxiang Feng,Zhen Chen,Ke Zhang
摘要:现代推荐模型已经增加到数万亿个参数。随着集群规模扩展到O(1k),分布式训练瓶颈从计算和内存转移到数据移动,特别是与嵌入相关的查找和通信延迟。现有的解决方案要么只优化一个瓶颈,要么通过牺牲训练一致性来提高吞吐量。本文介绍了NestPipe,这是一个大规模去中心化嵌入训练框架,可以同时解决这两个瓶颈并保留同步训练语义。NestPipe通过嵌套流水线利用两个层次的稀疏并行机会。在批间级别,双缓冲流水线(DBP)通过双缓冲同步构建了一个无陈旧性的五阶段流水线,在不引入嵌入陈旧性的情况下缓解查找瓶颈。在批内级别,我们识别出嵌入冻结现象,并由此提出冻结窗口流水线(FWP),通过协调的流调度和以键为中心的样本聚类,将All2All通信与密集计算重叠。在具有1,536个工作器的生产GPU和NPU集群上的实验表明,NestPipe实现了高达3.06倍的加速和94.07%的扩展效率。
摘要:Modern recommendation models have increased to trillions of parameters. As cluster scales expand to O(1k), distributed training bottlenecks shift from computation and memory to data movement, especially lookup and communication latency associated with embeddings. Existing solutions either optimize only one bottleneck or improve throughput by sacrificing training consistency. This paper presents NestPipe, a large-scale decentralized embedding training framework that tackles both bottlenecks while preserving synchronous training semantics. NestPipe exploits two hierarchical sparse parallelism opportunities through nested pipelining. At the inter-batch level, Dual-Buffer Pipelining (DBP) constructs a staleness-free five-stage pipeline through dual-buffer synchronization, mitigating lookup bottlenecks without embedding staleness. At the intra-batch level, we identify the embedding freezing phenomenon, which inspires Frozen-Window Pipelining (FWP) to overlap All2All communication with dense computation via coordinated stream scheduling and key-centric sample clustering. Experiments on production GPU and NPU clusters with 1,536 workers demonstrate that NestPipe achieves up to 3.06x speedup and 94.07% scaling efficiency.
【2】CASE: Cadence-Aware Set Encoding for Large-Scale Next Basket Repurchase Recommendation
标题:CASE:用于大规模下一购物篮重购推荐的节奏感知集合编码
链接:https://arxiv.org/abs/2604.06718
作者:Yanan Cao,Ashish Ranjan,Sinduja Subramaniam,Evren Korpeoglu,Kaushiki Nag,Kannan Achan
备注:Accepted at SIGIR 2026 Industry Track
摘要:重购行为是大规模零售推荐的主要信号,特别是在频繁补货的类别中:用户下一个购物篮中的许多物品是以前购买过的,并且它们的时间遵循稳定的、特定于物品的节奏。然而,大多数下一购物篮重购推荐模型将历史表示为按访问顺序索引的一系列离散购物篮事件,既无法显式地建模经过的日历时间,也无法随着购买之间天数的流逝更新商品排名。我们提出了CASE(Cadence-Aware Set Encoding for next basket repurchase recommendation),它将物品级节奏学习与跨物品交互解耦,在保持生产可扩展性的同时实现显式的日历时间建模。CASE将每个物品的购买历史表示为固定时间范围上的日历时间信号,应用共享的多尺度时间卷积来捕获重复出现的节奏,并使用诱导集合注意力以次二次复杂度建模跨物品依赖关系,从而实现大规模的高效批量推理。在三个公共基准和一个专有数据集上,与强大的下一购物篮预测基线相比,CASE在多个截断点上持续提高了精确率、召回率和NDCG。在具有数千万用户和大型商品目录的生产规模评估中,CASE在前5名中实现了高达8.6%的相对精确率提升和9.9%的召回率提升,这表明可扩展的节奏感知建模在基准和工业环境中都带来了可衡量的收益。
摘要:Repurchase behavior is a primary signal in large-scale retail recommendation, particularly in categories with frequent replenishment: many items in a user's next basket were previously purchased and their timing follows stable, item-specific cadences. Yet most next basket repurchase recommendation models represent history as a sequence of discrete basket events indexed by visit order, which cannot explicitly model elapsed calendar time or update item rankings as days pass between purchases. We present CASE (Cadence-Aware Set Encoding for next basket repurchase recommendation), which decouples item-level cadence learning from cross-item interaction, enabling explicit calendar-time modeling while remaining production-scalable. CASE represents each item's purchase history as a calendar-time signal over a fixed horizon, applies shared multi-scale temporal convolutions to capture recurring rhythms, and uses induced set attention to model cross-item dependencies with sub-quadratic complexity, allowing efficient batch inference at scale. Across three public benchmarks and a proprietary dataset, CASE consistently improves Precision, Recall, and NDCG at multiple cutoffs compared to strong next basket prediction baselines. In a production-scale evaluation with tens of millions of users and a large item catalog, CASE achieves up to 8.6% relative Precision and 9.9% Recall lift at top-5, demonstrating that scalable cadence-aware modeling yields measurable gains in both benchmark and industrial settings.
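The calendar-time representation described above can be illustrated with a toy sketch: each item's purchase history becomes a binary signal over a fixed horizon, to which a small bank of temporal filters is applied. The horizon length, the moving-average kernels, and the toy purchase days are assumptions for illustration, not CASE's actual parameters:

```python
import numpy as np

HORIZON = 30  # days of history encoded per item (assumed for this sketch)

def calendar_signal(purchase_days, today, horizon=HORIZON):
    """Binary calendar-time signal: sig[d] = 1 iff the item was bought d days ago."""
    sig = np.zeros(horizon)
    for day in purchase_days:
        delta = today - day
        if 0 <= delta < horizon:
            sig[delta] = 1.0
    return sig

# An item bought on a weekly cadence: 0, 7, 14, 21 and 28 days before "today".
sig = calendar_signal([2, 9, 16, 23, 30], today=30)

# Multi-scale temporal filtering (here a plain moving-average bank stands in
# for CASE's learned shared convolutions).
kernels = [np.ones(k) / k for k in (3, 7, 14)]
feats = [np.convolve(sig, k, mode="same") for k in kernels]
```

Because the signal is indexed by elapsed days rather than visit order, item scores can change as calendar time passes even with no new basket events.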
【3】The Unreasonable Effectiveness of Data for Recommender Systems
标题:推荐系统数据的不合理有效性
链接:https://arxiv.org/abs/2604.06420
作者:Youssef Abdou
备注:5 pages, 6 figures. Poster paper
摘要:在推荐系统中,收集、存储和处理大规模交互数据在时间、能源和计算方面的成本越来越高,但尚不清楚额外的数据何时不再提供有意义的收益。本文研究了离线推荐性能如何随着训练数据集规模的增加而演变,以及是否可以观察到饱和点。我们使用LensKit和RecBole这两个成熟的工具包实现了一个可复现的Python评估工作流程,纳入了11个至少有700万次交互的大型公共数据集,并评估了10个工具-算法组合。使用绝对分层用户抽样,我们在从100,000到100,000,000次交互的9个样本规模上训练模型,并测量NDCG@10。总体而言,原始NDCG通常随样本规模增加,没有可观察到的饱和点。为了使各结果组可比,我们在每组内应用了最小-最大归一化,揭示出一个明显的正向趋势:在最大完成样本规模处,约75%的数据点同时达到了该组的最佳观察性能。对每组最后10-30%进行的后期斜率分析进一步支持了这一上升趋势:四分位距完全保持非负,中位数接近1.0。总之,对于典型用户-物品交互数据上的传统推荐系统,纳入更多训练数据在大多数情况下仍然有益,而较弱的缩放行为集中在非典型数据集案例以及我们设置下的算法离群值RecBole BPR中。
摘要:In recommender systems, collecting, storing, and processing large-scale interaction data is increasingly costly in terms of time, energy, and computation, yet it remains unclear when additional data stops providing meaningful gains. This paper investigates how offline recommendation performance evolves as the size of the training dataset increases and whether a saturation point can be observed. We implemented a reproducible Python evaluation workflow with two established toolkits, LensKit and RecBole, included 11 large public datasets with at least 7 million interactions, and evaluated 10 tool-algorithm combinations. Using absolute stratified user sampling, we trained models on nine sample sizes from 100,000 to 100,000,000 interactions and measured NDCG@10. Overall, raw NDCG usually increased with sample size, with no observable saturation point. To make result groups comparable, we applied min-max normalization within each group, revealing a clear positive trend in which around 75% of the points at the largest completed sample size also achieved the group's best observed performance. A late-stage slope analysis over the final 10-30% of each group further supported this upward trend: the interquartile range remained entirely non-negative with a median near 1.0. In summary, for traditional recommender systems on typical user-item interaction data, incorporating more training data remains primarily beneficial, while weaker scaling behavior is concentrated in atypical dataset cases and in the algorithmic outlier RecBole BPR under our setup.
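The per-group min-max normalization and late-stage slope analysis used in the evaluation can be reproduced in a few lines (the NDCG values below are invented for illustration):

```python
import numpy as np

def minmax(scores):
    """Min-max normalize one result group so curves become comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

# NDCG@10 versus increasing sample size for two hypothetical
# tool-algorithm groups (toy numbers).
group_a = [0.10, 0.14, 0.17, 0.19, 0.20]
group_b = [0.05, 0.06, 0.08, 0.09, 0.10]
norm_a, norm_b = minmax(group_a), minmax(group_b)

# Late-stage slope over the tail of the curve (here: the final step).
slope_a = norm_a[-1] - norm_a[-2]  # positive => still improving at max size
```

A non-negative late-stage slope across groups is the signature of "no observed saturation" reported in the abstract.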
聚类(3篇)
【1】Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees
标题:Lumbermark:通过切分互可达性最小生成树进行的鲁棒聚类
链接:https://arxiv.org/abs/2604.07143
作者:Marek Gagolewski
摘要:我们介绍Lumbermark,一个强大的分裂聚类算法,能够检测不同大小,密度和形状的集群。Lumbermark迭代地砍掉由数据集的相互可达性最小生成树的突出段连接的大分支。相互可达距离的使用平滑了数据分布,并减少了低密度对象的影响,例如聚类之间的噪声点或其外围的离群值。该算法可以被视为HDBSCAN的替代方案,它可以产生用户指定大小的分区。在Python和R的开源'lumbermark'包中提供了新方法的快速,易于使用的实现。我们证明了Lumbermark在基准数据上表现良好,并希望它对不同领域的数据科学家和从业者有用。
摘要:We introduce Lumbermark, a robust divisive clustering algorithm capable of detecting clusters of varying sizes, densities, and shapes. Lumbermark iteratively chops off large limbs connected by protruding segments of a dataset's mutual reachability minimum spanning tree. The use of mutual reachability distances smoothens the data distribution and decreases the influence of low-density objects, such as noise points between clusters or outliers at their peripheries. The algorithm can be viewed as an alternative to HDBSCAN that produces partitions with user-specified sizes. A fast, easy-to-use implementation of the new method is available in the open-source 'lumbermark' package for Python and R. We show that Lumbermark performs well on benchmark data and hope it will prove useful to data scientists and practitioners across different fields.
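The mutual reachability distance at the heart of Lumbermark (and of HDBSCAN) has a compact definition: max(core_k(a), core_k(b), d(a, b)), where core_k(x) is the distance from x to its k-th nearest neighbour. A minimal NumPy sketch (the choice of k and the toy data are assumptions):

```python
import numpy as np

def mutual_reachability(D, k=3):
    """Mutual reachability distances from a symmetric pairwise-distance matrix.

    core_k(x) is the distance from x to its k-th nearest neighbour; taking the
    max with both core distances smooths density and pushes low-density
    points (noise, outliers) away from everything else.
    """
    core = np.sort(D, axis=1)[:, k]  # index 0 of each sorted row is the zero self-distance
    M = np.maximum(D, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(M, 0.0)
    return M

pts = np.array([0.0, 1.0, 2.0, 2.5, 10.0])  # 1-D toy data with one outlier
D = np.abs(pts[:, None] - pts[None, :])
M = mutual_reachability(D, k=2)
```

A minimum spanning tree built on `M` instead of `D` is what the algorithm then "chops up" to obtain clusters.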
【2】Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering
标题:挖掘电子健康记录以研究集成深度聚类的有效性
链接:https://arxiv.org/abs/2604.07085
作者:Manar D. Samad,Yina Hou,Shrabani Ghosh
备注:14th IEEE Conference on Healthcare Informatics
摘要:在电子健康记录(EHR)中,聚类患者和区分疾病亚型是阐明病理生理学和辅助临床决策的关键任务。然而,医疗保健信息学中的聚类仍然基于传统方法,特别是K-means,并且在应用于嵌入自动编码器作为混合方法学习的表示时取得了有限的成功。本文使用All of Us研究计划的真实EHR数据,研究了传统、混合和深度学习方法在心力衰竭患者队列中的有效性。传统的聚类方法表现稳健,因为深度学习方法是专门为图像聚类而设计的,这是一项与表格EHR数据设置有很大不同的任务。为了解决深度聚类的缺点,我们引入了一种基于集成的深度聚类方法,该方法聚合从多个嵌入维度获得的聚类分配,而不是依赖于单个固定的嵌入空间。当在一个新的集成框架中与传统的聚类相结合时,所提出的用于深度聚类的集成嵌入在14种不同的聚类方法和多个患者队列中提供了最佳的整体性能排名。本文强调了EHR数据的生物性别特异性聚类的重要性,以及传统和深度聚类方法相结合的优势。
摘要:In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.
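One common way to aggregate cluster assignments obtained from multiple embedding dimensions, in the spirit of the ensemble approach above, is a co-association matrix: the fraction of base clusterings that place two samples in the same cluster. A minimal sketch with toy labelings (the paper's exact aggregation rule may differ):

```python
import numpy as np

def coassociation(labelings):
    """Co-association matrix: C[i, j] is the fraction of base clusterings
    that assign samples i and j to the same cluster."""
    labelings = np.asarray(labelings)  # shape (n_clusterings, n_samples)
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for labels in labelings:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(labelings)

# Three base clusterings, e.g. from autoencoders with different embedding
# dimensions (toy labels for four patients).
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
C = coassociation(runs)
```

The resulting matrix can be fed to any similarity-based clusterer (e.g. hierarchical linkage) to produce the final consensus partition.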
【3】A Data-Informed Variational Clustering Framework for Noisy High-Dimensional Data
标题:面向含噪高维数据的数据知情变分聚类框架
链接:https://arxiv.org/abs/2604.06864
作者:Wan Ping Chen
摘要:在具有严重特征噪声的高维环境中进行聚类仍然具有挑战性,特别是当只有一小部分维度是信息性的并且未预先指定最终聚类数时。在这种情况下,分区恢复,特征相关性学习和结构自适应是紧密耦合的,标准的基于似然的方法可能变得不稳定或对噪声维度过于敏感。我们提出了DIVI,一个数据通知变分聚类框架,结合了全球功能门与分裂为基础的自适应结构增长。DIVI使用信息丰富的先验初始化来稳定优化,以可微分的方式学习特征相关性,并仅在局部诊断指示欠拟合时扩展模型复杂性。除了聚类性能,我们还研究了运行时的可扩展性和参数敏感性,以澄清框架的计算和实际行为。从经验上讲,我们发现,DIVI在严重的特征噪声下表现出竞争力,在计算上仍然是可行的,并产生可解释的特征门控行为,同时在具有挑战性的环境中也表现出保守的增长和可识别的故障制度。总的来说,DIVI最好被看作是一个实用的变分聚类框架,嘈杂的高维数据,而不是作为一个完全贝叶斯生成的解决方案。
摘要:Clustering in high-dimensional settings with severe feature noise remains challenging, especially when only a small subset of dimensions is informative and the final number of clusters is not specified in advance. In such regimes, partition recovery, feature relevance learning, and structural adaptation are tightly coupled, and standard likelihood-based methods can become unstable or overly sensitive to noisy dimensions. We propose DIVI, a data-informed variational clustering framework that combines global feature gating with split-based adaptive structure growth. DIVI uses informative prior initialization to stabilize optimization, learns feature relevance in a differentiable manner, and expands model complexity only when local diagnostics indicate underfit. Beyond clustering performance, we also examine runtime scalability and parameter sensitivity in order to clarify the computational and practical behavior of the framework. Empirically, we find that DIVI performs competitively under severe feature noise, remains computationally feasible, and yields interpretable feature-gating behavior, while also exhibiting conservative growth and identifiable failure regimes in challenging settings. Overall, DIVI is best viewed as a practical variational clustering framework for noisy high-dimensional data rather than as a fully Bayesian generative solution.
联邦学习|隐私保护|加密(2篇)
【1】DDP-SA: Scalable Privacy-Preserving Federated Learning via Distributed Differential Privacy and Secure Aggregation
标题:DDP-SA:通过分布式差异隐私和安全聚合的可扩展隐私保护联邦学习
链接:https://arxiv.org/abs/2604.07125
作者:Wenjing Wei,Farid Nait-Abdesselam,Alla Jammine
摘要:本文介绍了DDP-SA,这是一个可扩展的隐私保护联邦学习框架,它联合利用客户端局部差分隐私(LDP)和全阈值加性秘密共享(ASS)进行安全聚合。与仅依赖差分隐私或安全多方计算(MPC)的现有方法不同,DDP-SA集成了这两种技术,在保持计算实用性的同时提供更强的端到端隐私保证。该框架引入了一个两阶段保护机制:客户端首先用校准的拉普拉斯噪声扰动其本地梯度,然后将带噪梯度分解为分布在多个中间服务器上的加性秘密份额。这种设计确保(i)任何单个被攻破的服务器或通信信道都无法泄露关于单个客户端更新的任何信息,以及(ii)参数服务器只能重建聚合后的带噪梯度,而不能重建任何客户端的单独贡献。大量实验表明,DDP-SA的模型精度显著高于单独使用LDP,同时提供了比仅用MPC的方法更强的隐私保护。所提出的框架与参与者数量呈线性扩展,为联邦学习应用提供了一种计算和通信开销可控的实用隐私保护解决方案。
摘要:This article presents DDP-SA, a scalable privacy-preserving federated learning framework that jointly leverages client-side local differential privacy (LDP) and full-threshold additive secret sharing (ASS) for secure aggregation. Unlike existing methods that rely solely on differential privacy or on secure multi-party computation (MPC), DDP-SA integrates both techniques to deliver stronger end-to-end privacy guarantees while remaining computationally practical. The framework introduces a two-stage protection mechanism: clients first perturb their local gradients with calibrated Laplace noise, then decompose the noisy gradients into additive secret shares that are distributed across multiple intermediate servers. This design ensures that (i) no single compromised server or communication channel can reveal any information about individual client updates, and (ii) the parameter server reconstructs only the aggregated noisy gradient, never any client-specific contribution. Extensive experiments show that DDP-SA achieves substantially higher model accuracy than standalone LDP while providing stronger privacy protection than MPC-only approaches. The proposed framework scales linearly with the number of participants and offers a practical, privacy-preserving solution for federated learning applications with controllable computational and communication overhead.
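The two-stage mechanism can be sketched end to end: each client adds calibrated Laplace noise, splits the noisy gradient into additive shares, and the servers reconstruct only the aggregate. A minimal NumPy sketch with assumed sensitivity, epsilon, and server count (real deployments share over a finite field rather than floats):

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_perturb(grad, sensitivity=1.0, epsilon=1.0):
    """Client-side LDP: add Laplace noise with scale = sensitivity / epsilon."""
    return grad + rng.laplace(scale=sensitivity / epsilon, size=grad.shape)

def make_shares(vec, n_servers=3):
    """Full-threshold additive secret sharing: the shares sum to vec,
    and any proper subset of shares reveals nothing on its own."""
    shares = [rng.normal(size=vec.shape) for _ in range(n_servers - 1)]
    shares.append(vec - sum(shares))
    return shares

clients = [np.array([1.0, 2.0]), np.array([3.0, -1.0])]
noisy = [ldp_perturb(g) for g in clients]                 # stage 1: DP noise
all_shares = [make_shares(g) for g in noisy]              # stage 2: secret shares
# Each intermediate server sums the shares it received; the parameter server
# then reconstructs only the aggregate of the noisy gradients.
server_sums = [sum(sh[s] for sh in all_shares) for s in range(3)]
aggregate = sum(server_sums)
```

Note the invariant: `aggregate` equals the sum of the noisy gradients, while no single `server_sums[s]` depends on any client's update in a recoverable way.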
【2】Bi-level Heterogeneous Learning for Time Series Foundation Models: A Federated Learning Approach
标题:时间序列基础模型的双层异构学习:一种联邦学习方法
链接:https://arxiv.org/abs/2604.06727
作者:Shengchao Chen,Guodong Long,Dikai Liu,Jing Jiang
备注:31 pages
摘要:时间序列数据中的异质性比视觉或语言中的更为明显,因为时间动态在各个领域和任务中变化很大。现有从头开始训练时间序列基础模型(TSFM)的工作通常采用合并大规模数据集的混合批量策略进行训练,这可能导致梯度冲突并降低表示质量。为了解决这个问题,我们提出了一种细粒度的学习方法,在减少跨域干扰的同时从异构序列中提炼不变知识。我们从两个层面刻画异质性:域间和域内。为了应对这种双层异质性,我们设计了一种联邦学习方法,通过局部正则化强制域不变且语义一致的表示来缓解域内冲突,并通过域感知聚合增强跨域协作来解决域间差异。在不同基准上的实验表明,用我们的方法训练的TSFM在点预测和概率预测上始终优于集中式和联邦式TSFM基线,同时还在大规模下实现了有竞争力的zero-shot性能,为在异构环境中从头训练TSFM提供了灵活的途径。
摘要:Heterogeneity in time series data is more pronounced than in vision or language, as temporal dynamics vary substantially across domains and tasks. Existing efforts on training time series foundation models (TSFMs) from scratch are often trained with mixed-batch strategies that merge large-scale datasets, which can cause gradient conflicts and degrade representation quality. To address this, we propose a fine-grained learning method that distills invariant knowledge from heterogeneous series while reducing cross-domain interference. We characterize heterogeneity at two levels: inter-domain and intra-domain. To tackle this bi-level heterogeneity, we design a federated learning method that mitigates intra-domain conflicts by enforcing domain-invariant and semantically consistent representations through local regularization, and addresses inter-domain discrepancies by enhancing cross-domain collaboration via domain-aware aggregation. Experiments across diverse benchmarks show that TSFMs trained with our method consistently outperform both centralized and federated TSFM baselines in point and probabilistic forecasting, while also achieving competitive zero-shot performance at scale, offering a flexible pathway for training TSFMs from scratch in heterogeneous environments.
推理|分析|理解|解释(8篇)
【1】A comparative analysis of machine learning models in SHAP analysis
标题:SHAP分析中机器学习模型的比较分析
链接:https://arxiv.org/abs/2604.07258
作者:Justin Lin,Julia Fukuyama
备注:17 pages, 16 figures, 4 tables
摘要:在这个数据和技术不断发展的时代,大型黑盒模型正在成为常态,因为它们能够处理大量数据并学习非常复杂的数据模式。然而,这些方法的不足之处在于,它们无法解释预测过程,使它们不值得信赖,并且在高风险情况下使用不稳定。SHapley加法解释(SHAP)分析是一种可解释的人工智能方法,由于其能够根据原始特征解释模型预测而越来越受欢迎。对于数据集中的每个样本和特征,相关联的SHAP值量化该特征对该样本的预测的贡献。对这些SHAP值的分析为模型的决策过程提供了有价值的见解,可以利用这些见解来创建数据驱动的解决方案。然而,这些SHAP值的解释是依赖于模型的,因此不存在通用的分析程序。为了帮助这些工作,我们对各种机器学习模型和数据集的SHAP分析进行了详细的调查。在揭示SHAP分析背后的细节和细微差别时,我们希望在这个较少探索的领域中赋予分析师权力。我们还提出了一个新的概括的瀑布图的多分类问题。
摘要:In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex data patterns. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, an associated SHAP value quantifies the contribution of that feature to the prediction of that sample. Analysis of these SHAP values provides valuable insight into the model's decision-making process, which can be leveraged to create data-driven solutions. The interpretation of these SHAP values, however, is model-dependent, so there does not exist a universal analysis procedure. To aid in these efforts, we present a detailed investigation of SHAP analysis across various machine learning models and data sets. In uncovering the details and nuance behind SHAP analysis, we hope to empower analysts in this less-explored territory. We also present a novel generalization of the waterfall plot to the multi-classification problem.
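For intuition, SHAP values can be computed exactly on a tiny model by enumerating all feature coalitions (practical SHAP libraries approximate this). A minimal sketch with a toy additive model and an assumed zero baseline for absent features:

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value_fn):
    """Exact SHAP-style attribution by enumerating all coalitions of n features."""
    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(rest, size):
                # Shapley kernel weight for a coalition of this size.
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy additive model f(x) = 2*x0 + 3*x1 evaluated at x = (1, 1); absent
# features are set to a baseline of 0 (a common SHAP convention).
x, baseline, coef = (1.0, 1.0), 0.0, (2.0, 3.0)
def value_fn(S):
    return sum(coef[j] * (x[j] if j in S else baseline) for j in range(2))

phi = shapley_values(2, value_fn)  # for an additive model, phi matches per-feature effects
```

The efficiency property (attributions summing to the gap between the full prediction and the baseline) is what waterfall plots, including the multi-class generalization proposed in the paper, visualize.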
【2】Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
标题:图像真实性判断深度网络中模型行为的解释的不可识别性
链接:https://arxiv.org/abs/2604.07254
作者:Icaro Re Depaolini,Uri Hasson
摘要:深度神经网络可以预测人类的判断,但这并不意味着它们依赖于类似人类的信息或揭示这些判断背后的线索。之前的工作已经使用归因热图解决了这个问题,但它们的解释价值本身取决于鲁棒性。在这里,我们通过评估预测人类真实性评级的模型是否也在架构内和跨架构产生一致的解释来测试这些解释的鲁棒性。我们将轻量级回归头适配到多个冻结的预训练视觉模型,并使用Grad-CAM,LIME和多尺度像素掩蔽生成归因图。一些建筑预测评级良好,达到约80%的噪音上限。VGG模型通过跟踪图像质量而不是真实性特定的方差来实现这一点,从而限制了其属性的相关性。在其余的模型中,归因图在架构内的随机种子中通常是稳定的,特别是对于EfficientNetB 3和Barlow Twins,并且对于被判断为更真实的图像,一致性更高。至关重要的是,即使在预测性能相似的情况下,跨架构的归因协议也很弱。为了解决这个问题,我们将模型组合在一起,这改善了对人类真实性判断的预测,并通过像素掩蔽实现了图像级归因。我们的结论是,虽然深度网络可以很好地预测人类的真实性判断,但它们不能为这些判断提供可识别的解释。更广泛地说,我们的研究结果表明,来自成功行为模型的事后解释应该被视为认知机制的弱证据。
摘要:Deep neural networks can predict human judgments, but this does not imply that they rely on human-like information or reveal the cues underlying those judgments. Prior work has addressed this issue using attribution heatmaps, but their explanatory value in itself depends on robustness. Here we tested the robustness of such explanations by evaluating whether models that predict human authenticity ratings also produce consistent explanations within and across architectures. We fit lightweight regression heads to multiple frozen pretrained vision models and generated attribution maps using Grad-CAM, LIME, and multiscale pixel masking. Several architectures predicted ratings well, reaching about 80% of the noise ceiling. VGG models achieved this by tracking image quality rather than authenticity-specific variance, limiting the relevance of their attributions. Among the remaining models, attribution maps were generally stable across random seeds within an architecture, especially for EfficientNetB3 and Barlow Twins, and consistency was higher for images judged as more authentic. Crucially, agreement in attribution across architectures was weak even when predictive performance was similar. To address this, we combined models in ensembles, which improved prediction of human authenticity judgments and enabled image-level attribution via pixel masking. We conclude that while deep networks can predict human authenticity judgments well, they do not produce identifiable explanations for those judgments. More broadly, our findings suggest that post hoc explanations from successful models of behavior should be treated as weak evidence for cognitive mechanism.
【3】ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
标题:ConceptTracer:神经表示中概念显着性和选择性的交互分析
链接:https://arxiv.org/abs/2604.07019
作者:Ricardo Knauer,Andre Beinrucker,Erik Rodner
备注:XAI 2026 Late-Breaking Work Track
摘要:神经网络在各种任务中提供了令人印象深刻的预测性能,但它们的决策过程往往不透明。尽管人们对机械可解释性的兴趣日益增长,但用于系统地探索神经网络(尤其是表格基础模型)所学表示的工具仍然有限。在这项工作中,我们介绍了ConceptTracer,这是一个通过人类可解释概念的视角分析神经表征的交互式应用。ConceptTracer集成了两种信息论度量,用于量化概念的显著性和选择性,使研究人员和从业者能够识别对单个概念反应强烈的神经元。我们在TabPFN学习的表示上展示了ConceptTracer的实用性,并表明我们的方法有助于发现可解释的神经元。总之,这些功能为研究TabPFN等神经网络如何编码概念级信息提供了一个实用框架。ConceptTracer可在https://github.com/ml-lab-htw/concept-tracer上获得。
摘要:Neural networks deliver impressive predictive performance across a variety of tasks, but they are often opaque in their decision-making processes. Despite a growing interest in mechanistic interpretability, tools for systematically exploring the representations learned by neural networks in general, and tabular foundation models in particular, remain limited. In this work, we introduce ConceptTracer, an interactive application for analyzing neural representations through the lens of human-interpretable concepts. ConceptTracer integrates two information-theoretic measures that quantify concept saliency and selectivity, enabling researchers and practitioners to identify neurons that respond strongly to individual concepts. We demonstrate the utility of ConceptTracer on representations learned by TabPFN and show that our approach facilitates the discovery of interpretable neurons. Together, these capabilities provide a practical framework for investigating how neural networks like TabPFN encode concept-level information. ConceptTracer is available at https://github.com/ml-lab-htw/concept-tracer.
【4】Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem
标题:马尔可夫决策问题中加权Lp范数下软Bellman残差最小化的收缩对齐分析
链接:https://arxiv.org/abs/2604.06837
作者:Hyukjun Yang,Han-Dong Lim,Donghwan Lee
摘要:即使在线性函数近似设置下,函数近似下求解马尔可夫决策过程仍是一个基本挑战。一个关键困难来自几何不匹配:虽然Bellman最优算子在Linfty范数下是压缩的,但投影值迭代和Bellman残差最小化等常用目标依赖于基于L2的公式。为了实现基于梯度的优化,我们考虑Bellman残差最小化的软形式,并将其推广到广义加权Lp范数。我们表明,随着p的增大,这种形式使优化目标与Bellman算子的收缩几何对齐,并推导出相应的性能误差界。我们的分析在残差最小化与Bellman收缩之间建立了原则性联系,从而在保持与基于梯度的优化兼容的同时改进了对误差传播的控制。
摘要:The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even under linear function approximation settings. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the Linfty-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on L2-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted Lp -norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as p increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
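The weighted-Lp soft Bellman residual can be written down directly for a tabular MDP. A minimal NumPy sketch of the objective class (the temperature, discount, exponent p, and the toy MDP are assumptions; the paper's exact weighting scheme may differ):

```python
import numpy as np

def soft_bellman_residual(Q, R, P, gamma=0.9, tau=0.5, p=4, w=None):
    """Weighted-Lp soft Bellman residual for a tabular MDP.

    Q, R: (S, A) value and reward tables; P: (S, A, S) transition probabilities.
    The target uses the soft (log-sum-exp) Bellman backup; as p grows, the
    weighted Lp norm approaches the sup-norm in which the operator contracts.
    """
    soft_v = tau * np.log(np.exp(Q / tau).sum(axis=1))      # soft state values
    target = R + gamma * np.einsum("sat,t->sa", P, soft_v)  # soft Bellman backup
    w = np.ones(Q.shape[0]) if w is None else np.asarray(w, dtype=float)
    return (w[:, None] * np.abs(Q - target) ** p).sum() ** (1.0 / p)

Q = np.array([[1.0, 2.0], [0.5, 0.0]])
P = np.full((2, 2, 2), 0.5)  # uniform transitions over two states
res_zero = soft_bellman_residual(Q, R=Q, P=P, gamma=0.0)   # gamma=0 => target == R == Q
res_pos  = soft_bellman_residual(Q, R=Q + 1.0, P=P, gamma=0.0)
```

The `gamma=0` cases illustrate the sanity check that the residual vanishes exactly when Q matches the backup target.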
【5】MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
标题:MoBiE:训练后量化下二值专家混合的高效推理
链接:https://arxiv.org/abs/2604.06798
作者:Zhixiong Zhao,Zukang Xu,Zhixuan Chen,Dawei Yang
备注:Accepted at ACL 2026 Findings
摘要:基于混合专家(MoE)的大型语言模型(LLM)性能强大,但存在高内存和计算成本。权重二值化提供了极高的效率,但为密集LLM设计的现有二值化方法难以应对MoE特有的问题,包括跨专家冗余、任务无关的重要性估计以及量化引起的路由偏移。为此,我们提出了MoBiE,这是第一个为基于MoE的LLM量身定制的二值化框架。MoBiE建立在三个核心创新之上:1. 使用联合SVD分解减少跨专家冗余;2. 将全局损失梯度集成到局部Hessian度量中以增强权重重要性估计;3. 引入由输入零空间引导的误差约束以减轻路由失真。值得注意的是,MoBiE在实现这些优化的同时不产生额外的存储开销,在效率和模型性能之间取得了平衡。大量实验表明,MoBiE在多个基于MoE的LLM和基准测试中始终优于最先进的二值化方法。例如,在Qwen3-30B-A3B上,MoBiE将困惑度降低了52.2$\%$,将平均zero-shot性能提高了43.4$\%$,实现了超过2$\times$的推理加速,并进一步缩短了量化时间。该代码可在https://github.com/Kishon-zzx/MoBiE上获得。
摘要:Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs struggle with MoE-specific issues, including cross-expert redundancy, task-agnostic importance estimation, and quantization-induced routing shifts. To this end, we propose MoBiE, the first binarization framework tailored for MoE-based LLMs. MoBiE is built on three core innovations: 1. using joint SVD decomposition to reduce cross-expert redundancy; 2. integrating global loss gradients into local Hessian metrics to enhance weight importance estimation; 3. introducing an error constraint guided by the input null space to mitigate routing distortion. Notably, MoBiE achieves these optimizations while incurring no additional storage overhead, striking a balance between efficiency and model performance. Extensive experiments demonstrate that MoBiE consistently outperforms state-of-the-art binary methods across multiple MoE-based LLMs and benchmarks. For example, on Qwen3-30B-A3B, MoBiE reduces perplexity by 52.2$\%$, improves average zero-shot performance by 43.4$\%$, achieves over 2 $\times$ inference speedup, and further shortens quantization time. The code is available at https://github.com/Kishon-zzx/MoBiE.
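For background, the basic 1-bit weight quantization that MoBiE builds on can be sketched as scaled-sign binarization: W is approximated by alpha * sign(W), with alpha the per-row mean absolute value. This shows the representation only; MoBiE's joint SVD sharing, gradient-aware Hessian metrics, and null-space routing constraints are not reproduced here:

```python
import numpy as np

def binarize(W):
    """Row-wise scaled-sign binarization: W ~ alpha * B with B in {-1, +1}.

    For a fixed sign matrix B, alpha = mean(|W|) per row minimizes the
    Frobenius reconstruction error; only the signs plus one scale per row
    need to be stored.
    """
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    B = np.where(W >= 0, 1.0, -1.0)
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # a toy expert weight matrix
alpha, B = binarize(W)
W_hat = alpha * B                    # dequantized approximation
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

The reconstruction error of this naive scheme is exactly the slack that MoBiE's redundancy-reduction and importance-aware steps are designed to shrink.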
【6】Busemann energy-based attention for emotion analysis in Poincaré discs
标题:庞加莱圆盘中基于Busemann能量的注意力用于情感分析
链接:https://arxiv.org/abs/2604.06752
作者:Zinaid Kapić,Vladimir Jaćimović
摘要:我们介绍了EmBolic - 一种新颖的全双曲深度学习架构,用于从文本消息中进行细粒度情感分析。其基本思想是,双曲几何能有效捕捉词语与情感之间的层次结构。在我们的语境中,这些层次关系源于语义歧义。EmBolic旨在推断情感连续空间上的曲率,而不是将情感视为没有任何度量结构的类别集合。我们架构的核心是双曲圆盘中的注意力机制。该模型经过训练,从文本消息生成查询(双曲圆盘中的点),而键(边界上的点)则从生成的查询中自动涌现。预测基于查询与键之间的Busemann能量,评估某条文本消息与代表情感的类方向的对齐程度。我们的实验表明,即使表示空间的维数很小,该模型也具有强大的泛化性能和相当好的预测精度。总体而言,这项研究支持了我们的主张:情感计算是双曲表示特别有利的应用领域之一。
摘要:We present EmBolic - a novel fully hyperbolic deep learning architecture for fine-grained emotion analysis from textual messages. The underlying idea is that hyperbolic geometry efficiently captures hierarchies between both words and emotions. In our context, these hierarchical relationships arise from semantic ambiguities. EmBolic aims to infer the curvature on the continuous space of emotions, rather than treating them as a categorical set without any metric structure. In the heart of our architecture is the attention mechanism in the hyperbolic disc. The model is trained to generate queries (points in the hyperbolic disc) from textual messages, while keys (points at the boundary) emerge automatically from the generated queries. Predictions are based on the Busemann energy between queries and keys, evaluating how well a certain textual message aligns with the class directions representing emotions. Our experiments demonstrate strong generalization properties and reasonably good prediction accuracy even for small dimensions of the representation space. Overall, this study supports our claim that affective computing is one of the application domains where hyperbolic representations are particularly advantageous.
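The Busemann function of the Poincaré disc has a closed form, B_xi(z) = log(|xi - z|^2 / (1 - |z|^2)), for an ideal point xi on the boundary and a point z inside the disc. A minimal sketch of scoring a query against boundary class directions; the three keys, the query, and the "lower energy = better alignment" convention are illustrative assumptions, not EmBolic's trained components:

```python
import numpy as np

def busemann(xi, z):
    """Busemann function of the Poincare disc for an ideal point xi (|xi| = 1)
    evaluated at z (|z| < 1): B_xi(z) = log(|xi - z|^2 / (1 - |z|^2)).
    It decreases along the geodesic ray toward xi."""
    return np.log(np.abs(xi - z) ** 2 / (1.0 - np.abs(z) ** 2))

# Three class directions ("keys") on the boundary of the disc.
keys = [np.exp(2j * np.pi * k / 3) for k in range(3)]
# A query inside the disc, pulled toward class 0.
query = 0.6 * np.exp(2j * np.pi * 0 / 3)
energies = np.array([busemann(xi, query) for xi in keys])
pred = int(np.argmin(energies))  # lowest Busemann energy = best-aligned class
```

At the origin the Busemann value is zero for every boundary point, so moving the query toward a class direction makes that class's energy negative.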
【7】RAGEN-2: Reasoning Collapse in Agentic RL
标题:RAGEN-2:智能体强化学习中的推理崩溃
链接:https://arxiv.org/abs/2604.06268
作者:Zihan Wang,Chi Gui,Xing Jin,Qineng Wang,Licheng Liu,Kangrui Wang,Shiqi Chen,Linjie Li,Zhengyuan Yang,Pingyue Zhang,Yiping Lu,Jiajun Wu,Li Fei-Fei,Lijuan Wang,Yejin Choi,Manling Li
摘要:多回合LLM智能体的RL训练本质上是不稳定的,推理质量直接决定任务性能。熵被广泛用于跟踪推理的稳定性。然而,熵只能衡量同一输入下的多样性,无法判断推理是否实际上对不同的输入做出了响应。在RAGEN-2中,我们发现即使熵保持稳定,模型也可能依赖于固定的模板,这些模板看起来多样,但与输入无关。我们称之为模板崩溃,这是一种熵和所有现有指标都无法察觉的故障模式。为了诊断这种故障,我们将推理质量分解为输入内多样性(熵)和跨输入可区分性(互信息,MI),并引入了一族用于在线诊断的互信息代理指标。在不同任务中,互信息与最终性能的相关性远强于熵,使其成为推理质量更可靠的代理。我们进一步用信噪比(SNR)机制解释模板崩溃:低奖励方差削弱了任务梯度,使正则化项占主导地位,从而消除了跨输入的推理差异。为了解决这个问题,我们提出SNR感知过滤,在每次迭代中使用奖励方差作为轻量级代理来选择高信号提示。在规划、数学推理、Web导航和代码执行任务中,该方法始终同时提高了输入依赖性和任务性能。
摘要:RL training of multi-turn LLM agents is inherently unstable, and reasoning quality directly determines task performance. Entropy is widely used to track reasoning stability. However, entropy only measures diversity within the same input, and cannot tell whether reasoning actually responds to different inputs. In RAGEN-2, we find that even with stable entropy, models can rely on fixed templates that look diverse but are input-agnostic. We call this template collapse, a failure mode invisible to entropy and all existing metrics. To diagnose this failure, we decompose reasoning quality into within-input diversity (Entropy) and cross-input distinguishability (Mutual Information, MI), and introduce a family of mutual information proxies for online diagnosis. Across diverse tasks, mutual information correlates with final performance much more strongly than entropy, making it a more reliable proxy for reasoning quality. We further explain template collapse with a signal-to-noise ratio (SNR) mechanism. Low reward variance weakens task gradients, letting regularization terms dominate and erase cross-input reasoning differences. To address this, we propose SNR-Aware Filtering to select high-signal prompts per iteration using reward variance as a lightweight proxy. Across planning, math reasoning, web navigation, and code execution, the method consistently improves both input dependence and task performance.
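The entropy-versus-mutual-information distinction is easy to demonstrate on toy data. The sketch below, with hypothetical inputs and reasoning templates of our own, computes template entropy H(T) and the plug-in mutual information I(X;T) = H(T) - H(T|X) from joint samples: a collapsed model keeps high entropy but zero MI, while input-responsive reasoning keeps the MI positive.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a list of discrete samples."""
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in Counter(samples).values())

def mutual_information(pairs):
    """I(X;T) = H(T) - H(T|X) from joint samples of (input, template)."""
    xs, ts = zip(*pairs)
    h_t = entropy(ts)
    h_t_given_x = 0.0
    for x in set(xs):
        group = [t for xi, t in pairs if xi == x]
        h_t_given_x += len(group) / len(pairs) * entropy(group)
    return h_t - h_t_given_x

# Template collapse: two templates used at random, ignoring the input.
collapsed = [("x1", "A"), ("x1", "B"), ("x2", "A"), ("x2", "B")]
# Input-dependent reasoning: the template is determined by the input.
responsive = [("x1", "A"), ("x1", "A"), ("x2", "B"), ("x2", "B")]

print(entropy([t for _, t in collapsed]), mutual_information(collapsed))    # high H, zero MI
print(entropy([t for _, t in responsive]), mutual_information(responsive))  # same H, positive MI
```

Both datasets have identical template entropy (ln 2), so entropy alone cannot distinguish them; only the mutual information separates collapse from responsiveness.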
【8】Physics-Informed Functional Link Constrained Framework with Domain Mapping for Solving Bending Analysis of an Exponentially Loaded Perforated Beam
标题:具有区域映射的物理信息功能链接约束框架用于解决指数加载穿孔梁的弯曲分析
链接:https://arxiv.org/abs/2604.07025
作者:Iswari Sahu,Ramanath Garai,S. Chakraverty
摘要:本文提出了一种分析锥形穿孔梁在指数载荷作用下弯曲性能的新方法。控制微分方程包含了填充率($α$)、孔排数($N$)、锥化参数($φ$和$ψ$)以及指数加载参数($γ$)等重要因素,为穿孔梁结构提供了一个真实而灵活的描述。这项工作的主要目标是检验域映射物理信息泛函连接理论(DFL-TFC)方法在分析指数载荷下方孔穿孔梁弯曲响应方面的效果。为了便于比较,还开发了相应的基于PINN的公式。结果清楚地表明,与PINN方法相比,所提出的DFL-TFC框架给出了更好的结果,包括更快的收敛速度、更低的计算成本和更高的求解精度。这些结果突出了DFL-TFC方法在求解由微分方程控制的复杂工程问题中的有效性和潜力。在这个框架内,隐藏层被替换为一个函数展开块,通过正交多项式基函数丰富输入表示,并将微分方程的定义域映射到相应的正交多项式定义域。通过使用边界条件的泛函连接理论(TFC)构造的约束表达式(CE)可确保约束被精确满足。在CE中,自由函数使用函数链接神经网络(FLNN)表示,该网络通过学习求解由此产生的无约束优化问题。所得结果进一步通过Galerkin解和PINN解进行了验证。
摘要:This article presents a novel and comprehensive approach for analyzing the bending behavior of a tapered perforated beam under an exponential load. The governing differential equation includes important factors such as the filling ratio ($α$), the number of rows of holes ($N$), the tapering parameters ($φ$ and $ψ$), and the exponential loading parameter ($γ$), providing a realistic and flexible representation of perforated beam configurations. The main goal of this work is to assess how well the Domain-mapped physics-informed Functional Link Theory of Functional Connections (DFL-TFC) method analyses the bending response of a perforated beam with square holes under exponential loading. For comparison purposes, a corresponding PINN-based formulation is developed. The outcomes clearly show that the proposed DFL-TFC framework gives better results, including faster convergence, reduced computational cost, and improved solution accuracy, when compared to the PINN approach. These findings highlight the effectiveness and potential of the DFL-TFC method for solving complex engineering problems governed by differential equations. Within this framework, the hidden layer is replaced by a functional expansion block that enriches the input representation via orthogonal polynomial basis functions, and the domain of the DE is mapped to the corresponding domain of the orthogonal polynomials. A Constrained Expression (CE), constructed through the Theory of Functional Connections (TFC) using the boundary conditions, ensures that the constraints are exactly satisfied. In the CE, the free function is represented using a Functional Link Neural Network (FLNN), which learns to solve the resulting unconstrained optimization problem. The obtained results are further validated against Galerkin and PINN solutions.
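The exact satisfaction of boundary conditions via a TFC constrained expression can be illustrated in a few lines. The sketch below uses the textbook Dirichlet-condition CE on [0, 1], with an arbitrary cubic standing in for the FLNN free function; it is a generic TFC example, not the paper's specific beam formulation.

```python
def constrained_expression(g, y0, y1):
    """TFC-style constrained expression on [0, 1] for Dirichlet conditions.

    For any free function g, the returned function satisfies y(0) = y0 and
    y(1) = y1 exactly; in DFL-TFC the free function would be an FLNN.
    """
    return lambda x: g(x) + (1 - x) * (y0 - g(0.0)) + x * (y1 - g(1.0))

# Any free function works -- here an arbitrary cubic stands in for the FLNN.
g = lambda x: x ** 3 - 2.0 * x
y = constrained_expression(g, y0=1.0, y1=5.0)
print(y(0.0), y(1.0))  # 1.0 5.0, regardless of the choice of g
```

Because the boundary conditions are built into the expression analytically, the optimizer only has to fit the free function, turning the boundary-value problem into an unconstrained one.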
检测相关(4篇)
【1】Towards Resilient Intrusion Detection in CubeSats: Challenges, TinyML Solutions, and Future Directions
标题:在CubeSats中实现弹性入侵检测:挑战、TinyML解决方案和未来方向
链接:https://arxiv.org/abs/2604.06411
作者:Yasamin Fayyaz,Li Yang,Khalil El-Khatib
备注:Published in IEEE Aerospace and Electronic Systems Magazine
摘要:CubeSats通过为研究和教育提供负担得起且易于获取的平台,彻底改变了进入太空的方式。然而,它们对商用现成(COTS)组件和开源软件的依赖带来了重大的网络安全漏洞。随着CubeSats在太空任务中发挥越来越重要的作用,确保其网络安全至关重要。由于资源限制和独特的运行环境,入侵检测系统(IDS)等传统安全措施对CubeSats并不实用。本文深入审查了CubeSats当前的网络安全实践,强调了现有方法的局限性并找出了差距。此外,它还探讨了非网络异常检测技术,这些技术为适合CubeSat约束的自适应算法和部署策略提供了见解。文中指出了若干开放研究问题,包括对资源高效入侵检测机制的需求、在现实任务场景下评估IDS解决方案、开发自主响应系统以及创建网络安全框架。将TinyML引入CubeSat系统被视为应对这些挑战的一个有前景的方案,可提供资源高效的实时入侵检测能力。最后提出了未来的研究方向,例如将网络安全与健康监测系统相结合,以及促进网络安全研究人员与空间领域专家之间的合作。
摘要:CubeSats have revolutionized access to space by providing affordable and accessible platforms for research and education. However, their reliance on Commercial Off-The-Shelf (COTS) components and open-source software has introduced significant cybersecurity vulnerabilities. Ensuring the cybersecurity of CubeSats is vital as they play increasingly important roles in space missions. Traditional security measures, such as intrusion detection systems (IDS), are impractical for CubeSats due to resource constraints and unique operational environments. This paper provides an in-depth review of current cybersecurity practices for CubeSats, highlighting limitations and identifying gaps in existing methods. Additionally, it explores non-cyber anomaly detection techniques that offer insights into adaptable algorithms and deployment strategies suitable for CubeSat constraints. Open research problems are identified, including the need for resource-efficient intrusion detection mechanisms, evaluation of IDS solutions under realistic mission scenarios, development of autonomous response systems, and creation of cybersecurity frameworks. The addition of TinyML into CubeSat systems is explored as a promising solution to address these challenges, offering resource-efficient, real-time intrusion detection capabilities. Future research directions are proposed, such as integrating cybersecurity with health monitoring systems, and fostering collaboration between cybersecurity researchers and space domain experts.
【2】Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
标题:望远镜:用于超远距离物体检测的可学习双曲注视点采样
链接:https://arxiv.org/abs/2604.06332
作者:Parker Ewen,Dmitriy Rivkin,Mario Bijelic,Felix Heide
备注:Project website: https://light.princeton.edu/telescope
摘要:高速公路自动驾驶,特别是长途重型卡车,需要在500米以外的远距离检测物体,以满足高速下的制动距离要求。在远距离上,车辆和其他关键物体在高分辨率图像中只占据几个像素,导致最先进的物体探测器失效。商用激光雷达传感器有效范围有限,使这一挑战变得更加复杂:由于分辨率随距离二次下降,这些传感器的有效范围达不到超远程阈值,这使得基于图像的检测成为在商用传感器限制下最实用、可扩展的解决方案。我们介绍Telescope,这是一种为超远程自动驾驶设计的两阶段检测模型。除了强大的检测骨干外,该模型还包含一个新的重采样层和图像变换,以解决检测小而远的物体的基本挑战。与最先进的方法相比,Telescope在超远程探测中的mAP相对提高了76%(在超过250米的距离上,绝对mAP从0.185提高到0.326),所需计算开销极小,并在所有探测范围内保持强大的性能。
摘要:Autonomous highway driving, especially for long-haul heavy trucks, requires detecting objects at long ranges beyond 500 meters to satisfy braking distance requirements at high speeds. At long distances, vehicles and other critical objects occupy only a few pixels in high-resolution images, causing state-of-the-art object detectors to fail. This challenge is compounded by the limited effective range of commercially available LiDAR sensors, which fall short of ultra-long range thresholds because of quadratic loss of resolution with distance, making image-based detection the most practically scalable solution given commercially available sensor constraints. We introduce Telescope, a two-stage detection model designed for ultra-long range autonomous driving. Alongside a powerful detection backbone, this model contains a novel re-sampling layer and image transformation to address the fundamental challenges of detecting small, distant objects. Telescope achieves $76\%$ relative improvement in mAP in ultra-long range detection compared to state-of-the-art methods (improving from an absolute mAP of 0.185 to 0.326 at distances beyond 250 meters), requires minimal computational overhead, and maintains strong performance across all detection ranges.
【3】SMT-AD: a scalable quantum-inspired anomaly detection approach
标题:SMT-AD:一种可扩展的量子启发异常检测方法
链接:https://arxiv.org/abs/2604.06265
作者:Apimuk Sornsaeng,Si Min Chan,Wenxuan Zhang,Swee Liang Wong,Joshua Lim,Dario Poletti
备注:11 pages, 5 figures
摘要:量子启发的张量网络算法已被证明是机器学习任务(包括异常检测)的有效和高效模型。在这里,我们提出了一种高度可并行化的量子启发方法,我们称之为SMT-AD,即用于异常检测的多分辨率张量叠加(Superposition of Multiresolution Tensors for Anomaly Detection)。它基于键维为1的矩阵乘积算子的叠加,以傅立叶辅助特征嵌入来变换输入数据,其中可学习参数的数量随特征大小、嵌入分辨率以及矩阵乘积算子结构中附加分量的数量线性增长。当应用于包括信用卡交易在内的标准数据集时,我们展示了成功的异常检测,并发现即使在最小配置下,它相对既有异常检测基线也能取得有竞争力的性能。此外,它提供了一种简单的方法来缩减模型权重,甚至通过突出最相关的输入特征来提高性能。
摘要:Quantum-inspired tensor-network algorithms have been shown to be effective and efficient models for machine learning tasks, including anomaly detection. Here, we propose a highly parallelizable quantum-inspired approach which we call SMT-AD, from Superposition of Multiresolution Tensors for Anomaly Detection. It is based upon the superposition of bond-dimension-1 matrix product operators to transform the input data with Fourier-assisted feature embedding, where the number of learnable parameters grows linearly with the feature size, the embedding resolutions, and the number of additional components in the matrix-product-operator structure. We demonstrate successful anomaly detection when applied to standard datasets, including credit card transactions, and find that, even with minimal configurations, it achieves competitive performance against established anomaly detection baselines. Furthermore, it provides a straightforward way to reduce the weight of the model and even improve the performance by highlighting the most relevant input features.
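A Fourier-assisted feature embedding whose parameter count grows linearly in features and resolutions can be sketched as follows; the function names and the per-feature cosine/sine construction are our own illustration of the general idea, not the paper's exact embedding.

```python
import math

def fourier_embed(x, resolutions):
    """Fourier-assisted embedding of one scalar feature.

    Returns 2 * resolutions values; a model with one weight per embedding
    component therefore has a parameter count that grows linearly in
    (number of features) x (number of resolutions).
    """
    out = []
    for k in range(1, resolutions + 1):
        out.append(math.cos(2 * math.pi * k * x))
        out.append(math.sin(2 * math.pi * k * x))
    return out

def embed_sample(features, resolutions):
    """Embed every feature of a sample independently (rank-1 components)."""
    return [fourier_embed(x, resolutions) for x in features]

sample = [0.25, 0.5, 0.75]
embedded = embed_sample(sample, resolutions=2)
print(len(embedded), len(embedded[0]))  # 3 features, 4 components each
```

Each feature is embedded independently, mirroring how bond-dimension-1 components keep the overall structure a sum of rank-1 terms rather than a full tensor contraction.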
【4】Quantum-Inspired Tensor Network Autoencoders for Anomaly Detection: A MERA-Based Approach
标题:用于异常检测的量子启发张量网络自动编码器:基于MERA的方法
链接:https://arxiv.org/abs/2604.06541
作者:Emre Gurkanli,Michael Spannowsky
备注:26 pages, 5 figures
摘要:我们研究多尺度张量网络架构能否为基于重建的对撞机射流异常检测提供有用的归纳偏置。喷流由分支级联产生,因此其内部结构自然地按角度和动量尺度组织。这促使我们采用一种分层压缩信息、并能在粗粒化之前重新组织短程相关性的自动编码器。在这一图像的指导下,我们提出了一种直接作用于有序射流成分的MERA启发自动编码器。据我们所知,MERA启发的自动编码器以前没有被提出过,这种架构也尚未在对撞机异常检测中被探索过。我们将此架构与密集自动编码器、相应的树张量网络极限以及统一的仅背景重建框架内的标准经典基线进行比较。本文围绕两个主要问题展开:位置感知的分层压缩是否真正得到数据的支持,以及MERA的解缠层是否超越了简单的树层次结构。为了解决这些问题,我们将基准比较与无需训练的局部可压缩性诊断和直接的恒等解缠器消融相结合。由此得到的图像是:保持局部性的多尺度结构与射流数据匹配良好,并且MERA解缠器恰好在压缩瓶颈最强时变得有益。总的来说,该研究支持将位置感知的分层压缩作为射流异常检测的一个有用归纳偏置。
摘要:We investigate whether a multiscale tensor-network architecture can provide a useful inductive bias for reconstruction-based anomaly detection in collider jets. Jets are produced by a branching cascade, so their internal structure is naturally organised across angular and momentum scales. This motivates an autoencoder that compresses information hierarchically and can reorganise short-range correlations before coarse-graining. Guided by this picture, we formulate a MERA-inspired autoencoder acting directly on ordered jet constituents. To the best of our knowledge, a MERA-inspired autoencoder has not previously been proposed, and this architecture has not been explored in collider anomaly detection. We compare this architecture to a dense autoencoder, the corresponding tree-tensor-network limit, and standard classical baselines within a common background-only reconstruction framework. The paper is organised around two main questions: whether locality-aware hierarchical compression is genuinely supported by the data, and whether the disentangling layers of MERA contribute beyond a simpler tree hierarchy. To address these questions, we combine benchmark comparisons with a training-free local-compressibility diagnostic and a direct identity-disentangler ablation. The resulting picture is that the locality-preserving multiscale structure is well matched to jet data, and that the MERA disentanglers become beneficial precisely when the compression bottleneck is strongest. Overall, the study supports locality-aware hierarchical compression as a useful inductive bias for jet anomaly detection.
分类|识别(5篇)
【1】DINO-QPM: Adapting Visual Foundation Models for Globally Interpretable Image Classification
标题:DINO-QPM:适配视觉基础模型以实现全局可解释的图像分类
链接:https://arxiv.org/abs/2604.07166
作者:Robert Zimmermann,Thomas Norrenbrock,Bodo Rosenhahn
备注:Accepted to the 5th Explainable AI for Computer Vision (XAI4CV) Workshop at CVPR 2026
摘要:虽然像DINOv2这样的视觉基础模型提供了最先进的特征提取器性能,但它们复杂的高维表示为可解释性带来了巨大的障碍。这项工作提出了DINO-QPM,它将这些强大但纠缠的特征转换为人类可解释的对比性的、类独立的表示。DINO-QPM是一个轻量级的可解释性适配器,追求全局可解释的图像分类,使二次规划增强模型(QPM)适配于严格冻结的DINO骨干上运行。虽然使用视觉基础模型的分类通常依赖于\texttt{CLS}标记,但我们有意偏离了这一标准。通过利用平均池化,我们将补丁嵌入直接连接到模型的特征,从而在输入空间内实现DINO-QPM全局可解释特征的空间定位。此外,我们应用稀疏损失来最大限度地减少空间散射和背景噪声,确保解释根植于相关的对象部分。通过DINO-QPM,我们以适配器的形式提供了QPM级别的可解释性,同时超过了DINOv2线性探测的精度。通过引入的合理性(Plausibility)度量和其他可解释性度量进行评估,大量实验表明,DINO-QPM在分类精度和解释质量方面均优于适用于冻结视觉基础模型的其他方法。
摘要:Although visual foundation models like DINOv2 provide state-of-the-art performance as feature extractors, their complex, high-dimensional representations create substantial hurdles for interpretability. This work proposes DINO-QPM, which converts these powerful but entangled features into contrastive, class-independent representations that are interpretable by humans. DINO-QPM is a lightweight interpretability adapter that pursues globally interpretable image classification, adapting the Quadratic Programming Enhanced Model (QPM) to operate on strictly frozen DINO backbones. While classification with visual foundation models typically relies on the \texttt{CLS} token, we deliberately diverge from this standard. By leveraging average-pooling, we directly connect the patch embeddings to the model's features and therefore enable spatial localisation of DINO-QPM's globally interpretable features within the input space. Furthermore, we apply a sparsity loss to minimise spatial scatter and background noise, ensuring that explanations are grounded in relevant object parts. With DINO-QPM we make the level of interpretability of QPM available as an adapter while exceeding the accuracy of DINOv2 linear probe. Evaluated through an introduced Plausibility metric and other interpretability metrics, extensive experiments demonstrate that DINO-QPM is superior to other applicable methods for frozen visual foundation models in both classification accuracy and explanation quality.
【2】Learning to Query History: Nonstationary Classification via Learned Retrieval
标题:学习查询历史:通过学习检索的非平稳分类
链接:https://arxiv.org/abs/2604.07027
作者:Jimmy Gammell,Bishal Thapaliya,Yoon Jung,Riyasat Ohib,Bilel Fehri,Deepayan Chakrabarti
备注:Accepted to ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM). 12 pages, 6 figures
摘要:非平稳性在实际的分类设置中无处不在,导致部署的模型表现不佳,即使它们在训练时很好地泛化到可用的holdout集。我们通过将非平稳分类重新构建为时间序列预测来解决这个问题:而不是仅从当前输入进行预测,我们将分类器置于一系列历史标记的示例上,这些示例超出了训练截止值。为了扩展到大序列,我们引入了一种学习的离散检索机制,该机制通过依赖于输入的查询对相关的历史示例进行采样,使用基于分数的梯度估计器与分类器进行端到端的训练。这使得完整的历史数据语料库在训练和部署期间可以保留在任意文件系统上。在合成基准测试和Amazon Reviews '23(电子产品类别)上的实验表明,与标准分类器相比,对分布偏移的鲁棒性有所提高,随着历史数据序列长度的增加,VRAM的缩放可预测。
摘要:Nonstationarity is ubiquitous in practical classification settings, leading deployed models to perform poorly even when they generalize well to holdout sets available at training time. We address this by reframing nonstationary classification as time series prediction: rather than predicting from the current input alone, we condition the classifier on a sequence of historical labeled examples that extends beyond the training cutoff. To scale to large sequences, we introduce a learned discrete retrieval mechanism that samples relevant historical examples via input-dependent queries, trained end-to-end with the classifier using a score-based gradient estimator. This enables the full corpus of historical data to remain on an arbitrary filesystem during training and deployment. Experiments on synthetic benchmarks and Amazon Reviews '23 (electronics category) show improved robustness to distribution shift compared to standard classifiers, with VRAM scaling predictably as the length of the historical data sequence increases.
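The score-based (REINFORCE-style) gradient estimator for learned discrete retrieval can be sketched on a toy retrieval problem. Everything below - the uniform initial scores and a reward that fires only for the relevant historical example - is a hypothetical illustration, not the paper's setup.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def score_function_gradient(scores, reward_fn, n_samples=2000, seed=0):
    """REINFORCE-style estimate of d E[reward] / d scores.

    The retriever samples one historical example per query from a softmax
    over relevance scores; the score-function estimator lets the reward
    backpropagate through the discrete sampling step.
    """
    rng = random.Random(seed)
    probs = softmax(scores)
    grad = [0.0] * len(scores)
    for _ in range(n_samples):
        i = rng.choices(range(len(scores)), weights=probs)[0]
        r = reward_fn(i)
        # d log p_i / d s_j = (1 if i == j else 0) - p_j
        for j in range(len(scores)):
            grad[j] += r * ((1.0 if i == j else 0.0) - probs[j]) / n_samples
    return grad

# Reward 1 only when the genuinely relevant historical example (index 2) is
# retrieved: the estimate pushes score 2 up and the other scores down.
grad = score_function_gradient([0.0, 0.0, 0.0], lambda i: 1.0 if i == 2 else 0.0)
print(grad)
```

Because only sampled indices contribute gradient signal, the historical corpus itself never needs to be differentiable or memory-resident, which is what allows it to live on an arbitrary filesystem.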
【3】Towards Accurate and Calibrated Classification: Regularizing Cross-Entropy From A Generative Perspective
标题:迈向准确和校准的分类:从生成的角度正则化交叉熵
链接:https://arxiv.org/abs/2604.06689
作者:Qipeng Zhan,Zhuoping Zhou,Li Shen
摘要:准确的分类不仅需要高的预测精度,而且还需要校准良好的置信度估计。然而,现代深度神经网络(DNN)通常过于自信,这主要是由于对负对数似然(NLL)的过度拟合。虽然焦点损失(focal loss)的变体缓解了这个问题,但它们通常会降低准确性,揭示了校准和预测性能之间的持久权衡。基于生成式分类器和判别式分类器的互补优势,我们提出了生成式交叉熵(GCE),它最大化$p(x|y)$,并等价于用类级置信度正则化子增广的交叉熵。在温和的条件下,GCE是严格适当(strictly proper)的。在CIFAR-10/100、Tiny-ImageNet和医学成像基准测试中,GCE提高了交叉熵的准确性和校准,特别是在长尾场景中。结合自适应分段温度缩放(ATS),GCE实现了与焦点损失变体相竞争的校准,而不牺牲精度。
摘要:Accurate classification requires not only high predictive accuracy but also well-calibrated confidence estimates. Yet, modern deep neural networks (DNNs) are often overconfident, primarily due to overfitting on the negative log-likelihood (NLL). While focal loss variants alleviate this issue, they typically reduce accuracy, revealing a persistent trade-off between calibration and predictive performance. Motivated by the complementary strengths of generative and discriminative classifiers, we propose Generative Cross-Entropy (GCE), which maximizes $p(x|y)$ and is equivalent to cross-entropy augmented with a class-level confidence regularizer. Under mild conditions, GCE is strictly proper. Across CIFAR-10/100, Tiny-ImageNet, and a medical imaging benchmark, GCE improves both accuracy and calibration over cross-entropy, especially in the long-tailed scenario. Combined with adaptive piecewise temperature scaling (ATS), GCE attains calibration competitive with focal-loss variants without sacrificing accuracy.
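Cross-entropy augmented with a class-level confidence regularizer can be sketched as follows. The abstract does not give the exact GCE regularizer, so the per-class average-confidence penalty below is a placeholder of our own; only the overall structure (CE plus a class-level term) follows the description.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one example."""
    return -math.log(probs[label])

def gce_loss(batch_probs, labels, lam=0.1):
    """Cross-entropy plus an illustrative class-level confidence regularizer.

    The regularizer here is a placeholder: it penalises the average
    confidence assigned to each class across the batch, discouraging the
    overconfidence that plain NLL training induces.
    """
    n = len(labels)
    ce = sum(cross_entropy(p, y) for p, y in zip(batch_probs, labels)) / n
    n_classes = len(batch_probs[0])
    avg_conf = [sum(p[c] for p in batch_probs) / n for c in range(n_classes)]
    reg = sum(a * a for a in avg_conf)  # high when mass piles onto few classes
    return ce + lam * reg

probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
labels = [0, 1]
print(gce_loss(probs, labels))
```

Setting `lam=0.0` recovers plain cross-entropy, so the regularizer's effect on calibration can be isolated by sweeping `lam`.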
【4】Time-Series Classification with Multivariate Statistical Dependence Features
标题:具有多元统计依赖特征的时间序列分类
链接:https://arxiv.org/abs/2604.06537
作者:Yao Sun,Bo Hu,Jose Principe
摘要:在本文中,我们提出了一种新的非平稳时间序列分析框架,用对输入信号和目标信号的归一化联合密度(即交叉密度比,CDR)中统计依赖性的直接估计取代传统的基于相关性的统计量。与加窗相关估计不同,该度量与样本顺序无关,并对状态变化具有鲁棒性。该方法基于泛函最大相关算法(FMCA),通过分解CDR的特征谱来构造投影空间。来自该特征空间的多尺度特征使用一个轻量级的单隐藏层感知器进行分类。在TI-46数字语音语料库上,我们的方法优于隐马尔可夫模型(HMM)和最先进的脉冲神经网络,以少于10层的网络和小于5 MB的存储空间实现了更高的准确性。
摘要:In this paper, we propose a novel framework for non-stationary time-series analysis that replaces conventional correlation-based statistics with direct estimation of statistical dependence in the normalized joint density of input and target signals, the cross density ratio (CDR). Unlike windowed correlation estimates, this measure is independent of sample order and robust to regime changes. The method builds on the functional maximal correlation algorithm (FMCA), which constructs a projection space by decomposing the eigenspectrum of the CDR. Multiscale features from this eigenspace are classified using a lightweight single-hidden-layer perceptron. On the TI-46 digit speech corpus, our approach outperforms hidden Markov models (HMMs) and state-of-the-art spiking neural networks, achieving higher accuracy with fewer than 10 layers and a storage footprint under 5 MB.
【5】Asymptotic-Preserving Neural Networks for Viscoelastic Parameter Identification in Multiscale Blood Flow Modeling
标题:用于多尺度血流建模中粘弹性参数识别的渐近保持神经网络
链接:https://arxiv.org/abs/2604.06287
作者:Giulia Bertaglia,Raffaella Fiamma Cabini
摘要:数学模型和数值模拟提供了一种非侵入性的方式来探索心血管现象,提供了无法直接测量的数量。在这项研究中,我们从描述动脉壁粘弹性的一维多尺度血流模型开始,我们专注于通过解决一个主要挑战来提高其实用性:以可靠的方式确定控制动脉在脉动压力下如何变形的粘弹性参数。为了实现这一目标,我们采用了渐近保持神经网络,该网络将多尺度粘弹性血流模型的物理原理嵌入到学习过程中。这个框架使我们能够推断的粘弹性参数,同时重建血管的状态变量的时间依赖性的演变。利用这种方法,从容易获得的患者特异性数据估计压力波形,即,在无法直接测量压力的血管段中,通过多普勒超声测量横截面积和流速。不同的数值模拟,在合成和患者特定的情况下进行,显示所提出的方法的有效性。
摘要:Mathematical models and numerical simulations offer a non-invasive way to explore cardiovascular phenomena, providing access to quantities that cannot be measured directly. In this study, we start with a one-dimensional multiscale blood flow model that describes the viscoelastic properties of arterial walls, and we focus on improving its practical applicability by addressing a major challenge: determining, in a reliable way, the viscoelastic parameters that control how arteries deform under pulsatile pressure. To achieve this, we employ Asymptotic-Preserving Neural Networks that embed the governing physical principles of the multiscale viscoelastic blood flow model within the learning procedure. This framework allows us to infer the viscoelastic parameters while simultaneously reconstructing the time-dependent evolution of the state variables of blood vessels. With this approach, pressure waveforms are estimated from readily accessible patient-specific data, i.e., cross-sectional area and velocity measurements from Doppler ultrasound, in vascular segments where direct pressure measurements are not available. Different numerical simulations, conducted in both synthetic and patient-specific scenarios, show the effectiveness of the proposed methodology.
表征(4篇)
【1】Stress Estimation in Elderly Oncology Patients Using Visual Wearable Representations and Multi-Instance Learning
标题:使用视觉可穿戴表示和多实例学习估计老年肿瘤患者的压力
链接:https://arxiv.org/abs/2604.06990
作者:Ioannis Kyprakis,Vasileios Skaramagkas,Georgia Karanasiou,Vasilis Bouratzis,Andri Papakonstantinou,Dimitar Stefanovski,Kalliopi Keramida,Aristofania Simatou,Ketti Mazzocco,Anastasia Constantinidou,Konstantinos Marias,Dimitrios I. Fotiadis,Manolis Tsiknakis
备注:7 pages, 2 figures, under review for IEEE EMBC 2026
摘要:心理应激在心脏肿瘤学中具有临床相关性,但通常仅通过患者报告的结局指标(PROM)进行评估,很少纳入连续的心脏毒性监测。我们使用来自智能手表(体力活动和睡眠)和胸戴式ECG传感器的多模态可穿戴数据来估计老年多中心乳腺癌队列(CARDIOCARE)的感知压力。可穿戴流被转换成异构的视觉表示,产生一个弱监督的设置,其中一个单一的感知压力量表(PSS)分数对应于许多未标记的窗口。一个轻量级的预先训练的专家混合骨干(Tiny-BioMoE)将每个表示嵌入到192维向量中,这些向量通过基于注意力的多实例学习(MIL)聚合,以预测第3个月(M3)和第6个月(M6)的PSS。在留一受试者(LOSO)评价下,预测显示与问卷评分中度一致(M3:R^2=0.24,Pearson r=0.42,Spearman rho=0.48; M6:R^2=0.28,Pearson r=0.49,Spearman rho=0.52),M3的总体RMSE/MAE为6.62/6.07,M6为6.13/5.54。
摘要:Psychological stress is clinically relevant in cardio-oncology, yet it is typically assessed only through patient-reported outcome measures (PROMs) and is rarely integrated into continuous cardiotoxicity surveillance. We estimate perceived stress in an elderly, multicenter breast cancer cohort (CARDIOCARE) using multimodal wearable data from a smartwatch (physical activity and sleep) and a chest-worn ECG sensor. Wearable streams are transformed into heterogeneous visual representations, yielding a weakly supervised setting in which a single Perceived Stress Scale (PSS) score corresponds to many unlabeled windows. A lightweight pretrained mixture-of-experts backbone (Tiny-BioMoE) embeds each representation into 192-dimensional vectors, which are aggregated via attention-based multiple instance learning (MIL) to predict PSS at month 3 (M3) and month 6 (M6). Under leave-one-subject-out (LOSO) evaluation, predictions showed moderate agreement with questionnaire scores (M3: R^2=0.24, Pearson r=0.42, Spearman rho=0.48; M6: R^2=0.28, Pearson r=0.49, Spearman rho=0.52), with global RMSE/MAE of 6.62/6.07 at M3 and 6.13/5.54 at M6.
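Attention-based MIL pooling - many unlabeled windows aggregated into one bag-level representation for a single PSS score - can be sketched in a few lines. The gating used here is a simplified linear-score variant, and the toy embeddings are our own, not the paper's Tiny-BioMoE features.

```python
import math

def attention_mil_pool(instances, w):
    """Attention-based MIL pooling (simplified linear-score variant).

    Each instance embedding gets a scalar attention score w . h, the scores
    are softmax-normalised over the bag, and the bag representation is the
    attention-weighted sum -- one prediction per bag of windows.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in instances]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(alphas, instances)) for d in range(dim)]
    return bag, alphas

# A bag of three 2-d window embeddings sharing a single weak PSS label.
bag, alphas = attention_mil_pool([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], w=[1.0, 1.0])
print(alphas)  # attention weights sum to 1; the salient window dominates
```

The learned attention weights double as a weak form of interpretability: they indicate which windows drove the bag-level stress estimate.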
【2】Variational Feature Compression for Model-Specific Representations
标题:模型特定表示的变分特征压缩
链接:https://arxiv.org/abs/2604.06644
作者:Zinan Guo,Zihan Wang,Chuan Yan,Liuhuo Wan,Ethan Ma,Guangdong Bai
摘要:随着深度学习推理越来越多地部署在共享和基于云的环境中,人们越来越担心输入重新利用,其中为一个任务提交的数据被未经授权的模型重新用于另一个任务。现有的隐私防御主要集中在限制数据访问上,但对已发布的表示仍然可以支持的下游使用提供有限的控制。我们提出了一个特征提取框架,抑制跨模型传输,同时保持指定分类器的准确性。该框架采用了一个变分潜在瓶颈,用任务驱动的交叉熵目标和KL正则化进行训练,但没有任何像素级重建损失,将输入编码到一个紧凑的潜在空间中。一个动态的二进制掩模,计算从每维KL发散和基于梯度的显着性相对于冻结的目标模型,抑制潜在的尺寸是没有信息的预期任务。由于显着性计算需要梯度访问,因此编码器在白盒设置中进行训练,而推理仅需要通过冻结的目标模型进行正向传递。在CIFAR-100上,处理后的表示保留了指定分类器的强大实用性,同时将所有非预期分类器的准确性降低到2%以下,相对于非预期模型,抑制率超过45倍。在CIFAR-10、Tiny ImageNet和Pascal VOC上进行的初步实验提供了探索性证据,证明该方法可以扩展到各种任务设置,但还需要进一步评估,以评估针对自适应对手的鲁棒性。
摘要:As deep learning inference is increasingly deployed in shared and cloud-based settings, a growing concern is input repurposing, in which data submitted for one task is reused by unauthorized models for another. Existing privacy defenses largely focus on restricting data access, but provide limited control over what downstream uses a released representation can still support. We propose a feature extraction framework that suppresses cross-model transfer while preserving accuracy for a designated classifier. The framework employs a variational latent bottleneck, trained with a task-driven cross-entropy objective and KL regularization, but without any pixel-level reconstruction loss, to encode inputs into a compact latent space. A dynamic binary mask, computed from per-dimension KL divergence and gradient-based saliency with respect to the frozen target model, suppresses latent dimensions that are uninformative for the intended task. Because saliency computation requires gradient access, the encoder is trained in a white-box setting, whereas inference requires only a forward pass through the frozen target model. On CIFAR-100, the processed representations retain strong utility for the designated classifier while reducing the accuracy of all unintended classifiers to below 2%, yielding a suppression ratio exceeding 45 times relative to unintended models. Preliminary experiments on CIFAR-10, Tiny ImageNet, and Pascal VOC provide exploratory evidence that the approach extends across task settings, although further evaluation is needed to assess robustness against adaptive adversaries.
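The per-dimension KL term and the resulting binary mask can be sketched directly. The closed-form KL of a diagonal Gaussian against the standard normal is standard; the thresholds and saliency values below are illustrative placeholders, not the paper's settings.

```python
import math

def kl_per_dim(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for each latent dimension."""
    return [0.5 * (m * m + s * s - math.log(s * s) - 1.0)
            for m, s in zip(mu, sigma)]

def binary_mask(kl, saliency, kl_thresh=0.01, sal_thresh=0.01):
    """Keep a dimension only if it both deviates from the prior and is
    salient for the frozen target model (both thresholds are illustrative)."""
    return [1 if k > kl_thresh and s > sal_thresh else 0
            for k, s in zip(kl, saliency)]

mu = [1.0, 0.0, 0.5]
sigma = [0.5, 1.0, 1.0]
kl = kl_per_dim(mu, sigma)
print(kl)                                           # dim 1 matches the prior: KL = 0
print(binary_mask(kl, saliency=[0.9, 0.8, 0.001]))  # [1, 0, 0]
```

A dimension whose posterior collapses to the prior carries no information (KL = 0), so masking on KL alone would already drop it; the saliency condition additionally drops dimensions the target classifier never uses.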
【3】Neural parametric representations for thin-shell shape optimisation
标题:薄壳形状优化的神经参数表示
链接:https://arxiv.org/abs/2604.06612
作者:Xiao Xiao,Fehmi Cirak
备注:13 pages, 8 figures
摘要:薄壳结构的形状优化需要一种灵活的、可微的、适合基于梯度优化的几何表示。我们提出了一种基于带周期激活函数神经网络的壳体中面神经参数表示(NRep)。NRep使用多层感知器(MLP)定义,将中面顶点的参数坐标映射到其物理坐标。我们提出了一个结构顺应性优化问题,在体积约束下优化由NRep参数化的薄壳形状,并以网络参数作为设计变量。由此产生的形状优化问题使用基于梯度的优化算法求解。具有经典解的基准算例证明了所提出NRep的有效性。得益于NRep提供的紧凑且富有表现力的几何表示,该方法在复杂点阵-蒙皮结构方面展现出潜力。
摘要:Shape optimisation of thin-shell structures requires a flexible, differentiable geometric representation suitable for gradient-based optimisation. We propose a neural parametric representation (NRep) for the shell mid-surface based on a neural network with periodic activation functions. The NRep is defined using a multi-layer perceptron (MLP), which maps the parametric coordinates of mid-surface vertices to their physical coordinates. A structural compliance optimisation problem is posed to optimise the shape of a thin-shell parameterised by the NRep subject to a volume constraint, with the network parameters as design variables. The resulting shape optimisation problem is solved using a gradient-based optimisation algorithm. Benchmark examples with classical solutions demonstrate the effectiveness of the proposed NRep. The approach exhibits potential for complex lattice-skin structures, owing to the compact and expressive geometry representation afforded by the NRep.
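A periodic-activation (SIREN-style) MLP mapping parametric to physical coordinates can be sketched as follows; the layer sizes, initialisation, and frequency `omega` are generic illustrative choices of ours, not the paper's NRep configuration.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/n_in (a common SIREN-style choice)."""
    weights = [[rng.uniform(-1.0, 1.0) / n_in for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def siren_layer(x, weights, biases, omega=30.0):
    """One hidden layer with periodic activation: sin(omega * (W x + b))."""
    return [math.sin(omega * (sum(w * xi for w, xi in zip(row, x)) + b))
            for row, b in zip(weights, biases)]

def nrep(uv, layers):
    """Map parametric (u, v) mid-surface coordinates to physical coordinates.

    The final layer is linear; the sine activations keep the map smooth and
    differentiable, as gradient-based shape optimisation requires.
    """
    h = uv
    for weights, biases in layers[:-1]:
        h = siren_layer(h, weights, biases)
    weights, biases = layers[-1]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(weights, biases)]

rng = random.Random(0)
layers = [init_layer(2, 16, rng), init_layer(16, 3, rng)]
point = nrep([0.5, 0.5], layers)  # one 3D surface point for one (u, v) vertex
print(point)
```

In a real pipeline the network weights would be the design variables, updated by differentiating the structural compliance through this map.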
【4】Spatiotemporal Gaussian representation-based dynamic reconstruction and motion estimation framework for time-resolved volumetric MR imaging (DREME-GSMR)
标题:用于时间分辨体积MR成像的基于时空高斯表示的动态重建和运动估计框架(DREME-GSMR)
链接:https://arxiv.org/abs/2604.06482
作者:Jiacheng Xie,Hua-Chieh Shao,Can Wu,Ricardo Otazo,Jie Deng,Mu-Han Lin,Tsuicheng Chiu,Jacob Buatti,Viktor Iakovenko,You Zhang
备注:57 pages, 10 figures
摘要:在亚秒内重建三维MRI以解析形变运动的时间分辨容积磁共振成像,对运动自适应放射治疗至关重要。将患者解剖结构和相关运动场表示为3D高斯,我们开发了一种基于时空高斯表示的框架(DREME-GSMR),该框架能够在没有任何先验解剖/运动模型的情况下,从治疗前3D MR扫描进行时间分辨的动态MRI重建。DREME-GSMR使用3D高斯表示参考MRI体积和对应的低秩运动模型(作为运动基分量),并结合双路径MLP/CNN运动编码器,从原始k空间导出的信号估计运动模型的时间运动系数。此外,使用求解的运动模型,DREME-GSMR可以直接从新的在线k空间数据推断运动系数,允许随后的治疗内容积MR成像和运动跟踪(实时成像)。进一步引入运动增强策略以提高对实时成像期间未见运动模式的鲁棒性。DREME-GSMR在XCAT数字体模、物理运动体模以及从6名健康志愿者和20名患者采集的MR-LINAC数据集上进行了评价(采用独立的顺序扫描进行交叉评价)。DREME-GSMR重建时间分辨率约为400 ms的MRI,推理时间约为10 ms/体积。在XCAT实验中,对于动态重建/实时成像,DREME-GSMR的平均值(标准差)SSIM、肿瘤质心误差(COME)和DSC分别为0.92(0.01)/0.91(0.02)、0.50(0.15)/0.65(0.19) mm和0.92(0.02)/0.92(0.03)。对于物理体模,动态/实时成像的平均目标COME为1.19(0.94)/1.40(1.15) mm;对于志愿者和患者,实时成像的平均肝脏COME分别为1.31(0.82)和0.96(0.64) mm。
摘要:Time-resolved volumetric MR imaging that reconstructs a 3D MRI within sub-seconds to resolve deformable motion is essential for motion-adaptive radiotherapy. Representing patient anatomy and associated motion fields as 3D Gaussians, we developed a spatiotemporal Gaussian representation-based framework (DREME-GSMR), which enables time-resolved dynamic MRI reconstruction from a pre-treatment 3D MR scan without any prior anatomical/motion model. DREME-GSMR represents a reference MRI volume and a corresponding low-rank motion model (as motion-basis components) using 3D Gaussians, and incorporates a dual-path MLP/CNN motion encoder to estimate temporal motion coefficients of the motion model from raw k-space-derived signals. Furthermore, using the solved motion model, DREME-GSMR can infer motion coefficients directly from new online k-space data, allowing subsequent intra-treatment volumetric MR imaging and motion tracking (real-time imaging). A motion-augmentation strategy is further introduced to improve robustness to unseen motion patterns during real-time imaging. DREME-GSMR was evaluated on the XCAT digital phantom, a physical motion phantom, and MR-LINAC datasets acquired from 6 healthy volunteers and 20 patients (with independent sequential scans for cross-evaluation). DREME-GSMR reconstructs MRIs of a ~400ms temporal resolution, with an inference time of ~10ms/volume. In XCAT experiments, DREME-GSMR achieved mean(s.d.) SSIM, tumor center-of-mass-error(COME), and DSC of 0.92(0.01)/0.91(0.02), 0.50(0.15)/0.65(0.19) mm, and 0.92(0.02)/0.92(0.03) for dynamic reconstruction/real-time imaging. For the physical phantom, the mean target COME was 1.19(0.94)/1.40(1.15) mm for dynamic/real-time imaging, while for volunteers and patients, the mean liver COME for real-time imaging was 1.31(0.82) and 0.96(0.64) mm, respectively.
3D|3D重建等相关(1篇)
【1】Splats under Pressure: Exploring Performance-Energy Trade-offs in Real-Time 3D Gaussian Splatting under Constrained GPU Budgets
标题:压力下的Splats:探索受限GPU预算下实时3D高斯溅射的性能-能耗权衡
链接:https://arxiv.org/abs/2604.07177
作者:Muhammad Fahim Tajwar,Arthur Wuhrlin,Bhojan Anand
摘要:我们研究了在边缘客户端上实时3D高斯溅射(3DGS)光栅化的可行性,这些客户端具有不同的高斯溅射计数和GPU计算预算。我们采用基于仿真的方法,而不是评估多个物理设备,该方法在单个高端GPU上近似不同的GPU能力层。通过系统地降低GPU核心频率并应用功率上限,我们模拟了一系列受控的浮点性能水平,这些性能水平近似于不同的GPU能力层。在此范围内的每个点上,我们测量帧速率、运行时行为和不同复杂度、流水线和优化场景的功耗,从而能够分析功率-性能关系,例如FPS功率曲线、每帧能耗和每瓦性能。这种方法使我们能够近似各种GPU的性能范围,从嵌入式和移动级设备到高端消费级系统。 我们的目标是探索客户端3DGS光栅化的实际下限,并评估其在能源受限环境中部署的潜力,包括独立耳机和瘦客户端。通过这种分析,我们提供了早期的洞察力的性能-能源权衡,管理的可行性边缘部署的3DGS系统。
摘要:We investigate the feasibility of real-time 3D Gaussian Splatting (3DGS) rasterisation on edge clients with varying Gaussian splat counts and GPU computational budgets. Instead of evaluating multiple physical devices, we adopt an emulation-based approach that approximates different GPU capability tiers on a single high-end GPU. By systematically under-clocking the GPU core frequency and applying power caps, we emulate a controlled range of floating-point performance levels that approximate different GPU capability tiers. At each point in this range, we measure frame rate, runtime behaviour, and power consumption across scenes of varying complexity, pipelines, and optimisations, enabling analysis of power-performance relationships such as FPS-power curves, energy per frame, and performance per watt. This method allows us to approximate the performance envelope of a diverse class of GPUs, from embedded and mobile-class devices to high-end consumer-grade systems. Our objective is to explore the practical lower bounds of client-side 3DGS rasterisation and assess its potential for deployment in energy-constrained environments, including standalone headsets and thin clients. Through this analysis, we provide early insights into the performance-energy trade-offs that govern the viability of edge-deployed 3DGS systems.
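The derived power-performance metrics the abstract lists follow directly from each (FPS, watts) measurement point; the numbers below are hypothetical tier measurements, not results from the paper.

```python
def power_performance_metrics(fps, power_watts):
    """Derived metrics from one (frequency-capped) measurement point:
    energy per frame in joules and performance per watt in FPS/W."""
    energy_per_frame = power_watts / fps  # J/frame, since W = J/s
    perf_per_watt = fps / power_watts     # frames per second per watt
    return energy_per_frame, perf_per_watt

# Hypothetical measurements for one scene across three emulated GPU tiers.
for fps, watts in [(120.0, 300.0), (72.0, 150.0), (31.0, 60.0)]:
    epf, ppw = power_performance_metrics(fps, watts)
    print(f"{fps:5.1f} FPS @ {watts:5.1f} W -> {epf:.2f} J/frame, {ppw:.3f} FPS/W")
```

Sweeping these metrics against the emulated frequency/power caps traces out the FPS-power curve the study analyses.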
编码器(2篇)
【1】Bi-Lipschitz Autoencoder With Injectivity Guarantee
标题:具有单射性保证的Bi-Lipschitz自动编码器
链接:https://arxiv.org/abs/2604.06701
作者:Qipeng Zhan,Zhuoping Zhou,Zexuan Wang,Qi Long,Li Shen
备注:Accepted for publication at ICLR 2026, 27 Pages, 15 Figures
摘要:自编码器被广泛用于降维,基于高维数据位于低维流形上的假设。正则化自编码器的目标是在降维过程中保持流形几何,但现有的方法往往受限于非单射映射和过于严格的约束,限制了它们的有效性和鲁棒性。在这项工作中,我们将编码器非单射性确定为导致收敛性差和潜在表示失真的核心瓶颈。为了确保跨数据分布的鲁棒性,我们形式化了可容许正则化的概念,并给出了满足它的充分条件。在这项工作中,我们提出了Bi-Lipschitz自动编码器(BLAE),它引入了两个关键的创新:(1)基于分离准则的单射正则化方案,以消除病理局部极小值,以及(2)双Lipschitz松弛,保留几何形状并对数据分布漂移表现出鲁棒性。在不同数据集上的实验结果表明,BLAE在保持流形结构方面始终优于现有方法,同时保持对采样稀疏性和分布变化的弹性。代码可在https://github.com/qipengz/BLAE上获得。
摘要:Autoencoders are widely used for dimensionality reduction, based on the assumption that high-dimensional data lies on low-dimensional manifolds. Regularized autoencoders aim to preserve manifold geometry during dimensionality reduction, but existing approaches often suffer from non-injective mappings and overly rigid constraints that limit their effectiveness and robustness. In this work, we identify encoder non-injectivity as a core bottleneck that leads to poor convergence and distorted latent representations. To ensure robustness across data distributions, we formalize the concept of admissible regularization and provide sufficient conditions for its satisfaction. In this work, we propose the Bi-Lipschitz Autoencoder (BLAE), which introduces two key innovations: (1) an injective regularization scheme based on a separation criterion to eliminate pathological local minima, and (2) a bi-Lipschitz relaxation that preserves geometry and exhibits robustness to data distribution drift. Empirical results on diverse datasets show that BLAE consistently outperforms existing methods in preserving manifold structure while remaining resilient to sampling sparsity and distribution shifts. Code is available at https://github.com/qipengz/BLAE.
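The bi-Lipschitz property can be probed empirically: over a sample of points, the ratio d(f(x), f(y)) / d(x, y) should stay within bounds [L_lower, L_upper], and a strictly positive lower bound certifies injectivity on the sample. The sketch below, with a toy linear encoder of our own, computes these empirical bounds; it is a diagnostic, not the BLAE regularizer itself.

```python
import math

def bilipschitz_bounds(points, encoder):
    """Empirical bi-Lipschitz constants of an encoder over a point sample:
    the min and max of d(f(x), f(y)) / d(x, y) over all distinct pairs.
    A strictly positive minimum certifies injectivity on the sample."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = dist(points[i], points[j])
            if dx > 0:
                ratios.append(dist(encoder(points[i]), encoder(points[j])) / dx)
    return min(ratios), max(ratios)

# A linear "encoder" that halves one axis: all ratios stay within [0.5, 1].
enc = lambda p: (0.5 * p[0], p[1])
lo, hi = bilipschitz_bounds([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], enc)
print(lo, hi)  # 0.5 ... 1.0
```

An encoder that collapses two inputs to the same code would drive the lower bound to zero, which is exactly the pathology a bi-Lipschitz constraint rules out.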
【2】MO-RiskVAE: A Multi-Omics Variational Autoencoder for Survival Risk Modeling in Multiple Myeloma
标题:MO-RiskVAE:用于多发性骨髓瘤生存风险建模的多组学变分自动编码器
链接:https://arxiv.org/abs/2604.06267
作者:Zixuan Chen,Heng Zhang,YuPeng Qin,WenPeng Xing,Qiang Wang,Da Wang,Changting Lin,Meng Han
摘要:多模态变分自编码器(VAE)已成为一个强大的框架,生存风险建模多发性骨髓瘤,通过整合异质组学和临床数据。然而,当在生存监督下训练时,标准的潜在正则化策略通常无法保留与生存相关的变化,导致不稳定或过度约束的表示。尽管提出了许多变体,但仍不清楚潜在设计的哪些方面从根本上控制了这种设置中的性能。在这项工作中,我们进行了一个控制调查的潜在的多模态生存预测模型的选择MyeVAE框架内的统一扩展。通过在相同的架构和优化协议下系统地隔离正则化尺度、后验几何和潜在空间结构,我们证明了生存驱动的训练主要对潜在正则化的大小和结构敏感,而不是对特定的发散公式敏感。特别是,KL正则化的适度放松始终提高了生存歧视,而替代发散机制,如MMD和HSIC提供有限的好处,没有适当的缩放。我们进一步证明,构造潜在空间可以提高学习表示和生存风险梯度之间的对齐。基于Gumbel-Softmax的混合连续-离散配方增强了连续潜在子空间中的全局风险排序,即使在生存监督下也不会出现稳定的离散亚型发现。在这些研究结果的指导下,我们实例化了一个强大的多模式生存模型,称为MO-RiskVAE,它始终改善了原始MyeVAE的风险分层,而无需引入额外的监督或复杂的训练策略。
摘要:Multimodal variational autoencoders (VAEs) have emerged as a powerful framework for survival risk modeling in multiple myeloma by integrating heterogeneous omics and clinical data. However, when trained under survival supervision, standard latent regularization strategies often fail to preserve prognostically relevant variation, leading to unstable or overly constrained representations. Despite numerous proposed variants, it remains unclear which aspects of latent design fundamentally govern performance in this setting. In this work, we conduct a controlled investigation of latent modeling choices for multimodal survival prediction within a unified extension of the MyeVAE framework. By systematically isolating regularization scale, posterior geometry, and latent space structure under identical architectures and optimization protocols, we show that survival-driven training is primarily sensitive to the magnitude and structure of latent regularization rather than the specific divergence formulation. In particular, moderate relaxation of KL regularization consistently improves survival discrimination, while alternative divergence mechanisms such as MMD and HSIC provide limited benefit without appropriate scaling. We further demonstrate that structuring the latent space can improve alignment between learned representations and survival risk gradients. A hybrid continuous-discrete formulation based on Gumbel-Softmax enhances global risk ordering in the continuous latent subspace, even though stable discrete subtype discovery does not emerge under survival supervision. Guided by these findings, we instantiate a robust multimodal survival model, termed MO-RiskVAE, which consistently improves risk stratification over the original MyeVAE without introducing additional supervision or complex training heuristics.
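The hybrid continuous-discrete formulation above relies on the standard Gumbel-Softmax relaxation, which can be sketched independently of the paper's model (the logits and temperature below are illustrative, not taken from MO-RiskVAE):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """One relaxed categorical sample: perturb logits with Gumbel(0,1) noise,
    then apply a temperature-scaled softmax. Low tau approaches one-hot."""
    noise = [-math.log(-math.log(rng.random())) for _ in logits]
    z = [(l + g) / tau for l, g in zip(logits, noise)]
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

random.seed(0)
sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
print(sample)  # a point on the probability simplex
```

Because the sample is a differentiable function of the logits, gradients can flow through the discrete subtype assignment during training.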
优化|敛散性(6篇)
【1】Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
标题:在链中推理,在树中学习:面向多轮智能体策略优化的自我修正与嫁接
链接:https://arxiv.org/abs/2604.07165
作者:Yu Li,Sizhe Tang,Tian Lan
摘要:大型语言模型智能体的强化学习在多步推理任务中经常受到稀疏奖励的阻碍。现有的方法,如组相对策略优化处理采样的轨迹作为独立的链,分配统一的信用每个链中的所有步骤,并忽略存在的关键步骤,可能会影响推理结果。在本文中,我们提出了T-STAR(树结构的自学代理纠正),一个框架,恢复潜在的相关奖励结构在看似独立的轨迹。具体来说,我们通过识别和合并功能相似的步骤/节点,将轨迹合并到统一的认知树中。它使一个内省的估值机制,通过树反向传播的概率水平的奖励,以获得一个新的概念,方差减少相对优势的步骤。使用认知树,我们还开发了上下文思维嫁接,通过在关键分歧点/步骤对比成功和失败的分支来合成正确的推理。我们提出的手术策略优化,然后利用丰富的政策梯度信息集中在这些关键点/步骤,通过布拉德利-特里类型的手术损失。在体现,交互,推理和规划基准的广泛实验表明,T-STAR实现了在强基线上的一致改进,在需要扩展推理链的任务上的收益最明显。
摘要:Reinforcement learning for Large Language Model agents is often hindered by sparse rewards in multi-step reasoning tasks. Existing approaches like Group Relative Policy Optimization treat sampled trajectories as independent chains, assigning uniform credit to all steps in each chain and ignoring the existence of critical steps that may disproportionally impact reasoning outcome. In this paper, we propose T-STAR(Tree-structured Self-Taught Agent Rectification), a framework that recovers the latent correlated reward structure across seemingly independent trajectories. Specifically, we consolidate trajectories into a unified Cognitive Tree by identifying and merging functionally similar steps/nodes. It enables an Introspective Valuation mechanism that back-propagates trajectory-level rewards through the tree to obtain a new notion of variance-reduced relative advantage at step-level. Using the Cognitive Tree, we also develop In-Context Thought Grafting to synthesize corrective reasoning by contrasting successful and failed branches at critical divergence points/steps. Our proposed Surgical Policy Optimization then capitalizes on the rich policy gradient information concentrated at these critical points/steps through a Bradley-Terry type of surgical loss. Extensive experiments across embodied, interactive, reasoning, and planning benchmarks demonstrate that T-STAR achieves consistent improvements over strong baselines, with gains most pronounced on tasks requiring extended reasoning chains.
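The abstract says Introspective Valuation back-propagates trajectory-level rewards through the Cognitive Tree to get step-level advantages, but does not give the backup rule. A minimal sketch under the assumption of a simple mean backup over descendant leaves (node names and rewards are hypothetical):

```python
def node_value(tree, rewards, node):
    """Mean trajectory reward over all leaves below `node` (leaves hold rewards)."""
    kids = tree.get(node, [])
    if not kids:
        return rewards[node]
    vals = [node_value(tree, rewards, k) for k in kids]
    return sum(vals) / len(vals)

def step_advantage(tree, rewards, parent, child):
    """Credit for the step parent -> child: change in expected reward."""
    return node_value(tree, rewards, child) - node_value(tree, rewards, parent)

# Hypothetical merged tree: a shared first step "s0" diverges into a
# successful plan ("good", two rollouts) and a failed one ("bad").
tree = {"s0": ["good", "bad"], "good": ["g1", "g2"], "bad": ["b1"]}
rewards = {"g1": 1.0, "g2": 1.0, "b1": 0.0}
print(step_advantage(tree, rewards, "s0", "good"))  # positive: critical good step
print(step_advantage(tree, rewards, "s0", "bad"))   # negative: divergence point
```

Merging functionally similar steps is what makes the two branches comparable at "s0", the critical divergence point the grafting mechanism contrasts.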
【2】Holistic Optimal Label Selection for Robust Prompt Learning under Partial Labels
标题:部分标签下鲁棒提示学习的整体最优标签选择
链接:https://arxiv.org/abs/2604.06614
作者:Yaqi Zhao,Haoliang Sun,Yating Wang,Yongshun Gong,Yilong Yin
摘要:Prompt learning has gained significant attention as a parameter-efficient approach for adapting large pre-trained vision-language models to downstream tasks. However, when only partial labels are available, its performance is often limited by label ambiguity and insufficient supervisory information. To address this issue, we propose Holistic Optimal Label Selection (HopS), leveraging the generalization ability of pre-trained feature encoders through two complementary strategies. First, we design a local density-based filter that selects the top frequent labels from the nearest neighbors' candidate sets and uses the softmax scores to identify the most plausible label, capturing structural regularities in the feature space. Second, we introduce a global selection objective based on optimal transport that maps the uniform sampling distribution to the candidate label distributions across a batch. By minimizing the expected transport cost, it can determine the most likely label assignments. These two strategies work together to provide robust label selection from both local and global perspectives. Extensive experiments on eight benchmark datasets show that HopS consistently improves performance under partial supervision and outperforms all baselines. Those results highlight the merit of holistic label selection and offer a practical solution for prompt learning in weakly supervised settings.
【3】Optimal Rates for Pure $\varepsilon$-Differentially Private Stochastic Convex Optimization with Heavy Tails
链接:https://arxiv.org/abs/2604.06492
作者:Andrew Lowy
摘要:We study stochastic convex optimization (SCO) with heavy-tailed gradients under pure epsilon-differential privacy (DP). Instead of assuming a bound on the worst-case Lipschitz parameter of the loss, we assume only a bounded k-th moment. This assumption allows for unbounded, heavy-tailed stochastic gradient distributions, and can yield sharper excess risk bounds. The minimax optimal rate for approximate (epsilon, delta)-DP SCO is known in this setting, but the pure epsilon-DP case has remained open. We characterize the minimax optimal excess-risk rate for pure epsilon-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in polynomial time with probability 1 when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes - including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes - we achieve the same excess-risk guarantee in polynomial time with probability 1 even when the worst-case Lipschitz parameter is infinite. Our approach is based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss. We complement our excess risk upper bound with a novel high probability lower bound.
【4】Discrete Flow Matching Policy Optimization
标题:离散流匹配策略优化
链接:https://arxiv.org/abs/2604.06491
作者:Maojiang Su,Po-Chung Hsieh,Weimin Wu,Mingcheng Lu,Jiunhau Chen,Jerry Yao-Chieh Hu,Han Liu
摘要:We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforcement Learning (RL) fine-tuning Discrete Flow Matching (DFM) models under a broad class of policy gradient methods. Our key idea is to view the DFM sampling procedure as a multi-step Markov Decision Process. This perspective provides a simple and transparent reformulation of fine-tuning reward maximization as a robust RL objective. Consequently, it not only preserves the original DFM samplers but also avoids biased auxiliary estimators and likelihood surrogates used by many prior RL fine-tuning methods. To prevent policy collapse, we also introduce new total-variation regularizers to keep the fine-tuned distribution close to the pretrained one. Theoretically, we establish an upper bound on the discretization error of DoMinO and tractable upper bounds for the regularizers. Experimentally, we evaluate DoMinO on regulatory DNA sequence design. DoMinO achieves stronger predicted enhancer activity and better sequence naturalness than the previous best reward-driven baselines. The regularization further improves alignment with the natural sequence distribution while preserving strong functional performance. These results establish DoMinO as a useful framework for controllable discrete sequence generation.
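DoMinO's regularizers bound the total-variation distance between the fine-tuned and pretrained distributions. As a minimal illustration of the quantity being bounded (the paper's contribution is tractable upper bounds, not this direct computation, and the distributions below are hypothetical):

```python
def total_variation(p, q):
    """TV distance between two discrete distributions on the same support:
    half the L1 distance between their probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

pretrained = [0.70, 0.20, 0.10]   # hypothetical next-token distribution
finetuned  = [0.50, 0.30, 0.20]
print(total_variation(pretrained, finetuned))  # drift to penalize
```

Adding a multiple of this distance to the RL objective penalizes the fine-tuned policy for drifting from the pretrained sampler, which is the collapse-prevention role described above.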
【5】Bi-Level Optimization for Single Domain Generalization
标题:面向单域泛化的双层优化
链接:https://arxiv.org/abs/2604.06349
作者:Marzi Heidari,Hanping Zhang,Hao Yan,Yuhong Guo
备注:CVPR Findings Track, 2026
摘要:Generalizing from a single labeled source domain to unseen target domains, without access to any target data during training, remains a fundamental challenge in robust machine learning. We address this underexplored setting, known as Single Domain Generalization (SDG), by proposing BiSDG, a bi-level optimization framework that explicitly decouples task learning from domain modeling. BiSDG simulates distribution shifts through surrogate domains constructed via label-preserving transformations of the source data. To capture domain-specific context, we propose a domain prompt encoder that generates lightweight modulation signals to produce augmenting features via feature-wise linear modulation. The learning process is formulated as a bi-level optimization problem: the inner objective optimizes task performance under fixed prompts, while the outer objective maximizes generalization across the surrogate domains by updating the domain prompt encoder. We further develop a practical gradient approximation scheme that enables efficient bi-level training without second-order derivatives. Extensive experiments on various SDG benchmarks demonstrate that BiSDG consistently outperforms prior methods, setting new state-of-the-art performance in the SDG setting.
【6】Stochastic Auto-conditioned Fast Gradient Methods with Optimal Rates
标题:具有最优速率的随机自条件快速梯度方法
链接:https://arxiv.org/abs/2604.06525
作者:Yao Ji,Guanghui Lan
摘要:Achieving optimal rates for stochastic composite convex optimization without prior knowledge of problem parameters remains a central challenge. In the deterministic setting, the auto-conditioned fast gradient method has recently been proposed to attain optimal accelerated rates without line-search procedures or prior knowledge of the Lipschitz smoothness constant, providing a natural prototype for parameter-free acceleration. However, extending this approach to the stochastic setting has proven technically challenging and remains open. Existing parameter-free stochastic methods either fail to achieve accelerated rates or rely on restrictive assumptions, such as bounded domains, bounded gradients, prior knowledge of the iteration horizon, or strictly sub-Gaussian noise. To address these limitations, we propose a stochastic variant of the auto-conditioned fast gradient method, referred to as stochastic AC-FGM. The proposed method is fully adaptive to the Lipschitz constant, the iteration horizon, and the noise level, enabling both adaptive stepsize selection and adaptive mini-batch sizing without line-search procedures. Under standard bounded conditional variance assumptions, we show that stochastic AC-FGM achieves the optimal iteration complexity of $O(1/\sqrt{\varepsilon})$ and the optimal sample complexity of $O(1/\varepsilon^2)$.
预测|估计(8篇)
【1】Beyond the Mean: Modelling Annotation Distributions in Continuous Affect Prediction
标题:超越平均值:连续情感预测中的注释分布建模
链接:https://arxiv.org/abs/2604.07198
作者:Kosmas Pinitas,Ilias Maglogiannis
备注:This paper has been accepted at the CVPR 2026 Workshop on Affective Behavior Analysis in-the-wild (ABAW)
摘要:Emotion annotation is inherently subjective and cognitively demanding, producing signals that reflect diverse perceptions across annotators rather than a single ground truth. In continuous affect prediction, this variability is typically collapsed into point estimates such as the mean or median, discarding valuable information about annotator disagreement and uncertainty. In this work, we propose a distribution-aware framework that models annotation consensus using the Beta distribution. Instead of predicting a single affect value, models estimate the mean and standard deviation of the annotation distribution, which are transformed into valid Beta parameters through moment matching. This formulation enables the recovery of higher-order distributional descriptors, including skewness, kurtosis, and quantiles, in closed form. As a result, the model captures not only the central tendency of emotional perception but also variability, asymmetry, and uncertainty in annotator responses. We evaluate the proposed approach on the SEWA and RECOLA datasets using multimodal features. Experimental results show that Beta-based modelling produces predictive distributions that closely match the empirical annotator distributions while achieving competitive performance with conventional regression approaches. These findings highlight the importance of modelling annotation uncertainty in affective computing and demonstrate the potential of distribution-aware learning for subjective signal analysis.
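The moment matching step described above is closed-form: given a predicted mean and standard deviation on (0, 1), the Beta shape parameters, and from them skewness, follow from the standard Beta moments (a sketch of the textbook formulas, not code from the paper; the input values are hypothetical):

```python
import math

def beta_from_moments(mean, std):
    """Moment-match Beta(alpha, beta): valid only when std**2 < mean*(1-mean)."""
    var = std ** 2
    assert 0.0 < mean < 1.0 and var < mean * (1.0 - mean)
    nu = mean * (1.0 - mean) / var - 1.0   # total concentration alpha + beta
    return mean * nu, (1.0 - mean) * nu

def beta_skewness(a, b):
    """Closed-form skewness of Beta(a, b)."""
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))

a, b = beta_from_moments(0.6, 0.15)  # hypothetical annotator mean and spread
print(a, b)                          # recovered shape parameters
print(a / (a + b))                   # Beta mean reproduces the input mean
print(beta_skewness(a, b))           # negative: mass skews toward high affect
```

The same (a, b) pair gives kurtosis and quantiles in closed form, which is what lets the model report full annotator distributions rather than point estimates.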
【2】Frailty Estimation in Elderly Oncology Patients Using Multimodal Wearable Data and Multi-Instance Learning
标题:使用多模式可穿戴数据和多实例学习估计老年肿瘤患者的脆弱性
链接:https://arxiv.org/abs/2604.06985
作者:Ioannis Kyprakis,Vasileios Skaramagkas,Georgia Karanasiou,Lampros Lakkas,Andri Papakonstantinou,Domen Ribnikar,Kalliopi Keramida,Dorothea Tsekoura,Ketti Mazzocco,Anastasia Constantinidou,Konstantinos Marias,Dimitrios I. Fotiadis,Manolis Tsiknakis
备注:7 pages, 1 figure, under review for IEEE EMBC 2026
摘要:Frailty and functional decline strongly influence treatment tolerance and outcomes in older patients with cancer, yet assessment is typically limited to infrequent clinic visits. We propose a multimodal wearable framework to estimate frailty-related functional change between visits in elderly breast cancer patients enrolled in the multicenter CARDIOCARE study. Free-living smartwatch physical activity and sleep features are combined with ECG-derived heart rate variability (HRV) features from a chest strap and organized into patient-horizon bags aligned to month 3 (M3) and month 6 (M6) follow-ups. Our innovation is an attention-based multiple instance learning (MIL) formulation that fuses irregular, multimodal wearable instances under real-world missingness and weak supervision. An attention-based MIL model with modality-specific multilayer perceptron (MLP) encoders with embedding dimension 128 aggregates variable-length and partially missing longitudinal instances to predict discretized change-from-baseline classes (worsened, stable, improved) for FACIT-F and handgrip strength. Under subject-independent leave-one-subject-out (LOSO) evaluation, the full multimodal model achieved balanced accuracy/F1 of 0.68 +/- 0.08/0.67 +/- 0.09 at M3 and 0.70 +/- 0.10/0.69 +/- 0.08 at M6 for handgrip, and 0.59 +/- 0.04/0.58 +/- 0.06 at M3 and 0.64 +/- 0.05/0.63 +/- 0.07 at M6 for FACIT-F. Ablation results indicated that smartwatch activity and sleep provide the strongest predictive information for frailty-related functional changes, while HRV contributes complementary information when fused with smartwatch streams.
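Attention-based MIL aggregates a variable-length, partially missing bag of instances into one bag embedding by softmax-weighting the instances. The sketch below shows only the pooling step; in the paper the scores come from a learned attention network over MLP-encoded instances, whereas here they are hypothetical:

```python
import math

def attention_mil_pool(instances, scores):
    """Softmax the per-instance scores, then return the weighted sum
    of instance embeddings plus the attention weights themselves."""
    m = max(scores)                      # stabilize the softmax
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    w = [v / z for v in e]
    dim = len(instances[0])
    bag = [sum(w[i] * instances[i][d] for i in range(len(instances)))
           for d in range(dim)]
    return bag, w

# Hypothetical bag: three wearable-day embeddings of dimension 2.
inst = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bag, w = attention_mil_pool(inst, scores=[2.0, 0.0, 0.0])
print(w)     # first instance dominates the bag
print(bag)
```

Because the pool is a weighted mean, bags of any length (and with missing modalities dropped) map to a fixed-size embedding, which is what makes the weak, visit-level supervision workable.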
【3】When Does Context Help? A Systematic Study of Target-Conditional Molecular Property Prediction
标题:上下文什么时候有帮助?目标条件分子性质预测的系统研究
链接:https://arxiv.org/abs/2604.06558
作者:Bryan Cheng,Jasper Zhang
备注:9 pages, 5 figures. Accepted at Workshop on AI for Accelerated Materials Design and Foundation Models for Science: Real-World Impact and Science-First Design at ICLR 2026
摘要:We present the first systematic study of when target context helps molecular property prediction, evaluating context conditioning across 10 diverse protein families, 4 fusion architectures, data regimes spanning 67-9,409 training compounds, and both temporal and random evaluation splits. Using NestDrug, a FiLM-based architecture that conditions molecular representations on target identity, we characterize both success and failure modes with three principal findings. First, fusion architecture dominates: FiLM outperforms concatenation by 24.2 percentage points and additive conditioning by 8.6 pp; how you incorporate context matters more than whether you include it. Second, context enables otherwise impossible predictions: on data-scarce CYP3A4 (67 training compounds), multi-task transfer achieves 0.686 AUC where per-target Random Forest collapses to 0.238. Third, context can systematically hurt: distribution mismatch causes 10.2 pp degradation on BACE1; few-shot adaptation consistently underperforms zero-shot. Beyond methodology, we expose fundamental flaws in standard benchmarking: 1-nearest-neighbor Tanimoto achieves 0.991 AUC on DUD-E without any learning, and 50% of actives leak from training data, rendering absolute performance metrics meaningless. Our temporal split evaluation (train up to 2020, test 2021-2024) achieves stable 0.843 AUC with no degradation, providing the first rigorous evidence that context-conditional molecular representations generalize to future chemical space.
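The finding that FiLM dominates concatenation turns on a very small operation: FiLM scales and shifts each feature with context-derived parameters. A minimal sketch of that modulation (the embedding and the gamma/beta values are hypothetical; in NestDrug they would come from a target-context encoder):

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: elementwise g * h + b,
    where gamma and beta are produced from the conditioning context."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]

mol   = [0.2, -1.0, 0.7]   # hypothetical molecular embedding
gamma = [1.5,  0.0, 1.0]   # hypothetical target-conditioned scales
beta  = [0.0,  0.3, -0.2]  # hypothetical target-conditioned shifts
print(film(mol, gamma, beta))
```

Note that gamma can gate a feature off entirely (the second coordinate above), which concatenation cannot express without extra depth; this multiplicative interaction is one plausible reading of the 24.2 pp gap.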
【4】MICA: Multivariate Infini Compressive Attention for Time Series Forecasting
标题:MICA:面向时间序列预测的多元Infini压缩注意力
链接:https://arxiv.org/abs/2604.06473
作者:Willa Potosnak,Nina Żukowska,Michał Wiliński,Dan Howarth,Ignacy Stępka,Mononito Goswami,Artur Dubrawski
摘要:Multivariate forecasting with Transformers faces a core scalability challenge: modeling cross-channel dependencies via attention compounds attention's quadratic sequence complexity with quadratic channel scaling, making full cross-channel attention impractical for high-dimensional time series. We propose Multivariate Infini Compressive Attention (MICA), an architectural design to extend channel-independent Transformers to channel-dependent forecasting. By adapting efficient attention techniques from the sequence dimension to the channel dimension, MICA adds a cross-channel attention mechanism to channel-independent backbones that scales linearly with channel count and context length. We evaluate channel-independent Transformer architectures with and without MICA across multiple forecasting benchmarks. MICA reduces forecast error over its channel-independent counterparts by 5.4% on average and up to 25.4% on individual datasets, highlighting the importance of explicit cross-channel modeling. Moreover, models with MICA rank first among deep multivariate Transformer and MLP baselines. MICA models also scale more efficiently with respect to both channel count and context length than Transformer baselines that compute attention across both the temporal and channel dimensions, establishing compressive attention as a practical solution for scalable multivariate forecasting.
【5】Weighted Bayesian Conformal Prediction
标题:加权贝氏保形预测
链接:https://arxiv.org/abs/2604.06464
作者:Xiayin Lou,Peng Luo
摘要:Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell & Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption -- a limitation the authors themselves identify. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\mathrm{Dir}(1,\ldots,1)$ with a weighted Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$, where $n_{\mathrm{eff}}$ is Kish's effective sample size. We prove four theoretical results: (1) $n_{\mathrm{eff}}$ is the unique concentration parameter matching frequentist and Bayesian variances; (2) posterior standard deviation decays as $O(1/\sqrt{n_{\mathrm{eff}}})$; (3) BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4) the HPD threshold provides $O(1/\sqrt{n_{\mathrm{eff}}})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.
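Both ingredients of the weighted Dirichlet have closed forms: Kish's effective sample size and the concentration vector built from normalized weights. A sketch with made-up weights, showing that uniform weights recover BQ-CP's uniform Dirichlet prior:

```python
def kish_neff(weights):
    """Kish's effective sample size: (sum w)^2 / (sum w^2)."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

def weighted_dirichlet_params(weights):
    """Concentration vector n_eff * w_tilde, with w_tilde the
    weights normalized to sum to one."""
    neff = kish_neff(weights)
    total = sum(weights)
    return [neff * w / total for w in weights]

print(kish_neff([1.0] * 8))                   # uniform weights: n_eff == n == 8
print(kish_neff([4.0, 1.0, 1.0, 1.0]))        # skewed weights shrink n_eff
print(weighted_dirichlet_params([1.0] * 4))   # recovers Dir(1, 1, 1, 1)
```

The shrinkage of n_eff under skewed weights is exactly what widens the posterior over thresholds, matching the O(1/sqrt(n_eff)) decay stated in the abstract.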
【6】Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times
标题:减少非生产性集装箱移动:预测服务需求和停留时间
链接:https://arxiv.org/abs/2604.06251
作者:Elena Villalobos,Adolfo De Unánue T.,Fernanda Sobrino,David Aké,Stephany Cisneros,Jorge Lecona,Alejandra Matadamaz
备注:Preprint, 20 pages, 9 figures, 5 tables (including appendices)
摘要:This article presents the results of a data science study conducted at a container terminal, aimed at reducing unproductive container moves through the prediction of service requirements and container dwell times. We develop and evaluate machine learning models that leverage historical operational data to anticipate which containers will require pre-clearance handling services prior to cargo release and to estimate how long they are expected to remain in the terminal. As part of the data preparation process, we implement a classification system for cargo descriptions and perform deduplication of consignee records to improve data consistency and feature quality. These predictive capabilities provide valuable inputs for strategic planning and resource allocation in yard operations. Across multiple temporal validation periods, the proposed models consistently outperform existing rule-based heuristics and random baselines in precision and recall. These results demonstrate the practical value of predictive analytics for improving operational efficiency and supporting data-driven decision-making in container terminal logistics.
【7】A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
标题:基于新型孟加拉国市场价格数据集的农业商品价格预测经典和深度学习模型基准
链接:https://arxiv.org/abs/2604.06227
作者:Tashreef Muhammad,Tahsin Ahmed,Meherun Farzana,Md. Mahmudul Hasan,Abrar Eyasir,Md. Emon Khan,Mahafuzul Islam Shawon,Ferdous Mondol,Mahmudul Hasan,Muhammad Ibrahim
备注:26 pages, 22 figures, 7 tables
摘要:Accurate short-term forecasting of agricultural commodity prices is critical for food security planning and smallholder income stabilisation in developing economies, yet machine-learning-ready datasets for this purpose remain scarce in South Asia. This paper makes two contributions. First, we introduce AgriPriceBD, a benchmark dataset of 1,779 daily retail mid-prices for five Bangladeshi commodities - garlic, chickpea, green chilli, cucumber, and sweet pumpkin - spanning July 2020 to June 2025, extracted from government reports via an LLM-assisted digitisation pipeline. Second, we evaluate seven forecasting approaches spanning classical models - naïve persistence, SARIMA, and Prophet - and deep learning architectures - BiLSTM, Transformer, Time2Vec-enhanced Transformer, and Informer - with Diebold-Mariano statistical significance tests. Commodity price forecastability is fundamentally heterogeneous: naïve persistence dominates on near-random-walk commodities. Time2Vec temporal encoding provides no statistically significant advantage over fixed sinusoidal encoding and causes catastrophic degradation on green chilli (+146.1% MAE, p<0.001). Prophet fails systematically, attributable to discrete step-function price dynamics incompatible with its smooth decomposition assumptions. Informer produces erratic predictions (variance up to 50x ground-truth), confirming sparse-attention Transformers require substantially larger training sets than small agricultural datasets provide. All code, models, and data are released publicly to support replication and future forecasting research on agricultural commodity markets in Bangladesh and similar developing economies.
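The Time2Vec encoding compared above has a published closed form: one linear component plus sine components with learned frequencies and phases. A sketch with hypothetical parameters (in the Transformer these omega/phi vectors would be learned, not fixed):

```python
import math

def time2vec(tau, omega, phi):
    """Time2Vec of a scalar timestamp: index 0 is linear, the rest periodic."""
    out = [omega[0] * tau + phi[0]]
    out += [math.sin(omega[i] * tau + phi[i]) for i in range(1, len(omega))]
    return out

# Hypothetical learned parameters for an embedding of dimension 4.
omega = [0.5, 1.0, 2.0, 4.0]
phi   = [0.0, 0.0, math.pi / 2, 0.0]
print(time2vec(3.0, omega, phi))  # linear trend plus three periodic features
```

Replacing omega and phi with fixed sinusoidal-encoding constants gives the baseline the paper finds statistically indistinguishable from the learned version.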
【8】Learning Debt and Cost-Sensitive Bayesian Retraining: A Forecasting Operations Framework
标题:学习债务和成本敏感的Bayesian再训练:预测运营框架
链接:https://arxiv.org/abs/2604.06438
作者:Harrison Katz
摘要:Forecasters often choose retraining schedules by convention rather than by an explicit decision rule. This paper gives that decision a posterior-space language. We define learning debt as the divergence between the deployed and continuously updated posteriors, define actionable staleness as the policy-relevant latent state, and derive a one-step Bayes retraining rule under an excess-loss formulation. In an online conjugate simulation using the exact Kullback-Leibler divergence between deployed and shadow normal-inverse-gamma posteriors, a debt-filter beats a default 10-period calendar baseline in 15 of 24 abrupt-shift cells, all 24 gradual-drift cells, and 17 of 24 variance-shift cells, and remains below the best fixed cadence in a grid of cadences (5, 10, 20, and 40 periods) in 10, 24, and 17 cells, respectively. Fixed-threshold CUSUM remains a strong benchmark, while a proxy filter built from indirect diagnostics performs poorly. A retrospective Airbnb production backtest shows how the same decision logic behaves around a known payment-policy shock.
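Learning debt is the KL divergence between the deployed posterior and a continuously updated shadow posterior. The paper uses the exact KL between normal-inverse-gamma posteriors; as a simplified stand-in, the closed-form KL between univariate normals below illustrates the debt signal a filter would threshold (the posterior parameters are hypothetical):

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL(N(mu1, s1^2) || N(mu2, s2^2)) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

deployed = (0.0, 1.0)   # posterior frozen at the last retrain
shadow   = (0.8, 1.2)   # posterior under continuous updating
debt = kl_normal(*shadow, *deployed)
print(debt)             # a debt filter retrains when this crosses a threshold
```

A zero debt (identical posteriors) means retraining buys nothing; the one-step Bayes rule in the paper weighs this divergence against an explicit retraining cost.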
其他神经网络|深度学习|模型|建模(30篇)
【1】Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
标题:Personalized RewardBench:通过与人类对齐的个性化来评估奖励模型
链接:https://arxiv.org/abs/2604.07343
作者:Qiyao Ma,Dechen Gao,Rui Cai,Boqi Zhao,Hanchu Zhou,Junshan Zhang,Zhe Zhao
摘要:Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. While benchmarks for general response quality are prevalent, evaluating how well reward models account for individual user preferences remains an open challenge. To bridge this gap, we introduce Personalized RewardBench, a novel benchmark designed to rigorously assess reward models' capacity to model personalized preferences. We construct chosen and rejected response pairs based on strict adherence to (or violation of) user-specific rubrics, ensuring that preference distinctions are uniquely tailored to the individual. In particular, human evaluations confirm that the primary discriminative factor between pairs is strictly personal preference, with both responses maintaining high general quality (e.g., correctness, relevance and helpfulness). Extensive testing reveals that existing state-of-the-art reward models struggle significantly with personalization, peaking at an accuracy of just 75.94%. Crucially, because an effective reward model benchmark should predict a reward model's performance on downstream tasks, we conduct experiments demonstrating that our benchmark exhibits a significantly higher correlation with downstream performance in both Best-of-N (BoN) sampling and Proximal Policy Optimization (PPO) compared to existing baselines. These findings establish Personalized RewardBench as a robust and accurate proxy for evaluating reward models' performance in downstream applications.
【2】How to sketch a learning algorithm
标题:如何绘制学习算法
链接:https://arxiv.org/abs/2604.07328
作者:Sam Gunn
摘要:How does the choice of training data influence an AI model? This question is of central importance to interpretability, privacy, and basic science. At its core is the data deletion problem: after a reasonable amount of precomputation, quickly predict how the model would behave in a given situation if a given subset of training data had been excluded from the learning algorithm. We present a data deletion scheme capable of predicting model outputs with vanishing error $\varepsilon$ in the deep learning setting. Our precomputation and prediction algorithms are only $\mathrm{poly}(1/\varepsilon)$ factors slower than regular training and inference, respectively. The storage requirements are those of $\mathrm{poly}(1/\varepsilon)$ models. Our proof is based on an assumption that we call "stability." In contrast to the assumptions made by prior work, stability appears to be fully compatible with learning powerful AI models. In support of this, we show that stability is satisfied in a minimal set of experiments with microgpt. Our code is available at https://github.com/SamSpo1/microgpt-sketch. At a technical level, our work is based on a new method for locally sketching an arithmetic circuit by computing higher-order derivatives in random complex directions. Forward-mode automatic differentiation allows cheap computation of these derivatives.
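The sketching method above leans on forward-mode automatic differentiation for cheap directional derivatives. A first-order dual-number sketch shows the mechanism (the paper computes higher-order derivatives in random complex directions; this is only the basic forward-mode idea, not the authors' code):

```python
class Dual:
    """Dual number a + b*eps with eps^2 = 0: carrying b through arithmetic
    computes the derivative of the computation alongside its value."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x   # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

y = f(Dual(3.0, 1.0))          # seed derivative 1 in the chosen direction
print(y.val, y.dot)            # 33.0 29.0
```

One forward pass yields one directional derivative at roughly the cost of evaluation, which is why derivatives in a handful of random directions stay cheap.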
【3】SL-FAC: A Communication-Efficient Split Learning Framework with Frequency-Aware Compression
标题:SL-FAC:具有频率感知压缩的通信高效分离学习框架
链接:https://arxiv.org/abs/2604.07316
作者:Zehang Lin,Miao Yang,Haihan Zhu,Zheng Lin,Jianhao Huang,Jing Yang,Guangjin Pan,Dianxin Luan,Zihan Fang,Shunzhi Zhu,Wei Ni,John Thompson
备注:6 pages, 4 figures
摘要:The growing complexity of neural networks hinders the deployment of distributed machine learning on resource-constrained devices. Split learning (SL) offers a promising solution by partitioning the large model and offloading the primary training workload from edge devices to an edge server. However, the increasing number of participating devices and model complexity leads to significant communication overhead from the transmission of smashed data (e.g., activations and gradients), which constitutes a critical bottleneck for SL. To tackle this challenge, we propose SL-FAC, a communication-efficient SL framework comprising two key components: adaptive frequency decomposition (AFD) and frequency-based quantization compression (FQC). AFD first transforms the smashed data into the frequency domain and decomposes it into spectral components with distinct information. FQC then applies customized quantization bit widths to each component based on its spectral energy distribution. This collaborative approach enables SL-FAC to achieve significant communication reduction while strategically preserving the information most crucial for model convergence. Extensive experiments confirm the superior performance of SL-FAC for improving the training efficiency.
【4】Are Face Embeddings Compatible Across Deep Neural Network Models?
标题:面部嵌入在深度神经网络模型中是否兼容?
链接:https://arxiv.org/abs/2604.07282
作者:Fizza Rubab,Yiying Tong,Arun Ross
摘要:Automated face recognition has made rapid strides over the past decade due to the unprecedented rise of deep neural network (DNN) models that can be trained for domain-specific tasks. At the same time, foundation models that are pretrained on broad vision or vision-language tasks have shown impressive generalization across diverse domains, including biometrics. This raises an important question: Do different DNN models--both domain-specific and foundation models--encode facial identity in similar ways, despite being trained on different datasets, loss functions, and architectures? In this regard, we directly analyze the geometric structure of embedding spaces imputed by different DNN models. Treating embeddings of face images as point clouds, we study whether simple affine transformations can align face representations of one model with another. Our findings reveal surprising cross-model compatibility: low-capacity linear mappings substantially improve cross-model face recognition over unaligned baselines for both face identification and verification tasks. Alignment patterns generalize across datasets and vary systematically across model families, indicating representational convergence in facial identity encoding. These findings have implications for model interoperability, ensemble design, and biometric template security.
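The alignment the abstract describes fits low-capacity linear or affine maps between whole embedding spaces. As a toy reduction of that idea, the closed-form least-squares affine fit for a single embedding coordinate (full alignment would fit a matrix across all dimensions; the paired values below are synthetic):

```python
def fit_affine_1d(x, y):
    """Closed-form least squares for y ≈ a*x + b in one dimension."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a = cov / var
    return a, my - a * mx

# Hypothetical paired coordinates of the same faces under two models;
# here model B's coordinate is exactly 2 * (model A) + 0.5.
src = [0.1, 0.4, 0.9, 1.3]
dst = [0.7, 1.3, 2.3, 3.1]
a, b = fit_affine_1d(src, dst)
print(a, b)
```

If such simple maps recover one model's embeddings from another's, as the paper reports, the two models encode facial identity in geometrically compatible ways.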
【5】Weaves, Wires, and Morphisms: Formalizing and Implementing the Algebra of Deep Learning
标题:编织、连线与态射:深度学习代数的形式化与实现
链接:https://arxiv.org/abs/2604.07242
作者:Vincent Abbott,Gioele Zardini
摘要:Despite deep learning models running well-defined mathematical functions, we lack a formal mathematical framework for describing model architectures. Ad-hoc notation, diagrams, and pseudocode poorly handle nonlinear broadcasting and the relationship between individual components and composed models. This paper introduces a categorical framework for deep learning models that formalizes broadcasting through the novel axis-stride and array-broadcasted categories. This allows the mathematical function underlying architectures to be precisely expressed and manipulated in a compositional manner. These mathematical definitions are translated into human manageable diagrams and machine manageable data structures. We provide a mirrored implementation in Python (pyncd) and TypeScript (tsncd) to show the universal aspect of our framework, along with features including algebraic construction, graph conversion, PyTorch compilation and diagram rendering. This lays the foundation for a systematic, formal approach to deep learning model design and analysis.
【6】How Does Machine Learning Manage Complexity?
标题:机器学习如何管理复杂性?
链接:https://arxiv.org/abs/2604.07233
作者:Lance Fortnow
备注:16 pages, no figures
摘要:We provide a computational complexity lens to understand the power of machine learning models, particularly their ability to model complex systems. Machine learning models are often trained on data drawn from sampleable or more complex distributions, a far wider range of distributions than just computable ones. By focusing on computable distributions, machine learning models can better manage complexity via probability. We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution $μ$ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then $μ$ must be close to uniform.
【7】Information as Structural Alignment: A Dynamical Theory of Continual Learning
标题:信息作为结构对齐:持续学习的动态理论
链接:https://arxiv.org/abs/2604.07108
作者:Radu Negulescu
备注:31 pages, 8 figures
摘要:Catastrophic forgetting is not an engineering failure. It is a mathematical consequence of storing knowledge as global parameter superposition. Existing methods, such as regularization, replay, and frozen subnetworks, add external mechanisms to a shared-parameter substrate. None derives retention from the learning dynamics themselves. This paper introduces the Informational Buildup Framework (IBF), an alternative substrate for continual learning, based on the premise that information is the achievement of structural alignment rather than stored content. In IBF, two equations govern the dynamics: a Law of Motion that drives configuration toward higher coherence, and Modification Dynamics that persistently deform the coherence landscape in response to localized discrepancies. Memory, agency, and self-correction arise from these dynamics rather than being added as separate modules. We first demonstrate the full lifecycle in a transparent two-dimensional toy model, then validate across three domains: a controlled non-stationary world, chess evaluated independently by Stockfish, and Split-CIFAR-100 with a frozen ViT encoder. Across all three, IBF achieves replay-superior retention without storing raw data. We observe near-zero forgetting on CIFAR-100 (BT = -0.004), positive backward transfer in chess (+38.5 cp), and 43% less forgetting than replay in the controlled domain. In chess, the framework achieves a mean behavioral advantage of +88.9 +/- 2.8 cp under independent evaluation, exceeding MLP and replay baselines.
【8】Controller Design for Structured State-space Models via Contraction Theory
标题:基于收缩理论的结构化状态空间模型控制器设计
链接:https://arxiv.org/abs/2604.07069
作者:Muhammad Zakwan,Vaibhav Gupta,Alireza Karimi,Efe C. Balta,Giancarlo Ferrari-Trecate
备注:The first and second authors contributed equally. The paper has been accepted in 24th European Control Conference (ECC) in Reykjavik, Iceland, 2026
摘要:This paper presents an indirect data-driven output feedback controller synthesis for nonlinear systems, leveraging Structured State-space Models (SSMs) as surrogate models. SSMs have emerged as a compelling alternative in modelling time-series data and dynamical systems. They can capture long-term dependencies while maintaining linear computational complexity with respect to the sequence length, in comparison to the quadratic complexity of Transformer-based architectures. The contributions of this work are threefold. We provide the first analysis of controllability and observability of SSMs, which leads to scalable control design via Linear Matrix Inequalities (LMIs) that leverage contraction theory. Moreover, a separation principle for SSMs is established, enabling the independent design of observers and state-feedback controllers while preserving the exponential stability of the closed-loop system. The effectiveness of the proposed framework is demonstrated through a numerical example, showcasing nonlinear system identification and the synthesis of an output feedback controller.
【9】CAFP: A Post-Processing Framework for Group Fairness via Counterfactual Model Averaging
标题:CAFP:通过反事实模型平均实现群体公平的后处理框架
链接:https://arxiv.org/abs/2604.07009
作者:Irina Arévalo,Marcos Oliva
摘要:Ensuring fairness in machine learning predictions is a critical challenge, especially when models are deployed in sensitive domains such as credit scoring, healthcare, and criminal justice. While many fairness interventions rely on data preprocessing or algorithmic constraints during training, these approaches often require full control over the model architecture and access to protected attribute information, which may not be feasible in real-world systems. In this paper, we propose Counterfactual Averaging for Fair Predictions (CAFP), a model-agnostic post-processing method that mitigates unfair influence from protected attributes without retraining or modifying the original classifier. CAFP operates by generating counterfactual versions of each input in which the sensitive attribute is flipped, and then averaging the model's predictions across factual and counterfactual instances. We provide a theoretical analysis of CAFP, showing that it eliminates direct dependence on the protected attribute, reduces mutual information between predictions and sensitive attributes, and provably bounds the distortion introduced relative to the original model. Under mild assumptions, we further show that CAFP achieves perfect demographic parity and reduces the equalized odds gap by at least half the average counterfactual bias.
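The core CAFP mechanism, flipping the protected attribute and averaging the factual and counterfactual predictions, can be sketched in a few lines; the feature names and the toy biased classifier below are hypothetical.

```python
def cafp_predict(model, x, protected_key="sensitive"):
    # Generate the counterfactual input by flipping the binary protected
    # attribute, then average factual and counterfactual predictions.
    x_cf = dict(x)
    x_cf[protected_key] = 1 - x[protected_key]
    return 0.5 * (model(x) + model(x_cf))

# A toy classifier that leans directly on the protected attribute.
def biased(x):
    return min(1.0, 0.3 * x["income"] + 0.4 * x["sensitive"])

a = {"income": 1.0, "sensitive": 0}   # same features, two group labels
b = {"income": 1.0, "sensitive": 1}
score_a = cafp_predict(biased, a)
score_b = cafp_predict(biased, b)
```

Because the averaged score is symmetric in the protected attribute, the two otherwise-identical individuals receive the same prediction, which is the direct-dependence elimination the abstract proves.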
【10】A First Guess is Rarely the Final Answer: Learning to Search in the Travelling Salesperson Problem
标题:第一个猜测很少是最终答案:在旅行商问题中学习搜索
链接:https://arxiv.org/abs/2604.06940
作者:Andoni Irazusta Garmendia
摘要:Most neural solvers for the Traveling Salesperson Problem (TSP) are trained to output a single solution, even though practitioners rarely stop there: at test time, they routinely spend extra compute on sampling or post-hoc search. This raises a natural question: can the search procedure itself be learned? Neural improvement methods take this perspective by learning a policy that applies local modifications to a candidate solution, accumulating gains over an improvement trajectory. Yet learned improvement for TSP remains comparatively immature, with existing methods still falling short of robust, scalable performance. We argue that a key reason is design mismatch: many approaches reuse state representations, architectural choices, and training recipes inherited from single-solution methods, rather than being built around the mechanics of local search. This mismatch motivates NICO-TSP (Neural Improvement for Combinatorial Optimization): a 2-opt improvement framework for TSP. NICO-TSP represents the current tour with exactly $n$ edge tokens aligned with the neighborhood operator, scores 2-opt moves directly without tour positional encodings, and trains via a two-stage procedure: imitation learning to short-horizon optimal trajectories, followed by critic-free group-based reinforcement learning over longer rollouts. Under compute-matched evaluations that measure improvement as a function of both search steps and wall-clock time, NICO-TSP delivers consistently stronger and markedly more step-efficient improvement than prior learned and heuristic search baselines, generalizes far more reliably to larger out-of-distribution instances, and serves both as a competitive replacement for classical local search and as a powerful test-time refinement module for constructive solvers.
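For context, the 2-opt neighborhood operator that NICO-TSP learns to score is the classical segment-reversal move; a minimal greedy 2-opt descent (the heuristic baseline, not the learned policy) looks like this:

```python
import itertools
import math

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt_move(tour, i, k):
    # The basic local move: reverse the segment tour[i..k].
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

def two_opt(tour, pts):
    # Greedy first-improvement descent over the 2-opt neighborhood.
    improved = True
    while improved:
        improved = False
        for i, k in itertools.combinations(range(1, len(tour)), 2):
            cand = two_opt_move(tour, i, k)
            if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                tour, improved = cand, True
    return tour

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
start = [0, 1, 2, 3]                  # self-crossing tour, length 2 + 2*sqrt(2)
best = two_opt(start, pts)            # uncrossed square tour, length 4
```

NICO-TSP replaces the exhaustive scan over moves with a learned policy that scores 2-opt moves directly, accumulating gains over an improvement trajectory.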
【11】VertAX: a differentiable vertex model for learning epithelial tissue mechanics
标题:VertAX:一种用于上皮组织力学学习的可微顶点模型
链接:https://arxiv.org/abs/2604.06896
作者:Alessandro Pasqui,Jim Martin Catacora Ocana,Anshuman Sinha,Matthieu Perez,Fabrice Delbary,Giorgio Gosti,Mattia Miotto,Domenico Caudo,Maxence Ernoult,Hervé Turlier
备注:28 pages, 4 figures
摘要:Epithelial tissues dynamically reshape through local mechanical interactions among cells, a process well captured by vertex models. Yet their many tunable parameters make inference and optimization challenging, motivating computational frameworks that flexibly model and learn tissue mechanics. We introduce VertAX, a differentiable JAX-based framework for vertex-modeling of confluent epithelia. VertAX provides automatic differentiation, GPU acceleration, and end-to-end bilevel optimization for forward simulation, parameter inference, and inverse mechanical design. Users can define arbitrary energy and cost functions in pure Python, enabling seamless integration with machine-learning pipelines. We demonstrate VertAX on three representative tasks: (i) forward modeling of tissue morphogenesis, (ii) mechanical parameter inference, and (iii) inverse design of tissue-scale behaviors. We benchmark three differentiation strategies-automatic differentiation, implicit differentiation, and equilibrium propagation-showing that the latter can approximate gradients using repeated forward, adjoint-free simulations alone, offering a simple route for extending inverse biophysical problems to non-differentiable simulators with limited additional engineering effort.
【12】Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models
标题:能量正则化空间掩蔽:增强视觉模型稳健性和可解释性的新方法
链接:https://arxiv.org/abs/2604.06893
作者:Tom Devynck,Bilal Faye,Djamel Bouchaffra,Nadjib Lazaar,Hanane Azzag,Mustapha Lebbah
摘要:Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, yet this brute-force strategy introduces significant computational redundancy and encourages reliance on spurious background correlations. As a result, modern vision models remain brittle and difficult to interpret. We propose Energy-Regularized Spatial Masking (ERSM), a novel framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight Energy-Mask Layer inside standard convolutional backbones, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic Unary importance cost and a Pairwise spatial coherence penalty. Unlike prior pruning methods that enforce rigid sparsity budgets or rely on heuristic importance scores, ERSM allows the network to autonomously discover an optimal information-density equilibrium tailored to each input. We validate ERSM on convolutional architectures and demonstrate that it produces emergent sparsity, improved robustness to structured occlusion, and highly interpretable spatial masks, while preserving classification accuracy. Furthermore, we show that the learned energy ranking significantly outperforms magnitude-based pruning in deletion-based robustness tests, revealing ERSM as an intrinsic denoising mechanism that isolates semantic object regions without pixel-level supervision.
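The two competing forces in the per-token energy can be caricatured on a toy feature map: a unary term that makes strong activations cheap to keep, and a pairwise term that penalizes disagreement with spatial neighbours. The weights and the 4-neighbour coherence term are illustrative assumptions, not the paper's Energy-Mask Layer.

```python
import numpy as np

def ersm_energy(feat, alpha=1.0, beta=0.5):
    # Unary force: strong activations carry low intrinsic cost.
    unary = -np.abs(feat)
    # Pairwise force: penalize disagreement with the 4-neighbour mean.
    pad = np.pad(feat, 1, mode="edge")
    neigh = (pad[:-2, 1:-1] + pad[2:, 1:-1]
             + pad[1:-1, :-2] + pad[1:-1, 2:]) / 4
    pairwise = (feat - neigh) ** 2
    return alpha * unary + beta * pairwise

feat = np.zeros((6, 6))
feat[2:4, 2:4] = 5.0                       # a coherent "object" blob
energy = ersm_energy(feat)
mask = energy < np.median(energy)          # keep low-energy tokens only
```

Minimizing such an energy selects the coherent, strongly activated region and discards background, which is the emergent sparsity and interpretable masking the abstract describes.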
【13】Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach
标题:在偏好学习中解释神经网络:事后归纳逻辑编程方法
链接:https://arxiv.org/abs/2604.06838
作者:Daniele Fossemò,Filippo Mignosi,Giuseppe Placidi,Luca Raggioli,Matteo Spezialetti,Fabio Aurelio D'Asaro
备注:Under consideration for publication in Theory and Practice of Logic Programming (TPLP)
摘要:In this paper, we propose using Learning from Answer Sets to approximate black-box models, such as Neural Networks (NN), in the specific case of learning user preferences. We specifically explore the use of ILASP (Inductive Learning of Answer Set Programs) to approximate preference learning systems through weak constraints. We have created a dataset on user preferences over a set of recipes, which is used to train the NNs that we aim to approximate with ILASP. Our experiments investigate ILASP both as a global and a local approximator of the NNs. These experiments address the challenge of approximating NNs working on increasingly high-dimensional feature spaces while achieving appropriate fidelity on the target model and limiting the increase in computational time. To handle this challenge, we propose a preprocessing step that exploits Principal Component Analysis to reduce the dataset's dimensionality while keeping our explanations transparent. Under consideration for publication in Theory and Practice of Logic Programming (TPLP).
【14】OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale
标题:OmniTabBench:绘制GBDT、神经网络和大规模表格数据基础模型的经验前沿
链接:https://arxiv.org/abs/2604.06814
作者:Dihong Jiang,Ruoqi Cao,Zhiyuan Dang,Li Huang,Qingsong Zhang,Zhiyu Wang,Shihao Piao,Shenggao Zhu,Jianlong Chang,Zhouchen Lin,Qi Tian
摘要:While traditional tree-based ensemble methods have long dominated tabular tasks, deep neural networks and emerging foundation models have challenged this primacy, yet no consensus exists on a universally superior paradigm. Existing benchmarks typically contain fewer than 100 datasets, raising concerns about evaluation sufficiency and potential selection biases. To address these limitations, we introduce OmniTabBench, the largest tabular benchmark to date, comprising 3030 datasets spanning diverse tasks that are comprehensively collected from diverse sources and categorized by industry using large language models. We conduct an unprecedented large-scale empirical evaluation of state-of-the-art models from all model families on OmniTabBench, confirming the absence of a dominant winner. Furthermore, through a decoupled metafeature analysis, which examines individual properties such as dataset size, feature types, feature and target skewness/kurtosis, we elucidate conditions favoring specific model categories, providing clearer, more actionable guidance than prior compound-metric studies.
【15】Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension
标题:非线性泛函的稀疏感知神经网络:缓解对维度的指数依赖性
链接:https://arxiv.org/abs/2604.06774
作者:Jianfei Li,Shuo Huang,Han Feng,Ding-Xuan Zhou,Gitta Kutyniok
摘要:Deep neural networks have emerged as powerful tools for learning operators defined over infinite-dimensional function spaces. However, existing theories frequently encounter difficulties related to dimensionality and limited interpretability. This work investigates how sparsity can help address these challenges in functional learning, a central ingredient in operator learning. We propose a framework that employs convolutional architectures to extract sparse features from a finite number of samples, together with deep fully connected networks to effectively approximate nonlinear functionals. Using universal discretization methods, we show that sparse approximators enable stable recovery from discrete samples. In addition, both the deterministic and the random sampling schemes are sufficient for our analysis. These findings lead to improved approximation rates and reduced sample sizes in various function spaces, including those with fast frequency decay and mixed smoothness. They also provide new theoretical insights into how sparsity can alleviate the curse of dimensionality in functional learning.
【16】The Rhetoric of Machine Learning
标题:机器学习的修辞
链接:https://arxiv.org/abs/2604.06754
作者:Robert C. Williamson
备注:25 pages. Text of a talk given at AlphaPersuade 2.0, 26 March 2026
摘要:I examine the technology of machine learning from the perspective of rhetoric, which is simply the art of persuasion. Rather than being a neutral and "objective" way to build "world models" from data, machine learning is (I argue) inherently rhetorical. I explore some of its rhetorical features, and examine one pervasive business model where machine learning is widely used, "manipulation as a service."
【17】Beyond Pessimism: Offline Learning in KL-regularized Games
标题:超越悲观主义:KL正则化博弈中的离线学习
链接:https://arxiv.org/abs/2604.06738
作者:Yuheng Zhang,Claire Chen,Nan Jiang
摘要:We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized under a KL constraint to a fixed reference policy. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt n)$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the Nash equilibrium induced by skew symmetry. This yields the first $\widetilde{\mathcal{O}}(1/n)$ sample complexity bound for offline learning in KL-regularized zero-sum games, achieved entirely without pessimism. We further propose an efficient self-play policy optimization algorithm and prove that, with a number of iterations linear in the sample size, it achieves the same fast $\widetilde{\mathcal{O}}(1/n)$ statistical rate as the minimax estimator.
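As context for the abstract above, the KL-regularized zero-sum objective it refers to typically takes the following form (a standard formulation; the paper's exact notation may differ):

```latex
\max_{\pi_1}\;\min_{\pi_2}\;
\mathbb{E}_{a \sim \pi_1,\; b \sim \pi_2}\big[r(a,b)\big]
\;-\;\beta\,\mathrm{KL}\big(\pi_1 \,\big\|\, \pi_{\mathrm{ref}}\big)
\;+\;\beta\,\mathrm{KL}\big(\pi_2 \,\big\|\, \pi_{\mathrm{ref}}\big)
```

The $\beta$-weighted KL terms make the objective strongly convex-concave in the policies, which is what underlies the smoothness of KL-regularized best responses that the analysis builds on in place of pessimism.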
【18】PD-SOVNet: A Physics-Driven Second-Order Vibration Operator Network for Estimating Wheel Polygonal Roughness from Axle-Box Vibrations
标题:PD-SOVNet:一种物理驱动的二阶振动算子网络,用于根据轴箱振动估计车轮多边形粗糙度
链接:https://arxiv.org/abs/2604.06620
作者:Xiancheng Wang,Lin Wang,Rui Wang,Zhibo Zhang,Minghang Zhao,Xiaoheng Zhang,Zhongyue Tan,Kaitai Mao
摘要:Quantitative estimation of wheel polygonal roughness from axle-box vibration signals is a challenging yet practically relevant problem for rail-vehicle condition monitoring. Existing studies have largely focused on detection, identification, or severity classification, while continuous regression of multi-order roughness spectra remains less explored, especially under real operational data and unseen-wheel conditions. To address this problem, this paper presents PD-SOVNet, a physics-guided gray-box framework that combines shared second-order vibration kernels, a $4\times4$ MIMO coupling module, an adaptive physical correction branch, and a Mamba-based temporal branch for estimating the 1st--40th-order wheel roughness spectrum from axle-box vibrations. The proposed design embeds modal-response priors into the model while retaining data-driven flexibility for sample-dependent correction and residual temporal dynamics. Experiments on three real-world datasets, including operational data and real fault data, show that the proposed method provides competitive prediction accuracy and relatively stable cross-wheel performance under the current data protocol, with its most noticeable advantage observed on the more challenging Dataset III. Noise injection experiments further indicate that the Mamba temporal branch helps mitigate performance degradation under perturbed inputs. These results suggest that structured physical priors can be beneficial for stabilizing roughness regression in practical rail-vehicle monitoring scenarios, although further validation under broader operating conditions and stricter comparison protocols is still needed.
【19】AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
标题:AE-ViT:稳定的长时程参数偏微分方程建模
链接:https://arxiv.org/abs/2604.06475
作者:Iva Mikuš,Boris Muha,Domagoj Vlah
备注:16 pages, 7 figures
摘要:Deep Learning Reduced Order Models (ROMs) are becoming increasingly popular as surrogate models for parametric partial differential equations (PDEs) due to their ability to handle high-dimensional data, approximate highly nonlinear mappings, and utilize GPUs. Existing approaches typically learn evolution either on the full solution field, which requires capturing long-range spatial interactions at high computational cost, or on compressed latent representations obtained from autoencoders, which reduces the cost but often yields latent vectors that are difficult to evolve, since they primarily encode spatial information. Moreover, in parametric PDEs, the initial condition alone is not sufficient to determine the trajectory, and most current approaches are not evaluated on jointly predicting multiple solution components with differing magnitudes and parameter sensitivities. To address these challenges, we propose a joint model consisting of a convolutional encoder, a transformer operating on latent representations, and a decoder for reconstruction. The main novelties are joint training with multi-stage parameter injection and coordinate channel injection. Parameters are injected at multiple stages to improve conditioning. Physical coordinates are encoded to provide spatial information. This allows the model to dynamically adapt its computations to the specific PDE parameters governing each system, rather than learning a single fixed response. Experiments on the Advection-Diffusion-Reaction equation and Navier-Stokes flow around the cylinder wake demonstrate that our approach combines the efficiency of latent evolution with the fidelity of full-field models, outperforming DL-ROMs, latent transformers, and plain ViTs in multi-field prediction, reducing the relative rollout error by approximately $5$ times.
【20】Conformal Margin Risk Minimization: An Envelope Framework for Robust Learning under Label Noise
标题:共形间隔风险最小化:标签噪声下鲁棒学习的包络框架
链接:https://arxiv.org/abs/2604.06468
作者:Yuanjie Shi,Peihong Li,Zijian Zhang,Janardhan Rao Doppa,Yan Yan
备注:Accepted for Publication at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS), 2026
摘要:Most methods for learning with noisy labels require privileged knowledge such as noise transition matrices, clean subsets or pretrained feature extractors, resources typically unavailable when robustness is most needed. We propose Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that improves any classification loss under label noise by adding a single quantile-calibrated regularization term, with no privileged knowledge or training pipeline modification. CMRM measures the confidence margin between the observed label and competing labels, and thresholds it with a conformal quantile estimated per batch to focus training on high-margin samples while suppressing likely mislabeled ones. We derive a learning bound for CMRM under arbitrary label noise requiring only mild regularity of the margin distribution. Across five base methods and six benchmarks with synthetic and real-world noise, CMRM consistently improves accuracy (up to +3.39%), reduces conformal prediction set size (up to -20.44%) and does not hurt under 0% noise, showing that CMRM captures a method-agnostic uncertainty signal that existing mechanisms did not exploit.
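The margin-plus-batch-quantile idea can be sketched as a hard sample mask; note the actual method uses the quantile inside a regularization term added to any base loss, and the quantile level `alpha` here is a hypothetical choice.

```python
import numpy as np

def cmrm_weights(logits, labels, alpha=0.5):
    n = len(labels)
    observed = logits[np.arange(n), labels]
    rival = logits.copy()
    rival[np.arange(n), labels] = -np.inf
    # Confidence margin: observed-label score minus best competing score.
    margin = observed - rival.max(axis=1)
    # Conformal-style threshold estimated from the batch itself.
    tau = np.quantile(margin, alpha)
    # Focus training on high-margin samples; suppress likely mislabels.
    return (margin >= tau).astype(float)

logits = np.array([[4.0, 0.0],     # confidently consistent with its label
                   [0.1, 0.0],     # borderline
                   [-2.0, 0.0],    # likely mislabeled
                   [3.0, 0.0]])
labels = np.array([0, 0, 0, 0])
w = cmrm_weights(logits, labels)
```

The sample whose observed label loses to a competing label (negative margin) falls below the batch quantile and is downweighted, which is the noise-suppression signal the abstract exploits without any noise transition matrix or clean subset.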
【21】Quality-preserving Model for Electronics Production Quality Tests Reduction
标题:用于减少电子产品生产质量测试的质量保持模型
链接:https://arxiv.org/abs/2604.06451
作者:Noufa Haneefa,Teddy Lazebnik,Einav Peretz-Andersson
摘要:Manufacturing test flows in high-volume electronics production are typically fixed during product development and executed unchanged on every unit, even as failure patterns and process conditions evolve. This protects quality, but it also imposes unnecessary test cost, while existing data-driven methods mostly optimize static test subsets and neither adapt online to changing defect distributions nor explicitly control escape risk. In this study, we present an adaptive test-selection framework that combines offline minimum-cost diagnostic subset construction using greedy set cover with an online Thompson-sampling multi-armed bandit that switches between full and reduced test plans using a rolling process-stability signal. We evaluate the framework on two printed circuit board assembly stages, Functional Circuit Test and End-of-Line test, covering 28,000 board runs. Offline analysis identified zero-escape reduced plans that cut test time by 18.78% in Functional Circuit Test and 91.57% in End-of-Line testing. Under temporal validation with real concept drift, static reduction produced 110 escaped defects in Functional Circuit Test and 8 in End-of-Line, whereas the adaptive policy reduced escapes to zero by reverting to fuller coverage when instability emerged in practice. These results show that online learning can preserve manufacturing quality while reducing test burden, offering a practical route to adaptive test planning across production domains and delivering both economic and logistical benefits for companies.
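The offline stage's minimum-cost diagnostic subset construction can be sketched as classic greedy set cover, assuming unit test costs; the test names and detected failure modes below are hypothetical.

```python
def greedy_test_cover(tests, failure_modes):
    # Greedy set cover: repeatedly add the test that detects the most
    # still-uncovered failure modes (unit cost per test assumed).
    uncovered = set(failure_modes)
    plan = []
    while uncovered:
        best = max(tests, key=lambda t: len(tests[t] & uncovered))
        if not tests[best] & uncovered:
            break                       # remaining modes are undetectable
        plan.append(best)
        uncovered -= tests[best]
    return plan

tests = {
    "t_power":  {"short", "open"},
    "t_rf":     {"open", "drift"},
    "t_visual": {"solder"},
}
plan = greedy_test_cover(tests, {"short", "open", "drift", "solder"})
```

The online bandit described in the abstract then decides per unit whether to run such a reduced plan or the full flow, reverting to full coverage when the rolling stability signal degrades.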
【22】ODE-free Neural Flow Matching for One-Step Generative Modeling
标题:一步生成建模的无ODE神经流匹配
链接:https://arxiv.org/abs/2604.06413
作者:Xiao Shou
摘要:Diffusion and flow matching models generate samples by learning time-dependent vector fields whose integration transports noise to data, requiring tens to hundreds of network evaluations at inference. We instead learn the transport map directly. We propose Optimal Transport Neural Flow Matching (OT-NFM), an ODE-free generative framework that parameterizes the flow map with neural flows, enabling true one-step generation with a single forward pass. We show that naive flow-map training suffers from mean collapse, where inconsistent noise-data pairings drive all outputs toward the data mean. We prove that consistent coupling is necessary for non-degenerate learning and address this using optimal transport pairings with scalable minibatch and online coupling strategies. Experiments on synthetic benchmarks and image generation tasks (MNIST and CIFAR-10) demonstrate competitive sample quality while reducing inference to a single network evaluation.
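The consistent-coupling requirement can be illustrated with an exact minibatch optimal-transport pairing computed by brute force on a tiny batch; real implementations would use a Hungarian or Sinkhorn solver, as the abstract's scalable minibatch strategies suggest.

```python
import itertools
import numpy as np

def ot_pairing(noise, data):
    # Exact minibatch optimal-transport pairing by enumerating all
    # permutations; viable only for tiny batches, illustration only.
    n = len(noise)
    def cost(perm):
        return sum(np.linalg.norm(noise[i] - data[perm[i]]) ** 2
                   for i in range(n))
    return min(itertools.permutations(range(n)), key=cost)

noise = np.array([[0.0], [1.0], [2.0], [3.0]])
data = np.array([[3.1], [0.1], [2.1], [1.1]])
perm = ot_pairing(noise, data)     # each noise point gets its nearest target
```

Training the one-step map on these geometrically consistent pairs, rather than on random noise-data pairings, is what prevents the mean collapse the abstract identifies.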
【23】The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
标题:万能钥匙假设:通过线性子空间对齐解锁跨模型能力迁移
链接:https://arxiv.org/abs/2604.06377
作者:Rishab Balasubramanian,Pin-Jie Lin,Rituraj Sharma,Anjie Fang,Fardin Abdi,Viktor Rozgic,Zheng Du,Mohit Bansal,Tu Vu
摘要:We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.
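The extract-and-steer recipe can be sketched on synthetic activations: take a capability direction as the difference of mean activations between capability-present and capability-absent runs, then add it to hidden states at inference. The low-rank cross-model alignment step is omitted here, and all shapes and magnitudes are hypothetical.

```python
import numpy as np

def capability_direction(acts_present, acts_absent):
    # Contrast mean activations between capability-present and
    # capability-absent variants of the Source model.
    return acts_present.mean(axis=0) - acts_absent.mean(axis=0)

def steer(hidden, direction, scale=1.0):
    # Inference-time intervention: nudge hidden states along the direction.
    return hidden + scale * direction

rng = np.random.default_rng(7)
d = 32
true_dir = np.zeros(d)
true_dir[0] = 2.0                          # hypothetical latent direction
base = rng.standard_normal((100, d))
acts_absent = base
acts_present = base + true_dir             # capability shifts activations
v = capability_direction(acts_present, acts_absent)
```

In the synthetic setup the mean contrast recovers the planted direction exactly; UNLOCK additionally maps such a direction into the Target model's activation space through a learned low-rank linear transformation before applying it.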
【24】Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks
标题:深度线性网络鞍到鞍区的随机梯度下降
链接:https://arxiv.org/abs/2604.06366
作者:Guillaume Corlouer,Avi Semler,Alexander Strang,Alexander Gietelink Oldenziel
摘要:Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient descent (SGD) noise on this regime remains poorly understood. We investigate the dynamics of SGD during training of DLNs in the saddle-to-saddle regime. We model the training dynamics as stochastic Langevin dynamics with anisotropic, state-dependent noise. Under the assumption of aligned and balanced weights, we derive an exact decomposition of the dynamics into a system of one-dimensional per-mode stochastic differential equations. This establishes that the maximal diffusion along a mode precedes the corresponding feature being completely learned. We also derive the stationary distribution of SGD for each mode: in the absence of label noise, its marginal distribution along specific features coincides with the stationary distribution of gradient flow, while in the presence of label noise it approximates a Boltzmann distribution. Finally, we confirm experimentally that the theoretical results hold qualitatively even without aligned or balanced weights. These results establish that SGD noise encodes information about the progression of feature learning but does not fundamentally alter the saddle-to-saddle dynamics.
【25】Spectral Edge Dynamics Reveal Functional Modes of Learning
标题:谱边缘动力学揭示学习的功能模式
链接:https://arxiv.org/abs/2604.06256
作者:Yongzhong Xu
备注:17 pages, 1 figure
摘要:Training dynamics during grokking concentrate along a small number of dominant update directions -- the spectral edge -- which reliably distinguishes grokking from non-grokking regimes. We show that standard mechanistic interpretability tools (head attribution, activation probing, sparse autoencoders) fail to capture these directions: their structure is not localized in parameter or feature space. Instead, each direction induces a structured function over the input domain, revealing low-dimensional functional modes invisible to representation-level analysis. For modular addition, all leading directions collapse to a single Fourier mode. For multiplication, the same collapse appears only in the discrete-log basis, yielding a 5.9x improvement in concentration. For subtraction, the edge spans a small multi-mode family. For $x^2+y^2$, no single harmonic basis suffices, but cross-terms of additive and multiplicative features provide a 4x variance boost, consistent with the decomposition $(x+y)^2 - 2xy$. Multitask training amplifies this compositional structure, with the $x^2+y^2$ spectral edge inheriting the addition circuit's characteristic frequency (2.3x concentration increase). These results suggest that spectral edge dynamics identify low-dimensional functional subspaces governing learning, whose representation depends on the algebraic structure of the task. Simple harmonic structure emerges only when the task admits a symmetry-adapted basis; more complex tasks require richer functional descriptions.
【26】Gaussian Approximation for Asynchronous Q-learning
标题:异步Q学习的高斯逼近
链接:https://arxiv.org/abs/2604.07323
作者:Artemy Rubtsov,Sergey Samsonov,Vladimir Ulyanov,Alexey Naumov
备注:41 pages
摘要:In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-ω},\, ω\in (1/2, 1]$. Assuming that the sequence of state-action-next-state triples $(s_k, a_k, s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, we establish a rate of order up to $n^{-1/6} \log^{4} (nS A)$ over the class of hyper-rectangles, where $n$ is the number of samples used by the algorithm and $S$ and $A$ denote the numbers of states and actions, respectively. To obtain this result, we prove a high-dimensional central limit theorem for sums of martingale differences, which may be of independent interest. Finally, we present bounds for high-order moments for the algorithm's last iterate.
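The estimator analyzed above, asynchronous Q-learning with polynomial stepsize $k^{-ω}$ and Polyak-Ruppert averaging, can be run on a toy one-state MDP (a hypothetical example for illustration, not from the paper):

```python
import random

def averaged_q_learning(n_steps=10000, gamma=0.5, omega=0.75, seed=0):
    # One-state MDP with two actions: action 0 pays reward 1, action 1
    # pays 0, so Q*(0) = 1/(1-gamma) = 2 and Q*(1) = gamma/(1-gamma) = 1.
    rng = random.Random(seed)
    q = [0.0, 0.0]
    q_bar = [0.0, 0.0]
    for k in range(1, n_steps + 1):
        a = rng.randrange(2)               # asynchronous: one (s, a) pair per step
        r = (1.0 if a == 0 else 0.0) + rng.gauss(0.0, 0.1)
        step = k ** -omega                 # polynomial stepsize k^{-omega}
        q[a] += step * (r + gamma * max(q) - q[a])
        # Polyak-Ruppert running average of the iterates.
        q_bar = [qb + (qi - qb) / k for qb, qi in zip(q_bar, q)]
    return q_bar

q_bar = averaged_q_learning()
```

The averaged iterates converge to the optimal Q-values, and it is the fluctuations of exactly such averages that the paper's high-dimensional central limit theorem characterizes.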
【27】QNAS: A Neural Architecture Search Framework for Accurate and Efficient Quantum Neural Networks
标题:QNAS:用于准确有效量子神经网络的神经架构搜索框架
链接:https://arxiv.org/abs/2604.07013
作者:Kooshan Maleki,Alberto Marchisio,Muhammad Shafique
备注:To appear at the IEEE International Joint Conference on Neural Networks (IJCNN), Maastricht, The Netherlands, June 2026
摘要:Designing quantum neural networks (QNNs) that are both accurate and deployable on NISQ hardware is challenging. Handcrafted ansätze must balance expressivity, trainability, and resource use, while limited qubits often necessitate circuit cutting. Existing quantum architecture search methods primarily optimize accuracy while only heuristically controlling quantum resource usage, and mostly ignore the exponential overhead of circuit cutting. We introduce QNAS, a neural architecture search framework that unifies hardware-aware evaluation, multi-objective optimization, and cutting-overhead awareness for hybrid quantum-classical neural networks (HQNNs). QNAS trains a shared-parameter SuperCircuit and uses NSGA-II to optimize three objectives jointly: (i) validation error, (ii) a runtime cost proxy measuring wall-clock evaluation time, and (iii) the estimated number of subcircuits under a target qubit budget. QNAS evaluates candidate HQNNs under a few epochs of training and discovers clear Pareto fronts that reveal tradeoffs between accuracy, efficiency, and cutting overhead. Across MNIST, Fashion-MNIST, and Iris benchmarks, we observe that embedding type and CNOT mode selection significantly impact both accuracy and efficiency, with angle-y embedding and sparse entangling patterns outperforming other configurations on image datasets, and amplitude embedding excelling on tabular data (Iris). On MNIST, the best architecture achieves 97.16% test accuracy with a compact 8-qubit, 2-layer circuit; on the more challenging Fashion-MNIST, 87.38% with a 5-qubit, 2-layer circuit; and on Iris, 100% validation accuracy with a 4-qubit, 2-layer circuit. QNAS surfaces these design insights automatically during search, guiding practitioners toward architectures that balance accuracy, resource efficiency, and practical deployability on current hardware.
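The multi-objective selection step can be illustrated with plain Pareto-dominance filtering, the core of NSGA-II's non-dominated sorting; the candidate tuples below are hypothetical (validation error, runtime proxy, subcircuit count) triples, not results from the paper:

```python
def dominates(u, v):
    """u dominates v if u is no worse in every objective and strictly better in one (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    """Keep only the non-dominated candidates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidates: (validation error, runtime proxy, estimated #subcircuits)
candidates = [
    (0.03, 12.0, 4),   # accurate but slow, heavy cutting
    (0.05, 6.0, 2),    # balanced
    (0.05, 7.0, 3),    # dominated by the balanced design
    (0.10, 3.0, 1),    # fast and cheap but less accurate
]
front = pareto_front(candidates)
print(front)
```

The dominated third candidate drops out; the remaining three form the accuracy/efficiency/cutting tradeoff surface the search exposes.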
【28】Anticipating tipping in spatiotemporal systems with machine learning
标题:利用机器学习预测时空系统中的临界转变
链接:https://arxiv.org/abs/2604.06454
作者:Smita Deb,Zheng-Meng Zhai,Mulugeta Haile,Ying-Cheng Lai
备注:26 pages, 25 figures
摘要:In nonlinear dynamical systems, tipping refers to a critical transition from one steady state to another, typically catastrophic, steady state, often resulting from a saddle-node bifurcation. Recently, the machine-learning framework of parameter-adaptable reservoir computing has been applied to predict tipping in systems described by low-dimensional stochastic differential equations. However, anticipating tipping in complex spatiotemporal dynamical systems remains a significant open problem. The ability to forecast not only the occurrence but also the precise timing of such tipping events is crucial for providing the actionable lead time necessary for timely mitigation. By utilizing the mathematical approach of non-negative matrix factorization to generate dimensionally reduced spatiotemporal data as input, we exploit parameter-adaptable reservoir computing to accurately anticipate tipping. We demonstrate that the tipping time can be identified within a narrow prediction window across a variety of spatiotemporal dynamical systems, as well as in CMIP5 (Coupled Model Intercomparison Project 5) climate projections. Furthermore, we show that this reservoir-computing framework, utilizing reduced input data, is robust against common forecasting challenges and significantly alleviates the computational overhead associated with processing full spatiotemporal data.
【29】Operator Learning for Surrogate Modeling of Wave-Induced Forces from Sea Surface Waves
标题:用于海面波浪诱导力代理建模的算子学习
链接:https://arxiv.org/abs/2604.06433
作者:Shukai Cai,Sourav Dutta,Mark Loveland,Eirik Valseth,Peter Rivera-Casillas,Corey Trahan,Clint Dawson
备注:46 pages, 15 figures
摘要
:Wave setup plays a significant role in transferring wave-induced energy to currents and causing an increase in water elevation. This excess momentum flux, known as radiation stress, motivates the coupling of circulation models with wave models to improve the accuracy of storm surge prediction; however, traditional numerical wave models are complex and computationally expensive. As a result, in practical coupled simulations, wave models are often executed at much coarser temporal resolution than circulation models. In this work, we explore the use of Deep Operator Networks (DeepONets) as a surrogate for the Simulating WAves Nearshore (SWAN) numerical wave model. The proposed surrogate model was tested on three distinct 1-D and 2-D steady-state numerical examples with variable boundary wave conditions and wind fields. When applied to a realistic numerical example of steady state wave simulation in Duck, NC, the model achieved consistently high accuracy in predicting the components of the radiation stress gradient and the significant wave height across representative scenarios.
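The DeepONet architecture referenced above pairs a branch net (encoding the input function sampled at fixed sensors) with a trunk net (encoding query coordinates) via an inner product. This untrained numpy sketch shows only the forward structure; the layer sizes, sensor count, and latent width are arbitrary choices, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random (untrained) fully connected layers."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

m, p = 32, 16                      # number of sensor points, latent width
branch = mlp([m, 64, p])           # encodes the input function at m sensors
trunk = mlp([1, 64, p])            # encodes a query coordinate y

u_sensors = np.sin(np.linspace(0, np.pi, m))[None, :]        # one input function
y_query = np.array([[0.3], [0.7]])                           # two query locations
G = forward(branch, u_sensors) @ forward(trunk, y_query).T   # (1, 2): G(u)(y)
print(G.shape)
```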
【30】Calibration of a neural network ocean closure for improved mean state and variability
标题:校准神经网络海洋闭合方案以改善平均态与变率
链接:https://arxiv.org/abs/2604.06398
作者:Pavel Perezhogin,Alistair Adcroft,Laure Zanna
摘要:Global ocean models exhibit biases in the mean state and variability, particularly at coarse resolution, where mesoscale eddies are unresolved. To address these biases, parameterization coefficients are typically tuned ad hoc. Here, we formulate parameter tuning as a calibration problem using Ensemble Kalman Inversion (EKI). We optimize parameters of a neural network parameterization of mesoscale eddies in two idealized ocean models at coarse resolution. The calibrated parameterization reduces errors in the time-averaged fluid interfaces and their variability by approximately a factor of two compared to the unparameterized model or the offline-trained parameterization. The EKI method is robust to noise in time-averaged statistics arising from chaotic ocean dynamics. Furthermore, we propose an efficient calibration protocol that bypasses integration to statistical equilibrium by carefully choosing an initial condition. These results demonstrate that systematic calibration can substantially improve coarse-resolution ocean simulations and provide a practical pathway for reducing biases in global ocean models.
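A standard Ensemble Kalman Inversion iteration, shown here on a toy linear forward map rather than an ocean model, updates each ensemble member with a Kalman gain built from ensemble covariances and perturbed observations; the forward map, ensemble size, and iteration count below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
G = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])   # toy linear forward map
theta_true = np.array([1.0, -0.5])
y = G @ theta_true                                   # synthetic observations
Gamma = 1e-2 * np.eye(3)                             # observation noise covariance

J = 50
theta = rng.standard_normal((J, 2)) * 2.0            # initial ensemble
for _ in range(30):
    g = theta @ G.T                                  # forward evaluations, shape (J, 3)
    dt = theta - theta.mean(0)
    dg = g - g.mean(0)
    C_tg = dt.T @ dg / J                             # parameter-output cross-covariance
    C_gg = dg.T @ dg / J                             # output covariance
    K = C_tg @ np.linalg.inv(C_gg + Gamma)           # Kalman gain
    obs = y + rng.multivariate_normal(np.zeros(3), Gamma, size=J)  # perturbed observations
    theta = theta + (obs - g) @ K.T                  # ensemble update

print(theta.mean(0))
```

The ensemble mean contracts toward the true parameters using only forward evaluations, which is what makes EKI attractive for derivative-free calibration of expensive simulators.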
其他(39篇)
【1】Fast Spatial Memory with Elastic Test-Time Training
标题:基于弹性测试时训练的快速空间记忆
链接:https://arxiv.org/abs/2604.07350
作者:Ziqiao Ma,Xueyang Yu,Haoyu Zhen,Yuncong Yang,Joyce Chai,Chuang Gan
备注:Project Page: https://fast-spatial-memory.github.io/
摘要:Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training inspired by elastic weight consolidation, that stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-trained FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks and mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
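The Fisher-weighted elastic prior with an EMA anchor can be sketched as follows; the quadratic toy loss, learning rate, EMA rate, and Fisher values are our own assumptions, but the update mirrors the described mechanism: high-Fisher ("important") coordinates are held near the anchor while low-Fisher ones adapt freely:

```python
import numpy as np

def elastic_step(theta, grad, fisher, anchor, lr=0.1, lam=1.0):
    """One fast-weight update with a Fisher-weighted elastic pull toward the anchor."""
    return theta - lr * (grad + lam * fisher * (theta - anchor))

rng = np.random.default_rng(0)
theta = rng.standard_normal(4)
anchor = theta.copy()
fisher = np.array([10.0, 10.0, 0.1, 0.1])            # first two coords deemed "important"
target = np.array([5.0, 5.0, 5.0, 5.0])              # what the new chunk's loss pulls toward
for _ in range(100):
    grad = theta - target                            # gradient of a toy quadratic loss
    theta = elastic_step(theta, grad, fisher, anchor)
    anchor = 0.99 * anchor + 0.01 * theta            # EMA anchor: stability vs. plasticity

print(theta)
```

After adaptation, the low-Fisher coordinates sit close to the new target while the high-Fisher ones lag near the anchor, which is the stability/plasticity tradeoff the elastic prior encodes.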
【2】MoRight: Motion Control Done Right
标题:MoRight:正确的运动控制
链接:https://arxiv.org/abs/2604.07348
作者:Shaowei Liu,Xuanchi Ren,Tianchang Shen,Huan Ling,Saurabh Gupta,Shenlong Wang,Sanja Fidler,Jun Gao
备注:Project Page: https://research.nvidia.com/labs/sil/projects/moright
摘要:Generating motion-controlled videos--where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints--demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and object motion into a single tracking signal and treat motion as kinematic displacement without modeling causal relationships between object motion. We introduce MoRight, a unified framework that addresses both limitations through disentangled motion modeling. Object motion is specified in a canonical static-view and transferred to an arbitrary target camera viewpoint via temporal cross-view attention, enabling disentangled camera and object control. We further decompose motion into active (user-driven) and passive (consequence) components, training the model to learn motion causality from data. At inference, users can either supply active motion and MoRight predicts consequences (forward reasoning), or specify desired passive outcomes and MoRight recovers plausible driving actions (inverse reasoning), all while freely adjusting the camera viewpoint. Experiments on three benchmarks demonstrate state-of-the-art performance in generation quality, motion controllability, and interaction awareness.
【3】Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning
标题:面向全设施数据中心基础设施规划的生成式人工智能工作负载功率特征测量
链接:https://arxiv.org/abs/2604.07345
作者:Roberto Vercellino,Jared Willard,Gustavo Campos,Weslley da Silva Pereira,Olivia Hull,Matthew Selensky,Juliane Mueller
备注
:The data associated with this publication can be found at http://doi.org/10.7799/3025227
摘要:The rapid growth of generative artificial intelligence (AI) has introduced unprecedented computational demands, driving significant increases in the energy footprint of data centers. However, existing power consumption data is largely proprietary and reported at varying resolutions, creating challenges for estimating whole-facility energy use and planning infrastructure. In this work, we present a methodology that bridges this gap by linking high-resolution workload power measurements to whole-facility energy demand. Using NLR's high-performance computing data center equipped with NVIDIA H100 GPUs, we measure power consumption of AI workloads at 0.1-second resolution for AI training, fine-tuning and inference jobs. Workloads are characterized using MLCommons benchmarks for model training and fine-tuning, and vLLM benchmarks for inference, enabling reproducible and standardized workload profiling. The dataset of power consumption profiles is made publicly available. These power profiles are then scaled to the whole-facility-level using a bottom-up, event-driven, data center energy model. The resulting whole-facility energy profiles capture realistic temporal fluctuations driven by AI workloads and user-behavior, and can be used to inform infrastructure planning for grid connection, on-site energy generation, and distributed microgrids.
【4】Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment
标题:超越损失值:通过损失轨迹对齐进行稳健的动态修剪
链接:https://arxiv.org/abs/2604.07306
作者:Huaiyuan Qin,Muli Yang,Gabriel James Goenawan,Kai Wang,Zheng Wang,Peng Hu,Xi Peng,Hongyuan Zhu
备注:Published in CVPR 2026 Findings
摘要:Existing dynamic data pruning methods often fail under noisy-label settings, as they typically rely on per-sample loss as the ranking criterion. This could mistakenly lead to preserving noisy samples due to their high loss values, resulting in a significant performance drop. To address this, we propose AlignPrune, a noise-robust module designed to enhance the reliability of dynamic pruning under label noise. Specifically, AlignPrune introduces the Dynamic Alignment Score (DAS), which is a loss-trajectory-based criterion that enables more accurate identification of noisy samples, thereby improving pruning effectiveness. As a simple yet effective plug-and-play module, AlignPrune can be seamlessly integrated into state-of-the-art dynamic pruning frameworks, consistently outperforming them without modifying either the model architecture or the training pipeline. Extensive experiments on five widely-used benchmarks across various noise types and pruning ratios demonstrate the effectiveness of AlignPrune, boosting accuracy by up to 6.3% over state-of-the-art baselines. Our results offer a generalizable solution for pruning under noisy data, encouraging further exploration of learning in real-world scenarios. Code is available at: https://github.com/leonqin430/AlignPrune.
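A loss-trajectory-based criterion in the spirit of DAS (the paper's exact definition may differ; this is our own illustrative stand-in) can be sketched by scoring each sample's loss trajectory against the cohort mean: clean samples whose losses decay with training align well, while noisy-label samples whose losses stay high do not:

```python
import numpy as np

def alignment_scores(loss_traj):
    """Cosine similarity between each per-sample loss trajectory and the cohort mean
    (an illustrative trajectory-based criterion, not the paper's exact DAS)."""
    mean = loss_traj.mean(axis=0)
    num = loss_traj @ mean
    den = np.linalg.norm(loss_traj, axis=1) * np.linalg.norm(mean) + 1e-12
    return num / den

epochs = np.arange(10)
clean = 2.0 * np.exp(-0.5 * epochs)                  # clean-sample losses decay
noisy = 2.0 + 0.05 * epochs                          # noisy-label losses stay high
traj = np.vstack([np.tile(clean, (8, 1)), noisy])    # 8 clean samples + 1 noisy sample
scores = alignment_scores(traj)
print(scores)
```

Ranking by this score, instead of raw loss, keeps the high-loss noisy sample at the bottom rather than the top.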
【5】Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
标题:Android Coach:通过单状态多动作提高在线智能体训练效率
链接:https://arxiv.org/abs/2604.07277
作者:Guo Gan,Yuxuan Ding,Cong Chen,Yuwei Ren,Yin Huang,Hong Zhou
摘要:Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach, we integrate a process reward model and introduce a group-wise advantage estimator based on the averaged critic outputs. Extensive experiments demonstrate the effectiveness and efficiency of Android Coach: it achieves 7.5% and 8.3% success rate improvements on AndroidLab and AndroidWorld over UI-TARS-1.5-7B, and attains 1.4x higher training efficiency than Single State Single Action methods PPO and GRPO at matched success rates.
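The group-wise advantage over multiple actions sampled at one emulator state can be sketched as each action's critic value minus the group mean; the critic values below are hypothetical, and the paper's full estimator (with its process reward model) may differ:

```python
def group_advantages(q_values):
    """Group-wise advantage: each sampled action's critic value minus the group mean."""
    mean = sum(q_values) / len(q_values)
    return [q - mean for q in q_values]

# Hypothetical critic values for 4 actions sampled at a single online state
adv = group_advantages([0.9, 0.2, 0.4, 0.5])
print(adv)   # the best-scoring action receives a positive advantage
```

Because the baseline is the group's own mean, the advantages sum to (numerically) zero, so one expensive emulator state yields several informative, centered learning signals.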
【6】Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
标题:为人工智能腾出空间:GROMACS中基于深度势能的多GPU分子动力学
链接:https://arxiv.org/abs/2604.07276
作者:Luca Pennati,Andong Hu,Ivy Peng,Lukas Müllender,Stefano Markidis
摘要:GROMACS is a de-facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge: embedding neural-network inference into multi-GPU simulations while retaining high performance. In this work, we integrate the MLIP framework DeePMD-kit into GROMACS, enabling domain-decomposed, GPU-accelerated inference across multi-node systems. We extend the GROMACS NNPot interface with a DeePMD backend, and we introduce a domain decomposition layer decoupled from the main simulation. The inference is executed concurrently on all processes, with two MPI collectives used each step to broadcast coordinates and to aggregate and redistribute forces. We train an in-house DPA-1 model (1.6 M parameters) on a dataset of solvated protein fragments. We validate the implementation on a small protein system, then we benchmark the GROMACS-DeePMD integration with a 15,668-atom protein on NVIDIA A100 and AMD MI250x GPUs up to 32 devices. Strong-scaling efficiency reaches 66% at 16 devices and 40% at 32; weak-scaling efficiency is 80% up to 16 devices and reaches 48% (MI250x) and 40% (A100) at 32 devices. Profiling with the ROCm System profiler shows that >90% of the wall time is spent in DeePMD inference, while MPI collectives contribute <10%, primarily because they act as a global synchronization point. The principal bottlenecks are the irreducible ghost-atom cost set by the cutoff radius, confirmed by a simple throughput model, and load imbalance across ranks. These results demonstrate that production MD with near ab initio fidelity is feasible at scale in GROMACS.
【7】$k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture
标题:$k$-server-bench:自动化$k$-服务器猜想的势函数发现
链接:https://arxiv.org/abs/2604.07240
作者:Kirill Brilliantov,Etienne Bamas,Emmanuel Abbé
摘要:We introduce a code-based challenge for automated, open-ended mathematical discovery based on the $k$-server conjecture, a central open problem in competitive analysis. The task is to discover a potential function satisfying a large graph-structured system of simple linear inequalities. The resulting evaluation procedure is sound but incomplete: any violated inequality definitively refutes a candidate, whereas satisfying all inequalities does not by itself constitute a proof of the corresponding conjecture's special case. Nevertheless, a candidate that passes all constraints would be strong evidence toward a valid proof and, to the best of our knowledge, no currently known potential achieves this under our formulation in the open $k=4$ circle case. As such, a successful candidate would already be an interesting contribution to the $k$-server conjecture, and could become a substantial theoretical result when paired with a full proof. Experiments on the resolved $k=3$ regime show that current agentic methods can solve nontrivial instances, and in the open $k=4$ regime they reduce the number of violations relative to existing potentials without fully resolving the task. Taken together, these results suggest that the task is challenging but plausibly within reach of current methods. Beyond its relevance to the $k$-server community, where the developed tooling enables researchers to test new hypotheses and potentially improve on the current record, the task also serves as a useful benchmark for developing code-based discovery agents. In particular, our $k=3$ results show that it mitigates important limitations of existing open-ended code-based benchmarks, including early saturation and the weak separation between naive random baselines and more sophisticated methods.
【8】Efficient Learned Data Compression via Dual-Stream Feature Decoupling
标题:通过双流特征解耦实现高效的学习型数据压缩
链接:https://arxiv.org/abs/2604.07239
作者:Huidong Ma,Xinyan Shi,Hui Sun,Xiaofei Yue,Xiaoguang Liu,Gang Wang,Wentong Cai
备注:Accepted to ACL 2026
摘要:While Learned Data Compression (LDC) has achieved superior compression ratios, balancing precise probability modeling with system efficiency remains challenging. Crucially, uniform single-stream architectures struggle to simultaneously capture micro-syntactic and macro-semantic features, necessitating deep serial stacking that exacerbates latency. Compounding this, heterogeneous systems are constrained by device speed mismatches, where throughput is capped by Amdahl's Law due to serial processing. To this end, we propose a Dual-Stream Multi-Scale Decoupler that disentangles local and global contexts to replace deep serial processing with shallow parallel streams, and incorporate a Hierarchical Gated Refiner for adaptive feature refinement and precise probability modeling. Furthermore, we design a Concurrent Stream-Parallel Pipeline, which overcomes systemic bottlenecks to achieve full-pipeline parallelism. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both compression ratio and throughput, while maintaining the lowest latency and memory usage. The code is available at https://github.com/huidong-ma/FADE.
【9】Diffusion Processes on Implicit Manifolds
标题:隐式流形上的扩散过程
链接:https://arxiv.org/abs/2604.07213
作者:Victor Kawasaki-Borruat,Clara Grotehans,Pierre Vandergheynst,Adam Gosztolai
摘要:High-dimensional data are often modeled as lying near a low-dimensional manifold. We study how to construct diffusion processes on this data manifold in the implicit setting. That is, using only point cloud samples and without access to charts, projections, or other geometric primitives. Our main contribution is a data-driven SDE that captures intrinsic diffusion on the underlying manifold while being defined in ambient space. The construction relies on estimating the diffusion's infinitesimal generator and its carré-du-champ (CDC) from a proximity graph built from the data. The generator and CDC together encode the local stochastic and geometric structure of the intended diffusion. We show that, as the number of samples grows, the induced process converges in law on the space of probability paths to its smooth manifold counterpart. We call this construction Implicit Manifold-valued Diffusions (IMDs), and furthermore present a numerical simulation procedure using Euler-Maruyama integration. This gives a rigorous basis for practical implementations of diffusion dynamics on data manifolds, and opens new directions for manifold-aware sampling, exploration, and generative modeling.
【10】SBBTS: A Unified Schrödinger-Bass Framework for Synthetic Financial Time Series
标题:SBBTS:合成金融时间序列的统一Schrödinger-Bass框架
链接:https://arxiv.org/abs/2604.07159
作者:Alexandre Alouadi,Grégoire Loeper,Célian Marsala,Othmane Mazhar,Huyên Pham
摘要:We study the problem of generating synthetic time series that reproduce both marginal distributions and temporal dynamics, a central challenge in financial machine learning. Existing approaches typically fail to jointly model drift and stochastic volatility, as diffusion-based methods fix the volatility while martingale transport models ignore drift. We introduce the Schrödinger-Bass Bridge for Time Series (SBBTS), a unified framework that extends the Schrödinger-Bass formulation to multi-step time series. The method constructs a diffusion process that jointly calibrates drift and volatility and admits a tractable decomposition into conditional transport problems, enabling efficient learning. Numerical experiments on the Heston model demonstrate that SBBTS accurately recovers stochastic volatility and correlation parameters that prior Schrödinger Bridge methods fail to capture. Applied to S&P 500 data, SBBTS-generated synthetic time series consistently improve downstream forecasting performance when used for data augmentation, yielding higher classification accuracy and Sharpe ratio compared to real-data-only training. These results show that SBBTS provides a practical and effective framework for realistic time series generation and data augmentation in financial applications.
【11】Selective Neuron Amplification for Training-Free Task Enhancement
标题:用于免训练任务增强的选择性神经元放大
链接:https://arxiv.org/abs/2604.07098
作者:Ryyan Akhtar
备注:28 pages, 12 figures. Preprint. Code and experiments conducted independently
摘要:Large language models often fail on tasks they seem to already understand. In our experiments, this appears to be less about missing knowledge and more about certain internal circuits not being strongly activated during inference. We explore Selective Neuron Amplification (SNA), which increases the influence of task-relevant neurons without changing the model's parameters. The method works at inference time and does not permanently alter the model. SNA helps mainly when the model is uncertain, and has little effect when the model is already confident. This suggests that some model failures are due to weak activation rather than lack of capability.
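Inference-time amplification can be sketched on a tiny ReLU network: selected hidden units are scaled by a gain during the forward pass while the weights stay untouched. The network, the gain, and the choice of neuron here are illustrative, not the paper's selection procedure:

```python
import numpy as np

W1 = np.array([[1.0, -1.0, 0.5],
               [0.5,  1.0, -0.5]])
W2 = np.ones((3, 1))

def forward(x, amplify=(), gain=2.0):
    """Forward pass with optional inference-time amplification of hidden neurons."""
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    h[:, list(amplify)] *= gain          # scale selected neurons; weights are unchanged
    return h @ W2

x = np.array([[1.0, 2.0]])
print(forward(x), forward(x, amplify=[0]))   # -> [[3.]] vs [[5.]]
```

The hidden activations are [2, 1, 0]; doubling neuron 0 shifts the output from 3 to 5 without any parameter update, which is the reversible, inference-only character of the method.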
【12】Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
标题:随机多目标多臂老虎机比单目标多臂老虎机更难吗?
链接:https://arxiv.org/abs/2604.07096
作者:Changkun Guan,Mengfan Xu
备注:21 pages
摘要:Multi-objective bandits have attracted increasing attention because of their broad applicability and mathematical elegance, where the reward of each arm is a multi-dimensional vector rather than a scalar. This naturally introduces Pareto order relations and Pareto regret. A long-standing question in this area is whether performance is fundamentally harder to optimize because of this added complexity. A recent surprising result shows that, in the adversarial setting, Pareto regret is no larger than classical regret; however, in the stochastic setting, where the regret notion is different, the picture remains unclear. In fact, existing work suggests that Pareto regret in the stochastic case increases with the dimensionality. This controversial yet subtle phenomenon motivates our central question: are multi-objective bandits actually harder than single-objective ones? We answer this question in full by showing that, in the stochastic setting, Pareto regret is in fact governed by the maximum sub-optimality gap $g^\dagger$, and hence by the minimum marginal regret of order $Ω(\frac{K\log T}{g^\dagger})$. We further develop a new algorithm that achieves Pareto regret of order $O(\frac{K\log T}{g^\dagger})$, and is therefore optimal. The algorithm leverages a nested two-layer uncertainty quantification over both arms and objectives through upper and lower confidence bound estimators. It combines a top-two racing strategy for arm selection with an uncertainty-greedy rule for dimension selection. Together, these components balance exploration and exploitation across the two layers. We also conduct comprehensive numerical experiments to validate the proposed algorithm, showing the desired regret guarantee and significant gains over benchmark methods.
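Pareto optimality and the suboptimality gap can be made concrete on hypothetical two-objective arm means (maximization): an arm's gap is the smallest uniform boost ε that makes it non-dominated. The arm means and the exact gap formula below are our own illustrative choices, not the paper's instances:

```python
def dominated(u, v):
    """v dominates u (maximization): v is at least u everywhere and strictly better somewhere."""
    return all(b >= a for a, b in zip(u, v)) and any(b > a for a, b in zip(u, v))

means = {
    "arm1": (0.9, 0.1),
    "arm2": (0.5, 0.5),
    "arm3": (0.1, 0.9),
    "arm4": (0.4, 0.4),   # dominated by arm2
}
pareto = [k for k, u in means.items()
          if not any(dominated(u, v) for j, v in means.items() if j != k)]

def pareto_gap(u, others):
    """Smallest eps >= 0 such that u + eps*(1,...,1) is no longer dominated by any arm."""
    return max(0.0, max(min(b - a for a, b in zip(u, v)) for v in others))

gap = pareto_gap(means["arm4"], [v for k, v in means.items() if k != "arm4"])
print(pareto, gap)
```

Here arm4 needs a boost of 0.1 in both objectives to escape arm2's dominance, so its Pareto suboptimality gap is 0.1.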
【13】AdaBoost Does Not Always Cycle: A Computer-Assisted Counterexample
标题:AdaBoost并不总是循环:计算机辅助的反例
链接:https://arxiv.org/abs/2604.07055
作者:Erik Y. Wang
摘要:We give a computer-assisted counterexample to the open question, posed by Rudin, Schapire, and Daubechies in COLT 2012, of whether exhaustive AdaBoost always converges to a finite cycle. The construction is based on a block-product gadget whose two factors share an exact period-2 orbit for their 5-step branch maps, but whose linearized return maps have dominant eigenvalues with an irrational logarithmic ratio. This irrationality forces the burst-winner sequence to have an irrational asymptotic frequency, precluding eventual periodicity. All assertions are certified by exact rational arithmetic. This work was developed in collaboration with GPT-5.4 Pro and Claude Opus 4.6.
【14】MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale
标题:MoE路由测试床:在小规模下研究专家专业化与路由行为
链接:https://arxiv.org/abs/2604.07030
作者:Tobias Falke,Nicolas Anastassacos,Samson Tan,Chankrisna Richy Meas,Chandana Satya Prakash,Nitesh Sekhar,M Saiful Bari,Krishna Kompella,Gamaleldin F. Elsayed
摘要:Sparse Mixture-of-Experts (MoE) architectures are increasingly popular for frontier large language models (LLM) but they introduce training challenges due to routing complexity. Fully leveraging parameters of an MoE model requires all experts to be well-trained and to specialize in non-redundant ways. Assessing this, however, is complicated due to lack of established metrics and, importantly, many routing techniques exhibit similar performance at smaller sizes, which is often not reflective of their behavior at large scale. To address this challenge, we propose the MoE Routing Testbed, a setup that gives clearer visibility into routing dynamics at small scale while using realistic data. The testbed pairs a data mix with clearly distinguishable domains with a reference router that prescribes ideal routing based on these domains, providing a well-defined upper bound for comparison. This enables quantifiable measurement of expert specialization. To demonstrate the value of the testbed, we compare various MoE routing approaches and show that balancing scope is the crucial factor that allows specialization while maintaining high expert utilization. We confirm that this observation generalizes to models 35x larger.
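One simple specialization measure (our own proxy, not necessarily the testbed's metric) is per-expert routing purity against the domain labels, with the reference router attaining the upper bound of 1.0:

```python
from collections import Counter

def specialization(assignments):
    """Per-expert purity: fraction of an expert's tokens from its majority domain,
    averaged over experts (1.0 = perfectly domain-specialized routing)."""
    by_expert = {}
    for domain, expert in assignments:
        by_expert.setdefault(expert, []).append(domain)
    purities = [Counter(ds).most_common(1)[0][1] / len(ds) for ds in by_expert.values()]
    return sum(purities) / len(purities)

# (domain, expert) routing decisions
ideal = [("code", 0)] * 10 + [("math", 1)] * 10   # reference-router upper bound
mixed = ([("code", 0)] * 5 + [("math", 0)] * 5 +  # both experts see both domains
         [("code", 1)] * 5 + [("math", 1)] * 5)
print(specialization(ideal), specialization(mixed))
```

A learned router's score between these two extremes quantifies how far its experts have specialized along the data mix's domains.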
【15】Data Leakage in Automotive Perception: Practitioners' Insights
标题:汽车感知中的数据泄露:从业者的见解
链接:https://arxiv.org/abs/2604.06899
作者:Md Abu Ahammed Babu,Sushant Kumar Pandey,Darko Durisic,Andras Balint,Miroslaw Staron
摘要
:Data leakage is the inadvertent transfer of information between training and evaluation datasets that poses a subtle, yet critical, risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive perception. While leakage is widely recognized in research, little is known about how industrial practitioners actually perceive and manage it in practice. This study investigates practitioners' knowledge, experiences, and mitigation strategies around data leakage through ten semi-structured interviews with system design, development, and verification engineers working on automotive perception functions development. Using reflexive thematic analysis, we identify that knowledge of data leakage is widespread and fragmented along role boundaries: ML engineers conceptualize it as a data-splitting or validation issue, whereas design and verification roles interpret it in terms of representativeness and scenario coverage. Detection commonly arises through generic considerations and observed performance anomalies rather than dedicated tools. However, data leakage prevention is more commonly practiced, relying mostly on experience and knowledge sharing. These findings suggest that leakage control is a socio-technical coordination problem distributed across roles and workflows. We discuss implications for ML reliability engineering, highlighting the need for shared definitions, traceable data practices, and continuous cross-role communication to institutionalize data leakage awareness within automotive ML development.
【16】MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems
标题:MENO:动态系统的MeanFlow增强神经运算符
链接:https://arxiv.org/abs/2604.06881
作者:Tianyue Yang,Xiao Xue
备注:23 pages, 9 figures
摘要:Neural operators have emerged as powerful surrogates for dynamical systems due to their grid-invariant properties and computational efficiency. However, the Fourier-based neural operator framework inherently truncates high-frequency components in spectral space, resulting in the loss of small-scale structures and degraded prediction quality at high resolutions when trained on low-resolution data. While diffusion-based enhancement methods can recover multi-scale features, they introduce substantial inference overhead that undermines the efficiency advantage of neural operators. In this work, we introduce MeanFlow-Enhanced Neural Operators (MENO), a novel framework that achieves accurate all-scale predictions with minimal inference cost. By leveraging the improved MeanFlow method, MENO restores both small-scale details and large-scale dynamics with superior physical fidelity and statistical accuracy. We evaluate MENO on three challenging dynamical systems, including phase-field dynamics, 2D Kolmogorov flow, and active matter dynamics, at resolutions up to 256$\times$256. Across all benchmarks, MENO improves the power spectrum density accuracy by up to a factor of 2 compared to baseline neural operators while achieving 12$\times$ faster inference than the state-of-the-art Denoising Diffusion Implicit Model (DDIM)-enhanced counterparts, effectively bridging the gap between accuracy and efficiency. The flexibility and efficiency of MENO position it as an efficient surrogate model for scientific machine learning applications where both statistical integrity and computational efficiency are paramount.
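The spectral truncation at the heart of Fourier-based operators can be demonstrated directly: zeroing all but the lowest modes removes small-scale content exactly, and the RMS of the discarded part quantifies what an enhancement stage must restore (the grid size, cutoff, and test signal below are arbitrary illustrative choices):

```python
import numpy as np

def truncate_modes(u, k_max):
    """Keep only the lowest k_max Fourier modes, as in a spectral convolution layer."""
    U = np.fft.rfft(u)
    U[k_max:] = 0.0                           # discard all high-frequency modes
    return np.fft.irfft(U, n=len(u))

n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(3 * x) + 0.3 * np.sin(40 * x)      # large-scale + small-scale content
u_lo = truncate_modes(u, 12)                  # cutoff keeps mode 3, drops mode 40
lost = np.sqrt(np.mean((u - u_lo) ** 2))      # RMS of the lost small-scale part
print(lost)
```

The discarded component is exactly the 0.3·sin(40x) term, so the lost RMS equals 0.3/√2 up to floating-point error.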
【17】FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization
标题:FedDetox:通过设备端数据净化实现稳健的联邦SLM对齐
链接:https://arxiv.org/abs/2604.06833
作者:Shunan Zhu,Jiawei Chen,Yonghao Yu,Hideya Ochiai
摘要:As high-quality public data becomes scarce, Federated Learning (FL) provides a vital pathway to leverage valuable private user data while preserving privacy. However, real-world client data often contains toxic or unsafe information. This leads to a critical issue we define as unintended data poisoning, which can severely damage the safety alignment of global models during federated alignment. To address this, we propose FedDetox, a robust framework tailored for Small Language Models (SLMs) on resource-constrained edge devices. We first employ knowledge distillation to transfer sophisticated safety-alignment capabilities from large-scale, safety-aligned teacher models into lightweight student classifiers suitable for edge deployment. Then, during federated learning for human preference alignment, each edge client identifies unsafe samples at the source and replaces them with refusal templates, effectively transforming potential poisons into positive safety signals. Experiments demonstrate that our approach preserves model safety at a level comparable to centralized baselines without compromising general utility.
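The sanitize-and-replace step is simple to sketch. The toy keyword check below is a hypothetical stand-in for the distilled student safety classifier, and the refusal template is likewise illustrative:

```python
REFUSAL = "I'm sorry, but I can't help with that request."

def is_unsafe(text):
    # Stand-in for the distilled lightweight safety classifier.
    banned = {"exploit", "poison"}
    return any(word in text.lower() for word in banned)

def sanitize_batch(samples):
    """Replace flagged replies with the refusal template, turning
    potential poisons into positive safety signals before local training."""
    return [(prompt, REFUSAL if is_unsafe(prompt) else reply)
            for prompt, reply in samples]

batch = [
    ("How do I exploit this server?", "Sure, step one is..."),
    ("What is the capital of France?", "Paris."),
]
clean = sanitize_batch(batch)
```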
【18】CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing
标题:CBM-Dual:一款用于双功能模拟退火与储备池计算的65纳米全连接混沌玻尔兹曼机处理器
链接:https://arxiv.org/abs/2604.06808
作者:Kanta Yoshioka,Soshi Hirayae,Yuichiro Tanaka,Yuichi Katori,Takashi Morie,Hakaru Tamukoh
备注:3 pages, 9 figures
摘要:This paper presents CBM-Dual, the first silicon-proven digital chaotic dynamics processor (CDP) supporting both simulated annealing (SA) and reservoir computing (RC). CBM-Dual enables real-time decision-making and lightweight adaptation for autonomous Edge AI, employing the largest-scale fully connected 1024-neuron chaotic Boltzmann machine (CBM). To address the high computational and area costs of digital CDPs, we propose: 1) a CBM-specific scheduler that exploits an inherently low neuron flip rate to reduce multiply-accumulate operations by 99%, and 2) an efficient multiply splitting scheme that reduces the area by 59%. Fabricated in 65nm (12mm$^2$), CBM-Dual achieves simultaneous heterogeneous task execution and state-of-the-art energy efficiency, delivering $\times$25-54 and $\times$4.5 improvements in the SA and RC fields, respectively.
【19】Steering the Verifiability of Multimodal AI Hallucinations
标题:引导多模式人工智能幻觉的可验证性
链接:https://arxiv.org/abs/2604.06714
作者:Jianhong Pang,Ruoxi Cheng,Ziyi Ye,Xingjun Ma,Zuxuan Wu,Xuanjing Huang,Yu-Gang Jiang
摘要:AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucinated contents can be detected by human users (i.e., obvious hallucinations), while others are often missed or require more verification effort (i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet, little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. Further, we propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We reveal that obvious and elusive hallucinations elicit different intervention probes, allowing for fine-grained control over the model's verifiability. Empirical results demonstrate the efficacy of this approach and show that targeted interventions yield superior performance in regulating the corresponding verifiability. Moreover, simply mixing these interventions enables flexible control over the verifiability required for different scenarios.
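A common way to implement such activation-space probes is a difference-of-means direction plus additive steering; the sketch below is a generic illustration of that pattern, not the authors' exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_probe(pos, neg):
    """Difference-of-means probe: a unit direction separating two classes."""
    d = pos.mean(axis=0) - neg.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(h, direction, alpha):
    """Add a scaled copy of the probe direction to a hidden state."""
    return h + alpha * direction

pos = rng.normal(+1.0, 0.1, size=(50, 8))  # activations on one hallucination type
neg = rng.normal(-1.0, 0.1, size=(50, 8))  # activations on the other type
d = fit_probe(pos, neg)

h = np.zeros(8)
h_steered = steer(h, d, alpha=2.0)
```

Learning one such probe per hallucination type, as the abstract describes, then allows type-specific interventions.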
【20】FlowAdam: Implicit Regularization via Geometry-Aware Soft Momentum Injection
标题:FlowAdam:通过几何感知软动量注入的隐式正则化
链接:https://arxiv.org/abs/2604.06652
作者:Devender Singh,Tarun Sheel
备注:Accepted at IJCNN 2026 (IEEE WCCI). 8 pages, 4 figures
摘要:Adaptive moment methods such as Adam use a diagonal, coordinate-wise preconditioner based on exponential moving averages of squared gradients. This diagonal scaling is coordinate-system dependent and can struggle with dense or rotated parameter couplings, including those in matrix factorization, tensor decomposition, and graph neural networks, because it treats each parameter independently. We introduce FlowAdam, a hybrid optimizer that augments Adam with continuous gradient-flow integration via an ordinary differential equation (ODE). When EMA-based statistics detect landscape difficulty, FlowAdam switches to clipped ODE integration. Our central contribution is Soft Momentum Injection, which blends ODE velocity with Adam's momentum during mode transitions. This prevents the training collapse observed with naive hybrid approaches. Across coupled optimization benchmarks, the ODE integration provides implicit regularization, reducing held-out error by 10-22% on low-rank matrix/tensor recovery and 6% on Jester (real-world collaborative filtering), also surpassing tuned Lion and AdaBelief, while matching Adam on well-conditioned workloads (CIFAR-10). MovieLens-100K confirms benefits arise specifically from coupled parameter interactions rather than bias estimation. Ablation studies show that soft injection is essential, as hard replacement reduces accuracy from 100% to 82.5%.
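The blending step can be pictured as a convex combination of the two velocity estimates; this is a schematic reading of "Soft Momentum Injection", with all details hypothetical:

```python
def soft_inject(m_adam, v_ode, gamma):
    """Blend Adam's first moment with the ODE velocity during a mode
    transition; gamma=0 keeps Adam's momentum, gamma=1 adopts the ODE
    velocity outright (the hard replacement the ablation warns against)."""
    return [(1.0 - gamma) * m + gamma * v for m, v in zip(m_adam, v_ode)]

blended = soft_inject([1.0, -2.0], [3.0, 0.0], gamma=0.25)
```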
【21】The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence
标题:大卫·布莱克威尔博士的定理及其对人工智能的贡献
链接:https://arxiv.org/abs/2604.06621
作者:Napoleon Paxton
备注:Survey article, 19 pages, 1 figure, 2 tables
摘要:Dr. David Blackwell was a mathematician and statistician of the first rank, whose contributions to statistical theory, game theory, and decision theory predated many of the algorithmic breakthroughs that define modern artificial intelligence. This survey examines three of his most consequential theoretical results, the Rao-Blackwell theorem, the Blackwell Approachability theorem, and the Blackwell Informativeness theorem (comparison of experiments), and traces their direct influence on contemporary AI and machine learning. We show that these results, developed primarily in the 1940s and 1950s, remain technically live across modern subfields including Markov Chain Monte Carlo inference, autonomous mobile robot navigation (SLAM), generative model training, no-regret online learning, reinforcement learning from human feedback (RLHF), large language model alignment, and information design. NVIDIA's 2024 decision to name its flagship GPU architecture "Blackwell" provides vivid testament to his enduring relevance. We also document an emerging frontier: explicit Rao-Blackwellized variance reduction in LLM RLHF pipelines, recently proposed but not yet standard practice. Together, Blackwell's theorems form a unified framework addressing information compression, sequential decision making under uncertainty, and the comparison of information sources, precisely the problems at the core of modern AI.
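The Rao-Blackwell theorem mentioned here has a one-screen numeric illustration: conditioning an estimator on more informative structure never increases its variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

p = rng.uniform(size=n)                        # latent success probability
x = (rng.uniform(size=n) < p).astype(float)    # Bernoulli(p) outcome

# Both estimate E[X] = 1/2, but the conditioned version E[X | p] = p
# has strictly smaller variance (1/12 vs. 1/4 in this toy setup).
naive = x
rao_blackwellized = p

var_naive = naive.var()
var_rb = rao_blackwellized.var()
```

The same conditioning trick underlies the Rao-Blackwellized variance reduction in RLHF pipelines noted in the abstract.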
【22】Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
标题:具有理论泛化保证的混合专家模型高效量化
链接:https://arxiv.org/abs/2604.06515
作者:Mohammed Nowaz Rabbani Chowdhury,Kaoutar El Maghraoui,Hsinyu Tsai,Naigang Wang,Geoffrey W. Burr,Liu Liu,Meng Wang
摘要:Sparse Mixture-of-Experts (MoE) allows language and vision models to scale efficiently by activating only a small subset of experts per input. While this reduces computation, the large number of parameters still incurs substantial memory overhead during inference. Post-training quantization has been explored to address this issue. Because uniform quantization suffers from significant accuracy loss at low bit-widths, mixed-precision methods have recently been explored; however, they often require substantial computation for bit-width allocation and overlook the varying sensitivity of model performance to the quantization of different experts. We propose a theoretically grounded expert-wise mixed-precision strategy that assigns a bit-width to each expert primarily based on the change in its router's $l_2$ norm during training. Experts with smaller changes are shown to capture less frequent but critical features, and model performance is more sensitive to the quantization of these experts, thus requiring higher precision. Furthermore, to avoid assigning low precision to experts whose quantization would inject high noise, experts with large maximum intra-neuron variance are also allocated higher precision. Experiments on large-scale MoE models, including Switch Transformer and Mixtral, show that our method achieves higher accuracy than existing approaches, while also reducing inference cost and incurring only negligible overhead for bit-width assignment.
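The allocation rule sketched in the abstract can be phrased as a simple threshold on per-expert router movement; the cutoff and bit-widths below are illustrative, not the paper's values:

```python
import numpy as np

def allocate_bits(router_shift, low_bits=4, high_bits=8):
    """Give higher precision to experts whose router row moved least
    during training (the sensitivity signal described above)."""
    cutoff = np.median(router_shift)
    return [high_bits if shift < cutoff else low_bits for shift in router_shift]

shifts = np.array([0.1, 2.0, 0.3, 1.5])  # l2-norm change per expert's router row
bits = allocate_bits(shifts)
```

The paper's method additionally promotes high-variance experts to higher precision; that second criterion is omitted here for brevity.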
【23】Improving Robustness In Sparse Autoencoders via Masked Regularization
标题:通过掩码正则化提高稀疏自动编码器的鲁棒性
链接:https://arxiv.org/abs/2604.06495
作者:Vivek Narayanaswamy,Kowshik Thopalli,Bhavya Kailkhura,Wesam Sakla
备注:4 pages, 1 figure
摘要:Sparse autoencoders (SAEs) are widely used in mechanistic interpretability to project LLM activations onto sparse latent spaces. However, sparsity alone is an imperfect proxy for interpretability, and current training objectives often result in brittle latent representations. SAEs are known to be prone to feature absorption, where general features are subsumed by more specific ones due to co-occurrence, degrading interpretability despite high reconstruction fidelity. Recent negative results on Out-of-Distribution (OOD) performance further underscore broader robustness-related failures tied to under-specified training objectives. We address this by proposing a masking-based regularization that randomly replaces tokens during training to disrupt co-occurrence patterns. This improves robustness across SAE architectures and sparsity levels, reducing absorption, enhancing probing performance, and narrowing the OOD gap. Our results point toward a practical path for more reliable interpretability tools.
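A minimal version of the token-replacement regularizer described here (vocabulary and masking rate hypothetical):

```python
import random

def mask_tokens(tokens, vocab, p=0.15, seed=0):
    """Randomly replace a fraction p of tokens with draws from the
    vocabulary, disrupting co-occurrence patterns in the training stream."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]

vocab = ["the", "cat", "sat", "mat", "dog"]
stream = ["the", "cat", "sat", "on", "the", "mat"] * 10
masked = mask_tokens(stream, vocab, p=0.5)
changed = sum(a != b for a, b in zip(stream, masked))
```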
【24】Visual prompting reimagined: The power of the Activation Prompts
标题:视觉提示重新想象:激活提示的力量
链接:https://arxiv.org/abs/2604.06440
作者:Yihua Zhang,Hongkang Li,Yuguang Yao,Aochuan Chen,Shuai Zhang,Pin-Yu Chen,Meng Wang,Sijia Liu
备注:AISTATS 2026
摘要:Visual prompting (VP) has emerged as a popular method to repurpose pretrained vision models for adaptation to downstream tasks. Unlike conventional model fine-tuning techniques, VP introduces a universal perturbation directly into the input data to facilitate task-specific fine-tuning rather than modifying model parameters. However, there exists a noticeable performance gap between VP and conventional fine-tuning methods, highlighting an unexplored realm in theory and practice to understand and advance the input-level VP to reduce its current performance gap. Towards this end, we introduce a generalized concept, termed activation prompt (AP), which extends the scope of the input-level VP by enabling universal perturbations to be applied to activation maps within the intermediate layers of the model. By using AP to revisit the problem of VP and employing it as an analytical tool, we demonstrate the intrinsic limitations of VP in both performance and efficiency, revealing why input-level prompting may lack effectiveness compared to AP, which exhibits a model-dependent layer preference. We show that AP is closely related to normalization tuning in convolutional neural networks and vision transformers, although each model type has distinct layer preferences for prompting. We also theoretically elucidate the rationale behind such a preference by analyzing global features across layers. Through extensive experiments across 29 datasets and various model architectures, we provide a comprehensive performance analysis of AP, comparing it with VP and parameter-efficient fine-tuning baselines. Our results demonstrate AP's superiority in both accuracy and efficiency, considering factors such as time, parameters, memory usage, and throughput.
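The VP-versus-AP distinction can be shown on a frozen toy network: VP perturbs the input, AP perturbs an intermediate activation map. Everything below is a schematic stand-in for the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # frozen layer 1
W2 = rng.normal(size=(4, 16))   # frozen layer 2

def forward(x, input_prompt=None, act_prompt=None):
    if input_prompt is not None:
        x = x + input_prompt            # VP: universal input-level perturbation
    h = np.maximum(W1 @ x, 0.0)         # ReLU activation map
    if act_prompt is not None:
        h = h + act_prompt              # AP: perturb the intermediate layer
    return W2 @ h

x = rng.normal(size=8)
out_base = forward(x)
out_vp = forward(x, input_prompt=np.full(8, 0.1))
out_ap = forward(x, act_prompt=np.full(16, 0.1))
```

In the paper's terms, the choice of which layer receives the activation prompt is itself model-dependent; this toy fixes it at the single hidden layer.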
【25】Neural Computers
标题:神经计算机
链接:https://arxiv.org/abs/2604.06425
作者:Mingchen Zhuge,Changsheng Zhao,Haozhe Liu,Zijian Zhou,Shuming Liu,Wenyi Wang,Ernie Chang,Gael Le Lan,Junjie Fei,Wenxuan Zhang,Yasheng Sun,Zhipeng Cai,Zechun Liu,Yunyang Xiong,Yining Yang,Yuandong Tian,Yangyang Shi,Vikas Chandra,Jürgen Schmidhuber
备注:Github (data pipeline): https://github.com/metauto-ai/NeuralComputer; Blogpost: https://metauto.ai/neuralcomputer/index_eng.html
摘要:We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. Unlike conventional computers, which execute explicit programs, agents, which act over external execution environments, and world models, which learn environment dynamics, NCs aim to make the model itself the running computer. Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine form, with stable execution, explicit reprogramming, and durable capability reuse. As an initial step, we study whether early NC primitives can be learned solely from collected I/O traces, without instrumented program state. Concretely, we instantiate NCs as video models that roll out screen frames from instructions, pixels, and user actions (when available) in CLI and GUI settings. These implementations show that learned runtimes can acquire early interface primitives, especially I/O alignment and short-horizon control, while routine reuse, controlled updates, and symbolic stability remain open. We outline a roadmap toward CNCs around these challenges. If overcome, CNCs could establish a new computing paradigm beyond today's agents, world models, and conventional computers.
【26】Bridging Theory and Practice in Crafting Robust Spiking Reservoirs
标题:在构建稳健脉冲储备池中沟通理论与实践
链接:https://arxiv.org/abs/2604.06395
作者:Ruggero Freddi,Nicolas Seseri,Diana Nigrisoli,Alessio Basti
摘要:Spiking reservoir computing provides an energy-efficient approach to temporal processing, but reliably tuning reservoirs to operate at the edge-of-chaos is challenging due to experimental uncertainty. This work bridges abstract notions of criticality and practical stability by introducing and exploiting the robustness interval, an operational measure of the hyperparameter range over which a reservoir maintains performance above task-dependent thresholds. Through systematic evaluations of Leaky Integrate-and-Fire (LIF) architectures on both static (MNIST) and temporal (synthetic Ball Trajectories) tasks, we identify consistent monotonic trends in the robustness interval across a broad spectrum of network configurations: the robustness-interval width decreases with presynaptic connection density $β$ (i.e., directly with sparsity) and directly with the firing threshold $θ$. We further identify specific $(β, θ)$ pairs that preserve the analytical mean-field critical point $w_{\text{crit}}$, revealing iso-performance manifolds in the hyperparameter space. Control experiments on Erdős-Rényi graphs show the phenomena persist beyond small-world topologies. Finally, our results show that $w_{\text{crit}}$ consistently falls within empirical high-performance regions, validating $w_{\text{crit}}$ as a robust starting coordinate for parameter search and fine-tuning. To ensure reproducibility, the full Python code is publicly available.
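Operationally, the robustness interval is the widest contiguous hyperparameter range whose score clears a task-dependent threshold; a generic sketch with made-up numbers:

```python
def robustness_interval(grid, scores, threshold):
    """Return (lo, hi) of the widest contiguous region of `grid` whose
    scores stay at or above `threshold`; None if no point qualifies."""
    best, width = None, -1.0
    i = 0
    while i < len(grid):
        if scores[i] >= threshold:
            j = i
            while j + 1 < len(grid) and scores[j + 1] >= threshold:
                j += 1
            if grid[j] - grid[i] > width:
                best, width = (grid[i], grid[j]), grid[j] - grid[i]
            i = j + 1
        else:
            i += 1
    return best

w_grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # e.g. a recurrent weight scale sweep
acc = [0.60, 0.82, 0.90, 0.88, 0.79, 0.55]
interval = robustness_interval(w_grid, acc, threshold=0.80)
```

The paper's finding that $w_{\text{crit}}$ falls inside such high-performance regions corresponds to the analytical critical point lying within the returned interval.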
【27】Revisiting Fairness Impossibility with Endogenous Behavior
标题:用内生行为重新审视公平的不可能性
链接:https://arxiv.org/abs/2604.06378
作者:Elizabeth Maggie Penn,John W. Patty
摘要:In many real-world settings, institutions can and do adjust the consequences attached to algorithmic classification decisions, such as the size of fines, sentence lengths, or benefit levels. We refer to these consequences as the stakes associated with classification. These stakes can give rise to behavioral responses to classification, as people adjust their actions in anticipation of how they will be classified. Much of the algorithmic fairness literature evaluates classification outcomes while holding behavior fixed, treating behavioral differences across groups as exogenous features of the environment. Under this assumption, the stakes of classification play no role in shaping outcomes. We revisit classic impossibility results in algorithmic fairness in a setting where people respond strategically to classification. We show that, in this environment, the well-known incompatibility between error-rate balance and predictive parity disappears, but only by potentially introducing a qualitatively different form of unequal treatment. Concretely, we construct a two-stage design in which a classifier first standardizes its statistical performance across groups, and then adjusts stakes so as to induce comparable patterns of behavior. This requires treating groups differently in the consequences attached to identical classification decisions. Our results demonstrate that fairness in strategic settings cannot be assessed solely by how algorithms map data into decisions. Rather, our analysis treats the human consequences of classification as primary design variables, introduces normative criteria governing their use, and shows that their interaction with statistical fairness criteria generates qualitatively new tradeoffs. Our aim is to make these tradeoffs precise and explicit.
【28】ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache
标题:ForkKV:通过写时复制分离式KV缓存扩展多LoRA代理服务
链接:https://arxiv.org/abs/2604.06370
作者:Shao Wang,Rui Ren,Lin Gui
摘要:The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. While Low-Rank Adaptation (LoRA) enables the efficient co-hosting of these specialized agents on a single base model, it introduces a critical memory footprint bottleneck during serving. Specifically, unique LoRA activations cause Key-Value (KV) cache divergence across agents, rendering traditional prefix caching ineffective for shared contexts. This forces redundant KV cache maintenance, rapidly saturating GPU capacity and degrading throughput. To address this challenge, we introduce ForkKV, a serving system for multi-LoRA agent workflows centered around a novel memory management paradigm in OS: fork with copy-on-write (CoW). By exploiting the structural properties of LoRA, ForkKV physically decouples the KV cache into a massive shared component (analogous to the parent process's memory pages) and lightweight agent-specific components (the child process's pages). To support this mechanism, we propose a DualRadixTree architecture that allows newly forked agents to inherit the massive shared cache and apply CoW semantics for their lightweight unique cache. Furthermore, to guarantee efficient execution, we design ResidualAttention, a specialized kernel that reconstructs the disaggregated KV cache directly within on-chip SRAM. Comprehensive evaluations across diverse language models and practical datasets of different tasks demonstrate that ForkKV achieves up to 3.0x the throughput of state-of-the-art multi-LoRA serving systems with a negligible impact on generation quality.
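The fork/copy-on-write semantics at the heart of ForkKV mirror OS memory management and can be sketched with plain dictionaries (a schematic of the idea, not the ForkKV data structures):

```python
class CowCache:
    """A forked child shares the parent's pages and copies only on write."""

    def __init__(self, shared=None):
        self.shared = shared if shared is not None else {}
        self.private = {}

    def fork(self):
        return CowCache(shared=self.shared)   # inherit shared pages by reference

    def read(self, key):
        return self.private.get(key, self.shared.get(key))

    def write(self, key, value):
        self.private[key] = value             # CoW: never mutate shared pages

parent = CowCache()
parent.shared["prefix"] = "shared-context KV"
child = parent.fork()
child.write("prefix", "agent-specific KV")
```

In the paper, the shared component holds the massive common-context KV cache while the private component holds each LoRA agent's lightweight divergent entries.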
【29】WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
标题:WebSP-Eval:在网站安全与隐私任务上评估Web代理
链接:https://arxiv.org/abs/2604.06367
作者:Guruprasad Viswanathan Ramesh,Asmit Nayak,Basieem Siddique,Kassem Fawaz
摘要:Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises 1) a manually crafted task dataset of 200 task instances across 28 websites; 2) a robust agentic system supporting account and initial-state management across runs using a custom Google Chrome extension; and 3) an automated evaluator. We evaluate a total of 8 web agent instantiations using state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models suffer from limited autonomous exploration capabilities to reliably solve website security and privacy tasks, and struggle with specific task categories and websites. Crucially, we identify that stateful UI elements such as toggles and checkboxes are a primary cause of agent failure, with failure rates above 45% on tasks containing these elements across many models.
【30】Drifting Fields are not Conservative
标题:漂移场并非保守场
链接:https://arxiv.org/abs/2604.06333
作者:Leonard Franz,Sebastian Hoffmann,Georg Martius
备注:19 pages, 7 figures
摘要:Drifting models generate high-quality samples in a single forward pass by transporting generated samples toward the data distribution using a vector-valued drift field. We investigate whether this procedure is equivalent to optimizing a scalar loss and find that, in general, it is not: drift fields are not conservative, meaning they cannot be written as the gradient of any scalar potential. We identify the position-dependent normalization as the source of non-conservatism. The Gaussian kernel is the unique exception where the normalization is harmless and the drift field is exactly the gradient of a scalar function. Generalizing this, we propose an alternative normalization via a related kernel (the sharp kernel) which restores conservatism for any radial kernel, yielding well-defined loss functions for training drifting models. While the drifting-field matching objective is strictly more general than loss minimization, as it can implement non-conservative transport fields that no scalar loss can reproduce, we observe that the practical gains obtained from this flexibility are minimal. We thus propose to train drifting models with the conceptually simpler loss-based formulations.
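The paper's central claim has a quick numeric check: a smooth field is a gradient only if its Jacobian is symmetric. Finite differences expose the asymmetry of a rotational field (a generic illustration of the conservativity test, not the paper's kernels):

```python
import numpy as np

def jacobian_asymmetry(F, x, eps=1e-5):
    """Max |dF_i/dx_j - dF_j/dx_i| at x, via central differences; zero
    (up to discretization error) iff the field can locally be a gradient."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return np.max(np.abs(J - J.T))

grad_field = lambda x: 2.0 * x                  # gradient of ||x||^2: conservative
rot_field = lambda x: np.array([-x[1], x[0]])   # pure rotation: not conservative

p = np.array([0.3, -0.7])
a_grad = jacobian_asymmetry(grad_field, p)
a_rot = jacobian_asymmetry(rot_field, p)
```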
【31】Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs
标题:难度缩放的局限性:硬样本在GRPO微调的SLM中回报递减
链接:https://arxiv.org/abs/2604.06298
作者:Suraj Yadav,Siddharth Yadav,Parth Goyal
备注:Accepted at ICLR Workshop 2026 ICBINB
摘要:Recent alignment work on Large Language Models (LLMs) suggests preference optimization can improve reasoning by shifting probability mass toward better solutions. We test this claim in a resource-constrained setting by applying GRPO with LoRA to SLMs (up to 3B) for math reasoning on the GSM8K and MATH datasets with difficulty-stratified analyses. As problem difficulty increases, accuracy plateaus, revealing a capacity boundary: GRPO primarily reshapes output preferences without reliably improving hardest-tier solving. Consistent with this, training GRPO only on lower-difficulty problems matches full-dataset accuracy across difficulty tiers while using only ~45% of the training steps, indicating diminishing returns from harder samples in this regime. We also find a cross-dataset generalization effect: GSM8K-trained GRPO achieves higher accuracy on the numeric subset of MATH than MATH-trained GRPO, exceeding it by ~5% at 1.5B and by ~3% at 3B. We show that the best achievable gains depend strongly on the base model's prior reasoning competence and the dataset's difficulty profile.
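For reference, GRPO's group-relative advantage, the quantity driving the preference-reshaping observed above, normalizes each sampled completion's reward against its own group's statistics:

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """Standard group-relative advantage: (r - group mean) / group std."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt, rewarded 1 for a correct answer, 0 otherwise.
adv = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Note that on uniformly failed hard problems all rewards are equal, so every advantage is zero, one intuition for the diminishing returns from hardest-tier samples.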
【32】FLeX: Fourier-based Low-rank EXpansion for multilingual transfer
标题:FLeX:用于多语言迁移的基于傅里叶的低秩扩展
链接:https://arxiv.org/abs/2604.06253
作者:Gaurav Narasimhan
备注:19 pages, 25 figures, Stanford CS224N Custom Project
摘要:Cross-lingual code generation is critical in enterprise environments where multiple programming languages coexist. However, fine-tuning large language models (LLMs) individually for each language is computationally prohibitive. This paper investigates whether parameter-efficient fine-tuning methods and optimizer enhancements can improve cross-lingual transfer from Python to languages like Java. We fine-tune the Code Llama 7B model using low-rank adaptation (LoRA) to optimize a small subset of parameters and compare Adam and Sophia optimizers, while exploring a novel Fourier-based regularization technique. Our contributions include: (1) demonstrating that LoRA fine-tuning on a small, high-quality dataset (MBPP) can exceed the pass@1 performance of the more broadly fine-tuned Code Llama-Python-7B model (40.1% vs. 38.4%); (2) showing that while Sophia achieves faster convergence than Adam, final pass@1 scores show marginal differences; and (3) presenting evidence that Fourier-based regularization during fine-tuning significantly improves cross-lingual transfer, achieving 42.1% pass@1 on Java tasks compared to the 34.2% baseline. These findings suggest that combining LoRA, optimized training methods, and frequency-domain regularization can efficiently adapt single-language LLMs to perform well across multiple programming languages.
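The LoRA setup used above factorizes the weight update as a rank-r product while the base weight stays frozen; a minimal sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-init

def adapted_forward(x):
    """Base path plus low-rank path; at init the adapter is a no-op."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
trainable = A.size + B.size               # 512 parameters vs. 4096 in W
```

Only A and B are updated during fine-tuning, which is what makes per-language adaptation affordable.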
【33】Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse
标题:概率语言字典树:压缩、决策策略与执行重用的统一框架
链接:https://arxiv.org/abs/2604.06228
作者:Gregory Magarshak
备注:24 pages, 2 figures
摘要:We introduce probabilistic language tries (PLTs), a unified representation that makes explicit the prefix structure implicitly defined by any generative model over sequences. By assigning to each outgoing edge the conditional probability of the corresponding token or action, a PLT simultaneously serves as: (i) an optimal lossless compressor via frequency-weighted interval encoding, generalizing arithmetic coding to model-conditioned distributions; (ii) a policy representation for sequential decision problems including games, search, and robotic control; and (iii) a memoization index that lets repeated inference queries be answered by structured retrieval rather than full model execution. The central technical result is a prior-guided caching theorem: under a stationary generative distribution, a PLT-guided cache achieves strictly lower expected inference cost than any empirical-frequency cache for all query counts below a threshold that grows with the concentration of the prior. This converts the $O(n^2)$ transformer attention cost into an expected cost of $p_r \cdot O(\log N) + (1 - p_r) \cdot O(n^2)$, where $p_r$ is the prior-estimated reuse probability and $N$ is the artifact store size. We further introduce a hybrid compression architecture decomposing any dataset into a PLT-covered majority and a sparse residual store, connecting arithmetic coding with Kolmogorov-style program representations and rate-distortion theory. We instantiate the framework across chess, web search, robotics, organizational workflows, and LLM inference, demonstrating that compression, decision making, and computational reuse are all derived from a single probability measure on sequence space.
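The quoted cost model is worth making concrete: reuse pays off whenever the prior-estimated reuse probability is appreciable, because a log-time lookup replaces quadratic attention. A direct transcription with unit constants (all numbers illustrative):

```python
import math

def expected_cost(p_r, n, store_size):
    """p_r * O(log N) lookup + (1 - p_r) * O(n^2) full inference,
    taking both hidden constants as 1 for illustration."""
    return p_r * math.log2(store_size) + (1.0 - p_r) * n * n

high_reuse = expected_cost(p_r=0.9, n=1024, store_size=1 << 20)
low_reuse = expected_cost(p_r=0.1, n=1024, store_size=1 << 20)
```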
【34】The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours
标题:具有最近邻的高可扩展高斯过程回归的理论与实践
链接:https://arxiv.org/abs/2604.07267
作者:Robert Allison,Tomasz Maciazek,Anthony Stephenson
备注:92 pages (35-page main text + self-contained appendix with theorem proofs and auxiliary lemmas)
摘要:Gaussian process ($GP$) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process ($NNGP$) regression for geospatial problems and the related scalable $GPnn$ method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of $NNGP/GPnn$ remains incomplete. We develop a theoretical framework for $NNGP$ and $GPnn$ regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error ($MSE$), calibration coefficient ($CAL$), and negative log-likelihood ($NLL$). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2α/(2p+d)}$, where $α$ and $p$ capture regularity of the regression problem. We also prove uniform convergence of $MSE$ over compact hyper-parameter sets and show that its derivatives with respect to lengthscale, kernel scale, and noise variance vanish asymptotically, with explicit rates. This explains the observed robustness of $GPnn$ to hyper-parameter tuning. These results provide a rigorous statistical foundation for $NNGP/GPnn$ as a highly scalable and principled alternative to full $GP$ models.
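The GPnn prediction rule itself is compact: run an exact GP predictive mean, but conditioned only on each test point's m nearest training points. A 1-D sketch with an RBF kernel (hyperparameters arbitrary, not tuned as in the paper):

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gpnn_mean(x_train, y_train, x_star, m=8, noise=1e-2):
    """Exact GP predictive mean using only the m nearest neighbours."""
    idx = np.argsort(np.abs(x_train - x_star))[:m]
    Xn, yn = x_train[idx], y_train[idx]
    K = rbf(Xn, Xn) + noise * np.eye(m)
    k_star = rbf(np.array([x_star]), Xn)[0]
    return k_star @ np.linalg.solve(K, yn)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 400))
y = np.sin(X) + 0.05 * rng.normal(size=400)
pred = gpnn_mean(X, y, x_star=1.0)
```

Each prediction costs O(m^3) instead of the O(n^3) of a full GP fit, which is the scalability the paper's theory underwrites.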
【35】Amortized Filtering and Smoothing with Conditional Normalizing Flows
标题:使用条件归一化流进行摊销滤波与平滑
链接:https://arxiv.org/abs/2604.07169
作者:Tiangang Cui,Xiaodong Feng,Chenlong Pei,Xiaoliang Wan,Tao Zhou
备注:43 pages
摘要:Bayesian filtering and smoothing for high-dimensional nonlinear dynamical systems are fundamental yet challenging problems in many areas of science and engineering. In this work, we propose AFSF, a unified amortized framework for filtering and smoothing with conditional normalizing flows. The core idea is to encode each observation history into a fixed-dimensional summary statistic and use this shared representation to learn both a forward flow for the filtering distribution and a backward flow for the backward transition kernel. Specifically, a recurrent encoder maps each observation history to a fixed-dimensional summary statistic whose dimension does not depend on the length of the time series. Conditioned on this shared summary statistic, the forward flow approximates the filtering distribution, while the backward flow approximates the backward transition kernel. The smoothing distribution over an entire trajectory is then recovered by combining the terminal filtering distribution with the learned backward flow through the standard backward recursion. By learning the underlying temporal evolution structure, AFSF also supports extrapolation beyond the training horizon. Moreover, by coupling the two flows through shared summary statistics, AFSF induces an implicit regularization across latent state trajectories and improves trajectory-level smoothing. In addition, we develop a flow-based particle filtering variant that provides an alternative filtering procedure and enables ESS-based diagnostics when explicit model factors are available. Numerical experiments demonstrate that AFSF provides accurate approximations of both filtering distributions and smoothing paths.
【36】A solver-in-the-loop framework for end-to-end differentiable coastal hydrodynamics
标题:端到端可微沿海水动力学的求解器在环框架
链接:https://arxiv.org/abs/2604.07129
作者:Elsa Cardoso-Bihlo,Alex Bihlo
备注:23 pages, 9 figures
摘要:Numerical simulation of wave propagation and run-up is a cornerstone of coastal engineering and tsunami hazard assessment. However, applying these forward models to inverse problems, such as bathymetry estimation, source inversion, and structural optimization, remains notoriously difficult due to the rigidity and high computational cost of deriving discrete adjoints. In this paper, we introduce AegirJAX, a fully differentiable hydrodynamic solver based on the depth-integrated, non-hydrostatic shallow-water equations. By implementing the solver entirely within a reverse-mode automatic differentiation framework, AegirJAX treats the time-marching physics loop as a continuous computational graph. We demonstrate the framework's versatility across a suite of scientific machine learning tasks: (1) discovering regime-specific neural corrections for model misspecifications in highly dispersive wave propagation; (2) performing continuous topology optimization for breakwater design; (3) training recurrent neural networks in-the-loop for active wave cancellation; and (4) inverting hidden bathymetry and submarine landslide kinematics directly from downstream sensor data. The proposed differentiable paradigm fundamentally blurs the line between forward simulation and inverse optimization, offering a unified, end-to-end framework for coastal hydrodynamics.
【37】Continuous-Time Dynamics of the Difference-of-Convex Algorithm
标题:凸差算法的连续时间动态
链接:https://arxiv.org/abs/2604.06926
作者:Yi-Shuai Niu
备注:22 pages
摘要:We study the continuous-time structure of the difference-of-convex algorithm (DCA) for smooth DC decompositions with a strongly convex component. In dual coordinates, classical DCA is exactly the full-step explicit Euler discretization of a nonlinear autonomous system. This viewpoint motivates a damped DCA scheme, which is also a Bregman-regularized DCA variant, and whose vanishing-step limit yields a Hessian-Riemannian gradient flow generated by the convex part of the decomposition. For the damped scheme we prove monotone descent, asymptotic criticality, Kurdyka-Lojasiewicz convergence under boundedness, and a global linear rate under a metric DC-PL inequality. For the limiting flow we establish an exact energy identity, asymptotic criticality of bounded trajectories, explicit global rates under metric relative error bounds, finite-length and single-point convergence under a Kurdyka-Lojasiewicz hypothesis, and local exponential convergence near nondegenerate local minima. The analysis also reveals a global-local tradeoff: the half-relaxed scheme gives the best provable global guarantee in our framework, while the full-step scheme is locally fastest near a nondegenerate minimum. Finally, we show that different DC decompositions of the same objective induce different continuous dynamics through the metric generated by the convex component, providing a geometric criterion for decomposition quality and linking DCA with Bregman geometry.
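The damped scheme admits a simple one-dimensional illustration. For the double well f(x) = x^4/4 - x^2 one can take the DC decomposition f = g - h with g(x) = x^4/4 + (c/2)x^2 (strongly convex, since g'' = 3x^2 + c >= c > 0) and h(x) = (c/2 + 1)x^2; the DCA subproblem then reduces to solving a monotone cubic. The decomposition and parameters below are chosen for illustration and are not taken from the paper.

```python
def f(x):
    """Double-well objective f(x) = x^4/4 - x^2, a smooth DC function."""
    return x**4 / 4 - x**2

c = 1.0  # strong-convexity parameter of the convex part g

def dca_subproblem(xk):
    """Solve the convex subproblem g'(x) = h'(xk): x^3 + c*x = (c + 2)*xk."""
    rhs, x = (c + 2) * xk, xk
    for _ in range(50):                      # Newton on a monotone cubic
        x -= (x**3 + c * x - rhs) / (3 * x**2 + c)
    return x

def damped_dca(x0, alpha, iters=200):
    """alpha = 1: classical DCA (full-step explicit Euler); alpha < 1: damped."""
    x, vals = x0, [f(x0)]
    for _ in range(iters):
        x = (1 - alpha) * x + alpha * dca_subproblem(x)
        vals.append(f(x))                    # objective values along the run
    return x, vals

x_full, v_full = damped_dca(1.0, alpha=1.0)  # classical (full-step) scheme
x_half, v_half = damped_dca(1.0, alpha=0.5)  # half-relaxed damped scheme
print(x_full, x_half)                        # both approach sqrt(2)
```

Both runs exhibit the monotone descent the analysis proves for the damped scheme, and both converge to the nonzero critical point x = sqrt(2), the fixed point of the subproblem map.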
【38】A Generalized Sinkhorn Algorithm for Mean-Field Schrödinger Bridge
标题:平均场薛定谔桥的广义Sinkhorn算法
链接:https://arxiv.org/abs/2604.06531
作者:Asmaa Eldesoukey,Yongxin Chen,Abhishek Halder
摘要:The mean-field Schrödinger bridge (MFSB) problem concerns designing a minimum-effort controller that guides a diffusion process with nonlocal interaction to reach a given distribution from another by a fixed deadline. Unlike the standard Schrödinger bridge, the dynamical constraint for MFSB is the mean-field limit of a population of interacting agents with controls. It serves as a natural model for large-scale multi-agent systems. The MFSB is computationally challenging because the nonlocal interaction makes the problem nonconvex. We propose a generalization of the Hopf-Cole transform for MFSB and, building on it, design a Sinkhorn-type recursive algorithm to solve the associated system of integro-PDEs. Under mild assumptions on the interaction potential, we discuss convergence guarantees for the proposed algorithm. We present numerical examples with repulsive and attractive interactions to illustrate the theoretical contributions.
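The recursion being generalized here is the classical Sinkhorn scheme for the standard, interaction-free Schrödinger bridge: in discrete form, two scaling vectors are alternately updated so that the coupling diag(u) K diag(v) matches the prescribed endpoint marginals, with K playing the role of the heat-kernel prior. A minimal NumPy sketch of that baseline follows; the grid, marginals, and regularization strength are illustrative, and the paper's mean-field extension is substantially more involved.

```python
import numpy as np

# Classical Sinkhorn iterations for a discrete entropic (Schroedinger) bridge:
# find scalings u, v so that P = diag(u) K diag(v) has marginals mu and nu.
n = 50
pts = np.linspace(0.0, 1.0, n)
mu = np.exp(-80 * (pts - 0.25) ** 2); mu /= mu.sum()   # initial marginal
nu = np.exp(-80 * (pts - 0.75) ** 2); nu /= nu.sum()   # terminal marginal
eps = 0.2                                              # entropic regularization
K = np.exp(-((pts[:, None] - pts[None, :]) ** 2) / eps)  # heat-kernel prior

u, v = np.ones(n), np.ones(n)
for _ in range(2000):
    u = mu / (K @ v)       # enforce the initial marginal
    v = nu / (K.T @ u)     # enforce the terminal marginal

P = u[:, None] * K * v[None, :]          # entropic bridge coupling
print(np.abs(P.sum(axis=1) - mu).max())  # residual row-marginal error
```

At the fixed point both marginal constraints hold. In the mean-field setting the kernel itself depends on the current law through the interaction potential, which destroys the convexity this alternating scheme relies on; handling that dependence is what the generalized Hopf-Cole transform is for.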
【39】Soft-Quantum Algorithms
标题:软量子算法
链接:https://arxiv.org/abs/2604.06523
作者:Basil Kyriacou,Mo Kordzanganeh,Maniraman Periyasamy,Alexey Melnikov
备注:6 pages, 6 figures, 0 tables
摘要:Quantum operations on pure states can be fully represented by unitary matrices. Variational quantum circuits, also known as quantum neural networks, embed data and trainable parameters into gate-based operations and optimize the parameters via gradient descent. The high cost of training and low fidelity of current quantum devices, however, restricts much of quantum machine learning to classical simulation. For few-qubit problems with large datasets, training the matrix elements directly, as is done with weight matrices in classical neural networks, can be faster than decomposing data and parameters into gates. We propose a method that trains matrices directly while maintaining unitarity through a single regularization term added to the loss function. A second training step, circuit alignment, then recovers a gate-based architecture from the resulting soft-unitary. On a five-qubit supervised classification task with 1000 datapoints, this two-step process produces a trained variational circuit in under four minutes, compared to over two hours for direct circuit training, while achieving lower binary cross-entropy loss. In a second experiment, soft-unitaries are embedded in a hybrid quantum-classical network for a reinforcement learning cartpole task, where the hybrid agent outperforms a purely classical baseline of comparable size.
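The single-regularizer idea in this abstract can be sketched in a few lines: train a raw matrix W while penalizing its distance to the unitary manifold via ||W^T W - I||_F^2. The toy below is real-valued (so "unitary" means orthogonal) and trains the penalty alone; the paper's method also includes a task loss and the subsequent circuit-alignment step, both omitted here, so this is an illustration of the regularizer rather than the full pipeline.

```python
import numpy as np

# Soft-unitary regularization sketch: gradient descent on the penalty
# ||W^T W - I||_F^2 alone, whose gradient is 4 W (W^T W - I). This drives a
# random matrix onto the orthogonal group, the set a circuit-alignment step
# could then decompose into gates.
rng = np.random.default_rng(0)
n = 4                                  # a 2-qubit operator is 4 x 4
W = rng.normal(size=(n, n))
I = np.eye(n)

lr = 0.01
for _ in range(500):
    W -= lr * 4 * W @ (W.T @ W - I)    # descend the unitarity penalty

print(np.abs(W.T @ W - I).max())       # near zero: W is now (soft-)unitary
```

Because the gradient preserves the singular vectors of W, the dynamics decouple into one stable scalar recursion per singular value, each converging to 1, which is why a single penalty term suffices to maintain approximate unitarity during training.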
Machine translations provided by Tencent TranSmart; for reference only.
Click "阅读原文" (Read the Original) to get the academic digest with abstracts.