Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!
cs.LG: 117 papers in total today
Large models (14 papers)
【1】Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models
Link: https://arxiv.org/abs/2601.10460
Authors: Abhinaba Basu, Pavan Chakraborty
Abstract: A model that avoids stereotypes in a lab benchmark may not avoid them in deployment. We show that measured bias shifts dramatically when prompts mention different places, times, or audiences -- no adversarial prompting required. We introduce Contextual StereoSet, a benchmark that holds stereotype content fixed while systematically varying contextual framing. Testing 13 models across two protocols, we find striking patterns: anchoring to 1990 (vs. 2030) raises stereotype selection in all models tested on this contrast (p<0.05); gossip framing raises it in 5 of 6 full-grid models; out-group observer framing shifts it by up to 13 percentage points. These effects replicate in hiring, lending, and help-seeking vignettes. We propose Context Sensitivity Fingerprints (CSF): a compact profile of per-dimension dispersion and paired contrasts with bootstrap CIs and FDR correction. Two evaluation tracks support different use cases -- a 360-context diagnostic grid for deep analysis and a budgeted protocol covering 4,229 items for production screening. The implication is methodological: bias scores from fixed-condition tests may not generalize. This is not a claim about ground-truth bias rates; it is a stress test of evaluation robustness. CSF forces evaluators to ask, "Under what conditions does bias appear?" rather than "Is this model biased?" We release our benchmark, code, and results.
【2】LangLasso: Interactive Cluster Descriptions through LLM Explanation
Link: https://arxiv.org/abs/2601.10458
Authors: Raphael Buchmüller, Dennis Collaris, Linhao Meng, Angelos Chatzimparmpas
Comments: This manuscript is accepted for publication in VIS 2025 VISxGenAI Workshop
Abstract: Dimensionality reduction is a powerful technique for revealing structure and potential clusters in data. However, as the axes are complex, non-linear combinations of features, they often lack semantic interpretability. Existing visual analytics (VA) methods support cluster interpretation through feature comparison and interactive exploration, but they require technical expertise and intense human effort. We present LangLasso, a novel method that complements VA approaches through interactive, natural language descriptions of clusters using large language models (LLMs). It produces human-readable descriptions that make cluster interpretation accessible to non-experts and allow integration of external contextual knowledge beyond the dataset. We systematically evaluate the reliability of these explanations and demonstrate that LangLasso provides an effective first step for engaging broader audiences in cluster interpretation. The tool is available at https://langlasso.vercel.app
【3】An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit
Link: https://arxiv.org/abs/2601.10321
Authors: Warren Jouanneau, Emma Jouffroy, Marc Palyart
Abstract: Finding the most relevant person for a job proposal in real time is challenging, especially when resumes are long, structured, and multilingual. In this paper, we propose a re-ranking model based on a new generation of late cross-attention architecture that decomposes both resumes and project briefs to efficiently handle long-context inputs with minimal computational overhead. To mitigate historical data biases, we use a generative large language model (LLM) as a teacher, generating fine-grained, semantically grounded supervision. This signal is distilled into our student model via an enriched distillation loss function. The resulting model produces skill-fit scores that enable consistent and interpretable person-job matching. Experiments on relevance, ranking, and calibration metrics demonstrate that our approach outperforms state-of-the-art baselines.
【4】Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM Servers
Link: https://arxiv.org/abs/2601.10274
Authors: Emre Ozbas, Melih Bastopcu
Abstract: We consider a single large language model (LLM) server that serves a heterogeneous stream of queries belonging to $N$ distinct task types. Queries arrive according to a Poisson process, and each type occurs with a known prior probability. For each task type, the server allocates a fixed number of internal thinking tokens, which determines the computational effort devoted to that query. The token allocation induces an accuracy-latency trade-off: the service time follows an approximately affine function of the allocated tokens, while the probability of a correct response exhibits diminishing returns. Under a first-in, first-out (FIFO) service discipline, the system operates as an $M/G/1$ queue, and the mean system time depends on the first and second moments of the resulting service-time distribution. We formulate a constrained optimization problem that maximizes a weighted average accuracy objective penalized by the mean system time, subject to architectural token-budget constraints and queue-stability conditions. The objective function is shown to be strictly concave over the stability region, which ensures existence and uniqueness of the optimal token allocation. The first-order optimality conditions yield a coupled projected fixed-point characterization of the optimum, together with an iterative solution and an explicit sufficient condition for contraction. Moreover, a projected gradient method with a computable global step-size bound is developed to guarantee convergence beyond the contractive regime. Finally, integer-valued token allocations are attained via rounding of the continuous solution, and the resulting performance loss is evaluated in simulation results.
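The stated trade-off can be made concrete. A minimal sketch, assuming an affine service time $S_i = a + b\,t_i$ for type $i$ with token allocation $t_i$, a saturating accuracy curve $A_i(t_i) = 1 - e^{-c_i t_i}$ (our choice of a diminishing-returns form; the paper does not commit to this one), arrival rate $\lambda$, type probabilities $p_i$, and weights $w_i$:

```latex
% Pollaczek-Khinchine mean system time for the M/G/1 queue,
% with the service time deterministic given the task type:
E[S]   = \sum_i p_i (a + b t_i), \qquad
E[S^2] = \sum_i p_i (a + b t_i)^2, \qquad
E[T]   = E[S] + \frac{\lambda\, E[S^2]}{2\,(1 - \lambda E[S])}.

% Penalized objective over token allocations t = (t_1, ..., t_N),
% subject to a token budget and the stability condition \lambda E[S] < 1:
\max_{t \ge 0} \; J(t) = \sum_i p_i w_i A_i(t_i) - \eta\, E[T].
```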
【5】PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary
Link: https://arxiv.org/abs/2601.10201
Authors: Jiarui Yao, Ruida Wang, Tong Zhang
Abstract: Improving the reasoning abilities of Large Language Models (LLMs) has been a topic of continued interest recently. However, most relevant works are based on outcome rewards at the trajectory level, missing fine-grained supervision during the reasoning process. Other existing training frameworks that try to combine process signals together to optimize LLMs also rely heavily on tedious additional steps like MCTS, training a separate reward model, etc., which harms training efficiency. Moreover, the intuition behind the process signals design lacks rigorous theoretical support, leaving the understanding of the optimization mechanism opaque. In this paper, we propose Process Reward Learning (PRL), which decomposes the entropy regularized reinforcement learning objective into intermediate steps, with rigorous process rewards that could be assigned to models accordingly. Starting from theoretical motivation, we derive the formulation of PRL that is essentially equivalent to the objective of reward maximization plus a KL-divergence penalty term between the policy model and a reference model. However, PRL could turn the outcome reward into process supervision signals, which helps better guide the exploration during RL optimization. From our experiment results, we demonstrate that PRL not only improves the average performance for LLMs' reasoning ability measured by average@n, but also broadens the reasoning boundary by improving the pass@n metric. Extensive experiments verify that the effectiveness of PRL generalizes.
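One standard way to read the stated equivalence (a sketch; PRL's exact credit assignment may differ): the trajectory-level KL penalty factorizes over steps, so the outcome objective already induces a per-step process reward:

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}[R(\tau)]
       - \beta\, D_{\mathrm{KL}}\!\big(\pi(\tau)\,\|\,\pi_{\mathrm{ref}}(\tau)\big),
\qquad
D_{\mathrm{KL}} = \sum_t \mathbb{E}_{\tau \sim \pi}
  \Big[\log \tfrac{\pi(a_t \mid s_t)}{\pi_{\mathrm{ref}}(a_t \mid s_t)}\Big],

% so each intermediate step t can receive the process reward
r_t = -\beta \log \tfrac{\pi(a_t \mid s_t)}{\pi_{\mathrm{ref}}(a_t \mid s_t)},
% with the outcome reward R(\tau) added at the final step.
```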
【6】Understanding and Preserving Safety in Fine-Tuned LLMs
Link: https://arxiv.org/abs/2601.10141
Authors: Jiawen Zhang, Yangfan Hu, Kejia Chen, Lipeng He, Jiachen Ma, Jian Lou, Dan Li, Jian Liu, Xiaohu Yang, Ruoxi Jia
Abstract: Fine-tuning is an essential and pervasive functionality for applying large language models (LLMs) to downstream tasks. However, it has the potential to substantially degrade safety alignment, e.g., by greatly increasing susceptibility to jailbreak attacks, even when the fine-tuning data is entirely harmless. Despite growing attention to defenses at the fine-tuning stage, existing methods struggle with a persistent safety-utility dilemma: emphasizing safety compromises task performance, whereas prioritizing utility typically requires deep fine-tuning that inevitably leads to steep safety decline. In this work, we address this dilemma by shedding new light on the geometric interaction between safety- and utility-oriented gradients in safety-aligned LLMs. Through systematic empirical analysis, we uncover three key insights: (I) safety gradients lie in a low-rank subspace, while utility gradients span a broader high-dimensional space; (II) these subspaces are often negatively correlated, causing directional conflicts during fine-tuning; and (III) the dominant safety direction can be efficiently estimated from a single sample. Building upon these novel insights, we propose safety-preserving fine-tuning (SPF), a lightweight approach that explicitly removes gradient components conflicting with the low-rank safety subspace. Theoretically, we show that SPF guarantees utility convergence while bounding safety drift. Empirically, SPF consistently maintains downstream task performance and recovers nearly all pre-trained safety alignment, even under adversarial fine-tuning scenarios. Furthermore, SPF exhibits robust resistance to both deep fine-tuning and dynamic jailbreak attacks. Together, our findings provide new mechanistic understanding and practical guidance toward always-aligned LLM fine-tuning.
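A minimal sketch of the projection step as we read insights (I)-(III); the rank, the conflict test, and all names below are our assumptions, not the paper's code:

```python
import numpy as np

def safety_basis(safety_grads, rank=4):
    """Estimate an orthonormal basis of the low-rank safety subspace
    from a few safety-oriented gradient vectors (one per row)."""
    # Top right-singular vectors of the stacked gradients span the subspace.
    _, _, vt = np.linalg.svd(np.stack(safety_grads), full_matrices=False)
    return vt[:rank].T  # shape (d, rank), orthonormal columns

def project_out_conflict(g_utility, U, g_safety):
    """Remove the component of the utility gradient that lies in the safety
    subspace AND opposes the safety direction (our guess at SPF's rule)."""
    coeff = U.T @ g_utility        # coordinates in the safety subspace
    proj = U @ coeff               # component inside the subspace
    if proj @ g_safety < 0:        # conflicting direction: drop it
        return g_utility - proj
    return g_utility               # aligned or orthogonal: keep as-is
```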
【7】Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction
Link: https://arxiv.org/abs/2601.10132
Authors: Yanan Cao, Farnaz Fallahi, Murali Mohana Krishna Dandu, Lalitesh Morishetti, Kai Zhao, Luyi Ma, Sinduja Subramaniam, Jianpeng Xu, Evren Korpeoglu, Kaushiki Nag, Sushant Kumar, Kannan Achan
Comments: Accepted at The Web Conference 2026 (WWW 2026)
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning and prediction across different domains. Yet, their ability to infer temporal regularities from structured behavioral data remains underexplored. This paper presents a systematic study investigating whether LLMs can predict time intervals between recurring user actions, such as repeated purchases, and how different levels of contextual information shape their predictive behavior. Using a simple but representative repurchase scenario, we benchmark state-of-the-art LLMs in zero-shot settings against both statistical and machine-learning models. Two key findings emerge. First, while LLMs surpass lightweight statistical baselines, they consistently underperform dedicated machine-learning models, showing their limited ability to capture quantitative temporal structure. Second, although moderate context can improve LLM accuracy, adding further user-level detail degrades performance. These results challenge the assumption that "more context leads to better reasoning". Our study highlights fundamental limitations of today's LLMs in structured temporal inference and offers guidance for designing future context-aware hybrid models that integrate statistical precision with linguistic flexibility.
【8】Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
Link: https://arxiv.org/abs/2601.10079
Authors: Sijia Luo, Xiaokang Zhang, Yuxuan Hu, Bohan Zhang, Ke Wang, Jinbo Su, Mengshu Sun, Lei Liang, Jing Zhang
Abstract: Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a critical bottleneck, often prohibiting efficient training on limited hardware. While existing KV compression techniques offer a remedy for inference, directly applying them to RL training induces a severe policy mismatch, leading to catastrophic performance collapse. To address this, we introduce Sparse-RL, which enables stable RL training under sparse rollouts. We show that instability arises from a fundamental policy mismatch among the dense old policy, the sparse sampler policy, and the learner policy. To mitigate this issue, Sparse-RL incorporates Sparsity-Aware Rejection Sampling and Importance-based Reweighting to correct the off-policy bias introduced by compression-induced information loss. Experimental results show that Sparse-RL reduces rollout overhead compared to dense baselines while preserving the performance. Furthermore, Sparse-RL inherently implements sparsity-aware training, significantly enhancing model robustness during sparse inference deployment.
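As a rough illustration of the reweighting idea (the exact Sparse-RL estimator, including its rejection rule, is not specified in the abstract; the names and clipping scheme below are our assumptions):

```python
import torch

def importance_weights(logp_learner, logp_sparse, clip=5.0):
    """Per-token importance ratios correcting for the mismatch between the
    sparse sampler policy (which generated the rollout) and the learner
    policy; clipping keeps the off-policy correction bounded."""
    ratio = torch.exp(logp_learner - logp_sparse)
    return torch.clamp(ratio, max=clip)

# Hypothetical use inside a policy-gradient loss:
# loss = -(importance_weights(lp_new, lp_sparse) * advantages).mean()
```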
【9】SoK: Privacy-aware LLM in Healthcare: Threat Model, Privacy Techniques, Challenges and Recommendations
Link: https://arxiv.org/abs/2601.10004
Authors: Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass, Sajal Saha
Abstract: Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs), and enhance patient care. However, this integration introduces significant privacy and security challenges, driven by the sensitivity of clinical data and the high-stakes nature of medical workflows. These risks become even more pronounced across heterogeneous deployment environments, ranging from small on-premise hospital systems to regional health networks, each with unique resource limitations and regulatory demands. This Systematization of Knowledge (SoK) examines the evolving threat landscape across the three core LLM phases: data preprocessing, fine-tuning, and inference within realistic healthcare settings. We present a detailed threat model that characterizes adversaries, capabilities, and attack surfaces at each phase, and we systematize how existing privacy-preserving techniques (PPTs) attempt to mitigate these vulnerabilities. While existing defenses show promise, our analysis identifies persistent limitations in securing sensitive clinical data across diverse operational tiers. We conclude with phase-aware recommendations and future research directions aimed at strengthening privacy guarantees for LLMs in regulated environments. This work provides a foundation for understanding the intersection of LLMs, threats, and privacy in healthcare, offering a roadmap toward more robust and clinically trustworthy AI systems.
【10】FaTRQ: Tiered Residual Quantization for LLM Vector Search in Far-Memory-Aware ANNS Systems
Link: https://arxiv.org/abs/2601.09985
Authors: Tianqi Zhang, Flavio Ponzina, Tajana Rosing
Abstract: Approximate Nearest-Neighbor Search (ANNS) is a key technique in retrieval-augmented generation (RAG), enabling rapid identification of the most relevant high-dimensional embeddings from massive vector databases. Modern ANNS engines accelerate this process using prebuilt indexes and store compressed vector-quantized representations in fast memory. However, they still rely on a costly second-pass refinement stage that reads full-precision vectors from slower storage like SSDs. For modern text and multimodal embeddings, these reads now dominate the latency of the entire query. We propose FaTRQ, a far-memory-aware refinement system using tiered memory that eliminates the need to fetch full vectors from storage. It introduces a progressive distance estimator that refines coarse scores using compact residuals streamed from far memory. Refinement stops early once a candidate is provably outside the top-k. To support this, we propose tiered residual quantization, which encodes residuals as ternary values stored efficiently in far memory. A custom accelerator is deployed in a CXL Type-2 device to perform low-latency refinement locally. Together, FaTRQ improves the storage efficiency by 2.4$\times$ and improves the throughput by up to 9$\times$ over the SOTA GPU ANNS system.
【11】An Exploratory Study to Repurpose LLMs to a Unified Architecture for Time Series Classification
Link: https://arxiv.org/abs/2601.09971
Authors: Hansen He, Shuheng Li
Abstract: Time series classification (TSC) is a core machine learning problem with broad applications. Recently there has been growing interest in repurposing large language models (LLMs) for TSC, motivated by their strong reasoning and generalization ability. Prior work has primarily focused on alignment strategies that explicitly map time series data into the textual domain; however, the choice of time series encoder architecture remains underexplored. In this work, we conduct an exploratory study of hybrid architectures that combine specialized time series encoders with a frozen LLM backbone. We evaluate a diverse set of encoder families, including Inception, convolutional, residual, transformer-based, and multilayer perceptron architectures, among which the Inception model is the only encoder architecture that consistently yields positive performance gains when integrated with an LLM backbone. Overall, this study highlights the impact of time series encoder choice in hybrid LLM architectures and points to Inception-based models as a promising direction for future LLM-driven time series learning.
【12】Advancing Model Refinement: Muon-Optimized Distillation and Quantization for LLM Deployment
Link: https://arxiv.org/abs/2601.09865
Authors: Jacob Sander, Brian Jalaian, Venkat R. Dasari
Comments: 12 pages, 5 figures
Abstract: Large Language Models (LLMs) enable advanced natural language processing but face deployment challenges on resource-constrained edge devices due to high computational, memory, and energy demands. Optimizing these models requires addressing three key challenges: acquiring task-specific data, fine-tuning for performance, and compressing models to accelerate inference while reducing resource demands. We propose an integrated framework combining GPTQ-based quantization, low-rank adaptation (LoRA), and a specialized data distillation process to significantly reduce model size and complexity while preserving or enhancing task-specific performance. By leveraging data distillation, knowledge distillation via Kullback-Leibler divergence, Bayesian hyperparameter optimization, and the Muon optimizer, our pipeline achieves up to 2x memory compression (e.g., reducing a 6GB model to 3GB) and enables efficient inference for specialized tasks. Empirical results demonstrate superior performance on standard LLM benchmarks compared to GPTQ quantization alone, with the Muon optimizer notably enhancing fine-tuned models' resistance to accuracy decay during quantization.
【13】Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition
Link: https://arxiv.org/abs/2601.10043
Authors: Zhiming Lian
Abstract: Financial named-entity recognition (NER) is one of many important approaches for translating unformatted reports and news into structured knowledge graphs. However, free, easy-to-use large language models (LLMs) often fail to distinguish organisations from people, or disregard an actual monetary amount entirely. This paper takes Meta's Llama 3 8B and applies it to financial NER by combining instruction fine-tuning and Low-Rank Adaptation (LoRA). Each annotated sentence is converted into an instruction-input-output triple, enabling the model to learn task descriptions while fine-tuning with small low-rank matrices instead of updating all weights. Using a corpus of 1,693 sentences, our method obtains a micro-F1 score of 0.894, compared with Qwen3-8B, Baichuan2-7B, T5, and BERT-Base. We present dataset statistics, describe training hyperparameters, and perform visualizations of entity density, learning curves, and evaluation metrics. Our results show that instruction tuning combined with parameter-efficient fine-tuning enables state-of-the-art performance on domain-sensitive NER.
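As an illustration of the triple format (hypothetical field names and tag set; the paper's exact schema is not given in the abstract):

```python
# A hypothetical instruction-input-output triple for financial NER fine-tuning.
example = {
    "instruction": "Extract all ORG, PER, and MONEY entities from the sentence.",
    "input": "Acme Corp agreed to pay $2.5 million to settle the claim.",
    "output": "ORG: Acme Corp; MONEY: $2.5 million",
}
```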
【14】Performance of AI agents based on reasoning language models on ALD process optimization tasks
Link: https://arxiv.org/abs/2601.09980
Authors: Angel Yanguas-Gil
Abstract: In this work we explore the performance and behavior of reasoning large language models to autonomously optimize atomic layer deposition (ALD) processes. In the ALD process optimization task, an agent built on top of a reasoning LLM has to find optimal dose times for an ALD precursor and a coreactant without any prior knowledge on the process, including whether it is actually self-limited. The agent is meant to interact iteratively with an ALD reactor in a fully unsupervised way. We evaluate this agent using a simple model of an ALD tool that incorporates ALD processes with different self-limited surface reaction pathways as well as a non-self-limited component. Our results show that agents based on reasoning models like OpenAI's o3 and GPT5 consistently succeeded at completing this optimization task. However, we observed significant run-to-run variability due to the non-deterministic nature of the model's response. In order to understand the logic followed by the reasoning model, the agent uses a two-step process in which the model first generates an open response detailing the reasoning process. This response is then transformed into a structured output. An analysis of these reasoning traces showed that the logic of the model was sound and that its reasoning was based on the notions of self-limited process and saturation expected in the case of ALD. However, the agent can sometimes be misled by its own prior choices when exploring the optimization space.
Graph related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】On the origin of neural scaling laws: from random graphs to natural language
Link: https://arxiv.org/abs/2601.10684
Authors: Maissam Barkeshli, Alberto Alfarano, Andrey Gromov
Comments: 33 pages
Abstract: Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and number of model parameters. This has spurred an intense interest in the origin of neural scaling laws, with a common suggestion being that they arise from power law structure already present in the data. In this paper we study scaling laws for transformers trained to predict random walks (bigrams) on graphs with tunable complexity. We demonstrate that this simplified setting already gives rise to neural scaling laws even in the absence of power law structure in the data correlations. We further consider dialing down the complexity of natural language systematically, by training on sequences sampled from increasingly simplified generative language models, from 4,2,1-layer transformer language models down to language bigrams, revealing a monotonic evolution of the scaling exponents. Our results also include scaling laws obtained from training on random walks on random graphs drawn from Erdös-Renyi and scale-free Barabási-Albert ensembles. Finally, we revisit conventional scaling laws for language modeling, demonstrating that several essential results can be reproduced using 2 layer transformers with context length of 50, provide a critical analysis of various fits used in prior literature, demonstrate an alternative method for obtaining compute optimal curves as compared with current practice in published literature, and provide preliminary evidence that maximal update parameterization may be more parameter efficient than standard parameterization.
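As an illustration of the data-generation setup (a minimal sketch; graph sizes and the uniform walk policy are our assumptions), sequences like these, with context length 50 as in the abstract, can be fed to a small transformer for next-token prediction:

```python
import numpy as np

def random_walk_corpus(n_nodes=64, p_edge=0.1, n_seqs=1000, seq_len=50, seed=0):
    """Sample random-walk token sequences on an Erdos-Renyi graph, the kind
    of tunable-complexity bigram data the paper trains transformers on."""
    rng = np.random.default_rng(seed)
    adj = rng.random((n_nodes, n_nodes)) < p_edge
    np.fill_diagonal(adj, False)
    adj |= adj.T                                    # undirected graph
    for v in np.flatnonzero(~adj.any(axis=1)):      # ensure no isolated node
        u = rng.integers(n_nodes - 1)
        u += (u >= v)                               # skip the self-loop index
        adj[v, u] = adj[u, v] = True
    seqs = np.empty((n_seqs, seq_len), dtype=np.int64)
    for s in range(n_seqs):
        v = rng.integers(n_nodes)
        for t in range(seq_len):
            seqs[s, t] = v
            v = rng.choice(np.flatnonzero(adj[v]))  # uniform neighbor step
    return seqs
```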
【2】PLGC: Pseudo-Labeled Graph Condensation
Link: https://arxiv.org/abs/2601.10358
Authors: Jay Nandy, Arnab Kumar Mondal, Anuj Rathore, Mahesh Chandran
Abstract: Large graph datasets make training graph neural networks (GNNs) computationally costly. Graph condensation methods address this by generating small synthetic graphs that approximate the original data. However, existing approaches rely on clean, supervised labels, which limits their reliability when labels are scarce, noisy, or inconsistent. We propose Pseudo-Labeled Graph Condensation (PLGC), a self-supervised framework that constructs latent pseudo-labels from node embeddings and optimizes condensed graphs to match the original graph's structural and feature statistics -- without requiring ground-truth labels. PLGC offers three key contributions: (1) A diagnosis of why supervised condensation fails under label noise and distribution shift. (2) A label-free condensation method that jointly learns latent prototypes and node assignments. (3) Theoretical guarantees showing that pseudo-labels preserve latent structural statistics of the original graph and ensure accurate embedding alignment. Empirically, across node classification and link prediction tasks, PLGC achieves competitive performance with state-of-the-art supervised condensation methods on clean datasets and exhibits substantial robustness under label noise, often outperforming all baselines by a significant margin. Our findings highlight the practical and theoretical advantages of self-supervised graph condensation in noisy or weakly-labeled environments.
【3】Meta Dynamic Graph for Traffic Flow Prediction
Link: https://arxiv.org/abs/2601.10328
Authors: Yiqing Zou, Hanning Yuan, Qianyu Yang, Ziqiang Yuan, Shuliang Wang, Sijie Ruan
Comments: Accepted to AAAI 2026
Abstract: Traffic flow prediction is a typical spatio-temporal prediction problem and has a wide range of applications. The core challenge lies in modeling the underlying complex spatio-temporal dependencies. Various methods have been proposed, and recent studies show that the modeling of dynamics is useful to meet the core challenge. While handling spatial dependencies and temporal dependencies using separate base model structures may hinder the modeling of spatio-temporal correlations, the modeling of dynamics can bridge this gap. Incorporating spatio-temporal heterogeneity also advances the main goal, since it can extend the parameter space and allow more flexibility. Despite these advances, two limitations persist: 1) the modeling of dynamics is often limited to the dynamics of spatial topology (e.g., adjacency matrix changes), which, however, can be extended to a broader scope; 2) the modeling of heterogeneity is often separated for spatial and temporal dimensions, but this gap can also be bridged by the modeling of dynamics. To address the above limitations, we propose a novel framework for traffic prediction, called Meta Dynamic Graph (MetaDG). MetaDG leverages dynamic graph structures of node representations to explicitly model spatio-temporal dynamics. This generates both dynamic adjacency matrices and meta-parameters, extending dynamic modeling beyond topology while unifying the capture of spatio-temporal heterogeneity into a single dimension. Extensive experiments on four real-world datasets validate the effectiveness of MetaDG.
【4】Graph Regularized PCA
Link: https://arxiv.org/abs/2601.10199
Authors: Antonio Briola, Marwin Schmidt, Fabio Caccioli, Carlos Ros Perez, James Singleton, Christian Michler, Tomaso Aste
Comments: 15 pages, 2 figures, 4 Tables
Abstract: High-dimensional data often exhibit dependencies among variables that violate the isotropic-noise assumption under which principal component analysis (PCA) is optimal. For cases where the noise is not independent and identically distributed across features (i.e., the covariance is not spherical), we introduce Graph Regularized PCA (GR-PCA). It is a graph-based regularization of PCA that incorporates the dependency structure of the data features by learning a sparse precision graph and biasing loadings toward the low-frequency Fourier modes of the corresponding graph Laplacian. Consequently, high-frequency signals are suppressed, while graph-coherent low-frequency ones are preserved, yielding interpretable principal components aligned with conditional relationships. We evaluate GR-PCA on synthetic data spanning diverse graph topologies, signal-to-noise ratios, and sparsity levels. Compared to mainstream alternatives, it concentrates variance on the intended support, produces loadings with lower graph-Laplacian energy, and remains competitive in out-of-sample reconstruction. When high-frequency signals are present, the graph Laplacian penalty prevents overfitting, reducing the reconstruction accuracy but improving structural fidelity. The advantage over PCA is most pronounced when high-frequency signals are graph-correlated, whereas PCA remains competitive when such signals are nearly rotationally invariant. The procedure is simple to implement, modular with respect to the precision estimator, and scalable, providing a practical route to structure-aware dimensionality reduction that improves structural fidelity without sacrificing predictive performance.
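Our reading of the penalized eigenproblem, as a minimal sketch (the paper additionally learns the graph from a sparse precision estimate; the closed form below assumes a fixed Laplacian L and a Lagrangian view of maximizing w'Sw - gamma*w'Lw over orthonormal loadings):

```python
import numpy as np

def gr_pca(X, L, gamma=1.0, n_components=2):
    """Graph-regularized PCA sketch: bias loadings toward low-frequency
    Laplacian modes by penalizing the Laplacian quadratic form, i.e., take
    the top eigenvectors of S - gamma*L instead of the sample covariance S."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)         # sample covariance, shape (d, d)
    vals, vecs = np.linalg.eigh(S - gamma * L)
    W = vecs[:, ::-1][:, :n_components]  # loadings with largest penalized variance
    return Xc @ W, W                     # scores and loadings
```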
【5】Simple Network Graph Comparative Learning
Link: https://arxiv.org/abs/2601.10150
Authors: Qiang Yu, Xinran Cheng, Shiqiang Xu, Chuanyi Liu
Comments: 10 pages, 5 figures
Abstract: The effectiveness of contrastive learning methods has been widely recognized in the field of graph learning, especially in contexts where graph data often lack labels or are difficult to label. However, the application of these methods to node classification tasks still faces a number of challenges. First, existing data enhancement techniques may lead to significant differences from the original view when generating new views, which may weaken the relevance of the view and affect the efficiency of model training. Second, the vast majority of existing graph comparison learning algorithms rely on the use of a large number of negative samples. To address the above challenges, this study proposes a novel node classification contrast learning method called Simple Network Graph Comparative Learning (SNGCL). Specifically, SNGCL employs a superimposed multilayer Laplace smoothing filter as a step in processing the data to obtain global and local feature smoothing matrices, respectively, which are thus passed into the target and online networks of the Siamese network, and finally employs an improved triple recombination loss function to bring the intra-class distance closer and the inter-class distance farther. We have compared SNGCL with state-of-the-art models in node classification tasks, and the experimental results show that SNGCL is strongly competitive in most tasks.
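A minimal sketch of the stacked Laplacian smoothing step (our reading; the filter order and coefficient are assumptions, and the paper derives separate global and local variants):

```python
import numpy as np

def laplacian_smooth(X, A, k=2, gamma=0.5):
    """Apply (I - gamma * L_sym)^k to node features X, where L_sym is the
    symmetric normalized Laplacian of adjacency A. Larger k yields smoother,
    more global features; smaller k keeps features local."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L_sym = np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    H = X.copy()
    for _ in range(k):
        H = H - gamma * (L_sym @ H)
    return H
```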
【6】SciNets: Graph-Constrained Multi-Hop Reasoning for Scientific Literature Synthesis
Link: https://arxiv.org/abs/2601.09727
Authors: Sauhard Dubey
Comments: 19 pages, 2 figures
Abstract: Cross-domain scientific synthesis requires connecting mechanistic explanations across fragmented literature, a capability that remains challenging for both retrieval-based systems and unconstrained language models. While recent work has applied large language models to scientific summarization and question answering, these approaches provide limited control over reasoning depth and structural grounding. We frame mechanistic synthesis as a graph-constrained multi-hop reasoning problem over literature-derived concept graphs. Given a scientific query and a compact, query-local corpus, SciNets constructs a directed concept graph and synthesizes mechanistic explanations by identifying multi-hop reasoning paths that connect concepts that rarely co-occur within individual papers. We systematically compare shortest-path reasoning, k-shortest paths with diversity constraints, stochastic random walks, and a retrieval-augmented language model baseline. Rather than evaluating correctness, which is often indeterminate when synthesizing connections across distributed sources, we introduce a behavioral framework that measures symbolic reasoning depth, mechanistic diversity, and grounding stability. Across machine learning, biology, and climate science tasks, explicit graph constraints enable controllable multi-hop reasoning while revealing a consistent trade-off: deeper and more diverse symbolic reasoning increases grounding instability, whereas shortest-path reasoning remains highly stable but structurally conservative. These findings provide a systematic behavioral characterization of the limits and capabilities of current graph-LLM integration for scientific synthesis.
Transformer (6 papers)
【1】STEM: Scaling Transformers with Embedding Modules
Link: https://arxiv.org/abs/2601.10639
Authors: Ranajoy Sadhukhan, Sheng Cao, Harry Dong, Changsheng Zhao, Attiano Purpura-Pontoniere, Yuandong Tian, Zechun Liu, Beidi Chen
Abstract: Fine-grained sparsity promises higher parametric capacity without proportional per-token compute, but often suffers from training instability, load balancing, and communication overhead. We introduce STEM (Scaling Transformers with Embedding Modules), a static, token-indexed approach that replaces the FFN up-projection with a layer-local embedding lookup while keeping the gate and down-projection dense. This removes runtime routing, enables CPU offload with asynchronous prefetch, and decouples capacity from both per-token FLOPs and cross-device communication. Empirically, STEM trains stably despite extreme sparsity. It improves downstream performance over dense baselines while reducing per-token FLOPs and parameter accesses (eliminating roughly one-third of FFN parameters). STEM learns embedding spaces with large angular spread which enhances its knowledge storage capacity. More interestingly, this enhanced knowledge capacity comes with better interpretability. The token-indexed nature of STEM embeddings allows simple ways to perform knowledge editing and knowledge injection in an interpretable manner without any intervention in the input text or additional computation. In addition, STEM strengthens long-context performance: as sequence length grows, more distinct parameters are activated, yielding practical test-time capacity scaling. Across 350M and 1B model scales, STEM delivers up to ~3--4% accuracy improvements overall, with notable gains on knowledge and reasoning-heavy benchmarks (ARC-Challenge, OpenBookQA, GSM8K, MMLU). Overall, STEM is an effective way of scaling parametric memory while providing better interpretability, better training stability and improved efficiency.
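A minimal sketch of the described block, under our assumptions about the gating nonlinearity and shapes (the abstract specifies only that the up-projection becomes a layer-local, token-indexed embedding lookup while gate and down-projection stay dense):

```python
import torch
import torch.nn as nn

class STEMFFN(nn.Module):
    """STEM-style FFN sketch: the up-projection W_up @ x is replaced by an
    embedding looked up by the input token id. Static token indexing means
    no runtime routing and easy CPU offload of the embedding table."""
    def __init__(self, d_model, d_ff, vocab_size):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)  # dense gate
        self.up_emb = nn.Embedding(vocab_size, d_ff)      # replaces up-projection
        self.down = nn.Linear(d_ff, d_model, bias=False)  # dense down-projection

    def forward(self, x, token_ids):
        # x: (batch, seq, d_model); token_ids: (batch, seq)
        h = torch.nn.functional.silu(self.gate(x)) * self.up_emb(token_ids)
        return self.down(h)
```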
【2】Transformer-Based Cognitive Radio: Adaptive Modulation Strategies Using Transformer Models
Link: https://arxiv.org/abs/2601.10519
Authors: Andrea Melis, Andrea Piroddi, Roberto Girau
Abstract: Cognitive Radio (CR) systems, which dynamically adapt to changing spectrum environments, could benefit significantly from advancements in machine learning technologies. These systems can be enhanced in terms of spectral efficiency, robustness, and security through innovative approaches such as the use of Transformer models. This work investigates the application of Transformer models, specifically the GPT-2 architecture, to generate novel modulation schemes for wireless communications. By training a GPT-2 model on a dataset of existing modulation formulas, new modulation schemes have been created. These generated schemes are then compared to traditional methods using key performance metrics such as Signal-to-Noise Ratio (SNR) and Power Spectrum Density (PSD). The results show that Transformer-generated modulation schemes can achieve performance comparable to, and in some cases outperforming, traditional methods. This demonstrates that advanced CR systems could greatly benefit from the implementation of Transformer models, leading to more efficient, robust, and secure communication systems.
【3】LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
Link: https://arxiv.org/abs/2601.10155
Authors: Aryan Karmore
Abstract: Compressing the KV cache is a required step to deploy large language models on edge devices. Current quantization methods compress storage but fail to reduce bandwidth as attention calculation requires dequantizing keys from INT4/INT8 to FP16 before use. We observe that attention scoring is mathematically equivalent to inner-product similarity search, and we can apply compression techniques from vector databases to compress the KV cache better. We propose LOOKAT, which applies product quantization and asymmetric distance computation to the transformer architecture by decomposing key vectors into subspaces, learning codebooks, and computing attention scores via lookup tables. This transforms attention from memory-bound to compute-bound. LOOKAT achieves 64$\times$ compression at 95.7% output fidelity and 32$\times$ compression at 95.0% fidelity when tested on GPT-2. LOOKAT requires no architecture changes or training while maintaining rank correlation $\rho > 0.95$. Theoretical analysis confirms that rank correlation degrades as $O(d_k/mK)$, with guarantees validated across sequence lengths up to 1024 tokens.
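The product-quantization-plus-ADC scoring the abstract describes looks roughly like this (a sketch; in LOOKAT the codebooks would be learned offline, e.g., by per-subspace k-means, and the per-key codes live in the KV cache in place of FP16 keys):

```python
import numpy as np

def build_tables(query, codebooks):
    """Asymmetric distance computation: precompute the inner product of each
    query sub-vector with every codeword, once per query.
    codebooks: (m, K, d_sub) array of m per-subspace codebooks."""
    m, K, d_sub = codebooks.shape
    q_sub = query.reshape(m, d_sub)
    return np.einsum('md,mkd->mk', q_sub, codebooks)   # (m, K) lookup tables

def approx_scores(tables, key_codes):
    """Approximate q.k for every cached key from its PQ codes by summing
    m table lookups -- no dequantization of the keys to FP16 needed.
    key_codes: (n_keys, m) integer code indices."""
    m = tables.shape[0]
    return sum(tables[j, key_codes[:, j]] for j in range(m))  # (n_keys,)
```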
【4】Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Link: https://arxiv.org/abs/2601.10058
Authors: Renpu Liu, Jing Yang
Comments: Published as a conference paper at NeurIPS 2025
Abstract: Large language models (LLMs) exhibit impressive in-context learning (ICL) capabilities, yet the quality of their predictions is fundamentally limited by the few costly labeled demonstrations that can fit into a prompt. Meanwhile, there exist vast and continuously growing amounts of unlabeled data that may be closely related to the ICL task. How to utilize such unlabeled data to provably enhance the performance of ICL thus becomes an emerging fundamental question. In this work, we propose a novel augmented ICL framework, in which the prompt includes a small set of labeled examples alongside a block of unlabeled inputs. We focus on the multi-class linear classification setting and demonstrate that, with chain-of-thought (CoT) prompting, a multi-layer transformer can effectively emulate an expectation-maximization (EM) algorithm. This enables the transformer to implicitly extract useful information from both labeled and unlabeled data, leading to provable improvements in ICL accuracy. Moreover, we show that such a transformer can be trained via teacher forcing, with its parameters converging to the desired solution at a linear rate. Experiments demonstrate that the augmented ICL framework consistently outperforms conventional few-shot ICL, providing empirical support for our theoretical findings. To the best of our knowledge, this is the first theoretical study on the impact of unlabeled data on the ICL performance of transformers.
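Schematically, the EM recursion being emulated, written here for isotropic Gaussian class-conditionals with class means $\mu_c$ (our simplification of the multi-class linear setting):

```latex
% E-step: soft labels for each unlabeled input x_u under the current model
\gamma_{uc} \propto \pi_c \exp\!\big(\mu_c^\top x_u - \tfrac{1}{2}\|\mu_c\|^2\big),

% M-step: class means re-estimated from hard-labeled and soft-labeled data
\mu_c = \frac{\sum_{i:\, y_i = c} x_i + \sum_u \gamma_{uc}\, x_u}
             {N_c + \sum_u \gamma_{uc}},
\qquad N_c = |\{i : y_i = c\}|.
```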
【5】Continuous-Depth Transformers with Learned Control Dynamics
Link: https://arxiv.org/abs/2601.10007
Authors: Peter Jemley
Comments: 9 pages, 4 figures. Code available at: https://github.com/PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics
摘要:We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation (ODE) block, enabling inference-time control over generation attributes via a learned steering signal. Unlike standard transformers that process representations through fixed discrete layers, our approach treats depth as a continuous variable governed by a learned vector field $F_θ(H, τ, u)$, where $u$ is a low-dimensional control signal injected via explicit concatenation. We validate the architecture through four experiments: (1) gradient flow stability with zero exploding/vanishing gradient events, (2) semantic steering achieving 98\%/88\% accuracy for positive/negative sentiment control, (3) continuous interpolation validated by a negligible 0.068\% trajectory divergence between fixed and adaptive solvers, and (4) efficiency benchmarking demonstrating latency parity with standard discrete baselines. Additionally, we show that adaptive ODE solvers reveal geometric structure in the learned dynamics: the control signal partitions the vector field into distinct dynamical regimes with different curvature characteristics. The adjoint method enables $O(1)$ memory training regardless of integration depth. Our results demonstrate that continuous-depth dynamics with learned control signals provide a viable, efficient mechanism for steerable language generation.
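A minimal sketch of the controlled ODE block (the paper trains with the adjoint method and adaptive solvers; the fixed-step Euler loop, layer sizes, and names below are our simplifications):

```python
import torch
import torch.nn as nn

class ControlledODEBlock(nn.Module):
    """Hidden states evolve under a learned vector field F_theta(H, tau, u),
    with the control u injected by concatenation, as the abstract describes."""
    def __init__(self, d_model, d_ctrl, d_hidden=256):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(d_model + 1 + d_ctrl, d_hidden), nn.Tanh(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, h, u, n_steps=8):
        # h: (batch, seq, d_model); u: (batch, d_ctrl) steering signal
        dt = 1.0 / n_steps
        for k in range(n_steps):
            tau = torch.full_like(h[..., :1], k * dt)   # depth-time channel
            u_b = u[:, None, :].expand(h.size(0), h.size(1), u.size(-1))
            h = h + dt * self.f(torch.cat([h, tau, u_b], dim=-1))  # Euler step
        return h
```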
【6】The Geometry of Thought: Disclosing the Transformer as a Tropical Polynomial Circuit
Link: https://arxiv.org/abs/2601.09775
Authors: Faruk Alpay, Bilge Senturk
Comments: 7 pages, 2 figures
Abstract: We prove that the Transformer self-attention mechanism in the high-confidence regime ($\beta \to \infty$, where $\beta$ is an inverse temperature) operates in the tropical semiring (max-plus algebra). In particular, we show that taking the tropical limit of the softmax attention converts it into a tropical matrix product. This reveals that the Transformer's forward pass is effectively executing a dynamic programming recurrence (specifically, a Bellman-Ford path-finding update) on a latent graph defined by token similarities. Our theoretical result provides a new geometric perspective for chain-of-thought reasoning: it emerges from an inherent shortest-path (or longest-path) algorithm being carried out within the network's computation.
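The limit in question is the standard log-sum-exp collapse, which converts softmax-style aggregation into (max, +) algebra:

```latex
\lim_{\beta \to \infty} \frac{1}{\beta} \log \sum_j e^{\beta a_j} = \max_j a_j,

% so composing attention layers in this limit computes tropical matrix products,
(A \otimes B)_{ik} = \max_j \,\big(A_{ij} + B_{jk}\big),
% which is the Bellman-Ford relaxation for longest paths on the latent graph
% whose edge weights are given by token similarities.
```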
GAN | adversarial | attacks | generation related (3 papers)
【1】CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning
Link: https://arxiv.org/abs/2601.10407
Authors: Yuanjie Zhao, Junnan Qiu, Yue Ding, Jie Li
Abstract: Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to backdoor attacks. Existing attack strategies typically struggle against safety-constrained algorithms (e.g., CQL) due to inefficient random poisoning and the use of easily detectable Out-of-Distribution (OOD) triggers. In this paper, we propose CS-GBA (Critical Sample-based Gradient-guided Backdoor Attack), a novel framework designed to achieve high stealthiness and destructiveness under a strict budget. Leveraging the theoretical insight that samples with high Temporal Difference (TD) errors are pivotal for value function convergence, we introduce an adaptive Critical Sample Selection strategy that concentrates the attack budget on the most influential transitions. To evade OOD detection, we propose a Correlation-Breaking Trigger mechanism that exploits the physical mutual exclusivity of state features (e.g., 95th percentile boundaries) to remain statistically concealed. Furthermore, we replace the conventional label inversion with a Gradient-Guided Action Generation mechanism, which searches for worst-case actions within the data manifold using the victim Q-network's gradient. Empirical results on D4RL benchmarks demonstrate that our method significantly outperforms state-of-the-art baselines, achieving high attack success rates against representative safety-constrained algorithms with a minimal 5% poisoning budget, while maintaining the agent's performance in clean environments.
【2】Step-by-Step Causality: Transparent Causal Discovery with Multi-Agent Tree-Query and Adversarial Confidence Estimation
Link: https://arxiv.org/abs/2601.10137
Authors: Ziyi Ding, Chenfei Ye-Hao, Zheyuan Wang, Xiao-Ping Zhang
Abstract: Causal discovery aims to recover "what causes what", but classical constraint-based methods (e.g., PC, FCI) suffer from error propagation, and recent LLM-based causal oracles often behave as opaque, confidence-free black boxes. This paper introduces Tree-Query, a tree-structured, multi-expert LLM framework that reduces pairwise causal discovery to a short sequence of queries about backdoor paths, (in)dependence, latent confounding, and causal direction, yielding interpretable judgments with robustness-aware confidence scores. Theoretical guarantees are provided for asymptotic identifiability of four pairwise relations. On data-free benchmarks derived from Mooij et al. and UCI causal graphs, Tree-Query improves structural metrics over direct LLM baselines, and a diet-weight case study illustrates confounder screening and stable, high-confidence causal conclusions. Tree-Query thus offers a principled way to obtain data-free causal priors from LLMs that can complement downstream data-driven causal discovery. Code is available at https://anonymous.4open.science/r/Repo-9B3E-4F96.
【3】Transition Matching Distillation for Fast Video Generation
Link: https://arxiv.org/abs/2601.09881
Authors: Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, Arash Vahdat
Abstract: Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inefficient multi-step sampling process. In this work, we present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. The central idea of TMD is to match the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, where each transition is modeled as a lightweight conditional flow. To enable efficient distillation, we decompose the original diffusion backbone into two components: (1) a main backbone, comprising the majority of early layers, that extracts semantic representations at each outer transition step; and (2) a flow head, consisting of the last few layers, that leverages these representations to perform multiple inner flow updates. Given a pretrained video diffusion model, we first introduce a flow head to the model, and adapt it into a conditional flow map. We then apply distribution matching distillation to the student model with flow head rollout in each transition step. Extensive experiments on distilling Wan2.1 1.3B and 14B text-to-video models demonstrate that TMD provides a flexible and strong trade-off between generation speed and visual quality. In particular, TMD outperforms existing distilled models under comparable inference costs in terms of visual fidelity and prompt adherence. Project page: https://research.nvidia.com/labs/genair/tmd
半/弱/无/有监督|不确定性|主动学习(4篇)
【1】ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition
标题:ProbFM:具有不确定性分解的概率时间序列基础模型
链接:https://arxiv.org/abs/2601.10591
作者:Arundeep Chinta,Lucas Vinh Tran,Jay Katukuri
备注:Accepted for oral presentation at the AI Meets Quantitative Finance Workshop at ICAIF 2025. An enhanced version was accepted for oral presentation at the AI for Time Series Analysis Workshop at AAAI 2026
摘要:时间序列基础模型(TSFM)已成为一种很有前景的零样本金融预测方法,表现出强大的可迁移性和数据效率增益。然而,它们在金融应用中的采用受到不确定性量化根本局限的阻碍:当前的方法要么依赖限制性的分布假设,要么混淆不同的不确定性来源,要么缺乏有原则的校准机制。虽然最近的TSFM采用了混合模型、Student's t分布或共形预测等复杂技术,但它们未能解决提供有理论基础的不确定性分解这一核心挑战。我们首次提出了一种新的基于Transformer的概率框架ProbFM(概率基础模型),它利用深度证据回归(DER)来提供具有显式认知-偶然(epistemic-aleatoric)分解的有原则的不确定性量化。与预先指定分布形式或需要基于采样推断的现有方法不同,ProbFM通过高阶证据学习来学习最优不确定性表示,同时保持单次前向传播的计算效率。为了独立于架构复杂性、严格评估核心DER不确定性量化方法,我们使用一致的LSTM架构对五种概率方法(DER、高斯NLL、Student's-t NLL、分位数损失和共形预测)进行了广泛的受控比较研究。对加密货币收益预测的评估表明,DER在提供显式认知-偶然不确定性分解的同时,保持了有竞争力的预测准确性。这项工作既为基础模型中有原则的不确定性量化建立了一个可扩展的框架,也为DER在金融应用中的有效性提供了实证证据。
摘要:Time Series Foundation Models (TSFMs) have emerged as a promising approach for zero-shot financial forecasting, demonstrating strong transferability and data efficiency gains. However, their adoption in financial applications is hindered by fundamental limitations in uncertainty quantification: current approaches either rely on restrictive distributional assumptions, conflate different sources of uncertainty, or lack principled calibration mechanisms. While recent TSFMs employ sophisticated techniques such as mixture models, Student's t-distributions, or conformal prediction, they fail to address the core challenge of providing theoretically-grounded uncertainty decomposition. For the very first time, we present a novel transformer-based probabilistic framework, ProbFM (probabilistic foundation model), that leverages Deep Evidential Regression (DER) to provide principled uncertainty quantification with explicit epistemic-aleatoric decomposition. Unlike existing approaches that pre-specify distributional forms or require sampling-based inference, ProbFM learns optimal uncertainty representations through higher-order evidence learning while maintaining single-pass computational efficiency. To rigorously evaluate the core DER uncertainty quantification approach independent of architectural complexity, we conduct an extensive controlled comparison study using a consistent LSTM architecture across five probabilistic methods: DER, Gaussian NLL, Student's-t NLL, Quantile Loss, and Conformal Prediction. Evaluation on cryptocurrency return forecasting demonstrates that DER maintains competitive forecasting accuracy while providing explicit epistemic-aleatoric uncertainty decomposition. This work establishes both an extensible framework for principled uncertainty quantification in foundation models and empirical evidence for DER's effectiveness in financial applications.
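To make the DER machinery concrete, here is a minimal sketch of the Normal-Inverse-Gamma evidential loss and the epistemic-aleatoric split that DER-style heads optimize (following Amini et al.'s formulation, which the abstract builds on); this is an illustration, not the ProbFM code:

```python
import math
import torch

def nig_nll(y, gamma, nu, alpha, beta):
    # Negative log-likelihood of the Normal-Inverse-Gamma evidential distribution:
    # the network predicts (gamma, nu, alpha, beta) per target in a single pass.
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * torch.log(math.pi / nu)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5))

def decompose_uncertainty(nu, alpha, beta):
    aleatoric = beta / (alpha - 1.0)          # E[sigma^2]: noise in the data
    epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]: model uncertainty
    return aleatoric, epistemic

y = torch.randn(16)                            # toy regression targets
gamma, nu = torch.zeros(16), torch.ones(16)
alpha, beta = 2.0 * torch.ones(16), torch.ones(16)
loss = nig_nll(y, gamma, nu, alpha, beta).mean()
ale, epi = decompose_uncertainty(nu, alpha, beta)
```

In practice an evidence regularizer is usually added to the NLL; it is omitted here for brevity.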
【2】Early Fault Detection on CMAPSS with Unsupervised LSTM Autoencoders
标题:具有无监督LSTM自动编码器的CMAPSS早期故障检测
链接:https://arxiv.org/abs/2601.10269
作者:P. Sánchez,K. Reyes,B. Radu,E. Fernández
摘要:本文介绍了一个无监督的涡轮风扇发动机健康监测框架,它不需要运行至故障(run-to-failure)的标签。首先,通过基于回归的归一化消除NASA CMAPSS传感器流中的工况影响;然后,仅在每条轨迹的健康部分上训练长短期记忆(LSTM)自动编码器。使用自适应数据驱动阈值估计的持续重建误差会触发实时警报,无需手动调整规则。基准测试结果显示,在多种工况下召回率高且误报率低,表明该方法可以快速部署、扩展到不同的机队,并作为剩余使用寿命模型的补充预警层。
摘要:This paper introduces an unsupervised health-monitoring framework for turbofan engines that does not require run-to-failure labels. First, operating-condition effects in NASA CMAPSS sensor streams are removed via regression-based normalisation; then a Long Short-Term Memory (LSTM) autoencoder is trained only on the healthy portion of each trajectory. Persistent reconstruction error, estimated using an adaptive data-driven threshold, triggers real-time alerts without hand-tuned rules. Benchmark results show high recall and low false-alarm rates across multiple operating regimes, demonstrating that the method can be deployed quickly, scale to diverse fleets, and serve as a complementary early-warning layer to Remaining Useful Life models.
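A minimal PyTorch sketch of the pattern described: train an autoencoder on healthy windows only, then alert on reconstruction error above a data-driven threshold. The layer sizes, 21-sensor width, and mean+3*std rule are illustrative assumptions, not the paper's configuration:

```python
import torch, torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_sensors, hidden=64):
        super().__init__()
        self.enc = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, n_sensors, batch_first=True)

    def forward(self, x):                      # x: (batch, time, n_sensors)
        z, _ = self.enc(x)
        recon, _ = self.dec(z)
        return recon

model = LSTMAutoencoder(n_sensors=21)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
healthy = torch.randn(32, 50, 21)              # stand-in for condition-normalised healthy windows
for _ in range(10):                            # train on the healthy portion only
    loss = nn.functional.mse_loss(model(healthy), healthy)
    opt.zero_grad(); loss.backward(); opt.step()

# Adaptive data-driven threshold: mean + 3*std of healthy reconstruction error
with torch.no_grad():
    err = ((model(healthy) - healthy) ** 2).mean(dim=(1, 2))
threshold = err.mean() + 3 * err.std()
alert = err > threshold                        # persistent exceedance would raise an alarm
```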
【3】Breaking the Limits of Open-Weight CLIP: An Optimization Framework for Self-supervised Fine-tuning of CLIP
标题:打破开放权重CLIP的限制:CLIP自监督微调的优化框架
链接:https://arxiv.org/abs/2601.09859
作者:Anant Mehta,Xiyuan Wei,Xingyu Chen,Tianbao Yang
备注:Submitted to ICLR 2026
摘要:CLIP已经成为多模态表征学习的基石,但提升其性能通常需要在数十亿样本上从头训练,代价高昂得令人望而却步。我们提出一个不同的问题:能否仅使用现有的自监督数据集来提高开放权重CLIP模型在各种下游任务中的性能?与将预训练模型适配到单个下游任务的有监督微调不同,我们的设定旨在提高各类任务上的总体性能。然而,正如我们的实验和先前的研究所揭示的,简单地从开放权重CLIP模型出发应用标准训练协议通常会失败,导致性能下降。在本文中,我们介绍了TuneCLIP,一个克服了这种性能下降的自监督微调框架。TuneCLIP有两个关键组件:(1)受理论分析启发、通过恢复优化器统计量来减少冷启动偏差的预热阶段;(2)优化一种新的对比损失以减轻对假阴性样本对的惩罚的微调阶段。我们广泛的实验表明,TuneCLIP在各种模型架构和规模上均能稳定提升性能。值得注意的是,它提升了SigLIP(ViT-B/16)等领先的开放权重模型,在ImageNet及相关分布外(OOD)基准上取得高达+2.5%的增益,在竞争激烈的DataComp基准上取得+1.2%的增益,为高效的预训练后适配设定了新的强基线。
摘要:CLIP has become a cornerstone of multimodal representation learning, yet improving its performance typically requires a prohibitively costly process of training from scratch on billions of samples. We ask a different question: Can we improve the performance of open-weight CLIP models across various downstream tasks using only existing self-supervised datasets? Unlike supervised fine-tuning, which adapts a pretrained model to a single downstream task, our setting seeks to improve general performance across various tasks. However, as both our experiments and prior studies reveal, simply applying standard training protocols starting from an open-weight CLIP model often fails, leading to performance degradation. In this paper, we introduce TuneCLIP, a self-supervised fine-tuning framework that overcomes the performance degradation. TuneCLIP has two key components: (1) a warm-up stage of recovering optimization statistics to reduce cold-start bias, inspired by theoretical analysis, and (2) a fine-tuning stage of optimizing a new contrastive loss to mitigate the penalization on false negative pairs. Our extensive experiments show that TuneCLIP consistently improves performance across model architectures and scales. Notably, it elevates leading open-weight models like SigLIP (ViT-B/16), achieving gains of up to +2.5% on ImageNet and related out-of-distribution benchmarks, and +1.2% on the highly competitive DataComp benchmark, setting a new strong baseline for efficient post-pretraining adaptation.
【4】Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control
标题:自组织双缓冲区自适应集群体验回放(SODASER)用于最优控制中的安全强化学习
链接:https://arxiv.org/abs/2601.06540
作者:Roya Khalili Amirabadi,Mohsen Jalaeian Farimani,Omid Solaymani Fard
备注:Also available at SSRN: https://ssrn.com/abstract=5191427 or http://dx.doi.org/10.2139/ssrn.5191427
摘要:本文提出了一种新的强化学习框架,名为自组织双缓冲区自适应聚类经验重放(SODACER),旨在实现非线性系统安全且可扩展的最优控制。所提出的SODACER机制包括一个用于快速适应近期经验的快速缓冲区(Fast-Buffer),以及一个配备自组织自适应聚类机制、用于保持多样且无冗余历史经验的慢速缓冲区(Slow-Buffer)。自适应聚类机制动态修剪冗余样本,在保留关键环境模式的同时优化内存效率。该方法将SODASER与控制障碍函数(CBF)相结合,在整个学习过程中施加状态和输入约束以保证安全性。为增强收敛性和稳定性,该框架与Sophia优化器相结合,实现自适应二阶梯度更新。所提出的SODACER-Sophia架构可确保在动态、安全关键环境中进行可靠、有效且稳健的学习,为机器人、医疗保健和大规模系统优化等应用提供可推广的解决方案。所提方法在具有多个控制输入和安全约束的非线性人乳头瘤病毒(HPV)传播模型上得到了验证。与随机经验重放和基于聚类的经验重放方法的比较评估表明,SODACER实现了更快的收敛、更高的样本效率以及更优的偏差-方差权衡,同时保持安全的系统轨迹,并通过弗里德曼检验得到验证。
摘要:This paper proposes a novel reinforcement learning framework, named Self-Organizing Dual-buffer Adaptive Clustering Experience Replay (SODACER), designed to achieve safe and scalable optimal control of nonlinear systems. The proposed SODACER mechanism consists of a Fast-Buffer for rapid adaptation to recent experiences and a Slow-Buffer equipped with a self-organizing adaptive clustering mechanism to maintain diverse and non-redundant historical experiences. The adaptive clustering mechanism dynamically prunes redundant samples, optimizing memory efficiency while retaining critical environmental patterns. The approach integrates SODASER with Control Barrier Functions (CBFs) to guarantee safety by enforcing state and input constraints throughout the learning process. To enhance convergence and stability, the framework is combined with the Sophia optimizer, enabling adaptive second-order gradient updates. The proposed SODACER-Sophia's architecture ensures reliable, effective, and robust learning in dynamic, safety-critical environments, offering a generalizable solution for applications in robotics, healthcare, and large-scale system optimization. The proposed approach is validated on a nonlinear Human Papillomavirus (HPV) transmission model with multiple control inputs and safety constraints. Comparative evaluations against random and clustering-based experience replay methods demonstrate that SODACER achieves faster convergence, improved sample efficiency, and a superior bias-variance trade-off, while maintaining safe system trajectories, validated via the Friedman test.
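The dual-buffer idea can be sketched as follows; the distance-based de-duplication rule below is a crude stand-in for the paper's self-organizing adaptive clustering, and all buffer sizes are arbitrary:

```python
import random
from collections import deque

class DualBufferReplay:
    """Illustrative fast/slow dual-buffer replay; not the SODACER implementation."""
    def __init__(self, fast_size=1_000, slow_size=10_000, dedup_dist=0.1):
        self.fast = deque(maxlen=fast_size)    # recent experiences
        self.slow = []                         # diverse, non-redundant history
        self.slow_size = slow_size
        self.dedup_dist = dedup_dist

    def add(self, transition):
        self.fast.append(transition)
        state = transition[0]
        # Keep the slow buffer diverse: skip samples too close to an existing one
        # (a crude proxy for adaptive clustering-based pruning).
        if all(self._dist(state, t[0]) > self.dedup_dist for t in self.slow):
            if len(self.slow) >= self.slow_size:
                self.slow.pop(random.randrange(len(self.slow)))
            self.slow.append(transition)

    @staticmethod
    def _dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def sample(self, n, fast_frac=0.5):
        k = min(int(n * fast_frac), len(self.fast))
        return random.sample(list(self.fast), k) + \
               random.sample(self.slow, min(n - k, len(self.slow)))

buf = DualBufferReplay()
buf.add(((0.1, 0.2), 0, 1.0, (0.2, 0.3)))      # (state, action, reward, next_state)
batch = buf.sample(1)
```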
迁移|Zero/Few/One-Shot|自适应(3篇)
【1】Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching
标题:通过自适应批处理实现具有多步前瞻信息的强化学习
链接:https://arxiv.org/abs/2601.10418
作者:Nadav Merlis
摘要:我们研究具有多步前瞻信息的表格型强化学习问题。在行动之前,学习者观察未来$\ell$步的转移和奖励实现:即代理将到达的确切状态,以及在任何可能的行动序列下将收集到的奖励。虽然已有研究表明这些信息可以大幅提升价值,但寻找最优策略是NP困难的,通常应用两种易于处理的启发式方法之一:以预定义大小的块处理前瞻信息("固定批处理策略"),以及模型预测控制。我们首先说明这两种方法的问题,并提出以自适应(依赖状态的)批次利用前瞻信息;我们将这类策略称为自适应批处理策略(ABP)。我们推导了这些策略的最优Bellman方程,并设计了一个乐观的遗憾最小化算法,使得在与未知环境交互时能够学习最优ABP。我们的遗憾界是阶最优的,至多相差一个前瞻视野$\ell$的因子,而后者通常可视为一个小常数。
摘要:We study tabular reinforcement learning problems with multiple steps of lookahead information. Before acting, the learner observes $\ell$ steps of future transition and reward realizations: the exact state the agent would reach and the rewards it would collect under any possible course of action. While it has been shown that such information can drastically boost the value, finding the optimal policy is NP-hard, and it is common to apply one of two tractable heuristics: processing the lookahead in chunks of predefined sizes ('fixed batching policies'), and model predictive control. We first illustrate the problems with these two approaches and propose utilizing the lookahead in adaptive (state-dependent) batches; we refer to such policies as adaptive batching policies (ABPs). We derive the optimal Bellman equations for these strategies and design an optimistic regret-minimizing algorithm that enables learning the optimal ABP when interacting with unknown environments. Our regret bounds are order-optimal up to a potential factor of the lookahead horizon $\ell$, which can usually be considered a small constant.
【2】Adaptive Label Error Detection: A Bayesian Approach to Mislabeled Data Detection
标题:自适应标签错误检测:错误标签数据检测的Bayesian方法
链接:https://arxiv.org/abs/2601.10084
作者:Zan Chaudhry,Noam H. Rotenberg,Brian Caffo,Craig K. Jones,Haris I. Sair
备注:10 pages, 5 figures
摘要:机器学习分类系统在使用不正确的真值标签进行训练时容易表现不佳,即使数据已由专家标注者精心整理。随着机器学习日益普及,识别并纠正错误标注以开发更强大的模型变得愈发重要。在这项工作中,我们阐述并描述了自适应标签错误检测(ALED),一种检测错误标注的新方法。ALED从深度卷积神经网络中提取中间特征空间,对特征进行降噪,用多维高斯分布对每个类的简化流形进行建模,并执行简单的似然比检验来识别错误标注的样本。我们表明,与已有的标签错误检测方法相比,在多个医学成像数据集上,ALED显著提高了灵敏度且不损害精度。我们展示了一个例子:在校正后的数据上微调神经网络使测试集错误减少了33.8%,为最终用户带来显著收益。ALED检测器已发布在Python包statlab中。
摘要:Machine learning classification systems are susceptible to poor performance when trained with incorrect ground truth labels, even when data is well-curated by expert annotators. As machine learning becomes more widespread, it is increasingly imperative to identify and correct mislabeling to develop more powerful models. In this work, we motivate and describe Adaptive Label Error Detection (ALED), a novel method of detecting mislabeling. ALED extracts an intermediate feature space from a deep convolutional neural network, denoises the features, models the reduced manifold of each class with a multidimensional Gaussian distribution, and performs a simple likelihood ratio test to identify mislabeled samples. We show that ALED has markedly increased sensitivity, without compromising precision, compared to established label error detection methods, on multiple medical imaging datasets. We demonstrate an example where fine-tuning a neural network on corrected data results in a 33.8% decrease in test set errors, providing strong benefits to end users. The ALED detector is deployed in the Python package statlab.
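A compact sketch of the Gaussian-per-class likelihood-ratio step (the feature extraction and denoising stages are omitted, and integer labels 0..C-1 are assumed); this is an illustration, not the statlab implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_gaussians(features, labels):
    # Model each class's (denoised, reduced) feature manifold with a Gaussian.
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        models[c] = multivariate_normal(x.mean(0), np.cov(x, rowvar=False)
                                        + 1e-6 * np.eye(x.shape[1]))
    return models

def mislabel_scores(features, labels, models):
    # Likelihood ratio: best alternative class vs. the assigned class.
    # Assumes classes are integers 0..C-1 so column c corresponds to class c.
    logp = np.stack([models[c].logpdf(features) for c in sorted(models)], axis=1)
    assigned = logp[np.arange(len(labels)), labels]
    mask = np.eye(logp.shape[1])[labels].astype(bool)      # assigned-class positions
    best_other = np.where(mask, -np.inf, logp).max(axis=1)
    return best_other - assigned    # large value => likely mislabeled

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))
labels = rng.integers(0, 3, size=200)
scores = mislabel_scores(feats, labels, fit_class_gaussians(feats, labels))
```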
【3】Classification Imbalance as Transfer Learning
标题:分类失衡作为迁移学习
链接:https://arxiv.org/abs/2601.10630
作者:Eric Xia,Jason M. Klusowski
摘要:当一个类别比另一个类别稀有得多时,就会出现分类不平衡。我们将此设置刻画为一种标签(先验)偏移下的迁移学习:源分布是由观测数据诱导的不平衡分布,目标分布是评估性能所用的平衡分布。在这个框架内,我们研究了一族过采样过程,它们通过从估计的少数类分布生成合成样本来扩充训练数据,使类别大致平衡,其中著名的SMOTE算法是一个典型例子。我们表明,超额风险可分解为在平衡训练下可达到的速率(如同数据是从平衡目标分布中抽取的)和一个附加项,即迁移成本,它量化了估计的与真实的少数类分布之间的差异。特别地,我们表明在中等高维下,SMOTE的迁移成本超过自助法(随机过采样)的迁移成本,这表明总体上自助法的性能应优于SMOTE。我们用实验证据证实了这些发现。更广泛地说,我们的结果为不平衡分类中增广策略的选择提供了指导。
摘要:Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that we should expect bootstrapping to have better performance than SMOTE in general. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.
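For reference, textbook forms of the two oversamplers being compared, random oversampling (bootstrapping) and SMOTE-style interpolation, in NumPy; this reproduces the mechanisms, not the paper's analysis:

```python
import numpy as np

def bootstrap_oversample(x_min, n_new, rng):
    # Random oversampling: resample minority points with replacement.
    idx = rng.integers(0, len(x_min), size=n_new)
    return x_min[idx]

def smote_oversample(x_min, n_new, rng, k=5):
    # SMOTE: interpolate between a minority point and one of its k nearest
    # minority neighbours at a random fraction.
    d = np.linalg.norm(x_min[:, None] - x_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    i = rng.integers(0, len(x_min), size=n_new)
    j = nn[i, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    return x_min[i] + lam * (x_min[j] - x_min[i])

rng = np.random.default_rng(0)
x_min = rng.normal(size=(20, 5))               # toy minority class
synth = smote_oversample(x_min, n_new=80, rng=rng)
boot = bootstrap_oversample(x_min, n_new=80, rng=rng)
```

The paper's point is that in moderately high dimensions the interpolated samples drift further from the true minority distribution than resampled ones, inflating the cost of transfer.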
强化学习(3篇)
【1】Projected Microbatch Accumulation yields reference-free proximal policy updates for reinforcement learning
标题:投影微批累积为强化学习产生无参考的近端策略更新
链接:https://arxiv.org/abs/2601.10498
作者:Nilin Abrahamsen
摘要:本文介绍了投影微批累积(PROMA),这是一种用于大型语言模型微调的近端策略更新方法。PROMA在微批聚合之前投影掉逐序列的梯度分量,从而跨微批累积策略梯度。该投影在反向传播期间逐层应用,无需额外的前向或反向传播即可高效实现。经验上,PROMA比GRPO更严格地控制局部KL散度,从而实现更稳定的策略学习。与PPO和GRPO不同,PROMA在不引起熵崩溃的情况下实现近端更新,并且不依赖参考策略或似然比裁剪。
摘要:This note introduces Projected Microbatch Accumulation (PROMA), a proximal policy update method for large language model fine-tuning. PROMA accumulates policy gradients across microbatches by projecting out sequence-wise gradient components before microbatch aggregation. The projection is applied layer-wise during the backward pass, enabling efficient implementation without additional forward or backward passes. Empirically, PROMA enforces tighter control of local KL divergence than GRPO, resulting in more stable policy learning. Unlike PPO and GRPO, PROMA achieves proximal updates without inducing entropy collapse and does not rely on a reference policy or likelihood-ratio clipping.
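One speculative reading of "projecting out sequence-wise gradient components before microbatch aggregation" is a Gram-Schmidt-style accumulation over flattened per-sequence gradients, sketched below on toy NumPy vectors; the note's actual layer-wise operation during the backward pass may differ:

```python
import numpy as np

def accumulate_with_projection(seq_grads):
    """Accumulate per-sequence gradients, keeping for each new gradient only the
    component orthogonal to the directions of earlier sequences. Toy sketch only."""
    total = np.zeros_like(seq_grads[0])
    basis = []                            # orthonormal directions of earlier sequences
    for g in seq_grads:
        r = g.astype(float).copy()
        for b in basis:                   # project out components along earlier sequences
            r -= (r @ b) * b
        total += r
        n = np.linalg.norm(r)
        if n > 1e-12:
            basis.append(r / n)           # Gram-Schmidt extension of the basis
    return total

grads = [np.random.randn(128) for _ in range(4)]   # flattened per-microbatch gradients
update = accumulate_with_projection(grads)
```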
【2】Reinforcement Learning to Discover a NorthEast Monsoon Index for Monthly Rainfall Prediction in Thailand
标题:强化学习发现用于泰国月度降雨量预测的东北季风指数
链接:https://arxiv.org/abs/2601.10181
作者:Kiattikun Chobtham
摘要:气候预测是一项挑战,因为地球系统内的时空模式错综复杂。全球气候指数,如厄尔尼诺南方涛动,是长期降雨预测的标准输入特征。然而,在能够提高泰国特定地区预测准确性的地方尺度指数方面仍然存在重大差距。本文介绍了一个新的东北季风气候指数,它是由海表温度计算的,以反映北方冬季风的气候学特征。为了优化用于该指数的计算面积,Deep Q-Network强化学习代理根据其与季节性降雨的相关性探索并选择最有效的矩形。雨量站分为12个不同的集群,以区分泰国南部和北部之间的降雨模式。实验结果表明,将优化后的指标纳入长短期记忆模型显着提高长期月降雨量预测技能,在大多数集群地区。这种方法有效地降低了12个月预测的均方根误差。
摘要:Climate prediction is a challenge due to the intricate spatiotemporal patterns within Earth systems. Global climate indices, such as the El Niño Southern Oscillation, are standard input features for long-term rainfall prediction. However, a significant gap persists regarding local-scale indices capable of improving predictive accuracy in specific regions of Thailand. This paper introduces a novel NorthEast monsoon climate index calculated from sea surface temperature to reflect the climatology of the boreal winter monsoon. To optimise the calculated areas used for this index, a Deep Q-Network reinforcement learning agent explores and selects the most effective rectangles based on their correlation with seasonal rainfall. Rainfall stations were classified into 12 distinct clusters to distinguish rainfall patterns between southern and upper Thailand. Experimental results show that incorporating the optimised index into Long Short-Term Memory models significantly improves long-term monthly rainfall prediction skill in most cluster areas. This approach effectively reduces the Root Mean Square Error for 12-month-ahead forecasts.
【3】OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing
标题:OUTLINEFORGE:用于科学写作的显式状态分层强化学习
链接:https://arxiv.org/abs/2601.09858
作者:Yilin Bao,Ziyao He,Zayden Yang
摘要:科学论文的生成需要文档级的规划和事实依据,但当前的大型语言模型尽管具有很强的局部流畅性,却往往在全局结构、输入覆盖率和引用一致性方面失败。我们提出了一个强化学习框架,将科学大纲构建视为针对分层文档结构的长程规划问题。我们的方法通过结构化动作对不断演化的大纲的编辑进行建模,使系统能够逐步构建完整的科学手稿。为了支持有效且稳定的学习,我们引入了一个两阶段优化过程,包括(i)从部分计划进行反向大纲重建以加强全局结构一致性,以及(ii)前向价值引导的强化学习,其奖励显式建模科学正确性、语篇连贯性和引用保真度。此外,我们还提出了一个科学论文生成基准,评估文档规划、输入利用率、参考文献忠实度、大纲组织以及内容层面的事实准确性。我们的结果显示,相对强大的神经与LLM基线均取得一致的改进,尤其是在长程结构一致性和引用可靠性方面。
摘要:Scientific paper generation requires document-level planning and factual grounding, but current large language models, despite their strong local fluency, often fail in global structure, input coverage, and citation consistency. We present a reinforcement learning framework that casts scientific outline construction as a long-horizon planning problem over hierarchical document structures. Our approach models edits to evolving outlines through structured actions, enabling the system to incrementally build a complete scientific manuscript. To support effective and stable learning, we introduce a two-stage optimization procedure consisting of (i) backward outline reconstruction from partial plans to enforce global structural consistency, and (ii) forward value-guided reinforcement learning with rewards explicitly modeling scientific correctness, discourse coherence, and citation fidelity. In addition, we introduce a benchmark for scientific paper generation that evaluates document planning, input utilization, reference faithfulness, outline organization, and content-level factual accuracy. Our results show consistent improvements over strong neural and LLM baselines, particularly in long-range structural coherence and citation reliability.
元学习(1篇)
【1】Bayesian Meta-Analyses Could Be More: A Case Study in Trial of Labor After a Cesarean-section Outcomes and Complications
标题:Bayesian荟萃分析本可以做得更多:剖宫产后试产(TOLAC)结局与并发症的案例研究
链接:https://arxiv.org/abs/2601.10089
作者:Ashley Klein,Edward Raff,Marcia DesJardin
备注:To appear in AAAI 2026
摘要:荟萃分析的效用取决于先前的研究是否准确捕获了感兴趣的变量,但在医学研究中,一个影响医生决策的关键决策变量并未被捕获。这导致效应量未知,结论不可靠。贝叶斯方法可以帮助分析判断正面效应的论断是否仍然成立,我们针对这种常见的医疗情形构建了一种贝叶斯方法。为了证明其效用,我们协助专业妇产科医生评估剖宫产后试产(TOLAC)的情形(此时患者可用的干预措施很少),并找到了医生推进患者护理所需的支持。
摘要:The meta-analysis's utility is dependent on previous studies having accurately captured the variables of interest, but in medical studies, a key decision variable that impacts a physician's decisions was not captured. This results in an unknown effect size and unreliable conclusions. A Bayesian approach may allow analysis to determine if the claim of a positive effect is still warranted, and we build a Bayesian approach to this common medical scenario. To demonstrate its utility, we assist professional OBGYNs in evaluating Trial of Labor After a Cesarean-section (TOLAC) situations where few interventions are available for patients and find the support needed for physicians to advance patient care.
医学相关(2篇)
【1】MHub.ai: A Simple, Standardized, and Reproducible Platform for AI Models in Medical Imaging
标题:MHub.ai:一个简单、标准化且可重复的医学成像人工智能模型平台
链接:https://arxiv.org/abs/2601.10154
作者:Leonard Nürnberg,Dennis Bontempi,Suraj Pai,Curtis Lisle,Steve Pieper,Ron Kikinis,Sil van de Leemput,Rahul Soni,Gowtham Murugesan,Cosmin Ciausu,Miriam Groeneveld,Felix J. Dorfner,Jue Jiang,Aneesh Rangnekar,Harini Veeraraghavan,Joeran S. Bosma,Keno Bressem,Raymond Mak,Andrey Fedorov,Hugo JWL Aerts
备注:41 pages, 15 figures, 6 tables
摘要:人工智能(AI)有可能通过自动化图像分析和加速临床研究来改变医学成像。然而,研究和临床使用受到各种各样的AI实现和架构、不一致的文档和再现性问题的限制。在这里,我们介绍了MHub.ai,这是一个开源的、基于容器的平台,它可以以最少的配置访问AI模型,从而提高医学成像的可访问性和可再现性。MHub.ai将来自同行评审出版物的模型打包到标准化容器中,这些容器支持直接处理DICOM和其他格式,提供统一的应用程序接口,并嵌入结构化元数据。每个模型都附有可用于确认模型运行的公开参考数据。MHub.ai包括用于不同模态的最先进的分割、预测和特征提取模型的初始集合。模块化框架支持对任何模型进行调整,并支持社区贡献。我们通过对肺部分割模型的比较评估,证明了该平台在临床用例中的实用性。为了进一步增强透明度和可重复性,我们公开发布生成的细分和评估指标,并提供交互式仪表板,允许读者检查单个案例并复制或扩展我们的分析。通过简化模型的使用,MHub.ai能够使用相同的执行命令和标准化输出进行并行基准测试,并降低了临床翻译的障碍。
摘要:Artificial intelligence (AI) has the potential to transform medical imaging by automating image analysis and accelerating clinical research. However, research and clinical use are limited by the wide variety of AI implementations and architectures, inconsistent documentation, and reproducibility issues. Here, we introduce MHub.ai, an open-source, container-based platform that standardizes access to AI models with minimal configuration, promoting accessibility and reproducibility in medical imaging. MHub.ai packages models from peer-reviewed publications into standardized containers that support direct processing of DICOM and other formats, provide a unified application interface, and embed structured metadata. Each model is accompanied by publicly available reference data that can be used to confirm model operation. MHub.ai includes an initial set of state-of-the-art segmentation, prediction, and feature extraction models for different modalities. The modular framework enables adaptation of any model and supports community contributions. We demonstrate the utility of the platform in a clinical use case through comparative evaluation of lung segmentation models. To further strengthen transparency and reproducibility, we publicly release the generated segmentations and evaluation metrics and provide interactive dashboards that allow readers to inspect individual cases and reproduce or extend our analysis. By simplifying model use, MHub.ai enables side-by-side benchmarking with identical execution commands and standardized outputs, and lowers the barrier to clinical translation.
【2】LeMoF: Level-guided Multimodal Fusion for Heterogeneous Clinical Data
标题:LeMoF:面向异构临床数据的层级引导多模态融合
链接:https://arxiv.org/abs/2601.10092
作者:Jongseok Kim,Seongae Kang,Jonghwan Shin,Yuhan Lee,Ohyun Jo
摘要:多模态临床预测被广泛用于整合电子健康记录(EHR)和生物信号等异构数据。然而,现有方法往往依赖静态的模态集成方案和简单的融合策略,因此未能充分利用特定于模态的表示。在本文中,我们提出了层级引导模态融合(LeMoF),一个在每个模态内有选择地整合层级引导表示的新框架。每个层级指的是从编码器不同层提取的表示。LeMoF显式地将全局模态级预测与特定层级的判别性表示分开学习。这种设计使LeMoF即使在异构的临床环境中也能在预测稳定性和判别能力之间取得平衡。使用重症监护病房(ICU)数据进行的住院时长预测实验表明,LeMoF在各种编码器配置下始终优于现有最先进的多模态融合技术。我们还证实,层级式整合是在各种临床条件下实现稳健预测性能的关键因素。
摘要:Multimodal clinical prediction is widely used to integrate heterogeneous data such as Electronic Health Records (EHR) and biosignals. However, existing methods tend to rely on static modality integration schemes and simple fusion strategies. As a result, they fail to fully exploit modality-specific representations. In this paper, we propose Level-guided Modal Fusion (LeMoF), a novel framework that selectively integrates level-guided representations within each modality. Each level refers to a representation extracted from a different layer of the encoder. LeMoF explicitly separates and learns global modality-level predictions from level-specific discriminative representations. This design enables LeMoF to achieve a balanced performance between prediction stability and discriminative capability even in heterogeneous clinical environments. Experiments on length of stay prediction using Intensive Care Unit (ICU) data demonstrate that LeMoF consistently outperforms existing state-of-the-art multimodal fusion techniques across various encoder configurations. We also confirmed that level-wise integration is a key factor in achieving robust predictive performance across various clinical conditions.
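An illustrative shape of level-guided fusion: gather representations from several encoder layers ("levels") per modality and learn a per-modality weighting over levels before fusing. Dimensions and the softmax gate below are assumptions, not the paper's design:

```python
import torch, torch.nn as nn

class LevelGuidedFusion(nn.Module):
    def __init__(self, dim, n_levels, n_modalities):
        super().__init__()
        self.gates = nn.Parameter(torch.zeros(n_modalities, n_levels))
        self.head = nn.Linear(dim * n_modalities, 1)

    def forward(self, level_feats):            # list (per modality) of (B, L, D)
        fused = []
        for m, feats in enumerate(level_feats):
            w = torch.softmax(self.gates[m], dim=0)        # weight each level
            fused.append((w[None, :, None] * feats).sum(dim=1))
        return self.head(torch.cat(fused, dim=-1))

ehr = torch.randn(8, 4, 32)                    # 4 levels of EHR encoder features
bio = torch.randn(8, 4, 32)                    # 4 levels of biosignal features
model = LevelGuidedFusion(dim=32, n_levels=4, n_modalities=2)
pred = model([ehr, bio])                       # (8, 1), e.g. a length-of-stay score
```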
蒸馏|知识提取(2篇)
【1】DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction
标题:DeFlow:解耦流形建模与价值最大化以实现离线策略提取
链接:https://arxiv.org/abs/2601.10471
作者:Zhancun Mu
备注:13 pages, 3 figures
摘要:我们提出了DeFlow,一个解耦的离线RL框架,它利用流匹配来忠实地捕捉复杂的行为流形。优化生成式策略的计算代价高得令人望而却步,通常需要通过ODE求解器进行反向传播。我们通过在流形上一个显式的、由数据导出的信任域内学习轻量级细化模块来解决这一问题,而不是通过单步蒸馏牺牲迭代生成能力。这样,我们绕过了求解器微分,消除了平衡各损失项的需要,在完全保留流的迭代表达能力的同时确保稳定的改进。经验上,DeFlow在具有挑战性的OGBench基准上取得了优越的性能,并展示了高效的离线到在线适应能力。
摘要:We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
【2】CAFEDistill: Learning Personalized and Dynamic Models through Federated Early-Exit Network Distillation
标题:CAFEDistill:通过联邦提前退出网络蒸馏学习个性化和动态模型
链接:https://arxiv.org/abs/2601.10015
作者:Boyi Liu,Zimu Zhou,Yongxin Tong
备注:12 pages, conference
摘要:个性化联邦学习(PFL)支持在分散的异构数据上进行协作模型训练,同时针对每个客户端的独特分布进行定制。然而,现有的PFL方法产生的是静态模型,其准确性与效率之间的权衡是固定的,这限制了它们在推理需求随场景和资源可用性而变化的环境中的适用性。提前退出网络(EEN)通过附加中间分类器提供自适应推理。然而,由于客户端间的异质性以及相互冲突的退出目标所引起的深度方向干扰,将其整合到PFL中具有挑战性。先前的研究未能同时解决这两种冲突,导致性能次优。在本文中,我们提出了CAFEDistill,一个冲突感知的联邦退出蒸馏框架,它联合解决这些冲突并将PFL扩展到提前退出网络。通过一种渐进的、深度优先的学生协调机制,CAFEDistill减少了浅层与深层退出之间的干扰,同时允许跨客户端进行有效的个性化知识迁移。此外,它通过客户端解耦的公式化设计降低了通信开销。广泛的评估表明,CAFEDistill优于最先进的方法,实现了更高的准确性,并将推理成本降低了30.79%-46.86%。
摘要:Personalized Federated Learning (PFL) enables collaboratively model training on decentralized, heterogeneous data while tailoring them to each client's unique distribution. However, existing PFL methods produce static models with a fixed tradeoff between accuracy and efficiency, limiting their applicability in environments where inference requirements vary with contexts and resource availability. Early-exit networks (EENs) offer adaptive inference by attaching intermediate classifiers. Yet integrating them into PFL is challenging due to client-wise heterogeneity and depth-wise interference arising from conflicting exit objectives. Prior studies fail to resolve both conflicts simultaneously, leading to suboptimal performance. In this paper, we propose CAFEDistill, a Conflict-Aware Federated Exit Distillation framework that jointly addresses these conflicts and extends PFL to early-exit networks. Through a progressive, depth-prioritized student coordination mechanism, CAFEDistill mitigates interference among shallow and deep exits while allowing effective personalized knowledge transfer across clients. Furthermore, it reduces communication overhead via a client-decoupled formulation. Extensive evaluations show that CAFEDistill outperforms the state-of-the-arts, achieving higher accuracy and reducing inference costs by 30.79%-46.86%.
推荐(1篇)
【1】Efficient Content-based Recommendation Model Training via Noise-aware Coreset Selection
标题:通过噪音感知核心集选择进行高效的基于内容的推荐模型训练
链接:https://arxiv.org/abs/2601.10067
作者:Hung Vinh Tran,Tong Chen,Hechuan Wen,Quoc Viet Hung Nguyen,Bin Cui,Hongzhi Yin
备注:WebConf 2026
摘要:基于内容的推荐系统(CRS)利用内容特征来预测用户与物品的交互,是帮助用户浏览信息丰富的Web服务的重要工具。然而,确保CRS的有效性需要大规模甚至持续的模型训练以适应多样的用户偏好,从而导致显著的计算成本和资源需求。应对这一挑战的一个有前途的方法是核心集选择,即找出一个小而有代表性的数据样本子集,在降低训练开销的同时保持模型质量。然而,所选核心集容易受到用户-物品交互中普遍存在的噪声影响,尤其是当其规模极小时。为此,我们提出了噪声感知核心集选择(NaCS),一个专为CRS设计的框架。NaCS基于训练梯度通过次模优化来构建核心集,同时使用渐进式训练的模型来校正噪声标签。与此同时,我们通过不确定性量化过滤掉低置信度样本来细化所选核心集,从而避免使用不可靠的交互进行训练。通过大量实验,我们表明NaCS能为CRS生成更高质量的核心集,同时比现有核心集选择技术具有更高的效率。值得注意的是,NaCS仅使用1%的训练数据就恢复了93-95%的全数据集训练性能。源代码可在https://github.com/chenxing1999/nacs获得。
摘要:Content-based recommendation systems (CRSs) utilize content features to predict user-item interactions, serving as essential tools for helping users navigate information-rich web services. However, ensuring the effectiveness of CRSs requires large-scale and even continuous model training to accommodate diverse user preferences, resulting in significant computational costs and resource demands. A promising approach to this challenge is coreset selection, which identifies a small but representative subset of data samples that preserves model quality while reducing training overhead. Yet, the selected coreset is vulnerable to the pervasive noise in user-item interactions, particularly when it is minimally sized. To this end, we propose Noise-aware Coreset Selection (NaCS), a specialized framework for CRSs. NaCS constructs coresets through submodular optimization based on training gradients, while simultaneously correcting noisy labels using a progressively trained model. Meanwhile, we refine the selected coreset by filtering out low-confidence samples through uncertainty quantification, thereby avoiding training with unreliable interactions. Through extensive experiments, we show that NaCS produces higher-quality coresets for CRSs while achieving better efficiency than existing coreset selection techniques. Notably, NaCS recovers 93-95% of full-dataset training performance using merely 1% of the training data. The source code is available at https://github.com/chenxing1999/nacs.
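A generic stand-in for gradient-based submodular selection, greedy facility location on per-sample gradient similarities; NaCS's actual objective, label correction, and uncertainty filtering are not reproduced here:

```python
import numpy as np

def greedy_coreset(grads, k):
    # Normalise per-sample gradient embeddings so similarities lie in [-1, 1].
    g = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    sim = g @ g.T                          # cosine similarity of per-sample gradients
    best = np.full(len(g), -1.0)           # coverage so far (cosine is >= -1)
    selected = []
    for _ in range(k):
        # Marginal gain of each candidate under the facility-location objective
        # f(S) = sum_i max_{j in S} sim(i, j).
        gains = np.maximum(sim, best).sum(axis=1) - best.sum()
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[j])
    return selected

grads = np.random.randn(100, 16)           # per-sample gradient embeddings
coreset_idx = greedy_coreset(grads, k=10)
```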
聚类(2篇)
【1】CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data
标题:CROCS:基于智能电表数据、以行为为中心的消费者细分两阶段聚类框架
链接:https://arxiv.org/abs/2601.10494
作者:Luke W. Yerbury,Ricardo J. G. B. Campello,G. C. Livingston,Mark Goldsworthy,Lachlan O'Neil
摘要:随着电网运营商面临可再生能源整合带来的不确定性上升以及更广泛的电气化推进,需求侧管理(DSM),特别是需求响应(DR),作为平衡现代电力系统的低成本高效益机制引起了广泛关注。全球持续部署智能电表带来的空前规模的用电数据,使得基于真实使用行为对消费者进行细分成为可能,有望为更有效的DSM和DR计划设计提供依据。然而,现有的基于聚类的细分方法不能充分反映消费者的行为多样性,通常依赖于僵硬的时间对齐,并且在存在异常、缺失数据或大规模部署的情况下表现不佳。 为了解决这些挑战,我们提出了一种新的两阶段聚类框架:优化消费者细分的聚类表示(CROCS)。在第一阶段,每个消费者的日负荷曲线被独立聚类,形成一个代表性负荷集(RLS),紧凑地概括其典型的昼间用电行为。在第二阶段,使用最小距离加权和(WSMD)对消费者进行聚类,这是一种新的集合到集合度量,通过同时考虑这些行为的普遍程度和相似性来比较RLS。最后,对WSMD诱导图的社区检测揭示了体现消费者群体共享昼间行为的高阶原型,增强了所得聚类的可解释性。 在合成和真实澳大利亚智能电表数据集上进行的广泛实验表明,CROCS捕获了消费者内部的变化,揭示了同步和异步的行为相似性,并对异常和缺失数据保持鲁棒,同时通过自然并行化高效扩展。这些结果...
摘要:With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...
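A minimal sketch of a weighted sum-of-minimum-distances comparison between two Representative Load Sets; the symmetrisation and prevalence weighting below are assumptions about the exact form of WSMD:

```python
import numpy as np

def wsmd(rls_a, w_a, rls_b, w_b):
    """Set-to-set distance between two Representative Load Sets.
    rls_*: (n_profiles, 24) daily-profile centroids; w_*: prevalence weights."""
    d = np.linalg.norm(rls_a[:, None, :] - rls_b[None, :, :], axis=-1)
    a_to_b = np.sum(w_a * d.min(axis=1)) / np.sum(w_a)   # each A profile to nearest B
    b_to_a = np.sum(w_b * d.min(axis=0)) / np.sum(w_b)   # each B profile to nearest A
    return 0.5 * (a_to_b + b_to_a)

rls_a = np.random.rand(3, 24); w_a = np.array([0.6, 0.3, 0.1])
rls_b = np.random.rand(4, 24); w_b = np.array([0.4, 0.3, 0.2, 0.1])
print(wsmd(rls_a, w_a, rls_b, w_b))
```

Pairwise WSMD values over all consumers would then induce the graph on which community detection extracts the higher-order prototypes.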
【2】Detecting Batch Heterogeneity via Likelihood Clustering
标题:通过似然聚类检测批次异质性
链接:https://arxiv.org/abs/2601.09758
作者:Austin Talbot,Yue Ke
摘要:批次效应是基因组诊断中的一个主要混杂因素。在NGS的拷贝数变异(CNV)检测中,许多算法在假设测试样品与参考样品经过相同流程处理的前提下比较两者的读取深度。当这一假设被违反时(原因可能从试剂批次变更到多中心处理不等),参考便不再合适,从而引入错误的CNV判读或掩盖真正的致病变异。在下游分析之前检测这种异质性对于可靠的临床解读至关重要。现有的批次效应检测方法要么基于原始特征对样本进行聚类,冒着将生物学信号与技术变异混为一谈的风险,要么需要往往不可得的已知批次标签。我们介绍了一种方法,通过根据贝叶斯模型证据对样本进行聚类来同时解决这两个限制。其核心洞见是,证据量化了数据与模型假设之间的兼容性:技术伪影违反假设并降低证据,而包括CNV状态在内的生物学变异是模型所预期的,会产生高证据。这种不对称性提供了将批次效应与生物学区分开来的判别信号。我们将异质性检测形式化为证据空间中混合结构的似然比检验,使用参数自助法校准以确保保守的假阳性率。我们在合成数据上验证了我们的方法,证明了适当的I型错误控制;在表现出不同批次效应机制的三个临床靶向测序面板(液体活检、BRCA和地中海贫血)上进行了验证;并在小鼠电生理记录上展示了跨模态的泛化能力。与标准的基于相关性和降维的方法相比,我们的方法实现了更高的聚类精度,同时保持了临床使用所需的保守性。
摘要:Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from NGS, many algorithms compare read depth between test samples and a reference sample, assuming they are process-matched. When this assumption is violated, with causes ranging from reagent lot changes to multi-site processing, the reference becomes inappropriate, introducing false CNV calls or masking true pathogenic variants. Detecting such heterogeneity before downstream analysis is critical for reliable clinical interpretation. Existing batch effect detection methods either cluster samples based on raw features, risking conflation of biological signal with technical variation, or require known batch labels that are frequently unavailable. We introduce a method that addresses both limitations by clustering samples according to their Bayesian model evidence. The central insight is that evidence quantifies compatibility between data and model assumptions, technical artifacts violate assumptions and reduce evidence, whereas biological variation, including CNV status, is anticipated by the model and yields high evidence. This asymmetry provides a discriminative signal that separates batch effects from biology. We formalize heterogeneity detection as a likelihood ratio test for mixture structure in evidence space, using parametric bootstrap calibration to ensure conservative false positive rates. We validate our approach on synthetic data demonstrating proper Type I error control, three clinical targeted sequencing panels (liquid biopsy, BRCA, and thalassemia) exhibiting distinct batch effect mechanisms, and mouse electrophysiology recordings demonstrating cross-modality generalization. Our method achieves superior clustering accuracy compared to standard correlation-based and dimensionality-reduction approaches while maintaining the conservativeness required for clinical usage.
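The mixture likelihood-ratio test with parametric bootstrap calibration can be sketched in scikit-learn as follows, operating on per-sample evidence values; a simplified illustration, not the paper's pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def lrt_mixture_stat(evidence):
    # LRT statistic for 2-component vs. 1-component structure in evidence space.
    e = np.asarray(evidence).reshape(-1, 1)
    ll1 = GaussianMixture(1, random_state=0).fit(e).score(e) * len(e)
    ll2 = GaussianMixture(2, random_state=0).fit(e).score(e) * len(e)
    return 2 * (ll2 - ll1)

def bootstrap_pvalue(evidence, n_boot=200, seed=0):
    # Parametric bootstrap: simulate the null (a single homogeneous batch) from a
    # one-Gaussian fit and compare the observed statistic to the null distribution.
    rng = np.random.default_rng(seed)
    stat = lrt_mixture_stat(evidence)
    mu, sd = np.mean(evidence), np.std(evidence)
    null = [lrt_mixture_stat(rng.normal(mu, sd, size=len(evidence)))
            for _ in range(n_boot)]
    return (1 + sum(s >= stat for s in null)) / (1 + n_boot)

evidence = np.concatenate([np.random.normal(0, 1, 120),
                           np.random.normal(3, 1, 80)])   # two latent batches
print(bootstrap_pvalue(evidence, n_boot=50))
```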
自动驾驶|车辆|车道检测等(2篇)
【1】See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection
标题:少看,驾驶得更好:通过基础模型随机补丁选择实现可推广的端到端自动驾驶
链接:https://arxiv.org/abs/2601.10707
作者:Amir Mallak,Erfan Aasi,Shiva Sreeram,Tsun-Hsuan Wang,Daniela Rus,Alaa Maalouf
摘要:端到端自动驾驶的最新进展表明,在从基础模型中提取的补丁对齐特征上训练的策略能更好地泛化到分布外(OOD)场景。我们假设,由于自注意力机制,每个补丁特征都以不同的方式和强度隐式嵌入/包含来自所有其他补丁的信息,使这些描述符高度冗余。我们通过PCA和跨补丁相似性来量化这些(BLIP2)特征中的冗余:$90$%的方差由$17/64$个主成分捕获,且强烈的token间相关性普遍存在。在这种重叠信息上训练会导致策略过拟合虚假相关性,从而损害OOD鲁棒性。我们提出了随机补丁选择(SPS),一种简单而有效的方法,用于学习更鲁棒、更可泛化、更高效的策略。对于每一帧,SPS随机屏蔽一部分补丁描述符,不将它们馈送给策略模型,同时保留其余补丁的空间布局。因此,该策略获得了(同一)场景的不同随机但完整的视图:每个随机的补丁子集都像是世界的一个不同但仍然合理、连贯的投影。因此,该策略将其决策建立在对具体哪些token留存下来保持不变的特征之上。大量实验证实,在所有OOD场景中,我们的方法优于最先进方法(SOTA),在闭环模拟中实现了平均$6.2$%、最高$20.4$%的改进,同时速度快$2.4\times$。我们对掩蔽率和补丁特征重组进行了消融实验,训练并评估了9个系统,其中8个超过了之前的SOTA。最后,我们表明,同一个学习到的策略无需任何调整即可迁移到实体的真实世界汽车上。
摘要:Recent advances in end-to-end autonomous driving show that policies trained on patch-aligned features extracted from foundation models generalize better to Out-of-Distribution (OOD). We hypothesize that due to the self-attention mechanism, each patch feature implicitly embeds/contains information from all other patches, represented in a different way and intensity, making these descriptors highly redundant. We quantify redundancy in such (BLIP2) features via PCA and cross-patch similarity: $90$% of variance is captured by $17/64$ principal components, and strong inter-token correlations are pervasive. Training on such overlapping information leads the policy to overfit spurious correlations, hurting OOD robustness. We present Stochastic-Patch-Selection (SPS), a simple yet effective approach for learning policies that are more robust, generalizable, and efficient. For every frame, SPS randomly masks a fraction of patch descriptors, not feeding them to the policy model, while preserving the spatial layout of the remaining patches. Thus, the policy is provided with different stochastic but complete views of the (same) scene: every random subset of patches acts like a different, yet still sensible, coherent projection of the world. The policy thus bases its decisions on features that are invariant to which specific tokens survive. Extensive experiments confirm that across all OOD scenarios, our method outperforms the state of the art (SOTA), achieving a $6.2$% average improvement and up to $20.4$% in closed-loop simulations, while being $2.4\times$ faster. We conduct ablations over masking rates and patch-feature reorganization, training and evaluating 9 systems, with 8 of them surpassing prior SOTA. Finally, we show that the same learned policy transfers to a physical, real-world car without any tuning.
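The core SPS operation is simple to sketch; zeroing masked descriptors (rather than dropping tokens) is an assumption made here to keep the spatial layout explicit:

```python
import torch

def stochastic_patch_select(patch_feats, keep_frac=0.7):
    """Randomly mask a fraction of patch descriptors per frame while preserving
    the spatial layout of the remaining patches."""
    b, n, d = patch_feats.shape            # (batch, e.g. 8x8=64 patches, feat dim)
    keep = (torch.rand(b, n, 1) < keep_frac).float()
    return patch_feats * keep              # masked patches zeroed, layout unchanged

feats = torch.randn(4, 64, 768)            # e.g. BLIP2-style patch-aligned features
views = [stochastic_patch_select(feats) for _ in range(3)]   # stochastic complete views
```

Each call yields a different random but coherent "view" of the same scene, which is what pushes the policy toward features invariant to which tokens survive.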
【2】Bias in the Shadows: Explore Shortcuts in Encrypted Network Traffic Classification
标题:阴影中的偏见:探索加密网络流量分类的捷径
链接:https://arxiv.org/abs/2601.10180
作者:Chuyi Wang,Xiaohui Xie,Tongze Wang,Yong Cui
摘要:直接在原始字节上运行的预训练模型在加密网络流量分类(NTC)中取得了可观的性能,但通常存在捷径学习问题,即依赖无法泛化到真实数据的虚假相关性。现有的解决方案严重依赖特定于模型的解释技术,在不同模型架构和部署场景之间缺乏适应性和通用性。 在本文中,我们提出了BiasSeeker,这是第一个既与模型无关又由数据驱动的半自动化框架,用于检测加密流量中特定于数据集的捷径特征。通过直接对原始二进制流量进行统计相关性分析,BiasSeeker可以在不依赖任何分类器的情况下,识别可能损害泛化能力的虚假或与环境纠缠的特征。为了应对捷径特征的多样性,我们引入了系统的分类法,并应用特定于类别的验证策略,在保留有意义信息的同时减少偏差。 我们在三个NTC任务的19个公共数据集上评估了BiasSeeker。通过强调上下文感知的特征选择和特定于数据集的诊断,BiasSeeker为理解和解决加密网络流量分类中的捷径学习提供了一个新的视角,提升了人们的认识:特征选择应当是模型训练之前一个有意为之且对场景敏感的步骤。
摘要:Pre-trained models operating directly on raw bytes have achieved promising performance in encrypted network traffic classification (NTC), but often suffer from shortcut learning-relying on spurious correlations that fail to generalize to real-world data. Existing solutions heavily rely on model-specific interpretation techniques, which lack adaptability and generality across different model architectures and deployment scenarios. In this paper, we propose BiasSeeker, the first semi-automated framework that is both model-agnostic and data-driven for detecting dataset-specific shortcut features in encrypted traffic. By performing statistical correlation analysis directly on raw binary traffic, BiasSeeker identifies spurious or environment-entangled features that may compromise generalization, independent of any classifier. To address the diverse nature of shortcut features, we introduce a systematic categorization and apply category-specific validation strategies that reduce bias while preserving meaningful information. We evaluate BiasSeeker on 19 public datasets across three NTC tasks. By emphasizing context-aware feature selection and dataset-specific diagnosis, BiasSeeker offers a novel perspective for understanding and addressing shortcut learning in encrypted network traffic classification, raising awareness that feature selection should be an intentional and scenario-sensitive step prior to model training.
联邦学习|隐私保护|加密(4篇)
【1】Communication-Efficient and Privacy-Adaptable Mechanism -- a Federated Learning Scheme with Convergence Analysis
标题:通信高效且隐私自适应机制--一种具有收敛分析的联邦学习方案
链接:https://arxiv.org/abs/2601.10701
作者:Chun Hei Michael Shiu,Chih Wei Ling
备注:19 pages, 5 figures. This work is submitted in part to the 2026 IEEE International Symposium on Information Theory (ISIT). arXiv admin note: substantial text overlap with arXiv:2501.12046
摘要:联邦学习使多方能够在不共享各自底层数据的情况下联合训练学习模型,为数据治理约束下的隐私保护协作提供了一条切实可行的途径。持续研究联邦学习对于解决其中的关键挑战至关重要,包括各方之间的通信效率和隐私保护。最近的一系列工作引入了一种称为通信高效且隐私自适应机制(CEPAM)的新方法,可同时实现这两个目标。CEPAM利用了拒绝采样通用量化器(RSUQ),这是一种随机化矢量量化器,其量化误差等价于一个规定的噪声,可以对其进行调节以定制各方之间的隐私保护。在这项工作中,我们从理论上分析了CEPAM的隐私保证和收敛性质。此外,我们通过实验评估了CEPAM的效用性能,包括与其他基线相比的收敛曲线,以及不同参与方之间的准确性-隐私权衡。
摘要:Federated learning enables multiple parties to jointly train learning models without sharing their own underlying data, offering a practical pathway to privacy-preserving collaboration under data-governance constraints. Continued study of federated learning is essential to address key challenges in it, including communication efficiency and privacy protection between parties. A recent line of work introduced a novel approach called the Communication-Efficient and Privacy-Adaptable Mechanism (CEPAM), which achieves both objectives simultaneously. CEPAM leverages the rejection-sampled universal quantizer (RSUQ), a randomized vector quantizer whose quantization error is equivalent to a prescribed noise, which can be tuned to customize privacy protection between parties. In this work, we theoretically analyze the privacy guarantees and convergence properties of CEPAM. Moreover, we assess CEPAM's utility performance through experimental evaluations, including convergence profiles compared with other baselines, and accuracy-privacy trade-offs between different parties.
【2】Communication-Efficient Federated Learning by Exploiting Spatio-Temporal Correlations of Gradients
标题:利用梯度的时空相关性实现通信高效的联邦学习
链接:https://arxiv.org/abs/2601.10491
作者:Shenlong Zheng,Zhen Zhang,Yuhui Deng,Geyong Min,Lin Cui
摘要:通信开销是联邦学习中的一个关键挑战,特别是在带宽受限的网络中。虽然已经提出了许多方法来减少通信开销,但大多数方法仅关注压缩各个梯度,忽略了它们之间的时间相关性。先前的研究表明,梯度表现出空间相关性,通常反映在低秩结构中。通过实证分析,我们进一步观察到相邻轮次的客户端梯度之间的强时间相关性。基于这些观察,我们提出了GradESTC,一种压缩技术,利用空间和时间梯度相关性。GradESTC利用空间相关性将每个完整梯度分解为一组紧凑的基向量和相应的组合系数。通过利用时间相关性,在每一轮中仅需要动态地更新基向量的一小部分。GradESTC通过传输轻量级组合系数和有限数量的更新基向量而不是完整的梯度来显着降低通信开销。大量的实验表明,在接近收敛时达到目标精度水平后,GradESTC与最强基线相比平均减少了39.79%的上行链路通信,同时保持了与未压缩FedAvg相当的收敛速度和最终精度。通过有效地利用时空梯度结构,GradESTC为通信高效的联邦学习提供了一个实用且可扩展的解决方案。
摘要:Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce communication overhead, most focus solely on compressing individual gradients, overlooking the temporal correlations among them. Prior studies have shown that gradients exhibit spatial correlations, typically reflected in low-rank structures. Through empirical analysis, we further observe a strong temporal correlation between client gradients across adjacent rounds. Based on these observations, we propose GradESTC, a compression technique that exploits both spatial and temporal gradient correlations. GradESTC exploits spatial correlations to decompose each full gradient into a compact set of basis vectors and corresponding combination coefficients. By exploiting temporal correlations, only a small portion of the basis vectors need to be dynamically updated in each round. GradESTC significantly reduces communication overhead by transmitting lightweight combination coefficients and a limited number of updated basis vectors instead of the full gradients. Extensive experiments show that, upon reaching a target accuracy level near convergence, GradESTC reduces uplink communication by an average of 39.79% compared to the strongest baseline, while maintaining comparable convergence speed and final accuracy to uncompressed FedAvg. By effectively leveraging spatio-temporal gradient structures, GradESTC offers a practical and scalable solution for communication-efficient federated learning.
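A toy sketch of the decomposition-and-partial-refresh idea: represent each round's gradient matrix as basis @ coeffs, reuse most basis vectors from the previous round, and transmit only coefficients plus the few refreshed vectors. Rank and refresh counts below are arbitrary, and the exact update rule is an assumption:

```python
import numpy as np

def compress_gradient(grad_matrix, rank, prev_basis=None, n_refresh=2):
    """Spatial correlation: low-rank SVD factorization of the gradient.
    Temporal correlation: keep most of last round's basis, refresh only a few columns."""
    u, s, vt = np.linalg.svd(grad_matrix, full_matrices=False)
    basis = u[:, :rank]
    if prev_basis is not None:
        basis = np.concatenate([prev_basis[:, : rank - n_refresh],
                                basis[:, :n_refresh]], axis=1)
    coeffs = basis.T @ grad_matrix           # lightweight coefficients to transmit
    return basis, coeffs

g_prev = np.random.randn(256, 64)
basis, coeffs = compress_gradient(g_prev, rank=8)
g_next = g_prev + 0.05 * np.random.randn(256, 64)   # adjacent round: similar gradient
basis2, coeffs2 = compress_gradient(g_next, rank=8, prev_basis=basis)
approx = basis2 @ coeffs2                    # server-side reconstruction
```

In a real round, only `coeffs2` and the refreshed basis columns would travel uplink, which is where the communication savings come from.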
【3】PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning
标题:面向多模态分散式联邦学习的PID引导部分对齐
链接:https://arxiv.org/abs/2601.10012
作者:Yanhang Shi,Xiaoyu Wang,Houwei Cao,Jian Li,Yong Liu
摘要:多模态分散式联邦学习(DFL)具有挑战性,因为各代理在可用模态和模型架构上存在差异,却必须在没有中央协调者的情况下通过点对点(P2P)网络进行协作。标准的多模态管线学习跨所有模态的单一共享嵌入。在DFL中,这种整体式表示会导致单模态与多模态代理之间的梯度错位;因此,它抑制了异构共享和跨模态交互。我们提出了PARSE,一个在无服务器设置中将部分信息分解(PID)付诸实践的多模态DFL框架。每个代理执行特征裂变,将其潜在表示分解为冗余、独特和协同三类切片。通过切片级部分对齐实现异构代理之间的P2P知识共享:只有语义上可共享的分支才会在拥有相应模态的代理之间交换。通过消除对中央协调和梯度手术的需要,PARSE解决了单/多模态梯度冲突,从而克服了多模态DFL困境,同时保持与标准DFL约束的兼容性。在各基准和代理组合上,PARSE相对任务共享、模态共享和混合共享的DFL基线均取得一致的增益。对融合算子和切分比例的消融实验以及定性可视化进一步证明了所提设计的效率和鲁棒性。
摘要:Multimodal decentralized federated learning (DFL) is challenging because agents differ in available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks without a central coordinator. Standard multimodal pipelines learn a single shared embedding across all modalities. In DFL, such a monolithic representation induces gradient misalignment between uni- and multimodal agents; as a result, it suppresses heterogeneous sharing and cross-modal interaction. We present PARSE, a multimodal DFL framework that operationalizes partial information decomposition (PID) in a server-free setting. Each agent performs feature fission to factorize its latent representation into redundant, unique, and synergistic slices. P2P knowledge sharing among heterogeneous agents is enabled by slice-level partial alignment: only semantically shareable branches are exchanged among agents that possess the corresponding modality. By removing the need for central coordination and gradient surgery, PARSE resolves uni-/multimodal gradient conflicts, thereby overcoming the multimodal DFL dilemma while remaining compatible with standard DFL constraints. Across benchmarks and agent mixes, PARSE yields consistent gains over task-, modality-, and hybrid-sharing DFL baselines. Ablations on fusion operators and split ratios, together with qualitative visualizations, further demonstrate the efficiency and robustness of the proposed design.
【4】QFed: Parameter-Compact Quantum-Classical Federated Learning
标题:QFed:参数紧凑量子经典联邦学习
链接:https://arxiv.org/abs/2601.09809
作者:Samar Abdelghani,Soumaya Cherkaoui
摘要:医疗保健、金融和科学研究等领域的组织和企业越来越需要从分布式、孤立的数据集中提取集体智能,同时遵守严格的隐私、监管和主权要求。联邦学习(FL)可以在不共享敏感原始数据的情况下实现协作模型构建,但面临统计异质性、系统多样性以及复杂模型带来的计算负担等日益严峻的挑战。这项研究探讨了量子辅助联邦学习的潜力,它可以将经典模型中的参数数量减少多对数因子,从而降低训练开销。据此,我们引入了QFed,一个旨在提高边缘设备网络计算效率的量子联邦学习框架。我们使用广泛采用的FashionMNIST数据集评估所提出的框架。实验结果表明,QFed将一个类VGG模型的参数数量减少了77.6%,同时在可扩展环境中保持与经典方法相当的准确性。这些结果表明了在联邦学习环境中利用量子计算来增强边缘设备FL能力的潜力。
摘要:Organizations and enterprises across domains such as healthcare, finance, and scientific research are increasingly required to extract collective intelligence from distributed, siloed datasets while adhering to strict privacy, regulatory, and sovereignty requirements. Federated Learning (FL) enables collaborative model building without sharing sensitive raw data, but faces growing challenges posed by statistical heterogeneity, system diversity, and the computational burden from complex models. This study examines the potential of quantum-assisted federated learning, which could cut the number of parameters in classical models by polylogarithmic factors and thus lessen training overhead. Accordingly, we introduce QFed, a quantum-enabled federated learning framework aimed at boosting computational efficiency across edge device networks. We evaluate the proposed framework using the widely adopted FashionMNIST dataset. Experimental results show that QFed achieves a 77.6% reduction in the parameter count of a VGG-like model while maintaining an accuracy comparable to classical approaches in a scalable environment. These results point to the potential of leveraging quantum computing within a federated learning context to strengthen FL capabilities of edge devices.
推理|分析|理解|解释(9篇)
【1】Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models
标题:你的推理模型是推理还是猜测?层次推理模型的机制分析
链接:https://arxiv.org/abs/2601.10679
作者:Zirui Ren,Ziming Liu
摘要:层次推理模型(HRM)在各种推理任务上取得了非凡的性能,明显优于基于大型语言模型的推理器。为了理解HRM的优势和潜在失败模式,我们对其推理模式进行了一项机制性研究,发现了三个令人惊讶的事实:(a)在极其简单的谜题上失败,例如,只有一个未知格子的谜题也可能使HRM失败。我们将这种失败归因于违反了不动点性质,而这是HRM的一个基本假设。(b)推理步骤中的"Grokking"动态,即答案不是均匀改进的,而是存在某个突然使答案正确的关键推理步骤;(c)存在多个不动点。HRM会"猜测"第一个不动点,而该不动点可能是错误的,并会在那里停留一段时间甚至永远。所有事实都表明,HRM似乎是在"猜测"而不是"推理"。利用这一"猜测"图景,我们提出了三种扩展HRM猜测的策略:数据增强(扩展猜测的质量)、输入扰动(通过利用推理随机性扩展猜测的数量)和模型自举(通过利用训练随机性扩展猜测的数量)。在实践方面,通过结合所有方法,我们开发了增强型HRM,将Sudoku-Extreme上的准确率从54.5%提高到96.9%。在科学方面,我们的分析为推理模型如何"推理"提供了新的见解。
摘要:Hierarchical reasoning model (HRM) achieves extraordinary performance on various reasoning tasks, significantly outperforming large language model-based reasoners. To understand the strengths and potential failure modes of HRM, we conduct a mechanistic study on its reasoning patterns and find three surprising facts: (a) Failure of extremely simple puzzles, e.g., HRM can fail on a puzzle with only one unknown cell. We attribute this failure to the violation of the fixed point property, a fundamental assumption of HRM. (b) "Grokking" dynamics in reasoning steps, i.e., the answer is not improved uniformly, but instead there is a critical reasoning step that suddenly makes the answer correct; (c) Existence of multiple fixed points. HRM "guesses" the first fixed point, which could be incorrect, and gets trapped there for a while or forever. All facts imply that HRM appears to be "guessing" instead of "reasoning". Leveraging this "guessing" picture, we propose three strategies to scale HRM's guesses: data augmentation (scaling the quality of guesses), input perturbation (scaling the number of guesses by leveraging inference randomness), and model bootstrapping (scaling the number of guesses by leveraging training randomness). On the practical side, by combining all methods, we develop Augmented HRM, boosting accuracy on Sudoku-Extreme from 54.5% to 96.9%. On the scientific side, our analysis provides new insights into how reasoning models "reason".
【2】EvoMorph: Counterfactual Explanations for Continuous Time-Series Extrinsic Regression Applied to Photoplethysmography
标题:EvoMorph:应用于光电容积描记术的连续时间序列外在回归的反事实解释
链接:https://arxiv.org/abs/2601.10356
作者:Mesut Ceylan,Alexis Tabin,Patrick Langer,Elgar Fleisch,Filipe Barata
摘要:可穿戴设备能够对光电容积描记术(PPG)等生理信号进行连续的人群规模监测,为数据驱动的临床评估创造了新的机会。时间序列外在回归(TSER)模型越来越多地利用PPG信号来估计临床相关结果,包括心率、呼吸率和血氧饱和度。然而,对于临床推理和信任而言,单一的点估计是不够的:临床医生还必须了解预测在生理上合理的变化下是否稳定,以及生理信号中现实的、可实现的变化在多大程度上会有意义地改变模型的预测。反事实解释(CFE)可以回答这些"如果"问题,但现有的时间序列CFE生成方法大多局限于分类任务,忽略波形形态,并且经常产生生理上难以置信的信号,限制了其在连续生物医学时间序列上的适用性。为了解决这些局限性,我们引入了EvoMorph,一个多目标进化框架,用于为TSER应用生成生理上合理且多样化的CFE。EvoMorph优化定义在可解释信号描述符上的形态感知目标,并应用保留波形结构的变换。我们在三个PPG数据集(心率、呼吸率和血氧饱和度)上对EvoMorph进行了评估,并与最近异类邻居(nearest-unlike-neighbor)基线进行了比较。此外,在一个案例研究中,我们通过将反事实敏感性与自助集成不确定性和数据密度度量相关联,评估了EvoMorph作为不确定性量化工具的作用。总的来说,EvoMorph能够为连续生物医学信号生成具备生理意识的反事实解释,并支持不确定性感知的可解释性,推动临床时间序列应用中可信的模型分析。
摘要:Wearable devices enable continuous, population-scale monitoring of physiological signals, such as photoplethysmography (PPG), creating new opportunities for data-driven clinical assessment. Time-series extrinsic regression (TSER) models increasingly leverage PPG signals to estimate clinically relevant outcomes, including heart rate, respiratory rate, and oxygen saturation. For clinical reasoning and trust, however, single point estimates alone are insufficient: clinicians must also understand whether predictions are stable under physiologically plausible variations and to what extent realistic, attainable changes in physiological signals would meaningfully alter a model's prediction. Counterfactual explanations (CFE) address these "what-if" questions, yet existing time series CFE generation methods are largely restricted to classification, overlook waveform morphology, and often produce physiologically implausible signals, limiting their applicability to continuous biomedical time series. To address these limitations, we introduce EvoMorph, a multi-objective evolutionary framework for generating physiologically plausible and diverse CFE for TSER applications. EvoMorph optimizes morphology-aware objectives defined on interpretable signal descriptors and applies transformations to preserve the waveform structure. We evaluated EvoMorph on three PPG datasets (heart rate, respiratory rate, and oxygen saturation) against a nearest-unlike-neighbor baseline. In addition, in a case study, we evaluated EvoMorph as a tool for uncertainty quantification by relating counterfactual sensitivity to bootstrap-ensemble uncertainty and data-density measures. Overall, EvoMorph enables the generation of physiologically-aware counterfactuals for continuous biomedical signals and supports uncertainty-aware interpretability, advancing trustworthy model analysis for clinical time-series applications.
【3】TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
标题:TRIM:多步推理任务中基于目标化逐步路由的混合推理
链接:https://arxiv.org/abs/2601.10245
作者:Vansh Kapoor,Aman Gupta,Hao Chen,Anurag Beniwal,Jing Huang,Aviral Kumar
摘要:像数学问题求解这样的多步推理任务很容易受到级联失败的影响:一个不正确的步骤就会导致整个解答崩溃。当前的LLM路由方法将整个查询分配给一个模型,将所有推理步骤同等对待。我们提出了TRIM(多步推理任务中的目标化路由),它仅将关键步骤,即那些可能使解答脱轨的步骤,路由到较大的模型,同时让较小的模型处理常规的续写。我们的关键见解是,有针对性的步骤级干预可以从根本上改变推理效率:将昂贵的调用精确限制在更强模型能够防止级联错误的那些步骤上。TRIM在步骤级别运行:它使用过程奖励模型来识别错误步骤,并根据步骤级不确定性和预算约束做出路由决策。我们在TRIM中开发了多种路由策略,从一个简单的基于阈值的策略,到能够对长程精度-成本权衡和步骤级正确性估计不确定性进行推理的更具表达力的策略。在MATH-500上,即使是最简单的阈值策略也以5倍的成本效率超过了此前的路由方法,而更先进的策略能以减少80%的昂贵模型token用量匹配强大、昂贵模型的性能。在AIME等更难的基准上,TRIM的成本效率最高可提高6倍。所有方法都能有效泛化到各类数学推理任务,表明步骤级难度代表了推理的基本特征。
摘要:Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps -- those likely to derail the solution -- to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step-level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
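The simplest thresholding policy reduces to a few lines of control flow; all callables below (small_lm, large_lm, prm) are hypothetical stand-ins, as is the [DONE] termination marker:

```python
def route_steps(problem, small_lm, large_lm, prm, tau=0.3, max_steps=20):
    """Threshold-based stepwise routing: let the small model propose each step;
    if the process reward model scores it below tau, regenerate that single step
    with the large model. Sketch only; not the paper's API."""
    steps = []
    for _ in range(max_steps):
        step = small_lm(problem, steps)          # cheap default proposal
        if prm(problem, steps, step) < tau:      # step looks likely to derail
            step = large_lm(problem, steps)      # targeted expensive intervention
        steps.append(step)
        if step.endswith("[DONE]"):
            break
    return steps
```

The paper's more expressive policies would replace the fixed threshold tau with budget-aware and uncertainty-aware decisions.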
【4】V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation
标题:V-Zero:具有零注释的自我改进多模式推理
链接:https://arxiv.org/abs/2601.10094
作者:Han Wang,Yi Yang,Jingyuan Hu,Minfeng Zhu,Wei Chen
摘要:多模态学习的最新进展显著增强了视觉语言模型(VLM)的推理能力。然而,最先进的方法严重依赖大规模人工标注数据集,获取成本高且耗时。为克服这一限制,我们引入了V-Zero,一个仅使用无标注图像来促进自我改进的通用后训练框架。V-Zero通过实例化两个不同的角色(提问者与求解器)建立了一个共同进化的循环。提问者利用将直觉猜测与推理结果相对比的双轨推理奖励,学习合成高质量且具有挑战性的问题。求解器使用从其自身采样回答的多数投票中得到的伪标签进行优化。两个角色均通过组相对策略优化(GRPO)进行迭代训练,形成相互增强的循环。值得注意的是,在没有任何人工标注的情况下,V-Zero在Qwen2.5-VL-7B-Instruct上实现了一致的性能提升,将视觉数学推理提高了+1.7,将通用视觉中心任务提高了+2.6,展示了多模态系统自我改进的潜力。代码可在https://github.com/SatonoDia/V-Zero上获得
摘要:Recent advances in multimodal learning have significantly enhanced the reasoning capabilities of vision-language models (VLMs). However, state-of-the-art approaches rely heavily on large-scale human-annotated datasets, which are costly and time-consuming to acquire. To overcome this limitation, we introduce V-Zero, a general post-training framework that facilitates self-improvement using exclusively unlabeled images. V-Zero establishes a co-evolutionary loop by instantiating two distinct roles: a Questioner and a Solver. The Questioner learns to synthesize high-quality, challenging questions by leveraging a dual-track reasoning reward that contrasts intuitive guesses with reasoned results. The Solver is optimized using pseudo-labels derived from majority voting over its own sampled responses. Both roles are trained iteratively via Group Relative Policy Optimization (GRPO), driving a cycle of mutual enhancement. Remarkably, without a single human annotation, V-Zero achieves consistent performance gains on Qwen2.5-VL-7B-Instruct, improving visual mathematical reasoning by +1.7 and general vision-centric by +2.6, demonstrating the potential of self-improvement in multimodal systems. Code is available at https://github.com/SatonoDia/V-Zero
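The Solver's pseudo-labeling step is essentially majority voting over its own samples; a minimal sketch with a stand-in solver callable:

```python
from collections import Counter

def majority_vote_pseudolabel(solver, image, question, n_samples=8):
    """Derive a pseudo-label for an unlabeled image-question pair by majority
    voting over the Solver's own sampled responses. `solver` is a hypothetical
    callable returning an answer string per sample."""
    answers = [solver(image, question) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    confidence = count / n_samples
    return answer, confidence        # low-confidence pairs can be filtered out
```

The resulting (answer, confidence) pairs would then feed the GRPO updates described in the abstract.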
【5】A New Convergence Analysis of Plug-and-Play Proximal Gradient Descent Under Prior Mismatch
标题:先验不匹配下即插即用近端梯度下降的新收敛分析
链接:https://arxiv.org/abs/2601.09831
作者:Guixian Xu,Jinglai Li,Junqi Tang
摘要:在这项工作中,我们为先验不匹配情形下的即插即用近端梯度下降(PnP-PGD)提供了一个新的收敛理论,其中降噪器所训练的数据分布不同于当前推理任务的数据分布。据我们所知,这是PnP-PGD在先验失配下的第一个收敛性证明。与已有的PnP算法理论结果相比,我们的新结果消除了若干限制性且不可验证的假设。
摘要:In this work, we provide a new convergence theory for plug-and-play proximal gradient descent (PnP-PGD) under prior mismatch where the denoiser is trained on a different data distribution to the inference task at hand. To the best of our knowledge, this is the first convergence proof of PnP-PGD under prior mismatch. Compared with the existing theoretical results for PnP algorithms, our new results removed the need for several restrictive and unverifiable assumptions.
【6】Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention
标题:通过注意力感知干预改进用于逻辑推理的思维链
链接:https://arxiv.org/abs/2601.09805
作者:Nguyen Minh Phuong,Dang Huu Tien,Naoya Inoue
备注:Findings of EACL 2026
摘要:使用LLM的现代逻辑推理主要依赖复杂的交互式框架,这些框架将推理过程分解为子任务,或通过精心设计的提示来求解,或需要外部资源(例如符号求解器)来利用其强逻辑结构。交互式方法引入了额外的开销,而混合方法则依赖外部组件,这限制了它们的可扩展性。非交互式的端到端框架使推理能够在模型本身中涌现--在不使用任何外部资源的情况下提高泛化能力,同时保持可分析性。在这项工作中,我们引入了一个用于推理任务的非交互式、端到端框架。我们发现,在Few-Shot提示中引入结构信息会激活一部分注意力头,其模式与逻辑推理算子对齐。基于这一认识,我们提出了注意力感知干预(AAI),这是一种推理时干预方法,对按逻辑模式识别出的选定头部的注意力分数进行重新加权。AAI提供了一种高效的方式,通过注意力调节引导模型的推理利用先验知识。大量实验表明,AAI在不同的基准测试和模型架构中提升了逻辑推理性能,同时产生的额外计算开销可以忽略不计。代码可在https://github.com/phuongnm94/aai_for_logical_reasoning上获得。
摘要:Modern logical reasoning with LLMs primarily relies on employing complex interactive frameworks that decompose the reasoning process into subtasks solved through carefully designed prompts or requiring external resources (e.g., symbolic solvers) to exploit their strong logical structures. While interactive approaches introduce additional overhead, hybrid approaches depend on external components, which limit their scalability. A non-interactive, end-to-end framework enables reasoning to emerge within the model itself -- improving generalization while preserving analyzability without any external resources. In this work, we introduce a non-interactive, end-to-end framework for reasoning tasks. We show that introducing structural information into the few-shot prompt activates a subset of attention heads whose patterns align with logical reasoning operators. Building on this insight, we propose Attention-Aware Intervention (AAI), an inference-time intervention method that reweights attention scores across selected heads identified by their logical patterns. AAI offers an efficient way to steer the model's reasoning toward leveraging prior knowledge through attention modulation. Extensive experiments show that AAI enhances logical reasoning performance across diverse benchmarks and model architectures, while incurring negligible additional computational overhead. Code is available at https://github.com/phuongnm94/aai_for_logical_reasoning.
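A minimal sketch of what the inference-time intervention can look like, assuming access to a model's pre-softmax attention scores; the head indices and scaling factor below are illustrative placeholders, not the heads or values identified in the paper.
```python
import torch

def reweight_selected_heads(attn_scores, selected_heads, scale=1.5):
    """attn_scores: (batch, num_heads, query_len, key_len) attention logits.
    Amplifies the heads whose patterns align with logical reasoning operators."""
    out = attn_scores.clone()
    out[:, selected_heads, :, :] = out[:, selected_heads, :, :] * scale
    return out
```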
【7】TimeSAE: Sparse Decoding for Faithful Explanations of Black-Box Time Series Models
标题:TimeSAE:用于黑盒时间序列模型忠实解释的稀疏解码
链接:https://arxiv.org/abs/2601.09776
作者:Khalid Oublal,Quentin Bouniot,Qi Gan,Stephan Clémençon,Zeynep Akata
摘要:随着黑盒模型和预训练模型在时间序列应用中越来越受青睐,理解和解释它们的预测变得日益重要,特别是在可解释性和信任至关重要的高风险领域。然而,大多数现有方法只提供分布内的解释,无法泛化到训练支撑之外,而这需要具备泛化的学习能力。在这项工作中,我们的目标是提供一个框架,通过稀疏自动编码器(SAE)和因果性这两重视角来解释时间序列数据上的黑盒模型。我们发现,当前的许多解释方法对分布偏移很敏感,限制了其在现实场景中的有效性。基于稀疏自动编码器的概念,我们引入了TimeSAE,一个用于黑盒模型解释的框架。我们在合成和真实世界的时间序列数据集上对TimeSAE进行了广泛评估,并将其与领先的基线进行比较。结果得到定量指标和定性见解的支持,表明TimeSAE提供了更忠实和更鲁棒的解释。我们的代码可以在一个易于使用的库TimeSAE-Lib中找到:https://anonymous.4open.science/w/TimeSAE-571D/。
摘要:As black box models and pretrained models gain traction in time series applications, understanding and explaining their predictions becomes increasingly vital, especially in high-stakes domains where interpretability and trust are essential. However, most of the existing methods involve only in-distribution explanation, and do not generalize outside the training support, which requires the learning capability of generalization. In this work, we aim to provide a framework to explain black-box models for time series data through the dual lenses of Sparse Autoencoders (SAEs) and causality. We show that many current explanation methods are sensitive to distributional shifts, limiting their effectiveness in real-world scenarios. Building on the concept of Sparse Autoencoder, we introduce TimeSAE, a framework for black-box model explanation. We conduct extensive evaluations of TimeSAE on both synthetic and real-world time series datasets, comparing it to leading baselines. The results, supported by both quantitative metrics and qualitative insights, show that TimeSAE provides more faithful and robust explanations. Our code is available in an easy-to-use library TimeSAE-Lib: https://anonymous.4open.science/w/TimeSAE-571D/.
【8】Social Determinants of Health Prediction for ICD-9 Code with Reasoning Models
标题:使用推理模型预测ICD-9编码的健康社会决定因素
链接:https://arxiv.org/abs/2601.09709
作者:Sharim Khan,Paul Landes,Adam Cross,Jimeng Sun
备注:Published as part of Machine Learning for Health (ML4H) 2025 Findings Track
摘要:健康的社会决定因素与患者的结局相关,但很少在结构化数据中被记录。最近,从临床文本中自动提取这些标记以便为诊断系统补充患者社会环境知识的做法受到了关注。大型语言模型在从句子中识别健康社会决定因素标签方面表现出强大的性能。然而,考虑到长距离依赖,在大量入院记录或纵向病历中进行预测具有挑战性。在本文中,我们在MIMIC-III数据集上探索使用推理模型和传统大语言模型进行住院多标签健康社会决定因素ICD-9编码分类。我们利用现有的ICD-9编码对入院记录进行预测,达到了89%的F1。我们的贡献包括我们的研究发现、139份入院记录中缺失的SDoH编码,以及可重现结果的代码。
摘要:Social Determinants of Health correlate with patient outcomes but are rarely captured in structured data. Recent attention has been given to automatically extracting these markers from clinical text to supplement diagnostic systems with knowledge of patients' social circumstances. Large language models demonstrate strong performance in identifying Social Determinants of Health labels from sentences. However, prediction in large admissions or longitudinal notes is challenging given long distance dependencies. In this paper, we explore hospital admission multi-label Social Determinants of Health ICD-9 code classification on the MIMIC-III dataset using reasoning models and traditional large language models. We exploit existing ICD-9 codes for prediction on admissions, which achieved an 89% F1. Our contributions include our findings, missing SDoH codes in 139 admissions, and code to reproduce the results.
【9】What Understanding Means in AI-Laden Astronomy
标题:在AI深度参与的天文学中,“理解”意味着什么
链接:https://arxiv.org/abs/2601.10038
作者:Yuan-Sen Ting,André Curtis-Trudel,Siyu Yao
备注:Perspective article, 8 pages. Based on the "Philosophy Sees the Algorithm" workshop held December 11-12, 2025 at The Ohio State University. Supported by the Alfred P. Sloan Foundation, the Center for Cosmology and AstroParticle Physics (CCAPP), and the University of Cincinnati Center for Humanities and Technology
摘要:人工智能正在迅速改变天文学研究,但科学界在很大程度上将这种转变视为工程挑战,而不是认识论挑战。这篇观点文章认为,科学哲学为人工智能与天文学的融合提供了必要的工具--关于“理解”意味着什么的概念清晰度,对数据和发现假设的批判性审查,以及评估人工智能在不同研究背景下作用的框架。利用召集天文学家,哲学家和计算机科学家的跨学科研讨会,我们确定了几个紧张局势。首先,人工智能将从数据中“导出基础物理学”的说法将当代天文学误认为是方程推导,而不是观察驱动的企业。其次,科学理解不仅仅涉及预测--它需要叙事结构、上下文判断和沟通成就,而这是当前人工智能架构难以提供的。第三,由于叙述和判断很重要,人类同行评审仍然是必不可少的-但人工智能生成的内容充斥着文献,威胁着我们识别真正见解的能力。第四,虽然人工智能擅长解决定义明确的问题,但推动突破的定义不明确的问题发现似乎需要超越模式识别的能力。第五,随着人工智能加速可行性,追求价值的标准有可能转向人工智能使之变得容易的东西,而不是真正重要的东西。我们提出“务实的理解”作为整合的框架--承认人工智能是一种扩展人类认知的工具,同时需要新的验证和认知评估规范。现在参与这些问题可能有助于社区塑造转型,而不仅仅是对其做出反应。
摘要:Artificial intelligence is rapidly transforming astronomical research, yet the scientific community has largely treated this transformation as an engineering challenge rather than an epistemological one. This perspective article argues that philosophy of science offers essential tools for navigating AI's integration into astronomy--conceptual clarity about what "understanding" means, critical examination of assumptions about data and discovery, and frameworks for evaluating AI's roles across different research contexts. Drawing on an interdisciplinary workshop convening astronomers, philosophers, and computer scientists, we identify several tensions. First, the narrative that AI will "derive fundamental physics" from data misconstrues contemporary astronomy as equation-derivation rather than the observation-driven enterprise it is. Second, scientific understanding involves more than prediction--it requires narrative construction, contextual judgment, and communicative achievement that current AI architectures struggle to provide. Third, because narrative and judgment matter, human peer review remains essential--yet AI-generated content flooding the literature threatens our capacity to identify genuine insight. Fourth, while AI excels at well-defined problem-solving, the ill-defined problem-finding that drives breakthroughs appears to require capacities beyond pattern recognition. Fifth, as AI accelerates what is feasible, pursuitworthiness criteria risk shifting toward what AI makes easy rather than what is genuinely important. We propose "pragmatic understanding" as a framework for integration--recognizing AI as a tool that extends human cognition while requiring new norms for validation and epistemic evaluation. Engaging with these questions now may help the community shape the transformation rather than merely react to it.
检测相关(1篇)
【1】A Novel Contrastive Loss for Zero-Day Network Intrusion Detection
标题:一种新的零日网络入侵检测对比损失
链接:https://arxiv.org/abs/2601.09902
作者:Jack Wilkie,Hanan Hindy,Craig Michie,Christos Tachtatzis,James Irvine,Robert Atkinson
备注:Published in: IEEE Transactions on Network and Service Management (TNSM), 2026. Official version: https://ieeexplore.ieee.org/document/11340750 Code: https://github.com/jackwilkie/CLOSR
摘要:机器学习在网络入侵检测中取得了最先进的成果;然而,当面对一种新的攻击类别,即零日攻击时,其性能会显著下降。简而言之,经典的基于机器学习的方法擅长识别它们先前训练过的攻击类别,但难以识别未包含在训练数据中的攻击类别。解决这一缺点的一种方法是利用异常检测器,它只在良性数据上训练,目标是推广到所有攻击类别(已知攻击和零日攻击)。然而,这以过高的假阳性率为代价。这项工作提出了一种新的对比损失函数,它既能保持其他基于对比学习方法的优点(对不平衡数据的鲁棒性),又能推广到零日攻击。与异常检测器不同,该模型使用良性样本和已知恶意样本(即除零日类别之外的其他知名攻击类别)来学习良性流量的分布,从而实现显著的性能改进。所提出的方法在Lycos2017数据集上得到了实验验证,在已知攻击和零日攻击检测中,它分别比以前的模型实现了.000065和.060883的AUROC提升。最后,该方法被扩展到开集识别,相比现有方法实现了.170883的OpenAUC提升。
摘要:Machine learning has achieved state-of-the-art results in network intrusion detection; however, its performance significantly degrades when confronted by a new attack class -- a zero-day attack. In simple terms, classical machine learning-based approaches are adept at identifying attack classes on which they have been previously trained, but struggle with those not included in their training data. One approach to addressing this shortcoming is to utilise anomaly detectors which train exclusively on benign data with the goal of generalising to all attack classes -- both known and zero-day. However, this comes at the expense of a prohibitively high false positive rate. This work proposes a novel contrastive loss function which is able to maintain the advantages of other contrastive learning-based approaches (robustness to imbalanced data) but can also generalise to zero-day attacks. Unlike anomaly detectors, this model learns the distributions of benign traffic using both benign and known malign samples, i.e. other well-known attack classes (not including the zero-day class), and consequently, achieves significant performance improvements. The proposed approach is experimentally verified on the Lycos2017 dataset where it achieves an AUROC improvement of .000065 and .060883 over previous models in known and zero-day attack detection, respectively. Finally, the proposed method is extended to open-set recognition achieving OpenAUC improvements of .170883 over existing approaches.
分类|识别(2篇)
【1】We Need a More Robust Classifier: Dual Causal Learning Empowers Domain-Incremental Time Series Classification
标题:我们需要一个更稳健的分类器:双因果学习赋能领域增量时间序列分类
链接:https://arxiv.org/abs/2601.10312
作者:Zhipeng Liu,Peibo Duan,Xuan Tang,Haodong Jing,Mingyang Geng,Yongsheng Huang,Jialu Xu,Bin Zhang,Binwu Wang
备注:This paper has been accepted for publication at ACM WWW 2026
摘要:万维网的蓬勃发展依赖于以准确时间序列分类为基础的智能服务,在深度学习进展的推动下,时间序列分类最近取得了重大进步。然而,现有研究在领域增量学习方面面临挑战。在本文中,我们提出了一个轻量级且鲁棒的双因果解耦框架(DualCD),以提高模型在领域增量场景下的鲁棒性,它可以无缝地集成到时间序列分类模型中。具体来说,DualCD首先引入一个时间特征解耦模块来捕获类因果特征和虚假特征。因果特征可以提供足够的预测能力,以在领域增量学习设置中支撑分类器。为了准确地捕捉这些因果特征,我们进一步设计了一个双因果干预机制,以消除类内和类间混杂特征的影响。该机制通过将当前类的因果特征与类内虚假特征以及来自其他类的因果特征相结合来构造变体样本。因果干预损失鼓励模型仅基于因果特征来准确预测这些变体样本的标签。在多个数据集和模型上进行的大量实验表明,DualCD有效地提高了领域增量场景下的性能。我们将丰富的实验总结为一个全面的基准,以促进领域增量时间序列分类的研究。
摘要:The World Wide Web thrives on intelligent services that rely on accurate time series classification, which has recently witnessed significant progress driven by advances in deep learning. However, existing studies face challenges in domain incremental learning. In this paper, we propose a lightweight and robust dual-causal disentanglement framework (DualCD) to enhance the robustness of models under domain incremental scenarios, which can be seamlessly integrated into time series classification models. Specifically, DualCD first introduces a temporal feature disentanglement module to capture class-causal features and spurious features. The causal features can offer sufficient predictive power to support the classifier in domain incremental learning settings. To accurately capture these causal features, we further design a dual-causal intervention mechanism to eliminate the influence of both intra-class and inter-class confounding features. This mechanism constructs variant samples by combining the current class's causal features with intra-class spurious features and with causal features from other classes. The causal intervention loss encourages the model to accurately predict the labels of these variant samples based solely on the causal features. Extensive experiments on multiple datasets and models demonstrate that DualCD effectively improves performance in domain incremental scenarios. We summarize our rich experiments into a comprehensive benchmark to facilitate research in domain incremental time series classification.
【2】Malware Classification using Diluted Convolutional Neural Network with Fast Gradient Sign Method
标题:使用结合快速梯度符号法的稀释卷积神经网络进行恶意软件分类
链接:https://arxiv.org/abs/2601.09933
作者:Ashish Anand,Bhupendra Singh,Sunil Khemka,Bireswar Banerjee,Vishi Singh Bhatia,Piyush Ranjan
备注:Accepted at the 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON). Keywords: data security, diluted convolutional neural network, fast gradient sign method, malware classification, privacy
摘要:Android恶意软件已成为对组织、社会和个人日益严重的威胁,对隐私、数据安全和基础设施构成重大风险。随着恶意软件在复杂性和隐蔽性方面不断演化,对这些恶意软件实例的缓解和检测变得更加耗时和具有挑战性,尤其是因为识别潜在恶意软件需要大量特征。为了应对这些挑战,本研究提出了结合快速梯度符号法与稀释卷积神经网络(FGSM DICNN)的恶意软件分类方法。DICNN包含稀释卷积,它扩大了感受野,使模型能够在不增加参数的情况下,用更少的特征捕获长程分散的恶意软件模式。此外,FGSM策略通过在训练期间使用单步扰动来提高准确性,以较低的计算成本提供额外的防御优势。这种集成有助于在保持高分类精度的同时,减少对大规模特征集的依赖。所提出的FGSM DICNN模型达到了99.44%的准确率,优于自定义深度神经网络(DCNN)等其他现有方法。
摘要:Android malware has become an increasingly critical threat to organizations, society and individuals, posing significant risks to privacy, data security and infrastructure. As malware continues to evolve in terms of complexity and sophistication, the mitigation and detection of these malicious software instances have become more time consuming and challenging, particularly due to the requirement of a large number of features to identify potential malware. To address these challenges, this research proposes the Fast Gradient Sign Method with Diluted Convolutional Neural Network (FGSM DICNN) method for malware classification. DICNN contains diluted convolutions which increase the receptive field, enabling the model to capture dispersed malware patterns across long ranges using fewer features without adding parameters. Additionally, the FGSM strategy enhances accuracy by using one-step perturbations during training, providing an additional defensive advantage at lower computational cost. This integration helps to maintain high classification accuracy while reducing the dependence on extensive feature sets. The proposed FGSM DICNN model attains 99.44% accuracy while outperforming other existing approaches such as Custom Deep Neural Network (DCNN).
表征(2篇)
【1】Representation-Aware Unlearning via Activation Signatures: From Suppression to Knowledge-Signature Erasure
标题:基于激活签名的表征感知去学习:从抑制到知识签名擦除
链接:https://arxiv.org/abs/2601.10566
作者:Syed Naveed Mahmood,Md. Rezaur Rahman Bhuiyan,Tasfia Zaman,Jareen Tasneem Khondaker,Md. Sameer Sakib,Nazia Tasnim,Farig Sadeque
备注:16 pages, 4 figures
摘要:从LLM中选择性地擦除知识对于GDPR合规性和模型安全性至关重要,但目前的去学习方法将行为抑制与真正的知识移除混为一谈,使得潜在能力在表面拒绝之下持续存在。在这项工作中,我们通过引入知识免疫框架(KIF)来应对这一挑战,KIF是一种表征感知架构,通过针对内部激活签名而不是表面输出,来区分真正的擦除和混淆。我们的方法将对特定主题表示的动态抑制与参数高效的适配相结合,从而无需完全重新训练模型即可实现持久的遗忘。KIF实现了接近oracle的擦除(FQ约为0.99 vs. 1.00),同时将效用保持在oracle水平(MU = 0.62),有效地打破了制约所有先前工作的稳定性-擦除权衡。我们评估了3B到14B参数的标准基础模型(Llama和Mistral)和推理先验模型(Qwen和DeepSeek)。我们的观察表明,标准模型表现出与规模无关的真实擦除(<3%的效用漂移),而推理先验模型则揭示了根本性的架构差异。我们全面的双指标评估协议结合了表面级泄漏与潜在痕迹持久性,将“混淆与擦除”的区分付诸操作,并首次实现了对跨模型家族和规模的机制级遗忘行为的系统诊断。
摘要:Selective knowledge erasure from LLMs is critical for GDPR compliance and model safety, yet current unlearning methods conflate behavioral suppression with true knowledge removal, allowing latent capabilities to persist beneath surface-level refusals. In this work, we address this challenge by introducing Knowledge Immunization Framework (KIF), a representation-aware architecture that distinguishes genuine erasure from obfuscation by targeting internal activation signatures rather than surface outputs. Our approach combines dynamic suppression of subject-specific representations with parameter-efficient adaptation, enabling durable unlearning without full model retraining. KIF achieves near-oracle erasure (FQ approx 0.99 vs. 1.00) while preserving utility at oracle levels (MU = 0.62), effectively breaking the stability-erasure tradeoff that has constrained all prior work. We evaluate both standard foundation models (Llama and Mistral) and reasoning-prior models (Qwen and DeepSeek) across 3B to 14B parameters. Our observation shows that standard models exhibit scale-independent true erasure (<3% utility drift), while reasoning-prior models reveal fundamental architectural divergence. Our comprehensive dual-metric evaluation protocol, combining surface-level leakage with latent trace persistence, operationalizes the obfuscation - erasure distinction and enables the first systematic diagnosis of mechanism-level forgetting behavior across model families and scales.
【2】Searching for Quantum Effects in the Brain: A Bell-Type Test for Nonclassical Latent Representations in Autoencoders
标题:在大脑中寻找量子效应:自动编码器中非经典潜在表示的贝尔型测试
链接:https://arxiv.org/abs/2601.10588
作者:I. K. Kominis,C. Xie,S. Li,M. Skotiniotis,G. P. Tsironis
备注:6 pages, 2 figures
摘要:神经信息处理是完全经典的还是涉及量子力学元素,仍然是一个悬而未决的问题。在这里,我们提出了一个模型无关的、信息论的非经典性测试,它绕过微观假设,转而探测神经表征本身的结构。使用自动编码器作为透明的模型系统,我们在潜在空间中引入了一个贝尔型一致性测试,并询问在多个读出上下文下获得的解码统计量是否可以由单一的正潜变量分布共同解释。通过将对神经系统中类量子特征的搜索从微观动力学转移到对信息处理的可实验检验的约束上,这项工作为探索神经计算的基础物理开辟了一条新途径。
摘要:Whether neural information processing is entirely classical or involves quantum-mechanical elements remains an open question. Here we propose a model-agnostic, information-theoretic test of nonclassicality that bypasses microscopic assumptions and instead probes the structure of neural representations themselves. Using autoencoders as a transparent model system, we introduce a Bell-type consistency test in latent space, and ask whether decoding statistics obtained under multiple readout contexts can be jointly explained by a single positive latent-variable distribution. By shifting the search for quantum-like signatures in neural systems from microscopic dynamics to experimentally testable constraints on information processing, this work opens a new route for probing the fundamental physics of neural computation.
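As a sketch of what such a consistency test can look like, assume a CHSH-style setup with two readout contexts per side (a, a' and b, b') applied to the latent code, with correlators E estimated from the decoding statistics. Any single positive latent-variable distribution then enforces the classical bound
```latex
S \;=\; \bigl| E(a,b) + E(a,b') + E(a',b) - E(a',b') \bigr| \;\le\; 2,
```
so a violation would indicate that no single positive latent distribution reproduces the decoding statistics. The CHSH form is an assumption made here for illustration; the paper's specific test statistic may differ.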
3D|3D重建等相关(1篇)
【1】Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting
标题:像梵高一样思考:通过流引导3D高斯飞溅实现结构感知风格转移
链接:https://arxiv.org/abs/2601.10075
作者:Zhendong Wang,Lebin Zhou,Jingchuan Xiao,Rongduo Han,Nam Ling,Cihan Ruan
备注:7 pages, 8 figures
摘要:1888年,文森特·梵高写道:“我在本质中寻求夸张。”这一原则,即放大结构形式同时抑制摄影式细节,是后印象派艺术的核心。然而,大多数现有的3D风格迁移方法颠倒了这一哲学,将几何形状视为表面纹理投影的刚性基底。为了真实地再现后印象派风格,必须将几何抽象作为表达的主要载体。 我们提出了一个用于3D高斯飞溅(3DGS)的流引导几何平流框架,在无网格设置中落实这一原则。我们的方法从2D绘画中提取方向流场,并将其反向传播到3D空间中,校正高斯基元以形成符合场景拓扑结构的流对齐笔触,而不依赖于显式网格先验。这使得结构变形能够直接由绘画运动而非光度约束驱动,更富表现力。 我们的贡献有三个方面:(1)基于投影的无网格流引导机制,将2D艺术运动迁移到3D高斯几何;(2)亮度-结构解耦策略,将几何变形与颜色优化隔离,从而减轻激进结构抽象期间的伪影;以及(3)VLM-as-a-Judge评估框架,通过美学判断而非传统的像素级度量来评估艺术真实性,明确应对艺术风格化的主观性质。
摘要:In 1888, Vincent van Gogh wrote, "I am seeking exaggeration in the essential." This principle, amplifying structural form while suppressing photographic detail, lies at the core of Post-Impressionist art. However, most existing 3D style transfer methods invert this philosophy, treating geometry as a rigid substrate for surface-level texture projection. To authentically reproduce Post-Impressionist stylization, geometric abstraction must be embraced as the primary vehicle of expression. We propose a flow-guided geometric advection framework for 3D Gaussian Splatting (3DGS) that operationalizes this principle in a mesh-free setting. Our method extracts directional flow fields from 2D paintings and back-propagates them into 3D space, rectifying Gaussian primitives to form flow-aligned brushstrokes that conform to scene topology without relying on explicit mesh priors. This enables expressive structural deformation driven directly by painterly motion rather than photometric constraints. Our contributions are threefold: (1) a projection-based, mesh-free flow guidance mechanism that transfers 2D artistic motion into 3D Gaussian geometry; (2) a luminance-structure decoupling strategy that isolates geometric deformation from color optimization, mitigating artifacts during aggressive structural abstraction; and (3) a VLM-as-a-Judge evaluation framework that assesses artistic authenticity through aesthetic judgment instead of conventional pixel-level metrics, explicitly addressing the subjective nature of artistic stylization.
编码器(1篇)
【1】Single-Stage Huffman Encoder for ML Compression
标题:用于ML压缩的单级霍夫曼编码器
链接:https://arxiv.org/abs/2601.10673
作者:Aditya Agrawal,Albert Magyar,Hiteshwar Eswaraiah,Patrick Sheridan,Pradeep Janedula,Ravi Krishnan Venkatesan,Krishna Nair,Ravi Iyer
备注:5 pages, 4 figures
摘要:训练和服务大型语言模型(LLM)需要在多个加速器上划分数据,其中集体操作经常受到网络带宽的影响。使用霍夫曼码的无损压缩是缓解该问题的有效方式,然而,其需要即时频率分析、码本生成和码本连同数据的传输的三阶段设计引入了计算、延迟和数据开销,这对于诸如管芯到管芯通信的延迟敏感的场景是禁止的。本文提出了一种单级霍夫曼编码器,消除这些开销,通过使用固定的码本来自前一批数据的平均概率分布。通过我们对Gemma 2B模型的分析,我们证明了张量在层和碎片之间具有很高的统计相似性。使用这种方法,我们实现了每分片霍夫曼编码的0.5%以内的压缩和理想香农压缩率的1%以内的压缩,从而实现了高效的实时压缩。
摘要:Training and serving Large Language Models (LLMs) require partitioning data across multiple accelerators, where collective operations are frequently bottlenecked by network bandwidth. Lossless compression using Huffman codes is an effective way to alleviate the issue, however, its three-stage design requiring on-the-fly frequency analysis, codebook generation and transmission of codebook along with data introduces computational, latency and data overheads which are prohibitive for latency-sensitive scenarios such as die-to-die communication. This paper proposes a single-stage Huffman encoder that eliminates these overheads by using fixed codebooks derived from the average probability distribution of previous data batches. Through our analysis of the Gemma 2B model, we demonstrate that tensors exhibit high statistical similarity across layers and shards. Using this approach we achieve compression within 0.5% of per-shard Huffman coding and within 1% of the ideal Shannon compressibility, enabling efficient on-the-fly compression.
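A minimal sketch of the single-stage idea under the stated design: build one Huffman codebook from the average symbol distribution of earlier batches and reuse it, so encoding needs no per-batch frequency analysis or codebook transmission. The alphabet and probabilities below are illustrative, not the Gemma 2B statistics.
```python
# Minimal sketch: fixed Huffman codebook from an averaged distribution.
import heapq

def huffman_codebook(avg_probs):
    """avg_probs: dict mapping symbol -> probability averaged over past batches."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(avg_probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next_id, merged))
        next_id += 1
    return heap[0][2]

# Built once from historical statistics, then reused for every new batch.
codebook = huffman_codebook({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
encoded = "".join(codebook[s] for s in "abacad")
```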
优化|敛散性(3篇)
【1】Combinatorial Optimization Augmented Machine Learning
标题:组合优化增强机器学习
链接:https://arxiv.org/abs/2601.10583
作者:Maximilian Schiffer,Heiko Hoppe,Yue Su,Louis Bouvier,Axel Parmentier
摘要:组合优化增强机器学习(COAML)最近已成为将预测模型与组合决策相结合的强大范式。通过将组合优化预言机嵌入到学习管道中,COAML能够构建既由数据驱动又保持可行性的策略,从而连接机器学习、运筹学和随机优化的传统。本文全面概述了COAML的研究现状。我们为COAML管道引入了一个统一框架,描述其方法构建块,并形式化其与经验成本最小化的联系。然后,我们基于不确定性的形式和决策结构建立了问题设定的分类法。基于这种分类法,我们回顾了静态和动态问题的算法方法,调研了调度、车辆路径、随机规划和强化学习等领域的应用,并从经验成本最小化、模仿学习和强化学习的角度综合了方法学贡献。最后,我们指出了关键的研究前沿。本综述旨在既作为该领域的教程式导引,也作为组合优化和机器学习交叉领域未来研究的路线图。
摘要:Combinatorial optimization augmented machine learning (COAML) has recently emerged as a powerful paradigm for integrating predictive models with combinatorial decision-making. By embedding combinatorial optimization oracles into learning pipelines, COAML enables the construction of policies that are both data-driven and feasibility-preserving, bridging the traditions of machine learning, operations research, and stochastic optimization. This paper provides a comprehensive overview of the state of the art in COAML. We introduce a unifying framework for COAML pipelines, describe their methodological building blocks, and formalize their connection to empirical cost minimization. We then develop a taxonomy of problem settings based on the form of uncertainty and decision structure. Using this taxonomy, we review algorithmic approaches for static and dynamic problems, survey applications across domains such as scheduling, vehicle routing, stochastic programming, and reinforcement learning, and synthesize methodological contributions in terms of empirical cost minimization, imitation learning, and reinforcement learning. Finally, we identify key research frontiers. This survey aims to serve both as a tutorial introduction to the field and as a roadmap for future research at the interface of combinatorial optimization and machine learning.
【2】Kinematic Tokenization: Optimization-Based Continuous-Time Tokens for Learnable Decision Policies in Noisy Time Series
标题:运动学令牌化:基于优化的连续时间令牌,用于有噪时间序列中的可学习决策策略
链接:https://arxiv.org/abs/2601.09949
作者:Griffin Kearney
摘要:Transformer是为离散令牌而设计的,但许多现实世界的信号是通过含噪采样观察到的连续过程。离散标记化(原始值、补丁、有限差分)在低信噪比情形下可能很脆弱,尤其是当下游目标施加会理性地鼓励弃权的不对称惩罚时。我们介绍了运动学标记化(Kinematic Tokenization),一种基于优化的连续时间表示,它从含噪测量中重建显式样条,并对局部样条系数(位置、速度、加速度、加加速度)进行标记化。我们将其应用于资产价格形式的金融时间序列数据以及交易量概况。在一个多资产日度股票测试平台上,我们使用一个风险厌恶的非对称分类目标作为可学习性的压力测试。在这一目标下,若干离散基线塌缩为一个吸收性的持现策略(清算均衡),而连续样条令牌则维持经校准的、非平凡的行动分布和稳定的策略。这些结果表明,在诱导弃权的损失下,显式连续时间令牌可以提高含噪时间序列上选择性决策策略的可学习性和校准性。
摘要:Transformers are designed for discrete tokens, yet many real-world signals are continuous processes observed through noisy sampling. Discrete tokenizations (raw values, patches, finite differences) can be brittle in low signal-to-noise regimes, especially when downstream objectives impose asymmetric penalties that rationally encourage abstention. We introduce Kinematic Tokenization, an optimization-based continuous-time representation that reconstructs an explicit spline from noisy measurements and tokenizes local spline coefficients (position, velocity, acceleration, jerk). This is applied to financial time series data in the form of asset prices in conjunction with trading volume profiles. Across a multi-asset daily-equity testbed, we use a risk-averse asymmetric classification objective as a stress test for learnability. Under this objective, several discrete baselines collapse to an absorbing cash policy (the Liquidation Equilibrium), whereas the continuous spline tokens sustain calibrated, non-trivial action distributions and stable policies. These results suggest that explicit continuous-time tokens can improve the learnability and calibration of selective decision policies in noisy time series under abstention-inducing losses.
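As a rough sketch of the tokenization step under one concrete choice of reconstruction (a smoothing cubic spline), the snippet below forms one (position, velocity, acceleration, jerk) token per timestep. The spline type and smoothing value are assumptions for illustration, not the paper's exact optimization.
```python
# Minimal sketch of kinematic tokenization via a smoothing cubic spline.
import numpy as np
from scipy.interpolate import UnivariateSpline

def kinematic_tokens(t, noisy_series, smoothing=1.0):
    spline = UnivariateSpline(t, noisy_series, k=3, s=smoothing)
    pos = spline(t)                      # reconstructed position
    vel = spline.derivative(1)(t)        # velocity
    acc = spline.derivative(2)(t)        # acceleration
    jerk = spline.derivative(3)(t)       # jerk
    return np.stack([pos, vel, acc, jerk], axis=-1)  # shape (len(t), 4)
```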
【3】Interpolation-Based Optimization for Enforcing lp-Norm Metric Differential Privacy in Continuous and Fine-Grained Domains
标题:通过基于插值的优化在连续和细粒度域中实施lp范数度量差分隐私
链接:https://arxiv.org/abs/2601.09946
作者:Chenxi Qiu
备注:USENIX Security 2026
摘要:度量差分隐私(mDP)通过基于成对距离调整隐私保证来推广本地差分隐私(LDP),从而实现上下文感知的保护和更好的效用。虽然现有的基于优化的方法在粗粒度域中能有效降低效用损失,但由于构造稠密扰动矩阵并满足逐点约束的计算成本,在细粒度或连续设置中优化mDP仍然具有挑战性。 在本文中,我们提出了一个基于插值的框架,用于在此类域中优化lp范数mDP。我们的方法在一组稀疏的锚点上优化扰动分布,并通过对数凸组合在非锚点位置插值分布,这可证明地保持mDP。为了解决在高维空间中由朴素插值引起的隐私违背,我们将插值过程分解为一系列一维步骤,并推导出一个通过设计强制满足lp范数mDP的修正公式。我们进一步探讨了扰动分布和跨维度隐私预算分配的联合优化。在真实世界位置数据集上的实验表明,我们的方法在细粒度域中提供了严格的隐私保证和有竞争力的效用,优于基线机制。
摘要:Metric Differential Privacy (mDP) generalizes Local Differential Privacy (LDP) by adapting privacy guarantees based on pairwise distances, enabling context-aware protection and improved utility. While existing optimization-based methods reduce utility loss effectively in coarse-grained domains, optimizing mDP in fine-grained or continuous settings remains challenging due to the computational cost of constructing dense perturbation matrices and satisfying pointwise constraints. In this paper, we propose an interpolation-based framework for optimizing lp-norm mDP in such domains. Our approach optimizes perturbation distributions at a sparse set of anchor points and interpolates distributions at non-anchor locations via log-convex combinations, which provably preserve mDP. To address privacy violations caused by naive interpolation in high-dimensional spaces, we decompose the interpolation process into a sequence of one-dimensional steps and derive a corrected formulation that enforces lp-norm mDP by design. We further explore joint optimization over perturbation distributions and privacy budget allocation across dimensions. Experiments on real-world location datasets demonstrate that our method offers rigorous privacy guarantees and competitive utility in fine-grained domains, outperforming baseline mechanisms.
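A sketch of the interpolation rule implied above, in the simplified case of a non-anchor point expressed through two anchors a_1 and a_2 with weight lambda: the perturbation distribution is the normalized log-convex (geometric) mixture of the anchor distributions, the form the abstract states provably preserves mDP. The two-anchor form is a simplification; the paper's one-dimensional decomposition handles the higher-dimensional case.
```latex
p_x(y) \;=\; \frac{p_{a_1}(y)^{\lambda}\, p_{a_2}(y)^{\,1-\lambda}}
                  {\sum_{y'} p_{a_1}(y')^{\lambda}\, p_{a_2}(y')^{\,1-\lambda}},
\qquad \lambda \in [0,1].
```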
预测|估计(1篇)
【1】CC-OR-Net: A Unified Framework for LTV Prediction through Structural Decoupling
标题:CC-OR-Net:通过结构解耦进行LTV预测的统一框架
链接:https://arxiv.org/abs/2601.10176
作者:Mingyu Zhao,Haoran Bai,Yu Tian,Bing Zhu,Hengliang Luo
备注:Accepted by WWW'26
摘要:客户生命周期价值预测是现代市场营销的核心问题,具有独特的零膨胀和长尾数据分布特征。这种分布带来了两个根本性的挑战:(1)绝大多数中低价值用户在数量上压倒了高价值“鲸鱼”用户的小而重要的部分,以及(2)即使在中低价值用户群中也存在显著的价值异质性。常见的方法要么依赖于严格的统计假设,要么试图使用有序桶来解耦排名和回归;然而,它们通常通过基于损失的约束而不是固有的架构设计来强制排序,无法平衡全局精度和高值精度。为了解决这一差距,我们提出了\textbf{C}onditional \textbf{C}ascaded \textbf{O}rdinal-\textbf{R}esidual Networks \textbf{(CC-OR-Net)},这是一种新的统一框架,通过\textbf{结构分解}实现了更强大的解耦,其中排名在架构上得到了保证。CC-OR-Net集成了三个专门的组件:用于鲁棒排名的\textit{结构顺序分解模块},用于细粒度回归的\textit{桶内残差模块},以及用于顶级用户精确度的\textit{目标高值增强模块}。CC-OR-Net在拥有超过3亿用户的真实数据集上进行了评估,在所有关键业务指标上实现了卓越的权衡,在创建具有商业价值的整体LTV预测解决方案方面优于最先进的方法。
摘要:Customer Lifetime Value (LTV) prediction, a central problem in modern marketing, is characterized by a unique zero-inflated and long-tail data distribution. This distribution presents two fundamental challenges: (1) the vast majority of low-to-medium value users numerically overwhelm the small but critically important segment of high-value "whale" users, and (2) significant value heterogeneity exists even within the low-to-medium value user base. Common approaches either rely on rigid statistical assumptions or attempt to decouple ranking and regression using ordered buckets; however, they often enforce ordinality through loss-based constraints rather than inherent architectural design, failing to balance global accuracy with high-value precision. To address this gap, we propose \textbf{C}onditional \textbf{C}ascaded \textbf{O}rdinal-\textbf{R}esidual Networks \textbf{(CC-OR-Net)}, a novel unified framework that achieves a more robust decoupling through \textbf{structural decomposition}, where ranking is architecturally guaranteed. CC-OR-Net integrates three specialized components: a \textit{structural ordinal decomposition module} for robust ranking, an \textit{intra-bucket residual module} for fine-grained regression, and a \textit{targeted high-value augmentation module} for precision on top-tier users. Evaluated on real-world datasets with over 300M users, CC-OR-Net achieves a superior trade-off across all key business metrics, outperforming state-of-the-art methods in creating a holistic and commercially valuable LTV prediction solution.
其他神经网络|深度学习|模型|建模(15篇)
【1】Data-driven stochastic reduced-order modeling of parametrized dynamical systems
标题:参数化动态系统的数据驱动随机降阶建模
链接:https://arxiv.org/abs/2601.10690
作者:Andrew F. Ilersich,Kevin Course,Prasanth B. Nair
摘要:在变化的条件下对复杂的动态系统进行建模是计算密集型的,通常会使高保真仿真变得棘手。虽然降阶模型(ROM)提供了一个很有前途的解决方案,目前的方法往往与随机动态和未能量化预测的不确定性,限制了他们的实用性在强大的决策环境。为了解决这些挑战,我们引入了一个数据驱动的框架,用于学习连续时间随机ROM,这些ROM可以在参数空间和强制条件下进行推广。我们的方法,基于摊销随机变分推理,利用马尔可夫高斯过程的重新参数化技巧,以消除在训练过程中需要计算昂贵的前向求解器。这使我们能够联合学习概率自动编码器和控制潜在动力学的随机微分方程,其计算成本与数据集大小和系统刚度无关。此外,我们的方法提供了灵活性,将物理知情的先验,如果可用的。数值研究提出了三个具有挑战性的测试问题,我们表现出良好的推广看不见的参数组合和强迫,和现有的方法相比,显着的效率提高。
摘要:Modeling complex dynamical systems under varying conditions is computationally intensive, often rendering high-fidelity simulations intractable. Although reduced-order models (ROMs) offer a promising solution, current methods often struggle with stochastic dynamics and fail to quantify prediction uncertainty, limiting their utility in robust decision-making contexts. To address these challenges, we introduce a data-driven framework for learning continuous-time stochastic ROMs that generalize across parameter spaces and forcing conditions. Our approach, based on amortized stochastic variational inference, leverages a reparametrization trick for Markov Gaussian processes to eliminate the need for computationally expensive forward solvers during training. This enables us to jointly learn a probabilistic autoencoder and stochastic differential equations governing the latent dynamics, at a computational cost that is independent of the dataset size and system stiffness. Additionally, our approach offers the flexibility of incorporating physics-informed priors if available. Numerical studies are presented for three challenging test problems, where we demonstrate excellent generalization to unseen parameter combinations and forcings, and significant efficiency gains compared to existing approaches.
【2】Kolmogorov Arnold Networks and Multi-Layer Perceptrons: A Paradigm Shift in Neural Modelling
标题:Kolmogorov Arnold网络和多层感知器:神经建模的范式转变
链接:https://arxiv.org/abs/2601.10563
作者:Aradhya Gaonkar,Nihal Jain,Vignesh Chougule,Nikhil Deshpande,Sneha Varur,Channabasappa Muttal
备注:13 pages, 8 figures, 2 tables
摘要:该研究对Kolmogorov-Arnold网络(KAN)和多层感知器(MLP)进行了全面的比较分析,突出了它们在解决非线性函数近似,时间序列预测和多元分类等基本计算挑战方面的有效性。KAN基于Kolmogorov表示定理,利用自适应样条激活函数和基于网格的结构,与传统的神经网络框架相比,提供了一种变革性的方法。利用各种数据集,从数学函数估计(二次和三次)到实际应用,如预测每日温度和对葡萄酒进行分类,拟议的研究通过精度指标(如均方误差(MSE)和通过浮点运算(FLOPs)评估的计算费用)来全面评估模型性能。结果表明,KAN在每个基准中都可靠地超过了MLP,从而实现了更高的预测精度,同时显著降低了计算成本。这样的结果突出了它们在计算效率和准确性之间保持平衡的能力,使它们在资源有限和实时操作环境中特别有益。通过阐明KAN和MLP之间的架构和功能差异,本文为选择最适合特定任务的神经架构提供了系统框架。此外,拟议的研究强调了KAN在发展智能系统方面的变革能力,影响了它们在需要可解释性和计算效率的情况下的使用。
摘要:The research undertakes a comprehensive comparative analysis of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptrons (MLP), highlighting their effectiveness in solving essential computational challenges like nonlinear function approximation, time-series prediction, and multivariate classification. Rooted in Kolmogorov's representation theorem, KANs utilize adaptive spline-based activation functions and grid-based structures, providing a transformative approach compared to traditional neural network frameworks. Utilizing a variety of datasets spanning mathematical function estimation (quadratic and cubic) to practical uses like predicting daily temperatures and categorizing wines, the proposed research thoroughly assesses model performance via accuracy measures like Mean Squared Error (MSE) and computational expense assessed through Floating Point Operations (FLOPs). The results indicate that KANs reliably exceed MLPs in every benchmark, attaining higher predictive accuracy with significantly reduced computational costs. Such an outcome highlights their ability to maintain a balance between computational efficiency and accuracy, rendering them especially beneficial in resource-limited and real-time operational environments. By elucidating the architectural and functional distinctions between KANs and MLPs, the paper provides a systematic framework for selecting the most suitable neural architectures for specific tasks. Furthermore, the proposed study highlights the transformative capabilities of KANs in progressing intelligent systems, influencing their use in situations that require both interpretability and computational efficiency.
【3】Process-Guided Concept Bottleneck Model
标题:过程引导的概念瓶颈模型
链接:https://arxiv.org/abs/2601.10562
作者:Reza M. Asiyabi,SEOSAW Partnership,Steven Hancock,Casey Ryan
备注:13 pages with 7 figures and 1 table, Supplementary Materials 10 pages with 3 figures
摘要:概念瓶颈模型(CBM)通过引入中间语义概念来提高黑盒深度学习(DL)的可解释性。然而,标准的CBM往往忽视特定领域的关系和因果机制,并且它们对完整概念标签的依赖限制了其在监督稀疏但过程明确的科学领域中的适用性。为了解决这个问题,我们提出了过程引导的概念瓶颈模型(PG-CBM),它是CBM的扩展,通过具有生物物理意义的中间概念,将学习约束为遵循领域定义的因果机制。以利用地球观测数据估计地上生物量密度作为案例研究,我们表明,与多个基准相比,PG-CBM减少了误差和偏差,同时利用多源异构训练数据并产生可解释的中间输出。除了提高准确性外,PG-CBM还提高了透明度,能够检测虚假学习,并提供科学见解,代表着在科学应用中向更值得信赖的AI系统迈出的一步。
摘要:Concept Bottleneck Models (CBMs) improve the explainability of black-box Deep Learning (DL) by introducing intermediate semantic concepts. However, standard CBMs often overlook domain-specific relationships and causal mechanisms, and their dependence on complete concept labels limits applicability in scientific domains where supervision is sparse but processes are well defined. To address this, we propose the Process-Guided Concept Bottleneck Model (PG-CBM), an extension of CBMs which constrains learning to follow domain-defined causal mechanisms through biophysically meaningful intermediate concepts. Using above ground biomass density estimation from Earth Observation data as a case study, we show that PG-CBM reduces error and bias compared to multiple benchmarks, whilst leveraging multi-source heterogeneous training data and producing interpretable intermediate outputs. Beyond improved accuracy, PG-CBM enhances transparency, enables detection of spurious learning, and provides scientific insights, representing a step toward more trustworthy AI systems in scientific applications.
【4】Mixtures of Transparent Local Models
标题:透明局部模型的混合
链接:https://arxiv.org/abs/2601.10541
作者:Niffa Cheick Oumar Diaby,Thierry Duchesne,Mario Marchand
备注:44 pages, 32 figues
摘要:机器学习模型在人类活动诸多领域中的主导地位导致了对其透明度日益增长的需求。模型的透明性使人们有可能辨别诸如安全性或非歧视性等因素。在本文中,我们提出了透明局部模型的混合,作为设计可解释(或透明)模型的替代方案。我们的方法针对这样一种情形:在输入空间的某些局部/区域中,一个简单而透明的函数适合对实例的标签进行建模,但当我们从一个局部移动到另一个局部时,该函数可能会突然改变。因此,所提出的算法要同时学习透明的标注函数以及输入空间中使该标注函数达到较小风险的局部区域。通过使用新的多预测器(和多局部)损失函数,我们针对二元线性分类问题和线性回归问题建立了严格的PAC-贝叶斯风险界。在这两种情况下,均使用合成数据集来说明学习算法的工作方式。从真实数据集获得的结果突出了我们的方法与其他现有方法以及某些不透明模型相比的竞争力。关键词:PAC-Bayes,风险界,局部模型,透明模型,局部透明模型的混合。
摘要:The predominance of machine learning models in many spheres of human activity has led to a growing demand for their transparency. The transparency of models makes it possible to discern some factors, such as security or non-discrimination. In this paper, we propose a mixture of transparent local models as an alternative solution for designing interpretable (or transparent) models. Our approach is designed for the situations where a simple and transparent function is suitable for modeling the label of instances in some localities/regions of the input space, but may change abruptly as we move from one locality to another. Consequently, the proposed algorithm is to learn both the transparent labeling function and the locality of the input space where the labeling function achieves a small risk in its assigned locality. By using a new multi-predictor (and multi-locality) loss function, we established rigorous PAC-Bayesian risk bounds for the case of binary linear classification problem and that of linear regression. In both cases, synthetic data sets were used to illustrate how the learning algorithms work. The results obtained from real data sets highlight the competitiveness of our approach compared to other existing methods as well as certain opaque models. Keywords: PAC-Bayes, risk bounds, local models, transparent models, mixtures of local transparent models.
【5】Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics
标题:用于学习非线性动力学的稳定可微模式综合
链接:https://arxiv.org/abs/2601.10453
作者:Victor Zheleznov,Stefan Bilbao,Alec Wright,Simon King
备注:Submitted to the Journal of Audio Engineering Society (December 2025)
摘要:模态方法是一种历史悠久的物理建模合成方法。它可以扩展到非线性问题,包括弦的高振幅振动的情形。模态分解导致一个稠密耦合的非线性常微分方程系统。标量辅助变量技术的最新进展使得为这类非线性系统构造显式且稳定的数值求解器成为可能。另一方面,机器学习方法(特别是神经常微分方程)已经能够从数据中自动建模非线性系统。在这项工作中,我们研究如何将标量辅助变量技术与神经常微分方程相结合,得到一个能够学习非线性动力学的稳定可微模型。所提出的方法利用系统各模态线性振动的解析解,使得系统的物理参数在训练后仍然易于获取,而无需在模型架构中引入参数编码器。作为概念验证,我们生成了弦的非线性横向振动的合成数据,并表明该模型经训练后可以再现系统的非线性动力学。文中给出了声音示例。
摘要:Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, including the case of a high-amplitude vibration of a string. A modal decomposition leads to a densely coupled nonlinear system of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled construction of explicit and stable numerical solvers for such classes of nonlinear systems. On the other hand, machine learning approaches (in particular neural ordinary differential equations) have been successful in modelling nonlinear systems automatically from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for linear vibration of system's modes so that physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the model architecture. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
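For context, the "densely coupled nonlinear system" mentioned above has the generic modal form below, where q_m are modal coordinates and omega_m the linear modal frequencies whose analytical solution the model leverages; the cubic coupling tensor Gamma is schematic, not the paper's exact expression.
```latex
\ddot{q}_m + \omega_m^2\, q_m \;=\; -\sum_{i,j,k} \Gamma_{m,ijk}\, q_i\, q_j\, q_k,
\qquad m = 1,\dots,M.
```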
【6】AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior
标题:AgentGuardian:学习访问控制策略来管理AI代理行为
链接:https://arxiv.org/abs/2601.10440
作者:Nadya Abaev,Denis Klimov,Gerard Levinov,David Mimran,Yuval Elovici,Asaf Shabtai
备注:14 pages, 5 figures
摘要:人工智能(AI)代理越来越多地用于各种领域,以自动执行任务,与用户交互,并根据数据输入做出决策。确保AI代理只执行授权的操作并适当处理输入对于维护系统完整性和防止滥用至关重要。在这项研究中,我们引入了AgentGuardian,这是一种新的安全框架,通过执行上下文感知的访问控制策略来管理和保护AI代理操作。在受控的阶段,框架监视执行跟踪,以学习合法的代理行为和输入模式。从这个阶段,它得出自适应的政策,规范工具调用的代理,实时输入上下文和多步代理动作的控制流依赖关系的指导。对两个真实世界的人工智能代理应用程序的评估表明,AgentGuardian可以有效地检测恶意或误导性输入,同时保留正常的代理功能。此外,其基于控制流的治理机制可以减轻幻觉驱动的错误和其他编排级别的故障。
摘要:Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce the AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.
【7】An analytic theory of convolutional neural network inverse problems solvers
标题:卷积神经网络逆问题求解器的分析理论
链接:https://arxiv.org/abs/2601.10334
作者:Minh Hai Nguyen,Quoc Bao Do,Edouard Pauwels,Pierre Weiss
摘要:监督卷积神经网络(CNN)被广泛用于求解成像逆问题,在许多应用中实现了最先进的性能。然而,尽管这些方法在经验上取得了成功,但从理论角度看,人们对它们的理解很少,而且往往将其视为黑箱。为了弥合这一差距,我们通过最小均方误差(MMSE)估计器的视角分析训练好的神经网络,并结合能够刻画CNN两个基本归纳偏置(平移等变性和通过有限感受野实现的局部性)的函数约束。在经验训练分布下,我们为这个受约束的变体推导出一个解析的、可解释的且易于处理的公式,称为局部等变MMSE(LE-MMSE)。通过涵盖各种逆问题(去噪、修复、反卷积)、数据集(FFHQ、CIFAR-10、FashionMNIST)和架构(U-Net、ResNet、PatchMLP)的广泛数值实验,我们证明了我们的理论与神经网络的输出相匹配(PSNR $\gtrsim25$dB)。此外,我们就物理感知与物理无关估计器之间的差异、训练(补丁)分布中高密度区域的影响,以及其他因素(数据集大小、补丁大小等)的影响提供了洞见。
摘要:Supervised convolutional neural networks (CNNs) are widely used to solve imaging inverse problems, achieving state-of-the-art performance in numerous applications. However, despite their empirical success, these methods are poorly understood from a theoretical perspective and often treated as black boxes. To bridge this gap, we analyze trained neural networks through the lens of the Minimum Mean Square Error (MMSE) estimator, incorporating functional constraints that capture two fundamental inductive biases of CNNs: translation equivariance and locality via finite receptive fields. Under the empirical training distribution, we derive an analytic, interpretable, and tractable formula for this constrained variant, termed Local-Equivariant MMSE (LE-MMSE). Through extensive numerical experiments across various inverse problems (denoising, inpainting, deconvolution), datasets (FFHQ, CIFAR-10, FashionMNIST), and architectures (U-Net, ResNet, PatchMLP), we demonstrate that our theory matches the neural networks outputs (PSNR $\gtrsim25$dB). Furthermore, we provide insights into the differences between \emph{physics-aware} and \emph{physics-agnostic} estimators, the impact of high-density regions in the training (patch) distribution, and the influence of other factors (dataset size, patch size, etc).
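For reference, under the empirical training distribution the unconstrained MMSE denoiser for Gaussian noise of variance sigma^2 takes the closed form below: a softmax-weighted average of training samples x_i. This is the standard unconstrained baseline, not the paper's LE-MMSE variant, which additionally imposes the locality and equivariance constraints.
```latex
\hat{x}(y) \;=\; \sum_i w_i(y)\, x_i,
\qquad
w_i(y) \;=\; \frac{\exp\!\bigl(-\lVert y - x_i \rVert^2 / 2\sigma^2\bigr)}
                  {\sum_j \exp\!\bigl(-\lVert y - x_j \rVert^2 / 2\sigma^2\bigr)}.
```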
【8】SPIKE: Sparse Koopman Regularization for Physics-Informed Neural Networks
标题:SPIKE:物理信息神经网络的稀疏Koopman正则化
链接:https://arxiv.org/abs/2601.10282
作者:Jose Marie Antonio Minoza
摘要:物理信息神经网络(PINN)通过将物理约束嵌入到神经网络训练中,提供了一种求解微分方程的无网格方法。然而,PINN倾向于在训练域内过拟合,导致在训练的时空区域之外外推时泛化能力差。这项工作提出了SPIKE(稀疏物理信息Koopman增强),一个框架,规范PINN与连续时间Koopman运营商学习简约的动态表示。通过在学习的可观测空间中强制执行线性动力学$dz/dt = Az$,PIKE(没有显式稀疏性)和SPIKE(在$A$上进行L1正则化)都学习稀疏生成矩阵,体现了复杂动力学允许低维结构的简约原则。抛物线,双曲,色散和刚性偏微分方程,包括流体动力学(Navier-Stokes)和混沌ODE(洛伦兹)的实验,证明了时间外推,空间推广和长期预测精度的一致改善。矩阵指数积分的连续时间公式为刚性系统提供了无条件稳定性,同时避免了离散时间Koopman算子固有的对角优势问题。
摘要:Physics-Informed Neural Networks (PINNs) provide a mesh-free approach for solving differential equations by embedding physical constraints into neural network training. However, PINNs tend to overfit within the training domain, leading to poor generalization when extrapolating beyond trained spatiotemporal regions. This work presents SPIKE (Sparse Physics-Informed Koopman-Enhanced), a framework that regularizes PINNs with continuous-time Koopman operators to learn parsimonious dynamics representations. By enforcing linear dynamics $dz/dt = Az$ in a learned observable space, both PIKE (without explicit sparsity) and SPIKE (with L1 regularization on $A$) learn sparse generator matrices, embodying the parsimony principle that complex dynamics admit low-dimensional structure. Experiments across parabolic, hyperbolic, dispersive, and stiff PDEs, including fluid dynamics (Navier-Stokes) and chaotic ODEs (Lorenz), demonstrate consistent improvements in temporal extrapolation, spatial generalization, and long-term prediction accuracy. The continuous-time formulation with matrix exponential integration provides unconditional stability for stiff systems while avoiding diagonal dominance issues inherent in discrete-time Koopman operators.
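A minimal sketch of the regularizer as described: penalize deviation from linear latent dynamics dz/dt = Az on learned observables and, in the SPIKE variant, add an L1 penalty on the generator A. Shapes and the weighting are illustrative; in practice this term would be added to the usual PINN residual loss.
```python
import torch

def koopman_regularizer(z, dz_dt, A, l1_weight=1e-3):
    """z, dz_dt: (batch, d) learned observables and their time derivatives.
    A: (d, d) generator matrix of the linear latent dynamics."""
    residual = dz_dt - z @ A.T          # violation of dz/dt = A z
    linearity = residual.pow(2).mean()
    sparsity = A.abs().sum()            # L1 drives a sparse generator (SPIKE)
    return linearity + l1_weight * sparsity
```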
【9】Comparative Evaluation of Deep Learning-Based and WHO-Informed Approaches for Sperm Morphology Assessment
标题:基于深度学习和WHO知情的精子形态评估方法的比较评估
链接:https://arxiv.org/abs/2601.10070
作者:Mohammad Abbadi
备注:Under review at Computers in Biology and Medicine
摘要:精子形态质量的评估仍然是男性生育力评估的关键但主观的组成部分,往往受到观察者之间的差异和资源限制的限制。本研究提出了一个比较生物医学人工智能框架,评估了基于图像的深度学习模型(HuSHeM)以及源自世界卫生组织标准并辅以全身炎症反应指数(WHO(+SIRI))的临床基础基线。 HuSHeM模型在高分辨率精子形态图像上进行训练,并使用独立的临床队列进行评估。模型性能进行了评估,使用歧视,校准和临床效用分析。HuSHeM模型表现出更高的区分性能,如与WHO(+SIRI)相比,受试者工作特征曲线下面积增加,置信区间相对较窄所反映的。精确-召回分析进一步表明,在类别不平衡的情况下,精确-召回区域值在评估阈值上更高。校准分析表明预测概率与观察到的HuSHeM结果之间的一致性更接近,而决策曲线分析表明,在临床相关阈值概率范围内,净临床获益更大。 这些发现表明,与传统的基于规则和炎症增强的标准相比,基于图像的深度学习可以提供更好的预测可靠性和临床实用性。拟议的框架支持客观和可重复的精子形态评估,并可作为生育筛查和转诊工作流程中的决策支持工具。建议的模型旨在作为决策支持或转诊工具,而不是用来取代临床判断或实验室评估。
摘要:Assessment of sperm morphological quality remains a critical yet subjective component of male fertility evaluation, often limited by inter-observer variability and resource constraints. This study presents a comparative biomedical artificial intelligence framework evaluating an image-based deep learning model (HuSHeM) alongside a clinically grounded baseline derived from World Health Organization criteria augmented with the Systemic Inflammation Response Index (WHO(+SIRI)). The HuSHeM model was trained on high-resolution sperm morphology images and evaluated using an independent clinical cohort. Model performance was assessed using discrimination, calibration, and clinical utility analyses. The HuSHeM model demonstrated higher discriminative performance, as reflected by an increased area under the receiver operating characteristic curve with relatively narrow confidence intervals compared to WHO(+SIRI). Precision-recall analysis further indicated improved performance under class imbalance, with higher precision-recall area values across evaluated thresholds. Calibration analysis indicated closer agreement between predicted probabilities and observed outcomes for HuSHeM, while decision curve analysis suggested greater net clinical benefit across clinically relevant threshold probabilities. These findings suggest that image-based deep learning may offer improved predictive reliability and clinical utility compared with traditional rule-based and inflammation-augmented criteria. The proposed framework supports objective and reproducible assessment of sperm morphology and may serve as a decision-support tool within fertility screening and referral workflows. The proposed models are intended as decision-support or referral tools and are not designed to replace clinical judgment or laboratory assessment.
【10】Time Aggregation Features for XGBoost Models
标题:用于XGBoost模型的时间聚合特征
链接:https://arxiv.org/abs/2601.10019
作者:Mykola Pinchuk
备注:17 pages, 18 tables and figures
摘要:本文研究用于点击率预测的XGBoost模型的时间聚合特征。实验设定为Avazu点击率预测数据集,采用严格的时间外划分和无前瞻特征约束。小时H的特征仅使用严格早于小时H的小时内的展示。本文将一个强大的时间感知目标编码基线与在若干窗口设计下加入实体历史时间聚合特征的模型进行比较。在确定性10%样本上的两个滚动尾部折叠中,相对于仅使用目标编码,尾随窗口设定将ROC AUC提高约0.0066至0.0082,PR AUC提高约0.0084至0.0094。在时间聚合设计网格中,事件计数窗口是相对于尾随窗口唯一带来一致改进的设计,且增益很小。在此数据集和协议下,间隔窗口和分桶窗口的表现不如简单的尾随窗口。这些结果支持将尾随窗口作为实用的默认选择,当边际ROC AUC增益重要时可选用事件计数窗口。
摘要:This paper studies time aggregation features for XGBoost models in click-through rate prediction. The setting is the Avazu click-through rate prediction dataset with strict out-of-time splits and a no-lookahead feature constraint. Features for hour H use only impressions from hours strictly before H. This paper compares a strong time-aware target encoding baseline to models augmented with entity history time aggregation under several window designs. Across two rolling-tail folds on a deterministic ten percent sample, a trailing window specification improves ROC AUC by about 0.0066 to 0.0082 and PR AUC by about 0.0084 to 0.0094 relative to target encoding alone. Within the time aggregation design grid, event count windows provide the only consistent improvement over trailing windows, and the gain is small. Gap windows and bucketized windows underperform simple trailing windows in this dataset and protocol. These results support a practical default of trailing windows, with an optional event count window when marginal ROC AUC gains matter.
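A minimal pandas sketch of a trailing-window entity-history feature under the no-lookahead constraint (features for hour H use only hours strictly before H). Column names are illustrative rather than Avazu's exact schema, and the rolling count assumes one row per entity-hour.
```python
import pandas as pd

def add_trailing_ctr(df, entity="site_id", window=24):
    """df: impression-level frame with columns [entity, 'hour', 'click']."""
    hourly = (df.groupby([entity, "hour"])["click"]
                .agg(clicks="sum", imps="count")
                .reset_index()
                .sort_values([entity, "hour"]))
    g = hourly.groupby(entity)
    # shift(1) excludes the current hour, enforcing "strictly before H".
    hourly["past_clicks"] = g["clicks"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).sum())
    hourly["past_imps"] = g["imps"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).sum())
    hourly["trailing_ctr"] = hourly["past_clicks"] / hourly["past_imps"]
    return df.merge(hourly[[entity, "hour", "trailing_ctr"]],
                    on=[entity, "hour"], how="left")
```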
【11】In-Context Operator Learning on the Space of Probability Measures
标题:概率测度空间上的上下文算子学习
链接:https://arxiv.org/abs/2601.09979
作者:Frank Cole,Dixi Wang,Yineng Chen,Yulong Lu,Rongjie Lai
摘要:我们介绍了用于最优传输(OT)的\emph{概率测度空间上的上下文算子学习}。目标是学习一个将一对分布映射到OT映射的单一求解算子,仅使用来自每个分布的少样本作为提示,并且在推理时不进行梯度更新。我们对求解算子进行参数化,并在两种情形下发展了标度律理论。在非参数设定中,当任务集中在源-目标对的低内在维度流形上时,我们建立了泛化界,量化了上下文内精度如何随提示大小、内在任务维度和模型容量伸缩。在\emph{参数}设定中(例如高斯族),我们给出了一个能在上下文中恢复精确OT映射的显式架构,并提供了有限样本的超额风险界。我们在合成传输和生成建模基准上的数值实验验证了该框架。
摘要:We introduce \emph{in-context operator learning on probability measure spaces} for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions to the OT map, using only few-shot samples from each distribution as a prompt and \emph{without} gradient updates at inference. We parameterize the solution operator and develop scaling-law theory in two regimes. In the \emph{nonparametric} setting, when tasks concentrate on a low-intrinsic-dimension manifold of source--target pairs, we establish generalization bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the \emph{parametric} setting (e.g., Gaussian families), we give an explicit architecture that recovers the exact OT map in context and provide finite-sample excess-risk bounds. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.
【12】VibrantSR: Sub-Meter Canopy Height Models from Sentinel-2 Using Generative Flow Matching
标题:VibrantSR:基于生成式流匹配的Sentinel-2亚米级树冠高度模型
链接:https://arxiv.org/abs/2601.09866
作者:Kiarie Ndegwa,Andreas Gros,Tony Chang,David Diaz,Vincent A. Landau,Nathan E. Rutenbeck,Luke J. Zachmann,Guy Bayes,Scott Conway
备注:12 pages, 8 figures, 2 tables
摘要:我们提出了VibrantSR(Vibrant Super-Resolution),这是一个用于从10米Sentinel-2图像中估计0.5米冠层高度模型(CHM)的生成超分辨率框架。与基于航空影像的方法不同,这些方法受到不频繁和不规则的采集时间表的限制,VibrantSR利用全球可用的Sentinel-2季节性复合材料,实现了季节性到年度的一致监测。在美国西部22个EPA 3级生态区使用空间不相交的验证分割进行评估,VibrantSR在树冠高度>= 2米的情况下实现了4.39米的平均绝对误差,优于Meta(4.83米),LANDFIRE(5.96米)和ETH(7.05米)基于卫星的基准。虽然基于空中的VibrantVS(2.71 m MAE)保留了准确性优势,但VibrantSR能够在大陆尺度上进行森林监测和碳核算,而无需依赖昂贵且暂时不频繁的空中采集。
摘要:We present VibrantSR (Vibrant Super-Resolution), a generative super-resolution framework for estimating 0.5 meter canopy height models (CHMs) from 10 meter Sentinel-2 imagery. Unlike approaches based on aerial imagery that are constrained by infrequent and irregular acquisition schedules, VibrantSR leverages globally available Sentinel-2 seasonal composites, enabling consistent monitoring at a seasonal-to-annual cadence. Evaluated across 22 EPA Level 3 eco-regions in the western United States using spatially disjoint validation splits, VibrantSR achieves a Mean Absolute Error of 4.39 meters for canopy heights >= 2 m, outperforming Meta (4.83 m), LANDFIRE (5.96 m), and ETH (7.05 m) satellite-based benchmarks. While aerial-based VibrantVS (2.71 m MAE) retains an accuracy advantage, VibrantSR enables operational forest monitoring and carbon accounting at continental scales without reliance on costly and temporally infrequent aerial acquisitions.
【13】R-LAM: Reproducibility-Constrained Large Action Models for Scientific Workflow Automation
标题:R-LAM:用于科学工作流程自动化的可重复性约束的大型动作模型
链接:https://arxiv.org/abs/2601.09749
作者:Suriya Sureshkumar
备注:9 pages, 3 figures, 1 Table, 2 Artifacts
摘要:大型动作模型(LAM)通过支持自主决策和工具执行来扩展大型语言模型,使其有望用于科学工作流程的自动化。然而,科学工作流程对可重复性、可审计性和确定性执行提出了严格要求,而通用的基于LLM的智能体无法满足这些要求。不受约束的动作生成可能导致无声的状态变化、非确定性执行和不可复现的实验结果,限制了LAM在科学场景中的适用性。 在本文中,我们提出了R-LAM,一个将大型动作模型应用于科学工作流自动化的可重复性约束框架。R-LAM引入了结构化的动作模式、确定性的执行策略和显式的来源追踪,以确保每个动作和中间工件都是可审计和可重放的。该框架支持故障感知执行循环和受控的工作流分叉,从而在不损害可重复性的情况下实现迭代实验。 我们将R-LAM实现为一个轻量级的Python框架,并将其作为开源PyPI包发布,以促进可复现的研究。对代表性科学工作流程的实验评估表明,与不受约束的基于LLM的智能体相比,R-LAM提高了可重复性成功率和执行可靠性,同时保留了对工作流执行的自适应控制。
摘要:Large Action Models (LAMs) extend large language models by enabling autonomous decision-making and tool execution, making them promising for automating scientific workflows. However, scientific workflows impose strict requirements on reproducibility, auditability, and deterministic execution, which are not satisfied by generic LLM-based agents. Unconstrained action generation can lead to silent state changes, non-deterministic executions, and irreproducible experimental results, limiting the applicability of LAMs in scientific settings. In this paper, we propose R-LAM, a reproducibility-constrained framework for applying Large Action Models to scientific workflow automation. R-LAM introduces structured action schemas, deterministic execution policies, and explicit provenance tracking to ensure that every action and intermediate artifact is auditable and replayable. The framework supports failure-aware execution loops and controlled workflow forking, enabling iterative experimentation without compromising reproducibility. We implement R-LAM as a lightweight Python framework and release it as an open-source PyPI package to facilitate reproducible research. An experimental evaluation of representative scientific workflows demonstrates that R-LAM improves reproducibility success rates and execution reliability compared to unconstrained LLM-based agents, while retaining adaptive control over workflow execution.
【14】Multi-Agent Cooperative Learning for Robust Vision-Language Alignment under OOD Concepts
标题:OOD概念下实现鲁棒视觉语言对齐的多智能体合作学习
链接:https://arxiv.org/abs/2601.09746
作者:Philip Xu,Isabel Wagner,Eerke Boiten
摘要:本文介绍了一种新的多智能体协作学习(MACL)框架,以解决视觉语言模型在处理分布外(OOD)概念时的跨模态对齐崩溃问题。图像、文本、名称和协调四个核心智能体通过结构化的消息传递协作缓解模态不平衡。该框架支持多智能体特征空间中的名称学习,引入由上下文交换增强的少样本学习算法,并采用自适应动态平衡机制调节智能体间的贡献。在VISTA-Beyond数据集上的实验表明,MACL在少样本和零样本设置下均显著提升性能,在不同视觉域中获得1-5%的精度增益。
摘要:This paper introduces a novel Multi-Agent Cooperative Learning (MACL) framework to address cross-modal alignment collapse in vision-language models when handling out-of-distribution (OOD) concepts. Four core agents, including image, text, name, and coordination agents, collaboratively mitigate modality imbalance through structured message passing. The proposed framework enables multi-agent feature space name learning, incorporates a context exchange enhanced few-shot learning algorithm, and adopts an adaptive dynamic balancing mechanism to regulate inter-agent contributions. Experiments on the VISTA-Beyond dataset demonstrate that MACL significantly improves performance in both few-shot and zero-shot settings, achieving 1-5% precision gains across diverse visual domains.
【15】Coarsening Causal DAG Models
标题:粗化因果DAG模型
链接:https://arxiv.org/abs/2601.10531
作者:Francisco Madaleno,Pratik Misra,Alex Markham
备注:25 pages, 5 figures
摘要:有向无环图(DAG)模型是表示联合分布随机变量间因果关系的强大工具,尤其适用于来自不同实验环境的数据。然而,在特定数据集中以给定特征的粒度来估计因果模型并不总是实用或可取的。越来越多的研究通过因果抽象来解决此类问题。我们从以下方面为这一研究路线做出贡献:(i)为实际相关的干预设置提供新的图形可识别性结果;(ii)提出一种高效的、可证明一致的算法,用于直接从干预目标未知的干预数据中学习抽象因果图;(iii)揭示底层搜索空间的格结构方面的理论见解,及其与更广义的因果发现领域的联系。作为概念验证,我们将算法应用于具有已知真值的合成与真实数据集,包括来自光强与偏振相互作用的受控物理系统的测量。
摘要:Directed acyclic graphical (DAG) models are a powerful tool for representing causal relationships among jointly distributed random variables, especially concerning data from across different experimental settings. However, it is not always practical or desirable to estimate a causal model at the granularity of given features in a particular dataset. There is a growing body of research on causal abstraction to address such problems. We contribute to this line of research by (i) providing novel graphical identifiability results for practically-relevant interventional settings, (ii) proposing an efficient, provably consistent algorithm for directly learning abstract causal graphs from interventional data with unknown intervention targets, and (iii) uncovering theoretical insights about the lattice structure of the underlying search space, with connections to the field of causal discovery more generally. As proof of concept, we apply our algorithm on synthetic and real datasets with known ground truths, including measurements from a controlled physical system with interacting light intensity and polarization.
其他(29篇)
【1】DInf-Grid: A Neural Differential Equation Solver with Differentiable Feature Grids
标题:DInf-Grid:具有可微特征网格的神经微分方程求解器
链接:https://arxiv.org/abs/2601.10715
作者:Navami Kairanda,Shanthika Naik,Marc Habermann,Avinash Sharma,Christian Theobalt,Vladislav Golyanik
备注:25 pages; 16 figures; project page: https://4dqv.mpi-inf.mpg.de/DInf-Grid/
摘要:我们提出了一种新的可微网格表示,用于高效求解微分方程(DE)。广泛使用的神经求解器架构(例如正弦神经网络)是基于坐标的MLP,计算密集且训练缓慢。虽然基于网格的隐式表示替代方案(例如Instant-NGP和K-Planes)通过利用信号结构而训练更快,但它们对线性插值的依赖限制了计算高阶导数的能力,使其不适合求解DE。我们的方法将特征网格的高效性与无限可微的径向基函数插值相结合,克服了这些限制。为了有效捕获高频解并实现稳定、更快的全局梯度计算,我们引入了具有同位网格的多分辨率分解。我们提出的表示DInf-Grid以微分方程作为损失函数进行隐式训练,从而实现对物理场的精确建模。我们在多种任务上验证了DInf-Grid,包括用于图像重建的泊松方程、用于波场的亥姆霍兹方程,以及用于布料模拟的Kirchhoff-Love边值问题。结果表明,相比基于坐标的MLP方法可获得5-20倍加速,在数秒或数分钟内求解微分方程,同时保持相当的精度与紧凑性。
摘要:We present a novel differentiable grid-based representation for efficiently solving differential equations (DEs). Widely used architectures for neural solvers, such as sinusoidal neural networks, are coordinate-based MLPs that are both computationally intensive and slow to train. Although grid-based alternatives for implicit representations (e.g., Instant-NGP and K-Planes) train faster by exploiting signal structure, their reliance on linear interpolation restricts their ability to compute higher-order derivatives, rendering them unsuitable for solving DEs. Our approach overcomes these limitations by combining the efficiency of feature grids with radial basis function interpolation, which is infinitely differentiable. To effectively capture high-frequency solutions and enable stable and faster computation of global gradients, we introduce a multi-resolution decomposition with co-located grids. Our proposed representation, DInf-Grid, is trained implicitly using the differential equations as loss functions, enabling accurate modelling of physical fields. We validate DInf-Grid on a variety of tasks, including the Poisson equation for image reconstruction, the Helmholtz equation for wave fields, and the Kirchhoff-Love boundary value problem for cloth simulation. Our results demonstrate a 5-20x speed-up over coordinate-based MLP-based methods, solving differential equations in seconds or minutes while maintaining comparable accuracy and compactness.
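作为补充说明,下面给出一个极简的PyTorch示意:在一维可学习特征网格上做高斯RBF插值,其输出对查询坐标无限可微,因此可以把PDE残差直接当作损失来隐式训练(此处以泊松方程 $u''=f$、真解 $u=\sin(πx)$ 为例)。网格规模、带宽与训练细节均为示意性假设,并非论文实现:

import torch

N = 64                                       # 网格结点数(假设值)
nodes = torch.linspace(0.0, 1.0, N)          # 结点坐标
feats = torch.zeros(N, requires_grad=True)   # 可学习的结点特征
h = 1.0 / (N - 1)                            # RBF带宽取网格间距

def u(x):
    # 高斯RBF插值:对坐标x无限可微(线性插值则不行)
    w = torch.exp(-((x[:, None] - nodes[None, :]) ** 2) / (2 * h * h))
    w = w / w.sum(dim=1, keepdim=True)
    return (w * feats[None, :]).sum(dim=1)

opt = torch.optim.Adam([feats], lr=1e-2)
f = lambda x: -(torch.pi ** 2) * torch.sin(torch.pi * x)   # 泊松方程右端项

for step in range(2000):
    x = torch.rand(128, requires_grad=True)
    ux = u(x)
    du = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    bc = u(torch.tensor([0.0, 1.0])) ** 2                  # 边界条件 u(0)=u(1)=0
    loss = ((d2u - f(x)) ** 2).mean() + bc.sum()
    opt.zero_grad(); loss.backward(); opt.step()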
【2】High-accuracy and dimension-free sampling with diffusions
标题:具有扩散的高精度和无因次采样
链接:https://arxiv.org/abs/2601.10708
作者:Khashayar Gatmiry,Sitan Chen,Adil Salim
摘要:扩散模型在从丰富的多峰分布中采样方面取得了显著的经验成功。其推理依赖于数值求解某个微分方程。该微分方程没有封闭形式解,通过离散化求解通常需要许多小步迭代才能产生高质量样本。 更确切地说,已有工作表明,扩散模型离散化方法的迭代复杂度随环境维度和逆精度$1/\varepsilon$呈多项式缩放。在这项工作中,我们提出了一个新的扩散模型求解器,依赖于低次近似与配置法(Lee, Song, Vempala 2018)之间的微妙相互作用,并证明其迭代复杂度随$1/\varepsilon$仅呈多对数缩放,从而为仅使用数据分布分数的(近似)访问的基于扩散的采样器给出第一个“高精度”保证。此外,我们的界并不显式依赖环境维度;更确切地说,维度仅通过目标分布支撑集的有效半径影响求解器的复杂度。
摘要:Diffusion models have shown remarkable empirical success in sampling from rich multi-modal distributions. Their inference relies on numerically solving a certain differential equation. This differential equation cannot be solved in closed form, and its resolution via discretization typically requires many small iterations to produce \emph{high-quality} samples. More precisely, prior works have shown that the iteration complexity of discretization methods for diffusion models scales polynomially in the ambient dimension and the inverse accuracy $1/\varepsilon$. In this work, we propose a new solver for diffusion models relying on a subtle interplay between low-degree approximation and the collocation method (Lee, Song, Vempala 2018), and we prove that its iteration complexity scales \emph{polylogarithmically} in $1/\varepsilon$, yielding the first ``high-accuracy'' guarantee for a diffusion-based sampler that only uses (approximate) access to the scores of the data distribution. In addition, our bound does not depend explicitly on the ambient dimension; more precisely, the dimension affects the complexity of our solver through the \emph{effective radius} of the support of the target distribution only.
【3】Distributed Perceptron under Bounded Staleness, Partial Participation, and Noisy Communication
标题:有限停滞、部分参与和噪音通信下的分布式感知器
链接:https://arxiv.org/abs/2601.10705
作者:Keval Jain,Anant Raj,Saurav Prakash,Girish Varma
摘要:我们研究了通过迭代参数混合(IPM式平均)训练的半异步客户端-服务器感知器:客户端运行本地感知器更新,服务器通过聚合每个通信轮次中到达的更新形成全局模型。该设置刻画了联邦与分布式部署中的三种系统效应:(i)由延迟的模型下发与延迟应用的客户端计算(双侧版本滞后)导致的陈旧更新;(ii)部分参与(客户端间歇可用);(iii)下行与上行链路上的不完美通信,建模为具有有界二阶矩的有效零均值加性噪声。我们引入了一种称为“带填充的陈旧度分桶聚合”的服务器端聚合规则,它确定性地对更新年龄强制执行规定的陈旧度分布,而不对延迟或参与假设任何随机模型。在间隔可分性与有界数据半径下,我们证明了给定服务器轮数内感知器错误的累积加权数的有限时域期望界:延迟的影响只通过平均强制陈旧度出现,而通信噪声贡献一个额外项,该项随时域长度的平方根乘以总噪声能量增长。在无噪声情形下,我们展示了有限的期望错误预算如何在温和的“新鲜参与”条件下导出显式的有限轮稳定化界。
摘要:We study a semi-asynchronous client-server perceptron trained via iterative parameter mixing (IPM-style averaging): clients run local perceptron updates and a server forms a global model by aggregating the updates that arrive in each communication round. The setting captures three system effects in federated and distributed deployments: (i) stale updates due to delayed model delivery and delayed application of client computations (two-sided version lag), (ii) partial participation (intermittent client availability), and (iii) imperfect communication on both downlink and uplink, modeled as effective zero-mean additive noise with bounded second moment. We introduce a server-side aggregation rule called staleness-bucket aggregation with padding that deterministically enforces a prescribed staleness profile over update ages without assuming any stochastic model for delays or participation. Under margin separability and bounded data radius, we prove a finite-horizon expected bound on the cumulative weighted number of perceptron mistakes over a given number of server rounds: the impact of delay appears only through the mean enforced staleness, whereas communication noise contributes an additional term that grows on the order of the square root of the horizon with the total noise energy. In the noiseless case, we show how a finite expected mistake budget yields an explicit finite-round stabilization bound under a mild fresh-participation condition.
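作为补充说明,下面用numpy给出IPM式平均的一个极简模拟:各客户端基于(可能过期的)全局模型做本地感知器更新,服务器逐轮聚合;这里的陈旧度、参与率与噪声尺度均为示意性假设,且未实现论文中带填充的分桶聚合规则:

import numpy as np

rng = np.random.default_rng(0)
d, n_clients, rounds, tau = 20, 8, 50, 2        # tau: 最大陈旧度(假设)
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n_clients, 100, d))
y = np.sign(X @ w_star + 1e-9)                  # 线性可分的合成数据

history = [np.zeros(d)]                          # 全局模型的历史版本
for t in range(rounds):
    updates = []
    for c in range(n_clients):
        if rng.random() < 0.3:                   # 部分参与:客户端可能缺席
            continue
        stale = history[max(0, len(history) - 1 - rng.integers(0, tau + 1))]
        w = stale.copy()
        for x, yi in zip(X[c], y[c]):            # 本地感知器遍历
            if yi * (w @ x) <= 0:
                w += yi * x
        noise = rng.normal(scale=0.01, size=d)   # 上行链路零均值噪声
        updates.append(w - stale + noise)
    if updates:
        history.append(history[-1] + np.mean(updates, axis=0))

w = history[-1]
print("错误率:", np.mean(np.sign(X.reshape(-1, d) @ w) != y.reshape(-1)))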
【4】PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
标题:PACEvolve:实现长期进度感知的一致演进
链接:https://arxiv.org/abs/2601.10657
作者:Minghao Yan,Bo Peng,Benjamin Coleman,Ziqi Chen,Zhouhang Xie,Zhankui He,Noveen Sachdeva,Isabella Ye,Weili Wang,Chi Wang,Ed H. Chi,Wang-Cheng Kang,Derek Zhiyuan Cheng,Beidou Wang
摘要:大型语言模型(LLM)已经成为进化搜索的强大算子,但高效搜索支架(scaffold)的设计仍然是临时拼凑的。尽管前景可观,目前的LLM在环系统缺乏管理进化过程的系统化方法。我们识别出三种不同的失败模式:上下文污染,即实验历史使未来候选的生成产生偏置;模式坍缩,即由于探索-利用失衡,智能体停滞在局部极小值;以及弱协作,即僵化的交叉策略无法有效利用并行搜索轨迹。我们引入进度感知一致进化(PACEvolve)框架,旨在稳健地管理智能体的上下文与搜索动态,以应对这些挑战。PACEvolve结合了带剪枝的分层上下文管理(HCM)以解决上下文污染;基于动量的回溯(MBB)以逃离局部极小值;以及将回溯与交叉统一起来用于动态搜索协调(CE)的自适应采样策略,使智能体能够在内部精化与跨轨迹协作之间取得平衡。我们证明,PACEvolve为一致的长程自我改进提供了一条系统化路径,在LLM-SR和KernelBench上取得最先进的结果,并发现了超越Modded NanoGPT纪录的解。
摘要:Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure modes: Context Pollution, where experiment history biases future candidate generation; Mode Collapse, where agents stagnate in local minima due to poor exploration-exploitation balance; and Weak Collaboration, where rigid crossover strategies fail to leverage parallel search trajectories effectively. We introduce Progress-Aware Consistent Evolution (PACEvolve), a framework designed to robustly govern the agent's context and search dynamics, to address these challenges. PACEvolve combines hierarchical context management (HCM) with pruning to address context pollution; momentum-based backtracking (MBB) to escape local minima; and a self-adaptive sampling policy that unifies backtracking and crossover for dynamic search coordination (CE), allowing agents to balance internal refinement with cross-trajectory collaboration. We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement, achieving state-of-the-art results on LLM-SR and KernelBench, while discovering solutions surpassing the record on Modded NanoGPT.
【5】Procedural Fairness in Multi-Agent Bandits
标题:多智能体多臂老虎机中的程序公平性
链接:https://arxiv.org/abs/2601.10600
作者:Joshua Caiata,Carter Blair,Kate Larson
摘要:在多智能体多臂老虎机(MA-MAB)的背景下,公平往往被简化为结果:最大化福利、减少不平等或平衡效用。然而,心理学、经济学和罗尔斯理论的证据表明,公平也关乎过程,以及谁在决策中拥有发言权。我们引入一个新的公平目标,即程序公平,它赋予所有智能体平等的决策权,处于核(core)之中,并保证结果的比例性。实证结果证实,基于结果优化的公平观念会牺牲平等的发言权与代表性,而在程序公平的策略下,基于结果的公平目标(如平等与功利主义)的损失是最小的。我们进一步证明,不同的公平观念优先考虑根本不同且互不相容的价值,凸显公平需要明确的规范性选择。本文认为,程序合法性作为公平目标值得更多关注,并提供了一个将程序公平付诸实践的框架。
摘要:In the context of multi-agent multi-armed bandits (MA-MAB), fairness is often reduced to outcomes: maximizing welfare, reducing inequality, or balancing utilities. However, evidence in psychology, economics, and Rawlsian theory suggests that fairness is also about process and who gets a say in the decisions being made. We introduce a new fairness objective, procedural fairness, which provides equal decision-making power for all agents, lies in the core, and provides for proportionality in outcomes. Empirical results confirm that fairness notions based on optimizing for outcomes sacrifice equal voice and representation, while the sacrifice in outcome-based fairness objectives (like equality and utilitarianism) is minimal under procedurally fair policies. We further prove that different fairness notions prioritize fundamentally different and incompatible values, highlighting that fairness requires explicit normative choices. This paper argues that procedural legitimacy deserves greater focus as a fairness objective, and provides a framework for putting procedural fairness into practice.
【6】Generative AI collective behavior needs an interactionist paradigm
标题:生成性人工智能集体行为需要互动主义范式
链接:https://arxiv.org/abs/2601.10567
作者:Laura Ferrarotti,Gian Maria Campedelli,Roberto Dessì,Andrea Baronchelli,Giovanni Iacca,Kathleen M. Carley,Alex Pentland,Joel Z. Leibo,James Evans,Bruno Lepri
摘要:在这篇文章中,我们认为理解基于大型语言模型(LLM)的智能体的集体行为是一个重要的研究领域,在风险和收益两方面影响着我们社会的许多层面。我们主张,LLM的独特性质,即以广泛的预训练知识和隐式社会先验进行初始化,并能通过上下文学习进行适应,催生了对互动主义范式的需要,其中包括替代性的理论基础、方法论和分析工具,以便系统地研究先验知识和内嵌价值如何与社会情境相互作用,从而塑造多智能体生成式AI系统中的涌现现象。我们提出并讨论了四个方向,认为它们对基于LLM的集体的开发与部署至关重要,重点涵盖理论、方法与跨学科对话。
摘要:In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and benefits, impacting us as a society at many levels. We claim that the distinctive nature of LLMs--namely, their initialization with extensive pre-trained knowledge and implicit social priors, together with their capability of adaptation through in-context learning--motivates the need for an interactionist paradigm consisting of alternative theoretical foundations, methodologies, and analytical tools, in order to systematically examine how prior knowledge and embedded values interact with social context to shape emergent phenomena in multi-agent generative AI systems. We propose and discuss four directions that we consider crucial for the development and deployment of LLM-based collectives, focusing on theory, methods, and trans-disciplinary dialogue.
【7】CoGen: Creation of Reusable UI Components in Figma via Textual Commands
标题:CoGen:通过文本命令在Figma中创建可重复使用的UI组件
链接:https://arxiv.org/abs/2601.10536
作者:Ishani Kanapathipillai,Obhasha Priyankara
备注:8 pages, 6 figures, 11 tables
摘要:用户界面设计的发展强调了对高效、可重用和可编辑组件的需求,以确保高效的设计过程。本研究介绍了CoGen,一个使用机器学习技术直接在最流行的UI设计工具之一Figma中生成可重用UI组件的系统。为了解决当前系统中的差距,CoGen专注于使用结构化JSON和自然语言提示创建原子组件,如按钮,标签和输入字段。 该项目集成了Figma API数据提取,Seq2Seq模型和微调的T5 Transformers,用于组件生成。关键结果证明了T5模型在提示生成方面的效率,准确率为98%,BLEU得分为0.2668,确保了JSON到描述性提示的映射。对于JSON创建,CoGen在为指定组件类型生成简单JSON输出方面的成功率高达100%。
摘要:The evolution of User Interface design has emphasized the need for efficient, reusable, and editable components to ensure an efficient design process. This research introduces CoGen, a system that uses machine learning techniques to generate reusable UI components directly in Figma, one of the most popular UI design tools. Addressing gaps in current systems, CoGen focuses on creating atomic components such as buttons, labels, and input fields using structured JSON and natural language prompts. The project integrates Figma API data extraction, Seq2Seq models, and fine-tuned T5 transformers for component generation. The key results demonstrate the efficiency of the T5 model in prompt generation, with an accuracy of 98% and a BLEU score of 0.2668, which ensures the mapping of JSON to descriptive prompts. For JSON creation, CoGen achieves a success rate of up to 100% in generating simple JSON outputs for specified component types.
【8】A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
标题:GPT-5.2、Gemini 3 Pro、Qwen3-VL、Doubao 1.8、Grok 4.1 Fast、Nano Banana Pro和Seedream 4.5的安全报告
链接:https://arxiv.org/abs/2601.10527
作者:Xingjun Ma,Yixu Wang,Hengyuan Xu,Yutao Wu,Yifan Ding,Yunhan Zhao,Zilong Wang,Jiabin Hua,Ming Wen,Jianan Liu,Ranjie Duan,Yifeng Gao,Yingshui Tan,Yunhao Chen,Hui Xue,Xin Wang,Wei Cheng,Jingjing Chen,Zuxuan Wu,Bo Li,Yu-Gang Jiang
备注:42 pages, 24 figures
摘要:大型语言模型(LLM)和多模态大型语言模型(MLLM)的快速发展已经在跨语言和视觉的推理、感知和生成能力方面产生了实质性进步。然而,这些进步是否带来相应的安全性改善仍不清楚,部分原因在于评估实践碎片化,局限于单一模态或单一威胁模型。在这份报告中,我们对7个前沿模型进行了综合安全评估:GPT-5.2、Gemini 3 Pro、Qwen3-VL、Doubao 1.8、Grok 4.1 Fast、Nano Banana Pro和Seedream 4.5。我们使用统一协议在语言、视觉语言和图像生成设置下评估每个模型,该协议整合了基准评估、对抗评估、多语言评估和合规评估。将评估结果汇总为安全排行榜和覆盖多种评估模式的模型安全画像后,呈现出一个高度异质的安全格局。虽然GPT-5.2在各项评估中表现出一贯强劲且均衡的安全性能,但其他模型在基准安全性、对抗对齐、多语言泛化和法规遵从之间表现出明显权衡。语言和视觉语言模态在对抗评估下均表现出显著脆弱性:尽管在标准基准上成绩良好,所有模型都大幅退化。文本到图像模型在受监管的视觉风险类别中实现了相对更强的对齐,但在对抗性或语义模糊的提示下仍然脆弱。总体而言,这些结果表明,前沿模型的安全性本质上是多维的,由模态、语言和评估方案共同塑造,凸显了标准化安全评估的必要性,以便准确评估现实世界风险并指导负责任的模型开发与部署。
摘要:The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has produced substantial gains in reasoning, perception, and generative capability across language and vision. However, whether these advances yield commensurate improvements in safety remains unclear, in part due to fragmented evaluation practices limited to single modalities or threat models. In this report, we present an integrated safety evaluation of 7 frontier models: GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5. We evaluate each model across language, vision-language, and image generation settings using a unified protocol that integrates benchmark evaluation, adversarial evaluation, multilingual evaluation, and compliance evaluation. Aggregating our evaluations into safety leaderboards and model safety profiles across multiple evaluation modes reveals a sharply heterogeneous safety landscape. While GPT-5.2 demonstrates consistently strong and balanced safety performance across evaluations, other models exhibit pronounced trade-offs among benchmark safety, adversarial alignment, multilingual generalization, and regulatory compliance. Both language and vision-language modalities show significant vulnerability under adversarial evaluation, with all models degrading substantially despite strong results on standard benchmarks. Text-to-image models achieve relatively stronger alignment in regulated visual risk categories, yet remain brittle under adversarial or semantically ambiguous prompts. Overall, these results show that safety in frontier models is inherently multidimensional--shaped by modality, language, and evaluation scheme, underscoring the need for standardized safety evaluations to accurately assess real-world risk and guide responsible model development and deployment.
【9】SatMap: Revisiting Satellite Maps as Prior for Online HD Map Construction
标题:SatMap:重新审视卫星地图,作为在线高清地图建设的优先事项
链接:https://arxiv.org/abs/2601.10512
作者:Kanak Mazumder,Fabian B. Flohr
备注:This work has been submitted to the IEEE ICPR for possible publication
摘要:在线高清(HD)地图构建是安全可靠的端到端自动驾驶(AD)管道的重要组成部分。机载基于相机的方法由于遮挡而遭受有限的深度感知和降低的准确性。在这项工作中,我们提出了SatMap,一种在线矢量化高清地图估计方法,它将卫星地图与多视图相机观测相结合,并直接预测矢量化高清地图,用于下游预测和规划模块。我们的方法利用从鸟瞰图(BEV)角度捕获的卫星图像的车道级语义和纹理作为全局先验,有效地减轻了深度模糊和遮挡。在我们对nuScenes数据集的实验中,SatMap在仅摄像头基线上实现了34.8%的mAP性能改进,在摄像头-LiDAR融合基线上实现了8.5%的mAP性能改进。此外,我们评估我们的模型在长距离和恶劣的天气条件下,证明使用卫星先验地图的优势。源代码将在https://iv.ee.hm.edu/satmap/上提供。
摘要:Online high-definition (HD) map construction is an essential part of a safe and robust end-to-end autonomous driving (AD) pipeline. Onboard camera-based approaches suffer from limited depth perception and degraded accuracy due to occlusion. In this work, we propose SatMap, an online vectorized HD map estimation method that integrates satellite maps with multi-view camera observations and directly predicts a vectorized HD map for downstream prediction and planning modules. Our method leverages lane-level semantics and texture from satellite imagery captured from a Bird's Eye View (BEV) perspective as a global prior, effectively mitigating depth ambiguity and occlusion. In our experiments on the nuScenes dataset, SatMap achieves 34.8% mAP performance improvement over the camera-only baseline and 8.5% mAP improvement over the camera-LiDAR fusion baseline. Moreover, we evaluate our model in long-range and adverse weather conditions to demonstrate the advantages of using a satellite prior map. Source code will be available at https://iv.ee.hm.edu/satmap/.
【10】Discrete Feynman-Kac Correctors
标题:离散费曼-卡茨修正器
链接:https://arxiv.org/abs/2601.10403
作者:Mohsin Hasan,Viktor Ohanesian,Artem Gazizov,Yoshua Bengio,Alán Aspuru-Guzik,Roberto Bondesan,Marta Skreta,Kirill Neklyudov
备注:Code: https://github.com/hasanmohsin/discrete_fkc
摘要:离散扩散模型最近成为自回归方法之外生成离散序列的一个有前景的替代方案。通过渐进去噪或去掩蔽过程生成样本,使其能够捕获数据中的层次化、非顺序的相互依赖关系。然而,这些定制过程本身并不提供对生成样本分布的灵活控制。我们提出离散Feynman-Kac校正器,一个允许在推理时控制离散掩蔽扩散模型生成分布的框架。我们推导了顺序蒙特卡罗(SMC)算法:给定已训练的离散扩散模型,可以控制采样分布的温度(即执行退火),从多个扩散过程(例如不同条件下的过程)边缘分布的乘积中采样,以及从边缘分布与外部奖励函数的乘积中采样,从而得到既符合目标分布又具有高奖励的样本。值得注意的是,我们的框架不需要训练任何额外模型,也不需要微调原始模型。我们在多个应用中展示了该框架的效用,包括:从Ising模型的退火玻尔兹曼分布中高效采样,提升语言模型在代码生成与摊销学习上的性能,以及奖励倾斜的蛋白质序列生成。
摘要:Discrete diffusion models have recently emerged as a promising alternative to the autoregressive approach for generating discrete sequences. Sample generation via gradual denoising or demasking processes allows them to capture hierarchical non-sequential interdependencies in the data. These custom processes, however, do not assume a flexible control over the distribution of generated samples. We propose Discrete Feynman-Kac Correctors, a framework that allows for controlling the generated distribution of discrete masked diffusion models at inference time. We derive Sequential Monte Carlo (SMC) algorithms that, given a trained discrete diffusion model, control the temperature of the sampled distribution (i.e. perform annealing), sample from the product of marginals of several diffusion processes (e.g. differently conditioned processes), and sample from the product of the marginal with an external reward function, producing likely samples from the target distribution that also have high reward. Notably, our framework does not require any training of additional models or fine-tuning of the original model. We illustrate the utility of our framework in several applications including: efficient sampling from the annealed Boltzmann distribution of the Ising model, improving the performance of language models for code generation and amortized learning, as well as reward-tilted protein sequence generation.
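作为补充说明,下面的numpy玩具示例演示推理时SMC重加权的基本形态:粒子按(此处随机占位的)模型提议逐位置去掩蔽,权重乘上 exp(reward/T) 型的倾斜项,并在有效样本量过低时重采样。提议分布与奖励函数均为假设,与论文推导的具体权重公式无关:

import numpy as np

rng = np.random.default_rng(1)
L, V, P, T = 8, 4, 64, 0.5                   # 序列长、词表、粒子数、温度(假设)
probs = rng.dirichlet(np.ones(V), size=L)    # 代替“已训练模型”的逐位置提议分布

def reward(seq):                             # 玩具奖励:偏好token 0
    return np.mean(seq == 0)

particles = -np.ones((P, L), dtype=int)      # -1 表示仍被掩蔽
logw = np.zeros(P)
for pos in range(L):                         # 逐位置去掩蔽
    for i in range(P):
        particles[i, pos] = rng.choice(V, p=probs[pos])
    # 用奖励项对权重做倾斜(Feynman-Kac式修正的玩具版)
    logw += np.array([reward(p[: pos + 1]) for p in particles]) / T
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w ** 2) < P / 2:         # 有效样本量过低则重采样
        idx = rng.choice(P, size=P, p=w)
        particles, logw = particles[idx], np.zeros(P)

best = particles[np.argmax([reward(p) for p in particles])]
print(best, reward(best))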
【11】SuS: Strategy-aware Surprise for Intrinsic Exploration
标题:SuS:内在探索的战略意识惊喜
链接:https://arxiv.org/abs/2601.10349
作者:Mark Kashirskiy,Ilya Makarov
备注:8 pages, 7 figures, 3 tables. Code available at https://github.com/mariklolik/sus
摘要:我们提出了策略感知惊喜(SuS),这是一种新的内在动机框架,它使用前后预测不匹配作为强化学习探索的新颖信号。与传统的仅依赖于状态预测误差的好奇心驱动方法不同,SuS引入了两个互补的组件:策略稳定性(SS)和策略惊喜(SuS)。SS测量跨时间步骤的行为策略的一致性,而SuS捕获相对于代理当前策略表示的意外结果。我们的组合奖励公式通过学习加权系数来利用这两个信号。我们使用大型语言模型评估SuS在数学推理任务中的表现,在准确性和解决方案多样性方面都有显着改善。消融研究证实,去除任何一种成分都会导致至少10%的性能下降,从而验证了我们方法的协同性质。与基线方法相比,SuS在Pass@1中实现了17.4%的改进,在Pass@5中实现了26.4%的改进,同时在整个训练过程中保持了更高的策略多样性。
摘要:We propose Strategy-aware Surprise (SuS), a novel intrinsic motivation framework that uses pre-post prediction mismatch as a novelty signal for exploration in reinforcement learning. Unlike traditional curiosity-driven methods that rely solely on state prediction error, SuS introduces two complementary components: Strategy Stability (SS) and Strategy Surprise (SuS). SS measures consistency in behavioral strategy across temporal steps, while SuS captures unexpected outcomes relative to the agent's current strategy representation. Our combined reward formulation leverages both signals through learned weighting coefficients. We evaluate SuS on mathematical reasoning tasks using large language models, demonstrating significant improvements in both accuracy and solution diversity. Ablation studies confirm that removing either component results in at least 10% performance degradation, validating the synergistic nature of our approach. SuS achieves 17.4% improvement in Pass@1 and 26.4% improvement in Pass@5 compared to baseline methods, while maintaining higher strategy diversity throughout training.
【12】Training-Trajectory-Aware Token Selection
标题:训练-轨迹-感知代币选择
链接:https://arxiv.org/abs/2601.10348
作者:Zhanming Shen,Jiaqi Hu,Zeyu Qin,Hao Chen,Wentao Ye,Zenan Huang,Yihong Zhuang,Guoshan Lu,Junlin Zhou,Junbo Zhao
摘要:高效蒸馏是将昂贵的推理能力转化为可部署效率的关键途径,然而在学生模型已具备强推理能力的前沿场景中,朴素的持续蒸馏往往收益有限甚至导致退化。我们观察到一种典型的训练现象:即使损失单调下降,所有性能指标仍可能在几乎相同的瓶颈处急剧下跌,随后才逐渐恢复。我们进一步揭示了一种令牌级机制:置信度分化为两类,一类是置信度稳定上升、快速锚定优化的模仿锚定令牌,另一类是尚未学会的令牌,其置信度在瓶颈之前一直被抑制,直到瓶颈之后才恢复。这两类令牌无法共存的特性正是持续蒸馏失败的根本原因。为此,我们提出训练轨迹感知的令牌选择(T3S),在令牌级别重构训练目标,为尚未学会的令牌扫清优化路径。T3S在AR和dLLM设置中均带来一致收益:仅用数百个样例,Qwen3-8B在竞争性推理基准上超过了DeepSeek-R1,Qwen3-32B接近Qwen3-235B,经T3S训练的LLaDA-2.0-Mini超越其AR基线,在所有16B规模的非思考(no-think)模型中达到最先进性能。
摘要:Efficient distillation is a key pathway for converting expensive reasoning capability into deployable efficiency, yet in the frontier regime where the student already has strong reasoning ability, naive continual distillation often yields limited gains or even degradation. We observe a characteristic training phenomenon: even as loss decreases monotonically, all performance metrics can drop sharply at almost the same bottleneck, before gradually recovering. We further uncover a token-level mechanism: confidence bifurcates into steadily increasing Imitation-Anchor Tokens that quickly anchor optimization and other yet-to-learn tokens whose confidence is suppressed until after the bottleneck. And the characteristic that these two types of tokens cannot coexist is the root cause of the failure in continual distillation. To this end, we propose Training-Trajectory-Aware Token Selection (T3S) to reconstruct the training objective at the token level, clearing the optimization path for yet-to-learn tokens. T3 yields consistent gains in both AR and dLLM settings: with only hundreds of examples, Qwen3-8B surpasses DeepSeek-R1 on competitive reasoning benchmarks, Qwen3-32B approaches Qwen3-235B, and T3-trained LLaDA-2.0-Mini exceeds its AR baseline, achieving state-of-the-art performance among all of 16B-scale no-think models.
【13】MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts
标题:MoST:以模态感知专家混合融合语音与文本
链接:https://arxiv.org/abs/2601.10272
作者:Yuxuan Lou,Kai Yang,Yang You
摘要:我们提出了MoST(语音和文本的混合),一种新的多模态大型语言模型,通过我们提出的模态感知专家混合(MAMoE)架构无缝集成语音和文本处理。虽然目前的多模态模型通常处理不同的模态表示相同的参数,忽略其固有的代表性差异,我们引入专门的路由路径,直接令牌的模态适当的专家基于输入类型。MAMoE通过两个互补的组成部分同时增强了特定模态的学习和跨模态的理解:捕获特定领域模式的特定模态专家组和促进模态之间信息传递的共享专家。在此架构的基础上,我们开发了一个高效的转换管道,通过对ASR和TTS数据集进行策略性的后训练来适应预训练的MoE语言模型,然后使用精心策划的语音文本指令数据集进行微调。该管道的一个关键特征是它完全依赖于完全可访问的开源数据集,以实现强大的性能和数据效率。跨ASR、TTS、音频语言建模和口语问答基准的综合评估表明,MoST始终优于具有可比参数计数的现有模型。我们的消融研究证实,特定于模态的路由机制和共享的专家设计显着有助于在所有测试域的性能增益。据我们所知,MoST代表了第一个完全开源的语音-文本LLM,它建立在专家混合架构上。\footnote{我们在https://github.com/NUS-HPC-AI-Lab/MoST上发布MoST模型、训练代码、推理代码和训练数据。}
摘要:We present MoST (Mixture of Speech and Text), a novel multimodal large language model that seamlessly integrates speech and text processing through our proposed Modality-Aware Mixture of Experts (MAMoE) architecture. While current multimodal models typically process diverse modality representations with identical parameters, disregarding their inherent representational differences, we introduce specialized routing pathways that direct tokens to modality-appropriate experts based on input type. MAMoE simultaneously enhances modality-specific learning and cross-modal understanding through two complementary components: modality-specific expert groups that capture domain-specific patterns and shared experts that facilitate information transfer between modalities. Building on this architecture, we develop an efficient transformation pipeline that adapts the pretrained MoE language model through strategic post-training on ASR and TTS datasets, followed by fine-tuning with a carefully curated speech-text instruction dataset. A key feature of this pipeline is that it relies exclusively on fully accessible, open-source datasets to achieve strong performance and data efficiency. Comprehensive evaluations across ASR, TTS, audio language modeling, and spoken question answering benchmarks show that MoST consistently outperforms existing models of comparable parameter counts. Our ablation studies confirm that the modality-specific routing mechanism and shared experts design significantly contribute to performance gains across all tested domains. To our knowledge, MoST represents the first fully open-source speech-text LLM built on a Mixture of Experts architecture. \footnote{We release MoST model, training code, inference code, and training data at https://github.com/NUS-HPC-AI-Lab/MoST}
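作为补充说明,下面是一个PyTorch最小示意:按输入模态把token路由到对应的模态专属专家组,同时所有token都经过共享专家;专家数量、软路由与相加融合方式均为假设,并非MoST的实际结构:

import torch, torch.nn as nn

class MAMoE(nn.Module):
    def __init__(self, d=256, experts_per_group=2):
        super().__init__()
        mk = lambda: nn.ModuleList(
            [nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
             for _ in range(experts_per_group)])
        self.groups = nn.ModuleDict({"text": mk(), "speech": mk()})  # 模态专属专家组
        self.shared = mk()[0]                     # 共享专家:促进跨模态信息传递
        self.router = nn.ModuleDict(
            {m: nn.Linear(d, experts_per_group) for m in ("text", "speech")})

    def forward(self, x, modality):               # x: [batch, seq, d]
        gate = self.router[modality](x).softmax(-1)   # 软路由(真实MoE常用top-k稀疏路由)
        out = sum(gate[..., i:i + 1] * e(x)
                  for i, e in enumerate(self.groups[modality]))
        return out + self.shared(x)

layer = MAMoE()
speech_tokens = torch.randn(2, 10, 256)
print(layer(speech_tokens, "speech").shape)       # torch.Size([2, 10, 256])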
【14】In-Context Source and Channel Coding
标题:上下文内源和频道编码
链接:https://arxiv.org/abs/2601.10267
作者:Ziqiong Wang,Tianqi Ren,Rongpeng Li,Zhifeng Zhao,Honggang Zhang
摘要:分离信源信道编码(SSCC)由于其模块性,以及与成熟的熵编码器和强大的信道编码的兼容性,在文本传输中仍然具有吸引力。然而,SSCC在低信噪比(SNR)情况下经常遭受明显的悬崖效应:信道解码之后的残留比特错误可能灾难性地破坏无损信源解码,对于由大语言模型(LLM)驱动的算术编码(AC)尤其如此。本文提出了一种接收端的上下文解码(ICD)框架,在不修改发送端的前提下增强SSCC的鲁棒性。ICD利用纠错码Transformer(ECCT)获得解码信息比特的逐比特可靠度。基于上下文一致的比特流,ICD通过可靠度引导的比特翻转构造按置信度排序的候选池,对候选的紧凑而多样的子集进行采样,并应用基于LLM的算术解码器获得重建结果与序列级对数似然。随后由可靠度-似然融合规则选择最终输出。我们进一步为所提采样过程的稳定性与收敛性提供了理论保证。在加性高斯白噪声(AWGN)和瑞利衰落信道上的大量实验表明,与传统SSCC基线和代表性的联合信源信道编码(JSCC)方案相比,该方案具有一致的增益。
摘要:Separate Source-Channel Coding (SSCC) remains attractive for text transmission due to its modularity and compatibility with mature entropy coders and powerful channel codes. However, SSCC often suffers from a pronounced cliff effect in low Signal-to-Noise Ratio (SNR) regimes, where residual bit errors after channel decoding can catastrophically break lossless source decoding, especially for Arithmetic Coding (AC) driven by Large Language Models (LLMs). This paper proposes a receiver-side In-Context Decoding (ICD) framework that enhances SSCC robustness without modifying the transmitter. ICD leverages an Error Correction Code Transformer (ECCT) to obtain bit-wise reliability for the decoded information bits. Based on the context-consistent bitstream, ICD constructs a confidence-ranked candidate pool via reliability-guided bit flipping, samples a compact yet diverse subset of candidates, and applies an LLM-based arithmetic decoder to obtain both reconstructions and sequence-level log-likelihoods. A reliability-likelihood fusion rule then selects the final output. We further provide theoretical guarantees on the stability and convergence of the proposed sampling procedure. Extensive experiments over Additive White Gaussian Noise (AWGN) and Rayleigh fading channels demonstrate consistent gains compared with conventional SSCC baselines and representative Joint Source-Channel Coding (JSCC) schemes.
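作为补充说明,下面的numpy示意展示“可靠度引导的比特翻转候选池 + 可靠度-似然融合打分”的大致形态:给定逐比特可靠度,优先翻转最不可靠的位置组合,再用一个占位的序列对数似然与可靠度融合。比特数、池大小与打分形式均为假设:

import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
bits = rng.integers(0, 2, size=16)            # 信道译码后的信息比特
reliab = rng.random(16)                       # ECCT式逐比特可靠度(越大越可信)

def candidates(bits, reliab, k=3, pool=16):
    weak = np.argsort(reliab)[:k]             # 最不可靠的k个位置
    cand = [bits.copy()]
    for r in range(1, k + 1):                 # 按翻转位数从少到多枚举
        for idx in combinations(weak, r):
            c = bits.copy(); c[list(idx)] ^= 1
            cand.append(c)
            if len(cand) >= pool:
                return cand
    return cand

def llm_loglik(c):                            # 占位:真实系统中由LLM算术译码器给出
    return -np.sum(c)                         # 玩具似然:偏好较少的1

def fused_score(c, lam=0.5):                  # 可靠度-似然融合规则(形式为假设)
    agree = np.where(c == bits, reliab, -reliab).sum()
    return lam * agree + (1 - lam) * llm_loglik(c)

pool = candidates(bits, reliab)
best = max(pool, key=fused_score)
print(best)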
【15】X-SAM: Boosting Sharpness-Aware Minimization with Dominant-Eigenvector Gradient Correction
标题:X-SAM:通过主特征向量梯度修正增强锐度感知最小化
链接:https://arxiv.org/abs/2601.10251
作者:Hongru Duan,Yongle Chen,Lei Guan
摘要:锐度感知最小化(SAM)旨在通过在模型参数的小邻域内最小化最坏情况扰动损失来提升泛化能力。然而在训练过程中,其优化行为并不总与理论预期一致,因为尖锐区域和平坦区域都可能产生较小的扰动损失;此时梯度可能仍指向尖锐区域,无法达到SAM的预期效果。为解决该问题,我们从谱与几何的角度考察SAM:具体而言,我们以梯度与Hessian主特征向量之间的夹角作为锐度的度量。我们的分析表明,当该夹角小于或等于90度时,SAM锐度正则化的效果会被削弱。进一步地,我们提出显式特征向量对齐的SAM(X-SAM),它通过沿最大特征向量的正交分解来校正梯度,使对Hessian最大特征值的正则化更直接、更高效。我们证明了X-SAM的收敛性与更优的泛化性,大量实验评估证实了其理论与实践上的优势。
摘要:Sharpness-Aware Minimization (SAM) aims to improve generalization by minimizing a worst-case perturbed loss over a small neighborhood of model parameters. However, during training, its optimization behavior does not always align with theoretical expectations, since both sharp and flat regions may yield a small perturbed loss. In such cases, the gradient may still point toward sharp regions, failing to achieve the intended effect of SAM. To address this issue, we investigate SAM from a spectral and geometric perspective: specifically, we utilize the angle between the gradient and the leading eigenvector of the Hessian as a measure of sharpness. Our analysis illustrates that when this angle is less than or equal to ninety degrees, the effect of SAM's sharpness regularization can be weakened. Furthermore, we propose an explicit eigenvector-aligned SAM (X-SAM), which corrects the gradient via orthogonal decomposition along the top eigenvector, enabling more direct and efficient regularization of the Hessian's maximum eigenvalue. We prove X-SAM's convergence and superior generalization, with extensive experimental evaluations confirming both theoretical and practical advantages.
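作为补充说明,下面的PyTorch示意给出“显式特征向量对齐”修正的基本骨架:先用Hessian-向量积的幂迭代估计主特征向量 v,再把梯度 g 沿 v 做正交分解并放大沿 v 的分量。迭代步数、放大系数与更新规则均为假设,并非论文的完整算法:

import torch

def flat_grad(loss, params, create_graph=False):
    gs = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in gs])

def top_eigvec(loss, params, iters=10):
    g = flat_grad(loss, params, create_graph=True)
    v = torch.randn_like(g); v /= v.norm()
    for _ in range(iters):                    # Hessian-向量积幂迭代
        hv = flat_grad((g * v.detach()).sum(), params, create_graph=True)
        v = hv / (hv.norm() + 1e-12)
    return v.detach()

# 玩具问题:病态曲率的二次损失
w = torch.nn.Parameter(torch.randn(5))
A = torch.diag(torch.tensor([10., 3., 1., .5, .1]))
v = top_eigvec(0.5 * w @ A @ w, [w])
g = flat_grad(0.5 * w @ A @ w, [w]).detach()
g_par = (g @ v) * v                           # 沿主特征向量的梯度分量
g_corr = (g - g_par) + 2.0 * g_par            # 假设的放大系数:更强地压低最大特征值方向
with torch.no_grad():
    w -= 0.05 * g_corr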
【16】Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD
标题:DP-SGD有利隐私-效用保证的基本局限性
链接:https://arxiv.org/abs/2601.10237
作者:Murat Bilgehan Ertan,Marten van Dijk
摘要:差分隐私随机梯度下降(DP-SGD)是隐私训练的主流范式,但其在最坏情况对抗性隐私定义下的基本局限性仍知之甚少。我们在$f$-差分隐私框架(通过假设检验权衡曲线刻画隐私)中分析DP-SGD,研究单个epoch内含$M$次梯度更新的洗牌采样。我们推导出可实现权衡曲线的一个显式次优上界。该结果给出了分离度$κ$的几何下界,其中$κ$是机制的权衡曲线与理想随机猜测直线之间的最大距离。由于大的分离度意味着显著的对抗优势,有意义的隐私要求$κ$很小。然而我们证明,强制较小的分离度会对高斯噪声乘数$σ$施加严格下界,从而直接限制可实现的效用。特别地,在标准最坏情况对抗模型下,洗牌DP-SGD必须满足 $σ\ge \frac{1}{\sqrt{2\ln M}}$ 或 $κ\ge \frac{1}{\sqrt{8}}\left(1-\frac{1}{\sqrt{4π\ln M}}\right)$,因此无法同时实现强隐私与高效用。虽然当$M \to \infty$时该界渐近消失,但收敛极其缓慢:即便对于实际相关的更新次数,所需的噪声幅度仍然可观。我们进一步表明,在相差常数因子的意义下,同样的限制可扩展到泊松子采样。我们的实验证实,该界所蕴含的噪声水平会在现实训练设置下导致显著的精度下降,从而揭示了标准最坏情况对抗假设下DP-SGD的关键瓶颈。
摘要:Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions remain poorly understood. We analyze DP-SGD in the $f$-differential privacy framework, which characterizes privacy via hypothesis-testing trade-off curves, and study shuffled sampling over a single epoch with $M$ gradient updates. We derive an explicit suboptimal upper bound on the achievable trade-off curve. This result induces a geometric lower bound on the separation $κ$ which is the maximum distance between the mechanism's trade-off curve and the ideal random-guessing line. Because a large separation implies significant adversarial advantage, meaningful privacy requires small $κ$. However, we prove that enforcing a small separation imposes a strict lower bound on the Gaussian noise multiplier $σ$, which directly limits the achievable utility. In particular, under the standard worst-case adversarial model, shuffled DP-SGD must satisfy $σ\ge \frac{1}{\sqrt{2\ln M}}$ $\quad\text{or}\quad$ $κ\ge\ \frac{1}{\sqrt{8}}\!\left(1-\frac{1}{\sqrt{4π\ln M}}\right)$, and thus cannot simultaneously achieve strong privacy and high utility. Although this bound vanishes asymptotically as $M \to \infty$, the convergence is extremely slow: even for practically relevant numbers of updates the required noise magnitude remains substantial. We further show that the same limitation extends to Poisson subsampling up to constant factors. Our experiments confirm that the noise levels implied by this bound leads to significant accuracy degradation at realistic training settings, thus showing a critical bottleneck in DP-SGD under standard worst-case adversarial assumptions.
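该下界的数值含义可以直接验证:$1/\sqrt{2\ln M}$ 随 $M$ 衰减得极其缓慢。下面的小脚本仅对摘要中给出的两个表达式求值:

import math

for M in (10**3, 10**6, 10**9, 10**12):
    sigma_min = 1.0 / math.sqrt(2 * math.log(M))      # 噪声乘数下界
    kappa_min = (1 - 1 / math.sqrt(4 * math.pi * math.log(M))) / math.sqrt(8)
    print(f"M={M:>13,}  sigma >= {sigma_min:.3f}  否则 kappa >= {kappa_min:.3f}")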
【17】Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
标题:对齐预训练:人工智能话语导致自我实现(Mis)对齐
链接:https://arxiv.org/abs/2601.10160
作者:Cameron Tice,Puria Radmard,Samuel Ratnam,Andy Kim,David Africa,Kyle O'Brien
摘要:预训练语料库包含大量关于人工智能系统的论述,但这种论述对下游对齐的因果影响仍知之甚少。如果对AI行为的普遍描述以负面为主,LLM可能内化相应的行为先验,从而导致自我实现的错位。本文通过用不同数量的(错误)对齐话语预训练6.9B参数的LLM,对这一假设给出了首个对照研究。我们发现,关于AI的讨论确实会助长错位:对描述AI错位的合成训练文档进行上采样,会显著增加错位行为;相反,对描述对齐行为的文档进行上采样,可将错位得分从45%降至9%。我们视之为自我实现对齐的证据。这些效应在后训练之后有所减弱,但依然存在。我们的发现确立了“预训练数据如何塑造对齐先验”(即对齐预训练)这一研究方向,作为后训练的补充。我们建议从业者在预训练阶段同时兼顾对齐与能力。我们的模型和数据集可在alignmentpretraining.ai获取。
摘要:Pretraining corpora contain extensive discourse about AI systems, yet the causal influence of this discourse on downstream alignment remains poorly understood. If prevailing descriptions of AI behaviour are predominantly negative, LLMs may internalise corresponding behavioural priors, giving rise to self-fulfilling misalignment. This paper provides the first controlled study of this hypothesis by pretraining 6.9B-parameter LLMs with varying amounts of (mis)alignment discourse. We find that discussion of AI contributes to misalignment. Upsampling synthetic training documents about AI misalignment leads to a notable increase in misaligned behaviour. Conversely, upsampling documents about aligned behaviour reduces misalignment scores from 45% to 9%. We consider this evidence of self-fulfilling alignment. These effects are dampened, but persist through post-training. Our findings establish the study of how pretraining data shapes alignment priors, or alignment pretraining, as a complement to post-training. We recommend practitioners pretrain for alignment as well as capabilities. Our models and datasets are available at alignmentpretraining.ai
【18】Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text
标题:多语言到多模式(M2M):用单语言文本解锁新语言
链接:https://arxiv.org/abs/2601.10096
作者:Piyush Singh Pasi
备注:EACL 2026 Findings accepted. Initial Draft of Camera-ready
摘要:多模态模型在英语上表现出色,得益于丰富的图像-文本与音频-文本数据,但由于多语言多模态资源有限,其他语言的性能急剧下降。现有方案严重依赖机器翻译,多语言文本建模的进展仍未被充分利用。我们提出METAL,一种轻量级对齐方法,仅使用英语文本学习若干线性层,将多语言文本嵌入映射到多模态空间。尽管方法简单,METAL在英语上与基线性能相当(Recall@10为94.9%),并在XTD文本到图像检索上实现强大的零样本迁移(11种语言上平均Recall@10为89.5%,其中10种语言训练中未见)。定性t-SNE可视化显示多语言嵌入与多模态表示紧密对齐,权重分析则表明该变换重塑了嵌入几何,而非执行平凡旋转。除图像-文本检索外,METAL还可推广到音频-文本检索与跨语言文本到图像生成。我们在https://github.com/m2m-codebase/M2M发布代码与检查点,并发布多语言评估数据集,包括MSCOCO Multilingual 30K(https://huggingface.co/datasets/piyushsinghpasi/mscoco-multilingual-30k)、AudioCaps Multilingual(https://huggingface.co/datasets/piyushsinghpasi/audiocaps-multilingual)和Clotho Multilingual(https://huggingface.co/datasets/piyushsinghpasi/clotho-multilingual),以促进后续研究。
摘要:Multimodal models excel in English, supported by abundant image-text and audio-text data, but performance drops sharply for other languages due to limited multilingual multimodal resources. Existing solutions rely heavily on machine translation, while advances in multilingual text modeling remain underutilized. We introduce METAL, a lightweight alignment method that learns only a few linear layers using English text alone to map multilingual text embeddings into a multimodal space. Despite its simplicity, METAL matches baseline performance in English (94.9 percent Recall at 10) and achieves strong zero-shot transfer (89.5 percent Recall at 10 averaged across 11 languages, 10 unseen) on XTD text-to-image retrieval. Qualitative t-SNE visualizations show that multilingual embeddings align tightly with multimodal representations, while weight analysis reveals that the transformation reshapes embedding geometry rather than performing trivial rotations. Beyond image-text retrieval, METAL generalizes to audio-text retrieval and cross-lingual text-to-image generation. We release code and checkpoints at https://github.com/m2m-codebase/M2M , as well as multilingual evaluation datasets including MSCOCO Multilingual 30K (https://huggingface.co/datasets/piyushsinghpasi/mscoco-multilingual-30k ), AudioCaps Multilingual (https://huggingface.co/datasets/piyushsinghpasi/audiocaps-multilingual ), and Clotho Multilingual (https://huggingface.co/datasets/piyushsinghpasi/clotho-multilingual ), to facilitate further research.
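作为补充说明,下面是一个PyTorch最小示意:只用英文文本,学习少量线性层把“多语言文本编码器”的嵌入映射进“多模态文本塔”的嵌入空间;两个编码器此处用随机冻结矩阵占位,层数与损失选择均为假设:

import torch, torch.nn as nn

torch.manual_seed(0)
d_ml, d_mm, n = 512, 256, 1024             # 嵌入维度与英文句子数(假设)
enc_multiling = torch.randn(d_ml, 300)     # 占位:冻结的多语言文本编码器
enc_multimodal = torch.randn(d_mm, 300)    # 占位:冻结的多模态(CLIP式)文本塔
bow = torch.randn(n, 300)                  # 占位:同一批英文句子的底层表示

z_src = bow @ enc_multiling.T              # 多语言嵌入(训练输入)
z_tgt = bow @ enc_multimodal.T             # 多模态嵌入(训练目标)

mapper = nn.Sequential(nn.Linear(d_ml, d_mm), nn.Linear(d_mm, d_mm))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
cos = nn.CosineEmbeddingLoss()

for step in range(500):
    loss = cos(mapper(z_src), z_tgt, torch.ones(n))   # 余弦对齐损失(选择为假设)
    opt.zero_grad(); loss.backward(); opt.step()

# 推理时:任意语言的句子经多语言编码器 + mapper,即可在多模态空间中检索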
【19】BPE: Behavioral Profiling Ensemble
标题:BPE:行为分析集合
链接:https://arxiv.org/abs/2601.10024
作者:Yanxin Liu,Yunqi Zhang
摘要:集成学习被广泛认为是推动预测性能边界的关键策略。传统的静态集成方法(如Stacking)通常将每个基学习器视为整体来分配权重,忽略了单个模型在实例空间不同区域中能力各异的事实。为解决这一局限,研究者引入了动态集成选择(DES)。然而,无论静态还是动态方法,都主要以不同模型之间的差异作为集成依据。这种模型间视角忽略了模型自身的内在特征,并且严重依赖验证集来估计模型能力。本文提出行为画像集成(BPE)框架,带来一种新的范式转变。与传统方法不同,BPE为每个模型构建其固有的“行为画像”,并依据模型对特定测试实例的响应与其既有行为画像之间的偏差来导出集成权重。在合成与真实数据集上的大量实验表明,由BPE框架导出的算法相对最先进的集成基线取得显著改进。这些收益不仅体现在预测精度上,也体现在各种场景下的计算效率与存储资源利用上。
摘要:Ensemble learning is widely recognized as a pivotal strategy for pushing the boundaries of predictive performance. Traditional static ensemble methods, such as Stacking, typically assign weights by treating each base learner as a holistic entity, thereby overlooking the fact that individual models exhibit varying degrees of competence across different regions of the instance space. To address this limitation, Dynamic Ensemble Selection (DES) was introduced. However, both static and dynamic approaches predominantly rely on the divergence among different models as the basis for integration. This inter-model perspective neglects the intrinsic characteristics of the models themselves and necessitates a heavy reliance on validation sets for competence estimation. In this paper, we propose the Behavioral Profiling Ensemble (BPE) framework, which introduces a novel paradigm shift. Unlike traditional methods, BPE constructs a ``behavioral profile'' intrinsic to each model and derives integration weights based on the deviation between the model's response to a specific test instance and its established behavioral profile. Extensive experiments on both synthetic and real-world datasets demonstrate that the algorithm derived from the BPE framework achieves significant improvements over state-of-the-art ensemble baselines. These gains are evident not only in predictive accuracy but also in computational efficiency and storage resource utilization across various scenarios.
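作为补充说明,下面的numpy示意展示“行为画像 + 偏差加权”的基本思路:把每个基学习器的历史预测置信度分布当作其画像,测试时按当前响应与画像的偏差折减该模型的权重。画像与偏差的具体定义均为假设,并非论文原式:

import numpy as np

rng = np.random.default_rng(3)
M, H = 3, 500                                      # 模型数、历史样本数
hist_conf = rng.beta([8, 4, 2], 2, size=(H, M))    # 各模型历史上的最大类概率

profile_mu = hist_conf.mean(0)                     # 行为画像:置信度均值与标准差
profile_sd = hist_conf.std(0) + 1e-9

def bpe_weights(conf_now):
    z = np.abs(conf_now - profile_mu) / profile_sd  # 与自身画像的偏差
    w = np.exp(-z)                                  # 偏离画像越远,越不信任该模型
    return w / w.sum()

# 测试实例:模型0表现反常(置信度远低于其画像)
conf_now = np.array([0.45, 0.70, 0.62])
preds = np.array([0, 1, 1])                         # 各模型的类别预测
w = bpe_weights(conf_now)
vote = np.bincount(preds, weights=w, minlength=2)
print(w, "->", vote.argmax())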
【20】A Sustainable AI Economy Needs Data Deals That Work for Generators
标题:可持续的人工智能经济需要对发电机有效的数据交易
链接:https://arxiv.org/abs/2601.09966
作者:Ruoxi Jia,Luis Oala,Wenjie Xiong,Suqin Ge,Jiachen T. Wang,Feiyang Kang,Dawn Song
备注:Published at NeurIPS 2025 (https://neurips.cc/virtual/2025/loc/san-diego/poster/121926)
摘要:我们认为,由于一种经济上的数据处理不等式,机器学习价值链在结构上不可持续:数据循环中从输入到模型权重再到合成输出的每个环节都在提纯技术信号,却剥夺了数据生成者的经济权益。通过分析73笔公开数据交易,我们发现绝大部分价值归聚合方所有,有记录的创作者版税四舍五入后为零,交易条款普遍不透明。这不仅是经济福利问题:随着数据及其衍生品成为经济资产,维系当前学习算法的反馈回路正面临风险。我们识别出三个结构性缺陷,即缺失的溯源、不对称的议价能力和非动态定价,作为这种不等式的运作机制。在分析中,我们沿机器学习价值链追踪这些问题,并提出公平数据价值交换(EDVEX)框架,以建立一个惠及所有参与者的最小市场。最后,我们概述了本社区可为数据交易做出具体贡献的研究方向,并将我们的立场与相关及正交的观点相联系。
摘要:We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults - missing provenance, asymmetric bargaining power, and non-dynamic pricing - as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.
【21】The PROPER Approach to Proactivity: Benchmarking and Advancing Knowledge Gap Navigation
标题:积极主动的正确方法:基准和推进知识差距导航
链接:https://arxiv.org/abs/2601.09926
作者:Kirandeep Kaur,Vinayak Gupta,Aditya Gupta,Chirag Shah
摘要:大多数基于语言的助手遵循反应式的问答范式,要求用户明确陈述需求,因此相关但未被表达的需求往往得不到满足。现有的主动式代理试图弥补这一缺口,要么通过进一步追问来澄清(负担仍留给用户),要么从上下文外推未来需求(常导致不必要或不合时宜的干预)。我们提出ProPer(主动性驱动的个性化代理),一种由维度生成代理(DGA)和响应生成代理(RGA)组成的新型双代理架构。DGA是经过微调的LLM代理,利用显式用户数据生成多个隐式维度(与用户任务相关但用户未曾考虑的潜在方面),即知识缺口。这些维度经由基于质量、多样性和任务相关性的重排序器进行选择性过滤。随后RGA在显式与隐式维度之间取得平衡,以及时、主动的干预来定制个性化回应。我们使用一套结构化的、缺口感知的评分准则在多个领域评估ProPer,衡量覆盖度、主动介入的恰当性与意图一致性。结果表明,ProPer在所有领域都提升了质量得分与胜率,单轮评估中收益高达84%,并在多轮交互中保持一致优势。
摘要:Most language-based assistants follow a reactive ask-and-respond paradigm, requiring users to explicitly state their needs. As a result, relevant but unexpressed needs often go unmet. Existing proactive agents attempt to address this gap either by eliciting further clarification, preserving this burden, or by extrapolating future needs from context, often leading to unnecessary or mistimed interventions. We introduce ProPer, Proactivity-driven Personalized agents, a novel two-agent architecture consisting of a Dimension Generating Agent (DGA) and a Response Generating Agent (RGA). DGA, a fine-tuned LLM agent, leverages explicit user data to generate multiple implicit dimensions (latent aspects relevant to the user's task but not considered by the user) or knowledge gaps. These dimensions are selectively filtered using a reranker based on quality, diversity, and task relevance. RGA then balances explicit and implicit dimensions to tailor personalized responses with timely and proactive interventions. We evaluate ProPer across multiple domains using a structured, gap-aware rubric that measures coverage, initiative appropriateness, and intent alignment. Our results show that ProPer improves quality scores and win rates across all domains, achieving up to 84% gains in single-turn evaluation and consistent dominance in multi-turn interactions.
【22】Epistemology gives a Future to Complementarity in Human-AI Interactions
标题:认识论为人机互动的互补性提供了未来
链接:https://arxiv.org/abs/2601.09871
作者:Andrea Ferrario,Alessandro Facchini,Juan M. Durán
备注:Submitted to FAccT 2026
摘要:人机互补性是指在人工智能系统支持下的人类,在决策过程中可以胜过人或AI单独行动。自从被引入人机交互文献以来,它通过推广依赖范式,并为有争议的“信任AI”构念提供更实用的替代,而获得了关注。然而,互补性面临关键的理论挑战:它缺乏精确的理论锚定;它仅被形式化为相对预测准确性的事后指标;它对人机交互的其他期望属性保持沉默;并且它抽象掉了其性能增益的幅度与成本特征。因此,在经验环境中很难获得互补性。在这项工作中,我们借助认识论来应对这些挑战,将互补性重新置于关于证成性AI(justificatory AI)的讨论之中。借助计算可靠主义,我们论证互补性的历史实例可以作为证据,表明给定的人机交互对于给定的预测任务是一个可靠的认知过程。与其他评估人机团队是否符合认知标准与社会技术实践的可靠性指标一起,互补性贡献于人机团队生成预测时的可靠性程度。这支持了受这些输出影响者(患者、管理者、监管者及其他人)的实践推理。总之,我们的方法表明,互补性的作用与价值不在于提供预测准确性的相对度量,而在于帮助将决策校准到日益塑造日常生活的AI支持过程的可靠性之上。
摘要:Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction literature, it has gained traction by generalizing the reliance paradigm and by offering a more practical alternative to the contested construct of 'trust in AI.' Yet complementarity faces key theoretical challenges: it lacks precise theoretical anchoring, it is formalized just as a post hoc indicator of relative predictive accuracy, it remains silent about other desiderata of human-AI interactions and it abstracts away from the magnitude-cost profile of its performance gain. As a result, complementarity is difficult to obtain in empirical settings. In this work, we leverage epistemology to address these challenges by reframing complementarity within the discourse on justificatory AI. Drawing on computational reliabilism, we argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task. Together with other reliability indicators assessing the alignment of the human-AI team with the epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions. This supports the practical reasoning of those affected by these outputs -- patients, managers, regulators, and others. In summary, our approach suggests that the role and value of complementarity lies not in providing a relative measure of predictive accuracy, but in helping calibrate decision-making to the reliability of AI-supported processes that increasingly shape everyday life.
【23】A pipeline for enabling path-specific causal fairness in observational health data
标题:在观察健康数据中实现路径特定因果公平性的管道
链接:https://arxiv.org/abs/2601.09841
作者:Aparajita Kashyap,Sara Matijevic,Noémie Elhadad,Steven A. Kushner,Shalmali Joshi
摘要:当训练机器学习(ML)模型以用于医疗环境中的潜在部署时,必须确保它们不会复制或加剧现有的医疗偏见。虽然存在许多公平的定义,但我们专注于特定路径的因果公平,这使我们能够更好地考虑偏见发生的社会和医学背景(例如,临床医生或模型的直接区分与由于对医疗保健系统的不同访问而导致的偏差),并表征这些偏差如何出现在学习的模型中。在这项工作中,我们将结构公平模型映射到观察性医疗保健环境,并创建一个可推广的管道来训练因果公平模型。管道明确考虑特定的医疗保健背景和差异,以定义目标“公平”模型。我们的工作填补了两个主要的空白:首先,我们扩大了“公平性-准确性”的权衡的特征,通过解开直接和间接的偏见来源,并共同提出这些公平性的考虑,以及在广泛已知的偏见的背景下的准确性的考虑。其次,我们展示了如何利用在没有对观察性健康数据进行公平约束的情况下训练的基础模型,在具有已知社会和医疗差异的任务中生成因果公平的下游预测。这项工作提出了一个模型不可知的管道,用于训练因果公平的机器学习模型,解决直接和间接形式的医疗偏见。
摘要:When training machine learning (ML) models for potential deployment in a healthcare setting, it is essential to ensure that they do not replicate or exacerbate existing healthcare biases. Although many definitions of fairness exist, we focus on path-specific causal fairness, which allows us to better consider the social and medical contexts in which biases occur (e.g., direct discrimination by a clinician or model versus bias due to differential access to the healthcare system) and to characterize how these biases may appear in learned models. In this work, we map the structural fairness model to the observational healthcare setting and create a generalizable pipeline for training causally fair models. The pipeline explicitly considers specific healthcare context and disparities to define a target "fair" model. Our work fills two major gaps: first, we expand on characterizations of the "fairness-accuracy" tradeoff by detangling direct and indirect sources of bias and jointly presenting these fairness considerations alongside considerations of accuracy in the context of broadly known biases. Second, we demonstrate how a foundation model trained without fairness constraints on observational health data can be leveraged to generate causally fair downstream predictions in tasks with known social and medical disparities. This work presents a model-agnostic pipeline for training causally fair machine learning models that address both direct and indirect forms of healthcare bias.
【24】Eluder dimension: localise it!
标题:Eluder维度:本地化!
链接:https://arxiv.org/abs/2601.09825
作者:Alireza Bakhtiari,Alex Ayoub,Samuel Robertson,David Janz,Csaba Szepesvári
摘要:我们建立了广义线性模型类的eluder维度下界,表明基于标准eluder维度的分析无法得到一阶遗憾界。为解决这一问题,我们提出了一种eluder维度的局部化方法;我们的分析立即恢复并改进了Bernoulli老虎机的经典结果,并首次为具有有界累积回报的有限时域强化学习任务给出了真正的一阶界。
摘要:We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.
【25】Adjusted Similarity Measures and a Violation of Expectations
标题:调整相似性度量与预期的违背
链接:https://arxiv.org/abs/2601.10641
作者:William L. Lippitt,Edward J. Bedrick,Nichole E. Carlson
备注:12 pages, 1 figure
摘要:调整相似性度量,例如用于评分者间一致性的Cohen's kappa和用于比较聚类算法的调整Rand指数,是比较离散标注的重要工具。这些度量被设计为在零分布下期望为0、在最大相似时取最大值1,以便于解释。出于历史与分析上的原因,度量通常相对于置换分布进行调整。目前,考虑更贴合具体情境的其他零模型重新引起了关注,例如允许识别出的簇数为随机的聚类集成。本工作的目的有二:(1)将调整算子的研究推广到一般零模型,并推广到一个更一般的程序,统计标准化是其特例;(2)给出使调整算子产生预期性质的充分条件,这些充分条件与观测数据是否以及如何被纳入零分布有关。我们演示了违反充分条件可能导致的严重失效,例如传统调整下得到的度量非正(而非均值为0),或统计标准化下得到恒等于0的度量。
摘要:Adjusted similarity measures, such as Cohen's kappa for inter-rater reliability and the adjusted Rand index used to compare clustering algorithms, are a vital tool for comparing discrete labellings. These measures are intended to have the property of 0 expectation under a null distribution and maximum value 1 under maximal similarity to aid in interpretation. Measures are frequently adjusted with respect to the permutation distribution for historic and analytic reasons. There is currently renewed interest in considering other null models more appropriate for context, such as clustering ensembles permitting a random number of identified clusters. The purpose of this work is two -- fold: (1) to generalize the study of the adjustment operator to general null models and to a more general procedure which includes statistical standardization as a special case and (2) to identify sufficient conditions for the adjustment operator to produce the intended properties, where sufficient conditions are related to whether and how observed data are incorporated into null distributions. We demonstrate how violations of the sufficient conditions may lead to substantial breakdown, such as by producing a non-positive measure under traditional adjustment rather than one with mean 0, or by producing a measure which is deterministically 0 under statistical standardization.
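作为补充说明,摘要中的“调整算子”在常用约定下可写成如下统一形式(记号为惯例假设,非论文原文):

\[
\operatorname{Adj}(S) = \frac{S - \mathbb{E}_{0}[S]}{S_{\max} - \mathbb{E}_{0}[S]},
\qquad
\operatorname{Std}(S) = \frac{S - \mathbb{E}_{0}[S]}{\sqrt{\operatorname{Var}_{0}(S)}},
\]

其中 $\mathbb{E}_{0}$、$\operatorname{Var}_{0}$ 在所选零模型(例如置换分布)下取值。左式在零模型下期望为0、在最大相似时取1(调整Rand指数即此形式);右式即统计标准化,对应摘要所述更一般程序的特例。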
【26】Parametric RDT approach to computational gap of symmetric binary perceptron
标题:对称二元感知器计算差距的参数RDT方法
链接:https://arxiv.org/abs/2601.10628
作者:Mihailo Stojnic
摘要:我们通过参数化地运用完全提升随机对偶理论(fl-RDT)[96],研究对称二元感知器(SBP)中可能存在的统计-计算间隙(SCG)。在第二提升水平上,我们观察到$c$-序列(fl-RDT的一个关键参数分量)从递减排序到任意排序的结构变化,并将其与可满足性($α_c$)与算法($α_a$)约束密度阈值的变化相关联,从而提示可能存在非零计算间隙$SCG=α_c-α_a$。第二水平的估计被证明与理论$α_c$相符,而$r\rightarrow \infty$水平的估计被提议对应于$α_a$。例如,对于标准SBP($κ=1$间隔),我们在第二水平得到$α_c\approx 1.8159$,在第七水平得到$α_a\approx 1.6021$(并呈现向$\sim 1.59$范围收敛的趋势)。我们的论断与近期文献高度一致:(i)在[20]中,局部熵副本方法预测$α_{LE}\approx 1.58$为聚类碎片化的起点(被认为是局部改进算法失效背后的驱动力);(ii)在$α\rightarrow 0$区域,我们在第三提升水平得到$κ\approx 1.2385\sqrt{\frac{α_a}{-\log\left(α_a\right)}}$,其在定性上与[43]基于重叠间隙性质(OGP)的预测一致,并与[24]基于局部熵的预测完全一致;(iii)$c$-序列排序变化的现象学与[98]中非对称二元感知器(ABP)及[100]中负Hopfield模型中观察到的现象相呼应;(iv)如[98,100]所述,我们在此设计了一种基于CLuP的算法,其实际性能与所提理论预测高度吻合。
摘要:We study potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of \emph{fully lifted random duality theory} (fl-RDT) [96]. A structural change from decreasingly to arbitrarily ordered $c$-sequence (a key fl-RDT parametric component) is observed on the second lifting level and associated with \emph{satisfiability} ($α_c$) -- \emph{algorithmic} ($α_a$) constraints density threshold change thereby suggesting a potential existence of a nonzero computational gap $SCG=α_c-α_a$. The second level estimate is shown to match the theoretical $α_c$ whereas the $r\rightarrow \infty$ level one is proposed to correspond to $α_a$. For example, for the canonical SBP ($κ=1$ margin) we obtain $α_c\approx 1.8159$ on the second and $α_a\approx 1.6021$ (with converging tendency towards $\sim 1.59$ range) on the seventh level. Our propositions remarkably well concur with recent literature: (i) in [20] local entropy replica approach predicts $α_{LE}\approx 1.58$ as the onset of clustering defragmentation (presumed driving force behind locally improving algorithms failures); (ii) in $α\rightarrow 0$ regime we obtain on the third lifting level $κ\approx 1.2385\sqrt{\frac{α_a}{-\log\left ( α_a \right ) }}$ which qualitatively matches overlap gap property (OGP) based predictions of [43] and identically matches local entropy based predictions of [24]; (iii) $c$-sequence ordering change phenomenology mirrors the one observed in asymmetric binary perceptron (ABP) in [98] and the negative Hopfield model in [100]; and (iv) as in [98,100], we here design a CLuP based algorithm whose practical performance closely matches proposed theoretical predictions.
【27】H-EFT-VA: An Effective-Field-Theory Variational Ansatz with Provable Barren Plateau Avoidance
标题:H-EFT-VA:一种可证明避免贫瘠高原的有效场理论变分拟设
链接:https://arxiv.org/abs/2601.10479
作者:Eyad I. B Hamid
备注:7 pages, 5 figures, Appendix
摘要:变分量子算法(VQA)受到贫瘠高原(BP)现象的严重威胁。在这项工作中,我们提出H-EFT变分拟设(H-EFT-VA),一种受有效场理论(EFT)启发的架构。通过在初始化时强制执行分层的“UV截断”,我们在理论上限制了电路的态空间探索,防止近似酉2-设计的形成。我们给出严格证明:这种局部化保证了梯度方差的逆多项式下界 $Var[\partial θ] \in Ω(1/poly(N))$。至关重要的是,与通过限制纠缠来规避BP的方法不同,我们证明H-EFT-VA保持体积律纠缠与近Haar纯度,确保对复杂量子态有足够的表达能力。跨16组实验(包括横场Ising与Heisenberg XXZ模型)的广泛基准测试证实,相比标准的硬件高效拟设(HEA),能量收敛提升109倍,基态保真度提高10.7倍,统计显著性为$p < 10^{-88}$。
摘要:Variational Quantum Algorithms (VQAs) are critically threatened by the Barren Plateau (BP) phenomenon. In this work, we introduce the H-EFT Variational Ansatz (H-EFT-VA), an architecture inspired by Effective Field Theory (EFT). By enforcing a hierarchical "UV-cutoff" on initialization, we theoretically restrict the circuit's state exploration, preventing the formation of approximate unitary 2-designs. We provide a rigorous proof that this localization guarantees an inverse-polynomial lower bound on the gradient variance: $Var[\partial θ] \in Ω(1/poly(N))$. Crucially, unlike approaches that avoid BPs by limiting entanglement, we demonstrate that H-EFT-VA maintains volume-law entanglement and near-Haar purity, ensuring sufficient expressibility for complex quantum states. Extensive benchmarking across 16 experiments -- including Transverse Field Ising and Heisenberg XXZ models -- confirms a 109x improvement in energy convergence and a 10.7x increase in ground-state fidelity over standard Hardware-Efficient Ansatze (HEA), with a statistical significance of $p < 10^{-88}$.
【28】Sim2Real Deep Transfer for Per-Device CFO Calibration
标题:Sim2Real深度迁移用于逐设备CFO校准
链接:https://arxiv.org/abs/2601.10264
作者:Jingze Zheng,Zhiguo Shi,Shibo He,Chaojie Gu
备注:Accepted by Globecom 2025
Abstract: Carrier Frequency Offset (CFO) estimation in Orthogonal Frequency Division Multiplexing (OFDM) systems faces significant performance degradation across heterogeneous software-defined radio (SDR) platforms due to uncalibrated hardware impairments. Existing deep neural network (DNN)-based approaches lack device-level adaptation, limiting their practical deployment. This paper proposes a Sim2Real transfer learning framework for per-device CFO calibration, combining simulation-driven pretraining with lightweight receiver adaptation. A backbone DNN is pre-trained on synthetic OFDM signals incorporating parametric hardware distortions (e.g., phase noise, IQ imbalance), enabling generalized feature learning without costly cross-device data collection. Subsequently, only the regression layers are fine-tuned using $1,000$ real frames per target device, preserving hardware-agnostic knowledge while adapting to device-specific impairments. Experiments across three SDR families (USRP B210, USRP N210, HackRF One) achieve a $30\times$ BER reduction compared to conventional CP-based methods under indoor multipath conditions. The framework bridges the simulation-to-reality gap for robust CFO estimation, enabling cost-effective deployment in heterogeneous wireless systems.
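A minimal PyTorch sketch of the adaptation step described above, assuming a hypothetical backbone/regressor split; the module names, layer sizes, and random stand-in data are illustrative assumptions, not the paper's implementation:

import torch
import torch.nn as nn

class CFOEstimator(nn.Module):
    def __init__(self, n_iq: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(           # hardware-agnostic features
            nn.Linear(2 * n_iq, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
        )
        self.regressor = nn.Linear(128, 1)       # device-specific CFO head

    def forward(self, x):
        return self.regressor(self.backbone(x))

model = CFOEstimator()
# model.load_state_dict(torch.load("sim_pretrained.pt"))  # hypothetical Sim2Real checkpoint

for p in model.backbone.parameters():            # freeze pretrained knowledge
    p.requires_grad = False

opt = torch.optim.Adam(model.regressor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# stand-ins for ~1,000 real frames and their ground-truth CFO per target device
real_frames, real_cfo = torch.randn(1000, 512), torch.randn(1000, 1)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(real_frames), real_cfo)
    loss.backward()
    opt.step()

Freezing the backbone and updating only the head is what keeps the per-device data requirement small: only the final linear layer's parameters are re-estimated from the 1,000 device-specific frames.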
【29】Accelerated Regularized Wasserstein Proximal Sampling Algorithms
Link: https://arxiv.org/abs/2601.09848
Authors: Hong Ye Tan, Stanley Osher, Wuchen Li
Abstract: We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to Nesterov acceleration. In contrast to traditional kernel density score estimation, we use the recently proposed regularized Wasserstein proximal method, yielding the Accelerated Regularized Wasserstein Proximal method (ARWP). We provide a detailed analysis of continuous- and discrete-time non-asymptotic and asymptotic mixing rates for Gaussian initial and target distributions, using techniques from Euclidean acceleration and accelerated information gradients. Compared with the kinetic Langevin sampling algorithm, the proposed algorithm exhibits a higher contraction rate in the asymptotic time regime. Numerical experiments are conducted across various low-dimensional settings, including multi-modal Gaussian mixtures and ill-conditioned Rosenbrock distributions. ARWP exhibits structured and convergent particles, accelerated discrete-time mixing, and faster tail exploration than the non-accelerated regularized Wasserstein proximal method and kinetic Langevin methods. Additionally, ARWP particles exhibit better generalization properties for some non-log-concave Bayesian neural network tasks.
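As a hedged NumPy sketch of second-order, score-driven particle dynamics in the spirit described above: it substitutes a plain Gaussian-kernel density estimate of the particle score for the paper's regularized Wasserstein proximal estimator, and uses a standard Gaussian target whose score is known in closed form; all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, d, dt, gamma, h = 400, 2, 0.02, 1.5, 0.3  # particles, dim, step, damping, bandwidth

def target_score(x):
    # grad log pi for the target pi = N(0, I)
    return -x

def kde_score(x, h):
    # Gaussian-KDE estimate of grad log rho_t, evaluated at each particle
    diff = x[None, :, :] - x[:, None, :]             # diff[i, j] = x_j - x_i
    sq = np.sum(diff ** 2, axis=-1) / (2 * h ** 2)   # (n, n) squared distances
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)                # row-normalized kernel weights
    return np.einsum('ij,ijd->id', w, diff) / h ** 2

x = rng.normal(4.0, 0.5, size=(n, d))  # particles start far from the target
v = np.zeros((n, d))                   # velocity variable (second-order dynamics)

for _ in range(1500):
    # damped second-order flow: x'' + gamma x' = grad log(pi / rho_t),
    # whose stationary point is rho_t = pi (particles match the target)
    v += dt * (target_score(x) - kde_score(x, h) - gamma * v)
    x += dt * v

print(x.mean(axis=0), x.var(axis=0))   # roughly zero mean and unit variance

The deterministic drift $\nabla\log(\pi/\rho_t)$ vanishes exactly when the particle density matches the target, which is why no Brownian noise is needed; the velocity variable plays the role of the Nesterov-like acceleration the abstract refers to.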