cs.LG 方向,今日共计182篇
大模型相关(18篇)
【1】What do vision-language models see in the context? Investigating multimodal in-context learning
标题:视觉语言模型在上下文中看到了什么?多模态上下文学习研究
链接:https://arxiv.org/abs/2510.24331
摘要:上下文学习(ICL)使大型语言模型(LLM)无需参数更新即可从演示示例中学习任务。虽然ICL在LLM中已得到广泛研究,但其在视觉语言模型(VLM)中的有效性仍有待探索。在这项工作中,我们对VLM中的ICL进行了系统研究,在三个图像描述基准上评估了横跨四种架构的七个模型。我们分析了提示设计、架构选择和训练策略如何影响多模态ICL。据我们所知,我们是首个分析VLM注意力模式如何随上下文演示数量增加而变化的工作。结果表明,在图像-文本交错数据上训练可提升ICL性能,但并不意味着模型能有效整合演示示例中的视觉与文本信息。相比之下,指令微调改善了指令遵循能力,却可能降低对上下文演示的依赖,表明指令对齐与上下文适应之间存在权衡。注意力分析进一步显示,当前VLM主要关注文本线索,未能充分利用视觉信息,表明其多模态整合能力有限。这些发现揭示了当前VLM的ICL能力的关键局限,并为增强其从多模态上下文示例中学习的能力提供了见解。
摘要:In-context learning (ICL) enables Large Language Models (LLMs) to learn tasks from demonstration examples without parameter updates. Although it has been extensively studied in LLMs, its effectiveness in Vision-Language Models (VLMs) remains underexplored. In this work, we present a systematic study of ICL in VLMs, evaluating seven models spanning four architectures on three image captioning benchmarks. We analyze how prompt design, architectural choices, and training strategies influence multimodal ICL. To our knowledge, we are the first to analyze how attention patterns in VLMs vary with an increasing number of in-context demonstrations. Our results reveal that training on image-text interleaved data enhances ICL performance but does not imply effective integration of visual and textual information from demonstration examples. In contrast, instruction tuning improves instruction-following but can reduce reliance on in-context demonstrations, suggesting a trade-off between instruction alignment and in-context adaptation. Attention analyses further show that current VLMs primarily focus on textual cues and fail to leverage visual information, suggesting a limited capacity for multimodal integration. These findings highlight key limitations in the ICL abilities of current VLMs and provide insights for enhancing their ability to learn from multimodal in-context examples.
【2】Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models
标题:通过大型视觉语言模型的卫星-地面协作实现近实时遥感
链接:https://arxiv.org/abs/2510.24242
备注:15 pages, 11 figures
摘要:大型视觉语言模型(LVLM)最近在由低地球轨道(LEO)卫星执行的遥感(RS)任务(例如灾害监测)中表现出巨大潜力。然而,受限于星上计算资源有限和星地接触时间短暂,它们在真实LEO卫星系统中的部署在很大程度上仍未被探索。我们提出Grace,一个为遥感任务中近实时LVLM推理设计的星地协作系统。相应地,我们在卫星上部署紧凑型LVLM以进行实时推理,而在地面站(GS)部署更大的LVLM以保证端到端性能。Grace由两个主要部分组成:异步的星地检索增强生成(RAG)和任务分派算法。首先,我们在有限的星地数据交换窗口内,用定制的自适应更新算法将GS侧RAG的知识库蒸馏到星上知识库。其次,我们提出一种基于置信度的测试算法,决定任务在星上处理还是卸载到GS。基于真实卫星轨道数据的大量实验表明,与最先进的方法相比,Grace将平均延迟降低了76-95%,且不影响推理准确性。
摘要:Large vision-language models (LVLMs) have recently demonstrated great potential in remote sensing (RS) tasks (e.g., disaster monitoring) conducted by low Earth orbit (LEO) satellites. However, their deployment in real-world LEO satellite systems remains largely unexplored, hindered by limited onboard computing resources and brief satellite-ground contacts. We propose Grace, a satellite-ground collaborative system designed for near-realtime LVLM inference in RS tasks. Accordingly, we deploy compact LVLMs on satellites for realtime inference, but larger ones on ground stations (GSs) to guarantee end-to-end performance. Grace comprises two main components: asynchronous satellite-GS Retrieval-Augmented Generation (RAG) and a task dispatch algorithm. Firstly, we distill the knowledge archive of the GS RAG into the satellite archive with a tailored adaptive update algorithm during the limited satellite-ground data exchange period. Secondly, we propose a confidence-based test algorithm that either processes the task onboard the satellite or offloads it to the GS. Extensive experiments based on real-world satellite orbital data show that Grace reduces the average latency by 76-95% compared to state-of-the-art methods, without compromising inference accuracy.
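下面用一个纯Python草图示意摘要中"基于置信度的测试算法"的基本思路:星上小模型先推理,若生成序列的平均token置信度低于阈值,则把任务卸载到地面站的大模型。其中置信度的定义与阈值均为假设,small_model、ground_station为假想接口,并非论文的原始实现。

```python
import math

def dispatch(task, small_model, ground_station, tau=0.85):
    """基于置信度的任务分派草图(假设性实现,非论文原版)。

    small_model(task) 假定返回 (answer, token_logprobs);
    ground_station(task) 假定为对地面站大模型的远程调用。
    """
    answer, token_logprobs = small_model(task)
    # 以生成token的平均概率作为置信度(一种常见的启发式选择)
    confidence = math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))
    if confidence >= tau:
        return answer                 # 星上直接返回, 节省星地链路
    return ground_station(task)       # 置信度不足, 卸载到地面站大模型
```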
【3】Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
标题:超越神经不兼容性:通过潜在语义对齐来缓解大型语言模型中的跨规模知识转移
链接:https://arxiv.org/abs/2510.24208
备注:an early-stage version
摘要:大型语言模型(LLM)在其海量参数中编码了大量知识,这些知识可以被定位、追踪和分析。尽管神经可解释性已有进展,如何以细粒度方式传递知识(即参数化知识迁移,PKT)仍不清楚。一个关键问题是在不同规模的LLM之间实现有效且高效的知识迁移,这对于提升LLM间知识迁移的灵活性和适用范围至关重要。由于神经不兼容性(指不同规模LLM之间的架构与参数差异),直接重用层参数的现有方法受到严重限制。在本文中,我们将潜在空间中的语义对齐确定为LLM跨规模知识迁移的基本前提。我们的方法不直接使用层参数,而是以激活作为逐层知识迁移的媒介。借助潜在空间中的语义,我们的方法简单且优于已有工作,能更好地对齐不同规模模型的行为。在四个基准上的评估证明了方法的有效性。进一步的分析揭示了促进跨规模知识迁移的关键因素,并为潜在语义对齐的本质提供了见解。
摘要:Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, which is accessible to locate, trace, and analyze. Despite advances in neural interpretability, it is still not clear how to transfer knowledge in a fine-grained manner, namely parametric knowledge transfer (PKT). A key problem is enabling effective and efficient knowledge transfer across LLMs of different scales, which is essential for achieving greater flexibility and broader applicability in transferring knowledge between LLMs. Due to neural incompatibility, referring to the architectural and parametric differences between LLMs of varying scales, existing methods that directly reuse layer parameters are severely limited. In this paper, we identify the semantic alignment in latent space as the fundamental prerequisite for LLM cross-scale knowledge transfer. Instead of directly using the layer parameters, our approach takes activations as the medium of layer-wise knowledge transfer. Leveraging the semantics in latent space, our approach is simple and outperforms prior work, better aligning model behaviors across varying scales. Evaluations on four benchmarks demonstrate the efficacy of our method. Further analysis reveals the key factors easing cross-scale knowledge transfer and provides insights into the nature of latent semantic alignment.
【4】HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology
标题:HistoLens:一个交互式XAI工具包,用于验证和缓解组织病理学视觉语言模型中的缺陷
链接:https://arxiv.org/abs/2510.24115
摘要:对于医生来说,要真正信任人工智能,它就不能是一个黑匣子。他们需要理解它的推理,就像在咨询同事一样。我们创建HistoLens,就是为了让它成为这样一个透明、协作的伙伴。它允许病理学家用简单的英语就组织切片提出问题,就像询问一名实习生一样。我们的系统智能地将问题转化为对AI引擎的精确查询,然后给出清晰的结构化报告。但它并未止步于此。如果医生追问"为什么?",HistoLens可以立即为任何发现提供"视觉证明":一张指向AI分析所依据的具体细胞和区域的热图。我们还通过教模型忽略分散注意力的背景噪声,确保AI像训练有素的病理学家一样只关注患者的组织。其结果是,病理学家仍然是负责的专家,借助值得信赖的AI助手来验证自己的见解,做出更快、更自信的诊断。
摘要:For doctors to truly trust artificial intelligence, it can't be a black box. They need to understand its reasoning, almost as if they were consulting a colleague. We created HistoLens to be that transparent, collaborative partner. It allows a pathologist to simply ask a question in plain English about a tissue slide--just as they would ask a trainee. Our system intelligently translates this question into a precise query for its AI engine, which then provides a clear, structured report. But it doesn't stop there. If a doctor ever asks, "Why?", HistoLens can instantly provide a 'visual proof' for any finding--a heatmap that points to the exact cells and regions the AI used for its analysis. We've also ensured the AI focuses only on the patient's tissue, just like a trained pathologist would, by teaching it to ignore distracting background noise. The result is a workflow where the pathologist remains the expert in charge, using a trustworthy AI assistant to verify their insights and make faster, more confident diagnoses.
【5】Discovering Heuristics with Large Language Models (LLMs) for Mixed-Integer Programs: Single-Machine Scheduling
标题:使用大型语言模型(LLM)为混合整数规划发现启发式算法:单机调度
链接:https://arxiv.org/abs/2510.24013
摘要:我们的研究通过借助大型语言模型(LLM)发现新的启发式算法,为调度与组合优化文献做出贡献。我们聚焦单机总拖期(SMTT)问题:在给定加工时间和交货期的条件下,在一台不可抢占的处理器上对n个工件排序,以最小化总拖期。我们开发并基准测试了两种由LLM发现的新启发式算法:EDD挑战者(EDDC)和MDD挑战者(MDDC),其灵感来自著名的最早交货期(EDD)和修正交货期(MDD)规则。与采用较简单基于规则启发式的先前研究不同,我们使用严格的标准来评估LLM发现的算法,包括基于SMTT混合整数规划(MIP)模型得到的最优性间隙和求解时间。我们在不同规模(20、100、200和500个工件)上将其性能与最先进的启发式和精确方法进行比较。对于超过100个工件的实例,MIP和动态规划等精确方法在计算上变得难以处理。在至多500个工件的规模下,EDDC改进了经典EDD规则以及文献中另一种广泛使用的算法。MDDC始终优于传统启发式方法,并且与精确方法相比仍具竞争力,尤其是在更大、更复杂的实例上。这项研究表明,在资源有限但配置得当的情况下,人类与LLM的协作也能为NP难约束组合优化问题产生可扩展、高性能的启发式算法。
摘要:Our study contributes to the scheduling and combinatorial optimization literature with new heuristics discovered by leveraging the power of Large Language Models (LLMs). We focus on the single-machine total tardiness (SMTT) problem, which aims to minimize total tardiness by sequencing n jobs on a single processor without preemption, given processing times and due dates. We develop and benchmark two novel LLM-discovered heuristics, the EDD Challenger (EDDC) and MDD Challenger (MDDC), inspired by the well-known Earliest Due Date (EDD) and Modified Due Date (MDD) rules. In contrast to prior studies that employed simpler rule-based heuristics, we evaluate our LLM-discovered algorithms using rigorous criteria, including optimality gaps and solution time derived from a mixed-integer programming (MIP) formulation of SMTT. We compare their performance against state-of-the-art heuristics and exact methods across various job sizes (20, 100, 200, and 500 jobs). For instances with more than 100 jobs, exact methods such as MIP and dynamic programming become computationally intractable. Up to 500 jobs, EDDC improves upon the classic EDD rule and another widely used algorithm in the literature. MDDC consistently outperforms traditional heuristics and remains competitive with exact approaches, particularly on larger and more complex instances. This study shows that human-LLM collaboration can produce scalable, high-performing heuristics for NP-hard constrained combinatorial optimization, even under limited resources when effectively configured.
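作为背景,EDDC与MDDC所依据的EDD与MDD是两条经典调度规则:EDD按交货期非降序排列工件;MDD在每一步选择使修正交货期 max(t + p_j, d_j) 最小的工件。下面的Python草图实现这两条规则并计算总拖期(EDDC/MDDC本身的细节未在摘要中给出,此处不做推测):

```python
def total_tardiness(seq, p, d):
    """给定加工顺序, 计算单机总拖期。p: 加工时间, d: 交货期。"""
    t, tard = 0, 0
    for j in seq:
        t += p[j]
        tard += max(0, t - d[j])
    return tard

def edd(p, d):
    """最早交货期(EDD)规则: 按交货期升序排序。"""
    return sorted(range(len(p)), key=lambda j: d[j])

def mdd(p, d):
    """修正交货期(MDD)规则: 每步选 max(t+p_j, d_j) 最小的未排工件。"""
    remaining, seq, t = set(range(len(p))), [], 0
    while remaining:
        j = min(remaining, key=lambda k: max(t + p[k], d[k]))
        seq.append(j)
        remaining.remove(j)
        t += p[j]
    return seq

p = [4, 2, 6, 3]     # 加工时间(示例数据)
d = [5, 3, 10, 7]    # 交货期
print(total_tardiness(edd(p, d), p, d), total_tardiness(mdd(p, d), p, d))
```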
【6】The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity
标题:符号估计器:面对选择异质性的LLM对齐
链接:https://arxiv.org/abs/2510.23965
摘要:传统的LLM对齐方法容易受到人类偏好异质性的影响。将朴素的概率模型拟合到成对比较数据(例如提示-补全对)上,会得到对总体平均效用(社会福利的一个经典度量)的不一致估计。我们提出一种被称为符号估计器的新方法:在聚合步骤中用二元分类损失代替交叉熵,从而提供一个简单、可证明一致且高效的估计器。这一简单修改在温和假设下恢复了一致的序数对齐,并在此设定下首次给出多项式有限样本误差界。在使用数字孪生的LLM对齐的真实感模拟中,与标准RLHF相比,符号估计器大幅减少了模拟人格面板上的偏好失真,将(角度)估计误差降低近35%,并将与真实总体偏好的不一致率从12%降至8%。与显式建模用户异质性、需要跟踪个体级偏好数据的面板数据启发式方法相比,我们的方法同样表现出色,同时保持了现有LLM对齐流程的实现简单性。
摘要:Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naive probabilistic model to pairwise comparison data (say over prompt-completion pairs) yields an inconsistent estimate of the population-average utility - a canonical measure of social welfare. We propose a new method, dubbed the sign estimator, that provides a simple, provably consistent, and efficient estimator by replacing cross-entropy with binary classification loss in the aggregation step. This simple modification recovers consistent ordinal alignment under mild assumptions and achieves the first polynomial finite-sample error bounds in this setting. In realistic simulations of LLM alignment using digital twins, the sign estimator substantially reduces preference distortion over a panel of simulated personas, cutting (angular) estimation error by nearly 35% and decreasing disagreement with true population preferences from 12% to 8% compared to standard RLHF. Our method also compares favorably to panel data heuristics that explicitly model user heterogeneity and require tracking individual-level preference data - all while maintaining the implementation simplicity of existing LLM alignment pipelines.
【7】ChessQA: Evaluating Large Language Models for Chess Understanding
标题:ChessQA:评估大型语言模型的国际象棋理解能力
链接:https://arxiv.org/abs/2510.23948
备注:33 pages, 8 figures
摘要:国际象棋具有良定义的结构和客观的基准真值,同时涵盖广泛的技能水平,为评估大型语言模型(LLM)的推理、建模和抽象能力提供了理想的测试平台。然而,现有对LLM国际象棋能力的评估是临时且范围狭窄的,难以准确衡量LLM对国际象棋的理解,以及这种理解如何随规模、后训练方法或架构选择而变化。我们提出ChessQA,一个全面的基准,从五个任务类别(结构、图案、短战术、局面判断和语义)评估LLM的国际象棋理解。这五类大致对应于棋手积累国际象棋知识时逐级掌握的抽象:从理解基本规则、学习战术图案,到正确计算战术、评估局面以及用语义描述高层概念。这样,ChessQA得以更全面地刻画国际象棋能力和理解水平,远远超出以往简单的走子质量评估,并为诊断和比较提供了可控、一致的设定。此外,ChessQA本质上是动态的,其提示、答案和构造脚本可以随模型改进而演化。通过评估一系列当代大型语言模型,我们发现所有五个类别都存在持续的弱点,并按类别给出结果和错误分析。我们将发布代码、定期更新的数据集以及公共排行榜,以支持进一步研究。
摘要:Chess provides an ideal testbed for evaluating the reasoning, modeling, and abstraction capabilities of large language models (LLMs), as it has well-defined structure and objective ground truth while admitting a wide spectrum of skill levels. However, existing evaluations of LLM ability in chess are ad hoc and narrow in scope, making it difficult to accurately measure LLM chess understanding and how it varies with scale, post-training methodologies, or architecture choices. We present ChessQA, a comprehensive benchmark that assesses LLM chess understanding across five task categories (Structural, Motifs, Short Tactics, Position Judgment, and Semantic), which approximately correspond to the ascending abstractions that players master as they accumulate chess knowledge, from understanding basic rules and learning tactical motifs to correctly calculating tactics, evaluating positions, and semantically describing high-level concepts. In this way, ChessQA captures a more comprehensive picture of chess ability and understanding, going significantly beyond the simple move quality evaluations done previously, and offers a controlled, consistent setting for diagnosis and comparison. Furthermore, ChessQA is inherently dynamic, with prompts, answer keys, and construction scripts that can evolve as models improve. Evaluating a range of contemporary LLMs, we find persistent weaknesses across all five categories and provide results and error analyses by category. We will release the code, periodically refreshed datasets, and a public leaderboard to support further research.
【8】Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation
标题:打破基准:通过最小限度的上下文增强揭示LLM偏见
链接:https://arxiv.org/abs/2510.23921
备注:9 pages, 3 figures, 3 tables
摘要:由于训练数据中存在歧视性内容,大型语言模型已被证明在其表示和行为中表现出刻板印象偏见。尽管在开发避免在决策中使用刻板信息的方法和模型方面已取得重大进展,但最近的工作表明,用于偏见对齐的方法是脆弱的。在这项工作中,我们提出一个新颖且通用的增强框架,包含三个即插即用的步骤,并适用于多个公平性评估基准。通过将该增强应用于公平性评估数据集BBQ(Bias Benchmark for Question Answering),我们发现包括最先进的开放和封闭权重模型在内的大型语言模型(LLM)容易受到输入扰动的影响,表现出更高的刻板化行为倾向。此外,我们发现当目标人群属于文献研究较少的社区时,这些模型更可能出现有偏行为,这凸显了将公平性与安全性研究扩展到更多样化社区的必要性。
摘要:Large Language Models have been shown to demonstrate stereotypical biases in their representations and behavior due to the discriminative nature of the data that they have been trained on. Despite significant progress in the development of methods and models that refrain from using stereotypical information in their decision-making, recent work has shown that approaches used for bias alignment are brittle. In this work, we introduce a novel and general augmentation framework that involves three plug-and-play steps and is applicable to a number of fairness evaluation benchmarks. Through application of augmentation to a fairness evaluation dataset (Bias Benchmark for Question Answering (BBQ)), we find that Large Language Models (LLMs), including state-of-the-art open and closed weight models, are susceptible to perturbations to their inputs, showcasing a higher likelihood to behave stereotypically. Furthermore, we find that such models are more likely to have biased behavior in cases where the target demographic belongs to a community less studied by the literature, underlining the need to expand the fairness and safety research to include more diverse communities.
【9】PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs
标题:PRO:为开源LLM启用精确且稳健的文本水印
链接:https://arxiv.org/abs/2510.23891
摘要:面向大型语言模型(LLM)的文本水印使模型所有者能够验证文本来源并保护知识产权。虽然用于闭源LLM的水印方法已相对成熟,但将其扩展到开源模型仍然具有挑战性,因为开发者无法控制解码过程。因此,开源LLM的所有者缺乏验证文本是否由其模型生成的实际手段。一个核心难点在于在不损害可检测性的前提下将水印直接嵌入模型权重。一个有前景的想法是将水印从闭源模型蒸馏到开源模型中,但这存在两个问题:(i)由于学习到的模式与预定义模式不匹配而导致可检测性差;(ii)对微调或模型合并等下游修改很脆弱。为克服这些限制,我们提出PRO,一种面向开源LLM的精确且鲁棒的文本水印方法。PRO将水印策略模型与LLM联合训练,产生更易于模型学习且与检测标准更一致的模式。一个正则化项进一步模拟下游扰动并惩罚水印可检测性的退化,确保模型编辑下的鲁棒性。在开源LLM(如LLaMA-3.2、LLaMA-3、Phi-2)上的实验表明,PRO显著提升了水印可检测性以及对模型修改的韧性。
摘要:Text watermarking for large language models (LLMs) enables model owners to verify text origin and protect intellectual property. While watermarking methods for closed-source LLMs are relatively mature, extending them to open-source models remains challenging, as developers cannot control the decoding process. Consequently, owners of open-source LLMs lack practical means to verify whether text was generated by their models. A core difficulty lies in embedding watermarks directly into model weights without hurting detectability. A promising idea is to distill watermarks from a closed-source model into an open one, but this suffers from (i) poor detectability due to mismatch between learned and predefined patterns, and (ii) fragility to downstream modifications such as fine-tuning or model merging. To overcome these limitations, we propose PRO, a Precise and Robust text watermarking method for open-source LLMs. PRO jointly trains a watermark policy model with the LLM, producing patterns that are easier for the model to learn and more consistent with detection criteria. A regularization term further simulates downstream perturbations and penalizes degradation in watermark detectability, ensuring robustness under model edits. Experiments on open-source LLMs (e.g., LLaMA-3.2, LLaMA-3, Phi-2) show that PRO substantially improves both watermark detectability and resilience to model modifications.
【10】Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
标题:测试时调优的语言模型实现从MS/MS谱图端到端从头生成分子结构
链接:https://arxiv.org/abs/2510.23746
摘要:串联质谱能够在代谢组学、天然产物发现和环境分析等关键领域识别未知化合物。然而,目前的方法要么依赖与已观测分子数据库的匹配,要么依赖需要中间碎片或指纹预测的多步流水线。这使得找出正确的分子极具挑战性,尤其是对参考数据库中不存在的化合物。我们提出一个框架,通过测试时调优增强预训练Transformer模型的学习,从而直接从串联质谱和分子式端到端地从头生成分子结构,绕过人工注释和中间步骤。在NPLIB1和MassSpecGym这两个流行基准上,我们分别以100%和20%的优势超越事实上的最先进方法DiffMS。在实验谱图上进行测试时调优使模型能够动态适应新谱图,在MassSpecGym上相对于传统微调的相对性能增益为62%。当预测偏离真实值时,生成的候选分子在结构上仍然准确,为人工解读和更可靠的鉴定提供了有价值的指引。
摘要:Tandem Mass Spectrometry enables the identification of unknown compounds in crucial fields such as metabolomics, natural product discovery and environmental analysis. However, current methods rely on database matching from previously observed molecules, or on multi-step pipelines that require intermediate fragment or fingerprint prediction. This makes finding the correct molecule highly challenging, particularly for compounds absent from reference databases. We introduce a framework that, by leveraging test-time tuning, enhances the learning of a pre-trained transformer model to address this gap, enabling end-to-end de novo molecular structure generation directly from the tandem mass spectra and molecular formulae, bypassing manual annotations and intermediate steps. We surpass the de-facto state-of-the-art approach DiffMS on two popular benchmarks NPLIB1 and MassSpecGym by 100% and 20%, respectively. Test-time tuning on experimental spectra allows the model to dynamically adapt to novel spectra, and the relative performance gain over conventional fine-tuning is of 62% on MassSpecGym. When predictions deviate from the ground truth, the generated molecular candidates remain structurally accurate, providing valuable guidance for human interpretation and more reliable identification.
【11】Aligning Diffusion Language Models via Unpaired Preference Optimization
标题:通过不成对偏好优化对齐扩散语言模型
链接:https://arxiv.org/abs/2510.23658
摘要:扩散语言模型(dLLM)是自回归(AR)生成器的一种新兴替代方案,但使其与人类偏好对齐具有挑战性,因为序列对数似然难以计算,而成对偏好数据的收集成本高昂。我们提出ELBO-KTO,它将扩散对数似然的ELBO替代量与基于前景理论的不成对偏好目标(Kahneman-Tversky优化,KTO)相结合。我们分析了ELBO替代引入的偏差和方差,并采用方差缩减手段在训练中稳定梯度。应用于LLaDA-8B-Instruct时,在自动LLM评判下,ELBO-KTO相对基础模型在kto-mix-14k和UltraFeedback-Binary上分别取得65.9%和62.3%的调整后胜率。在包括GSM8K、MMLU及其他推理/知识基准在内的下游任务中,在UltraFeedback-Binary上训练的ELBO-KTO在相同解码设置下与基础模型相当或更好。这确立了不成对偏好优化作为扩散LLM中成对对齐的可行替代方案。
摘要:Diffusion language models (dLLMs) are an emerging alternative to autoregressive (AR) generators, but aligning them to human preferences is challenging because sequence log-likelihoods are intractable and pairwise preference data are costly to collect. We introduce ELBO-KTO, which combines an ELBO surrogate for diffusion log-likelihoods with a prospect-theoretic, unpaired preference objective (Kahneman Tversky Optimization, KTO). We analyze the bias and variance induced by the ELBO substitution and employ variance-reduction practices that stabilize gradients during training. Applied to LLaDA-8B-Instruct, ELBO-KTO yields 65.9% and 62.3% adjusted win rates on kto-mix-14k and UltraFeedback-Binary, respectively, versus the base model under an automatic LLM judge. Across downstream tasks, including GSM8K, MMLU, and additional reasoning/knowledge benchmarks, ELBO-KTO trained on UltraFeedback-Binary performs on par with or better than the base model under identical decoding. This establishes unpaired preference optimization as a viable alternative to pairwise alignment in diffusion LLMs.
【12】The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models
标题:结构手术刀:大型语言模型的自动连续层修剪
链接:https://arxiv.org/abs/2510.23652
摘要:尽管大型语言模型(LLM)在许多领域取得了革命性突破,但其庞大的模型规模和高昂的计算成本给资源受限边缘设备上的实际部署带来重大挑战。为此,层修剪被提出,通过直接移除冗余层来降低计算开销。然而,现有层修剪方法通常依赖手工设计的指标来评估和删除单个层,而忽略了层间依赖关系,这可能破坏模型的信息流并严重降低性能。为解决这些问题,我们提出CLP,一个新颖的连续层修剪框架,包含两项关键创新:一个可微凹门算法,通过基于梯度的优化自动识别最适合修剪的连续层段;以及一个截止端点调优策略,只需微调与被修剪段相邻的层即可有效恢复模型性能。在多种模型架构(包括LLaMA2、LLaMA3和Qwen)和规模(70亿到700亿参数)上的大量实验表明,CLP显著优于现有最先进基线。例如,在20%的修剪率下,CLP在LLaMA3-70B上实现了95.34%的平均性能保留,比基线高出4.29%至30.52%。此外,CLP可以与量化无缝结合,进一步压缩模型而仅有轻微性能损失。
摘要:Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. To this end, layer pruning has been proposed to reduce the computational overhead by directly removing redundant layers. However, existing layer pruning methods typically rely on hand-crafted metrics to evaluate and remove individual layers, while ignoring the dependencies between layers. This can disrupt the model's information flow and severely degrade performance. To address these issues, we propose CLP, a novel continuous layer pruning framework that introduces two key innovations: a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning via gradient-based optimization; and a cutoff endpoint tuning strategy that effectively restores model performance by fine-tuning only the layers adjacent to the pruned segments. Extensive experiments across multiple model architectures (including LLaMA2, LLaMA3 and Qwen) and sizes (from 7B to 70B parameters) show that CLP significantly outperforms existing state-of-the-art baselines. For example, at a pruning rate of 20%, CLP achieves an average performance retention of 95.34% on LLaMA3-70B, outperforming baselines by 4.29%-30.52%. Furthermore, CLP can be seamlessly combined with quantization to further compress the model with only a slight performance loss.
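摘要所述"删除一段连续层、只微调相邻层"的流程可用如下PyTorch草图示意。其中 model.model.layers 等属性名按HF transformers中LLaMA风格结构假设,且此处不包含CLP的可微凹门搜索,仅演示修剪与选择性解冻这一步:

```python
import torch.nn as nn

def prune_contiguous(model, start, end, tune_radius=1):
    """删除 layers[start:end] 并只解冻相邻层(假设LLaMA风格结构)。"""
    layers = model.model.layers                     # HF LLaMA: nn.ModuleList
    kept = [l for i, l in enumerate(layers) if not (start <= i < end)]
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)

    for p in model.parameters():                    # 先全部冻结
        p.requires_grad = False
    # 仅解冻紧邻被删段两侧的层, 供"截止端点调优"式微调
    for i in range(max(0, start - tune_radius), min(len(kept), start + tune_radius)):
        for p in kept[i].parameters():
            p.requires_grad = True
```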
【13】Beyond Hidden-Layer Manipulation: Semantically-Aware Logit Interventions for Debiasing LLMs
标题:超越隐藏层操纵:用于去偏置LLM的语义感知Logit干预
链接:https://arxiv.org/abs/2510.23650
摘要:我们提出了静态(Static)和动态(Dynamic)两种零样本logits层去偏方法。动态方法在流畅性损失最小的情况下将偏见降低多达70%。logits干预优于隐藏层方法。我们表明,语义感知的logits干预对于对齐LLM的去偏而言稳定且有效。
摘要:We proposed Static and Dynamic -- two zero-shot logits-layer debiasing methods. Dynamic reduces bias by up to 70% with minimal fluency loss. Logits intervention outperforms hidden-layer approaches. We show semantic-aware logits intervention is stable and effective for debiasing aligned LLMs.
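logits层干预的一般做法可用下面的草图说明:在每个解码步,对一组预先识别的、与刻板印象相关的token按语义相关度压低logits。token列表、相关度与惩罚强度均为假设,仅示意"在logits而非隐藏层施加干预"这一思路:

```python
import torch

def debias_logits(logits, biased_token_ids, relevance, alpha=2.0):
    """对当前解码步的logits做去偏干预的示意(假设性实现)。

    logits: [vocab] 当前步的原始logits
    biased_token_ids: 预先识别的刻板印象相关token id(假设已给出)
    relevance: 各token对应的语义相关度权重, 取值[0, 1]
    """
    adjusted = logits.clone()
    ids = torch.tensor(biased_token_ids)
    adjusted[ids] -= alpha * torch.tensor(relevance)  # 按相关度降低采样概率
    return adjusted

logits = torch.randn(50257)                  # 假设GPT-2大小的词表
out = debias_logits(logits, [314, 1024], [1.0, 0.4])
```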
【14】Efficient Low Rank Attention for Long-Context Inference in Large Language Models
标题:大型语言模型中长上下文推理的高效低秩注意力
链接:https://arxiv.org/abs/2510.23649
摘要:随着输入文本长度的增长,LLM中的键值(KV)缓存带来过高的GPU显存开销,并限制了资源受限设备上的长上下文推理。现有方法(如KV量化和剪枝)虽能降低显存占用,但存在数值精度损失或键值对保留策略次优的问题。我们提出低秩查询-键注意力(LRQK),一个两阶段框架:在预填充阶段将全精度查询和键矩阵联合分解为紧凑的秩r因子,随后在每个解码步用这些低维投影以O(lr)时间计算代理注意力得分。通过只选择得分最高的k个token以及一小组固定的近期token,LRQK采用带命中-未命中机制的GPU-CPU混合缓存,只传输缺失的全精度KV对,从而在减少CPU-GPU数据移动的同时保留精确的注意力输出。使用LLaMA-3-8B和Qwen2.5-7B在RULER和LongBench基准上的大量实验表明,LRQK在长上下文设置中与领先的稀疏注意力方法持平或更优,同时以最小的精度损失带来显著的显存节省。我们的代码可在https://github.com/tenghuilee/LRQK获得。
摘要:As the length of input text grows, the key-value (KV) cache in LLMs imposes prohibitive GPU memory costs and limits long-context inference on resource constrained devices. Existing approaches, such as KV quantization and pruning, reduce memory usage but suffer from numerical precision loss or suboptimal retention of key-value pairs. We introduce Low Rank Query and Key attention (LRQK), a two-stage framework that jointly decomposes the full-precision query and key matrices into compact rank-r factors during the prefill stage, and then uses these low-dimensional projections to compute proxy attention scores in O(lr) time at each decode step. By selecting only the top-k tokens and a small fixed set of recent tokens, LRQK employs a mixed GPU-CPU cache with a hit-and-miss mechanism that transfers only missing full-precision KV pairs, thereby preserving exact attention outputs while reducing CPU-GPU data movement. Extensive experiments on the RULER and LongBench benchmarks with LLaMA-3-8B and Qwen2.5-7B demonstrate that LRQK matches or surpasses leading sparse-attention methods in long context settings, while delivering significant memory savings with minimal loss in accuracy. Our code is available at https://github.com/tenghuilee/LRQK.
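LRQK的关键步骤(预填充阶段的低秩分解、解码时用低维投影计算代理得分、只对top-k token做精确注意力)可用numpy草图如下。此处用截断SVD得到秩r因子,并省略近期token窗口与GPU-CPU缓存,属于示意性简化:

```python
import numpy as np

def prefill_lowrank(K, r):
    """对缓存的key矩阵 K[l, d] 做秩r近似, 返回低维因子 A, B。"""
    U, S, Vt = np.linalg.svd(K, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r]          # K ≈ A @ B, A:[l,r], B:[r,d]

def decode_step(q, A, B, K_full, V_full, top_k=4):
    """用代理得分选token, 再对选中token做精确注意力。q:[d]"""
    proxy = A @ (B @ q)                      # O(l*r) 的代理注意力得分
    idx = np.argsort(proxy)[-top_k:]         # 只保留得分最高的top-k token
    scores = K_full[idx] @ q / np.sqrt(q.size)
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ V_full[idx]                   # 对选中的全精度KV做精确加权

l, d, r = 128, 64, 8
K = np.random.randn(l, d); V = np.random.randn(l, d); q = np.random.randn(d)
A, B = prefill_lowrank(K, r)
print(decode_step(q, A, B, K, V).shape)      # (64,)
```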
【15】Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation
标题:通过大型语言模型的跨模式自适应和飞机轨迹表示进行航班延误预测
链接:https://arxiv.org/abs/2510.23636
备注:Preprint submitted to Aerospace Science and Technology (Elsevier) for possible publication
摘要:航班延误预测已成为空中交通管理的一个关键焦点,因为延误凸显了影响整体网络性能的低效环节。本文从空中交通管制员监测飞机进入终端区后延误的视角出发,提出一种基于轻量级大语言模型的多模态航班延误预测方法。该方法通过把轨迹数据适配到语言模态以捕捉空域状况,将轨迹表示与文本航空信息(包括航班信息、气象报告和机场通告)相结合。实验结果表明,该模型通过有效利用与延误来源相关的上下文信息,稳定地实现了亚分钟级的预测误差。该框架表明,语言理解与轨迹信息的跨模态适配相结合能够提升延误预测。此外,该方法展示了面向真实运行的实用性和可扩展性,支持在收到新的运行信息时实时更新并细化预测。
摘要:Flight delay prediction has become a key focus in air traffic management, as delays highlight inefficiencies that impact overall network performance. This paper presents a lightweight large language model-based multimodal flight delay prediction, formulated from the perspective of air traffic controllers monitoring aircraft delay after entering the terminal area. The approach integrates trajectory representations with textual aeronautical information, including flight information, weather reports, and aerodrome notices, by adapting trajectory data into the language modality to capture airspace conditions. Experimental results show that the model consistently achieves sub-minute prediction error by effectively leveraging contextual information related to the sources of delay. The framework demonstrates that linguistic understanding, when combined with cross-modality adaptation of trajectory information, enhances delay prediction. Moreover, the approach shows practicality and scalability for real-world operations, supporting real-time updates that refine predictions upon receiving new operational information.
【16】Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
标题:超越配对:通过排名选择建模增强LLM对齐
链接:https://arxiv.org/abs/2510.23631
摘要:大型语言模型(LLM)的对齐主要依赖成对偏好优化,即由标注者在对某提示的两个响应中选出较好者。这种方式虽然简单,却忽略了从更丰富的人类反馈(例如多项比较和前k排名)中学习的机会。我们提出排名选择偏好优化(RCPO),一个通过最大似然估计将偏好优化与(排名)选择建模联系起来的统一框架。该框架十分灵活,同时支持基于效用和基于排名的选择模型;它涵盖了若干现有成对方法(如DPO、SimPO),并为更丰富的反馈形式提供有原则的训练目标。我们用两个有代表性的排名选择模型(Multinomial Logit和Mallows-RMJ)实例化该框架。在AlpacaEval 2和Arena-Hard基准上对Llama-3-8B-Instruct和Gemma-2-9B-it的实证研究表明,RCPO始终优于有竞争力的基线。RCPO展示了直接利用排名偏好数据并搭配恰当的选择模型如何带来更有效的对齐,为将(排名)选择建模纳入LLM训练提供了一个通用且可扩展的基础。
摘要:Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from richer forms of human feedback, such as multiwise comparisons and top-k rankings. We propose Ranked Choice Preference Optimization (RCPO), a unified framework that bridges preference optimization with (ranked) choice modeling via maximum likelihood estimation. The framework is flexible, supporting both utility-based and rank-based choice models. It subsumes several existing pairwise methods (e.g., DPO, SimPO), while providing principled training objectives for richer feedback formats. We instantiate this framework with two representative ranked choice models (Multinomial Logit and Mallows-RMJ). Empirical studies on Llama-3-8B-Instruct and Gemma-2-9B-it across AlpacaEval 2 and Arena-Hard benchmarks show that RCPO consistently outperforms competitive baselines. RCPO shows how directly leveraging ranked preference data, combined with the right choice models, yields more effective alignment. It offers a versatile and extensible foundation for incorporating (ranked) choice modeling into LLM training.
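RCPO的Multinomial Logit实例对应经典的Plackett-Luce排名似然:给定各候选的得分s和一条从好到差的排名,负对数似然是各位置上"该项得分减去剩余项的logsumexp"之和的相反数;当排名只有两项时退化为成对logistic损失。下面的PyTorch草图给出该目标(得分如何由奖励模型参数化为假设):

```python
import torch

def plackett_luce_nll(scores, ranking):
    """Plackett-Luce 排名负对数似然。

    scores: [n] 各候选响应的(奖励)得分
    ranking: 从最好到最差的索引列表, 例如 [2, 0, 1]
    """
    s = scores[torch.tensor(ranking)]
    nll = 0.0
    for i in range(len(ranking) - 1):        # 末位候选概率为1, 可省略
        nll = nll - (s[i] - torch.logsumexp(s[i:], dim=0))
    return nll

scores = torch.tensor([1.2, -0.3, 0.7], requires_grad=True)
loss = plackett_luce_nll(scores, [0, 2, 1])  # 偏好排序: 0 > 2 > 1
loss.backward()                              # 可直接接入偏好优化训练循环
```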
【17】Chain of Execution Supervision Promotes General Reasoning in Large Language Models
标题:执行链监督促进大型语言模型中的通用推理
链接:https://arxiv.org/abs/2510.23629
摘要:构建强大的通用推理能力是大型语言模型(LLM)开发的核心目标。鉴于代码固有的逻辑结构和分治、拓扑排序、枚举等多样的推理范式,近期工作越来越多地将代码作为丰富的训练来源。然而,代码中的推理通常是隐式表达的,并与语法或实现噪声纠缠在一起,使得直接在原始代码上训练并非最优。为解决这一问题,我们提出TracePile,一个包含260万样本的大规模语料库,它将代码执行转化为显式的、逐步的思维链式推理依据,我们称之为执行链(Chain of Execution,CoE)。该语料库涵盖数学、经典算法和算法竞赛等领域,并通过变量追踪问题和代码改写来增强逻辑粒度和代码多样性。我们用三种训练设置评估TracePile:持续预训练、预训练后的指令微调以及两阶段微调。在四个基础模型(LLaMA 3、LLaMA 3.1、Qwen-2.5和Qwen-2.5 Coder)和20个涵盖数学、代码、逻辑与算法的基准上的实验显示出一致的改进。值得注意的是,TracePile使LLaMA3.1-8B在九个数学数据集上平均提升7.1%,并在两阶段微调下在LiveCodeBench、CRUX和MMLU上带来明显收益。
摘要:Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given its inherent logical structure and diverse reasoning paradigms such as divide-and-conquer, topological ordering, and enumeration. However, reasoning in code is often expressed implicitly and entangled with syntactic or implementation noise, making direct training on raw code suboptimal. To address this, we introduce TracePile, a large-scale corpus of 2.6 million samples that transforms code execution into explicit, step-by-step chain-of-thought-style rationales, which we call Chain of Execution (CoE). The corpus spans domains including mathematics, classical algorithms and algorithmic competition, and is enriched with variable-tracing questions and code rewritings to enhance logical granularity and code diversity. We evaluate TracePile using three training setups: continue-pretraining, instruction tuning after pretraining, and two-stage finetuning. Experiments across four base models (LLaMA 3, LLaMA 3.1, Qwen-2.5, and Qwen-2.5 Coder) and 20 benchmarks covering math, code, logic, and algorithms demonstrate consistent improvements. Notably, TracePile boosts LLaMA3.1-8B by 7.1% on average across nine math datasets and delivers clear gains on LiveCodeBench, CRUX, and MMLU under two-stage fine-tuning.
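"执行链(CoE)"把代码执行展开为逐步的推理文本。下面的Python草图用sys.settrace记录一段小程序逐行的变量变化并串成轨迹,示意这类语料的构造思路;输出格式为假设,并非TracePile的真实模板:

```python
import sys

def trace_execution(fn, *args):
    """运行 fn 并记录逐行的局部变量变化, 串成执行链(CoE)式文本。"""
    steps, prev = [], {}

    def tracer(frame, event, arg):
        nonlocal prev
        if event == "line" and frame.f_code is fn.__code__:
            cur = dict(frame.f_locals)
            diff = {k: v for k, v in cur.items() if prev.get(k) != v}
            if diff:
                steps.append(f"line {frame.f_lineno}: {diff}")
            prev = cur
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    steps.append(f"return {result}")
    return "\n".join(steps)

def sum_to(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

print(trace_execution(sum_to, 3))  # 逐步展示 total 与 i 的演化, 类似CoE式监督信号
```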
【18】Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide
标题:利用有限数据微调大型语言模型:综述与实践指南
链接:https://arxiv.org/abs/2411.09539
备注:Accepted to TACL. Pre-MIT Press version. Major restructuring; added preference alignment section and additional tables. 36 pages
摘要:在低资源语言、专业领域和受限部署环境中,用有限数据微调大型语言模型(LLM)是一个实际挑战。虽然预训练LLM提供了坚实基础,但在数据稀缺条件下的有效适配需要有针对性且高效的微调技术。本文对数据稀缺场景下微调LLM的最新方法进行了结构化且实用的综述。我们系统回顾了降低训练和部署成本的参数高效微调技术、面向编码器和解码器模型的领域与跨语言适配方法,以及模型专业化策略。我们进一步考察了利用有限的人类或合成反馈引导模型行为的偏好对齐方法,强调样本效率和计算效率。全文着重讨论经验上的权衡、选择标准,以及依据任务约束(包括模型规模、数据规模和灾难性遗忘的缓解)选择合适技术的最佳实践。其目标是为研究人员和从业者提供在数据和资源有限时有效微调LLM的可操作见解。
摘要:Fine-tuning large language models (LLMs) with limited data poses a practical challenge in low-resource languages, specialized domains, and constrained deployment settings. While pre-trained LLMs provide strong foundations, effective adaptation under data scarcity requires focused and efficient fine-tuning techniques. This paper presents a structured and practical survey of recent methods for fine-tuning LLMs in data-scarce scenarios. We systematically review parameter-efficient fine-tuning techniques that lower training and deployment costs, domain and cross-lingual adaptation methods for both encoder and decoder models, and model specialization strategies. We further examine preference alignment approaches that guide model behavior using limited human or synthetic feedback, emphasizing sample and compute efficiency. Throughout, we highlight empirical trade-offs, selection criteria, and best practices for choosing suitable techniques based on task constraints, including model scaling, data scaling, and the mitigation of catastrophic forgetting. The aim is to equip researchers and practitioners with actionable insights for effectively fine-tuning LLMs when data and resources are limited.
Graph相关(图学习|图神经网络|图优化等)(5篇)
【1】Temporal Knowledge Graph Hyperedge Forecasting: Exploring Entity-to-Category Link Prediction
标题:时序知识图谱超边预测:探索实体到类别的链接预测
链接:https://arxiv.org/abs/2510.24240
摘要:时序知识图谱已成为一种强大的工具,不仅能建模实体间的静态关系,还能刻画关系如何随时间演化。由于这类信息结构可用于存储来自真实场景(例如新闻流)的信息,预测未来的图组成在一定程度上等同于预测真实世界事件。该领域的大部分研究集中在基于嵌入的方法上,通常利用卷积神经网络架构;这些方案如同黑箱,限制了洞察。在本文中,我们探索对既有基于规则的框架TLogic的一种扩展,在获得高准确率的同时给出可解释的预测。这提供了透明性,并允许最终用户在预测阶段结束时严格审视所应用的规则。新的规则格式将实体类别作为关键组成部分,旨在将规则的应用限制在相关实体上。当构建图谱时类别未知,我们提出一种数据驱动的方法,借助基于LLM的方式生成类别。此外,我们还研究了在进行类别预测时对检索到实体的得分所采用的聚合方法的选择。
摘要:Temporal Knowledge Graphs have emerged as a powerful way of not only modeling static relationships between entities but also the dynamics of how relations evolve over time. As these informational structures can be used to store information from a real-world setting, such as a news flow, predicting future graph components to a certain extent equates predicting real-world events. Most of the research in this field focuses on embedding-based methods, often leveraging convolutional neural net architectures. These solutions act as black boxes, limiting insight. In this paper, we explore an extension to an established rule-based framework, TLogic, that yields a high accuracy in combination with explainable predictions. This offers transparency and allows the end-user to critically evaluate the rules applied at the end of the prediction stage. The new rule format incorporates entity category as a key component with the purpose of limiting rule application only to relevant entities. When categories are unknown for building the graph, we propose a data-driven method to generate them with an LLM-based approach. Additionally, we investigate the choice of aggregation method for scores of retrieved entities when performing category prediction.
【2】Graph-Guided Concept Selection for Efficient Retrieval-Augmented Generation
标题:高效检索增强生成的图形引导概念选择
链接:https://arxiv.org/abs/2510.24120
摘要:基于图的RAG从文本块构建知识图谱(KG),以增强基于大语言模型(LLM)的问答系统的检索。它在生物医学、法律和政治学等领域尤其有用:在这些领域,有效检索通常涉及对专有文档的多跳推理。然而,这些方法需要大量LLM调用来从文本块中抽取实体和关系,在规模化时产生高昂成本。通过精心设计的消融研究,我们观察到某些词(称为概念)及其关联文档更为重要。基于这一洞察,我们提出图引导的概念选择(G2ConS),其核心包括一种块选择方法和一个独立于LLM的概念图。前者选择显著的文档块以降低KG构建成本;后者以零成本弥补块选择引入的知识缺口。在多个真实世界数据集上的评估表明,G2ConS在构建成本、检索有效性和回答质量上均优于所有基线。
摘要:Graph-based RAG constructs a knowledge graph (KG) from text chunks to enhance retrieval in Large Language Model (LLM)-based question answering. It is especially beneficial in domains such as biomedicine, law, and political science, where effective retrieval often involves multi-hop reasoning over proprietary documents. However, these methods demand numerous LLM calls to extract entities and relations from text chunks, incurring prohibitive costs at scale. Through a carefully designed ablation study, we observe that certain words (termed concepts) and their associated documents are more important. Based on this insight, we propose Graph-Guided Concept Selection (G2ConS). Its core comprises a chunk selection method and an LLM-independent concept graph. The former selects salient document chunks to reduce KG construction costs; the latter closes knowledge gaps introduced by chunk selection at zero cost. Evaluations on multiple real-world datasets show that G2ConS outperforms all baselines in construction cost, retrieval effectiveness, and answering quality.
【3】GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research
标题:GraphNet:用于张量编译器研究的大规模计算图数据集
链接:https://arxiv.org/abs/2510.24035
摘要:我们介绍了GraphNet,这是一个包含2.7K真实深度学习计算图的数据集,具有丰富的元数据,跨越了多个深度学习框架的六个主要任务类别。为了评估这些样本上的张量编译器性能,我们提出了基准度量加速分数S(t),它在可调容差水平下联合考虑了运行时加速和执行正确性,提供了通用优化能力的可靠度量。此外,我们将S(t)扩展到错误感知加速分数ES(t),它包含错误信息,并帮助编译器开发人员识别关键的性能瓶颈。在这份报告中,我们对默认的张量编译器,PaddlePaddle的CINN和PyTorch的TorchInductor,在计算机视觉(CV)和自然语言处理(NLP)样本上进行了基准测试,以证明GraphNet的实用性。包含图形提取和编译器评估工具的完整构建管道可在https://github.com/PaddlePaddle/GraphNet上获得。
摘要:We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose the benchmark metric Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels, offering a reliable measure of general optimization capability. Furthermore, we extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information and helps compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline with graph extraction and compiler evaluation tools is available at https://github.com/PaddlePaddle/GraphNet .
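摘要未给出S(t)的精确公式,下面是按其文字描述("在可调容差水平下联合考虑运行时加速与执行正确性")写出的一种假设性实现:样本输出的相对误差不超过容差t时记为该样本的加速比,否则记0,再对样本取平均:

```python
import numpy as np

def speedup_score(ref_outs, cmp_outs, ref_times, cmp_times, t=1e-3):
    """Speedup Score S(t) 的一种假设性实现(论文未给出公式)。

    ref_*: 未优化基线的输出与耗时; cmp_*: 编译器优化后的输出与耗时。
    """
    scores = []
    for r_out, c_out, r_t, c_t in zip(ref_outs, cmp_outs, ref_times, cmp_times):
        rel_err = np.max(np.abs(r_out - c_out) / (np.abs(r_out) + 1e-12))
        ok = rel_err <= t                    # 容差t内视为执行正确
        scores.append((r_t / c_t) if ok else 0.0)
    return float(np.mean(scores))
```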
【4】HyperGraphX: Graph Transductive Learning with Hyperdimensional Computing and Message Passing
标题:HyperGraphX:结合超维计算与消息传递的图直推学习
链接:https://arxiv.org/abs/2510.23980
摘要:我们提出一种新算法HDGC,它将图卷积与超维计算中的绑定(binding)和捆绑(bundling)操作相结合,用于直推式图学习。在一组同配图和异配图上,HDGC的预测精度优于主流且流行的图神经网络实现,以及最先进的超维计算实现。与我们测试过的精度最高的学习方法相比,在相同的目标GPU平台上,HDGC平均比图神经网络实现GCNII快9561.0倍,比超维计算实现HDGL快144.5倍。由于大部分学习在二值向量上进行,我们预期HDGC在神经形态和新兴的存内计算设备上具有出色的能效表现。
摘要:We present a novel algorithm, HDGC, that marries graph convolution with binding and bundling operations in hyperdimensional computing for transductive graph learning. For prediction accuracy HDGC outperforms major and popular graph neural network implementations as well as state-of-the-art hyperdimensional computing implementations for a collection of homophilic graphs and heterophilic graphs. Compared with the most accurate learning methodologies we have tested, on the same target GPU platform, HDGC is on average 9561.0 and 144.5 times faster than GCNII, a graph neural network implementation and HDGL, a hyperdimensional computing implementation, respectively. As the majority of the learning operates on binary vectors, we expect outstanding energy performance of HDGC on neuromorphic and emerging process-in-memory devices.
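绑定(binding)与捆绑(bundling)是超维计算的两个基本算子:对双极(±1)超向量,绑定常用逐元素乘法,捆绑常用逐元素求和后取符号。下面的numpy草图用这两个算子做一次图上的消息传递式聚合,仅说明算子本身,与HDGC的具体设计无关:

```python
import numpy as np

D = 10_000                                   # 超向量维度
rng = np.random.default_rng(0)

def rand_hv():
    return rng.choice([-1, 1], size=D)       # 随机双极超向量

def bind(a, b):
    return a * b                             # 绑定: 逐元素乘(自逆)

def bundle(vs):
    s = np.sum(vs, axis=0)                   # 捆绑: 逐元素多数表决
    return np.where(s >= 0, 1, -1)

# 一个3节点小图: 边 0-1 与 0-2
feat = {i: rand_hv() for i in range(3)}      # 节点特征超向量
role = rand_hv()                             # 表示"邻居"角色的超向量
adj = {0: [1, 2], 1: [0], 2: [0]}

# 消息传递式聚合: 节点表示 = bundle(自身, bind(角色, 各邻居))
node_repr = {
    v: bundle([feat[v]] + [bind(role, feat[u]) for u in adj[v]])
    for v in adj
}
print(node_repr[0][:8])
```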
【5】Inferring Group Intent as a Cooperative Game. An NLP-based Framework for Trajectory Analysis using Graph Transformer Neural Network
标题:将群体意图推断为合作博弈:基于NLP、使用图Transformer神经网络的轨迹分析框架
链接:https://arxiv.org/abs/2510.23905
摘要:本文将群体目标的轨迹意图视为合作博弈的结果进行研究,其中复杂时空轨迹用基于NLP的生成模型建模。在我们的框架中,群体意图由合作博弈的特征函数指定,而博弈中各参与者的分配由核(core)、Shapley值或核仁(nucleolus)给出。由此产生的分配诱导出概率分布,支配目标之间协调的时空轨迹,反映群体的潜在意图。我们回答两个关键问题:(1)如何将群体轨迹的意图最优地形式化为合作博弈的特征函数?(2)如何从对目标的带噪观测中推断这种意图?针对第一个问题,我们引入基于Fisher信息的合作博弈特征函数,它产生的概率分布会生成协调的时空模式。作为这些模式的生成模型,我们开发了一个建立在形式文法之上的基于NLP的生成模型,能够创建逼真的多目标轨迹数据。针对第二个问题,我们训练一个图Transformer神经网络(GTNN),从观测数据中高精度地推断以合作博弈特征函数表示的群体轨迹意图。GTNN的自注意力函数依赖于航迹估计。因此,该建模与算法构成了一个跨越目标跟踪(贝叶斯信号处理)与GTNN(群体意图推断)的多层方法。
摘要:This paper studies group target trajectory intent as the outcome of a cooperative game where the complex-spatio trajectories are modeled using an NLP-based generative model. In our framework, the group intent is specified by the characteristic function of a cooperative game, and allocations for players in the cooperative game are specified by either the core, the Shapley value, or the nucleolus. The resulting allocations induce probability distributions that govern the coordinated spatio-temporal trajectories of the targets that reflect the group's underlying intent. We address two key questions: (1) How can the intent of a group trajectory be optimally formalized as the characteristic function of a cooperative game? (2) How can such intent be inferred from noisy observations of the targets? To answer the first question, we introduce a Fisher-information-based characteristic function of the cooperative game, which yields probability distributions that generate coordinated spatio-temporal patterns. As a generative model for these patterns, we develop an NLP-based generative model built on formal grammar, enabling the creation of realistic multi-target trajectory data. To answer the second question, we train a Graph Transformer Neural Network (GTNN) to infer group trajectory intent-expressed as the characteristic function of the cooperative game-from observational data with high accuracy. The self-attention function of the GTNN depends on the track estimates. Thus, the formulation and algorithms provide a multi-layer approach that spans target tracking (Bayesian signal processing) and the GTNN (for group intent inference).
Transformer(8篇)
【1】Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
标题:对象绑定会自然出现在大型预训练视觉Transformer中吗?
链接:https://arxiv.org/abs/2510.24709
备注:Accepted as a Spotlight at NeurIPS 2025
摘要:对象绑定,即大脑将共同表征一个对象的诸多特征绑定为连贯整体的能力,是人类认知的核心。它将低层感知特征组织为高层对象表征,高效且组合式地将这些对象存入记忆,并支持人类对单个对象实例的推理。虽然先前工作常显式引入以对象为中心的注意力(例如Slot Attention)来探究这些好处,但这种能力是否会自然出现在预训练视觉Transformer(ViT)中尚不清楚。直觉上是可能的:识别哪些图像块属于同一对象应当有助于下游预测,从而引导注意力。鉴于自注意力的二次交互本质,我们假设ViT会表征两个图像块是否属于同一对象,我们将这一属性称为IsSameObject。我们使用相似性探针从ViT各层的图像块嵌入中解码IsSameObject,准确率超过90%。至关重要的是,这种对象绑定能力在自监督ViT(DINO、MAE、CLIP)中可靠地涌现,而在ImageNet监督模型中明显较弱,这表明绑定并非平凡的架构副产物,而是通过特定预训练目标习得的能力。我们进一步发现,IsSameObject被编码在对象特征之上的低维子空间中,且该信号会主动引导注意力。从模型激活中消融IsSameObject会降低下游性能并与学习目标相悖,意味着涌现的对象绑定天然服务于预训练目标。我们的发现挑战了"ViT缺乏对象绑定"的观点,并强调了"哪些部分属于同一整体"的符号知识如何在联结主义系统中自然涌现。
摘要:Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual features into high-level object representations, stores those objects efficiently and compositionally in memory, and supports human reasoning about individual object instances. While prior work often imposes object-centric attention (e.g., Slot Attention) explicitly to probe these benefits, it remains unclear whether this ability naturally emerges in pre-trained Vision Transformers (ViTs). Intuitively, they could: recognizing which patches belong to the same object should be useful for downstream prediction and thus guide attention. Motivated by the quadratic nature of self-attention, we hypothesize that ViTs represent whether two patches belong to the same object, a property we term IsSameObject. We decode IsSameObject from patch embeddings across ViT layers using a similarity probe, which reaches over 90% accuracy. Crucially, this object-binding capability emerges reliably in self-supervised ViTs (DINO, MAE, CLIP), but markedly weaker in ImageNet-supervised models, suggesting that binding is not a trivial architectural artifact, but an ability acquired through specific pretraining objectives. We further discover that IsSameObject is encoded in a low-dimensional subspace on top of object features, and that this signal actively guides attention. Ablating IsSameObject from model activations degrades downstream performance and works against the learning objective, implying that emergent object binding naturally serves the pretraining objective. Our findings challenge the view that ViTs lack object binding and highlight how symbolic knowledge of "which parts belong together" emerges naturally in a connectionist system.
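"相似性探针"的一种常见做法是:取两个patch嵌入构造对称的成对特征(如逐元素乘积),再训练线性分类器预测是否同属一个对象。下面用合成数据给出sklearn草图;嵌入生成方式与特征构造均为假设,论文细节未在摘要中给出:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_obj, per_obj = 64, 20, 10

# 合成"patch嵌入": 同一对象的patch共享一个中心向量(假设数据)
centers = rng.normal(size=(n_obj, d))
emb = np.repeat(centers, per_obj, axis=0) + 0.5 * rng.normal(
    size=(n_obj * per_obj, d))
obj = np.repeat(np.arange(n_obj), per_obj)

def sample_pairs(n_pairs):
    """采样大致类别均衡的patch对: 约一半同对象, 一半随机。"""
    i = rng.integers(0, len(emb), n_pairs)
    same = rng.random(n_pairs) < 0.5
    j = np.where(
        same,
        obj[i] * per_obj + rng.integers(0, per_obj, n_pairs),  # 同对象内取
        rng.integers(0, len(emb), n_pairs))                    # 全局随机取
    return i, j, (obj[i] == obj[j]).astype(int)

i, j, y = sample_pairs(4000)
X = emb[i] * emb[j]            # 成对特征: 逐元素乘积(对称的相似度特征)
probe = LogisticRegression(max_iter=1000).fit(X[:3000], y[:3000])
print("probe acc:", probe.score(X[3000:], y[3000:]))
```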
【2】Transformers can do Bayesian Clustering
标题:Transformer可以进行贝叶斯聚类
链接:https://arxiv.org/abs/2510.24318
摘要:贝叶斯聚类能够刻画不确定性,但在大规模下计算代价高昂。此外,真实世界数据集常包含缺失值,简单插补忽略了相应的不确定性,导致次优结果。我们提出Cluster-PFN,一个基于Transformer的模型,将先验数据拟合网络(PFN)扩展到无监督贝叶斯聚类。Cluster-PFN完全在从有限高斯混合模型(GMM)先验生成的合成数据集上训练,学习估计簇数与簇分配的后验分布。我们的方法比AIC、BIC和变分推断(VI)等手工设计的模型选择流程更准确地估计簇数,并在聚类质量上与VI相当,同时快几个数量级。Cluster-PFN可以在包含缺失数据的复杂先验上训练,在高缺失率下于真实基因组数据集上优于基于插补的基线。这些结果表明Cluster-PFN能够提供可扩展且灵活的贝叶斯聚类。
摘要:Bayesian clustering accounts for uncertainty but is computationally demanding at scale. Furthermore, real-world datasets often contain missing values, and simple imputation ignores the associated uncertainty, resulting in suboptimal results. We present Cluster-PFN, a Transformer-based model that extends Prior-Data Fitted Networks (PFNs) to unsupervised Bayesian clustering. Trained entirely on synthetic datasets generated from a finite Gaussian Mixture Model (GMM) prior, Cluster-PFN learns to estimate the posterior distribution over both the number of clusters and the cluster assignments. Our method estimates the number of clusters more accurately than handcrafted model selection procedures such as AIC, BIC and Variational Inference (VI), and achieves clustering quality competitive with VI while being orders of magnitude faster. Cluster-PFN can be trained on complex priors that include missing data, outperforming imputation-based baselines on real-world genomic datasets, at high missingness. These results show that the Cluster-PFN can provide scalable and flexible Bayesian clustering.
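Cluster-PFN完全在从GMM先验采样的合成数据上训练。下面的numpy草图给出该先验采样过程的最小示意:先采簇数与混合参数,再生成数据点与簇标签;各超参数范围均为假设值:

```python
import numpy as np

def sample_gmm_dataset(rng, n=200, d=2, k_max=6):
    """从有限GMM先验采样一个合成数据集(先验超参数为假设值)。"""
    k = rng.integers(1, k_max + 1)               # 先验: 簇数
    weights = rng.dirichlet(np.ones(k))          # 混合权重
    means = rng.normal(0, 3, size=(k, d))        # 各簇均值
    scales = rng.uniform(0.3, 1.0, size=k)       # 各簇(各向同性)标准差
    z = rng.choice(k, size=n, p=weights)         # 簇分配
    X = means[z] + scales[z, None] * rng.normal(size=(n, d))
    return X, z, k

rng = np.random.default_rng(0)
X, z, k = sample_gmm_dataset(rng)
print(k, X.shape)   # 训练时: X 作输入, (k, z) 作为后验估计的监督目标
```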
【3】Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Decoder-Only Transformers
标题:键和值权重可能就是你所需要的全部:论仅解码器Transformer中查询、键、值权重三元组的必要性
链接:https://arxiv.org/abs/2510.23912
摘要:查询、键、值权重三元组是当前最先进LLM中注意力机制的基本构件。我们从理论上研究这一三元组能否被精简,证明在简化假设下查询权重是冗余的,从而可将非嵌入/lm-head参数的数量减少超过8%。我们在从头训练的全复杂度GPT-3 small架构(含层归一化、跳跃连接和权重衰减)上验证了该理论,表明精简后的模型取得与标准基线相当的验证损失。这些发现激励了对查询权重冗余在更大规模下的研究。
摘要:The Query, Key, Value weight triplet is a building block of current attention mechanisms in state-of-the-art LLMs. We theoretically investigate whether this triplet can be reduced, proving under simplifying assumptions that the Query weights are redundant, thereby reducing the number of non-embedding/lm-head parameters by over 8%. We validate the theory on full-complexity GPT-3 small architectures (with layer normalization, skip connections, and weight decay) trained from scratch, demonstrating that the reduced model achieves comparable validation loss to standard baselines. These findings motivate the investigation of the Query weight redundancy at scale.
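"查询权重冗余"的含义可以用一个去掉W_Q的自注意力头来示意:直接以输入向量本身为查询,只保留W_K与W_V。以下PyTorch草图是该想法的最小单头实现(未加因果掩码),并非论文的完整架构:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVAttention(nn.Module):
    """去掉W_Q的自注意力: 查询直接取输入x本身(单头示意)。"""
    def __init__(self, d_model):
        super().__init__()
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x):                    # x: [batch, seq, d]
        k, v = self.W_k(x), self.W_v(x)
        attn = F.softmax(x @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)
print(KVAttention(64)(x).shape)              # torch.Size([2, 16, 64])
```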
【4】Parallel BiLSTM-Transformer networks for forecasting chaotic dynamics
标题:用于预测混沌动力学的并行BiLSTM-Transformer网络
链接:https://arxiv.org/abs/2510.23685
备注:9 pages, 7 figures
摘要:混沌系统的非线性本质导致其对初始条件极端敏感并表现出高度复杂的动力学行为,这对准确预测其演化提出了根本性挑战。为克服传统方法无法同时捕捉混沌时间序列局部特征与全局依赖的局限,本研究提出一种集成Transformer与双向长短期记忆(BiLSTM)网络的并行预测框架。该混合模型采用双分支架构:Transformer分支主要捕捉长程依赖,BiLSTM分支专注于提取局部时间特征;两个分支的互补表示在专门的特征融合层中融合,以提升预测精度。作为示例,我们在洛伦兹系统的两个代表性任务上系统评估了模型性能。第一个是自主演化预测,模型从状态向量的时延嵌入出发递归外推系统轨迹,以评估长期跟踪精度和稳定性。第二个是不可测变量的推断,模型从部分观测的时延嵌入重构未观测状态,以评估其状态补全能力。结果一致表明,所提混合框架在两类任务上均优于单分支架构,证明了其在混沌系统预测中的鲁棒性和有效性。
摘要:The nonlinear nature of chaotic systems results in extreme sensitivity to initial conditions and highly intricate dynamical behaviors, posing fundamental challenges for accurately predicting their evolution. To overcome the limitation that conventional approaches fail to capture both local features and global dependencies in chaotic time series simultaneously, this study proposes a parallel predictive framework integrating Transformer and Bidirectional Long Short-Term Memory (BiLSTM) networks. The hybrid model employs a dual-branch architecture, where the Transformer branch mainly captures long-range dependencies while the BiLSTM branch focuses on extracting local temporal features. The complementary representations from the two branches are fused in a dedicated feature-fusion layer to enhance predictive accuracy. As illustrating examples, the model's performance is systematically evaluated on two representative tasks in the Lorenz system. The first is autonomous evolution prediction, in which the model recursively extrapolates system trajectories from the time-delay embeddings of the state vector to evaluate long-term tracking accuracy and stability. The second is inference of unmeasured variable, where the model reconstructs the unobserved states from the time-delay embeddings of partial observations to assess its state-completion capability. The results consistently indicate that the proposed hybrid framework outperforms both single-branch architectures across tasks, demonstrating its robustness and effectiveness in chaotic system prediction.
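双分支结构的骨架可用如下PyTorch草图示意:Transformer编码器分支与BiLSTM分支并行处理同一时延嵌入序列,二者的表示拼接后经融合层输出预测。层数、维度与取末步表示的做法均为示意性假设:

```python
import torch
import torch.nn as nn

class BiLSTMTransformer(nn.Module):
    """并行 BiLSTM-Transformer 预测网络的结构示意。"""
    def __init__(self, d_in=3, d_model=64, out_dim=3):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)  # 全局依赖
        self.bilstm = nn.LSTM(d_model, d_model // 2, num_layers=1,
                              batch_first=True, bidirectional=True)    # 局部特征
        self.fusion = nn.Linear(2 * d_model, out_dim)                  # 特征融合层

    def forward(self, x):                    # x: [batch, seq, d_in]
        h = self.proj(x)
        t_feat = self.transformer(h)[:, -1]  # Transformer分支末步表示
        l_feat, _ = self.bilstm(h)
        fused = torch.cat([t_feat, l_feat[:, -1]], dim=-1)
        return self.fusion(fused)            # 预测下一时刻状态

x = torch.randn(8, 32, 3)                    # 洛伦兹系统的时延嵌入示例
print(BiLSTMTransformer()(x).shape)          # torch.Size([8, 3])
```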
【5】Transformers from Compressed Representations
标题:来自压缩表示的Transformer
链接:https://arxiv.org/abs/2510.23665
摘要:压缩文件格式是高效数据存储和传输的基石,但其在表示学习方面的潜力在很大程度上仍未被发掘。我们提出TEMPEST(TransformErs froM comPressed rEpreSenTations),一种利用压缩文件固有的字节流结构来设计有效token化与编码策略的方法。借助这种紧凑编码,标准Transformer可以直接从压缩数据流中学习语义表示,绕过原始字节级处理或完整媒体解码的需要。我们的方案大幅减少了语义分类所需的token数量,从而降低计算复杂度和内存占用。通过在多样的数据集、编码方案和模态上的大量实验,我们表明TEMPEST在取得与最先进方法相当精度的同时,带来了内存与计算上的效率收益。
摘要:Compressed file formats are the corner stone of efficient data storage and transmission, yet their potential for representation learning remains largely underexplored. We introduce TEMPEST (TransformErs froM comPressed rEpreSenTations), a method that exploits the inherent byte-stream structure of compressed files to design an effective tokenization and encoding strategy. By leveraging this compact encoding, a standard transformer can directly learn semantic representations from compressed data streams, bypassing the need for raw byte-level processing or full media decoding. Our proposal substantially reduces the number of tokens required for semantic classification, thereby lowering both computational complexity and memory usage. Through extensive experiments across diverse datasets, coding schemes, and modalities, we show that TEMPEST achieves accuracy competitive with the state-of-the-art while delivering efficiency gains in memory and compute.
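压缩字节流token化最直接的形式,是把压缩后的字节序列逐字节映射为token id(词表大小256)。下面的草图用zlib示意这一步;论文的具体token化与编码策略未在摘要中给出,此处仅为假设性的最简版本:

```python
import zlib

def compressed_byte_tokens(data: bytes, max_len: int = 512):
    """把原始数据压缩后按字节转成token序列(最简示意)。"""
    stream = zlib.compress(data, level=6)    # 压缩字节流, 本身已是紧凑编码
    tokens = list(stream[:max_len])          # 每个字节即一个token id (0..255)
    return tokens

text = ("compressed file formats are the corner stone of efficient "
        "data storage and transmission. " * 4).encode()
toks = compressed_byte_tokens(text)
print(len(text), "->", len(toks), "tokens;", toks[:10])
```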
【6】AI-Driven Carbon Monitoring: Transformer-Based Reconstruction of Atmospheric CO2 in Canadian Poultry Regions
标题:人工智能驱动的碳监测:基于Transformer的加拿大家禽区大气CO2重建
链接:https://arxiv.org/abs/2510.23663
摘要:准确绘制农业景观上的柱平均CO2(XCO2)对指导减排策略至关重要。我们提出一个结合小波的时空Vision Transformer(ST-ViWT)框架,用于从OCO-2重建加拿大南部(重点是家禽养殖密集地区)连续且带不确定性量化的XCO2场。该模型将小波时频表示与针对气象、植被指数、地形和土地覆盖的Transformer注意力相融合。在2024年OCO-2数据上,ST-ViWT达到R2 = 0.984和RMSE = 0.468 ppm;92.3%的缺口填充预测落在+/-1 ppm以内。基于TCCON的独立验证显示出强泛化能力(偏差 = -0.14 ppm;r = 0.928),包括忠实再现夏末的XCO2回落。对14个家禽区域的空间分析显示设施密度与XCO2之间存在中等程度的正相关(r = 0.43);高密度地区表现出更大的季节振幅(9.57 ppm)和更强的夏季变率。与传统插值和标准机器学习基线相比,ST-ViWT生成带显式不确定性的无缝0.25度CO2场,即使观测稀疏也能实现全年覆盖。该方法支持将卫星约束与国家清单及精准畜牧平台相结合,用于排放基准确定、区域特定因子细化和干预措施验证。重要的是,基于Transformer的地球观测使可扩展、透明、空间显式的碳核算、热点优先级排序和政策相关的减缓评估成为可能。
摘要:Accurate mapping of column-averaged CO2 (XCO2) over agricultural landscapes is essential for guiding emission mitigation strategies. We present a Spatiotemporal Vision Transformer with Wavelets (ST-ViWT) framework that reconstructs continuous, uncertainty-quantified XCO2 fields from OCO-2 across southern Canada, emphasizing poultry-intensive regions. The model fuses wavelet time-frequency representations with transformer attention over meteorology, vegetation indices, topography, and land cover. On 2024 OCO-2 data, ST-ViWT attains R2 = 0.984 and RMSE = 0.468 ppm; 92.3 percent of gap-filled predictions lie within +/-1 ppm. Independent validation with TCCON shows robust generalization (bias = -0.14 ppm; r = 0.928), including faithful reproduction of the late-summer drawdown. Spatial analysis across 14 poultry regions reveals a moderate positive association between facility density and XCO2 (r = 0.43); high-density areas exhibit larger seasonal amplitudes (9.57 ppm) and enhanced summer variability. Compared with conventional interpolation and standard machine-learning baselines, ST-ViWT yields seamless 0.25 degree CO2 surfaces with explicit uncertainties, enabling year-round coverage despite sparse observations. The approach supports integration of satellite constraints with national inventories and precision livestock platforms to benchmark emissions, refine region-specific factors, and verify interventions. Importantly, transformer-based Earth observation enables scalable, transparent, spatially explicit carbon accounting, hotspot prioritization, and policy-relevant mitigation assessment.
【7】Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging
标题:用于粒子喷注标记的空间感知线性Transformer(SAL-T)
链接:https://arxiv.org/abs/2510.23641
摘要:Transformer在捕捉高能粒子碰撞中的全局与局部关联方面非常有效,但在CERN LHC这类高数据吞吐环境中存在部署挑战:Transformer模型的二次复杂度需要大量资源并增加推理延迟。为解决这些问题,我们提出空间感知线性Transformer(SAL-T),一种受物理启发、保持线性注意力的Linformer架构改进。我们的方法基于运动学特征对粒子进行空间感知划分,从而在具有物理意义的区域之间计算注意力;此外,我们借鉴喷注物理的洞见,采用卷积层捕捉局部关联。除了在喷注分类任务中优于标准Linformer外,SAL-T还取得与全注意力Transformer相当的分类结果,同时在推理时占用显著更少的资源、具有更低的延迟。在通用点云分类数据集(ModelNet10)上的实验进一步印证了这一趋势。我们的代码可在https://github.com/aaronw5/SAL-T4HEP获得。
摘要:Transformers are very effective in capturing both global and local correlations within high-energy particle collisions, but they present deployment challenges in high-data-throughput environments, such as the CERN LHC. The quadratic complexity of transformer models demands substantial resources and increases latency during inference. In order to address these issues, we introduce the Spatially Aware Linear Transformer (SAL-T), a physics-inspired enhancement of the linformer architecture that maintains linear attention. Our method incorporates spatially aware partitioning of particles based on kinematic features, thereby computing attention between regions of physical significance. Additionally, we employ convolutional layers to capture local correlations, informed by insights from jet physics. In addition to outperforming the standard linformer in jet classification tasks, SAL-T also achieves classification results comparable to full-attention transformers, while using considerably fewer resources with lower latency during inference. Experiments on a generic point cloud classification dataset (ModelNet10) further confirm this trend. Our code is available at https://github.com/aaronw5/SAL-T4HEP.
【8】An Enhanced Dual Transformer Contrastive Network for Multimodal Sentiment Analysis
标题:一种用于多模态情感分析的增强型双Transformer对比网络
链接:https://arxiv.org/abs/2510.23617
备注:The paper has been accepted for presentation at the MEDES 2025 conference
摘要:多模态情感分析(MSA)试图通过联合分析来自多个模态(通常是文本与图像)的数据来理解人类情感,从而提供比单模态方法更丰富、更准确的解读。在本文中,我们首先提出BERT-ViT-EF,一个通过早期融合策略将强大的Transformer编码器(文本输入用BERT,视觉输入用ViT)结合起来的新模型。这种方式有助于更深入的跨模态交互和更有效的联合表示学习。为进一步增强模型能力,我们在BERT-ViT-EF之上提出一个称为双Transformer对比网络(DTCN)的扩展。DTCN在BERT之后加入一个额外的Transformer编码器层以在融合前细化文本上下文,并采用对比学习来对齐文本与图像表示,促进鲁棒的多模态特征学习。在两个广泛使用的MSA基准MVSA-Single和TumEmo上的实证结果证明了方法的有效性:DTCN在TumEmo上取得最佳准确率(78.4%)和F1分数(78.3%),并在MVSA-Single上取得有竞争力的表现(准确率76.6%,F1分数75.9%)。这些改进凸显了基于Transformer的多模态情感分析中早期融合与更深层上下文建模的益处。
摘要:Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by jointly analyzing data from multiple modalities typically text and images offering a richer and more accurate interpretation than unimodal approaches. In this paper, we first propose BERT-ViT-EF, a novel model that combines powerful Transformer-based encoders BERT for textual input and ViT for visual input through an early fusion strategy. This approach facilitates deeper cross-modal interactions and more effective joint representation learning. To further enhance the model's capability, we propose an extension called the Dual Transformer Contrastive Network (DTCN), which builds upon BERT-ViT-EF. DTCN incorporates an additional Transformer encoder layer after BERT to refine textual context (before fusion) and employs contrastive learning to align text and image representations, fostering robust multimodal feature learning. Empirical results on two widely used MSA benchmarks MVSA-Single and TumEmo demonstrate the effectiveness of our approach. DTCN achieves best accuracy (78.4%) and F1-score (78.3%) on TumEmo, and delivers competitive performance on MVSA-Single, with 76.6% accuracy and 75.9% F1-score. These improvements highlight the benefits of early fusion and deeper contextual modeling in Transformer-based multimodal sentiment analysis.
GAN|对抗|攻击|生成相关(7篇)
【1】A Novel XAI-Enhanced Quantum Adversarial Networks for Velocity Dispersion Modeling in MaNGA Galaxies
标题:用于MaNGA星系速度弥散建模的新型XAI增强量子对抗网络
链接:https://arxiv.org/abs/2510.24598
摘要:当前的量子机器学习方法通常面临着平衡预测准确性、鲁棒性和可解释性的挑战。为了解决这个问题,我们提出了一种新的量子对抗框架,它将混合量子神经网络(QNN)与经典的深度学习层集成在一起,由具有基于LIME的可解释性的评估模型指导,并通过量子GAN和自监督变体进行扩展。在所提出的模型中,对抗评估器通过计算反馈损失同时指导QNN,从而优化预测精度和模型可解释性。实证评估表明,Vanilla模型实现了RMSE = 0.27,MSE = 0.071,MAE = 0.21,R^2 = 0.59,与对抗性模型相比,在回归指标上提供了最一致的性能。这些结果证明了将量子启发方法与经典架构相结合以开发轻量级,高性能和可解释的预测模型的潜力,从而使QML的适用性超越当前的限制。
摘要:Current quantum machine learning approaches often face challenges balancing predictive accuracy, robustness, and interpretability. To address this, we propose a novel quantum adversarial framework that integrates a hybrid quantum neural network (QNN) with classical deep learning layers, guided by an evaluator model with LIME-based interpretability, and extended through quantum GAN and self-supervised variants. In the proposed model, an adversarial evaluator concurrently guides the QNN by computing feedback loss, thereby optimizing both prediction accuracy and model explainability. Empirical evaluations show that the Vanilla model achieves RMSE = 0.27, MSE = 0.071, MAE = 0.21, and R^2 = 0.59, delivering the most consistent performance across regression metrics compared to adversarial counterparts. These results demonstrate the potential of combining quantum-inspired methods with classical architectures to develop lightweight, high-performance, and interpretable predictive models, advancing the applicability of QML beyond current limitations.
【2】Attack on a PUF-based Secure Binary Neural Network
标题:对基于PUF的安全二值化神经网络的攻击
链接:https://arxiv.org/abs/2510.24422
备注:Accepted at VLSID 2026. To be published in IEEE Xplore
摘要:部署在忆阻交叉开关阵列上的二进制神经网络(BNN)为边缘计算提供了节能解决方案,但由于忆阻器的非易失性而容易受到物理攻击。最近,Rajendran等人(IEEE Embedded Systems Letter 2025)提出了一种基于物理不可克隆功能(PUF)的方案,以保护BNN免受盗窃攻击。具体而言,通过基于设备的PUF密钥位交换列来保护BNN层的权重和偏置矩阵。 在本文中,我们证明了这种方案,以确保BNN是容易受到PUF密钥恢复攻击。由于我们的攻击,我们恢复的秘密权重和偏置矩阵的BNN。我们的方法是由差分密码分析的动机和重建的PUF密钥逐位通过观察模型精度的变化,并最终恢复BNN模型参数。在MNIST数据集上训练的BNN上进行评估,我们的攻击可以恢复85%的PUF密钥,并恢复BNN模型高达93%的分类准确率,而原始模型的准确率为96%。我们的攻击是非常有效的,它需要几分钟来恢复PUF密钥和模型参数。
摘要:Binarized Neural Networks (BNNs) deployed on memristive crossbar arrays provide energy-efficient solutions for edge computing but are susceptible to physical attacks due to memristor nonvolatility. Recently, Rajendran et al. (IEEE Embedded Systems Letters 2025) proposed a Physical Unclonable Function (PUF)-based scheme to secure BNNs against theft attacks. Specifically, the weight and bias matrices of the BNN layers were secured by swapping columns based on the device's PUF key bits. In this paper, we demonstrate that this scheme to secure BNNs is vulnerable to a PUF-key recovery attack. As a consequence of our attack, we recover the secret weight and bias matrices of the BNN. Our approach is motivated by differential cryptanalysis and reconstructs the PUF key bit-by-bit by observing the change in model accuracy, eventually recovering the BNN model parameters. Evaluated on a BNN trained on the MNIST dataset, our attack recovers 85% of the PUF key and restores the BNN model to 93% classification accuracy, compared to the original model's 96%. Our attack is very efficient, taking only a couple of minutes to recover the PUF key and the model parameters.
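The bit-by-bit recovery strategy described above can be sketched as a greedy loop: hypothesize one key bit at a time, undo the corresponding column swap, and keep the flip only if model accuracy improves. `unswap_columns` and `evaluate_accuracy` are hypothetical helpers standing in for the scheme's column-permutation and inference routines.

```python
def recover_puf_key(model_weights, key_len, evaluate_accuracy, unswap_columns):
    """Greedy bit-by-bit key recovery in the spirit of differential
    cryptanalysis: toggle a hypothesized key bit, undo the corresponding
    column swap, and keep the toggle iff accuracy improves.
    `evaluate_accuracy` and `unswap_columns` are hypothetical stand-ins."""
    key = [0] * key_len
    best = evaluate_accuracy(unswap_columns(model_weights, key))
    for i in range(key_len):
        key[i] ^= 1                                   # hypothesize bit i is 1
        acc = evaluate_accuracy(unswap_columns(model_weights, key))
        if acc > best:
            best = acc                                # keep the flip
        else:
            key[i] ^= 1                               # revert the guess
    return key, best
```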
【3】A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport
标题:公共交通合成出行数据生成的综合评估框架
链接:https://arxiv.org/abs/2510.24375
摘要:合成数据为在公共交通研究中使用智能卡数据的隐私和可访问性挑战提供了一个有前途的解决方案。尽管生成式建模取得了快速进展,但对综合评估的关注有限,因此不清楚合成数据的可靠性,安全性和有用性。现有的评价仍然是零散的,通常限于人口一级的代表性或记录一级的隐私,而不考虑群体一级的差异或特定任务的效用。为了解决这一差距,我们提出了一个代表性隐私效用(RPU)框架,系统地评估合成旅行数据在三个互补的维度和三个层次(记录,组,人口)。该框架集成了一套一致的指标来量化相似性、披露风险和实际有用性,从而实现对合成数据质量的透明和平衡评估。我们应用该框架对12种有代表性的生成方法进行了基准测试,这些方法涵盖了传统的统计模型、深度生成网络和隐私增强变体。结果表明,合成数据并不能保证隐私性,也不存在“一刀切”的模型,隐私性和代表性/效用之间的权衡是明显的。条件表生成对抗网络(CTGAN)提供了最平衡的权衡,并建议用于实际应用。RPU框架为研究人员和从业人员提供了一个系统的和可重复的基础,以比较合成数据生成技术,并在公共交通应用中选择合适的方法。
摘要:Synthetic data offers a promising solution to the privacy and accessibility challenges of using smart card data in public transport research. Despite rapid progress in generative modeling, there is limited attention to comprehensive evaluation, leaving unclear how reliable, safe, and useful synthetic data truly are. Existing evaluations remain fragmented, typically limited to population-level representativeness or record-level privacy, without considering group-level variations or task-specific utility. To address this gap, we propose a Representativeness-Privacy-Utility (RPU) framework that systematically evaluates synthetic trip data across three complementary dimensions and three hierarchical levels (record, group, population). The framework integrates a consistent set of metrics to quantify similarity, disclosure risk, and practical usefulness, enabling transparent and balanced assessment of synthetic data quality. We apply the framework to benchmark twelve representative generation methods, spanning conventional statistical models, deep generative networks, and privacy-enhanced variants. Results show that synthetic data do not inherently guarantee privacy and that there is no "one-size-fits-all" model; the trade-off between privacy and representativeness/utility is evident. The Conditional Tabular Generative Adversarial Network (CTGAN) provides the most balanced trade-off and is suggested for practical applications. The RPU framework provides a systematic and reproducible basis for researchers and practitioners to compare synthetic data generation techniques and select appropriate methods in public transport applications.
【4】What Can Be Recovered Under Sparse Adversarial Corruption? Assumption-Free Theory for Linear Measurements
标题:稀疏对抗性损坏下可以恢复什么?线性测量的无假设理论
链接:https://arxiv.org/abs/2510.24215
摘要:Let \(\bm{A} \in \mathbb{R}^{m \times n}\) be an arbitrary, known matrix and \(\bm{e}\) a \(q\)-sparse adversarial vector. Given \(\bm{y} = \bm{A} x^* + \bm{e}\) and \(q\), we seek the smallest set containing \(x^*\) - hence the one conveying maximal information about \(x^*\) - that is uniformly recoverable from \(\bm{y}\) without knowing \(\bm{e}\). While exact recovery of \(x^*\) via strong (and often impractical) structural assumptions on \(\bm{A}\) or \(x^*\) (for example, restricted isometry, sparsity) is well studied, recoverability for arbitrary \(\bm{A}\) and \(x^*\) remains open. Our main result shows that the best that one can hope to recover is \(x^* + \ker(\bm{U})\), where \(\bm{U}\) is the unique projection matrix onto the intersection of rowspaces of all possible submatrices of \(\bm{A}\) obtained by deleting \(2q\) rows. Moreover, we prove that every \(x\) that minimizes the \(\ell_0\)-norm of \(\bm{y} - \bm{A} x\) lies in \(x^* + \ker(\bm{U})\), which then gives a constructive approach to recover this set.
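For small instances, the projector \(\bm{U}\) can be computed directly from its definition by brute force, using the fact that the intersection of subspaces equals the null space of the stacked complementary projectors. The sketch below is feasible only for tiny \(m\), since it enumerates all row subsets.

```python
import numpy as np
from itertools import combinations
from scipy.linalg import null_space

def recovery_projector(A, q):
    """Projector U onto the intersection of the rowspaces of all submatrices
    of A with 2q rows deleted (brute force; small instances only)."""
    m, n = A.shape
    complements = []
    for kept in combinations(range(m), m - 2 * q):
        M = A[list(kept)].T                   # columns span rowspace(A_S)
        P = M @ np.linalg.pinv(M)             # orthogonal projector onto it
        complements.append(np.eye(n) - P)
    # x lies in every rowspace iff (I - P_i) x = 0 for all i.
    B = null_space(np.vstack(complements))    # orthonormal basis of intersection
    return B @ B.T                            # the projector U

U = recovery_projector(np.random.randn(8, 4), q=1)
```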
【5】Causal-Aware Generative Adversarial Networks with Reinforcement Learning
标题:具有强化学习的因果感知生成对抗网络
链接:https://arxiv.org/abs/2510.24046
摘要:表格数据在从模型训练到大规模数据分析等任务中的实用性通常受到隐私问题或监管障碍的限制。虽然现有的数据生成方法,特别是基于生成对抗网络(GANs)的方法,已经显示出了希望,但它们经常难以捕获复杂的因果关系,维护数据效用,并提供适合企业部署的可证明的隐私保证。我们介绍CA-GAN,一种专门设计用于解决现实世界表格数据集的这些挑战的新型生成框架。CA-GAN采用两步方法:因果图提取以学习数据流形中的稳健、全面的因果关系,然后是定制的条件WGAN-GP(Wasserstein GAN with Gradient Penalty),它仅根据因果图中节点的结构进行操作。更重要的是,生成器使用新的基于强化学习的目标进行训练,该目标将从真实和虚假数据构建的因果图对齐,确保在训练和采样阶段都有因果意识。我们在14个表格数据集上证明了CA-GAN优于6种SOTA方法。我们的评估重点关注核心数据工程指标:因果保留、效用保留和隐私保留。我们的方法为数据工程师提供了一种实用的高性能解决方案,这些数据工程师希望创建高质量、符合隐私的合成数据集,以基准测试数据库系统,加速软件开发,并促进安全的数据驱动研究。
摘要:The utility of tabular data for tasks ranging from model training to large-scale data analysis is often constrained by privacy concerns or regulatory hurdles. While existing data generation methods, particularly those based on Generative Adversarial Networks (GANs), have shown promise, they frequently struggle with capturing complex causal relationships, maintaining data utility, and providing provable privacy guarantees suitable for enterprise deployment. We introduce CA-GAN, a novel generative framework specifically engineered to address these challenges for real-world tabular datasets. CA-GAN utilizes a two-step approach: causal graph extraction to learn a robust, comprehensive causal relationship in the data's manifold, followed by a custom Conditional WGAN-GP (Wasserstein GAN with Gradient Penalty) that operates strictly according to the node structure of the causal graph. More importantly, the generator is trained with a new Reinforcement Learning-based objective that aligns the causal graphs constructed from real and fake data, ensuring causal awareness in both the training and sampling phases. We demonstrate CA-GAN's superiority over six SOTA methods across 14 tabular datasets. Our evaluations focus on core data engineering metrics: causal preservation, utility preservation, and privacy preservation. Our method offers a practical, high-performance solution for data engineers seeking to create high-quality, privacy-compliant synthetic datasets to benchmark database systems, accelerate software development, and facilitate secure data-driven research.
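The gradient-penalty term in the WGAN-GP backbone that CA-GAN customizes is standard and can be written compactly; the causal-graph conditioning and the RL alignment objective are specific to the paper and are not reproduced here. A minimal sketch for 2-D tabular batches:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Standard WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolates between real and fake samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mix).sum()
    grad, = torch.autograd.grad(score, mix, create_graph=True)
    return lam * ((grad.norm(2, dim=1) - 1) ** 2).mean()
```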
【6】Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments
标题:差异隐私:联邦学习环境中的梯度泄漏攻击
链接:https://arxiv.org/abs/2510.23931
备注:17 pages, 12 figures
摘要:联合学习(FL)允许以协作方式训练机器学习模型,而无需共享敏感数据。然而,它仍然容易受到梯度泄漏攻击(GLAs),这可能会泄露共享模型更新中的私人信息。在这项工作中,我们研究了差分隐私(DP)机制的有效性-特别是DP-SGD和基于显式正则化(PDP-SGD)的变体-作为对GLA的防御。为此,我们评估了在不同隐私级别下训练的几种计算机视觉模型在简单分类任务中的性能,然后分析了在模拟FL环境中从截获的梯度中获得的私有数据重建的质量。我们的研究结果表明,DP-SGD显着减轻梯度泄漏攻击的风险,尽管在模型效用的适度权衡。相比之下,PDP-SGD保持了很强的分类性能,但作为对重建攻击的实际防御证明是无效的。这些发现突出了经验评估隐私机制超出其理论保证的重要性,特别是在分布式学习场景中,信息泄露可能对数据安全和隐私构成不可否认的严重威胁。
摘要:Federated Learning (FL) allows for the training of Machine Learning models in a collaborative manner without the need to share sensitive data. However, it remains vulnerable to Gradient Leakage Attacks (GLAs), which can reveal private information from the shared model updates. In this work, we investigate the effectiveness of Differential Privacy (DP) mechanisms - specifically, DP-SGD and a variant based on explicit regularization (PDP-SGD) - as defenses against GLAs. To this end, we evaluate the performance of several computer vision models trained under varying privacy levels on a simple classification task, and then analyze the quality of private data reconstructions obtained from the intercepted gradients in a simulated FL environment. Our results demonstrate that DP-SGD significantly mitigates the risk of gradient leakage attacks, albeit with a moderate trade-off in model utility. In contrast, PDP-SGD maintains strong classification performance but proves ineffective as a practical defense against reconstruction attacks. These findings highlight the importance of empirically evaluating privacy mechanisms beyond their theoretical guarantees, particularly in distributed learning scenarios where information leakage may represent a critical threat to data security and privacy.
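The DP-SGD mechanism evaluated here rests on two steps: clip each per-sample gradient to a fixed norm, then add calibrated Gaussian noise to the average. Below is a minimal single-update sketch; it uses a naive per-sample loop (production implementations vectorize this), and the learning rate, clip norm, and noise multiplier are illustrative.

```python
import torch

def dp_sgd_step(model, loss_fn, batch, lr=0.1, clip=1.0, noise_mult=1.1):
    """One DP-SGD update: per-sample gradient clipping + Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    xs, ys = batch
    for x, y in zip(xs, ys):                       # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = (clip / (norm + 1e-12)).clamp(max=1.0)
        for s, p in zip(summed, params):
            s.add_(p.grad, alpha=float(scale))     # accumulate clipped grad
    n = len(xs)
    with torch.no_grad():
        for s, p in zip(summed, params):
            # Noise scale is noise_mult * clip, the standard calibration.
            p -= lr * (s + noise_mult * clip * torch.randn_like(s)) / n
```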
【7】Generating Creative Chess Puzzles
标题:生成创意国际象棋谜题
链接:https://arxiv.org/abs/2510.23881
摘要:虽然生成式人工智能在各个领域都在迅速发展,但生成真正有创意、美观和反直觉的输出仍然是一个挑战。本文提出了一种在国际象棋谜题领域应对这些困难的方法。我们首先对生成式AI架构进行基准测试,然后引入一个基于国际象棋引擎搜索统计的具有新颖奖励的RL框架,以克服其中的一些缺点。奖励旨在增强谜题的独特性、反直觉性、多样性和真实性。我们的RL方法将反直觉谜题生成量大幅提高了10倍,从0.22%(监督)提高到2.5%,超过了现有的数据集率(2.1%)和最好的Lichess训练模型(0.4%)。我们的谜题符合新颖性和多样性的基准,保留美学主题,并被人类专家评为比人工创作的书籍谜题更有创意、更令人愉快、更反直觉,甚至接近经典作品。我们的最终成果是这些AI生成谜题的精选小册子,其创造力得到了三位世界知名专家的认可。
摘要:While Generative AI rapidly advances in various domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to tackle these difficulties in the domain of chess puzzles. We start by benchmarking Generative AI architectures, and then introduce an RL framework with novel rewards based on chess engine search statistics to overcome some of those shortcomings. The rewards are designed to enhance a puzzle's uniqueness, counter-intuitiveness, diversity, and realism. Our RL approach dramatically increases counter-intuitive puzzle generation by 10x, from 0.22% (supervised) to 2.5%, surpassing existing dataset rates (2.1%) and the best Lichess-trained model (0.4%). Our puzzles meet novelty and diversity benchmarks, retain aesthetic themes, and are rated by human experts as more creative, enjoyable, and counter-intuitive than composed book puzzles, even approaching classic compositions. Our final outcome is a curated booklet of these AI-generated puzzles, which is acknowledged for creativity by three world-renowned experts.
半/弱/无/有监督|不确定性|主动学习(5篇)
【1】Semi-supervised and unsupervised learning for health indicator extraction from guided waves in aerospace composite structures
标题:用于从航空航天复合材料结构导波中提取健康指标的半监督和无监督学习
链接:https://arxiv.org/abs/2510.24614
摘要:健康指标(HI)是诊断和预后航空航天复合材料结构状况的核心,能够实现有效的维护和操作安全。然而,提取可靠的HI仍然具有挑战性,由于材料性能的变化,随机损伤演变,和不同的损伤模式。制造缺陷(例如,脱粘)和使用中的事故(例如,鸟撞击)进一步使该过程复杂化。这项研究提出了一个全面的数据驱动的框架,通过两种学习方法与多域信号处理集成来学习HI。由于地面实况HI不可用,因此提出了半监督和无监督方法:(i)多样性深度半监督异常检测(Diversity-DeepSAD)方法,其用连续辅助标签作为假设的损伤代理来增强,其克服了仅区分健康和故障状态而忽略中间退化的现有二进制标签的限制,以及(ii)退化趋势约束变分自编码器(DTC-VAE),其中单调性准则经由显式趋势约束嵌入。采用多频率激励的导波对单筋复合材料结构进行疲劳监测。时间,频率和时间-频率表示进行了探索,并通过无监督集成学习融合每个频率的HI,以减轻频率依赖性和减少方差。使用快速傅立叶变换特征,增强的Diversity-DeepSAD模型实现了81.6%的性能,而DTC-VAE提供了最一致的HI,性能为92.3%,优于现有的基线。
摘要:Health indicators (HIs) are central to diagnosing and prognosing the condition of aerospace composite structures, enabling efficient maintenance and operational safety. However, extracting reliable HIs remains challenging due to variability in material properties, stochastic damage evolution, and diverse damage modes. Manufacturing defects (e.g., disbonds) and in-service incidents (e.g., bird strikes) further complicate this process. This study presents a comprehensive data-driven framework that learns HIs via two learning approaches integrated with multi-domain signal processing. Because ground-truth HIs are unavailable, a semi-supervised and an unsupervised approach are proposed: (i) a diversity deep semi-supervised anomaly detection (Diversity-DeepSAD) approach augmented with continuous auxiliary labels used as hypothetical damage proxies, which overcomes the limitation of prior binary labels that only distinguish healthy and failed states while neglecting intermediate degradation, and (ii) a degradation-trend-constrained variational autoencoder (DTC-VAE), in which the monotonicity criterion is embedded via an explicit trend constraint. Guided waves with multiple excitation frequencies are used to monitor single-stiffener composite structures under fatigue loading. Time, frequency, and time-frequency representations are explored, and per-frequency HIs are fused via unsupervised ensemble learning to mitigate frequency dependence and reduce variance. Using fast Fourier transform features, the augmented Diversity-DeepSAD model achieved 81.6% performance, while DTC-VAE delivered the most consistent HIs with 92.3% performance, outperforming existing baselines.
【2】Informed Initialization for Bayesian Optimization and Active Learning
标题:贝叶斯优化和主动学习的知情初始化
链接:https://arxiv.org/abs/2510.23681
备注:28 pages
摘要:贝叶斯优化是一种广泛使用的优化昂贵的黑盒函数的方法,依赖于概率代理模型,如高斯过程。代理模型的质量对于良好的优化性能至关重要,特别是在只能评估少量点批次的Few-Shot设置中。在这种情况下,初始化在塑造代理的预测质量和指导后续优化方面起着至关重要的作用。尽管如此,实践者通常依赖于(准)随机设计来覆盖输入空间。然而,这种方法忽略了两个关键因素:(a)空间填充设计可能不适合减少预测的不确定性,以及(b)初始化期间的有效超参数学习对于高质量预测至关重要,这可能与空间填充设计冲突。为了解决这些限制,我们提出了超参数信息预测探索(HIPE),一种新的获取策略,使用信息理论原理平衡预测不确定性降低与超参数学习。我们推导出一个封闭形式的表达HIPE在高斯过程设置,并证明其有效性,通过广泛的实验,在主动学习和Few-Shot BO。我们的研究结果表明,HIPE优于标准的初始化策略的预测精度,超参数识别,和随后的优化性能,特别是在大批量,Few-Shot设置相关的许多现实世界的贝叶斯优化应用程序。
摘要:Bayesian Optimization is a widely used method for optimizing expensive black-box functions, relying on probabilistic surrogate models such as Gaussian Processes. The quality of the surrogate model is crucial for good optimization performance, especially in the few-shot setting where only a small number of batches of points can be evaluated. In this setting, the initialization plays a critical role in shaping the surrogate's predictive quality and guiding subsequent optimization. Despite this, practitioners typically rely on (quasi-)random designs to cover the input space. However, such approaches neglect two key factors: (a) space-filling designs may not be desirable to reduce predictive uncertainty, and (b) efficient hyperparameter learning during initialization is essential for high-quality prediction, which may conflict with space-filling designs. To address these limitations, we propose Hyperparameter-Informed Predictive Exploration (HIPE), a novel acquisition strategy that balances predictive uncertainty reduction with hyperparameter learning using information-theoretic principles. We derive a closed-form expression for HIPE in the Gaussian Process setting and demonstrate its effectiveness through extensive experiments in active learning and few-shot BO. Our results show that HIPE outperforms standard initialization strategies in terms of predictive accuracy, hyperparameter identification, and subsequent optimization performance, particularly in large-batch, few-shot settings relevant to many real-world Bayesian Optimization applications.
【3】SAND: A Self-supervised and Adaptive NAS-Driven Framework for Hardware Trojan Detection
标题:SAND:一个用于硬件特洛伊木马检测的自监督和自适应NAS驱动框架
链接:https://arxiv.org/abs/2510.23643
摘要:全球化的半导体供应链使得硬件木马(Hardware Trojans,HT)成为嵌入式系统的重要安全威胁,设计高效且适应性强的检测机制成为必要。尽管在文献中有很有前途的基于机器学习的HT检测技术,但它们受到特设特征选择和缺乏自适应性的影响,所有这些都阻碍了它们在不同HT攻击中的有效性。在本文中,我们提出了SAND,这是一种用于高效HT检测的自监督和自适应NAS驱动框架。具体而言,本文做出了三个关键贡献。(1)我们利用自监督学习(SSL)来实现自动化特征提取,消除对手动设计特征的依赖。(2)SAND集成了神经架构搜索(NAS),以动态优化下游分类器,允许以最小的微调无缝适应看不见的基准。(3)实验结果表明,SAND实现了显着提高检测准确率(高达18.3%)比国家的最先进的方法,表现出较高的弹性对规避木马,并表现出较强的泛化能力。
摘要:The globalized semiconductor supply chain has made Hardware Trojans (HT) a significant security threat to embedded systems, necessitating the design of efficient and adaptable detection mechanisms. Despite promising machine learning-based HT detection techniques in the literature, they suffer from ad hoc feature selection and the lack of adaptivity, all of which hinder their effectiveness across diverse HT attacks. In this paper, we propose SAND, a self-supervised and adaptive NAS-driven framework for efficient HT detection. Specifically, this paper makes three key contributions. (1) We leverage self-supervised learning (SSL) to enable automated feature extraction, eliminating the dependency on manually engineered features. (2) SAND integrates neural architecture search (NAS) to dynamically optimize the downstream classifier, allowing for seamless adaptation to unseen benchmarks with minimal fine-tuning. (3) Experimental results show that SAND achieves a significant improvement in detection accuracy (up to 18.3%) over state-of-the-art methods, exhibits high resilience against evasive Trojans, and demonstrates strong generalization.
【4】Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades
标题:用于数据驱动缺陷检测和表征的无监督机器学习管道:应用于位移级联
链接:https://arxiv.org/abs/2510.24523
备注:22 pages, 1 graphical abstract, 7 figures, 4 tables
摘要:中子辐照在几皮秒内产生位移级联,即原子碰撞产生点缺陷和扩展缺陷的序列,其随后影响材料的长期演变。从形态学和统计学上表征的这些缺陷的多样性定义了所谓的"原发性损伤"。在这项工作中,我们提出了一个完全无监督的机器学习(ML)工作流程,可以直接从分子动力学数据中检测和分类这些缺陷。局部环境由原子位置平滑重叠(SOAP)向量编码,异常原子用自动编码器神经网络(AE)分离,用均匀流形近似与投影(UMAP)嵌入,并使用基于层次密度的带噪声应用空间聚类(HDBSCAN)进行聚类。应用到Ni、Fe70Ni10Cr20和Zr中80千电子伏的位移级联,AE成功地识别了参与缺陷形成的一小部分离群原子。然后,HDBSCAN将AE标记的SOAP描述符的UMAP潜在空间划分为定义良好的组,这些组表示空位和间隙原子占主导地位的区域,并且在每个组中,将小聚集体与大聚集体分开,将99.7%的离群值分配给紧凑的物理基序。带符号的簇识别分数证实了这种分离,并且簇大小与净缺陷计数成比例(R2 > 0.89)。ML离群图和几种传统检测器(中心对称、位错提取等)之间的统计交叉分析显示出强的重叠和互补覆盖,所有这些都是在没有模板或阈值调整的情况下实现的。因此,这种ML工作流程提供了一种有效的工具,用于定量映射材料中的结构异常,特别是那些由位移级联中的辐射损伤引起的结构异常。
摘要:Neutron irradiation produces, within a few picoseconds, displacement cascades that are sequences of atomic collisions generating point and extended defects which subsequently affect the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector, anomalous atoms are isolated with autoencoder neural networks (AE), embedded with Uniform Manifold Approximation and Projection (UMAP) and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe70Ni10Cr20, and Zr, the AE successfully identifies the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7% of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts (R2 > 0.89). Statistical cross analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.
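The detection pipeline (descriptor -> autoencoder reconstruction error -> UMAP -> HDBSCAN) maps naturally onto the umap-learn and hdbscan packages. The sketch below is a schematic of that chain; the autoencoder width, training budget, outlier fraction, and minimum cluster size are illustrative choices, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn as nn
import umap       # umap-learn
import hdbscan

def detect_defects(descriptors, outlier_frac=0.02, epochs=200):
    """SOAP-style descriptors (N, d) -> AE reconstruction error -> flag the
    worst `outlier_frac` -> 2-D UMAP embedding -> HDBSCAN clusters."""
    X = torch.tensor(descriptors, dtype=torch.float32)
    d = X.shape[1]
    ae = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 8),
                       nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, d))
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):                       # full-batch AE training
        opt.zero_grad()
        loss = ((ae(X) - X) ** 2).mean()
        loss.backward()
        opt.step()
    err = ((ae(X) - X) ** 2).mean(dim=1).detach().numpy()
    flagged = err > np.quantile(err, 1 - outlier_frac)   # anomalous atoms
    emb = umap.UMAP(n_components=2).fit_transform(descriptors[flagged])
    labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(emb)
    return flagged, labels
```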
【5】Self-supervised Synthetic Pretraining for Inference of Stellar Mass Embedded in Dense Gas
标题:自监督合成预训练推断致密气体中恒星质量
链接:https://arxiv.org/abs/2510.24159
备注:6 pages, 3 figures, 1 table, accepted for NeurIPS 2025 ML4PS workshop
摘要:恒星质量是决定恒星性质和演化的基本量。然而,估计恒星形成区域的恒星质量是具有挑战性的,因为年轻恒星被致密气体所掩盖,并且该区域是高度不均匀的,使得球形动力学估计不可靠。监督机器学习可以将这种复杂的结构与恒星质量联系起来,但它需要来自高分辨率磁流体动力学(MHD)模拟的大型高质量标记数据集,这在计算上是昂贵的。我们通过使用自监督框架DINOv2在一百万个合成分形图像上预训练Vision Transformer来解决这个问题,然后将冻结模型应用于有限的高分辨率MHD模拟。我们的研究结果表明,合成预训练改善了冻结特征回归恒星质量预测,预训练模型的表现略好于在相同的有限模拟上训练的监督模型。提取的特征的主成分分析进一步揭示了语义上有意义的结构,这表明该模型能够在不需要标记数据或微调的情况下对恒星形成区域进行无监督分割。
摘要:Stellar mass is a fundamental quantity that determines the properties and evolution of stars. However, estimating stellar masses in star-forming regions is challenging because young stars are obscured by dense gas and the regions are highly inhomogeneous, making spherical dynamical estimates unreliable. Supervised machine learning could link such complex structures to stellar mass, but it requires large, high-quality labeled datasets from high-resolution magneto-hydrodynamical (MHD) simulations, which are computationally expensive. We address this by pretraining a vision transformer on one million synthetic fractal images using the self-supervised framework DINOv2, and then applying the frozen model to limited high-resolution MHD simulations. Our results demonstrate that synthetic pretraining improves frozen-feature regression stellar mass predictions, with the pretrained model performing slightly better than a supervised model trained on the same limited simulations. Principal component analysis of the extracted features further reveals semantically meaningful structures, suggesting that the model enables unsupervised segmentation of star-forming regions without the need for labeled data or fine-tuning.
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation
标题:使用基于前缀的适应的Zero-Shot跨语言迁移
链接:https://arxiv.org/abs/2510.24619
备注:12 Pages
摘要:随着Llama和Mistral等新的大型语言模型(LLM)的发布,由于其多语言预训练和强大的泛化能力,zero-shot跨语言迁移变得越来越可行。然而,使这些仅解码器的LLM适应跨语言的新任务仍然具有挑战性。虽然像低秩自适应(LoRA)这样的参数高效微调(PeFT)技术被广泛使用,但是诸如软提示调谐、前缀调谐和Llama适配器之类的基于前缀的技术很少被探索,特别是对于仅解码器模型中的zero-shot传输。我们提出了一个全面的研究三个基于前缀的方法zero-shot跨语言迁移从英语到35+高和低资源的语言。我们的分析进一步探讨了跨语言家族和脚本的迁移,以及将模型大小从1B扩展到24B的影响。使用Llama 3.1 8B,前缀方法在Belebele基准测试中的表现优于LoRA基线高达6%。使用Mistral v0.3 7B也观察到类似的改善。尽管只使用了1.23M的前缀调整学习参数,我们在不同的基准测试中实现了一致的改进。这些发现突出了基于前缀的技术作为LoRA的有效和可扩展的替代方案的潜力,特别是在低资源的多语言环境中。
摘要:With the release of new large language models (LLMs) like Llama and Mistral, zero-shot cross-lingual transfer has become increasingly feasible due to their multilingual pretraining and strong generalization capabilities. However, adapting these decoder-only LLMs to new tasks across languages remains challenging. While parameter-efficient fine-tuning (PeFT) techniques like Low-Rank Adaptation (LoRA) are widely used, prefix-based techniques such as soft prompt tuning, prefix tuning, and Llama Adapter are less explored, especially for zero-shot transfer in decoder-only models. We present a comprehensive study of three prefix-based methods for zero-shot cross-lingual transfer from English to 35+ high- and low-resource languages. Our analysis further explores transfer across linguistic families and scripts, as well as the impact of scaling model sizes from 1B to 24B. With Llama 3.1 8B, prefix methods outperform LoRA-baselines by up to 6% on the Belebele benchmark. Similar improvements were observed with Mistral v0.3 7B as well. Despite using only 1.23M learning parameters with prefix tuning, we achieve consistent improvements across diverse benchmarks. These findings highlight the potential of prefix-based techniques as an effective and scalable alternative to LoRA, particularly in low-resource multilingual settings.
【2】LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
标题:LoRA-DA:通过渐近分析进行低秩适应的数据感知初始化
链接:https://arxiv.org/abs/2510.24561
摘要:随着LLM的广泛应用,LoRA已成为PEFT的主导方法,其初始化方法也越来越受到关注。然而,现有的方法有明显的局限性:许多方法不结合目标域数据,而基于梯度的方法通过依赖于一步梯度分解仅在浅层次上利用数据,这由于用作其基础的一步微调模型的弱经验性能而仍然不能令人满意,以及这些方法要么缺乏严格的理论基础要么严重依赖于限制性的各向同性假设的事实。在本文中,我们建立了一个理论框架的数据感知LoRA初始化的基础上渐近分析。从最小化微调模型和目标模型之间的参数差异的期望的一般优化目标出发,我们导出了具有两个分量的优化问题:偏差项,其与微调模型和目标模型之间的参数距离有关,并且使用Fisher梯度公式近似以保持各向异性;和方差项,它说明了通过Fisher信息抽样随机性引入的不确定性。通过求解该问题,我们得到了LoRA的最优初始化策略。在此理论框架的基础上,我们开发了一种高效的算法LoRA-DA,它从一小组目标域样本中估计优化问题中的项,并获得最佳LoRA初始化。多个基准测试的实证结果表明,LoRA-DA始终提高了现有初始化方法的最终准确性。进一步的研究表明,更快,更稳定的收敛,跨等级的鲁棒性,只有一个小的初始化开销LoRA-DA。源代码将在出版后发布。
摘要:With the widespread adoption of LLMs, LoRA has become a dominant method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable limitations: many methods do not incorporate target-domain data, while gradient-based methods exploit data only at a shallow level by relying on one-step gradient decomposition, which remains unsatisfactory due to the weak empirical performance of the one-step fine-tuning model that serves as their basis, as well as the fact that these methods either lack a rigorous theoretical foundation or depend heavily on restrictive isotropic assumptions. In this paper, we establish a theoretical framework for data-aware LoRA initialization based on asymptotic analysis. Starting from a general optimization objective that minimizes the expectation of the parameter discrepancy between the fine-tuned and target models, we derive an optimization problem with two components: a bias term, which is related to the parameter distance between the fine-tuned and target models, and is approximated using a Fisher-gradient formulation to preserve anisotropy; and a variance term, which accounts for the uncertainty introduced by sampling stochasticity through the Fisher information. By solving this problem, we obtain an optimal initialization strategy for LoRA. Building on this theoretical framework, we develop an efficient algorithm, LoRA-DA, which estimates the terms in the optimization problem from a small set of target domain samples and obtains the optimal LoRA initialization. Empirical results across multiple benchmarks demonstrate that LoRA-DA consistently improves final accuracy over existing initialization methods. Additional studies show faster, more stable convergence, robustness across ranks, and only a small initialization overhead for LoRA-DA. The source code will be released upon publication.
【3】UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
标题:UtilGen:以效用为中心的生成数据增强,具有双层任务适应
链接:https://arxiv.org/abs/2510.24262
备注:39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:使用生成模型的数据增强已经成为增强计算机视觉任务性能的强大范例。然而,大多数现有的增强方法主要集中在优化固有的数据属性(如保真度和多样性),以生成视觉上高质量的合成数据,而往往忽略了特定于任务的要求。然而,数据生成器必须考虑下游任务的需求,因为不同任务和网络架构的训练数据需求可能会有很大差异。为了解决这些局限性,我们提出了一种新的以实用程序为中心的数据增强框架UtilGen,它自适应地优化了数据生成过程,通过下游任务反馈生成特定于任务的高实用程序训练数据。具体来说,我们首先引入一个权重分配网络来评估每个合成样本的特定任务效用。在这些评估的指导下,UtilGen使用双层优化策略迭代地改进数据生成过程,以最大化合成数据效用:(1)模型级优化为下游任务定制生成模型,(2)实例级优化在每个生成轮调整生成策略-例如提示嵌入和初始噪声。在不同复杂度和粒度的八个基准数据集上进行的大量实验表明,UtilGen始终实现了卓越的性能,与以前的SOTA相比,平均精度提高了3.87%。对数据影响和分布的进一步分析表明,UtilGen产生了更有影响力和任务相关的合成数据,验证了从以视觉特征为中心到以任务实用程序为中心的数据增强的范式转变的有效性。
摘要:Data augmentation using generative models has emerged as a powerful paradigm for enhancing performance in computer vision tasks. However, most existing augmentation approaches primarily focus on optimizing intrinsic data attributes -- such as fidelity and diversity -- to generate visually high-quality synthetic data, while often neglecting task-specific requirements. Yet, it is essential for data generators to account for the needs of downstream tasks, as training data requirements can vary significantly across different tasks and network architectures. To address these limitations, we propose UtilGen, a novel utility-centric data augmentation framework that adaptively optimizes the data generation process to produce task-specific, high-utility training data via downstream task feedback. Specifically, we first introduce a weight allocation network to evaluate the task-specific utility of each synthetic sample. Guided by these evaluations, UtilGen iteratively refines the data generation process using a dual-level optimization strategy to maximize the synthetic data utility: (1) model-level optimization tailors the generative model to the downstream task, and (2) instance-level optimization adjusts generation policies -- such as prompt embeddings and initial noise -- at each generation round. Extensive experiments on eight benchmark datasets of varying complexity and granularity demonstrate that UtilGen consistently achieves superior performance, with an average accuracy improvement of 3.87% over previous SOTA. Further analysis of data influence and distribution reveals that UtilGen produces more impactful and task-relevant synthetic data, validating the effectiveness of the paradigm shift from visual characteristics-centric to task utility-centric data augmentation.
【4】PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling
标题:PaTaRM:通过偏好感知任务自适应奖励建模桥接成对和逐点信号
链接:https://arxiv.org/abs/2510.24235
摘要:奖励模型(RM)是人类反馈强化学习(RLHF)的核心,提供了将大型语言模型(LLM)与人类偏好相匹配的关键监督信号。虽然生成奖励模型(GRM)比传统的标量RM提供了更好的可解释性,但目前的训练范式仍然有限。成对方法依赖于二进制好与坏标签,这会导致点式推理的不匹配,并且需要复杂的配对策略才能在RLHF中有效应用。另一方面,逐点方法需要更精细的绝对标记与规则驱动的标准,导致适应性差,注释成本高。在这项工作中,我们提出了偏好感知任务自适应奖励模型(PaTaRM),一个统一的框架,集成了偏好感知奖励(PAR)机制与动态规则适应。PaTaRM利用来自成对数据的相对偏好信息来构建鲁棒的逐点训练信号,从而消除了对显式逐点标签的需要。同时,它采用了一个任务自适应的规则系统,灵活地生成评估标准的全局任务一致性和特定实例的细粒度推理。该设计实现了RLHF的高效、可推广和可解释的奖励建模。大量的实验表明,PaTaRM在Qwen3-8B和Qwen3-14B模型上的RewardBench和RMBench上实现了4.7%的平均相对改进。此外,PaTaRM提升了下游RLHF性能,在IFEval和InFoBench基准测试中平均提高了13.6%,证实了其有效性和鲁棒性。我们的代码可在https://github.com/JaneEyre0530/PaTaRM上获得。
摘要:Reward models (RMs) are central to reinforcement learning from human feedback (RLHF), providing the critical supervision signals that align large language models (LLMs) with human preferences. While generative reward models (GRMs) offer greater interpretability than traditional scalar RMs, current training paradigms remain limited. Pair-wise methods rely on binary good-versus-bad labels, which cause mismatches for point-wise inference and necessitate complex pairing strategies for effective application in RLHF. On the other hand, point-wise methods require more elaborate absolute labeling with rubric-driven criteria, resulting in poor adaptability and high annotation costs. In this work, we propose the Preference-Aware Task-Adaptive Reward Model (PaTaRM), a unified framework that integrates a preference-aware reward (PAR) mechanism with dynamic rubric adaptation. PaTaRM leverages relative preference information from pairwise data to construct robust point-wise training signals, eliminating the need for explicit point-wise labels. Simultaneously, it employs a task-adaptive rubric system that flexibly generates evaluation criteria for both global task consistency and instance-specific fine-grained reasoning. This design enables efficient, generalizable, and interpretable reward modeling for RLHF. Extensive experiments show that PaTaRM achieves an average relative improvement of 4.7% on RewardBench and RMBench across Qwen3-8B and Qwen3-14B models. Furthermore, PaTaRM boosts downstream RLHF performance, with an average improvement of 13.6% across IFEval and InFoBench benchmarks, confirming its effectiveness and robustness. Our code is available at https://github.com/JaneEyre0530/PaTaRM.
【5】Auto-Adaptive PINNs with Applications to Phase Transitions
标题:应用于相变的自适应PINN
链接:https://arxiv.org/abs/2510.23999
摘要:我们提出了一种自适应采样方法,用于训练物理信息神经网络(PINN),该方法允许基于任意特定问题的启发式进行采样,该启发式可能取决于网络及其梯度。特别是,我们将分析集中在Allen-Cahn方程上,试图在不进行任何事后重采样的情况下,使用PINN准确求解特征性的界面区域。在实验中,我们展示了这些方法相对于残差自适应框架的有效性。
摘要:We propose an adaptive sampling method for the training of Physics Informed Neural Networks (PINNs) which allows for sampling based on an arbitrary problem-specific heuristic which may depend on the network and its gradients. In particular we focus our analysis on the Allen-Cahn equations, attempting to accurately resolve the characteristic interfacial regions using a PINN without any post-hoc resampling. In experiments, we show the effectiveness of these methods over residual-adaptive frameworks.
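A simple instance of such a heuristic is residual-proportional resampling: score a pool of candidate collocation points by the magnitude of the PDE residual under the current network, and draw new training points with probability proportional to that score. The sketch below assumes a `residual_fn` that evaluates the Allen-Cahn residual (handling its own autograd internally); the domain bounds and the sharpening exponent are illustrative.

```python
import torch

def adaptive_sample(residual_fn, n_candidates=10000, n_select=1000, k=2.0):
    """Sample collocation points with probability proportional to the
    |PDE residual|^k of the current network over a uniform candidate pool
    on [-1, 1] x [0, 1]. Larger k concentrates points near the interface."""
    x = torch.rand(n_candidates, 1) * 2 - 1        # space in [-1, 1]
    t = torch.rand(n_candidates, 1)                # time in [0, 1]
    score = residual_fn(x, t).detach().abs().pow(k).flatten()
    probs = score / score.sum()
    idx = torch.multinomial(probs, n_select, replacement=False)
    return x[idx], t[idx]
```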
【6】Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models
标题:文本到图像扩散模型的扩散自适应文本嵌入
链接:https://arxiv.org/abs/2510.23974
备注:Accepted at NeurIPS 2025
摘要:文本到图像扩散模型依赖于来自预先训练的文本编码器的文本嵌入,但这些嵌入在所有扩散时间步中保持固定,限制了它们对生成过程的适应性。我们提出了扩散自适应文本嵌入(DATE),它基于中间扰动数据在每个扩散时间步动态更新文本嵌入。我们制定了一个优化问题,并推导出一个更新规则,在每个采样步骤细化文本嵌入,以提高平均预测图像和文本之间的对齐和偏好。这允许DATE在整个扩散采样过程中动态地将文本条件适应于反向扩散图像,而无需额外的模型训练。通过理论分析和实证结果,我们表明,DATE保持了模型的生成能力,同时在各种任务中提供优于固定文本嵌入的文本图像对齐,包括多概念生成和文本引导的图像编辑。我们的代码可在https://github.com/aailab-kaist/DATE上获得。
摘要:Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the generative process. We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. We formulate an optimization problem and derive an update rule that refines the text embeddings at each sampling step to improve alignment and preference between the mean predicted image and the text. This allows DATE to dynamically adapt the text conditions to the reverse-diffused images throughout diffusion sampling without requiring additional model training. Through theoretical analysis and empirical results, we show that DATE maintains the generative capability of the model while providing superior text-image alignment over fixed text embeddings across various tasks, including multi-concept generation and text-guided image editing. Our code is available at https://github.com/aailab-kaist/DATE.
【7】ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning
标题:ScaLoRA:最优缩放的低秩自适应,实现高效的高秩微调
链接:https://arxiv.org/abs/2510.23818
摘要:随着大型语言模型(LLM)的规模不断扩大,计算开销已成为特定于任务的微调的主要瓶颈。虽然低秩自适应(LoRA)通过将权重更新限制在低维子空间来有效地减少这种成本,但这种限制可能会阻碍有效性和缓慢收敛。这种贡献通过从连续的低秩增量逐渐累积高秩权重更新来处理这些限制。具体地说,每次更新的最佳低秩矩阵被确定为最小化损失函数和密切接近全微调。为了在不重新启动的情况下赋予高效和无缝的优化,通过适当地缩放原始低秩矩阵的列来形成该最优选择。严格的性能保证表明,最佳缩放可以通过分析找到。使用流行的LLM扩展到120亿个参数的广泛数值测试表明,相对于最先进的LoRA变体,在包括自然语言理解,常识推理和数学问题解决在内的各种任务上,性能增益一致,收敛速度快。
摘要:As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectively curtails this cost by confining the weight updates to a low-dimensional subspace, such a restriction can hinder effectiveness and slow convergence. This contribution deals with these limitations by accumulating progressively a high-rank weight update from consecutive low-rank increments. Specifically, the per update optimal low-rank matrix is identified to minimize the loss function and closely approximate full fine-tuning. To endow efficient and seamless optimization without restarting, this optimal choice is formed by appropriately scaling the columns of the original low-rank matrix. Rigorous performance guarantees reveal that the optimal scaling can be found analytically. Extensive numerical tests with popular LLMs scaling up to 12 billion parameters demonstrate a consistent performance gain and fast convergence relative to state-of-the-art LoRA variants on diverse tasks including natural language understanding, commonsense reasoning, and mathematical problem solving.
【8】PULSE: Privileged Knowledge Transfer from Electrodermal Activity to Low-Cost Sensors for Stress Monitoring
标题:PULSE:从皮电活动到用于压力监测的低成本传感器的特权知识转移
链接:https://arxiv.org/abs/2510.24058
备注:Accepted as a findings paper at ML4H 2025
摘要:皮肤电活动(EDA)是压力检测的主要信号,需要昂贵的硬件,而这些硬件通常在现实世界的可穿戴设备中无法获得。在本文中,我们提出了PULSE,这是一个在自我监督预训练期间专门利用EDA的框架,同时可以在没有EDA的情况下进行推理,但可以使用更容易获得的模式,如ECG,BVP,ACC和TEMP。我们的方法将编码器输出分为共享和私有嵌入。我们对齐跨模态的共享嵌入,并将它们融合到模态不变的表示中。私有嵌入携带特定于模态的信息以支持重建目标。预训练之后是知识转移,其中冻结的EDA教师将同情唤醒表示转移到学生编码器中。在WESAD上,我们的方法实现了强大的应力检测性能,表明特权EDA的表示可以转移到低成本传感器上,以提高精度,同时降低硬件成本。
摘要:Electrodermal activity (EDA), the primary signal for stress detection, requires costly hardware often unavailable in real-world wearables. In this paper, we propose PULSE, a framework that utilizes EDA exclusively during self-supervised pretraining, while enabling inference without EDA but with more readily available modalities such as ECG, BVP, ACC, and TEMP. Our approach separates encoder outputs into shared and private embeddings. We align shared embeddings across modalities and fuse them into a modality-invariant representation. The private embeddings carry modality-specific information to support the reconstruction objective. Pretraining is followed by knowledge transfer where a frozen EDA teacher transfers sympathetic-arousal representations into student encoders. On WESAD, our method achieves strong stress-detection performance, showing that representations of privileged EDA can be transferred to low-cost sensors to improve accuracy while reducing hardware cost.
符号|符号学习(2篇)
【1】Symbolic Snapshot Ensembles
标题:符号快照集合
链接:https://arxiv.org/abs/2510.24633
摘要:归纳逻辑编程(ILP)是一种逻辑机器学习。大多数ILP算法从单次训练运行中学习单个假设。集成方法多次训练ILP算法以学习多个假设。在本文中,我们只训练一次ILP算法,并保存中间假设。然后,我们使用最小描述长度加权方案组合这些假设。我们在多个基准测试(包括游戏和视觉推理)上的实验表明,我们的方法将预测准确率提高了4%,计算开销不到1%。
摘要:Inductive logic programming (ILP) is a form of logical machine learning. Most ILP algorithms learn a single hypothesis from a single training run. Ensemble methods train an ILP algorithm multiple times to learn multiple hypotheses. In this paper, we train an ILP algorithm only once and save intermediate hypotheses. We then combine the hypotheses using a minimum description length weighting scheme. Our experiments on multiple benchmarks, including game playing and visual reasoning, show that our approach improves predictive accuracy by 4% with less than 1% computational overhead.
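The minimum-description-length combination can be sketched in a few lines: each saved intermediate hypothesis votes with weight 2^(-description length), so shorter hypotheses dominate. The `(predict_fn, description_length)` pairs below are placeholders for whatever interface the ILP system exposes, not the paper's API.

```python
from collections import defaultdict

def mdl_vote(hypotheses, example):
    """Combine saved intermediate hypotheses by MDL weighting: hypothesis h
    votes with weight 2^(-size(h)), and the highest-scoring label wins.
    `hypotheses` is a list of (predict_fn, description_length) placeholders."""
    scores = defaultdict(float)
    for predict, length in hypotheses:
        scores[predict(example)] += 2.0 ** (-length)
    return max(scores, key=scores.get)
```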
【2】Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
标题:超越即时工程:稳健多目标人工智能代理的神经符号因果架构
链接:https://arxiv.org/abs/2510.23682
备注:35 pages, 15 figures, 2 tables. Keywords: Large Language Models, Autonomous Agents, Neuro-Symbolic AI, Causal Inference, Formal Verification, Multi-Objective Optimization. Open-source code and interactive demo available
摘要:大型语言模型显示出作为自主决策代理的前景,但它们在高风险领域的部署仍然充满风险。没有架构保障,LLM代理表现出灾难性的脆弱性:相同的功能只取决于即时框架产生截然不同的结果。我们提出了嵌合体,神经符号因果架构,集成了三个互补的组成部分-一个LLM战略家,一个正式验证的符号约束引擎,和一个因果推理模块反事实推理。我们在现实的电子商务环境中对Chimera进行了为期52周的模拟基准测试,其中包括价格弹性,信任动态和季节性需求。在对数量或利润优化的组织偏见下,仅限LLM的代理商会灾难性地失败(在数量场景中总损失9.9万美元)或破坏品牌信任(在利润场景中为-48.6%)。增加象征性的限制可以防止灾难,但只能实现Chimera利润的43-87%。Chimera始终提供最高的回报(分别为152万美元和196万美元,某些情况下为220万美元),同时提高了品牌信任度(+1.8%和+10.8%,某些情况下为+20.86%),展示了不可知的稳健性。我们的TLA+正式验证证明了所有场景中的零约束违反。这些结果表明,建筑设计不提示工程决定了生产环境中的自主代理的可靠性。我们提供开源实现和交互式演示,以实现可重复性。
摘要:Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk. Without architectural safeguards, LLM agents exhibit catastrophic brittleness: identical capabilities produce wildly different outcomes depending solely on prompt framing. We present Chimera, a neuro-symbolic-causal architecture that integrates three complementary components - an LLM strategist, a formally verified symbolic constraint engine, and a causal inference module for counterfactual reasoning. We benchmark Chimera against baseline architectures (LLM-only, LLM with symbolic constraints) across 52-week simulations in a realistic e-commerce environment featuring price elasticity, trust dynamics, and seasonal demand. Under organizational biases toward either volume or margin optimization, LLM-only agents fail catastrophically (total loss of $99K in volume scenarios) or destroy brand trust (-48.6% in margin scenarios). Adding symbolic constraints prevents disasters but achieves only 43-87% of Chimera's profit. Chimera consistently delivers the highest returns ($1.52M and $1.96M respectively, in some cases +$2.2M) while improving brand trust (+1.8% and +10.8%, in some cases +20.86%), demonstrating prompt-agnostic robustness. Our TLA+ formal verification proves zero constraint violations across all scenarios. These results establish that architectural design, not prompt engineering, determines the reliability of autonomous agents in production environments. We provide open-source implementations and interactive demonstrations for reproducibility.
医学相关(2篇)
【1】From Detection to Discovery: A Closed-Loop Approach for Simultaneous and Continuous Medical Knowledge Expansion and Depression Detection on Social Media
标题:从检测到发现:在社交媒体上同时、持续地扩展医学知识和抑郁检测的闭环方法
链接:https://arxiv.org/abs/2510.23626
备注:Presented at SWAIB2025 and HICSS2026
摘要:社交媒体用户生成内容(UGC)提供了抑郁症等心理健康状况的实时自我报告指标,为预测分析提供了宝贵的来源。虽然先前的研究整合了医学知识以提高预测准确性,但它们忽略了通过预测过程同时扩展这些知识的机会。我们开发了一个闭环大语言模型(LLM)-知识图框架,它在迭代学习周期中集成了预测和知识扩展。在知识感知抑郁检测阶段,LLM联合执行抑郁检测和实体提取,而知识图表示并加权这些实体以改进预测性能。在知识细化和扩展阶段,LLM提取的新实体、关系和实体类型在专家监督下被纳入知识图,从而实现持续的知识进化。通过使用大规模的UGC,该框架提高了预测准确性和医学理解。专家评价证实了发现的有临床意义的症状、合并症和社会触发因素对现有文献的补充。我们将通过学习进行预测和通过预测进行学习作为相互加强的过程进行概念化和操作化,从而推进对预测分析的方法和理论理解。该框架展示了计算模型和领域知识的共同进化,为适用于其他动态风险监测环境的自适应数据驱动知识系统提供了基础。
摘要:Social media user-generated content (UGC) provides real-time, self-reported indicators of mental health conditions such as depression, offering a valuable source for predictive analytics. While prior studies integrate medical knowledge to improve prediction accuracy, they overlook the opportunity to simultaneously expand such knowledge through predictive processes. We develop a Closed-Loop Large Language Model (LLM)-Knowledge Graph framework that integrates prediction and knowledge expansion in an iterative learning cycle. In the knowledge-aware depression detection phase, the LLM jointly performs depression detection and entity extraction, while the knowledge graph represents and weights these entities to refine prediction performance. In the knowledge refinement and expansion phase, new entities, relationships, and entity types extracted by the LLM are incorporated into the knowledge graph under expert supervision, enabling continual knowledge evolution. Using large-scale UGC, the framework enhances both predictive accuracy and medical understanding. Expert evaluations confirmed the discovery of clinically meaningful symptoms, comorbidities, and social triggers complementary to existing literature. We conceptualize and operationalize prediction-through-learning and learning-through-prediction as mutually reinforcing processes, advancing both methodological and theoretical understanding in predictive analytics. The framework demonstrates the co-evolution of computational models and domain knowledge, offering a foundation for adaptive, data-driven knowledge systems applicable to other dynamic risk monitoring contexts.
【2】Adversarially-Aware Architecture Design for Robust Medical AI Systems
标题:稳健医疗人工智能系统的对抗感知架构设计
链接:https://arxiv.org/abs/2510.23622
摘要:对抗性攻击对医疗保健中使用的人工智能系统构成了严重的风险,能够将模型误导为危险的错误分类,从而延迟治疗或导致误诊。这些攻击往往不易被人察觉,威胁到病人的安全,特别是在服务不足的人群中。我们的研究通过对皮肤病数据集的实证实验来探索这些漏洞,其中对抗性方法显着降低了分类准确性。通过详细的威胁建模、实验基准测试和模型评估,我们展示了威胁的严重性以及对抗性训练和蒸馏等防御措施的部分成功。我们的研究结果表明,虽然防御措施降低了攻击成功率,但它们必须与干净数据上的模型性能相平衡。最后,我们呼吁采用综合的技术、道德和基于政策的方法,在医疗保健领域建立更具弹性、公平的人工智能。
摘要:Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatments or cause misdiagnoses. These attacks, often imperceptible to human perception, threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experimentation on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses like adversarial training and distillation. Our results show that while defenses reduce attack success rates, they must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.
蒸馏|知识提取(2篇)
【1】Eigenfunction Extraction for Ordered Representation Learning
标题:有序表示学习的特征函数提取
链接:https://arxiv.org/abs/2510.24672
摘要:表征学习的最新进展表明,广泛使用的目标,如对比和非对比,隐含地执行上下文核的谱分解,由输入和它们的上下文之间的关系引起的。然而,这些方法只能恢复内核的顶部特征函数的线性跨度,而精确的谱分解对于理解特征排序和重要性是必不可少的。在这项工作中,我们提出了一个通用的框架,以提取有序和可识别的特征函数,模块化的积木设计,以满足关键desiderata,包括与上下文内核的兼容性和可扩展性,现代设置的基础上。然后,我们展示了两个主要的方法范式,低秩近似和瑞利商优化,与本征函数提取的框架。最后,我们在合成内核上验证了我们的方法,并在真实图像数据集上证明了恢复的特征值作为特征选择的有效重要性分数,通过自适应维度表示实现了原则性的效率-准确性权衡。
摘要:Recent advances in representation learning reveal that widely used objectives, such as contrastive and non-contrastive, implicitly perform spectral decomposition of a contextual kernel, induced by the relationship between inputs and their contexts. Yet, these methods recover only the linear span of top eigenfunctions of the kernel, whereas exact spectral decomposition is essential for understanding feature ordering and importance. In this work, we propose a general framework to extract ordered and identifiable eigenfunctions, based on modular building blocks designed to satisfy key desiderata, including compatibility with the contextual kernel and scalability to modern settings. We then show how two main methodological paradigms, low-rank approximation and Rayleigh quotient optimization, align with this framework for eigenfunction extraction. Finally, we validate our approach on synthetic kernels and demonstrate on real-world image datasets that the recovered eigenvalues act as effective importance scores for feature selection, enabling principled efficiency-accuracy tradeoffs via adaptive-dimensional representations.
【2】Quanvolutional Neural Networks for Pneumonia Detection: An Efficient Quantum-Assisted Feature Extraction Paradigm
标题:用于肺炎检测的量子卷积神经网络:一种有效的量子辅助特征提取范式
链接:https://arxiv.org/abs/2510.23660
备注:None
摘要:肺炎对全球健康构成重大挑战,需要准确及时的诊断。虽然深度学习,特别是卷积神经网络(CNN),在肺炎检测的医学图像分析中显示出了希望,但CNN通常存在计算成本高、特征表示受限以及从较小数据集泛化困难的问题。为了解决这些限制,我们探索了量子卷积神经网络(QNN)的应用,利用量子计算来增强特征提取。本文介绍了一种新的混合量子-经典模型,用于肺炎检测,使用PneumoniaMNIST数据集。我们的方法利用具有参数化量子电路(PQC)的量子卷积层来处理2x2图像补丁,采用旋转Y门进行数据编码和纠缠层来生成非经典特征表示。然后,这些量子提取的特征被送入经典神经网络进行分类。实验结果表明,所提出的QNN实现了83.33%的验证准确率,高于可比经典CNN的73.33%。这种增强的收敛性和样本效率突出了QNN在医学图像分析中的潜力,特别是在标记数据有限的情况下。这项研究为将量子计算集成到深度学习驱动的医疗诊断系统中奠定了基础,为传统方法提供了一种计算效率高的替代方案。
摘要:Pneumonia poses a significant global health challenge, demanding accurate and timely diagnosis. While deep learning, particularly Convolutional Neural Networks (CNNs), has shown promise in medical image analysis for pneumonia detection, CNNs often suffer from high computational costs, limitations in feature representation, and challenges in generalizing from smaller datasets. To address these limitations, we explore the application of Quanvolutional Neural Networks (QNNs), leveraging quantum computing for enhanced feature extraction. This paper introduces a novel hybrid quantum-classical model for pneumonia detection using the PneumoniaMNIST dataset. Our approach utilizes a quanvolutional layer with a parameterized quantum circuit (PQC) to process 2x2 image patches, employing rotational Y-gates for data encoding and entangling layers to generate non-classical feature representations. These quantum-extracted features are then fed into a classical neural network for classification. Experimental results demonstrate that the proposed QNN achieves a higher validation accuracy of 83.33 percent compared to a comparable classical CNN which achieves 73.33 percent. This enhanced convergence and sample efficiency highlight the potential of QNNs for medical image analysis, particularly in scenarios with limited labeled data. This research lays the foundation for integrating quantum computing into deep-learning-driven medical diagnostic systems, offering a computationally efficient alternative to traditional approaches.
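A quanvolutional filter of the kind described, RY encoding of a 2x2 patch followed by an entangling layer and per-qubit readout, can be sketched with PennyLane. The angle scaling and the single entangling layer are illustrative stand-ins for the paper's exact PQC.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def quanv_patch(pixels, weights):
    """Minimal quanvolutional filter: encode a 2x2 patch (4 pixel values in
    [0, 1]) with RY rotations, entangle, and read out one expectation per
    qubit, yielding 4 quantum-extracted features."""
    for i in range(4):
        qml.RY(np.pi * pixels[i], wires=i)        # data encoding
    qml.BasicEntanglerLayers(weights, wires=range(4))
    return [qml.expval(qml.PauliZ(i)) for i in range(4)]

weights = np.random.uniform(0, np.pi, size=(1, 4))   # one entangling layer
features = quanv_patch(np.array([0.0, 0.3, 0.6, 1.0]), weights)
```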
自动驾驶|车辆|车道检测等(3篇)
【1】Modeling Electric Vehicle Car-Following Behavior: Classical vs Machine Learning Approach
标题:电动汽车跟车行为建模:经典与机器学习方法
链接:https://arxiv.org/abs/2510.24085
摘要:电动汽车(EV)的日益普及需要了解其驾驶行为,以提高交通安全和开发智能驾驶系统。这项研究比较了电动汽车跟随行为的经典和机器学习模型。经典模型包括智能驾驶员模型(IDM)、最优速度模型(OVM)、最优速度相对速度(OVRV)和简化的CACC模型,而机器学习方法采用随机森林回归。使用在不同驾驶条件下跟随内燃机(ICE)车辆的EV的真实世界数据集,我们通过最小化预测和实际数据之间的RMSE来校准经典模型参数。随机森林模型使用间距、速度和间隙类型作为输入来预测加速度。结果证明了随机森林的优越准确性,达到RMSE为0.0046(中间隙),0.0016(长间隙)和0.0025(超长间隙)。在基于物理的模型中,CACC表现最好,长间隙的RMSE为2.67。这些发现突出了机器学习模型在所有场景中的性能。此类模型对于模拟电动汽车行为和分析电动汽车集成环境中的混合自主交通动态非常有价值。
摘要:The increasing adoption of electric vehicles (EVs) necessitates an understanding of their driving behavior to enhance traffic safety and develop smart driving systems. This study compares classical and machine learning models for EV car following behavior. Classical models include the Intelligent Driver Model (IDM), Optimum Velocity Model (OVM), Optimal Velocity Relative Velocity (OVRV), and a simplified CACC model, while the machine learning approach employs a Random Forest Regressor. Using a real world dataset of an EV following an internal combustion engine (ICE) vehicle under varied driving conditions, we calibrated classical model parameters by minimizing the RMSE between predictions and real data. The Random Forest model predicts acceleration using spacing, speed, and gap type as inputs. Results demonstrate the Random Forest's superior accuracy, achieving RMSEs of 0.0046 (medium gap), 0.0016 (long gap), and 0.0025 (extra long gap). Among physics based models, CACC performed best, with an RMSE of 2.67 for long gaps. These findings highlight the machine learning model's performance across all scenarios. Such models are valuable for simulating EV behavior and analyzing mixed autonomy traffic dynamics in EV integrated environments.
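The IDM calibration described above, minimizing RMSE between predicted and observed accelerations, can be sketched directly with SciPy. The initial guess, bounds, and the standard acceleration exponent of 4 are conventional choices, not values reported by the paper.

```python
import numpy as np
from scipy.optimize import minimize

def idm_accel(p, v, dv, s):
    """IDM acceleration. p = (v0, T, a, b, s0): desired speed, time headway,
    max acceleration, comfortable deceleration, jam spacing. v: speed,
    dv: approach rate (v - v_lead), s: gap (must be positive)."""
    v0, T, a, b, s0 = p
    s_star = s0 + v * T + v * dv / (2 * np.sqrt(a * b))
    return a * (1 - (v / v0) ** 4 - (s_star / s) ** 2)

def calibrate(v, dv, s, acc_obs):
    """Fit IDM parameters by minimizing RMSE against observed accelerations."""
    rmse = lambda p: np.sqrt(np.mean((idm_accel(p, v, dv, s) - acc_obs) ** 2))
    x0 = np.array([30.0, 1.5, 1.0, 2.0, 2.0])
    bounds = [(1, 60), (0.1, 5), (0.1, 5), (0.1, 5), (0.1, 10)]
    return minimize(rmse, x0, bounds=bounds, method="L-BFGS-B").x
```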
【2】Traffic flow forecasting, STL decomposition, Hybrid model, LSTM, ARIMA, XGBoost, Intelligent transportation systems
标题:交通流预测、STL分解、混合模型、LSTM、ARIMA、XGBoost、智能交通系统
链接:https://arxiv.org/abs/2510.23668
摘要:准确的交通流预测是智能交通系统和城市交通管理的基础。然而,单模型方法往往无法捕捉交通流数据中复杂的、非线性的、多尺度的时间模式。本研究提出了一个分解驱动的混合框架,集成了基于Loess的季节-趋势分解(STL)与三个互补的预测模型。STL首先将原始时间序列分解为趋势、季节和残差分量。然后,长短期记忆(LSTM)网络对长期趋势进行建模,自回归综合移动平均(ARIMA)模型捕捉季节性周期性,极端梯度提升(XGBoost)算法预测非线性残差波动。通过子模型预测的乘法整合获得最终预测。使用2015年11月至12月期间纽约市十字路口的998个交通流量记录,结果显示LSTM ARIMA XGBoost混合模型在MAE,RMSE和R平方度量方面明显优于LSTM,ARIMA和XGBoost等独立模型。分解策略有效地隔离了时间特征,允许每个模型进行专门化,从而提高了预测精度,可解释性和鲁棒性。
摘要:Accurate traffic flow forecasting is essential for intelligent transportation systems and urban traffic management. However, single model approaches often fail to capture the complex, nonlinear, and multi scale temporal patterns in traffic flow data. This study proposes a decomposition driven hybrid framework that integrates Seasonal Trend decomposition using Loess (STL) with three complementary predictive models. STL first decomposes the original time series into trend, seasonal, and residual components. Then, a Long Short Term Memory (LSTM) network models long term trends, an Autoregressive Integrated Moving Average (ARIMA) model captures seasonal periodicity, and an Extreme Gradient Boosting (XGBoost) algorithm predicts nonlinear residual fluctuations. The final forecast is obtained through multiplicative integration of the sub model predictions. Using 998 traffic flow records from a New York City intersection between November and December 2015, results show that the LSTM ARIMA XGBoost hybrid model significantly outperforms standalone models including LSTM, ARIMA, and XGBoost across MAE, RMSE, and R squared metrics. The decomposition strategy effectively isolates temporal characteristics, allowing each model to specialize, thereby improving prediction accuracy, interpretability, and robustness.
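The decomposition-driven skeleton is straightforward with statsmodels' STL: split the series, forecast each component with its specialist model, and recombine. The `*_model` callables below are placeholders for the LSTM, ARIMA, and XGBoost wrappers; note that STL is additive, so a sum is the direct recombination (the abstract describes a multiplicative integration, which would instead multiply component forecasts).

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

def stl_hybrid_forecast(series, period, trend_model, seasonal_model, resid_model):
    """Skeleton of the decomposition-driven hybrid: STL splits the series,
    one specialist model forecasts each component, forecasts recombined.
    Each `*_model` argument is a callable taking a component series and
    returning its forecast (placeholders for LSTM / ARIMA / XGBoost)."""
    res = STL(series, period=period).fit()
    trend_hat = trend_model(res.trend)        # e.g. LSTM on the trend
    seas_hat = seasonal_model(res.seasonal)   # e.g. ARIMA on the seasonality
    resid_hat = resid_model(res.resid)        # e.g. XGBoost on the residual
    return trend_hat + seas_hat + resid_hat   # additive recombination

# Usage with naive last-value stand-ins for the three specialist models:
series = pd.Series(np.random.rand(500))
forecast = stl_hybrid_forecast(series, 24, *([lambda c: c.iloc[-1]] * 3))
```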
【3】Error Adjustment Based on Spatiotemporal Correlation Fusion for Traffic Forecasting
标题:基于时空相关融合的交通量预测误差调整
链接:https://arxiv.org/abs/2510.23656
备注:12 pages, 7 figures, 3 tables
摘要:深度神经网络(DNN)在越来越多的交通预测研究中发挥着重要作用,因为它们可以有效地捕获嵌入交通数据中的时空模式。通过均方误差估计训练所述预测模型的一般假设是跨时间步长和空间位置的误差是不相关的。然而,这种假设并不真正成立,因为交通数据的时间性和空间性引起的自相关性。这一差距限制了基于DNN的预测模型的性能,并且被当前的研究所忽视。为了填补这一空白,本文提出了时空自相关误差调整(SAEA),一种新的和一般的框架,旨在系统地调整自相关预测误差在交通预测。与假设预测误差遵循随机高斯噪声分布的现有方法不同,SAEA将这些误差建模为时空向量自回归(VAR)过程以捕获其内在依赖性。首先,它明确地捕捉空间和时间的误差相关系数矩阵,然后嵌入到一个新制定的成本函数。其次,引入结构稀疏正则化以合并先验空间信息,确保学习的系数矩阵与固有的道路网络结构对齐。最后,设计了一个带有测试时间误差调整的推理过程,以动态地细化预测,减轻自相关误差对实时预测的影响。在不同的流量数据集上验证了该方法的有效性。在广泛的流量预测模型的结果表明,我们的方法提高了性能,在几乎所有的情况下。
摘要:Deep neural networks (DNNs) play a significant role in an increasing body of research on traffic forecasting due to their effectively capturing spatiotemporal patterns embedded in traffic data. A general assumption of training the said forecasting models via mean squared error estimation is that the errors across time steps and spatial positions are uncorrelated. However, this assumption does not really hold because of the autocorrelation caused by both the temporality and spatiality of traffic data. This gap limits the performance of DNN-based forecasting models and is overlooked by current studies. To fill up this gap, this paper proposes Spatiotemporally Autocorrelated Error Adjustment (SAEA), a novel and general framework designed to systematically adjust autocorrelated prediction errors in traffic forecasting. Unlike existing approaches that assume prediction errors follow a random Gaussian noise distribution, SAEA models these errors as a spatiotemporal vector autoregressive (VAR) process to capture their intrinsic dependencies. First, it explicitly captures both spatial and temporal error correlations by a coefficient matrix, which is then embedded into a newly formulated cost function. Second, a structurally sparse regularization is introduced to incorporate prior spatial information, ensuring that the learned coefficient matrix aligns with the inherent road network structure. Finally, an inference process with test-time error adjustment is designed to dynamically refine predictions, mitigating the impact of autocorrelated errors in real-time forecasting. The effectiveness of the proposed approach is verified on different traffic datasets. Results across a wide range of traffic forecasting models show that our method enhances performance in almost all cases.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Improved Accuracy of Robot Localization Using 3-D LiDAR in a Hippocampus-Inspired Model
标题:在海马启发模型中使用3-D LiDAR提高机器人定位的准确性
链接:https://arxiv.org/abs/2510.24029
备注:8 pages, 9 figures, Presented at the 2025 International Joint Conference on Neural Networks, Rome, July 2025
摘要:边界向量细胞(BVC)是脊椎动物脑中的一类神经元,其编码特定距离和异中心方向的环境边界,在海马中形成位置场中起核心作用。大多数计算BVC模型仅限于二维(2D)环境,使得它们在环境中存在水平对称性时易于产生空间模糊性。为了解决这一限制,我们将垂直角度灵敏度纳入BVC框架,从而实现三维的鲁棒边界检测,并在生物启发的机器人模型中实现更准确的空间定位。 所提出的模型处理LiDAR数据以捕获垂直轮廓,从而消除在纯2D表示下无法区分的位置。实验结果表明,在垂直变化最小的环境中,所提出的3D模型与2D基线的性能相匹配;然而,随着3D复杂性的增加,它会产生更多不同的地方字段,并显着减少空间混叠。这些发现表明,向基于BVC的定位添加垂直维度可以显着增强真实世界3D空间中的导航和映射,同时在更简单的近平面场景中保持性能平价。
摘要:Boundary Vector Cells (BVCs) are a class of neurons in the brains of vertebrates that encode environmental boundaries at specific distances and allocentric directions, playing a central role in forming place fields in the hippocampus. Most computational BVC models are restricted to two-dimensional (2D) environments, making them prone to spatial ambiguities in the presence of horizontal symmetries in the environment. To address this limitation, we incorporate vertical angular sensitivity into the BVC framework, thereby enabling robust boundary detection in three dimensions, and leading to significantly more accurate spatial localization in a biologically-inspired robot model. The proposed model processes LiDAR data to capture vertical contours, thereby disambiguating locations that would be indistinguishable under a purely 2D representation. Experimental results show that in environments with minimal vertical variation, the proposed 3D model matches the performance of a 2D baseline; yet, as 3D complexity increases, it yields substantially more distinct place fields and markedly reduces spatial aliasing. These findings show that adding a vertical dimension to BVC-based localization can significantly enhance navigation and mapping in real-world 3D spaces while retaining performance parity in simpler, near-planar scenarios.
联邦学习|隐私保护|加密(1篇)
【1】Local Performance vs. Out-of-Distribution Generalization: An Empirical Analysis of Personalized Federated Learning in Heterogeneous Data Environments
标题:本地性能与分布外泛化:异构数据环境中个性化联邦学习的实证分析
链接:https://arxiv.org/abs/2510.24503
摘要:在数据环境异构的联邦学习中,局部模型在局部训练步骤中往往收敛到各自的局部最优,从而偏离整体数据分布。这些本地更新的聚合(例如使用FedAvg)通常与全局模型最优不一致(客户端漂移),导致更新对大多数客户端而言是次优的。个性化联邦学习方法通过只关注客户端模型在其自身数据分布上的平均本地性能来应对这一挑战。对分布外样本的泛化是FedAvg的一个实质性优势,也是鲁棒性的重要组成部分,但似乎没有被充分纳入评估与考核过程。本研究对联邦学习方法进行了全面评估,同时涵盖其本地性能和泛化能力。为此,我们考察单个通信轮次内的不同阶段,以便更细致地理解所考虑的指标。此外,我们提出并纳入了FedAvg的一种改进方法,称为带个性化更新的联邦学习(FLIU),通过一个带自适应个性化因子的简单个性化步骤来扩展该算法。我们在MNIST和CIFAR-10上,在多种分布条件下对这些方法进行了实证评估和比较,包括基准IID和病理性非IID,以及专门为在复杂数据异构性下对算法施压而开发的基于Dirichlet分布的新测试环境。
摘要:In the context of Federated Learning with heterogeneous data environments, local models tend to converge to their own local model optima during local training steps, deviating from the overall data distributions. Aggregation of these local updates, e.g., with FedAvg, often does not align with the global model optimum (client drift), resulting in an update that is suboptimal for most clients. Personalized Federated Learning approaches address this challenge by exclusively focusing on the average local performances of clients' models on their own data distribution. Generalization to out-of-distribution samples, which is a substantial benefit of FedAvg and represents a significant component of robustness, appears to be inadequately incorporated into the assessment and evaluation processes. This study involves a thorough evaluation of Federated Learning approaches, encompassing both their local performance and their generalization capabilities. Therefore, we examine different stages within a single communication round to enable a more nuanced understanding of the considered metrics. Furthermore, we propose and incorporate a modified approach of FedAvg, designated as Federated Learning with Individualized Updates (FLIU), extending the algorithm by a straightforward individualization step with an adaptive personalization factor. We evaluate and compare the approaches empirically using MNIST and CIFAR-10 under various distributional conditions, including benchmark IID and pathological non-IID, as well as additional novel test environments with Dirichlet distribution specifically developed to stress the algorithms on complex data heterogeneity.
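下面是FLIU式个性化更新步骤的概念草图。摘要未给出自适应个性化因子alpha的具体定义,这里用"本地与全局模型差异越大、个性化越强"的启发式作为假设。

import numpy as np

def personalize(w_global, w_local, tau=1.0):
    """把全局与本地模型按自适应因子alpha混合(alpha的定义为说明性假设)。"""
    drift = np.linalg.norm(w_local - w_global)   # 客户端漂移的粗略度量
    alpha = drift / (drift + tau)                # 自适应个性化因子, 取值[0,1)
    return alpha * w_local + (1 - alpha) * w_global

w_g = np.zeros(5)
w_l = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
print(personalize(w_g, w_l))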
推理|分析|理解|解释(10篇)
【1】Methodology for Comparing Machine Learning Algorithms for Survival Analysis
标题:比较用于生存分析的机器学习算法的方法
链接:https://arxiv.org/abs/2510.24473
摘要:本研究对六种用于生存分析的机器学习模型(MLSA)进行了比较性的方法学分析。我们使用圣保罗基于医院的癌症登记处近45,000名结直肠癌患者的数据,评估了随机生存森林(RSF)、生存分析梯度提升(GBSA)、生存SVM(SSVM)、XGBoost-Cox(XGB-Cox)、XGBoost-AFT(XGB-AFT)和LightGBM(LGBM)这六种能够在考虑删失数据的情况下预测生存的模型。我们使用不同的采样器进行超参数优化,并使用一致性指数(C-Index)、C-Index IPCW、时间依赖性AUC和综合Brier评分(IBS)评估模型性能。我们将各模型产生的生存曲线与分类算法的预测进行了比较,并使用SHAP和排列重要性对预测变量进行解释。XGB-AFT的性能最好(C-Index = 0.7618;IPCW = 0.7532),其次是GBSA和RSF。结果突出了MLSA在改善生存预测和支持决策方面的潜力和适用性。
摘要:This study presents a comparative methodological analysis of six machine learning models for survival analysis (MLSA). Using data from nearly 45,000 colorectal cancer patients in the Hospital-Based Cancer Registries of São Paulo, we evaluated Random Survival Forest (RSF), Gradient Boosting for Survival Analysis (GBSA), Survival SVM (SSVM), XGBoost-Cox (XGB-Cox), XGBoost-AFT (XGB-AFT), and LightGBM (LGBM), capable of predicting survival considering censored data. Hyperparameter optimization was performed with different samplers, and model performance was assessed using the Concordance Index (C-Index), C-Index IPCW, time-dependent AUC, and Integrated Brier Score (IBS). Survival curves produced by the models were compared with predictions from classification algorithms, and predictor interpretation was conducted using SHAP and permutation importance. XGB-AFT achieved the best performance (C-Index = 0.7618; IPCW = 0.7532), followed by GBSA and RSF. The results highlight the potential and applicability of MLSA to improve survival prediction and support decision making.
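为帮助理解论文所用的评估指标,下面给出删失数据下一致性指数(C-Index)的朴素O(n^2)计算示意(未作IPCW校正)。

import numpy as np

def concordance_index(time, event, risk):
    """time: 生存/删失时间; event: 1=事件发生, 0=删失; risk: 风险分数(越大越危险)。"""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # 可比较对:i确实发生事件且早于j的观察时间
            if event[i] == 1 and time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

t = np.array([5, 3, 8, 2, 7]); e = np.array([1, 1, 0, 1, 1])
r = np.array([0.6, 0.8, 0.2, 0.9, 0.3])
print(concordance_index(t, e, r))   # 此例输出1.0, 表示排序完全一致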
【2】From Memorization to Reasoning in the Spectrum of Loss Curvature
标题:损失曲率谱中从记忆到推理
链接:https://arxiv.org/abs/2510.24256
摘要:我们刻画了记忆在Transformer模型中的表示方式,并表明利用基于损失景观曲率的分解,可以在语言模型(LM)和视觉Transformer(ViT)的权重中将其解耦出来。这一见解基于先前的理论与实证工作:被记忆训练点的曲率比未被记忆的训练点尖锐得多,这意味着将权重分量按曲率从高到低排序,即可在没有显式标签的情况下揭示这种区别。这启发了一种权重编辑程序,它比最近的一种遗忘(unlearning)方法(BalancedSubnet)更有效地抑制非目标记忆数据的复述,同时保持更低的困惑度。由于曲率基底对模型权重中的共享结构有自然的解释,我们在LM上深入分析了该编辑程序对广泛下游任务的影响,发现事实检索和算术会受到特定且一致的负面影响,而开卷事实检索和一般逻辑推理则得以保留。我们推测这些任务在很大程度上依赖于权重空间中的特化方向而非通用机制,无论这些单独的数据点是否被记忆。我们通过展示任务数据的激活强度与我们编辑掉的低曲率分量之间的对应关系,以及编辑后任务性能的下降,来支持这一观点。我们的工作增进了对神经网络中记忆现象的理解,可实际应用于消除记忆,并为解决数学和事实检索等任务所涉及的特化、窄用途结构提供了证据。
摘要:We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than non memorized, meaning ordering weight components from high to low curvature can reveal a distinction without explicit labels. This motivates a weight editing procedure that suppresses far more recitation of untargeted memorized data more effectively than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the basis of curvature has a natural interpretation for shared structure in model weights, we analyze the editing procedure extensively on its effect on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently negatively affected, even though open book fact retrieval and general logical reasoning is conserved. We posit these tasks rely heavily on specialized directions in weight space rather than general purpose mechanisms, regardless of whether those individual datapoints are memorized. We support this by showing a correspondence between task data's activation strength with low curvature components that we edit out, and the drop in task performance after the edit. Our work enhances the understanding of memorization in neural networks with practical applications towards removing it, and provides evidence for idiosyncratic, narrowly-used structures involved in solving tasks like math and fact retrieval.
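下面是一个高度简化的示意:用对角Fisher(梯度平方的累积)作为损失曲率的粗略代理,按曲率对权重分量排序并衰减所选分量。论文使用的是基于损失景观曲率的分解(而非对角近似),此处的代理、衰减方向与阈值均为说明性假设。

import torch

def curvature_scores(model, loss_fn, data_loader, n_batches=10):
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, (x, y) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2    # 对角Fisher累积
    return scores

def edit_weights(model, scores, quantile=0.99, shrink=0.0):
    with torch.no_grad():
        for n, p in model.named_parameters():
            thr = torch.quantile(scores[n].flatten(), quantile)
            mask = scores[n] >= thr                  # 选中曲率最高的分量(假设)
            p[mask] = p[mask] * shrink               # 衰减/置零

model = torch.nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(3)]
s = curvature_scores(model, torch.nn.functional.cross_entropy, loader, n_batches=3)
edit_weights(model, s)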
【3】Closing Gaps: An Imputation Analysis of ICU Vital Signs
标题:缩小差距:ICU生命体征的插补分析
链接:https://arxiv.org/abs/2510.24217
备注:Preprint
摘要:随着更多的重症监护室(ICU)数据变得可用,人们对开发临床预测模型以改进医疗保健协议的兴趣日益增加。然而,数据质量的欠缺仍然阻碍着使用机器学习(ML)进行临床预测。许多生命体征测量(如心率)包含相当长的缺失段,在数据中留下可能对预测性能产生负面影响的空白。以前的工作提出了许多时间序列插补技术,但仍需要更全面的工作来比较一组有代表性的ICU生命体征插补方法并确定最佳实践。现实中,像零值插补这样可能降低预测精度的临时(ad-hoc)插补技术仍在使用。在这项工作中,我们比较了已有的插补技术,以指导研究人员选择最准确的插补技术,从而提高临床预测模型的性能。我们引入了一个可扩展且可复用的基准,目前包含15种插补方法和4种数据删减(amputation)方法,用于在主要ICU数据集上进行基准测试。我们希望提供一个比较基础,并促进ML的进一步发展,使更多模型进入临床实践。
摘要:As more Intensive Care Unit (ICU) data becomes available, the interest in developing clinical prediction models to improve healthcare protocols increases. However, the lack of data quality still hinders clinical prediction using Machine Learning (ML). Many vital sign measurements, such as heart rate, contain sizeable missing segments, leaving gaps in the data that could negatively impact prediction performance. Previous works have introduced numerous time-series imputation techniques. Nevertheless, more comprehensive work is needed to compare a representative set of methods for imputing ICU vital signs and determine the best practice. In reality, ad-hoc imputation techniques that could decrease prediction accuracy, like zero imputation, are still used. In this work, we compare established imputation techniques to guide researchers in improving the performance of clinical prediction models by selecting the most accurate imputation technique. We introduce an extensible and reusable benchmark with currently 15 imputation and 4 amputation methods, created for benchmarking on major ICU datasets. We hope to provide a comparative basis and facilitate further ML development to bring more models into clinical practice.
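下面用pandas演示这类基准的基本思路:在人工"删减"出连续缺失段后,比较几种常见插补方法还原真值的RMSE(数据与缺失模式均为合成示例;论文的基准包含15种插补与4种缺失生成方法)。

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hr = pd.Series(70 + 5 * np.sin(np.arange(500) / 20) + rng.normal(0, 1, 500))
y = hr.copy()
y.iloc[120:160] = np.nan                  # 人工制造一段连续缺失(模拟监护仪脱落)

methods = {
    "zero":   y.fillna(0.0),              # 临时做法, 通常最差
    "ffill":  y.ffill(),                  # 前向填充
    "linear": y.interpolate(method="linear"),
    "mean":   y.fillna(y.mean()),
}
mask = y.isna()
for name, imputed in methods.items():
    rmse = np.sqrt(((imputed[mask] - hr[mask]) ** 2).mean())
    print(f"{name:7s} RMSE = {rmse:.3f}")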
【4】NeuroPathNet: Dynamic Path Trajectory Learning for Brain Functional Connectivity Analysis
标题:NeuroPathNet:用于脑功能连接性分析的动态路径轨迹学习
链接:https://arxiv.org/abs/2510.24025
摘要:了解大脑功能网络随时间的演变对于认知机制的分析和神经系统疾病的诊断具有重要意义。现有的方法往往难以捕捉特定功能社区之间的连接的时间演化特征。为此,本文提出了一种新的路径级轨迹建模框架(NeuroPathNet)来表征大脑功能分区之间连接路径的动态行为。基于医学支持的静态分区方案(例如Yeo和Smith ICA),我们提取每对功能分区之间的连接强度的时间序列,并使用时间神经网络对其进行建模。在三个公开的功能性磁共振成像(fMRI)数据集上验证了模型的性能,结果表明该模型在多个指标上优于现有的主流方法。本研究可促进脑网络分析的动态图学习方法的发展,并为神经系统疾病的诊断提供可能的临床应用。
摘要:Understanding the evolution of brain functional networks over time is of great significance for the analysis of cognitive mechanisms and the diagnosis of neurological diseases. Existing methods often have difficulty in capturing the temporal evolution characteristics of connections between specific functional communities. To this end, this paper proposes a new path-level trajectory modeling framework (NeuroPathNet) to characterize the dynamic behavior of connection pathways between brain functional partitions. Based on medically supported static partitioning schemes (such as Yeo and Smith ICA), we extract the time series of connection strengths between each pair of functional partitions and model them using a temporal neural network. We validate the model performance on three public functional Magnetic Resonance Imaging (fMRI) datasets, and the results show that it outperforms existing mainstream methods in multiple indicators. This study can promote the development of dynamic graph learning methods for brain network analysis, and provide possible clinical applications for the diagnosis of neurological diseases.
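下面示意NeuroPathNet输入特征的构造:在静态分区上用滑动窗口相关系数得到每对功能分区之间连接强度("路径")的时间序列。窗口长度、步长与分区数均为说明性假设。

import numpy as np

def path_timeseries(bold, window=30, step=5):
    """bold: (T, P) 每个分区平均后的fMRI信号;返回 (窗口数, P*(P-1)/2)。"""
    T, P = bold.shape
    iu = np.triu_indices(P, k=1)
    feats = []
    for s in range(0, T - window + 1, step):
        c = np.corrcoef(bold[s:s + window].T)   # 窗口内分区间相关矩阵
        feats.append(c[iu])                     # 取上三角, 即每条路径的强度
    return np.asarray(feats)

bold = np.random.default_rng(2).normal(size=(200, 17))   # 假设Yeo-17分区
print(path_timeseries(bold).shape)   # 之后可馈入时间神经网络(如GRU)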
【5】A data free neural operator enabling fast inference of 2D and 3D Navier Stokes equations
标题:可对2D和3D Navier-Stokes方程进行快速推理的无数据神经算子
链接:https://arxiv.org/abs/2510.23936
摘要:高维流动模型(例如Navier-Stokes型偏微分方程)的集合模拟对于实时应用而言计算代价过高。神经算子可以实现快速推理,但受到高昂的数据需求和对3D流动泛化能力差的限制。我们提出了一个用于Navier-Stokes方程的无数据算子网络,消除了对成对解数据的需要,并为大规模集合预报实现了鲁棒的实时推理。这一基于物理的架构以初始条件、边界条件和强迫函数作为输入,产生对高变异性和扰动鲁棒的解。在2D基准和3D测试用例中,该方法在精度上超过了先前的神经算子,并且对于集合预报而言,比传统数值求解器效率更高。值得注意的是,它能给出三维Navier-Stokes方程的精确解,这是此前无数据神经算子从未展示过的能力。通过将数值基础的架构与机器学习的可扩展性相结合,这种方法为端到端科学模拟和预测建立了一条通往无数据、高保真PDE代理模型的实用途径。
摘要:Ensemble simulations of high-dimensional flow models (e.g., Navier Stokes type PDEs) are computationally prohibitive for real time applications. Neural operators enable fast inference but are limited by costly data requirements and poor generalization to 3D flows. We present a data-free operator network for the Navier Stokes equations that eliminates the need for paired solution data and enables robust, real time inference for large ensemble forecasting. The physics-grounded architecture takes initial and boundary conditions as well as forcing functions, yielding solutions robust to high variability and perturbations. Across 2D benchmarks and 3D test cases, the method surpasses prior neural operators in accuracy and, for ensembles, achieves greater efficiency than conventional numerical solvers. Notably, it delivers accurate solutions of the three dimensional Navier Stokes equations, a regime not previously demonstrated for data free neural operators. By uniting a numerically grounded architecture with the scalability of machine learning, this approach establishes a practical pathway toward data free, high fidelity PDE surrogates for end to end scientific simulation and prediction.
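无数据(data-free)训练的关键在于:不用成对解数据,而是最小化PDE残差与初边值约束。为控制篇幅,下面以一维Burgers方程 u_t + u*u_x = nu*u_xx 代替三维Navier-Stokes,仅示意这种训练信号的构造方式(网络结构、配点方式均为说明性假设;论文的算子网络还会把初边值与强迫项作为输入)。

import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
nu = 0.01

for it in range(100):
    xt = torch.rand(256, 2, requires_grad=True)          # 配点: (x, t) in [0,1]^2
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    residual = u_t + u * u_x - nu * u_xx                 # PDE残差, 无需任何解数据
    x0 = torch.rand(64, 1)
    ic = net(torch.cat([x0, torch.zeros_like(x0)], 1)) \
         - (-torch.sin(torch.pi * x0))                   # 初始条件 u(x,0)=-sin(pi x)
    loss = (residual ** 2).mean() + (ic ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()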
【6】Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes
标题:结合SHAP和因果分析进行工业过程中的可解释故障检测
链接:https://arxiv.org/abs/2510.23817
摘要:工业过程产生的复杂数据对故障检测系统构成挑战,即使采用先进的机器学习技术,结果也常常不透明或不尽如人意。本研究以田纳西-伊士曼过程(Tennessee Eastman Process)这一以复杂动态著称的成熟基准为对象,开发了一个创新的故障检测框架来应对此类困难。使用标准模型的初步尝试暴露了性能和可解释性两方面的局限,促使我们转向更易处理的方法。通过采用SHAP(SHapley加性解释),我们将问题转化为更易管理和透明的形式,精确定位驱动故障预测的最关键过程特征。这种复杂度的降低使我们得以对由多种算法生成的有向无环图应用因果分析,以揭示故障传播的潜在机制。由此产生的因果结构与SHAP结果惊人地一致,始终突出冷却和分离系统等关键过程元素是故障发展的关键。这些方法结合起来不仅提高了检测精度,还为操作员提供了关于故障起源的清晰、可操作的见解;据我们所知,这种协同作用此前尚未在此背景下被探索过。这种双重方法将预测能力与因果理解联系起来,为监控复杂制造环境提供了强大工具,并为工业系统中更智能、更可解释的故障检测铺平了道路。
摘要:Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tackles such difficulties using the Tennessee Eastman Process, a well-established benchmark known for its intricate dynamics, to develop an innovative fault detection framework. Initial attempts with standard models revealed limitations in both performance and interpretability, prompting a shift toward a more tractable approach. By employing SHAP (SHapley Additive exPlanations), we transform the problem into a more manageable and transparent form, pinpointing the most critical process features driving fault predictions. This reduction in complexity unlocks the ability to apply causal analysis through Directed Acyclic Graphs, generated by multiple algorithms, to uncover the underlying mechanisms of fault propagation. The resulting causal structures align strikingly with SHAP findings, consistently highlighting key process elements-like cooling and separation systems-as pivotal to fault development. Together, these methods not only enhance detection accuracy but also provide operators with clear, actionable insights into fault origins, a synergy that, to our knowledge, has not been previously explored in this context. This dual approach bridges predictive power with causal understanding, offering a robust tool for monitoring complex manufacturing environments and paving the way for smarter, more interpretable fault detection in industrial systems.
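下面示意这一两步思路:先用SHAP筛选驱动故障预测的关键变量,再在小变量集上分析结构。论文用多种算法生成有向无环图(DAG);这里仅以偏相关强度粗略刻画变量间关联,作为因果发现步骤的占位(并非PC等正式算法),数据为合成示例,假设已安装shap。

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)); y = (X[:, 3] + X[:, 7] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
sv = shap.TreeExplainer(clf).shap_values(X)
if isinstance(sv, list):                      # 旧版shap: 每类一个数组
    sv = sv[1]
elif sv.ndim == 3:                            # 新版shap: (样本, 特征, 类)
    sv = sv[..., 1]
top = np.argsort(np.abs(sv).mean(0))[::-1][:5]
print("top features:", top)                   # SHAP重要性Top-5特征

Z = X[:, top]
prec = np.linalg.inv(np.cov(Z.T))             # 精度矩阵
pcorr = -prec / np.sqrt(np.outer(np.diag(prec), np.diag(prec)))
print(np.round(pcorr, 2))                     # 偏相关, 作为图结构线索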
【7】MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection
标题:MUStReason:用于诊断视频语言模型在多模态讽刺检测中语用推理能力的基准
链接:https://arxiv.org/abs/2510.23727
摘要:讽刺(sarcasm)是反语的一种特殊类型,需要区分所说的内容和所指的内容。检测讽刺不仅取决于话语的字面内容,还取决于非语言线索,例如说话者的语调、面部表情和对话上下文。然而,目前的多模态模型难以应对像讽刺检测这样的复杂任务,它需要识别跨模态的相关线索并对其进行语用推理,以推断说话者的意图。为了探索视频语言模型(VideoLM)的这些局限,我们引入了MUStReason,这是一个诊断基准,带有模态特定相关线索以及识别讽刺意图所需推理步骤的丰富注释。除了对VideoLM的讽刺分类性能进行基准测试外,我们还利用MUStReason,通过把问题分解为感知与推理两部分,对模型生成的推理进行定量和定性评估。我们进一步提出了PragCoT框架,引导VideoLM关注字面意义之外的隐含意图,而这正是检测讽刺的核心能力。
摘要:Sarcasm is a specific type of irony which involves discerning what is said from what is meant. Detecting sarcasm depends not only on the literal content of an utterance but also on non-verbal cues such as speaker's tonality, facial expressions and conversational context. However, current multimodal models struggle with complex tasks like sarcasm detection, which require identifying relevant cues across modalities and pragmatically reasoning over them to infer the speaker's intention. To explore these limitations in VideoLMs, we introduce MUStReason, a diagnostic benchmark enriched with annotations of modality-specific relevant cues and underlying reasoning steps to identify sarcastic intent. In addition to benchmarking sarcasm classification performance in VideoLMs, using MUStReason we quantitatively and qualitatively evaluate the generated reasoning by disentangling the problem into perception and reasoning, we propose PragCoT, a framework that steers VideoLMs to focus on implied intentions over literal meaning, a property core to detecting sarcasm.
【8】NUM2EVENT: Interpretable Event Reasoning from Numerical time-series
标题:NUM2EVENT:从数值时间序列进行可解释的事件推理
链接:https://arxiv.org/abs/2510.23630
摘要:大型语言模型(LLM)最近展示了令人印象深刻的多模态推理能力,但它们对纯数值时间序列信号的理解仍然有限。现有方法主要集中在预测或趋势描述上,既没有发掘驱动数值变化的潜在事件,也没有解释其背后的推理过程。在这项工作中,我们提出了数值到事件的推理与解码任务,其目的是从数值输入中推断可解释的结构化事件,即使当下的文本不可用。为了解决数据稀缺和语义对齐的挑战,我们提出了一个推理感知的框架,它集成了代理引导的事件提取器(AGE)、基于带标记多变量Hawkes过程的合成生成器(EveDTS),以及一个将时间序列编码器与结构化解码器相结合的两阶段微调流水线。我们的模型显式地对数值变化进行推理,生成中间解释,并输出结构化的事件假设。多领域数据集上的实验表明,我们的方法在事件级精确率和召回率上大幅优于强LLM基线。这些结果为桥接定量推理和语义理解指出了一个新方向,使LLM能够直接从数值动态解释和预测事件。
摘要:Large language models (LLMs) have recently demonstrated impressive multimodal reasoning capabilities, yet their understanding of purely numerical time-series signals remains limited. Existing approaches mainly focus on forecasting or trend description, without uncovering the latent events that drive numerical changes or explaining the reasoning process behind them. In this work, we introduce the task of number-to-event reasoning and decoding, which aims to infer interpretable structured events from numerical inputs, even when current text is unavailable. To address the data scarcity and semantic alignment challenges, we propose a reasoning-aware framework that integrates an agent-guided event extractor (AGE), a marked multivariate Hawkes-based synthetic generator (EveDTS), and a two-stage fine-tuning pipeline combining a time-series encoder with a structured decoder. Our model explicitly reasons over numerical changes, generates intermediate explanations, and outputs structured event hypotheses. Experiments on multi-domain datasets show that our method substantially outperforms strong LLM baselines in event-level precision and recall. These results suggest a new direction for bridging quantitative reasoning and semantic understanding, enabling LLMs to explain and predict events directly from numerical dynamics.
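作为EveDTS式"合成事件生成器"的概念草图,下面用Ogata稀疏化(thinning)采样一个带标记的多变量Hawkes过程(指数核;所有参数均为说明性假设,与论文的具体生成器无关)。

import numpy as np

def intensity(t, events, mu, alpha, beta):
    lam = mu.copy()
    for s, d in events:
        lam += alpha[:, d] * np.exp(-beta * (t - s))   # 历史事件的互激发
    return lam

def sample_hawkes(mu, alpha, beta, T=50.0, seed=0):
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < T:
        lam_bar = intensity(t, events, mu, alpha, beta).sum()  # 上界(强度在事件间递减)
        t += rng.exponential(1.0 / lam_bar)                    # 候选事件时间
        if t >= T:
            break
        lam = intensity(t, events, mu, alpha, beta)
        if rng.uniform() < lam.sum() / lam_bar:                # 接受/拒绝
            d = rng.choice(len(mu), p=lam / lam.sum())         # 事件类型(标记)
            events.append((t, d))
    return events

mu = np.array([0.2, 0.1]); alpha = np.array([[0.3, 0.1], [0.2, 0.2]]); beta = 1.0
print(len(sample_hawkes(mu, alpha, beta)))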
【9】Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis
标题:通过子空间分解和影响分析了解公平性和预测误差
链接:https://arxiv.org/abs/2510.23935
摘要:机器学习模型取得了广泛的成功,但往往继承并放大历史偏见,导致不公平的结果。传统的公平性方法通常在预测层面施加约束,而没有解决数据表示中的潜在偏差。在这项工作中,我们提出了一个有原则的框架,通过调整数据表示来平衡预测效用和公平性。利用充分降维(sufficient dimension reduction),我们将特征空间分解为目标相关、敏感和共享三个分量,并通过选择性地移除敏感信息来控制公平性-效用权衡。我们从理论上分析了随着共享子空间的加入,预测误差和公平性差距如何演变,并采用影响函数量化其对参数估计渐近行为的影响。在合成数据集和真实数据集上的实验验证了我们的理论见解,并表明所提方法在保持预测性能的同时有效提升了公平性。
摘要:Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.
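下面用一个线性化的小示例说明"分解表示空间并移除敏感分量"的思路:以与敏感属性互协方差的主方向近似敏感子空间,并投影移除。论文使用充分降维(SDR),此处仅是简化示意。

import numpy as np

def remove_sensitive_subspace(X, s, k=1):
    """X: (n,d) 表示; s: (n,) 敏感属性; 移除与s最相关的k个方向。"""
    Xc = X - X.mean(0)
    sc = (s - s.mean()).reshape(-1, 1)
    C = Xc.T @ sc / len(s)                      # 与敏感属性的互协方差 (d,1)
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    B = U[:, :k]                                # 敏感子空间的正交基
    return Xc - Xc @ B @ B.T                    # 投影掉敏感方向

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 300).astype(float)
X = rng.normal(size=(300, 5)); X[:, 0] += 2 * s     # 第0维泄露敏感属性
Xf = remove_sensitive_subspace(X, s)
print(np.corrcoef(Xf[:, 0], s)[0, 1])               # 应接近0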
【10】VIKING: Deep variational inference with stochastic projections
标题:VIKING:利用随机投影进行深度变分推理
链接:https://arxiv.org/abs/2510.23684
备注:NeurIPS 2025 (poster)
摘要:变分平均场近似在面对当代过参数化的深度神经网络时往往表现不佳。贝叶斯处理通常与高质量的预测和不确定性估计联系在一起,但实际情况恰恰相反:训练不稳定、预测能力差、校准不达标。基于最近关于神经网络重参数化的工作,我们提出了一个简单的变分族,它考虑参数空间的两个独立线性子空间,分别代表训练数据支撑之内和之外的函数变化。这使我们能够构建一个完全相关的近似后验,既反映过参数化,又只需调整易于解释的超参数。我们开发了可扩展的数值例程,用于最大化相应的证据下界(ELBO)并从近似后验中采样。实验上,与各种基线方法相比,我们在不同任务、模型和数据集上观察到最先进的性能。我们的结果表明,只要构建的推理机制能够反映重参数化的几何结构,应用于深度神经网络的近似贝叶斯推理远非无可救药。
摘要:Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of training data. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization that tunes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when constructing inference mechanisms that reflect the geometry of reparametrizations.
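下面示意该变分族的核心构造:在参数空间的两个线性子空间内放置高斯后验,采样时把低维噪声映射回全参数空间。子空间基此处用随机正交基代替(真实方法中由训练数据支撑决定),尺度超参数可经ELBO优化。

import torch

d, k = 1000, 8                                    # 参数维度与子空间维数
theta_map = torch.randn(d)                        # MAP/预训练参数
P_in, _ = torch.linalg.qr(torch.randn(d, k))      # "数据支撑内"子空间基(假设)
P_out, _ = torch.linalg.qr(torch.randn(d, k))     # "数据支撑外"子空间基(假设)

log_s_in = torch.zeros(k, requires_grad=True)     # 易解释的尺度超参数
log_s_out = torch.zeros(k, requires_grad=True)

def sample_theta():
    z_in = torch.randn(k) * log_s_in.exp()
    z_out = torch.randn(k) * log_s_out.exp()
    return theta_map + P_in @ z_in + P_out @ z_out   # 低秩、全相关的后验样本

print(sample_theta().shape)   # 该样本可用于重参数化的ELBO随机估计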
检测相关(3篇)
【1】ARIMA_PLUS: Large-scale, Accurate, Automatic and Interpretable In-Database Time Series Forecasting and Anomaly Detection in Google BigQuery
标题:ARIMA_PLUS:Google BigQuery中的大规模、准确、自动和可解释的数据库内时间序列预测和异常检测
链接:https://arxiv.org/abs/2510.24452
摘要:时间序列预测和异常检测是零售、制造、广告和能源等行业从业者的常见任务。其中有两个突出的挑战:(1)高效、准确地自动预测大量时间序列或检测其中的异常;(2)确保结果的可解释性,以便有效融入业务洞察。我们提出ARIMA_PLUS,一个通过(a)准确且可解释的时间序列模型与(b)可扩展且全托管的系统基础设施的独特组合来克服这两个挑战的新框架。该模型采用顺序化、模块化的结构来处理时间序列的不同成分,包括假日效应、季节性、趋势和异常,使结果具有高度可解释性。我们对每个模块都做了新的增强,并建立了一个统一框架来同时处理预测和异常检测任务。在准确性方面,在Monash预测库42个公共数据集上的综合基准测试表明,其性能不仅优于成熟的统计方法(如ETS、ARIMA、TBATS、Prophet),也优于较新的神经网络模型(如DeepAR、N-BEATS、PatchTST、TimeMixer)。在基础设施方面,它直接内置于Google Cloud BigQuery的查询引擎中,使用简单的SQL接口,并将数据清理和模型选择等繁琐的技术环节自动化。它随托管的云计算和存储资源自动扩展,仅用1.5小时即可预测1亿条时间序列,吞吐量超过每秒18000条时间序列。在可解释性方面,我们通过若干案例研究展示了它生成的时间序列洞察及其可定制性。
摘要:Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising and energy. Two unique challenges stand out: (1) efficiently and accurately forecasting time series or detecting anomalies in large volumes automatically; and (2) ensuring interpretability of results to effectively incorporate business insights. We present ARIMA_PLUS, a novel framework to overcome these two challenges by a unique combination of (a) accurate and interpretable time series models and (b) scalable and fully managed system infrastructure. The model has a sequential and modular structure to handle different components of the time series, including holiday effects, seasonality, trend, and anomalies, which enables high interpretability of the results. Novel enhancements are made to each module, and a unified framework is established to address both forecasting and anomaly detection tasks simultaneously. In terms of accuracy, its comprehensive benchmark on the 42 public datasets in the Monash forecasting repository shows superior performance over not only well-established statistical alternatives (such as ETS, ARIMA, TBATS, Prophet) but also newer neural network models (such as DeepAR, N-BEATS, PatchTST, TimeMixer). In terms of infrastructure, it is directly built into the query engine of BigQuery in Google Cloud. It uses a simple SQL interface and automates tedious technicalities such as data cleaning and model selection. It automatically scales with managed cloud computational and storage resources, making it possible to forecast 100 million time series using only 1.5 hours with a throughput of more than 18000 time series per second. In terms of interpretability, we present several case studies to demonstrate time series insights it generates and customizability it offers.
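下面用Python客户端示意BigQuery ML的SQL接口风格:训练ARIMA_PLUS模型并预测。项目、数据集与列名均为占位;选项名称凭记忆写出,请以BigQuery ML文档为准。

from google.cloud import bigquery   # 假设已配置GCP凭据

client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my_project.my_dataset.demand_model`
OPTIONS(model_type = 'ARIMA_PLUS',
        time_series_timestamp_col = 'ts',
        time_series_data_col = 'demand',
        time_series_id_col = 'store_id') AS
SELECT ts, demand, store_id FROM `my_project.my_dataset.sales`
""").result()

rows = client.query("""
SELECT * FROM ML.FORECAST(MODEL `my_project.my_dataset.demand_model`,
                          STRUCT(30 AS horizon, 0.9 AS confidence_level))
""").result()
for r in list(rows)[:3]:
    print(dict(r))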
【2】Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection
标题:局部核投影离群度:多模式离群点检测的两阶段方法
链接:https://arxiv.org/abs/2510.24043
备注:10 pages, 4 figures; submitted to The IEICE Transactions on Information and Systems
摘要:本文提出两阶段LKPLO,一种新的多阶段离群值检测框架,克服了传统基于投影的方法并存的两个局限:依赖单一固定的统计度量,以及假设数据只有单一结构。我们的框架独特地综合了三个关键概念:(1)一个广义的基于损失的离群度度量(PLO),用灵活的自适应损失函数(如我们提出的类SVM损失)取代固定度量;(2)一个全局核PCA阶段,用于线性化非线性数据结构;(3)一个后续的局部聚类阶段,用于处理多峰分布。在10个基准数据集上进行的带自动超参数优化的全面5折交叉验证实验表明,两阶段LKPLO达到了最先进的性能。在现有方法失效的具有挑战性结构的数据集上,它显著优于强基线,尤其是在多簇数据(Optdigits)和复杂高维数据(Arrhythmia)上。此外,消融研究从经验上证实,核化阶段与局部化阶段的协同组合对其优越性能不可或缺。这项工作为一类重要的离群值检测问题贡献了一个强大的新工具,并强调了混合式多阶段架构的重要性。
摘要:This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes the coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. Our framework uniquely synthesizes three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions like our proposed SVM-like loss; (2) a global kernel PCA stage to linearize non-linear data structures; and (3) a subsequent local clustering stage to handle multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, demonstrate that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably on multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that the synergistic combination of both the kernelization and localization stages is indispensable for its superior performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.
【3】In Search of the Unknown Unknowns: A Multi-Metric Distance Ensemble for Out of Distribution Anomaly Detection in Astronomical Surveys
标题:寻找未知的未知:用于天文巡天中分布外异常检测的多度量距离集成
链接:https://arxiv.org/abs/2510.23702
备注:9 pages, 5 figures, Accepted at the 2025 Machine Learning and the Physical Sciences (ML4PS) workshop at NeurIPS
摘要:基于距离的方法涉及计算特征之间的距离值,是机器学习中的成熟范式。在异常检测中,异常通过其与正常数据点的大距离来识别。然而,这些方法的性能通常取决于用户选择的单一距离度量(例如欧氏距离),而对于天文学中常见的复杂高维特征空间,这未必是最优的。在这里,我们介绍一种新的异常检测方法——多度量距离异常检测(DiMMAD),它使用距离度量的集成来发现新颖样本。使用多种距离度量实际上等同于在特征空间中使用不同的几何。通过使用多样化距离度量的鲁棒集成,我们克服了度量选择问题,创建了不依赖任何单一距离定义的异常分数。我们将这种多度量方法展示为在天文时间序列上进行简单、可解释的科学发现的工具——(1)使用即将开展的Vera C. Rubin天文台时空遗产巡天(LSST)的模拟数据,以及(2)来自Zwicky瞬变设施(ZTF)的真实数据。我们发现,DiMMAD在分布外异常检测(数据中可能构成新类别的异常)方面表现出色,并且在最大化所发现新类别的多样性这一目标上击败了其他最先进的方法。对于罕见的分布内异常检测,DiMMAD与其他方法表现相近,但可能提供更好的可解释性。我们所有的代码都是开源的:DiMMAD在DistClassiPy中实现:https://github.com/sidchaini/distclassipy/,复现本文结果的全部代码见:https://github.com/sidchaini/dimmad/。
摘要:Distance-based methods involve the computation of distance values between features and are a well-established paradigm in machine learning. In anomaly detection, anomalies are identified by their large distance from normal data points. However, the performance of these methods often hinges on a single, user-selected distance metric (e.g., Euclidean), which may not be optimal for the complex, high-dimensional feature spaces common in astronomy. Here, we introduce a novel anomaly detection method, Distance Multi-Metric Anomaly Detection (DiMMAD), which uses an ensemble of distance metrics to find novelties. Using multiple distance metrics is effectively equivalent to using different geometries in the feature space. By using a robust ensemble of diverse distance metrics, we overcome the metric-selection problem, creating an anomaly score that is not reliant on any single definition of distance. We demonstrate this multi-metric approach as a tool for simple, interpretable scientific discovery on astronomical time series -- (1) with simulated data for the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time, and (2) real data from the Zwicky Transient Facility. We find that DiMMAD excels at out-of-distribution anomaly detection -- anomalies in the data that might be new classes -- and beats other state-of-the-art methods in the goal of maximizing the diversity of new classes discovered. For rare in-distribution anomaly detection, DiMMAD performs similarly to other methods, but may allow for improved interpretability. All our code is open source: DiMMAD is implemented within DistClassiPy: https://github.com/sidchaini/distclassipy/, while all code to reproduce the results of this paper is available here: https://github.com/sidchaini/dimmad/.
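多度量距离集成的思想可以用几行代码说明:在多种距离度量下计算k近邻距离,把各度量下的得分转为秩次后平均,得到不依赖单一距离定义的异常分数(度量列表与k为说明性选择,并非DiMMAD的完整实现)。

import numpy as np
from sklearn.neighbors import NearestNeighbors

def multi_metric_score(X, metrics=("euclidean", "manhattan", "chebyshev", "cosine"), k=10):
    ranks = []
    for m in metrics:
        nn = NearestNeighbors(n_neighbors=k + 1, metric=m).fit(X)
        dist, _ = nn.kneighbors(X)
        score = dist[:, 1:].mean(axis=1)            # 去掉自身后的平均kNN距离
        ranks.append(score.argsort().argsort())     # 转为秩次, 消除量纲差异
    return np.mean(ranks, axis=0)                   # 秩次平均作为集成异常分数

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 6)), rng.normal(5, 1, size=(3, 6))])
print(multi_metric_score(X).argsort()[-3:])         # 应大致对应3个离群点(索引200-202)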
分类|识别(3篇)
【1】EDC: Equation Discovery for Classification
标题:EDC:用于分类的方程发现
链接:https://arxiv.org/abs/2510.24310
备注:This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in Lecture Notes in Computer Science, and is available online at this https URL
摘要:方程发现(ED)技术在回归任务中取得了相当大的成功,被用来发现简洁且可解释的模型(符号回归)。在本文中,我们提出一个新的基于ED的二分类框架。我们提出的方法EDC能够发现规模可控的解析函数,用以指定决策边界的位置和形状。在对人工数据和真实数据的大量实验中,我们展示了EDC既能发现目标方程的结构,也能发现其参数的取值,在二分类上优于当前最先进的基于ED的分类方法,并达到与最先进二分类器相当的性能。我们建议了一种复杂度适中的语法,它在测试数据集上表现良好;但需要指出,具体语法——以及由此决定的模型复杂度——是可配置的,在有需要时尤其可以把领域特定的表达式纳入模式语言。所提出的语法由一系列加法项组成,包括线性、二次和指数项,以及两个特征的乘积项(产生适合捕捉类XOR依赖关系的双曲决策边界)。实验表明,这种语法允许相当灵活的决策边界,同时又不至于丰富到引起过拟合。
摘要:Equation Discovery techniques have shown considerable success in regression tasks, where they are used to discover concise and interpretable models (\textit{Symbolic Regression}). In this paper, we propose a new ED-based binary classification framework. Our proposed method EDC finds analytical functions of manageable size that specify the location and shape of the decision boundary. In extensive experiments on artificial and real-life data, we demonstrate how EDC is able to discover both the structure of the target equation as well as the value of its parameters, outperforming the current state-of-the-art ED-based classification methods in binary classification and achieving performance comparable to the state of the art in binary classification. We suggest a grammar of modest complexity that appears to work well on the tested datasets but argue that the exact grammar -- and thus the complexity of the models -- is configurable, and especially domain-specific expressions can be included in the pattern language, where that is required. The presented grammar consists of a series of summands (additive terms) that include linear, quadratic and exponential terms, as well as products of two features (producing hyperbolic curves ideal for capturing XOR-like dependencies). The experiments demonstrate that this grammar allows fairly flexible decision boundaries while not so rich to cause overfitting.
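下面的小例子说明该语法所能表达的决策边界形式。真实的EDC通过方程发现来搜索结构;这里仅固定语法下的一个候选结构并拟合系数,以演示乘积项对类XOR依赖的表达能力。

import numpy as np
from sklearn.linear_model import LogisticRegression

def grammar_features(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2,           # 线性项
                            x1**2, x2**2,     # 二次项
                            np.exp(x1),       # 指数项
                            x1 * x2])         # 乘积项(双曲型, 适合XOR)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)     # XOR型标签
clf = LogisticRegression(max_iter=1000).fit(grammar_features(X), y)
print("accuracy:", clf.score(grammar_features(X), y))   # 乘积项使其近乎线性可分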
【2】Fixed Point Neural Acceleration and Inverse Surrogate Model for Battery Parameter Identification
标题:用于电池参数辨识的定点神经加速与逆代理模型
链接:https://arxiv.org/abs/2510.24135
备注:31 pages, 11 figures, submitted to Applied Energy
摘要:电动汽车的快速普及加剧了对锂离子电池进行准确、高效诊断的需求。电化学电池模型的参数辨识被广泛认为是电池健康评估的有力方法。然而,传统的元启发式方法计算成本高、收敛速度慢,而最近的机器学习方法则受限于对恒流数据的依赖,这类数据在实践中可能无法获得。为了克服这些挑战,我们提出了基于深度学习的电化学电池模型参数辨识框架。该框架将含电解质单粒子模型(SPMe)的神经代理模型(NeuralSPMe)与基于深度学习的定点迭代方法相结合。NeuralSPMe在真实的EV负载曲线上训练,以准确预测动态工况下的锂浓度动态,而参数更新网络(PUNet)执行定点迭代更新,显著降低每个样本的评估时间和收敛所需的总迭代次数。实验评估表明,与传统元启发式算法相比,所提框架将参数辨识速度提升2000倍以上,采样效率更优,精度提高10倍以上,尤其是在实际应用中遇到的动态负载场景下。
摘要:The rapid expansion of electric vehicles has intensified the need for accurate and efficient diagnosis of lithium-ion batteries. Parameter identification of electrochemical battery models is widely recognized as a powerful method for battery health assessment. However, conventional metaheuristic approaches suffer from high computational cost and slow convergence, and recent machine learning methods are limited by their reliance on constant current data, which may not be available in practice. To overcome these challenges, we propose deep learning-based framework for parameter identification of electrochemical battery models. The proposed framework combines a neural surrogate model of the single particle model with electrolyte (NeuralSPMe) and a deep learning-based fixed-point iteration method. NeuralSPMe is trained on realistic EV load profiles to accurately predict lithium concentration dynamics under dynamic operating conditions while a parameter update network (PUNet) performs fixed-point iterative updates to significantly reduce both the evaluation time per sample and the overall number of iterations required for convergence. Experimental evaluations demonstrate that the proposed framework accelerates the parameter identification by more than 2000 times, achieves superior sample efficiency and more than 10 times higher accuracy compared to conventional metaheuristic algorithms, particularly under dynamic load scenarios encountered in practical applications.
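下面示意"用参数更新网络做定点迭代"的推理循环:p_{k+1} = p_k + PUNet(p_k, 观测特征)。网络结构、特征维度与收敛判据均为说明性假设;真实系统中PUNet需在代理模型(NeuralSPMe)生成的数据上训练。

import torch

punet = torch.nn.Sequential(torch.nn.Linear(8 + 16, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 8))    # 输出参数增量(假设8维参数)

def identify(obs_feat, p0, max_iter=50, tol=1e-4):
    p = p0
    for _ in range(max_iter):
        dp = punet(torch.cat([p, obs_feat]))            # 学习到的定点更新
        p_next = p + dp
        if torch.norm(p_next - p) < tol:                # 收敛判据
            break
        p = p_next
    return p

p_hat = identify(torch.randn(16), torch.zeros(8))
print(p_hat.shape)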
【3】Quantum Machine Learning for Image Classification: A Hybrid Model of Residual Network with Quantum Support Vector Machine
标题:用于图像分类的量子机器学习:残差网络与量子支持向量机的混合模型
链接:https://arxiv.org/abs/2510.23659
摘要:最近,将量子机器学习(QML)与经典深度学习方法相结合受到越来越多的关注,因为计算技术是提高图像分类任务性能的关键。本研究提出了一种混合方法,在马铃薯病害检测的背景下,使用ResNet-50(残差网络)进行特征提取,并使用量子支持向量机(QSVM)进行分类。经典机器学习和深度学习模型通常难以处理高维复杂数据集,因此需要量子计算等先进技术来提高分类效率。在我们的研究中,我们使用ResNet-50从马铃薯病害的RGB图像中提取深度特征表示,随后使用主成分分析(PCA)对这些特征进行降维。所得特征再经QSVM模型处理,该模型应用ZZ、Z和Pauli-X等多种量子特征映射,将经典数据转换为量子态。为了评估模型性能,我们将其与支持向量机(SVM)和随机森林(RF)等经典机器学习算法进行比较,并使用五折分层交叉验证进行综合评估。实验结果表明,基于Z特征映射的QSVM优于经典模型,准确率达到99.23%,超过了SVM和RF模型。这项研究突出了将量子计算集成到图像分类中的优势,并通过量子-经典混合建模提供了一种潜在的病害检测解决方案。
摘要:Recently, there has been growing attention on combining quantum machine learning (QML) with classical deep learning approaches, as computational techniques are key to improving the performance of image classification tasks. This study presents a hybrid approach that uses ResNet-50 (Residual Network) for feature extraction and Quantum Support Vector Machines (QSVM) for classification in the context of potato disease detection. Classical machine learning as well as deep learning models often struggle with high-dimensional and complex datasets, necessitating advanced techniques like quantum computing to improve classification efficiency. In our research, we use ResNet-50 to extract deep feature representations from RGB images of potato diseases. These features are then subjected to dimensionality reduction using Principal Component Analysis (PCA). The resulting features are processed through QSVM models which apply various quantum feature maps such as ZZ, Z, and Pauli-X to transform classical data into quantum states. To assess the model performance, we compared it with classical machine learning algorithms such as Support Vector Machine (SVM) and Random Forest (RF) using five-fold stratified cross-validation for comprehensive evaluation. The experimental results demonstrate that the Z-feature map-based QSVM outperforms classical models, achieving an accuracy of 99.23 percent, surpassing both SVM and RF models. This research highlights the advantages of integrating quantum computing into image classification and provides a potential disease detection solution through hybrid quantum-classical modeling.
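下面是该混合流水线的草图:深度特征(此处以随机向量代替ResNet-50输出)经PCA降维后,送入基于ZZ特征映射的量子核SVM。依赖qiskit与qiskit-machine-learning,API随版本变动,此处按近期版本凭记忆写出,仅作流程示意。

import numpy as np
from sklearn.decomposition import PCA
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(40, 2048))              # 假设的ResNet-50特征
labels = rng.integers(0, 2, 40)
X = PCA(n_components=4).fit_transform(feats)     # 量子比特数受限, 需先降维

kernel = FidelityQuantumKernel(feature_map=ZZFeatureMap(feature_dimension=4))
qsvc = QSVC(quantum_kernel=kernel).fit(X, labels)
print("train acc:", qsvc.score(X, labels))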
表征(5篇)
【1】Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning
标题:感知学习:感觉表示学习与决策学习的正式分离
链接:https://arxiv.org/abs/2510.24356
摘要:我们引入感知学习(PeL),这一范式使用与任务无关的信号来优化智能体的感知接口$f_\phi:\mathcal{X}\to\mathcal{Z}$,并与下游决策学习$g_\theta:\mathcal{Z}\to\mathcal{Y}$解耦。PeL直接针对无需标签的感知属性,例如对干扰因素的稳定性、不发生崩溃的信息量以及受控的几何结构,并通过客观的、对表示重参数化不变的度量进行评估。我们形式化了感知与决策的分离,定义了独立于目标函数或重参数化的感知属性,并证明保持充分不变量的PeL更新与贝叶斯任务风险梯度正交。此外,我们还提供了一套与任务无关的评估指标来认证感知质量。
摘要:We introduce Perception Learning (PeL), a paradigm that optimizes an agent's sensory interface $f_\phi:\mathcal{X}\to\mathcal{Z}$ using task-agnostic signals, decoupled from downstream decision learning $g_\theta:\mathcal{Z}\to\mathcal{Y}$. PeL directly targets label-free perceptual properties, such as stability to nuisances, informativeness without collapse, and controlled geometry, assessed via objective representation-invariant metrics. We formalize the separation of perception and decision, define perceptual properties independent of objectives or reparameterizations, and prove that PeL updates preserving sufficient invariants are orthogonal to Bayes task-risk gradients. Additionally, we provide a suite of task-agnostic evaluation metrics to certify perceptual quality.
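下面示意"稳定性+无崩溃信息量"这类无标签感知属性如何变成可优化的损失:对同一批输入的两种干扰增广要求表示稳定,同时用方差下限项防止表示崩溃(类似VICReg的构造,与论文的具体度量不同,仅作概念说明)。

import torch

def perception_loss(z1, z2, var_floor=1.0, lam=1.0):
    """z1, z2: 同一批输入在两种干扰增广下的表示 (B, D)。"""
    stability = ((z1 - z2) ** 2).mean()                   # 对干扰的稳定性
    std = z1.std(dim=0)
    anti_collapse = torch.relu(var_floor - std).mean()    # 方差下限, 防崩溃
    return stability + lam * anti_collapse

z1, z2 = torch.randn(64, 32), torch.randn(64, 32)
print(perception_loss(z1, z2))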
【2】Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
标题:增强预训练的表示可分类性可以提高其可解释性
链接:https://arxiv.org/abs/2510.24105
备注:ICLR 2025 (Spotlight)
摘要:预训练模型的视觉表示优先考虑下游任务的可分类性,而预训练视觉模型的广泛应用对表示的可解释性提出了新要求。然而,目前尚不清楚预训练表示能否同时实现高可解释性和高可分类性。为了回答这个问题,我们利用表示与其中可解释语义比例的相关性来量化表示的可解释性。给定预训练表示,解释只能捕获其中可解释的语义,而不可解释的部分会导致信息损失。基于这一事实,我们提出了内在可解释性分数(IIS),它评估信息损失、衡量可解释语义的比例,并量化表示的可解释性。在对具有不同可分类性的表示进行可解释性评估时,我们惊讶地发现可解释性与可分类性正相关,即可分类性更高的表示提供了更多可被解释捕获的可解释语义。这一观察进一步支持了预训练表示的两个好处。首先,表示的可分类性可以通过以可解释性最大化为目标的微调进一步提高。其次,随着表示可分类性的提高,基于其解释得到的预测的准确率下降更小。所发现的正相关性及相应应用表明,从业者可以把预训练视觉模型在可解释性与可分类性上的改进统一起来。代码可在https://github.com/ssfgunner/IIS上获得。
摘要:The visual representation of a pre-trained model prioritizes the classifiability on downstream tasks, while the widespread applications for pre-trained visual models have posed new requirements for representation interpretability. However, it remains unclear whether the pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability. In the evaluation of the representation interpretability with different classifiability, we surprisingly discover that the interpretability and classifiability are positively correlated, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. This observation further supports two benefits to the pre-trained representations. First, the classifiability of representations can be further improved by fine-tuning with interpretability maximization. Second, with the classifiability improvement for the representations, we obtain predictions based on their interpretations with less accuracy degradation. The discovered positive correlation and corresponding applications show that practitioners can unify the improvements in interpretability and classifiability for pre-trained vision models. Codes are available at https://github.com/ssfgunner/IIS.
【3】Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation
标题:用于鲁棒多任务机器人操纵的语言条件表示与专家混合策略
链接:https://arxiv.org/abs/2510.24055
备注:8 pages
摘要:感知模糊性和任务冲突限制了机器人通过模仿学习进行多任务操纵。我们提出了一个框架,结合语言条件视觉表示(LCVR)模块和语言条件专家混合密度策略(LMoE-DP)。LCVR通过将视觉特征与语言指令相结合来消除感知模糊,从而能够区分视觉上相似的任务。为了缓解任务冲突,LMoE-DP使用稀疏专家架构来专门建模不同的多峰动作分布,并通过梯度调制保持稳定。在真实机器人基准测试中,LCVR将基于Transformer的动作分块(ACT)和扩散策略(DP)的成功率分别提高了33.75%和25%。完整框架实现了79%的平均成功率,比先进基线高出21%。我们的工作表明,将语义接地与专家特化相结合,能够实现鲁棒、高效的多任务操纵。
摘要:Perceptual ambiguity and task conflict limit multitask robotic manipulation via imitation learning. We propose a framework combining a Language-Conditioned Visual Representation (LCVR) module and a Language-conditioned Mixture-ofExperts Density Policy (LMoE-DP). LCVR resolves perceptual ambiguities by grounding visual features with language instructions, enabling differentiation between visually similar tasks. To mitigate task conflict, LMoE-DP uses a sparse expert architecture to specialize in distinct, multimodal action distributions, stabilized by gradient modulation. On real-robot benchmarks, LCVR boosts Action Chunking with Transformers (ACT) and Diffusion Policy (DP) success rates by 33.75% and 25%, respectively. The full framework achieves a 79% average success, outperforming the advanced baseline by 21%. Our work shows that combining semantic grounding and expert specialization enables robust, efficient multi-task manipulation
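下面是语言条件稀疏专家混合策略头的概念草图:门控网络按(语言+视觉)上下文做top-1路由,每个专家输出一个动作分量。专家数、路由方式与维度均为说明性假设,目的只在展示"不同任务路由到不同动作分布"的思路。

import torch

class MoEPolicy(torch.nn.Module):
    def __init__(self, ctx_dim=256, act_dim=7, n_experts=4):
        super().__init__()
        self.gate = torch.nn.Linear(ctx_dim, n_experts)
        self.mu = torch.nn.ModuleList(
            [torch.nn.Linear(ctx_dim, act_dim) for _ in range(n_experts)])

    def forward(self, ctx):
        logits = self.gate(ctx)
        idx = logits.argmax(dim=-1)                        # top-1稀疏路由
        mus = torch.stack([m(ctx) for m in self.mu], 1)    # (B, E, A)
        return mus[torch.arange(ctx.size(0)), idx]         # 所选专家的动作均值

policy = MoEPolicy()
print(policy(torch.randn(2, 256)).shape)                   # (2, 7)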
【4】Debiasing Reward Models by Representation Learning with Guarantees
标题:通过带保证的表示学习消除奖励模型的偏差
链接:https://arxiv.org/abs/2510.23751
摘要:最近的对齐技术(如来自人类反馈的强化学习)已被广泛采用,通过学习和利用奖励模型使大型语言模型与人类偏好对齐。在实践中,这些模型经常利用虚假相关性,例如涉及回复长度、歧视、谄媚和概念偏见等,这一问题正受到越来越多的关注。在这项工作中,我们提出了一个有原则的框架,在保留反映预期偏好的潜在因素的同时,减轻奖励模型中的这些偏差。我们首先给出数据生成过程的形式化表述,假设观测数据(例如文本)由虚假潜变量和非虚假潜变量共同生成。有趣的是,我们证明了无论是否存在虚假潜变量的代理变量,这些非虚假潜变量在理论上都可以从数据中识别出来。这进一步启发了一种实用方法,利用变分推理恢复这些变量,并利用它们来训练奖励模型。在合成数据集和真实数据集上的实验表明,我们的方法有效缓解了虚假相关问题,并产生了更鲁棒的奖励模型。
摘要:Recent alignment techniques, such as reinforcement learning from human feedback, have been widely adopted to align large language models with human preferences by learning and leveraging reward models. In practice, these models often exploit spurious correlations, involving, e.g., response length, discrimination, sycophancy, and conceptual bias, which is a problem that has received increasing attention. In this work, we propose a principled framework that mitigates these biases in reward models while preserving the underlying factors that reflect intended preferences. We first provide a formulation of the data-generating process, assuming that the observed data (e.g., text) is generated from both spurious and non-spurious latent variables. We show that, interestingly, these non-spurious latent variables can be theoretically identified from data, regardless of whether a surrogate for the spurious latent variables is available. This further inspires a practical method that uses variational inference to recover these variables and leverages them to train reward models. Experiments on synthetic and real-world datasets demonstrate that our method effectively mitigates spurious correlation issues and yields more robust reward models.
【5】Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
标题:结构感知融合与渐进注入用于多峰分子表示学习
链接:https://arxiv.org/abs/2510.23640
备注:Accepted by NeurIPS 2025
摘要:多模态分子模型经常受到3D构象不可靠和模态崩溃的困扰,限制了它们的鲁棒性和泛化能力。我们提出MuMo,一个结构化的多模态融合框架,通过两个关键策略来应对分子表示中的这些挑战。为了减少依赖构象的融合的不稳定性,我们设计了一个结构化融合流水线(SFP),将2D拓扑和3D几何结合成统一而稳定的结构先验。为了缓解朴素融合引起的模态崩溃,我们引入了渐进注入(PI)机制,将该先验非对称地整合到序列流中,在实现跨模态信息丰富的同时保留模态特定的建模。MuMo建立在状态空间主干上,支持长程依赖建模和鲁棒的信息传播。在来自Therapeutics Data Commons(TDC)和MoleculeNet的29个基准任务中,MuMo在每个任务上相对最佳基线平均提升2.7%,并在其中22个任务上排名第一,包括在LD50任务上提升27%。这些结果验证了它对3D构象噪声的鲁棒性以及多模态融合在分子表示中的有效性。代码见:github.com/selmiss/MuMo。
摘要:Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion framework that addresses these challenges in molecular representation through two key strategies. To reduce the instability of conformer-dependent fusion, we design a Structured Fusion Pipeline (SFP) that combines 2D topology and 3D geometry into a unified and stable structural prior. To mitigate modality collapse caused by naive fusion, we introduce a Progressive Injection (PI) mechanism that asymmetrically integrates this prior into the sequence stream, preserving modality-specific modeling while enabling cross-modal enrichment. Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation. Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. These results validate its robustness to 3D conformer noise and the effectiveness of multimodal fusion in molecular representation. The code is available at: github.com/selmiss/MuMo.
优化|敛散性(8篇)
【1】Low-N Protein Activity Optimization with FolDE
标题:利用FolDE优化低N蛋白质活性
链接:https://arxiv.org/abs/2510.24053
备注:18 pages, 4 figures. Preprint. Open-source software available at this https URL
摘要:传统上,蛋白质优化需要构建并测量大量突变体,成本高昂。主动学习辅助定向进化(ALDE)通过预测最佳改进并迭代测试突变体来为预测提供信息,从而降低了成本。然而,现有ALDE方法面临一个关键限制:每轮都选择预测值最高的突变体会产生同质化的训练数据,不足以在后续轮次中建立准确的预测模型。在这里,我们提出FolDE,一种旨在最大化整个优化活动最终成功率的ALDE方法。在对20个蛋白质靶标的模拟中,FolDE发现的前10%突变体比最佳基线ALDE方法多23%(p=0.005),发现前1%突变体的可能性高55%。FolDE主要通过基于自然度(naturalness)的暖启动实现这一点,即用蛋白质语言模型的输出来补充有限的活性测量,以改进活性预测。我们还引入了恒定谎言(constant-liar)批量选择器,它提高了批次多样性;这在多突变优化活动中很重要,但在我们的基准测试中效果有限。完整的工作流程以开源软件形式免费提供,使任何实验室都能进行高效的蛋白质优化。
摘要:Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting the best improvements and iteratively testing mutants to inform predictions. However, existing ALDE methods face a critical limitation: selecting the highest-predicted mutants in each round yields homogeneous training data insufficient for accurate prediction models in subsequent rounds. Here we present FolDE, an ALDE method designed to maximize end-of-campaign success. In simulations across 20 protein targets, FolDE discovers 23% more top 10% mutants than the best baseline ALDE method (p=0.005) and is 55% more likely to find top 1% mutants. FolDE achieves this primarily through naturalness-based warm-starting, which augments limited activity measurements with protein language model outputs to improve activity prediction. We also introduce a constant-liar batch selector, which improves batch diversity; this is important in multi-mutation campaigns but had limited effect in our benchmarks. The complete workflow is freely available as open-source software, making efficient protein optimization accessible to any laboratory.
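恒定谎言(constant-liar)批量选择是贝叶斯优化中的经典启发式:贪心挑选一个点后,先用一个悲观的"谎言值"假装已观测到它并重拟合模型,再挑下一个点,从而避免一批点全部挤在同一处。下面是一个最小示意(采集函数简化为预测均值,与FolDE的具体实现无关)。

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def constant_liar_batch(X_obs, y_obs, X_pool, batch_size=4):
    Xo, yo = X_obs.copy(), y_obs.copy()
    lie = yo.min()                                  # 悲观谎言值(最大化任务的假设)
    picked = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor().fit(Xo, yo)
        mu = gp.predict(X_pool)
        i = int(np.argmax(mu))                      # 简化:按预测均值贪心
        picked.append(i)
        Xo = np.vstack([Xo, X_pool[i:i + 1]])       # 把谎言当作已观测
        yo = np.append(yo, lie)
    return picked

rng = np.random.default_rng(0)
Xo = rng.uniform(-1, 1, (10, 2)); yo = -(Xo ** 2).sum(1)
print(constant_liar_batch(Xo, yo, rng.uniform(-1, 1, (100, 2))))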
【2】Geometric Algorithms for Neural Combinatorial Optimization with Constraints
标题:带约束神经组合优化的几何算法
链接:https://arxiv.org/abs/2510.24039
摘要:面向组合优化(CO)的自监督学习(SSL)是一种利用神经网络求解组合问题的新兴范式。在本文中,我们解决SSL for CO的一个核心挑战:求解带离散约束的问题。我们设计了一个端到端可微的框架,使我们能够用神经网络求解离散约束优化问题。具体而言,我们利用凸几何与Carathéodory定理文献中的算法技术,将神经网络输出分解为对应于可行集多面体角点的凸组合。这种基于分解的方法既支持自监督训练,又能确保将神经网络输出高效、保质地舍入为可行解。在基数约束优化上的大量实验表明,我们的方法能够持续优于神经基线。我们还给出了一些可操作的示例,说明我们的方法如何应用到基数约束问题之外的多种组合优化任务上,包括在图中寻找独立集以及求解拟阵约束问题。
摘要:Self-Supervised Learning (SSL) for Combinatorial Optimization (CO) is an emerging paradigm for solving combinatorial problems using neural networks. In this paper, we address a central challenge of SSL for CO: solving problems with discrete constraints. We design an end-to-end differentiable framework that enables us to solve discrete constrained optimization problems with neural networks. Concretely, we leverage algorithmic techniques from the literature on convex geometry and Carath\'eodory's theorem to decompose neural network outputs into convex combinations of polytope corners that correspond to feasible sets. This decomposition-based approach enables self-supervised training but also ensures efficient quality-preserving rounding of the neural net output into feasible solutions. Extensive experiments in cardinality-constrained optimization show that our approach can consistently outperform neural baselines. We further provide worked-out examples of how our method can be applied beyond cardinality-constrained problems to a diverse set of combinatorial optimization tasks, including finding independent sets in graphs, and solving matroid-constrained problems.
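以基数约束为例,下面示意如何把多面体内的分数解x(0<=x<=1, sum(x)=k)显式分解为若干k-子集指示向量的凸组合:把各坐标区间首尾相接铺在长度为k的线段上并按模1折叠,每个"时刻"恰有k个坐标处于激活状态(McNaughton式构造;这是Carathéodory分解的一种具体实现,与论文的算法细节无关)。

import numpy as np

def decompose_cardinality(x, k):
    n = len(x)
    c = np.concatenate([[0.0], np.cumsum(x)])          # 铺设后的区间端点
    bps = np.unique(np.concatenate([c % 1.0, [0.0, 1.0]]))
    sets, weights = [], []
    for a, b in zip(bps[:-1], bps[1:]):
        m = (a + b) / 2
        active = [int(np.searchsorted(c, m + j, side="right") - 1)
                  for j in range(k)]                   # 覆盖时刻m的k个坐标
        sets.append(active); weights.append(b - a)
    return sets, np.array(weights)

x = np.array([0.7, 0.5, 0.8, 0.6, 0.4])               # sum = 3.0
S, w = decompose_cardinality(x, k=3)
recon = np.zeros_like(x)
for s, wi in zip(S, w):
    recon[s] += wi
print(np.allclose(recon, x))                          # True: 精确重构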
【3】Optimal Arm Elimination Algorithms for Combinatorial Bandits
标题:组合多臂老虎机的最优臂消除算法
链接:https://arxiv.org/abs/2510.23992
摘要:组合多臂老虎机将经典的老虎机框架扩展到学习者在每一轮选择多个臂的设定,其动机来自在线推荐和品类优化等应用。虽然置信上界(UCB)算法可以自然地扩展到这种情形,但臂消除方法的适配已被证明更具挑战性。我们引入了一种新的消除方案,将臂划分为三类(已确认、活跃、已消除),并结合显式探索来更新这些集合。我们在两种设定中展示了算法的有效性:带一般图反馈的组合多臂老虎机,以及组合线性上下文老虎机。在这两种情形下,我们的方法都实现了接近最优的遗憾,而基于UCB的方法由于显式探索不足可被证明会失败。我们还给出了匹配的下界。
摘要:Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and assortment optimization. While extensions of upper confidence bound (UCB) algorithms arise naturally in this context, adapting arm elimination methods has proved more challenging. We introduce a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated), and incorporates explicit exploration to update these sets. We demonstrate the efficacy of our algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit. In both cases, our approach achieves near-optimal regret, whereas UCB-based methods can provably fail due to insufficient explicit exploration. Matching lower bounds are also provided.
【4】RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees
标题:RS-ORT:最优回归树的缩减空间分支定界算法
链接:https://arxiv.org/abs/2510.23901
备注:20 pages, 1 figure, uses ICLR 2026 LaTeX style. Submitted to arXiv as a preprint version
摘要:混合整数规划(MIP)已成为学习最优决策树的强大框架。然而,现有面向回归任务的MIP方法要么局限于纯二值特征,要么在涉及连续的大规模数据时在计算上变得不可行。简单地对连续特征二值化会牺牲全局最优性,并常常产生不必要深的树。我们将最优回归树训练重构为一个两阶段优化问题,并提出缩减空间最优回归树(RS-ORT)——一种仅在树结构变量上分支的专用分支定界(BB)算法。这一设计保证了算法的收敛性及其与训练样本数量的无关性。利用模型结构,我们引入了若干界收紧技术——闭式叶节点预测、经验阈值离散化和精确的深度为1的子树求解——并将其与可分解的上下界策略相结合来加速训练。BB节点级分解使并行执行变得轻而易举,即便对百万级数据集也能进一步缓解计算困难。基于在若干同时包含二值与连续特征的回归基准上的实证研究,RS-ORT相比最先进的方法提供了更优的训练与测试性能。值得注意的是,在多达2,000,000个带连续特征样本的数据集上,RS-ORT可以在4小时内获得有保证的训练性能,同时树结构更简单、泛化能力更好。
摘要:Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet, existing MIP approaches for regression tasks are either limited to purely binary features or become computationally intractable when continuous, large-scale data are involved. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast the optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT) - a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables. This design guarantees the algorithm's convergence and its independence from the number of training samples. Leveraging the model's structure, we introduce several bound tightening techniques - closed-form leaf prediction, empirical threshold discretization, and exact depth-1 subtree parsing - that combine with decomposable upper and lower bounding strategies to accelerate the training. The BB node-wise decomposition enables trivial parallel execution, further alleviating the computational intractability even for million-size datasets. Based on the empirical studies on several regression benchmarks containing both binary and continuous features, RS-ORT also delivers superior training and testing performance than state-of-the-art methods. Notably, on datasets with up to 2,000,000 samples with continuous features, RS-ORT can obtain guaranteed training performance with a simpler tree structure and a better generalization ability in four hours.
【5】Optimize Any Topology: A Foundation Model for Shape- and Resolution-Free Structural Topology Optimization
标题:优化任意拓扑:不受形状与分辨率限制的结构拓扑优化基础模型
链接:https://arxiv.org/abs/2510.23667
摘要:结构拓扑优化(TO)是工程设计的核心,但由于物理过程复杂且存在硬约束,计算量仍然很大。现有的深度学习方法局限于固定的方形网格、少数几种硬编码的边界条件以及事后优化,无法进行通用部署。我们介绍优化任意拓扑(OAT),一个基础模型框架,可针对任意纵横比、分辨率、体积分数、载荷和固定方式直接预测最小柔度布局。OAT将与分辨率和形状无关的自动编码器、隐式神经场解码器以及在OpenTO上训练的条件潜在扩散模型相结合;OpenTO是一个包含220万个优化结构、覆盖200万种独特边界条件配置的新语料库。在四个公共基准和两个具有挑战性的未见测试上,相对于最佳先前模型,OAT将平均柔度降低多达90%,并在单个GPU上对64x64到256x256的分辨率和高达10:1的纵横比提供低于1秒的推理。这些结果将OAT确立为物理感知拓扑优化的通用、快速且不受分辨率限制的框架,并提供了一个大规模数据集,以推动逆向设计生成建模的进一步研究。代码和数据见https://github.com/ahnobari/OptimizeAnyTopology。
摘要:Structural topology optimization (TO) is central to engineering design but remains computationally intensive due to complex physics and hard constraints. Existing deep-learning methods are limited to fixed square grids, a few hand-coded boundary conditions, and post-hoc optimization, preventing general deployment. We introduce Optimize Any Topology (OAT), a foundation-model framework that directly predicts minimum-compliance layouts for arbitrary aspect ratios, resolutions, volume fractions, loads, and fixtures. OAT combines a resolution- and shape-agnostic autoencoder with an implicit neural-field decoder and a conditional latent-diffusion model trained on OpenTO, a new corpus of 2.2 million optimized structures covering 2 million unique boundary-condition configurations. On four public benchmarks and two challenging unseen tests, OAT lowers mean compliance up to 90% relative to the best prior models and delivers sub-1 second inference on a single GPU across resolutions from 64 x 64 to 256 x 256 and aspect ratios as high as 10:1. These results establish OAT as a general, fast, and resolution-free framework for physics-aware topology optimization and provide a large-scale dataset to spur further research in generative modeling for inverse design. Code & data can be found at https://github.com/ahnobari/OptimizeAnyTopology.
【6】A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization
标题:线性约束二层优化的单循环一阶算法
链接:https://arxiv.org/abs/2510.24710
备注:NeurIPS 2025
摘要:我们研究下层问题为强凸且带耦合线性约束的双层优化问题。为了克服超目标潜在的非光滑性以及与Hessian矩阵相关的计算挑战,我们利用罚函数和增广拉格朗日方法将原问题重构为单层问题。特别地,我们通过刻画函数值和导数的接近性,在重构后的函数与原始超目标之间建立了紧密的理论联系。基于这一重构,我们提出了一种用于线性约束双层优化的单循环一阶算法(SFLCB)。我们对其非渐近收敛速度进行了严格分析,表明其相对于先前的双循环算法有所改进——从$O(\epsilon^{-3}\log(\epsilon^{-1}))$提升到$O(\epsilon^{-3})$。实验印证了我们的理论发现,并证明了所提SFLCB算法的实际效率。模拟代码见https://github.com/ShenGroup/SFLCB。
摘要:We study bilevel optimization problems where the lower-level problems are strongly convex and have coupled linear constraints. To overcome the potential non-smoothness of the hyper-objective and the computational challenges associated with the Hessian matrix, we utilize penalty and augmented Lagrangian methods to reformulate the original problem as a single-level one. Especially, we establish a strong theoretical connection between the reformulated function and the original hyper-objective by characterizing the closeness of their values and derivatives. Based on this reformulation, we propose a single-loop, first-order algorithm for linearly constrained bilevel optimization (SFLCB). We provide rigorous analyses of its non-asymptotic convergence rates, showing an improvement over prior double-loop algorithms -- from $O(\epsilon^{-3}\log(\epsilon^{-1}))$ to $O(\epsilon^{-3})$. The experiments corroborate our theoretical findings and demonstrate the practical efficiency of the proposed SFLCB algorithm. Simulation code is provided at https://github.com/ShenGroup/SFLCB.
【7】Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation
标题:深度学习的统计物理:多层感知器接近插值的最佳学习
链接:https://arxiv.org/abs/2510.24616
备注:30 pages, 19 figures + appendix. This submission supersedes both arXiv:2505.24849 and arXiv:2501.18530
摘要:三十年来,统计物理一直为分析神经网络提供框架。一个长期悬而未决的问题是,它能否处理捕捉丰富特征学习效应的深度学习模型,从而超越迄今所分析的窄网络或核方法。我们通过研究多层感知器的监督学习给出了肯定的回答。重要的是,(i)其宽度与输入维度成比例,这使它比超宽网络更容易发生特征学习,也比窄网络或带固定嵌入层的网络更具表达能力;(ii)我们关注具有挑战性的插值机制,其中可训练参数数量与数据量相当,这迫使模型去适应任务。我们考虑匹配的教师-学生设定。它给出了学习随机深度神经网络目标的基本极限,并有助于识别充分统计量,这些统计量刻画了随着数据预算增加,经最优训练的网络学到了什么。由此呈现出带有多种学习相变的丰富现象学。在数据足够多时,最优性能通过模型对目标的"特化"来实现,但对于会被理论所预测的次优解吸引的训练算法来说,这可能很难达到。特化在各层之间不均匀地发生,从浅层向深层传播,同时也在每层的神经元之间传播。此外,更深的目标更难学习。尽管设定简单,贝叶斯最优设定仍为深度、非线性和有限(成比例)宽度如何影响特征学习机制中的神经网络提供了见解,其意义可能远超这一设定本身。
摘要:For three decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing rich feature learning effects, thus going beyond the narrow networks or kernel methods analysed until now. We positively answer through the study of the supervised learning of a multi-layer perceptron. Importantly, (i) its width scales as the input dimension, making it more prone to feature learning than ultra wide networks, and more expressive than narrow ones or with fixed embedding layers; and (ii) we focus on the challenging interpolation regime where the number of trainable parameters and data are comparable, which forces the model to adapt to the task. We consider the matched teacher-student setting. It provides the fundamental limits of learning random deep neural network targets and helps in identifying the sufficient statistics describing what is learnt by an optimally trained network as the data budget increases. A rich phenomenology emerges with various learning transitions. With enough data optimal performance is attained through model's "specialisation" towards the target, but it can be hard to reach for training algorithms which get attracted by sub-optimal solutions predicted by the theory. Specialisation occurs inhomogeneously across layers, propagating from shallow towards deep ones, but also across neurons in each layer. Furthermore, deeper targets are harder to learn. Despite its simplicity, the Bayesian-optimal setting provides insights on how the depth, non-linearity and finite (proportional) width influence neural networks in the feature learning regime that are potentially relevant way beyond it.
【8】Problem-Parameter-Free Decentralized Bilevel Optimization
标题:无问题参数的分散式双层优化
链接:https://arxiv.org/abs/2510.24288
备注:Accepted by NeurIPS 2025
摘要:分散式双层优化因其在解决大规模机器学习问题中的关键作用而受到广泛关注。然而,现有方法往往依赖于问题参数的先验知识(如光滑度、凸性或通信网络拓扑)来确定合适的步长。在实践中,这些问题参数通常不可得,导致大量的手动超参数调优工作。在本文中,我们提出AdaSDBO,一种完全不依赖问题参数、具有单循环结构的分散式双层优化算法。AdaSDBO利用基于累积梯度范数的自适应步长同时更新所有变量,动态调整优化进程,消除了针对特定问题进行超参数调优的需要。通过严格的理论分析,我们证明AdaSDBO实现了$\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$的收敛速度,在至多多项对数因子的意义下与精心调参的最先进方法性能相当。大量数值实验表明,与现有分散式双层优化方法相比,AdaSDBO性能具有竞争力,并在各种步长配置下表现出显著的鲁棒性。
摘要:Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters-such as smoothness, convexity, or communication network topologies-to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading to substantial manual effort for hyperparameter tuning. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. AdaSDBO leverages adaptive stepsizes based on cumulative gradient norms to update all variables simultaneously, dynamically adjusting its progress and eliminating the need for problem-specific hyperparameter tuning. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors. Extensive numerical experiments demonstrate that AdaSDBO delivers competitive performance compared to existing decentralized bilevel optimization methods while exhibiting remarkable robustness across diverse stepsize configurations.
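"基于累积梯度范数的自适应步长"可用AdaGrad-Norm型更新来直观理解:eta_t = eta0 / sqrt(1 + 累积||g||^2),无需预知光滑度等问题参数。下面是单变量、单机的最小示意(分散式与双层结构从略)。

import numpy as np

def adagrad_norm(grad_fn, x0, eta0=1.0, iters=200):
    x, acc = x0.astype(float), 0.0
    for _ in range(iters):
        g = grad_fn(x)
        acc += float(g @ g)                    # 累积梯度范数平方
        x -= eta0 / np.sqrt(1.0 + acc) * g     # 步长随梯度历史自动衰减
    return x

grad = lambda x: 2 * (x - 3.0)                 # f(x) = ||x - 3||^2
print(adagrad_norm(grad, np.array([10.0, -4.0])))   # 趋近 [3, 3]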
预测|估计(14篇)
【1】DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment
标题:DistDF:时间序列预测需要联合分布Wasserstein对齐
链接:https://arxiv.org/abs/2510.24574
摘要:训练时间序列预测模型需要将模型预测的条件分布与标签序列的条件分布对齐。标准的直接预测(DF)方法通过最小化标签序列的条件负对数似然来实现这一点，通常使用均方误差进行估计。然而，在存在标签自相关的情况下，这种估计被证明是有偏的。在本文中，我们提出了DistDF，它通过最小化条件预测分布与标签分布之间的差异来实现对齐。由于条件差异很难从有限的时间序列观测值中估计，我们为时间序列预测引入了一种新提出的联合分布Wasserstein差异，它可证明地从上方界定了所关心的条件差异。这种差异允许从经验样本中进行易处理的、可微分的估计，并与基于梯度的训练无缝集成。大量实验表明，DistDF提高了各种预测模型的性能，达到了最先进的预测效果。代码可在https://anonymous.4open.science/r/DistDF-F66B上获得。
摘要:Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to minimizing the conditional negative log-likelihood of the label sequence, typically estimated using the mean squared error. However, this estimation proves to be biased in the presence of label autocorrelation. In this paper, we propose DistDF, which achieves alignment by alternatively minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a newly proposed joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
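The paper's joint-distribution Wasserstein discrepancy is not spelled out in the abstract; as a rough illustration of how a sorting-based Wasserstein loss can be made differentiable for sequence training, here is a hypothetical per-step sketch in PyTorch (the estimator below is a stand-in, not DistDF's):

```python
import torch

def empirical_w1_1d(a, b):
    # 1-Wasserstein distance between equal-size 1-D samples:
    # mean absolute difference of the sorted values. torch.sort
    # propagates gradients through the chosen permutation.
    return (torch.sort(a).values - torch.sort(b).values).abs().mean()

def joint_w1_loss(forecasts, labels):
    # Stand-in discrepancy: average the per-horizon-step W1 distances
    # between the batch of forecasts and the batch of labels.
    steps = [empirical_w1_1d(forecasts[:, t], labels[:, t])
             for t in range(forecasts.shape[1])]
    return torch.stack(steps).mean()

B, H = 32, 12                      # batch of 32 series, horizon 12
f = torch.randn(B, H, requires_grad=True)
y = torch.randn(B, H)
loss = joint_w1_loss(f, y)
loss.backward()                    # integrates with gradient training
```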
【2】A comparison between joint and dual UKF implementations for state estimation and leak localization in water distribution networks
标题:用于供水网络状态估计和泄漏定位的联合和双重UKF实现之间的比较
链接:https://arxiv.org/abs/2510.24228
备注:This work has been submitted to ECC2026 for review. It has 7 pages and 2 figures
摘要:现代城市的可持续性在很大程度上取决于高效的配水管理，包括有效的压力控制以及泄漏检测与定位。因此，有关管网水力状态的准确信息至关重要。本文比较了两种基于无迹卡尔曼滤波(UKF)的数据驱动状态估计方法，它们融合压力、需求和流量数据以估计水头和流量。一种方法使用联合状态向量和单一估计器，而另一种使用双估计器方案。我们分析了它们的主要特点，讨论了差异、优点和局限性，并从准确性和复杂性方面对它们进行理论比较。最后，我们展示了L-TOWN基准上的若干估计结果，以便讨论它们在实际实现中的特性。
摘要:The sustainability of modern cities highly depends on efficient water distribution management, including effective pressure control and leak detection and localization. Accurate information about the network hydraulic state is therefore essential. This article presents a comparison between two data-driven state estimation methods based on the Unscented Kalman Filter (UKF), fusing pressure, demand and flow data for head and flow estimation. One approach uses a joint state vector with a single estimator, while the other uses a dual-estimator scheme. We analyse their main characteristics, discussing differences, advantages and limitations, and compare them theoretically in terms of accuracy and complexity. Finally, we show several estimation results for the L-TOWN benchmark, allowing to discuss their properties in a real implementation.
【3】Learning from History: A Retrieval-Augmented Framework for Spatiotemporal Prediction
标题:向历史学习:时空预测的检索增强框架
链接:https://arxiv.org/abs/2510.24049
摘要:对复杂物理系统进行精确、长期的时空预测仍然是科学计算领域的一个基本挑战。虽然深度学习模型作为强大的参数近似器已经取得了显着的成功,但它们受到一个关键限制:长期自回归推出过程中错误的积累通常会导致物理上难以置信的伪影。这种缺陷源于它们的纯参数性质,它难以捕捉系统内在动力学的全部约束。为了解决这个问题,我们引入了一个新的\textbf{检索增强预测(RAP)}框架,这是一种混合范式,可以将深度网络的预测能力与历史数据的真实性结合起来。RAP的核心理念是利用历史进化范例作为系统局部动态的非参数估计。对于任何给定的状态,RAP有效地从大规模数据库中检索最相似的历史模拟。这种类似物的真正未来演变然后作为\textbf{参考目标}。重要的是,这个目标不是损失函数中的硬约束,而是专用双流架构的强大条件输入。它提供了强大的\textbf{动态指导},将模型的预测转向物理上可行的轨迹。在气象学、湍流和火灾模拟的广泛基准测试中,RAP不仅超越了最先进的方法,而且显著优于强大的\textbf{仅模拟预测基线}。更重要的是,RAP通过有效抑制长期部署中的误差发散,生成更符合实际的预测。
摘要:Accurate and long-term spatiotemporal prediction for complex physical systems remains a fundamental challenge in scientific computing. While deep learning models, as powerful parametric approximators, have shown remarkable success, they suffer from a critical limitation: the accumulation of errors during long-term autoregressive rollouts often leads to physically implausible artifacts. This deficiency arises from their purely parametric nature, which struggles to capture the full constraints of a system's intrinsic dynamics. To address this, we introduce a novel \textbf{Retrieval-Augmented Prediction (RAP)} framework, a hybrid paradigm that synergizes the predictive power of deep networks with the grounded truth of historical data. The core philosophy of RAP is to leverage historical evolutionary exemplars as a non-parametric estimate of the system's local dynamics. For any given state, RAP efficiently retrieves the most similar historical analog from a large-scale database. The true future evolution of this analog then serves as a \textbf{reference target}. Critically, this target is not a hard constraint in the loss function but rather a powerful conditional input to a specialized dual-stream architecture. It provides strong \textbf{dynamic guidance}, steering the model's predictions towards physically viable trajectories. In extensive benchmarks across meteorology, turbulence, and fire simulation, RAP not only surpasses state-of-the-art methods but also significantly outperforms a strong \textbf{analog-only forecasting baseline}. More importantly, RAP generates predictions that are more physically realistic by effectively suppressing error divergence in long-term rollouts.
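The retrieval step is the most concrete part of this abstract. A toy nearest-neighbor version is sketched below; the L2 metric, database shapes, and variable names are assumptions for illustration, and the real system presumably uses an efficient large-scale index:

```python
import numpy as np

def retrieve_analog(state, db_states, db_futures):
    # Nearest historical analog under an L2 metric over flattened
    # fields; the matched sample's true future becomes the
    # "reference target" fed to the forecaster as conditioning.
    diffs = db_states.reshape(len(db_states), -1) - state.reshape(1, -1)
    idx = np.argmin(np.linalg.norm(diffs, axis=1))
    return db_futures[idx]

rng = np.random.default_rng(0)
db_states = rng.normal(size=(1000, 16, 16))   # toy archive of past states
db_futures = rng.normal(size=(1000, 16, 16))  # their observed next frames
query = rng.normal(size=(16, 16))
reference_target = retrieve_analog(query, db_states, db_futures)
# `reference_target` conditions the dual-stream model; it is not used
# as a hard loss target.
```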
【4】Spatio-temporal Multivariate Time Series Forecast with Chosen Variables
标题:基于选定变量的时空多元时间序列预测
链接:https://arxiv.org/abs/2510.24027
备注:In submission
摘要:时空多变量时间序列预测(STMF)利用最近一段时间内$n$个空间分布变量的时间序列来预测它们在不久的将来的值。它在道路交通预测、空气污染预测等时空传感预测中有着重要的应用。最近的论文解决了模型输入中缺失变量的实际问题：在传感应用中，由于预算限制，传感器数量$m$远小于需要监控的位置数量$n$。我们观察到，现有技术假设模型输入中的$m$个变量（即配备传感器的位置）是预先确定的，而如何选择输入中的$m$个变量这一重要问题从未被研究过。本文填补了这一空白，研究了带选定变量的STMF这一新问题，即从$n$个变量中最优地选择$m$个变量作为模型输入，以最大化预测精度。我们提出了一个统一的框架，联合执行变量选择和模型优化，兼顾预测精度和模型效率。它包括三个新的技术组件：(1)掩蔽变量-参数修剪，通过基于分位数的掩蔽逐步修剪信息量较少的变量和注意力参数；(2)优先变量-参数重放，重放低损失的过去样本以保留已学知识，保持模型稳定性；(3)动态外推机制，通过可学习的空间嵌入和邻接信息，将信息从被选作输入的变量传播到所有其他变量。在五个真实数据集上的实验表明，我们的工作在准确性和效率方面都显著优于最先进的基线，证明了联合变量选择和模型优化的有效性。
摘要:Spatio-Temporal Multivariate time series Forecast (STMF) uses the time series of $n$ spatially distributed variables in a period of recent past to forecast their values in a period of near future. It has important applications in spatio-temporal sensing forecast such as road traffic prediction and air pollution prediction. Recent papers have addressed a practical problem of missing variables in the model input, which arises in the sensing applications where the number $m$ of sensors is far less than the number $n$ of locations to be monitored, due to budget constraints. We observe that the state of the art assumes that the $m$ variables (i.e., locations with sensors) in the model input are pre-determined and the important problem of how to choose the $m$ variables in the input has never been studied. This paper fills the gap by studying a new problem of STMF with chosen variables, which optimally selects $m$-out-of-$n$ variables for the model input in order to maximize the forecast accuracy. We propose a unified framework that jointly performs variable selection and model optimization for both forecast accuracy and model efficiency. It consists of three novel technical components: (1) masked variable-parameter pruning, which progressively prunes less informative variables and attention parameters through quantile-based masking; (2) prioritized variable-parameter replay, which replays low-loss past samples to preserve learned knowledge for model stability; (3) dynamic extrapolation mechanism, which propagates information from variables selected for the input to all other variables via learnable spatial embeddings and adjacency information. Experiments on five real-world datasets show that our work significantly outperforms the state-of-the-art baselines in both accuracy and efficiency, demonstrating the effectiveness of joint variable selection and model optimization.
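Component (1), quantile-based masking, can be illustrated in a few lines. The sketch below keeps only variables above an importance quantile; the real method prunes progressively during training and also masks attention parameters:

```python
import numpy as np

def quantile_mask_prune(scores, keep_frac):
    # Keep only variables whose importance exceeds the
    # (1 - keep_frac) quantile of all scores.
    thresh = np.quantile(scores, 1.0 - keep_frac)
    return scores >= thresh

scores = np.random.default_rng(0).random(100)  # importance of n=100 sites
mask = quantile_mask_prune(scores, keep_frac=0.2)
print(int(mask.sum()))  # roughly m = 20 variables survive this round
```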
【5】Predicting Barge Tow Size on Inland Waterways Using Vessel Trajectory Derived Features: Proof of Concept
标题:使用船舶轨迹衍生特征预测内陆水道上的驳船拖带规模:概念验证
链接:https://arxiv.org/abs/2510.23994
摘要:由于驳船的非自航特性和现有监测系统的局限性，准确、实时地估计内陆水道上的驳船数量仍然是一个严峻的挑战。本研究介绍了一种利用自动识别系统(AIS)船舶跟踪数据并结合机器学习(ML)来预测拖带中驳船数量的新方法。为了训练和测试模型，从密西西比河下游的卫星场景中手动标注了驳船实例。使用时空匹配程序将标注图像与AIS船舶轨迹进行匹配。我们创建了一组包含30个AIS衍生特征的综合特征集，捕获船舶几何形状、动态运动和轨迹模式，并使用递归特征消除(RFE)进行评估，以识别最具预测性的变量。训练并评估了六种回归模型，包括集成方法、基于核的方法和广义线性方法。泊松回归模型表现最佳，在使用30个特征中的12个时实现了1.92艘驳船的平均绝对误差(MAE)。特征重要性分析表明，刻画船舶机动性的指标，如航向熵、速度变化和航程长度，对驳船数量最具预测性。所提出的方法为增强海事领域感知(MDA)提供了一种可扩展、易于实施的方法，在船闸调度、港口管理和货运规划方面具有很大的潜在应用。未来的工作将扩展这里提出的概念验证，探索模型向具有不同运营和环境条件的其他内陆河流的可移植性。
摘要:Accurate, real-time estimation of barge quantity on inland waterways remains a critical challenge due to the non-self-propelled nature of barges and the limitations of existing monitoring systems. This study introduces a novel method to use Automatic Identification System (AIS) vessel tracking data to predict the number of barges in tow using Machine Learning (ML). To train and test the model, barge instances were manually annotated from satellite scenes across the Lower Mississippi River. Labeled images were matched to AIS vessel tracks using a spatiotemporal matching procedure. A comprehensive set of 30 AIS-derived features capturing vessel geometry, dynamic movement, and trajectory patterns were created and evaluated using Recursive Feature Elimination (RFE) to identify the most predictive variables. Six regression models, including ensemble, kernel-based, and generalized linear approaches, were trained and evaluated. The Poisson Regressor model yielded the best performance, achieving a Mean Absolute Error (MAE) of 1.92 barges using 12 of the 30 features. The feature importance analysis revealed that metrics capturing vessel maneuverability such as course entropy, speed variability and trip length were most predictive of barge count. The proposed approach provides a scalable, readily implementable method for enhancing Maritime Domain Awareness (MDA), with strong potential applications in lock scheduling, port management, and freight planning. Future work will expand the proof of concept presented here to explore model transferability to other inland rivers with differing operational and environmental conditions.
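The modeling pipeline (RFE feature selection followed by Poisson regression) maps directly onto scikit-learn. The sketch below uses synthetic data in place of the AIS features; selecting 12 of 30 features mirrors the paper's best configuration, but everything else is illustrative:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import PoissonRegressor

# Synthetic stand-in for 30 AIS-derived features and barge counts.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 30))
y = rng.poisson(lam=np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1] + 1.0))

# Recursively eliminate features down to 12, then fit the final model.
selector = RFE(PoissonRegressor(max_iter=500), n_features_to_select=12)
selector.fit(X, y)
model = PoissonRegressor(max_iter=500).fit(X[:, selector.support_], y)
print("kept feature indices:", np.where(selector.support_)[0])
```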
【6】Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling
标题:基于随机采样的空气污染协同神经预测
链接:https://arxiv.org/abs/2510.23977
摘要:空气污染仍然是一个主要的全球健康和环境风险，特别是在容易受到野火、城市雾霾和沙尘暴造成的间歇性空气污染高峰的地区。准确预测颗粒物(PM)浓度对于及时发布公共卫生警报和采取干预措施至关重要，但现有模型往往低估了罕见但危险的污染事件。在这里，我们介绍了SynCast，这是一种高分辨率的神经预测模型，它集成了气象和空气成分数据，以改善对平均和极端污染水平的预测。SynCast构建在区域适应性的Transformer主干上，并通过基于扩散的随机细化模块进行增强，比现有方法更准确地捕捉驱动PM尖峰的非线性动态。利用协调一致的ERA5和CAMS数据集，我们的模型在多个PM变量(PM$_1$、PM$_{2.5}$、PM$_{10}$)的预测精度方面显示出实质性的提升，特别是在极端条件下。我们证明，传统的损失函数对分布尾部（罕见的污染事件）表示不足，并表明在领域感知目标和极值理论的指导下，SynCast显著提高了在受影响严重地区的性能，而不影响全局精度。这种方法为下一代空气质量预警系统提供了可扩展的基础，并支持脆弱地区减轻气候健康风险。
摘要:Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic air pollution spikes due to wildfires, urban haze and dust storms. Accurate forecasting of particulate matter (PM) concentrations is essential to enable timely public health warnings and interventions, yet existing models often underestimate rare but hazardous pollution events. Here, we present SynCast, a high-resolution neural forecasting model that integrates meteorological and air composition data to improve predictions of both average and extreme pollution levels. Built on a regionally adapted transformer backbone and enhanced with a diffusion-based stochastic refinement module, SynCast captures the nonlinear dynamics driving PM spikes more accurately than existing approaches. Leveraging on harmonized ERA5 and CAMS datasets, our model shows substantial gains in forecasting fidelity across multiple PM variables (PM$_1$, PM$_{2.5}$, PM$_{10}$), especially under extreme conditions. We demonstrate that conventional loss functions underrepresent distributional tails (rare pollution events) and show that SynCast, guided by domain-aware objectives and extreme value theory, significantly enhances performance in highly impacted regions without compromising global accuracy. This approach provides a scalable foundation for next-generation air quality early warning systems and supports climate-health risk mitigation in vulnerable regions.
【7】Improving the Straight-Through Estimator with Zeroth-Order Information
标题:利用零阶信息改进直通估计器
链接:https://arxiv.org/abs/2510.23926
备注:39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:我们研究了参数量化的神经网络的训练问题。通过直通估计器(STE)计算梯度来学习低精度量化参数可能具有挑战性。虽然STE使反向传播（一种一阶方法）成为可能，但最近的工作探索了使用零阶(ZO)梯度下降进行微调。我们注意到，STE提供了高质量但有偏的梯度，而ZO梯度无偏但可能代价高昂。因此，我们提出了一阶引导零阶梯度下降(FOGZO)，在减少STE偏差的同时降低相对于ZO方法的计算量。实验表明，FOGZO改善了量化感知预训练中质量和训练时间之间的权衡。具体而言，在相同迭代次数下与STE相比，DeiT Tiny/Small的精度提高了1-8\%，ResNet 18/50的精度提高了1-2\%，参数量最高达3亿的LLaMA模型的困惑度改善了1-22个点。在相同损失下，对于MNIST上的2层MLP，FOGZO相比n-SPSA减少了796$\times$的计算量。代码可在https://github.com/1733116199/fogzo上获得。
摘要:We study the problem of training neural networks with quantized parameters. Learning low-precision quantized parameters by enabling computation of gradients via the Straight-Through Estimator (STE) can be challenging. While the STE enables back-propagation, which is a first-order method, recent works have explored the use of zeroth-order (ZO) gradient descent for fine-tuning. We note that the STE provides high-quality biased gradients, and ZO gradients are unbiased but can be expensive. We thus propose First-Order-Guided Zeroth-Order Gradient Descent (FOGZO) that reduces STE bias while reducing computations relative to ZO methods. Empirically, we show FOGZO improves the tradeoff between quality and training time in Quantization-Aware Pre-Training. Specifically, versus STE at the same number of iterations, we show a 1-8\% accuracy improvement for DeiT Tiny/Small, 1-2\% accuracy improvement on ResNet 18/50, and 1-22 perplexity point improvement for LLaMA models with up to 0.3 billion parameters. For the same loss, FOGZO yields a 796$\times$ reduction in computation versus n-SPSA for a 2-layer MLP on MNIST. Code is available at https://github.com/1733116199/fogzo.
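The abstract suggests blending the biased STE gradient with an unbiased ZO estimate. A hypothetical sketch using SPSA as the ZO estimator is shown below; the blending rule and coefficient `beta` are our assumptions, not FOGZO's published update:

```python
import numpy as np

def spsa_grad(f, w, delta=1e-3, rng=None):
    # Zeroth-order gradient estimate via SPSA with a Rademacher probe.
    rng = rng or np.random.default_rng()
    u = rng.choice([-1.0, 1.0], size=w.shape)
    return (f(w + delta * u) - f(w - delta * u)) / (2 * delta) * u

def guided_step(w, ste_grad, f, lr=0.1, beta=0.5):
    # Blend the biased-but-cheap STE gradient with the unbiased ZO
    # estimate; `beta` is a placeholder mixing coefficient.
    g = beta * ste_grad + (1.0 - beta) * spsa_grad(f, w)
    return w - lr * g

f = lambda w: float(np.sum(np.round(w) ** 2))  # toy quantized loss
w = np.array([1.7, -2.3, 0.9])
ste = 2.0 * np.round(w)   # STE treats round() as identity in backward
w = guided_step(w, ste, f)
print(w)
```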
【8】Artificial Intelligence Based Predictive Maintenance for Electric Buses
标题:基于人工智能的电动公交车预测性维护
链接:https://arxiv.org/abs/2510.23879
摘要:预测性维护(PdM)对于优化电动公交车的效率和最大限度地减少停机时间至关重要。虽然这些车辆提供了环境效益，但由于复杂的电力传输和电池系统，它们对PdM构成了挑战。传统的维护通常基于定期检查，难以捕获多维实时CAN总线数据中的异常。本研究采用一种基于图的特征选择方法来分析电动公交车CAN总线参数之间的关系，并利用人工智能技术研究目标报警的预测性能。两年多来收集的原始数据经过了广泛的预处理，以确保数据质量和一致性。通过将统计滤波(Pearson相关性、Cramer's V、ANOVA F检验)与基于优化的社区检测算法(InfoMap、Leiden、Louvain、Fast Greedy)相结合，开发了一种混合的基于图的特征选择工具。机器学习模型（包括SVM、随机森林和XGBoost）通过网格搜索和随机搜索进行优化，并通过SMOTEEN和基于二分搜索的下采样进行数据平衡。通过使用LIME识别影响预测的特征，实现了模型的可解释性。结果表明，所开发的系统能有效预测车辆警报，增强特征可解释性，并支持符合工业4.0原则的主动维护策略。
摘要:Predictive maintenance (PdM) is crucial for optimizing efficiency and minimizing downtime of electric buses. While these vehicles provide environmental benefits, they pose challenges for PdM due to complex electric transmission and battery systems. Traditional maintenance, often based on scheduled inspections, struggles to capture anomalies in multi-dimensional real-time CAN Bus data. This study employs a graph-based feature selection method to analyze relationships among CAN Bus parameters of electric buses and investigates the prediction performance of targeted alarms using artificial intelligence techniques. The raw data collected over two years underwent extensive preprocessing to ensure data quality and consistency. A hybrid graph-based feature selection tool was developed by combining statistical filtering (Pearson correlation, Cramer's V, ANOVA F-test) with optimization-based community detection algorithms (InfoMap, Leiden, Louvain, Fast Greedy). Machine learning models, including SVM, Random Forest, and XGBoost, were optimized through grid and random search with data balancing via SMOTEEN and binary search-based down-sampling. Model interpretability was achieved using LIME to identify the features influencing predictions. The results demonstrate that the developed system effectively predicts vehicle alarms, enhances feature interpretability, and supports proactive maintenance strategies aligned with Industry 4.0 principles.
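The hybrid selector (correlation filtering plus community detection) can be approximated as follows: build a graph over signals whose absolute correlation exceeds a threshold, then group them with Louvain. The threshold and the idea of keeping one representative per community are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from networkx import Graph
from networkx.algorithms.community import louvain_communities

def correlation_communities(X, names, thresh=0.7):
    # Link signals whose absolute Pearson correlation exceeds the
    # threshold, then group them into communities with Louvain.
    C = np.abs(np.corrcoef(X, rowvar=False))
    G = Graph()
    G.add_nodes_from(names)
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if C[i, j] > thresh:
                G.add_edge(names[i], names[j], weight=C[i, j])
    return louvain_communities(G, seed=0)

rng = np.random.default_rng(2)
base = rng.normal(size=(1000, 3))
# Nine signals built from three latent sources -> three communities.
X = np.hstack([base + 0.1 * rng.normal(size=(1000, 3)) for _ in range(3)])
print(correlation_communities(X, [f"sig{i}" for i in range(9)]))
```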
【9】Revealing the Potential of Learnable Perturbation Ensemble Forecast Model for Tropical Cyclone Prediction
标题:揭示可学习扰动集合预报模型在热带气旋预报中的潜力
链接:https://arxiv.org/abs/2510.23794
备注:30 pages, 21 figures, 1 table
摘要:Tropical cyclones (TCs) are highly destructive and inherently uncertain weather systems. Ensemble forecasting helps quantify these uncertainties, yet traditional systems are constrained by high computational costs and limited capability to fully represent atmospheric nonlinearity. FuXi-ENS introduces a learnable perturbation scheme for ensemble generation, representing a novel AI-based forecasting paradigm. Here, we systematically compare FuXi-ENS with ECMWF-ENS using all 90 global TCs in 2018, examining their performance in TC-related physical variables, track and intensity forecasts, and the associated dynamical and thermodynamical fields. FuXi-ENS demonstrates clear advantages in predicting TC-related physical variables, and achieves more accurate track forecasts with reduced ensemble spread, though it still underestimates intensity relative to observations. Further dynamical and thermodynamical analyses reveal that FuXi-ENS better captures large-scale circulation, with moisture turbulent energy more tightly concentrated around the TC warm core, whereas ECMWF-ENS exhibits a more dispersed distribution. These findings highlight the potential of learnable perturbations to improve TC forecasting skill and provide valuable insights for advancing AI-based ensemble prediction of extreme weather events that have significant societal impacts.
【10】DBLoss: Decomposition-based Loss Function for Time Series Forecasting
标题:DBLoss:用于时间序列预测的基于分解的损失函数
链接:https://arxiv.org/abs/2510.23672
备注:Accepted by NeurIPS 2025
摘要:Time series forecasting holds significant value in various domains such as economics, traffic, energy, and AIOps, as accurate predictions facilitate informed decision-making. However, the existing Mean Squared Error (MSE) loss function sometimes fails to accurately capture the seasonality or trend within the forecasting horizon, even when decomposition modules are used in the forward propagation to model the trend and seasonality separately. To address these challenges, we propose a simple yet effective Decomposition-Based Loss function called DBLoss. This method uses exponential moving averages to decompose the time series into seasonal and trend components within the forecasting horizon, and then calculates the loss for each of these components separately, followed by weighting them. As a general loss function, DBLoss can be combined with any deep learning forecasting model. Extensive experiments demonstrate that DBLoss significantly improves the performance of state-of-the-art models across diverse real-world datasets and provides a new perspective on the design of time series loss functions.
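The abstract describes DBLoss concretely enough for a sketch: decompose prediction and target into an EMA trend plus a seasonal residual, then weight the two MSE terms. The smoothing factor and weights below are placeholders, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def ema(x, alpha=0.1):
    # Exponential moving average along the time axis of a (B, T) batch.
    out = [x[:, :1]]
    for t in range(1, x.shape[1]):
        out.append(alpha * x[:, t:t + 1] + (1 - alpha) * out[-1])
    return torch.cat(out, dim=1)

def db_loss(pred, target, w_trend=0.5, alpha=0.1):
    # Decompose both series into an EMA trend and a seasonal residual,
    # then weight the two MSE terms.
    pred_trend, tgt_trend = ema(pred, alpha), ema(target, alpha)
    pred_seas, tgt_seas = pred - pred_trend, target - tgt_trend
    return (w_trend * F.mse_loss(pred_trend, tgt_trend)
            + (1 - w_trend) * F.mse_loss(pred_seas, tgt_seas))

pred = torch.randn(8, 96, requires_grad=True)   # horizon of 96 steps
target = torch.randn(8, 96)
db_loss(pred, target).backward()   # drop-in replacement for plain MSE
```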
【11】A machine learning framework integrating seed traits and plasma parameters for predicting germination uplift in crops
标题:集成种子性状和等离子体参数的机器学习框架，用于预测作物发芽提升
链接:https://arxiv.org/abs/2510.23657
摘要:Cold plasma (CP) is an eco-friendly method to enhance seed germination, yet outcomes remain difficult to predict due to complex seed--plasma--environment interactions. This study introduces the first machine learning framework to forecast germination uplift in soybean, barley, sunflower, radish, and tomato under dielectric barrier discharge (DBD) plasma. Among the models tested (GB, XGB, ET, and hybrids), Extra Trees (ET) performed best (R\textsuperscript{2} = 0.919; RMSE = 3.21; MAE = 2.62), improving to R\textsuperscript{2} = 0.925 after feature reduction. Engineering analysis revealed a hormetic response: negligible effects at $
【12】Nearest Neighbor Matching as Least Squares Density Ratio Estimation and Riesz Regression
标题:最近邻匹配作为最小二乘密度比估计和Riesz回归
链接:https://arxiv.org/abs/2510.24433
摘要:This study proves that Nearest Neighbor (NN) matching can be interpreted as an instance of Riesz regression for automatic debiased machine learning. Lin et al. (2023) shows that NN matching is an instance of density-ratio estimation with their new density-ratio estimator. Chernozhukov et al. (2024) develops Riesz regression for automatic debiased machine learning, which directly estimates the Riesz representer (or equivalently, the bias-correction term) by minimizing the mean squared error. In this study, we first prove that the density-ratio estimation method proposed in Lin et al. (2023) is essentially equivalent to Least-Squares Importance Fitting (LSIF) proposed in Kanamori et al. (2009) for direct density-ratio estimation. Furthermore, we derive Riesz regression using the LSIF framework. Based on these results, we derive NN matching from Riesz regression. This study is based on our work Kato (2025a) and Kato (2025b).
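LSIF (Kanamori et al., 2009), which the paper links to NN matching, admits a closed-form ridge solution. A minimal Gaussian-kernel implementation is sketched below; the kernel width, regularization, and choice of centers are illustrative:

```python
import numpy as np

def lsif(x_nu, x_de, centers, sigma=1.0, lam=1e-3):
    # Fit r(x) ~ p_nu(x) / p_de(x) as a Gaussian-kernel model by
    # least squares; the solution is theta = (H + lam I)^{-1} h.
    def K(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Phi_de, Phi_nu = K(x_de, centers), K(x_nu, centers)
    H = Phi_de.T @ Phi_de / len(x_de)
    h = Phi_nu.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: K(x, centers) @ theta

rng = np.random.default_rng(3)
x_nu = rng.normal(0.5, 1.0, size=(500, 1))   # numerator sample
x_de = rng.normal(0.0, 1.0, size=(500, 1))   # denominator sample
r = lsif(x_nu, x_de, centers=x_nu[:50])
print(r(np.zeros((1, 1))))   # estimated density ratio at x = 0
```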
【13】Towards actionable hypotension prediction- predicting catecholamine therapy initiation in the intensive care unit
标题:迈向可操作的低血压预测-预测重症监护室中儿茶酚胺治疗的启动
链接:https://arxiv.org/abs/2510.24287
备注:27 pages, 8 figures, source code under this https URL
摘要:Hypotension in critically ill ICU patients is common and life-threatening. Escalation to catecholamine therapy marks a key management step, with both undertreatment and overtreatment posing risks. Most machine learning (ML) models predict hypotension using fixed MAP thresholds or MAP forecasting, overlooking the clinical decision behind treatment escalation. Predicting catecholamine initiation, the start of vasoactive or inotropic agent administration offers a more clinically actionable target reflecting real decision-making. Using the MIMIC-III database, we modeled catecholamine initiation as a binary event within a 15-minute prediction window. Input features included statistical descriptors from a two-hour sliding MAP context window, along with demographics, biometrics, comorbidities, and ongoing treatments. An Extreme Gradient Boosting (XGBoost) model was trained and interpreted via SHapley Additive exPlanations (SHAP). The model achieved an AUROC of 0.822 (0.813-0.830), outperforming the hypotension baseline (MAP < 65, AUROC 0.686 [0.675-0.699]). SHAP analysis highlighted recent MAP values, MAP trends, and ongoing treatments (e.g., sedatives, electrolytes) as dominant predictors. Subgroup analysis showed higher performance in males, younger patients (<53 years), those with higher BMI (>32), and patients without comorbidities or concurrent medications. Predicting catecholamine initiation based on MAP dynamics, treatment context, and patient characteristics supports the critical decision of when to escalate therapy, shifting focus from threshold-based alarms to actionable decision support. This approach is feasible across a broad ICU cohort under natural event imbalance. Future work should enrich temporal and physiological context, extend label definitions to include therapy escalation, and benchmark against existing hypotension prediction systems.
【14】Forecasting precipitation in the Arctic using probabilistic machine learning informed by causal climate drivers
标题:利用因果气候驱动因素提供信息的概率机器学习预测北极降水
链接:https://arxiv.org/abs/2510.24254
摘要:Understanding and forecasting precipitation events in the Arctic maritime environments, such as Bear Island and Ny-{\AA}lesund, is crucial for assessing climate risk and developing early warning systems in vulnerable marine regions. This study proposes a probabilistic machine learning framework for modeling and predicting the dynamics and severity of precipitation. We begin by analyzing the scale-dependent relationships between precipitation and key atmospheric drivers (e.g., temperature, relative humidity, cloud cover, and air pressure) using wavelet coherence, which captures localized dependencies across time and frequency domains. To assess joint causal influences, we employ Synergistic-Unique-Redundant Decomposition, which quantifies the impact of interaction effects among each variable on future precipitation dynamics. These insights inform the development of data-driven forecasting models that incorporate both historical precipitation and causal climate drivers. To account for uncertainty, we employ the conformal prediction method, which enables the generation of calibrated non-parametric prediction intervals. Our results underscore the importance of utilizing a comprehensive framework that combines causal analysis with probabilistic forecasting to enhance the reliability and interpretability of precipitation predictions in Arctic marine environments.
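The conformal step mentioned in the abstract can be illustrated with the standard split-conformal recipe: the finite-sample-corrected (1 - alpha) quantile of absolute calibration residuals yields a distribution-free interval. This is the generic method, not necessarily the exact variant used in the paper:

```python
import numpy as np

def split_conformal_interval(calib_residuals, y_pred_new, alpha=0.1):
    # The corrected (1 - alpha) quantile of absolute calibration
    # residuals gives a distribution-free interval width.
    n = len(calib_residuals)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(np.abs(calib_residuals), level)
    return y_pred_new - q, y_pred_new + q

rng = np.random.default_rng(5)
calib_residuals = rng.normal(scale=2.0, size=500)  # y_true - y_pred
lo, hi = split_conformal_interval(calib_residuals, y_pred_new=3.1)
print(lo, hi)   # ~90% coverage interval around the point forecast
```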
其他神经网络|深度学习|模型|建模(37篇)
【1】Learning to Drive Safely with Hybrid Options
标题:学习使用混合选项安全驾驶
链接:https://arxiv.org/abs/2510.24674
摘要:Out of the many deep reinforcement learning approaches for autonomous driving, only few make use of the options (or skills) framework. That is surprising, as this framework is naturally suited for hierarchical control applications in general, and autonomous driving tasks in specific. Therefore, in this work the options framework is applied and tailored to autonomous driving tasks on highways. More specifically, we define dedicated options for longitudinal and lateral manoeuvres with embedded safety and comfort constraints. This way, prior domain knowledge can be incorporated into the learning process and the learned driving behaviour can be constrained more easily. We propose several setups for hierarchical control with options and derive practical algorithms following state-of-the-art reinforcement learning techniques. By separately selecting actions for longitudinal and lateral control, the introduced policies over combined and hybrid options obtain the same expressiveness and flexibility that human drivers have, while being easier to interpret than classical policies over continuous actions. Of all the investigated approaches, these flexible policies over hybrid options perform the best under varying traffic conditions, outperforming the baseline policies over actions.
【2】Pearl: A Foundation Model for Placing Every Atom in the Right Location
标题:Pearl:将每个原子放置在正确位置的基础模型
链接:https://arxiv.org/abs/2510.24670
摘要:Accurately predicting the three-dimensional structures of protein-ligand complexes remains a fundamental challenge in computational drug discovery that limits the pace and success of therapeutic design. Deep learning methods have recently shown strong potential as structural prediction tools, achieving promising accuracy across diverse biomolecular systems. However, their performance and utility are constrained by scarce experimental data, inefficient architectures, physically invalid poses, and the limited ability to exploit auxiliary information available at inference. To address these issues, we introduce Pearl (Placing Every Atom in the Right Location), a foundation model for protein-ligand cofolding at scale. Pearl addresses these challenges with three key innovations: (1) training recipes that include large-scale synthetic data to overcome data scarcity; (2) architectures that incorporate an SO(3)-equivariant diffusion module to inherently respect 3D rotational symmetries, improving generalization and sample efficiency, and (3) controllable inference, including a generalized multi-chain templating system supporting both protein and non-polymeric components as well as dual unconditional/conditional modes. Pearl establishes a new state-of-the-art performance in protein-ligand cofolding. On the key metric of generating accurate (RMSD < 2 \r{A}) and physically valid poses, Pearl surpasses AlphaFold 3 and other open source baselines on the public Runs N' Poses and PoseBusters benchmarks, delivering 14.5% and 14.2% improvements, respectively, over the next best model. In the pocket-conditional cofolding regime, Pearl delivers $3.6\times$ improvement on a proprietary set of challenging, real-world drug targets at the more rigorous RMSD < 1 \r{A} threshold. Finally, we demonstrate that model performance correlates directly with synthetic dataset size used in training.
【3】Causal Ordering for Structure Learning From Time Series
标题:时间序列结构学习的因果排序
链接:https://arxiv.org/abs/2510.24639
备注:32 pages
摘要:Predicting causal structure from time series data is crucial for understanding complex phenomena in physiology, brain connectivity, climate dynamics, and socio-economic behaviour. Causal discovery in time series is hindered by the combinatorial complexity of identifying true causal relationships, especially as the number of variables and time points grow. A common approach to simplify the task is the so-called ordering-based methods. Traditional ordering methods inherently limit the representational capacity of the resulting model. In this work, we fix this issue by leveraging multiple valid causal orderings, instead of a single one as standard practice. We propose DOTS (Diffusion Ordered Temporal Structure), using diffusion-based causal discovery for temporal data. By integrating multiple orderings, DOTS effectively recovers the transitive closure of the underlying directed acyclic graph, mitigating spurious artifacts inherent in single-ordering approaches. We formalise the problem under standard assumptions such as stationarity and the additive noise model, and leverage score matching with diffusion processes to enable efficient Hessian estimation. Extensive experiments validate the approach. Empirical evaluations on synthetic and real-world datasets demonstrate that DOTS outperforms state-of-the-art baselines, offering a scalable and robust approach to temporal causal discovery. On synthetic benchmarks ($d{=}\!3-\!6$ variables, $T{=}200\!-\!5{,}000$ samples), DOTS improves mean window-graph $F1$ from $0.63$ (best baseline) to $0.81$. On the CausalTime real-world benchmark ($d{=}20\!-\!36$), while baselines remain the best on individual datasets, DOTS attains the highest average summary-graph $F1$ while halving runtime relative to graph-optimisation methods. These results establish DOTS as a scalable and accurate solution for temporal causal discovery.
【4】Physics-Informed Extreme Learning Machine (PIELM): Opportunities and Challenges
标题:物理信息极限学习机(PIELM):机遇与挑战
链接:https://arxiv.org/abs/2510.24577
摘要:We are very delighted to see the fast development of physics-informed extreme learning machine (PIELM) in recent years for higher computation efficiency and accuracy in physics-informed machine learning. As a summary or review on PIELM is currently not available, we would like to take this opportunity to show our perspective and experience for this promising research direction. We can see many efforts are made to solve PDEs with sharp gradients, nonlinearities, high-frequency behavior, hard constraints, uncertainty, multiphysics coupling. Despite the success, many urgent challenges remain to be tackled, which also provides us opportunities to develop more robust, interpretable, and generalizable PIELM frameworks with applications in science and engineering.
【5】Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks
标题:双思维世界模型:动态无线网络中学习的通用框架
链接:https://arxiv.org/abs/2510.24546
摘要:Despite the popularity of reinforcement learning (RL) in wireless networks, existing approaches that rely on model-free RL (MFRL) and model-based RL (MBRL) are data inefficient and short-sighted. Such RL-based solutions cannot generalize to novel network states since they capture only statistical patterns rather than the underlying physics and logic from wireless data. These limitations become particularly challenging in complex wireless networks with high dynamics and long-term planning requirements. To address these limitations, in this paper, a novel dual-mind world model-based learning framework is proposed with the goal of optimizing completeness-weighted age of information (CAoI) in a challenging mmWave V2X scenario. Inspired by cognitive psychology, the proposed dual-mind world model encompasses a pattern-driven System 1 component and a logic-driven System 2 component to learn dynamics and logic of the wireless network, and to provide long-term link scheduling over reliable imagined trajectories. Link scheduling is learned through end-to-end differentiable imagined trajectories with logical consistency over an extended horizon rather than relying on wireless data obtained from environment interactions. Moreover, through imagination rollouts, the proposed world model can jointly reason network states and plan link scheduling. During intervals without observations, the proposed method remains capable of making efficient decisions. Extensive experiments are conducted on a realistic simulator based on Sionna with real-world physical channel, ray-tracing, and scene objects with material properties. Simulation results show that the proposed world model achieves a significant improvement in data efficiency and achieves strong generalization and adaptation to unseen environments, compared to the state-of-the-art RL baselines, and the world model approach with only System 1.
【6】MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU
标题:MIMIC-Sepsis:ICU中败血症轨迹建模和学习的精选基准
链接:https://arxiv.org/abs/2510.24500
摘要:Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage of clinical interventions. We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database, designed to support reproducible modeling of sepsis trajectories. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data, including vasopressors, fluids, mechanical ventilation and antibiotics. We describe a transparent preprocessing pipeline-based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion-and release it alongside benchmark tasks focused on early mortality prediction, length-of-stay estimation, and shock onset classification. Empirical results demonstrate that incorporating treatment variables substantially improves model performance, particularly for Transformer-based architectures. MIMIC-Sepsis serves as a robust platform for evaluating predictive and sequential models in critical care research.
【7】Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings
标题:填补空白:在稀疏奖励设置中通过少量演示加速Q-Learning
链接:https://arxiv.org/abs/2510.24432
摘要:Reinforcement learning (RL) in sparse-reward environments remains a significant challenge due to the lack of informative feedback. We propose a simple yet effective method that uses a small number of successful demonstrations to initialize the value function of an RL agent. By precomputing value estimates from offline demonstrations and using them as targets for early learning, our approach provides the agent with a useful prior over promising actions. The agent then refines these estimates through standard online interaction. This hybrid offline-to-online paradigm significantly reduces the exploration burden and improves sample efficiency in sparse-reward settings. Experiments on benchmark tasks demonstrate that our method accelerates convergence and outperforms standard baselines, even with minimal or suboptimal demonstration data.
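A toy version of the idea, precomputing Monte Carlo returns from demonstrations to pre-fill a tabular Q-function, is sketched below; the paper's actual targets and function-approximation details may differ:

```python
import numpy as np

def init_q_from_demos(demos, n_states, n_actions, gamma=0.99):
    # Pre-fill a Q-table with discounted Monte Carlo returns computed
    # from successful demonstrations; untouched entries stay at zero.
    Q = np.zeros((n_states, n_actions))
    for episode in demos:                      # list of (s, a, r) steps
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G
            Q[s, a] = max(Q[s, a], G)          # keep best observed return
    return Q

# One demo reaching a sparse terminal reward of 1.0.
demo = [(0, 1, 0.0), (3, 0, 0.0), (7, 1, 1.0)]
Q = init_q_from_demos([demo], n_states=10, n_actions=2)
print(Q[0, 1], Q[7, 1])   # discounted priors over promising actions
```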
【8】Filtering instances and rejecting predictions to obtain reliable models in healthcare
标题:过滤实例并拒绝预测以获得医疗保健中的可靠模型
链接:https://arxiv.org/abs/2510.24368
备注:This paper is under review at Machine Learning (Springer)
摘要:Machine Learning (ML) models are widely used in high-stakes domains such as healthcare, where the reliability of predictions is critical. However, these models often fail to account for uncertainty, providing predictions even with low confidence. This work proposes a novel two-step data-centric approach to enhance the performance of ML models by improving data quality and filtering low-confidence predictions. The first step involves leveraging Instance Hardness (IH) to filter problematic instances during training, thereby refining the dataset. The second step introduces a confidence-based rejection mechanism during inference, ensuring that only reliable predictions are retained. We evaluate our approach using three real-world healthcare datasets, demonstrating its effectiveness at improving model reliability while balancing predictive performance and rejection rate. Additionally, we use alternative criteria - influence values for filtering and uncertainty for rejection - as baselines to evaluate the efficiency of the proposed method. The results demonstrate that integrating IH filtering with confidence-based rejection effectively enhances model performance while preserving a large proportion of instances. This approach provides a practical method for deploying ML systems in safety-critical applications.
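The inference-time rejection step can be sketched with any probabilistic classifier: keep a prediction only when the top-class probability clears a threshold. The 0.8 threshold and random-forest choice below are illustrative, and the instance-hardness training filter is omitted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

# Reject a prediction whenever the top-class probability is low.
conf = clf.predict_proba(Xte).max(axis=1)
accept = conf >= 0.8
acc = (clf.predict(Xte)[accept] == yte[accept]).mean()
print(f"kept {accept.mean():.0%} of cases at accuracy {acc:.3f}")
```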
【9】HergNet: a Fast Neural Surrogate Model for Sound Field Predictions via Superposition of Plane Waves
标题:HergNet:一种通过平面波叠加进行声场预测的快速神经代理模型
链接:https://arxiv.org/abs/2510.24279
摘要:We present a novel neural network architecture for the efficient prediction of sound fields in two and three dimensions. The network is designed to automatically satisfy the Helmholtz equation, ensuring that the outputs are physically valid. Therefore, the method can effectively learn solutions to boundary-value problems in various wave phenomena, such as acoustics, optics, and electromagnetism. Numerical experiments show that the proposed strategy can potentially outperform state-of-the-art methods in room acoustics simulation, in particular in the range of mid to high frequencies.
【10】SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning
标题:SPEAR++:通过稀疏使用字典学习扩展梯度反演
链接:https://arxiv.org/abs/2510.24200
备注:Published at the Workshop on Regulatable ML at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:Federated Learning has seen an increased deployment in real-world scenarios recently, as it enables the distributed training of machine learning models without explicit data sharing between individual clients. Yet, the introduction of the so-called gradient inversion attacks has fundamentally challenged its privacy-preserving properties. Unfortunately, as these attacks mostly rely on direct data optimization without any formal guarantees, the vulnerability of real-world systems remains in dispute and requires tedious testing for each new federated deployment. To overcome these issues, recently the SPEAR attack was introduced, which is based on a theoretical analysis of the gradients of linear layers with ReLU activations. While SPEAR is an important theoretical breakthrough, the attack's practicality was severely limited by its exponential runtime in the batch size b. In this work, we fill this gap by applying State-of-the-Art techniques from Sparsely-Used Dictionary Learning to make the problem of gradient inversion on linear layers with ReLU activations tractable. Our experiments demonstrate that our new attack, SPEAR++, retains all desirable properties of SPEAR, such as robustness to DP noise and FedAvg aggregation, while being applicable to 10x bigger batch sizes.
【11】Identifiable learning of dissipative dynamics
标题:耗散动力学的可识别学习
链接:https://arxiv.org/abs/2510.24160
摘要:Complex dissipative systems appear across science and engineering, from polymers and active matter to learning algorithms. These systems operate far from equilibrium, where energy dissipation and time irreversibility are key to their behavior, but are difficult to quantify from data. Learning accurate and interpretable models of such dynamics remains a major challenge: the models must be expressive enough to describe diverse processes, yet constrained enough to remain physically meaningful and mathematically identifiable. Here, we introduce I-OnsagerNet, a neural framework that learns dissipative stochastic dynamics directly from trajectories while ensuring both interpretability and uniqueness. I-OnsagerNet extends the Onsager principle to guarantee that the learned potential is obtained from the stationary density and that the drift decomposes cleanly into time-reversible and time-irreversible components, as dictated by the Helmholtz decomposition. Our approach enables us to calculate the entropy production and to quantify irreversibility, offering a principled way to detect and quantify deviations from equilibrium. Applications to polymer stretching in elongational flow and to stochastic gradient Langevin dynamics reveal new insights, including super-linear scaling of barrier heights and sub-linear scaling of entropy production rates with the strain rate, and the suppression of irreversibility with increasing batch size. I-OnsagerNet thus establishes a general, data-driven framework for discovering and interpreting non-equilibrium dynamics.
【12】Causal Convolutional Neural Networks as Finite Impulse Response Filters
标题:因果卷积神经网络作为有限脉冲响应过滤器
链接:https://arxiv.org/abs/2510.24125
备注:14 pages, 19 figures, Under review
摘要:This study investigates the behavior of Causal Convolutional Neural Networks (CNNs) with quasi-linear activation functions when applied to time-series data characterized by multimodal frequency content. We demonstrate that, once trained, such networks exhibit properties analogous to Finite Impulse Response (FIR) filters, particularly when the convolutional kernels are of extended length exceeding those typically employed in standard CNN architectures. Causal CNNs are shown to capture spectral features both implicitly and explicitly, offering enhanced interpretability for tasks involving dynamic systems. Leveraging the associative property of convolution, we further show that the entire network can be reduced to an equivalent single-layer filter resembling an FIR filter optimized via least-squares criteria. This equivalence yields new insights into the spectral learning behavior of CNNs trained on signals with sparse frequency content. The approach is validated on both simulated beam dynamics and real-world bridge vibration datasets, underlining its relevance for modeling and identifying physical systems governed by dynamic responses.
【13】Learning Parameterized Skills from Demonstrations
标题:从演示中学习参数化技能
链接:https://arxiv.org/abs/2510.24095
备注:NeurIPS 2025
摘要:We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
【14】Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models
标题:面向视觉模型的具有双层参数竞争的核化稀疏微调
链接:https://arxiv.org/abs/2510.24037
摘要:Parameter-efficient fine-tuning (PEFT) aims to adapt pre-trained vision models to downstream tasks. Among PEFT paradigms, sparse tuning achieves remarkable performance by adjusting only the weights most relevant to downstream tasks, rather than densely tuning the entire weight matrix. Current methods follow a two-stage paradigm. First, it locates task-relevant weights by gradient information, which overlooks the parameter adjustments during fine-tuning and limits the performance. Second, it updates only the located weights by applying a sparse mask to the gradient of the weight matrix, which results in high memory usage due to the storage of all weight matrices in the optimizer. In this paper, we propose a one-stage method named SNELLA to overcome the above limitations. For memory usage, SNELLA selectively updates the weight matrix by adding it to another sparse matrix that is merged by two low-rank learnable matrices. We extend the low-rank decomposition by introducing nonlinear kernel functions, thereby increasing the rank of the resulting merged matrix to prevent the interdependency among weight updates, enabling better adaptation to downstream tasks. For locating task-relevant weights, we propose an adaptive bi-level sparsity allocation mechanism that encourages weights to compete across and inside layers based on their importance scores in an end-to-end manner. Extensive experiments are conducted on classification, segmentation, and generation tasks using different pre-trained vision models. The results show that SNELLA achieves SOTA performance with low memory usage. Notably, SNELLA obtains 1.8% (91.9% v.s. 90.1%) higher Top-1 accuracy on the FGVC benchmark compared to SPT-LoRA. Compared to previous methods, SNELLA achieves a memory reduction of 31.1%-39.9% across models with parameter scales from 86M to 632M. Our source codes are available at https://github.com/ssfgunner/SNELL.
【15】Efficient Global-Local Fusion Sampling for Physics-Informed Neural Networks
标题:物理信息神经网络的高效全局-局部融合采样
链接:https://arxiv.org/abs/2510.24026
摘要:The accuracy of Physics-Informed Neural Networks (PINNs) critically depends on the placement of collocation points, as the PDE loss is approximated through sampling over the solution domain. Global sampling ensures stability by covering the entire domain but requires many samples and is computationally expensive, whereas local sampling improves efficiency by focusing on high-residual regions but may neglect well-learned areas, reducing robustness. We propose a Global-Local Fusion (GLF) Sampling Strategy that combines the strengths of both approaches. Specifically, new collocation points are generated by perturbing training points with Gaussian noise scaled inversely to the residual, thereby concentrating samples in difficult regions while preserving exploration. To further reduce computational overhead, a lightweight linear surrogate is introduced to approximate the global residual-based distribution, achieving similar effectiveness at a fraction of the cost. Together, these components, residual-adaptive sampling and residual-based approximation, preserve the stability of global methods while retaining the efficiency of local refinement. Extensive experiments on benchmark PDEs demonstrate that GLF consistently improves both accuracy and efficiency compared with global and local sampling strategies. This study provides a practical and scalable framework for enhancing the reliability and efficiency of PINNs in solving complex and high-dimensional PDEs.
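The residual-adaptive sampling rule is concrete enough to sketch: perturb each collocation point with Gaussian noise whose scale shrinks as the local residual grows, so new points concentrate in difficult regions while low-residual points keep exploring. The constants and the normalization below are our assumptions:

```python
import numpy as np

def glf_resample(points, residuals, scale=0.1, eps=1e-8):
    # Noise magnitude decreases with the (normalized) residual:
    # high-residual points get small perturbations, so new samples
    # concentrate there; low-residual points are spread more widely.
    r = np.asarray(residuals)
    sigma = scale / (1.0 + r / (r.mean() + eps))
    noise = np.random.default_rng(0).normal(scale=sigma[:, None],
                                            size=points.shape)
    return points + noise

pts = np.random.default_rng(1).uniform(0.0, 1.0, size=(256, 2))
res = np.exp(-10.0 * (pts[:, 0] - 0.5) ** 2)   # toy residual field
new_pts = glf_resample(pts, res)               # next collocation set
```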
【16】Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models
标题:文本到图像扩散模型的免训练安全文本嵌入引导
链接:https://arxiv.org/abs/2510.24012
备注:Accepted at NeurIPS 2025
摘要:Text-to-image models have recently made significant advances in generating realistic and semantically coherent images, driven by advanced diffusion models and large-scale web-crawled datasets. However, these datasets often contain inappropriate or biased content, raising concerns about the generation of harmful outputs when provided with malicious text prompts. We propose Safe Text embedding Guidance (STG), a training-free approach to improve the safety of diffusion models by guiding the text embeddings during sampling. STG adjusts the text embeddings based on a safety function evaluated on the expected final denoised image, allowing the model to generate safer outputs without additional training. Theoretically, we show that STG aligns the underlying model distribution with safety constraints, thereby achieving safer outputs while minimally affecting generation quality. Experiments on various safety scenarios, including nudity, violence, and artist-style removal, show that STG consistently outperforms both training-based and training-free baselines in removing unsafe content while preserving the core semantic intent of input prompts. Our code is available at https://github.com/aailab-kaist/STG.
【17】Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
标题:Mars-Bench:评估火星科学任务基础模型的基准
链接:https://arxiv.org/abs/2510.24010
备注:Accepted at NeurIPS 2025
摘要:Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of downstream tasks. While such models have gained significant attention in fields like Earth Observation, their application to Mars science remains limited. A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. In contrast, Mars science lacks such benchmarks and standardized evaluation frameworks, which have limited progress toward developing foundation models for Martian tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery. Mars-Bench comprises 20 datasets spanning classification, segmentation, and object detection, focused on key geologic features such as craters, cones, boulders, and frost. We provide standardized, ready-to-use datasets and baseline evaluations using models pre-trained on natural images, Earth satellite data, and state-of-the-art vision-language models. Results from all analyses suggest that Mars-specific foundation models may offer advantages over general-domain counterparts, motivating further exploration of domain-adapted pre-training. Mars-Bench aims to establish a standardized foundation for developing and comparing machine learning models for Mars science. Our data, models, and code are available at: https://mars-bench.github.io/.
【18】STNet: Spectral Transformation Network for Solving Operator Eigenvalue Problem
标题:STNet:用于求解算子特征值问题的谱变换网络
链接:https://arxiv.org/abs/2510.23986
摘要:Operator eigenvalue problems play a critical role in various scientific fields and engineering applications, yet numerical methods are hindered by the curse of dimensionality. Recent deep learning methods provide an efficient approach to address this challenge by iteratively updating neural networks. These methods' performance relies heavily on the spectral distribution of the given operator: larger gaps between the operator's eigenvalues will improve precision, thus tailored spectral transformations that leverage the spectral distribution can enhance their performance. Based on this observation, we propose the Spectral Transformation Network (STNet). During each iteration, STNet uses approximate eigenvalues and eigenfunctions to perform spectral transformations on the original operator, turning it into an equivalent but easier problem. Specifically, we employ deflation projection to exclude the subspace corresponding to already solved eigenfunctions, thereby reducing the search space and avoiding converging to existing eigenfunctions. Additionally, our filter transform magnifies eigenvalues in the desired region and suppresses those outside, further improving performance. Extensive experiments demonstrate that STNet consistently outperforms existing learning-based methods, achieving state-of-the-art performance in accuracy.
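The deflation projection mentioned here is standard numerical linear algebra: replace the operator A with P A P, where P projects out already-solved eigenvectors, so iterations cannot reconverge to known solutions. A minimal dense sketch follows (STNet applies the same idea with neural eigenfunction approximations):

```python
import numpy as np

def deflate(apply_A, V):
    # Replace A with P A P, where P = I - V V^T projects out the span
    # of already-solved (orthonormal) eigenvectors in the columns of V.
    def apply(x):
        x = x - V @ (V.T @ x)
        y = apply_A(x)
        return y - V @ (V.T @ y)
    return apply

A = np.diag([1.0, 2.0, 3.0])
v_known = np.array([[0.0], [0.0], [1.0]])   # suppose e3 (lambda=3) is solved
apply_deflated = deflate(lambda x: A @ x, v_known)

# Power iteration on the deflated operator now finds the next-largest
# eigenpair (lambda=2, up to sign) instead of reconverging to e3.
x = np.random.default_rng(2).normal(size=3)
for _ in range(200):
    x = apply_deflated(x)
    x /= np.linalg.norm(x)
print(x.round(3))
```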
【19】An efficient probabilistic hardware architecture for diffusion-like models
标题:用于类扩散模型的高效概率硬件架构
链接:https://arxiv.org/abs/2510.23972
备注:9 pages, 6 figures
摘要:The proliferation of probabilistic AI has promoted proposals for specialized stochastic computers. Despite promising efficiency gains, these proposals have failed to gain traction because they rely on fundamentally limited modeling techniques and exotic, unscalable hardware. In this work, we address these shortcomings by proposing an all-transistor probabilistic computer that implements powerful denoising models at the hardware level. A system-level analysis indicates that devices based on our architecture could achieve performance parity with GPUs on a simple image benchmark using approximately 10,000 times less energy.
【20】Modeling Biological Multifunctionality with Echo State Networks
标题:用回声状态网络建模生物多功能性
链接:https://arxiv.org/abs/2510.23940
备注:26 pages, 17 figures, 6 tables, 23 references
摘要:In this work, a three-dimensional multicomponent reaction-diffusion model has been developed, combining excitable-system dynamics with diffusion processes and sharing conceptual features with the FitzHugh-Nagumo model. Designed to capture the spatiotemporal behavior of biological systems, particularly electrophysiological processes, the model was solved numerically to generate time-series data. These data were subsequently used to train and evaluate an Echo State Network (ESN), which successfully reproduced the system's dynamic behavior. The results demonstrate that simulating biological dynamics using data-driven, multifunctional ESN models is both feasible and effective.
【21】Group Interventions on Deep Networks for Causal Discovery in Subsystems
标题:用于子系统因果发现的深度网络群体干预
链接:https://arxiv.org/abs/2510.23906
备注:Submitted to IEEE Access. We are working on the revised version
摘要:Causal discovery uncovers complex relationships between variables, enhancing predictions, decision-making, and insights into real-world systems, especially in nonlinear multivariate time series. However, most existing methods primarily focus on pairwise cause-effect relationships, overlooking interactions among groups of variables, i.e., subsystems and their collective causal influence. In this study, we introduce gCDMI, a novel multi-group causal discovery method that leverages group-level interventions on trained deep neural networks and employs model invariance testing to infer causal relationships. Our approach involves three key steps. First, we use deep learning to jointly model the structural relationships among groups of all time series. Second, we apply group-wise interventions to the trained model. Finally, we conduct model invariance testing to determine the presence of causal links among variable groups. We evaluate our method on simulated datasets, demonstrating its superior performance in identifying group-level causal relationships compared to existing methods. Additionally, we validate our approach on real-world datasets, including brain networks and climate ecosystems. Our results highlight that applying group-level interventions to deep learning models, combined with invariance testing, can effectively reveal complex causal structures, offering valuable insights for domains such as neuroscience and climate science.
【22】A PDE-Informed Latent Diffusion Model for 2-m Temperature Downscaling
标题:基于PDE的2米温度降尺度潜在扩散模型
链接:https://arxiv.org/abs/2510.23866
摘要:This work presents a physics-conditioned latent diffusion model tailored for dynamical downscaling of atmospheric data, with a focus on reconstructing high-resolution 2-m temperature fields. Building upon a pre-existing diffusion architecture and employing a residual formulation against a reference UNet, we integrate a partial differential equation (PDE) loss term into the model's training objective. The PDE loss is computed in the full resolution (pixel) space by decoding the latent representation and is designed to enforce physical consistency through a finite-difference approximation of an effective advection-diffusion balance. Empirical observations indicate that conventional diffusion training already yields low PDE residuals, and we investigate how fine-tuning with this additional loss further regularizes the model and enhances the physical plausibility of the generated fields. The entirety of our codebase is available on Github, for future reference and development.
【23】Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
标题:通过稀疏自动编码器学习音频潜在空间中的可解释特征
链接:https://arxiv.org/abs/2510.23802
备注:Accepted to NeurIPS 2025 Mechanistic Interpretability Workshop
摘要:While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires compression that obscures semantic meaning, and automatic feature characterization remains limited. We propose a framework for interpreting audio generative models by mapping their latent representations to human-interpretable acoustic concepts. We train SAEs on audio autoencoder latents, then learn linear mappings from SAE features to discretized acoustic properties (pitch, amplitude, and timbre). This enables both controllable manipulation and analysis of the AI music generation process, revealing how acoustic properties emerge during synthesis. We validate our approach on continuous (DiffRhythm-VAE) and discrete (EnCodec, WavTokenizer) audio latent spaces, and analyze DiffRhythm, a state-of-the-art text-to-music model, to demonstrate how pitch, timbre, and loudness evolve throughout generation. While our work is only done on audio modality, our framework can be extended to interpretable analysis of visual latent space generation models.
【24】On the Societal Impact of Machine Learning
标题:关于机器学习的社会影响
链接:https://arxiv.org/abs/2510.23693
备注:PhD thesis
摘要:This PhD thesis investigates the societal impact of machine learning (ML). ML increasingly informs consequential decisions and recommendations, significantly affecting many aspects of our lives. As these data-driven systems are often developed without explicit fairness considerations, they carry the risk of discriminatory effects. The contributions in this thesis enable more appropriate measurement of fairness in ML systems, systematic decomposition of ML systems to anticipate bias dynamics, and effective interventions that reduce algorithmic discrimination while maintaining system utility. I conclude by discussing ongoing challenges and future research directions as ML systems, including generative artificial intelligence, become increasingly integrated into society. This work offers a foundation for ensuring that ML's societal impact aligns with broader social values.
【25】JiuTian Chuanliu: A Large Spatiotemporal Model for General-purpose Dynamic Urban Sensing
标题:九天川流:通用动态城市感知的大时空模型
链接:https://arxiv.org/abs/2510.23662
摘要:As a window for urban sensing, human mobility contains rich spatiotemporal information that reflects both residents' behavior preferences and the functions of urban areas. The analysis of human mobility has attracted the attention of many researchers. However, existing methods often address specific tasks from a particular perspective, leading to insufficient modeling of human mobility and limited applicability of the learned knowledge in various downstream applications. To address these challenges, this paper proposes to push massive amounts of human mobility data into a spatiotemporal model, discover latent semantics behind mobility behavior and support various urban sensing tasks. Specifically, a large-scale, wide-coverage human mobility dataset is collected through the ubiquitous base station system, and a framework named General-purpose and Dynamic Human Mobility Embedding (GDHME) for urban sensing is introduced. The framework follows the self-supervised learning idea and contains two major stages. In stage 1, GDHME treats people and regions as nodes within a dynamic graph, unifying human mobility data as people-region-time interactions. An encoder operating in continuous time dynamically computes evolving node representations, capturing dynamic states for both people and regions. Moreover, an autoregressive self-supervised task is specially designed to guide the learning of the general-purpose node embeddings. In stage 2, these representations are utilized to support various tasks. To evaluate the effectiveness of our GDHME framework, we further construct a multi-task urban sensing benchmark. Offline experiments demonstrate GDHME's ability to automatically learn valuable node features from vast amounts of data. Furthermore, our framework is used to deploy the JiuTian ChuanLiu Big Model, a system that has been presented at the 2023 China Mobile Worldwide Partner Conference.
【26】Integrating Genomics into Multimodal EHR Foundation Models
标题:将基因组学集成到多模态EHR基础模型中
链接:https://arxiv.org/abs/2510.23639
摘要:This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR-only approaches to build more holistic health profiles. Leveraging the extensive and diverse data from the All of Us (AoU) Research Program, this multimodal framework aims to learn complex relationships between clinical data and genetic predispositions. The methodology extends advancements in generative AI to the EHR foundation model space, enhancing predictive capabilities and interpretability. Evaluation on AoU data demonstrates the model's predictive value for the onset of various conditions, particularly Type 2 Diabetes (T2D), and illustrates the interplay between PRS and EHR data. The work also explores transfer learning for custom classification tasks, showcasing the architecture's versatility and efficiency. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies, laying the groundwork for more personalized, equitable, and actionable real-world evidence generation in healthcare.
【27】Bridging Function Approximation and Device Physics via Negative Differential Resistance Networks
标题:通过负微分电阻网络桥接函数逼近与器件物理
链接:https://arxiv.org/abs/2510.23638
摘要:Achieving fully analog neural computation requires hardware that can natively implement both linear and nonlinear operations with high efficiency. While analogue matrix-vector multiplication has advanced via compute-in-memory architectures, nonlinear activation functions remain a bottleneck, often requiring digital or hybrid solutions. Inspired by the Kolmogorov-Arnold framework, we propose KANalogue, a fully analogue implementation of Kolmogorov-Arnold Networks (KANs) using negative differential resistance devices as physical realizations of learnable univariate basis functions. By leveraging the intrinsic negative differential resistance characteristics of tunnel diodes fabricated from NbSi2N4/HfSi2N4 heterostructures, we construct coordinate-wise nonlinearities with distinct curvature and support profiles. We extract I-V data from fabricated armchair and zigzag devices, fit high-order polynomials to emulate diode behavior in software, and train KANs on vision benchmarks using these learned basis functions. Our results demonstrate that KANalogue can approximate complex functions with minimal parameters while maintaining classification accuracy competitive with digital baselines. This work bridges device-level physics and function approximation theory, charting a path toward scalable, energy-efficient analogue machine learning systems.
【28】Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning
标题:帮助机器帮助您:通过怀疑学习对以自我为中心的数据清理的野外评估
链接:https://arxiv.org/abs/2510.23635
摘要:Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life, including fitness schedules, requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., data from smartphone sensors), are often subject to errors and noise. Previous research on Skeptical Learning (SKEL) addressed the issue of noisy labels by comparing offline active annotations with passive data, allowing for an evaluation of annotation accuracy. However, this evaluation did not include confirmation from end-users, the best judges of their own context. In this study, we evaluate SKEL's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality, as well as the potential benefits of using SKEL, which include reduced annotation effort and improved quality of collected data.
【29】Monotone and Separable Set Functions: Characterizations and Neural Models
标题:单调与可分集函数:刻画与神经模型
链接:https://arxiv.org/abs/2510.23634
摘要:Motivated by applications for set containment problems, we consider the following fundamental problem: can we design set-to-vector functions so that the natural partial order on sets is preserved, namely $S\subseteq T \text{ if and only if } F(S)\leq F(T)$. We call functions satisfying this property Monotone and Separating (MAS) set functions. We establish lower and upper bounds for the vector dimension necessary to obtain MAS functions, as a function of the cardinality of the multisets and the underlying ground set. In the important case of an infinite ground set, we show that MAS functions do not exist, but we provide a model which provably enjoys a relaxed MAS property we name "weakly MAS" and is stable in the sense of Hölder continuity. We also show that MAS functions can be used to construct universal models that are monotone by construction and can approximate all monotone set functions. Experimentally, we consider a variety of set containment tasks. The experiments show the benefit of using our model, in comparison with standard set models which do not incorporate set containment as an inductive bias. Our code is available at https://github.com/yonatansverdlov/Monotone-Embedding.
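For a finite ground set, a simple exact MAS map already exists: the multiset count vector, since $S\subseteq T$ holds iff the counts are coordinate-wise dominated. The toy check below illustrates the property the paper formalizes; the interesting cases (dimension bounds, infinite ground sets, neural models) are beyond this sketch.

```python
# For a finite ground set, the multiset count vector is itself a MAS map:
# S is a sub-multiset of T iff F(S) <= F(T) coordinate-wise.
from collections import Counter

GROUND = ["a", "b", "c"]

def F(multiset):
    c = Counter(multiset)
    return tuple(c[g] for g in GROUND)

def contained(S, T):
    return all(fs <= ft for fs, ft in zip(F(S), F(T)))

assert contained(["a", "b"], ["a", "a", "b", "c"])      # S is contained in T
assert not contained(["a", "a", "a"], ["a", "b", "c"])  # too many a's
```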
【30】Noise is All You Need: Solving Linear Inverse Problems by Noise Combination Sampling with Diffusion Models
标题:噪声就是你所需要的一切:通过噪声组合采样和扩散模型求解线性逆问题
链接:https://arxiv.org/abs/2510.23633
备注:9 pages
摘要:Pretrained diffusion models have demonstrated strong capabilities in zero-shot inverse problem solving by incorporating observation information into the generation process of the diffusion models. However, this presents an inherent dilemma: excessive integration can disrupt the generative process, while insufficient integration fails to emphasize the constraints imposed by the inverse problem. To address this, we propose \emph{Noise Combination Sampling}, a novel method that synthesizes an optimal noise vector from a noise subspace to approximate the measurement score, replacing the noise term in the standard Denoising Diffusion Probabilistic Models process. This enables conditional information to be naturally embedded into the generation process without reliance on step-wise hyperparameter tuning. Our method can be applied to a wide range of inverse problem solvers, including image compression, and, particularly when the number of generation steps $T$ is small, achieves superior performance with negligible computational overhead, significantly improving robustness and stability.
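The core mechanic, choosing coefficients so that a combination of noise candidates best approximates a measurement-guided direction and then using that combination as the DDPM noise term, can be sketched as below. The least-squares target, the rescaling, and the dimensions are simplifying assumptions, not the paper's exact formulation.

```python
# Hedged sketch: synthesize a noise vector from a noise subspace that best
# approximates a target (measurement-score) direction.
import torch

def combine_noise(candidates, target):
    """candidates: (n, d) noise vectors spanning a noise subspace;
    target: (d,) direction to approximate. Solves min_w ||N^T w - target||."""
    N = candidates                    # (n, d)
    w = torch.linalg.lstsq(N.T, target.unsqueeze(1)).solution.squeeze(1)
    synth = w @ N
    # Rescale toward unit variance so it remains a usable DDPM noise term.
    return synth * (target.numel() ** 0.5) / synth.norm()

d, n = 1024, 16
candidates = torch.randn(n, d)
measurement_score = torch.randn(d)    # placeholder for a guidance direction
eps = combine_noise(candidates, measurement_score)
# eps would replace the Gaussian noise in x_{t-1} = mu_theta(x_t, t) + sigma_t * eps
print(eps.shape, eps.norm())
```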
【31】LLMComp: A Language Modeling Paradigm for Error-Bounded Scientific Data Compression
标题:LLMComp:一种用于误差有界科学数据压缩的语言建模范式
链接:https://arxiv.org/abs/2510.23632
摘要:The rapid growth of high-resolution scientific simulations and observation systems is generating massive spatiotemporal datasets, making efficient, error-bounded compression increasingly important. Meanwhile, decoder-only large language models (LLMs) have demonstrated remarkable capabilities in modeling complex sequential data. In this paper, we propose LLMCOMP, a novel lossy compression paradigm that leverages decoder-only large LLMs to model scientific data. LLMCOMP first quantizes 3D fields into discrete tokens, arranges them via Z-order curves to preserve locality, and applies coverage-guided sampling to enhance training efficiency. An autoregressive transformer is then trained with spatial-temporal embeddings to model token transitions. During compression, the model performs top-k prediction, storing only rank indices and fallback corrections to ensure strict error bounds. Experiments on multiple reanalysis datasets show that LLMCOMP consistently outperforms state-of-the-art compressors, achieving up to 30% higher compression ratios under strict error bounds. These results highlight the potential of LLMs as general-purpose compressors for high-fidelity scientific data.
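The Z-order (Morton) arrangement mentioned here is a standard bit-interleaving of grid coordinates; a self-contained sketch for 3D indices below $2^{10}$ follows. Only the ordering itself is shown; the quantization and transformer stages are not.

```python
# Z-order (Morton) linearization of 3D grid indices, used to arrange quantized
# field tokens so that spatial neighbors stay close in the token sequence.
def part1by2(n: int) -> int:
    """Spread the 10 low bits of n so there are two zero bits between each."""
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n

def morton3(x: int, y: int, z: int) -> int:
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)

# Order all cells of an 8x8x8 field along the Z-curve.
cells = sorted(
    ((x, y, z) for x in range(8) for y in range(8) for z in range(8)),
    key=lambda c: morton3(*c),
)
print(cells[:4])  # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
```

The payoff of this ordering is locality: tokens that are near each other in 3D space tend to be near each other in the 1D sequence the autoregressive model sees.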
【32】Preference Learning with Response Time: Robust Losses and Guarantees
标题:具有响应时间的偏好学习:稳健的损失和保证
链接:https://arxiv.org/abs/2505.22820
备注:Accepted at NeurIPS 2025
摘要:This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel methodologies to incorporate response time information alongside binary choice data, leveraging the Evidence Accumulation Drift Diffusion (EZ) model, under which response time is informative of the preference strength. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning, matching the theoretical optimal rates that would be attained if the expected response times for each query were known a priori. Our theoretical analysis demonstrates that for linear reward functions, conventional preference learning suffers from error rates that scale exponentially with reward magnitude. In contrast, our response time-augmented approach reduces this to polynomial scaling, representing a significant improvement in sample efficiency. We extend these guarantees to non-parametric reward function spaces, establishing convergence properties for more complex, realistic reward models. Our extensive experiments validate our theoretical findings in the context of preference learning over images.
【33】Comparison of generalised additive models and neural networks in applications: A systematic review
标题:广义加性模型与神经网络在应用中的比较:系统综述
链接:https://arxiv.org/abs/2510.24601
摘要:Neural networks have become a popular tool in predictive modelling, more commonly associated with machine learning and artificial intelligence than with statistics. Generalised Additive Models (GAMs) are flexible non-linear statistical models that retain interpretability. Both are state-of-the-art in their own right, with their respective advantages and disadvantages. This paper analyses how these two model classes have performed on real-world tabular data. Following PRISMA guidelines, we conducted a systematic review of papers that performed empirical comparisons of GAMs and neural networks. Eligible papers were identified, yielding 143 papers, with 430 datasets. Key attributes at both paper and dataset levels were extracted and reported. Beyond summarising comparisons, we analyse reported performance metrics using mixed-effects modelling to investigate potential characteristics that can explain and quantify observed differences, including application area, study year, sample size, number of predictors, and neural network complexity. Across datasets, no consistent evidence of superiority was found for either GAMs or neural networks when considering the most frequently reported metrics (RMSE, $R^2$, and AUC). Neural networks tended to outperform in larger datasets and in those with more predictors, but this advantage narrowed over time. Conversely, GAMs remained competitive, particularly in smaller data settings, while retaining interpretability. Reporting of dataset characteristics and neural network complexity was incomplete in much of the literature, limiting transparency and reproducibility. This review highlights that GAMs and neural networks should be viewed as complementary approaches rather than competitors. For many tabular applications, the performance trade-off is modest, and interpretability may favour GAMs.
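A mixed-effects analysis of this shape, with fixed effects for dataset traits and a random intercept per paper to account for multiple datasets per paper, can be set up as below. The column names and toy values are hypothetical; statsmodels' MixedLM is one reasonable tool for it.

```python
# Hedged sketch of the review's mixed-effects setup (illustrative data only).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "auc_diff":   [0.02, -0.01, 0.05, 0.00, 0.03, -0.02, 0.04, 0.01],  # NN minus GAM
    "log_n":      [3.1, 4.2, 5.0, 3.5, 5.8, 4.0, 6.2, 4.8],            # log sample size
    "predictors": [8, 25, 60, 12, 90, 15, 120, 30],
    "paper_id":   ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"],
})

# Random intercept per paper accounts for datasets nested within papers.
model = smf.mixedlm("auc_diff ~ log_n + predictors", df, groups=df["paper_id"])
result = model.fit()
print(result.summary())
```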
【34】Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations
标题:具有分段解析激活的神经网络梯度下降映射的非奇异性
链接:https://arxiv.org/abs/2510.24466
摘要:The theory of training deep networks has become a central question of modern machine learning and has inspired many practical advancements. In particular, the gradient descent (GD) optimization algorithm has been extensively studied in recent years. A key assumption about GD has appeared in several recent works: the \emph{GD map is non-singular} -- it preserves sets of measure zero under preimages. Crucially, this assumption has been used to prove that GD avoids saddle points and maxima, and to establish the existence of a computable quantity that determines the convergence to global minima (both for GD and stochastic GD). However, the current literature either assumes the non-singularity of the GD map or imposes restrictive assumptions, such as Lipschitz smoothness of the loss (for example, Lipschitzness does not hold for deep ReLU networks with the cross-entropy loss) and restricts the analysis to GD with small step-sizes. In this paper, we investigate the neural network map as a function on the space of weights and biases. We also prove, for the first time, the non-singularity of the gradient descent (GD) map on the loss landscape of realistic neural network architectures (with fully connected, convolutional, or softmax attention layers) and piecewise analytic activations (which includes sigmoid, ReLU, leaky ReLU, etc.) for almost all step-sizes. Our work significantly extends the existing results on the convergence of GD and SGD by guaranteeing that they apply to practical neural network settings and has the potential to unlock further exploration of learning dynamics.
【35】Deep Learning-Enhanced Calibration of the Heston Model: A Unified Framework
标题:赫斯顿模型的深度学习增强校准:统一框架
链接:https://arxiv.org/abs/2510.24074
摘要:The Heston stochastic volatility model is a widely used tool in financial mathematics for pricing European options. However, its calibration remains computationally intensive and sensitive to local minima due to the model's nonlinear structure and high-dimensional parameter space. This paper introduces a hybrid deep learning-based framework that enhances both the computational efficiency and the accuracy of the calibration procedure. The proposed approach integrates two supervised feedforward neural networks: the Price Approximator Network (PAN), which approximates the option price surface based on strike and moneyness inputs, and the Calibration Correction Network (CCN), which refines the Heston model's output by correcting systematic pricing errors. Experimental results on real S\&P 500 option data demonstrate that the deep learning approach outperforms traditional calibration techniques across multiple error metrics, achieving faster convergence and superior generalization in both in-sample and out-of-sample settings. This framework offers a practical and robust solution for real-time financial model calibration.
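A bare-bones rendering of the two-network design (PAN approximating prices from option features, CCN learning a residual correction on top of Heston prices) might look like this; architectures, features, and the synthetic data are placeholders, not the paper's specification.

```python
# Sketch of the PAN + CCN pairing with synthetic stand-in data.
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=64):
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, d_out))

pan = mlp(2, 1)   # inputs: (strike, moneyness) -> approximate option price
ccn = mlp(3, 1)   # inputs: (strike, moneyness, heston_price) -> residual

features = torch.rand(512, 2)                  # synthetic (strike, moneyness)
market_price = torch.rand(512, 1)              # placeholder market quotes
heston_price = torch.rand(512, 1)              # placeholder Heston model prices

opt = torch.optim.Adam(list(pan.parameters()) + list(ccn.parameters()), lr=1e-3)
for _ in range(200):
    pan_loss = (pan(features) - market_price).pow(2).mean()
    # CCN corrects systematic Heston pricing errors with a learned residual.
    corrected = heston_price + ccn(torch.cat([features, heston_price], dim=1))
    ccn_loss = (corrected - market_price).pow(2).mean()
    loss = pan_loss + ccn_loss
    opt.zero_grad(); loss.backward(); opt.step()
```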
【36】Score-based constrained generative modeling via Langevin diffusions with boundary conditions
标题:通过带边界条件的Langevin扩散进行基于分数的约束生成式建模
链接:https://arxiv.org/abs/2510.23985
摘要:Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying constraints. We propose a constrained generative model using kinetic (underdamped) Langevin dynamics with specular reflection of velocity on the boundary defining constraints. This results in a piecewise continuously differentiable noising and denoising process, where the latter is characterized by a time-reversed dynamics restricted to a domain with boundary due to the specular boundary condition. In addition, we also contribute to existing reflected-SDE-based constrained generative models, where the stochastic dynamics is restricted through an abstract local time term. By presenting efficient numerical samplers which converge with optimal rate in terms of discretization step, we provide a comprehensive comparison of models based on confined (specularly reflected kinetic) Langevin diffusion with models based on reflected diffusion with local time.
【37】Bayesian neural networks with interpretable priors from Mercer kernels
标题:具有来自Mercer核的可解释先验的Bayesian神经网络
链接:https://arxiv.org/abs/2510.23745
摘要:Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs. This is because the complexity of the input-to-output map of a BNN makes it difficult to understand how certain distributions enforce any interpretable constraint on the output space. Gaussian processes (GPs), on the other hand, are often preferred in uncertainty quantification tasks due to their interpretability. The drawback is that GPs are limited to small datasets without advanced techniques, which often rely on the covariance kernel having a specific structure. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, such that the resulting BNN has samples which approximate that of a specified GP. The method works by defining a prior directly over the network parameters from the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.
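The underlying construction can be previewed with any finite feature expansion of a kernel: if $k(x,x') \approx \phi(x)^\top \phi(x')$, then $f(x)=\phi(x)^\top w$ with $w \sim N(0, I)$ approximately samples $GP(0,k)$. Below, random Fourier features for an RBF kernel stand in for the paper's Mercer eigen-expansion; the lengthscale and feature count are arbitrary choices.

```python
# If k(x, x') ~ phi(x)^T phi(x'), then f(x) = phi(x)^T w with w ~ N(0, I)
# yields samples approximating GP(0, k). Random Fourier features for the RBF
# kernel stand in here for the paper's Mercer eigen-expansion.
import numpy as np

rng = np.random.default_rng(0)
D, lengthscale = 500, 0.5

# Random Fourier features for k(x, x') = exp(-|x - x'|^2 / (2 l^2)).
W = rng.normal(scale=1.0 / lengthscale, size=(D, 1))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):                         # x: (n, 1)
    return np.sqrt(2.0 / D) * np.cos(x @ W.T + b)

x = np.linspace(-3, 3, 200)[:, None]
# A Gaussian prior over the weights induces approximate GP prior samples.
for _ in range(3):
    w = rng.normal(size=D)
    f = phi(x) @ w                  # one draw from the (approximate) GP prior
print(f.shape)                      # (200,)
```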
其他(40篇)
【1】Generative View Stitching
标题:生成式视图缝合
链接:https://arxiv.org/abs/2510.24718
备注:Project website: this https URL
摘要:Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion framework that we show already provides the affordances necessary for stitching. We then introduce Omni Guidance, a technique that enhances the temporal consistency in stitching by conditioning on both the past and future, and that enables our proposed loop-closing mechanism for delivering long-range coherence. Overall, GVS achieves camera-guided video generation that is stable, collision-free, frame-to-frame consistent, and closes loops for a variety of predefined camera paths, including Oscar Reutersv\"ard's Impossible Staircase. Results are best viewed as videos at https://andrewsonga.github.io/gvs.
【2】Tongyi DeepResearch Technical Report
标题:通义DeepResearch技术报告
链接:https://arxiv.org/abs/2510.24701
备注:this https URL
摘要:We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
【3】Greedy Sampling Is Provably Efficient for RLHF
标题:贪婪采样对于RLHF来说是可证明有效的
链接:https://arxiv.org/abs/2510.24700
备注:NeurIPS 2025
摘要:Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique for post-training large language models. Despite its empirical success, the theoretical understanding of RLHF is still limited, as learning the KL-regularized target with only preference feedback poses additional challenges compared with canonical RL. Existing works mostly study the reward-based Bradley-Terry (BT) preference model, and extend classical designs utilizing optimism or pessimism. This work, instead, considers the general preference model (whose practical relevance has been observed recently) and obtains performance guarantees with major, order-wise improvements over existing ones. Surprisingly, these results are derived from algorithms that directly use the empirical estimates (i.e., greedy sampling), as opposed to constructing optimistic or pessimistic estimates in previous works. This insight has a deep root in the unique structural property of the optimal policy class under the KL-regularized target, and we further specialize it to the BT model, highlighting the surprising sufficiency of greedy sampling in RLHF.
【4】AgentFold: Long-Horizon Web Agents with Proactive Context Management
标题:AgentFold:具有主动上下文管理的长视野Web代理
链接:https://arxiv.org/abs/2510.24699
备注:26 pages, 9 figures
摘要:LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAct-based agents suffer from context saturation as they accumulate noisy, raw histories, while methods that fixedly summarize the full history at each step risk the irreversible loss of critical details. Addressing these, we introduce AgentFold, a novel agent paradigm centered on proactive context management, inspired by the human cognitive process of retrospective consolidation. AgentFold treats its context as a dynamic cognitive workspace to be actively sculpted, rather than a passive log to be filled. At each step, it learns to execute a `folding' operation, which manages its historical trajectory at multiple scales: it can perform granular condensations to preserve vital, fine-grained details, or deep consolidations to abstract away entire multi-step sub-tasks. The results on prominent benchmarks are striking: with simple supervised fine-tuning (without continual pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH. Notably, this performance not only surpasses or matches open-source models of a dramatically larger scale, such as the DeepSeek-V3.1-671B-A37B, but also surpasses leading proprietary agents like OpenAI's o4-mini.
【5】The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
标题:鲁棒性的成本:ReLU网络中鲁棒记忆的参数复杂性更严格的界限
链接:https://arxiv.org/abs/2510.24643
备注:Accepted to NeurIPS 2025, 72 pages, 8 figures
摘要:We study the parameter complexity of robust memorization for $\mathrm{ReLU}$ networks: the number of parameters required to interpolate any given dataset with $\epsilon$-separation between differently labeled points, while ensuring predictions remain consistent within a $\mu$-ball around each training sample. We establish upper and lower bounds on the parameter count as a function of the robustness ratio $\rho = \mu / \epsilon$. Unlike prior work, we provide a fine-grained analysis across the entire range $\rho \in (0,1)$ and obtain tighter upper and lower bounds that improve upon existing results. Our findings reveal that the parameter complexity of robust memorization matches that of non-robust memorization when $\rho$ is small, but grows with increasing $\rho$.
【6】Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
标题:稳健几何中位数的核心集:消除对离群值的大小依赖
链接:https://arxiv.org/abs/2510.24621
备注:This paper has been accepted by NeurIPS 2025
摘要:We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them. Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.
【7】Enforcing boundary conditions for physics-informed neural operators
标题:为物理信息神经算子强制施加边界条件
链接:https://arxiv.org/abs/2510.24557
摘要:Machine-learning based methods like physics-informed neural networks and physics-informed neural operators are becoming increasingly adept at solving even complex systems of partial differential equations. Boundary conditions can be enforced either weakly by penalizing deviations in the loss function or strongly by training a solution structure that inherently matches the prescribed values and derivatives. The former approach is easy to implement but the latter can provide benefits with respect to accuracy and training times. However, previous approaches to strongly enforcing Neumann or Robin boundary conditions require a domain with a fully $C^1$ boundary and, as we demonstrate, can lead to instability if those boundary conditions are posed on a segment of the boundary that is piecewise $C^1$ but only $C^0$ globally. We introduce a generalization of the approach by Sukumar \& Srivastava (doi: 10.1016/j.cma.2021.114333), and a new approach based on orthogonal projections that overcome this limitation. The performance of these new techniques is compared against weakly and semi-weakly enforced boundary conditions for the scalar Darcy flow equation and the stationary Navier-Stokes equations.
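The "strong enforcement" idea is easiest to see for Dirichlet data: compose the network with a solution structure that matches the boundary values identically. The one-dimensional sketch below uses the classic ansatz $u(x)=g(x)+d(x)N(x)$ with $d$ vanishing on the boundary; the paper's actual contribution targets the trickier Neumann/Robin case on piecewise-$C^1$ boundaries, which this sketch does not reproduce.

```python
# Strongly enforced Dirichlet conditions on [0, 1]: u(0) = 0, u(1) = 1 hold
# exactly for any network weights, so no boundary penalty term is needed.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def g(x):                 # boundary data lift: u(0) = 0, u(1) = 1
    return x

def u(x):                 # ansatz u = g + d * N with d(x) = x(1-x) zero on boundary
    return g(x) + x * (1 - x) * net(x)

x = torch.tensor([[0.0], [1.0], [0.5]])
print(u(x).detach().ravel())   # first two entries are exactly 0 and 1
```

The contrast with weak enforcement is that here the boundary condition is satisfied by construction, leaving the training loss to handle only the PDE residual in the interior.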
【8】Sample-efficient and Scalable Exploration in Continuous-Time RL
标题:连续时间RL中的样本高效和可扩展探索
链接:https://arxiv.org/abs/2510.24482
备注:26 pages, 6 figures, 6 tables
摘要:Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. Our algorithm, COMBRL, greedily maximizes a weighted sum of the extrinsic reward and model epistemic uncertainty. This yields a scalable and sample-efficient approach to continuous-time model-based RL. We show that COMBRL achieves sublinear regret in the reward-driven setting, and in the unsupervised RL setting (i.e., without extrinsic rewards), we provide a sample complexity bound. In our experiments, we evaluate COMBRL in both standard and unsupervised RL settings and demonstrate that it scales better, is more sample-efficient than prior methods, and outperforms baselines across several deep RL tasks.
【9】APEX: Approximate-but-exhaustive search for ultra-large combinatorial synthesis libraries
标题:APEX:对超大型组合合成库的近似但穷尽的搜索
链接:https://arxiv.org/abs/2510.24380
摘要:Make-on-demand combinatorial synthesis libraries (CSLs) like Enamine REAL have significantly enabled drug discovery efforts. However, their large size presents a challenge for virtual screening, where the goal is to identify the top compounds in a library according to a computational objective (e.g., optimizing docking score) subject to computational constraints under a limited computational budget. For current library sizes -- numbering in the tens of billions of compounds -- and scoring functions of interest, a routine virtual screening campaign may be limited to scoring fewer than 0.1% of the available compounds, leaving potentially many high scoring compounds undiscovered. Furthermore, as constraints (and sometimes objectives) change during the course of a virtual screening campaign, existing virtual screening algorithms typically offer little room for amortization. We propose the approximate-but-exhaustive search protocol for CSLs, or APEX. APEX utilizes a neural network surrogate that exploits the structure of CSLs in the prediction of objectives and constraints to make full enumeration on a consumer GPU possible in under a minute, allowing for exact retrieval of approximate top-$k$ sets. To demonstrate APEX's capabilities, we develop a benchmark CSL comprising more than 10 million compounds, all of which have been annotated with their docking scores on five medically relevant targets along with physicochemical properties measured with RDKit such that, for any objective and set of constraints, the ground truth top-$k$ compounds can be identified and compared against the retrievals from any virtual screening algorithm. We show APEX's consistently strong performance both in retrieval accuracy and runtime compared to alternative methods.
【10】SALS: Sparse Attention in Latent Space for KV cache Compression
标题:SALS:KV缓存压缩的潜在空间中的稀疏注意力
链接:https://arxiv.org/abs/2510.24273
摘要:Large Language Models capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value cache size and high memory bandwidth requirements. Previous research has demonstrated that the KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding mechanism in modern LLMs, naive low-rank compression suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space (SALS) framework. SALS projects the KV cache into a compact latent space via low-rank projection, and performs sparse token selection using RoPE-free query-key interactions in this space. By reconstructing only a small subset of important tokens, it avoids the overhead of full KV cache reconstruction. We comprehensively evaluate SALS on various tasks using two large-scale models: LLaMA2-7b-chat and Mistral-7b, and additionally verify its scalability on the RULER-128k benchmark with LLaMA3.1-8B-Instruct. Experimental results demonstrate that SALS achieves SOTA performance by maintaining competitive accuracy. Under different settings, SALS achieves 6.4-fold KV cache compression and a 5.7-fold speed-up in the attention operator compared to FlashAttention2 on 4K sequences. For end-to-end throughput, we achieve 1.4-fold and 4.5-fold improvements compared to GPT-fast on 4K and 32K sequences, respectively.
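A toy version of the latent-space selection step reads as follows: cache low-rank latent keys, score a RoPE-free query against them, and attend over only the top-$k$ recovered tokens. The projection, sizes, and the omission of RoPE on the selected subset are simplifications, not the released implementation.

```python
# Sketch of low-rank latent-space sparse token selection (SALS-style idea).
import torch

d, r, T, k = 128, 32, 4096, 256
P = torch.randn(d, r) / d**0.5        # learned low-rank projection (placeholder)

K = torch.randn(T, d)                 # full keys (pre-RoPE)
V = torch.randn(T, d)
K_lat = K @ P                         # cached compressed keys: (T, r)

q = torch.randn(1, d)
scores = (q @ P) @ K_lat.T            # RoPE-free query-key interaction in latent space
idx = scores.topk(k, dim=-1).indices.squeeze(0)

# Attend over only the selected tokens; in the full method RoPE would be
# applied to just this small subset, avoiding full-cache reconstruction.
attn = torch.softmax((q @ K[idx].T) / d**0.5, dim=-1)
out = attn @ V[idx]
print(out.shape)                      # (1, 128)
```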
【11】Sparse Optimistic Information Directed Sampling
标题:稀疏乐观信息引导抽样
链接:https://arxiv.org/abs/2510.24234
摘要:Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence on the ambient dimension is unavoidable, or the data-poor regime, where dimension-independence is possible at the cost of worse dependence on the number of rounds. In contrast, the sparse Information Directed Sampling (IDS) algorithm satisfies a Bayesian regret bound that has the optimal rate in both regimes simultaneously. In this work, we explore the use of Sparse Optimistic Information Directed Sampling (SOIDS) to achieve the same adaptivity in the worst-case setting, without Bayesian assumptions. Through a novel analysis that enables the use of a time-dependent learning rate, we show that SOIDS can optimally balance information and regret. Our results extend the theoretical guarantees of IDS, providing the first algorithm that simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes. We empirically demonstrate the good performance of SOIDS.
【12】PRIVET: Privacy Metric Based on Extreme Value Theory
标题:PRIVET:基于极值理论的隐私指标
链接:https://arxiv.org/abs/2510.24233
摘要:Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical concerns around privacy-preserving synthetic data, and more specifically around privacy leakage, an issue closely tied to overfitting. Existing methods almost exclusively rely on global criteria to estimate the risk of privacy failure associated with a model, offering only quantitative, non-interpretable insights. The absence of rigorous evaluation methods for data privacy at the sample level may hinder the practical deployment of synthetic data in real-world applications. Using extreme value statistics on nearest-neighbor distances, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample. We empirically demonstrate that PRIVET reliably detects instances of memorization and privacy leakage across diverse data modalities, including settings with very high dimensionality or limited sample sizes (such as genetic data), and even under underfitting regimes. We compare our method to existing approaches under controlled settings and show its advantage in providing both dataset-level and sample-level assessments through qualitative and quantitative outputs. Additionally, our analysis reveals limitations in the ability of existing computer vision embeddings to yield perceptually meaningful distances when identifying near-duplicate samples.
【13】Unlocking Out-of-Distribution Generalization in Dynamics through Physics-Guided Augmentation
标题:通过物理引导的增强解锁动力学中的分布外泛化
链接:https://arxiv.org/abs/2510.24216
摘要:In dynamical system modeling, traditional numerical methods are limited by high computational costs, while modern data-driven approaches struggle with data scarcity and distribution shifts. To address these fundamental limitations, we first propose SPARK, a physics-guided quantitative augmentation plugin. Specifically, SPARK utilizes a reconstruction autoencoder to integrate physical parameters into a physics-rich discrete state dictionary. This state dictionary then acts as a structured dictionary of physical states, enabling the creation of new, physically-plausible training samples via principled interpolation in the latent space. Further, for downstream prediction, these augmented representations are seamlessly integrated with a Fourier-enhanced Graph ODE, a combination designed to robustly model the enriched data distribution while capturing long-term temporal dependencies. Extensive experiments on diverse benchmarks demonstrate that SPARK significantly outperforms state-of-the-art baselines, particularly in challenging out-of-distribution scenarios and data-scarce regimes, proving the efficacy of our physics-guided augmentation paradigm.
【14】Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames
标题:蒙眼专家泛化得更好:来自机器人操作和视频游戏的见解
链接:https://arxiv.org/abs/2510.24194
摘要:Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates a (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This ``blindfolded'' expert is compelled to employ non-trivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than its fully-informed counterpart. We conduct experiments on real-world robot peg-insertion tasks with (limited) human demonstrations, alongside videogames from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with $\sqrt{I/m}$, where $I$ measures the amount of task information available to the demonstrator, and $m$ is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks. Project page with videos and code: https://sites.google.com/view/blindfoldedexperts/home
【15】V-SAT: Video Subtitle Annotation Tool
标题:V-SAT:视频字幕注释工具
链接:https://arxiv.org/abs/2510.24180
摘要:The surge of audiovisual content on streaming platforms and social media has heightened the demand for accurate and accessible subtitles. However, existing subtitle generation methods, based primarily on speech transcription or OCR extraction, suffer from several shortcomings, including poor synchronization, incorrect or harmful text, inconsistent formatting, inappropriate reading speeds, and the inability to adapt to dynamic audio-visual contexts. Current approaches often address isolated issues, leaving post-editing as a labor-intensive and time-consuming process. In this paper, we introduce V-SAT (Video Subtitle Annotation Tool), a unified framework that automatically detects and corrects a wide range of subtitle quality issues. By combining Large Language Models (LLMs), Vision-Language Models (VLMs), Image Processing, and Automatic Speech Recognition (ASR), V-SAT leverages contextual cues from both audio and video. Subtitle quality improved, with the SUBER score reduced from 9.6 to 3.54 after resolving all language-mode issues, and F1-scores of ~0.80 achieved for image-mode issues. Human-in-the-loop validation ensures high-quality results, providing the first comprehensive solution for robust subtitle annotation.
【16】EddyFormer: Accelerated Neural Simulations of Three-Dimensional Turbulence at Scale
标题:EddyFormer:大规模三维湍流的加速神经模拟
链接:https://arxiv.org/abs/2510.24173
备注:NeurIPS 2025
摘要:Computationally resolving turbulence remains a central challenge in fluid dynamics due to its multi-scale interactions. Fully resolving large-scale turbulence through direct numerical simulation (DNS) is computationally prohibitive, motivating data-driven machine learning alternatives. In this work, we propose EddyFormer, a Transformer-based spectral-element (SEM) architecture for large-scale turbulence simulation that combines the accuracy of spectral methods with the scalability of the attention mechanism. We introduce an SEM tokenization that decomposes the flow into grid-scale and subgrid-scale components, enabling capture of both local and global features. We create a new three-dimensional isotropic turbulence dataset and train EddyFormer to achieve DNS-level accuracy at $256^3$ resolution, providing a 30x speedup over DNS. When applied to unseen domains up to 4x larger than in training, EddyFormer preserves accuracy on physics-invariant metrics (energy spectra, correlation functions, and structure functions), showing domain generalization. On The Well benchmark suite of diverse turbulent flows, EddyFormer resolves cases where prior ML models fail to converge, accurately reproducing complex dynamics across a wide range of physical conditions.
【17】Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators
标题:驯服尾部:基于芯粒的加速器上混合DL工作负载的NoI拓扑综合
链接:https://arxiv.org/abs/2510.24113
摘要:Heterogeneous chiplet-based systems improve scaling by disaggregating CPUs/GPUs and emerging technologies (HBM/DRAM). However, this on-package disaggregation introduces latency in the Network-on-Interposer (NoI). We observe that in modern large-model inference, parameters and activations routinely move back and forth from HBM/DRAM, injecting large, bursty flows into the interposer. These memory-driven transfers inflate tail latency and violate Service Level Agreements (SLAs) across k-ary n-cube baseline NoI topologies. To address this gap we introduce an Interference Score (IS) that quantifies worst-case slowdown under contention. We then formulate NoI synthesis as a multi-objective optimization (MOO) problem. We develop PARL (Partition-Aware Reinforcement Learner), a topology generator that balances throughput, latency, and power. PARL-generated topologies reduce contention at the memory cut, meet SLAs, and cut worst-case slowdown to 1.2 times while maintaining competitive mean throughput relative to link-rich meshes. Overall, this reframes NoI design for heterogeneous chiplet accelerators with workload-aware objectives.
【18】Information-Theoretic Discrete Diffusion
标题:信息论离散扩散
链接:https://arxiv.org/abs/2510.24088
备注:Accepted at NeurIPS 2025
摘要:We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity for the Gaussian setup, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links mutual information between data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.
【19】FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic
标题:FALQON:利用低位浮点运算加速LoRA微调
链接:https://arxiv.org/abs/2510.24061
备注:NeurIPS 2025
摘要:Low-bit floating-point (FP) formats, such as FP8, provide significant acceleration and memory savings in model training thanks to native hardware support on modern GPUs and NPUs. However, we analyze that FP8 quantization offers speedup primarily for large-dimensional matrix multiplications, while inherent quantization overheads diminish speedup when applied to low-rank adaptation (LoRA), which uses small-dimensional matrices for efficient fine-tuning of large language models (LLMs). To address this limitation, we propose FALQON, a novel framework that eliminates the quantization overhead from separate LoRA computational paths by directly merging LoRA adapters into an FP8-quantized backbone during fine-tuning. Furthermore, we reformulate the forward and backward computations for merged adapters to significantly reduce quantization overhead, and introduce a row-wise proxy update mechanism that efficiently integrates substantial updates into the quantized backbone. Experimental evaluations demonstrate that FALQON achieves approximately a 3$\times$ training speedup over existing quantized LoRA methods with a similar level of accuracy, providing a practical solution for efficient large-scale model fine-tuning. Moreover, FALQON's end-to-end FP8 workflow removes the need for post-training quantization, facilitating efficient deployment. Code is available at https://github.com/iamkanghyunchoi/falqon.
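The merged-adapter idea can be mimicked with a fake quantizer: fold the LoRA product into the backbone and re-quantize, so the forward pass needs one matmul instead of a separate low-rank path. The per-row integer-style quantizer below merely stands in for native FP8 casting, and the row-wise scaling only gestures at FALQON's row-wise proxy update.

```python
# Hedged sketch of merging a LoRA update into a low-bit backbone.
import torch

def fake_quant(w, bits=8):
    """Symmetric per-row quantization as a stand-in for FP8 casting."""
    scale = w.abs().amax(dim=1, keepdim=True) / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

d, r = 512, 8
W = torch.randn(d, d)
A = torch.randn(d, r) * 0.01          # LoRA factors: delta W = A @ B
B = torch.randn(r, d) * 0.01

W_q = fake_quant(W)                   # quantized backbone
W_merged = fake_quant(W + A @ B)      # adapter merged, then re-quantized

x = torch.randn(4, d)
# Separate-path baseline needs the backbone matmul plus two small matmuls;
# the merged path needs a single matmul.
y_sep = x @ W_q + (x @ A) @ B
y_merged = x @ W_merged
print((y_sep - y_merged).abs().max())  # small gap from quantization error
```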
【20】Mitigating Negative Transfer via Reducing Environmental Disagreement
标题:通过减少环境分歧来缓解负迁移
链接:https://arxiv.org/abs/2510.24044
备注:13 pages, 5 figures
摘要:Unsupervised Domain Adaptation~(UDA) focuses on transferring knowledge from a labeled source domain to an unlabeled target domain, addressing the challenge of \emph{domain shift}. Significant domain shifts hinder effective knowledge transfer, leading to \emph{negative transfer} and deteriorating model performance. Therefore, mitigating negative transfer is essential. This study revisits negative transfer through the lens of causally disentangled learning, emphasizing cross-domain discriminative disagreement on non-causal environmental features as a critical factor. Our theoretical analysis reveals that overreliance on non-causal environmental features as the environment evolves can cause discriminative disagreements~(termed \emph{environmental disagreement}), thereby resulting in negative transfer. To address this, we propose Reducing Environmental Disagreement~(RED), which disentangles each sample into domain-invariant causal features and domain-specific non-causal environmental features via adversarially training domain-specific environmental feature extractors in the opposite domains. Subsequently, RED estimates and reduces environmental disagreement based on domain-specific non-causal environmental features. Experimental results confirm that RED effectively mitigates negative transfer and achieves state-of-the-art performance.
【21】A Pragmatic Way to Measure Chain-of-Thought Monitorability
标题:衡量思维链可监控性的务实方法
链接:https://arxiv.org/abs/2510.23966
备注:The first two authors contributed equally
摘要:While Chain-of-Thought (CoT) monitoring offers a unique opportunity for AI safety, this opportunity could be lost through shifts in training practices or model architecture. To help preserve monitorability, we propose a pragmatic way to measure two components of it: legibility (whether the reasoning can be followed by a human) and coverage (whether the CoT contains all the reasoning needed for a human to also produce the final output). We implement these metrics with an autorater prompt that enables any capable LLM to compute the legibility and coverage of existing CoTs. After sanity-checking our prompted autorater with synthetic CoT degradations, we apply it to several frontier models on challenging benchmarks, finding that they exhibit high monitorability. We present these metrics, including our complete autorater prompt, as a tool for developers to track how design decisions impact monitorability. While the exact prompt we share is still a preliminary version under ongoing development, we are sharing it now in the hopes that others in the community will find it useful. Our method helps measure the default monitorability of CoT - it should be seen as a complement, not a replacement, for the adversarial stress-testing needed to test robustness against deliberately evasive models.
【22】Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs
标题:受几何学启发的折扣和平均奖励MDP统一框架
链接:https://arxiv.org/abs/2510.23914
备注:12 pages, 1 figure
摘要:The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, are typically analyzed separately. In this work, we extend a recently introduced geometric interpretation of MDPs for the discounted-reward case to the average-reward case, thereby unifying both. This allows us to extend a major result known for the discounted-reward case to the average-reward case: under a unique and ergodic optimal policy, the Value Iteration algorithm achieves a geometric convergence rate.
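The geometric convergence statement extended here is easy to observe numerically in the discounted case: successive Bellman-error ratios of value iteration approach the discount factor. A tiny random-MDP check follows.

```python
# Value iteration on a small random MDP; the discount factor gamma acts as
# the contraction factor, so errors shrink geometrically.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] is a distribution over s'
R = rng.uniform(size=(S, A))

V = np.zeros(S)
errs = []
for _ in range(60):
    Q = R + gamma * P @ V                      # (S, A) action values
    V_new = Q.max(axis=1)
    errs.append(np.max(np.abs(V_new - V)))
    V = V_new

# Successive error ratios approach gamma, i.e., geometric convergence.
print([round(errs[i + 1] / errs[i], 3) for i in range(40, 45)])
```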
【23】DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning
标题:DynaStride:使用MMCoT的动态Stride窗口,用于教学多场景字幕
链接:https://arxiv.org/abs/2510.23907
备注:16 pages, 15 figures, 5 Tables, submitted to AAAI AI4ED Workshop 2026
摘要:Scene-level captioning in instructional videos can enhance learning by requiring an understanding of both visual cues and temporal structure. By aligning visual cues with textual guidance, this understanding supports procedural learning and multimodal reasoning, providing a richer context for skill acquisition. However, captions that fail to capture this structure may lack coherence and quality, which can create confusion and undermine the video's educational intent. To address this gap, we introduce DynaStride, a pipeline to generate coherent, scene-level captions without requiring manual scene segmentation. Using the YouCookII dataset's scene annotations, DynaStride performs adaptive frame sampling and multimodal windowing to capture key transitions within each scene. It then employs a multimodal chain-of-thought process to produce multiple action-object pairs, which are refined and fused using a dynamic stride window selection algorithm that adaptively balances temporal context and redundancy. The final scene-level caption integrates visual semantics and temporal reasoning in a single instructional caption. Empirical evaluations against strong baselines, including VLLaMA3 and GPT-4o, demonstrate consistent gains on both N-gram-based metrics (BLEU, METEOR) and semantic similarity measures (BERTScore, CLIPScore). Qualitative analyses further show that DynaStride produces captions that are more temporally coherent and informative, suggesting a promising direction for improving AI-powered instructional content generation.
【24】GIFT: Group-relative Implicit Fine Tuning Integrates GRPO with DPO and UNA
标题:GIFT:组相对隐式微调将GRPO与DPO和UNA集成
链接:https://arxiv.org/abs/2510.23868
摘要:I propose \textbf{G}roup-relative \textbf{I}mplicit \textbf{F}ine \textbf{T}uning (GIFT), a novel reinforcement learning framework for aligning LLMs. Instead of directly maximizing cumulative rewards like PPO or GRPO, GIFT minimizes the discrepancy between implicit and explicit reward models. It combines three key ideas: (1) the online multi-response generation and normalization of GRPO, (2) the implicit reward formulation of DPO, and (3) the implicit-explicit reward alignment principle of UNA. By jointly normalizing the implicit and explicit rewards, GIFT eliminates an otherwise intractable term that prevents effective use of implicit rewards. This normalization transforms the complex reward maximization objective into a simple mean squared error (MSE) loss between the normalized reward functions, converting a non-convex optimization problem into a convex, stable, and analytically differentiable formulation. Unlike offline methods such as DPO and UNA, GIFT remains on-policy and thus retains exploration capability. Compared to GRPO, it requires fewer hyperparameters, converges faster, and generalizes better with significantly reduced training overfitting. Empirically, GIFT achieves superior reasoning and alignment performance on mathematical benchmarks while remaining computationally efficient.
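Reading the abstract literally, the training signal is an MSE between group-normalized implicit rewards (a DPO-style $\beta \log$ policy ratio) and group-normalized explicit rewards. The sketch below writes that loss down directly; the tensors are placeholders, and per-token aggregation details are omitted.

```python
# Hedged sketch of the GIFT objective as described in the abstract.
import torch

def gift_loss(logp_policy, logp_ref, explicit_reward, beta=0.1):
    """All inputs: (G,) scores for G sampled responses to one prompt."""
    implicit = beta * (logp_policy - logp_ref)          # DPO-style implicit reward
    def norm(x):                                        # GRPO-style group normalization
        return (x - x.mean()) / (x.std() + 1e-8)
    # Jointly normalized rewards turn the objective into a simple MSE.
    return (norm(implicit) - norm(explicit_reward)).pow(2).mean()

G = 8
logp_policy = torch.randn(G, requires_grad=True)
logp_ref = torch.randn(G)
rewards = torch.randn(G)
loss = gift_loss(logp_policy, logp_ref, rewards)
loss.backward()
print(loss.item())
```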
【25】A Physics-informed Multi-resolution Neural Operator
标题:一种物理信息的多分辨率神经算子
链接:https://arxiv.org/abs/2510.23810
备注:26 pages, 14 figures, 4 tables
摘要:The predictive accuracy of operator learning frameworks depends on the quality and quantity of available training data (input-output function pairs), often requiring substantial amounts of high-fidelity data, which can be challenging to obtain in some real-world engineering applications. These datasets may be unevenly discretized from one realization to another, with the grid resolution varying across samples. In this study, we introduce a physics-informed operator learning approach by extending the Resolution Independent Neural Operator (RINO) framework to a fully data-free setup, addressing both challenges simultaneously. Here, the arbitrarily (but sufficiently finely) discretized input functions are projected onto a latent embedding space (i.e., a vector space of finite dimensions), using pre-trained basis functions. The operator associated with the underlying partial differential equations (PDEs) is then approximated by a simple multi-layer perceptron (MLP), which takes as input a latent code along with spatiotemporal coordinates to produce the solution in the physical space. The PDEs are enforced via a finite difference solver in the physical space. The validation and performance of the proposed method are benchmarked on several numerical examples with multi-resolution data, where input functions are sampled at varying resolutions, including both coarse and fine discretizations.
【26】How do simple rotations affect the implicit bias of Adam?
标题:简单的旋转如何影响Adam的隐式偏置?
链接:https://arxiv.org/abs/2510.23804
摘要:Adaptive gradient methods such as Adam and Adagrad are widely used in machine learning, yet their effect on the generalization of learned models -- relative to methods like gradient descent -- remains poorly understood. Prior work on binary classification suggests that Adam exhibits a ``richness bias,'' which can help it learn nonlinear decision boundaries closer to the Bayes-optimal decision boundary relative to gradient descent. However, the coordinate-wise preconditioning scheme employed by Adam renders the overall method sensitive to orthogonal transformations of feature space. We show that this sensitivity can manifest as a reversal of Adam's competitive advantage: even small rotations of the underlying data distribution can make Adam forfeit its richness bias and converge to a linear decision boundary that is farther from the Bayes-optimal decision boundary than the one learned by gradient descent. To alleviate this issue, we show that a recently proposed reparameterization method -- which applies an orthogonal transformation to the optimization objective -- endows any first-order method with equivariance to data rotations, and we empirically demonstrate its ability to restore Adam's bias towards rich decision boundaries.
【27】Relaxed Sequence Sampling for Diverse Protein Design
标题:面向多样化蛋白质设计的松弛序列采样
链接:https://arxiv.org/abs/2510.23786
摘要:Protein design using structure prediction models such as AlphaFold2 has shown remarkable success, but existing approaches like relaxed sequence optimization (RSO) rely on single-path gradient descent and ignore sequence-space constraints, limiting diversity and designability. We introduce Relaxed Sequence Sampling (RSS), a Markov chain Monte Carlo (MCMC) framework that integrates structural and evolutionary information for protein design. RSS operates in continuous logit space, combining gradient-guided exploration with protein language model-informed jumps. Its energy function couples AlphaFold2-derived structural objectives with ESM2-derived sequence priors, balancing accuracy and biological plausibility. In an in silico protein binder design task, RSS produces 5$\times$ more designable structures and 2-3$\times$ greater structural diversity than RSO baselines, at equal computational cost. These results highlight RSS as a principled approach for efficiently exploring the protein design landscape.
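The gradient-guided portion of such a sampler can be pictured as a Metropolis-adjusted Langevin (MALA) step in continuous logit space; in the sketch below, a toy quadratic energy stands in for the AlphaFold2/ESM2 terms, and the protein-language-model jump moves are omitted:

```python
import numpy as np

def energy(logits):
    # Placeholder energy: RSS couples an AlphaFold2-derived structural
    # loss with an ESM2 sequence prior; a toy quadratic stands in here.
    return 0.5 * np.sum(logits ** 2)

def energy_grad(logits):
    return logits  # gradient of the toy energy above

def mala_step(logits, step, rng):
    """One gradient-guided MCMC (MALA) step targeting exp(-energy)."""
    g = energy_grad(logits)
    prop = logits - step * g + np.sqrt(2 * step) * rng.standard_normal(logits.shape)
    # Metropolis correction keeps exp(-E) invariant despite the drift
    log_q_fwd = -np.sum((prop - (logits - step * g)) ** 2) / (4 * step)
    g_p = energy_grad(prop)
    log_q_bwd = -np.sum((logits - (prop - step * g_p)) ** 2) / (4 * step)
    log_alpha = energy(logits) - energy(prop) + log_q_bwd - log_q_fwd
    return prop if np.log(rng.uniform()) < log_alpha else logits

rng = np.random.default_rng(0)
logits = rng.standard_normal((50, 20))   # 50 positions x 20 amino acids
for _ in range(100):
    logits = mala_step(logits, step=1e-2, rng=rng)
```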
【28】Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions
标题:评估计算机模拟创造力:人工智能国际象棋排局的专家评审
链接:https://arxiv.org/abs/2510.23772
备注:Accepted at the Creative AI Track, NeurIPS 2025
摘要:The rapid advancement of Generative AI has raised significant questions regarding its ability to produce creative and novel outputs. Our recent work investigates this question within the domain of chess puzzles and presents an AI system designed to generate puzzles characterized by aesthetic appeal, novelty, counter-intuitive and unique solutions. We briefly discuss our method below and refer the reader to the technical paper for more details. To assess our system's creativity, we presented a curated booklet of AI-generated puzzles to three world-renowned experts: International Master for chess compositions Amatzia Avni, Grandmaster Jonathan Levitt, and Grandmaster Matthew Sadler. All three are noted authors on chess aesthetics and the evolving role of computers in the game. They were asked to select their favorites and explain what made them appealing, considering qualities such as their creativity, level of challenge, or aesthetic design.
【29】Explaining Robustness to Catastrophic Forgetting Through Incremental Concept Formation
标题:通过增量概念形成解释对灾难性遗忘的稳健性
链接:https://arxiv.org/abs/2510.23756
备注:18 pages, 5 figures, Advances in Cognitive Systems 2025
摘要:Catastrophic forgetting remains a central challenge in continual learning, where models are required to integrate new knowledge over time without losing what they have previously learned. In prior work, we introduced Cobweb/4V, a hierarchical concept formation model that exhibited robustness to catastrophic forgetting in visual domains. Motivated by this robustness, we examine three hypotheses regarding the factors that contribute to such stability: (1) adaptive structural reorganization enhances knowledge retention, (2) sparse and selective updates reduce interference, and (3) information-theoretic learning based on sufficiency statistics provides advantages over gradient-based backpropagation. To test these hypotheses, we compare Cobweb/4V with neural baselines, including CobwebNN, a neural implementation of the Cobweb framework introduced in this work. Experiments on datasets of varying complexity (MNIST, Fashion-MNIST, MedMNIST, and CIFAR-10) show that adaptive restructuring enhances learning plasticity, sparse updates help mitigate interference, and the information-theoretic learning process preserves prior knowledge without revisiting past data. Together, these findings provide insight into mechanisms that can mitigate catastrophic forgetting and highlight the potential of concept-based, information-theoretic approaches for building stable and adaptive continual learning systems.
【30】Sparsity and Superposition in Mixture of Experts
标题:专家混合中的稀疏性与叠加性
链接:https://arxiv.org/abs/2510.23671
摘要:Mixture of Experts (MoE) models have become central to scaling large language models, yet their mechanistic differences from dense networks remain poorly understood. Previous work has explored how dense models use \textit{superposition} to represent more features than dimensions, and how superposition is a function of feature sparsity and feature importance. MoE models cannot be explained mechanistically through the same lens. We find that neither feature sparsity nor feature importance causes discontinuous phase changes, and that network sparsity (the ratio of active to total experts) better characterizes MoEs. We develop new metrics for measuring superposition across experts. Our findings demonstrate that models with greater network sparsity exhibit greater \emph{monosemanticity}. We propose a new definition of expert specialization based on monosemantic feature representation rather than load balancing, showing that experts naturally organize around coherent feature combinations when initialized appropriately. These results suggest that network sparsity in MoEs may enable more interpretable models without sacrificing performance, challenging the common assumption that interpretability and capability are fundamentally at odds.
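For intuition, one toy way to quantify monosemanticity from a feature-to-neuron weight matrix (an invented illustrative metric, not the paper's):

```python
import numpy as np

def monosemanticity(W):
    """Per-neuron score: fraction of a neuron's total absolute feature
    weight carried by its single strongest feature; 1.0 means the neuron
    responds to exactly one feature (perfectly monosemantic)."""
    A = np.abs(W)                          # W: [n_features, n_neurons]
    return A.max(axis=0) / (A.sum(axis=0) + 1e-12)

W = np.random.randn(32, 8)                 # 32 features packed into 8 neurons
print(monosemanticity(W).mean())           # higher under less superposition
```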
【31】Combining Textual and Structural Information for Premise Selection in Lean
标题:结合文本与结构信息用于Lean中的前提选择
链接:https://arxiv.org/abs/2510.23637
摘要:Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state--premise and premise--premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics. These results demonstrate the power of relational information for more effective premise selection.
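A minimal sketch of the combination being described, dense embeddings refined by message passing over the dependency graph and ranked by similarity to the proof state (the architecture and names are invented for illustration, not taken from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAugmentedRetriever(nn.Module):
    """Refine dense premise embeddings with one round of message passing
    over a premise-premise dependency graph, then rank premises by cosine
    similarity to the proof-state embedding."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)

    def forward(self, state_emb, premise_emb, adj):
        # adj: [P, P] row-normalized adjacency of premise dependencies
        h = premise_emb + F.relu(self.msg(adj @ premise_emb))
        return F.normalize(h, dim=-1) @ F.normalize(state_emb, dim=-1)

dim, P = 64, 100
model = GraphAugmentedRetriever(dim)
scores = model(torch.randn(dim), torch.randn(P, dim), torch.eye(P))
topk = scores.topk(10).indices        # candidate premises for the state
```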
【32】DiNo and RanBu: Lightweight Predictions from Shallow Random Forests
标题:DiNo和RanBu:来自浅层随机森林的轻量级预测
链接:https://arxiv.org/abs/2510.23624
摘要:Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting deployment in latency-sensitive or resource-constrained environments. We introduce DiNo (Distance with Nodes) and RanBu (Random Bushes), two shallow-forest methods that convert a small set of depth-limited trees into efficient, distance-weighted predictors. DiNo measures cophenetic distances via the most recent common ancestor of observation pairs, while RanBu applies kernel smoothing to Breiman's classical proximity measure. Both approaches operate entirely after forest training: no additional trees are grown, and tuning of the single bandwidth parameter $h$ requires only lightweight matrix-vector operations. Across three synthetic benchmarks and 25 public datasets, RanBu matches or exceeds the accuracy of full-depth random forests, particularly in high-noise settings, while reducing training plus inference time by up to 95\%. DiNo achieves the best bias-variance trade-off in low-noise regimes at a modest computational cost. Both methods extend directly to quantile regression, maintaining accuracy with substantial speed gains. The implementation is available as an open-source R/C++ package at https://github.com/tiagomendonca/dirf. We focus on structured tabular random samples (i.i.d.), leaving extensions to other modalities for future work.
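A sketch of the RanBu-style rule as described (shallow forest, Breiman proximity, kernel smoothing with bandwidth $h$); the exponential kernel form and helper names are assumptions, and scikit-learn stands in for the authors' R/C++ package:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ranbu_predict(forest, X_train, y_train, X_test, h=0.2):
    """Proximity = fraction of shallow trees in which two points share a
    leaf; predictions are kernel-weighted averages of training targets."""
    L_tr = forest.apply(X_train)        # [n_train, n_trees] leaf indices
    L_te = forest.apply(X_test)         # [n_test, n_trees]
    preds = np.empty(len(X_test))
    for i, leaves in enumerate(L_te):
        prox = (L_tr == leaves).mean(axis=1)    # Breiman proximity in [0, 1]
        w = np.exp(-(1.0 - prox) / h)           # kernel smoothing, bandwidth h
        preds[i] = np.average(y_train, weights=w)
    return preds

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=500)
rf = RandomForestRegressor(n_estimators=50, max_depth=3).fit(X, y)  # shallow trees
print(ranbu_predict(rf, X, y, X[:5]))
```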
【33】Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields
标题:加速MACE:等变力场的低精度技巧
链接:https://arxiv.org/abs/2510.23621
备注:78 pages, 21 figures
摘要:Machine-learning force fields can deliver accurate molecular dynamics (MD) at high computational cost. For SO(3)-equivariant models such as MACE, there is little systematic evidence on whether reduced-precision arithmetic and GPU-optimized kernels can cut this cost without harming physical fidelity. This thesis aims to make MACE cheaper and faster while preserving accuracy by identifying computational bottlenecks and evaluating low-precision execution policies. We profile MACE end-to-end and per block, compare the e3nn and NVIDIA cuEquivariance backends, and assess FP64/FP32/BF16/FP16 settings (with FP32 accumulation) for inference, short NVT and long NPT water simulations, and toy training runs under reproducible, steady-state timing. cuEquivariance reduces inference latency by about $3\times$. Casting only linear layers to BF16/FP16 within an FP32 model yields roughly 4x additional speedups, while energies and thermodynamic observables in NVT/NPT MD remain within run-to-run variability. Half-precision weights during training degrade force RMSE. Mixing e3nn and cuEq modules without explicit adapters causes representation mismatches. Fused equivariant kernels and mixed-precision inference can substantially accelerate state-of-the-art force fields with negligible impact on downstream MD. A practical policy is to use cuEquivariance with FP32 by default and enable BF16/FP16 for linear layers (keeping FP32 accumulations) for maximum throughput, while training remains in FP32. Further gains are expected on Ampere/Hopper GPUs (TF32/BF16) and from kernel-level FP16/BF16 paths and pipeline fusion.
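The headline recipe, BF16 weights for linear blocks inside an otherwise FP32 model, can be sketched in generic PyTorch as below (illustrative only; the thesis targets MACE with cuEquivariance kernels, and accumulation precision ultimately depends on the backend):

```python
import torch
import torch.nn as nn

class BF16Linear(nn.Module):
    """Wraps an FP32-trained Linear so its matmul runs in BF16 while the
    surrounding model stays FP32."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.inner = linear.to(torch.bfloat16)

    def forward(self, x):
        return self.inner(x.to(torch.bfloat16)).float()

def swap_linears(model: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear with its BF16 wrapper."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, BF16Linear(child))
        else:
            swap_linears(child)
    return model

model = swap_linears(nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 1)))
y = model(torch.randn(8, 64))  # activations return to FP32 between blocks
```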
【34】Feedback Lunch: Deep Feedback Codes for Wiretap Channels
标题:反馈午餐:窃听信道的深度反馈编码
链接:https://arxiv.org/abs/2510.16620
摘要:We consider reversely-degraded wiretap channels, for which the secrecy capacity is zero if there is no channel feedback. This work focuses on a seeded modular code design for the Gaussian wiretap channel with channel output feedback, combining universal hash functions for security and learned feedback-based codes for reliability to achieve positive secrecy rates. We study the trade-off between communication reliability and information leakage, illustrating that feedback enables agreeing on a secret key shared between legitimate parties, overcoming the security advantage of the wiretapper. Our findings also motivate code designs for sensing-assisted secure communication, to be used in next-generation integrated sensing and communication methods.
【35】Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication
标题:用于矩阵乘法的节能精确与近似脉动阵列架构
链接:https://arxiv.org/abs/2509.00778
备注:Submitted to 39th International Conference on VLSI Design, 2026
摘要:Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8x8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a PSNR of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well-suited for error-resilient image and vision processing applications.
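For concreteness, a cycle-level functional model of an output-stationary systolic array (exact multiply-accumulate only; the approximate PPC/NPPC cells and energy behavior are not modeled, and this is an illustration rather than the paper's architecture):

```python
import numpy as np

def systolic_matmul(A, B):
    """n x n output-stationary systolic array: A streams in from the left,
    B from the top, each skewed by one cycle per row/column; every PE does
    one multiply-accumulate per cycle and latches its inputs onward."""
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))  # values latched between horizontal neighbors
    b_reg = np.zeros((n, n))  # values latched between vertical neighbors
    for t in range(3 * n - 2):                 # cycles until the array drains
        new_a, new_b = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if j == 0:                      # boundary: skewed injection of A
                    a_in = A[i, t - i] if 0 <= t - i < n else 0.0
                else:
                    a_in = a_reg[i, j - 1]
                if i == 0:                      # boundary: skewed injection of B
                    b_in = B[t - j, j] if 0 <= t - j < n else 0.0
                else:
                    b_in = b_reg[i - 1, j]
                C[i, j] += a_in * b_in          # MAC inside the PE
                new_a[i, j], new_b[i, j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    return C

A, B = np.random.rand(8, 8), np.random.rand(8, 8)
assert np.allclose(systolic_matmul(A, B), A @ B)
```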
【36】Self-Concordant Perturbations for Linear Bandits
标题:线性Bandits的自协调扰动
链接:https://arxiv.org/abs/2510.24187
摘要:We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.
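For reference, the two update templates being bridged (standard forms from the online-learning literature, with $\hat{\ell}_s$ the bandit loss estimates; the paper's specific self-concordant perturbation family is not reproduced here):

$$x_t = \arg\min_{x \in \mathcal{K}} \Big\{ \eta \sum_{s<t} \langle \hat{\ell}_s, x \rangle + R(x) \Big\} \ \text{(FTRL)}, \qquad x_t = \arg\min_{x \in \mathcal{K}} \Big\langle \eta \sum_{s<t} \hat{\ell}_s - z_t, \, x \Big\rangle, \ z_t \sim \mathcal{D} \ \text{(FTPL)},$$

where FTRL's regularizer $R$ (a self-concordant barrier in SCRiBLe) plays the role that the perturbation distribution $\mathcal{D}$ plays in FTPL.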
【37】Copula-Stein Discrepancy: A Generator-Based Stein Operator for Archimedean Dependence
标题:Copula-Stein差异:面向阿基米德相依的基于生成元的Stein算子
链接:https://arxiv.org/abs/2510.24056
摘要:Kernel Stein discrepancies (KSDs) have become a principal tool for goodness-of-fit testing, but standard KSDs are often insensitive to higher-order dependency structures, such as tail dependence, which are critical in many scientific and financial domains. We address this gap by introducing the Copula-Stein Discrepancy (CSD), a novel class of discrepancies tailored to the geometry of statistical dependence. By defining a Stein operator directly on the copula density, CSD leverages the generative structure of dependence, rather than relying on the joint density's score function. For the broad class of Archimedean copulas, this approach yields a closed-form Stein kernel derived from the scalar generator function. We provide a comprehensive theoretical analysis, proving that CSD (i) metrizes weak convergence of copula distributions, ensuring it detects any mismatch in dependence; (ii) has an empirical estimator that converges at the minimax optimal rate of $O_P(n^{-1/2})$; and (iii) is provably sensitive to differences in tail dependence coefficients. The framework is extended to general non-Archimedean copulas, including elliptical and vine copulas. Computationally, the exact CSD kernel evaluation scales linearly in dimension, while a novel random feature approximation reduces the $n$-dependence from quadratic $O(n^2)$ to near-linear $\tilde{O}(n)$, making CSD a practical and theoretically principled tool for dependence-aware inference.
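Background for the generator-based construction (standard definitions; the paper's closed-form kernel is not reproduced here): a $d$-dimensional Archimedean copula is built from a scalar generator $\psi$ via

$$C(u_1, \dots, u_d) = \psi\big(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\big),$$

and a Langevin-type Stein operator acting on a test function $f$ with respect to the copula density $c$ takes the form

$$(\mathcal{A}_c f)(u) = \langle \nabla_u \log c(u), f(u) \rangle + \nabla_u \cdot f(u),$$

which satisfies $\mathbb{E}_{U \sim c}[(\mathcal{A}_c f)(U)] = 0$ under mild boundary conditions; CSD's contribution is expressing the copula score $\nabla_u \log c(u)$ through the generator $\psi$ for the Archimedean family.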
【38】Testing-driven Variable Selection in Bayesian Modal Regression
标题:贝叶斯模态回归中检验驱动的变量选择
链接:https://arxiv.org/abs/2510.23831
备注:30 pages, 2 figures, preprint under review
摘要:We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.
【39】Re-envisioning Euclid Galaxy Morphology: Identifying and Interpreting Features with Sparse Autoencoders
标题:重新构想欧几里德星系形态学:用稀疏自动编码器识别和解释特征
链接:https://arxiv.org/abs/2510.23749
备注:Accepted to NeurIPS Machine Learning and the Physical Sciences Workshop
摘要:Sparse Autoencoders (SAEs) can efficiently identify candidate monosemantic features from pretrained neural networks for galaxy morphology. We demonstrate this on Euclid Q1 images using both supervised (Zoobot) and new self-supervised (MAE) models. Our publicly released MAE achieves superhuman image reconstruction performance. While a Principal Component Analysis (PCA) on the supervised model primarily identifies features already aligned with the Galaxy Zoo decision tree, SAEs can identify interpretable features outside of this framework. SAE features also show stronger alignment than PCA with Galaxy Zoo labels. Although challenges in interpretability remain, SAEs provide a powerful engine for discovering astrophysical phenomena beyond the confines of human-defined classification.
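A minimal SAE of the kind used here, an overcomplete ReLU dictionary trained with reconstruction plus L1 sparsity, as a generic sketch (hyperparameters and names are illustrative; random tensors stand in for Zoobot/MAE activations):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder whose ReLU codes serve as candidate
    monosemantic features for a pretrained network's activations."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))      # sparse feature activations
        return self.dec(z), z

sae = SparseAutoencoder(d_model=768, d_hidden=8 * 768)   # 8x overcomplete
acts = torch.randn(32, 768)              # stand-in for Zoobot/MAE activations
recon, z = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * z.abs().mean()  # MSE + L1 sparsity
loss.backward()
```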
【40】Beyond Normality: Reliable A/B Testing with Non-Gaussian Data
标题:超越正态性:使用非高斯数据进行可靠的A/B测试
链接:https://arxiv.org/abs/2510.23666
备注:11 pages, 3 figures
摘要:A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby assessing the effectiveness of a given strategy. To be trustworthy, these experiments must keep Type I error (i.e., false positive rate) under control; otherwise, we may launch harmful strategies. However, in real-world applications, we find that A/B testing often fails to deliver reliable results. When the data distribution departs from normality or when the treatment and control groups differ in sample size, the commonly used pairwise $t$-test is no longer trustworthy. In this paper, we quantify how skewed, long-tailed data and unequal allocation distort error rates and derive explicit formulas for the minimum sample size required for the $t$-test to remain valid. We find that many online feedback metrics require hundreds of millions of samples to ensure reliable A/B testing. Thus, we introduce an Edgeworth-based correction that provides more accurate $p$-values when the available sample size is limited. Offline experiments on a leading A/B testing platform corroborate the practical value of our theoretical minimum sample size thresholds and demonstrate that the corrected method substantially improves the reliability of A/B testing in real-world conditions.
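One classical first-order skewness correction of this kind, applied to a one-sample $t$-statistic (a sketch of the idea; the paper's exact correction and sample-size thresholds are not reproduced here):

```python
import numpy as np
from scipy import stats

def edgeworth_pvalue(x, mu0=0.0):
    """One-sided p-value for H1: mean > mu0 using the first-order Edgeworth
    expansion of the Studentized mean, P(T <= t) ~ Phi(t) +
    phi(t) * gamma * (2 t^2 + 1) / (6 sqrt(n)), with sample skewness gamma."""
    n = len(x)
    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    g = stats.skew(x)                      # sample skewness correction term
    cdf = stats.norm.cdf(t) + stats.norm.pdf(t) * g * (2 * t**2 + 1) / (6 * np.sqrt(n))
    return 1.0 - np.clip(cdf, 0.0, 1.0)

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=200) - 1.0   # skewed, mean-zero sample
print(edgeworth_pvalue(x))                 # more accurate than the plain t-test p-value
```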