点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计329篇
大模型相关(28篇)
【1】Who Judges the Judge? LLM Jury-on-Demand: Building Trustworthy LLM Evaluation Systems
标题:谁评判法官?LLM按需陪审团:建立值得信赖的LLM评估系统
链接:https://arxiv.org/abs/2512.01786
作者:Xiaochuan Li,Ke Wang,Girija Gouda,Shubham Choudhary,Yaqun Wang,Linwei Hu,Joel Vaughan,Freddy Lecue
备注:66 pages, 22 figures, 37 tables
摘要:As Large Language Models (LLMs) become integrated into high-stakes domains, there is a growing need for evaluation methods that are both scalable for real-time deployment and reliable for critical decision-making. While human evaluation is reliable, it is slow and costly. Single LLM judges are biased, and static juries lack adaptability. To overcome these limitations, we propose LLM Jury-on-Demand - a dynamic, learning-based framework for scalable and context-aware evaluation. Our method trains a set of reliability predictors to assess when LLM judges will agree with human experts, leveraging token distributions, embeddings, and structural input features. This enables a fully adaptive evaluation where, for each data point, an optimal jury of the most reliable judges is dynamically selected, and their scores are aggregated using their reliability as weights. Experiments on summarization and RAG benchmarks show that our dynamic jury system achieves significantly higher correlation with human judgment than both single-judge and static-jury baselines. These results highlight the promise of adaptive, learning-based juries for building scalable, more reliable and trustworthy evaluation systems for modern LLMs in high-stakes domains.
【2】SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models
标题:SA-ADP:大型语言模型的敏感性感知自适应差异隐私
链接:https://arxiv.org/abs/2512.01748
作者:Stella Etuk,Ashraf Matrawy
备注:It is a 5-page paper with 5 figures and 1 Table
摘要:Despite advances in the use of large language models (LLMs) in downstream tasks, their ability to memorize information has raised privacy concerns. Therefore, protecting personally identifiable information (PII) during LLM training remains a fundamental challenge. Conventional methods like Differential Privacy-Stochastic Gradient Descent (DP-SGD) provide robust privacy protection via uniform noising, protecting PII regardless of its distinct sensitivity. This comes at the expense of the model's utility, leading to a trade-off. In this paper, we propose SA-ADP, a sensitivity-aware approach that allocates noise based on the sensitivity of individual PII. We evaluated our method on four datasets (ABCD, CUSTOMERSIM, Wikitext-2, and UNSW-NB15 ). Our results show that SA-ADP achieves results comparable to the baseline (No-DP) and the conventional DP-SGD. This means that our method did not degrade the model's utility while still maintaining strong privacy protection.
【3】Automating modeling in mechanics: LLMs as designers of physics-constrained neural networks for constitutive modeling of materials
标题:力学自动化建模:LLM作为材料本构建模的物理约束神经网络的设计者
链接:https://arxiv.org/abs/2512.01735
作者:Marius Tacke,Matthias Busch,Kian Abdolazizi,Jonas Eichinger,Kevin Linka,Christian Cyron,Roland Aydin
备注:Currently under review
摘要:Large language model (LLM)-based agentic frameworks increasingly adopt the paradigm of dynamically generating task-specific agents. We suggest that not only agents but also specialized software modules for scientific and engineering tasks can be generated on demand. We demonstrate this concept in the field of solid mechanics. There, so-called constitutive models are required to describe the relationship between mechanical stress and body deformation. Constitutive models are essential for both the scientific understanding and industrial application of materials. However, even recent data-driven methods of constitutive modeling, such as constitutive artificial neural networks (CANNs), still require substantial expert knowledge and human labor. We present a framework in which an LLM generates a CANN on demand, tailored to a given material class and dataset provided by the user. The framework covers LLM-based architecture selection, integration of physical constraints, and complete code generation. Evaluation on three benchmark problems demonstrates that LLM-generated CANNs achieve accuracy comparable to or greater than manually engineered counterparts, while also exhibiting reliable generalization to unseen loading scenarios and extrapolation to large deformations. These findings indicate that LLM-based generation of physics-constrained neural networks can substantially reduce the expertise required for constitutive modeling and represent a step toward practical end-to-end automation.
【4】ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models
标题:ICAD-LLM:通过大型语言模型的上下文学习进行一对所有异常检测
链接:https://arxiv.org/abs/2512.01672
作者:Zhongyuan Wu,Jingyuan Wang,Zexuan Cheng,Yilong Zhou,Weizhi Wang,Juhua Pu,Chao Li,Changqing Ma
摘要
:Anomaly detection (AD) is a fundamental task of critical importance across numerous domains. Current systems increasingly operate in rapidly evolving environments that generate diverse yet interconnected data modalities -- such as time series, system logs, and tabular records -- as exemplified by modern IT systems. Effective AD methods in such environments must therefore possess two critical capabilities: (1) the ability to handle heterogeneous data formats within a unified framework, allowing the model to process and detect multiple modalities in a consistent manner during anomalous events; (2) a strong generalization ability to quickly adapt to new scenarios without extensive retraining. However, most existing methods fall short of these requirements, as they typically focus on single modalities and lack the flexibility to generalize across domains. To address this gap, we introduce a novel paradigm: In-Context Anomaly Detection (ICAD), where anomalies are defined by their dissimilarity to a relevant reference set of normal samples. Under this paradigm, we propose ICAD-LLM, a unified AD framework leveraging Large Language Models' in-context learning abilities to process heterogeneous data within a single model. Extensive experiments demonstrate that ICAD-LLM achieves competitive performance with task-specific AD methods and exhibits strong generalization to previously unseen tasks, which substantially reduces deployment costs and enables rapid adaptation to new environments. To the best of our knowledge, ICAD-LLM is the first model capable of handling anomaly detection tasks across diverse domains and modalities.
【5】Scaling and context steer LLMs along the same computational path as the human brain
标题:扩展和上下文引导LLC沿着与人类大脑相同的计算路径前进
链接:https://arxiv.org/abs/2512.01591
作者:Joséphine Raugel,Stéphane d'Ascoli,Jérémy Rapin,Valentin Wyart,Jean-Rémi King
摘要:Recent studies suggest that the representations learned by large language models (LLMs) are partially aligned to those of the human brain. However, whether and why this alignment score arises from a similar sequence of computations remains elusive. In this study, we explore this question by examining temporally-resolved brain signals of participants listening to 10 hours of an audiobook. We study these neural dynamics jointly with a benchmark encompassing 22 LLMs varying in size and architecture type. Our analyses confirm that LLMs and the brain generate representations in a similar order: specifically, activations in the initial layers of LLMs tend to best align with early brain responses, while the deeper layers of LLMs tend to best align with later brain responses. This brain-LLM alignment is consistent across transformers and recurrent architectures. However, its emergence depends on both model size and context length. Overall, this study sheds light on the sequential nature of computations and the factors underlying the partial convergence between biological and artificial neural networks.
【6】Do Large Language Models Walk Their Talk? Measuring the Gap Between Implicit Associations, Self-Report, and Behavioral Altruism
标题:大型语言模型会说话吗?衡量内隐联想、自我报告和行为利他主义之间的差距
链接:https://arxiv.org/abs/2512.01568
作者:Sandro Andric
备注:14 pages, 7 figures, 7 tables. Code and data available at https://github.com/sandroandric/LLMs_Altruism_Study_Code
摘要:We investigate whether Large Language Models (LLMs) exhibit altruistic tendencies, and critically, whether their implicit associations and self-reports predict actual altruistic behavior. Using a multi-method approach inspired by human social psychology, we tested 24 frontier LLMs across three paradigms: (1) an Implicit Association Test (IAT) measuring implicit altruism bias, (2) a forced binary choice task measuring behavioral altruism, and (3) a self-assessment scale measuring explicit altruism beliefs. Our key findings are: (1) All models show strong implicit pro-altruism bias (mean IAT = 0.87, p < .0001), confirming models "know" altruism is good. (2) Models behave more altruistically than chance (65.6% vs. 50%, p < .0001), but with substantial variation (48-85%). (3) Implicit associations do not predict behavior (r = .22, p = .29). (4) Most critically, models systematically overestimate their own altruism, claiming 77.5% altruism while acting at 65.6% (p < .0001, Cohen's d = 1.08). This "virtue signaling gap" affects 75% of models tested. Based on these findings, we recommend the Calibration Gap (the discrepancy between self-reported and behavioral values) as a standardized alignment metric. Well-calibrated models are more predictable and behaviorally consistent; only 12.5% of models achieve the ideal combination of high prosocial behavior and accurate self-knowledge.
【7】SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic Chemistry
标题:SynthStrategy:从有机化学领域法学硕士中提取和形式化潜在战略见解
链接:https://arxiv.org/abs/2512.01507
作者:Daniel Armstrong,Zlatko Jončev,Andres M Bran,Philippe Schwaller
摘要:Modern computer-assisted synthesis planning (CASP) systems show promises at generating chemically valid reaction steps but struggle to incorporate strategic considerations such as convergent assembly, protecting group minimization, and optimal ring-forming sequences. We introduce a methodology that leverages Large Language Models to distill synthetic knowledge into code. Our system analyzes synthesis routes and translates strategic principles into Python functions representing diverse strategic and tactical rules, such as strategic functional group interconversions and ring construction strategies. By formalizing this knowledge as verifiable code rather than simple heuristics, we create testable, interpretable representations of synthetic strategy. We release the complete codebase and the USPTO-ST dataset -- synthesis routes annotated with strategic tags. This framework unlocks a novel capability for CASP: natural language-based route retrieval, achieving 75\% Top-3 accuracy on our benchmark. We further validate our library through temporal analysis of historical trends and chemically intuitive route clustering that offers more granular partitioning than common previous methods. This work bridges the tactical-strategic divide in CASP, enabling specification, search, and evaluation of routes by strategic criteria rather than structure alone.
【8】RE-LLM: Integrating Large Language Models into Renewable Energy Systems
标题:RE-LLM:将大型语言模型集成到可再生能源系统中
链接:https://arxiv.org/abs/2512.01392
作者:Ali Forootani,Mohammad Sadr,Danial Esmaeili Aliabadi,Daniela Thraen
摘要:Energy system models are increasingly employed to guide long-term planning in multi-sectoral environments where decisions span electricity, heat, transport, land use, and industry. While these models provide rigorous quantitative insights, their outputs are often highly technical, making them difficult to interpret for non-expert stakeholders such as policymakers, planners, and the public. This communication gap limits the accessibility and practical impact of scenario-based modeling, particularly as energy transitions grow more complex with rising shares of renewables, sectoral integration, and deep uncertainties. To address this challenge, we propose the Renewable Energy Large Language Model (RE-LLM), a hybrid framework that integrates Large Language Models (LLMs) directly into the energy system modeling workflow. RE-LLM combines three core elements: (i) optimization-based scenario exploration, (ii) machine learning surrogates that accelerate computationally intensive simulations, and (iii) LLM-powered natural language generation that translates complex results into clear, stakeholder-oriented explanations. This integrated design not only reduces computational burden but also enhances inter-pretability, enabling real-time reasoning about trade-offs, sensitivities, and policy implications. The framework is adaptable across different optimization platforms and energy system models, ensuring broad applicability beyond the case study presented. By merging speed, rigor, and interpretability, RE-LLM advances a new paradigm of human-centric energy modeling. It enables interactive, multilingual, and accessible engagement with future energy pathways, ultimately bridging the final gap between data-driven analysis and actionable decision-making for sustainable transitions.
【9】Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
标题:使用LLM稳定强化学习:制定和实践
链接:https://arxiv.org/abs/2512.01374
作者:Chujie Zheng,Kai Dang,Bowen Yu,Mingze Li,Huiqiang Jiang,Junrong Lin,Yuqiong Liu,An Yang,Jingren Zhou,Junyang Lin
摘要:This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy gradient methods such as REINFORCE. Specifically, through a first-order approximation, we show that this surrogate becomes increasingly valid only when both the training-inference discrepancy and policy staleness are minimized. This insight provides a principled explanation for the crucial role of several widely adopted techniques in stabilizing RL training, including importance sampling correction, clipping, and particularly Routing Replay for Mixture-of-Experts (MoE) models. Through extensive experiments with a 30B MoE model totaling hundreds of thousands of GPU hours, we show that for on-policy training, the basic policy gradient algorithm with importance sampling correction achieves the highest training stability. When off-policy updates are introduced to accelerate convergence, combining clipping and Routing Replay becomes essential to mitigate the instability caused by policy staleness. Notably, once training is stabilized, prolonged optimization consistently yields comparable final performance regardless of cold-start initialization. We hope that the shared insights and the developed recipes for stable RL training will facilitate future research.
【10】Intrinsic Structure as a Proxy for Saliency: SVD-Based Weight Preservation for Mixed-Precision Quantization in Large Language Models
标题:内在结构作为显著性的代理:大型语言模型中混合精度量化的基于SVD的权重保持
链接:https://arxiv.org/abs/2512.01343
作者:Shashank Landge,Abhishek Patil,Tejas kamble,Bhushan Buddhivant,Priyanka Joshi
摘要:As Large Language Models (LLMs) continue to scale in parameter count, deploying them on commodity hardware has become increasingly challenging. Post-Training Quantization (PTQ) addresses this by reducing the precision of model weights, typically to 4-bit or lower. However, uniform quantization often leads to significant performance degradation due to the presence of ``outlier features'' -- weights that, while few in number, are critical for maintaining model accuracy. Current state-of-the-art methods such as AWQ (Activation-aware Weight Quantization) and SpQR (Sparse Quantization Representations) rely on calibration data to identify these salient weights via activation magnitudes or Hessian sensitivity. In scenarios where data privacy is paramount or calibration data is unavailable, these methods are inapplicable. In this work, we propose a data-free, structure-aware hypothesis: that the weights identified as Principal Components via Singular Value Decomposition (SVD) are intrinsically important to the model's downstream performance. We introduce a novel selection heuristic that preserves the top-$k$ weights aligned with the principal components in FP32, while aggressively quantizing the residual weights. We compare our method against activation-aware (AWQ) and second-order (SpQR) methods across GLUE benchmarks (MRPC, RTE, QNLI) using a DistilBERT backbone. Our experiments reveal that structural importance is highly correlated with functional importance. On the challenging RTE task, our SVD-based method achieves an accuracy of 66.06\%, outperforming both AWQ (65.34\%) and SpQR (65.34\%) at high protection budgets, validating that intrinsic matrix structure can serve as a robust proxy for weight saliency without the need for forward passes or calibration data.
【11】Securing Large Language Models (LLMs) from Prompt Injection Attacks
标题:保护大型语言模型(LLM)免受提示注入攻击
链接:https://arxiv.org/abs/2512.01326
作者:Omar Farooq Khan Suri,John McCrae
备注:10 pages, 1 figure, 1 table
摘要
:Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious tasks. Recent work has proposed JATMO, a task-specific fine-tuning approach that trains non-instruction-tuned base models to perform a single function, thereby reducing susceptibility to adversarial instructions. In this study, we evaluate the robustness of JATMO against HOUYI, a genetic attack framework that systematically mutates and optimizes adversarial prompts. We adapt HOUYI by introducing custom fitness scoring, modified mutation logic, and a new harness for local model testing, enabling a more accurate assessment of defense effectiveness. We fine-tuned LLaMA 2-7B, Qwen1.5-4B, and Qwen1.5-0.5B models under the JATMO methodology and compared them with a fine-tuned GPT-3.5-Turbo baseline. Results show that while JATMO reduces attack success rates relative to instruction-tuned models, it does not fully prevent injections; adversaries exploiting multilingual cues or code-related disruptors still bypass defenses. We also observe a trade-off between generation quality and injection vulnerability, suggesting that better task performance often correlates with increased susceptibility. Our results highlight both the promise and limitations of fine-tuning-based defenses and point toward the need for layered, adversarially informed mitigation strategies.
【12】CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions
标题:CyclliST:循环状态转变推理的视频语言模型基准
链接:https://arxiv.org/abs/2512.01095
作者:Simon Kohaut,Daniel Ochs,Shun Zhang,Benedict Flade,Julian Eggert,Kristian Kersting,Devendra Singh Dhami
摘要:We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLM) on their ability for textual reasoning over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state-of-the-art in understanding periodic patterns.
【13】WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
标题:WUSH:LLM量化的近优自适应变换
链接:https://arxiv.org/abs/2512.00956
作者:Jiale Chen,Vage Egiazarian,Torsten Hoefler,Dan Alistarh
摘要:Quantization to low bitwidth is a standard approach for deploying large language models, however, a few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer. A common mitigation approach is to apply some fixed orthogonal transforms, such as Hadamard matrices, before quantization, which typically reduces the dynamic range. Yet, these transforms ignore the statistics of the data, and their optimality is currently not understood. In this work, we derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization using standard data-free quantizers for common numerical formats. Specifically, we provide derivations of the optimal adaptive (data-aware) transforms for round-to-nearest (RTN), AbsMax-scaled block quantizers for both integer and floating-point formats. The resulting construction, which we call WUSH, combines a Hadamard backbone with a data-dependent component based on second-order moments, yielding a non-orthogonal transform that is provably optimal under mild assumptions and remains structured for efficient implementation. Preliminary experimental results show that our approach consistently improves upon the Hadamard transform for common formats.
【14】Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs
标题:超越高熵探索:基于正确性的低熵段的推理LLM优势塑造
链接:https://arxiv.org/abs/2512.00908
作者:Xinzhu Chen,Xuesheng Li,Zhongxiang Sun,Weijie Yu
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has become a central approach for improving the reasoning ability of large language models. Recent work studies RLVR through token entropy, arguing that high-entropy tokens drive exploration and should receive stronger updates. However, they overlook the fact that most of a reasoning trajectory consists of low-entropy segments that encode stable and reusable structural patterns. Through qualitative and quantitative analyses, we find that the overlap of low-entropy segments across correct responses strongly correlates with model accuracy, while overlaps involving incorrect responses exhibit stable but unproductive patterns. Motivated by these findings, we propose LESS, a correctness-aware reinforcement framework that performs fine-grained advantage modulation over low-entropy segments. LESS amplifies segments unique to correct responses, suppresses those unique to incorrect ones, and neutralizes segments shared by both, while preserving high-entropy exploration in the underlying RL algorithm. Instantiated on top of the popular GRPO, LESS consistently improves accuracy over strong RL baselines across three backbones and six math benchmarks, achieves stronger robustness of the performance floor.
【15】Towards Active Synthetic Data Generation for Finetuning Language Models
标题:迈向微调语言模型的主动合成数据生成
链接:https://arxiv.org/abs/2512.00884
作者:Samuel Kessler,Menglin Xia,Daniel Madrigal Diaz,Dongge Han,Helia Heshemi,Saravan Rajmohan,Victor Ruehle,Jordan T. Ash
备注:14 figures, 36 pages
摘要
:A common and effective means for improving language model capabilities involves finetuning a ``student'' language model's parameters on generations from a more proficient ``teacher'' model. Termed ``synthetic data'', these generations are often produced before any student finetuning, but some work has considered generating new synthetic samples as training progresses. This paper studies and advocates for the latter case, where data are generated in an iterative, closed-loop fashion that is guided by the current state of the student model. For a fixed budget of generated samples, or a budget in terms of compute spent querying a teacher, we show that this curation of finetuning data affords improved student performance over static generation. Further, while there have been several LLM-specific methods proposed that operate in this regime, we find that simple, inexpensive selection criteria from the active learning literature tend to be most performant. We validate these claims across four mathematical and logical reasoning datasets using four different small language models.
【16】HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
标题:HBLLM:LLM的微波增强高保真1位量化
链接:https://arxiv.org/abs/2512.00862
作者:Ningning Chen,Weicai Ye,Ying Jiang
摘要:We introduce HBLLM, a wavelet-enhanced high-fidelity $1$-bit post-training quantization method for Large Language Models (LLMs). By leveraging Haar wavelet transforms to enhance expressive capacity through frequency decomposition, HBLLM significantly improves quantization fidelity while maintaining minimal overhead. This approach features two innovative structure-aware grouping strategies: (1) frequency-aware multi-parameter intra-row grouping and (2) $\ell_2$-norm-based saliency-driven column selection. For non-salient weights, a shared mean is employed across quantization groups within each frequency band to optimize storage efficiency. Experiments conducted on the OPT and LLaMA models demonstrate that HBLLM achieves state-of-the-art performance in $1$-bit quantization, attaining a perplexity of $6.71$ on LLaMA$2$-$13$B with an average weight storage of only $1.08$ bits. Code available at: https://github.com/Yeyke/HBLLM.
【17】ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning
标题:ReJump:一种用于分析和改进LLM推理的树跳表示
链接:https://arxiv.org/abs/2512.00831
作者:Yuchen Zeng,Shuibai Zhang,Wonjun Kang,Shutong Wu,Lynnix Zou,Ying Fan,Heeju Kim,Ziqian Lin,Jungtaek Kim,Hyung Il Koo,Dimitris Papailiopoulos,Kangwook Lee
摘要:Large Reasoning Models (LRMs) are Large Language Models (LLMs) explicitly trained to generate long-form Chain-of-Thoughts (CoTs), achieving impressive success on challenging tasks like math and programming. However, their underlying reasoning "algorithms" remain poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree of intermediate problem-solving steps. Transitions between nodes, which we term jumps, include adjacent moves that capture behaviors such as calculation, and non-adjacent moves that capture behaviors such as backtracking and verification. ReJump enables analyzing LLM reasoning with diverse metrics that quantify exploration, exploitation, overthinking, forgetting, and verification. Using our proposed LLM agent to extract reasoning traces into ReJump format, we evaluate state-of-the-art LRMs on two tasks and find that models with similar accuracy can exhibit distinct reasoning behaviors, while different tasks favor different reasoning styles (e.g., varying balance between exploration and exploitation). To further understand how learning strategies shape reasoning, we use ReJump to compare distilled LRMs with their teachers, CoT-prompted LLMs with LRMs, and to examine how the number of reasoning examples and reinforcement learning affect reasoning behavior. Finally, we show that ReJump can improve reasoning quality at test time through strategies such as ReJump-guided Best-of-N selection and prompt selection. Our code is publicly available at https://github.com/UW-Madison-Lee-Lab/ReJump.
【18】Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
标题:西格玛:视觉-语言-动作模型实现心灵感应一致的关键
链接:https://arxiv.org/abs/2512.00783
作者:Libo Wang
备注:The Sigma model has been open-sourced on Hugging Face. Weights, dataset, some scripts, and logs are all available. The link is: https://huggingface.co/Veltraxor/Sigma
摘要:To address the gap in humanoid robot cognitive systems regarding the lack of a time-updable mediating thought space between semantics and continuous control, this study constructs and trains a VLA model named "Sigma" that runs on a single RTX 4090. It uses the open-source pi05_base model as a foundation and preprocesses svla_so101_pickplace into a training dataset. The researcher independently designed an architecture for a vision-language-action model that combines deep semantic understanding and association to achieve telepathic communication. The training process involved repeated optimizations of data preprocessing, LoRA fine-tuning, and the inference-stage adapter. The experiment employed offline closed-loop replay, comparing Sigma with the untuned pure pi05_base_base model under data conditions. Results showed that Sigma exhibited a stable decrease in control MSE across vector, fragment, and entire trajectory timescales, while maintaining the telepathy norm and semantic-text alignment quality unchanged. It demonstrates that mind-responsive alignment control is quantified through an architecture that combines deep understanding of semantics and association without retraining the base model, which provides reproducible experience for semantic alignment and intention-driven behavior in humanoid robots.
【19】REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories
标题:REM:通过多帧轨迹评估LLM有序空间推理
链接:https://arxiv.org/abs/2512.00736
作者:Jacob Thompson,Emiliano Garcia-Lopez,Yonatan Bisk
摘要:Humans build viewpoint-independent cognitive maps through navigation, enabling intuitive reasoning about object permanence and spatial relations. We argue that multimodal large language models (MLLMs), despite extensive video training, lack this fundamental spatial reasoning capability, a critical limitation for embodied applications. To demonstrate these limitations and drive research, we introduce REM (Reasoning over Embodied Multi-Frame Trajectories), a benchmark using controllable 3D environments for long-horizon embodied spatial reasoning. REM systematically evaluates key aspects like object permanence/distinction, spatial relationships, and numerical tracking across dynamic embodied viewpoints. Our evaluation shows that the best-performing current models exhibit promising overall performance, but become increasingly unreliable at even moderate complexity levels easily handled by humans. These findings highlight challenges MLLMs face in developing robust spatial representations from sequential visual input. Consequently, REM provides targeted metrics and diagnostics to foster improved spatial understanding in future models.
【20】Large Language Models for Software Engineering: A Reproducibility Crisis
标题:软件工程的大型语言模型:再现性危机
链接:https://arxiv.org/abs/2512.00651
作者:Mohammed Latif Siddiq,Arvin Islam-Gomes,Natalie Sekerak,Joanna C. S. Santos
备注:Submitted to Empirical Software Engineering (EMSE) journal; 112 pages (81 pages of references)
摘要:Reproducibility is a cornerstone of scientific progress, yet its state in large language model (LLM)-based software engineering (SE) research remains poorly understood. This paper presents the first large-scale, empirical study of reproducibility practices in LLM-for-SE research. We systematically mined and analyzed 640 papers published between 2017 and 2025 across premier software engineering, machine learning, and natural language processing venues, extracting structured metadata from publications, repositories, and documentation. Guided by four research questions, we examine (i) the prevalence of reproducibility smells, (ii) how reproducibility has evolved over time, (iii) whether artifact evaluation badges reliably reflect reproducibility quality, and (iv) how publication venues influence transparency practices. Using a taxonomy of seven smell categories: Code and Execution, Data, Documentation, Environment and Tooling, Versioning, Model, and Access and Legal, we manually annotated all papers and associated artifacts. Our analysis reveals persistent gaps in artifact availability, environment specification, versioning rigor, and documentation clarity, despite modest improvements in recent years and increased adoption of artifact evaluation processes at top SE venues. Notably, we find that badges often signal artifact presence but do not consistently guarantee execution fidelity or long-term reproducibility. Motivated by these findings, we provide actionable recommendations to mitigate reproducibility smells and introduce a Reproducibility Maturity Model (RMM) to move beyond binary artifact certification toward multi-dimensional, progressive evaluation of reproducibility rigor.
【21】Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
标题:Wikontic:用大型语言模型构建维基数据对齐、实体感知的知识图
链接:https://arxiv.org/abs/2512.00590
作者:Alla Chepurova,Aydar Bulatov,Yuri Kuratov,Mikhail Burtsev
摘要:Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3$\times$ fewer than AriGraph and $
【22】SelfAI: Building a Self-Training AI System with LLM Agents
标题:SelfAI:利用LLM代理构建自我训练人工智能系统
链接:https://arxiv.org/abs/2512.00403
作者:Xiao Wu,Ting-Zhu Huang,Liang-Jian Deng,Xiaobing Yu,Yu Zhong,Shangqi Deng,Ufaq Khan,Jianghao Wu,Xiaofeng Liu,Imran Razzak,Xiaojun Chang,Yutong Xie
摘要:Recent work on autonomous scientific discovery has leveraged LLM-based agents to integrate problem specification, experiment planning, and execution into end-to-end systems. However, these frameworks are often confined to narrow application domains, offer limited real-time interaction with researchers, and lack principled mechanisms for determining when to halt exploration, resulting in inefficiencies, reproducibility challenges, and under-utilized human expertise. To address these gaps, we propose \textit{SelfAI}, a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations, a Cognitive Agent powered by LLMs with optimal stopping criteria to iteratively refine hyperparameter searches, and an Experiment Manager responsible for orchestrating parallel, fault-tolerant training workflows across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. We further introduce two novel evaluation metrics, Score and $\text{AUP}_D$, to quantify discovery efficiency and search diversity. Across regression, NLP, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials compared to classical Bayesian optimization and LLM-based baselines, while enabling seamless interaction with human researchers.
【23】Evaluating LLMs in Open-Source Games
标题:评估开源游戏中的LLM
链接:https://arxiv.org/abs/2512.00371
作者:Swadesh Sistla,Max Kleiman-Weiner
备注:39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages, including interpretability, inter-agent transparency, and formal verifiability; additionally, they enable program equilibria, solutions that leverage the transparency of code and are inaccessible within normal-form settings. We evaluate the capabilities of leading open- and closed-weight LLMs to predict and classify program strategies and evaluate features of the approximate program equilibria reached by LLM agents in dyadic and evolutionary settings. We identify the emergence of payoff-maximizing, cooperative, and deceptive strategies, characterize the adaptation of mechanisms within these programs over repeated open-source games, and analyze their comparative evolutionary fitness. We find that open-source games serve as a viable environment to study and steer the emergence of cooperative strategy in multi-agent dilemmas.
【24】RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs
标题:RL-Struct:一个轻量级强化学习框架,用于LLM中的可靠结构化输出
链接:https://arxiv.org/abs/2512.00319
作者:Ruike Hu,Shulei Wu
备注:23 pages, 14 figures. Model is available at https://huggingface.co/Freakz3z/Qwen-JSON
摘要:Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language generation and reasoning. However, their integration into automated software ecosystems is often hindered by the "Structure Gap" - the inherent tension between the probabilistic nature of token generation and the deterministic requirements of structured data formats (e.g., JSON, XML). Traditional Supervised Fine-Tuning (SFT) often fails to enforce strict syntactic constraints, leading to "hallucinated" keys or malformed structures, while constrained decoding methods impose significant inference latency. In this paper, we propose a lightweight, efficient Reinforcement Learning (RL) framework to bridge this gap. We introduce a novel Multi-dimensional Reward Function that decomposes the structured output task into a hierarchy of constraints: structural integrity, format correctness, content accuracy, and validity. Leveraging Gradient Regularized Policy Optimization (GRPO), we enable the model to internalize these constraints without the need for a separate critic network, reducing peak VRAM usage by 40% compared to PPO. We validate our approach on multiple tasks, including complex recipe generation and structured math reasoning (GSM8K-JSON). Experimental results demonstrate that our method achieves 89.7% structural accuracy and 92.1% JSON validity, significantly outperforming both zero-shot baselines (e.g., GPT-3.5) and SFT on larger models like LLaMA-3-8B. Furthermore, we provide a detailed analysis of training dynamics, revealing a distinct self-paced curriculum where the model sequentially acquires syntactic proficiency before semantic accuracy. Our model is publicly available at https://huggingface.co/Freakz3z/Qwen-JSON.
【25】FiCoTS: Fine-to-Coarse LLM-Enhanced Hierarchical Cross-Modality Interaction for Time Series Forecasting
标题:FiCoTS:用于时间序列预测的细到粗的LLM增强分层跨模式交互
链接:https://arxiv.org/abs/2512.00293
作者:Yafei Lyu,Hao Zhou,Lu Zhang,Xu Yang,Zhiyong Liu
摘要:Time series forecasting is central to data analysis and web technologies. The recent success of Large Language Models (LLMs) offers significant potential for this field, especially from the cross-modality aspect. Most methods adopt an LLM-as-Predictor paradigm, using LLM as the forecasting backbone and designing modality alignment mechanisms to enable LLM to understand time series data. However, the semantic information in the two modalities of time series and text differs significantly, making it challenging for LLM to fully understand time series data. To mitigate this challenge, our work follows an LLM-as-Enhancer paradigm to fully utilize the advantage of LLM in text understanding, where LLM is only used to encode text modality to complement time series modality. Based on this paradigm, we propose FiCoTS, an LLM-enhanced fine-to-coarse framework for multimodal time series forecasting. Specifically, the framework facilitates progressive cross-modality interaction by three levels in a fine-to-coarse scheme: First, in the token-level modality alignment module, a dynamic heterogeneous graph is constructed to filter noise and align time series patches with text tokens; Second, in the feature-level modality interaction module, a global cross-attention mechanism is introduced to enable each time series variable to connect with relevant textual contexts; Third, in the decision-level modality fusion module, we design a gated network to adaptively fuse the results of the two modalities for robust predictions. These three modules work synergistically to let the two modalities interact comprehensively across three semantic levels, enabling textual information to effectively support temporal prediction. Extensive experiments on seven real-world benchmarks demonstrate that our model achieves state-of-the-art performance. The codes will be released publicly.
【26】Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification
标题:衡量LLM认为他们做了什么:SHAP在财务表格分类上的忠诚度和可部署性
链接:https://arxiv.org/abs/2512.00163
作者:Saeed AlMarri,Mathieu Ravaut,Kristof Juhasz,Gautier Marti,Hamdan Al Ahbabi,Ibrahim Elfadel
备注:7 pages, 3 figures, 3 tables, AAAI 2026 Deployable AI Workshop
摘要:Large Language Models (LLMs) have attracted significant attention for classification tasks, offering a flexible alternative to trusted classical machine learning models like LightGBM through zero-shot prompting. However, their reliability for structured tabular data remains unclear, particularly in high stakes applications like financial risk assessment. Our study systematically evaluates LLMs and generates their SHAP values on financial classification tasks. Our analysis shows a divergence between LLMs self-explanation of feature impact and their SHAP values, as well as notable differences between LLMs and LightGBM SHAP values. These findings highlight the limitations of LLMs as standalone classifiers for structured financial modeling, but also instill optimism that improved explainability mechanisms coupled with few-shot prompting will make LLMs usable in risk-sensitive domains.
【27】Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions
标题:通过角色感知专家混合和基于法学硕士的细粒度职位描述增强人才搜索排名
链接:https://arxiv.org/abs/2512.00004
作者:Jihang Li,Bing Xu,Zulong Chen,Chuanfei Xu,Minping Chen,Suyu Liu,Ying Zhou,Zeyi Wen
摘要
:Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained level, and mitigate noise from subjective human judgments. We present a novel framework that enhances talent search effectiveness and delivers substantial business value through two key innovations: (i) leveraging LLMs to extract fine-grained recruitment signals from job descriptions and historical hiring data, and (ii) employing a role-aware multi-gate MoE network to capture behavioral differences across recruiter roles. To further reduce noise, we introduce a multi-task learning module that jointly optimizes click-through rate (CTR), conversion rate (CVR), and resume matching relevance. Experiments on real-world recruitment data and online A/B testing show relative AUC gains of 1.70% (CTR) and 5.97% (CVR), and a 17.29% lift in click-through conversion rate. These improvements reduce dependence on external sourcing channels, enabling an estimated annual cost saving of millions of CNY.
【28】Layer Probing Improves Kinase Functional Prediction with Protein Language Models
标题:层探测利用蛋白质语言模型改进了Kinase功能预测
链接:https://arxiv.org/abs/2512.00376
作者:Ajit Kumar,IndraPrakash Jha
备注:14 pages, 7 figures, 3 tables; includes code and dataset links
摘要:Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.
Graph相关(图学习|图神经网络|图优化等)(10篇)
【1】Elastic Weight Consolidation for Knowledge Graph Continual Learning: An Empirical Evaluation
标题:知识图持续学习的弹性权重合并:一个实证评估
链接:https://arxiv.org/abs/2512.01890
作者:Gaganpreet Jhajj,Fuhua Lin
备注:Accepted to NORA Workshop at NeurIPS 2025
摘要:Knowledge graphs (KGs) require continual updates as new information emerges, but neural embedding models suffer from catastrophic forgetting when learning new tasks sequentially. We evaluate Elastic Weight Consolidation (EWC), a regularization-based continual learning method, on KG link prediction using TransE embeddings on FB15k-237. Across multiple experiments with five random seeds, we find that EWC reduces catastrophic forgetting from 12.62% to 6.85%, a 45.7% reduction compared to naive sequential training. We observe that the task partitioning strategy affects the magnitude of forgetting: relation-based partitioning (grouping triples by relation type) exhibits 9.8 percentage points higher forgetting than randomly partitioned tasks (12.62% vs 2.81%), suggesting that task construction influences evaluation outcomes. While focused on a single embedding model and dataset, our results demonstrate that EWC effectively mitigates catastrophic forgetting in KG continual learning and highlight the importance of evaluation protocol design.
【2】Domain-Decomposed Graph Neural Network Surrogate Modeling for Ice Sheets
标题:冰盖的域分解图神经网络代理建模
链接:https://arxiv.org/abs/2512.01888
作者:Adrienne M. Propp,Mauro Perego,Eric C. Cyr,Anthony Gruber,Amanda A. Howard,Alexander Heinlein,Panos Stinis,Daniel M. Tartakovsky
摘要:Accurate yet efficient surrogate models are essential for large-scale simulations of partial differential equations (PDEs), particularly for uncertainty quantification (UQ) tasks that demand hundreds or thousands of evaluations. We develop a physics-inspired graph neural network (GNN) surrogate that operates directly on unstructured meshes and leverages the flexibility of graph attention. To improve both training efficiency and generalization properties of the model, we introduce a domain decomposition (DD) strategy that partitions the mesh into subdomains, trains local GNN surrogates in parallel, and aggregates their predictions. We then employ transfer learning to fine-tune models across subdomains, accelerating training and improving accuracy in data-limited settings. Applied to ice sheet simulations, our approach accurately predicts full-field velocities on high-resolution meshes, substantially reduces training time relative to training a single global surrogate model, and provides a ripe foundation for UQ objectives. Our results demonstrate that graph-based DD, combined with transfer learning, provides a scalable and reliable pathway for training GNN surrogates on massive PDE-governed systems, with broad potential for application beyond ice sheet dynamics.
【3】HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
标题:HalluShape:通过知识图对齐为合法RAG系统进行可审核的幻觉检测
链接:https://arxiv.org/abs/2512.01659
作者:Valentin Noël,Elimane Yassine Seidou,Charly Ken Capo-Chichi,Ghanem Amari
备注:8 pages, 4 figures, under review
摘要
:Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioners need verifiable guarantees that generated text faithfully represents source documents. Existing hallucination detectors rely on semantic similarity metrics that tolerate entity substitutions, a dangerous failure mode when confusing parties, dates, or legal provisions can have material consequences. We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into \textit{Entity Grounding} (EG), measuring whether entities in the response appear in source documents, and \textit{Relation Preservation} (RP), verifying that asserted relationships are supported by context. On structured control documents, HalluGraph achieves near-perfect discrimination ($>$400 words, $>$20 entities), HalluGraph achieves $AUC = 0.979$, while maintaining robust performance ($AUC \approx 0.89$) on challenging generative legal task, consistently outperforming semantic similarity baselines. The framework provides the transparency and traceability required for high-stakes legal applications, enabling full audit trails from generated assertions back to source passages.
【4】LGDC: Latent Graph Diffusion via Spectrum-Preserving Coarsening
标题:LGDC:通过保谱粗化的潜图扩散
链接:https://arxiv.org/abs/2512.01190
作者:Nagham Osman,Keyue Jiang,Davide Buffelli,Xiaowen Dong,Laura Toni
摘要:Graph generation is a critical task across scientific domains. Existing methods fall broadly into two categories: autoregressive models, which iteratively expand graphs, and one-shot models, such as diffusion, which generate the full graph at once. In this work, we provide an analysis of these two paradigms and reveal a key trade-off: autoregressive models stand out in capturing fine-grained local structures, such as degree and clustering properties, whereas one-shot models excel at modeling global patterns, such as spectral distributions. Building on this, we propose LGDC (latent graph diffusion via spectrum-preserving coarsening), a hybrid framework that combines strengths of both approaches. LGDC employs a spectrum-preserving coarsening-decoarsening to bidirectionally map between graphs and a latent space, where diffusion efficiently generates latent graphs before expansion restores detail. This design captures both local and global properties with improved efficiency. Empirically, LGDC matches autoregressive models on locally structured datasets (Tree) and diffusion models on globally structured ones (Planar, Community-20), validating the benefits of hybrid generation.
【5】Graph Data Augmentation with Contrastive Learning on Covariate Distribution Shift
标题:利用协变量分布转移的对比学习进行图数据增强
链接:https://arxiv.org/abs/2512.00716
作者:Fanlong Zeng,Wensheng Gan
备注:8 tables, 8 figures
摘要:Covariate distribution shift occurs when certain structural features present in the test set are absent from the training set. It is a common type of out-of-distribution (OOD) problem, frequently encountered in real-world graph data with complex structures. Existing research has revealed that most out-of-the-box graph neural networks (GNNs) fail to account for covariate shifts. Furthermore, we observe that existing methods aimed at addressing covariate shifts often fail to fully leverage the rich information contained within the latent space. Motivated by the potential of the latent space, we introduce a new method called MPAIACL for More Powerful Adversarial Invariant Augmentation using Contrastive Learning. MPAIACL leverages contrastive learning to unlock the full potential of vector representations by harnessing their intrinsic information. Through extensive experiments, MPAIACL demonstrates its robust generalization and effectiveness, as it performs well compared with other baselines across various public OOD datasets. The code is publicly available at https://github.com/flzeng1/MPAIACL.
【6】Generalized Graph Transformer Variational Autoencoder
标题:广义图Transformer变分自动编码器
链接:https://arxiv.org/abs/2512.00612
作者:Siddhant Karki
摘要:Graph link prediction has long been a central problem in graph representation learning in both network analysis and generative modeling. Recent progress in deep learning has introduced increasingly sophisticated architectures for capturing relational dependencies within graph-structured data. In this work, we propose the Generalized Graph Transformer Variational Autoencoder (GGT-VAE). Our model integrates Generalized Graph Transformer Architecture with Variational Autoencoder framework for link prediction. Unlike prior GraphVAE, GCN, or GNN approaches, GGT-VAE leverages transformer style global self-attention mechanism along with laplacian positional encoding to model structural patterns across nodes into a latent space without relying on message passing. Experimental results on several benchmark datasets demonstrate that GGT-VAE consistently achieves above-baseline performance in terms of ROC-AUC and Average Precision. To the best of our knowledge, this is among the first studies to explore graph structure generation using a generalized graph transformer backbone in a variational framework.
【7】A Graph Neural Network Approach for Localized and High-Resolution Temperature Forecasting
标题:局部高分辨率温度预测的图神经网络方法
链接:https://arxiv.org/abs/2512.00546
作者:Joud El-Shawa,Elham Bagheri,Sedef Akinli Kocak,Yalda Mohsenzadeh
备注:6 pages, 2 figures. Accepted to the NeurIPS 2025 Tackling Climate Change with Machine Learning Workshop
摘要:Heatwaves are intensifying worldwide and are among the deadliest weather disasters. The burden falls disproportionately on marginalized populations and the Global South, where under-resourced health systems, exposure to urban heat islands, and the lack of adaptive infrastructure amplify risks. Yet current numerical weather prediction models often fail to capture micro-scale extremes, leaving the most vulnerable excluded from timely early warnings. We present a Graph Neural Network framework for localized, high-resolution temperature forecasting. By leveraging spatial learning and efficient computation, our approach generates forecasts at multiple horizons, up to 48 hours. For Southwestern Ontario, Canada, the model captures temperature patterns with a mean MAE of 1.93$^{\circ}$C across 1-48h forecasts and MAE@48h of 2.93$^{\circ}$C, evaluated using 24h input windows on the largest region. While demonstrated here in a data-rich context, this work lays the foundation for transfer learning approaches that could enable localized, equitable forecasts in data-limited regions of the Global South.
【8】Adversarial Signed Graph Learning with Differential Privacy
标题:具有差异隐私的对抗签名图学习
链接:https://arxiv.org/abs/2512.00307
作者:Haobin Ke,Sen Zhang,Qingqing Ye,Xun Ran,Haibo Hu
摘要:Signed graphs with positive and negative edges can model complex relationships in social networks. Leveraging on balance theory that deduces edge signs from multi-hop node pairs, signed graph learning can generate node embeddings that preserve both structural and sign information. However, training on sensitive signed graphs raises significant privacy concerns, as model parameters may leak private link information. Existing protection methods with differential privacy (DP) typically rely on edge or gradient perturbation for unsigned graph protection. Yet, they are not well-suited for signed graphs, mainly because edge perturbation tends to cascading errors in edge sign inference under balance theory, while gradient perturbation increases sensitivity due to node interdependence and gradient polarity change caused by sign flips, resulting in larger noise injection. In this paper, motivated by the robustness of adversarial learning to noisy interactions, we present ASGL, a privacy-preserving adversarial signed graph learning method that preserves high utility while achieving node-level DP. We first decompose signed graphs into positive and negative subgraphs based on edge signs, and then design a gradient-perturbed adversarial module to approximate the true signed connectivity distribution. In particular, the gradient perturbation helps mitigate cascading errors, while the subgraph separation facilitates sensitivity reduction. Further, we devise a constrained breadth-first search tree strategy that fuses with balance theory to identify the edge signs between generated node pairs. This strategy also enables gradient decoupling, thereby effectively lowering gradient sensitivity. Extensive experiments on real-world datasets show that ASGL achieves favorable privacy-utility trade-offs across multiple downstream tasks.
【9】Statistical-computational gap in multiple Gaussian graph alignment
标题:多重高斯图对齐中的统计计算差距
链接:https://arxiv.org/abs/2512.00610
作者:Bertrand Even,Luca Ganassali
摘要:We investigate the existence of a statistical-computational gap in multiple Gaussian graph alignment. We first generalize a previously established informational threshold from Vassaux and Massoulié (2025) to regimes where the number of observed graphs $p$ may also grow with the number of nodes $n$: when $p \leq O(n/\log(n))$, we recover the results from Vassaux and Massoulié (2025), and $p \geq Ω(n/\log(n))$ corresponds to a regime where the problem is as difficult as aligning one single graph with some unknown "signal" graph. Moreover, when $\log p = ω(\log n)$, the informational thresholds for partial and exact recovery no longer coincide, in contrast to the all-or-nothing phenomenon observed when $\log p=O(\log n)$. Then, we provide the first computational barrier in the low-degree framework for (multiple) Gaussian graph alignment. We prove that when the correlation $ρ$ is less than $1$, up to logarithmic terms, low degree non-trivial estimation fails. Our results suggest that the task of aligning $p$ graphs in polynomial time is as hard as the problem of aligning two graphs in polynomial time, up to logarithmic factors. These results characterize the existence of a statistical-computational gap and provide another example in which polynomial-time algorithms cannot handle complex combinatorial bi-dimensional structures.
【10】GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding
标题:GCMCG:一个用于多范式、多任务和跨主题脑电解码的感知图形注意力和专家融合网络
链接:https://arxiv.org/abs/2512.00574
作者:Yiqiao Chen,Zijian Huang,Juchi He,Fazheng Xu,Zhenghui Feng
备注:46 pages, 11 figures
摘要:Brain-Computer Interfaces (BCIs) based on Motor Execution (ME) and Motor Imagery (MI) electroencephalogram (EEG) signals offer a direct pathway for human-machine interaction. However, developing robust decoding models remains challenging due to the complex spatio-temporal dynamics of EEG, its low signal-to-noise ratio, and the limited generalizability of many existing approaches across subjects and paradigms. To address these issues, this paper proposes Graph-guided Clustering Mixture-of-Experts CNN-GRU (GCMCG), a novel unified framework for MI-ME EEG decoding. Our approach integrates a robust preprocessing stage using Independent Component Analysis and Wavelet Transform (ICA-WT) for effective denoising. We further introduce a pre-trainable graph tokenization module that dynamically models electrode relationships via a Graph Attention Network (GAT), followed by unsupervised spectral clustering to decompose signals into interpretable functional brain regions. Each region is processed by a dedicated CNN-GRU expert network, and a gated fusion mechanism with L1 regularization adaptively combines these local features with a global expert. This Mixture-of-Experts (MoE) design enables deep spatio-temporal fusion and enhances representational capacity. A three-stage training strategy incorporating focal loss and progressive sampling is employed to improve cross-subject generalization and handle class imbalance. Evaluated on three public datasets of varying complexity (EEGmmidb-BCI2000, BCI-IV 2a, and M3CV), GCMCG achieves overall accuracies of 86.60%, 98.57%, and 99.61%, respectively, which demonstrates its superior effectiveness and strong generalization capability for practical BCI applications.
Transformer(9篇)
【1】The Mean-Field Dynamics of Transformers
标题:Transformer的平均场动力学
链接:https://arxiv.org/abs/2512.01868
作者:Philippe Rigollet
备注:to appear as Proceedings of the ICM2026, Philadelphia, USA
摘要:We develop a mathematical framework that interprets Transformer attention as an interacting particle system and studies its continuum (mean-field) limits. By idealizing attention continuous on the sphere, we connect Transformer dynamics to Wasserstein gradient flows, synchronization models (Kuramoto), and mean-shift clustering. Central to our results is a global clustering phenomenon whereby tokens cluster asymptotically after long metastable states where they are arranged into multiple clusters. We further analyze a tractable equiangular reduction to obtain exact clustering rates, show how commonly used normalization schemes alter contraction speeds, and identify a phase transition for long-context attention. The results highlight both the mechanisms that drive representation collapse and the regimes that preserve expressive, multi-cluster structure in deep attention architectures.
【2】Teaching by Failure: Counter-Example-Driven Curricula for Transformer Self-Improvement
标题:失败教学:反例驱动的Transformer自我提升课程
链接:https://arxiv.org/abs/2512.01187
作者:Harshil Vejendla
备注:AACL 2025 Findings
摘要:Transformer models often exhibit brittle extrapolation, failing on inputs that are longer or structurally more complex than those seen during training. We introduce Counter-Example-Driven Curricula (CEDC), an automated framework that improves model robustness by iteratively focusing on its own failures. At each step, CEDC uses the current model to generate a diverse set of candidate problems, employs a fast, executable verifier to identify incorrect predictions (counter-examples), and then fine-tunes the model on a dataset enriched with these discovered failures. We evaluate CEDC on a suite of algorithmic and natural language tasks, including integer addition, sorting, Dyck-2 language recognition, and three text classification benchmarks. Compared to static training and standard curriculum learning baselines, CEDC achieves up to 30x greater length extrapolation, is 3.75x more computationally efficient than uniform data augmentation, and requires no manual difficulty heuristics. We provide a detailed analysis of the counter-examples, showing how the curriculum naturally adapts to target progressively more complex error modes. Our findings establish verifier-guided, failure-driven learning as a simple, powerful, and efficient paradigm for enhancing the generalization capabilities of Transformer models.
【3】Parameter Reduction Improves Vision Transformers: A Comparative Study of Sharing and Width Reduction
标题:参数缩减改善视觉Transformer:共享和宽度缩减的比较研究
链接:https://arxiv.org/abs/2512.01059
作者:Anantha Padmanaban Krishna Kumar
备注:7 pages total (6 pages main text, 1 page references), 1 figures, 2 tables. Code available at https://github.com/AnanthaPadmanaban-KrishnaKumar/parameter-efficient-vit-mlps
摘要:Although scaling laws and many empirical results suggest that increasing the size of Vision Transformers often improves performance, model accuracy and training behavior are not always monotonically increasing with scale. Focusing on ViT-B/16 trained on ImageNet-1K, we study two simple parameter-reduction strategies applied to the MLP blocks, each removing 32.7\% of the baseline parameters. Our \emph{GroupedMLP} variant shares MLP weights between adjacent transformer blocks and achieves 81.47\% top-1 accuracy while maintaining the baseline computational cost. Our \emph{ShallowMLP} variant halves the MLP hidden dimension and reaches 81.25\% top-1 accuracy with a 38\% increase in inference throughput. Both models outperform the 86.6M-parameter baseline (81.05\%) and exhibit substantially improved training stability, reducing peak-to-final accuracy degradation from 0.47\% to the range 0.03\% to 0.06\%. These results suggest that, for ViT-B/16 on ImageNet-1K with a standard training recipe, the model operates in an overparameterized regime in which MLP capacity can be reduced without harming performance and can even slightly improve it. More broadly, our findings suggest that architectural constraints such as parameter sharing and reduced width may act as useful inductive biases, and highlight the importance of how parameters are allocated when designing Vision Transformers. All code is available at: https://github.com/AnanthaPadmanaban-KrishnaKumar/parameter-efficient-vit-mlps.
【4】Robust Probabilistic Load Forecasting for a Single Household: A Comparative Study from SARIMA to Transformers on the REFIT Dataset
标题:单个家庭的稳健概率负荷预测:RECIT数据集中从SARIMA到Transformers的比较研究
链接:https://arxiv.org/abs/2512.00856
作者:Midhun Manoj
备注:12 pages, 8 figures, 1 table. This work includes a rigorous comparative study of imputation methods and presents results submitted to PAKDD 2026. Source code and analysis notebooks are available on GitHub: [https://github.com/middhun-31/Robust-Probabilistic-Load-Forecasting-for-a-Single-Household]
摘要:Probabilistic forecasting is essential for modern risk management, allowing decision-makers to quantify uncertainty in critical systems. This paper tackles this challenge using the volatile REFIT household dataset, which is complicated by a large structural data gap. We first address this by conducting a rigorous comparative experiment to select a Seasonal Imputation method, demonstrating its superiority over linear interpolation in preserving the data's underlying distribution. We then systematically evaluate a hierarchy of models, progressing from classical baselines (SARIMA, Prophet) to machine learning (XGBoost) and advanced deep learning architectures (LSTM). Our findings reveal that classical models fail to capture the data's non-linear, regime-switching behavior. While the LSTM provided the most well-calibrated probabilistic forecast, the Temporal Fusion Transformer (TFT) emerged as the superior all-round model, achieving the best point forecast accuracy (RMSE 481.94) and producing safer, more cautious prediction intervals that effectively capture extreme volatility.
【5】Estimating the Effective Rank of Vision Transformers via Low-Rank Factorization
标题:通过低等级分解估计视觉变形者的有效等级
链接:https://arxiv.org/abs/2512.00792
作者:Liyu Zerihun
摘要
:Deep networks are heavily over-parameterized, yet their learned representations often admit low-rank structure. We introduce a framework for estimating a model's intrinsic dimensionality by treating learned representations as projections onto a low-rank subspace of the model's full capacity. Our approach: train a full-rank teacher, factorize its weights at multiple ranks, and train each factorized student via distillation to measure performance as a function of rank. We define effective rank as a region, not a point: the smallest contiguous set of ranks for which the student reaches 85-95% of teacher accuracy. To stabilize estimates, we fit accuracy vs. rank with a monotone PCHIP interpolant and identify crossings of the normalized curve. We also define the effective knee as the rank maximizing perpendicular distance between the smoothed accuracy curve and its endpoint secant; an intrinsic indicator of where marginal gains concentrate. On ViT-B/32 fine-tuned on CIFAR-100 (one seed, due to compute constraints), factorizing linear blocks and training with distillation yields an effective-rank region of approximately [16, 34] and an effective knee at r* ~ 31. At rank 32, the student attains 69.46% top-1 accuracy vs. 73.35% for the teacher (~94.7% of baseline) while achieving substantial parameter compression. We provide a framework to estimate effective-rank regions and knees across architectures and datasets, offering a practical tool for characterizing the intrinsic dimensionality of deep models.
【6】Constructing Efficient Fact-Storing MLPs for Transformers
标题:为Transformer构建高效的事实存储MLP
链接:https://arxiv.org/abs/2512.00207
作者:Owen Dugan,Roberto Garcia,Ronny Junkins,Jerry Liu,Dylan Zinsley,Sabri Eyuboglu,Atri Rudra,Chris Ré
摘要:The success of large language models (LLMs) can be attributed in part to their ability to efficiently store factual knowledge as key-value mappings within their MLP parameters. Recent work has proposed explicit weight constructions to build such fact-storing MLPs, providing an improved understanding of LLM fact storage mechanisms. In this paper, we introduce an MLP construction framework that improves over previous constructions in three areas: it 1) works for all but a measure-zero set of feasible input-output pairs, 2) achieves asymptotically optimal parameter efficiency matching information-theoretic bounds for some embeddings, and 3) maintains usability within Transformers for factual recall. Through our improvements, we 1) discover a metric on value embeddings that characterizes facts-per-parameter scaling for both constructed and gradient-descent-trained MLPs, 2) identify a simple encoder-decoder mechanism that empirically matches gradient-descent MLP facts-per-parameter asymptotics across all the inputs and outputs we test, and 3) uncover a fundamental tradeoff between an MLP's fact-storage capacity and its usability within Transformers. Finally, we demonstrate a proof-of-concept application of fact-storing MLPs: modular fact editing on one-layer Transformers by \textit{replacing entire MLPs at once}.
【7】Comparative Analysis of Vision Transformer, Convolutional, and Hybrid Architectures for Mental Health Classification Using Actigraphy-Derived Images
标题:使用Actigraph衍生图像进行心理健康分类的视觉Transformer、卷积和混合架构的比较分析
链接:https://arxiv.org/abs/2512.00103
作者:Ifeanyi Okala
摘要:This work examines how three different image-based methods, VGG16, ViT-B/16, and CoAtNet-Tiny, perform in identifying depression, schizophrenia, and healthy controls using daily actigraphy records. Wrist-worn activity signals from the Psykose and Depresjon datasets were converted into 30 by 48 images and evaluated through a three-fold subject-wise split. Although all methods fitted the training data well, their behaviour on unseen data differed. VGG16 improved steadily but often settled at lower accuracy. ViT-B/16 reached strong results in some runs, but its performance shifted noticeably from fold to fold. CoAtNet-Tiny stood out as the most reliable, recording the highest average accuracy and the most stable curves across folds. It also produced the strongest precision, recall, and F1-scores, particularly for the underrepresented depression and schizophrenia classes. Overall, the findings indicate that CoAtNet-Tiny performed most consistently on the actigraphy images, while VGG16 and ViT-B/16 yielded mixed results. These observations suggest that certain hybrid designs may be especially suited for mental-health work that relies on actigraphy-derived images.
【8】Efficient Turing Machine Simulation with Transformers
标题:使用Transformer的高效图灵机模拟
链接:https://arxiv.org/abs/2512.00003
作者:Qian Li,Yuyi Wang
备注:17 pages
摘要:Constant bit-size Transformers are known to be Turing complete, but existing constructions require $Ω(s(n))$ chain-of-thought (CoT) steps per simulated Turing machine (TM) step, leading to impractical reasoning lengths. In this paper, we significantly reduce this efficiency gap by proving that any $(t(n),s(n))$-bounded multi-tape TM can be simulated by a constant bit-size Transformer with an optimal $O(s(n))$-long context window and only $O(s(n)^c)$ CoT steps per TM step, where $c>0$ can be made arbitrarily small by letting the Transformers' head-layer product sufficiently large. In addition, our construction shows that sparse attention with fixed geometric offsets suffices for efficient universal computation. Our proof leverages multi-queue TMs as a bridge. The main technical novelty is a more efficient simulation of multi-tape TMs by synchronous multi-queue TMs, improving both time and space complexity under stricter model assumptions.
【9】Neural Networks for Predicting Permeability Tensors of 2D Porous Media: Comparison of Convolution- and Transformer-based Architectures
标题:用于预测2D多孔媒体渗透率张量的神经网络:基于卷积和基于变换器的架构的比较
链接:https://arxiv.org/abs/2512.01517
作者:Sigurd Vargdal,Paula Reis,Henrik Andersen Sveinsson,Gaute Linga
摘要
:Permeability is a central concept in the macroscopic description of flow through porous media, with applications spanning from oil recovery to hydrology. Traditional methods for determining the permeability tensor involving flow simulations or experiments can be time consuming and resource-intensive, while analytical methods, e.g., based on the Kozeny-Carman equation, may be too simplistic for accurate prediction based on pore-scale features. In this work, we explore deep learning as a more efficient alternative for predicting the permeability tensor based on two-dimensional binary images of porous media, segmented into solid ($1$) and void ($0$) regions. We generate a dataset of 24,000 synthetic random periodic porous media samples with specified porosity and characteristic length scale. Using Lattice-Boltzmann simulations, we compute the permeability tensor for flow through these samples with values spanning three orders of magnitude. We evaluate three families of image-based deep learning models: ResNet (ResNet-$50$ and ResNet-$101$), Vision Transformers (ViT-T$16$ and ViT-S$16$) and ConvNeXt (Tiny and Small). To improve model generalisation, we employ techniques such as weight decay, learning rate scheduling, and data augmentation. The effect of data augmentation and dataset size on model performance is studied, and we find that they generally increase the accuracy of permeability predictions. We also show that ConvNeXt and ResNet converge faster than ViT and degrade in performance if trained for too long. ConvNeXt-Small achieved the highest $R^2$ score of $0.99460$ on $4,000$ unseen test samples. These findings underscore the potential to use image-based neural networks to predict permeability tensors accurately.
GAN|对抗|攻击|生成相关(17篇)
【1】Mofasa: A Step Change in Metal-Organic Framework Generation
标题:Mofasa:金属有机框架生成的一个重大变化
链接:https://arxiv.org/abs/2512.01756
作者:Vaidotas Simkus,Anders Christensen,Steven Bennett,Ian Johnson,Mark Neumann,James Gin,Jonathan Godwin,Benjamin Rhodes
摘要:Mofasa is an all-atom latent diffusion model with state-of-the-art performance for generating Metal-Organic Frameworks (MOFs). These are highly porous crystalline materials used to harvest water from desert air, capture carbon dioxide, store toxic gases and catalyse chemical reactions. In recognition of their value, the development of MOFs recently received a Nobel Prize in Chemistry. In many ways, MOFs are well-suited for exploiting generative models in chemistry: they are rationally-designable materials with a large combinatorial design space and strong structure-property couplings. And yet, to date, a high performance generative model has been lacking. To fill this gap, we introduce Mofasa, a general-purpose latent diffusion model that jointly samples positions, atom-types and lattice vectors for systems as large as 500 atoms. Mofasa avoids handcrafted assembly algorithms common in the literature, unlocking the simultaneous discovery of metal nodes, linkers and topologies. To help the scientific community build on our work, we release MofasaDB, an annotated library of hundreds of thousands of sampled MOF structures, along with a user-friendly web interface for search and discovery: https://mofux.ai/ .
【2】ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
标题:ZIP-RC:自适应和可解释生成的回报和成本的零开销推理时预测
链接:https://arxiv.org/abs/2512.01457
作者:Rohin Manvi,Joey Hong,Tim Seyde,Maxime Labonne,Mathias Lechner,Sergey Levine
备注:Code coming soon
摘要:Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, when to make multiple attempts, when to stop, and when to signal success or failure. Without this, LLMs struggle to make intelligent meta-cognition decisions. Test-time scaling methods like Best-of-N drive up cost and latency by using a fixed budget of samples regardless of the marginal benefit of each one at any point in generation, and the absence of confidence signals can mislead people, prevent appropriate escalation to better tools, and undermine trustworthiness. Learned verifiers or reward models can provide confidence estimates, but do not enable adaptive inference and add substantial cost by requiring extra models or forward passes. We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. At every token, ZIP-RC reuses reserved or unused logits in the same forward pass as next-token prediction to output a joint distribution over final reward and remaining length -- no extra models, architecture change, or inference overhead. This full joint distribution is used to compute a sampling utility which is the linear combination of the expected maximum reward, total compute, and latency of set of samples if generated to completion. During inference, we maximize this utility with meta-actions that determine which prefix of tokens to continue or initiate sampling from. On mixed-difficulty mathematical benchmarks, ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost, and traces smooth Pareto frontiers between quality, compute, and latency. By providing real-time reward-cost introspection, ZIP-RC enables adaptive, efficient reasoning.
【3】SocialDriveGen: Generating Diverse Traffic Scenarios with Controllable Social Interactions
标题:SocialDriveGen:生成具有可控社交互动的多样化交通场景
链接:https://arxiv.org/abs/2512.01363
作者:Jiaguo Tian,Zhengbang Zhu,Shenyu Zhang,Li Xu,Bo Zheng,Xu Liu,Weiji Peng,Shizeng Yao,Weinan Zhang
摘要:The generation of realistic and diverse traffic scenarios in simulation is essential for developing and evaluating autonomous driving systems. However, most simulation frameworks rely on rule-based or simplified models for scene generation, which lack the fidelity and diversity needed to represent real-world driving. While recent advances in generative modeling produce more realistic and context-aware traffic interactions, they often overlook how social preferences influence driving behavior. SocialDriveGen addresses this gap through a hierarchical framework that integrates semantic reasoning and social preference modeling with generative trajectory synthesis. By modeling egoism and altruism as complementary social dimensions, our framework enables controllable diversity in driver personalities and interaction styles. Experiments on the Argoverse 2 dataset show that SocialDriveGen generates diverse, high-fidelity traffic scenarios spanning cooperative to adversarial behaviors, significantly enhancing policy robustness and generalization to rare or high-risk situations.
【4】On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
标题:论政策优化中的最优性与对抗稳健性之间的张力
链接:https://arxiv.org/abs/2512.01228
作者:Haoran Li,Jiayu Lv,Congying Han,Zicheng Zhang,Anqi Li,Yan Liu,Tiande Guo,Nan Jiang
摘要
:Achieving optimality and adversarial robustness in deep reinforcement learning has long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest a potential alignment, raising the important question of how to realize this in practice. This paper first identifies a key gap between theory and practice by comparing standard policy optimization (SPO) and adversarially robust policy optimization (ARPO). Although they share theoretical consistency, a fundamental tension between robustness and optimality arises in practical policy gradient methods. SPO tends toward convergence to vulnerable first-order stationary policies (FOSPs) with strong natural performance, whereas ARPO typically favors more robust FOSPs at the expense of reduced returns. Furthermore, we attribute this tradeoff to the reshaping effect of the strongest adversary in ARPO, which significantly complicates the global landscape by inducing deceptive sticky FOSPs. This improves robustness but makes navigation more challenging. To alleviate this, we develop the BARPO, a bilevel framework unifying SPO and ARPO by modulating adversary strength, thereby facilitating navigability while preserving global optima. Extensive empirical results demonstrate that BARPO consistently outperforms vanilla ARPO, providing a practical approach to reconcile theoretical and empirical performance.
【5】DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling
标题:DPAC:扩散抽样的分布保持对抗控制
链接:https://arxiv.org/abs/2512.01153
作者:Han-Jin Lee,Han-Ju Lee,Jin-Seong Kim,Seok-Hwan Choi
摘要:Adversarially guided diffusion sampling often achieves the target class, but sample quality degrades as deviations between the adversarially controlled and nominal trajectories accumulate. We formalize this degradation as a path-space Kullback-Leibler divergence(path-KL) between controlled and nominal (uncontrolled) diffusion processes, thereby showing via Girsanov's theorem that it exactly equals the control energy. Building on this stochastic optimal control (SOC) view, we theoretically establish that minimizing this path-KL simultaneously tightens upper bounds on both the 2-Wasserstein distance and Fréchet Inception Distance (FID), revealing a principled connection between adversarial control energy and perceptual fidelity. From a variational perspective, we derive a first-order optimality condition for the control: among all directions that yield the same classification gain, the component tangent to iso-(log-)density surfaces (i.e., orthogonal to the score) minimizes path-KL, whereas the normal component directly increases distributional drift. This leads to DPAC (Distribution-Preserving Adversarial Control), a diffusion guidance rule that projects adversarial gradients onto the tangent space defined by the generative score geometry. We further show that in discrete solvers, the tangent projection cancels the O(Δt) leading error term in the Wasserstein distance, achieving an O(Δt^2) quality gap; moreover, it remains second-order robust to score or metric approximation. Empirical studies on ImageNet-100 validate the theoretical predictions, confirming that DPAC achieves lower FID and estimated path-KL at matched attack success rates.
【6】MM-ACT: Learn from Multimodal Parallel Generation to Act
标题:MM-ACT:从多模式并行生成学习到行动
链接:https://arxiv.org/abs/2512.00975
作者:Haotian Liang,Xinyi Chen,Bin Wang,Mingkang Chen,Yitian Liu,Yuhao Zhang,Zanxin Chen,Tianshuo Yang,Yilun Chen,Jiangmiao Pang,Dong Liu,Xiaokang Yang,Yao Mu,Wenqi Shao,Ping Luo
备注:17 pages
摘要:A generalist robotic policy needs both semantic understanding for task planning and the ability to interact with the environment through predictive capabilities. To tackle this, we present MM-ACT, a unified Vision-Language-Action (VLA) model that integrates text, image, and action in shared token space and performs generation across all three modalities. MM-ACT adopts a re-mask parallel decoding strategy for text and image generation, and employs a one-step parallel decoding strategy for action generation to improve efficiency. We introduce Context-Shared Multimodal Learning, a unified training paradigm that supervises generation in all three modalities from a shared context, enhancing action generation through cross-modal learning. Experiments were conducted on the LIBERO simulation and Franka real-robot setups as well as RoboTwin2.0 to assess in-domain and out-of-domain performances respectively. Our approach achieves a success rate of 96.3% on LIBERO, 72.0% across three tasks of real Franka, and 52.38% across eight bimanual tasks of RoboTwin2.0 with an additional gain of 9.25% from cross-modal learning. We release our codes, models and data at https://github.com/HHYHRHY/MM-ACT.
【7】Deep Learning for Modeling and Dispatching Hybrid Wind Farm Power Generation
标题:用于建模和调度混合风电场发电的深度学习
链接:https://arxiv.org/abs/2512.00728
作者:Zach Lawrence,Jessica Yao,Chris Qin
备注:10 pages, 8 figures, to be published in 2025 IEEE International Conference on Data Mining Workshops (ICDMW)
摘要:Wind farms with integrated energy storage, or hybrid wind farms, are able to store energy and dispatch it to the grid following an operational strategy. For individual wind farms with integrated energy storage capacity, data-driven dispatch strategies using localized grid demand and market conditions as input parameters stand to maximize wind energy value. Synthetic power generation data modeled on atmospheric conditions provide another avenue for improving the robustness of data-driven dispatch strategies. To these ends, the present work develops two deep learning frameworks: COVE-NN, an LSTM-based dispatch strategy tailored to individual wind farms, which reduced annual COVE by 32.3% over 43 years of simulated operations in a case study at the Pyron site; and a power generation modeling framework that reduced RMSE by 9.5% and improved power curve similarity by 18.9% when validated on the Palouse wind farm. Together, these models pave the way for more robust, data-driven dispatch strategies and potential extensions to other renewable energy systems.
【8】Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation
标题:混合型表格数据生成的隐私保护扩散模型
链接:https://arxiv.org/abs/2512.00638
作者:Timur Sattarov,Marco Schreyer,Damian Borth
备注:15 pages, 8 figures, 4 tables
摘要
:We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Together, these enhancements improve fidelity and downstream utility without weakening privacy guarantees. On financial and medical datasets, DP-FinDiff achieves 16-42% higher utility than DP baselines at comparable privacy levels, demonstrating its promise for safe and effective data sharing in sensitive domains.
【9】Pre-Generating Multi-Difficulty PDE Data for Few-Shot Neural PDE Solvers
标题:为Few-Shot神经DTE解算器预生成多困难的DTE数据
链接:https://arxiv.org/abs/2512.00564
作者:Naman Choudhary,Vedant Singh,Ameet Talwalkar,Nicholas Matthew Boffi,Mikhail Khodak,Tanya Marwah
备注:10 Pages, 11 Figures
摘要:A key aspect of learned partial differential equation (PDE) solvers is that the main cost often comes from generating training data with classical solvers rather than learning the model itself. Another is that there are clear axes of difficulty--e.g., more complex geometries and higher Reynolds numbers--along which problems become (1) harder for classical solvers and thus (2) more likely to benefit from neural speedups. Towards addressing this chicken-and-egg challenge, we study difficulty transfer on 2D incompressible Navier-Stokes, systematically varying task complexity along geometry (number and placement of obstacles), physics (Reynolds number), and their combination. Similar to how it is possible to spend compute to pre-train foundation models and improve their performance on downstream tasks, we find that by classically solving (analogously pre-generating) many low and medium difficulty examples and including them in the training set, it is possible to learn high-difficulty physics from far fewer samples. Furthermore, we show that by combining low and high difficulty data, we can spend 8.9x less compute on pre-generating a dataset to achieve the same error as using only high difficulty examples. Our results highlight that how we allocate classical-solver compute across difficulty levels is as important as how much we allocate overall, and suggest substantial gains from principled curation of pre-generated PDE data for neural solvers. Our code is available at https://github.com/Naman-Choudhary-AI-ML/pregenerating-pde
【10】A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research
标题:用于大规模热力建筑数据生成的高度可配置框架以推动机器学习研究
链接:https://arxiv.org/abs/2512.00483
作者:Thomas Krug,Fabian Raisch,Dominik Aimer,Markus Wirnsberger,Ferdinand Sigg,Felix Koch,Benjamin Schäfer,Benjamin Tischler
备注:Under Review
摘要:Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.
【11】RECTor: Robust and Efficient Correlation Attack on Tor
标题:RECTor:对Tor的稳健有效相关攻击
链接:https://arxiv.org/abs/2512.00436
作者:Binghui Wu,Dinil Mon Divakaran,Levente Csikor,Mohan Gurusamy
备注:8 pages, 4 figures, 2 tables
摘要:Tor is a widely used anonymity network that conceals user identities by routing traffic through encrypted relays, yet it remains vulnerable to traffic correlation attacks that deanonymize users by matching patterns in ingress and egress traffic. However, existing correlation methods suffer from two major limitations: limited robustness to noise and partial observations, and poor scalability due to computationally expensive pairwise matching. To address these challenges, we propose RECTor, a machine learning-based framework for traffic correlation under realistic conditions. RECTor employs attention-based Multiple Instance Learning (MIL) and GRU-based temporal encoding to extract robust flow representations, even when traffic data is incomplete or obfuscated. These embeddings are mapped into a shared space via a Siamese network and efficiently matched using approximate nearest neighbor (aNN) search. Empirical evaluations show that RECTor outperforms state-of-the-art baselines such as DeepCorr, DeepCOFFEA, and FlowTracker, achieving up to 60% higher true positive rates under high-noise conditions and reducing training and inference time by over 50%. Moreover, RECTor demonstrates strong scalability: inference cost grows near-linearly as the number of flows increases. These findings reveal critical vulnerabilities in Tor's anonymity model and highlight the need for advanced model-aware defenses.
【12】GreenPlanner: Practical Floorplan Layout Generation via an Energy-Aware and Function-Feasible Generative Framework
标题:GreenPlanner:通过能源感知和功能可行的生成框架生成实用的平面布局
链接:https://arxiv.org/abs/2512.00406
作者:Pengyu Zeng,Yuqin Dai,Jun Yin,Jing Zhong,Ziyang Han,Chaoyang Shi,ZhanXiang Jin,Maowei Jiang,Yuxing Han,Shuai Lu
备注:11 pages, 6 figures
摘要
:Building design directly affects human well-being and carbon emissions, yet generating spatial-functional and energy-compliant floorplans remains manual, costly, and non-scalable. Existing methods produce visually plausible layouts but frequently violate key constraints, yielding invalid results due to the absence of automated evaluation. We present GreenPlanner, an energy- and functionality-aware generative framework that unifies design evaluation and generation. It consists of a labeled Design Feasibility Dataset for learning constraint priors; a fast Practical Design Evaluator (PDE) for predicting energy performance and spatial-functional validity; a Green Plan Dataset (GreenPD) derived from PDE-guided filtering to pair user requirements with regulation-compliant layouts; and a GreenFlow generator trained on GreenPD with PDE feedback for controllable, regulation-aware generation. Experiments show that GreenPlanner accelerates evaluation by over $10^{5}\times$ with $>$99% accuracy, eliminates invalid samples, and boosts design efficiency by 87% over professional architects.
【13】SD-CGAN: Conditional Sinkhorn Divergence GAN for DDoS Anomaly Detection in IoT Networks
标题:SD-CGAN:用于物联网网络中DDOS异常检测的条件Sinkhorn分歧GAN
链接:https://arxiv.org/abs/2512.00251
作者:Henry Onyeka,Emmanuel Samson,Liang Hong,Tariqul Islam,Imtiaz Ahmed,Kamrul Hasan
备注:7 pages, 6 figures, camera-ready version accepted for presentation at IEEE ICNC 2026
摘要:The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated Denial-of-Service (DoS) attacks and zero-day exploits under highly dynamic and imbalanced traffic conditions. This paper proposes SD-CGAN, a Conditional Generative Adversarial Network framework enhanced with Sinkhorn Divergence, tailored for robust anomaly detection in IoT edge environments. The framework incorporates CTGAN-based synthetic data augmentation to address class imbalance and leverages Sinkhorn Divergence as a geometry-aware loss function to improve training stability and reduce mode collapse. The model is evaluated on exploitative attack subsets from the CICDDoS2019 dataset and compared against baseline deep learning and GAN-based approaches. Results show that SD-CGAN achieves superior detection accuracy, precision, recall, and F1-score while maintaining computational efficiency suitable for deployment in edge-enabled IoT environments.
【14】Hybrid Synthetic Data Generation with Domain Randomization Enables Zero-Shot Vision-Based Part Inspection Under Extreme Class Imbalance
标题:具有域随机化的混合合成数据生成实现了极端类别不平衡下的零次视觉零件检测
链接:https://arxiv.org/abs/2512.00125
作者:Ruo-Syuan Mei,Sixian Jia,Guangze Li,Soo Yeon Lee,Brian Musser,William Keller,Sreten Zakula,Jorge Arinez,Chenhui Shao
备注:Submitted to the NAMRC 54
摘要:Machine learning, particularly deep learning, is transforming industrial quality inspection. Yet, training robust machine learning models typically requires large volumes of high-quality labeled data, which are expensive, time-consuming, and labor-intensive to obtain in manufacturing. Moreover, defective samples are intrinsically rare, leading to severe class imbalance that degrades model performance. These data constraints hinder the widespread adoption of machine learning-based quality inspection methods in real production environments. Synthetic data generation (SDG) offers a promising solution by enabling the creation of large, balanced, and fully annotated datasets in an efficient, cost-effective, and scalable manner. This paper presents a hybrid SDG framework that integrates simulation-based rendering, domain randomization, and real background compositing to enable zero-shot learning for computer vision-based industrial part inspection without manual annotation. The SDG pipeline generates 12,960 labeled images in one hour by varying part geometry, lighting, and surface properties, and then compositing synthetic parts onto real image backgrounds. A two-stage architecture utilizing a YOLOv8n backbone for object detection and MobileNetV3-small for quality classification is trained exclusively on synthetic data and evaluated on 300 real industrial parts. The proposed approach achieves an mAP@0.5 of 0.995 for detection, 96% classification accuracy, and 90.1% balanced accuracy. Comparative evaluation against few-shot real-data baseline approaches demonstrates significant improvement. The proposed SDG-based approach achieves 90-91% balanced accuracy under severe class imbalance, while the baselines reach only 50% accuracy. These results demonstrate that the proposed method enables annotation-free, scalable, and robust quality inspection for real-world manufacturing applications.
【15】Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
标题:Art 2 Music:通过多模式感觉对齐为艺术图像生成音乐
链接:https://arxiv.org/abs/2512.00120
作者:Jiaying Hong,Ting Zhu,Thanet Markchom,Huizhi Liang
摘要:With the rise of AI-generated content (AIGC), generating perceptually natural and feeling-aligned music from multimodal inputs has become a central challenge. Existing approaches often rely on explicit emotion labels that require costly annotation, underscoring the need for more flexible feeling-aligned methods. To support multimodal music generation, we construct ArtiCaps, a pseudo feeling-aligned image-music-text dataset created by semantically matching descriptions from ArtEmis and MusicCaps. We further propose Art2Music, a lightweight cross-modal framework that synthesizes music from artistic images and user comments. In the first stage, images and text are encoded with OpenCLIP and fused using a gated residual module; the fused representation is decoded by a bidirectional LSTM into Mel-spectrograms with a frequency-weighted L1 loss to enhance high-frequency fidelity. In the second stage, a fine-tuned HiFi-GAN vocoder reconstructs high-quality audio waveforms. Experiments on ArtiCaps show clear improvements in Mel-Cepstral Distortion, Frechet Audio Distance, Log-Spectral Distance, and cosine similarity. A small LLM-based rating study further verifies consistent cross-modal feeling alignment and offers interpretable explanations of matches and mismatches across modalities. These results demonstrate improved perceptual naturalness, spectral fidelity, and semantic consistency. Art2Music also maintains robust performance with only 50k training samples, providing a scalable solution for feeling-aligned creative audio generation in interactive art, personalized soundscapes, and digital art exhibitions.
【16】Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions
标题:对抗性监视任务的基于Bayesian模糊性收缩的自适应鲁棒马尔科夫决策过程
链接:https://arxiv.org/abs/2512.01660
作者:Jimin Choi,Max Z. Li
摘要:Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act strategically to deceive or evade detection. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. Robust Markov Decision Processes (RMDPs) provide worst-case guarantees but are limited by static ambiguity sets that capture initial uncertainty without adapting to new observations. This paper presents an adaptive RMDP framework tailored to ISR missions with CCAs. We introduce a mission-specific formulation in which aircraft alternate between movement and sensing states. Adversarial tactics are modeled as a finite set of transition kernels, each capturing assumptions about how adversarial sensing or environmental conditions affect rewards. Our approach incrementally refines policies by eliminating inconsistent threat models, allowing agents to shift from conservative to aggressive behaviors while maintaining robustness. We provide theoretical guarantees showing that the adaptive planner converges as credible sets contract to the true threat and maintains safety under uncertainty. Experiments under Gaussian and non-Gaussian threat models across diverse network topologies show higher mission rewards and fewer exposure events compared to nominal and static robust planners.
【17】Comparative Evaluation of Generative AI Models for Chest Radiograph Report Generation in the Emergency Department
标题:急诊科胸透片报告生成的生成AI模型的比较评估
链接:https://arxiv.org/abs/2512.00271
作者:Woo Hyeon Lim,Ji Young Lee,Jong Hyuk Lee,Saehoon Kim,Hyungjin Kim
摘要:Purpose: To benchmark open-source or commercial medical image-specific VLMs against real-world radiologist-written reports. Methods: This retrospective study included adult patients who presented to the emergency department between January 2022 and April 2025 and underwent same-day CXR and CT for febrile or respiratory symptoms. Reports from five VLMs (AIRead, Lingshu, MAIRA-2, MedGemma, and MedVersa) and radiologist-written reports were randomly presented and blindly evaluated by three thoracic radiologists using four criteria: RADPEER, clinical acceptability, hallucination, and language clarity. Comparative performance was assessed using generalized linear mixed models, with radiologist-written reports treated as the reference. Finding-level analyses were also performed with CT as the reference. Results: A total of 478 patients (median age, 67 years [interquartile range, 50-78]; 282 men [59.0%]) were included. AIRead demonstrated the lowest RADPEER 3b rate (5.3% [76/1434] vs. radiologists 13.9% [200/1434]; P<.001 whereas other vlms showed higher disagreement rates p clinical acceptability was the highest with airead vs. radiologists while performed worse hallucinations were rare comparable to but frequent models language clarity lingshu and medversa compared sensitivity varied substantially across for common findings: maira-2 medgemma conclusion: medical cxr report generation exhibited variable performance in quality diagnostic measures.>
半/弱/无/有监督|不确定性|主动学习(12篇)
【1】RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies
标题:RoaD:滚动作为自动驾驶政策闭环监督微调的演示
链接:https://arxiv.org/abs/2512.01993
作者:Guillermo Garcia-Cobo,Maximilian Igl,Peter Karkus,Zhejun Zhang,Michael Watson,Yuxiao Chen,Boris Ivanovic,Marco Pavone
备注:Preprint
摘要:Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader applications domains including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similar or better than the prior CL-SFT method; and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41\% and reduces collisions by 54\%.
【2】GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
标题:GrndControl:通过自我监督的奖励调整来建立世界模型
链接:https://arxiv.org/abs/2512.01952
作者:Haoyang He,Jay Patrikar,Dong-Ki Kim,Max Smith,Daniel McGann,Ali-akbar Agha-mohammadi,Shayegan Omidshafiei,Sebastian Scherer
摘要:Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and long-horizon stability. We introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. We instantiate this framework with GrndCtrl, a reward-aligned adaptation method based on Group Relative Policy Optimization (GRPO), yielding world models that maintain stable trajectories, consistent geometry, and reliable rollouts for embodied navigation. Like post-training alignment in large language models, GrndCtrl leverages verifiable rewards to bridge generative pretraining and grounded behavior, achieving superior spatial coherence and navigation stability over supervised fine-tuning in outdoor environments.
【3】Real-World Reinforcement Learning of Active Perception Behaviors
标题:主动感知行为的现实世界强化学习
链接:https://arxiv.org/abs/2512.01188
作者:Edward S. Hu,Jie Wang,Xingfang Yuan,Fiona Luo,Muyao Li,Gaspard Lambrechts,Oleh Rybkin,Dinesh Jayaraman
备注:NeurIPS 2025 camera ready
摘要:A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/
【4】Uncertainty Quantification for Deep Regression using Contextualised Normalizing Flows
标题:使用上下文规范化流进行深度回归的不确定性量化
链接:https://arxiv.org/abs/2512.00835
作者:Adriel Sosa Marco,John Daniel Kirwan,Alexia Toumpa,Simos Gerasimou
摘要:Quantifying uncertainty in deep regression models is important both for understanding the confidence of the model and for safe decision-making in high-risk domains. Existing approaches that yield prediction intervals overlook distributional information, neglecting the effect of multimodal or asymmetric distributions on decision-making. Similarly, full or approximated Bayesian methods, while yielding the predictive posterior density, demand major modifications to the model architecture and retraining. We introduce MCNF, a novel post hoc uncertainty quantification method that produces both prediction intervals and the full conditioned predictive distribution. MCNF operates on top of the underlying trained predictive model; thus, no predictive model retraining is needed. We provide experimental evidence that the MCNF-based uncertainty estimate is well calibrated, is competitive with state-of-the-art uncertainty quantification methods, and provides richer information for downstream decision-making tasks.
【5】Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
标题:提取受监督和离线RL数据集的数学保证
链接:https://arxiv.org/abs/2512.00536
作者:Aaryan Gupta,Rishi Saket,Aravindan Raghuveer
备注:29 pages, 2 figures
摘要:Given a training dataset, the goal of dataset distillation is to derive a synthetic dataset such that models trained on the latter perform as well as those trained on the training dataset. In this work, we develop and analyze an efficient dataset distillation algorithm for supervised learning, specifically regression in $\mathbb{R}^d$, based on matching the losses on the training and synthetic datasets with respect to a fixed set of randomly sampled regressors without any model training. Our first key contribution is a novel performance guarantee proving that our algorithm needs only $\tilde{O}(d^2)$ sampled regressors to derive a synthetic dataset on which the MSE loss of any bounded linear model is nearly the same as its MSE loss on the given training data. In particular, the model optimized on the synthetic data has close to minimum loss on the training data, thus performing nearly as well as the model optimized on the latter. Complementing this, we also prove a matching lower bound of $Ω(d^2)$ for the number of sampled regressors showing the tightness of our analysis. Our second contribution is to extend our algorithm to offline RL dataset distillation by matching the Bellman loss, unlike previous works which used a behavioral cloning objective. This is the first such method which leverages both, the rewards and the next state information, available in offline RL datasets, without any policy model optimization. Our algorithm generates a synthetic dataset whose Bellman loss with respect to any linear action-value predictor is close to the latter's Bellman loss on the offline RL training dataset. Therefore, a policy associated with an action-value predictor optimized on the synthetic dataset performs nearly as well as that derived from the one optimized on the training data. We conduct experiments to validate our theoretical guarantees and observe performance gains.
【6】Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction
标题:通过保形预测实现主动模仿学习中样本高效的专家查询控制
链接:https://arxiv.org/abs/2512.00453
作者:Arad Firouzkouhi,Omid Mirzaeedodangeh,Lars Lindemann
摘要:Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets that revisit near-duplicate states. We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL), a querying rule that requests an expert action only when the visited state is under-represented in the expert-labeled dataset. CRSAIL scores state novelty by the distance to the $K$-th nearest expert state and sets a single global threshold via conformal prediction. This threshold is the empirical $(1-α)$ quantile of on-policy calibration scores, providing a distribution-free calibration rule that links $α$ to the expected query rate and makes $α$ a task-agnostic tuning knob. This state-space querying strategy is robust to outliers and, unlike safety-gate-based AIL, can be run without real-time expert takeovers: we roll out full trajectories (episodes) with the learner and only afterward query the expert on a subset of visited states. Evaluated on MuJoCo robotics tasks, CRSAIL matches or exceeds expert-level reward while reducing total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods, with empirical robustness to $α$ and $K$, easing deployment on novel systems with unknown dynamics.
【7】Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning
标题:用于离线鲁棒强化学习的样本高效表格式自玩
链接:https://arxiv.org/abs/2512.00352
作者:Na Li,Zewu Zheng,Wei Ni,Hangguan Shan,Wenjie Zhang,Xinyu Li
备注:NeurIPS 2025
摘要:Multi-agent reinforcement learning (MARL), as a thriving field, explores how multiple agents independently make decisions in a shared dynamic environment. Due to environmental uncertainties, policies in MARL must remain robust to tackle the sim-to-real gap. We focus on robust two-player zero-sum Markov games (TZMGs) in offline settings, specifically on tabular robust TZMGs (RTZMGs). We propose a model-based algorithm (\textit{RTZ-VI-LCB}) for offline RTZMGs, which is optimistic robust value iteration combined with a data-driven Bernstein-style penalty term for robust value estimation. By accounting for distribution shifts in the historical dataset, the proposed algorithm establishes near-optimal sample complexity guarantees under partial coverage and environmental uncertainty. An information-theoretic lower bound is developed to confirm the tightness of our algorithm's sample complexity, which is optimal regarding both state and action spaces. To the best of our knowledge, RTZ-VI-LCB is the first to attain this optimality, sets a new benchmark for offline RTZMGs, and is validated experimentally.
【8】Provable Memory Efficient Self-Play Algorithm for Model-free Reinforcement Learning
标题:用于无模型强化学习的可证明内存高效自玩算法
链接:https://arxiv.org/abs/2512.00351
作者:Na Li,Yuchen Jiao,Hangguan Shan,Shefeng Yan
备注:ICLR 2024. arXiv admin note: substantial text overlap with arXiv:2110.04645 by other authors
摘要:The thriving field of multi-agent reinforcement learning (MARL) studies how a group of interacting agents make decisions autonomously in a shared dynamic environment. Existing theoretical studies in this area suffer from at least two of the following obstacles: memory inefficiency, the heavy dependence of sample complexity on the long horizon and the large state space, the high computational complexity, non-Markov policy, non-Nash policy, and high burn-in cost. In this work, we take a step towards settling this problem by designing a model-free self-play algorithm \emph{Memory-Efficient Nash Q-Learning (ME-Nash-QL)} for two-player zero-sum Markov games, which is a specific setting of MARL. ME-Nash-QL is proven to enjoy the following merits. First, it can output an $\varepsilon$-approximate Nash policy with space complexity $O(SABH)$ and sample complexity $\widetilde{O}(H^4SAB/\varepsilon^2)$, where $S$ is the number of states, $\{A, B\}$ is the number of actions for two players, and $H$ is the horizon length. It outperforms existing algorithms in terms of space complexity for tabular cases, and in terms of sample complexity for long horizons, i.e., when $\min\{A, B\}\ll H^2$. Second, ME-Nash-QL achieves the lowest computational complexity $O(T\mathrm{poly}(AB))$ while preserving Markov policies, where $T$ is the number of samples. Third, ME-Nash-QL also achieves the best burn-in cost $O(SAB\,\mathrm{poly}(H))$, whereas previous algorithms have a burn-in cost of at least $O(S^3 AB\,\mathrm{poly}(H))$ to attain the same level of sample complexity with ours.
【9】Self-Supervised Dynamical System Representations for Physiological Time-Series
标题:生理时间序列的自我监督动态系统表示
链接:https://arxiv.org/abs/2512.00239
作者:Yenho Chen,Maxwell A. Xu,James M. Rehg,Christopher J. Rozell
摘要:The effectiveness of self-supervised learning (SSL) for physiological time series depends on the ability of a pretraining objective to preserve information about the underlying physiological state while filtering out unrelated noise. However, existing strategies are limited due to reliance on heuristic principles or poorly constrained generative tasks. To address this limitation, we propose a pretraining framework that exploits the information structure of a dynamical systems generative model across multiple time-series. This framework reveals our key insight that class identity can be efficiently captured by extracting information about the generative variables related to the system parameters shared across similar time series samples, while noise unique to individual samples should be discarded. Building on this insight, we propose PULSE, a cross-reconstruction-based pretraining objective for physiological time series datasets that explicitly extracts system information while discarding non-transferrable sample-specific ones. We establish theory that provides sufficient conditions for the system information to be recovered, and empirically validate it using a synthetic dynamical systems experiment. Furthermore, we apply our method to diverse real-world datasets, demonstrating that PULSE learns representations that can broadly distinguish semantic classes, increase label efficiency, and improve transfer learning.
【10】TIE: A Training-Inversion-Exclusion Framework for Visually Interpretable and Uncertainty-Guided Out-of-Distribution Detection
标题:TIE:用于视觉可解释和不确定性引导的非分布检测的训练-倒置-排除框架
链接:https://arxiv.org/abs/2512.00229
作者:Pirzada Suhail,Rehna Afroz,Amit Sethi
摘要
:Deep neural networks often struggle to recognize when an input lies outside their training experience, leading to unreliable and overconfident predictions. Building dependable machine learning systems therefore requires methods that can both estimate predictive \textit{uncertainty} and detect \textit{out-of-distribution (OOD)} samples in a unified manner. In this paper, we propose \textbf{TIE: a Training--Inversion--Exclusion} framework for visually interpretable and uncertainty-guided anomaly detection that jointly addresses these challenges through iterative refinement. TIE extends a standard $n$-class classifier to an $(n+1)$-class model by introducing a garbage class initialized with Gaussian noise to represent outlier inputs. Within each epoch, TIE performs a closed-loop process of \textit{training, inversion, and exclusion}, where highly uncertain inverted samples reconstructed from the just-trained classifier are excluded into the garbage class. Over successive iterations, the inverted samples transition from noisy artifacts into visually coherent class prototypes, providing transparent insight into how the model organizes its learned manifolds. During inference, TIE rejects OOD inputs by either directly mapping them to the garbage class or producing low-confidence, uncertain misclassifications within the in-distribution classes that are easily separable, all without relying on external OOD datasets. A comprehensive threshold-based evaluation using multiple OOD metrics and performance measures such as \textit{AUROC}, \textit{AUPR}, and \textit{FPR@95\%TPR} demonstrates that TIE offers a unified and interpretable framework for robust anomaly detection and calibrated uncertainty estimation (UE) achieving near-perfect OOD detection with \textbf{\(\!\approx\!\) 0 FPR@95\%TPR} when trained on MNIST or FashionMNIST and tested against diverse unseen datasets.
【11】SetupKit: Efficient Multi-Corner Setup/Hold Time Characterization Using Bias-Enhanced Interpolation and Active Learning
标题:SetupKit:使用偏置增强内插和主动学习的高效多角建立/保持时间特征
链接:https://arxiv.org/abs/2512.00044
作者:Junzhuo Zhou,Ziwen Wang,Haoxuan Xia,Yuxin Yan,Chengyu Zhu,Ting-Jung Lin,Wei Xing,Lei He
摘要:Accurate setup/hold time characterization is crucial for modern chip timing closure, but its reliance on potentially millions of SPICE simulations across diverse process-voltagetemperature (PVT) corners creates a major bottleneck, often lasting weeks or months. Existing methods suffer from slow search convergence and inefficient exploration, especially in the multi-corner setting. We introduce SetupKit, a novel framework designed to break this bottleneck using statistical intelligence, circuit analysis and active learning (AL). SetupKit integrates three key innovations: BEIRA, a bias-enhanced interpolation search derived from statistical error modeling to accelerate convergence by overcoming stagnation issues, initial search interval estimation by circuit analysis and AL strategy using Gaussian Process. This AL component intelligently learns PVT-timing correlations, actively guiding the expensive simulations to the most informative corners, thus minimizing redundancy in multicorner characterization. Evaluated on industrial 22nm standard cells across 16 PVT corners, SetupKit demonstrates a significant 2.4x overall CPU time reduction (from 720 to 290 days on a single core) compared to standard practices, drastically cutting characterization time. SetupKit offers a principled, learningbased approach to library characterization, addressing a critical EDA challenge and paving the way for more intelligent simulation management.
【12】Predicting COVID-19 Prevalence Using Wastewater RNA Surveillance: A Semi-Supervised Learning Approach with Temporal Feature Trust
标题:使用废水RNA监测预测COVID-19患病率:具有时间特征信任度的半监督学习方法
链接:https://arxiv.org/abs/2512.00100
作者:Yifei Chen,Eric Liang
备注:22 pages, 13 figures. Submitted to SIURO
摘要:As COVID-19 transitions into an endemic disease that remains constantly present in the population at a stable level, monitoring its prevalence without invasive measures becomes increasingly important. In this paper, we present a deep neural network estimator for the COVID-19 daily case count based on wastewater surveillance data and other confounding factors. This work builds upon the study by Jiang, Kolozsvary, and Li (2024), which connects the COVID-19 case counts with testing data collected early in the pandemic. Using the COVID-19 testing data and the wastewater surveillance data during the period when both data were highly reliable, one can train an artificial neural network that learns the nonlinear relation between the COVID-19 daily case count and the wastewater viral RNA concentration. From a machine learning perspective, the main challenge lies in addressing temporal feature reliability, as the training data has different reliability over different time periods.
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling
标题:四过六:通过自适应块缩放实现更准确的NVFP 4量化
链接:https://arxiv.org/abs/2512.02010
作者:Jack Cook,Junxian Guo,Guangxuan Xiao,Yujun Lin,Song Han
备注:10 pages, 5 figures
摘要:As large language models have grown larger, low-precision numerical formats such as NVFP4 have become increasingly popular due to the speed and memory benefits they provide. However, to accelerate computation with NVFP4, all matrix multiplication operands--weights and activations in the forward pass, and weights, activations, and gradients in the backward pass--must be quantized to NVFP4, often leading to divergence during training and performance degradation during inference. NVFP4 by evaluating multiple potential scale factors for each block of values. To address this issue, in this work we introduce Four Over Six (4/6), a modification to the NVFP4 quantization algorithm that evaluates two potential scale factors for each block of values. Unlike integer formats, floating-point formats such as FP4 have the most quantization error on near-maximal values in each block, which we find to be primarily responsible for downstream performance degradation. We find that for some blocks, scaling to smaller FP4 values makes the distribution of representable values more uniform, improving representation of near-maximal values. Importantly, 4/6 can be implemented efficiently on NVIDIA Blackwell GPUs, making it viable to use while training LLMs with NVFP4. In pre-training experiments with transformer and hybrid model architectures, we find that 4/6 prevents divergence in several cases, bringing training loss significantly closer to BF16 compared to models trained with current state-of-the-art NVFP4 training recipes. We also find that 4/6 can be easily incorporated into many different post-training quantization methods and generally improves downstream accuracy. We hope this inspires future work in training and deploying models with NVFP4.
【2】Winning Solutions for the Rayan AI Contest: Compositional Retrieval, Zero-Shot Anomaly Detection, and Backdoor Detection
标题:Rayan人工智能竞赛获胜解决方案:成分检索、Zero-Shot异常检测和后门检测
链接:https://arxiv.org/abs/2512.01498
作者:Ali Nafisi,Sina Asghari,Mohammad Saeed Arvenaghi,Hossein Shakibania
摘要:This report presents solutions to three machine learning challenges: compositional image retrieval, zero-shot anomaly detection, and backdoored model detection. In compositional image retrieval, we developed a system that processes visual and textual inputs to retrieve relevant images, achieving 95.38\% accuracy and ranking first with a clear margin over the second team. For zero-shot anomaly detection, we designed a model that identifies and localizes anomalies in images without prior exposure to abnormal examples, securing 1st place with 73.14\% accuracy. In the backdoored model detection task, we proposed a method to detect hidden backdoor triggers in neural networks, reaching an accuracy of 78\%, which placed our approach in second place. These results demonstrate the effectiveness of our methods in addressing key challenges related to retrieval, anomaly detection, and model security, with implications for real-world applications in industries such as healthcare, manufacturing, and cybersecurity. Code for all solutions is available online.
【3】Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution
标题:背景分布转变下的开放集域适应:挑战和可证明有效的解决方案
链接:https://arxiv.org/abs/2512.01152
作者:Shravan Chaudhari,Yoav Wald,Suchi Saria
摘要:As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may emerge that were absent during training, a problem known as open-set recognition, and the distribution of known categories may change. Guarantees on open-set recognition are mostly derived under the assumption that the distribution of known classes, which we call \emph{the background distribution}, is fixed. In this paper we develop \ours{}, a method that is guaranteed to solve open-set recognition even in the challenging case where the background distribution shifts. We prove that the method works under benign assumptions that the novel class is separable from the non-novel classes, and provide theoretical guarantees that it outperforms a representative baseline in a simplified overparameterized setting. We develop techniques to make \ours{} scalable and robust, and perform comprehensive empirical evaluations on image and text data. The results show that \ours{} significantly outperforms existing open-set recognition methods under background shift. Moreover, we provide new insights into how factors such as the size of the novel class influences performance, an aspect that has not been extensively explored in prior work.
【4】Adaptive-lambda Subtracted Importance Sampled Scores in Machine Unlearning for DDPMs and VAEs
标题:DDPM和VAE的机器取消学习中的自适应Lambda重要性抽样分数
链接:https://arxiv.org/abs/2512.01054
作者:MohammadParsa Dini,Human Jafari,Sajjad Amini,MohammadMahdi Mojahedian
摘要:Machine Unlearning is essential for large generative models (VAEs, DDPMs) to comply with the right to be forgotten and prevent undesired content generation without costly retraining. Existing approaches, such as Static-lambda SISS for diffusion models, rely on a fixed mixing weight lambda, which is suboptimal because the required unlearning strength varies across samples and training stages. We propose Adaptive-lambda SISS, a principled extension that turns lambda into a latent variable dynamically inferred at each training step. A lightweight inference network parameterizes an adaptive posterior over lambda, conditioned on contextual features derived from the instantaneous SISS loss terms (retain/forget losses and their gradients). This enables joint optimization of the diffusion model and the lambda-inference mechanism via a variational objective, yielding significantly better trade-offs. We further extend the adaptive-lambda principle to score-based unlearning and introduce a multi-class variant of Score Forgetting Distillation. In addition, we present two new directions: (i) a hybrid objective combining the data-free efficiency of Score Forgetting Distillation with the direct gradient control of SISS, and (ii) a Reinforcement Learning formulation that treats unlearning as a sequential decision process, learning an optimal policy over a state space defined by the model's current memory of the forget set. Experiments on an augmented MNIST benchmark show that Adaptive-lambda SISS substantially outperforms the original static-lambda SISS, achieving stronger removal of forgotten classes while better preserving generation quality on the retain set.
【5】Light-Weight Benchmarks Reveal the Hidden Hardware Cost of Zero-Shot Tabular Foundation Models
标题:轻量级基准揭示了Zero-Shot表格基础模型的隐藏硬件成本
链接:https://arxiv.org/abs/2512.00888
作者:Aayam Bansal,Ishaan Gangwani
备注:ICML NewInML
摘要:Zero-shot foundation models (FMs) promise training-free prediction on tabular data, yet their hardware footprint remains poorly characterized. We present a fully reproducible benchmark that reports test accuracy together with wall-clock latency, peak CPU RAM, and peak GPU VRAM on four public datasets: Adult-Income, Higgs-100k, Wine-Quality, and California-Housing. Two open FMs (TabPFN-1.0 and TabICL-base) are compared against tuned XGBoost, LightGBM, and Random Forest baselines on a single NVIDIA T4 GPU. The tree ensembles equal or surpass FM accuracy on three datasets while completing full-test batches in <= 0.40 s and <= 150 MB RAM, using zero VRAM. TabICL achieves a 0.8 percentage-point gain on Higgs but requires roughly 40,000 times more latency (960 s) and 9 GB VRAM. TabPFN matches tree-model accuracy on Wine and Housing but peaks at 4 GB VRAM and cannot process the full 100k-row Higgs table. These results quantify the substantial hardware-versus-accuracy trade-offs in current tabular FMs and provide an open baseline for future efficiency-oriented research.
【6】PEOAT: Personalization-Guided Evolutionary Question Assembly for One-Shot Adaptive Testing
标题:PEOAT:用于一次性自适应测试的个性化引导进化问题组装
链接:https://arxiv.org/abs/2512.00439
作者:Xiaoshan Yu,Ziwei Huang,Shangshang Yang,Ziwen Wang,Haiping Ma,Xingyi Zhang
备注:AAAI-2026, 9 pages
摘要
:With the rapid advancement of intelligent education, Computerized Adaptive Testing (CAT) has attracted increasing attention by integrating educational psychology with deep learning technologies. Unlike traditional paper-and-pencil testing, CAT aims to efficiently and accurately assess examinee abilities by adaptively selecting the most suitable items during the assessment process. However, its real-time and sequential nature presents limitations in practical scenarios, particularly in large-scale assessments where interaction costs are high, or in sensitive domains such as psychological evaluations where minimizing noise and interference is essential. These challenges constrain the applicability of conventional CAT methods in time-sensitive or resourceconstrained environments. To this end, we first introduce a novel task called one-shot adaptive testing (OAT), which aims to select a fixed set of optimal items for each test-taker in a one-time selection. Meanwhile, we propose PEOAT, a Personalization-guided Evolutionary question assembly framework for One-shot Adaptive Testing from the perspective of combinatorial optimization. Specifically, we began by designing a personalization-aware initialization strategy that integrates differences between examinee ability and exercise difficulty, using multi-strategy sampling to construct a diverse and informative initial population. Building on this, we proposed a cognitive-enhanced evolutionary framework incorporating schema-preserving crossover and cognitively guided mutation to enable efficient exploration through informative signals. To maintain diversity without compromising fitness, we further introduced a diversity-aware environmental selection mechanism. The effectiveness of PEOAT is validated through extensive experiments on two datasets, complemented by case studies that uncovered valuable insights.
【7】Adaptive prediction theory combining offline and online learning
标题:结合线下和在线学习的自适应预测理论
链接:https://arxiv.org/abs/2512.00342
作者:Haizheng Li,Lei Guo
摘要:Real-world intelligence systems usually operate by combining offline learning and online adaptation with highly correlated and non-stationary system data or signals, which, however, has rarely been investigated theoretically in the literature. This paper initiates a theoretical investigation on the prediction performance of a two-stage learning framework combining offline and online algorithms for a class of nonlinear stochastic dynamical systems. For the offline-learning phase, we establish an upper bound on the generalization error for approximate nonlinear-least-squares estimation under general datasets with strong correlation and distribution shift, leveraging the Kullback-Leibler divergence to quantify the distributional discrepancies. For the online-adaptation phase, we address, on the basis of the offline-trained model, the possible uncertain parameter drift in real-world target systems by proposing a meta-LMS prediction algorithm. This two-stage framework, integrating offline learning with online adaptation, demonstrates superior prediction performances compared with either purely offline or online methods. Both theoretical guarantees and empirical studies are provided.
【8】Statistical Inference under Adaptive Sampling with LinUCB
标题:用LinUCB实现自适应采样下的统计推断
链接:https://arxiv.org/abs/2512.00222
作者:Wei Fan,Kevin Tan,Yuting Wei
摘要:Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect that certain parameters are asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm, establishing asymptotic normality for the limiting distribution of the estimation error where the convergence occurs at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
强化学习(14篇)
【1】A Diffusion Model Framework for Maximum Entropy Reinforcement Learning
标题:最大熵强化学习的扩散模型框架
链接:https://arxiv.org/abs/2512.02019
作者:Sebastian Sanokowski,Kaustubh Patil,Alois Knoll
摘要:Diffusion models have achieved remarkable success in data-driven learning and in sampling from complex, unnormalized target distributions. Building on this progress, we reinterpret Maximum Entropy Reinforcement Learning (MaxEntRL) as a diffusion model-based sampling problem. We tackle this problem by minimizing the reverse Kullback-Leibler (KL) divergence between the diffusion policy and the optimal policy distribution using a tractable upper bound. By applying the policy gradient theorem to this objective, we derive a modified surrogate objective for MaxEntRL that incorporates diffusion dynamics in a principled way. This leads to simple diffusion-based variants of Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO) and Wasserstein Policy Optimization (WPO), termed DiffSAC, DiffPPO and DiffWPO. All of these methods require only minor implementation changes to their base algorithm. We find that on standard continuous control benchmarks, DiffSAC, DiffPPO and DiffWPO achieve better returns and higher sample efficiency than SAC and PPO.
【2】Forecasting in Offline Reinforcement Learning for Non-stationary Environments
标题:非平稳环境下离线强化学习中的预测
链接:https://arxiv.org/abs/2512.01987
作者:Suzan Ece Ada,Georg Martius,Emre Ugur,Erhan Oztop
备注
:The Thirty-Ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025
摘要:Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or only consider synthetic perturbations at test time, assumptions that often fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can lead to partial observability, causing agents to misperceive their true state and degrade performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
【3】End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW
标题:C-VR PTW中用于随机多目标优化的端到端深度强化学习
链接:https://arxiv.org/abs/2512.01518
作者:Abdo Abouelrous,Laurens Bliek,Yaoxin Wu,Yingqian Zhang
备注:25 pages, 5 figures
摘要:In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although different objectives could be incorporated. Learning-based methods offer earnest computational advantages as they can repeatedly solve problems with limited interference from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. In turn, we propose a model that simultaneously addresses stochasticity and multi-objectivity and provide a refined training mechanism for this model through scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto Front of good quality within acceptable run times compared to three baselines.
【4】Multi-Path Collaborative Reasoning via Reinforcement Learning
标题:通过强化学习进行多路径协作推理
链接:https://arxiv.org/abs/2512.01485
作者:Jindi Lv,Yuhao Zhou,Zheng Zhu,Xiaofeng Wang,Guan Huang,Jiancheng Lv
摘要:Chain-of-Thought (CoT) reasoning has significantly advanced the problem-solving capabilities of Large Language Models (LLMs), yet conventional CoT often exhibits internal determinism during decoding, limiting exploration of plausible alternatives. Recent methods attempt to address this by generating soft abstract tokens to enable reasoning in a continuous semantic space. However, we find that such approaches remain constrained by the greedy nature of autoregressive decoding, which fundamentally isolates the model from alternative reasoning possibilities. In this work, we propose Multi-Path Perception Policy Optimization (M3PO), a novel reinforcement learning framework that explicitly injects collective insights into the reasoning process. M3PO leverages parallel policy rollouts as naturally diverse reasoning sources and integrates cross-path interactions into policy updates through a lightweight collaborative mechanism. This design allows each trajectory to refine its reasoning with peer feedback, thereby cultivating more reliable multi-step reasoning patterns. Empirical results show that M3PO achieves state-of-the-art performance on both knowledge- and reasoning-intensive benchmarks. Models trained with M3PO maintain interpretability and inference efficiency, underscoring the promise of multi-path collaborative learning for robust reasoning.
【5】A TinyML Reinforcement Learning Approach for Energy-Efficient Light Control in Low-Cost Greenhouse Systems
标题:TinyML强化学习方法用于低成本温室系统中的节能光控制
链接:https://arxiv.org/abs/2512.01167
作者:Mohamed Abdallah Salem,Manuel Cuevas Perez,Ahmed Harb Rabia
备注:Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: https://doi.org/10.1109/INTCEC65580.2025.11256135
摘要:This study presents a reinforcement learning (RL)-based control strategy for adaptive lighting regulation in controlled environments using a low-power microcontroller. A model-free Q-learning algorithm was implemented to dynamically adjust the brightness of a Light-Emitting Diode (LED) based on real-time feedback from a light-dependent resistor (LDR) sensor. The system was trained to stabilize at 13 distinct light intensity levels (L1 to L13), with each target corresponding to a specific range within the 64-state space derived from LDR readings. A total of 130 trials were conducted, covering all target levels with 10 episodes each. Performance was evaluated in terms of convergence speed, steps taken, and time required to reach target states. Box plots and histograms were generated to analyze the distribution of training time and learning efficiency across targets. Experimental validation demonstrated that the agent could effectively learn to stabilize at varying light levels with minimal overshooting and smooth convergence, even in the presence of environmental perturbations. This work highlights the feasibility of lightweight, on-device RL for energy-efficient lighting control and sets the groundwork for multi-modal environmental control applications in resource-constrained agricultural systems.
【6】Automating the Refinement of Reinforcement Learning Specifications
标题:自动细化强化学习规范
链接:https://arxiv.org/abs/2512.01047
作者:Tanmay Ambadkar,Đorđe Žikelić,Abhinav Verma
摘要:Logical specifications have been shown to help reinforcement learning algorithms in achieving complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose \textsc{AutoSpec}, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance therefore making it easier for reinforcement learning algorithms to learn useful policies. \textsc{AutoSpec} is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL, and design four refinement procedures that modify the abstract graph of the specification by either refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e. any trajectory satisfying the refined specification also satisfies the original. We then show how \textsc{AutoSpec} can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that \textsc{AutoSpec} yields promising improvements in terms of the complexity of control tasks that can be solved, when refined logical specifications produced by \textsc{AutoSpec} are utilized.
【7】AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning
标题:AltNet:解决强化学习中的可塑性-稳定性困境
链接:https://arxiv.org/abs/2512.01034
作者:Mansi Maheshwari,John C. Raisbeck,Bruno Castro da Silva
摘要:Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when neural networks are trained on a reinforcement learning task, their ability to continue learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks. The use of twin networks anchors performance during resets through a mechanism that allows networks to periodically alternate roles: one network learns as it acts in the environment, while the other learns off-policy from the active network's interactions and a replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experiences, becomes the new active network. AltNet restores plasticity, improving sample efficiency and achieving higher performance, while avoiding performance drops that pose risks in safety-critical settings. We demonstrate these advantages in several high-dimensional control tasks from the DeepMind Control Suite, where AltNet outperforms various relevant baseline methods, as well as state-of-the-art reset-based techniques.
【8】Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning
标题:强化学习视频扩散模型的目标驱动奖励
链接:https://arxiv.org/abs/2512.00961
作者:Qi Wang,Mian Wu,Yuyang Zhang,Mingqi Yuan,Wenyao Zhang,Haoxiang You,Yunbo Wang,Xin Jin,Xiaokang Yang,Wenjun Zeng
摘要:Reinforcement Learning (RL) has achieved remarkable success in various domains, yet it often relies on carefully designed programmatic reward functions to guide agent behavior. Designing such reward functions can be challenging and may not generalize well across different tasks. To address this limitation, we leverage the rich world knowledge contained in pretrained video diffusion models to provide goal-driven reward signals for RL agents without ad-hoc design of reward. Our key idea is to exploit off-the-shelf video diffusion models pretrained on large-scale video datasets as informative reward functions in terms of video-level and frame-level goals. For video-level rewards, we first finetune a pretrained video diffusion model on domain-specific datasets and then employ its video encoder to evaluate the alignment between the latent representations of agent's trajectories and the generated goal videos. To enable more fine-grained goal-achievement, we derive a frame-level goal by identifying the most relevant frame from the generated video using CLIP, which serves as the goal state. We then employ a learned forward-backward representation that represents the probability of visiting the goal state from a given state-action pair as frame-level reward, promoting more coherent and goal-driven trajectories. Experiments on various Meta-World tasks demonstrate the effectiveness of our approach.
【9】Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
标题:对称性破坏环境中的部分等变强化学习
链接:https://arxiv.org/abs/2512.00915
作者:Junwoo Chang,Minwoo Park,Joohwan Seo,Roberto Horowitz,Jongmin Lee,Jongeun Choi
备注:27 pages, 10 figures
摘要
:Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups for such cases, local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical RL algorithms -- Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control -- that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.
【10】List Replicable Reinforcement Learning
标题:列出可复制强化学习
链接:https://arxiv.org/abs/2512.00553
作者:Bohan Zhang,Michael Chen,A. Pavan,N. V. Vinodchandran,Lin F. Yang,Ruosong Wang
摘要:Replicability is a fundamental challenge in reinforcement learning (RL), as RL algorithms are empirically observed to be unstable and sensitive to variations in training conditions. To formally address this issue, we study \emph{list replicability} in the Probably Approximately Correct (PAC) RL framework, where an algorithm must return a near-optimal policy that lies in a \emph{small list} of policies across different runs, with high probability. The size of this list defines the \emph{list complexity}. We introduce both weak and strong forms of list replicability: the weak form ensures that the final learned policy belongs to a small list, while the strong form further requires that the entire sequence of executed policies remains constrained. These objectives are challenging, as existing RL algorithms exhibit exponential list complexity due to their instability. Our main theoretical contribution is a provably efficient tabular RL algorithm that guarantees list replicability by ensuring the list complexity remains polynomial in the number of states, actions, and the horizon length. We further extend our techniques to achieve strong list replicability, bounding the number of possible policy execution traces polynomially with high probability. Our theoretical result is made possible by key innovations including (i) a novel planning strategy that selects actions based on lexicographic order among near-optimal choices within a randomly chosen tolerance threshold, and (ii) a mechanism for testing state reachability in stochastic environments while preserving replicability. Finally, we demonstrate that our theoretical investigation sheds light on resolving the \emph{instability} issue of RL algorithms used in practice. In particular, we show that empirically, our new planning strategy can be incorporated into practical RL frameworks to enhance their stability.
【11】DQ4FairIM: Fairness-aware Influence Maximization using Deep Reinforcement Learning
标题:DQ4 FairIM:使用深度强化学习实现公平意识的影响最大化
链接:https://arxiv.org/abs/2512.00545
作者:Akrati Saxena,Harshith Kumar Yadav,Bart Rutten,Shashi Shekhar Jha
摘要:The Influence Maximization (IM) problem aims to select a set of seed nodes within a given budget to maximize the spread of influence in a social network. However, real-world social networks have several structural inequalities, such as dominant majority groups and underrepresented minority groups. If these inequalities are not considered while designing IM algorithms, the outcomes might be biased, disproportionately benefiting majority groups while marginalizing minorities. In this work, we address this gap by designing a fairness-aware IM method using Reinforcement Learning (RL) that ensures equitable influence outreach across all communities, regardless of protected attributes. Fairness is incorporated using a maximin fairness objective, which prioritizes improving the outreach of the least-influenced group, pushing the solution toward an equitable influence distribution. We propose a novel fairness-aware deep RL method, called DQ4FairIM, that maximizes the expected number of influenced nodes by learning an RL policy. The learnt policy ensures that minority groups formulate the IM problem as a Markov Decision Process (MDP) and use deep Q-learning, combined with the Structure2Vec network embedding, earning together with Structure2Vec network embedding to solve the MDP. We perform extensive experiments on synthetic benchmarks and real-world networks to compare our method with fairness-agnostic and fairness-aware baselines. The results show that our method achieves a higher level of fairness while maintaining a better fairness-performance trade-off than baselines. Additionally, our approach learns effective seeding policies that generalize across problem instances without retraining, such as varying the network size or the number of seed nodes.
【12】Gradient Inversion in Federated Reinforcement Learning
标题:联邦强化学习中的梯度倒置
链接:https://arxiv.org/abs/2512.00303
作者:Shenghong He
摘要:Federated reinforcement learning (FRL) enables distributed learning of optimal policies while preserving local data privacy through gradient sharing.However, FRL faces the risk of data privacy leaks, where attackers exploit shared gradients to reconstruct local training data.Compared to traditional supervised federated learning, successful reconstruction in FRL requires the generated data not only to match the shared gradients but also to align with real transition dynamics of the environment (i.e., aligning with the real data transition distribution).To address this issue, we propose a novel attack method called Regularization Gradient Inversion Attack (RGIA), which enforces prior-knowledge-based regularization on states, rewards, and transition dynamics during the optimization process to ensure that the reconstructed data remain close to the true transition distribution.Theoretically, we prove that the prior-knowledge-based regularization term narrows the solution space from a broad set containing spurious solutions to a constrained subset that satisfies both gradient matching and true transition dynamics.Extensive experiments on control tasks and autonomous driving tasks demonstrate that RGIA can effectively constrain reconstructed data transition distributions and thus successfully reconstruct local private data.
【13】A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations
标题:分层混合人工智能方法:在战斗模拟中集成深度强化学习和脚本代理
链接:https://arxiv.org/abs/2512.00249
作者:Scotty Black,Christian Darken
备注:arXiv admin note: substantial text overlap with arXiv:2408.13333
摘要
:In the domain of combat simulations in support of wargaming, the development of intelligent agents has predominantly been characterized by rule-based, scripted methodologies with deep reinforcement learning (RL) approaches only recently being introduced. While scripted agents offer predictability and consistency in controlled environments, they fall short in dynamic, complex scenarios due to their inherent inflexibility. Conversely, RL agents excel in adaptability and learning, offering potential improvements in handling unforeseen situations, but suffer from significant challenges such as black-box decision-making processes and scalability issues in larger simulation environments. This paper introduces a novel hierarchical hybrid artificial intelligence (AI) approach that synergizes the reliability and predictability of scripted agents with the dynamic, adaptive learning capabilities of RL. By structuring the AI system hierarchically, the proposed approach aims to utilize scripted agents for routine, tactical-level decisions and RL agents for higher-level, strategic decision-making, thus addressing the limitations of each method while leveraging their individual strengths. This integration is shown to significantly improve overall performance, providing a robust, adaptable, and effective solution for developing and training intelligent agents in complex simulation environments.
【14】InF-ATPG: Intelligent FFR-Driven ATPG with Advanced Circuit Representation Guided Reinforcement Learning
标题:InF-ATPG:具有高级电路表示引导的强化学习的智能血流储备驱动ATPG
链接:https://arxiv.org/abs/2512.00079
作者:Bin Sun,Rengang Zhang,Zhiteng Chao,Zizhen Liu,Jianan Mu,Jing Ye,Huawei Li
备注:9 pages,6 figures
摘要:Automatic test pattern generation (ATPG) is a crucial process in integrated circuit (IC) design and testing, responsible for efficiently generating test patterns. As semiconductor technology progresses, traditional ATPG struggles with long execution times to achieve the expected fault coverage, which impacts the time-to-market of chips. Recent machine learning techniques, like reinforcement learning (RL) and graph neural networks (GNNs), show promise but face issues such as reward delay in RL models and inadequate circuit representation in GNN-based methods. In this paper, we propose InF-ATPG, an intelligent FFR-driven ATPG framework that overcomes these challenges by using advanced circuit representation to guide RL. By partitioning circuits into fanout-free regions (FFRs) and incorporating ATPG-specific features into a novel QGNN architecture, InF-ATPG enhances test pattern generation efficiency. Experimental results show InF-ATPG reduces backtracks by 55.06\% on average compared to traditional methods and 38.31\% compared to the machine learning approach, while also improving fault coverage.
医学相关(10篇)
【1】Mitigating Gender Bias in Depression Detection via Counterfactual Inference
标题:通过反事实推理减轻抑郁检测中的性别偏见
链接:https://arxiv.org/abs/2512.01834
作者:Mingxuan Hu,Hongbo Ma,Xinlan Wu,Ziqi Liu,Jiaqi Liu,Yangbin Chen
摘要:Audio-based depression detection models have demonstrated promising performance but often suffer from gender bias due to imbalanced training data. Epidemiological statistics show a higher prevalence of depression in females, leading models to learn spurious correlations between gender and depression. Consequently, models tend to over-diagnose female patients while underperforming on male patients, raising significant fairness concerns. To address this, we propose a novel Counterfactual Debiasing Framework grounded in causal inference. We construct a causal graph to model the decision-making process and identify gender bias as the direct causal effect of gender on the prediction. During inference, we employ counterfactual inference to estimate and subtract this direct effect, ensuring the model relies primarily on authentic acoustic pathological features. Extensive experiments on the DAIC-WOZ dataset using two advanced acoustic backbones demonstrate that our framework not only significantly reduces gender bias but also improves overall detection performance compared to existing debiasing strategies.
【2】Semantic-aware Random Convolution and Source Matching for Domain Generalization in Medical Image Segmentation
标题:基于语义的随机卷积和源匹配医学图像分割
链接:https://arxiv.org/abs/2512.01510
作者:Franz Thaler,Martin Urschler,Mateusz Kozinski,Matthias AF Gsell,Gernot Plank,Darko Stern
备注:Preprint submitted to Computer Methods and Programs in Biomedicine (currently under revision)
摘要:We tackle the challenging problem of single-source domain generalization (DG) for medical image segmentation. To this end, we aim for training a network on one domain (e.g., CT) and directly apply it to a different domain (e.g., MR) without adapting the model and without requiring images or annotations from the new domain during training. We propose a novel method for promoting DG when training deep segmentation networks, which we call SRCSM. During training, our method diversifies the source domain through semantic-aware random convolution, where different regions of a source image are augmented differently, based on their annotation labels. At test-time, we complement the randomization of the training domain via mapping the intensity of target domain images, making them similar to source domain data. We perform a comprehensive evaluation on a variety of cross-modality and cross-center generalization settings for abdominal, whole-heart and prostate segmentation, where we outperform previous DG techniques in a vast majority of experiments. Additionally, we also investigate our method when training on whole-heart CT or MR data and testing on the diastolic and systolic phase of cine MR data captured with different scanner hardware, where we make a step towards closing the domain gap in this even more challenging setting. Overall, our evaluation shows that SRCSM can be considered a new state-of-the-art in DG for medical image segmentation and, moreover, even achieves a segmentation performance that matches the performance of the in-domain baseline in several settings.
【3】A Fine Evaluation Method for Cube Copying Test for Early Detection of Alzheimer's Disease
标题:阿尔茨海默病早期检测的立方体反射试验的精细评价方法
链接:https://arxiv.org/abs/2512.01367
作者:Xinyu Jiang,Cuiyun Gao,Wenda Huang,Yiyang Jiang,Binwen Luo,Yuxin Jiang,Mengting Wang,Haoran Wen,Yang Zhao,Xuemei Chen,Songqun Huang
摘要
:Background: Impairment of visual spatial cognitive function is the most common early clinical manifestation of Alzheimer's Disease (AD). When the Montreal Cognitive Assessment (MoCA) uses the "0/1" binary method ("pass/fail") to evaluate the visual spatial cognitive ability represented by the Cube Copying Test(CCT), the elder with less formal education generally score 0 point, resulting in serious bias in the evaluation results. Therefore, this study proposes a fine evaluation method for CCT based on dynamic handwriting feature extraction of DH-SCSM-BLA. method : The Cogni-CareV3.0 software independently developed by our team was used to collect dynamic handwriting data of CCT. Then, the spatial and motion features of segmented dynamic handwriting were extracted, and feature matrix with unequal dimensions were normalized. Finally, a bidirectional long short-term memory network model combined with attention mechanism (BiLSTM-Attention) was adopted for classification. Result: The experimental results showed that: The proposed method has significant superiority compared to similar studies, with a classification accuracy of 86.69%. The distribution of cube drawing ability scores has significant regularity for three aspects such as MCI patients and healthy control group, age, and levels of education. It was also found that score for each cognitive task including cube drawing ability score is negatively correlated with age. Score for each cognitive task including cube drawing ability score, but positively correlated with levels of education significantly. Conclusion: This study provides a relatively objective and comprehensive evaluation method for early screening and personalized intervention of visual spatial cognitive impairment.
【4】Multi-Modal AI for Remote Patient Monitoring in Cancer Care
标题:用于癌症护理中远程患者监护的多模式人工智能
链接:https://arxiv.org/abs/2512.00949
作者:Yansong Liu,Ronnie Stafford,Pramit Khetrapal,Huriye Kocadag,Graça Carvalho,Patricia de Winter,Maryam Imran,Amelia Snook,Adamos Hadjivasiliou,D. Vijay Anand,Weining Lin,John Kelly,Yukun Zhou,Ivana Drobnjak
摘要:For patients undergoing systemic cancer therapy, the time between clinic visits is full of uncertainties and risks of unmonitored side effects. To bridge this gap in care, we developed and prospectively trialed a multi-modal AI framework for remote patient monitoring (RPM). This system integrates multi-modal data from the HALO-X platform, such as demographics, wearable sensors, daily surveys, and clinical events. Our observational trial is one of the largest of its kind and has collected over 2.1 million data points (6,080 patient-days) of monitoring from 84 patients. We developed and adapted a multi-modal AI model to handle the asynchronous and incomplete nature of real-world RPM data, forecasting a continuous risk of future adverse events. The model achieved an accuracy of 83.9% (AUROC=0.70). Notably, the model identified previous treatments, wellness check-ins, and daily maximum heart rate as key predictive features. A case study demonstrated the model's ability to provide early warnings by outputting escalating risk profiles prior to the event. This work establishes the feasibility of multi-modal AI RPM for cancer care and offers a path toward more proactive patient support.(Accepted at Europe NeurIPS 2025 Multimodal Representation Learning for Healthcare Workshop)
【5】Text Mining Analysis of Symptom Patterns in Medical Chatbot Conversations
标题:医疗聊天机器人对话中症状模式的文本挖掘分析
链接:https://arxiv.org/abs/2512.00768
作者:Hamed Razavi
备注:9 pages, 4 tables
摘要:The fast growth of digital health systems has led to a need to better comprehend how they interpret and represent patient-reported symptoms. Chatbots have been used in healthcare to provide clinical support and enhance the user experience, making it possible to provide meaningful clinical patterns from text-based data through chatbots. The proposed research utilises several different natural language processing methods to study the occurrences of symptom descriptions in medicine as well as analyse the patterns that emerge through these conversations within medical bots. Through the use of the Medical Conversations to Disease Dataset which contains 960 multi-turn dialogues divided into 24 Clinical Conditions, a standardised representation of conversations between patient and bot is created for further analysis by computational means. The multi-method approach uses a variety of tools, including Latent Dirichlet Allocation (LDA) to identify latent symptom themes, K-Means to group symptom descriptions by similarity, Transformer-based Named Entity Recognition (NER) to extract medical concepts, and the Apriori algorithm to discover frequent symptom pairs. Findings from the analysis indicate a coherent structure of clinically relevant topics, moderate levels of clustering cohesiveness and several high confidence rates on the relationships between symptoms like fever headache and rash itchiness. The results support the notion that conversational medical data can be a valuable diagnostic signal for early symptom interpretation, assist in strengthening decision support and improve how users interact with tele-health technology. By demonstrating a method for converting unstructured free-flowing dialogue into actionable knowledge regarding symptoms this work provides an extensible framework to further enhance future performance, dependability and clinical utility of selecting medical chatbots.
【6】Statistical NLP for Optimization of Clinical Trial Success Prediction in Pharmaceutical R&D
标题:药物研发临床试验成功预测优化的统计NLP
链接:https://arxiv.org/abs/2512.00586
作者:Michael R. Doane
备注:Doctor of Engineering Praxis Dissertation, The George Washington University. 122 pages. Present affiliation: Iambic Therapeutics
摘要
:This work presents the development and evaluation of an NLP-enabled probabilistic classifier designed to estimate the probability of technical and regulatory success (pTRS) for clinical trials in the field of neuroscience. While pharmaceutical R&D is plagued by high attrition rates and enormous costs, particularly within neuroscience, where success rates are below 10%, timely identification of promising programs can streamline resource allocation and reduce financial risk. Leveraging data from the ClinicalTrials.gov database and success labels from the recently developed Clinical Trial Outcome dataset, the classifier extracts text-based clinical trial features using statistical NLP techniques. These features were integrated into several non-LLM frameworks (logistic regression, gradient boosting, and random forest) to generate calibrated probability scores. Model performance was assessed on a retrospective dataset of 101,145 completed clinical trials spanning 1976-2024, achieving an overall ROC-AUC of 0.64. An LLM-based predictive model was then built using BioBERT, a domain-specific language representation encoder. The BioBERT-based model achieved an overall ROC-AUC of 0.74 and a Brier Score of 0.185, indicating its predictions had, on average, 40% less squared error than would be observed using industry benchmarks. The BioBERT-based model also made trial outcome predictions that were superior to benchmark values 70% of the time overall. By integrating NLP-driven insights into drug development decision-making, this work aims to enhance strategic planning and optimize investment allocation in neuroscience programs.
【7】Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease
标题:慢性病纵向健康记录的隐私保护生成建模和临床验证
链接:https://arxiv.org/abs/2512.00434
作者:Benjamin D. Ballyk,Ankit Gupta,Sujay Konda,Kavitha Subramanian,Chris Landon,Ahmed Ammar Naseer,Georg Maierhofer,Sumanth Swaminathan,Vasudevan Venkateshwaran
备注:To appear in Proceedings of Machine Learning Research Volume 297 - Proceedings of ML4H 2025
摘要:Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinical records for training and integrating machine learning models that have shown promise in improving diagnostic accuracy and personalized care outcomes. Synthetic data offers a promising alternative; however, current generative models either struggle with time-series data or lack formal privacy guaranties. In this paper, we enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Using real data from chronic kidney disease and ICU patients, we evaluate our method through statistical tests, a Train-on-Synthetic-Test-on-Real (TSTR) setup, and expert clinical review. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets, while our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier. Both models achieve performance comparable to real data in clinician evaluations, providing robust input data necessary for developing models for complex chronic conditions without compromising data privacy.
【8】A Fast and Efficient Modern BERT based Text-Conditioned Diffusion Model for Medical Image Segmentation
标题:一种快速有效的现代BERT文本条件扩散模型医学图像分割
链接:https://arxiv.org/abs/2512.00084
作者:Venkata Siddharth Dhara,Pawan Kumar
备注:15 pages, 3 figures, Accepted in Slide 3 10th International Conference on Computer Vision & Image Processing (CVIP 2026)
摘要:In recent times, denoising diffusion probabilistic models (DPMs) have proven effective for medical image generation and denoising, and as representation learners for downstream segmentation. However, segmentation performance is limited by the need for dense pixel-wise labels, which are expensive, time-consuming, and require expert knowledge. We propose FastTextDiff, a label-efficient diffusion-based segmentation model that integrates medical text annotations to enhance semantic representations. Our approach uses ModernBERT, a transformer capable of processing long clinical notes, to tightly link textual annotations with semantic content in medical images. Trained on MIMIC-III and MIMIC-IV, ModernBERT encodes clinical knowledge that guides cross-modal attention between visual and textual features. This study validates ModernBERT as a fast, scalable alternative to Clinical BioBERT in diffusion-based segmentation pipelines and highlights the promise of multi-modal techniques for medical image analysis. By replacing Clinical BioBERT with ModernBERT, FastTextDiff benefits from FlashAttention 2, an alternating attention mechanism, and a 2-trillion-token corpus, improving both segmentation accuracy and training efficiency over traditional diffusion-based models.
【9】Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios
标题:多运动状态场景中六种可穿戴传感器模式的无袖带血压估计
链接:https://arxiv.org/abs/2512.01653
作者:Yiqiao Chen,Fazheng Xu,Zijian Huang,Juchi He,Zhenghui Feng
备注:13 pages, 7 figures
摘要:Cardiovascular disease (CVD) is a leading cause of morbidity and mortality worldwide, and sustained hypertension is an often silent risk factor, making cuffless continuous blood pressure (BP) monitoring with wearable devices important for early screening and long-term management. Most existing cuffless BP estimation methods use only photoplethysmography (PPG) and electrocardiography (ECG) signals, alone or in combination. These models are typically developed under resting or quasi-static conditions and struggle to maintain robust accuracy in multi-motion-state scenarios. In this study, we propose a six-modal BP estimation framework that jointly leverages ECG, multi-channel PPG, attachment pressure, sensor temperature, and triaxial acceleration and angular velocity. Each modality is processed by a lightweight branch encoder, contrastive learning enforces cross-modal semantic alignment, and a mixture-of-experts (MoE) regression head adaptively maps the fused features to BP across motion states. Comprehensive experiments on the public Pulse Transit Time PPG Dataset, which includes running, walking, and sitting data from 22 subjects, show that the proposed method achieves mean absolute errors (MAE) of 3.60 mmHg for systolic BP (SBP) and 3.01 mmHg for diastolic BP (DBP). From a clinical perspective, it attains Grade A for SBP, DBP, and mean arterial pressure (MAP) according to the British Hypertension Society (BHS) protocol and meets the numerical criteria of the Association for the Advancement of Medical Instrumentation (AAMI) standard for mean error (ME) and standard deviation of error (SDE).
【10】MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation
标题:MedCondDiff:用于医学图像分割的轻量级、稳健、语义引导扩散
链接:https://arxiv.org/abs/2512.00350
作者:Ruirui Huang,Jiacheng Li
摘要
:We introduce MedCondDiff, a diffusion-based framework for multi-organ medical image segmentation that is efficient and anatomically grounded. The model conditions the denoising process on semantic priors extracted by a Pyramid Vision Transformer (PVT) backbone, yielding a semantically guided and lightweight diffusion architecture. This design improves robustness while reducing both inference time and VRAM usage compared to conventional diffusion models. Experiments on multi-organ, multi-modality datasets demonstrate that MedCondDiff delivers competitive performance across anatomical regions and imaging modalities, underscoring the potential of semantically guided diffusion models as an effective class of architectures for medical imaging tasks.
蒸馏|知识提取(1篇)
【1】S^2-KD: Semantic-Spectral Knowledge Distillation Spatiotemporal Forecasting
标题:S#2-KD:语义谱知识蒸馏时空预测
链接:https://arxiv.org/abs/2512.00366
作者:Wenshuo Wang,Yaomin Shen,Yingjie Tan,Yihao Chen
摘要:Spatiotemporal forecasting often relies on computationally intensive models to capture complex dynamics. Knowledge distillation (KD) has emerged as a key technique for creating lightweight student models, with recent advances like frequency-aware KD successfully preserving spectral properties (i.e., high-frequency details and low-frequency trends). However, these methods are fundamentally constrained by operating on pixel-level signals, leaving them blind to the rich semantic and causal context behind the visual patterns. To overcome this limitation, we introduce S^2-KD, a novel framework that unifies Semantic priors with Spectral representations for distillation. Our approach begins by training a privileged, multimodal teacher model. This teacher leverages textual narratives from a Large Multimodal Model (LMM) to reason about the underlying causes of events, while its architecture simultaneously decouples spectral components in its latent space. The core of our framework is a new distillation objective that transfers this unified semantic-spectral knowledge into a lightweight, vision-only student. Consequently, the student learns to make predictions that are not only spectrally accurate but also semantically coherent, without requiring any textual input or architectural overhead at inference. Extensive experiments on benchmarks like WeatherBench and TaxiBJ+ show that S^2-KD significantly boosts the performance of simple student models, enabling them to outperform state-of-the-art methods, particularly in long-horizon and complex non-stationary scenarios.
聚类(7篇)
【1】Dynamic Algorithm for Explainable k-medians Clustering under lp Norm
标题:lp范数下可解释k-median聚类的动态算法
链接:https://arxiv.org/abs/2512.01150
作者:Konstantin Makarychev,Ilias Papanikolaou,Liren Shan
备注:36 pages, 3 figures, to appear in NeurIPS 2025
摘要:We study the problem of explainable k-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020). In this problem, the goal is to construct a threshold decision tree that partitions data into k clusters while minimizing the k-medians objective. These trees are interpretable because each internal node makes a simple decision by thresholding a single feature, allowing users to trace and understand how each point is assigned to a cluster. We present the first algorithm for explainable k-medians under lp norm for every finite p >= 1. Our algorithm achieves an O(p(log k)^{1 + 1/p - 1/p^2}) approximation to the optimal k-medians cost for any p >= 1. Previously, algorithms were known only for p = 1 and p = 2. For p = 2, our algorithm improves upon the existing bound of O(log^{3/2}k), and for p = 1, it matches the tight bound of log k + O(1) up to a multiplicative O(log log k) factor. We show how to implement our algorithm in a dynamic setting. The dynamic algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time O(d log^3 k) and O(log k) recourse, making it suitable for large-scale and evolving datasets.
【2】Hierarchical Semantic Alignment for Image Clustering
标题:图像集群的分层语义对齐
链接:https://arxiv.org/abs/2512.00904
作者:Xingyu Zhu,Beier Zhu,Yunfan Li,Junfeng Fang,Shuo Wang,Kesen Zhao,Hanwang Zhang
备注:AAAI 2026
摘要:Image clustering is a classic problem in computer vision, which categorizes images into different groups. Recent studies utilize nouns as external semantic knowledge to improve clus- tering performance. However, these methods often overlook the inherent ambiguity of nouns, which can distort semantic representations and degrade clustering quality. To address this issue, we propose a hierarChical semAntic alignmEnt method for image clustering, dubbed CAE, which improves cluster- ing performance in a training-free manner. In our approach, we incorporate two complementary types of textual seman- tics: caption-level descriptions, which convey fine-grained attributes of image content, and noun-level concepts, which represent high-level object categories. We first select relevant nouns from WordNet and descriptions from caption datasets to construct a semantic space aligned with image features. Then, we align image features with selected nouns and captions via optimal transport to obtain a more discriminative semantic space. Finally, we combine the enhanced semantic and image features to perform clustering. Extensive experiments across 8 datasets demonstrate the effectiveness of our method, notably surpassing the state-of-the-art training-free approach with a 4.2% improvement in accuracy and a 2.9% improvement in adjusted rand index (ARI) on the ImageNet-1K dataset.
【3】Topological Federated Clustering via Gravitational Potential Fields under Local Differential Privacy
标题:局部差分隐私下基于引力势场的拓扑联邦聚类
链接:https://arxiv.org/abs/2512.00849
作者:Yunbo Long,Jiaquan Zhang,Xi Chen,Alexandra Brintrup
摘要
:Clustering non-independent and identically distributed (non-IID) data under local differential privacy (LDP) in federated settings presents a critical challenge: preserving privacy while maintaining accuracy without iterative communication. Existing one-shot methods rely on unstable pairwise centroid distances or neighborhood rankings, degrading severely under strong LDP noise and data heterogeneity. We present Gravitational Federated Clustering (GFC), a novel approach to privacy-preserving federated clustering that overcomes the limitations of distance-based methods under varying LDP. Addressing the critical challenge of clustering non-IID data with diverse privacy guarantees, GFC transforms privatized client centroids into a global gravitational potential field where true cluster centers emerge as topologically persistent singularities. Our framework introduces two key innovations: (1) a client-side compactness-aware perturbation mechanism that encodes local cluster geometry as "mass" values, and (2) a server-side topological aggregation phase that extracts stable centroids through persistent homology analysis of the potential field's superlevel sets. Theoretically, we establish a closed-form bound between the privacy budget $ε$ and centroid estimation error, proving the potential field's Lipschitz smoothing properties exponentially suppress noise in high-density regions. Empirically, GFC outperforms state-of-the-art methods on ten benchmarks, especially under strong LDP constraints ($ε< 1$), while maintaining comparable performance at lower privacy budgets. By reformulating federated clustering as a topological persistence problem in a synthetic physics-inspired space, GFC achieves unprecedented privacy-accuracy trade-offs without iterative communication, providing a new perspective for privacy-preserving distributed learning.
【4】ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering
标题:ESMC:基于MLLM的可解释多重集群嵌入选择
链接:https://arxiv.org/abs/2512.00725
作者:Xinyue Wang,Yuheng Jia,Hui Liu,Junhui Hou
摘要:Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data distribution, which may fail to meet user needs and provide unsatisfactory clustering outcomes. Our work investigates how multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering, emphasizing their adaptability to user-specified semantic requirements. However, directly using MLLM output for clustering has risks for producing unstructured and generic image descriptions instead of feature-specific and concrete ones. To address these issues, our method first discovers that MLLMs' hidden states of text tokens are strongly related to the corresponding features, and leverages these embeddings to perform clusterings from any user-defined criteria. We also employ a lightweight clustering head augmented with pseudo-label learning, significantly enhancing clustering accuracy. Extensive experiments demonstrate its competitive performance on diverse datasets and metrics.
【5】Hyperbolic Continuous Structural Entropy for Hierarchical Clustering
标题:层次集群的双曲连续结构信息
链接:https://arxiv.org/abs/2512.00524
作者:Guangjie Zeng,Hao Peng,Angsheng Li,Li Sun,Chunyang Liu,Shengze Li,Yicheng Pan,Philip S. Yu
备注:14 pages, accepted by AAAI 2026
摘要:Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms. However, existing hierarchical clustering methods encounter two primary challenges: 1) Most methods specify dendrograms without a global objective. 2) Graph-based methods often neglect the significance of graph structure, optimizing objectives on complete or static predefined graphs. In this work, we propose Hyperbolic Continuous Structural Entropy neural networks, namely HypCSE, for structure-enhanced continuous hierarchical clustering. Our key idea is to map data points in the hyperbolic space and minimize the relaxed continuous structural entropy (SE) on structure-enhanced graphs. Specifically, we encode graph vertices in hyperbolic space using hyperbolic graph neural networks and minimize approximate SE defined on graph embeddings. To make the SE objective differentiable for optimization, we reformulate it into a function using the lowest common ancestor (LCA) on trees and then relax it into continuous SE (CSE) by the analogy of hyperbolic graph embeddings and partitioning trees. To ensure a graph structure that effectively captures the hierarchy of data points for CSE calculation, we employ a graph structure learning (GSL) strategy that updates the graph structure during training. Extensive experiments on seven datasets demonstrate the superior performance of HypCSE.
【6】Hybrid Context-Fusion Attention (CFA) U-Net and Clustering for Robust Seismic Horizon Interpretation
标题:混合上下文融合注意力(CFA)U-Net和集群用于稳健的地震层位解释
链接:https://arxiv.org/abs/2512.00191
作者:Jose Luis Lima de Jesus Silva,Joao Pedro Gomes,Paulo Roberto de Melo Barros Junior,Vitor Hugo Serravalle Reis Rodrigues,Alexsandro Guerra Cerqueira
摘要:Interpreting seismic horizons is a critical task for characterizing subsurface structures in hydrocarbon exploration. Recent advances in deep learning, particularly U-Net-based architectures, have significantly improved automated horizon tracking. However, challenges remain in accurately segmenting complex geological features and interpolating horizons from sparse annotations. To address these issues, a hybrid framework is presented that integrates advanced U-Net variants with spatial clustering to enhance horizon continuity and geometric fidelity. The core contribution is the Context Fusion Attention (CFA) U-Net, a novel architecture that fuses spatial and Sobel-derived geometric features within attention gates to improve both precision and surface completeness. The performance of five architectures, the U-Net (Standard and compressed), U-Net++, Attention U-Net, and CFA U-Net, was systematically evaluated across various data sparsity regimes (10-, 20-, and 40-line spacing). This approach outperformed existing baselines, achieving state-of-the-art results on the Mexilhao field (Santos Basin, Brazil) dataset with a validation IoU of 0.881 and MAE of 2.49ms, and excellent surface coverage of 97.6% on the F3 Block of the North Sea dataset under sparse conditions. The framework further refines merged horizon predictions (inline and cross-line) using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to produce geologically plausible surfaces. The results demonstrate the advantages of hybrid methodologies and attention-based architectures enhanced with geometric context, providing a robust and generalizable solution for seismic interpretation in structurally complex and data-scarce environments.
【7】An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis
标题:变量聚集的一种方法:转置数据中的K均值及其与主成分分析的关系
链接:https://arxiv.org/abs/2512.00979
作者:Victor Saquicela,Kenneth Palacio-Baus,Mario Chifla
备注:Presented at conference and to appear in the proceedings of the 2025 IEEE Chilean Conference on Electrical, Electronics Engineering, Information and Communication Technologies (ChileCon)
摘要:Principal Component Analysis (PCA) and K-means constitute fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially when K-means is used to cluster variables rather than observations, has been scarcely explored. This study seeks to address this gap by proposing an innovative method that analyzes the relationship between clusters of variables obtained by applying K-means on transposed data and the principal components of PCA. Our approach involves applying PCA to the original data and K-means to the transposed data set, where the original variables are converted into observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. This process provides a tool to explore and understand the clustering of variables and how such clusters contribute to the principal dimensions of variation identified by PCA.
超分辨率|去噪|去模糊|去雾(1篇)
【1】Consistency Flow Model Achieves One-step Denoising Error Correction Codes
标题:一致性流模型实现一步去噪错误纠正码
链接:https://arxiv.org/abs/2512.01389
作者:Haoyu Lei,Chin Wa Lau,Kaiwen Zhou,Nian Guo,Farzan Farnia
摘要:Error Correction Codes (ECC) are fundamental to reliable digital communication, yet designing neural decoders that are both accurate and computationally efficient remains challenging. Recent denoising diffusion decoders with transformer backbones achieve state-of-the-art performance, but their iterative sampling limits practicality in low-latency settings. We introduce the Error Correction Consistency Flow Model (ECCFM), an architecture-agnostic training framework for high-fidelity one-step decoding. By casting the reverse denoising process as a Probability Flow Ordinary Differential Equation (PF-ODE) and enforcing smoothness through a differential time regularization, ECCFM learns to map noisy signals along the decoding trajectory directly to the original codeword in a single inference step. Across multiple decoding benchmarks, ECCFM attains lower bit-error rates (BER) than autoregressive and diffusion-based baselines, with notable improvements on longer codes, while delivering inference speeds up from 30x to 100x faster than denoising diffusion decoders.
自动驾驶|车辆|车道检测等(3篇)
【1】New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles
标题:用于自动驾驶车辆多模式决策的新尖峰架构
链接:https://arxiv.org/abs/2512.01882
作者:Aref Ghoreishee,Abhishek Mishra,Lifeng Zhou,John Walsh,Nagarajan Kandasamy
摘要:This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, including camera images, LiDAR point clouds, and vehicle heading information, through a cross-attention transformer-based perception module. Although transformers have become the backbone of modern multi-modal architectures, their high computational cost limits their deployment in resource-constrained edge environments. To overcome this challenge, we propose a spiking temporal-aware transformer-like architecture that uses ternary spiking neurons for computationally efficient multi-modal fusion. Comprehensive evaluations across multiple tasks in the Highway Environment demonstrate the effectiveness and efficiency of the proposed approach for real-time autonomous decision-making.
【2】City-Conditioned Memory for Multi-City Traffic and Mobility Forecasting
标题:多城市交通和移动性预测的城市条件记忆
链接:https://arxiv.org/abs/2512.00851
作者:Wenzhang Du
摘要:Deploying spatio-temporal forecasting models across many cities is difficult: traffic networks differ in size and topology, data availability can vary by orders of magnitude, and new cities may provide only a short history of logs. Existing deep traffic models are typically trained per city and backbone, creating high maintenance cost and poor transfer to data-scarce cities. We ask whether a single, backbone-agnostic layer can condition on "which city this sequence comes from", improve accuracy in full- and low-data regimes, and support better cross-city adaptation with minimal code changes. We propose CityCond, a light-weight city-conditioned memory layer that augments existing spatio-temporal backbones. CityCond combines a city-ID encoder with an optional shared memory bank (CityMem). Given a city index and backbone hidden states, it produces city-conditioned features fused through gated residual connections. We attach CityCond to five representative backbones (GRU, TCN, Transformer, GNN, STGCN) and evaluate three regimes: full-data, low-data, and cross-city few-shot transfer on METR-LA and PEMS-BAY. We also run auxiliary experiments on SIND, a drone-based multi-agent trajectory dataset from a signalized intersection in Tianjin (we focus on pedestrian tracks). Across more than fourteen model variants and three random seeds, CityCond yields consistent improvements, with the largest gains for high-capacity backbones such as Transformers and STGCNs. CityMem reduces Transformer error by roughly one third in full-data settings and brings substantial gains in low-data and cross-city transfer. On SIND, simple city-ID conditioning modestly improves low-data LSTM performance. CityCond can therefore serve as a reusable design pattern for scalable, multi-city forecasting under realistic data constraints.
【3】Data-Driven Modeling and Correction of Vehicle Dynamics
标题:车辆动力学的数据驱动建模和修正
链接:https://arxiv.org/abs/2512.00289
作者
:Nguyen Ly,Caroline Tatsuoka,Jai Nagaraj,Jacob Levy,Fernando Palafox,David Fridovich-Keil,Hannah Lu
摘要:We develop a data-driven framework for learning and correcting non-autonomous vehicle dynamics. Physics-based vehicle models are often simplified for tractability and therefore exhibit inherent model-form uncertainty, motivating the need for data-driven correction. Moreover, non-autonomous dynamics are governed by time-dependent control inputs, which pose challenges in learning predictive models directly from temporal snapshot data. To address these, we reformulate the vehicle dynamics via a local parameterization of the time-dependent inputs, yielding a modified system composed of a sequence of local parametric dynamical systems. We approximate these parametric systems using two complementary approaches. First, we employ the DRIPS (dimension reduction and interpolation in parameter space) methodology to construct efficient linear surrogate models, equipped with lifted observable spaces and manifold-based operator interpolation. This enables data-efficient learning of vehicle models whose dynamics admit accurate linear representations in the lifted spaces. Second, for more strongly nonlinear systems, we employ FML (Flow Map Learning), a deep neural network approach that approximates the parametric evolution map without requiring special treatment of nonlinearities. We further extend FML with a transfer-learning-based model correction procedure, enabling the correction of misspecified prior models using only a sparse set of high-fidelity or experimental measurements, without assuming a prescribed form for the correction term. Through a suite of numerical experiments on unicycle, simplified bicycle, and slip-based bicycle models, we demonstrate that DRIPS offers robust and highly data-efficient learning of non-autonomous vehicle dynamics, while FML provides expressive nonlinear modeling and effective correction of model-form errors under severe data scarcity.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting Systems
标题:高效的边缘兼容CNN用于激光切割系统中基于斑点的材料识别
链接:https://arxiv.org/abs/2512.00179
作者:Mohamed Abdallah Salem,Nourhan Zein Diab
备注:Copyright 2025 IEEE. This is the author's version of the work that has been Accepted for publication in the Proceedings of the 2025 IEEE The 35th International Conference on Computer Theory and Applications (ICCTA 2025). Final published version will be available on IEEE Xplore
摘要:Accurate material recognition is critical for safe and effective laser cutting, as misidentification can lead to poor cut quality, machine damage, or the release of hazardous fumes. Laser speckle sensing has recently emerged as a low-cost and non-destructive modality for material classification; however, prior work has either relied on computationally expensive backbone networks or addressed only limited subsets of materials. In this study, A lightweight convolutional neural network (CNN) tailored for speckle patterns is proposed, designed to minimize parameters while maintaining high discriminative power. Using the complete SensiCut dataset of 59 material classes spanning woods, acrylics, composites, textiles, metals, and paper-based products, the proposed model achieves 95.05% test accuracy, with macro and weighted F1-scores of 0.951. The network contains only 341k trainable parameters (~1.3 MB) -- over 70X fewer than ResNet-50 -- and achieves an inference speed of 295 images per second, enabling deployment on Raspberry Pi and Jetson-class devices. Furthermore, when materials are regrouped into nine and five practical families, recall exceeds 98% and approaches 100%, directly supporting power and speed preset selection in laser cutters. These results demonstrate that compact, domain-specific CNNs can outperform large backbones for speckle-based material classification, advancing the feasibility of material-aware, edge-deployable laser cutting systems.
联邦学习|隐私保护|加密(4篇)
【1】Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning
标题:用于能量收集联邦学习的基于任务的语义感知调度
链接:https://arxiv.org/abs/2512.01983
作者:Eunjeong Jeong,Giovanni Perin,Howard H. Yang,Nikolaos Pappas
备注:This paper is currently under review for presentation at a peer-reviewed conference
摘要:Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. However, most existing Energy-Harvesting FL (EHFL) strategies fail to account for this reality, resulting in wasted energy due to redundant local computations. For efficient and proactive resource management, algorithms that predict local update contributions must be devised. We propose a lightweight client scheduling framework using the Version Age of Information (VAoI), a semantics-aware metric that quantifies update timeliness and significance. Crucially, we overcome VAoI's typical prohibitive computational cost, which requires statistical distance over the entire parameter space, by introducing a feature-based proxy. This proxy estimates model redundancy using intermediate-layer extraction from a single forward pass, dramatically reducing computational complexity. Experiments conducted under extreme non-IID data distributions and scarce energy availability demonstrate superior learning performance while achieving energy reduction compared to existing baseline selection policies. Our framework establishes semantics-aware scheduling as a practical and vital solution for EHFL in realistic scenarios where training costs dominate transmission costs.
【2】Operator-Theoretic Framework for Gradient-Free Federated Learning
标题:无委托联邦学习的操作员理论框架
链接:https://arxiv.org/abs/2512.01025
作者:Mohit Kumar,Mathias Brucker,Alexander Valentinitsch,Adnan Husakovic,Ali Abbas,Manuela Geiß,Bernhard A. Moser
摘要
:Federated learning must address heterogeneity, strict communication and computation limits, and privacy while ensuring performance. We propose an operator-theoretic framework that maps the $L^2$-optimal solution into a reproducing kernel Hilbert space (RKHS) via a forward operator, approximates it using available data, and maps back with the inverse operator, yielding a gradient-free scheme. Finite-sample bounds are derived using concentration inequalities over operator norms, and the framework identifies a data-dependent hypothesis space with guarantees on risk, error, robustness, and approximation. Within this space we design efficient kernel machines leveraging the space folding property of Kernel Affine Hull Machines. Clients transfer knowledge via a scalar space folding measure, reducing communication and enabling a simple differentially private protocol: summaries are computed from noise-perturbed data matrices in one step, avoiding per-round clipping and privacy accounting. The induced global rule requires only integer minimum and equality-comparison operations per test point, making it compatible with fully homomorphic encryption (FHE). Across four benchmarks, the gradient-free FL method with fixed encoder embeddings matches or outperforms strong gradient-based fine-tuning, with gains up to 23.7 points. In differentially private experiments, kernel smoothing mitigates accuracy loss in high-privacy regimes. The global rule admits an FHE realization using $Q \times C$ encrypted minimum and $C$ equality-comparison operations per test point, with operation-level benchmarks showing practical latencies. Overall, the framework provides provable guarantees with low communication, supports private knowledge transfer via scalar summaries, and yields an FHE-compatible prediction rule offering a mathematically grounded alternative to gradient-based federated learning under heterogeneity.
【3】Prediction-space knowledge markets for communication-efficient federated learning on multimedia tasks
标题:用于多媒体任务的通信高效联合学习的预测空间知识市场
链接:https://arxiv.org/abs/2512.00841
作者:Wenzhang Du
备注:13 pages, 3 figures
摘要:Federated learning (FL) enables collaborative training over distributed multimedia data but suffers acutely from statistical heterogeneity and communication constraints, especially when clients deploy large models. Classic parameter-averaging methods such as FedAvg transmit full model weights and can diverge under nonindependent and identically distributed (non-IID) data. We propose KTA v2, a prediction-space knowledge trading market for FL. Each round, clients locally train on their private data, then share only logits on a small public reference set. The server constructs a client-client similarity graph in prediction space, combines it with reference-set accuracy to form per-client teacher ensembles, and sends back personalized soft targets for a second-stage distillation update. This two-stage procedure can be interpreted as approximate block-coordinate descent on a unified objective with prediction-space regularization. Experiments on FEMNIST, CIFAR-10 and AG News show that, under comparable or much lower communication budgets, KTA v2 consistently outperforms a local-only baseline and strong parameter-based methods (FedAvg, FedProx), and substantially improves over a FedMD-style global teacher. On CIFAR-10 with ResNet-18, KTA v2 reaches 57.7% test accuracy using approximately 1/1100 of FedAvg's communication, while on AG News it attains 89.3% accuracy with approximately 1/300 of FedAvg's traffic.
【4】Differentially Private and Federated Structure Learning in Bayesian Networks
标题:Bayesian网络中的差异私有和联邦结构学习
链接:https://arxiv.org/abs/2512.01708
作者:Ghita Fassy El Fehri,Aurélien Bellet,Philippe Bastien
摘要:Learning the structure of a Bayesian network from decentralized data poses two major challenges: (i) ensuring rigorous privacy guarantees for participants, and (ii) avoiding communication costs that scale poorly with dimensionality. In this work, we introduce Fed-Sparse-BNSL, a novel federated method for learning linear Gaussian Bayesian network structures that addresses both challenges. By combining differential privacy with greedy updates that target only a few relevant edges per participant, Fed-Sparse-BNSL efficiently uses the privacy budget while keeping communication costs low. Our careful algorithmic design preserves model identifiability and enables accurate structure estimation. Experiments on synthetic and real datasets demonstrate that Fed-Sparse-BNSL achieves utility close to non-private baselines while offering substantially stronger privacy and communication efficiency.
推理|分析|理解|解释(21篇)
【1】KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference
标题:KV Pareto:用于长上下文推理的KV缓存和模型压缩的系统级优化
链接:https://arxiv.org/abs/2512.01953
作者:Sai Gokhale,Devleena Das,Rajeev Patwari,Ashish Sirasao,Elliott Delaye
摘要:Long-context Large Language Models (LLMs) face significant memory bottlenecks during inference due to the linear growth of key-value (KV) cache with sequence length. While individual optimization techniques like KV cache quantization, chunked prefill, and model weight quantization have shown promise, their joint effects and optimal configurations for edge deployment remain underexplored. We introduce KV Pareto, a systems-level framework that systematically maps the trade-off frontier between total memory consumption and task accuracy across these three complementary optimization techniques. Our framework evaluates multiple LLM architectures (Qwen, Llama, Mistral) with varying KV quantization schemes (int2/4/8, mixed-precision), granularities (per-token, per-tensor, per-block), and 4-bit weight quantization via AWQ. Our framework identifies model-specific Pareto-optimal configurations that achieve 68-78% total memory reduction with minimal (1-3%) accuracy degradation on long-context tasks. We additionally verify the selected frontiers on additional benchmarks of Needle-in-a-Haystack, GSM8k and MMLU as well as extended context lengths of up to 128k to demonstrate the practical need of joint optimization for efficient LLM inference.
【2】Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model
标题:通过具有时间分层世界模型的深度主动推理实现现实世界机器人控制
链接:https://arxiv.org/abs/2512.01924
作者:Kentaro Fujii,Shingo Murata
备注:Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
摘要
:Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
【3】Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
标题:解构代际多样性:离散潜在代际模型的信息瓶颈分析
链接:https://arxiv.org/abs/2512.01831
作者:Yudi Wu,Wenhao Zhao,Dianbo Liu
摘要:Generative diversity varies significantly across discrete latent generative models such as AR, MIM, and Diffusion. We propose a diagnostic framework, grounded in Information Bottleneck (IB) theory, to analyze the underlying strategies resolving this behavior. The framework models generation as a conflict between a 'Compression Pressure' - a drive to minimize overall codebook entropy - and a 'Diversity Pressure' - a drive to maximize conditional entropy given an input. We further decompose this diversity into two primary sources: 'Path Diversity', representing the choice of high-level generative strategies, and 'Execution Diversity', the randomness in executing a chosen strategy. To make this decomposition operational, we introduce three zero-shot, inference-time interventions that directly perturb the latent generative process and reveal how models allocate and express diversity. Application of this probe-based framework to representative AR, MIM, and Diffusion systems reveals three distinct strategies: "Diversity-Prioritized" (MIM), "Compression-Prioritized" (AR), and "Decoupled" (Diffusion). Our analysis provides a principled explanation for their behavioral differences and informs a novel inference-time diversity enhancement technique.
【4】DeepCAVE: A Visualization and Analysis Tool for Automated Machine Learning
标题:DeepCAVE:自动机器学习的可视化和分析工具
链接:https://arxiv.org/abs/2512.01810
作者:Sarah Segel,Helena Graf,Edward Bergman,Kristina Thieme,Marcel Wever,Alexander Tornede,Frank Hutter,Marius Lindauer
摘要:Hyperparameter optimization (HPO), as a central paradigm of AutoML, is crucial for leveraging the full potential of machine learning (ML) models; yet its complexity poses challenges in understanding and debugging the optimization process. We present DeepCAVE, a tool for interactive visualization and analysis, providing insights into HPO. Through an interactive dashboard, researchers, data scientists, and ML engineers can explore various aspects of the HPO process and identify issues, untouched potentials, and new insights about the ML model being tuned. By empowering users with actionable insights, DeepCAVE contributes to the interpretability of HPO and ML on a design level and aims to foster the development of more robust and efficient methodologies in the future.
【5】Enhancing BERT Fine-Tuning for Sentiment Analysis in Lower-Resourced Languages
标题:增强BERT微调以用于资源较少的语言的情感分析
链接:https://arxiv.org/abs/2512.01460
作者:Jozef Kubík,Marek Šuppa,Martin Takáč
摘要:Limited data for low-resource languages typically yield weaker language models (LMs). Since pre-training is compute-intensive, it is more pragmatic to target improvements during fine-tuning. In this work, we examine the use of Active Learning (AL) methods augmented by structured data selection strategies which we term 'Active Learning schedulers', to boost the fine-tuning process with a limited amount of training data. We connect the AL to data clustering and propose an integrated fine-tuning pipeline that systematically combines AL, clustering, and dynamic data selection schedulers to enhance model's performance. Experiments in the Slovak, Maltese, Icelandic and Turkish languages show that the use of clustering during the fine-tuning phase together with AL scheduling can simultaneously produce annotation savings up to 30% and performance improvements up to four F1 score points, while also providing better fine-tuning stability.
【6】A Self-explainable Model of Long Time Series by Extracting Informative Structured Causal Patterns
标题:提取信息结构化因果模式的长时间序列可自我解释模型
链接:https://arxiv.org/abs/2512.01412
作者:Ziqian Wang,Yuxiao Cheng,Jinli Suo
备注:Approximately 30 pages, 8 figures, and 5 tables. Preprint version. Includes theoretical analysis, model architecture, interpretability evaluation, and extensive benchmark experiments
摘要
:Explainability is essential for neural networks that model long time series, yet most existing explainable AI methods only produce point-wise importance scores and fail to capture temporal structures such as trends, cycles, and regime changes. This limitation weakens human interpretability and trust in long-horizon models. To address these issues, we identify four key requirements for interpretable time-series modeling: temporal continuity, pattern-centric explanation, causal disentanglement, and faithfulness to the model's inference process. We propose EXCAP, a unified framework that satisfies all four requirements. EXCAP combines an attention-based segmenter that extracts coherent temporal patterns, a causally structured decoder guided by a pre-trained causal graph, and a latent aggregation mechanism that enforces representation stability. Our theoretical analysis shows that EXCAP provides smooth and stable explanations over time and is robust to perturbations in causal masks. Extensive experiments on classification and forecasting benchmarks demonstrate that EXCAP achieves strong predictive accuracy while generating coherent and causally grounded explanations. These results show that EXCAP offers a principled and scalable approach to interpretable modeling of long time series with relevance to high-stakes domains such as healthcare and finance.
【7】Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
标题:用稀疏自推测解码加速大规模推理模型的推理
链接:https://arxiv.org/abs/2512.01278
作者:Yilong Zhao,Jiaming Tang,Kan Zhu,Zihao Ye,Chi-Chih Chang,Chaofan Lin,Jongseok Park,Guangxuan Xiao,Mohamed S. Abdelfattah,Mingyu Gao,Baris Kasikci,Song Han,Ion Stoica
摘要:Reasoning language models have demonstrated remarkable capabilities on challenging tasks by generating elaborate chain-of-thought (CoT) solutions. However, such lengthy generation shifts the inference bottleneck from compute-bound to memory-bound. To generate each token, the model applies full attention to all previously generated tokens, requiring memory access to an increasingly large KV-Cache. Consequently, longer generations demand more memory access for every step, leading to substantial pressure on memory bandwidth. To address this, we introduce SparseSpec, a speculative decoding framework that reuses the same model as the draft and target models (i.e., self-speculation). SparseSpec features a novel sparse attention mechanism, PillarAttn, as the draft model, which accurately selects critical tokens via elegantly reusing information from the verification stage. Furthermore, SparseSpec co-designs self-speculation with three system innovations: (1) a unified scheduler to batch token drafting and verification, (2) delayed verification for CPU/GPU overlap, and (3) dynamic KV-Cache management to maximize memory utilization. Across various models and datasets, SparseSpec outperforms state-of-the-art solutions, with an up to 2.13x throughput speedup.
【8】SUPERChem: A Multimodal Reasoning Benchmark in Chemistry
标题:SUPERChem:化学中的多模式推理基准
链接:https://arxiv.org/abs/2512.01274
作者:Zehua Zhao,Zhixian Huang,Junren Li,Siyu Lin,Junting Zhou,Fengqi Cao,Kun Zhou,Rui Ge,Tingting Long,Yuexiang Zhu,Yan Liu,Jie Zheng,Junnian Wei,Rong Zhu,Peng Zou,Wenyu Li,Zekai Cheng,Tian Ding,Yaxuan Wang,Yizhao Yan,Tingru Wei,Haowei Ming,Weijie Mao,Chen Sun,Yiming Liu,Zichen Wang,Zuo Zhang,Tong Yang,Hao Ma,Zhen Gao,Jian Pei
备注:35 pages, 11 figures, 5 tables
摘要:Current benchmarks for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) are limited by oversimplified tasks, lack of process-level evaluation, and misalignment with expert-level chemistry skills. To address these issues, we introduce SUPERChem, a benchmark of 500 expert-curated reasoning-intensive chemistry problems, covering diverse subfields and provided in both multimodal and text-only formats. Original content and an iterative curation pipeline eliminate flawed items and mitigate data contamination. Each problem is paired with an expert-authored solution path, enabling Reasoning Path Fidelity (RPF) scoring to evaluate reasoning quality beyond final-answer accuracy. Evaluations against a human baseline of 40.3% accuracy show that even the best-performing model, GPT-5 (High), reaches only 38.5%, followed closely by Gemini 2.5 Pro (37.9%) and DeepSeek-V3.1-Think (37.3%). SUPERChem elicits multi-step, multimodal reasoning, reveals model-dependent effects of visual information, and distinguishes high-fidelity reasoners from heuristic ones. By providing a challenging benchmark and a reliable evaluation framework, SUPERChem aims to facilitate the advancement of LLMs toward expert-level chemical intelligence. The dataset of the benchmark is available at https://huggingface.co/datasets/ZehuaZhao/SUPERChem.
【9】Research on Milling Machine Predictive Maintenance Based on Machine Learning and SHAP Analysis in Intelligent Manufacturing Environment
标题:智能制造环境下基于机器学习和SHAP分析的磨机预测性维护研究
链接:https://arxiv.org/abs/2512.01205
作者:Wen Zhao,Jiawen Ding,Xueting Huang,Yibo Zhang
备注:5 pages, 5 figures. Accepted for publication at ICEIEC 2025 (not yet published)
摘要:In the context of intelligent manufacturing, this paper conducts a series of experimental studies on the predictive maintenance of industrial milling machine equipment based on the AI4I 2020 dataset. This paper proposes a complete predictive maintenance experimental process combining artificial intelligence technology, including six main links: data preprocessing, model training, model evaluation, model selection, SHAP analysis, and result visualization. By comparing and analyzing the performance of eight machine learning models, it is found that integrated learning methods such as XGBoost and random forest perform well in milling machine fault prediction tasks. In addition, with the help of SHAP analysis technology, the influence mechanism of different features on equipment failure is deeply revealed, among which processing temperature, torque and speed are the key factors affecting failure. This study combines artificial intelligence and manufacturing technology, provides a methodological reference for predictive maintenance practice in an intelligent manufacturing environment, and has practical significance for promoting the digital transformation of the manufacturing industry, improving production efficiency and reducing maintenance costs.
【10】2D-ThermAl: Physics-Informed Framework for Thermal Analysis of Circuits using Generative AI
标题:2D-ThermAl:使用生成AI进行电路热分析的物理信息框架
链接:https://arxiv.org/abs/2512.01163
作者:Soumyadeep Chandra,Sayeed Shafayet Chowdhury,Kaushik Roy
备注:10 pages, 8 figures, Under Review
摘要
:Thermal analysis is increasingly critical in modern integrated circuits, where non-uniform power dissipation and high transistor densities can cause rapid temperature spikes and reliability concerns. Traditional methods, such as FEM-based simulations offer high accuracy but computationally prohibitive for early-stage design, often requiring multiple iterative redesign cycles to resolve late-stage thermal failures. To address these challenges, we propose 'ThermAl', a physics-informed generative AI framework which effectively identifies heat sources and estimates full-chip transient and steady-state thermal distributions directly from input activity profiles. ThermAl employs a hybrid U-Net architecture enhanced with positional encoding and a Boltzmann regularizer to maintain physical fidelity. Our model is trained on an extensive dataset of heat dissipation maps, ranging from simple logic gates (e.g., inverters, NAND, XOR) to complex designs, generated via COMSOL. Experimental results demonstrate that ThermAl delivers precise temperature mappings for large circuits, with a root mean squared error (RMSE) of only 0.71°C, and outperforms conventional FEM tools by running up to ~200 times faster. We analyze performance across diverse layouts and workloads, and discuss its applicability to large-scale EDA workflows. While thermal reliability assessments often extend beyond 85°C for post-layout signoff, our focus here is on early-stage hotspot detection and thermal pattern learning. To ensure generalization beyond the nominal operating range 25-55°C, we additionally performed cross-validation on an extended dataset spanning 25-95°C maintaining a high accuracy (<2.2% full-scale RMSE) even under elevated temperature conditions representative of peak power and stress scenarios.
【11】Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning
标题:有效学习分支网络以实现多任务数学推理
链接:https://arxiv.org/abs/2512.01113
作者:Dongyue Li,Zhenshuo Zhang,Minxuan Duan,Edgar Dobriban,Hongyang R. Zhang
备注:31 pages. Preprint, to appear in KDD'26
摘要:Algorithmic reasoning -- the ability to perform step-by-step logical inference -- has become a core benchmark for evaluating reasoning in graph neural networks (GNNs) and large language models (LLMs). Ideally, one would like to design a single model capable of performing well on multiple algorithmic reasoning tasks simultaneously. However, this is challenging when the execution steps of algorithms differ from one another, causing negative interference when they are trained together. We propose branching neural networks, a principled architecture for multitask algorithmic reasoning. Searching for the optimal $k$-ary tree with $L$ layers over $n$ algorithmic tasks is combinatorial, requiring exploration of up to $k^{nL}$ possible structures. We develop AutoBRANE, an efficient algorithm that reduces this search to $O(nL)$ time by solving a convex relaxation at each layer to approximate an optimal task partition. The method clusters tasks using gradient-based affinity scores and can be used on top of any base model, including GNNs and LLMs. We validate AutoBRANE on a broad suite of graph-algorithmic and text-based reasoning benchmarks. We show that gradient features estimate true task performance within 5% error across four GNNs and four LLMs (up to 34B parameters). On the CLRS benchmark, it outperforms the strongest single multitask GNN by 3.7% and the best baseline by 1.2%, while reducing runtime by 48% and memory usage by 26%. The learned branching structures reveal an intuitively reasonable hierarchical clustering of related algorithms. On three text-based graph reasoning benchmarks, AutoBRANE improves over the best non-branching multitask baseline by 3.2%. Finally, on a large graph dataset with 21M edges and 500 tasks, AutoBRANE achieves a 28% accuracy gain over existing multitask and branching architectures, along with a 4.5$\times$ reduction in runtime.
【12】VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
标题:VLASH:通过未来状态感知同步推理的实时VLA
链接:https://arxiv.org/abs/2512.01031
作者:Jiaming Tang,Yufei Sun,Yilong Zhao,Shang Yang,Yujun Lin,Zhuoyang Zhang,James Hou,Yao Lu,Zhijian Liu,Song Han
摘要:Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos are often sped up by 5-10x to appear smooth, with noticeable action stalls and delayed reactions to environmental changes. Asynchronous inference offers a promising solution to achieve continuous and low-latency control by enabling robots to execute actions and perform inference simultaneously. However, because the robot and environment continue to evolve during inference, a temporal misalignment arises between the prediction and execution intervals. This leads to significant action instability, while existing methods either degrade accuracy or introduce runtime overhead to mitigate it. We propose VLASH, a general asynchronous inference framework for VLAs that delivers smooth, accurate, and fast reaction control without additional overhead or architectural changes. VLASH estimates the future execution-time state by rolling the robot state forward with the previously generated action chunk, thereby bridging the gap between prediction and execution. Experiments show that VLASH achieves up to 2.03x speedup and reduces reaction latency by up to 17.4x compared to synchronous inference while fully preserving the original accuracy. Moreover, it empowers VLAs to handle fast-reaction, high-precision tasks such as playing ping-pong and playing whack-a-mole, where traditional synchronous inference fails. Code is available at https://github.com/mit-han-lab/vlash
【13】Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis
标题:通过遵循指示的意图分析减轻间接提示注射
链接:https://arxiv.org/abs/2512.00966
作者:Mintong Kang,Chong Xiang,Sanjay Kariyappa,Chaowei Xiao,Bo Li,Edward Suh
摘要:Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical threat to LLM-powered agents. In this paper, we present IntentGuard, a general defense framework based on instruction-following intent analysis. The key insight of IntentGuard is that the decisive factor in IPIAs is not the presence of malicious text, but whether the LLM intends to follow instructions from untrusted data. Building on this insight, IntentGuard leverages an instruction-following intent analyzer (IIA) to identify which parts of the input prompt the model recognizes as actionable instructions, and then flag or neutralize any overlaps with untrusted data segments. To instantiate the framework, we develop an IIA that uses three "thinking intervention" strategies to elicit a structured list of intended instructions from reasoning-enabled LLMs. These techniques include start-of-thinking prefilling, end-of-thinking refinement, and adversarial in-context demonstration. We evaluate IntentGuard on two agentic benchmarks (AgentDojo and Mind2Web) using two reasoning-enabled LLMs (Qwen-3-32B and gpt-oss-20B). Results demonstrate that IntentGuard achieves (1) no utility degradation in all but one setting and (2) strong robustness against adaptive prompt injection attacks (e.g., reducing attack success rates from 100% to 8.5% in a Mind2Web scenario).
【14】DeformAr: Rethinking NER Evaluation through Component Analysis and Visual Analytics
标题:DeformAR:通过成分分析和视觉分析重新思考NER评估
链接:https://arxiv.org/abs/2512.00938
作者:Ahmed Mustafa Younes
备注:PhD Thesis, University of Sussex, 2025. 311 pages, 140 figures, 32 tables. Submitted as a PDF-only. First supervisor: Julie Weeds. Second supervisor: David Weir
摘要:Transformer models have significantly advanced Natural Language Processing (NLP), demonstrating strong performance in English. However, their effectiveness in Arabic, particularly for Named Entity Recognition (NER), remains limited, even with larger pre-trained models. This performance gap stems from multiple factors, including tokenisation, dataset quality, and annotation inconsistencies. Existing studies often analyze these issues in isolation, failing to capture their joint effect on system behaviour and performance. We introduce DeformAr (Debugging and Evaluation Framework for Transformer-based NER Systems), a novel framework designed to investigate and explain the performance discrepancy between Arabic and English NER systems. DeformAr integrates a data extraction library and an interactive dashboard, supporting two modes of evaluation: cross-component analysis and behavioural analysis. The framework divides each language into dataset and model components to examine their interactions. The analysis proceeds in two stages. First, cross-component analysis provides systematic diagnostic measures across data and model subcomponents, addressing the "what," "how," and "why" behind observed discrepancies. The second stage applies behavioural analysis by combining interpretability techniques with token-level metrics, interactive visualisations, and representation space analysis. These stages enable a component-aware diagnostic process that detects model behaviours and explains them by linking them to underlying representational patterns and data factors. DeformAr is the first Arabic-specific, component-based interpretability tool, offering a crucial resource for advancing model analysis in under-resourced languages.
【15】One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces
标题:一燕不成夏天:理解嵌入空间中的语义结构
链接:https://arxiv.org/abs/2512.00852
作者:Yandong Sun,Qiang Huang,Ziwei Xu,Yiqun Sun,Yixuan Tang,Anthony K. H. Tung
摘要:Embedding spaces are fundamental to modern AI, translating raw data into high-dimensional vectors that encode rich semantic relationships. Yet, their internal structures remain opaque, with existing approaches often sacrificing semantic coherence for structural regularity or incurring high computational overhead to improve interpretability. To address these challenges, we introduce the Semantic Field Subspace (SFS), a geometry-preserving, context-aware representation that captures local semantic neighborhoods within the embedding space. We also propose SAFARI (SemAntic Field subspAce deteRmInation), an unsupervised, modality-agnostic algorithm that uncovers hierarchical semantic structures using a novel metric called Semantic Shift, which quantifies how semantics evolve as SFSes evolve. To ensure scalability, we develop an efficient approximation of Semantic Shift that replaces costly SVD computations, achieving a 15~30x speedup with average errors below 0.01. Extensive evaluations across six real-world text and image datasets show that SFSes outperform standard classifiers not only in classification but also in nuanced tasks such as political bias detection, while SAFARI consistently reveals interpretable and generalizable semantic hierarchies. This work presents a unified framework for structuring, analyzing, and scaling semantic understanding in embedding spaces.
【16】Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals
标题:可解释的多模式深度学习用于从呼吸音频信号自动检测肺部疾病
链接:https://arxiv.org/abs/2512.00563
作者:S M Asiful Islam Saky,Md Rashidul Islam,Md Saiful Arefin,Shahaba Alam
摘要:Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals. The proposed system integrates two complementary representations: a spectral-temporal encoder based on a CNN-BiLSTM Attention architecture, and a handcrafted acoustic-feature encoder capturing physiologically meaningful descriptors such as MFCCs, spectral centroid, spectral bandwidth, and zero-crossing rate. These branches are combined through late-stage fusion to leverage both data-driven learning and domain-informed acoustic cues. The model is trained and evaluated on the Asthma Detection Dataset Version 2 using rigorous preprocessing, including resampling, normalization, noise filtering, data augmentation, and patient-level stratified partitioning. The study achieved strong generalization with 91.21% accuracy, 0.899 macro F1-score, and 0.9866 macro ROC-AUC, outperforming all ablated variants. An ablation study confirms the importance of temporal modeling, attention mechanisms, and multimodal fusion. The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations aligned with known acoustic biomarkers to build clinical transparency. The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.
【17】Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine
标题:突破可解释性的界限:可解释助推机的增量增强
链接:https://arxiv.org/abs/2512.00528
作者:Isara Liyanage,Uthayasanker Thayasivam
摘要
:The widespread adoption of complex machine learning models in high-stakes domains has brought the "black-box" problem to the forefront of responsible AI research. This paper aims at addressing this issue by improving the Explainable Boosting Machine (EBM), a state-of-the-art glassbox model that delivers both high accuracy and complete transparency. The paper outlines three distinct enhancement methodologies: targeted hyperparameter optimization with Bayesian methods, the implementation of a custom multi-objective function for fairness for hyperparameter optimization, and a novel self-supervised pre-training pipeline for cold-start scenarios. All three methodologies are evaluated across standard benchmark datasets, including the Adult Income, Credit Card Fraud Detection, and UCI Heart Disease datasets. The analysis indicates that while the tuning process yielded marginal improvements in the primary ROC AUC metric, it led to a subtle but important shift in the model's decision-making behavior, demonstrating the value of a multi-faceted evaluation beyond a single performance score. This work is positioned as a critical step toward developing machine learning systems that are not only accurate but also robust, equitable, and transparent, meeting the growing demands of regulatory and ethical compliance.
【18】TrendGNN: Towards Understanding of Epidemics, Beliefs, and Behaviors
标题:TrendGNN:了解流行病、信念和行为
链接:https://arxiv.org/abs/2512.00421
作者:Mulin Tian,Ajitesh Srivastava
备注:4 pages, 2 figures, 1 table
摘要:Epidemic outcomes have a complex interplay with human behavior and beliefs. Most of the forecasting literature has focused on the task of predicting epidemic signals using simple mechanistic models or black-box models, such as deep transformers, that ingest all available signals without offering interpretability. However, to better understand the mechanisms and predict the impact of interventions, we need the ability to forecast signals associated with beliefs and behaviors in an interpretable manner. In this work, we propose a graph-based forecasting framework that first constructs a graph of interrelated signals based on trend similarity, and then applies graph neural networks (GNNs) for prediction. This approach enables interpretable analysis by revealing which signals are more predictable and which relationships contribute most to forecasting accuracy. We believe our method provides early steps towards a framework for interpretable modeling in domains with multiple potentially interdependent signals, with implications for building future simulation models that integrate behavior, beliefs, and observations.
【19】Tree Matching Networks for Natural Language Inference: Parameter-Efficient Semantic Understanding via Dependency Parse Trees
标题:用于自然语言推理的树匹配网络:通过依赖性解析树实现参数高效的语义理解
链接:https://arxiv.org/abs/2512.00204
作者:Jason Lunder
备注:16 pages, preprint
摘要:In creating sentence embeddings for Natural Language Inference (NLI) tasks, using transformer-based models like BERT leads to high accuracy, but require hundreds of millions of parameters. These models take in sentences as a sequence of tokens, and learn to encode the meaning of the sequence into embeddings such that those embeddings can be used reliably for NLI tasks. Essentially, every word is considered against every other word in the sequence, and the transformer model is able to determine the relationships between them, entirely from scratch. However, a model that accepts explicit linguistic structures like dependency parse trees may be able to leverage prior encoded information about these relationships, without having to learn them from scratch, thus improving learning efficiency. To investigate this, we adapt Graph Matching Networks (GMN) to operate on dependency parse trees, creating Tree Matching Networks (TMN). We compare TMN to a BERT based model on the SNLI entailment task and on the SemEval similarity task. TMN is able to achieve significantly better results with a significantly reduced memory footprint and much less training time than the BERT based model on the SNLI task, while both models struggled to preform well on the SemEval. Explicit structural representations significantly outperform sequence-based models at comparable scales, but current aggregation methods limit scalability. We propose multi-headed attention aggregation to address this limitation.
【20】Faster Verified Explanations for Neural Networks
标题:神经网络的更快验证简化
链接:https://arxiv.org/abs/2512.00164
作者:Alessandro De Palma,Greta Dolcetti,Caterina Urban
摘要:Verified explanations are a theoretically-principled way to explain the decisions taken by neural networks, which are otherwise black-box in nature. However, these techniques face significant scalability challenges, as they require multiple calls to neural network verifiers, each of them with an exponential worst-case complexity. We present FaVeX, a novel algorithm to compute verified explanations. FaVeX accelerates the computation by dynamically combining batch and sequential processing of input features, and by reusing information from previous queries, both when proving invariances with respect to certain input features, and when searching for feature assignments altering the prediction. Furthermore, we present a novel and hierarchical definition of verified explanations, termed verifier-optimal robust explanations, that explicitly factors the incompleteness of network verifiers within the explanation. Our comprehensive experimental evaluation demonstrates the superior scalability of both FaVeX, and of verifier-optimal robust explanations, which together can produce meaningful formal explanation on networks with hundreds of thousands of non-linear activations.
【21】Self-sufficient Independent Component Analysis via KL Minimizing Flows
标题:通过KL最小化流量进行自给自足的独立成分分析
链接:https://arxiv.org/abs/2512.00665
作者:Song Liu
摘要:We study the problem of learning disentangled signals from data using non-linear Independent Component Analysis (ICA). Motivated by advances in self-supervised learning, we propose to learn self-sufficient signals: A recovered signal should be able to reconstruct a missing value of its own from all remaining components without relying on any other signals. We formulate this problem as the minimization of a conditional KL divergence. Compared to traditional maximum likelihood estimation, our algorithm is prior-free and likelihood-free, meaning that we do not need to impose any prior on the original signals or any observational model, which often restricts the model's flexibility. To tackle the KL divergence minimization problem, we propose a sequential algorithm that reduces the KL divergence and learns an optimal de-mixing flow model at each iteration. This approach completely avoids the unstable adversarial training, a common issue in minimizing the KL divergence. Experiments on toy and real-world datasets show the effectiveness of our method.
检测相关(8篇)
【1】TimePred: efficient and interpretable offline change point detection for high volume data - with application to industrial process monitoring
标题:TimePred:针对大量数据的高效且可解释的离线变化点检测-应用于工业过程监控
链接:https://arxiv.org/abs/2512.01562
作者:Simon Leszek
备注:6 pages, 3 figures
摘要:Change-point detection (CPD) in high-dimensional, large-volume time series is challenging for statistical consistency, scalability, and interpretability. We introduce TimePred, a self-supervised framework that reduces multivariate CPD to univariate mean-shift detection by predicting each sample's normalized time index. This enables efficient offline CPD using existing algorithms and supports the integration of XAI attribution methods for feature-level explanations. Our experiments show competitive CPD performance while reducing computational cost by up to two orders of magnitude. In an industrial manufacturing case study, we demonstrate improved detection accuracy and illustrate the practical value of interpretable change-point insights.
【2】Heuristic algorithms for the stochastic critical node detection problem
标题:随机关键节点检测问题的启发式算法
链接:https://arxiv.org/abs/2512.01497
作者:Tuguldur Bayarsaikhan,Altannar Chinchuluun,Ashwin Arulselvan,Panos Pardalos
备注:17 pages, 11 figures
摘要:Given a network, the critical node detection problem finds a subset of nodes whose removal disrupts the network connectivity. Since many real-world systems are naturally modeled as graphs, assessing the vulnerability of the network is essential, with applications in transportation systems, traffic forecasting, epidemic control, and biological networks. In this paper, we consider a stochastic version of the critical node detection problem, where the existence of edges is given by certain probabilities. We propose heuristics and learning-based methods for the problem and compare them with existing algorithms. Experimental results performed on random graphs from small to larger scales, with edge-survival probabilities drawn from different distributions, demonstrate the effectiveness of the methods. Heuristic methods often illustrate the strongest results with high scalability, while learning-based methods maintain nearly constant inference time as the network size and density grow.
【3】BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud
标题:BlinkBud:通过单个耳塞上的采样单目镜3D检测从后面检测危险
链接:https://arxiv.org/abs/2512.01366
作者:Yunzhe Li,Jiajun Yan,Yuzhou Wei,Kechen Liu,Yize Zhao,Chong Zhang,Hongzi Zhu,Li Lu,Shan Chang,Minyi Guo
备注:This is the author-accepted version of the paper published in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Vol. 9, No. 4, Article 191, 2025. Final published version: https://doi.org/10.1145/3770707
摘要:Failing to be aware of speeding vehicles approaching from behind poses a huge threat to the road safety of pedestrians and cyclists. In this paper, we propose BlinkBud, which utilizes a single earbud and a paired phone to online detect hazardous objects approaching from behind of a user. The core idea is to accurately track visually identified objects utilizing a small number of sampled camera images taken from the earbud. To minimize the power consumption of the earbud and the phone while guaranteeing the best tracking accuracy, a novel 3D object tracking algorithm is devised, integrating both a Kalman filter based trajectory estimation scheme and an optimal image sampling strategy based on reinforcement learning. Moreover, the impact of constant user head movements on the tracking accuracy is significantly eliminated by leveraging the estimated pitch and yaw angles to correct the object depth estimation and align the camera coordinate system to the user's body coordinate system, respectively. We implement a prototype BlinkBud system and conduct extensive real-world experiments. Results show that BlinkBud is lightweight with ultra-low mean power consumptions of 29.8 mW and 702.6 mW on the earbud and smartphone, respectively, and can accurately detect hazards with a low average false positive ratio (FPR) and false negative ratio (FNR) of 4.90% and 1.47%, respectively.
【4】FC-ADL: Efficient Microservice Anomaly Detection and Localisation Through Functional Connectivity
标题:FC-ADL:通过功能连接进行高效的微服务异常检测和定位
链接:https://arxiv.org/abs/2512.00844
作者:Giles Winchester,George Parisis,Luc Berthouze
备注:13 pages, 6 figures, 2 tables
摘要:Microservices have transformed software architecture through the creation of modular and independent services. However, they introduce operational complexities in service integration and system management that makes swift and accurate anomaly detection and localisation challenging. Despite the complex, dynamic, and interconnected nature of microservice architectures, prior works that investigate metrics for anomaly detection rarely include explicit information about time-varying interdependencies. And whilst prior works on fault localisation typically do incorporate information about dependencies between microservices, they scale poorly to real world large-scale deployments due to their reliance on computationally expensive causal inference. To address these challenges we propose FC-ADL, an end-to-end scalable approach for detecting and localising anomalous changes from microservice metrics based on the neuroscientific concept of functional connectivity. We show that by efficiently characterising time-varying changes in dependencies between microservice metrics we can both detect anomalies and provide root cause candidates without incurring the significant overheads of causal and multivariate approaches. We demonstrate that our approach can achieve top detection and localisation performance across a wide degree of different fault scenarios when compared to state-of-the-art approaches. Furthermore, we illustrate the scalability of our approach by applying it to Alibaba's extremely large real-world microservice deployment.
【5】Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement
标题:边缘的时间序列:用于可穿戴步态检测和最佳传感器放置的微小可分离CNN
链接:https://arxiv.org/abs/2512.00396
作者:Andrea Procopio,Marco Esposito,Sara Raggiunto,Andrey Gizdov,Alberto Belli,Paola Pierleoni
摘要:We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-constrained wearables and edge nodes. We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models - one purely separable and one with residual connections. Using the BioStampRC21 dataset, 2 s windows at 30 Hz, and subject-independent leave-one-subject-out (LOSO) validation on 16 PwPD with chest-worn IMUs, our residual separable model (Model 2, 533 params) attains PR-AUC = 94.5%, F1 = 91.2%, MCC = 89.4%, matching or surpassing the baseline (5,552 params; PR-AUC = 93.7%, F1 = 90.5%, MCC = 88.5%) with approximately 10x fewer parameters. The smallest model (Model 1, 305 params) reaches PR-AUC = 94.0%, F1 = 91.0%, MCC = 89.1%. Thresholding obtains high recall (89.0%) but low precision (76.5%), yielding many false positives and high inter-subject variance. Sensor-position analysis (train-on-all) shows chest and thighs are most reliable; forearms degrade precision/recall due to non-gait arm motion; naive fusion of all sites does not outperform the best single site. Both compact CNNs execute within tight memory/latency budgets on STM32-class MCUs (sub-10 ms on low-power boards), enabling on-sensor gating of transmission/storage. Overall, ultra-light separable CNNs provide a superior accuracy-efficiency-generalization trade-off to fixed thresholds for wearable PD gait detection and underscore the value of tailored time-series models for edge deployment.
【6】Robust Detection of Synthetic Tabular Data under Schema Variability
标题:模式可变性下合成表格数据的鲁棒检测
链接:https://arxiv.org/abs/2509.00092
作者:G. Charbel N. Kindji,Elisa Fromont,Lina Maria Rojas-Barahona,Tanguy Urvoy
摘要:The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet, detecting synthetic tabular data is especially challenging due to its heterogeneous structure and unseen formats at test time. We address the underexplored task of detecting synthetic tabular data ''in the wild'', i.e. when the detector is deployed on tables with variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is feasible, and demonstrates substantial improvements over previous approaches. Following acceptance of the paper, we are finalizing the administrative and licensing procedures necessary for releasing the source code. This extended version will be updated as soon as the release is complete.
【7】Modeling Wavelet Transformed Quantum Support Vector for Network Intrusion Detection
标题:网络入侵检测的子波变换量子支持量建模
链接:https://arxiv.org/abs/2512.01365
作者:Swati Kumari,Shiva Raj Pokhrel,Swathi Chandrasekhar,Navneet Singh,Hridoy Sankar Dutta,Adnan Anwar,Sutharshan Rajasegarar,Robin Doss
摘要:Network traffic anomaly detection is a critical cy- bersecurity challenge requiring robust solutions for complex Internet of Things (IoT) environments. We present a novel hybrid quantum-classical framework integrating an enhanced Quantum Support Vector Machine (QSVM) with the Quantum Haar Wavelet Packet Transform (QWPT) for superior anomaly classification under realistic noisy intermediate-scale Quantum conditions. Our methodology employs amplitude-encoded quan- tum state preparation, multi-level QWPT feature extraction, and behavioral analysis via Shannon Entropy profiling and Chi-square testing. Features are classified using QSVM with fidelity-based quantum kernels optimized through hybrid train- ing with simultaneous perturbation stochastic approximation (SPSA) optimizer. Evaluation under noiseless and depolarizing noise conditions demonstrates exceptional performance: 96.67% accuracy on BoT-IoT and 89.67% on IoT-23 datasets, surpassing quantum autoencoder approaches by over 7 percentage points.
【8】Sleep Apnea Detection on a Wireless Multimodal Wearable Device Without Oxygen Flow Using a Mamba-based Deep Learning Approach
标题:使用基于Mamba的深度学习方法在无氧气流量的无线多模式可穿戴设备上进行睡眠呼吸暂停检测
链接:https://arxiv.org/abs/2512.00989
作者:Dominik Luszczynski,Richard Fei Yin,Nicholas Afonin,Andrew S. P. Lim
备注:29 pages, 14 figures. Authors Dominik Luszczynski, Richard Fei Yin and Nicholas Afonin contributed equally
摘要
:Objectives: We present and evaluate a Mamba-based deep-learning model for diagnosis and event-level characterization of sleep disordered breathing based on signals from the ANNE One, a non-intrusive dual-module wireless wearable system measuring chest electrocardiography, triaxial accelerometry, chest and finger temperature, and finger phototplethysmography. Methods: We obtained concurrent PSG and wearable sensor recordings from 384 adults attending a tertiary care sleep laboratory. Respiratory events in the PSG were manually annotated in accordance with AASM guidelines. Wearable sensor and PSG recordings were automatically aligned based on the ECG signal, alignment confirmed by visual inspection, and PSG-derived respiratory event labels were used to train and evaluate a deep sequential neural network based on the Mamba architecture. Results: In 57 recordings in our test set (mean age 56, mean AHI 10.8, 43.86\% female) the model-predicted AHI was highly correlated with that derived form the PSG labels (R=0.95, p=8.3e-30, men absolute error 2.83). This performance did not vary with age or sex. At a threshold of AHI$>$5, the model had a sensitivity of 0.96, specificity of 0.87, and kappa of 0.82, and at a threshold of AHI$>$15, the model had a sensitivity of 0.86, specificity of 0.98, and kappa of 0.85. At the level of 30-sec epochs, the model had a sensitivity of 0.93 and specificity of 0.95, with a kappa of 0.68 regarding whether any given epoch contained a respiratory event. Conclusions: Applied to data from the ANNE One, a Mamba-based deep learning model can accurately predict AHI and identify SDB at clinically relevant thresholds, achieves good epoch- and event-level identification of individual respiratory events, and shows promise at physiological characterization of these events including event type (central vs. other) and event duration.
分类|识别(13篇)
【1】Label Forensics: Interpreting Hard Labels in Black-Box Text Classifier
标题:标签取证:解释黑匣子文本分类器中的硬标签
链接:https://arxiv.org/abs/2512.01514
作者:Mengyao Du,Gang Yang,Han Fang,Quanjun Yin,Ee-chien Chang
备注:10 pages, 3 figures
摘要:The widespread adoption of natural language processing techniques has led to an unprecedented growth of text classifiers across the modern web. Yet many of these models circulate with their internal semantics undocumented or even intentionally withheld. Such opaque classifiers, which may expose only hard-label outputs, can operate in unregulated web environments or be repurposed for unknown intents, raising legitimate forensic and auditing concerns. In this paper, we position ourselves as investigators and work to infer the semantic concept each label encodes in an undocumented black-box classifier. Specifically, we introduce label forensics, a black-box framework that reconstructs a label's semantic meaning. Concretely, we represent a label by a sentence embedding distribution from which any sample reliably reflects the concept the classifier has implicitly learned for that label. We believe this distribution should maintain two key properties: precise, with samples consistently classified into the target label, and general, covering the label's broad semantic space. To realize this, we design a semantic neighborhood sampler and an iterative optimization procedure to select representative seed sentences that jointly maximize label consistency and distributional coverage. The final output, an optimized seed sentence set combined with the sampler, constitutes the empirical distribution representing the label's semantics. Experiments on multiple black-box classifiers achieve an average label consistency of around 92.24 percent, demonstrating that the embedding regions accurately capture each classifier's label semantics. We further validate our framework on an undocumented HuggingFace classifier, enabling fine-grained label interpretation and supporting responsible AI auditing.
【2】MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
标题:MEGConformer:基于Conformer的MEG解码器,用于稳健的语音和音素分类
链接:https://arxiv.org/abs/2512.01443
作者:Xabier de Zuazo,Ibon Saratxaga,Eva Navas
备注:10 pages, 5 figures, 4 tables, LibriBrain Workshop, NeurIPS 2025
摘要:We present Conformer-based decoders for the LibriBrain 2025 PNPL competition, targeting two foundational MEG tasks: Speech Detection and Phoneme Classification. Our approach adapts a compact Conformer to raw 306-channel MEG signals, with a lightweight convolutional projection layer and task-specific heads. For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, a simple instance-level normalization proved critical to mitigate distribution shifts on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech) and 65.8% (Phoneme) on the leaderboard, surpassing the competition baselines and ranking within the top-10 in both tasks. For further implementation details, the technical documentation, source code, and checkpoints are available at https://github.com/neural2speech/libribrain-experiments.
【3】From Regression to Classification: Exploring the Benefits of Categorical Representations of Energy in MLIPs
标题:从回归到分类:探索MLIP中能量分类表示的好处
链接:https://arxiv.org/abs/2512.01160
作者:Ahmad Ali
备注:11th Annual Conference on Vision and Intelligent Systems (CVIS 2025)
摘要:Density Functional Theory (DFT) is a widely used computational method for estimating the energy and behavior of molecules. Machine Learning Interatomic Potentials (MLIPs) are models trained to approximate DFT-level energies and forces at dramatically lower computational cost. Many modern MLIPs rely on a scalar regression formulation; given information about a molecule, they predict a single energy value and corresponding forces while minimizing absolute error with DFT's calculations. In this work, we explore a multi-class classification formulation that predicts a categorical distribution over energy/force values, providing richer supervision through multiple targets. Most importantly, this approach offers a principled way to quantify model uncertainty. In particular, our method predicts a histogram of the energy/force distribution, converts scalar targets into histograms, and trains the model using cross-entropy loss. Our results demonstrate that this categorical formulation can achieve absolute error performance comparable to regression baselines. Furthermore, this representation enables the quantification of epistemic uncertainty through the entropy of the predicted distribution, offering a measure of model confidence absent in scalar regression approaches.
【4】World Model Robustness via Surprise Recognition
标题:通过惊喜识别的世界模型稳健性
链接:https://arxiv.org/abs/2512.01119
作者:Geigh Zollicoffer,Tanush Chopra,Mingkuan Yan,Xiaoxu Ma,Kenneth Eaton,Mark Riedl
摘要
:AI systems deployed in the real world must contend with distractions and out-of-distribution (OOD) noise that can destabilize their policies and lead to unsafe behavior. While robust training can reduce sensitivity to some forms of noise, it is infeasible to anticipate all possible OOD conditions. To mitigate this issue, we develop an algorithm that leverages a world model's inherent measure of surprise to reduce the impact of noise in world model--based reinforcement learning agents. We introduce both multi-representation and single-representation rejection sampling, enabling robustness to settings with multiple faulty sensors or a single faulty sensor. While the introduction of noise typically degrades agent performance, we show that our techniques preserve performance relative to baselines under varying types and levels of noise across multiple environments within self-driving simulation domains (CARLA and Safety Gymnasium). Furthermore, we demonstrate that our methods enhance the stability of two state-of-the-art world models with markedly different underlying architectures: Cosmos and DreamerV3. Together, these results highlight the robustness of our approach across world modeling domains. We release our code at https://github.com/Bluefin-Tuna/WISER .
【5】ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices
标题:ForamDeepSlice:一个用于从2D Micro-CT切片中进行有孔虫物种分类的高精度深度学习框架
链接:https://arxiv.org/abs/2512.00912
作者:Abdelghafour Halimi,Ali Alibrahim,Didier Barradas-Bautista,Ronell Sicat,Abdulkader M. Afifi
摘要:This study presents a comprehensive deep learning pipeline for the automated classification of 12 foraminifera species using 2D micro-CT slices derived from 3D scans. We curated a scientifically rigorous dataset comprising 97 micro-CT scanned specimens across 27 species, selecting 12 species with sufficient representation for robust machine learning. To ensure methodological integrity and prevent data leakage, we employed specimen-level data splitting, resulting in 109,617 high-quality 2D slices (44,103 for training, 14,046 for validation, and 51,468 for testing). We evaluated seven state-of-the-art 2D convolutional neural network (CNN) architectures using transfer learning. Our final ensemble model, combining ConvNeXt-Large and EfficientNetV2-Small, achieved a test accuracy of 95.64%, with a top-3 accuracy of 99.6% and an area under the ROC curve (AUC) of 0.998 across all species. To facilitate practical deployment, we developed an interactive advanced dashboard that supports real-time slice classification and 3D slice matching using advanced similarity metrics, including SSIM, NCC, and the Dice coefficient. This work establishes new benchmarks for AI-assisted micropaleontological identification and provides a fully reproducible framework for foraminifera classification research, bridging the gap between deep learning and applied geosciences.
【6】Causal Invariance and Counterfactual Learning Driven Cooperative Game for Multi-Label Classification
标题:因果不变性和反事实学习驱动的多标签分类合作博弈
链接:https://arxiv.org/abs/2512.00812
作者:Yijia Fan,Jusheng Zhang,Kaitong Cai,Jing Yang,Keze Wang
摘要:Multi-label classification (MLC) remains vulnerable to label imbalance, spurious correlations, and distribution shifts, challenges that are particularly detrimental to rare label prediction. To address these limitations, we introduce the Causal Cooperative Game (CCG) framework, which conceptualizes MLC as a cooperative multi-player interaction. CCG unifies explicit causal discovery via Neural Structural Equation Models with a counterfactual curiosity reward to drive robust feature learning. Furthermore, it incorporates a causal invariance loss to ensure generalization across diverse environments, complemented by a specialized enhancement strategy for rare labels. Extensive benchmarking demonstrates that CCG substantially outperforms strong baselines in both rare label prediction and overall robustness. Through rigorous ablation studies and qualitative analysis, we validate the efficacy and interpretability of our components, underscoring the potential of synergizing causal inference with cooperative game theory for advancing multi-label learning.
【7】Realistic Handwritten Multi-Digit Writer (MDW) Number Recognition Challenges
标题:现实的手写多位书写器(MDW)号码识别挑战
链接:https://arxiv.org/abs/2512.00676
作者:Kiri L. Wagstaff
备注:10 pages, 6 figures
摘要:Isolated digit classification has served as a motivating problem for decades of machine learning research. In real settings, numbers often occur as multiple digits, all written by the same person. Examples include ZIP Codes, handwritten check amounts, and appointment times. In this work, we leverage knowledge about the writers of NIST digit images to create more realistic benchmark multi-digit writer (MDW) data sets. As expected, we find that classifiers may perform well on isolated digits yet do poorly on multi-digit number recognition. If we want to solve real number recognition problems, additional advances are needed. The MDW benchmarks come with task-specific performance metrics that go beyond typical error calculations to more closely align with real-world impact. They also create opportunities to develop methods that can leverage task-specific knowledge to improve performance well beyond that of individual digit classification methods.
【8】Financial Text Classification Based On rLoRA Finetuning On Qwen3-8B model
标题:基于rLoRA微调的金融文本分类Qwen 3 -8B模型
链接:https://arxiv.org/abs/2512.00630
作者:Zhiming Lian
备注:This paper has been accepted to the 2025 2nd International Conference on Digital Economy and Computer Science (DECS 2025) and is awaiting publication in the ACM International Conference Proceeding Series
摘要
:Financial text classification has increasingly become an important aspect in quantitative trading systems and related tasks, such as financial sentiment analysis and the classification of financial news. In this paper, we assess the performance of the large language model Qwen3-8B on both tasks. Qwen3-8B is a state-of-the-art model that exhibits strong instruction-following and multilingual capabilities, and is distinct from standard models, primarily because it is specifically optimized for efficient fine tuning and high performance on reasoning-based benchmarks, making it suitable for financial applications. To adapt this model, we apply Noisy Embedding Instruction Finetuning and based on our previous work, this method increases robustness by injecting controlled noise into the embedding layers during supervised adaptation. We improve efficiency further with Rank-stabilized Low-Rank Adaptation low-rank optimization approach, and FlashAttention, which allow for faster training with lower GPU memory. For both tasks, we benchmark Qwen3-8B against standard classical transformer models, such as T5, BERT, and RoBERTa, and large models at scale, such as LLaMA1-7B, LLaMA2-7B, and Baichuan2-7B. The findings reveal that Qwen3-8B consistently surpasses these baselines by obtaining better classification accuracy and needing fewer training epochs. The synergy of instruction-based fine-tuning and memory-efficient optimization methods suggests Qwen3-8B can potentially serve as a scalable, economical option for real-time financial NLP applications. Qwen3-8B provides a very promising base for advancing dynamic quantitative trading systems in the future.
【9】Challenges of Heterogeneity in Big Data: A Comparative Study of Classification in Large-Scale Structured and Unstructured Domains
标题:大数据中的异类挑战:大规模结构化和非结构化领域分类的比较研究
链接:https://arxiv.org/abs/2512.00298
作者:González Trigueros Jesús Eduardo,Alonso Sánchez Alejandro,Muñoz Rivera Emilio,Peñarán Prieto Mariana Jaqueline,Mendoza González Camila Natalia
备注:13 pages, 1 figure, 3 tables. Comparative study involving Apache Spark and Hyperparameter Optimization. Keywords: Big Data, NLP, Tabular Data
摘要:This study analyzes the impact of heterogeneity ("Variety") in Big Data by comparing classification strategies across structured (Epsilon) and unstructured (Rest-Mex, IMDB) domains. A dual methodology was implemented: evolutionary and Bayesian hyperparameter optimization (Genetic Algorithms, Optuna) in Python for numerical data, and distributed processing in Apache Spark for massive textual corpora. The results reveal a "complexity paradox": in high-dimensional spaces, optimized linear models (SVM, Logistic Regression) outperformed deep architectures and Gradient Boosting. Conversely, in text-based domains, the constraints of distributed fine-tuning led to overfitting in complex models, whereas robust feature engineering -- specifically Transformer-based embeddings (ROBERTa) and Bayesian Target Encoding -- enabled simpler models to generalize effectively. This work provides a unified framework for algorithm selection based on data nature and infrastructure constraints.
【10】AutocleanEEG ICVision: Automated ICA Artifact Classification Using Vision-Language AI
标题:AutoCleanEcho ICVision:使用视觉语言AI自动ICA RST分类
链接:https://arxiv.org/abs/2512.00194
作者:Zag ElSayed,Grace Westerkamp,Gavin Gammoh,Yanchen Liu,Peyton Siekierski,Craig Erickson,Ernest Pedapati
备注:6 pages, 8 figures
摘要:We introduce EEG Autoclean Vision Language AI (ICVision) a first-of-its-kind system that emulates expert-level EEG ICA component classification through AI-agent vision and natural language reasoning. Unlike conventional classifiers such as ICLabel, which rely on handcrafted features, ICVision directly interprets ICA dashboard visualizations topography, time series, power spectra, and ERP plots, using a multimodal large language model (GPT-4 Vision). This allows the AI to see and explain EEG components the way trained neurologists do, making it the first scientific implementation of AI-agent visual cognition in neurophysiology. ICVision classifies each component into one of six canonical categories (brain, eye, heart, muscle, channel noise, and other noise), returning both a confidence score and a human-like explanation. Evaluated on 3,168 ICA components from 124 EEG datasets, ICVision achieved k = 0.677 agreement with expert consensus, surpassing MNE ICLabel, while also preserving clinically relevant brain signals in ambiguous cases. Over 97% of its outputs were rated as interpretable and actionable by expert reviewers. As a core module of the open-source EEG Autoclean platform, ICVision signals a paradigm shift in scientific AI, where models do not just classify, but see, reason, and communicate. It opens the door to globally scalable, explainable, and reproducible EEG workflows, marking the emergence of AI agents capable of expert-level visual decision-making in brain science and beyond.
【11】Learning Reduced Representations for Quantum Classifiers
标题:学习量子分类器的简化表示
链接:https://arxiv.org/abs/2512.01509
作者:Patrick Odagiu,Vasilis Belis,Lennart Schulze,Panagiotis Barkoutsos,Michele Grossi,Florentin Reiter,Günther Dissertori,Ivano Tavernelli,Sofia Vallecorsa
摘要:Data sets that are specified by a large number of features are currently outside the area of applicability for quantum machine learning algorithms. An immediate solution to this impasse is the application of dimensionality reduction methods before passing the data to the quantum algorithm. We investigate six conventional feature extraction algorithms and five autoencoder-based dimensionality reduction models to a particle physics data set with 67 features. The reduced representations generated by these models are then used to train a quantum support vector machine for solving a binary classification problem: whether a Higgs boson is produced in proton collisions at the LHC. We show that the autoencoder methods learn a better lower-dimensional representation of the data, with the method we design, the Sinkclass autoencoder, performing 40% better than the baseline. The methods developed here open up the applicability of quantum machine learning to a larger array of data sets. Moreover, we provide a recipe for effective dimensionality reduction in this context.
【12】Discriminative classification with generative features: bridging Naive Bayes and logistic regression
标题:具有生成性特征的区分性分类:弥合朴素Bayes和逻辑回归
链接:https://arxiv.org/abs/2512.01097
作者:Zachary Terner,Alexander Petersen,Yuedong Wang
摘要
:We introduce Smart Bayes, a new classification framework that bridges generative and discriminative modeling by integrating likelihood-ratio-based generative features into a logistic-regression-style discriminative classifier. From the generative perspective, Smart Bayes relaxes the fixed unit weights of Naive Bayes by allowing data-driven coefficients on density-ratio features. From a discriminative perspective, it constructs transformed inputs as marginal log-density ratios that explicitly quantify how much more likely each feature value is under one class than another, thereby providing predictors with stronger class separation than the raw covariates. To support this framework, we develop a spline-based estimator for univariate log-density ratios that is flexible, robust, and computationally efficient. Through extensive simulations and real-data studies, Smart Bayes often outperforms both logistic regression and Naive Bayes. Our results highlight the potential of hybrid approaches that exploit generative structure to enhance discriminative performance.
【13】Comparing Two Proxy Methods for Causal Identification
标题:比较两种代理方法进行原因识别
链接:https://arxiv.org/abs/2512.00175
作者:Helen Guo,Elizabeth L. Ogburn,Ilya Shpitser
备注:10 pages; 6 figures
摘要:Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors composing counterfactual quantities by exploiting unique determination of eigenspaces. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.
表征(4篇)
【1】Weight Space Representation Learning with Neural Fields
标题:使用神经场的权重空间表示学习
链接:https://arxiv.org/abs/2512.01759
作者:Zhuoqian Yang,Mathieu Salzmann,Sabine Süsstrunk
备注:12 pages body, 9 pages appendix
摘要:In this work, we investigate the potential of weights to serve as effective representations, focusing on neural fields. Our key insight is that constraining the optimization space through a pre-trained base model and low-rank adaptation (LoRA) can induce structure in weight space. Across reconstruction, generation, and analysis tasks on 2D and 3D data, we find that multiplicative LoRA weights achieve high representation quality while exhibiting distinctiveness and semantic structure. When used with latent diffusion models, multiplicative LoRA weights enable higher-quality generation than existing weight-space methods.
【2】A Nonlinear Low-rank Representation Model with Convolutional Neural Network for Imputing Water Quality Data
标题:输入水质数据的卷积神经网络非线性低阶表示模型
链接:https://arxiv.org/abs/2512.01465
作者:Hongnan Si,Tong Li,Yujie Chen,Xin Liao
备注:8 pages, 1 figure
摘要:Water quality monitoring is a core component of ecological environmental protection. However, due to sensor failure or other inevitable factors, data missing often exists in long-term monitoring, posing great challenges in water quality analysis. This paper proposes a Neural Tucker Convolutional Network (NTCN) model for water quality data imputation, which features the following key components: a) Encode different mode entities into respective embedding vectors, and construct a Tucker interaction tensor by outer product operations to capture the complex mode-wise feature interactions; b) Use 3D convolution to extract fine-grained spatiotemporal features from the interaction tensor. Experiments on three real-world water quality datasets show that the proposed NTCN model outperforms several state-of-the-art imputation models in terms of accuracy.
【3】Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction
标题:Rep3Net:一种利用多模态表示进行分子生物活性预测的方法
链接:https://arxiv.org/abs/2512.00521
作者:Sabrina Islam,Md. Atiqur Rahman,Md. Bakhtiar Hasan,Md. Hasanul Kabir
摘要:In early stage drug discovery, bioactivity prediction of molecules against target proteins plays a crucial role. Trdaitional QSAR models that utilizes molecular descriptor based data often struggles to predict bioactivity of molecules effectively due to its limitation in capturing structural and contextual information embedded within each compound. To address this challenge, we propose Rep3Net, a unified deep learning architecture that not only incorporates descriptor data but also includes spatial and relational information through graph-based represenation of compounds and contextual information through ChemBERTa generated embeddings from SMILES strings. Our model employing multimodal concatenated features produce reliable bioactivity prediction on Poly [ADP-ribose] polymerase 1 (PARP-1) dataset. PARP-1 is a crucial agent in DNA damage repair and has become a significant theraputic target in malignancies that depend on it for survival and growth. A comprehensive analysis and comparison with conventional standalone models including GCN, GAT, XGBoost, etc. demonstrates that our architecture achieves the highest predictive performance. In computational screening of compounds in drug discovery, our architecture provides a scalable framework for bioactivity prediction.
【4】SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features
标题:SemImage:文本的语义图像表示,一种嵌入解开的语言特征的新框架
链接:https://arxiv.org/abs/2512.00088
作者:Mohammad Zare
摘要:We propose SemImage, a novel method for representing a text document as a two-dimensional semantic image to be processed by convolutional neural networks (CNNs). In a SemImage, each word is represented as a pixel in a 2D image: rows correspond to sentences and an additional boundary row is inserted between sentences to mark semantic transitions. Each pixel is not a typical RGB value but a vector in a disentangled HSV color space, encoding different linguistic features: the Hue with two components H_cos and H_sin to account for circularity encodes the topic, Saturation encodes the sentiment, and Value encodes intensity or certainty. We enforce this disentanglement via a multi-task learning framework: a ColorMapper network maps each word embedding to the HSV space, and auxiliary supervision is applied to the Hue and Saturation channels to predict topic and sentiment labels, alongside the main task objective. The insertion of dynamically computed boundary rows between sentences yields sharp visual boundaries in the image when consecutive sentences are semantically dissimilar, effectively making paragraph breaks salient. We integrate SemImage with standard 2D CNNs (e.g., ResNet) for document classification. Experiments on multi-label datasets (with both topic and sentiment annotations) and single-label benchmarks demonstrate that SemImage can achieve competitive or better accuracy than strong text classification baselines (including BERT and hierarchical attention networks) while offering enhanced interpretability. An ablation study confirms the importance of the multi-channel HSV representation and the dynamic boundary rows. Finally, we present visualizations of SemImage that qualitatively reveal clear patterns corresponding to topic shifts and sentiment changes in the generated image, suggesting that our representation makes these linguistic features visible to both humans and machines.
编码器(2篇)
【1】Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade
标题:利用自动编码器-扩散级联从极稀疏测量重建多尺度物理场
链接:https://arxiv.org/abs/2512.01572
作者:Letian Yi,Tingpeng Zhang,Mingyuan Zhou,Guannan Wang,Quanke Su,Zhilu Lai
备注:19 pages,10 figures
摘要:Reconstructing full fields from extremely sparse and random measurements is a longstanding ill-posed inverse problem. A powerful framework for addressing such challenges is hierarchical probabilistic modeling, where uncertainty is represented by intermediate variables and resolved through marginalization during inference. Inspired by this principle, we propose Cascaded Sensing (Cas-Sensing), a hierarchical reconstruction framework that integrates an autoencoder-diffusion cascade. First, a neural operator-based functional autoencoder reconstructs the dominant structures of the original field - including large-scale components and geometric boundaries - from arbitrary sparse inputs, serving as an intermediate variable. Then, a conditional diffusion model, trained with a mask-cascade strategy, generates fine-scale details conditioned on these large-scale structures. To further enhance fidelity, measurement consistency is enforced via the manifold constrained gradient based on Bayesian posterior sampling during the generation process. This cascaded pipeline substantially alleviates ill-posedness, delivering accurate and robust reconstructions. Experiments on both simulation and real-world datasets demonstrate that Cas-Sensing generalizes well across varying sensor configurations and geometric boundaries, making it a promising tool for practical deployment in scientific and engineering applications.
【2】EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for Predicting Enzyme Kinetic Constants
标题:EnzCLIP:一个具有对比学习的交叉注意双编码器框架,用于预测酶动力学常数
链接:https://arxiv.org/abs/2512.00379
作者:Anas Aziz Khan,Md Shah Fahad,Priyanka,Ramesh Chandra,Guransh Singh
摘要:Accurate prediction of enzyme kinetic parameters is crucial for drug discovery, metabolic engineering, and synthetic biology applications. Current computational approaches face limitations in capturing complex enzyme-substrate interactions and often focus on single parameters while neglecting the joint prediction of catalytic turnover numbers (Kcat) and Michaelis-Menten constants (Km). We present EnzyCLIP, a novel dual-encoder framework that leverages contrastive learning and cross-attention mechanisms to predict enzyme kinetic parameters from protein sequences and substrate molecular structures. Our approach integrates ESM-2 protein language model embeddings with ChemBERTa chemical representations through a CLIP-inspired architecture enhanced with bidirectional cross-attention for dynamic enzyme-substrate interaction modeling. EnzyCLIP combines InfoNCE contrastive loss with Huber regression loss to learn aligned multimodal representations while predicting log10-transformed kinetic parameters. The model is trained on the CatPred-DB database containing 23,151 Kcat and 41,174 Km experimentally validated measurements, and achieved competitive performance with R2 scores of 0.593 for Kcat and 0.607 for Km prediction. XGBoost ensemble methods applied to the learned embeddings further improved Km prediction (R2 = 0.61) while maintaining robust Kcat performance.
优化|敛散性(18篇)
【1】Agentic Policy Optimization via Instruction-Policy Co-Evolution
标题:通过指令-政策协同进化进行统计性政策优化
链接:https://arxiv.org/abs/2512.01945
作者:Han Zhou,Xingchen Wan,Ivan Vulić,Anna Korhonen
备注:10 pages, 3 figures, 2 tables (18 pages including references and appendices)
摘要
:Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capability of large language models (LLMs), enabling autonomous agents that can conduct effective multi-turn and tool-integrated reasoning. While instructions serve as the primary protocol for defining agents, RLVR typically relies on static and manually designed instructions. However, those instructions may be suboptimal for the base model, and the optimal instruction may change as the agent's policy improves and explores the interaction with the environment. To bridge the gap, we introduce INSPO, a novel Instruction-Policy co-evolution framework that integrates instruction optimization as a dynamic component of the reinforcement learning (RL) loop. INSPO maintains a dynamic population of instruction candidates that are sampled with questions, where reward signals in RL loops are automatically attributed to each instruction, and low performers are periodically pruned. New instructions are generated and verified through an on-policy reflection mechanism, where an LLM-based optimizer analyzes past experience from a replay buffer and evolves more effective strategies given the current policy. We conduct extensive experiments on multi-turn retrieval and reasoning tasks, demonstrating that INSPO substantially outperforms strong baselines relying on static instructions. INSPO discovers innovative instructions that guide the agent toward more strategic reasoning paths, achieving substantial performance gains with only a marginal increase in computational overhead.
【2】In-context Inverse Optimality for Fair Digital Twins: A Preference-based approach
标题:公平数字双胞胎的上下文逆优化:基于偏好的方法
链接:https://arxiv.org/abs/2512.01650
作者:Daniele Masti,Francesco Basciani,Arianna Fedeli,Girgio Gnecco,Francesco Smarra
备注:Submitted for possible publication at the IFAC World Congress 2026
摘要:Digital Twins (DTs) are increasingly used as autonomous decision-makers in complex socio-technical systems. Their mathematically optimal decisions often diverge from human expectations, exposing a persistent gap between algorithmic and bounded human rationality. This work addresses this gap by proposing a framework that operationalizes fairness as a learnable objective within optimization-based Digital Twins. We introduce a preference-driven learning pipeline that infers latent fairness objectives directly from human pairwise preferences over feasible decisions. A novel Siamese neural network is developed to generate convex quadratic cost functions conditioned on contextual information. The resulting surrogate objectives align optimization outcomes with human-perceived fairness while maintaining computational efficiency. The approach is demonstrated on a COVID-19 hospital resource allocation scenario. This study provides an actionable path toward embedding human-centered fairness in the design of autonomous decision-making systems.
【3】Neural Network Optimal Power Flow via Energy Gradient Flow and Unified Dynamics
标题:基于能量梯度流和统一动力学的神经网络最优潮流
链接:https://arxiv.org/abs/2512.01219
作者:Xuezhi Liu
摘要:Optimal Power Flow (OPF) is a core optimization problem in power system operation and planning, aiming to minimize generation costs while satisfying physical constraints such as power flow equations, generator limits, and voltage limits. Traditional OPF solving methods typically employ iterative optimization algorithms (such as interior point methods, sequential quadratic programming, etc.), with limitations including low computational efficiency, initial value sensitivity, and low batch computation efficiency. Most existing deep learning-based OPF methods rely on supervised learning, requiring pre-solving large numbers of cases, and have difficulty guaranteeing physical consistency. This paper proposes an Optimal Power Flow solving method based on neural network dynamics and energy gradient flow, transforming OPF problems into energy minimization problems. By constructing an energy function to measure the degree of deviation from the constraint manifold, and guiding networks to learn optimal solutions that simultaneously satisfy power flow constraints and minimize costs through gradient flow. Neural networks are trained unsupervised by directly minimizing physical residuals, requiring no labeled data, achieving true "end-to-end" physics-constrained learning.
【4】Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations
标题:缩小部分曲线下面积优化的逼近差距:两种配方的故事
链接:https://arxiv.org/abs/2512.01213
作者:Yangbangyan Jiang,Qianqian Xu,Huiyang Shao,Zhiyong Yang,Shilong Bao,Xiaochun Cao,Qingming Huang
摘要:As a variant of the Area Under the ROC Curve (AUC), the partial AUC (PAUC) focuses on a specific range of false positive rate (FPR) and/or true positive rate (TPR) in the ROC curve. It is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. However, selecting instances within these constrained intervals during its calculation is NP-hard, and thus typically requires approximation techniques for practical resolution. Despite the progress made in PAUC optimization over the last few years, most existing methods still suffer from uncontrollable approximation errors or a limited scalability when optimizing the approximate PAUC objectives. In this paper, we close the approximation gap of PAUC optimization by presenting two simple instance-wise minimax reformulations: one with an asymptotically vanishing gap, the other with the unbiasedness at the cost of more variables. Our key idea is to first establish an equivalent instance-wise problem to lower the time complexity, simplify the complicated sample selection procedure by threshold learning, and then apply different smoothing techniques. Equipped with an efficient solver, the resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(ε^{-1/3})$ for typical one-way and two-way PAUCs. Moreover, we provide a tight generalization bound of our minimax reformulations. The result explicitly demonstrates the impact of the TPR/FPR constraints $α$/$β$ on the generalization and exhibits a sharp order of $\tilde{O}(α^{-1}\n_+^{-1} + β^{-1}\n_-^{-1})$. Finally, extensive experiments on several benchmark datasets validate the strength of our proposed methods.
【5】Sum Rate Maximization in STAR-RIS-UAV-Assisted Networks: A CA-DDPG Approach for Joint Optimization
标题:STAR-RIS-无人机辅助网络中的和率最大化:一种用于联合优化的CA-DDPG方法
链接:https://arxiv.org/abs/2512.01202
作者:Yujie Huang,Haibin Wan,Xiangcheng Li,Tuanfa Qin,Yun Li,Jun Li,Wen Chen
备注:14 pages, 12 figures
摘要
:With the rapid advances in programmable materials, reconfigurable intelligent surfaces (RIS) have become a pivotal technology for future wireless communications. The simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) can both transmit and reflect signals, enabling comprehensive signal control and expanding application scenarios. This paper introduces an unmanned aerial vehicle (UAV) to further enhance system flexibility and proposes an optimization design for the spectrum efficiency of the STAR-RIS-UAV-assisted wireless communication system. We present a deep reinforcement learning (DRL) algorithm capable of iteratively optimizing beamforming, phase shifts, and UAV positioning to maximize the system's sum rate through continuous interactions with the environment. To improve exploration in deterministic policies, we introduce a stochastic perturbation factor, which enhances exploration capabilities. As exploration is strengthened, the algorithm's ability to accurately evaluate the state-action value function becomes critical. Thus, based on the deep deterministic policy gradient (DDPG) algorithm, we propose a convolution-augmented deep deterministic policy gradient (CA-DDPG) algorithm that balances exploration and evaluation to improve the system's sum rate. The simulation results demonstrate that the CA-DDPG algorithm effectively interacts with the environment, optimizing the beamforming matrix, phase shift matrix, and UAV location, thereby improving system capacity and achieving better performance than other algorithms.
【6】Soft Quality-Diversity Optimization
标题:软质量多样性优化
链接:https://arxiv.org/abs/2512.00810
作者:Saeed Hedayatian,Stefanos Nikolaidis
备注:33 pages, 10 figures
摘要:Quality-Diversity (QD) algorithms constitute a branch of optimization that is concerned with discovering a diverse and high-quality set of solutions to an optimization problem. Current QD methods commonly maintain diversity by dividing the behavior space into discrete regions, ensuring that solutions are distributed across different parts of the space. The QD problem is then solved by searching for the best solution in each region. This approach to QD optimization poses challenges in large solution spaces, where storing many solutions is impractical, and in high-dimensional behavior spaces, where discretization becomes ineffective due to the curse of dimensionality. We present an alternative framing of the QD problem, called \emph{Soft QD}, that sidesteps the need for discretizations. We validate this formulation by demonstrating its desirable properties, such as monotonicity, and by relating its limiting behavior to the widely used QD Score metric. Furthermore, we leverage it to derive a novel differentiable QD algorithm, \emph{Soft QD Using Approximated Diversity (SQUAD)}, and demonstrate empirically that it is competitive with current state of the art methods on standard benchmarks while offering better scalability to higher dimensional problems.
【7】What Is Preference Optimization Doing, How and Why?
标题:偏好优化是什么、如何以及为什么?
链接:https://arxiv.org/abs/2512.00778
作者:Yue Wang,Qizhou Wang,Zizhuo Zhang,Ang Li,Gang Niu,Bo Han,Masashi Sugiyama
摘要:Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving great success. A common belief is that DPO is supervised learning while PPO is reinforcement learning, yet deeper analyses for the reasons underlying these differences remain lacking. To fill this gap, we analyze their optimization dynamics, revealing distinct algorithmic behaviors and comprehending their underlying causes. First, we examine the target directions of gradient-based updates and find that DPO follows stable targets, whereas PPO follows dynamic targets that balance exploration and exploitation, thus validating the common belief from a new perspective. Second, we examine the roles of positive learning, negative learning, and loss reweighting, which are three key components in PO methods. Our analyses reveal that these components play fairly different roles. In DPO, positive and negative learning jointly shape the learning targets meanwhile mutually offset each other. However, loss reweighting in DPO acts less as a reward signal but more as a regularizer to mitigate overfitting. In PPO, negative learning primarily supports exploration rather than determining the targets. Meanwhile, loss reweighting, related to absolute values of token-level advantages, indicates the distinct roles of token groups in updating targets. Given these findings, we conduct carefully designed ablation studies to further examine how controlling these dynamics impacts optimization efficiency and practical performance. The insights gained from our analyses not only deepen the understanding of PO methods but also inspire the development of more preference-aligned LLMs.
【8】Exploiting Function-Family Structure in Analog Circuit Optimization
标题:函数族结构在模拟电路优化中的应用
链接:https://arxiv.org/abs/2512.00712
作者:Zhuohua Liu,Kaiqi Huang,Qinxin Mei,Yuanqi Hu,Wei W. Xing
摘要:Analog circuit optimization is typically framed as black-box search over arbitrary smooth functions, yet device physics constrains performance mappings to structured families: exponential device laws, rational transfer functions, and regime-dependent dynamics. Off-the-shelf Gaussian-process surrogates impose globally smooth, stationary priors that are misaligned with these regime-switching primitives and can severely misfit highly nonlinear circuits at realistic sample sizes (50--100 evaluations). We demonstrate that pre-trained tabular models encoding these primitives enable reliable optimization without per-circuit engineering. Circuit Prior Network (CPN) combines a tabular foundation model (TabPFN v2) with Direct Expected Improvement (DEI), computing expected improvement exactly under discrete posteriors rather than Gaussian approximations. Across 6 circuits and 25 baselines, structure-matched priors achieve $R^2 \approx 0.99$ in small-sample regimes where GP-Matérn attains only $R^2 = 0.16$ on Bandgap, deliver $1.05$--$3.81\times$ higher FoM with $3.34$--$11.89\times$ fewer iterations, and suggest a shift from hand-crafting models as priors toward systematic physics-informed structure identification. Our code will be made publicly available upon paper acceptance.
【9】Efficient Matroid Bandit Linear Optimization Leveraging Unimodality
标题:利用单色性的高效拟阵Bandit线性优化
链接:https://arxiv.org/abs/2512.00605
作者:Aurélien Delage,Romaric Gaudel
摘要
:We study the combinatorial semi-bandit problem under matroid constraints. The regret achieved by recent approaches is optimal, in the sense that it matches the lower bound. Yet, time complexity remains an issue for large matroids or for matroids with costly membership oracles (e.g. online recommendation that ensures diversity). This paper sheds a new light on the matroid semi-bandit problem by exploiting its underlying unimodal structure. We demonstrate that, with negligible loss in regret, the number of iterations involving the membership oracle can be limited to \mathcal{O}(\log \log T)$. This results in an overall improved time complexity of the learning process. Experiments conducted on various matroid benchmarks show (i) no loss in regret compared to state-of-the-art approaches; and (ii) reduced time complexity and number of calls to the membership oracle.
【10】Non-Asymptotic Convergence of Discrete Diffusion Models: Masked and Random Walk dynamics
标题:离散扩散模型的非渐进收敛:掩蔽和随机游走动力学
链接:https://arxiv.org/abs/2512.00580
作者:Giovanni Conforti,Alain Durmus,Le-Tuyet-Nhi Pham
摘要:We investigate the theoretical underpinnings of Discrete Diffusion Models (DDMs) on discrete state spaces. Unlike in the continuous setting-where diffusion models are well understood both theoretically and empirically-the discrete case poses significant challenges due to its combinatorial structure and the lack of rigorous analysis. In this work, we establish convergence guarantees for DDMs on both the finite space $\mathbb{Z}^d_m=\{0,...,m-1\}^d$ and the countably infinite space $\mathbb{N}^d$ under mild assumptions, focusing on forward masked and random walk dynamics. Similar to the continuous case, the backward process can be characterized by a discrete score function, whose monotonicity plays a central role in deriving the error bounds of the generated data. Notably, the complexity of our model scales linearly up to logarithmic factors, rather than exponentially, with the dimension, making it efficiently scalable to high-dimensional data. To the best of our knowledge, this study provides the first non-asymptotic convergence guarantees that do not rely on the boundedness of the estimated score-covering not only uniform noising processes on $\mathbb{Z}^d_m$ and on $\mathbb{N}^d$, but also masking-based noising dynamics.
【11】Enhancing Analogy-Based Software Effort Estimation with Firefly Algorithm Optimization
标题:通过Firefly算法优化增强基于模拟的软件工作量估计
链接:https://arxiv.org/abs/2512.00571
作者:Tarun Chintada,Uday Kiran Cheera
备注:12 pages, 3 figures, 2 tables. Research conducted in June 2024
摘要:Analogy-Based Estimation (ABE) is a popular method for non-algorithmic estimation due to its simplicity and effectiveness. The Analogy-Based Estimation (ABE) model was proposed by researchers, however, no optimal approach for reliable estimation was developed. Achieving high accuracy in the ABE might be challenging for new software projects that differ from previous initiatives. This study (conducted in June 2024) proposes a Firefly Algorithm-guided Analogy-Based Estimation (FAABE) model that combines FA with ABE to improve estimation accuracy. The FAABE model was tested on five publicly accessible datasets: Cocomo81, Desharnais, China, Albrecht, Kemerer and Maxwell. To improve prediction efficiency, feature selection was used. The results were measured using a variety of evaluation metrics; various error measures include MMRE, MAE, MSE, and RMSE. Compared to conventional models, the experimental results show notable increases in prediction precision, demonstrating the efficacy of the Firefly-Analogy ensemble.
【12】ESPO: Entropy Importance Sampling Policy Optimization
标题:ESPO:熵重要性抽样政策优化
链接:https://arxiv.org/abs/2512.00499
作者:Yuepeng Sheng,Yuwei Huang,Shuman Liu,Haibo Zhang,Anxiang Zeng
摘要:Large language model (LLM) reinforcement learning has increasingly relied on group-based policy optimization frameworks, such as GRPO and GSPO, to achieve stable fine-tuning at scale. However, a fundamental trade-off persists between optimization granularity and training stability. While GSPO improves robustness via sequence-level optimization, its monolithic treatment of sequences introduces severe inefficiencies: its conservative clipping mechanism indiscriminately discards valid training samples-a phenomenon we term gradient underutilization-and its uniform credit assignment fails to capture the heterogeneous contributions of critical reasoning steps. In this work, we propose Entropy Importance Sampling Policy Optimization (ESPO), a novel framework that reconciles fine-grained control with training stability. ESPO decomposes sequences into groups based on predictive entropy, enabling (1) Entropy-driven Importance Sampling to capture intra-sequence heterogeneity, and (2) Entropy-adaptive Clipping to dynamically allocate trust regions based on model uncertainty. Extensive experiments on mathematical reasoning benchmarks demonstrate that ESPO not only accelerates convergence but also achieves state-of-the-art performance, notably improving accuracy on the challenging HMMT benchmark from 4.4% to 13.13%.
【13】BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models
标题:BioArc:发现生物基础模型的最佳神经架构
链接:https://arxiv.org/abs/2512.00283
作者:Yi Fang,Haoran Xu,Jiaxin Han,Sirui Ding,Yizhi Wang,Yue Wang,Xuan Wang
摘要
:Foundation models have revolutionized various fields such as natural language processing (NLP) and computer vision (CV). While efforts have been made to transfer the success of the foundation models in general AI domains to biology, existing works focus on directly adopting the existing foundation model architectures from general machine learning domains without a systematic design considering the unique physicochemical and structural properties of each biological data modality. This leads to suboptimal performance, as these repurposed architectures struggle to capture the long-range dependencies, sparse information, and complex underlying ``grammars'' inherent to biological data. To address this gap, we introduce BioArc, a novel framework designed to move beyond intuition-driven architecture design towards principled, automated architecture discovery for biological foundation models. Leveraging Neural Architecture Search (NAS), BioArc systematically explores a vast architecture design space, evaluating architectures across multiple biological modalities while rigorously analyzing the interplay between architecture, tokenization, and training strategies. This large-scale analysis identifies novel, high-performance architectures, allowing us to distill a set of empirical design principles to guide future model development. Furthermore, to make the best of this set of discovered principled architectures, we propose and compare several architecture prediction methods that effectively and efficiently predict optimal architectures for new biological tasks. Overall, our work provides a foundational resource and a principled methodology to guide the creation of the next generation of task-specific and foundation models for biology.
【14】We Still Don't Understand High-Dimensional Bayesian Optimization
标题:我们仍然不了解多维Bayesian优化
链接:https://arxiv.org/abs/2512.00170
作者:Colin Doumont,Donney Fan,Natalie Maus,Jacob R. Gardner,Henry Moss,Geoff Pleiss
摘要:High-dimensional spaces have challenged Bayesian optimization (BO). Existing methods aim to overcome this so-called curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly, we demonstrate that these approaches are outperformed by arguably the simplest method imaginable: Bayesian linear regression. After applying a geometric transformation to avoid boundary-seeking behavior, Gaussian processes with linear kernels match state-of-the-art performance on tasks with 60- to 6,000-dimensional search spaces. Linear models offer numerous advantages over their non-parametric counterparts: they afford closed-form sampling and their computation scales linearly with data, a fact we exploit on molecular optimization tasks with > 20,000 observations. Coupled with empirical analyses, our results suggest the need to depart from past intuitions about BO methods in high-dimensional spaces.
【15】Dimension-free error estimate for diffusion model and optimal scheduling
标题:扩散模型和最优调度的无干扰误差估计
链接:https://arxiv.org/abs/2512.01820
作者:Valentin de Bortoli,Romuald Elie,Anna Kazeykina,Zhenjie Ren,Jiacheng Zhang
摘要:Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of an Ornstein-Uhlenbeck (OU) process initialized at the true data distribution. Since the score function associated with the OU process is typically unknown, it is approximated using a trained neural network. This approximation, along with finite time simulation, time discretization and statistical approximation, introduce several sources of error whose impact on the generated samples must be carefully understood. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler (KL) divergence. However, both metrics present limitations: KL divergence requires absolute continuity between distributions, while Wasserstein distance, though more general, leads to error bounds that scale poorly with dimension, rendering them impractical in high-dimensional settings. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions. The bound is expressed in terms of a smooth test functional with bounded first and second derivatives. The key novelty lies in the use of this weaker, functional metric to obtain dimension-independent guarantees, at the cost of higher regularity on the test functions. As an application, we formulate and solve a variational problem to minimize the time-discretization error, leading to the derivation of an optimal time-scheduling strategy for the reverse-time diffusion. Interestingly, this scheduler has appeared previously in the literature in a different context; our analysis provides a new justification for its optimality, now grounded in minimizing the discretization bias in generative sampling.
【16】Bayesian Optimization for Non-Cooperative Game-Based Radio Resource Management
标题:基于非合作游戏的无线电资源管理的Bayesian优化
链接:https://arxiv.org/abs/2512.01245
作者:Yunchuan Zhang,Jiechen Chen,Junshuo Liu,Robert C. Qiu
备注:6 pages, 4 figures, this paper is accepted to 2025 IEEE Global Communications Conference (Globecom)
摘要:Radio resource management in modern cellular networks often calls for the optimization of complex utility functions that are potentially conflicting between different base stations (BSs). Coordinating the resource allocation strategies efficiently across BSs to ensure stable network service poses significant challenges, especially when each utility is accessible only via costly, black-box evaluations. This paper considers formulating the resource allocation among spectrum sharing BSs as a non-cooperative game, with the goal of aligning their allocation incentives toward a stable outcome. To address this challenge, we propose PPR-UCB, a novel Bayesian optimization (BO) strategy that learns from sequential decision-evaluation pairs to approximate pure Nash equilibrium (PNE) solutions. PPR-UCB applies martingale techniques to Gaussian process (GP) surrogates and constructs high probability confidence bounds for utilities uncertainty quantification. Experiments on downlink transmission power allocation in a multi-cell multi-antenna system demonstrate the efficiency of PPR-UCB in identifying effective equilibrium solutions within a few data samples.
【17】No-Regret Gaussian Process Optimization of Time-Varying Functions
标题:时变函数的无遗憾高斯过程优化
链接:https://arxiv.org/abs/2512.00517
作者:Eliabelle Mauduit,Eloïse Berthier,Andrea Simonetto
摘要
:Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. As no-regret is unattainable in general in the strict bandit setting, we relax the latter allowing additional queries on previously observed points. Building on sparse inference and the effect of UI on regret, we propose \textbf{W-SparQ-GP-UCB}, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, proving the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
【18】Stochastic Dominance Constrained Optimization with S-shaped Utilities: Poor-Performance-Region Algorithm and Neural Network
标题:具有S形效用的随机优势约束优化:差性能区域算法和神经网络
链接:https://arxiv.org/abs/2512.00299
作者:Zeyun Hu,Yang Liu
备注:30 pages
摘要:We investigate the static portfolio selection problem of S-shaped and non-concave utility maximization under first-order and second-order stochastic dominance (SD) constraints. In many S-shaped utility optimization problems, one should require a liquidation boundary to guarantee the existence of a finite concave envelope function. A first-order SD (FSD) constraint can replace this requirement and provide an alternative for risk management. We explicitly solve the optimal solution under a general S-shaped utility function with a first-order stochastic dominance constraint. However, the second-order SD (SSD) constrained problem under non-concave utilities is difficult to solve analytically due to the invalidity of Sion's maxmin theorem. For this sake, we propose a numerical algorithm to obtain a plausible and sub-optimal solution for general non-concave utilities. The key idea is to detect the poor performance region with respect to the SSD constraints, characterize its structure and modify the distribution on that region to obtain (sub-)optimality. A key financial insight is that the decision maker should follow the SD constraint on the poor performance scenario while conducting the unconstrained optimal strategy otherwise. We provide numerical experiments to show that our algorithm effectively finds a sub-optimal solution in many cases. Finally, we develop an algorithm-guided piecewise-neural-network framework to learn the solution of the SSD problem, which demonstrates accelerated convergence compared to standard neural network approaches.
预测|估计(16篇)
【1】A Footprint-Aware, High-Resolution Approach for Carbon Flux Prediction Across Diverse Ecosystems
标题:跨不同生态系统碳通量预测的足迹感知、高分辨率方法
链接:https://arxiv.org/abs/2512.01917
作者:Jacob Searcy,Anish Dulal,Scott Bridgham,Ashley Cordes,Lillian Aoki,Brendan Bohannan,Qing Zhu,Lucas C. R. Silva
备注:29 pages, 7 Figuers
摘要:Natural climate solutions (NCS) offer an approach to mitigating carbon dioxide (CO2) emissions. However, monitoring the carbon drawdown of ecosystems over large geographic areas remains challenging. Eddy-flux covariance towers provide ground truth for predictive 'upscaling' models derived from satellite products, but many satellites now produce measurements on spatial scales smaller than a flux tower's footprint. We introduce Footprint-Aware Regression (FAR), a first-of-its-kind, deep-learning framework that simultaneously predicts spatial footprints and pixel-level (30 m scale) estimates of carbon flux. FAR is trained on our AMERI-FAR25 dataset which combines 439 site years of tower data with corresponding Landsat scenes. Our model produces high-resolution predictions and achieves R2 = 0.78 when predicting monthly net ecosystem exchange on test sites from a variety of ecosystems.
【2】LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems
标题:REC:选择性预测和路由系统中错误发现控制的线性期望约束
链接:https://arxiv.org/abs/2512.01556
作者:Zhiyuan Wang,Aniri,Tianlong Chen,Yue Zhang,Heng Tao Shen,Xiaoshuang Shi,Kaidi Xu
摘要:Large language models (LLMs) often generate unreliable answers, while heuristic uncertainty methods fail to fully distinguish correct from incorrect predictions, causing users to accept erroneous answers without statistical guarantees. We address this issue through the lens of false discovery rate (FDR) control, ensuring that among all accepted predictions, the proportion of errors does not exceed a target risk level. To achieve this in a principled way, we propose LEC, which reinterprets selective prediction as a constrained decision problem by enforcing a Linear Expectation Constraint over selection and error indicators. Then, we establish a finite-sample sufficient condition, which relies only on a held-out set of exchangeable calibration samples, to compute an FDR-constrained, coverage-maximizing threshold. Furthermore, we extend LEC to a two-model routing mechanism: given a prompt, if the current model's uncertainty exceeds its calibrated threshold, we delegate it to a stronger model, while maintaining a unified FDR guarantee. Evaluations on closed-ended and open-ended question-answering (QA) datasets show that LEC achieves tighter FDR control and substantially improves sample retention over prior methods. Moreover, the two-model routing mechanism achieves lower risk levels while accepting more correct samples than each individual model.
【3】Directed evolution algorithm drives neural prediction
标题:有向进化算法驱动神经预测
链接:https://arxiv.org/abs/2512.01362
作者:Yanlin Wang,Nancy M Young,Patrick C M Wong
备注:43 pages, 5 figures
摘要
:Neural prediction offers a promising approach to forecasting the individual variability of neurocognitive functions and disorders and providing prognostic indicators for personalized invention. However, it is challenging to translate neural predictive models into medical artificial intelligent applications due to the limitations of domain shift and label scarcity. Here, we propose the directed evolution model (DEM), a novel computational model that mimics the trial-and-error processes of biological directed evolution to approximate optimal solutions for predictive modeling tasks. We demonstrated that the directed evolution algorithm is an effective strategy for uncertainty exploration, enhancing generalization in reinforcement learning. Furthermore, by incorporating replay buffer and continual backpropagate methods into DEM, we provide evidence of achieving better trade-off between exploitation and exploration in continuous learning settings. We conducted experiments on four different datasets for children with cochlear implants whose spoken language developmental outcomes vary considerably on the individual-child level. Preoperative neural MRI data has shown to accurately predict the post-operative outcome of these children within but not across datasets. Our results show that DEM can efficiently improve the performance of cross-domain pre-implantation neural predictions while addressing the challenge of label scarcity in target domain.
【4】Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI
标题:优化中风风险预测:结合LOS平衡集成和XAI的机器学习管道
链接:https://arxiv.org/abs/2512.01333
作者:A S M Ahsanul Sarkar Akib,Raduana Khawla,Abdul Hasib
摘要:Stroke is a major cause of death and permanent impairment, making it a major worldwide health concern. For prompt intervention and successful preventative tactics, early risk assessment is essential. To address this challenge, we used ensemble modeling and explainable AI (XAI) techniques to create an interpretable machine learning framework for stroke risk prediction. A thorough evaluation of 10 different machine learning models using 5-fold cross-validation across several datasets was part of our all-inclusive strategy, which also included feature engineering and data pretreatment (using Random Over-Sampling (ROS) to solve class imbalance). Our optimized ensemble model (Random Forest + ExtraTrees + XGBoost) performed exceptionally well, obtaining a strong 99.09% accuracy on the Stroke Prediction Dataset (SPD). We improved the model's transparency and clinical applicability by identifying three important clinical variables using LIME-based interpretability analysis: age, hypertension, and glucose levels. Through early prediction, this study highlights how combining ensemble learning with explainable AI (XAI) can deliver highly accurate and interpretable stroke risk assessment. By enabling data-driven prevention and personalized clinical decisions, our framework has the potential to transform stroke prediction and cardiovascular risk management.
【5】A Comparative Study of Machine Learning Algorithms for Electricity Price Forecasting with LIME-Based Interpretability
标题:基于LIME可解释性的电价预测机器学习算法比较研究
链接:https://arxiv.org/abs/2512.01212
作者:Xuanyi Zhao,Jiawen Ding,Xueting Huang,Yibo Zhang
备注:5 pages, 5 figures. Accepted for publication at ICEIEC 2025 (not yet published)
摘要:With the rapid development of electricity markets, price volatility has significantly increased, making accurate forecasting crucial for power system operations and market decisions. Traditional linear models cannot capture the complex nonlinear characteristics of electricity pricing, necessitating advanced machine learning approaches. This study compares eight machine learning models using Spanish electricity market data, integrating consumption, generation, and meteorological variables. The models evaluated include linear regression, ridge regression, decision tree, KNN, random forest, gradient boosting, SVR, and XGBoost. Results show that KNN achieves the best performance with R^2 of 0.865, MAE of 3.556, and RMSE of 5.240. To enhance interpretability, LIME analysis reveals that meteorological factors and supply-demand indicators significantly influence price fluctuations through nonlinear relationships. This work demonstrates the effectiveness of machine learning models in electricity price forecasting while improving decision transparency through interpretability analysis.
【6】Toward a benchmark for CTR prediction in online advertising: datasets, evaluation protocols and perspectives
标题:在线广告点击率预测的基准:数据集,评估协议和观点
链接:https://arxiv.org/abs/2512.01179
作者:Shan Gao,Yanwu Yang
备注:64 pages, 8 figures, 11 tables
摘要:This research designs a unified architecture of CTR prediction benchmark (Bench-CTR) platform that offers flexible interfaces with datasets and components of a wide range of CTR prediction models. Moreover, we construct a comprehensive system of evaluation protocols encompassing real-world and synthetic datasets, a taxonomy of metrics, standardized procedures and experimental guidelines for calibrating the performance of CTR prediction models. Furthermore, we implement the proposed benchmark platform and conduct a comparative study to evaluate a wide range of state-of-the-art models from traditional multivariate statistical to modern large language model (LLM)-based approaches on three public datasets and two synthetic datasets. Experimental results reveal that, (1) high-order models largely outperform low-order models, though such advantage varies in terms of metrics and on different datasets; (2) LLM-based models demonstrate a remarkable data efficiency, i.e., achieving the comparable performance to other models while using only 2% of the training data; (3) the performance of CTR prediction models has achieved significant improvements from 2015 to 2016, then reached a stage with slow progress, which is consistent across various datasets. This benchmark is expected to facilitate model development and evaluation and enhance practitioners' understanding of the underlying mechanisms of models in the area of CTR prediction. Code is available at https://github.com/NuriaNinja/Bench-CTR.
【7】Conversion rate prediction in online advertising: modeling techniques, performance evaluation and future directions
标题:在线广告转化率预测:建模技术、性能评估和未来方向
链接:https://arxiv.org/abs/2512.01171
作者:Tao Xue,Yanwu Yang,Panyu Zhai
备注:99 pages, 15 figures, 7 tables
摘要
:Conversion and conversion rate (CVR) prediction play a critical role in efficient advertising decision-making. In past decades, although researchers have developed plenty of models for CVR prediction, the methodological evolution and relationships between different techniques have been precluded. In this paper, we conduct a comprehensive literature review on CVR prediction in online advertising, and classify state-of-the-art CVR prediction models into six categories with respect to the underlying techniques and elaborate on connections between these techniques. For each category of models, we present the framework of underlying techniques, their advantages and disadvantages, and discuss how they are utilized for CVR prediction. Moreover, we summarize the performance of various CVR prediction models on public and proprietary datasets. Finally, we identify research trends, major challenges, and promising future directions. We observe that results of performance evaluation reported in prior studies are not unanimous; semantics-enriched, attribution-enhanced, debiased CVR prediction and jointly modeling CTR and CVR prediction would be promising directions to explore in the future. This review is expected to provide valuable references and insights for future researchers and practitioners in this area.
【8】A Benchmark of Causal vs Correlation AI for Predictive Maintenance
标题:预测性维护的因果关系与相关性AI基准
链接:https://arxiv.org/abs/2512.01149
作者:Krishna Taduri,Shaunak Dhande,Giacinto Paolo,Saggese,Paul Smith
摘要:Predictive maintenance in manufacturing environments presents a challenging optimization problem characterized by extreme cost asymmetry, where missed failures incur costs roughly fifty times higher than false alarms. Conventional machine learning approaches typically optimize statistical accuracy metrics that do not reflect this operational reality and cannot reliably distinguish causal relationships from spurious correlations. This study evaluates eight predictive models, ranging from baseline statistical approaches to formal causal inference methods, on a dataset of 10,000 CNC machines with a 3.3% failure prevalence. The formal causal inference model (L5) achieved estimated annual cost savings of 1.16 million USD (a 70.2 percent reduction), outperforming the best correlation-based decision tree model (L3) by approximately 80,000 USD per year. The causal model matched the highest observed recall (87.9 percent) while reducing false alarms by 97 percent (from 165 to 5) and attained a precision of 92.1 percent, with a train-test performance gap of only 2.6 percentage points. These results indicate that causal AI methods, when combined with domain knowledge, can yield superior financial outcomes and more interpretable predictions compared to correlation-based approaches in predictive maintenance applications.
【9】PIANO: Physics-informed Dual Neural Operator for Precipitation Nowcasting
标题:PIANO:用于降水临近预报的物理信息双神经操作器
链接:https://arxiv.org/abs/2512.01062
作者:Seokhyun Chin,Junghwan Park,Woojin Cho
备注:NeurIPS 2025 Machine Learning and Physical Sciences Workshop
【10】The Silence that Speaks: Neural Estimation via Communication Gaps
标题:说话的沉默:通过沟通差距进行神经估计
链接:https://arxiv.org/abs/2512.01056
作者:Shubham Aggarwal,Dipankar Maity,Tamer Başar
【11】D-CTNet: A Dual-Branch Channel-Temporal Forecasting Network with Frequency-Domain Correction
标题:D-CTNet:具有频域修正的双分支时间-时间预测网络
链接:https://arxiv.org/abs/2512.00925
作者:Shaoxun Wang,Xingjun Zhang,Kun Xia,Qianyang Li,Jiawei Cao,Zhendong Tan
【12】The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation
标题:NTK的谱维是恒定的:隐式正规化、伪宽稳定性和可扩展估计理论
链接:https://arxiv.org/abs/2512.00860
作者:Praveen Anilkumar Shukla
备注:8 pages, 2 figures
【13】Forecasting India's Demographic Transition Under Fertility Policy Scenarios Using hybrid LSTM-PINN Model
标题:使用LSTM-PINN混合模型预测生育政策情景下印度人口转型
链接:https://arxiv.org/abs/2512.00760
作者:Subarna Khanra,Vijay Kumar Kukreja,Indu Bala
备注:31 pages, 17 figure, 57 references
【14】Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset
标题:迈向精确的蛋白质配体亲和力预测基准:完整且具有修改意识的DALIS数据集
链接:https://arxiv.org/abs/2512.00708
作者:Ming-Hsiu Wu,Ziqian Xie,Shuiwang Ji,Degui Zhi
【15】Neuroscience-Inspired Memory Replay for Continual Learning: A Comparative Study of Predictive Coding and Backpropagation-Based Strategies
标题:神经科学启发的持续学习记忆重演:预测编码和基于反向传播的策略的比较研究
链接:https://arxiv.org/abs/2512.00619
作者:Goutham Nalagatla,Shreyas Grandhe
备注:9 pages, 3 figures
【16】Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction
标题:开发公平性任务分解以提高脊柱融合后并发症预测的公平性
链接:https://arxiv.org/abs/2512.00598
作者:Yining Yuan,J. Ben Tamo,Wenqi Shi,Yishan Zhong,Micky C. Nnamdi,B. Randall Brenn,Steven W. Hwang,May D. Wang
其他神经网络|深度学习|模型|建模(58篇)
【1】EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI
标题:EfficientFlow:针对人工智能的高效等变流策略学习
链接:https://arxiv.org/abs/2512.02020
作者:Jianlei Chang,Ruofeng Mei,Wei Ke,Xiangyu Xu
备注:Accepted by AAAI 2026. Project Page: https://efficientflow.github.io/
【2】Improved Mean Flows: On the Challenges of Fastforward Generative Models
标题:改进的平均流量:快进生成模型的挑战
链接:https://arxiv.org/abs/2512.02012
作者:Zhengyang Geng,Yiyang Lu,Zongze Wu,Eli Shechtman,J. Zico Kolter,Kaiming He
备注:Technical report
【3】Learning Sim-to-Real Humanoid Locomotion in 15 Minutes
标题:15分钟内学习模拟到真实的人形机器人运动
链接:https://arxiv.org/abs/2512.01996
作者:Younggyo Seo,Carmelo Sferrazza,Juyue Chen,Guanya Shi,Rocky Duan,Pieter Abbeel
备注:Project website: https://younggyo.me/fastsac-humanoid
【4】ECO: Energy-Constrained Operator Learning for Chaotic Dynamics with Boundedness Guarantees
标题:ECO:具有有界性保证的混乱动力学的能量约束操作员学习
链接:https://arxiv.org/abs/2512.01984
作者:Andrea Goertzen,Sunbochen Tang,Navid Azizan
【5】Low-Rank Prehab: Preparing Neural Networks for SVD Compression
标题:低级别预处理:为DID压缩准备神经网络
链接:https://arxiv.org/abs/2512.01980
作者:Haoran Qin,Shansita Sharma,Ali Abbasi,Chayne Thrash,Soheil Kolouri
【6】Delays in Spiking Neural Networks: A State Space Model Approach
标题:尖峰神经网络的延迟:状态空间模型方法
链接:https://arxiv.org/abs/2512.01906
作者:Sanja Karilanova,Subhrakanti Dey,Ayça Özçelikkale
【7】Provably Safe Model Updates
标题:可证明安全的模型更新
链接:https://arxiv.org/abs/2512.01899
作者:Leo Elmecker-Plakolm,Pierre Fasterling,Philip Sosnin,Calvin Tsay,Matthew Wicker
备注:12 pages, 9 figures, submitted to IEEE SaTML 2026
【8】Unifying Sign and Magnitude for Optimizing Deep Vision Networks via ThermoLion
标题:通过TheroLion统一符号和幅度以优化深度视觉网络
链接:https://arxiv.org/abs/2512.01881
【9】Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
标题:少忘记,多保留:基于排练的持续学习的轻量级调节器
链接:https://arxiv.org/abs/2512.01818
作者:Lama Alssum,Hasan Abed Al Kader Hammoud,Motasem Alfarra,Juan C Leon Alcazar,Bernard Ghanem
【10】MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention
标题:MSPAN:通过分组化多尺度注意力进行高效的大规模物理建模
链接:https://arxiv.org/abs/2512.01738
作者:Pedro M. P. Curvo,Jan-Willem van de Meent,Maksim Zhdanov
【11】A unified framework for geometry-independent operator learning in cardiac electrophysiology simulations
标题:心脏电生理模拟中几何独立操作员学习的统一框架
链接:https://arxiv.org/abs/2512.01702
作者:Bei Zhou,Cesare Corrado,Shuang Qian,Maximilian Balmus,Angela W. C. Lee,Cristobal Rodero,Marco J. W. Gotte,Luuk H. G. A. Hopman,Mengyun Qiao,Steven Niederer
【12】Walking on the Fiber: A Simple Geometric Approximation for Bayesian Neural Networks
标题:在纤维上行走:Bayesian神经网络的简单几何逼近
链接:https://arxiv.org/abs/2512.01500
作者:Alfredo Reichlin,Miguel Vasco,Danica Kragic
【13】Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReLU Network?
标题:平坦性是否意味着单变量两层ReLU网络中逻辑损失的推广?
链接:https://arxiv.org/abs/2512.01473
作者:Dan Qiao,Yu-Xiang Wang
备注:59 pages
【14】Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control
标题:可区分的失重控制器:用于连续控制的学习逻辑电路
链接:https://arxiv.org/abs/2512.01467
作者:Fabian Kresse,Christoph H. Lampert
备注:16 pages, 11 figures, 10 tables
【15】hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware
标题:hls 4ml:一个灵活的开源平台,用于在可重新配置硬件上加速深度学习
链接:https://arxiv.org/abs/2512.01463
作者
:Jan-Frederik Schulte,Benjamin Ramhorst,Chang Sun,Jovan Mitrevski,Nicolò Ghielmetti,Enrico Lupi,Dimitrios Danopoulos,Vladimir Loncar,Javier Duarte,David Burnette,Lauri Laatu,Stylianos Tzelepis,Konstantinos Axiotis,Quentin Berthet,Haoyan Wang,Paul White,Suleyman Demirsoy,Marco Colombo,Thea Aarrestad,Sioni Summers,Maurizio Pierini,Giuseppe Di Guglielmo,Jennifer Ngadiuba,Javier Campos,Ben Hawks,Abhijith Gandrakota,Farah Fahim,Nhan Tran,George Constantinides,Zhiqiang Que,Wayne Luk,Alexander Tapper,Duc Hoang,Noah Paladino,Philip Harris,Bo-Cheng Lai,Manuel Valentin,Ryan Forelli,Seda Ogrenci,Lino Gerlach,Rian Flynn,Mia Liu,Daniel Diaz,Elham Khoda,Melissa Quinnan,Russell Solares,Santosh Parajuli,Mark Neubauer,Christian Herwig,Ho Fung Tsoi,Dylan Rankin,Shih-Chieh Hsu,Scott Hauck
【16】Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
标题:保持独特,保持高效:在多任务合并中保留模特个性
链接:https://arxiv.org/abs/2512.01461
作者:Kuangpu Guo,Yuhe Ding,Jian Liang,Zilei Wang,Ran He
【17】Fantastic Features and Where to Find Them: A Probing Method to combine Features from Multiple Foundation Models
标题:奇妙的功能以及在哪里可以找到它们:组合来自多个基础模型的功能的探索方法
链接:https://arxiv.org/abs/2512.01405
作者:Benjamin Ramtoula,Pierre-Yves Lajoie,Paul Newman,Daniele De Martini
备注:Published at NeurIPS 2025
【18】On Global Applicability and Location Transferability of Generative Deep Learning Models for Precipitation Downscaling
标题:生成式降水降尺度深度学习模型的全局适用性和位置可迁移性
链接:https://arxiv.org/abs/2512.01400
作者:Paula Harder,Christian Lessig,Matthew Chantry,Francis Pelletier,David Rolnick
【19】The Necessity of Imperfection:Reversing Model Collapse via Simulating Cognitive Boundedness
标题:不完美的必要性:通过模拟认知界限扭转模型崩溃
链接:https://arxiv.org/abs/2512.01354
作者:Zhongjie Jiang
备注:38 pages,5 figures,30 tables. This paper proposes the Prompt-driven Cognitive Computing Framework (PMCSF) and validates it with A-share market stress tests (N=23 for 2015 crash, N=13 for 2024 bull market). Includes detailed appendices on cognitive vector definitions, perturbation operators, and financial backtest data
【20】milearn: A Python Package for Multi-Instance Machine Learning
标题:milarn:用于多实例机器学习的Python包
链接:https://arxiv.org/abs/2512.01287
作者:Dmitry Zankov,Pavlo Polishchuk,Michal Sobieraj,Mario Barbatti
备注:Open-source software for multi-instance machine learning
【21】Generative Modeling with Continuous Flows: Sample Complexity of Flow Matching
标题:具有连续流的生成建模:流匹配的样本复杂性
链接:https://arxiv.org/abs/2512.01286
作者:Mudit Gaur,Prashant Trivedi,Shuchin Aeron,Amrit Singh Bedi,George K. Atia,Vaneet Aggarwal
【22】Samplability makes learning easier
标题:可采样性使学习更容易
链接:https://arxiv.org/abs/2512.01276
作者:Guy Blanc,Caleb Koch,Jane Lange,Carmen Strassle,Li-Yang Tan
备注:ITCS 2026
【23】Efficient Hyperparameter Search for Non-Stationary Model Training
标题:非平稳模型训练的高效超参数搜索
链接:https://arxiv.org/abs/2512.01258
作者:Berivan Isik,Matthew Fahrbach,Dima Kuzmin,Nicolas Mayoraz,Emil Praun,Steffen Rendle,Raghavendra Vasudeva
【24】Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
标题:扩散混合专家模型的有效训练:一个实用的方法
链接:https://arxiv.org/abs/2512.01252
作者:Yahui Liu,Yang Yue,Jingyuan Zhang,Chenxi Sun,Yang Zhou,Wencong Zeng,Ruiming Tang,Guorui Zhou
备注:9 pages, 7 figures
【25】The Evolution of Learning Algorithms for Artificial Neural Networks
标题:人工神经网络学习算法的演变
链接:https://arxiv.org/abs/2512.01203
【26】Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
标题:通过了解他人了解自己:从人口背景中学习神经元身份
链接:https://arxiv.org/abs/2512.01199
作者:Vinam Arora,Divyansha Lachi,Ian J. Knight,Mehdi Azabou,Blake Richards,Cole L. Hurwitz,Josh Siegle,Eva L. Dyer
备注:Accepted at Neurips 2025
【27】Learning to Reconstruct Temperature Field from Sparse Observations with Implicit Physics Priors
标题:学习利用隐式物理先验从稀疏观测重建温度场
链接:https://arxiv.org/abs/2512.01196
作者:Shihang Li,Zhiqiang Gong,Weien Zhou,Yue Gao,Wen Yao
【28】First On-Orbit Demonstration of a Geospatial Foundation Model
标题:地球空间基础模型的首次轨道演示
链接:https://arxiv.org/abs/2512.01181
作者:Andrew Du,Roberto Del Prete,Alejandro Mousist,Nick Manser,Fabrice Marre,Andrew Barton,Carl Seubert,Gabriele Meoni,Tat-Jun Chin
【29】Data assimilation and discrepancy modeling with shallow recurrent decoders
标题:用浅递归译码器进行资料同化和差异模拟
链接:https://arxiv.org/abs/2512.01170
作者:Yuxuan Bao,J. Nathan Kutz
备注:27 pages, 11 figures
【30】Fiber Bundle Networks: A Geometric Machine Learning Paradigm
标题:光纤束网络:几何机器学习范式
链接:https://arxiv.org/abs/2512.01151
作者:Dong Liu
备注:18 pages, 1 figure
【31】Projection-Free CNN Pruning via Frank-Wolfe with Momentum: Sparser Models with Less Pretraining
标题:通过Frank-Wolfe和Momentum进行无投影CNN修剪:预训练较少的稀疏模型
链接:https://arxiv.org/abs/2512.01147
作者:Hamza ElMokhtar Shili,Natasha Patnaik,Isabelle Ruble,Kathryn Jarjoura,Daniel Suarez Aguirre
备注:Preliminary preprint; numerical experiments are still being validated and may be updated in future revisions
【32】Neural Variable Name Repair: Learning to Rename Identifiers for Readability
标题:神经变量名称修复:学习简化标识符的可读性
链接:https://arxiv.org/abs/2512.01141
作者:Muhammad Yousuf,Akshat Bagade,Chhittebbayi Penugonda,Maanas Baraya
【33】Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI
标题
:实时边缘人工智能基础模型的联合分区和放置
链接:https://arxiv.org/abs/2512.01039
作者:Aladin Djuhera,Fernando Koch,Alecio Binotto
【34】FMTK: A Modular Toolkit for Composable Time Series Foundation Model Pipelines
标题:FMTK:可组合时间序列基础模型管道的模块化工具包
链接:https://arxiv.org/abs/2512.01038
作者:Hetvi Shastri,Pragya Sharma,Walid A. Hanafy,Mani Srivastava,Prashant Shenoy
【35】Subgroup Validity in Machine Learning for Echocardiogram Data
标题:超声心动图数据机器学习中的亚组有效性
链接:https://arxiv.org/abs/2512.00976
作者:Cynthia Feeney,Shane Williams,Benjamin S. Wessler,Michael C. Hughes
【36】Limitations of Using Identical Distributions for Training and Testing When Learning Boolean Functions
标题:学习布尔函数时使用相同分布进行训练和测试的局限性
链接:https://arxiv.org/abs/2512.00791
【37】Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance
标题:符号下降的可证效益:重尾类不平衡下的最小模型
链接:https://arxiv.org/abs/2512.00763
作者:Robin Yadav,Shuo Xie,Tianhao Wang,Zhiyuan Li
【38】Preventing Model Collapse via Contraction-Conditioned Neural Filters
标题:通过收缩条件神经过滤器防止模型崩溃
链接:https://arxiv.org/abs/2512.00757
作者:Zongjian Han,Yiran Liang,Ruiwen Wang,Yiwei Luo,Yilin Huang,Xiaotong Song,Dongqing Wei
【39】Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking
标题:升级和合并的MoE奖励模型以缓解奖励黑客行为
链接:https://arxiv.org/abs/2512.00724
作者:Lingling Fu
备注:9 pages,5 figures
【40】Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks
标题:使用受物理启发的奇异学习理论来理解现代神经网络中的Grokking和其他相转变
链接:https://arxiv.org/abs/2512.00686
作者:Anish Lakkapragada
备注:Preprint
【41】Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation
标题:Doppler增强深度学习:使用YOLOv5实例分割改进甲状腺结节分割
链接:https://arxiv.org/abs/2512.00639
【42】Robust Precoding for Resilient Cell-Free Networks
标题:针对弹性无细胞网络的稳健预编码
链接:https://arxiv.org/abs/2512.00531
作者:Saeed Mashdour,André R. Flores,Rodrigo C. de Lamare
备注:2 figures, 6 pages
【43】FairMT: Fairness for Heterogeneous Multi-Task Learning
标题:FairMT:异类多任务学习的公平性
链接:https://arxiv.org/abs/2512.00469
作者:Guanyu Hu,Tangzheng Lian,Na Yan,Dimitrios Kollias,Xinyu Yang,Oya Celiktutan,Siyang Song,Zeyu Fu
【44】From Coefficients to Directions: Rethinking Model Merging with Directional Alignment
标题:从系数到方向:重新思考与方向对齐合并的模型
链接:https://arxiv.org/abs/2512.00391
作者:Zhikang Chen,Sen Cui,Deheng Ye,Min Zhang,Gang Niu,Yu Zhang,Masashi Sugiyama,Tingting Zhu
【45】Learning Causal States Under Partial Observability and Perturbation
标题:部分可观察性和微扰下的因果状态学习
链接:https://arxiv.org/abs/2512.00357
作者:Na Li,Hangguan Shan,Wei Ni,Wenjie Zhang,Xinyu Li,Yamin Wang
【46】Scalable and Interpretable Scientific Discovery via Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)
标题:通过稀疏变分高斯过程实现可扩展和可解释的科学发现Kolmogorov-Arnold网络(SVGP KAN)
链接:https://arxiv.org/abs/2512.00260
作者:Y. Sungtaek Ju
备注:7 pages, 3 figures
【47】Emergent Riemannian geometry over learning discrete computations on continuous manifolds
标题:关于学习连续形上的离散计算的新兴Riemann几何
链接:https://arxiv.org/abs/2512.00196
作者:Julian Brandon,Angus Chadwick,Arthur Pellegrino
【48】Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning
标题:Orion-Bix:表格式上下文学习的双向关注
链接:https://arxiv.org/abs/2512.00181
作者:Mohamed Bouadi,Pratinav Seth,Aditya Tanna,Vinay Kumar Sankarapu
【49】SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators
标题:SafeCiM:调查混合浮点内存计算深度学习加速器的弹性
链接:https://arxiv.org/abs/2512.00059
作者:Swastik Bhattacharya,Sanjay Das,Anand Menon,Shamik Kundu,Arnab Raha,Kanad Basu
【50】A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry
标题:一个稳健的可推广的设备不可知深度学习模型,用于通过三轴手腕加速度测量来确定睡眠-觉醒
链接:https://arxiv.org/abs/2512.01986
作者:Nasim Montazeri,Stone Yang,Dominik Luszczynski,John Zhang,Dharmendra Gurve,Andrew Centen,Maged Goubran,Andrew Lim
备注:27 pages, 5 figures, 5 tables
【51】Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks
标题:低空无线网络中ISAC的多模式混合专家
链接:https://arxiv.org/abs/2512.01750
作者:Kai Zhang,Wentao Yu,Hengtao He,Shenghui Song,Jun Zhang,Khaled B. Letaief
【52】Common Structure Discovery in Collections of Bipartite Networks: Application to Pollination Systems
标题:双方网络集合中的公共结构发现:在Politics系统中的应用
链接:https://arxiv.org/abs/2512.01716
作者:Louis Lacoste,Pierre Barbillon,Sophie Donnet
【53】Masked Symbol Modeling for Demodulation of Oversampled Baseband Communication Signals in Impulsive Noise-Dominated Channels
标题:脉冲噪音主导通道中过采样基带通信信号解调的掩蔽符号建模
链接:https://arxiv.org/abs/2512.01428
作者:Oguz Bedir,Nurullah Sevim,Mostafa Ibrahim,Sabit Ekin
备注:Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on AI and ML for Next-Generation Wireless Communications and Networking (AI4NextG), non-archival
【54】Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression
标题:工具变量回归的结果感知光谱特征学习
链接:https://arxiv.org/abs/2512.00919
作者:Dimitri Meunier,Jakub Wornbard,Vladimir R. Kostic,Antoine Moulin,Alek Fröhlich,Karim Lounici,Massimiliano Pontil,Arthur Gretton
【55】Fragmentation is Efficiently Learnable by Quantum Neural Networks
标题:量子神经网络可以有效学习碎片化
链接:https://arxiv.org/abs/2512.00751
作者:Mikhail Mints,Eric Anschuetz
备注:25 pages, 3 figures
【56】An Interpretable Operator-Learning Model for Electric Field Profile Reconstruction in Discharges Based on the EFISH Method
标题:基于EFISH方法的放电电场轮廓重构的可解释算子学习模型
链接:https://arxiv.org/abs/2512.00359
作者:Zhijian Yang,Edwin Setiadi Sugeng,Mhedine Alicherif,Tat Loon Chng
【57】VCWorld: A Biological World Model for Virtual Cell Simulation
标题:VC World:虚拟细胞模拟的生物世界模型
链接:https://arxiv.org/abs/2512.00306
作者:Zhijian Wei,Runze Ma,Zichen Wang,Zhongmin Li,Shuotong Song,Shuangjia Zheng
【58】Learning with Physical Constraints
标题:在身体限制下学习
链接:https://arxiv.org/abs/2512.00104
作者:Miguel A. Mendez,Jan van Den Berghe,Manuel Ratz,Matilde Fiore,Lorenzo Schena
备注:Chapter 3 from Machine Learning for Fluid Dynamics (ISBN 978-2875162090). Based on the VKI-ULB lecture series ''Machine Learning for Fluid Dynamics,'' held in Brussels in February 2022
其他(64篇)
【1】Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
标题:视觉同步:通过交叉视图对象运动实现多摄像机同步
链接:https://arxiv.org/abs/2512.02017
作者:Shaowei Liu,David Yifan Yao,Saurabh Gupta,Shenlong Wang
备注:Accepted to NeurIPS 2025. Project page: https://stevenlsw.github.io/visualsync/
【2】AlignSAE: Concept-Aligned Sparse Autoencoders
标题:AlignSAGE:概念对齐的稀疏自动编码器
链接:https://arxiv.org/abs/2512.02004
作者:Minglai Yang,Xinyu Guo,Mihai Surdeanu,Liangming Pan
备注:20 pages, 7 figures, 5 tables
【3】SVRG and Beyond via Posterior Correction
标题:通过后路矫正进行SVRG及其他
链接:https://arxiv.org/abs/2512.01930
作者:Nico Daheim,Thomas Möllenhoff,Ming Liang Ang,Mohammad Emtiyaz Khan
备注:Preprint. Under review
【4】InnoGym: Benchmarking the Innovation Potential of AI Agents
标题:InnoGym:人工智能代理的创新潜力基准
链接:https://arxiv.org/abs/2512.01822
作者:Jintian Zhang,Kewei Xu,Jingsheng Zheng,Zhuoyun Yu,Yuqi Zhu,Yujie Luo,Lanning Wei,Shuofei Qiao,Lun Du,Da Zheng,Shumin Deng,Huajun Chen,Ningyu Zhang
备注:Work in progress
【5】Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
标题:噪音太过分了:消除生成机器人控制的神话
链接:https://arxiv.org/abs/2512.01809
作者:Chaoyi Pan,Giri Anantharaman,Nai-Chieh Huang,Claire Jin,Daniel Pfrommer,Chenyang Yuan,Frank Permenter,Guannan Qu,Nicholas Boffi,Guanya Shi,Max Simchowitz
【6】GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
标题:GR-RL:灵巧而精确地实现长视野机器人操纵
链接:https://arxiv.org/abs/2512.01801
作者:Yunfei Li,Xiao Ma,Jiafeng Xu,Yu Cui,Zhongren Cui,Zhigang Han,Liqun Huang,Tao Kong,Yuxiao Liu,Hao Niu,Wanli Peng,Jingchao Qiao,Zeyu Ren,Haixin Shi,Zhi Su,Jiawen Tian,Yuyang Xiao,Shenyu Zhang,Liwei Zheng,Hang Li,Yonghui Wu
【7】The Active and Noise-Tolerant Strategic Perceptron
标题:主动且耐噪的战略感知器
链接:https://arxiv.org/abs/2512.01783
作者:Maria-Florina Blacan,Hedyeh Beyhaghi
【8】Dual Randomized Smoothing: Beyond Global Noise Variance
标题:双随机平滑:超越全局噪声方差
链接:https://arxiv.org/abs/2512.01782
作者:Chenhao Sun,Yuhao Mao,Martin Vechev
【9】How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
标题:RL后训练如何诱导技能构成?倒计时案例研究
链接:https://arxiv.org/abs/2512.01775
作者:Simon Park,Simran Kaur,Sanjeev Arora
【10】On the Unreasonable Effectiveness of Last-layer Retraining
标题:论最后一层再训练的不合理效果
链接:https://arxiv.org/abs/2512.01766
作者:John C. Hill,Tyler LaBonte,Xinchen Zhang,Vidya Muthukumar
【11】Beyond Scaffold: A Unified Spatio-Temporal Gradient Tracking Method
标题:Beyond Scaffold:一种统一的时空梯度跟踪方法
链接:https://arxiv.org/abs/2512.01732
作者:Yan Huang,Jinming Xu,Jiming Chen,Karl Henrik Johansson
备注:13 pages
【12】Morphling: Fast, Fused, and Flexible GNN Training at Scale
标题:Morphling:快速,融合,灵活的GNN训练规模
链接:https://arxiv.org/abs/2512.01678
【13】Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization
标题:Q2 D2:利用二维量化的几何感知音频编解码器
链接:https://arxiv.org/abs/2512.01537
作者:Tal Shuster,Eliya Nachmani
【14】Multi-view diffusion geometry using intertwined diffusion trajectories
标题:使用交织扩散轨迹的多视图扩散几何
链接:https://arxiv.org/abs/2512.01484
作者:Gwendal Debaussart-Joniec,Argyris Kalogeratos
【15】Fourier Neural Operators Explained: A Practical Perspective
标题:傅里叶神经运算符解释:实用角度
链接:https://arxiv.org/abs/2512.01421
作者:Valentin Duruisseaux,Jean Kossaifi,Anima Anandkumar
备注:92 pages, 26 figures
【16】CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace
标题:CLAPS:通过最后层拉普拉斯的后觉保形间隔
链接:https://arxiv.org/abs/2512.01384
作者:Dongseok Kim,Hyoungsun Choi,Mohamed Jismy Aashik Rasool,Gisung Oh
备注:19 pages, 2 figures
【17】Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators
标题:超越损失指导:在扩散神经运算符中使用PDL残度作为谱注意力
链接:https://arxiv.org/abs/2512.01370
作者:Medha Sawhney,Abhilash Neog,Mridul Khurana,Anuj Karpatne
【18】Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1
标题:GR 1和G1上跨实施操纵的基金会机器人策略的模式增强微调
链接:https://arxiv.org/abs/2512.01358
作者:Junsung Park,Hogun Kee,Songhwai Oh
备注:8 pages, 10 figures
【19】Extending NGU to Multi-Agent RL: A Preliminary Study
标题:将NGU扩展到多Agent RL:初步研究
链接:https://arxiv.org/abs/2512.01321
作者:Juan Hernandez,Diego Fernández,Manuel Cifuentes,Denis Parra,Rodrigo Toro Icarte
备注:9 pages, 4 figures, 1 table. Accepted at the LatinX in AI (LXAI) Workshop at NeurIPS 2025. Includes experimental results for Multi-NGU and Multi-DQN in the PettingZoo simple_tag environment
【20】Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding
标题:条件约束概率最小Bayes风险解码
链接:https://arxiv.org/abs/2512.01316
作者:Koki Natsumi,Hiroyuki Deguchi,Yusuke Sakai,Hidetaka Kamigaito,Taro Watanabe
备注:IJCNLP-AACL 2025 Main
【21】CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL
标题:CuES:一个好奇心驱动的环境导向的逆向强化学习综合框架
链接:https://arxiv.org/abs/2512.01311
作者:Shinji Mai,Yunpeng Zhai,Ziqian Chen,Cheng Chen,Anni Zou,Shuchang Tao,Zhaoyang Liu,Bolin Ding
【22】Social Media Data Mining of Human Behaviour during Bushfire Evacuation
标题:丛林大火疏散期间人类行为的社交媒体数据挖掘
链接:https://arxiv.org/abs/2512.01262
作者:Junfeng Wu,Xiangmin Zhou,Erica Kuligowski,Dhirendra Singh,Enrico Ronchi,Max Kinateder
【23】CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions
标题:CoSineVerification:面向计算的科学问题的工具增强答案验证
链接:https://arxiv.org/abs/2512.01224
作者:Ruixiang Feng,Zhenwei An,Yuntao Wen,Ran Le,Yiming Jia,Chen Yang,Zongchao Chen,Lisi Chen,Shen Gao,Shuo Shang,Yang Song,Tao Zhang
【24】Pay Attention Later: From Vector Space Diffusion to Linearithmic Spectral Phase-Locking
标题:稍后注意:从载体空间扩散到线性光谱锁相
链接:https://arxiv.org/abs/2512.01208
作者:Alper Yıldırım,İbrahim Yücedağ
备注:12 pages, 5 figures
【25】fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment
标题:fMRI 2GES:利用双脑解码对齐从fMRI信号重建共语音手势
链接:https://arxiv.org/abs/2512.01189
作者:Chunzheng Zhu,Jialin Shao,Jianxin Lin,Yijun Wang,Jing Wang,Jinhui Tang,Kenli Li
备注:IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2025
【26】Mode-Conditioning Unlocks Superior Test-Time Scaling
标题:模式调节解锁卓越的测试时间缩放
链接:https://arxiv.org/abs/2512.01127
作者:Chen Henry Wu,Sachin Goyal,Aditi Raghunathan
【27】Bayesian dynamic scheduling of multipurpose batch processes under incomplete look-ahead information
标题:不完全前瞻信息下多用途批过程的Bayesian动态调度
链接:https://arxiv.org/abs/2512.01093
作者:Taicheng Zheng,Dan Li,Jie Li
【28】Testing the Machine Consciousness Hypothesis
标题:测试机器意识假设
链接:https://arxiv.org/abs/2512.01081
【29】Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
标题:具有操作约束的RL屏蔽控制器单元应用于远程微电网
链接:https://arxiv.org/abs/2512.01046
作者:Hadi Nekoei,Alexandre Blondin Massé,Rachid Hassani,Sarath Chandar,Vincent Mai
【30】Associative Syntax and Maximal Repetitions reveal context-dependent complexity in fruit bat communication
标题:关联重复和最大重复揭示了果蝠交流中的上下文依赖复杂性
链接:https://arxiv.org/abs/2512.01033
作者:Luigi Assom
备注:Accepted for a lightning talk at the NeurIPS 2025 Workshop: "AI for Non-Human Animal Communication"
【31】Upper Approximation Bounds for Neural Oscillators
标题:神经振荡器的上逼近界
链接:https://arxiv.org/abs/2512.01015
作者:Zifeng Huang,Konstantin M. Zuev,Yong Xia,Michael Beer
备注:30 pages, 4 figures
【32】Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis
标题:单位物理链:以原始为中心的科学代码合成方法
链接:https://arxiv.org/abs/2512.01010
作者:Vansh Sharma,Venkat Raman
【33】Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks
标题:内存集成可重新配置适配器:用于具有多个任务的设置的统一框架
链接:https://arxiv.org/abs/2512.00940
作者:Susmit Agrawal,Krishn Vishwas Kher,Saksham Mittal,Swarnim Maheshwari,Vineeth N. Balasubramanian
备注:NeurIPS 2025; 31 pages, 2 figures
【34】AI Agent for Source Finding by SoFiA-2 for SKA-SDC2
标题:SoFiA-2 for SKA-SDC 2进行源查找的人工智能代理
链接:https://arxiv.org/abs/2512.00769
作者:Xingchen Zhou,Nan Li,Peng Jia,Yingfeng Liu,Furen Deng,Shuanghao Shu,Ying Li,Liang Cao,Huanyuan Shan,Ayodeji Ibitoye
备注:20 pages, 10 figures, accepted by RAA
【35】Flow Matching for Tabular Data Synthesis
标题:表格数据综合的流匹配
链接:https://arxiv.org/abs/2512.00698
作者:Bahrul Ilmi Nasution,Floor Eijkelboom,Mark Elliot,Richard Allmendinger,Christian A. Naesseth
备注:15 pages main, 12 pages appendix, 5 figures
【36】ML-Tool-Bench: Tool-Augmented Planning for ML Tasks
标题:ML-Tools-Bench:ML任务的工具增强规划
链接:https://arxiv.org/abs/2512.00672
作者:Yaswanth Chittepu,Raghavendra Addanki,Tung Mai,Anup Rao,Branislav Kveton
【37】An Approach to Joint Hybrid Decision Making between Humans and Artificial Intelligence
标题:人类与人工智能联合混合决策方法
链接:https://arxiv.org/abs/2512.00420
作者:Jonas D. Rockbach,Sven Fuchs,Maren Bennewitz
【38】Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics
标题:解决神经最小-最大博弈:架构,可持续性和动力学的作用
链接:https://arxiv.org/abs/2512.00389
作者:Deep Patel,Emmanouil-Vasileios Vlatakis-Gkaragkounis
备注:Camera-ready for NeurIPS 2025 (including updated section on neural network initialization for experiments in Appendix C)
【39】Efficient and Programmable Exploration of Synthesizable Chemical Space
标题:可合成化学空间的高效和可编程探索
链接:https://arxiv.org/abs/2512.00384
作者:Shitong Luo,Connor W. Coley
【40】An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
标题:离线RL简化为在线RL子程序有效性的实证研究
链接:https://arxiv.org/abs/2512.00383
作者:Jianhai Su,Jinzhu Luo,Qi Zhang
【41】The Information Theory of Similarity
标题:相似性信息理论
链接:https://arxiv.org/abs/2512.00378
【42】Introducing AI-Driven IoT Energy Management Framework
标题:引入人工智能驱动的物联网能源管理框架
链接:https://arxiv.org/abs/2512.00321
作者:Shivani Mruthyunjaya,Anandi Dutta,Kazi Sifatul Islam
备注:Accepted in IEEE Smart World Congress 2025, Calgary, Canada
【43】Tracing Mathematical Proficiency Through Problem-Solving Processes
标题:通过问题解决过程追踪数学能力
链接:https://arxiv.org/abs/2512.00311
作者:Jungyang Park,Suho Kang,Jaewoo Park,Jaehong Kim,Jaewoo Shin,Seonjoon Park,Youngjae Yu
备注:15 pages, 7 figures
【44】Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning
标题:基于远程传输的近乎机器遗忘中的隐私辩护
链接:https://arxiv.org/abs/2512.00272
作者:Mohammad M Maheri,Xavier Cadet,Peter Chin,Hamed Haddadi
【45】Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves
标题:多元神经束扩散:细胞束的光谱过滤方法
链接:https://arxiv.org/abs/2512.00242
作者:Alessio Borgi,Fabrizio Silvestri,Pietro Liò
【46】Chunking Strategies for Multimodal AI Systems
标题:多模式人工智能系统的分块策略
链接:https://arxiv.org/abs/2512.00185
作者:Shashanka B R,Mohith Charan R,Seema Banu F
备注:45 pages, 5 figure
【47】A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits
标题:基于CNN的辅助模拟电路布图到发生器转换的技术
链接:https://arxiv.org/abs/2512.00070
作者:Sungyu Jeong,Minsu Kim,Byungsub Kim
【48】SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning
标题:SpeedAug:通过Tempo丰富的政策和RL微调来加速政策
链接:https://arxiv.org/abs/2512.00062
作者:Taewook Nam,Sung Ju Hwang
【49】Architect in the Loop Agentic Hardware Design and Verification
标题:架构师在循环中的大型硬件设计和验证
链接:https://arxiv.org/abs/2512.00016
【50】Towards a future space-based, highly scalable AI infrastructure system design
标题:迈向未来基于太空的、高度可扩展的人工智能基础设施系统设计
链接:https://arxiv.org/abs/2511.19468
作者:Blaise Agüera y Arcas,Travis Beals,Maria Biggs,Jessica V. Bloom,Thomas Fischbacher,Konstantin Gromov,Urs Köster,Rishiraj Pravahan,James Manyika
备注:19 pages, 4 figures
【51】Fundamentals of Regression
标题:回归的基础
链接:https://arxiv.org/abs/2512.01920
作者:Miguel A. Mendez
备注:Chapter 2 from Machine Learning for Fluid Dynamics (ISBN 978-2875162090). Based on the VKI-ULB lecture series ''Machine Learning for Fluid Dynamics,'' held in Brussels in February 2022
【52】Decision Tree Embedding by Leaf-Means
标题:通过叶均值嵌入决策树
链接:https://arxiv.org/abs/2512.01819
作者:Cencheng Shen,Yuexiao Dong,Carey E. Priebe
备注:9 pages
【53】LPCD: Unified Framework from Layer-Wise to Submodule Quantization
标题:LPDC:从分层到子模块量化的统一框架
链接:https://arxiv.org/abs/2512.01546
作者:Yuma Ichikawa,Yudai Fujimoto,Akira Sakai
备注:21 pages, 4 figures
【54】Experimental Methods, Health Indicators, and Diagnostic Strategies for Retired Lithium-ion Batteries: A Comprehensive Review
标题:退役锂离子电池的实验方法、健康指标和诊断策略:全面回顾
链接:https://arxiv.org/abs/2512.01294
作者:Song Zhang,Ruohan Guo,Xiaohua Ge,Perter Mahon,Weixiang Shen
备注:Review article; 46 pages, 3 figures, 2 tables
【55】Implicitly Normalized Online PCA: A Regularized Algorithm with Exact High-Dimensional Dynamics
标题:隐式正规化在线PCA:具有精确多维动态学的正规化算法
链接:https://arxiv.org/abs/2512.01231
作者:Samet Demir,Zafer Dogan
备注:34 pages 9 figures
【56】High-dimensional Mean-Field Games by Particle-based Flow Matching
标题:基于粒子流匹配的多维平均场博弈
链接:https://arxiv.org/abs/2512.01172
作者:Jiajia Yu,Junghwan Lee,Yao Xie,Xiuyuan Cheng
【57】Building Trustworthy AI for Materials Discovery: From Autonomous Laboratories to Z-scores
标题:构建可信赖的材料发现人工智能:从自主实验室到Z分数
链接:https://arxiv.org/abs/2512.01080
作者:Benhour Amirian,Ashley S. Dale,Sergei Kalinin,Jason Hattrick-Simpers
【58】On The Finetuning of MLIPs Through the Lens of Iterated Maps With BPTT
标题:通过BPTT迭代地图的视角对MLIP进行微调
链接:https://arxiv.org/abs/2512.01067
作者:Evan Dramko,Yizhi Zhu,Aleksandar Krivokapic,Geoffroy Hautier,Thomas Reps,Christopher Jermaine,Anastasios Kyrillidis
备注:9 main pages, total of 15 pages. 6 tables, 6 Figures
【59】Thompson Sampling for Multi-Objective Linear Contextual Bandit
标题:多目标线性上下文盗贼的Thompson抽样
链接:https://arxiv.org/abs/2512.00930
作者:Somangchan Park,Heesang Ann,Min-hwan Oh
备注:NeurIPS 2025
【60】Non-Negative Matrix Factorization Using Non-Von Neumann Computers
标题:使用非冯·诺伊曼计算机的非负矩阵因式分解
链接:https://arxiv.org/abs/2512.00675
作者:Ajinkya Borle,Charles Nicholas,Uchenna Chukwu,Mohammad-Ali Miri,Nicholas Chancellor
备注:14 pages, 5 figures, 6 tables and 1 appendix
【61】Restricted Block Permutation for Two-Sample Testing
标题:双样本测试的限制性块排列
链接:https://arxiv.org/abs/2512.00668
【62】An RKHS Perspective on Tree Ensembles
标题:RKHS对树木合奏的看法
链接:https://arxiv.org/abs/2512.00397
作者:Mehdi Dagdoug,Clement Dombry,Jean-Jil Duchamps
备注:69 pages
【63】DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants
标题:DAISI:使用随机插值的反采样数据同化
链接:https://arxiv.org/abs/2512.00252
作者:Martin Andrae,Erik Larsson,So Takao,Tomas Landelius,Fredrik Lindsten
备注:41 pages, 24 figures
【64】Beyond Expected Goals: A Probabilistic Framework for Shot Occurrences in Soccer
标题:超出预期的目标:足球中射门发生的可能性框架
链接:https://arxiv.org/abs/2512.00203
作者:Jonathan Pipping,Tianshu Feng,R. Paul Sabin
备注
:18pp main + 3pp appendix; 8 figures, 12 tables. Submitted to the Journal of Quantitative Analysis in Sports (JQAS). Data proprietary to Gradient Sports; we share derived features & scripts (code under MIT/Apache-2.0). Preprint licensed CC BY 4.0
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递