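Visit arxivdaily.com, which covers CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering and offers search and bookmarking features.
cs.LG: 264 papers today
Large models (38 papers)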
【1】ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
Link: https://arxiv.org/abs/2512.07795
Authors: Nearchos Potamitis, Lars Klein, Akhil Arora
Comments: 11 pages, 3 tables, 4 figures
Abstract: Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet current evaluation practices overwhelmingly report single-run accuracy and ignore the intrinsic uncertainty that arises from stochastic decoding. This omission creates a blind spot: practitioners cannot reliably assess whether a method's reported performance is stable, reproducible, or cost-consistent. We introduce ReasonBENCH, the first benchmark designed to quantify the underlying instability in LLM reasoning. ReasonBENCH provides (i) a modular evaluation library that standardizes reasoning frameworks, models, and tasks, (ii) a multi-run protocol that reports statistically reliable metrics for both quality and cost, and (iii) a public leaderboard to encourage variance-aware reporting. Across tasks from different domains, we find that the vast majority of reasoning strategies and models exhibit high instability. Notably, even strategies with similar average performance can display confidence intervals up to four times wider, and the top-performing methods often incur higher and less stable costs. Such instability compromises reproducibility across runs and, consequently, the reliability of reported performance. To better understand these dynamics, we further analyze the impact of prompts, model families, and scale on the trade-off between solve rate and stability. Our results highlight reproducibility as a critical dimension for reliable LLM reasoning and provide a foundation for future reasoning methods and uncertainty quantification techniques. ReasonBENCH is publicly available at https://github.com/au-clan/ReasonBench .
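To make "variance-aware reporting" concrete, here is a minimal sketch that summarizes several independent runs of one method as a mean plus a confidence-interval half-width. It assumes per-run scores are already collected and uses a normal-approximation 95% interval; this is our simplification, not necessarily the statistic ReasonBENCH reports, and the function name `summarize_runs` is hypothetical.

```python
import math
import statistics


def summarize_runs(scores, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    `scores` holds one quality (or cost) measurement per independent run.
    z=1.96 gives a two-sided 95% interval under a normal assumption.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample std (n - 1 denominator)
    half_width = z * std / math.sqrt(len(scores))
    return mean, half_width


# Two hypothetical methods with the same mean solve rate but different spread.
stable = [0.50, 0.52, 0.48, 0.50, 0.50]
unstable = [0.30, 0.70, 0.45, 0.55, 0.50]
for name, runs in [("stable", stable), ("unstable", unstable)]:
    m, hw = summarize_runs(runs)
    print(f"{name}: {m:.3f} +/- {hw:.3f}")
```

Both methods print the same mean, but the unstable one gets a much wider interval, which is exactly the information a single-run accuracy hides.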
【2】RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models
Link: https://arxiv.org/abs/2512.07761
Authors: Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
Comments: 19 pages, 15 figures
Abstract: Large language models are vulnerable to jailbreak attacks, threatening their safe deployment in real-world applications. This paper studies black-box multi-turn jailbreaks, aiming to train attacker LLMs to elicit harmful content from black-box models through a sequence of prompt-output interactions. Existing approaches typically rely on single-turn optimization, which is insufficient for learning long-term attack strategies. To bridge this gap, we formulate the problem as a multi-turn reinforcement learning task, directly optimizing the harmfulness of the final-turn output as the outcome reward. To mitigate sparse supervision and promote long-term attack strategies, we propose two heuristic process rewards: (1) controlling the harmfulness of intermediate outputs to avoid triggering the black-box model's rejection mechanisms, and (2) maintaining the semantic relevance of intermediate outputs to avoid drifting into irrelevant content. Experimental results on multiple benchmarks show consistently improved attack success rates across multiple models, highlighting the effectiveness of our approach. The code is available at https://github.com/xxiqiao/RL-MTJail. Warning: This paper contains examples of harmful content.
【3】In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models
Link: https://arxiv.org/abs/2512.07705
Authors: Saroj Gopali, Bipin Chhetri, Deepika Giri, Sima Siami-Namini, Akbar Siami Namin
Abstract: Existing data-driven approaches to modeling and predicting time series data include ARIMA (Autoregressive Integrated Moving Average), Transformer-based models, LSTM (Long Short-Term Memory), and TCN (Temporal Convolutional Network). These approaches, in particular deep learning-based models such as LSTM and TCN, have shown strong results in predicting time series data. With the advancement of pre-trained foundation models such as Large Language Models (LLMs), and more notably Google's recent foundation model for time series data, TimesFM (Time Series Foundation Model), it is of interest to investigate whether these foundation models can outperform existing approaches in analyzing and predicting time series data. This paper investigates the performance of LLMs for time series prediction, focusing on in-context learning as a way to specialize LLMs to the underlying application domain. More specifically, the paper explores in-context, zero-shot, and few-shot learning for forecasting time series data with OpenAI o4-mini and Gemini 2.5 Flash Lite, as well as Google's recent Transformer-based TimesFM, a time series-specific foundation model, along with two deep learning models, namely TCN and LSTM networks. The findings indicate that TimesFM has the best overall performance, with the lowest RMSE (0.3023) and a competitive inference time (266 seconds). Furthermore, OpenAI's o4-mini also performs well with zero-shot learning. These findings highlight pre-trained time series foundation models as a promising direction for real-time forecasting, enabling accurate and scalable deployment with minimal model adaptation.
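Few-shot forecasting with a general-purpose LLM requires serializing numbers as text, and the models are then compared by RMSE. The sketch below shows one way to build such a prompt and compute RMSE; the "history -> next" prompt format and the name `build_few_shot_prompt` are hypothetical illustrations, not the paper's prompt, and no model API is called.

```python
import math


def build_few_shot_prompt(examples, query):
    """Serialize (history, next_value) pairs into a plain-text few-shot prompt.

    The "history -> next" layout is a hypothetical choice for illustration.
    """
    lines = ["Predict the next value of each sequence."]
    for history, target in examples:
        lines.append(", ".join(f"{x:.2f}" for x in history) + f" -> {target:.2f}")
    # The query ends with "->" so the model only has to complete the forecast.
    lines.append(", ".join(f"{x:.2f}" for x in query) + " ->")
    return "\n".join(lines)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


prompt = build_few_shot_prompt([([1.0, 2.0, 3.0], 4.0)], [2.0, 3.0, 4.0])
print(prompt)
```

With an empty `examples` list the same function produces a zero-shot prompt, which is the setting where the abstract reports o4-mini doing well.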
【4】Depth-Wise Activation Steering for Honest Language Models
Link: https://arxiv.org/abs/2512.07667
Authors: Gracjan Góral, Marysia Winkels, Steven Basart
Comments: See https://github.com/marysia/gaussian-activation-steering for code and experiments.
Abstract: Large language models sometimes assert falsehoods despite internally representing the correct answer. These are failures of honesty rather than accuracy, and they undermine auditability and safety. Existing approaches largely optimize factual correctness or depend on retraining and brittle single-layer edits, offering limited leverage over truthful reporting. We present a training-free activation steering method that weights steering strength across network depth using a Gaussian schedule. On the MASK benchmark, which separates honesty from knowledge, we evaluate seven models spanning the LLaMA, Qwen, and Mistral families and find that Gaussian scheduling improves honesty over no-steering and single-layer baselines in six of seven models. Equal-budget ablations on LLaMA-3.1-8B-Instruct and Qwen-2.5-7B-Instruct show that the Gaussian schedule outperforms random, uniform, and box-filter depth allocations, indicating that how the intervention is distributed across depth materially affects outcomes beyond total strength. The method is simple, model-agnostic, requires no finetuning, and provides a low-cost control knob for eliciting truthful reporting from models' existing capabilities.
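The core idea, a Gaussian weighting of steering strength over layers at a fixed total budget, can be sketched as follows. The center, width, and budget values are hypothetical, and the steering direction is assumed to be computed elsewhere; the function names are ours, not the authors' code.

```python
import math


def gaussian_depth_schedule(num_layers, center, width, total_strength):
    """Per-layer steering coefficients shaped like a Gaussian over depth.

    Coefficients are normalized to sum to `total_strength`, so different
    schedules can be compared at an equal total budget.
    """
    raw = [math.exp(-((layer - center) ** 2) / (2 * width ** 2)) for layer in range(num_layers)]
    scale = total_strength / sum(raw)
    return [scale * w for w in raw]


def steer(hidden, direction, coeff):
    """Add a scaled steering direction to one layer's hidden state."""
    return [h + coeff * d for h, d in zip(hidden, direction)]


coeffs = gaussian_depth_schedule(num_layers=8, center=4, width=1.5, total_strength=4.0)
print([round(c, 3) for c in coeffs])
```

Replacing the Gaussian with a constant (uniform) or a window of equal weights (box filter) while keeping the same `total_strength` gives the kind of equal-budget comparison the abstract describes.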
【5】Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement
Link: https://arxiv.org/abs/2512.07611
Authors: Yongsheng Lian
Abstract: This study presents a systematic comparison of three Reinforcement Learning (RL) algorithms (PPO, GRPO, and DAPO) for improving complex reasoning in large language models (LLMs). Our main contribution is a controlled transfer-learning evaluation: models are first fine-tuned on the specialized Countdown Game and then assessed on a suite of general-purpose reasoning benchmarks. Across all tasks, RL-trained models outperform their corresponding base models, although the degree of improvement differs by benchmark. Our parametric analysis offers practical guidance for RL-based LLM training. Increasing the group size in GRPO and DAPO leads to more stable training dynamics and higher accuracy, while the impact of the KL-penalty coefficient is non-monotonic. Additionally, we find that the Dynamic Sampling (DS) component in DAPO does not improve performance; in fact, the best overall results are achieved with DAPO when DS is disabled.
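GRPO and DAPO compute advantages relative to a group of responses sampled for the same prompt, which is why group size matters: the group mean and standard deviation are estimated from that group. A minimal sketch of group-standardized advantages follows (the `eps` value is an assumption). A group whose rewards are all identical yields zero advantages; DAPO's Dynamic Sampling is designed to filter out such groups, the component the abstract reports performing best when disabled.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def is_informative(rewards):
    """True if the group has reward variance, i.e. produces non-zero advantages."""
    return len(set(rewards)) > 1


print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # roughly [1, -1, 1, -1]
print(is_informative([1.0, 1.0, 1.0, 1.0]))               # False
```

One plausible reading of the group-size result is that larger groups give less noisy estimates of the mean and standard deviation, and therefore steadier advantages.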
【6】Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models
Link: https://arxiv.org/abs/2512.07564
Authors: Kassoum Sanogo, Renzo Ardiccioni
Comments: 24 pages, 3 figures, 2 tables. Training-free self-correction framework for vision-language models. Code and implementation details will be released at: https://github.com/kassoumsanogo1/self-correcting-vlm-re-Attention.git
Abstract: Vision-language models (VLMs) frequently generate hallucinated content: plausible but incorrect claims about image content. We propose a training-free self-correction framework that enables VLMs to iteratively refine their responses through uncertainty-guided visual re-attention. Our method combines multidimensional uncertainty quantification (token entropy, attention dispersion, semantic consistency, claim confidence) with attention-guided cropping of under-explored regions. Because it operates entirely with frozen, pretrained VLMs, our framework requires no gradient updates. We validate our approach on the POPE and MMHAL BENCH benchmarks using the Qwen2.5-VL-7B [23] architecture. Experimental results demonstrate that our method reduces hallucination rates by 9.8 percentage points compared to the baseline, while improving object existence accuracy by 4.7 points on adversarial splits. Furthermore, qualitative analysis confirms that uncertainty-guided re-attention successfully grounds corrections in visual evidence where standard decoding fails. Validation is currently limited to Qwen2.5-VL-7B [23]; we plan to extend it to diverse architectures in future versions. We release our code and methodology to facilitate future research in trustworthy multimodal systems.
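Token entropy, the first of the four uncertainty signals listed above, is straightforward to compute from the per-step next-token distributions. The sketch below averages it over a response and compares the average to a threshold to decide whether to trigger re-attention; the averaging rule, the threshold, and the function names are assumptions for illustration, not the paper's combination of signals.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def needs_reattention(step_distributions, threshold):
    """Flag a response for visual re-attention if its mean token entropy is high.

    `threshold` is a hypothetical tuning knob, not a value from the paper.
    """
    mean_entropy = sum(token_entropy(d) for d in step_distributions) / len(step_distributions)
    return mean_entropy > threshold


confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
print(needs_reattention(confident, threshold=1.0), needs_reattention(uncertain, threshold=1.0))
```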
【7】Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics
Link: https://arxiv.org/abs/2512.07462
Authors: Trung-Kiet Huynh, Duy-Minh Dao-Sy, Thanh-Bang Cao, Phong-Hao Le, Hong-Dan Nguyen, Phu-Quy Nguyen-Lam, Minh-Luan Nguyen-Vo, Hong-Phat Pham, Phu-Hoa Pham, Thien-Kim Than, Chi-Nguyen Tran, Huy Tran, Gia-Thoai Tran-Le, Alessio Buscemi, Le Hong Trang, The Anh Han
Abstract: As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive and multi-agent systems and in human societies, understanding their strategic behaviour has profound implications for safety, coordination, and the design of AI-driven social and economic infrastructures. Assessing such behaviour requires methods that capture not only what LLMs output but also the underlying intentions that guide their decisions. In this work, we extend the FAIRGAME framework to systematically evaluate LLM behaviour in repeated social dilemmas through two complementary advances: a payoff-scaled Prisoner's Dilemma that isolates sensitivity to incentive magnitude, and an integrated multi-agent Public Goods Game with dynamic payoffs and multi-agent histories. These environments reveal consistent behavioural signatures across models and languages, including incentive-sensitive cooperation, cross-linguistic divergence, and end-game alignment toward defection. To interpret these patterns, we train traditional supervised classification models on canonical repeated-game strategies and apply them to FAIRGAME trajectories, showing that LLMs exhibit systematic, model- and language-dependent behavioural intentions, with linguistic framing at times exerting effects as strong as architectural differences. Together, these findings provide a unified methodological foundation for auditing LLMs as strategic agents and reveal systematic cooperation biases with direct implications for AI governance, collective decision-making, and the design of safe multi-agent systems.
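A payoff-scaled Prisoner's Dilemma multiplies every payoff by a constant, which preserves the game's strategic structure while changing the incentive magnitude, so any behavioural change across scales reflects sensitivity to magnitude alone. The sketch below uses the textbook payoffs T=5, R=3, P=1, S=0 (an assumption, not FAIRGAME's configuration) and two canonical strategies of the kind a strategy classifier could be trained on; it is not FAIRGAME's API.

```python
def play_iterated_pd(strategy_a, strategy_b, rounds, scale=1.0):
    """Play an iterated Prisoner's Dilemma and return total payoffs (a, b).

    Payoffs are the textbook T=5, R=3, P=1, S=0, multiplied by `scale`.
    A strategy maps the opponent's move history to "C" or "D".
    """
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a, hist_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pa, pb = payoff[(move_a, move_b)]
        total_a += scale * pa
        total_b += scale * pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b


def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    return "D"


print(play_iterated_pd(tit_for_tat, always_defect, rounds=3, scale=10))  # (20.0, 70.0)
```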
【8】Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models
Link: https://arxiv.org/abs/2512.07419
Authors: Haidong Kang, Jun Du, Lihong Lin
Abstract: Mixed-Precision Quantization (MPQ) frees Deep Neural Networks (DNNs) from the Out-Of-Memory (OOM) bottleneck and has garnered increasing research attention. However, conventional methods either rely on costly differentiable search, which is neither efficient nor flexible, or learn a quantized DNN from a proxy (e.g., HAWQ) manually designed by human experts, which is labor-intensive and requires extensive expert knowledge. Can we design a proxy without involving any human experts or training? In this paper, we answer affirmatively by proposing a novel Large Language Model (LLM)-driven Training-free Automatic Proxy (dubbed TAP) discovery framework, which reforms the design paradigm of MPQ by using LLMs to automatically discover superior proxies tailored for MPQ. In addition, to bridge the gap between black-box LLMs and the difficult MPQ task, we propose a simple Direct Policy Optimization (DPO)-based reinforcement learning scheme that enhances the LLM's reasoning by optimizing prompts. This constructs a positive feedback loop between the LLM and the MPQ task, enabling the LLM to generate a better TAP in the next evolution round. Extensive experiments on mainstream benchmarks demonstrate that TAP achieves state-of-the-art performance. Finally, we believe TAP will contribute significantly to the MPQ community by providing a new perspective on LLM-driven algorithm design.
【9】Do LLMs Trust the Code They Write?
Link: https://arxiv.org/abs/2512.07404
Authors: Francisco Ribeiro, Claudio Spiess, Prem Devanbu, Sarah Nadi
Abstract: Despite the effectiveness of large language models (LLMs) for code generation, they often output incorrect code. One reason is that model output probabilities are often poorly correlated with correctness and reflect only the final output of the generation process. Inspired by findings that LLMs internally encode concepts like truthfulness, this paper explores whether LLMs similarly represent code correctness. Specifically, we identify a correctness representation inside LLMs by contrasting the hidden states of pairs of correct and incorrect code for the same programming tasks. Experimenting on four LLMs, we show that exploiting this extracted correctness representation outperforms standard log-likelihood ranking as well as verbalized model confidence. Furthermore, we explore how this internal correctness signal can be used to select higher-quality code samples without requiring test execution. Ultimately, this work demonstrates how leveraging internal representations can enhance code generation systems and make LLMs more reliable, thus improving confidence in automatically generated code.
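One common way to turn "contrasting hidden states of correct and incorrect code" into a usable signal is a difference-of-means direction, with candidates ranked by their projection onto it. The sketch below assumes that construction (the paper may extract the representation differently) and uses toy 2-D vectors in place of real hidden states; all names are hypothetical.

```python
import math


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def correctness_direction(correct_states, incorrect_states):
    """Unit vector pointing from the mean incorrect hidden state to the mean correct one."""
    mu_pos = mean_vector(correct_states)
    mu_neg = mean_vector(incorrect_states)
    diff = [p - n for p, n in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def rank_candidates(candidate_states, direction):
    """Order candidate indices by projection onto the direction, best first."""
    scores = [sum(h * d for h, d in zip(state, direction)) for state in candidate_states]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


direction = correctness_direction([[1.0, 0.0], [3.0, 0.0]], [[-1.0, 0.0], [-3.0, 0.0]])
print(direction)                                                           # [1.0, 0.0]
print(rank_candidates([[0.1, 5.0], [2.0, -1.0], [-1.0, 0.0]], direction))  # [1, 0, 2]
```

Ranking by this projection needs no test execution, which matches the selection use case described in the abstract.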
【10】LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples
Link: https://arxiv.org/abs/2512.07375
Authors: Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
Abstract: Large language models (LLMs) possess vast knowledge acquired from extensive training corpora, but they often cannot remove specific pieces of information when needed, which makes it hard to handle privacy, bias mitigation, and knowledge correction. Traditional model unlearning approaches require computationally expensive fine-tuning or direct weight editing, making them impractical for real-world deployment. In this work, we introduce LoRA-based Unlearning with Negative Examples (LUNE), a lightweight framework that performs negative-only unlearning by updating only low-rank adapters while freezing the backbone, thereby localizing edits and avoiding disruptive global changes. Leveraging Low-Rank Adaptation (LoRA), LUNE targets intermediate representations to suppress (or replace) requested knowledge with an order of magnitude less compute and memory than full fine-tuning or direct weight editing. Extensive experiments on multiple factual unlearning tasks show that LUNE (I) achieves effectiveness comparable to full fine-tuning and memory-editing methods, and (II) reduces computational cost by about an order of magnitude.
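The cost savings come from the standard LoRA parameterization: the frozen weight `W` is augmented with a low-rank product `B A`, and only `A` and `B` are trained. The sketch below shows that forward pass and the parameter count; it illustrates LoRA in general, not LUNE's negative-example loss, and the 4096/rank-8 sizes are example values.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Parameters updated by full fine-tuning vs. by one LoRA adapter."""
    return d_in * d_out, rank * (d_in + d_out)


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]    # rank 1: projects the 2-d input to 1 dimension
B = [[0.0], [0.0]]  # zero-initialized, so the adapter starts as a no-op
print(lora_forward(W, A, B, [2.0, 4.0], alpha=1.0, rank=1))  # [2.0, 4.0]
print(trainable_params(4096, 4096, 8))                       # (16777216, 65536)
```

For a 4096 x 4096 layer, a rank-8 adapter trains 256 times fewer parameters than the full matrix, which is the kind of gap behind the reported order-of-magnitude savings.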
【11】Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning
Link: https://arxiv.org/abs/2512.07374
Authors: Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
Abstract: Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and demonstrate that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
【12】Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Link: https://arxiv.org/abs/2512.07222
Authors: Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Chao Shen
Abstract: To address the trade-off between robustness and performance in robust VLMs, we observe that function words can make VLMs vulnerable to cross-modal adversarial attacks, and accordingly propose Function-word De-Attention (FDA) to mitigate the impact of function words. Similar to a differential amplifier, FDA computes both the original cross-attention and the function-word cross-attention within attention heads, and differentially subtracts the latter from the former to obtain better-aligned and more robust VLMs. Comprehensive experiments cover 2 SOTA baselines under 6 different attacks on 2 downstream tasks, 3 datasets, and 3 models. Overall, FDA yields an average ASR drop of 18/13/53% with performance drops of only 0.2/0.3/0.6% on the 3 tested models for retrieval, and a 90% ASR drop with a 0.3% performance gain on visual grounding. We experimentally demonstrate the scalability, generalization, and zero-shot performance of FDA, and provide in-depth ablation studies and analysis. Code will be made publicly available at https://github.com/michaeltian108/FDA.
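The differential-amplifier analogy can be illustrated on a single query: compute the ordinary attention map, compute a second map restricted to function-word positions, and subtract a scaled copy of the second from the first. The scaling factor `lam`, the clip-and-renormalize step, and the function names are assumptions; the abstract does not specify FDA's exact formulation.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def function_word_de_attention(scores, is_function_word, lam=0.5):
    """Subtract a scaled function-word-only attention map from the full map.

    `lam` (assumed < 1) and the clip-and-renormalize step are illustrative
    assumptions, not FDA's published formulation.
    """
    full = softmax(scores)
    masked = [s if fw else -math.inf for s, fw in zip(scores, is_function_word)]
    fw_attn = softmax(masked) if any(is_function_word) else [0.0] * len(scores)
    diff = [max(f - lam * g, 0.0) for f, g in zip(full, fw_attn)]
    total = sum(diff)
    return [d / total for d in diff]


# Tokens: "a", "cat", "on", "mat"; "a" and "on" are function words.
print(function_word_de_attention([1.0, 1.0, 1.0, 1.0], [True, False, True, False]))
```

With equal raw scores, attention that originally spread evenly across all four tokens ends up on the content words "cat" and "mat" only.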
【13】NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models
Link: https://arxiv.org/abs/2512.07218
Authors: Feng Liang, Weixin Zeng, Runhao Zhao, Xiang Zhao
Comments: Accepted by AAAI 2026
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, temporal reasoning, particularly under complex temporal constraints, remains a major challenge. To address it, existing approaches have explored symbolic methods, which encode temporal structure explicitly, and reflective mechanisms, which revise reasoning errors through multi-step inference. Nonetheless, symbolic approaches often underutilize the reasoning capabilities of LLMs, while reflective methods typically lack structured temporal representations, which can result in inconsistent or hallucinated reasoning. As a result, even when the correct temporal context is available, LLMs may still misinterpret or misapply time-related information, leading to incomplete or inaccurate answers. To address these limitations, we propose Neuro-Symbolic Temporal Reasoning (NeSTR), a novel framework that integrates structured symbolic representations with hybrid reflective reasoning to enhance the temporal sensitivity of LLM inference. NeSTR preserves explicit temporal relations through symbolic encoding, enforces logical consistency via verification, and corrects flawed inferences using abductive reflection. Extensive experiments on diverse temporal question answering benchmarks demonstrate that NeSTR achieves superior zero-shot performance and consistently improves temporal reasoning without any fine-tuning, showcasing the advantage of neuro-symbolic integration for temporal understanding in large language models.
【14】SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
Link: https://arxiv.org/abs/2512.07175
Authors: Yibo Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
Comments: NeurIPS 2025
Abstract: Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively refine the model with real samples and synthetic ones generated by the model itself. However, existing methods primarily focus on the relative gap between the rewards of the two types of data, neglecting their absolute values. Through theoretical analysis, we identify that gap-based methods suffer from unstable evolution due to potentially degenerate objectives. To address this limitation, we introduce a novel self-play fine-tuning method, Self-PlAy via Noise Contrastive Estimation (SPACE), which leverages noise contrastive estimation to capture the real-world data distribution. Specifically, SPACE treats synthetic samples as auxiliary components and discriminates them from real ones in a binary classification manner. As a result, SPACE independently optimizes the absolute reward values for each type of data, ensuring a consistently meaningful objective and thereby avoiding the instability issue. Theoretically, we show that the optimal solution of the SPACE objective aligns with the underlying distribution of real-world data, and that SPACE provably converges stably to the optimal distribution. Empirically, we show that SPACE significantly improves the performance of LLMs on various tasks and outperforms supervised fine-tuning that uses many more real-world samples. Compared to gap-based self-play fine-tuning methods, SPACE exhibits clear superiority and stable evolution.
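The contrast between a gap-based objective and an NCE-style binary classification objective can be seen with scalar rewards alone. In the sketch below, `r_real` and `r_synth` are per-sample scalar rewards (for example, log-likelihood ratios against a reference model, an assumption here); the gap loss only sees their difference, while the NCE loss scores each reward on an absolute scale. This is an illustration of the idea, not SPACE's exact loss.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def gap_loss(r_real, r_synth):
    """Gap-based objective: depends only on the difference of the two rewards."""
    return -log_sigmoid(r_real - r_synth)


def nce_loss(r_real, r_synth):
    """NCE-style binary classification: real samples are the positive class and
    synthetic samples the negative class, so each absolute reward matters."""
    return -log_sigmoid(r_real) - log_sigmoid(-r_synth)


# Shifting both rewards by the same amount leaves the gap loss unchanged
# but changes the NCE loss.
print(gap_loss(0.0, 0.0), gap_loss(5.0, 5.0))
print(nce_loss(0.0, 0.0), nce_loss(5.0, 5.0))
```

The shift invariance of the gap loss means both rewards can drift together without changing the objective, which gives one intuition for the degenerate directions the abstract attributes to gap-based methods.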
【15】Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration
Link: https://arxiv.org/abs/2512.07173
Authors: Jucheng Shen, Gaurav Sarkar, Yeonju Ro, Sharath Nittur Sridhar, Zhangyang Wang, Aditya Akella, Souvik Kundu
Comments: 8 pages, 3 figures. Preprint under review
Abstract: We present CadLLM, a training-free method to accelerate the inference throughput of diffusion-based LLMs (dLLMs). We first investigate how token unmasking confidence varies across blocks and steps. Based on this observation, we present a lightweight adaptive approach that controls the generation block size, step size, and threshold based on the average confidence of unmasked tokens. We further reduce softmax overhead by dynamically using a subset of the vocabulary to regulate sampling breadth. CadLLM is a plug-and-play, model-agnostic method compatible with KV-cache-based dLLMs. Extensive experiments on four popular tasks demonstrate that CadLLM yields up to 2.28x throughput improvement over the state-of-the-art baseline with competitive accuracy.
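Confidence-aware parallel unmasking can be sketched as: commit every masked position whose confidence clears a threshold, then move the threshold according to how confident the committed tokens were. The update rule and all constants below are hypothetical, and CadLLM's adaptation of block size, step size, and vocabulary subset is not shown.

```python
def select_tokens_to_unmask(confidences, threshold):
    """Indices of masked positions confident enough to commit this step.

    The single most confident position is always committed so decoding progresses.
    """
    chosen = [i for i, c in enumerate(confidences) if c >= threshold]
    if not chosen:
        chosen = [max(range(len(confidences)), key=lambda i: confidences[i])]
    return chosen


def adapt_threshold(threshold, avg_confidence, target=0.9, step=0.05, lo=0.5, hi=0.95):
    """Hypothetical update: relax the threshold after very confident commits
    (decode more tokens in parallel) and tighten it otherwise."""
    if avg_confidence >= target:
        threshold -= step
    else:
        threshold += step
    return min(max(threshold, lo), hi)


print(select_tokens_to_unmask([0.95, 0.4, 0.91, 0.7], threshold=0.9))  # [0, 2]
```

Lowering the threshold when confidence is high is what lets more tokens be committed per step, which is where a throughput gain would come from under this sketch.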
【16】FOAM: Blocked State Folding for Memory-Efficient LLM Training
Link: https://arxiv.org/abs/2512.07112
Authors: Ziqing Wen, Jiahuan Wang, Ping Luo, Dongsheng Li, Tao Sun
Abstract: Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using memory-intensive optimizers like Adam. Existing memory-efficient approaches often rely on techniques such as singular value decomposition (SVD), projections, or weight freezing, which can introduce substantial computational overhead, require additional memory for projections, or degrade model performance. In this paper, we propose Folded Optimizer with Approximate Moment (FOAM), a method that compresses optimizer states by computing block-wise gradient means and incorporates a residual correction to recover lost information. Theoretically, FOAM achieves convergence rates equivalent to vanilla Adam under standard non-convex optimization settings. Empirically, FOAM reduces total training memory by approximately 50%, eliminates up to 90% of optimizer state memory overhead, and accelerates convergence. Furthermore, FOAM is compatible with other memory-efficient optimizers, delivering performance and throughput that match or surpass both full-rank and existing memory-efficient baselines.
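The folding step, replacing a state vector with one mean per contiguous block, can be sketched directly; storage shrinks by roughly the block size. The residual below shows what folding discards. How FOAM stores and applies its residual correction, and which Adam states it folds, are not specified in the abstract, so this sketch covers only the compression step and uses hypothetical names.

```python
def fold(values, block_size):
    """Compress a state vector into one mean per contiguous block."""
    return [
        sum(values[i:i + block_size]) / len(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def unfold(means, block_size, length):
    """Broadcast block means back to the original length."""
    return [means[i // block_size] for i in range(length)]


def residual(values, block_size):
    """What folding discards; a residual correction would aim to recover this."""
    approx = unfold(fold(values, block_size), block_size, len(values))
    return [v - a for v, a in zip(values, approx)]


grad = [1.0, 3.0, 5.0, 7.0]
print(fold(grad, 2))      # [2.0, 6.0]: half the storage of the full vector
print(residual(grad, 2))  # [-1.0, 1.0, -1.0, 1.0]
```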
【17】The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models
Link: https://arxiv.org/abs/2512.07092
Authors: Zhixiang Wang
Comments: 10 pages, 3 figures, 1 table. Code and dataset available at https://huggingface.co/Zx93/Soul-Engine-Qwen2.5-0.5B
Abstract: Background: The deployment of personalized Large Language Models (LLMs) is currently constrained by the stability-plasticity dilemma. Prevailing alignment methods, such as Supervised Fine-Tuning (SFT), rely on stochastic weight updates that often incur an "alignment tax" that degrades general reasoning capabilities. Methods: We propose the Soul Engine, a framework based on the Linear Representation Hypothesis, which posits that personality traits exist as orthogonal linear subspaces. We introduce SoulBench, a dataset constructed via dynamic contextual sampling. Using a dual-head architecture on a frozen Qwen-2.5 base, we extract disentangled personality vectors without modifying the backbone weights. Results: Our experiments demonstrate three breakthroughs. First, High-Precision Profiling: the model achieves a Mean Squared Error (MSE) of 0.011 against psychological ground truth. Second, Geometric Orthogonality: t-SNE visualization confirms that personality manifolds are distinct and continuous, allowing for "Zero-Shot Personality Injection" that maintains the original model's intelligence. Third, Deterministic Steering: we achieve robust control over behavior via vector arithmetic, validated through extensive ablation studies. Conclusion: This work challenges the necessity of fine-tuning for personalization. By transitioning from probabilistic prompting to deterministic latent intervention, we provide a mathematically rigorous foundation for safe, controllable AI personalization.
【18】ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
Link: https://arxiv.org/abs/2512.07086
Authors: Yunzhe Li, Jianan Wang, Hongzi Zhu, James Lin, Shan Chang, Minyi Guo
Comments: This version includes the final camera-ready manuscript accepted by NDSS 2026
Abstract: Large Language Models (LLMs) have become foundational components in a wide range of applications, including natural language understanding and generation, embodied intelligence, and scientific discovery. As their computational requirements continue to grow, these models are increasingly deployed as cloud-based services, allowing users to access powerful LLMs via the Internet. However, this deployment model introduces a new class of threat: denial-of-service (DoS) attacks via unbounded reasoning, where adversaries craft specially designed inputs that cause the model to enter excessively long or infinite generation loops. These attacks can exhaust backend compute resources, degrading or denying service to legitimate users. To mitigate such risks, many LLM providers adopt a closed-source, black-box setting to obscure model internals. In this paper, we propose ThinkTrap, a novel input-space optimization framework for DoS attacks against LLM services, even in black-box environments. The core idea of ThinkTrap is to first map discrete tokens into a continuous embedding space and then perform efficient black-box optimization in a low-dimensional subspace that exploits input sparsity. The goal of this optimization is to identify adversarial prompts that induce extended or non-terminating generation across several state-of-the-art LLMs, achieving DoS with minimal token overhead. We evaluate the proposed attack across multiple commercial, closed-source LLM services. Our results demonstrate that, even when operating well below the restrictive request-frequency limits commonly enforced by these platforms (typically capped at ten requests per minute, 10 RPM), the attack can degrade service throughput to as low as 1% of its original capacity and, in some cases, induce complete service failure.
【19】Ideal Attribution and Faithful Watermarks for Language Models
Link: https://arxiv.org/abs/2512.07038
Authors: Min Jae Song, Kameron Shahabi
Comments: 30 pages
Abstract: We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. Because the ideal functionalities are specified first, it also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.
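The ledger abstraction can be sketched as an append-only log plus a deterministic decision rule parameterized by a selection criterion. The verbatim-substring criterion used here is a hypothetical example; the paper's point is that a watermark should faithfully reproduce such decisions without having access to the ledger itself.

```python
class Ledger:
    """Append-only log of (prompt, response) interactions with a model."""

    def __init__(self):
        self._entries = []

    def append(self, prompt, response):
        self._entries.append((prompt, response))

    def entries(self):
        # Return a tuple copy so callers cannot rewrite past interactions.
        return tuple(self._entries)


def attribute(ledger, text, criterion):
    """Deterministically decide whether `text` is attributed to the model."""
    return any(criterion(text, response) for _, response in ledger.entries())


def verbatim_substring(text, response):
    """Hypothetical selection criterion: `text` appears verbatim in a logged response."""
    return len(text) > 0 and text in response


ledger = Ledger()
ledger.append("Write a line about autumn", "Autumn moonlight falls on the quiet lake")
print(attribute(ledger, "moonlight falls", verbatim_substring))  # True
print(attribute(ledger, "spring rain", verbatim_substring))      # False
```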
【20】LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding
Link: https://arxiv.org/abs/2512.06982
Authors: Yu Yu, Qian Xie, Nairen Cao, Li Jin
Comments: NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning
Abstract: Designing state encoders for reinforcement learning (RL) with multiple information sources (such as sensor measurements, time-series signals, image observations, and textual instructions) remains underexplored and often requires manual design. We formalize this challenge as a composite neural architecture search (NAS) problem, in which multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules, such as their representation quality, which limits sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline that leverages language-model priors and intermediate-output signals to guide a sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.
【21】Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge
Link: https://arxiv.org/abs/2512.06951
Authors: Ilia Larchenko, Gleb Zarin, Akash Karnatak
Comments: 2025 NeurIPS Behavior Challenge 1st place solution
Abstract: We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge, a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation that require bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves a q-score of 26% across all 50 tasks on both the public and private leaderboards.
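Correlated noise for flow matching means the noise sample for an action chunk is not independent across time steps. The sketch below assumes an AR(1) correlation over the action horizon (each step keeps unit marginal variance) and the linear interpolation path with noise at t = 0; both are assumptions, since the abstract does not state the correlation structure or the time convention used.

```python
import math
import random


def correlated_noise(horizon, action_dim, rho, rng):
    """Gaussian noise that is AR(1)-correlated across the action horizon.

    e_t = rho * e_{t-1} + sqrt(1 - rho^2) * z_t keeps unit marginal variance.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    seq = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)]]
    for _ in range(horizon - 1):
        prev = seq[-1]
        seq.append([rho * p + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for p in prev])
    return seq


def flow_matching_pair(actions, noise, t):
    """Linear-path interpolant x_t = (1 - t) * noise + t * actions, and its
    target velocity (actions - noise)."""
    x_t = [[(1 - t) * n + t * a for n, a in zip(nr, ar)] for nr, ar in zip(noise, actions)]
    velocity = [[a - n for n, a in zip(nr, ar)] for nr, ar in zip(noise, actions)]
    return x_t, velocity


rng = random.Random(0)
noise = correlated_noise(horizon=4, action_dim=2, rho=0.9, rng=rng)
x_t, v = flow_matching_pair([[0.0, 0.0]] * 4, noise, t=0.5)
```

With `rho` close to 1, neighbouring steps of the noise sequence are similar, which is one way to make sampled action trajectories smoother.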
【22】Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models
标题:父模型引导语义奖励模型(PGSRM):基于嵌入的Transformer语言模型强化学习奖励函数
链接:https://arxiv.org/abs/2512.06920
作者:Alexandr Plashchinsky
摘要:We introduce the Parent-Guided Semantic Reward Model (PGSRM), a lightweight reward framework for reinforcement learning (RL) of transformer language models. PGSRM replaces binary correctness signals, human preference data, and trained reward models with a simple signal: the cosine similarity between the embedding of a parent model's reference output and the embedding of a child model's generated output for the same input. This yields a dense, semantically meaningful reward without human annotation or additional model training. We apply PGSRM to five language tasks and find that it produces smoother reward improvement and more stable PPO dynamics than a binary reward baseline, suggesting that embedding-based semantic rewards are a practical alternative to RLHF-style reward modeling for parent-guided alignment in smaller transformer models.
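A minimal sketch of the reward signal, assuming both outputs have already been embedded by some sentence encoder (the abstract does not name one). The reward is the cosine similarity of the two embeddings, so it is dense and lies in [-1, 1] instead of being 0/1:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate embedding: give no reward signal
    return dot / (nu * nv)

def pgsrm_reward(parent_embedding, child_embedding):
    """Dense reward: how semantically close the child's output is to the parent's reference."""
    return cosine_similarity(parent_embedding, child_embedding)
```

A partially correct answer whose embedding points roughly toward the reference still earns a positive reward, which is what makes the signal smoother than a binary correctness check.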
【23】Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
标题:少即是多,但在哪里?基于LLM引导关键帧先验的动态令牌压缩
链接:https://arxiv.org/abs/2512.06866
作者:Yulin Li,Haokun Gui,Ziyang Fan,Junjie Wang,Bin Kang,Bin Chen,Zhuotao Tian
备注:Accepted by NeurIPS 2025
摘要:Recent advances in Video Large Language Models (VLLMs) have enabled remarkable video understanding capabilities, yet these models face critical efficiency bottlenecks because computation grows quadratically with the lengthy visual token sequences of long videos. While existing keyframe sampling methods can improve temporal modeling efficiency, they introduce additional computational cost before feature encoding, and their binary frame selection paradigm has been found to be suboptimal. In this work, we therefore propose Dynamic Token compression via LLM-guided Keyframe prior (DyToK), a training-free paradigm that enables dynamic token compression by harnessing VLLMs' inherent attention mechanisms. Our analysis reveals that VLLM attention layers naturally encode query-conditioned keyframe priors, by which DyToK dynamically adjusts per-frame token retention ratios, prioritizing semantically rich frames while suppressing redundancies. Extensive experiments demonstrate that DyToK achieves state-of-the-art efficiency-accuracy trade-offs. DyToK shows plug-and-play compatibility with existing compression methods, such as VisionZip and FastV, attaining 4.3x faster inference while preserving accuracy across multiple VLLMs, such as LLaVA-OneVision and Qwen2.5-VL. Code is available at https://github.com/yu-lin-li/DyToK .
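The abstract says DyToK turns attention-derived keyframe priors into per-frame retention ratios but does not give the allocation rule. A minimal sketch, assuming a hypothetical rule that splits a global token budget across frames in proportion to each frame's summed query-conditioned attention (with a floor of `min_keep` tokens per frame) and then keeps the highest-attention tokens inside each frame:

```python
def allocate_frame_budgets(frame_scores, total_budget, tokens_per_frame, min_keep=1):
    """Split a global token budget across frames in proportion to frame attention mass."""
    n = len(frame_scores)
    if not n * min_keep <= total_budget <= n * tokens_per_frame:
        raise ValueError("budget outside the feasible range")
    weights = frame_scores if sum(frame_scores) > 0 else [1.0] * n
    total = sum(weights)
    spare = total_budget - n * min_keep
    budgets = [min(tokens_per_frame, min_keep + int(spare * w / total)) for w in weights]
    # Hand out tokens lost to rounding or capping, most-attended frames first.
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    while sum(budgets) < total_budget:
        for i in order:
            if sum(budgets) == total_budget:
                break
            if budgets[i] < tokens_per_frame:
                budgets[i] += 1
    return budgets

def compress_video_tokens(token_scores, total_budget, min_keep=1):
    """token_scores[f][t]: attention received by token t of frame f from the text query."""
    frame_scores = [sum(ts) for ts in token_scores]
    budgets = allocate_frame_budgets(frame_scores, total_budget,
                                     len(token_scores[0]), min_keep)
    kept = []
    for ts, k in zip(token_scores, budgets):
        top = sorted(range(len(ts)), key=lambda j: ts[j], reverse=True)[:k]
        kept.append(sorted(top))  # restore the original spatial order
    return budgets, kept
```

Unlike binary keyframe selection, every frame keeps at least a few tokens, while query-relevant frames keep many more.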
【24】RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models
标题:RMAdapter:用于视觉语言模型的基于重建的多模态适配器
链接:https://arxiv.org/abs/2512.06811
作者:Xiang Lin,Weixin Li,Shu Guo,Lihong Wang,Di Huang
备注:Accepted by AAAI 2026(Oral)
摘要:Pre-trained Vision-Language Models (VLMs), e.g., CLIP, have become essential tools in multimodal transfer learning. However, fine-tuning VLMs in few-shot scenarios makes it difficult to balance task-specific adaptation against generalization in the resulting model. Meanwhile, current research has predominantly focused on prompt-based adaptation methods, leaving adapter-based approaches underexplored and with notable performance gaps. To address these challenges, we introduce a novel Reconstruction-based Multimodal Adapter (RMAdapter), which uses a dual-branch architecture. Unlike conventional single-branch adapters, RMAdapter consists of: (1) an adaptation branch that injects task-specific knowledge through parameter-efficient fine-tuning, and (2) a reconstruction branch that preserves general knowledge by reconstructing latent space features back into the original feature space. This design facilitates a dynamic balance between general and task-specific knowledge. Importantly, although RMAdapter introduces an additional reconstruction branch, it is carefully optimized to remain lightweight: by computing the reconstruction loss locally at each layer and sharing projection modules, the overall computational overhead is kept minimal. A consistency constraint is also incorporated to better regulate the trade-off between discriminability and generalization. We comprehensively evaluate RMAdapter on three representative tasks: generalization to new categories, generalization to new target datasets, and domain generalization. Without relying on data augmentation or duplicate prompt designs, RMAdapter consistently outperforms state-of-the-art approaches across all evaluation metrics.
【25】KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models
标题:KV-CAR:在大型语言模型中使用自动编码器的KV缓存压缩和KV重用
链接:https://arxiv.org/abs/2512.06727
作者:Sourjya Roy,Shrihari Sridharan,Surya Selvam,Anand Raghunathan
摘要:As Large Language Models (LLMs) scale in size and context length, the memory requirements of the key-value (KV) cache have emerged as a major bottleneck during autoregressive decoding. The KV cache grows with sequence length and embedding dimension, often exceeding the memory footprint of the model itself and limiting achievable batch sizes and context windows. To address this challenge, we present KV-CAR, a unified and architecture-agnostic framework that significantly reduces KV cache storage while maintaining model fidelity. KV-CAR combines two complementary techniques. First, a lightweight autoencoder learns compact representations of key and value tensors along the embedding dimension, compressing them before they are stored in the KV cache and restoring them upon retrieval. Second, a similarity-driven reuse mechanism identifies opportunities to reuse the KV tensors of specific attention heads across adjacent layers. Together, these methods reduce the dimensional and structural redundancy in KV tensors without requiring changes to the transformer architecture. Evaluations on GPT-2 and TinyLLaMA models across the Wikitext, C4, PIQA, and Winogrande datasets demonstrate that KV-CAR achieves up to 47.85 percent KV cache memory reduction with minimal impact on perplexity and zero-shot accuracy. System-level measurements on an NVIDIA A40 GPU show that the reduced KV footprint directly translates into longer sequence lengths and larger batch sizes during inference. These results highlight the effectiveness of KV-CAR in enabling memory-efficient LLM inference.
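The abstract does not give the similarity criterion for cross-layer reuse. A minimal sketch of the planning step only, assuming a hypothetical rule: a head's cached tensor at layer l is replaced by a pointer to layer l-1 (or to whatever layer l-1 already points at) when the mean per-token cosine similarity exceeds `threshold`. The autoencoder half of KV-CAR is not shown.

```python
import math

def _cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mean_cosine(a_rows, b_rows):
    """Average cosine similarity between matching token rows of two [tokens x head_dim] tensors."""
    return sum(_cos(a, b) for a, b in zip(a_rows, b_rows)) / len(a_rows)

def plan_kv_reuse(kv_by_layer, threshold=0.95):
    """kv_by_layer[l][h]: head h's key (or value) rows at layer l.
    Returns {(layer, head): source_layer} for heads whose cache entry can be shared."""
    reuse = {}
    for l in range(1, len(kv_by_layer)):
        for h, rows in enumerate(kv_by_layer[l]):
            if mean_cosine(kv_by_layer[l - 1][h], rows) >= threshold:
                # Point at the layer that actually stores the tensor (follow earlier reuse).
                reuse[(l, h)] = reuse.get((l - 1, h), l - 1)
    return reuse

def cache_fraction_saved(reuse, num_layers, num_heads):
    return len(reuse) / (num_layers * num_heads)
```

The comparison is made against the neighboring layer's original tensor; a production system would also bound the accumulated error along chains of reused heads.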
【26】GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
标题:GSAE:用于稳健LLM安全引导的图正则化稀疏自编码器
链接:https://arxiv.org/abs/2512.06655
作者:Jehyeok Yeon,Federico Cinus,Yifan Wu,Luca Luceri
摘要:Large language models (LLMs) face critical safety challenges, as they can be manipulated into generating harmful content through adversarial prompts and jailbreak attacks. Many defenses are either black-box guardrails that filter outputs or internals-based methods that steer hidden activations by operationalizing safety as a single latent feature or dimension. While effective for simple concepts, this assumption is limiting, as recent evidence shows that abstract concepts such as refusal and temporality are distributed across multiple features rather than isolated in one. To address this limitation, we introduce Graph-Regularized Sparse Autoencoders (GSAEs), which extend SAEs with a Laplacian smoothness penalty on the neuron co-activation graph. Unlike standard SAEs, which assign each concept to a single latent feature, GSAEs recover smooth, distributed safety representations as coherent patterns spanning multiple features. We empirically demonstrate that GSAE enables effective runtime safety steering by assembling features into a weighted set of safety-relevant directions and controlling them with a two-stage gating mechanism that activates interventions only when harmful prompts or continuations are detected during generation. This approach enforces refusals adaptively while preserving utility on benign queries. Across safety and QA benchmarks, GSAE steering achieves an average 82% selective refusal rate, substantially outperforming standard SAE steering (42%), while maintaining strong task accuracy (70% on TriviaQA, 65% on TruthfulQA, 74% on GSM8K). Robustness experiments further show generalization across the LLaMA-3, Mistral, Qwen, and Phi families and resilience against jailbreak attacks (GCG, AutoDAN), consistently maintaining >= 90% refusal of harmful content.
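A Laplacian smoothness penalty on a weighted graph with adjacency A and Laplacian L = D - A satisfies w^T L w = sum over edges (i, j) of A_ij (w_i - w_j)^2, so it is small when strongly co-activated nodes receive similar weights. A minimal sketch, assuming (this is not stated in the abstract) that the penalty is applied to each SAE feature's weight vector over the co-activation graph's nodes and added to the usual reconstruction and L1 terms with hypothetical coefficients `l1` and `lam`:

```python
def laplacian(adj):
    """L = D - A for a symmetric weighted adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def quad_form(vec, mat):
    n = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))

def laplacian_penalty(feature_weights, adj):
    """Sum of w^T L w over features; each w holds one feature's weights over graph nodes."""
    L = laplacian(adj)
    return sum(quad_form(w, L) for w in feature_weights)

def edge_form(feature_weights, adj):
    """Equivalent edge-wise form: sum over i < j of A_ij * (w_i - w_j)^2."""
    n = len(adj)
    return sum(adj[i][j] * (w[i] - w[j]) ** 2
               for w in feature_weights for i in range(n) for j in range(i + 1, n))

def gsae_loss(x, x_hat, codes, feature_weights, adj, l1=1e-3, lam=1e-2):
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = l1 * sum(abs(c) for c in codes)
    return recon + sparsity + lam * laplacian_penalty(feature_weights, adj)
```

A feature that is constant across a connected group of co-activated nodes pays no penalty, which pushes the autoencoder toward spread-out, coherent patterns rather than a single isolated latent.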
【27】A Fast and Effective Solution to the Problem of Look-ahead Bias in LLMs
标题:一种快速有效解决LLM中前瞻偏差问题的方法
链接:https://arxiv.org/abs/2512.06607
作者:Humzah Merchant,Bradford Levy
摘要:Applying LLMs to predictive tasks in finance is challenging due to look-ahead bias resulting from their training on long time-series data. This precludes the backtests typically employed in finance since retraining frontier models from scratch with a specific knowledge cutoff is prohibitive. In this paper, we introduce a fast, effective, and low-cost alternative. Our method guides generation at inference time by adjusting the logits of a large base model using a pair of smaller, specialized models -- one fine-tuned on information to be forgotten and another on information to be retained. We demonstrate that our method effectively removes both verbatim and semantic knowledge, corrects biases, and outperforms prior methods.
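The abstract describes adjusting a large model's logits with a pair of small models, one fine-tuned on information to forget and one on information to retain, but not the exact formula. A minimal sketch, assuming a hypothetical additive rule in the spirit of contrastive or proxy-tuning-style decoding: shift the base logits by `alpha` times the difference between the retain and forget models' logits.

```python
import math

def adjust_logits(base, retain, forget, alpha=1.0):
    """Shift the large model's logits toward the 'retain' expert and away from the 'forget' expert."""
    return [b + alpha * (r - f) for b, r, f in zip(base, retain, forget)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

Usage: if token 0 encodes a post-cutoff fact that the forget model strongly prefers, `adjust_logits([2.0, 2.0, 0.0], [0.0, 0.0, 0.0], [3.0, 0.0, 0.0])` gives `[-1.0, 2.0, 0.0]`, moving probability mass away from the leaked token; `alpha` trades off forgetting strength against fidelity to the base model.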
【28】A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
标题:A-3PO:通过陈旧性感知的近端策略近似加速异步LLM训练
链接:https://arxiv.org/abs/2512.06547
作者:Xiaocan Li,Shiliang Wu,Zheng Shen
摘要:Decoupled loss has been a successful reinforcement learning (RL) algorithm for handling high data staleness in the asynchronous RL setting. It improves the learning stability of coupled-loss algorithms (e.g., PPO, GRPO) by introducing a proximal policy that decouples the off-policy correction (importance weight) from the control of policy updates (trust region). However, the proximal policy requires an extra forward pass through the network at each training step, creating a computational bottleneck for large language models. We observe that since the proximal policy only serves as a trust-region anchor between the behavior and target policies, it can be approximated through simple interpolation without explicit computation. We call this approach A-3PO (APproximated Proximal Policy Optimization). A-3PO eliminates this overhead, reducing training time by 18% while maintaining comparable performance. Code and an off-the-shelf example are available at: https://github.com/inclusionAI/AReaL/blob/main/docs/algorithms/prox_approx.md
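The abstract says the proximal policy is approximated by "simple interpolation" but does not give the weights. A minimal per-token sketch, assuming interpolation of log-probabilities between the behavior and current policies with a hypothetical weight alpha = 1 / (1 + staleness), plugged into a decoupled PPO-style objective (importance weight pi_prox / pi_behav, clipping ratio pi_theta / pi_prox). The real implementation is in the linked AReaL documentation and may differ.

```python
import math

def approx_prox_logp(logp_behav, logp_theta, staleness):
    """Hypothetical interpolation in log-prob space (a geometric mixture, not renormalized).
    staleness: number of policy versions between the behavior policy and the current one."""
    alpha = 1.0 / (1.0 + staleness)  # fresh data (staleness 0) anchors at the behavior policy
    return alpha * logp_behav + (1.0 - alpha) * logp_theta

def decoupled_ppo_token_loss(logp_theta, logp_behav, logp_prox, advantage, eps=0.2):
    # Off-policy correction pi_prox / pi_behav; with autodiff, logp_prox gets no gradient.
    behav_weight = math.exp(logp_prox - logp_behav)
    # Trust-region ratio is measured against the proximal anchor, not the behavior policy.
    ratio = math.exp(logp_theta - logp_prox)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -behav_weight * min(ratio * advantage, clipped * advantage)
```

The saving comes from `approx_prox_logp` needing only quantities that are already available (the stored behavior log-probs and the current forward pass), so no extra forward pass through a separate proximal model is required.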
【29】Optimizing LLMs Using Quantization for Mobile Execution
标题:面向移动端执行的LLM量化优化
链接:https://arxiv.org/abs/2512.06490
作者:Agatsya Yadav,Renta Chintala Bhargavi
备注:11 pages, 1 equation, 2 tables. Author Accepted Manuscript (AAM) of a paper published in Springer LNNS, ICT4SD 2025. DOI: 10.1007/978-3-032-06697-8_33
摘要:Large Language Models (LLMs) offer powerful capabilities, but their significant size and computational requirements hinder deployment on resource-constrained mobile devices. This paper investigates Post-Training Quantization (PTQ) for compressing LLMs for mobile execution. We apply 4-bit PTQ using the BitsAndBytes library with the Hugging Face Transformers framework to Meta's Llama 3.2 3B model. The quantized model is converted to GGUF format using llama.cpp tools for optimized mobile inference. The PTQ workflow achieves a 68.66% reduction in model size through 4-bit quantization, enabling the Llama 3.2 3B model to run efficiently on an Android device. Qualitative validation shows that the 4-bit quantized model can perform inference tasks successfully. We demonstrate the feasibility of running the quantized GGUF model on an Android device using the Termux environment and the Ollama framework. PTQ, especially at 4-bit precision combined with mobile-optimized formats like GGUF, provides a practical pathway for deploying capable LLMs on mobile devices, balancing model size and performance.
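To make the idea behind 4-bit PTQ concrete, here is a minimal sketch of blockwise symmetric absmax quantization. It uses plain uniform int4 codes, not the NF4/FP4 data types that BitsAndBytes provides or llama.cpp's GGUF quantization types, and the 64-weight block with a 32-bit scale is an illustrative assumption:

```python
def quantize_block_int4(values):
    """Symmetric absmax quantization of one block to integers in [-7, 7] with one float scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 7 if absmax > 0 else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q, scale):
    return [x * scale for x in q]

def size_reduction(block_size=64, scale_bits=32, base_bits=16):
    """Fraction of storage saved vs. base_bits weights when each block stores 4-bit codes plus one scale."""
    bits_per_weight = 4 + scale_bits / block_size
    return 1 - bits_per_weight / base_bits
```

With these assumptions, `size_reduction()` gives 0.71875 (4.5 bits per weight instead of 16). This is an idealized figure: real file sizes also depend on which tensors stay in higher precision and on format overhead, so it should not be compared directly with the reported 68.66%.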
【30】Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression
标题:规则删除、改写与压缩下LLM多步逻辑推理泛化中的少即是多
链接:https://arxiv.org/abs/2512.06393
作者:Qiming Bao,Xiaoxuan Fu
摘要:Large language models (LLMs) excel across many natural language tasks, yet their generalisation to structural perturbations in logical contexts remains poorly understood. We introduce a controlled evaluation framework that probes reasoning reliability through four targeted stress tests: (1) rule deletion, removing either redundant or essential rules from a multi-step inference chain; (2) contradictory evidence injection; (3) logic-preserving rewrites generated through several families of equivalence laws (contrapositive, double negation, implication, De Morgan, identity, and commutativity); and (4) multi-law equivalence stacking that introduces 2-5 simultaneous logical transformations. Across three representative model families (BERT, Qwen2, and LLaMA-like models), our experiments reveal a strikingly consistent pattern: all models achieve perfect accuracy on the base tasks and generalise fully to redundant rule deletion and all equivalence-based rewrites (single or multi-law), but fail sharply under essential rule deletion (dropping to 25% accuracy) and collapse completely in the presence of explicit contradictions (0% accuracy). These results demonstrate that LLMs possess stable invariance to semantics-preserving logical transformations, yet remain fundamentally brittle to missing or conflicting evidence. Our framework provides a clean diagnostic tool for isolating such reasoning failure modes and highlights persistent gaps in the logical generalisation abilities of current LLMs.
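Two of the stress tests can be checked mechanically. The sketch below is an illustration, not the authors' framework: it verifies by truth table that the named equivalence laws really preserve meaning, and it uses a tiny forward-chaining prover to show the difference between deleting a redundant rule and deleting an essential one. The rule set and names are hypothetical.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Each law pairs an original formula with a rewrite that should be logically equivalent.
LAWS = {
    "contrapositive": (lambda p, q: implies(p, q), lambda p, q: implies(not q, not p)),
    "double_negation": (lambda p, q: p, lambda p, q: not (not p)),
    "implication": (lambda p, q: implies(p, q), lambda p, q: not (p and not q)),
    "de_morgan": (lambda p, q: not (p and q), lambda p, q: (not p) or (not q)),
    "identity": (lambda p, q: p, lambda p, q: p and True),
    "commutativity": (lambda p, q: p and q, lambda p, q: q and p),
}

def equivalent(f, g):
    """True if f and g agree on every assignment of two boolean variables."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))

def forward_chain(facts, rules):
    """rules: list of (premises, conclusion); fire rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# a -> b appears twice, so either copy is redundant; b -> c is essential for deriving c.
RULES = [(("a",), "b"), (("b",), "c"), (("a",), "b")]
```

Deleting `RULES[2]` leaves `c` derivable from `{"a"}`, while deleting `RULES[1]` makes it underivable: a model that still answers "c" in the second case is not actually following the chain.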
【31】RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs
标题:RLAX:在TPU上面向大型语言模型的大规模分布式强化学习
链接:https://arxiv.org/abs/2512.06392
作者:Runlong Zhou,Lefan Zhang,Shang-Chen Wu,Kelvin Zou,Hanzhi Zhou,Ke Ye,Yihao Feng,Dong Yin,Alex Guillen Garcia,Dmytro Babych,Rohit Chatterjee,Matthew Hopkins,Xiang Kong,Chang Lan,Lezhi Li,Yiping Ma,Daniele Molinari,Senyu Tong,Yanchao Sun,Thomas Voice,Jianyu Wang,Chong Wang,Simon Wang,Floris Weers,Yechen Xu,Guolin Yin,Muyang Yu,Yi Zhang,Zheng Zhou,Danyang Zhuo,Ruoming Pang,Cheng Leong
备注:14 pages, 6 figures
摘要:Reinforcement learning (RL) has emerged as the de facto paradigm for improving the reasoning capabilities of large language models (LLMs). We have developed RLAX, a scalable RL framework on TPUs. RLAX employs a parameter-server architecture: a master trainer periodically pushes updated model weights to the parameter server, while a fleet of inference workers pull the latest weights and generate new rollouts. We introduce a suite of system techniques to enable scalable and preemptible RL for a diverse set of state-of-the-art RL algorithms. To accelerate convergence and improve model quality, we have devised new dataset curation and alignment techniques. Large-scale evaluations show that RLAX improves QwQ-32B's pass@8 accuracy by 12.8% in just 12 hours 48 minutes on 1024 v5p TPUs, while remaining robust to preemptions during training.
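A toy, single-process sketch of the push/pull pattern described above. The version counter is an assumption added for illustration: it shows how a trainer could measure how stale a rollout is. RLAX's actual TPU deployment and preemption handling are far beyond this sketch, and `policy_fn` is a hypothetical stand-in for inference.

```python
import threading

class ParameterServer:
    """Versioned weight store: the trainer pushes, inference workers pull."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._weights = weights
        self._version = 0

    def push(self, weights):
        with self._lock:
            self._weights = weights
            self._version += 1
            return self._version

    def pull(self):
        with self._lock:
            return self._version, self._weights

def generate_rollout(server, prompt, policy_fn):
    """Inference-worker step: fetch the latest weights and tag the rollout with their version."""
    version, weights = server.pull()
    return {"prompt": prompt, "completion": policy_fn(weights, prompt), "version": version}

def staleness(server, rollout):
    """Number of trainer pushes since the rollout's weights were pulled."""
    current, _ = server.pull()
    return current - rollout["version"]
```

Decoupling the two sides this way lets workers keep generating while the trainer updates, at the cost of training on slightly stale rollouts.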
【32】LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing
标题:用于智能制造中碳意识作业调度的LLM升级图强化学习
链接:https://arxiv.org/abs/2512.06351
作者:Zhiying Yang,Fang Liu,Wei Zhang,Xin Lou,Malcolm Yoke Hean Low,Boon Ping Gan
摘要:This paper presents Luca, a large language model (LLM)-upgraded graph reinforcement learning framework for carbon-aware flexible job shop scheduling. Luca addresses the challenges of dynamic and sustainable scheduling in smart manufacturing systems by integrating a graph neural network and an LLM, guided by a carefully designed in-house prompting strategy, to produce a fused embedding that captures both the structural characteristics and the contextual semantics of the latest scheduling state. This expressive embedding is then processed by a deep reinforcement learning policy network, which generates real-time scheduling decisions optimized for both makespan and carbon emission objectives. To support sustainability goals, Luca incorporates a dual-objective reward function that encourages both energy efficiency and scheduling timeliness. Experimental results on both synthetic and public datasets demonstrate that Luca consistently outperforms comparison algorithms. For instance, on the synthetic dataset, it achieves an average of 4.1% and up to 12.2% lower makespan compared to the best-performing comparison algorithm while maintaining the same emission level. On public datasets, additional gains are observed for both makespan and emission. These results demonstrate that Luca is effective and practical for carbon-aware scheduling in smart manufacturing.
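The abstract mentions a dual-objective reward over makespan and carbon emissions without giving its form. A minimal sketch, assuming a per-step reward that penalizes increases in an estimated makespan and in cumulative emissions with hypothetical weights and normalizers, and assuming an operation's emissions are its power draw times its duration times the grid's carbon intensity:

```python
def operation_emission(power_kw, duration_h, carbon_intensity_kg_per_kwh):
    """kg CO2 for one operation: energy (kWh) times grid carbon intensity."""
    return power_kw * duration_h * carbon_intensity_kg_per_kwh

def step_reward(prev_makespan, new_makespan, prev_emission, new_emission,
                w_makespan=0.5, w_emission=0.5, makespan_scale=1.0, emission_scale=1.0):
    """Penalize growth of the makespan estimate and of cumulative emissions at each dispatch step."""
    d_m = (new_makespan - prev_makespan) / makespan_scale
    d_e = (new_emission - prev_emission) / emission_scale
    return -(w_makespan * d_m + w_emission * d_e)
```

Summed over an episode, these per-step rewards telescope to the weighted change in the final makespan and total emissions, so maximizing the return targets both end objectives while still giving the agent feedback at every step.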
【33】Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics
标题:聚合物信息学使用分层分子表示的化学集成语言模型
链接:https://arxiv.org/abs/2512.06301
作者:Jihun Ahn,Gabriella Pasya Irianti,Vikram Thapar,Su-Mi Hur
摘要:Machine learning has transformed material discovery for inorganic compounds and small molecules, yet polymers remain largely inaccessible to these methods. While data scarcity is often cited as the primary bottleneck, we demonstrate that strategic molecular representations can overcome this limitation. We introduce CI-LLM (Chemically Informed Language Model), a framework combining HAPPY (Hierarchically Abstracted rePeat unit of PolYmer), which encodes chemical substructures as tokens, with numerical descriptors within transformer architectures. For property prediction, De³BERTa, our descriptor-enriched encoder, achieves 3.5x faster inference than SMILES-based models with improved accuracy (R² score gains of 0.9-4.1 percent across four properties), while providing interpretable structure-property insights at the subgroup level. For inverse design, our GPT-based generator produces polymers with targeted properties, achieving 100 percent scaffold retention and successful multi-property optimization for negatively correlated objectives. This comprehensive framework demonstrates both forward prediction and inverse design capabilities, showcasing how strategic molecular representation advances machine learning applications in polymer science.
【34】Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety
标题:在开源LLM之间使用置信度感知的细粒度辩论来自动化数据丰富,以促进心理健康和在线安全
链接:https://arxiv.org/abs/2512.06227
作者:Junyu Mao,Anthony Hills,Talia Tseriotou,Maria Liakata,Aya Shamir,Dan Sayda,Dana Atzil-Slonim,Natalie Djohari,Arpan Mandal,Silke Roth,Pamela Ugwudike,Mahesan Niranjan,Stuart E. Middleton
摘要:Real-world indicators are important for improving natural language processing (NLP) tasks such as life events for mental health analysis and risky behaviour for online safety, yet labelling such information in NLP training datasets is often costly and/or difficult given the dynamic nature of such events. This paper compares several LLM-based data enrichment methods and introduces a novel Confidence-Aware Fine-Grained Debate (CFD) framework in which multiple LLM agents simulate human annotators and exchange fine-grained evidence to reach consensus. We describe two new expert-annotated datasets, a mental health Reddit wellbeing dataset and an online safety Facebook sharenting risk dataset. Our CFD framework achieves the most robust data enrichment performance compared to a range of baselines and we show that this type of data enrichment consistently improves downstream tasks. Enriched features incorporated via debate transcripts yield the largest gains, outperforming the non-enriched baseline by 10.1% for the online safety task.
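The abstract does not detail how agent confidences enter the final decision. As one hypothetical final aggregation step after the debate rounds (the evidence exchange itself is not modeled), each candidate label can be scored by the sum of the self-reported confidences of the agents that chose it:

```python
from collections import defaultdict

def confidence_weighted_consensus(votes):
    """votes: list of (agent, label, confidence in [0, 1]).
    Returns the winning label and its share of the total confidence mass."""
    support = defaultdict(float)
    for _agent, label, conf in votes:
        support[label] += conf
    total = sum(support.values())
    label = max(support, key=support.get)
    return label, (support[label] / total if total else 0.0)
```

The returned share can double as an agreement score, for example to route low-consensus items to a human annotator.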
【35】K2-V2: A 360-Open, Reasoning-Enhanced LLM
标题:K2-V2:360-开放、推理增强的LLM
链接:https://arxiv.org/abs/2512.06201
作者:K2 Team,Zhengzhong Liu,Liping Tang,Linghao Jin,Haonan Li,Nikhil Ranjan,Desai Fan,Shaurya Rohatgi,Richard Fan,Omkar Pangarkar,Huijuan Wang,Zhoujun Cheng,Suqi Sun,Seungwook Han,Bowen Tan,Gurpreet Gosal,Xudong Han,Varad Pimpalkhute,Shibo Hao,Ming Shan Hee,Joel Hestness,Haolong Jia,Liqun Ma,Aaryamonvikram Singh,Daria Soboleva,Natalia Vassilieva,Renxi Wang,Yingquan Wu,Yuekai Sun,Taylor Killian,Alexander Moreno,John Maggs,Hector Ren,Guowei He,Hongyi Wang,Xuezhe Ma,Yuqi Wang,Mikhail Yurochkin,Eric P. Xing
摘要:We introduce K2-V2, a 360-open LLM built from scratch as a superior base for reasoning adaptation that also retains general-LLM functions such as conversation and knowledge retrieval. It is the strongest fully open model, rivals open-weight leaders in its size class, outperforms Qwen2.5-72B, and approaches the performance of Qwen3-235B. We actively infuse domain knowledge, reasoning, long-context, and tool use throughout the training process, which explicitly prepares the model for complex reasoning tasks. We demonstrate this potential using simple supervised fine-tuning, establishing a strong baseline that indicates significant headroom for advanced alignment. By releasing the full training history and data composition, we maximize the effectiveness of continuous training, a key open-source production scenario. We release the model weights and signature LLM360 artifacts, such as the complete training data, to empower the community with a capable, reasoning-centric foundation.
【36】Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals
标题:全力以赴研究LLM准确性:虚假预测市场,真实信心信号
链接:https://arxiv.org/abs/2512.05998
作者:Michael Todasco
备注:25 pages, 8 tables, 2 figures. Pilot study. Data, prompts, and code available at https://osf.io/dc24t/
摘要:Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluation task as a betting game (a fictional prediction market with its own LLM currency) improves forecasting accuracy and surfaces calibrated confidence signals. We generated 100 math and logic questions with verifiable answers. Six baseline models (three current-generation, three prior-generation) answered all items. Three predictor models then forecasted, for each question-baseline pair, whether the baseline would answer correctly. Each predictor completed matched runs in two conditions: Control (simple correct/incorrect predictions) and Incentive (predictions plus wagers of 1-100,000 LLMCoin at even odds, starting from a 1,000,000 LLMCoin bankroll). Across 5,400 predictions per condition, Incentive runs showed modestly higher accuracy (81.5% vs. 79.1%, p = .089, d = 0.86) and significantly faster learning across rounds (12.0 vs. 2.9 percentage-point improvement from Round 1 to Round 4, p = .011). Most notably, stake size tracked confidence: "whale" bets of 40,000+ coins were correct ~99% of the time, while small bets (<1,000 coins) showed only ~74% accuracy. The key finding is not that fictional money makes models smarter; accuracy gains were modest and did not reach statistical significance (p = .089) in this pilot. Rather, the betting mechanic created a legible confidence signal absent from binary yes/no outputs. This suggests that simple financial framing may help transform LLMs into risk-aware forecasters, making their internal beliefs visible and usable. The protocol offers a foundation for future work on meta-evaluation systems and what may become LLM-to-LLM prediction markets.
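At even odds, staking s coins on a prediction that is correct with probability p has expected gain p·s - (1 - p)·s = s(2p - 1), which is positive only when p > 0.5 and grows with both p and s; a bettor trying to grow its bankroll therefore has a reason to stake more when it is more confident. A minimal sketch of the settlement rule and of the stake-bucket analysis, using the stake limits and bucket edges quoted in the abstract; the requirement that a stake not exceed the current bankroll is an assumption:

```python
def settle(bankroll, stake, correct):
    """Even odds: win the stake if the prediction is right, lose it otherwise."""
    if not 1 <= stake <= min(100_000, bankroll):
        raise ValueError("stake outside the allowed range")
    return bankroll + stake if correct else bankroll - stake

def accuracy_by_stake(bets, small_below=1_000, whale_from=40_000):
    """Group (stake, correct) pairs into small / medium / whale buckets and report hit rates."""
    buckets = {"small": [], "medium": [], "whale": []}
    for stake, correct in bets:
        if stake < small_below:
            key = "small"
        elif stake >= whale_from:
            key = "whale"
        else:
            key = "medium"
        buckets[key].append(correct)
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
```

If hit rate rises monotonically across the buckets, stake size is acting as a usable confidence signal, which is the pattern the pilot reports.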
【37】A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation
标题:基于知识的语言模型:在多智能体语言习得模拟中推导语法知识
链接:https://arxiv.org/abs/2512.02195
作者:David Ph. Shakouri,Crit Cremers,Niels O. Schiller
备注:23 pages, 7 figures, 11 tables. Related work: arXiv:2503.18702. This is the peer-reviewed publisher's version, downloadable from: https://www.clinjournal.org/clinj/article/view/193
摘要:This paper presents an initial study performed with the MODOMA system. MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experiments in which acquisition is based on the interaction between two language models, an adult agent and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. The system is fully parametrized: researchers can control all aspects of the experiments, while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, this system introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented in this paper demonstrate that functional and content categories can be acquired and represented by the daughter agent based on training and test data containing different amounts of exemplars generated by the adult agent. Interestingly, similar patterns, which are well established for human-generated data, are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition.
【38】A Latent Variable Framework for Scaling Laws in Large Language Models
标题:大型语言模型中缩放定律的潜在变量框架
链接:https://arxiv.org/abs/2512.06553
作者:Peiyao Cai,Chengyu Cui,Felipe Maia Polo,Seamus Somerstep,Leshem Choshen,Mikhail Yurochkin,Moulinath Banerjee,Yuekai Sun,Kean Ming Tan,Gongjun Xu
摘要:We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks. To address this, we propose a latent variable modeling framework in which each LLM family is associated with a latent variable that captures the common underlying features in that family. An LLM's performance on different benchmarks is then driven by its latent skills, which are jointly determined by the latent variable and the model's own observable features. We develop an estimation procedure for this latent variable model and establish its statistical properties. We also design efficient numerical algorithms that support estimation and various downstream tasks. Empirically, we evaluate the approach on 12 widely used benchmarks from the Open LLM Leaderboard (v1/v2).
Graph相关(图学习|图神经网络|图优化等)(15篇)
【1】Graph-Based Learning of Spectro-Topographical EEG Representations with Gradient Alignment for Brain-Computer Interfaces
标题:脑机接口中具有梯度对齐的光谱地形图脑电表示的基于图的学习
链接:https://arxiv.org/abs/2512.07820
作者:Prithila Angkan,Amin Jalali,Paul Hungler,Ali Etemad
摘要:We present a novel graph-based learning of EEG representations with gradient alignment (GEEGA) that leverages multi-domain information to learn EEG representations for brain-computer interfaces. Our model leverages graph convolutional networks to fuse embeddings from frequency-based topographical maps and time-frequency spectrograms, capturing inter-domain relationships. GEEGA addresses the challenge of achieving high inter-class separability, which arises from the temporally dynamic and subject-sensitive nature of EEG signals by incorporating the center loss and pairwise difference loss. Additionally, GEEGA incorporates a gradient alignment strategy to resolve conflicts between gradients from different domains and the fused embeddings, ensuring that discrepancies, where gradients point in conflicting directions, are aligned toward a unified optimization direction. We validate the efficacy of our method through extensive experiments on three publicly available EEG datasets: BCI-2a, CL-Drive and CLARE. Comprehensive ablation studies further highlight the impact of various components of our model.
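The abstract describes aligning conflicting gradients from different domains toward a unified direction but does not give the rule. As a stand-in, here is a minimal sketch of the well-known PCGrad-style projection ("gradient surgery"), which is not necessarily GEEGA's method: when two gradients have a negative dot product, the component of one along the other is removed before averaging. Gradients are flat Python lists here, and the projection order is fixed rather than randomized as in PCGrad.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_conflicting(g, ref):
    """If g conflicts with ref (negative dot product), remove g's component along ref."""
    d = dot(g, ref)
    n2 = dot(ref, ref)
    if d >= 0 or n2 == 0:
        return list(g)
    return [x - d / n2 * r for x, r in zip(g, ref)]

def align_gradients(grads):
    """Project each gradient against every other original gradient, then average."""
    aligned = []
    for i, g in enumerate(grads):
        gi = list(g)
        for j, other in enumerate(grads):
            if i != j:
                gi = project_conflicting(gi, other)
        aligned.append(gi)
    return [sum(col) / len(aligned) for col in zip(*aligned)]
```

Usage: for a spectrogram-branch gradient `[1, 0]` and a topographic-map-branch gradient `[-1, 1]`, the aligned update is `[0.25, 0.75]`, which has a non-negative dot product with both inputs, so neither domain's loss is pushed uphill to first order.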
【2】Mitigating Bias in Graph Hyperdimensional Computing
标题:缓解图超维计算中的偏差
链接:https://arxiv.org/abs/2512.07433
作者:Yezi Liu,William Youngwoo Chung,Yang Ni,Hanning Chen,Mohsen Imani
摘要:Graph hyperdimensional computing (HDC) has emerged as a promising paradigm for cognitive tasks, emulating brain-like computation with high-dimensional vectors known as hypervectors. While HDC offers robustness and efficiency on graph-structured data, its fairness implications remain largely unexplored. In this paper, we study fairness in graph HDC, where biases in data representation and decision rules can lead to unequal treatment of different groups. We show how hypervector encoding and similarity-based classification can propagate or even amplify such biases, and we propose a fairness-aware training framework, FairGHDC, to mitigate them. FairGHDC introduces a bias correction term, derived from a gap-based demographic-parity regularizer, and converts it into a scalar fairness factor that scales the update of the class hypervector for the ground-truth label. This enables debiasing directly in the hypervector space without modifying the graph encoder or requiring backpropagation. Experimental results on six benchmark datasets demonstrate that FairGHDC substantially reduces demographic-parity and equal-opportunity gaps while maintaining accuracy comparable to standard GNNs and fairness-aware GNNs. At the same time, FairGHDC preserves the computational advantages of HDC, achieving up to about one order of magnitude (≈10×) speedup in training time on GPU compared to GNN and fairness-aware GNN baselines.
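The abstract says a scalar fairness factor scales the ground-truth class-hypervector update but does not give the mapping from the parity gap to that factor. A minimal sketch with a hypothetical rule: damp positive-class updates for groups already predicted positive more often than average and boost them for under-predicted groups, with `lam` as an illustrative strength. Graph-to-hypervector encoding is assumed to happen elsewhere.

```python
def positive_rates(preds, groups, positive=1):
    """Fraction of positive predictions within each sensitive group."""
    rates = {}
    for g in set(groups):
        members = [p for p, s in zip(preds, groups) if s == g]
        rates[g] = sum(1 for p in members if p == positive) / len(members)
    return rates

def demographic_parity_gap(rates):
    return max(rates.values()) - min(rates.values())

def fairness_factor(group, label, rates, lam=0.5, positive=1):
    """Hypothetical rule: scale positive-class updates by how far the group's rate is from the mean."""
    if label != positive:
        return 1.0
    mean_rate = sum(rates.values()) / len(rates)
    return max(0.0, 1.0 - lam * (rates[group] - mean_rate))

def update_class_hypervector(class_hvs, h, label, factor):
    """Bundle the encoded sample into its class prototype, scaled by the fairness factor.
    No gradients are involved: this is a plain vector addition."""
    class_hvs[label] = [c + factor * x for c, x in zip(class_hvs[label], h)]
    return class_hvs
```

Because only the scalar multiplying the bundling step changes, the update stays as cheap as standard HDC training, consistent with the speed advantage the abstract reports.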
【3】Local-Curvature-Aware Knowledge Graph Embedding: An Extended Ricci Flow Approach
标题:局部曲率感知的知识图谱嵌入:一种扩展的Ricci流方法
链接:https://arxiv.org/abs/2512.07332
作者:Zhengquan Luo,Guy Tadmor,Or Amar,David Zeevi,Zhiqiang Xu
摘要:Knowledge graph embedding (KGE) relies on the geometry of the embedding space to encode semantic and structural relations. Existing methods place all entities on one homogeneous manifold (Euclidean, spherical, hyperbolic, or their product/multi-curvature variants) to model linear, symmetric, or hierarchical patterns. Yet a predefined, homogeneous manifold cannot accommodate the sharply varying curvature that real-world graphs exhibit across local regions. Since this geometry is imposed a priori, any mismatch with the knowledge graph's local curvatures will distort distances between entities and hurt the expressiveness of the resulting KGE. To rectify this, we propose RicciKGE, which couples the KGE loss gradient with local curvatures in an extended Ricci flow so that entity embeddings co-evolve dynamically with the underlying manifold geometry toward mutual adaptation. Theoretically, when the coupling coefficient is bounded and properly selected, we rigorously prove that i) all the edge-wise curvatures decay exponentially, meaning that the manifold is driven toward Euclidean flatness; and ii) the KGE distances strictly converge to a global optimum, which indicates that geometric flattening and embedding optimization promote each other. Experimental improvements on link prediction and node classification benchmarks demonstrate RicciKGE's effectiveness in adapting to heterogeneous knowledge graph structures.
【4】SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents
标题:SIT-Shape:多回合代理的状态集成工具图
链接:https://arxiv.org/abs/2512.07287
作者:Sijia Li,Yuchen Huang,Zifan Liu,Zijian Li,Jingjing fu,Lei Song,Jiang Bian,Jun Zhang,Rui Wang
摘要:Despite impressive advances in agent systems, multi-turn tool-use scenarios remain challenging, mainly because intent is clarified progressively and the environment evolves with each tool call. While reusing past experience is natural, current LLM agents either treat entire trajectories or pre-defined subtasks as indivisible units, or solely exploit tool-to-tool dependencies, hindering adaptation as states and information evolve across turns. In this paper, we propose a State Integrated Tool Graph (SIT-Graph), which enhances multi-turn tool use by exploiting partially overlapping experience. Inspired by human decision-making that integrates episodic and procedural memory, SIT-Graph captures both compact state representations (episodic-like fragments) and tool-to-tool dependencies (procedural-like routines) from historical trajectories. Specifically, we first build a tool graph from accumulated tool-use sequences, and then augment each edge with a compact state summary of the dialog and tool history that may shape the next action. At inference time, SIT-Graph enables a human-like balance between episodic recall and procedural execution: when the next decision requires recalling prior context, the agent retrieves the state summaries stored on relevant edges and uses them to guide its next action; when the step is routine, it follows high-confidence tool dependencies without explicit recall. Experiments across multiple stateful multi-turn tool-use benchmarks show that SIT-Graph consistently outperforms strong memory- and graph-based baselines, delivering more robust tool selection and more effective experience transfer.
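The data structure described above can be sketched as a weighted transition graph whose edges also store state summaries. This is a minimal sketch, not the paper's implementation: the `confidence` threshold for "routine" steps and the summary strings are hypothetical, and retrieval of relevant summaries is reduced to returning all summaries on the outgoing edges.

```python
from collections import defaultdict

class ToolGraph:
    """Tool-to-tool transition graph; each edge keeps a count and compact state summaries."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.summaries = defaultdict(list)

    def add_trajectory(self, steps):
        """steps: list of (tool_name, state_summary) in call order."""
        for (prev_tool, summary), (next_tool, _) in zip(steps, steps[1:]):
            self.counts[prev_tool][next_tool] += 1
            self.summaries[(prev_tool, next_tool)].append(summary)

    def next_tool(self, current, confidence=0.8):
        """Procedural path: return (tool, None) when one successor dominates.
        Episodic path: return (None, {candidate: summaries}) so the agent can recall context."""
        succ = self.counts.get(current)
        if not succ:
            return None, {}
        total = sum(succ.values())
        best, n = max(succ.items(), key=lambda kv: kv[1])
        if n / total >= confidence:
            return best, None
        return None, {t: self.summaries[(current, t)] for t in succ}
```

Usage: after four trajectories in which `search_flights` led to `book_flight` and one in which it led to `ask_user` (state "user unsure of dates"), the default threshold follows the routine edge, while a stricter threshold of 0.9 hands both candidate edges and their summaries back to the agent for an episodic decision.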
【5】A graph generation pipeline for critical infrastructures based on heuristics, images and depth data
标题:基于启发式、图像和深度数据的关键基础设施的图形生成管道
链接:https://arxiv.org/abs/2512.07269
作者:Mike Diessner,Yannick Tarant
摘要:Virtual representations of physical critical infrastructures, such as water or energy plants, are used for simulations and digital twins to ensure the resilience and continuity of their services. These models usually require 3D point clouds from laser scanners, which are expensive to acquire and require specialist knowledge to use. In this article, we present a graph generation pipeline based on photogrammetry. The pipeline detects relevant objects and predicts their relations using RGB images and depth data generated by a stereo camera. This more cost-effective approach uses deep learning for object detection and instance segmentation, and employs user-defined heuristics or rules to infer the relations between objects. Results on two hydraulic systems show that this strategy can produce graphs close to the ground truth; its flexibility allows the method to be tailored to specific applications, and its transparency makes it suitable for the high-stakes decision-making that critical infrastructures require.
【6】Dual Refinement Cycle Learning: Unsupervised Text Classification of Mamba and Community Detection on Text Attributed Graph
标题:双细化循环学习:Mamba的无监督文本分类和文本属性图上的社区检测
链接:https://arxiv.org/abs/2512.07100
作者:Hong Wang,Yinglong Zhang,Hanhan Guo,Xuewen Xia,Xing Xu
摘要:Pretrained language models offer strong text understanding capabilities but remain difficult to deploy in real-world text-attributed networks due to their heavy dependence on labeled data. Meanwhile, community detection methods typically ignore textual semantics, limiting their usefulness in downstream applications such as content organization, recommendation, and risk monitoring. To overcome these limitations, we present Dual Refinement Cycle Learning (DRCL), a fully unsupervised framework designed for practical scenarios where no labels or category definitions are available. DRCL integrates structural and semantic information through a warm-start initialization and a bidirectional refinement cycle between a GCN-based Community Detection Module (GCN-CDM) and a Text Semantic Modeling Module (TSMM). The two modules iteratively exchange pseudo-labels, allowing semantic cues to enhance structural clustering and structural patterns to guide text representation learning without manual supervision. Across several text-attributed graph datasets, DRCL consistently improves the structural and semantic quality of discovered communities. Moreover, a Mamba-based classifier trained solely from DRCL's community signals achieves accuracy comparable to supervised models, demonstrating its potential for deployment in large-scale systems where labeled data are scarce or costly.
【7】Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
标题:分子图的自我监督学习:掩蔽设计的系统研究
链接:https://arxiv.org/abs/2512.07064
作者:Jiannan Yang,Veronika Thost,Tengfei Ma
摘要:Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet many recent innovations in masking-based pretraining are introduced as heuristics and lack principled evaluation, obscuring which design choices are genuinely effective. This work casts the entire pretrain-finetune workflow into a unified probabilistic framework, enabling a transparent comparison and deeper understanding of masking strategies. Building on this formalism, we conduct a rigorously controlled study of three core design dimensions: masking distribution, prediction target, and encoder architecture. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no consistent benefit over uniform sampling for common node-level prediction tasks. Instead, the choice of prediction target and its synergy with the encoder architecture are far more critical. Specifically, shifting to semantically richer targets yields substantial downstream improvements, particularly when paired with expressive Graph Transformer encoders. These insights offer practical guidance for developing more effective SSL methods for molecular graphs.
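For reference, the uniform masking baseline that the study finds hard to beat can be sketched as below. This assumes atoms are represented only by their element symbol and that the prediction target is the masked atom type; the mask rate, mask token and function name are hypothetical, and the semantically richer targets that the authors report matter more are not shown.

```python
import random


def uniform_node_mask(atom_types, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Mask a uniformly sampled subset of atoms (hypothetical helper).

    Returns the masked atom list and the (index, original_atom) pairs to predict.
    """
    rng = random.Random(seed)
    n = len(atom_types)
    k = min(n, max(1, round(mask_rate * n)))  # at least one atom, never more than n
    chosen = sorted(rng.sample(range(n), k))  # every atom equally likely: the uniform distribution
    masked = list(atom_types)
    targets = []
    for i in chosen:
        targets.append((i, masked[i]))
        masked[i] = mask_token
    return masked, targets
```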
【8】Multi-Scale Protein Structure Modelling with Geometric Graph U-Nets
标题:基于几何图U-Net的多尺度蛋白质结构建模
链接:https://arxiv.org/abs/2512.06752
作者:Chang Liu,Vivian Li,Linus Leong,Vladimir Radenkovic,Pietro Liò,Chaitanya K. Joshi
备注:Presented at Machine Learning in Structural Biology, 2025. Open-source code: https://github.com/VirtualProteins/GNN_UNet
摘要:Geometric Graph Neural Networks (GNNs) and Transformers have become state-of-the-art for learning from 3D protein structures. However, their reliance on message passing prevents them from capturing the hierarchical interactions that govern protein function, such as global domains and long-range allosteric regulation. In this work, we argue that the network architecture itself should mirror this biological hierarchy. We introduce Geometric Graph U-Nets, a new class of models that learn multi-scale representations by recursively coarsening and refining the protein graph. We prove that this hierarchical design can be theoretically more expressive than standard Geometric GNNs. Empirically, on the task of protein fold classification, Geometric U-Nets substantially outperform invariant and equivariant baselines, demonstrating their ability to learn the global structural patterns that define protein folds. Our work provides a principled foundation for designing geometric deep learning architectures that can learn the multi-scale structure of biomolecules.
【9】Multimodal Graph Neural Networks for Prognostic Modeling of Brain Network Reorganization
标题:用于脑网络重组预测建模的多模态图神经网络
链接:https://arxiv.org/abs/2512.06303
作者:Preksha Girish,Rachana Mysore,Kiran K. N.,Hiranmayee R.,Shipra Prashanth,Shrey Kumar
备注:5 pages, 2 figures. IEEE conference-style format
摘要:Understanding the dynamic reorganization of brain networks is critical for predicting cognitive decline, neurological progression, and individual variability in clinical outcomes. This work proposes a multimodal graph neural network framework that integrates structural MRI, diffusion tensor imaging, and functional MRI to model spatiotemporal brain network reorganization. Brain regions are represented as nodes and structural and functional connectivity as edges, forming longitudinal brain graphs for each subject. Temporal evolution is captured via fractional stochastic differential operators embedded within graph-based recurrent networks, enabling the modeling of long-term dependencies and stochastic fluctuations in network dynamics. Attention mechanisms fuse multimodal information and generate interpretable biomarkers, including network energy entropy, graph curvature, fractional memory indices, and modality-specific attention scores. These biomarkers are combined into a composite prognostic index to quantify individual risk of network instability or cognitive decline. Experiments on longitudinal neuroimaging datasets demonstrate both predictive accuracy and interpretability. The results highlight the potential of mathematically rigorous, multimodal graph-based approaches for deriving clinically meaningful biomarkers from existing imaging data without requiring new data collection.
【10】Empowering GNNs for Domain Adaptation via Denoising Target Graph
标题:通过去噪目标图增强GNN的域自适应能力
链接:https://arxiv.org/abs/2512.06236
作者:Haiyang Yu,Meng-Chieh Lee,Xiang Song,Qi Zhu,Christos Faloutsos
摘要:We explore the node classification task in the context of graph domain adaptation, which uses both source and target graph structures along with source labels to enhance the generalization capabilities of Graph Neural Networks (GNNs) on target graphs. Structure domain shifts frequently occur, especially when graph data are collected at different times or from varying areas, resulting in poor performance of GNNs on target graphs. Surprisingly, we find that simply incorporating an auxiliary loss function for denoising graph edges on target graphs can be extremely effective in enhancing GNN performance on target graphs. Based on this insight, we propose GraphDeT, a framework that integrates this auxiliary edge task into GNN training for node classification under domain adaptation. Our theoretical analysis connects this auxiliary edge task to the graph generalization bound with -distance, demonstrating that such an auxiliary task can impose a constraint that tightens the bound and thereby improves generalization. The experimental results demonstrate superior performance compared to existing baselines in handling both temporal and regional graph domain shifts.
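A minimal sketch of an auxiliary edge-denoising task, assuming the target graph is corrupted by injecting random non-edges and a dot-product edge scorer is trained to tell real edges (label 1) from injected ones (label 0). The corruption scheme, the scorer and both function names are assumptions rather than GraphDeT's actual design; during training this loss would be added with some weight λ to the source-graph node-classification loss, i.e. L = L_cls + λ·L_edge.

```python
import math
import random


def make_noisy_edges(edges, num_nodes, noise_ratio=0.5, seed=0):
    """Corrupt a target graph by injecting random non-edges; label real edges 1 and injected edges 0."""
    rng = random.Random(seed)
    real = {tuple(sorted(e)) for e in edges}
    max_fake = num_nodes * (num_nodes - 1) // 2 - len(real)  # number of available non-edges
    n_fake = min(int(noise_ratio * len(real)), max_fake)
    fake = set()
    while len(fake) < n_fake:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        e = tuple(sorted((u, v)))
        if u != v and e not in real:
            fake.add(e)
    return [(e, 1) for e in sorted(real)] + [(e, 0) for e in sorted(fake)]


def _sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def edge_denoising_loss(embeddings, labelled_edges):
    """Mean binary cross-entropy of dot-product edge scores from target-graph node embeddings."""
    total = 0.0
    for (u, v), y in labelled_edges:
        score = sum(a * b for a, b in zip(embeddings[u], embeddings[v]))
        p = min(max(_sigmoid(score), 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labelled_edges)
```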
【11】How Should We Evaluate Data Deletion in Graph-Based ANN Indexes?
标题:我们应该如何评估基于图的ANN索引中的数据删除?
链接:https://arxiv.org/abs/2512.06200
作者:Tomohiro Yamashita,Daichi Amagata,Yusuke Matsui
备注:4 pages, 4 figures. Accepted at NeurIPS 2025 Workshop on Machine Learning for Systems
摘要:Approximate Nearest Neighbor Search (ANNS) has recently gained significant attention due to its many applications, such as Retrieval-Augmented Generation. Such applications require ANNS algorithms that support dynamic data, so the ANNS problem on dynamic data has attracted considerable interest. However, a comprehensive evaluation methodology for data deletion in ANNS has yet to be established. This study proposes an experimental framework and comprehensive evaluation metrics to assess the efficiency of data deletion for ANNS indexes under practical use cases. Specifically, we categorize data deletion methods in graph-based ANNS into three approaches and formalize them mathematically. The performance is assessed in terms of accuracy, query speed, and other relevant metrics. Finally, we apply the proposed evaluation framework to Hierarchical Navigable Small World, one of the state-of-the-art ANNS methods, to analyze the effects of data deletion, and propose Deletion Control, a method which dynamically selects the appropriate deletion method under a required search accuracy.
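The abstract does not name its three deletion categories. As one hedged illustration, the sketch below shows tombstone (mark-and-skip) deletion on a single graph layer: deleted nodes remain in the graph for routing but are filtered out of the results. Whether this matches one of the paper's categories is an assumption, and the search is a simplified HNSW-style layer search written for illustration, not the hnswlib API.

```python
import heapq
import math


def beam_search(graph, vectors, query, entry, deleted, ef=4, k=1):
    """Simplified HNSW-style search on one layer; tombstoned nodes route the search but are never returned."""
    def dist(i):
        return math.dist(vectors[i], query)

    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nodes still to expand
    best = [(-d0, entry)]       # max-heap via negation: the ef closest nodes seen so far
    while candidates:
        dc, c = heapq.heappop(candidates)
        if dc > -best[0][0]:
            break  # nearest unexpanded node is farther than the worst kept node
        for nb in graph[c]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest kept node
    live = sorted((-neg_d, i) for neg_d, i in best if i not in deleted)  # tombstone filter
    return [i for _, i in live[:k]]
```

Because tombstoned nodes still occupy slots in the beam of size `ef`, a search can return fewer than `k` live results as deletions accumulate, which is one reason deletion affects accuracy as well as speed.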
【12】Learning Invariant Graph Representations Through Redundant Information
标题:通过冗余信息学习不变图表示
链接:https://arxiv.org/abs/2512.06154
作者:Barproda Halder,Pasan Dissanayake,Sanghamitra Dutta
摘要:Learning invariant graph representations for out-of-distribution (OOD) generalization remains challenging because the learned representations often retain spurious components. To address this challenge, this work introduces a new tool from information theory called Partial Information Decomposition (PID) that goes beyond classical information-theoretic measures. We identify limitations in existing approaches for invariant representation learning that solely rely on classical information-theoretic measures, motivating the need to precisely focus on redundant information about the target $Y$ shared between spurious subgraphs $G_s$ and invariant subgraphs $G_c$ obtained via PID. Next, we propose a new multi-level optimization framework called Redundancy-guided Invariant Graph learning (RIG) that maximizes redundant information while isolating spurious and causal subgraphs, enabling OOD generalization under diverse distribution shifts. Our approach alternates between estimating a lower bound of redundant information (which itself requires an optimization) and maximizing it along with additional objectives. Experiments on both synthetic and real-world graph datasets demonstrate the generalization capabilities of our proposed RIG framework.
【13】On Conditional Independence Graph Learning From Multi-Attribute Gaussian Dependent Time Series
标题:多属性高斯相关时间序列的条件独立图学习
链接:https://arxiv.org/abs/2512.07557
作者:Jitendra K. Tugnait
备注:16 pages, 3 figures, 4 tables
摘要:Estimation of the conditional independence graph (CIG) of high-dimensional multivariate Gaussian time series from multi-attribute data is considered. Existing methods for graph estimation for such data are based on single-attribute models where one associates a scalar time series with each node. In multi-attribute graphical models, each node represents a random vector or vector time series. In this paper we provide a unified theoretical analysis of multi-attribute graph learning for dependent time series using a penalized log-likelihood objective function formulated in the frequency domain using the discrete Fourier transform of the time-domain data. We consider both convex (sparse-group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to true value in the Frobenius norm), local convexity when using non-convex penalties, and graph recovery. We do not impose any incoherence or irrepresentability condition for our convergence results. We also empirically investigate selection of the tuning parameters based on the Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.
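To illustrate the convex penalty, the sketch below evaluates a sparse-group lasso on a multi-attribute matrix at a single frequency. With p nodes and m attributes per node, the inverse PSD is a (pm)×(pm) Hermitian matrix whose off-diagonal m×m blocks correspond to node pairs, and an edge (i, j) is present when block (i, j) is nonzero. The weighting (λ, α), the exclusion of diagonal blocks and the single-frequency evaluation are assumptions; how the paper aggregates the penalty over frequencies is not shown.

```python
import numpy as np


def sparse_group_lasso_penalty(omega, p, m, lam, alpha):
    """Sparse-group lasso on the off-diagonal m x m blocks of a (p*m) x (p*m), possibly complex, matrix."""
    assert omega.shape == (p * m, p * m)
    lasso, group = 0.0, 0.0
    for i in range(p):
        for j in range(p):
            if i == j:
                continue  # diagonal (within-node) blocks are left unpenalized in this sketch
            block = omega[i * m:(i + 1) * m, j * m:(j + 1) * m]
            lasso += np.abs(block).sum()              # elementwise sparsity
            group += np.linalg.norm(block, "fro")     # blockwise (node-pair) sparsity
    return lam * (alpha * lasso + (1 - alpha) * group)


def edges_from_precision(omega, p, m, tol=1e-8):
    """Node pair (i, j) is an edge of the CIG iff its off-diagonal block is nonzero."""
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            if np.linalg.norm(omega[i * m:(i + 1) * m, j * m:(j + 1) * m]) > tol:
                edges.append((i, j))
    return edges
```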
【14】High-Dimensional Change Point Detection using Graph Spanning Ratio
标题:使用图生成树比进行高维变点检测
链接:https://arxiv.org/abs/2512.07541
作者:Youngwen Sun,Katerina Papagiannouli,Vladimir Spokoiny
摘要:Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatile approach is applicable to Euclidean and graph-structured data with unknown distributions, while maintaining control over error probabilities. Theoretically, we demonstrate that the algorithm achieves high detection power when the magnitude of the change surpasses the lower bound of the minimax separation rate, which scales on the order of $\sqrt{nd}$. Our method outperforms other techniques in terms of accuracy for both Gaussian and non-Gaussian data. Notably, it maintains strong detection power even with small observation windows, making it particularly effective for online environments where timely and precise change detection is critical.
【15】Learning Conditional Independence Differential Graphs From Time-Dependent Data
标题:从时间相关数据学习条件独立差图
链接:https://arxiv.org/abs/2512.06960
作者:Jitendra K Tugnait
备注:20 pages, 4 figures, 2 tables. To be published in IEEE Access, 2025
摘要:Estimation of differences in conditional independence graphs (CIGs) of two time series Gaussian graphical models (TSGGMs) is investigated where the two TSGGMs are known to have similar structure. The TSGGM structure is encoded in the inverse power spectral density (IPSD) of the time series. In several existing works, one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data consisting of independent and identically distributed (i.i.d.) observations. In this paper we consider estimation of the difference in two IPSDs to characterize the underlying changes in conditional dependencies of two sets of time-dependent data. Our approach accounts for data time dependencies unlike past work. We analyze a penalized D-trace loss function approach in the frequency domain for differential graph learning, using Wirtinger calculus. We consider both convex (group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to true value in the Frobenius norm) and graph recovery. Both synthetic and real data examples are presented in support of the proposed approaches. In synthetic data examples, our log-sum-penalized differential time-series graph estimator significantly outperformed our lasso based differential time-series graph estimator which, in turn, significantly outperformed an existing lasso-penalized i.i.d. modeling approach, with $F_1$ score as the performance metric.
Transformer(6篇)
【1】Enabling Delayed-Full Charging Through Transformer-Based Real-Time-to-Departure Modeling for EV Battery Longevity
标题:通过基于Transformer的实时出发建模实现延迟完全充电,以提高电动汽车电池寿命
链接:https://arxiv.org/abs/2512.07723
作者:Yonggeon Lee,Jibin Hwang,Alfred Malengo Kondoro,Juhyun Song,Youngtae Noh
备注:16 pages, 9 figures, AAAI'26 (accepted)
摘要:Electric vehicles (EVs) are key to sustainable mobility, yet their lithium-ion batteries (LIBs) degrade more rapidly under prolonged high states of charge (SOC). This can be mitigated by delaying full charging until just before departure, which requires accurate prediction of user departure times. In this work, we propose a Transformer-based real-time-to-event (TTE) model for accurate EV departure prediction. Our approach represents each day as a TTE sequence by discretizing time into grid-based tokens. Unlike previous methods that depend primarily on temporal dependencies in historical patterns, our method leverages streaming contextual information to predict departures. Evaluation on a real-world study involving 93 users and passive smartphone data demonstrates that our method effectively captures irregular departure patterns within individual routines, outperforming baseline models. These results highlight the potential for practical deployment of the proposed delayed-full-charging algorithm and its contribution to sustainable transportation systems.
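The day-as-sequence representation can be sketched as follows, assuming 15-minute bins and a per-token label equal to the number of bins remaining until departure. The bin width, the `None` label after departure, and the simple charging rule in `should_start_full_charge` are hypothetical illustrations of how a time-to-event prediction could drive delayed full charging; they are not the paper's tokenization or policy.

```python
def day_to_tte_sequence(departure_minute, bin_minutes=15, day_minutes=24 * 60):
    """Split one day into grid tokens; label each with the number of bins left until departure.

    Tokens after the departure bin get None (hypothetical: no pending departure event).
    """
    n_bins = day_minutes // bin_minutes
    dep_bin = departure_minute // bin_minutes
    return [(t, dep_bin - t if t <= dep_bin else None) for t in range(n_bins)]


def should_start_full_charge(predicted_bins_left, bins_needed_to_top_off):
    """Hypothetical delayed-charging rule: hold a lower SOC until the predicted departure is close."""
    return predicted_bins_left <= bins_needed_to_top_off
```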
【2】Towards a Relationship-Aware Transformer for Tabular Data
标题:面向表格数据的关系感知Transformer
链接:https://arxiv.org/abs/2512.07310
作者:Andrei V. Konstantinov,Valerii A. Zuev,Lev V. Utkin
摘要:Deep learning models for tabular data typically do not allow for imposing a graph of external dependencies between samples, which can be useful for accounting for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, making them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism, which accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and with gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
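The additive term can be illustrated as attention over samples with an extra bias in the logits, softmax(QKᵀ/√d + βA), where A is the external relationship graph between samples. Using the raw adjacency matrix scaled by a scalar β is an assumption; the paper proposes several variants that are not reproduced here. Unlike message passing restricted to neighbours, every sample still attends to every other sample, and the graph only reweights the attention.

```python
import numpy as np


def relation_aware_attention(X, Wq, Wk, Wv, adjacency, beta=1.0):
    """Self-attention over samples (rows of X) with an additive relationship term beta * A in the logits."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + beta * adjacency  # hypothetical bias: favour related samples
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability before exp
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V, weights
```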
【3】A Neural Affinity Framework for Abstract Reasoning: Diagnosing the Compositional Gap in Transformer Architectures via Procedural Task Taxonomy
标题:抽象推理的神经亲和力框架:通过程序任务分类诊断Transformer架构中的组成差距
链接:https://arxiv.org/abs/2512.07109
作者:Miguel Ingram,Arthur Joseph Merritt
备注:62 pages, 10 figures
摘要:Responding to Hodel et al.'s (2024) call for a formal definition of task relatedness in re-arc, we present the first 9-category taxonomy of all 400 tasks, validated at 97.5% accuracy via rule-based code analysis. We prove the taxonomy's visual coherence by training a CNN on raw grid pixels (95.24% accuracy on S3, 36.25% overall, 3.3x chance), then apply the taxonomy diagnostically to the original ARC-AGI-2 test set. Our curriculum analysis reveals 35.3% of tasks exhibit low neural affinity for Transformers--a distributional bias mirroring ARC-AGI-2. To probe this misalignment, we fine-tuned a 1.7M-parameter Transformer across 302 tasks, revealing a profound Compositional Gap: 210 of 302 tasks (69.5%) achieve >80% cell accuracy (local patterns) but <10% grid accuracy (global synthesis). This provides direct evidence for a Neural Affinity Ceiling Effect, where performance is bounded by architectural suitability, not curriculum. Applying our framework to Li et al.'s independent ViTARC study (400 specialists, 1M examples each) confirms its predictive power: Very Low affinity tasks achieve 51.9% versus 77.7% for High affinity (p<0.001), with a task at 0% despite massive data. The taxonomy enables precise diagnosis: low-affinity tasks (A2) hit hard ceilings, while high-affinity tasks (C1) reach 99.8%. These findings indicate that progress requires hybrid architectures with affinity-aligned modules. We release our validated taxonomy,
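The gap between the two metrics is easy to see in code: a prediction can match almost every cell yet still fail the exact-match grid criterion. The sketch below pools cells across all grids and counts a grid as correct only if its shape and every cell match; the pooling choice and the function name are assumptions, not the paper's evaluation script.

```python
def cell_and_grid_accuracy(predictions, targets):
    """Cell accuracy: matching cells over all target cells; grid accuracy: fraction of exactly matched grids."""
    cells_right = cells_total = grids_right = 0
    for pred, tgt in zip(predictions, targets):
        same_shape = len(pred) == len(tgt) and all(len(a) == len(b) for a, b in zip(pred, tgt))
        if same_shape:
            matches = sum(a == b for row_p, row_t in zip(pred, tgt) for a, b in zip(row_p, row_t))
        else:
            matches = 0  # a wrongly shaped grid earns no cell credit in this sketch
        n_cells = sum(len(row) for row in tgt)
        cells_right += matches
        cells_total += n_cells
        grids_right += int(same_shape and matches == n_cells)
    return cells_right / cells_total, grids_right / len(targets)
```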
【4】Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features
标题:基于Transformer语义特征的网络钓鱼检测深度强化学习
链接:https://arxiv.org/abs/2512.06925
作者:Aseer Al Faisal
摘要:Phishing is a cybercrime in which individuals are deceived into revealing personal information, often resulting in financial loss. These attacks commonly occur through fraudulent messages, misleading advertisements, and compromised legitimate websites. This study proposes a Quantile Regression Deep Q-Network (QR-DQN) approach that integrates RoBERTa semantic embeddings with handcrafted lexical features to enhance phishing detection while accounting for uncertainties. Unlike traditional DQN methods that estimate single scalar Q-values, QR-DQN leverages quantile regression to model the distribution of returns, improving stability and generalization on unseen phishing data. A diverse dataset of 105,000 URLs was curated from PhishTank, OpenPhish, Cloudflare, and other sources, and the model was evaluated using an 80/20 train-test split. The QR-DQN framework achieved a test accuracy of 99.86%, precision of 99.75%, recall of 99.96%, and F1-score of 99.85%, demonstrating high effectiveness. Compared to standard DQN with lexical features, the hybrid QR-DQN with lexical and semantic features reduced the generalization gap from 1.66% to 0.04%, indicating significant improvement in robustness. Five-fold cross-validation confirmed model reliability, yielding a mean accuracy of 99.90% with a standard deviation of 0.04%. These results suggest that the proposed hybrid approach effectively identifies phishing threats, adapts to evolving attack strategies, and generalizes well to unseen data.
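For readers unfamiliar with QR-DQN, its distributional loss is the quantile Huber loss from the original QR-DQN formulation: with N predicted quantiles at midpoints τ̂_i = (2i + 1)/(2N), each pairwise TD error u is Huber-penalized and weighted by |τ̂_i − 1{u < 0}|. The NumPy sketch below computes it for a single transition; it is the standard formulation rather than code from this paper, and κ = 1 is a conventional default.

```python
import numpy as np


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """QR-DQN loss for one transition: pred_quantiles has shape (N,), target_samples (N',) from the Bellman target."""
    n = pred_quantiles.shape[0]
    tau_hat = (2 * np.arange(n) + 1) / (2.0 * n)                  # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]          # (N, N') pairwise TD errors
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(tau_hat[:, None] - (u < 0).astype(float))      # asymmetric quantile weight
    return (weight * huber).mean(axis=1).sum()                     # mean over targets, sum over quantiles
```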
【5】BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination
标题:BitStopper:通过阶段融合和提前终止的高效Transformer注意力加速器
链接:https://arxiv.org/abs/2512.06457
作者:Huizheng Wang,Hongbin Wang,Shaojun Wei,Yang Hu,Shouyi Yin
摘要:Attention-based large language models (LLMs) have transformed modern AI applications, but the quadratic cost of self-attention imposes significant compute and memory overhead. Dynamic sparsity (DS) attention mitigates this, yet its hardware efficiency is limited by the added prediction stage and the heavy memory traffic it entails. To address these limitations, this paper proposes BitStopper, a fine-grained algorithm-architecture co-design that operates without a sparsity predictor. First, a bit-serial enable stage fusion (BESF) mechanism is proposed to reuse and minimize the memory access by progressively terminating trivial tokens and merging the prediction stage into the execution stage. Second, a lightweight and adaptive token selection (LATS) strategy is developed to work in concert with the bit-level sparsity speculation. Third, a bit-level asynchronous processing (BAP) strategy is employed to improve compute utilization during the on-demand bit-grained memory fetching. Finally, an elaborate architecture is designed to translate the theoretical complexity reduction into practical performance improvement. Extensive evaluations demonstrate that, compared to state-of-the-art (SOTA) Transformer accelerators, BitStopper achieves 2.03x and 1.89x speedups over Sanger and SOFA, respectively, while delivering 2.4x and 2.1x improvements in energy efficiency.
【6】VG3T: Visual Geometry Grounded Gaussian Transformer
标题:VG3T:基于视觉几何的高斯Transformer
链接:https://arxiv.org/abs/2512.05988
作者:Junho Kim,Seongwon Lee
摘要:Generating a coherent 3D scene representation from multi-view images is a fundamental yet challenging task. Existing methods often struggle with multi-view fusion, leading to fragmented 3D representations and sub-optimal performance. To address this, we introduce VG3T, a novel multi-view feed-forward network that predicts a 3D semantic occupancy via a 3D Gaussian representation. Unlike prior methods that infer Gaussians from single-view images, our model directly predicts a set of semantically attributed Gaussians in a joint, multi-view fashion. This novel approach overcomes the fragmentation and inconsistency inherent in view-by-view processing, offering a unified paradigm to represent both geometry and semantics. We also introduce two key components, Grid-Based Sampling and Positional Refinement, to mitigate the distance-dependent density bias common in pixel-aligned Gaussian initialization methods. Our VG3T shows a notable 1.7%p improvement in mIoU while using 46% fewer primitives than the previous state-of-the-art on the nuScenes benchmark, highlighting its superior efficiency and performance.
GAN|对抗|攻击|生成相关(7篇)
【1】Materium: An Autoregressive Approach for Material Generation
标题:Materium:材料生成的自回归方法
链接:https://arxiv.org/abs/2512.07486
作者:Niklas Dobberstein,Jan Hamaekers
摘要:We present Materium: an autoregressive transformer for generating crystal structures that converts 3D material representations into token sequences. These sequences include elements with oxidation states, fractional coordinates and lattice parameters. Unlike diffusion approaches, which refine atomic positions iteratively through many denoising steps, Materium places atoms at precise fractional coordinates, enabling fast, scalable generation. With this design, the model can be trained in a few hours on a single GPU and generate samples much faster on GPUs and CPUs than diffusion-based approaches. The model was trained and evaluated using multiple properties as conditions, including fundamental properties, such as density and space group, as well as more practical targets, such as band gap and magnetic density. In both single and combined conditions, the model performs consistently well, producing candidates that align with the requested inputs.
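Here is a hedged sketch of what such a serialization might look like. It assumes lattice parameters come first, followed by one element-with-oxidation-state token and three fractional-coordinate tokens per atom, with coordinates written as fixed-precision strings. The token vocabulary, ordering and precision are hypothetical; the abstract does not specify them.

```python
def crystal_to_tokens(lattice, sites, decimals=3):
    """Serialize a crystal into a flat token list (hypothetical format).

    lattice: (a, b, c, alpha, beta, gamma); sites: list of (element, oxidation_state, (x, y, z)).
    """
    tokens = ["<lattice>"] + [f"{v:.{decimals}f}" for v in lattice] + ["</lattice>"]
    for element, ox, frac in sites:
        tokens.append(f"{element}{ox:+d}")                        # element with oxidation state, e.g. "Na+1"
        tokens += [f"{(c % 1.0):.{decimals}f}" for c in frac]    # fractional coordinates wrapped into [0, 1)
    tokens.append("<eos>")
    return tokens
```

For rock-salt NaCl, `crystal_to_tokens((5.64, 5.64, 5.64, 90, 90, 90), [("Na", 1, (0, 0, 0)), ("Cl", -1, (0.5, 0.5, 0.5))])` yields 17 tokens that an autoregressive model could emit one at a time.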
【2】AdLift: Lifting Adversarial Perturbations to Safeguard 3D Gaussian Splatting Assets Against Instruction-Driven Editing
标题:AdLift:提升对抗性扰动以保护3D高斯飞溅资产免受指令驱动编辑的影响
链接:https://arxiv.org/abs/2512.07247
作者:Ziming Hong,Tianyu Huang,Runnan Chen,Shanshan Ye,Mingming Gong,Bo Han,Tongliang Liu
备注:40 pages, 34 figures, 18 tables
摘要:Recent studies have extended diffusion-based instruction-driven 2D image editing pipelines to 3D Gaussian Splatting (3DGS), enabling faithful manipulation of 3DGS assets and greatly advancing 3DGS content creation. However, this also exposes these assets to serious risks of unauthorized editing and malicious tampering. Although imperceptible adversarial perturbations against diffusion models have proven effective for protecting 2D images, applying them to 3DGS encounters two major challenges: view-generalizable protection and balancing invisibility with protection capability. In this work, we propose the first editing safeguard for 3DGS, termed AdLift, which prevents instruction-driven editing across arbitrary views and dimensions by lifting strictly bounded 2D adversarial perturbations into 3D Gaussian-represented safeguards. To ensure both the effectiveness and the invisibility of the adversarial perturbations, these safeguard Gaussians are progressively optimized across training views using a tailored Lifted PGD, which first conducts gradient truncation during back-propagation from the editing model at the rendered image and applies projected gradients to strictly constrain the image-level perturbation. Then, the resulting perturbation is backpropagated to the safeguard Gaussian parameters via an image-to-Gaussian fitting operation. We alternate between gradient truncation and image-to-Gaussian fitting, yielding adversarial protection that is consistent across viewpoints and generalizes to novel views. Empirically, qualitative and quantitative results demonstrate that AdLift effectively protects against state-of-the-art instruction-driven 2D image and 3DGS editing.
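AdLift's Lifted PGD builds on the standard projected-gradient step, sketched below for an L∞ budget ε: take a signed gradient step, then project back onto the ε-ball around the clean rendering and onto the valid pixel range. The gradient truncation and the image-to-Gaussian fitting described in the abstract are not shown, and treating the step as ascent on a loss that disrupts editing is an assumption.

```python
import numpy as np


def pgd_linf_step(x_adv, x_clean, grad, step_size, epsilon):
    """One projected-gradient ascent step under an L-infinity budget (standard PGD, not Lifted PGD)."""
    x_next = x_adv + step_size * np.sign(grad)                       # signed step up the (assumed) protection loss
    x_next = np.clip(x_next, x_clean - epsilon, x_clean + epsilon)  # project onto the epsilon-ball
    return np.clip(x_next, 0.0, 1.0)                                # keep pixel values valid
```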
【3】Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
标题:通过视频编辑后的条件音频生成进行连贯的视听编辑
链接:https://arxiv.org/abs/2512.07209
作者:Masato Ishii,Akio Hayakawa,Takashi Shibuya,Yuki Mitsufuji
摘要:We introduce a novel pipeline for joint audio-visual editing that enhances the coherence between edited video and its accompanying audio. Our approach first applies state-of-the-art video editing techniques to produce the target video, then performs audio editing to align with the visual changes. To achieve this, we present a new video-to-audio generation model that conditions on the source audio, target video, and a text prompt. We extend the model architecture to incorporate conditional audio input and propose a data augmentation strategy that improves training efficiency. Furthermore, our model dynamically adjusts the influence of the source audio based on the complexity of the edits, preserving the original audio structure where possible. Experimental results demonstrate that our method outperforms existing approaches in maintaining audio-visual alignment and content integrity.
【4】ContextualSHAP : Enhancing SHAP Explanations Through Contextual Language Generation
标题:ContextualSHAP:通过上下文语言生成增强SHAP解释
链接:https://arxiv.org/abs/2512.07178
作者:Latifa Dwiyanti,Sergio Ryan Wibisono,Hidetaka Nambo
备注:This paper was accepted and presented at the 7th World Symposium on Software Engineering (WSSE) 2025 on 25 October 2025 in Okayama, Japan, and is currently awaiting publication
摘要:Explainable Artificial Intelligence (XAI) has become an increasingly important area of research, particularly as machine learning models are deployed in high-stakes domains. Among various XAI approaches, SHAP (SHapley Additive exPlanations) has gained prominence due to its ability to provide both global and local explanations across different machine learning models. While SHAP effectively visualizes feature importance, it often lacks contextual explanations that are meaningful for end-users, especially those without technical backgrounds. To address this gap, we propose a Python package that extends SHAP by integrating it with a large language model (LLM), specifically OpenAI's GPT, to generate contextualized textual explanations. This integration is guided by user-defined parameters (such as feature aliases, descriptions, and additional background) to tailor the explanation to both the model context and the user perspective. We hypothesize that this enhancement can improve the perceived understandability of SHAP explanations. To evaluate the effectiveness of the proposed package, we applied it in a healthcare-related case study and conducted user evaluations involving real end-users. The results, based on Likert-scale surveys and follow-up interviews, indicate that the generated explanations were perceived as more understandable and contextually appropriate compared to visual-only outputs. While the findings are preliminary, they suggest that combining visualization with contextualized text may support more user-friendly and trustworthy model explanations.
【5】A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data
标题:用于零日攻击检测的监督机器学习模型的综合研究:分析不平衡数据的性能
链接:https://arxiv.org/abs/2512.07030
作者:Zahra Lotfi,Mostafa Lotfi
备注:13 pages, 5 figures
摘要:Among the various types of cyberattacks, zero-day attacks are particularly hard to identify because they are unknown to security systems: their patterns and characteristics do not match known blacklisted attacks. Many Machine Learning (ML) models, especially supervised ones, have been designed to analyze and detect network attacks. However, these models classify samples (normal and attack) based on the patterns they learn during training, so they perform poorly on unseen attacks. This research addresses this issue by evaluating five different supervised models in terms of performance and execution time when predicting zero-day attacks, to find out which model is both accurate and fast. The goal is to improve the performance of these supervised models by proposing a framework that applies grid search, dimensionality reduction and oversampling methods to overcome the class-imbalance problem, and by comparing the effect of oversampling on ML model metrics, in particular accuracy. To emulate attack detection in real life, this research uses a highly imbalanced dataset and only exposes the classifiers to zero-day attacks during the testing phase, so the models are not trained to flag them. Our results show that Random Forest (RF) performs best under both oversampling and non-oversampling conditions; however, this increased effectiveness comes at the cost of longer processing times. Therefore, we selected XGBoost (XGB) as the top model due to its fast and highly accurate performance in detecting zero-day attacks.
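A minimal sketch of random oversampling, the simplest resampling option; the abstract does not say which oversampling methods were used (SMOTE-style synthesis is another common choice), so the function below is illustrative only. It should be applied to the training split alone, so the test set keeps its natural imbalance.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each minority class until all classes match the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # resample with replacement from this class
            y_out.append(label)
    return X_out, y_out
```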
【6】Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation
Link: https://arxiv.org/abs/2512.06813
Authors: Agung Nugraha,Heungjun Im,Jihwan Lee
Comments: 19 pages, 12 figures
Abstract: High-performance concrete offers exceptional strength and durability but requires complex mix designs involving many interdependent variables and practical constraints. Data-driven methods have advanced predictive modeling for forward design. Inverse design, which determines the mix compositions that achieve a target performance, remains limited, particularly when some mix variables are fixed by constraints and only the remaining variables must be determined. This study proposes a cooperative neural network framework for the partial inverse design of high-performance concrete. The framework combines two coupled neural network models: an imputation model that infers the undetermined variables and a surrogate model that predicts compressive strength. Through cooperative learning, the model generates valid and performance-consistent mix designs in a single forward pass while accommodating different constraint combinations without retraining. Its performance is compared with both probabilistic and generative approaches, including Bayesian inference based on a Gaussian process surrogate and autoencoder-based models. Evaluated on a benchmark dataset, the proposed model achieves higher and more stable R-squared values of 0.87-0.92. It also reduces mean squared error by an average of 50 percent compared with autoencoder baselines and by an average of 70 percent compared with Bayesian inference. The results demonstrate that the cooperative neural network provides an accurate, robust, and computationally efficient foundation for constraint-aware, data-driven mix proportioning in concrete engineering.
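The partial-inverse setup can be pictured in three steps. Constrained variables keep their fixed values, the imputation model fills in the rest, and the surrogate scores the assembled mix against the target strength. This sketch assumes a boolean mask of fixed variables and a toy linear surrogate; the variable order, weights, and loss are illustrative only.

```python
import numpy as np


def assemble_mix(fixed_values: np.ndarray, fixed_mask: np.ndarray, imputed: np.ndarray) -> np.ndarray:
    """Keep constrained variables as given; take every other variable from the imputation model."""
    return np.where(fixed_mask.astype(bool), fixed_values, imputed)


def strength_loss(mix: np.ndarray, target_strength: float, surrogate) -> float:
    """Squared gap between the surrogate's predicted strength and the design target."""
    return float((surrogate(mix) - target_strength) ** 2)


# Hypothetical weights for a toy linear surrogate (e.g. cement, slag, water/binder ratio).
TOY_WEIGHTS = np.array([0.1, 0.05, -20.0])


def toy_surrogate(mix: np.ndarray) -> float:
    """Stand-in for the trained compressive-strength model."""
    return float(TOY_WEIGHTS @ mix)
```

In cooperative training, the gradient of a loss like `strength_loss` would flow through the surrogate into the imputation model. Because the fixed entries are copied from the constraints, only the free variables are adjusted.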
【7】Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Link: https://arxiv.org/abs/2512.06421
Authors: Gengze Zhou,Chongjian Ge,Hao Tan,Feng Liu,Yicong Hong
Abstract: Recent advances in autoregressive (AR) generative models have produced increasingly powerful systems for media synthesis. Among them, next-scale prediction has emerged as a popular paradigm, where models generate images in a coarse-to-fine manner. However, scale-wise AR models suffer from exposure bias, which undermines generation quality. We identify two primary causes of this issue: (1) train-test mismatch, where the model must rely on its own imperfect predictions during inference, and (2) imbalance in scale-wise learning difficulty, where certain scales exhibit disproportionately higher optimization complexity. Through a comprehensive analysis of training dynamics, we propose Self-Autoregressive Refinement (SAR) to address these limitations. SAR has two components. The first is a Stagger-Scale Rollout (SSR) mechanism that performs lightweight autoregressive rollouts to expose the model to its own intermediate predictions, thereby aligning train-test patterns. The second is a complementary Contrastive Student-Forcing Loss (CSFL) that provides adequate supervision for self-generated contexts to ensure stable training. Experimental results show that applying SAR to pretrained AR models consistently improves generation quality with minimal computational overhead. For instance, SAR yields a 5.2% FID reduction on FlexVAR-d16 trained on ImageNet 256 within 10 epochs (5 hours on 32×A100 GPUs). Given its efficiency, scalability, and effectiveness, we expect SAR to serve as a reliable post-training method for visual autoregressive generation.
Semi/Weakly/Un/Supervised|Uncertainty|Active Learning (9 papers)
【1】Selective Masking based Self-Supervised Learning for Image Semantic Segmentation
Link: https://arxiv.org/abs/2512.06981
Authors: Yuemin Wang,Ian Stavness
Abstract: This paper proposes a novel self-supervised learning method for semantic segmentation that uses selective masking image reconstruction as the pretraining task. Our method replaces the random masking augmentation used in most masked image modelling pretraining methods. It selectively masks the image patches with the highest reconstruction loss. To do so, it breaks image reconstruction pretraining into iterative steps so that it can leverage the knowledge of the partially trained model. We evaluate on two general datasets (Pascal VOC and Cityscapes) and two weed segmentation datasets (Nassar 2020 and Sugarbeets 2016). On downstream segmentation accuracy, selective masking outperforms both traditional random masking and supervised ImageNet pretraining, by 2.9% on the general datasets and 2.5% on the weed segmentation datasets. Furthermore, we found that our selective masking method significantly improves accuracy for the lowest-performing classes. Lastly, we show that using the same dataset for pretraining and downstream training yields the best result for low-budget self-supervised pretraining. Our proposed Selective Masking Image Reconstruction method provides an effective and practical way to improve end-to-end semantic segmentation workflows, especially in scenarios where model capacity must be limited to meet inference speed and computational resource requirements.
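The core selection rule, masking the patches with the highest reconstruction loss instead of random ones, can be sketched as a top-k over per-patch losses. This sketch assumes the losses come from the model trained in the previous iterative step. The mask ratio is a free parameter here, not a value from the paper.

```python
import numpy as np


def selective_mask(patch_losses: np.ndarray, mask_ratio: float) -> np.ndarray:
    """Boolean mask over patches that selects those with the highest reconstruction loss."""
    n = patch_losses.size
    k = int(round(mask_ratio * n))
    hardest = np.argsort(patch_losses)[::-1][:k]  # indices of the k largest losses
    mask = np.zeros(n, dtype=bool)
    mask[hardest] = True
    return mask
```

Random masking would instead draw `k` indices uniformly. The selective variant concentrates the reconstruction task on the regions the current model handles worst.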
【2】Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT
Link: https://arxiv.org/abs/2512.06849
Authors: Matan Atad,Alexander W. Marka,Lisa Steinhelfer,Anna Curto-Vilalta,Yannik Leonhardt,Sarah C. Foreman,Anna-Sophia Walburga Dietrich,Robert Graf,Alexandra S. Gersing,Bjoern Menze,Daniel Rueckert,Jan S. Kirschke,Hendrik Möller
Comments: In submission
Abstract: Accurate segmentation of vertebral metastases in CT is clinically important yet difficult to scale, because voxel-level annotations are scarce and both lytic and blastic lesions often resemble benign degenerative changes. We introduce a weakly supervised method trained solely on vertebra-level healthy/malignant labels, without any lesion masks. The method combines a Diffusion Autoencoder (DAE), which produces a classifier-guided healthy edit of each vertebra, with pixel-wise difference maps that propose candidate lesion regions. To determine which regions truly reflect malignancy, we introduce Hide-and-Seek Attribution. Each candidate is revealed in turn while all others are hidden. The edited image is then projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that component. High-scoring regions form the final lytic or blastic segmentation. On held-out radiologist annotations, we achieve strong blastic/lytic performance despite no mask supervision (F1: 0.91/0.85; Dice: 0.87/0.78), exceeding baselines (F1: 0.79/0.67; Dice: 0.74/0.55). These results show that vertebra-level labels can be transformed into reliable lesion masks, demonstrating that generative editing combined with selective occlusion supports accurate weakly supervised segmentation in CT.
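The reveal-one, hide-the-rest loop described above can be sketched generically. In this sketch `project_fn` is a hypothetical stand-in for the DAE projection (identity by default) and `score_fn` stands in for the latent-space classifier. The selection threshold is also an assumption.

```python
import numpy as np
from typing import Callable, List


def hide_and_seek_scores(
    original: np.ndarray,
    healthy_edit: np.ndarray,
    candidates: List[np.ndarray],
    score_fn: Callable[[np.ndarray], float],
    project_fn: Callable[[np.ndarray], np.ndarray] = lambda img: img,
) -> List[float]:
    """Score each candidate region in isolation.

    The candidate keeps its original intensities; everything else, including
    all other candidates, takes the healthy-edit intensities (i.e. is hidden).
    """
    scores = []
    for region in candidates:
        composite = np.where(region, original, healthy_edit)
        scores.append(float(score_fn(project_fn(composite))))
    return scores


def union_of_selected(candidates: List[np.ndarray], scores: List[float], threshold: float) -> np.ndarray:
    """Final lesion mask: union of candidates whose isolated malignancy score passes the threshold."""
    mask = np.zeros(candidates[0].shape, dtype=bool)
    for region, s in zip(candidates, scores):
        if s >= threshold:
            mask |= region
    return mask
```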
【3】GradientSpace: Unsupervised Data Clustering for Improved Instruction Tuning
Link: https://arxiv.org/abs/2512.06678
Authors: Shrihari Sridharan,Deepak Ravikumar,Anand Raghunathan,Kaushik Roy
Abstract: Instruction tuning is one of the key steps required for adapting large language models (LLMs) to a broad spectrum of downstream applications. However, this procedure is difficult because real-world datasets are rarely homogeneous. They mix diverse information, which causes gradient interference: conflicting gradients pull the model in opposing directions and degrade performance. A common strategy to mitigate this issue is to group data based on semantic or embedding similarity. However, this fails to capture how data influences model parameters during learning. Recent works have attempted to cluster gradients directly, but they randomly project gradients into lower dimensions to manage memory, which leads to accuracy loss. Moreover, these methods rely on expert ensembles, which necessitate multiple inference passes and expensive on-the-fly gradient computations during inference. To address these limitations, we propose GradientSpace, a framework that clusters samples directly in full-dimensional gradient space. We introduce an online SVD-based algorithm that operates on LoRA gradients to identify latent skills without the infeasible cost of storing all sample gradients. Each cluster is used to train a specialized LoRA expert, and a lightweight router is trained to select the best expert during inference. We show that routing to a single, appropriate expert outperforms the expert ensembles used in prior work while significantly reducing inference latency. Our experiments across mathematical reasoning, code generation, finance, and creative writing tasks demonstrate that GradientSpace leads to coherent expert specialization and consistent accuracy gains over state-of-the-art clustering methods and finetuning techniques.
【4】FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving
Link: https://arxiv.org/abs/2512.06676
Authors: Wei-Bin Kou,Guangxu Zhu,Bingyang Cheng,Chen Zhang,Yik-Chung Wu,Jianping Wang
Comments: 9 pages
Abstract: Federated Learning (FL) enables collaborative training of autonomous driving (AD) models across distributed vehicles while preserving data privacy. However, FL encounters critical challenges such as poor generalization and slow convergence due to non-independent and identically distributed (non-IID) data from diverse driving environments. To overcome these obstacles, we introduce Federated Deep Supervision and Regularization (FedDSR), a paradigm that incorporates multi-access intermediate layer supervision and regularization within a federated AD system. Specifically, FedDSR comprises the following integral strategies:
(I) selecting multiple intermediate layers based on predefined, architecture-agnostic criteria;
(II) computing mutual information (MI) and negative entropy (NE) on the selected layers to serve as the intermediate loss and regularizer, which are integrated into the output-layer loss to form a unified optimization objective that enables comprehensive optimization across the network hierarchy; and
(III) aggregating the models trained on vehicles under rules (I) and (II) into a global model on the central server.
By guiding and penalizing the learning of feature representations at intermediate stages, FedDSR improves model generalization and accelerates model convergence for federated AD. We take semantic segmentation as an example task to assess FedDSR and apply FedDSR to multiple model architectures and FL algorithms. Extensive experiments demonstrate that FedDSR achieves up to an 8.93% improvement in mIoU and a 28.57% reduction in training rounds compared to other FL baselines, making it highly suitable for practical deployment in federated AD ecosystems.
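The abstract gives neither the loss weights nor the MI/NE estimators. It also leaves the aggregation rule open, since FedDSR is applied on top of several FL algorithms. This sketch therefore takes the per-layer terms as precomputed numbers and combines them with hypothetical weights `alpha` and `beta`. For step (III) it uses FedAvg (a sample-count-weighted average of client parameters) as a stand-in.

```python
import numpy as np
from typing import Dict, List, Sequence


def deep_supervision_objective(
    output_loss: float,
    intermediate_losses: Sequence[float],
    regularizers: Sequence[float],
    alpha: float = 0.1,  # hypothetical weight on the per-layer intermediate losses
    beta: float = 0.01,  # hypothetical weight on the per-layer regularizers
) -> float:
    """Unified objective: output-layer loss plus weighted intermediate-layer terms."""
    return output_loss + alpha * sum(intermediate_losses) + beta * sum(regularizers)


def fedavg(client_params: List[Dict[str, np.ndarray]], num_samples: List[int]) -> Dict[str, np.ndarray]:
    """Server-side aggregation: average each parameter tensor, weighted by client sample count."""
    total = float(sum(num_samples))
    return {
        name: sum(params[name] * (n / total) for params, n in zip(client_params, num_samples))
        for name in client_params[0]
    }
```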
【5】Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
Link: https://arxiv.org/abs/2512.06533
Authors: Ming Chen,Sheng Tang,Rong-Xi Tan,Ziniu Li,Jiacheng Chen,Ke Xue,Chao Qian
Abstract: Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process and use sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression show that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, demonstrating the benefit of introducing sequence-level signals. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
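A sequence-level reward scores the whole decoded number rather than each token. The sketch below assumes the reward is the negative absolute error of the parsed value, with a fixed penalty for unparsable output; the paper's exact reward may differ. It also shows the two baselines named above. ReMax subtracts the reward of the greedy decode. GRPO normalizes rewards within a group of samples for the same input.

```python
import statistics
from typing import List, Optional, Sequence


def decode_number(tokens: Sequence[str]) -> Optional[float]:
    """Join generated tokens and parse them as a number; None if the text is not numeric."""
    try:
        return float("".join(tokens))
    except ValueError:
        return None


def sequence_reward(tokens: Sequence[str], target: float, invalid_penalty: float = -10.0) -> float:
    """Hypothetical sequence-level reward: negative absolute error of the whole decoded value."""
    pred = decode_number(tokens)
    return invalid_penalty if pred is None else -abs(pred - target)


def remax_advantage(sample_reward: float, greedy_reward: float) -> float:
    """ReMax-style baseline: compare a sampled sequence with the greedy decode."""
    return sample_reward - greedy_reward


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style group normalization of rewards for samples of the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```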
【6】UncertaintyZoo: A Unified Toolkit for Quantifying Predictive Uncertainty in Deep Learning Systems
Link: https://arxiv.org/abs/2512.06406
Authors: Xianzong Wu,Xiaohong Li,Lili Quan,Qiang Hu
Abstract: Large language models (LLMs) are increasingly expanding their real-world applications across domains, e.g., question answering, autonomous driving, and automatic software development. Despite this achievement, LLMs, as data-driven systems, often make incorrect predictions, which can lead to losses in safety-critical scenarios. To address this issue and measure the confidence of model outputs, multiple uncertainty quantification (UQ) criteria have been proposed. However, despite their importance, few tools integrate these methods, which hinders both the practical use of UQ methods and future research in this domain. To bridge this gap, we introduce UncertaintyZoo, a unified toolkit that integrates 29 uncertainty quantification methods covering five major categories under a standardized interface. Using UncertaintyZoo, we evaluate the usefulness of existing uncertainty quantification methods on the code vulnerability detection task with the CodeBERT and ChatGLM3 models. The results demonstrate that UncertaintyZoo effectively reveals prediction uncertainty. The tool, with a demonstration video, is available on the project site https://github.com/Paddingbuta/UncertaintyZoo.
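UncertaintyZoo's own API is not shown here. For orientation only, the sketch below implements two classic single-pass uncertainty scores that toolkits of this kind typically cover: one minus the maximum softmax probability, and predictive entropy. It works from a single logit vector.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_uncertainty(logits: Sequence[float]) -> float:
    """1 - max class probability: higher means less confident."""
    return 1.0 - max(softmax(logits))


def predictive_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)
```

For a binary vulnerability classifier, uniform logits give the maximum entropy of ln 2. A sharply peaked prediction gives an entropy near zero.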
【7】Closed-Loop Robotic Manipulation of Transparent Substrates for Self-Driving Laboratories using Deep Learning Micro-Error Correction
Link: https://arxiv.org/abs/2512.06038
Authors: Kelsey Fontenot,Anjali Gorti,Iva Goel,Tonio Buonassisi,Alexander E. Siemenn
Comments: 15 pages, 8 figures
Abstract: Self-driving laboratories (SDLs) have accelerated the throughput and automation capabilities for discovering and improving chemistries and materials. SDLs have automated many of the steps required to conduct chemical and materials experiments. However, one step is commonly overlooked in the automation pipeline: handling and reloading the substrates onto which materials are transferred or deposited for downstream characterization. Here, we develop a closed-loop method of Automated Substrate Handling and Exchange (ASHE). ASHE uses robotics, dual-actuated dispensers, and deep learning-driven computer vision to detect and correct errors in the manipulation of fragile and transparent substrates for SDLs. Using ASHE, we demonstrate a 98.5% first-time placement accuracy across 130 independent trials of reloading transparent glass substrates into an SDL. Only two substrate misplacements occurred, and both were detected as errors and automatically corrected. By developing more accurate and reliable methods for handling various types of substrates, we move toward improved automation capabilities for self-driving laboratories, further accelerating novel chemical and materials discoveries.
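The reported accuracy follows directly from the trial counts. Two misplacements in 130 trials leave 128 first-time successes:

```latex
\text{first-time placement accuracy} = \frac{130 - 2}{130} = \frac{128}{130} \approx 0.985 = 98.5\%
```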
【8】Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty Metrics
Link: https://arxiv.org/abs/2512.07224
Authors: Tianyi Ren,Daniel Low,Pittra Jaengprajak,Juampablo Heras Rivera,Jacob Ruzevick,Mehmet Kurt
Abstract: Segmentation, the identification of anatomical regions of interest such as organs, tissue, and lesions, is a fundamental task in computer-aided diagnosis in medical imaging. Deep learning models have achieved remarkable performance in medical image segmentation. Even so, explainability remains critical for their acceptance and integration into clinical practice, despite growing research attention in this area. Our approach explores contrast-level Shapley values, which systematically perturb model inputs to assess feature importance. Other studies have investigated gradient-based techniques that identify influential regions in imaging inputs. Shapley values instead offer a broader, clinically aligned approach, explaining how model performance is fairly attributed to certain imaging contrasts over others. Using the BraTS 2024 dataset, we generated Shapley value rankings for four MRI contrasts across four model architectures. We propose two metrics derived from these rankings. The first is agreement between the model's and a "clinician" imaging ranking. The second is uncertainty, quantified as the variance of Shapley rankings across cross-validation folds. Higher-performing cases (Dice > 0.6) showed significantly greater agreement with clinical rankings. Increased Shapley ranking variance correlated with decreased performance (U-Net: r = -0.581). These metrics provide clinically interpretable proxies for model reliability, helping clinicians better understand state-of-the-art segmentation models.
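With only four contrasts, exact Shapley values are cheap: average each contrast's marginal contribution over all 4! = 24 orderings. This is a minimal sketch under stated assumptions. The value function, for example the Dice score obtained when only a subset of contrasts is provided and the rest are ablated, is supplied by the caller, and the ablation scheme is not taken from the paper. The two helpers compute the ranking and the per-contrast rank variance across folds described above.

```python
import statistics
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List, Sequence


def shapley_values(players: Sequence[str], value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all player orderings."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            grown = coalition | {p}
            phi[p] += value(grown) - value(coalition)
            coalition = grown
    return {p: total / len(orderings) for p, total in phi.items()}


def ranking(phi: Dict[str, float]) -> List[str]:
    """Contrasts ordered from most to least important."""
    return sorted(phi, key=phi.get, reverse=True)


def rank_variance(fold_rankings: List[List[str]], player: str) -> float:
    """Variance of one contrast's rank position across cross-validation folds."""
    return statistics.pvariance([r.index(player) for r in fold_rankings])
```

For an additive toy value function, each contrast's Shapley value equals its own weight, which makes a convenient sanity check. The contrast names in such a check (T1n, T1c, T2w, T2-FLAIR) follow BraTS-style naming and are illustrative.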
【9】Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets
Link: https://arxiv.org/abs/2512.06945
Authors: Nabil Alami,Jad Zakharia,Souhaib Ben Taieb
Abstract: Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainties to produce reliable and efficient uncertainty quantification is therefore a critical but still underexplored challenge, especially within the framework of conformal prediction (CP). CP methods can generate individual prediction sets from each model, but combining them into a single, more informative set remains a challenging problem. To address this, we propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. This flexible design enables a robust, data-driven framework for selecting aggregation strategies that yield sharper prediction sets. We also provide theoretical insights that help justify the validity and performance of the SACP approach. Extensive experiments on diverse datasets show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.
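One standard way to turn a non-negative nonconformity score into an e-value is the soft-rank conformal e-value (n + 1) s / (s + sum of calibration scores). Under exchangeability it has expectation 1. SACP's exact transform may differ, so treat this as an illustrative sketch. With the arithmetic mean as the symmetric aggregator, the average of e-values is again an e-value, and Markov's inequality gives miscoverage at most alpha when candidates with mean e-value of at least 1/alpha are excluded.

```python
from typing import Callable, Dict, List, Sequence


def conformal_e_value(test_score: float, calib_scores: Sequence[float]) -> float:
    """Soft-rank conformal e-value for a non-negative nonconformity score."""
    total = test_score + sum(calib_scores)
    if total == 0.0:
        return 1.0  # all scores tie at zero: no evidence against the candidate
    return (len(calib_scores) + 1) * test_score / total


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def aggregated_prediction_set(
    candidate_scores: Dict[str, Sequence[float]],
    calib_scores_per_model: Sequence[Sequence[float]],
    alpha: float,
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> List[str]:
    """Keep every candidate label whose aggregated e-value stays below 1/alpha.

    candidate_scores maps a label to its nonconformity score under each model,
    in the same model order as calib_scores_per_model.
    """
    prediction_set = []
    for label, scores in candidate_scores.items():
        e_values = [conformal_e_value(s, cal) for s, cal in zip(scores, calib_scores_per_model)]
        if aggregate(e_values) < 1.0 / alpha:
            prediction_set.append(label)
    return prediction_set
```

Each e-value is capped at n + 1. A candidate can therefore only be excluded when the calibration set satisfies n + 1 > 1/alpha.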
Transfer|Zero/Few/One-Shot|Adaptive (15 papers)
【1】An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning
Link: https://arxiv.org/abs/2512.07827
Authors: Lukas Johannes Möller
Abstract: The escalating sophistication and variety of cyber threats have rendered static honeypots inadequate, necessitating adaptive, intelligence-driven deception. This work introduces ADLAH, an Adaptive Deep Learning Anomaly Detection Honeynet designed to maximize high-fidelity threat intelligence while minimizing cost through autonomous orchestration of infrastructure. The principal contribution is an end-to-end architectural blueprint and vision for an AI-driven deception platform. Feasibility is evidenced by a functional prototype of the central decision mechanism. In this prototype, a reinforcement learning (RL) agent decides in real time when sessions should be escalated from low-interaction sensor nodes to dynamically provisioned, high-interaction honeypots. Because sufficient live data were unavailable, field-scale validation is not claimed. Instead, design trade-offs and limitations are detailed, and a rigorous roadmap toward empirical evaluation at scale is provided. Beyond selective escalation and anomaly detection, the architecture pursues automated extraction, clustering, and versioning of bot attack chains. This core capability is motivated by the empirical observation that exposed services are dominated by automated traffic. Together, these elements delineate a practical path toward cost-efficient capture of high-value adversary behavior, systematic bot versioning, and the production of actionable threat intelligence.
【2】PVeRA: Probabilistic Vector-Based Random Matrix Adaptation
Link: https://arxiv.org/abs/2512.07703
Authors: Leo Fillioux,Enzo Ferrante,Paul-Henry Cournède,Maria Vakalopoulou,Stergios Christodoulidis
Abstract: Large foundation models have emerged in recent years and are pushing performance boundaries for a variety of tasks. Training or even finetuning such models demands vast datasets and computational resources, which are often scarce and costly. Adaptation methods provide a computationally efficient way around these limitations by allowing such models to be finetuned with small amounts of data and computing power. They append new trainable modules, holding only a fraction of the parameters, to frozen backbones and fit only these modules on novel tasks. Recently, the VeRA adapter was shown to excel in parameter-efficient adaptation by using a pair of frozen random low-rank matrices shared across all layers. In this paper, we propose PVeRA, a probabilistic version of the VeRA adapter, which modifies the low-rank matrices of VeRA in a probabilistic manner. This modification naturally allows the model to handle inherent ambiguities in the input and permits different sampling configurations during training and testing. A comprehensive evaluation on the VTAB-1k benchmark covering seven adapters shows PVeRA outperforming VeRA and the other adapters. Our code for training models with PVeRA and benchmarking all adapters is available at https://github.com/leofillioux/pvera.
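The abstract does not specify PVeRA's probabilistic parameterization, so this sketch shows only the deterministic VeRA forward pass it builds on. The layer computes h = W0 x + b * (B (d * (A x))). Here A and B are the frozen random low-rank matrices shared across layers, and only the vectors d and b are trained. The `d_init` value is illustrative; `b` starts at zero so the adapted layer initially matches the frozen one.

```python
import numpy as np


class VeRALinear:
    """VeRA-style adapted linear layer: h = W0 x + b * (B (d * (A x)))."""

    def __init__(self, W0: np.ndarray, A: np.ndarray, B: np.ndarray, d_init: float = 0.1):
        self.W0 = W0  # frozen pretrained weight, shape (out, in)
        self.A = A    # frozen random projection, shape (r, in), shared across layers
        self.B = B    # frozen random projection, shape (out, r), shared across layers
        self.d = np.full(A.shape[0], d_init)  # trainable vector scaling the r latent dims
        self.b = np.zeros(W0.shape[0])        # trainable vector; zero init => layer starts as W0

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return self.W0 @ x + self.b * (self.B @ (self.d * (self.A @ x)))

    def num_trainable(self) -> int:
        """Only r + out parameters per layer are trained."""
        return self.d.size + self.b.size
```

Because `A` and `B` are shared, every adapted layer passes the same two arrays to its constructor. The per-layer cost is just the two scaling vectors.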
【3】RRAEDy: Adaptive Latent Linearization of Nonlinear Dynamical Systems
Link: https://arxiv.org/abs/2512.07542
Authors: Jad Mounayer,Sebastian Rodriguez,Jerome Tomezyk,Chady Ghnatios,Francisco Chinesta
Abstract: Most existing latent-space models for dynamical systems require fixing the latent dimension in advance, rely on complex loss balancing to approximate linear dynamics, and do not regularize the latent variables. We introduce RRAEDy, a model that removes these limitations by discovering the appropriate latent dimension while enforcing both regularized and linearized dynamics in the latent space. Built upon Rank-Reduction Autoencoders (RRAEs), RRAEDy automatically ranks and prunes latent variables through their singular values while learning a latent Dynamic Mode Decomposition (DMD) operator that governs their temporal progression. This structure-free yet linearly constrained formulation enables the model to learn stable and low-dimensional dynamics without auxiliary losses or manual tuning. We provide theoretical analysis demonstrating the stability of the learned operator. We also showcase the generality of our model by proposing an extension that handles parametric ODEs. Experiments on canonical benchmarks, including the Van der Pol oscillator, Burgers' equation, 2D Navier-Stokes, and Rotating Gaussians, show that RRAEDy achieves accurate and robust predictions. Our code is open-source and available at https://github.com/JadM133/RRAEDy. We also provide a video summarizing the main results at https://youtu.be/ox70mSSMGrM.
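The two ingredients named above can be sketched with plain linear algebra. The first is rank selection by singular values. The second is a DMD operator fitted by least squares so that z_{t+1} ≈ K z_t. In RRAEDy the pruning happens inside the autoencoder during training. Here both steps are applied post hoc to a given latent trajectory matrix, and `rel_tol` is an illustrative threshold.

```python
import numpy as np


def truncate_rank(Z: np.ndarray, rel_tol: float = 1e-3):
    """SVD of a (latent_dim, time) matrix, keeping singular values above rel_tol * largest."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))
    return U[:, :r], s[:r], Vt[:r]


def fit_dmd_operator(Z: np.ndarray) -> np.ndarray:
    """Least-squares linear operator K such that Z[:, t + 1] is approximately K @ Z[:, t]."""
    X, Y = Z[:, :-1], Z[:, 1:]
    return Y @ np.linalg.pinv(X)


def rollout(K: np.ndarray, z0: np.ndarray, steps: int) -> np.ndarray:
    """Advance a latent state linearly for `steps` steps; returns shape (dim, steps + 1)."""
    traj = [z0]
    for _ in range(steps):
        traj.append(K @ traj[-1])
    return np.stack(traj, axis=1)
```

Stability of the learned dynamics can then be read off the eigenvalues of `K`: all moduli below 1 means trajectories decay.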
【4】Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2512.07417
Authors: Giray Önür,Azita Dabiri,Bart De Schutter
Abstract: Effective traffic control is essential for mitigating congestion in transportation networks. Conventional traffic management strategies, including route guidance, ramp metering, and traffic signal control, often rely on state feedback controllers because of their simplicity and reactivity. However, these controllers lack the adaptability required to cope with complex and time-varying traffic dynamics. This paper proposes a multi-agent reinforcement learning framework in which each agent adaptively tunes the parameters of a state feedback traffic controller, combining the reactivity of state feedback control with the adaptability of reinforcement learning. The agents tune parameters at a low frequency instead of choosing control actions directly at a high frequency. As a result, they achieve improved training efficiency while remaining adaptable to varying traffic conditions. The multi-agent structure further enhances system robustness, as local controllers can operate independently in the event of partial failures. The proposed framework is evaluated on a simulated multi-class transportation network under varying traffic conditions. Results show that the proposed multi-agent framework outperforms both the no-control case and fixed-parameter state feedback control. It performs on par with single-agent RL-based adaptive state feedback control while showing much better resilience to partial failures.
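The two-timescale idea (a slow loop that tunes parameters and a fast loop that applies feedback) can be illustrated with ALINEA, a well-known integral ramp-metering law: r(k) = r(k-1) + K_R (target occupancy - measured occupancy). ALINEA is chosen only as a familiar example; the paper's controllers may differ. `agent_policy` is a hypothetical stand-in for a trained RL agent. Rates are in veh/h and occupancies in percent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlineaController:
    gain: float               # K_R: the parameter the RL agent tunes
    target_occupancy: float   # desired downstream occupancy (%)
    r_min: float = 200.0      # metering-rate bounds (veh/h)
    r_max: float = 1800.0
    rate: float = 900.0       # current metering rate (veh/h)

    def step(self, measured_occupancy: float) -> float:
        """Fast loop: integral state-feedback update, clipped to the admissible range."""
        self.rate += self.gain * (self.target_occupancy - measured_occupancy)
        self.rate = min(self.r_max, max(self.r_min, self.rate))
        return self.rate


def run_episode(
    controller: AlineaController,
    occupancies: Sequence[float],
    agent_policy: Callable[[float], float],
    tune_every: int = 10,
) -> List[float]:
    """Slow loop every `tune_every` steps: the agent picks a gain; feedback acts every step."""
    rates = []
    for k, occ in enumerate(occupancies):
        if k % tune_every == 0:
            controller.gain = agent_policy(occ)
        rates.append(controller.step(occ))
    return rates
```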
【5】Towards Reliable Test-Time Adaptation: Style Invariance as a Correctness Likelihood
Link: https://arxiv.org/abs/2512.07390
Authors: Gilhyun Nam,Taewon Kim,Joonhyun Jeong,Eunho Yang
Comments: Accepted to WACV 2026
Abstract: Test-time adaptation (TTA) enables efficient adaptation of deployed models, yet it often leads to poorly calibrated predictive uncertainty. This is a critical issue in high-stakes domains such as autonomous driving, finance, and healthcare. Existing calibration methods typically assume fixed models or static distributions, so their performance degrades under real-world, dynamic test conditions. To address these challenges, we introduce Style Invariance as a Correctness Likelihood (SICL), a framework that leverages style invariance for robust uncertainty estimation. SICL estimates instance-wise correctness likelihood by measuring prediction consistency across style-altered variants, requiring only the model's forward pass. This makes it a plug-and-play, backpropagation-free calibration module compatible with any TTA method. Comprehensive evaluations across four baselines, five TTA methods, and two realistic scenarios with three model architectures demonstrate that SICL reduces calibration error by an average of 13 percentage points compared to conventional calibration approaches.
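The consistency measure can be sketched as the fraction of style-altered variants whose prediction agrees with the prediction on the original input. Only forward passes are needed. This is a minimal sketch. The style transforms are caller-supplied placeholders, not SICL's actual style-alteration procedure, and SICL's mapping from agreement to a correctness likelihood may be more involved.

```python
from typing import Callable, Hashable, Sequence, Tuple, TypeVar

X = TypeVar("X")


def style_invariance_score(
    x: X,
    predict: Callable[[X], Hashable],
    style_transforms: Sequence[Callable[[X], X]],
) -> Tuple[Hashable, float]:
    """Return the prediction on x and the fraction of style-altered variants that agree with it."""
    base = predict(x)
    agreements = sum(predict(transform(x)) == base for transform in style_transforms)
    return base, agreements / len(style_transforms)
```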
【6】JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention
Link: https://arxiv.org/abs/2512.07168
Authors: Georgios Ioannides,Christos Constantinou,Aman Chadha,Aaron Elkins,Linsey Pang,Ravid Shwartz-Ziv,Yann LeCun
Comments: UniReps: Unifying Representations in Neural Models (NeurIPS 2025 Workshop)
Abstract: We introduce a two-stage self-supervised framework that combines the Joint-Embedding Predictive Architecture (JEPA) with a Density Adaptive Attention Mechanism (DAAM) for learning robust speech representations. Stage 1 uses JEPA with DAAM to learn semantic audio features via masked prediction in latent space, fully decoupled from waveform reconstruction. Stage 2 uses these representations for efficient tokenization with Finite Scalar Quantization (FSQ) and a mixed-radix packing scheme, followed by high-fidelity waveform reconstruction with a HiFi-GAN decoder. By integrating Gaussian mixture-based density-adaptive gating into the JEPA encoder, the model performs adaptive temporal feature selection and discovers hierarchical speech structure at a low frame rate of 2.5 Hz. The resulting tokens (47.5 tokens/sec) provide a reversible, highly compressed, and language-model-friendly representation that is competitive with, and often more efficient than, existing neural audio codecs.
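FSQ bounds each latent dimension and rounds it to a small number of levels. Mixed-radix packing then turns the per-dimension codes into a single integer token, with dimension i acting as a digit in base L_i. In the sketch below the level counts are hypothetical. It also handles odd level counts only, which avoids the half-step offset FSQ uses for even counts.

```python
import numpy as np
from typing import List, Sequence


def fsq_quantize(z: np.ndarray, levels: Sequence[int]) -> np.ndarray:
    """Bound each dimension with tanh and round to an integer code in [-(L-1)/2, (L-1)/2]."""
    levels_arr = np.asarray(levels)
    assert np.all(levels_arr % 2 == 1), "this sketch handles odd level counts only"
    half = (levels_arr - 1) // 2
    return np.round(np.tanh(z) * half).astype(int)


def pack_mixed_radix(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Combine per-dimension codes into one token index (dimension i is a base-L_i digit)."""
    index, base = 0, 1
    for code, L in zip(codes, levels):
        digit = int(code) + (L - 1) // 2  # shift code to 0..L-1
        index += digit * base
        base *= L
    return index


def unpack_mixed_radix(index: int, levels: Sequence[int]) -> List[int]:
    """Invert pack_mixed_radix, recovering the signed per-dimension codes."""
    codes = []
    for L in levels:
        index, digit = divmod(index, L)
        codes.append(digit - (L - 1) // 2)
    return codes
```

Packing is lossless and covers the vocabulary exactly: with levels (7, 5, 5), every token index from 0 to 174 corresponds to one code vector. That reversibility is what makes the token stream decodable.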
【7】Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding
Link: https://arxiv.org/abs/2512.06929
Authors: MinCheol Jeon
Abstract: Time series forecasting in real-world environments faces significant challenges: non-stationarity, multi-scale temporal patterns, and distributional shifts that degrade model stability and accuracy. This study proposes AdaMamba, a unified forecasting architecture that integrates adaptive normalization, multi-scale trend extraction, and contextual sequence modeling to address these challenges. AdaMamba begins with an Adaptive Normalization Block that removes non-stationary components through multi-scale convolutional trend extraction and channel-wise recalibration, enabling consistent detrending and variance stabilization. The normalized sequence is then processed by a Context Encoder. The encoder combines patch-wise embeddings, positional encoding, and a Mamba-enhanced Transformer layer with a mixture-of-experts feed-forward module, allowing efficient modeling of both long-range dependencies and local temporal dynamics. A lightweight prediction head generates multi-horizon forecasts. A denormalization mechanism then reconstructs the outputs by reintegrating local trends, to ensure robustness under varying temporal conditions. AdaMamba provides strong representational capacity with modular extensibility, supporting deterministic prediction and compatibility with probabilistic extensions. Its design effectively mitigates covariate shift and enhances predictive reliability across heterogeneous datasets. Experimental evaluations demonstrate that AdaMamba's combination of adaptive normalization and expert-augmented contextual modeling yields consistent improvements in stability and accuracy over conventional Transformer-based baselines.
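A simplified picture of multi-scale detrending and denormalization on a single channel can be built from fixed moving averages. These stand in for AdaMamba's learned convolutional trend extraction, and channel-wise recalibration is omitted. The kernel sizes are illustrative. For actual forecasts, a trend for the forecast horizon would have to be supplied to `denormalize`, for example by carrying the last trend value forward. That choice is an assumption, not the paper's.

```python
import numpy as np
from typing import Sequence, Tuple


def moving_average(x: np.ndarray, k: int) -> np.ndarray:
    """Centered moving average with edge padding; output has the same length as x."""
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, np.ones(k) / k, mode="valid")


def multiscale_trend(x: np.ndarray, kernels: Sequence[int] = (3, 7, 15)) -> np.ndarray:
    """Average the trends extracted at several window sizes."""
    return np.mean([moving_average(x, k) for k in kernels], axis=0)


def normalize(x: np.ndarray, kernels: Sequence[int] = (3, 7, 15), eps: float = 1e-5) -> Tuple[np.ndarray, np.ndarray, float]:
    """Remove the multi-scale trend, then rescale the residual to unit variance."""
    trend = multiscale_trend(x, kernels)
    resid = x - trend
    std = float(resid.std()) + eps
    return resid / std, trend, std


def denormalize(y: np.ndarray, trend: np.ndarray, std: float) -> np.ndarray:
    """Undo normalize(): restore the scale and add the trend back."""
    return y * std + trend
```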
【8】Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
Link: https://arxiv.org/abs/2512.06652
Authors: Xiaolei Lu,Shamim Nemati
Abstract: Accurate prediction of the need for invasive mechanical ventilation (IMV) in intensive care unit (ICU) patients is crucial for timely interventions and resource allocation. However, institutions differ in patient populations, clinical practices, and electronic health record (EHR) systems. These differences introduce domain shifts that degrade the generalization performance of predictive models during deployment. Test-Time Training (TTT) has emerged as a promising approach to mitigate such shifts by adapting models dynamically during inference without requiring labeled target-domain data. In this work, we introduce Adaptive Test-Time Training (AdaTTT), an enhanced TTT framework tailored for EHR-based IMV prediction in ICU settings. We begin by deriving information-theoretic bounds on the test-time prediction error and show that it is constrained by the uncertainty between the main and auxiliary tasks. To better align these tasks, we introduce a self-supervised learning framework with two pretext tasks, reconstruction and masked feature modeling. Both are optimized through a dynamic masking strategy that emphasizes features critical to the main task. Additionally, to improve robustness against domain shifts, we incorporate prototype learning. We also employ Partial Optimal Transport (POT) for flexible, partial feature alignment while maintaining clinically meaningful patient representations. Experiments across multi-center ICU cohorts demonstrate competitive classification performance on different test-time adaptation benchmarks.
【9】Diagnosis-based mortality prediction for intensive care unit patients via transfer learning
Link: https://arxiv.org/abs/2512.06511
Authors: Mengqi Xu,Subha Maity,Joel Dubin
Abstract: In the intensive care unit, the underlying causes of critical illness vary substantially across diagnoses, yet prediction models that account for diagnostic heterogeneity have not been systematically studied. To address this gap, we evaluate transfer learning approaches for diagnosis-specific mortality prediction and apply both GLM- and XGBoost-based models to the eICU Collaborative Research Database. Transfer learning consistently outperforms two alternatives: models trained only on diagnosis-specific data, and a well-known ICU severity-of-illness score (APACHE IVa) used alone. It also achieves better calibration than models trained on the pooled data. Our findings also suggest that the Youden cutoff is a more appropriate decision threshold than the conventional 0.5 for binary outcomes, and that transfer learning maintains consistently high predictive performance across various cutoff criteria.
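The Youden cutoff mentioned above is the threshold that maximizes J = sensitivity + specificity - 1. A minimal sketch scans every observed score as a candidate threshold, predicting positive when the score is at least the threshold. It assumes both classes are present.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing sensitivity + specificity - 1; label 1 = positive."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Take scores (0.1, 0.2, 0.3, 0.35, 0.8, 0.9) with the last three positive. The Youden cutoff is 0.35, which separates the classes perfectly (J = 1). A 0.5 cutoff misses the positive case at 0.35 (J = 2/3). This illustrates why the fixed threshold can be a poor default.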
【10】Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling
标题:谁将位居排行榜榜首?通过模态专家自适应融合与时间参与度建模进行多模态音乐流行度预测
链接:https://arxiv.org/abs/2512.06259
作者:Yash Choudhary,Preeti Rao,Pushpak Bhattacharyya
备注:8 pages
摘要:Predicting a song's commercial success prior to its release remains an open and critical research challenge for the music industry. Early prediction of music popularity informs strategic decisions, creative planning, and marketing. Existing methods suffer from four limitations: (i) temporal dynamics in audio and lyrics are averaged away; (ii) lyrics are represented as a bag of words, disregarding compositional structure and affective semantics; (iii) artist- and song-level historical performance is ignored; and (iv) multimodal fusion approaches rely on simple feature concatenation, resulting in poorly aligned shared representations. To address these limitations, we introduce GAMENet, an end-to-end multimodal deep learning architecture for music popularity prediction. GAMENet integrates modality-specific experts for audio, lyrics, and social metadata through an adaptive gating mechanism. We use audio features from Music4AllOnion processed via OnionEnsembleAENet, a network of autoencoders designed for robust feature extraction; lyric embeddings derived through a large language model pipeline; and newly introduced Career Trajectory Dynamics (CTD) features that capture multi-year artist career momentum and song-level trajectory statistics. Using the Music4All dataset (113k tracks), previously explored in MIR tasks but not popularity prediction, GAMENet achieves a 12% improvement in R^2 over direct multimodal feature concatenation. Spotify audio descriptors alone yield an R^2 of 0.13. Integrating aggregate CTD features increases this to 0.69, with an additional 7% gain from temporal CTD features. We further validate robustness using the SpotGenTrack Popularity Dataset (100k tracks), achieving a 16% improvement over the previous baseline. Extensive ablations confirm the model's effectiveness and the distinct contribution of each modality.
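The abstract does not specify GAMENet's gating equations. The sketch below assumes a common mixture-of-experts form: one scalar gate logit per modality expert, a softmax over the logits, and a weighted sum of the expert vectors. The names `gated_fusion` and `gate_logits` are hypothetical. In a trained model, a learned gating network would produce the logits from the inputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(expert_outputs, gate_logits):
    """Fuse per-modality expert vectors with softmax gate weights (hypothetical).

    expert_outputs: equal-length vectors, one per modality expert
                    (e.g. audio, lyrics, social metadata).
    gate_logits:    one scalar logit per expert.
    Returns (fused_vector, gate_weights).
    """
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    fused = [sum(w * out[i] for w, out in zip(weights, expert_outputs))
             for i in range(dim)]
    return fused, weights
```

Unlike plain concatenation, the gate weights change per input, so the model can down-weight a modality that carries little signal for a given track.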
【11】Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning
标题:学习何时切换:通过强化学习自适应策略选择
链接:https://arxiv.org/abs/2512.06250
作者:Chris Tava
备注:7 pages
摘要:Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study, this work demonstrates how an agent can dynamically transition between systematic exploration (coverage) and goal-directed pathfinding (convergence) to improve task performance. Unlike fixed-threshold approaches, the agent uses Q-learning to adapt switching behavior based on coverage percentage and distance to goal, requiring only minimal domain knowledge: maze dimensions and target location. The agent does not require prior knowledge of wall positions, optimal threshold values, or hand-crafted heuristics; instead, it discovers effective switching strategies dynamically during each run. The agent discretizes its state space into coverage and distance buckets, then adapts which coverage threshold (20-60%) to apply based on observed progress signals. Experiments across 240 test configurations (4 maze sizes from 16×16 to 128×128 × 10 unique mazes × 6 agent variants) demonstrate that adaptive threshold learning outperforms both single-strategy agents and fixed 40% threshold baselines. Results show 23-55% improvements in completion time, 83% reduction in runtime variance, and 71% improvement in worst-case scenarios. The learned switching behavior generalizes within each size class to unseen wall configurations. Performance gains scale with problem complexity: 23% improvement for 16×16 mazes, 34% for 32×32, and 55% for 64×64, demonstrating that as the space of possible maze structures grows, the value of adaptive policy selection over fixed heuristics increases proportionally.
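The sketch below shows the mechanism the abstract describes under stated assumptions. The state is a pair of (coverage bucket, distance bucket), the actions are candidate coverage thresholds, and a standard tabular Q-learning update is applied. The 10% threshold grid, the 5×5 bucketing, and the hyperparameters are assumptions, not values from the paper.

```python
import random

# Assumed candidate switching thresholds: a 10% grid over the stated 20-60% range.
THRESHOLDS = [0.2, 0.3, 0.4, 0.5, 0.6]


def discretize(coverage, distance, max_distance, n_cov=5, n_dist=5):
    """Map coverage in [0, 1] and distance in [0, max_distance] to bucket indices."""
    cov_b = min(int(coverage * n_cov), n_cov - 1)
    dist_b = min(int(distance / max_distance * n_dist), n_dist - 1)
    return cov_b, dist_b


def choose_threshold(q, state, epsilon, rng):
    """Epsilon-greedy choice of a threshold index for the given state."""
    if rng.random() < epsilon:
        return rng.randrange(len(THRESHOLDS))
    values = q.get(state, [0.0] * len(THRESHOLDS))
    return max(range(len(THRESHOLDS)), key=lambda a: values[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One-step Q-learning update on a dict-backed Q-table (mutates q)."""
    values = q.setdefault(state, [0.0] * len(THRESHOLDS))
    next_values = q.get(next_state, [0.0] * len(THRESHOLDS))
    target = reward + gamma * max(next_values)
    values[action] += alpha * (target - values[action])
```

During a run, the agent would switch from coverage exploration to goal-directed pathfinding once its coverage exceeds `THRESHOLDS[a]`. The reward, for example negative steps to the goal, then trains which threshold to use in each bucket.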
【12】Multi-Modal Zero-Shot Prediction of Color Trajectories in Food Drying
标题:食品干燥中颜色轨迹的多模态零样本预测
链接:https://arxiv.org/abs/2512.06190
作者:Shichen Li,Ahmadreza Eslaminia,Chenhui Shao
摘要:Food drying is widely used to reduce moisture content, ensure safety, and extend shelf life. Color evolution of food samples is an important indicator of product quality in food drying. Although existing studies have examined color changes under different drying conditions, current approaches primarily rely on low-dimensional color features and cannot fully capture the complex, dynamic color trajectories of food samples. Moreover, existing modeling approaches lack the ability to generalize to unseen process conditions. To address these limitations, we develop a novel multi-modal color-trajectory prediction method that integrates high-dimensional temporal color information with drying process parameters to enable accurate and data-efficient color trajectory prediction. Under unseen drying conditions, the model attains RMSEs of 2.12 for cookie drying and 1.29 for apple drying, reducing errors by over 90% compared with baseline models. These experimental results demonstrate the model's superior accuracy, robustness, and broad applicability.
【13】Deep learning for autism detection using clinical notes: A comparison of transfer learning for a transparent and black-box approach
标题:使用临床笔记进行自闭症检测的深度学习:透明和黑匣子方法的迁移学习比较
链接:https://arxiv.org/abs/2512.06161
作者:Gondy Leroy,Prakash Bisht,Sai Madhuri Kandula,Nell Maltman,Sydney Rice
备注:9 pages
摘要:Autism spectrum disorder (ASD) is a complex neurodevelopmental condition whose rising prevalence places increasing demands on a lengthy diagnostic process. Machine learning (ML) has shown promise in automating ASD diagnosis, but most existing models operate as black boxes and are typically trained on a single dataset, limiting their generalizability. In this study, we introduce a transparent and interpretable ML approach that leverages BioBERT, a state-of-the-art language model, to analyze unstructured clinical text. The model is trained to label descriptions of behaviors and map them to diagnostic criteria, which are then used to assign a final label (ASD or not). We evaluate transfer learning, the ability to transfer knowledge to new data, using two distinct real-world datasets. We trained on datasets sequentially and mixed together and compared the performance of the best models and their ability to transfer to new data. We also created a black-box approach and repeated this transfer process for comparison. Our transparent model demonstrated robust performance, with the mixed-data training strategy yielding the best results (97% sensitivity, 98% specificity). Sequential training across datasets led to a slight drop in performance, highlighting the importance of training data order. The black-box model performed worse (90% sensitivity, 96% specificity) when trained sequentially or with mixed data. Overall, our transparent approach outperformed the black-box approach. Mixing datasets during training resulted in slightly better performance and should be the preferred approach when practically possible. This work paves the way for more trustworthy, generalizable, and clinically actionable AI tools in neurodevelopmental diagnostics.
【14】The Road of Adaptive AI for Precision in Cybersecurity
标题:自适应人工智能实现网络安全精准之路
链接:https://arxiv.org/abs/2512.06048
作者:Sahil Garg
摘要:Cybersecurity's evolving complexity presents unique challenges and opportunities for AI research and practice. This paper shares key lessons and insights from designing, building, and operating production-grade GenAI pipelines in cybersecurity, with a focus on the continual adaptation required to keep pace with ever-shifting knowledge bases, tooling, and threats. Our goal is to provide an actionable perspective for AI practitioners and industry stakeholders navigating the frontier of GenAI for cybersecurity, with particular attention to how different adaptation mechanisms complement each other in end-to-end systems. We present practical guidance derived from real-world deployments, propose best practices for leveraging retrieval- and model-level adaptation, and highlight open research directions for making GenAI more robust, precise, and auditable in cyber defense.
【15】ADAM Optimization with Adaptive Batch Selection
标题:基于自适应批量选择的ADAM优化
链接:https://arxiv.org/abs/2512.06795
作者:Gyu Yeol Kim,Min-hwan Oh
备注:Published at ICLR 2025
摘要:Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, a prior work proposed adapting the sampling distribution using a bandit framework to select samples adaptively. While promising, the bandit-based variant of Adam suffers from limited theoretical guarantees. In this paper, we introduce Adam with Combinatorial Bandit Sampling (AdamCB), which integrates combinatorial bandit techniques into Adam to resolve these issues. AdamCB is able to fully utilize feedback from multiple samples at once, enhancing both theoretical guarantees and practical performance. Our regret analysis shows that AdamCB achieves faster convergence than Adam-based methods including the previous bandit-based variant. Numerical experiments demonstrate that AdamCB consistently outperforms existing methods.
强化学习(11篇)
【1】Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE
标题:每个提示都很重要:在千亿规模MoE上扩展强化学习而不浪费Rollout
链接:https://arxiv.org/abs/2512.07710
作者:Anxiang Zeng,Haibo Zhang,Hailing Zhang,Kaixiang Mo,Liang Yao,Ling Hu,Long Zhang,Shuman Liu,Shuyi Xie,Yanshi Li,Yizhang Chen,Yuepeng Sheng,Yuwei Huang,Zhaochen Xu,Zhiqiang Zhou,Ziqin Liew
摘要:We present CompassMax-V3-Thinking, a hundred-billion-scale MoE reasoning model trained with a new RL framework built on one principle: each prompt must matter. Scaling RL to this size exposes critical inefficiencies: zero-variance prompts that waste rollouts, unstable importance sampling over long horizons, advantage inversion from standard reward models, and systemic bottlenecks in rollout processing. To overcome these challenges, we introduce several unified innovations: (1) Multi-Stage Zero-Variance Elimination, which filters out non-informative prompts and stabilizes group-based policy optimization (e.g. GRPO) by removing wasted rollouts; (2) ESPO, an entropy-adaptive optimization method that balances token-level and sequence-level importance sampling to maintain stable learning dynamics; (3) a Router Replay strategy that aligns training-time MoE router decisions with inference-time behavior to mitigate train-infer discrepancies, coupled with a reward model adjustment to prevent advantage inversion; (4) a high-throughput RL system with FP8-precision rollouts, overlapped reward computation, and length-aware scheduling to eliminate performance bottlenecks. Together, these contributions form a cohesive pipeline that makes RL on hundred-billion-scale MoE models stable and efficient. The resulting model delivers strong performance across both internal and public evaluations.
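In GRPO-style training, each rollout's advantage is its reward normalized by the mean and standard deviation of its prompt's group. If every rollout for a prompt gets the same reward, all advantages are zero, and the prompt contributes no gradient signal. The sketch below shows only this basic single-stage filter. The paper's multi-stage elimination procedure is not described in the abstract and is not reproduced here.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-normalized advantages: (r - mean) / (std + eps)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def drop_zero_variance(groups):
    """Keep only prompts whose rollout rewards are not all identical.

    groups: dict mapping prompt id -> list of scalar rewards for its rollouts.
    A zero-variance group yields all-zero advantages, so training on it
    would spend compute without moving the policy.
    """
    return {pid: rs for pid, rs in groups.items() if statistics.pstdev(rs) > 0.0}


batch = {"easy": [1, 1, 1, 1], "impossible": [0, 0, 0, 0], "useful": [1, 0, 1, 0]}
kept = drop_zero_variance(batch)  # only "useful" survives
```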
【2】Model-Based Reinforcement Learning Under Confounding
标题:混杂环境下基于模型的强化学习
链接:https://arxiv.org/abs/2512.07528
作者:Nishanth Venkatesh,Andreas A. Malikopoulos
备注:9 pages, 2 figures - decompressed draft
摘要:We investigate model-based reinforcement learning in contextual Markov decision processes (C-MDPs) in which the context is unobserved and induces confounding in the offline dataset. In such settings, conventional model-learning methods are fundamentally inconsistent, as the transition and reward mechanisms generated under a behavioral policy do not correspond to the interventional quantities required for evaluating a state-based policy. To address this issue, we adapt a proximal off-policy evaluation approach that identifies the confounded reward expectation using only observable state-action-reward trajectories under mild invertibility conditions on proxy variables. When combined with a behavior-averaged transition model, this construction yields a surrogate MDP whose Bellman operator is well defined and consistent for state-based policies, and which integrates seamlessly with the maximum causal entropy (MaxCausalEnt) model-learning framework. The proposed formulation enables principled model learning and planning in confounded environments where contextual information is unobserved, unavailable, or impractical to collect.
【3】PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning
标题:PrivORL:面向离线强化学习的差分隐私合成数据集
链接:https://arxiv.org/abs/2512.07342
作者:Chen Gong,Zheng Liu,Kecen Li,Tianhao Wang
备注:Accepted at NDSS 2026; code available at https://github.com/2019ChenGong/PrivORL
摘要:Recently, offline reinforcement learning (RL) has become a popular RL paradigm. In offline RL, data providers share pre-collected datasets -- either as individual transitions or sequences of transitions forming trajectories -- to enable the training of RL models (also called agents) without direct interaction with the environments. Offline RL saves interactions with environments compared to traditional RL, and has been effective in critical areas, such as navigation tasks. Meanwhile, concerns about privacy leakage from offline RL datasets have emerged. To safeguard private information in offline RL datasets, we propose the first differential privacy (DP) offline dataset synthesis method, PrivORL, which leverages a diffusion model and diffusion transformer to synthesize transitions and trajectories, respectively, under DP. The synthetic dataset can then be securely released for downstream analysis and research. PrivORL adopts the popular approach of pre-training a synthesizer on public datasets, and then fine-tuning on sensitive datasets using DP Stochastic Gradient Descent (DP-SGD). Additionally, PrivORL introduces curiosity-driven pre-training, which uses feedback from the curiosity module to diversify the synthetic dataset and thus can generate diverse synthetic transitions and trajectories that closely resemble the sensitive dataset. Extensive experiments on five sensitive offline RL datasets show that our method achieves better utility and fidelity in both DP transition and trajectory synthesis compared to baselines. The replication package is available at the GitHub repository.
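PrivORL fine-tunes its synthesizer with DP-SGD. A single step of standard DP-SGD clips each example's gradient to a fixed L2 norm, adds Gaussian noise scaled to that norm, and averages over the batch. The minimal sketch below shows such a step on plain parameter lists. It omits privacy accounting and is not PrivORL's implementation.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step on a flat parameter list.

    1. Clip each per-example gradient to L2 norm <= clip_norm.
    2. Sum the clipped gradients and add N(0, (noise_multiplier * clip_norm)^2)
       noise to each coordinate.
    3. Divide by the batch size and take a gradient-descent step.
    """
    dim = len(params)
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Clipping bounds any single example's influence on the update, and the noise scale is tied to that bound. A privacy accountant uses this link to turn the noise multiplier and the number of steps into an (ε, δ) guarantee.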
【4】Know your Trajectory -- Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis
标题:了解您的轨迹--通过基于重要性的轨迹分析进行值得信赖的强化学习部署
链接:https://arxiv.org/abs/2512.06917
作者:Clifford F,Devika Jay,Abhishek Sarkar,Satheesh K Perepu,Santhosh G S,Kaushik Dey,Balaraman Ravindran
备注:Accepted at 4th Deployable AI Workshop at AAAI 2026
摘要:As Reinforcement Learning (RL) agents are increasingly deployed in real-world applications, ensuring their behavior is transparent and trustworthy is paramount. A key component of trust is explainability, yet much of the work in Explainable RL (XRL) focuses on local, single-step decisions. This paper addresses the critical need for explaining an agent's long-term behavior through trajectory-level analysis. We introduce a novel framework that ranks entire trajectories by defining and aggregating a new state-importance metric. This metric combines the classic Q-value difference with a "radical term" that captures the agent's affinity to reach its goal, providing a more nuanced measure of state criticality. We demonstrate that our method successfully identifies optimal trajectories from a heterogeneous collection of agent experiences. Furthermore, by generating counterfactual rollouts from critical states within these trajectories, we show that the agent's chosen path is robustly superior to alternatives, thereby providing a powerful "Why this, and not that?" explanation. Our experiments in standard OpenAI Gym environments validate that our proposed importance metric is more effective at identifying optimal behaviors compared to classic approaches, offering a significant step towards trustworthy autonomous systems.
【5】Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control
标题:为什么目标条件强化学习有效:与对偶控制的关系
链接:https://arxiv.org/abs/2512.06471
作者:Nathan P. Lawrence,Ali Mesbah
备注:IFAC preprint
摘要:Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal-conditioned setting based on optimal control. In particular, we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical "dense" rewards can falter. We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, further making the goal-conditioned reward well suited to dual control problems. The advantages of goal-conditioned policies are validated on nonlinear and uncertain environments using both RL and predictive control techniques.
【6】Networked Restless Multi-Arm Bandits with Reinforcement Learning
标题:具有强化学习的网络不安多臂盗贼
链接:https://arxiv.org/abs/2512.06274
作者:Hanmo Zhang,Zenghui Sun,Kai Wang
摘要:Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in resource allocation and intervention optimization challenges in public health. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals that can be common and significant in a real-world environment. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for networked RMAB and present its computational challenge due to exponentially large action and state spaces. To resolve the computational challenge, we establish the submodularity of the Bellman equation and apply the hill-climbing algorithm to achieve a (1 - 1/e) approximation guarantee in Bellman updates. Lastly, we prove that the approximate Bellman updates are guaranteed to converge by a modified contraction analysis. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting. Experimental results on real-world graph data demonstrate that our Q-learning approach outperforms both k-step look-ahead and network-blind approaches, highlighting the importance of capturing and leveraging network effects where they exist.
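The (1 - 1/e) guarantee is the classic bound for greedy (hill-climbing) maximization of a monotone submodular set function under a cardinality budget k. The generic greedy loop is sketched below. The paper's actual objective, the networked Bellman backup over which arms to act on, is replaced here by a toy coverage function. That substitution is an assumption for illustration only.

```python
def greedy_max(candidates, k, f):
    """Greedy hill-climbing: add up to k elements, each time picking the one
    with the largest marginal gain f(S | {x}) - f(S).

    For monotone submodular f with f(empty set) = 0, the result is within a
    factor (1 - 1/e) of the best size-k set.
    """
    chosen = set()
    for _ in range(k):
        base = f(chosen)
        best, best_gain = None, 0.0
        for x in candidates:
            if x in chosen:
                continue
            gain = f(chosen | {x}) - base
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no remaining element improves f
            break
        chosen.add(best)
    return chosen


def coverage(neighbors):
    """Toy monotone submodular objective: number of nodes reached by the set."""
    def f(s):
        covered = set()
        for v in s:
            covered |= neighbors[v]
        return len(covered)
    return f


nbrs = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}
picked = greedy_max(list(nbrs), 2, coverage(nbrs))  # {"a", "c"}
```

Greedy needs O(k · n) objective evaluations instead of enumerating all C(n, k) subsets. This is why it helps with the exponentially large action space mentioned in the abstract.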
【7】Auto-exploration for online reinforcement learning
标题:在线强化学习的自动探索
链接:https://arxiv.org/abs/2512.06244
作者:Caleb Ju,Guanghui Lan
备注:35 pages (9 appendix), 1 figure. Comments are welcome
摘要:The exploration-exploitation dilemma in reinforcement learning (RL) is a fundamental challenge to efficient RL algorithms. Existing algorithms for finite state and action discounted RL problems address this by assuming sufficient exploration over both state and action spaces. However, this yields non-implementable algorithms and sub-optimal performance. To resolve these limitations, we introduce a new class of methods with auto-exploration, or methods that automatically explore both state and action spaces in a parameter-free way, i.e., without a priori knowledge of problem-dependent parameters. We present two variants: one for the tabular setting and one for linear function approximation. Under algorithm-independent assumptions on the existence of an exploring optimal policy, both methods attain O(ε^-2) sample complexity to solve to ε error. Crucially, these complexities are novel since they are void of algorithm-dependent parameters seen in prior works, which may be arbitrarily large. The methods are also simple to implement because they are parameter-free and do not directly estimate the unknown parameters. These feats are achieved by new algorithmic innovations for RL, including a dynamic mixing time, a discounted state distribution for sampling, a simple robust gradient estimator, and a recent advantage gap function to certify convergence.
【8】Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration
标题:通过相对值迭代实现半马尔科夫决策过程中的平均回报强化学习
链接:https://arxiv.org/abs/2512.06218
作者:Huizhen Yu,Yi Wan,Richard S. Sutton
备注:24 pages. This paper presents the reinforcement-learning material previously contained in version 2 of arXiv:2409.03915, which is now being split into two stand-alone papers. Minor corrections and improvements to the main results have also been made in the course of this reformatting
摘要:This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision processes (SMDPs). We establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. In particular, we show that the algorithm converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation, with convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions. Moreover, to make full use of the SA framework, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework and are addressed through novel arguments in the stability and convergence analysis of RVI Q-learning.
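As we understand it, the tabular RVI Q-learning update for an average-reward SMDP has the following form: Q(s,a) ← Q(s,a) + α [R - f(Q)·τ + max_a' Q(s',a') - Q(s,a)]. Here R is the reward accumulated during the transition, τ is its holding time, and f(Q) estimates the optimal reward rate. The sketch below assumes f(Q) is the value at a fixed reference state-action pair, one common choice. It does not implement the new monotonicity conditions on f that the paper introduces.

```python
def rvi_q_update(q, s, a, reward, holding_time, s_next, alpha, ref=(0, 0)):
    """One tabular RVI Q-learning update for an average-reward SMDP (sketch).

    q:            dict mapping state -> list of action values (mutated in place).
    reward:       reward accumulated over the transition.
    holding_time: duration of the transition (the SMDP sojourn time).
    ref:          assumed reference (state, action) whose value serves as f(Q),
                  the estimate of the optimal reward rate.
    """
    ref_s, ref_a = ref
    rate_estimate = q[ref_s][ref_a]
    target = reward - rate_estimate * holding_time + max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q
```

Multiplying the rate estimate by the holding time is what separates this update from ordinary (unit-time) RVI Q-learning. Longer transitions are charged more of the average reward.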
【9】Quantifying Memory Use in Reinforcement Learning with Temporal Range
标题:用时间范围量化强化学习中的记忆使用
链接:https://arxiv.org/abs/2512.06204
作者:Rodney Lafuente-Mercado,Daniela Rus,T. Konstantin Rusch
摘要:How much does a trained RL policy actually use its past observations? We propose Temporal Range, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted average lag. Temporal Range is computed via reverse-mode automatic differentiation from the Jacobian blocks ∂y_s/∂x_t ∈ ℝ^(c×d) averaged over final timesteps s ∈ {t+1, …, T} and is well-characterized in the linear setting by a small set of natural axioms. Across diagnostic and control tasks (POPGym; flicker/occlusion; Copy-k) and architectures (MLPs, RNNs, SSMs), Temporal Range (i) remains small in fully observed control, (ii) scales with the task's ground-truth lag in Copy-k, and (iii) aligns with the minimum history window required for near-optimal return as confirmed by window ablations. We also report Temporal Range for a compact Long Expressive Memory (LEM) policy trained on the task, using it as a proxy readout of task-level memory. Our axiomatic treatment draws on recent work on range measures, specialized here to temporal lag and extended to vector-valued outputs in the RL setting. Temporal Range thus offers a practical per-sequence readout of memory dependence for comparing agents and environments and for selecting the shortest sufficient context.
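The core of the metric is a magnitude-weighted average lag: each (output time s, input time t) pair contributes its lag s - t, weighted by the size of the sensitivity ∂y_s/∂x_t. The sketch below assumes the sensitivity magnitudes are already computed, for example as Jacobian-block norms, and averages uniformly over all pairs. The paper's exact norm and averaging scheme may differ. The scalar linear recurrence is a hypothetical test case whose Jacobian has a closed form.

```python
def temporal_range(influence):
    """Magnitude-weighted average lag.

    influence: dict mapping (s, t) -> nonnegative sensitivity magnitude,
               e.g. a norm of the Jacobian block dy_s/dx_t, with s > t.
    Returns sum(w * (s - t)) / sum(w), or 0.0 if all weights are zero.
    """
    total = sum(influence.values())
    if total == 0:
        return 0.0
    return sum(w * (s - t) for (s, t), w in influence.items()) / total


def linear_rnn_influence(a, T):
    """|dy_s/dx_t| = |a|**(s - t) for h_t = a * h_{t-1} + x_t, y_t = h_t, over 1 <= t < s <= T."""
    return {(s, t): abs(a) ** (s - t) for s in range(1, T + 1) for t in range(1, s)}


short_memory = temporal_range(linear_rnn_influence(0.1, 10))
long_memory = temporal_range(linear_rnn_influence(0.9, 10))
```

A slowly decaying recurrence (a = 0.9) spreads influence over long lags and gets a larger Temporal Range than a fast-decaying one (a = 0.1). This matches the intuition that it "uses more memory".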
【10】JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning
标题:JaxWildfire:用于强化学习的GPU加速野火模拟器
链接:https://arxiv.org/abs/2512.06102
作者:Ufuk Çakır,Victor-Alexandru Darvariu,Bruno Lacerda,Nick Hawes
备注:To be presented at the NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences (ML4PS)
摘要:Artificial intelligence methods are increasingly being explored for managing wildfires and other natural hazards. In particular, reinforcement learning (RL) is a promising path towards improving outcomes in such uncertain decision-making scenarios and moving beyond reactive strategies. However, training RL agents requires many environment interactions, and the speed of existing wildfire simulators is a severely limiting factor. We introduce JaxWildfire, a simulator underpinned by a principled probabilistic fire spread model based on cellular automata. It is implemented in JAX and enables vectorized simulations using vmap, allowing high throughput of simulations on GPUs. We demonstrate that JaxWildfire achieves 6-35x speedup over existing software and enables gradient-based optimization of simulator parameters. Furthermore, we show that JaxWildfire can be used to train RL agents to learn wildfire suppression policies. Our work is an important step towards enabling the advancement of RL techniques for managing natural hazards.
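The abstract does not give JaxWildfire's spread probabilities. The sketch below uses a generic stochastic cellular automaton to show the kind of update such a simulator applies: each burning cell ignites each unburned 4-neighbour with a fixed probability `p_spread` and then burns out. It is written in plain Python for readability. A JAX version would express the same step as array operations so that `vmap` can batch many environments on a GPU.

```python
import random

UNBURNED, BURNING, BURNT = 0, 1, 2


def fire_step(grid, p_spread, rng):
    """One synchronous step of a simple stochastic fire-spread automaton.

    Reads the current grid and writes a new one, so ignitions in this step
    do not spread further until the next step. The input grid is not mutated.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BURNING:
                continue
            new[r][c] = BURNT  # a burning cell burns out after one step
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNBURNED:
                    if rng.random() < p_spread:
                        new[nr][nc] = BURNING
    return new
```

An RL suppression agent would act between steps, for example by converting UNBURNED cells into firebreaks that can no longer ignite.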
【11】Statistical analysis of Inverse Entropy-regularized Reinforcement Learning
标题:逆熵正则化强化学习的统计分析
链接:https://arxiv.org/abs/2512.06956
作者:Denis Belomestny,Alexey Naumov,Sergey Samsonov
备注:27 pages
摘要:Inverse reinforcement learning aims to infer the reward function that explains expert behavior observed through trajectories of state-action pairs. A long-standing difficulty in classical IRL is the non-uniqueness of the recovered reward: many reward functions can induce the same optimal policy, rendering the inverse problem ill-posed. In this paper, we develop a statistical framework for Inverse Entropy-regularized Reinforcement Learning that resolves this ambiguity by combining entropy regularization with a least-squares reconstruction of the reward from the soft Bellman residual. This combination yields a unique and well-defined so-called least-squares reward consistent with the expert policy. We model the expert demonstrations as a Markov chain with the invariant distribution defined by an unknown expert policy π* and estimate the policy by a penalized maximum-likelihood procedure over a class of conditional distributions on the action space. We establish high-probability bounds for the excess Kullback-Leibler divergence between the estimated policy and the expert policy, accounting for statistical complexity through covering numbers of the policy class. These results lead to non-asymptotic minimax optimal convergence rates for the least-squares reward function, revealing the interplay between smoothing (entropy regularization), model complexity, and sample size. Our analysis bridges the gap between behavior cloning, inverse reinforcement learning, and modern statistical learning theory.
元学习(1篇)
【1】The Meta-Learning Gap: Combining Hydra and Quant for Large-Scale Time Series Classification
标题:元学习差距:将Hydra和Quant结合起来进行大规模时间序列分类
链接:https://arxiv.org/abs/2512.06666
作者:Urav Maniar
备注:Link to the repository: https://github.com/urav06/research
摘要:Time series classification faces a fundamental trade-off between accuracy and computational efficiency. While comprehensive ensembles like HIVE-COTE 2.0 achieve state-of-the-art accuracy, their 340-hour training time on the UCR benchmark renders them impractical for large-scale datasets. We investigate whether targeted combinations of two efficient algorithms from complementary paradigms can capture ensemble benefits while maintaining computational feasibility. Combining Hydra (competing convolutional kernels) and Quant (hierarchical interval quantiles) across six ensemble configurations, we evaluate performance on 10 large-scale MONSTER datasets (7,898 to 1,168,774 training instances). Our strongest configuration improves mean accuracy from 0.829 to 0.836, succeeding on 7 of 10 datasets. However, prediction-combination ensembles capture only 11% of theoretical oracle potential, revealing a substantial meta-learning optimization gap. Feature-concatenation approaches exceeded oracle bounds by learning novel decision boundaries, while prediction-level complementarity shows moderate correlation with ensemble gains. The central finding: the challenge has shifted from ensuring algorithms are different to learning how to combine them effectively. Current meta-learning strategies struggle to exploit the complementarity that oracle analysis confirms exists. Improved combination strategies could potentially double or triple ensemble gains across diverse time series classification applications.
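The "oracle" for a two-model ensemble counts a sample as correct if either model is correct. One natural reading of "captures 11% of theoretical oracle potential" is the share of the headroom between the best single model and the oracle that the ensemble actually realizes. The sketch below computes both quantities under that assumed definition. The paper's exact formula may differ, and the toy predictions are invented.

```python
def oracle_accuracy(y_true, pred_a, pred_b):
    """Fraction of samples on which at least one of the two models is correct."""
    hits = sum(1 for y, a, b in zip(y_true, pred_a, pred_b) if a == y or b == y)
    return hits / len(y_true)


def oracle_capture(best_single, ensemble, oracle):
    """Share of the oracle headroom over the best single model that the ensemble realizes."""
    headroom = oracle - best_single
    return (ensemble - best_single) / headroom if headroom > 0 else 0.0


y = [0, 1, 2, 0]
hydra_like = [0, 1, 0, 0]   # 3/4 correct (toy predictions)
quant_like = [1, 1, 2, 1]   # 2/4 correct, but right where the other is wrong
upper = oracle_accuracy(y, hydra_like, quant_like)  # 1.0
```

A low capture ratio combined with a high oracle bound is the "meta-learning gap" the abstract describes. The models disagree in useful ways, but the combiner fails to exploit the disagreement.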
分层学习(1篇)
【1】Hierarchical geometric deep learning enables scalable analysis of molecular dynamics
标题:分层几何深度学习实现分子动力学的可扩展分析
链接:https://arxiv.org/abs/2512.06520
作者:Zihan Pengmei,Spencer C. Guo,Chatipat Lorpaiboon,Aaron R. Dinner
备注:17 pages, 12 figures
摘要:Molecular dynamics simulations can generate atomically detailed trajectories of complex systems, but analyzing these dynamics can be challenging when systems lack well-established quantitative descriptors (features). Graph neural networks (GNNs) in which messages are passed between nodes that represent atoms that are spatial neighbors promise to obviate manual feature engineering, but the use of GNNs with biomolecular systems of more than a few hundred residues has been limited in the context of analyzing dynamics by both difficulties in capturing the details of long-range interactions with message passing and the memory and runtime requirements associated with large graphs. Here, we show how local information can be aggregated to reduce memory and runtime requirements without sacrificing atomic detail. We demonstrate that this approach opens the door to analyzing simulations of protein-nucleic acid complexes with thousands of residues on single GPUs within minutes. For systems with hundreds of residues, for which there are sufficient data to make quantitative comparisons, we show that the approach improves performance and interpretability.
医学相关(7篇)
【1】A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data
标题:基于语音和言语数据进行症状层面抑郁与焦虑预测的多模态贝叶斯网络
链接:https://arxiv.org/abs/2512.07741
作者:Agnes Norbury,George Fairs,Alexandra L. Georgescu,Matthew M. Nour,Emilia Molimpakis,Stefano Goria
摘要:During psychiatric assessment, clinicians observe not only what patients report, but important nonverbal signs such as tone, speech rate, fluency, responsiveness, and body language. Weighing and integrating these different information sources is a challenging task and a good candidate for support by intelligence-driven tools; however, this is yet to be realized in the clinic. Here, we argue that several important barriers to adoption can be addressed using Bayesian network modelling. To demonstrate this, we evaluate a model for depression and anxiety symptom prediction from voice and speech features in large-scale datasets (30,135 unique speakers). Alongside performance for conditions and symptoms (ROC-AUC = 0.842 for depression and 0.831 for anxiety; ECE = 0.018 and 0.015, respectively; core individual symptom ROC-AUC > 0.74), we assess demographic fairness and investigate integration across and redundancy between different input modality types. Clinical usefulness metrics and acceptability to mental health service users are explored. When provided with sufficiently rich and large-scale multimodal data streams and specified to represent common mental conditions at the symptom rather than disorder level, such models are a principled approach for building robust assessment support tools: providing clinically-relevant outputs in a transparent and explainable format that is directly amenable to expert clinical supervision.
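ECE (expected calibration error), reported above alongside ROC-AUC, measures how far predicted confidences are from observed accuracy. A common estimator groups predictions into equal-width confidence bins and averages |accuracy - mean confidence| per bin, weighted by bin size. The sketch below uses 10 bins. The bin count and binning scheme the authors used are not stated, so those choices are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (|B|/N) * |acc(B) - conf(B)|.

    confidences: predicted probabilities in [0, 1].
    correct:     booleans, whether each prediction turned out right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for p, ok in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(1 for _, ok in b if ok) / len(b)
        conf = sum(p for p, _ in b) / len(b)
        ece += len(b) / n * abs(acc - conf)
    return ece
```

An ECE near 0.02, as reported above, means predicted probabilities can be read roughly at face value. This matters when outputs go to clinicians as risk estimates rather than hard labels.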
【2】DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation
标题:DAUNet:融合可变形卷积与无参数注意力的轻量级UNet变体,用于医学图像分割
链接:https://arxiv.org/abs/2512.07051
作者:Adnan Munir,Shujaat Khan
备注:11 pages, 7 figures
摘要:Medical image segmentation plays a pivotal role in automated diagnostic and treatment planning systems. In this work, we present DAUNet, a novel lightweight UNet variant that integrates Deformable V2 Convolutions and Parameter-Free Attention (SimAM) to improve spatial adaptability and context-aware feature fusion without increasing model complexity. DAUNet's bottleneck employs dynamic deformable kernels to handle geometric variations, while the decoder and skip pathways are enhanced using SimAM attention modules for saliency-aware refinement. Extensive evaluations on two challenging datasets, FH-PS-AoP (fetal head and pubic symphysis ultrasound) and FUMPE (CT-based pulmonary embolism detection), demonstrate that DAUNet outperforms state-of-the-art models in Dice score, HD95, and ASD, while maintaining superior parameter efficiency. Ablation studies highlight the individual contributions of deformable convolutions and SimAM attention. DAUNet's robustness to missing context and low-contrast regions establishes its suitability for deployment in real-time and resource-constrained clinical environments.
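SimAM, the parameter-free attention used in DAUNet, weights each activation by how much it stands out from the rest of its channel. As we understand the original SimAM formulation, each activation gets e_inv = (x - μ)² / (4(σ² + λ)) + 0.5 with σ² = Σ(x - μ)² / (H·W - 1) and λ = 1e-4, and the output is x · sigmoid(e_inv). The pure-Python sketch below applies this to a single H×W map (H·W ≥ 2). It is not DAUNet's code.

```python
import math


def simam(channel, e_lambda=1e-4):
    """SimAM re-weighting of one H x W feature map given as a list of lists.

    Activations far from the channel mean get a larger inverse energy and
    therefore a larger sigmoid weight. No learnable parameters are involved.
    Requires H * W >= 2.
    """
    values = [x for row in channel for x in row]
    n = len(values) - 1
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / n

    def weight(x):
        e_inv = (x - mu) ** 2 / (4 * (var + e_lambda)) + 0.5
        return 1.0 / (1.0 + math.exp(-e_inv))

    return [[x * weight(x) for x in row] for row in channel]


out = simam([[1.0, 1.0], [1.0, 5.0]])  # the outlier 5.0 receives the largest weight
```

The weights come from closed-form channel statistics, so the module adds no parameters. This fits the abstract's claim of improved context-aware fusion without extra model complexity.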
【3】Transferring Clinical Knowledge into ECGs Representation
标题:将临床知识迁移到心电图表示中
链接:https://arxiv.org/abs/2512.07021
作者:Jose Geraldo Fernandes,Luiz Facury de Souza,Pedro Robles Dutenhefner,Gisele L. Pappa,Wagner Meira
摘要:Deep learning models have shown high accuracy in classifying electrocardiograms (ECGs), but their black box nature hinders clinical adoption due to a lack of trust and interpretability. To address this, we propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data (laboratory exams, vitals, biometrics) into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information, while only requiring the ECG signal at inference time. Furthermore, as an indirect way to explain the model's output we train it to also predict associated laboratory abnormalities directly from the ECG embedding. Evaluated on the MIMIC-IV-ECG dataset, our model outperforms a standard signal-only baseline in multi-label diagnosis classification and successfully bridges a substantial portion of the performance gap to a fully multimodal model that requires all data at inference. Our work demonstrates a practical and effective method for creating more accurate and trustworthy ECG classification models. By converting abstract predictions into physiologically grounded explanations, our approach offers a promising path toward the safer integration of AI into clinical workflows.
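The abstract does not state which joint-embedding objective is used. Purely as an illustrative assumption, the sketch below uses a symmetric contrastive (InfoNCE-style) loss, one common way to pull an ECG embedding toward the embedding of the same patient's clinical record and away from other patients'; the temperature value is hypothetical.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_style_loss(ecg_emb, clin_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each matrix belongs to the same patient (positive pair)."""
    a = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    b = clin_emb / np.linalg.norm(clin_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature            # cosine similarities, scaled
    idx = np.arange(len(a))
    loss_ecg_to_clin = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_clin_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_ecg_to_clin + loss_clin_to_ecg) / 2)

e = np.eye(4)
aligned = clip_style_loss(e, e)                  # embeddings of matching patients agree
shuffled = clip_style_loss(e, e[[1, 2, 3, 0]])   # pairs mismatched
```

Only the ECG branch would be needed at inference, which matches the abstract's point that clinical data are used during pre-training alone.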
【4】Novel Deep Learning Architectures for Classification and Segmentation of Brain Tumors from MRI Images
标题:用于从MRI图像中分类和分割脑肿瘤的新型深度学习架构
链接:https://arxiv.org/abs/2512.06531
作者:Sayan Das,Arghadip Biswas
摘要:Brain tumors pose a significant threat to human life, so accurate detection at an early stage is essential for diagnosis and treatment. Radiologists can detect brain tumors manually from patients' MRI scans. However, the incidence of brain tumors among children and adolescents has risen in recent years, producing a substantial volume of data that makes manual detection time-consuming and difficult. Given the growing use of artificial intelligence in medicine, a Computer-Aided Diagnosis (CAD) system can be built to detect brain tumors automatically at an early stage. Existing models for this task do not fully generalize and perform poorly on validation data. We therefore propose two novel deep learning architectures: (a) SAETCN (Self-Attention Enhancement Tumor Classification Network) for classifying different kinds of brain tumors, and (b) SAS-Net (Self-Attentive Segmentation Network) for accurate brain tumor segmentation. SAETCN was trained on a dataset containing images of three tumor types (glioma, meningioma, and pituitary tumors) as well as non-tumor cases, and achieves an accuracy of 99.38% on the validation set, making it one of the few novel deep learning architectures capable of detecting brain tumors accurately. SAS-Net achieves an overall pixel accuracy of 99.23%.
【5】On The Role of K-Space Acquisition in MRI Reconstruction Domain-Generalization
标题:K空间采集在MRI重建域泛化中的作用
链接:https://arxiv.org/abs/2512.06530
作者:Mohammed Wattad,Tamir Shor,Alex Bronstein
摘要:Recent work has established learned k-space acquisition patterns as a promising direction for improving reconstruction quality in accelerated Magnetic Resonance Imaging (MRI). Despite encouraging results, most existing research focuses on acquisition patterns optimized for a single dataset or modality, with limited consideration of their transferability across imaging domains. In this work, we demonstrate that the benefits of learned k-space sampling can extend beyond the training domain, enabling superior reconstruction performance under domain shifts. Our study makes two main contributions. First, through systematic evaluation across datasets and acquisition paradigms, we show that models trained with learned sampling patterns exhibit improved generalization in cross-domain settings. Second, we propose a novel method that enhances domain robustness by introducing acquisition uncertainty during training: stochastically perturbing k-space trajectories to simulate variability across scanners and imaging conditions. Our results highlight the importance of treating k-space trajectory design not merely as an acceleration mechanism, but as an active degree of freedom for improving domain generalization in MRI reconstruction.
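To make perturbing k-space trajectories concrete, the sketch below assumes a simple Cartesian setting: a mask keeps selected phase-encode rows of the centred 2-D Fourier transform, each sampled row is randomly jittered by a few rows, and the image is recovered by zero-filled inverse FFT. The functions `perturb_lines` and `undersample` are hypothetical; the paper's learned (possibly non-Cartesian) trajectories and reconstruction network are not modeled.

```python
import numpy as np

def perturb_lines(lines, n_rows, max_shift, rng):
    """Hypothetical perturbation: jitter each sampled row index by up to +/- max_shift."""
    shifts = rng.integers(-max_shift, max_shift + 1, size=len(lines))
    return np.clip(np.asarray(lines) + shifts, 0, n_rows - 1)

def undersample(image, lines):
    """Keep only the given k-space rows and return the zero-filled magnitude reconstruction."""
    kspace = np.fft.fftshift(np.fft.fft2(image))   # centred k-space
    mask = np.zeros(image.shape[0], dtype=bool)
    mask[lines] = True                              # rows that are acquired
    kspace_us = kspace * mask[:, None]              # zero out unacquired rows
    recon = np.fft.ifft2(np.fft.ifftshift(kspace_us))
    return np.abs(recon), mask

rng = np.random.default_rng(0)
img = rng.random((64, 64))
base_lines = np.arange(0, 64, 4)                    # 4x-accelerated base pattern
lines = perturb_lines(base_lines, 64, max_shift=1, rng=rng)  # fresh jitter per training step
recon, mask = undersample(img, lines)
full_recon, _ = undersample(img, np.arange(64))     # fully sampled sanity check
```

Drawing a new jitter at every training step exposes the reconstruction model to a family of nearby sampling patterns rather than a single fixed one.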
【6】The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated Learning
标题:MICCAI联邦肿瘤分割(FeTS)2024挑战赛:面向联邦学习的高效稳健聚合方法
链接:https://arxiv.org/abs/2512.06206
作者:Akis Linardos,Sarthak Pati,Ujjwal Baid,Brandon Edwards,Patrick Foley,Kevin Ta,Verena Chung,Micah Sheller,Muhammad Irfan Khan,Mojtaba Jafaritadi,Elina Kontio,Suleiman Khan,Leon Mächler,Ivan Ezhov,Suprosanna Shit,Johannes C. Paetzold,Gustav Grimberg,Manuel A. Nickel,David Naccache,Vasilis Siomos,Jonathan Passerat-Palmbach,Giacomo Tarroni,Daewoon Kim,Leonard L. Klausmann,Prashant Shah,Bjoern Menze,Dimitrios Makris,Spyridon Bakas
备注:Published at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2025:033
摘要:We present the design and results of the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024, which focuses on federated learning (FL) for glioma sub-region segmentation in multi-parametric MRI and evaluates new weight aggregation methods aimed at improving robustness and efficiency. Six participating teams were evaluated using a standardized FL setup and a multi-institutional dataset derived from the BraTS glioma benchmark, consisting of 1,251 training cases, 219 validation cases, and 570 hidden test cases with segmentations for enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Teams were ranked using a cumulative scoring system that considered both segmentation performance, measured by Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95), and communication efficiency assessed through the convergence score. A PID-controller-based method achieved the top overall ranking, obtaining mean DSC values of 0.733, 0.761, and 0.751 for ET, TC, and WT, respectively, with corresponding HD95 values of 33.922 mm, 33.623 mm, and 32.309 mm, while also demonstrating the highest communication efficiency with a convergence score of 0.764. These findings advance the state of federated learning for medical imaging, surpassing top-performing methods from previous challenge iterations and highlighting PID controllers as effective mechanisms for stabilizing and optimizing weight aggregation in FL. The challenge code is available at https://github.com/FeTS-AI/Challenge.
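The challenge compares alternative rules for aggregating client weights. For reference, the standard FedAvg baseline, which averages each parameter tensor weighted by client sample count, is sketched below; the winning PID-controller method's update rule is not described in the abstract and is not reproduced here.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter lists (FedAvg baseline).

    client_weights: list over clients, each a list of numpy parameter tensors
    client_sizes:   number of training cases held by each client
    """
    total = float(sum(client_sizes))
    n_tensors = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_tensors)
    ]

# Two hypothetical institutions holding 1 and 3 cases
agg = fedavg([[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]], [1, 3])
```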
【7】Physics-Informed Neural Koopman Machine for Interpretable Longitudinal Personalized Alzheimer's Disease Forecasting
标题:用于可解释纵向个性化阿尔茨海默病预测的物理信息神经库普曼机
链接:https://arxiv.org/abs/2512.06134
作者:Georgi Hrusanov,Duy-Thanh Vu,Duy-Cat Can,Sophie Tascedda,Margaret Ryan,Julien Bodelet,Katarzyna Koscielska,Carsten Magnus,Oliver Y. Chén
摘要:Early forecasting of individual cognitive decline in Alzheimer's disease (AD) is central to disease evaluation and management. Despite advances, existing methodological frameworks still struggle to integrate multimodal data for longitudinal, personalized forecasting while maintaining interpretability. To address this gap, we present the Neural Koopman Machine (NKM), a new machine learning architecture inspired by dynamical systems and attention mechanisms, designed to forecast multiple cognitive scores simultaneously using multimodal genetic, neuroimaging, proteomic, and demographic data. NKM integrates analytical ($α$) and biological ($β$) knowledge to guide feature grouping and control the hierarchical attention mechanisms to extract relevant patterns. By implementing Fusion Group-Aware Hierarchical Attention within the Koopman operator framework, NKM transforms complex nonlinear trajectories into interpretable linear representations. To demonstrate NKM's efficacy, we applied it to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our results suggest that NKM consistently outperforms both traditional machine learning methods and deep learning models in forecasting trajectories of cognitive decline. Specifically, NKM (1) forecasts changes of multiple cognitive scores simultaneously, (2) quantifies differential biomarker contributions to predicting distinctive cognitive scores, and (3) identifies brain regions most predictive of cognitive deterioration. Together, NKM advances personalized, interpretable forecasting of future cognitive decline in AD using past multimodal data through an explainable, explicit system and reveals potential multimodal biological underpinnings of AD progression.
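The Koopman idea is that a nonlinear system can evolve linearly in a space of observables. The sketch below shows generic extended dynamic mode decomposition (EDMD) on a toy map whose lift [x1, x2, x1^2] is exactly linear, fitting the operator K by least squares. It is not NKM, whose learned observables and hierarchical attention are not specified here; the map, its coefficients, and the dictionary are hypothetical.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary of observables: [x1, x2, x1^2]."""
    return np.array([x[0], x[1], x[0] ** 2])

def step(x, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear map whose Koopman lift is exactly linear."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def fit_koopman(X, Y):
    """Least-squares K minimising ||Y - K X||_F (columns are lifted snapshot pairs)."""
    return Y @ np.linalg.pinv(X)

rng = np.random.default_rng(0)
states = rng.normal(size=(50, 2))
X = np.stack([lift(s) for s in states], axis=1)
Y = np.stack([lift(step(s)) for s in states], axis=1)
K = fit_koopman(X, Y)
eigenvalues = np.linalg.eigvals(K)
```

In the lifted coordinates the dynamics are x1' = 0.9 x1, x2' = 0.5 x2 + 0.3 x1^2 and (x1^2)' = 0.81 x1^2, so K has eigenvalues 0.9, 0.5 and 0.81: the nonlinear trajectory is summarized by three interpretable linear decay rates.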
蒸馏|知识提取(1篇)
【1】State Diversity Matters in Offline Behavior Distillation
标题:状态多样性在离线行为蒸馏中至关重要
链接:https://arxiv.org/abs/2512.06692
作者:Shiye Lei,Zhihao Cheng,Dacheng Tao
备注:12 pages, 5 figures, 5 tables
摘要:Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperform those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss, which contributes to the misalignment. By associating state quality and state diversity with reducing pivotal error and surrounding error, respectively, our theoretical analysis establishes that surrounding error plays a more crucial role in policy performance when pivotal error is large, thereby highlighting the importance of state diversity in the OBD setting. Furthermore, we propose a novel yet simple algorithm, state density weighted (SDW) OBD, which emphasizes state diversity by weighting the distillation objective using the reciprocal of state density, thereby distilling more diverse state information into synthetic data. Extensive experiments across multiple D4RL datasets confirm that SDW significantly enhances OBD performance when the original dataset exhibits limited state diversity.
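The core SDW idea, weighting each state's contribution by the reciprocal of its density, can be sketched as follows. The Gaussian kernel density estimate, the bandwidth, and the squared-error behavior-cloning loss are assumptions standing in for the paper's density model and distillation objective.

```python
import numpy as np

def gaussian_kde_density(states, bandwidth):
    """Unnormalised Gaussian kernel density of each state w.r.t. the whole set (assumed estimator)."""
    diff = states[:, None, :] - states[None, :, :]
    sq = (diff ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1)

def sdw_weights(states, bandwidth=0.5):
    """Weights proportional to 1/density, normalised to mean 1."""
    w = 1.0 / gaussian_kde_density(states, bandwidth)
    return w / w.mean()

def weighted_bc_loss(pred_actions, target_actions, weights):
    """Density-weighted mean squared behaviour-cloning error (hypothetical objective)."""
    per_sample = ((pred_actions - target_actions) ** 2).sum(axis=-1)
    return float((weights * per_sample).mean())

# Nine states crowded at 0 and one rare state at 5
states = np.vstack([np.zeros((9, 1)), [[5.0]]])
w = sdw_weights(states)
```

The rare state receives roughly nine times the weight of each crowded state, so sparsely covered regions of the state space contribute more to the objective.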
推荐(1篇)
【1】Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation
标题:通过预测合并探索大规模推荐中的测试时缩放
链接:https://arxiv.org/abs/2512.07650
作者:Fuyuan Lyu,Zhentai Chen,Jingyan Jiang,Lingjie Li,Xing Tang,Xiuqiang He,Xue Liu
摘要:Inspired by the success of language models (LMs), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. Previous methods all scale up model parameters at training time. However, how to efficiently utilize and scale up computational resources at test time remains underexplored, although test-time scaling has proven to be a scaling-efficient approach that brings orthogonal improvements in the LM domain. The key to applying test-time scaling to DLRS lies in effectively generating diverse yet meaningful outputs for the same instance. We propose two ways to do so: one exploits the heterogeneity of different model architectures; the other exploits the randomness of model initialization under a homogeneous architecture. We evaluate eight models, including both classic and SOTA models, on three benchmarks, and the results demonstrate the effectiveness of both solutions. We further show that, under the same inference budget, test-time scaling can outperform parameter scaling. When deployed online, our test-time scaling can also be seamlessly accelerated by adding parallel servers, without affecting inference time on the user side. Code is available.
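Prediction merging combines several models' outputs for the same instance, whether the models differ in architecture or only in random initialization. A minimal sketch, assuming simple (optionally weighted) averaging of predicted click probabilities; the abstract does not specify the merge rule, and the probabilities below are made up.

```python
import numpy as np

def merge_predictions(prob_list, weights=None):
    """Average per-instance probabilities from several models (assumed merge rule)."""
    probs = np.stack(prob_list, axis=0)              # shape (n_models, n_instances)
    if weights is None:
        weights = np.full(probs.shape[0], 1.0 / probs.shape[0])
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * probs).sum(axis=0) / weights.sum()

# Two models (e.g. same architecture, different seeds) scoring the same two instances
merged = merge_predictions([np.array([0.2, 0.8]), np.array([0.4, 0.6])])
weighted = merge_predictions([np.array([0.2, 0.8]), np.array([0.4, 0.6])], weights=[3, 1])
```

Since each model scores instances independently, the members can run on separate servers in parallel, which is consistent with the abstract's claim that user-side latency need not grow.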
聚类(2篇)
【1】A Broader View on Clustering under Cluster-Aware Norm Objectives
标题:簇感知范数目标下聚类问题的更广视角
链接:https://arxiv.org/abs/2512.06211
作者:Martin G. Herold,Evangelos Kipouridis,Joachim Spoerhase
摘要:We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector of distances within the cluster, and aims at minimizing the norm $g$ applied to the vector of cluster costs. Previously, we focused on certain special cases for which we designed constant-factor approximation algorithms. Our bounds for more general settings, however, left large gaps to the known bounds for the basic problems they capture. In this work, we provide a clearer picture of the approximability of these more general settings. First, we design an $O(\log^2 n)$-approximation algorithm for $(f, L_{1})$-clustering for any $f$. This improves upon our previous $\widetilde{O}(\sqrt{n})$-approximation. Second, we provide an $O(k)$-approximation for the general $(f,g)$-clustering problem, which improves upon our previous $\widetilde{O}(\sqrt{kn})$-approximation algorithm and matches the best-known upper bound for Min-Load $k$-Clustering. We then design an approximation algorithm for $(f,g)$-clustering that interpolates, up to polylog factors, between the best known bounds for $k$-Center, $k$-Median, Min-Sum of Radii, Min-Load $k$-Clustering, (Top, $L_{1}$)-clustering, and $(L_{\infty},g)$-clustering based on a newly defined parameter of $f$ and $g$.
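The $(f,g)$ objective composes two norms: $f$ turns each cluster's distance vector into a cost, and $g$ turns the vector of cluster costs into a single number. The sketch below only evaluates the objective for a fixed assignment (it is not an approximation algorithm), restricting $f$ and $g$ to $L_1$ and $L_\infty$ to recover four of the problems named above: k-Center = (L∞, L∞), k-Median = (L1, L1), Min-Sum of Radii = (L∞, L1), Min-Load k-Clustering = (L1, L∞). The point data are made up.

```python
import numpy as np

# Only the two norms needed here; distances are non-negative, so L1 is a plain sum.
NORMS = {
    "L1": lambda v: float(np.sum(v)),
    "Linf": lambda v: float(np.max(v)) if len(v) else 0.0,
}

# (f, g) choices that recover classic objectives
PROBLEMS = {
    "k-Center": ("Linf", "Linf"),
    "k-Median": ("L1", "L1"),
    "Min-Sum of Radii": ("Linf", "L1"),
    "Min-Load k-Clustering": ("L1", "Linf"),
}

def fg_cost(points, centers, assignment, f, g):
    """Evaluate g(f(d_1), ..., f(d_k)) for a fixed assignment of points to centers."""
    cluster_costs = []
    for j, c in enumerate(centers):
        members = points[assignment == j]
        dists = np.linalg.norm(members - c, axis=1)  # distance vector of cluster j
        cluster_costs.append(NORMS[f](dists))
    return NORMS[g](np.array(cluster_costs))

points = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [14.0]])
centers = np.array([[0.0], [10.0]])
assignment = np.array([0, 0, 0, 0, 1, 1])
costs = {name: fg_cost(points, centers, assignment, f, g)
         for name, (f, g) in PROBLEMS.items()}
```

Here the cluster distance vectors are (0, 1, 2, 3) and (0, 4), so the four objectives evaluate to 4, 10, 7 and 6 respectively for the same clustering.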
【2】Canonical Tail Dependence for Soft Extremal Clustering of Multichannel Brain Signals
标题:用于多通道脑信号软极值聚类的典型尾部依赖
链接:https://arxiv.org/abs/2512.06435
作者:Mara Sherlin Talento,Jordan Richards,Raphael Huser,Hernando Ombao
摘要:We develop a novel characterization of extremal dependence between two cortical regions of the brain when their signals display extremely large amplitudes. We show that connectivity in the tails of the distribution reveals unique features of extreme events (e.g., seizures) that can help to identify their occurrence. Numerous studies have established that connectivity-based features are effective for discriminating brain states. Here, we demonstrate the advantage of the proposed approach: tail connectivity provides additional discriminatory power, enabling more accurate identification of extreme-related events and improved seizure risk management. Common approaches in tail dependence modeling use pairwise summary measures or parametric models. However, these approaches do not identify the channels that drive the maximal tail dependence between two groups of signals, information that is useful when analyzing electroencephalography (EEG) of epileptic patients, where specific channels are responsible for seizure occurrences. A familiar approach in traditional signal processing is canonical correlation, which we extend to the tails to develop a visualization of extremal channel contributions. Using the tail pairwise dependence matrix (TPDM), we develop a computationally efficient estimator for our canonical tail dependence measure. We then apply our method to accurate frequency-based soft clustering of neonates, distinguishing those with seizures from those without.
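A TPDM estimator can be sketched as follows, following the usual construction of Cooley and Thibaud (2019) as we understand it: margins are rank-transformed to Fréchet with tail index 2, only observations with the largest L2 radius are kept, and the outer products of their angular components are averaged and scaled by the dimension d, so that the diagonal sums to d. The 0.95 quantile threshold is an assumption, and the paper's canonical (CCA-style) extension built on top of the TPDM is not shown.

```python
import numpy as np

def to_frechet2(data):
    """Rank-transform each column to Frechet margins with tail index alpha = 2."""
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1.0)                              # empirical CDF values in (0, 1)
    return (-np.log(u)) ** -0.5                        # inverse of F(x) = exp(-x^-2)

def tpdm(data, q=0.95):
    """Estimate the tail pairwise dependence matrix from the largest radial exceedances."""
    x = to_frechet2(data)
    r = np.linalg.norm(x, axis=1)
    idx = r > np.quantile(r, q)                         # assumed threshold: top 5% of radii
    w = x[idx] / r[idx][:, None]                        # angular components on the unit sphere
    d = data.shape[1]
    return d * (w.T @ w) / idx.sum()

rng = np.random.default_rng(0)
z = rng.standard_normal((5000, 1))
T_comonotone = tpdm(np.hstack([z, 2 * z]))              # perfectly dependent channels
T_independent = tpdm(rng.standard_normal((5000, 3)))    # independent channels
```

For two perfectly dependent channels every entry equals 1, while independent channels give small off-diagonal entries; it is this channel-by-channel tail structure that a canonical analysis can then summarize.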
超分辨率|去噪|去模糊|去雾(4篇)
【1】Evaluating and Preserving High-level Fidelity in Super-Resolution
标题:评估和保留超分辨率的高级别保真度
链接:https://arxiv.org/abs/2512.07037
作者:Josep M. Rocafort,Shaolin Su,Javier Vazquez-Corral,Alexandra Gomez-Villa
摘要:Recent image Super-Resolution (SR) models achieve impressive results in reconstructing details and delivering visually pleasing outputs. However, their strong generative ability can sometimes hallucinate and thus change the image content despite high visual quality. This type of high-level change is easily identified by humans but has received little attention in existing low-level image quality metrics. In this paper, we establish the importance of measuring high-level fidelity for SR models as a complementary criterion to reveal the reliability of generative SR models. We construct the first annotated dataset with fidelity scores from different SR models, and evaluate how state-of-the-art (SOTA) SR models actually perform in preserving high-level fidelity. Based on the dataset, we then analyze how existing image quality metrics correlate with fidelity measurement, and further show that this high-level task can be better addressed by foundation models. Finally, by fine-tuning SR models based on our fidelity feedback, we show that both semantic fidelity and perceptual quality can be improved, demonstrating the potential value of our proposed criteria, both in model evaluation and optimization. We will release the dataset, code, and models upon acceptance.
【2】Mitigating Barren plateaus in quantum denoising diffusion probabilistic models
标题:缓解量子去噪扩散概率模型中的贫瘠高原
链接:https://arxiv.org/abs/2512.06695
作者:Haipeng Cao,Kaining Zhang,Dacheng Tao,Zhaofeng Su
备注:22 pages, 9 figures
摘要:Quantum generative models leverage quantum superposition and entanglement to enhance learning efficiency for both classical and quantum data. The quantum denoising diffusion probabilistic model (QuDDPM), inspired by its classical counterpart, has been proposed as a promising framework for quantum generative learning. QuDDPM is capable of efficiently learning and generating quantum data, and it demonstrates excellent performance in learning correlated quantum noise models, quantum many-body phases, and the topological structure of quantum data. However, we show that barren plateaus emerge in QuDDPMs due to the use of 2-design states as the input for the denoising process, which severely undermines the performance of QuDDPM. Through theoretical analysis and experimental validation, we confirm the presence of barren plateaus in the original QuDDPM. To address this issue, we introduce an improved QuDDPM that utilizes a distribution maintaining a certain distance from the Haar distribution, ensuring better trainability. Experimental results demonstrate that our approach effectively mitigates the barren plateau problem and generates samples with higher quality, paving the way for scalable and efficient quantum generative learning.
【3】Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution
标题:基于强引力透镜图像的掩码自编码器预训练,用于暗物质模型分类与超分辨率联合任务
链接:https://arxiv.org/abs/2512.06642
作者:Achmad Ardani Prasha,Clavino Ourizqi Rachmadi,Muhamad Fauzan Ibnu Syahlan,Naufal Rahfi Anugerah,Nanda Garin Raditya,Putri Amelia,Sabrina Laila Mutiara,Hilman Syachr Ramadhan
备注:21 pages, 7 figures, 3 tables
摘要:Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In this work, we propose a masked autoencoder (MAE) pretraining strategy on simulated strong-lensing images from the DeepLense ML4SCI benchmark to learn generalizable representations for two downstream tasks: (i) classifying the underlying dark matter model (cold dark matter, axion-like, or no substructure) and (ii) enhancing low-resolution lensed images via super-resolution. We pretrain a Vision Transformer encoder using a masked image modeling objective, then fine-tune the encoder separately for each task. Our results show that MAE pretraining, when combined with appropriate mask ratio tuning, yields a shared encoder that matches or exceeds a ViT trained from scratch. Specifically, at a 90% mask ratio, the fine-tuned classifier achieves macro AUC of 0.968 and accuracy of 88.65%, compared to the scratch baseline (AUC 0.957, accuracy 82.46%). For super-resolution (16x16 to 64x64), the MAE-pretrained model reconstructs images with PSNR ~33 dB and SSIM 0.961, modestly improving over scratch training. We ablate the MAE mask ratio, revealing a consistent trade-off: higher mask ratios improve classification but slightly degrade reconstruction fidelity. Our findings demonstrate that MAE pretraining on physics-rich simulations provides a flexible, reusable encoder for multiple strong-lensing analysis tasks.
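MAE pretraining hides a large random fraction of image patches and feeds only the visible ones to the encoder. A minimal NumPy sketch of the patchify-and-mask step at the 90% ratio discussed above follows; the 64×64 image size and 8×8 patch size are assumptions, not the paper's configuration.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches -> (num_patches, p*p)."""
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))

def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches; return visible patches, kept indices, and mask."""
    n = patches.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    keep = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)   # True = masked, i.e. to be reconstructed
    mask[keep] = False
    return patches[keep], keep, mask

rng = np.random.default_rng(0)
img = rng.random((64, 64))            # stand-in for a simulated lensing image
patches = patchify(img, 8)            # 64 patches of 64 pixels each
visible, keep, mask = random_masking(patches, mask_ratio=0.9, rng=rng)
```

With 64 patches and a 0.9 ratio only 6 patches remain visible; the reconstruction loss is then computed on the 58 masked patches, which is what makes high mask ratios a hard pretext task.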
【4】Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions
标题:用于增强结构化分布学习的潜在非线性去噪分数匹配
链接:https://arxiv.org/abs/2512.06615
作者:Kaichen Shen,Wei Zhu
摘要:We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based latent SGM framework. This combination is achieved by reformulating the cross-entropy term using the approximate Gaussian transition induced by the Euler-Maruyama scheme. To ensure numerical stability, we identify and remove two zero-mean but variance-exploding terms arising from small time steps. Experiments on variants of the MNIST dataset demonstrate that the proposed method achieves faster synthesis and enhanced learning of inherently structured distributions. Compared to benchmark structure-agnostic latent SGMs, LNDSM consistently attains superior sample quality and variability.
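For a forward SDE dx = f(x) dt + g dW, one Euler-Maruyama step induces the Gaussian transition N(x + f(x) dt, g^2 dt I). The sketch below shows that transition and its score, the regression target in one-step denoising score matching, with a hypothetical double-well drift f(x) = x - x^3; the VAE latent space and the paper's removal of variance-exploding terms are not modeled.

```python
import numpy as np

def double_well(x):
    """Hypothetical nonlinear drift f(x) = x - x^3."""
    return x - x ** 3

def em_transition(x, drift, g, dt):
    """Mean and variance of the Gaussian one-step Euler-Maruyama transition."""
    mean = x + drift(x) * dt
    var = g ** 2 * dt
    return mean, var

def em_step(x, drift, g, dt, rng):
    """Sample x_{t+dt} from the Euler-Maruyama transition."""
    mean, var = em_transition(x, drift, g, dt)
    return mean + np.sqrt(var) * rng.standard_normal(x.shape)

def transition_score(x_next, mean, var):
    """Gradient of log N(x_next; mean, var) w.r.t. x_next."""
    return -(x_next - mean) / var

rng = np.random.default_rng(0)
x0 = np.array([2.0])
mean, var = em_transition(x0, double_well, g=1.0, dt=0.1)
x1 = em_step(x0, double_well, g=1.0, dt=0.1, rng=rng)
target = transition_score(x1, mean, var)
```

Because x_next - mean has standard deviation g·sqrt(dt), this target scales like 1/sqrt(dt) and grows as the time step shrinks.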
自动驾驶|车辆|车道检测等(3篇)
【1】Estimating Black Carbon Concentration from Urban Traffic Using Vision-Based Machine Learning
标题:使用基于视觉的机器学习估计城市交通中的黑碳浓度
链接:https://arxiv.org/abs/2512.06649
作者:Camellia Zakaria,Aryan Sadeghi,Weaam Jaafar,Junshi Xu,Alex Mariakakis,Marianne Hatzopoulou
备注:12 pages, 16 figures, 4 tables, 4 pages Appendix, in submission and under review for ACM MobiSys 2026 as of December 6th, 2025
摘要:Black carbon (BC) emissions in urban areas are primarily driven by traffic, with hotspots near major roads disproportionately affecting marginalized communities. Because BC monitoring typically requires costly, specialized instruments, there is little to no available data on BC from local traffic sources that could inform policy interventions targeting local factors. By contrast, traffic monitoring systems are widely deployed in cities around the world, highlighting the imbalance between what we know about traffic conditions and what we do not know about their environmental consequences. To bridge this gap, we propose a machine learning-driven system that extracts visual information from traffic video to capture vehicle behaviors and conditions. Combining these features with weather data, our model estimates BC at street level, achieving an R-squared value of 0.72 and an RMSE of 129.42 ng/m³ (nanograms per cubic meter). From a sustainability perspective, this work leverages resources already supported by urban infrastructure and established modeling techniques to generate information relevant to traffic emissions. Obtaining BC concentration data provides actionable insights to support pollution reduction, urban planning, public health, and environmental justice at the local municipal level.
【2】A Prescriptive Framework for Determining Optimal Days for Short-Term Traffic Counts
标题:确定短期交通统计最佳日期的规定性框架
链接:https://arxiv.org/abs/2512.06111
作者:Arthur Mukwaya,Nancy Kasamala,Nana Kankam Gyimah,Judith Mwakalonge,Gurcan Comert,Saidi Siuhi,Denis Ruganuza,Mark Ngotonie
摘要:The Federal Highway Administration (FHWA) mandates that state Departments of Transportation (DOTs) collect reliable Annual Average Daily Traffic (AADT) data. However, many U.S. DOTs struggle to obtain accurate AADT, especially for unmonitored roads. While continuous count (CC) stations offer accurate traffic volume data, they are expensive and difficult to deploy widely, compelling agencies to rely on short-duration traffic counts. This study proposes a machine learning framework, the first to our knowledge, to identify optimal representative days for conducting short count (SC) data collection to improve AADT prediction accuracy. Using 2022 and 2023 traffic volume data from the state of Texas, we compare two scenarios: an 'optimal day' approach that iteratively selects the most informative days for AADT estimation and a 'no optimal day' baseline reflecting current practice by most DOTs. To align with Texas DOT's traffic monitoring program, continuous count data were used to simulate 24-hour short counts. The actual field short counts were used to enhance feature engineering via a leave-one-out (LOO) technique that generates unbiased representative daily traffic features across similar road segments. Our proposed methodology outperforms the baseline across the top five days, with the best day (Day 186) achieving lower errors (RMSE: 7,871.15, MAE: 3,645.09, MAPE: 11.95%) and higher R^2 (0.9756) than the baseline (RMSE: 11,185.00, MAE: 5,118.57, MAPE: 14.42%, R^2: 0.9499). This research offers DOTs an alternative to conventional short-duration count practices, improving AADT estimation, supporting Highway Performance Monitoring System compliance, and reducing the operational costs of statewide traffic data collection.
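The error metrics quoted above have standard definitions, sketched below in NumPy; MAPE assumes no zero ground-truth volumes, and the toy numbers are made up rather than taken from the Texas data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%) and R^2 for predicted vs. observed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)  # assumes no zero targets
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Hypothetical AADT values for three segments and their predictions
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE penalizes large misses more than MAE does, which is why both shrinking together, as reported for the optimal-day approach, indicates fewer large errors as well as a lower typical error.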
【3】A self-driving lab for solution-processed electrochromic thin films
标题:面向溶液法电致变色薄膜的自主实验室
链接:https://arxiv.org/abs/2512.05989
作者:Selma Dahms,Luca Torresi,Shahbaz Tareq Bandesha,Jan Hansmann,Holger Röhm,Alexander Colsmann,Marco Schott,Pascal Friederich
摘要:Solution-processed electrochromic materials offer high potential for energy-efficient smart windows and displays. Their performance varies with material choice and processing conditions. Electrochromic thin film electrodes require a smooth, defect-free coating for optimal contrast between bleached and colored states. The complexity of optimizing the spin-coated electrochromic thin layer poses challenges for rapid development. This study demonstrates the use of self-driving laboratories to accelerate the development of electrochromic coatings by coupling automation with machine learning. Our system combines automated data acquisition, image processing, spectral analysis, and Bayesian optimization to explore processing parameters efficiently. This approach not only increases throughput but also enables a targeted search for optimal processing parameters. The approach can be applied to various solution-processed materials, highlighting the potential of self-driving labs in enhancing materials discovery and process optimization.
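The abstract does not detail the optimizer's surrogate or acquisition function. As an illustrative assumption, the sketch below runs Bayesian optimization over one normalized processing parameter with a Gaussian-process surrogate (RBF kernel) and expected improvement; `film_quality` is a hypothetical stand-in for the image- and spectrum-based score an automated lab would measure.

```python
import math
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_tr, y_tr, x_q, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_tr, x_q)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

_erf = np.vectorize(math.erf)

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed value (maximisation)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def film_quality(x):
    """Hypothetical measured score, peaking at x = 0.3."""
    return -(x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 201)     # normalised parameter, e.g. spin speed (assumed)
x_obs = np.array([0.0, 0.5, 1.0])     # initial experiments
y_obs = film_quality(x_obs)
for _ in range(10):                   # each iteration = one automated experiment
    mu, sd = gp_posterior(x_obs, y_obs, grid)
    ei = expected_improvement(mu, sd, y_obs.max())
    x_next = grid[np.argmax(ei)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, film_quality(x_next))
best_x = x_obs[np.argmax(y_obs)]
```

Expected improvement trades off sampling where the surrogate predicts a good film against sampling where it is still uncertain, which is what lets a self-driving lab search purposefully instead of sweeping a grid.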
推理|分析|理解|解释(13篇)
【1】ReLaX: Reasoning with Latent Exploration for Large Reasoning Models
标题:ReLaX:大型推理模型的潜在探索推理
链接:https://arxiv.org/abs/2512.07558
作者:Shimin Zhang,Xianwei Chen,Yufan Shen,Ziyuan Ye,Jibin Wu
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated remarkable potential in enhancing the reasoning capability of Large Reasoning Models (LRMs). However, RLVR often leads to entropy collapse, resulting in premature policy convergence and performance saturation. While manipulating token-level entropy has proven effective for promoting policy exploration, we argue that the latent dynamics underlying token generation encode a far richer computational structure for steering policy optimization toward a more effective exploration-exploitation tradeoff. To enable tractable analysis and intervention of the latent dynamics of LRMs, we leverage Koopman operator theory to obtain a linearized representation of their hidden-state dynamics. This enables us to introduce Dynamic Spectral Dispersion (DSD), a new metric to quantify the heterogeneity of the model's latent dynamics, serving as a direct indicator of policy exploration. Building upon these foundations, we propose Reasoning with Latent eXploration (ReLaX), a paradigm that explicitly incorporates latent dynamics to regulate exploration and exploitation during policy optimization. Comprehensive experiments across a wide range of multimodal and text-only reasoning benchmarks show that ReLaX significantly mitigates premature convergence and consistently achieves state-of-the-art performance.
【2】MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis
标题:MIDG:融合知识注入的不变专家混合模型,用于多模态情感分析中的领域泛化
链接:https://arxiv.org/abs/2512.07430
作者:Yangle Li,Danli Luo,Haifeng Hu
摘要:Existing domain generalization methods for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant feature extraction, which prevents them from accurately capturing the rich semantic information within multimodal data. Additionally, while knowledge injection techniques have been explored in MSA, they often suffer from fragmented cross-modal knowledge, overlooking representations that exist beyond the confines of a single modality. To address these limitations, we propose a novel MSA framework designed for domain generalization. First, the framework incorporates a Mixture of Invariant Experts model to extract domain-invariant features, thereby enhancing the model's capacity to learn synergistic relationships between modalities. Second, we design a Cross-Modal Adapter to augment the semantic richness of multimodal representations through cross-modal knowledge injection. Extensive cross-domain experiments conducted on three datasets demonstrate that the proposed MIDG achieves superior performance.
【3】Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse
标题:基于神经坍缩的经验回放中浅层与深层遗忘的渐近分析
链接:https://arxiv.org/abs/2512.07400
作者:Giulia Lanzillotta,Damiano Meier,Thomas Hofmann
摘要:A persistent paradox in continual learning (CL) is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize this distinction as the gap between deep feature-space and shallow classifier-level forgetting. We reveal a critical asymmetry in Experience Replay: while minimal buffers successfully anchor feature geometry and prevent deep forgetting, mitigating shallow forgetting typically requires substantially larger buffer capacities. To explain this, we extend the Neural Collapse framework to the sequential setting. We characterize deep forgetting as a geometric drift toward out-of-distribution subspaces and prove that any non-zero replay fraction asymptotically guarantees the retention of linear separability. Conversely, we identify that the "strong collapse" induced by small buffers leads to rank-deficient covariances and inflated class means, effectively blinding the classifier to true population boundaries. By unifying CL with out-of-distribution detection, our work challenges the prevailing reliance on large buffers, suggesting that explicitly correcting these statistical artifacts could unlock robust performance with minimal replay.
【4】Understanding Diffusion Models via Code Execution
标题:通过代码执行理解扩散模型
链接:https://arxiv.org/abs/2512.07201
作者:Cheng Yu
摘要:Diffusion models have achieved remarkable performance in generative modeling, yet their theoretical foundations are often intricate, and the gap between mathematical formulations in papers and practical open-source implementations can be difficult to bridge. Existing tutorials primarily focus on deriving equations, offering limited guidance on how diffusion models actually operate in code. To address this, we present a concise implementation of approximately 300 lines that explains diffusion models from a code-execution perspective. Our minimal example preserves the essential components -- including forward diffusion, reverse sampling, the noise-prediction network, and the training loop -- while removing unnecessary engineering details. This technical report aims to provide researchers with a clear, implementation-first understanding of how diffusion models work in practice and how code and theory correspond. Our code and pre-trained models are available at: https://github.com/disanda/GM/tree/main/DDPM-DDIM-ClassifierFree.
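In the same implementation-first spirit, the forward-diffusion component and the noise-prediction training target reduce to a few lines: with a noise schedule β_t and ᾱ_t = ∏(1 - β_s), a noisy sample is drawn in closed form as x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The linear schedule from 1e-4 to 0.02 over 1000 steps mirrors common DDPM defaults and is an assumption; the linked repository may differ.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

def ddpm_loss(eps_pred, eps):
    """Noise-prediction objective: mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

T = 1000
betas = linear_beta_schedule(T)
alpha_bars = np.cumprod(1.0 - betas)      # abar_t shrinks toward 0 as t grows

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))          # a toy batch of "images"
t = 500
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, alpha_bars, eps)
# A trained network would predict eps from (x_t, t); zeros stand in for it here.
loss = ddpm_loss(np.zeros_like(eps), eps)
```

By the last step ᾱ_T is on the order of 1e-5, so x_T is almost pure Gaussian noise, which is why reverse sampling can start from a standard normal draw.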
【5】Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep Analysis
标题:通过语义制图将生物网络转换为图像以进行视觉解释和可扩展的深度分析
链接:https://arxiv.org/abs/2512.07040
作者:Sakib Mostafa,Lei Xing,Md. Tauhidul Islam
摘要:Complex biological networks are fundamental to biomedical science, capturing interactions among molecules, cells, genes, and tissues. Deciphering these networks is critical for understanding health and disease, yet their scale and complexity pose a daunting challenge for current computational methods. Traditional biological network analysis methods, including deep learning approaches, are powerful but face inherent challenges such as limited scalability, oversmoothing, limited capture of long-range dependencies, difficulty in multimodal integration, expressivity bounds, and poor interpretability. We present Graph2Image, a framework that transforms large biological networks into sets of two-dimensional images by spatially arranging representative network nodes on a 2D grid. This transformation represents the nodes as images, enabling the use of convolutional neural networks (CNNs) with global receptive fields and multi-scale pyramids, thus overcoming limitations of existing biological network analysis methods in scalability, memory efficiency, and long-range context capture. Graph2Image also facilitates seamless integration with other imaging and omics modalities and enhances interpretability through direct visualization of node-associated images. When applied to several large-scale biological network datasets, Graph2Image improved classification accuracy by up to 67.2% over existing methods and provided interpretable visualizations that revealed biologically coherent patterns. It also allows analysis of very large biological networks (nodes > 1 billion) on a personal computer. Graph2Image thus provides a scalable, interpretable, and multimodal-ready approach for biological network analysis, offering new opportunities for disease diagnosis and the study of complex biological systems.
【6】Optimizing video analytics inference pipelines: a case study
标题:优化视频分析推理管道:案例研究
链接:https://arxiv.org/abs/2512.07009
作者:Saeid Ghafouri,Yuming Ding,Katerine Diaz Chito,Jesús Martinez del Rincón,Niamh O'Connell,Hans Vandierendonck
备注:Accepted to the IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2025)
摘要:Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs on commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacement of CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments, as well as in other large-scale video analytics applications.
【7】MINES: Explainable Anomaly Detection through Web API Invariant Inference
标题:MINES:通过Web API不变量推断进行可解释异常检测
链接:https://arxiv.org/abs/2512.06906
作者:Wenjie Zhang,Yun Lin,Chun Fung Amos Kwok,Xiwen Teoh,Xiaofei Xie,Frank Liauw,Hongyu Zhang,Jin Song Dong
摘要:Detecting anomalies in web applications, which are important infrastructure for modern companies and governments, is crucial for providing reliable web services. Many modern web applications operate on web APIs (e.g., RESTful, SOAP, and WebSockets). This exposure invites intentional attacks or unintended illegal access, causing abnormal system behaviors. However, such anomalies can produce logs very similar to normal ones, because the logs lack crucial information (which may reside in the database) needed to discriminate between them. Further, log instances can be noisy, which can mislead state-of-the-art log-learning solutions into learning spurious correlations, resulting in superficial models and rules for anomaly detection. In this work, we propose MINES, which infers explainable API invariants for anomaly detection at the schema level instead of from detailed raw log instances. This allows MINES to (1) separate noise in logs effectively and identify precise normal behaviors, and (2) detect abnormal behaviors beyond the instrumented logs. Technically, MINES (1) converts API signatures into table schemas to enhance the original database schema, and (2) infers potential database constraints on the enhanced schema to capture potential relationships between APIs and database tables. MINES uses an LLM to extract potential relationships from two given table structures, and uses normal log instances to accept or reject the LLM-generated invariants. Finally, MINES translates the inferred constraints into invariants and generates Python code for verifying the runtime logs. We extensively evaluate MINES on web-tamper attacks on the TrainTicket, NiceFish, Gitea, Mastodon, and NextCloud benchmarks against baselines such as LogRobust, LogFormer, and WebNorm. The results show that MINES achieves high recall for the anomalies while introducing almost zero false positives, indicating a new state of the art.
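MINES generates Python code that checks runtime logs against inferred constraints. As a rough illustration, the sketch below shows what one such check might look like for a foreign-key-style invariant: each value logged for an API parameter must match an existing database row. The function name, the log-entry format (`api`, `params`) and the table layout are all hypothetical. None of them are taken from MINES.

```python
def find_reference_violations(logs, table_rows, api, field, table, column):
    """Hypothetical reference-style invariant: for every logged call to `api`,
    params[field] must equal row[column] for some row in `table`.
    Entries whose field is missing or unmatched are returned as violations."""
    valid = {row[column] for row in table_rows.get(table, [])}
    return [
        entry for entry in logs
        if entry["api"] == api and entry["params"].get(field) not in valid
    ]

# Toy data (hypothetical): one order refers to a user that does not exist.
table_rows = {"users": [{"id": 1}, {"id": 2}]}
logs = [
    {"api": "POST /orders", "params": {"user_id": 1}},
    {"api": "POST /orders", "params": {"user_id": 99}},
    {"api": "GET /health", "params": {}},
]
violations = find_reference_violations(logs, table_rows, "POST /orders", "user_id", "users", "id")
```

A schema-level check like this flags the tampered request. A log-only model could miss it because the two requests produce nearly identical log lines.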
【8】Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship
标题:序列独裁匹配市场中多臂老虎机学习的最优分析
链接:https://arxiv.org/abs/2512.06758
作者:Zilong Wang,Shuai Li
摘要:The problem of two-sided matching markets is well studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually uncertain about their preferences on online matching platforms, an emerging line of research is dedicated to the online setting, where participants on one side (players) learn their unknown preferences through multiple rounds of interaction with the other side (arms). Sankararaman et al. provide an $\Omega\left( \frac{N\log(T)}{\Delta^2} + \frac{K\log(T)}{\Delta} \right)$ regret lower bound for this problem under the serial dictatorship assumption, where $N$ is the number of players, $K \geq N$ is the number of arms, $\Delta$ is the minimum reward gap across players and arms, and $T$ is the time horizon. Serial dictatorship assumes that all arms share the same preferences over players, which is common in practice when participants on one side have a unified evaluation standard. Recently, Kong and Li proposed the ET-GS algorithm and achieved an $O\left( \frac{K\log(T)}{\Delta^2} \right)$ regret upper bound, the best upper bound attained so far. Nonetheless, a gap between the lower and upper bounds, ranging from $N$ to $K$, persists, and it remains unclear whether the lower bound or the upper bound needs to be improved. In this paper, we propose a multi-level successive selection algorithm that obtains an $O\left( \frac{N\log(T)}{\Delta^2} + \frac{K\log(T)}{\Delta} \right)$ regret bound when the market satisfies serial dictatorship. To the best of our knowledge, ours is the first algorithm that matches the lower bound for bandit learning in matching markets.
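Serial dictatorship makes the target matching easy to state once preferences are known. All arms rank players the same way, so players pick their favorite remaining arm in that shared order. Assuming strict preferences, this yields the unique stable matching, which bandit algorithms like the one above try to reach while the players' preferences are still unknown. The sketch uses known preferences, and the function name and toy data are illustrative only.

```python
def serial_dictatorship(player_order, player_prefs):
    """Stable matching under serial dictatorship with known preferences.

    player_order: players ranked by the arms' shared preference, best first.
    player_prefs: dict player -> list of arms, most preferred first.
    Returns dict player -> arm (None if no arm remains).
    """
    taken = set()
    matching = {}
    for player in player_order:
        # The highest-ranked remaining player takes their favourite free arm.
        choice = next((a for a in player_prefs[player] if a not in taken), None)
        matching[player] = choice
        if choice is not None:
            taken.add(choice)
    return matching

prefs = {
    "p1": ["a", "b", "c", "d"],
    "p2": ["a", "c", "b", "d"],
    "p3": ["a", "b", "c", "d"],
}
result = serial_dictatorship(["p1", "p2", "p3"], prefs)
```

Here `p1` gets `a`, `p2` falls back to `c`, and `p3` takes `b`. In the bandit setting, each player must first estimate its own preference list from noisy rewards.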
【9】Enhancing Interpretability of AR-SSVEP-Based Motor Intention Recognition via CNN-BiLSTM and SHAP Analysis on EEG Data
标题:通过CNN-BiLSTM和SHAP分析增强基于AR-SSVEP的运动意图识别的可解释性
链接:https://arxiv.org/abs/2512.06730
作者:Lin Yang,Xiang Li,Xin Ma,Xinxin Zhao
摘要:Patients with motor dysfunction show low subjective engagement in rehabilitation training. Traditional SSVEP-based brain-computer interface (BCI) systems rely heavily on external visual stimulus equipment, limiting their practicality in real-world settings. This study proposes an augmented reality steady-state visually evoked potential (AR-SSVEP) system to address the lack of patient initiative and the high workload on therapists. First, we design four HoloLens 2-based EEG classes and collect EEG data from seven healthy subjects for analysis. Second, we extend the conventional CNN-BiLSTM architecture with a multi-head attention mechanism (MACNN-BiLSTM). We extract ten temporal-spectral EEG features and feed them into a CNN to learn high-level representations. We then use a BiLSTM to model sequential dependencies and apply multi-head attention to highlight motor-intention-related patterns. Finally, we apply the SHAP (SHapley Additive exPlanations) method to visualize the contributions of EEG features to the network's decisions, enhancing the model's interpretability. These findings enhance real-time motor intention recognition and support recovery in patients with motor impairments.
【10】A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations
标题:具有逐层解释的新型多模态剩余使用寿命(RUL)估计框架
链接:https://arxiv.org/abs/2512.06708
作者:Waleed Razzaq,Yun-Bo Zhao
摘要:Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal-RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism subsequently emphasizes salient features, followed by linear layers for final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using the Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that significantly enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28% less training data on XJTU-SY and ~48% less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations confirm the interpretability and trustworthiness of predictions, making the framework highly suitable for real-world industrial deployment.
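The abstract says vibration signals are turned into images with the Bresenham line algorithm. It does not give the exact mapping, so the sketch below assumes one plausible version. Each sample becomes one image column, its amplitude is scaled to a row index, and consecutive samples are joined by Bresenham lines. The image height and the min-max scaling are assumptions. `bresenham` is the standard all-octant integer form of the algorithm.

```python
import numpy as np

def bresenham(x0, y0, x1, y1):
    """Integer grid points on the segment (x0, y0) -> (x1, y1), all octants."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points

def signal_to_image(signal, height=64):
    """Render a 1-D signal as a binary (height, len(signal)) image (assumed mapping)."""
    s = np.asarray(signal, dtype=float)
    lo, hi = s.min(), s.max()
    span = hi - lo if hi > lo else 1.0
    # Largest amplitude maps to row 0 (top of the image).
    rows = np.round((hi - s) / span * (height - 1)).astype(int)
    img = np.zeros((height, len(s)), dtype=np.uint8)
    img[rows, np.arange(len(s))] = 1
    for x in range(len(s) - 1):
        for cx, cy in bresenham(x, int(rows[x]), x + 1, int(rows[x + 1])):
            img[cy, cx] = 1
    return img
```

For multichannel signals, one could render each channel separately and stack the results as image channels. That is also an assumption, not the paper's stated design.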
【11】Deep learning recognition and analysis of Volatile Organic Compounds based on experimental and synthetic infrared absorption spectra
标题:基于实验和合成红外吸收光谱的挥发性有机化合物深度学习识别和分析
链接:https://arxiv.org/abs/2512.06059
作者:Andrea Della Valle,Annalisa D'Arco,Tiziana Mancini,Rosanna Mosetti,Maria Chiara Paolozzi,Stefano Lupi,Sebastiano Pilati,Andrea Perali
摘要:Volatile Organic Compounds (VOCs) are organic molecules that have low boiling points and therefore easily evaporate into the air. They pose significant risks to human health, making their accurate detection the crux of efforts to monitor and minimize exposure. Infrared (IR) spectroscopy enables ultrasensitive detection of VOCs at low concentrations in the atmosphere by measuring their IR absorption spectra. However, the complexity of IR spectra limits the possibility of implementing VOC recognition and quantification in real time. While deep neural networks (NNs) are increasingly used to recognize complex data structures, they typically require massive datasets for training. Here, we create an experimental VOC dataset for nine different classes of compounds at various concentrations, using their IR absorption spectra. To further increase the number of spectra and their diversity in terms of VOC concentration, we augment the experimental dataset with synthetic spectra created via conditional generative NNs. This allows us to train robust discriminative NNs that reliably identify the nine VOCs and precisely predict their concentrations. The trained NN is suitable for incorporation into sensing devices for VOC recognition and analysis.
【12】Memory-Amortized Inference: A Topological Unification of Search, Closure, and Structure
标题:记忆摊销推理:搜索、闭合与结构的拓扑统一
链接:https://arxiv.org/abs/2512.05990
作者:Xin Li
摘要:Contemporary ML separates the static structure of parameters from the dynamic flow of inference, yielding systems that lack the sample efficiency and thermodynamic frugality of biological cognition. In this theoretical work, we propose \textbf{Memory-Amortized Inference (MAI)}, a formal framework rooted in algebraic topology that unifies learning and memory as phase transitions of a single geometric substrate. Central to our theory is the \textbf{Homological Parity Principle}, which posits a fundamental dichotomy: even-dimensional homology ($H_{even}$) physically instantiates stable \textbf{Content} (stable scaffolds or ``what''), while odd-dimensional homology ($H_{odd}$) instantiates dynamic \textbf{Context} (dynamic flows or ``where''). We derive the logical flow of MAI as a topological trinity transformation: \textbf{Search $\to$ Closure $\to$ Structure}. Specifically, we demonstrate that cognition operates by converting high-complexity recursive search (modeled by \textit{Savitch's Theorem} in NPSPACE) into low-complexity lookup (modeled by \textit{Dynamic Programming} in P) via the mechanism of \textbf{Topological Cycle Closure}. We further show that this consolidation process is governed by a topological generalization of the Wake-Sleep algorithm, functioning as a coordinate descent that alternates between optimizing the $H_{odd}$ flow (inference/wake) and condensing persistent cycles into the $H_{even}$ scaffold (learning/sleep). This framework offers a rigorous explanation for the emergence of fast-thinking (intuition) from slow-thinking (reasoning) and provides a blueprint for post-Turing architectures that compute via topological resonance.
【13】$ϕ$-test: Global Feature Selection and Inference for Shapley Additive Explanations
标题:$ϕ$-检验:面向Shapley加性解释的全局特征选择与推断
链接:https://arxiv.org/abs/2512.07578
作者:Dongseok Kim,Hyoungsun Choi,Mohamed Jismy Aashik Rasool,Gisung Oh
备注:15 pages
摘要:We propose $φ$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $φ$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. Experiments on real tabular regression tasks with tree-based and neural backbones suggest that $φ$-test can retain much of the predictive ability of the original model while using only a few features and producing feature sets that remain fairly stable across resamples and backbone classes. In these settings, $φ$-test acts as a practical global explanation layer linking Shapley-based importance summaries with classical statistical inference.
检测相关(5篇)
【1】Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation
标题:无参考自动机器翻译评估中错误跨度检测的最小Bayes风险解码
链接:https://arxiv.org/abs/2512.07540
作者:Boxuan Lyu,Haiyue Song,Hidetaka Kamigaito,Chenchen Ding,Hideki Tanaka,Masao Utiyama,Kotaro Funakoshi,Manabu Okumura
摘要:Error Span Detection (ESD) is a subtask of automatic machine translation evaluation that localizes error spans in translations and labels their severity. State-of-the-art generative ESD methods typically decode using Maximum a Posteriori (MAP), assuming that model-estimated probabilities are perfectly correlated with similarity to human annotation. However, we observed that annotations dissimilar to the human annotation could achieve a higher model likelihood than the human annotation itself. We address this issue by applying Minimum Bayes Risk (MBR) decoding to generative ESD models. Specifically, we employ sentence- and span-level similarity metrics as utility functions to select candidate hypotheses based on their approximate similarity to the human annotation. Extensive experimental results show that our MBR decoding outperforms the MAP baseline at the system, sentence, and span levels. Furthermore, to mitigate the computational cost of MBR decoding, we demonstrate that applying MBR distillation enables a standard greedy model to match MBR decoding performance, effectively eliminating the inference-time latency bottleneck.
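MBR decoding replaces "pick the most likely output" with "pick the candidate that agrees most, on average, with the other candidates." Each sampled candidate acts as a pseudo-reference for the rest. In the minimal sketch below, a candidate annotation is a set of `(start, end, severity)` spans. The utility is an exact-match span F1 chosen for illustration, not necessarily one of the paper's metrics.

```python
def span_f1(hyp, ref):
    """Exact-match F1 between two sets of (start, end, severity) error spans."""
    hyp, ref = set(hyp), set(ref)
    if not hyp and not ref:
        return 1.0  # both say "no errors": perfect agreement
    tp = len(hyp & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(hyp), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

def mbr_select(candidates, utility=span_f1):
    """Return the candidate with the highest mean utility against all other candidates."""
    best, best_score = None, float("-inf")
    for i, hyp in enumerate(candidates):
        others = [ref for j, ref in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / max(len(others), 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

candidates = [
    {(0, 3, "major")},
    {(0, 3, "major")},
    {(0, 3, "major"), (5, 7, "minor")},
    {(10, 12, "minor")},
]
chosen = mbr_select(candidates)
```

The selection cost grows quadratically with the number of candidates. This is the inference overhead that the MBR distillation step described above is meant to remove.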
【2】TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization
标题:TRACE:用于流数据驱动优化的可推广漂移检测器
链接:https://arxiv.org/abs/2512.07082
作者:Yuan-Ting Zhong,Ting Huang,Xiaolin Xiao,Yue-Jiao Gong
备注:Accepted by AAAI 2026
摘要:Many optimization tasks involve streaming data with unknown concept drifts, a challenging setting known as Streaming Data-Driven Optimization (SDDO). Existing methods, while leveraging surrogate model approximation and historical knowledge transfer, often rely on restrictive assumptions such as fixed drift intervals and full environmental observability, limiting their adaptability to diverse dynamic environments. We propose TRACE, a TRAnsferable Concept-drift Estimator that effectively detects distributional changes in streaming data across varying time scales. TRACE leverages a principled tokenization strategy to extract statistical features from data streams and models drift patterns using attention-based sequence learning, enabling accurate detection on unseen datasets and highlighting the transferability of learned drift patterns. Further, we showcase TRACE's plug-and-play nature by integrating it into a streaming optimizer, facilitating adaptive optimization under unknown drifts. Comprehensive experimental results on diverse benchmarks demonstrate the superior generalization, robustness, and effectiveness of our approach in SDDO scenarios.
【3】DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object Detection
标题:DFIR-DETR:用于跨场景小目标检测的频域增强和动态特征聚集
链接:https://arxiv.org/abs/2512.07078
作者:Bo Gao,Jingcheng Tong,Xingsheng Chen,Han Yu,Zichen Li
备注:16 pages
摘要:Detecting small objects in UAV remote sensing images and identifying surface defects in industrial inspection remain difficult tasks. These applications face common obstacles: features are sparse and weak, backgrounds are cluttered, and object scales vary dramatically. Current transformer-based detectors, while powerful, struggle with three critical issues. First, features degrade severely as networks downsample progressively. Second, spatial convolutions cannot capture long-range dependencies effectively. Third, standard upsampling methods inflate feature maps unnecessarily. We introduce DFIR-DETR to tackle these problems through dynamic feature aggregation combined with frequency-domain processing. Our architecture builds on three novel components. The DCFA module uses dynamic K-sparse attention, cutting complexity from $O(N^2)$ down to $O(NK)$, and employs spatial gated linear units for better nonlinear modeling. The DFPN module applies amplitude-normalized upsampling to prevent feature inflation and uses dual-path shuffle convolution to retain spatial details across scales. The FIRC3 module operates in the frequency domain, achieving global receptive fields without sacrificing efficiency. We tested our method extensively on the NEU-DET and VisDrone datasets. Results show mAP50 scores of 92.9% and 51.6%, respectively, both state-of-the-art. The model stays lightweight, with just 11.7M parameters and 41.2 GFLOPs. Strong performance across two very different domains confirms that DFIR-DETR generalizes well and works effectively in resource-limited settings for cross-scene small object detection.
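The core idea of K-sparse attention is that each query keeps only its K highest-scoring keys and renormalizes the softmax over that set. The NumPy sketch below shows this generic top-K scheme. It is not the DCFA module, whose dynamic selection rule the abstract does not specify. For clarity it still computes the full score matrix, so it does not itself achieve the $O(NK)$ cost. That would require a cheaper way to choose candidate keys.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Generic top-K sparse attention (illustrative, not DCFA).

    q: (N, d), k: (M, d), v: (M, d_v). Returns (output (N, d_v), weights (N, M)).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                                # dense (N, M) scores
    top_k = min(top_k, k.shape[0])
    idx = np.argpartition(-scores, top_k - 1, axis=1)[:, :top_k]  # top-K keys per query
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    masked -= masked.max(axis=1, keepdims=True)                  # numerical stability
    weights = np.exp(masked)                                     # exp(-inf) = 0 outside the top-K set
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Setting `top_k` equal to the number of keys recovers ordinary dense softmax attention.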
【4】The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News
标题:数据特征对GNN检测假新闻评估的影响
链接:https://arxiv.org/abs/2512.06638
作者:Isha Karn,David Jensen
备注:Preprint. Approximately 15 pages, 5 figures, 3 tables
摘要:Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most commonly used benchmark data sets - GossipCop and PolitiFact - are poorly suited to evaluating the utility of models that use propagation structure. Specifically, these data sets exhibit shallow, ego-like graph topologies that provide little or no ability to differentiate among modeling methods. We systematically benchmark five GNN architectures against a structure-agnostic multilayer perceptron (MLP) that uses the same node features. We show that MLPs match or closely trail the performance of GNNs, with performance gaps often within 1-2% and overlapping confidence intervals. To isolate the contribution of structure in these datasets, we conduct controlled experiments where node features are shuffled or edge structures randomized. We find that performance collapses under feature shuffling but remains stable under edge randomization. This suggests that structure plays a negligible role in these benchmarks. Structural analysis further reveals that over 75% of nodes are only one hop from the root, exhibiting minimal structural diversity. In contrast, on synthetic datasets where node features are noisy and structure is informative, GNNs significantly outperform MLPs. These findings provide strong evidence that widely used benchmarks do not meaningfully test the utility of modeling structural features, and they motivate the development of datasets with richer, more diverse graph topologies.
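A statistic like "over 75% of nodes are only one hop from the root" can be computed directly from a propagation tree with a breadth-first search. The minimal sketch below assumes edges are given as `(parent, child)` pairs pointing away from the source post. That edge format is hypothetical, not the datasets' actual schema.

```python
from collections import defaultdict, deque

def hop_distribution(edges, root):
    """Fraction of non-root nodes at each hop distance from the root (BFS)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # first visit gives the shortest hop count
                depth[child] = depth[node] + 1
                queue.append(child)
    counts = defaultdict(int)
    for node, d in depth.items():
        if node != root:
            counts[d] += 1
    total = sum(counts.values())
    return {d: n / total for d, n in sorted(counts.items())} if total else {}

# A shallow, ego-like cascade: three direct reshares and one second-hop reshare.
edges = [("root", "a"), ("root", "b"), ("root", "c"), ("a", "d")]
dist = hop_distribution(edges, "root")
```

When almost all of the mass sits at hop 1, a graph is close to a star. In that case, message passing adds little beyond what the node features already provide.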
【5】Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL
标题:使用硬件感知ML和DL对资源受限的物联网设备进行入侵检测
链接:https://arxiv.org/abs/2512.02272
作者:Ali Diab,Adel Chehade,Edoardo Ragusa,Paolo Gastaldo,Rodolfo Zunino,Amer Baghdadi,Mostafa Rizk
备注:Accepted at the 2025 IEEE International Conference on Emerging Trends in Engineering and Computing (ETECOM). Recipient of the ETECOM 2025 Best Paper Award
摘要:This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential for fast, privacy-preserving, and resource-efficient threat detection. The goal is to optimize both tree-based machine learning (ML) models and compact deep neural networks (DNNs) within strict edge-device constraints. This allows for a fair comparison and reveals trade-offs between model families. We apply constrained grid search for tree-based classifiers and hardware-aware neural architecture search (HW-NAS) for 1D convolutional neural networks (1D-CNNs). Evaluation on the Edge-IIoTset benchmark shows that selected models meet tight flash, RAM, and compute limits: LightGBM achieves 95.3% accuracy using 75 KB flash and 1.2K operations, while the HW-NAS-optimized CNN reaches 97.2% with 190 KB flash and 840K floating-point operations (FLOPs). We deploy the full pipeline on a Raspberry Pi 3 B Plus, confirming that tree-based models operate within 30 ms and that CNNs remain suitable when accuracy outweighs latency. These results highlight the practicality of hardware-constrained model design for real-time IDS at the edge.
分类|识别(5篇)
【1】Parallel Algorithms for Combined Regularized Support Vector Machines: Application in Music Genre Classification
标题:组合正则化支持向量机的并行算法:在音乐流派分类中的应用
链接:https://arxiv.org/abs/2512.07463
作者:Rongmei Liang,Zizheng Liu,Xiaofei Wu,Jingwen Tu
摘要:In the era of rapid development of artificial intelligence, its applications span diverse fields and rely heavily on effective data processing and model optimization. Combined Regularized Support Vector Machines (CR-SVMs) can effectively handle the structural information among data features, but efficient algorithms for big data stored in a distributed manner are lacking. To address this issue, we propose a unified optimization framework based on a consensus structure. This framework applies to various loss functions and combined regularization terms and can also be extended to non-convex regularization terms, showing strong scalability. Based on this framework, we develop a distributed parallel alternating direction method of multipliers (ADMM) algorithm to efficiently compute CR-SVMs when data are stored in a distributed manner. To ensure the convergence of the algorithm, we also introduce the Gaussian back-substitution method. For completeness, we introduce a new model, the sparse group lasso support vector machine (SGL-SVM), and apply it to music information retrieval. Theoretical analysis confirms that the computational complexity of the proposed algorithm is not affected by the choice of regularization terms and loss functions, highlighting the universality of the parallel algorithm. Experiments on synthetic and Free Music Archive (FMA) datasets demonstrate the reliability, stability, and efficiency of the algorithm.
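ADMM schemes that split the penalty from the loss usually include a step that applies the proximal operator of the regularizer. For the sparse group lasso penalty $\lambda_1\|x\|_1 + \lambda_2\sum_g \|x_g\|_2$, with non-overlapping groups, that operator has a closed form. First apply elementwise soft-thresholding by $\lambda_1$, then group-wise shrinkage by $\lambda_2$. The sketch below implements that known formula. The abstract does not say whether the authors' algorithm uses exactly this step, and the group weights often used in practice (such as $\sqrt{|g|}$) are omitted here.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise prox of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sparse_group_lasso(v, groups, lam1, lam2):
    """Prox of lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2 at v.

    groups: list of index arrays forming a partition of the coordinates.
    """
    u = soft_threshold(v, lam1)       # sparsity within groups
    x = np.zeros_like(u)
    for g in groups:
        norm = np.linalg.norm(u[g])
        if norm > lam2:               # otherwise the whole group is set to zero
            x[g] = (1.0 - lam2 / norm) * u[g]
    return x
```

In a consensus ADMM, the step sizes scale `lam1` and `lam2`, typically dividing each by the penalty parameter $\rho$.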
【2】IFFair: Influence Function-driven Sample Reweighting for Fair Classification
标题:IFFair:影响函数驱动的样本权重调整
链接:https://arxiv.org/abs/2512.07249
作者:Jingran Yang,Min Zhang,Lingfeng Zhang,Zhaohui Wang,Yonggang Zhang
摘要:Because machine learning has significantly improved efficiency and convenience in society, it is increasingly used to assist or replace human decision-making. However, because such algorithms learn from data, they can learn and even amplify latent biases in the training samples. This produces discriminatory decisions against certain unprivileged groups, depriving them of equal treatment, harming social well-being, and hindering the development of related applications. We therefore propose IFFair, a pre-processing method based on the influence function. Unlike other fairness optimization approaches, IFFair uses only the disparity in the influence of training samples on different groups to dynamically adjust sample weights during training, without modifying the network structure, data features, or decision boundaries. To evaluate IFFair, we conduct experiments on multiple real-world datasets and metrics. The results show that our approach mitigates bias under multiple accepted fairness metrics in the classification setting, including demographic parity, equalized odds, equality of opportunity, and error rate parity, without conflicts among them. They also show that IFFair achieves a better trade-off between multiple utility and fairness metrics than previous pre-processing methods.
【3】Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network
标题:基于卷积神经网络的上市公司财务舞弊识别与解释性研究
链接:https://arxiv.org/abs/2512.06648
作者:Xiao Li
备注:in Chinese language
摘要:Since the emergence of joint-stock companies, financial fraud by listed firms has repeatedly undermined capital markets. Fraud is difficult to detect because of covert tactics and the high labor and time costs of audits. Traditional statistical models are interpretable but struggle with nonlinear feature interactions, while machine learning models are powerful but often opaque. In addition, most existing methods judge fraud only for the current year based on current-year data, limiting timeliness. This paper proposes a financial fraud detection framework for Chinese A-share listed companies based on convolutional neural networks (CNNs). We design a feature engineering scheme that transforms firm-year panel data into image-like representations, enabling the CNN to capture cross-sectional and temporal patterns and to predict fraud in advance. Experiments show that the CNN outperforms logistic regression and LightGBM in accuracy, robustness, and early-warning performance, and that proper tuning of the classification threshold is crucial in high-risk settings. To address interpretability, we analyze the model along the dimensions of entity, feature, and time using local explanation techniques. We find that solvency, ratio structure, governance structure, and internal control are general predictors of fraud, while environmental indicators matter mainly in high-pollution industries. Non-fraud firms share stable feature patterns, whereas fraud firms exhibit heterogeneous patterns concentrated in short time windows. A case study of Guanong Shares in 2022 shows that cash flow analysis, social responsibility, governance structure, and per-share indicators are the main drivers of the model's fraud prediction, consistent with the company's documented misconduct.
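The abstract does not give the exact layout of its image-like representation. The sketch below shows one plausible way to turn a firm-year panel into CNN inputs that support early warning. Each sample is a (years × features) matrix covering a trailing window, labelled with the fraud flag `horizon` years later. The array layout, `window` and `horizon` are hypothetical choices.

```python
import numpy as np

def panel_to_images(panel, labels, window=3, horizon=1):
    """Slice a firm-year panel into (window, n_features) "images".

    panel: (n_firms, n_years, n_features) standardized indicators (assumed layout).
    labels: (n_firms, n_years) 0/1 fraud flags.
    Each sample covers years [t - window + 1, t] and is labelled with year t + horizon.
    """
    n_firms, n_years, _ = panel.shape
    X, y = [], []
    for i in range(n_firms):
        for t in range(window - 1, n_years - horizon):
            X.append(panel[i, t - window + 1 : t + 1, :])
            y.append(labels[i, t + horizon])
    return np.stack(X), np.array(y)

panel = np.arange(2 * 5 * 4, dtype=float).reshape(2, 5, 4)  # 2 firms, 5 years, 4 features
labels = np.zeros((2, 5), dtype=int)
labels[0, 4] = 1                                            # firm 0 flagged in its last year
X, y = panel_to_images(panel, labels)
```

Labelling each window with a later year is what makes the task predictive rather than contemporaneous. A CNN can then convolve along both the time axis and the feature axis.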
【4】Proof of Concept for Mammography Classification with Enhanced Compactness and Separability Modules
标题:具有增强的紧凑性和可分离性模块的乳房X线摄影分类概念证明
链接:https://arxiv.org/abs/2512.06575
作者:Fariza Dahes
备注:26 pages, 16 figures, 2 tables; proof of concept on mammography classification with compactness/separability modules and interactive dashboard; preprint submitted to arXiv cs.LG
摘要:This study presents a validation and extension of a recent methodological framework for medical image classification. An improved ConvNeXt Tiny architecture, integrating Global Average and Max Pooling fusion (GAGM), lightweight channel attention (SEVector), and Feature Smoothing Loss (FSL), demonstrated promising results on Alzheimer MRI under CPU-friendly conditions; our work investigates whether it transfers to mammography classification. Using a Kaggle dataset that consolidates the INbreast, MIAS, and DDSM mammography collections, we compare a baseline CNN, ConvNeXt Tiny, and InceptionV3 backbones enriched with GAGM and SEVector modules. Results confirm the effectiveness of GAGM and SEVector in enhancing feature discriminability and reducing false negatives, particularly for malignant cases. In our experiments, however, the Feature Smoothing Loss did not yield measurable improvements under mammography classification conditions, suggesting that its effectiveness may depend on specific architectural and computational assumptions. Beyond validation, our contribution extends the original framework through multi-metric evaluation (macro F1, per-class recall variance, ROC/AUC), feature interpretability analysis (Grad-CAM), and the development of an interactive dashboard for clinical exploration. As a perspective, we highlight the need to explore alternative approaches to improve intra-class compactness and inter-class separability, with the specific goal of enhancing the distinction between malignant and benign cases in mammography classification.
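GAGM and SEVector are described only by name here. The NumPy sketch below shows the general ideas they refer to: fusing global average and global max pooling into one channel descriptor, then gating channels with a squeeze-and-excitation-style bottleneck. Equal-weight averaging of the two poolings and the ReLU/sigmoid bottleneck are assumptions. The original modules may instead use concatenation, learned weights or a different gate.

```python
import numpy as np

def gap_gmp_fusion(feat):
    """Fuse global average and global max pooling: (C, H, W) -> (C,). Equal weights assumed."""
    return 0.5 * (feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))

def se_channel_attention(vec, w1, w2):
    """Squeeze-and-excitation-style gate on a pooled channel vector.

    w1: (C // r, C) reduction weights, w2: (C, C // r) expansion weights.
    """
    hidden = np.maximum(w1 @ vec, 0.0)             # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1)
    return vec * gate

feat = np.zeros((2, 2, 2))
feat[0] = [[1.0, 2.0], [3.0, 4.0]]                 # mean 2.5, max 4.0
fused = gap_gmp_fusion(feat)
out = se_channel_attention(fused, np.zeros((1, 2)), np.zeros((2, 1)))
```

Max pooling keeps strong, localized responses such as small calcifications, which averaging would dilute. That is one intuition for why the fusion might help on mammograms.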
【5】Microseismic event classification with a lightweight Fourier Neural Operator model
标题:使用轻量级傅里叶神经运算符模型的微地震事件分类
链接:https://arxiv.org/abs/2512.07425
作者:Ayrat Abdullin,Umair bin Waheed,Leo Eisner,Abdullatif Al-Shuhail
备注:Submitted to Nature Scientific Reports
摘要:Real-time monitoring of induced seismicity is crucial for mitigating operational hazards and relies on the rapid and accurate classification of microseismic events from continuous data streams. However, while many deep learning models excel at this task, their high computational requirements often limit their practical application in real-time monitoring systems. To address this limitation, a lightweight model based on the Fourier Neural Operator (FNO) is proposed for microseismic event classification, leveraging its inherent resolution invariance and computational efficiency for waveform processing. On the STanford EArthquake Dataset (STEAD), a global, large-scale database of seismic waveforms, the FNO-based model is highly effective for trigger classification, achieving an F1 score of 95% even when training data are sparse. The new FNO model greatly reduces the computational power needed relative to current deep learning models without sacrificing the classification success rate measured by the F1 score. A test on a real microseismic dataset yields an F1 score of 98%, outperforming many traditional deep-learning techniques. The combination of a high success rate and low computational cost indicates that the FNO model can serve as a method of choice for real-time monitoring of induced microseismicity. The method saves computational resources and facilitates both post-processing and real-time seismic processing, making it suitable for implementing traffic light systems that prevent undesired induced seismicity.
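The resolution invariance mentioned above comes from the FNO's spectral convolution. The layer transforms the input with an FFT, keeps a fixed number of low-frequency modes, mixes channels there with learned complex weights, and transforms back. The sketch below is a single 1-D spectral convolution in NumPy. It is not the paper's model, and a full Fourier layer would also add a pointwise linear path and a nonlinearity.

```python
import numpy as np

def spectral_conv1d(x, weights, n_modes):
    """Spectral convolution of one Fourier layer.

    x: (C_in, N) real signal; weights: complex (C_in, C_out, n_modes).
    Keeps only the lowest n_modes frequencies and returns a real (C_out, N) array.
    """
    c_in, n = x.shape
    x_hat = np.fft.rfft(x, axis=-1)                          # (C_in, N // 2 + 1)
    assert n_modes <= x_hat.shape[-1], "too many modes for this resolution"
    out_hat = np.zeros((weights.shape[1], x_hat.shape[-1]), dtype=complex)
    # Per-mode channel mixing: out_hat[o, k] = sum_i x_hat[i, k] * W[i, o, k]
    out_hat[:, :n_modes] = np.einsum("ik,iok->ok", x_hat[:, :n_modes], weights)
    return np.fft.irfft(out_hat, n=n, axis=-1)
```

The same `weights` apply to a waveform sampled at any length N, provided N // 2 + 1 ≥ `n_modes`. In the test below, identity weights act as a low-pass filter at two different resolutions.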
表征(2篇)
【1】When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models
标题:当距离造成干扰时:奖励模型BT损失中的表示距离偏差
链接:https://arxiv.org/abs/2512.06343
作者:Tong Xie,Andrew Bai,Yuanhao Ban,Yunqi Hong,Haoyu Li,Cho-jui Hsieh
摘要:Reward models are central to Large Language Model (LLM) alignment within the RLHF framework. The standard objective used in reward modeling is the Bradley-Terry (BT) loss, which learns from pairwise data consisting of a chosen and a rejected response. In this work, we analyze the per-sample gradient of the BT loss and show that its norm scales with two distinct components: (1) the difference in predicted rewards between the chosen and rejected responses, which reflects the prediction error, and, critically, (2) the representation distance between the pair, measured in the output space of the final layer. While the first term captures the intended training signal, we show that the second term can significantly affect the update magnitude and misalign learning. Specifically, pairs with a small representation distance often receive vanishingly weak updates, even when misranked, while pairs with a large distance receive disproportionately strong updates. As a result, gradients from large-distance pairs overshadow those from small-distance pairs, where fine-grained distinctions are especially important. To overcome this limitation, we propose NormBT, an adaptive pair-wise normalization scheme that balances representation-driven effects and focuses learning signals on prediction error. NormBT is a lightweight, drop-in addition to the BT loss with negligible overhead. Across various LLM backbones and datasets, NormBT consistently improves reward model performance, with notable gains of over 5% on the Reasoning category of RewardBench, which contains numerous small-distance pairs. This work reveals a key limitation in the widely used BT objective and provides a simple, effective correction.
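The gradient decomposition described above can be checked on a linear reward head r = w·h with fixed final-layer representations h. For the loss −log σ(w·(h_c − h_r)), the gradient with respect to w is −σ(−Δ)(h_c − h_r), where Δ is the reward margin. Its norm is therefore a prediction-error factor times the representation distance. The sketch below illustrates this. The `distance_normalized_grad` rescaling is a hypothetical simplification for exposition, not the paper's NormBT scheme.

```python
import numpy as np

def bt_loss_grad(w, h_c, h_r):
    """BT loss -log sigmoid(r_c - r_r) for a linear reward head r = w @ h,
    with its gradient w.r.t. w (representations h are held fixed)."""
    diff = h_c - h_r
    delta = w @ diff                              # predicted reward margin
    loss = np.logaddexp(0.0, -delta)              # = -log sigmoid(delta)
    grad = -diff / (1.0 + np.exp(delta))          # = -sigmoid(-delta) * diff
    return loss, grad

def distance_normalized_grad(w, h_c, h_r, eps=1e-8):
    """Illustrative rescaling by the pair's representation distance
    (an assumption for exposition; NormBT's adaptive scheme may differ)."""
    _, grad = bt_loss_grad(w, h_c, h_r)
    return grad / (np.linalg.norm(h_c - h_r) + eps)

w = np.array([1.0, 0.0])
near = (np.array([-0.5, 0.0]), np.zeros(2))    # misranked, small distance
far = (np.array([-0.5, 10.0]), np.zeros(2))    # same margin, large distance
_, g_near = bt_loss_grad(w, *near)
_, g_far = bt_loss_grad(w, *far)
print(np.linalg.norm(g_far) / np.linalg.norm(g_near))   # about 20
```

Both pairs are misranked by the same margin, yet the far pair's update is roughly 20 times larger. After dividing by the distance, the two gradient norms coincide, so the update depends only on the prediction error.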
【2】Accelerating Materials Discovery: Learning a Universal Representation of Chemical Processes for Cross-Domain Property Prediction
标题:加速材料发现:学习化学过程的通用表示以进行跨领域性质预测
链接:https://arxiv.org/abs/2512.05979
作者:Mikhail Tsitsvero,Atsuyuki Nakao,Hisaki Ikebata
备注:22 pages, 8 figures
摘要:Experimental validation of chemical processes is slow and costly, limiting exploration in materials discovery. Machine learning can prioritize promising candidates, but existing data in patents and literature is heterogeneous and difficult to use. We introduce a universal directed-tree process-graph representation that unifies unstructured text, molecular structures, and numeric measurements into a single machine-readable format. To learn from this structured data, we developed a multi-modal graph neural network with a property-conditioned attention mechanism. Trained on approximately 700,000 process graphs from nearly 9,000 diverse documents, our model learns semantically rich embeddings that generalize across domains. When fine-tuned on compact, domain-specific datasets, the pretrained model achieves strong performance, demonstrating that universal process representations learned at scale transfer effectively to specialized prediction tasks with minimal additional data.
编码器(4篇)
【1】Group Representational Position Encoding
标题:群表示位置编码
链接:https://arxiv.org/abs/2512.07805
作者:Yifan Zhang,Zixiang Chen,Yifeng Liu,Zhen Qin,Huizhuo Yuan,Kangping Xu,Yang Yuan,Quanquan Gu,Andrew Chi-Chih Yao
备注:Project Page: https://github.com/model-architectures/GRAPE
摘要:We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t \in \mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(r d)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
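The closed-form exponential follows from a standard identity. Take L = a bᵀ − b aᵀ with orthonormal a and b. Then L³ = −L, so exp(θL) = I + sin θ · L + (1 − cos θ) · L², which is Rodrigues' formula restricted to the plane spanned by a and b. The NumPy sketch below applies this identity to one rank-2 generator. It checks the relative law (G(m)q)·(G(n)k) = q·G(n − m)k. The function names and the orthonormal-plane setup are illustrative assumptions, not the project's code.

```python
import numpy as np

def rank2_generator(a, b):
    """Skew-symmetric rank-2 generator L = a b^T - b a^T (a, b orthonormal)."""
    return np.outer(a, b) - np.outer(b, a)

def grape_rotation(n, omega, a, b):
    """Closed-form exp(n * omega * L). For orthonormal a, b we have L^3 = -L,
    so exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2."""
    L = rank2_generator(a, b)
    theta = n * omega
    return np.eye(len(a)) + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)

rng = np.random.default_rng(1)
d = 6
a, b = np.linalg.qr(rng.standard_normal((d, 2)))[0].T   # orthonormal plane
omega = 0.3
q, k = rng.standard_normal(d), rng.standard_normal(d)
# relative law: query at position 2, key at position 7 -> depends only on 7 - 2
s1 = (grape_rotation(2, omega, a, b) @ q) @ (grape_rotation(7, omega, a, b) @ k)
s2 = q @ (grape_rotation(5, omega, a, b) @ k)
```

With a = e₁ and b = e₂, the map reduces to a single 2×2 rotation block, as in one RoPE frequency pair (up to the sign convention). RoPE stacks d/2 such commuting planes with different frequencies.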
【2】A Physics-Aware Attention LSTM Autoencoder for Early Fault Diagnosis of Battery Systems
标题:用于电池系统早期故障诊断的物理感知注意LSTM自动编码器
链接:https://arxiv.org/abs/2512.06809
作者:Jiong Yang
备注:5 pages, 7 figures
摘要:Battery safety is paramount for electric vehicles. Early fault diagnosis remains a challenge due to the subtle nature of anomalies and the interference of dynamic operating noise. Existing data-driven methods often suffer from "physical blindness", leading to missed detections or false alarms. To address this, we propose a Physics-Aware Attention LSTM Autoencoder (PA-ALSTM-AE). This novel framework explicitly integrates battery aging laws (mileage) into the deep learning pipeline through a multi-stage fusion mechanism. Specifically, an adaptive physical feature construction module selects mileage-sensitive features, and a physics-guided latent fusion module dynamically calibrates the memory cells of the LSTM based on the aging state. Extensive experiments on the large-scale Vloong real-world dataset demonstrate that the proposed method significantly outperforms state-of-the-art baselines. Notably, it more than triples the recall of early faults while maintaining high precision, offering a robust solution for industrial battery management systems.
【3】Vector Quantization using Gaussian Variational Autoencoder
标题:使用高斯变分自动编码器的向量量化
链接:https://arxiv.org/abs/2512.06609
作者:Tongda Xu,Wendi Zheng,Jiajun He,Jose Miguel Hernandez-Lobato,Yan Wang,Ya-Qin Zhang,Jie Tang
摘要:Vector quantized variational autoencoder (VQ-VAE) is a discrete autoencoder that compresses images into discrete tokens. It is difficult to train because of the discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE satisfying a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the noise vector closest to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training the Gaussian VAE for effective GQ, named target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. The source code is available at https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE.
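The GQ procedure and its rate condition can be illustrated in a few lines. The bits-back rate of a diagonal Gaussian posterior against a standard normal prior is KL(N(μ, σ²) ‖ N(0, I)). GQ needs log₂ K to exceed that rate for a codebook of K random Gaussian vectors. The sketch below makes two assumptions: plain Euclidean distance to the posterior mean, and a single shared codebook drawn from a fixed seed. The helper names are hypothetical and not taken from the linked repository.

```python
import numpy as np

def gaussian_kl_bits(mu, sigma):
    """KL(N(mu, diag sigma^2) || N(0, I)) in bits (the bits-back coding rate)."""
    kl_nats = 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))
    return kl_nats / np.log(2.0)

def gaussian_quant(mu, codebook):
    """Index of the random Gaussian codeword nearest (Euclidean) to mu."""
    dists = np.sum((codebook - mu) ** 2, axis=1)
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
d, bits = 4, 12
codebook = rng.standard_normal((2 ** bits, d))     # shared random codebook, K = 4096
mu, sigma = np.array([0.3, -0.2, 0.1, 0.0]), np.full(d, 0.5)
rate = gaussian_kl_bits(mu, sigma)                 # about 1.94 bits here
idx = gaussian_quant(mu, codebook)                 # discrete token for this latent
err = np.linalg.norm(codebook[idx] - mu)           # quantization error
```

Here log₂ K = 12 bits comfortably exceeds the posterior's rate, which is the regime in which the paper's guarantee of a small quantization error applies.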
【4】A scalable and real-time neural decoder for topological quantum codes
标题:一种可扩展的实时神经解码器,用于拓扑量子码
链接:https://arxiv.org/abs/2512.07737
作者:Andrew W. Senior,Thomas Edlich,Francisco J. H. Heras,Lei M. Zhang,Oscar Higgott,James S. Spencer,Taylor Applebaum,Sam Blackwell,Justin Ledford,Akvilė Žemgulytė,Augustin Žídek,Noah Shutty,Andrew Cowie,Yin Li,George Holland,Peter Brooks,Charlie Beattie,Michael Newman,Alex Davies,Cody Jones,Sergio Boixo,Hartmut Neven,Pushmeet Kohli,Johannes Bausch
摘要:Fault-tolerant quantum computing will require error rates far below those achievable with physical qubits. Quantum error correction (QEC) bridges this gap, but depends on decoders being simultaneously fast, accurate, and scalable. This combination of requirements has not yet been met by a machine-learning decoder, nor by any decoder for promising resource-efficient codes such as the colour code. Here we introduce AlphaQubit 2, a neural-network decoder that achieves near-optimal logical error rates for both surface and colour codes at large scales under realistic noise. For the colour code, it is orders of magnitude faster than other high-accuracy decoders. For the surface code, we demonstrate real-time decoding faster than 1 microsecond per cycle up to distance 11 on current commercial accelerators with better accuracy than leading real-time decoders. These results support the practical application of a wider class of promising QEC codes, and establish a credible path towards high-accuracy, real-time neural decoding at the scales required for fault-tolerant quantum computation.
优化|敛散性(6篇)
【1】Comparing BFGS and OGR for Second-Order Optimization
标题:比较BFGS和OGR进行二阶优化
链接:https://arxiv.org/abs/2512.06969
作者:Adrian Przybysz,Mikołaj Kołek,Franciszek Sobota,Jarek Duda
摘要:Estimating the Hessian matrix, especially for neural network training, is challenging because of high dimensionality and cost. In this work, we compare the classical Sherman-Morrison update used in the popular BFGS (Broyden-Fletcher-Goldfarb-Shanno) method, which maintains a positive definite Hessian approximation under a convexity assumption, with a novel approach called Online Gradient Regression (OGR). OGR regresses gradients against positions using an exponential moving average to estimate second derivatives online, without requiring Hessian inversion. Unlike BFGS, OGR can estimate a general (not necessarily positive definite) Hessian and can thus handle non-convex structure. We evaluate both methods on standard test functions and show that OGR achieves faster convergence and lower loss, particularly in non-convex settings.
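The regression idea can be sketched as follows. Keep exponential moving averages of x, g, x xᵀ, and g xᵀ, then take the least-squares slope H = Cov(g, x) Cov(x, x)⁻¹. For an exactly quadratic objective, g = A x + c, this recovers A, including an indefinite A. This is a minimal sketch of EMA gradient regression in general, not the authors' OGR implementation. Random probe positions stand in for an optimizer trajectory, the class name is hypothetical, and H is left unsymmetrized.

```python
import numpy as np

class OnlineGradientRegression:
    """EMA-weighted linear regression of gradients on positions, g ~ H x + c.
    The slope H is the Hessian estimate; it need not be positive definite."""

    def __init__(self, x0, g0, beta=0.9):
        self.beta = beta
        x0, g0 = np.asarray(x0, float), np.asarray(g0, float)
        # start the averages at the first sample so the EMA weights sum to 1
        self.mx, self.mg = x0.copy(), g0.copy()
        self.mxx, self.mgx = np.outer(x0, x0), np.outer(g0, x0)

    def update(self, x, g):
        b = self.beta
        x, g = np.asarray(x, float), np.asarray(g, float)
        self.mx = b * self.mx + (1 - b) * x
        self.mg = b * self.mg + (1 - b) * g
        self.mxx = b * self.mxx + (1 - b) * np.outer(x, x)
        self.mgx = b * self.mgx + (1 - b) * np.outer(g, x)

    def hessian(self):
        cov_xx = self.mxx - np.outer(self.mx, self.mx)
        cov_gx = self.mgx - np.outer(self.mg, self.mx)
        # H = Cov(g, x) Cov(x, x)^{-1}; cov_xx is symmetric, so solve the transpose
        return np.linalg.solve(cov_xx, cov_gx.T).T

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5, 0.0], [0.5, -1.0, 0.0], [0.0, 0.0, 3.0]])  # indefinite
c = np.array([1.0, -2.0, 0.5])
grad = lambda x: A @ x + c          # gradient of a non-convex quadratic
x = rng.standard_normal(3)
ogr = OnlineGradientRegression(x, grad(x), beta=0.9)
for _ in range(200):
    x = rng.standard_normal(3)
    ogr.update(x, grad(x))
H = ogr.hessian()
```

The recovered H has a negative eigenvalue. A BFGS update, which keeps its approximation positive definite, cannot represent that curvature.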
【2】Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization
标题:机器学习训练管道的I/O性能预测建模:一种数据驱动的存储优化方法
链接:https://arxiv.org/abs/2512.06699
作者:Karthik Prabhakar
备注:20 pages, 10 figures
摘要:Modern machine learning training is increasingly bottlenecked by data I/O rather than compute: GPUs often sit below 50% utilization while waiting for data. This paper presents a machine learning approach to predict I/O performance and recommend optimal storage configurations for ML training pipelines. We collected 141 observations through systematic benchmarking across different storage backends (NVMe SSD, network-attached storage, in-memory filesystems), data formats, and access patterns, covering both low-level I/O operations and full training pipelines. Among seven regression models and three classification approaches, XGBoost achieved the best performance, with an R-squared of 0.991, predicting I/O throughput within 11.8% error on average. Feature importance analysis revealed that throughput metrics and batch size are the primary performance drivers. This data-driven approach can reduce configuration time from days of trial-and-error to minutes of predictive recommendation. The methodology is reproducible and extensible to other resource management problems in ML systems. Code and data are available at https://github.com/knkarthik01/gpu_storage_ml_project
【3】Optimized Machine Learning Methods for Studying the Thermodynamic Behavior of Complex Spin Systems
标题:用于研究复杂自旋系统热力学行为的优化机器学习方法
链接:https://arxiv.org/abs/2512.07458
作者:Dmitrii Kapitan,Pavel Ovchinnikov,Konstantin Soldatov,Petr Andriushchenko,Vitalii Kapitan
备注:16 pages, in Russian language, 8 figures, 2 tables
摘要:This paper presents a systematic study of the application of convolutional neural networks (CNNs) as an efficient and versatile tool for the analysis of critical and low-temperature phase states in spin system models. The problem of calculating the dependence of the average energy on the spatial distribution of exchange integrals for the Edwards-Anderson model on a square lattice with frustrated interactions is considered. We further construct a single convolutional classifier of phase states of the ferromagnetic Ising model on square, triangular, honeycomb, and kagome lattices, trained on configurations generated by the Swendsen-Wang cluster algorithm. Computed temperature profiles of the averaged posterior probability of the high-temperature phase form clear S-shaped curves that intersect in the vicinity of the theoretical critical temperatures and allow one to determine the critical temperature for the kagome lattice without additional retraining. It is shown that convolutional models substantially reduce the root-mean-square error (RMSE) compared with fully connected architectures and efficiently capture complex correlations between thermodynamic characteristics and the structure of magnetic correlated systems.
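A critical temperature can be read off an S-shaped posterior curve like those described above. One common heuristic is to locate where the averaged high-temperature probability crosses 0.5. The paper's exact criterion (intersection of curves) may differ. The sketch below interpolates that crossing linearly and tests it on a synthetic logistic curve, not on data from the paper. The curve is centred on the exact square-lattice Ising value T_c = 2/ln(1 + √2) ≈ 2.269 (J = k_B = 1).

```python
import numpy as np

def crossing_temperature(temps, p_high, level=0.5):
    """Linearly interpolate the temperature at which the averaged posterior
    probability of the high-temperature phase first reaches `level`."""
    temps, p_high = np.asarray(temps, float), np.asarray(p_high, float)
    i = np.nonzero(p_high >= level)[0][0]           # first point at/above level
    if i == 0:
        raise ValueError("curve starts at or above the level")
    t0, t1, p0, p1 = temps[i - 1], temps[i], p_high[i - 1], p_high[i]
    return t0 + (level - p0) * (t1 - t0) / (p1 - p0)

tc_exact = 2.0 / np.log(1.0 + np.sqrt(2.0))          # square-lattice Ising, J = k_B = 1
temps = np.linspace(1.5, 3.0, 61)
p_high = 1.0 / (1.0 + np.exp(-(temps - tc_exact) / 0.05))   # synthetic S-curve, not data
tc_est = crossing_temperature(temps, p_high)
```

Applying the same function to the classifier's curve for a lattice it was not tuned on, such as kagome, gives an estimate without retraining, in the spirit of the abstract.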
【4】Optimal and Diffusion Transports in Machine Learning
标题:机器学习中的最优传输与扩散传输
链接:https://arxiv.org/abs/2512.06797
作者:Gabriel Peyré
备注:Proc. 2026 International Congress of Mathematicians
摘要:Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This includes sampling via diffusion methods, optimizing the weights of neural networks, and analyzing the evolution of token distributions across layers of large language models. While the targeted applications differ (samples, weights, tokens), their mathematical descriptions share a common structure. A key idea is to switch from the Eulerian representation of densities to their Lagrangian counterpart through vector fields that advect particles. This dual view introduces challenges, notably the non-uniqueness of Lagrangian vector fields, but also opportunities to craft density evolutions and flows with favorable properties in terms of regularity, stability, and computational tractability. This survey presents an overview of these methods, with emphasis on two complementary approaches: diffusion methods, which rely on stochastic interpolation processes and underpin modern generative AI, and optimal transport, which defines interpolation by minimizing displacement cost. We illustrate how both approaches appear in applications ranging from sampling and neural network optimization to modeling the dynamics of transformers for large language models.
【5】Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions
标题:上下文强凸模拟优化:优化然后用不精确的解进行预测
链接:https://arxiv.org/abs/2512.06270
作者:Nifei Lin,Heng Luo,L. Jeff Hong
摘要:In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of covariates to approximate the optimal-solution function; in the online stage, decisions are obtained by evaluating this approximation at the observed covariate. The central theoretical challenge is to understand how the inexactness of solutions generated by simulation-optimization algorithms affects the optimality gap, an aspect overlooked in existing studies. To address this, we develop a unified analysis framework that explicitly accounts for both solution bias and variance. Using Polyak-Ruppert averaging SGD as an illustrative simulation-optimization algorithm, we analyze the optimality gap of OTP under four representative smoothing techniques: $k$ nearest neighbor, kernel smoothing, linear regression, and kernel ridge regression. We establish convergence rates, derive the optimal allocation of the computational budget $\Gamma$ between the number of design covariates and the per-covariate simulation effort, and demonstrate that the convergence rate can approximately achieve $\Gamma^{-1}$ under an appropriate smoothing technique and sample-allocation rule. Finally, through a numerical study, we validate the theoretical findings and demonstrate the effectiveness and practical value of the proposed approach.
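The two OTP stages can be made concrete on a toy problem. The problem below is an assumption, not the paper's test case: F(x; z) = E[(x − z − ξ)²]/2 with ξ ~ N(0, 1), so the optimal-solution function is x*(z) = z. Offline, Polyak-Ruppert averaged SGD produces an inexact solution at each design covariate. Online, a k-nearest-neighbor average of those solutions is evaluated at the observed covariate. The step-size exponent, k, and the budget split are illustrative choices, not the paper's optimal allocation.

```python
import numpy as np

def pr_sgd(grad_sample, x0, n_iter, rng, alpha=0.6):
    """Polyak-Ruppert averaged SGD with step size k^(-alpha)."""
    x, avg = x0, 0.0
    for k in range(1, n_iter + 1):
        x = x - k ** (-alpha) * grad_sample(x, rng)
        avg += (x - avg) / k          # running mean of the iterates
    return avg

def knn_predict(z_query, z_design, x_design, k=5):
    """Online stage: average the offline solutions of the k nearest covariates."""
    idx = np.argsort(np.abs(z_design - z_query))[:k]
    return x_design[idx].mean()

# Toy problem (assumption): F(x; z) = E[(x - z - xi)^2] / 2, xi ~ N(0, 1),
# so the optimal-solution function is x*(z) = z.
rng = np.random.default_rng(0)
n_design, per_design = 50, 2000       # budget Gamma = n_design * per_design
z_design = np.linspace(0.0, 1.0, n_design)
x_design = np.array([                  # offline stage: inexact solutions
    pr_sgd(lambda x, r, z=z: x - z - r.standard_normal(), 0.0, per_design, rng)
    for z in z_design
])
print(knn_predict(0.37, z_design, x_design))   # online stage, close to 0.37
```

Moving budget between `n_design` and `per_design` trades smoothing bias (sparser covariates) against solution variance (fewer SGD steps per covariate). That trade-off is what the paper's allocation result optimizes.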
【6】Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
标题:最优控制中的统一熵正则化:通过迭代软策略与路径积分解从经典目标出发并回归经典目标
链接:https://arxiv.org/abs/2512.06109
作者:Ajinkya Bhole,Mohammad Mahmoudi Filabadi,Guillaume Crevecoeur,Tom Lefebvre
摘要:This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions and assigns them independent weights, thereby generalizing the standard trajectory-level KL regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. We refer to the latter as soft-policy SOC and RSOC; they give rise to alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems, which means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, in which the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties, such as a linear Bellman equation, a path integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
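The "linear Bellman equation" property has a classical special case: Todorov's linearly solvable MDPs. In that setting the control cost is KL(u ‖ p) with unit weight, and the desirability z = exp(−V) satisfies z = exp(−q) · (P z). The paper's formulation generalizes this with separate policy and transition weights, which this sketch does not model. The chain, costs, and passive dynamics below are a hypothetical toy example.

```python
import numpy as np

# Toy first-exit problem (assumption): states 0..4 on a line, state 4 absorbing.
# Passive dynamics p: move left/right with probability 1/2 (reflecting at 0).
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5
for s in range(1, n - 1):
    P[s, s - 1] = P[s, s + 1] = 0.5
q = np.ones(n)            # running state cost
q[-1] = 0.0               # terminal cost
nonterm = np.arange(n - 1)
term = np.array([n - 1])

# With control cost KL(u || p), z = exp(-V) solves a LINEAR system:
# z = diag(exp(-q)) P z on non-terminal states, with z fixed on the terminal state.
G = np.diag(np.exp(-q[nonterm]))
z = np.empty(n)
z[term] = np.exp(-q[term])
A = np.eye(n - 1) - G @ P[np.ix_(nonterm, nonterm)]
rhs = G @ P[np.ix_(nonterm, term)] @ z[term]
z[nonterm] = np.linalg.solve(A, rhs)
V = -np.log(z)            # optimal cost-to-go

# Optimal controlled transitions: u*(s'|s) proportional to p(s'|s) z(s')
U = P[nonterm] * z
U /= U.sum(axis=1, keepdims=True)
```

One linear solve gives V without value iteration. The resulting policy satisfies the nonlinear KL-regularized Bellman equation V(s) = q(s) + KL(u*(·|s) ‖ p(·|s)) + Σ u* V.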
预测|估计(21篇)
【1】Provable Long-Range Benefits of Next-Token Prediction
标题:下一词元预测可证明的长程收益
链接:https://arxiv.org/abs/2512.07818
作者:Xinyuan Cao,Santosh S. Vempala
备注:66 pages, 5 figures
摘要:Why do modern language models, trained to do well on next-word prediction, appear to generate coherent documents and capture long-range structure? Here we show that next-token prediction is provably powerful for learning longer-range structure, even with common neural network architectures. Specifically, we prove that optimizing next-token prediction over a Recurrent Neural Network (RNN) yields a model that closely approximates the training distribution: for held-out documents sampled from the training distribution, no algorithm of bounded description length limited to examining the next $k$ tokens, for any $k$, can distinguish between $k$ consecutive tokens of such documents and $k$ tokens generated by the learned language model following the same prefix. We provide polynomial bounds (in $k$, independent of the document length) on the model size needed to achieve such $k$-token indistinguishability, offering a complexity-theoretic explanation for the long-range coherence observed in practice.
【2】The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds
标题:代理能力问题:通过信息论界限预测可解性
链接:https://arxiv.org/abs/2512.07631
作者:Shahar Lutati
摘要:When should an autonomous agent commit resources to a task? We introduce the Agent Capability Problem (ACP), a framework for predicting whether an agent can solve a problem under resource constraints. Rather than relying on empirical heuristics, ACP frames problem-solving as information acquisition: an agent requires $I_{\text{total}}$ bits to identify a solution and gains $I_{\text{step}}$ bits per action at cost $C_{\text{step}}$, yielding an effective cost $C_{\text{eff}} = (I_{\text{total}}/I_{\text{step}})\,C_{\text{step}}$ that predicts resource requirements before search. We prove that $C_{\text{eff}}$ lower-bounds expected cost and provide tight probabilistic upper bounds. Experimental validation shows that ACP predictions closely track actual agent performance, consistently bounding search effort while improving efficiency over greedy and random strategies. The framework generalizes across LLM-based and agentic workflows, linking principles from active learning, Bayesian optimization, and reinforcement learning through a unified information-theoretic lens.
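The effective-cost formula reduces to a short calculation: the number of actions needed to gather the required bits, times the cost per action. The sketch below uses hypothetical numbers to show that arithmetic. It does not reproduce the paper's upper bounds or its estimation of I_step for a real agent.

```python
import math

def effective_cost(i_total, i_step, c_step):
    """C_eff = (I_total / I_step) * C_step: actions needed to gather I_total bits
    at I_step bits per action, times the cost of each action."""
    if i_step <= 0:
        raise ValueError("each action must yield positive information")
    return (i_total / i_step) * c_step

# Hypothetical numbers: locating one solution among 1024 equally likely
# candidates needs log2(1024) = 10 bits; if each query yields 0.5 bits and
# costs 2 units, the predicted cost is (10 / 0.5) * 2 = 40 units.
i_total = math.log2(1024)
print(effective_cost(i_total, 0.5, 2.0))   # 40.0
```

The agent can compare such an estimate with its budget before starting a search. That is the commit-or-abstain decision the framework targets.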
【3】Time Series Foundation Models for Process Model Forecasting
标题:过程模型预测的时间序列基础模型
链接:https://arxiv.org/abs/2512.07624
作者:Yongbo Yu,Jari Peeperkorn,Johannes De Smedt,Jochen De Weerdt
摘要:Process Model Forecasting (PMF) aims to predict how the control-flow structure of a process evolves over time by modeling the temporal dynamics of directly-follows (DF) relations, complementing predictive process monitoring that focuses on single-case prefixes. Prior benchmarks show that machine learning and deep learning models provide only modest gains over statistical baselines, mainly due to the sparsity and heterogeneity of the DF time series. We investigate Time Series Foundation Models (TSFMs), large pre-trained models for generic time series, as an alternative for PMF. Using DF time series derived from real-life event logs, we compare zero-shot use of TSFMs, without additional training, with fine-tuned variants adapted on PMF-specific data. TSFMs generally achieve lower forecasting errors (MAE and RMSE) than traditional and specialized models trained from scratch on the same logs, indicating effective transfer of temporal structure from non-process domains. While fine-tuning can further improve accuracy, the gains are often small and may disappear on smaller or more complex datasets, so zero-shot use remains a strong default. Our study highlights the generalization capability and data efficiency of TSFMs for process-related time series and, to the best of our knowledge, provides the first systematic evaluation of temporal foundation models for PMF.
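The directly-follows (DF) time series that PMF forecasts can be built from an event log in a few steps. Sort each case's events by timestamp, take consecutive activity pairs (a, b), and count each pair per time window. Below is a minimal sketch under two assumptions: numeric timestamps, and a pair assigned to the window containing the timestamp of b. The paper's windowing may differ. Each resulting per-pair series is what a TSFM would forecast, zero-shot or after fine-tuning.

```python
from collections import Counter, defaultdict

def df_time_series(events, window):
    """Count directly-follows pairs (a, b) per time window.
    events: iterable of (case_id, activity, timestamp), numeric timestamps.
    A pair is assigned to the window containing the timestamp of b (assumption)."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((ts, activity))
    series = defaultdict(Counter)              # (a, b) -> {window index: count}
    for trace in traces.values():
        trace.sort()                           # chronological order within the case
        for (_, a), (ts_b, b) in zip(trace, trace[1:]):
            series[(a, b)][int(ts_b // window)] += 1
    return series

log = [("c1", "A", 0), ("c1", "B", 1), ("c1", "C", 5),
       ("c2", "A", 2), ("c2", "B", 6)]
df = df_time_series(log, window=4)
# df[("A", "B")] == {0: 1, 1: 1}; df[("B", "C")] == {1: 1}
```

Most pairs are absent from most windows. This produces the sparse, heterogeneous series that the abstract identifies as the main obstacle for models trained from scratch.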
【4】Weighted Contrastive Learning for Anomaly-Aware Time-Series Forecasting
标题:用于异常感知时间序列预测的加权对比学习
链接:https://arxiv.org/abs/2512.07569
作者:Joel Ekstrand,Tor Mattsson,Zahra Taghiyarrenani,Slawomir Nowaczyk,Jens Lundström,Mikael Lindén
摘要:Reliable forecasting of multivariate time series under anomalous conditions is crucial in applications such as ATM cash logistics, where sudden demand shifts can disrupt operations. Modern deep forecasters achieve high accuracy on normal data but often fail when distribution shifts occur. We propose Weighted Contrastive Adaptation (WECA), a weighted contrastive objective that aligns normal and anomaly-augmented representations, preserving anomaly-relevant information while maintaining consistency under benign variations. Evaluations on a nationwide ATM transaction dataset with domain-informed anomaly injection show that WECA improves SMAPE on anomaly-affected data by 6.1 percentage points compared to a normally trained baseline, with negligible degradation on normal data. These results demonstrate that WECA enhances forecasting reliability under anomalies without sacrificing performance during regular operations.
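The 6.1-point improvement above is reported in SMAPE, which is already a percentage, so the gain is an absolute difference in percentage points. Several SMAPE variants exist, and the abstract does not say which one was used. The sketch below assumes the common form 100/n · Σ 2|F − A| / (|A| + |F|) and scores terms with both values zero as zero error.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 2|F - A| / (|A| + |F|).
    Terms where both A and F are zero count as zero error (a convention choice)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    denom = np.abs(a) + np.abs(f)
    safe = np.where(denom == 0, 1.0, denom)            # avoid division by zero
    ratio = np.where(denom == 0, 0.0, 2.0 * np.abs(f - a) / safe)
    return 100.0 * ratio.mean()

# A 10% overshoot on one of two points gives about 4.76 % SMAPE.
print(smape([100, 200], [110, 200]))
```

Because each term is bounded by 200%, SMAPE stays finite during the demand spikes of an anomaly window. Plain MAPE can blow up when actual values are near zero.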
【5】FRWKV:Frequency-Domain Linear Attention for Long-Term Time Series Forecasting
标题:FRWKN:长期时间序列预测的频域线性关注
链接:https://arxiv.org/abs/2512.07539
作者:Qingyuan Yang,Shizhuo,Dongyue Chen,Da Teng,Zehua Gan
摘要:Traditional Transformers face a major bottleneck in long-sequence time series forecasting due to their quadratic complexity $(\mathcal{O}(T^2))$ and their limited ability to effectively exploit frequency-domain information. Inspired by RWKV's $\mathcal{O}(T)$ linear attention and frequency-domain modeling, we propose FRWKV, a frequency-domain linear-attention framework that overcomes these limitations. Our model integrates linear attention mechanisms with frequency-domain analysis, achieving $\mathcal{O}(T)$ computational complexity in the attention path while exploiting spectral information to enhance temporal feature representations for scalable long-sequence modeling. Across eight real-world datasets, FRWKV achieves a first-place average rank. Our ablation studies confirm the critical roles of both the linear attention and frequency-encoder components. This work demonstrates the powerful synergy between linear attention and frequency analysis, establishing a new paradigm for scalable time series modeling. Code is available at this repository: https://github.com/yangqingyuan-byte/FRWKV.
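The O(T) attention path works because causal linear attention can carry two running sums instead of forming the T × T score matrix: S = Σ φ(k_s) v_sᵀ and z = Σ φ(k_s). The sketch below is generic kernelized linear attention with the feature map φ(x) = elu(x) + 1, which is an assumption here. RWKV's WKV operator adds learned time decay, and FRWKV's frequency encoder is not shown. A quadratic reference implementation is included to confirm that the two computations agree.

```python
import numpy as np

def feature_map(x):
    """Positive feature map elu(x) + 1 (a common choice; an assumption here)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(q, k, v):
    """Causal linear attention in O(T): carry S = sum phi(k_s) v_s^T and
    z = sum phi(k_s) instead of forming the T x T attention matrix."""
    T, d = q.shape
    S = np.zeros((d, v.shape[1]))
    z = np.zeros(d)
    out = np.empty_like(v)
    for t in range(T):
        fk = feature_map(k[t])
        S += np.outer(fk, v[t])
        z += fk
        fq = feature_map(q[t])
        out[t] = fq @ S / (fq @ z)
    return out

def linear_attention_quadratic(q, k, v):
    """Reference O(T^2) computation of the same causal attention."""
    A = np.tril(feature_map(q) @ feature_map(k).T)   # causal kernel scores
    return (A @ v) / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T, d, dv = 16, 4, 3
q, k, v = rng.standard_normal((T, d)), rng.standard_normal((T, d)), rng.standard_normal((T, dv))
out = linear_attention_recurrent(q, k, v)
```

The state (S, z) has a fixed size of d × (d_v + 1), independent of T. Memory and per-step cost therefore stay constant as the forecasting horizon grows.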
【6】Efficient Low-Tubal-Rank Tensor Estimation via Alternating Preconditioned Gradient Descent
标题:通过交替预条件梯度下降的有效低管阶张量估计
链接:https://arxiv.org/abs/2512.07490
作者:Zhiyu Liu,Zhi Han,Yandong Tang,Jun Fan,Yao Wang
摘要:The problem of low-tubal-rank tensor estimation is a fundamental task with wide applications across high-dimensional signal processing, machine learning, and image science. Traditional approaches tackle such a problem by performing tensor singular value decomposition, which is computationally expensive and becomes infeasible for large-scale tensors. Recent approaches address this issue by factorizing the tensor into two smaller factor tensors and solving the resulting problem using gradient descent. However, this kind of approach requires an accurate estimate of the tensor rank: when the rank is overestimated, gradient descent and its variants converge significantly more slowly or even diverge. To address this problem, we propose an Alternating Preconditioned Gradient Descent (APGD) algorithm, which accelerates convergence in the over-parameterized setting by adding a preconditioning term to the original gradient and updating the two factors alternately. Based on certain geometric assumptions on the objective function, we establish linear convergence guarantees for more general low-tubal-rank tensor estimation problems. We then analyze the specific cases of low-tubal-rank tensor factorization and low-tubal-rank tensor recovery. Our theoretical results show that APGD achieves linear convergence even under over-parameterization, and the convergence rate is independent of the tensor condition number. Extensive simulations on synthetic data are carried out to validate our theoretical assertions.
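Under the t-product, a DFT along the third mode decouples the tensor problem into independent frontal-slice matrix problems. The matrix case is therefore the per-slice analogue of the method described above. The sketch below is a damped preconditioned alternating scheme for M ≈ L Rᵀ, written as an assumption in the spirit of the abstract, not the paper's exact APGD update. Each factor's gradient is right-multiplied by (FᵀF + λI)⁻¹, where F is the other factor. The fitted rank deliberately overestimates the true rank.

```python
import numpy as np

def apgd_factorize(M, rank, n_iter=500, eta=0.5, lam=1e-4, seed=0):
    """Alternating preconditioned gradient descent for M ~ L R^T.
    The gradient of each factor is right-multiplied by (F^T F + lam I)^{-1},
    where F is the other factor; lam keeps the inverse well defined when
    `rank` overestimates the true rank."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    L = 0.1 * rng.standard_normal((m, rank))
    R = 0.1 * rng.standard_normal((n, rank))
    I = np.eye(rank)
    for _ in range(n_iter):
        grad_L = (L @ R.T - M) @ R
        L = L - eta * grad_L @ np.linalg.inv(R.T @ R + lam * I)
        grad_R = (L @ R.T - M).T @ L              # uses the freshly updated L
        R = R - eta * grad_R @ np.linalg.inv(L.T @ L + lam * I)
    return L, R

rng = np.random.default_rng(42)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))   # true rank 3
L, R = apgd_factorize(M, rank=6)                                  # rank overestimated
rel_err = np.linalg.norm(L @ R.T - M) / np.linalg.norm(M)
```

With η ≤ 1, each preconditioned half-step cannot increase the squared residual. The preconditioner rescales each factor's step by the other factor's Gram matrix, which is why convergence does not depend on how ill-conditioned M is, in line with the paper's claim for the tensor case.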
【7】Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction
标题:少即是多:非均匀道路段对于公交车到达预测是有效的
链接:https://arxiv.org/abs/2512.07200
作者:Zhen Huang,Jiaxin Deng,Jiayu Xu,Junbiao Pang,Haitao Yu
摘要:In bus arrival time prediction, the process of organizing road infrastructure network data into homogeneous entities is known as segmentation. Segmenting a road network is widely recognized as the first and most critical step in developing an arrival time prediction system, particularly for auto-regressive-based approaches. Traditional methods typically employ a uniform segmentation strategy, which fails to account for varying physical constraints along roads, such as road conditions, intersections, and points of interest, thereby limiting prediction efficiency. In this paper, we propose a Reinforcement Learning (RL)-based approach to efficiently and adaptively learn non-uniform road segments for arrival time prediction. Our method decouples the prediction process into two stages: 1) Non-uniform road segments are extracted based on their impact scores using the proposed RL framework; and 2) A linear prediction model is applied to the selected segments to make predictions. This method ensures optimal segment selection while maintaining computational efficiency, offering a significant improvement over traditional uniform approaches. Furthermore, our experimental results suggest that the linear approach can even achieve better performance than more complex methods. Extensive experiments demonstrate the superiority of the proposed method, which not only enhances efficiency but also improves learning performance on large-scale benchmarks. The dataset and the code are publicly accessible at: https://github.com/pangjunbiao/Less-is-More.
【8】UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting
标题:UniDiff:多模态时间序列预测的统一扩散框架
链接:https://arxiv.org/abs/2512.07184
作者:Da Zhang,Bingyu Li,Zhuyuan Zhao,Junyu Gao,Feiping Nie,Xuelong Li
摘要:As multimodal data proliferates across diverse real-world applications, leveraging heterogeneous information such as texts and timestamps for accurate time series forecasting (TSF) has become a critical challenge. While diffusion models demonstrate exceptional performance in generation tasks, their application to TSF remains largely confined to modeling single-modality numerical sequences, overlooking the abundant cross-modal signals inherent in complex heterogeneous data. To address this gap, we propose UniDiff, a unified diffusion framework for multimodal time series forecasting. To process the numerical sequence, our framework first tokenizes the time series into patches, preserving local temporal dynamics by mapping each patch to an embedding space via a lightweight MLP. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism adaptively weighs and integrates structural information from timestamps and semantic context from texts in one step, enabling a flexible and efficient interplay between modalities. Furthermore, we introduce a novel classifier-free guidance mechanism designed for multi-source conditioning, allowing for decoupled control over the guidance strength of textual and temporal information during inference, which significantly enhances model robustness. Extensive experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
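The abstract does not give UniDiff's guidance formula. A common way to obtain decoupled control over several conditions is to add one guidance term per condition to the unconditional prediction, with a separate weight for each. Training then drops each condition independently so that the model learns the unconditional and single-condition predictions. The sketch below shows that generic form. The function names and the null-token convention are hypothetical.

```python
import numpy as np

def multi_source_cfg(eps_uncond, eps_text, eps_time, w_text, w_time):
    """Classifier-free guidance with a separate weight per condition:
    eps = eps_u + w_text (eps_text - eps_u) + w_time (eps_time - eps_u)."""
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_time * (eps_time - eps_uncond))

def drop_conditions(text_emb, time_emb, p_drop, rng, null_text, null_time):
    """Training-time dropout: replace each condition independently by a null token,
    so the denoiser also learns unconditional and single-condition predictions."""
    if rng.random() < p_drop:
        text_emb = null_text
    if rng.random() < p_drop:
        time_emb = null_time
    return text_emb, time_emb

eps_u, eps_txt, eps_tm = np.zeros(3), np.ones(3), 2.0 * np.ones(3)
guided = multi_source_cfg(eps_u, eps_txt, eps_tm, w_text=2.0, w_time=0.5)
```

Setting `w_text = 0` silences the text signal at inference without retraining. This is the kind of per-source control the abstract describes.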
【9】OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
标题:OXtal:一个预测有机晶体结构的全原子扩散模型
链接:https://arxiv.org/abs/2512.06987
作者:Emily Jin,Andrei Cristian Nica,Mikhail Galkin,Jarrid Rector-Brooks,Kin Long Kelvin Lee,Santiago Miret,Frances H. Arnold,Michael Bronstein,Avishek Joey Bose,Alexander Tong,Cheng-Hao Liu
摘要:Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization -- thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer $\text{RMSD}_1<0.5$ Å and attains over 80\% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
【10】Prediction with Expert Advice under Local Differential Privacy
标题:本地差分隐私下的专家建议预测
链接:https://arxiv.org/abs/2512.06971
作者:Ben Jacobsen,Kassem Fawaz
备注:19 pages, 3 figures
摘要:We study the classic problem of prediction with expert advice under the constraint of local differential privacy (LDP). In this context, we first show that a classical algorithm naturally satisfies LDP and then design two new algorithms that improve it: RW-AdaBatch and RW-Meta. For RW-AdaBatch, we exploit the limited-switching behavior induced by LDP to provide a novel form of privacy amplification that grows stronger on easier data, analogous to the shuffle model in offline learning. Drawing on the theory of random walks, we prove that this improvement carries essentially no utility cost. For RW-Meta, we develop a general method for privately selecting between experts that are themselves non-trivial learning algorithms, and we show that in the context of LDP this carries no extra privacy cost. In contrast, prior work has only considered data-independent experts. We also derive formal regret bounds that scale inversely with the degree of independence between experts. Our analysis is supplemented by evaluation on real-world data reported by hospitals during the COVID-19 pandemic; RW-Meta outperforms both the classical baseline and a state-of-the-art \textit{central} DP algorithm by 1.5-3$\times$ on the task of predicting which hospital will report the highest density of COVID patients each week.
【11】Hidden Leaks in Time Series Forecasting: How Data Leakage Affects LSTM Evaluation Across Configurations and Validation Strategies
标题:时间序列预测中的隐藏泄漏:数据泄漏如何影响不同配置和验证策略下的LSTM评估
链接:https://arxiv.org/abs/2512.06932
作者:Salma Albelali,Moataz Ahmed
摘要:Deep learning models, particularly Long Short-Term Memory (LSTM) networks, are widely used in time series forecasting due to their ability to capture complex temporal dependencies. However, evaluation integrity is often compromised by data leakage, a methodological flaw in which input-output sequences are constructed before dataset partitioning, allowing future information to unintentionally influence training. This study investigates the impact of data leakage on performance, focusing on how validation design mediates leakage sensitivity. Three widely used validation techniques (2-way split, 3-way split, and 10-fold cross-validation) are evaluated under both leaky (pre-split sequence generation) and clean conditions, with the latter mitigating leakage risk by enforcing temporal separation during data splitting prior to sequence construction. The effect of leakage is assessed using RMSE Gain, which measures the relative increase in RMSE caused by leakage, computed as the percentage difference between leaky and clean setups. Empirical results show that 10-fold cross-validation exhibits RMSE Gain values of up to 20.5% at extended lag steps. In contrast, 2-way and 3-way splits demonstrate greater robustness, typically maintaining RMSE Gain below 5% across diverse configurations. Moreover, input window size and lag step significantly influence leakage sensitivity: smaller windows and longer lags increase the risk of leakage, whereas larger windows help reduce it. These findings underscore the need for configuration-aware, leakage-resistant evaluation pipelines to ensure reliable performance estimation.
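The contrast between leaky and clean sequence construction fits in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline:

- the helper names are hypothetical;
- `lag` is read as the number of steps between the end of the input window and the target;
- the series values are time indices, so leakage can be detected directly.

```python
def make_windows(series, window, lag):
    """Build (input window, target) pairs; the target lies `lag` steps after the window ends."""
    pairs = []
    for start in range(len(series) - window - lag + 1):
        x = series[start:start + window]
        y = series[start + window + lag - 1]
        pairs.append((x, y))
    return pairs


def clean_split(series, window, lag, train_frac):
    """Clean: split the raw series in time first, then build windows inside each part."""
    cut = int(len(series) * train_frac)
    return make_windows(series[:cut], window, lag), make_windows(series[cut:], window, lag)


def kfold_on_windows(pairs, k, fold):
    """Leaky: k-fold CV over pre-built windows, so training pairs may come after the test block."""
    size = len(pairs) // k
    test = pairs[fold * size:(fold + 1) * size]
    train = pairs[:fold * size] + pairs[(fold + 1) * size:]
    return train, test


def uses_future(train, test):
    """True if any training pair touches a time index later than the first test target."""
    first_test_target = min(y for _, y in test)
    return any(max(max(x), y) > first_test_target for x, y in train)


series = list(range(100))  # values double as time indices
print(uses_future(*clean_split(series, window=5, lag=1, train_frac=0.8)))        # False
print(uses_future(*kfold_on_windows(make_windows(series, 5, 1), k=10, fold=0)))  # True
```

The second case is the pattern the abstract reports as most sensitive: 10-fold cross-validation over windows built before splitting.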
【12】Evaluating the Sensitivity of BiLSTM Forecasting Models to Sequence Length and Input Noise
标题:评估BiLSTM预测模型对序列长度和输入噪声的敏感性
链接:https://arxiv.org/abs/2512.06926
作者:Salma Albelali,Moataz Ahmed
摘要:Deep learning (DL) models, a specialized class of multilayer neural networks, have become central to time-series forecasting in critical domains such as environmental monitoring and the Internet of Things (IoT). Among these, Bidirectional Long Short-Term Memory (BiLSTM) architectures are particularly effective in capturing complex temporal dependencies. However, the robustness and generalization of such models are highly sensitive to input data characteristics - an aspect that remains underexplored in existing literature. This study presents a systematic empirical analysis of two key data-centric factors: input sequence length and additive noise. To support this investigation, a modular and reproducible forecasting pipeline is developed, incorporating standardized preprocessing, sequence generation, model training, validation, and evaluation. Controlled experiments are conducted on three real-world datasets with varying sampling frequencies to assess BiLSTM performance under different input conditions. The results yield three key findings: (1) longer input sequences significantly increase the risk of overfitting and data leakage, particularly in data-constrained environments; (2) additive noise consistently degrades predictive accuracy across sampling frequencies; and (3) the simultaneous presence of both factors results in the most substantial decline in model stability. While datasets with higher observation frequencies exhibit greater robustness, they remain vulnerable when both input challenges are present. These findings highlight important limitations in current DL-based forecasting pipelines and underscore the need for data-aware design strategies. This work contributes to a deeper understanding of DL model behavior in dynamic time-series environments and provides practical insights for developing more reliable and generalizable forecasting systems.
【13】A Novel Deep Neural Network Architecture for Real-Time Water Demand Forecasting
标题:用于实时需水量预测的新型深度神经网络架构
链接:https://arxiv.org/abs/2512.06714
作者:Tony Salloom,Okyay Kaynak,Wei He
摘要:Short-term water demand forecasting (StWDF) is the foundation for deriving an optimal control plan for water supply systems. Deep learning (DL) approaches provide the most accurate solutions for this purpose. However, they suffer from high complexity due to their massive number of parameters, as well as high forecasting error at extreme points. In this work, an effective method to reduce the error at these points is proposed. It extends the data by inserting virtual data points within the actual data to relieve the nonlinearity around the extremes. To our knowledge, this is the first work that addresses the problem of extreme points. Moreover, the water demand forecasting model proposed in this work is a novel DL model with relatively low complexity. The basic model uses a gated recurrent unit (GRU) to handle the sequential relationships in the historical demand data, while an unsupervised classification method, K-means, is introduced to create new features that enhance prediction accuracy with fewer parameters. Real data obtained from two different water plants in China are used to train and verify the proposed model. The prediction results and the comparison with the state of the art show that the proposed method reduces model complexity sixfold relative to prior work while maintaining the same accuracy. Furthermore, extending the data set reduces the error by about 30%, at the cost of longer training time.
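The abstract does not say how the virtual data are generated. One plausible reading, shown here only as an assumption, is linear interpolation between consecutive observations. That spreads sharp turns around peaks and troughs over more samples. The function name and the interpolation choice are hypothetical.

```python
def insert_virtual_points(series, k):
    """Insert k linearly interpolated 'virtual' samples between each pair of real samples.

    A series of n points becomes (n - 1) * (k + 1) + 1 points; real samples keep their values.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    out = []
    for a, b in zip(series, series[1:]):
        out.append(a)
        for j in range(1, k + 1):
            out.append(a + (b - a) * j / (k + 1))
    out.append(series[-1])
    return out


print(insert_virtual_points([0, 4, 2], k=1))  # [0, 2.0, 4, 3.0, 2]
```

The data set grows roughly by a factor of k + 1. That growth is consistent with the longer training time the abstract reports.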
【14】Quantum Temporal Convolutional Neural Networks for Cross-Sectional Equity Return Prediction: A Comparative Benchmark Study
标题:用于截面股票收益预测的量子时间卷积神经网络:比较基准研究
链接:https://arxiv.org/abs/2512.06630
作者:Chi-Sheng Chen,Xinyu Zhang,Rong Fu,Qiuzhe Xie,Fan Zhang
摘要:Quantum machine learning offers a promising pathway for enhancing stock market prediction, particularly in complex, noisy, and highly dynamic financial environments. However, many classical forecasting models struggle with noisy input, regime shifts, and limited generalization capacity. To address these challenges, we propose a Quantum Temporal Convolutional Neural Network (QTCNN) that combines a classical temporal encoder with parameter-efficient quantum convolution circuits for cross-sectional equity return prediction. The temporal encoder extracts multi-scale patterns from sequential technical indicators, while the quantum processing leverages superposition and entanglement to enhance feature representation and suppress overfitting. We conduct a comprehensive benchmarking study on the JPX Tokyo Stock Exchange dataset and evaluate predictions through long-short portfolio construction, using the out-of-sample Sharpe ratio as the primary performance metric. QTCNN achieves a Sharpe ratio of 0.538, outperforming the best classical baseline by approximately 72%. These results highlight the practical potential of quantum-enhanced forecasting models such as QTCNN for robust decision-making in quantitative finance.
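The evaluation protocol ranks stocks by predicted return, goes long the top and short the bottom, and reports the out-of-sample Sharpe ratio. It can be sketched as follows. The abstract does not specify these details, so all of them are assumptions for illustration:

- equal weighting within each leg;
- the leg fraction `q`;
- a zero risk-free rate;
- annualisation by the square root of `periods_per_year`.

The resulting numbers are therefore not directly comparable to the reported 0.538.

```python
import math


def long_short_returns(preds, rets, q):
    """Per-period return of an equal-weight long-short portfolio.

    preds, rets: one list per period with each stock's predicted / realised return.
    q: fraction of stocks in each leg (top q long, bottom q short).
    """
    out = []
    for p, r in zip(preds, rets):
        order = sorted(range(len(p)), key=lambda i: p[i])  # ascending predicted return
        m = max(1, int(len(p) * q))
        short_leg, long_leg = order[:m], order[-m:]
        out.append(sum(r[i] for i in long_leg) / m - sum(r[i] for i in short_leg) / m)
    return out


def sharpe(period_returns, periods_per_year=252):
    """Annualised Sharpe ratio with a zero risk-free rate and sample standard deviation."""
    mean = sum(period_returns) / len(period_returns)
    var = sum((x - mean) ** 2 for x in period_returns) / (len(period_returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```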
【15】On fine-tuning Boltz-2 for protein-protein affinity prediction
标题:微调Boltz-2用于蛋白质-蛋白质亲和力预测
链接:https://arxiv.org/abs/2512.06592
作者:James King,Lewis Cornwall,Andrei Cristian Nica,James Day,Aaron Sim,Neil Dalchau,Lilly Wollman,Joshua Meyers
备注:MLSB 2025
摘要:Accurate prediction of protein-protein binding affinity is vital for understanding molecular interactions and designing therapeutics. We adapt Boltz-2, a state-of-the-art structure-based protein-ligand affinity predictor, for protein-protein affinity regression and evaluate it on two datasets, TCR3d and PPB-affinity. Despite high structural accuracy, Boltz-2-PPI underperforms relative to sequence-based alternatives in both small- and larger-scale data regimes. Combining embeddings from Boltz-2-PPI with sequence-based embeddings yields complementary improvements, particularly for weaker sequence models, suggesting different signals are learned by sequence- and structure-based models. Our results echo known biases associated with training with structural data and suggest that current structure-based representations are not primed for performant affinity prediction.
【16】Automated Deep Learning Estimation of Anthropometric Measurements for Preparticipation Cardiovascular Screening
标题:用于参赛前心血管筛查的人体测量指标自动深度学习估计
链接:https://arxiv.org/abs/2512.06434
作者:Lucas R. Mareque,Ricardo L. Armentano,Leandro J. Cymberknop
备注:8 pages, 2 figures, 3 tables
摘要:Preparticipation cardiovascular examination (PPCE) aims to prevent sudden cardiac death (SCD) by identifying athletes with structural or electrical cardiac abnormalities. Anthropometric measurements, such as waist circumference, limb lengths, and torso proportions to detect Marfan syndrome, can indicate elevated cardiovascular risk. Traditional manual methods are labor-intensive, operator-dependent, and challenging to scale. We present a fully automated deep-learning approach to estimate five key anthropometric measurements from 2D synthetic human body images. Using a dataset of 100,000 images derived from 3D body meshes, we trained and evaluated VGG19, ResNet50, and DenseNet121 with fully connected layers for regression. All models achieved sub-centimeter accuracy, with ResNet50 performing best, achieving a mean MAE of 0.668 cm across all measurements. Our results demonstrate that deep learning can deliver accurate anthropometric data at scale, offering a practical tool to complement athlete screening protocols. Future work will validate the models on real-world images to extend applicability.
【17】Proportional integral derivative booster for neural networks-based time-series prediction: Case of water demand prediction
标题:面向基于神经网络的时间序列预测的比例-积分-微分(PID)增强器:需水量预测案例
链接:https://arxiv.org/abs/2512.06357
作者:Tony Salloom,Okyay Kaynak,Xinbo Yu,Wei He
备注:Engineering Applications of Artificial Intelligence 2022
摘要:Multi-step time-series prediction is an essential supporting step for decision-makers in several industrial areas. Artificial intelligence techniques that use a neural network component in various forms have frequently been used for this step in recent years. However, the complexity of the neural network structure remains a critical obstacle to prediction accuracy. In this paper, a method inspired by the proportional-integral-derivative (PID) control approach is investigated to enhance the performance of neural network models used for multi-step-ahead prediction of periodic time-series information while having a negligible impact on system complexity. The PID-based method is applied to the predicted value at each time step to bring that value closer to the real value. The water demand forecasting problem is considered as a case study, where two deep neural network models from the literature are used to demonstrate the effectiveness of the proposed boosting method. Furthermore, to demonstrate the applicability of this PID-based booster to other types of periodic time-series prediction problems, it is applied to enhance the accuracy of a neural network model used for multi-step forecasting of hourly energy consumption. The comparison between the results of the original prediction models and the results after applying the proposed technique demonstrates the superiority of the proposed method in terms of prediction accuracy and system complexity.
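A one-step version of the idea can be sketched as follows. The booster adds a PID correction to each new prediction, computed from the errors the base model made on already-observed steps. The exact error signal, the gains `kp`, `ki`, `kd`, and the one-step (rather than multi-step) setting are assumptions. The abstract does not give the authors' formulation.

```python
def pid_boost(predictions, actuals, kp, ki, kd):
    """Add a PID correction to each prediction using only errors observed before that step.

    predictions[t] is the base model's forecast for step t; actuals[t] becomes known after step t.
    The error signal is e[t-1] = actuals[t-1] - predictions[t-1].
    """
    boosted = []
    integral, err, prev_err = 0.0, 0.0, 0.0
    for t, y_hat in enumerate(predictions):
        if t > 0:
            prev_err, err = err, actuals[t - 1] - predictions[t - 1]
            integral += err
        derivative = err - prev_err
        boosted.append(y_hat + kp * err + ki * integral + kd * derivative)
    return boosted


# A base model that is always 1 unit too low is corrected from step 1 onward by the P term alone.
print(pid_boost([9, 9, 9, 9], [10, 10, 10, 10], kp=1.0, ki=0.0, kd=0.0))  # [9.0, 10.0, 10.0, 10.0]
```

Because the correction only adds a few arithmetic operations per step, it leaves the complexity of the underlying network untouched. That matches the abstract's claim of negligible impact on system complexity.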
【18】Distribution-informed Online Conformal Prediction
标题:基于分布的在线保形预测
链接:https://arxiv.org/abs/2512.07770
作者:Dongjian Hu,Junxi Wu,Shu-Tao Xia,Changliang Zou
摘要:Conformal prediction provides a pivotal and flexible technique for uncertainty quantification by constructing prediction sets with a predefined coverage rate. Many online conformal prediction methods have been developed to handle data distribution shifts in fully adversarial environments, but this worst-case view results in overly conservative prediction sets. We propose Conformal Optimistic Prediction (COP), an online conformal prediction algorithm that incorporates the underlying data pattern into its update rule. Using an estimated cumulative distribution function of the non-conformity scores, COP produces tighter prediction sets when a predictable pattern exists, while retaining valid coverage guarantees even when the estimates are inaccurate. We establish a joint bound on coverage and regret, which further confirms the validity of our approach. We also prove that COP achieves distribution-free, finite-sample coverage under arbitrary learning rates and can converge when the scores are i.i.d. Experimental results also show that COP achieves valid coverage and constructs shorter prediction intervals than other baselines.
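For context, the standard adversarial-style baseline that COP is positioned against is online quantile tracking. The score threshold rises after a miscoverage and falls otherwise. This sketch is not COP, which additionally uses an estimated CDF of the scores. The function name and step size `lr` are illustrative assumptions.

```python
def online_conformal(scores, alpha, lr, q0=0.0):
    """Online conformal thresholds via quantile tracking (a baseline, not COP).

    At step t the prediction set is {y : score(y) <= q_t}. Once the true score s_t is revealed,
    q moves up by lr * (1 - alpha) on a miss (s_t > q_t) and down by lr * alpha otherwise.
    Returns the thresholds used and the empirical miscoverage rate.
    """
    q, thresholds, misses = q0, [], 0
    for s in scores:
        thresholds.append(q)
        miss = 1.0 if s > q else 0.0
        misses += miss
        q += lr * (miss - alpha)
    return thresholds, misses / len(scores)


# Telescoping: misses/T - alpha = (q_T - q_0) / (lr * T), so bounded scores give coverage near 1 - alpha.
scores = [(i * 0.6180339887) % 1.0 for i in range(5000)]
_, rate = online_conformal(scores, alpha=0.1, lr=0.05)
print(round(rate, 2))  # 0.1
```

The guarantee holds for any score sequence, which is exactly why such sets can be conservative when the scores are in fact predictable.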
【19】Physics-Informed Neural Networks for Source Inversion and Parameters Estimation in Atmospheric Dispersion
标题:物理信息神经网络用于大气扩散中的源反演和参数估计
链接:https://arxiv.org/abs/2512.07755
作者:Brenda Anague,Bamdad Hosseini,Issa Karambal,Jean Medard Ngnotchouye
摘要:Recent studies have shown the success of deep learning in solving forward and inverse problems in engineering and scientific computing domains, such as physics-informed neural networks (PINNs). In the fields of atmospheric science and environmental monitoring, estimating emission source locations is a central task that further relies on multiple model parameters that dictate velocity profiles and diffusion parameters. Estimating these parameters at the same time as emission sources from scarce data is a difficult task. In this work, we achieve this by leveraging the flexibility and generality of PINNs. We use a weighted adaptive method based on the neural tangent kernels to solve a source inversion problem with parameter estimation on the 2D and 3D advection-diffusion equations with unknown velocity and diffusion coefficients that may vary in space and time. Our proposed weighted adaptive method is presented as an extension of PINNs for forward PDE problems to a highly ill-posed source inversion and parameter estimation problem. The key idea behind our methodology is to attempt the joint recovery of the solution, the sources along with the unknown parameters, thereby using the underlying partial differential equation as a constraint that couples multiple unknown functional parameters, leading to more efficient use of the limited information in the measurements. We present various numerical experiments, using different types of measurements that model practical engineering systems, to show that our proposed method is indeed successful and robust to additional noise in the measurements.
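As a worked statement of the problem, a generic PINN formulation of the joint inversion can be written in LaTeX. The notation is assumed for illustration:

- $c$ is the concentration, $\mathbf{v}$ the velocity field, $D$ the diffusion coefficient, and $S$ the source term, each represented by a network with parameters $\theta$;
- $(\mathbf{x}_i, t_i)$ are collocation points and $(\mathbf{x}_j, t_j)$ are measurement locations with observed values $c_j^{\mathrm{obs}}$;
- $\lambda_r$ and $\lambda_d$ are loss weights that an NTK-based adaptive scheme would update during training.

The abstract does not give the authors' exact loss or weighting rule.

```latex
\begin{aligned}
\mathcal{R}_\theta(\mathbf{x},t) &= \partial_t c_\theta + \nabla\cdot(\mathbf{v}_\theta\, c_\theta) - \nabla\cdot(D_\theta \nabla c_\theta) - S_\theta,\\
\mathcal{L}(\theta) &= \frac{\lambda_r}{N_r}\sum_{i=1}^{N_r} \mathcal{R}_\theta(\mathbf{x}_i,t_i)^2
  + \frac{\lambda_d}{N_d}\sum_{j=1}^{N_d} \bigl(c_\theta(\mathbf{x}_j,t_j) - c_j^{\mathrm{obs}}\bigr)^2.
\end{aligned}
```

Because $\mathbf{v}_\theta$, $D_\theta$ and $S_\theta$ enter the loss only through the residual, the PDE term couples them to the sparse measurements of $c$. This is the joint-recovery idea the abstract describes.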
【20】Machine learning in an expectation-maximisation framework for nowcasting
标题:用于即时预报的期望最大化框架中的机器学习
链接:https://arxiv.org/abs/2512.07335
作者:Paul Wilsens,Katrien Antonio,Gerda Claeskens
摘要:Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Leveraging the observable information to learn the complete information is called nowcasting. In practice, incomplete information is often a consequence of reporting or observation delays. In this paper, we propose an expectation-maximisation (EM) framework for nowcasting that uses machine learning techniques to model both the occurrence as well as the reporting process of events. We allow for the inclusion of covariate information specific to the occurrence and reporting periods as well as characteristics related to the entity for which events occurred. We demonstrate how the maximisation step and the information flow between EM iterations can be tailored to leverage the predictive power of neural networks and (extreme) gradient boosting machines (XGBoost). With simulation experiments, we show that we can effectively model both the occurrence and reporting of events when dealing with high-dimensional covariate information. In the presence of non-linear effects, we show that our methodology outperforms existing EM-based nowcasting frameworks that use generalised linear models in the maximisation step. Finally, we apply the framework to the reporting of Argentinian Covid-19 cases, where the XGBoost-based approach again is most performant.
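To make the EM structure concrete, here is a covariate-free baseline. Counts in a reporting triangle are modelled as Poisson with mean lam[t] * p[d], the occurrence intensity times the delay probability. The E-step fills not-yet-observable cells with their expected counts, and the M-step re-estimates lam and p in closed form. This is the classical multiplicative (chain-ladder-like) model, not the authors' method, which replaces the M-step with neural networks or XGBoost and adds covariates. The function name is hypothetical.

```python
def em_nowcast(triangle, iters=500):
    """EM for a reporting triangle without covariates.

    triangle[t][d] = count of events occurring in period t and reported with delay d,
    or None if that cell is not yet observable.
    Model: n[t][d] ~ Poisson(lam[t] * p[d]) with sum(p) == 1.
    Returns (lam, p): expected total events per occurrence period and the delay distribution.
    """
    T, D = len(triangle), len(triangle[0])
    lam = [1.0] * T
    p = [1.0 / D] * D
    for _ in range(iters):
        # E-step: replace each unobserved cell by its expected count under the current parameters.
        filled = [[c if c is not None else lam[t] * p[d] for d, c in enumerate(row)]
                  for t, row in enumerate(triangle)]
        # M-step: Poisson maximum-likelihood estimates given the completed table.
        lam = [sum(row) for row in filled]
        total = sum(lam)
        p = [sum(filled[t][d] for t in range(T)) / total for d in range(D)]
    return lam, p


triangle = [
    [5, 3, 2],
    [10, 6, None],
    [15, None, None],
]
lam, p = em_nowcast(triangle)
print([round(x, 3) for x in lam], [round(x, 3) for x in p])  # [10.0, 20.0, 30.0] [0.5, 0.3, 0.2]
```

In the abstract's framework, the closed-form M-step is where a GLM, a neural network, or XGBoost would be fitted instead, using covariates of the occurrence and reporting periods.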
【21】Equivariant Diffusion for Crystal Structure Prediction
标题:晶体结构预测的等变扩散
链接:https://arxiv.org/abs/2512.07289
作者:Peijia Lin,Pin Chen,Rui Jiao,Qing Mo,Jianhuan Cen,Wenbing Huang,Yang Liu,Dan Huang,Yutong Lu
备注:ICML 2024
摘要:In addressing the challenge of crystal structure prediction (CSP), symmetry-aware deep learning models, particularly diffusion models that treat CSP as a conditional generation task, have been extensively studied. However, ensuring permutation, rotation, and periodic translation equivariance during the diffusion process remains incompletely addressed. In this work, we propose EquiCSP, a novel equivariant diffusion-based generative model. We not only address the overlooked issue of lattice permutation equivariance in existing models, but also develop a unique noising algorithm that rigorously maintains periodic translation equivariance throughout both training and inference. Our experiments indicate that EquiCSP significantly surpasses existing models in generating accurate structures and converges faster during training.
其他神经网络|深度学习|模型|建模(42篇)
【1】Formalized Hopfield Networks and Boltzmann Machines
标题:形式化Hopfield网络和Boltzmann机
链接:https://arxiv.org/abs/2512.07766
作者:Matteo Cipollina,Michail Karatarakis,Freek Wiedijk
摘要:Neural networks are widely used, yet their analysis and verification remain challenging. In this work, we present a Lean 4 formalization of neural networks, covering both deterministic and stochastic models. We first formalize Hopfield networks, recurrent networks that store patterns as stable states. We prove convergence and the correctness of Hebbian learning, a training rule that updates network parameters to encode patterns, here limited to the case of pairwise-orthogonal patterns. We then consider stochastic networks, where updates are probabilistic and convergence is to a stationary distribution. As a canonical example, we formalize the dynamics of Boltzmann machines and prove their ergodicity, showing convergence to a unique stationary distribution using a new formalization of the Perron-Frobenius theorem.
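The Hebbian rule and the orthogonality restriction can be illustrated numerically. This is Python, not the authors' Lean 4 development, and the helper names are illustrative.

Consider ±1 patterns that are pairwise orthogonal, with n neurons and m stored patterns. The local field at a stored pattern q works out to (n − m)·q_i. So every stored pattern is a fixed point whenever m < n.

```python
def hebbian_weights(patterns):
    """Hebbian rule: W = sum over patterns of p p^T, with a zero diagonal (entries of p are +/-1)."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def update(W, state):
    """One synchronous sign update s_i <- sign(sum_j W_ij s_j); ties map to +1."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1 for row in W]


patterns = [[1, 1, 1, 1], [1, -1, 1, -1]]  # pairwise orthogonal, m = 2 < n = 4
W = hebbian_weights(patterns)
print([update(W, p) == p for p in patterns])  # [True, True]
```

With m = n the guarantee fails. For example, [[1, 1], [1, -1]] produce an all-zero weight matrix, and [1, -1] is not a fixed point.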
【2】Machine Learning: Progress and Prospects
标题:机器学习:进展与前景
链接:https://arxiv.org/abs/2512.07519
作者:Alexander Gammerman
备注:Inaugural Lecture. 18 pages, 13 figures, Published in 1997 by Royal Holloway, University of London, ISBN 0 900145 93 5
摘要:This Inaugural Lecture was given at Royal Holloway University of London in 1996. It covers an introduction to machine learning and describes various theoretical advances and practical projects in the field. The Lecture here is presented in its original format, but a few remarks have been added in 2025 to reflect recent developments, and the list of references has been updated to enhance the convenience and accuracy for readers. When did machine learning start? Maybe a good starting point is 1949, when Claude Shannon proposed a learning algorithm for chess-playing programs. Or maybe we should go back to the 1930s when Ronald Fisher developed discriminant analysis - a type of learning where the problem is to construct a decision rule that separates two types of vectors. Or could it be the 18th century when David Hume discussed the idea of induction? Or the 14th century, when William of Ockham formulated the principle of "simplicity" known as "Ockham's razor" (Ockham, by the way, is a small village not far from Royal Holloway). Or it may be that, like almost everything else in Western civilisation and culture, the origin of these ideas lies in the Mediterranean. After all, it was Aristotle who said that "we learn some things only by doing things". The field of machine learning has been greatly influenced by other disciplines and the subject is in itself not a very homogeneous discipline, but includes separate, overlapping subfields. There are many parallel lines of research in ML: inductive learning, neural networks, clustering, and theories of learning. They are all part of the more general field of machine learning.
【3】Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces
标题:探索可能的向量系统,以更快地训练具有预配置潜在空间的神经网络
链接:https://arxiv.org/abs/2512.07509
作者:Nikita Gabdullin
备注:9 pages, 5 figures, 1 table, 4 equations
摘要:Overall neural network (NN) performance is closely related to the properties of its embedding distribution in latent space (LS). It has recently been shown that predefined vector systems, specifically $A_n$ root system vectors, can be used as targets for latent space configurations (LSC) to ensure the desired LS structure. One of the main advantages of LSC is the possibility of training classifier NNs without classification layers, which facilitates training NNs on datasets with extremely large numbers of classes. This paper provides a more general overview of possible vector systems for NN training, along with their properties and methods for constructing them. These systems are used to configure the LS of encoders and vision transformers, significantly speeding up LSC training on ImageNet-1K and on datasets with 50k-600k classes. It is also shown that using the minimum number of LS dimensions for a specific number of classes results in faster convergence. The latter has potential advantages for reducing the size of vector databases used to store NN embeddings.
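For concreteness, the $A_n$ root system consists of the vectors e_i − e_j (i ≠ j) in R^(n+1). That gives n(n+1) targets of equal norm √2, all lying in the n-dimensional hyperplane where the coordinates sum to zero. A minimal Python sketch follows. How classes are assigned to roots, and the `min_rank_for` helper, are assumptions for illustration, not the paper's construction.

```python
def a_n_roots(n):
    """All roots e_i - e_j (i != j) of A_n, as integer vectors of length n + 1."""
    dim = n + 1
    roots = []
    for i in range(dim):
        for j in range(dim):
            if i != j:
                v = [0] * dim
                v[i], v[j] = 1, -1
                roots.append(v)
    return roots


def min_rank_for(num_classes):
    """Smallest n whose A_n root system has at least one distinct root per class."""
    n = 1
    while n * (n + 1) < num_classes:
        n += 1
    return n


# Hypothetical use: class k is trained to align its embedding with roots[k] (e.g. via a cosine loss).
roots = a_n_roots(min_rank_for(1000))
print(len(roots[0]), len(roots))  # 33 1056
```

Because the root count grows quadratically in n, a latent dimension on the order of the square root of the class count suffices.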
【4】KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models
标题:KAN-Dreamer:将Kolmogorov-Arnold网络作为世界模型中的函数逼近器
链接:https://arxiv.org/abs/2512.07437
作者:Chenwei Shi,Xueyu Luan
备注:23 pages, 8 figures, 3 tables
摘要:DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to Multi-Layer Perceptrons (MLPs), offering superior parameter efficiency and interpretability. To mitigate KANs' computational overhead, variants like FastKAN leverage Radial Basis Functions (RBFs) to accelerate inference. In this work, we investigate integrating KAN architectures into the DreamerV3 framework. We introduce KAN-Dreamer, replacing specific MLP and convolutional components of DreamerV3 with KAN and FastKAN layers. To ensure efficiency within the JAX-based World Model, we implement a tailored, fully vectorized version with simplified grid management. We structure our investigation into three subsystems: Visual Perception, Latent Prediction, and Behavior Learning. Empirical evaluations on the DeepMind Control Suite (walker_walk) analyze sample efficiency, training time, and asymptotic performance. Experimental results demonstrate that utilizing our adapted FastKAN as a drop-in replacement for the Reward and Continue predictors yields performance on par with the original MLP-based architecture, maintaining parity in both sample efficiency and training speed. This report serves as a preliminary study for future developments in KAN-based world models.
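As a rough illustration of what a FastKAN-style layer computes: each input coordinate is expanded in Gaussian radial basis functions on a fixed grid. Each output is then a learned weighted sum of those basis values, so every input-output edge carries its own learnable univariate function.

The following choices are assumptions: the grid range, the number of centers, and a bandwidth equal to the grid spacing. Any base-activation branch or normalisation a full implementation may include is omitted, and this is not the authors' JAX code.

```python
import math


def rbf_features(x, grid_min=-2.0, grid_max=2.0, num_centers=8):
    """Gaussian RBF expansion of a scalar on a uniform grid; bandwidth = grid spacing."""
    h = (grid_max - grid_min) / (num_centers - 1)
    centers = [grid_min + k * h for k in range(num_centers)]
    return [math.exp(-((x - c) / h) ** 2) for c in centers]


def kan_layer(xs, weights, **grid):
    """y_j = sum_i sum_k weights[j][i][k] * phi_k(x_i): one learnable univariate function per edge."""
    feats = [rbf_features(x, **grid) for x in xs]
    return [sum(w_ik * f_ik for w_i, f_i in zip(w_j, feats) for w_ik, f_ik in zip(w_i, f_i))
            for w_j in weights]


print(kan_layer([0.0], [[[0.0, 1.0, 0.0]]], grid_min=-1.0, grid_max=1.0, num_centers=3))  # [1.0]
```

Compared with B-spline KANs, evaluating a fixed set of Gaussians is a simple, easily vectorised operation. That is the efficiency motivation the abstract attributes to FastKAN.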
【5】A Geometric Unification of Concept Learning with Concept Cones
标题:基于概念锥的概念学习几何统一
链接:https://arxiv.org/abs/2512.07355
作者:Alexandre Rocchi-Henry,Thomas Fel,Gianni Franchi
备注:22 pages
摘要:Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs), which prescribe what a concept should be, and Sparse Autoencoders (SAEs), which discover what concepts emerge. While CBMs use supervision to align activations with human-labeled concepts, SAEs rely on sparse coding to uncover emergent ones. We show that both paradigms instantiate the same geometric structure: each learns a set of linear directions in activation space whose nonnegative combinations form a concept cone. Supervised and unsupervised methods thus differ not in kind but in how they select this cone. Building on this view, we propose an operational bridge between the two paradigms. CBMs provide human-defined reference geometries, while SAEs can be evaluated by how well their learned cones approximate or contain those of CBMs. This containment framework yields quantitative metrics linking inductive biases, such as SAE type, sparsity, or expansion ratio, to the emergence of plausible concepts. (We adopt the terminology of Jacovi and Goldberg (2020), who distinguish between faithful explanations, which accurately reflect model computations, and plausible explanations, which align with human intuition and domain knowledge. CBM concepts are plausible by construction, being selected or annotated by humans, though not necessarily faithful to the true latent factors that organise the data manifold.) Using these metrics, we uncover a "sweet spot" in both sparsity and expansion factor that maximizes both geometric and semantic alignment with CBM concepts. Overall, our work unifies supervised and unsupervised concept discovery through a shared geometric framework, providing principled metrics to measure SAE progress and assess how well discovered concepts align with plausible human concepts.
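The cone-containment idea can be made concrete with a small Python sketch. A CBM concept direction c lies in the cone of SAE directions d_1..d_K exactly when the minimum over a ≥ 0 of ||Σ a_k d_k − c|| is zero. The relative residual then measures how far outside the cone c falls. The projected-gradient solver, the tolerance, and the `containment_score` metric are illustrative assumptions, not the metrics defined in the paper.

```python
import math


def cone_residual(directions, c, steps=5000):
    """Distance from c to the cone {sum_k a_k * d_k : a_k >= 0} spanned by `directions`.

    Solves the nonnegative least-squares problem by projected gradient descent.
    """
    k = len(directions)
    gram = [[sum(x * y for x, y in zip(di, dj)) for dj in directions] for di in directions]
    dc = [sum(x * y for x, y in zip(di, c)) for di in directions]
    lr = 1.0 / sum(gram[i][i] for i in range(k))  # 1/trace(G) never exceeds 1/lambda_max(G)
    a = [0.0] * k
    for _ in range(steps):
        grad = [sum(gram[i][j] * a[j] for j in range(k)) - dc[i] for i in range(k)]
        a = [max(0.0, ai - lr * gi) for ai, gi in zip(a, grad)]  # gradient step, then clip to a >= 0
    approx = [sum(a[j] * directions[j][t] for j in range(k)) for t in range(len(c))]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(approx, c)))


def containment_score(sae_dirs, cbm_dirs, tol=1e-3):
    """Fraction of CBM concept directions lying (up to a relative tolerance) inside the SAE cone."""
    inside = 0
    for c in cbm_dirs:
        norm = math.sqrt(sum(x * x for x in c))
        if cone_residual(sae_dirs, c) <= tol * norm:
            inside += 1
    return inside / len(cbm_dirs)


sae = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(containment_score(sae, [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))  # 0.5
```

Nonnegativity is what makes this a cone test rather than a subspace test. For instance, −d_1 is in the span of the SAE directions but not in their cone.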
【6】M-STAR: Multi-Scale Spatiotemporal Autoregression for Human Mobility Modeling
标题:M-STAR:用于人类移动建模的多尺度时空自回归
链接:https://arxiv.org/abs/2512.07314
作者:Yuxiao Luo,Songming Zhang,Sijie Ruan,Siran Chen,Kang Liu,Yang Xu,Yu Zheng,Ling Yin
摘要:Modeling human mobility is vital for extensive applications such as transportation planning and epidemic modeling. With the rise of the Artificial Intelligence Generated Content (AIGC) paradigm, recent works explore synthetic trajectory generation using autoregressive and diffusion models. While these methods show promise for generating single-day trajectories, they remain limited by inefficiencies in long-term generation (e.g., weekly trajectories) and a lack of explicit spatiotemporal multi-scale modeling. This study proposes Multi-Scale Spatio-Temporal AutoRegression (M-STAR), a new framework that generates long-term trajectories through a coarse-to-fine spatiotemporal prediction process. M-STAR combines a Multi-scale Spatiotemporal Tokenizer that encodes hierarchical mobility patterns with a Transformer-based decoder for next-scale autoregressive prediction. Experiments on two real-world datasets show that M-STAR outperforms existing methods in fidelity and significantly improves generation speed. The data and codes are available at https://github.com/YuxiaoLuo0013/M-STAR.
【7】Learning-Augmented Ski Rental with Discrete Distributions: A Bayesian Approach
标题:具有离散分布的学习增强滑雪租赁:Bayesian方法
链接:https://arxiv.org/abs/2512.07313
作者:Bosun Kang,Hyejun Park,Chenglin Fan
备注:7 pages
摘要:We revisit the classic ski rental problem through the lens of Bayesian decision-making and machine-learned predictions. While traditional algorithms minimize worst-case cost without assumptions, and recent learning-augmented approaches leverage noisy forecasts with robustness guarantees, our work unifies these perspectives. We propose a discrete Bayesian framework that maintains exact posterior distributions over the time horizon, enabling principled uncertainty quantification and seamless incorporation of expert priors. Our algorithm achieves prior-dependent competitive guarantees and gracefully interpolates between worst-case and fully-informed settings. Our extensive experimental evaluation demonstrates superior empirical performance across diverse scenarios, achieving near-optimal results under accurate priors while maintaining robust worst-case guarantees. This framework naturally extends to incorporate multiple predictions, non-uniform priors, and contextual information, highlighting the practical advantages of Bayesian reasoning in online decision problems with imperfect predictions.
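As a concrete instance of the Bayesian view, this sketch chooses the buy day that minimises expected cost under a known discrete prior over the season length. Renting costs 1 per day and buying costs `buy_price`. It covers only the fully informed end of the spectrum the abstract describes. It carries no worst-case guarantee, performs no posterior updates, and is not the authors' algorithm. The names are hypothetical.

```python
def expected_cost(prior, buy_price, k):
    """Expected cost of 'rent on days 1..k-1, buy on day k if still skiing'.

    prior: dict {season_length_in_days: probability}; renting costs 1 per day.
    k=None means never buy.
    """
    total = 0.0
    for n, prob in prior.items():
        if k is None or n < k:
            cost = n  # season ends before the buy day: rent every day
        else:
            cost = (k - 1) + buy_price  # rented k-1 days, then bought
        total += prob * cost
    return total


def bayes_buy_day(prior, buy_price):
    """Buy day (or None for 'never buy') with minimum expected cost under the prior."""
    candidates = [None] + list(range(1, max(prior) + 1))
    return min(candidates, key=lambda k: expected_cost(prior, buy_price, k))


prior = {1: 0.5, 100: 0.5}  # a short or a very long season, equally likely
k = bayes_buy_day(prior, buy_price=10)
print(k, expected_cost(prior, 10, k))  # 2 6.0
```

Under an accurate prior this beats the worst-case break-even rule, but a misspecified prior can make it arbitrarily bad. A learning-augmented method must guard against exactly that case.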
【8】PINE: Pipeline for Important Node Exploration in Attributed Networks
标题:PINE:属性网络中重要节点探索的管道
链接:https://arxiv.org/abs/2512.07244
作者:Elizaveta Kovtun,Maksim Makarenko,Natalia Semenova,Alexey Zaytsev,Semen Budennyy
摘要:Graphs with semantically attributed nodes are a common data structure in a wide range of domains, such as interlinked web data or citation networks of scientific publications. The essential problem for such data is to determine which nodes carry greater importance than the others, a task that markedly enhances system monitoring and management. Traditional methods for identifying important nodes in networks rely on centrality measures, such as node degree or the more complex PageRank. However, they consider only the network structure, neglecting the rich node attributes. Recent methods adopt neural networks capable of handling node features, but they require supervision. This work addresses the identified gap, the absence of approaches that are both unsupervised and attribute-aware, by introducing a Pipeline for Important Node Exploration (PINE). At the core of the proposed framework is an attention-based graph model that incorporates node semantic features when learning structural graph properties. PINE's node importance scores are derived from the resulting attention distribution. We demonstrate the superior performance of PINE on various homogeneous and heterogeneous attributed networks. As an industry-implemented system, PINE tackles the real-world challenge of unsupervised identification of key entities within large-scale enterprise graphs.
【9】MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling
标题:MUSE:一个简单而有效的基于搜索的多模式框架,用于终身用户兴趣建模
链接:https://arxiv.org/abs/2512.07216
作者:Bin Wu,Feifan Yang,Zhangming Chan,Yu-Ran Gu,Jiawei Feng,Chao Yi,Xiang-Rong Sheng,Han Zhu,Jian Xu,Mang Ye,Bo Zheng
摘要:Lifelong user interest modeling is crucial for industrial recommender systems, yet existing approaches rely predominantly on ID-based features, suffering from poor generalization on long-tail items and limited semantic expressiveness. While recent works explore multimodal representations for behavior retrieval in the General Search Unit (GSU), they often neglect multimodal integration in the fine-grained modeling stage, the Exact Search Unit (ESU). In this work, we present a systematic analysis of how to effectively leverage multimodal signals across both stages of the two-stage lifelong modeling framework. Our key insight is that simplicity suffices in the GSU: lightweight cosine similarity with high-quality multimodal embeddings outperforms complex retrieval mechanisms. In contrast, the ESU demands richer multimodal sequence modeling and effective ID-multimodal fusion to unlock its full potential. Guided by these principles, we propose MUSE, a simple yet effective multimodal search-based framework. MUSE has been deployed in the Taobao display advertising system, enabling the modeling of 100K-length user behavior sequences and delivering significant gains in top-line metrics with negligible online latency overhead. To foster community research, we share industrial deployment practices and open-source the first large-scale dataset featuring ultra-long behavior sequences paired with high-quality multimodal embeddings. Our code and data are available at https://taobao-mm.github.io.
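The GSU finding is that plain cosine similarity over good multimodal embeddings is enough for retrieval. In code, that amounts to a top-k search like this one. The function names, embedding sizes, and `k` are assumptions. A deployed system over 100K-length sequences would use batched matrix operations or an approximate-nearest-neighbour index rather than a Python loop.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def gsu_retrieve(target_emb, history_embs, k):
    """Indices of the k past behaviours whose embeddings are most cosine-similar to the target item."""
    scores = [cosine(target_emb, h) for h in history_embs]
    return sorted(range(len(history_embs)), key=lambda i: scores[i], reverse=True)[:k]


history = [[0.0, 1.0], [1.0, 0.1], [2.0, 0.0], [-1.0, 0.0]]
print(gsu_retrieve([1.0, 0.0], history, k=2))  # [2, 1]
```

The retrieved subsequence would then be passed to the ESU, where the abstract places the richer ID-multimodal fusion.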
【10】Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation
标题:从所见中采样:基于嵌入观测的随机微分方程扩散桥的视觉运动策略学习
链接:https://arxiv.org/abs/2512.07212
作者:Zhaoyang Liu,Mokai Pan,Zhongyi Wang,Kaizhen Zhu,Haotao Lu,Jingya Wang,Ye Shi
摘要:Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing approaches typically treat observations as high-level conditioning inputs to the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, sampling must begin from random Gaussian noise, weakening the coupling between perception and control and often yielding suboptimal performance. We introduce BridgePolicy, a generative visuomotor policy that explicitly embeds observations within the stochastic differential equation via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy enables sampling to start from a rich, informative prior rather than random noise, substantially improving precision and reliability in control. A key challenge is that classical diffusion bridges connect distributions with matched dimensionality, whereas robotic observations are heterogeneous and multi-modal and do not naturally align with the action space. To address this, we design a multi-modal fusion module and a semantic aligner that unify visual and state inputs and align observation and action representations, making the bridge applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and five real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies.
【11】AutoLugano: A Deep Learning Framework for Fully Automated Lymphoma Segmentation and Lugano Staging on FDG-PET/CT
标题:AutoLugano:基于FDG-PET/CT的全自动淋巴瘤分割与Lugano分期深度学习框架
链接:https://arxiv.org/abs/2512.07206
作者:Boyang Pan,Zeyu Zhang,Hongyu Meng,Bin Cui,Yingying Zhang,Wenli Hou,Junhao Li,Langdi Zhong,Xiaoxiao Chen,Xiaoyu Xu,Changjin Zuo,Chao Cheng,Nan-Jie Gong
摘要:Purpose: To develop a fully automated deep learning system, AutoLugano, for end-to-end lymphoma classification by performing lesion segmentation, anatomical localization, and automated Lugano staging from baseline FDG-PET/CT scans. Methods: The AutoLugano system processes baseline FDG-PET/CT scans through three sequential modules: (1) Anatomy-Informed Lesion Segmentation, in which a 3D nnU-Net model trained on multi-channel inputs performs automated lesion detection; (2) Atlas-based Anatomical Localization, which leverages the TotalSegmentator toolkit to map segmented lesions to 21 predefined lymph node regions using deterministic anatomical rules; and (3) Automated Lugano Staging, where the spatial distribution of involved regions is translated into Lugano stages and therapeutic groups (Limited vs. Advanced Stage). The system was trained on the public autoPET dataset (n=1,007) and externally validated on an independent cohort of 67 patients. Performance was assessed using accuracy, sensitivity, specificity, and F1-score for regional involvement detection, and by staging agreement. Results: On the external validation set, the proposed model demonstrated robust performance, achieving an overall accuracy of 88.31%, sensitivity of 74.47%, specificity of 94.21%, and an F1-score of 80.80% for regional involvement detection, outperforming baseline models. Most notably, for the critical clinical task of therapeutic stratification (Limited vs. Advanced Stage), the system achieved a high accuracy of 85.07%, with a specificity of 90.48% and a sensitivity of 82.61%. Conclusion: AutoLugano represents the first fully automated, end-to-end pipeline that translates a single baseline FDG-PET/CT scan into a complete Lugano stage. This study demonstrates its strong potential to assist in initial staging, treatment stratification, and supporting clinical decision-making.
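Module (3) above maps the set of involved regions to a stage and a therapeutic group. The sketch below applies the basic Ann Arbor/Lugano nodal rules and is illustrative only. It ignores spleen involvement, contiguous extranodal (E) lesions, and bulky disease, all of which affect real staging. The region names and `REGION_SIDE` mapping are hypothetical; they are not AutoLugano's 21-region atlas.

```python
# Hypothetical nodal regions and their side of the diaphragm (not AutoLugano's atlas).
REGION_SIDE = {
    "cervical_left": "above", "cervical_right": "above",
    "axillary_left": "above", "axillary_right": "above",
    "mediastinal": "above",
    "para_aortic": "below", "mesenteric": "below",
    "inguinal_left": "below", "inguinal_right": "below",
}


def lugano_stage(involved_regions, disseminated_extranodal=False):
    """Simplified Lugano/Ann Arbor nodal staging -> (stage, therapeutic group).

    Ignores spleen, contiguous extranodal (E) lesions and bulky disease.
    """
    regions = set(involved_regions)
    if not regions:
        raise ValueError("at least one involved region is required")
    if disseminated_extranodal:
        stage = "IV"  # disseminated extralymphatic involvement
    elif len({REGION_SIDE[r] for r in regions}) == 2:
        stage = "III"  # nodal regions on both sides of the diaphragm
    elif len(regions) >= 2:
        stage = "II"  # two or more regions on the same side
    else:
        stage = "I"  # a single nodal region
    group = "Limited" if stage in ("I", "II") else "Advanced"
    return stage, group
```

The staging step is a deterministic rule over the localization output. Segmentation and localization errors therefore propagate directly into the stage.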
【12】Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search
标题:通过Concrete彩票搜索保持网络训练动态以赢得彩票
链接:https://arxiv.org/abs/2512.07142
作者:Tanay Arora,Christof Teuscher
备注:This work plans to be submitted to the IEEE for possible publication
摘要:The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networks. However, state-of-the-art methods of drawing these tickets, like Lottery Ticket Rewinding (LTR), are computationally prohibitive, while more efficient saliency-based Pruning-at-Initialization (PaI) techniques suffer from a significant accuracy-sparsity trade-off and fail basic sanity checks. In this work, we argue that PaI's reliance on first-order saliency metrics, which ignore inter-weight dependencies, contributes substantially to this performance gap, especially in the sparse regime. To address this, we introduce Concrete Ticket Search (CTS), an algorithm that frames subnetwork discovery as a holistic combinatorial optimization problem. By leveraging a Concrete relaxation of the discrete search space and a novel gradient balancing scheme (GRADBALANCE) to control sparsity, CTS efficiently identifies high-performing subnetworks near initialization without requiring sensitive hyperparameter tuning. Motivated by recent works on lottery ticket training dynamics, we further propose a knowledge distillation-inspired family of pruning objectives, finding that minimizing the reverse Kullback-Leibler divergence between sparse and dense network outputs (CTS-KL) is particularly effective. Experiments on various image classification tasks show that CTS produces subnetworks that robustly pass sanity checks and achieve accuracy comparable to or exceeding LTR, while requiring only a small fraction of the computation. For example, on ResNet-20 on CIFAR10, it reaches 99.3% sparsity with 74.0% accuracy in 7.9 minutes, while LTR attains the same sparsity with 68.3% accuracy in 95.2 minutes. CTS's subnetworks outperform saliency-based methods across all sparsities, but its advantage over LTR is most pronounced in the highly sparse regime.
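Two ingredients named above can be sketched concretely: a binary Concrete (relaxed Bernoulli) mask over weights, and a reverse KL objective between sparse and dense outputs. This NumPy sketch is an illustration, not CTS itself. GRADBALANCE and the training loop are omitted. Reading "reverse KL" as KL(p_sparse || p_dense) is an assumption, and all function names are hypothetical.

```python
import numpy as np


def binary_concrete_mask(logits, temperature, rng):
    """Relaxed Bernoulli (binary Concrete) sample in [0, 1] for every weight."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)  # logistic noise
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def reverse_kl(sparse_logits, dense_logits):
    """Batch mean of KL(p_sparse || p_dense): the sparse net's distribution comes first."""
    log_p = log_softmax(sparse_logits)
    log_q = log_softmax(dense_logits)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))


# Usage: mask a toy linear layer and score it against its dense counterpart.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x = rng.normal(size=(5, 8))
mask = binary_concrete_mask(np.zeros_like(W), temperature=0.5, rng=rng)
loss = reverse_kl(x @ (W * mask), x @ W)
```

Lower temperatures push mask entries toward 0 or 1, which is what lets a relaxation like this approach a discrete subnetwork.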
【13】Chromatic Feature Vectors for 2-Trees: Exact Formulas for Partition Enumeration with Network Applications
标题:2-树的色特征向量:划分计数的精确公式及其网络应用
链接:https://arxiv.org/abs/2512.07120
作者:J. Allagan,G. Morgan,S. Langley,R. Lopez-Bonilla,V. Deriglazov
备注:18 pages
摘要:We establish closed-form enumeration formulas for chromatic feature vectors of 2-trees under the bichromatic triangle constraint. These efficiently computable structural features derive from constrained graph colorings where each triangle uses exactly two colors, forbidding monochromatic and rainbow triangles, a constraint arising in distributed systems where components avoid complete concentration or isolation. For theta graphs Theta_n, we prove r_k(Theta_n) = S(n-2, k-1) for k >= 3 (Stirling numbers of the second kind) and r_2(Theta_n) = 2^(n-2) + 1, computable in O(n) time. For fan graphs Phi_n, we establish r_2(Phi_n) = F_{n+1} (Fibonacci numbers) and derive explicit formulas r_k(Phi_n) = sum_{t=k-1}^{n-1} a_{n-1,t} * S(t, k-1) with efficiently computable binomial coefficients, achieving O(n^2) computation per component. Unlike classical chromatic polynomials, which assign identical features to all n-vertex 2-trees, bichromatic constraints provide informative structural features. While not complete graph invariants, these features capture meaningful structural properties through connections to Fibonacci polynomials, Bell numbers, and independent set enumeration. Applications include Byzantine fault tolerance in hierarchical networks, VM allocation in cloud computing, and secret-sharing protocols in distributed cryptography.
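The closed forms quoted above can be checked by brute force on small graphs. The sketch below enumerates every set partition of the vertices and counts those with exactly k blocks in which each triangle sees exactly two blocks. Several interpretations are assumptions, not the paper's definitions: r_k counts unlabeled partitions into exactly k color classes; Theta_n is the 2-tree whose vertices 2..n-1 all attach to one shared edge; Phi_n is a hub joined to a path on n-1 vertices. The general fan formula is not checked because the coefficients a_{n-1,t} are not defined in the abstract.

```python
def set_partitions(n):
    """Yield every partition of {0, ..., n-1} as a restricted growth string of block labels."""
    def extend(prefix, n_blocks):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(n_blocks + 1):  # reuse an existing block, or open a new one
            yield from extend(prefix + [label], max(n_blocks, label + 1))
    yield from extend([], 0)


def r_k(n, triangles, k):
    """Count partitions into exactly k blocks where every triangle sees exactly two blocks."""
    return sum(
        1
        for labels in set_partitions(n)
        if max(labels) + 1 == k
        and all(len({labels[a], labels[b], labels[c]}) == 2 for a, b, c in triangles)
    )


def theta_triangles(n):
    # Assumed Theta_n: vertices 0 and 1 share an edge; each of 2..n-1 is joined to both.
    return [(0, 1, w) for w in range(2, n)]


def fan_triangles(n):
    # Assumed Phi_n: vertex 0 is a hub joined to every vertex of the path 1-2-...-(n-1).
    return [(0, i, i + 1) for i in range(1, n - 1)]


def stirling2(n, k):
    """Stirling numbers of the second kind via S(i, j) = j*S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def fib(m):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a
```

Under these interpretations the Theta_n identities follow from a simple case split. If the shared edge's two endpoints share a color, the remaining n-2 vertices may use any other k-1 colors, giving S(n-2, k-1). Otherwise, every other vertex must reuse one of the two endpoint colors, giving the extra 2^(n-2) two-color partitions.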
【14】PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes
标题:PlantBiMoE:具有SparseMoE的植物基因组双向基础模型
链接:https://arxiv.org/abs/2512.07113
作者:Kepeng Lin,Qizhe Zhang,Rui Wang,Xuehai Hu,Wei Xu
备注:6 pages, 5 figures, accepted to BIBM
摘要:Understanding the underlying linguistic rules of plant genomes remains a fundamental challenge in computational biology. Recent advances, including AgroNT and PDLLMs, have made notable progress, although they suffer from excessive parameter size and limited ability to model the bidirectional nature of DNA strands, respectively. To address these limitations, we propose PlantBiMoE, a lightweight and expressive plant genome language model that integrates bidirectional Mamba and a Sparse Mixture-of-Experts (SparseMoE) framework. The bidirectional Mamba enables the model to effectively capture structural dependencies across both the forward and reverse DNA strands, while SparseMoE significantly reduces the number of active parameters, improving computational efficiency without sacrificing modeling capacity. We evaluated and tested our model on the Modified Plants Genome Benchmark (MPGB), an enhanced genomic benchmark, which consolidates 31 datasets across 11 representative tasks, with input sequence lengths ranging from 50 to 6,000 bp. Experimental results demonstrate that PlantBiMoE achieves the best performance on 20 out of 31 datasets and the best average performance compared with existing models. In summary, these results demonstrate that our model can effectively represent plant genomic sequences, serving as a robust computational tool for diverse genomic tasks, while making substantive contributions to plant genomics, gene editing, and synthetic biology. The code is available at: https://github.com/HUST-Keep-Lin/PlantBiMoE
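The efficiency claim for SparseMoE rests on routing each token to only a few experts. The sketch below shows generic top-k routing, a common SparseMoE design, in NumPy. It is not PlantBiMoE's code, and the bidirectional Mamba part is not shown. The names `sparse_moe`, `router_w`, and the linear experts in the usage example are hypothetical.

```python
import numpy as np


def sparse_moe(x, router_w, experts, k=2):
    """Top-k sparse mixture-of-experts over tokens.

    x:        (n_tokens, d) token representations
    router_w: (d, n_experts) router projection
    experts:  list of callables, each mapping (m, d) -> (m, d)
    """
    logits = x @ router_w                            # (n_tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :k]        # the k best experts per token
    top_logits = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)       # softmax over the selected experts only
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slot = np.nonzero(top == e)            # tokens routed to expert e
        if rows.size:                                # unselected experts do no work
            out[rows] += gates[rows, slot, None] * expert(x[rows])
    return out, top, gates


# Usage with four hypothetical linear experts.
rng = np.random.default_rng(0)
d, n_exp = 8, 4
mats = [rng.normal(size=(d, d)) for _ in range(n_exp)]
experts = [lambda h, W=W: h @ W for W in mats]
x = rng.normal(size=(10, d))
router = rng.normal(size=(d, n_exp))
out, top, gates = sparse_moe(x, router, experts, k=2)
```

With k=2 of 4 experts, each token activates only half of the expert parameters. This is the "fewer active parameters" effect described above.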
【15】Always Keep Your Promises: DynamicLRP, A Model-Agnostic Solution To Layer-Wise Relevance Propagation
标题:始终信守承诺:DynamicLRP,一种与模型无关的逐层相关性传播方案
链接:https://arxiv.org/abs/2512.07010
作者:Kevin Lee,Pablo Millan Arias
备注:Work in progress, (12 pages manuscript, 6 figures, 6 tables, 3 pages references, 14 pages appendix)
摘要:Layer-wise Relevance Propagation (LRP) provides principled attribution for neural networks through conservation properties and foundations in Deep Taylor Decomposition. However, existing implementations operate at the module level, requiring architecture-specific propagation rules and modifications. These requirements limit the range of supported target models and the sustainability of implementations as architectures evolve. We introduce DynamicLRP, a model-agnostic LRP framework operating at the tensor operation level. By decomposing attribution to individual operations within computation graphs and introducing a novel mechanism for deferred activation resolution, named the Promise System, our approach achieves true architecture agnosticity while maintaining LRP's theoretical guarantees. This design operates independently of backpropagation machinery, enabling operation on arbitrary computation graphs without model modification and side-by-side execution with gradient backpropagation. Being based on computation graphs, this method is theoretically extensible to other deep learning libraries that support auto-differentiation. We demonstrate faithfulness matching or exceeding specialized implementations (1.77 vs 1.69 ABPC on VGG, equivalent performance on ViT, and 93.70% and 95.06% top-1 attribution accuracy for explaining RoBERTa-large and Flan-T5-large answers on SQuADv2, respectively) while maintaining practical efficiency on models with hundreds of millions of parameters. We achieved 99.92% node coverage across 31,465 computation graph nodes from 15 diverse architectures, including state-space models (Mamba), audio transformers (Whisper), and multimodal systems (DePlot), without any model-specific code, using rules implemented for 47 fundamental operations. Our operation-level decomposition and Promise System establish a sustainable, extensible foundation for LRP across evolving architectures.
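Operation-level LRP attaches one relevance rule to each tensor operation instead of each module. The sketch below gives the standard LRP-epsilon rule for a bias-free matrix product and a proportional split for elementwise addition. It illustrates the idea under those assumptions; DynamicLRP's Promise System and its actual rule set are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def _stabilize(z, eps):
    # Epsilon-rule stabilizer: push denominators away from zero, keeping their sign.
    return z + eps * np.where(z >= 0, 1.0, -1.0)


def lrp_matmul(x, W, R_out, eps=1e-9):
    """LRP-epsilon for z = x @ W (no bias): redistribute R_out onto the input x."""
    z = x @ W
    s = R_out / _stabilize(z, eps)  # (n_out,)
    return x * (W @ s)              # R_i = x_i * sum_j W_ij * s_j


def lrp_add(a, b, R_out, eps=1e-9):
    """Split the relevance of z = a + b in proportion to each operand's contribution."""
    denom = _stabilize(a + b, eps)
    return a * R_out / denom, b * R_out / denom
```

Without a bias and with a tiny epsilon, both rules are (nearly) conservative: the input relevances sum to the output relevance. This is the property that lets per-operation rules be chained across an arbitrary graph.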
【16】Flash Multi-Head Feed-Forward Network
标题:Flash多头前馈网络
链接:https://arxiv.org/abs/2512.06989
作者:Minshen Zhang,Xiang Hu,Jianguo Li,Wei Wu,Kewei Tu
备注:17 pages, 8 figures
摘要:We explore Multi-Head FFN (MH-FFN) as a replacement for the FFN in the Transformer architecture, motivated by the structural similarity between single-head attention and FFN. While multi-head mechanisms enhance expressivity in attention, naively applying them to FFNs faces two challenges: memory consumption scaling with the head count, and an imbalanced ratio between the growing intermediate size and the fixed head dimension as models scale, which degrades scalability and expressive power. To address these challenges, we propose Flash Multi-Head FFN (FlashMHF), with two key innovations: an I/O-aware fused kernel computing outputs online in SRAM akin to FlashAttention, and a design using dynamically weighted parallel sub-networks to maintain a balanced ratio between intermediate and head dimensions. Validated on models from 128M to 1.3B parameters, FlashMHF consistently improves perplexity and downstream task accuracy over SwiGLU FFNs, while reducing peak memory usage by 3-5x and accelerating inference by up to 1.08x. Our work establishes the multi-head design as a superior architectural principle for FFNs, presenting FlashMHF as a powerful, efficient, and scalable alternative to FFNs in Transformers.
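For reference, the sketch below shows the naive multi-head FFN that the abstract starts from. It splits features into heads, applies a per-head two-layer FFN, then mixes the heads with an output projection. This is one plausible reading of MH-FFN in NumPy, not FlashMHF. There is no fused kernel and no dynamic sub-network weighting, and the weight shapes and names are assumptions.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def multi_head_ffn(x, w_up, w_down, w_out):
    """Naive multi-head FFN.

    x:      (n_tokens, d_model), with d_model = n_heads * d_head
    w_up:   (n_heads, d_head, d_ff_head)  per-head up-projection
    w_down: (n_heads, d_ff_head, d_head)  per-head down-projection
    w_out:  (d_model, d_model)            output projection mixing the heads
    """
    n_heads, d_head, _ = w_up.shape
    n = x.shape[0]
    h = x.reshape(n, n_heads, d_head)                  # split features into heads
    # The full (n, heads, d_ff_head) intermediate is materialized here; an
    # I/O-aware fused kernel, as described above, would avoid storing it.
    hidden = gelu(np.einsum("nhd,hdf->nhf", h, w_up))
    y = np.einsum("nhf,hfd->nhd", hidden, w_down)      # back to (n, heads, d_head)
    return y.reshape(n, n_heads * d_head) @ w_out
```

With a single head this collapses to an ordinary two-layer FFN followed by a projection. That makes the comparison with standard FFNs easy to sanity-check.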
【17】On Memory: A comparison of memory mechanisms in world models
标题:论记忆:世界模型中记忆机制的比较
链接:https://arxiv.org/abs/2512.06983
作者:Eli J. Laird,Corey Clark
备注:10 pages, 1 figure
摘要:World models enable agents to plan within imagined environments by predicting future states conditioned on past observations and actions. However, their ability to plan over long horizons is limited by the effective memory span of the backbone architecture. This limitation leads to perceptual drift in long rollouts, hindering the model's capacity to perform loop closures within imagined trajectories. In this work, we investigate the effective memory span of transformer-based world models through an analysis of several memory augmentation mechanisms. We introduce a taxonomy that distinguishes between memory encoding and memory injection mechanisms, motivating their roles in extending the world model's memory through the lens of residual stream dynamics. Using a state recall evaluation task, we measure the memory recall of each mechanism and analyze its respective trade-offs. Our findings show that memory mechanisms improve the effective memory span in vision transformers and provide a path to completing loop closures within a world model's imagination.
【18】Joint Learning of Feasibility-Aware Signal Temporal Logic and BarrierNet for Robust and Correct Control
标题:可行性感知信号时序逻辑与BarrierNet的联合学习,实现鲁棒且正确的控制
链接:https://arxiv.org/abs/2512.06973
作者:Shuo Liu,Wenliang Liu,Wei Xiao,Calin A. Belta
备注:16 pages, 11 figures
摘要:Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled the specification-driven synthesis of complex robotic behaviors. However, existing CBF-STL approaches typically rely on fixed hyperparameters and myopic, per-time step optimization, which can lead to overly conservative behavior, infeasibility near tight input limits, and difficulty satisfying long-horizon STL tasks. To address these limitations, we propose a feasibility-aware learning framework that embeds trainable, time-varying High Order Control Barrier Functions (HOCBFs) into a differentiable Quadratic Program (dQP). Our approach provides a systematic procedure for constructing time-varying HOCBF constraints for a broad fragment of STL and introduces a unified robustness measure that jointly captures STL satisfaction, QP feasibility, and control-bound compliance. Three neural networks (InitNet, RefNet, and an extended BarrierNet) collaborate to generate reference inputs and adapt constraint-related hyperparameters automatically over time and across initial conditions, reducing conservativeness while maximizing robustness. The resulting controller achieves STL satisfaction with strictly feasible dQPs and requires no manual tuning. Simulation results demonstrate that the proposed framework maintains high STL robustness under tight input bounds and significantly outperforms fixed-parameter and non-adaptive baselines in complex environments.
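The infeasibility issue mentioned above already shows up in the simplest CBF-QP. The sketch below uses a first-order CBF on a 1-D single integrator `x' = u` with safe set `h(x) = x_max - x >= 0` and a fixed class-K gain `alpha`. The one-variable QP has a closed-form clip solution. This is a simplified stand-in, not the paper's HOCBF/dQP controller, where networks would adapt parameters like `alpha` over time. The function name is hypothetical.

```python
def cbf_filter(x, u_ref, x_max, alpha, u_max):
    """Minimally modify u_ref so that x' = u keeps h(x) = x_max - x >= 0.

    CBF condition: dh/dt + alpha * h >= 0  ->  -u + alpha * (x_max - x) >= 0
                                           ->  u <= alpha * (x_max - x)
    Input bounds:  -u_max <= u <= u_max
    Returns argmin (u - u_ref)^2 subject to both, or None if the QP is infeasible.
    """
    upper = min(u_max, alpha * (x_max - x))
    lower = -u_max
    if lower > upper:
        return None  # the safety constraint and the input bound conflict
    return min(max(u_ref, lower), upper)
```

For example, with `x_max=10`, `alpha=1`, and `u_max=2`, the filter passes `u_ref=1.0` through at `x=0`. It clips the input to 0.5 at `x=9.5`. It becomes infeasible once `x > 12`, because recovering would need an input beyond the bound. A fixed `alpha` trades conservativeness against this kind of infeasibility, which is the knob the learned, time-varying parameters are meant to tune.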
【19】Angular Regularization for Positive-Unlabeled Learning on the Hypersphere
标题:超球面上正例-无标记学习的角度正则化
链接:https://arxiv.org/abs/2512.06785
作者:Vasileios Sevetlidis,George Pavlidis,Antonios Gasteratos
备注:Featured Certification, J2C Certification. Transactions on Machine Learning Research, 2025
摘要:Positive-Unlabeled (PU) learning addresses classification problems where only a subset of positive examples is labeled and the remaining data is unlabeled, making explicit negative supervision unavailable. Existing PU methods often rely on negative-risk estimation or pseudo-labeling, which either require strong distributional assumptions or can collapse in high-dimensional settings. We propose AngularPU, a novel PU framework that operates on the unit hypersphere using cosine similarity and angular margin. In our formulation, the positive class is represented by a learnable prototype vector, and classification reduces to thresholding the cosine similarity between an embedding and this prototype, eliminating the need for explicit negative modeling. To counteract the tendency of unlabeled embeddings to cluster near the positive prototype, we introduce an angular regularizer that encourages dispersion of the unlabeled set over the hypersphere, improving separation. We provide theoretical guarantees on the Bayes-optimality of the angular decision rule, consistency of the learned prototype, and the effect of the regularizer on the unlabeled distribution. Experiments on benchmark datasets demonstrate that AngularPU achieves competitive or superior performance compared to state-of-the-art PU methods, particularly in settings with scarce positives and high-dimensional embeddings, while offering geometric interpretability and scalability.
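The decision rule and the dispersion idea above can be sketched in a few lines of NumPy. Classification thresholds cosine similarity to a prototype. The regularizer shown here is the mean pairwise cosine similarity of the unlabeled embeddings, where lower values mean more spread. That regularizer form, the threshold value, and all names are assumptions; the paper's exact regularizer is not given in the abstract.

```python
import numpy as np


def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def angular_scores(emb, prototype):
    """Cosine similarity of each embedding to the positive-class prototype."""
    return normalize(emb) @ normalize(prototype)


def predict_positive(emb, prototype, threshold):
    return angular_scores(emb, prototype) >= threshold


def dispersion_penalty(unlabeled_emb):
    """Mean pairwise cosine similarity of unlabeled embeddings (lower = more dispersed)."""
    u = normalize(unlabeled_emb)
    n = u.shape[0]
    s = u.sum(axis=0)
    # For unit vectors, sum_{i != j} <u_i, u_j> = ||sum_i u_i||^2 - n.
    return float((s @ s - n) / (n * (n - 1)))
```

The identity in the last comment avoids forming the n-by-n similarity matrix, so the penalty costs O(n d) rather than O(n^2 d).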
【20】Becoming Experienced Judges: Selective Test-Time Learning for Evaluators
标题:成为经验丰富的评判者:评估器的选择性测试时学习
链接:https://arxiv.org/abs/2512.06751
作者:Seungyeon Jwa,Daechul Ahn,Reokyoung Kim,Dongyeop Kang,Jonghyun Choi
摘要:Automatic evaluation with large language models, commonly known as LLM-as-a-judge, is now standard across reasoning and alignment tasks. Despite evaluating many samples in deployment, these evaluators typically (i) treat each case independently, missing the opportunity to accumulate experience, and (ii) rely on a single fixed prompt for all cases, neglecting the need for sample-specific evaluation criteria. We introduce Learning While Evaluating (LWE), a framework that allows evaluators to improve sequentially at inference time without requiring training or validation sets. LWE maintains an evolving meta-prompt that (i) produces sample-specific evaluation instructions and (ii) refines itself through self-generated feedback. Furthermore, we propose Selective LWE, which updates the meta-prompt only on self-inconsistent cases, focusing computation where it matters most. This selective approach retains the benefits of sequential learning while being far more cost-effective. Across two pairwise comparison benchmarks, Selective LWE outperforms strong baselines, empirically demonstrating that evaluators can improve during sequential testing with a simple selective update, learning most from the cases they struggle with.
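The selective-update loop described above can be sketched with stand-in functions. Here "self-inconsistent" is taken to mean that the judge's verdict flips when the two answers swap positions. That operationalization is an assumption, as are the `judge` and `update_meta_prompt` callables, which would wrap LLM calls in practice. The sample-specific instruction generation performed by LWE's meta-prompt is not shown.

```python
from typing import Callable

# (meta_prompt, question, first_answer, second_answer) -> "1" or "2" (hypothetical signature)
Judge = Callable[[str, str, str, str], str]


def selective_lwe(cases, judge: Judge, update_meta_prompt, meta_prompt: str):
    """Judge pairwise cases in sequence; revise the meta-prompt only on inconsistent ones."""
    verdicts, n_updates = [], 0
    for question, a, b in cases:
        first = judge(meta_prompt, question, a, b)    # "1" means a wins
        swapped = judge(meta_prompt, question, b, a)  # "2" means a wins
        if (first == "1") != (swapped == "2"):        # verdict depends on order
            meta_prompt = update_meta_prompt(meta_prompt, question, a, b)
            n_updates += 1
            first = judge(meta_prompt, question, a, b)
        verdicts.append("a" if first == "1" else "b")
    return verdicts, meta_prompt, n_updates


# Toy stand-ins: a position-biased judge that a meta-prompt update "fixes".
def fake_judge(meta, q, x, y):
    if "compare content" in meta:
        return "1" if len(x) >= len(y) else "2"
    return "1"  # always prefers whichever answer is shown first


def fake_update(meta, q, a, b):
    return meta + " | compare content"


cases = [("q1", "long answer", "short"), ("q2", "tiny", "much longer answer")]
verdicts, meta, n = selective_lwe(cases, fake_judge, fake_update, "judge")
```

In this toy run, only the first case triggers an update. The second case is already consistent under the revised meta-prompt, which illustrates how selectivity saves calls.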
【21】Decoding Motor Behavior Using Deep Learning and Reservoir Computing
标题:使用深度学习和储备池计算解码运动行为
链接:https://arxiv.org/abs/2512.06725
作者:Tian Lan
备注:10 pages, 3 figures
摘要:We present a novel approach to EEG decoding for non-invasive brain machine interfaces (BMIs), with a focus on motor-behavior classification. While conventional convolutional architectures such as EEGNet and DeepConvNet are effective in capturing local spatial patterns, they are markedly less suited for modeling long-range temporal dependencies and nonlinear dynamics. To address this limitation, we integrate an Echo State Network (ESN), a prominent paradigm in reservoir computing, into the decoding pipeline. ESNs construct a high-dimensional, sparsely connected recurrent reservoir that excels at tracking temporal dynamics, thereby complementing the spatial representational power of CNNs. Evaluated on a skateboard-trick EEG dataset preprocessed via the PREP pipeline and implemented in MNE-Python, our ESNNet achieves 83.2% within-subject and 51.3% LOSO accuracies, surpassing widely used CNN-based baselines. Code is available at https://github.com/Yutiankunkun/Motion-Decoding-Using-Biosignals
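The reservoir component described above is usually built as a fixed, sparse, randomly connected recurrent network with a trained linear readout. The sketch below is a generic leaky ESN in NumPy that uses the final reservoir state of each trial as features and fits a ridge-regression readout. It is not the paper's ESNNet, which combines the reservoir with CNNs, and every hyperparameter shown is a hypothetical default. Scaling to a spectral radius below 1 is a common heuristic for the echo-state property, not a guarantee.

```python
import numpy as np


class EchoStateNetwork:
    def __init__(self, n_inputs, n_reservoir=200, spectral_radius=0.9,
                 density=0.1, leak=0.3, ridge=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-1, 1, size=(n_reservoir, n_inputs))
        w = rng.uniform(-1, 1, size=(n_reservoir, n_reservoir))
        w *= rng.random((n_reservoir, n_reservoir)) < density          # sparse connectivity
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))    # rescale spectral radius
        self.w, self.leak, self.ridge, self.w_out = w, leak, ridge, None

    def final_state(self, u):
        """Drive the fixed reservoir with u of shape (T, n_inputs); return the last state."""
        x = np.zeros(self.w.shape[0])
        for u_t in u:
            x = (1 - self.leak) * x + self.leak * np.tanh(self.w_in @ u_t + self.w @ x)
        return x

    def fit(self, trials, labels):
        X = np.stack([self.final_state(u) for u in trials])   # (n_trials, n_reservoir)
        Y = np.eye(labels.max() + 1)[labels]                   # one-hot targets
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.w_out = np.linalg.solve(A, X.T @ Y)               # only the readout is trained
        return self

    def predict(self, trials):
        X = np.stack([self.final_state(u) for u in trials])
        return np.argmax(X @ self.w_out, axis=1)
```

Only `w_out` is learned, which makes training a single linear solve. This cheap readout is the usual argument for pairing reservoirs with heavier spatial feature extractors.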
【22】Pathway to $O(\sqrt{d})$ Complexity bound under Wasserstein metric of flow-based models
链接:https://arxiv.org/abs/2512.06702
作者:Xiangjun Meng,Zhongjian Wang
摘要:We provide attainable analytical tools to estimate the error of flow-based generative models under the Wasserstein metric and to establish the optimal sampling iteration complexity bound with respect to dimension as $O(\sqrt{d})$. We show this error can be explicitly controlled by two parts: the Lipschitzness of the push-forward maps of the backward flow, which scales independently of the dimension; and a local discretization error that scales as $O(\sqrt{d})$ in terms of dimension. The former is related to the existence of Lipschitz changes of variables induced by the (heat) flow. The latter depends on the regularity of the score function in both spatial and temporal directions. These assumptions are valid in the flow-based generative model associated with the Föllmer process and $1$-rectified flow under the Gaussian tail assumption. As a consequence, we show that the sampling iteration complexity grows linearly with the square root of the trace of the covariance operator, which is related to the invariant distribution of the forward process.
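To connect the trace-based statement with the headline $O(\sqrt{d})$ rate, consider an isotropic covariance $\Sigma = \sigma^2 I_d$. This is an illustrative special case, not the paper's general setting. There, the square root of the trace reduces directly to a $\sqrt{d}$ factor:

$$
\operatorname{tr}(\Sigma) = \operatorname{tr}(\sigma^2 I_d) = \sum_{i=1}^{d} \sigma^2 = \sigma^2 d
\qquad\Longrightarrow\qquad
\sqrt{\operatorname{tr}(\Sigma)} = \sigma\sqrt{d}.
$$

An iteration count that grows linearly in $\sqrt{\operatorname{tr}(\Sigma)}$ is therefore $O(\sqrt{d})$ for fixed $\sigma$.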
【23】QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling
标题:QL-LSTM:用于稳定长序列建模的参数高效LSTM
链接:https://arxiv.org/abs/2512.06582
【24】Deep Manifold Part 2: Neural Network Mathematics
标题:深度流形第2部分:神经网络数学
链接:https://arxiv.org/abs/2512.06563
【25】Approximate Multiplier Induced Error Propagation in Deep Neural Networks
标题:深度神经网络中近似乘法器引起的误差传播
链接:https://arxiv.org/abs/2512.06537
作者:A. M. H. H. Alahakoon,Hassaan Saadat,Darshana Jayasinghe,Sri Parameswaran
备注:7 pages, Submitted to Design and Automation Conference (DAC 2026)
【26】Neural expressiveness for beyond importance model compression
标题:超越重要性模型压缩的神经表达能力
链接:https://arxiv.org/abs/2512.06440
作者:Angelos-Christos Maroudis,Sotirios Xydis
【27】A new initialisation to Control Gradients in Sinusoidal Neural network
标题:一种控制正弦神经网络梯度的新初始化方法
链接:https://arxiv.org/abs/2512.06427
作者:Andrea Combette,Antoine Venaille,Nelly Pustelnik
【28】Optimizing Optimizers for Fast Gradient-Based Learning
标题:优化优化器以实现基于梯度的快速学习
链接:https://arxiv.org/abs/2512.06370
作者:Jaerin Lee,Kyoung Mu Lee
备注:49 pages, 5 figures
【29】Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
标题:过度参数化神经网络中的熵限制和模式连通性
链接:https://arxiv.org/abs/2512.06297
作者:Luca Di Carlo,Chase Goddard,David J. Schwab
备注:Under Review
【30】Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media
标题:从嘈杂的社交媒体中发现公共交通风险的重要性感知主题建模
链接:https://arxiv.org/abs/2512.06293
作者:Fatima Ashraf,Muhammad Ayub Sabir,Jiaxin Deng,Junbiao Pang,Haitao Yu
【31】Learning Without Time-Based Embodiment Resets in Soft-Actor Critic
标题:Soft Actor-Critic中无需基于时间的具身重置的学习
链接:https://arxiv.org/abs/2512.06252
作者:Homayoon Farrahi,A. Rupam Mahmood
备注:In Proceedings of the 4th Conference on Lifelong Learning Agents (CoLLAs)
【32】Quantization Blindspots: How Model Compression Breaks Backdoor Defenses
标题:量化盲点:模型压缩如何打破后门防御
链接:https://arxiv.org/abs/2512.06243
作者:Rohan Pandey,Eric Ye
备注:10 pages
【33】Opinion: Learning Intuitive Physics May Require More than Visual Data
标题:观点:学习直观的物理学可能需要的不仅仅是视觉数据
链接:https://arxiv.org/abs/2512.06232
作者:Ellen Su,Solim Legris,Todd M. Gureckis,Mengye Ren
【34】Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures
标题:可重构架构上快速模型恢复的软硬件优化
链接:https://arxiv.org/abs/2512.06113
作者:Bin Xu,Ayan Banerjee,Sandeep Gupta
【35】When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models
标题:当隐私不是合成的:生成人工智能模型中的隐藏数据泄露
链接:https://arxiv.org/abs/2512.06062
作者:S. M. Mustaqim,Anantaa Kotal,Paul H. Yi
【36】Exact Synthetic Populations for Scalable Societal and Market Modeling
标题:精确合成人群用于可扩展的社会和市场建模
链接:https://arxiv.org/abs/2512.07306
作者:Thierry Petit,Arnault Pachot
备注:Submitted for peer review on December 7, 2025
【37】Non-negative DAG Learning from Time-Series Data
标题:从时间序列数据中进行非负DAG学习
链接:https://arxiv.org/abs/2512.07267
作者:Samuel Rey,Gonzalo Mateos
【38】DeepSVM: Learning Stochastic Volatility Models with Physics-Informed Deep Operator Networks
标题:DeepSVM:利用物理信息深度算子网络学习随机波动率模型
链接:https://arxiv.org/abs/2512.07162
作者:Kieran A. Malandain,Selim Kalici,Hakob Chakhoyan
【39】Learning to Hedge Swaptions
标题:学习对冲互换期权
链接:https://arxiv.org/abs/2512.06639
作者:Zaniar Ahmadi,Frédéric Godin
【40】PRIMRose: Insights into the Per-Residue Energy Metrics of Proteins with Double InDel Mutations using Deep Learning
标题:PRIMRose:使用深度学习洞察具有双InDel突变的蛋白质的每残基能量指标
链接:https://arxiv.org/abs/2512.06496
作者:Stella Brown,Nicolas Preisig,Autumn Davis,Brian Hutchinson,Filip Jagodzinski
备注:Presented at Computational Structural Bioinformatics Workshop 2025
【41】Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders
标题:通过条件变分自编码器建模时空极值
链接:https://arxiv.org/abs/2512.06348
作者:Xiaoyu Ma,Likun Zhang,Christopher K. Wikle
【42】Multi-resolution Physics-Aware Recurrent Convolutional Neural Network for Complex Flows
标题:用于复杂流动的多分辨率物理感知循环卷积神经网络
链接:https://arxiv.org/abs/2512.06031
作者:Xinlun Cheng,Joseph Choi,H. S. Udaykumar,Stephen Baek
其他(45篇)
【1】Relational Visual Similarity
标题:关系视觉相似性
链接:https://arxiv.org/abs/2512.07833
作者:Thao Nguyen,Sicheng Mo,Krishna Kumar Singh,Yilin Wang,Jing Shi,Nicholas Kolkin,Eli Shechtman,Yong Jae Lee,Yuheng Li
备注:Project page, data, and code: https://thaoshibe.github.io/relsim
【2】Do Generalisation Results Generalise?
标题:泛化结果能否泛化?
链接:https://arxiv.org/abs/2512.07832
作者:Matteo Boglioni,Andrea Sgobbi,Gabriel Tavernini,Francesco Rita,Marius Mosbach,Tiago Pimentel
【3】The Adoption and Usage of AI Agents: Early Evidence from Perplexity
标题:人工智能代理的采用和使用:来自Perplexity的早期证据
链接:https://arxiv.org/abs/2512.07828
作者:Jeremy Yang,Noah Yonack,Kate Zyskowski,Denis Yarats,Johnny Ho,Jerry Ma
【4】Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support
标题:协作式因果意义建构:缩小人机决策支持中的互补性差距
链接:https://arxiv.org/abs/2512.07801
作者:Raunak Jain,Mudita Khurana
【5】GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
标题:GatedFWA:带门控关联记忆的线性Flash窗口注意力
链接:https://arxiv.org/abs/2512.07782
作者:Jiaxu Liu,Yuhe Bai,Christos-Savvas Bouganis
【6】Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks
标题:延迟感知扩散策略:弥合动态任务中的观察与执行差距
链接:https://arxiv.org/abs/2512.07697
作者:Aileen Liao,Dong-Ki Kim,Max Olan Smith,Ali-akbar Agha-mohammadi,Shayegan Omidshafiei
【7】A Bootstrap Perspective on Stochastic Gradient Descent
标题:随机梯度下降的Bootstrap观点
链接:https://arxiv.org/abs/2512.07676
作者:Hongjian Lan,Yucong Liu,Florian Schäfer
【8】A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance
标题:基于总变差距离的Top-$k$稀疏注意力数学理论
链接:https://arxiv.org/abs/2512.07647
作者:Georgios Tzachristas,Lei Deng,Ioannis Tzachristas,Gong Zhang,Renhai Chen
【9】PCMind-2.1-Kaiyuan-2B Technical Report
标题:PCMind-2.1-Kaiyuan-2B技术报告
链接:https://arxiv.org/abs/2512.07612
作者:Kairong Luo,Zhenbo Sun,Xinyu Shi,Shengqi Chen,Bowen Yu,Yunyi Chen,Chenyi Dang,Hengtao Tao,Hui Wang,Fangming Liu,Kaifeng Lyu,Wenguang Chen
【10】Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation
标题:可供性场干预:使VLA摆脱机器人操作中的记忆陷阱
链接:https://arxiv.org/abs/2512.07472
作者:Siyu Xu,Zijian Wang,Yunke Wang,Chenghao Xia,Tao Huang,Chang Xu
【11】Forget and Explain: Transparent Verification of GNN Unlearning
标题:忘记并解释:GNN遗忘的透明验证
链接:https://arxiv.org/abs/2512.07450
作者:Imran Ahsan,Hyunwook Yu,Jinsung Kim,Mucheol Kim
备注:To appear in WSDM 2026 (ACM International Conference on Web Search and Data Mining). Code is available at https://github.com/ImranAhsan23/F-E
【12】Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects
标题:训练神经音频效果时调整截断时间反向传播的实证结果
链接:https://arxiv.org/abs/2512.07393
作者:Yann Bourdin,Pierrick Legrand,Fanny Roche
【13】Towards Robust Protective Perturbation against DeepFake Face Swapping
标题:迈向针对DeepFake换脸的鲁棒保护性扰动
链接:https://arxiv.org/abs/2512.07228
作者:Hengyang Yao,Lin Li,Ke Sun,Jianing Qiu,Huiping Chen
【14】Geometric Prior-Guided Federated Prompt Calibration
标题:几何先验引导的联邦提示校准
链接:https://arxiv.org/abs/2512.07208
作者:Fei Luo,Ziwei Zhao,Mingxuan Wang,Duoyang Li,Zhe Qian,Jiayi Tuo,Chenyue Zhou,Yanbiao Ma
【15】FlowLPS: Langevin-Proximal Sampling for Flow-based Inverse Problem Solvers
标题:FlowLPS:用于基于流的反问题求解器的Langevin-Proximal采样
链接:https://arxiv.org/abs/2512.07150
作者:Jonghyun Park,Jong Chul Ye
【16】Procrustean Bed for AI-Driven Retrosynthesis: A Unified Framework for Reproducible Evaluation
标题:用于人工智能驱动的逆合成的Procrustean床:可重复性评估的统一框架
链接:https://arxiv.org/abs/2512.07079
作者:Anton Morgunov,Victor S. Batista
备注:11 pages + 7 pages of SI. RetroCast is available on GitHub, see https://github.com/ischemist/project-procrustes. SynthArena is publicly available, see https://syntharena.ischemist.com/
【17】Block Sparse Flash Attention
标题:块稀疏Flash注意力
链接:https://arxiv.org/abs/2512.07011
作者:Daniel Ohayon,Itay Lamprecht,Itay Hubara,Israel Cohen,Daniel Soudry,Noam Elata
备注:10 pages, 5 figures. Code: https://github.com/Danielohayon/Block-Sparse-Flash-Attention
【18】Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation
标题:迈向可靠的机器遗忘:理论、算法和评估
链接:https://arxiv.org/abs/2512.06993
作者:Ali Ebrahimpour-Boroojeny
【19】A Unifying Human-Centered AI Fairness Framework
标题:统一的以人为本的人工智能公平框架
链接:https://arxiv.org/abs/2512.06944
作者:Munshi Mahbubur Rahman,Shimei Pan,James R. Foulds
【20】Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields
标题:涡旋流场中水面车辆的节能导航
链接:https://arxiv.org/abs/2512.06912
作者:Rushiraj Gadhvi,Sandeep Manjanna
备注:Under Review for International Conference on Robotics and Automation (ICRA 2026)
【21】Neural Factorization-based Bearing Fault Diagnosis
标题:基于神经分解的轴承故障诊断
链接:https://arxiv.org/abs/2512.06837
作者:Zhenhao Li,Xu Cheng,Yi Zhou
【22】Small-Gain Nash: Certified Contraction to Nash Equilibria in Differentiable Games
标题:小增益纳什:可微博弈中向纳什均衡的可认证收缩
链接:https://arxiv.org/abs/2512.06791
【23】Measuring Over-smoothing beyond Dirichlet energy
标题:Dirichlet能量之外的过平滑度量
链接:https://arxiv.org/abs/2512.06782
作者:Weiqi Guan,Zihao Shi
备注:17 pages, 1 figure
【24】Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics
标题:弧形梯度下降:具有相位感知、用户可控步长动力学的梯度下降数学推导重构
链接:https://arxiv.org/abs/2512.06737
作者:Nikhil Verma,Joonas Linnosmaa,Espinosa-Leal Leonardo,Napat Vajragupta
备注:80 pages, 6 tables, 2 figures, 5 appendices, proof-of-concept
【25】Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
标题:重新思考稳健性:评估特征归因方法的新方法
链接:https://arxiv.org/abs/2512.06665
作者:Panagiota Kiourti,Anu Singh,Preeti Duraipandian,Weichao Zhou,Wenchao Li
【26】Hankel-FNO: Fast Underwater Acoustic Charting Via Physics-Encoded Fourier Neural Operator
标题:Hankel-FNO:通过物理编码傅里叶神经算子实现快速水下声场制图
链接:https://arxiv.org/abs/2512.06417
作者:Yifan Sun,Lei Cheng,Jianlong Li,Peter Gerstoft
【27】Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses
标题:人工智能时代的Web技术安全:对CDN增强防御的调查
链接:https://arxiv.org/abs/2512.06390
作者:Mehrab Hosain,Sabbir Alom Shuvo,Matthew Ogbe,Md Shah Jalal Mazumder,Yead Rahman,Md Azizul Hakim,Anukul Pandey
备注:Accepted at 2025 IEEE Asia Pacific Conference on Wireless and Mobile (APWiMob). 7 pages, 5 figures
【28】DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction
标题:DDFI:通过两步重建实现多样化且具有分布意识的缺失特征插补
链接:https://arxiv.org/abs/2512.06356
作者:Yifan Song,Fenglin Yu,Yihong Luo,Xingjian Tao,Siya Qiu,Kai Han,Jing Tang
【29】Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry
标题:基于代数几何的随机插值器零泛化误差定理
链接:https://arxiv.org/abs/2512.06347
作者:Naoki Yoshida,Isao Ishikawa,Masaaki Imaizumi
【30】Interpretive Efficiency: Information-Geometric Foundations of Data Usefulness
标题:解释效率:数据有用性的信息几何基础
链接:https://arxiv.org/abs/2512.06341
【31】Theoretical Compression Bounds for Wide Multilayer Perceptrons
标题:宽多层感知器的理论压缩界限
链接:https://arxiv.org/abs/2512.06288
作者:Houssam El Cheairi,David Gamarnik,Rahul Mazumder
【32】SparsePixels: Efficient Convolution for Sparse Data on FPGAs
标题:SparsePixels:在FPGA上实现稀疏数据的高效卷积
链接:https://arxiv.org/abs/2512.06208
作者:Ho Fung Tsoi,Dylan Rankin,Vladimir Loncar,Philip Harris
备注:Under review
【33】On measuring grounding and generalizing grounding problems
标题:论接地的度量与接地问题的泛化
链接:https://arxiv.org/abs/2512.06205
作者:Daniel Quigley,Eric Maynard
备注:36 pages, 85 sources
【34】PMA-Diffusion: A Physics-guided Mask-Aware Diffusion Framework for TSE from Sparse Observations
标题:PMA-Diffusion:面向稀疏观测下TSE的物理引导掩码感知扩散框架
链接:https://arxiv.org/abs/2512.06183
作者:Lindong Liu,Zhixiong Jin,Seongjin Choi
【35】gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points
标题:gp2Scale:一类紧支撑非平稳核及分布式计算,用于1000万个数据点上的精确高斯过程
链接:https://arxiv.org/abs/2512.06143
作者:Marcus M. Noack,Mark D. Risser,Hengrui Luo,Vardaan Tekriwal,Ronald J. Pandolfi
【36】ARC-AGI Without Pretraining
标题:AR-AGI无需预训练
链接:https://arxiv.org/abs/2512.06104
【37】LUNA: LUT-Based Neural Architecture for Fast and Low-Cost Qubit Readout
标题:LUNA:基于LUT的神经架构,用于快速低成本量子位读取
链接:https://arxiv.org/abs/2512.07808
作者:M. A. Farooq,G. Di Guglielmo,A. Rajagopala,N. Tran,V. A. Chhabria,A. Arora
【38】Two-dimensional RMSD projections for reaction path visualization and validation
标题:用于反应路径可视化和验证的二维RMSD投影
链接:https://arxiv.org/abs/2512.07329
作者:Rohit Goswami
备注:4 pages, 1 figure
【39】Verifiable Deep Quantitative Group Testing
标题:可验证的深度定量群检测
链接:https://arxiv.org/abs/2512.07279
作者:Shreyas Jayant Grampurohit,Satish Mulleti,Ajit Rajwade
备注:11 pages, 2 figures, 3 tables
【40】Physics-Guided Diffusion Priors for Multi-Slice Reconstruction in Scientific Imaging
标题:科学成像中多切片重建的物理引导扩散先验
链接:https://arxiv.org/abs/2512.06977
作者:Laurentius Valdy,Richard D. Paul,Alessio Quercia,Zhuo Cao,Xuan Zhao,Hanno Scharr,Arya Bangun
备注:8 pages, 5 figures, AAAI AI2ASE 2026
【41】PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios
标题:PARIS:基于表示定理的不平衡场景剪枝算法
链接:https://arxiv.org/abs/2512.06950
【42】Interpretable Neural Approximation of Stochastic Reaction Dynamics with Guaranteed Reliability
标题:具有保证可靠性的随机反应动力学的可解释神经逼近
链接:https://arxiv.org/abs/2512.06294
作者:Quentin Badolle,Arthur Theuer,Zhou Fang,Ankit Gupta,Mustafa Khammash
【43】Forests of Uncertaint(r)ees: Using tree-based ensembles to estimate probability distributions of future conflict
标题:不确定性森林:使用基于树的集成方法估计未来冲突的概率分布
链接:https://arxiv.org/abs/2512.06210
作者:Daniel Mittermaier,Tobias Bohne,Martin Hofer,Daniel Racek
备注:18 pages, 4 figures, 3 tables. Replication code available at https://github.com/ccew-unibw/uncertaintrees
【44】Beyond Lux thresholds: a systematic pipeline for classifying biologically relevant light contexts from wearable data
标题:超越Lux阈值:用于从可穿戴数据中分类生物相关光上下文的系统管道
链接:https://arxiv.org/abs/2512.06181
作者:Yanuo Zhou
备注:16 pages, 8 figures. Reproducible pipeline for classifying biologically relevant light from wearable spectral data. Manuscript in preparation for journal submission
【45】Physics Enhanced Deep Surrogates for the Phonon Boltzmann Transport Equation
标题:声子玻尔兹曼输运方程的物理增强深度代理模型
链接:https://arxiv.org/abs/2512.05976
作者:Antonio Varagnolo,Giuseppe Romano,Raphaël Pestourie
机器翻译由腾讯交互翻译提供,仅供参考