点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计303篇
大模型相关(37篇)
【1】Priors in Time: Missing Inductive Biases for Language Model Interpretability
标题:时间上的先验:语言模型可解释性缺失的归纳偏差
链接:https://arxiv.org/abs/2511.01836
作者:Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur, Valerie Costa, Greta Tuckute, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Daniel Wurgaft, Eric J. Bigelow, Johnny Lin, Demba Ba, Martin Wattenberg, Fernanda Viegas, Melanie Weber, Aaron Mueller
备注:Preprint
摘要:Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent directions, it is unclear if this assumption can capture the rich temporal structure of language. Specifically, via a Bayesian lens, we demonstrate that Sparse Autoencoders (SAEs) impose priors that assume independence of concepts across time, implying stationarity. Meanwhile, language model representations exhibit rich temporal dynamics, including systematic growth in conceptual dimensionality, context-dependent correlations, and pronounced non-stationarity, in direct conflict with the priors of SAEs. Taking inspiration from computational neuroscience, we introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information unexplained by the context. Temporal Feature Analyzers correctly parse garden path sentences, identify event boundaries, and more broadly delineate abstract, slow-moving information from novel, fast-moving information, while existing SAEs show significant pitfalls in all the above tasks. Overall, our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
【2】Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models
标题:专家之间的动态路由:视觉语言模型中连续学习的数据高效方法
链接:https://arxiv.org/abs/2511.01831
作者:Jay Mohta, Kenan Emir Ak, Dimitrios Dimitriadis, Yan Xu, Mingwei Shen
摘要:Vision-Language Models (VLMs) suffer from catastrophic forgetting when sequentially fine-tuned on new tasks, degrading performance on previously learned foundational and task-specific capabilities. While multi-task learning can mitigate forgetting, it requires simultaneous access to all datasets and imposes computational overhead that scales linearly with the number of tasks. In this work, we introduce a routing-based approach that enables the integration of new tasks while preserving the foundational knowledge acquired during pretraining. We evaluate our method using InternVL-2 models (2B and 8B parameters) and demonstrate that routing preserves the model's foundational capabilities by maintaining performance on general-purpose benchmarks such as ChartQA, MMBench, and DocVQA, while simultaneously improving accuracy on specialized tasks. Importantly, our approach achieves this without requiring concurrent access to data from all tasks, avoiding the significant computational and data overhead associated with traditional multi-task learning. We further conduct extensive ablation studies to evaluate the scalability and robustness of routing-based learning, showing that the approach is resilient to a growing number of tasks and performs particularly well when new tasks are semantically related. Finally, we show that the routing mechanism enables superior cross-modal transfer between language and vision capabilities, allowing knowledge learned in one modality to enhance performance in another capability not achieved by existing continual learning methods.
【3】KV Cache Transform Coding for Compact Storage in LLM Inference
标题:LLM推理中紧凑存储的KV高速缓存变换编码
链接:https://arxiv.org/abs/2511.01815
作者:Konrad Staniszewski, Adrian Łańcucki
摘要:Serving large language models (LLMs) at scale necessitates efficient key-value (KV) cache management. KV caches can be reused across conversation turns via shared-prefix prompts that are common in iterative code editing and chat. However, stale caches consume scarce GPU memory, require offloading, or force recomputation. We present KVTC, a lightweight transform coder that compresses KV caches for compact on-GPU and off-GPU storage. Drawing on classical media compression, KVTC combines PCA-based feature decorrelation, adaptive quantization, and entropy coding. It requires only a brief initial calibration and leaves model parameters unchanged. By exploiting redundancies in KV caches, KVTC achieves up to 20$\times$ compression while maintaining reasoning and long-context accuracy, and 40$\times$ or higher for specific use cases. We test KVTC with Llama 3, Mistral NeMo, and R1-Qwen 2.5 models across benchmarks including AIME25, LiveCodeBench, GSM8K, MMLU, Qasper, RULER, and MATH-500. It consistently outperforms inference-time baselines such as token eviction, quantization, and SVD-based methods, while achieving higher compression ratios. These results support KVTC as a practical building block for memory-efficient LLM serving with reusable KV caches.
【4】Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding
标题:通过资源感知并行推测解码的协作大语言模型推理
链接:https://arxiv.org/abs/2511.01695
作者:Jungyeon Koh, Hyun Jong Yang
摘要
:The growing demand for on-device large language model (LLM) inference highlights the need for efficient mobile edge computing (MEC) solutions, especially in resource-constrained settings. Speculative decoding offers a promising solution by partitioning token generation between a lightweight draft model on mobile devices and a powerful target model on edge servers, but suffers from communication overhead and asynchronous delays. This paper is the first to propose a unified framework that jointly optimizes user association and resource allocation (UARA) to support efficient parallel speculative decoding. We solve the UARA problem using a multi-agent deep reinforcement learning algorithm. To evaluate our approach under realistic conditions, we conduct experiments using the Sionna simulator. Results show that our method achieves up to 28.0% and an average of 23.7% reduction in end-to-end latency without compromising inference accuracy, enabling scalable and low-latency LLM services in MEC systems.
【5】Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving
标题:缩放图思想链推理:具有高效LLM服务的多代理框架
链接:https://arxiv.org/abs/2511.01633
作者:Chengying Huan, Ziheng Meng, Yongchao Liu, Zhengyi Yang, Yun Zhu, Yue Yun, Shipeng Li, Rong Gu, Xiabao Wu, Haitao Zhang, Chuntao Hong, Shaonan Ma, Guihai Chen, Chen Tian
摘要:Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, excessive token usage, high latency, and low throughput due to single-agent monolithic prompts, repeated context re-encoding, and inefficient serving execution. We present GLM, the first multi-agent Graph-CoT system co-designed with an optimized LLM serving architecture. GLM decomposes reasoning into specialized agents for classification, reasoning, action generation, and graph retrieval, enabling branching and selective context sharing to reduce prompt length and reasoning iterations while preserving reasoning quality, thereby improving accuracy and reducing overall token consumption. To scale inference, we introduce a Graph-CoT-aware LLM inference mechanism with graph-specific KV-cache management, priority-based eviction, and pipelined execution to improve serving efficiency. Experiments demonstrate that GLM improves answer accuracy by up to 38%, reduces token cost by up to 95.7%, lowers inference latency by 90.3%, and achieves up to 15.1x higher throughput compared to state-of-the-art Graph-CoT baselines, enabling efficient adoption for complex real-world reasoning at scale.
【6】L2T-Tune:LLM-Guided Hybrid Database Tuning with LHS and TD3
标题:L2 T-Button:LLM引导的混合数据库调优,使用LHS和TD 3
链接:https://arxiv.org/abs/2511.01602
作者:Xinyue Yang, Chen Zheng, Yaoyang Hou, Renhao Zhang, Yiyan Zhang, Yanjun Wu, Heng Zhang
摘要:Configuration tuning is critical for database performance. Although recent advancements in database tuning have shown promising results in throughput and latency improvement, challenges remain. First, the vast knob space makes direct optimization unstable and slow to converge. Second, reinforcement learning pipelines often lack effective warm-start guidance and require long offline training. Third, transferability is limited: when hardware or workloads change, existing models typically require substantial retraining to recover performance. To address these limitations, we propose L2T-Tune, a new LLM-guided hybrid database tuning framework that features a three-stage pipeline: Stage one performs a warm start that simultaneously generates uniform samples across the knob space and logs them into a shared pool; Stage two leverages a large language model to mine and prioritize tuning hints from manuals and community documents for rapid convergence. Stage three uses the warm-start sample pool to reduce the dimensionality of knobs and state features, then fine-tunes the configuration with the Twin Delayed Deep Deterministic Policy Gradient algorithm. We conduct experiments on L2T-Tune and the state-of-the-art models. Compared with the best-performing alternative, our approach improves performance by an average of 37.1% across all workloads, and by up to 73% on TPC-C. Compared with models trained with reinforcement learning, it achieves rapid convergence in the offline tuning stage on a single server. Moreover, during the online tuning stage, it only takes 30 steps to achieve best results.
【7】RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
标题:RobustVLA:视觉-语言-动作模型的鲁棒感知强化训练后
链接:https://arxiv.org/abs/2511.01331
作者:Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang
摘要:Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.
【8】AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
标题:人工智能进步应该通过每资源的能力来衡量,而不是单独规模:LLM中受委托的资源分配框架
链接:https://arxiv.org/abs/2511.01077
作者:David McCoy, Yulun Wu, Zachary Butzin-Dozier
备注:9 pages (main) + appendix, 3 figures. Accepted at NeurIPS 2025 (Position Paper Track), submission #491. OpenReview: this https URL
摘要
:This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and widening resource inequality. We argue that LLM development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle. Our analysis shows that in transformer-based models, where a small fraction of parameters exert outsized influence (following heavy-tailed distributions), three critical insights emerge: (1) updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis; (2) simple gradient norms provide computationally efficient proxies for identifying these high-influence components; and (3) coordinated parameter and data selection yields multiplicative efficiency gains, potentially reducing resource requirements by orders of magnitude. Building on these theoretical foundations, we propose a two stage paradigm marginal-return pretraining for foundation developers and influence guided adaptation for downstream users bridged by gradient blueprints, metadata describing which parameters matter most for various tasks. This capability-per-resource perspective transforms what were once considered pragmatic hardware workarounds into theoretically optimal strategies, democratizing access to cutting-edge AI capabilities while significantly reducing environmental impact. By embedding resource consciousness into how we develop, adapt, and evaluate models, we can reshape AI progress toward a more sustainable and equitable future.
【9】Aligning LLM agents with human learning and adjustment behavior: a dual agent approach
标题:将LLM代理与人类学习和调整行为保持一致:双代理方法
链接:https://arxiv.org/abs/2511.00993
作者:Tianming Liu, Jirong Yang, Yafeng Yin, Manzi Li, Linghao Wang, Zheng Zhu
备注:32 pages, 6 figures, 7 tables
摘要:Effective modeling of how human travelers learn and adjust their travel behavior from interacting with transportation systems is critical for system assessment and planning. However, this task is also difficult due to the complex cognition and decision-making involved in such behavior. Recent research has begun to leverage Large Language Model (LLM) agents for this task. Building on this, we introduce a novel dual-agent framework that enables continuous learning and alignment between LLM agents and human travelers on learning and adaptation behavior from online data streams. Our approach involves a set of LLM traveler agents, equipped with a memory system and a learnable persona, which serve as simulators for human travelers. To ensure behavioral alignment, we introduce an LLM calibration agent that leverages the reasoning and analytical capabilities of LLMs to train the personas of these traveler agents. Working together, this dual-agent system is designed to track and align the underlying decision-making mechanisms of travelers and produce realistic, adaptive simulations. Using a real-world dataset from a day-to-day route choice experiment, we show our approach significantly outperforms existing LLM-based methods in both individual behavioral alignment and aggregate simulation accuracy. Furthermore, we demonstrate that our method moves beyond simple behavioral mimicry to capture the evolution of underlying learning processes, a deeper alignment that fosters robust generalization. Overall, our framework provides a new approach for creating adaptive and behaviorally realistic agents to simulate travelers' learning and adaptation that can benefit transportation simulation and policy analysis.
【10】Assessing LLM Reasoning Steps via Principal Knowledge Grounding
标题:通过主要知识基础评估LLM推理步骤
链接:https://arxiv.org/abs/2511.00879
作者:Hyeon Hwang, Yewon Cho, Chanwoong Yoon, Yein Park, Minju Song, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang
备注:Accepted to EMNLP 2025 Findings
摘要:Step-by-step reasoning has become a standard approach for large language models (LLMs) to tackle complex tasks. While this paradigm has proven effective, it raises a fundamental question: How can we verify that an LLM's reasoning is accurately grounded in knowledge? To address this question, we introduce a novel evaluation suite that systematically assesses the knowledge grounding of intermediate reasoning. Our framework comprises three key components. (1) Principal Knowledge Collection, a large-scale repository of atomic knowledge essential for reasoning. Based on the collection, we propose (2) knowledge-grounded evaluation metrics designed to measure how well models recall and apply prerequisite knowledge in reasoning. These metrics are computed by our (3) evaluator LLM, a lightweight model optimized for cost-effective and reliable metric computation. Our evaluation suite demonstrates remarkable effectiveness in identifying missing or misapplied knowledge elements, providing crucial insights for uncovering fundamental reasoning deficiencies in LLMs. Beyond evaluation, we demonstrate how these metrics can be integrated into preference optimization, showcasing further applications of knowledge-grounded evaluation.
【11】Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
标题:使用更少的位进行训练:使用随机舍入解锁边缘LLM训练
链接:https://arxiv.org/abs/2511.00874
作者:Taowen Liu, Marta Andronic, Deniz Gündüz, George A. Constantinides
摘要:LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding, offering unbiased gradient estimates. However, its interaction with other training factors -- especially batch size -- remains under explored. In this paper, we present a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR, showing that increased batch sizes can compensate for reduced precision during back-propagation. Furthermore, we show that quantizing weights and activations impacts gradient variance in distinct ways. Our experiments validate these theoretical insights.
【12】GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
标题:GrowthHacker:使用代码修改LLM代理自动化非政策评估优化
链接:https://arxiv.org/abs/2511.00802
作者:Jie JW Wu, Ayanda Patrick Herlihy, Ahmad Saleem Mirza, Ali Afoud, Fatemeh Fard
摘要
:With the software industry shifting toward a data-driven culture, online A/B testing is a key tool for evaluating new technologies. However, deploying such experiments requires substantial resources, may negatively impact users, and involves long data collection periods. To address this, \textit{off-policy evaluation (OPE)}, or offline A/B testing, uses logged data to assess technologies and is fundamental in Reinforcement Learning, making it crucial in domains where online testing is costly or risky, such as healthcare, recommender systems, education, dialog systems, and robotics. Despite advances in coding LLMs and agentic AI, little is known about leveraging them to optimize OPE results. We investigate whether LLMs and LLM-based agents can improve OPE performance via code optimization. We propose \textit{GrowthHacker}, a benchmark with agent and baseline methods on large-scale real-world datasets, which iteratively optimizes code, evaluates results, and begins new optimization cycles. We collected datasets, established protocols, implemented baselines for OPE on the Open Bandit Pipeline (OBP)~\cite{saito2021openbanditdatasetpipeline} and Scope-RL~\cite{kiyohara2023scope}, and developed the \textit{two_agent} framework, which reduces system complexity while preserving optimization effectiveness. Results show the two_agent framework achieves 100% reliability and the highest average improvement of 106.7% among positive outcomes. Both two_agent and CrewAI reach 45% success rates, outperforming AutoGen's 34%. These findings demonstrate the feasibility of LLM-based agents as automated "growth hackers" to enhance OPE systems, with implications for scaling data-driven decision-making in production.
【13】Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration
标题:具有内在探索的大型语言模型高效强化学习
链接:https://arxiv.org/abs/2511.00794
作者:Yan Sun, Jia Guo, Stanley Kok, Zihao Wang, Zujie Wen, Zhiqiang Zhang
摘要:Reinforcement learning with verifiable rewards (RLVR) has improved the reasoning ability of large language models, yet training remains costly because many rollouts contribute little to optimization, considering the amount of computation required. This study investigates how simply leveraging intrinsic data properties, almost free benefit during training, can improve data efficiency for RLVR. We propose PREPO with two complementary components. First, we adopt prompt perplexity as an indicator of model adaptability in learning, enabling the model to progress from well-understood contexts to more challenging ones. Second, we amplify the discrepancy among the rollouts by differentiating their relative entropy, and prioritize sequences that exhibit a higher degree of exploration. Together, these mechanisms reduce rollout demand while preserving competitive performance. On the Qwen and Llama models, PREPO achieves effective results on mathematical reasoning benchmarks with up to 3 times fewer rollouts than the baselines. Beyond empirical gains, we provide theoretical and in-depth analyses explaining the underlying rationale of our method to improve the data efficiency of RLVR.
【14】Reliable Curation of EHR Dataset via Large Language Models under Environmental Constraints
标题:环境约束下通过大语言模型可靠地保存EHR数据集
链接:https://arxiv.org/abs/2511.00772
作者:Raymond M. Xiong, Panyu Chen, Tianze Dong, Jian Lu, Benjamin Goldstein, Danyang Zhuo, Anru R. Zhang
摘要:Electronic health records (EHRs) are central to modern healthcare delivery and research; yet, many researchers lack the database expertise necessary to write complex SQL queries or generate effective visualizations, limiting efficient data use and scientific discovery. To address this barrier, we introduce CELEC, a large language model (LLM)-powered framework for automated EHR data extraction and analytics. CELEC translates natural language queries into SQL using a prompting strategy that integrates schema information, few-shot demonstrations, and chain-of-thought reasoning, which together improve accuracy and robustness. On a subset of the EHRSQL benchmark, CELEC achieves execution accuracy comparable to prior systems while maintaining low latency, cost efficiency, and strict privacy by exposing only database metadata to the LLM. CELEC also adheres to strict privacy protocols: the LLM accesses only database metadata (e.g., table and column names), while all query execution occurs securely within the institutional environment, ensuring that no patient-level data is ever transmitted to or shared with the LLM. Ablation studies confirm that each component of the SQL generation pipeline, particularly the few-shot demonstrations, plays a critical role in performance. By lowering technical barriers and enabling medical researchers to query EHR databases directly, CELEC streamlines research workflows and accelerates biomedical discovery.
【15】Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization
标题:并行自动调度:LLM引导的循环优化实验研究
链接:https://arxiv.org/abs/2511.00592
作者:Massinissa Merouani, Islem Kara Bernou, Riyadh Baghdadi
备注:Accepted at the 34th International Conference on Parallel Architectures and Compilation Techniques (PACT 2025). 12 pages, plus appendix
摘要:Automatic code optimization remains a difficult challenge, particularly for complex loop nests on modern hardware. This paper investigates a novel approach to code optimization where Large Language Models (LLMs) guide the process through a closed-loop interaction with a compiler. We present ComPilot, an experimental framework that leverages off-the-shelf LLMs, without any task-specific fine-tuning, as interactive optimization agents. ComPilot establishes a feedback loop where an LLM proposes transformations for a given loop nest to a compiler. The compiler attempts the transformations, reporting back legality status and measured speedup or slowdown. The LLM utilizes this concrete feedback to iteratively refine its optimization strategy. Our extensive evaluation across the PolyBench benchmark suite demonstrates the effectiveness of this zero-shot approach. ComPilot achieves geometric mean speedups of 2.66x (single run) and 3.54x (best-of-5 runs) over the original code. Furthermore, ComPilot demonstrates competitive performance against the state-of-the-art Pluto polyhedral optimizer, outperforming it in many cases. This experimental study demonstrates that general-purpose LLMs can effectively guide the code optimization process when grounded by compiler feedback, opening promising research directions for agentic AI in code optimization.
【16】Bayesian Network Structure Discovery Using Large Language Models
标题:使用大型语言模型发现Bayesian网络结构
链接:https://arxiv.org/abs/2511.00574
作者:Yinghuan Zhang, Yufei Zhang, Parisa Kordjamshidi, Zijun Cui
摘要
:Understanding probabilistic relationships among variables is crucial for analyzing complex systems. Traditional structure learning methods often require extensive observational data and incur high computational costs. Recent studies have explored using large language models (LLMs) for structure learning, but most treat LLMs as auxiliary tools for pre-processing or post-processing, leaving the core learning process data-driven. In this work, we propose a unified framework for Bayesian network structure discovery that places LLMs at the center, supporting both data-free and data-aware settings. In the data-free case, we introduce \textbf{PromptBN} to query LLMs with metadata and efficiently uncover valid probabilistic relationships. When observational data are available, we introduce \textbf{ReActBN}, which integrates the ReAct reasoning paradigm with structure scores such as the Bayesian Information Criterion (BIC) for iterative refinement. Unlike prior methods that offload refinement to external algorithms, our framework maintains the LLM actively in the loop throughout the discovery process. Experiments demonstrate that our method significantly outperforms both existing LLM-based approaches and traditional data-driven algorithms, particularly in the low- or no-data scenario. Code is publicly available at {\texttt{\textcolor{magenta}{https://github.com/sherryzyh/prompt2bn}}}.
【17】Red-teaming Activation Probes using Prompted LLMs
标题:使用已确定的LLM的红组激活探针
链接:https://arxiv.org/abs/2511.00554
作者:Phil Blandfort, Robert Graham
摘要:Activation probes are attractive monitors for AI systems due to low cost and latency, but their real-world robustness remains underexplored. We ask: What failure modes arise under realistic, black-box adversarial pressure, and how can we surface them with minimal effort? We present a lightweight black-box red-teaming procedure that wraps an off-the-shelf LLM with iterative feedback and in-context learning (ICL), and requires no fine-tuning, gradients, or architectural access. Running a case study with probes for high-stakes interactions, we show that our approach can help discover valuable insights about a SOTA probe. Our analysis uncovers interpretable brittleness patterns (e.g., legalese-induced FPs; bland procedural tone FNs) and reduced but persistent vulnerabilities under scenario-constraint attacks. These results suggest that simple prompted red-teaming scaffolding can anticipate failure patterns before deployment and might yield promising, actionable insights to harden future probes.
【18】Reasoning Planning for Language Models
标题:语言模型的推理规划
链接:https://arxiv.org/abs/2511.00521
作者:Bao Nguyen, Hieu Trung Nguyen, Ruifeng She, Xiaojin Fu, Viet Anh Nguyen
备注:29 pages, 5 figures
摘要:Selecting an appropriate reasoning method for a given query remains a key challenge in language model generation. Existing approaches typically generate multiple candidate responses and use an aggregation strategy to select the output answer, often assuming that more candidate answers yield higher accuracy. We revisit this assumption through a rigorous theoretical analysis, deriving accuracy bounds for standard aggregation methods under fixed generation distributions and candidate sizes. Building on these insights, we introduce EPIC, an Ensemble Planning with Contrastive learning framework to learn a shared representation space that captures both model reasoning abilities and query-method compatibility. EPIC incorporates our probability bounds as a regularizer in a utility-driven optimization that balances accuracy and computational cost. Experiments on diverse mathematical reasoning tasks show that EPIC consistently selects optimal reasoning methods, improving accuracy while reducing computational overhead. Our code can be found at https://github.com/nguyenngocbaocmt02/EPIC.
【19】Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
标题:树训练:通过共享前置重用加速统计学LLM训练
链接:https://arxiv.org/abs/2511.00413
作者:Shaojie Wang, Jinghui Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Liang Huang, Xiaojiang Zhang, Junyi Peng, Li Wan, Haotian Zhang, Bin Chen
摘要:In agentic LLM scenarios, an agent's interaction process during a single rollout often exhibits branching behaviors. Due to memory retrieval and concurrent tool executions at certain decision points, the token trajectory of one task evolves into a tree-like structure rather than a linear sequence. However, current training pipelines decompose such tree-structured trajectories into separate linear segments, treating each branch as an independent sequence. As a result, shared prefixes across these branches are repeatedly recomputed during both forward and backward passes. To address this inefficiency, we propose Tree Training, a paradigm that computes each shared prefix only once and reuses its intermediate results across related branches during both forward and backward passes, substantially improving computation efficiency in large-scale agentic training. This is achieved via (i) Tree Packing, which efficiently reuses shared computations across trajectories, and (ii) Gradient Restoration, which ensures correct gradient propagation across reused prefixes. Experiments on multiple open-source models demonstrate up to 3.9x reduction in total training time, enabling more efficient agentic LLM SFT and RL training.
【20】Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
标题:效率与一致:调查LLM参数高效微调中的安全和公平风险
链接:https://arxiv.org/abs/2511.00382
作者:Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, Foutse Khomh
摘要
:Organizations are increasingly adopting and adapting Large Language Models (LLMs) hosted on public repositories such as HuggingFace. Although these adaptations often improve performance on specialized downstream tasks, recent evidence indicates that they can also degrade a model's safety or fairness. Since different fine-tuning techniques may exert distinct effects on these critical dimensions, this study undertakes a systematic assessment of their trade-offs. Four widely used Parameter-Efficient Fine-Tuning methods, LoRA, IA3, Prompt-Tuning, and P-Tuning, are applied to four instruction-tuned model families (Meta-Llama-3-8B, Qwen2.5-7B, Mistral-7B, and Gemma-7B). In total, 235 fine-tuned variants are evaluated across eleven safety hazard categories and nine demographic fairness dimensions. The results show that adapter-based approaches (LoRA, IA3) tend to improve safety scores and are the least disruptive to fairness, retaining higher accuracy and lower bias scores. In contrast, prompt-based methods (Prompt-Tuning and P-Tuning) generally reduce safety and cause larger fairness regressions, with decreased accuracy and increased bias. Alignment shifts are strongly moderated by base model type: LLaMA remains stable, Qwen records modest gains, Gemma experiences the steepest safety decline, and Mistral, which is released without an internal moderation layer, displays the greatest variance. Improvements in safety do not necessarily translate into improvements in fairness, and no single configuration optimizes all fairness metrics simultaneously, indicating an inherent trade-off between these objectives. These findings suggest a practical guideline for safety-critical deployments: begin with a well-aligned base model, favour adapter-based PEFT, and conduct category-specific audits of both safety and fairness.
【21】Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks
标题:利用潜在空间不连续性构建通用LLM越狱和数据提取攻击
链接:https://arxiv.org/abs/2511.00346
作者:Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz, Muriel Figueredo Franco, Weverton Cordeiro
备注:10 pages, 5 figures, 4 tables, Published at the Brazilian Symposium on Cybersecurity (SBSeg 2025)
摘要:The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this work, we propose a novel approach to crafting universal jailbreaks and data extraction attacks by exploiting latent space discontinuities, an architectural vulnerability related to the sparsity of training data. Unlike previous methods, our technique generalizes across various models and interfaces, proving highly effective in seven state-of-the-art LLMs and one image generation model. Initial results indicate that when these discontinuities are exploited, they can consistently and profoundly compromise model behavior, even in the presence of layered defenses. The findings suggest that this strategy has substantial potential as a systemic attack vector.
【22】MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research
标题:MH-1 M:用于机器学习、深度学习、大型语言模型和威胁情报研究的134万样本全面多功能Android恶意软件数据集
链接:https://arxiv.org/abs/2511.00342
作者:Hendrio Braganca, Diego Kreutz, Vanderson Rocha, Joner Assolin, and Eduardo Feitosa
备注:17 pages, 7 figures, 13 tables, submitted to the Scientific Data journal published by Nature Research
摘要:We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware.
【23】A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data
标题:混合LLM合成数据因果推理的技术探索
链接:https://arxiv.org/abs/2511.00318
作者:Dana Kim, Yichen Xu, Tiffany Lin
备注:9 pages, 4 figures
摘要:Large Language Models (LLMs) offer a flexible means to generate synthetic tabular data, yet existing approaches often fail to preserve key causal parameters such as the average treatment effect (ATE). In this technical exploration, we first demonstrate that state-of-the-art synthetic data generators, both GAN- and LLM-based, can achieve high predictive fidelity while substantially misestimating causal effects. To address this gap, we propose a hybrid generation framework that combines model-based covariate synthesis (monitored via distance-to-closest-record filtering) with separately learned propensity and outcome models, thereby ensuring that (W, A, Y) triplets retain their underlying causal structure. We further introduce a synthetic pairing strategy to mitigate positivity violations and a realistic evaluation protocol that leverages unlimited synthetic samples to benchmark traditional estimators (IPTW, AIPW, substitution) under complex covariate distributions. This work lays the groundwork for LLM-powered data pipelines that support robust causal analysis. Our code is available at https://github.com/Xyc-arch/llm-synthetic-for-causal-inference.git.
【24】Calibration Across Layers: Understanding Calibration Evolution in LLMs
标题:跨层校准:了解LLC中的校准演变
链接:https://arxiv.org/abs/2511.00280
作者:Abhinav Joshi, Areeb Ahmad, Ashutosh Modi
备注:Accepted at EMNLP 2025 (main)
摘要:Large Language Models (LLMs) have demonstrated inherent calibration capabilities, where predicted probabilities align well with correctness, despite prior findings that deep neural networks are often overconfident. Recent studies have linked this behavior to specific components in the final layer, such as entropy neurons and the unembedding matrix null space. In this work, we provide a complementary perspective by investigating how calibration evolves throughout the network depth. Analyzing multiple open-weight models on the MMLU benchmark, we uncover a distinct confidence correction phase in the upper/later layers, where model confidence is actively recalibrated after decision certainty has been reached. Furthermore, we identify a low-dimensional calibration direction in the residual stream whose perturbation significantly improves calibration metrics (ECE and MCE) without harming accuracy. Our findings suggest that calibration is a distributed phenomenon, shaped throughout the network forward pass, not just in its final projection, providing new insights into how confidence-regulating mechanisms operate within LLMs.
【25】Diffusion LLMs are Natural Adversaries for any LLM
标题:扩散LLM是任何LLM的天然对手
链接:https://arxiv.org/abs/2511.00203
作者:David Lüdke, Tom Wollschläger, Paul Ungermann, Stephan Günnemann, Leo Schwinn
摘要:We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an \emph{efficient, amortized inference task}. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models, including robustly trained and proprietary LLMs. Beyond adversarial prompting, our framework opens new directions for red teaming, automated prompt optimization, and leveraging emerging Flow- and Diffusion-based LLMs.
【26】EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs
标题:EL-MIA:量化LLC中敏感实体的成员推断风险
链接:https://arxiv.org/abs/2511.00192
作者:Ali Satvaty, Suzan Verberne, Fatih Turkmen
摘要:Membership inference attacks (MIA) aim to infer whether a particular data point is part of the training dataset of a model. In this paper, we propose a new task in the context of LLM privacy: entity-level discovery of membership risk focused on sensitive information (PII, credit card numbers, etc). Existing methods for MIA can detect the presence of entire prompts or documents in the LLM training data, but they fail to capture risks at a finer granularity. We propose the ``EL-MIA'' framework for auditing entity-level membership risks in LLMs. We construct a benchmark dataset for the evaluation of MIA methods on this task. Using this benchmark, we conduct a systematic comparison of existing MIA techniques as well as two newly proposed methods. We provide a comprehensive analysis of the results, trying to explain the relation of the entity level MIA susceptability with the model scale, training epochs, and other surface level factors. Our findings reveal that existing MIA methods are limited when it comes to entity-level membership inference of the sensitive attributes, while this susceptibility can be outlined with relatively straightforward methods, highlighting the need for stronger adversaries to stress test the provided threat model.
【27】ParaScopes: What do Language Models Activations Encode About Future Text?
标题:ParaScopes:语言模型激活对未来文本进行哪些编码?
链接:https://arxiv.org/abs/2511.00180
作者:Nicky Pochinkov, Yulia Volkova, Anna Vasileva, Sai V R Chereddy
备注:Main paper: 9 pages, 10 figures. Total 24 pages
摘要:Interpretability studies in language models often investigate forward-looking representations of activations. However, as language models become capable of doing ever longer time horizon tasks, methods for understanding activations often remain limited to testing specific concepts or tokens. We develop a framework of Residual Stream Decoders as a method of probing model activations for paragraph-scale and document-scale plans. We test several methods and find information can be decoded equivalent to 5+ tokens of future context in small models. These results lay the groundwork for better monitoring of language models and better understanding how they might encode longer-term planning information.
【28】Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
标题:严重不良事件能否揭示并减轻医疗保健领域LLM的种族偏见?
链接:https://arxiv.org/abs/2511.00177
作者:Hiba Ahsan, Byron C. Wallace
摘要:LLMs are increasingly being used in healthcare. This promises to free physicians from drudgery, enabling better care to be delivered at scale. But the use of LLMs in this space also brings risks; for example, such models may worsen existing biases. How can we spot when LLMs are (spuriously) relying on patient race to inform predictions? In this work we assess the degree to which Sparse Autoencoders (SAEs) can reveal (and control) associations the model has made between race and stigmatizing concepts. We first identify SAE latents in Gemma-2 models which appear to correlate with Black individuals. We find that this latent activates on reasonable input sequences (e.g., "African American") but also problematic words like "incarceration". We then show that we can use this latent to steer models to generate outputs about Black patients, and further that this can induce problematic associations in model outputs as a result. For example, activating the Black latent increases the risk assigned to the probability that a patient will become "belligerent". We evaluate the degree to which such steering via latents might be useful for mitigating bias. We find that this offers improvements in simple settings, but is less successful for more realistic and complex clinical tasks. Overall, our results suggest that: SAEs may offer a useful tool in clinical applications of LLMs to identify problematic reliance on demographics but mitigating bias via SAE steering appears to be of marginal utility for realistic tasks.
【29】A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control
标题:用于并行细粒度交通信号控制的双大语言模型架构
链接:https://arxiv.org/abs/2511.00136
作者:Qing Guo, Xinhang Li, Junyu Chen, Zheng Guo, Xiaocong Li, Lin Zhang, Lei Li
摘要
:Leveraging large language models (LLMs) in traffic signal control (TSC) improves optimization efficiency and interpretability compared to traditional reinforcement learning (RL) methods. However, existing LLM-based approaches are limited by fixed time signal durations and are prone to hallucination errors, while RL methods lack robustness in signal timing decisions and suffer from poor generalization. To address these challenges, this paper proposes HeraldLight, a dual LLMs architecture enhanced by Herald guided prompts. The Herald Module extracts contextual information and forecasts queue lengths for each traffic phase based on real-time conditions. The first LLM, LLM-Agent, uses these forecasts to make fine grained traffic signal control, while the second LLM, LLM-Critic, refines LLM-Agent's outputs, correcting errors and hallucinations. These refined outputs are used for score-based fine-tuning to improve accuracy and robustness. Simulation experiments using CityFlow on real world datasets covering 224 intersections in Jinan (12), Hangzhou (16), and New York (196) demonstrate that HeraldLight outperforms state of the art baselines, achieving a 20.03% reduction in average travel time across all scenarios and a 10.74% reduction in average queue length on the Jinan and Hangzhou scenarios. The source code is available on GitHub: https://github.com/BUPT-ANTlab/HeraldLight.
【30】A Comparative Analysis of LLM Adaptation: SFT, LoRA, and ICL in Data-Scarce Scenarios
标题:LLM改编的比较分析:数据稀缺场景中的SFT、LoRA和ICL
链接:https://arxiv.org/abs/2511.00130
作者:Bernd Bohnet, Rumen Dangovski, Kevin Swersky, Sherry Moore, Arslan Chaudhry, Kathleen Kenealy, Noah Fiedel
摘要:The remarkable capabilities of Large Language Models (LLMs) often need to be tailored for specific applications, requiring the integration of new knowledge or the acquisition of new skills. While full fine-tuning is a powerful adaptation method, it is computationally expensive and can lead to a degradation of general reasoning abilities, a phenomenon known as catastrophic forgetting. A range of alternative techniques exists, each with its own trade-offs. In-Context Learning (ICL) is fast but limited by context length, while Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) offer a middle ground by minimizing parameter changes. However, the challenge of catastrophic forgetting persists, raising questions about the best adaptation strategy for a given task. This paper presents a comparative analysis of Supervised Finetuning (SFT), LoRA, and ICL in data-scarce scenarios. We find that LoRA provides the most effective balance, successfully instilling new skills with minimal impact on the base model's general knowledge. In contrast, while SFT excels at skill acquisition, it is highly susceptible to catastrophic forgetting. ICL is effective for incorporating factual knowledge but struggles with complex skills. Our findings offer a practical framework for selecting an LLM adaptation strategy. We highlight the critical distinction between skill acquisition and knowledge integration, clarify the trade-offs between task-specific performance and the preservation of general capabilities.
【31】Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving
标题:Loquetier:用于统一LLM微调和服务的虚拟化多LoRA框架
链接:https://arxiv.org/abs/2511.00101
作者:Yuchen Zhang, Hanyue Du, Chun Cao, Jingwei Xu
备注:26 pages including 10 pages of main text, 6 figures, 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要:Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for adapting large language models (LLMs) to downstream tasks. While prior work has explored strategies for integrating LLM training and serving, there still remains a gap in unifying fine-tuning and inference for LoRA-based models. We present Loquetier, a virtualized multi-LoRA framework that seamlessly integrates LoRA fine-tuning and serving within a single runtime. Loquetier introduces two key components: (1) a Virtualized Module that isolates PEFT-based modifications and supports multiple adapters on a shared base model, and (2) an optimized computation flow with a kernel design that merges fine-tuning and inference paths in forward propagation, enabling efficient batching and minimizing kernel invocation overhead. Extensive experiments across three task settings show that Loquetier consistently outperforms existing baselines in both performance and flexibility, achieving up to $3.0\times$ the throughput of the state-of-the-art co-serving system on inference-only tasks and $46.4\times$ higher SLO attainment than PEFT on unified fine-tuning and inference tasks. The implementation of Loquetier is publicly available at https://github.com/NJUDeepEngine/Loquetier.
【32】Latent Domain Prompt Learning for Vision-Language Models
标题:视觉语言模型的潜在领域提示学习
链接:https://arxiv.org/abs/2511.00067
作者:Zhixing Li, Arsham Gholamzadeh Khoee, Yinan Yu
摘要:The objective of domain generalization (DG) is to enable models to be robust against domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may not be available and often ambiguous. We instead study the DG setting where models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features based on the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new insights into improving robustness under domain shift.
【33】Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models
标题:大型语言模型中稳定强化学习的令牌监管组相对策略优化
链接:https://arxiv.org/abs/2511.00066
作者:Tue Le, Nghi D.Q.Bui, Linh Ngo Van, Trung Le
摘要
:Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful approach for strengthening the reasoning capabilities of large language models (LLMs). Among existing algorithms, Group Relative Policy Optimization (GRPO) has demonstrated strong performance, yet it suffers from a critical issue: low-probability tokens disproportionately dominate gradient updates due to their inherently large gradient magnitudes. This imbalance leads to unstable training and suppresses the contribution of high-probability tokens that are more reliable for learning. In this work, we introduce Token-Regulated Group Relative Policy Optimization (TR-GRPO), a simple yet effective extension of GRPO that assigns token-level weights positively correlated with the model's predicted probability. By downweighting low-probability tokens and emphasizing high-probability ones, TR-GRPO mitigates gradient over-amplification while preserving informative learning signals. Extensive experiments demonstrate that TR-GRPO consistently outperforms GRPO across RLVR tasks, including logic, math, and agentic reasoning, highlighting the importance of regulating token contributions during RL training and establishing TR-GRPO as a robust framework for enhancing LLM reasoning.
【34】MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling
标题:MISA:采用模块重要性抽样的内存高效LLM优化
链接:https://arxiv.org/abs/2511.00056
作者:Yuxi Liu, Renjia Deng, Yutong He, Xue Wang, Tao Yao, Kun Yuan
摘要:The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimization, which treats each transformer block as a single layer and optimizes it sequentially, while freezing the other layers to save optimizer states and activations. Although effective, these methods ignore the varying importance of the modules within each layer, leading to suboptimal performance. Moreover, layer-wise sampling provides only limited memory savings, as at least one full layer must remain active during optimization. To overcome these limitations, we propose Module-wise Importance SAmpling (MISA), a novel method that divides each layer into smaller modules and assigns importance scores to each module. MISA uses a weighted random sampling mechanism to activate modules, provably reducing gradient variance compared to layer-wise sampling. Additionally, we establish an \(\mathcal{O}(1/\sqrt{K})\) convergence rate under non-convex and stochastic conditions, where $K$ is the total number of block updates, and provide a detailed memory analysis showcasing MISA's superiority over existing baseline methods. Experiments on diverse learning tasks validate the effectiveness of MISA. Source code is available at https://github.com/pkumelon/MISA.
【35】FLoRA: Fused forward-backward adapters for parameter efficient fine-tuning and reducing inference-time latencies of LLMs
标题:FLoRA:融合前向-后向适配器,用于参数高效微调并减少LLM的推断时间延迟
链接:https://arxiv.org/abs/2511.00050
作者:Dhananjaya Gowda, Seoha Song, Junhyun Lee, Harshith Goka
摘要:As the large language models (LLMs) grow in size each day, efficient training and fine-tuning has never been as important as nowadays. This resulted in the great interest in parameter efficient fine-tuning (PEFT), and effective methods including low-rank adapters (LoRA) has emerged. Although the various PEFT methods have been studied extensively in the recent years, the greater part of the subject remains unexplored with the huge degree of freedom. In this paper, we propose FLoRA, a family of fused forward-backward adapters (FFBA) for parameter-efficient fine-tuning of LLMs on downstream tasks. The FFBA combine ideas from the popular LoRA and parallel adapters to improve the overall fine-tuning accuracies. At the same time, latencies are minimized by fusing the forward and backward adapters into existing projection layers of the base model. Experimental results show that the proposed FFB adapters perform significantly better than the popularly used LoRA in both accuracy and latency for a similar parameter budget.
【36】Probing Knowledge Holes in Unlearned LLMs
标题:探索未学习的法学硕士中的知识漏洞
链接:https://arxiv.org/abs/2511.00030
作者:Myeongseob Ko, Hoang Anh Just, Charles Fleming, Ming Jin, Ruoxi Jia
备注:The Thirty-ninth Annual Conference on Neural Information Processing Systems
摘要:Machine unlearning has emerged as a prevalent technical solution for selectively removing unwanted knowledge absorbed during pre-training, without requiring full retraining. While recent unlearning techniques can effectively remove undesirable content without severely compromising performance on standard benchmarks, we find that they may inadvertently create ``knowledge holes'' -- unintended losses of benign knowledge that standard benchmarks fail to capture. To probe where unlearned models reveal knowledge holes, we propose a test case generation framework that explores both immediate neighbors of unlearned content and broader areas of potential failures. Our evaluation demonstrates significant hidden costs of unlearning: up to 98.7\% of the test cases yield irrelevant or nonsensical responses from unlearned models, despite being answerable by the pretrained model. These findings necessitate rethinking the conventional approach to evaluating knowledge preservation in unlearning, moving beyond standard, static benchmarks.
【37】Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model
标题:与人工智能闲聊:通过大语言模型了解全球公司的供应链碳披露
链接:https://arxiv.org/abs/2511.00024
作者:Haotian Hang, Yueyang Shen, Vicky Zhu, Jose Cruz, Michelle Li
摘要:In the context of global sustainability mandates, corporate carbon disclosure has emerged as a critical mechanism for aligning business strategy with environmental responsibility. The Carbon Disclosure Project (CDP) hosts the world's largest longitudinal dataset of climate-related survey responses, combining structured indicators with open-ended narratives, but the heterogeneity and free-form nature of these disclosures present significant analytical challenges for benchmarking, compliance monitoring, and investment screening. This paper proposes a novel decision-support framework that leverages large language models (LLMs) to assess corporate climate disclosure quality at scale. It develops a master rubric that harmonizes narrative scoring across 11 years of CDP data (2010-2020), enabling cross-sector and cross-country benchmarking. By integrating rubric-guided scoring with percentile-based normalization, our method identifies temporal trends, strategic alignment patterns, and inconsistencies in disclosure across industries and regions. Results reveal that sectors such as technology and countries like Germany consistently demonstrate higher rubric alignment, while others exhibit volatility or superficial engagement, offering insights that inform key decision-making processes for investors, regulators, and corporate environmental, social, and governance (ESG) strategists. The proposed LLM-based approach transforms unstructured disclosures into quantifiable, interpretable, comparable, and actionable intelligence, advancing the capabilities of AI-enabled decision support systems (DSSs) in the domain of climate governance.
Graph相关(图学习|图神经网络|图优化等)(16篇)
【1】Gated Fusion Enhanced Multi-Scale Hierarchical Graph Convolutional Network for Stock Movement Prediction
标题:用于股票走势预测的门控融合增强型多尺度层次图卷积网络
链接:https://arxiv.org/abs/2511.01570
作者:Xiaosha Xue, Peibo Duan, Zhipeng Liu, Qi Chu, Changsheng Zhang, Bin zhang
摘要:Accurately predicting stock market movements remains a formidable challenge due to the inherent volatility and complex interdependencies among stocks. Although multi-scale Graph Neural Networks (GNNs) hold potential for modeling these relationships, they frequently neglect two key points: the subtle intra-attribute patterns within each stock affecting inter-stock correlation, and the biased attention to coarse- and fine-grained features during multi-scale sampling. To overcome these challenges, we introduce MS-HGFN (Multi-Scale Hierarchical Graph Fusion Network). The model features a hierarchical GNN module that forms dynamic graphs by learning patterns from intra-attributes and features from inter-attributes over different time scales, thus comprehensively capturing spatio-temporal dependencies. Additionally, a top-down gating approach facilitates the integration of multi-scale spatio-temporal features, preserving critical coarse- and fine-grained features without too much interference. Experiments utilizing real-world datasets from U.S. and Chinese stock markets demonstrate that MS-HGFN outperforms both traditional and advanced models, yielding up to a 1.4% improvement in prediction accuracy and enhanced stability in return simulations. The code is available at https://anonymous.4open.science/r/MS-HGFN.
【2】Efficient Curvature-aware Graph Network
标题:高效的曲线感知图网络
链接:https://arxiv.org/abs/2511.01443
作者:Chaoqun Fei, Tinglve Zhou, Tianyong Hao, Yangyang Li
摘要:Graph curvature provides geometric priors for Graph Neural Networks (GNNs), enhancing their ability to model complex graph structures, particularly in terms of structural awareness, robustness, and theoretical interpretability. Among existing methods, Ollivier-Ricci curvature has been extensively studied due to its strong geometric interpretability, effectively characterizing the local geometric distribution between nodes. However, its prohibitively high computational complexity limits its applicability to large-scale graph datasets. To address this challenge, we propose a novel graph curvature measure--Effective Resistance Curvature--which quantifies the ease of message passing along graph edges using the effective resistance between node pairs, instead of the optimal transport distance. This method significantly outperforms Ollivier-Ricci curvature in computational efficiency while preserving comparable geometric expressiveness. Theoretically, we prove the low computational complexity of effective resistance curvature and establish its substitutability for Ollivier-Ricci curvature. Furthermore, extensive experiments on diverse GNN tasks demonstrate that our method achieves competitive performance with Ollivier-Ricci curvature while drastically reducing computational overhead.
【3】Leveraging Compact Satellite Embeddings and Graph Neural Networks for Large-Scale Poverty Mapping
标题:利用紧凑的卫星嵌入和图神经网络进行大规模贫困绘图
链接:https://arxiv.org/abs/2511.01408
作者:Markus B. Pettersson, Adel Daoud
摘要:Accurate, fine-grained poverty maps remain scarce across much of the Global South. While Demographic and Health Surveys (DHS) provide high-quality socioeconomic data, their spatial coverage is limited and reported coordinates are randomly displaced for privacy, further reducing their quality. We propose a graph-based approach leveraging low-dimensional AlphaEarth satellite embeddings to predict cluster-level wealth indices across Sub-Saharan Africa. By modeling spatial relations between surveyed and unlabeled locations, and by introducing a probabilistic "fuzzy label" loss to account for coordinate displacement, we improve the generalization of wealth predictions beyond existing surveys. Our experiments on 37 DHS datasets (2017-2023) show that incorporating graph structure slightly improves accuracy compared to "image-only" baselines, demonstrating the potential of compact EO embeddings for large-scale socioeconomic mapping.
【4】KAT-GNN: A Knowledge-Augmented Temporal Graph Neural Network for Risk Prediction in Electronic Health Records
标题:KAT-GNN:用于电子健康记录风险预测的知识增强时间图神经网络
链接:https://arxiv.org/abs/2511.01249
作者:Kun-Wei Lin, Yu-Chen Kuo, Hsin-Yao Wang, Yi-Ju Tseng
备注:10 pages, 3 figures
摘要:Clinical risk prediction using electronic health records (EHRs) is vital to facilitate timely interventions and clinical decision support. However, modeling heterogeneous and irregular temporal EHR data presents significant challenges. We propose \textbf{KAT-GNN} (Knowledge-Augmented Temporal Graph Neural Network), a graph-based framework that integrates clinical knowledge and temporal dynamics for risk prediction. KAT-GNN first constructs modality-specific patient graphs from EHRs. These graphs are then augmented using two knowledge sources: (1) ontology-driven edges derived from SNOMED CT and (2) co-occurrence priors extracted from EHRs. Subsequently, a time-aware transformer is employed to capture longitudinal dynamics from the graph-encoded patient representations. KAT-GNN is evaluated on three distinct datasets and tasks: coronary artery disease (CAD) prediction using the Chang Gung Research Database (CGRD) and in-hospital mortality prediction using the MIMIC-III and MIMIC-IV datasets. KAT-GNN achieves state-of-the-art performance in CAD prediction (AUROC: 0.9269 $\pm$ 0.0029) and demonstrated strong results in mortality prediction in MIMIC-III (AUROC: 0.9230 $\pm$ 0.0070) and MIMIC-IV (AUROC: 0.8849 $\pm$ 0.0089), consistently outperforming established baselines such as GRASP and RETAIN. Ablation studies confirm that both knowledge-based augmentation and the temporal modeling component are significant contributors to performance gains. These findings demonstrate that the integration of clinical knowledge into graph representations, coupled with a time-aware attention mechanism, provides an effective and generalizable approach for risk prediction across diverse clinical tasks and datasets.
【5】WindMiL: Equivariant Graph Learning for Wind Loading Prediction
标题:WindMiL:用于风负荷预测的等变图学习
链接
:https://arxiv.org/abs/2511.01226
作者:Themistoklis Vargiemezis, Charilaos Kanatsoulis, Catherine Gorlé
摘要:Accurate prediction of wind loading on buildings is crucial for structural safety and sustainable design, yet conventional approaches such as wind tunnel testing and large-eddy simulation (LES) are prohibitively expensive for large-scale exploration. Each LES case typically requires at least 24 hours of computation, making comprehensive parametric studies infeasible. We introduce WindMiL, a new machine learning framework that combines systematic dataset generation with symmetry-aware graph neural networks (GNNs). First, we introduce a large-scale dataset of wind loads on low-rise buildings by applying signed distance function interpolation to roof geometries and simulating 462 cases with LES across varying shapes and wind directions. Second, we develop a reflection-equivariant GNN that guarantees physically consistent predictions under mirrored geometries. Across interpolation and extrapolation evaluations, WindMiL achieves high accuracy for both the mean and the standard deviation of surface pressure coefficients (e.g., RMSE $\leq 0.02$ for mean $C_p$) and remains accurate under reflected-test evaluation, maintaining hit rates above $96\%$ where the non-equivariant baseline model drops by more than $10\%$. By pairing a systematic dataset with an equivariant surrogate, WindMiL enables efficient, scalable, and accurate predictions of wind loads on buildings.
【6】Equality Graph Assisted Symbolic Regression
标题:等式图辅助符号回归
链接:https://arxiv.org/abs/2511.01009
作者:Fabricio Olivetti de Franca, Gabriel Kronberger
摘要:In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in term of accuracy. Its success relies on the concept of neutrality, which induces large plateaus that the search can safely navigate to more promising regions. Navigating these plateaus, while necessary, requires the computation of redundant expressions, up to 60% of the total number of evaluation, as noted in a recent study. The equality graph (e-graph) structure can compactly store and group equivalent expressions enabling us to verify if a given expression and their variations were already visited by the search, thus enabling us to avoid unnecessary computation. We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure following simple steps: perturb solutions sampled from a selection of expressions stored in the e-graph, if it generates an unvisited expression, insert it into the e-graph and generates its equivalent forms. We show that SymRegg is capable of improving the efficiency of the search, maintaining consistently accurate results across different datasets while requiring a choice of a minimalist set of hyperparameters.
【7】Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
标题:均衡政策推广:追逐-逃避游戏中交叉图Zero-Shot推广的强化学习框架
链接:https://arxiv.org/abs/2511.00811
作者:Runyu Lu, Peng Zhang, Ruochuan Shi, Yuanheng Zhu, Dongbin Zhao, Yang Liu, Dong Wang, Cesare Alippi
摘要:Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. These two generalizability properties, to our knowledge, are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the equilibrium policy for each single graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates pure-strategy Nash equilibrium with near-optimal time complexity. To guarantee scalability with respect to pursuer number, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance in various unseen real-world graphs. Besides, when trained under an equilibrium heuristic proposed for the graphs with exits, our generalized pursuer policy can even match the performance of the fine-tuned policies from the state-of-the-art PEG methods.
【8】A Framework Based on Graph Cellular Automata for Similarity Evaluation in Urban Spatial Networks
标题:基于图元胞自动机的城市空间网络相似性评估框架
链接:https://arxiv.org/abs/2511.00768
作者:Peiru Wu, Maojun Zhai, Lingzhu Zhang
摘要:Measuring similarity in urban spatial networks is key to understanding cities as complex systems. Yet most existing methods are not tailored for spatial networks and struggle to differentiate them effectively. We propose GCA-Sim, a similarity-evaluation framework based on graph cellular automata. Each submodel measures similarity by the divergence between value distributions recorded at multiple stages of an information evolution process. We find that some propagation rules magnify differences among network signals; we call this "network resonance." With an improved differentiable logic-gate network, we learn several submodels that induce network resonance. We evaluate similarity through clustering performance on fifty city-level and fifty district-level road networks. The submodels in this framework outperform existing methods, with Silhouette scores above 0.9. Using the best submodel, we further observe that planning-led street networks are less internally homogeneous than organically grown ones; morphological categories from different domains contribute with comparable importance; and degree, as a basic topological signal, becomes increasingly aligned with land value and related variables over iterations.
【9】MeixnerNet: Adaptive and Robust Spectral Graph Neural Networks with Discrete Orthogonal Polynomials
标题:MeixnerNet:具有离散垂直多项的自适应鲁棒谱图神经网络
链接
:https://arxiv.org/abs/2511.00113
作者:Huseyin Goksu
摘要:Spectral Graph Neural Networks (GNNs) have achieved state-of-the-art results by defining graph convolutions in the spectral domain. A common approach, popularized by ChebyNet, is to use polynomial filters based on continuous orthogonal polynomials (e.g., Chebyshev). This creates a theoretical disconnect, as these continuous-domain filters are applied to inherently discrete graph structures. We hypothesize this mismatch can lead to suboptimal performance and fragility to hyperparameter settings. In this paper, we introduce MeixnerNet, a novel spectral GNN architecture that employs discrete orthogonal polynomials -- specifically, the Meixner polynomials $M_k(x; \beta, c)$. Our model makes the two key shape parameters of the polynomial, beta and c, learnable, allowing the filter to adapt its polynomial basis to the specific spectral properties of a given graph. We overcome the significant numerical instability of these polynomials by introducing a novel stabilization technique that combines Laplacian scaling with per-basis LayerNorm. We demonstrate experimentally that MeixnerNet achieves competitive-to-superior performance against the strong ChebyNet baseline at the optimal K = 2 setting (winning on 2 out of 3 benchmarks). More critically, we show that MeixnerNet is exceptionally robust to variations in the polynomial degree K, a hyperparameter to which ChebyNet proves to be highly fragile, collapsing in performance where MeixnerNet remains stable.
【10】GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation
标题:GraphKeeper:通过知识分解和保存的图域增量学习
链接:https://arxiv.org/abs/2511.00097
作者:Zihao Guo, Qingyun Sun, Ziwei Zhang, Haonan Yuan, Huiping Zhuang, Xingcheng Fu, Jianxin Li
备注:Accepted by the Main Track of NeurIPS-2025
摘要:Graph incremental learning (GIL), which continuously updates graph models by sequential knowledge acquisition, has garnered significant interest recently. However, existing GIL approaches focus on task-incremental and class-incremental scenarios within a single domain. Graph domain-incremental learning (Domain-IL), aiming at updating models across multiple graph domains, has become critical with the development of graph foundation models (GFMs), but remains unexplored in the literature. In this paper, we propose Graph Domain-Incremental Learning via Knowledge Dientanglement and Preservation (GraphKeeper), to address catastrophic forgetting in Domain-IL scenario from the perspectives of embedding shifts and decision boundary deviations. Specifically, to prevent embedding shifts and confusion across incremental graph domains, we first propose the domain-specific parameter-efficient fine-tuning together with intra- and inter-domain disentanglement objectives. Consequently, to maintain a stable decision boundary, we introduce deviation-free knowledge preservation to continuously fit incremental domains. Additionally, for graphs with unobservable domains, we perform domain-aware distribution discrimination to obtain precise embeddings. Extensive experiments demonstrate the proposed GraphKeeper achieves state-of-the-art results with 6.5%~16.6% improvement over the runner-up with negligible forgetting. Moreover, we show GraphKeeper can be seamlessly integrated with various representative GFMs, highlighting its broad applicative potential.
【11】Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
标题:将测试时计算最优缩放推广为可优化图
链接:https://arxiv.org/abs/2511.00086
作者:Fali Wang, Jihai Chen, Shuhua Yang, Runxue Bao, Tianxiang Zhao, Zhiwei Zhang, Xianfeng Tang, Hui Liu, Qi He, Suhang Wang
备注:Under review
摘要:Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference, typically through parallel, sequential, or hybrid scaling. However, prior studies often assume fixed collaboration architectures (e.g., topologies) and single-model usage, overlooking that optimal architectures and model combinations can vary across tasks. Therefore, we study the novel problem of searching for compute-optimal model combinations and architectures in TTS under a fixed budget. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and LLM model assignments, and edges capture information flow. This problem is challenging because (i) the combinatorial search space is prohibitively large, and (ii) task-specific requirements demand tailored designs. To address these, we reformulate the problem as probabilistic graph optimization and, through pilot experiments, derive three empirical insights into TTS collaboration graphs. Guided by these insights, we propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update, where feedback serves as a textual gradient to update the probabilistic graph and efficiently search for optimal multi-LLM collaboration graphs. Experiments show that Agent-REINFORCE outperforms both traditional and LLM-based baselines in sample efficiency and search performance, and effectively identifies optimal graphs under joint objectives of accuracy and inference latency.
【12】Fixed-point graph convolutional networks against adversarial attacks
标题:抗对抗攻击的定点图卷积网络
链接:https://arxiv.org/abs/2511.00083
作者:Shakib Khan, A. Ben Hamza, Amr Youssef
摘要:Adversarial attacks present a significant risk to the integrity and performance of graph neural networks, particularly in tasks where graph structure and node features are vulnerable to manipulation. In this paper, we present a novel model, called fixed-point iterative graph convolutional network (Fix-GCN), which achieves robustness against adversarial perturbations by effectively capturing higher-order node neighborhood information in the graph without additional memory or computational complexity. Specifically, we introduce a versatile spectral modulation filter and derive the feature propagation rule of our model using fixed-point iteration. Unlike traditional defense mechanisms that rely on additional design elements to counteract attacks, the proposed graph filter provides a flexible-pass filtering approach, allowing it to selectively attenuate high-frequency components while preserving low-frequency structural information in the graph signal. By iteratively updating node representations, our model offers a flexible and efficient framework for preserving essential graph information while mitigating the impact of adversarial manipulation. We demonstrate the effectiveness of the proposed model through extensive experiments on various benchmark graph datasets, showcasing its resilience against adversarial attacks.
【13】EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics
标题:EVINGCA:具有不断变化的邻居统计的自适应图集群
链接:https://arxiv.org/abs/2511.00064
作者:Randolph Wiredu-Aidoo
摘要:Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm), a density-variance based clustering algorithm that treats cluster formation as an adaptive, evolving process on a nearest-neighbor graph. EVINGCA expands rooted graphs via breadth-first search, guided by continuously updated local distance and shape statistics, replacing fixed density thresholds with local statistical feedback. With spatial indexing, EVINGCA features log-linear complexity in the average case and exhibits competitive performance against baselines across a variety of synthetic, real-world, low-d, and high-d datasets.
【14】Adaptive Spatio-Temporal Graphs with Self-Supervised Pretraining for Multi-Horizon Weather Forecasting
标题:具有自我监督预训练的自适应时空图用于多地平线天气预报
链接:https://arxiv.org/abs/2511.00049
作者:Yao Liu
摘要:Accurate and robust weather forecasting remains a fundamental challenge due to the inherent spatio-temporal complexity of atmospheric systems. In this paper, we propose a novel self-supervised learning framework that leverages spatio-temporal structures to improve multi-variable weather prediction. The model integrates a graph neural network (GNN) for spatial reasoning, a self-supervised pretraining scheme for representation learning, and a spatio-temporal adaptation mechanism to enhance generalization across varying forecasting horizons. Extensive experiments on both ERA5 and MERRA-2 reanalysis datasets demonstrate that our approach achieves superior performance compared to traditional numerical weather prediction (NWP) models and recent deep learning methods. Quantitative evaluations and visual analyses in Beijing and Shanghai confirm the model's capability to capture fine-grained meteorological patterns. The proposed framework provides a scalable and label-efficient solution for future data-driven weather forecasting systems.
【15】DynBERG: Dynamic BERT-based Graph neural network for financial fraud detection
标题:DynBERG:用于金融欺诈检测的基于BERT的动态图神经网络
链接:https://arxiv.org/abs/2511.00047
作者:Omkar Kulkarni, Rohitash Chandra
摘要:Financial fraud detection is critical for maintaining the integrity of financial systems, particularly in decentralised environments such as cryptocurrency networks. Although Graph Convolutional Networks (GCNs) are widely used for financial fraud detection, graph Transformer models such as Graph-BERT are gaining prominence due to their Transformer-based architecture, which mitigates issues such as over-smoothing. Graph-BERT is designed for static graphs and primarily evaluated on citation networks with undirected edges. However, financial transaction networks are inherently dynamic, with evolving structures and directed edges representing the flow of money. To address these challenges, we introduce DynBERG, a novel architecture that integrates Graph-BERT with a Gated Recurrent Unit (GRU) layer to capture temporal evolution over multiple time steps. Additionally, we modify the underlying algorithm to support directed edges, making DynBERG well-suited for dynamic financial transaction analysis. We evaluate our model on the Elliptic dataset, which includes Bitcoin transactions, including all transactions during a major cryptocurrency market event, the Dark Market Shutdown. By assessing DynBERG's resilience before and after this event, we analyse its ability to adapt to significant market shifts that impact transaction behaviours. Our model is benchmarked against state-of-the-art dynamic graph classification approaches, such as EvolveGCN and GCN, demonstrating superior performance, outperforming EvolveGCN before the market shutdown and surpassing GCN after the event. Additionally, an ablation study highlights the critical role of incorporating a time-series deep learning component, showcasing the effectiveness of GRU in modelling the temporal dynamics of financial transactions.
【16】Graph-Attentive MAPPO for Dynamic Retail Pricing
标题:图形关注的MAPPO,用于动态零售定价
链接:https://arxiv.org/abs/2511.00039
作者:Krishna Kumar Neelakanta Pillai Santha Kumari Amma
摘要:Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization, comparing a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT) that leverages learned interactions among products. Using a simulated pricing environment derived from real transaction data, we evaluate profit, stability across random seeds, fairness across products, and training efficiency under a standardized evaluation protocol. The results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control, and that MAPPO+GAT further enhances performance by sharing information over the product graph without inducing excessive price volatility. These results indicate that graph-integrated MARL provides a more scalable and stable solution than independent learners for dynamic retail pricing, offering practical advantages in multi-product decision-making.
Transformer(10篇)
【1】HIT-ROCKET: Hadamard-vector Inner-product Transformer for ROCKET
标题:HIT-ROCKET:用于ROCKET的Hadamard向量内积Transformer
链接:https://arxiv.org/abs/2511.01572
作者:Wang Hao, Kuang Zhang, Hou Chengyu, Yuan Zhonghao, Tan Chenxing, Fu Weifeng, Zhu Yangying
摘要
:Time series classification holds broad application value in communications, information countermeasures, finance, and medicine. However, state-of-the-art (SOTA) methods-including HIVE-COTE, Proximity Forest, and TS-CHIEF-exhibit high computational complexity, coupled with lengthy parameter tuning and training cycles. In contrast, lightweight solutions like ROCKET (Random Convolutional Kernel Transform) offer greater efficiency but leave substantial room for improvement in kernel selection and computational overhead. To address these challenges, we propose a feature extraction approach based on Hadamard convolutional transform, utilizing column or row vectors of Hadamard matrices as convolution kernels with extended lengths of varying sizes. This enhancement maintains full compatibility with existing methods (e.g., ROCKET) while leveraging kernel orthogonality to boost computational efficiency, robustness, and adaptability. Comprehensive experiments on multi-domain datasets-focusing on the UCR time series dataset-demonstrate SOTA performance: F1-score improved by at least 5% vs. ROCKET, with 50% shorter training time than miniROCKET (fastest ROCKET variant) under identical hyperparameters, enabling deployment on ultra-low-power embedded devices. All code is available on GitHub.
【2】Transformer-Based Decoding in Concatenated Coding Schemes Under Synchronization Errors
标题:同步错误下级联编码方案中基于变换器的解码
链接:https://arxiv.org/abs/2511.00999
作者:Julian Streit, Franziska Weindel, Reinhard Heckel
备注:16 pages, 19 figures, a shortened version was published in the ISIT 2025 conference
摘要:We consider the reconstruction of a codeword from multiple noisy copies that are independently corrupted by insertions, deletions, and substitutions. This problem arises, for example, in DNA data storage. A common code construction uses a concatenated coding scheme that combines an outer linear block code with an inner code, which can be either a nonlinear marker code or a convolutional code. Outer decoding is done with Belief Propagation, and inner decoding is done with the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm. However, the BCJR algorithm scales exponentially with the number of noisy copies, which makes it infeasible to reconstruct a codeword from more than about four copies. In this work, we introduce BCJRFormer, a transformer-based neural inner decoder. BCJRFormer achieves error rates comparable to the BCJR algorithm for binary and quaternary single-message transmissions of marker codes. Importantly, BCJRFormer scales quadratically with the number of noisy copies. This property makes BCJRFormer well-suited for DNA data storage, where multiple reads of the same DNA strand occur. To lower error rates, we replace the Belief Propagation outer decoder with a transformer-based decoder. Together, these modifications yield an efficient and performant end-to-end transformer-based pipeline for decoding multiple noisy copies affected by insertion, deletion, and substitution errors. Additionally, we propose a novel cross-attending transformer architecture called ConvBCJRFormer. This architecture extends BCJRFormer to decode transmissions of convolutional codewords, serving as an initial step toward joint inner and outer decoding for more general linear code classes.
【3】Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle
标题:Transformer作为本质优化者:通过能量原理的向前推理
链接:https://arxiv.org/abs/2511.00907
作者:Ruifeng Ren, Sheng Ouyang, Huayi Tang, Yong Liu
摘要:Transformers have demonstrated strong adaptability across a wide range of tasks and have become the backbone of modern Large Language Models (LLMs). However, their underlying mechanisms remain open for further exploration. The energy-based perspective has long provided a valuable principle for understanding neural computation. In this paper, we revisit the principle of energy as a lens to understand attention-based Transformer models. We present a unified energy-based framework which is composed of three key components: the global energy $F^*$, the energy function $E_i$ and the employed gradient descent (GD) form. Within this framework, standard softmax attention can be viewed as a special case of minimizing the Helmholtz free energy as $F^*$ using standard GD when $E_i$ takes the form of elastic potential energy, with residual connections ensuring that this optimization proceeds in an incremental manner. In addition, linear attentions can also be naturally incorporated into this framework by adjusting the corresponding energy forms. We also extend the above analysis to the multi-head setting, where the energy is defined across multiple low-dimensional subspaces. Building on this framework, we propose energy-based modifications of attention structures. Inspired by classical GD algorithms, we extend the original attention formulation based on standard GD to the momentum-based GD, Nesterov Accelerated Gradient (NAG), and Newton's method variants, each inducing a corresponding new attention structure. Our experiments provide preliminary support for the potential of the energy-based framework for designing attention mechanisms.
【4】LL-ViT: Edge Deployable Vision Transformers with Look Up Table Neurons
标题:LL-ViT:具有查找表神经元的边缘可部署视觉Transformer
链接:https://arxiv.org/abs/2511.00812
作者:Shashank Nag, Alan T.L. Bacellar, Zachary Susskind, Anshul Jha, Logan Liberty, Aishwarya Sivakumar, Eugene B. John, Krishnan Kailas, Priscila M.V. Lima, Neeraja J. Yadwadkar, Felipe M.G. Franca, Lizy K. John
备注:Accepted for FPT 2025, 9 pages, conference
摘要:Vision Transformers have been tremendously successful in computer vision tasks. However, their large computational, memory, and energy demands are a challenge for edge inference on FPGAs -- a field that has seen a recent surge in demand. We recognize the benefits of recent works on logic and Look Up Table (LUT) based networks, such as LogicNets, NeuraLUT, DWN, among others, in offering models that simultaneously reduce both the memory and compute footprints. However, these models natively do not perform well on common vision tasks, such as CIFAR-10/100. In this work, we propose LL-ViT, a novel edge optimized vision transformer design that integrates layers of LUT neurons within the transformer architecture. Based on our characterization that reveals that a majority of model weights and computations are from the channel mixer (MLP layer), we design an alternate LUT-based channel mixer, and simultaneously develop an FPGA-based accelerator for LL-ViT. Contrary to some attempts to replace each multiplication with a table lookup, our architecture utilizes a neural learning approach which natively learns the LUT functions. This approach allows for reduced model sizes, and a computational and energy-efficient inference solution for vision transformer models. Evaluating on edge-suitable workloads, we achieve accuracies of 95.5% on CIFAR-10, 78.8% on CIFAR-100, and 60.9% on Tiny-ImageNet datasets, comparable to the baseline transformer. LL-ViT eliminates over 60% of the model weights and 50% of the multiplications in the model, and achieves 1.9x energy efficiency and 1.3x lower latency over an integer quantized ViT accelerator, while also offering superior throughput against prior works at a 10.9W power budget.
【5】Attention Saturation and Gradient Suppression at Inflection Layers: Diagnosing and Mitigating Bottlenecks in Transformer Adaptation
标题:拐点处的注意力饱和和梯度抑制:诊断和缓解Transformer适应中的瓶颈
链接:https://arxiv.org/abs/2511.00797
作者:Wang Zixian
摘要:Pre-trained Transformers often exhibit over-confidence in source patterns and difficulty in forming new target-domain patterns during fine-tuning. We formalize the mechanism of output saturation leading to gradient suppression through standard cross-entropy and softmax analysis, showing that gradient suppression at inflection layers confines adaptation to high-level recombination of existing features while preventing low-level reconstruction. We introduce a set of layer-wise diagnostic metrics -- attention entropy (saturation proxy), activation gradient norm, parameter gradient norm, and Delta-CKA under a shared PCA basis -- to identify inflection layers characterized by both low attention entropy and steep gradient decay. Building on these findings, we propose a diagnose-first, inject-light fine-tuning strategy: selectively inserting LoRA adapters at inflection layers to restore suppressed backward signals with minimal parameter overhead. Experiments on BERT-base transfer from SST-2 to Rotten Tomatoes under under-trained and over-trained source regimes reveal that over-trained initialization benefits from inflection-layer LoRA injection, while under-trained initialization suffers performance degradation. When base features are strong, unblocking inflection layers facilitates high-level compositional adaptation; when base features are weak, full-pathway unblocking is required for low-level reconstruction, as supported by joint analysis of layer-wise activation gradients and Delta-CKA dynamics.
【6】FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction
标题:FTT-GRU:一种与GRU的混合快速时态Transformer,用于剩余使用寿命预测
链接:https://arxiv.org/abs/2511.00564
作者:Varun Teja Chirukiri, Udaya Bhasker Cheerala, Sandeep Kanta, Abdul Karim, Praveen Damacharla
备注:5 pages, The 2025 International Conference on Computational Science and Computational Intelligence
摘要:Accurate prediction of the remaining useful life (RUL) of industrial machinery is essential for reducing downtime and optimizing maintenance schedules. Existing approaches, such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs), often struggle to model both global temporal dependencies and fine-grained degradation trends in multivariate sensor data. We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT) -- a lightweight Transformer variant using linearized attention via fast Fourier transform (FFT) -- with a gated recurrent unit (GRU) layer for sequential modeling. To the best of our knowledge, this is the first application of an FTT with a GRU for RUL prediction on NASA CMAPSS, enabling simultaneous capture of global and local degradation patterns in a compact architecture. On CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2=0.45$, with 1.12 ms CPU latency at batch=1. Relative to the best published deep baseline (TCN--Attention), it improves RMSE by 1.16\% and MAE by 4.00\%. Training curves averaged over $k=3$ runs show smooth convergence with narrow 95\% confidence bands, and ablations (GRU-only, FTT-only) support the contribution of both components. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS, making it suitable for real-time industrial prognostics.
【7】Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales
标题:用于每周零售额多水平概率预测的时间融合Transformer
链接:https://arxiv.org/abs/2511.00552
作者:Santhi Bharath Punati, Sandeep Kanta, Udaya Bhasker Cheerala, Madhusudan G Lanjewar, Praveen Damacharla
备注:5 pages, 2025 6th International Conference on Data Analytics for Business and Industry (ICDABI)
摘要:Accurate multi-horizon retail forecasts are critical for inventory and promotions. We present a novel study of weekly Walmart sales (45 stores, 2010--2012) using a Temporal Fusion Transformer (TFT) that fuses static store identifiers with time-varying exogenous signals (holidays, CPI, fuel price, temperature). The pipeline produces 1--5-week-ahead probabilistic forecasts via Quantile Loss, yielding calibrated 90\% prediction intervals and interpretability through variable-selection networks, static enrichment, and temporal attention. On a fixed 2012 hold-out dataset, TFT achieves an RMSE of \$57.9k USD per store-week and an $R^2$ of 0.9875. Across a 5-fold chronological cross-validation, the averages are RMSE = \$64.6k USD and $R^2$ = 0.9844, outperforming the XGB, CNN, LSTM, and CNN-LSTM baseline models. These results demonstrate practical value for inventory planning and holiday-period optimization, while maintaining model transparency.
【8】Integrating ConvNeXt and Vision Transformers for Enhancing Facial Age Estimation
标题:集成ConvNeXt和Vision Transformers以增强面部年龄估计
链接:https://arxiv.org/abs/2511.00123
作者:Gaby Maroun, Salah Eddine Bekhouche, Fadi Dornaika
摘要
:Age estimation from facial images is a complex and multifaceted challenge in computer vision. In this study, we present a novel hybrid architecture that combines ConvNeXt, a state-of-the-art advancement of convolutional neural networks (CNNs), with Vision Transformers (ViT). While each model independently delivers excellent performance on a variety of tasks, their integration leverages the complementary strengths of the CNNs localized feature extraction capabilities and the Transformers global attention mechanisms. Our proposed ConvNeXt-ViT hybrid solution was thoroughly evaluated on benchmark age estimation datasets, including MORPH II, CACD, and AFAD, and achieved superior performance in terms of mean absolute error (MAE). To address computational constraints, we leverage pre-trained models and systematically explore different configurations, using linear layers and advanced regularization techniques to optimize the architecture. Comprehensive ablation studies highlight the critical role of individual components and training strategies, and in particular emphasize the importance of adapted attention mechanisms within the CNN framework to improve the model focus on age-relevant facial features. The results show that the ConvNeXt-ViT hybrid not only outperforms traditional methods, but also provides a robust foundation for future advances in age estimation and related visual tasks. This work underscores the transformative potential of hybrid architectures and represents a promising direction for the seamless integration of CNNs and transformers to address complex computer vision challenges.
【9】Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers
标题:通过混合神经ODE-变形器自动发现保守定律
链接:https://arxiv.org/abs/2511.00102
作者:Vivan Doshi
备注:5th Math-AI Workshop - Neural Information Processing Systems (NeurIPS 2025)
摘要:The discovery of conservation laws is a cornerstone of scientific progress. However, identifying these invariants from observational data remains a significant challenge. We propose a hybrid framework to automate the discovery of conserved quantities from noisy trajectory data. Our approach integrates three components: (1) a Neural Ordinary Differential Equation (Neural ODE) that learns a continuous model of the system's dynamics, (2) a Transformer that generates symbolic candidate invariants conditioned on the learned vector field, and (3) a symbolic-numeric verifier that provides a strong numerical certificate for the validity of these candidates. We test our framework on canonical physical systems and show that it significantly outperforms baselines that operate directly on trajectory data. This work demonstrates the robustness of a decoupled learn-then-search approach for discovering mathematical principles from imperfect data.
【10】Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer
标题:Transformer模型中种子诱导的唯一性:子空间对齐控制潜意识转移
链接:https://arxiv.org/abs/2511.01023
作者:Ayşe Selin Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz
备注:Cite as A. S. Okatan, M. I. Akbaş, L. N. Kandel, and B. Peköz, "Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer," in Proc. 2025 Cyber Awareness and Research Symp. (IEEE CARS 2025), Grand Forks, ND, Oct. 2025, pp. 6
摘要:We analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that can be linearly decoded by a student without degrading main-task performance. Prior work often attributes transferability to global representational similarity, typically quantified with Centered Kernel Alignment (CKA). Using synthetic corpora with disentangled public and private labels, we distill students under matched and independent random initializations. We find that transfer strength hinges on alignment within a trait-discriminative subspace: same-seed students inherit this alignment and show higher leakage {\tau \approx} 0.24, whereas different-seed students--despite global CKA > 0.9--exhibit substantially reduced excess accuracy {\tau \approx} 0.12 - 0.13. We formalize this with subspace-level CKA diagnostic and residualized probes, showing that leakage tracks alignment within the trait-discriminative subspace rather than global representational similarity. Security controls (projection penalty, adversarial reversal, right-for-the-wrong-reasons regularization) reduce leakage in same-base models without impairing public-task fidelity. These results establish seed-induced uniqueness as a resilience property and argue for subspace-aware diagnostics for secure multi-model deployments.
GAN|对抗|攻击|生成相关(14篇)
【1】RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
标题:RLAC:具有对抗性的强化学习对自由形式生成任务的批评
链接:https://arxiv.org/abs/2511.01758
作者:Mian Wu, Gavin Zhang, Sewon Min, Sergey Levine, Aviral Kumar
备注:Project page: this https URL
摘要:Open-ended generation tasks require outputs to satisfy diverse and often implicit task-specific evaluation rubrics. The sheer number of relevant rubrics leads to prohibitively high verification costs and incomplete assessments of a response, making reinforcement learning (RL) post-training with rubric-based rewards difficult to scale. This problem is exacerbated by the fact that often the best way to combine these rubrics into one single reward is also highly prompt-specific. We propose Reinforcement Learning with Adversarial Critic (RLAC), a post-training approach that addresses these challenges via dynamic rubric verification. Our approach employs a large language model (LLM) as a critic that dynamically identifies only the most likely failure modes (e.g., a factual error or unhandled edge case), which are then verified by an external validator to optimize both generator and critic jointly. By training both the generator and the critic, this game enhances the critic's error detection and the generator's output quality while reducing required verifications. Our experiments demonstrate that RLAC improves factual accuracy in text generation and correctness in code generation, while also outperforming exhaustive verification and reward model methods. We show that dynamic critics are more effective than fixed critics, showcasing the potential of RLAC for scaling RL post-training to free-form generation tasks.
【2】Protecting the Neural Networks against FGSM Attack Using Machine Unlearning
标题:使用机器去学习保护神经网络免受FGSM攻击
链接:https://arxiv.org/abs/2511.01377
作者:Amir Hossein Khorasani, Ali Jahanian, Maryam Rastgarpour
备注:7 pages, 9 figures, 1 table
摘要:Machine learning is a powerful tool for building predictive models. However, it is vulnerable to adversarial attacks. Fast Gradient Sign Method (FGSM) attacks are a common type of adversarial attack that adds small perturbations to input data to trick a model into misclassifying it. In response to these attacks, researchers have developed methods for "unlearning" these attacks, which involves retraining a model on the original data without the added perturbations. Machine unlearning is a technique that tries to "forget" specific data points from the training dataset, to improve the robustness of a machine learning model against adversarial attacks like FGSM. In this paper, we focus on applying unlearning techniques to the LeNet neural network, a popular architecture for image classification. We evaluate the efficacy of unlearning FGSM attacks on the LeNet network and find that it can significantly improve its robustness against these types of attacks.
【3】MiniFool - Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks
标题:MiniFool -深度神经网络中基于物理约束的最小化器的对抗性攻击
链接:https://arxiv.org/abs/2511.01352
作者:Lucie Flek, Oliver Janik, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Matthias Thiesmeyer, Christopher Wiebusch, Ulrich Willemsen
备注:Submitted to Computing and Software for Big Science
摘要:In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating its general applicability. Here, we apply the algorithm to the well-known MNIST data set and furthermore, to Open Data data from the CMS experiment at the Large Hadron Collider. The algorithm is based on minimizing a cost function that combines a $\chi^2$ based test-statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data based on the experimental uncertainties. For our studied use cases, we find that the likelihood of a flipped classification differs for both the initially correctly and incorrectly classified events. When testing changes of the classifications as a function of an attack parameter that scales the experimental uncertainties, the robustness of the network decision can be quantified. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.
【4】Adversarial Spatio-Temporal Attention Networks for Epileptic Seizure Forecasting
标题:用于癫痫发作预测的对抗性时空注意力网络
链接:https://arxiv.org/abs/2511.01275
作者:Zan Li, Kyongmin Yeo, Wesley Gifford, Lara Marcuse, Madeline Fields, Bülent Yener
摘要:Forecasting epileptic seizures from multivariate EEG signals represents a critical challenge in healthcare time series prediction, requiring high sensitivity, low false alarm rates, and subject-specific adaptability. We present STAN, an Adversarial Spatio-Temporal Attention Network that jointly models spatial brain connectivity and temporal neural dynamics through cascaded attention blocks with alternating spatial and temporal modules. Unlike existing approaches that assume fixed preictal durations or separately process spatial and temporal features, STAN captures bidirectional dependencies between spatial and temporal patterns through a unified cascaded architecture. Adversarial training with gradient penalty enables robust discrimination between interictal and preictal states learned from clearly defined 15-minute preictal windows. Continuous 90-minute pre-seizure monitoring reveals that the learned spatio-temporal attention patterns enable early detection: reliable alarms trigger at subject-specific times (typically 15-45 minutes before onset), reflecting the model's capacity to capture subtle preictal dynamics without requiring individualized training. Experiments on two benchmark EEG datasets (CHB-MIT scalp: 8 subjects, 46 events; MSSM intracranial: 4 subjects, 14 events) demonstrate state-of-the-art performance: 96.6% sensitivity with 0.011 false detections per hour and 94.2% sensitivity with 0.063 false detections per hour, respectively, while maintaining computational efficiency (2.3M parameters, 45 ms latency, 180 MB memory) for real-time edge deployment. Beyond epilepsy, the proposed framework provides a general paradigm for spatio-temporal forecasting in healthcare and other time series domains where individual heterogeneity and interpretability are crucial.
【5】MotionStream: Real-Time Video Generation with Interactive Motion Controls
标题:MotionStream:具有交互式运动控制的实时视频生成
链接:https://arxiv.org/abs/2511.01266
作者:Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Schechtman, Xun Huang
备注:Project webpage: this https URL
摘要:Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream, enabling sub-second latency with up to 29 FPS streaming generation on a single GPU. Our approach begins by augmenting a text-to-video model with motion control, which generates high-quality videos that adhere to the global text prompt and local motion guidance, but does not perform inference on the fly. As such, we distill this bidirectional teacher into a causal student through Self Forcing with Distribution Matching Distillation, enabling real-time streaming inference. Several key challenges arise when generating videos of long, potentially infinite time-horizons: (1) bridging the domain gap from training on finite length and extrapolating to infinite horizons, (2) sustaining high quality by preventing error accumulation, and (3) maintaining fast inference, without incurring growth in computational cost due to increasing context windows. A key to our approach is introducing carefully designed sliding-window causal attention, combined with attention sinks. By incorporating self-rollout with attention sinks and KV cache rolling during training, we properly simulate inference-time extrapolations with a fixed context window, enabling constant-speed generation of arbitrarily long videos. Our models achieve state-of-the-art results in motion following and video quality while being two orders of magnitude faster, uniquely enabling infinite-length streaming. With MotionStream, users can paint trajectories, control cameras, or transfer motion, and see results unfold in real-time, delivering a truly interactive experience.
【6】Adapt under Attack and Domain Shift: Unified Adversarial Meta-Learning and Domain Adaptation for Robust Automatic Modulation Classification
标题:在攻击和领域转移下适应:统一对抗元学习和领域适应,用于鲁棒自动调制分类
链接:https://arxiv.org/abs/2511.01172
作者:Ali Owfi, Amirmohammad Bamdad, Tolunay Seyfi, Fatemeh Afghah
摘要
:Deep learning has emerged as a leading approach for Automatic Modulation Classification (AMC), demonstrating superior performance over traditional methods. However, vulnerability to adversarial attacks and susceptibility to data distribution shifts hinder their practical deployment in real-world, dynamic environments. To address these threats, we propose a novel, unified framework that integrates meta-learning with domain adaptation, making AMC systems resistant to both adversarial attacks and environmental changes. Our framework utilizes a two-phase strategy. First, in an offline phase, we employ a meta-learning approach to train the model on clean and adversarially perturbed samples from a single source domain. This method enables the model to generalize its defense, making it resistant to a combination of previously unseen attacks. Subsequently, in the online phase, we apply domain adaptation to align the model's features with a new target domain, allowing it to adapt without requiring substantial labeled data. As a result, our framework achieves a significant improvement in modulation classification accuracy against these combined threats, offering a critical solution to the deployment and operational challenges of modern AMC systems.
【7】Effective Series Decomposition and Components Learning for Time Series Generation
标题:用于时间序列生成的有效序列分解和成分学习
链接:https://arxiv.org/abs/2511.00747
作者:Zixuan Ma, Chenfeng Huang
备注:Accepted at IEEE International Conference on Data Mining (ICDM 2025). Camera-ready version to appear
摘要:Time series generation focuses on modeling the underlying data distribution and resampling to produce authentic time series data. Key components, such as trend and seasonality, drive temporal fluctuations, yet many existing approaches fail to employ interpretative decomposition methods, limiting their ability to synthesize meaningful trend and seasonal patterns. To address this gap, we introduce Seasonal-Trend Diffusion (STDiffusion), a novel framework for multivariate time series generation that integrates diffusion probabilistic models with advanced learnable series decomposition techniques, enhancing the interpretability of the generation process. Our approach separates the trend and seasonal learning into distinct blocks: a Multi-Layer Perceptron (MLP) structure captures the trend, while adaptive wavelet distillation facilitates effective multi-resolution learning of seasonal components. This decomposition improves the interpretability of the model on multiple scales. In addition, we designed a comprehensive correction mechanism aimed at ensuring that the generated components exhibit a high degree of internal consistency and preserve meaningful interrelationships with one another. Our empirical studies on eight real-world datasets demonstrate that STDiffusion achieves state-of-the-art performance in time series generation tasks. Furthermore, we extend the model's application to multi-window long-sequence time series generation, which delivered reliable results and highlighted its robustness and versatility.
【8】Stochastic Shortest Path with Sparse Adversarial Costs
标题:对抗成本稀疏的随机最短路径
链接:https://arxiv.org/abs/2511.00637
作者:Emmeran Johnson, Alberto Rumi, Ciara Pike-Burke, Patrick Rebeschini
摘要:We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log S A}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst-case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that the negative-entropy is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$ captures the effective dimension of the problem instead of $SA$. Finally, in the unknown transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret for any learner scales polynomially with $SA$.
【9】ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training
标题:ToxicTextCLIP:对CLIP预训练的基于文本的中毒和后门攻击
链接:https://arxiv.org/abs/2511.00446
作者:Xin Yao, Haiyang Zhao, Yimin Chen, Jiawei Guo, Kecheng Huang, Ming Zhao
备注:Accepted by NeurIPS 2025
摘要:The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from large-scale web data through self-supervised contrastive learning. Yet, its reliance on uncurated Internet-sourced data exposes it to data poisoning and backdoor risks. While existing studies primarily investigate image-based attacks, the text modality, which is equally central to CLIP's training, remains underexplored. In this work, we introduce ToxicTextCLIP, a framework for generating high-quality adversarial texts that target CLIP during the pre-training phase. The framework addresses two key challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts. To this end, ToxicTextCLIP iteratively applies: 1) a background-aware selector that prioritizes texts with background content aligned to the target class, and 2) a background-driven augmenter that generates semantically coherent and diverse poisoned samples. Extensive experiments on classification and retrieval tasks show that ToxicTextCLIP achieves up to 95.83% poisoning success and 98.68% backdoor Hit@1, while bypassing RoCLIP, CleanCLIP and SafeCLIP defenses. The source code can be accessed via https://github.com/xinyaocse/ToxicTextCLIP/.
【10】Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
标题:通过平衡探索和利用与对象引导抽样来增强对抗性可转移性
链接:https://arxiv.org/abs/2511.00411
作者:Zenghao Niu, Weicheng Xie, Siyang Song, Zitong Yu, Feng Liu, Linlin Shen
备注:accepted by iccv 2025
摘要
:Adversarial attacks present a critical challenge to deep neural networks' robustness, particularly in transfer scenarios across different model architectures. However, the transferability of adversarial attacks faces a fundamental dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization). Traditional momentum-based methods over-prioritize Exploitation, i.e., higher loss maxima for attack potency but weakened generalization (narrow loss surface). Conversely, recent methods with inner-iteration sampling over-prioritize Exploration, i.e., flatter loss surfaces for cross-model generalization but weakened attack potency (suboptimal local maxima). To resolve this dilemma, we propose a simple yet effective Gradient-Guided Sampling (GGS), which harmonizes both objectives through guiding sampling along the gradient ascent direction to improve both sampling efficiency and stability. Specifically, based on MI-FGSM, GGS introduces inner-iteration random sampling and guides the sampling direction using the gradient from the previous inner-iteration (the sampling's magnitude is determined by a random distribution). This mechanism encourages adversarial examples to reside in balanced regions with both flatness for cross-model generalization and higher local maxima for strong attack potency. Comprehensive experiments across multiple DNN architectures and multimodal large language models (MLLMs) demonstrate the superiority of our method over state-of-the-art transfer attacks. Code is made available at https://github.com/anuin-cat/GGS.
【11】MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
标题:MalDataGen:恶意软件检测中合成表格数据生成的模块化框架
链接:https://arxiv.org/abs/2511.00361
作者:Kayua Oleques Paim, Angelo Gaspar Diniz Nogueira, Diego Kreutz, Weverton Cordeiro, Rodrigo Brandao Mansilha
备注:10 pages, 6 figures, 2 tables. Published at the Brazilian Symposium on Cybersecurity (SBSeg 2025)
摘要:High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data using modular deep learning models (e.g., WGAN-GP, VQ-VAE). Evaluated via dual validation (TR-TS/TS-TR), seven classifiers, and utility metrics, MalDataGen outperforms benchmarks like SDV while preserving data utility. Its flexible design enables seamless integration into detection pipelines, offering a practical solution for cybersecurity applications.
【12】OSMGen: Highly Controllable Satellite Image Synthesis using OpenStreetMap Data
标题:OSMGen:使用OpenStreetMap数据的高度可控卫星图像合成
链接:https://arxiv.org/abs/2511.00345
作者:Amir Ziashahabi, Narges Ghasemi, Sajjad Shahabi, John Krumm, Salman Avestimehr, Cyrus Shahabi
备注:Accepted at NeurIPS 2025 UrbanAI Workshop
摘要:Accurate and up-to-date geospatial data are essential for urban planning, infrastructure monitoring, and environmental management. Yet, automating urban monitoring remains difficult because curated datasets of specific urban features and their changes are scarce. We introduce OSMGen, a generative framework that creates realistic satellite imagery directly from raw OpenStreetMap (OSM) data. Unlike prior work that relies on raster tiles, OSMGen uses the full richness of OSM JSON, including vector geometries, semantic tags, location, and time, giving fine-grained control over how scenes are generated. A central feature of the framework is the ability to produce consistent before-after image pairs: user edits to OSM inputs translate into targeted visual changes, while the rest of the scene is preserved. This makes it possible to generate training data that addresses scarcity and class imbalance, and to give planners a simple way to preview proposed interventions by editing map data. More broadly, OSMGen produces paired (JSON, image) data for both static and changed states, paving the way toward a closed-loop system where satellite imagery can automatically drive structured OSM updates. Source code is available at https://github.com/amir-zsh/OSMGen.
【13】Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules versus Therapeutic Peptides
标题:药物发现前沿的扩散模型:产生小分子与治疗性肽的评论
链接:https://arxiv.org/abs/2511.00209
作者:Yiquan Wang, Yahui Ma, Yuhan Chang, Jiayao Yan, Jialin Zhang, Minnuo Cai, Kai Wei
备注:21 pages, 3 figures
摘要:Diffusion models have emerged as a leading framework in generative modeling, showing significant potential to accelerate and transform the traditionally slow and costly process of drug discovery. This review provides a systematic comparison of their application in designing two principal therapeutic modalities: small molecules and therapeutic peptides. We analyze how a unified framework of iterative denoising is adapted to the distinct molecular representations, chemical spaces, and design objectives of each modality. For small molecules, these models excel at structure-based design, generating novel, pocket-fitting ligands with desired physicochemical properties, yet face the critical hurdle of ensuring chemical synthesizability. Conversely, for therapeutic peptides, the focus shifts to generating functional sequences and designing de novo structures, where the primary challenges are achieving biological stability against proteolysis, ensuring proper folding, and minimizing immunogenicity. Despite these distinct challenges, both domains face shared hurdles: the need for more accurate scoring functions, the scarcity of high-quality experimental data, and the crucial requirement for experimental validation. We conclude that the full potential of diffusion models will be unlocked by bridging these modality-specific gaps and integrating them into automated, closed-loop Design-Build-Test-Learn (DBTL) platforms, thereby shifting the paradigm from chemical exploration to the targeted creation of novel therapeutics.
【14】A generative adversarial network optimization method for damage detection and digital twinning by deep AI fault learning: Z24 Bridge structural health monitoring benchmark validation
标题:通过深度AI故障学习进行损伤检测和数字孪生的生成式对抗网络优化方法:Z24桥梁结构健康监测基准验证
链接:https://arxiv.org/abs/2511.00099
作者:Marios Impraimakis, Evangelia Nektaria Palkanoglou
备注:21 pages, 23 figures, published in Structural and Multidisciplinary Optimization
摘要
:The optimization-based damage detection and damage state digital twinning capabilities are examined here of a novel conditional-labeled generative adversarial network methodology. The framework outperforms current approaches for fault anomaly detection as no prior information is required for the health state of the system: a topic of high significance for real-world applications. Specifically, current artificial intelligence-based digital twinning approaches suffer from the uncertainty related to obtaining poor predictions when a low number of measurements is available, physics knowledge is missing, or when the damage state is unknown. To this end, an unsupervised framework is examined and validated rigorously on the benchmark structural health monitoring measurements of Z24 Bridge: a post-tensioned concrete highway bridge in Switzerland. In implementing the approach, firstly, different same damage-level measurements are used as inputs, while the model is forced to converge conditionally to two different damage states. Secondly, the process is repeated for a different group of measurements. Finally, the convergence scores are compared to identify which one belongs to a different damage state. The process for both healthy-to-healthy and damage-to-healthy input data creates, simultaneously, measurements for digital twinning purposes at different damage states, capable of pattern recognition and machine learning data generation. Further to this process, a support vector machine classifier and a principal component analysis procedure is developed to assess the generated and real measurements of each damage category, serving as a secondary new dynamics learning indicator in damage scenarios. Importantly, the approach is shown to capture accurately damage over healthy measurements, providing a powerful tool for vibration-based system-level monitoring and scalable infrastructure resilience.
半/弱/无/有监督|不确定性|主动学习(5篇)
【1】Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
标题:自我和谐:学会在测试时强化学习中协调自我监督和自我游戏
链接:https://arxiv.org/abs/2511.01191
作者:Ru Wang, Wei Huang, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
摘要:Test-time reinforcement learning (TTRL) offers a label-free paradigm for adapting models using only synthetic signals at inference, but its success hinges on constructing reliable learning signals. Standard approaches such as majority voting often collapse to spurious yet popular answers. We introduce Self-Harmony, a framework built on a simple intuition: the correct answer should remain stable across both an original question and its paraphrase. Self-Harmony operationalizes this by employing a single model in two complementary roles: a Solver to produce answers and a Reframer to rephrase the input. Based on this, we further propose a pseudo-label method: instead of majority voting, it aggregates answer frequencies across these original and reframed views using the harmonic mean. This is a process that naturally selects for solutions stable under reframing, thereby avoiding the common trap of favoring view-dependent, spurious answers. Crucially, this requires no human supervision or auxiliary models. Across diverse reasoning benchmarks, Self-Harmony achieves state-of-the-art results at the label-free test-time setting, ranking first in 28 of 30 settings across multiple methods. Beyond accuracy, it demonstrates unprecedented robustness, with zero training failures in all experiments, underscoring its stability and reliability.
【2】A systematic evaluation of uncertainty quantification techniques in deep learning: a case study in photoplethysmography signal analysis
标题:深度学习中不确定性量化技术的系统评估:光电体积脉搏图信号分析的案例研究
链接:https://arxiv.org/abs/2511.00301
作者:Ciaran Bench, Oskar Pfeffer, Vivek Desai, Mohammad Moulaeifard, Loïc Coquelin, Peter H. Charlton, Nils Strodthoff, Nando Hegemann, Philip J. Aston, Andrew Thompson
摘要:In principle, deep learning models trained on medical time-series, including wearable photoplethysmography (PPG) sensor data, can provide a means to continuously monitor physiological parameters outside of clinical settings. However, there is considerable risk of poor performance when deployed in practical measurement scenarios leading to negative patient outcomes. Reliable uncertainties accompanying predictions can provide guidance to clinicians in their interpretation of the trustworthiness of model outputs. It is therefore of interest to compare the effectiveness of different approaches. Here we implement an unprecedented set of eight uncertainty quantification (UQ) techniques to models trained on two clinically relevant prediction tasks: Atrial Fibrillation (AF) detection (classification), and two variants of blood pressure regression. We formulate a comprehensive evaluation procedure to enable a rigorous comparison of these approaches. We observe a complex picture of uncertainty reliability across the different techniques, where the most optimal for a given task depends on the chosen expression of uncertainty, evaluation metric, and scale of reliability assessed. We find that assessing local calibration and adaptivity provides practically relevant insights about model behaviour that otherwise cannot be acquired using more commonly implemented global reliability metrics. We emphasise that criteria for evaluating UQ techniques should cater to the model's practical use case, where the use of a small number of measurements per patient places a premium on achieving small-scale reliability for the chosen expression of uncertainty, while preserving as much predictive performance as possible.
【3】A filtering scheme for confocal laser endomicroscopy (CLE)-video sequences for self-supervised learning
标题:共焦激光显微内窥镜(CLE)过滤方案-用于自我监督学习的视频序列
链接:https://arxiv.org/abs/2511.00098
作者:Nils Porsche, Flurin Müller-Diesing, Sweta Banerjee, Miguel Goncalves, Marc Aubreville
摘要:Confocal laser endomicroscopy (CLE) is a non-invasive, real-time imaging modality that can be used for in-situ, in-vivo imaging and the microstructural analysis of mucous structures. The diagnosis using CLE is, however, complicated by images being hard to interpret for non-experienced physicians. Utilizing machine learning as an augmentative tool would hence be beneficial, but is complicated by the shortage of histopathology-correlated CLE imaging sequences with respect to the plurality of patterns in this domain, leading to overfitting of machine learning models. To overcome this, self-supervised learning (SSL) can be employed on larger unlabeled datasets. CLE is a video-based modality with high inter-frame correlation, leading to a non-stratified data distribution for SSL training. In this work, we propose a filter functionality on CLE video sequences to reduce the dataset redundancy in SSL training and improve SSL training convergence and training efficiency. We use four state-of-the-art baseline networks and a SSL teacher-student network with a vision transformer small backbone for the evaluation. These networks were evaluated on downstream tasks for a sinonasal tumor dataset and a squamous cell carcinoma of the skin dataset. On both datasets, we found the highest test accuracy on the filtered SSL-pretrained model, with 67.48% and 73.52%, both considerably outperforming their non-SSL baselines. Our results show that SSL is an effective method for CLE pretraining. Further, we show that our proposed CLE video filter can be utilized to improve training efficiency in self-supervised scenarios, resulting in a reduction of 67% in training time.
【4】Wavelet-Based Feature Extraction and Unsupervised Clustering for Parity Detection: A Feature Engineering Perspective
标题:基于子波的特征提取和无监督集群用于Parity检测:特征工程的角度
链接
:https://arxiv.org/abs/2511.00071
作者:Ertugrul Mutlu
备注:8 pages, 2 figures. Code: this http URL
摘要:This paper explores a deliberately over-engineered approach to the classical problem of parity detection -- determining whether a number is odd or even -- by combining wavelet-based feature extraction with unsupervised clustering. Instead of relying on modular arithmetic, integers are transformed into wavelet-domain representations, from which multi-scale statistical features are extracted and clustered using the k-means algorithm. The resulting feature space reveals meaningful structural differences between odd and even numbers, achieving a classification accuracy of approximately 69.67% without any label supervision. These results suggest that classical signal-processing techniques, originally designed for continuous data, can uncover latent structure even in purely discrete symbolic domains. Beyond parity detection, the study provides an illustrative perspective on how feature engineering and clustering may be repurposed for unconventional machine learning problems, potentially bridging symbolic reasoning and feature-based learning.
【5】Semi-Supervised Preference Optimization with Limited Feedback
标题:有限反馈的半监督偏好优化
链接:https://arxiv.org/abs/2511.00040
作者:Seonggyun Lee, Sungjun Lim, Seojin Park, Soeun Cheon, Kyungwoo Song
摘要:The field of preference optimization has made outstanding contributions to the alignment of language models with human preferences. Despite these advancements, recent methods still rely heavily on substantial paired (labeled) feedback data, leading to substantial resource expenditures. To address these challenges, we study the problem of Semi-Supervised Preference Optimization (SSPO) in which the idea is to learn from both a small number of pairwise preference labels and a large pool of unpaired samples simultaneously. Our key theoretical contribution proves the existence of an optimal reward threshold capable of separating winning and losing responses with high probability, which enables a principled pseudo-labeling of unpaired data. By leveraging these pseudo-labels, SSPO effectively distills latent preferences from large-scale unpaired data, thus maintaining human alignment while drastically reducing acquisition costs. Extensive experiments across datasets validate this remarkable data efficiency; for instance, SSPO trained with Llama3-8B-Instruct on just 1% of UltraFeedback consistently surpasses strong baselines trained on 10% of UltraFeedback.
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】A Comparative Study of Model Adaptation Strategies for Multi-Treatment Uplift Modeling
标题:多处理Upper建模的模型适应策略比较研究
链接:https://arxiv.org/abs/2511.01185
作者:Ruyue Zhang, Xiaopeng Ke, Ming Liu, Fangzhou Shi, Chang Men, Zhengdan Zhu
摘要:Uplift modeling has emerged as a crucial technique for individualized treatment effect estimation, particularly in fields such as marketing and healthcare. Modeling uplift effects in multi-treatment scenarios plays a key role in real-world applications. Current techniques for modeling multi-treatment uplift are typically adapted from binary-treatment works. In this paper, we investigate and categorize all current model adaptations into two types: Structure Adaptation and Feature Adaptation. Through our empirical experiments, we find that these two adaptation types cannot maintain effectiveness under various data characteristics (noisy data, mixed with observational data, etc.). To enhance estimation ability and robustness, we propose Orthogonal Function Adaptation (OFA) based on the function approximation theorem. We conduct comprehensive experiments with multiple data characteristics to study the effectiveness and robustness of all model adaptation techniques. Our experimental results demonstrate that our proposed OFA can significantly improve uplift model performance compared to other vanilla adaptation methods and exhibits the highest robustness.
【2】Continual Learning, Not Training: Online Adaptation For Agents
标题:持续学习,而不是训练:代理商的在线适应
链接:https://arxiv.org/abs/2511.01093
作者:Aman Jaglan, Jarrod Barnes
备注:12 pages, 4 figures
摘要:Continual Learning (CL) methods have traditionally focused on mitigating catastrophic forgetting through gradient-based retraining, an approach ill-suited for deployed agents that must adapt in real time. We introduce our Adaptive Teaching and Learning System (ATLAS), a dual-agent architecture that decouples reasoning (Teacher) from execution (Student) and incorporates a persistent learning memory that stores distilled guidance from experience. This informs the orchestration layer, enabling the system to dynamically adjust its operational strategies, such as supervision level or initial plan selection, at inference time. In doing so, ATLAS achieves gradient-free continual learning, shifting the locus of adaptation from model parameters to system-level orchestration. We formulate this as a system-centric paradigm for continual learning, where the objective is adaptive efficiency: maximizing task success while minimizing computational cost through inference-time orchestration rather than parameter updates. Evaluated on Microsoft's ExCyTIn-Bench, an open-source benchmark simulating complex cyberthreat investigation, ATLAS achieves 54.1% success with GPT-5-mini as its Student, outperforming the larger GPT-5 (High) by 13% while reducing cost by 86%. Cross-incident validation demonstrates generalization: frozen pamphlets from Incident #5 improve accuracy from 28% to 41% with zero retraining, while shifting output composition from verbose exploration to structured reasoning. Together, these findings establish gradient-free continual learning as a viable path toward adaptive, deployable AI systems and provide causally annotated traces valuable for training explicit world models.
【3】Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control
标题:区域自适应交通信号控制的单智能体强化学习模型
链接:https://arxiv.org/abs/2511.00551
作者
:Qiang Li, Ningjing Zeng, Lina Yu
摘要:Several studies have employed reinforcement learning (RL) to address the challenges of regional adaptive traffic signal control (ATSC) and achieved promising results. In this field, existing research predominantly adopts multi-agent frameworks. However, the adoption of multi-agent frameworks presents challenges for scalability. Instead, the Traffic signal control (TSC) problem necessitates a single-agent framework. TSC inherently relies on centralized management by a single control center, which can monitor traffic conditions across all roads in the study area and coordinate the control of all intersections. This work proposes a single-agent RL-based regional ATSC model compatible with probe vehicle technology. Key components of the RL design include state, action, and reward function definitions. To facilitate learning and manage congestion, both state and reward functions are defined based on queue length, with action designed to regulate queue dynamics. The queue length definition used in this study differs slightly from conventional definitions but is closely correlated with congestion states. More importantly, it allows for reliable estimation using link travel time data from probe vehicles. With probe vehicle data already covering most urban roads, this feature enhances the proposed method's potential for widespread deployment. The method was comprehensively evaluated using the SUMO simulation platform. Experimental results demonstrate that the proposed model effectively mitigates large-scale regional congestion levels via coordinated multi-intersection control.
【4】From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
标题:从一致到自适应:高效的PDL神经运算符的通用跳块机制
链接:https://arxiv.org/abs/2511.00032
作者:Lei Liu, Zhongyi Yu, Hong Wang, Huanshuo Dong, Haiyang Xin, Hongwei Zhao, Bin Li
摘要:In recent years, Neural Operators(NO) have gradually emerged as a popular approach for solving Partial Differential Equations (PDEs). However, their application to large-scale engineering tasks suffers from significant computational overhead. And the fact that current models impose a uniform computational cost while physical fields exhibit vastly different complexities constitutes a fundamental mismatch, which is the root of this inefficiency. For instance, in turbulence flows, intricate vortex regions require deeper network processing compared to stable flows. To address this, we introduce a framework: Skip-Block Routing (SBR), a general framework designed for Transformer-based neural operators, capable of being integrated into their multi-layer architectures. First, SBR uses a routing mechanism to learn the complexity and ranking of tokens, which is then applied during inference. Then, in later layers, it decides how many tokens are passed forward based on this ranking. This way, the model focuses more processing capacity on the tokens that are more complex. Experiments demonstrate that SBR is a general framework that seamlessly integrates into various neural operators. Our method reduces computational cost by approximately 50% in terms of Floating Point Operations (FLOPs), while still delivering up to 2x faster inference without sacrificing accuracy.
【5】A Proof of Learning Rate Transfer under $μ$P
标题:$μ$P下学习率转移的证明
链接:https://arxiv.org/abs/2511.01734
作者:Soufiane Hayou
备注:23 pages
摘要:We provide the first proof of learning rate transfer with width in a linear multi-layer perceptron (MLP) parametrized with $\mu$P, a neural network parameterization designed to ``maximize'' feature learning in the infinite-width limit. We show that under $\mu P$, the optimal learning rate converges to a \emph{non-zero constant} as width goes to infinity, providing a theoretical explanation to learning rate transfer. In contrast, we show that this property fails to hold under alternative parametrizations such as Standard Parametrization (SP) and Neural Tangent Parametrization (NTP). We provide intuitive proofs and support the theoretical findings with extensive empirical results.
【6】Few-Shot Multimodal Medical Imaging: A Theoretical Framework
标题:Few-Shot多模式医学成像:理论框架
链接:https://arxiv.org/abs/2511.01140
作者:Md Talha Mohsin, Ismail Abdulrashid
备注:6 Pages
摘要:Medical imaging relies heavily on large, labeled datasets. But, unfortunately, they are not always easily accessible in clinical settings. Additionally, many practitioners often face various structural obstacles like limited data availability, fragmented data systems, and unbalanced datasets. These barriers often lead to the increased diagnostic uncertainty, underrepresentation of certain conditions, reduced model robustness, and biased diagnostic decisions. In response to these challenges, approaches such as transfer learning, meta-learning, and multimodal fusion have made great strides. However, they still need a solid theoretical justification for why they succeed or fail in situations where data is scarce. To address this gap, we propose a unified theoretical framework that characterizes learning and inference under low-resource medical imaging conditions. We first formalize the learning objective under few-shot conditions and compute sample complexity constraints to estimate the smallest quantity of data needed to achieve clinically reliable accuracy. Then based on ideas from PAC-learning and PAC-Bayesian theory, we explain how multimodal integration encourages generalization and quantifies uncertainty under sparse supervision. We further propose a formal metric for explanation stability, offering interpretability guarantees under low-data conditions. Taken together, the proposed framework establishes a principled foundation for constructing dependable, data-efficient diagnostic systems by jointly characterizing sample efficiency, uncertainty quantification, and interpretability in a unified theoretical setting.
【7】SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations
标题:苏格拉底:使用相关复制品和自适应轨迹评估的模拟优化
链接:https://arxiv.org/abs/2511.00685
作者:Haoting Zhang, Haoxian Chen, Donglin Zhan, Hanyang Zhao, Henry Lam, Wenpin Tang, David Yao, Zeyu Zheng
摘要
:The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited to, ranking-and-selection for finite alternatives and surrogate-based methods for continuous domains, with broad applications in engineering and operations management. The recent advent of large language models (LLMs) offers a new paradigm for exploiting system structure and automating the strategic selection and composition of these established SO methods into a tailored optimization procedure. This work introduces SOCRATES (Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations), a novel two-stage procedure that leverages LLMs to automate the design of tailored SO algorithms. The first stage constructs an ensemble of digital replicas of the real system. An LLM is employed to implement causal discovery from a textual description of the system, generating a structural `skeleton' that guides the sample-efficient learning of the replicas. In the second stage, this replica ensemble is used as an inexpensive testbed to evaluate a set of baseline SO algorithms. An LLM then acts as a meta-optimizer, analyzing the performance trajectories of these algorithms to iteratively revise and compose a final, hybrid optimization schedule. This schedule is designed to be adaptive, with the ability to be updated during the final execution on the real system when the optimization performance deviates from expectations. By integrating LLM-driven reasoning with LLM-assisted trajectory-aware meta-optimization, SOCRATES creates an effective and sample-efficient solution for complex SO optimization problems.
【8】Transfer learning discovery of molecular modulators for perovskite solar cells
标题:转移学习发现钙钛合金太阳能电池分子调制剂
链接:https://arxiv.org/abs/2511.00204
作者:Haoming Yan, Xinyu Chen, Yanran Wang, Zhengchao Luo, Weizheng Huang, Hongshuai Wang, Peng Chen, Yuzhi Zhang, Weijie Sun, Jinzhuo Wang, Qihuang Gong, Rui Zhu, Lichen Zhao
摘要:The discovery of effective molecular modulators is essential for advancing perovskite solar cells (PSCs), but the research process is hindered by the vastness of chemical space and the time-consuming and expensive trial-and-error experimental screening. Concurrently, machine learning (ML) offers significant potential for accelerating materials discovery. However, applying ML to PSCs remains a major challenge due to data scarcity and limitations of traditional quantitative structure-property relationship (QSPR) models. Here, we apply a chemical informed transfer learning framework based on pre-trained deep neural networks, which achieves high accuracy in predicting the molecular modulator's effect on the power conversion efficiency (PCE) of PSCs. This framework is established through systematical benchmarking of diverse molecular representations, enabling lowcost and high-throughput virtual screening over 79,043 commercially available molecules. Furthermore, we leverage interpretability techniques to visualize the learned chemical representation and experimentally characterize the resulting modulator-perovskite interactions. The top molecular modulators identified by the framework are subsequently validated experimentally, delivering a remarkably improved champion PCE of 26.91% in PSCs.
强化学习(8篇)
【1】Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations
标题:使用强化学习和基于代理的模拟优化电动汽车充电站布局
链接:https://arxiv.org/abs/2511.01218
作者:Minh-Duc Nguyen, Dung D. Le, Phi Long Nguyen
备注:Under Review
摘要:The rapid growth of electric vehicles (EVs) necessitates the strategic placement of charging stations to optimize resource utilization and minimize user inconvenience. Reinforcement learning (RL) offers an innovative approach to identifying optimal charging station locations; however, existing methods face challenges due to their deterministic reward systems, which limit efficiency. Because real-world conditions are dynamic and uncertain, a deterministic reward structure cannot fully capture the complexities of charging station placement. As a result, evaluation becomes costly and time-consuming, and less reflective of real-world scenarios. To address this challenge, we propose a novel framework that integrates deep RL with agent-based simulations to model EV movement and estimate charging demand in real time. Our approach employs a hybrid RL agent with dual Q-networks to select optimal locations and configure charging ports, guided by a hybrid reward function that combines deterministic factors with simulation-derived feedback. Case studies in Hanoi, Vietnam, show that our method reduces average waiting times by 53.28% compared to the initial state, outperforming static baseline methods. This scalable and adaptive solution enhances EV infrastructure planning, effectively addressing real-world complexities and improving user experience.
【2】Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems
标题:面向大规模信息物理系统跨域优化的逻辑信息强化学习
链接:https://arxiv.org/abs/2511.00806
作者:Guangxi Wan, Peng Zeng, Xiaoting Dong, Chunhe Song, Shijie Cui, Dong Li, Qingwei Dong, Yiyang Liu, Hongfei Bai
摘要:Cyber-physical systems (CPS) require the joint optimization of discrete cyber actions and continuous physical parameters under stringent safety logic constraints. However, existing hierarchical approaches often compromise global optimality, whereas reinforcement learning (RL) in hybrid action spaces often relies on brittle reward penalties, masking, or shielding and struggles to guarantee constraint satisfaction. We present logic-informed reinforcement learning (LIRL), which equips standard policy-gradient algorithms with projection that maps a low-dimensional latent action onto the admissible hybrid manifold defined on-the-fly by first-order logic. This guarantees feasibility of every exploratory step without penalty tuning. Experimental evaluations have been conducted across multiple scenarios, including industrial manufacturing, electric vehicle charging stations, and traffic signal control, in all of which the proposed method outperforms existing hierarchical optimization approaches. Taking a robotic reducer assembly system in industrial manufacturing as an example, LIRL achieves a 36.47\% to 44.33\% reduction at most in the combined makespan-energy objective compared to conventional industrial hierarchical scheduling methods. Meanwhile, it consistently maintains zero constraint violations and significantly surpasses state-of-the-art hybrid-action reinforcement learning baselines. Thanks to its declarative logic-based constraint formulation, the framework can be seamlessly transferred to other domains such as smart transportation and smart grid, thereby paving the way for safe and real-time optimization in large-scale CPS.
【3】Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations
标题:需求波动下区域交通信号控制的鲁棒单智能体强化学习
链接:https://arxiv.org/abs/2511.00549
作者:Qiang Li, Jin Niu, Lina Yu
摘要:Traffic congestion, primarily driven by intersection queuing, significantly impacts urban living standards, safety, environmental quality, and economic efficiency. While Traffic Signal Control (TSC) systems hold potential for congestion mitigation, traditional optimization models often fail to capture real-world traffic complexity and dynamics. This study introduces a novel single-agent reinforcement learning (RL) framework for regional adaptive TSC, circumventing the coordination complexities inherent in multi-agent systems through a centralized decision-making paradigm. The model employs an adjacency matrix to unify the encoding of road network topology, real-time queue states derived from probe vehicle data, and current signal timing parameters. Leveraging the efficient learning capabilities of the DreamerV3 world model, the agent learns control policies where actions sequentially select intersections and adjust their signal phase splits to regulate traffic inflow/outflow, analogous to a feedback control system. Reward design prioritizes queue dissipation, directly linking congestion metrics (queue length) to control actions. Simulation experiments conducted in SUMO demonstrate the model's effectiveness: under inference scenarios with multi-level (10%, 20%, 30%) Origin-Destination (OD) demand fluctuations, the framework exhibits robust anti-fluctuation capability and significantly reduces queue lengths. This work establishes a new paradigm for intelligent traffic control compatible with probe vehicle technology. Future research will focus on enhancing practical applicability by incorporating stochastic OD demand fluctuations during training and exploring regional optimization mechanisms for contingency events.
【4】Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning
标题:利用域知情强化学习提高混乱对流控制的鲁棒性
链接:https://arxiv.org/abs/2511.00272
作者:Michiel Straat, Thorben Markmann, Sebastian Peitz, Barbara Hammer
摘要:Chaotic convective flows arise in many real-world systems, such as microfluidic devices and chemical reactors. Stabilizing these flows is highly desirable but remains challenging, particularly in chaotic regimes where conventional control methods often fail. Reinforcement Learning (RL) has shown promise for control in laminar flow settings, but its ability to generalize and remain robust under chaotic and turbulent dynamics is not well explored, despite being critical for real-world deployment. In this work, we improve the practical feasibility of RL-based control of such flows focusing on Rayleigh-B\'enard Convection (RBC), a canonical model for convective heat transport. To enhance generalization and sample efficiency, we introduce domain-informed RL agents that are trained using Proximal Policy Optimization across diverse initial conditions and flow regimes. We incorporate domain knowledge in the reward function via a term that encourages B\'enard cell merging, as an example of a desirable macroscopic property. In laminar flow regimes, the domain-informed RL agents reduce convective heat transport by up to 33%, and in chaotic flow regimes, they still achieve a 10% reduction, which is significantly better than the conventional controllers used in practice. We compare the domain-informed to uninformed agents: Our results show that the domain-informed reward design results in steady flows, faster convergence during training, and generalization across flow regimes without retraining. Our work demonstrates that elegant domain-informed priors can greatly enhance the robustness of RL-based control of chaotic flows, bringing real-world deployment closer.
【5】Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning
标题:基于深度强化学习的供应链金融决策模型及企业经济绩效预测研究
链接:https://arxiv.org/abs/2511.00166
作者:Shiman Zhang, Jinghan Zhou, Zhoufan Yu, Ningai Leng
备注:9 pages, 3 figures
摘要:To improve decision-making and planning efficiency in back-end centralized redundant supply chains, this paper proposes a decision model integrating deep learning with intelligent particle swarm optimization. A distributed node deployment model and optimal planning path are constructed for the supply chain network. Deep learning such as convolutional neural networks extracts features from historical data, and linear programming captures high-order statistical features. The model is optimized using fuzzy association rule scheduling and deep reinforcement learning, while neural networks fit dynamic changes. A hybrid mechanism of "deep learning feature extraction - intelligent particle swarm optimization" guides global optimization and selects optimal decisions for adaptive control. Simulations show reduced resource consumption, enhanced spatial planning, and in dynamic environments improved real-time decision adjustment, distribution path optimization, and robust intelligent control.
【6】LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers
标题:LC-Opt:对数据中心端到端液体冷却优化的强化学习和抽象人工智能进行基准测试
链接:https://arxiv.org/abs/2511.00116
作者:Avisek Naug, Antonio Guillen, Vineet Kumar, Scott Greenwood, Wesley Brewer, Sahand Ghorbanpour, Ashwin Ramesh Babu, Vineet Gundecha, Ricardo Luna Gutierrez, Soumyendu Sarkar
备注:Submitted to the NeurIPS 2025 conference
摘要
:Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energy efficiency and reliability, promoting sustainability. We present LC-Opt, a Sustainable Liquid Cooling (LC) benchmark environment, for reinforcement learning (RL) control strategies in energy-efficient liquid cooling of high-performance computing (HPC) systems. Built on the baseline of a high-fidelity digital twin of Oak Ridge National Lab's Frontier Supercomputer cooling system, LC-Opt provides detailed Modelica-based end-to-end models spanning site-level cooling towers to data center cabinets and server blade groups. RL agents optimize critical thermal controls like liquid supply temperature, flow rate, and granular valve actuation at the IT cabinet level, as well as cooling tower (CT) setpoints through a Gymnasium interface, with dynamic changes in workloads. This environment creates a multi-objective real-time optimization challenge balancing local thermal regulation and global energy efficiency, and also supports additional components like a heat recovery unit (HRU). We benchmark centralized and decentralized multi-agent RL approaches, demonstrate policy distillation into decision and regression trees for interpretable control, and explore LLM-based methods that explain control actions in natural language through an agentic mesh architecture designed to foster user trust and simplify system management. LC-Opt democratizes access to detailed, customizable liquid cooling models, enabling the ML community, operators, and vendors to develop sustainable data center liquid cooling control solutions.
【7】End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning
标题:集成生成人工智能和深度强化学习以实现自主超声扫描的端到端框架
链接:https://arxiv.org/abs/2511.00114
作者:Hanae Elmekki, Amanda Spilkin, Ehsan Zakeri, Antonela Mariel Zanuttini, Ahmed Alagha, Hani Sami, Jamal Bentahar, Lyes Kadem, Wen-Fang Xie, Philippe Pibarot, Rabeb Mizouni, Hadi Otrok, Azzam Mourad, Sami Muhaidat
摘要:Cardiac ultrasound (US) is among the most widely used diagnostic tools in cardiology for assessing heart health, but its effectiveness is limited by operator dependence, time constraints, and human error. The shortage of trained professionals, especially in remote areas, further restricts access. These issues underscore the need for automated solutions that can ensure consistent, and accessible cardiac imaging regardless of operator skill or location. Recent progress in artificial intelligence (AI), especially in deep reinforcement learning (DRL), has gained attention for enabling autonomous decision-making. However, existing DRL-based approaches to cardiac US scanning lack reproducibility, rely on proprietary data, and use simplified models. Motivated by these gaps, we present the first end-to-end framework that integrates generative AI and DRL to enable autonomous and reproducible cardiac US scanning. The framework comprises two components: (i) a conditional generative simulator combining Generative Adversarial Networks (GANs) with Variational Autoencoders (VAEs), that models the cardiac US environment producing realistic action-conditioned images; and (ii) a DRL module that leverages this simulator to learn autonomous, accurate scanning policies. The proposed framework delivers AI-driven guidance through expert-validated models that classify image type and assess quality, supports conditional generation of realistic US images, and establishes a reproducible foundation extendable to other organs. To ensure reproducibility, a publicly available dataset of real cardiac US scans is released. The solution is validated through several experiments. The VAE-GAN is benchmarked against existing GAN variants, with performance assessed using qualitative and quantitative approaches, while the DRL-based scanning system is evaluated under varying configurations to demonstrate effectiveness.
【8】On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning
标题:合作多智能体强化学习中分散可学习奖励塑造的基本局限性
链接:https://arxiv.org/abs/2511.00034
作者:Aditya Akella
备注:8 pages, 5 figures, 2 tables
摘要:Recent advances in learnable reward shaping have shown promise in single-agent reinforcement learning by automatically discovering effective feedback signals. However, the effectiveness of decentralized learnable reward shaping in cooperative multi-agent settings remains poorly understood. We propose DMARL-RSA, a fully decentralized system where each agent learns individual reward shaping, and evaluate it on cooperative navigation tasks in the simple_spread_v3 environment. Despite sophisticated reward learning, DMARL-RSA achieves only -24.20 +/- 0.09 average reward, compared to MAPPO with centralized training at 1.92 +/- 0.87--a 26.12-point gap. DMARL-RSA performs similarly to simple independent learning (IPPO: -23.19 +/- 0.96), indicating that advanced reward shaping cannot overcome fundamental decentralized coordination limitations. Interestingly, decentralized methods achieve higher landmark coverage (0.888 +/- 0.029 for DMARL-RSA, 0.960 +/- 0.045 for IPPO out of 3 total) but worse overall performance than centralized MAPPO (0.273 +/- 0.008 landmark coverage)--revealing a coordination paradox between local optimization and global performance. Analysis identifies three critical barriers: (1) non-stationarity from concurrent policy updates, (2) exponential credit assignment complexity, and (3) misalignment between individual reward optimization and global objectives. These results establish empirical limits for decentralized reward learning and underscore the necessity of centralized coordination for effective multi-agent cooperation.
元学习(1篇)
【1】Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features
标题:通过成对排名和元特征进行轨迹预测的动态模型选择
链接:https://arxiv.org/abs/2511.00126
作者:Lu Bowen
摘要:Recent deep trajectory predictors (e.g., Jiang et al., 2023; Zhou et al., 2022) have achieved strong average accuracy but remain unreliable in complex long-tail driving scenarios. These limitations reveal the weakness of the prevailing "one-model-fits-all" paradigm, particularly in safety-critical urban contexts where simpler physics-based models can occasionally outperform advanced networks (Kalman, 1960). To bridge this gap, we propose a dynamic multi-expert gating framework that adaptively selects the most reliable trajectory predictor among a physics-informed LSTM, a Transformer, and a fine-tuned GameFormer on a per-sample basis. Our method leverages internal model signals (meta-features) such as stability and uncertainty (Gal and Ghahramani, 2016), which we demonstrate to be substantially more informative than geometric scene descriptors. To the best of our knowledge, this is the first work to formulate trajectory expert selection as a pairwise-ranking problem over internal model signals (Burges et al., 2005), directly optimizing decision quality without requiring post-hoc calibration. Evaluated on the nuPlan-mini dataset (Caesar et al., 2021) with 1,287 samples, our LLM-enhanced tri-expert gate achieves a Final Displacement Error (FDE) of 2.567 m, representing a 9.5 percent reduction over GameFormer (2.835 m), and realizes 57.8 percent of the oracle performance bound. In open-loop simulations, after trajectory horizon alignment, the same configuration reduces FDE on left-turn scenarios by approximately 10 percent, demonstrating consistent improvements across both offline validation and open-loop evaluation. These results indicate that adaptive hybrid systems enhance trajectory reliability in safety-critical autonomous driving, providing a practical pathway beyond static single-model paradigms.
符号|符号学习(1篇)
【1】EngChain: A Symbolic Benchmark for Verifiable Multi-Step Reasoning in Engineering
标题:EngChain:工程中可验证多步推理的象征性基准
链接:https://arxiv.org/abs/2511.01650
作者:Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Preslav Nakov, Zhuohan Xie
备注:24 pages, includes figures and tables; introduces the EngChain benchmark
摘要:Large Language Models (LLMs) are increasingly being applied to specialized, high-stakes domains like engineering, which demands rigorous evaluation of their complex reasoning capabilities. While current benchmarks assess language understanding, factual recall, mathematics or code generation, none capture the integrative reasoning central to engineering where scientific principles, quantitative modeling and practical constraints must converge. To address this gap, we introduce EngChain, a benchmark for verifiable multi-step engineering problem-solving. EngChain contains 90 problems spanning three engineering branches, organized into 9 domains and 20 distinct areas. The problems are generated from symbolic templates with a high degree of randomization to ensure diversity and eliminate the risk of contamination. With this benchmark, we move beyond final answer accuracy with a two-stage evaluation: we first quantitatively verify the numerical and semantic validity of each reasoning step and then introduce LLM-As-A-Judge, an automated system to qualitatively categorize the identified reasoning errors.
医学相关(6篇)
【1】MedEqualizer: A Framework Investigating Bias in Synthetic Medical Data and Mitigation via Augmentation
标题:Medalizer:一个研究合成医学数据中的偏差和通过增强缓解的框架
链接:https://arxiv.org/abs/2511.01054
作者:Sama Salarian, Yue Zhang, Swati Padhee, Srinivasan Parthasarathy
摘要:Synthetic healthcare data generation presents a viable approach to enhance data accessibility and support research by overcoming limitations associated with real-world medical datasets. However, ensuring fairness across protected attributes in synthetic data is critical to avoid biased or misleading results in clinical research and decision-making. In this study, we assess the fairness of synthetic data generated by multiple generative adversarial network (GAN)-based models using the MIMIC-III dataset, with a focus on representativeness across protected demographic attributes. We measure subgroup representation using the logarithmic disparity metric and observe significant imbalances, with many subgroups either underrepresented or overrepresented in the synthetic data, compared to the real data. To mitigate these disparities, we introduce MedEqualizer, a model-agnostic augmentation framework that enriches the underrepresented subgroups prior to synthetic data generation. Our results show that MedEqualizer significantly improves demographic balance in the resulting synthetic datasets, offering a viable path towards more equitable and representative healthcare data synthesis.
【2】Integrating Visual and X-Ray Machine Learning Features in the Study of Paintings by Goya
标题:将视觉和X射线机器学习功能整合到戈雅绘画研究中
链接:https://arxiv.org/abs/2511.01000
作者:Hassan Ugail, Ismail Lujain Jaleel
摘要:Art authentication of Francisco Goya's works presents complex computational challenges due to his heterogeneous stylistic evolution and extensive historical patterns of forgery. We introduce a novel multimodal machine learning framework that applies identical feature extraction techniques to both visual and X-ray radiographic images of Goya paintings. The unified feature extraction pipeline incorporates Grey-Level Co-occurrence Matrix descriptors, Local Binary Patterns, entropy measures, energy calculations, and colour distribution analysis applied consistently across both imaging modalities. The extracted features from both visual and X-ray images are processed through an optimised One-Class Support Vector Machine with hyperparameter tuning. Using a dataset of 24 authenticated Goya paintings with corresponding X-ray images, split into an 80/20 train-test configuration with 10-fold cross-validation, the framework achieves 97.8% classification accuracy with a 0.022 false positive rate. Case study analysis of ``Un Gigante'' demonstrates the practical efficacy of our pipeline, achieving 92.3% authentication confidence through unified multimodal feature analysis. Our results indicate substantial performance improvement over single-modal approaches, establishing the effectiveness of applying identical computational methods to both visual and radiographic imagery in art authentication applications.
【3】Metadata-Aligned 3D MRI Representations for Contrast Understanding and Quality Control
标题:用于对比度理解和质量控制的元数据对齐3D MRI表示
链接:https://arxiv.org/abs/2511.00681
作者:Mehmet Yigit Avci, Pedro Borges, Virginia Fernandez, Paul Wright, Mehmet Yigitsoy, Sebastien Ourselin, Jorge Cardoso
摘要:Magnetic Resonance Imaging suffers from substantial data heterogeneity and the absence of standardized contrast labels across scanners, protocols, and institutions, which severely limits large-scale automated analysis. A unified representation of MRI contrast would enable a wide range of downstream utilities, from automatic sequence recognition to harmonization and quality control, without relying on manual annotations. To this end, we introduce MR-CLIP, a metadata-guided framework that learns MRI contrast representations by aligning volumetric images with their DICOM acquisition parameters. The resulting embeddings shows distinct clusters of MRI sequences and outperform supervised 3D baselines under data scarcity in few-shot sequence classification. Moreover, MR-CLIP enables unsupervised data quality control by identifying corrupted or inconsistent metadata through image-metadata embedding distances. By transforming routinely available acquisition metadata into a supervisory signal, MR-CLIP provides a scalable foundation for label-efficient MRI analysis across diverse clinical datasets.
【4】MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts
标题:MedRECT:临床文本错误纠正的医学推理基准
链接:https://arxiv.org/abs/2511.00421
作者:Naoto Iwase, Hiroki Okuyama, Junichiro Iwasawa
摘要
:Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts -- a prerequisite for safe deployment -- remains under-evaluated, particularly beyond English. We introduce MedRECT, a cross-lingual benchmark (Japanese/English) that formulates medical error handling as three subtasks: error detection, error localization (sentence extraction), and error correction. MedRECT is built with a scalable, automated pipeline from the Japanese Medical Licensing Examinations (JMLE) and a curated English counterpart, yielding MedRECT-ja (663 texts) and MedRECT-en (458 texts) with comparable error/no-error balance. We evaluate 9 contemporary LLMs spanning proprietary, open-weight, and reasoning families. Key findings: (i) reasoning models substantially outperform standard architectures, with up to 13.5% relative improvement in error detection and 51.0% in sentence extraction; (ii) cross-lingual evaluation reveals 5-10% performance gaps from English to Japanese, with smaller disparities for reasoning models; (iii) targeted LoRA fine-tuning yields asymmetric improvements in error correction performance (Japanese: +0.078, English: +0.168) while preserving reasoning capabilities; and (iv) our fine-tuned model exceeds human expert performance on structured medical error correction tasks. To our knowledge, MedRECT is the first comprehensive cross-lingual benchmark for medical error correction, providing a reproducible framework and resources for developing safer medical LLMs across languages.
【5】Split Learning-Enabled Framework for Secure and Light-weight Internet of Medical Things Systems
标题:用于安全和轻量级医疗物联网系统的分离式学习支持框架
链接:https://arxiv.org/abs/2511.00336
作者:Siva Sai, Manish Prasad, Animesh Bhargava, Vinay Chamola, Rajkumar Buyya
备注:11 pages, 5 figures, Under review in an IEEE Transactions journal
摘要:The rapid growth of Internet of Medical Things (IoMT) devices has resulted in significant security risks, particularly the risk of malware attacks on resource-constrained devices. Conventional deep learning methods are impractical due to resource limitations, while Federated Learning (FL) suffers from high communication overhead and vulnerability to non-IID (heterogeneous) data. In this paper, we propose a split learning (SL) based framework for IoT malware detection through image-based classification. By dividing the neural network training between the clients and an edge server, the framework reduces computational burden on resource-constrained clients while ensuring data privacy. We formulate a joint optimization problem that balances computation cost and communication efficiency by using a game-theoretic approach for attaining better training performance. Experimental evaluations show that the proposed framework outperforms popular FL methods in terms of accuracy (+6.35%), F1-score (+5.03%), high convergence speed (+14.96%), and less resource consumption (33.83%). These results establish the potential of SL as a scalable and secure paradigm for next-generation IoT security.
【6】Towards Reliable Pediatric Brain Tumor Segmentation: Task-Specific nnU-Net Enhancements
标题:实现可靠的儿科脑肿瘤分割:特定任务的nnU-Net增强
链接:https://arxiv.org/abs/2511.00449
作者:Xiaolong Li, Zhi-Qin John Xu, Yan Ren, Tianming Qiu, Xiaowen Wang
摘要:Accurate segmentation of pediatric brain tumors in multi-parametric magnetic resonance imaging (mpMRI) is critical for diagnosis, treatment planning, and monitoring, yet faces unique challenges due to limited data, high anatomical variability, and heterogeneous imaging across institutions. In this work, we present an advanced nnU-Net framework tailored for BraTS 2025 Task-6 (PED), the largest public dataset of pre-treatment pediatric high-grade gliomas. Our contributions include: (1) a widened residual encoder with squeeze-and-excitation (SE) attention; (2) 3D depthwise separable convolutions; (3) a specificity-driven regularization term; and (4) small-scale Gaussian weight initialization. We further refine predictions with two postprocessing steps. Our models achieved first place on the Task-6 validation leaderboard, attaining lesion-wise Dice scores of 0.759 (CC), 0.967 (ED), 0.826 (ET), 0.910 (NET), 0.928 (TC) and 0.928 (WT).
蒸馏|知识提取(3篇)
【1】Privacy-Aware Time Series Synthesis via Public Knowledge Distillation
标题:通过公共知识蒸馏进行隐私意识时间序列合成
链接:https://arxiv.org/abs/2511.00700
作者:Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko
备注:Published on Transactions on Machine Learning Research (TMLR)
摘要:Sharing sensitive time series data in domains such as finance, healthcare, and energy consumption, such as patient records or investment accounts, is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by enforcing noise during training, inherently introducing a trade-off between privacy and utility. In many cases, sensitive sequences is correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.
【2】Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning
标题:恢复陈旧的更新:异步联邦学习的无数据知识蒸馏
链接:https://arxiv.org/abs/2511.00655
作者:Baris Askin, Holger R. Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu
摘要
:Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its scalability is limited by synchronization overhead. Asynchronous Federated Learning (AFL) alleviates this issue by allowing clients to communicate independently, thereby improving wall-clock efficiency in large-scale, heterogeneous environments. However, this asynchrony introduces stale updates (client updates computed on outdated global models) that can destabilize optimization and hinder convergence. We propose FedRevive, an asynchronous FL framework that revives stale updates through data-free knowledge distillation (DFKD). FedRevive integrates parameter-space aggregation with a lightweight, server-side DFKD process that transfers knowledge from stale client models to the current global model without access to real or public data. A meta-learned generator synthesizes pseudo-samples, which enables multi-teacher distillation. A hybrid aggregation scheme that combines raw updates with DFKD updates effectively mitigates staleness while retaining the scalability of AFL. Experiments on various vision and text benchmarks show that FedRevive achieves faster training up to 32.1% and higher final accuracy up to 21.5% compared to asynchronous baselines.
【3】SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation
标题:SpatialTraceGen:用于高效VLM空间推理蒸馏的高保真轨迹
链接:https://arxiv.org/abs/2511.00054
作者:Gio Huh, Dhruv Sheth, Rayhan Zirvi, Frank Xiao
备注:Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on Efficient Reasoning
摘要:While Vision-Language Models (VLMs) excel in many areas, they struggle with complex spatial reasoning, which requires problem decomposition and strategic tool use. Fine-tuning smaller, more deployable models offers an efficient path to strong performance, but this is hampered by a major bottleneck: the absence of high-quality, step-by-step reasoning data. To address this data-efficiency gap, we introduce SpatialTraceGen, a framework to distill the reasoning processes of a large teacher model into a high-quality dataset of multi-hop, multi-tool reasoning traces. A key innovation is our automated Verifier, which scalably ensures the fidelity of each reasoning step, providing a cost-effective alternative to manual human annotation. On the CLEVR-Humans benchmark, this verifier-guided process improves the average quality score of traces by 17\% while reducing quality variance by over 40\%. SpatialTraceGen delivers a dataset of expert traces, providing the structured, step-by-step examples of tool use necessary for effective fine-tuning and sample-efficient offline reinforcement learning.
推荐(1篇)
【1】PolyRecommender: A Multimodal Recommendation System for Polymer Discovery
标题:PolyRecommender:一个聚合物发现的多模式推荐系统
链接:https://arxiv.org/abs/2511.00375
作者:Xin Wang, Yunhao Xiao, Rui Qiao
摘要:We introduce PolyRecommender, a multimodal discovery framework that integrates chemical language representations from PolyBERT with molecular graph-based representations from a graph encoder. The system first retrieves candidate polymers using language-based similarity and then ranks them using fused multimodal embeddings according to multiple target properties. By leveraging the complementary knowledge encoded in both modalities, PolyRecommender enables efficient retrieval and robust ranking across related polymer properties. Our work establishes a generalizable multimodal paradigm, advancing AI-guided design for the discovery of next-generation polymers.
聚类(1篇)
【1】SpEx: A Spectral Approach to Explainable Clustering
标题:SpEx:可解释集群的光谱方法
链接:https://arxiv.org/abs/2511.00885
作者:Tal Argov, Tal Wagner
备注:NeurIPS 2025
摘要:Explainable clustering by axis-aligned decision trees was introduced by Moshkovitz et al. (2020) and has gained considerable interest. Prior work has focused on minimizing the price of explainability for specific clustering objectives, lacking a general method to fit an explanation tree to any given clustering, without restrictions. In this work, we propose a new and generic approach to explainable clustering, based on spectral graph partitioning. With it, we design an explainable clustering algorithm that can fit an explanation tree to any given non-explainable clustering, or directly to the dataset itself. Moreover, we show that prior algorithms can also be interpreted as graph partitioning, through a generalized framework due to Trevisan (2013) wherein cuts are optimized in two graphs simultaneously. Our experiments show the favorable performance of our method compared to baselines on a range of datasets.
自动驾驶|车辆|车道检测等(3篇)
【1】A Spatio-Temporal Online Robust Tensor Recovery Approach for Streaming Traffic Data Imputation
标题:流流量数据插补的时空在线鲁棒张量恢复方法
链接:https://arxiv.org/abs/2511.01267
作者:Yiyang Yang, Xiejian Chi, Shanxing Gao, Kaidong Wang, Yao Wang
摘要
:Data quality is critical to Intelligent Transportation Systems (ITS), as complete and accurate traffic data underpin reliable decision-making in traffic control and management. Recent advances in low-rank tensor recovery algorithms have shown strong potential in capturing the inherent structure of high-dimensional traffic data and restoring degraded observations. However, traditional batch-based methods demand substantial computational and storage resources, which limits their scalability in the face of continuously expanding traffic data volumes. Moreover, recent online tensor recovery methods often suffer from severe performance degradation in complex real-world scenarios due to their insufficient exploitation of the intrinsic structural properties of traffic data. To address these challenges, we reformulate the traffic data recovery problem within a streaming framework, and propose a novel online robust tensor recovery algorithm that simultaneously leverages both the global spatio-temporal correlations and local consistency of traffic data, achieving high recovery accuracy and significantly improved computational efficiency in large-scale scenarios. Our method is capable of simultaneously handling missing and anomalous values in traffic data, and demonstrates strong adaptability across diverse missing patterns. Experimental results on three real-world traffic datasets demonstrate that the proposed approach achieves high recovery accuracy while significantly improving computational efficiency by up to three orders of magnitude compared to state-of-the-art batch-based methods. These findings highlight the potential of the proposed approach as a scalable and effective solution for traffic data quality enhancement in ITS.
【2】X-TRACK: Physics-Aware xLSTM for Realistic Vehicle Trajectory Prediction
标题:X-TRACK:物理感知xLSTM,用于真实车辆轨迹预测
链接:https://arxiv.org/abs/2511.00266
作者:Aanchal Rajesh Chugh, Marion Neumeier, Sebastian Dorn
摘要:Recent advancements in Recurrent Neural Network (RNN) architectures, particularly the Extended Long Short Term Memory (xLSTM), have addressed the limitations of traditional Long Short Term Memory (LSTM) networks by introducing exponential gating and enhanced memory structures. These improvements make xLSTM suitable for time-series prediction tasks as they exhibit the ability to model long-term temporal dependencies better than LSTMs. Despite their potential, these xLSTM-based models remain largely unexplored in the context of vehicle trajectory prediction. Therefore, this paper introduces a novel xLSTM-based vehicle trajectory prediction framework, X-TRAJ, and its physics-aware variant, X-TRACK (eXtended LSTM for TRAjectory prediction Constraint by Kinematics), which explicitly integrates vehicle motion kinematics into the model learning process. By introducing physical constraints, the proposed model generates realistic and feasible trajectories. A comprehensive evaluation on the highD and NGSIM datasets demonstrates that X-TRACK outperforms state-of-the-art baselines.
【3】Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
标题:Alpamayo-R1:长尾中可推广自动驾驶的桥梁推理和动作预测
链接:https://arxiv.org/abs/2511.00088
作者:NVIDIA: Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Tong Che, Ke Chen, Yuxiao Chen, Jenna Diamond, Yifan Ding, Wenhao Ding, Liang Feng, Greg Heinrich, Jack Huang, Peter Karkus, Boyi Li, Pinyi Li, Tsung-Yi Lin, Dongran Liu, Ming-Yu Liu, Langechuan Liu, Zhijian Liu, Jason Lu, Yunxiang Mao, Pavlo Molchanov, Lindsey Pavao, Zhenghao Peng, Mike Ranzinger, Ed Schmerling, Shida Shen, Yunfei Shi, Sarah Tariq, Ran Tian, Tilman Wekel, Xinshuo Weng, Tianjun Xiao, Eric Yang, Xiaodong Yang, Yurong You, Xiaohui Zeng, Wenyuan Zhang, Boris Ivanovic, Marco Pavone
摘要:End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. To address this, we introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning to enhance decision-making in complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a Vision-Language Model pre-trained for Physical AI applications, with a diffusion-based trajectory decoder that generates dynamically feasible plans in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to optimize reasoning quality via large reasoning model feedback and enforce reasoning-action consistency. Evaluation shows AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in off-road rate and 25% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% as measured by a large reasoning model critic and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. We plan to release AR1 models and a subset of the CoC in a future update.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Casing Collar Identification using AlexNet-based Neural Networks for Depth Measurement in Oil and Gas Wells
标题:使用基于AlexNet的神经网络进行石油和天然气井深度测量的套管识别
链接:https://arxiv.org/abs/2511.00129
作者:Siyu Xiao, Xindi Zhao, Tianhao Mao, Yiwei Wang, Yuqiao Chen, Hongyun Zhang, Jian Wang, Junjie Wang, Shuang Liu, Tupei Chen, Yang Liu
摘要:Accurate downhole depth measurement is essential for oil and gas well operations, directly influencing reservoir contact, production efficiency, and operational safety. Collar correlation using a casing collar locator (CCL) is fundamental for precise depth calibration. While neural network-based CCL signal recognition has achieved significant progress in collar identification, preprocessing methods for such applications remain underdeveloped. Moreover, the limited availability of real well data poses substantial challenges for training neural network models that require extensive datasets. This paper presents a system integrated into downhole tools for CCL signal acquisition to facilitate dataset construction. We propose comprehensive preprocessing methods for data augmentation and evaluate their effectiveness using our AlexNet-based neural network models. Through systematic experimentation across various configuration combinations, we analyze the contribution of each augmentation method. Results demonstrate that standardization, label distribution smoothing (LDS), and random cropping are fundamental requirements for model training, while label smoothing regularization (LSR), time scaling, and multiple sampling significantly enhance model generalization capability. The F1 scores of our two benchmark models trained with the proposed augmentation methods maximumly improve from 0.937 and 0.952 to 1.0 and 1.0, respectively. Performance validation on real CCL waveforms confirms the effectiveness and practical applicability of our approach. This work addresses the gaps in data augmentation methodologies for training casing collar recognition models in CCL data-limited environments.
联邦学习|隐私保护|加密(5篇)
【1】Bayesian Coreset Optimization for Personalized Federated Learning
标题:个性化联邦学习的Bayesian Corset优化
链接:https://arxiv.org/abs/2511.01800
作者:Prateek Chanda, Shrey Modi, Ganesh Ramakrishnan
备注:9 pages, 5 figures, ICLR 2024
摘要
:In a distributed machine learning setting like Federated Learning where there are multiple clients involved which update their individual weights to a single central server, often training on the entire individual client's dataset for each client becomes cumbersome. To address this issue we propose $\methodprop$: a personalized coreset weighted federated learning setup where the training updates for each individual clients are forwarded to the central server based on only individual client coreset based representative data points instead of the entire client data. Through theoretical analysis we present how the average generalization error is minimax optimal up to logarithm bounds (upper bounded by $\mathcal{O}(n_k^{-\frac{2 \beta}{2 \beta+\boldsymbol{\Lambda}}} \log ^{2 \delta^{\prime}}(n_k))$) and lower bounds of $\mathcal{O}(n_k^{-\frac{2 \beta}{2 \beta+\boldsymbol{\Lambda}}})$, and how the overall generalization error on the data likelihood differs from a vanilla Federated Learning setup as a closed form function ${\boldsymbol{\Im}}(\boldsymbol{w}, n_k)$ of the coreset weights $\boldsymbol{w}$ and coreset sample size $n_k$. Our experiments on different benchmark datasets based on a variety of recent personalized federated learning architectures show significant gains as compared to random sampling on the training data followed by federated learning, thereby indicating how intelligently selecting such training samples can help in performance. Additionally, through experiments on medical datasets our proposed method showcases some gains as compared to other submodular optimization based approaches used for subset selection on client's data.
【2】Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing
标题:迈向移动边缘计算网络混合专家的高效联合学习
链接:https://arxiv.org/abs/2511.01743
作者:Song Gao, Shusen Jing, Shuai Zhang, Yue Wang, Xiangwei Zhou, Songyang Zhang
摘要:Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substantial demands for computational resources and large-scale training data required to train LAMs conflict with the limited storage and computational capacity of edge devices, posing significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients infer collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregate the returned results. For training the NMoE, we propose a federated learning framework that integrates both supervised and self-supervised learning to balance personalization and generalization, while preserving communication efficiency and data privacy. We conduct extensive experiments to demonstrate the efficacy of the proposed NMoE system, providing insights and benchmarks for the NMoE training algorithms.
【3】LSHFed: Robust and Communication-Efficient Federated Learning with Locally-Sensitive Hashing Gradient Mapping
标题:LSHFed:具有局部敏感哈希梯度映射的稳健且通信高效的联邦学习
链接:https://arxiv.org/abs/2511.01296
作者:Guanjie Cheng, Mengzhen Yang, Xinkui Zhao, Shuyi Yu, Tianyu Du, Yangyang Wu, Mengying Zhu, Shuiguang Deng
摘要:Federated learning (FL) enables collaborative model training across distributed nodes without exposing raw data, but its decentralized nature makes it vulnerable in trust-deficient environments. Inference attacks may recover sensitive information from gradient updates, while poisoning attacks can degrade model performance or induce malicious behaviors. Existing defenses often suffer from high communication and computation costs, or limited detection precision. To address these issues, we propose LSHFed, a robust and communication-efficient FL framework that simultaneously enhances aggregation robustness and privacy preservation. At its core, LSHFed incorporates LSHGM, a novel gradient verification mechanism that projects high-dimensional gradients into compact binary representations via multi-hyperplane locally-sensitive hashing. This enables accurate detection and filtering of malicious gradients using only their irreversible hash forms, thus mitigating privacy leakage risks and substantially reducing transmission overhead. Extensive experiments demonstrate that LSHFed maintains high model performance even when up to 50% of participants are collusive adversaries while achieving up to a 1000x reduction in gradient verification communication compared to full-gradient methods.
【4】FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts
标题:FedMGP:具有多组文本视觉预算的个性化联邦学习
链接:https://arxiv.org/abs/2511.00480
作者:Weihao Bo, Yanpeng Sun, Yu Wang, Xinyu Zhang, Zechao Li
摘要:In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models. FedMGP equips each client with multiple groups of paired textual and visual prompts, enabling the model to capture diverse, fine-grained semantic and instance-level cues. A diversity loss is introduced to drive each prompt group to specialize in distinct and complementary semantic aspects, ensuring that the groups collectively cover a broader range of local characteristics. During communication, FedMGP employs a dynamic prompt aggregation strategy based on similarity-guided probabilistic sampling: each client computes the cosine similarity between its prompt groups and the global prompts from the previous round, then samples s groups via a softmax-weighted distribution. This soft selection mechanism preferentially aggregates semantically aligned knowledge while still enabling exploration of underrepresented patterns effectively balancing the preservation of common knowledge with client-specific features. Notably, FedMGP maintains parameter efficiency by redistributing a fixed prompt capacity across multiple groups, achieving state-of-the-art performance with the lowest communication parameters among all federated prompt learning methods. Theoretical analysis shows that our dynamic aggregation strategy promotes robust global representation learning by reinforcing shared semantics while suppressing client-specific noise. Extensive experiments demonstrate that FedMGP consistently outperforms prior approaches in both personalization and domain generalization across diverse federated vision-language benchmarks. The code will be released on https://github.com/weihao-bo/FedMGP.git.
【5】Exploring Federated Learning for Thermal Urban Feature Segmentation -- A Comparison of Centralized and Decentralized Approaches
标题:探索联邦学习用于热力城市特征分割--集中式和分散式方法的比较
链接:https://arxiv.org/abs/2511.00055
作者
:Leonhard Duda, Khadijeh Alibabaei, Elena Vollmer, Leon Klug, Valentin Kozlov, Lisana Berberi, Mishal Benz, Rebekka Volk, Juan Pedro Gutiérrez Hermosillo Muriedas, Markus Götz, Judith Sáínz-Pardo Díaz, Álvaro López García, Frank Schultmann, Achim Streit
摘要:Federated Learning (FL) is an approach for training a shared Machine Learning (ML) model with distributed training data and multiple participants. FL allows bypassing limitations of the traditional Centralized Machine Learning CL if data cannot be shared or stored centrally due to privacy or technical restrictions -- the participants train the model locally with their training data and do not need to share it among the other participants. This paper investigates the practical implementation and effectiveness of FL in a real-world scenario, specifically focusing on unmanned aerial vehicle (UAV)-based thermal images for common thermal feature detection in urban environments. The distributed nature of the data arises naturally and makes it suitable for FL applications, as images captured in two German cities are available. This application presents unique challenges due to non-identical distribution and feature characteristics of data captured at both locations. The study makes several key contributions by evaluating FL algorithms in real deployment scenarios rather than simulation. We compare several FL approaches with a centralized learning baseline across key performance metrics such as model accuracy, training time, communication overhead, and energy usage. This paper also explores various FL workflows, comparing client-controlled workflows and server-controlled workflows. The findings of this work serve as a valuable reference for understanding the practical application and limitations of the FL methods in segmentation tasks in UAV-based imaging.
推理|分析|理解|解释(21篇)
【1】Simulating Environments with Reasoning Models for Agent Training
标题:具有推理模型的模拟环境进行代理训练
链接:https://arxiv.org/abs/2511.01824
作者:Yuetai Li, Huseyin A Inan, Xiang Yue, Wei-Ning Chen, Lukas Wutschitz, Janardhan Kulkarni, Radha Poovendran, Robert Sim, Saravan Rajmohan
摘要:LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $\tau^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
【2】Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement
标题:基于2级子空间解纠缠的多步骤知识交互分析
链接:https://arxiv.org/abs/2511.01706
作者:Sekh Mainul Islam, Pepa Atanasova, Isabelle Augenstein
备注:Under review
摘要:Natural Language Explanations (NLEs) describe how Large Language Models (LLMs) make decisions, drawing on both external Context Knowledge (CK) and Parametric Knowledge (PK) stored in model weights. Understanding their interaction is key to assessing the grounding of NLEs, yet it remains underexplored. Prior work has largely examined only single-step generation, typically the final answer, and has modelled PK and CK interaction only as a binary choice in a rank-1 subspace. This overlooks richer forms of interaction, such as complementary or supportive knowledge. We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences. Experiments on four QA datasets and three open-weight instruction-tuned LLMs show that diverse knowledge interactions are poorly represented in a rank-1 subspace but are effectively captured in our rank-2 formulation. Our multi-step analysis reveals that hallucinated NLEs align strongly with the PK direction, context-faithful ones balance PK and CK, and Chain-of-Thought prompting for NLEs shifts generated NLEs toward CK by reducing PK reliance. This work provides the first framework for systematic studies of multi-step knowledge interactions in LLMs through a richer rank-2 subspace disentanglement. Code and data: https://github.com/copenlu/pk-ck-knowledge-disentanglement.
【3】Panther: A Cost-Effective Privacy-Preserving Framework for GNN Training and Inference Services in Cloud Environments
标题:Panther:云环境中GNN训练和推理服务的经济高效的隐私保护框架
链接:https://arxiv.org/abs/2511.01654
作者:Congcong Chen, Xinyu Liu, Kaifeng Huang, Lifei Wei, Yang Shi
备注:Accepted for publication in IEEE Transactions on Services Computing (TSC)
摘要:Graph Neural Networks (GNNs) have marked significant impact in traffic state prediction, social recommendation, knowledge-aware question answering and so on. As more and more users move towards cloud computing, it has become a critical issue to unleash the power of GNNs while protecting the privacy in cloud environments. Specifically, the training data and inference data for GNNs need to be protected from being stolen by external adversaries. Meanwhile, the financial cost of cloud computing is another primary concern for users. Therefore, although existing studies have proposed privacy-preserving techniques for GNNs in cloud environments, their additional computational and communication overhead remain relatively high, causing high financial costs that limit their widespread adoption among users. To protect GNN privacy while lowering the additional financial costs, we introduce Panther, a cost-effective privacy-preserving framework for GNN training and inference services in cloud environments. Technically, Panther leverages four-party computation to asynchronously executing the secure array access protocol, and randomly pads the neighbor information of GNN nodes. We prove that Panther can protect privacy for both training and inference of GNN models. Our evaluation shows that Panther reduces the training and inference time by an average of 75.28% and 82.80%, respectively, and communication overhead by an average of 52.61% and 50.26% compared with the state-of-the-art, which is estimated to save an average of 55.05% and 59.00% in financial costs (based on on-demand pricing model) for the GNN training and inference process on Google Cloud Platform.
【4】Cross-Treatment Effect Estimation for Multi-Category, Multi-Valued Causal Inference via Dynamic Neural Masking
标题
:通过动态神经掩蔽估计多类别、多值因果推理的交叉治疗效应
链接:https://arxiv.org/abs/2511.01641
作者:Xiaopeng Ke, Yihan Yu, Ruyue Zhang, Zhishuo Zhou, Fangzhou Shi, Chang Men, Zhengdan Zhu
摘要:Counterfactual causal inference faces significant challenges when extended to multi-category, multi-valued treatments, where complex cross-effects between heterogeneous interventions are difficult to model. Existing methodologies remain constrained to binary or single-type treatments and suffer from restrictive assumptions, limited scalability, and inadequate evaluation frameworks for complex intervention scenarios. We present XTNet, a novel network architecture for multi-category, multi-valued treatment effect estimation. Our approach introduces a cross-effect estimation module with dynamic masking mechanisms to capture treatment interactions without restrictive structural assumptions. The architecture employs a decomposition strategy separating basic effects from cross-treatment interactions, enabling efficient modeling of combinatorial treatment spaces. We also propose MCMV-AUCC, a suitable evaluation metric that accounts for treatment costs and interaction effects. Extensive experiments on synthetic and real-world datasets demonstrate that XTNet consistently outperforms state-of-the-art baselines in both ranking accuracy and effect estimation quality. The results of the real-world A/B test further confirm its effectiveness.
【5】Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis
标题:Hydra:用于多元时间序列分析的双指数记忆
链接:https://arxiv.org/abs/2511.00989
作者:Asal Meskin, Alireza Mirrokni, Ali Najar, Ali Behrouz
摘要:In recent years, effectively modeling multivariate time series has gained significant popularity, mainly due to its wide range of applications, ranging from healthcare to financial markets and energy management. Transformers, MLPs, and linear models as the de facto backbones of modern time series models have shown promising results in single-variant and/or short-term forecasting. These models, however: (1) are permutation equivariant and so lack temporal inductive bias, being less expressive to capture the temporal dynamics; (2) are naturally designed for univariate setup, missing the inter-dependencies of temporal and variate dimensions; and/or (3) are inefficient for Long-term time series modeling. To overcome training and inference efficiency as well as the lack of temporal inductive bias, recently, linear Recurrent Neural Networks (RNNs) have gained attention as an alternative to Transformer-based models. These models, however, are inherently limited to a single sequence, missing inter-variate dependencies, and can propagate errors due to their additive nature. In this paper, we present Hydra, a by-design two-headed meta in-context memory module that learns how to memorize patterns at test time by prioritizing time series patterns that are more informative about the data. Hydra uses a 2-dimensional recurrence across both time and variate at each step, which is more powerful than mixing methods. Although the 2-dimensional nature of the model makes its training recurrent and non-parallelizable, we present a new 2D-chunk-wise training algorithm that approximates the actual recurrence with $\times 10$ efficiency improvement, while maintaining the effectiveness. Our experimental results on a diverse set of tasks and datasets, including time series forecasting, classification, and anomaly detection show the superior performance of Hydra compared to state-of-the-art baselines.
【6】HEATNETs: Explainable Random Feature Neural Networks for High-Dimensional Parabolic PDEs
标题:HEATNETs:用于多维Parabolic PED的可解释随机特征神经网络
链接:https://arxiv.org/abs/2511.00886
作者:Kyriakos Georgiou, Gianluca Fabiani, Constantinos Siettos, Athanasios N. Yannacopoulos
摘要:We deal with the solution of the forward problem for high-dimensional parabolic PDEs with random feature (projection) neural networks (RFNNs). We first prove that there exists a single-hidden layer neural network with randomized heat-kernels arising from the fundamental solution (Green's functions) of the heat operator, that we call HEATNET, that provides an unbiased universal approximator to the solution of parabolic PDEs in arbitrary (high) dimensions, with the rate of convergence being analogous to the ${O}(N^{-1/2})$, where $N$ is the size of HEATNET. Thus, HEATNETs are explainable schemes, based on the analytical framework of parabolic PDEs, exploiting insights from physics-informed neural networks aided by numerical and functional analysis, and the structure of the corresponding solution operators. Importantly, we show how HEATNETs can be scaled up for the efficient numerical solution of arbitrary high-dimensional parabolic PDEs using suitable transformations and importance Monte Carlo sampling of the integral representation of the solution, in order to deal with the singularities of the heat kernel around the collocation points. We evaluate the performance of HEATNETs through benchmark linear parabolic problems up to 2,000 dimensions. We show that HEATNETs result in remarkable accuracy with the order of the approximation error ranging from $1.0E-05$ to $1.0E-07$ for problems up to 500 dimensions, and of the order of $1.0E-04$ to $1.0E-03$ for 1,000 to 2,000 dimensions, with a relatively low number (up to 15,000) of features.
【7】Inference-Time Chain-of-Thought Pruning with Latent Informativeness Signals
标题:利用潜在信息性信号进行推理时思维链修剪
链接:https://arxiv.org/abs/2511.00699
作者:Sophie Li (1), Nicholas Huang (2), Nayan Saxena (3), Nina Luo (4), Vincent Lin (5), Kevin Zhu (3), Sunishchal Dev (3) ((1) Columbia University, (2) University of British Columbia, (3) Algoverse AI Research, (4) Harvey Mudd College, (5) University of Florida)
摘要:Large language models (LLMs) improve reasoning accuracy when generating multiple candidate solutions at test time, but standard methods like Best-of-N (BoN) incur high computational cost by fully generating all branches. Self-Truncation Best-of-N (ST-BoN) mitigates this by truncating unpromising paths early, but its reliance on consistency-based heuristics is a limitation as it does not directly evaluate branch quality. We present KL-Adjusted Pruned Path Algorithm (KAPPA), an inference-time method that combines Kullback-Leibler divergence, confidence, and entropy into a principled scoring function to guide progressive pruning. By promoting diversity during exploration and selectively eliminating low-scoring branches, KAPPA maintains accuracy while substantially reducing memory and token usage. Experiments on GSM8K and MATH500 with DeepSeek-R1-Distill-Qwen-1.5B and Qwen2.5-7B-Instruct demonstrate that KAPPA stabilizes performance in smaller models and achieves up to ~60% reduction in peak memory and ~90% reduction in total token generation relative to BoN, with minimal impact on accuracy.
【8】Sensitivity Analysis for Climate Science with Generative Flow Models
标题:用生成流模式进行气候科学的敏感性分析
链接:https://arxiv.org/abs/2511.00663
作者:Alex Dobra, Jakiw Pidstrigach, Tim Reichelt, Paolo Fraccaro, Johannes Jakubik, Anne Jones, Christian Schroeder de Witt, Philip Stier, Philip Torr
摘要:Sensitivity analysis is a cornerstone of climate science, essential for understanding phenomena ranging from storm intensity to long-term climate feedbacks. However, computing these sensitivities using traditional physical models is often prohibitively expensive in terms of both computation and development time. While modern AI-based generative models are orders of magnitude faster to evaluate, computing sensitivities with them remains a significant bottleneck. This work addresses this challenge by applying the adjoint state method for calculating gradients in generative flow models, with diffusion models as a special case. We apply this method to the cBottle generative model, an emulator of ERA5 data, to perform sensitivity analysis with respect to sea surface temperatures. Furthermore, we propose a novel gradient self-consistency check to quantitatively validate the computed sensitivities against the model's own outputs. Our results provide initial evidence that this approach can produce reliable gradients, reducing the computational cost of sensitivity analysis from weeks on a supercomputer with a physical model to hours on a GPU, thereby simplifying a critical workflow in climate science.
【9】DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching
标题::通过解码树草图增强大型推理模型
链接:https://arxiv.org/abs/2511.00640
作者:Zicheng Xu, Guanchu Wang, Yu-Neng Chuang, Guangyao Zheng, Alexander S. Szalay, Zirui Liu, Vladimir Braverman
摘要:Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy, where across multiple stochastic decodes, the short reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. These short optimal reasoning paths can be found ideally through full enumeration of the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution that enhances both efficiency and accuracy, without requiring additional training or supervision. Experiments on AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating DTS's ability for scalable and efficient LRM reasoning.
【10】Multi-refined Feature Enhanced Sentiment Analysis Using Contextual Instruction
标题:使用上下文指令的多精细特征增强情绪分析
链接:https://arxiv.org/abs/2511.00537
作者:Peter Atandoh, Jie Zou, Weikang Guo, Jiwei Wei, Zheng Wang
摘要:Sentiment analysis using deep learning and pre-trained language models (PLMs) has gained significant traction due to their ability to capture rich contextual representations. However, existing approaches often underperform in scenarios involving nuanced emotional cues, domain shifts, and imbalanced sentiment distributions. We argue that these limitations stem from inadequate semantic grounding, poor generalization to diverse linguistic patterns, and biases toward dominant sentiment classes. To overcome these challenges, we propose CISEA-MRFE, a novel PLM-based framework integrating Contextual Instruction (CI), Semantic Enhancement Augmentation (SEA), and Multi-Refined Feature Extraction (MRFE). CI injects domain-aware directives to guide sentiment disambiguation; SEA improves robustness through sentiment-consistent paraphrastic augmentation; and MRFE combines a Scale-Adaptive Depthwise Encoder (SADE) for multi-scale feature specialization with an Emotion Evaluator Context Encoder (EECE) for affect-aware sequence modeling. Experimental results on four benchmark datasets demonstrate that CISEA-MRFE consistently outperforms strong baselines, achieving relative improvements in accuracy of up to 4.6% on IMDb, 6.5% on Yelp, 30.3% on Twitter, and 4.1% on Amazon. These results validate the effectiveness and generalization ability of our approach for sentiment classification across varied domains.
【11】UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
标题:UME-R1:探索推理驱动的生成多模式嵌入
链接:https://arxiv.org/abs/2511.00405
作者:Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su
摘要:The remarkable success of multimodal large language models (MLLMs) has driven advances in multimodal embeddings, yet existing models remain inherently discriminative, limiting their ability to benefit from reasoning-driven generation paradigm. In this work, we pioneer the exploration of generative embeddings, unifying embedding tasks within a generative paradigm. We propose UME-R1, a universal multimodal embedding framework consisting of a two-stage training strategy: a cold-start supervised fine-tuning equips the model with reasoning capabilities and enables it to generate both discriminative and generative embeddings; a subsequent reinforcement learning enhances reasoning and further optimizes generative embedding quality. This pioneering work reveals four key insights: 1) generative embeddings unlock substantial performance gains over conventional discriminative embeddings by leveraging the powerful generative reasoning capabilities of MLLMs; 2) discriminative and generative embeddings are complementary, whose combined oracle performance far exceeding that of either alone; 3) RL can effectively enhance generative embeddings, establishing a scalable optimization paradigm.; 4) repeated sampling at inference boosts downstream task coverage (pass@k), highlighting the inference-time scalability potential of generative embeddings. Evaluated on the MMEB-V2 benchmark across 78 tasks spanning video, image, and visual documents, UME-R1 significantly outperforms conventional discriminative embedding models and offers a foundation for more interpretable, reasoning-driven generative multimodal embeddings. Our code, models, and datasets will be publicly available at https://github.com/XMUDeepLIT/UME-R1.
【12】Beyond ImageNet: Understanding Cross-Dataset Robustness of Lightweight Vision Models
标题:超越ImageNet:了解轻量级视觉模型的跨数据集稳健性
链接:https://arxiv.org/abs/2511.00335
作者:Weidong Zhang, Pak Lun Kevin Ding, Huan Liu
备注:10 pages, 5 tables, 1 figure, 3 equations, 11 mobile models, 7 datasets
摘要:Lightweight vision classification models such as MobileNet, ShuffleNet, and EfficientNet are increasingly deployed in mobile and embedded systems, yet their performance has been predominantly benchmarked on ImageNet. This raises critical questions: Do models that excel on ImageNet also generalize across other domains? How can cross-dataset robustness be systematically quantified? And which architectural elements consistently drive generalization under tight resource constraints? Here, we present the first systematic evaluation of 11 lightweight vision models (2.5M parameters), trained under a fixed 100-epoch schedule across 7 diverse datasets. We introduce the Cross-Dataset Score (xScore), a unified metric that quantifies the consistency and robustness of model performance across diverse visual domains. Our results show that (1) ImageNet accuracy does not reliably predict performance on fine-grained or medical datasets, (2) xScore provides a scalable predictor of mobile model performance that can be estimated from just four datasets, and (3) certain architectural components--such as isotropic convolutions with higher spatial resolution and channel-wise attention--promote broader generalization, while Transformer-based blocks yield little additional benefit, despite incurring higher parameter overhead. This study provides a reproducible framework for evaluating lightweight vision models beyond ImageNet, highlights key design principles for mobile-friendly architectures, and guides the development of future models that generalize robustly across diverse application domains.
【13】Melanoma Classification Through Deep Ensemble Learning and Explainable AI
标题:通过深度融合学习和可解释人工智能进行黑色素瘤分类
链接:https://arxiv.org/abs/2511.00246
作者:Wadduwage Shanika Perera, ABM Islam, Van Vung Pham, Min Kyung An
备注:Publisher-formatted version provided under CC BY-NC-ND 4.0 license. Original source produced by SciTePress
摘要:Melanoma is one of the most aggressive and deadliest skin cancers, leading to mortality if not detected and treated in the early stages. Artificial intelligence techniques have recently been developed to help dermatologists in the early detection of melanoma, and systems based on deep learning (DL) have been able to detect these lesions with high accuracy. However, the entire community must overcome the explainability limit to get the maximum benefit from DL for diagnostics in the healthcare domain. Because of the black box operation's shortcomings in DL models' decisions, there is a lack of reliability and trust in the outcomes. However, Explainable Artificial Intelligence (XAI) can solve this problem by interpreting the predictions of AI systems. This paper proposes a machine learning model using ensemble learning of three state-of-the-art deep transfer Learning networks, along with an approach to ensure the reliability of the predictions by utilizing XAI techniques to explain the basis of the predictions.
【14】Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification
标题:职位:Vibe编码需要Vibe推理:通过形式验证改进Vibe编码
链接:https://arxiv.org/abs/2511.00202
作者:Jacqueline Mitchell, Yasser Shaaban
备注:7 pages, 3 figures, In Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages (LMPL'25), October 12-18, 2025, Singapore, Singapore. ACM, New York, NY, USA
摘要:``Vibe coding'' -- the practice of developing software through iteratively conversing with a large language model (LLM) -- has exploded in popularity within the last year. However, developers report key limitations including the accumulation of technical debt, security issues, and code churn to achieve satisfactory results. We argue that these pitfalls result from LLMs' inability to reconcile accumulating human-imposed constraints during vibe coding, with developers inadvertently failing to resolve contradictions because LLMs prioritize user commands over code consistency. Given LLMs' receptiveness to verification-based feedback, we argue that formal methods can mitigate these pitfalls, making vibe coding more reliable. However, we posit that integrating formal methods must transcend existing approaches that combine formal methods and LLMs. We advocate for a side-car system throughout the vibe coding process which: (1) \emph{Autoformalizes} specifications (2) Validates against targets, (3) Delivers \emph{actionable} feedback to the LLM, and (4) Allows intuitive developer influence on specifications.
【15】PDE-SHARP: PDE Solver Hybrids Through Analysis & Refinement Passes
标题:PDE-SHARP:通过分析和细化传递实现DOE解算器混合
链接:https://arxiv.org/abs/2511.00183
作者:Shaghayegh Fazliani, Madeleine Udell
摘要:Current LLM-driven approaches using test-time computing to generate PDE solvers execute a large number of solver samples to identify high-accuracy solvers. These paradigms are especially costly for complex PDEs requiring substantial computational resources for numerical evaluation. We introduce PDE-SHARP, a framework to reduce computational costs by replacing expensive scientific computation by cheaper LLM inference that achieves superior solver accuracy with 60-75% fewer computational evaluations. PDE-SHARP employs three stages: (1) Analysis: mathematical chain-of-thought analysis including PDE classification, solution type detection, and stability analysis; (2) Genesis: solver generation based on mathematical insights from the previous stage; and (3) Synthesis: collaborative selection-hybridization tournaments in which LLM judges iteratively refine implementations through flexible performance feedback. To generate high-quality solvers, PDE-SHARP requires fewer than 13 solver evaluations on average compared to 30+ for baseline methods, improving accuracy uniformly across tested PDEs by $4\times$ on average, and demonstrates robust performance across LLM architectures, from general-purpose to specialized reasoning models.
【16】ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus
标题:ARC-Gen:抽象和推理数据库的模仿程序基准生成器
链接:https://arxiv.org/abs/2511.00162
作者:Michael D. Moffitt
摘要:The Abstraction and Reasoning Corpus remains one of the most compelling and challenging benchmarks for tracking progress toward achieving Artificial General Intelligence. In contrast to other evaluation datasets designed to assess an agent's task-specific skills or accumulated knowledge, the ARC-AGI suite is specifically targeted at measuring skill acquisition efficiency, a trait that has (so far) been lacking in even the most sophisticated machine learning systems. For algorithms that require extensive intra-task exemplars, a significant constraint imposed by ARC-AGI is the modest cardinality of its demonstration set, comprising a small number of $\langle$ input, output $\rangle$ grids per task specifying the corresponding transformation. To embellish the space of viable sample pairs, this paper introduces ARC-GEN, an open-source procedural generator aimed at extending the original ARC-AGI training dataset as faithfully as possible. Unlike prior efforts, our generator is both exhaustive (covering all four-hundred tasks) and mimetic (more closely honoring the distributional properties and characteristics embodied in the initial ARC-AGI-1 release). We also discuss the use of this generator in establishing a static benchmark suite to verify the correctness of programs submitted to the 2025 Google Code Golf Championship.
【17】Analysis of Line Break prediction models for detecting defensive breakthrough in football
标题:足球防守突破的断线预测模型分析
链接:https://arxiv.org/abs/2511.00121
作者:Shoma Yagi, Jun Ichikawa, Genki Ichinose
备注:14 pages, 8 figures
摘要:In football, attacking teams attempt to break through the opponent's defensive line to create scoring opportunities. This action, known as a Line Break, is a critical indicator of offensive effectiveness and tactical performance, yet previous studies have mainly focused on shots or goal opportunities rather than on how teams break the defensive line. In this study, we develop a machine learning model to predict Line Breaks using event and tracking data from the 2023 J1 League season. The model incorporates 189 features, including player positions, velocities, and spatial configurations, and employs an XGBoost classifier to estimate the probability of Line Breaks. The proposed model achieved high predictive accuracy, with an AUC of 0.982 and a Brier score of 0.015. Furthermore, SHAP analysis revealed that factors such as offensive player speed, gaps in the defensive line, and offensive players' spatial distributions significantly contribute to the occurrence of Line Breaks. Finally, we found a moderate positive correlation between the predicted probability of being Line-Broken and the number of shots and crosses conceded at the team level. These results suggest that Line Breaks are closely linked to the creation of scoring opportunities and provide a quantitative framework for understanding tactical dynamics in football.
【18】Feature-Guided Analysis of Neural Networks: A Replication Study
标题:神经网络的创建引导分析:复制研究
链接:https://arxiv.org/abs/2511.00052
作者:Federico Formica, Stefano Gregis, Aurora Francesca Zanenga, Andrea Rota, Mark Lawford, Claudio Menghi
摘要:Understanding why neural networks make certain decisions is pivotal for their use in safety-critical applications. Feature-Guided Analysis (FGA) extracts slices of neural networks relevant to their tasks. Existing feature-guided approaches typically monitor the activation of the neural network neurons to extract the relevant rules. Preliminary results are encouraging and demonstrate the feasibility of this solution by assessing the precision and recall of Feature-Guided Analysis on two pilot case studies. However, the applicability in industrial contexts needs additional empirical evidence. To mitigate this need, this paper assesses the applicability of FGA on a benchmark made by the MNIST and LSC datasets. We assessed the effectiveness of FGA in computing rules that explain the behavior of the neural network. Our results show that FGA has a higher precision on our benchmark than the results from the literature. We also evaluated how the selection of the neural network architecture, training, and feature selection affect the effectiveness of FGA. Our results show that the selection significantly affects the recall of FGA, while it has a negligible impact on its precision.
【19】Physics-Informed Neural Network Frameworks for the Analysis of Engineering and Biological Dynamical Systems Governed by Ordinary Differential Equations
标题:用于分析由常微方程控制的工程和生物动力系统的物理信息神经网络框架
链接:https://arxiv.org/abs/2511.00043
作者:Tyrus Whitman, Andrew Particka, Christopher Diers, Ian Griffin, Charuka Wickramasinghe, Pradeep Ranaweera
备注:21 pages, 10 figures, 5 tables
摘要:In this study, we present and validate the predictive capability of the Physics-Informed Neural Networks (PINNs) methodology for solving a variety of engineering and biological dynamical systems governed by ordinary differential equations (ODEs). While traditional numerical methods a re effective for many ODEs, they often struggle to achieve convergence in problems involving high stiffness, shocks, irregular domains, singular perturbations, high dimensions, or boundary discontinuities. Alternatively, PINNs offer a powerful approach for handling challenging numerical scenarios. In this study, classical ODE problems are employed as controlled testbeds to systematically evaluate the accuracy, training efficiency, and generalization capability under controlled conditions of the PINNs framework. Although not a universal solution, PINNs can achieve superior results by embedding physical laws directly into the learning process. We first analyze the existence and uniqueness properties of several benchmark problems and subsequently validate the PINNs methodology on these model systems. Our results demonstrate that for complex problems to converge to correct solutions, the loss function components data loss, initial condition loss, and residual loss must be appropriately balanced through careful weighting. We further establish that systematic tuning of hyperparameters, including network depth, layer width, activation functions, learning rate, optimization algorithms, w eight initialization schemes, and collocation point sampling, plays a crucial role in achieving accurate solutions. Additionally, embedding prior knowledge and imposing hard constraints on the network architecture, without loss the generality of the ODE system, significantly enhances the predictive capability of PINNs.
【20】Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry
标题:偶数和椭圆对称性存在时变分推理的广义保证
链接:https://arxiv.org/abs/2511.01064
作者:Charles C. Margossian, Lawrence K. Saul
摘要:We extend several recent results providing symmetry-based guarantees for variational inference (VI) with location-scale families. VI approximates a target density~$p$ by the best match $q^*$ in a family $Q$ of tractable distributions that in general does not contain $p$. It is known that VI can recover key properties of $p$, such as its mean and correlation matrix, when $p$ and $Q$ exhibit certain symmetries and $q^*$ is found by minimizing the reverse Kullback-Leibler divergence. We extend these guarantees in two important directions. First, we provide symmetry-based guarantees for a broader family of divergences, highlighting the properties of variational objectives under which VI provably recovers the mean and correlation matrix. Second, we obtain further guarantees for VI when the target density $p$ exhibits even and elliptical symmetries in some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry. We illustrate these theoretical results in a number of experimental settings.
【21】Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?
标题:用于理解深度学习优化的各向同性弯曲模型:梯度电离是最佳的吗?
链接:https://arxiv.org/abs/2511.00674
作者:Weijie Su
摘要:In this paper, we introduce a model for analyzing deep learning optimization over a single iteration by leveraging the matrix structure of the weights. We derive the model by assuming isotropy of curvature, including the second-order Hessian and higher-order terms, of the loss function across all perturbation directions; hence, we call it the isotropic curvature model. This model is a convex optimization program amenable to analysis, which allows us to understand how an update on the weights in the form of a matrix relates to the change in the total loss function. As an application, we use the isotropic curvature model to analyze the recently introduced Muon optimizer and other matrix-gradient methods for training language models. First, we show that under a general growth condition on the curvature, the optimal update matrix is obtained by making the spectrum of the original gradient matrix more homogeneous -- that is, making its singular values closer in ratio -- which in particular improves the conditioning of the update matrix. Next, we show that the orthogonalized gradient becomes optimal for the isotropic curvature model when the curvature exhibits a phase transition in growth. Taken together, these results suggest that the gradient orthogonalization employed in Muon and other related methods is directionally correct but may not be strictly optimal. Finally, we discuss future research on how to leverage the isotropic curvature model for designing new optimization methods for training deep learning and language models.
检测相关(6篇)
【1】An Open-Access Benchmark of Statistical and Machine-Learning Anomaly Detection Methods for Battery Applications
标题:电池应用统计和机器学习异常检测方法的开放访问基准
链接:https://arxiv.org/abs/2511.01745
作者:Mei-Chin Pang, Suraj Adhikari, Takuma Kasahara, Nagihiro Haba, Saneyuki Ohno
摘要:Battery safety is critical in applications ranging from consumer electronics to electric vehicles and aircraft, where undetected anomalies could trigger safety hazards or costly downtime. In this study, we present OSBAD as an open-source benchmark for anomaly detection frameworks in battery applications. By benchmarking 15 diverse algorithms encompassing statistical, distance-based, and unsupervised machine-learning methods, OSBAD enables a systematic comparison of anomaly detection methods across heterogeneous datasets. In addition, we demonstrate how a physics- and statistics-informed feature transformation workflow enhances anomaly separability by decomposing collective anomalies into point anomalies. To address a major bottleneck in unsupervised anomaly detection due to incomplete labels, we propose a Bayesian optimization pipeline that facilitates automated hyperparameter tuning based on transfer-learning and regression proxies. Through validation on datasets covering both liquid and solid-state chemistries, we further demonstrate the cross-chemistry generalization capability of OSBAD to identify irregularities across different electrochemical systems. By making benchmarking database with open-source reproducible anomaly detection workflows available to the community, OSBAD establishes a unified foundation for developing safe, scalable, and transferable anomaly detection tools in battery analytics. This research underscores the significance of physics- and statistics-informed feature engineering as well as model selection with probabilistic hyperparameter tuning, in advancing trustworthy, data-driven diagnostics for safety-critical energy systems.
【2】Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems
标题:联合网络防御:跨分布式系统保护隐私的勒索软件检测
链接:https://arxiv.org/abs/2511.01583
作者:Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria
摘要
:Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. Training high-performing artificial intelligence (AI) detectors requires diverse datasets, which are often distributed across multiple organizations, making centralization necessary. However, centralized learning is often impractical due to security, privacy regulations, data ownership issues, and legal barriers to cross-organizational sharing. Compounding this challenge, ransomware evolves rapidly, demanding models that are both robust and adaptable. In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, which enables multiple organizations to collaboratively train a ransomware detection model while keeping raw data local and secure. This paradigm is particularly relevant for cybersecurity companies (including both software and hardware vendors) that deploy ransomware detection or firewall systems across millions of endpoints. In such environments, data cannot be transferred outside the customer's device due to strict security, privacy, or regulatory constraints. Although FL applies broadly to malware threats, we validate the approach using the Ransomware Storage Access Patterns (RanSAP) dataset. Our experiments demonstrate that FL improves ransomware detection accuracy by a relative 9% over server-local models and achieves performance comparable to centralized training. These results indicate that FL offers a scalable, high-performing, and privacy-preserving framework for proactive ransomware detection across organizational and regulatory boundaries.
【3】Window-Based Feature Engineering for Cognitive Workload Detection
标题:基于窗口的认知预设检测特征工程
链接:https://arxiv.org/abs/2511.01060
作者:Andrew Hallam, R G Gayathri, Glory Lee, Atul Sajjanhar
备注:9 pages, 3 figures
摘要:Cognitive workload is a topic of increasing interest across various fields such as health, psychology, and defense applications. In this research, we focus on classifying cognitive workload using the COLET dataset, employing a window-based approach for feature generation and machine/deep learning techniques for classification. We apply window-based temporal partitioning to enhance features used in existing research, followed by machine learning and deep learning models to classify different levels of cognitive workload. The results demonstrate that deep learning models, particularly tabular architectures, outperformed traditional machine learning methods in precision, F1-score, accuracy, and classification precision. This study highlights the effectiveness of window-based temporal feature extraction and the potential of deep learning techniques for real-time cognitive workload assessment in complex and dynamic tasks.
【4】Android Malware Detection: A Machine Leaning Approach
标题:Android恶意软件检测:一种依靠机器的方法
链接:https://arxiv.org/abs/2511.00894
作者:Hasan Abdulla
摘要:This study examines machine learning techniques like Decision Trees, Support Vector Machines, Logistic Regression, Neural Networks, and ensemble methods to detect Android malware. The study evaluates these models on a dataset of Android applications and analyzes their accuracy, efficiency, and real-world applicability. Key findings show that ensemble methods demonstrate superior performance, but there are trade-offs between model interpretability, efficiency, and accuracy. Given its increasing threat, the insights guide future research and practical use of ML to combat Android malware.
【5】Deep Learning Approach to Anomaly Detection in Enterprise ETL Processes with Autoencoders
标题:使用自动编码器进行企业RTL流程异常检测的深度学习方法
链接:https://arxiv.org/abs/2511.00462
作者:Xin Chen, Saili Uday Gadgil, Kangning Gao, Yi Hu, Cong Nie
摘要:An anomaly detection method based on deep autoencoders is proposed to address anomalies that often occur in enterprise-level ETL data streams. The study first analyzes multiple types of anomalies in ETL processes, including delays, missing values, duplicate loading, and sudden abnormal changes, and applies data standardization and feature modeling to ensure stable and usable inputs. In the method design, the encoder-decoder structure compresses high-dimensional inputs into latent representations and reconstructs them, while reconstruction error is used to measure anomaly levels. Regularization constraints are introduced in the latent space to enhance feature sparsity and distribution learning, thereby improving robustness in complex data streams. Systematic analyses under different hyperparameter settings, environmental changes, and data characteristics show that the proposed method achieves superior performance in AUC, ACC, Precision, and Recall. The results demonstrate that the deep autoencoder-based detection mechanism can effectively capture latent distribution patterns in enterprise-level ETL data streams and accurately identify diverse anomalies, providing reliable support for enterprise data processing and intelligent analysis.
【6】Perturbations in the Orthogonal Complement Subspace for Efficient Out-of-Distribution Detection
标题:用于高效分布外检测的垂直补子空间中的扰动
链接:https://arxiv.org/abs/2511.00849
作者:Zhexiao Huang, Weihao He, Shutao Deng, Junzhe Chen, Chao Yuan, Hongxin Wang, Changsheng Zhou
摘要:Out-of-distribution (OOD) detection is essential for deploying deep learning models in open-world environments. Existing approaches, such as energy-based scoring and gradient-projection methods, typically rely on high-dimensional representations to separate in-distribution (ID) and OOD samples. We introduce P-OCS (Perturbations in the Orthogonal Complement Subspace), a lightweight and theoretically grounded method that operates in the orthogonal complement of the principal subspace defined by ID features. P-OCS applies a single projected perturbation restricted to this complementary subspace, enhancing subtle ID-OOD distinctions while preserving the geometry of ID representations. We show that a one-step update is sufficient in the small-perturbation regime and provide convergence guarantees for the resulting detection score. Experiments across multiple architectures and datasets demonstrate that P-OCS achieves state-of-the-art OOD detection with negligible computational cost and without requiring model retraining, access to OOD data, or changes to model architecture.
分类|识别(8篇)
【1】Defining Energy Indicators for Impact Identification on Aerospace Composites: A Physics-Informed Machine Learning Perspective
标题:定义航空航天复合材料影响识别的能量指标:基于物理知识的机器学习视角
链接:https://arxiv.org/abs/2511.01592
作者:Natália Ribeiro Marinho, Richard Loendersloot, Frank Grooteman, Jan Willem Wiegman, Uraz Odyurt, Tiedo Tinga
摘要
:Energy estimation is critical to impact identification on aerospace composites, where low-velocity impacts can induce internal damage that is undetectable at the surface. Current methodologies for energy prediction are often constrained by data sparsity, signal noise, complex feature interdependencies, non-linear dynamics, massive design spaces, and the ill-posed nature of the inverse problem. This study introduces a physics-informed framework that embeds domain knowledge into machine learning through a dedicated input space. The approach combines observational biases, which guide the design of physics-motivated features, with targeted feature selection to retain only the most informative indicators. Features are extracted from time, frequency, and time-frequency domains to capture complementary aspects of the structural response. A structured feature selection process integrating statistical significance, correlation filtering, dimensionality reduction, and noise robustness ensures physical relevance and interpretability. Exploratory data analysis further reveals domain-specific trends, yielding a reduced feature set that captures essential dynamic phenomena such as amplitude scaling, spectral redistribution, and transient signal behaviour. Together, these steps produce a compact set of energy-sensitive indicators with both statistical robustness and physical significance, resulting in impact energy predictions that remain interpretable and traceable to measurable structural responses. Using this optimised input space, a fully-connected neural network is trained and validated with experimental data from multiple impact scenarios, including pristine and damaged states. The resulting model demonstrates significantly improved impact energy prediction accuracy, reducing errors by a factor of three compared to conventional time-series techniques and purely data-driven models.
【2】Identification of Capture Phases in Nanopore Protein Sequencing Data Using a Deep Learning Model
标题:使用深度学习模型识别纳米孔蛋白质测序数据中的捕获阶段
链接:https://arxiv.org/abs/2511.01277
作者:Annabelle Martin, Daphne Kontogiorgos-Heintz, Jeff Nivala
摘要:Nanopore protein sequencing produces long, noisy ionic current traces in which key molecular phases, such as protein capture and translocation, are embedded. Capture phases mark the successful entry of a protein into the pore and serve as both a checkpoint and a signal that a channel merits further analysis. However, manual identification of capture phases is time-intensive, often requiring several days for expert reviewers to annotate the data due to the need for domain-specific interpretation of complex signal patterns. To address this, a lightweight one-dimensional convolutional neural network (1D CNN) was developed and trained to detect capture phases in down-sampled signal windows. Evaluated against CNN-LSTM (Long Short-Term Memory) hybrids, histogram-based classifiers, and other CNN variants using run-level data splits, our best model, CaptureNet-Deep, achieved an F1 score of 0.94 and precision of 93.39% on held-out test data. The model supports low-latency inference and is integrated into a dashboard for Oxford Nanopore experiments, reducing the total analysis time from several days to under thirty minutes. These results show that efficient, real-time capture detection is possible using simple, interpretable architectures and suggest a broader role for lightweight ML models in sequencing workflows.
【3】Transmitter Identification and Protocol Categorization in Shared Spectrum via Multi-Task RF Classification at the Network Edge
标题:基于网络边缘多任务射频分类的共享频谱发射机识别和协议分类
链接:https://arxiv.org/abs/2511.01198
作者:Tariq Abdul-Quddoos, Tasnia Sharmin, Xiangfang Li, Lijun Qian
摘要:As spectrum sharing becomes increasingly vital to meet rising wireless demands in the future, spectrum monitoring and transmitter identification are indispensable for enforcing spectrum usage policy, efficient spectrum utilization, and net- work security. This study proposed a robust framework for transmitter identification and protocol categorization via multi- task RF signal classification in shared spectrum environments, where the spectrum monitor will classify transmission protocols (e.g., 4G LTE, 5G-NR, IEEE 802.11a) operating within the same frequency bands, and identify different transmitting base stations, as well as their combinations. A Convolutional Neural Network (CNN) is designed to tackle critical challenges such as overlapping signal characteristics and environmental variability. The proposed method employs a multi-channel input strategy to extract meaningful signal features, achieving remarkable accuracy: 90% for protocol classification, 100% for transmitting base station classification, and 92% for joint classification tasks, utilizing RF data from the POWDER platform. These results highlight the significant potential of the proposed method to enhance spectrum monitoring, management, and security in modern wireless networks.
【4】Learning with Category-Equivariant Architectures for Human Activity Recognition
标题:基于类别等变结构的人体活动识别
链接:https://arxiv.org/abs/2511.01139
作者:Yoshihiro Maruyama
摘要:We propose CatEquiv, a category-equivariant neural network for Human Activity Recognition (HAR) from inertial sensors that systematically encodes temporal, amplitude, and structural symmetries. In particular, we introduce the categorical symmetry product where cyclic time shifts, positive gains and the sensor-hierarchy poset together capture the categorical symmetry structure of the data. CatEquiv achieves equivariance with respect to the categorical symmetry product. On UCI-HAR under out-of-distribution perturbations, CatEquiv attains markedly higher robustness compared with circularly padded CNNs and plain CNNs. These results demonstrate that enforcing categorical symmetries yields strong invariance and generalization without additional model capacity.
【5】Motion-Robust Multimodal Fusion of PPG and Accelerometer Signals for Three-Class Heart Rhythm Classification
标题:用于三级心律分类的运动稳健多模式PPV和加速度计信号融合
链接:https://arxiv.org/abs/2511.00949
作者:Yangyang Zhao, Matti Kaisti, Olli Lahdenoja, Tero Koivisto
备注:Accepted for publication in the Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2025 International Symposium on Wearable Computers (UbiComp/ISWC 2025 Companion). 5 pages, 3 figures. Author's accepted manuscript (AAM)
摘要
:Atrial fibrillation (AF) is a leading cause of stroke and mortality, particularly in elderly patients. Wrist-worn photoplethysmography (PPG) enables non-invasive, continuous rhythm monitoring, yet suffers from significant vulnerability to motion artifacts and physiological noise. Many existing approaches rely solely on single-channel PPG and are limited to binary AF detection, often failing to capture the broader range of arrhythmias encountered in clinical settings. We introduce RhythmiNet, a residual neural network enhanced with temporal and channel attention modules that jointly leverage PPG and accelerometer (ACC) signals. The model performs three-class rhythm classification: AF, sinus rhythm (SR), and Other. To assess robustness across varying movement conditions, test data are stratified by accelerometer-based motion intensity percentiles without excluding any segments. RhythmiNet achieved a 4.3% improvement in macro-AUC over the PPG-only baseline. In addition, performance surpassed a logistic regression model based on handcrafted HRV features by 12%, highlighting the benefit of multimodal fusion and attention-based learning in noisy, real-world clinical data.
【6】Learning with Category-Equivariant Representations for Human Activity Recognition
标题:利用类别等变表示进行学习用于人类活动识别
链接:https://arxiv.org/abs/2511.00900
作者:Yoshihiro Maruyama
摘要:Human activity recognition is challenging because sensor signals shift with context, motion, and environment; effective models must therefore remain stable as the world around them changes. We introduce a categorical symmetry-aware learning framework that captures how signals vary over time, scale, and sensor hierarchy. We build these factors into the structure of feature representations, yielding models that automatically preserve the relationships between sensors and remain stable under realistic distortions such as time shifts, amplitude drift, and device orientation changes. On the UCI Human Activity Recognition benchmark, this categorical symmetry-driven design improves out-of-distribution accuracy by approx. 46 percentage points (approx. 3.6x over the baseline), demonstrating that abstract symmetry principles can translate into concrete performance gains in everyday sensing tasks via category-equivariant representation theory.
【7】Balancing Interpretability and Performance in Motor Imagery EEG Classification: A Comparative Study of ANFIS-FBCSP-PSO and EEGNet
标题:平衡运动图像脑电分类中的可解释性和性能:ANFIS-FBSCP-PSO和EEGNet的比较研究
链接:https://arxiv.org/abs/2511.00369
作者:Farjana Aktar, Mohd Ruhul Ameen, Akif Islam, Md Ekramul Hamid
备注:6 pages, 3 figures, 8 tables, Submitted to ICECTE 2026
摘要:Achieving both accurate and interpretable classification of motor imagery EEG remains a key challenge in brain computer interface (BCI) research. This paper compares a transparent fuzzy reasoning approach (ANFIS-FBCSP-PSO) with a deep learning benchmark (EEGNet) using the BCI Competition IV-2a dataset. The ANFIS pipeline combines filter bank common spatial pattern feature extraction with fuzzy IF-THEN rules optimized via particle swarm optimization, while EEGNet learns hierarchical spatial temporal representations directly from raw EEG data. In within-subject experiments, the fuzzy neural model performed better (68.58 percent +/- 13.76 percent accuracy, kappa = 58.04 percent +/- 18.43), while in cross-subject (LOSO) tests, the deep model exhibited stronger generalization (68.20 percent +/- 12.13 percent accuracy, kappa = 57.33 percent +/- 16.22). The study provides practical guidance for selecting MI-BCI systems according to design goals: interpretability or robustness across users. Future investigations into transformer based and hybrid neuro symbolic frameworks are expected to advance transparent EEG decoding.
【8】Deep recurrent-convolutional neural network learning and physics Kalman filtering comparison in dynamic load identification
标题:动态负载识别中的深度回归卷积神经网络学习和物理卡尔曼过滤比较
链接:https://arxiv.org/abs/2511.00100
作者:Marios Impraimakis
备注:31 pages, 20 figures, published in Structural Health Monitoring
摘要:The dynamic structural load identification capabilities of the gated recurrent unit, long short-term memory, and convolutional neural networks are examined herein. The examination is on realistic small dataset training conditions and on a comparative view to the physics-based residual Kalman filter (RKF). The dynamic load identification suffers from the uncertainty related to obtaining poor predictions when in civil engineering applications only a low number of tests are performed or are available, or when the structural model is unidentifiable. In considering the methods, first, a simulated structure is investigated under a shaker excitation at the top floor. Second, a building in California is investigated under seismic base excitation, which results in loading for all degrees of freedom. Finally, the International Association for Structural Control-American Society of Civil Engineers (IASC-ASCE) structural health monitoring benchmark problem is examined for impact and instant loading conditions. Importantly, the methods are shown to outperform each other on different loading scenarios, while the RKF is shown to outperform the networks in physically parametrized identifiable cases.
表征(2篇)
【1】Bridging Lifelong and Multi-Task Representation Learning via Algorithm and Complexity Measure
标题:通过算法和复杂性衡量弥合终身和多任务表示学习
链接:https://arxiv.org/abs/2511.01847
作者:Zhi Wang, Chicheng Zhang, Ramya Korlakai Vinayak
摘要:In lifelong learning, a learner faces a sequence of tasks with shared structure and aims to identify and leverage it to accelerate learning. We study the setting where such structure is captured by a common representation of data. Unlike multi-task learning or learning-to-learn, where tasks are available upfront to learn the representation, lifelong learning requires the learner to make use of its existing knowledge while continually gathering partial information in an online fashion. In this paper, we consider a generalized framework of lifelong representation learning. We propose a simple algorithm that uses multi-task empirical risk minimization as a subroutine and establish a sample complexity bound based on a new notion we introduce--the task-eluder dimension. Our result applies to a wide range of learning problems involving general function classes. As concrete examples, we instantiate our result on classification and regression tasks under noise.
【2】SliceVision-F2I: A Synthetic Feature-to-Image Dataset for Visual Pattern Representation on Network Slices
标题:SliceVision-F2 I:用于网络切片上视觉模式表示的合成生成到图像数据集
链接:https://arxiv.org/abs/2511.01087
作者:Md. Abid Hasan Rafi, Mst. Fatematuj Johora, Pankaj Bhowmik
摘要:The emergence of 5G and 6G networks has established network slicing as a significant part of future service-oriented architectures, demanding refined identification methods supported by robust datasets. The article presents SliceVision-F2I, a dataset of synthetic samples for studying feature visualization in network slicing for next-generation networking systems. The dataset transforms multivariate Key Performance Indicator (KPI) vectors into visual representations through four distinct encoding methods: physically inspired mappings, Perlin noise, neural wallpapering, and fractal branching. For each encoding method, 30,000 samples are generated, each comprising a raw KPI vector and a corresponding RGB image at low-resolution pixels. The dataset simulates realistic and noisy network conditions to reflect operational uncertainties and measurement imperfections. SliceVision-F2I is suitable for tasks involving visual learning, network state classification, anomaly detection, and benchmarking of image-based machine learning techniques applied to network data. The dataset is publicly available and can be reused in various research contexts, including multivariate time series analysis, synthetic data generation, and feature-to-image transformations.
3D|3D重建等相关(1篇)
【1】PDA-LSTM: Knowledge-driven page data arrangement based on LSTM for LCM supression in QLC 3D NAND flash memories
标题:PDA-LSTM:基于LSTM的知识驱动页面数据排列,用于QLC 3D AND闪存中的RCM抑制
链接:https://arxiv.org/abs/2511.00075
作者:Qianhui Li, Weiya Wang, Qianqi Zhao, Tong Qu, Jing He, Xuhong Qiang, Jingwen Hou, Ke Chen, Bao Zhang, Qi Wang
摘要:Quarter level cell (QLC) 3D NAND flash memory is emerging as the predominant storage solution in the era of artificial intelligence. QLC 3D NAND flash stores 4 bit per cell to expand the storage density, resulting in narrower read margins. Constrained to read margins, QLC always suffers from lateral charge migration (LCM), which caused by non-uniform charge density across adjacent memory cells. To suppress charge density gap between cells, there are some algorithm in form of intra-page data mapping such as WBVM, DVDS. However, we observe inter-page data arrangements also approach the suppression. Thus, we proposed an intelligent model PDA-LSTM to arrange intra-page data for LCM suppression, which is a physics-knowledge-driven neural network model. PDA-LSTM applies a long-short term memory (LSTM) neural network to compute a data arrangement probability matrix from input page data pattern. The arrangement is to minimize the global impacts derived from the LCM among wordlines. Since each page data can be arranged only once, we design a transformation from output matrix of LSTM network to non-repetitive sequence generation probability matrix to assist training process. The arranged data pattern can decrease the bit error rate (BER) during data retention. In addition, PDA-LSTM do not need extra flag bits to record data transport of 3D NAND flash compared with WBVM, DVDS. The experiment results show that the PDA-LSTM reduces the average BER by 80.4% compared with strategy without data arrangement, and by 18.4%, 15.2% compared respectively with WBVM and DVDS with code-length 64.
编码器(2篇)
【1】HyperNQ: A Hypergraph Neural Network Decoder for Quantum LDPC Codes
标题:HyperNQ:一种量子低脂码的超图神经网络解码器
链接:https://arxiv.org/abs/2511.01741
作者:Ameya S. Bhave, Navnil Choudhury, Kanad Basu
备注:6 pages, 4 figures, Submitted to the IEEE International Conference on Communications (ICC 2026). Preprint version
摘要:Quantum computing requires effective error correction strategies to mitigate noise and decoherence. Quantum Low-Density Parity-Check (QLDPC) codes have emerged as a promising solution for scalable Quantum Error Correction (QEC) applications by supporting constant-rate encoding and a sparse parity-check structure. However, decoding QLDPC codes via traditional approaches such as Belief Propagation (BP) suffers from poor convergence in the presence of short cycles. Machine learning techniques like Graph Neural Networks (GNNs) utilize learned message passing over their node features; however, they are restricted to pairwise interactions on Tanner graphs, which limits their ability to capture higher-order correlations. In this work, we propose HyperNQ, the first Hypergraph Neural Network (HGNN)- based QLDPC decoder that captures higher-order stabilizer constraints by utilizing hyperedges-thus enabling highly expressive and compact decoding. We use a two-stage message passing scheme and evaluate the decoder over the pseudo-threshold region. Below the pseudo-threshold mark, HyperNQ improves the Logical Error Rate (LER) up to 84% over BP and 50% over GNN-based strategies, demonstrating enhanced performance over the existing state-of-the-art decoders.
【2】Variational Autoencoder for Calibration: A New Approach
标题:用于校准的变分自动编码器:一种新方法
链接:https://arxiv.org/abs/2511.00475
作者:Travis Barrett, Amit Kumar Mishra, Joyce Mwangama
备注:6 pages, 5 figures
摘要:In this paper we present a new implementation of a Variational Autoencoder (VAE) for the calibration of sensors. We propose that the VAE can be used to calibrate sensor data by training the latent space as a calibration output. We discuss this new approach and show a proof-of-concept using an existing multi-sensor gas dataset. We show the performance of the proposed calibration VAE and found that it was capable of performing as calibration model while performing as an autoencoder simultaneously. Additionally, these models have shown that they are capable of creating statistically similar outputs from both the calibration output as well as the reconstruction output to their respective truth data. We then discuss the methods of future testing and planned expansion of this work.
优化|敛散性(8篇)
【1】A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
标题:鞍点补救措施:非凸优化中变量消除的力量
链接:https://arxiv.org/abs/2511.01234
作者:Min Gan, Guang-Yong Chen, Yang Yi, Lin Yang
摘要:The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they so effectively navigate these complex energy landscapes has remained elusive. In this work, we provide a rigorous geometric explanation by comparing the optimization landscapes of the original and reduced formulations. Through a rigorous analysis based on Hessian inertia and the Schur complement, we prove that variable elimination fundamentally reshapes the critical point structure of the objective function, revealing that local maxima in the reduced landscape are created from, and correspond directly to, saddle points in the original formulation. Our findings are illustrated on the canonical problem of non-convex matrix factorization, visualized directly on two-parameter neural networks, and finally validated in training deep Residual Networks, where our approach yields dramatic improvements in stability and convergence to superior minima. This work goes beyond explaining an existing method; it establishes landscape simplification via saddle point transformation as a powerful principle that can guide the design of a new generation of more robust and efficient optimization algorithms.
【2】Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
标题:在线零阶和一阶二层优化的随机遗憾保证
链接:https://arxiv.org/abs/2511.01126
作者:Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, George Michailidis
备注:Published at NeurIPS 2025. 88 pages and 3 figures
摘要:Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic \textit{window-smoothed} regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear {stochastic bilevel regret without window smoothing}. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.
【3】None To Optima in Few Shots: Bayesian Optimization with MDP Priors
标题:几次尝试中无最佳:具有MDP先验的Bayesian优化
链接:https://arxiv.org/abs/2511.01006
作者:Diantong Li, Kyunghyun Cho, Chong Liu
摘要:Bayesian Optimization (BO) is an efficient tool for optimizing black-box functions, but its theoretical guarantees typically hold in the asymptotic regime. In many critical real-world applications such as drug discovery or materials design, where each evaluation can be very costly and time-consuming, BO becomes impractical for many evaluations. In this paper, we introduce the Procedure-inFormed BO (ProfBO) algorithm, which solves black-box optimization with remarkably few function evaluations. At the heart of our algorithmic design are Markov Decision Process (MDP) priors that model optimization trajectories from related source tasks, thereby capturing procedural knowledge on efficient optimization. We embed these MDP priors into a prior-fitted neural network and employ model-agnostic meta-learning for fast adaptation to new target tasks. Experiments on real-world Covid and Cancer benchmarks and hyperparameter tuning tasks demonstrate that ProfBO consistently outperforms state-of-the-art methods by achieving high-quality solutions with significantly fewer evaluations, making it ready for practical deployment.
【4】KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization
标题:KFCPO:克罗内克因子逼近约束政策优化
链接:https://arxiv.org/abs/2511.00880
作者:Joonyoung Lim, Younghwan Yoo
备注:12 pages, 8 figures, submitted to ECAI 2025
摘要:We propose KFCPO, a novel Safe Reinforcement Learning (Safe RL) algorithm that combines scalable Kronecker-Factored Approximate Curvature (K-FAC) based second-order policy optimization with safety-aware gradient manipulation. KFCPO leverages K-FAC to perform efficient and stable natural gradient updates by approximating the Fisher Information Matrix (FIM) in a layerwise, closed form manner, avoiding iterative approximation overheads. To address the tradeoff between reward maximization and constraint satisfaction, we introduce a margin aware gradient manipulation mechanism that adaptively adjusts the influence of reward and cost gradients based on the agent's proximity to safety boundaries. This method blends gradients using a direction sensitive projection, eliminating harmful interference and avoiding abrupt changes caused by fixed hard thresholds. Additionally, a minibatch level KL rollback strategy is adopted to ensure trust region compliance and to prevent destabilizing policy shifts. Experiments on Safety Gymnasium using OmniSafe show that KFCPO achieves 10.3% to 50.2% higher average return across environments compared to the best baseline that respected the safety constraint, demonstrating superior balance of safety and performance.
【5】Why Federated Optimization Fails to Achieve Perfect Fitting? A Theoretical Perspective on Client-Side Optima
标题:为什么联合优化未能实现完美匹配?客户端优化的理论视角
链接:https://arxiv.org/abs/2511.00469
作者:Zhongxiang Lei, Qi Yang, Ping Qiu, Gang Zhang, Yuanchi Ma, Jinyan Liu
摘要
:Federated optimization is a constrained form of distributed optimization that enables training a global model without directly sharing client data. Although existing algorithms can guarantee convergence in theory and often achieve stable training in practice, the reasons behind performance degradation under data heterogeneity remain unclear. To address this gap, the main contribution of this paper is to provide a theoretical perspective that explains why such degradation occurs. We introduce the assumption that heterogeneous client data lead to distinct local optima, and show that this assumption implies two key consequences: 1) the distance among clients' local optima raises the lower bound of the global objective, making perfect fitting of all client data impossible; and 2) in the final training stage, the global model oscillates within a region instead of converging to a single optimum, limiting its ability to fully fit the data. These results provide a principled explanation for performance degradation in non-iid settings, which we further validate through experiments across multiple tasks and neural network architectures. The framework used in this paper is open-sourced at: https://github.com/NPCLEI/fedtorch.
【6】DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
标题:DCcluster-Opt:针对地理分布式数据中心工作负载进行动态多目标优化基准测试
链接:https://arxiv.org/abs/2511.00117
作者:Antonio Guillen-Perez, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Munther Salim, Shubhanker Banerjee, Eoin H. Oude Essink, Damien Fay, Soumyendu Sarkar
备注:Submitted to the NeurIPS 2025 conference
摘要:The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPUs, GPUs, memory, HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets, including AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. It presents a challenging scheduling problem where a top-level coordinating agent must dynamically reassign or defer tasks that arrive with resource and service-level agreement requirements across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables an explicit study of trade-offs among carbon emissions, energy costs, service level agreements, and water use. It provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and a fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
【7】Benchmarking Generative AI Against Bayesian Optimization for Constrained Multi-Objective Inverse Design
标题:针对约束多目标逆设计的约束性多目标逆设计的生成性人工智能基准
链接:https://arxiv.org/abs/2511.00070
作者:Muhammad Bilal Awan, Abdul Razzaq, Abdul Shahid
备注:17 pages, 2 Figures
摘要:This paper investigates the performance of Large Language Models (LLMs) as generative optimizers for solving constrained multi-objective regression tasks, specifically within the challenging domain of inverse design (property-to-structure mapping). This problem, critical to materials informatics, demands finding complex, feasible input vectors that lie on the Pareto optimal front. While LLMs have demonstrated universal effectiveness across generative and reasoning tasks, their utility in constrained, continuous, high-dimensional numerical spaces tasks they weren't explicitly architected for remains an open research question. We conducted a rigorous comparative study between established Bayesian Optimization (BO) frameworks and a suite of fine-tuned LLMs and BERT models. For BO, we benchmarked the foundational BoTorch Ax implementation against the state-of-the-art q-Expected Hypervolume Improvement (qEHVI, BoTorchM). The generative approach involved fine-tuning models via Parameter-Efficient Fine-Tuning (PEFT), framing the challenge as a regression problem with a custom output head. Our results show that BoTorch qEHVI achieved perfect convergence (GD=0.0), setting the performance ceiling. Crucially, the best-performing LLM (WizardMath-7B) achieved a Generational Distance (GD) of 1.21, significantly outperforming the traditional BoTorch Ax baseline (GD=15.03). We conclude that specialized BO frameworks remain the performance leader for guaranteed convergence, but fine-tuned LLMs are validated as a promising, computationally fast alternative, contributing essential comparative metrics to the field of AI-driven optimization. The findings have direct industrial applications in optimizing formulation design for resins, polymers, and paints, where multi-objective trade-offs between mechanical, rheological, and chemical properties are critical to innovation and production efficiency.
【8】Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift
标题:最佳注意力温度增强分布转移下的情境学习
链接:https://arxiv.org/abs/2511.01292
作者:Samet Demir, Zafer Dogan
备注:26 pages, 6 figures
摘要:Pretrained Transformers excel at in-context learning (ICL), inferring new tasks from only a handful of examples. Yet, their ICL performance can degrade sharply under distribution shift between pretraining and test data, a regime increasingly common in real-world deployments. While recent empirical work hints that adjusting the attention temperature in the softmax can enhance Transformer performance, the attention temperature's role in ICL under distribution shift remains unexplored. This paper provides the first theoretical and empirical study of attention temperature for ICL under distribution shift. Using a simplified but expressive "linearized softmax" framework, we derive closed-form generalization error expressions and prove that shifts in input covariance or label noise substantially impair ICL, but that an optimal attention temperature exists which minimizes this error. We then validate our predictions through extensive simulations on linear regression tasks and large-scale experiments with GPT-2 and LLaMA2-7B on question-answering benchmarks. Our results establish attention temperature as a principled and powerful mechanism for improving the robustness of ICL in pretrained Transformers, advancing theoretical understanding and providing actionable guidance for selecting attention temperature in practice.
预测|估计(19篇)
【1】Coordinate ascent neural Kalman-MLE for state estimation
标题:用于状态估计的坐标上升神经网络Kalman-MLE
链接:https://arxiv.org/abs/2511.01855
作者:Bettina Hanlon, Angel Garcia Fernandez
摘要:This paper presents a coordinate ascent algorithm to learn dynamic and measurement models in dynamic state estimation using maximum likelihood estimation in a supervised manner. In particular, the dynamic and measurement models are assumed to be Gaussian and the algorithm learns the neural network parameters that model the dynamic and measurement functions, and also the noise covariance matrices. The trained dynamic and measurement models are then used with a non-linear Kalman filter algorithm to estimate the state during the testing phase.
【2】Estimation of Toeplitz Covariance Matrices using Overparameterized Gradient Descent
标题:利用超参数化梯度下降估计Toeplitz协方差矩阵
链接:https://arxiv.org/abs/2511.01605
作者:Daniel Busbib, Ami Wiesel
摘要:We consider covariance estimation under Toeplitz structure. Numerous sophisticated optimization methods have been developed to maximize the Gaussian log-likelihood under Toeplitz constraints. In contrast, recent advances in deep learning demonstrate the surprising power of simple gradient descent (GD) applied to overparameterized models. Motivated by this trend, we revisit Toeplitz covariance estimation through the lens of overparameterized GD. We model the $P\times P$ covariance as a sum of $K$ complex sinusoids with learnable parameters and optimize them via GD. We show that when $K = P$, GD may converge to suboptimal solutions. However, mild overparameterization ($K = 2P$ or $4P$) consistently enables global convergence from random initializations. We further propose an accelerated GD variant with separate learning rates for amplitudes and frequencies. When frequencies are fixed and only amplitudes are optimized, we prove that the optimization landscape is asymptotically benign and any stationary point recovers the true covariance. Finally, numerical experiments demonstrate that overparameterized GD can match or exceed the accuracy of state-of-the-art methods in challenging settings, while remaining simple and scalable.
【3】Deep Learning Prediction of Beam Coherence Time for Near-FieldTeraHertz Networks
标题:近场TeraHertz网络的射束一致时间深度学习预测
链接:https://arxiv.org/abs/2511.01491
作者:Irched Chafaa, E. Veronica Belmega, Giacomo Bacci
备注:IEEE Wireless Communication Letters (accepted October 2025)
摘要:Large multiple antenna arrays coupled with accu- rate beamforming are essential in terahertz (THz) communi- cations to ensure link reliability. However, as the number of antennas increases, beam alignment (focusing) and beam tracking in mobile networks incur prohibitive overhead. Additionally, the near-field region expands both with the size of antenna arrays and the carrier frequency, calling for adjustments in the beamforming to account for spherical wavefront instead of the conventional planar wave assumption. In this letter, we introduce a novel beam coherence time for mobile THz networks, to drastically reduce the rate of beam updates. Then, we propose a deep learning model, relying on a simple feedforward neural network with a time-dependent input, to predict the beam coherence time and adjust the beamforming on the fly with minimal overhead. Our numerical results demonstrate the effectiveness of the proposed approach by enabling higher data rates while reducing the overhead, especially at high (i.e., vehicular) mobility.
【4】Koopman-based Prediction of Connectivity for Flying Ad Hoc Networks
标题:基于Koopman的飞行自组织网络连通性预测
链接:https://arxiv.org/abs/2511.01286
作者:Sivaram Krishnan, Jinho Choi, Jihong Park, Gregory Sherman, Benjamin Campbell
摘要:The application of machine learning (ML) to communication systems is expected to play a pivotal role in future artificial intelligence (AI)-based next-generation wireless networks. While most existing works focus on ML techniques for static wireless environments, they often face limitations when applied to highly dynamic environments, such as flying ad hoc networks (FANETs). This paper explores the use of data-driven Koopman approaches to address these challenges. Specifically, we investigate how these approaches can model UAV trajectory dynamics within FANETs, enabling more accurate predictions and improved network performance. By leveraging Koopman operator theory, we propose two possible approaches -- centralized and distributed -- to efficiently address the challenges posed by the constantly changing topology of FANETs. To demonstrate this, we consider a FANET performing surveillance with UAVs following pre-determined trajectories and predict signal-to-interference-plus-noise ratios (SINRs) to ensure reliable communication between UAVs. Our results show that these approaches can accurately predict connectivity and isolation events that lead to modelled communication outages. This capability could help UAVs schedule their transmissions based on these predictions.
【5】GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
标题:地理令牌:通过下一个令牌预测实现图像分层地理定位
链接:https://arxiv.org/abs/2511.01082
作者:Narges Ghasemi, Amir Ziashahabi, Salman Avestimehr, Cyrus Shahabi
备注:Accepted to IEEE International Conference on Data Mining (ICDM) 2025
摘要
:Image geolocalization, the task of determining an image's geographic origin, poses significant challenges, largely due to visual similarities across disparate locations and the large search space. To address these issues, we propose a hierarchical sequence prediction approach inspired by how humans narrow down locations from broad regions to specific addresses. Analogously, our model predicts geographic tokens hierarchically, first identifying a general region and then sequentially refining predictions to increasingly precise locations. Rather than relying on explicit semantic partitions, our method uses S2 cells, a nested, multiresolution global grid, and sequentially predicts finer-level cells conditioned on visual inputs and previous predictions. This procedure mirrors autoregressive text generation in large language models. Much like in language modeling, final performance depends not only on training but also on inference-time strategy. We investigate multiple top-down traversal methods for autoregressive sampling, incorporating techniques from test-time compute scaling used in language models. Specifically, we integrate beam search and multi-sample inference while exploring various selection strategies to determine the final output. This enables the model to manage uncertainty by exploring multiple plausible paths through the hierarchy. We evaluate our method on the Im2GPS3k and YFCC4k datasets against two distinct sets of baselines: those that operate without a Multimodal Large Language Model (MLLM) and those that leverage one. In the MLLM-free setting, our model surpasses other comparable baselines on nearly all metrics, achieving state-of-the-art performance with accuracy gains of up to 13.9%. When augmented with an MLLM, our model outperforms all baselines, setting a new state-of-the-art across all metrics. The source code is available at https://github.com/NNargesNN/GeoToken.
【6】SARIMAX-Based Power Outage Prediction During Extreme Weather Events
标题:极端天气事件期间基于SARIMAX的停电预测
链接:https://arxiv.org/abs/2511.01017
作者:Haoran Ye, Qiuzhuang Sun, Yang Yang
备注:12 pages, 3 figures. This paper presents the solution of Team 12 for the 2025 INFORMS Data Mining Society Data Challenge. The open-source code is available at: this https URL
摘要:This study develops a SARIMAX-based prediction system for short-term power outage forecasting during extreme weather events. Using hourly data from Michigan counties with outage counts and comprehensive weather features, we implement a systematic two-stage feature engineering pipeline: data cleaning to remove zero-variance and unknown features, followed by correlation-based filtering to eliminate highly correlated predictors. The selected features are augmented with temporal embeddings, multi-scale lag features, and weather variables with their corresponding lags as exogenous inputs to the SARIMAX model. To address data irregularity and numerical instability, we apply standardization and implement a hierarchical fitting strategy with sequential optimization methods, automatic downgrading to ARIMA when convergence fails, and historical mean-based fallback predictions as a final safeguard. The model is optimized separately for short-term (24 hours) and medium-term (48 hours) forecast horizons using RMSE as the evaluation metric. Our approach achieves an RMSE of 177.2, representing an 8.4\% improvement over the baseline method (RMSE = 193.4), thereby validating the effectiveness of our feature engineering and robust optimization strategy for extreme weather-related outage prediction.
【7】Occlusion-Aware Diffusion Model for Pedestrian Intention Prediction
标题:基于遮挡感知的行人意图预测扩散模型
链接:https://arxiv.org/abs/2511.00858
作者:Yu Liu, Zhijie Liu, Zedong Yang, You-Fu Li, He Kong
备注:This manuscript has been accepted to the IEEE Transactions on Intelligent Transportation Systems as a regular paper
摘要:Predicting pedestrian crossing intentions is crucial for the navigation of mobile robots and intelligent vehicles. Although recent deep learning-based models have shown significant success in forecasting intentions, few consider incomplete observation under occlusion scenarios. To tackle this challenge, we propose an Occlusion-Aware Diffusion Model (ODM) that reconstructs occluded motion patterns and leverages them to guide future intention prediction. During the denoising stage, we introduce an occlusion-aware diffusion transformer architecture to estimate noise features associated with occluded patterns, thereby enhancing the model's ability to capture contextual relationships in occluded semantic scenarios. Furthermore, an occlusion mask-guided reverse process is introduced to effectively utilize observation information, reducing the accumulation of prediction errors and enhancing the accuracy of reconstructed motion features. The performance of the proposed method under various occlusion scenarios is comprehensively evaluated and compared with existing methods on popular benchmarks, namely PIE and JAAD. Extensive experimental results demonstrate that the proposed method achieves more robust performance than existing methods in the literature.
【8】Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning
标题:用于机器人运动规划的预测动态障碍模型的实时学习
链接:https://arxiv.org/abs/2511.00814
作者:Stella Kombo, Masih Haseli, Skylar Wei, Joel W. Burdick
备注:10 pages, 6 figures, submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025
摘要:Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real-time, a nonlinear predictive model of another agent's motions?" Our online framework denoises and forecasts such dynamics using a modified sliding-window Hankel Dynamic Mode Decomposition (Hankel-DMD). Partial noisy measurements are embedded into a Hankel matrix, while an associated Page matrix enables singular-value hard thresholding (SVHT) to estimate the effective rank. A Cadzow projection enforces structured low-rank consistency, yielding a denoised trajectory and local noise variance estimates. From this representation, a time-varying Hankel-DMD lifted linear predictor is constructed for multi-step forecasts. The residual analysis provides variance-tracking signals that can support downstream estimators and risk-aware planning. We validate the approach in simulation under Gaussian and heavy-tailed noise, and experimentally on a dynamic crane testbed. Results show that the method achieves stable variance-aware denoising and short-horizon prediction suitable for integration into real-time control frameworks.
【9】Enhancing Heavy Rain Nowcasting with Multimodal Data: Integrating Radar and Satellite Observations
标题:利用多模式数据加强大雨预预报:整合雷达和卫星观测
链接:https://arxiv.org/abs/2511.00716
作者
:Rama Kassoumeh, David Rügamer, Henning Oppel
备注:accepted to ICMLA 2025
摘要:The increasing frequency of heavy rainfall events, which are a major cause of urban flooding, underscores the urgent need for accurate precipitation forecasting - particularly in urban areas where localized events often go undetected by ground-based sensors. In Germany, only 17.3% of hourly heavy rain events between 2001 and 2018 were recorded by rain gauges, highlighting the limitations of traditional monitoring systems. Radar data are another source that effectively tracks ongoing precipitation; however, forecasting the development of heavy rain using radar alone remains challenging due to the brief and unpredictable nature of such events. Our focus is on evaluating the effectiveness of fusing satellite and radar data for nowcasting. We develop a multimodal nowcasting model that combines both radar and satellite imagery for predicting precipitation at lead times of 5, 15, and 30 minutes. We demonstrate that this multimodal strategy significantly outperforms radar-only approaches. Experimental results show that integrating satellite data improves prediction accuracy, particularly for intense precipitation. The proposed model increases the Critical Success Index for heavy rain by 4% and for violent rain by 3% at a 5-minute lead time. Moreover, it maintains higher predictive skill at longer lead times, where radar-only performance declines. A qualitative analysis of the severe flooding event in the state of North Rhine-Westphalia, Germany in 2021 further illustrates the superior performance of the multimodal model. Unlike the radar-only model, which captures general precipitation patterns, the multimodal model yields more detailed and accurate forecasts for regions affected by heavy rain. This improved precision enables timely, reliable, life-saving warnings. Implementation available at https://github.com/RamaKassoumeh/Multimodal_heavy_rain
【10】TRISKELION-1: Unified Descriptive-Predictive-Generative AI
标题:TRISKELION-1:统一的描述-预测-生成人工智能
链接:https://arxiv.org/abs/2511.00711
作者:Nardeep Kumar, Arun Kanwar
备注:12 pages, 18 figures, submitted to arXiv (2025)
摘要:TRISKELION-1 is a unified descriptive-predictive-generative architecture that integrates statistical, mechanistic, and generative reasoning within a single encoder-decoder framework. The model demonstrates how descriptive representation learning, predictive inference, and generative synthesis can be jointly optimized using variational objectives. Experiments on MNIST validate that descriptive reconstruction, predictive classification, and generative sampling can coexist stably within one model. The framework provides a blueprint toward universal intelligence architectures that connect interpretability, accuracy, and creativity.
【11】Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology
标题:动力系统方程的稀疏和非参数估计及其在生物学中的应用
链接:https://arxiv.org/abs/2511.00579
作者:G. Pillonetto, A. Giaretta, A. Aravkin, M. Bisiacco, T. Elston
摘要:Data-driven discovery of model equations is a powerful approach for understanding the behavior of dynamical systems in many scientific fields. In particular, the ability to learn mathematical models from data would benefit systems biology, where the complex nature of these systems often makes a bottom up approach to modeling unfeasible. In recent years, sparse estimation techniques have gained prominence in system identification, primarily using parametric paradigms to efficiently capture system dynamics with minimal model complexity. In particular, the Sindy algorithm has successfully used sparsity to estimate nonlinear systems by extracting from a library of functions only a few key terms needed to capture the dynamics of these systems. However, parametric models often fall short in accurately representing certain nonlinearities inherent in complex systems. To address this limitation, we introduce a novel framework that integrates sparse parametric estimation with nonparametric techniques. It captures nonlinearities that Sindy cannot describe without requiring a priori information about their functional form. That is, without expanding the library of functions to include the one that is trying to be discovered. We illustrate our approach on several examples related to estimation of complex biological phenomena.
【12】Air Pollution Forecasting in Bucharest
标题:布加勒斯特空气污染预测
链接:https://arxiv.org/abs/2511.00532
作者:Dragoş-Andrei Şerban, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel
备注:14 pages 3 figures
摘要:Air pollution, especially the particulate matter 2.5 (PM2.5), has become a growing concern in recent years, primarily in urban areas. Being exposed to air pollution is linked to developing numerous health problems, like the aggravation of respiratory diseases, cardiovascular disorders, lung function impairment, and even cancer or early death. Forecasting future levels of PM2.5 has become increasingly important over the past few years, as it can provide early warnings and help prevent diseases. This paper aims to design, fine-tune, test, and evaluate machine learning models for predicting future levels of PM2.5 over various time horizons. Our primary objective is to assess and compare the performance of multiple models, ranging from linear regression algorithms and ensemble-based methods to deep learning models, such as advanced recurrent neural networks and transformers, as well as large language models, on this forecasting task.
【13】MaGNet: A Mamba Dual-Hypergraph Network for Stock Prediction via Temporal-Causal and Global Relational Learning
标题:MaGNet:通过时态因果和全局关系学习进行股票预测的Mamba双超图网络
链接:https://arxiv.org/abs/2511.00085
作者:Peilin Tan, Chuanqi Shi, Dian Tu, Liang Xie
摘要
:Stock trend prediction is crucial for profitable trading strategies and portfolio management yet remains challenging due to market volatility, complex temporal dynamics and multifaceted inter-stock relationships. Existing methods struggle to effectively capture temporal dependencies and dynamic inter-stock interactions, often neglecting cross-sectional market influences, relying on static correlations, employing uniform treatments of nodes and edges, and conflating diverse relationships. This work introduces MaGNet, a novel Mamba dual-hyperGraph Network for stock prediction, integrating three key innovations: (1) a MAGE block, which leverages bidirectional Mamba with adaptive gating mechanisms for contextual temporal modeling and integrates a sparse Mixture-of-Experts layer to enable dynamic adaptation to diverse market conditions, alongside multi-head attention for capturing global dependencies; (2) Feature-wise and Stock-wise 2D Spatiotemporal Attention modules enable precise fusion of multivariate features and cross-stock dependencies, effectively enhancing informativeness while preserving intrinsic data structures, bridging temporal modeling with relational reasoning; and (3) a dual hypergraph framework consisting of the Temporal-Causal Hypergraph (TCH) that captures fine-grained causal dependencies with temporal constraints, and Global Probabilistic Hypergraph (GPH) that models market-wide patterns through soft hyperedge assignments and Jensen-Shannon Divergence weighting mechanism, jointly disentangling localized temporal influences from instantaneous global structures for multi-scale relational learning. Extensive experiments on six major stock indices demonstrate MaGNet outperforms state-of-the-art methods in both superior predictive performance and exceptional investment returns with robust risk management capabilities. Codes available at: https://github.com/PeilinTime/MaGNet.
【14】Application of predictive machine learning in pen & paper RPG game design
标题:预测机器学习在纸笔RPG游戏设计中的应用
链接:https://arxiv.org/abs/2511.00084
作者:Jolanta Śliwa
备注:Master's thesis submitted at AGH University of Science and Technology
摘要:In recent years, the pen and paper RPG market has experienced significant growth. As a result, companies are increasingly exploring the integration of AI technologies to enhance player experience and gain a competitive edge. One of the key challenges faced by publishers is designing new opponents and estimating their challenge level. Currently, there are no automated methods for determining a monster's level; the only approaches used are based on manual testing and expert evaluation. Although these manual methods can provide reasonably accurate estimates, they are time-consuming and resource-intensive. Level prediction can be approached using ordinal regression techniques. This thesis presents an overview and evaluation of state-of-the-art methods for this task. It also details the construction of a dedicated dataset for level estimation. Furthermore, a human-inspired model was developed to serve as a benchmark, allowing comparison between machine learning algorithms and the approach typically employed by pen and paper RPG publishers. In addition, a specialized evaluation procedure, grounded in domain knowledge, was designed to assess model performance and facilitate meaningful comparisons.
【15】Forecasting Occupational Survivability of Rickshaw Pullers in a Changing Climate with Wearable Data
标题:利用可穿戴数据预测人力车拉工在不断变化的气候下的职业生存能力
链接:https://arxiv.org/abs/2511.00081
作者:Masfiqur Rahaman, Maoyejatun Hasana, Shahad Shahriar Rahman, MD Sajid Mostafiz Noor, Razin Reaz Abedin, Md Toki Tahmid, Duncan Watson Parris, Tanzeem Choudhury, A. B. M. Alim Al Islam, Tauhidur Rahman
备注:This is a preprint version of a manuscript accepted and to be published in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)
摘要:Cycle rickshaw pullers are highly vulnerable to extreme heat, yet little is known about how their physiological biomarkers respond under such conditions. This study collected real-time weather and physiological data using wearable sensors from 100 rickshaw pullers in Dhaka, Bangladesh. In addition, interviews with 12 pullers explored their knowledge, perceptions, and experiences related to climate change. We developed a Linear Gaussian Bayesian Network (LGBN) regression model to predict key physiological biomarkers based on activity, weather, and demographic features. The model achieved normalized mean absolute error values of 0.82, 0.47, 0.65, and 0.67 for skin temperature, relative cardiac cost, skin conductance response, and skin conductance level, respectively. Using projections from 18 CMIP6 climate models, we layered the LGBN on future climate forecasts to analyze survivability for current (2023-2025) and future years (2026-2100). Based on thresholds of WBGT above 31.1{\deg}C and skin temperature above 35{\deg}C, 32% of rickshaw pullers already face high heat exposure risk. By 2026-2030, this percentage may rise to 37% with average exposure lasting nearly 12 minutes, or about two-thirds of the trip duration. A thematic analysis of interviews complements these findings, showing that rickshaw pullers recognize their increasing climate vulnerability and express concern about its effects on health and occupational survivability.
【16】Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
标题:用于训练多步时间序列预测模型的二次直接预测
链接:https://arxiv.org/abs/2511.00053
作者:Hao Wang, Licheng Pan, Yuan Lu, Zhichao Chen, Tianqiao Liu, Shuting He, Zhixuan Chu, Qingsong Wen, Haoxuan Li, Zhouchen Lin
摘要:The design of training objective is central to training time-series forecasting models. Existing training objectives such as mean squared error mostly treat each future step as an independent, equally weighted task, which we found leading to the following two issues: (1) overlook the label autocorrelation effect among future steps, leading to biased training objective; (2) fail to set heterogeneous task weights for different forecasting tasks corresponding to varying future steps, limiting the forecasting performance. To fill this gap, we propose a novel quadratic-form weighted training objective, addressing both of the issues simultaneously. Specifically, the off-diagonal elements of the weighting matrix account for the label autocorrelation effect, whereas the non-uniform diagonals are expected to match the most preferable weights of the forecasting tasks with varying future steps. To achieve this, we propose a Quadratic Direct Forecast (QDF) learning algorithm, which trains the forecast model using the adaptively updated quadratic-form weighting matrix. Experiments show that our QDF effectively improves performance of various forecast models, achieving state-of-the-art results. Code is available at https://anonymous.4open.science/r/QDF-8937.
【17】Neural Architecture Search for global multi-step Forecasting of Energy Production Time Series
标题:神经架构搜索能源生产时间序列的全球多步预测
链接:https://arxiv.org/abs/2511.00035
作者:Georg Velev, Stefan Lessmann
摘要:The dynamic energy sector requires both predictive accuracy and runtime efficiency for short-term forecasting of energy generation under operational constraints, where timely and precise predictions are crucial. The manual configuration of complex methods, which can generate accurate global multi-step predictions without suffering from a computational bottleneck, represents a procedure with significant time requirements and high risk for human-made errors. A further intricacy arises from the temporal dynamics present in energy-related data. Additionally, the generalization to unseen data is imperative for continuously deploying forecasting techniques over time. To overcome these challenges, in this research, we design a neural architecture search (NAS)-based framework for the automated discovery of time series models that strike a balance between computational efficiency, predictive performance, and generalization power for the global, multi-step short-term forecasting of energy production time series. In particular, we introduce a search space consisting only of efficient components, which can capture distinctive patterns of energy time series. Furthermore, we formulate a novel objective function that accounts for performance generalization in temporal context and the maximal exploration of different regions of our high-dimensional search space. The results obtained on energy production time series show that an ensemble of lightweight architectures discovered with NAS outperforms state-of-the-art techniques, such as Transformers, as well as pre-trained forecasting models, in terms of both efficiency and accuracy.
【18】Accuracy estimation of neural networks by extreme value theory
标题:利用极小值理论估计神经网络的准确性
链接:https://arxiv.org/abs/2511.00490
作者:Gero Junike, Marco Oesting
摘要:Neural networks are able to approximate any continuous function on a compact set. However, it is not obvious how to quantify the error of the neural network, i.e., the remaining bias between the function and the neural network. Here, we propose the application of extreme value theory to quantify large values of the error, which are typically relevant in applications. The distribution of the error beyond some threshold is approximately generalized Pareto distributed. We provide a new estimator of the shape parameter of the Pareto distribution suitable to describe the error of neural networks. Numerical experiments are provided.
【19】Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data
标题:梯度增强混合模型:离散数据均值和方差分量的灵活联合估计
链接:https://arxiv.org/abs/2511.00217
作者:Mitchell L. Prevett, Francis K. C. Hui, Zhi Yang Tho, A. H. Welsh, Anton H. Westveld
摘要:Linear mixed models are widely used for clustered data, but their reliance on parametric forms limits flexibility in complex and high-dimensional settings. In contrast, gradient boosting methods achieve high predictive accuracy through nonparametric estimation, but do not accommodate clustered data structures or provide uncertainty quantification. We introduce Gradient Boosted Mixed Models (GBMixed), a framework and algorithm that extends boosting to jointly estimate mean and variance components via likelihood-based gradients. In addition to nonparametric mean estimation, the method models both random effects and residual variances as potentially covariate-dependent functions using flexible base learners such as regression trees or splines, enabling nonparametric estimation while maintaining interpretability. Simulations and real-world applications demonstrate accurate recovery of variance components, calibrated prediction intervals, and improved predictive accuracy relative to standard linear mixed models and nonparametric methods. GBMixed provides heteroscedastic uncertainty quantification and introduces boosting for heterogeneous random effects. This enables covariate-dependent shrinkage for cluster-specific predictions to adapt between population and cluster-level data. Under standard causal assumptions, the framework enables estimation of heterogeneous treatment effects with reliable uncertainty quantification.
其他神经网络|深度学习|模型|建模(55篇)
【1】Proximal Regret and Proximal Correlated Equilibria: A New Tractable Solution Concept for Online Learning and Games
标题:近端后悔与近端相关均衡:一种新的在线学习和游戏的易处理解决方案概念
链接:https://arxiv.org/abs/2511.01852
作者:Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
备注:This paper presents proximal regret and proximal correlated equilibria results that do not appear in the NeurIPS version of arXiv:2403.08171
摘要:Learning and computation of equilibria are central problems in algorithmic game theory. In this work, we introduce proximal regret, a new notion of regret based on proximal operators that lies strictly between external and swap regret. When every player employs a no-proximal-regret algorithm in a general convex game, the empirical distribution of play converges to proximal correlated equilibria (PCE), a refinement of coarse correlated equilibria. Our framework unifies several emerging notions in online learning and game theory -- such as gradient equilibrium and semicoarse correlated equilibrium -- and introduces new ones. Our main result shows that the classic Online Gradient Descent (GD) algorithm achieves an optimal $O(\sqrt{T})$ bound on proximal regret, revealing that GD, without modification, minimizes a stronger regret notion than external regret. This provides a new explanation for the empirically superior performance of gradient descent in online learning and games. We further extend our analysis to Mirror Descent in the Bregman setting and to Optimistic Gradient Descent, which yields faster convergence in smooth convex games.
【2】Interpretable Machine Learning for Reservoir Water Temperatures in the U.S. Red River Basin of the South
标题:美国南部红河流域水库水温的可解释机器学习
链接:https://arxiv.org/abs/2511.01837
作者:Isabela Suaza-Sierra, Hernan A. Moreno, Luis A De la Fuente, Thomas M. Neeson
摘要
:Accurate prediction of Reservoir Water Temperature (RWT) is vital for sustainable water management, ecosystem health, and climate resilience. Yet, prediction alone offers limited insight into the governing physical processes. To bridge this gap, we integrated explainable machine learning (ML) with symbolic modeling to uncover the drivers of RWT dynamics across ten reservoirs in the Red River Basin, USA, using over 10,000 depth-resolved temperature profiles. We first employed ensemble and neural models, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), achieving high predictive skill (best RMSE = 1.20 degree Celsius, R^2 = 0.97). Using SHAP (SHapley Additive exPlanations), we quantified the contribution of physical drivers such as air temperature, depth, wind, and lake volume, revealing consistent patterns across reservoirs. To translate these data-driven insights into compact analytical expressions, we developed Kolmogorov Arnold Networks (KANs) to symbolically approximate RWT. Ten progressively complex KAN equations were derived, improving from R^2 = 0.84 using a single predictor (7-day antecedent air temperature) to R^2 = 0.92 with ten predictors, though gains diminished beyond five, highlighting a balance between simplicity and accuracy. The resulting equations, dominated by linear and rational forms, incrementally captured nonlinear behavior while preserving interpretability. Depth consistently emerged as a secondary but critical predictor, whereas precipitation had limited effect. By coupling predictive accuracy with explanatory power, this framework demonstrates how KANs and explainable ML can transform black-box models into transparent surrogates that advance both prediction and understanding of reservoir thermal dynamics.
【3】Machine and Deep Learning for Indoor UWB Jammer Localization
标题:用于室内超宽带干扰机定位的机器和深度学习
链接:https://arxiv.org/abs/2511.01819
作者:Hamed Fard, Mahsa Kholghi, Benedikt Groß, Gerhard Wunder
备注:Accepted at the 20th International Conference on Risks and Security of Internet and Systems (CRiSIS 2025, Gatineau-Canada, this https URL). The paper will soon be published as post-proceedings in Springer's LNCS
摘要:Ultra-wideband (UWB) localization delivers centimeter-scale accuracy but is vulnerable to jamming attacks, creating security risks for asset tracking and intrusion detection in smart buildings. Although machine learning (ML) and deep learning (DL) methods have improved tag localization, localizing malicious jammers within a single room and across changing indoor layouts remains largely unexplored. Two novel UWB datasets, collected under original and modified room configurations, are introduced to establish comprehensive ML/DL baselines. Performance is rigorously evaluated using a variety of classification and regression metrics. On the source dataset with the collected UWB features, Random Forest achieves the highest F1-macro score of 0.95 and XGBoost achieves the lowest mean Euclidean error of 20.16 cm. However, deploying these source-trained models in the modified room layout led to severe performance degradation, with XGBoost's mean Euclidean error increasing tenfold to 207.99 cm, demonstrating significant domain shift. To mitigate this degradation, a domain-adversarial ConvNeXt autoencoder (A-CNT) is proposed that leverages a gradient-reversal layer to align CIR-derived features across domains. The A-CNT framework restores localization performance by reducing the mean Euclidean error to 34.67 cm. This represents a 77 percent improvement over non-adversarial transfer learning and an 83 percent improvement over the best baseline, restoring the fraction of samples within 30 cm to 0.56. Overall, the results demonstrate that adversarial feature alignment enables robust and transferable indoor jammer localization despite environmental changes. Code and dataset available at https://github.com/afbf4c8996f/Jammer-Loc
【4】No-rank Tensor Decomposition Using Metric Learning
标题:使用度量学习的无阶张量分解
链接:https://arxiv.org/abs/2511.01816
作者:Maryam Bagherian
摘要:Tensor decomposition faces fundamental challenges in analyzing high-dimensional data, where traditional methods based on reconstruction and fixed-rank constraints often fail to capture semantically meaningful structures. This paper introduces a no-rank tensor decomposition framework grounded in metric learning, which replaces reconstruction objectives with a discriminative, similarity-based optimization. The proposed approach learns data-driven embeddings by optimizing a triplet loss with diversity and uniformity regularization, creating a feature space where distance directly reflects semantic similarity. We provide theoretical guarantees for the framework's convergence and establish bounds on its metric properties. Evaluations across diverse domains --including face recognition (LFW, Olivetti), brain connectivity analysis (ABIDE), and simulated data (galaxy morphology, crystal structures)-- demonstrate that our method outperforms baseline techniques, including PCA, t-SNE, UMAP, and tensor decomposition baselines (CP and Tucker). Results show substantial improvements in clustering metrics (Silhouette Score, Davies--Bouldin Index, Calinski--Harabasz Index, Separation Ratio, Adjusted Rand Index, Normalized Mutual Information) and reveal a fundamental trade-off: while metric learning optimizes global class separation, it deliberately transforms local geometry to align with semantic relationships. Crucially, our approach achieves superior performance with smaller training datasets compared to transformer-based methods, offering an efficient alternative for domains with limited labeled data. This work establishes metric learning as a paradigm for tensor-based analysis, prioritizing semantic relevance over pixel-level fidelity while providing computational advantages in data-scarce scenarios.
【5】Hybrid Neural Network-Based Indoor Localisation System for Mobile Robots Using CSI Data in a Robotics Simulator
标题:基于CSI数据的混合神经网络室内机器人定位系统
链接:https://arxiv.org/abs/2511.01797
作者:Javier Ballesteros-Jerez, Jesus Martínez-Gómez, Ismael García-Varea, Luis Orozco-Barbosa, Manuel Castillo-Cara
备注:13 pages, 7 figures. Conference paper (ROBOVIS 2025)
摘要:We present a hybrid neural network model for inferring the position of mobile robots using Channel State Information (CSI) data from a Massive MIMO system. By leveraging an existing CSI dataset, our approach integrates a Convolutional Neural Network (CNN) with a Multilayer Perceptron (MLP) to form a Hybrid Neural Network (HyNN) that estimates 2D robot positions. CSI readings are converted into synthetic images using the TINTO tool. The localisation solution is integrated with a robotics simulator, and the Robot Operating System (ROS), which facilitates its evaluation through heterogeneous test cases, and the adoption of state estimators like Kalman filters. Our contributions illustrate the potential of our HyNN model in achieving precise indoor localisation and navigation for mobile robots in complex environments. The study follows, and proposes, a generalisable procedure applicable beyond the specific use case studied, making it adaptable to different scenarios and datasets.
【6】Fractional Diffusion Bridge Models
标题:分数扩散桥模型
链接:https://arxiv.org/abs/2511.01795
作者:Gabriel Nobis, Maximilian Springenberg, Arina Belova, Rembert Daems, Christoph Knochenhauer, Manfred Opper, Tolga Birdal, Wojciech Samek
备注:To appear in NeurIPS 2025 proceedings. This version includes post-camera-ready revisions
摘要:We present Fractional Diffusion Bridge Models (FDBM), a novel generative diffusion bridge framework driven by an approximation of the rich and non-Markovian fractional Brownian motion (fBM). Real stochastic processes exhibit a degree of memory effects (correlations in time), long-range dependencies, roughness and anomalous diffusion phenomena that are not captured in standard diffusion or bridge modeling due to the use of Brownian motion (BM). As a remedy, leveraging a recent Markovian approximation of fBM (MA-fBM), we construct FDBM that enable tractable inference while preserving the non-Markovian nature of fBM. We prove the existence of a coupling-preserving generative diffusion bridge and leverage it for future state prediction from paired training data. We then extend our formulation to the Schr\"{o}dinger bridge problem and derive a principled loss function to learn the unpaired data translation. We evaluate FDBM on both tasks: predicting future protein conformations from aligned data, and unpaired image translation. In both settings, FDBM achieves superior performance compared to the Brownian baselines, yielding lower root mean squared deviation (RMSD) of C$_\alpha$ atomic positions in protein structure prediction and lower Fr\'echet Inception Distance (FID) in unpaired image translation.
【7】Game-theoretic distributed learning of generative models for heterogeneous data collections
标题:针对异类数据收集的生成模型的游戏理论分布式学习
链接:https://arxiv.org/abs/2511.01740
作者:Dmitrij Schlesinger, Boris Flach
备注:The manuscript is accepted for publishing at the 2025 Symposium on Federated Learning and Intelligent Computing Systems (FLICS 2025)
摘要:One of the main challenges in distributed learning arises from the difficulty of handling heterogeneous local models and data. In light of the recent success of generative models, we propose to meet this challenge by building on the idea of exchanging synthetic data instead of sharing model parameters. Local models can then be treated as ``black boxes'' with the ability to learn their parameters from data and to generate data according to these parameters. Moreover, if the local models admit semi-supervised learning, we can extend the approach by enabling local models on different probability spaces. This allows to handle heterogeneous data with different modalities. We formulate the learning of the local models as a cooperative game starting from the principles of game theory. We prove the existence of a unique Nash equilibrium for exponential family local models and show that the proposed learning approach converges to this equilibrium. We demonstrate the advantages of our approach on standard benchmark vision datasets for image classification and conditional generation.
【8】Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
标题:通过卡尔曼过滤对CLIP模型进行Bayesian自然梯度微调
链接:https://arxiv.org/abs/2511.01694
作者:Hossein Abdi, Mingfei Sun, Wei Pan
摘要:Vision-language pre-trained models, such as CLIP, have established new benchmarks in multimodal data mining. In such models, few-shot fine-tuning is a major challenge to achieve optimal performance on both in-distribution (ID) and out-of-distribution (OOD) datasets, especially when labeled data is scarce. Most existing fine-tuning approaches rely on first-order gradient-based optimizers, which typically suffer from slow convergence, sensitivity to step-size hyperparameters, and poor generalization in OOD settings. In contrast, second-order methods utilize local curvature information of the loss landscape to adjust the update step size. This is particularly beneficial for CLIP models, whose non-convex loss functions often contain sharp critical points. In such cases, natural gradient direction can offer more substantial and efficient per-iteration updates when fine-tuning with limited data. Natural Gradient Descent (NGD) is obtained by preconditioning the standard gradient with the inverse Fisher Information Matrix (FIM), which is computationally expensive for large models. To address this, we propose a Bayesian approximation of NGD using a Kalman filter for CLIP models. Our method combines the benefits of second-order optimization with Bayesian inference, which enhances generalization while providing uncertainty quantification. Extensive experiments conducted on diverse image classification datasets demonstrate that our algorithm consistently achieves superior--or comparable--ID performance and improved OOD robustness compared to state-of-the-art baselines. To the best of our knowledge, this work represents the first successful application of Kalman filtering to fine-tuning CLIP-based models, which enables more robust and efficient learning in vision-language tasks.
【9】Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization
标题:探索更多,学习更好:共同信息最小化下的并行MLLM嵌入
链接:https://arxiv.org/abs/2511.01588
作者:Zhicheng Wang, Chen Ju, Xu Chen, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Ying Chen, Zhiguo Cao
摘要
:Embedding models are a cornerstone of modern AI. Driven by Multimodal Large Language Models (MLLMs), they have made great progress in architecture and data curation, while the holistic paradigm is still limited to SSC, i.e., single input, singular embedding, contrastive supervision, which collapses rich, multifaceted inputs into monolithic embeddings and fails to fully exploit MLLM capabilities. In this paper, we tailor one Parallel Decoupling Framework (PDF) for multimodal embedding learning, by utilizing the proprietary steerability of MLLMs, i.e., their ability to flexibly generate quite differentiated response under explicit instructions. Concretely, PDF conditions a shared MLLM backbone on distinct, learnable prefixes to roll out multiple parallel paths for one input, then relies on these paths to obtain parallel embeddings. To promote full parallel diversity, we employ Mutual Information Minimization (MIM) as an explicit constraint, coupled with per-path contrastive supervision to maintain semantic alignment. Such dual-objectives force PDF to yield robust semantic coverage and a generalizable embedding space. Ultimately, the remarkable embedding space are accessible at inference via one single forward pass, incurring negligible computational overhead. We instantiate PDF on multiple MLLM backbones and prove its effectiveness on MMEB benchmark. Significant gains are consistently achieved across various resolutions and model sizes, e.g., boosting the VLM2Vec-LLaVA-1.6-LR model by a remarkable +8.9% (7B), while the VLM2Vec-Qwen2VL models by +4.2% (2B) and +3.1% (7B). In terms of efficiency, our 2B model surpasses its baseline by +2.6% using only half the computational budget.
【10】Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning
标题:学习要说什么以及如何准确地说:通过差异化离散沟通学习进行高效沟通
链接:https://arxiv.org/abs/2511.01554
作者:Aditya Kapoor, Yash Bhisikar, Benjamin Freed, Jan Peters, Mingfei Sun
备注:30 pages, 12 figures, 6 tables
摘要:Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide \textit{whether} to communicate, not \textit{how precisely}. Learning to optimize message precision at the bit-level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate \textit{how} agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the \enquote{Bitter Lesson} in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.
【11】Real-time Continual Learning on Intel Loihi 2
标题:英特尔Loihi 2上的实时持续学习
链接:https://arxiv.org/abs/2511.01553
作者:Elvin Hajizada, Danielle Rager, Timothy Shea, Leobardo Campos-Macias, Andreas Wild, Eyke Hüllermeier, Yulia Sandamirskaya, Mike Davies
摘要:AI systems on edge devices face a critical challenge in open-world environments: adapting when data distributions shift and novel classes emerge. While offline training dominates current paradigms, online continual learning (OCL)--where models learn incrementally from non-stationary streams without catastrophic forgetting--remains challenging in power-constrained settings. We present a neuromorphic solution called CLP-SNN: a spiking neural network architecture for Continually Learning Prototypes and its implementation on Intel's Loihi 2 chip. Our approach introduces three innovations: (1) event-driven and spatiotemporally sparse local learning, (2) a self-normalizing three-factor learning rule maintaining weight normalization, and (3) integrated neurogenesis and metaplasticity for capacity expansion and forgetting mitigation. On OpenLORIS few-shot learning experiments, CLP-SNN achieves accuracy competitive with replay methods while being rehearsal-free. CLP-SNN delivers transformative efficiency gains: 70\times faster (0.33ms vs 23.2ms), and 5,600\times more energy efficient (0.05mJ vs 281mJ) than the best alternative OCL on edge GPU. This demonstrates that co-designed brain-inspired algorithms and neuromorphic hardware can break traditional accuracy-efficiency trade-offs for future edge AI systems.
【12】DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation
标题:DAMBench:基于深度学习的大气数据同化的多模式基准
链接:https://arxiv.org/abs/2511.01468
作者:Hao Wang, Zixuan Weng, Jindong Han, Wei Fan, Hao Liu
摘要:Data Assimilation is a cornerstone of atmospheric system modeling, tasked with reconstructing system states by integrating sparse, noisy observations with prior estimation. While traditional approaches like variational and ensemble Kalman filtering have proven effective, recent advances in deep learning offer more scalable, efficient, and flexible alternatives better suited for complex, real-world data assimilation involving large-scale and multi-modal observations. However, existing deep learning-based DA research suffers from two critical limitations: (1) reliance on oversimplified scenarios with synthetically perturbed observations, and (2) the absence of standardized benchmarks for fair model comparison. To address these gaps, in this work, we introduce DAMBench, the first large-scale multi-modal benchmark designed to evaluate data-driven DA models under realistic atmospheric conditions. DAMBench integrates high-quality background states from state-of-the-art forecasting systems and real-world multi-modal observations (i.e., real-world weather stations and satellite imagery). All data are resampled to a common grid and temporally aligned to support systematic training, validation, and testing. We provide unified evaluation protocols and benchmark representative data assimilation approaches, including latent generative models and neural process frameworks. Additionally, we propose a lightweight multi-modal plugin to demonstrate how integrating realistic observations can enhance even simple baselines. Through comprehensive experiments, DAMBench establishes a rigorous foundation for future research, promoting reproducibility, fair comparison, and extensibility to real-world multi-modal scenarios. Our dataset and code are publicly available at https://github.com/figerhaowang/DAMBench.
【13】The Curvature Rate λ: A Scalar Measure of Input-Space Sharpness in Neural Networks
标题:弯曲率|神经网络输入空间清晰度的纯量测量
链接:https://arxiv.org/abs/2511.01438
作者:Jacob Poschl
备注:14 pages
【14】CG-FKAN: Compressed-Grid Federated Kolmogorov-Arnold Networks for Communication Constrained Environment
标题:CG-FKAN:用于通信约束环境的压缩网格联邦Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2511.01433
作者
:Seunghun Yu, Youngjoon Lee, Jinu Gong, Joonhyuk Kang
备注:5 pages
【15】Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
标题:通过重新参数化和多样性规范化学习难以解决的多模式政策
链接:https://arxiv.org/abs/2511.01374
作者:Ziqi Wang, Jiashun Liu, Ling Pan
备注:NeurIPS 2025
【16】A semantic-based deep learning approach for mathematical expression retrieval
标题:基于语义的数学公式检索深度学习方法
链接:https://arxiv.org/abs/2511.01364
【17】Verifiable Split Learning via zk-SNARKs
标题:通过zk-SNARKS可验证的拆分学习
链接:https://arxiv.org/abs/2511.01356
作者:Rana Alaa, Darío González-Ferreiro, Carlos Beis-Penedo, Manuel Fernández-Veiga, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas
备注:Submitted to CAI'26 (IEEE Conference on Artificial Intelligence 2026)
【18】Lyapunov Stability Learning with Nonlinear Control via Inductive Biases
标题:通过归纳偏差进行非线性控制的李雅普诺夫稳定性学习
链接:https://arxiv.org/abs/2511.01283
作者:Yupu Lu, Shijie Lin, Hao Xu, Zeqing Zhang, Jia Pan
备注:Accepted by IEEE Robio 2025
【19】Learning When to Quit in Sales Conversations
标题:在销售对话中学习何时退出
链接:https://arxiv.org/abs/2511.01181
作者:Emaad Manzoor, Eva Ascarza, Oded Netzer
【20】Regularization Implies balancedness in the deep linear network
标题:正规化意味着深度线性网络的平衡
链接:https://arxiv.org/abs/2511.01137
作者:Kathryn Lindsey, Govind Menon
备注:18 pages, 3 figures
【21】One model to solve them all: 2BSDE families via neural operators
标题:解决所有问题的一个模型:通过神经运算符2 BSYS家族
链接:https://arxiv.org/abs/2511.01125
作者:Takashi Furuya, Anastasis Kratsios, Dylan Possamaï, Bogdan Raonić
【22】SLAP: Shortcut Learning for Abstract Planning
标题:Scrum:Scrum学习抽象规划
链接:https://arxiv.org/abs/2511.01107
作者:Y. Isabel Liu, Bowen Li, Benjamin Eysenbach, Tom Silver
【23】Energy-Efficient Deep Learning Without Backpropagation: A Rigorous Evaluation of Forward-Only Algorithms
标题:无需反向传播的节能深度学习:对纯向前算法的严格评估
链接:https://arxiv.org/abs/2511.01061
作者:Przemysław Spyra, Witold Dzwinel
【24】Balanced Multimodal Learning via Mutual Information
标题:通过互信息平衡的多模式学习
链接:https://arxiv.org/abs/2511.00987
作者:Rongrong Xie, Guido Sanguinetti
【25】Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
标题:使用NicheFlow在空间转录组学上建模微环境轨迹
链接:https://arxiv.org/abs/2511.00977
作者:Kristiyan Sakalyan, Alessandro Palma, Filippo Guerranti, Fabian J. Theis, Stephan Günnemann
备注:37 pages, 15 figures, to appear in NeurIPS 2025
【26】The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks
标题:规范化的隐藏力量:深度神经网络中的指数容量控制
链接:https://arxiv.org/abs/2511.00958
【27】Random Spiking Neural Networks are Stable and Spectrally Simple
标题:随机尖峰神经网络稳定且光谱简单
链接:https://arxiv.org/abs/2511.00904
作者:Ernesto Araya, Massimiliano Datres, Gitta Kutyniok
【28】EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
标题:EraseFlow:通过GFlowNet驱动的对齐来学习概念擦除策略
链接:https://arxiv.org/abs/2511.00804
作者:Abhiram Kusumba, Maitreya Patel, Kyle Min, Changhoon Kim, Chitta Baral, Yezhou Yang
备注:NeurIPS'25 Spotlight | Project page: this https URL
【29】Investigating the Robustness of Knowledge Tracing Models in the Presence of Student Concept Drift
标题:研究存在学生概念漂移的情况下知识追踪模型的稳健性
链接:https://arxiv.org/abs/2511.00704
作者:Morgan Lee, Artem Frenk, Eamon Worden, Karish Gupta, Thinh Pham, Ethan Croteau, Neil Heffernan
备注:10 pages, 6 figures
【30】Filtered Neural Galerkin model reduction schemes for efficient propagation of initial condition uncertainties in digital twins
标题:数字孪生体中初始条件不确定性有效传播的滤波神经Galerkin模型降阶格式
链接:https://arxiv.org/abs/2511.00670
作者:Zhiyang Ning, Benjamin Peherstorfer
【31】Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
标题:信念动力学揭示了情境学习和激活引导的双重性质
链接:https://arxiv.org/abs/2511.00617
作者:Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana
【32】Gaining Momentum: Uncovering Hidden Scoring Dynamics in Hockey through Deep Neural Sequencing and Causal Modeling
标题:获得动力:通过深度神经测序和因果模型揭示曲棍球中隐藏的评分动态
链接:https://arxiv.org/abs/2511.00615
作者:Daniel Griffiths, Piper Moskow
备注:5 Pages, 4 Figures, 2 Tables
【33】Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance
标题:通过混合策略子轨迹平衡学习高效优化器
链接:https://arxiv.org/abs/2511.00543
作者:Yunchuan Guan, Yu Liu, Ke Zhou, Hui Li, Sen Jia, Zhiqi Shen, Ziyang Wang, Xinglin Zhang, Tao Chen, Jenq-Neng Hwang, Lei Li
【34】Region-Aware Reconstruction Strategy for Pre-training fMRI Foundation Model
标题:训练前fMRI基础模型的区域感知重建策略
链接:https://arxiv.org/abs/2511.00443
作者:Ruthwik Reddy Doodipala, Pankaj Pandey, Carolina Torres Rojas, Manob Jyoti Saikia, Ranganatha Sitaram
【35】Trust-Region Methods with Low-Fidelity Objective Models
标题:具有低保真目标模型的信任区域方法
链接:https://arxiv.org/abs/2511.00434
作者:Andrea Angino, Matteo Aurina, Alena Kopaničáková, Matthias Voigt, Marco Donatelli, Rolf Krause
备注:Submitted to the Proceedings of Domain Decomposition Methods in Science and Engineering XXIX
【36】Bootstrap Off-policy with World Model
标题:Bootstrap Off政策与世界模式
链接:https://arxiv.org/abs/2511.00423
作者:Guojian Zhan, Likun Wang, Xiangteng Zhang, Jiaxin Gao, Masayoshi Tomizuka, Shengbo Eben Li
备注:NeurIPS 2025
【37】Structure-Preserving Physics-Informed Neural Network for the Korteweg--de Vries (KdV) Equation
标题:Korteweg-de Vries(Kdis)方程的结构保持物理信息神经网络
链接:https://arxiv.org/abs/2511.00418
作者:Victory Obieke, Emmanuel Oguadimma
备注:9 Pages, 11 figures
【38】Iterative Foundation Model Fine-Tuning on Multiple Rewards
标题:迭代基础模型对多重奖励进行微调
链接:https://arxiv.org/abs/2511.00220
作者:Pouya M. Ghari, Simone Sciabola, Ye Wang
备注:Accepted to NeurIPS 2025
【39】Reducing Robotic Upper-Limb Assessment Time While Maintaining Precision: A Time Series Foundation Model Approach
标题:减少机器人上肢评估时间同时保持精确度:时间序列基础模型方法
链接:https://arxiv.org/abs/2511.00193
作者:Faranak Akbarifar, Nooshin Maghsoodi, Sean P Dukelow, Stephen Scott, Parvin Mousavi
【40】A Retrospect to Multi-prompt Learning across Vision and Language
标题:跨越视觉和语言的多提示学习的挑战
链接:https://arxiv.org/abs/2511.00191
作者:Ziliang Chen, Xin Huang, Quanlong Guan, Liang Lin, Weiqi Luo
备注:ICCV
【41】Feature Importance Guided Random Forest Learning with Simulated Annealing Based Hyperparameter Tuning
标题:特征重要性引导的随机森林学习,基于模拟退变的超参数调整
链接:https://arxiv.org/abs/2511.00133
作者:Kowshik Balasubramanian, Andre Williams, Ismail Butun
备注:10 pages, 2 figures, 3 tables, submitted to IEEE Intelligent Systems journal
【42】Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models
标题:交叉涨落相转变揭示了扩散模型中的采样动态
链接:https://arxiv.org/abs/2511.00124
作者:Sai Niranjan Ramachandran, Manish Krishan Lal, Suvrit Sra
备注:Accepted at NeurIPS 2025. 10 pages, camera-ready version. appendices included
【43】Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence
标题:Pelican-VL 1. 0:智力基础大脑模型
链接:https://arxiv.org/abs/2511.00108
作者:Yi Zhang, Che Liu, Xiancong Ren, Hanchu Ni, Shuai Zhang, Zeyuan Ding, Jiayu Hu, Hanzhe Shan, Zhenwei Niu, Zhaoyang Liu, Yue Zhao, Junbo Qi, Qinfan Zhang, Dengjie Li, Yidong Wang, Jiachen Luo, Yong Dai, Jian Tang, Xiaozhu Ju
【44】World Simulation with Video Foundation Models for Physical AI
标题:物理人工智能的视频基础模型世界模拟
链接:https://arxiv.org/abs/2511.00062
作者:NVIDIA: Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, Prithvijit Chattopadhyay, Mike Chen, Yongxin Chen, Yu Chen, Shuai Cheng, Yin Cui, Jenna Diamond, Yifan Ding, Jiaojiao Fan, Linxi Fan, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Ruiyuan Gao, Yunhao Ge, Jinwei Gu, Aryaman Gupta, Siddharth Gururani, Imad El Hanafi, Ali Hassani, Zekun Hao, Jacob Huffman, Joel Jang, Pooya Jannaty, Jan Kautz, Grace Lam, Xuan Li, Zhaoshuo Li, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Yen-Chen Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Seungjun Nah, Yashraj Narang, Abhijeet Panaskar, Lindsey Pavao, Trung Pham, Morteza Ramezanali, Fitsum Reda, Scott Reed, Xuanchi Ren, Haonan Shao, Yue Shen, Stella Shi, Shuran Song, Bartosz Stefaniak, Shangkun Sun, Shitao Tang, Sameena Tasmeen, Lyne Tchapmi, Wei-Cheng Tseng, Jibin Varghese, Andrew Z. Wang, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Jiashu Xu, Dinghao Yang, Xiaodong Yang, Haotian Ye, Seonghyeon Ye, Xiaohui Zeng, Jing Zhang, Qinsheng Zhang, Kaiwen Zheng, Andrew Zhu, Yuke Zhu
【45】ReLaX-Net: Reusing Layers for Parameter-Efficient Physical Neural Networks
标题:ReLaX-Net:重用层以实现参数高效的物理神经网络
链接:https://arxiv.org/abs/2511.00044
作者:Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki
【46】Partial Trace-Class Bayesian Neural Networks
标题:部分追踪类Bayesian神经网络
链接:https://arxiv.org/abs/2511.01628
作者:Arran Carter, Torben Sell
备注:10 pages, 4 figures
【47】Fast, memory-efficient genomic interval tokenizers for modern machine learning
标题:用于现代机器学习的快速、内存高效的基因组间隔标记器
链接:https://arxiv.org/abs/2511.01555
作者:Nathan J. LeRoy, Donald R. Campbell Jr, Seth Stadick, Oleksandr Khoroshevskyi, Sang-Hoon Park, Ziyang Hu, Nathan C. Sheffield
备注:4 pages, 1 figure
【48】Quantum Deep Learning Still Needs a Quantum Leap
标题:量子深度学习仍需要量子飞跃
链接:https://arxiv.org/abs/2511.01253
作者:Hans Gundlach, Hrvoje Kukina, Jayson Lynch, Neil Thompson
【49】Generative Machine Learning Models for the Deconvolution of Charge Carrier Dynamics in Organic Photovoltaic Cells
标题:有机太阳能电池中电荷子动力学去卷积的生成性机器学习模型
链接:https://arxiv.org/abs/2511.01118
作者:Li Raymond, Salim Flora, Wang Sijin, Wright Brendan
【50】Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes
标题:超霍克斯过程:标记时间点过程的可解释模型
链接:https://arxiv.org/abs/2511.01096
作者:Alex Boyd, Andrew Warrington, Taha Kass-Hout, Parminder Bhatia, Danica Xiao
【51】Deep Generative Models for Enhanced Vitreous OCT Imaging
标题:用于增强玻璃体光学断层扫描成像的深度生成模型
链接:https://arxiv.org/abs/2511.00881
作者:Simone Sarrocco, Philippe C. Cattin, Peter M. Maloca, Paul Friedrich, Philippe Valmaggia
【52】Correspondence Between Ising Machines and Neural Networks
标题:伊辛机与神经网络之间的对应关系
链接:https://arxiv.org/abs/2511.00746
作者:Andrew G. Moore
备注:22 pages, 4 figures
【53】Generative Modeling Enables Molecular Structure Retrieval from Coulomb Explosion Imaging
标题:生成建模能够从库伦爆炸成像中检索分子结构
链接:https://arxiv.org/abs/2511.00179
作者:Xiang Li, Till Jahnke, Rebecca Boll, Jiaqi Han, Minkai Xu, Michael Meyer, Maria Novella Piancastelli, Daniel Rolles, Artem Rudenko, Florian Trinter, Thomas J.A. Wolf, Jana B. Thayer, James P. Cryan, Stefano Ermon, Phay J. Ho
【54】Using machine learning methods to predict cognitive age from psychophysiological tests
标题:使用机器学习方法通过心理生理学测试预测认知年龄
链接:https://arxiv.org/abs/2511.00013
作者:Daria D. Tyurina, Sergey V. Stasenko, Konstantin V. Lushnikov, Maria V. Vedunova
【55】Group-Equivariant Diffusion Models for Lattice Field Theory
标题:格场理论的群等变扩散模型
链接:https://arxiv.org/abs/2510.26081
作者:Octavio Vega, Javad Komijani, Aida El-Khadra, Marina Marinkovic
备注:45 pages, 12 figures
其他(61篇)
【1】Towards Multi-Fidelity Scaling Laws of Neural Surrogates in CFD
标题:迈向计算流体动力学中神经代理的多保真标度定律
链接:https://arxiv.org/abs/2511.01830
作者:Paul Setinek, Gianluca Galletti, Johannes Brandstetter
【2】Dynamic Reconstruction of Ultrasound-Derived Flow Fields With Physics-Informed Neural Fields
标题:利用物理信息神经场动态重建超声流场
链接:https://arxiv.org/abs/2511.01804
作者:Viraj Patel, Lisa Kreusser, Katharine Fraser
备注:29 pages, 18 figures
【3】Random Initialization of Gated Sparse Adapters
标题:门控稀疏适配器的随机分配
链接:https://arxiv.org/abs/2511.01794
作者:Vi Retault, Yohaï-Eliel Berreby
备注:13 pages (8 main), 6 figures (4 main). Accepted by NewInML workshop @ ICML 2025 on June 27, 2025
【4】ADNAC: Audio Denoiser using Neural Audio Codec
标题:ADNEC:使用神经音频编解码器的音频降噪器
链接:https://arxiv.org/abs/2511.01773
作者:Daniel Jimon, Mircea Vaida, Adriana Stan
备注:Accepted and presented at the 13th International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Cluj-Napoca, Romania, October 19-22, 2025. 4 pages, 1 figure. IEEE Catalog Number: CFP2555H-USB, ISBN: 979-8-3315-7485-7
【5】Edge AI in Highly Volatile Environments: Is Fairness Worth the Accuracy Trade-off?
标题:高度波动环境中的边缘人工智能:公平性是否值得准确性权衡?
链接:https://arxiv.org/abs/2511.01737
作者:Obaidullah Zaland, Feras M. Awaysheh, Sawsan Al Zubi, Abdul Rahman Safi, Monowar Bhuyan
备注:Presented at IEEE FLTA 2025
【6】Probabilistic Robustness for Free? Revisiting Training via a Benchmark
标题:概率稳健性免费?通过基准重新审视训练
链接:https://arxiv.org/abs/2511.01724
作者:Yi Zhang, Zheng Wang, Chen Zhen, Wenjie Ruan, Qing Guo, Siddartha Khastgir, Carsten Maple, Xingyu Zhao
【7】SemBench: A Benchmark for Semantic Query Processing Engines
标题:SemBench:语义查询处理引擎的基准
链接:https://arxiv.org/abs/2511.01716
作者:Jiale Lao, Andreas Zimmerer, Olga Ovcharenko, Tianji Cong, Matthew Russo, Gerardo Vitagliano, Michael Cochez, Fatma Özcan, Gautam Gupta, Thibaud Hottelier, H. V. Jagadish, Kris Kissel, Sebastian Schelter, Andreas Kipf, Immanuel Trummer
【8】Solution Space Topology Guides CMTS Search
标题:解决方案空间布局指导CMTS搜索
链接:https://arxiv.org/abs/2511.01701
作者:Mirco A. Mannucci
备注:15 pages, 3 figures
【9】Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI
标题:开放性格训练:通过宪政人工智能塑造人工智能助理的角色
链接:https://arxiv.org/abs/2511.01689
作者:Sharan Maiya, Henning Bartsch, Nathan Lambert, Evan Hubinger
备注:12 pages, 6 figures, 4 tables
【10】Extremal Contours: Gradient-driven contours for compact visual attribution
标题:极端轮廓:由物体驱动的轮廓,实现紧凑的视觉属性
链接:https://arxiv.org/abs/2511.01411
作者:Reza Karimzadeh, Albert Alonso, Frans Zdyb, Julius B. Kirkegaard, Bulat Ibragimov
【11】Memory-Efficient Training with In-Place FFT Implementation
标题:采用就地快速傅立叶实现的内存高效训练
链接:https://arxiv.org/abs/2511.01385
作者:Xinyu Ding, Bangtian Liu, Siyu Liao, Zhongfeng Wang
备注:Accepted at NeurIPS 2025. Presents a real-domain in-place FFT (rdFFT) operator for memory-efficient fine-tuning of large language models
【12】Diffusion-Based Solver for CNF Placement on the Cloud-Continuum
标题:云连续体上CNF放置的基于扩散的求解器
链接:https://arxiv.org/abs/2511.01343
作者:Álvaro Vázquez Rodríguez, Manuel Fernández-Veiga, Carlos Giraldo-Rodríguez
备注:7 pages, 7 figures. Presented at PE-WASUN'25 (IEEE International Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, and Ubiquitous Networks)
【13】Black-Box Differentially Private Nonparametric Confidence Intervals Under Minimal Assumptions
标题:最小假设下的黑箱差分私人非参数置信区间
链接:https://arxiv.org/abs/2511.01303
作者:Tomer Shoham, Moshe Shenfeld, Noa Velner-Harris, Katrina Ligett
【14】FEval-TTC: Fair Evaluation Protocol for Test-Time Compute
标题:FEval-TTC:测试时间计算的公平评估协议
链接:https://arxiv.org/abs/2511.01203
作者:Pavel Rumiantsev, Soumyasundar Pal, Yingxue Zhang, Mark Coates
【15】Analyzing the Power of Chain of Thought through Memorization Capabilities
标题:通过企业化能力分析思维链的力量
链接:https://arxiv.org/abs/2511.01190
作者:Lijia Yu, Xiao-Shan Gao, Lijun Zhang
【16】Happiness as a Measure of Fairness
标题:幸福作为公平的衡量标准
链接:https://arxiv.org/abs/2511.01069
作者:Georg Pichler, Marco Romanelli, Pablo Piantanida
【17】OceanAI: A Conversational Platform for Accurate, Transparent, Near-Real-Time Oceanographic Insights
标题:OceanAI:一个提供准确、透明、近实时海洋学洞察的对话平台
链接:https://arxiv.org/abs/2511.01019
作者:Bowen Chen, Jayesh Gajbhar, Gregory Dusek, Rob Redmon, Patrick Hogan, Paul Liu, DelWayne Bohnenstiehl, Dongkuan (DK)Xu, Ruoying He
备注:A related presentation will be given at the AGU(American Geophysical Union) and AMS(American Meteorological Society) Annual Meetings
【18】What's the next frontier for Data-centric AI? Data Savvy Agents
标题:以数据为中心的人工智能的下一个前沿是什么?精通数据的代理
链接:https://arxiv.org/abs/2511.01015
作者:Nabeel Seedat, Jiashuo Liu, Mihaela van der Schaar
备注:Presented at ICLR 2025 Data-FM. Seedat & Liu contributed equally
【19】Using Synthetic Data to estimate the True Error is theoretically and practically doable
标题:使用合成数据来估计真误差在理论上和实践上都是可行的
链接:https://arxiv.org/abs/2511.00964
作者:Hai Hoang Thanh, Duy-Tung Nguyen, Hung The Tran, Khoat Than
备注:To appear at Machine Learning journal and ACML
【20】Controlling Gender Bias in Retrieval via a Backpack Architecture
标题:通过背包结构控制检索中的性别偏见
链接:https://arxiv.org/abs/2511.00875
作者:Amirabbas Afzali, Amirreza Velae, Iman Ahmadi, Mohammad Aliannejadi
【21】FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management
标题:Expressache:利用注意力头部的时间稳定性进行高效的KV缓存管理
链接:https://arxiv.org/abs/2511.00868
作者:Nazmul Takbir, Hamidreza Alikhani, Nikil Dutt, Sangeetha Abdu Jyothi
【22】Identifying Slug Formation in Oil Well Pipelines: A Use Case from Industrial Analytics
标题:识别油井管道中的段塞地层:Industrial Analytics的用例
链接:https://arxiv.org/abs/2511.00851
作者:Abhishek Patange, Sharat Chidambaran, Prabhat Shankar, Manjunath G.B., Anindya Chatterjee
备注:This paper ID 254 has been accepted for presentation in the Demonstration Track of the 13th ACM IKDD CODS Conference on Data Science CODS 2025, IISER Pune, India, from December 17 to 20, 2025
【23】GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
标题:GUI-AIMA:将内在多模态注意力与上下文锚点对齐,以实现GUI基础
链接:https://arxiv.org/abs/2511.00810
作者
:Shijie Zhou, Viet Dac Lai, Hao Tan, Jihyung Kil, Wanrong Zhu, Changyou Chen, Ruiyi Zhang
【24】AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs
标题:AReaL-Hex:在异类图形处理器上简化同步RL训练
链接:https://arxiv.org/abs/2511.00796
作者:Ran Yan, Youhe Jiang, Tianyuan Wu, Jiaxuan Gao, Zhiyu Mei, Wei Fu, Haohui Mai, Wei Wang, Yi Wu, Binhang Yuan
【25】Fast PINN Eigensolvers via Biconvex Reformulation
标题:通过双凸重新公式快速PINN特征求解
链接:https://arxiv.org/abs/2511.00792
作者:Akshay Sai Banderwaar, Abhishek Gupta
备注:7 pages, 3 figures, Machine Learning and the Physical Sciences Workshop NeurIPS 2025
【26】Trust Region-Based Bayesian Optimisation to Discover Diverse Solutions
标题:基于信任域的贝叶斯优化算法
链接:https://arxiv.org/abs/2511.00750
作者:Kokila Kasuni Perera, Frank Neumann, Aneta Neumann
【27】A CPU-Centric Perspective on Agentic AI
标题:以MCU为中心的视角看待大型人工智能
链接:https://arxiv.org/abs/2511.00739
作者:Ritik Raj, Hong Wang, Tushar Krishna
【28】Diluting Restricted Boltzmann Machines
标题:稀释受限制的Boltzmann机器
链接:https://arxiv.org/abs/2511.00648
作者:C. Díaz-Faloh, R. Mulet
【29】Node Preservation and its Effect on Crossover in Cartesian Genetic Programming
标题:笛卡尔遗传规划中的节点保持性及其对交叉的影响
链接:https://arxiv.org/abs/2511.00634
作者:Mark Kocherovsky, Illya Bakurov, Wolfgang Banzhaf
备注:Draft to cite in another paper before both papers are peer-reviewed for the evo*2026 conference, 21 pages, 5 figures
【30】Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation
标题:人工智能手术决策支持中诊断幻觉风险:用于顺序验证的顺序框架
链接:https://arxiv.org/abs/2511.00588
作者:Dong Chen, Yanzhe Wei, Zonglin He, Guan-Ming Kuang, Canhua Ye, Meiru An, Huili Peng, Yong Hu, Huiren Tao, Kenneth MC Cheung
【31】Mind the Gap: Missing Cyber Threat Coverage in NIDS Datasets for the Energy Sector
标题:注意差距:能源行业NIDS数据集中缺乏网络威胁覆盖
链接:https://arxiv.org/abs/2511.00360
作者:Adrita Rahman Tory, Khondokar Fida Hasan, Md Saifur Rahman, Nickolaos Koroniotis, Mohammad Ali Moni
备注:13 pages
【32】Toward Unifying Group Fairness Evaluation from a Sparsity Perspective
标题:从稀疏性角度统一群体公平评价
链接:https://arxiv.org/abs/2511.00359
作者:Zhecheng Sheng, Jiawei Zhang, Enmao Diao
备注:30 pages, 14 figures
【33】Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
标题:仅限关键代币:具有风险意识的投机解码
链接
:https://arxiv.org/abs/2511.00351
作者:Amir Ziashahabi, Yavuz Faruk Bakman, Duygu Nur Yaldiz, Mostafa El-Khamy, Sai Praneeth Karimireddy, Salman Avestimehr
备注:Accepted at NeurIPS 2025 Efficient Reasoning Workshop
【34】LongCat-Flash-Omni Technical Report
标题:LongCat-Flash-Omni技术报告
链接:https://arxiv.org/abs/2511.00279
作者:Meituan LongCat Team, Bairui Wang, Bayan, Bin Xiao, Bo Zhang, Bolin Rong, Borun Chen, Chang Wan, Chao Zhang, Chen Huang, Chen Chen, Chen Chen, Chengxu Yang, Chengzuo Yang, Cong Han, Dandan Peng, Delian Ruan, Detai Xin, Disong Wang, Dongchao Yang, Fanfan Liu, Fengjiao Chen, Fengyu Yang, Gan Dong, Gang Huang, Gang Xu, Guanglu Wan, Guoqiang Tan, Guoqiao Yu, Haibo Qiu, Hao Lu, Hongbo Liu, Hongyu Xiang, Jiaheng Wu, Jian Yang, Jiaxing Liu, Jing Huang, Jingang Wang, Jinrui Ding, Juchao Jiang, Jun Kuang, Jun Wang, Junhui Mei, Ke Ding, Kefeng Zhang, Lei Chen, Liang Shi, Limeng Qiao, Liming Zheng, Lin Ma, Liuyang Guo, Liya Ma, Luying Sun, Man Gao, Mengshen Zhu, Miao Cao, Minliang Lin, Nuo Xu, Peng Shi, Qi Zhang, Qian Fang, Qian Wang, Qian Yang, Quanxiu Wang, Rongxiang Weng, Rongxin Guo, Ruoxuan Liang, Senbin Yang, Shanbo Xu, Shanglin Lei, Shengze Ye, Shimin Chen, Shuaiqi Chen, Shujie Hu, Shuo Li, Siqi Yang, Siyu Xu, Siyu Ren, Song Li, Songxiang Liu, Tianhao Bai, Tianye Dai, Wei Hong, Wei Wang, Weixiao Zhao, Wengang Cao, Wenlong Zhu, Wenlong He, Xi Su, Xi Nan, Xiaohan Zhao, Xiaohao Wang, Xiaoyu Zhao, Xiaoyu Wang, Xiaoyu Li, Xin Pan, Xin Chen, Xiusong Sun, Xu Xiang, Xudong Xing
【35】POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
标题:POSESTITCH-RST:受语言启发的姿势缝合,用于端到端手语翻译
链接:https://arxiv.org/abs/2511.00270
作者:Abhinav Joshi, Vaibhav Sharma, Sanjeet Singh, Ashutosh Modi
备注:Accepted at EMNLP 2025 (Main)
【36】IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval
标题:IL-PCSR:用于先前案件和法规检索的法律数据库
链接:https://arxiv.org/abs/2511.00268
作者:Shounak Paul, Dhananjay Ghumare, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi
备注:Accepted at EMNLP 2025 (Main)
【37】Advancing AI Challenges for the United States Department of the Air Force
标题:美国空军部面临的人工智能挑战
链接:https://arxiv.org/abs/2511.00267
作者:Christian Prothmann, Vijay Gadepally, Jeremy Kepner, Koley Borchard, Luca Carlone, Zachary Folcik, J. Daniel Grith, Michael Houle, Jonathan P. How, Nathan Hughes, Ifueko Igbinedion, Hayden Jananthan, Tejas Jayashankar, Michael Jones, Sertac Karaman, Binoy G. Kurien, Alejandro Lancho, Giovanni Lavezzi, Gary C. F. Lee, Charles E. Leiserson, Richard Linares, Lindsey McEvoy, Peter Michaleas, Chasen Milner, Alex Pentland, Yury Polyanskiy, Jovan Popovich, Jeffrey Price, Tim W. Reid, Stephanie Riley, Siddharth Samsi, Peter Saunders, Olga Simek, Mark S. Veillette, Amir Weiss, Gregory W. Wornell, Daniela Rus, Scott T. Ruppel
备注:8 pages, 8 figures, 59 references. To appear in IEEE HPEC 2025
【38】A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
标题:非随机多臂盗贼的严格下限与专家建议
链接:https://arxiv.org/abs/2511.00257
作者:Zachary Chase, Shinji Ito, Idan Mehalel
【39】Physiologically Active Vegetation Reverses Its Cooling Effect in Humid Urban Climates
标题:生理活跃植被在潮湿的城市气候中逆转其降温效应
链接:https://arxiv.org/abs/2511.00134
作者:Angana Borah, Adrija Datta, Ashish S. Kumar, Raviraj Dave, Udit Bhatia
备注:27 pages, 5 figures
【40】QuantumBench: A Benchmark for Quantum Problem Solving
标题:QuantumBench:量子问题解决的基准
链接:https://arxiv.org/abs/2511.00092
作者:Shunya Minami, Tatsuya Ishigaki, Ikko Hamamura, Taku Mikuriya, Youmi Ma, Naoaki Okazaki, Hiroya Takamura, Yohichi Suzuki, Tadashi Kadowaki
备注:11 pages, 8 figures
【41】flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R
标题:FlowengineR:一个用于R中公平和可复制工作流设计的模块化和可扩展框架
链接:https://arxiv.org/abs/2511.00079
作者:Maximilian Willer, Peter Ruckdeschel
备注:27 pages, 7 figures, 1 table
【42】Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves
标题:架起视觉、语言和数学的桥梁:用贝塞尔曲线重建象形文字
链接:https://arxiv.org/abs/2511.00076
作者:Zihao Wan, Pau Tong Lin Xu, Fuwen Luo, Ziyue Wang, Peng Li, Yang Liu
【43】LookSync: Large-Scale Visual Product Search System for AI-Generated Fashion Looks
标题:LookLock:人工智能生成时尚外观的大规模视觉产品搜索系统
链接:https://arxiv.org/abs/2511.00072
作者:Pradeep M, Ritesh Pallod, Satyen Abrol, Muthu Raman, Ian Anderson
备注:4 pages, 5 figures. Accepted at the International Conference on Data Science (IKDD CODS 2025), Demonstration Track. Demo video: this https URL
【44】Aligning Brain Signals with Multimodal Speech and Vision Embeddings
标题:将大脑信号与多模式语音和视觉嵌入对齐
链接:https://arxiv.org/abs/2511.00065
作者:Kateryna Shapovalenko, Quentin Auster
【45】Automatically Finding Rule-Based Neurons in OthelloGPT
标题:在OthelloGPT中自动查找基于规则的神经元
链接:https://arxiv.org/abs/2511.00059
作者:Aditya Singh, Zihang Wen, Srujananjali Medicherla, Adam Karvonen, Can Rager
备注:39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop Mechanistic interpretability
【46】Calibrating and Rotating: A Unified Framework for Weight Conditioning in PEFT
标题:校准和旋转:PEFT中体重调节的统一框架
链接:https://arxiv.org/abs/2511.00051
作者:Da Chang, Peng Xue, Yu Li, Yongxiang Liu, Pengxiang Xu, Shixun Zhang
【47】Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
标题:使用对比预算的车辆引导的SAS转向进行参考率控制
链接:https://arxiv.org/abs/2511.00029
作者:Samaksh Bhargav, Zining Zhu
备注:12 pages, 6 figures
【48】Position Paper: If Innovation in AI Systematically Violates Fundamental Rights, Is It Innovation at All?
标题:如果人工智能的创新系统性地侵犯了基本权利,那它还算是创新吗?
链接:https://arxiv.org/abs/2511.00027
作者:Josu Eguiluz Castañeira, Axel Brando, Migle Laukyte, Marc Serra-Vidal
备注:NeurIPS 2025 Position Paper track; accepted for oral and poster presentation at the Thirty-Ninth Annual Conference on Neural Information Processing Systems
【49】On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
标题:批量不变的图形处理器矩阵相乘中的浮点噪音结构
链接:https://arxiv.org/abs/2511.00025
作者:Tadisetty Sai Yashwanth
【50】Matrix Phylogeny: Compact Spectral Fingerprints for Trap-Robust Preconditioner Selection
标题:矩阵系统发育:用于陷阱稳健预处理器选择的紧凑光谱指纹
链接:https://arxiv.org/abs/2511.00012
作者:Jinwoo Baek
备注:16 Pages
【51】VRScout: Towards Real-Time, Autonomous Testing of Virtual Reality Games
标题:VR Scout:迈向虚拟现实游戏的实时、自主测试
链接:https://arxiv.org/abs/2511.00002
作者:Yurun Wu, Yousong Sun, Burkhard Wunsche, Jia Wang, Elliott Wen
【52】Disciplined Biconvex Programming
标题:简化的双凸面编程
链接:https://arxiv.org/abs/2511.01813
作者:Hao Zhu, Joschka Boedecker
【53】Making Interpretable Discoveries from Unstructured Data: A High-Dimensional Multiple Hypothesis Testing Approach
标题:从非结构化数据中做出可解释的发现:多维多重假设测试方法
链接:https://arxiv.org/abs/2511.01680
【54】Quantum Blackwell's Ordering and Differential Privacy
标题:量子布莱克威尔的排序和差异隐私
链接:https://arxiv.org/abs/2511.01467
作者:Ayanava Dasgupta, Naqueeb Ahmad Warsi, Masahito Hayashi
备注:46 pages, 3 figures
【55】Split-Flows: Measure Transport and Information Loss Across Molecular Resolutions
标题:分流:测量跨分子分辨率的传输和信息损失
链接:https://arxiv.org/abs/2511.01464
作者:Sander Hummerich, Tristan Bereau, Ullrich Köthe
【56】An Interdisciplinary and Cross-Task Review on Missing Data Imputation
标题:缺失数据插补的跨学科和跨任务审查
链接:https://arxiv.org/abs/2511.01196
【57】Stability of the Kim--Milman flow map
标题:Kim-Milman流图的稳定性
链接:https://arxiv.org/abs/2511.01154
作者:Sinho Chewi, Aram-Alexandre Pooladian, Matthew S. Zhang
【58】Binary perceptron computational gap -- a parametric fl RDT view
标题:二元感知器计算差距--参数fl RDT观点
链接:https://arxiv.org/abs/2511.01037
【59】Towards Channel Charting Enhancement with Non-Reconfigurable Intelligent Surfaces
标题:利用不可重新配置的智能表面实现频道图表增强
链接:https://arxiv.org/abs/2511.00919
作者:Mahdi Maleki, Reza Agahzadeh Ayoubi, Marouan Mizmizi, Umberto Spagnolini
【60】A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications
标题:数字孪生应用中求导高斯过程代理的流稀疏Cholesky方法
链接:https://arxiv.org/abs/2511.00366
作者:Krishna Prasath Logakannan, Shridhar Vashishtha, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby
【61】NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversion
标题:NaturalVoices:用于语音转换的大规模、自发和情感播客数据集
链接:https://arxiv.org/abs/2511.00256
作者:Zongyang Du, Shreeram Suresh Chandra, Ismail Rasim Ulgen, Aurosweta Mahapatra, Ali N. Salman, Carlos Busso, Berrak Sisman
备注:Under review for IEEE Transactions on Affective Computing
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递