
机器学习学术速递[9.10]

arXiv每日学术速递



cs.LG 方向,今日共计128篇


大模型相关(12篇)

【1】Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees
标题:使用结构化攻击树的LLM驱动渗透测试中的引导推理
链接:https://arxiv.org/abs/2509.07939

作者:Nakano, Reza Feyyazi, Shanchieh Jay Yang, Michael Zuzak
摘要:大型语言模型(LLM)的最新进展推动了人们对自动化网络安全渗透测试工作流程的兴趣,有望为企业系统提供更快、更一致的漏洞评估。现有用于渗透测试的LLM代理主要依赖自我引导推理,这可能产生不准确或幻觉化的程序步骤。因此,LLM代理可能会采取无效行动,例如利用并未使用的软件库,或生成重复先前策略的循环响应。在这项工作中,我们为渗透测试LLM代理提出了一种引导推理管道,它结合了一个由MITRE ATT&CK矩阵(一条经过验证的渗透测试杀伤链)构建的确定性任务树,将LLM的推理过程约束在明确定义的战术、技术和程序之内。这将推理锚定在经过验证的渗透测试方法中,并通过引导代理执行更有效的攻击程序来过滤无效操作。为了评估我们的方法,我们使用三个LLM(Llama-3-8B、Gemini-1.5和GPT-4)构建了一个自动渗透测试LLM代理,并将其应用于10个HackTheBox网络安全演习,其中包含103个代表真实世界网络攻击场景的离散子任务。使用Llama-3-8B、Gemini-1.5和GPT-4时,我们提出的推理管道分别引导LLM代理完成了71.8%、72.8%和78.6%的子任务。相比之下,使用自引导推理的最先进LLM渗透测试工具仅完成了13.5%、16.5%和75.7%的子任务,并且需要多出86.2%、118.7%和205.9%的模型查询。这表明,将确定性任务树纳入LLM推理管道可以提高自动化网络安全评估的准确性和效率。
摘要:Recent advances in Large Language Models (LLMs) have driven interest in automating cybersecurity penetration testing workflows, offering the promise of faster and more consistent vulnerability assessment for enterprise systems. Existing LLM agents for penetration testing primarily rely on self-guided reasoning, which can produce inaccurate or hallucinated procedural steps. As a result, the LLM agent may undertake unproductive actions, such as exploiting unused software libraries or generating cyclical responses that repeat prior tactics. In this work, we propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, a proven penetration testing kill chain, to constrain the LLM's reasoning process to explicitly defined tactics, techniques, and procedures. This anchors reasoning in proven penetration testing methodologies and filters out ineffective actions by guiding the agent towards more productive attack procedures. To evaluate our approach, we built an automated penetration testing LLM agent using three LLMs (Llama-3-8B, Gemini-1.5, and GPT-4) and applied it to navigate 10 HackTheBox cybersecurity exercises with 103 discrete subtasks representing real-world cyberattack scenarios. Our proposed reasoning pipeline guided the LLM agent through 71.8\%, 72.8\%, and 78.6\% of subtasks using Llama-3-8B, Gemini-1.5, and GPT-4, respectively. Comparatively, the state-of-the-art LLM penetration testing tool using self-guided reasoning completed only 13.5\%, 16.5\%, and 75.7\% of subtasks and required 86.2\%, 118.7\%, and 205.9\% more model queries. This suggests that incorporating a deterministic task tree into LLM reasoning pipelines can enhance the accuracy and efficiency of automated cybersecurity assessments.
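An illustrative sketch (our own reading of the abstract, not the authors' code) of the core idea: the agent's proposed next steps are filtered against the children of the current node in a deterministic attack/task tree, so hallucinated actions are never executed. All node names, the tree contents, and the function names below are hypothetical placeholders.

```python
# Hypothetical task tree mapping each tactic to the techniques allowed to follow it.
ATTACK_TREE = {
    "reconnaissance": ["port_scan", "service_enumeration"],
    "port_scan": ["exploit_web_service", "exploit_ssh"],
    "exploit_web_service": ["privilege_escalation"],
    "privilege_escalation": [],
}

def allowed_next_actions(current_node: str) -> list[str]:
    """Return only the actions reachable from the current node of the task tree."""
    return ATTACK_TREE.get(current_node, [])

def guided_step(llm_propose, current_node: str) -> str | None:
    """Ask the model for ranked candidate actions, then keep the first one the
    task tree permits; invalid (hallucinated) candidates are filtered out."""
    candidates = llm_propose(current_node)          # e.g. ["exploit_ftp", "port_scan"]
    valid = [a for a in candidates if a in allowed_next_actions(current_node)]
    return valid[0] if valid else None

# Example with a stub standing in for a real model call:
print(guided_step(lambda node: ["exploit_ftp", "port_scan"], "reconnaissance"))  # -> "port_scan"
```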


【2】GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models
标题:GENUINE:大型语言模型的图形增强多层不确定性估计
链接:https://arxiv.org/abs/2509.07925

作者: Adithya Kulkarni, Tyler Cody, Peter A. Beling, Yujun Yan, Dawei Zhou
备注:Accepted by EMNLP 2025
摘要:不确定性估计对于增强大型语言模型(LLM)的可靠性至关重要,特别是在高风险应用中。现有方法往往忽略语义依赖,依赖于无法捕捉生成文本中结构关系的词元级概率度量。我们提出GENUINE:Graph ENhanced mUlti-level uncertaINty Estimation for Large Language Models,一个结构感知框架,利用依存句法树和分层图池化来细化不确定性量化。通过结合监督学习,GENUINE有效地对语义和结构关系进行建模,从而改进了置信度评估。跨NLP任务的广泛实验表明,GENUINE的AUROC比基于语义熵的方法最多高出29%,并将校准误差降低了15%以上,证明了基于图的不确定性建模的有效性。代码可在https://github.com/ODYSSEYWT/GUQ获得。
摘要:Uncertainty estimation is essential for enhancing the reliability of Large Language Models (LLMs), particularly in high-stakes applications. Existing methods often overlook semantic dependencies, relying on token-level probability measures that fail to capture structural relationships within the generated text. We propose GENUINE: Graph ENhanced mUlti-level uncertaINty Estimation for Large Language Models, a structure-aware framework that leverages dependency parse trees and hierarchical graph pooling to refine uncertainty quantification. By incorporating supervised learning, GENUINE effectively models semantic and structural relationships, improving confidence assessments. Extensive experiments across NLP tasks show that GENUINE achieves up to 29% higher AUROC than semantic entropy-based approaches and reduces calibration errors by over 15%, demonstrating the effectiveness of graph-based uncertainty modeling. The code is available at https://github.com/ODYSSEYWT/GUQ.


【3】Uncovering Scaling Laws for Large Language Models via Inverse Problems
标题:通过逆问题揭示大型语言模型的缩放定律
链接:https://arxiv.org/abs/2509.07909

作者:a, Zhaoxuan Wu, Zijian Zhou, Xiaoqiang Lin, Zhiliang Chen, Rachael Hwee Ling Sim, Rui Qiao, Jingtan Wang, Nhung Bui, Xinyuan Niu, Wenyang Hu, Gregory Kang Ruey Lau, Zi-Yu Khoo, Zitong Zhao, Xinyi Xu, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low
备注:Accepted at EMNLP Findings 2025
摘要 :大型语言模型(LLM)是大规模的预训练模型,在不同的领域取得了显着的成功。这些成功是由数据和计算前所未有的复杂性和规模推动的。然而,由于训练这种模型的成本很高,用蛮力试错法来改进LLM是不可行的。受逆问题在揭示基本科学定律方面的成功启发,这份立场文件主张逆问题也可以有效地揭示尺度定律,指导LLM的构建,以实现更好的成本效益。
摘要:Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computations. However, due to the high costs of training such models, brute-force trial-and-error approaches to improve LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also efficiently uncover scaling laws that guide the building of LLMs to achieve the desirable performance with significantly better cost-effectiveness.


【4】Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data
标题:在公共数据上进行数据有效单阶段训练的竞争性音频语言模型
链接:https://arxiv.org/abs/2509.07526

作者:thik Kumar, Rishabh Saraf, Ludovick Lepauloux, Abdul Muneer, Billel Mokeddem, Hakim Hacid
备注:Accepted at ASRU 2025
摘要:大型语言模型(LLM)已经改变了NLP,但它们与音频的结合仍未得到充分探索——尽管音频是人类交流的核心。我们介绍Falcon3-Audio,这是一个基于指令微调LLM和Whisper编码器构建的音频语言模型(ALM)家族。仅使用极少量的公共音频数据——不到3万小时(其中5千小时为唯一数据)——Falcon3-Audio-7B在MMAU基准测试中取得64.14分,与R1-AQA持平,达到开放权重模型中报告的最佳性能,同时以卓越的数据和参数效率、单阶段训练和透明度脱颖而出。值得注意的是,我们最小的1B模型与2B到13B参数的更大开放模型相比仍具竞争力。通过广泛的消融实验,我们发现常见的复杂设计——例如课程学习、多音频编码器和复杂的交叉注意力连接器——并非取得强大性能所必需,即使与在超过50万小时数据上训练的模型相比也是如此。
摘要:Large language models (LLMs) have transformed NLP, yet their integration with audio remains underexplored -- despite audio's centrality to human communication. We introduce Falcon3-Audio, a family of Audio-Language Models (ALMs) built on instruction-tuned LLMs and Whisper encoders. Using a remarkably small amount of public audio data -- less than 30K hours (5K unique) -- Falcon3-Audio-7B matches the best reported performance among open-weight models on the MMAU benchmark, with a score of 64.14, matching R1-AQA, while distinguishing itself through superior data and parameter efficiency, single-stage training, and transparency. Notably, our smallest 1B model remains competitive with larger open models ranging from 2B to 13B parameters. Through extensive ablations, we find that common complexities -- such as curriculum learning, multiple audio encoders, and intricate cross-attention connectors -- are not required for strong performance, even compared to models trained on over 500K hours of data.


【5】Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
标题:与Oompa Loompas交谈:评估LLM代理语言习得的新型框架
链接:https://arxiv.org/abs/2509.07389

作者:attwadarshi Swain, Anshika Krishnatray, Dhruv Kumar, Jagat Sesh Challa
备注:Under review
摘要:现有的大型语言模型(LLM代理)的语言能力的评估研究主要集中在词汇学习,形态规则归纳,句法概括,语用推理和跨语言迁移。然而,没有人评估LLM代理是否可以通过模式识别和交互式反馈来获得语言,这是人类语言习得的核心特征。我们提出了一个新的实验框架,其中LLM代理评估其获取和使用新构建的语言(Tinkatongue)的能力与机器人只理解Tinkatongue的对话。我们的研究结果表明,LLM代理未能在100个响应内建立对话,但他们采用了不同的策略,反映了人类的语言学习方法。研究结果为评估基准和模型设计提供了一个新的方向,这些模型设计可以更有效地从交互式反馈中学习。
摘要:Existing evaluation studies on linguistic competence of large language models (LLM agents) have focused primarily on vocabulary learning, morphological rule induction, syntactic generalization, pragmatic inference, and cross-linguistic transfer. However, none assess whether LLM agents can acquire a language through pattern recognition and interactive feedback, a central feature of human language acquisition. We propose a novel experimental framework in which an LLM agent is evaluated on its ability to acquire and use a newly constructed language (Tinkatongue) in conversation with a bot that understands only Tinkatongue. Our findings show that LLM agents fail to establish a conversation within 100 responses, yet they adopt distinct strategies that mirror human approaches to language learning. The results suggest a new direction for evaluation benchmarks and open pathways to model designs that learn more effectively from interactive feedback.


【6】LLM Analysis of 150+ years of German Parliamentary Debates on Migration Reveals Shift from Post-War Solidarity to Anti-Solidarity in the Last Decade
标题:LLM对德国议会150多年移民辩论的分析揭示了过去十年从战后团结转向反团结
链接:https://arxiv.org/abs/2509.07274

作者:ikova, Ole Pütz, Steffen Eger, Olga Sabelfeld, Benjamin Paassen
摘要:从二战后数百万被驱逐者,到劳工移民,再到近年的难民潮,移民一直是德国政治辩论的核心话题。传统上,深入研究有关这种广泛现象的政治言论需要大量人工标注,将分析范围限制在数据的小部分子集上。大型语言模型(LLM)有可能部分自动化甚至相当复杂的标注任务。我们对多个LLM在德国议会辩论中标注(反)团结子类型的表现进行了广泛评估,并与历时一年多收集的数千条人类参考标注进行比较。我们评估了模型规模、提示差异、微调以及历史与当代数据的影响,并调查了系统性误差。除方法学评估外,我们还从社会科学视角解读由此产生的标注,更深入地了解德国二战后时期及近年来对移民的(反)团结趋势。我们的数据显示,战后时期存在高度的面向移民的团结,而自2015年以来德国议会中出现了强烈的反团结趋势,这激发了进一步研究。这些发现突出了LLM在政治文本分析中的前景,以及移民辩论在德国的重要性——在那里,人口下降和劳动力短缺与日益加剧的两极分化并存。
摘要 :Migration has been a core topic in German political debate, from millions of expellees post World War II over labor migration to refugee movements in the recent past. Studying political speech regarding such wide-ranging phenomena in depth traditionally required extensive manual annotations, limiting the scope of analysis to small subsets of the data. Large language models (LLMs) have the potential to partially automate even complex annotation tasks. We provide an extensive evaluation of a multiple LLMs in annotating (anti-)solidarity subtypes in German parliamentary debates compared to a large set of thousands of human reference annotations (gathered over a year). We evaluate the influence of model size, prompting differences, fine-tuning, historical versus contemporary data; and we investigate systematic errors. Beyond methodological evaluation, we also interpret the resulting annotations from a social science lense, gaining deeper insight into (anti-)solidarity trends towards migrants in the German post-World War II period and recent past. Our data reveals a high degree of migrant-directed solidarity in the postwar period, as well as a strong trend towards anti-solidarity in the German parliament since 2015, motivating further research. These findings highlight the promise of LLMs for political text analysis and the importance of migration debates in Germany, where demographic decline and labor shortages coexist with rising polarization.


【7】HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
标题:HealthSLM-Bench:针对移动和可穿戴医疗监控的小语言模型基准测试
链接:https://arxiv.org/abs/2509.07260

作者: Ting Dang, Xinyu Zhang, Vassilis Kostakos, Michael J. Witbrock, Hong Jia
备注:9 pages, 6 tables, 6 figures
摘要:移动和可穿戴医疗监测在促进及时干预、管理慢性健康状况以及最终改善个人生活质量方面发挥着至关重要的作用。以往关于大型语言模型(LLM)的研究强调了它们在医疗保健预测任务中令人印象深刻的泛化能力和有效性。然而,大多数基于LLM的医疗解决方案都是基于云的,这引发了严重的隐私问题,并导致内存占用和延迟增加。为了应对这些挑战,人们越来越关注紧凑模型,即小型语言模型(SLM),它们是轻量级的,旨在移动和可穿戴设备上本地高效运行。然而,这些模型在医疗保健预测中的表现在很大程度上仍未被探索。我们使用零样本、少样本和指令微调方法系统地评估了SLM在健康预测任务上的性能,并将表现最佳的微调SLM部署在移动设备上,以评估其在实际医疗场景中的真实效率和预测性能。我们的结果表明,SLM可以实现与LLM相当的性能,同时在效率和隐私方面带来实质性收益。然而,挑战依然存在,特别是在处理类别不平衡和少样本场景方面。这些发现表明,小型语言模型(SLM)虽然在目前形式下尚不完美,但仍是下一代保护隐私的医疗健康监测的一个有前途的解决方案。
摘要:Mobile and wearable healthcare monitoring play a vital role in facilitating timely interventions, managing chronic health conditions, and ultimately improving individuals' quality of life. Previous studies on large language models (LLMs) have highlighted their impressive generalization abilities and effectiveness in healthcare prediction tasks. However, most LLM-based healthcare solutions are cloud-based, which raises significant privacy concerns and results in increased memory usage and latency. To address these challenges, there is growing interest in compact models, Small Language Models (SLMs), which are lightweight and designed to run locally and efficiently on mobile and wearable devices. Nevertheless, how well these models perform in healthcare prediction remains largely unexplored. We systematically evaluated SLMs on health prediction tasks using zero-shot, few-shot, and instruction fine-tuning approaches, and deployed the best performing fine-tuned SLMs on mobile devices to evaluate their real-world efficiency and predictive performance in practical healthcare scenarios. Our results show that SLMs can achieve performance comparable to LLMs while offering substantial gains in efficiency and privacy. However, challenges remain, particularly in handling class imbalance and few-shot scenarios. These findings highlight SLMs, though imperfect in their current form, as a promising solution for next-generation, privacy-preserving healthcare monitoring.


【8】Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning
标题:数学推理开源大型语言模型的系统优化
链接:https://arxiv.org/abs/2509.07238

作者:war, Dhwaj Jain, Varun Gupta, Kaustav Dedhia, Dashrath Kale, Sudhir Dhekane
摘要:本文对数学推理任务中的模型参数微调进行了实证研究,通过实验各种配置(包括随机性控制、推理深度和采样策略),表明细致的调优能在效率和性能上带来显著提升。我们为五个最先进(SOTA)模型在数学推理任务上引入了一个整体优化框架,在保持解答正确性的同时展现出显著的性能提升。通过对Qwen2.5-72B、Llama-3.1-70B、DeepSeek-V3、Mixtral-8x22B和Yi-Lightning的系统参数优化,以100%的优化成功率证明了一致的效率增益。该方法在所有测试模型中平均降低了29.4%的计算成本,并将推理速度提高了23.9%。该框架系统地搜索参数空间,包括温度(0.1-0.5)、推理步数(4-12)、规划周期(1-4)和核采样(0.85-0.98),并通过在数学推理基准上的测试确定最优配置。关键发现表明,较低的温度区间(0.1-0.4)和较少的推理步数(4-6)能够持续提升效率而不损害准确性。DeepSeek-V3达到了98%的最高准确率,而Mixtral-8x22B以每个正确回答361.5个词元提供了最具成本效益的性能。主要贡献包括:(1)首次针对数学推理中五种不同SOTA模型的全面优化研究,(2)标准化的、面向生产的参数优化框架,(3)发现适用于不同模型架构的通用优化趋势,以及(4)具有广泛性能表征的生产就绪配置。
摘要 :This paper presents a practical investigation into fine-tuning model parameters for mathematical reasoning tasks through experimenting with various configurations including randomness control, reasoning depth, and sampling strategies, careful tuning demonstrates substantial improvements in efficiency as well as performance. A holistically optimized framework is introduced for five state-of-the-art models on mathematical reasoning tasks, exhibiting significant performance boosts while maintaining solution correctness. Through systematic parameter optimization across Qwen2.5-72B, Llama-3.1-70B, DeepSeek-V3, Mixtral-8x22B, and Yi-Lightning, consistent efficiency gains are demonstrated with 100% optimization success rate. The methodology achieves an average 29.4% reduction in computational cost and 23.9% improvement in inference speed across all tested models. This framework systematically searches parameter spaces including temperature (0.1-0.5), reasoning steps (4-12), planning periods (1-4), and nucleus sampling (0.85-0.98), determining optimal configurations through testing on mathematical reasoning benchmarks. Critical findings show that lower temperature regimes (0.1-0.4) and reduced reasoning steps (4-6) consistently enhance efficiency without compromising accuracy. DeepSeek-V3 achieves the highest accuracy at 98%, while Mixtral-8x22B delivers the most cost-effective performance at 361.5 tokens per accurate response. Key contributions include: (1) the first comprehensive optimization study for five diverse SOTA models in mathematical reasoning, (2) a standardized production-oriented parameter optimization framework, (3) discovery of universal optimization trends applicable across model architectures, and (4) production-ready configurations with extensive performance characterization.
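For readers who want to reproduce this kind of sweep, a minimal sketch of a grid search over the parameter ranges quoted in the abstract. The `evaluate` callback is a hypothetical stand-in for running a model on a math benchmark, and the accuracy-per-token scoring rule is our assumption, not necessarily the paper's criterion.

```python
import itertools

# Parameter ranges quoted in the abstract.
GRID = {
    "temperature":     [0.1, 0.2, 0.3, 0.4, 0.5],
    "reasoning_steps": [4, 6, 8, 10, 12],
    "planning_period": [1, 2, 3, 4],
    "top_p":           [0.85, 0.90, 0.95, 0.98],
}

def best_config(evaluate):
    """`evaluate(config) -> (accuracy, tokens_per_correct)` is supplied by the user,
    e.g. a wrapper that runs the model on a math benchmark with that config."""
    best_score, best_cfg = float("-inf"), None
    keys = list(GRID)
    for values in itertools.product(*(GRID[k] for k in keys)):
        cfg = dict(zip(keys, values))
        accuracy, tokens_per_correct = evaluate(cfg)
        score = accuracy / tokens_per_correct   # one possible accuracy-per-cost criterion
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```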


【9】PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design
标题:PLaID++:一个面向目标无机材料设计的偏好对齐语言模型
链接:https://arxiv.org/abs/2509.07150

作者:Rohan Desai, Larry Wang, Gabriel Hope, Ethan Ritz
摘要:发现新材料对于太阳能电池、电池和碳捕获等技术进步至关重要。然而,新材料的开发受到缓慢而昂贵的试错过程的限制。为了加速这个流水线,我们引入了PLaID++,这是一个大型语言模型(LLM),经过微调,可用于稳定和属性引导的晶体生成。我们微调Qwen-2.5 7 B生成晶体结构使用一种新的Wyckoff为基础的文本表示。我们表明,生成可以有效地指导基于直接偏好优化(DPO)的强化学习技术,采样结构按其稳定性,新颖性和空间群进行分类。通过将对称性约束直接编码到文本中并将模型输出引导到所需的化学空间,PLaID++以比现有方法高50%的速率生成化学稳定、独特和新颖的结构,并有条件地生成具有所需空间群性质的结构。我们的实验突出了迭代DPO的有效性,与单独微调相比,在无条件和空间群条件生成中分别实现了$\sim$115\%和$\sim$50\%的改进。我们的工作展示了将后训练技术从自然语言处理应用到材料设计的潜力,为有针对性和有效地发现新材料铺平了道路。
摘要:Discovering novel materials is critical for technological advancements such as solar cells, batteries, and carbon capture. However, the development of new materials is constrained by a slow and expensive trial-and-error process. To accelerate this pipeline, we introduce PLaID++, a Large Language Model (LLM) fine-tuned for stable and property-guided crystal generation. We fine-tune Qwen-2.5 7B to generate crystal structures using a novel Wyckoff-based text representation. We show that generation can be effectively guided with a reinforcement learning technique based on Direct Preference Optimization (DPO), with sampled structures categorized by their stability, novelty, and space group. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a $\sim$50\% greater rate than prior methods and conditionally generates structures with desired space group properties. Our experiments highlight the effectiveness of iterative DPO, achieving $\sim$115\% and $\sim$50\% improvements in unconditional and space group conditioned generation, respectively, compared to fine-tuning alone. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
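For reference, the standard Direct Preference Optimization objective (Rafailov et al.) that this kind of preference-guided fine-tuning builds on is

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\Big[\log\sigma\Big(\beta\log\tfrac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\tfrac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big],$$

where $y_w$ and $y_l$ are the preferred and dispreferred samples (here, structures ranked by stability, novelty, and space group), $\pi_{\mathrm{ref}}$ is the reference fine-tuned model, and $\beta$ controls the strength of the preference signal. The paper's exact preference construction and iterative schedule may differ from this textbook form.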


【10】Avoiding Over-Personalization with Rule-Guided Knowledge Graph Adaptation for LLM Recommendations
标题:通过LLM推荐的规则引导知识图调整避免过度个性化
链接:https://arxiv.org/abs/2509.07133

作者:Spadea, Oshani Seneviratne
备注:5 pages, 2 figures, ISWC
摘要:我们提出了一个轻量级的神经符号框架,通过在推理时调整用户侧知识图(KG)来缓解基于LLM的推荐系统中的过度个性化。我们的方法不是重新训练模型或依赖不透明的启发式规则,而是重构用户的个性化知识图(PKG),以抑制强化个性化信息环境(PIE)的特征共现模式,即由算法引起的、限制内容多样性的过滤气泡。这些调整后的PKG用于构建结构化提示,引导语言模型给出更多样化的、PIE之外的推荐,同时保持主题相关性。我们引入了一系列符号化调整策略,包括软重加权、硬反转和有针对性地删除有偏见的三元组,以及为每个用户优化其应用方式的客户端学习算法。在食谱推荐基准上的实验表明,个性化的PKG调整在保持推荐质量的同时显著提高了内容新颖性,优于全局调整和朴素的基于提示的方法。
摘要:We present a lightweight neuro-symbolic framework to mitigate over-personalization in LLM-based recommender systems by adapting user-side Knowledge Graphs (KGs) at inference time. Instead of retraining models or relying on opaque heuristics, our method restructures a user's Personalized Knowledge Graph (PKG) to suppress feature co-occurrence patterns that reinforce Personalized Information Environments (PIEs), i.e., algorithmically induced filter bubbles that constrain content diversity. These adapted PKGs are used to construct structured prompts that steer the language model toward more diverse, Out-PIE recommendations while preserving topical relevance. We introduce a family of symbolic adaptation strategies, including soft reweighting, hard inversion, and targeted removal of biased triples, and a client-side learning algorithm that optimizes their application per user. Experiments on a recipe recommendation benchmark show that personalized PKG adaptations significantly increase content novelty while maintaining recommendation quality, outperforming global adaptation and naive prompt-based methods.


【11】RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use
标题:RLFactory:适用于LLM多轮工具使用的即插即用强化学习后训练框架
链接:https://arxiv.org/abs/2509.06980

作者:ai, Guojun Yin, Zekun Xu, Chuhuai Yue, Yi Jia, Siyu Xia, Xiaohan Wang, Jiwen Jiang, Xiaoguang Li, Chengqi Dong, Hang He, Wei Lin
摘要:大型语言模型在基本推理方面表现出色,但在需要与外部工具交互的任务方面却很困难。我们提出了RLFactory,一个用于多轮工具使用的即插即用强化学习后训练框架。RLFactory解决了(i)通过基于异步调用者和解耦的工具/培训架构解决工具异构性和接口问题中的工具调用稳定性和适应性,以及(ii)通过支持基于规则的奖励层,模型判断和工具验证信号的多样化评估需求。通过从工具反馈中引入观测标记,闭合模型、工具和环境之间的循环,重构MDP,实现了生成-解析-调用-更新的动态策略优化工作流程。在使用Qwen 3 - 4 B的Search-R1上,RLFactory在自然问题(NQ)数据集上获得了0.486的测试分数,超过了使用类似技术训练的大型模型(例如,Qwen2.5- 7 B-Instruct-GRPO为0.473),并将训练吞吐量提高了6.8倍。RLFactory提供了一个低障碍,高度适应性的框架,用于加强LLM在现实世界场景中的多轮工具使用。代码:https://github.com/Simple-Efficient/RL-Factory。
摘要 :Large language models excel at basic reasoning but struggle with tasks that require interaction with external tools. We present RLFactory, a plug-and-play reinforcement learning post-training framework for multi-round tool use. RLFactory tackles (i) tool-call stability and adaptability amid tool heterogeneity and interface issues via an asyncio-based asynchronous caller and a decoupled tool/training architecture, and (ii) diverse evaluation needs via a reward layer supporting rule-based, model-judgment, and tool-verification signals. It reconstructs the MDP by introducing observation markers from tool feedback, closing the loop among model, tools, and environment, and implements a generate-parse-invoke-update workflow for dynamic policy optimization. On Search-R1 with Qwen3-4B, RLFactory achieves a 0.486 test score on the Natural Questions (NQ) dataset, surpassing larger models trained with similar techniques (e.g., Qwen2.5-7B-Instruct-GRPO at 0.473), and increases training throughput by 6.8x. RLFactory provides a low-barrier, highly adaptable framework for strengthening multi-round tool use of LLMs in real-world scenarios. Code: https://github.com/Simple-Efficient/RL-Factory.


【12】VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving
标题:VoltanaLLM:反馈驱动的频率控制和状态空间路由,用于节能LLM服务
链接:https://arxiv.org/abs/2509.04827

作者:u (1), Aryan Taneja (1), Junfeng Lin (2), Minjia Zhang (1) ((1) University of Illinois Urbana-Champaign, (2) Tsinghua University)
摘要:现代大型语言模型(LLM)服务系统越来越多地支持交互式应用程序,如实时聊天助手,代码生成工具和代理工作流。然而,LLM推理的能源成本飙升,对可持续和具有成本效益的部署提出了越来越大的挑战。本文介绍了VoltanaLLM,一个系统的SLO意识,节能LLM服务,建立从控制理论的角度来看。VoltanaLLM在新兴的预填充/解码分解架构中共同设计频率缩放和请求路由,利用其解耦执行来实现细粒度的特定阶段控制。它由一个反馈驱动的频率控制器和一个状态空间路由器组成,前者可动态调整预填充和解码阶段的GPU频率,后者可探索频率缩放实例之间的路由决策,以在延迟约束下最大限度地减少能量。我们在SGLang中实现了VoltanaLLM,并在多个最先进的LLM和真实数据集上评估了其性能。结果表明,VoltanaLLM实现了高达36.3%的节能,同时保持近乎完美的SLO实现率,为可持续和智能LLM服务铺平了道路。
摘要:Modern Large Language Model (LLM) serving systems increasingly support interactive applications, like real-time chat assistants, code generation tools, and agentic workflows. However, the soaring energy cost of LLM inference presents a growing challenge for sustainable and cost-effective deployment. This paper introduces VoltanaLLM, a system for SLO-aware, energy-efficient LLM serving, built from a control theory perspective. VoltanaLLM co-designs frequency scaling and request routing in emerging prefill/decode disaggregated architectures, leveraging their decoupled execution to enable fine-grained phase-specific control. It consists of a feedback-driven frequency controller that dynamically adapts GPU frequency for prefill and decode phases, and a state-space router that explores routing decisions across frequency-scaled instances to minimize energy under latency constraints. We implement VoltanaLLM in SGLang and evaluate its performance over multiple state-of-the-art LLMs and real-world datasets. The results demonstrate that VoltanaLLM achieves up to 36.3% energy savings while maintaining near-perfect SLO attainment rate, paving the way for sustainable and intelligent LLM serving.
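As a toy illustration of the feedback-driven frequency-control idea (not VoltanaLLM's actual controller): a simple rule that raises GPU frequency when the observed phase latency violates the SLO and lowers it when there is slack. The frequency table, margin, and function names are assumptions made for this sketch.

```python
# Hypothetical discrete GPU frequency levels the controller may choose from.
FREQ_LEVELS_MHZ = [900, 1100, 1300, 1500, 1700]

def next_frequency(current_idx: int, observed_latency_ms: float,
                   slo_ms: float, margin: float = 0.9) -> int:
    """Return the index of the frequency level to use for the next control interval."""
    if observed_latency_ms > slo_ms and current_idx < len(FREQ_LEVELS_MHZ) - 1:
        return current_idx + 1            # SLO violated: raise frequency
    if observed_latency_ms < margin * slo_ms and current_idx > 0:
        return current_idx - 1            # comfortable slack: lower frequency to save energy
    return current_idx                    # otherwise hold

# Example: 180 ms observed against a 150 ms SLO -> step up one level.
idx = next_frequency(current_idx=2, observed_latency_ms=180.0, slo_ms=150.0)
print(FREQ_LEVELS_MHZ[idx])  # 1500
```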


Graph相关(图学习|图神经网络|图优化等)(7篇)

【1】Bio-KGvec2go: Serving up-to-date Dynamic Biomedical Knowledge Graph Embeddings
标题:Bio-KGvec2go:提供最新的动态生物医学知识图谱嵌入
链接:https://arxiv.org/abs/2509.07905

作者:ad, Heiko Paulheim, Rita T. Sousa
备注:Accepted at ISWC Poster and Demo Track 2025
摘要:知识图和本体以结构化的方式表示实体及其关系,在现代AI应用的开发中具有重要意义。将这些语义资源与机器学习模型集成通常依赖于知识图嵌入模型将图数据转换为数值表示。因此,面向流行知识图和本体的预训练模型越来越有价值,因为它们免去了使用相同数据为不同任务重新训练模型的需要,从而有助于实现人工智能开发的民主化并实现可持续计算。在本文中,我们提出了Bio-KGvec2go,即KGvec2go Web API的扩展,旨在为广泛使用的生物医学本体生成并提供知识图嵌入。鉴于这些本体的动态特性,Bio-KGvec2go还支持与本体版本发布保持一致的定期更新。Bio-KGvec2go以用户所需的极少计算工作量提供最新的嵌入,从而促进了高效和及时的生物医学研究。
摘要:Knowledge graphs and ontologies represent entities and their relationships in a structured way, having gained significance in the development of modern AI applications. Integrating these semantic resources with machine learning models often relies on knowledge graph embedding models to transform graph data into numerical representations. Therefore, pre-trained models for popular knowledge graphs and ontologies are increasingly valuable, as they spare the need to retrain models for different tasks using the same data, thereby helping to democratize AI development and enabling sustainable computing.   In this paper, we present Bio-KGvec2go, an extension of the KGvec2go Web API, designed to generate and serve knowledge graph embeddings for widely used biomedical ontologies. Given the dynamic nature of these ontologies, Bio-KGvec2go also supports regular updates aligned with ontology version releases. By offering up-to-date embeddings with minimal computational effort required from users, Bio-KGvec2go facilitates efficient and timely biomedical research.


【2】A Survey of Graph Neural Networks for Drug Discovery: Recent Developments and Challenges
标题:用于药物发现的图神经网络综述:最新进展与挑战
链接:https://arxiv.org/abs/2509.07887

作者: Berry, Liang Cheng
备注:16 pages, 1 figure
摘要:图形神经网络(GNN)在药物发现的复杂领域中获得了广泛的关注,因为它们能够处理图形结构的数据,如药物分子模型。这种方法已经在几个类别的药物发现研究的出版文献中产生了无数的方法和模型。本文结合近年来的研究成果,全面介绍了GNNs的研究范畴,即分子性质预测,包括药物-靶标结合亲和力预测、药物-药物相互作用研究、微生物组相互作用预测、药物重新定位、逆合成和新药设计,为GNNs在药物发现中的应用提供了指导。
摘要 :Graph Neural Networks (GNNs) have gained traction in the complex domain of drug discovery because of their ability to process graph-structured data such as drug molecule models. This approach has resulted in a myriad of methods and models in published literature across several categories of drug discovery research. This paper covers the research categories comprehensively with recent papers, namely molecular property prediction, including drug-target binding affinity prediction, drug-drug interaction study, microbiome interaction prediction, drug repositioning, retrosynthesis, and new drug design, and provides guidance for future work on GNNs for drug discovery.


【3】Graph-based Integrated Gradients for Explaining Graph Neural Networks
标题:用于解释图神经网络的基于图的积分梯度
链接:https://arxiv.org/abs/2509.07648

作者:impson, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew
备注:Accepted at the Australasian Joint Conference on Artificial Intelligence (AJCAI) 2025
摘要:积分梯度(IG)是解决神经网络黑箱问题的一种常用可解释性技术。积分梯度假设数据是连续的,而图是离散结构,这使得IG不适用于图。在这项工作中,我们介绍了基于图的积分梯度(GB-IG),即IG向图的扩展。我们在四个合成数据集上证明,GB-IG能够准确识别分类任务中所用图的关键结构组件。我们进一步在三个常用的真实世界图数据集上证明,在突出节点分类任务的重要特征方面,GB-IG优于IG。
摘要:Integrated Gradients (IG) is a common explainability technique to address the black-box problem of neural networks. Integrated gradients assumes continuous data. Graphs are discrete structures making IG ill-suited to graphs. In this work, we introduce graph-based integrated gradients (GB-IG); an extension of IG to graphs. We demonstrate on four synthetic datasets that GB-IG accurately identifies crucial structural components of the graph used in classification tasks. We further demonstrate on three prevalent real-world graph datasets that GB-IG outperforms IG in highlighting important features for node classification tasks.
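For context, the standard Integrated Gradients attribution that GB-IG extends assigns feature $i$, for input $x$ and baseline $x'$, the value

$$\mathrm{IG}_i(x) = (x_i - x_i')\int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\,d\alpha ,$$

which presumes a continuous straight-line path from $x'$ to $x$; as the abstract notes, that assumption is what breaks down for discrete graph structures.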


【4】Of Graphs and Tables: Zero-Shot Node Classification with Tabular Foundation Models
标题:图形和表格:使用表格基础模型的Zero-Shot节点分类
链接:https://arxiv.org/abs/2509.07143

作者:yler, Xingyue Huang, İsmail İlkan Ceylan, Michael Bronstein, Ben Finkelshtein
摘要:图基础模型(GFM)最近已经成为一个很有前途的范例,实现广泛的推广各种图形数据。然而,现有的GFM通常在数据集上进行训练,这些数据集表现出对真实世界图形的不良表现,限制了它们的泛化性能。相比之下,表格基础模型(TFM)不仅在经典的表格预测任务中表现出色,而且在其他领域(如时间序列预测,自然语言处理和计算机视觉)中也表现出很强的适用性。出于这一动机,我们采取了另一种观点的GFM标准的角度来看,并重新制定节点分类作为一个表格的问题。每个节点可以表示为一行,特征、结构和标签信息表示为列,从而使TFM能够通过上下文学习直接执行zero-shot节点分类。在这项工作中,我们介绍了TabGFM,一个图形基础模型框架,首先通过特征和结构编码器将图形转换为表格,将多个TFM应用于离散子采样表,然后通过集成选择聚合它们的输出。通过对28个真实世界数据集的实验,TabGFM实现了对特定任务GNN和最先进GFM的一致改进,突出了表格重构用于可扩展和可推广的图学习的潜力。
摘要:Graph foundation models (GFMs) have recently emerged as a promising paradigm for achieving broad generalization across various graph data. However, existing GFMs are often trained on datasets that were shown to poorly represent real-world graphs, limiting their generalization performance. In contrast, tabular foundation models (TFMs) not only excel at classical tabular prediction tasks but have also shown strong applicability in other domains such as time series forecasting, natural language processing, and computer vision. Motivated by this, we take an alternative view to the standard perspective of GFMs and reformulate node classification as a tabular problem. Each node can be represented as a row with feature, structure, and label information as columns, enabling TFMs to directly perform zero-shot node classification via in-context learning. In this work, we introduce TabGFM, a graph foundation model framework that first converts a graph into a table via feature and structural encoders, applies multiple TFMs to diversely subsampled tables, and then aggregates their outputs through ensemble selection. Through experiments on 28 real-world datasets, TabGFM achieves consistent improvements over task-specific GNNs and state-of-the-art GFMs, highlighting the potential of tabular reformulation for scalable and generalizable graph learning.
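An illustrative sketch only (not TabGFM's actual encoders) of the tabular reformulation: each node becomes one table row with raw feature columns, simple structural columns, and a label column (empty for the nodes to be classified), so a tabular foundation model can perform in-context node classification. All column names and the toy features are placeholders.

```python
import networkx as nx
import pandas as pd

def graph_to_table(G: nx.Graph, features: dict, labels: dict) -> pd.DataFrame:
    """Flatten a graph into a node-per-row table of feature, structure, and label columns."""
    clustering = nx.clustering(G)
    rows = []
    for v in G.nodes:
        row = {f"feat_{i}": x for i, x in enumerate(features[v])}   # raw node features
        row["degree"] = G.degree[v]                                 # simple structural columns
        row["clustering"] = clustering[v]
        row["label"] = labels.get(v)                                # None for nodes to classify
        rows.append(row)
    return pd.DataFrame(rows, index=list(G.nodes))

G = nx.karate_club_graph()
feats = {v: [float(G.degree[v])] for v in G.nodes}                  # toy one-dimensional features
labs = {v: G.nodes[v]["club"] for v in list(G.nodes)[:20]}          # pretend the rest are unlabeled
print(graph_to_table(G, feats, labs).head())
```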


【5】From Eigenmodes to Proofs: Integrating Graph Spectral Operators with Symbolic Interpretable Reasoning
标题:从特征模到证明:将图谱运算符与符号可解释推理集成
链接:https://arxiv.org/abs/2509.07017

作者:ruluta, Priscilla Burity
摘要:我们介绍频谱NSR,一个完全频谱神经符号推理框架,嵌入逻辑规则作为频谱模板,并直接在图频谱域进行推理。通过利用图形信号处理(GSP)和基于知识图的拉普拉斯特征结构的频率选择滤波器,该架构将符号推理的可解释性与谱学习的可扩展性和适应性统一起来。除了核心配方之外,我们还整合了一套全面的扩展,包括动态图和基础学习,用于更清晰光谱选择性的理性和扩散过滤器,用于模块化专业化的光谱专家混合,光谱课程的证明指导培训,以及校准置信度的不确定性量化。其他增强功能,如大语言模型耦合,共谱传递对齐,对抗鲁棒性,高效的GPU内核,广义拉普拉斯算子和因果干预进一步扩展了框架的多功能性。   对ProofWriter和CLUTRR等最先进的推理基准的实证评估表明,与包括Transformers,消息传递神经网络和神经符号逻辑编程系统在内的领先基线相比,Spectral NSR实现了更高的准确性,更快的推理速度,对对抗性扰动的鲁棒性以及更高的可解释性。光谱属性和证明带协议分析证实,模型的决策与符号证明结构密切相关,而转移实验验证有效的域适应通过共谱对齐。这些结果将光谱NSR确立为下一代推理系统的可扩展和原则性基础,提供超越传统方法的透明度,鲁棒性和泛化能力。
摘要 :We introduce Spectral NSR, a fully spectral neuro-symbolic reasoning framework that embeds logical rules as spectral templates and performs inference directly in the graph spectral domain. By leveraging graph signal processing (GSP) and frequency-selective filters grounded in the Laplacian eigenstructure of knowledge graphs, the architecture unifies the interpretability of symbolic reasoning with the scalability and adaptability of spectral learning. Beyond the core formulation, we incorporate a comprehensive set of extensions, including dynamic graph and basis learning, rational and diffusion filters for sharper spectral selectivity, mixture-of-spectral-experts for modular specialization, proof-guided training with spectral curricula, and uncertainty quantification for calibrated confidence. Additional enhancements such as large language model coupling, co-spectral transfer alignment, adversarial robustness, efficient GPU kernels, generalized Laplacians, and causal interventions further expand the versatility of the framework.   Empirical evaluation on state-of-the-art reasoning benchmarks such as ProofWriter and CLUTRR demonstrates that Spectral NSR achieves superior accuracy, faster inference, improved robustness to adversarial perturbations, and higher interpretability compared to leading baselines including transformers, message-passing neural networks, and neuro-symbolic logic programming systems. Spectral attribution and proof-band agreement analyses confirm that model decisions align closely with symbolic proof structures, while transfer experiments validate effective domain adaptation through co-spectral alignment. These results establish Spectral NSR as a scalable and principled foundation for the next generation of reasoning systems, offering transparency, robustness, and generalization beyond conventional approaches.
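As background for the spectral-domain formulation, a generic graph signal processing sketch (not the Spectral NSR implementation): filtering a node signal in the Laplacian eigenbasis, the "spectral domain" in which the abstract says logical rules are embedded as templates. The example graph, signal, and low-pass response below are arbitrary.

```python
import numpy as np

def spectral_filter(adjacency: np.ndarray, signal: np.ndarray, response) -> np.ndarray:
    """Apply a frequency response h(lambda) to a node signal via the graph Fourier transform."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)      # eigenmodes of the graph
    coeffs = eigvecs.T @ signal                        # graph Fourier transform
    filtered = response(eigvals) * coeffs              # frequency-selective filtering
    return eigvecs @ filtered                          # back to the node domain

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 1.0])
low_pass = lambda lam: np.exp(-2.0 * lam)              # smooth (low-frequency) template
print(spectral_filter(A, x, low_pass))
```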


【6】GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning
标题:GSTBench:图形自我监督学习可移植性的基准研究
链接:https://arxiv.org/abs/2509.06975

作者:Zhigang Hua, Yan Xie, Jingzhe Liu, Bo Long, Hui Liu
备注:Accepted at CIKM'25
摘要:自监督学习(SSL)在图表示学习中展现出很大的潜力。然而,大多数现有的图SSL方法都是在单数据集设定下开发和评估的,其跨数据集可迁移性在很大程度上未被探索,这限制了它们利用知识迁移和大规模预训练的能力,而这些因素对于发展超越拟合训练数据的通用智能至关重要。为了弥补这一空白并推进图基础模型研究,我们提出了GSTBench,第一个系统评估图SSL方法可迁移性的基准。我们在ogbn-papers100M上进行了大规模预训练,并在一组多样化的目标图上评估了五种有代表性的SSL方法。我们的标准化实验设置解耦了模型架构、数据集特征和适配协议等混杂因素,从而能够仅聚焦于预训练目标进行严格比较。令人惊讶的是,我们观察到大多数图SSL方法难以泛化,其中一些方法的性能甚至比随机初始化更差。相比之下,GraphMAE这种掩码自编码器方法能够持续提升迁移性能。我们分析了导致这些差异的潜在因素,并为可迁移图SSL的未来研究提供了见解,为图学习中的"预训练-再迁移"范式奠定了坚实的基础。我们的代码可在https://github.com/SongYYYY/GSTBench获得。
摘要:Self-supervised learning (SSL) has shown great promise in graph representation learning. However, most existing graph SSL methods are developed and evaluated under a single-dataset setting, leaving their cross-dataset transferability largely unexplored and limiting their ability to leverage knowledge transfer and large-scale pretraining, factors that are critical for developing generalized intelligence beyond fitting training data. To address this gap and advance foundation model research for graphs, we present GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods. We conduct large-scale pretraining on ogbn-papers100M and evaluate five representative SSL methods across a diverse set of target graphs. Our standardized experimental setup decouples confounding factors such as model architecture, dataset characteristics, and adaptation protocols, enabling rigorous comparisons focused solely on pretraining objectives. Surprisingly, we observe that most graph SSL methods struggle to generalize, with some performing worse than random initialization. In contrast, GraphMAE, a masked autoencoder approach, consistently improves transfer performance. We analyze the underlying factors that drive these differences and offer insights to guide future research on transferable graph SSL, laying a solid foundation for the "pretrain-then-transfer" paradigm in graph learning. Our code is available at https://github.com/SongYYYY/GSTBench.


【7】NestGNN: A Graph Neural Network Framework Generalizing the Nested Logit Model for Travel Mode Choice
标题:NestGNN:一个推广了出行方式选择嵌套Logit模型的图神经网络框架
链接:https://arxiv.org/abs/2509.07123

作者:, Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Shenhao Wang
摘要:嵌套logit(NL)通常用于离散选择分析,包括广泛的应用,如出行方式选择,汽车所有权或位置决策。然而,经典的NL模型受到其有限的表示能力和手工制作的效用规范的限制。虽然研究人员引入了深度神经网络(DNN)来应对这些挑战,但现有的DNN无法明确捕获离散选择背景下的替代间相关性。为了应对这些挑战,本研究提出了一个新的概念--替代图--来表示出行方式替代方案之间的关系。利用嵌套替代图,本研究进一步设计了一个嵌套效用图神经网络(NestGNN)作为神经网络家族中经典NL模型的推广。从理论上讲,NestGNNs在模型表示方面推广了经典NL模型和现有DNNs,同时保留了NL模型的关键两层替代模式:嵌套内的比例替代,但嵌套外的非比例替代。从经验上讲,我们发现NestGNNs显著优于基准模型,特别是相应的NL模型9.2%。如弹性表和替代可视化所示,NestGNNs保留了NL模型的两层替代模式,但在其模型设计空间中具有更大的灵活性。总的来说,我们的研究证明了NestGNN在预测,解释方面的能力,以及它在推广经典NL模型以分析出行方式选择方面的灵活性。
摘要:Nested logit (NL) has been commonly used for discrete choice analysis, including a wide range of applications such as travel mode choice, automobile ownership, or location decisions. However, the classical NL models are restricted by their limited representation capability and handcrafted utility specification. While researchers introduced deep neural networks (DNNs) to tackle such challenges, the existing DNNs cannot explicitly capture inter-alternative correlations in the discrete choice context. To address the challenges, this study proposes a novel concept - alternative graph - to represent the relationships among travel mode alternatives. Using a nested alternative graph, this study further designs a nested-utility graph neural network (NestGNN) as a generalization of the classical NL model in the neural network family. Theoretically, NestGNNs generalize the classical NL models and existing DNNs in terms of model representation, while retaining the crucial two-layer substitution patterns of the NL models: proportional substitution within a nest but non-proportional substitution beyond a nest. Empirically, we find that the NestGNNs significantly outperform the benchmark models, particularly the corresponding NL models by 9.2\%. As shown by elasticity tables and substitution visualization, NestGNNs retain the two-layer substitution patterns as the NL model, and yet presents more flexibility in its model design space. Overall, our study demonstrates the power of NestGNN in prediction, interpretation, and its flexibility of generalizing the classical NL model for analyzing travel mode choice.
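For reference, the classical nested logit model that NestGNN generalizes gives the probability of choosing alternative $i$ in nest $B_m$ as

$$P(i) = \frac{e^{V_i/\lambda_m}}{\sum_{j\in B_m} e^{V_j/\lambda_m}}\;\cdot\;\frac{e^{\lambda_m I_m}}{\sum_{\ell} e^{\lambda_\ell I_\ell}},\qquad I_m = \ln \sum_{j\in B_m} e^{V_j/\lambda_m},$$

where $V_i$ is the systematic utility and $\lambda_m$ the nest scale parameter. This structure is what produces proportional substitution within a nest but non-proportional substitution across nests, the two-layer pattern the abstract says NestGNN retains.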


Transformer(5篇)

【1】Transformer-Based Approach to Optimal Sensor Placement for Structural Health Monitoring of Probe Cards
标题:基于Transformer的探针卡结构健康监测最优传感器布置方法
链接:https://arxiv.org/abs/2509.07603

作者:ani, Marco Mauri, Daniele Acconcia, Simone Todaro, Stefano Mariani
备注:22 pages, 11 figures
摘要:本文提出了一种创新的基于Transformer的深度学习策略,用于优化传感器的放置,旨在对半导体探针卡进行结构健康监测。探针卡的故障,包括基板裂纹和螺丝松动,将严重影响半导体制造的产量和可靠性。一些故障模式可以通过配备足够的传感器的探针卡来检测。在探针卡的有限元模型内采用来自模拟故障场景的频率响应函数。一个全面的数据集,丰富的物理知情的场景扩展和物理感知的统计数据增强,利用训练混合卷积神经网络和Transformer模型。该模型在对探针卡健康状态(基线、松动螺钉、裂纹)进行分类时实现了高准确率(99.83%)和出色的裂纹检测召回率(99.73%)。通过3次重复的10倍分层交叉验证的严格框架确认模型稳健性。注意力机制还可以精确定位关键传感器位置:对注意力权重的分析为通过优化传感器配置设计高效、经济的监控系统提供了可操作的见解。这项研究强调了基于注意力的深度学习能够推进主动维护,提高半导体制造的运营可靠性和产量。
摘要:This paper presents an innovative Transformer-based deep learning strategy for optimizing the placement of sensors aiming at structural health monitoring of semiconductor probe cards. Failures in probe cards, including substrate cracks and loosened screws, would critically affect semiconductor manufacturing yield and reliability. Some failure modes could be detected by equipping a probe card with adequate sensors. Frequency response functions from simulated failure scenarios are adopted within a finite element model of a probe card. A comprehensive dataset, enriched by physics-informed scenario expansion and physics-aware statistical data augmentation, is exploited to train a hybrid Convolutional Neural Network and Transformer model. The model achieves high accuracy (99.83%) in classifying the probe card health states (baseline, loose screw, crack) and an excellent crack detection recall (99.73%). Model robustness is confirmed through a rigorous framework of 3 repetitions of 10-fold stratified cross-validation. The attention mechanism also pinpoints critical sensor locations: an analysis of the attention weights offers actionable insights for designing efficient, cost-effective monitoring systems by optimizing sensor configurations. This research highlights the capability of attention-based deep learning to advance proactive maintenance, enhancing operational reliability and yield in semiconductor manufacturing.


【2】Measuring Uncertainty in Transformer Circuits with Effective Information Consistency
标题:利用有效的信息一致性测量Transformer电路的不确定度
链接:https://arxiv.org/abs/2509.07149

作者:. Krasnovsky
摘要:机制可解释性已经在大型语言模型(LLM)中识别出被称为Transformer电路(TC)的功能子图,它们似乎实现了特定算法。然而,我们缺乏一种形式化的、单遍的方法来量化一个活跃电路何时表现一致,从而可能值得信赖。在先前系统论方案的基础上,我们将层(sheaf)/上同调与因果涌现的视角具体化到TC上,并引入有效信息一致性得分(EICS)。EICS结合了(i)由局部雅可比矩阵和激活值计算得到的归一化层不一致性,以及(ii)由同一前向状态导出的、刻画电路级因果涌现的高斯EI代理量。该构造是白盒的、单遍的,并且显式处理单位,使得分无量纲。我们还就得分解释、计算开销(快速与精确模式)以及玩具例子的合理性检查分析提供了实用指导。在LLM任务上的经验验证留待后续工作。
摘要:Mechanistic interpretability has identified functional subgraphs within large language models (LLMs), known as Transformer Circuits (TCs), that appear to implement specific algorithms. Yet we lack a formal, single-pass way to quantify when an active circuit is behaving coherently and thus likely trustworthy. Building on prior systems-theoretic proposals, we specialize a sheaf/cohomology and causal emergence perspective to TCs and introduce the Effective-Information Consistency Score (EICS). EICS combines (i) a normalized sheaf inconsistency computed from local Jacobians and activations, with (ii) a Gaussian EI proxy for circuit-level causal emergence derived from the same forward state. The construction is white-box, single-pass, and makes units explicit so that the score is dimensionless. We further provide practical guidance on score interpretation, computational overhead (with fast and exact modes), and a toy sanity-check analysis. Empirical validation on LLM tasks is deferred.


【3】Benchmarking Vision Transformers and CNNs for Thermal Photovoltaic Fault Detection with Explainable AI Validation
标题:通过可解释人工智能验证对用于热成像光伏故障检测的Vision Transformer和CNN进行基准测试
链接:https://arxiv.org/abs/2509.07039

作者:oy
备注:28 Pages, 4 Figures
摘要:用于自动光伏(PV)监控的人工智能部署面临着可解释性障碍,限制了能源基础设施应用的采用。虽然深度学习在热故障检测方面实现了高准确性,但仍然缺乏对模型决策与热物理原理一致性的验证,从而在理解模型推理至关重要的情况下造成部署犹豫。这项研究提供了卷积神经网络(ResNet-18,EfficientNet-B 0)和Vision Transformers(ViT-Tiny,Swin-Tiny)用于热光伏故障检测的系统比较,使用XRAI显着性分析来评估与热物理原理的一致性。这是CNN和Vision Transformers用于热光伏故障检测的第一次系统性比较,具有物理验证的可解释性。对涵盖正常操作和11种故障类别的20,000张红外图像的评估表明,与CNN方法相比,Swin Transformer实现了最高的性能(94%的二进制准确度; 73%的多类准确度)。XRAI分析表明,模型学习物理上有意义的功能,如电池缺陷的局部热点,二极管故障的线性热路径,以及植被阴影的热边界,与预期的热特征一致。然而,故障类型之间的性能差异很大:电气故障实现了强检测(F1分数>0.90),而污染等环境因素仍然具有挑战性(F1分数0.20-0.33),表明热成像分辨率的限制。热物理指导的可解释性方法为验证能源监测应用中的人工智能决策提供了方法,解决了可再生能源基础设施的部署障碍。
摘要 :Artificial intelligence deployment for automated photovoltaic (PV) monitoring faces interpretability barriers that limit adoption in energy infrastructure applications. While deep learning achieves high accuracy in thermal fault detection, validation that model decisions align with thermal physics principles remains lacking, creating deployment hesitancy where understanding model reasoning is critical. This study provides a systematic comparison of convolutional neural networks (ResNet-18, EfficientNet-B0) and vision transformers (ViT-Tiny, Swin-Tiny) for thermal PV fault detection, using XRAI saliency analysis to assess alignment with thermal physics principles. This represents the first systematic comparison of CNNs and vision transformers for thermal PV fault detection with physics-validated interpretability. Evaluation on 20,000 infrared images spanning normal operation and 11 fault categories shows that Swin Transformer achieves the highest performance (94% binary accuracy; 73% multiclass accuracy) compared to CNN approaches. XRAI analysis reveals that models learn physically meaningful features, such as localized hotspots for cell defects, linear thermal paths for diode failures, and thermal boundaries for vegetation shading, consistent with expected thermal signatures. However, performance varies significantly across fault types: electrical faults achieve strong detection (F1-scores >0.90) while environmental factors like soiling remain challenging (F1-scores 0.20-0.33), indicating limitations imposed by thermal imaging resolution. The thermal physics-guided interpretability approach provides methodology for validating AI decision-making in energy monitoring applications, addressing deployment barriers in renewable energy infrastructure.


【4】A transformer-based generative model for planetary systems
标题:基于Transformer的行星系统生成模型
链接:https://arxiv.org/abs/2509.07226

作者:ert, Jeanne Davoult, Sara Marques
备注:Accepted in A&A
摘要:行星系统形成的数值计算对计算能力要求很高。然而,这些合成行星系统可以提供访问的相关性,在给定的数值框架预测,在同一系统中的行星之间的属性。反过来,这种相关性可以用来指导和优先考虑旨在发现某些类型行星的观测活动,如类地行星。我们的目标是开发一个生成模型,它能够捕获同一系统中行星之间的相关性和统计关系。这种模型在伯尔尼模型上训练,提供了以很少的计算成本生成大量合成行星系统的可能性,例如,可以用于指导观测活动。我们的生成模型是基于Transformer架构,这是众所周知的,有效地捕捉序列中的相关性,是所有现代大型语言模型的基础。为了评估生成模型的有效性,我们进行了视觉和统计比较,以及机器学习驱动的测试。最后,作为一个用例示例,我们考虑TOI-469系统,在该系统中,我们的目标是基于行星b(第一个被探测到的行星)的属性来预测行星c和d的可能属性。我们使用不同的比较方法,我们的模型所产生的系统的属性是非常相似的系统直接计算的伯尔尼模型。我们还表明,在TOI-469系统的情况下,使用生成模型可以根据已经观测到的行星的性质来预测尚未观测到的行星的性质。我们在我们的网站www.ai4exoplanets.com上向社区提供我们的模型。
摘要:Numerical calculations of planetary system formation are very demanding in terms of computing power. These synthetic planetary systems can however provide access to correlations, as predicted in a given numerical framework, between the properties of planets in the same system. Such correlations can, in return, be used in order to guide and prioritize observational campaigns aiming at discovering some types of planets, as Earth-like planets. Our goal is to develop a generative model which is capable of capturing correlations and statistical relationships between planets in the same system. Such a model, trained on the Bern model, offers the possibility to generate large number of synthetic planetary systems with little computational cost, that can be used, for example, to guide observational campaigns. Our generative model is based on the transformer architecture which is well-known to efficiently capture correlations in sequences and is at the basis of all modern Large Language Models. To assess the validity of the generative model, we perform visual and statistical comparisons, as well as a machine learning driven tests. Finally, as a use case example, we consider the TOI-469 system, in which we aim at predicting the possible properties of planets c and d, based on the properties of planet b (the first that has been detected). We show using different comparison methods that the properties of systems generated by our model are very similar to the ones of the systems computed directly by the Bern model. We also show in the case of the TOI-469 system, that using the generative model allows to predict the properties of planets not yet observed, based on the properties of the already observed planet. We provide our model to the community on our website www.ai4exoplanets.com.


【5】Physics-Guided Diffusion Transformer with Spherical Harmonic Posterior Sampling for High-Fidelity Angular Super-Resolution in Diffusion MRI
标题:基于球谐后采样的物理引导扩散Transformer实现扩散磁共振成像中的高保真角超分辨
链接:https://arxiv.org/abs/2509.07020

作者:aohui Xiao, Ruoyou Wu, Shoujun Yu, Ye Li, Hairong Zheng, Shanshan Wang
摘要:弥散MRI(dMRI)角度超分辨率(ASR)旨在从有限的低角度分辨率(LAR)数据重建高角度分辨率(HAR)信号,而不延长扫描时间。然而,现有的方法是有限的,在恢复细粒度的角度细节或保持高保真度,由于不充分的建模的q-空间几何和不充分的纳入物理约束。在本文中,我们介绍了一个物理引导的扩散Transformer(PGDiT),旨在探索整个训练和推理阶段的物理先验。在训练过程中,具有b矢量调制和随机角度掩蔽的Q空间几何感知模块(QGAM)有助于方向感知表示学习,使网络能够从稀疏和噪声数据中生成具有精细角度细节的方向一致重建。在推理中,两阶段球谐函数引导后验采样(SHPS)强制与采集的数据对齐,然后进行基于热扩散的SH正则化,以确保物理上合理的重建。这种从粗到细的细化策略减轻了在纯数据驱动或生成模型中常见的过度平滑和伪影。对一般ASR任务和两个下游应用(扩散张量成像(DTI)和神经突方向分散和密度成像(NODDI))的广泛实验表明,PGDiT在细节恢复和数据保真度方面优于现有的深度学习模型。我们的方法提出了一种新型的生成式ASR框架,可提供高保真度HAR dMRI重建,在神经科学和临床研究中具有潜在应用。
摘要:Diffusion MRI (dMRI) angular super-resolution (ASR) aims to reconstruct high-angular-resolution (HAR) signals from limited low-angular-resolution (LAR) data without prolonging scan time. However, existing methods are limited in recovering fine-grained angular details or preserving high fidelity due to inadequate modeling of q-space geometry and insufficient incorporation of physical constraints. In this paper, we introduce a Physics-Guided Diffusion Transformer (PGDiT) designed to explore physical priors throughout both training and inference stages. During training, a Q-space Geometry-Aware Module (QGAM) with b-vector modulation and random angular masking facilitates direction-aware representation learning, enabling the network to generate directionally consistent reconstructions with fine angular details from sparse and noisy data. In inference, a two-stage Spherical Harmonics-Guided Posterior Sampling (SHPS) enforces alignment with the acquired data, followed by heat-diffusion-based SH regularization to ensure physically plausible reconstructions. This coarse-to-fine refinement strategy mitigates oversmoothing and artifacts commonly observed in purely data-driven or generative models. Extensive experiments on general ASR tasks and two downstream applications, Diffusion Tensor Imaging (DTI) and Neurite Orientation Dispersion and Density Imaging (NODDI), demonstrate that PGDiT outperforms existing deep learning models in detail recovery and data fidelity. Our approach presents a novel generative ASR framework that offers high-fidelity HAR dMRI reconstructions, with potential applications in neuroscience and clinical research.


GAN|对抗|攻击|生成相关(5篇)

【1】Nearest Neighbor Projection Removal Adversarial Training
标题:最近邻投影移除对抗训练
链接:https://arxiv.org/abs/2509.07673

作者:Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli
摘要:深度神经网络在图像分类任务中表现出令人印象深刻的性能,但仍然容易受到对抗性样本的影响。标准对抗训练增强了鲁棒性,但通常无法明确解决类间特征重叠,这是对抗敏感性的重要贡献者。在这项工作中,我们引入了一种新的对抗性训练框架,通过从特征空间中的对抗性和干净样本中投射出类间依赖关系,来积极减轻类间接近。具体来说,我们的方法首先为每个对抗样本识别最近的类间邻居,然后去除对这些邻居的投影,以增强特征的可分离性。理论上,我们证明了我们提出的logits校正降低了神经网络的Lipschitz常数,从而降低了Rademacher复杂度,这直接有助于提高泛化能力和鲁棒性。在包括CIFAR-10、CIFAR-100和SVHN在内的标准基准测试中进行的广泛实验表明,我们的方法表现出了与领先的对抗性训练技术相竞争的强大性能,突出了在鲁棒性和准确性方面的显著成就。我们的研究结果揭示了明确解决类间特征接近度以增强DNN对抗鲁棒性的重要性。
摘要:Deep neural networks have exhibited impressive performance in image classification tasks but remain vulnerable to adversarial examples. Standard adversarial training enhances robustness but typically fails to explicitly address inter-class feature overlap, a significant contributor to adversarial susceptibility. In this work, we introduce a novel adversarial training framework that actively mitigates inter-class proximity by projecting out inter-class dependencies from adversarial and clean samples in the feature space. Specifically, our approach first identifies the nearest inter-class neighbors for each adversarial sample and subsequently removes projections onto these neighbors to enforce stronger feature separability. Theoretically, we demonstrate that our proposed logits correction reduces the Lipschitz constant of neural networks, thereby lowering the Rademacher complexity, which directly contributes to improved generalization and robustness. Extensive experiments across standard benchmarks including CIFAR-10, CIFAR-100, and SVHN show that our method demonstrates strong performance that is competitive with leading adversarial training techniques, highlighting significant achievements in both robust and clean accuracy. Our findings reveal the importance of addressing inter-class feature proximity explicitly to bolster adversarial robustness in DNNs.
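A minimal sketch of the projection-removal step as we read the abstract (the shapes, the distance-based neighbor criterion, and all names are our assumptions, not the authors' code): for each sample, find its nearest neighbor from a different class in feature space and subtract the component of the feature that lies along that neighbor.

```python
import torch

def remove_interclass_projection(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """feats: (N, D) feature vectors; labels: (N,) class labels.
    Returns features with the projection onto the nearest inter-class neighbor removed."""
    dists = torch.cdist(feats, feats)                              # pairwise distances
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    dists = dists.masked_fill(same_class, float("inf"))            # only look across classes
    nn_idx = dists.argmin(dim=1)                                   # nearest inter-class neighbor
    n = feats[nn_idx]                                              # neighbor features
    proj = (feats * n).sum(dim=1, keepdim=True) / (n * n).sum(dim=1, keepdim=True).clamp_min(1e-12)
    return feats - proj * n                                        # subtract the projection

feats = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
cleaned = remove_interclass_projection(feats, labels)
```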


【2】Instance-level Performance Prediction for Long-form Generation Tasks
标题:长格式生成任务的实例级性能预测
链接:https://arxiv.org/abs/2509.07309

作者:Hsu, Alexander Braylan, Yiheng Su, Omar Alonso, Matthew Lease
摘要:我们激励和分享一个新的基准,实例级的性能预测的长形式生成任务具有多方面的,细粒度的质量指标。我们的任务,模型和度量不可知的配方预测连续的评价指标分数只给出黑盒模型的输入和输出。除了预测指标得分的点估计值之外,基准测试还需要推断预测区间以量化点估计值周围的不确定性。评估跨越11个长格式数据集/任务,每个任务具有多个LLM,基线和指标。我们表明,分数可以有效地预测整个长形式生成任务使用少至16个训练样本。总的来说,我们引入了一个新颖而有用的任务,一个推动进展的有价值的基准,以及今天可以实际采用的基线。
摘要:We motivate and share a new benchmark for instance-level performance prediction of long-form generation tasks having multi-faceted, fine-grained quality metrics. Our task-, model- and metric-agnostic formulation predicts continuous evaluation metric scores given only black-box model inputs and outputs. Beyond predicting point estimates of metric scores, the benchmark also requires inferring prediction intervals to quantify uncertainty around point estimates. Evaluation spans 11 long-form datasets/tasks with multiple LLMs, baselines, and metrics per task. We show that scores can be effectively predicted across long-form generation tasks using as few as 16 training examples. Overall, we introduce a novel and useful task, a valuable benchmark to drive progress, and baselines ready for practical adoption today.


【3】Adversarial Attacks on Audio Deepfake Detection: A Benchmark and Comparative Study
标题:音频Deepfake检测的对抗攻击:基准和比较研究
链接:https://arxiv.org/abs/2509.07132

作者:in, Muhammad Umar Farooq, Awais Khan, Khalid Mahmood Malik
摘要:生成式人工智能的广泛使用在制作高度逼真的deepfake方面取得了显着的成功,对各种语音生物识别应用构成了严重威胁,包括说话人验证,语音生物识别,音频会议和刑事调查。为了解决这个问题,已经提出了几种最先进的(SoTA)音频深度伪造检测(ADD)方法来识别生成AI签名,以区分真实和深度伪造音频。然而,这些方法的有效性被隐藏生成签名的反取证(AF)攻击严重破坏。这些AF攻击涵盖了广泛的技术,包括统计修改(例如,音调移位、滤波、噪声添加和量化)和基于优化的攻击(例如,FGSM、PGD、C \& W和DeepFool)。在本文中,我们研究了SoTA ADD方法,并提供了一个比较分析,以突出它们在暴露deepfake签名方面的有效性,以及它们在对抗条件下的漏洞。我们使用两类方法在五个deepfake基准数据集上对ADD方法进行了广泛的评估:原始方法和基于频谱图的方法。这种比较分析使人们能够更深入地了解SoTA ADD方法对抗各种AF攻击的优势和局限性。它不仅突出了ADD方法的弱点,而且还为现实世界的语音生物特征识别设计更强大和更通用的检测器提供了信息。它将进一步指导未来的研究,开发适应性防御策略,可以有效地对抗不断发展的AF技术。
摘要 :The widespread use of generative AI has shown remarkable success in producing highly realistic deepfakes, posing a serious threat to various voice biometric applications, including speaker verification, voice biometrics, audio conferencing, and criminal investigations. To counteract this, several state-of-the-art (SoTA) audio deepfake detection (ADD) methods have been proposed to identify generative AI signatures to distinguish between real and deepfake audio. However, the effectiveness of these methods is severely undermined by anti-forensic (AF) attacks that conceal generative signatures. These AF attacks span a wide range of techniques, including statistical modifications (e.g., pitch shifting, filtering, noise addition, and quantization) and optimization-based attacks (e.g., FGSM, PGD, C \& W, and DeepFool). In this paper, we investigate the SoTA ADD methods and provide a comparative analysis to highlight their effectiveness in exposing deepfake signatures, as well as their vulnerabilities under adversarial conditions. We conducted an extensive evaluation of ADD methods on five deepfake benchmark datasets using two categories: raw and spectrogram-based approaches. This comparative analysis enables a deeper understanding of the strengths and limitations of SoTA ADD methods against diverse AF attacks. It does not only highlight vulnerabilities of ADD methods, but also informs the design of more robust and generalized detectors for real-world voice biometrics. It will further guide future research in developing adaptive defense strategies that can effectively counter evolving AF techniques.
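For reference, the Fast Gradient Sign Method listed among the optimization-based AF attacks perturbs an input $x$ with label $y$ as

$$x_{\mathrm{adv}} = x + \epsilon\,\operatorname{sign}\big(\nabla_x J(\theta, x, y)\big),$$

where $J$ is the model's loss and $\epsilon$ bounds the perturbation; PGD applies this step iteratively with projection back onto the $\epsilon$-ball.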


【4】Random Forest Stratified K-Fold Cross Validation on SYN DoS Attack SD-IoV
标题:针对SD-IoV上SYN DoS攻击的随机森林分层K折交叉验证
链接:https://arxiv.org/abs/2509.07016

作者:Arif Hakimi Zamrai, Kamaludin Mohd Yusof
摘要:针对软件定义的车联网(SD-IoV)背景下普遍存在的TCP SYN洪水攻击问题,本研究解决了快速发展的车辆通信系统中网络安全的重大挑战。本研究的重点是优化随机森林分类器模型,以实现最大的准确率和最短的检测时间,从而提高车辆网络的安全性。该方法涉及预处理包含SYN攻击实例的数据集,采用特征缩放和标签编码技术,并将分层K折叠交叉验证应用于目标关键指标,如准确性,精度,召回率和F1分数。这项研究的所有指标的平均值为0.999998,SYN DoS攻击检测时间为0.24秒。结果表明,经过微调的随机森林模型配置了20个估计器,深度为10,可以有效区分正常和恶意流量,具有高准确性和最短的检测时间,这对SD-IoV网络至关重要。这种方法标志着一个重大的进步,并引入了检测SYN洪水攻击的最先进的算法,结合了高准确性和最短的检测时间。它通过提供针对TCP SYN洪水攻击的强大解决方案,同时保持网络效率和可靠性,为车辆网络安全做出贡献。
摘要:In response to the prevalent concern of TCP SYN flood attacks within the context of Software-Defined Internet of Vehicles (SD-IoV), this study addresses the significant challenge of network security in rapidly evolving vehicular communication systems. This research focuses on optimizing a Random Forest Classifier model to achieve maximum accuracy and minimal detection time, thereby enhancing vehicular network security. The methodology involves preprocessing a dataset containing SYN attack instances, employing feature scaling and label encoding techniques, and applying Stratified K-Fold cross-validation to target key metrics such as accuracy, precision, recall, and F1-score. This research achieved an average value of 0.999998 for all metrics with a SYN DoS attack detection time of 0.24 seconds. Results show that the fine-tuned Random Forest model, configured with 20 estimators and a depth of 10, effectively differentiates between normal and malicious traffic with high accuracy and minimal detection time, which is crucial for SD-IoV networks. This approach marks a significant advancement and introduces a state-of-the-art algorithm in detecting SYN flood attacks, combining high accuracy with minimal detection time. It contributes to vehicular network security by providing a robust solution against TCP SYN flood attacks while maintaining network efficiency and reliability.
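A minimal sketch of the evaluation protocol described in the abstract: a Random Forest with 20 estimators and depth 10, scored with stratified k-fold cross-validation on accuracy, precision, recall, and F1. The fold count of 10 is our assumption (the abstract does not state it), and dataset loading is left to the reader.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import StandardScaler

def evaluate(X: np.ndarray, y: np.ndarray, n_splits: int = 10) -> dict:
    """Stratified k-fold evaluation of a binary normal-vs-SYN classifier."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    for train_idx, test_idx in skf.split(X, y):
        scaler = StandardScaler().fit(X[train_idx])                # feature scaling, fit on train only
        clf = RandomForestClassifier(n_estimators=20, max_depth=10, random_state=42)
        clf.fit(scaler.transform(X[train_idx]), y[train_idx])
        pred = clf.predict(scaler.transform(X[test_idx]))
        scores["accuracy"].append(accuracy_score(y[test_idx], pred))
        scores["precision"].append(precision_score(y[test_idx], pred))
        scores["recall"].append(recall_score(y[test_idx], pred))
        scores["f1"].append(f1_score(y[test_idx], pred))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```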


【5】Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters
标题:利用Lorenzetti生成合成数据,用于高能物理量热计中的时间序列异常检测
链接:https://arxiv.org/abs/2509.07451

作者:gia, Bogdan Malaescu
备注:4 pages, 2 figures, Submission to SciPost proceedings for EuCAIFCon 2025
摘要:多元时间序列的异常检测是保证物理实验数据质量的关键。准确识别意外错误或缺陷发生的时刻至关重要,但由于标签稀缺,异常类型未知以及维度之间的复杂相关性,因此具有挑战性。为了解决标记数据的稀缺性和不可靠性,我们使用Lorenzetti模拟器来生成具有注入量热计异常的合成事件。然后,我们评估了几种时间序列异常检测方法的灵敏度,包括基于transformer和其他深度学习模型。这里采用的方法是通用的,适用于不同的检测器设计和缺陷。
摘要:Anomaly detection in multivariate time series is crucial to ensure the quality of data coming from a physics experiment. Accurately identifying the moments when unexpected errors or defects occur is essential, yet challenging due to scarce labels, unknown anomaly types, and complex correlations across dimensions. To address the scarcity and unreliability of labelled data, we use the Lorenzetti Simulator to generate synthetic events with injected calorimeter anomalies. We then assess the sensitivity of several time series anomaly detection methods, including transformer-based and other deep learning models. The approach employed here is generic and applicable to different detector designs and defects.


半/弱/无/有监督|不确定性|主动学习(6篇)

【1】Methodological Insights into Structural Causal Modelling and Uncertainty-Aware Forecasting for Economic Indicators
标题:经济指标结构因果建模与不确定性感知预测的方法论见解
链接:https://arxiv.org/abs/2509.07036

作者:Cerutti
备注:Accepted at the 2nd edition of the Workshop in AI and Finance at ECAI-2025
摘要:本文提出了一种结合因果发现和不确定性预测的金融时间序列分析方法。作为一个案例研究,我们专注于美国四个关键的宏观经济指标- GDP,经济增长,通货膨胀和失业-我们应用LPCMCI框架与高斯过程距离相关性(GPDC)来揭示1970年至2021年季度数据中的动态因果关系。我们的研究结果揭示了一个强大的单向因果关系从经济增长到GDP,并强调了有限的连接通货膨胀,这表明潜在因素的影响。失业率表现出很强的自回归相关性,促使其作为概率预测的案例研究。利用Chronos框架,一个为时间序列训练的大型语言模型,我们对失业率进行了zero-shot预测。这种方法可以提前一个季度和两个季度提供准确的预测,而不需要针对特定任务的培训。至关重要的是,该模型的不确定性感知预测产生90%的置信区间,通过统计学原理的偏差分析实现有效的异常检测。这项研究证明了因果结构学习与概率语言模型相结合的价值,以告知经济政策和提高预测的鲁棒性。
摘要:This paper presents a methodological approach to financial time series analysis by combining causal discovery and uncertainty-aware forecasting. As a case study, we focus on four key U.S. macroeconomic indicators -- GDP, economic growth, inflation, and unemployment -- and we apply the LPCMCI framework with Gaussian Process Distance Correlation (GPDC) to uncover dynamic causal relationships in quarterly data from 1970 to 2021. Our results reveal a robust unidirectional causal link from economic growth to GDP and highlight the limited connectivity of inflation, suggesting the influence of latent factors. Unemployment exhibits strong autoregressive dependence, motivating its use as a case study for probabilistic forecasting. Leveraging the Chronos framework, a large language model trained for time series, we perform zero-shot predictions on unemployment. This approach delivers accurate forecasts one and two quarters ahead, without requiring task-specific training. Crucially, the model's uncertainty-aware predictions yield 90% confidence intervals, enabling effective anomaly detection through statistically principled deviation analysis. This study demonstrates the value of combining causal structure learning with probabilistic language models to inform economic policy and enhance forecasting robustness.
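以下为一个示意性草图,仅演示“由概率预测的90%置信区间做统计偏差式异常检测”这一思路;此处的预测样本为随机占位数据,并未调用论文实际使用的Chronos模型。

import numpy as np

# 占位数据:forecast_samples 表示某概率预测模型(论文中为 Chronos)对未来两个季度
# 失业率的采样轨迹,actual 为随后观测到的真实值
forecast_samples = np.random.normal(4.0, 0.3, size=(500, 2))   # (采样数, 预测步数)
actual = np.array([4.1, 5.6])

lo = np.quantile(forecast_samples, 0.05, axis=0)   # 90% 置信区间下界
hi = np.quantile(forecast_samples, 0.95, axis=0)   # 90% 置信区间上界
is_anomaly = (actual < lo) | (actual > hi)         # 落在区间之外即标记为异常
print(list(zip(actual, lo, hi, is_anomaly)))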


【2】The Protocol Genome A Self Supervised Learning Framework from DICOM Headers
标题:协议基因组:一个基于DICOM头信息的自监督学习框架
链接:https://arxiv.org/abs/2509.06995

作者:eph
摘要:在本文中,我们介绍了Protocol Genome,这是一个自监督学习系统,可以从DICOM头中学习相关性,并在完全保留的外部验证中实现AUROC 0.901(vs 0.847基线)和ECE 0.036(vs 0.058)。我们的方法还提高了跨模态(CT,MRI,CXR)和跨供应商的校准与鲁棒性。临床成像通过PACS/DICOM流转,其中扫描参数选择(扫描仪品牌/型号、序列、卷积核、kVp、TR/TE和层厚)会对对比度、噪声和伪影产生影响。这些潜在的混杂因素阻碍了仅基于图像的网络跨站点泛化。我们将结构化的DICOM头字段视为标签,学习既感知协议又具备临床鲁棒性的图像表示。Protocol Genome获得去标识化头字段的token化嵌入,并使用以下方法将其与图像特征联合建模:(1)协议-图像对比学习,(2)掩码协议预测,以及(3)协议-协议翻译。基于126万项研究(7个卫生系统,31台扫描仪,3个供应商;CT,MR,CR/DR),我们在以下任务上进行实验:(A)胸部CT肺栓塞(PE)分诊,(B)脑MRI胶质瘤分级,和(C)胸部X线片心脏肥大检测。相对于强SSL基线(SimCLR,MAE)以及ImageNet迁移学习,Protocol Genome与更高的外部AUROC相关(+0.046:PE,+0.058:胶质瘤,+0.041:心脏肥大),并获得25-37%的校准改善(p < 0.01,DeLong检验)。虽然增益可能取决于任务,但在仅使用10-20%的标注数据时仍能保持。从临床角度来看,该技术减少了协议边界处的假阳性,并适用于PACS(DICOM C-FIND/C-MOVE,DICOMweb QIDO/WADO)。我们发布了模型卡和部署指南,并附带去标识化与偏倚审计。
摘要:In this paper, we introduce the Protocol Genome, a self-supervised learning system that learns correlations from DICOM headers and achieves AUROC 0.901 (vs 0.847 baseline) and ECE 0.036 (vs 0.058) on fully held-out external validation. Our method also improves calibration and robustness across modalities (CT, MRI, CXR) and vendors. Clinical imaging is funneled through PACS/DICOM, where procedure choices (scanner make/model, sequence, kernel, kVp, TR/TE, and slice thickness) have consequences for contrast, noise, and artifact. These latent confounders impede the generalization of image-only networks across sites. We consider structured DICOM headers as a label and learn protocol-aware but clinically robust image representations. Protocol Genome obtains tokenized embeddings of de-identified header fields and models them along with image features using: (1) protocol-image contrastive learning, (2) masked protocol prediction, and (3) protocol-protocol translation. With 1.26M studies (7 health systems, 31 scanners, 3 vendors; CT, MR, CR/DR), we experiment on: (A) chest CT triage for PE, (B) brain MRI glioma grading, and (C) chest radiograph cardiomegaly detection. Relative to strong SSL baselines (SimCLR, MAE) as well as ImageNet transfer, Protocol Genome (+0.046: PE, +0.058: glioma, +0.041: cardiomegaly) is associated with higher external AUROC; 25-37% calibration improvements are obtained (p < 0.01, DeLong tests). While the gains may be task-dependent, they are preserved with 10-20% of labeled data. From a clinical point of view, the technique reduces false positives at protocol borders and is applicable in a PACS (DICOM C-FIND/C-MOVE, DICOMweb QIDO/WADO). We publish a model card and deployment guide, complete with both de-identification and bias audits.
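作为示意,下面给出协议-图像对比学习(InfoNCE形式)的一个最小PyTorch草图;图像编码器与DICOM头编码器的输出在此用随机张量代替,temperature等参数为假设值,并非论文的实际实现。

import torch
import torch.nn.functional as F

def protocol_image_contrastive_loss(img_emb, proto_emb, temperature=0.07):
    # img_emb / proto_emb: (batch, dim),分别来自图像编码器与协议(DICOM 头)编码器
    img_emb = F.normalize(img_emb, dim=-1)
    proto_emb = F.normalize(proto_emb, dim=-1)
    logits = img_emb @ proto_emb.t() / temperature          # 两两相似度
    targets = torch.arange(img_emb.size(0))                 # 配对样本位于对角线
    # 对称的 InfoNCE:图像->协议 与 协议->图像
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = protocol_image_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())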


【3】DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining
标题:DIET-CP:轻量级且数据高效的自我监督持续预训练
链接:https://arxiv.org/abs/2509.06990

作者:as, Natalie Montesino, Jakob Ambsdorf, David Klindt, Randall Balestriero
摘要:持续的预训练为基础模型适应新的目标领域提供了一个很有前途的解决方案。然而,在专业领域,可用的数据集通常非常小,限制了为大规模预训练开发的SSL方法的适用性,并使超参数搜索不可行。此外,预训练模型通常仅作为骨干权重发布,缺乏继续预训练的重要信息。我们建议使用DIET-CP来弥合这一差距,这是一种简单的持续预训练策略,任何强大的基础模型都可以转向新的数据分布。DIET-CP依赖于一个非常简单的目标,不需要标签,并且没有引入比监督微调更多的超参数。它在数据模式和主干选择上都很稳定,同时为最先进的模型(如仅使用1000张图像的DINOv 3)提供了显着的性能提升。
摘要:Continued pretraining offers a promising solution for adapting foundation models to a new target domain. However, in specialized domains, available datasets are often very small, limiting the applicability of SSL methods developed for large-scale pretraining and making hyperparameter search infeasible. In addition, pretrained models are usually released as backbone-weights only, lacking important information to continue pretraining. We propose to bridge this gap with DIET-CP, a simple continued pretraining strategy, where any strong foundation model can be steered towards the new data distribution of interest. DIET-CP relies on a very simple objective, requires no labels, and introduces no more hyperparameters than supervised finetuning. It is stable across data modalities and backbone choices, while providing a significant performance boost for state-of-the-art models such as DINOv3 using only 1000 images.


【4】A Kriging-HDMR-based surrogate model with sample pool-free active learning strategy for reliability analysis
标题:基于Kriging-HDMR的代理模型,具有无样本池主动学习策略用于可靠性分析
链接:https://arxiv.org/abs/2509.06978

作者:Li, Hanyu Liao, Suiyin Chen
摘要 :在可靠性工程中,随着随机变量数目的增加,传统的代理模型会遇到“维数灾难”。虽然具有高维模型表示(HDMR)的主动学习Kriging代理方法可以有效地近似高维函数,并广泛应用于优化问题,但很少有专门针对可靠性分析的研究,该研究优先考虑关键区域的预测精度,而不是整个域的均匀精度。本研究发展一种主动学习代理模型方法,以Kriging-HDMR模型为基础,进行可靠性分析。所提出的方法有利于近似高维极限状态函数,通过一个复合表示从多个低维子代理模型构建。代理建模框架的体系结构包括三个不同的阶段:为所有随机变量开发单变量子代理模型,确定耦合变量子代理模型的要求,以及构建耦合变量子代理模型。根据各阶段的特点,以不确定性方差、预测均值、样本位置和样本间距离为目标,建立了试验设计样本选择的优化数学模型。采用无候选样本池的方法来实现信息样本的选择。数值实验表明,该方法在求解高维可靠性问题时具有较高的计算效率,同时保持了较强的预测精度.
摘要:In reliability engineering, conventional surrogate models encounter the "curse of dimensionality" as the number of random variables increases. While the active learning Kriging surrogate approaches with high-dimensional model representation (HDMR) enable effective approximation of high-dimensional functions and are widely applied to optimization problems, there are rare studies specifically focused on reliability analysis, which prioritizes prediction accuracy in critical regions over uniform accuracy across the entire domain. This study develops an active learning surrogate model method based on the Kriging-HDMR modeling for reliability analysis. The proposed approach facilitates the approximation of high-dimensional limit state functions through a composite representation constructed from multiple low-dimensional sub-surrogate models. The architecture of the surrogate modeling framework comprises three distinct stages: developing single-variable sub-surrogate models for all random variables, identifying the requirements for coupling-variable sub-surrogate models, and constructing the coupling-variable sub-surrogate models. Optimization mathematical models for selection of design of experiment samples are formulated based on each stage's characteristics, with objectives incorporating uncertainty variance, predicted mean, sample location and inter-sample distances. A candidate sample pool-free approach is adopted to achieve the selection of informative samples. Numerical experiments demonstrate that the proposed method achieves high computational efficiency while maintaining strong predictive accuracy in solving high-dimensional reliability problems.


【5】Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space
标题:再生核Hilbert空间中自监督学习的核VICReg
链接:https://arxiv.org/abs/2509.07289

作者:panj, Benyamin Ghojogh, Paul Fieguth
摘要:自监督学习(SSL)已经成为表示学习的一个强大范式,它通过优化几何目标(如对增广的不变性、方差保持和特征去相关)进行学习,而无需标签。然而,大多数现有方法在欧几里得空间中操作,限制了它们捕获非线性依赖关系和几何结构的能力。在这项工作中,我们提出了Kernel VICReg,这是一种新的自监督学习框架,它将VICReg目标提升到再生核希尔伯特空间(RKHS)中。通过对损失中的方差、不变性和协方差各项进行核化,我们得到了一个通用的公式,该公式在双中心化核矩阵和希尔伯特-施密特范数上运算,从而在没有显式映射的情况下实现非线性特征学习。我们证明了Kernel VICReg不仅避免了表示坍缩,而且提高了复杂或小规模数据任务的性能。在MNIST、CIFAR-10、STL-10、TinyImageNet和ImageNet-100上的实证评估显示,其相对于Euclidean VICReg的收益是一致的,尤其是在非线性结构突出的数据集上。UMAP可视化进一步证实,基于核的嵌入表现出更好的等距性和类分离。我们的研究结果表明,核化SSL目标是连接经典核方法与现代表示学习的一个很有前途的方向。
摘要:Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives--such as invariance to augmentations, variance preservation, and feature decorrelation--without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss-variance, invariance, and covariance--we obtain a general formulation that operates on double-centered kernel matrices and Hilbert-Schmidt norms, enabling nonlinear feature learning without explicit mappings.   We demonstrate that Kernel VICReg not only avoids representational collapse but also improves performance on tasks with complex or small-scale data. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations further confirm that kernel-based embeddings exhibit better isometry and class separation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.
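下面的草图仅演示其中两个构件的计算方式:双中心化核矩阵,以及以Frobenius(即Hilbert-Schmidt)范数度量两视图核矩阵差异的不变性项(以RBF核为例);方差与协方差项的具体核化形式以论文为准,此处的表示向量为随机占位数据。

import numpy as np

def rbf_gram(X, gamma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq)

def double_center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # 中心化矩阵 H = I - (1/n) 11^T
    return H @ K @ H

# 两个增广视图的表示(随机占位数据)
Z1, Z2 = np.random.randn(64, 128), np.random.randn(64, 128)
K1c, K2c = double_center(rbf_gram(Z1)), double_center(rbf_gram(Z2))
invariance_term = np.linalg.norm(K1c - K2c, ord="fro") ** 2   # HS 范数意义下的视图差异
print(invariance_term)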


【6】A Quantum Bagging Algorithm with Unsupervised Base Learners for Label Corrupted Datasets
标题:面向标签损坏数据集的基于无监督基学习器的量子装袋算法
链接:https://arxiv.org/abs/2509.07040

作者:thi, Sanjeev Kumar
摘要:噪声弹性量子机器学习(QML)算法的发展在噪声中间尺度量子(NISQ)时代至关重要。在这项工作中,我们提出了一个量子装袋框架,使用QMeans聚类作为基础学习者,以减少预测方差,提高对标签噪声的鲁棒性。与建立在监督学习器上的装袋框架不同,我们的方法利用了QMeans的无监督性质,结合了通过基于QRAM的采样和通过多数投票进行装袋聚合的量子自举。通过对噪声分类和回归任务的广泛模拟,我们证明了所提出的量子装袋算法使用KMeans对其经典算法进行了改进,同时表现出比监督装袋方法更大的标签损坏弹性。这突出了无监督量子装袋在从不可靠数据中学习方面的潜力。
摘要:The development of noise-resilient quantum machine learning (QML) algorithms is critical in the noisy intermediate-scale quantum (NISQ) era. In this work, we propose a quantum bagging framework that uses QMeans clustering as the base learner to reduce prediction variance and enhance robustness to label noise. Unlike bagging frameworks built on supervised learners, our method leverages the unsupervised nature of QMeans, combined with quantum bootstrapping via QRAM-based sampling and bagging aggregation through majority voting. Through extensive simulations on both noisy classification and regression tasks, we demonstrate that the proposed quantum bagging algorithm performs comparably to its classical counterpart using KMeans while exhibiting greater resilience to label corruption than supervised bagging methods. This highlights the potential of unsupervised quantum bagging in learning from unreliable data.
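为帮助理解“自助采样 + 无监督基学习器 + 多数投票”的整体流程,这里给出一个经典(非量子)的示意草图:用KMeans代替QMeans、用普通自助采样代替基于QRAM的量子自助采样;它只说明聚合逻辑,并非论文的量子实现。

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(300, 5)                      # 随机占位数据
n_learners, n_clusters = 7, 3
all_labels = []
for seed in range(n_learners):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), len(X))       # 自助采样(经典版,代替基于 QRAM 的量子采样)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X[idx])
    all_labels.append(km.predict(X))            # 每个基学习器对全体样本给出簇标签

votes = np.stack(all_labels, axis=1)            # (样本数, 基学习器数)
# 多数投票聚合;实际使用前还需先对不同学习器的簇标签做对齐,这里仅示意流程
bagged = np.array([np.bincount(row).argmax() for row in votes])
print(bagged[:10])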


迁移|Zero/Few/One-Shot|自适应(5篇)

【1】Leveraging Support Vector Regression for Outcome Prediction in Personalized Ultra-fractionated Stereotactic Adaptive Radiotherapy
标题:利用支持向量回归预测个性化超分割立体定向自适应放疗的结局
链接:https://arxiv.org/abs/2509.07872

作者: Steve Jiang, Robert Timmerman, Hao Peng
摘要:个性化超分割立体定向自适应放射治疗(PULSAR)是一种新型治疗方法,以延长间隔的脉冲形式提供辐射。通过回归模型准确预测大体肿瘤体积(GTV)的变化具有重要的预后价值。本研究旨在建立一个基于多组学的支持向量回归模型来预测GTV的变化。基于放射组学(MRI图像)和剂量组学(剂量图)特征,分析了39例患者(69例脑转移瘤)的回顾性队列。计算Delta特征以捕获两个时间点之间的相对变化。采用最小绝对收缩和选择算子(Lasso)算法和基于权重或频率的排序标准实现了特征选择流水线。使用决定系数(R2)和相对均方根误差(RRMSE)对具有各种核的SVR模型进行评估。采用10次重复的五重交叉验证来减轻小数据大小的限制。整合放射组学、剂量组学及其对应物的多组学模型优于个体组学模型。相对于单个时间点的特征,Δ-放射组学特征在提高预测准确性方面起着关键作用。表现最好的模型达到了0.743的R2和0.022的RRMSE。所提出的多组学支持向量回归模型在预测GTV的连续变化方面表现出良好的性能。它提供了一种更加量化和个性化的方法,以帮助PULSAR中的患者选择和治疗调整。
摘要:Personalized ultra-fractionated stereotactic adaptive radiotherapy (PULSAR) is a novel treatment that delivers radiation in pulses of protracted intervals. Accurate prediction of gross tumor volume (GTV) changes through regression models has substantial prognostic value. This study aims to develop a multi-omics based support vector regression (SVR) model for predicting GTV change. A retrospective cohort of 39 patients with 69 brain metastases was analyzed, based on radiomics (MRI images) and dosiomics (dose maps) features. Delta features were computed to capture relative changes between two time points. A feature selection pipeline using least absolute shrinkage and selection operator (Lasso) algorithm with weight- or frequency-based ranking criterion was implemented. SVR models with various kernels were evaluated using the coefficient of determination (R2) and relative root mean square error (RRMSE). Five-fold cross-validation with 10 repeats was employed to mitigate the limitation of small data size. Multi-omics models that integrate radiomics, dosiomics, and their delta counterparts outperform individual-omics models. Delta-radiomic features play a critical role in enhancing prediction accuracy relative to features at single time points. The top-performing model achieves an R2 of 0.743 and an RRMSE of 0.022. The proposed multi-omics SVR model shows promising performance in predicting continuous change of GTV. It provides a more quantitative and personalized approach to assist patient selection and treatment adjustment in PULSAR.
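以下为该评估流程(Lasso特征选择 + 核SVR + 10次重复的五折交叉验证)的一个最小sklearn草图;特征矩阵与GTV变化目标为随机占位数据,alpha、保留特征数等参数均为假设值。

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.random.rand(69, 200)     # 占位:69 个病灶的 (delta-)radiomics/dosiomics 特征
y = np.random.rand(69)          # 占位:GTV 变化

model = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.001, max_iter=10000),     # 按 Lasso 权重绝对值排序
                    threshold=-np.inf, max_features=20),     # 保留前 20 个特征(示意)
    SVR(kernel="rbf", C=1.0),
)
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(r2.mean())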


【2】EMORF-II: Adaptive EM-based Outlier-Robust Filtering with Correlated Measurement Noise
标题:EMORF-II:具有相关测量噪声的基于EM的自适应离群点鲁棒滤波
链接:https://arxiv.org/abs/2509.07415

作者:jal, Aamir Hussain Chughtai, Muhammad Tahir
备注:6 pages, 4 figures, To appear in MLSP 2025 proceedings
摘要:我们提出了一个基于学习的离群点鲁棒滤波器,适用于测量噪声可能相关的一般设置。由于它是基于EM的离群点鲁棒滤波器(EMORF)的增强版本,我们称之为EMORF-II。由于它额外具备在推理过程中随离群点检测一并学习离群点特征的能力,EMORF-II具有更强的离群点抑制能力。数值实验证实,与最先进的方法相比,该方法在精度上有所提升,但计算开销有所增加。然而值得庆幸的是,其计算复杂度的阶与其他实用方法相当,使其成为适用于多种应用的有用选择。
摘要:We present a learning-based outlier-robust filter for a general setup where the measurement noise can be correlated. Since it is an enhanced version of EM-based outlier robust filter (EMORF), we call it EMORF-II. As it is equipped with an additional powerful feature to learn the outlier characteristics during inference along with outlier-detection, EMORF-II has improved outlier-mitigation capability. Numerical experiments confirm performance gains as compared to the state-of-the-art methods in terms of accuracy with an increased computational overhead. However, thankfully the computational complexity order remains on par with other practical methods making it a useful choice for diverse applications.


【3】SAM$^{*}$: Task-Adaptive SAM with Physics-Guided Rewards
标题:SAM$^{*}$:具有物理指导奖励的任务自适应SAM
链接:https://arxiv.org/abs/2509.07047

作者:rakati, Utkarsh Pratiush, Sheryl L. Sanchez, Aditya Raghavan, Delia J. Milliron, Mahshid Ahmadi, Philip D. Rack, Sergei V. Kalinin
备注:19 pages, 8 figures
摘要:图像分割是显微镜中的一项关键任务,对于准确分析和解释复杂的视觉数据至关重要。该任务可以使用在特定领域数据集上训练的自定义模型、从预训练模型进行迁移学习、或具有广泛适用性的基础模型来完成。然而,基础模型通常带有大量不透明的调优参数,需要大量手动优化,限制了其用于实时流数据分析的可用性。在这里,我们介绍了一种基于奖励函数的优化来微调基础模型的方法,并以Meta的SAM(Segment Anything Model,分割一切模型)框架为例说明了这一方法。可以构造奖励函数来表示成像系统的物理特性,包括颗粒尺寸分布、几何形状和其他标准。通过集成奖励驱动的优化框架,我们增强了SAM的适应性和性能,从而得到一个优化的变体SAM$^{*}$,它更好地满足各种分割任务的要求,尤其是支持实时流数据分割。我们证明了这种方法在显微成像中的有效性,在该领域,精确的分割对于分析细胞结构、材料界面和纳米尺度特征至关重要。
摘要 :Image segmentation is a critical task in microscopy, essential for accurately analyzing and interpreting complex visual data. This task can be performed using custom models trained on domain-specific datasets, transfer learning from pre-trained models, or foundational models that offer broad applicability. However, foundational models often present a considerable number of non-transparent tuning parameters that require extensive manual optimization, limiting their usability for real-time streaming data analysis. Here, we introduce a reward function-based optimization to fine-tune foundational models and illustrate this approach for SAM (Segment Anything Model) framework by Meta. The reward functions can be constructed to represent the physics of the imaged system, including particle size distributions, geometries, and other criteria. By integrating a reward-driven optimization framework, we enhance SAM's adaptability and performance, leading to an optimized variant, SAM$^{*}$, that better aligns with the requirements of diverse segmentation tasks and particularly allows for real-time streaming data segmentation. We demonstrate the effectiveness of this approach in microscopy imaging, where precise segmentation is crucial for analyzing cellular structures, material interfaces, and nanoscale features.


【4】Individualized and Interpretable Sleep Forecasting via a Two-Stage Adaptive Spatial-Temporal Model
标题:通过两阶段自适应时空模型进行个性化和可解释的睡眠预测
链接:https://arxiv.org/abs/2509.06974

作者:g, Elisabeth Wilhelm
摘要:睡眠质量对健康有很大影响。因此,医疗保健提供者和个人需要方便可靠的预测工具进行预防干预。本文介绍了一种可解释的,个性化的两阶段自适应时空模型预测睡眠质量分数。我们提出的框架结合了多尺度卷积层来模拟多个输入变量之间的空间交互,递归层和注意力机制来捕获长期的时间依赖性,以及两阶段域自适应策略来增强泛化。在训练期间应用第一自适应阶段以减轻训练集上的过拟合。在第二阶段,采用无源测试时自适应机制,以适应新的用户,而不需要标签的模型。我们使用五种输入窗口大小(3,5,7,9和11天)和五种预测窗口大小(1,3,5,7和9天)进行了各种实验。我们的模型始终优于时间序列预测基线方法,包括长短期记忆(LSTM),Informer,PatchTST和TimesNet。最佳性能是在三天的输入窗口和一天的预测窗口下实现的,得到的均方根误差(RMSE)为0.216。此外,该模型即使在较长的预测范围内也表现出良好的预测性能(例如,三天预测窗口的RMSE为0.257),突出了其在现实世界应用中的实用性。我们还进行了可解释性分析,以研究不同的特征如何影响睡眠质量。这些研究结果证明,所提出的框架提供了一个强大的,自适应的,可解释的解决方案,个性化的睡眠预测使用稀疏数据从商业可穿戴设备。
摘要:Sleep quality significantly impacts well-being. Therefore, healthcare providers and individuals need accessible and reliable forecasting tools for preventive interventions. This paper introduces an interpretable, individualized two-stage adaptive spatial-temporal model for predicting sleep quality scores. Our proposed framework combines multi-scale convolutional layers to model spatial interactions across multiple input variables, recurrent layers and attention mechanisms to capture long-term temporal dependencies, and a two-stage domain adaptation strategy to enhance generalization. The first adaptation stage is applied during training to mitigate overfitting on the training set. In the second stage, a source-free test-time adaptation mechanism is employed to adapt the model to new users without requiring labels. We conducted various experiments with five input window sizes (3, 5, 7, 9, and 11 days) and five prediction window sizes (1, 3, 5, 7, and 9 days). Our model consistently outperformed time series forecasting baseline approaches, including Long Short-Term Memory (LSTM), Informer, PatchTST, and TimesNet. The best performance was achieved with a three-day input window and a one-day prediction window, yielding a root mean square error (RMSE) of 0.216. Furthermore, the model demonstrated good predictive performance even for longer forecasting horizons (e.g, with a 0.257 RMSE for a three-day prediction window), highlighting its practical utility for real-world applications. We also conducted an explainability analysis to examine how different features influence sleep quality. These findings proved that the proposed framework offers a robust, adaptive, and explainable solution for personalized sleep forecasting using sparse data from commercial wearable devices.


【5】Cross-device Zero-shot Label Transfer via Alignment of Time Series Foundation Model Embeddings
标题:通过时间序列基础模型嵌入对齐的跨设备Zero-Shot标签传输
链接:https://arxiv.org/abs/2509.06966

作者:avindra, Arijit Sehanobish
备注:5 pages, 3 figures, 1 table. tl;dr: Adversarial alignment of Time-Series Foundation Model (TSFM) embeddings enables transfer of high-quality clinical labels from medical-grade to consumer-grade wearables, enabling zero-shot prediction of gestational age without requiring paired data
摘要:高质量的、经过医学验证的标签适用于临床活动记录数据,但不适用于Apple Watch等无处不在的消费者可穿戴设备。手动标记可穿戴设备数据是昂贵的,并且不能扩展。本文提供了一种新的框架,该框架将有价值的标签从源域(例如,体动记录)到目标域(例如,Apple Watch),而无需配对数据。我们不使用原始时间序列信号,而是使用时间序列基础模型(TSFM)将这两个域投影到共享的潜在嵌入空间中,并开发了一个新的框架来对齐跨设备表示。我们的方法,TSFM嵌入的对抗对齐迫使源和目标嵌入的分布在这个空间内对齐,促进标签在设备类型之间的传输。
摘要:High-quality, medically validated labels exist for clinical actigraphy data but not for ubiquitous consumer wearables like the Apple Watch. Manually labeling wearables data is expensive and doesn't scale. This paper offers a novel framework that transfers valuable labels from a source domain (e.g., actigraphy) to a target domain (e.g., Apple Watch) without requiring paired data. Instead of working with raw time-series signals, we project both domains into a shared latent embedding space using time-series foundation models (TSFMs) and develop a new framework to align the cross-device representations. Our method, Adversarial Alignment of TSFM Embeddings forces the distributions of source and target embeddings to align within this space, facilitating label transfer across device type.
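下面给出对抗式嵌入对齐的一个极简草图:判别器区分源域(体动记录)与目标域(Apple Watch)的TSFM嵌入,目标域侧的映射通过对抗损失使两者分布难以区分;嵌入、网络结构与超参数均为假设的占位设置,并非论文的确切方法。

import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))   # 域判别器
proj = nn.Linear(256, 256)                                              # 目标域嵌入的对齐映射
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(proj.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

src = torch.randn(32, 256)          # 占位:源域(体动记录)的 TSFM 嵌入
tgt = torch.randn(32, 256)          # 占位:目标域(消费级可穿戴)的 TSFM 嵌入

for _ in range(100):
    # 1) 训练判别器:区分源域嵌入(标签 1)与对齐后的目标域嵌入(标签 0)
    d_loss = bce(disc(src), torch.ones(32, 1)) + \
             bce(disc(proj(tgt).detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) 训练对齐映射:让判别器把目标域嵌入误判为源域,从而拉近两个分布
    g_loss = bce(disc(proj(tgt)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(d_loss.item(), g_loss.item())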


强化学习(3篇)

【1】The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
标题:分歧的选择:通过可验证的奖励缓解强化学习中多样性崩溃的一个被忽视的关键
链接:https://arxiv.org/abs/2509.07430

作者:Jiaran Hao, Jason Klein Liu, Zhijian Zhou, Xiaoyu Tan, Wei Chu, Zhe Wang, Shirui Pan, Chao Qu, Yuan Qi
备注:26 pages, 5 figures
摘要:使用具有可验证奖励的强化学习(RLVR)微调大型语言模型(LLM)的一个核心悖论是:尽管单次尝试准确率(Pass@1)有所提高,多次尝试性能(Pass@k)却经常下降。这通常伴随着灾难性遗忘,即模型失去了先前获得的技能。虽然已经提出了各种方法,但令人惊讶的是,散度项的选择及其作用作为一种主动的解决方案尚未得到研究。我们认为,标准的RLVR目标(无论是使用寻模的反向KL散度,还是完全放弃散度项)都缺乏一个关键的知识保留机制。反向KL通过收窄策略主动加速这种衰退,而其缺失也无法防止模型偏离其多样化的知识库。我们提出了一个根本性的观点转变:把散度项本身用作解决方案。我们的框架Diversity-Preserving Hybrid RL(DPH-RL)利用质量覆盖型(mass-covering)f-散度(如前向KL和JS散度)作为排练(rehearsal)机制。通过持续参照初始策略,这种方法迫使模型保持广泛的解覆盖范围。在数学和SQL生成上的大量实验表明,DPH-RL不仅解决了Pass@k退化的问题,而且在域内和域外都同时改进了Pass@1和Pass@k。此外,DPH-RL的训练效率更高,因为它使用生成函数(generator function)计算f-散度,只需要从初始策略中采样,而不需要在线参考模型。我们的工作突出了改进RLVR的一个关键但被忽视的维度,表明恰当选择散度度量是构建更通用、更多样化推理模型的有力工具。
摘要:A central paradox in fine-tuning Large Language Models (LLMs) with Reinforcement Learning with Verifiable Reward (RLVR) is the frequent degradation of multi-attempt performance (Pass@k) despite improvements in single-attempt accuracy (Pass@1). This is often accompanied by catastrophic forgetting, where models lose previously acquired skills. While various methods have been proposed, the choice and function of the divergence term have been surprisingly unexamined as a proactive solution. We argue that standard RLVR objectives -- both those using the mode-seeking reverse KL-divergence and those forgoing a divergence term entirely -- lack a crucial mechanism for knowledge retention. The reverse-KL actively accelerates this decay by narrowing the policy, while its absence provides no safeguard against the model drifting from its diverse knowledge base. We propose a fundamental shift in perspective: using the divergence term itself as the solution. Our framework, Diversity-Preserving Hybrid RL (DPH-RL), leverages mass-covering f-divergences (like forward-KL and JS-divergence) to function as a rehearsal mechanism. By continuously referencing the initial policy, this approach forces the model to maintain broad solution coverage. Extensive experiments on math and SQL generation demonstrate that DPH-RL not only resolves the Pass@k degradation but improves both Pass@1 and Pass@k in- and out-of-domain. Additionally, DPH-RL is more training-efficient because it computes f-divergence using generator functions, requiring only sampling from the initial policy and no online reference model. Our work highlights a crucial, overlooked axis for improving RLVR, demonstrating that the proper selection of a divergence measure is a powerful tool for building more general and diverse reasoning models.
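作为示意,下面演示“仅用来自初始策略的样本、以生成函数形式估计前向KL惩罚”的计算思路:取 f(t) = -log t、t 为当前策略与初始策略的概率比,对初始策略样本取均值即得 KL(pi_init || pi_theta) 的蒙特卡洛估计;对数概率为占位数据,并非论文完整的训练流程。

import torch

def forward_kl_penalty(logp_theta, logp_init):
    # logp_theta / logp_init:当前策略与初始策略对同一批由初始策略采样得到的回复的对数概率
    # 取 f(t) = -log t、t = exp(logp_theta - logp_init),对初始策略样本取均值,
    # 即为 KL(pi_init || pi_theta) 的蒙特卡洛估计
    return (logp_init - logp_theta).mean()

logp_init = torch.randn(64) - 5.0                    # 占位数据
logp_theta = logp_init + 0.1 * torch.randn(64)
print(forward_kl_penalty(logp_theta, logp_init).item())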


【2】An efficient deep reinforcement learning environment for flexible job-shop scheduling
标题:高效的深度强化学习环境,用于灵活的车间作业调度
链接:https://arxiv.org/abs/2509.07019

作者:u, Xuefeng Yan, Mingqiang Wei, Donghai Guan
摘要:柔性作业车间调度问题(FJSP)是一个经典的组合优化问题,在现实世界中有着广泛的应用。为了生成快速准确的FJSP调度解,已经开发了各种深度强化学习(DRL)调度方法。然而,这些方法主要集中在DRL调度Agent的设计上,忽略了DRL环境的建模。本文提出了一个基于离散事件仿真、按时间顺序推进的简单FJSP DRL调度环境,并提出了一个基于近端策略优化(PPO)的端到端DRL调度模型。在此基础上,基于调度环境中的两个状态变量,提出了FJSP的一种简短的新状态表示,并基于机器的调度区域设计了一种新的易于理解的奖励函数。在公共基准测试实例上的实验结果表明,简单优先级调度规则(PDR)的性能在我们的调度环境中得到了改善,与OR-Tools、元启发式、DRL和PDR调度方法相比,我们的DRL调度模型获得了有竞争力的性能。
摘要:The Flexible Job-shop Scheduling Problem (FJSP) is a classical combinatorial optimization problem that has a wide-range of applications in the real world. In order to generate fast and accurate scheduling solutions for FJSP, various deep reinforcement learning (DRL) scheduling methods have been developed. However, these methods are mainly focused on the design of DRL scheduling Agent, overlooking the modeling of DRL environment. This paper presents a simple chronological DRL environment for FJSP based on discrete event simulation and an end-to-end DRL scheduling model is proposed based on the proximal policy optimization (PPO). Furthermore, a short novel state representation of FJSP is proposed based on two state variables in the scheduling environment and a novel comprehensible reward function is designed based on the scheduling area of machines. Experimental results on public benchmark instances show that the performance of simple priority dispatching rules (PDR) is improved in our scheduling environment and our DRL scheduling model obtains competing performance compared with OR-Tools, meta-heuristic, DRL and PDR scheduling methods.


【3】Reinforcement learning for online hyperparameter tuning in convex quadratic programming
标题:凸二次规划中在线超参数调整的强化学习
链接:https://arxiv.org/abs/2509.07404

作者:rtoncini, Alberto De Marchi, Matthias Gerdts, Simon Gottschalk
摘要:二次规划是现代非线性优化、控制和数据科学的主力。虽然正则化方法在对问题数据的最少假设下提供收敛保证,但它们可能表现出一阶格式典型的缓慢尾部收敛,因此需要大量迭代才能获得高精度解。此外,超参数调整显著影响求解器的性能,但如何找到合适的参数配置仍然是一个悬而未决的研究问题。为了解决这些问题,我们探讨了数据驱动的方法如何加速求解过程。以高精度解为目标,我们专注于一个稳定化的内点法求解器,并仔细处理其双层循环流程和控制参数。我们将展示强化学习可以为简化求解器调参和加快优化过程做出重大贡献。数值实验表明,经过轻量级的训练,学习到的策略可以很好地推广到不同维度的不同问题类别和各种求解器配置。
摘要:Quadratic programming is a workhorse of modern nonlinear optimization, control, and data science. Although regularized methods offer convergence guarantees under minimal assumptions on the problem data, they can exhibit the slow tail-convergence typical of first-order schemes, thus requiring many iterations to achieve high-accuracy solutions. Moreover, hyperparameter tuning significantly impacts on the solver performance but how to find an appropriate parameter configuration remains an elusive research question. To address these issues, we explore how data-driven approaches can accelerate the solution process. Aiming at high-accuracy solutions, we focus on a stabilized interior-point solver and carefully handle its two-loop flow and control parameters. We will show that reinforcement learning can make a significant contribution to facilitating the solver tuning and to speeding up the optimization process. Numerical experiments demonstrate that, after a lightweight training, the learned policy generalizes well to different problem classes with varying dimensions and to various solver configurations.


医学相关(5篇)

【1】Addressing the Cold-Start Problem for Personalized Combination Drug Screening
标题:解决个性化组合药物筛查的冷启动问题
链接:https://arxiv.org/abs/2509.07850

作者:e Mathelin, Christopher Tosh, Wesley Tansey
摘要:在肿瘤学中个性化联合治疗需要导航可能的药物和剂量组合的巨大空间,这是一项通过详尽的实验在很大程度上仍然不可行的任务。患者衍生模型的最新发展使得高通量离体筛选成为可能,但可行的实验数量有限。此外,紧密的治疗窗口使得收集分子谱信息(例如RNA-seq)作为指导药物反应预测的手段是不切实际的。这导致了一个具有挑战性的冷启动问题:当没有关于患者的先验信息时,我们如何选择最具信息性的组合进行早期测试?我们提出了一种策略,该策略利用基于历史药物反应数据构建的预训练深度学习模型。该模型提供了药物组合的嵌入和剂量水平的重要性分数,从而能够对初始实验进行有原则的选择。我们将药物嵌入的聚类与剂量加权机制相结合,以确保功能多样性,该剂量加权机制根据其历史信息量确定剂量的优先级。对大规模药物组合数据集的回顾性模拟表明,与基线相比,我们的方法大大提高了初始筛选效率,为个性化组合药物筛选中更有效的早期决策提供了可行的途径。
摘要:Personalizing combination therapies in oncology requires navigating an immense space of possible drug and dose combinations, a task that remains largely infeasible through exhaustive experimentation. Recent developments in patient-derived models have enabled high-throughput ex vivo screening, but the number of feasible experiments is limited. Further, a tight therapeutic window makes gathering molecular profiling information (e.g. RNA-seq) impractical as a means of guiding drug response prediction. This leads to a challenging cold-start problem: how do we select the most informative combinations to test early, when no prior information about the patient is available? We propose a strategy that leverages a pretrained deep learning model built on historical drug response data. The model provides both embeddings for drug combinations and dose-level importance scores, enabling a principled selection of initial experiments. We combine clustering of drug embeddings to ensure functional diversity with a dose-weighting mechanism that prioritizes doses based on their historical informativeness. Retrospective simulations on large-scale drug combination datasets show that our method substantially improves initial screening efficiency compared to baselines, offering a viable path for more effective early-phase decision-making in personalized combination drug screens.
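下面是“对药物组合嵌入聚类以保证功能多样性、再按剂量信息量权重在每个簇内挑选初始实验”这一选择策略的示意草图;嵌入、剂量得分与实验预算均为假设的占位数据。

import numpy as np
from sklearn.cluster import KMeans

emb = np.random.randn(500, 64)        # 占位:预训练模型给出的药物组合嵌入
dose_score = np.random.rand(500)      # 占位:各组合对应剂量的历史信息量得分
k = 8                                 # 占位:初始实验预算(即簇数)

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb)
picks = []
for c in range(k):                    # 每个簇内选取剂量权重最高的组合,兼顾多样性与信息量
    idx = np.where(labels == c)[0]
    picks.append(int(idx[np.argmax(dose_score[idx])]))
print(sorted(picks))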


【2】BDPM: A Machine Learning-Based Feature Extractor for Parkinson's Disease Classification via Gut Microbiota Analysis
标题:BDPM:一种基于机器学习的特征提取器,用于通过肠道微生物群分析进行帕金森病分类
链接:https://arxiv.org/abs/2509.07723

作者:ixiu Hua, Bo Zhao
备注:11 pages, 7 figures
摘要:背景资料:帕金森病仍然是一种主要的神经退行性疾病,误诊率高,主要是由于依赖于临床评定量表。最近的研究表明,肠道微生物群与帕金森病之间存在很强的关联,这表明微生物组成可能是一种有前途的生物标志物。尽管基于肠道微生物群的深度学习模型显示出早期预测的潜力,但大多数方法依赖于单一分类器,并且经常忽略菌株间的相关性或时间动态。因此,迫切需要针对微生物组数据量身定制的更稳健的特征提取方法。方法:我们提出了BDPM(一种基于机器学习的特征提取器,通过肠道微生物群分析进行帕金森病分类)。首先,我们收集了39名帕金森病患者及其健康配偶的肠道微生物群概况,以确定差异丰富的类群。其次,我们开发了一个创新的特征选择框架RFRE(随机森林结合递归特征消除),整合生态学知识,以提高生物可解释性。最后,我们设计了一个混合分类模型来捕获微生物组数据中的时间和空间模式。
摘要:Background: Parkinson's disease remains a major neurodegenerative disorder with high misdiagnosis rates, primarily due to reliance on clinical rating scales. Recent studies have demonstrated a strong association between gut microbiota and Parkinson's disease, suggesting that microbial composition may serve as a promising biomarker. Although deep learning models based ongut microbiota show potential for early prediction, most approaches rely on single classifiers and often overlook inter-strain correlations or temporal dynamics. Therefore, there is an urgent need for more robust feature extraction methods tailored to microbiome data. Methods: We proposed BDPM (A Machine Learning-Based Feature Extractor for Parkinson's Disease Classification via Gut Microbiota Analysis). First, we collected gut microbiota profiles from 39 Parkinson's patients and their healthy spouses to identify differentially abundant taxa. Second, we developed an innovative feature selection framework named RFRE (Random Forest combined with Recursive Feature Elimination), integrating ecological knowledge to enhance biological interpretability. Finally, we designed a hybrid classification model to capture temporal and spatial patterns in microbiome data.
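以下为RFRE思路(随机森林结合递归特征消除)的一个最小sklearn草图;微生物丰度矩阵与标签为随机占位数据,保留特征数等参数为假设值,论文中的生态学先验并未体现。

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X = np.random.rand(78, 300)                        # 占位:78 个样本 x 300 个菌群丰度特征
y = np.array([0, 1] * 39)                          # 占位:帕金森患者 vs 健康配偶

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=20,                       # 递归剔除,直至保留 20 个特征(假设值)
    step=10,
)
selector.fit(X, y)
print(np.where(selector.support_)[0])              # 被保留特征的索引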


【3】MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification
标题:MedicalPatchNet:一种基于补丁的自我解释人工智能架构,用于胸部X射线分类
链接:https://arxiv.org/abs/2509.07477

作者:ienholt, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
摘要:深度神经网络在放射学图像分类方面表现出色,但经常存在可解释性差的问题,限制了临床接受度。我们提出了MedicalPatchNet,一个内在的自我解释架构的胸部X射线分类,透明的属性决定不同的图像区域。MedicalPatchNet将图像分割成不重叠的补丁,独立地对每个补丁进行分类,并聚合预测,从而实现每个补丁的诊断贡献的直观可视化,而无需事后技术。在CheXpert数据集(223,414张图像)上进行训练后,MedicalPatchNet与EfficientNet-B 0的分类性能(AUROC 0.907 vs. 0.908)相匹配,同时大幅提高了可解释性:MedicalPatchNet在CheXlocalize数据集上表现出更高的病理定位准确性(平均命中率0.485 vs. 0.376,Grad-CAM)。通过提供明确、可靠的解释,即使是非人工智能专家也可以访问,MedicalPatchNet减轻了与捷径学习相关的风险,从而提高了临床信任度。我们的模型是公开的,具有可重复的训练和推理脚本,有助于在医学成像领域实现更安全、可解释的人工智能辅助诊断。我们公开代码:https://github.com/TruhnLab/MedicalPatchNet
摘要 :Deep neural networks excel in radiological image classification but frequently suffer from poor interpretability, limiting clinical acceptance. We present MedicalPatchNet, an inherently self-explainable architecture for chest X-ray classification that transparently attributes decisions to distinct image regions. MedicalPatchNet splits images into non-overlapping patches, independently classifies each patch, and aggregates predictions, enabling intuitive visualization of each patch's diagnostic contribution without post-hoc techniques. Trained on the CheXpert dataset (223,414 images), MedicalPatchNet matches the classification performance (AUROC 0.907 vs. 0.908) of EfficientNet-B0, while substantially improving interpretability: MedicalPatchNet demonstrates substantially improved interpretability with higher pathology localization accuracy (mean hit-rate 0.485 vs. 0.376 with Grad-CAM) on the CheXlocalize dataset. By providing explicit, reliable explanations accessible even to non-AI experts, MedicalPatchNet mitigates risks associated with shortcut learning, thus improving clinical trust. Our model is publicly available with reproducible training and inference scripts and contributes to safer, explainable AI-assisted diagnostics across medical imaging domains. We make the code publicly available: https://github.com/TruhnLab/MedicalPatchNet
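作为示意,下面给出“将图像切分为不重叠patch、独立分类、再平均patch级预测得到整图预测”这一结构的最小PyTorch草图;patch大小、分类头结构与类别数均为假设值,并非论文的实际网络。

import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    def __init__(self, patch=32, n_classes=14):
        super().__init__()
        self.patch = patch
        self.head = nn.Sequential(        # 占位的小型 patch 级分类器(实际可换成任意骨干)
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes),
        )

    def forward(self, x):                 # x: (B, 1, H, W)
        B, C, H, W = x.shape
        p = self.patch
        patches = x.unfold(2, p, p).unfold(3, p, p)                    # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous().view(-1, C, p, p)
        logits = self.head(patches)                                    # 每个 patch 独立分类
        logits = logits.view(B, -1, logits.shape[-1])                  # (B, patch 数, 类别数)
        return logits.mean(dim=1), logits  # 整图预测 = patch 预测的平均;patch 级 logits 即为贡献图

model = PatchClassifier()
img_logits, patch_logits = model(torch.randn(2, 1, 224, 224))
print(img_logits.shape, patch_logits.shape)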


【4】CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
标题:CancerGUIDE:通过内部分歧估计理解癌症指南
链接:https://arxiv.org/abs/2509.07325

作者:ell, Noel C. F. Codella, Sam Preston, Peniel Argaw, Wen-wai Yim, Zelalem Gero, Cliff Wong, Rajesh Jena, Eric Horvitz, Amanda K. Hall, Ruican Rachel Zhong, Jiachen Li, Shrey Jain, Mu Wei, Matthew Lungren, Hoifung Poon
摘要:美国国家综合癌症网络(NCCN)为癌症治疗提供循证指南。将复杂的患者介绍转化为符合指南的治疗建议是时间密集型的,需要专业知识,并且容易出错。大型语言模型(LLM)功能的进步有望减少生成治疗建议所需的时间并提高准确性。我们提出了一种基于LLM代理的方法来自动生成非小细胞肺癌(NSCLC)患者的指南一致的治疗轨迹。我们的贡献是三方面的。首先,我们构建了一个包含121例NSCLC患者的新型纵向数据集,其中包括临床表现、诊断结果和病史,每个数据集都由经过委员会认证的肿瘤学家用相应的NCCN指南轨迹进行了专业注释。其次,我们证明了现有的LLM拥有特定领域的知识,能够为模型开发和评估生成高质量的代理基准,与专家注释的基准实现强相关性(斯皮尔曼系数r=0.88,RMSE = 0.08)。第三,我们开发了一种混合方法,将昂贵的人类注释与模型一致性信息相结合,以创建预测患者相关指南的代理框架,以及验证预测准确性的元分类器,并校准治疗建议的置信度得分(AUROC=0.800),这是一种传达输出准确性、定制性能权衡和支持法规遵从性的关键能力。这项工作为临床可行的基于LLM的指南遵守系统建立了一个框架,该系统平衡了准确性,可解释性和监管要求,同时降低了注释成本,为自动化临床决策支持提供了可扩展的途径。
摘要:The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. Advances in large language model (LLM) capabilities promise to reduce the time required to generate treatment recommendations and improve accuracy. We present an LLM agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer (NSCLC). Our contributions are threefold. First, we construct a novel longitudinal dataset of 121 cases of NSCLC patients that includes clinical encounters, diagnostic results, and medical histories, each expertly annotated with the corresponding NCCN guideline trajectories by board-certified oncologists. Second, we demonstrate that existing LLMs possess domain-specific knowledge that enables high-quality proxy benchmark generation for both model development and evaluation, achieving strong correlation (Spearman coefficient r=0.88, RMSE = 0.08) with expert-annotated benchmarks. Third, we develop a hybrid approach combining expensive human annotations with model consistency information to create both the agent framework that predicts the relevant guidelines for a patient, as well as a meta-classifier that verifies prediction accuracy with calibrated confidence scores for treatment recommendations (AUROC=0.800), a critical capability for communicating the accuracy of outputs, custom-tailoring tradeoffs in performance, and supporting regulatory compliance. This work establishes a framework for clinically viable LLM-based guideline adherence systems that balance accuracy, interpretability, and regulatory requirements while reducing annotation costs, providing a scalable pathway toward automated clinical decision support.


【5】PUUMA (Placental patch and whole-Uterus dual-branch U-Mamba-based Architecture): Functional MRI Prediction of Gestational Age at Birth and Preterm Risk
标题:PUUMA(胎盘局部patch与全子宫双分支、基于U-Mamba的架构):出生时胎龄与早产风险的功能MRI预测
链接:https://arxiv.org/abs/2509.07042

作者:ardo-Rojas, Levente Baljer, Jordina Aviles Verdera, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter
备注:11 pages, 4 figures, 2 tables, to be published in with Springer - Lecture Notes in Computer Science, as part of PerInatal, Preterm and Paediatric Image (PIPPI) Analysis workshop held in conjunction with MICCAI 2025
摘要:早产是儿童期死亡和终身发病的主要原因。其复杂和多因素的起源限制了当前临床预测因子的有效性,并阻碍了最佳护理。在这项研究中,开发了一种双分支深度学习架构(PUUMA),使用来自295例妊娠的T2*胎儿MRI数据预测出生时的胎龄(GA),所涉人群具有异质性且不均衡。该模型整合了全子宫的全局特征和胎盘的局部特征。其性能与基于宫颈长度(由经验丰富的临床医生从解剖MRI中测得)的线性回归以及其他深度学习架构进行了基准比较。出生时GA的预测使用平均绝对误差进行评估,早产分类则使用准确性、敏感性和特异性进行评估。尽管数据集中存在明显的类别不平衡,全自动的基于MRI的流水线和宫颈长度回归在检测早产方面都实现了相当的平均绝对误差(3周)和良好的灵敏度(0.67)。这些结果为从功能MRI自动预测出生时胎龄提供了概念验证,并强调了全子宫功能成像在识别高危妊娠中的价值。此外,我们还证明,目前临床实践中尚未常规使用的、从MRI手动获得的高精度宫颈长度测量可提供有价值的预测信息。未来的工作将集中在扩大队列规模和纳入额外的器官特异性成像,以提高泛化能力和预测性能。
摘要:Preterm birth is a major cause of mortality and lifelong morbidity in childhood. Its complex and multifactorial origins limit the effectiveness of current clinical predictors and impede optimal care. In this study, a dual-branch deep learning architecture (PUUMA) was developed to predict gestational age (GA) at birth using T2* fetal MRI data from 295 pregnancies, encompassing a heterogeneous and imbalanced population. The model integrates both global whole-uterus and local placental features. Its performance was benchmarked against linear regression using cervical length measurements obtained by experienced clinicians from anatomical MRI and other Deep Learning architectures. The GA at birth predictions were assessed using mean absolute error. Accuracy, sensitivity, and specificity were used to assess preterm classification. Both the fully automated MRI-based pipeline and the cervical length regression achieved comparable mean absolute errors (3 weeks) and good sensitivity (0.67) for detecting preterm birth, despite pronounced class imbalance in the dataset. These results provide a proof of concept for automated prediction of GA at birth from functional MRI, and underscore the value of whole-uterus functional imaging in identifying at-risk pregnancies. Additionally, we demonstrate that manual, high-definition cervical length measurements derived from MRI, not currently routine in clinical practice, offer valuable predictive information. Future work will focus on expanding the cohort size and incorporating additional organ-specific imaging to improve generalisability and predictive performance.


聚类(2篇)

【1】Dimensionally Reduced Open-World Clustering: DROWCULA
标题:降维的开放世界聚类:DROWCULA
链接:https://arxiv.org/abs/2509.07184

作者:zbey, Dimitrios I. Diochnos
备注:16 pages, 12 Figures, 12 Tables
摘要:处理带注释的数据是监督学习的基石。然而,为实例提供标签是一项需要大量人力的任务。一些关键的现实应用使情况更加复杂:无论在感兴趣的任务中已经识别了多少个标签,未来都可能出现对应于新类别的样本。毫不奇怪,在这种所谓的“开放世界”背景下,以前的工作主要集中在半监督方法上。   聚焦于图像分类,有点矛盾的是,我们针对确定特定数据集中新类别这一问题提出了一个完全无监督的方法。我们的方法依赖于使用Vision Transformers估计聚类的数量,后者利用注意力机制来生成向量嵌入。此外,我们采用流形学习技术,通过利用数据的内在几何结构来改进这些嵌入,从而提高整体图像聚类性能。总的来说,我们在CIFAR-10、CIFAR-100、ImageNet-100和Tiny ImageNet上建立了单模态聚类和新类发现的最新结果;无论簇的数量是否提前已知,都是如此。该代码可从以下网址获得:https://github.com/DROWCULA/DROWCULA。
摘要:Working with annotated data is the cornerstone of supervised learning. Nevertheless, providing labels to instances is a task that requires significant human effort. Several critical real-world applications make things more complicated because no matter how many labels may have been identified in a task of interest, it could be the case that examples corresponding to novel classes may appear in the future. Not unsurprisingly, prior work in this, so-called, `open-world' context has focused a lot on semi-supervised approaches.   Focusing on image classification, somehow paradoxically, we propose a fully unsupervised approach to the problem of determining the novel categories in a particular dataset. Our approach relies on estimating the number of clusters using Vision Transformers, which utilize attention mechanisms to generate vector embeddings. Furthermore, we incorporate manifold learning techniques to refine these embeddings by exploiting the intrinsic geometry of the data, thereby enhancing the overall image clustering performance. Overall, we establish new State-of-the-Art results on single-modal clustering and Novel Class Discovery on CIFAR-10, CIFAR-100, ImageNet-100, and Tiny ImageNet. We do so, both when the number of clusters is known or unknown ahead of time. The code is available at: https://github.com/DROWCULA/DROWCULA.
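下面是“在视觉嵌入上结合流形降维与轮廓系数搜索以估计簇数”这一步骤的示意草图;嵌入用随机向量代替,降维此处以t-SNE示意,候选簇数范围为假设值,具体实现以论文代码库为准。

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

emb = np.random.randn(500, 384)                    # 占位:ViT 输出的图像嵌入
emb_low = TSNE(n_components=2, random_state=0).fit_transform(emb)   # 流形降维(此处以 t-SNE 示意)

best_k, best_s = None, -1.0
for k in range(2, 16):                             # 在候选簇数范围内搜索(假设值)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb_low)
    s = silhouette_score(emb_low, labels)
    if s > best_s:
        best_k, best_s = k, s
print("估计的簇数:", best_k)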


【2】Toward Quantum Utility in Finance: A Robust Data-Driven Algorithm for Asset Clustering
标题:迈向金融中的量子效用:一种稳健的数据驱动的资产集群算法
链接:https://arxiv.org/abs/2509.07766

作者:arma, Supreeth Mysore Venkatesh, Pushkin Kachroo
备注:9 pages, 2 figures, International Quantum Engineering conference and exhibition (QUEST-IS 2025)
摘要:基于收益相关性的金融资产聚类是投资组合优化和统计套利的基本任务。然而,经典的聚类方法在处理有符号相关结构时往往不足,通常需要有损变换和启发式假设,例如固定数量的聚类。在这项工作中,我们应用基于图的联盟结构生成算法(GCS-Q)直接集群签署,加权图,而不依赖于这样的转换。GCS-Q将每个划分步骤公式化为一个QUBO问题,使其能够利用量子退火来有效探索指数级大的解空间。我们在合成和真实世界的金融数据上验证了我们的方法,并对最先进的经典算法(如SPONGE和k-Medoids)进行了基准测试。我们的实验表明,GCS-Q始终实现更高的聚类质量,调整后的兰德指数和结构平衡的罚款,同时动态确定集群的数量。这些结果突出了近期量子计算在金融应用中基于图的无监督学习的实际效用。
摘要:Clustering financial assets based on return correlations is a fundamental task in portfolio optimization and statistical arbitrage. However, classical clustering methods often fall short when dealing with signed correlation structures, typically requiring lossy transformations and heuristic assumptions such as a fixed number of clusters. In this work, we apply the Graph-based Coalition Structure Generation algorithm (GCS-Q) to directly cluster signed, weighted graphs without relying on such transformations. GCS-Q formulates each partitioning step as a QUBO problem, enabling it to leverage quantum annealing for efficient exploration of exponentially large solution spaces. We validate our approach on both synthetic and real-world financial data, benchmarking against state-of-the-art classical algorithms such as SPONGE and k-Medoids. Our experiments demonstrate that GCS-Q consistently achieves higher clustering quality, as measured by Adjusted Rand Index and structural balance penalties, while dynamically determining the number of clusters. These results highlight the practical utility of near-term quantum computing for graph-based unsupervised learning in financial applications.


自动驾驶|车辆|车道检测等(1篇)

【1】A Knowledge-Guided Cross-Modal Feature Fusion Model for Local Traffic Demand Prediction
标题:基于知识引导的局部交通需求预测跨模式特征融合模型
链接:https://arxiv.org/abs/2509.06976

作者:ang, Pengfei Xu, Guobin Wu, Jian Liang, Ruiyang Dong, Yunhai Wang, Xuan Song
摘要:交通需求预测在智能交通系统中起着至关重要的作用。现有的交通预测模型主要依赖于时间交通数据,结合人类知识和经验的城市交通需求预测的努力有限。然而,在现实场景中,来自人类日常生活的交通知识和经验显着影响精确的交通预测。这些知识和经验可以指导模型发现交通数据中的潜在模式,从而提高预测的准确性和鲁棒性。为此,本文提出将结构化的时态交通数据与代表人类知识和经验的文本数据相结合,从而形成一种新的知识引导的跨模态特征表示学习(KGCM)模型用于交通需求预测。基于区域交通特征,我们使用大型语言模型结合人工编写和修订构建先验知识数据集,涵盖区域和全球知识和经验。然后,KGCM模型通过设计的局部和全局自适应图网络以及跨模态特征融合机制来学习多模态数据特征。提出了一种基于推理的动态更新策略,可以动态优化图模型的参数,实现最佳性能。多个流量数据集上的实验表明,我们的模型准确地预测未来的流量需求,并优于现有的最先进的(SOTA)模型。
摘要 :Traffic demand prediction plays a critical role in intelligent transportation systems. Existing traffic prediction models primarily rely on temporal traffic data, with limited efforts incorporating human knowledge and experience for urban traffic demand forecasting. However, in real-world scenarios, traffic knowledge and experience derived from human daily life significantly influence precise traffic prediction. Such knowledge and experiences can guide the model in uncovering latent patterns within traffic data, thereby enhancing the accuracy and robustness of predictions. To this end, this paper proposes integrating structured temporal traffic data with textual data representing human knowledge and experience, resulting in a novel knowledge-guided cross-modal feature representation learning (KGCM) model for traffic demand prediction. Based on regional transportation characteristics, we construct a prior knowledge dataset using a large language model combined with manual authoring and revision, covering both regional and global knowledge and experiences. The KGCM model then learns multimodal data features through designed local and global adaptive graph networks, as well as a cross-modal feature fusion mechanism. A proposed reasoning-based dynamic update strategy enables dynamic optimization of the graph model's parameters, achieving optimal performance. Experiments on multiple traffic datasets demonstrate that our model accurately predicts future traffic demand and outperforms existing state-of-the-art (SOTA) models.


联邦学习|隐私保护|加密(2篇)

【1】FedTeddi: Temporal Drift and Divergence Aware Scheduling for Timely Federated Edge Learning
标题:FedTeddi:及时联合边缘学习的时间漂移和分歧感知调度
链接:https://arxiv.org/abs/2509.07342

作者:i, Yuxuan Sun, Tan Chen, Wei Chen, Sheng Zhou, Zhisheng Niu
备注:Submitted to IEEE for possible publication
摘要:联邦边缘学习(FEEL)支持通过无线网络跨分布式客户端进行协作模型训练,而不会暴露原始数据。虽然大多数现有研究假设静态数据集,但在现实场景中,客户端可能会持续收集具有时变、非独立同分布(non-i.i.d.)特性的数据。一个关键的挑战是如何以及时而高效的方式使模型适应这种不断演变的数据。在本文中,我们提出了FedTeddi,一种时间漂移与散度感知的调度算法,它在数据动态演变和通信资源受限的情况下促进FEEL的快速收敛。我们首先分别用时间漂移(temporal drift)和集体散度(collective divergence)来量化数据的时间动态性和非独立同分布特性,并将它们表示为分类任务中类分布之间的推土机距离(EMD)。然后,我们提出了一个新的优化目标,并开发了一个联合调度和带宽分配算法,使FEEL系统能够快速学习新数据而不遗忘先前的知识。实验结果表明,与基准方法相比,该算法获得了更高的测试精度和更快的收敛速度,在CIFAR-10和CIFAR-100上的收敛速度分别比随机调度提高了58.4%和49.2%。
摘要:Federated edge learning (FEEL) enables collaborative model training across distributed clients over wireless networks without exposing raw data. While most existing studies assume static datasets, in real-world scenarios clients may continuously collect data with time-varying and non-independent and identically distributed (non-i.i.d.) characteristics. A critical challenge is how to adapt models in a timely yet efficient manner to such evolving data. In this paper, we propose FedTeddi, a temporal-drift-and-divergence-aware scheduling algorithm that facilitates fast convergence of FEEL under dynamic data evolution and communication resource limits. We first quantify the temporal dynamics and non-i.i.d. characteristics of data using temporal drift and collective divergence, respectively, and represent them as the Earth Mover's Distance (EMD) of class distributions for classification tasks. We then propose a novel optimization objective and develop a joint scheduling and bandwidth allocation algorithm, enabling the FEEL system to learn from new data quickly without forgetting previous knowledge. Experimental results show that our algorithm achieves higher test accuracy and faster convergence compared to benchmark methods, improving the rate of convergence by 58.4% on CIFAR-10 and 49.2% on CIFAR-100 compared to random scheduling.
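作为示意,下面演示如何把类分布之间的差异表示为推土机距离(EMD):时间漂移取同一客户端前后两轮类分布之间的EMD,集体散度在此用单个客户端类分布与全局分布之间的EMD近似示意;各分布均为随机占位数据,具体定义以论文为准。

import numpy as np
from scipy.stats import wasserstein_distance

classes = np.arange(10)                              # 例如 CIFAR-10 的 10 个类别
p_old = np.random.dirichlet(np.ones(10))             # 占位:某客户端上一轮的类分布
p_new = np.random.dirichlet(np.ones(10))             # 占位:该客户端当前的类分布
p_global = np.full(10, 0.1)                          # 占位:全局参考类分布(均匀)

temporal_drift = wasserstein_distance(classes, classes, p_old, p_new)      # 时间漂移
collective_div = wasserstein_distance(classes, classes, p_new, p_global)   # 集体散度(单客户端示意)
print(temporal_drift, collective_div)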


【2】Fed-REACT: Federated Representation Learning for Heterogeneous and Evolving Data
标题:Fed-REACT:针对异类和不断发展的数据的联邦表示学习
链接:https://arxiv.org/abs/2509.07198

作者:n, Usman Akram, Chianing Wang, Haris Vikalo
摘要:受集中式机器学习的高资源成本和隐私问题的影响,联邦学习(FL)已经成为一种有效的替代方案,使客户能够协作训练全局模型,同时保持数据本地化。然而,在实际部署中,客户端数据分布通常会随着时间的推移而演变,并且在客户端之间存在显着差异,从而引入了降低标准FL算法性能的异构性。在这项工作中,我们介绍Fed-REACT,一个联邦学习框架,专为异构和不断发展的客户端数据。Fed-REACT在两个阶段的过程中将表示学习与进化聚类相结合:(1)在第一阶段,每个客户端学习本地模型以从其数据中提取特征表示;(2)在第二阶段,服务器基于这些表示动态地将客户端分组到集群中,并协调特定于任务的模型的集群训练,以实现下游目标,如分类或回归。我们提供了表示学习阶段的理论分析,并实证证明Fed-REACT在现实世界的数据集上实现了卓越的准确性和鲁棒性。
摘要:Motivated by the high resource costs and privacy concerns associated with centralized machine learning, federated learning (FL) has emerged as an efficient alternative that enables clients to collaboratively train a global model while keeping their data local. However, in real-world deployments, client data distributions often evolve over time and differ significantly across clients, introducing heterogeneity that degrades the performance of standard FL algorithms. In this work, we introduce Fed-REACT, a federated learning framework designed for heterogeneous and evolving client data. Fed-REACT combines representation learning with evolutionary clustering in a two-stage process: (1) in the first stage, each client learns a local model to extracts feature representations from its data; (2) in the second stage, the server dynamically groups clients into clusters based on these representations and coordinates cluster-wise training of task-specific models for downstream objectives such as classification or regression. We provide a theoretical analysis of the representation learning stage, and empirically demonstrate that Fed-REACT achieves superior accuracy and robustness on real-world datasets.


推理|分析|理解|解释(10篇)

【1】Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence
标题:学习率预热加速收敛的理论分析
链接:https://arxiv.org/abs/2509.07972

作者 :u, Yuze Ge, Rui Pan, An Kang, Tong Zhang
摘要:学习率预热是训练大规模深度神经网络的一种流行而实用的技术。尽管在实践中取得了巨大的成功,但这种在训练过程开始时逐渐提高学习率的策略的理论优势尚未得到充分理解。为了弥合这一理论与实践之间的差距,我们首先提出了一族新的广义光滑性假设,并从理论和经验两方面验证其适用性。在新的光滑性假设下,我们研究了梯度下降(GD)在确定性和随机性两种情形下的收敛性质。结果表明,学习率预热能够一致地加速GD;在某些特定情况下,带预热的GD的收敛速度最多可比采用非递增学习率调度时快$\Theta(T)$倍,这从优化理论的角度为该策略的好处提供了见解。
摘要:Learning rate warmup is a popular and practical technique in training large-scale deep neural networks. Despite the huge success in practice, the theoretical advantages of this strategy of gradually increasing the learning rate at the beginning of the training process have not been fully understood. To resolve this gap between theory and practice, we first propose a novel family of generalized smoothness assumptions, and validate its applicability both theoretically and empirically. Under the novel smoothness assumption, we study the convergence properties of gradient descent (GD) in both deterministic and stochastic settings. It is shown that learning rate warmup consistently accelerates GD, and GD with warmup can converge at most $\Theta(T)$ times faster than with a non-increasing learning rate schedule in some specific cases, providing insights into the benefits of this strategy from an optimization theory perspective.
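学习率预热本身的一个常见实现草图如下(线性预热到峰值后保持不变);预热步数与基础学习率为假设值,仅用于说明该策略的形式,与论文分析的对象一致但不代表其实验设置。

import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)    # 峰值学习率(假设值)
warmup_steps = 100                                   # 预热步数(假设值)

def lr_lambda(step):
    # 前 warmup_steps 步学习率从接近 0 线性升到峰值,之后保持不变
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
for step in range(300):
    opt.step()
    sched.step()
print(sched.get_last_lr())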


【2】MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
标题:MoE压缩:专家的压缩误差如何影响MoE模型的推理准确性?
链接:https://arxiv.org/abs/2509.07727

作者:a, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu, Xiaoyi Lu, Dan Wang
摘要:随着混合专家推理模型(Mixture of Experts,MoE)在LLM学习领域的广泛应用,如何在有限的GPU内存约束下高效地服务于MoE模型成为一个重要的挑战。将未激活的专家卸载到主存储器已被确定为解决此类问题的有效方法,同时它带来了在GPU存储器和主存储器之间转移专家的挑战。我们需要探索一种有效的方法来压缩专家,并分析压缩误差如何影响推理性能。   为了弥合这一差距,我们建议采用错误有界的有损压缩算法(如SZ 3和CuSZp)来压缩非激活的专家,从而减少数据传输开销在MoE推理。我们在各种基准测试中进行了广泛的实验,并对不同专家的压缩引起的错误如何影响整体推理准确性进行了全面的分析。结果表明,专家在浅层,这是主要负责的注意力机制和输入令牌到矢量表示的转换,表现出最小的退化时,受到有界错误的推理精度。相比之下,中间层专家的错误,这是模型推理的核心,显着损害推理的准确性。有趣的是,在主要负责指令跟踪和输出集成的深层专家中引入有界错误有时会导致推理准确性的提高。
摘要:With the widespread application of Mixture of Experts (MoE) reasoning models in the field of LLM learning, efficiently serving MoE models under limited GPU memory constraints has emerged as a significant challenge. Offloading the non-activated experts to main memory has been identified as an efficient approach to address such a problem, while it brings the challenges of transferring the expert between the GPU memory and main memory. We need to explore an efficient approach to compress the expert and analyze how the compression error affects the inference performance.   To bridge this gap, we propose employing error-bounded lossy compression algorithms (such as SZ3 and CuSZp) to compress non-activated experts, thereby reducing data transfer overhead during MoE inference. We conduct extensive experiments across various benchmarks and present a comprehensive analysis of how compression-induced errors in different experts affect overall inference accuracy. The results indicate that experts in the shallow layers, which are primarily responsible for the attention mechanism and the transformation of input tokens into vector representations, exhibit minimal degradation in inference accuracy when subjected to bounded errors. In contrast, errors in the middle-layer experts, which are central to model reasoning, significantly impair inference accuracy. Interestingly, introducing bounded errors in the deep-layer experts, which are mainly responsible for instruction following and output integration, can sometimes lead to improvements in inference accuracy.


【3】CAViAR: Critic-Augmented Video Agentic Reasoning
标题:CAViAR:评论家增强的视频智能体推理
链接:https://arxiv.org/abs/2509.07680

作者:non, Ahmet Iscen, Arsha Nagrani, Tobias Weyand, Carl Vondrick, Cordelia Schmid
摘要:近年来,视频理解取得了重大进展,模型对短片的感知表现持续上升。然而,最近的多个基准测试,如LVBench,Neptune和ActivityNet-RTL,显示随着查询变得越来越复杂,视频变得越来越长,需要对视频进行复杂推理的任务的性能下降。在这项工作中,我们问:可以利用现有的感知能力,成功地执行更复杂的视频推理?特别是,我们开发了一个大型的语言模型代理访问视频模块作为子代理或工具。而不是像以前的工作(如Visual Programming,ViperGPT和MoReVQA)那样遵循固定的过程来解决查询,代理使用对模块的每次调用的结果来确定后续步骤。受文本推理领域工作的启发,我们引入了一个批评家,以区分成功和不成功的序列从代理的实例。我们表明,我们的代理和评论家的组合在前面提到的数据集上实现了强大的性能。
摘要:Video understanding has seen significant progress in recent years, with models' performance on perception from short clips continuing to rise. Yet, multiple recent benchmarks, such as LVBench, Neptune, and ActivityNet-RTL, show performance wanes for tasks requiring complex reasoning on videos as queries grow more complex and videos grow longer. In this work, we ask: can existing perception capabilities be leveraged to successfully perform more complex video reasoning? In particular, we develop a large language model agent given access to video modules as subagents or tools. Rather than following a fixed procedure to solve queries as in previous work such as Visual Programming, ViperGPT, and MoReVQA, the agent uses the results of each call to a module to determine subsequent steps. Inspired by work in the textual reasoning domain, we introduce a critic to distinguish between instances of successful and unsuccessful sequences from the agent. We show that the combination of our agent and critic achieve strong performance on the previously-mentioned datasets.


【4】K2-Think: A Parameter-Efficient Reasoning System
标题:K2-Think:参数高效的推理系统
链接:https://arxiv.org/abs/2509.07604

作者 :heng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing
备注:To access the K2-Think reasoning system, please visit this https URL
摘要:K2-Think是一个推理系统,具有32 B参数模型,达到最先进的性能,匹配或超越更大的模型,如GPT-OSS 120 B和DeepSeek v3.1。我们的系统建立在Qwen2.5基础模型上,通过结合先进的后训练和测试时计算技术,较小的模型可以在最高水平上竞争。该方法基于六个关键技术支柱:长思想链监督微调,具有可验证奖励的强化学习(RLVR),推理前的推理规划,测试时缩放,推测解码和推理优化硬件,所有这些都使用公开的开源数据集。K2-Think在数学推理方面表现出色,在开源模型的公共基准测试中获得了最先进的分数,同时在代码和科学等其他领域也表现出色。我们的研究结果证实,像K2-Think 32 B这样的参数效率更高的模型可以通过集成的训练后配方与最先进的系统竞争,该配方包括长链思维训练和战略推理时间增强,使开源推理系统更容易获得和负担得起。K2-Think可在k2think.ai上免费获得,通过Cerebras晶圆级引擎提供每秒超过2,000个令牌的一流推理速度。
摘要:K2-Think is a reasoning system that achieves state-of-the-art performance with a 32B parameter model, matching or surpassing much larger models like GPT-OSS 120B and DeepSeek v3.1. Built on the Qwen2.5 base model, our system shows that smaller models can compete at the highest levels by combining advanced post-training and test-time computation techniques. The approach is based on six key technical pillars: Long Chain-of-thought Supervised Finetuning, Reinforcement Learning with Verifiable Rewards (RLVR), Agentic planning prior to reasoning, Test-time Scaling, Speculative Decoding, and Inference-optimized Hardware, all using publicly available open-source datasets. K2-Think excels in mathematical reasoning, achieving state-of-the-art scores on public benchmarks for open-source models, while also performing strongly in other areas such as Code and Science. Our results confirm that a more parameter-efficient model like K2-Think 32B can compete with state-of-the-art systems through an integrated post-training recipe that includes long chain-of-thought training and strategic inference-time enhancements, making open-source reasoning systems more accessible and affordable. K2-Think is freely available at k2think.ai, offering best-in-class inference speeds of over 2,000 tokens per second per request via the Cerebras Wafer-Scale Engine.


【5】Conv4Rec: A 1-by-1 Convolutional AutoEncoder for User Profiling through Joint Analysis of Implicit and Explicit Feedbacks
标题:Conv 4Rec:一个1 x 1卷积自动编码器,用于通过联合分析隐式和显式反馈来进行用户分析
链接:https://arxiv.org/abs/2509.07499

作者:edent, Petr Kasalický, Rodrigo Alves, Hady W. Lauw
备注:Accepted at Transactions on Neural Networks and Learning Systems (TNNLS)
摘要:我们引入了一种新的卷积AutoEncoder架构,用于用户建模和推荐任务,并在现有技术基础上进行了多项改进。首先,我们的模型可以灵活地学习不同交互类型之间的一组关联与组合,并且这种学习方式可以推广到每个用户和物品。其次,我们的模型能够同时从显式评分和采样模式中蕴含的隐式信息(我们称之为"隐式反馈")中学习。它还可以分别预测消费某项内容的概率,以及在被观察到的情况下给予高评分的可能性。这不仅使模型能够对隐式和显式反馈同时进行预测,还增加了预测的信息量:特别是,我们的模型可以识别那些用户原本不太可能自然消费、但一旦接触就可能喜欢的物品。最后,我们为模型给出了若干泛化界;据我们所知,这些属于推荐系统场景下自动编码器最早的一批泛化界限。我们还表明,优化我们的损失函数可以保证恢复交互上的精确采样分布,误差在总变差意义下很小。在多个真实数据集的实验中,尽管两项任务仅依赖单一模型,我们在隐式和显式反馈预测任务上都取得了最先进的性能,并以对每个可能评分的概率分别给出预测的形式获得了额外的可解释性。
摘要:We introduce a new convolutional AutoEncoder architecture for user modelling and recommendation tasks with several improvements over the state of the art. Firstly, our model has the flexibility to learn a set of associations and combinations between different interaction types in a way that carries over to each user and item. Secondly, our model is able to learn jointly from both the explicit ratings and the implicit information in the sampling pattern (which we refer to as `implicit feedback'). It can also make separate predictions for the probability of consuming content and the likelihood of granting it a high rating if observed. This not only allows the model to make predictions for both the implicit and explicit feedback, but also increases the informativeness of the predictions: in particular, our model can identify items which users would not have been likely to consume naturally, but would be likely to enjoy if exposed to them. Finally, we provide several generalization bounds for our model, which to the best of our knowledge, are among the first generalization bounds for auto-encoders in a Recommender Systems setting; we also show that optimizing our loss function guarantees the recovery of the exact sampling distribution over interactions up to a small error in total variation. In experiments on several real-life datasets, we achieve state-of-the-art performance on both the implicit and explicit feedback prediction tasks despite relying on a single model for both, and benefiting from additional interpretability in the form of individual predictions for the probabilities of each possible rating.


【6】EfficientNet in Digital Twin-based Cardiac Arrest Prediction and Analysis
标题:EfficientNet在基于数字孪生的心脏骤停预测与分析中的应用
链接:https://arxiv.org/abs/2509.07388

作者:, Avais Jan, Zafar Iqbal, Muhammad Mumtaz Ali, Mukarram Ali, Murray Patterson
备注:None
摘要:心脏骤停是全球最重大的健康问题之一,早期识别和管理是改善患者预后的关键。在本文中,我们提出了一种将基于EfficientNet的深度学习模型与数字孪生系统相结合的新框架,以改进心脏骤停的早期检测和分析。我们使用复合缩放和EfficientNet来学习心血管图像的特征。与此同时,数字孪生基于从连接到患者的物联网(IoT)设备接收的数据,构建患者真实且个性化的心血管系统模型,这有助于对患者进行持续评估,并评估可能的治疗方案的影响。实验结果表明,所提出的系统预测准确率高,同时也很高效。将深度学习与数字孪生(DT)等高度先进的技术相结合,使得以主动且个性化的方式预测心脏疾病成为可能。
摘要 :Cardiac arrest is one of the biggest global health problems, and early identification and management are key to enhancing the patient's prognosis. In this paper, we propose a novel framework that combines an EfficientNet-based deep learning model with a digital twin system to improve the early detection and analysis of cardiac arrest. We use compound scaling and EfficientNet to learn the features of cardiovascular images. In parallel, the digital twin creates a realistic and individualized cardiovascular system model of the patient based on data received from the Internet of Things (IoT) devices attached to the patient, which can help in the constant assessment of the patient and the impact of possible treatment plans. As shown by our experiments, the proposed system is highly accurate in its prediction abilities and, at the same time, efficient. Combining highly advanced techniques such as deep learning and digital twin (DT) technology presents the possibility of using an active and individual approach to predicting cardiac disease.


【7】Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
标题:超越顺序重排序:重排序引导的搜索改善了推理密集型检索
链接:https://arxiv.org/abs/2509.07163

作者: Tong Chen
摘要:广泛使用的检索-重排序管道面临两个关键限制:其受限于前k个文档的初始检索质量,而基于LLM的重排序器不断增长的计算需求又限制了能够被有效处理的文档数量。我们介绍了重排序器引导搜索(Reranker-Guided-Search, RGS),这是一种绕过上述限制的新方法:它不遵循传统的顺序重排序流程,而是直接按照重排序器的偏好来检索文档。我们的方法在由近似最近邻算法生成的邻近图上执行贪婪搜索,并基于文档相似度有策略地优先选择有希望的文档进行重排序。实验结果表明,在重排序器预算限定为100个文档的情况下,多个基准上的性能均有实质性提升:BRIGHT提升3.5分,FollowIR提升2.9分,M-BEIR提升5.1分。我们的分析表明,在给定固定的嵌入模型与重排序模型组合时,在有限的重排序预算下有策略地选择待重排序的文档可以显著提高检索准确性。
摘要:The widely used retrieve-and-rerank pipeline faces two critical limitations: they are constrained by the initial retrieval quality of the top-k documents, and the growing computational demands of LLM-based rerankers restrict the number of documents that can be effectively processed. We introduce Reranker-Guided-Search (RGS), a novel approach that bypasses these limitations by directly retrieving documents according to reranker preferences rather than following the traditional sequential reranking method. Our method uses a greedy search on proximity graphs generated by approximate nearest neighbor algorithms, strategically prioritizing promising documents for reranking based on document similarity. Experimental results demonstrate substantial performance improvements across multiple benchmarks: 3.5 points on BRIGHT, 2.9 on FollowIR, and 5.1 on M-BEIR, all within a constrained reranker budget of 100 documents. Our analysis suggests that, given a fixed pair of embedding and reranker models, strategically selecting documents to rerank can significantly improve retrieval accuracy under limited reranker budget.
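下面给出按上述思路实现的重排序器引导图搜索的最小示意(Python):高分文档的邻居会以更高优先级进入候选队列,重排序器只在预算内被调用。其中邻近图 `neighbors`、种子文档 `seed_ids`、打分函数 `rerank_score` 以及具体的优先级策略均为示意性假设,并非论文官方实现。

```python
import heapq

def reranker_guided_search(query, seed_ids, neighbors, rerank_score, budget=100):
    """在ANN邻近图上做贪婪搜索,按重排序器偏好优先扩展"有希望"的文档。
    返回 (重排序分数, 文档id) 按分数降序排列。"""
    visited, scored = set(), []
    frontier = [(0.0, d) for d in seed_ids]          # 种子文档以中性优先级进入
    heapq.heapify(frontier)
    while frontier and len(scored) < budget:
        _, doc = heapq.heappop(frontier)
        if doc in visited:
            continue
        visited.add(doc)
        s = rerank_score(query, doc)                 # 计入预算的一次重排序调用
        scored.append((s, doc))
        for nb in neighbors.get(doc, []):            # 高分文档的邻居获得更高优先级
            if nb not in visited:
                heapq.heappush(frontier, (-s, nb))
    return sorted(scored, reverse=True)
```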


【8】Recursive State Inference for Linear PASFA
标题:线性PASFA的递归状态推断
链接:https://arxiv.org/abs/2509.07028

作者:shi
备注:5 pages, 1 figure
摘要:慢特征分析(SFA)作为一种在分类和信号分析中学习缓变特征的方法,近年来受到越来越多的关注。SFA最近的概率扩展能够为分类任务学习有效的表示。值得注意的是,概率自适应慢特征分析(PASFA)将慢特征建模为ARMA过程中的状态,并根据观测估计模型。然而,仍需要开发高效的方法,以便从观测和模型中推断状态(即慢特征)。本文提出了线性PASFA的一种递归扩展。在给定观测和模型的情况下,所提算法对按照ARMA过程演化的状态进行MMSE估计。尽管现有方法在将ARMA过程转化为状态空间模型后使用卡尔曼滤波器来解决该问题,但构成有用表示的原始状态(即慢特征)难以直接恢复。所提出的技术在合成数据集上进行了评估,以证明其正确性。
摘要:Slow feature analysis (SFA), as a method for learning slowly varying features in classification and signal analysis, has attracted increasing attention in recent years. Recent probabilistic extensions to SFA learn effective representations for classification tasks. Notably, the Probabilistic Adaptive Slow Feature Analysis models the slow features as states in an ARMA process and estimate the model from the observations. However, there is a need to develop efficient methods to infer the states (slow features) from the observations and the model. In this paper, a recursive extension to the linear PASFA has been proposed. The proposed algorithm performs MMSE estimation of states evolving according to an ARMA process, given the observations and the model. Although current methods tackle this problem using Kalman filters after transforming the ARMA process into a state space model, the original states (or slow features) that form useful representations cannot be easily recovered. The proposed technique is evaluated on a synthetic dataset to demonstrate its correctness.


【9】Exploring Over-stationarization in Deep Learning-based Bus/Tram Arrival Time Prediction: Analysis and Non-stationary Effect Recovery
标题:探索基于深度学习的公交车/有轨电车到达时间预测中的过度平稳化:分析和非平稳效应恢复
链接:https://arxiv.org/abs/2509.06979

作者: Bin Yang, Meng Wang
备注:26 pages, 13 figures
摘要:公共交通车辆到达时间预测(ATP)对于改善乘客体验和支持交通管理至关重要。深度学习凭借其对非线性和时间动态的建模能力,在ATP中表现出色。在多步ATP中,由于变量联合分布沿时间方向发生变化,非平稳数据会降低模型性能。以往的研究主要通过归一化来消除时间序列的非平稳性,从而获得更好的可预测性。然而,归一化可能会掩盖非平稳性中固有的有用特性,这被称为过度平稳化。在这项工作中,为了在可预测性与非平稳性之间取得平衡,提出了一种用于多步ATP的新方法,命名为非平稳ATP(NSATP)。该方法包括两个阶段:序列平稳化和非平稳效应恢复。第一阶段旨在提高可预测性;在第二阶段,NSATP将一种最先进的方法从一维模型扩展到基于二维的模型,以捕捉时间序列中隐藏的周期性,并通过从原始数据中学习缩放和平移因子,设计了过度平稳化补偿模块。研究收集了德累斯顿125天的公共交通运营数据进行验证。实验结果表明,与基线方法相比,所提出的NSATP对有轨电车可将RMSE、MAE和MAPE分别降低2.37%、1.22%和2.26%,对公共汽车分别降低1.72%、0.60%和1.17%。
摘要 :Arrival time prediction (ATP) of public transport vehicles is essential in improving passenger experience and supporting traffic management. Deep learning has demonstrated outstanding performance in ATP due to its ability to model non-linear and temporal dynamics. In the multi-step ATP, non-stationary data will degrade the model performance due to the variation in variables' joint distribution along the temporal direction. Previous studies mainly applied normalization to eliminate the non-stationarity in time series, thereby achieving better predictability. However, the normalization may obscure useful characteristics inherent in non-stationarity, which is known as the over-stationarization. In this work, to trade off predictability and non-stationarity, a new approach for multi-step ATP, named non-stationary ATP ( NSATP), is proposed. The method consists of two stages: series stationarization and non-stationarity effect recovery. The first stage aims at improving the predictability. As for the latter, NSATP extends a state-of-the-art method from one-dimensional to two dimensional based models to capture the hidden periodicity in time series and designs a compensation module of over-stationarization by learning scaling and shifting factors from raw data. 125 days' public transport operational data of Dresden is collected for validation. Experimental results show that compared to baseline methods, the proposed NSATP can reduce RMSE, MAE, and MAPE by 2.37%, 1.22%, and 2.26% for trams and by 1.72%, 0.60%, and 1.17% for buses, respectively.
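下面是按摘要描述的"序列平稳化 + 过度平稳化补偿"思路给出的最小示意(PyTorch):先对原始序列做实例归一化送入预测主干,再用从原始数据学到的缩放与平移因子恢复非平稳效应。模块名、维度与网络结构均为示意性假设,并非论文实现。

```python
import torch
import torch.nn as nn

class DeStationarizer(nn.Module):
    """从原始(未平稳化)序列学习缩放/平移因子,用于恢复被归一化掩盖的非平稳效应。"""
    def __init__(self, in_len, hidden=64):
        super().__init__()
        self.tau = nn.Sequential(nn.Linear(in_len + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.delta = nn.Sequential(nn.Linear(in_len + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_raw, y_stationary):
        mu = x_raw.mean(-1, keepdim=True)
        sigma = x_raw.std(-1, keepdim=True) + 1e-5
        scale = torch.exp(self.tau(torch.cat([x_raw / sigma, sigma], dim=-1)))   # 缩放因子
        shift = self.delta(torch.cat([x_raw - mu, mu], dim=-1))                  # 平移因子
        return y_stationary * scale + shift

# 用法示意:x_norm = (x_raw - mu) / sigma 送入预测主干,输出再经本模块恢复非平稳效应
# y_hat = DeStationarizer(in_len=96)(x_raw, backbone(x_norm))
```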


【10】ADHAM: Additive Deep Hazard Analysis Mixtures for Interpretable Survival Regression
标题:ADHAM:可解释生存回归的加性深度危险分析混合
链接:https://arxiv.org/abs/2509.07108

作者:nci, Vincent Jeanselme, Harry Reyes Nieva, Shalmali Joshi, Noémie Elhadad
摘要:生存分析是医疗保健中对至事件时间结局建模的基本工具。最近的进展引入了灵活的神经网络方法以提高预测性能。然而,这些模型大多无法对暴露与所建模结局之间的关联提供可解释的洞见,而这是临床实践决策的关键要求。为了解决这一局限,我们提出了一种可解释的加性生存模型,即加性深度危险分析混合模型(ADHAM)。ADHAM假设一个条件潜在结构来定义子组,每个子组由一组协变量特定风险函数的组合来刻画。为了选择子组数量,我们引入了一种训练后的细化步骤,通过合并相似的组来减少等效潜在子组的数量。我们进行了全面的研究,以证明ADHAM在人群、亚组和个体水平上的可解释性。在真实世界数据集上的大量实验表明,ADHAM为暴露与结局之间的关联提供了新的洞见。此外,ADHAM在预测性能方面与现有最先进的生存分析基线保持相当,为医疗保健中的至事件时间预测提供了一种可扩展且可解释的方法。
摘要:Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, most of these models do not provide interpretable insights into the association between exposures and the modeled outcomes, a critical requirement for decision-making in clinical practice. To address this limitation, we propose Additive Deep Hazard Analysis Mixtures (ADHAM), an interpretable additive survival model. ADHAM assumes a conditional latent structure that defines subgroups, each characterized by a combination of covariate-specific hazard functions. To select the number of subgroups, we introduce a post-training refinement that reduces the number of equivalent latent subgroups by merging similar groups. We perform comprehensive studies to demonstrate ADHAM's interpretability at the population, subgroup, and individual levels. Extensive experiments on real-world datasets show that ADHAM provides novel insights into the association between exposures and outcomes. Further, ADHAM remains on par with existing state-of-the-art survival baselines in terms of predictive performance, offering a scalable and interpretable approach to time-to-event prediction in healthcare.


检测相关(4篇)

【1】RoseCDL: Robust and Scalable Convolutional Dictionary Learning for Rare-event Detection
标题:RoseCDL:用于稀有事件检测的鲁棒可扩展卷积字典学习
链接:https://arxiv.org/abs/2509.07523

作者:, Mansour Benbakoura, Cédric Allain, Benoît Malezieux, Matthieu Kowalski, Thomas Moreau
摘要:识别大规模信号中的重复模式和罕见事件是天文学、物理模拟和生物医学等领域的一项基本挑战。卷积字典学习(CDL)为信号中的局部结构建模提供了一个强大的框架,但它在检测罕见或异常事件方面的用途在很大程度上尚未得到探索。特别是,CDL在这种情况下面临两个关键挑战:高计算成本和对伪影和离群值的敏感性。在本文中,我们介绍了RoseCDL,一个可扩展的和强大的CDL算法设计的无监督的罕见事件检测长信号。RoseCDL结合了随机窗口,在大型数据集上进行有效训练,并结合了内联离群值检测,以增强鲁棒性并隔离异常模式。这将CDL重新定义为一种在真实信号中进行事件发现和表征的实用工具,将其作用扩展到压缩或去噪等传统任务之外。
摘要:Identifying recurring patterns and rare events in large-scale signals is a fundamental challenge in fields such as astronomy, physical simulations, and biomedical science. Convolutional Dictionary Learning (CDL) offers a powerful framework for modeling local structures in signals, but its use for detecting rare or anomalous events remains largely unexplored. In particular, CDL faces two key challenges in this setting: high computational cost and sensitivity to artifacts and outliers. In this paper, we introduce RoseCDL, a scalable and robust CDL algorithm designed for unsupervised rare event detection in long signals. RoseCDL combines stochastic windowing for efficient training on large datasets with inline outlier detection to enhance robustness and isolate anomalous patterns. This reframes CDL as a practical tool for event discovery and characterization in real-world signals, extending its role beyond traditional tasks like compression or denoising.


【2】Hybrid GCN-GRU Model for Anomaly Detection in Cryptocurrency Transactions
标题:用于加密货币交易异常检测的混合GCN-GRU模型
链接:https://arxiv.org/abs/2509.07392

作者:a, Minjung Park, Hyeonjeong Cha, Soyoun Kim, Sunyoung Moon, Sua Lee, Jaeyoung Choi, Hyemin Lee, Sangmi Chai
摘要:区块链交易网络是复杂的,具有不断变化的时间模式和节点间关系。为了检测非法活动,我们提出了一个混合GCN-GRU模型,捕获结构和序列特征。使用真实的比特币交易数据(2020-2024),我们的模型实现了0.9470的准确性和0.9807的AUC-ROC,优于所有基线。
摘要 :Blockchain transaction networks are complex, with evolving temporal patterns and inter-node relationships. To detect illicit activities, we propose a hybrid GCN-GRU model that captures both structural and sequential features. Using real Bitcoin transaction data (2020-2024), our model achieved 0.9470 Accuracy and 0.9807 AUC-ROC, outperforming all baselines.
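下面给出上述混合GCN-GRU结构的一个最小示意(PyTorch + PyTorch Geometric):GCN对每个交易图快照编码结构特征,GRU对节点嵌入的时间序列建模。假设节点集合在各快照间固定,维度与层数均为示意性设定,并非论文实现。

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GCNGRU(nn.Module):
    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.gcn = GCNConv(in_dim, hid_dim)                     # 结构特征
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)   # 时序特征
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, snapshots):
        """snapshots: [(x, edge_index), ...] 按时间排列, x: (N, in_dim)"""
        seq = torch.stack([torch.relu(self.gcn(x, ei)) for x, ei in snapshots], dim=1)
        out, _ = self.gru(seq)                                  # (N, T, hid_dim)
        return torch.sigmoid(self.head(out[:, -1]))             # 每个节点的异常(非法)概率
```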


【3】Frustratingly Easy Feature Reconstruction for Out-of-Distribution Detection
标题:令人沮丧地简单的特征重建用于分布外检测
链接:https://arxiv.org/abs/2509.06988

作者: Wang, Shuo Lu, Jian Liang, Aihua Zheng, Ran He
备注:Accepted to PRCV2025
摘要:分布外(OOD)检测有助于模型识别训练类别之外的数据,这对安全相关应用至关重要。虽然基于特征的事后方法通过在不改变网络参数的情况下评估特征空间中的数据差异来解决这一问题,但它们通常需要访问训练数据,这在需要关注数据隐私保护的场景中可能并不适用。在本文中,我们从子空间投影的角度提出了一种简单而有效的事后方法,称为基于分类器的特征重建(ClaFR)。它首先对分类器的权重进行正交分解以提取类已知子空间,然后将原始数据特征映射到该子空间中,得到新的数据表示。随后,通过计算数据在子空间内的特征重建误差来确定OOD分数。与现有的OOD检测算法相比,我们的方法不需要访问训练数据,同时在多个OOD基准上取得领先性能。我们的代码发布在 https://github.com/Aie0923/ClaFR 。
摘要:Out-of-distribution (OOD) detection helps models identify data outside the training categories, crucial for security applications. While feature-based post-hoc methods address this by evaluating data differences in the feature space without changing network parameters, they often require access to training data, which may not be suitable for some data privacy scenarios. This may not be suitable in scenarios where data privacy protection is a concern. In this paper, we propose a simple yet effective post-hoc method, termed Classifier-based Feature Reconstruction (ClaFR), from the perspective of subspace projection. It first performs an orthogonal decomposition of the classifier's weights to extract the class-known subspace, then maps the original data features into this subspace to obtain new data representations. Subsequently, the OOD score is determined by calculating the feature reconstruction error of the data within the subspace. Compared to existing OOD detection algorithms, our method does not require access to training data while achieving leading performance on multiple OOD benchmarks. Our code is released at https://github.com/Aie0923/ClaFR.
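按摘要的描述,下面用NumPy给出ClaFR打分思路的一个最小示意:对分类器权重做SVD得到"类已知"子空间的正交基,OOD分数为特征在该子空间上的重建误差。变量名与数值细节均为示意性假设,并非作者发布的代码。

```python
import numpy as np

def class_known_basis(W):
    """W: (num_classes, feat_dim) 分类器权重;返回其行空间(类已知子空间)的正交基。"""
    _, s, vt = np.linalg.svd(W, full_matrices=False)
    return vt[s > 1e-8]                                  # (r, feat_dim)

def ood_score(features, basis):
    """features: (N, feat_dim) 倒数第二层特征;分数越大越可能是分布外样本。"""
    recon = features @ basis.T @ basis                   # 投影回子空间得到的重建
    return np.linalg.norm(features - recon, axis=-1)

# 用法示意:scores = ood_score(penultimate_feats, class_known_basis(classifier_weight))
```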


【4】VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation
标题:VISION:利用反事实增强的稳健且可解释的代码漏洞检测
链接:https://arxiv.org/abs/2508.18933

作者:a, Barproda Halder, Sanghamitra Dutta
摘要:自动检测源代码中的漏洞是一项重要的网络安全挑战,是对数字系统和服务信任的基础。图神经网络(GNN)能够以数据驱动的方式学习代码的结构和逻辑关系,因而成为一种很有前途的方法。然而,其性能受到训练数据不平衡和标签噪声的严重限制。GNN通常会从表面的代码相似性中学习"虚假"相关性,所得到的检测器难以很好地泛化到未见过的真实世界数据。在这项工作中,我们提出了一个用于稳健且可解释漏洞检测的统一框架,称为VISION,通过系统地扩充反事实训练数据集来缓解虚假相关性。反事实样本是指语义修改最小但标签相反的样本。我们的框架包括:(i)通过提示大型语言模型(LLM)生成反事实样本;(ii)在标签相反的成对代码示例上进行有针对性的GNN训练;以及(iii)基于图的可解释性,用于识别与漏洞预测相关的关键代码语句,同时忽略虚假语句。我们发现,VISION减少了虚假学习,实现了更稳健、更可泛化的检测,在常见弱点枚举(CWE)-20漏洞上将总体准确率从51.8%提高到97.8%,成对对比准确率从4.5%提高到95.8%,最差组准确率从0.7%提高到85.5%。我们进一步使用所提出的指标(类内归因方差、类间归因距离和节点得分依赖性)证明了这些增益。我们还发布了CWE-20-CFA,这是一个包含来自高影响力CWE-20类别的27,556个函数(真实与反事实)的基准。最后,VISION通过面向人机协同分析的交互式可视化,推动了透明且值得信赖的基于人工智能的网络安全系统的发展。
摘要:Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn structural and logical code relationships in a data-driven manner. However, their performance is severely constrained by training data imbalances and label noise. GNNs often learn 'spurious' correlations from superficial code similarities, producing detectors that fail to generalize well to unseen real-world data. In this work, we propose a unified framework for robust and interpretable vulnerability detection, called VISION, to mitigate spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications but opposite labels. Our framework includes: (i) generating counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN training on paired code examples with opposite labels; and (iii) graph-based interpretability to identify the crucial code statements relevant for vulnerability predictions while ignoring spurious ones. We find that VISION reduces spurious learning and enables more robust, generalizable detection, improving overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the Common Weakness Enumeration (CWE)-20 vulnerability. We further demonstrate gains using proposed metrics: intra-class attribution variance, inter-class attribution distance, and node score dependency. We also release CWE-20-CFA, a benchmark of 27,556 functions (real and counterfactual) from the high-impact CWE-20 category. Finally, VISION advances transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.


分类|识别(3篇)

【1】Predicting person-level injury severity using crash narratives: A balanced approach with roadway classification and natural language process techniques
标题:使用碰撞叙述预测个人层面的伤害严重程度:采用道路分类和自然语言处理技术的平衡方法
链接:https://arxiv.org/abs/2509.07845

作者:Zana Majidi, Sajjad Karimi, Teng Wang, Robert Kluger, Reginald Souleyrette
摘要:预测交通事故中的人员伤亡在加强道路安全、改善应急响应和指导公共卫生干预方面发挥着关键作用。本研究考察了将非结构化的碰撞叙述(由警察在现场撰写)与结构化碰撞数据相结合在预测受伤严重程度时的附加价值。研究采用了两种广泛使用的自然语言处理(NLP)技术,即词频-逆文档频率(TF-IDF)和Word2Vec,从叙述中提取语义,并比较了它们的有效性。为了解决类别不平衡的挑战,在建模之前对训练数据应用了基于K最近邻的过采样方法。数据集由肯塔基州2019年至2023年的碰撞记录组成。为了考虑道路异质性,使用了三种道路分类方案:(1)八个详细的功能类别(例如城市双车道、农村州际公路、城市多车道分隔道路),(2)四个更宽泛的成对类别(例如城市与农村、高速公路与非高速公路),以及(3)不做分类的统一数据集。通过将结构化特征与基于叙述的特征(采用上述两种NLP技术)同三种集成算法(XGBoost、随机森林和AdaBoost)组合,共建立了102个机器学习模型。结果表明,纳入叙述数据的模型始终优于仅依赖结构化数据的模型。在所有组合中,TF-IDF与XGBoost的组合在大多数子组中给出了最准确的预测。研究结果突出了整合文本与结构化碰撞信息以增强个人层面伤害预测的价值。这项工作为交通安全专业人员提供了一个实用且可适配的框架,以改进碰撞严重程度建模、指导政策决策并设计更有效的对策。
摘要:Predicting injuries and fatalities in traffic crashes plays a critical role in enhancing road safety, improving emergency response, and guiding public health interventions. This study investigates the added value of unstructured crash narratives (written by police officers at the scene) when combined with structured crash data to predict injury severity. Two widely used Natural Language Processing (NLP) techniques, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, were employed to extract semantic meaning from the narratives, and their effectiveness was compared. To address the challenge of class imbalance, a K-Nearest Neighbors-based oversampling method was applied to the training data prior to modeling. The dataset consists of crash records from Kentucky spanning 2019 to 2023. To account for roadway heterogeneity, three road classification schemes were used: (1) eight detailed functional classes (e.g., Urban Two-Lane, Rural Interstate, Urban Multilane Divided), (2) four broader paired categories (e.g., Urban vs. Rural, Freeway vs. Non-Freeway), and (3) a unified dataset without classification. A total of 102 machine learning models were developed by combining structured features and narrative-based features using the two NLP techniques alongside three ensemble algorithms: XGBoost, Random Forest, and AdaBoost. Results demonstrate that models incorporating narrative data consistently outperform those relying solely on structured data. Among all combinations, TF-IDF coupled with XGBoost yielded the most accurate predictions in most subgroups. The findings highlight the power of integrating textual and structured crash information to enhance person-level injury prediction. This work offers a practical and adaptable framework for transportation safety professionals to improve crash severity modeling, guide policy decisions, and design more effective countermeasures.
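下面是上述"叙述文本 + 结构化特征"流程的一个最小示意(Python):用TF-IDF提取叙述特征、与结构化特征拼接、做KNN类过采样(此处以SMOTE作为论文所用方法的替代示例)后训练XGBoost。列名、超参数与SMOTE的选择均为示意性假设。

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

def fit_severity_model(narratives, structured_feats, labels):
    tfidf = TfidfVectorizer(max_features=2000, stop_words="english")
    X_text = tfidf.fit_transform(narratives).toarray()              # 叙述文本的TF-IDF特征
    X = np.hstack([X_text, structured_feats])                        # 拼接结构化碰撞特征
    X_res, y_res = SMOTE(k_neighbors=5).fit_resample(X, labels)      # KNN类过采样缓解类别不平衡
    model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    model.fit(X_res, y_res)
    return tfidf, model
```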


【2】Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks
标题:使用深度卷积神经网络进行类别和类别级音频分类的频谱和节奏特征性能评估
链接:https://arxiv.org/abs/2509.07756

作者: Wolf-Monheim
摘要:除了决策树和k近邻算法之外,深度卷积神经网络(CNN)被广泛用于对音乐、语音或环境声音等许多领域的音频数据进行分类。为了训练特定的CNN,可以将各种频谱和节奏特征(如梅尔缩放频谱图、梅尔频率倒谱系数(MFCC)、循环温度图、短时傅里叶变换(STFT)色度图、恒定Q变换(CQT)色度图和色度能量归一化统计(CENS)色度图)用作神经网络的数字图像输入数据。使用深度CNN和ESC-50数据集,使用端到端深度学习管道,详细研究了这些频谱和节奏特征在音频类别级别以及音频类级别分类中的性能,其中ESC-50数据集包含2,000个标记的环境音频记录。多类分类的评估指标准确度、精确度、召回率和F1得分清楚地表明,梅尔缩放频谱图和梅尔频率倒谱系数(MFCC)的表现明显优于本研究中使用深度CNN进行音频分类任务所研究的其他频谱和节奏特征。
摘要:Next to decision tree and k-nearest neighbours algorithms deep convolutional neural networks (CNNs) are widely used to classify audio data in many domains like music, speech or environmental sounds. To train a specific CNN various spectral and rhythm features like mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCC), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams and chroma energy normalized statistics (CENS) chromagrams can be used as digital image input data for the neural network. The performance of these spectral and rhythm features for audio category level as well as audio class level classification is investigated in detail with a deep CNN and the ESC-50 dataset with 2,000 labeled environmental audio recordings using an end-to-end deep learning pipeline. The evaluated metrics accuracy, precision, recall and F1 score for multiclass classification clearly show that the mel-scaled spectrograms and the mel-frequency cepstral coefficients (MFCC) perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks using deep CNNs.
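下面用librosa给出提取文中表现最好的两类输入(梅尔频谱图与MFCC)并作为CNN图像式输入的最小示意;文件路径与参数取值均为示意性假设,并非论文的实验配置。

```python
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=22050)                   # ESC-50 的音频片段为 5 秒
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)    # 梅尔频谱图
mel_db = librosa.power_to_db(mel, ref=np.max)                   # 对数幅度, 作为"图像"输入
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)              # 梅尔频率倒谱系数
print(mel_db.shape, mfcc.shape)                                 # (n_mels, 帧数), (n_mfcc, 帧数)
```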


【3】Beyond Rebalancing: Benchmarking Binary Classifiers Under Class Imbalance Without Rebalancing Techniques
标题:超越再平衡:在不使用再平衡技术的情况下对类别不平衡下的二元分类器进行基准测试
链接:https://arxiv.org/abs/2509.07605

作者:, Amir Ahmad, Shehroz S. Khan
摘要:类别不平衡给监督分类带来了重大挑战,特别是在医学诊断和异常检测等少数类实例稀少的关键领域。虽然许多研究探索了用于解决该问题的再平衡技术,但对于在不应用此类技术时二元分类器在不平衡情况下的性能评估关注较少。因此,本研究的目标是在不执行任何显式再平衡的情况下评估二元分类器"原样"的性能。具体而言,我们在真实世界和合成数据集上系统地评估了一组多样化的二元分类器的鲁棒性,逐步减小少数类规模,并以单样本(one-shot)和少样本(few-shot)场景作为基线。我们的方法还通过合成决策边界生成来探索不同的数据复杂度,以模拟真实世界条件。除了标准分类器之外,我们还包含使用欠采样、过采样策略和单类分类(OCC)方法的实验,以考察它们在严重不平衡下的行为。结果证实,随着数据复杂度增加和少数类规模减小,分类变得更加困难。虽然传统分类器在极端不平衡下会退化,但TabPFN和基于提升的集成等先进模型相比传统分类器保持了相对较高的性能和更好的泛化能力。可视化可解释性和评估指标进一步验证了这些发现。我们的工作为不平衡学习中的模型选择提供了有价值的指导,在不依赖显式再平衡技术的情况下提供了对分类器鲁棒性的洞见。
摘要 :Class imbalance poses a significant challenge to supervised classification, particularly in critical domains like medical diagnostics and anomaly detection where minority class instances are rare. While numerous studies have explored rebalancing techniques to address this issue, less attention has been given to evaluating the performance of binary classifiers under imbalance when no such techniques are applied. Therefore, the goal of this study is to assess the performance of binary classifiers "as-is", without performing any explicit rebalancing. Specifically, we systematically evaluate the robustness of a diverse set of binary classifiers across both real-world and synthetic datasets, under progressively reduced minority class sizes, using one-shot and few-shot scenarios as baselines. Our approach also explores varying data complexities through synthetic decision boundary generation to simulate real-world conditions. In addition to standard classifiers, we include experiments using undersampling, oversampling strategies, and one-class classification (OCC) methods to examine their behavior under severe imbalance. The results confirm that classification becomes more difficult as data complexity increases and the minority class size decreases. While traditional classifiers deteriorate under extreme imbalance, advanced models like TabPFN and boosting-based ensembles retain relatively higher performance and better generalization compared to traditional classifiers. Visual interpretability and evaluation metrics further validate these findings. Our work offers valuable guidance on model selection for imbalanced learning, providing insights into classifier robustness without dependence on explicit rebalancing techniques.


表征(3篇)

【1】Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations
标题:声音合成器的神经代理:学习融入感知信息的预设表示
链接:https://arxiv.org/abs/2509.07635

作者:bes, Stefan Weinzierl, Klaus Obermayer
备注:17 pages, 4 figures, published in the Journal of the Audio   Engineering Society
摘要:深度学习被视为自动合成器编程(ASP)的一个有吸引力的解决方案,其目标是帮助音乐家和声音设计师对声音合成器进行编程。然而,由于软件合成器可能不可微,将其集成到训练管道中具有挑战性。本工作通过引入一种近似任意合成器的方法来应对这一挑战。具体来说,我们训练一个神经网络,将合成器预设映射到由预训练模型导出的音频嵌入空间。这便于定义一种神经代理,产生紧凑而有效的表示,从而能够将音频嵌入损失集成到面向黑盒合成器的基于神经网络的ASP系统中。我们在基于神经网络的ASP背景下评估了多种预训练音频模型所导出的表示,并评估了包括前馈、循环和基于transformer的模型在内的几种神经网络架构在定义神经代理方面的有效性。我们使用来自三个流行软件合成器的合成预设和手工制作预设来评估所提出的方法,并在合成器声音匹配的下游任务中评估其性能。虽然所学表示的收益需要与资源需求相权衡,但在所有合成器上都获得了令人鼓舞的结果,为未来研究合成器代理在基于神经网络的ASP系统中的应用铺平了道路。
摘要:Deep learning appears as an appealing solution for Automatic Synthesizer Programming (ASP), which aims to assist musicians and sound designers in programming sound synthesizers. However, integrating software synthesizers into training pipelines is challenging due to their potential non-differentiability. This work tackles this challenge by introducing a method to approximate arbitrary synthesizers. Specifically, we train a neural network to map synthesizer presets onto an audio embedding space derived from a pretrained model. This facilitates the definition of a neural proxy that produces compact yet effective representations, thereby enabling the integration of audio embedding loss into neural-based ASP systems for black-box synthesizers. We evaluate the representations derived by various pretrained audio models in the context of neural-based nASP and assess the effectiveness of several neural network architectures, including feedforward, recurrent, and transformer-based models, in defining neural proxies. We evaluate the proposed method using both synthetic and hand-crafted presets from three popular software synthesizers and assess its performance in a synthesizer sound matching downstream task. While the benefits of the learned representation are nuanced by resource requirements, encouraging results were obtained for all synthesizers, paving the way for future research into the application of synthesizer proxies for neural-based ASP systems.
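下面给出"神经代理"核心思路的一个最小示意(PyTorch):一个小型MLP把合成器预设向量映射到预训练音频模型的嵌入空间,以MSE作为嵌入损失。预设维度、嵌入维度与网络结构均为示意性假设,并非论文实现。

```python
import torch
import torch.nn as nn

proxy = nn.Sequential(nn.Linear(128, 512), nn.ReLU(),
                      nn.Linear(512, 512), nn.ReLU(),
                      nn.Linear(512, 768))            # 假设音频嵌入为768维(示意)

opt = torch.optim.Adam(proxy.parameters(), lr=1e-3)

def train_step(presets, target_embeddings):
    """presets: (B,128) 归一化后的合成器参数; target_embeddings: (B,768)
    由冻结的预训练音频模型对渲染音频计算得到的嵌入。"""
    opt.zero_grad()
    loss = nn.functional.mse_loss(proxy(presets), target_embeddings)
    loss.backward()
    opt.step()
    return loss.item()
```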


【2】Water Demand Forecasting of District Metered Areas through Learned Consumer Representations
标题:通过习得的消费者代表预测区域计量地区的需水量
链接:https://arxiv.org/abs/2509.07515

作者:amachandran, Thorkil Flensmark B. Neergaard, Tomás Arias-Vergara, Andreas Maier, Siming Bayer
备注:Presented at European Conference for Signal Procesing - EUSIPCO 2025
摘要:智能计量技术的进步显著提升了对供水设施进行监测和管理的能力。在气候变化导致不确定性日益增加的背景下,保障水资源和供水已成为一个具有广泛社会经济影响的紧迫全球问题。来自最终用户的小时级用水数据为预测具有不同消费模式的各区域的需求提供了重要洞见。然而,由于气象条件等非确定性因素的影响,需水量预测仍然具有挑战性。本工作提出了一种针对区域计量区(DMA)的短期需水量预测新方法,这些DMA涵盖商业、农业和居民用户。研究应用无监督对比学习,根据DMA内存在的不同消费行为对最终用户进行分类。随后,这些不同的消费行为被用作后续需求预测任务的特征:该任务采用小波变换卷积网络,并结合交叉注意力机制来融合历史数据与所导出的表示。所提出的方法在真实世界的DMA上进行了为期六个月的评估,在不同DMA上的MAPE均有所改善,最大改进达4.9%。此外,该方法还能识别出行为受社会经济因素影响的用户,增进了对影响需求的确定性模式的先验认识。
摘要:Advancements in smart metering technologies have significantly improved the ability to monitor and manage water utilities. In the context of increasing uncertainty due to climate change, securing water resources and supply has emerged as an urgent global issue with extensive socioeconomic ramifications. Hourly consumption data from end-users have yielded substantial insights for projecting demand across regions characterized by diverse consumption patterns. Nevertheless, the prediction of water demand remains challenging due to influencing non-deterministic factors, such as meteorological conditions. This work introduces a novel method for short-term water demand forecasting for District Metered Areas (DMAs) which encompass commercial, agricultural, and residential consumers. Unsupervised contrastive learning is applied to categorize end-users according to distinct consumption behaviors present within a DMA. Subsequently, the distinct consumption behaviors are utilized as features in the ensuing demand forecasting task using wavelet-transformed convolutional networks that incorporate a cross-attention mechanism combining both historical data and the derived representations. The proposed approach is evaluated on real-world DMAs over a six-month period, demonstrating improved forecasting performance in terms of MAPE across different DMAs, with a maximum improvement of 4.9%. Additionally, it identifies consumers whose behavior is shaped by socioeconomic factors, enhancing prior knowledge about the deterministic patterns that influence demand.


【3】SBS: Enhancing Parameter-Efficiency of Neural Representations for Neural Networks via Spectral Bias Suppression
标题 :SBS:通过谱偏差抑制提高神经网络神经表示的参数效率
链接:https://arxiv.org/abs/2509.07373

作者: Yuan Li, Yi Kang
备注:Accepted by ICONIP 2025
摘要:隐式神经表示最近已被扩展为通过神经网络的神经表示来表示卷积神经网络权重,提供了有前途的参数压缩好处。然而,用于神经网络的神经表示的标准多层感知器表现出明显的频谱偏差,阻碍了它们有效地重建高频细节的能力。在本文中,我们提出了SBS,一种对神经网络的神经表示的参数有效增强,使用两种技术来抑制频谱偏差:(1)基于单向排序的平滑,提高输出空间中的内核平滑度,以及(2)基于单向排序的平滑感知随机傅立叶特征,基于逐层参数计数自适应地调制输入编码的频率带宽。在使用CIFAR-10、CIFAR-100和ImageNet数据集的各种ResNet模型上进行的广泛评估表明,与SOTA相比,SBS以更少的参数实现了更好的重建精度。
摘要:Implicit neural representations have recently been extended to represent convolutional neural network weights via neural representation for neural networks, offering promising parameter compression benefits. However, standard multi-layer perceptrons used in neural representation for neural networks exhibit a pronounced spectral bias, hampering their ability to reconstruct high-frequency details effectively. In this paper, we propose SBS, a parameter-efficient enhancement to neural representation for neural networks that suppresses spectral bias using two techniques: (1) a unidirectional ordering-based smoothing that improves kernel smoothness in the output space, and (2) unidirectional ordering-based smoothing aware random fourier features that adaptively modulate the frequency bandwidth of input encodings based on layer-wise parameter count. Extensive evaluations on various ResNet models with datasets CIFAR-10, CIFAR-100, and ImageNet, demonstrate that SBS achieves significantly better reconstruction accuracy with less parameters compared to SOTA.


优化|敛散性(5篇)

【1】A Modular Algorithm for Non-Stationary Online Convex-Concave Optimization
标题:非平稳在线凸-凹优化的模块化算法
链接:https://arxiv.org/abs/2509.07901

作者:Meng, Xia Lei, Jian-wei Liu
备注:Earlier Version: this https URL
摘要:本文研究在线凸-凹优化问题,它将在线凸优化推广到双人时变凸-凹博弈。我们的目标是最小化动态对偶间隙(D-DGap),这是一个关键性能指标,用于相对任意比较序列评估玩家的策略。现有算法无法提供最优性能,特别是在平稳或可预测的环境中。为了解决这一问题,我们提出了一种新颖的模块化算法,它包含三个核心组件:动态适应不同非平稳程度的自适应模块、在多个候选者中识别最佳预测器的多预测器聚合器,以及有效结合二者优势的集成模块。我们的算法在对数因子范围内达到极小极大最优的D-DGap上界,同时还保证了由预测误差驱动的D-DGap界。模块化设计既允许无缝替换调节动态环境适应性的组件,也允许引入整合来自多个预测器的"辅助知识"的组件。实证结果进一步证明了所提方法的有效性和适应性。
摘要:This paper investigates the problem of Online Convex-Concave Optimization, which extends Online Convex Optimization to two-player time-varying convex-concave games. The goal is to minimize the dynamic duality gap (D-DGap), a critical performance measure that evaluates players' strategies against arbitrary comparator sequences. Existing algorithms fail to deliver optimal performance, particularly in stationary or predictable environments. To address this, we propose a novel modular algorithm with three core components: an Adaptive Module that dynamically adjusts to varying levels of non-stationarity, a Multi-Predictor Aggregator that identifies the best predictor among multiple candidates, and an Integration Module that effectively combines their strengths. Our algorithm achieves a minimax optimal D-DGap upper bound, up to a logarithmic factor, while also ensuring prediction error-driven D-DGap bounds. The modular design allows for the seamless replacement of components that regulate adaptability to dynamic environments, as well as the incorporation of components that integrate ``side knowledge'' from multiple predictors. Empirical results further demonstrate the effectiveness and adaptability of the proposed method.


【2】Quantum Computing for Large-scale Network Optimization: Opportunities and Challenges
标题:用于大规模网络优化的量子计算:机遇与挑战
链接:https://arxiv.org/abs/2509.07773

作者: Macaluso, Giovanni Geraci, Elías F. Combarro, Sergi Abadal, Ioannis Arapakis, Sofia Vallecorsa, Eduard Alarcón
备注:7 pages, 4 figures
摘要:大规模6G及以上网络的复杂性需要在广阔的搜索空间中进行多目标优化的创新方法,这通常是一项棘手的任务。量子计算(QC)作为一种有前途的高效大规模优化技术而出现。我们提出了我们的愿景,利用QC解决关键类的问题,在未来的移动网络。通过分析和识别共同的功能,特别是他们的图形为中心的表示,我们提出了一个统一的策略,涉及QC算法。具体来说,我们概述了一种使用量子退火和量子强化学习进行优化的方法。此外,我们还讨论了QC算法和硬件必须克服的主要挑战,以有效地优化未来的网络。
摘要:The complexity of large-scale 6G-and-beyond networks demands innovative approaches for multi-objective optimization over vast search spaces, a task often intractable. Quantum computing (QC) emerges as a promising technology for efficient large-scale optimization. We present our vision of leveraging QC to tackle key classes of problems in future mobile networks. By analyzing and identifying common features, particularly their graph-centric representation, we propose a unified strategy involving QC algorithms. Specifically, we outline a methodology for optimization using quantum annealing as well as quantum reinforcement learning. Additionally, we discuss the main challenges that QC algorithms and hardware must overcome to effectively optimize future networks.


【3】Astra: A Multi-Agent System for GPU Kernel Performance Optimization
标题:Astra:用于GPU内核性能优化的多智能体系统
链接:https://arxiv.org/abs/2509.07506

作者:ei, Tianran Sun, Yogesh Seenichamy, Hang Song, Anne Ouyang, Azalia Mirhoseini, Ke Wang, Alex Aiken
摘要:GPU内核优化长期以来一直是高性能计算与机器学习交叉领域的核心挑战。高效的内核对于加速大型语言模型(LLM)的训练和服务至关重要,但要获得高性能通常需要大量手动调优。基于编译器的系统减轻了部分负担,但仍然需要大量的手动设计和工程工作。最近,研究人员探索了使用LLM生成GPU内核,尽管先前的工作主要集中于将高层PyTorch模块翻译为CUDA代码。在这项工作中,我们介绍了Astra,首个基于LLM的GPU内核优化多智能体系统。与以往方法不同,Astra从SGLang(一个被广泛部署的LLM服务框架)中提取的现有CUDA实现出发,而不是将PyTorch模块视为规范。在Astra中,专门的LLM智能体通过迭代的代码生成、测试、性能分析和规划进行协作,以生成既正确又高性能的内核。在来自SGLang的内核上,Astra使用OpenAI o4-mini的零样本提示实现了1.32倍的平均加速。详细的案例研究进一步证明,LLM可以自主应用循环变换、优化内存访问模式、利用CUDA内置函数并利用快速数学运算,从而获得可观的性能收益。我们的工作突显了多智能体LLM系统作为GPU内核优化的一种有前途的新范式。
摘要:GPU kernel optimization has long been a central challenge at the intersection of high-performance computing and machine learning. Efficient kernels are crucial for accelerating large language model (LLM) training and serving, yet attaining high performance typically requires extensive manual tuning. Compiler-based systems reduce some of this burden, but still demand substantial manual design and engineering effort. Recently, researchers have explored using LLMs for GPU kernel generation, though prior work has largely focused on translating high-level PyTorch modules into CUDA code. In this work, we introduce Astra, the first LLM-based multi-agent system for GPU kernel optimization. Unlike previous approaches, Astra starts from existing CUDA implementations extracted from SGLang, a widely deployed framework for serving LLMs, rather than treating PyTorch modules as the specification. Within Astra, specialized LLM agents collaborate through iterative code generation, testing, profiling, and planning to produce kernels that are both correct and high-performance. On kernels from SGLang, Astra achieves an average speedup of 1.32x using zero-shot prompting with OpenAI o4-mini. A detailed case study further demonstrates that LLMs can autonomously apply loop transformations, optimize memory access patterns, exploit CUDA intrinsics, and leverage fast math operations to yield substantial performance gains. Our work highlights multi-agent LLM systems as a promising new paradigm for GPU kernel optimization.


【4】A Minimalist Bayesian Framework for Stochastic Optimization
标题:随机优化的极简贝叶斯框架
链接:https://arxiv.org/abs/2509.07030

作者:Wang
备注:25 pages
摘要:贝叶斯范式为不确定性下的序贯决策提供了有原则的工具,但其对所有参数都需要概率模型的依赖可能会阻碍复杂结构约束的引入。我们引入了一个极简贝叶斯框架,只对感兴趣的组件(例如最优点的位置)设置先验。冗余参数(nuisance parameters)通过剖面似然(profile likelihood)被消除,从而自然地处理约束。作为一个直接的实例,我们开发了极简汤普森采样(MINimalist Thompson Sampling, MINTS)算法。我们的框架可适应结构化问题,包括连续臂Lipschitz赌博机和动态定价。它还为重心法和椭球法等经典凸优化算法提供了概率视角。我们进一步分析了MINTS在多臂赌博机问题上的表现,并建立了近最优的遗憾保证。
摘要:The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the component of interest, such as the location of the optimum. Nuisance parameters are eliminated via profile likelihood, which naturally handles constraints. As a direct instantiation, we develop a MINimalist Thompson Sampling (MINTS) algorithm. Our framework accommodates structured problems, including continuum-armed Lipschitz bandits and dynamic pricing. It also provides a probabilistic lens on classical convex optimization algorithms such as the center of gravity and ellipsoid methods. We further analyze MINTS for multi-armed bandits and establish near-optimal regret guarantees.


【5】Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds
标题:超越Hadamard流形的去中心化在线黎曼优化
链接:https://arxiv.org/abs/2509.07779

作者:noglu, Shahin Shahrampour
摘要:我们研究可能具有正曲率的流形上的去中心化在线黎曼优化,超越了Hadamard流形的设定。去中心化优化技术依赖于一个共识步骤,该步骤在欧几里得空间中因其线性性而被充分理解。然而,在正曲率的黎曼空间中,一个主要的技术挑战是测地距离可能不会诱导全局凸结构。在这项工作中,我们首先分析了一个曲率感知的黎曼共识步骤,使得在Hadamard流形之外也能实现线性收敛。在此步骤的基础上,我们为去中心化在线黎曼梯度下降算法建立了$O(\sqrt{T})$的遗憾界。随后,我们研究两点赌博反馈设定,其中我们采用基于平滑技术的计算高效梯度估计器,并通过对平滑目标的次凸性分析证明了同样的$O(\sqrt{T})$遗憾界。
摘要:We study decentralized online Riemannian optimization over manifolds with possibly positive curvature, going beyond the Hadamard manifold setting. Decentralized optimization techniques rely on a consensus step that is well understood in Euclidean spaces because of their linearity. However, in positively curved Riemannian spaces, a main technical challenge is that geodesic distances may not induce a globally convex structure. In this work, we first analyze a curvature-aware Riemannian consensus step that enables a linear convergence beyond Hadamard manifolds. Building on this step, we establish a $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent algorithm. Then, we investigate the two-point bandit feedback setup, where we employ computationally efficient gradient estimators using smoothing techniques, and we demonstrate the same $O(\sqrt{T})$ regret bound through the subconvexity analysis of smoothed objectives.


预测|估计(5篇)

【1】Forecasting Russian Equipment Losses Using Time Series and Deep Learning Models
标题:使用时间序列和深度学习模型预测俄罗斯装备损失
链接:https://arxiv.org/abs/2509.07813

作者:Teagan
摘要:本研究应用一系列预测技术,包括ARIMA、Prophet、长短期记忆网络(LSTM)、时间卷积网络(TCN)和XGBoost,来建模并预测俄罗斯在乌克兰持续战争期间的装备损失。基于WarSpotting的每日和每月开源情报(OSINT)数据,我们旨在评估损耗趋势、评估模型性能,并估计到2025年底的未来损失模式。我们的研究结果表明,深度学习模型(特别是TCN和LSTM)能够产生稳定且一致的预测,尤其是在时间粒度较高的条件下。通过比较不同的模型架构和输入结构,本研究强调了集成预测在冲突建模中的重要性,以及公开可用的OSINT数据在量化装备随时间损耗方面的价值。
摘要:This study applies a range of forecasting techniques,including ARIMA, Prophet, Long Short Term Memory networks (LSTM), Temporal Convolutional Networks (TCN), and XGBoost, to model and predict Russian equipment losses during the ongoing war in Ukraine. Drawing on daily and monthly open-source intelligence (OSINT) data from WarSpotting, we aim to assess trends in attrition, evaluate model performance, and estimate future loss patterns through the end of 2025. Our findings show that deep learning models, particularly TCN and LSTM, produce stable and consistent forecasts, especially under conditions of high temporal granularity. By comparing different model architectures and input structures, this study highlights the importance of ensemble forecasting in conflict modeling, and the value of publicly available OSINT data in quantifying material degradation over time.
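下面以文中列出的经典基线之一(statsmodels中的ARIMA)为例,给出对日度损失序列建模并向前外推30天的最小示意;示例数据与(p,d,q)阶数均为占位假设,并非论文所用数据或配置。

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# 占位的日度损失计数, 仅用于演示接口
losses = pd.Series([12, 15, 9, 20, 17, 14, 22],
                   index=pd.date_range("2024-01-01", periods=7, freq="D"))

fit = ARIMA(losses, order=(1, 1, 1)).fit()   # (p, d, q) 为示意取值
forecast = fit.forecast(steps=30)            # 向前30天的预测
print(forecast.head())
```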


【2】IBN: An Interpretable Bidirectional-Modeling Network for Multivariate Time Series Forecasting with Variable Missing
标题:IBN:一种用于变量缺失的多元时间序列预测的可解释双向建模网络
链接:https://arxiv.org/abs/2509.07725

作者:, Tianhao Zhang, Qijiu Xia, Yun-Bo Zhao
摘要:多变量时间序列预测(MTSF)经常面临变量缺失的挑战,这阻碍了传统的时空图神经网络建模变量间的相关性。虽然GinAR首次使用基于注意力的插补和自适应图学习解决了变量缺失问题,但由于其简单的递归单元(RU),它缺乏可解释性,无法捕获更多潜在的时间模式。为了克服这些限制,我们提出了可解释的双向建模网络(IBN),集成了不确定性感知插值(UAI)和基于高斯核的图卷积(GGCN)。IBN使用MC Dropout估计重建值的不确定性,并应用不确定性加权策略来减轻高风险重建。GGCN显式地对变量之间的空间相关性进行建模,而双向RU增强了时间依赖性建模。大量的实验表明,IBN实现了最先进的预测性能在各种缺失率的情况下,提供了一个更可靠的和可解释的框架MTSF与缺失的变量。代码可从以下网址获得:https://github.com/zhangth1211/NICLab-IBN。
摘要:Multivariate time series forecasting (MTSF) often faces challenges from missing variables, which hinder conventional spatial-temporal graph neural networks in modeling inter-variable correlations. While GinAR addresses variable missing using attention-based imputation and adaptive graph learning for the first time, it lacks interpretability and fails to capture more latent temporal patterns due to its simple recursive units (RUs). To overcome these limitations, we propose the Interpretable Bidirectional-modeling Network (IBN), integrating Uncertainty-Aware Interpolation (UAI) and Gaussian kernel-based Graph Convolution (GGCN). IBN estimates the uncertainty of reconstructed values using MC Dropout and applies an uncertainty-weighted strategy to mitigate high-risk reconstructions. GGCN explicitly models spatial correlations among variables, while a bidirectional RU enhances temporal dependency modeling. Extensive experiments show that IBN achieves state-of-the-art forecasting performance under various missing-rate scenarios, providing a more reliable and interpretable framework for MTSF with missing variables. Code is available at: https://github.com/zhangth1211/NICLab-IBN.
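下面是摘要中"不确定性感知插值(UAI)"思想的一个最小示意(PyTorch):推理时保持Dropout的随机性做多次前向,得到重建的均值与方差,并按不确定性对高风险重建降权。模型、变量名与加权形式均为示意性假设,并非论文实现。

```python
import torch

def mc_dropout_impute(model, x, mask, n_samples=20):
    """x: 含缺失的序列(缺失位置已填0), mask: 1表示观测、0表示缺失。
    用MC Dropout估计重建值的不确定性, 并以 1/(1+var) 对高不确定性重建降权。"""
    model.train()                                    # 保持Dropout层处于随机状态
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    mean, var = draws.mean(0), draws.var(0)
    weight = 1.0 / (1.0 + var)                       # 方差越大, 信任度越低
    imputed = mask * x + (1 - mask) * (weight * mean)
    return imputed, var
```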


【3】General Demographic Foundation Models for Enhancing Predictive Performance Across Diseases
标题:用于提升跨疾病预测性能的通用人口统计学基础模型
链接:https://arxiv.org/abs/2509.07330

作者:hen, Ji-Tian Sheu, Yuh-Jue Chuang
摘要:人口统计学属性普遍存在于电子健康记录中,并在临床风险分层和治疗决策中作为重要的预测因子。尽管其十分重要,这些属性在模型设计中往往只被赋予辅助角色,对其表示学习的关注有限。本研究提出了一个通用人口统计学预训练(General Demographic Pre-trained, GDP)模型,作为针对年龄和性别的基础性表示框架。该模型使用来自不同地理区域、涵盖不同疾病和人群构成的数据集进行预训练和评估。GDP架构探索了排序策略与编码方法的组合,以将表格化的人口统计学输入转换为潜在嵌入。实验结果表明,顺序排序显著提升了模型在判别能力、校准以及每个决策树分裂处相应信息增益方面的表现,特别是在年龄和性别对风险分层有显著贡献的疾病中。即使在人口统计学属性预测价值相对较低的数据集中,GDP也增强了其表示重要性,提升了它们在下游梯度提升模型中的影响力。研究结果表明,面向表格化人口统计学属性的基础模型能够跨任务和跨人群泛化,为提升医疗健康应用中的预测性能提供了一个有前景的方向。
摘要:Demographic attributes are universally present in electronic health records and serve as vital predictors in clinical risk stratification and treatment decisions. Despite their significance, these attributes are often relegated to auxiliary roles in model design, with limited attention has been given to learning their representations. This study proposes a General Demographic Pre-trained (GDP) model as a foundational representation framework tailored to age and gender. The model is pre-trained and evaluated using datasets with diverse diseases and population compositions from different geographic regions. The GDP architecture explores combinations of ordering strategies and encoding methods to transform tabular demographic inputs into latent embeddings. Experimental results demonstrate that sequential ordering substantially improves model performance in discrimination, calibration, and the corresponding information gain at each decision tree split, particularly in diseases where age and gender contribute significantly to risk stratification. Even in datasets where demographic attributes hold relatively low predictive value, GDP enhances the representational importance, increasing their influence in downstream gradient boosting models. The findings suggest that foundational models for tabular demographic attributes can generalize across tasks and populations, offering a promising direction for improving predictive performance in healthcare applications.


【4】IP-Basis PINNs: Efficient Multi-Query Inverse Parameter Estimation
标题:IP-Basis PINN(逆参数基PINN):高效的多查询逆参数估计
链接:https://arxiv.org/abs/2509.07245

作者:nor, Mohammad Kohandel
备注:18 pages, 4 figures
摘要:对于多查询场景,使用物理信息神经网络(PINN)求解逆问题的计算成本很高,因为每一组新的观测数据都需要一次新的、昂贵的训练过程。我们提出了逆参数基PINN(IP-Basis PINNs),这是一个元学习框架,扩展了Desai等人(2022)的基础工作,以实现逆问题的快速高效推断。我们的方法采用离线-在线分解:首先离线训练一个深度网络,以产生一组丰富的基函数,这些基函数张成参数化微分方程的解空间。对于每个新的在线逆问题,该网络被冻结,仅通过针对观测数据训练一个轻量级线性输出层来推断解和参数。使我们的方法对逆问题有效的关键创新包括:(1)一种用于同时进行解重建与参数辨识的新型在线损失形式;(2)通过前向模式自动微分显著降低PDE损失评估的计算开销;(3)用于稳健离线训练的非平凡验证与提前停止机制。我们在三个不同的基准上证明了IP-Basis PINN的有效性,其中包括向含未知函数项的通用PINN的扩展,结果显示其在常数和函数参数估计上表现一致,相比标准PINN每次查询有显著加速,并且在数据稀缺和含噪声的情况下运行稳健。
摘要:Solving inverse problems with Physics-Informed Neural Networks (PINNs) is computationally expensive for multi-query scenarios, as each new set of observed data requires a new, expensive training procedure. We present Inverse-Parameter Basis PINNs (IP-Basis PINNs), a meta-learning framework that extends the foundational work of Desai et al. (2022) to enable rapid and efficient inference for inverse problems. Our method employs an offline-online decomposition: a deep network is first trained offline to produce a rich set of basis functions that span the solution space of a parametric differential equation. For each new inverse problem online, this network is frozen, and solutions and parameters are inferred by training only a lightweight linear output layer against observed data. Key innovations that make our approach effective for inverse problems include: (1) a novel online loss formulation for simultaneous solution reconstruction and parameter identification, (2) a significant reduction in computational overhead via forward-mode automatic differentiation for PDE loss evaluation, and (3) a non-trivial validation and early-stopping mechanism for robust offline training. We demonstrate the efficacy of IP-Basis PINNs on three diverse benchmarks, including an extension to universal PINNs for unknown functional terms-showing consistent performance across constant and functional parameter estimation, a significant speedup per query over standard PINNs, and robust operation with scarce and noisy data.
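下面给出上述离线-在线分解中"在线阶段"的最小示意(PyTorch):冻结离线训练好的基函数网络,仅针对观测数据拟合一个轻量线性输出层;完整方法还会在损失中加入含未知参数的PDE残差项(用前向模式自动微分计算),此处从略。所有名称均为示意性假设。

```python
import torch

def online_linear_fit(basis_net, x_obs, u_obs, n_steps=2000, lr=1e-2):
    """basis_net: 离线训练并冻结的网络, 输出K个基函数取值; 在线只学习线性组合系数w。"""
    for p in basis_net.parameters():
        p.requires_grad_(False)                      # 基函数网络保持冻结
    K = basis_net(x_obs[:1]).shape[-1]
    w = torch.zeros(K, 1, requires_grad=True)        # 轻量级线性输出层
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        u_pred = basis_net(x_obs) @ w                # 解 = 基函数的线性组合
        loss = ((u_pred - u_obs) ** 2).mean()        # 完整方法在此处再加PDE残差项
        loss.backward()
        opt.step()
    return w.detach()
```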


【5】Predicting effect of novel treatments using molecular pathways and real-world data
标题:使用分子途径和现实世界数据预测新型治疗方法的效果
链接:https://arxiv.org/abs/2509.07204

作者:uetoux, Thomas Devenyns, Lise Diagne, David Champagne, Pierre-Yves Mousset, Chris Anagnostopoulos
摘要:在药物研发中,在临床试验或任何实际使用之前预测某种药物治疗特定疾病的疗效一直具有挑战性。在本文中,我们提出了一种灵活且模块化的基于机器学习的方法,用于预测未经测试的药物治疗某种疾病的疗效。我们使用药物-通路权重影响评分集合和患者数据集(其中可以包括患者特征和观察到的临床结局)来训练机器学习模型。随后,所得模型分析未经测试的药物在人类生物分子-蛋白质通路上的加权影响评分,以生成预测的疗效值。我们在一个包含患者治疗和结局的真实世界数据集上,用两种不同的权重影响评分算法演示了该方法的工作方式。我们还给出了评估其对未见过治疗方案的泛化性能的方法,并刻画了该方法在何种条件下可望最具预测力。我们讨论了对该方法进行迭代改进的具体途径,使其成为一个初步框架,以支持未来利用真实世界临床数据(RWD)和药物嵌入来预测未经测试药物效果的工作。
摘要:In pharmaceutical R&D, predicting the efficacy of a pharmaceutical in treating a particular disease prior to clinical testing or any real-world use has been challenging. In this paper, we propose a flexible and modular machine learning-based approach for predicting the efficacy of an untested pharmaceutical for treating a disease. We train a machine learning model using sets of pharmaceutical-pathway weight impact scores and patient data, which can include patient characteristics and observed clinical outcomes. The resulting model then analyses weighted impact scores of an untested pharmaceutical across human biological molecule-protein pathways to generate a predicted efficacy value. We demonstrate how the method works on a real-world dataset with patient treatments and outcomes, with two different weight impact score algorithms We include methods for evaluating the generalisation performance on unseen treatments, and to characterise conditions under which the approach can be expected to be most predictive. We discuss specific ways in which our approach can be iterated on, making it an initial framework to support future work on predicting the effect of untested drugs, leveraging RWD clinical data and drug embeddings.


其他神经网络|深度学习|模型|建模(18篇)

【1】RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction
标题:RaC:通过扩展恢复与纠正实现长时程任务的机器人学习
链接:https://arxiv.org/abs/2509.07953

作者:u, Robyn Wu, Naveen Enock, Jasmine Li, Riya Kadakia, Zackory Erickson, Aviral Kumar
摘要:现代机器人模仿学习范式在大量人类演示数据上训练表达能力强的策略架构。然而,即使有成千上万的专家演示,在接触丰富、涉及可变形物体以及长时程的任务上,性能仍远低于完美执行。这是由于现有基于人类遥操作的"专家"数据收集流程效率低下。为了解决这一问题,我们引入了RaC,这是在模仿学习预训练之后、基于人在回路执行数据的新训练阶段。在RaC中,我们在展示恢复和纠正行为的人类干预轨迹上对机器人策略进行微调。具体而言,在策略执行(rollout)期间,当失败即将发生时,人类操作员进行干预:首先将机器人倒回到熟悉的分布内状态,然后提供完成当前子任务的纠正片段。在这种数据组合上训练,扩展了机器人的技能库,使其包含重试与自适应行为,我们证明这对于提升长时程任务的效率和鲁棒性至关重要。在三个真实世界的双手操控任务(衬衫悬挂、密封容器盖、外卖盒打包)以及一个模拟装配任务上,RaC以少10倍的数据收集时间和样本量超越了此前最先进的方法。我们还表明,RaC支持测试时扩展:训练好的RaC策略的性能随其执行的恢复动作次数线性提升。学习到的策略视频可在 https://rac-scaling-robot.github.io/ 获取。
摘要 :Modern paradigms for robot imitation train expressive policy architectures on large amounts of human demonstration data. Yet performance on contact-rich, deformable-object, and long-horizon tasks plateau far below perfect execution, even with thousands of expert demonstrations. This is due to the inefficiency of existing ``expert'' data collection procedures based on human teleoperation. To address this issue, we introduce RaC, a new phase of training on human-in-the-loop rollouts after imitation learning pre-training. In RaC, we fine-tune a robotic policy on human intervention trajectories that illustrate recovery and correction behaviors. Specifically, during a policy rollout, human operators intervene when failure appears imminent, first rewinding the robot back to a familiar, in-distribution state and then providing a corrective segment that completes the current sub-task. Training on this data composition expands the robotic skill repertoire to include retry and adaptation behaviors, which we show are crucial for boosting both efficiency and robustness on long-horizon tasks. Across three real-world bimanual control tasks: shirt hanging, airtight container lid sealing, takeout box packing, and a simulated assembly task, RaC outperforms the prior state-of-the-art using 10$\times$ less data collection time and samples. We also show that RaC enables test-time scaling: the performance of the trained RaC policy scales linearly in the number of recovery maneuvers it exhibits. Videos of the learned policy are available at https://rac-scaling-robot.github.io/.


【2】Bringing Multi-Modal Multi-Task Federated Foundation Models to Education Domain: Prospects and Challenges
标题:将多模态多任务联邦基础模型引入教育领域:前景与挑战
链接:https://arxiv.org/abs/2509.07946

作者:azjani, Naji Khosravan, Rajeev Sahay, Bita Akram, Seyyedali Hosseinalipour
备注:12 pages, 2 figures
摘要:多模态多任务(M3T)基础模型(FM)最近在人工智能领域展现出变革性潜力,并在教育领域出现新的应用。然而,它们在真实世界教育环境中的部署受到隐私法规、数据孤岛以及领域特定数据可用性有限的阻碍。我们提出了面向教育的M3T联邦基础模型(FedFM):一种将联邦学习(FL)与M3T FM相结合的范式,能够在去中心化的机构间进行协作式、保护隐私的训练,同时兼容多样的模态和任务。本立场文件旨在向教育界揭示M3T FedFM这一有前途但尚未被充分探索的方法,探讨其潜力,并指出相关的未来研究方向。我们概述了M3T FedFM如何推进下一代智能教育系统的三个关键支柱:(i)隐私保护,通过将敏感的多模态学生和机构数据保留在本地;(ii)个性化,通过模块化架构为学生、教师和机构提供量身定制的模型;(iii)公平与包容,通过促进代表性不足和资源受限实体的参与。最后,我们指出了若干开放的研究挑战,包括(i)机构间异构的隐私法规,(ii)数据模态特性的不一致性,(iii)M3T FedFM的遗忘(unlearning)方法,(iv)M3T FedFM的持续学习框架,以及(v)M3T FedFM模型的可解释性,这些问题必须共同解决才能实现实际部署。
摘要:Multi-modal multi-task (M3T) foundation models (FMs) have recently shown transformative potential in artificial intelligence, with emerging applications in education. However, their deployment in real-world educational settings is hindered by privacy regulations, data silos, and limited domain-specific data availability. We introduce M3T Federated Foundation Models (FedFMs) for education: a paradigm that integrates federated learning (FL) with M3T FMs to enable collaborative, privacy-preserving training across decentralized institutions while accommodating diverse modalities and tasks. Subsequently, this position paper aims to unveil M3T FedFMs as a promising yet underexplored approach to the education community, explore its potentials, and reveal its related future research directions. We outline how M3T FedFMs can advance three critical pillars of next-generation intelligent education systems: (i) privacy preservation, by keeping sensitive multi-modal student and institutional data local; (ii) personalization, through modular architectures enabling tailored models for students, instructors, and institutions; and (iii) equity and inclusivity, by facilitating participation from underrepresented and resource-constrained entities. We finally identify various open research challenges, including studying of (i) inter-institution heterogeneous privacy regulations, (ii) the non-uniformity of data modalities' characteristics, (iii) the unlearning approaches for M3T FedFMs, (iv) the continual learning frameworks for M3T FedFMs, and (v) M3T FedFM model interpretability, which must be collectively addressed for practical deployment.


【3】One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
标题:适用于所有任务的一个模型:在多任务规划中利用高效世界模型
链接:https://arxiv.org/abs/2509.07945

作者:Yazhe Niu, Jia Tang, Junyu Xiong, Shuai Hu, Hongsheng Li
备注:43 pages, 19 figures
摘要:在异质性多任务学习中,任务不仅表现出不同的观察和行动空间,而且内在难度也有很大差异。虽然像UniZero这样的传统多任务世界模型在单任务环境中表现出色,但我们发现,在处理大规模异构环境时,梯度冲突和模型可塑性的丧失往往会限制其样本和计算效率。在这项工作中,我们从两个角度来解决这些挑战:单次学习迭代和整体学习过程。首先,我们研究了关键设计空间对将UniZero扩展到多任务规划的影响。我们发现,混合专家(MoE)架构通过减轻梯度冲突提供了最大的性能增益,由此得到我们提出的模型ScaleZero。其次,为了在整个学习过程中动态平衡计算负载,我们引入了一种在线的、基于LoRA的动态参数缩放(DPS)策略。该策略根据特定任务的进展逐步集成LoRA适配器,从而实现自适应的知识保留和参数扩展。对Atari、DMControl(DMC)和Jericho等标准基准的实证评估表明,ScaleZero仅依靠单一模型的在线强化学习,即可达到与专门的单任务基线相当的性能。此外,当结合我们的动态参数缩放策略时,我们的方法在仅需80%的单任务环境交互步数的情况下取得了有竞争力的性能。这些发现凸显了ScaleZero在高效大规模多任务学习方面的潜力。我们的代码可在 https://github.com/opendilab/LightZero 获取。
摘要:In heterogeneous multi-task learning, tasks not only exhibit diverse observation and action spaces but also vary substantially in intrinsic difficulty. While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling large-scale heterogeneous environments, gradient conflicts and the loss of model plasticity often constrain their sample and computational efficiency. In this work, we address these challenges from two perspectives: the single learning iteration and the overall learning process. First, we investigate the impact of key design spaces on extending UniZero to multi-task planning. We find that a Mixture-of-Experts (MoE) architecture provides the most substantial performance gains by mitigating gradient conflicts, leading to our proposed model, ScaleZero. Second, to dynamically balance the computational load across the learning process, we introduce an online, LoRA-based dynamic parameter scaling (DPS) strategy. This strategy progressively integrates LoRA adapters in response to task-specific progress, enabling adaptive knowledge retention and parameter expansion. Empirical evaluations on standard benchmarks such as Atari, DMControl (DMC), and Jericho demonstrate that ScaleZero, relying exclusively on online reinforcement learning with one model, attains performance on par with specialized single-task baselines. Furthermore, when augmented with our dynamic parameter scaling strategy, our method achieves competitive performance while requiring only 80% of the single-task environment interaction steps. These findings underscore the potential of ScaleZero for effective large-scale multi-task learning. Our code is available at https://github.com/opendilab/LightZero.
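下面给出文中"基于LoRA的动态参数扩展"所依赖的LoRA适配器的最小示意(PyTorch):冻结共享主干的线性层,在其旁路加入低秩适配器;当某任务进展停滞时再为其挂载这类适配器。秩、缩放系数等均为示意性假设,并非LightZero仓库中的实现。

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """在冻结的线性层旁路加入低秩适配器: y = Wx + (alpha/r) * B A x"""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                               # 共享主干保持冻结
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # B初始化为0, 初始时不改变输出
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```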


【4】Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost
标题:小型开放模型在低资源文学翻译中以极低的成本实现与大型模型近乎对等
链接:https://arxiv.org/abs/2509.07829

作者:as, Laura Diosan, Andreea Tomescu, Andrei Piscoran
备注:25 pages, 8 figures, includes datasets and models released on Hugging Face
摘要:文学翻译作为机器翻译研究中一项独特而复杂的任务,近年来受到了广泛关注。然而,小型开放模型的文学翻译仍然是一个悬而未决的问题。我们通过介绍TINYFABULIST翻译框架(TF2)为正在进行的研究做出了贡献,这是一个用于英语-罗马尼亚语文学翻译的数据集创建、微调和评估的统一框架,核心是创建并开放发布一个紧凑的微调语言模型(TF2-12B)和大规模合成平行数据集(DS-TF2-EN-RO-3M和DS-TF2-EN-RO-15K)。在迄今最大的合成英语寓言集DS-TF1-EN-3M(TF1)的基础上,我们解决了罗马尼亚语等低资源语言对丰富、高质量文学数据集的需求。我们的管道首先使用高性能LLM从TF1池生成15k条高质量的罗马尼亚语参考译文。然后,我们对一个12B参数的开放权重模型应用两阶段微调:(i)指令调优以捕获特定体裁的叙事风格,以及(ii)适配器压缩以实现高效部署。评估结合了语料库级BLEU和基于LLM的五维评分标准(准确性、流畅性、连贯性、风格、文化适应性),以对翻译质量进行细致评估。结果表明,我们的微调模型在流畅性和充分性方面可与性能顶尖的大型专有模型竞争,同时是开放、可获取且显著更具成本效益的。除了微调模型和两个数据集之外,我们还公开发布了所有脚本和评估提示。因此,TF2提供了一个端到端、可复现的管道,用于研究具有成本效益的翻译、跨语言叙事生成,以及在低资源环境中广泛采用开放模型来处理具有文化意义的文学内容。
摘要:Literary translation has recently gained attention as a distinct and complex task in machine translation research. However, the translation by small open models remains an open problem. We contribute to this ongoing research by introducing TINYFABULIST TRANSLATION FRAMEWORK (TF2), a unified framework for dataset creation, fine tuning, and evaluation in English-Romanian literary translations, centred on the creation and open release of both a compact, fine tuned language model (TF2-12B) and large scale synthetic parallel datasets (DS-TF2-EN-RO-3M and DS-TF2-EN-RO-15K). Building on DS-TF1-EN-3M (TF1), the largest collection of synthetic English fables to date, we address the need for rich, high quality literary datasets in low resource languages such as Romanian. Our pipeline first generates 15k high quality Romanian references from the TF1 pool using a high performing LLM. We then apply a two stage fine tuning process to a 12B parameter open weight model: (i) instruction tuning to capture genre specific narrative style, and (ii) adapter compression for efficient deployment. Evaluation combines corpus level BLEU and a five dimension LLM based rubric (accuracy, fluency, coherence, style, cultural adaptation) to provide a nuanced assessment of translation quality. Results show that our fine tuned model achieves fluency and adequacy competitive with top performing large proprietary models, while being open, accessible, and significantly more cost effective. Alongside the fine tuned model and both datasets, we publicly release all scripts and evaluation prompts. TF2 thus provides an end-to-end, reproducible pipeline for research on cost efficient translation, cross lingual narrative generation, and the broad adoption of open models for culturally significant literary content in low resource settings.


【5】Homogenization with Guaranteed Bounds via Primal-Dual Physically Informed Neural Networks
标题:通过原始-对偶物理信息神经网络实现有保证界限的均匀化
链接:https://arxiv.org/abs/2509.07579

作者:utdinova, Martin Doškář, Ondřej Rokoš, Ivana Pultarová
摘要:物理信息神经网络(PINN)在求解与多尺度建模相关的偏微分方程(PDE)方面展现出潜力,但当应用于系数不连续的材料(例如具有分段常数特性的介质)时往往会失败。本文为PINN框架引入对偶形式,以提高周期性导热复合材料均匀化的可靠性,同时涵盖强形式和变分(弱)形式。对偶方法便于推导有保证的误差上界和下界,从而能够更鲁棒地检测PINN的失效。我们将应用于平滑材料近似的标准PINN与使用谱测试函数和基于神经网络测试函数的变分PINN(VPINN)进行了比较。结果表明,虽然强形式PINN在受控设置下可能优于VPINN,但它对材料不连续性敏感,并且可能在没有明确诊断信号的情况下失效。相比之下,VPINN可以直接处理分段常数的材料参数,但需要仔细选择测试函数以避免不稳定。对偶公式可作为收敛质量的可靠指标,将其集成到PINN框架中可增强其在细观力学均匀化问题中的适用性。
摘要:Physics-informed neural networks (PINNs) have shown promise in solving partial differential equations (PDEs) relevant to multiscale modeling, but they often fail when applied to materials with discontinuous coefficients, such as media with piecewise constant properties. This paper introduces a dual formulation for the PINN framework to improve the reliability of the homogenization of periodic thermo-conductive composites, for both strong and variational (weak) formulations. The dual approach facilitates the derivation of guaranteed upper and lower error bounds, enabling more robust detection of PINN failure. We compare standard PINNs applied to smoothed material approximations with variational PINNs (VPINNs) using both spectral and neural network-based test functions. Our results indicate that while strong-form PINNs may outperform VPINNs in controlled settings, they are sensitive to material discontinuities and may fail without clear diagnostics. In contrast, VPINNs accommodate piecewise constant material parameters directly but require careful selection of test functions to avoid instability. Dual formulation serves as a reliable indicator of convergence quality, and its integration into PINN frameworks enhances their applicability to homogenization problems in micromechanics.
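作为补充说明,下面给出强形式PINN残差的一个最小示意(PyTorch;假设一维稳态热传导问题 d/dx(k(x)·du/dx)=0,model、k_fn等均为示例假设,并非论文的实现),用于说明摘要中"强形式"损失的基本构造方式:

```python
import torch

def strong_form_residual(model, x, k_fn):
    # 一维稳态热传导 d/dx( k(x) * du/dx ) = 0 的强形式残差(示意)
    x = x.clone().requires_grad_(True)
    u = model(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    flux = k_fn(x) * du
    dflux = torch.autograd.grad(flux, x, torch.ones_like(flux), create_graph=True)[0]
    return dflux                                   # 训练时最小化残差平方的均值

# 用法示例:小型MLP与分段常数导热系数(示例假设)
model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
k_fn = lambda x: torch.where(x < 0.5, torch.ones_like(x), 10.0 * torch.ones_like(x))
res = strong_form_residual(model, torch.rand(64, 1), k_fn)
loss = (res ** 2).mean()
```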


【6】uGMM-NN: Univariate Gaussian Mixture Model Neural Network
标题:uGMM-NN:一元高斯混合模型神经网络
链接:https://arxiv.org/abs/2509.07569

作者:harif Ali
备注:10 pages, 2 figures
摘要:本文介绍了单变量高斯混合模型神经网络(uGMM-NN),这是一种将概率推理直接嵌入深度网络计算单元的新型神经架构。与先做加权求和、再施加固定非线性的传统神经元不同,每个uGMM-NN节点将其激活参数化为单变量高斯混合,具有可学习的均值、方差和混合系数。这种设计通过在单个神经元层面捕获多模态性和不确定性来获得更丰富的表示,同时保留标准前馈网络的可扩展性。我们证明,与传统多层感知器相比,uGMM-NN可以取得有竞争力的判别性能,同时还为激活提供概率解释。该框架为将不确定性感知组件集成到现代神经架构中提供了基础,为判别式和生成式建模开辟了新的方向。
摘要:This paper introduces the Univariate Gaussian Mixture Model Neural Network (uGMM-NN), a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed nonlinearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients. This design enables richer representations by capturing multimodality and uncertainty at the level of individual neurons, while retaining the scalability of standard feedforward networks. We demonstrate that uGMM-NN can achieve competitive discriminative performance compared to conventional multilayer perceptrons, while additionally offering a probabilistic interpretation of activations. The proposed framework provides a foundation for integrating uncertainty-aware components into modern neural architectures, opening new directions for both discriminative and generative modeling.
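下面是一个示意性草图(PyTorch;并非论文官方实现,节点的具体计算方式以论文为准),假设每个uGMM风格的神经元用一组可学习的一元高斯混合对各输入维度打分,并以对数似然之和作为激活:

```python
import math
import torch
import torch.nn as nn

class UGMMUnit(nn.Module):
    """示意:每个输入维度对应一个可学习的一元高斯混合(均值、方差、混合系数均可学习)。"""
    def __init__(self, in_features, n_components=4):
        super().__init__()
        self.means = nn.Parameter(torch.randn(in_features, n_components))
        self.log_vars = nn.Parameter(torch.zeros(in_features, n_components))
        self.mix_logits = nn.Parameter(torch.zeros(in_features, n_components))

    def forward(self, x):                                    # x: (batch, in_features)
        x = x.unsqueeze(-1)                                   # (batch, in_features, 1)
        log_norm = -0.5 * ((x - self.means) ** 2 / self.log_vars.exp()
                           + self.log_vars + math.log(2 * math.pi))
        log_mix = torch.log_softmax(self.mix_logits, dim=-1)
        log_prob = torch.logsumexp(log_mix + log_norm, dim=-1)  # 每个维度的混合对数似然
        return log_prob.sum(dim=-1)                           # 以对数似然之和作为激活

unit = UGMMUnit(in_features=8)
activation = unit(torch.randn(4, 8))                          # 形状为 (4,)
```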


【7】Reconstruction Alignment Improves Unified Multimodal Models
标题:重建对齐改进统一多模态模型
链接:https://arxiv.org/abs/2509.07295

作者:revor Darrell, Luke Zettlemoyer, XuDong Wang
备注:28 pages, 24 figures and 10 tables
摘要:统一多模态模型(UMM)在单一架构中统一了视觉理解与生成。然而,传统训练依赖于图像-文本对(或序列),其描述通常稀疏,即便用数百个词描述一张简单图像,也会遗漏细粒度的视觉细节。我们引入了重建对齐(RecA),这是一种资源高效的后训练方法,它把视觉理解编码器的嵌入用作密集的“文本提示”,在无需字幕的情况下提供丰富的监督。具体而言,RecA以UMM自身的视觉理解嵌入为条件,并用自监督重建损失优化模型以重建输入图像,从而重新对齐理解与生成。尽管RecA很简单,但它具有广泛的适用性:在自回归、掩码自回归和基于扩散的UMM中,它都能持续提升生成与编辑的保真度。仅用27个GPU小时,RecA的后训练就大幅提升了GenEval(0.73$\rightarrow$0.90)和DPGBench(80.93$\rightarrow$88.15)上的图像生成性能,同时也改善了编辑基准(ImgEdit 3.38$\rightarrow$3.75,GEdit 6.94$\rightarrow$7.25)。值得注意的是,RecA超越了规模大得多的开源模型,并可广泛应用于各种UMM架构,使其成为UMM的一种高效且通用的后训练对齐策略。
摘要:Unified multimodal models (UMMs) unify visual understanding and generation within a single architecture. However, conventional training relies on image-text pairs (or sequences) whose captions are typically sparse and miss fine-grained visual details--even when they use hundreds of words to describe a simple image. We introduce Reconstruction Alignment (RecA), a resource-efficient post-training method that leverages visual understanding encoder embeddings as dense "text prompts," providing rich supervision without captions. Concretely, RecA conditions a UMM on its own visual understanding embeddings and optimizes it to reconstruct the input image with a self-supervised reconstruction loss, thereby realigning understanding and generation. Despite its simplicity, RecA is broadly applicable: across autoregressive, masked-autoregressive, and diffusion-based UMMs, it consistently improves generation and editing fidelity. With only 27 GPU-hours, post-training with RecA substantially improves image generation performance on GenEval (0.73$\rightarrow$0.90) and DPGBench (80.93$\rightarrow$88.15), while also boosting editing benchmarks (ImgEdit 3.38$\rightarrow$3.75, GEdit 6.94$\rightarrow$7.25). Notably, RecA surpasses much larger open-source models and applies broadly across diverse UMM architectures, establishing it as an efficient and general post-training alignment strategy for UMMs
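下面给出一个高度简化的示意(PyTorch;ToyUMM、understander、generator等均为假设性占位模块,并非RecA官方实现),只为说明"以模型自身的理解嵌入为条件、用自监督重建损失重新对齐理解与生成"这一训练思路:

```python
import torch
import torch.nn as nn

class ToyUMM(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # 假设的视觉理解编码器与生成器,仅作占位
        self.understander = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.generator = nn.Sequential(nn.Linear(dim, 3 * 32 * 32), nn.Unflatten(1, (3, 32, 32)))

    def recon_align_loss(self, images):
        emb = self.understander(images)     # 视觉理解嵌入,充当密集"文本提示"
        recon = self.generator(emb)         # 以嵌入为条件重建输入图像
        return nn.functional.mse_loss(recon, images)

model = ToyUMM()
loss = model.recon_align_loss(torch.randn(8, 3, 32, 32))
loss.backward()
```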


【8】Learning Generalized Hamiltonian Dynamics with Stability from Noisy Trajectory Data
标题:从噪声轨迹数据中学习具有稳定性的广义Hamilton动力学
链接:https://arxiv.org/abs/2509.07280

作者:nnan, Yi Wang, Ryan Farell, Minh Nguyen, Chandrajit Bajaj
摘要:我们引入了一个鲁棒的框架,基于变分贝叶斯推理,以无监督的方式从含噪、稀疏的相空间数据中学习各种广义哈密顿动力学。虽然保守、耗散和端口哈密顿系统可能共享封闭系统相同的初始总能量,但要让单个哈密顿网络模型从采样的相空间观测轨迹中捕获相空间中各不相同且不断变化的运动动力学与物理规律,仍然具有挑战性。为应对这一复杂的哈密顿流形学习挑战,我们扩展了稀疏辛、随机傅里叶高斯过程学习,对哈密顿景观进行预测性的连续数值估计,并采用基于状态与共轭动量的广义哈密顿动力学形式,以适用于不同类别的保守、耗散和端口哈密顿物理系统。除了用于数据保真度的核化证据下界(ELBO)损失外,我们还将稳定性和守恒约束作为额外的、由超参数平衡的损失项,用以正则化模型的多重梯度,增强物理正确性,从而在有界不确定性下提高预测精度。
摘要:We introduce a robust framework for learning various generalized Hamiltonian dynamics from noisy, sparse phase-space data and in an unsupervised manner based on variational Bayesian inference. Although conservative, dissipative, and port-Hamiltonian systems might share the same initial total energy of a closed system, it is challenging for a single Hamiltonian network model to capture the distinctive and varying motion dynamics and physics of a phase space, from sampled observational phase space trajectories. To address this complicated Hamiltonian manifold learning challenge, we extend sparse symplectic, random Fourier Gaussian processes learning with predictive successive numerical estimations of the Hamiltonian landscape, using a generalized form of state and conjugate momentum Hamiltonian dynamics, appropriate to different classes of conservative, dissipative and port-Hamiltonian physical systems. In addition to the kernelized evidence lower bound (ELBO) loss for data fidelity, we incorporate stability and conservation constraints as additional hyper-parameter balanced loss terms to regularize the model's multi-gradients, enforcing physics correctness for improved prediction accuracy with bounded uncertainty.


【9】GCond: Gradient Conflict Resolution via Accumulation-based Stabilization for Large-Scale Multi-Task Learning
标题:GCond:通过基于累积的稳定化来解决大规模多任务学习的梯度冲突
链接:https://arxiv.org/abs/2509.07252

作者:ves Limarenko, Anastasiia Alexandrovna Studenikina
备注:Preprint. Submitted to PeerJ
摘要:在多任务学习(MTL)中,梯度冲突是一个重大挑战。解决这个问题的有效方法,包括PCGrad,CAGrad和GradNorm,在其原始实现中计算要求很高,这大大限制了它们在现代大型模型和Transformers中的应用。我们提出了梯度导体(GCond),一种建立在PCGrad原则基础上的方法,将其与梯度累积和自适应仲裁机制相结合。我们在自监督学习任务上评估了GCond,使用MobileNetV3-Small和ConvNeXt架构,在ImageNet-1K数据集和一个头颈部CT扫描组合数据集上,将所提出的方法与基线线性组合及最先进的梯度冲突解决方法进行了比较。GCond的随机模式在保持优化质量的同时实现了两倍的计算加速,并在所有评估的指标中表现出卓越的性能,与两个数据集上的其他方法相比,实现了更低的L1和SSIM损失。GCond具有很高的可扩展性,成功地应用于紧凑型模型(MobileNetV3-Small)和大型架构(ConvNeXt-tiny和ConvNeXt-Base)。它还显示出与现代优化器(如AdamW和Lion/LARS)的兼容性。因此,GCond为多任务学习中的梯度冲突问题提供了一个可扩展且有效的解决方案。
摘要:In multi-task learning (MTL), gradient conflict poses a significant challenge. Effective methods for addressing this problem, including PCGrad, CAGrad, and GradNorm, in their original implementations are computationally demanding, which significantly limits their application in modern large models and transformers. We propose Gradient Conductor (GCond), a method that builds upon PCGrad principles by combining them with gradient accumulation and an adaptive arbitration mechanism. We evaluated GCond on self-supervised learning tasks using MobileNetV3-Small and ConvNeXt architectures on the ImageNet 1K dataset and a combined head and neck CT scan dataset, comparing the proposed method against baseline linear combinations and state-of-the-art gradient conflict resolution methods. The stochastic mode of GCond achieved a two-fold computational speedup while maintaining optimization quality, and demonstrated superior performance across all evaluated metrics, achieving lower L1 and SSIM losses compared to other methods on both datasets. GCond exhibited high scalability, being successfully applied to both compact models (MobileNetV3-Small) and large architectures (ConvNeXt-tiny and ConvNeXt-Base). It also showed compatibility with modern optimizers such as AdamW and Lion/LARS. Therefore, GCond offers a scalable and efficient solution to the problem of gradient conflicts in multi-task learning.
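下面是一个示意性草图(PyTorch;并非GCond官方实现,仅展示其所基于的PCGrad式投影与梯度累积相结合的一般思路):当当前任务梯度与已累积梯度方向冲突(点积为负)时,先把任务梯度投影到累积梯度的法平面,再进行累加。

```python
import torch

def resolve_conflict(g_task, g_acc):
    # 若任务梯度与累积梯度冲突(点积为负),先投影到累积梯度的法平面,再累加
    dot = torch.dot(g_task, g_acc)
    if dot < 0 and g_acc.norm() > 0:
        g_task = g_task - dot / (g_acc.norm() ** 2 + 1e-12) * g_acc
    return g_acc + g_task

# 用法示例:依次累积各任务的展平梯度(示例数据)
grads = [torch.randn(10) for _ in range(3)]
g_total = torch.zeros(10)
for g in grads:
    g_total = resolve_conflict(g, g_total)
```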


【10】Breaking the Conventional Forward-Backward Tie in Neural Networks: Activation Functions
标题:打破神经网络中传统的前向-后向绑定:激活函数
链接:https://arxiv.org/abs/2509.07236

作者:iano, Francesco Gissi, Vincenzo Benedetto, Genny Tortora
备注:30 pages, 8 figures, 14 tables, in press, available online 11 August   2025
摘要:传统上,基于梯度的神经网络训练强制要求前向与后向传播之间的对称性,要求激活函数在某些区域可微(或次可微)且严格单调,以避免平坦的梯度区域。这种对称性把前向激活与后向梯度紧密绑定在一起,显著限制了激活函数的选择,尤其排除了那些存在大片平坦或不可微区域的激活函数。在本文中,我们通过数学分析挑战这一假设,证明只要梯度方向得到保留,由激活函数导出的精确梯度幅值在很大程度上是多余的。在基础架构(如多层感知器(MLP)、卷积神经网络(CNN)和二值神经网络(BNN))上进行的经验实验证实,放松前向-后向对称性并用更简单或随机的替代方案取代传统梯度不会损害学习,甚至可能提高训练的稳定性和效率。我们明确证明,具有平坦或不可微激活函数(如Heaviside阶跃函数)的神经网络可以被有效训练,从而拓展了设计灵活性和计算效率。在更复杂架构上的进一步实证验证仍是未来研究的一个有价值方向。
摘要:Gradient-based neural network training traditionally enforces symmetry between forward and backward propagation, requiring activation functions to be differentiable (or sub-differentiable) and strictly monotonic in certain regions to prevent flat gradient areas. This symmetry, linking forward activations closely to backward gradients, significantly restricts the selection of activation functions, particularly excluding those with substantial flat or non-differentiable regions. In this paper, we challenge this assumption through mathematical analysis, demonstrating that precise gradient magnitudes derived from activation functions are largely redundant, provided the gradient direction is preserved. Empirical experiments conducted on foundational architectures - such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Binary Neural Networks (BNNs) - confirm that relaxing forward-backward symmetry and substituting traditional gradients with simpler or stochastic alternatives does not impair learning and may even enhance training stability and efficiency. We explicitly demonstrate that neural networks with flat or non-differentiable activation functions, such as the Heaviside step function, can be effectively trained, thereby expanding design flexibility and computational efficiency. Further empirical validation with more complex architectures remains a valuable direction for future research.
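下面的示意代码(PyTorch;阈值窗口等细节为示例假设,并非论文的具体方案)演示了这一思路的一个常见实例:前向使用不可微的Heaviside阶跃激活,后向用保留方向信息的替代梯度(直通式估计):

```python
import torch

class HeavisideSTE(torch.autograd.Function):
    """前向:不可微的 Heaviside 阶跃;后向:直通式替代梯度(示意)。"""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # 在 |x|<=1 的窗口内直接透传上游梯度,保留方向信息(窗口为示例假设)
        return grad_out * (x.abs() <= 1).float()

x = torch.randn(4, requires_grad=True)
y = HeavisideSTE.apply(x).sum()
y.backward()
```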


【11】Explaining How Quantization Disparately Skews a Model
标题:解释量化如何不同程度地扭曲模型
链接:https://arxiv.org/abs/2509.07222

作者: Bellam, Jung-Eun Kim
摘要:训练后量化(PTQ)由于其高压缩率、高速度且对精度影响很小而被广泛采用。然而,我们观察到量化会加剧差异性影响,对少数群体尤其如此。我们的分析表明,在量化过程中,存在一连串因素,在前向与后向传播中造成不同群体之间的差异性影响。我们探讨了量化引起的权重和激活变化如何在网络中产生级联效应,导致logits方差降低、损失增大以及部分群体准确率受损。我们进一步验证了这些影响对各群体梯度范数和Hessian矩阵特征值的作用,从优化的角度提供了对网络状态的洞察。为了减轻这些影响,我们建议将混合精度量化感知训练(QAT)与数据集采样方法和加权损失函数相结合,从而实现量化神经网络的公平部署。
摘要:Post Training Quantization (PTQ) is widely adopted due to its high compression capacity and speed with minimal impact on accuracy. However, we observed that disparate impacts are exacerbated by quantization, especially for minority groups. Our analysis explains that in the course of quantization there is a chain of factors attributed to a disparate impact across groups during forward and backward passes. We explore how the changes in weights and activations induced by quantization cause cascaded impacts in the network, resulting in logits with lower variance, increased loss, and compromised group accuracies. We extend our study to verify the influence of these impacts on group gradient norms and eigenvalues of the Hessian matrix, providing insights into the state of the network from an optimization point of view. To mitigate these effects, we propose integrating mixed precision Quantization Aware Training (QAT) with dataset sampling methods and weighted loss functions, therefore providing fair deployment of quantized neural networks.


【12】Lookup multivariate Kolmogorov-Arnold Networks
标题:查找式多变量Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2509.07103

作者:zdnyakov, Philippe Schwaller
摘要:高维线性映射(即线性层)主导着大多数现代深度学习模型的参数量和计算成本。我们引入了一种通用的直接替代方案——查找式多变量Kolmogorov-Arnold网络(lmKAN),它在容量与推理成本之间提供了明显更好的权衡。我们的构造通过可训练的低维多变量函数来表达一般的高维映射。这些函数每个都可以携带数十到数百个可训练参数,但由于它们以样条查找表的形式实现,计算它们只需少量乘法。经验上,lmKAN在匹配MLP在一般高维函数逼近中的灵活性的同时,将推理FLOP最多减少6.0倍。在另一个前馈全连接基准、由随机位移的甲烷构型构成的表格型数据集上,lmKAN在同等精度下实现了超过10倍的H100吞吐量。在卷积神经网络框架内,基于lmKAN的CNN在CIFAR-10和ImageNet-1k数据集上、在精度相当的情况下分别将推理FLOP削减1.6-2.1倍和1.7倍。我们的代码(包括专用CUDA内核)可在https://github.com/schwallergroup/lmkan在线获取。
摘要:High-dimensional linear mappings, or linear layers, dominate both the parameter count and the computational cost of most modern deep-learning models. We introduce a general drop-in replacement, lookup multivariate Kolmogorov-Arnold Networks (lmKANs), which deliver a substantially better trade-off between capacity and inference cost. Our construction expresses a general high-dimensional mapping through trainable low-dimensional multivariate functions. These functions can carry dozens or hundreds of trainable parameters each, and yet it takes only a few multiplications to compute them because they are implemented as spline lookup tables. Empirically, lmKANs reduce inference FLOPs by up to 6.0x while matching the flexibility of MLPs in general high-dimensional function approximation. In another feedforward fully connected benchmark, on the tabular-like dataset of randomly displaced methane configurations, lmKANs enable more than 10x higher H100 throughput at equal accuracy. Within frameworks of Convolutional Neural Networks, lmKAN-based CNNs cut inference FLOPs at matched accuracy by 1.6-2.1x and by 1.7x on the CIFAR-10 and ImageNet-1k datasets, respectively. Our code, including dedicated CUDA kernels, is available online at https://github.com/schwallergroup/lmkan.
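作为示意,下面给出一个高度简化的一维可学习查找表函数(PyTorch;lmKAN实际使用的是低维多变量样条查找表,此处的网格大小与取值范围均为示例假设,并非论文实现),用来说明"查表代替稠密计算"的基本思想:

```python
import torch
import torch.nn as nn

class Lookup1D(nn.Module):
    """示意:在均匀网格上做线性插值的一维可学习查找表函数。"""
    def __init__(self, n_bins=64, x_min=-3.0, x_max=3.0):
        super().__init__()
        self.values = nn.Parameter(torch.zeros(n_bins + 1))   # 网格节点上的函数值
        self.x_min, self.x_max, self.n_bins = x_min, x_max, n_bins

    def forward(self, x):
        t = (x.clamp(self.x_min, self.x_max) - self.x_min) / (self.x_max - self.x_min) * self.n_bins
        idx = t.floor().long().clamp(max=self.n_bins - 1)
        frac = t - idx.float()
        # 每次求值只需一次查表和一次线性插值
        return (1 - frac) * self.values[idx] + frac * self.values[idx + 1]

f = Lookup1D()
y = f(torch.randn(10))
```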


【13】Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models
标题:面向文本到图像模型的基于矩和功率谱的高斯性正则化
链接:https://arxiv.org/abs/2509.07027

作者:ang, Jaihoon Kim, Minhyuk Sung
备注:Submitted to NeurIPS 2025
摘要:我们提出了一种新的正则化损失,它强制执行标准高斯性,鼓励样本与标准高斯分布对齐。这有助于一系列涉及在文本到图像模型潜在空间中进行优化的下游任务。我们将高维样本的各元素视为一维标准高斯变量,并定义了一个复合损失,把空间域中基于矩的正则化与谱域中基于功率谱的正则化结合起来。由于矩和功率谱分布的期望值在解析上已知,该损失可以促使样本符合这些性质。为确保排列不变性,损失作用于随机排列后的输入。值得注意的是,现有的基于高斯性的正则化都落入我们的统一框架:有些对应于特定阶数的矩损失,而此前的协方差匹配损失等价于我们的谱损失,但由于在空间域计算而具有更高的时间复杂度。我们展示了该正则化在生成建模中的应用,用于文本到图像模型的测试时奖励对齐,特别是提升美学质量与文本对齐度。我们的正则化优于以往的高斯性正则化,能有效防止奖励黑客行为并加速收敛。
摘要:We propose a novel regularization loss that enforces standard Gaussianity, encouraging samples to align with a standard Gaussian distribution. This facilitates a range of downstream tasks involving optimization in the latent space of text-to-image models. We treat elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power spectrum-based regularization in the spectral domain. Since the expected values of moments and power spectrum distributions are analytically known, the loss promotes conformity to these properties. To ensure permutation invariance, the losses are applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within our unified framework: some correspond to moment losses of specific orders, while the previous covariance-matching loss is equivalent to our spectral loss but incurs higher time complexity due to its spatial-domain computation. We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularization, effectively prevents reward hacking and accelerates convergence.
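下面给出一个示意性实现(PyTorch;所用矩的阶数、谱损失的具体形式均为示例假设,并非论文的精确公式),演示"空间域矩匹配 + 谱域功率谱匹配"的组合正则化思路:

```python
import torch

def gaussianity_loss(z, n_moments=4):
    """示意:鼓励样本元素服从标准高斯。
    空间域:匹配前几阶矩(标准高斯的奇数阶矩为0,二阶矩为1,四阶矩为3);
    频域:白高斯噪声的功率谱在各频率上的期望相等(平坦)。"""
    z = z.flatten()
    target = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}          # 标准高斯各阶矩的解析值
    moment_loss = sum((z.pow(k).mean() - target[k]) ** 2 for k in range(1, n_moments + 1))
    zp = z[torch.randperm(z.numel())]                   # 随机排列以保证排列不变性
    power = torch.fft.rfft(zp).abs().pow(2)
    spec_loss = ((power / power.mean() - 1.0) ** 2).mean()
    return moment_loss + spec_loss

loss = gaussianity_loss(torch.randn(1024))
```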


【14】1 bit is all we need: binary normalized neural networks
标题:我们只需要1位:二进制规范化神经网络
链接:https://arxiv.org/abs/2509.07025

作者:obo Lustoda Cabral, Paulo Pirozelli, Larissa Driemeier
备注:14 pages; 2 figures; 5 tables; 8 algorithms
摘要:大型神经网络模型(特别是语言模型和基础图像模型)的规模越来越大,这给部署带来了挑战,促使人们努力降低内存需求并提高计算效率。这些工作对于确保在各种应用程序中实际部署和有效利用这些模型至关重要。在这项工作中,开发了一种新型的神经网络层和模型,仅使用单比特参数。在这种新型的模型中,所有层的所有参数,包括核权重和偏置,只有等于零或一的值。这种新类型的模型使用称为二进制规范化层的层。这些二进制归一化层可以是任何类型的,例如全连接、卷积、注意力等,并且它们由相应的常规层的微小变化组成。为了显示二进制规范化层的有效性,配置了两个不同的模型来解决多类图像分类问题和语言解码器来预测序列的下一个标记。解决图像分类的模型具有卷积层和全连接层,语言模型由具有多头注意力的Transformer块组成。结果表明,具有二进制规范化层的模型与具有真实32位参数的等效模型所获得的结果几乎相同。二进制规范化层允许开发使用比当前模型少32倍的内存的模型,并具有相同的性能。此外,二进制归一化层可以很容易地实现在当前的计算机上使用1位阵列,并且不需要开发专用的电子硬件。这种新型的层为大型神经网络模型开辟了一个新时代,它降低了内存需求,可以使用简单廉价的硬件(如移动设备或仅CPU)进行部署。
摘要 :The increasing size of large neural network models, specifically language models and foundational image models, poses deployment challenges, prompting efforts to reduce memory requirements and enhance computational efficiency. These efforts are critical to ensure practical deployment and effective utilization of these models across various applications. In this work, a novel type of neural network layers and models is developed that uses only single-bit parameters. In this novel type of models all parameters of all layers, including kernel weights and biases, only have values equal to zero or one. This novel type of models uses layers named as binary normalized layer. These binary normalized layers can be of any type, such as fully connected, convolutional, attention, etc., and they consist of slight variations of the corresponding conventional layers. To show the effectiveness of the binary normalized layers, two different models are configured to solve a multiclass image classification problem and a language decoder to predict the next token of a sequence. The model to solve the image classification has convolutional and fully connected layers, and the language model is composed of transformer blocks with multi-head attention. The results show that models with binary normalized layers present almost the same results obtained by equivalent models with real 32-bit parameters. The binary normalized layers allow to develop models that use 32 times less memory than current models and have equivalent performance. Besides, the binary normalized layers can be easily implemented on current computers using 1-bit arrays, and do not require the development of dedicated electronic hardware. This novel type of layers opens a new era for large neural network models with reduced memory requirements that can be deployed using simple and cheap hardware, such as mobile devices or only cpus.
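下面是一个示意性草图(PyTorch;二值化方式与训练细节是常见做法的示例假设,未包含论文中的归一化部分,也非论文官方实现),演示"权重与偏置仅取0或1、用直通估计器训练"的基本思路:

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """示意:权重与偏置仅取 {0,1} 的全连接层,后向用直通估计器。"""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_latent = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.b_latent = nn.Parameter(torch.zeros(out_features))

    @staticmethod
    def binarize(p):
        soft = torch.sigmoid(p)
        hard = (soft > 0.5).float()              # 前向使用 0/1 的硬二值
        return soft + (hard - soft).detach()     # 后向梯度经由 soft 直通回潜在实值参数

    def forward(self, x):
        return nn.functional.linear(x, self.binarize(self.w_latent), self.binarize(self.b_latent))

layer = BinaryLinear(8, 4)
out = layer(torch.randn(2, 8)).sum()
out.backward()
```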


【15】Machine Generalize Learning in Agent-Based Models: Going Beyond Surrogate Models for Calibration in ABMs
标题:基于代理的模型中的机器泛化学习:超越用于ABM校准的替代模型
链接:https://arxiv.org/abs/2509.07013

作者:fzadehkhoei, George Vega Yon, Bernardo Modenesi, Derek S.Meyer
摘要:校准基于代理的流行病模型在计算上要求很高。我们提出了一个有监督的机器学习校准器,学习从流行病时间序列到SIR参数的逆映射。一个三层双向LSTM接收60天的发病率序列以及人口规模和恢复率,输出传播概率、接触率和R0。训练使用带有流行病学动机一致性惩罚的复合损失,鼓励 R0 × 恢复率 等于 传播概率 × 接触率。在1000个场景的模拟研究中,我们将该校准器与近似贝叶斯计算(无似然MCMC)进行了比较。该方法在所有目标上都取得了更低的误差(MAE:R0 0.0616 vs 0.275;传播概率 0.0715 vs 0.128;接触率 1.02 vs 4.24),产生了覆盖率接近标称水平且更紧的预测区间,并将每次校准的挂钟时间从77.4秒降低到2.35秒。虽然接触率和传播概率部分不可识别,但该方法比ABC更忠实地再现了流行曲线,从而实现快速而实用的校准。我们在基于epiworldR生成的SIR基于代理的流行病上对其进行了评估,并提供了R语言实现。
摘要:Calibrating agent-based epidemic models is computationally demanding. We present a supervised machine learning calibrator that learns the inverse mapping from epidemic time series to SIR parameters. A three-layer bidirectional LSTM ingests 60-day incidence together with population size and recovery rate, and outputs transmission probability, contact rate, and R0. Training uses a composite loss with an epidemiology-motivated consistency penalty that encourages R0 \* recovery rate to equal transmission probability \* contact rate.   In a 1000-scenario simulation study, we compare the calibrator with Approximate Bayesian Computation (likelihood-free MCMC). The method achieves lower error across all targets (MAE: R0 0.0616 vs 0.275; transmission 0.0715 vs 0.128; contact 1.02 vs 4.24), produces tighter predictive intervals with near nominal coverage, and reduces wall clock time from 77.4 s to 2.35 s per calibration. Although contact rate and transmission probability are partially nonidentifiable, the approach reproduces epidemic curves more faithfully than ABC, enabling fast and practical calibration. We evaluate it on SIR agent based epidemics generated with epiworldR and provide an implementation in R.
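下面用一小段示意代码(PyTorch;张量布局与各列含义均为示例假设,并非论文官方实现)说明摘要中的复合损失:监督回归项加上流行病学一致性惩罚,鼓励 R0 × 恢复率 ≈ 传播概率 × 接触率:

```python
import torch

def calibrator_loss(pred, target, gamma, lam=1.0):
    # pred / target 形状为 (batch, 3),三列假定依次为 [传播概率, 接触率, R0];gamma 为已知恢复率
    mse = torch.nn.functional.mse_loss(pred, target)
    p_trans, contact, r0 = pred[:, 0], pred[:, 1], pred[:, 2]
    consistency = ((r0 * gamma - p_trans * contact) ** 2).mean()   # 流行病学一致性惩罚
    return mse + lam * consistency

loss = calibrator_loss(torch.rand(4, 3), torch.rand(4, 3), gamma=torch.full((4,), 0.1))
```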


【16】Toward Reproducible Cross-Backend Compatibility for Deep Learning: A Configuration-First Framework with Three-Tier Verification
标题:迈向深度学习的可重复跨后端兼容性:具有三层验证的配置优先框架
链接:https://arxiv.org/abs/2509.06977

作者
备注:7 pages, 7 figures, 3 tables, appendix, code available at this https URL
摘要:本文提出了一个配置优先的框架,用于评估部署在CPU、GPU和编译运行时上的深度学习系统的跨后端兼容性。该框架使用YAML将实验与代码解耦,支持库模型和仓库模型,并采用涵盖张量级接近度、激活对齐和任务级指标的三层验证协议。通过在多个模型和容差设置下进行的672次检查,我们观察到72.0%的运行通过,大多数差异出现在更严格的阈值下。结果表明,检测模型和编译后端尤其容易出现漂移,这通常源于非确定性的后处理。我们进一步证明,确定性适配器和选择性回退可以在不造成明显性能损失的情况下大幅提高一致性。据我们所知,这是第一个系统地量化并缓解深度学习中跨后端漂移的统一框架,为跨异构运行时的可靠部署提供了可重复的方法。
摘要:This paper presents a configuration-first framework for evaluating cross-backend compatibility in deep learning systems deployed on CPU, GPU, and compiled runtimes. The framework decouples experiments from code using YAML, supports both library and repository models, and employs a three-tier verification protocol covering tensor-level closeness, activation alignment, and task-level metrics. Through 672 checks across multiple models and tolerance settings, we observe that 72.0% of runs pass, with most discrepancies occurring under stricter thresholds. Our results show that detection models and compiled backends are particularly prone to drift, often due to nondeterministic post-processing. We further demonstrate that deterministic adapters and selective fallbacks can substantially improve agreement without significant performance loss. To our knowledge, this is the first unified framework that systematically quantifies and mitigates cross-backend drift in deep learning, providing a reproducible methodology for dependable deployment across heterogeneous runtimes.
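下面是一个示意性的第一层(张量级接近度)检查(NumPy/Python;函数名、阈值与报告字段均为示例假设,并非该框架的实际接口),用于说明跨后端输出比对的基本做法:

```python
import numpy as np

def tier1_tensor_check(ref, out, rtol=1e-4, atol=1e-5):
    """比较参考后端与目标后端的输出张量,并报告最大绝对/相对误差(示意)。"""
    ref, out = np.asarray(ref), np.asarray(out)
    abs_err = np.abs(ref - out)
    rel_err = abs_err / (np.abs(ref) + 1e-12)
    passed = np.allclose(out, ref, rtol=rtol, atol=atol)
    return {"passed": passed, "max_abs": float(abs_err.max()), "max_rel": float(rel_err.max())}

# 用法示例(随机数据仅作演示)
report = tier1_tensor_check(np.random.rand(4, 10), np.random.rand(4, 10))
```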


【17】TGLF-SINN: Deep Learning Surrogate Model for Accelerating Turbulent Transport Modeling in Fusion
标题:TGLF-SINN:加速聚变湍流传输建模的深度学习替代模型
链接:https://arxiv.org/abs/2509.07024

作者: Futian Zhang, Wesley Liu, Tom Neiser, Orso Meneghini, Lawson Fuller, Sterling Smith, Raffi Nazikian, Brian Sammuli, Rose Yu
摘要:捕获陀螺-朗道流体(TGLF)模型可以快速、准确地预测托卡马克中的湍流输运,但需要数千次评估的全装置模拟在计算上仍然昂贵。神经网络(NN)替代模型提供完全可微的近似以加速推理,并支持基于梯度的耦合,但通常需要大规模训练数据集才能捕获不同等离子体条件下输运通量的变化,这带来了显著的训练负担,并限制了其在昂贵的陀螺动理学模拟中的适用性。我们提出\textbf{TGLF-SINN(谱信息神经网络)},包含三个关键创新:(1)缩小目标预测范围的原则性特征工程,简化学习任务;(2)对输运谱进行物理引导的正则化,以提高稀疏数据下的泛化能力;(3)贝叶斯主动学习(BAL),基于模型不确定性有策略地选择训练样本,在保持精度的同时减少数据需求。我们的方法以显著更少的训练数据实现了更优的性能。在离线设置中,与当前基线相比,TGLF-SINN将对数均方根误差(LRMSE)降低了12.4%。借助BAL仅使用完整数据集的25%,我们得到的LRMSE仅比基线高0.0165,比我们的离线模型(0.0583)高0.0248。在下游通量匹配应用中,我们的NN替代模型在保持相当精度的同时比TGLF快45倍,展示了为数据获取成本高且数据稀疏的更高保真度模型训练高效替代模型的潜力。
摘要:The Trapped Gyro-Landau Fluid (TGLF) model provides fast, accurate predictions of turbulent transport in tokamaks, but whole device simulations requiring thousands of evaluations remain computationally expensive. Neural network (NN) surrogates offer accelerated inference with fully differentiable approximations that enable gradient-based coupling but typically require large training datasets to capture transport flux variations across plasma conditions, creating significant training burden and limiting applicability to expensive gyrokinetic simulations. We propose \textbf{TGLF-SINN (Spectra-Informed Neural Network)} with three key innovations: (1) principled feature engineering that reduces target prediction range, simplifying the learning task; (2) physics-guided regularization of transport spectra to improve generalization under sparse data; and (3) Bayesian Active Learning (BAL) to strategically select training samples based on model uncertainty, reducing data requirements while maintaining accuracy. Our approach achieves superior performance with significantly less training data. In offline settings, TGLF-SINN reduces logarithmic root mean squared error (LRMSE) by 12. 4\% compared to the current baseline \base. Using only 25\% of the complete dataset with BAL, we achieve LRMSE only 0.0165 higher than \base~and 0.0248 higher than our offline model (0.0583). In downstream flux matching applications, our NN surrogate provides 45x speedup over TGLF while maintaining comparable accuracy, demonstrating potential for training efficient surrogates for higher-fidelity models where data acquisition is costly and sparse.


【18】Toric geometry of ReLU neural networks
标题:ReLU神经网络的环面几何
链接:https://arxiv.org/abs/2509.05894

作者:u
摘要:给定一个连续的有限分段线性函数$f:\mathbb{R}^{n_0} \to \mathbb{R}$和一个固定的前馈ReLU神经网络结构$(n_0,\ldots,n_k;1)$,精确函数实现问题是确定具有给定结构的某个网络何时能实现$f$。为了发展一种系统地回答这类问题的方法,我们在环面几何与ReLU神经网络之间建立了联系。这一途径使我们能够利用代数几何中的大量结构和工具来研究ReLU神经网络。从具有有理权重的无偏置ReLU神经网络出发,我们定义了与网络相关的ReLU扇(ReLU fan)、ReLU环面簇(ReLU toric variety)和ReLU Cartier除子。这项工作还揭示了ReLU神经网络的热带几何与环面几何之间的联系。作为环面几何框架的一个应用,我们通过计算ReLU Cartier除子与环面不变曲线的相交数,证明了无偏置浅层ReLU神经网络可实现函数的一个充要条件。
摘要:Given a continuous finitely piecewise linear function $f:\mathbb{R}^{n_0} \to \mathbb{R}$ and a fixed architecture $(n_0,\ldots,n_k;1)$ of feedforward ReLU neural networks, the exact function realization problem is to determine when some network with the given architecture realizes $f$. To develop a systematic way to answer these questions, we establish a connection between toric geometry and ReLU neural networks. This approach enables us to utilize numerous structures and tools from algebraic geometry to study ReLU neural networks. Starting with an unbiased ReLU neural network with rational weights, we define the ReLU fan, the ReLU toric variety, and the ReLU Cartier divisor associated with the network. This work also reveals the connection between the tropical geometry and the toric geometry of ReLU neural networks. As an application of the toric geometry framework, we prove a necessary and sufficient criterion of functions realizable by unbiased shallow ReLU neural networks by computing intersection numbers of the ReLU Cartier divisor and torus-invariant curves.


其他(27篇)

【1】Customizing the Inductive Biases of Softmax Attention using Structured Matrices
标题:使用结构化矩阵定制Softmax注意力的归纳偏置
链接:https://arxiv.org/abs/2509.07963

作者:ng, Noah Amsel, Sanae Lotfi, Shikai Qiu, Andres Potapczynski, Andrew Gordon Wilson
备注:ICML 2025. Code available at this https URL
摘要:注意力的核心组件是评分函数,它将输入变换为低维的查询和键,并对每一对取点积。虽然低维投影提高了效率,但对于一些输入本质上是高维的任务,它会造成信息损失。此外,注意力对所有输入对使用相同的评分函数,不会对序列中相邻的标记施加依赖距离的计算偏置。在这项工作中,我们提出基于高秩且计算高效的结构化矩阵(包括块张量训练(BTT)矩阵和多级低秩(MLR)矩阵)的新评分函数来解决这些不足。在具有高维输入的上下文回归任务中,在任何固定计算预算下,我们提出的评分函数都优于标准注意力。在表现出局部性模式的语言建模任务上,与标准注意力及滑动窗口注意力的变体相比,我们基于MLR的注意力方法获得了更好的缩放规律。此外,我们证明BTT和MLR都属于一个更广泛的高效结构化矩阵族,该族能够编码满秩或距离依赖的计算偏置,从而解决了标准注意力的重要缺陷。最后,我们表明MLR注意力在长程时间序列预测上取得了有希望的结果。
摘要:The core component of attention is the scoring function, which transforms the inputs into low-dimensional queries and keys and takes the dot product of each pair. While the low-dimensional projection improves efficiency, it causes information loss for certain tasks that have intrinsically high-dimensional inputs. Additionally, attention uses the same scoring function for all input pairs, without imposing a distance-dependent compute bias for neighboring tokens in the sequence. In this work, we address these shortcomings by proposing new scoring functions based on computationally efficient structured matrices with high ranks, including Block Tensor-Train (BTT) and Multi-Level Low Rank (MLR) matrices. On in-context regression tasks with high-dimensional inputs, our proposed scoring functions outperform standard attention for any fixed compute budget. On language modeling, a task that exhibits locality patterns, our MLR-based attention method achieves improved scaling laws compared to both standard attention and variants of sliding window attention. Additionally, we show that both BTT and MLR fall under a broader family of efficient structured matrices capable of encoding either full-rank or distance-dependent compute biases, thereby addressing significant shortcomings of standard attention. Finally, we show that MLR attention has promising results for long-range time-series forecasting.


【2】ACE and Diverse Generalization via Selective Disagreement
标题:ACE与通过选择性分歧实现的多样化泛化
链接:https://arxiv.org/abs/2509.07955

作者:niels, Stuart Armstrong, Alexandre Maranhão, Mahirah Fairuz Rahman, Benjamin M. Marlin, Rebecca Gorman
摘要:众所周知,深度神经网络对虚假相关性非常敏感——模型学到一条在分布外会失效的捷径。现有关于虚假相关性的工作通常聚焦于不完全相关,依赖于能够获得打破该相关性的带标签样本。但在虚假相关是完全相关的情况下,正确的泛化方式本质上是欠规范的。为了解决这种欠规范,我们提出学习一组概念,它们与训练数据一致,却在一部分新的未标注输入上做出不同的预测。通过鼓励\textit{自信}且\textit{有选择}分歧的自训练方法,我们的方法ACE在一套完全虚假相关基准上匹配或超过现有方法,同时对不完全虚假相关保持鲁棒。ACE也比先前的方法更易配置,允许直接编码先验知识并进行有原则的无监督模型选择。在语言模型对齐的一个早期应用中,我们发现ACE在测量篡改检测基准上取得了有竞争力的性能,而无需访问不可信的测量。尽管仍存在重要限制,ACE代表了在克服欠规范问题上的重大进展。
摘要:Deep neural networks are notoriously sensitive to spurious correlations - where a model learns a shortcut that fails out-of-distribution. Existing work on spurious correlations has often focused on incomplete correlations,leveraging access to labeled instances that break the correlation. But in cases where the spurious correlations are complete, the correct generalization is fundamentally \textit{underspecified}. To resolve this underspecification, we propose learning a set of concepts that are consistent with training data but make distinct predictions on a subset of novel unlabeled inputs. Using a self-training approach that encourages \textit{confident} and \textit{selective} disagreement, our method ACE matches or outperforms existing methods on a suite of complete-spurious correlation benchmarks, while remaining robust to incomplete spurious correlations. ACE is also more configurable than prior approaches, allowing for straight-forward encoding of prior knowledge and principled unsupervised model selection. In an early application to language-model alignment, we find that ACE achieves competitive performance on the measurement tampering detection benchmark \textit{without} access to untrusted measurements. While still subject to important limitations, ACE represents significant progress towards overcoming underspecification.


【3】Smart Fast Finish: Preventing Overdelivery via Daily Budget Pacing at DoorDash
标题:智能快速完成:通过DoorDash的每日预算节奏来防止超额交付
链接:https://arxiv.org/abs/2509.07929

作者:g, Yongjin Xiao, Jason (Dianxia)Yang, Mandar Rahurkar
摘要:我们提出了一种称为智能快速完成(SFF)的预算节奏控制功能。SFF建立在预算节奏系统中行业标准的快速完成(FF)功能之上,后者在某个固定时间段临近结束时尽快耗尽剩余的广告预算。SFF根据历史广告活动数据动态更新开始时间和节流速率等系统参数。SFF目前在美国最大的配送平台之一DoorDash使用,是其预算节奏系统的一部分。我们通过在线预算拆分实验数据和离线模拟表明,SFF是在预算节奏控制中缓解超额投放的一种鲁棒解决方案。
摘要:We present a budget pacing feature called Smart Fast Finish (SFF). SFF builds upon the industry standard Fast Finish (FF) feature in budget pacing systems that depletes remaining advertising budget as quickly as possible towards the end of some fixed time period. SFF dynamically updates system parameters such as start time and throttle rate depending on historical ad-campaign data. SFF is currently in use at DoorDash, one of the largest delivery platforms in the US, and is part of its budget pacing system. We show via online budget-split experimentation data and offline simulations that SFF is a robust solution for overdelivery mitigation when pacing budget.


【4】Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s
标题:加速消费级GPU上的本地人工智能:面向YOLOv10s的硬件感知动态策略
链接:https://arxiv.org/abs/2509.07928

作者:Islam Masum, Miad Islam, Arif I. Sarwat
备注:6 pages, 7 figures
摘要:随着本地人工智能的普及,目标检测器的基准性能与其在消费级硬件上的实际可行性之间存在着关键差距。虽然像YOLOv10这样的模型承诺实时速度,但这些指标通常是在高功率桌面级GPU上实现的。本文揭示了在资源受限的系统上,例如配备RTX 4060 GPU的笔记本电脑,性能不受计算限制,而是由系统级瓶颈主导,如简单的瓶颈测试所示。为了克服这个硬件级的约束,我们引入了一个两遍自适应推理算法,模型无关的方法,不需要架构的变化。本研究主要集中在自适应推理策略,并进行了比较分析的架构早退出和分辨率自适应路由,突出各自的权衡在一个统一的评估框架。该系统使用快速、低分辨率通道,并且仅在检测置信度低时才升级到高分辨率模型通道。在5000张图像的COCO数据集上,我们的方法在PyTorch早期退出基线上实现了1.85倍的加速,适度的mAP损失为5.51%。这项工作为在消费级设备上部署高性能实时AI提供了一个实用且可复制的蓝图,将重点从纯模型优化转移到最大化吞吐量的硬件感知推理策略。
摘要:As local AI grows in popularity, there is a critical gap between the benchmark performance of object detectors and their practical viability on consumer-grade hardware. While models like YOLOv10s promise real-time speeds, these metrics are typically achieved on high-power, desktop-class GPUs. This paper reveals that on resource-constrained systems, such as laptops with RTX 4060 GPUs, performance is not compute-bound but is instead dominated by system-level bottlenecks, as illustrated by a simple bottleneck test. To overcome this hardware-level constraint, we introduce a Two-Pass Adaptive Inference algorithm, a model-independent approach that requires no architectural changes. This study mainly focuses on adaptive inference strategies and undertakes a comparative analysis of architectural early-exit and resolution-adaptive routing, highlighting their respective trade-offs within a unified evaluation framework. The system uses a fast, low-resolution pass and only escalates to a high-resolution model pass when detection confidence is low. On a 5000-image COCO dataset, our method achieves a 1.85x speedup over a PyTorch Early-Exit baseline, with a modest mAP loss of 5.51%. This work provides a practical and reproducible blueprint for deploying high-performance, real-time AI on consumer-grade devices by shifting the focus from pure model optimization to hardware-aware inference strategies that maximize throughput.
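下面给出一个与具体检测器无关的示意草图(PyTorch;模型接口与分辨率设置均为示例假设,并非论文官方实现),说明"先低分辨率快速通道、置信度不足时再升级到高分辨率通道"的两遍自适应推理流程:

```python
import torch

def two_pass_detect(image, fast_model, accurate_model, conf_threshold=0.5):
    """先低分辨率快速通道;若最高检测置信度低于阈值,再升级到高分辨率通道(示意)。
    假设模型接口为 model(image) -> (boxes, scores)。"""
    low = torch.nn.functional.interpolate(image, size=(320, 320), mode="bilinear", align_corners=False)
    boxes, scores = fast_model(low)
    if scores.numel() == 0 or scores.max() < conf_threshold:
        high = torch.nn.functional.interpolate(image, size=(640, 640), mode="bilinear", align_corners=False)
        boxes, scores = accurate_model(high)
    return boxes, scores

# 用法示例:用返回空检测结果的占位模型演示控制流
dummy = lambda x: (torch.zeros(0, 4), torch.zeros(0))
boxes, scores = two_pass_detect(torch.randn(1, 3, 640, 640), dummy, dummy)
```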


【5】Feasibility of In-Ear Single-Channel ExG for Wearable Sleep~Monitoring in Real-World Settings
标题:入耳式单通道ExG用于现实环境中可穿戴睡眠监测的可行性
链接:https://arxiv.org/abs/2509.07896

作者:epold, Jonas Leichtle, Tobias Röddiger, Michael Beigl
摘要:自动睡眠分期通常依赖金标准的EEG设置,这类设置虽然准确,但对睡眠实验室之外的日常使用来说过于突兀且不切实际。这限制了其在需要连续长期监测的现实环境(如家庭环境)中的适用性。检测入睡时刻尤为重要,它可以支撑消费级应用(例如,当用户入睡时自动暂停媒体播放)。最近的研究表明,耳内EEG与全头皮EEG在多种现象上存在相关性,这意味着可穿戴的入耳式设备有望实现不显眼的睡眠监测。我们通过对11名参与者(平均年龄24岁)进行睡眠研究,考察了在可穿戴设备中使用单通道耳内电生理(ExG)信号进行自动睡眠分期的可行性:一只耳朵佩戴带干式耳塞电极(Dätwyler SoftPulse)的定制耳机作为测量电极,另一只耳朵作为参考电极。真实睡眠分期标签来自经过睡眠分期验证的Apple Watch Ultra。采用留一被试交叉验证,我们的系统在二分类睡眠检测(清醒 vs. 睡眠)上达到90.5%的准确率,在四分类分期(Awake、REM、Core、Deep)上达到65.1%的准确率。这些发现展示了耳内电极作为一种低负担、舒适的睡眠监测方式的潜力,其应用包括在用户入睡时停止播放播客。
摘要:Automatic sleep staging typically relies on gold-standard EEG setups, which are accurate but obtrusive and impractical for everyday use outside sleep laboratories. This limits applicability in real-world settings, such as home environments, where continuous, long-term monitoring is needed. Detecting sleep onset is particularly relevant, enabling consumer applications (e.g. automatically pausing media playback when the user falls asleep). Recent research has shown correlations between in-ear EEG and full-scalp EEG for various phenomena, suggesting wearable, in-ear devices could allow unobtrusive sleep monitoring. We investigated the feasibility of using single-channel in-ear electrophysiological (ExG) signals for automatic sleep staging in a wearable device by conducting a sleep study with 11~participants (mean age: 24), using a custom earpiece with a dry eartip electrode (D\"atwyler SoftPulse) as a measurement electrode in one ear and a reference in the other. Ground truth sleep stages were obtained from an Apple Watch Ultra, validated for sleep staging. Our system achieved 90.5% accuracy for binary sleep detection (Awake vs. Asleep) and 65.1% accuracy for four-class staging (Awake, REM, Core, Deep) using leave-one-subject-out validation. These findings demonstrate the potential of in-ear electrodes as a low-effort, comfortable approach to sleep monitoring, with applications such as stopping podcasts when users fall asleep.


【6】FUnc-SNE: A flexible, Fast, and Unconstrained algorithm for neighbour embeddings
标题:FUnc-SNE:一种灵活、快速且无约束的邻居嵌入算法
链接:https://arxiv.org/abs/2509.07681

作者:mbert, Edouard Couplet, Michel Verleysen, John Aldo Lee
备注:Preprint submitted to Neurocomputing
摘要:邻域嵌入(NE)允许将高维数据集表示为低维空间,并且经常用于数据可视化。在实践中,加速近似用于处理非常大的数据集。加速NE是具有挑战性的,并且已经探索了两个主要方向:基于负采样的非常粗略的近似(如在UMAP中)实现高有效速度,但可能在提取的结构中缺乏质量;不太粗略的近似,如在FIT-SNE或BH-t-SNE中使用的,以速度为代价提供更好的结构保留,同时还将目标维度限制为2或3,将NE限制为可视化。在某些变体中,这些成本更高的加速度的精度还可以通过专用的超参数对提取的结构进行更细粒度的控制。   本文提出通过引入一种新的方法来加速NE,从而弥合这两种方法之间的差距,每次迭代需要少量的计算,同时通过超参数调整保持良好的细粒度结构保留和灵活性,而不限制嵌入空间的维数。该方法旨在交互式探索数据;因此,它放弃了其他NE方法的传统两阶段方法,允许在改变超参数时即时视觉反馈,即使这些控制过程发生在计算的高维侧。使用公开可用的GPU加速的GUI集成的方法的实验显示出在提取的结构的速度和灵活性方面的有希望的结果,并显示出在更广泛的机器学习环境中的潜在用途,具有最小的算法修改。该算法的核心是一种新的迭代近似最近邻搜索方法,与最近邻下降法相比,该方法显示出良好的效果。
摘要:Neighbour embeddings (NE) allow the representation of high dimensional datasets into lower dimensional spaces and are often used in data visualisation. In practice, accelerated approximations are employed to handle very large datasets. Accelerating NE is challenging, and two main directions have been explored: very coarse approximations based on negative sampling (as in UMAP) achieve high effective speed but may lack quality in the extracted structures; less coarse approximations, as used in FIt-SNE or BH-t-SNE, offer better structure preservation at the cost of speed, while also restricting the target dimensionality to 2 or 3, limiting NE to visualisation. In some variants, the precision of these costlier accelerations also enables finer-grained control on the extracted structures through dedicated hyperparameters.   This paper proposes to bridge the gab between both approaches by introducing a novel way to accelerate NE, requiring a small number of computations per iteration while maintaining good fine-grained structure preservation and flexibility through hyperparameter tuning, without limiting the dimensionality of the embedding space. The method was designed for interactive exploration of data; as such, it abandons the traditional two-phased approach of other NE methods, allowing instantaneous visual feedback when changing hyperparameters, even when these control processes happening on the high-dimensional side of the computations. Experiments using a publicly available, GPU accelerated GUI integration of the method show promising results in terms of speed, flexibility in the structures getting extracted, and show potential uses in broader machine learning contexts with minimal algorithmic modifications. Central to this algorithm is a novel approach to iterative approximate nearest neighbour search, which shows promising results compared to nearest neighbour descent.


【7】$ΔL$ Normalization: Rethink Loss Aggregation in RLVR
标题:$\Delta L$归一化:重新思考RLVR中的损失聚合
链接:https://arxiv.org/abs/2509.07558

作者:e, Xufang Luo, Yike Zhang, Yuqing Yang, Lili Qiu
摘要:我们提出了$\Delta L$ Normalization,这是一种简单而有效的损失聚合方法,专门针对可验证奖励强化学习(RLVR)中生成长度动态变化的特点。最近,RLVR在提升大型语言模型(LLM)推理能力方面展现出强大潜力,但一个主要挑战在于训练过程中响应长度变化很大,导致梯度方差高、优化不稳定。虽然先前的方法(如GRPO、DAPO和Dr. GRPO)引入了不同的损失归一化项来缓解这一问题,但它们要么产生有偏估计,要么仍然受高梯度方差困扰。通过在理论和实证上分析不同长度对策略损失的影响,我们将该问题重新表述为寻找最小方差无偏估计量。我们提出的$\Delta L$ Normalization不仅给出真实策略损失的无偏估计,而且在理论上最小化梯度方差。大量实验表明,它在不同模型规模、最大长度和任务上都能持续取得更优的结果。我们的代码将在https://github.com/zerolllin/Delta-L-Normalization公开发布。
摘要:We propose $\Delta L$ Normalization, a simple yet effective loss aggregation method tailored to the characteristic of dynamic generation lengths in Reinforcement Learning with Verifiable Rewards (RLVR). Recently, RLVR has demonstrated strong potential in improving the reasoning capabilities of large language models (LLMs), but a major challenge lies in the large variability of response lengths during training, which leads to high gradient variance and unstable optimization. Although previous methods such as GRPO, DAPO, and Dr. GRPO introduce different loss normalization terms to address this issue, they either produce biased estimates or still suffer from high gradient variance. By analyzing the effect of varying lengths on policy loss both theoretically and empirically, we reformulate the problem as finding a minimum-variance unbiased estimator. Our proposed $\Delta L$ Normalization not only provides an unbiased estimate of the true policy loss but also minimizes gradient variance in theory. Extensive experiments show that it consistently achieves superior results across different model sizes, maximum lengths, and tasks. Our code will be made public at https://github.com/zerolllin/Delta-L-Normalization.


【8】Autonomous Code Evolution Meets NP-Completeness
标题:自主代码进化满足NP完整性
链接:https://arxiv.org/abs/2509.07367

作者: Rongjian Liang, Chia-Tung Ho, Haoxing Ren
备注:31 pages, 11 figures
摘要:大型语言模型(LLM)最近展现出强大的编码能力,不仅能进行静态代码生成,还能通过代理框架实现代码的迭代自我进化。最近,AlphaEvolve证明了基于LLM的编码代理可以自主改进算法并超越人类专家,但其范围仅限于数百行代码的孤立内核。受AlphaEvolve启发,我们提出了SATLUTION,这是第一个将基于LLM的代码演化扩展到完整代码仓库规模的框架,涵盖数百个文件和数万行C/C++代码,目标是布尔可满足性(SAT)问题:规范的NP完全问题,也是理论与应用的基石。SATLUTION在严格的正确性保证和分布式运行时反馈下,协调LLM代理直接演化求解器代码仓库,同时自我演化其自身的演化策略与规则。从SAT Competition 2024的代码库和基准出发,SATLUTION演化出的求解器决定性地超越了SAT Competition 2025中由人类设计的获胜者,并在2024年基准上同时超过了2024年和2025年的冠军求解器。
摘要:Large language models (LLMs) have recently shown strong coding abilities, enabling not only static code generation but also iterative code self-evolving through agentic frameworks. Recently, AlphaEvolve \cite{novikov2025alphaevolve} demonstrated that LLM-based coding agents can autonomously improve algorithms and surpass human experts, with scopes limited to isolated kernels spanning hundreds of lines of code. Inspired by AlphaEvolve, we present SATLUTION, the first framework to extend LLM-based code evolution to the full repository scale, encompassing hundreds of files and tens of thousands of lines of C/C++ code. Targeting Boolean Satisfiability (SAT), the canonical NP-complete problem and a cornerstone of both theory and applications. SATLUTION orchestrates LLM agents to directly evolve solver repositories under strict correctness guarantees and distributed runtime feedback, while simultaneously self-evolving its own evolution policies and rules. Starting from SAT Competition 2024 codebases and benchmark, SATLUTION evolved solvers that decisively outperformed the human-designed winners of the SAT Competition 2025, and also surpassed both 2024 and 2025 champions on the 2024 benchmarks.


【9】Causal Attention with Lookahead Keys
标题:具有前瞻键的因果注意力
链接:https://arxiv.org/abs/2509.07301

作者:Song, Peng Sun, Huizhuo Yuan, Quanquan Gu
摘要:在标准因果注意力中,每个标记的查询、键和值(QKV)都是静态的,只编码其之前的上下文。我们引入了带前瞻键的因果注意力(CAuSal aTtention with Lookahead kEys,CASTLE),这是一种随着上下文展开而不断更新每个标记的键的注意力机制。我们把这些更新后的键称为前瞻键:它们属于较早的位置,却整合了相对这些位置更晚出现的标记的信息,同时严格保持自回归性质。虽然该机制看似是顺序的,但我们推导出一个数学等价形式,避免在每个位置显式构造前瞻键,从而实现高效的并行训练。在语言建模基准上,CASTLE在各个模型规模上都稳定优于标准因果注意力,降低了验证困惑度,并提升了一系列下游任务的性能。
摘要:In standard causal attention, each token's query, key, and value (QKV) are static and encode only preceding context. We introduce CAuSal aTtention with Lookahead kEys (CASTLE), an attention mechanism that continually updates each token's keys as the context unfolds. We term these updated keys lookahead keys because they belong to earlier positions yet integrate information from tokens that appear later relative to those positions, while strictly preserving the autoregressive property. Although the mechanism appears sequential, we derive a mathematical equivalence that avoids explicitly materializing lookahead keys at each position and enables efficient parallel training. On language modeling benchmarks, CASTLE consistently outperforms standard causal attention across model scales, reducing validation perplexity and improving performance on a range of downstream tasks.


【10】ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers
标题:ALICE:一种用于替换密码泛化的可解释神经架构
链接:https://arxiv.org/abs/2509.07282

作者:, Lindsay Smith
备注:Preprint. Project page at this https URL
摘要:我们把破解替换密码(cryptogram solving)作为研究神经网络在组合复杂领域中泛化能力的理想试验平台。在这一任务中,模型必须解密用替换密码编码的文本,在不显式访问密码的情况下从26!种可能的映射中做出选择。我们开发了ALICE(an Architecture for Learning Interpretable Cryptogram dEcipherment):一个简单的仅编码器Transformer,在这一解密问题上于准确率和速度两方面都创造了新的最先进水平。令人惊讶的是,ALICE在仅约1500个不同密码上训练后即可泛化到未见过的密码,这只占可能密码空间的极小一部分($3.7 \times 10^{-24}$)。为了增强可解释性,我们引入了一种新颖的双射解码头,通过Gumbel-Sinkhorn方法显式建模置换,从而可以直接提取学到的密码映射。通过提前退出分析,我们揭示了ALICE逐步细化预测的方式,似乎与人类完成该任务的常见策略相呼应:早期层采用基于频率的启发式,中间层形成单词结构,最后几层纠正个别字符。我们的架构创新和分析方法不仅适用于密码破译,还可推广到任何具有双射映射和组合结构的领域,为神经网络的泛化与可解释性提供了新的见解。
摘要:We present cryptogram solving as an ideal testbed for studying neural network generalization in combinatorially complex domains. In this task, models must decrypt text encoded with substitution ciphers, choosing from 26! possible mappings without explicit access to the cipher. We develop ALICE (an Architecture for Learning Interpretable Cryptogram dEcipherment): a simple encoder-only Transformer that sets a new state-of-the-art for both accuracy and speed on this decryption problem. Surprisingly, ALICE generalizes to unseen ciphers after training on only ${\sim}1500$ unique ciphers, a minute fraction ($3.7 \times 10^{-24}$) of the possible cipher space. To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Through early exit analysis, we reveal how ALICE progressively refines its predictions in a way that appears to mirror common human strategies for this task: early layers employ frequency-based heuristics, middle layers form word structures, and final layers correct individual characters. Our architectural innovations and analysis methods extend beyond cryptograms to any domain with bijective mappings and combinatorial structure, offering new insights into neural network generalization and interpretability.
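下面是Gumbel-Sinkhorn方法的一个示意实现(PyTorch;迭代次数、温度等均为示例假设,并非ALICE官方代码),展示如何得到一个近似双随机矩阵,用以可微地建模26×26的字母置换:

```python
import torch

def gumbel_sinkhorn(logits, n_iters=20, tau=1.0):
    """给 logits 加 Gumbel 噪声后,在对数域交替做行、列归一化,得到近似双随机矩阵(示意)。"""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    log_alpha = (logits + gumbel) / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # 行归一化
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # 列归一化
    return log_alpha.exp()

P = gumbel_sinkhorn(torch.randn(26, 26))   # 近似置换矩阵:行和与列和都接近 1
```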


【11】Riemannian Batch Normalization: A Gyro Approach
标题:黎曼批量归一化:陀螺法
链接:https://arxiv.org/abs/2509.07115

作者:en, Xiao-Jun Wu, Nicu Sebe
摘要:归一化层对深度学习至关重要,但其欧几里得形式并不适用于流形上的数据。另一方面,机器学习中的许多黎曼流形都具有陀螺(gyro)结构,使欧几里得神经网络能够按原则性方式扩展到非欧几里得领域。受此启发,我们提出了GyroBN,一个面向陀螺群(gyrogroups)的原则性黎曼批量归一化框架。我们建立了两个必要条件,即伪归约(pseudo-reduction)和陀螺等距回转(gyroisometric gyrations),以保证GyroBN在理论上能控制样本统计量,并证明这些条件对机器学习中所有已知的陀螺群都成立。我们的框架还把若干现有的黎曼归一化方法涵盖为特例。我们进一步在七种代表性几何上实例化了GyroBN,包括格拉斯曼流形、五个常曲率空间和相关矩阵流形,并为这些实例化推导出新的陀螺结构与黎曼结构。这些几何上的实验证明了GyroBN的有效性。代码可在https://github.com/GitZH-Chen/GyroBN.git获取。
摘要:Normalization layers are crucial for deep learning, but their Euclidean formulations are inadequate for data on manifolds. On the other hand, many Riemannian manifolds in machine learning admit gyro-structures, enabling principled extensions of Euclidean neural networks to non-Euclidean domains. Inspired by this, we introduce GyroBN, a principled Riemannian batch normalization framework for gyrogroups. We establish two necessary conditions, namely \emph{pseudo-reduction} and \emph{gyroisometric gyrations}, that guarantee GyroBN with theoretical control over sample statistics, and show that these conditions hold for all known gyrogroups in machine learning. Our framework also incorporates several existing Riemannian normalization methods as special cases. We further instantiate GyroBN on seven representative geometries, including the Grassmannian, five constant curvature spaces, and the correlation manifold, and derive novel gyro and Riemannian structures to enable these instantiations. Experiments across these geometries demonstrate the effectiveness of GyroBN. The code is available at https://github.com/GitZH-Chen/GyroBN.git.


【12】Sequentially Auditing Differential Privacy
标题:差分隐私的序贯审计
链接:https://arxiv.org/abs/2509.07055

作者:zález, Mateo Dulce-Rubio, Aaditya Ramdas, Mónica Ribero
摘要:我们提出了一个实用的序贯检验,用于审计黑盒机制的差分隐私保证。该检验处理机制输出的数据流,在控制第一类错误的同时提供随时有效(anytime-valid)的推断,克服了以往批量审计方法固定样本量的限制。实验表明,在多种现实机制上,该检验检测违规所需的样本量比现有方法小几个数量级,把所需样本数从5万个降低到几百个。值得注意的是,它能在不到一次完整训练运行内识别DP-SGD的隐私违规,而以往方法需要完整的模型训练。
摘要:We propose a practical sequential test for auditing differential privacy guarantees of black-box mechanisms. The test processes streams of mechanisms' outputs providing anytime-valid inference while controlling Type I error, overcoming the fixed sample size limitation of previous batch auditing methods. Experiments show this test detects violations with sample sizes that are orders of magnitude smaller than existing methods, reducing this number from 50K to a few hundred examples, across diverse realistic mechanisms. Notably, it identifies DP-SGD privacy violations in \textit{under} one training run, unlike prior methods needing full model training.


【13】Statistical Methods in Generative AI
标题:生成人工智能中的统计方法
链接:https://arxiv.org/abs/2509.07054

作者:riban
备注:Invited review paper for Annual Review of Statistics and Its Application. Feedback welcome
摘要:生成式人工智能正在成为一项重要的技术,有望在许多领域实现变革。与此同时,生成式人工智能技术基于概率模型的采样,默认情况下,它们无法保证正确性、安全性、公平性或其他属性。统计方法为提高生成式AI技术的可靠性提供了一种有前途的潜在方法。此外,统计方法也有望提高人工智能评估的质量和效率,以及设计人工智能的干预措施和实验。   在本文中,我们回顾了这些主题的一些现有工作,解释了所使用的一般统计技术,以及它们在生成AI中的应用。我们还讨论了局限性和潜在的未来方向。
摘要 :Generative Artificial Intelligence is emerging as an important technology, promising to be transformative in many areas. At the same time, generative AI techniques are based on sampling from probabilistic models, and by default, they come with no guarantees about correctness, safety, fairness, or other properties. Statistical methods offer a promising potential approach to improve the reliability of generative AI techniques. In addition, statistical methods are also promising for improving the quality and efficiency of AI evaluation, as well as for designing interventions and experiments in AI.   In this paper, we review some of the existing work on these topics, explaining both the general statistical techniques used, as well as their applications to generative AI. We also discuss limitations and potential future directions.


【14】End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
标题:关键词发现的端到端效率:嵌入式微控制器的系统级方法
链接:https://arxiv.org/abs/2509.07051

作者:rtoli, Tommaso Bondini, Christian Veronesi, Andrea Giudici, Niccolò Antonello, Franco Zappa
备注:4 pages, 2 figures, 1 table. Accepted for publication in IEEE Sensors 2025. \c{opyright} 2025 IEEE. Personal use permitted. Permission from IEEE required for all other uses
摘要:关键字识别(KWS)是嵌入式和物联网设备中免提交互的关键支持技术,其中严格的内存和能源限制对支持AI的设备的部署构成了挑战。在这项工作中,我们系统地评估和比较了几种最先进的轻量级神经网络架构,包括DS-CNN,LiCoNet和TENet,以及我们提出的基于MobileNet的Typman-KWS(TKWS)架构,专为微控制器单元(MCU)上的高效KWS而设计。与之前仅关注模型推理的研究不同,我们的分析涵盖了从梅尔频率倒谱系数(MFCC)特征提取到神经推理的整个处理流程,并在三个STM 32平台(N6,H7和U 5)上进行了基准测试。我们的研究结果表明,TKWS与三个残留块实现高达92.4%的F1分数,只有14.4k的参数,减少内存占用,而不影响准确性。此外,具有集成神经加速功能的N6 MCU实现了最佳的能量延迟积(EDP),即使在高分辨率特性下也能实现高效、低延迟的操作。我们的研究结果强调了模型的准确性本身并不能决定现实世界的有效性;相反,最佳的关键字定位部署需要仔细考虑特征提取参数和特定于硬件的优化。
摘要:Keyword spotting (KWS) is a key enabling technology for hands-free interaction in embedded and IoT devices, where stringent memory and energy constraints challenge the deployment of AI-enabeld devices. In this work, we systematically evaluate and compare several state-of-the-art lightweight neural network architectures, including DS-CNN, LiCoNet, and TENet, alongside our proposed Typman-KWS (TKWS) architecture built upon MobileNet, specifically designed for efficient KWS on microcontroller units (MCUs). Unlike prior studies focused solely on model inference, our analysis encompasses the entire processing pipeline, from Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to neural inference, and is benchmarked across three STM32 platforms (N6, H7, and U5). Our results show that TKWS with three residual blocks achieves up to 92.4% F1-score with only 14.4k parameters, reducing memory footprint without compromising the accuracy. Moreover, the N6 MCU with integrated neural acceleration achieves the best energy-delay product (EDP), enabling efficient, low-latency operation even with high-resolution features. Our findings highlight the model accuracy alone does not determine real-world effectiveness; rather, optimal keyword spotting deployments require careful consideration of feature extraction parameters and hardware-specific optimization.


【15】Private Queries with Sigma-Counting
标题:基于西格玛计数的隐私查询
链接:https://arxiv.org/abs/2509.07018

作者:Jie Ding
摘要:许多数据应用程序涉及计数查询,其中客户端指定变量的可行范围,数据库返回相应的项计数。产生不同查询的计数的程序通常有泄漏敏感的个人级别信息的风险。增强数据隐私的一种流行方法是返回实际计数的噪声版本。它通常通过向每个查询添加独立的噪声来实现,然后在一段时间内控制总的隐私预算。在实践中,这种方法可能在查询数量和输出准确性方面受到限制。此外,返回的计数不维护嵌套查询的总顺序,这是许多应用程序中的一个重要功能。这项工作提出了一种新的方法,西格玛计数,解决这些挑战的设计和分析。Sigma-counting使用sigma-algebra的概念来构造隐私保护计数查询。我们表明,所提出的概念和方法可以显着提高输出的准确性,同时保持所需的隐私水平,在存在大量的查询相同的数据。我们还讨论了如何将该技术应用于解决大型和随时间变化的数据集。
摘要:Many data applications involve counting queries, where a client specifies a feasible range of variables and a database returns the corresponding item counts. A program that produces the counts of different queries often risks leaking sensitive individual-level information. A popular approach to enhance data privacy is to return a noisy version of the actual count. It is typically achieved by adding independent noise to each query and then control the total privacy budget within a period. This approach may be limited in the number of queries and output accuracy in practice. Also, the returned counts do not maintain the total order for nested queries, an important feature in many applications. This work presents the design and analysis of a new method, sigma-counting, that addresses these challenges. Sigma-counting uses the notion of sigma-algebra to construct privacy-preserving counting queries. We show that the proposed concepts and methods can significantly improve output accuracy while maintaining a desired privacy level in the presence of massive queries to the same data. We also discuss how the technique can be applied to address large and time-varying datasets.


【16】ArGen: Auto-Regulation of Generative AI via GRPO and Policy-as-Code
标题:ArGen:通过GRPO和政策即代码对生成性人工智能进行自动监管
链接:https://arxiv.org/abs/2509.07006

作者:an
备注:53 pages, 7 figures, 8 tables. Open-source implementation available at: this https URL. Work explores the integration of policy-as-code for AI alignment, with a case study in culturally-nuanced, ethical AI using Dharmic principles
摘要 :本文介绍了ArGen(生成AI系统的自动调节),这是一个将大型语言模型(LLM)与复杂的可配置机器可读规则集(涵盖道德原则,操作安全协议和监管合规标准)对齐的框架。ArGen超越了基于偏好的调整,旨在确保LLM通过基于原则的自动奖励评分,组相对策略优化(GRPO)和开放策略代理(OPA)启发的治理层的新颖合成来遵守这些多方面的政策。这种方法为实现和展示对各种细微差别的治理需求的遵从性提供了技术基础。为了展示该框架能够操作一个非常微妙和文化特定的价值体系,我们提出了一个深入的案例研究:开发一个医疗人工智能助理,该助理由来自佛法伦理(如Ahimsa和Dharma)的原则指导,这些原则来自《薄伽梵歌》等文本。这个具有挑战性的应用程序展示了ArGen的适应性,在域范围遵守方面比基线提高了70.9%。通过我们的开源存储库,我们表明ArGen的方法提供了一条通往“可治理的AI”系统的道路,这些系统在技术上精通,道德上稳健,并且可验证地符合在不同全球环境中安全部署的要求。
摘要:This paper introduces ArGen (Auto-Regulation of Generative AI systems), a framework for aligning Large Language Models (LLMs) with complex sets of configurable, machine-readable rules spanning ethical principles, operational safety protocols, and regulatory compliance standards. Moving beyond just preference-based alignment, ArGen is designed to ensure LLMs adhere to these multifaceted policies through a novel synthesis of principle-based automated reward scoring, Group Relative Policy Optimisation (GRPO), and an Open Policy Agent (OPA) inspired governance layer. This approach provides the technical foundation for achieving and demonstrating compliance with diverse and nuanced governance requirements. To showcase the framework's capability to operationalize a deeply nuanced and culturally-specific value system, we present an in-depth case study: the development of a medical AI assistant guided by principles from Dharmic ethics (such as Ahimsa and Dharma), as derived from texts like the Bhagavad Gita. This challenging application demonstrates ArGen's adaptability, achieving a 70.9% improvement in domain-scope adherence over the baseline. Through our open-source repository, we show that ArGen's methodology offers a path to 'Governable Al' systems that are technically proficient, ethically robust, and verifiably compliant for safe deployment in diverse global contexts.


【17】veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD
标题:veScale:使用命令模式SPMD的一致有效张量编程
链接:https://arxiv.org/abs/2509.07003

作者:, Cheng Wan, Zhiqi Lin, Hongyu Zhu, Jiacheng Yang, Ziang Song, Xinyi Di, Jiawei Wu, Huiyao Shu, Wenlei Bao, Yanghua Peng, Haibin Lin, Li-Wen Chang
备注:21 pages, 16 figures, 5 tables
摘要:大型语言模型(LLM)在规模和复杂性方面迅速扩展,需要越来越复杂的并行性来进行分布式训练,例如3D并行性。这种复杂性促使人们转向更简单、更可调试的编程范式,如单程序多数据(SPMD)。然而,在急切执行中的SPMD引入了两个关键挑战:确保与单设备执行的一致性和实现大规模的高性能。在本文中,我们介绍了veScale,这是一个渴望模式的训练系统,它完全采用SPMD范式来实现分布式张量编程的民主化。veScale通过引入一种与任意分片运算符兼容的分布式随机数生成(RNG)新算法,解决了PyTorch等系统中普遍存在的结果不一致的问题。veScale还通过减少PyTorch原语的开销和提高通信效率来显着提高训练性能。评估表明,veScale比最先进的训练系统(如TorchTitan)提供了高达2.2倍的加速,并将代码复杂性降低了78.4%,同时保持了单设备等效的结果。
摘要:Large Language Models (LLMs) have scaled rapidly in size and complexity, requiring increasingly intricate parallelism for distributed training, such as 3D parallelism. This sophistication motivates a shift toward simpler, more debuggable programming paradigm like Single Program Multiple Data (SPMD). However, SPMD in eager execution introduces two key challenges: ensuring consistency with single-device execution and achieving high performance at scale. In this paper, we introduce veScale, an eager-mode training system that fully embraces SPMD paradigm to democratize distributed tensor programming. veScale addresses the prevalent issue of inconsistent results in systems like PyTorch by introducing a novel algorithm of distributed Random Number Generation (RNG) compatible with arbitrary sharded operators. veScale also significantly boosts training performance by reducing PyTorch primitive's overhead and improving communication efficiency. Evaluations show that veScale delivers up to 2.2x speedup over the state-of-the-art training systems, like TorchTitan, and cuts code complexity by 78.4%, while preserving single-device-equivalent results.
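
下面的示意代码(作者自拟,并非veScale的算法)只演示其所追求的“分布式随机数与单机执行逐元素一致”这一目标:用同一全局种子生成完整张量后再按分片切取,即可保证任意分片方式下拼回的结果与单机结果一致;veScale的贡献在于不必付出这种全量生成的开销,此处仅作概念演示。

# 示意性草图:让每个分片得到与单机执行完全一致的随机数(朴素做法)
import torch

def sharded_randn(global_shape, shard_dim, num_shards, shard_rank, seed=1234):
    g = torch.Generator().manual_seed(seed)          # 所有rank共享同一全局种子
    full = torch.randn(global_shape, generator=g)    # 单机等价的全量随机张量
    chunks = torch.chunk(full, num_shards, dim=shard_dim)
    return chunks[shard_rank]                        # 仅保留本rank的分片

# 2个rank按第0维切分,拼回后与单机结果逐元素一致
a = sharded_randn((4, 3), shard_dim=0, num_shards=2, shard_rank=0)
b = sharded_randn((4, 3), shard_dim=0, num_shards=2, shard_rank=1)
single = torch.randn((4, 3), generator=torch.Generator().manual_seed(1234))
print(torch.equal(torch.cat([a, b], dim=0), single))  # True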


【18】Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories
标题:并非所有划分都是等同的:重新思考跨不相关类别的属性泛化
链接:https://arxiv.org/abs/2509.06998

作者:olae Fircă, Antonio Bărbălau, Dan Oneata, Elena Burceanu
摘要:模型能否在语义和感知上都不同的类别之间泛化属性知识?虽然先前的工作已经研究了狭窄分类范围或视觉相似领域内的属性预测,但目前还不清楚当前的模型是否能够抽象属性并将其应用于概念上遥远的类别。这项工作首次对这种条件下属性预测任务的鲁棒性进行了显式评估,测试模型是否能正确推断不相关对象类型之间的共享属性:例如,识别出属性“有四条腿”对于“狗”和“椅子”都是共同的。为了实现这种评估,我们引入了多种训练-测试划分策略,基于LLM驱动的语义分组、嵌入相似度阈值、基于嵌入的聚类以及使用真实标签的超类别划分,逐步降低训练集与测试集之间的相关性。结果显示,随着训练和测试类别之间的相关性降低,性能急剧下降,表明模型对划分设计高度敏感。在评估的方法中,聚类取得了最有效的权衡,在降低隐藏相关性的同时保持了可学习性。这些发现为当前表示的局限性提供了新的见解,并为未来属性推理基准的构建提供参考。
摘要:Can models generalize attribute knowledge across semantically and perceptually dissimilar categories? While prior work has addressed attribute prediction within narrow taxonomic or visually similar domains, it remains unclear whether current models can abstract attributes and apply them to conceptually distant categories. This work presents the first explicit evaluation for the robustness of the attribute prediction task under such conditions, testing whether models can correctly infer shared attributes between unrelated object types: e.g., identifying that the attribute "has four legs" is common to both "dogs" and "chairs". To enable this evaluation, we introduce train-test split strategies that progressively reduce correlation between training and test sets, based on: LLM-driven semantic grouping, embedding similarity thresholding, embedding-based clustering, and supercategory-based partitioning using ground-truth labels. Results show a sharp drop in performance as the correlation between training and test categories decreases, indicating strong sensitivity to split design. Among the evaluated methods, clustering yields the most effective trade-off, reducing hidden correlations while preserving learnability. These findings offer new insights into the limitations of current representations and inform future benchmark construction for attribute reasoning.
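
下面给出一个基于嵌入聚类的划分策略的极简示例(其中的类别与嵌入为随机生成的假设数据),展示“以整簇为单位分配训练/测试类别”如何降低两者之间的相关性。

# 示意性草图:按类别嵌入聚类,再以整簇为单位划分训练/测试集
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
categories = [f"cat_{i}" for i in range(40)]
embeddings = rng.normal(size=(40, 16))          # 实际中可用类别名的文本/视觉嵌入

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)
labels = kmeans.labels_

test_clusters = {0, 1}                           # 留出若干整簇作为测试集
train_cats = [c for c, l in zip(categories, labels) if l not in test_clusters]
test_cats = [c for c, l in zip(categories, labels) if l in test_clusters]
print(len(train_cats), len(test_cats))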


【19】FediLoRA: Heterogeneous LoRA for Federated Multimodal Fine-tuning under Missing Modalities
标题:FediLoRA:面向缺失模态下联邦多模态微调的异构LoRA
链接:https://arxiv.org/abs/2509.06984

作者:ng, Nam Kha Nguygen, Po Hu, Wei Emma Zhang, Yanjun Shu, Mong Yuan Sim, Weitong Chen
备注:8 pages, 7 figures
摘要:基础模型在广泛的任务中表现出了卓越的性能,但它们的大参数大小对实际部署构成了挑战,特别是在分散的环境中。参数高效微调(PEFT),如低秩自适应(LoRA),减少了本地计算和内存开销,使其对联邦学习具有吸引力。然而,现有的联邦LoRA方法通常假设统一的等级配置和单峰输入,忽略了两个关键的现实挑战:(1)异构客户端资源具有不同的LoRA等级,以及(2)可能丢失模态的多模态数据设置。在这项工作中,我们提出了FediLoRA,这是一个简单而有效的框架,用于在异构LoRA等级和缺失模式下进行联邦多模态微调。FediLoRA引入了一种维度聚合策略,可以在聚合过程中重新加权LoRA更新,而不会稀释信息。它还包括一个轻量级的逐层模型编辑方法,该方法选择性地合并全局参数以修复局部组件,从而提高客户端和全局模型的性能。在三个多模态基准数据集上的实验结果表明,FediLoRA在全球和个性化设置中,特别是在存在模态不完整性的情况下,都实现了优于竞争基线的性能。
摘要:Foundation models have demonstrated remarkable performance across a wide range of tasks, yet their large parameter sizes pose challenges for practical deployment, especially in decentralized environments. Parameter-efficient fine-tuning (PEFT), such as Low-Rank Adaptation (LoRA), reduces local computing and memory overhead, making it attractive for federated learning. However, existing federated LoRA methods typically assume uniform rank configurations and unimodal inputs, overlooking two key real-world challenges: (1) heterogeneous client resources have different LoRA ranks, and (2) multimodal data settings with potentially missing modalities. In this work, we propose FediLoRA, a simple yet effective framework for federated multimodal fine-tuning under heterogeneous LoRA ranks and missing modalities. FediLoRA introduces a dimension-wise aggregation strategy that reweights LoRA updates without information dilution during aggregation. It also includes a lightweight layer-wise model editing method that selectively incorporates global parameters to repair local components which improves both client and global model performances. Experimental results on three multimodal benchmark datasets demonstrate that FediLoRA achieves superior performance over competitive baselines in both global and personalized settings, particularly in the presence of modality incompleteness.
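
以下是一个示意性草图,演示对异构秩LoRA更新做“逐维聚合”的一种可能做法:把各客户端的LoRA因子零填充到最大秩,并按每个秩维度实际贡献的客户端数做归一化,以避免简单平均带来的信息稀释。注意这只是作者对摘要思路的说明性假设,并非FediLoRA论文中的确切聚合规则。

# 示意性草图:异构秩LoRA因子的逐维聚合(聚合规则为说明性假设)
import torch

def aggregate_lora(A_list, B_list):
    """A_i: (r_i, d_in), B_i: (d_out, r_i)。零填充到最大秩后做逐维平均。"""
    r_max = max(A.shape[0] for A in A_list)
    d_in, d_out = A_list[0].shape[1], B_list[0].shape[0]
    A_sum = torch.zeros(r_max, d_in)
    B_sum = torch.zeros(d_out, r_max)
    counts = torch.zeros(r_max)                       # 每个秩维度的贡献客户端数
    for A, B in zip(A_list, B_list):
        r = A.shape[0]
        A_sum[:r] += A
        B_sum[:, :r] += B
        counts[:r] += 1
    counts = counts.clamp(min=1)
    return A_sum / counts[:, None], B_sum / counts[None, :]

A_list = [torch.randn(4, 32), torch.randn(8, 32)]     # 两个客户端,秩分别为4和8
B_list = [torch.randn(16, 4), torch.randn(16, 8)]
A_glob, B_glob = aggregate_lora(A_list, B_list)
print(A_glob.shape, B_glob.shape)                     # (8, 32) (16, 8)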


【20】CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention
标题:CARE:通过回滚与内省干预实现解码时安全对齐
链接:https://arxiv.org/abs/2509.06982

作者:Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho
摘要:随着大型语言模型(LLM)越来越多地部署在现实世界的应用程序中,确保其输出在解码过程中的安全性已成为一个关键的挑战。然而,现有的解码时间干预,如对比解码,往往迫使安全性和响应质量之间的严重权衡。在这项工作中,我们提出了一种新的解码时间安全对齐框架CARE,它集成了三个关键组件:(1)用于实时安全监控的保护模型,能够检测潜在的不安全内容;(2)具有令牌缓冲区的回滚机制,可以在早期阶段有效地纠正不安全的输出,而不会破坏用户体验;以及(3)一种新的基于内省的干预策略,其中模型生成对其先前输出的自我反思批评,并将这些反思纳入上下文以指导后续解码步骤。该框架通过使用其用于精确干预的防护模型、用于及时纠正的回滚机制以及用于有效自我纠正的新颖内省方法,实现了卓越的安全质量权衡。实验结果表明,我们的框架实现了安全性,质量和效率的卓越平衡,实现了低有害响应率和最小的用户体验中断,同时保持高响应质量。
摘要:As large language models (LLMs) are increasingly deployed in real-world applications, ensuring the safety of their outputs during decoding has become a critical challenge. However, existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality. In this work, we propose CARE, a novel framework for decoding-time safety alignment that integrates three key components: (1) a guard model for real-time safety monitoring, enabling detection of potentially unsafe content; (2) a rollback mechanism with a token buffer to correct unsafe outputs efficiently at an earlier stage without disrupting the user experience; and (3) a novel introspection-based intervention strategy, where the model generates self-reflective critiques of its previous outputs and incorporates these reflections into the context to guide subsequent decoding steps. The framework achieves a superior safety-quality trade-off by using its guard model for precise interventions, its rollback mechanism for timely corrections, and our novel introspection method for effective self-correction. Experimental results demonstrate that our framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience while maintaining high response quality.
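
下面用占位函数给出一个解码循环的示意草图(非CARE官方实现):守护模型实时检查令牌缓冲区,一旦发现不安全片段就回滚缓冲区,并把自我反思文本注入上下文后继续解码。generate_next_token、guard_is_unsafe、make_critique 均为说明用的假设函数,真实系统中对应LLM解码与守护模型判断。

# 示意性草图:带令牌缓冲区的“守护-回滚-内省”解码循环
import random

def generate_next_token(context):          # 占位:真实系统中为LLM的一步解码
    return random.choice(["safe", "safe", "risky"])

def guard_is_unsafe(text):                 # 占位:真实系统中为守护模型的安全判断
    return "risky" in text

def make_critique(unsafe_text):            # 占位:让模型反思该片段为何不安全
    return f"[self-reflection] previous draft '{unsafe_text}' was unsafe; rephrase safely."

def care_decode(prompt, max_tokens=20, buffer_size=4):
    context, output, buffer = prompt, [], []
    for _ in range(max_tokens):
        buffer.append(generate_next_token(context + " " + " ".join(output + buffer)))
        if len(buffer) >= buffer_size:
            if guard_is_unsafe(" ".join(buffer)):
                context += " " + make_critique(" ".join(buffer))   # 注入内省批评
                buffer = []                                        # 回滚缓冲区
            else:
                output.extend(buffer)                              # 提交安全片段
                buffer = []
    return " ".join(output + buffer)

print(care_decode("User asks a question."))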


【21】A data-driven discretized CS:GO simulation environment to facilitate strategic multi-agent planning research
标题:数据驱动的离散化CS:GO模拟环境,促进战略多智能体规划研究
链接:https://arxiv.org/abs/2509.06355

作者:ng, Volkan Ustun, Chris McGroarty
备注:Accepted at the Winter Simulation Conference 2025, December, Seattle USA
摘要:面向复杂多智能体交互的现代仿真环境必须在高保真细节与计算效率之间取得平衡。我们提出了DECOY,这是一种新型多智能体模拟器,它将三维地形中的战略性长程规划抽象为高层次的离散化模拟,同时保持低层次的环境保真度。以《Counter-Strike: Global Offensive》(CS:GO)为测试平台,我们的框架仅以移动决策进行战术定位即可准确模拟游戏过程,而无需显式建模瞄准和射击等低层机制。我们方法的核心是一个路点系统,它简化并离散化了连续的状态和动作,并与在真实CS:GO锦标赛数据上训练的神经预测与生成模型相结合,以重建事件结果。大量评估表明,从DECOY中的人类数据生成的回放与原始游戏中观察到的回放高度吻合。我们公开的仿真环境为推进战略性多智能体规划与行为生成的研究提供了有价值的工具。
摘要 :Modern simulation environments for complex multi-agent interactions must balance high-fidelity detail with computational efficiency. We present DECOY, a novel multi-agent simulator that abstracts strategic, long-horizon planning in 3D terrains into high-level discretized simulation while preserving low-level environmental fidelity. Using Counter-Strike: Global Offensive (CS:GO) as a testbed, our framework accurately simulates gameplay using only movement decisions as tactical positioning -- without explicitly modeling low-level mechanics such as aiming and shooting. Central to our approach is a waypoint system that simplifies and discretizes continuous states and actions, paired with neural predictive and generative models trained on real CS:GO tournament data to reconstruct event outcomes. Extensive evaluations show that replays generated from human data in DECOY closely match those observed in the original game. Our publicly available simulation environment provides a valuable tool for advancing research in strategic multi-agent planning and behavior generation.
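
下面的小例子(路点坐标与轨迹均为虚构)演示摘要所述路点系统最基本的一步:用最近路点把连续三维位置离散化为高层状态,从而把连续的状态/动作空间简化为路点序列。

# 示意性草图:最近路点离散化
import numpy as np

waypoints = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 2]], dtype=float)

def to_waypoint(position):
    """返回离给定连续坐标最近的路点索引。"""
    dists = np.linalg.norm(waypoints - np.asarray(position, dtype=float), axis=1)
    return int(np.argmin(dists))

trajectory = [(1.2, 0.5, 0.0), (8.9, 1.1, 0.0), (9.7, 9.3, 0.1)]
print([to_waypoint(p) for p in trajectory])   # 连续轨迹 -> 路点序列,例如 [0, 1, 2]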


【22】Nuclear Data Adjustment for Nonlinear Applications in the OECD/NEA WPNCS SG14 Benchmark -- A Bayesian Inverse UQ-based Approach for Data Assimilation
标题:OECD/NEA WPNCS SG14基准中非线性应用的核数据调整--基于贝叶斯逆UQ的数据同化方法
链接:https://arxiv.org/abs/2509.07790

作者:er Brady (1), Xu Wu (1) ((1) North Carolina State University)
备注:31 pages, 9 tables, 8 figures, submitted to Nuclear Science and Engineering, included in proceedings of International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (M&C 2025)
摘要:经济合作与发展组织(OECD)核临界安全工作组(WPNCS)提出了一项基准练习,以评估当前核数据调整技术在非线性应用以及与应用相关性较低的实验上的表现。这项工作引入贝叶斯逆不确定性量化(IUQ)作为该基准中核数据调整的方法,并将IUQ与更传统的广义线性最小二乘(GLLS)和蒙特卡罗贝叶斯(MOCABA)方法进行比较。对于线性应用,IUQ的后验预测与GLLS和MOCABA一致。当将GLLS、MOCABA和IUQ的后验预测与使用调整后参数计算的模型响应进行比较时,我们观察到GLLS的预测无法复现非线性应用中计算得到的响应分布,MOCABA接近一致,而IUQ直接使用计算的模型响应。我们还讨论了为什么与应用相关性较低的实验仍能为核数据调整提供信息,并指出了在选择纳入核数据调整的实验时一些有用的性质。该基准中的表现表明贝叶斯IUQ在核数据调整中具有潜力。
摘要:The Organization for Economic Cooperation and Development (OECD) Working Party on Nuclear Criticality Safety (WPNCS) proposed a benchmark exercise to assess the performance of current nuclear data adjustment techniques applied to nonlinear applications and experiments with low correlation to applications. This work introduces Bayesian Inverse Uncertainty Quantification (IUQ) as a method for nuclear data adjustments in this benchmark, and compares IUQ to the more traditional methods of Generalized Linear Least Squares (GLLS) and Monte Carlo Bayes (MOCABA). Posterior predictions from IUQ showed agreement with GLLS and MOCABA for linear applications. When comparing GLLS, MOCABA, and IUQ posterior predictions to computed model responses using adjusted parameters, we observe that GLLS predictions fail to replicate computed response distributions for nonlinear applications, while MOCABA shows near agreement, and IUQ uses computed model responses directly. We also discuss observations on why experiments with low correlation to applications can be informative to nuclear data adjustments and identify some properties useful in selecting experiments for inclusion in nuclear data adjustment. Performance in this benchmark indicates potential for Bayesian IUQ in nuclear data adjustments.
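
作为参考,下面用numpy演示摘要中作为对比基线的GLLS调整的标准更新公式 x' = x + Cx S^T (S Cx S^T + Cm)^{-1} (m - f(x)) 及其后验协方差;示例数据为随机构造,仅用于说明公式,并非论文中的IUQ方法或真实核数据。

# 示意性草图:GLLS核数据调整的标准更新(线性化模型、随机小例子)
import numpy as np

rng = np.random.default_rng(0)
n_par, n_resp = 5, 3
x0 = np.ones(n_par)                       # 先验核数据参数
Cx = 0.01 * np.eye(n_par)                 # 先验参数协方差
S = rng.normal(size=(n_resp, n_par))      # 灵敏度矩阵(线性化模型)
Cm = 0.005 * np.eye(n_resp)               # 实验测量协方差
measured = S @ x0 + rng.normal(scale=0.05, size=n_resp)

K = Cx @ S.T @ np.linalg.inv(S @ Cx @ S.T + Cm)    # 增益矩阵
x_post = x0 + K @ (measured - S @ x0)              # 调整后的参数
Cx_post = Cx - K @ S @ Cx                          # 调整后的协方差
print(x_post)
print(np.diag(Cx_post))                            # 后验方差应小于先验方差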


【23】Building causation links in stochastic nonlinear systems from data
标题:根据数据在随机非线性系统中建立因果关系
链接:https://arxiv.org/abs/2509.07701

作者:ibbaro, Cyril Furtlehner, Théo Marchetta, Andrei-Tiberiu Pantea, Davide Rossetti
备注:24 pages, 11 Figures. Comments are welcome
摘要:因果关系在理解我们周围的世界中发挥着重要作用。识别和理解因果关系的能力对于做出明智的决策、预测结果和制定有效的策略至关重要。然而,从观测数据中解读因果关系是一项艰巨的任务,因为相关性本身可能无法提供因果关系的确切证据。近年来,机器学习(ML)已成为一种强大的工具,为发现隐藏的因果机制和更好地理解复杂系统提供了新的机会。在这项工作中,我们在物理学响应理论的框架下,研究如何检测一大类复杂系统的内在因果联系。我们发展了[1]中提出的一些理论思想,并在技术上使用最先进的ML技术从数据中构建模型。我们同时考虑线性随机系统和非线性系统。最后,我们计算了在大规模线性相互作用马尔可夫过程网络情形下基于线性响应的因果预测器的渐近效率。
摘要:Causal relationships play a fundamental role in understanding the world around us. The ability to identify and understand cause-effect relationships is critical to making informed decisions, predicting outcomes, and developing effective strategies. However, deciphering causal relationships from observational data is a difficult task, as correlations alone may not provide definitive evidence of causality. In recent years, the field of machine learning (ML) has emerged as a powerful tool, offering new opportunities for uncovering hidden causal mechanisms and better understanding complex systems. In this work, we address the issue of detecting the intrinsic causal links of a large class of complex systems in the framework of the response theory in physics. We develop some theoretical ideas put forward by [1], and technically we use state-of-the-art ML techniques to build up models from data. We consider both linear stochastic and non-linear systems. Finally, we compute the asymptotic efficiency of the linear response based causal predictor in a case of large scale Markov process network of linear interactions.
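
下面给出摘要所述“基于线性响应的因果预测”在最简单情形(线性随机系统)下的示意实现:由滞后协方差估计响应矩阵 R(1) = C(1) C(0)^{-1},其非零元素反映变量间的因果影响。示例系统与参数为作者虚构,论文中的非线性与ML部分不在此例范围内。

# 示意性草图:线性随机系统 x_{t+1} = A x_t + 噪声 的响应矩阵估计
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.3, 0.0],
              [0.0, 0.8, 0.0],
              [0.0, 0.0, 0.7]])        # 真实耦合:x2 -> x1 存在因果影响,其余无交叉影响
T, d = 50_000, 3
x = np.zeros((T, d))
for t in range(T - 1):
    x[t + 1] = A @ x[t] + rng.normal(scale=0.1, size=d)

xc = x - x.mean(axis=0)
C0 = xc[:-1].T @ xc[:-1] / (T - 1)     # 同时刻协方差 C(0)
C1 = xc[1:].T @ xc[:-1] / (T - 1)      # 滞后1协方差 C(1)
R1 = C1 @ np.linalg.inv(C0)            # 响应矩阵估计,应接近 A
print(np.round(R1, 2))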


【24】Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription
标题:探索最小延迟实时钢琴抄写的系统适应
链接:https://arxiv.org/abs/2509.07586

作者:Hu, Silvan David Peter, Jan Schlüter, Gerhard Widmer
备注:to be published in Proceedings of the 26th International Society for Music Information Retrieval (ISMIR) Conference 2025, Daejeon, South Korea
摘要:神经网络设计的进步和大规模标注数据集的可用性推动了钢琴转录的重大改进。现有方法要么面向对计算需求没有限制的离线应用,要么面向延迟为128-320毫秒的在线转录。然而,大多数实时音乐应用需要低于30毫秒的延迟。在这项工作中,我们研究当前最先进的在线转录模型是否以及如何能够适配于实时钢琴转录。具体来说,我们消除了所有非因果处理,并通过在核心模型组件间共享计算以及调整模型大小来降低计算负载。此外,我们还探索了不同的预处理和后处理策略以及相关的标签编码方案,并讨论了它们对实时转录的适用性。在MAESTRO数据集上评估这些适配后,我们发现严格的因果处理会导致转录准确性下降,并且在预处理延迟与预测准确性之间存在权衡。我们将该系统作为基线发布,以支持研究人员设计面向最小延迟实时转录的模型。
摘要 :Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128-320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing, and reduce computational load through shared computations across core model components and variations in model size. Additionally, we explore different pre- and postprocessing strategies, and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptions on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing as well as a tradeoff between the preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models towards minimum latency real-time transcription.
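
下面用PyTorch演示“消除非因果处理”所指的一类典型改动(示意性例子,非论文代码):把两侧填充的卷积改为仅向左填充的因果卷积,使每个时间步的输出只依赖当前及过去帧,从而不引入前视延迟。

# 示意性草图:非因果卷积 vs 因果卷积(仅左侧填充)
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 100)                      # (batch, channel, 时间帧)
w = torch.randn(1, 1, 5)                        # 卷积核长度5

# 非因果:两侧各填充2,输出会“看到”未来2帧
y_noncausal = F.conv1d(F.pad(x, (2, 2)), w)

# 因果:只在左侧填充 kernel_size-1,输出仅依赖当前与过去帧
y_causal = F.conv1d(F.pad(x, (4, 0)), w)

print(y_noncausal.shape, y_causal.shape)        # 两者长度相同,但因果版本无前视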


【25】Asynchronous Gossip Algorithms for Rank-Based Statistical Methods
标题:面向基于秩的统计方法的异步Gossip算法
链接:https://arxiv.org/abs/2509.07543

作者:Elst, Igor Colin, Stephan Clémençon
摘要:随着去中心化人工智能和边缘智能日益普及,确保此类分布式环境中的鲁棒性与可信性已成为关键问题,尤其是在存在损坏或对抗性数据的情况下。传统的去中心化算法容易受到数据污染的影响,因为它们通常依赖简单的统计量(例如均值或总和),这促使人们需要更鲁棒的统计量。延续最近关于截尾均值与秩的去中心化估计的工作,我们开发了用于计算一大类基于秩的统计量的gossip算法,包括L-统计量和秩统计量,二者都以对离群值的鲁棒性著称。我们将该方法应用于鲁棒的分布式双样本假设检验,提出了首个用于Wilcoxon秩和检验的gossip算法。我们给出了严格的收敛保证,包括首个针对基于异步gossip的秩估计的收敛速度界。我们通过在不同网络拓扑上的实验对理论结果进行了经验验证。
摘要:As decentralized AI and edge intelligence become increasingly prevalent, ensuring robustness and trustworthiness in such distributed settings has become a critical issue-especially in the presence of corrupted or adversarial data. Traditional decentralized algorithms are vulnerable to data contamination as they typically rely on simple statistics (e.g., means or sum), motivating the need for more robust statistics. In line with recent work on decentralized estimation of trimmed means and ranks, we develop gossip algorithms for computing a broad class of rank-based statistics, including L-statistics and rank statistics-both known for their robustness to outliers. We apply our method to perform robust distributed two-sample hypothesis testing, introducing the first gossip algorithm for Wilcoxon rank-sum tests. We provide rigorous convergence guarantees, including the first convergence rate bound for asynchronous gossip-based rank estimation. We empirically validate our theoretical results through experiments on diverse network topologies.
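
下面是一个极简的示意例子(非论文中完整的L-统计量或检验算法),说明为何秩可以通过gossip平均得到:每个节点持有指示量 1{x_i <= q},随机成对平均收敛到全网均值,乘以节点数即为 q 的秩。

# 示意性草图:用成对gossip平均估计查询值 q 的秩
import random

random.seed(0)
values = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]     # 每个节点的本地观测
q = 5.0                                      # 想知道 q 在全网数据中的秩
n = len(values)
state = [1.0 if v <= q else 0.0 for v in values]   # 本地指示量

for _ in range(2000):                        # 异步gossip:每次随机唤醒一对节点做平均
    i, j = random.sample(range(n), 2)
    avg = (state[i] + state[j]) / 2.0
    state[i] = state[j] = avg

print(round(state[0] * n))                   # 收敛后任一节点都能读出秩,这里应为4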


【26】RINO: Renormalization Group Invariance with No Labels
标题:RINO:无标签的重正化群不变性
链接:https://arxiv.org/abs/2509.07486

作者:o, Raghav Kansal, Abhijith Gandrakota, Chang Sun, Ngadiuba Jennifer, Javier Duarte, Maria Spiropulu
备注:Submission for Machine Learning and the Physical Sciences Workshop @ NeurIPS 2025
摘要:在高能物理(HEP)中,监督机器学习(ML)的一个常见挑战是标注数据依赖于模拟,而模拟往往会对底层碰撞或探测器响应建模有误。为帮助缓解这种域偏移问题,我们提出了RINO(无标签的重正化群不变性),这是一种自监督学习方法,可直接在碰撞数据上预训练模型,学习对重正化群流尺度不变的嵌入。在这项工作中,我们在JetClass数据集中由量子色动力学(QCD)相互作用产生的射流上预训练了一个基于Transformer的模型,以模拟真实的QCD主导的实验数据,然后在JetNet数据集(作为模拟数据的替身)上进行微调,用于识别源自顶夸克衰变的射流。与在JetNet上从头开始的监督训练相比,RINO展示了从JetNet训练数据到JetClass数据更好的泛化能力,表明了“先在真实碰撞数据上进行RINO预训练、再在小型高质量MC数据集上微调”以提高HEP中ML模型鲁棒性的潜力。
摘要:A common challenge with supervised machine learning (ML) in high energy physics (HEP) is the reliance on simulations for labeled data, which can often mismodel the underlying collision or detector response. To help mitigate this problem of domain shift, we propose RINO (Renormalization Group Invariance with No Labels), a self-supervised learning approach that can instead pretrain models directly on collision data, learning embeddings invariant to renormalization group flow scales. In this work, we pretrain a transformer-based model on jets originating from quantum chromodynamic (QCD) interactions from the JetClass dataset, emulating real QCD-dominated experimental data, and then finetune on the JetNet dataset -- emulating simulations -- for the task of identifying jets originating from top quark decays. RINO demonstrates improved generalization from the JetNet training data to JetClass data compared to supervised training on JetNet from scratch, demonstrating the potential for RINO pretraining on real collision data followed by fine-tuning on small, high-quality MC datasets, to improve the robustness of ML models in HEP.


【27】Identifying Neural Signatures from fMRI using Hybrid Principal Components Regression
标题:使用混合主成分回归从fMRI识别神经签名
链接:https://arxiv.org/abs/2509.07300

作者:ck, Julia Wrobel, Joshua L. Gowin, Yue Wang, Martin Paulus, Ryan Peterson
摘要:神经成像分析的最新进展使得从功能磁共振成像扫描期间的大脑激活模式中准确解码精神状态成为可能。为此常用的工具是使用最小绝对收缩和选择算子正则化的主成分回归(LASSO PCR),它是多体素模式分析(MVPA)的一种。该模型假设所有成分都同样可能包含相关信息,而事实上与任务相关的信号可能集中在特定成分中。在这种情况下,模型将无法选出使与所研究认知过程相关的总信号最大化的最优主成分集合。在此,我们提出对LASSO PCR的修改,使正则化惩罚直接与主成分的序号关联,以反映“任务相关信号更可能集中在解释更大方差的成分中”这一先验信念。此外,我们提出了一种新的混合方法,即联合稀疏排序LASSO(JSRL),它在信息对等框架下整合成分级与体素级活动,并施加按序稀疏性以指导成分选择。我们将这些模型应用于冒险、金钱激励和情绪调节任务中的大脑激活。结果表明,将稀疏性排序纳入LASSO PCR可产生分类性能更强的模型,JSRL在交叉验证的偏差$R^2$上取得高达51.7%的改进,在交叉验证的AUC上取得7.3%的改进。此外,稀疏性排序模型在所有分类任务中的表现与标准LASSO PCR方法相当或更好,并将预测权重分配给与其既定功能角色一致的大脑区域,为MVPA提供了一个稳健的替代方案。
摘要:Recent advances in neuroimaging analysis have enabled accurate decoding of mental state from brain activation patterns during functional magnetic resonance imaging scans. A commonly applied tool for this purpose is principal components regression regularized with the least absolute shrinkage and selection operator (LASSO PCR), a type of multi-voxel pattern analysis (MVPA). This model presumes that all components are equally likely to harbor relevant information, when in fact the task-related signal may be concentrated in specific components. In such cases, the model will fail to select the optimal set of principal components that maximizes the total signal relevant to the cognitive process under study. Here, we present modifications to LASSO PCR that allow for a regularization penalty tied directly to the index of the principal component, reflecting a prior belief that task-relevant signal is more likely to be concentrated in components explaining greater variance. Additionally, we propose a novel hybrid method, Joint Sparsity-Ranked LASSO (JSRL), which integrates component-level and voxel-level activity under an information parity framework and imposes ranked sparsity to guide component selection. We apply the models to brain activation during risk taking, monetary incentive, and emotion regulation tasks. Results demonstrate that incorporating sparsity ranking into LASSO PCR produces models with enhanced classification performance, with JSRL achieving up to 51.7\% improvement in cross-validated deviance $R^2$ and 7.3\% improvement in cross-validated AUC. Furthermore, sparsity-ranked models perform as well as or better than standard LASSO PCR approaches across all classification tasks and allocate predictive weight to brain regions consistent with their established functional roles, offering a robust alternative for MVPA.
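
下面给出一个示意性草图(非论文的JSRL实现),演示“让惩罚权重随主成分序号增大”的LASSO PCR:利用“按权重缩放特征再拟合普通LASSO等价于加权L1惩罚”这一常见技巧;其中权重形式 w_j = sqrt(j) 与模拟数据均为作者的说明性假设。

# 示意性草图:惩罚随主成分序号递增的LASSO PCR
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
scales = np.linspace(3.0, 0.5, 50)                      # 让前几列方差更大
X = rng.normal(size=(200, 50)) * scales
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)     # 模拟:信号位于高方差方向

Z = PCA(n_components=20).fit_transform(X)               # 主成分得分,按解释方差排序
weights = np.sqrt(np.arange(1, Z.shape[1] + 1))         # 惩罚随成分序号递增(假设形式)
Z_scaled = Z / weights                                   # 缩放特征 <=> 加权L1惩罚

model = Lasso(alpha=0.1).fit(Z_scaled, y)
coef_on_components = model.coef_ / weights               # 还原到原始主成分上的系数
print(np.nonzero(coef_on_components)[0])                 # 被选中的成分(应偏向序号靠前者)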


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
